WO2022105608A1 - Rapid face density prediction and face detection method and apparatus, electronic device, and storage medium - Google Patents

Rapid face density prediction and face detection method and apparatus, electronic device, and storage medium Download PDF

Info

Publication number
WO2022105608A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
feature
features
group
image
Prior art date
Application number
PCT/CN2021/128477
Other languages
French (fr)
Chinese (zh)
Inventor
张敏文
周治尹
Original Assignee
上海点泽智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海点泽智能科技有限公司
Publication of WO2022105608A1 publication Critical patent/WO2022105608A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the invention relates to image information processing technology, and in particular to a fast face density prediction and face detection method, apparatus, electronic device, and storage medium.
  • Face detection has important application value in security monitoring, witness comparison, human-computer interaction, social networking and other fields.
  • Devices such as digital cameras and smartphones make extensive use of face detection technology to focus on faces during imaging and to sort and classify photo albums, and various virtual beauty cameras also require face detection technology to locate faces.
  • the common face detection methods need to set face candidate boxes first and learn offsets on the candidate boxes through a neural network to obtain the position of the face in the image, and the setting of the candidate boxes directly affects the accuracy of face detection;
  • the FaceBoxes model has high accuracy, but contains a large number of parameters;
  • the MTCNN (Multi-task Cascaded Convolutional Networks) model has fewer parameters, but its feature expression ability is limited, and it contains three neural networks that must be trained separately, which makes it difficult to train;
  • the U-shaped feature extraction network only expands high-level features during feature fusion, and does not fully utilize the texture information of high-level features and the detailed information of low-level features.
  • the present invention proposes a face detection method, comprising the following steps:
  • Step S1: acquiring an image to be detected;
  • Step S2: using feature pyramid residual blocks to extract multi-scale features from the image to be detected;
  • Step S3: using the mutual embedding upsampling module to perform feature fusion;
  • Step S4: using the face detection module to predict face confidence and the width and height of the face.
  • the step S2 includes:
  • Step S2.1: use a 3×3 convolution kernel to convolve the image to be detected, and send the convolved image into the feature pyramid residual block to extract features;
  • Step S2.2: combine a plurality of the feature pyramid residual blocks into a feature extraction network, and extract the features of the feature map output by step S2.1;
  • Step S2.3: combine a plurality of the feature pyramid residual blocks into a feature extraction network, and extract the features of the feature map output by step S2.2.
  • the feature pyramid residual block provided by this application includes:
  • a 1×1 convolution operation is used to expand the number of channels of the feature map; the feature map is divided evenly into 4 groups along the channel direction, the first group being convolved with a 3×3 kernel of dilation 1, the second group with a 3×3 kernel of dilation 2, the third group with a 3×3 kernel of dilation 4, and the fourth group with a 3×3 kernel of dilation 8; the 4 groups of convolved features are concatenated in order to form a first feature map, a 1×1 convolution fuses the first feature map into a second feature map, and the input feature map and the second feature map are added together.
  • the receptive fields of the atrous convolutions of the first group, the second group, the third group, and the fourth group are 3, 5, 9, and 17, respectively.
  • the present application implements feature fusion through feature pyramid residual blocks to increase the receptive field of neurons without increasing parameters.
  • the four groups of dilated convolutions are all depthwise convolutions: along the channel direction, the original feature map is split into single-channel feature maps, and each single-channel feature map is convolved with a single-channel kernel, which further reduces the parameters of the network model.
  • the 4 groups of convolutions in the feature pyramid residual block are arranged in parallel, which enlarges the receptive field of neurons without increasing the depth or parameters of the network, so that the network can extract more face information.
  • the step S3 includes:
  • Step S3.1 using the inter-embedded upsampling module to perform feature fusion on the features extracted in the step S2.2 and the features extracted in the step S2.3;
  • Step S3.2 Use the inter-embedded upsampling module to perform feature fusion on the features fused in the step S3.1 and the features extracted in the step S2.1.
  • the present application applies the inter-embedded upsampling module to the high-stage feature map: a channel attention model yields a first attention coefficient for each channel, and the first attention coefficients are multiplied by the low-stage features to obtain a first fused feature fused by the channel attention model;
  • on the low-stage feature map, a spatial attention model yields a second attention coefficient for each point of the feature map, and the second attention coefficients are multiplied by the upsampled high-stage feature map to obtain a second fused feature fused by the spatial attention model; the first fused feature and the second fused feature are added to obtain the final fused feature.
  • the step S4 includes:
  • Step S4.1: use a 3×3 convolution kernel to convolve the features fused in step S3.2;
  • Step S4.2: use two 1×1 convolution kernels to predict face confidence and face width and height, respectively.
  • the image to be detected can be regarded as a two-dimensional coordinate system with the upper left corner of the image as the origin; the face in the image can then be regarded as a two-dimensional Gaussian distribution.
  • the center position of the face is the center point of the Gaussian distribution, its coordinate value corresponds to the mean of the two-dimensional Gaussian distribution, and the width and height of the face correspond to the variance of the two-dimensional Gaussian distribution.
  • another embodiment of the present application discloses a network training process with a label and a loss function, specifically:
  • a face with center point (x, y) is expressed as f = N(x, y, σ1, σ2), where x and y are the means of the two-dimensional Gaussian distribution N;
  • σ1 and σ2 are the variances of the two-dimensional Gaussian distribution, corresponding to the width and height of the face, respectively; therefore, the face distribution corresponding to an image containing n faces can be expressed as I(x, y) = max_i N(x_i, y_i, σ1_i, σ2_i), i = 1, 2, …, n;
  • Ω is the label for predicting the face center point, and Ψ is the label for predicting the face width and height;
  • the loss function can be expressed as:
  • P and K are the outputs of the network, namely the face confidence (normalized Gaussian distribution amplitude) and the face width and height (variances of the Gaussian distribution), and λ is the loss scale coefficient.
  • the embodiment of the present application also provides a fast face density prediction and face detection device, including:
  • an image acquisition module for acquiring the image to be detected
  • a feature extraction module used for extracting multi-scale features in the image to be detected by using a feature pyramid residual block
  • the feature fusion module is used for feature fusion using the inter-embedded upsampling module
  • the detection result module is used to use the face detection module to predict the confidence level of the face and the width and height of the face.
  • Embodiments of the present application further provide an electronic device, including a memory, a processor, and machine-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the machine-readable instructions, performs the method described above.
  • Embodiments of the present application further provide a storage medium on which a computer program is stored, characterized in that, when the program is run by a processor, the method as described above is executed.
  • This application predicts the face density in an image and detects the faces in the image by predicting Gaussian distributions, avoiding the instability introduced by candidate boxes; a feature pyramid residual block uses small convolution kernels to enlarge the receptive field of neurons without increasing the depth of the network; since neither the depth nor the parameters of the network are increased while the receptive field is enlarged, the network can extract more face information; the inter-embedded upsampling module performs feature fusion so that, when high-level and low-level features are fused, the texture information of the high-level features and the detail information of the low-level features are fully exploited.
  • FIG. 1 is a schematic flowchart of a method for fast face density prediction and face detection provided by an embodiment of the present application
  • FIG. 2 is a structural block diagram of a face density prediction and face detection model provided by an embodiment of the present application
  • FIG. 3 is a structural block diagram of a feature pyramid residual block provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for fast face density prediction and face detection provided by an embodiment of the present application
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • a fast face density prediction and face detection method includes the following steps:
  • Step S1 acquiring an image to be detected
  • the image to be detected refers to an image to be checked for the presence of a human face, for example a color image, a black-and-white image, or a binary image of a face.
  • the image to be detected in step S1 may be obtained by photographing the target object with a terminal device such as a video camera, a video recorder, or a color camera; by retrieving a pre-stored image, for example capturing it from a real-time video stream or a video file in the file system, or reading it from a database or a removable storage device; or by obtaining it from the Internet using a browser or another application.
  • Step S2 using feature pyramid residual blocks to extract multi-scale features in the image to be detected
  • using the feature pyramid residual block to extract the multi-scale features in the image to be detected further includes the following steps:
  • Step S2.1 In the first stage, a 3×3 convolution kernel is used to convolve the image to be detected, and the convolved image is sent to the feature pyramid residual block to extract features;
  • Step S2.2 In the second stage, multiple feature pyramid residual blocks are combined into a feature extraction network, and the features of the feature map output in step S2.1 are extracted;
  • Step S2.3 In the third stage, multiple feature pyramid residual blocks are combined into a feature extraction network to extract the features of the feature map output in step S2.2.
  • see FIG. 3 for the structural block diagram of the feature pyramid residual block provided by the embodiment of the present application.
  • a 1×1 convolution operation is used to expand the number of channels of the feature map; the feature map is divided evenly into 4 groups along the channel direction, the first group being convolved with a 3×3 kernel of dilation 1, the second group with a 3×3 kernel of dilation 2, the third group with a 3×3 kernel of dilation 4, and the fourth group with a 3×3 kernel of dilation 8; the 4 groups of convolved features are concatenated in order to form a first feature map, a 1×1 convolution fuses the first feature map into a second feature map, and the input feature map and the second feature map are added together.
  • the receptive fields of the atrous convolutions of the first group, the second group, the third group, and the fourth group are 3, 5, 9, and 17, respectively.
  • Step S3 adopting the mutual embedding upsampling module to perform feature fusion
  • the embodiment of the present application applies the inter-embedded upsampling module to the high-stage feature map: a channel attention model yields a first attention coefficient for each channel, and the first attention coefficients are multiplied by the low-stage features to obtain a first fused feature fused by the channel attention model;
  • on the low-stage feature map, a spatial attention model yields a second attention coefficient for each point of the feature map, and the second attention coefficients are multiplied by the upsampled high-stage feature map to obtain a second fused feature fused by the spatial attention model;
  • the first fusion feature and the second fusion feature are added to obtain the final fusion feature.
  • the channel attention model and the spatial attention model are common technologies in the field, and mainly focus on the mechanism of local information, such as a certain image area in the image. With the change of tasks, attention areas tend to change, which will not be repeated in this application.
  • the inter-embedded upsampling module is used for feature fusion, and the texture information of the high-level features and the detailed information of the low-level features are fully utilized when the high-level and low-level feature fusion is realized.
  • Step S4: the face detection model network is used to predict face confidence and the width and height of the face; specifically, this includes the following steps:
  • Step S4.1 use a 3×3 convolution kernel to convolve the fused features in step S3.2;
  • Step S4.2 Use two 1×1 convolution kernels to predict face confidence and face width and height respectively.
  • the annotations are obtained by marking the face regions in the face images with bounding boxes and marking the classification and key points corresponding to the face regions; the key points represent key feature points in the face region; optionally, a further output can be attached at the end of this method, and the key points of the face are detected by the method of predicting the position of the face center point.
  • the image to be detected can be regarded as a two-dimensional coordinate system with the upper left corner of the image as the origin; the face in the image can then be regarded as a two-dimensional Gaussian distribution.
  • the center position of the face is the center point of the Gaussian distribution, its coordinate value corresponds to the mean of the two-dimensional Gaussian distribution, and the width and height of the face correspond to the variance of the two-dimensional Gaussian distribution.
  • Another embodiment of the present application also provides a label and a loss function to perform a network training process, specifically:
  • a face with center point (x, y) is expressed as f = N(x, y, σ1, σ2), where x and y are the means of the two-dimensional Gaussian distribution N;
  • σ1 and σ2 are the variances of the two-dimensional Gaussian distribution, corresponding to the width and height of the face, respectively; therefore, the face distribution corresponding to an image containing n faces can be expressed as I(x, y) = max_i N(x_i, y_i, σ1_i, σ2_i), i = 1, 2, …, n;
  • Ω is the label for predicting the face center point, and Ψ is the label for predicting the face width and height;
  • the loss function can be expressed as:
  • P and K are the outputs of the network, namely the face confidence (normalized Gaussian distribution amplitude) and the face width and height (variances of the Gaussian distribution), and λ is the loss scale coefficient.
  • this method adopts the method of predicting the Gaussian distribution to predict the face density in the image and detect the face in the image, so as to avoid the unstable factors caused by the use of candidate frames.
  • the embodiment of the present application provides a face density prediction and face detection apparatus 300, including:
  • an image acquisition module 310 configured to acquire an image to be detected
  • a feature extraction module 320 configured to extract multi-scale features in the image to be detected by using a feature pyramid residual block
  • the feature fusion module 330 is used to perform feature fusion by adopting the mutual embedded upsampling module;
  • the detection result module 340 is configured to use the face detection module to predict the confidence level of the face and the width and height of the face to obtain the face detection result.
  • the device corresponds to the above-mentioned embodiments of the fast face density prediction and face detection methods, and can perform various steps involved in the above-mentioned method embodiments.
  • the device includes at least one software function module that can be stored in a memory in the form of software or firmware or fixed in an operating system (OS) of the device.
  • An electronic device 400 provided in an embodiment of the present application includes: a processor 410 and a memory 420, where the memory 420 stores machine-readable instructions executable by the processor 410, and the above method is executed when the machine-readable instructions are executed by the processor 410 .
  • the embodiment of the present application also provides a storage medium 430, where a computer program is stored on the storage medium 430, and the computer program is executed by the processor 410 to execute the above method.
  • the storage medium 430 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implying the number of technical features indicated. Thus, a feature defined as "first" or "second" may expressly or implicitly include one or more of that feature. "Plurality" means two or more, unless expressly and specifically limited otherwise.
  • the terms "mounted", "connected", "coupled", "fixed", and the like should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements.
  • a first feature "on" or "under" a second feature may mean that the first and second features are in direct contact, or that the first and second features are in indirect contact through an intermediate medium.
  • the first feature being "above", "over", or "on top of" the second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature.
  • the first feature being "below", "under", or "beneath" the second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.
  • a "computer-readable medium” can be any device that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or apparatus.
  • computer-readable media include the following: an electrical connection with one or more wires (an electronic device), a portable computer disk cartridge (a magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the paper or other medium may be optically scanned and then edited, interpreted, or otherwise processed as necessary to obtain the program electronically, after which it is stored in computer memory.
  • various parts of the present invention may be implemented in hardware, software, firmware or a combination thereof.
  • various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, the steps may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a rapid face density prediction and face detection method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: obtaining an image to be detected; extracting multi-scale features from the image to be detected using a feature pyramid residual block; performing feature fusion using a mutual embedding upsampling module; and using a face detection module to predict a face confidence level and the width and height of the face. In the implementation process described above, the present application predicts the face density in the image and detects the faces in the image by predicting Gaussian distributions, avoiding the instability introduced by candidate boxes; a feature pyramid residual block with small convolution kernels enlarges the receptive field of neurons without increasing the depth of the network; since neither the depth nor the parameters of the network are increased while the receptive field of the neurons is enlarged, the network can extract more face information.

Description

A fast face density prediction and face detection method, apparatus, electronic device, and storage medium

Technical Field
The present invention relates to image information processing technology, and in particular to a fast face density prediction and face detection method, apparatus, electronic device, and storage medium.
Background Art
Face detection has important application value in security monitoring, identity verification, human-computer interaction, social networking, and other fields. Devices such as digital cameras and smartphones make extensive use of face detection technology to focus on faces during imaging and to sort and classify photo albums, and various virtual beauty cameras also require face detection technology to locate faces.
Current common face detection methods (FaceBoxes, MTCNN) must first set face candidate boxes and learn offsets on those boxes through a neural network to obtain the position of the face in the image, and the setting of the candidate boxes directly affects the accuracy of face detection. The FaceBoxes model is highly accurate but contains a large number of parameters. The MTCNN (Multi-task Cascaded Convolutional Networks) model has fewer parameters, but its feature expression ability is limited, and it contains three neural networks that must be trained separately, which makes it difficult to train. Meanwhile, the U-shaped feature extraction network merely expands high-level features during feature fusion and does not fully exploit the texture information of high-level features and the detail information of low-level features.
Summary of the Invention
In order to solve the above technical problems, the present invention proposes a face detection method comprising the following steps:
Step S1: acquiring an image to be detected;
Step S2: extracting multi-scale features from the image to be detected using feature pyramid residual blocks;
Step S3: performing feature fusion using a mutual embedding upsampling module;
Step S4: predicting the face confidence and the width and height of the face using a face detection module.
Preferably, step S2 includes:
Step S2.1: convolving the image to be detected with a 3×3 convolution kernel, and feeding the convolved image into the feature pyramid residual block to extract features;
Step S2.2: combining a plurality of the feature pyramid residual blocks into a feature extraction network, and extracting features from the feature map output by step S2.1;
Step S2.3: combining a plurality of the feature pyramid residual blocks into a feature extraction network, and extracting features from the feature map output by step S2.2.
Preferably, the feature pyramid residual block provided by the present application includes:
A 1×1 convolution operation is used to expand the number of channels of the feature map; the feature map is divided evenly into 4 groups along the channel direction, the first group being convolved with a 3×3 kernel of dilation 1, the second group with a 3×3 kernel of dilation 2, the third group with a 3×3 kernel of dilation 4, and the fourth group with a 3×3 kernel of dilation 8; the 4 groups of convolved features are concatenated in order to form a first feature map, a 1×1 convolution fuses the first feature map into a second feature map, and the input feature map and the second feature map are added together.
The receptive fields of the dilated convolutions of the first, second, third, and fourth groups are 3, 5, 9, and 17, respectively.
Through feature fusion in the feature pyramid residual block, the present application enlarges the receptive field of neurons without adding parameters. The four groups of dilated convolutions are all depthwise convolutions: along the channel direction, the original feature map is split into single-channel feature maps, and each is convolved with a single-channel kernel, which further reduces the parameters of the network model. The 4 groups of convolutions in the feature pyramid residual block are arranged in parallel, enlarging the receptive field of neurons without increasing the depth or parameters of the network, so that the network can extract more face information.
Preferably, step S3 includes:
Step S3.1: fusing the features extracted in step S2.2 with the features extracted in step S2.3 using the mutual embedding upsampling module;
Step S3.2: fusing the features fused in step S3.1 with the features extracted in step S2.1 using the mutual embedding upsampling module.
Specifically, the mutual embedding upsampling module applies a channel attention model to the high-stage feature map to obtain a first attention coefficient for each channel, and multiplies the first attention coefficients by the low-stage features to obtain a first fused feature fused by the channel attention model;
On the low-stage feature map, a spatial attention model yields a second attention coefficient for each point of the feature map, and the second attention coefficients are multiplied by the upsampled high-stage feature map to obtain a second fused feature fused by the spatial attention model; the first fused feature and the second fused feature are added to obtain the final fused feature.
Preferably, step S4 includes:
Step S4.1: convolving the features fused in step S3.2 with a 3×3 convolution kernel;
Step S4.2: predicting the face confidence and the width and height of the face with two 1×1 convolution kernels, respectively.
Specifically, the image to be detected can be regarded as a two-dimensional coordinate system with the upper-left corner of the image as the origin; a face in the image can then be regarded as a two-dimensional Gaussian distribution. The center of the face is the center point of the Gaussian distribution, its coordinates correspond to the mean of the two-dimensional Gaussian distribution, and the width and height of the face correspond to the variances of the two-dimensional Gaussian distribution.
Preferably, another embodiment of the present application discloses the labels and the loss function for the network training process, specifically:
A face whose center point is (x, y) is expressed as:
f = N(x, y, σ1, σ2)
where x and y are the means of the two-dimensional Gaussian distribution N, and σ1 and σ2 are the variances of the two-dimensional Gaussian distribution, corresponding respectively to the width and height of the face. Therefore, the face distribution corresponding to an image containing n faces can be expressed as:
I(x, y) = max_i N(x_i, y_i, σ1_i, σ2_i), i = 1, 2, …, n;
The labels for this image can be expressed as:
[The label definitions for Ω and Ψ are given as equation images PCTCN2021128477-appb-000001 to PCTCN2021128477-appb-000004 in the original filing.]
Ω is the label for predicting the face center point, and Ψ is the label for predicting the face width and height;
The loss function can be expressed as:
[The loss function is given as equation image PCTCN2021128477-appb-000005 in the original filing.]
P and K are the outputs of the network, namely the face confidence (the normalized amplitude of the Gaussian distribution) and the width and height of the face (the variances of the Gaussian distribution), and λ is a loss scale coefficient.
An embodiment of the present application further provides a fast face density prediction and face detection apparatus, including:
an image acquisition module for acquiring an image to be detected;
a feature extraction module for extracting multi-scale features from the image to be detected using feature pyramid residual blocks;
a feature fusion module for performing feature fusion using a mutual embedding upsampling module;
a detection result module for predicting the face confidence and the width and height of the face using a face detection module.
An embodiment of the present application further provides an electronic device including a memory, a processor, and machine-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the machine-readable instructions, performs the method described above.
An embodiment of the present application further provides a storage medium on which a computer program is stored, wherein the program, when run by a processor, performs the method described above.
Through the above technical solutions, the beneficial effects of the present invention are as follows:
The present application predicts the face density in an image and detects the faces in the image by predicting Gaussian distributions, avoiding the instability introduced by candidate boxes; a feature pyramid residual block uses small convolution kernels to enlarge the receptive field of neurons without increasing the depth of the network; since neither the depth nor the parameters of the network are increased while the receptive field is enlarged, the network can extract more face information; the mutual embedding upsampling module performs feature fusion so that, when high-level and low-level features are fused, the texture information of the high-level features and the detail information of the low-level features are fully exploited.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the fast face density prediction and face detection method provided by an embodiment of the present application;
FIG. 2 is a structural block diagram of the face density prediction and face detection model provided by an embodiment of the present application;
FIG. 3 is a structural block diagram of the feature pyramid residual block provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the fast face density prediction and face detection apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used herein in the specification of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to FIG. 1, a schematic flowchart of the fast face density prediction and face detection method provided by an embodiment of the present application, a fast face density prediction and face detection method includes the following steps:
Step S1: acquiring an image to be detected;
The image to be detected is an image that needs to be checked for the presence of a human face, for example a color image, a black-and-white image, or a binary image of a face.
The image to be detected in step S1 may be obtained by photographing the target object with a terminal device such as a video camera, a video recorder, or a color camera; by retrieving a pre-stored image, for example capturing it from a real-time video stream or a video file in the file system, or reading it from a database or a removable storage device; or by obtaining it from the Internet using a browser or another application.
Step S2: extracting multi-scale features from the image to be detected using feature pyramid residual blocks;
In the embodiment of the present application, referring to FIG. 2, the structural block diagram of the face density prediction and face detection model, extracting multi-scale features from the image to be detected using feature pyramid residual blocks includes the following steps:
Step S2.1: in the first stage, a 3×3 convolution kernel is used to convolve the image to be detected, and the convolved image is fed into the feature pyramid residual block to extract features;
Step S2.2: in the second stage, a plurality of feature pyramid residual blocks are combined into a feature extraction network to extract features from the feature map output by step S2.1;
Step S2.3: in the third stage, a plurality of feature pyramid residual blocks are combined into a feature extraction network to extract features from the feature map output by step S2.2.
Specifically, for the feature pyramid residual block, see FIG. 3, the structural block diagram of the feature pyramid residual block provided by an embodiment of the present application;
A 1×1 convolution operation is used to expand the number of channels of the feature map; the feature map is divided evenly into 4 groups along the channel direction, the first group being convolved with a 3×3 kernel of dilation 1, the second group with a 3×3 kernel of dilation 2, the third group with a 3×3 kernel of dilation 4, and the fourth group with a 3×3 kernel of dilation 8; the 4 groups of convolved features are concatenated in order to form a first feature map, a 1×1 convolution fuses the first feature map into a second feature map, and the input feature map and the second feature map are added together.
The receptive fields of the dilated convolutions of the first, second, third, and fourth groups are 3, 5, 9, and 17, respectively (a 3×3 kernel with dilation d spans 2d + 1 pixels, giving 3, 5, 9, and 17 for d = 1, 2, 4, and 8).
In a feature extraction network, a neuron obtains a larger receptive field either through a larger convolution kernel or through a deeper network, and both approaches increase the parameter count of the feature extraction network. The present application adopts a new feature pyramid residual block that uses small convolution kernels to enlarge the receptive field of neurons without increasing the depth of the network, while widening the network horizontally so that it can extract more face information.
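The following is a minimal PyTorch sketch of the feature pyramid residual block as described above. The class name, the channel expansion factor, and the omission of normalization and activation layers are illustrative assumptions; the text specifies only the 1×1 channel expansion, the four-way channel split with depthwise 3×3 dilated convolutions of dilation 1, 2, 4, and 8, the ordered concatenation, the 1×1 fusion, and the residual addition.

```python
import torch
import torch.nn as nn

class FeaturePyramidResidualBlock(nn.Module):
    # hypothetical class name; "expansion" controls the 1x1 channel expansion
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        mid = channels * expansion
        assert mid % 4 == 0, "expanded channels must split evenly into 4 groups"
        g = mid // 4
        # 1x1 convolution that expands the number of channels
        self.expand = nn.Conv2d(channels, mid, kernel_size=1)
        # one depthwise 3x3 dilated convolution per channel group;
        # padding = dilation keeps the spatial size unchanged
        self.branches = nn.ModuleList(
            nn.Conv2d(g, g, kernel_size=3, padding=d, dilation=d, groups=g)
            for d in (1, 2, 4, 8)
        )
        # 1x1 convolution that fuses the concatenated groups back to the
        # input channel count so the residual addition is well-defined
        self.fuse = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.expand(x)
        groups = torch.chunk(y, 4, dim=1)  # split along the channel direction
        y = torch.cat([b(t) for b, t in zip(self.branches, groups)], dim=1)
        return x + self.fuse(y)  # residual connection
```

A backbone in the spirit of steps S2.1 to S2.3 could stack several such blocks per stage with downsampling between stages; the block counts and strides are not fixed by the text.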
Step S3: performing feature fusion using the mutual embedding upsampling module;
Specifically, the embodiment of the present application applies the mutual embedding upsampling module to the high-stage feature map: a channel attention model yields a first attention coefficient for each channel, and the first attention coefficients are multiplied by the low-stage features to obtain a first fused feature fused by the channel attention model;
On the low-stage feature map, a spatial attention model yields a second attention coefficient for each point of the feature map, and the second attention coefficients are multiplied by the upsampled high-stage feature map to obtain a second fused feature fused by the spatial attention model;
The first fused feature and the second fused feature are added to obtain the final fused feature.
The channel attention model and the spatial attention model are common techniques in the field; they focus on mechanisms for attending to local information, such as a particular region of an image. The attended region tends to change as the task changes, and the details are not repeated here.
The present application performs feature fusion with the mutual embedding upsampling module, so that when high-level and low-level features are fused, the texture information of the high-level features and the detail information of the low-level features are fully exploited.
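Below is a minimal PyTorch sketch of the mutual embedding upsampling module. The text does not fix the internals of the attention models, so the squeeze-and-excitation-style channel attention, the 1×1-convolution spatial attention, and the 1×1 projection of the high-stage map are assumptions; only the overall wiring follows the description: channel attention derived from the high-stage map gates the low-stage features, spatial attention derived from the low-stage map gates the upsampled high-stage features, and the two products are added.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualEmbeddingUpsample(nn.Module):
    # hypothetical module name; high_ch/low_ch are the stage channel counts
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        # assumed channel attention: global pooling + 1x1 conv + sigmoid
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(high_ch, low_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        # assumed spatial attention: a single 1x1 conv + sigmoid
        self.spatial_att = nn.Sequential(
            nn.Conv2d(low_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # project the high-stage map to the low-stage channel count
        self.proj = nn.Conv2d(high_ch, low_ch, kernel_size=1)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # first fused feature: per-channel coefficients from the high-stage
        # map, multiplied onto the low-stage features
        f1 = self.channel_att(high) * low
        # second fused feature: per-point coefficients from the low-stage
        # map, multiplied onto the upsampled high-stage features
        up = F.interpolate(self.proj(high), size=low.shape[2:],
                           mode="bilinear", align_corners=False)
        f2 = self.spatial_att(low) * up
        return f1 + f2  # final fused feature
```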
Step S4: predicting the face confidence and the width and height of the face using the face detection model network. Specifically, this includes the following steps:
Step S4.1: convolving the features fused in step S3.2 with a 3×3 convolution kernel;
Step S4.2: predicting the face confidence and the width and height of the face with two 1×1 convolution kernels, respectively.
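A minimal PyTorch sketch of this detection head is given below. The intermediate channel count and the sigmoid on the confidence map are assumptions; the text specifies only one 3×3 convolution followed by two 1×1 convolutions, one predicting the face confidence and one predicting the width and height.

```python
import torch
import torch.nn as nn

class FaceDetectionHead(nn.Module):
    # hypothetical class name; mid_ch is an assumed intermediate width
    def __init__(self, in_ch: int, mid_ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.conf = nn.Conv2d(mid_ch, 1, kernel_size=1)  # face confidence P
        self.size = nn.Conv2d(mid_ch, 2, kernel_size=1)  # width/height K

    def forward(self, x: torch.Tensor):
        x = torch.relu(self.conv(x))
        p = torch.sigmoid(self.conf(x))  # normalized Gaussian amplitude
        k = self.size(x)                 # per-pixel face width and height
        return p, k
```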
The annotations are obtained by marking the face regions in the face images with bounding boxes and marking the classification and key points corresponding to the face regions; the key points represent key feature points within the face region. Optionally, a further output can be attached at the end of this method, and the key points of the face are detected by the same method of predicting the position of the face center point.
The image to be detected can be regarded as a two-dimensional coordinate system with the upper-left corner of the image as the origin; a face in the image can then be regarded as a two-dimensional Gaussian distribution. The center of the face is the center point of the Gaussian distribution, its coordinates correspond to the mean of the two-dimensional Gaussian distribution, and the width and height of the face correspond to the variances of the two-dimensional Gaussian distribution.
Another embodiment of the present application further provides the labels and the loss function for the network training process, specifically:
A face whose center point is (x, y) is expressed as:
f = N(x, y, σ1, σ2)
where x and y are the means of the two-dimensional Gaussian distribution N, and σ1 and σ2 are the variances of the two-dimensional Gaussian distribution, corresponding respectively to the width and height of the face. Therefore, the face distribution corresponding to an image containing n faces can be expressed as:
I(x, y) = max_i N(x_i, y_i, σ1_i, σ2_i), i = 1, 2, …, n;
The labels for this image can be expressed as:
[The label definitions for Ω and Ψ are given as equation images PCTCN2021128477-appb-000006 to PCTCN2021128477-appb-000009 in the original filing.]
Ω is the label for predicting the face center point, and Ψ is the label for predicting the face width and height;
The loss function can be expressed as:
[The loss function is given as equation image PCTCN2021128477-appb-000010 in the original filing.]
P and K are the outputs of the network, namely the face confidence (the normalized amplitude of the Gaussian distribution) and the width and height of the face (the variances of the Gaussian distribution), and λ is a loss scale coefficient.
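As a concrete illustration of the ground-truth construction, the following NumPy sketch builds the map I(x, y) = max_i N(x_i, y_i, σ1_i, σ2_i) defined above. Normalizing each Gaussian to a peak value of 1 (matching the "normalized Gaussian amplitude" confidence) is an assumption, and the exact label tensors Ω and Ψ are available only as equation images in the original filing.

```python
import numpy as np

def gaussian_label_map(h, w, faces):
    """faces: iterable of (cx, cy, sigma1, sigma2), one tuple per face."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    label = np.zeros((h, w), dtype=np.float32)
    for cx, cy, s1, s2 in faces:
        # unnormalized 2D Gaussian with peak 1 at the face center
        g = np.exp(-((xs - cx) ** 2 / (2 * s1 ** 2) +
                     (ys - cy) ** 2 / (2 * s2 ** 2)))
        label = np.maximum(label, g)  # max over the n face Gaussians
    return label

# usage: a 160x120 label map with one face centered at (50, 40)
# target = gaussian_label_map(120, 160, [(50, 40, 12.0, 16.0)])
```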
Therefore, this method predicts the face density in the image and detects the faces in the image by predicting Gaussian distributions, avoiding the instability introduced by the use of candidate boxes.
Referring to FIG. 4, the schematic structural diagram of the fast face density prediction and face detection apparatus provided by an embodiment of the present application, an embodiment of the present application provides a face density prediction and face detection apparatus 300, including:
an image acquisition module 310, configured to acquire an image to be detected;
a feature extraction module 320, configured to extract multi-scale features from the image to be detected using feature pyramid residual blocks;
a feature fusion module 330, configured to perform feature fusion using the mutual embedding upsampling module;
a detection result module 340, configured to predict the face confidence and the width and height of the face using the face detection module, to obtain the face detection result.
It should be understood that the apparatus corresponds to the above embodiments of the fast face density prediction and face detection method and can perform the steps involved in those method embodiments. For the specific functions of the apparatus, refer to the description above; the detailed description is omitted here to avoid repetition. The apparatus includes at least one software function module that can be stored in a memory in the form of software or firmware or fixed in the operating system (OS) of the apparatus.
Referring to FIG. 5, the schematic structural diagram of the electronic device provided by an embodiment of the present application, an electronic device 400 provided by an embodiment of the present application includes a processor 410 and a memory 420; the memory 420 stores machine-readable instructions executable by the processor 410, and the above method is performed when the machine-readable instructions are executed by the processor 410.
An embodiment of the present application further provides a storage medium 430 on which a computer program is stored, and the computer program, when run by the processor 410, performs the above method.
The storage medium 430 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implying the number of the indicated technical features. Accordingly, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. "Plurality" means two or more, unless expressly and specifically limited otherwise.
In the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected", "coupled", and "fixed" shall be understood broadly; for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
Any description of a process or method in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit the program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following technologies known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or some of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium; when executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (12)

  1. A fast face density prediction and face detection method, characterized in that it comprises the following steps:
    Step S1: acquiring an image to be detected;
    Step S2: extracting multi-scale features from the image to be detected using feature pyramid residual blocks;
    Step S3: performing feature fusion using a mutual-embedding upsampling module;
    Step S4: predicting the face confidence and the face width and height using a face detection module.
  2. The fast face density prediction and face detection method according to claim 1, characterized in that the step S2 comprises:
    Step S2.1: convolving the image to be detected with a 3×3 convolution kernel, and feeding the convolved image into the feature pyramid residual block to extract features;
    Step S2.2: combining a plurality of the feature pyramid residual blocks into a feature extraction network, and extracting features from the feature map output by step S2.1;
    Step S2.3: combining a plurality of the feature pyramid residual blocks into a feature extraction network, and extracting features from the feature map output by step S2.2.
  3. The fast face density prediction and face detection method according to claim 2, characterized in that the step S3 comprises:
    Step S3.1: using the mutual-embedding upsampling module to fuse the features extracted in step S2.2 with the features extracted in step S2.3;
    Step S3.2: using the mutual-embedding upsampling module to fuse the features fused in step S3.1 with the features extracted in step S2.1.
  4. The fast face density prediction and face detection method according to claim 3, characterized in that the step S4 comprises:
    Step S4.1: convolving the features fused in step S3.2 with one 3×3 convolution kernel;
    Step S4.2: using two 1×1 convolution kernels to predict the face confidence and the face width and height, respectively.
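A minimal sketch of this detection head follows, mapping steps S4.1 and S4.2 onto layers; the input channel count and the sigmoid/ReLU output activations are assumptions, since the claim does not specify them.

```python
import torch
import torch.nn as nn

class DetectionHeadSketch(nn.Module):
    def __init__(self, in_ch=64):  # in_ch is an assumed channel count
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch, 3, padding=1)  # step S4.1: one 3x3 conv
        self.conf = nn.Conv2d(in_ch, 1, 1)                 # step S4.2: 1x1 conv, confidence
        self.size = nn.Conv2d(in_ch, 2, 1)                 # step S4.2: 1x1 conv, width/height

    def forward(self, fused):
        h = torch.relu(self.conv(fused))
        return torch.sigmoid(self.conf(h)), torch.relu(self.size(h))
```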
  5. The fast face density prediction and face detection method according to claim 1, characterized in that the feature pyramid residual block comprises:
    expanding the number of channels of the feature map using a 1×1 convolution operation;
    dividing the feature map evenly into 4 groups along the channel dimension, where group 1 convolves its features with a 3×3 convolution kernel with a dilation of 1, group 2 convolves its features with a 3×3 convolution kernel with a dilation of 2, group 3 convolves its features with a 3×3 convolution kernel with a dilation of 4, and group 4 convolves its features with a 3×3 convolution kernel with a dilation of 8;
    combining the 4 groups of convolved features in order to form a first feature map, and fusing the first feature map with a 1×1 convolution to form a second feature map;
    adding the feature map and the second feature map together.
  6. The fast face density prediction and face detection method according to claim 5, characterized in that it further comprises:
    before the dilated convolution of group 2, adding the group-2 features to the features output by the group-1 convolution;
    before the dilated convolution of group 3, adding the group-3 features to the features output by the group-2 convolution;
    before the dilated convolution of group 4, adding the group-4 features to the features output by the group-3 convolution.
  7. The fast face density prediction and face detection method according to claim 6, characterized in that it further comprises:
    the receptive fields of the dilated convolutions of group 1, group 2, group 3, and group 4 being 3, 5, 9, and 17, respectively.
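By way of illustration, the following PyTorch sketch combines claims 5-7: the 1×1 channel expansion, the four-way channel split with dilated 3×3 convolutions (dilations 1, 2, 4, 8, whose receptive fields are 3, 5, 9, and 17, as claim 7 states), the inter-group additions of claim 6, the 1×1 fusion, and the residual addition. The expansion factor of 2 is an assumption, as is having the final 1×1 fusion restore the input channel count so the residual addition type-checks.

```python
import torch
import torch.nn as nn

class FeaturePyramidResidualBlockSketch(nn.Module):
    def __init__(self, channels=64, expand=2):  # expansion factor is an assumption
        super().__init__()
        mid = channels * expand
        self.expand = nn.Conv2d(channels, mid, 1)   # 1x1 conv expands the channel count
        g = mid // 4                                # four equal groups along channels
        # dilated 3x3 convs with dilations 1, 2, 4, 8 -> receptive fields 3, 5, 9, 17
        self.branches = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=d, dilation=d) for d in (1, 2, 4, 8)
        )
        self.fuse = nn.Conv2d(mid, channels, 1)     # 1x1 conv fuses the recombined groups

    def forward(self, x):
        f = self.expand(x)
        groups = torch.chunk(f, 4, dim=1)
        outs = [self.branches[0](groups[0])]
        for i in (1, 2, 3):
            # claim 6: add the previous group's output before the dilated conv
            outs.append(self.branches[i](groups[i] + outs[i - 1]))
        second = self.fuse(torch.cat(outs, dim=1))  # the "second feature map"
        return x + second                           # residual addition
```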
  8. The fast face density prediction and face detection method according to claim 1, characterized in that the mutual-embedding upsampling module comprises:
    on the high-stage feature map, using a channel attention model to obtain a first attention coefficient for each channel, and multiplying the first attention coefficient by the low-stage features to obtain a first fused feature fused by the channel attention model;
    on the low-stage feature map, using a spatial attention model to obtain a second attention coefficient for each point in the feature map, and multiplying the second attention coefficient by the upsampled high-stage feature map to obtain a second fused feature fused by the spatial attention model;
    adding the first fused feature and the second fused feature to obtain the final fused feature.
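A hedged sketch of this mutual-embedding upsampling module follows. The claim does not fix the form of the two attention models, so an SE-style channel attention and a 1×1-convolution spatial attention are assumed here, as is an equal channel count for the high-stage and low-stage feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualEmbeddingUpsampleSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # channel attention over the high-stage feature map (SE-style, assumed)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        # spatial attention over the low-stage feature map (1x1 conv, assumed)
        self.spatial_att = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, low, high):
        up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                           align_corners=False)
        first = self.channel_att(high) * low  # per-channel coefficients x low-stage features
        second = self.spatial_att(low) * up   # per-point coefficients x upsampled high-stage
        return first + second                 # final fused feature
```

Per claim 3, the same module would be applied twice: first to fuse the stage-2 and stage-3 features, then to fuse that result with the stage-1 features.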
  9. The fast face density prediction and face detection method according to any one of claims 1-8, characterized in that it further comprises performing network training with the following labels and loss function:
    a face with center point (x, y) is represented as:
    f = N(x, y, σ1, σ2)
    where x and y are the means of the two-dimensional Gaussian distribution N, and σ1 and σ2 are the variances of the two-dimensional Gaussian distribution, corresponding to the width and height of the face, respectively; therefore, the face distribution corresponding to an image containing n faces can be expressed as:
    I(x, y) = max(N(x_i, y_i, σ1_i, σ2_i)), i = 1, 2, …, n;
    and the label of the image can be expressed as:
    [Label formulas given in the original only as images PCTCN2021128477-appb-100001 to PCTCN2021128477-appb-100004.]
    where Ω is the label for predicting the face center point and Ψ is the label for predicting the face width and height;
    the loss function can be expressed as:
    [Loss function formula given in the original only as image PCTCN2021128477-appb-100005.]
    where P and K are the outputs of the network, namely the face confidence and the face width and height, respectively, and λ is the loss scale coefficient.
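As a worked illustration of the claim-9 label construction, the sketch below builds the face distribution I(x, y) as the per-pixel maximum of one unnormalized two-dimensional Gaussian per face, using σ1 and σ2 as variances as stated above. The unit peak value at each face center is an assumption, and the Ω and Ψ label formulas themselves, given only as images in the original, are not reproduced here.

```python
import numpy as np

def face_distribution(h, w, faces):
    # faces: list of (x, y, sigma1, sigma2) per the claim, with sigma1 and
    # sigma2 used as the Gaussian variances along the width and height axes
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.zeros((h, w), dtype=np.float32)
    for (x, y, s1, s2) in faces:
        g = np.exp(-((xs - x) ** 2 / (2.0 * s1) + (ys - y) ** 2 / (2.0 * s2)))
        dist = np.maximum(dist, g)  # I(x, y) = max over the n faces
    return dist

# e.g. a 64x64 label map containing two faces of different sizes
label = face_distribution(64, 64, [(20, 20, 16.0, 25.0), (45, 30, 36.0, 49.0)])
```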
  10. A fast face density prediction and face detection apparatus, characterized in that it comprises:
    an image acquisition module, configured to acquire an image to be detected;
    a feature extraction module, configured to extract multi-scale features from the image to be detected using feature pyramid residual blocks;
    a feature fusion module, configured to perform feature fusion using a mutual-embedding upsampling module;
    a detection result module, configured to predict the face confidence and the face width and height using a face detection module.
  11. An electronic device, comprising a memory, a processor, and machine-readable instructions stored in the memory and executable on the processor, characterized in that, when the processor executes the machine-readable instructions, the fast face density prediction and face detection method according to any one of claims 1-9 is implemented.
  12. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the fast face density prediction and face detection method according to any one of claims 1-9 is implemented.