WO2021174940A1 - 人脸检测方法与系统 - Google Patents

人脸检测方法与系统 (Face detection method and system)

Info

Publication number
WO2021174940A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
image
area map
face
matched
Prior art date
Application number
PCT/CN2020/135079
Other languages
English (en)
French (fr)
Inventor
赵娅琳
陆进
陈斌
宋晨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021174940A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the embodiments of the present application relate to the field of face recognition, and in particular, to a method and system for face detection.
  • the existing detection scheme uses the multi-layer feature-map information of a deep network for face detection, and in particular extracts low-level feature-map information to improve the detection accuracy of small faces.
  • however, the effect is not ideal, for three reasons: 1. Because small faces have a small scale, too much of the target feature information is lost after the various down-sampling operations of the low-level convolutional network, leaving only a small amount of information for detection. 2. Artificially preset prediction boxes are robust in complex environments and widely used; however, because the real face, the prediction-box scale, and the receptive field do not match, the detection rate drops sharply as the face scale decreases. 3. Artificially preset prediction boxes need to be carefully designed and paired with a sampling strategy in the detection stage in order to improve the detection rate of small faces.
  • the inventor realized that at present, a soft and hard NMS method can be used to improve the detection rate of small faces.
  • in essence this is a post-processing step: a new module is added in the detection stage of the network to process the face boxes predicted by the network, improving face detection accuracy by means of a dual-threshold NMS.
  • however, the capability of the network itself is not improved much, which means the network essentially pays little extra attention to small faces, so the detection accuracy for small faces remains insufficient.
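  • for reference, the sketch below illustrates one way a dual-threshold ("soft and hard") NMS post-processing step can be implemented in Python/NumPy; the thresholds, the (x1, y1, x2, y2) box format, and the score-decay rule are illustrative assumptions, not the exact scheme of the method referenced above.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2) ndarrays."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dual_threshold_nms(boxes, scores, hard_thr=0.7, soft_thr=0.3, score_floor=0.05):
    """Keep the top-scoring box, hard-suppress strong overlaps, softly decay medium overlaps."""
    keep = []
    order = scores.argsort()[::-1]
    boxes, scores = boxes[order], scores[order]
    while len(boxes) > 0:
        keep.append((boxes[0], scores[0]))
        if len(boxes) == 1:
            break
        overlaps = iou(boxes[0], boxes[1:])
        rest_scores = scores[1:].copy()
        medium = (overlaps >= soft_thr) & (overlaps < hard_thr)
        rest_scores[medium] *= (1.0 - overlaps[medium])                 # soft decay
        survive = (overlaps < hard_thr) & (rest_scores > score_floor)   # hard cut
        boxes, scores = boxes[1:][survive], rest_scores[survive]
        order = scores.argsort()[::-1]
        boxes, scores = boxes[order], scores[order]
    return keep
```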
  • the purpose of the embodiments of the present application is to provide a face detection method and system, which can improve the network, thereby improving the accuracy of small face detection.
  • an embodiment of the present application provides a face detection method, including: acquiring a to-be-processed image of a target user; extracting the head area map, face area map, and body area map of the image to be processed, to obtain the first classification feature map and the first feature map corresponding to the head area map, the face area map, and the body area map, respectively; performing classification regression processing on the first classification feature maps of the head area map, face area map, and body area map, to obtain an image to be matched for the image to be processed; and performing position regression processing on the first classification feature maps of the head area map, face area map, and body area map together with the image to be matched, to obtain the target face in the image to be matched.
  • an embodiment of the present application also provides a face detection system, including:
  • the acquisition module is used to acquire the to-be-processed image of the target user
  • the extraction module is used to extract the head area map, face area map, and body area map of the image to be processed, to obtain the first classification feature map and the first feature map corresponding to the head area map, the face area map, and the body area map, respectively;
  • the classification regression module is configured to perform classification regression processing on the first classification feature map of the head region map, the face region map, and the body region map to obtain the image to be matched of the image to be processed;
  • the position regression module is used to perform position regression processing on the first classification feature maps of the head area map, the face area map, and the body area map together with the image to be matched, to obtain the target face in the image to be matched.
  • an embodiment of the present application also provides a computer device, the computer device including a memory and a processor, the memory storing a face detection system that can run on the processor; when the face detection system is executed by the processor, the face detection method described above is implemented.
  • an embodiment of the present application also provides a computer-readable storage medium storing a computer program that can be executed by at least one processor, so that the at least one processor executes the face detection method described above.
  • this application improves the accuracy of face recognition by reinforcing face detection with the body and head regions.
  • during feature extraction, convolution and pooling operations are used to reduce the loss of facial features and retain as many features as possible for detection and regression.
  • when predicting the image to be matched, however, only the face detection branch is used for classification, so no extra computation is added and the face detection rate is improved.
  • FIG. 1 is a flowchart of Embodiment 1 of the face detection method of the present application.
  • FIG. 2 is a flowchart of step S104 in FIG. 1 according to the first embodiment of the application.
  • FIG. 3 is a flowchart of step S106 in FIG. 1 according to the first embodiment of the application.
  • FIG. 4 is a flowchart of step S106C in FIG. 3 according to the first embodiment of the application.
  • FIG. 5 is a schematic diagram of the program modules of Embodiment 2 of the face detection system of the present application.
  • FIG. 6 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, smart city, blockchain and/or big data technology to realize face detection.
  • the data involved in this application, such as various images, can be stored in a database or in a blockchain, which is not limited by this application.
  • FIG. 1 shows a flowchart of the steps of the face detection method according to the first embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following description exemplarily takes the computer device 2 as the execution subject; the details are as follows.
  • Step S100 Obtain a to-be-processed image of the target user.
  • the to-be-processed image of the target user is acquired through a camera or other photographing software; the to-be-processed image is a full-body image of the target user, including the head, face, and body of the target user.
  • Step S102 Extract the features of the head area map, face area map, and body area map of the image to be processed, to obtain the first classification feature map and the first feature map corresponding to the head area map, the face area map, and the body area map, respectively.
  • the head area, face area, and body area of the image to be processed are intercepted to obtain a head area map, a face area map, and a body area map.
  • the head area map, face area map, and body area map are down-sampled twice, by a first convolution layer and a second pooling layer, to obtain four first feature maps and one first classification feature map for each of the head area map, the face area map, and the body area map; in the convolution layer, the number of convolution kernels is set to be the same, to ensure that the first feature map and the first classification feature map have the same feature extraction accuracy.
  • the first classification feature map is a pixel feature map, which is used to identify images to be matched that are similar to the image to be processed; the first feature map is a key point location feature map, which is used to perform position regression on the image to be processed.
  • step S102 further includes:
  • Step S102A intercepting the head area map, the face area map, and the body area map of the image to be processed.
  • the image to be processed is recognized by a recognition algorithm, and the head area, face area, and body area of the target user are respectively recognized and intercepted.
  • the recognition algorithm may be, for example, OpenCV-based detection or the SIFT algorithm.
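  • as a rough illustration of this interception step, the snippet below uses OpenCV's Haar cascade to locate a face and derives head and body crops from it; the cascade file and the crop ratios are assumptions made for the example, not choices prescribed by the application.

```python
import cv2

def crop_region_maps(image_bgr):
    """Crop face / head / body region maps from a full-body image (BGR ndarray)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    H, W = image_bgr.shape[:2]
    face_map = image_bgr[y:y + h, x:x + w]
    # head: face box enlarged upward and sideways; body: region below the head (heuristic)
    head_map = image_bgr[max(0, y - h // 2):min(H, y + h),
                         max(0, x - w // 4):min(W, x + w + w // 4)]
    body_map = image_bgr[min(H - 1, y + h):H, :]
    return head_map, face_map, body_map
```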
  • Step S102B performing convolution and pooling operations on the image to be processed to obtain a first classification feature map and a first feature map of the head region map, the face region map, and the body region map of the image to be processed, respectively.
  • the convolution and pooling operations extract the image features of the head area map, face area map, and body area map of the image to be processed, and form the first classification feature map and the first feature map of each of the head area map, face area map, and body area map. The convolution operation sharpens the image to be processed and extracts edges to obtain the head area map, face area map, and body area map, and the pooling operation compresses the features of the head area map, face area map, and body area map without changing the features of the image, yielding 4 position-regression feature maps and 1 classification feature map for each of the head area map, face area map, and body area map.
  • the pooling operation compresses the larger position-regression feature maps and the classification feature map to obtain the first classification feature map and the first feature map.
  • on the one hand, this makes the feature maps smaller and simplifies the computational complexity of the network; on the other hand, it extracts the main features of the feature maps, which are invariant to rotation and translation, so that features can still be extracted and matched even if the image is translated as a whole.
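  • a minimal sketch of this extraction step is shown below: one convolution layer plus one pooling layer per region map, producing one classification feature map and four position-regression feature maps; the channel counts, kernel sizes, and input resolution are illustrative assumptions, not values given in the application.

```python
import torch
import torch.nn as nn

class FirstFeatureExtractor(nn.Module):
    """One conv layer + one pooling layer per region map (head / face / body),
    producing 1 first classification feature map and 4 position-regression feature maps."""

    def __init__(self, in_ch: int = 3, mid_ch: int = 16):
        super().__init__()
        # same number of convolution kernels for both branches, so the classification
        # map and the feature maps share the same extraction accuracy
        self.conv = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)               # compresses without changing semantics
        self.cls_map = nn.Conv2d(mid_ch, 1, kernel_size=1)    # first classification feature map
        self.loc_maps = nn.Conv2d(mid_ch, 4, kernel_size=1)   # four position-regression maps

    def forward(self, region: torch.Tensor):
        x = self.pool(torch.relu(self.conv(region)))
        return self.cls_map(x), self.loc_maps(x)

# usage: one extractor applied to each cropped region map
extractor = FirstFeatureExtractor()
head = torch.randn(1, 3, 128, 128)
cls_map, loc_maps = extractor(head)   # shapes: (1, 1, 64, 64) and (1, 4, 64, 64)
```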
  • Step S104 Perform classification regression processing on the first classification feature map of each of the head region map, the face region map, and the body region map to obtain the image to be matched of the image to be processed.
  • the first classification feature map is compared with the face feature map, head feature map, and body feature map of the image to be processed, respectively, to determine whether each feature position in the first classification feature map has changed.
  • classification regression processing is performed on the first classification feature map and the face feature map through a loss function, to determine whether each pixel on the first classification feature map belongs to a pixel of the image to be processed; a classification loss function (L2-loss) can be used to determine whether each pixel on the first classification feature map belongs to a target pixel of the image to be processed.
  • when making this judgment, a threshold can be set to filter out first images to be matched whose confidence with respect to the image to be processed is greater than the preset threshold.
  • illustratively, preliminary coordinate points are obtained from the first classification feature map through the loss function; L2-loss can be used to determine whether each pixel on the first classification feature map belongs to the target coordinates, with the formula: L_cls(y, y*) = ‖y − y*‖₂
  • where L_cls(y, y*) is the loss between y, the predicted confidence that each pixel on the predicted feature map is the target, and y*, the corresponding pixel on the ground-truth feature map; each pixel value on the ground-truth feature map lies in (0, 1).
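  • the loss above can be written out directly; in the sketch below, the confidence threshold used to keep candidate pixels is an assumed value for illustration.

```python
import torch

def classification_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L_cls(y, y*) = ||y - y*||_2 between the predicted first classification
    feature map and the ground-truth map whose pixel values lie in (0, 1)."""
    return torch.norm(pred - target, p=2)

def candidate_pixels(pred: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # pixels whose predicted confidence exceeds the preset threshold are kept
    # as candidates for the first image to be matched (threshold is illustrative)
    return pred > threshold

pred = torch.rand(1, 1, 64, 64)     # predicted confidence per pixel
gt = torch.rand(1, 1, 64, 64)       # ground-truth feature map, values in (0, 1)
loss = classification_loss(pred, gt)
mask = candidate_pixels(pred)
```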
  • for each head area map, face area map, and body area map, the classified images whose confidence with respect to the first feature map is greater than the preset value are filtered out to obtain second images to be matched; these are combined with the first images to be matched obtained from the first classification feature map, and further screening yields the image to be matched.
  • Step S106 Perform position regression processing on the first classification feature map of each of the head area map, the face area map, and the body area map with the image to be matched to obtain a target face in the image to be matched .
  • the first classification feature map, the first feature map, and the image to be matched are subjected to position regression processing through the loss function to obtain the target image in the image to be matched.
  • step S106 further includes:
  • step S106A the first feature map of each of the head region map, the face region map, and the body region map of each of the images to be matched is subjected to feature stitching processing through the concat layer to obtain a second feature map.
  • the concat layer combines the first feature maps of the head area map, face area map, and body area map for feature stitching. During feature sampling the extracted feature channels are consistent, so each feature channel of the first feature maps of the head area map, face area map, and body area map is spliced to obtain the second feature map of the image to be processed used for position classification regression. Since the first feature map consists of four position-regression feature maps of different accuracies, four second feature maps of different accuracies are obtained when the features are stitched. All the features of the first feature maps of the head area map, face area map, and body area map of the processed image are connected, and each feature channel is connected, to obtain the second feature map of the target user's whole body.
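  • the snippet below sketches such concat-layer feature stitching; resizing the three first feature maps to a common spatial size before concatenation is an assumption of this example rather than a step stated in the application.

```python
import torch
import torch.nn.functional as F

def stitch_first_feature_maps(head: torch.Tensor, face: torch.Tensor, body: torch.Tensor) -> torch.Tensor:
    """Concat-layer style feature stitching: align spatial sizes, then join the
    channels of the head / face / body first feature maps into one second feature map."""
    target_size = head.shape[-2:]
    face = F.interpolate(face, size=target_size, mode="bilinear", align_corners=False)
    body = F.interpolate(body, size=target_size, mode="bilinear", align_corners=False)
    return torch.cat([head, face, body], dim=1)   # channel-wise concatenation

# usage: each region contributes a 4-channel position-regression feature map
second = stitch_first_feature_maps(torch.randn(1, 4, 32, 32),
                                   torch.randn(1, 4, 16, 16),
                                   torch.randn(1, 4, 64, 64))
print(second.shape)   # torch.Size([1, 12, 32, 32])
```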
  • Step S106B performing a convolution operation on the second feature map to obtain a third feature map.
  • the first features are separately ROI-pooled and L2-normalized, the resulting features are then merged to obtain the second feature map, which is re-scaled to match the original scale of the features; a 1x1 convolution is then applied to match the number of channels of the original network, yielding the third feature map.
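  • a rough sketch of this fusion step is shown below, using torchvision's roi_align as a stand-in for ROI pooling; the output size, the re-scaling factor, and the channel count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_align

def fuse_roi_features(feature_map: torch.Tensor, boxes: torch.Tensor,
                      out_channels: int = 256) -> torch.Tensor:
    """ROI-pool each region, L2-normalize, re-scale, then 1x1 conv to match channels."""
    # boxes: (N, 5) = (batch_index, x1, y1, x2, y2) in feature-map coordinates
    pooled = roi_align(feature_map, boxes, output_size=(7, 7))
    normalized = F.normalize(pooled, p=2, dim=1)          # per-channel L2 normalization
    rescaled = normalized * feature_map.abs().mean()      # re-scale toward the original feature magnitude (illustrative)
    # in a real network this 1x1 conv would be a trained layer of the model
    to_channels = nn.Conv2d(feature_map.shape[1], out_channels, kernel_size=1)
    return to_channels(rescaled)                          # third feature map

feat = torch.randn(1, 12, 32, 32)
rois = torch.tensor([[0, 4.0, 4.0, 20.0, 20.0]])
third = fuse_roi_features(feat, rois)                     # (1, 256, 7, 7)
```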
  • Step S106C Perform position regression processing on the third feature map by using a regression loss function to obtain the target face in the image to be matched.
  • after processing with the bbox regression loss, the precise head position, face position, and body position of the image to be processed are obtained, so that the positions of the features in the third feature map do not change during feature stitching.
  • the distances between the head, face, and body positions of the image to be processed and the head, face, and body positions in each image to be matched are calculated, and the image to be matched with the smallest distance difference is the target image.
  • step S106C further includes:
  • Step S106C1 Calculate the loss values of the image to be matched and the third feature map by using a regression loss function.
  • the regression loss function may be a loss function that performs regression on the bounding box (bbox); L_loc represents the loss value, and a smaller L_loc indicates a greater similarity and a better match.
  • Step S106C2 If the loss value between the third feature map and the image to be matched is less than a preset threshold, the face in the image to be matched is taken as the target face.
  • specifically, the image to be matched with the smallest loss value with respect to the head area map, face area map, and body area map is selected, and the face in that image to be matched is extracted as the target face.
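  • the exact formula of the regression loss is reproduced only as an image in the published application; the sketch below therefore assumes a simple form based on the quantities described (per-pixel distances to the top-left and bottom-right corners) and shows the threshold-and-minimum selection of the target face.

```python
import torch

def corner_distance_loss(pred_corners: torch.Tensor, true_corners: torch.Tensor) -> torch.Tensor:
    """L_loc sketch: compare predicted per-pixel distances to the top-left (t_x, t_y)
    and bottom-right (d_x, d_y) corners against the actual distances; summing the
    absolute differences over pixels i is an assumed form of the loss."""
    return (pred_corners - true_corners).abs().sum(dim=1).mean()

def pick_target_face(third_feature_preds, candidates_truth, threshold: float = 1.0):
    # keep the candidate whose loss is smallest and below the preset threshold
    losses = [corner_distance_loss(p, t) for p, t in zip(third_feature_preds, candidates_truth)]
    best = int(torch.stack(losses).argmin())
    return best if losses[best] < threshold else None

preds = [torch.rand(1, 4, 16, 16) for _ in range(3)]    # (t_x, t_y, d_x, d_y) per pixel
truths = [torch.rand(1, 4, 16, 16) for _ in range(3)]
print(pick_target_face(preds, truths))
```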
  • FIG. 5 shows a schematic diagram of the program modules of Embodiment 2 of the face detection system of the present application.
  • the face detection system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and implement the face detection method described above.
  • the program module referred to in the embodiments of the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable for describing the execution process of the face detection system 20 in the storage medium than the program itself. The following description will specifically introduce the functions of each program module in this embodiment:
  • the acquiring module 200 is used to acquire the to-be-processed image of the target user.
  • the to-be-processed image of the target user is acquired through a camera or other photographing software; the to-be-processed image is a full-body image of the target user, including the head, face, and body of the target user.
  • the extraction module 202 is used to extract the head area map, face area map, and body area map of the image to be processed, to obtain the first classification feature map and the first feature map corresponding to the head area map, the face area map, and the body area map, respectively. The first classification feature map is a pixel feature map, used to identify images to be matched that are similar to the image to be processed; the first feature map is a key-point location feature map, used to perform position regression on the image to be processed.
  • the head area, face area, and body area of the image to be processed are intercepted to obtain a head area map, a face area map, and a body area map.
  • the head area map, face area map, and body area map are down-sampled twice, by a first convolution layer and a second pooling layer, to obtain four first feature maps and one first classification feature map for each region map; the number of convolution kernels is set to be the same to ensure that the first feature map and the first classification feature map have the same feature extraction accuracy.
  • the extraction module 202 is also used for:
  • the head area map, the face area map, and the body area map of the image to be processed are intercepted.
  • the image to be processed is recognized by a recognition algorithm, and the head area, face area, and body area of the target user are respectively recognized and intercepted.
  • the recognition algorithm may be, for example, OpenCV-based detection or the SIFT algorithm.
  • the convolution and pooling operations extract the image features of the head area map, face area map, and body area map of the image to be processed, and form the first classification feature map and the first feature map of each of the head area map, face area map, and body area map. The convolution operation sharpens the image to be processed and extracts edges to obtain the head area map, face area map, and body area map, and the pooling operation compresses the features of the head area map, face area map, and body area map without changing the features of the image, yielding 4 position-regression feature maps and 1 classification feature map for each of the head area map, face area map, and body area map.
  • the pooling operation compresses the larger position-regression feature maps and the classification feature map to obtain the first classification feature map and the first feature map.
  • on the one hand, this makes the feature maps smaller and simplifies the computational complexity of the network; on the other hand, it extracts the main features of the feature maps, which are invariant to rotation and translation, so that features can still be extracted and matched even if the image is translated as a whole.
  • the classification regression module 204 is configured to perform classification regression processing on the first classification feature map of the head region map, the face region map, and the body region map to obtain the image to be matched of the image to be processed.
  • the first classification feature map is compared with the face feature map, head feature map, and body feature map of the image to be processed, respectively, to determine whether each feature position in the first classification feature map has changed.
  • classification regression processing is performed on the first classification feature map and the face feature map through a loss function, to determine whether each pixel on the first classification feature map belongs to a pixel of the image to be processed; a classification loss function (L2-loss) can be used to determine whether each pixel on the first classification feature map belongs to a target pixel of the image to be processed.
  • when making this judgment, a threshold can be set to filter out first images to be matched whose confidence with respect to the image to be processed is greater than the preset threshold.
  • illustratively, preliminary coordinate points are obtained from the first feature map through the loss function; L2-loss can be used to determine whether each pixel on the first feature map belongs to the target coordinates, with the formula: L_cls(y, y*) = ‖y − y*‖₂
  • where L_cls(y, y*) is the loss between y, the predicted confidence that each pixel on the predicted feature map is the target, and y*, the corresponding pixel on the ground-truth feature map; each pixel value on the ground-truth feature map lies in (0, 1).
  • for each head area map, face area map, and body area map, the classified images whose confidence with respect to the first feature map is greater than the preset value are filtered out to obtain second images to be matched; these are combined with the first images to be matched obtained from the first classification feature map, and further screening yields the image to be matched.
  • the position regression module 206 is configured to perform position regression processing on the first classification feature maps of each of the head area map, face area map, and body area map together with the image to be matched, to obtain the target face in the image to be matched.
  • the first classification feature map, the first feature map, and the image to be matched are subjected to position regression processing through the loss function to obtain the target image in the image to be matched.
  • the position regression module 206 is also used to:
  • the first feature map of each of the head region map, the face region map, and the body region map of each of the images to be matched is subjected to feature stitching processing through the concat layer to obtain a second feature map.
  • the concat layer integrates the first feature maps of the head region map, the face region map, and the body region map to perform feature stitching processing to obtain the second feature map.
  • during feature sampling the extracted feature channels are consistent, so each feature channel of the first feature maps of the head area map, face area map, and body area map is spliced to obtain the second feature map of the image to be processed used for position classification regression. Since the first feature map consists of four position-regression feature maps of different accuracies, four second feature maps of different accuracies are obtained when the features are stitched.
  • each feature channel is connected to obtain a feature map of the target user's whole body.
  • a convolution operation is performed on the second feature map to obtain a third feature map.
  • the first features are separately ROI-pooled and L2-normalized, the resulting features are then merged to obtain the second feature map, which is re-scaled to match the original scale of the features; a 1x1 convolution is then applied to match the number of channels of the original network, yielding the third feature map.
  • after processing with the bbox regression loss, the precise head position, face position, and body position of the image to be processed are obtained, so that the positions of the features in the third feature map do not change during feature stitching.
  • the distances between the head, face, and body positions of the image to be processed and the head, face, and body positions in each image to be matched are calculated, and the image to be matched with the smallest distance difference is the target image.
  • the position regression module 206 is also used to:
  • the loss value of the image to be matched and the third feature map is calculated by using a regression loss function.
  • the regression loss function may be a loss function that performs regression on the bounding box (bbox); L_loc represents the loss value, and a smaller L_loc indicates a greater similarity and a better match.
  • if the loss value between the third feature map and the image to be matched is less than a preset threshold, the face in the image to be matched is taken as the target face.
  • specifically, the image to be matched with the smallest loss value with respect to the head area map, face area map, and body area map is selected, and the face in that image to be matched is extracted as the target face.
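  • the skeleton below shows one way the four program modules (200-206) could be composed; the constructor arguments and method signatures are placeholders for illustration, not an implementation provided by the application.

```python
class FaceDetectionSystem:
    """Skeleton mirroring program modules 200-206; each module's internals are
    placeholders standing in for the processing described in this embodiment."""

    def __init__(self, acquisition, extraction, classification_regression, position_regression):
        self.acquisition = acquisition                                   # module 200
        self.extraction = extraction                                     # module 202
        self.classification_regression = classification_regression      # module 204
        self.position_regression = position_regression                   # module 206

    def run(self, source):
        image = self.acquisition(source)                  # to-be-processed image
        cls_maps, feat_maps = self.extraction(image)      # first classification / feature maps
        candidates = self.classification_regression(cls_maps)
        return self.position_regression(feat_maps, candidates)   # target face
```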
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 2 at least includes, but is not limited to, a memory and a processor.
  • the memory stores a face detection system that can run on the processor; when the face detection system is executed by the processor, part or all of the steps of the above method can be realized.
  • the computer device may also include a network interface and/or a face detection system.
  • the computer device may include a memory 21, a processor 22, a network interface 23, and a face detection system 20.
  • the memory 21, the processor 22, the network interface 23, and the face detection system 20 can be communicatively connected to each other through a system bus, where:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, the hard disk or memory of the computer device 2.
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card (Flash Card) equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, for example, the program code of the face detection system 20 in the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the face detection system 20, so as to implement the face detection method of the first embodiment.
  • for example, the processor 22 may execute the steps of the face detection method described above.
  • processor 22 may also execute other steps of the method in the foregoing embodiment, which will not be repeated here.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the server 2 and other electronic devices.
  • the network interface 23 is used to connect the server 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the server 2 and the external terminal.
  • the network may be an Intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 6 only shows the computer device 2 with components 20-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the face detection system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete this application.
  • FIG. 5 shows a schematic diagram of program modules for implementing the second embodiment of the face detection system 20.
  • the face detection system 20 can be divided into an acquisition module 200, an extraction module 202, a classification regression module 204, and a position regression module 206.
  • the program module referred to in the present application refers to a series of computer program instruction segments that can complete specific functions, and is more suitable than a program to describe the execution process of the face detection system 20 in the computer device 2.
  • the specific functions of the program modules 200-206 have been described in detail in the second embodiment, and will not be repeated here.
  • this embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, App application stores, etc., on which a computer program is stored; when the program is executed by a processor, the corresponding function is realized.
  • the computer-readable storage medium in this embodiment is used to store the face detection system 20, and when executed by a processor, the face detection method in the first embodiment is implemented.
  • for example, a computer program such as the face detection system 20 may be executed by at least one processor, so that the at least one processor executes the face detection method described above.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face detection method and a face detection system. The method includes: acquiring a to-be-processed image of a target user (S100); extracting a head area map, a face area map, and a body area map of the image to be processed, to obtain a first classification feature map and a first feature map corresponding to the head area map, the face area map, and the body area map (S102); performing classification regression processing on the first classification feature maps of the head area map, face area map, and body area map, to obtain an image to be matched for the image to be processed (S104); and performing position regression processing on the first classification feature maps of the head area map, face area map, and body area map together with the image to be matched, to obtain the target face in the image to be matched (S106). The beneficial effect of this solution is that the network itself can be improved, thereby improving the accuracy of small-face detection.

Description

人脸检测方法与系统
本申请要求于2020年3月3日提交中国专利局、申请号为202010138386.8,发明名称为“人脸检测方法与系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及人脸识别领域,尤其涉及一种人脸检测方法与系统。
背景技术
目前,人脸检测在考勤,注册等领域应用已经十分成熟,精度相当高。但在楼宇及室外等不受控制的场景下,小尺度人脸检测仍然是一项巨大的挑战。发明人发现,现有检测方案是:利用深度网络的多层特征图信息进行人脸检测;特别是抽取低层特征图信息,提高小脸的检测精度。但效果并不理想,原因有三:1.小脸由于尺度较小,在经过低层卷积网络的各种下采样操作后,目标特征信息损失过大,只留下很少一部分信息用于检测;2.人为预设的预测框在复杂环境下的鲁棒性较好,应用广泛;但由于真实人脸,预测框尺度,感受野不匹配,造成检出率随着人脸尺度的减小急剧下降。3.人为预设的预测框需要精细设计,在检测阶段需要配合采样策略,才能提高小脸检出率。发明人意识到,目前可通过一种soft and hard NMS的方法来提高小脸的检出率。本质是一个后处理的过程,即在网络的检测阶段加入了一个新的模块,对网络预测出的人脸框进行处理,以双阈值NMS的方式提高检脸精度。但对网络的能力并没有做太多改进,意味着网络本质上对小脸的关注度没有太多提升,从而导致小脸的检测精度不够。
发明内容
有鉴于此,本申请实施例的目的是提供一种人脸检测方法与系统,能够对网络进行改进,从而提高小脸检测的精确度。
为实现上述目的,本申请实施例提供了一种人脸检测方法,包括:
获取目标用户的待处理图像;
提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待处理图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
为实现上述目的,本申请实施例还提供了一种人脸检测系统,包括:
获取模块,用于获取目标用户的待处理图像;
提取模块,用于提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
分类回归模块,用于将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
位置回归模块,用于将所述头部区域图、脸部区域图与身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
为实现上述目的,本申请实施例还提供了一种计算机设备,所述计算机设备包括存储器、处理器,所述存储器上存储有可在所述处理器上运行的人脸检测系统,所述人脸检测系统被所述处理器执行时实现以下方法:
获取目标用户的待处理图像;
提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理, 以得到所述待处理图像的待匹配图像;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待处理图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
为实现上述目的,本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序可被至少一个处理器所执行,以使所述至少一个处理器执行以下方法:
获取目标用户的待处理图像;
提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待处理图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
本申请实通过将身体及头部区域加强到人脸识别上,增强了人脸识别的准确度。特征提取时,采用了卷积操作与池化操作,减少了脸部的特征损失,保留尽可能多的特征用于检测和回归。但在待匹配图像的预测时,只使用人脸检测的分支进分类,并没有增加额外的计算量,因此提高了人脸的检出率。
附图说明
图1为本申请人脸检测方法实施例一的流程图。
图2为本申请实施例一图1中步骤S104的流程图。
图3为本申请实施例一图1中步骤S106的流程图。
图4为本申请实施例一图1中步骤S106C的流程图。
图5为本申请人脸检测系统实施例二的程序模块示意图。
图6为本申请计算机设备实施例三的硬件结构示意图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的技术方案可应用于人工智能、智慧城市、区块链和/或大数据技术领域,以实现人脸检测。可选的,本申请涉及的数据如各种图像等可存储于数据库中,或者可以存储于区块链中,本申请不做限定。
实施例一
参阅图1,示出了本申请实施例一之人脸检测方法的步骤流程图。可以理解,本方法实施例中的流程图不用于对执行步骤的顺序进行限定。下面以计算机设备2为执行主体进行示例性描述。具体如下。
步骤S100,获取目标用户的待处理图像。
具体地,通过摄像机等拍照软件获取目标用户的待处理图像,待处理图像为目标用户的全身图像,包括目标用户的头部、脸部和身体。
步骤S102,提取所述待处理图像的头部区域图、脸部区域图及身体区域图的特征,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图。
具体地,对待处理图像的头部区域、脸部区域和身体区域进行截取,得到头部区域图、脸部区域图和身体区域图。对头部区域图、脸部区域图及身体区域图进行第一层卷积与第二层池化的两次下采样,分别得到头部区域图、脸部区域图及身体区域图的四张第一特征 图与一张第一特征分类图,经过一层卷积时,卷积核数目设置相同,对以保证第一特征图与第一特征分类图的特征提取精度一致。第一分类特征图为像素特征图,用于识别出待处理图像相似的待匹配图像;第一特征图为关键点位置特征图,用于对待处理图像进行位置回归。
示例性地,参阅图2,步骤S102进一步包括:
步骤S102A,截取所述待处理图像的头部区域图、脸部区域图与身体区域图。
具体地,通过识别算法对待处理图像进行识别,分别识别出目标用户的头部区域、脸部区域和身体区域,并进行截取,识别算法可以为:opencv、Sift算法等。
步骤S102B,对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。
具体地,卷积与池化操作提取出待处理图像的头部区域图、脸部区域图与身体区域图的图像特征,形成各个头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。卷积操作对待处理图像进行锐化和边缘提取,得到头部区域图、脸部区域图与身体区域图,并利用池化操作将头部区域图、脸部区域图与身体区域图的特征进行压缩,且不改变图像的特征,得到每个头部区域图、脸部区域图与身体区域图的4张位置回归特征图与1张分类特征图。池化操作对较大的位置回归特征图与分类特征图进行压缩,得到第一分类特征图与第一特征图。一方面使特征图变小简化网络计算的复杂度,另一方面提取特征图的主要特征,具有旋转平移不变性,能够保证图像整体上发生了平移一样能提取特征进行匹配。
步骤S104,将每个所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像。
具体地,将第一分类特征图与待处理图像的脸部特征图、头部特征图以及身体特征图分别进行计算,确定第一分类特征图中个特征位置是否发生变化。通过损失函数对第一分类特征图与脸部特征图进行分类回归处理,判断第一分类特征图上每一个像素点是否属于待处理图像上的像素点,可以使用损失函数进行计算,例如使用分类loss损失函数(L2-loss),以判断第一分类特征图上每一个像素点是否属于待处理图像上的目标像素。在判断时,可以通过设置阈值的方式,以筛选出与待处理图像的置性度大于预设阈值的第一待匹配图像。
示例性地,将所述第一分类特征图通过损失函数得到初步坐标点,可使用L2-loss,判断第一分类特征图上每一个像素点是否属于目标坐标,公式如下:
L cls(y,y *)=‖y-y *2
其中,L cls(y,y *)表示像素y与像素y *分别为预测特征图上每一个像素是否为目标的置信度,ground truth真实特征图上的每一个像素值∈(0,1)。将每个头部区域图、脸部区域图与身体区域图的第一特征图与分类图像的置信度大于预设值的分类图像筛选出来得到第二待匹配图像,与第一分类特征图得到的第一待匹配图像进行结合,进一步筛选得到待匹配图像。
步骤S106,将每个所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
具体地,通过损失函数对第一分类特征图和第一特征图以及待匹配图像进行位置回归处理,得到待匹配图像中的目标图像。
示例性地,参阅图3,步骤S106进一步包括:
步骤S106A,通过concat层将每个所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到第二特征图。
具体地,concat层将头部区域图、脸部区域图与身体区域图的第一特征图综合起来进 行特征拼接处理,在特征采样时,提取的特征通道的是一致的,将头部区域图、脸部区域图与身体区域图的第一特征图每个特征通道进行拼接,得到用于位置分类回归的待处理图像的第二特征图。由于第一特征图有四个不同精度的位置回归特征图,特征拼接时,可得到四张不同精确度的第二特征图。将待处理图像经过上述处理后的头部区域图、脸部区域图与身体区域图的第一征图的特征全部连接起来,得到第二特征图。每个特征通道进行连接,得到目标用户全身的第二特征图。
步骤S106B,对所述第二特征图进行卷积操作,以获得第三特征图。
具体地,将第一特征分别使用ROI-pooled和L2正则化,然后将这些结果的特征合并,得到第二特征图,并重新定标,以匹配特征的原始比例。然后应用1x1卷积以匹配原始网络的通道数量,得到第三特征图。
步骤S106C,通过回归损失函数对所述第三特征图进行位置回归处理,以获取所述待匹配图像中的目标人脸。
具体地,通过bbox回归loss处理后,得到待处理图像的精确的头部位置、脸部位置与身体位置,以使第三特征图的特征的位置在特征拼接时不发生变化。计算待处理图像的头部位置、脸部位置与身体位置与待匹配像中的头部位置、脸部位置与身体位置之间的距离,得到距离差异值最小的待匹配图像即为目标图像。
示例性地,参阅图4,步骤S106C进一步包括:
步骤S106C1,通过回归损失函数计算所述待匹配图像及所述第三特征图的损失值。
具体地,回归损失函数可以为loss函数,利用bbox进行回归。
示例性地,所述回归损失函数的计算公式为:
Figure PCTCN2020135079-appb-000001
其中,
Figure PCTCN2020135079-appb-000002
代表所述头部区域图、脸部区域图与身体区域图的第三特征图的像素点到所述待匹配图像的左上角(t x,t y)与右下角(d x,d y)的距离;
Figure PCTCN2020135079-appb-000003
表示所述待匹配图像的头部区域图、脸部区域图与身体区域图的像素点到所述待匹配图像的左上角与右下角的实际距离,i表示像素点;L loc表示损失值。
具体地,当L loc损失值越小时,表示两者的相似度越大,越匹配。
步骤S106C2,若所述第三特征图与所述待匹配图像的损失值小于预设阈值,则将所述待匹配图像的人脸为目标人脸。
具体地,筛选出待匹配图像分别与头部区域图、脸部区域图与身体区域图的损失值最小的待匹配图像,提取所述待匹配图像的人脸作为目标人脸。
实施例二
请继续参阅图5,示出了本申请人脸检测系统实施例二的程序模块示意图。在本实施例中,人脸检测系统20可以包括或被分割成一个或多个程序模块,一个或者多个程序模块被存储于存储介质中,并由一个或多个处理器所执行,以完成本申请,并可实现上述人脸检测方法。本申请实施例所称的程序模块是指能够完成特定功能的一系列计算机程序指令段,比程序本身更适合于描述人脸检测系统20在存储介质中的执行过程。以下描述将具体介绍本实施例各程序模块的功能:
获取模块200,用于获取目标用户的待处理图像。
具体地,通过摄像机等拍照软件获取目标用户的待处理图像,待处理图像为目标用户的全身图像,包括目标用户的头部、脸部和身体。
提取模块202,用于提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图。第一分类特征图为像素特征图,用于识别出待处理图像相似的待匹配图像;第一特征图为关键点位置特征图,用于对待处理图像进行位置回归。
具体地,对待处理图像的头部区域、脸部区域和身体区域进行截取,得到头部区域图、脸部区域图和身体区域图。对头部区域图、脸部区域图及身体区域图进行第一层卷积与第二层池化的两次下采样,分别得到头部区域图、脸部区域图及身体区域图的四张第一特征图与一张第一特征分类图,经过一层卷积时,卷积核数目设置相同,对以保证第一特征图与第一特征分类图的特征提取精度一致。
示例性地,提取模块202还用于:
截取所述待处理图像的头部区域图、脸部区域图与身体区域图。
具体地,通过识别算法对待处理图像进行识别,分别识别出目标用户的头部区域、脸部区域和身体区域,并进行截取,识别算法可以为:opencv、Sift算法等。
对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。
具体地,卷积与池化操作提取出待处理图像的头部区域图、脸部区域图与身体区域图的图像特征,形成各个头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。卷积操作对待处理图像进行锐化和边缘提取,得到头部区域图、脸部区域图与身体区域图,并利用池化操作将头部区域图、脸部区域图与身体区域图的特征进行压缩,且不改变图像的特征,得到每个头部区域图、脸部区域图与身体区域图的4张位置回归特征图与1张分类特征图。池化操作对较大的位置回归特征图与分类特征图进行压缩,得到第一分类特征图与第一特征图。一方面使特征图变小简化网络计算的复杂度,另一方面提取特征图的主要特征,具有旋转平移不变性,能够保证图像整体上发生了平移一样能提取特征进行匹配。
分类回归模块204,用于将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像。
具体地,将第一分类特征图与待处理图像的脸部特征图、头部特征图以及身体特征图分别进行计算,确定第一分类特征图中个特征位置是否发生变化。通过损失函数对第一分类特征图与脸部特征图进行分类回归处理,判断第一分类特征图上每一个像素点是否属于待处理图像上的像素点,可以使用损失函数进行计算,例如使用分类loss损失函数(L2-loss),以判断第一分类特征图上每一个像素点是否属于待处理图像上的目标像素。在判断时,可以通过设置阈值的方式,以筛选出与待处理图像的置性度大于预设阈值的第一待匹配图像。
示例性地,将所述第一特征图通过损失函数得到初步坐标点,可使用L2-loss,判断第一特征图上每一个像素点是否属于目标坐标,公式如下:
L cls(y,y *)=‖y-y *2
其中,L cls(y,y *)表示像素y与像素y *分别与预测特征图上每一个像素是否为目标的置信度,ground truth真实特征图上的每一个像素值∈(0,1)。将每个头部区域图、脸部区域图与身体区域图的第一特征图与分类图像的置信度大于预设值的分类图像筛选出来得到第二待匹配图像,与第一分类特征图得到的第一待匹配图像进行结合,进一步筛选得到待匹配图像。
位置回归模块206,用于将每个所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
具体地,通过损失函数对第一分类特征图和第一特征图以及待匹配图像进行位置回归 处理,得到待匹配图像中的目标图像。
示例性地,位置回归模块206还用于:
通过concat层将每个所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到第二特征图。
具体地,concat层将头部区域图、脸部区域图与身体区域图的第一特征图综合起来进行特征拼接处理,得到第二特征图。在特征采样时,提取的特征通道的是一致的,将头部区域图、脸部区域图与身体区域图的第一特征图每个特征通道进行拼接,得到用于位置分类回归的待处理图像的第二特征图。由于第一特征图有四个不同精度的位置回归特征图,特征拼接时,可得到四张不同精确度的第二特征图。每个特征通道进行连接,得到目标用户全身的特征图。
对所述第二特征图进行卷积操作,以获得第三特征图。
具体地,将第一特征分别使用ROI-pooled和L2正则化,然后将这些结果的特征合并,得到第二特征图,并重新定标,以匹配特征的原始比例。然后应用1x1卷积以匹配原始网络的通道数量,得到第三特征图。
通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以获取所述待匹配图像中的目标人脸。
具体地,通过bbox回归loss处理后,得到待处理图像的精确的头部位置、脸部位置与身体位置,以使第三特征图的特征的位置在特征拼接时不发生变化。计算待处理图像的头部位置、脸部位置与身体位置与待匹配像中的头部位置、脸部位置与身体位置之间的距离,得到距离差异值最小的待匹配图像即为目标图像。
示例性地,位置回归模块206还用于:
通过回归损失函数计算所述待匹配图像及所述第三特征图的损失值。
具体地,回归损失函数可以为loss函数,利用bbox进行回归。
示例性地,所述回归损失函数的计算公式为:
Figure PCTCN2020135079-appb-000004
其中,
Figure PCTCN2020135079-appb-000005
代表所述头部区域图、脸部区域图与身体区域图的第三特征图的像素点到所述待匹配图像的左上角(t x,t y)与右下角(d x,d y)的距离;
Figure PCTCN2020135079-appb-000006
表示所述待匹配图像的头部区域图、脸部区域图与身体区域图的像素点到所述待匹配图像的左上角与右下角的实际距离,i表示像素点;L loc表示损失值。
具体地,当L loc损失值越小时,表示两者的相似度越大,越匹配。
若所述第三特征图与所述待匹配图像的损失值小于预设阈值,则将所述待匹配图像的人脸为目标人脸。
具体地,筛选出待匹配图像分别与头部区域图、脸部区域图与身体区域图的损失值最小的待匹配图像,提取所述待匹配图像的人脸作为目标人脸。
实施例三
参阅图6,是本申请实施例三之计算机设备的硬件架构示意图。本实施例中,所述计算机设备2是一种能够按照事先设定或者存储的指令,自动进行数值计算和/或信息处理的设备。该计算机设备2可以是机架式服务器、刀片式服务器、塔式服务器或机柜式服务器(包括独立的服务器,或者多个服务器所组成的服务器集群)等。如图6所示,所述计算机设备2至少包括,但不限于,存储器和处理器,存储器上存储有可在处理器上运行的人 脸检测系统,所述人脸检测系统被处理器执行时可实现上述方法中的部分或全部步骤。可选的,该计算机设备还可包括网络接口和/或人脸检测系统。例如,该计算机设备可包括存储器21、处理器22、网络接口23以及人脸检测系统20,如可通过系统总线相互通信连接存储器21、处理器22、网络接口23、以及人脸检测系统20。其中:
本实施例中,存储器21至少包括一种类型的计算机可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,存储器21可以是计算机设备2的内部存储单元,例如该计算机设备2的硬盘或内存。在另一些实施例中,存储器21也可以是计算机设备2的外部存储设备,例如该计算机设备2上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,存储器21还可以既包括计算机设备2的内部存储单元也包括其外部存储设备。本实施例中,存储器21通常用于存储安装于计算机设备2的操作系统和各类应用软件,例如实施例二的人脸检测系统20的程序代码等。此外,存储器21还可以用于暂时地存储已经输出或者将要输出的各类数据。
处理器22在一些实施例中可以是中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器22通常用于控制计算机设备2的总体操作。本实施例中,处理器22用于运行存储器21中存储的程序代码或者处理数据,例如运行人脸检测系统20,以实现实施例一的人脸检测方法。示例的,处理器22可执行以下方法:
获取目标用户的待处理图像;
提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
可选的,处理器22还可执行上述实施例中方法的其他步骤,这里不再赘述。
所述网络接口23可包括无线网络接口或有线网络接口,该网络接口23通常用于在所述服务器2与其他电子装置之间建立通信连接。例如,所述网络接口23用于通过网络将所述服务器2与外部终端相连,在所述服务器2与外部终端之间的建立数据传输通道和通信连接等。所述网络可以是企业内部网(Intranet)、互联网(Internet)、全球移动通讯系统(Global System of Mobile communication,GSM)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、4G网络、5G网络、蓝牙(Bluetooth)、Wi-Fi等无线或有线网络。
需要指出的是,图X仅示出了具有部件20-23的计算机设备2,但是应理解的是,并不要求实施所有示出的部件,可以替代的实施更多或者更少的部件。
在本实施例中,存储于存储器21中的所述人脸检测系统20还可以被分割为一个或者多个程序模块,所述一个或者多个程序模块被存储于存储器21中,并由一个或多个处理器(本实施例为处理器22)所执行,以完成本申请。
例如,图5示出了所述实现人脸检测系统20实施例二的程序模块示意图,该实施例中,所述人脸检测系统20可以被划分为获取模块200、提取模块202、分类回归模块204及位置回归模块206。其中,本申请所称的程序模块是指能够完成特定功能的一系列计算机程序指令段,比程序更适合于描述所述人脸检测系统20在所述计算机设备2中的执行过程。所述程序模块200-206的具体功能在实施例二中已有详细描述,在此不再赘述。
实施例四
本实施例还提供一种计算机可读存储介质,如闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘、服务器、App应用商城等等,其上存储有计算机程序,程序被处理器执行时实现相应功能。本实施例的计算机可读存储介质用于存储人脸检测系统20,被处理器执行时实现实施例一的人脸检测方法。示例的,计算机程序如人脸检测系统20可被至少一个处理器所执行,以使所述至少一个处理器执行以下方法:
获取目标用户的待处理图像;
提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸
可选的,该计算机程序被处理器执行时还可实现上述实施例中方法的其他步骤,这里不再赘述。进一步可选的,本申请涉及的存储介质如计算机可读存储介质可以是非易失性的,也可以是易失性的。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (20)

  1. 一种人脸检测方法,其中,包括:
    获取目标用户的待处理图像;
    提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
  2. 根据权利要求1所述的人脸检测方法,其中,提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图包括:
    截取所述待处理图像的头部区域图、脸部区域图与身体区域图;
    对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。
  3. 根据权利要求1所述的人脸检测方法,其中,将所述头部区域图、脸部区域图与身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸包括:
    通过concat层将所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到第二特征图;
    对所述第二特征图进行卷积操作,以获得第三特征图;
    通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸。
  4. 根据权利要求3所述的人脸检测方法,其中,通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸包括:
    通过回归损失函数计算所述待匹配图像及所述第三特征图的损失值;
    若所述第三特征图与所述待匹配图像的损失值小于预设阈值,则将所述待匹配图像的人脸为目标人脸。
  5. 根据权利要求4所述的人脸检测方法,其中,所述回归损失函数的计算公式为:
    Figure PCTCN2020135079-appb-100001
    其中,
    Figure PCTCN2020135079-appb-100002
    代表所述头部区域图、脸部区域图与身体区域图的第三特征图的像素点到所述待匹配图像的左上角(t x,ty)与右下角(d x,d y)的距离;
    Figure PCTCN2020135079-appb-100003
    表示所述待匹配图像的头部区域图、脸部区域图与身体区域图的像素点到所述待匹配图像的左上角与右下角的实际距离,i表示像素点;L loc表示损失值。
  6. 根据权利要求2所述的人脸检测方法,其中,对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图包括:
    通过卷积操作对待处理图像进行锐化和边缘提取,得到头部区域图、脸部区域图与身体区域图,并利用池化操作将头部区域图、脸部区域图与身体区域图的特征进行压缩,且不改变图像的特征,得到每个头部区域图、脸部区域图与身体区域图的四张第一特征图与 一张第一特征分类图。
  7. 一种人脸检测系统,其中,包括:
    获取模块,用于获取目标用户的待处理图像;
    提取模块,用于提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
    分类回归模块,用于将所述头部区域图、脸部区域图与身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
    位置回归模块,用于将所述头部区域图、脸部区域图与身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标图像。
  8. 根据权利要求7所述的人脸检测系统,其中,所述位置回归模块还用于:
    通过concat层将每个所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到每个第二特征图;
    对所述第二特征图进行卷积操作,以获得第三特征图;
    通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以获取所述待匹配图像中的目标人脸。
  9. 一种计算机设备,其中,所述计算机设备包括存储器、处理器,所述存储器上存储有可在所述处理器上运行的人脸检测系统,所述人脸检测系统被所述处理器执行时实现以下方法:
    获取目标用户的待处理图像;
    提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
  10. 根据权利要求9所述的计算机设备,其中,提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图时,具体实现:
    截取所述待处理图像的头部区域图、脸部区域图与身体区域图;
    对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。
  11. 根据权利要求9所述的计算机设备,其中,将所述头部区域图、脸部区域图与身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸时,具体实现:
    通过concat层将所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到第二特征图;
    对所述第二特征图进行卷积操作,以获得第三特征图;
    通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸。
  12. 根据权利要求11所述的计算机设备,其中,通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸时,具体实现:
    通过回归损失函数计算所述待匹配图像及所述第三特征图的损失值;
    若所述第三特征图与所述待匹配图像的损失值小于预设阈值,则将所述待匹配图像的人脸为目标人脸。
  13. 根据权利要求12所述的计算机设备,其中,所述回归损失函数的计算公式为:
    Figure PCTCN2020135079-appb-100004
    其中,
    Figure PCTCN2020135079-appb-100005
    代表所述头部区域图、脸部区域图与身体区域图的第三特征图的像素点到所述待匹配图像的左上角(t x,ty)与右下角(d x,d y)的距离;
    Figure PCTCN2020135079-appb-100006
    表示所述待匹配图像的头部区域图、脸部区域图与身体区域图的像素点到所述待匹配图像的左上角与右下角的实际距离,i表示像素点;L loc表示损失值。
  14. 根据权利要求10所述的计算机设备,其中,对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图时,具体实现:
    通过卷积操作对待处理图像进行锐化和边缘提取,得到头部区域图、脸部区域图与身体区域图,并利用池化操作将头部区域图、脸部区域图与身体区域图的特征进行压缩,且不改变图像的特征,得到每个头部区域图、脸部区域图与身体区域图的四张第一特征图与一张第一特征分类图。
  15. 一种计算机可读存储介质,其中,所述计算机可读存储介质内存储有计算机程序,所述计算机程序可被至少一个处理器所执行,以使所述至少一个处理器执行以下方法:
    获取目标用户的待处理图像;
    提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图进行分类回归处理,以得到所述待处理图像的待匹配图像;
    将所述头部区域图、脸部区域图及身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸。
  16. 根据权利要求15所述的计算机可读存储介质,其中,提取所述待处理图像的头部区域图、脸部区域图及身体区域图,以分别得到所述头部区域图、脸部区域图及身体区域图对应的第一分类特征图与第一特征图时,具体执行:
    截取所述待处理图像的头部区域图、脸部区域图与身体区域图;
    对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图。
  17. 根据权利要求15所述的计算机可读存储介质,其中,将所述头部区域图、脸部区域图与身体区域图的第一分类特征图与所述待匹配图像进行位置回归处理,以获取所述待匹配图像中的目标人脸时,具体执行:
    通过concat层将所述待匹配图像头部区域图、脸部区域图与身体区域图的第一特征图进行特征拼接处理,得到第二特征图;
    对所述第二特征图进行卷积操作,以获得第三特征图;
    通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸。
  18. 根据权利要求17所述的计算机可读存储介质,其中,通过回归损失函数对所述待匹配图像及所述第三特征图进行位置回归处理,以得到所述待匹配图像中的目标人脸时,具体执行:
    通过回归损失函数计算所述待匹配图像及所述第三特征图的损失值;
    若所述第三特征图与所述待匹配图像的损失值小于预设阈值,则将所述待匹配图像的人脸为目标人脸。
  19. 根据权利要求18所述的计算机可读存储介质,其中,所述回归损失函数的计算公式为:
    Figure PCTCN2020135079-appb-100007
    其中,
    Figure PCTCN2020135079-appb-100008
    代表所述头部区域图、脸部区域图与身体区域图的第三特征图的像素点到所述待匹配图像的左上角(t x,ty)与右下角(d x,d y)的距离;
    Figure PCTCN2020135079-appb-100009
    表示所述待匹配图像的头部区域图、脸部区域图与身体区域图的像素点到所述待匹配图像的左上角与右下角的实际距离,i表示像素点;L loc表示损失值。
  20. 根据权利要求16所述的计算机可读存储介质,其中,对所述待处理图像进行卷积与池化操作,分别得到所述待处理图像的头部区域图、脸部区域图与身体区域图的第一分类特征图与第一特征图时,具体执行:
    通过卷积操作对待处理图像进行锐化和边缘提取,得到头部区域图、脸部区域图与身体区域图,并利用池化操作将头部区域图、脸部区域图与身体区域图的特征进行压缩,且不改变图像的特征,得到每个头部区域图、脸部区域图与身体区域图的四张第一特征图与一张第一特征分类图。
PCT/CN2020/135079 2020-03-03 2020-12-10 人脸检测方法与系统 WO2021174940A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010138386.8A CN111310710A (zh) 2020-03-03 2020-03-03 人脸检测方法与系统
CN202010138386.8 2020-03-03

Publications (1)

Publication Number Publication Date
WO2021174940A1 true WO2021174940A1 (zh) 2021-09-10

Family

ID=71145482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135079 WO2021174940A1 (zh) 2020-03-03 2020-12-10 人脸检测方法与系统

Country Status (2)

Country Link
CN (1) CN111310710A (zh)
WO (1) WO2021174940A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439938A (zh) * 2022-09-09 2022-12-06 湖南智警公共安全技术研究院有限公司 一种防分裂的人脸档案数据归并处理方法及系统

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310710A (zh) * 2020-03-03 2020-06-19 平安科技(深圳)有限公司 人脸检测方法与系统
CN111814612A (zh) * 2020-06-24 2020-10-23 浙江大华技术股份有限公司 目标的脸部检测方法及其相关装置
CN113469041A (zh) * 2021-06-30 2021-10-01 北京市商汤科技开发有限公司 一种图像处理方法、装置、计算机设备和存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644208A (zh) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 人脸检测方法和装置
CN110717424A (zh) * 2019-09-26 2020-01-21 南昌大学 一种基于预处理机制的实时极小人脸检测方法
CN111310710A (zh) * 2020-03-03 2020-06-19 平安科技(深圳)有限公司 人脸检测方法与系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886074B (zh) * 2017-11-13 2020-05-19 苏州科达科技股份有限公司 一种人脸检测方法以及人脸检测系统
CN108416265A (zh) * 2018-01-30 2018-08-17 深圳大学 一种人脸检测方法、装置、设备及存储介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644208A (zh) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 人脸检测方法和装置
CN110717424A (zh) * 2019-09-26 2020-01-21 南昌大学 一种基于预处理机制的实时极小人脸检测方法
CN111310710A (zh) * 2020-03-03 2020-06-19 平安科技(深圳)有限公司 人脸检测方法与系统

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ACTIVEWASTE: "PyramidBox: A Context-assisted Single Shot Face Detector", CSDN BLOG, XP009530118, Retrieved from the Internet <URL:https://blog.csdn.net/qq_41375609/article/details/100528483> *
XU TANG; DANIEL K. DU; ZEQIANG HE; JINGTUO LIU: "PyramidBox: A Context-assisted Single Shot Face Detector", ARXIV.ORG, 17 August 2018 (2018-08-17), XP081091784 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439938A (zh) * 2022-09-09 2022-12-06 湖南智警公共安全技术研究院有限公司 一种防分裂的人脸档案数据归并处理方法及系统
CN115439938B (zh) * 2022-09-09 2023-09-19 湖南智警公共安全技术研究院有限公司 一种防分裂的人脸档案数据归并处理方法及系统

Also Published As

Publication number Publication date
CN111310710A (zh) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2021174940A1 (zh) 人脸检测方法与系统
CN110569721A (zh) 识别模型训练方法、图像识别方法、装置、设备及介质
Kou et al. Gradient domain guided image filtering
Jiang et al. Salient object detection: A discriminative regional feature integration approach
JP7266828B2 (ja) 画像処理方法、装置、デバイスおよびコンピュータプログラム
US9530045B2 (en) Method, system and non-transitory computer storage medium for face detection
CN111583097A (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
CN110084238B (zh) 基于LadderNet网络的指静脉图像分割方法、装置和存储介质
CN111461170A (zh) 车辆图像检测方法、装置、计算机设备及存储介质
CN110489951A (zh) 风险识别的方法、装置、计算机设备和存储介质
CN114037637B (zh) 一种图像数据增强方法、装置、计算机设备和存储介质
CN112102340A (zh) 图像处理方法、装置、电子设备和计算机可读存储介质
CN111415373A (zh) 基于孪生卷积网络的目标跟踪与分割方法、系统及介质
WO2021151319A1 (zh) 卡片边框检测方法、装置、设备及可读存储介质
CN111914775A (zh) 活体检测方法、装置、电子设备及存储介质
CN115329111B (zh) 一种基于点云与影像匹配的影像特征库构建方法及系统
CN113344000A (zh) 证件翻拍识别方法、装置、计算机设备和存储介质
CN107392211B (zh) 基于视觉稀疏认知的显著目标检测方法
WO2022199395A1 (zh) 人脸活体检测方法、终端设备及计算机可读存储介质
CN109785367B (zh) 三维模型追踪中外点滤除方法和装置
CN112712468A (zh) 虹膜图像超分辨率重建方法及计算设备
CN110163910B (zh) 物体对象定位方法、装置、计算机设备和存储介质
CN113228105A (zh) 一种图像处理方法、装置和电子设备
US10706499B2 (en) Image processing using an artificial neural network
WO2022206679A1 (zh) 图像处理方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923216

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923216

Country of ref document: EP

Kind code of ref document: A1