WO2019091271A1 - Face detection method and face detection system - Google Patents

Face detection method and face detection system

Info

Publication number
WO2019091271A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
network
candidate area
image
frame
Prior art date
Application number
PCT/CN2018/110703
Other languages
English (en)
French (fr)
Inventor
晋兆龙
赵波
陈卫东
Original Assignee
苏州科达科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州科达科技股份有限公司
Publication of WO2019091271A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Definitions

  • The present application relates to the field of software development technologies, and in particular to a face detection method and a face detection system.
  • There are currently two main approaches to face detection. One is the traditional approach that combines features with a classifier, such as the widely used VJ (Viola-Jones) face detector; the other is face detection based on a deep learning framework.
  • The traditional feature-plus-classifier approach has two main defects. First, it is sensitive to factors such as size, angle, and image quality, so its detection results are not robust enough. Second, it is not fast enough at detecting small faces in large images.
  • Face detection based on a deep learning framework greatly improves detection quality. Today's advanced methods can detect faces smaller than 20 × 20 pixels while keeping false detections under control, and they are robust to changes in angle and image quality.
  • However, these methods are usually slow, and most of them struggle to meet real-time detection requirements even with graphics processing unit (GPU) acceleration.
  • In view of the defects in the prior art, an object of the present application is to provide a face detection method and a face detection system.
  • A face detection method includes the following steps:
  • S10: establishing a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network, wherein this step includes training the candidate region generation network, the correction network, and the multi-information output network, and then cascading them in sequence;
  • S20: connecting the convolutional neural network framework to a data preparation module, wherein the data preparation module is configured to acquire a source image and to process the source image according to the requirements of the candidate region generation network, the correction network, and the multi-information output network;
  • S30: the data preparation module converts the format of the image to be detected in the source image according to the image input format required by the candidate region generation network and sends it to the candidate region generation network, which is then run to generate a plurality of first face candidate area frames;
  • S40: the data preparation module acquires image data according to the first face candidate area frames from step S30, and the correction network is run to filter the first face candidate area frames, correct the positions of the remaining first face candidate area frames, and perform non-maximum suppression;
  • S50: the data preparation module acquires image data according to the first face candidate area frames from step S40, and the multi-information output network is run to filter the first face candidate area frames, correct the positions of the remaining first face candidate area frames, perform non-maximum suppression, and output the remaining first face candidate area frames together with their corresponding face feature points and face poses.
  • A face detection system includes: a data preparation module configured to acquire a source image and process the source image; and a convolutional neural network framework connected to the data preparation module, wherein the convolutional neural network framework includes at least a candidate region generation network, a correction network, and a multi-information output network that are cascaded in sequence. The candidate region generation network is configured to generate a plurality of first face candidate area frames. The correction network is configured to filter the first face candidate area frames generated by the candidate region generation network, correct the positions of the frames remaining after filtering, and perform non-maximum suppression. The multi-information output network is configured to filter the first face candidate area frames produced by the correction network, correct the positions of the frames remaining after filtering, perform non-maximum suppression, and output the remaining first face candidate area frames together with their corresponding face feature points and face poses.
  • The face detection system further includes a person whole-body recognition network, which is cascaded with the multi-information output network and connected to the data preparation module. The person whole-body recognition network is configured to expand the first face candidate area frames produced by the multi-information output network into person whole-body area frames, filter the person whole-body area frames, and correct and output the frames that remain.
  • The face detection method and the face detection system add a data preparation module and connect it to a convolutional neural network framework formed by cascading three networks. After the candidate region generation network, the correction network, and the multi-information output network in the framework have run in sequence, the face region, the face feature points, and the face pose can be output. While the detection quality is preserved, operations such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module, and the data for every network is supplied uniformly by the data preparation module, which ensures that most of the computation is done by the GPU.
  • The CPU is responsible only for a small amount of logic control and result analysis, and the detection process involves no additional copies from host memory to video memory. GPU acceleration is therefore fully exploited, and detection speed is greatly improved.
  • The output information is rich: not only the face region but also the face feature points, the face pose, and even the whole body of the person, which is very helpful for subsequent analysis.
  • FIG. 1 is a flowchart of a face detection method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of establishing a convolutional neural network framework in a face detection method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the three angles that describe a face pose in a face detection method according to an embodiment of the present application.
  • FIG. 4 is a flowchart of the stage of running the candidate region generation network in a face detection method according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a human face recognition network in a face detection method according to an embodiment of the present application.
  • FIG. 6 shows the detection results of a face detection method according to an embodiment of the present application on FDDB.
  • FIG. 7 is a schematic structural diagram of a face detection system according to an embodiment of the present application.
  • The face detection method of the present application includes steps S10 to S50. S10: establish a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network; this step includes training the three networks and then cascading them in sequence. S20: connect the convolutional neural network framework to a data preparation module, which is configured to acquire a source image and process it according to the requirements of the three networks. S30: the data preparation module converts the format of the image to be detected in the source image according to the image input format required by the candidate region generation network and sends it to that network, which is then run to generate a plurality of first face candidate area frames.
  • S40: the data preparation module acquires image data according to the first face candidate area frames from step S30; the correction network is run, the first face candidate area frames are filtered, and the remaining frames undergo position correction and non-maximum suppression.
  • S50: the data preparation module acquires image data according to the first face candidate area frames from step S40; the multi-information output network is run, the first face candidate area frames are filtered, the remaining frames undergo position correction and non-maximum suppression, and the remaining first face candidate area frames are output together with their corresponding face feature points and face poses.
  • FIG. 1 shows a flowchart of a face detection method according to an embodiment of the present application.
  • The face detection method of the present application is mainly used to detect faces in surveillance images.
  • The face detection method includes the following steps:
  • Step S10: Establish a convolutional neural network framework.
  • The convolutional neural network framework includes at least a candidate region generation network, a correction network, and a multi-information output network.
  • Each of the candidate region generation network, the correction network, and the multi-information output network has no more than 4 convolutional layers, which keeps every network fast.
  • FIG. 2 shows a flowchart of establishing a convolutional neural network framework in a face detection method according to an embodiment of the present application. As shown in FIG. 2, step S10 includes the following steps:
  • Step S101: Train the candidate region generation network, the correction network, and the multi-information output network.
  • The candidate region generation network is configured to generate first face candidate area frames.
  • The correction network is configured to filter the first face candidate area frames obtained by the candidate region generation network, remove most of the non-face areas, and apply a certain amount of position correction.
  • The candidate region generation network and the correction network may be trained, for example, as described in Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, 2016.
  • The multi-information output network is configured to make a further determination and a precise localization based on the face areas output by the correction network, and to output the face feature points and the face pose at the same time.
  • The training data for the multi-information output network is prepared from three public data sets, and the training of the multi-information output network includes the following steps:
  • Face positive and negative samples and face frame regression values are obtained from the WIDER FACE data set using the candidate region generation network and the correction network.
  • Face feature point samples are obtained from the CelebA data set.
  • The face feature points are the left eye, the right eye, the nose tip, the left mouth corner, and the right mouth corner.
  • Face pose samples are obtained from the AFLW data set.
  • The face pose is described by three angles: a rotation angle, a pitch angle, and a yaw angle.
  • FIG. 3 is a schematic diagram of the three angles of a face pose in a face detection method according to an embodiment of the present application.
  • The rotation angle is the angle of rotation in the direction of arrow B.
  • The pitch angle is the angle of rotation in the direction of arrow C.
  • The yaw angle is the angle of rotation in the direction of arrow D.
  • Multi-task joint training is performed on the face positive and negative samples, the face frame regression values, the face feature point samples, and the face pose samples.
  • The classification loss function is defined in terms of p_i, the value predicted by the network, and the label of the sample.
  • Separate loss functions are defined for the box position, the face feature points, and the face pose.
  • The total loss function for joint training combines these per-task losses, where N is the number of samples, each task j has a weight, an indicator marks whether the sample is a human face, and each task contributes its own loss function.
  • The weight of classification training is 1, the weight of frame position training is 0.75, the weight of face feature point training is 0.75, and the weight of face pose training is 0.5.
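  • The formulas themselves are not reproduced in this text. As a hedged reconstruction only, assuming they follow the multi-task formulation of the paper by Zhang et al. cited above, extended with a pose term, they might read as follows. The symbols $\alpha_j$, $\beta_i^j$, $y_i$, $\hat{y}_i$ and the task superscripts are assumptions borrowed from that paper, and the Euclidean form of the pose loss is likewise assumed; only $p_i$, $N$, and the four task weights come from this document.

```latex
% Classification (cross-entropy), with p_i the predicted face probability
L_i^{\mathrm{det}} = -\left( y_i^{\mathrm{det}} \log p_i
    + \left(1 - y_i^{\mathrm{det}}\right) \log\left(1 - p_i\right) \right)

% Box position, face feature points, and face pose (squared Euclidean distance)
L_i^{\mathrm{box}}      = \left\lVert \hat{y}_i^{\mathrm{box}} - y_i^{\mathrm{box}} \right\rVert_2^2
L_i^{\mathrm{landmark}} = \left\lVert \hat{y}_i^{\mathrm{landmark}} - y_i^{\mathrm{landmark}} \right\rVert_2^2
L_i^{\mathrm{pose}}     = \left\lVert \hat{y}_i^{\mathrm{pose}} - y_i^{\mathrm{pose}} \right\rVert_2^2

% Joint objective over N samples and four tasks
\min \sum_{i=1}^{N} \sum_{j \in \{\mathrm{det},\,\mathrm{box},\,\mathrm{landmark},\,\mathrm{pose}\}}
    \alpha_j \, \beta_i^{j} \, L_i^{j},
\qquad \beta_i^{j} \in \{0, 1\}

% Task weights stated in this document
\alpha_{\mathrm{det}} = 1, \quad \alpha_{\mathrm{box}} = 0.75, \quad
\alpha_{\mathrm{landmark}} = 0.75, \quad \alpha_{\mathrm{pose}} = 0.5
```

  • Under this reading, the indicator $\beta_i^j$ switches task $j$ on or off for sample $i$, so that, for example, a non-face sample contributes only to the classification term.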
  • Step S102: Cascade the candidate region generation network, the correction network, and the multi-information output network in sequence.
  • Step S20: Connect the convolutional neural network framework to a data preparation module.
  • The data preparation module is configured to acquire a source image and process it according to the requirements of the candidate region generation network, the correction network, and the multi-information output network. The data preparation module may be a virtual module or a physical electronic device.
  • The data preparation module can consist of pre-allocated video memory and a set of operation functions, including image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression.
  • The data preparation module may be a graphics processing unit (GPU), in which case all of the above operation functions are performed by the GPU acting as the data preparation module.
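  • A minimal sketch of the cropping, scaling, data type conversion, normalization, and channel separation steps listed above. NumPy on the CPU is used purely for illustration, whereas the document performs these operations on the GPU. The function name `prepare_patch`, nearest-neighbour scaling, and the normalization constants 127.5 and 128 are assumptions, not taken from the document.

```python
import numpy as np


def prepare_patch(image, box, out_size):
    """Turn one region of an HxWx3 uint8 image into a 3 x out_size x out_size float32 array.

    `box` is (x1, y1, x2, y2) in integer pixel coordinates and is assumed to
    overlap the image, so the crop is never empty.
    """
    x1, y1, x2, y2 = box
    h, w = image.shape[:2]

    # Image cropping, clamped to the image bounds.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    patch = image[y1:y2, x1:x2]

    # Image scaling by nearest-neighbour sampling (an assumption).
    ph, pw = patch.shape[:2]
    rows = np.arange(out_size) * ph // out_size
    cols = np.arange(out_size) * pw // out_size
    patch = patch[rows][:, cols]

    # Data type conversion and normalization to roughly [-1, 1]
    # (the constants are an assumed convention).
    patch = (patch.astype(np.float32) - 127.5) / 128.0

    # Channel separation: interleaved HWC layout -> planar CHW layout.
    return np.ascontiguousarray(patch.transpose(2, 0, 1))
```

  • For example, `prepare_patch(image, (10, 5, 34, 29), 12)` would yield a 3 × 12 × 12 array of the kind a network with a 12 × 12 input could consume.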
  • The source image is the original image captured by the monitoring device.
  • The source image is loaded directly into the data preparation module, and the data preparation module is connected to each of the trained candidate region generation network, correction network, and multi-information output network.
  • Before running, each network requests the data it needs from the data preparation module.
  • The data preparation module processes the source image according to the input requirements of the network, converts the image into the specified format, and stores the data at a designated location.
  • The network then starts running.
  • The data preparation module can also obtain the results of each network after it has run.
  • Step S30: The data preparation module converts the format of the image to be detected in the source image according to the image input format required by the candidate region generation network and sends it to the candidate region generation network. The candidate region generation network is then run to generate a plurality of first face candidate area frames.
  • FIG. 4 shows a flowchart of the stage of running the candidate region generation network in a face detection method according to an embodiment of the present application. As shown in FIG. 4, this step further includes the following steps:
  • Step S301: The data preparation module constructs an image pyramid from the image to be detected in the source image.
  • The image pyramid consists of multiple layers formed by repeatedly scaling the original image by the same ratio. Taking a three-layer pyramid as an example, the bottom (first) layer is the original image, the second layer is the first layer scaled by a ratio A, and the third layer is the second layer scaled by the same ratio A. Details are not described further here.
  • Each layer of the image pyramid constructed in step S301 is scaled by a ratio of 0.709.
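  • The layer scales of such a pyramid can be sketched as follows. The ratio 0.709 comes from the text; the stopping rule (stop once the shorter side would drop below 12 pixels, the template size used in step S302) and the function name `pyramid_scales` are assumptions.

```python
def pyramid_scales(height, width, factor=0.709, min_side=12):
    """Return the scale of every pyramid layer, starting with 1.0 for the original image.

    Each layer is the previous one scaled by `factor`. Layers whose shorter
    side would be smaller than `min_side` are not generated (assumed rule).
    """
    scales = []
    scale = 1.0
    while min(height, width) * scale >= min_side:
        scales.append(scale)
        scale *= factor
    return scales
```

  • For a 24 × 24 image this gives three layers, with scales 1.0, 0.709, and 0.709² ≈ 0.503.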
  • Step S302: Calculate a score and a position correction value for every pixel region of a first size on the image to be detected.
  • The first size is a template area of 12 × 12 pixels. That is, this step calculates the score and position correction value of every 12 × 12 pixel template area on the input image. The score and position correction value are computed automatically by the candidate region generation network based on the result of its training.
  • Step S303: Take the pixel regions whose score is greater than a preset first threshold as second face candidate area frames, and perform non-maximum suppression.
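  • Thresholding followed by non-maximum suppression, as in step S303, can be sketched as below. This is the standard greedy IoU-based procedure, assumed here because the document does not spell out its variant; the function names and the default IoU threshold of 0.5 are also assumptions.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes, best first."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current best box too strongly.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.intp)


def select_candidates(boxes, scores, score_threshold, iou_threshold=0.5):
    """Keep boxes scoring above `score_threshold`, then suppress overlapping ones."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32)
    mask = scores > score_threshold
    boxes, scores = boxes[mask], scores[mask]
    kept = nms(boxes, scores, iou_threshold)
    return boxes[kept], scores[kept]
```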
  • Step S304: Repeat steps S302 and S303 until all layers of the image pyramid have been processed.
  • Step S305: The data preparation module maps all the second face candidate area frames onto the image to be detected and performs non-maximum suppression, thereby generating a plurality of first face candidate area frames. Specifically, the data preparation module maps all the second face candidate area frames obtained through step S304 back onto the image to be detected according to the scaling ratio of their pyramid layer (the ratio of 0.709 described above) and then performs non-maximum suppression on them, which yields the first face candidate area frames.
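  • The mapping in step S305 can be sketched as follows, assuming each layer's frames are stored as (x1, y1, x2, y2) in that layer's own pixel coordinates together with the layer's scale relative to the original image. The function name `merge_pyramid_candidates` and this data layout are assumptions.

```python
import numpy as np


def merge_pyramid_candidates(per_layer):
    """Map candidate frames from every pyramid layer back to source-image coordinates.

    `per_layer` is a list of (scale, boxes) pairs, where `scale` is the layer's
    size relative to the original image (1.0, 0.709, 0.709**2, ...) and `boxes`
    holds (x1, y1, x2, y2) rows in that layer's coordinates.
    """
    mapped = [
        np.asarray(boxes, dtype=np.float32).reshape(-1, 4) / scale
        for scale, boxes in per_layer
    ]
    if not mapped:
        return np.zeros((0, 4), dtype=np.float32)
    # A final non-maximum suppression pass would then run on the merged frames.
    return np.concatenate(mapped, axis=0)
```

  • Dividing by the layer scale is what lets a fixed 12 × 12 template find larger faces: a 12 × 12 frame found on a layer with scale 0.5 covers 24 × 24 pixels in the original image.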
  • The first face candidate area frames are the face candidate areas finally generated by the candidate region generation network.
  • Step S40: The data preparation module acquires image data according to the first face candidate area frames from step S30, processes the image data according to the input requirements of the correction network, and stores it at a designated location. The correction network is then run, the first face candidate area frames are filtered, and the remaining first face candidate area frames undergo position correction and non-maximum suppression.
  • The filtering is similar to that in step S30. Using the trained correction network, the score of each first face candidate area frame that survived step S30 is calculated, and the frames whose score is lower than a set threshold are deleted.
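  • The filter-then-correct step can be sketched as follows. The document states only that low-scoring frames are deleted and the remaining ones are position-corrected; expressing the correction as four offsets relative to the frame's width and height follows the convention of the paper by Zhang et al. cited earlier and is an assumption, as is the function name `refine_frames`.

```python
import numpy as np


def refine_frames(boxes, scores, offsets, threshold):
    """Delete frames scoring below `threshold` and shift the rest by their offsets.

    `offsets` holds one (dx1, dy1, dx2, dy2) row per frame, expressed as
    fractions of the frame's width (x terms) and height (y terms).
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32)
    offsets = np.asarray(offsets, dtype=np.float32).reshape(-1, 4)

    # Filtering: frames whose score is lower than the threshold are deleted.
    keep = scores >= threshold
    boxes, scores, offsets = boxes[keep], scores[keep], offsets[keep]

    # Position correction: scale the relative offsets by frame width and height.
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    size = np.stack([w, h, w, h], axis=1)
    return boxes + offsets * size, scores
```

  • Non-maximum suppression would then be applied to the corrected frames, as the step describes.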
  • Step S50: The data preparation module acquires image data according to the first face candidate area frames from step S40. The multi-information output network is run, the first face candidate area frames are filtered, the remaining first face candidate area frames undergo position correction and non-maximum suppression, and the remaining first face candidate area frames are output together with their corresponding face feature points and face poses.
  • The filtering is similar to that in steps S30 and S40. Using the trained multi-information output network, the score of each first face candidate area frame that survived step S40 is calculated, and the frames whose score is lower than a set threshold are deleted.
  • The convolutional neural network framework further includes a person whole-body recognition network.
  • The person whole-body recognition network is cascaded with the multi-information output network and connected to the data preparation module.
  • The basic structure of the person whole-body recognition network is the same as that of the multi-information output network described above, except that the final outputs are reduced to two: a person confidence score and regression values for the frame position.
  • Its loss function is computed in the same way as for the multi-information output network described above, except that in joint training the weight of classification training is 1 and the weight of frame position training is 1.
  • The face detection method further includes the following steps:
  • Step S601: Expand the area covered by each first face candidate area frame output in step S50 to obtain a person whole-body area frame. In this step, the first face candidate area frame is enlarged according to empirical values, which gives only a rough person whole-body area frame, so step S602 also needs to be performed.
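  • A sketch of the rough expansion in step S601. The document says only that the frame is enlarged "according to empirical values" and does not give them, so the factors below (three face widths wide, eight face heights tall, starting at the top of the face) are purely hypothetical placeholders, as is the function name `expand_to_body`.

```python
def expand_to_body(face_box, image_w, image_h, width_factor=3.0, height_factor=8.0):
    """Enlarge a face frame (x1, y1, x2, y2) into a rough whole-body frame.

    The default factors are hypothetical, not values from the document.
    The result is clamped to the image bounds.
    """
    x1, y1, x2, y2 = face_box
    w = x2 - x1
    h = y2 - y1
    cx = (x1 + x2) / 2.0

    # Widen symmetrically around the face centre; extend downwards from the face top.
    bx1 = cx - width_factor * w / 2.0
    bx2 = cx + width_factor * w / 2.0
    by1 = float(y1)
    by2 = y1 + height_factor * h

    return (
        max(0.0, bx1),
        max(0.0, by1),
        min(float(image_w), bx2),
        min(float(image_h), by2),
    )
```

  • Because such a frame is only approximate, a subsequent network pass that scores and corrects it, as in step S602, is what makes the output usable.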
  • Step S602: The data preparation module acquires image data according to the person whole-body area frames from step S601, processes the image data according to the input requirements of the person whole-body recognition network, and stores it at a designated location. The person whole-body recognition network is then run, the person whole-body area frames are filtered, and the remaining person whole-body area frames are corrected and output. The filtering is similar to that in steps S30, S40, and S50. Using the trained person whole-body recognition network, the score of each person whole-body area frame is calculated from the results of step S601 stored by the data preparation module, and the frames whose score is lower than a set threshold are deleted. The remaining person whole-body area frames are then corrected and output.
  • The image output after the processing of steps S10 to S602 is shown in FIG. 5.
  • Dashed box A1 is a first face candidate area frame output in step S50.
  • Dashed box A2 is a person whole-body area frame output in step S602.
  • The number 99 above dashed box A1 is the score of the corresponding first face candidate area frame, calculated after step S50 is executed. The values (2.0, -5.5, -22.5) are, respectively, the rotation angle, the pitch angle, and the yaw angle of the face pose in that first face candidate area frame.
  • The number 99 above dashed box A2 is the score of the corresponding person whole-body area frame, calculated after step S602 is executed.
  • The video image is converted into an image pyramid, which serves as the input to the networks in this application.
  • The first network (the candidate region generation network) is responsible for generating the first face candidate area frames.
  • It mainly uses three cascaded convolutional layers to obtain the first candidate regions and their regression vectors. The first face candidate area frames are then filtered and merged according to the regression vectors, which eliminates a large number of non-face areas.
  • The convolution kernels of these convolutional layers are 3x3.
  • The input of the second network (the correction network) is the first face candidate area frames and regression vectors obtained by the first network.
  • On this basis, the second network makes the necessary position adjustments and likewise uses three cascaded convolutional layers to remove further non-face areas.
  • The convolution kernels of its first two convolutional layers are 3x3.
  • The convolution kernel of its third convolutional layer is 2x2.
  • The third network (the multi-information output network) makes a further determination and a precise localization based on the first face candidate area frames output by the second network, using four cascaded convolutional layers and one fully connected layer. It outputs the face feature points and face poses at the same time.
  • The convolution kernels of its first two convolutional layers are 3x3.
  • The convolution kernels of its last two convolutional layers are 2x2.
  • The fourth network generates the corresponding upper-body and whole-body candidate frames of the person on the video image according to the first face candidate area frames determined by the third network, and uses two cascaded convolutional layers to obtain the person's location.
  • The convolution kernels of these convolutional layers are 3x3.
  • The face detection method provided by the embodiment of the present application adds a data preparation module and connects it to a convolutional neural network framework formed by cascading three networks.
  • Operations such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module.
  • The data for every network is supplied uniformly by the data preparation module, which ensures that most of the computation is done by the GPU.
  • The CPU is responsible only for a small amount of logic control and result analysis, and the detection process involves no additional copies from host memory to video memory.
  • GPU acceleration is therefore fully exploited, and detection speed is greatly improved.
  • The output information is rich: not only the face region but also the face feature points, the face pose, and even the whole body of the person, which is very helpful for subsequent analysis.
  • FIG. 7 shows a schematic structural diagram of a face detection system according to an embodiment of the present application.
  • The present application further provides a face detection system for implementing the face detection method described above.
  • The face detection system mainly includes a data preparation module 1 and a convolutional neural network framework.
  • The data preparation module 1 is configured to acquire a source image and process the source image.
  • The convolutional neural network framework is connected to the data preparation module 1.
  • The convolutional neural network framework includes at least a candidate region generation network 21, a correction network 22, and a multi-information output network 23, which are cascaded in sequence.
  • The candidate region generation network 21, the correction network 22, and the multi-information output network 23 are each connected to the data preparation module 1.
  • The candidate region generation network 21 is configured to generate a plurality of first face candidate area frames.
  • The correction network 22 is configured to filter the first face candidate area frames generated by the candidate region generation network 21, correct the positions of the first face candidate area frames remaining after filtering, and perform non-maximum suppression.
  • The multi-information output network 23 is configured to filter the first face candidate area frames produced by the correction network 22, correct the positions of the first face candidate area frames remaining after filtering, perform non-maximum suppression, and output the remaining first face candidate area frames together with their corresponding face feature points and face poses.
  • The face detection system further includes a person whole-body recognition network 24.
  • The person whole-body recognition network 24 is cascaded with the multi-information output network 23 and connected to the data preparation module 1.
  • The person whole-body recognition network 24 is configured to expand the first face candidate area frames produced by the multi-information output network 23 into person whole-body area frames, filter the person whole-body area frames, and correct and output the frames that remain.
  • In summary, the face detection method and the face detection system add a data preparation module and connect it to a convolutional neural network framework formed by cascading three networks. After the candidate region generation network, the correction network, and the multi-information output network in the framework have run in sequence, the face region, the face feature points, and the face pose can be output. While the detection quality is preserved, operations such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module, and the data for every network is supplied uniformly by the data preparation module, which ensures that most of the computation is done by the GPU.
  • The CPU is responsible only for a small amount of logic control and result analysis, and the detection process involves no additional copies from host memory to video memory. GPU acceleration is therefore fully exploited, and detection speed is greatly improved.
  • The output information is rich: not only the face region but also the face feature points, the face pose, and even the whole body of the person, which is very helpful for subsequent analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

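A face detection method and a face detection system. The face detection method includes the following steps: building a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network; connecting the convolutional neural network framework to a data preparation module; running the candidate region generation network to generate a plurality of first face candidate boxes; running the correction network to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, and apply non-maximum suppression; and running the multi-information output network to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them.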

Description

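Face detection method and face detection system
Technical Field
This application relates to the technical field of software development, and in particular to a face detection method and a face detection system.
Background
Face detection currently follows two main directions. One is the traditional approach that combines features with a classifier, such as the widely used VJ (Viola-Jones) face detector; the other is based on deep learning frameworks.
The traditional feature-plus-classifier approach has two main drawbacks. First, it is sensitive to factors such as face size, angle, and image quality, so its detection results are not robust enough. Second, it is not fast enough when detecting small faces in large images. Deep-learning-based face detection improves detection quality considerably: today's advanced methods can detect faces smaller than 20×20 pixels while keeping false detections under control, and they are robust to changes in angle and image quality. However, such methods are usually slow, and most of them struggle to meet real-time requirements even with graphics processing unit (GPU) acceleration.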
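Summary
In view of the defects in the prior art, the purpose of this application is to provide a face detection method and a face detection system.
According to one aspect of this application, a face detection method is provided, which includes the following steps: S10: building a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network, wherein this step includes: training the candidate region generation network, the correction network, and the multi-information output network; and cascading the candidate region generation network, the correction network, and the multi-information output network in sequence; S20: connecting the convolutional neural network framework to a data preparation module, wherein the data preparation module is used to acquire a source image and to process the source image according to the requirements of the candidate region generation network, the correction network, and the multi-information output network; S30: the data preparation module converts the image to be detected in the source image into the image input format required by the candidate region generation network and sends it to that network, after which the candidate region generation network is run to generate a plurality of first face candidate boxes; S40: the data preparation module acquires image data according to the first face candidate boxes from step S30, and the correction network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, and apply non-maximum suppression; S50: the data preparation module acquires image data according to the first face candidate boxes from step S40, and the multi-information output network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them.
According to another aspect of this application, a face detection system is also provided, which includes: a data preparation module for acquiring a source image and processing it; and a convolutional neural network framework connected to the data preparation module, wherein the framework includes at least a candidate region generation network, a correction network, and a multi-information output network cascaded in sequence. The candidate region generation network is used to generate a plurality of first face candidate boxes. The correction network is used to screen the first face candidate boxes generated by the candidate region generation network, correct the positions of the boxes that remain after screening, and apply non-maximum suppression. The multi-information output network is used to screen the first face candidate boxes produced by the correction network, correct the positions of the boxes that remain after screening, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them.
Preferably, the face detection system further includes a full-body person recognition network, which is cascaded with the multi-information output network and connected to the data preparation module. The full-body person recognition network is used to expand the first face candidate boxes produced by the multi-information output network into full-body region boxes, screen those full-body region boxes, and correct and output the boxes that remain.
Compared with the prior art, the face detection method and face detection system provided by the embodiments of this application add a data preparation module and connect it to a convolutional neural network framework formed by cascading three networks. After the candidate region generation network, the correction network, and the multi-information output network in the framework are run in sequence, the face region, facial landmarks, and face pose can be output. While detection quality is maintained, operation functions such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module, and the data for every network is supplied by the data preparation module. This ensures that the vast majority of the computation is done on the GPU, while the CPU handles only a small amount of logic control and result analysis; no additional copies from host memory to GPU memory are needed during detection. GPU acceleration is thus fully exploited and detection speed is greatly increased.
In addition, cascading a full-body person recognition network makes it possible to further recognize the whole body of a person.
The face detection method therefore has at least the following beneficial effects:
a. It is fast. With GPU acceleration (for example, an Nvidia GTX1080), detecting 20×20-pixel faces at a resolution of 1920×1080 (1080p) can exceed 50 FPS, which fully meets the requirements of real-time detection.
b. Face detection and localization are much better than with the traditional feature-plus-classifier approach.
c. The output is rich in information: in addition to the face region, it can include facial landmarks, face pose, and even the whole body of the person, which greatly helps subsequent analysis.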
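Brief Description of the Drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a flowchart of a face detection method according to an embodiment of this application;
FIG. 2 is a flowchart of building the convolutional neural network framework in the face detection method according to an embodiment of this application;
FIG. 3 is a schematic diagram of the three angles that describe face pose in the face detection method according to an embodiment of this application;
FIG. 4 is a flowchart of the stage of running the candidate region generation network in the face detection method according to an embodiment of this application;
FIG. 5 is a flowchart after running the full-body person recognition network in the face detection method according to an embodiment of this application;
FIG. 6 shows the detection results on FDDB of the face detection method according to an embodiment of this application; and
FIG. 7 is a schematic structural diagram of a face detection system according to an embodiment of this application.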
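Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that this application will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, so their repeated description is omitted.
The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of this application. Those skilled in the art will appreciate, however, that the technical solutions of this application may be practiced without one or more of the specific details, or with other methods, components, materials, and so on. In some cases, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring this application.
In addition, the drawings are only schematic illustrations of this disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In accordance with the main concept of this application, a face detection method of this application includes the following steps: S10: building a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network, wherein this step includes: training the candidate region generation network, the correction network, and the multi-information output network; and cascading the candidate region generation network, the correction network, and the multi-information output network in sequence; S20: connecting the convolutional neural network framework to a data preparation module, wherein the data preparation module is used to acquire a source image and to process the source image according to the requirements of the candidate region generation network, the correction network, and the multi-information output network; S30: the data preparation module converts the image to be detected in the source image into the image input format required by the candidate region generation network and sends it to that network, after which the candidate region generation network is run to generate a plurality of first face candidate boxes; S40: the data preparation module acquires image data according to the first face candidate boxes from step S30, and the correction network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, and apply non-maximum suppression; S50: the data preparation module acquires image data according to the first face candidate boxes from step S40, and the multi-information output network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them.
The technical content of this application is further described below with reference to the drawings and embodiments.
Refer to FIG. 1, which shows a flowchart of a face detection method according to an embodiment of this application. Specifically, the face detection method of this application is mainly used to detect faces in surveillance images. As shown in FIG. 1, in this embodiment the face detection method includes the following steps: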
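Step S10: Build a convolutional neural network framework. Specifically, in this embodiment the entire framework is implemented in C++. The framework includes at least a candidate region generation network, a correction network, and a multi-information output network. Each of these networks has no more than four convolutional layers, which keeps every network fast. Refer to FIG. 2, which shows a flowchart of building the convolutional neural network framework in the face detection method according to an embodiment of this application. As shown in FIG. 2, step S10 includes the following steps:
Step S101: Train the candidate region generation network, the correction network, and the multi-information output network. Specifically, the candidate region generation network is responsible for generating first face candidate boxes, and the correction network screens the first face candidate boxes obtained by the candidate region generation network, removes most non-face regions, and applies some position correction. The candidate region generation network and the correction network may be trained as described in Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, 2016.
The multi-information output network makes a further decision and precise localization based on the face regions output by the correction network, and at the same time outputs facial landmarks and face pose. Specifically, the training data for the multi-information output network is prepared from three public datasets, and its training includes the following steps:
Obtain positive and negative face samples and face box regression values from the WIDER FACE dataset. These samples and regression values are obtained from the WIDER FACE dataset using the candidate region generation network and the correction network, respectively.
Obtain facial landmark samples from the CelebA dataset. In this embodiment, the facial landmarks are the left eye, right eye, nose tip, left mouth corner, and right mouth corner.
Obtain face pose samples from the AFLW dataset. In this embodiment, face pose is described by three angles: roll, pitch, and yaw. Refer to FIG. 3, which shows a schematic diagram of the three angles that describe face pose in the face detection method according to an embodiment of this application. As shown in FIG. 3, the roll angle is the angle of rotation along the direction of arrow B, the pitch angle is the angle of rotation along the direction of arrow C, and the yaw angle is the angle of rotation along the direction of arrow D. When all three angles are zero, the face in the image directly faces the surveillance view; a negative value indicates rotation in the direction opposite to the corresponding arrow.
Multi-task joint training is then performed on the positive and negative face samples, the face box regression values, the facial landmark samples, and the face pose samples.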
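The loss function for classification is:
Figure PCTCN2018110703-appb-000001
In the formula above, p_i is the value predicted by the network and
Figure PCTCN2018110703-appb-000002
is the label of the sample.
The loss function for the box position is:
Figure PCTCN2018110703-appb-000003
In the formula above,
Figure PCTCN2018110703-appb-000004
is the value predicted by the network and
Figure PCTCN2018110703-appb-000005
is the annotated value of the sample.
The loss function for the facial landmarks is:
Figure PCTCN2018110703-appb-000006
In the formula above,
Figure PCTCN2018110703-appb-000007
is the value predicted by the network and
Figure PCTCN2018110703-appb-000008
is the annotated value of the sample.
The loss function for the face pose is:
Figure PCTCN2018110703-appb-000009
In the formula above,
Figure PCTCN2018110703-appb-000010
is the value predicted by the network and
Figure PCTCN2018110703-appb-000011
is the annotated value of the sample.
The total loss function for joint training is:
Figure PCTCN2018110703-appb-000012
In the formula above, N is the number of samples, α_j is the weight of a task,
Figure PCTCN2018110703-appb-000013
indicates whether the sample is a face, and
Figure PCTCN2018110703-appb-000014
is the loss function of each task.
In the preferred embodiment of this application, the weight for classification training is 1, the weight for box position training is 0.75, the weight for facial landmark training is 0.75, and the weight for face pose training is 0.5.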
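The weighted sum over tasks can be illustrated with a small sketch. The formula images are not available in this text, so the loss forms below are assumptions: binary cross-entropy for classification and squared Euclidean distance for the three regression tasks, as is common in multi-task cascaded detectors. The per-sample indicator is modelled simply as "the task is present in the sample". Only the four weights (1, 0.75, 0.75, 0.5) come from the description; all names are hypothetical.

```python
import math

# Task weights alpha_j from the preferred embodiment.
TASK_WEIGHTS = {"cls": 1.0, "box": 0.75, "landmark": 0.75, "pose": 0.5}


def cross_entropy(p, y):
    """Binary cross-entropy for one sample (assumed classification loss)."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def squared_l2(pred, target):
    """Squared Euclidean distance (assumed loss for box, landmarks, pose)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))


def joint_loss(samples, weights=TASK_WEIGHTS):
    """Sum over samples and tasks of weight * indicator * task loss.

    Each sample is a dict mapping a task name to (prediction, label).
    A task missing from a sample contributes nothing (indicator = 0).
    """
    total = 0.0
    for sample in samples:
        for task, alpha in weights.items():
            if task not in sample:
                continue
            pred, label = sample[task]
            if task == "cls":
                total += alpha * cross_entropy(pred, label)
            else:
                total += alpha * squared_l2(pred, label)
    return total
```

A non-face sample would carry only the `"cls"` entry, so the box, landmark, and pose terms drop out for it.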
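Step S102: Cascade the candidate region generation network, the correction network, and the multi-information output network in sequence.
Step S20: Connect the convolutional neural network framework to a data preparation module. Specifically, the data preparation module acquires the source image and processes it according to the requirements of the candidate region generation network, the correction network, and the multi-information output network; it may be a virtual module or a physical electronic device. The data preparation module may consist of a block of GPU memory allocated in advance and a set of operation functions, including image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression. In one embodiment of this application, the data preparation module may be a graphics processing unit (GPU), and all of the above operation functions can be performed by the GPU acting as the data preparation module.
The source image is the original image captured by the surveillance device. It is loaded directly into the data preparation module, which is connected to each of the trained candidate region generation network, correction network, and multi-information output network. Before running, each network requests the data it needs from the data preparation module; the module processes the source image according to that network's input requirements, converts the result to the specified format, and stores it at the specified location. Once the data is in place, the network starts running. The data preparation module can also retrieve the results produced by each network.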
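The operation functions named for the data preparation module (cropping, scaling, normalization, channel separation, type conversion) can be sketched on the CPU with plain nested lists. This is only an illustration of the data flow: the described module runs on the GPU, and the normalization constants and nearest-neighbour sampling below are assumptions, not values from the source.

```python
def crop(image, box):
    """Cut box = (x1, y1, x2, y2) out of an H x W x C image (nested lists)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize; a real module would likely interpolate."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def normalize(image, mean=127.5, scale=1 / 128.0):
    """Convert 8-bit integers to zero-centred floats (assumed constants)."""
    return [[[(v - mean) * scale for v in px] for px in row] for row in image]


def split_channels(image):
    """Reorder interleaved H x W x C pixels into C separate H x W planes."""
    channels = len(image[0][0])
    return [[[px[c] for px in row] for row in image] for c in range(channels)]


def prepare_patch(image, box, size):
    """Crop, scale, normalise, and channel-split one candidate box."""
    patch = resize_nearest(crop(image, box), size, size)
    return split_channels(normalize(patch))
```

Each network would call something like `prepare_patch` with its own input size, so the networks never touch the source image directly.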
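Step S30: The data preparation module converts the image to be detected in the source image into the image input format required by the candidate region generation network and sends it to that network; the candidate region generation network is then run to generate a plurality of first face candidate boxes. Refer to FIG. 4, which shows a flowchart of the stage of running the candidate region generation network in the face detection method according to an embodiment of this application. As shown in FIG. 4, this step further includes the following steps:
Step S301: The data preparation module builds an image pyramid from the image to be detected in the source image. An image pyramid consists of several pyramid levels obtained by repeatedly scaling the original image by the same ratio. Taking a three-level pyramid as an example, the bottom level (level 1) is the original image, the level-2 image is the level-1 image scaled by ratio A, and the level-3 image is the level-2 image scaled by ratio A; further details are omitted here. In this embodiment, each level of the image pyramid built in step S301 is obtained by scaling the previous level by a factor of 0.709.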
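As a rough sketch of how the pyramid levels follow from the 0.709 factor, the function below lists the scale of each level. The stopping rule (stop once the shorter side would be smaller than the 12×12 template) is an assumption; the source only states the scaling factor.

```python
def pyramid_scales(width, height, factor=0.709, window=12):
    """Return the scale of each pyramid level, starting at the original image.

    Levels are added until the shorter image side would fall below the
    template window size (assumed stopping rule).
    """
    scales = []
    scale = 1.0
    while min(width, height) * scale >= window:
        scales.append(scale)
        scale *= factor
    return scales
```

Because 0.709 is close to 1/√2, each level has roughly half the pixel count of the level below it.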
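Step S302: Compute the score and position correction value of every pixel region of a first size on the image to be detected. In this embodiment, the first size is a 12×12-pixel template region; that is, this step computes the score and position correction value of every 12×12-pixel template region on the input image. The candidate region generation network computes the scores and position correction values automatically based on its training.
Step S303: Take the pixel regions whose score is greater than a preset first threshold as second face candidate boxes, and apply non-maximum suppression.
Step S304: Repeat steps S302 and S303 until all levels of the image pyramid have been processed.
Step S305: The data preparation module maps all second face candidate boxes onto the image to be detected and applies non-maximum suppression, generating a plurality of first face candidate boxes. Specifically, in this step the data preparation module maps all second face candidate boxes obtained in step S304 into the image to be detected according to the scaling ratio (the 0.709 factor above) and applies non-maximum suppression, thereby generating the first face candidate boxes. These first face candidate boxes are the face candidate regions finally produced by the candidate region generation network.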
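Steps S303 and S305 both rely on non-maximum suppression, and S305 additionally maps boxes from a pyramid level back to the image to be detected. The sketch below shows one common greedy form of NMS and the coordinate mapping. The IoU threshold of 0.5 and the function names are assumptions; the source does not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs: keep the best box of each cluster."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


def to_source_coords(box, scale):
    """Map a box found on a level scaled by `scale` back to the source image."""
    return tuple(v / scale for v in box)
```

A 12×12 window found on a level with scale 0.5, for example, corresponds to a 24×24 region of the image to be detected.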
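Step S40: The data preparation module acquires image data according to the first face candidate boxes from step S30, processes it according to the input requirements of the correction network, and stores it at the specified location. The correction network is then run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, and apply non-maximum suppression. Screening works much as in step S30: using the trained correction network, a score is computed for each first face candidate box that survived step S30, and boxes whose score is below a set threshold are deleted.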
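The screening and position correction in this step can be sketched as follows. The threshold value and the way the correction is parameterised (offsets expressed as fractions of the box width and height) are assumptions made for illustration; the source only says that low-scoring boxes are deleted and the rest are corrected.

```python
def refine(candidates, score_threshold=0.7):
    """Drop low-scoring candidates and shift the rest by regression offsets.

    Each candidate is (box, score, offsets) with box = (x1, y1, x2, y2) and
    offsets = (dx1, dy1, dx2, dy2) as fractions of box width and height.
    """
    refined = []
    for (x1, y1, x2, y2), score, (dx1, dy1, dx2, dy2) in candidates:
        if score < score_threshold:
            continue  # screened out as a likely non-face region
        w, h = x2 - x1, y2 - y1
        box = (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)
        refined.append((box, score))
    return refined
```

Non-maximum suppression would then be applied to the returned boxes, as the step describes.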
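Step S50: The data preparation module acquires image data according to the first face candidate boxes from step S40, and the multi-information output network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them. Screening works much as in steps S30 and S40: using the trained multi-information output network, a score is computed for each first face candidate box that survived step S40, and boxes whose score is below a set threshold are deleted.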
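Steps S30 to S50 share one pattern: the data preparation module supplies input, a network runs, and the surviving boxes feed the next network. A minimal sketch of that control flow is shown below. The function and parameter names are hypothetical, and the networks are passed in as plain callables.

```python
def run_cascade(image, networks, prepare):
    """Run each network in turn on data supplied by the preparation step.

    `networks` is a list of (input_size, run) pairs, where `run` takes the
    prepared patches and the current boxes and returns the surviving boxes.
    `prepare` stands in for the data preparation module.
    """
    boxes = None  # the first network sees the whole image, not candidate boxes
    for input_size, run in networks:
        patches = prepare(image, boxes, input_size)
        boxes = run(patches, boxes)
    return boxes
```

Keeping all image handling inside `prepare` mirrors the design goal stated in the description: the networks only consume data that is already in the required format and location.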
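Further, in this embodiment the convolutional neural network framework also includes a full-body person recognition network, which is cascaded with the multi-information output network and connected to the data preparation module. Its basic structure is the same as that of the multi-information output network, except that it has only two outputs: the confidence that the region contains a person and the regression values of the box position. The loss functions are computed with the same formulas as for the multi-information output network; the difference is that in joint training the weight for classification training is 1 and the weight for box position training is 1.
Accordingly, as shown in FIG. 1, the face detection method further includes the following steps:
Step S601: Expand the region of each first face candidate box output in step S50 to obtain a full-body region box. The full-body region box in this step is a rough box obtained by enlarging the first face candidate box according to empirical values. Step S602 is then performed.
Step S602: The data preparation module acquires image data according to the full-body region boxes from step S601, processes it according to the input requirements of the full-body person recognition network, and stores it at the specified location. The full-body person recognition network is then run to screen the full-body region boxes, and the remaining boxes are corrected and output. Screening works much as in steps S30, S40, and S50: using the trained full-body person recognition network, a score is computed for each full-body region box that the data preparation module stored from the result of step S601, and boxes whose score is below a set threshold are deleted. The remaining full-body region boxes are then corrected and output.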
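The expansion in step S601 could look like the sketch below. The source says only that empirical values are used, so the multipliers here (three face widths wide, seven face heights tall, starting half a face height above the face) are purely illustrative placeholders, not values from the patent.

```python
def expand_to_body(face_box, image_w, image_h, widen=3.0, heighten=7.0):
    """Grow a face box into a rough full-body box, clipped to the image.

    `widen` and `heighten` are illustrative multiples of the face width and
    height; real values would be tuned empirically.
    """
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2.0
    bx1 = max(0.0, cx - widen * w / 2.0)
    bx2 = min(float(image_w), cx + widen * w / 2.0)
    by1 = max(0.0, y1 - 0.5 * h)
    by2 = min(float(image_h), y1 + heighten * h)
    return (bx1, by1, bx2, by2)
```

The rough box only needs to contain the person; the full-body person recognition network in step S602 is what screens and corrects it.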
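The image output after steps S10 to S602 is shown in FIG. 5. In FIG. 5, dashed box A1 is the first face candidate box output by step S50, and dashed box A2 is the full-body region box output by step S602. The number 99 above dashed box A1 is the score computed in step S50 for the corresponding first face candidate box; (2.0, -5.5, -22.5) gives, in order, the roll, pitch, and yaw angles of the face pose within that box. Similarly, the number 99 above dashed box A2 is the score computed in step S602 for the corresponding full-body region box.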
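Taking steps S10 to S602 together, in the preferred embodiment of this application the steps for detecting faces in surveillance images can be summarized as follows:
An image pyramid is built from the video image and used as the input to the networks of this application.
The first network (the candidate region generation network) is responsible for generating first face candidate boxes. It mainly uses three cascaded convolutional layers to obtain the first candidate regions and regression vectors, and then screens and merges the first face candidate boxes according to the regression vectors, which eliminates a large number of non-face regions. All of its convolutional layers use 3x3 kernels.
The input to the second network (the correction network) is the first face candidate boxes and regression vectors obtained by the first network (the candidate region generation network). On this basis the second network makes the necessary position adjustments; similarly, the data is also passed through three cascaded convolutional layers to remove further non-face regions. The first two convolutional layers use 3x3 kernels and the third uses a 2x2 kernel.
The third network (the multi-information output network) makes a further decision and precise localization based on the first face candidate boxes output by the second network (the correction network), using four cascaded convolutional layers and one fully connected layer, and at the same time outputs facial landmarks and face pose. The first two convolutional layers use 3x3 kernels and the last two use 2x2 kernels.
The fourth network takes the first face candidate boxes determined by the third network (the multi-information output network), generates the corresponding upper-body and full-body candidate boxes on the video image, and uses two cascaded convolutional layers to obtain the person's position. All of its convolutional layers use 3x3 kernels.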
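As can be seen from the embodiments shown in FIGS. 1 to 5, the face detection method provided by the embodiments of this application adds a data preparation module and connects it to a convolutional neural network framework formed by cascading three networks. After the candidate region generation network, the correction network, and the multi-information output network in the framework are run in sequence, the face region, facial landmarks, and face pose can be output. While detection quality is maintained, operation functions such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module, and the data for every network is supplied by the data preparation module. This ensures that the vast majority of the computation is done on the GPU, while the CPU handles only a small amount of logic control and result analysis; no additional copies from host memory to GPU memory are needed during detection. GPU acceleration is thus fully exploited and detection speed is greatly increased.
In addition, cascading a full-body person recognition network makes it possible to further recognize the whole body of a person.
The face detection method therefore has at least the following beneficial effects:
a. It is fast. With GPU acceleration (for example, an Nvidia GTX1080), detecting 20×20-pixel faces at a resolution of 1920×1080 (1080p) can exceed 50 FPS, which fully meets the requirements of real-time detection.
b. Face detection and localization are much better than with the traditional feature-plus-classifier approach; the results obtained with this method on FDDB are shown in FIG. 6.
c. The output is rich in information: in addition to the face region, it can include facial landmarks, face pose, and even the whole body of the person, which greatly helps subsequent analysis.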
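Further, refer to FIG. 7, which shows a schematic structural diagram of a face detection system according to an embodiment of this application. Specifically, this application also provides a face detection system for implementing the face detection method described above. As shown in FIG. 7, the face detection system mainly includes a data preparation module 1 and a convolutional neural network framework. The data preparation module 1 is used to acquire a source image and process it. The convolutional neural network framework is connected to the data preparation module 1.
As shown in FIG. 7, the convolutional neural network framework includes at least a candidate region generation network 21, a correction network 22, and a multi-information output network 23 cascaded in sequence, each of which is connected to the data preparation module 1. The candidate region generation network 21 is used to generate a plurality of first face candidate boxes. The correction network 22 is used to screen the first face candidate boxes generated by the candidate region generation network 21, correct the positions of the boxes that remain after screening, and apply non-maximum suppression. The multi-information output network 23 is used to screen the first face candidate boxes produced by the correction network 22, correct the positions of the boxes that remain after screening, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to each of them.
Further, in the embodiment shown in FIG. 7, the face detection system also includes a full-body person recognition network 24. The full-body person recognition network 24 is cascaded with the multi-information output network 23 and connected to the data preparation module 1. It is used to expand the first face candidate boxes produced by the multi-information output network 23 into full-body region boxes, screen those full-body region boxes, and correct and output the boxes that remain.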
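In summary, the face detection method and face detection system provided by the embodiments of this application add a data preparation module and connect it to a convolutional neural network framework formed by cascading three networks. After the candidate region generation network, the correction network, and the multi-information output network in the framework are run in sequence, the face region, facial landmarks, and face pose can be output. While detection quality is maintained, operation functions such as image cropping, image scaling, image normalization, image channel separation, data type conversion, and non-maximum suppression are all performed by the data preparation module, and the data for every network is supplied by the data preparation module. This ensures that the vast majority of the computation is done on the GPU, while the CPU handles only a small amount of logic control and result analysis; no additional copies from host memory to GPU memory are needed during detection. GPU acceleration is thus fully exploited and detection speed is greatly increased.
In addition, cascading a full-body person recognition network makes it possible to further recognize the whole body of a person.
The face detection method therefore has at least the following beneficial effects:
a. It is fast. With GPU acceleration (for example, an Nvidia GTX1080), detecting 20×20-pixel faces at a resolution of 1920×1080 (1080p) can exceed 50 FPS, which fully meets the requirements of real-time detection.
b. Face detection and localization are much better than with the traditional feature-plus-classifier approach.
c. The output is rich in information: in addition to the face region, it can include facial landmarks, face pose, and even the whole body of the person, which greatly helps subsequent analysis.
Although this application has been disclosed above by way of optional embodiments, they are not intended to limit it. Those skilled in the art to which this application belongs may make various changes and modifications without departing from its spirit and scope. The scope of protection of this application is therefore defined by the claims.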

Claims (10)

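  1. A face detection method, characterized in that the face detection method includes the following steps:
    S10: building a convolutional neural network framework that includes at least a candidate region generation network, a correction network, and a multi-information output network, wherein this step includes the following steps:
    training the candidate region generation network, the correction network, and the multi-information output network;
    cascading the candidate region generation network, the correction network, and the multi-information output network in sequence;
    S20: connecting the convolutional neural network framework to a data preparation module, wherein the data preparation module is used to acquire a source image and to process the source image according to the requirements of the candidate region generation network, the correction network, and the multi-information output network;
    S30: the data preparation module converts the image to be detected in the source image into the image input format required by the candidate region generation network and sends it to the candidate region generation network, after which the candidate region generation network is run to generate a plurality of first face candidate boxes;
    S40: the data preparation module acquires image data according to the first face candidate boxes from step S30, and the correction network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, and apply non-maximum suppression;
    S50: the data preparation module acquires image data according to the first face candidate boxes from step S40, and the multi-information output network is run to screen the first face candidate boxes, correct the positions of the remaining first face candidate boxes, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to the first face candidate boxes.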
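  2. The face detection method according to claim 1, characterized in that the convolutional neural network framework further includes a full-body person recognition network, which is cascaded with the multi-information output network and connected to the data preparation module; the face detection method further includes the following steps:
    S601: expanding the region of each first face candidate box output in step S50 to obtain a full-body region box;
    S602: the data preparation module acquires image data according to the full-body region boxes from step S601, and the full-body person recognition network is run to screen the full-body region boxes and to correct and output the remaining full-body region boxes.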
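  3. The face detection method according to claim 1, characterized in that step S30 includes the following steps:
    S301: the data preparation module builds an image pyramid from the image to be detected in the source image;
    S302: computing the score and position correction value of every pixel region of a first size on the image to be detected;
    S303: taking the pixel regions whose score is greater than a preset first threshold as second face candidate boxes, and applying non-maximum suppression;
    S304: repeating steps S302 and S303 until all levels of the image pyramid have been processed;
    S305: the data preparation module maps all the second face candidate boxes onto the image to be detected and applies non-maximum suppression, generating a plurality of the first face candidate boxes.
  4. The face detection method according to claim 3, characterized in that each level of the image pyramid built in step S301 is obtained by scaling the previous level by a factor of 0.709; and in step S305, the data preparation module maps all the second face candidate boxes obtained in step S304 into the image to be detected according to that scaling factor.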
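  5. The face detection method according to claim 1, characterized in that training the multi-information output network includes the following steps:
    obtaining positive and negative face samples and face box regression values from the WIDER FACE dataset;
    obtaining facial landmark samples from the CelebA dataset;
    obtaining face pose samples from the AFLW dataset;
    performing multi-task joint training on the positive and negative face samples, the face box regression values, the facial landmark samples, and the face pose samples, wherein the loss function for classification is:
    Figure PCTCN2018110703-appb-100001
    In the formula above, p_i is the value predicted by the network and
    Figure PCTCN2018110703-appb-100002
    is the label of the sample;
    the loss function for the box position is:
    Figure PCTCN2018110703-appb-100003
    In the formula above,
    Figure PCTCN2018110703-appb-100004
    is the value predicted by the network and
    Figure PCTCN2018110703-appb-100005
    is the annotated value of the sample;
    the loss function for the facial landmarks is:
    Figure PCTCN2018110703-appb-100006
    In the formula above,
    Figure PCTCN2018110703-appb-100007
    is the value predicted by the network and
    Figure PCTCN2018110703-appb-100008
    is the annotated value of the sample;
    the loss function for the face pose is:
    Figure PCTCN2018110703-appb-100009
    In the formula above,
    Figure PCTCN2018110703-appb-100010
    is the value predicted by the network and
    Figure PCTCN2018110703-appb-100011
    is the annotated value of the sample;
    the total loss function for joint training is:
    Figure PCTCN2018110703-appb-100012
    In the formula above, N is the number of samples, α_j is the weight of a task,
    Figure PCTCN2018110703-appb-100013
    indicates whether the sample is a face, and
    Figure PCTCN2018110703-appb-100014
    is the loss function of each task.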
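  6. The face detection method according to claim 5, characterized in that in the total loss function for joint training, the weight for classification training is 1, the weight for box position training is 0.5, the weight for facial landmark training is 1, and the weight for face pose training is 1.
  7. The face detection method according to claim 5, characterized in that the facial landmarks include the left eye, right eye, nose tip, left mouth corner, and right mouth corner.
  8. The face detection method according to claim 5, characterized in that the face pose description includes a roll angle, a pitch angle, and a yaw angle.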
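  9. A face detection system, characterized in that the face detection system includes:
    a data preparation module for acquiring a source image and processing the source image;
    a convolutional neural network framework connected to the data preparation module, wherein the convolutional neural network framework includes at least a candidate region generation network, a correction network, and a multi-information output network cascaded in sequence; wherein
    the candidate region generation network is used to generate a plurality of first face candidate boxes;
    the correction network is used to screen the first face candidate boxes generated by the candidate region generation network, correct the positions of the first face candidate boxes that remain after screening, and apply non-maximum suppression;
    the multi-information output network is used to screen the first face candidate boxes produced by the correction network, correct the positions of the first face candidate boxes that remain after screening, apply non-maximum suppression, and output the remaining first face candidate boxes together with the facial landmarks and face pose corresponding to the first face candidate boxes.
  10. The face detection system according to claim 9, characterized in that the face detection system further includes a full-body person recognition network, which is cascaded with the multi-information output network and connected to the data preparation module; the full-body person recognition network is used to expand the first face candidate boxes produced by the multi-information output network into full-body region boxes, screen the full-body region boxes, and correct and output the remaining full-body region boxes.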
PCT/CN2018/110703 2017-11-13 2018-10-17 Face detection method and face detection system WO2019091271A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711115415.3A CN107886074B (zh) 2017-11-13 2017-11-13 Face detection method and face detection system
CN201711115415.3 2017-11-13

Publications (1)

Publication Number Publication Date
WO2019091271A1 true WO2019091271A1 (zh) 2019-05-16

Family

ID=61780364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110703 WO2019091271A1 (zh) 2017-11-13 2018-10-17 Face detection method and face detection system

Country Status (2)

Country Link
CN (1) CN107886074B (zh)
WO (1) WO2019091271A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
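CN105912990A (zh) * 2016-04-05 2016-08-31 深圳先进技术研究院 Face detection method and device
CN107239736A (zh) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Face detection method and detection device based on multi-task cascaded convolutional neural networks
CN107886074A (zh) * 2017-11-13 2018-04-06 苏州科达科技股份有限公司 Face detection method and face detection system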

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAN, CHENG: "Research on Deep Learning Based Face Recognition", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE , CHINA MASTER S THESES FULL-TEXT DATABASE, 15 July 2017 (2017-07-15), pages 26 - 31, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN107886074A (zh) 2018-04-06
CN107886074B (zh) 2020-05-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18875235

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18875235

Country of ref document: EP

Kind code of ref document: A1