WO2021190664A1 - Multi-face detection method, system and storage medium based on key point positioning (基于关键点定位的多人脸检测方法、系统及存储介质) - Google Patents
Multi-face detection method, system and storage medium based on key point positioning
- Publication number
- WO2021190664A1 (PCT/CN2021/084307)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- key point
- point position
- average
- face detection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Definitions
- This application relates to face recognition technology based on neural networks, and in particular to a multi-face detection method, system and storage medium based on key point positioning.
- The present application provides a multi-face detection method, system and computer-readable storage medium based on key point positioning, which mainly solve the problem of high power consumption in multi-face detection.
- This application provides a multi-face detection method based on key point positioning, applied to an electronic device.
- The method includes: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point position data and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
- This application provides a multi-face detection system based on key point positioning, including a U-Net-based multi-face detection model training unit, a feature map acquisition unit, a feature vector acquisition unit, and a face key point position acquisition unit.
- The U-Net-based multi-face detection model training unit is used to train a U-Net-based multi-face detection model with a data set; the feature map acquisition unit is used to input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; the feature vector acquisition unit is used to apply the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and the face key point position acquisition unit uses the obtained true face key point position data and the face average point position data with an association algorithm to determine the face key point positions, completing face detection.
- The present application also provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a program executable by the at least one processor, and the program is executed by the at least one processor so that the at least one processor can execute the above multi-face detection method based on key point positioning.
- The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above multi-face detection method based on key point positioning.
- The multi-face detection method, system, electronic device, and computer-readable storage medium based on key point positioning proposed in this application train a U-Net-based multi-face detection model with a data set; input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; apply the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and use the obtained true face key point position data and the face average point position data with an association algorithm to obtain the face key point positions, completing face detection.
- The beneficial effects are as follows: 1) The multi-face detection method based on key point positioning of the present application uses a single model to complete face detection and face key point positioning at the same time, which saves calculation processes and steps, speeds up the response time of the final application, and reduces computing consumption, making it more suitable for mobile devices with limited computing resources such as mobile phones. 2) The method performs face detection and face key point positioning directly from the face key points and the face average point, obtaining the information the application actually needs; the face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs to further save computation. 3) Compared with the prior-art approach of computing key point positions with an MTCNN network and numerical regression, the face key point positions finally output by the overall model of the present application are computed from high-resolution heat maps, which achieves the technical effect of improving accuracy and robustness.
- FIG. 1 is a flowchart of a preferred embodiment of a method for multi-face detection based on key point positioning according to this application;
- FIG. 2 is a flowchart of a preferred embodiment for determining the position of key points of a face in the method for multiple face detection based on key point positioning of this application;
- Fig. 3 is a schematic diagram of the logical structure of the multi-face detection system based on key point positioning of the present application
- FIG. 4 is a schematic structural diagram of a preferred embodiment of the electronic device of this application.
- FIG. 1 shows the flow of a preferred embodiment of the method for multi-face detection based on key point positioning according to the present application.
- the method may be executed by an apparatus, and the apparatus may be realized by software and/or hardware.
- In the prior art, the pre-steps of face-based applications are to first locate the key points of the face, then use the face key points to align the face, and finally perform the steps of the actual application, such as face recognition, liveness recognition, and facial expression recognition.
- The existing MTCNN contains three cascaded multi-task convolutional neural networks, namely the Proposal Network (P-Net), the Refine Network (R-Net), and the Output Network (O-Net); each multi-task convolutional neural network has three learning tasks: face classification, bounding box regression, and key point positioning.
- MTCNN realizes face detection and key point positioning in three stages.
- First, P-Net obtains candidate windows for the face region and bounding box regression vectors, uses the bounding box regression to calibrate the candidate windows, and then merges highly overlapping candidate boxes through non-maximum suppression (NMS). The candidate boxes produced by P-Net are then fed into R-Net, which likewise uses bounding box regression and NMS to remove false-positive regions and obtain more accurate candidate boxes; finally, O-Net outputs the positions of the 5 key points.
- In fact, the key to these pre-steps is to find the face and locate its key points; the face frame itself is not necessary.
- The multi-face detection method based on key point positioning of the present application completes face detection and key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, which makes it more suitable for mobile devices with limited computing resources such as mobile phones.
- The multi-face detection method based on key point positioning includes steps S110 to S140.
- S110: Train a U-Net-based multi-face detection model with a data set.
- Before training the U-Net-based multi-face detection model, a data set must first be obtained.
- The data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
- For the data set with only face key point annotations, the annotations are obtained as follows: all key point coordinates are recorded as (X_p, Y_p); each face has 5 key points, namely the left eye, right eye, nose tip, left mouth corner, and right mouth corner; the average of the x and y coordinates of all key points is computed to obtain an average point; and this average point is taken as the position of the face, (X_f, Y_f).
- For the data set with both face key point annotations and face frame annotations, the annotations are obtained in the same way: all key point coordinates are recorded as (X_p, Y_p); the average of the x and y coordinates of the 5 key points gives the average point, taken as the face position (X_f, Y_f); in addition, the offset from the upper-left corner of the face frame to the average point is recorded as (X_tl, Y_tl), and the offset from the lower-right corner of the face frame to the average point is recorded as (X_rb, Y_rb). A small sketch of these annotation computations follows.
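- The annotation arithmetic above is simple enough to state directly in code. The following is a minimal Python sketch of both annotation formats; the function names are illustrative (not from the patent), and the convention corner = average point + offset is assumed so that it matches the face-frame reconstruction described later in the text.

```python
import numpy as np

def face_average_point(keypoints):
    """Average the x and y coordinates of the 5 key points (left eye,
    right eye, nose tip, left mouth corner, right mouth corner)
    to obtain the face position (X_f, Y_f)."""
    pts = np.asarray(keypoints, dtype=np.float32)  # shape (5, 2)
    return pts.mean(axis=0)

def frame_offsets(top_left, bottom_right, avg_point):
    """Offsets (X_tl, Y_tl) and (X_rb, Y_rb) relating the face-frame
    corners to the average point (assumed: corner = avg + offset)."""
    avg = np.asarray(avg_point, dtype=np.float32)
    return (np.asarray(top_left, dtype=np.float32) - avg,
            np.asarray(bottom_right, dtype=np.float32) - avg)

# Example: one annotated face.
kps = [(30, 40), (50, 40), (40, 52), (33, 62), (47, 62)]
avg = face_average_point(kps)                      # array([40. , 51.2])
tl_off, rb_off = frame_offsets((20, 25), (60, 75), avg)
```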
- The above data set may draw on databases including one or more of the AFLW face database, the COFW face database, the MVFW face database, or the OCFW face database.
- The AFLW face database contains 25993 face images collected from Flickr, with 21 key points annotated for each face.
- The COFW face database contains the 845 face images from the training set of the LFPW face database and 500 other occluded face images, while the test set contains 507 severely occluded face images (with variations in both pose and expression), with 29 key points annotated for each face.
- The MVFW face database is a multi-view face data set, including 2050 training face images and 450 test face images, with 68 key points annotated for each face.
- The OCFW face database contains 2951 training face images (all unoccluded faces) and 1246 test face images (all occluded faces), with 68 key points annotated for each face.
- The U-Net-based multi-face detection model is based on the U-Net network model. It should be noted that the network structure of the U-Net-based multi-face detection model has four layers in total: the input picture is down-sampled 4 times and up-sampled 4 times.
- The process includes: convolving the picture with 3×3 convolution kernels and outputting feature channels through the ReLU activation function; cropping and copying the pictures from the down-sampling path on the left; down-sampling the picture through max pooling with a 2×2 pooling kernel; up-sampling the image through deconvolution with a 2×2 convolution kernel; and convolving the picture with a 1×1 convolution kernel.
- Starting from the far left, the input is a 572×572×1 picture, which is convolved with 64 3×3 convolution kernels and passed through the ReLU function to obtain 64 feature channels of 570×570×1.
- The 570×570×64 result is then convolved with another 64 3×3 convolution kernels and, again after the ReLU function, yields 64 feature extraction results of 568×568×1.
- The processing result of the first layer is therefore a 568×568×64 feature picture.
- Through a 2×2 pooling kernel, the picture is down-sampled to half its original size, 284×284×64, and picture features are then further extracted with 128 convolution kernels.
- The subsequent down-sampling process follows by analogy: each layer undergoes two convolutions to extract image features, and each down-sampling layer halves the picture size and doubles the number of convolution kernels.
- The final result of the down-sampling part is 28×28×1024, that is, 1024 feature layers in total, each of size 28×28. The right part, from bottom to top, consists of 4 up-sampling processes.
- Starting from the bottom right, the 28×28×1024 feature matrix is deconvolved through 512 2×2 convolution kernels, expanding the matrix to 56×56×512 (this result is only the 512 feature channels of the right half, not including the left side). Because deconvolution can only enlarge the picture and cannot restore it, the picture from the corresponding down-sampling stage on the left is cropped to the same size and concatenated directly to add feature layers (these are the other 512 feature channels, copied from the left half), and convolution is then performed to extract features. Since each valid convolution shrinks the result by a border, the picture from the left down-sampling process must be cropped before each splicing.
- After concatenation, the whole new feature matrix becomes 56×56×1024; it then passes through 512 convolution kernels, and after two convolutions a 52×52×512 feature matrix is obtained; up-sampling is performed again, and the above process repeats.
- Each layer performs two convolutions to extract features, and each up-sampling layer doubles the image size and halves the number of convolution kernels.
- The final up-sampling result is 388×388×64, that is, 64 feature layers in total, each of size 388×388.
- In short, U-Net is used to realize the function of locating picture pixels: the network classifies each pixel in the image, and the final output is an image segmented according to pixel category.
- That is to say, a picture is input and, after the convolutional neural network computation, 6 feature maps are output: 5 key points and one average point.
- In the step of inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
- Of the 6 feature maps, the 5 face key points correspond to 5 output feature maps and the face average point to 1 output feature map; the output format is heat maps in all cases, where the heat at a point indicates the probability that a face key point, or a face, exists at that point.
- The face key point positions finally output by the overall model of the present application are computed from high-resolution heat maps, which achieves the technical effect of improving accuracy and robustness. A minimal sketch of the network described above follows.
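- To make the architecture above concrete, here is a minimal PyTorch sketch of a 4-down/4-up valid-convolution U-Net whose head is a 1×1 convolution emitting the 6 heat maps (5 key points + 1 average point). It is an illustrative reconstruction under stated assumptions, not the patent's exact network; the class and helper names are invented for the example.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Two unpadded 3x3 convolutions, each followed by ReLU;
    # each convolution trims 2 pixels per spatial dimension.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3), nn.ReLU(inplace=True),
    )

def center_crop(t, h, w):
    # Crop the encoder feature map to the decoder size before
    # concatenation (the "crop and copy" step in the text).
    _, _, H, W = t.shape
    dh, dw = (H - h) // 2, (W - w) // 2
    return t[:, :, dh:dh + h, dw:dw + w]

class MiniUNet(nn.Module):
    def __init__(self, c_in=1, c_out=6, base=64, depth=4):
        super().__init__()
        self.downs = nn.ModuleList()
        c = c_in
        for d in range(depth):                     # 64, 128, 256, 512 channels
            self.downs.append(double_conv(c, base * 2 ** d))
            c = base * 2 ** d
        self.pool = nn.MaxPool2d(2)                # 2x2 max pooling
        self.bottom = double_conv(c, c * 2)        # 512 -> 1024 channels
        c *= 2
        self.ups = nn.ModuleList()
        self.up_convs = nn.ModuleList()
        for _ in range(depth):
            self.ups.append(nn.ConvTranspose2d(c, c // 2, 2, stride=2))
            self.up_convs.append(double_conv(c, c // 2))
            c //= 2
        self.head = nn.Conv2d(c, c_out, 1)         # final 1x1 convolution

    def forward(self, x):
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottom(x)
        for up, conv, skip in zip(self.ups, self.up_convs, reversed(skips)):
            x = up(x)
            skip = center_crop(skip, x.shape[2], x.shape[3])
            x = conv(torch.cat([skip, x], dim=1))  # splice, then convolve
        return self.head(x)

# MiniUNet()(torch.randn(1, 1, 572, 572)).shape -> (1, 6, 388, 388)
```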
- S130: Apply the heat map maximum value suppression algorithm to the obtained feature maps in heat map format to obtain the true face key point positions and the face average point position feature vector.
- Specifically, the image is input to the U-Net-based multi-face detection model, which outputs 6 feature maps, including 5 key point feature maps and one average point feature map; the positions of the face key points (5 of them) and of the face average point are then obtained through the heat map maximum value suppression algorithm; the feature vector at the face average point position is extracted to obtain the expected positions of the face key points; and the expected key point positions are matched against the previously obtained true face key point positions, thereby associating the face with the above points (the face key points and the face average point).
- In a specific implementation, applying the heat map maximum value suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector works by traversal: every point of every heat map is visited, and if the value of a point is greater than the values of all its neighboring points, the point is retained; otherwise the point is removed, that is, its value is set to zero; a point is also removed if its value is below a threshold (for example 0.5). After the computation, the points remaining on each feature map are the face key point positions (or the face average point positions) for that feature map, as sketched below.
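- A minimal NumPy sketch of this traversal, assuming 8-connected neighborhoods and the example threshold of 0.5 mentioned above (the function name is illustrative):

```python
import numpy as np

def heatmap_max_suppression(heatmap, threshold=0.5):
    """Keep only points strictly greater than all 8 neighbors and at
    least `threshold`; everything else is zeroed out. Returns the
    (row, col) coordinates of the surviving points."""
    h = np.asarray(heatmap, dtype=np.float32)
    padded = np.pad(h, 1, mode="constant", constant_values=-np.inf)
    # Shifted views of the 8 neighbors of every pixel.
    neighbors = np.stack([
        padded[dy:dy + h.shape[0], dx:dx + h.shape[1]]
        for dy in (0, 1, 2) for dx in (0, 1, 2)
        if not (dy == 1 and dx == 1)
    ])
    keep = (h > neighbors.max(axis=0)) & (h >= threshold)
    return np.argwhere(keep)

# Usage: run once per feature map (5 key point maps + 1 average point map).
peaks = heatmap_max_suppression(np.random.rand(388, 388))
```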
- S140: Use the obtained true face key point position data and the face average point position feature vector to determine the face key point positions with an association algorithm, completing face detection.
- Fig. 2 shows the flow of a preferred embodiment of determining the position of key points of a face in the method for multi-face detection based on key point positioning according to the present application.
- The step of determining the face key point positions from the obtained true face key point position data and the face average point position data with an association algorithm includes steps S210-S230.
- This is the algorithm that associates a face with its face key points: the face (average point) positions computed above are all the detected faces, but the computed face key points have not yet been associated with the specific faces they belong to.
- S210: Input the feature vector of the face average point position into the expected face key point position sub-model, and obtain the offset vectors from the face key points to the face average point through regression calculation; the regression outputs 10 values, the offsets of the (X_p, Y_p) coordinates of the 5 key points from the face average point position (each key point has offsets in the X and Y directions, so the 5 key points yield 10 offset values in total).
- S220: Obtain the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point; that is, the face average point position plus the 5 X, Y offsets output by the expected key point position sub-model (another neural network sub-model) gives the expected positions of the 5 key points belonging to that face.
- S230: Through the Euclidean distance formula, select the true face key point position closest to each expected face key point position as the finally output face key point position.
- In short, the face average point uniquely determines a face; the expected positions of the 5 key points are found from the face average point; step S130 outputs the true key point positions; and the true key point closest in Euclidean distance to an expected key point is the key point uniquely associated with that face, as in the sketch below. That is to say, for face detection and face key point positioning, the face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs, further saving computation.
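- The nearest-neighbor matching in S230 can be sketched as follows (illustrative names; `expected_pts` are the 5 positions from S220 and `true_pts` the detected key points from the suppression step):

```python
import numpy as np

def associate_keypoints(expected_pts, true_pts):
    """For each expected key point (average point + regressed offset),
    pick the detected true key point at the smallest Euclidean
    distance as the final output for this face."""
    expected = np.asarray(expected_pts, dtype=np.float32)  # (5, 2)
    true_kp = np.asarray(true_pts, dtype=np.float32)       # (N, 2)
    d = np.linalg.norm(expected[:, None, :] - true_kp[None, :, :], axis=-1)
    return true_kp[d.argmin(axis=1)]                       # (5, 2)
```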
- The multi-face detection method based on key point positioning of the present application completes face detection and key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, which makes it more suitable for mobile devices with limited computing resources such as mobile phones.
- In a specific embodiment, the step of obtaining the face key point positions from the obtained true face key point positions and face average point position data with the association algorithm further includes: obtaining a face frame from the obtained true face key point positions and the face average point position feature vector to complete face detection. That is, a face frame is obtained from the face key points and the face average point obtained in step S130, and face detection is completed according to the face frame.
- Specifically, the coordinates of the upper-left corner, the coordinates of the lower-right corner, and the average point coordinates of the face rectangle are obtained from the feature vector of the face average point position, and the face rectangle is thereby determined.
- In a specific implementation, to improve accuracy, model training benefits from annotation information of more dimensions and stronger supervision signals (model training with annotated data is also called supervised learning); both theory and experimental data indicate that accuracy can be improved in this way. The step of obtaining a rectangular frame for the face (that is, cropping the face region for archiving) is therefore retained.
- In a specific embodiment, the feature vector of the face average point position is extracted and input into the expected face key point position sub-model (another fully connected neural network sub-model); regression calculation is performed and 4 values are output, representing the offsets in the X and Y directions from the upper-left corner of the face rectangle to the face average point position, (X_tl, Y_tl), and the offsets in the X and Y directions from the lower-right corner of the face rectangle to the face average point position, (X_rb, Y_rb). Adding these 4 offset values to the coordinates of the face average point position (X_f, Y_f) yields the upper-left and lower-right corners of the face rectangle, as sketched below.
- In short, in scenarios with sufficient computing resources and high accuracy requirements for the face detection result, the coordinates of the upper-left corner, the lower-right corner, and the average point of the face rectangle are obtained from the feature vector of the face average point position to further improve detection accuracy; the face rectangle is thereby determined and face detection completed.
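- A small sketch of this reconstruction, assuming corner = average point + regressed offset as stated above (names are illustrative):

```python
def face_box(avg_point, offsets):
    """Recover the face rectangle corners from the 4 regressed values
    (X_tl, Y_tl, X_rb, Y_rb) and the average point (X_f, Y_f)."""
    x_f, y_f = avg_point
    x_tl, y_tl, x_rb, y_rb = offsets
    return (x_f + x_tl, y_f + y_tl), (x_f + x_rb, y_f + y_rb)

# Example: face_box((40.0, 51.2), (-15.0, -20.0, 15.0, 18.0))
#          -> ((25.0, 31.2), (55.0, 69.2))
```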
- The multi-face detection method based on key point positioning of the present application uses a single model to complete face detection and key point positioning at the same time, without the face frame step, saving calculation processes and steps; moreover, the face key point positions finally output by the overall model are computed from high-resolution heat maps, so the result is more accurate and more robust.
- FIG. 3 is a schematic diagram of the logical structure of the multi-face detection system based on key point positioning of the present application. Referring to FIG. 3, to achieve the above objective, the present application provides a multi-face detection system 300 based on key point positioning, including a U-Net-based multi-face detection model training unit 310, a feature map acquisition unit 320, a feature vector acquisition unit 330, and a face key point position acquisition unit 340, wherein:
- The U-Net-based multi-face detection model training unit 310 is used to train a U-Net-based multi-face detection model with a data set; the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
- The feature map acquisition unit 320 is configured to input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; in this step, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
- The feature vector acquisition unit 330 is configured to apply the heat map maximum value suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector.
- The face key point position acquisition unit 340 uses the obtained true face key point position data and the face average point position data with an association algorithm to determine the face key point positions, completing face detection.
- The face key point position acquisition unit is further configured to obtain a face frame from the obtained true face key point position data and the face average point position feature vector to complete face detection.
- the feature maps in the heat map format obtained by the feature map obtaining unit 320 include 5 face key point feature maps and 1 face average point feature map.
- the face key point position acquisition unit 340 includes an offset vector acquisition module 341, an expected face key point position acquisition module 342, and a face key point position determination module 343;
- The offset vector acquisition module 341 is used to input the feature vector of the face average point position into the expected face key point position sub-model, and to obtain the offset vectors from the face key points to the face average point through regression calculation;
- the expected face key point position acquisition module 342 is configured to obtain the expected face key point position through the feature vector of the average face point position and the offset vector of the face average point;
- the face key point position determination module 343 is used to select the expected face key point position closest to the real face key point position through the Euclidean distance formula as the final output face key point position.
- In summary, the multi-face detection method based on key point positioning of the present application uses a single model to complete face detection and key point positioning at the same time, without going through the face frame step, saving calculation processes and steps; the face key point positions finally output by the overall model are computed from high-resolution heat maps, so the result is more accurate and more robust.
- This application provides a method for multi-face detection based on key point positioning, which is applied to an electronic device 4.
- Fig. 4 shows an application environment of a preferred embodiment of a method for multi-face detection based on key point positioning according to the present application.
- the electronic device 4 may be a terminal device with arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
- The electronic device 4 includes a processor 42, a memory 41, a communication bus 43, and a network interface 44.
- the memory 41 includes at least one type of readable storage medium.
- the readable storage medium may be non-volatile or volatile.
- The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, or card-type memory.
- the readable storage medium may be an internal storage unit of the electronic device 4, such as a hard disk of the electronic device 4.
- The readable storage medium may also be an external memory of the electronic device 4, such as a plug-in hard disk equipped on the electronic device 4, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, etc.
- The readable storage medium of the memory 41 is generally used to store the multi-face detection method program 40 based on key point positioning installed in the electronic device 4, and the like.
- the memory 41 can also be used to temporarily store data that has been output or will be output.
- The processor 42 may in some embodiments be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory 41 or to process data, for example to execute the multi-face detection method program 40 based on key point positioning.
- the communication bus 43 is used to realize connection and communication between these components.
- the network interface 44 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 4 and other electronic devices.
- FIG. 4 only shows the electronic device 4 with the components 41-44, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
- the electronic device 4 may also include a user interface.
- The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones.
- the user interface may also include a standard wired interface and a wireless interface.
- the electronic device 4 may also include a display, and the display may also be called a display screen or a display unit.
- the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (Organic Light-Emitting Diode, OLED) touch device, etc.
- the display is used for displaying information processed in the electronic device 4 and for displaying a visualized user interface.
- The electronic device 4 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
- In the device embodiment shown in FIG. 4, the memory 41, as a computer storage medium, may include an operating system and the multi-face detection method program 40 based on key point positioning; when the processor 42 executes the multi-face detection method program 40 based on key point positioning stored in the memory 41, the following steps are implemented: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point position data and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
- In other embodiments, the multi-face detection method program 40 based on key point positioning may also be divided into one or more modules, which are stored in the memory 41 and executed by the processor 42 to complete this application.
- A module referred to in this application is a series of computer program segments capable of completing a specific function.
- the multi-face detection method program 40 based on key point positioning can be divided into a U-Net-based multi-face detection model training unit 310, a feature map acquisition unit 320, a feature vector acquisition unit 330, and a face key point position acquisition unit 340.
- This application also proposes a computer-readable storage medium, which mainly includes a storage data area and a storage program area.
- The storage data area can store data created according to the use of blockchain nodes, etc.
- The storage program area can store an operating system and at least one application program required by a function; the computer-readable storage medium includes a multi-face detection method program based on key point positioning, and when the multi-face detection method program based on key point positioning is executed by a processor, operations such as those of the multi-face detection method based on key point positioning are realized.
- the computer-readable storage medium may be non-volatile or volatile.
- the specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the multiple face detection method, system, and electronic device based on key point positioning, and will not be repeated here.
- The multi-face detection method, system, electronic device, and computer-readable storage medium based on key point positioning of this application use a single model to complete face detection and key point positioning at the same time, saving calculation processes and steps; face detection and face key point positioning are performed directly from the face key points and the face average point, obtaining the information the application actually needs.
- The face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs, further saving computation; compared with the prior-art approach of computing key point positions with an MTCNN network and numerical regression, the face key point positions finally output by the overall model of this application are computed from high-resolution heat maps, which achieves the technical effect of improving accuracy and robustness.
- The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
- A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
- The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
Abstract
This application relates to face recognition technology based on neural networks, and discloses a multi-face detection method based on key point positioning. The method includes: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point positions and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection. This application also relates to blockchain technology, with data stored in a blockchain. By using a single model to complete face detection and face key point positioning at the same time, this application saves calculation processes and steps, and achieves the technical effects of speeding up the response time of the final application and reducing computing consumption.
Description
This application claims priority to the Chinese invention patent application filed with the Chinese Patent Office on November 12, 2020, with application number 202011263174.9 and invention title "Multi-face detection method, system and storage medium based on key point positioning", the entire contents of which are incorporated into this application by reference.
This application relates to face recognition technology based on neural networks, and in particular to a multi-face detection method, system and storage medium based on key point positioning.
In recent years, deep convolutional neural networks in the vision field have developed explosively, and the accuracy of face detection and face key point positioning has also improved greatly.
The inventors realized that in the prior art, the two tasks of face detection and face key point positioning must be completed by two different convolutional neural network models. Although the detection tasks are completed well, the drawbacks are as follows:
because the tasks are completed by two different convolutional neural networks, the power consumption is too high for mobile devices with limited computing resources such as mobile phones.
Therefore, a multi-face detection method based on face positioning with good detection performance and low power consumption is urgently needed.
This application provides a multi-face detection method, system and computer-readable storage medium based on key point positioning, which mainly solve the problem of high power consumption in multi-face detection.
To achieve the above objective, this application provides a multi-face detection method based on key point positioning, applied to an electronic device. The method includes: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point position data and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
To achieve the above objective, this application provides a multi-face detection system based on key point positioning, including a U-Net-based multi-face detection model training unit, a feature map acquisition unit, a feature vector acquisition unit, and a face key point position acquisition unit; wherein the U-Net-based multi-face detection model training unit is used to train a U-Net-based multi-face detection model with a data set; the feature map acquisition unit is used to input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; the feature vector acquisition unit is used to apply the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and the face key point position acquisition unit uses the obtained true face key point position data and the face average point position data with an association algorithm to determine the face key point positions, completing face detection.
To achieve the above objective, this application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a program executable by the at least one processor, and the program is executed by the at least one processor so that the at least one processor can execute the multi-face detection method based on key point positioning described above.
In addition, to achieve the above objective, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-face detection method based on key point positioning described above.
The multi-face detection method, system, electronic device and computer-readable storage medium based on key point positioning proposed in this application train a U-Net-based multi-face detection model with a data set; input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; apply a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and use the obtained true face key point position data and the face average point position data with an association algorithm to obtain the face key point positions, completing face detection. The beneficial effects are as follows: 1) The multi-face detection method based on key point positioning of this application uses a single model to complete face detection and face key point positioning at the same time, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, making it more suitable for mobile devices with limited computing resources such as mobile phones; the method performs face detection and face key point positioning directly from the face key points and the face average point, obtaining the information the application actually needs. 2) The face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs, further saving computation. 3) Compared with the prior-art approach of computing key point positions with an MTCNN network and numerical regression, the face key point positions finally output by the overall model of this application are computed from high-resolution heat maps, achieving the technical effect of improving accuracy and robustness.
FIG. 1 is a flowchart of a preferred embodiment of the multi-face detection method based on key point positioning of this application;
FIG. 2 is a flowchart of a preferred embodiment of determining the face key point positions in the multi-face detection method based on key point positioning of this application;
FIG. 3 is a schematic diagram of the logical structure of the multi-face detection system based on key point positioning of this application;
FIG. 4 is a schematic structural diagram of a preferred embodiment of the electronic device of this application.
It should be understood that the specific embodiments described here are only used to explain this application and are not used to limit it.
In order to improve user coding efficiency, this application provides a multi-face detection method based on key point positioning. FIG. 1 shows the flow of a preferred embodiment of the multi-face detection method based on key point positioning of this application. Referring to FIG. 1, the method may be executed by an apparatus, and the apparatus may be implemented by software and/or hardware.
In the prior art, the pre-steps of face-based applications are to first locate the key point positions of the face, then use the face key points to align the face, and finally perform the steps of the actual application, such as face recognition, liveness recognition, and facial expression recognition. The existing MTCNN contains three cascaded multi-task convolutional neural networks, namely the Proposal Network (P-Net), the Refine Network (R-Net), and the Output Network (O-Net); each multi-task convolutional neural network has three learning tasks: face classification, bounding box regression, and key point positioning. MTCNN realizes face detection and key point positioning in three stages. First, P-Net obtains candidate windows for the face region and bounding box regression vectors, uses the bounding box regression to calibrate the candidate windows, and then merges highly overlapping candidate boxes through non-maximum suppression (NMS). The candidate boxes produced by P-Net are then input into R-Net, which likewise removes false-positive regions through bounding box regression and NMS to obtain more accurate candidate boxes; finally, O-Net outputs the positions of the 5 key points.
In fact, the key to these pre-steps is to find the face and locate its key points; the face frame is not necessary.
The multi-face detection method based on key point positioning of this application completes face detection and face key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, which makes it more suitable for mobile devices with limited computing resources such as mobile phones.
It should be noted that, specifically, the multi-face detection method based on key point positioning includes steps S110 to S140.
S110: Train a U-Net-based multi-face detection model with a data set.
Before training the U-Net-based multi-face detection model, a data set must first be obtained. In a specific embodiment, the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames. Specifically, for the data set with only face key point annotations, the face key point annotations are obtained as follows: all key point coordinates are recorded as (X_p, Y_p); each face has 5 key points, namely the left eye, right eye, nose tip, left mouth corner, and right mouth corner; the average of the x and y coordinates of all key points is computed to obtain an average point; and this average point is taken as the position of the face, (X_f, Y_f). For the data set with both face key point annotations and face frame annotations, the annotations are obtained as follows: all key point coordinates are recorded as (X_p, Y_p); each face has 5 key points, namely the left eye, right eye, nose tip, left mouth corner, and right mouth corner; the average of the x and y coordinates of all key points is computed to obtain an average point, which is taken as the position of the face (X_f, Y_f); the offset from the upper-left corner of the face frame to the average point is recorded as (X_tl, Y_tl), and the offset from the lower-right corner of the face frame to the average point is recorded as (X_rb, Y_rb).
The above data set may draw on databases including one or more of the AFLW face database, the COFW face database, the MVFW face database, or the OCFW face database. The AFLW face database contains 25993 face images collected from Flickr, with 21 key points annotated for each face. The COFW face database contains the 845 face images from the training set of the LFPW face database and 500 other occluded face images, while the test set consists of 507 severely occluded face images (with variations in both pose and expression), with 29 key points annotated for each face. The MVFW face database is a multi-view face data set, including 2050 training face images and 450 test face images, with 68 key points annotated for each face. The OCFW face database contains 2951 training face images (all unoccluded faces) and 1246 test face images (all occluded faces), with 68 key points annotated for each face.
S120: Input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format.
In a specific implementation, the U-Net-based multi-face detection model is based on the network model U-Net. It should be noted that the network structure of the U-Net-based multi-face detection model has four layers in total, performing 4 down-samplings and 4 up-samplings on the input picture. The process includes: convolving the picture with 3×3 convolution kernels and outputting feature channels through the ReLU activation function; cropping and copying the pictures from the down-sampling path on the left; down-sampling the picture through max pooling with a 2×2 pooling kernel; up-sampling the image through deconvolution with a 2×2 convolution kernel; and convolving the picture with a 1×1 convolution kernel.
In a specific embodiment, starting from the far left, the input is a 572×572×1 picture, which is convolved with 64 3×3 convolution kernels and passed through the ReLU function to obtain 64 feature channels of 570×570×1. This 570×570×64 result is then convolved with another 64 3×3 convolution kernels and, again after the ReLU function, yields 64 feature extraction results of 568×568×1; this is the processing result of the first layer. The first layer's result, a 568×568×64 feature picture, is down-sampled to half its original size, 284×284×64, through a 2×2 pooling kernel, after which picture features are further extracted with 128 convolution kernels. The subsequent down-sampling process follows by analogy: each layer undergoes two convolutions to extract image features, and each down-sampling layer halves the picture size and doubles the number of convolution kernels. The final result of the down-sampling part is 28×28×1024, that is, 1024 feature layers in total, each of size 28×28. The right part, from bottom to top, consists of 4 up-sampling processes. Starting from the bottom right, the 28×28×1024 feature matrix is deconvolved through 512 2×2 convolution kernels, expanding the matrix to 56×56×512 (this result is only the 512 feature channels of the right half, not including the left side). Because deconvolution can only enlarge the picture and cannot restore it, to reduce data loss, the picture from the corresponding down-sampling stage on the left is cropped to the same size and concatenated directly to add feature layers (these are the 512 feature channels copied from the left half), and convolution is then performed to extract features. Since each valid convolution shrinks the result by a border, the picture from the left down-sampling process must be cropped before each splicing. After the matrices are concatenated, the whole new feature matrix becomes 56×56×1024; it then passes through 512 convolution kernels, and after two convolutions a 52×52×512 feature matrix is obtained; up-sampling is then performed again, repeating the above process. Each layer performs two convolutions to extract features, and each up-sampling layer doubles the image size and halves the number of convolution kernels. The final up-sampling result is 388×388×64, that is, 64 feature layers in total, each of size 388×388.
In the last step, 2 1×1 convolution kernels are selected to turn the 64 feature channels into 2, that is, a final 388×388×2; this is in fact a binary classification operation that divides the picture into two categories, background and target.
In short, U-Net is used to realize the function of locating picture pixels: the network classifies each pixel in the image, and the final output is an image segmented according to pixel category. That is to say, a picture is input and, after the convolutional neural network computation, 6 feature maps are output: 5 key points and one average point.
Specifically, in the step of inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map. Of the 6 feature maps, the 5 face key points correspond to 5 output feature maps and the face average point to 1 output feature map; the output format is heat maps in all cases, where the heat indicates the probability that a face key point, or a face, exists at that point.
The face key point positions finally output by the overall model of this application are computed from high-resolution heat maps, which achieves the technical effect of improving accuracy and robustness.
S130: Apply the heat map maximum value suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector.
Specifically, the picture is input into the U-Net-based multi-face detection model, which outputs 6 feature maps, including 5 key point feature maps and one average point feature map; the positions of the face key points (5 of them) and the face average point are then obtained through the heat map maximum value suppression algorithm; the feature vector at the face average point position is extracted to obtain the expected positions of the face key points; and the expected key point positions are matched against the previously obtained true face key point positions, thereby associating the face with the above points (the face key points and the face average point).
Specifically, in a specific implementation process, applying the heat map maximum value suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector works by traversing every point of every heat map: if the value of a point is greater than the values of all its neighboring points, the point is retained; otherwise the point is removed, that is, its value is set to zero; a point is also removed if its value is below a threshold (for example 0.5). After the computation, the points remaining on each feature map are the face key point positions (or face average point positions) for that feature map.
S140: Use the obtained true face key point position data and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
FIG. 2 shows the flow of a preferred embodiment of determining the face key point positions in the multi-face detection method based on key point positioning of this application. Referring to FIG. 2,
the step of determining the face key point positions from the obtained true face key point position data and face average point position data with an association algorithm includes steps S210-S230.
In an exemplary embodiment, this is the algorithm that associates a face with its face key points. The face (average point) positions computed above are all the detected faces; however, the computed face key points have not yet been associated with the specific faces they belong to.
S210: Input the feature vector of the face average point position into the expected face key point position sub-model, and obtain the offset vectors from the face key points to the face average point through regression calculation.
Specifically, the feature vector of the face average point position is extracted and input into the expected face key point position sub-model for regression calculation, which outputs 10 values, representing the offsets of the (X_p, Y_p) coordinates of the face's 5 key points from the face average point position (each key point has offsets in the X and Y directions, so the 5 key points yield 10 offset values in total).
S220: Obtain the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point.
The face average point position plus the X and Y offsets of the 5 key points output by the expected face key point position sub-model gives the expected positions of the 5 key points belonging to that face. This is realized through another neural network sub-model.
S230: Through the Euclidean distance formula, select the expected face key point position closest to the true face key point positions as the finally output face key point position.
The expected positions of the face's 5 key points output by the expected face key point output sub-model are compared within the multi-face key point set output in step S210, and the closest points found are the finally output face key point positions for that face.
In short, the face average point first uniquely determines a face; the expected positions of the 5 key points are then found from the face average point; step 2 outputs the true key point positions; and the true key point closest in Euclidean distance to an expected key point is the key point uniquely associated with the corresponding face, as in the sketch below. That is to say, for face detection and face key point positioning, the face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs, further saving computation.
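An end-to-end sketch of the association over all detected faces follows. Here `offset_model` stands in for the expected face key point position sub-model (the regression network); it is a hypothetical callable mapping a face's average-point feature vector to 10 offset values, since the patent does not fix its implementation.

```python
import numpy as np

def detect_faces(avg_points, avg_features, true_keypoints, offset_model):
    """For every detected face average point: regress the 10 offsets
    from its feature vector, form the 5 expected key points, and snap
    each one to the nearest detected true key point."""
    true_kp = np.asarray(true_keypoints, dtype=np.float32)  # (N, 2)
    faces = []
    for avg, feat in zip(avg_points, avg_features):
        offsets = np.asarray(offset_model(feat), dtype=np.float32).reshape(5, 2)
        expected = np.asarray(avg, dtype=np.float32) + offsets
        d = np.linalg.norm(expected[:, None, :] - true_kp[None, :, :], axis=-1)
        faces.append(true_kp[d.argmin(axis=1)])             # 5 points per face
    return faces
```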
The multi-face detection method based on key point positioning of this application completes face detection and face key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, which makes it more suitable for mobile devices with limited computing resources such as mobile phones.
In a specific embodiment, the step of obtaining the face key point positions from the obtained true face key point positions and face average point position data with the association algorithm to complete face detection further includes: obtaining a face frame from the obtained true face key point positions and the face average point position feature vector to complete face detection. That is, a face frame is obtained from the face key points and the face average point obtained in step S130, and face detection is completed according to the face frame. Specifically, the coordinates of the upper-left corner, the coordinates of the lower-right corner, and the average point coordinates of the face rectangle are obtained from the feature vector of the face average point position, and the face rectangle is thereby determined.
In a specific implementation process, to improve accuracy, model training benefits from annotation information of more dimensions and stronger supervision signals (model training with annotated data is also called supervised learning); both theory and experimental data indicate that accuracy can be improved in this way. The step of obtaining a rectangular frame for the face (that is, cropping the face region for archiving) is retained.
In a specific embodiment, the feature vector of the face average point position is extracted and input into the expected face key point position sub-model (another fully connected neural network sub-model) for regression calculation, which outputs 4 values, representing the offsets in the X and Y directions from the upper-left corner of the face rectangle to the face average point position, (X_tl, Y_tl), and the offsets in the X and Y directions from the lower-right corner of the face rectangle to the face average point position, (X_rb, Y_rb); adding these 4 offset values to the coordinates of the face average point position (X_f, Y_f) yields the upper-left and lower-right corners of the face rectangle.
In short, in scenarios with sufficient computing resources and high accuracy requirements for the face detection result, to further improve detection accuracy, the coordinates of the upper-left corner, the lower-right corner, and the average point of the face rectangle are obtained from the feature vector of the face average point position; the face rectangle is thereby determined and face detection completed.
The multi-face detection method based on key point positioning of this application completes face detection and face key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps; moreover, the face key point positions finally output by the overall model are computed from high-resolution heat maps, so the result is more accurate and more robust.
FIG. 3 is a schematic diagram of the logical structure of the multi-face detection system based on key point positioning of this application. Referring to FIG. 3, to achieve the above objective, this application provides a multi-face detection system 300 based on key point positioning, including a U-Net-based multi-face detection model training unit 310, a feature map acquisition unit 320, a feature vector acquisition unit 330, and a face key point position acquisition unit 340; wherein,
the U-Net-based multi-face detection model training unit 310 is used to train a U-Net-based multi-face detection model with a data set; the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
The feature map acquisition unit 320 is used to input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; in this step, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
The feature vector acquisition unit 330 is used to apply the heat map maximum value suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector.
The face key point position acquisition unit 340 uses the obtained true face key point position data and the face average point position data with an association algorithm to determine the face key point positions, completing face detection. The face key point position acquisition unit is further configured to obtain a face frame from the obtained true face key point position data and the face average point position feature vector to complete face detection.
The heat-map-format feature maps obtained in the feature map acquisition unit 320 include 5 face key point feature maps and 1 face average point feature map.
In a specific embodiment, the face key point position acquisition unit 340 includes an offset vector acquisition module 341, an expected face key point position acquisition module 342, and a face key point position determination module 343;
the offset vector acquisition module 341 is used to input the feature vector of the face average point position into the expected face key point position sub-model and to obtain the offset vectors from the face key points to the face average point through regression calculation;
the expected face key point position acquisition module 342 is used to obtain the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point;
the face key point position determination module 343 is used to select, through the Euclidean distance formula, the expected face key point position closest to the true face key point positions as the finally output face key point position.
In summary, the multi-face detection method based on key point positioning of this application completes face detection and face key point positioning at the same time using a single model, without going through the face frame step, saving calculation processes and steps; moreover, the face key point positions finally output by the overall model are computed from high-resolution heat maps, so the result is more accurate and more robust.
This application provides a multi-face detection method based on key point positioning, applied to an electronic device 4.
FIG. 4 shows the application environment of a preferred embodiment of the multi-face detection method based on key point positioning of this application.
Referring to FIG. 4, in this embodiment, the electronic device 4 may be a terminal device with computing functions, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 4 includes a processor 42, a memory 41, a communication bus 43, and a network interface 44.
The memory 41 includes at least one type of readable storage medium. The readable storage medium may be non-volatile or volatile. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 4, such as the hard disk of the electronic device 4. In other embodiments, the readable storage medium may also be an external memory of the electronic device 4, such as a plug-in hard disk equipped on the electronic device 4, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card.
In this embodiment, the readable storage medium of the memory 41 is generally used to store the multi-face detection method program 40 based on key point positioning installed in the electronic device 4, and the like. The memory 41 can also be used to temporarily store data that has been output or will be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory 41 or to process data, for example to execute the multi-face detection method program 40 based on key point positioning.
The communication bus 43 is used to realize connection and communication between these components.
The network interface 44 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 4 and other electronic devices.
FIG. 4 only shows the electronic device 4 with components 41-44, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
Optionally, the electronic device 4 may also include a user interface, which may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones; optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 4 may also include a display, which may also be called a display screen or a display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, etc. The display is used for displaying information processed in the electronic device 4 and for displaying a visualized user interface.
Optionally, the electronic device 4 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
In the device embodiment shown in FIG. 4, the memory 41, as a computer storage medium, may include an operating system and the multi-face detection method program 40 based on key point positioning; when the processor 42 executes the multi-face detection method program 40 based on key point positioning stored in the memory 41, the following steps are implemented: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point position data and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
In other embodiments, the multi-face detection method program 40 based on key point positioning may also be divided into one or more modules, which are stored in the memory 41 and executed by the processor 42 to complete this application. A module referred to in this application is a series of computer program segments capable of completing a specific function. The multi-face detection method program 40 based on key point positioning may be divided into a U-Net-based multi-face detection model training unit 310, a feature map acquisition unit 320, a feature vector acquisition unit 330, and a face key point position acquisition unit 340.
In addition, this application also proposes a computer-readable storage medium, mainly including a storage data area and a storage program area, where the storage data area can store data created according to the use of blockchain nodes, etc., and the storage program area can store an operating system and at least one application program required by a function; the computer-readable storage medium includes a multi-face detection method program based on key point positioning, and when the multi-face detection method program based on key point positioning is executed by a processor, operations such as those of the multi-face detection method based on key point positioning are realized. The computer-readable storage medium may be non-volatile or volatile.
The specific implementation of the computer-readable storage medium of this application is substantially the same as the specific implementation of the multi-face detection method, system, and electronic device based on key point positioning described above, and will not be repeated here.
In general, the multi-face detection method, system, electronic device, and computer-readable storage medium based on key point positioning of this application use a single model to complete face detection and face key point positioning at the same time, saving calculation processes and steps, speeding up the response time of the final application, and reducing computing consumption, which makes them more suitable for mobile devices with limited computing resources such as mobile phones; face detection and face key point positioning are performed directly from the face key points and the face average point, obtaining the information the application actually needs, while the face frame is only an incidental output, and the sub-networks related to the face frame can be cropped away according to actual needs, further saving computation; compared with the prior-art approach of computing key point positions with an MTCNN network and numerical regression, the face key point positions finally output by the overall model of this application are computed from high-resolution heat maps, achieving the technical effect of improving accuracy and robustness.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
It should be noted that, in this document, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments. Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) as described above and includes several programs to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application; any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.
Claims (20)
- A multi-face detection method based on key point positioning, applied to an electronic device, wherein the method includes: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point positions and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
- The multi-face detection method based on key point positioning according to claim 1, wherein the step of obtaining the face key point positions from the obtained true face key point positions and face average point position data with the association algorithm to complete face detection further includes: obtaining a face frame from the obtained true face key point positions and the face average point position feature vector to complete face detection.
- The multi-face detection method based on key point positioning according to claim 1, wherein the step of determining the face key point positions from the obtained true face key point position data and face average point position data with the association algorithm includes: inputting the feature vector of the face average point position into the expected face key point position sub-model, and obtaining the offset vectors from the face key points to the face average point through regression calculation; obtaining the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point; and selecting, through the Euclidean distance formula, the expected face key point position closest to the true face key point positions as the finally output face key point position.
- The multi-face detection method based on key point positioning according to claim 1, wherein the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
- The multi-face detection method based on key point positioning according to claim 4, wherein the data set includes one or more of the AFLW face database, the COFW face database, the MVFW face database, or the OCFW face database.
- The multi-face detection method based on key point positioning according to claim 1, wherein, in the step of inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
- The multi-face detection method based on key point positioning according to claim 6, wherein the U-Net-based multi-face detection model uses the ReLU activation function to output feature channels.
- A multi-face detection system based on key point positioning, including a U-Net-based multi-face detection model training unit, a feature map acquisition unit, a feature vector acquisition unit, and a face key point position acquisition unit; wherein the U-Net-based multi-face detection model training unit is used to train a U-Net-based multi-face detection model with a data set; the feature map acquisition unit is used to input the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; the feature vector acquisition unit is used to apply the heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and the face key point position acquisition unit uses the obtained true face key point position data and the face average point position data with an association algorithm to determine the face key point positions, completing face detection.
- The multi-face detection system based on key point positioning according to claim 8, wherein the face key point position acquisition unit is further configured to obtain a face frame from the obtained true face key point position data and the face average point position feature vector to complete face detection.
- The multi-face detection system based on key point positioning according to claim 8, wherein the face key point position acquisition unit includes an offset vector acquisition module, an expected face key point position acquisition module, and a face key point position determination module; the offset vector acquisition module is used to input the feature vector of the face average point position into the expected face key point position sub-model and obtain the offset vectors from the face key points to the face average point through regression calculation; the expected face key point position acquisition module is used to obtain the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point; and the face key point position determination module is used to select, through the Euclidean distance formula, the expected face key point position closest to the true face key point positions as the finally output face key point position.
- The multi-face detection system based on key point positioning according to claim 8, wherein the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
- The multi-face detection system based on key point positioning according to claim 8, wherein, in the step of inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
- The multi-face detection system based on key point positioning according to claim 8, wherein the heat-map-format feature maps obtained in the feature map acquisition unit include 5 face key point feature maps and 1 face average point feature map.
- A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, a multi-face detection method based on key point positioning is implemented, the method including: training a U-Net-based multi-face detection model with a data set; inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format; applying a heat map maximum suppression algorithm to the obtained heat-map-format feature maps to obtain the true face key point positions and the face average point position feature vector; and using the obtained true face key point positions and the face average point position feature vector with an association algorithm to determine the face key point positions, completing face detection.
- The computer-readable storage medium according to claim 14, wherein the step of obtaining the face key point positions from the obtained true face key point positions and face average point position data with the association algorithm to complete face detection further includes: obtaining a face frame from the obtained true face key point positions and the face average point position feature vector to complete face detection.
- The computer-readable storage medium according to claim 14, wherein the step of determining the face key point positions from the obtained true face key point position data and face average point position data with the association algorithm includes: inputting the feature vector of the face average point position into the expected face key point position sub-model, and obtaining the offset vectors from the face key points to the face average point through regression calculation; obtaining the expected face key point positions from the feature vector of the face average point position and the offset vectors of the face average point; and selecting, through the Euclidean distance formula, the expected face key point position closest to the true face key point positions as the finally output face key point position.
- The computer-readable storage medium according to claim 14, wherein the data set is composed of two groups of data: a data set annotated only with face key points, and a data set annotated with both face key points and face frames.
- The computer-readable storage medium according to claim 14, wherein, in the step of inputting the image to be detected into the trained U-Net-based multi-face detection model to obtain feature maps in heat map format, the obtained heat-map-format feature maps include 5 face key point feature maps and 1 face average point feature map.
- The computer-readable storage medium according to claim 18, wherein the U-Net-based multi-face detection model uses the ReLU activation function to output feature channels.
- An electronic device, wherein the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a program executable by the at least one processor, and the program is executed by the at least one processor so that the at least one processor can execute the multi-face detection method based on key point positioning according to any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011263174.9A (CN112380978B) | 2020-11-12 | 2020-11-12 | Multi-face detection method, system and storage medium based on key point positioning
CN202011263174.9 | 2020-11-12 | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021190664A1 (zh) | 2021-09-30
Family
ID=74583510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/084307 WO2021190664A1 (zh) | 2020-11-12 | 2021-03-31 | 基于关键点定位的多人脸检测方法、系统及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112380978B (zh) |
WO (1) | WO2021190664A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113989568A (zh) * | 2021-10-29 | 2022-01-28 | 北京百度网讯科技有限公司 | Target detection method, training method, apparatus, electronic device and storage medium
CN115205951A (zh) * | 2022-09-16 | 2022-10-18 | 深圳天海宸光科技有限公司 | Method for generating face key point data for masked faces
CN117523636A (zh) * | 2023-11-24 | 2024-02-06 | 北京远鉴信息技术有限公司 | Face detection method, apparatus, electronic device and storage medium
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112380978B (zh) | 2024-05-07 | 平安科技(深圳)有限公司 | Multi-face detection method, system and storage medium based on key point positioning
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858466A (zh) * | 2019-03-01 | 2019-06-07 | 北京视甄智能科技有限公司 | Face key point detection method and device based on a convolutional neural network
CN110516642A (zh) * | 2019-08-30 | 2019-11-29 | 电子科技大学 | Lightweight face 3D key point detection method and system
CN111914782A (zh) * | 2020-08-10 | 2020-11-10 | 河南威虎智能科技有限公司 | Method, device, electronic device and storage medium for detecting faces and their feature points
CN112380978A (zh) * | 2020-11-12 | 2021-02-19 | 平安科技(深圳)有限公司 | Multi-face detection method, system and storage medium based on key point positioning
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684911B (zh) * | 2018-10-30 | 2021-05-11 | 百度在线网络技术(北京)有限公司 | Expression recognition method and device, electronic device and storage medium
- 2020-11-12: CN application CN202011263174.9A filed (granted as CN112380978B, status: Active)
- 2021-03-31: PCT application PCT/CN2021/084307 filed (WO2021190664A1, status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN112380978B (zh) | 2024-05-07 |
CN112380978A (zh) | 2021-02-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21774899; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 21774899; Country of ref document: EP; Kind code of ref document: A1