WO2021179471A1 - Face blur detection method and apparatus, computer device, and storage medium - Google Patents

Face blur detection method and apparatus, computer device, and storage medium

Info

Publication number
WO2021179471A1
WO2021179471A1 (PCT/CN2020/097009)
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
block
layer
face image
Prior art date
Application number
PCT/CN2020/097009
Other languages
English (en)
French (fr)
Inventor
张奔奔
杭欣
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司 and 苏宁云计算有限公司
Priority to CA3174691A1
Publication of WO2021179471A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Definitions

  • the present invention relates to the technical field of computer vision, in particular to a method, device, computer equipment and storage medium for detecting the ambiguity of a human face.
  • face recognition technology is becoming increasingly important, in applications such as face-scan payment and face-scan gates, which greatly facilitate people's lives.
  • the quality of the face image input to the face recognition model will affect the recognition effect. It is particularly important to screen these face images reasonably, such as discarding the images with too high a degree of blur.
  • the detection of face ambiguity mainly includes two methods: full reference and no reference:
  • the full reference needs to use the original face image before degradation as a reference to compare with the blurred image.
  • the disadvantage of this method is that the original face image before degradation is not easy to obtain;
  • the traditional no-reference method is to input an image containing the face and the background, and use a gradient function such as the Brenner, Tenengrad, or Laplacian algorithm to calculate the gradient value of the face area.
  • the larger the gradient value, the clearer the contour of the face, that is, the clearer the face image; the smaller the gradient value, the more blurred the contour, that is, the more blurred the face image.
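The gradient-based approach described above can be sketched as follows. This is an illustrative implementation of the Laplacian variant only, not the patent's own code; the 3x3 kernel, the plain-NumPy convolution, and the name `laplacian_variance` are choices made for the example.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian response over a grayscale image.

    A higher variance means stronger edge energy, i.e. a sharper image;
    a lower variance indicates a blurred image.
    """
    # Standard 3x3 Laplacian kernel, correlated over the image interior.
    k = np.array([[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]], dtype=np.float64)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())

# A flat patch has no gradient energy; a checkerboard has a lot.
flat = np.full((32, 32), 128.0)
checker = np.indices((32, 32)).sum(axis=0) % 2 * 255.0
assert laplacian_variance(flat) == 0.0
assert laplacian_variance(checker) > laplacian_variance(flat)
```

In practice a library routine (e.g. an OpenCV Laplacian followed by `.var()`) would replace the hand-rolled loop; the loop is shown only to make the computation explicit.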
  • neural networks have a powerful ability to extract image features, and the use of deep learning methods to detect the ambiguity of human faces has appeared, and some progress has been made accordingly.
  • the deep learning method usually classifies face images into two categories, blurred and clear; experiments show that some clear face images are still judged to be blurred, which cannot meet the requirement of high detection accuracy.
  • the present invention provides a method, device, computer equipment and storage medium for detecting facial ambiguity, which can effectively improve the accuracy of facial ambiguity detection.
  • the specific technical solutions provided by the embodiments of the present invention are as follows:
  • a method for detecting ambiguity of a human face includes:
  • a pre-trained blur detection model is used to predict each of the block images respectively to obtain the confidence that each block image corresponds to each of multiple level labels, wherein the multiple level labels include multiple sharpness levels and multiple blurriness levels;
  • the sharpness and blur degree of each block image are obtained according to the confidence that the block image corresponds to each of the multiple level labels; and
  • the blur degree of the face image is calculated according to the sharpness and blur degree of all the block images.
  • the extracting from the face image the feature block images where the multiple face feature points are located respectively includes:
  • the size of the face area is adjusted to a preset size, and the block image where each feature point of the face is located is extracted from the adjusted face area.
  • the ambiguity detection model is obtained by training in the following method:
  • the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer that are sequentially cascaded;
  • the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are respectively connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are respectively connected to the concatenation layer.
  • the method further includes:
  • the method further includes:
  • the face image is a blurred image; otherwise, it is determined that the face image is a clear image.
  • a device for detecting ambiguity of a human face includes:
  • the extraction module is used to extract the block images where multiple facial feature points are respectively located from the face image
  • the prediction module is used to predict each block image separately through a pre-trained blur detection model, and obtain the confidence that each block image corresponds to each of the multiple level labels, wherein the multiple level labels include multiple sharpness levels and multiple blurriness levels;
  • An obtaining module configured to calculate the definition and blurriness of each block image according to the confidence that each block image corresponds to each of the multiple level tags
  • the calculation module is used to calculate the blur degree of the face image according to the definition and blur degree of all the block images.
  • extraction module is specifically used for:
  • the size of the face area is adjusted to a preset size, and the block image where each feature point of the face is located is extracted from the adjusted face area.
  • the device further includes a training module, and the training module is specifically configured to:
  • the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer that are sequentially cascaded;
  • the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are respectively connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are respectively connected to the concatenation layer.
  • training module is specifically used for:
  • the device further includes a judgment module, and the judgment module is specifically configured to:
  • the face image is a blurred image; otherwise, it is determined that the face image is a clear image.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor; when the processor executes the computer program, the face blur detection method described in the first aspect is implemented.
  • a computer-readable storage medium stores a computer program that, when executed by a processor, realizes the face blur detection method as described in the first aspect.
  • the present invention extracts the block images where multiple facial feature points are located from the face image, uses a pre-trained blur detection model to predict the confidence that each block image corresponds to each of multiple level labels, obtains the sharpness and blur degree of each block image from those confidences, and finally calculates the blur degree of the face image from the sharpness and blur degree of all the block images.
  • because the multiple level labels include multiple sharpness levels and multiple blurriness levels, the present invention converts what the prior art treats as a binary classification of face block images (blurred vs. clear) into a multi-class problem and only afterwards reduces the result to a binary blur decision; this effectively avoids clear images being misjudged as blurred and thereby further improves the accuracy of image blur detection.
  • FIG. 1 is a flowchart of a method for detecting ambiguity of a human face according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the structure of a deep neural network provided by an embodiment of the present invention.
  • 4a to 4c are ROC curve diagrams of the ambiguity detection model of the embodiment of the present invention on different test sets;
  • FIG. 5 is a structural diagram of a face ambiguity detection device provided by an embodiment of the present invention.
  • Fig. 6 is an internal structure diagram of a computer device provided by an embodiment of the present invention.
  • Fig. 1 is a flowchart of a method for detecting ambiguity of a face according to an embodiment of the present invention. As shown in Fig. 1, the method may include:
  • Step 101 Extract block images in which a plurality of facial feature points are respectively located from a face image.
  • the face region is detected on the face image, and the block image where the multiple facial feature points are respectively located is extracted from the face region.
  • the facial feature points can include feature points corresponding to the left pupil, right pupil, nose tip, left mouth corner, and right mouth corner, and can also be other feature points, such as feature points corresponding to eyebrows.
  • the block images where multiple facial feature points are located are respectively extracted from the face image, with different facial feature points contained in different block images, so that multiple block images can be extracted, for example, the left-eye block image containing the left pupil, the right-eye block image containing the right pupil, and so on.
  • each block image is separately predicted by the pre-trained blur detection model, and the confidence of each block image corresponding to each of the multiple level labels is obtained, where the multiple level labels include Multiple levels of sharpness and multiple levels of ambiguity.
  • the confidence that a certain block image corresponds to a certain level label is used to indicate the probability that the block image corresponds to the level label.
  • the sharpness is pre-divided into three levels according to the degree of clarity, including heavily clear, moderately clear, and lightly clear, with corresponding level labels 0, 1, and 2;
  • the blurriness is divided from light to heavy into three levels, including lightly blurred, moderately blurred, and heavily blurred, with corresponding level labels 3, 4, and 5. It can be understood that neither the number of sharpness levels nor the number of blurriness levels is limited to three, and the embodiment of the present invention does not specifically limit this.
  • each block image is sequentially input into the blurriness detection model for prediction, and the confidence that each block image output by the blurriness detection model corresponds to each of the multiple class labels is obtained.
  • Step 103 Obtain the sharpness and blurriness of each block image according to the confidence that each block image corresponds to each level label of the multiple level labels.
  • the sharpness and blurriness of each block image are calculated from the confidences of that block image over the multiple level labels.
  • the confidences of the block image over all the sharpness labels can be directly accumulated to obtain the sharpness of the block image, and the confidences over all the blurriness labels can be directly accumulated to obtain the blurriness of the block image; other calculation methods can also be used, which the embodiment of the present invention does not specifically limit.
  • for example, suppose the confidences of the left-eye block image of a certain face image over the 6 level labels are: label "0" with probability 0, label "1" with probability 0.9, label "2" with probability 0.05, label "3" with probability 0.05, and labels "4" and "5" both with probability 0.
  • directly accumulating the confidences of the left-eye block image over all the sharpness labels gives a sharpness of 0.95; accumulating the confidences over all the blurriness labels gives a blurriness of 0.05.
  • Step 104 Calculate the blur degree of the face image according to the sharpness and blur degree of all the block images.
  • the sharpness values of all block images are accumulated and divided by the number of block images to obtain the sharpness of the face image; the blur values of all block images are accumulated and divided by the number of block images to obtain the blur degree of the face image.
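Steps 103 and 104 amount to simple sums and an average. A minimal sketch, assuming the six-label scheme described earlier (labels 0 to 2 for sharpness, 3 to 5 for blurriness); the function names are illustrative:

```python
def block_scores(conf):
    """conf: confidences over level labels 0..5 (0-2 = sharpness, 3-5 = blurriness).

    Returns (sharpness, blurriness) for one block image (step 103)."""
    return sum(conf[0:3]), sum(conf[3:6])

def face_blurriness(block_confs):
    """Step 104: average the per-block blurriness over all block images."""
    return sum(block_scores(c)[1] for c in block_confs) / len(block_confs)

# The left-eye example from the text: labels 0..5 -> [0, 0.9, 0.05, 0.05, 0, 0].
sharp, blur = block_scores([0, 0.9, 0.05, 0.05, 0, 0])
assert abs(sharp - 0.95) < 1e-9 and abs(blur - 0.05) < 1e-9
```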
  • An embodiment of the present invention provides a face blur detection method that extracts the block images where multiple facial feature points are located from a face image, uses a pre-trained blur detection model to predict the confidence that each block image corresponds to each of multiple level labels, obtains the sharpness and blur degree of each block image from those confidences, and calculates the blur degree of the face image from the sharpness and blur degree of all the block images. By this block-prediction idea, the blur degree of multiple block images in the face image is predicted separately and the predicted results are then combined.
  • because the multiple level labels include multiple sharpness levels and multiple blurriness levels, the present invention converts the binary classification problem into a multi-class problem and then reduces it back to a binary decision to obtain the blur result, which effectively avoids clear images being misjudged as blurred, thereby further improving the accuracy of image blur detection.
  • the above-mentioned extracting feature block images where multiple facial feature points are respectively located from the face image may include:
  • the face image is detected to locate the face area and multiple facial feature points; the size of the face area is adjusted to a preset size, and the block image where each facial feature point is located is extracted from the adjusted face area.
  • the trained MTCNN (multi-task convolutional neural network) face detection model is used to detect the face image and locate the face area and multiple facial feature points.
  • the MTCNN face detection model includes the P-Net, R-Net, and O-Net network layers, which are respectively responsible for generating candidate detection boxes, refining the detection boxes, and locating facial feature points; the MTCNN face detection model can be trained with reference to prior-art model training methods, which will not be repeated here.
  • the size of the face area is scaled to the preset size, and the coordinates of each facial feature point are converted from the face image to the adjusted face area.
  • taking each facial feature point as the center, the region is expanded outward by a fixed number of pixels to obtain multiple rectangular block images, with out-of-bounds regions handled accordingly.
  • for example, the preset size is 184*184, and each facial feature point is taken as the center and expanded by 24 pixels in each direction to form a 48*48 block image.
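The block extraction might look like the following sketch. The patent does not specify how out-of-bounds windows are handled; here the window is shifted back inside the image, which is one common choice, and the function name is illustrative:

```python
def extract_block(face, cx, cy, radius=24):
    """Crop a (2*radius) x (2*radius) block centered on a feature point.

    When the window crosses an image border, it is shifted back inside the
    image so every extracted block has the same size."""
    h, w = len(face), len(face[0])
    size = 2 * radius
    x0 = min(max(cx - radius, 0), w - size)   # clamp the top-left corner
    y0 = min(max(cy - radius, 0), h - size)
    return [row[x0:x0 + size] for row in face[y0:y0 + size]]

# 184x184 face area, 24-pixel expansion -> 48x48 blocks, even near a corner.
face = [[0] * 184 for _ in range(184)]
block = extract_block(face, cx=5, cy=180, radius=24)
assert len(block) == 48 and len(block[0]) == 48
```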
  • the above-mentioned ambiguity detection model is obtained by training in the following method, including the steps:
  • Step 201 Extract a block image sample where each face feature point is located from a face image sample, where the face image sample includes clear face image samples with different sharpness levels and blurred face image samples with different blurriness levels.
  • face image samples with three sharpness levels and three blurriness levels are collected, each level containing a certain number of face image samples (for example, 200); the face area is then detected on each face image sample, and the block image samples where each facial feature point is located are extracted from the face area.
  • the trained MTCNN face detection model can be used to detect the face area and locate the facial feature points. Since the sizes of the image samples are inconsistent, the sizes of the detected face areas are also inconsistent; therefore, after the face area is obtained, it is uniformly scaled to the preset size, and the coordinates of each facial feature point are converted from the face image accordingly.
  • the preset size is 184*184; the left pupil, right pupil, nose tip, left mouth corner, and right mouth corner are selected as the facial feature points, and each feature point is taken as the center and expanded by 24 pixels in each direction to form a 48*48 block image sample, which is saved. In this way, by processing a small number of face image samples, five times as many block image samples can be generated for model training.
  • Step 202 Mark each block image sample with a corresponding level label, and divide the multiple block image samples marked with the level label into a training set and a verification set.
  • each block image sample is manually labeled with the corresponding level label; that is, through manual review, each sample is assigned to the correct category according to its degree of sharpness or blur: heavily clear is labeled 0, moderately clear 1, lightly clear 2, lightly blurred 3, moderately blurred 4, and heavily blurred 5. The labeled block image samples are then divided into a training set and a validation set according to a preset ratio (for example, 9:1); the training set is used to train the model parameters, and the validation set is used to evaluate the model during training.
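The labeling and 9:1 split can be sketched as below. A plain random split is shown; in practice a stratified split per level label may be preferable. All names and the sample counts in the usage example are illustrative:

```python
import random

def split_samples(samples, train_ratio=0.9, seed=0):
    """Shuffle labeled block samples and split into train/validation sets (step 202)."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    samples = samples[:]               # do not mutate the caller's list
    rng.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# e.g. 6 levels x 200 face samples x 5 feature points = 6000 block samples, split 9:1.
samples = [(f"block_{i}.png", i % 6) for i in range(6000)]
train, val = split_samples(samples)
assert len(train) == 5400 and len(val) == 600
```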
  • Step 203 Perform iterative training on the pre-built deep neural network according to the training set and the verification set to obtain a ambiguity detection model.
  • the pre-built deep neural network is trained, and the trained deep neural network is verified according to the verification set. If When the verification result does not meet the iteration stop condition, iterative training and verification of the deep neural network is continued until the verification result meets the iteration stop condition, and the ambiguity detection model is obtained.
  • the training set and validation set are packed into LMDB-format data, and the pre-built deep neural network structure is saved in a file with the ".prototxt" suffix.
  • the batch size used for reading data can be set to a reasonable value according to the hardware performance.
  • the number of validation iterations and the test interval are set to 50 and 100 respectively, and these parameters can be adjusted.
  • the model is then trained, and a model file with the ".caffemodel" suffix is obtained.
  • the present invention uses the Caffe deep learning framework; the procedure is similar when using other deep learning frameworks.
  • training deep learning models requires tens of thousands or even hundreds of thousands of training samples, but in actual production, real blurred samples are very limited.
  • Gaussian-blur or motion-blur samples generated by image processing differ noticeably from real samples. The present invention therefore collects clear face image samples with different sharpness levels and blurred face image samples with different blurriness levels, extracts from these samples the block image samples where multiple facial feature points are located, and marks them with the corresponding level labels; the labeled block image samples are then used to train the constructed deep neural network, so that a small number of face image samples yields many real training samples, further ensuring model performance and effectively improving the accuracy of image blur detection.
  • the present invention converts the binary classification problem into a multi-class problem, which greatly reduces the interference of the two extreme (clearly sharp and clearly blurred) samples; by paying full attention to the hard-to-separate samples, it obtains better detection results than directly binary-classifying samples of unclear sharpness or blur level, effectively avoiding clear images being misjudged as blurred and further improving the accuracy of image blur detection.
  • the above-mentioned deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer that are sequentially cascaded.
  • the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are respectively connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are respectively connected to the concatenation layer.
  • FIG. 3 is a schematic structural diagram of a deep neural network provided by an embodiment of the present invention.
  • first is the data input layer, whose function is to pack the data and input it into the network in small batches; next is a convolutional layer, followed by two parallel pooling layers: a maximum pooling (Max pooling) layer and a minimum pooling (Min pooling) layer, where max pooling retains the most prominent features and min pooling retains the least prominent features.
  • the first fully connected layer is used to classify the input block image features, and the ReLU activation function in the activation function layer zeroes out neuron outputs less than 0, which induces sparsity.
  • the Dropout layer randomly drops a small number of parameters each time the model is trained, increasing the generalization ability of the model.
  • Next is a fully connected layer that outputs the score value of each sharpness level and each ambiguity level.
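The distinctive part of this feature extraction layer is the parallel max/min pooling whose outputs are concatenated before the fully connected layers. A NumPy sketch of that idea follows; a stand-in 4x4 feature map replaces the real convolutional output, and the 2x2 pooling window is an assumption, since the text does not give kernel sizes:

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 non-overlapping pooling over an (H, W) feature map."""
    h, w = x.shape
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.min(axis=(1, 3))

def feature_extraction(conv_out):
    """Concatenate max-pooled (most prominent) and min-pooled (least prominent)
    responses of the convolutional output, as in the patent's feature layer."""
    mx = pool2x2(conv_out, "max")
    mn = pool2x2(conv_out, "min")
    return np.concatenate([mx.ravel(), mn.ravel()])

fmap = np.arange(16, dtype=float).reshape(4, 4)   # stand-in conv output
feat = feature_extraction(fmap)
assert feat.shape == (8,)                          # 2x2 max + 2x2 min, flattened
assert feat[0] == 5.0 and feat[4] == 0.0           # top-left block: max 5, min 0
```

In a real framework, min pooling can be expressed as a negated max pooling of the negated input, which is a common trick when only max pooling is provided.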
  • the method may further include:
  • multiple test sets are constructed, where each test set is built by extracting the block image test samples where each facial feature point is located from face image test samples.
  • for the specific extraction process, please refer to step 201, which will not be repeated here.
  • blur prediction is performed on each block image test sample in each test set based on the blur detection model to obtain prediction results; the ROC (receiver operating characteristic) curve corresponding to each test set is drawn according to the prediction results and the preset thresholds, and the optimal threshold is obtained by analyzing the ROC curve corresponding to each test set.
  • 138669 clear face images, 2334 semi-clear face images, 19050 clear face images from small security-camera images, and 1446 blurred face images were collected and combined into three image sets: clear face images with blurred face images, semi-clear face images with blurred face images, and security-camera clear face images with blurred face images.
  • the block image test samples where the facial feature points are located are extracted from the face images in the three image sets to form three test sets; the blur detection model is then used to predict each test set, and the ROC curve is drawn according to the prediction results of each block image test sample in each test set and the preset thresholds.
  • Figure 4a shows the ROC curve of the blur detection model on the test set formed by clear and blurred face images;
  • Figure 4b shows the ROC curve of the blur detection model on the test set formed by security-camera clear and blurred face images;
  • Figure 4c shows the ROC curve of the blur detection model on the test set formed by semi-clear and blurred face images.
  • three preset thresholds can be set through expert experience, from low to high 0.19, 0.39, and 0.79; after analyzing the ROC curves, 0.39 is selected as the optimal threshold. Testing with the threshold 0.39 on the test set of clear and blurred faces, the accuracy of the results reaches 99.3%.
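Threshold selection from ROC analysis boils down to computing true/false positive rates at each candidate threshold. A toy sketch with made-up scores (not the patent's data); only the candidate thresholds 0.19, 0.39, and 0.79 come from the text:

```python
def tpr_fpr(scores, labels, thresh):
    """True/false positive rate when blur score >= thresh means 'blurred' (label 1)."""
    tp = sum(s >= thresh for s, y in zip(scores, labels) if y == 1)
    fp = sum(s >= thresh for s, y in zip(scores, labels) if y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# Toy blur scores for three clear (label 0) and three blurred (label 1) faces.
scores = [0.05, 0.10, 0.30, 0.45, 0.80, 0.95]
labels = [0, 0, 0, 1, 1, 1]
# Sweep the candidate thresholds; the ROC curve plots the (fpr, tpr) pairs.
points = {t: tpr_fpr(scores, labels, t) for t in (0.19, 0.39, 0.79)}
assert points[0.39] == (1.0, 0.0)   # separates this toy data perfectly
```

The optimal threshold is the one whose (fpr, tpr) point lies closest to the ideal corner (0, 1) across the test sets.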
  • the method may further include:
  • the optimal threshold is used as the criterion for judging whether a face image is a blurred image.
  • if the blur degree of a face image exceeds the threshold, the image is determined to be blurred; blurred images can thus be detected automatically, improving image quality.
  • Fig. 5 is a structural diagram of a face blur detection device provided by an embodiment of the present invention. As shown in Fig. 5, the device includes:
  • the extraction module 51 is configured to extract from the face image the block images where the multiple facial feature points are respectively located;
  • the prediction module 52 is used to predict each block image separately through a pre-trained blur detection model, and obtain the confidence that each block image corresponds to each of the multiple level labels, wherein the multiple level labels include multiple sharpness levels and multiple blurriness levels;
  • the obtaining module 53 is configured to calculate the sharpness and blurriness of each block image according to the confidence that each block image corresponds to each level label of the multiple level labels;
  • the calculation module 54 is used to calculate the blur degree of the face image according to the sharpness and blur degree of all the block images.
  • the extraction module 51 is specifically used for:
  • the size of the face area is adjusted to a preset size, and the block image where each face feature point is located is extracted from the adjusted face area.
  • the device further includes a training module 50, and the training module 50 is specifically used for:
  • the pre-built deep neural network is iteratively trained to obtain the ambiguity detection model.
  • the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, which are sequentially cascaded; the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer.
  • the data input layer, the maximum pooling layer, and the minimum pooling layer are respectively connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are respectively connected to the concatenation layer.
  • the training module 50 is specifically used to:
  • the device further includes a judgment module 55, and the judgment module 55 is specifically configured to:
  • the face image is a blurred image; otherwise, it is determined that the face image is a clear image.
  • the face blur detection device of this embodiment belongs to the same concept as the face blur detection method embodiment above; for its specific implementation process and beneficial effects, please refer to the method embodiment, which will not be repeated here.
  • Fig. 6 is an internal structure diagram of a computer device provided by an embodiment of the present invention.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, and a network interface connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, a face blurriness detection method is implemented.
  • FIG. 6 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation on the computer device to which the solution of the present invention is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer device including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer program:
  • each block image is predicted separately with the pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels;
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • each block image is predicted separately with the pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels;
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a face blurriness detection method, device, computer equipment, and storage medium, belonging to the technical field of computer vision. The method includes: extracting, from a face image, the block images in which each of multiple facial landmarks is located; predicting each block image separately with a pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels; obtaining the clarity and blurriness of each block image according to the confidence of that block image corresponding to each of the multiple level labels; and computing the blurriness of the face image according to the clarity and blurriness of all the block images. Embodiments of the present invention can effectively improve the accuracy of face blurriness detection.

Description

Face blurriness detection method, device, computer equipment and storage medium. Technical Field
The present invention relates to the technical field of computer vision, and in particular to a face blurriness detection method, device, computer equipment, and storage medium.
Background
With the arrival of the artificial intelligence era, face recognition technology has become increasingly important. Applications such as face-scan payment and face-scan gate passage greatly facilitate people's lives. However, the quality of the face images fed into a face recognition model affects recognition performance, so it is particularly important to screen these images reasonably, for example by discarding images with an excessive degree of blur.
Current face blurriness detection mainly comprises two kinds of methods, full-reference and no-reference:
(1) Full-reference methods require the original face image before degradation as a reference to compare against the blurred image; their drawback is that the original, undegraded face image is difficult to obtain;
(2) No-reference methods require no reference image and judge blurriness directly on the face image, which gives them wider applicability.
Full-reference blurriness detection first requires an undegraded reference image, which rules out many application scenarios; moreover, since the faces captured from a camera are used directly for blurriness judgment, treating them as reference images is unrealistic, so no-reference blurriness detection is widely adopted.
In the traditional no-reference approach, an image containing a face and background is input; to exclude background interference, the face region is detected first, and a gradient value over the face region is then computed with a gradient function such as Brenner, Tenengrad, or the Laplacian. The larger the gradient value, the sharper the face contour and thus the clearer the face image; conversely, the smaller the gradient value, the blurrier the contour and the image. This method works for a small number of face images but fails on large batches, where many clear images are judged blurred, so detection accuracy is low.
In addition, with the rise of deep learning and the strong ability of neural networks to extract image features, deep-learning methods have been applied to face blurriness detection, with some corresponding progress. The usual deep-learning approach divides face block images into only two classes, blurred and clear; experiments show that some clear face images are still judged blurred, failing to meet high-accuracy detection requirements.
Summary of the Invention
To solve at least one of the problems mentioned in the background above, the present invention provides a face blurriness detection method, device, computer equipment, and storage medium that can effectively improve the accuracy of face blurriness detection. The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a face blurriness detection method is provided, the method comprising:
extracting, from a face image, the block images in which each of multiple facial landmarks is located;
predicting each of the block images separately with a pre-trained blurriness detection model to obtain the confidence of each of the block images corresponding to each of multiple level labels, wherein the multiple level labels include multiple clarity levels and multiple blurriness levels;
obtaining the clarity and blurriness of each of the block images according to the confidence of that block image corresponding to each of the multiple level labels;
computing the blurriness of the face image according to the clarity and blurriness of all the block images.
Further, extracting from the face image the block images in which the multiple facial landmarks are located includes:
detecting the face image to locate a face region and multiple facial landmarks;
resizing the face region to a preset size, and extracting from the resized face region the block image in which each of the facial landmarks is located.
Further, the blurriness detection model is trained as follows:
extracting, from multiple face image samples, the block image sample in which each of the facial landmarks is located, wherein the multiple image samples include clear face image samples and blurred face image samples;
labeling each of the block image samples with a corresponding level label, and dividing the labeled block image samples into a training set and a validation set;
iteratively training a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
Further, the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, cascaded in sequence; the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are each connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are each connected to the concatenation layer.
Further, the method further includes:
computing an optimal threshold for the blurriness detection model on different test sets according to ROC curves.
Further, after the step of computing the blurriness of the face image according to the clarity and blurriness of all the block images, the method further includes:
judging whether the computed blurriness of the face image is higher than the optimal threshold;
if so, determining that the face image is a blurred image; otherwise, determining that the face image is a clear image.
In a second aspect, a face blurriness detection device is provided, the device comprising:
an extraction module, configured to extract from a face image the block images in which each of multiple facial landmarks is located;
a prediction module, configured to predict each of the block images separately with a pre-trained blurriness detection model to obtain the confidence of each of the block images corresponding to each of multiple level labels, wherein the multiple level labels include multiple clarity levels and multiple blurriness levels;
an acquisition module, configured to compute the clarity and blurriness of each of the block images according to the confidence of that block image corresponding to each of the multiple level labels;
a computation module, configured to compute the blurriness of the face image according to the clarity and blurriness of all the block images.
Further, the extraction module is specifically configured to:
detect the face image to locate a face region and multiple facial landmarks;
resize the face region to a preset size, and extract from the resized face region the block image in which each of the facial landmarks is located.
Further, the device further includes a training module, and the training module is specifically configured to:
extract, from multiple face image samples, the block image sample in which each of the facial landmarks is located, wherein the multiple image samples include clear face image samples and blurred face image samples;
label each of the block image samples with a corresponding level label, and divide the labeled block image samples into a training set and a validation set;
iteratively train a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
Further, the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, cascaded in sequence; the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are each connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are each connected to the concatenation layer.
Further, the training module is further specifically configured to:
compute an optimal threshold for the blurriness detection model on different test sets according to ROC curves.
Further, the device further includes a judgment module, and the judgment module is specifically configured to:
judge whether the computed blurriness of the face image is higher than the optimal threshold;
if so, determine that the face image is a blurred image; otherwise, determine that the face image is a clear image.
In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the face blurriness detection method of the first aspect when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program, wherein the computer program, when executed by a processor, implements the face blurriness detection method of the first aspect.
As can be seen from the above technical solutions, the present invention extracts from a face image the block images in which multiple facial landmarks are located, then uses a pre-trained blurriness detection model to predict the confidence of each block image corresponding to each of multiple level labels, obtains the clarity and blurriness of each block image according to those confidences, and finally computes the blurriness of the face image according to the clarity and blurriness of all the block images. By this block-wise prediction idea, the blurriness of multiple block images in the face image is predicted separately and the predictions are then combined to jointly estimate the blurriness of the whole face image, which to some extent prevents a wrong judgment on one face block from corrupting the overall result and thus effectively improves the accuracy of face blurriness detection. In addition, the present invention uses the pre-trained blurriness detection model to predict, for the different block images in the face image, the confidence corresponding to each of the multiple level labels, and obtains the blurriness of each block image from those confidences. Because the multiple level labels include multiple clarity levels and multiple blurriness levels, in contrast to prior-art deep-learning methods that divide face block images into only the two classes of blurred and clear, the present invention converts the binary classification problem into a multi-class problem and then converts the result back into a binary blurriness decision, which effectively avoids the problem of clear images being misjudged as blurred and further improves the accuracy of image blurriness detection.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a face blurriness detection method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the training process of a blurriness detection model provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a deep neural network provided by an embodiment of the present invention;
Figs. 4a to 4c are ROC curves of the blurriness detection model of an embodiment of the present invention on different test sets;
Fig. 5 is a structural diagram of a face blurriness detection device provided by an embodiment of the present invention;
Fig. 6 is an internal structure diagram of a computer device provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that, unless the context clearly requires otherwise, words such as "comprise" and "include" throughout the specification and claims should be interpreted in an inclusive rather than an exclusive or exhaustive sense, that is, in the sense of "including but not limited to". In addition, in the description of the present invention, unless otherwise stated, "multiple" means two or more.
Fig. 1 is a flowchart of a face blurriness detection method provided by an embodiment of the present invention. As shown in Fig. 1, the method may include:
Step 101: extract from a face image the block images in which each of multiple facial landmarks is located.
Specifically, the face region is detected in the face image, and the block images in which the multiple facial landmarks are located are extracted from the face region.
The facial landmarks may include the landmarks corresponding to the left pupil, right pupil, nose tip, left mouth corner, and right mouth corner, and may also include other landmarks, for example those corresponding to the eyebrows.
In this embodiment, the block images in which the multiple facial landmarks are located are extracted from the face image, with different landmarks contained in different block images, so that multiple block images can be extracted, for example a left-eye block image containing the left pupil, a right-eye block image containing the right pupil, and so on.
Step 102: predict each block image separately with a pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels.
The confidence of a block image corresponding to a level label indicates the probability that the block image belongs to that level.
The clarity levels are divided in advance into three grades from heavy to light, namely heavily clear, moderately clear, and lightly clear, with level labels 0, 1, and 2 respectively; the blurriness levels are divided in advance into three grades from light to heavy, namely lightly blurred, moderately blurred, and heavily blurred, with level labels 3, 4, and 5 respectively. It should be understood that neither the number of clarity grades nor the number of blurriness grades is limited to three; the embodiments of the present invention do not specifically limit this.
Specifically, each block image is input in turn into the blurriness detection model for prediction, and the model outputs the confidence of each block image corresponding to each of the multiple level labels.
Step 103: obtain the clarity and blurriness of each block image according to the confidence of that block image corresponding to each of the multiple level labels.
Specifically, for each block image, the confidences of that block image corresponding to the multiple level labels are combined to obtain its clarity and blurriness. The confidences corresponding to all clarity levels may be summed directly to obtain the block image's clarity, and the confidences corresponding to all blurriness levels may be summed directly to obtain its blurriness; other computation methods may also be used, which the embodiments of the present invention do not specifically limit.
For example, suppose the left-eye block image of a face image has the following confidences over the six level labels: probability 0 for label "0", 0.9 for label "1", 0.05 for label "2", 0.05 for label "3", and 0 for labels "4" and "5". Summing the confidences of this left-eye block image over all clarity levels gives a clarity of 0.95; summing its confidences over all blurriness levels gives a blurriness of 0.05.
Step 104: compute the blurriness of the face image according to the clarity and blurriness of all the block images.
Specifically, the clarities of all the block images are summed and divided by the number of block images to obtain the clarity of the face image, and the blurrinesses of all the block images are summed and divided by the number of block images to obtain the blurriness of the face image.
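Steps 103 and 104 can be sketched as follows, using the label scheme above (labels 0 to 2 are clarity levels, 3 to 5 blurriness levels). This is a minimal illustration, not the patented implementation; the function names and the assumption of a six-way softmax output are mine:

```python
# Aggregate per-block softmax confidences into per-block and whole-image scores.
# Labels 0-2 are assumed to be clarity levels, labels 3-5 blurriness levels.

def block_scores(confidences):
    """confidences: the six level-label probabilities for one block image."""
    clarity = sum(confidences[:3])      # heavily / moderately / lightly clear
    blurriness = sum(confidences[3:])   # lightly / moderately / heavily blurred
    return clarity, blurriness

def image_blurriness(per_block_confidences):
    """Average the per-block clarity and blurriness over all block images."""
    scores = [block_scores(c) for c in per_block_confidences]
    clarity = sum(s[0] for s in scores) / len(scores)
    blurriness = sum(s[1] for s in scores) / len(scores)
    return clarity, blurriness

# Worked example from the text: a left-eye block image
left_eye = [0.0, 0.9, 0.05, 0.05, 0.0, 0.0]
print(block_scores(left_eye))  # clarity ≈ 0.95, blurriness ≈ 0.05
```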
In the face blurriness detection method provided by this embodiment of the present invention, the block images in which multiple facial landmarks are located are extracted from a face image, a pre-trained blurriness detection model then predicts the confidence of each block image corresponding to each of multiple level labels, the clarity and blurriness of each block image are obtained according to those confidences, and the blurriness of the face image is finally computed according to the clarity and blurriness of all the block images. With this block-wise prediction idea, the blurriness of the multiple block images in the face image is predicted separately and the predictions are then combined to jointly estimate the blurriness of the whole face image, which to some extent prevents a wrong judgment on one face block from corrupting the overall result and thus effectively improves the accuracy of face blurriness detection. Moreover, because the multiple level labels include multiple clarity levels and multiple blurriness levels, in contrast to prior-art deep-learning methods that divide face block images into only the two classes of blurred and clear, the present invention handles the binary classification problem as a multi-class problem and then converts the result back into a binary blurriness decision, which effectively avoids clear images being misjudged as blurred and further improves the accuracy of image blurriness detection.
In a preferred embodiment, extracting from the face image the block images in which the multiple facial landmarks are located may include:
detecting the face image to locate a face region and multiple facial landmarks, resizing the face region to a preset size, and extracting from the resized face region the block image in which each facial landmark is located.
Specifically, a trained MTCNN (multi-task convolutional neural network) face detection model is used to detect the face image and locate the face region and the facial landmarks. The MTCNN face detection model here comprises the P-Net, R-Net, and O-Net network stages, which are responsible for generating detection boxes, refining the detection boxes, and locating the facial landmarks respectively; the MTCNN face detection model can be trained by prior-art model-training methods, which are not repeated here.
After the face region and the facial landmarks are located, the face region is scaled to the preset size, and the coordinates of each facial landmark are converted from the face image into the resized face region frame. Pixels are then expanded outward around each facial landmark as center to obtain multiple rectangular block images, with out-of-bounds handling applied. In this embodiment, the preset size is 184*184, and 24 pixels are expanded outward around each facial landmark as center, forming 48*48 block images.
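The landmark-centered crop with out-of-bounds handling can be sketched in NumPy as follows. The text does not specify how the out-of-bounds case is resolved; clamping the window so that a full 48*48 block always stays inside the 184*184 region is one plausible reading, and the landmark coordinates below are illustrative:

```python
import numpy as np

BLOCK = 48      # block side: 24 pixels expanded on each side of the landmark
REGION = 184    # the face region is assumed already resized to 184x184

def extract_block(face, x, y, block=BLOCK):
    """Crop the block x block window centred on landmark (x, y),
    shifting the window inward when it would cross an image border."""
    h, w = face.shape[:2]
    half = block // 2
    # clamp the top-left corner so the full window fits inside the image
    x0 = min(max(x - half, 0), w - block)
    y0 = min(max(y - half, 0), h - block)
    return face[y0:y0 + block, x0:x0 + block]

# Illustrative use with five landmarks (pupils, nose tip, mouth corners)
face = np.zeros((REGION, REGION), dtype=np.uint8)
landmarks = [(60, 70), (124, 70), (92, 100), (70, 135), (114, 135)]
blocks = [extract_block(face, x, y) for x, y in landmarks]
print([b.shape for b in blocks])  # five (48, 48) blocks
```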
In a preferred embodiment, as shown in Fig. 2, the blurriness detection model is trained by the following steps:
Step 201: extract from face image samples the block image sample in which each facial landmark is located, where the face image samples include clear face image samples of different clarity levels and blurred face image samples of different blurriness levels.
In this embodiment, face image samples at three clarity levels and three blurriness levels are first collected, with each level containing a certain number of samples (for example, 200). The face region is then detected in each face image sample, and the block image samples in which the facial landmarks are located are extracted from the face region; a trained MTCNN face detection model can be used for face-region detection and landmark localization. Because the image sizes of the samples differ and the detected face regions therefore also differ in size, each face region is uniformly scaled to the preset size once obtained, the coordinates of each facial landmark are converted from the face image into the resized face region frame, and pixels are expanded outward around each landmark as center to obtain multiple rectangular block images with out-of-bounds handling. In this embodiment, the preset size is 184*184; the left pupil, right pupil, nose tip, left mouth corner, and right mouth corner are chosen as facial landmarks, and 24 pixels are expanded outward around each to form 48*48 block image samples, which are saved. In this way, processing a small number of face image samples yields five times as many block image samples for model training.
Step 202: label each block image sample with a corresponding level label, and divide the labeled block image samples into a training set and a validation set.
In this embodiment, step 201 yields about 1000 block image samples per level. In this step, each block image sample is first labeled manually, that is, each block image sample is assigned to the correct class by manual review according to its degree of clarity or blur: heavily clear is labeled 0, moderately clear 1, lightly clear 2, lightly blurred 3, moderately blurred 4, and heavily blurred 5. The block image samples marked with level labels are then divided into a training set and a validation set at a preset ratio (for example, 9:1); the training set is used to train the model parameters, and the validation set is used to calibrate the model during training.
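The six-way labeling and 9:1 split of step 202 can be sketched as follows; the label names, the sample filenames, and the seeded shuffle are my illustrative assumptions (the text only specifies the numeric labels and the preset ratio):

```python
import random

# Six-way label scheme from the text: 0-2 clarity (heavy to light),
# 3-5 blurriness (light to heavy).
LABELS = {
    "heavy_clear": 0, "medium_clear": 1, "light_clear": 2,
    "light_blur": 3, "medium_blur": 4, "heavy_blur": 5,
}

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle labelled block samples and split them 9:1 into train/val."""
    rng = random.Random(seed)
    samples = samples[:]          # keep the caller's list intact
    rng.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# Hypothetical (filename, label) pairs standing in for labelled block samples
samples = [(f"block_{i}.png", i % 6) for i in range(1000)]
train, val = split_dataset(samples)
print(len(train), len(val))  # 900 100
```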
Step 203: iteratively train a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
Specifically, with the block image samples in the training set as input and their corresponding level labels as output, the pre-built deep neural network is trained and then verified on the validation set. If the verification result does not satisfy the iteration stop condition, iterative training and verification of the deep neural network continue until the verification result satisfies the stop condition, yielding the blurriness detection model.
In a specific implementation, before model training, the training set and validation set are packed into LMDB-format data, and the pre-built deep neural network structure is saved in a file with the ".prototxt" suffix. The batch size for reading data can be set to a reasonable value according to hardware capability. Hyperparameters are set in "solver.prototxt": the learning rate is set to 0.005, the maximum number of iterations to 4000, and the validation count and test interval to 50 and 100 respectively; all of these parameters are adjustable. The model is then trained, producing a model file with the ".caffemodel" suffix. The present invention uses the Caffe deep-learning framework; using other deep-learning frameworks is similar.
Generally speaking, training a deep-learning model requires tens of thousands or even hundreds of thousands of training samples, but in real production, genuinely blurred samples are very limited, and the Gaussian-blur or motion-blur samples simulated through image processing differ markedly from real ones. The present invention collects clear face image samples of different clarity levels and blurred face image samples of different blurriness levels, extracts from these image samples the block image samples in which the multiple facial landmarks are located, labels them with corresponding level labels, and trains the constructed deep neural network on the labeled block image samples. In this way, only a small number of face image samples are needed to obtain several times as many real training samples, which further guarantees model performance and thus effectively improves the accuracy of image blurriness detection.
In addition, in blur detection, heavy clarity and heavy blur are two extremes that are relatively easy to distinguish, while samples affected by illumination, subject motion, or camera resolution fall into the moderately clear, lightly clear, lightly blurred, and moderately blurred classes, and these samples are hard to tell apart. By converting the binary classification problem into a multi-class problem during training of the blurriness detection model, the present invention greatly reduces the interference of the two extreme classes and, by paying full attention to the hard samples, obtains better detection results than direct binary classification without clarity and blurriness grades, thereby effectively avoiding clear images being misjudged as blurred and further improving the accuracy of image blurriness detection.
In a preferred embodiment, the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, cascaded in sequence; the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are each connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are each connected to the concatenation layer.
As shown in Fig. 3, a schematic structural diagram of the deep neural network provided by an embodiment of the present invention: first comes the data input layer, whose role is to pack the data and feed it into the network in mini-batches, followed by a convolutional layer. Then come two parallel pooling layers, a max pooling and a min pooling: max pooling preserves the most salient features, while min pooling preserves the features most easily overlooked, and combining the two pooling methods achieves good results. The feature maps obtained from the two poolings are joined through a concatenation layer (Concat) and together serve as the input of the next layer. Next come a fully connected layer, an activation function layer, and a Dropout layer: the fully connected layer classifies the incoming block image features; the ReLU activation in the activation function layer discards neurons whose output is below zero to induce sparsity; and the Dropout layer removes a small number of parameters at each training pass, increasing the model's generalization ability. After that is another fully connected layer, which outputs a score for each clarity level and each blurriness level. Finally, a normalization and loss function layer maps the outputs of the previous fully connected layer to the corresponding probability values, and a cross-entropy loss function drives the difference between them and the labels ever smaller; the specific cross-entropy loss formula follows the prior art and is not repeated here.
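The distinctive piece of this feature extraction layer, parallel max and min pooling whose outputs are concatenated channel-wise, can be sketched in NumPy. This is a hedged illustration only: the actual model is built in Caffe, and the 2x2 kernel and toy feature map below are my assumptions:

```python
import numpy as np

def pool2d(x, k, reduce_fn):
    """Non-overlapping k x k pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    x = x[:h - h % k, :w - w % k]                 # drop ragged borders
    x = x.reshape(h // k, k, w // k, k, c)        # split H and W into tiles
    return reduce_fn(reduce_fn(x, axis=3), axis=1)

def max_min_concat(feature_map, k=2):
    """Parallel max and min pooling, concatenated channel-wise, as in the
    feature extraction layer described above."""
    mx = pool2d(feature_map, k, np.max)   # most salient responses
    mn = pool2d(feature_map, k, np.min)   # most easily overlooked responses
    return np.concatenate([mx, mn], axis=-1)

fm = np.arange(2 * 4 * 4).reshape(4, 4, 2).astype(float)
out = max_min_concat(fm)
print(out.shape)  # (2, 2, 4): spatial size halved, channels doubled
```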
In a preferred embodiment, after the step of iteratively training the pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model, the method may further include:
computing an optimal threshold for the blurriness detection model on different test sets according to ROC curves.
Each test set includes block image test samples in which each facial landmark is located, extracted from face image test samples; the specific extraction process is as in step 201 and is not repeated here.
Specifically, the blurriness detection model predicts the blurriness of every block image test sample in each test set to obtain prediction results; an ROC (receiver operating characteristic) curve is plotted for each test set according to the prediction results of its block image test samples and the preset thresholds, and the ROC curve of each test set is analyzed to obtain the optimal threshold.
In practical application, 138,669 clear face images, 2,334 semi-clear face images, 19,050 clear face images from small surveillance pictures, and 1,446 blurred face images were collected and combined into three image sets: clear plus blurred face images, semi-clear plus blurred face images, and small-surveillance-clear plus blurred face images. The block image test samples at the facial landmarks were extracted from the face images in each of the three sets to form three test sets; the blurriness detection model then predicted each test set, and an ROC curve was plotted for each according to the prediction results of its block image test samples and the preset thresholds, as shown in Figs. 4a to 4c. Fig. 4a shows the ROC curve of the blurriness detection model on the test set formed from clear and blurred face images, Fig. 4b on the test set formed from small clear surveillance pictures and blurred face images, and Fig. 4c on the test set formed from semi-clear and blurred face images. In this embodiment, three preset thresholds can be set by expert experience, from low to high 0.19, 0.39, and 0.79; after analysis of the ROC curves, 0.39 is selected as the optimal threshold. Testing with 0.39 on the test set of clear and blurred faces yields an accuracy of 99.3%.
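The selection among the three expert-set candidate thresholds can be sketched as follows. The text only says "ROC analysis"; maximizing Youden's J (TPR minus FPR) is an assumed selection rule, and the toy scores and labels are illustrative:

```python
def roc_point(scores, labels, thr):
    """TPR and FPR when score > thr is predicted 'blurred' (positive class)."""
    tp = sum(s > thr and l == 1 for s, l in zip(scores, labels))
    fp = sum(s > thr and l == 0 for s, l in zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def best_threshold(scores, labels, candidates=(0.19, 0.39, 0.79)):
    """Pick the candidate threshold maximising Youden's J = TPR - FPR
    (an assumed criterion; the text states only that ROC curves were analyzed)."""
    points = {t: roc_point(scores, labels, t) for t in candidates}
    return max(points, key=lambda t: points[t][0] - points[t][1])

# Toy data: blurred images (label 1) score high, clear images (label 0) low
scores = [0.05, 0.10, 0.30, 0.50, 0.85, 0.95]
labels = [0,    0,    0,    1,    1,    1]
print(best_threshold(scores, labels))  # 0.39
```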
In a preferred embodiment, after the step of computing the blurriness of the face image according to the clarity and blurriness of all the block images, the method may further include:
judging whether the computed blurriness of the face image is higher than the optimal threshold; if so, determining that the face image is a blurred image, otherwise determining that the face image is a clear image.
In this embodiment, the optimal threshold serves as the criterion for judging whether a face image is a blurred image: when the blurriness of the face image is higher than the optimal threshold, the face image is judged blurred, which realizes automatic detection of blurred images and improves image quality.
Fig. 5 is a structural diagram of a face blurriness detection device provided by an embodiment of the present invention. As shown in Fig. 5, the device includes:
an extraction module 51, configured to extract from a face image the block images in which each of multiple facial landmarks is located;
a prediction module 52, configured to predict each block image separately with a pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels;
an acquisition module 53, configured to compute the clarity and blurriness of each block image according to the confidence of that block image corresponding to each of the multiple level labels;
a computation module 54, configured to compute the blurriness of the face image according to the clarity and blurriness of all the block images.
In a preferred embodiment, the extraction module 51 is specifically configured to:
detect the face image to locate a face region and multiple facial landmarks;
resize the face region to a preset size, and extract from the resized face region the block image in which each facial landmark is located.
In a preferred embodiment, the device further includes a training module 50, specifically configured to:
extract, from multiple face image samples, the block image sample in which each facial landmark is located, where the multiple image samples include clear face image samples and blurred face image samples;
label each block image sample with a corresponding level label, and divide the labeled block image samples into a training set and a validation set;
iteratively train a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
In a preferred embodiment, the deep neural network includes a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, cascaded in sequence; the feature extraction layer includes a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are each connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are each connected to the concatenation layer.
In a preferred embodiment, the training module 50 is further specifically configured to:
compute an optimal threshold for the blurriness detection model on different test sets according to ROC curves.
In a preferred embodiment, the device further includes a judgment module 55, specifically configured to:
judge whether the computed blurriness of the face image is higher than the optimal threshold;
if so, determine that the face image is a blurred image; otherwise, determine that the face image is a clear image.
It should be noted that, for the face blurriness detection device provided by this embodiment, the division into the above functional modules is only illustrative; in practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the face blurriness detection device of this embodiment and the face blurriness detection method embodiments above belong to the same concept; for the specific implementation process and beneficial effects, see the face blurriness detection method embodiments, which are not repeated here.
Fig. 6 is an internal structure diagram of a computer device provided by an embodiment of the present invention. The computer device may be a server, and its internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements a face blurriness detection method.
Those skilled in the art can understand that the structure shown in Fig. 6 is only a block diagram of part of the structure related to the solution of the present invention and does not limit the computer device to which the solution of the present invention is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, and the processor implements the following steps when executing the computer program:
extracting from a face image the block images in which each of multiple facial landmarks is located;
predicting each block image separately with a pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels;
obtaining the clarity and blurriness of each block image according to the confidence of that block image corresponding to each of the multiple level labels;
computing the blurriness of the face image according to the clarity and blurriness of all the block images.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps:
extracting from a face image the block images in which each of multiple facial landmarks is located;
predicting each block image separately with a pre-trained blurriness detection model to obtain the confidence of each block image corresponding to each of multiple level labels, where the multiple level labels include multiple clarity levels and multiple blurriness levels;
obtaining the clarity and blurriness of each block image according to the confidence of that block image corresponding to each of the multiple level labels;
computing the blurriness of the face image according to the clarity and blurriness of all the block images.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments can be combined arbitrarily; for brevity of description, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered within the scope recorded in this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that, for those of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent shall be subject to the appended claims.

Claims (10)

  1. A face blurriness detection method, characterized in that the method comprises:
    extracting, from a face image, the block images in which each of multiple facial landmarks is located;
    predicting each of the block images separately with a pre-trained blurriness detection model to obtain the confidence of each of the block images corresponding to each of multiple level labels, wherein the multiple level labels include multiple clarity levels and multiple blurriness levels;
    obtaining the clarity and blurriness of each of the block images according to the confidence of that block image corresponding to each of the multiple level labels;
    computing the blurriness of the face image according to the clarity and blurriness of all the block images.
  2. The method according to claim 1, characterized in that extracting from the face image the block images in which the multiple facial landmarks are located comprises:
    detecting the face image to locate a face region and multiple facial landmarks;
    resizing the face region to a preset size, and extracting from the resized face region the block image in which each of the facial landmarks is located.
  3. The method according to claim 1 or 2, characterized in that the blurriness detection model is trained as follows:
    extracting from face image samples the block image sample in which each of the facial landmarks is located, wherein the face image samples include clear face image samples of different clarity levels and blurred face image samples of different blurriness levels;
    labeling each of the block image samples with a corresponding level label, and dividing the labeled block image samples into a training set and a validation set;
    iteratively training a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
  4. The method according to claim 3, characterized in that the deep neural network comprises a data input layer, a feature extraction layer, a first fully connected layer, an activation function layer, a Dropout layer, a second fully connected layer, and a loss function layer, cascaded in sequence; the feature extraction layer comprises a convolutional layer, a maximum pooling layer, a minimum pooling layer, and a concatenation layer; the data input layer, the maximum pooling layer, and the minimum pooling layer are each connected to the convolutional layer, and the maximum pooling layer, the minimum pooling layer, and the first fully connected layer are each connected to the concatenation layer.
  5. The method according to claim 3, characterized in that the method further comprises:
    computing an optimal threshold for the blurriness detection model on different test sets according to ROC curves.
  6. The method according to claim 5, characterized in that, after the step of computing the blurriness of the face image according to the clarity and blurriness of all the block images, the method further comprises:
    judging whether the computed blurriness of the face image is higher than the optimal threshold;
    if so, determining that the face image is a blurred image; otherwise, determining that the face image is a clear image.
  7. A face blurriness detection device, characterized in that the device comprises:
    an extraction module, configured to extract from a face image the block images in which each of multiple facial landmarks is located;
    a prediction module, configured to predict each of the block images separately with a pre-trained blurriness detection model to obtain the confidence of each of the block images corresponding to each of multiple level labels, wherein the multiple level labels include multiple clarity levels and multiple blurriness levels;
    an acquisition module, configured to compute the clarity and blurriness of each of the block images according to the confidence of that block image corresponding to each of the multiple level labels;
    a computation module, configured to compute the blurriness of the face image according to the clarity and blurriness of all the block images.
  8. The device according to claim 7, characterized in that the device further comprises a training module, and the training module is specifically configured to:
    extract from face image samples the block image sample in which each of the facial landmarks is located, wherein the face image samples include clear face image samples of different clarity levels and blurred face image samples of different blurriness levels;
    label each of the block image samples with a corresponding level label, and divide the labeled block image samples into a training set and a validation set;
    iteratively train a pre-built deep neural network with the training set and the validation set to obtain the blurriness detection model.
  9. A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor implements the face blurriness detection method according to any one of claims 1 to 6 when executing the computer program.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the face blurriness detection method according to any one of claims 1 to 6.
PCT/CN2020/097009 2020-03-09 2020-06-19 Face blurriness detection method and device, computer equipment and storage medium WO2021179471A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3174691A CA3174691A1 (en) 2020-03-09 2020-06-19 Human face fuzziness detecting method, device, computer equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010156039.8 2020-03-09
CN202010156039.8A CN111368758B (zh) 2020-03-09 2020-03-09 Face blurriness detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021179471A1 true WO2021179471A1 (zh) 2021-09-16

Family

ID=71206593

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097009 WO2021179471A1 (zh) 2020-03-09 2020-06-19 一种人脸模糊度检测方法、装置、计算机设备及存储介质

Country Status (3)

Country Link
CN (1) CN111368758B (zh)
CA (1) CA3174691A1 (zh)
WO (1) WO2021179471A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902740A * 2021-12-06 2022-01-07 深圳佑驾创新科技有限公司 Method for constructing an image blur degree evaluation model
CN117475091A * 2023-12-27 2024-01-30 浙江时光坐标科技股份有限公司 High-precision 3D model generation method and system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862040B * 2020-07-20 2023-10-31 中移(杭州)信息技术有限公司 Portrait picture quality evaluation method, device, equipment and storage medium
CN111914939B * 2020-08-06 2023-07-28 平安科技(深圳)有限公司 Method, device, equipment and computer-readable storage medium for identifying blurred images
CN113239738B * 2021-04-19 2023-11-07 深圳市安思疆科技有限公司 Image blur detection method and blur detection device
CN113362304B * 2021-06-03 2023-07-21 北京百度网讯科技有限公司 Method for training a clarity prediction model and method for determining a clarity level
CN113627314A (zh) * 2021-08-05 2021-11-09 Oppo广东移动通信有限公司 Face image blur detection method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106920229A * 2017-01-22 2017-07-04 北京奇艺世纪科技有限公司 Automatic detection method and system for blurred image regions
US20180040115A1 (en) * 2016-08-05 2018-02-08 Nuctech Company Limited Methods and apparatuses for estimating an ambiguity of an image
CN107844766A * 2017-10-31 2018-03-27 北京小米移动软件有限公司 Method, device and equipment for acquiring the blurriness of a face image
WO2019123554A1 * 2017-12-20 2019-06-27 日本電気株式会社 Image processing device, image processing method, and recording medium
CN110059642A * 2019-04-23 2019-07-26 北京海益同展信息科技有限公司 Face image screening method and device
CN110363753A * 2019-07-11 2019-10-22 北京字节跳动网络技术有限公司 Image quality assessment method, device and electronic equipment
CN110705511A * 2019-10-16 2020-01-17 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for identifying blurred images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389030B * 2018-08-23 2022-11-29 平安科技(深圳)有限公司 Facial landmark detection method, device, computer equipment and storage medium
CN110163114B * 2019-04-25 2022-02-15 厦门瑞为信息技术有限公司 Face angle and face blurriness analysis method, system and computer equipment


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902740A * 2021-12-06 2022-01-07 深圳佑驾创新科技有限公司 Method for constructing an image blur degree evaluation model
CN117475091A * 2023-12-27 2024-01-30 浙江时光坐标科技股份有限公司 High-precision 3D model generation method and system
CN117475091B * 2023-12-27 2024-03-22 浙江时光坐标科技股份有限公司 High-precision 3D model generation method and system

Also Published As

Publication number Publication date
CA3174691A1 (en) 2021-09-16
CN111368758A (zh) 2020-07-03
CN111368758B (zh) 2023-05-23

Similar Documents

Publication Publication Date Title
WO2021179471A1 (zh) 一种人脸模糊度检测方法、装置、计算机设备及存储介质
US11403876B2 (en) Image processing method and apparatus, facial recognition method and apparatus, and computer device
CN110569721B (zh) 识别模型训练方法、图像识别方法、装置、设备及介质
WO2021068322A1 (zh) 活体检测模型的训练方法、装置、计算机设备和存储介质
JP6330385B2 (ja) 画像処理装置、画像処理方法およびプログラム
KR20180109665A (ko) 객체 검출을 위한 영상 처리 방법 및 장치
CN111611873A (zh) 人脸替换检测方法及装置、电子设备、计算机存储介质
US20230056564A1 (en) Image authenticity detection method and apparatus
CN109472193A (zh) 人脸检测方法及装置
US11605210B2 (en) Method for optical character recognition in document subject to shadows, and device employing method
CN116110100B (zh) 一种人脸识别方法、装置、计算机设备及存储介质
KR20180109658A (ko) 영상 처리 방법과 장치
WO2021189770A1 (zh) 基于人工智能的图像增强处理方法、装置、设备及介质
JP2022133378A (ja) 顔生体検出方法、装置、電子機器、及び記憶媒体
CN112766028B (zh) 人脸模糊处理方法、装置、电子设备及存储介质
CN113177892A (zh) 生成图像修复模型的方法、设备、介质及程序产品
CN111340025A (zh) 字符识别方法、装置、计算机设备和计算机可读存储介质
CN110321778B (zh) 一种人脸图像处理方法、装置和存储介质
CN110942067A (zh) 文本识别方法、装置、计算机设备和存储介质
EP4174769A1 (en) Method and apparatus for marking object outline in target image, and storage medium and electronic apparatus
CN112766351A (zh) 一种图像质量的评估方法、系统、计算机设备和存储介质
CN111435445A (zh) 字符识别模型的训练方法及装置、字符识别方法及装置
CN111612732A (zh) 图像质量评估方法、装置、计算机设备及存储介质
CN116206373A (zh) 活体检测方法、电子设备及存储介质
CN111311772A (zh) 一种考勤处理方法及装置、电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924820

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3174691

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924820

Country of ref document: EP

Kind code of ref document: A1