WO2019228040A1 - Facial image scoring method and camera - Google Patents

Facial image scoring method and camera (一种面部图像评分方法及摄像机)

Info

Publication number
WO2019228040A1
WO2019228040A1 · PCT/CN2019/080024 · CN2019080024W
Authority
WO
WIPO (PCT)
Prior art keywords
facial image
score
scored
key point
image
Prior art date
Application number
PCT/CN2019/080024
Other languages
English (en)
French (fr)
Inventor
刘干
马程
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2019228040A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Definitions

  • the present application relates to the field of image scoring technology, and in particular, to a facial image scoring method and a camera.
  • the camera can capture facial images at a set time interval and save the facial images. For the same person, many face images will be captured within a period of time. If each face image is saved, it will occupy a lot of resources of the device. Therefore, the best image is generally selected from multiple face images for saving.
  • each face image can usually be scored, and the face image with the highest score is selected as the best image.
  • when scoring each face image, information such as whether the face in the face image is a frontal face or whether the eyes are open can be detected, and the image can be scored according to the detection result.
  • the situation of captured face images is ever-changing, and there are many factors that affect image quality.
  • the criteria for scoring face images in the above manner are rather one-dimensional, and the resulting scores are not reasonable enough.
  • the embodiments of the present application provide a facial image scoring method and a camera to improve the rationality when scoring a facial image.
  • the specific technical solution is as follows.
  • an embodiment of the present application provides a camera, including: a processor, a memory, and an image acquisition module;
  • the image acquisition module is configured to collect a facial image to be scored and store the facial image in the memory
  • the processor is configured to obtain the facial image to be scored from the memory and send it to a pre-trained neural network, the neural network scoring the facial image to be scored according to network parameters to obtain a reference score; the processor extracts, from the facial image to be scored obtained from the memory, each key point of the facial area in the facial image to be scored; determines a size score of the face region in the facial image to be scored according to the distances between the key points and a preset correspondence between distance and size score; and determines a final score of the facial image to be scored according to the reference score and the size score.
  • the network parameter is obtained by training the neural network according to a sample facial image and a corresponding standard score; the standard score is determined according to a facial state feature of the sample facial image; the processor is further configured to determine the standard score using the following operations:
  • the base library is used to store standard facial images
  • a standard score of the sample facial image is determined according to the subjective score and the objective score.
  • the processor is specifically configured to:
  • bin is the maximum gray value level of the pixel
  • S i is the i-th gray value in the sample facial image
  • G j is the j-th gray value in the standard facial image.
  • h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively
  • S_k is the gray value of the k-th pixel in the sample facial image
  • G_k is the gray value of the k-th pixel in the standard facial image
  • N is the total number of pixels in the sample facial image or the standard facial image.
  • the processor is further configured to obtain an initial value of the network parameter by using the following operations:
  • a reference network parameter of a trained reference neural network is obtained as an initial value of the network parameter; the reference neural network is obtained by training using a sample image different from the sample facial image.
  • the processor is specifically configured to:
  • the processor is specifically configured to:
  • a size score of a face region in the facial image to be scored is determined according to the target size score and the deflection coefficient.
  • each key point includes: a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point; the processor is specifically used to:
  • a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
  • the processor is specifically configured to:
  • a final score of the facial image to be scored is determined according to the sharpness, the reference score, and the size score.
  • the processor is specifically configured to:
  • a diagonal-direction gradient algorithm is used to calculate the gradient value of each pixel in the facial image to be scored, and the sharpness of the facial image to be scored is calculated according to the average of the gradient values of all pixels in the facial image to be scored.
  • the processor is specifically configured to:
  • the processor is further configured to:
  • an embodiment of the present application provides a facial image scoring method, which includes:
  • a final score of the facial image to be scored is determined according to the reference score and the size score.
  • the network parameter is obtained by training the neural network according to a sample facial image and a corresponding standard score; the standard score is determined according to the facial state characteristics of the sample facial image; the standard score is determined in the following way:
  • the base library is used to store standard facial images
  • a standard score of the sample facial image is determined according to the subjective score and the objective score.
  • the step of determining the similarity between the standard facial image and the sample facial image includes:
  • bin is the maximum gray value level of the pixel
  • S i is the i-th gray value in the sample facial image
  • G j is the j-th gray value in the standard facial image.
  • h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively
  • S_k is the gray value of the k-th pixel in the sample facial image
  • G_k is the gray value of the k-th pixel in the standard facial image
  • N is the total number of pixels in the sample facial image or the standard facial image.
  • the initial value of the network parameter is obtained in the following manner:
  • a reference network parameter of a trained reference neural network is obtained as an initial value of the network parameter; the reference neural network is obtained by training using a sample image different from the sample facial image.
  • the step of determining the size score of the face region in the facial image to be scored according to the distance between each key point and a preset correspondence between the distance and the size score includes:
  • the step of determining a size score of a face region in the facial image to be scored according to the target size score includes:
  • a size score of a face region in the facial image to be scored is determined according to the target size score and the deflection coefficient.
  • each key point includes: a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point; the step of determining the deflection coefficient of the face region in the facial image to be scored according to each offset angle includes:
  • a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
  • the step of determining a final score of the facial image to be scored according to the reference score and the size score includes:
  • a final score of the facial image to be scored is determined according to the sharpness, the reference score, and the size score.
  • the step of calculating the sharpness of the facial image to be scored includes:
  • a diagonal-direction gradient algorithm is used to calculate the gradient value of each pixel in the facial image to be scored, and the sharpness of the facial image to be scored is calculated according to the average of the gradient values of all pixels in the facial image to be scored.
  • the step of obtaining a facial image to be scored includes:
  • the method further includes:
  • an embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, and the computer program implements the facial image scoring method provided in the second aspect when executed by a processor.
  • an embodiment of the present application further provides an application program, which is used to execute the facial image scoring method provided in the second aspect at runtime.
  • the facial image scoring method and camera provided in the embodiments of the present application can score a facial image to be scored by a neural network to obtain a reference score, determine a size score of the facial area based on the distances between key points of the facial area in the facial image to be scored, and determine a final score of the facial image to be scored according to the reference score and the size score.
  • the neural network can determine the reference score for the facial state characteristics of the facial image to be scored, and can determine the size score according to the distance between key points.
  • the embodiments of the present application can consider various state characteristics and size characteristics of the facial image, and the scoring standard is more comprehensive, so the rationality of the facial image scoring can be improved.
  • the implementation of any product or method of this application does not necessarily need to achieve all the advantages described above at the same time.
  • FIG. 1 is a schematic flowchart of a facial image scoring method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a standard score determination method for training a neural network in the embodiment shown in FIG. 1 according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a correspondence relationship between an interpupillary distance and a size score according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of various key points in a human face according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a camera according to an embodiment of the present application.
  • an embodiment of the present application provides a method for scoring a facial image. The following describes this application in detail through specific embodiments.
  • FIG. 1 is a schematic flowchart of a facial image scoring method according to an embodiment of the present application. This method is applied to electronic equipment.
  • the electronic device may be a computer, a server, a video camera, a smart phone, and other devices with data processing functions.
  • the method includes the following steps:
  • Step S101 Acquire a facial image to be scored.
  • the facial image to be scored includes at least one facial area.
  • the facial image to be scored may be an image containing a facial region to be scored.
  • the face area can be a human face, or an animal face or a robot face.
  • the facial image to be scored may include only a facial area, or a facial area and a background area outside the facial area.
  • the electronic device when acquiring the facial image to be scored, the electronic device can directly acquire the facial image to be scored collected by the image acquisition module.
  • the electronic device may obtain a facial image to be scored from another device.
  • Step S102 Send the facial image to be scored to a pre-trained neural network, and the neural network scores the facial image to be scored according to the network parameters, obtains a reference score, and records it as DLScore.
  • the network parameters are obtained by training a neural network according to a sample facial image and a corresponding standard score in advance.
  • the standard score is determined based on the facial state characteristics of the sample facial image.
  • the facial state characteristics may include features such as whether the face is normal, whether the face is clear, whether the face is occluded and the occlusion site, whether an expression is present, whether the eyes are open, and the like.
  • after training the neural network according to the sample facial images and the corresponding standard scores, the neural network can score the facial image to be scored according to the learned network parameters.
  • the obtained reference score is a score based on facial state features, without the need to separately detect individual pieces of information such as whether the facial image to be scored shows a frontal face or whether the eyes are open. Therefore, the reference score obtained by the neural network can take more facial state features into account, and the final score of the facial image to be scored is more reasonable.
  • Neural networks can run inside electronic devices or inside other devices. When the neural network determines the reference score of the image to be scored, the reference score is sent to the electronic device as the execution subject. The electronic device can receive reference scores sent by other devices.
  • Step S103 extracting each key point of a facial area in the facial image to be scored.
  • each key point of the face region includes a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point, etc., and may also include key points of the shape of the face region.
  • when extracting the key points, an Active Shape Model, an Active Appearance Model, a Constrained Local Model, a cascaded shape regression (Cascaded Regression) based method, a deep learning based method, or the like may be used.
  • extracting each key point of a facial area in the facial image to be scored may mean extracting the image coordinates of each key point of the facial area in the facial image to be scored.
  • Step S104 Determine the size score of the face region in the facial image to be scored according to the distance between the key points and the preset correspondence between the distance and the size score.
  • the distance between the key points may be the interpupillary distance, the distance between the key point of the left mouth corner and the key point of the right mouth corner, or the distance between other key points and the key point of the nose.
  • the preset correspondence between the distance and the size score can be obtained in advance from the sample facial image.
  • the size score in this embodiment may indicate the clarity of the face area. The higher the size score, the sharper the face, and the closer the image is to the best.
  • the above size score can also be referred to as the facial key point score and recorded as FacePointScore.
  • steps S103 and S104 may also be performed before step S102, which is not limited in this application.
  • Step S105 Determine the final score of the facial image to be scored according to the reference score and the size score, and record it as FinalScore.
  • this step may specifically add the reference score and the size score, multiply them, or add them and then multiply the sum by a preset value, to obtain the final score of the facial image to be scored.
  • the higher the final score, the closer the facial image to be scored is considered to be to the best image.
  • the facial image to be scored with the highest final score may also be determined as the best image.
  • the neural network may determine the reference score for the facial state characteristics of the facial image to be scored, and determine the size score according to the distance between key points.
  • various state characteristics and size characteristics of a facial image can be considered, and the scoring standard is more comprehensive, so the rationality of scoring a facial image can be improved.
  • the determination of the above-mentioned standard score may be performed according to the schematic flowchart shown in FIG. 2, and specifically includes the following steps S201 to S203.
  • Step S201 Determine the score corresponding to the facial state feature as the subjective score of the sample facial image according to the facial state feature of the sample facial image and the preset correspondence relationship between the facial state feature and the score.
  • the facial state characteristics may include features indicating the degree of deflection of the face, sharpness, facial expression, whether to blink or not.
  • images in which the face is frontal, very clear, expressionless, and not blinking can be labeled as the first category, and the degree of unfavorable factors such as blinking, a lowered or raised head, a side face, laughing, or occlusion divides the remaining images into the second category, the third category, and so on. Each category serves as a step of a ladder, and the score of a step is higher than that of the next step.
  • occlusion may include eye occlusion, face occlusion, forehead occlusion, mouth occlusion, and the like.
  • occluding objects can be masks, hands, branches, books, glasses, etc. Expressions can include no obvious expression, laughing, pursed lips, and so on.
  • a pre-trained facial state classifier can be used to determine the score of the sample facial image in terms of the degree of face deflection and the presence of occlusion.
  • a manual scoring method can be used to score sample facial images based on preset clarity, facial expressions, and whether they blink.
  • the subjective score of the sample facial image is obtained according to the score of the sample facial image determined by the facial state classifier and the score of the sample facial image manually evaluated. For example, the product of the score of the sample face image determined by the face state classifier and the score of the sample face image manually evaluated may be used as the subjective score of the sample face image.
  • the aforementioned facial state classifier may be trained according to a sample facial image and a corresponding label.
  • as one embodiment, when setting a label: if the face in the sample facial image is the back of a head, the label is 0 points; if the pitch angle of the face is 40 degrees or more, or the yaw angle is 40 degrees or more, the label is 5 points; if the face is a frontal face but the pitch angle is less than 40 degrees, the label can be determined according to the mapping table in Table 1.
  • the sample face images can be divided into 13 categories when setting the labels, and the category labels are 0, 5, 10, 15, 20, 30, 40, 50, 60, 70 , 80, 90, 100.
  • Step S202 Determine a standard facial image corresponding to the sample facial image from the base library, and determine the similarity between the standard facial image and the sample facial image as an objective score of the sample facial image.
  • the base library is used to store various standard facial images. Each standard facial image can correspond to a different object, and each can be a frontal, very clear, expressionless, non-blinking image.
  • the sample facial image can be matched with each standard facial image in the base library according to a preset image matching algorithm, and the successfully matched standard facial image is used as the standard facial image corresponding to the sample facial image.
  • the standard facial image corresponding to the sample facial image can also be determined from the base database by manual determination.
  • the objects in the sample facial image and the corresponding standard facial image should be the same. For example, the persons in the sample facial image and the corresponding standard facial image are the same person.
  • Each standard face image in the base library can have the same resolution and image size.
  • the sample facial image may be pre-processed. This can make the image parameters of the sample facial image and the standard facial image on the same level, and thus make the determined similarity more accurate.
  • the above preprocessing may include normalizing the resolution and brightness of the sample facial image.
  • the sample facial image can be normalized to a size of 256 pixels ⁇ 256 pixels.
  • normalizing the brightness of the sample facial image may include: calculating the average brightness of all standard facial images in the base library, recorded as Aveobj; calculating the average brightness of the m-th sample facial image, recorded as Avem, where m indexes the sample facial image; when Avem is greater than Aveobj, subtracting |Aveobj − Avem| from each gray value of the m-th sample facial image, and when Avem is less than Aveobj, adding |Aveobj − Avem| to each gray value, so that the average brightness of the sample facial image equals the average brightness of the standard facial images in the base library.
  • the resolution of the sample facial image may be normalized first, and then the brightness of the sample facial image after the resolution normalization may be normalized.
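As a rough illustration of this preprocessing order, a minimal OpenCV/NumPy sketch follows; the function name and the clipping to [0, 255] are assumptions of this sketch, not taken from the patent:

```python
import cv2
import numpy as np

def preprocess_sample(sample: np.ndarray, ave_obj: float) -> np.ndarray:
    """Resolution first (256x256), then shift the mean gray value toward
    the base library's average brightness Aveobj."""
    resized = cv2.resize(sample, (256, 256)).astype(np.float32)
    # Subtracting or adding |Aveobj - Avem| as described is equivalent to
    # adding the signed difference; clip back to valid gray values.
    shifted = resized + (ave_obj - resized.mean())
    return np.clip(shifted, 0, 255).astype(np.uint8)
```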
  • bin is the maximum gray level of the pixel, for example, bin can be 255.
  • S i is the i-th gray value in the sample facial image, and the value of i ranges from 0 to 255.
  • G j is the j-th gray value in the standard facial image, and the value of j ranges from 0 to 255.
  • h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively.
  • S_k is the gray value of the k-th pixel in the sample facial image
  • G_k is the gray value of the k-th pixel in the standard facial image
  • N is the total number of pixels in the sample facial image or the standard facial image; the two images may have the same total number of pixels.
  • a joint histogram of the standard facial image G and the sample facial image S can be obtained in advance, and each h (S i , G j ) can be obtained from the joint histogram.
  • the abscissa of the joint histogram is each gray value combination, and the ordinate is the number of occurrences of each gray value combination in the sample facial image or standard facial image.
  • the position of the pixel corresponding to S i in the sample facial image is the same as the position of the pixel corresponding to G j in the standard facial image.
  • P (S i , G j ) represents a joint probability distribution of the standard facial image G and the sample facial image S.
  • P(S_i) and P(G_j) represent the marginal probability densities.
  • the similarity Sim (S, G) can also be normalized to [0,1].
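Putting the definitions above together, the similarity can be sketched as below; this assumes the mutual-information form implied by the joint and marginal probability definitions, and the normalization to [0, 1] by the smaller marginal entropy is an assumption of the sketch:

```python
import numpy as np

def similarity(sample: np.ndarray, standard: np.ndarray, bins: int = 256) -> float:
    """Sim(S, G) built from the joint gray-level histogram h(S_i, G_j)
    of two equally sized gray images."""
    assert sample.shape == standard.shape
    n = sample.size  # N: total number of pixels
    # h(S_i, G_j): co-occurrences of gray values at identical positions.
    h, _, _ = np.histogram2d(sample.ravel(), standard.ravel(),
                             bins=bins, range=[[0, bins], [0, bins]])
    p_joint = h / n                # P(S_i, G_j)
    p_s = p_joint.sum(axis=1)      # marginal P(S_i)
    p_g = p_joint.sum(axis=0)      # marginal P(G_j)
    nz = p_joint > 0               # skip empty bins to avoid log(0)
    mi = np.sum(p_joint[nz] * np.log2(p_joint[nz] / np.outer(p_s, p_g)[nz]))
    # Normalize to [0, 1] by the smaller marginal entropy (one common choice).
    h_s = -np.sum(p_s[p_s > 0] * np.log2(p_s[p_s > 0]))
    h_g = -np.sum(p_g[p_g > 0] * np.log2(p_g[p_g > 0]))
    return float(mi / min(h_s, h_g)) if min(h_s, h_g) > 0 else 0.0
```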
  • Step S203 Determine a standard score of the sample facial image according to the subjective score and the objective score.
  • the result of multiplying or adding the subjective score and the objective score may be determined as the standard score of the sample face image.
  • a score label closest to the standard score may also be determined from preset score labels, and the closest score label may be used as the updated standard score of the sample facial image, so that the updated standard score is one of the preset score labels. This can make the standard score more standardized and facilitate subsequent processing.
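Assuming the 13 labels listed earlier serve as the preset score labels, the snapping step amounts to:

```python
LABELS = [0, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100]

def snap_to_label(standard_score: float, labels=LABELS) -> int:
    """Replace a raw standard score with the closest preset score label."""
    return min(labels, key=lambda lab: abs(lab - standard_score))
```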
  • the setting of the standard score of the sample facial image directly affects the rationality of the neural network in determining the reference score of the facial image to be scored.
  • this embodiment determines the standard score of the sample facial image according to the subjective score and the objective score, which can make the standard score more reasonable, make the trained neural network better meet the requirements, and make the reference score of the facial image to be scored more accurate.
  • during training, the calibrated sample facial images can be input to the neural network, which processes each sample facial image to extract its facial state features, determines an evaluation score according to those features, and then appropriately adjusts the network parameters of the neural network based on the difference between the evaluation score and the standard score of the sample facial image.
  • in this way, the neural network gradually learns the correspondence between the facial state features of an image and its score.
  • the method for adjusting the network parameters of the neural network may adopt a stochastic gradient descent algorithm, etc., and is not specifically limited herein.
  • once the neural network can accurately score the images, training can be stopped, yielding the aforementioned neural network for scoring facial images to be scored.
  • the neural network includes the correspondence between facial state features and scores of the image.
  • the neural network After inputting the facial image to be scored into the neural network, the neural network can extract the facial state characteristics of the facial image to be scored, and then based on the network parameters, score the facial image to be scored according to the correspondence between the facial state characteristics of the image and the score, and obtain a reference score.
  • a commonly used deep learning network and a commonly used deep learning training framework may be selected, and a hyperparameter file for training the neural network may be prepared in advance for training.
  • the sample facial image can be divided into two parts, one for training and one for testing the accuracy of the neural network after training.
  • for example, a GoogleNet Inception V2 network can be used; its accuracy can meet the requirements and its efficiency is relatively high.
  • Commonly used deep learning training frameworks include Caffe, Tensorflow, Pytorch, and MXNet.
  • Caffe can be used in this embodiment, and Caffe can be extended to enable data augmentation.
  • data augmentation refers to performing one or more of geometric transformation, color transformation, principal component analysis (PCA) processing, and blur processing on an image to increase the number of samples.
  • data augmentation may include horizontal and vertical flipping, random cropping, rotation, scaling, affine transformation, projection transformation, random erasing, Gaussian noise, blurring, and adjustment of color saturation, brightness, contrast, and the like.
  • the initial values of network parameters are obtained in the following ways:
  • the reference neural network is obtained by training using sample images different from the sample facial images described above.
  • the type of the reference neural network should be the same as the type of the neural network to be trained, for example, the reference neural network and the neural network to be trained are both GoogleNet Inception V2 networks.
  • the reference neural network can be a good network model trained on a large data set. Since the reference neural network already has relatively good network parameters, training on the reference neural network according to the sample facial image and the corresponding standard score can improve the training efficiency. This training method can be called fine-tuning.
  • the number of samples is related to the training time of the network model and to the detection performance of the trained network model. In some specific tasks, if the number of samples is not large enough, it is not easy to retrain a new network model with good results, and the parameters are not easy to tune, so fine-tuning can be used. In addition, even with a large number of samples, fine-tuning tends to give better results than training a new network model from scratch. Fine-tuning uses a model that has already been trained on a dataset containing a large number of samples, and adds task-specific samples to train a new network model. The already trained model can be, for example, one of the Model Zoo models trained on the ImageNet dataset.
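As a loose illustration of this fine-tuning idea (sketched in PyTorch for brevity, although the text itself works with Caffe; the 13-way output matches the 13 score labels mentioned earlier):

```python
import torch
from torchvision.models import googlenet

# Start from reference network parameters trained on a large generic dataset
# (here an ImageNet-pretrained GoogleNet-style model from the torchvision
# model zoo), then continue training on the sample facial images and their
# standard-score labels.
net = googlenet(weights="IMAGENET1K_V1")
net.fc = torch.nn.Linear(net.fc.in_features, 13)        # 13 score labels
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)  # stochastic gradient descent
loss_fn = torch.nn.CrossEntropyLoss()
```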
  • the hyperparameters contained in the hyperparameter file can indicate the network file used in training the neural network, the optimization algorithm, the learning rate, the learning rate change method, the maximum number of iterations, the training mode, and how often to save the classifier model during the training process.
  • the training mode can be GPU (Graphics Processing Unit) or CPU (Central Processing Unit).
  • some hyperparameters are fixed, such as the network file used for training and the training mode, while the selection of others, such as the learning rate and the optimization algorithm, needs to be determined using K-fold cross-validation.
  • the network parameters during the training can be saved at preset time intervals. After the neural network is trained, it will obtain the network parameters saved multiple times, including the network parameters generated during the training process and the network parameters generated when the maximum number of training iterations is reached.
  • the network parameters obtained each time are a combination of a series of parameters.
  • during testing, this embodiment can test each saved neural network using the sample facial images reserved for testing, and select the neural network with the highest classification accuracy as the neural network used when determining the reference score.
  • in step S104, the step of determining the size score of the face region in the facial image to be scored, according to the distances between the key points and the preset correspondence between distance and size score, may specifically include the following steps 1a and 2a.
  • Step 1a Calculate the interpupillary distance according to each key point, and determine the target size score corresponding to the calculated interpupillary distance according to the preset correspondence between the interpupillary distance and the size score.
  • the target size score can be recorded as TupleScore.
  • FIG. 3 is a schematic diagram of the correspondence between the interpupillary distance and the size score.
  • when the interpupillary distance is between d0 and dk, the corresponding size score grows as a linear function of the interpupillary distance.
  • when the interpupillary distance exceeds dk, the size score remains at a fixed value, which is the maximum value Max. For example, d0 can be 0, dk can be 50, and the maximum size score Max can be 100.
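Read directly off FIG. 3, the correspondence is a piecewise-linear function; the endpoint values d0 = 0, dk = 50 and Max = 100 come from the example above, while the zero floor at and below d0 is an assumption of this sketch:

```python
def target_size_score(pupil_distance: float, d0: float = 0.0,
                      dk: float = 50.0, max_score: float = 100.0) -> float:
    """TupleScore: linear growth between d0 and dk, saturating at Max."""
    if pupil_distance >= dk:
        return max_score
    if pupil_distance <= d0:
        return 0.0  # assumed floor value at and below d0
    return max_score * (pupil_distance - d0) / (dk - d0)
```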
  • Step 2a Determine the size score of the face region in the facial image to be scored according to the target size score.
  • the target size score may be determined as a size score of a face region in a facial image to be scored, or a value obtained by performing a certain processing on the target size score may be determined as a size score of a face region in a facial image to be scored.
  • for example, the target size score may be rounded, or divided by a certain value.
  • the size score of the face area in the facial image to be scored can be determined according to the size score corresponding to the interpupillary distance. This method is simple and easy to handle, and can reach a certain degree of rationality.
  • the step of determining the size score of the face region in the facial image to be scored according to the target size score may specifically include the following steps 2a-1 and 2a-2.
  • Step 2a-1 Calculate the offset angle between each key point, and determine the deflection coefficient of the face area in the facial image to be scored according to each offset angle;
  • the offset angle between each key point can reflect the degree of deflection of the face area, and the offset coefficient can reflect this degree of offset.
  • each key point may include a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point.
  • This step may specifically include:
  • a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the two remaining offset angles;
  • E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
  • A is the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point;
  • B is the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point;
  • C is the offset angle of the right eye key point and the left eye key point relative to the nose tip key point;
  • D is the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point;
  • E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
  • each offset angle of A, B, C, D, and E is marked in FIG. 4.
  • 2 is the left eye key point
  • 1 is the right eye key point
  • 3 is the nose tip key point
  • 5 is the left mouth corner key point
  • 4 is the right mouth corner key point.
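The published TupleCoefficient formula itself is an image and is not reproduced here, but the five offset angles it consumes can be derived from the key-point coordinates roughly as follows (the helper names and the 2D coordinate layout are assumptions of this sketch):

```python
import numpy as np

def offset_angle(p, q, anchor) -> float:
    """Angle in degrees between the rays anchor->p and anchor->q."""
    v1 = np.asarray(p, dtype=float) - np.asarray(anchor, dtype=float)
    v2 = np.asarray(q, dtype=float) - np.asarray(anchor, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def face_angles(right_eye, left_eye, nose, right_mouth, left_mouth):
    A = offset_angle(right_eye, right_mouth, nose)   # right eye / right mouth corner
    B = offset_angle(left_eye, left_mouth, nose)     # left eye / left mouth corner
    C = offset_angle(right_eye, left_eye, nose)      # the two eyes
    D = offset_angle(left_mouth, right_mouth, nose)  # the two mouth corners
    dx, dy = np.asarray(left_eye, dtype=float) - np.asarray(right_eye, dtype=float)
    E = abs(float(np.degrees(np.arctan2(dy, dx))))   # eye line vs. horizontal
    ordered = sorted([A, B, C, D], reverse=True)
    a, c = ordered[:2]   # the two largest offset angles
    b, d = ordered[2:]   # the remaining two offset angles
    return a, b, c, d, E
```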
  • Step 2a-2 Determine the size score of the face region in the facial image to be scored according to the target size score and the deflection coefficient.
  • the product of the target size score and the deflection coefficient may be determined as the size score of the face region in the facial image to be scored. That is, the size score can be calculated according to the following formula: FacePointScore = TupleScore × TupleCoefficient.
  • in this embodiment, the degree of deflection of the facial area is also considered.
  • this embodiment reflects the degree of deflection through a deflection coefficient, thereby increasing the weight of the degree of deflection of the face region when evaluating the image. This can improve the rationality of the score.
  • the offset coefficient determined according to the offset angle between key points can more reasonably reflect the degree of deflection of the face area, thereby making the determined size score more accurate.
  • in step S105, the step of determining the final score of the facial image to be scored according to the reference score and the size score may specifically include the following steps 1b and 2b.
  • Step 1b Calculate the sharpness of the facial image to be scored.
  • this embodiment considers the influence of the sharpness of the image on the score.
  • when calculating the sharpness of the facial image to be scored, the Roberts operator, the Prewitt operator, the Sobel operator, or the Laplacian operator can be used. However, because the eyebrows and lips of a face show obvious horizontal stripes, the horizontal gradients in these places are relatively large. Moreover, considering that some people wear glasses, the gradient near the glasses frame will also be large, and the edges of the frame include both horizontal and vertical parts, so an operator that computes horizontal and vertical gradients will produce a large error.
  • therefore, this example can use a diagonal-direction gradient algorithm to calculate the gradient value of each pixel in the facial image to be scored, and calculate the sharpness of the facial image to be scored according to the average of the gradient values of all pixels in the facial image to be scored.
  • the diagonal direction gradient algorithm can be a Prewitt operator. See Table 2 and Table 3 for the convolution template of the Prewitt operator.
  • when using the Prewitt operator to calculate the gradient value of each pixel in the facial image to be scored, the procedure may include: for each pixel in the facial image to be scored, determine the pixel and the eight pixels around it; multiply the gray values of these 9 pixels by the corresponding values in Table 2 to obtain 9 products, and use the sum of the 9 products as the first sub-gradient value of the pixel; multiply the gray values of the same 9 pixels by the corresponding values in Table 3 to obtain 9 products, and use the sum of these 9 products as the second sub-gradient value of the pixel; use the sum of the first sub-gradient value and the second sub-gradient value as the gradient value of the pixel. In this manner, the gradient value of each pixel in the facial image to be scored can be calculated.
  • a gradient image containing the gradient value can be obtained.
  • the result obtained by dividing the sum of all gradient values in the gradient image by the total number of gradient values is determined as an average value.
  • the average value may be normalized to [0,1], and the normalized average value is used as the sharpness of the facial image to be scored.
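A compact sketch of this sharpness measure follows; since Tables 2 and 3 are published as images, the two diagonal Prewitt templates below are the standard 45-degree and 135-degree kernels, which, like the fixed scaling into [0, 1], is an assumption of this sketch:

```python
import numpy as np
from scipy.signal import convolve2d

K45  = np.array([[0,  1,  1], [-1, 0,  1], [-1, -1,  0]], dtype=np.float32)
K135 = np.array([[1,  1,  0], [ 1, 0, -1], [ 0, -1, -1]], dtype=np.float32)

def clarity_score(face: np.ndarray) -> float:
    """ClarityScore: average diagonal-gradient magnitude, scaled to [0, 1]."""
    img = face.astype(np.float32)
    g1 = convolve2d(img, K45,  mode="same", boundary="symm")  # first sub-gradient
    g2 = convolve2d(img, K135, mode="same", boundary="symm")  # second sub-gradient
    grad = np.abs(g1) + np.abs(g2)          # per-pixel gradient value
    return float(grad.mean() / (6 * 255))   # 6*255 bounds the per-pixel response
```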
  • a facial area may also be determined from the facial image to be scored, and the sharpness of the facial area may be calculated as the sharpness of the facial image to be scored.
  • the facial area usually includes the hair of the head and facial features.
  • when calculating the sharpness of the facial area, the facial area can first be shrunk according to a certain ratio, so that the processed facial area does not include the edge region where the head meets the background.
  • the sharpness of the processed facial area is calculated as the sharpness of the facial image to be scored. This can make the calculated sharpness more accurate.
  • Step 2b Determine the final score of the facial image to be scored according to the sharpness, reference score and size score.
  • This step may specifically include various embodiments.
  • for example, the product of the sharpness, the reference score, and the size score, or their sum, or the product of two of them combined with the third, may be used as the final score of the facial image to be scored.
  • for example, the final score FinalScore of the facial image to be scored may be: FinalScore = DLScore × FacePointScore × ClarityScore, where:
  • DLScore is the reference score
  • FacePointScore is the size score
  • ClarityScore is the sharpness.
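If the multiplicative embodiment is used, the combination is simply:

```python
def final_score(dl_score: float, face_point_score: float, clarity: float) -> float:
    """FinalScore for the multiplicative embodiment described above."""
    return dl_score * face_point_score * clarity
```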
  • in this embodiment, the sharpness of the facial area is also considered.
  • since the sharpness of the face area has a great influence on selecting the best image, in this embodiment the sharpness is calculated after obtaining the reference score and the size score, so as to increase the weight of sharpness when evaluating the image. This can improve the rationality of the score.
  • the step S101 of obtaining a facial image to be scored may specifically include: acquiring a captured image collected by a camera as the facial image to be scored.
  • the execution subject of this embodiment may be an electronic device or a camera itself.
  • the camera can capture images according to a preset time period, and use the captured images as facial images to be scored.
  • after obtaining the final score of the facial image to be scored, the method may further include: obtaining the final score of the best facial image, and determining whether the final score of the facial image to be scored is greater than the final score of the best facial image; if it is greater, the facial image to be scored is updated to be the best facial image, that is, the facial image to be scored is used as the new best facial image and the original best facial image is deleted; if it is not greater, no processing is performed.
  • the original best facial image is an image acquired before acquiring a facial image to be scored.
  • for example, the camera captures an image every t seconds, continuously capturing the face while the person moves. Each captured image receives a final score. If the score of a later captured image is higher than that of the current image, the image with the higher score becomes the current image. This ensures that the current image is the best face image, and the image saved after the person disappears is the best face image. Only one image is kept for each person, which saves resources. For instance, if the current image is Fc with a final score of 89 points, and a later captured image Fl scores 99 points, then Fc is replaced with Fl, which then continues to be compared with later captured images.
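The capture-and-replace loop can be sketched as follows; score_image is a hypothetical stand-in for the full scoring pipeline above:

```python
best_image, best_score = None, float("-inf")

def on_capture(image) -> None:
    """Keep only the highest-scoring capture of the tracked person."""
    global best_image, best_score
    score = score_image(image)  # hypothetical: returns FinalScore
    if score > best_score:
        best_image, best_score = image, score  # replace and drop the old best
```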
  • FIG. 5 is a schematic structural diagram of a camera according to an embodiment of the present application. This embodiment corresponds to the method embodiment shown in FIG. 1.
  • the camera can be a network camera.
  • the camera can detect, track, capture, score and filter moving faces.
  • the camera may include: a processor 501, a memory 502, and an image acquisition module 503.
  • An image acquisition module 503, configured to acquire a facial image to be scored, and store the facial image in the memory 502;
  • the processor 501 is configured to obtain the facial image to be scored from the memory 502 and send it to a pre-trained neural network, the neural network scoring the facial image to be scored according to network parameters to obtain a reference score; the processor 501 extracts, from the facial image to be scored obtained from the memory, each key point of the facial area in the facial image to be scored; determines the size score of the face area in the facial image to be scored according to the distances between the key points and the preset correspondence between distance and size score; and determines the final score of the facial image to be scored according to the reference score and the size score.
  • the memory 502 may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory.
  • RAM Random Access Memory
  • NVM non-Volatile Memory
  • the memory 502 may also be at least one storage device located far from the foregoing processor 501.
  • the processor 501 may be a general-purpose processor, including a CPU, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • the network parameter is obtained by training the neural network according to the sample facial image and the corresponding standard score; the standard score is determined according to the facial state characteristics of the sample facial image.
  • the processor 501 is further configured to determine a standard score by using the following operations:
  • the base library is used to store standard facial images
  • the standard score of the sample facial image is determined according to the subjective score and the objective score.
  • the processor 501 is specifically configured to:
  • bin is the maximum gray value level of the pixel
  • S i is the i-th gray value in the sample facial image
  • G j is the j-th gray value in the standard facial image
  • h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively
  • S_k is the gray value of the k-th pixel in the sample facial image
  • G_k is the gray value of the k-th pixel in the standard facial image
  • N is the total number of pixels in the sample facial image or the standard facial image.
  • the processor 501 is further configured to obtain an initial value of a network parameter by using the following operations:
  • the reference network parameters of the trained reference neural network are obtained as the initial values of the network parameters; the reference neural network is obtained by training with a sample image different from the sample facial image.
  • the processor 501 is specifically configured to:
  • the size score of the face region in the facial image to be scored is determined.
  • the processor 501 is specifically configured to:
  • the size score of the face region in the facial image to be scored is determined.
  • each key point includes: a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point; the processor 501 is specifically configured to:
  • a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the two remaining offset angles;
  • E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
  • the processor 501 is specifically configured to:
  • the final score of the facial image to be scored is determined based on the sharpness, reference score, and size score.
  • the processor 501 is specifically configured to:
  • the diagonal direction gradient algorithm is used to calculate the gradient value of each pixel point in the facial image to be scored, and the sharpness of the facial image to be scored is calculated according to the average value of the gradient value of each pixel point in the facial image to be scored.
  • the processor 501 is specifically configured to:
  • the processor 501 is further configured to:
  • An embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, and the computer program implements the facial image scoring method provided by the embodiment of the present application when the computer program is executed by a processor.
  • the method includes:
  • the final score of the facial image to be scored is determined according to the reference score and the size score.
  • the neural network may determine the reference score for the facial state characteristics of the facial image to be scored, and may determine the size score according to the distance between key points.
  • various state characteristics and size characteristics of the facial image can be considered, and the scoring standard is more comprehensive, so the rationality when scoring the facial image can be improved.
  • the application also provides an application program, which is used to execute the steps of the facial image scoring method provided by the embodiment of the application at runtime.
  • the facial image scoring method includes:
  • a final score of the facial image to be scored is determined according to the reference score and the size score.
  • the neural network may determine the reference score for the facial state characteristics of the facial image to be scored, and may determine the size score according to the distance between key points.
  • various state characteristics and size characteristics of the facial image can be considered, and the scoring standard is more comprehensive, so the rationality when scoring the facial image can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a facial image scoring method and a camera. The method includes: acquiring a facial image to be scored; sending the facial image to be scored to a pre-trained neural network, the neural network scoring the facial image to be scored according to network parameters to obtain a reference score; extracting each key point of the facial region in the facial image to be scored; determining a size score of the facial region in the facial image to be scored according to the distances between the key points and a preset correspondence between distance and size score; and determining a final score of the facial image to be scored according to the reference score and the size score. Applying the solutions provided by the embodiments of the present application can improve the rationality of scoring facial images.

Description

Facial image scoring method and camera
This application claims priority to the Chinese patent application No. 201810540396.7, filed with the Chinese Patent Office on May 30, 2018 and entitled "Facial image scoring method and camera" (一种面部图像评分方法及摄像机), the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image scoring technology, and in particular to a facial image scoring method and a camera.
Background
With the development of computer vision technology, face recognition has been widely applied in many fields, for example in companies, homes, banks, and other areas. In such applications, a camera can capture face images at a set time interval and save them. For the same person, many face images will be captured within a period of time; if every one of them were saved, a large amount of the device's resources would be occupied. Therefore, the best image is generally selected from the multiple face images for saving.
When selecting the best image from multiple face images, each face image can usually be scored, and the face image with the highest score is selected as the best image. When scoring each face image, information such as whether the face in the image is a frontal face and whether the eyes are open can be detected, and the image scored according to the detection results. In real scenes, however, the captured face images vary enormously and a great many factors affect image quality; scoring face images in the above manner uses a rather one-dimensional standard, and the scores are not reasonable enough.
Summary
Embodiments of the present application provide a facial image scoring method and a camera to improve the rationality of scoring facial images. The specific technical solutions are as follows.
In a first aspect, an embodiment of the present application provides a camera, including a processor, a memory, and an image acquisition module;
the image acquisition module is configured to collect a facial image to be scored and store it in the memory;
the processor is configured to obtain the facial image to be scored from the memory and send it to a pre-trained neural network, the neural network scoring the facial image to be scored according to network parameters to obtain a reference score; the processor extracts, from the facial image to be scored obtained from the memory, each key point of the facial region in the facial image to be scored; determines a size score of the facial region in the facial image to be scored according to the distances between the key points and a preset correspondence between distance and size score; and determines a final score of the facial image to be scored according to the reference score and the size score.
Optionally, the network parameters are obtained by training the neural network according to a sample facial image and a corresponding standard score; the standard score is determined according to the facial state features of the sample facial image; the processor is further configured to determine the standard score using the following operations:
determining, according to the facial state features of the sample facial image and a preset correspondence between facial state features and scores, the score corresponding to the facial state features as a subjective score of the sample facial image;
determining, from a base library, a standard facial image corresponding to the sample facial image, and determining the similarity between the standard facial image and the sample facial image as an objective score of the sample facial image, the base library being used to store standard facial images;
determining a standard score of the sample facial image according to the subjective score and the objective score.
Optionally, the processor is specifically configured to:
determine the similarity Sim(S, G) between the standard facial image and the sample facial image according to the following formulas:
[The two formulas are published as images in the original; in the mutual-information form implied by the surrounding definitions they read:]
P(S_i, G_j) = h(S_i, G_j) / N
Sim(S, G) = Σ_i Σ_j P(S_i, G_j) · log( P(S_i, G_j) / ( P(S_i) · P(G_j) ) )
where bin is the maximum gray-value level of a pixel, S_i is the i-th gray level in the sample facial image, G_j is the j-th gray level in the standard facial image, and h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively; S_k is the gray value of the k-th pixel in the sample facial image, G_k is the gray value of the k-th pixel in the standard facial image, and N is the total number of pixels in the sample facial image or the standard facial image.
Optionally, the processor is further configured to obtain the initial values of the network parameters using the following operations:
obtaining the reference network parameters of a trained reference neural network as the initial values of the network parameters; the reference neural network is trained using sample images different from the sample facial images.
Optionally, the processor is specifically configured to:
calculate the interpupillary distance according to the key points, and determine, according to a preset correspondence between interpupillary distance and size score, a target size score corresponding to the calculated interpupillary distance;
determine the size score of the facial region in the facial image to be scored according to the target size score.
Optionally, the processor is specifically configured to:
calculate the offset angles between the key points, and determine a deflection coefficient of the facial region in the facial image to be scored according to the offset angles;
determine the size score of the facial region in the facial image to be scored according to the target size score and the deflection coefficient.
Optionally, the key points include: a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point; the processor is specifically configured to:
determine the deflection coefficient TupleCoefficient of the facial region in the facial image to be scored according to the following formula:
[Formula published as an image in the original: TupleCoefficient expressed in terms of the offset angles a, b, c, d and E defined below.]
where a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
Optionally, the processor is specifically configured to:
calculate the sharpness of the facial image to be scored;
determine the final score of the facial image to be scored according to the sharpness, the reference score, and the size score.
Optionally, the processor is specifically configured to:
calculate the gradient value of each pixel in the facial image to be scored using a diagonal-direction gradient algorithm, and calculate the sharpness of the facial image to be scored according to the average of the gradient values of all pixels in the facial image to be scored.
Optionally, the processor is specifically configured to:
obtain, from the memory, a captured image collected by the image acquisition module as the facial image to be scored;
the processor is further configured to:
after obtaining the final score of the facial image to be scored, obtain the score of the best facial image; determine whether the final score of the facial image to be scored is greater than the score of the best facial image, and if it is greater, update the facial image to be scored as the best facial image.
In a second aspect, an embodiment of the present application provides a facial image scoring method, including:
acquiring a facial image to be scored;
sending the facial image to be scored to a pre-trained neural network, the neural network scoring the facial image to be scored according to network parameters to obtain a reference score;
extracting each key point of the facial region in the facial image to be scored;
determining a size score of the facial region in the facial image to be scored according to the distances between the key points and a preset correspondence between distance and size score;
determining a final score of the facial image to be scored according to the reference score and the size score.
Optionally, the network parameters are obtained by training the neural network according to a sample facial image and a corresponding standard score; the standard score is determined according to the facial state features of the sample facial image; the standard score is determined in the following way:
determining, according to the facial state features of the sample facial image and a preset correspondence between facial state features and scores, the score corresponding to the facial state features as a subjective score of the sample facial image;
determining, from a base library, a standard facial image corresponding to the sample facial image, and determining the similarity between the standard facial image and the sample facial image as an objective score of the sample facial image, the base library being used to store standard facial images;
determining a standard score of the sample facial image according to the subjective score and the objective score.
Optionally, the step of determining the similarity between the standard facial image and the sample facial image includes:
determining the similarity Sim(S, G) between the standard facial image and the sample facial image according to the following formulas:
[The two formulas are published as images in the original; in the mutual-information form implied by the surrounding definitions they read:]
P(S_i, G_j) = h(S_i, G_j) / N
Sim(S, G) = Σ_i Σ_j P(S_i, G_j) · log( P(S_i, G_j) / ( P(S_i) · P(G_j) ) )
where bin is the maximum gray-value level of a pixel, S_i is the i-th gray level in the sample facial image, G_j is the j-th gray level in the standard facial image, and h(S_i, G_j) is the number of positions at which the pixels at the same location in the sample facial image and the standard facial image have the gray values S_i and G_j, respectively; S_k is the gray value of the k-th pixel in the sample facial image, G_k is the gray value of the k-th pixel in the standard facial image, and N is the total number of pixels in the sample facial image or the standard facial image.
Optionally, the initial values of the network parameters are obtained in the following way:
obtaining the reference network parameters of a trained reference neural network as the initial values of the network parameters; the reference neural network is trained using sample images different from the sample facial images.
Optionally, the step of determining the size score of the facial region in the facial image to be scored according to the distances between the key points and the preset correspondence between distance and size score includes:
calculating the interpupillary distance according to the key points, and determining, according to a preset correspondence between interpupillary distance and size score, a target size score corresponding to the calculated interpupillary distance;
determining the size score of the facial region in the facial image to be scored according to the target size score.
Optionally, the step of determining the size score of the facial region in the facial image to be scored according to the target size score includes:
calculating the offset angles between the key points, and determining a deflection coefficient of the facial region in the facial image to be scored according to the offset angles;
determining the size score of the facial region in the facial image to be scored according to the target size score and the deflection coefficient.
Optionally, the key points include: a left eye key point, a right eye key point, a nose tip key point, a left mouth corner key point, and a right mouth corner key point; the step of determining the deflection coefficient of the facial region in the facial image to be scored according to the offset angles includes:
determining the deflection coefficient TupleCoefficient of the facial region in the facial image to be scored according to the following formula:
[Formula published as an image in the original: TupleCoefficient expressed in terms of the offset angles a, b, c, d and E defined below.]
where a and c are the two largest of the following offset angles: the offset angle of the right eye key point and the right mouth corner key point relative to the nose tip key point; the offset angle of the left eye key point and the left mouth corner key point relative to the nose tip key point; the offset angle of the right eye key point and the left eye key point relative to the nose tip key point; and the offset angle of the left mouth corner key point and the right mouth corner key point relative to the nose tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right eye key point and the left eye key point relative to the horizontal line.
Optionally, the step of determining the final score of the facial image to be scored according to the reference score and the size score includes:
calculating the sharpness of the facial image to be scored;
determining the final score of the facial image to be scored according to the sharpness, the reference score, and the size score.
Optionally, the step of calculating the sharpness of the facial image to be scored includes:
calculating the gradient value of each pixel in the facial image to be scored using a diagonal-direction gradient algorithm, and calculating the sharpness of the facial image to be scored according to the average of the gradient values of all pixels in the facial image to be scored.
Optionally, the step of acquiring a facial image to be scored includes:
acquiring a captured image collected by a camera as the facial image to be scored;
after the final score of the facial image to be scored is obtained, the method further includes:
obtaining the score of the best facial image;
determining whether the final score of the facial image to be scored is greater than the score of the best facial image, and if it is greater, updating the facial image to be scored as the best facial image.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium having a computer program stored therein; when executed by a processor, the computer program implements the facial image scoring method provided in the second aspect.
In a fourth aspect, an embodiment of the present application further provides an application program, which, when run, executes the facial image scoring method provided in the second aspect.
The facial image scoring method and camera provided by the embodiments of the present application can score a facial image to be scored with a neural network to obtain a reference score, determine a size score of the facial region according to the distances between the key points of the facial region in the facial image to be scored, and determine a final score of the facial image to be scored according to the reference score and the size score. The neural network can determine the reference score based on the facial state features of the facial image to be scored, and the size score can be determined according to the distances between key points. The embodiments of the present application can take various state features as well as size features of a facial image into account; the scoring standard is more comprehensive, so the rationality of scoring facial images can be improved. Of course, implementing any product or method of this application does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a facial image scoring method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a way of determining the standard scores used to train the neural network in the embodiment shown in FIG. 1, according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a correspondence between interpupillary distance and size score according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the key points of a human face according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a camera according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
To improve the rationality of scoring facial images, an embodiment of the present application provides a facial image scoring method. The present application is described in detail below through specific embodiments.
FIG. 1 is a schematic flowchart of a facial image scoring method according to an embodiment of the present application. The method is applied to an electronic device. The electronic device may be a computer, a server, a video camera, a smartphone, or another device with a data processing function. The method includes the following steps:
步骤S101:获取待评分面部图像。
其中,待评分面部图像中至少包含一个面部区域。待评分面部图像可以为包含待评分的面部区域的图像。面部区域可以为人脸,也可以为动物脸或机器人脸等。待评分面部图像可以只包含面部区域,也可以包含面部区域以及面部区域之外的背景区域。
当电子设备自身包含图像采集模块时,获取待评分面部图像时,电子设备可以直接获取图像采集模块采集的待评分面部图像。当电子设备自身不包含图像采集模块时,电子设备可以从其他设备处获取待评分面部图像。
步骤S102:将待评分面部图像发送至预先训练的神经网络,由神经网络根据网络参数对待评分面部图像进行评分,得到参考评分,记为DLScore。
其中,网络参数为预先根据样本面部图像及对应的标准评分对神经网络进行训练得到。标准评分为根据样本面部图像的面部状态特征确定。在得到参考评分之后,在一种实施方式中,参考评分越高,可以认为待评分面部图像越接近最佳图像;在另一种实施方式中,参考评分越低,可以认为待评分面部图像越接近最佳图像,这都是合理的。
面部状态特征可以包括是否为正脸、面部是否清晰、面部是否有遮挡以及遮挡部位、是否存在表情、是否睁眼等方面的特征。
根据样本面部图像及对应的标准评分训练好神经网络后,神经网络可以根据学习到的网络参数对待评分面部图像进行评分,所得到的参考评分即为根据面部状态特征进行的评分,无需单一地检测待评分面部图像是否为正脸、是否睁眼等信息。因此,神经网络得出的参考评分能够参考更多的面部状态特征,得到的待评分面部图像的最终评分也更合理。
神经网络可以运行在电子设备内部,也可以运行在其他设备内部。在神经网络确定待评分图像的参考评分时,将参考评分发送至作为执行主体的电 子设备。电子设备可以接收其他设备发送的参考评分。
Step S103: extract the key points of the facial region in the facial image to be scored.
This step is performed by the electronic device acting as the execution body. The key points of the facial region include the left-eye key point, the right-eye key point, the nose-tip key point, the left mouth-corner key point, the right mouth-corner key point, and so on, and may also include key points of the shape of the facial region.
When extracting the key points of the facial region in the facial image to be scored, an Active Shape Model, an Active Appearance Model, a Constrained Local Model, a method based on cascaded shape regression (Cascaded Regression), a deep-learning-based method, or the like may be used.
Extracting the key points of the facial region in the facial image to be scored may mean extracting the image coordinates of those key points.
Step S104: determine the size score of the facial region in the facial image to be scored according to the distances between the key points and the preset correspondence between distances and size scores.
The distance between key points may be the pupillary distance, the distance between the left and right mouth-corner key points, or the distances from other key points to the nose-tip key point. The preset correspondence between distances and size scores may be obtained in advance from sample facial images.
In one implementation of the correspondence between distances and size scores, the larger the distance, the higher the resolution of the facial region, the clearer the facial region, and the higher the corresponding size score. The size score in this embodiment can represent how clear the facial region is: the higher the size score, the clearer the face and the closer the image is to the best image.
The size score may also be called the facial key point score, denoted FacePointScore.
Steps S103 and S104 may also be performed before step S102; this application does not limit the order.
Step S105: determine the final score of the facial image to be scored according to the reference score and the size score, denoted FinalScore.
Specifically, the final score of the facial image to be scored may be obtained by adding the reference score and the size score, multiplying them, or adding them and then multiplying by a preset value.
In one implementation, the higher the final score, the closer the facial image to be scored is considered to be to the best image. After the final scores of the facial images to be scored are determined, the facial image to be scored with the highest final score may further be determined as the best image.
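As an illustration only, the flow of steps S101 to S105 can be sketched in Python as follows. This is a minimal sketch, not part of the embodiment: the three helper callables are hypothetical stand-ins for the neural network scoring, the key point extraction, and the size scoring described above, and multiplication is used as one of the combination options just mentioned.

def score_face_image(image, get_reference_score, extract_key_points, get_size_score):
    dl_score = get_reference_score(image)          # S102: DLScore from the network
    key_points = extract_key_points(image)         # S103: key point coordinates
    face_point_score = get_size_score(key_points)  # S104: FacePointScore
    return dl_score * face_point_score             # S105: one combination option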
As can be seen from the above, in this embodiment the neural network determines the reference score from the facial state features of the facial image to be scored, and the size score is determined from the distances between the key points. This embodiment considers various state features as well as size features of the facial image, so the scoring criteria are more comprehensive and the scoring of facial images is more reasonable.
In another embodiment of the present application, the standard score may be determined according to the flow shown in FIG. 2, which includes the following steps S201 to S203.
Step S201: determine, according to the facial state features of the sample facial image and the preset correspondence between facial state features and scores, the score corresponding to the facial state features as the subjective score of the sample facial image.
When training the neural network, a large number of sample facial images need to be acquired, and a subjective score may be determined for each of them.
The facial state features may include features describing the degree of facial deflection, the clarity, the facial expression, whether the eyes blink, and so on. When establishing the correspondence between facial state features and subjective scores, images with a very frontal, very clear, expressionless, non-blinking face may be labeled as one class, and the remaining images divided into a second class, a third class, and so on, according to the severity of unfavorable factors such as blinking, a lowered or raised head, a profile view, smiling, and occlusion. Each class forms a tier, and each tier scores higher than the tier below it.
Occlusion may include occlusion of the eyes, the face, the forehead, the mouth, etc.; the occluder may be a mask, a hand, a branch, a book, glasses, and so on. Expressions may include no obvious expression, smiling, pursed lips, and so on.
When determining the subjective score corresponding to the facial state features, a pre-trained facial state classifier may be used to determine the scores of the sample facial image for the degree of facial deflection and the presence of occlusion, and manual scoring may be used to rate the sample facial image on the preset clarity, facial expression, blinking, and similar aspects. The subjective score of the sample facial image is then obtained from the score determined by the facial state classifier and the manually assigned score; for example, their product may be taken as the subjective score of the sample facial image.
The facial state classifier may be trained with sample facial images and corresponding labels. In one implementation, the labels are set as follows: if the face in the sample facial image is a back view, the label is 0; if the pitch angle of the face is greater than or equal to 40 degrees, or the yaw angle is greater than or equal to 40 degrees, the label is 5; if the face is frontal and the pitch angle is less than 40 degrees, the label may be determined from the mapping in Table 1.
Table 1
[Table 1 images PCTCN2019080024-appb-000007 and -000008: the label mapping itself is not recoverable from this text]
As can be seen from Table 1 and the label-setting rules above, the sample facial images may be divided into 13 classes when setting labels, with class labels 0, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, and 100.
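A minimal Python sketch of this label-setting rule follows. The mapping of Table 1 is not reproduced in this text, so the table1_label callable is a hypothetical stand-in for that lookup; it would return one of the 13 class labels for a frontal face with a pitch angle below 40 degrees.

LABELS = (0, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100)  # the 13 class labels

def set_label(is_back_face, pitch_deg, yaw_deg, table1_label):
    if is_back_face:
        return 0                              # back view of the head
    if pitch_deg >= 40 or yaw_deg >= 40:
        return 5                              # strongly deflected face
    return table1_label(pitch_deg, yaw_deg)   # Table 1 lookup (hypothetical)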
Step S202: determine, from the gallery, the standard facial image corresponding to the sample facial image, and determine the similarity between the standard facial image and the sample facial image as the objective score of the sample facial image.
The gallery stores the standard facial images, which may correspond to different subjects. Each standard facial image may be a frontal, very clear, expressionless, non-blinking image.
When determining the standard facial image corresponding to the sample facial image from the gallery, the sample facial image may be matched against the standard facial images in the gallery with a preset image matching algorithm, and the successfully matched standard facial image taken as the one corresponding to the sample facial image; the correspondence may also be determined manually. The subject in the sample facial image and in its corresponding standard facial image should be the same, e.g. the same person.
The standard facial images in the gallery may have the same resolution and image size. Before the similarity between the standard facial image and the sample facial image is determined, the sample facial image may be preprocessed, so that the image parameters of the sample facial image and the standard facial image are on the same level, which makes the determined similarity more accurate.
The preprocessing may include normalizing the resolution and the brightness of the sample facial image. The resolution may be normalized to 256 x 256 pixels. Normalizing the brightness may include: calculating the average brightness of all standard facial images in the gallery, denoted Aveobj; calculating the average brightness of the m-th sample facial image, denoted Avem; when Avem is greater than Aveobj, subtracting |Aveobj - Avem| from each gray value of the m-th sample facial image, and when Avem is less than Aveobj, adding |Aveobj - Avem| to each gray value, so that the average brightness of the sample facial image equals the average brightness of all standard facial images in the gallery.
The resolution of the sample facial image may be normalized first, and the brightness of the resolution-normalized sample facial image normalized afterwards.
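A minimal Python sketch of this preprocessing follows. The text does not specify a resampling method, so nearest-neighbor resizing is assumed here, and the final clipping to the valid gray range is a practical safeguard of the sketch rather than a step stated above.

import numpy as np

def resize_256(gray):
    h, w = gray.shape
    rows = np.arange(256) * h // 256   # nearest-neighbor source rows
    cols = np.arange(256) * w // 256   # nearest-neighbor source columns
    return gray[rows][:, cols]

def preprocess(sample, gallery):
    ave_obj = np.mean([g.mean() for g in gallery])  # Aveobj over the gallery
    resized = resize_256(sample).astype(np.float64)
    ave_m = resized.mean()                          # Avem of this sample
    # Adding the signed difference covers both cases in the text: subtract
    # |Aveobj - Avem| when Avem > Aveobj, add it when Avem < Aveobj.
    return np.clip(resized + (ave_obj - ave_m), 0, 255)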
The similarity Sim(S, G) between the standard facial image G and the sample facial image S may be determined according to the following formulas:
[Formula images PCTCN2019080024-appb-000009 and -000010: Sim(S, G) in terms of the joint histogram h(S_i, G_j), the joint distribution P(S_i, G_j), the marginals P(S_i) and P(G_j), and the pixel gray values S_k, G_k; the expressions themselves are not recoverable from this text]
where bin is the maximum gray level of a pixel, e.g. 255. S_i is the i-th gray level in the sample facial image, with i ranging from 0 to 255; G_j is the j-th gray level in the standard facial image, with j ranging from 0 to 255; and h(S_i, G_j) is the number of occurrences of pixels at the same position in the sample facial image and the standard facial image whose gray values are S_i and G_j respectively. S_k is the gray value of the k-th pixel in the sample facial image, G_k is the gray value of the k-th pixel in the standard facial image, and N is the total number of pixels of the sample facial image or of the standard facial image; the two totals may be the same.
When determining the similarity Sim(S, G), a joint histogram of the standard facial image G and the sample facial image S may be obtained in advance, and each h(S_i, G_j) read from it. The horizontal axis of the joint histogram is the gray-value combinations, and the vertical axis is the number of occurrences of each combination in the sample facial image or the standard facial image. In each gray-value combination (S_i, G_j), the position of the pixel corresponding to S_i in the sample facial image is the same as the position of the pixel corresponding to G_j in the standard facial image.
P(S_i, G_j) denotes the joint probability distribution of the standard facial image G and the sample facial image S, and P(S_i) and P(G_j) denote the marginal probability densities.
After the similarity Sim(S, G) is determined, it may further be normalized into [0, 1].
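The two formulas themselves are image placeholders above and cannot be reproduced here. Given the symbols the text defines (a joint histogram, a joint probability distribution, and marginal probability densities), a mutual-information-style similarity is one plausible reading; the following Python sketch assumes that reading and should not be taken as the exact patented formula. As stated above, the resulting value would then be normalized into [0, 1] before use as the objective score.

import numpy as np

def joint_histogram(sample, standard, bins=256):
    # h[i, j]: occurrences of gray value i in the sample and gray value j in
    # the standard image at the same pixel position (uint8 images, same shape).
    h = np.zeros((bins, bins), dtype=np.int64)
    np.add.at(h, (sample.ravel(), standard.ravel()), 1)
    return h

def similarity(sample, standard, bins=256):
    h = joint_histogram(sample, standard, bins)
    n = sample.size                 # N: total number of pixels
    p_joint = h / n                 # P(S_i, G_j)
    p_s = p_joint.sum(axis=1)       # marginal P(S_i)
    p_g = p_joint.sum(axis=0)       # marginal P(G_j)
    nz = p_joint > 0                # avoid log(0)
    outer = np.outer(p_s, p_g)
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / outer[nz])))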
Step S203: determine the standard score of the sample facial image according to the subjective score and the objective score.
When determining the standard score of the sample facial image, the product or the sum of the subjective score and the objective score may be determined as the standard score.
After the standard score is determined, the score label closest to the standard score may further be selected from the preset score labels and taken as the updated standard score of the sample facial image, so that the updated standard score is one of the preset score labels. This makes the standard scores more standardized and facilitates subsequent processing.
For example, if the subjective score of a sample facial image is 80 and its objective score is 0.83, the standard score may be 80 * 0.83 = 66.4. Since the absolute difference between 66.4 and the score label 70 is the smallest, the standard score may be updated to 70.
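A minimal Python sketch of this update, using the 13 class labels listed earlier:

LABELS = (0, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100)

def standard_score(subjective, objective, labels=LABELS):
    raw = subjective * objective                    # e.g. 80 * 0.83 = 66.4
    return min(labels, key=lambda t: abs(t - raw))  # 66.4 snaps to 70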
When training the neural network, the setting of the standard scores of the sample facial images directly affects how reasonable the reference scores determined by the neural network for facial images to be scored will be. This embodiment determines the standard score of a sample facial image from its subjective score and objective score, which makes the standard score more reasonable, so that the trained neural network better meets the requirements and the reference scores obtained for facial images to be scored are more accurate.
After the standard score of each sample facial image is determined, the training samples are annotated according to this scoring standard, and the annotated sample facial images can be input into the neural network. The neural network processes each sample facial image, extracts its facial state features, determines an evaluation score from them, and then adjusts its network parameters appropriately according to the difference between the evaluation score and the standard score of the sample facial image, gradually learning the correspondence between the facial state features of an image and its score. The network parameters may be adjusted with, for example, a stochastic gradient descent algorithm, which is not specifically limited here.
When all sample facial images have been used for training, or the number of training iterations reaches a preset number, the neural network can score images accurately, training may be stopped, and the neural network used for scoring facial images to be scored is obtained. This neural network embodies the correspondence between the facial state features of an image and its score.
After a facial image to be scored is input into this neural network, the network extracts its facial state features and then, based on the network parameters, scores the facial image according to the correspondence between facial state features and scores, yielding the reference score.
In another example of the present application, when training the neural network, a commonly used deep learning network and a commonly used deep learning training framework may be selected, and the hyperparameter file used for training written in advance. Before training, the sample facial images may be divided into two parts: one used for training, and one used after training to test the accuracy of the neural network.
Many commonly used deep learning networks are available, e.g. MobileNet, SqueezeNet, ResNet, GoogleNet, GoogleNet Inception V2, GoogleNet Inception V3, GoogleNet Inception V4, Xception, and ResNeXt. When selecting a deep learning network, both the classification accuracy of the network model and the time spent determining the reference score of a facial image to be scored must be weighed, i.e. both accuracy and efficiency. This embodiment may use the GoogleNet Inception V2 network, whose accuracy meets the requirements and whose efficiency is relatively high.
Commonly used deep learning training frameworks include Caffe, TensorFlow, PyTorch, and MXNet; for example, this embodiment may use Caffe, modified so that it supports data augmentation.
In deep learning, a sufficient amount of data usually needs to be fed to the neural network to avoid overfitting. When the amount of data is not large enough, improving the training effect through data augmentation is a very effective approach. Data augmentation means applying one or more of geometric transformations, color transformations, Principal Component Analysis (PCA) processing, blurring, and the like to images to increase the number of samples. Specifically, data augmentation may include horizontal and vertical flipping, random cropping, rotation, scaling, affine transformation, projective transformation, random erasing, adding Gaussian noise, blurring, and adjusting color saturation, brightness, and contrast.
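As an illustration, a few of the listed augmentations (horizontal flip, random crop, Gaussian noise) can be sketched with numpy as follows for a grayscale image; the crop ratio and noise strength are arbitrary illustrative values, and the remaining listed transforms would be added in the same spirit.

import numpy as np

rng = np.random.default_rng()

def augment(image):
    out = image.astype(np.float64)
    if rng.random() < 0.5:                 # horizontal flip
        out = out[:, ::-1]
    h, w = out.shape
    ch, cw = h - h // 10, w - w // 10      # random crop keeping ~90%
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = out[top:top + ch, left:left + cw]
    out = out + rng.normal(0.0, 5.0, out.shape)  # Gaussian noise, sigma = 5
    return np.clip(out, 0, 255).astype(np.uint8)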
To improve the efficiency of training the neural network, the initial values of the network parameters may be obtained as follows:
obtain the reference network parameters of a trained reference neural network as the initial values of the network parameters.
The reference neural network is trained with sample images different from the sample facial images described above. Its type should be the same as that of the neural network to be trained; for example, both may be GoogleNet Inception V2 networks.
The reference neural network may be a network model trained on a large dataset. Since the reference neural network already has fairly good network parameters, training on top of it with the sample facial images and corresponding standard scores improves training efficiency. This training approach may be called fine-tuning.
When training a network model, the number of samples affects both the training time and the detection performance of the trained model. In some specific tasks, if the number of samples is not large enough, retraining a new, well-performing network model is not easy, and its parameters are hard to tune; fine-tuning can be used instead. Moreover, even with a large number of samples, fine-tuning often yields better results than training a new network model from zero. Fine-tuning means training a new network model on the basis of a model already trained on a dataset containing a large number of samples, together with the task-specific samples; the trained models may be, for example, the many Model Zoo models trained on the ImageNet dataset. The benefit of fine-tuning is that the network model does not have to be retrained from scratch, which improves efficiency. The accuracy of a network model retrained from scratch generally starts from a very low value and rises slowly, whereas fine-tuning allows the network model to reach good results after relatively few iterations.
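Purely to illustrate this initialize-from-pretrained-weights step: the embodiment itself uses Caffe with GoogleNet Inception V2, whereas the following PyTorch sketch uses torchvision's GoogLeNet (Inception v1) and arbitrary hyperparameters, all of which are assumptions of the sketch rather than the embodiment's configuration.

import torch
import torchvision

# Start from weights trained on a large dataset (here ImageNet) and replace
# the classifier head for the 13 score-label classes.
model = torchvision.models.googlenet(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 13)  # 13 score labels

# Fine-tune briefly on the annotated sample facial images; fine-tuning
# typically reaches good accuracy after relatively few iterations.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()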
Before training the neural network, the hyperparameter file required for training must be written. The hyperparameters it contains specify the network file used for training, the optimization algorithm, the learning rate, the learning rate schedule, the maximum number of iterations, the training mode, how often the classifier model is saved during training, and so on; the training mode may be GPU (Graphics Processing Unit) or CPU (Central Processing Unit). Some hyperparameters are fixed, such as the network file used for training and the training mode, while the choice of others, such as the learning rate and the optimization algorithm, needs to be determined with K-fold cross-validation.
During training of the neural network, the network parameters may be saved at preset time intervals. After the neural network is trained, multiple saved sets of network parameters are available, including the sets produced during training and the set produced when the maximum number of training iterations is reached; each saved set is a combination of a series of parameters. For the neural networks using the different saved sets of network parameters, this embodiment may test each one with the sample facial images reserved for testing and select the neural network with the highest classification accuracy as the neural network used for determining reference scores.
In another embodiment of the present application, step S104, determining the size score of the facial region in the facial image to be scored according to the distances between the key points and the preset correspondence between distances and size scores, may specifically include the following steps 1a and 2a.
Step 1a: calculate the pupillary distance according to the key points, and determine, according to the preset correspondence between pupillary distances and size scores, the target size score corresponding to the calculated pupillary distance. The target size score may be denoted TupleScore.
The larger the proportion of the image that the pupillary distance occupies, the higher the resolution of the facial region is considered to be; and once the pupillary distance exceeds a certain threshold, the perceived quality of the face in the image changes little. The correspondence between pupillary distance and size score may therefore be set as a piecewise function. FIG. 3 is a schematic diagram of this correspondence: when the pupillary distance is in the interval from d0 to dk, the corresponding size score increases linearly with the pupillary distance; when the pupillary distance is greater than dk, the size score stays at a fixed value, the maximum Max. For example, d0 may be 0, dk may be 50, and the maximum size score Max may be 100.
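A minimal Python sketch of the piecewise correspondence of FIG. 3, using the example values d0 = 0, dk = 50, and Max = 100 given above:

def target_size_score(pupil_distance, d0=0.0, dk=50.0, max_score=100.0):
    if pupil_distance <= d0:
        return 0.0
    if pupil_distance >= dk:
        return max_score                                  # flat segment beyond dk
    return max_score * (pupil_distance - d0) / (dk - d0)  # linear segment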
Step 2a: determine the size score of the facial region in the facial image to be scored according to the target size score.
In one implementation, the target size score may be determined directly as the size score of the facial region in the facial image to be scored, or a value obtained after some processing of the target size score, such as rounding or dividing by a certain value, may be determined as the size score.
In summary, in this embodiment the size score of the facial region in the facial image to be scored may be determined from the size score corresponding to the pupillary distance; this is both simple to handle and reasonably sound.
In another embodiment of the present application, to make the determined size score more reasonable, step 2a, determining the size score of the facial region in the facial image to be scored according to the target size score, may specifically include the following steps 2a-1 and 2a-2.
Step 2a-1: calculate the offset angles between the key points, and determine the deflection coefficient of the facial region in the facial image to be scored according to the offset angles.
The offset angles between the key points reflect the degree of deflection of the facial region, and the deflection coefficient expresses this degree of deflection.
In this embodiment, the key points may include: the left-eye key point, the right-eye key point, the nose-tip key point, the left mouth-corner key point, and the right mouth-corner key point. This step may specifically include:
determining the deflection coefficient TupleCoefficient of the facial region in the facial image to be scored according to the following formula:
[Formula image PCTCN2019080024-appb-000011: TupleCoefficient as a function of a, b, c, d, and E; the expression itself is not recoverable from this text]
where a and c are the two largest offset angles among: the offset angle of the right-eye key point and the right mouth-corner key point relative to the nose-tip key point, the offset angle of the left-eye key point and the left mouth-corner key point relative to the nose-tip key point, the offset angle of the right-eye key point and the left-eye key point relative to the nose-tip key point, and the offset angle of the left mouth-corner key point and the right mouth-corner key point relative to the nose-tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right-eye key point and the left-eye key point relative to the horizontal. That is:
[Formula image PCTCN2019080024-appb-000012: the assignment of a, c to the two largest of A, B, C, D and of b, d to the remaining two; not recoverable from this text]
where A is the offset angle of the right-eye key point and the right mouth-corner key point relative to the nose-tip key point, B is the offset angle of the left-eye key point and the left mouth-corner key point relative to the nose-tip key point, C is the offset angle of the right-eye key point and the left-eye key point relative to the nose-tip key point, D is the offset angle of the left mouth-corner key point and the right mouth-corner key point relative to the nose-tip key point, and E is the offset angle of the line connecting the right-eye key point and the left-eye key point relative to the horizontal. Each of these offset angles can be calculated from the coordinates of the corresponding key points.
Referring to the face diagram of FIG. 4, the offset angles A, B, C, D, and E are all marked in the figure, where 2 is the left-eye key point, 1 is the right-eye key point, 3 is the nose-tip key point, 5 is the left mouth-corner key point, and 4 is the right mouth-corner key point.
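Of these angles, only E is fully specified by the text; the exact geometric construction of A, B, C, and D is shown in FIG. 4 and in the formula image, neither of which is reproduced here. The following Python sketch therefore computes E as defined and, purely for illustration, assumes that each of A to D is the angle subtended at the nose-tip key point by the corresponding pair of key points; how a, b, c, d, and E are then combined into TupleCoefficient follows the formula image above.

import math

def eye_line_angle(right_eye, left_eye):
    # E: angle of the eye-to-eye line relative to the horizontal, in degrees.
    dx = left_eye[0] - right_eye[0]
    dy = left_eye[1] - right_eye[1]
    angle = abs(math.degrees(math.atan2(dy, dx)))
    return min(angle, 180.0 - angle)

def subtended_angle(p, q, nose):
    # Assumed reading of "offset angle of p and q relative to the nose tip":
    # the angle between the vectors nose->p and nose->q, in degrees.
    v1 = (p[0] - nose[0], p[1] - nose[1])
    v2 = (q[0] - nose[0], q[1] - nose[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))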
Step 2a-2: determine the size score of the facial region in the facial image to be scored according to the target size score and the deflection coefficient.
In one implementation, the product of the target size score and the deflection coefficient may be determined as the size score of the facial region in the facial image to be scored, i.e. the size score FacePointScore may be calculated according to the following formula:
FacePointScore = TupleCoefficient * TupleScore.
The degree of deflection of the facial region is also considered when training the neural network. However, since the degree of deflection of the facial region strongly affects which image is best, this embodiment expresses that degree through the deflection coefficient, increasing its weight in the evaluation of images, which makes the scoring more reasonable. Moreover, a deflection coefficient determined from the offset angles between the key points reflects the degree of deflection of the facial region more reasonably, making the determined size score more accurate.
In another embodiment of the present application, step S105, determining the final score of the facial image to be scored according to the reference score and the size score, may specifically include the following steps 1b and 2b.
Step 1b: calculate the clarity of the facial image to be scored.
In practice, the movement of the subject affects image quality in unpredictable ways, e.g. the motion blur that may arise when a person walks. The clearer the image, the higher its score should be, i.e. the closer it is to the best image. This embodiment therefore takes the influence of image clarity on the score into account.
The clarity of the facial image to be scored may be calculated with the Roberts operator, the Prewitt operator, the Sobel operator, the Laplacian operator, and so on. However, the eyebrows, lips, and similar areas of the face show pronounced horizontal stripes, where the horizontal gradient is large; and since some people wear glasses, the gradients near the frames are also large, and the frame edges include both horizontal and vertical parts. Operators that compute horizontal- or vertical-direction gradients therefore all produce large errors.
To make the calculated clarity more accurate, this example may use a diagonal-direction gradient algorithm: calculate the gradient value of each pixel in the facial image to be scored, and calculate the clarity of the facial image to be scored from the average of the gradient values of the pixels in the facial image to be scored.
For example, the diagonal-direction gradient algorithm may be a Prewitt operator, whose convolution templates are shown in Table 2 and Table 3.
Table 2

 0  -1  -1
 1   0  -1
 1   1   0

Table 3

 1   1   0
 1   0  -1
 0  -1  -1

(The blank cells in the source rendering of Tables 2 and 3 are read here as the 0 and -1 entries of the standard diagonal Prewitt templates, which is consistent with the positions of the visible 1 entries.)
Specifically, using the Prewitt operator to calculate the gradient value of each pixel in the facial image to be scored may include: for each pixel, take that pixel and the 8 pixels around it, multiply the gray values of these 9 pixels by the corresponding values in Table 2, and take the sum of the 9 products as the first sub-gradient value of the pixel; multiply the gray values of the 9 pixels by the corresponding values in Table 3, and take the sum of the 9 products as the second sub-gradient value of the pixel; take the sum of the first and second sub-gradient values as the gradient value of the pixel. In this way, the gradient value of every pixel in the facial image to be scored can be calculated.
After the gradient values of the pixels in the facial image to be scored are calculated, a gradient image containing the gradient values is obtained. The result of dividing the sum of all gradient values in the gradient image by the total number of gradient values is taken as the average.
In one implementation, after the average of the gradient values of the pixels in the facial image to be scored is obtained, the average may be normalized into [0, 1], and the normalized average taken as the clarity of the facial image to be scored.
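A minimal Python sketch of this clarity computation follows, using the templates of Tables 2 and 3 as reconstructed above. Taking the absolute value of each pixel's combined response before averaging, and dividing the average by 255 to land in [0, 1], are practical choices of the sketch that the text leaves open.

import numpy as np
from scipy.ndimage import correlate

K_TABLE2 = np.array([[0, -1, -1],
                     [1,  0, -1],
                     [1,  1,  0]], dtype=np.float64)
K_TABLE3 = np.array([[ 1,  1,  0],
                     [ 1,  0, -1],
                     [ 0, -1, -1]], dtype=np.float64)

def clarity(image):
    img = image.astype(np.float64)
    # First and second sub-gradient values: elementwise products of each 3x3
    # neighborhood with the template, summed (i.e. correlation), then added.
    g = correlate(img, K_TABLE2) + correlate(img, K_TABLE3)
    mean_grad = np.abs(g).mean()        # average over the gradient image
    return min(mean_grad / 255.0, 1.0)  # normalized clarity in [0, 1]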
When calculating the clarity of the facial image to be scored, the facial region may also first be determined from the facial image to be scored and its clarity calculated as the clarity of the facial image to be scored. The facial region usually includes the hair of the head, the facial features, and so on. To avoid the adverse effect that the edge connecting the head to the background has on the clarity calculation, the facial region may be shrunk by a certain ratio so that the processed facial region no longer contains that edge, and the clarity of the processed facial region calculated as the clarity of the facial image to be scored. This makes the calculated clarity more accurate.
Step 2b: determine the final score of the facial image to be scored according to the clarity, the reference score, and the size score.
This step may be implemented in several ways, e.g. taking the product of the clarity, the reference score, and the size score, or their sum, or the sum of two of them multiplied by the third, as the final score of the facial image to be scored.
For example, the final score FinalScore of the facial image to be scored may be:
FinalScore = DLScore * FacePointScore * ClarityScore
where DLScore is the reference score, FacePointScore is the size score, and ClarityScore is the clarity.
The clarity of the facial region is also considered when training the neural network. However, since clarity strongly affects which image is best, this embodiment calculates the clarity again after obtaining the reference score and the size score, increasing the weight of clarity in the evaluation of images, which makes the scoring more reasonable.
In another embodiment of the present application, step S101 of obtaining the facial image to be scored may specifically include: obtaining a captured image collected by a camera as the facial image to be scored.
The execution body of this embodiment may be an electronic device or the camera itself. The camera may capture images at a preset time period and use the captured images as facial images to be scored.
After the final score of the facial image to be scored is obtained, the method may further include: obtaining the final score of the best facial image; determining whether the final score of the facial image to be scored is greater than the final score of the best facial image; if so, updating the facial image to be scored as the best facial image, i.e. taking the facial image to be scored as the new best facial image and deleting the original best facial image; if not, taking no action. The original best facial image here is an image obtained before the facial image to be scored was obtained.
For example, the camera captures an image every t seconds; while a person moves, the face is captured repeatedly, and each captured image receives a score. If a later captured image scores higher than the earlier one, the higher-scoring image is used as the current image, which ensures that the current image is the best face image. When the person disappears, the saved image is the best face image; only one image is kept per person, which saves resources. As another example, if the current image Fc has a final score of 89 and a later captured image Fl scores 99, Fl replaces Fc and is then compared with subsequently captured images.
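A minimal Python sketch of this retention policy follows; keying by a per-person track identifier is an assumption of the sketch, since the text states only that one image is kept per person.

best = {}  # track id -> (final_score, image)

def update_best(track_id, image, final_score):
    kept = best.get(track_id)
    if kept is None or final_score > kept[0]:
        best[track_id] = (final_score, image)  # e.g. Fl at 99 replaces Fc at 89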
FIG. 5 is a schematic structural diagram of a camera provided by an embodiment of the present application. This embodiment corresponds to the method embodiment shown in FIG. 1. The camera may be a network camera and can detect, track, capture, score, and screen moving faces. The camera may include: a processor 501, a memory 502, and an image acquisition module 503.
The image acquisition module 503 is configured to collect a facial image to be scored and store it in the memory 502.
The processor 501 is configured to obtain the facial image to be scored from the memory 502 and send it to a pre-trained neural network, which scores the facial image to be scored according to network parameters to obtain a reference score. For the facial image to be scored obtained from the memory, the processor 501 extracts the key points of the facial region in the facial image to be scored; determines the size score of the facial region in the facial image to be scored according to the distances between the key points and the preset correspondence between distances and size scores; and determines the final score of the facial image to be scored according to the reference score and the size score.
The memory 502 may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), e.g. at least one disk memory. Optionally, the memory 502 may also be at least one storage device located away from the processor 501.
The processor 501 may be a general-purpose processor, including a CPU, a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present application, the network parameters are obtained by training the neural network with sample facial images and corresponding standard scores; the standard score is determined according to the facial state features of the sample facial image.
The processor 501 is further configured to determine the standard score by:
determining, according to the facial state features of the sample facial image and the preset correspondence between facial state features and scores, the score corresponding to the facial state features as the subjective score of the sample facial image;
determining, from the gallery storing standard facial images, the standard facial image corresponding to the sample facial image, and determining the similarity between the standard facial image and the sample facial image as the objective score of the sample facial image;
determining the standard score of the sample facial image according to the subjective score and the objective score.
In another embodiment of the present application, the processor 501 is specifically configured to:
determine the similarity Sim(S, G) between the standard facial image and the sample facial image according to the following formulas:
[Formula images PCTCN2019080024-appb-000013 and -000014: Sim(S, G) in terms of the joint histogram h(S_i, G_j), the joint distribution P(S_i, G_j), the marginals P(S_i) and P(G_j), and the pixel gray values S_k, G_k; the expressions themselves are not recoverable from this text]
where bin is the maximum gray level of a pixel, S_i is the i-th gray level in the sample facial image, G_j is the j-th gray level in the standard facial image, and h(S_i, G_j) is the number of occurrences of pixels at the same position in the sample facial image and the standard facial image whose gray values are S_i and G_j respectively; S_k is the gray value of the k-th pixel in the sample facial image, G_k is the gray value of the k-th pixel in the standard facial image, and N is the total number of pixels of the sample facial image or of the standard facial image.
In another embodiment of the present application, the processor 501 is further configured to obtain the initial values of the network parameters by:
obtaining the reference network parameters of a trained reference neural network as the initial values of the network parameters, where the reference neural network is trained with sample images different from the sample facial images.
In another embodiment of the present application, the processor 501 is specifically configured to:
calculate the pupillary distance according to the key points, and determine, according to the preset correspondence between pupillary distances and size scores, the target size score corresponding to the calculated pupillary distance;
determine the size score of the facial region in the facial image to be scored according to the target size score.
In another embodiment of the present application, the processor 501 is specifically configured to:
calculate the offset angles between the key points, and determine the deflection coefficient of the facial region in the facial image to be scored according to the offset angles;
determine the size score of the facial region in the facial image to be scored according to the target size score and the deflection coefficient.
In another embodiment of the present application, the key points include: the left-eye key point, the right-eye key point, the nose-tip key point, the left mouth-corner key point, and the right mouth-corner key point; the processor 501 is specifically configured to:
determine the deflection coefficient TupleCoefficient of the facial region in the facial image to be scored according to the following formula:
[Formula image PCTCN2019080024-appb-000015: TupleCoefficient as a function of a, b, c, d, and E; the expression itself is not recoverable from this text]
where a and c are the two largest offset angles among: the offset angle of the right-eye key point and the right mouth-corner key point relative to the nose-tip key point, the offset angle of the left-eye key point and the left mouth-corner key point relative to the nose-tip key point, the offset angle of the right-eye key point and the left-eye key point relative to the nose-tip key point, and the offset angle of the left mouth-corner key point and the right mouth-corner key point relative to the nose-tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right-eye key point and the left-eye key point relative to the horizontal.
In another embodiment of the present application, the processor 501 is specifically configured to:
calculate the clarity of the facial image to be scored;
determine the final score of the facial image to be scored according to the clarity, the reference score, and the size score.
In another embodiment of the present application, the processor 501 is specifically configured to:
use a diagonal-direction gradient algorithm to calculate the gradient value of each pixel in the facial image to be scored, and calculate the clarity of the facial image to be scored according to the average of the gradient values of the pixels in the facial image to be scored.
In another embodiment of the present application, the processor 501 is specifically configured to:
obtain a captured image collected by the image acquisition module 503 as the facial image to be scored;
the processor 501 is further configured to:
after the final score of the facial image to be scored is obtained, obtain the score of the best facial image; determine whether the final score of the facial image to be scored is greater than the score of the best facial image, and if so, update the facial image to be scored as the best facial image.
Since the above camera embodiment is derived from the method embodiment, it has the same technical effects, which are not repeated here. Because the camera embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the facial image scoring method provided by the embodiments of the present application. The method includes:
obtaining a facial image to be scored;
sending the facial image to be scored to a pre-trained neural network, which scores the facial image to be scored according to network parameters to obtain a reference score;
extracting key points of a facial region in the facial image to be scored;
determining a size score of the facial region in the facial image to be scored according to distances between the key points and a preset correspondence between distances and size scores;
determining a final score of the facial image to be scored according to the reference score and the size score.
In this embodiment, the neural network determines the reference score from the facial state features of the facial image to be scored, and the size score is determined from the distances between the key points. This embodiment considers various state features as well as size features of the facial image, so the scoring criteria are more comprehensive and the scoring of facial images is more reasonable.
The present application further provides an application program which, when run, performs the steps of the facial image scoring method provided by the embodiments of the present application.
In one embodiment of the present application, the facial image scoring method includes:
obtaining a facial image to be scored;
sending the facial image to be scored to a pre-trained neural network, which scores the facial image to be scored according to network parameters to obtain a reference score;
extracting key points of a facial region in the facial image to be scored;
determining a size score of the facial region in the facial image to be scored according to distances between the key points and a preset correspondence between distances and size scores;
determining a final score of the facial image to be scored according to the reference score and the size score.
In this embodiment, the neural network determines the reference score from the facial state features of the facial image to be scored, and the size score is determined from the distances between the key points. This embodiment considers various state features as well as size features of the facial image, so the scoring criteria are more comprehensive and the scoring of facial images is more reasonable.
Since the above computer-readable storage medium and application program embodiments are derived from the method embodiment, they have the same technical effects, which are not repeated here. Because they are substantially similar to the method embodiment, their descriptions are relatively brief; for relevant details, refer to the description of the method embodiment.
It should be noted that in this document relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes it.
The embodiments in this specification are described in a related manner; for identical or similar parts, reference may be made between the embodiments, and each embodiment focuses on its differences from the others.
The above are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (11)

  1. A camera, characterized by comprising: a processor, a memory, and an image acquisition module;
    the image acquisition module being configured to collect a facial image to be scored and store it in the memory;
    the processor being configured to obtain the facial image to be scored from the memory and send the facial image to be scored to a pre-trained neural network, which scores the facial image to be scored according to network parameters to obtain a reference score;
    the processor, for the facial image to be scored obtained from the memory, extracting key points of a facial region in the facial image to be scored; determining a size score of the facial region in the facial image to be scored according to distances between the key points and a preset correspondence between distances and size scores; and determining a final score of the facial image to be scored according to the reference score and the size score.
  2. The camera according to claim 1, characterized in that the network parameters are obtained by training the neural network with sample facial images and corresponding standard scores, the standard score being determined according to facial state features of the sample facial image;
    the processor being further configured to determine the standard score by:
    determining, according to the facial state features of the sample facial image and a preset correspondence between facial state features and scores, the score corresponding to the facial state features as a subjective score of the sample facial image;
    determining, from a gallery storing standard facial images, a standard facial image corresponding to the sample facial image, and determining the similarity between the standard facial image and the sample facial image as an objective score of the sample facial image;
    determining the standard score of the sample facial image according to the subjective score and the objective score.
  3. The camera according to claim 2, characterized in that the processor is specifically configured to:
    determine the similarity Sim(S, G) between the standard facial image and the sample facial image according to the following formulas:
    [Formula images PCTCN2019080024-appb-100001 and -100002: Sim(S, G) in terms of the joint histogram h(S_i, G_j), the joint distribution P(S_i, G_j), the marginals P(S_i) and P(G_j), and the pixel gray values S_k, G_k; the expressions themselves are not recoverable from this text]
    where bin is the maximum gray level of a pixel, S_i is the i-th gray level in the sample facial image, G_j is the j-th gray level in the standard facial image, and h(S_i, G_j) is the number of occurrences of pixels at the same position in the sample facial image and the standard facial image whose gray values are S_i and G_j respectively; S_k is the gray value of the k-th pixel in the sample facial image, G_k is the gray value of the k-th pixel in the standard facial image, and N is the total number of pixels of the sample facial image or of the standard facial image.
  4. The camera according to claim 1, characterized in that the processor is further configured to obtain the initial values of the network parameters by:
    obtaining reference network parameters of a trained reference neural network as the initial values of the network parameters, the reference neural network being trained with sample images different from the sample facial images.
  5. The camera according to claim 1, characterized in that the processor is specifically configured to:
    calculate a pupillary distance according to the key points, and determine, according to a preset correspondence between pupillary distances and size scores, a target size score corresponding to the calculated pupillary distance;
    determine the size score of the facial region in the facial image to be scored according to the target size score.
  6. The camera according to claim 5, characterized in that the processor is specifically configured to:
    calculate offset angles between the key points, and determine a deflection coefficient of the facial region in the facial image to be scored according to the offset angles;
    determine the size score of the facial region in the facial image to be scored according to the target size score and the deflection coefficient.
  7. The camera according to claim 6, characterized in that the key points include: a left-eye key point, a right-eye key point, a nose-tip key point, a left mouth-corner key point, and a right mouth-corner key point; the processor being specifically configured to:
    determine the deflection coefficient TupleCoefficient of the facial region in the facial image to be scored according to the following formula:
    [Formula image PCTCN2019080024-appb-100003: TupleCoefficient as a function of a, b, c, d, and E; the expression itself is not recoverable from this text]
    where a and c are the two largest offset angles among: the offset angle of the right-eye key point and the right mouth-corner key point relative to the nose-tip key point, the offset angle of the left-eye key point and the left mouth-corner key point relative to the nose-tip key point, the offset angle of the right-eye key point and the left-eye key point relative to the nose-tip key point, and the offset angle of the left mouth-corner key point and the right mouth-corner key point relative to the nose-tip key point; b and d are the remaining two offset angles; and E is the offset angle of the line connecting the right-eye key point and the left-eye key point relative to the horizontal.
  8. The camera according to claim 1, characterized in that the processor is specifically configured to:
    calculate the clarity of the facial image to be scored;
    determine the final score of the facial image to be scored according to the clarity, the reference score, and the size score.
  9. The camera according to claim 8, characterized in that the processor is specifically configured to:
    use a diagonal-direction gradient algorithm to calculate the gradient value of each pixel in the facial image to be scored, and calculate the clarity of the facial image to be scored according to the average of the gradient values of the pixels in the facial image to be scored.
  10. The camera according to claim 1, characterized in that the processor is specifically configured to:
    obtain a captured image collected by the image acquisition module as the facial image to be scored;
    the processor being further configured to:
    after the final score of the facial image to be scored is obtained, obtain the score of the best facial image; determine whether the final score of the facial image to be scored is greater than the score of the best facial image, and if so, update the facial image to be scored as the best facial image.
  11. A facial image scoring method, characterized in that the method comprises:
    obtaining a facial image to be scored;
    sending the facial image to be scored to a pre-trained neural network, which scores the facial image to be scored according to network parameters to obtain a reference score;
    extracting key points of a facial region in the facial image to be scored;
    determining a size score of the facial region in the facial image to be scored according to distances between the key points and a preset correspondence between distances and size scores;
    determining a final score of the facial image to be scored according to the reference score and the size score.
PCT/CN2019/080024 2018-05-30 2019-03-28 Facial image scoring method and camera WO2019228040A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810540396.7A CN110634116B (zh) 2018-05-30 2018-05-30 Facial image scoring method and camera
CN201810540396.7 2018-05-30

Publications (1)

Publication Number Publication Date
WO2019228040A1 true WO2019228040A1 (zh) 2019-12-05

Family

ID=68697212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/080024 WO2019228040A1 (zh) 2018-05-30 2019-03-28 Facial image scoring method and camera

Country Status (2)

Country Link
CN (1) CN110634116B (zh)
WO (1) WO2019228040A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126268A (zh) * 2019-12-24 2020-05-08 北京奇艺世纪科技有限公司 Key point detection model training method and apparatus, electronic device, and storage medium
CN113012089A (zh) * 2019-12-19 2021-06-22 北京金山云网络技术有限公司 Image quality evaluation method and apparatus
CN113822171A (zh) * 2021-08-31 2021-12-21 苏州中科先进技术研究院有限公司 Pet attractiveness scoring method, apparatus, storage medium, and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415302B (zh) * 2020-03-25 2023-06-09 Oppo广东移动通信有限公司 Image processing method and apparatus, storage medium, and electronic device
CN111524123B (zh) * 2020-04-23 2023-08-08 北京百度网讯科技有限公司 Method and apparatus for processing images
CN112418098A (zh) * 2020-11-24 2021-02-26 深圳云天励飞技术股份有限公司 Training method for a video structuring model and related device
CN113255472B (zh) * 2021-05-07 2024-05-24 北京中科通量科技有限公司 Face quality evaluation method and system based on random embedding stability
CN114724183B (zh) * 2022-04-08 2024-05-24 平安科技(深圳)有限公司 Human key point detection method and system, electronic device, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9549119B2 (en) * 2012-04-10 2017-01-17 Samsung Electronics Co., Ltd Apparatus and method for continuously taking a picture
CN107832802A (zh) * 2017-11-23 2018-03-23 北京智芯原动科技有限公司 Face image quality evaluation method and apparatus based on face comparison
CN108038422A (zh) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera apparatus, face recognition method, and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799877A (zh) * 2012-09-11 2012-11-28 上海中原电子技术工程有限公司 Face image screening method and system
CN106205633B (zh) * 2016-07-06 2019-10-18 李彦芝 Imitation and performance practice scoring system
CN106454338B (zh) * 2016-11-25 2018-09-25 广州视源电子科技股份有限公司 Method and apparatus for detecting the picture display effect of an electronic device
CN107194898A (zh) * 2017-06-23 2017-09-22 携程计算机技术(上海)有限公司 Hotel image display method, storage medium, and hotel information pushing method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9549119B2 (en) * 2012-04-10 2017-01-17 Samsung Electronics Co., Ltd Apparatus and method for continuously taking a picture
CN108038422A (zh) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera apparatus, face recognition method, and computer-readable storage medium
CN107832802A (zh) * 2017-11-23 2018-03-23 北京智芯原动科技有限公司 Face image quality evaluation method and apparatus based on face comparison

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012089A (zh) * 2019-12-19 2021-06-22 北京金山云网络技术有限公司 Image quality evaluation method and apparatus
CN111126268A (zh) * 2019-12-24 2020-05-08 北京奇艺世纪科技有限公司 Key point detection model training method and apparatus, electronic device, and storage medium
CN111126268B (zh) 2019-12-24 2023-04-25 北京奇艺世纪科技有限公司 Key point detection model training method and apparatus, electronic device, and storage medium
CN113822171A (zh) * 2021-08-31 2021-12-21 苏州中科先进技术研究院有限公司 Pet attractiveness scoring method, apparatus, storage medium, and device

Also Published As

Publication number Publication date
CN110634116A (zh) 2019-12-31
CN110634116B (zh) 2022-04-05

Similar Documents

Publication Publication Date Title
WO2019228040A1 (zh) 2019-12-05 Facial image scoring method and camera
CN107610087B (zh) 2020-02-18 Automatic tongue-coating segmentation method based on deep learning
CN108717663B (zh) 2023-04-28 Micro-expression-based method, apparatus, device, and medium for judging face-to-face signing fraud
US7848548B1 (en) Method and system for robust demographic classification using pose independent model from sequence of face images
TW202004637A (zh) 2020-01-16 Risk prediction method, storage medium, and server
KR20200004841A (ko) 셀피를 촬영하도록 사용자를 안내하기 위한 시스템 및 방법
WO2015078261A1 (en) Methods and systems for processing facial images
CN107330371A (zh) 2017-11-07 Method, apparatus, and storage device for acquiring facial expressions of a 3D face model
JP2017506379A (ja) 2017-03-02 System and method for identifying faces in unconstrained media
CN107368778A (zh) 2017-11-21 Method, apparatus, and storage device for capturing facial expressions
CN108229297A (zh) 2018-06-29 Face recognition method and apparatus, electronic device, and computer storage medium
WO2019091402A1 (zh) 2019-05-16 Age estimation method and apparatus
Lee et al. Markov random field models for hair and face segmentation
CN109685045A (zh) 2019-04-26 Moving object video tracking method and system
CN111652317A (zh) 2020-09-11 Hyperparameter image segmentation method based on Bayesian deep learning
CN109409298A (zh) 2019-03-01 Gaze tracking method based on video processing
Martinikorena et al. Fast and robust ellipse detection algorithm for head-mounted eye tracking systems
CN108416304B (zh) 2022-02-25 Three-class face detection method using context information
Kroon et al. Eye localization in low and standard definition content with application to face matching
CN113436735A (zh) 2021-09-24 Body mass index prediction method, device, and storage medium based on facial structure measurement
CN110827327B (zh) 2023-04-18 Fusion-based long-term object tracking method
Yahiaoui et al. Markov Chains for unsupervised segmentation of degraded NIR iris images for person recognition
WO2015131710A1 (zh) 2015-09-11 Human eye localization method and apparatus
US9349038B2 (en) Method and apparatus for estimating position of head, computer readable storage medium thereof
RU2768797C1 (ru) 2022-03-24 Method and system for detecting synthetically altered face images in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19810252

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19810252

Country of ref document: EP

Kind code of ref document: A1
