CN113435428A - Photo album-based photo sticker selection method, electronic equipment and storage medium - Google Patents

Photo album-based photo sticker selection method, electronic equipment and storage medium

Info

Publication number
CN113435428A
Authority
CN
China
Prior art keywords
face
value
eye
angle
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110991809.5A
Other languages
Chinese (zh)
Other versions
CN113435428B (en)
Inventor
林鸿飞
周有喜
乔国坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Core Computing Integrated Shenzhen Technology Co ltd
Original Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Aishen Yingtong Information Technology Co Ltd filed Critical Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority to CN202110991809.5A priority Critical patent/CN113435428B/en
Publication of CN113435428A publication Critical patent/CN113435428A/en
Application granted granted Critical
Publication of CN113435428B publication Critical patent/CN113435428B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a photo album-based method for selecting photo stickers, an electronic device and a storage medium. The method obtains a comprehensive face score from the face size, face similarity, face brightness, face definition and face angle in each face picture, and then selects the photo sticker according to the comprehensive face score, which reduces the chance that the selected photo sticker has a blurred face or a face that is too bright or too dark.

Description

Photo album-based photo sticker selection method, electronic equipment and storage medium
Technical Field
The application relates to the technical field of image processing, in particular to a photo album-based method for selecting a photo sticker, electronic equipment and a storage medium.
Background
A photo sticker, also called a sticker photo, is a popular form of photograph; most photo stickers are self-portrait face photos. An electronic photo album can gather photos of the same person together through a face clustering function to form a personal electronic album. In some scenarios, one photo needs to be selected from the personal electronic album and placed on the album's cover.
However, in the prior art, blurred, over-bright or over-dark faces are easily selected from the photos and used as the sticker placed on the cover of the personal photo album.
Disclosure of Invention
In view of this, to solve or mitigate the problems in the prior art, the present application provides an album-based photo sticker selection method, an electronic device and a storage medium, which can reduce the chance that the selected photo sticker has a blurred face or a face that is too bright or too dark.
In a first aspect, a method for selecting a photo sticker based on an album is provided, which includes:
acquiring a plurality of face pictures in an electronic photo album;
acquiring a face comprehensive score according to the face size, the face similarity, the face brightness, the face definition and the face angle in each face picture;
detecting whether the face in each face picture is an eye-opening face or not according to the sequence of the comprehensive face score from high to low;
when the eye-opening face is detected for the first time, extracting the eye-opening face in the corresponding face picture, and taking the eye-opening face as a sticker face.
In one embodiment, the method for selecting the photo stickers based on the photo album further includes:
when the faces in all the face pictures, examined in descending order of comprehensive face score, are detected to be closed-eye faces, extracting the face in the face picture with the highest comprehensive score as the sticker face.
In one embodiment, the obtaining of the comprehensive face score according to the face size, the face similarity, the face brightness, the face sharpness, and the face angle in each of the face pictures includes:
selecting a first face picture from the plurality of face pictures;
acquiring the size of a face frame of the first face picture, and determining the size of the largest face frame in the plurality of face pictures; dividing the size of the face frame of the first face picture by the maximum size of the face frame to obtain a face frame size ratio; multiplying the face frame size ratio by a face size weight value to obtain a face size dimension value of the first face picture;
acquiring a characteristic value of a face in the first face picture and a characteristic value of a face in each residual face picture according to a face recognition model, wherein the residual face pictures are the face pictures of the plurality of face pictures except the first face picture; acquiring an average value of similarity between the face in the first face picture and the faces in the residual face pictures according to the feature values of the faces in the first face picture and the feature values of the faces in the residual face pictures; multiplying the average value of the face similarity by a face similarity weight value to obtain a face similarity dimension value of the first face picture;
converting the face area in the first face picture into a gray image; acquiring an average value of gray points in a face area as a brightness value of the face; acquiring an absolute value of a difference between a brightness value of a human face and a preset brightness value; dividing the absolute value by the preset brightness value to obtain a brightness deviation degree; multiplying the brightness deviation degree by a face brightness weight value to obtain a face brightness dimension value of the first face picture;
acquiring a definition score value of the face in the first face picture through a definition classification model; multiplying the definition score value by a face definition weight value to obtain a face definition dimension value of the first face picture;
acquiring a left-right inclination angle, a left-right deflection angle and a pitching angle of the face in the first face picture through a face angle classification model; adding the product of the left and right deflection angles and the left and right deflection weight values, the product of the left and right inclination angles and the left and right inclination weight values, and the product of the pitching angles and the pitching weight values to obtain a face angle dimension value of the first face picture;
and adding the face size dimension value, the face similarity dimension value, the face brightness dimension value, the face definition dimension value and the face angle dimension value of the first face picture to obtain the face comprehensive score of the first face picture.
In one embodiment of the method, an open-close eye classification model is adopted to detect whether the face in each face picture is an open-eye face;
before the detecting whether the face in each of the face pictures is an eye-opening face or not, the method further comprises the step of training the eye-opening and closing classification model:
acquiring a plurality of face images as sample images, wherein the face images comprise eye-opening face images and eye-closing face images;
marking the eye-opening face image by adopting an eye-opening label, and marking the eye-closing face image by adopting an eye-closing label to obtain an eye-opening and eye-closing training image;
and training the opening and closing eye classification model by using the opening and closing eye training image, wherein when the opening and closing eye classification model is trained, the feature difference of the face images with the opening eye labels and the closing eye labels is expanded, the feature difference between the face images with the opening eye labels is reduced, and the feature difference of the face images with the closing eye labels is reduced at the same time until the loss value of the opening and closing eye classification model is smaller than a preset value.
In one embodiment, the face angle classification model includes: a left and right deflection angle model, a left and right inclination angle model and a pitching angle model;
before the left-right inclination angle, the left-right deflection angle and the pitching angle of the face in the face picture are obtained through the face angle classification model, the left-right deflection angle model, the left-right inclination angle model and the pitching angle model are respectively trained;
wherein the training step of the left and right deflection angle model comprises: acquiring a plurality of face images; labeling the face images with left-right deflection angle label values to obtain left-right deflection training images, wherein the left-right deflection angle label value is obtained by dividing the difference between the preset left-right deflection angle and the left-right deflection angle of the face in the face image by the preset left-right deflection angle; training the left and right deflection angle model through the left-right deflection training images;
the training step of the left-right inclination angle model comprises the following steps: acquiring a plurality of face images; marking the face image by adopting a left and right inclination angle marking value to obtain a left and right inclination training image, wherein the left and right inclination angle marking value is calculated by dividing the difference value between a preset left and right inclination angle and the left and right inclination angle of the face in the face image by the preset left and right inclination angle; training the left and right inclination angle model through the left and right inclination training images;
the step of training the pitch angle model comprises: acquiring a plurality of face images; labeling the face image by using a pitch angle labeling value to obtain a pitch training image, wherein the pitch angle labeling value is calculated by dividing a difference value between a preset pitch angle and a face pitch angle in the face image by the preset pitch angle; and training the pitch angle model through the pitch training image.
In one embodiment, the face size weight is 0.1, the face similarity weight is 0.2, the face brightness weight is 0.1, the face sharpness weight is 0.15, the yaw weight is 0.15, the pitch weight is 0.15.
In one embodiment of the present invention, the step of obtaining the size of the face frame of the face picture includes:
preprocessing the face picture, wherein the preprocessing comprises face righting processing and face image enhancement processing;
and inputting the preprocessed face picture into a face detection model to obtain the size of a face frame.
In one embodiment, before the inputting the preprocessed human face picture into the human face detection model, training the human face detection model is further included;
the loss function loss adopted for training the face detection model is as follows:
loss=L1+L2+L3+L4+β×L5
wherein L1 is the face bounding box coordinate offset loss, L2 is the face bounding box scaling loss, L3 is the face bounding box confidence loss, L4 is the classification loss, L5 is the ambiguity loss, and β is a preset coefficient;
wherein L5 = (1 + S(L3 + L4)) × C(Bt, Bp)
S is a sigmoid (S-type) function, S(L3 + L4) is the sigmoid value of the sum of the face bounding box confidence loss and the classification loss, Bt is the true ambiguity label, Bp is the predicted ambiguity label, C is a binary cross-entropy function, and C(Bt, Bp) is the binary cross-entropy of the true and predicted ambiguity labels.
In a second aspect, an electronic device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for selecting photo albums based on photo albums.
In a third aspect, one or more non-transitory readable storage media storing computer-readable instructions are provided, which when executed by one or more processors, cause the one or more processors to perform the steps of the album selection-based photo method as described above.
According to the photo album-based photo sticker selection method, the comprehensive face score is obtained from the face size, face similarity, face brightness, face definition and face angle in each face picture, and the photo sticker is selected according to the comprehensive face score, which reduces the chance that the selected photo sticker has a blurred face or a face that is too bright or too dark. Moreover, whether the face in each face picture is an eye-opening face is detected in descending order of the comprehensive face score; when an eye-opening face is detected for the first time, that face is used as the photo sticker face, and no further eye-opening detection is performed on subsequent face pictures, which reduces the time needed to select the photo sticker.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is to be understood that the drawings in the following description are illustrative only and are not restrictive of the invention.
Fig. 1 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application.
Fig. 2 is a flowchart of a method for selecting a photo sticker based on an album in an embodiment of the present application.
Fig. 3 is another flowchart of a method for photo album-based photo sticker selection in an embodiment of the present application.
Fig. 4 is a schematic diagram of a face frame in a face picture according to an embodiment of the present application.
Fig. 5 is a schematic diagram of three dimensions, namely, a left-right deflection angle dimension y (yaw), a left-right inclination angle dimension r (roll), and a pitch dimension p (pitch), in a face picture according to an embodiment of the present application.
Fig. 6 is a schematic diagram illustrating left and right deflection of a face in a face picture according to an embodiment of the present application.
Fig. 7 is a schematic diagram illustrating a left-right inclination of a face in a face picture according to an embodiment of the present application.
Fig. 8 is a schematic view of the pitch of the face in the face picture according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in fig. 1, the terminal includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capability and supports the operation of the whole electronic device. The memory is used for storing data, programs and the like, and stores at least one computer program which can be executed by the processor to implement the album-based photo sticker selection method provided by the embodiments of the present application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program may be executed by a processor to implement the method for selecting photo stickers based on a photo album provided by the following embodiments. The internal memory provides a cached execution environment for the operating system and the computer programs in the non-volatile storage medium. The network interface may be an Ethernet card or a wireless network card, etc., for communicating with an external electronic device.
The electronic devices described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and fixed terminals such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
Referring to fig. 2, the method for selecting a photo sticker based on an album includes:
s10, acquiring a plurality of face pictures in the electronic album;
s20, obtaining face comprehensive scores according to the face size, the face similarity, the face brightness, the face definition and the face angle in each face picture;
s30, detecting whether the face in each face picture is an eye-opening face or not according to the sequence of the comprehensive face score from high to low;
and S40, when the eye-opening face is detected for the first time, extracting the eye-opening face in the corresponding face picture, and taking the eye-opening face as a sticker face.
In the method for selecting photo stickers based on a photo album, the comprehensive face score is obtained from the face size, face similarity, face brightness, face definition and face angle in each face picture, which reduces the chance that the selected photo sticker has a blurred face or a face that is too bright or too dark. Moreover, whether the face in each face picture is an eye-opening face is detected in descending order of the comprehensive face score; when an eye-opening face is detected for the first time, that face is used as the photo sticker face, and no eye-opening detection is performed on subsequent face pictures, which reduces the time needed to select the photo sticker.
Referring to fig. 3, in an embodiment, after detecting whether the face in each face picture is an open-eye face according to the ranking of the face synthesis score from high to low, the method further includes:
and S50, when the human faces in all the human face pictures are detected to be eye-closed human faces according to the sequence of the comprehensive human face scores from high to low, extracting the human face in the human face picture with the highest comprehensive human face score as a sticker human face.
In this embodiment, if eye-opening detection has been completed on all the face pictures and no open-eye face has been found, that is, all faces are detected to be closed-eye faces, the face with the highest comprehensive score is used as the sticker face, which reduces the chance that sticker selection fails when all face pictures contain closed-eye faces.
In some embodiments, a preset composite score threshold may be set, so that when the face composite score of the detected face pictures reaches the preset composite score threshold and no open-eye face is detected yet, that is, all the faces in the face pictures with the face composite score greater than or equal to the preset composite score threshold are closed-eye faces, the face with the highest composite score is taken as the sticker face.
The album in step S10 may be an electronic album and contains a plurality of face pictures from which the photo sticker is selected. Optionally, the electronic album has a face clustering function, which can cluster pictures of the same person from pictures of different persons and put them into the electronic album; that is, through the clustering function the electronic album can be a personal electronic album composed of a plurality of face pictures of the same person. In addition, if the provided album is not an electronic album, it needs to be converted into one. The plurality of face pictures obtained from the electronic album serve as the face picture set from which the photo sticker is selected.
In step S20, a face comprehensive score is obtained according to the face size, the face similarity, the face brightness, the face sharpness, and the face angle in the face picture. The size of the face, the similarity of the face, the brightness of the face, the definition of the face and the angle of the face all affect the level of the comprehensive face score, that is, the comprehensive face score is related to the size of the face, the similarity of the face, the brightness of the face, the definition of the face and the angle of the face, and can be specifically realized by a comprehensive face score calculation method.
In one embodiment, a method for calculating a comprehensive face score is provided, and specifically, obtaining a comprehensive face score according to the face size, the face similarity, the face brightness, the face definition, and the face angle in each face picture includes:
s201, selecting a first face picture from a plurality of face pictures;
s202, obtaining the size of a face frame (the size of the face frame) of a first face picture, and determining the size of the largest face frame in a plurality of face pictures; dividing the size of the face frame of the first face picture by the maximum size of the face frame to obtain a size ratio of the face frame; multiplying the size ratio of the face frame by the face size weight value to obtain a face size dimension value of the first face picture;
s203, obtaining a characteristic value of a face in a first face picture and a characteristic value of a face in each residual face picture according to the face recognition model, wherein the residual face pictures are the residual face pictures except the first face picture in the plurality of face pictures; acquiring an average value of the similarity between the face in the first face picture and the face in each of the rest face pictures according to the feature value of the face in the first face picture and the feature values of the faces in the rest face pictures; multiplying the average value of the face similarity by the face similarity weight value to obtain a face similarity dimension value of the first face picture;
s204, converting a face area in the first face picture into a gray image; acquiring an average value of gray points in a face area as a brightness value of the face; acquiring an absolute value of a difference between a brightness value of a human face and a preset brightness value; dividing the absolute value by a preset brightness value to obtain a brightness deviation degree; multiplying the brightness deviation degree by the face brightness weight value to obtain a face brightness dimension value of the first face picture;
s205, obtaining a definition degree value of the face in the first face picture through a definition classification model; multiplying the definition degree value by a face definition weight value to obtain a face definition dimension value of the first face picture;
s206, acquiring a left-right inclination angle, a left-right deflection angle and a pitching angle of the face in the first face picture through the face angle classification model; adding the product of the left and right deflection angles and the left and right deflection weight values, the product of the left and right inclination angles and the left and right inclination weight values, and the product of the pitching angle and the pitching weight value to obtain a face angle dimension value of the first face picture;
and S207, adding the face size dimension value, the face similarity dimension value, the face brightness dimension value, the face definition dimension value and the face angle dimension value of the first face picture to obtain the face comprehensive score of the first face picture.
In step S201, the first face picture is one of the plurality of face pictures that is currently subjected to face comprehensive grading, and at this time, the other remaining face pictures that are not subjected to face comprehensive grading in the plurality of face pictures are called remaining face pictures. And grading each face picture in the plurality of face pictures according to a face comprehensive grading method of the first face picture in sequence to finish face comprehensive grading of each face picture in the plurality of face pictures.
Step S202 provides a method for obtaining a face size dimension value, where the face size dimension value is a score used for evaluating the size of a face in a comprehensive score of the face. The size dimension of the face is used as the standard for selecting the photo stickers, so that the photo stickers with large faces can be obtained by preferential screening. The larger the face size dimension value is, the larger the face is, and the better the face size dimension is. Optionally, the face size dimension value ranges from 0 to a face size weight value, for example, if the face size weight value is 0.1, the face size dimension value ranges from 0 to 0.1.
In the method for obtaining the face size dimension value, the size of the face in the picture can be represented by the size of the face frame. The frame of the face is a frame generated by the face detection model, as shown in fig. 4, the face frame 22 is located around the face 21. The size of the face frame can be understood as the size of the face frame, and can be specifically represented by the length of the side length of the frame, and the length is represented by taking pixels as a unit. In one embodiment, the step of obtaining the size of the face frame of the face picture includes:
s2021, preprocessing the face image, wherein the preprocessing comprises face righting processing and face image enhancement processing;
s2022, inputting the preprocessed human face picture into a human face detection model to obtain the size of a human face frame.
Step S2021 is to perform preprocessing on the face image, such as face straightening processing and face image enhancement processing, to reduce the problem of false face borders appearing in the face detection model. Specifically, the face centering is to obtain a face image with a correct face position; the human face image enhancement is to improve the quality of the human face image, so that the image is clearer visually and is more beneficial to the processing and recognition of a computer.
Optionally, the face righting processing specifically includes the steps of: obtaining an affine transformation matrix from a plurality of (e.g., 5) feature points of the face in the face picture and a plurality of (e.g., 5) reference feature point coordinates of a standard face; and performing rotation and translation correction on the face through the affine transformation matrix to obtain the righted face.
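As an illustration of the face righting step, a minimal sketch using OpenCV is given below; the 5 reference landmark coordinates and the 112×112 output size are assumptions for illustration, not values specified in this application.

```python
import cv2
import numpy as np

# Assumed 5 reference landmarks (eye centers, nose tip, mouth corners) of a
# 112x112 standard face; the actual reference coordinates are not given in
# this application and are used here only for illustration.
REFERENCE_POINTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                               [41.5, 92.4], [70.7, 92.2]])

def align_face(image, landmarks):
    """Right the face: map its 5 detected landmarks onto the reference points."""
    src = np.float32(landmarks)
    # Similarity transform (rotation + translation + uniform scale).
    matrix, _ = cv2.estimateAffinePartial2D(src, REFERENCE_POINTS, method=cv2.LMEDS)
    return cv2.warpAffine(image, matrix, (112, 112))
```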
Optionally, the face image enhancement processing specifically includes the steps of: counting the number of pixels at each gray level in the whole image; calculating the probability distribution of each gray level; calculating the cumulative distribution probability; calculating the equalized gray value; and mapping the pixel values back to the coordinates of the original pixels.
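A minimal NumPy sketch of the histogram-equalization steps listed above (count per gray level, probability distribution, cumulative distribution, equalized value, map back), assuming an 8-bit grayscale input:

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram equalization following the steps described above (8-bit input)."""
    hist = np.bincount(gray.ravel(), minlength=256)  # pixels at each gray level
    prob = hist / gray.size                          # probability of each level
    cdf = np.cumsum(prob)                            # cumulative distribution
    mapping = np.round(cdf * 255).astype(np.uint8)   # equalized gray values
    return mapping[gray]                             # map back to pixel coordinates
```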
Step S2022 obtains a face bounding box and the confidence corresponding to that box through the face detection model. The face detection model is a deep-learning neural network, for example a yolov3-based network, that can obtain face bounding boxes and confidences from face pictures.
Further, the training method of the face detection model based on yolov3 is improved, so that the face ambiguity can be scored, more accurate face ambiguity and confidence can be obtained, and the processing speed of the whole face detection model is increased. Specifically, the training set used in the training method of the face detection model increases the dimension of the ambiguity, that is, the ambiguity of the face in the pictures of the training set is marked to judge whether the face is a blurred face.
Further, the loss function loss adopted for training the face detection model is as follows:
loss=L1+L2+L3+L4+β×L5
where L1 is the face bounding box coordinate offset loss, L2 is the face bounding box scaling loss, L3 is the face bounding box confidence loss, L4 is the classification loss, L5 is the ambiguity loss, and β is a preset coefficient;
where L5 = (1 + S(L3 + L4)) × C(Bt, Bp)
S is a sigmoid (S-type) function, S(L3 + L4) is the sigmoid value of the sum of the face bounding box confidence loss and the classification loss, Bt is the true ambiguity label, Bp is the predicted ambiguity label, C is a binary cross-entropy function, and C(Bt, Bp) is the binary cross-entropy of the true and predicted ambiguity labels.
Specifically, L1 (which may be referred to as xy_loss) may be a binary cross-entropy loss designed on the offset of the face bounding box center point from its top-left grid point. L2 (which may be referred to as wh_loss) is a binary cross-entropy loss designed on the width and height of the face bounding box. L3 (which may be referred to as confidence_loss) is a binary cross-entropy loss based on obj and no_obj, computed separately for the two cases: for obj (the face bounding box has a corresponding ground-truth box) the binary cross entropy is calculated; for no_obj (the face bounding box has no corresponding ground-truth box), for example when the IoU (Intersection over Union) of the face bounding box and the ground-truth box is lower than 0.5, the binary cross entropy corresponding to no_obj is calculated. L4 (which may be referred to as class_loss) may be a classification loss based on binary cross entropy; further, for n classes, n binary cross-entropy loss functions are used.
L5 (which may be referred to as blur_loss), the ambiguity loss, is added to the loss function adopted by the face detection model. Specifically, during training the ambiguity dimension is appended to the yolov3 class-feature dimension, and the output feature map has dimension N × N × [a × (b + c + d)], where N × N is the number of grid points of the output feature map, a is the number of preset anchor boxes, b is the number of prediction box values per face bounding box, c is the prediction box confidence, and d is the number of class feature dimensions.
In the loss function adopted by the face detection model, β is a preset coefficient, also called an adjustable parameter, used to balance blur_loss against the other losses.
S denotes an S-type (sigmoid) activation function; S(L3 + L4) is the sigmoid value of the sum of the face bounding box confidence loss and the classification loss. Bt (true_blur_label) is the true ambiguity label and Bp (pred_blur_label) is the predicted ambiguity label; C is a binary cross-entropy function, specifically binary_cross_entropy, and C(Bt, Bp) is the binary cross-entropy of the true and predicted ambiguity labels. That is, when the loss is calculated for each grid point, the ambiguity loss is calculated by the following formula:
blur_loss = (1 + sigmoid(confidence_loss + class_loss)) × binary_cross_entropy(true_blur_label, pred_blur_label)
where sigmoid is the S-type activation function, sigmoid(confidence_loss + class_loss) is the sigmoid value of the sum of confidence_loss and class_loss, binary_cross_entropy is the binary cross-entropy loss function, true_blur_label is the true ambiguity label, pred_blur_label is the predicted ambiguity label, and binary_cross_entropy(true_blur_label, pred_blur_label) is the binary cross-entropy of true_blur_label and pred_blur_label.
Here confidence_loss and class_loss assist in adjusting the ambiguity loss: the sigmoid maps the value of confidence_loss + class_loss into part of the coefficient that scales the ambiguity loss, so the three are strongly correlated. It can be understood that if the confidence of the face bounding box and of the face class is low, i.e. their loss values are high, the ambiguity also obtains a large loss (not more than twice the directly calculated ambiguity loss); if the confidence of the face bounding box and of the face class is high, their loss values are low, and the ambiguity loss value is close to the direct loss binary_cross_entropy(true_blur_label, pred_blur_label), where binary_cross_entropy is the binary cross-entropy loss function used to calculate the loss between the predicted and true ambiguity labels, true_blur_label is the true ambiguity label (also called the true ambiguity score), and pred_blur_label is the predicted ambiguity label (also called the predicted ambiguity score). During training, pred_blur_label is gradually optimized toward the true ambiguity label given in the training set.
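For concreteness, a PyTorch-style sketch of the ambiguity loss L5 follows; all arguments are assumed to be torch tensors (the three loss terms as scalars, the labels as probability tensors), which is an assumption for illustration rather than the application's own code.

```python
import torch
import torch.nn.functional as F

def blur_loss(confidence_loss, class_loss, pred_blur_label, true_blur_label):
    """L5 = (1 + sigmoid(L3 + L4)) * binary_cross_entropy(true, pred)."""
    bce = F.binary_cross_entropy(pred_blur_label, true_blur_label)
    scale = 1.0 + torch.sigmoid(confidence_loss + class_loss)
    # scale lies in (1, 2), so L5 is at most twice the direct BCE, as noted above.
    return scale * bce
```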
Because the loss function of the face detection model includes the ambiguity loss, the detected face directly carries an ambiguity attribute and no separate ambiguity judgment is needed. Moreover, since reducing the ambiguity loss also requires reducing the face bounding box confidence loss and the classification loss, the parameters of the face detection model become better during training, and the confidences of the trained face bounding box and face class are more accurate.
In addition, in order to make the face detection model converge more effectively, avoid gradient explosion caused by one or more ill-scaled scoring dimensions, and reduce the workload of parameter tuning, this embodiment provides a dynamic gradient clipping method. In conventional gradient clipping, to prevent gradient explosion during back-propagation when optimizing the neural network, larger gradients are clipped: an upper gradient limit, e.g. 1, is set, and any gradient larger than 1 is forcibly set to 1 before the parameters are updated.
With such clipping, however, gradients larger than the specified threshold become indistinguishable from one another, and an improperly chosen threshold may cause convergence problems for the face detection model. That is, the gradient threshold is difficult to select.
To address this, in this embodiment the gradient is mapped into a specified range by a preset mapping function, so that differences in gradient magnitude are still reflected while gradient explosion caused by excessively large gradients is avoided.
The preset gradient mapping function h_c(z) maps the original gradient z (true_gradient) to the mapped gradient h_c (clip_gradient).
Specifically, after the first face picture is obtained, a face region in the first face picture is obtained by using a face detection model, wherein the face region is a region of the face picture corresponding to a face frame;
the face detection model is obtained by training through the following steps of:
inputting the training picture into a face detection model, and acquiring the descending gradient of the face detection model in the back propagation process;
mapping the descending gradient to a preset specified range through a mapping function to obtain a mapped descending gradient;
updating parameters of the face detection model through the mapped descending gradient;
wherein, in the gradient mapping function, z is the descent gradient, e is the base of the natural logarithm, and h_c is the mapped descent gradient.
For example, when updating the parameters of the face detection model through the mapped descending gradient, the following formula may be used:
θ(t+1) = θ(t) − η × h_c(∇θ J(θ(t); x_i))
where θ(t+1) is the updated parameter, θ(t) is the parameter before the update, η is the preset learning rate, h_c is the gradient mapping function, J(θ; x_i) is the prediction function determined by the sample data x_i, x_i is one selected piece of sample data, ∇θ is the gradient operator, ∇θ J(θ(t); x_i) denotes the partial derivative of the prediction function with respect to θ, and t is the number of updates, i.e., the number of training iterations.
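A small sketch of the update step is given below. The exact gradient mapping function is not reproduced here; np.tanh is used purely as a stand-in squashing function that, like the described mapping, bounds the gradient while preserving relative magnitudes.

```python
import numpy as np

def map_gradient(z):
    # Stand-in for the preset mapping h_c(z): an exponential-based squashing
    # function that bounds the gradient while keeping size differences visible.
    # The actual function used by the application may differ.
    return np.tanh(z)

def update_parameters(theta, gradient, learning_rate=1e-3):
    """theta(t+1) = theta(t) - eta * h_c(gradient of J at theta(t))."""
    return theta - learning_rate * map_gradient(gradient)
```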
The embodiment may further include S2023, adjusting the size of the face frame according to the confidence corresponding to the face frame, and using the adjusted size of the face frame as the size of the face frame of the face picture, for example, if the confidence is lower than the preset confidence, reducing the size of the face frame according to a preset ratio.
In step S203, the face similarity value is obtained using the face recognition model. Specifically, the feature value of the face in each face picture is extracted through the face recognition model; the face pictures are selected one by one, the features of the face in the selected picture are compared with the faces in all the remaining pictures to obtain similarity values, the obtained similarity values are averaged, and the average is multiplied by the face similarity weight value to obtain the face similarity dimension value of each picture.
The face similarity dimension value reflects how similar the face in a face picture is to the faces in the other remaining face pictures: the higher the face similarity dimension value, the more representative the face is among all the faces. For example, if most faces in a face sequence have high image quality, then a face with poor image quality will receive a lower similarity dimension value. Optionally, the face similarity dimension value ranges from 0 to the face similarity weight value; the larger the value, the better the face similarity and the higher the face image quality.
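A minimal sketch of the similarity dimension value, assuming the face recognition model outputs L2-normalized embeddings so that a dot product serves as the similarity value; the 0.2 default is the example face similarity weight value used elsewhere in this description.

```python
import numpy as np

def similarity_dimension(features, index, weight=0.2):
    """Average similarity of face `index` to the remaining faces, times the weight."""
    target = features[index]
    others = [f for i, f in enumerate(features) if i != index]
    sims = [float(np.dot(target, f)) for f in others]  # cosine similarity (normalized)
    return weight * (sum(sims) / len(sims))
```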
In step S204, the brightness of the face is scored. Specifically, the RGB image of the face region in the face picture is converted into a grayscale image using the RGB-to-grayscale conversion formula Gray = R × 0.299 + G × 0.587 + B × 0.114, and the gray values in the face region are then averaged to obtain the face brightness value. Optionally, the face brightness value ranges from 0 to 255; the smaller the value, the darker the face, and the larger the value, the brighter the face. The face region may specifically be the region corresponding to the face bounding box obtained by processing the face picture with the face detection model.
The face brightness dimension value is obtained from the face brightness value. Specifically, a preset face brightness value may represent the optimal brightness, for example 128; the absolute value of the difference between each face's brightness value and the preset value 128 is computed, that absolute value is divided by the preset brightness value 128 to obtain the brightness deviation, which indirectly reflects how far the face brightness deviates from the optimal brightness, and finally the brightness deviation is multiplied by the face brightness weight value to obtain the face brightness dimension value.
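A minimal sketch of the brightness scoring steps above, assuming a BGR face crop, the preset brightness value 128 and the example face brightness weight value 0.1:

```python
import cv2

def brightness_dimension(face_bgr, preset=128.0, weight=0.1):
    """Brightness deviation from the preset value, times the face brightness weight."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)  # Gray = 0.299R + 0.587G + 0.114B
    brightness = float(gray.mean())                    # average gray value, 0..255
    deviation = abs(brightness - preset) / preset      # degree of brightness deviation
    return weight * deviation
```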
The face brightness dimension value can evaluate the brightness quality of the face. It can be understood that the range of the face brightness dimension value is from 0 to the face brightness weight value, and the larger the face brightness dimension value is, the better the face brightness dimension is.
Step S205 obtains the face definition dimension value, which scores the definition (sharpness) of the face in the face picture. Optionally, a blurred face receives a low face definition dimension value; that is, the larger the face definition dimension value, the clearer the face.
Specifically, a neural network is trained with face training images carrying definition labels to obtain the definition classification model; after a face picture is input, the corresponding definition score value is obtained, and the definition score value is then multiplied by the face definition weight value to obtain the face definition dimension value. Optionally, the definition score value lies between 0 and 1, with a lower value representing blur and a higher value representing sharpness, so the face definition dimension value ranges from 0 to the face definition weight value.
In addition, to improve the classification accuracy of the definition classification model, the model may be trained as follows: obtain 5,000 clear face samples labeled as clear and 5,000 blurred face samples labeled as blurred, and train the neural network on these 10,000 samples to obtain the definition classification model.
Step S206 is to calculate a face angle dimension value for evaluating the orientation of the face in the face picture. Referring to fig. 5, the face angle dimension can be specifically divided into three dimensions: the left and right deflection angle dimension Y (yaw), the left and right inclination angle dimension R (roll) and the pitch dimension P (pitch) respectively correspond to a left and right deflection angle dimension value, a left and right inclination angle dimension value and a pitch dimension value.
Specifically, the input face picture is processed by using a face angle classification model based on a neural network (the face angle classification model comprises a left-right deflection angle model, a left-right inclination angle model and a pitching angle model), so that a left-right deflection angle dimension value, a left-right inclination angle dimension value and a pitching angle dimension value corresponding to the face can be obtained.
In one embodiment, the face angle classification model includes: a left and right deflection angle model, a left and right inclination angle model and a pitching angle model;
before acquiring a left-right inclination angle, a left-right deflection angle and a pitching angle of a face in a face picture through a face angle classification model, respectively training a left-right deflection angle model, a left-right inclination angle model and a pitching angle model;
the training step of the left and right deflection angle model comprises the following steps: acquiring a plurality of face images; marking the face image by adopting a left deflection angle marking value and a right deflection angle marking value to obtain a left deflection training image and a right deflection training image, wherein the left deflection angle marking value and the right deflection angle marking value are obtained by dividing the difference value between the preset left deflection angle and the preset right deflection angle and the left and right inclination angle of the face in the face image by the preset left deflection angle and the preset right deflection angle; training a left deflection angle model and a right deflection angle model through a left deflection training image and a right deflection training image;
the method includes acquiring a plurality of face images, namely images of a plurality of faces under various light rays, wherein the face angle deviates from 90 degrees from the left to 90 degrees from the right (which can be recorded as-90 degrees), as shown in fig. 6. The face angle is a deflection angle of a face in the face image relative to the front face, and specifically, the front face may be set to 0 °. The number of the acquired face images can be 2 ten thousand.
Optionally, the left and right deflection angles are preset to be 90 degrees, the range of the left and right deflection angle marking values is 0-1, and the smaller the left and right deflection angle marking values, the larger the deviation of the face angle is.
The training step of the left and right inclination angle model comprises the following steps: acquiring a plurality of face images; marking the face image by adopting a left and right inclination angle marking value to obtain a left and right inclination training image, wherein the left and right inclination angle marking value is calculated by dividing the difference value of a preset left and right inclination angle and the left and right inclination angle of the face in the face image by the preset left and right inclination angle; training a left and right inclination angle model through a left and right inclination training image;
the obtained multiple face images are face images of multiple faces under multiple light rays (the multiple light rays may be light rays with different intensities or colors), wherein the face angle is inclined between 60 degrees from left to 60 degrees from right (which may be recorded as-60 degrees), as shown in fig. 7. The face angle is an inclination angle of a face in the face image relative to the frontal face, and specifically, the frontal face may be set to 0 °. The number of face images acquired may be 2 ten thousand.
Optionally, the left and right inclination angles are preset to be 60 degrees, the range of the left and right inclination angle marking values is 0-1, and the smaller the left and right inclination angle marking values, the larger the deviation of the face angle is.
The training step of the pitch angle model comprises the following steps: acquiring a plurality of face images; marking the face image by using a pitch angle marking value to obtain a pitch training image, wherein the pitch angle marking value is calculated by dividing the difference value between a preset pitch angle and the face pitch angle in the face image by the preset pitch angle; and training a pitch angle model through the pitch training image.
The acquired face images are images of multiple faces under various lighting conditions, with the face pitch angle ranging from a 60° elevation to a 60° depression (which may be recorded as −60°), as shown in fig. 8. The face pitch angle is the pitch of the face in the image relative to the frontal face, and the frontal face may be set to 0°. The number of acquired face images may be 20,000.
Optionally, the preset pitch angle is 60 degrees, the range of the pitch angle marking value is 0-1, and the smaller the pitch angle marking value is, the larger the deviation of the face angle is.
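The three label values can be computed the same way; the sketch below assumes the absolute face angle is used so that deviations to either side are treated symmetrically, which is not stated explicitly above.

```python
def angle_label(face_angle_deg, preset_angle_deg):
    """Label value = (preset angle - |face angle|) / preset angle, in [0, 1]."""
    return (preset_angle_deg - abs(face_angle_deg)) / preset_angle_deg

yaw_label = angle_label(25.0, 90.0)    # left-right deflection, preset 90 degrees
roll_label = angle_label(-10.0, 60.0)  # left-right inclination, preset 60 degrees
pitch_label = angle_label(5.0, 60.0)   # pitch, preset 60 degrees
```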
Each of the left-right deflection, left-right inclination and pitch angle score values output by the models lies between 0 and 1: the smaller the value, the larger the face angle deviation, and the larger the value, the smaller the deviation. The left-right deflection, left-right inclination and pitch score values are multiplied by their corresponding face angle weight values and then summed to obtain the face angle dimension value. It is understood that the face angle dimension value ranges between 0 and the sum of the corresponding face angle weight values. The face angle dimension value evaluates how frontal the face is: the larger the value, the more frontal the face in the face picture.
When setting the weights for the scoring dimensions, more important dimensions are given higher weights to strengthen their influence on the comprehensive score. In one embodiment, the face size weight value is 0.08 to 0.12, the face similarity weight value is 0.15 to 0.25, the face brightness weight value is 0.08 to 0.12, the face definition weight value is 0.12 to 0.18, the yaw weight value is 0.12 to 0.18, and the pitch weight value is 0.12 to 0.18; specifically, the face size weight value may be 0.1, the face similarity weight value 0.2, the face brightness weight value 0.1, the face definition weight value 0.15, the yaw weight value 0.15, and the pitch weight value 0.15, which reduces the selection of low-quality stickers.
Step S207 is to calculate the total dimension value of the face, specifically, sum the dimension values of the face size, the face similarity, the face brightness, the face sharpness, and the face angle of each face picture to obtain a face comprehensive score of the face picture. It is understood that a higher face composite score indicates a higher face quality.
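Putting steps S202 to S207 together, a minimal sketch of the composite score follows; the roll (left-right inclination) weight is not stated explicitly in this application and the 0.15 used below is an assumption.

```python
def size_dimension(box_size, max_box_size, weight=0.1):
    """S202: face bounding box size ratio times the face size weight."""
    return weight * (box_size / max_box_size)

def angle_dimension(yaw_score, roll_score, pitch_score,
                    yaw_w=0.15, roll_w=0.15, pitch_w=0.15):
    """S206: weighted sum of the three 0-1 angle score values (roll_w assumed)."""
    return yaw_w * yaw_score + roll_w * roll_score + pitch_w * pitch_score

def composite_score(size_dim, similarity_dim, brightness_dim, definition_dim, angle_dim):
    """S207: the comprehensive face score is the sum of the five dimension values."""
    return size_dim + similarity_dim + brightness_dim + definition_dim + angle_dim
```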
In step S30, the open/close state of the face in the face picture is determined. In one embodiment, an open-close eye classification model is adopted to detect whether the face in the face picture is an open-eye face;
before detecting whether the face in the face picture is an eye-opening face or not, the method also comprises the step of training an eye-opening and eye-closing classification model:
s301, acquiring a plurality of face images, wherein the face images comprise eye-opening face images and eye-closing face images;
s302, labeling the eye-opening face image by adopting an eye-opening label, and labeling the eye-closing face image by adopting an eye-closing label to obtain an eye-opening and eye-closing training image;
and S303, training an opening and closing eye classification model by using the opening and closing eye training image, wherein when the opening and closing eye classification model is trained, the feature difference of the face images with the opening eye labels and the closing eye labels is expanded, the feature difference between the face images with the opening eye labels is reduced, and the feature difference of the face images with the closing eye labels is reduced until the loss value of the opening and closing eye classification model is smaller than a preset value.
The open-close eye classification model is a neural network model, and can judge the open-close eye state of the human face for the input human face picture.
Before the eye open/closed state is judged for an input face picture, the open-closed eye classification model is trained with open-eye and closed-eye face images. The face images used for training are open-eye and closed-eye faces under various lighting conditions: 5,000 open-eye face images are labeled as open-eye (label set to open-eye) and 5,000 closed-eye face images are labeled as closed-eye (label set to closed-eye) to obtain the open-closed eye training images; the open-closed eye classification model is then trained with these training images.
In step S30, detecting whether the face in each face picture is an open-eye face in descending order of comprehensive face score specifically includes: sorting all the face pictures from the highest to the lowest comprehensive face score, and then judging the open/closed eye state picture by picture in that order using the open-closed eye classification model.
In step S40, during the sequential judgment of the face pictures, when a face is judged to be an open-eye face for the first time, the face in that picture is taken as the photo sticker face and judgment of the remaining face pictures stops, as in the sketch below; in step S50, if the faces in all the face pictures are closed-eye faces, the face in the face picture with the highest comprehensive face score is selected as the photo sticker by default.
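A minimal sketch of the overall selection logic of steps S30 to S50, assuming `composite_score`, `is_open_eye` and `extract_face` are callables wrapping the scoring, open-closed eye classification and face cropping steps described above:

```python
def select_sticker_face(face_pictures, composite_score, is_open_eye, extract_face):
    """Return the sticker face per steps S30-S50."""
    ranked = sorted(face_pictures, key=composite_score, reverse=True)
    for picture in ranked:                 # descending comprehensive face score
        if is_open_eye(picture):           # stop at the first open-eye face (S40)
            return extract_face(picture)
    return extract_face(ranked[0])         # all closed-eye: highest-scoring face (S50)
```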
The photo album-based photo sticker selection method can select a higher-quality picture from the photo album, extract the face in that face picture as a photo sticker, and use the photo sticker as the cover of the personal photo album.
It should be understood that, although the steps in the flowcharts of fig. 2 and fig. 3 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly ordered, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 3 may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turns or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for selecting a photo sticker based on an album in any of the above embodiments.
The embodiments of the present application also provide a computer-readable storage medium: one or more non-transitory computer-readable storage media contain computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the photo album-based photo sticker selection method.
The embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the photo album-based photo sticker selection method.
Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as such combinations are not contradictory, they should be considered within the scope of this specification.
The above embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A photo album-based method for selecting photo stickers is characterized by comprising the following steps:
acquiring a plurality of face pictures in an electronic photo album;
acquiring a face comprehensive score according to the face size, the face similarity, the face brightness, the face definition and the face angle in each face picture;
detecting whether the face in each face picture is an eye-opening face or not according to the sequence of the comprehensive face score from high to low;
when the eye-opening face is detected for the first time, extracting the eye-opening face in the corresponding face picture, and taking the eye-opening face as a sticker face.
2. The photo album based method of selecting photo stickers according to claim 1, further comprising:
when the human faces in all the human face pictures are detected to be the eye-closed human faces according to the sequence of the human face comprehensive scores from high to low, the human face in the human face picture with the highest human face comprehensive score is extracted to be used as the sticker human face.
3. The photo album-based photo sticker selection method according to claim 1, wherein acquiring the face comprehensive score according to the face size, the face similarity, the face brightness, the face definition and the face angle in each face picture comprises:
selecting a first face picture from the plurality of face pictures;
acquiring the size of a face frame of the first face picture, and determining the size of the largest face frame in the plurality of face pictures; dividing the size of the face frame of the first face picture by the maximum size of the face frame to obtain a face frame size ratio; multiplying the face frame size ratio by a face size weight value to obtain a face size dimension value of the first face picture;
acquiring a characteristic value of a face in the first face picture and a characteristic value of a face in each residual face picture according to a face recognition model, wherein the residual face pictures are the residual face pictures except the first face picture in the plurality of face pictures; acquiring an average value of similarity between the face in the first face picture and the faces in the residual face pictures according to the feature values of the faces in the first face picture and the feature values of the faces in the residual face pictures; multiplying the average value of the face similarity by a face similarity weight value to obtain a face similarity dimension value of the first face picture;
converting the face area in the first face picture into a gray image; acquiring an average value of gray points in a face area as a brightness value of the face; acquiring an absolute value of a difference between a brightness value of a human face and a preset brightness value; dividing the absolute value by the preset brightness value to obtain a brightness deviation degree; multiplying the brightness deviation degree by a face brightness weight value to obtain a face brightness dimension value of the first face picture;
acquiring a definition score value of the face in the first face picture through a definition classification model; multiplying the definition score value by a face definition weight value to obtain a face definition dimension value of the first face picture;
acquiring a left-right inclination angle, a left-right deflection angle and a pitching angle of the face in the first face picture through a face angle classification model; adding the product of the left and right deflection angles and the left and right deflection weight values, the product of the left and right inclination angles and the left and right inclination weight values, and the product of the pitching angles and the pitching weight values to obtain a face angle dimension value of the first face picture;
and adding the face size dimension value, the face similarity dimension value, the face brightness dimension value, the face definition dimension value and the face angle dimension value of the first face picture to obtain the face comprehensive score of the first face picture.
4. The photo album based photo sticker selection method according to claim 1, wherein an open-closed eye classification model is used to detect whether the face in each of the face pictures is an open-eye face;
before the detecting whether the face in each of the face pictures is an eye-opening face or not, the method further comprises the step of training the eye-opening and closing classification model:
acquiring a plurality of face images as sample images, wherein the face images comprise eye-opening face images and eye-closing face images;
marking the eye-opening face image by adopting an eye-opening label, and marking the eye-closing face image by adopting an eye-closing label to obtain an eye-opening and eye-closing training image;
and training the opening and closing eye classification model by using the opening and closing eye training image, wherein when the opening and closing eye classification model is trained, the feature difference of the face images with the opening eye labels and the closing eye labels is expanded, the feature difference between the face images with the opening eye labels is reduced, and the feature difference of the face images with the closing eye labels is reduced at the same time until the loss value of the opening and closing eye classification model is smaller than a preset value.
5. The photo album-based photo sticker selection method according to claim 3, wherein the face angle classification model comprises: a left and right deflection angle model, a left and right inclination angle model and a pitching angle model;
before the left-right inclination angle, the left-right deflection angle and the pitching angle of the face in the face picture are obtained through the face angle classification model, the left-right deflection angle model, the left-right inclination angle model and the pitching angle model are respectively trained;
wherein the training step of the left-right deflection angle model comprises: acquiring a plurality of face images; labeling the face images with a left-right deflection angle labeling value to obtain left-right deflection training images, wherein the left-right deflection angle labeling value is calculated by dividing the difference between a preset left-right deflection angle and the left-right deflection angle of the face in the face image by the preset left-right deflection angle; and training the left-right deflection angle model through the left-right deflection training images;
the training step of the left-right inclination angle model comprises the following steps: acquiring a plurality of face images; marking the face image by adopting a left and right inclination angle marking value to obtain a left and right inclination training image, wherein the left and right inclination angle marking value is calculated by dividing the difference value between a preset left and right inclination angle and the left and right inclination angle of the face in the face image by the preset left and right inclination angle; training the left and right inclination angle model through the left and right inclination training images;
the step of training the pitch angle model comprises: acquiring a plurality of face images; labeling the face image by using a pitch angle labeling value to obtain a pitch training image, wherein the pitch angle labeling value is calculated by dividing a difference value between a preset pitch angle and a face pitch angle in the face image by the preset pitch angle; and training the pitch angle model through the pitch training image.
6. The photo album-based photo sticker selection method according to claim 3, wherein the face size weight value is 0.1, the face similarity weight value is 0.2, the face brightness weight value is 0.1, the face definition weight value is 0.15, the left and right deflection weight value is 0.15, the left and right inclination weight value is 0.15, and the pitching weight value is 0.15.
7. The photo album-based photo sticker selection method according to claim 3, wherein the step of obtaining the size of the face frame of the face picture comprises:
preprocessing the face picture, wherein the preprocessing comprises face righting processing and face image enhancement processing;
and inputting the preprocessed face picture into a face detection model to obtain the size of a face frame.
8. The photo album based photo sticker selection method according to claim 7, further comprising training the face detection model before inputting the preprocessed face picture into the face detection model;
the loss function loss adopted for training the face detection model is as follows:
loss=L1+L2+L3+L4+β×L5
wherein L1 is the face frame coordinate offset loss, L2 is the face frame scaling loss, L3 is the face frame confidence loss, L4 is the classification loss, L5 is the ambiguity loss, and β is a preset coefficient;
wherein L5 = (1 + S(L3 + L4)) × C(Bt, Bp),
where S is a sigmoid function, so that S(L3 + L4) is the sigmoid value of the sum of the face frame confidence loss and the classification loss; Bt is the true ambiguity label, Bp is the predicted ambiguity label, C is a binary cross-entropy function, and C(Bt, Bp) is the binary cross-entropy value of the true ambiguity label and the predicted ambiguity label.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to perform the steps of the photo album-based photo sticker selection method according to any one of claims 1 to 8.
10. One or more non-transitory storage media storing computer-readable instructions thereon that, when executed by one or more processors, cause the one or more processors to perform the steps of the photo album-based photo sticker selection method according to any one of claims 1 to 8.
CN202110991809.5A 2021-08-27 2021-08-27 Photo album-based photo sticker selection method, electronic equipment and storage medium Active CN113435428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110991809.5A CN113435428B (en) 2021-08-27 2021-08-27 Photo album-based photo sticker selection method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113435428A true CN113435428A (en) 2021-09-24
CN113435428B CN113435428B (en) 2021-12-31

Family

ID=77798157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110991809.5A Active CN113435428B (en) 2021-08-27 2021-08-27 Photo album-based photo sticker selection method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113435428B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224921A (en) * 2015-09-17 2016-01-06 桂林远望智能通信科技有限公司 A kind of facial image preferentially system and disposal route
WO2018049952A1 (en) * 2016-09-14 2018-03-22 厦门幻世网络科技有限公司 Photo acquisition method and device
CN108960087A (en) * 2018-06-20 2018-12-07 中国科学院重庆绿色智能技术研究院 A kind of quality of human face image appraisal procedure and system based on various dimensions evaluation criteria
US20210166003A1 (en) * 2018-08-22 2021-06-03 Zhejiang Dahua Technology Co., Ltd. Systems and methods for selecting a best facial image of a target human face
CN109784230A (en) * 2018-12-29 2019-05-21 中国科学院重庆绿色智能技术研究院 A kind of facial video image quality optimization method, system and equipment
CN111160284A (en) * 2019-12-31 2020-05-15 苏州纳智天地智能科技有限公司 Method, system, equipment and storage medium for evaluating quality of face photo

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Qiuzhen et al.: "Face image quality evaluation based on convolutional neural network", Journal of Computer Applications *

Also Published As

Publication number Publication date
CN113435428B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
US11657602B2 (en) Font identification from imagery
US9536293B2 (en) Image assessment using deep convolutional neural networks
US9953425B2 (en) Learning image categorization using related attributes
CN111178183B (en) Face detection method and related device
US9104914B1 (en) Object detection with false positive filtering
US8638993B2 (en) Segmenting human hairs and faces
US7400761B2 (en) Contrast-based image attention analysis framework
CN111126258A (en) Image recognition method and related device
CN111080628A (en) Image tampering detection method and device, computer equipment and storage medium
CN109271930B (en) Micro-expression recognition method, device and storage medium
US20110268319A1 (en) Detecting and tracking objects in digital images
CN112101359B (en) Text formula positioning method, model training method and related device
US8094971B2 (en) Method and system for automatically determining the orientation of a digital image
CN111553438A (en) Image identification method based on convolutional neural network
CN114821778A (en) Underwater fish body posture dynamic recognition method and device
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN112465709A (en) Image enhancement method, device, storage medium and equipment
WO2019223066A1 (en) Global enhancement method, device and equipment for iris image, and storage medium
CN113435428B (en) Photo album-based photo sticker selection method, electronic equipment and storage medium
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN116798041A (en) Image recognition method and device and electronic equipment
CN114283431B (en) Text detection method based on differentiable binarization
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium
US11893784B2 (en) Assessment of image quality for optical character recognition using machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230710

Address after: 13C-18, Caihong Building, Caihong Xindu, No. 3002, Caitian South Road, Gangsha Community, Futian Street, Futian District, Shenzhen, Guangdong 518033

Patentee after: Core Computing Integrated (Shenzhen) Technology Co.,Ltd.

Address before: 518000 1001, building G3, TCL International e city, Shuguang community, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Aishen Yingtong Information Technology Co.,Ltd.