CN113673466A - Method for extracting photo stickers based on face key points, electronic equipment and storage medium - Google Patents

Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Info

Publication number
CN113673466A
Authority
CN
China
Prior art keywords
face
key point
key
key points
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111000330.7A
Other languages
Chinese (zh)
Other versions
CN113673466B (en)
Inventor
林鸿飞
周有喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Core Computing Integrated Shenzhen Technology Co ltd
Original Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority to CN202111000330.7A
Publication of CN113673466A
Application granted
Publication of CN113673466B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for extracting a photo sticker based on face key points, electronic equipment and a storage medium. In the method, whether the face in a face picture is in a preset state is judged by detecting the face key points in the face picture; when the face is in the preset state, it is extracted as the sticker face, which helps avoid using a face in a non-preset state as the sticker face.

Description

Method for extracting photo stickers based on face key points, electronic equipment and storage medium
Technical Field
The application relates to the technical field of image processing, in particular to a method for extracting a photo sticker based on face key points, electronic equipment and a storage medium.
Background
Some electronic photo albums have a face clustering function that groups pictures of the same person together, selects a better picture from the group, cuts out the face in that picture as a photo sticker, and then uses the photo sticker as the cover of the album.
However, the photo selected in this way may not meet the user's needs; for example, the face extracted as the photo sticker may be partially occluded, not smiling, or have its eyes closed.
Disclosure of Invention
Based on this, in order to solve or improve the problems in the prior art, the present application provides a method for extracting a photo sticker based on a face key point, an electronic device, and a storage medium, which can automatically use a face in a preset state as a photo sticker face according to a user's needs.
In a first aspect, a method for extracting a photo sticker based on face key points is provided, where the method includes:
acquiring a face picture;
detecting face key points in the face picture, wherein the face key points are preset mark points in the face;
when the face key points in the face picture are detected, determining whether the face in the face picture is a face in a preset state or not according to the detected face key points;
and when the face in the face picture is in a preset state, extracting the face in the face picture as a sticker face.
In one embodiment, the face key points in the face picture are detected through a face key point detection model;
the face key point detection model comprises a first detection module, a cutting module and a second detection module;
the first key point module is used for detecting initial key points preset in the face picture;
the cutting module is used for acquiring a face frame according to the preset initial key point and extracting a face image in the face frame;
the second detection module is used for detecting key points according to the face image in the face frame to obtain face key points.
In one embodiment, the loss function used for training the first detection module is the smooth L1 function, and the loss function used for training the second detection module is the Wing Loss function.
In one embodiment, the determining whether the face in the face picture is a face in a preset state according to the detected face key point includes:
judging whether the face belongs to a non-shielding face or not according to the detected face key points;
judging whether the face belongs to a smiling face according to the detected face key points;
judging whether the face belongs to an eye-opening face or not according to the detected face key points;
and when the human face belongs to a non-shielding human face, a smiling human face and an eye-opening human face at the same time, determining that the human face in the human face picture is a human face in a preset state.
In one embodiment, the determining whether the face belongs to a non-occluded face includes:
acquiring the positions of the detected key points of the human face;
dividing the face into a plurality of preset areas, and taking at least one of the preset areas as a shielding judgment area;
judging whether the detected face key points are distributed in all preset areas in the shielding judgment area or not;
when all the preset areas in the shielding judgment area contain the face key points, respectively judging whether the number of the face key points in each preset area in the shielding judgment area is larger than a preset number threshold of the corresponding preset area;
when the number of the detected face key points is larger than the preset number threshold of the corresponding preset area, determining that the face belongs to an unshielded face.
in one embodiment, the determining whether the face belongs to a smiling face includes:
acquiring the positions of the detected key points of the human face;
selecting a first key point, a second key point, a third key point and a fourth key point according to the detected positions of the key points of the face, wherein the first key point, the second key point, the third key point and the fourth key point are all located in a lip preset area of the face, a connecting line between the first key point and the second key point is parallel to the left-right direction of the face, and a connecting line between the first key point and the second key point is perpendicular to a connecting line between the third key point and the fourth key point;
calculating a first distance between a first key point and a second distance between a third key point and a fourth key point according to the positions of the first key point, the second key point, the third key point and the fourth key point;
and when the ratio of the first distance to the second distance is within a first preset ratio range, determining that the human face belongs to a smiling face.
In one embodiment, the determining whether the face belongs to an eye-open face includes:
judging the states of two eyes of the human face respectively by an eye opening detection method;
when the states of the two eyes are both eye opening states, determining that the human face belongs to an eye opening human face;
wherein, the judging the state of the left eye of the human face by the eye-opening detection method comprises:
acquiring the positions of the detected key points of the human face;
selecting a fifth key point, a sixth key point, a seventh key point and an eighth key point according to the detected positions of the key points of the face, wherein the fifth key point, the sixth key point, the seventh key point and the eighth key point are all located in a left eye preset area of the face, a connecting line between the fifth key point and the sixth key point is parallel to the left-right direction of the face, a connecting line between the fifth key point and the sixth key point is perpendicular to a connecting line between the seventh key point and the eighth key point, and the left-right direction of the face is a direction in which left eye corners and right eye corners are sequentially arranged;
calculating a third distance between a fifth key point and a sixth key point and a fourth distance between the seventh key point and an eighth key point according to the positions of the fifth key point, the sixth key point, the seventh key point and the eighth key point;
and when the ratio of the third distance to the fourth distance is within a second preset ratio range, determining that the left eye is in an eye-open state.
In one embodiment,
after the face picture is obtained and before the face key points in the face picture are detected, a face detection model is used for obtaining a face area in the face picture;
the face detection model is obtained by training through the following steps of:
inputting the training picture into a face detection model, and acquiring the descending gradient of the face detection model in the back propagation process;
mapping the descending gradient to a preset specified range through a mapping function to obtain a mapped descending gradient;
updating parameters of the face detection model through the mapped descending gradient;
wherein the gradient mapping function is:
[The gradient mapping function h_c(z) is given as an equation image in the original publication.]
where z is the descent gradient, e is the base of the natural logarithm, and h_c is the mapped descent gradient.
In a second aspect, an electronic device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for extracting the photo stickers based on the face key points as described above.
In a third aspect, one or more non-transitory readable storage media storing computer-readable instructions are provided, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for extracting a photo based on face keypoints as described above.
In the method for extracting a photo sticker based on face key points, whether the face in the face picture is in a preset state is judged by detecting the face key points in the face picture; when the face is in the preset state, it is extracted as the sticker face, which helps avoid using a face in a non-preset state as the sticker face.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is to be understood that the drawings in the following description are illustrative only and are not restrictive of the invention.
Fig. 1 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for extracting a photo sticker based on face key points in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating distribution of part of face key points in a face according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating distribution of key points of a human face in a preset mouth area in a non-smiling face state according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating distribution of key points of a human face in a preset mouth area in a smiling face state according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating distribution of key points of a human face in a predetermined area of a left eye in an open state according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that in the present application, a certain predetermined deviation is allowed for the parallel and the perpendicular, for example, the predetermined deviation is 0 ° to 10 °.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in fig. 1, the electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capability and supports the operation of the whole electronic device. The memory is used for storing data, programs, and the like, and stores at least one computer program that can be executed by the processor to implement the method for extracting a photo sticker based on face key points provided by the embodiments of the present application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the method for extracting a photo sticker based on face key points provided in the following embodiments. The internal memory provides a cached execution environment for the operating system and the computer programs in the non-volatile storage medium. The network interface may be an Ethernet card or a wireless network card, etc., for communicating with an external electronic device.
The electronic devices described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and fixed terminals such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
Referring to fig. 2, the method for extracting a photo sticker based on face key points includes steps 10 to 40.
Step 10, obtaining a face picture;
step 20, detecting face key points in the face picture, wherein the face key points are preset mark points in the face;
step 30, when the face key points in the face picture are detected, determining whether the face in the face picture is a face in a preset state or not according to the detected face key points;
and step 40, when the face in the face picture is in a preset state, extracting the face in the face picture as a sticker face.
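For illustration only, the following minimal Python sketch shows how steps 10 to 40 could be strung together; the helper callables (a key point detector, a preset-state check, and a face-cropping routine) are hypothetical stand-ins for the modules described in the remainder of this description, not the patented implementation.

```python
from typing import Callable, Optional, Sequence

def extract_sticker_face(pictures: Sequence,
                         detect_keypoints: Callable,
                         is_preset_state: Callable,
                         crop_face: Callable) -> Optional[object]:
    """Walk candidate pictures (assumed already sorted by composite score,
    step 10) and return the first face that is in the preset state."""
    for picture in pictures:
        keypoints = detect_keypoints(picture)      # step 20: detect face key points
        if not keypoints:
            continue                               # no key points detected
        if is_preset_state(keypoints):             # step 30: unoccluded, smiling, eyes open
            return crop_face(picture, keypoints)   # step 40: extract as the sticker face
    return None                                    # no picture satisfied the preset state
```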
In the method for extracting a photo sticker based on face key points, whether the face in the face picture is in a preset state is judged by detecting the face key points in the face picture; when the face is in the preset state, it is extracted as the sticker face, which helps avoid using a face in a non-preset state as the sticker face.
In one application scenario, the user wants to use an unoccluded, smiling, eyes-open face as the sticker face; that is, an eyes-open face of an unoccluded smiling face is the face in the preset state. By acquiring the face key points in the face picture and judging, from the positional relationships of these key points, whether the face is occluded, whether it is smiling, and whether its eyes are open, partially occluded faces, non-smiling faces, and closed-eye faces in the face pictures are excluded, and an unoccluded, smiling, eyes-open face is extracted as the sticker face, meeting the user's needs.
In step 10, face pictures may be acquired sequentially or randomly from the face pictures in the electronic album. Specifically, face pictures are obtained from the electronic album in a certain order until a face meeting the preset state is found; for example, the face pictures may be sorted by score, where the score can be determined from the sharpness, size, and similar properties of each face picture.
And step 20, detecting the key points of the face in the face picture. The face key points are preset mark points in the face, and the mark points are usually preset as face key points according to the part of the face, so as to mark the position of the part in the face.
Referring to fig. 3, in an embodiment of the present invention, the face key points should at least be distributed over preset areas such as the facial-feature areas and the face contour area, and the number of face key points in each preset area may be determined as needed. For example, using the positional relationships of 106 face key points in the face, an unoccluded, smiling, eyes-open face can be selected as the sticker face.
The distribution of 106 face key points and the corresponding face key point numbers are specifically shown in table 1 below.
Table 1: 106 individual face key point distribution and corresponding face key point number
[Table 1 is provided as images in the original publication; it lists the 106 face key point numbers for each facial region.]
In step 20, the detection of the face key points in the face picture may be specifically determined according to features of each part in the face. In one embodiment, the face key points in the face picture are detected by a face key point detection model. The human face key point detection model is a deep learning neural network model and can be used for detecting human face key points of an input human face picture and acquiring the human face key points in the human face picture.
In one embodiment, the face key point detection model comprises a first detection module, a cutting module and a second detection module; the first detection module is used for detecting preset initial key points in the face picture; the cutting module is used for acquiring a face frame according to the preset initial key points and extracting a face image in the face frame; the second detection module is used for detecting key points in the face image in the face frame to obtain the face key points.
Optionally, the first detection module and the second detection module are both modules based on a MobileNet-V2 network structure. The first detection module performs coarse key point positioning to obtain the preset initial key points; the face area is then cropped according to the preset initial key points to obtain the face image in the face frame; and the face image is input into the second detection module to obtain the remaining face key points.
The preset initial key points are face key points that are easy to recognize in the face, such as pupil key points, chin key points, nose tip key points, forehead key points, and mouth corner key points. The cutting module calculates the size and position of the face frame from the preset initial key points so that the face frame covers the whole face, and the face image is extracted according to the position of the face frame. It can be understood that the face key points include the preset initial key points and the remaining face key points, where the initial key points are detected by the first detection module and the remaining face key points are detected by the second detection module.
In this embodiment, the position of the face frame is determined from the easily recognized preset initial key points, the face image is cropped from that face frame, and the face image is input into the second detection module to obtain the remaining face key points. This reduces the size of the picture input into the second detection module and improves both detection speed and accuracy.
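As a rough sketch of this two-stage flow (the module interfaces below are assumptions for illustration, not the patented MobileNet-V2 implementations), the coarse model predicts the preset initial key points, a face frame is derived from them, and the fine model runs on the cropped face image:

```python
import numpy as np

def detect_face_keypoints(face_picture, coarse_model, fine_model, margin=0.1):
    """Two-stage key point detection sketch with assumed interfaces:
    coarse_model(picture) -> (K, 2) array of preset initial key points,
    fine_model(crop) -> (M, 2) array of key points in crop coordinates."""
    # Stage 1: coarse model returns the preset initial key points
    initial_pts = coarse_model(face_picture)

    # Cutting module: derive a face frame that covers the whole face
    x_min, y_min = initial_pts.min(axis=0)
    x_max, y_max = initial_pts.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    x0 = max(int(x_min - margin * w), 0)
    y0 = max(int(y_min - margin * h), 0)
    x1 = min(int(x_max + margin * w), face_picture.shape[1])
    y1 = min(int(y_max + margin * h), face_picture.shape[0])
    face_crop = face_picture[y0:y1, x0:x1]

    # Stage 2: fine model predicts the remaining key points on the crop,
    # which are then mapped back to the original picture coordinates
    crop_pts = fine_model(face_crop)
    return crop_pts + np.array([x0, y0])
```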
In one embodiment, the loss function used for training the first detection module is the smooth L1 function, and the loss function used for training the second detection module is the Wing Loss function.
The formula of the smooth L1 function is:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
the formula of the Wing Loss function is as follows:
$$\mathrm{wing}(x) = \begin{cases} w \ln\!\left(1 + \dfrac{|x|}{\epsilon}\right), & |x| < w \\ |x| - C, & \text{otherwise} \end{cases}$$
where w is a first preset parameter value that limits the range of the non-linear part to the interval [-w, w]; ε is a second preset parameter value that constrains the curvature of the non-linear region; and C is a constant that smoothly connects the linear and non-linear segments.
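By way of illustration, the two loss functions can be written as follows (a PyTorch-style sketch; the parameter values w = 10 and ε = 2 are common defaults from the Wing Loss literature, not values taken from this patent):

```python
import math
import torch

def smooth_l1(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Quadratic for small residuals, linear for large ones.
    absx = x.abs()
    return torch.where(absx < beta, 0.5 * absx ** 2 / beta, absx - 0.5 * beta)

def wing_loss(x: torch.Tensor, w: float = 10.0, epsilon: float = 2.0) -> torch.Tensor:
    # Logarithmic inside [-w, w] to emphasise small and medium errors,
    # linear outside; the constant c joins the two pieces without a jump.
    absx = x.abs()
    c = w - w * math.log(1.0 + w / epsilon)
    return torch.where(absx < w, w * torch.log(1.0 + absx / epsilon), absx - c)
```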
In one embodiment, the face in the preset state may be an unoccluded, smiling, eyes-open face. Specifically, determining whether the face in the face picture is a face in the preset state according to the detected face key points includes:
step 301, judging whether the face belongs to a non-shielding face according to the detected face key points;
step 302, judging whether the human face belongs to a smiling face according to the detected human face key points;
step 303, judging whether the face belongs to an eye-opening face according to the detected face key points;
and 304, when the human face belongs to a non-shielding human face, a smiling human face and an eye opening human face at the same time, determining that the human face in the human face picture is a human face in a preset state.
And judging whether the face belongs to a non-shielding face, a smiling face and an eye-opening face at the same time or not according to the detected positions of the key points of the face, and determining that the face in the face picture is a face in a preset state when the face belongs to the non-shielding face, the smiling face and the eye-opening face at the same time, so as to extract the face meeting the requirements.
The occluded face refers to a face which is not a complete face in the picture, and an unoccluded face is a complete or substantially complete face and is not occluded by other objects. In one embodiment, the determining whether the face belongs to a non-occluded face includes:
3011, obtaining the positions of the detected key points of the human face;
3012, dividing the face into a plurality of preset regions, and using at least one of the preset regions as a shielding judgment region;
3013, determining whether the detected face key points are distributed in all preset areas in the occlusion determination area;
3014, when all preset regions in the occlusion judgment region contain face key points, respectively judging whether the number of face key points in each preset region of the occlusion judgment region is greater than the preset number threshold of the corresponding preset region;
and 3015, determining that the face belongs to an unshielded face when the number of the detected face key points is greater than a preset number threshold of the corresponding preset area.
In step 3011, the positions of the face key points are coordinates of the face key points in the face picture.
In step 3012, the plurality of preset regions are obtained by dividing the face, and for example, the plurality of regions may specifically be a left eye region, a right eye region, a nose region, a lip region, a forehead region, a left eyebrow region, a right eyebrow region, and a remaining region of the face. The occlusion judgment area is a part of or all of the preset areas for judging whether the face is occluded, specifically, the preset areas are classified in advance according to the occlusion judgment requirement, and the preset area for judging whether the face is occluded is determined as the occlusion judgment area. That is, at least one of the preset regions serves as the occlusion determination region. For example, a user can use a face with glasses as a sticker face, when the face wears glasses, a left eye area and a right eye area are shielded, key points of the face in the left eye area and the right eye area cannot be detected, and based on the key points, the left eye area and the right eye area can be excluded from a shielding judgment area, namely the shielding judgment area does not include the left eye area and the right eye area, so that the face with glasses is judged to be a non-shielding face.
In steps 3013 and 3015, when all the preset regions in the occlusion determination region have face key points and the number of the face key points is greater than the preset number threshold of the preset region, it may be determined that the face belongs to an unoccluded face. Each preset area has a preset quantity threshold value, and a mapping relation table is established.
For example, since the 106 face key points are defined for visible facial parts, some key points will not be detected if the face is occluded. For example, when the mouth and the periphery of the face contour are occluded, the face key points in the mouth preset region and in the face-frame preset region are not detected. A preset number threshold can therefore be given to each preset region, and when the face key points in any of the face-frame preset region, the mouth preset region, and the nose preset region are occluded by more than 30%, the whole face is considered to be in an occluded state. In addition, because faces commonly wear sunglasses, the face key points in the eye preset regions are difficult to detect; therefore, when judging whether the face is occluded, the face key points in the eye preset regions need not be judged, that is, the occlusion judgment region does not include the eye preset regions.
For example, when determining whether the face is occluded, the occlusion determination region specifically includes a face frame preset region, a mouth preset region, and a nose preset region, as shown in table 2 below.
TABLE 2 Preset area and face Key Point numbering
Face frame preset area: 0-32 (left face frame, chin, and right face frame)
Mouth preset area: outer edge of the upper lip (84-90), outer edge of the lower lip (91-95)
Nose preset area: bridge of the nose (43-46), lower edge of the nose (47-51), lateral sides of the nose (78-83)
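A minimal sketch of this occlusion check follows, assuming the detector returns only the key points it could actually find, as a mapping from point number to (x, y); the dictionary format and the 70% visibility threshold (derived from the "occluded by more than 30%" rule above) are assumptions for illustration.

```python
# Occlusion-judgment regions following the key point numbers of Table 2.
OCCLUSION_REGIONS = {
    "face_frame": set(range(0, 33)),                    # numbers 0-32
    "mouth": set(range(84, 96)),                        # outer upper lip 84-90, outer lower lip 91-95
    "nose": set(range(43, 52)) | set(range(78, 84)),    # bridge, lower edge, lateral sides
}

def is_unoccluded(detected_points, regions=OCCLUSION_REGIONS, visible_ratio=0.7):
    """Return True when every occlusion-judgment region contains enough visible
    key points (here: at least 70% visible, i.e. at most 30% missing)."""
    detected = set(detected_points)
    for name, region_points in regions.items():
        visible = len(region_points & detected)
        if visible == 0:
            return False                 # the region contains no key points at all
        if visible < visible_ratio * len(region_points):
            return False                 # too many key points of this region are missing
    return True
```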
In one embodiment, the determining whether the face belongs to a smiling face includes:
step 3021, obtaining the positions of the detected key points of the human face;
step 3022, selecting a first key point, a second key point, a third key point, and a fourth key point according to the detected positions of the key points of the face, where the first key point, the second key point, the third key point, and the fourth key point are all located in a preset area of a lip of the face, a connection line between the first key point and the second key point is parallel to the left-right direction of the face, and a connection line between the first key point and the second key point is perpendicular to a connection line between the third key point and the fourth key point;
step 3023, calculating a first distance between the first keypoint and the second keypoint and a second distance between the third keypoint and the fourth keypoint according to the positions of the first keypoint, the second keypoint, the third keypoint, and the fourth keypoint;
step 3024, when the ratio of the first distance to the second distance is within a first preset ratio range, determining that the face belongs to a smiling face.
The first key point may be a key point of a left corner position, the second key point may be a key point of a right corner position, the third key point may be a key point of a middle position of the upper lip (a middle point of a lower edge of the upper lip), and the fourth key point may be a key point of a middle position of the lower lip (a middle point of an upper edge of the lower lip).
In this embodiment, the ratio of the first distance to the second distance determines whether the mouth is in a smiling state; when this ratio is within the first preset ratio range, the face is determined to be a smiling face. For example, the mouth preset region includes an upper lip region and a lower lip region. The upper lip region is divided into upper-half key points (close to the nose, numbered 84-90) and lower-half key points (close to the teeth, numbered 96-100); similarly, the lower lip region is divided into upper-half key points (close to the teeth, numbered from 101) and lower-half key points (close to the chin, numbered 91-95). When these face key points are all visible, whether the face is smiling is determined from them.
Specifically, when the mouth is closed, the key points of the lower half of the upper lip region and the key points of the upper half of the lower lip region almost overlap and the teeth are not exposed; as shown in fig. 4, the key points of the upper half of the upper lip region include the face key point numbered 87, and the key points of the lower half of the lower lip region include the face key point numbered 93.
When the mouth is smiling, the gap between the key points of the lower half of the upper lip region and the key points of the upper half of the lower lip region is the tooth area; as shown in fig. 5, the key points of the lower half of the upper lip region include the face key point numbered 98, and the key points of the upper half of the lower lip region include the face key point numbered 102.
Based on the above analysis, whether the user is in a smiling face state is judged by the opening degree between the lower half key point of the upper lip region and the upper half key point of the lower lip region. The specific calculation method is as follows:
the formula for calculating the first distance between the first keypoint and the second keypoint is:
$$L_1 = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
where L_1 is the first distance, x_1 and y_1 are the abscissa and ordinate of the first key point, and x_2 and y_2 are the abscissa and ordinate of the second key point.
The formula for calculating the second distance between the third keypoint and the fourth keypoint is:
$$L_2 = \sqrt{(x_3 - x_4)^2 + (y_3 - y_4)^2}$$
where L_2 is the second distance, x_3 and y_3 are the abscissa and ordinate of the third key point, and x_4 and y_4 are the abscissa and ordinate of the fourth key point.
For example, referring to fig. 5, the first key point is numbered 96 and the second key point is numbered 100. The first distance, i.e. the lateral distance between key point 96 and key point 100 in fig. 5, is calculated; these two points are the two outermost points of the lower half of the upper lip. Since the coordinates of key points 96 and 100 in the image are known, the first distance is obtained as:
$$L_1 = \sqrt{(x_{96} - x_{100})^2 + (y_{96} - y_{100})^2}$$
The third key point is numbered 98 and the fourth key point is numbered 102; similarly, the second distance, i.e. the longitudinal distance between key point 98 and key point 102 in fig. 5, can be calculated. A ratio R_1 between the second distance and the first distance is then calculated, i.e. R_1 = L_2 / L_1. A larger R_1 indicates a more open mouth; a smaller R_1 indicates a more closed mouth.
In particular, a first preset ratio range R_0 may be set, with 0.1 ≤ R_0 ≤ 0.5. A ratio between the second distance and the first distance of more than 0.1 indicates that the mouth is open and the face is smiling. Further, to exclude faces that are yawning, wide open, or in similar states, the upper limit of the first preset ratio range may be set to 0.5. It should be noted that the first preset ratio range can be adjusted according to actual conditions.
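The smiling-face judgment above can be sketched as follows; the mapping format of the key points is an assumption, while the point numbers 96/100/98/102 and the ratio range 0.1-0.5 follow the example in this embodiment.

```python
import math

def is_smiling(keypoints, r_min=0.1, r_max=0.5):
    """Smile check sketch: keypoints is assumed to map 106-point numbers to (x, y)."""
    x1, y1 = keypoints[96]    # first key point: left end of the lower half of the upper lip
    x2, y2 = keypoints[100]   # second key point: right end of the lower half of the upper lip
    x3, y3 = keypoints[98]    # third key point: midpoint of the lower half of the upper lip
    x4, y4 = keypoints[102]   # fourth key point: midpoint of the upper half of the lower lip
    l1 = math.hypot(x1 - x2, y1 - y2)   # first distance: lateral mouth distance
    l2 = math.hypot(x3 - x4, y3 - y4)   # second distance: mouth-opening distance
    r = l2 / l1
    return r_min <= r <= r_max           # first preset ratio range
```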
In one embodiment, the determining whether the face belongs to an eye-open face includes:
judging the states of two eyes of the human face respectively by an eye opening detection method;
when the states of the two eyes are both eye opening states, determining that the human face belongs to an eye opening human face;
wherein, the judging the state of the left eye of the human face by the eye-opening detection method comprises:
3031, obtaining the positions of the detected key points of the human face;
3032, selecting a fifth key point, a sixth key point, a seventh key point and an eighth key point according to the detected positions of the key points of the human face, wherein the fifth key point, the sixth key point, the seventh key point and the eighth key point are all located in a left eye preset region of the human face, a connecting line between the fifth key point and the sixth key point is parallel to the left-right direction of the human face, and a connecting line between the fifth key point and the sixth key point is perpendicular to a connecting line between the seventh key point and the eighth key point;
step 3033, calculating a third distance between a fifth key point and a sixth key point and a fourth distance between the seventh key point and an eighth key point according to the positions of the fifth key point, the sixth key point, the seventh key point and the eighth key point;
step 3034, when the ratio of the third distance to the fourth distance is within a second preset ratio range, determining that the left eye is in an eye-open state.
The fifth key point may be the key point at the left corner of the left eye, the sixth key point the key point at the right corner of the left eye, the seventh key point the key point at the midpoint of the upper eyelid (the midpoint of the upper eyelid edge), and the eighth key point the key point at the midpoint of the lower eyelid (the midpoint of the lower eyelid edge).
In this embodiment, the ratio of the third distance to the fourth distance determines whether the left eye is in an eye-open state, and when the ratio of the third distance to the fourth distance is within a second preset ratio range, the state of the left eye is determined to be in an eye-open state.
For example, referring to fig. 6, when the state of the left eye is determined, the fifth face key point and the sixth face key point are the face key point with the number 52 and the face key point with the number 55, respectively, and the fifth face key point and the sixth face key point are two points at the extreme edge of the left eye in the left-right direction. The seventh face key point and the eighth face key point are respectively the face key point with the number 72 and the face key point with the number 73, and the seventh face key point and the eighth face key point are two points in the middle of the left eye. In the left eye region, the second preset ratio range is greater than or equal to 0.2, i.e., if the ratio of the third distance to the fourth distance is less than 0.2, the left eye is considered to be closed.
Similarly, when the state of the right eye is determined, the ninth face key point and the tenth face key point are the face key points numbered 58 and 61, respectively, which are the two edge points of the right eye in the left-right direction. The eleventh face key point and the twelfth face key point are the face key points numbered 75 and 78, respectively, which are two points in the middle of the right eye. In the right eye region, the second preset ratio range is likewise greater than or equal to 0.2, i.e., if the ratio of the fifth distance to the sixth distance is less than 0.2, the right eye is considered to be closed. The fifth distance is the distance between the ninth key point and the tenth key point, and the sixth distance is the distance between the eleventh key point and the twelfth key point.
The face is considered to be in the eye-open state only when both the left eye and the right eye are open. When the face in the picture is wearing sunglasses, both eyes are occluded; the picture is then determined to be in a closed-eye or undeterminable state and is not used as the sticker face.
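A sketch of the eye-open judgment follows. The key point mapping format is an assumption, the point numbers follow the examples above, and the opening ratio here is taken as the vertical eyelid distance divided by the corner-to-corner distance (mirroring the mouth ratio) with 0.2 as the threshold; the orientation of the ratio is an assumption made for illustration.

```python
import math

def _eye_open_ratio(keypoints, corner_a, corner_b, lid_top, lid_bottom):
    xa, ya = keypoints[corner_a]
    xb, yb = keypoints[corner_b]
    xt, yt = keypoints[lid_top]
    xd, yd = keypoints[lid_bottom]
    width = math.hypot(xa - xb, ya - yb)     # eye-corner to eye-corner distance
    height = math.hypot(xt - xd, yt - yd)    # upper-lid midpoint to lower-lid midpoint
    return height / width

def is_eyes_open(keypoints, threshold=0.2):
    """Eye-open check sketch using the key point numbers cited above
    (52/55/72/73 for the left eye, 58/61/75/78 for the right eye)."""
    left_open = _eye_open_ratio(keypoints, 52, 55, 72, 73) >= threshold
    right_open = _eye_open_ratio(keypoints, 58, 61, 75, 78) >= threshold
    return left_open and right_open
```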
Optionally, after step 10 and before step 20, the method further includes obtaining the face region in the face picture by using a face detection model. The face detection model determines the position of a face frame (that is, a frame-shaped face region) in the face picture, thereby obtaining the face region; in step 20, face key point detection is then performed by the face key point detection model on the image inside the face frame (the face region) to directly obtain the face key points.
The face detection model may specifically be a face detection model based on the yolov3 neural network. Before the face detection model is used to acquire the face region in the face picture, the method also includes training the face detection model with a face detection training set.
Further, the training method of the yolov3-based face detection model is improved so that the face blur degree can be scored, the blur degree and the confidence can be obtained more accurately, and the processing speed of the whole face detection model is increased. Specifically, the training set used for the face detection model adds a blur-degree dimension; that is, the blur degree of the face in each training picture is labelled to indicate whether it is a blurred face.
Further, the loss function adopted by the face detection model is as follows:
loss=xy_loss+wh_loss+confidence_loss+class_loss+β×blur_loss
the xy _ loss is the coordinate offset loss of a face frame, the wh _ loss is the scale loss of the face frame, the confidence _ loss is the confidence loss of the face frame, and the blu _ loss is the loss of ambiguity;
wherein, blu _ loss ═ (1+ signature (confidence _ loss + class _ loss)) × binary cross entry (true _ blu _ label, pred _ blu _ label);
the sigmoid is an activation function, the binary cross entry is a binary classification loss function, true _ blu _ label is a real ambiguity label, and pred _ blu _ label is a prediction ambiguity label. Specifically, xy _ loss may be a binary cross entropy loss designed based on the offset of the grid point coordinate at the top left corner of the center point of the face frame. wh _ loss is a binary cross entropy loss based on the face bounding box width and height design. loss _ confidence is a binary cross entropy loss based on obj and no _ obj, and is specifically divided into two situations of obj and no _ obj to calculate the loss, wherein the binary cross entropy is calculated for obj (a corresponding real frame of the face frame); for no _ obj (the face frame has no corresponding real frame), for example, when the iou of the face frame and the real frame is lower than 0.5, the binary cross entropy of no _ obj _ confidence _ loss needs to be calculated. loss _ class may be a two-class cross entropy loss function, and further, for n classes, n two-class cross entropy loss functions are used.
The face detection model adopts a loss function to add blur _ loss as the loss of ambiguity. Specifically, in the training process, a ambiguity dimension is added between the output dimension of the feature map and the yolov3 category feature dimension, the feature map output dimension can be calculated by N × [ a × (b + c + d) ], where N × N is the number of lattice points of the output feature map, a is the number of preset anchor frames, b is the number of prediction frame values of each face frame, c is the confidence of the prediction frames, and d is the category feature dimension degree.
In the loss function adopted by the face detection model, β is an adjustable parameter used for coordinating the relationship between blu _ loss and other losses. The loss of ambiguity is calculated when calculating the loss for each grid point using the following formula:
blur_loss=(1+sigmoid(confidence_loss+class_loss))*binary cross entropy(true_blur_label,pred_blur_label)
where sigmoid is the activation function, binary cross entropy is a binary classification loss function, true_blur_label is the real blur-degree label, and pred_blur_label is the predicted blur-degree label.
Here, confidence_loss and class_loss assist in adjusting the blur-degree loss: sigmoid maps confidence_loss + class_loss into part of the scaling coefficient of the blur-score loss, so the three terms are strongly correlated. If the face-frame confidence and the class confidence are low (i.e. their loss values are high), the blur-degree term also receives a larger loss (at most twice the directly calculated blur loss); if the face-frame confidence and the class confidence are high (i.e. their loss values are low), the blur-degree loss approaches the direct loss binary cross entropy(true_blur_label, pred_blur_label), where binary cross entropy is the binary cross entropy loss function used to compute the binary classification loss between the predicted and real blur-degree labels. During training, pred_blur_label is gradually optimized toward the real blur-degree label given in the training set.
Because the loss function of the face detection model includes the blur-degree loss, a detected face directly carries a blur attribute and does not need a separate blur judgment. Moreover, since reducing the blur-degree loss also requires reducing the face-frame confidence loss, minimizing blur_loss drives the face detection model toward better parameters during training, and the trained confidences of the face frame and the face class become more accurate.
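A PyTorch-style sketch of this loss composition is shown below for illustration only; the tensor shapes, the β default, and the assumption that pred_blur_label has already been passed through a sigmoid are not specified by the patent.

```python
import torch
import torch.nn.functional as F

def blur_loss(pred_blur_label, true_blur_label, confidence_loss, class_loss):
    """Blur-degree loss: a binary cross entropy between predicted and labelled
    blur, scaled by a factor in (1, 2) derived from the confidence and class losses."""
    bce = F.binary_cross_entropy(pred_blur_label, true_blur_label)
    scale = 1.0 + torch.sigmoid(confidence_loss + class_loss)
    return scale * bce

def total_detection_loss(xy_loss, wh_loss, confidence_loss, class_loss,
                         pred_blur_label, true_blur_label, beta=1.0):
    # beta balances the blur-degree term against the other losses (tunable).
    b_loss = blur_loss(pred_blur_label, true_blur_label, confidence_loss, class_loss)
    return xy_loss + wh_loss + confidence_loss + class_loss + beta * b_loss
```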
In addition, in order to make the face detection model converge more effectively, to avoid gradient explosion caused by one or more incoordinated scoring dimensions, and to reduce the workload of parameter tuning, a dynamic gradient clipping method may be considered. In dynamic gradient clipping, to prevent gradient explosion during back propagation of the optimization network, large gradients are clipped: an upper limit of the gradient is set, for example 1, and any gradient larger than 1 is forced to 1 before the parameters are updated.
However, with such clipping there is no difference among gradients larger than the specified threshold, and convergence of the face detection model may suffer from the choice of threshold. That is, in the dynamic gradient clipping method, the gradient threshold is difficult to select.
To address the difficulty of selecting a gradient threshold, the present embodiment maps the gradient to a specified range with a preset mapping function, which both preserves the size differences between gradients and prevents excessively large gradients from causing gradient explosion.
Wherein, the preset gradient mapping function is as follows:
[The gradient mapping function h_c(z) is given as an equation image in the original publication.]
where z is the original gradient (true_gradient) and h_c is the mapped gradient (clip_gradient).
Specifically, after the face picture is obtained and before the face key points in the face picture are detected, a face region in the face picture is obtained by using a face detection model;
the face detection model is obtained by training through the following steps of:
inputting the training picture into a face detection model, and acquiring the descending gradient of the face detection model in the back propagation process;
mapping the descending gradient to a preset specified range through a mapping function to obtain a mapped descending gradient;
updating parameters of the face detection model through the mapped descending gradient;
wherein the gradient mapping function is:
[The gradient mapping function h_c(z) is given as an equation image in the original publication.]
where z is the descent gradient, e is the base of the natural logarithm, and h_c is the mapped descent gradient.
For example, in updating the parameters of the face detection model through the mapped descending gradient, the following formula may be adopted to update the parameters of the face detection model:
$$\theta_t = \theta_{t-1} - \lambda\, h_c\!\left(\nabla_{\theta} f(x_t, \theta_{t-1})\right)$$
where θ_t is the updated parameter, θ_{t-1} is the parameter before updating, λ is the preset learning rate, h_c is the gradient mapping function, f(x_t, θ_{t-1}) is the prediction function based on data x_t, and x_t is the selected training picture data.
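The exact mapping function h_c is shown only as an equation image in the original publication; the sketch below therefore uses a sign-preserving, bounded sigmoid-style map, h_c(z) = 2/(1 + e^(-z)) - 1, purely as an illustrative stand-in, and applies the update θ_t = θ_{t-1} - λ·h_c(gradient) parameter-wise after back-propagation.

```python
import torch

def mapped_gradient(z):
    # Stand-in mapping (assumption): bounded, monotonic, sign-preserving.
    return 2.0 / (1.0 + torch.exp(-z)) - 1.0

def sgd_step_with_gradient_mapping(model, loss, lr=1e-3):
    """One manual update step applying the gradient mapping before the
    parameter update (a sketch, not the patented training code)."""
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * mapped_gradient(p.grad)   # theta_t = theta_{t-1} - lr * h_c(grad)
                p.grad.zero_()
```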
In step 40, when the face in the face picture is the face in the preset state and the comprehensive score is the highest (i.e. the face in the preset state with the highest comprehensive score), the face in the face picture is extracted as the sticker face.
The comprehensive score can be obtained in step 10: the face pictures are sorted from high to low by comprehensive score, and step 10 then acquires face pictures from the electronic album sequentially in that order.
The steps of acquiring the comprehensive face score are as follows:
acquiring a plurality of face pictures in an electronic photo album;
and acquiring a comprehensive face score according to the size, the similarity, the brightness, the definition and the angle of the face of each face picture.
The step of obtaining the comprehensive face score according to the face size, the face similarity, the face brightness, the face definition and the face angle in the face picture comprises the following steps:
acquiring the size of a face frame of a face picture, determining the largest size of the face frame in all the face pictures, and acquiring a face size weight value; dividing the size of the face frame by the maximum size of the face frame to obtain the size ratio of the face frame; multiplying the ratio of the size of the face frame by the weight value of the size of the face to obtain a dimension value of the size of the face;
acquiring a characteristic value of a face in the face picture and characteristic values of faces in the rest face pictures according to the face recognition model; acquiring an average value of similarity between the face in the face picture and the face in the rest face pictures according to the feature value of the face in the face picture and the feature values of the faces in the rest face pictures; multiplying the average value of the face similarity by the face similarity weight value to obtain a face similarity dimension value;
converting a face area in a face picture into a gray image; acquiring an average value of gray points in a face area as a brightness value of the face; acquiring an absolute value of a difference between a brightness value of a human face and a preset brightness value; dividing the absolute value by a preset brightness value to obtain a brightness deviation degree; multiplying the brightness deviation degree by the human face brightness weight value to obtain a human face brightness dimension value;
acquiring a definition degree value of a face in a face picture through a definition classification model; multiplying the definition degree value by a face definition weight value to obtain a face definition dimension value;
acquiring a left-right inclination angle, a left-right deflection angle and a pitching angle of a face in a face picture through a face angle classification model; adding the product of the left and right deflection angles and the left and right deflection weight values, the product of the left and right inclination angles and the left and right inclination weight values, and the product of the pitching angles and the pitching weight values to obtain a face angle dimension value;
and adding the face size dimension value, the face similarity dimension value, the face brightness dimension value, the face definition dimension value and the face angle dimension value of each face to obtain the comprehensive score of the face.
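For illustration, the composite score can be assembled as in the sketch below; the weight names and example values are placeholders introduced here, not values taken from the patent, and the sign conventions of the angle and brightness terms follow the wording above.

```python
def composite_face_score(size_ratio, mean_similarity, brightness_deviation,
                         sharpness, roll, yaw, pitch, weights):
    """Composite face score following the dimensions listed above."""
    size_dim = size_ratio * weights["size"]                    # face-frame size / largest frame
    similarity_dim = mean_similarity * weights["similarity"]   # mean similarity to other faces
    brightness_dim = brightness_deviation * weights["brightness"]
    sharpness_dim = sharpness * weights["sharpness"]
    angle_dim = (yaw * weights["yaw"]
                 + roll * weights["roll"]
                 + pitch * weights["pitch"])
    return size_dim + similarity_dim + brightness_dim + sharpness_dim + angle_dim

# Example usage with illustrative (made-up) weights:
weights = {"size": 0.2, "similarity": 0.2, "brightness": 0.15,
           "sharpness": 0.25, "yaw": 0.07, "roll": 0.07, "pitch": 0.06}
score = composite_face_score(0.8, 0.9, 0.1, 0.95, 5.0, 3.0, 2.0, weights)
```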
It should be understood that, although the steps in the flowchart of fig. 2 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps in fig. 2 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the method for extracting the photo stickers based on the face key points in any of the above embodiments.
The embodiment of the application also provides a computer readable storage medium. One or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of a method for extracting a sticker based on face keypoints.
A computer program product containing instructions which, when run on a computer, cause the computer to perform the method for extracting a photo sticker based on face key points.
Any reference to memory, storage, database, or other medium used herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for extracting photo stickers based on face key points is characterized by comprising the following steps:
acquiring a face picture;
detecting face key points in the face picture, wherein the face key points are preset mark points in the face;
when the face key points in the face picture are detected, determining whether the face in the face picture is a face in a preset state or not according to the detected face key points;
and when the face in the face picture is in a preset state, extracting the face in the face picture as a sticker face.
2. The method for extracting photo stickers based on face key points as claimed in claim 1, wherein the face key points in the face picture are detected by a face key point detection model;
the face key point detection model comprises a first detection module, a cutting module and a second detection module;
the first detection module is used for detecting preset initial key points in the face picture;
the cutting module is used for acquiring a face frame according to the preset initial key point and extracting a face image in the face frame;
the second detection module is used for detecting key points according to the face image in the face frame to obtain face key points.
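For illustration only, a minimal sketch of the coarse-to-fine flow described in claim 2, assuming two hypothetical detector callables supplied by the caller; building the face frame as the padded bounding box of the initial key points, and the padding ratio itself, are illustrative assumptions rather than details from the disclosure.

```python
import numpy as np

def cascade_keypoints(image, first_detector, second_detector, margin=0.2):
    """Coarse-to-fine key point detection in the spirit of claim 2.

    first_detector(image)  -> (M, 2) initial key points on the full picture
    second_detector(crop)  -> (N, 2) refined key points on the cropped face
    Both detectors are hypothetical callables supplied by the caller.
    """
    initial = np.asarray(first_detector(image), dtype=float)

    # Face frame: bounding box of the initial key points, padded by `margin`
    # (the padding ratio is an illustrative assumption).
    x0, y0 = initial.min(axis=0)
    x1, y1 = initial.max(axis=0)
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    h, w = image.shape[:2]
    left = max(int(x0 - pad_x), 0)
    top = max(int(y0 - pad_y), 0)
    right = min(int(x1 + pad_x), w)
    bottom = min(int(y1 + pad_y), h)

    crop = image[top:bottom, left:right]
    refined = np.asarray(second_detector(crop), dtype=float)

    # Map the refined key points back to full-picture coordinates.
    refined[:, 0] += left
    refined[:, 1] += top
    return refined
```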
3. The method for extracting photo stickers based on face key points as claimed in claim 2, wherein the loss function used to train the first detection module is a Smooth L1 loss function;
and the loss function used to train the second detection module is a Wing Loss function.
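Smooth L1 and Wing Loss are standard key point regression losses, so a minimal NumPy sketch of both is given below for reference. The Wing Loss hyperparameters w and epsilon use commonly cited defaults (10 and 2), which are assumptions and not values from the disclosure.

```python
import numpy as np

def smooth_l1_loss(pred, target):
    """Smooth L1: quadratic near zero, linear for large errors."""
    diff = np.abs(np.asarray(pred) - np.asarray(target))
    loss = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return loss.mean()

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Wing Loss: amplifies small and medium localization errors."""
    diff = np.abs(np.asarray(pred) - np.asarray(target))
    c = w - w * np.log(1.0 + w / epsilon)        # keeps the two pieces continuous at |x| = w
    loss = np.where(diff < w,
                    w * np.log(1.0 + diff / epsilon),
                    diff - c)
    return loss.mean()
```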
4. The method for extracting photo stickers based on face key points as claimed in claim 1, wherein said determining whether the face in the face picture is a face in a preset state according to the detected face key points comprises:
judging whether the face belongs to a non-shielding face or not according to the detected face key points;
judging whether the face belongs to a smiling face according to the detected face key points;
judging whether the face belongs to an eye-opening face or not according to the detected face key points;
and when the human face belongs to a non-shielding human face, a smiling human face and an eye-opening human face at the same time, determining that the human face in the human face picture is a human face in a preset state.
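For illustration only, the combined check of claim 4 reduces to a conjunction of the three tests; a minimal sketch, where the individual predicates are hypothetical helpers such as those sketched after claims 5 to 7:

```python
def is_preset_state(keypoints, checks) -> bool:
    """Claim 4: the face is in the preset state only when every check passes.
    `checks` is an iterable of predicates, e.g. the non-shielding, smiling and
    eyes-open tests (all hypothetical helpers)."""
    return all(check(keypoints) for check in checks)
```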
5. The method for extracting photo stickers based on face key points as claimed in claim 4, wherein said determining whether the face belongs to a non-shielding face comprises:
acquiring the positions of the detected key points of the human face;
dividing the face into a plurality of preset areas, and taking at least one of the preset areas as a shielding judgment area;
judging whether the detected face key points are distributed in all preset areas in the shielding judgment area or not;
when all the preset areas in the shielding judgment area contain the face key points, respectively judging whether the number of the face key points in each preset area in the shielding judgment area is larger than a preset number threshold of the corresponding preset area;
and when the number of face key points in each preset area of the shielding judgment area is greater than the preset number threshold of the corresponding preset area, determining that the face belongs to a non-shielding face.
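For illustration only, a sketch of the per-area counting test of claim 5, assuming the face has already been divided into named rectangular areas; the area boxes and thresholds passed in are placeholders for the "preset areas" and "preset number thresholds" and are not values from the disclosure.

```python
import numpy as np

def face_not_occluded(keypoints, regions, thresholds):
    """Per-area counting test in the spirit of claim 5 (the non-shielding check).

    keypoints  : (N, 2) array of detected (x, y) face key points
    regions    : dict name -> (x_min, y_min, x_max, y_max) areas forming the
                 shielding judgment area (illustrative rectangles)
    thresholds : dict name -> preset number threshold for that area
    """
    pts = np.asarray(keypoints, dtype=float)
    for name, (x0, y0, x1, y1) in regions.items():
        inside = ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &
                  (pts[:, 1] >= y0) & (pts[:, 1] <= y1))
        # An area containing too few key points is treated as possibly occluded.
        if inside.sum() <= thresholds[name]:
            return False
    return True
```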
6. The method for extracting photo stickers based on human face key points as claimed in claim 5, wherein said determining whether the human face belongs to a smiling face comprises:
acquiring the positions of the detected key points of the human face;
selecting a first key point, a second key point, a third key point and a fourth key point according to the detected positions of the key points of the face, wherein the first key point, the second key point, the third key point and the fourth key point are all located in a lip preset area of the face, a connecting line between the first key point and the second key point is parallel to the left-right direction of the face, and a connecting line between the first key point and the second key point is perpendicular to a connecting line between the third key point and the fourth key point;
calculating a first distance between the first key point and the second key point and a second distance between the third key point and the fourth key point according to the positions of the first key point, the second key point, the third key point and the fourth key point;
and when the ratio of the first distance to the second distance is within a first preset ratio range, determining that the human face belongs to a smiling face.
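For illustration only, a minimal sketch of the ratio test of claim 6, assuming the four lip key points have already been selected; the default ratio_range stands in for the "first preset ratio range" and its values are made-up placeholders.

```python
import numpy as np

def is_smiling(p_first, p_second, p_third, p_fourth, ratio_range=(2.0, 5.0)):
    """Claim 6 style smile test: horizontal lip distance over vertical lip distance.

    p_first/p_second : first and second key points (line parallel to the left-right direction)
    p_third/p_fourth : third and fourth key points (perpendicular line on the lips)
    ratio_range      : placeholder for the first preset ratio range (illustrative only)
    """
    first_distance = np.linalg.norm(np.subtract(p_first, p_second))
    second_distance = np.linalg.norm(np.subtract(p_third, p_fourth))
    if second_distance == 0:
        return False
    ratio = first_distance / second_distance
    return ratio_range[0] <= ratio <= ratio_range[1]
```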
7. The method for extracting photo stickers based on face key points as claimed in claim 5, wherein said determining whether the face belongs to an eye-open face comprises:
judging the states of two eyes of the human face respectively by an eye opening detection method;
when the states of the two eyes are both eye opening states, determining that the human face belongs to an eye opening human face;
wherein the judging the state of the left eye of the human face by the eye-opening detection method comprises:
acquiring the positions of the detected key points of the human face;
selecting a fifth key point, a sixth key point, a seventh key point and an eighth key point according to the detected positions of the key points of the face, wherein the fifth key point, the sixth key point, the seventh key point and the eighth key point are all located in a left eye preset area of the face, a connecting line between the fifth key point and the sixth key point is parallel to the left-right direction of the face, and a connecting line between the fifth key point and the sixth key point is perpendicular to a connecting line between the seventh key point and the eighth key point;
calculating a third distance between the fifth key point and the sixth key point and a fourth distance between the seventh key point and the eighth key point according to the positions of the fifth key point, the sixth key point, the seventh key point and the eighth key point;
and when the ratio of the third distance to the fourth distance is within a second preset ratio range, determining that the left eye is in an eye-open state.
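For illustration only, the per-eye test of claim 7 follows the same ratio pattern as the smile test; a minimal sketch, where the default ratio_range stands in for the "second preset ratio range" and is an illustrative assumption.

```python
import numpy as np

def eye_is_open(corner_a, corner_b, lid_top, lid_bottom, ratio_range=(1.0, 4.0)):
    """Claim 7 style per-eye test: eye-corner distance over eyelid-gap distance."""
    third_distance = np.linalg.norm(np.subtract(corner_a, corner_b))
    fourth_distance = np.linalg.norm(np.subtract(lid_top, lid_bottom))
    if fourth_distance == 0:
        return False                       # eyelids coincide: treat as closed
    ratio = third_distance / fourth_distance
    return ratio_range[0] <= ratio <= ratio_range[1]

def eyes_open(left_eye_points, right_eye_points):
    """Claim 7: the face counts as eye-open only when both eyes pass the test."""
    return eye_is_open(*left_eye_points) and eye_is_open(*right_eye_points)
```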
8. The method for extracting photo stickers based on face key points as claimed in claim 1, wherein after the obtaining of the face picture and before the detecting of the face key points in the face picture, the method further comprises obtaining a face region in the face picture by using a face detection model;
the face detection model is obtained by training through the following steps:
inputting a training picture into the face detection model, and acquiring a descending gradient of the face detection model in the back propagation process;
mapping the descending gradient to a preset specified range through a mapping function to obtain a mapped descending gradient;
updating parameters of the face detection model through the mapped descending gradient;
wherein the gradient mapping function is:
[Gradient mapping formula shown as image FDA0003232664070000041 in the original filing]
where z is the descending gradient, e is the base of the natural logarithm, and h_c is the mapped descending gradient.
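The exact mapping function of claim 8 is given only as a formula image in the original filing and is not reproduced here. Purely to show where the mapping sits in the parameter update, the sketch below uses a tanh-style squashing into (-1, 1) as a stand-in; that choice of function, and the learning rate, are assumptions rather than the disclosed form.

```python
import numpy as np

def map_gradient(z):
    """Stand-in mapping: squashes the descending gradient z into (-1, 1).
    np.tanh(z / 2) equals (1 - e**-z) / (1 + e**-z); the actual function of
    claim 8 is defined by the formula image in the filing and may differ."""
    return np.tanh(np.asarray(z, dtype=float) / 2.0)

def update_with_mapped_gradient(params, grads, lr=0.01):
    """Parameter update that uses the mapped descending gradient h_c instead of z."""
    return [p - lr * map_gradient(g) for p, g in zip(params, grads)]
```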
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method for extracting photo stickers based on face key points as claimed in any one of claims 1 to 8.
10. One or more non-transitory storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method for extracting photo stickers based on face key points as claimed in any one of claims 1 to 8.
CN202111000330.7A 2021-08-27 2021-08-27 Method for extracting photo stickers based on face key points, electronic equipment and storage medium Active CN113673466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111000330.7A CN113673466B (en) 2021-08-27 2021-08-27 Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111000330.7A CN113673466B (en) 2021-08-27 2021-08-27 Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113673466A true CN113673466A (en) 2021-11-19
CN113673466B CN113673466B (en) 2023-04-07

Family

ID=78547205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111000330.7A Active CN113673466B (en) 2021-08-27 2021-08-27 Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113673466B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332034A (en) * 2011-10-21 2012-01-25 中国科学院计算技术研究所 Portrait picture retrieval method and device
US20170188928A1 (en) * 2015-12-24 2017-07-06 Cagri Tanriover Image-based mental state determination
CN107977674A (en) * 2017-11-21 2018-05-01 广东欧珀移动通信有限公司 Image processing method, device, mobile terminal and computer-readable recording medium
CN108460345A (en) * 2018-02-08 2018-08-28 电子科技大学 A kind of facial fatigue detection method based on face key point location
CN109240504A (en) * 2018-09-25 2019-01-18 北京旷视科技有限公司 Control method, model training method, device and electronic equipment
CN111191616A (en) * 2020-01-02 2020-05-22 广州织点智能科技有限公司 Face shielding detection method, device, equipment and storage medium
CN111479129A (en) * 2020-04-02 2020-07-31 广州酷狗计算机科技有限公司 Live broadcast cover determining method, device, server, medium and system
CN111815674A (en) * 2020-06-23 2020-10-23 浙江大华技术股份有限公司 Target tracking method and device and computer readable storage device
CN112464012A (en) * 2020-10-31 2021-03-09 浙江工业大学 Automatic scenic spot photographing system capable of automatically screening photos and automatic scenic spot photographing method
CN112633084A (en) * 2020-12-07 2021-04-09 深圳云天励飞技术股份有限公司 Face frame determination method and device, terminal equipment and storage medium
CN112488064A (en) * 2020-12-18 2021-03-12 平安科技(深圳)有限公司 Face tracking method, system, terminal and storage medium
CN112528939A (en) * 2020-12-22 2021-03-19 广州海格星航信息科技有限公司 Quality evaluation method and device for face image
CN112860941A (en) * 2021-02-04 2021-05-28 百果园技术(新加坡)有限公司 Cover recommendation method, device, equipment and medium
CN113239220A (en) * 2021-05-26 2021-08-10 Oppo广东移动通信有限公司 Image recommendation method and device, terminal and readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HYUNG-IL KIM et al.: "Investigating Cascaded Face Quality Assessment for Practical Face Recognition System", 2014 IEEE International Symposium on Multimedia *
Tao Weifeng et al.: "Comprehensive evaluation method for face recognition systems in outdoor open environments", Informatization Research *
Huang Faxiu et al.: "CNN-based quality evaluation of brightness and sharpness of face images", Computer Engineering and Design *

Also Published As

Publication number Publication date
CN113673466B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US11256905B2 (en) Face detection method and apparatus, service processing method, terminal device, and storage medium
US10635890B2 (en) Facial recognition method and apparatus, electronic device, and storage medium
CN107818310B (en) Driver attention detection method based on sight
CN103810490B (en) A kind of method and apparatus for the attribute for determining facial image
EP3885967A1 (en) Object key point positioning method and apparatus, image processing method and apparatus, and storage medium
US9547908B1 (en) Feature mask determination for images
CN103914699B (en) A kind of method of the image enhaucament of the automatic lip gloss based on color space
WO2017107957A9 (en) Human face image retrieval method and apparatus
CN111428581A (en) Face shielding detection method and system
US20190266387A1 (en) Method, device, and computer readable storage medium for detecting feature points in an image
WO2021051611A1 (en) Face visibility-based face recognition method, system, device, and storage medium
CN103902958A (en) Method for face recognition
CN107463865B (en) Face detection model training method, face detection method and device
CN104636097B (en) A kind of font size self-adapting regulation method and mobile terminal based on eyes
CN109271930B (en) Micro-expression recognition method, device and storage medium
CN111968134B (en) Target segmentation method, device, computer readable storage medium and computer equipment
CN109919030B (en) Black eye type identification method and device, computer equipment and storage medium
CN110288715B (en) Virtual necklace try-on method and device, electronic equipment and storage medium
CN105096353B (en) Image processing method and device
CN111209867A (en) Expression recognition method and device
CN107844742A (en) Facial image glasses minimizing technology, device and storage medium
CN112036284B (en) Image processing method, device, equipment and storage medium
WO2024021742A9 (en) Fixation point estimation method and related device
US20140072234A1 (en) Method and apparatus for estimating position of head, computer readable storage medium thereof
CN113673466B (en) Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240419

Address after: Building 9, Building 203B, Phase II, Nanshan Yungu Entrepreneurship Park, No. 2 Pingshan 1st Road, Pingshan Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong Province, 518033

Patentee after: Core Computing Integrated (Shenzhen) Technology Co.,Ltd.

Country or region after: China

Address before: 518000 1001, building G3, TCL International e city, Shuguang community, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Aishen Yingtong Information Technology Co.,Ltd.

Country or region before: China
