CN107153806B - Face detection method and device

Face detection method and device

Info

Publication number
CN107153806B
Authority
CN
China
Prior art keywords
face
image
determining
region
facial
Prior art date
Legal status
Active
Application number
CN201610120358.7A
Other languages
Chinese (zh)
Other versions
CN107153806A
Inventor
曾杰
彭开
李昆明
Current Assignee
Actions Technology Co Ltd
Original Assignee
Actions Technology Co Ltd
Priority date
Application filed by Actions Technology Co Ltd
Priority to CN201610120358.7A
Publication of CN107153806A
Application granted
Publication of CN107153806B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172 Classification, e.g. identification


Abstract

The invention discloses a face detection method and device. In the method, an image to be processed is acquired; face region detection is performed on the image using M levels of windows of different sizes, where M is an integer greater than 1; a confidence is determined for each detected face region; and facial-feature contour points are detected in the face region with the highest confidence. Because the method detects the facial features in the face region with the highest confidence, it maintains a degree of robustness and thereby helps ensure the accuracy of the detection result.

Description

Face detection method and device
Technical Field
The invention relates to the field of image processing, in particular to a face detection method and device.
Background
With the development of computer technology, especially pattern recognition, face detection has gained significant theoretical and practical value. Face detection means searching any given image with a certain strategy to determine whether it contains a face; if a face is detected, its position, size and pose can be returned.
At present, face detection technology is widely applied in three main areas: 1) automatic face recognition systems, which detect whether a face exists in an image, determine its position and size if so, and then recognize the face; 2) media and entertainment, where in virtual online worlds a large number of entertainment programs and effects can be generated by transforming faces, and face-based entertainment features in consumer electronics such as mobile phones and digital cameras are increasingly rich; 3) image search, where search engines based on face image recognition have broad prospects: given an image as the query, they can judge whether it contains a face and, if so, retrieve similar images and similar faces.
Disclosure of Invention
The embodiment of the invention provides a face detection method and device.
The embodiment of the invention provides a face detection method, which comprises the following steps:
acquiring an image to be processed;
respectively carrying out face region detection on the image to be processed according to M-level windows with different sizes, wherein M is an integer larger than 1;
determining a confidence level of each detected face region;
and detecting facial-feature contour points in the face region with the highest confidence.
The embodiment of the invention provides a face detection device, which comprises:
the acquisition module is used for acquiring an image to be processed;
the first detection module is used for detecting the face area of the image to be processed according to M-level windows with different sizes, wherein M is an integer larger than 1;
a determining module for determining a confidence level of each detected face region;
and the second detection module is used for detecting facial-feature contour points in the face region with the highest confidence.
In the embodiment of the invention, face region detection is performed on the image to be processed with M levels of windows of different sizes, a confidence is determined for each detected face region, and facial-feature contour point detection is performed on the face region with the highest confidence. By detecting the facial features in the face region with the highest confidence, the method maintains a degree of robustness and thereby helps ensure the accuracy of the detection result.
Drawings
Fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a process of performing face detection on a candidate region according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a window sliding policy according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a window sliding strategy according to an embodiment of the present invention;
fig. 5a to 5d are schematic diagrams of detection results obtained by the face detection method according to the embodiment of the invention;
FIG. 6 is a schematic flow chart of a makeup method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of eye contour points provided by an embodiment of the present invention;
FIG. 8 is a diagram of the effect of eye makeup provided by an embodiment of the present invention;
FIG. 9 is a schematic view of lip contour points provided by an embodiment of the present invention;
FIG. 10 is a diagram illustrating the effect of applying a lip makeup according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a beard contour point provided by an embodiment of the present invention;
fig. 12 is a diagram illustrating the effect of making up the beard according to the embodiment of the present invention;
fig. 13a to 13d are schematic diagrams of human face detection and makeup results according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic flowchart of a face detection method according to an embodiment of the present invention is shown, where the flow may be executed by an electronic device or an apparatus with an image processing function.
As shown, the process may include the following steps:
step 101: and acquiring an image to be processed.
Step 102: and respectively carrying out face region detection on the image to be processed according to M-level windows with different sizes, wherein M is an integer larger than 1.
In face detection, to accommodate faces of different sizes, the whole image can be traversed with windows of different sizes and face detection performed on each candidate region.
The window sizes for face detection may be preset. In a preferred scheme, the first-level to N-th-level windows can be obtained from the size of the initial window and a preset magnification factor: the size of the (j+1)-th-level window is obtained by enlarging the j-th-level window by the magnification factor, where 1 ≤ j ≤ N-1 and N is an integer. For example, with a magnification factor of 1.1 and a first window of 10 × 10 (in pixels, likewise below), multiplying its length and width by the factor gives a second window of 11 × 11; the third window multiplies the second by 1.1 again, and so on.
Preferably, M levels of windows may be selected from the N levels for subsequent face region detection. In general, N is large (for example, 32 levels are commonly used), which entails a heavy computational load and a long run time; selecting a subset of window sizes for subsequent face detection therefore reduces computation, saves time, and relaxes the hardware requirements (for example, on processor and memory specifications).
Preferably, the M levels are chosen at equal intervals from the N levels. For example, if the 32 levels of windows obtained from the initial window size and the preset magnification factor are sorted from smallest to largest as window 1, window 2, ..., window 32, then the 8 levels window 4, window 8, window 12, window 16, window 20, window 24, window 28 and window 32 are selected. Feature extraction is performed with these 8 levels; because the selected levels are uniformly and discretely distributed across the 32 levels, face detection at different sizes is still accommodated. The fewer levels selected, the lower the computational load and the faster the detection, but robustness decreases accordingly; conversely, more levels mean more computation and slower detection but higher robustness. In a specific implementation, the number of window levels can therefore be chosen according to each scenario's requirements on robustness and speed.
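As a rough illustration, the pyramid construction and the equal-interval selection might look as follows. This is a sketch only: the patent does not prescribe an implementation, and the function names are ours, while the concrete numbers (10 × 10 initial window, factor 1.1, 32 levels, 8 selected) follow the example in the text.

```python
def build_window_pyramid(init_size=10, factor=1.1, n_levels=32):
    """Return the N window side lengths, each level enlarged from the last."""
    sizes, side = [], float(init_size)
    for _ in range(n_levels):
        sizes.append(int(round(side)))
        side *= factor
    return sizes

def select_levels(sizes, m=8):
    """Pick M levels at equal intervals so the chosen windows stay uniformly
    spread over the full size range (levels 4, 8, ..., 32 when N=32, M=8)."""
    step = len(sizes) // m
    return sizes[step - 1::step]

windows = select_levels(build_window_pyramid(), m=8)
```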
In other embodiments, the M levels selected from the N levels need not be uniformly and discretely distributed; for example, for a scene in which the faces in the image to be processed have a uniform size, windows of the corresponding size may be selected according to the expected face size.
As shown in fig. 2, when performing face detection with one level of window, the window is slid across the image to be processed to obtain candidate regions, and the following steps are performed for each candidate region:
Step 1021: selecting candidate regions according to the current window level, and performing feature extraction on each candidate region using the feature template corresponding to that level.
Step 1022: if the result computed by the cascade classifier from the extracted features is greater than the classifier's threshold, the corresponding candidate region is judged to be a face region, where the threshold of the cascade classifier is obtained by lowering the threshold obtained from sample training.
In the above steps, a candidate region is determined to be a face region only if every stage of the cascade classifier accepts it. Under certain conditions, some stages can be skipped and face detection performed on the candidate region with only the remaining stages.
The threshold of the cascade classifier is obtained through sample training. Preferably, in the embodiment of the invention, after the cascade classifier is trained on samples, its threshold may be adjusted downward. Appropriately lowering the threshold (for example, to 94%-98% of the original value in some scenes) reduces the rejection rate of face detection (the probability of judging a face region as a non-face region); on the one hand this improves robustness to interference such as face occlusion, lighting and glasses, and on the other hand it compensates for the effect of reducing the number of window sizes used for feature extraction. However, lowering the threshold may also raise the false-acceptance rate (the probability of judging a non-face region as a face region), so the amount by which the threshold is lowered should weigh both rates to keep the system robust.
In the flow shown in fig. 2, each level of window may correspond to a plurality of feature templates. The feature templates used in step 1021 may be Haar feature templates, in which case step 1022 uses an Adaboost cascade classifier based on Haar features. Other feature templates may also be used; the embodiment of the invention is not limited in this respect.
In step 1021, the window may be slid from left to right and top to bottom with a set horizontal step and a set vertical step to obtain candidate regions one by one; after each candidate region is selected, feature values are extracted and the cascade classifier decides, based on them, whether the region is a face region. Typically the horizontal step equals the window width and the vertical step equals the window height.
Preferably, the window sliding strategy may be adapted to the detection result of each candidate region so as to reduce computation and save run time. A specific strategy is shown in fig. 3:
If the current candidate region is judged to be a face region, the region extending (m-1) second-step lengths beyond it in the second direction is marked as non-face, and the window is slid by n first-step lengths in the first direction to obtain the next region to be selected (steps 301, 305, 306, 307). This applies when the sliding distance between the current candidate region and the image boundary in the first direction is at least n first-step lengths; when it is less than n first-step lengths, the window jumps along the second direction back to the starting position in the first direction to obtain the region to be selected (steps 301, 305, 306, 308). Here m and n are integers greater than 1, usually no more than 4.
If the region so obtained is marked as non-face, and its sliding distance to the image boundary in the first direction is at least one first-step length, the window slides one first-step length to obtain the next candidate region (steps 304, 302, 303, 304, 309); if it is not marked as non-face, it is taken as the next candidate region (steps 304, 309).
If the current candidate region is judged to be a non-face region, the window slides one first-step length in the first direction from the current candidate region to obtain the next region to be selected (steps 301, 302, 303). This applies when the sliding distance between the current candidate region and the image boundary in the first direction is at least one first-step length; when it is less, the window jumps along the second direction back to the starting position in the first direction to obtain the region to be selected (steps 301, 302, 308).
As before, if the region so obtained is marked as non-face and its sliding distance to the image boundary in the first direction is at least one first-step length, the window slides one first-step length to obtain the next candidate region (steps 304, 302, 303, 304, 309); otherwise the region is taken as the next candidate region (steps 304, 309).
If the first direction is the horizontal direction and the second direction is the vertical direction, the first step length is the width of the window, and the second step length is the height of the window; if the first direction is the vertical direction and the second direction is the horizontal direction, the first step length is the height of the window and the second step length is the width of the window.
To explain the flow more clearly, take n = m = 3 and refer to fig. 4. As shown, the window slides left to right horizontally and top to bottom vertically. Suppose the current candidate region is A. If region A is judged to be a face region, the window slides horizontally from A with a step of 3 window widths to obtain region B; meanwhile the region below A spanning 2 window heights vertically, i.e. region C, is marked as non-face, so that when the window later slides into it, no feature extraction or decision is performed and the window moves directly to the next region. If the current candidate region is B and region B is judged to be non-face, and B's sliding distance to the image boundary in the horizontal direction is less than the first step length, the window jumps along the second direction to the starting position in the first direction, giving region D.
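A minimal sketch of this sliding strategy for one square window level, assuming the first direction is horizontal and the step lengths equal the window side; `is_face()` is a stand-in for the cascade-classifier decision, and n = m = 3 matches the example above:

```python
def scan(image_h, image_w, win, is_face, n=3, m=3):
    detections = []
    skip = set()                      # regions pre-marked as non-face
    y = 0
    while y + win <= image_h:
        x = 0
        while x + win <= image_w:
            if (x, y) in skip:        # marked: no feature extraction, move on
                x += win
                continue
            if is_face(x, y, win):
                detections.append((x, y, win))
                for k in range(1, m):            # mark m-1 cells below
                    skip.add((x, y + k * win))
                x += n * win                     # jump n widths ahead
            else:
                x += win                         # plain one-step slide
        y += win
    return detections
```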
Step 103: a confidence level for each detected face region is determined.
When face region detection is performed on the image to be processed with windows of different sizes, each window size may or may not detect a face region. The confidence of each region detected as a face is computed, and the region with the highest confidence is taken as the final detected face region, such as the rectangular boxes shown in figs. 5a to 5d. Specifically, the confidence is computed as in formula (1):
conf = ∑(T_i - Tr_i) (1)
where conf denotes the confidence of the face region, T_i denotes the result computed by the i-th stage of the cascade classifier from the feature values of the image to be processed in the face region, and Tr_i denotes the threshold of the i-th stage; the larger the differences between T_i and Tr_i, the higher the confidence.
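As a sketch of formula (1), assuming each cascade stage exposes its response T_i and trained threshold Tr_i; the 0.96 factor stands in for the 94%-98% down-adjustment mentioned earlier and is illustrative:

```python
def region_confidence(stage_responses, trained_thresholds, lower=0.96):
    conf = 0.0
    for t_i, tr_i in zip(stage_responses, trained_thresholds):
        tr_i *= lower          # lowered threshold reduces the rejection rate
        if t_i <= tr_i:
            return None        # rejected by this stage: not a face region
        conf += t_i - tr_i     # conf = sum(T_i - Tr_i)
    return conf
```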
Step 104: detecting facial-feature contour points in the face region with the highest confidence.
Preferably, a first ratio and a second ratio may be preset, both less than or equal to 1; the second ratio may be the same as or different from the first.
In step 102, the image to be processed may be scaled by the first ratio to obtain a first image, that is, the resolution of the image to be processed is reduced by the first ratio; face region detection is then performed on the first image with the M levels of windows of different sizes.
Correspondingly, in step 104, the image to be processed may be scaled by the second ratio to obtain a second image, that is, its resolution is reduced by the second ratio; the face region with the highest confidence in the second image is then determined from the position of the highest-confidence face region in the first image; and the facial-feature contour points in the image to be processed are determined from the positions of the contour points detected in the second image.
Preferably, the second ratio is greater than or equal to the first ratio, i.e. the resolution of the second image is greater than or equal to that of the first image. The resolution required for contour point detection is generally higher than that for face region detection: at low resolution the detected contour points are susceptible to interference, for example eye contour points are easily disturbed by glasses.
For example, with a 1920 × 1080 image to be processed, the resolution can be reduced to 320 × 240 for face region detection and to 640 × 480 for contour point detection; the highest-confidence face region found in the 320 × 240 image is mapped into the 640 × 480 image, contour points are detected there, and finally the contour point positions in the original image to be processed are determined from their positions in the 640 × 480 image. This reduces computation, saves time, and improves adaptability to devices with different computing power, while maintaining a certain accuracy.
In step 104, various algorithms may be used for facial-feature contour point detection; preferably, an ASM (Active Shape Model) algorithm is used. ASM performs well on frontal faces but handles tilted faces poorly.
Therefore, to overcome ASM's poor performance on tilted faces and obtain more robust facial-feature contour points, it is preferable to first compute the face inclination angle within the highest-confidence face region and then determine the facial-feature contour points in that region according to this angle.
Specifically, the process of calculating the face inclination angle is as follows:
1) Determine the two-eye region in the face region.
For example, the two-eye region can be estimated from the classical "three sections, five eyes" facial proportions. "Three sections" refers to the lengthwise proportions of the face: the forehead hairline to the brow bone, the brow bone to the base of the nose, and the base of the nose to the chin each occupy 1/3 of the face length. "Five eyes" refers to the widthwise proportions: from the left hairline to the right hairline, the face is five eye-widths across, with the distance between the two eyes and the distance from each outer eye corner to the side hairline each equal to one eye width. The detected face region can therefore be divided equally into three rows and five columns, with the eyes located at row 2, column 2 and row 2, column 4, as sketched below.
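A sketch of this grid estimate; the function name and the (x, y, w, h) cell-rectangle representation are illustrative choices, not the patent's:

```python
def estimate_eye_regions(face_x, face_y, face_w, face_h):
    cw, ch = face_w / 5.0, face_h / 3.0    # one eye width, one section height
    row_y = face_y + ch                    # top of the second row
    left_eye = (face_x + 1 * cw, row_y, cw, ch)     # second column
    right_eye = (face_x + 3 * cw, row_y, cw, ch)    # fourth column
    return left_eye, right_eye
```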
2) Determine the pupil positions of the two eyes from color-space component values within the two-eye region, where the component values may be one or more of luminance and chrominance values.
Taking an image in the YCrCb color space as an example, the set of points within the estimated eye positions that satisfy all of the following conditions may be used as the reference region of the eye:
-the Y component is within the [100,125] interval;
-the Cb component is within the [115,130] interval;
the Cr component is in the [125,145] interval.
Within the reference region of each eye, the position of the minimum Y component is taken as the pupil position. In practice, because the pupil is located via the Y component, it may be disturbed by hair or glasses; therefore, once the minimum-Y position is found, it is checked: if it lies on the periphery of the reference region it is not taken as the pupil, and only if it lies within a preset central area is it accepted as the pupil position, as sketched below.
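A sketch of this pupil search, assuming a YCrCb input. The intervals are those given above; the 20%-80% bounds used for the "preset central area" are an illustrative assumption, not the patent's:

```python
import cv2
import numpy as np

def find_pupil(eye_ycrcb):
    y, cr, cb = cv2.split(eye_ycrcb)
    ref = ((y >= 100) & (y <= 125) &         # reference region of the eye
           (cr >= 125) & (cr <= 145) &
           (cb >= 115) & (cb <= 130))
    y_masked = np.where(ref, y, 255)         # search only inside the region
    py, px = np.unravel_index(np.argmin(y_masked), y_masked.shape)
    h, w = y.shape
    # accept only positions inside a preset central area, not the periphery
    if 0.2 * w < px < 0.8 * w and 0.2 * h < py < 0.8 * h:
        return px, py
    return None                              # likely hair or glasses
```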
3) Determine the face inclination angle from the pupil positions of the two eyes.
Suppose the pupil positions of the two eyes are (x_1, y_1) and (x_2, y_2). The face inclination angle θ is then obtained from formula (2):
θ = arctan[(y_2 - y_1)/(x_2 - x_1)] (2)
Specifically, the process of determining the facial-feature contour points in the highest-confidence face region according to the face inclination angle is as follows:
1) Rotate the image to be processed within the highest-confidence face region by the face inclination angle to obtain a rotated face region image.
2) Detect facial-feature contour points in the rotated face region image.
3) Rotate the detected contour point coordinates back by the face inclination angle to obtain the contour point coordinates in the highest-confidence face region.
Preferably, a threshold may be preset for the face inclination angle. If the computed angle exceeds the threshold, the tilt of the face would affect the accuracy of the detected contour points, so the face region is rotated first and the contour points acquired afterwards. If the computed angle is below the threshold, the tilt is small enough not to affect detection accuracy, and the contour points can be detected directly on the unrotated face region, reducing computation while preserving accuracy.
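A sketch combining formula (2) with the rotate-detect-unrotate scheme above; `detect_points` stands in for the ASM-style contour detector, and the 5-degree threshold is an illustrative value, not one given in the patent:

```python
import math
import numpy as np
import cv2

def contour_points_with_tilt(face_img, left_pupil, right_pupil,
                             detect_points, angle_threshold_deg=5.0):
    (x1, y1), (x2, y2) = left_pupil, right_pupil
    theta = math.degrees(math.atan2(y2 - y1, x2 - x1))   # formula (2)
    if abs(theta) <= angle_threshold_deg:
        return detect_points(face_img)       # small tilt: detect directly
    h, w = face_img.shape[:2]
    center = (w / 2.0, h / 2.0)
    rot = cv2.getRotationMatrix2D(center, theta, 1.0)    # make face upright
    upright = cv2.warpAffine(face_img, rot, (w, h))
    pts = np.asarray(detect_points(upright), dtype=np.float64)
    inv = cv2.getRotationMatrix2D(center, -theta, 1.0)   # rotate points back
    return cv2.transform(pts.reshape(-1, 1, 2), inv).reshape(-1, 2)
```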
The points shown in figs. 5a to 5d form the facial-feature contour point sequence X_face = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} detected by the above method.
The images shown in figs. 5a to 5d were captured with a prototype; among the faces, some wear glasses, some are turned partly to the side, some are viewed from above or below, and some are tilted.
The face detection method provided by the embodiment of the invention suits different application scenarios, and is especially suitable for handheld devices with limited computing power or for entertainment-oriented makeup and beautification applications: it reduces computation and increases speed while preserving a degree of robustness.
Further, after the facial-feature contour points are detected, makeup can be applied to the detected features. Referring to fig. 6, the makeup process specifically includes the following steps:
Step 601: processing the facial-feature makeup template according to the facial-feature contour points.
Specifically, in the above step, the inclination angle of a facial feature may be determined from its contour points, the rotation angle of its makeup template determined from that inclination angle, and the template rotated accordingly; likewise, the size of the feature may be determined from its contour points and the template scaled to that size.
The rotation matrix T_θ is shown in formula (3):

T_θ = [ cos θ  -sin θ ]
      [ sin θ   cos θ ]   (3)

where θ denotes the rotation angle of the facial feature.

The scaling matrix T_s is shown in formula (4):

T_s = [ Sx  0  ]
      [ 0   Sy ]   (4)

where Sx denotes the scaling factor of the feature template on the X axis and Sy its scaling factor on the Y axis.
In some embodiments, some makeup templates may be only rotated, only scaled, or neither, in order to achieve playful or exaggerated makeup effects; the invention is not limited in this respect.
Step 602: attaching the processed facial-feature makeup template to the face region in the image to be processed.
When a makeup template is attached to the face region of the image to be processed, it is aligned by center point. For example, for eye makeup, the center of the rotated and scaled eye makeup template is aligned with the eye center in the image to be processed, and the template is then attached to the image.
Preferably, in the above step, the transparency of the processed makeup template may be set according to a preset transparency and the template blended into the face region of the image to be processed accordingly, as in formula (5):
I_out(x, y) = (1 - α)·I_in(x, y) + α·[T_s·T_θ·I_mask(x, y)] (5)
where I_out(x, y) denotes the output after makeup, I_in(x, y) the image to be processed, I_mask(x, y) the makeup template, and α the transparency.
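A sketch of formula (5): scale and rotate the template (formulas (3) and (4)), align its center with the feature center, and alpha-blend it into the image. The α value is illustrative, and the code assumes the scaled template lies fully inside the image and matches its channel layout:

```python
import cv2
import numpy as np

def apply_makeup(img, template, center_xy, theta_deg, sx, sy, alpha=0.6):
    th, tw = template.shape[:2]
    warped = cv2.resize(template, (int(tw * sx), int(th * sy)))   # T_s
    wh, ww = warped.shape[:2]
    rot = cv2.getRotationMatrix2D((ww / 2.0, wh / 2.0), theta_deg, 1.0)
    warped = cv2.warpAffine(warped, rot, (ww, wh))                # T_theta
    x0 = int(center_xy[0] - ww / 2)          # center-point alignment
    y0 = int(center_xy[1] - wh / 2)
    roi = img[y0:y0 + wh, x0:x0 + ww].astype(np.float64)
    blended = (1 - alpha) * roi + alpha * warped                  # formula (5)
    img[y0:y0 + wh, x0:x0 + ww] = blended.astype(img.dtype)
    return img
```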
In some embodiments of the invention, the face region in an image to be processed for makeup is generally large, so in step 102, when the M levels of windows are selected from the N levels sorted from smallest to largest, only the larger sizes may be chosen. Still taking 32 levels as an example, windows 18, 20, 22, 24, 26 and 28, six levels in all, may be selected for feature extraction and face region detection, reducing computation and increasing speed.
To illustrate the makeup process more clearly, eye, lip and beard makeup are described below.
Example 1: Eye makeup
Before makeup is applied, the face region in the image to be processed must be detected and its facial features located to obtain the feature contour points; the detection process is described in the foregoing embodiments, and the detected eye contour points are shown in fig. 7.
The eye inclination angle is then determined from the eye contour points; it accounts for both the face inclination angle and individual differences in eye contour. The face inclination angle θ_face can be obtained from the eye center positions, either by the method of formula (2) or from the detected facial-feature contour points, as in formula (6):

θ_face = arctan[(y_rce - y_lce)/(x_rce - x_lce)] (6)

where (x_rce, y_rce) and (x_lce, y_lce) are the center coordinates of the right and left eyes, which may be expressed by formulas (7) and (8):

(x_rce, y_rce) = ((x_e_in + x_e_out)/2, (y_e_in + y_e_out)/2) (7)
(x_lce, y_lce) = ((x_e_in + x_e_out)/2, (y_e_in + y_e_out)/2) (8)

where (x_e_in, y_e_in) and (x_e_out, y_e_out) denote the inner- and outer-corner coordinates of the corresponding eye. The eye center is computed from the inner and outer corner positions because these are stable and not easily affected by interference factors such as glasses.
Furthermore, to account for individual differences among human eyes (differences in the inclination of the eyes themselves), the left (right) eye inclination angle θ_le (θ_re) is obtained from the inner and outer eye corner points, as in formula (9):

θ_le = arctan[(y_e_out - y_e_in)/(x_e_out - x_e_in)] (9)

with θ_re obtained analogously from the right eye's corner points.
The rotation angle θ_leye of the left-eye makeup template is finally determined from θ_face and θ_le, as in formula (10); the rotation angle θ_reye of the right-eye template is determined from θ_face and θ_re, as in formula (11). [Formulas (10) and (11) omitted.]
the scaling factor Sx of the eye makeup template in the X-axis is shown in formula (12), and the scaling factor Sy in the Y-axis is shown in formula (13):
Figure BDA0000934068410000131
Figure BDA0000934068410000132
wherein the distance function
Figure BDA0000934068410000133
imge_inAnd imge_outRepresenting the position of the inner and outer corners of the eye, mask, respectively, in the image to be processede_inAnd maske_outRespectively, the positions of the inner and outer corners of the eye in a cosmetic template, imge_upAnd imge_downRespectively representing the positions of the extreme points of the upper and lower contours of the eye in the image to be processed, maske_upAnd maske_downThe positions of the upper and lower contour extrema points of the eye in the eye makeup template are indicated, respectively.
Considering that the acquired upper and lower contour point positions of the eyes are affected by wearing glasses sometimes, and the calculated Y-axis scaling factor Sy is abnormal, the scaling factor is mainly based on the X-axis scaling factor Sx, because the left and right corner positions are generally stable and are less affected by interference. Determining whether the Y-axis scaling factor Sy is abnormal according to the aspect ratio of the human eye outline:
Figure BDA0000934068410000134
generally, the aspect ratio of the human eye contour does not exceed 3/5, and exceeding 3/5 indicates that the superior-inferior contour point position is abnormal, and the Y-axis scaling factor Sy is equal to the X-axis scaling factor Sx.
The eye makeup template is rotated and scaled according to the determined angle and scaling factors and attached to the image to be processed, as shown in fig. 8.
Example 2: Lip makeup
In the embodiment of the invention, lip makeup is divided into upper-lip and lower-lip makeup, both handled similarly to eye makeup. Before makeup is applied, the face region in the image to be processed is detected and its facial features located to obtain the contour points; the lip inclination angle, the lip center positions in the image to be processed, and the scaling factors are then computed from the lip contour points.
The lip inclination angle θ_m is given by formula (14):

θ_m = arctan[(y_m_right - y_m_left)/(x_m_right - x_m_left)] (14)

where (x_m_left, y_m_left) and (x_m_right, y_m_right) denote the coordinates of the left and right mouth corner points.
The left and right mouth corners and the upper and lower contour extreme points of the upper and lower lips are shown in fig. 9.
The center position (x_cmu, y_cmu) of the upper lip is obtained from formulas (15) and (16):

x_cmu = (x_mu_up + x_mu_down)/2 (15)
y_cmu = (y_mu_up + y_mu_down)/2 (16)

where (x_mu_up, y_mu_up) and (x_mu_down, y_mu_down) denote the coordinates of the upper and lower contour extreme points of the upper lip.
The center position (x_cmd, y_cmd) of the lower lip can be obtained in the same way.
The scaling factor Sx of the upper-lip makeup template on the X axis is shown in formula (17), and the scaling factor Sy on the Y axis in formula (18):

Sx = dist(img_m_left, img_m_right) / dist(mask_m_left, mask_m_right) (17)
Sy = dist(img_mu_up, img_mu_down) / dist(mask_mu_up, mask_mu_down) (18)

where img_m_left and img_m_right denote the positions of the left and right mouth corner points in the image to be processed, mask_m_left and mask_m_right their positions in the lip makeup template, img_mu_up and img_mu_down the positions of the upper and lower contour extreme points of the upper lip in the image to be processed, and mask_mu_up and mask_mu_down the corresponding positions in the upper-lip makeup template.
The scaling factors of the lower lip on the X and Y axes can be obtained in the same way.
The upper- and lower-lip makeup templates are rotated and scaled according to the determined angles and scaling factors and attached to the image to be processed, as shown in fig. 10.
Example 3: Beard makeup
Before beard makeup is applied, the face region in the image to be processed is detected and its facial features located to obtain the contour points; the upper contour extreme point of the upper lip and the lower contour extreme point of the nose are shown in fig. 11.
The center position of the beard is determined from the upper contour extreme point of the upper lip and the lower contour extreme point of the nose, as in formulas (19) and (20):

x_cb = (x_n_down + x_mu_up)/2 (19)
y_cb = (y_n_down + y_mu_up)/2 (20)

where (x_n_down, y_n_down) denotes the coordinates of the lower contour extreme point of the nose and (x_mu_up, y_mu_up) the coordinates of the upper contour extreme point of the upper lip. As the formulas show, the center of the beard on the image to be processed is the midpoint of the line connecting the lower extreme point of the nose and the upper contour extreme point of the upper lip.
The inclination angle of the beard is computed in the same way as the face inclination angle described above and is not repeated here.
The scaling factor Sx of the beard makeup template on the X axis is shown in formula (21), and the scaling factor Sy on the Y axis in formula (22):

Sx = dist(img_rce, img_lce) / dist(mask_lb, mask_rb) (21)
Sy = dist(img_n_down, img_mu_up) / dist(mask_b_up, mask_b_down) (22)

where img_rce and img_lce denote the positions of the right and left eye centers in the image to be processed, as in formulas (7) and (8); mask_lb and mask_rb denote two preset points in the beard makeup template whose distance is the left-right eye-center distance preset according to the template's size proportions; img_n_down and img_mu_up denote the positions of the lower contour extreme point of the nose and the upper contour extreme point of the upper lip in the image to be processed; and mask_b_up and mask_b_down denote the positions of the upper and lower contour extreme points in the beard makeup template.
The beard makeup template is rotated and scaled according to the determined angle and scaling factor and attached to the image to be processed, as shown in fig. 12.
Referring to figs. 13a to 13d: fig. 13a is the classic Lena image, the rectangular region in fig. 13b is the detected face region, the points in fig. 13c are the detected facial-feature contour points, and fig. 13d shows the makeup effect. The results show that the system is stable and reliable, overcomes interference in facial-feature detection, outputs correct feature positions, and produces an accurate and natural makeup result.
In the embodiment of the invention, face region detection is performed on the image to be processed with M levels of windows of different sizes, the confidence of each detected face region is computed, and facial-feature contour points are detected in the face region with the highest confidence. Detecting the facial features in the highest-confidence face region gives higher robustness and a more accurate detection result. In addition, using only a subset of the window levels for feature extraction reduces the amount of computation and improves efficiency while preserving robustness and accuracy. The face makeup is stable and reliable, withstands interference in the facial-feature detection of multiple faces, and yields an accurate and natural makeup result. The system also runs fast: in practical tests on MIPS devices it provides a good user experience, overcoming the limited computing power and poorer imaging quality of handheld embedded devices, and it is equally suited to PC platforms with a more favorable environment, combining stability with efficiency and offering a wide range of applications and practical value.
Based on the same technical concept, an embodiment of the present invention further provides a face detection apparatus, as shown in fig. 14, the apparatus includes:
an obtaining module 1401, configured to obtain an image to be processed;
the first detection module 1402 is configured to perform face region detection on an image to be processed according to M-level windows with different sizes, where M is an integer greater than 1;
a determining module 1403, configured to determine a confidence of each detected face region;
the second detection module 1404 is configured to perform facial feature contour point detection according to the face region with the highest confidence.
Further, the apparatus also includes a selecting module (not shown in the figure) for obtaining the 1st to N-th level windows from the size of the initial window and a preset magnification factor, the size of the (j+1)-th level window being obtained by enlarging the j-th level window by the magnification factor, where 1 ≤ j ≤ N-1 and N is an integer greater than M; M levels of windows are then selected from the N levels.
Preferably, the selecting module is specifically configured to select M-level windows at equal intervals from the N-level windows.
Specifically, the first detection module 1402 is configured to: when performing face region detection on the image to be processed with one level of window among the M levels of windows of different sizes, select candidate regions according to that level of window and perform feature extraction on each candidate region with the feature template corresponding to that level; if the result computed by the cascade classifier from the extracted features is greater than the classifier's threshold, judge the current candidate region to be a face region; the threshold of the cascade classifier is obtained by lowering the threshold obtained from sample training.
Preferably, the first detection module 1402 is specifically configured to: if a candidate region is judged to be a face region, mark the region extending (m-1) second-step lengths beyond it in the second direction as non-face, and slide the window by n first-step lengths in the first direction to obtain a region to be selected; if the candidate region is judged to be a non-face region, slide the window by one first-step length in the first direction to obtain a region to be selected; where m and n are integers greater than 1;
if the region to be selected is marked as non-face, slide the window by one first-step length in the first direction to obtain the next candidate region; otherwise, take the region to be selected as the next candidate region;
where the size of a candidate region equals the size of the window; if the first direction is horizontal and the second direction vertical, the first step length is the window width and the second step length the window height; if the first direction is vertical and the second direction horizontal, the first step length is the window height and the second step length the window width.
The determining module 1403 is specifically configured to determine the confidence of a face region according to formula (1).
Preferably, the first detecting module 1402 may further scale the image to be processed according to a first ratio to obtain a first image, where the first ratio is smaller than or equal to 1; and then, carrying out face region detection on the first image according to M-level windows with different sizes. The second detecting module 1404 may first scale the image to be processed according to a second ratio, so as to obtain a second image, where the second ratio is smaller than or equal to 1 and greater than or equal to the first ratio; then, according to the position of the face region with the highest confidence level in the first image, determining the face region with the highest confidence level in the second image; then, detecting facial contour points according to the face area with the highest confidence level in the second image; and finally, determining the facial contour points in the image to be processed according to the positions of the facial contour points detected in the second image.
Specifically, the second detection module 1404 is specifically configured to determine a face inclination angle in the face region with the highest confidence, and determine facial contour points in the face region with the highest confidence according to the face inclination angle.
Specifically, the process of calculating the face tilt angle by the second detection module 1404 is as follows:
1) Determine the two-eye region in the face region.
2) Determine the pupil positions of the two eyes from color-space component values within the two-eye region, where the component values may be one or more of luminance and chrominance values.
3) Determine the face inclination angle from the pupil positions of the two eyes.
Specifically, the process by which the second detection module 1404 determines the facial-feature contour points in the highest-confidence face region according to the face inclination angle is as follows:
1) Rotate the image to be processed within the highest-confidence face region by the face inclination angle to obtain a rotated face region image.
2) Detect facial-feature contour points in the rotated face region image.
3) Rotate the detected contour point coordinates back by the face inclination angle to obtain the contour point coordinates in the highest-confidence face region.
Preferably, the second detection module 1404 may preset a threshold for the face inclination angle. If the angle exceeds the threshold, the image to be processed within the highest-confidence face region is rotated by the face inclination angle to obtain a rotated face region image; facial-feature contour points are detected in the rotated image; and the detected coordinates are rotated back by the face inclination angle to obtain the contour point coordinates in the highest-confidence face region. If the computed angle is below the threshold, the tilt of the face is small enough not to affect detection accuracy, and the contour points can be detected directly on the unrotated face region, reducing computation while preserving accuracy.
Further, the face detection apparatus may further include:
a processing module (not shown in the figure) for processing the facial-feature makeup template according to the facial-feature contour points;
and a fitting module (not shown in the figure) for attaching the processed makeup template to the face region in the image to be processed.
Specifically, the processing module can determine the inclination angle of a facial feature from its contour point coordinates and rotate the corresponding makeup template by that angle; it can also determine the size of the feature from the contour point coordinates and scale the makeup template to that size.
Specifically, when the processing module processes the eye makeup template, the center coordinates and inclination angles of the left and right eyes are determined from the eye corner coordinates, as in formulas (7), (8) and (9); the face inclination angle is determined from the left and right eye center coordinates, as in formula (6); and the rotation angles of the left- and right-eye makeup templates are determined as in formulas (10) and (11).
Specifically, when the processing module processes the beard makeup template, the eye center coordinates are first determined from the inner and outer eye corner coordinates, as in formulas (7) and (8); the face inclination angle is then determined from the left and right eye center coordinates, as in formula (6); and the face inclination angle is used as the rotation angle of the beard makeup template.
Specifically, when the processing module processes the lip makeup template, the lip inclination angle is determined from the left and right mouth corner coordinates, as in formula (14); the size of the upper lip is determined from the upper-lip contour points and the upper-lip makeup template scaled accordingly, as in formulas (17) and (18); the lower-lip template is scaled in the same way; and the center coordinates of the upper and lower lips are determined from the upper and lower contour extreme points of the respective lips, as in formulas (15) and (16).
When attaching the lip templates to the image to be processed, the fitting module attaches the rotated and scaled upper-lip makeup template according to the center coordinates of the upper lip, and attaches the rotated and/or scaled lower-lip makeup template according to the center coordinates of the lower lip.
Specifically, the fitting module is further configured to set the transparency of the processed makeup template according to a preset transparency and to blend the template into the face region of the image to be processed with that transparency.
In the embodiment of the invention, face region detection is performed on the image to be processed with M levels of windows of different sizes, the confidence of each detected face region is computed, and facial-feature contour points are detected in the face region with the highest confidence. Detecting the facial features in the highest-confidence face region gives higher robustness and a more accurate detection result. In addition, using only a subset of window sizes for feature extraction reduces the amount of computation and improves efficiency while preserving robustness and accuracy. The face makeup is stable and reliable, withstands interference in the facial-feature detection of multiple faces, and yields an accurate and natural makeup result. The system also runs fast: in practical tests on MIPS devices it provides a good user experience, overcoming the limited computing power and poorer imaging quality of handheld embedded devices, and it is equally suited to PC platforms with a more favorable environment, combining stability with efficiency and offering a wide range of applications and practical value.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (23)

1. A face detection method, comprising:
acquiring an image to be processed;
scaling the image to be processed by a first ratio to obtain a first image, wherein the first ratio is less than or equal to 1;
performing face region detection on the first image using M levels of windows of different sizes, wherein M is an integer greater than 1;
determining a confidence level of each detected face region;
scaling the image to be processed by a second ratio to obtain a second image, wherein the second ratio is less than or equal to 1 and greater than the first ratio;
determining the face region with the highest confidence level in the second image according to the position of the face region with the highest confidence level in the first image;
detecting facial feature contour points on the face region with the highest confidence level in the second image;
determining facial feature contour points in the image to be processed according to the positions of the facial feature contour points detected in the second image;
wherein the facial feature contour points comprise upper contour points and lower contour points of the contours of the facial feature objects.
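(Illustrative sketch, not part of the claim: one way to realize this two-resolution pipeline, with detect_faces and detect_contours as hypothetical stand-ins for the window-based detector and the contour-point detector.)

```python
import cv2

def detect_landmarks(image, detect_faces, detect_contours,
                     first_ratio=0.25, second_ratio=0.5):
    """Detect faces on a small first image, refine contour points on a
    larger second image, then map the coordinates back to the original.
    detect_faces(img) must return ((x, y, w, h), confidence) pairs."""
    h, w = image.shape[:2]
    first = cv2.resize(image, (int(w * first_ratio), int(h * first_ratio)))
    second = cv2.resize(image, (int(w * second_ratio), int(h * second_ratio)))

    faces = detect_faces(first)
    (x, y, bw, bh), _ = max(faces, key=lambda f: f[1])  # highest confidence

    s = second_ratio / first_ratio        # map the box into the second image
    box = (int(x * s), int(y * s), int(bw * s), int(bh * s))
    points = detect_contours(second, box)

    inv = 1.0 / second_ratio              # map points back to the original
    return [(px * inv, py * inv) for (px, py) in points]
```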
2. The method of claim 1, further comprising, before performing face region detection on the first image using the M levels of windows of different sizes:
obtaining 1st to N-th level windows according to the size of an initial window and a preset amplification factor, wherein the size of the (j+1)-th level window is obtained by enlarging the size of the j-th level window by the amplification factor, 1 ≤ j ≤ N-1, and N is an integer greater than M;
and selecting M levels of windows from the N levels of windows.
3. The method of claim 2, wherein selecting M levels of windows from the N levels of windows comprises: selecting M levels of windows from the N levels of windows at equal intervals.
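(Illustrative sketch, not part of the claim: generating the N window sizes from an initial size and an amplification factor, then keeping M of them at equal intervals; all parameter values are assumed.)

```python
def select_windows(init_w, init_h, factor, n_levels, m_levels):
    """Build N window sizes by repeated enlargement, then keep M of
    them at (roughly) equal intervals; requires n_levels > m_levels > 1."""
    windows = []
    w, h = float(init_w), float(init_h)
    for _ in range(n_levels):
        windows.append((int(round(w)), int(round(h))))
        w *= factor
        h *= factor
    step = (n_levels - 1) / (m_levels - 1)
    return [windows[int(round(i * step))] for i in range(m_levels)]

# e.g. select_windows(20, 20, 1.25, 10, 4) keeps levels 0, 3, 6 and 9
```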
4. The method of claim 1, wherein performing face region detection on the first image using M levels of windows of different sizes comprises:
for each level window among the M levels of windows of different sizes, selecting candidate regions according to that level window, and extracting features from each candidate region using the feature template corresponding to that level window;
if the result computed from the extracted features by the cascade classifier is greater than the threshold of the cascade classifier, determining the corresponding candidate region to be a face region, wherein the threshold of the cascade classifier is obtained by lowering the threshold obtained from sample training.
5. The method of claim 4, wherein said selecting candidate regions based on the level window comprises:
if a candidate region is determined to be a face region, marking the region extending (m-1) second steps beyond the candidate region in the second direction as a non-face region, and sliding the window by n first steps in the first direction to obtain a region to be selected; if the candidate region is determined to be a non-face region, sliding the window by one first step in the first direction to obtain a region to be selected; wherein m and n are integers greater than 1;
if the region to be selected is marked as a non-face region, sliding the window by one first step in the first direction to obtain a candidate region; otherwise, taking the region to be selected as the candidate region;
wherein the size of the candidate region is the same as the size of the level window; if the first direction is horizontal and the second direction is vertical, the first step is the width of the level window and the second step is the height of the level window; if the first direction is vertical and the second direction is horizontal, the first step is the height of the level window and the second step is the width of the level window.
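(Illustrative sketch, not part of the claim: a simplified version of this sliding strategy with the first direction horizontal; is_face stands in for the cascade classifier, and the non-face marking is approximated by window-aligned positions.)

```python
def slide_candidates(img_w, img_h, win_w, win_h, is_face, m=2, n=2):
    """After a face hit, mark the (m-1) window rows below it as
    non-face and jump ahead n window widths; otherwise step one width."""
    non_face = set()
    faces = []
    y = 0
    while y + win_h <= img_h:
        x = 0
        while x + win_w <= img_w:
            if (x, y) in non_face:
                x += win_w                    # skip a marked region
                continue
            if is_face(x, y):
                faces.append((x, y))
                for k in range(1, m):         # mark the rows below
                    non_face.add((x, y + k * win_h))
                x += n * win_w                # jump ahead after a hit
            else:
                x += win_w
        y += win_h
    return faces
```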
6. The method of claim 1 or 4, wherein the confidence level is determined according to the following formula:
conf = ∑_i (T_i - Tr_i)
where conf denotes the confidence of the face region, T_i denotes the result computed from the features of the face region by the i-th stage of the cascade classifier, and Tr_i denotes the threshold of the i-th stage of the cascade classifier.
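(Illustrative sketch, not part of the claim: the confidence is the accumulated margin by which each stage's score exceeds that stage's threshold.)

```python
def confidence(stage_scores, stage_thresholds):
    """conf = sum_i (T_i - Tr_i) over the cascade stages."""
    return sum(t - tr for t, tr in zip(stage_scores, stage_thresholds))

# e.g. confidence([1.2, 0.9, 1.5], [1.0, 0.8, 1.1]) is about 0.7
```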
7. The method of claim 1, wherein performing facial feature contour point detection based on the face region with the highest confidence level in the second image comprises:
determining the face inclination angle in the face region with the highest confidence level;
and determining the facial feature contour points in the face region with the highest confidence level according to the face inclination angle.
8. The method of claim 7, wherein determining the facial feature contour points in the face region with the highest confidence level according to the face inclination angle comprises:
rotating the image in the face region with the highest confidence level according to the face inclination angle to obtain a rotated face region image; or, when the face inclination angle is greater than a preset threshold, rotating the image in the face region with the highest confidence level according to the face inclination angle to obtain a rotated face region image;
detecting facial feature contour points in the rotated face region image;
and reversely rotating the coordinates of the detected facial feature contour points according to the face inclination angle to obtain the coordinates of the facial feature contour points in the face region with the highest confidence level.
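(Illustrative sketch, not part of the claim: reverse-rotating detected contour-point coordinates about an assumed rotation center.)

```python
import math

def rotate_points_back(points, angle, center):
    """Rotate contour points by -angle (radians) about center, mapping
    coordinates from the rotated face image back to the unrotated one."""
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * cos_a - dy * sin_a,
                    cy + dx * sin_a + dy * cos_a))
    return out
```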
9. The method of claim 7, wherein determining the face inclination angle from the face region with the highest confidence level comprises:
determining a binocular region in the face region with the highest confidence level;
determining the pupil positions of the two eyes according to component values of a color space within the binocular region, wherein the component values of the color space comprise luminance values and/or chrominance values;
and determining the face inclination angle according to the pupil positions of the two eyes.
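(Illustrative sketch, not part of the claim: a crude pupil locator that takes the darkest pixel of each grayscale eye region as the pupil, standing in for the claimed luminance/chrominance analysis, then derives the tilt angle.)

```python
import math
import numpy as np

def face_tilt_from_pupils(left_eye, right_eye, left_offset, right_offset):
    """left_eye/right_eye are grayscale eye-region arrays; the offsets
    are the regions' top-left corners in the full image."""
    pupils = []
    for region, (ox, oy) in ((left_eye, left_offset),
                             (right_eye, right_offset)):
        y, x = np.unravel_index(np.argmin(region), region.shape)
        pupils.append((ox + x, oy + y))       # darkest pixel as pupil
    (lx, ly), (rx, ry) = pupils
    return math.atan2(ry - ly, rx - lx)
```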
10. The method of any one of claims 1 to 5, 8, and 9, further comprising, after performing facial feature contour point detection:
determining the inclination angle of a facial feature according to the coordinates of its contour points, determining the rotation angle of the corresponding facial feature makeup template according to the inclination angle, and rotating the makeup template by the rotation angle; and/or determining the size of the facial feature according to the coordinates of its contour points, and scaling the facial feature makeup template according to the size;
and attaching the processed facial feature makeup template to the face region in the image to be processed.
11. The method of claim 10, wherein determining the inclination angle of the facial feature according to the coordinates of its contour points and determining the rotation angle of the facial feature makeup template according to the inclination angle comprise:
determining coordinates of central points of the left eye and the right eye and inclination angles of the left eye and the right eye according to the coordinates of the inner canthus and the outer canthus of the eyes;
determining the face inclination angle according to the coordinates of the central points of the left and right eyes;
determining the rotation angles of the left and right eye makeup templates according to the following formulas:
[Formulas (11) and (12), rendered in the original as images FDA0003005264020000041 and FDA0003005264020000042, give the rotation angles θ_leye and θ_reye in terms of θ_face, θ_le and θ_re; the formula images are not recoverable from this text.]
wherein θ_leye and θ_reye denote the rotation angles of the left eye makeup template and the right eye makeup template, respectively; θ_face denotes the face inclination angle; θ_le denotes the inclination angle of the left eye, and θ_re denotes the inclination angle of the right eye.
12. The method of claim 10, wherein determining the inclination angle of the facial feature according to the coordinates of its contour points and determining the rotation angle of the facial feature makeup template according to the inclination angle comprise:
determining the coordinates of the center points of the eyes according to the coordinates of the inner canthus and the outer canthus;
determining the face inclination angle according to the coordinates of the central points of the left and right eyes;
and taking the inclination angle of the human face as the rotation angle of the beard makeup template.
13. The method of claim 10, wherein determining the inclination angle of the facial feature according to the coordinates of its contour points and determining the rotation angle of the facial feature makeup template according to the inclination angle comprise: determining the inclination angle of the lips according to the coordinates of the left and right mouth corners;
wherein determining the size of the facial feature according to the coordinates of its contour points and scaling the facial feature makeup template according to the size comprises:
determining the size of the upper lip according to the coordinates of the upper-lip contour points, and scaling the upper lip makeup template according to the size of the upper lip; determining the size of the lower lip according to the coordinates of the lower-lip contour points, and scaling the lower lip makeup template according to the size of the lower lip;
and wherein attaching the processed facial feature makeup template to the face region in the image to be processed comprises:
determining the coordinates of the center points of the upper and lower lips according to the coordinates of the extreme points of their upper and lower contours, respectively;
attaching the rotated and/or scaled upper lip makeup template to the image to be processed according to the coordinates of the center point of the upper lip; and attaching the rotated and/or scaled lower lip makeup template to the image to be processed according to the coordinates of the center point of the lower lip.
14. A face detection apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
the first detection module is used for scaling the image to be processed by a first ratio to obtain a first image, the first ratio being less than or equal to 1, and for performing face region detection on the first image using M levels of windows of different sizes, wherein M is an integer greater than 1;
a determining module for determining a confidence level of each detected face region;
the second detection module is used for scaling the image to be processed by a second ratio to obtain a second image, wherein the second ratio is less than or equal to 1 and greater than the first ratio; determining the face region with the highest confidence level in the second image according to the position of the face region with the highest confidence level in the first image; detecting facial feature contour points on the face region with the highest confidence level in the second image; and determining facial feature contour points in the image to be processed according to the positions of the facial feature contour points detected in the second image;
wherein the facial feature contour points comprise upper contour points and lower contour points of the contours of the facial feature objects.
15. The apparatus of claim 14, further comprising:
the selection module is used for obtaining 1st to N-th level windows according to the size of an initial window and a preset amplification factor, wherein the size of the (j+1)-th level window is obtained by enlarging the size of the j-th level window by the amplification factor, 1 ≤ j ≤ N-1, and N is an integer greater than M; and for selecting M levels of windows from the N levels of windows.
16. The apparatus according to claim 15, wherein the selection module is specifically configured to select M levels of windows from the N levels of windows at equal intervals.
17. The apparatus of claim 14, wherein the first detection module is specifically configured to:
when performing face region detection on the first image using the M levels of windows of different sizes, for each level window, selecting candidate regions according to that level window and extracting features from each candidate region using the feature template corresponding to that level window;
if the result computed from the extracted features by the cascade classifier is greater than the threshold of the cascade classifier, determining the corresponding candidate region to be a face region, wherein the threshold of the cascade classifier is obtained by lowering the threshold obtained from sample training.
18. The apparatus of claim 17, wherein the first detection module is specifically configured to:
if a candidate region is determined to be a face region, marking the region extending (m-1) second steps beyond the candidate region in the second direction as a non-face region, and sliding the window by n first steps in the first direction to obtain a region to be selected; if the candidate region is determined to be a non-face region, sliding the window by one first step in the first direction to obtain a region to be selected; wherein m and n are integers greater than 1;
if the region to be selected is marked as a non-face region, sliding the window by one first step in the first direction to obtain a candidate region; otherwise, taking the region to be selected as the candidate region;
wherein the size of the candidate region is the same as the size of the level window; if the first direction is horizontal and the second direction is vertical, the first step is the width of the level window and the second step is the height of the level window; if the first direction is vertical and the second direction is horizontal, the first step is the height of the level window and the second step is the width of the level window.
19. The apparatus of claim 14 or 17, wherein the determining module is specifically configured to:
determining the confidence of the face region according to the following formula:
conf = ∑_i (T_i - Tr_i)
where conf denotes the confidence of the face region, T_i denotes the result computed from the features of the face region by the i-th stage of the cascade classifier, and Tr_i denotes the threshold of the i-th stage of the cascade classifier.
20. The apparatus of claim 14, wherein the second detection module is specifically configured to:
determining the face inclination angle in the face region with the highest confidence level;
and determining the facial feature contour points in the face region with the highest confidence level according to the face inclination angle.
21. The apparatus of claim 20, wherein the second detection module is specifically configured to:
rotating the image in the face region with the highest confidence level according to the face inclination angle to obtain a rotated face region image; or, when the face inclination angle is greater than a preset threshold, rotating the image in the face region with the highest confidence level according to the face inclination angle to obtain a rotated face region image;
detecting facial feature contour points in the rotated face region image;
and reversely rotating the coordinates of the detected facial feature contour points according to the face inclination angle to obtain the coordinates of the facial feature contour points in the face region with the highest confidence level.
22. The apparatus of claim 20, wherein the second detection module is specifically configured to:
determining a binocular region in the face region with the highest confidence level;
determining the pupil positions of the two eyes according to component values of a color space within the binocular region, wherein the component values of the color space comprise luminance values and/or chrominance values;
and determining the face inclination angle according to the pupil positions of the two eyes.
23. The apparatus of any of claims 14 to 18, 20 to 22, further comprising:
the processing module is used for determining the inclination angle of a facial feature according to the coordinates of its contour points, determining the rotation angle of the corresponding facial feature makeup template according to the inclination angle, and rotating the makeup template by the rotation angle; and/or determining the size of the facial feature according to the coordinates of its contour points and scaling the facial feature makeup template according to the size;
and the fitting module is used for attaching the processed facial feature makeup template to the face region in the image to be processed.
CN201610120358.7A 2016-03-03 2016-03-03 Face detection method and device Active CN107153806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610120358.7A CN107153806B (en) 2016-03-03 2016-03-03 Face detection method and device

Publications (2)

Publication Number Publication Date
CN107153806A CN107153806A (en) 2017-09-12
CN107153806B true CN107153806B (en) 2021-06-01

Family

ID=59792447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610120358.7A Active CN107153806B (en) 2016-03-03 2016-03-03 Face detection method and device

Country Status (1)

Country Link
CN (1) CN107153806B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197593B (en) * 2018-01-23 2022-02-18 深圳极视角科技有限公司 Multi-size facial expression recognition method and device based on three-point positioning method
CN109657587A (en) * 2018-12-10 2019-04-19 南京甄视智能科技有限公司 Side face method for evaluating quality and system for recognition of face
CN111523414B (en) * 2020-04-13 2023-10-24 绍兴埃瓦科技有限公司 Face recognition method, device, computer equipment and storage medium
CN116071804A (en) * 2023-01-18 2023-05-05 北京六律科技有限责任公司 Face recognition method and device and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731416A (en) * 2005-08-04 2006-02-08 上海交通大学 Method of quick and accurate human face feature point positioning
CN102708575A (en) * 2012-05-17 2012-10-03 彭强 Daily makeup design method and system based on face feature region recognition
CN102750527A (en) * 2012-06-26 2012-10-24 浙江捷尚视觉科技有限公司 Long-time stable human face detection and tracking method in bank scene and long-time stable human face detection and tracking device in bank scene
CN103049733A (en) * 2011-10-11 2013-04-17 株式会社理光 Human face detection method and human face detection equipment
CN103793693A (en) * 2014-02-08 2014-05-14 厦门美图网科技有限公司 Method for detecting face turning and facial form optimizing method with method for detecting face turning
CN103902978A (en) * 2014-04-01 2014-07-02 浙江大学 Face detection and identification method
CN104408462A (en) * 2014-09-22 2015-03-11 广东工业大学 Quick positioning method of facial feature points
CN105046231A (en) * 2015-07-27 2015-11-11 小米科技有限责任公司 Face detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012181688A (en) * 2011-03-01 2012-09-20 Sony Corp Information processing device, information processing method, information processing system, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519085 High-tech Zone, Tangjiawan Town, Zhuhai City, Guangdong Province

Applicant after: ACTIONS TECHNOLOGY Co.,Ltd.

Address before: 519085 High-tech Zone, Tangjiawan Town, Zhuhai City, Guangdong Province

Applicant before: ACTIONS (ZHUHAI) TECHNOLOGY Co.,Ltd.

GR01 Patent grant