CN111539911B - Mouth breathing face recognition method, device and storage medium - Google Patents
Mouth breathing face recognition method, device and storage medium
- Publication number
- CN111539911B (application CN202010209044.0A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- preset
- target
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a mouth breathing face recognition method, device, and storage medium. The method comprises the following steps: collecting an effective face image; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement features of the target face from the effective face image; and inputting the structural measurement features of the target face into a pre-trained mouth breathing face recognition model to obtain the mouth breathing face recognition result output by the model. Image augmentation processing is performed on preset positive sample images, and the mouth breathing face recognition model is trained using the augmented positive sample images and preset negative sample images. The invention uses image processing technology together with a pre-trained mouth breathing face recognition model to recognize whether the target face is a mouth breathing face, thereby addressing the high cost that the existing mouth breathing face diagnosis process imposes on both suspected patients and doctors.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a mouth breathing face recognition method, device, and storage medium.
Background
The mouth breathing face is also clinically known as the adenoid face. It results from changes in dentognathic development caused by adenoid hypertrophy, which form a specific dentomaxillofacial deformity and give the face an appearance different from that of the general population. People with adenoid hypertrophy usually develop the habit of breathing through the oral cavity rather than the nasal cavity, and this breathing habit produces common facial characteristics such as a flat nose, short lips, a retruded chin, and a forward-tilting head.
People with a mouth breathing face tend to suffer from sleep problems, such as difficulty falling asleep at night and severe snoring, and the mouth breathing face seriously affects personal appearance, so early discovery and early treatment (generally orthodontic or related corrective treatment) are very necessary.
However, the existing mouth breathing face diagnosis process is relatively costly for both suspected patients and doctors. Suspected patients need to be examined in hospitals with professional equipment, which entails high time and financial costs. When identifying a mouth breathing face, a doctor can only make the determination after analyzing, with the help of professional equipment, the adenoids between the oral cavity and the nasal cavity and the jaw bones, which is also time-consuming.
Disclosure of Invention
The main purpose of the embodiments of the present invention is to provide a mouth breathing face recognition method, device, and storage medium, so as to solve the problem that the existing mouth breathing face diagnosis process is costly for both suspected patients and doctors.
In view of the above technical problems, the embodiments of the present invention are implemented by the following technical solutions:
an embodiment of the invention provides a mouth breathing face recognition method, which comprises the following steps: collecting an effective face image; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement features of the target face from the effective face image; and inputting the structural measurement features of the target face into a pre-trained mouth breathing face recognition model to obtain the mouth breathing face recognition result output by the model; wherein the mouth breathing face recognition model is trained using positive sample images subjected to image augmentation processing and preset negative sample images.
Wherein, the collecting of the effective face image comprises: collecting an environment image of a user; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is a valid face image; and if the average brightness value of the user environment image is not in the brightness value range, or a human face is not detected in the user environment image, carrying out re-acquisition prompting.
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
Wherein determining the attitude angle of the target face in the effective face image comprises: marking marker points for the target face in the effective face image; acquiring a preset three-dimensional human head portrait model, wherein the face of the three-dimensional human head portrait model is marked with marker points, and the number and types of the marker points marked on the face of the three-dimensional human head portrait model are the same as those marked on the target face; and determining the attitude angle of the target face according to the marker points of the three-dimensional human head portrait model and the marker points of the target face in the effective face image.
Wherein extracting the structural measurement features of the target face from the effective face image comprises: marking marker points for the target face in the effective face image; extracting the face structure key points of the target face according to the marker points of the target face; and extracting the structural measurement features corresponding to the target face according to the face structure key points of the target face.
Wherein performing data augmentation processing on a preset positive sample image comprises: extracting the structural measurement features of the human face from the positive sample image; and adding Gaussian noise to each dimension of the structural measurement features of the human face to obtain a new positive sample image.
Wherein, if the mouth breathing face recognition model is an XGBoost model, before inputting the structural measurement features of the target face into the pre-trained mouth breathing face recognition model, the method further comprises: training the mouth breathing face recognition model according to a preset data set, the data set comprising a training data set and a verification data set, wherein the training data set comprises preset negative sample images and positive sample images subjected to image augmentation processing. Training the mouth breathing face recognition model according to the preset data set comprises the following steps: step 2, setting an initial value of the maximum tree depth of the CART (classification and regression) trees in the XGBoost model; step 4, training the structure and weights of the XGBoost model using the preset training data set; step 6, verifying the trained structure and weights of the XGBoost model using the preset verification data set, and adjusting the maximum tree depth for the current round according to the verification result; and step 8, determining, using a preset grid search algorithm, whether the maximum tree depth adjusted in the previous round is the optimal maximum tree depth; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise, jumping to step 4.
Wherein the data set further comprises a test data set; after setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, the method further comprises: testing the XGBoost model set to the optimal maximum tree depth using the preset test data set, and determining a performance metric value of the XGBoost model; and ending the training of the XGBoost model if the performance metric value of the XGBoost model is within a preset performance range.
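As a minimal, hedged sketch of the training and testing procedure described above (assuming the open-source xgboost and scikit-learn Python packages, feature matrices X_train/X_val/X_test with 0/1 labels already prepared from the data set, and illustrative hyper-parameter values):

```python
# Sketch: select the CART maximum tree depth by grid search on the verification
# set, then measure performance on the test set. Labels: 1 = mouth breathing
# face, 0 = not a mouth breathing face. Not the patent's exact procedure.
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

def train_mouth_breathing_model(X_train, y_train, X_val, y_val, X_test, y_test,
                                depth_grid=(2, 3, 4, 5, 6, 8)):
    best_depth, best_val_acc = None, -1.0
    for depth in depth_grid:                          # grid search over max tree depth
        model = XGBClassifier(max_depth=depth, n_estimators=200,
                              learning_rate=0.1, eval_metric="logloss")
        model.fit(X_train, y_train)                   # train structure and weights
        val_acc = accuracy_score(y_val, model.predict(X_val))
        if val_acc > best_val_acc:
            best_depth, best_val_acc = depth, val_acc
    # retrain with the optimal maximum tree depth, then evaluate on the test set
    final = XGBClassifier(max_depth=best_depth, n_estimators=200,
                          learning_rate=0.1, eval_metric="logloss")
    final.fit(X_train, y_train)
    test_acc = accuracy_score(y_test, final.predict(X_test))
    return final, best_depth, test_acc
```

If the resulting test accuracy falls within the preset performance range, training would be considered finished, per the paragraph above.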
The embodiment of the invention also provides a mouth breathing face recognition device, which comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the mouth breathing face recognition method as set forth in any one of the above.
The embodiment of the invention also provides a storage medium, wherein the storage medium is stored with a mouth breathing face recognition program, and the mouth breathing face recognition program is executed by a processor to realize the mouth breathing face recognition method.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, image processing technology is used to collect effective face images, the structural measurement features of the valid target face in the effective face image are extracted, and a pre-trained mouth breathing face recognition model is used to identify whether the target face is a mouth breathing face. Further, before the model is trained, positive sample images are collected and subjected to augmentation processing, which increases the number of positive sample images and improves the recognition accuracy of the trained mouth breathing face recognition model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of mouth-breathing face recognition according to an embodiment of the invention;
FIG. 2 is a flowchart of the steps for acquiring a valid face image according to one embodiment of the present invention;
FIG. 3 is a flowchart of the steps of an image enhancement process according to one embodiment of the invention;
FIG. 4 is a flowchart of the steps of attitude angle determination, according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of coordinate system conversion according to an embodiment of the present invention;
FIG. 6 is a flowchart of the steps of structure metric feature extraction, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a marker according to an embodiment of the present invention;
FIG. 8 is a schematic view of point A and the supramental point according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of key points of a facial structure according to an embodiment of the present invention;
FIG. 10 is a flowchart of the steps of an image augmentation process according to one embodiment of the present invention;
FIG. 11 is a flowchart of the training steps of a mouth breathing face recognition model according to one embodiment of the invention;
FIG. 12 is a block diagram of a mouth breathing face recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
According to an embodiment of the invention, a mouth breathing face recognition method is provided. Fig. 1 is a flowchart illustrating a mouth-breathing face recognition method according to an embodiment of the invention.
Step S110, collecting effective face images.
The valid face image refers to an image which contains a face and has an average brightness value within a preset average brightness value range.
Step S120, determining the attitude angle of the target face in the effective face image.
The target face refers to the face of the user for whom mouth breathing face recognition is to be performed.
The pose angles (θ, ψ, φ) of the target face include: the pitch angle θ, the yaw angle ψ, and the rotation angle φ.
In the present embodiment, the pose angle of the target face is determined from the face image of the target face.
The effective face image may include a plurality of faces, and one face is selected from the effective face image as a target face.
Step S130, if the pose angle of the target face is within a preset pose angle range, extracting a structural metric feature of the target face from the effective face image.
The structural measurement feature refers to a structural feature of the human face; together the structural measurement features form a multi-dimensional feature vector, for example the sizes and angles of the facial features. The structural measurement features can be used to predict whether the target face is a mouth breathing face.
If the attitude angle of the target face is within the preset attitude angle range, the target face is essentially a frontal face. For example, the pitch angle range can be set to [-25°, 25°], the yaw angle range to [-25°, 25°], and the rotation angle range to [-35°, 35°]. When θ = 0, ψ = 0, and φ = 0, the current target face is a standard frontal face. The attitude angle of the target face being within the attitude angle range means that the pitch angle of the target face is within the pitch angle range, the yaw angle is within the yaw angle range, and the rotation angle is within the rotation angle range; in that case the target face is judged to be valid.
If the attitude angle of the target face is not within the preset attitude angle range, the target face is not a frontal face, and a re-acquisition prompt is issued so that the user captures the user environment image again. Specifically, the attitude angle of the target face is compared with the preset attitude angle range; if it exceeds the range, the target face is invalid, and a re-acquisition prompt is sent to the user to prompt the user to upload an image containing a frontal face.
Screening for frontal faces before performing mouth breathing face recognition improves the recognition accuracy. When the face is not fully visible, too much facial information is lost and the resulting mouth breathing face recognition result is inaccurate.
Step S140, inputting the structural measurement features of the target face into a pre-trained mouth breathing face recognition model, and obtaining the mouth breathing face recognition result output by the mouth breathing face recognition model; wherein the mouth breathing face recognition model is trained using positive sample images subjected to image augmentation processing and preset negative sample images.
The mouth breathing face recognition model is used to recognize whether the target face is a mouth breathing face according to the structural measurement features of the target face. The recognition result output by the mouth breathing face recognition model consists of the percentage (probability) that the target face is a mouth breathing face and the percentage that it is not.
The mouth breathing face recognition model includes but is not limited to: an eXtreme Gradient Boosting (XGBoost) model, a linear regression model, a Support Vector Machine (SVM) model, or a deep learning network.
Before mouth breathing face recognition is carried out by using the mouth breathing face recognition model, the mouth breathing face recognition model is trained by using a preset data set until the mouth breathing face recognition model converges.
Specifically, before the mouth breathing face recognition model is trained, a plurality of positive sample images and a plurality of negative sample images are acquired. A positive sample image is an image containing a face with the mouth breathing face appearance; a negative sample image is an image containing a face without it. Image augmentation processing is performed on each positive sample image as follows: the structural measurement features of the face are extracted from the positive sample image, and Gaussian noise is added to each dimension of the structural measurement features to obtain a new positive sample based on that image. The Gaussian noise added to each dimension is limited to a small amplitude scale. The augmentation is performed multiple times per positive sample image, yielding multiple new positive samples from each original. A first label is attached to each augmented positive sample image and a second label to each negative sample image, where the first label indicates that the face in the image is a mouth breathing face and the second label indicates that it is not. The data set is then formed from all the augmented positive sample images and the collected negative sample images.
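A minimal sketch of the augmentation step described above, assuming the structural measurement features of a positive sample have already been extracted into a NumPy vector; the noise scale and number of copies are illustrative assumptions, since the text only specifies Gaussian noise of small amplitude:

```python
import numpy as np

def augment_positive_sample(features, n_copies=10, noise_scale=0.01, seed=None):
    """Generate new positive samples by adding small-amplitude Gaussian noise
    to every dimension of a structural measurement feature vector."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # one row per augmented copy; each dimension gets independent Gaussian noise
    # whose scale is proportional to that dimension's magnitude (an assumption)
    noise = rng.normal(loc=0.0,
                       scale=noise_scale * np.abs(features) + 1e-8,
                       size=(n_copies, features.size))
    return features + noise        # shape (n_copies, feature_dim)
```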
In this embodiment, image processing technology is used to collect effective face images, the structural measurement features of the valid target face are extracted from the effective face image, and the pre-trained mouth breathing face recognition model is used to identify whether the target face is a mouth breathing face. Further, before the model is trained, positive sample images are collected and augmented, which increases the number of positive sample images and improves the recognition accuracy of the trained mouth breathing face recognition model.
The following is a detailed description of the steps for acquiring valid face images.
Fig. 2 is a flowchart illustrating steps of acquiring a valid face image according to an embodiment of the present invention.
Step S210, acquiring an environment image of the user.
The user environment image refers to an image in the camera view field acquired by the camera.
The user environment image can be acquired by calling a camera of the user equipment or of a dedicated face acquisition device, or by receiving an image uploaded by the user. For example, the user environment image may be collected in real time by the user equipment, or the user may be prompted to upload one.
One or more faces may be included in the user environment image. Of course, the user environment image may not include any human face.
Step S220, determining an average brightness value of the user environment image.
In this embodiment, the user environment image may be denoted I(x, y), with width w and height h, where x ∈ [0, w] and y ∈ [0, h]. I_xy denotes the brightness value of the pixel at position (x, y) in the user environment image, with I_xy ∈ [0, 255].
The average brightness value of the user environment image is calculated as Ī = (1 / (w × h)) × Σ_x Σ_y I_xy, i.e. the sum of the brightness values of all pixels divided by the number of pixels.
further, if the user environment image is a color image, Ixy=[IR,IG,IB]Wherein, IR,IGAnd IBThe luminance values of the three channels of red, yellow and blue, respectively, the average luminance value of the user environment image may be replaced with the average of the luminance means of the three channels, that is: the average brightness value of the user environment image is (the brightness mean value of the red channel + the brightness mean value of the yellow channel + the brightness mean value of the blue channel) ÷ 3, and the brightness mean value is the sum of the brightness values of all the pixels ÷ the number of all the pixels.
Step S230, determining whether the average brightness value of the user environment image is within a preset brightness value range; if yes, go to step S240; if not, go to step S270.
A brightness value range [I_0, I_1] is preset. The endpoints I_0 and I_1 of the brightness value range may be empirical values or values obtained by experiment. When the average brightness value falls below I_0, the user environment image is too dark; when it exceeds I_1, the user environment image is too bright.
In this embodiment, in order to reduce the number of times the user environment image must be re-acquired, relatively extreme situations are simulated in advance: for example, the average brightness value of a user environment image captured at night, and of one captured with a high-power light source shining directly on the face. The night-time average brightness value is used as the lower limit I_0 of the brightness value range, and the direct-illumination average brightness value as the upper limit I_1. Further, the lower limit I_0 and the upper limit I_1 may be set to 25 and 230 respectively. Images taken in everyday situations rarely exhibit such extreme average brightness values; once an extreme value does occur, the image is essentially unusable and needs to be discarded, and a preset rejection operation may be performed. The rejection may take the form of a re-acquisition prompt. Checking the brightness of the user environment image in this way improves the accuracy of the subsequent face detection.
Step S240, if the average brightness value of the user environment image is within the brightness value range, performing face detection on the user environment image.
The manner of performing face detection on the user environment image will be described in detail later.
Step S250, judging whether a human face is detected in the user environment image; if yes, go to step S260; if not, step S270 is executed.
Step S260, if a face is detected in the user environment image, determining that the user environment image is a valid face image.
After a face is detected in the user environment image, a face region is identified in the user environment image, and the identified face region is taken as a face image.
In this embodiment, a face detection frame may be used to identify an area where a face is located in the user environment image. And if a plurality of faces are detected in the user environment image, respectively identifying the area of each detected face by using a plurality of face detection frames.
Step S270, if the average brightness value of the user environment image is not within the brightness value range, or a human face is not detected in the user environment image, performing a re-acquisition prompt.
In this embodiment, before performing face detection on the user environment image, in order to ensure that the user environment image has good contrast, image enhancement processing may be performed on the user environment image.
The contrast of the user environment image is a measure of the difference in brightness between the brightest and darkest regions of the image: the larger the brightness difference, the higher the contrast.
In this embodiment, the image enhancement processing method includes, but is not limited to: gamma transformation and logarithmic transformation. The following describes image enhancement processing performed on an environment image of a user with a small contrast.
FIG. 3 is a flowchart illustrating steps of an image enhancement process according to an embodiment of the present invention.
In step S310, an image brightness standard deviation of the user environment image is determined.
In order to determine whether the user environment image needs to be subjected to the image enhancement operation, an image brightness standard deviation σ of the user environment image may be calculated, and the image brightness standard deviation σ may be referred to as root-mean-square contrast.
In the present embodiment, the image brightness standard deviation σ is calculated as σ = sqrt( (1 / (w × h)) × Σ_x Σ_y (I_xy − Ī)² ), where Ī is the average brightness value of the image.
the greater the contrast of the user environment image is, the greater the image brightness standard deviation sigma is; the smaller the contrast of the user environment image, the smaller the image brightness standard deviation σ.
Step S320, if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold, performing image enhancement processing on the user environment image by using a gamma conversion algorithm.
For a user environment image with low contrast, the gamma transformation algorithm can be used for image enhancement. In its standard form the gamma transformation is O(x, y) = 255 × (I(x, y) / 255)^γ, where I(x, y) is the user environment image before enhancement, O(x, y) is the image after enhancement, and γ > 0 is the control parameter (brightness values are normalized to [0, 1] before exponentiation and rescaled to [0, 255] afterwards). That is, for each pixel of the user environment image the operation O_xy = 255 × (I_xy / 255)^γ is performed, where O_xy is the brightness value of the pixel after image enhancement.
When γ is greater than 1, the user environment image becomes dark as a whole, which stretches the region of higher brightness in the image while compressing the portion of lower brightness.
When γ is equal to 1, the user environment image has no change.
When γ is larger than 0 and smaller than 1, the user environment image becomes brighter as a whole, which stretches the area of lower brightness in the image and compresses the portion of higher brightness.
In this embodiment, γ is chosen on the basis of the average brightness value of the user environment image. The optimal average brightness range of the user environment image is 165-175, and 170 may be taken as the average brightness threshold.
γ is given by an empirical formula of the average brightness value, chosen so as to satisfy the following properties:
when in useWhen gamma is equal to 1, the user environment image has no change; when in useWhen going to 0, γ goes to 0, the user environment image becomes bright as a whole, and the contrast increases; when in useWhen the trend goes to 255, gamma tends to be infinite, and the user environment imageThe whole becomes dark and the contrast becomes large.
After the image enhancement processing is performed on the user environment image, denoising processing may be further performed on the user environment image after the image enhancement processing.
After the image enhancement processing is performed on the user environment image, face detection can be performed on the user environment image. Face detection is further described below.
Face detection can be performed with a sliding-window method. Specifically, a sliding window moves across the user environment image in preset steps, and a classifier examines the image region inside the window on the basis of the outer contour of a face; when a shape matching the outer face contour exists in the region, the region is classified as a face, meaning a face has been detected.
The sliding window may be regarded as the face detection box. Since faces vary in size, the sliding window is scaled so as to match faces of different sizes. During sliding-window detection, a face detection method based on the Histogram of Oriented Gradients (HOG) may be used to detect faces in the user environment image; a face detection method based on Haar-like features may also be used.
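For illustration, dlib's default frontal face detector is a HOG-based sliding-window detector of the kind described above; a minimal usage sketch, assuming the dlib and OpenCV Python packages:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()      # HOG + sliding-window detector

def detect_faces_hog(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # the second argument upsamples the image once so smaller faces are found
    rects = detector(gray, 1)
    return [(r.left(), r.top(), r.right(), r.bottom()) for r in rects]
```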
Of course, since the human face has its special structural and textural features, the embodiment of the present invention may also use a deep neural network to detect the human face in the user environment image.
The categories of deep neural networks include, but are not limited to: the Multi-Task Cascaded Convolutional Neural Network (MTCNN) and MobileNet-SSD.
In the embodiment of the present invention, the MTCNN may be used to perform face detection on an input user environment image. The MTCNN may detect a face in the user environment image and identify an area in which the detected face is located using a face detection frame in the user environment image.
The MTCNN is a face detection deep learning model based on a multi-task cascade of CNNs; the model jointly considers face bounding-box regression and face key point (marker point) detection. The user environment image input to the MTCNN is scaled to different sizes according to different scaling ratios, forming an image pyramid so that faces of different sizes can be detected. The MTCNN comprises three cascaded sub-networks, called PNet, RNet, and ONet. For each scale of the user environment image, PNet, RNet, and ONet are respectively used as follows:
the PNet generates a regression vector of a candidate window and a bounding box for marking a face region according to an input user environment image; calibrating the generated candidate window by using the regression vector of the bounding box; and performing first deduplication processing on the calibrated candidate frame opening through a first Non-maximum suppression (NMS) algorithm to obtain a PNet deduplication candidate window.
RNet first calibrates the PNet-deduplicated candidate windows using the bounding-box regression vectors, and then applies a second deduplication pass with a second NMS step to obtain the RNet-deduplicated candidate windows. In this way the PNet-deduplicated candidate windows are screened further.
ONet works similarly to RNet: it first calibrates the RNet-deduplicated candidate windows using the bounding-box regression vectors, then applies a third deduplication pass with a third NMS step, removing overlapping candidate windows while also generating the positions of five marker points. In this way, while ONet further screens the RNet-deduplicated candidate windows, five marker points are detected on the face framed by each remaining window. Marker points are feature points marked at preset positions of the face; the five marker points are located on the two pupils, the nose, and the two corners of the mouth.
The overlap thresholds (Intersection over Union, IoU) set for the first, second, and third NMS steps differ, decreasing from the first to the third, so that PNet, RNet, and ONet deduplicate the candidate windows in a coarse-to-fine manner.
Since the user environment image input to the MTCNN is scaled at different ratios to form an image pyramid (images at multiple scales), and PNet, RNet, and ONet perform face detection on the image at each scale, all candidate windows must be normalized back to the original image size after detection. For example, if a scaled image is twice the original size, the candidate windows found on it must be divided by 2 when mapped back to the original size. Normalizing the candidate windows from all scales to the original scale makes them comparable.
In the present embodiment, before faces are detected in user environment images with a deep neural network, the MTCNN face detection network needs to be trained. The training of the MTCNN includes: pre-training the MTCNN on an open-source face data set to pre-train its weights; and retraining the MTCNN on a pre-collected directional face data set to fine-tune the weights, so that the MTCNN better detects face images whose face-type distribution resembles that of the directional face data set. Face types include, but are not limited to: the age group, gender, and skin color of the face.
Open-source face data sets include, but are not limited to, VGG-Face and FDDB. Such open-source data sets cover a very wide range of faces of all ethnicities, predominantly Caucasian faces, but are not tailored to the target application. The directional face data set consists of face images of a preset face type collected according to the characteristics of the application scenario; for example, the images in the directional face data set may be dominated by Asian faces.
Whether for pre-training or fine-tuning, the face images of the face data set (the open-source face data set or the directional face data set) are input to the MTCNN, the MTCNN detects faces in each image, and the detection result is compared with the result pre-labeled for that image. If the MTCNN's detection result matches the pre-labeled result, the network has classified (i.e. recognized) the sample correctly. When the recognition accuracy of the MTCNN no longer improves, the MTCNN is considered to have converged. The recognition accuracy is the number of correct recognitions ÷ (the number of correct recognitions + the number of incorrect recognitions).
After the MTCNN converges, the MTCNN may perform face detection on the user environment image after image enhancement.
The user environment image is input to the trained MTCNN; it may or may not contain a face. When the user environment image contains no face, the MTCNN output is empty; when it contains a face, the MTCNN outputs the user environment image with a face detection frame identifying the face region. When one face is detected, it is framed by one face detection frame; when several faces are detected, each face is framed by its own face detection frame.
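A minimal usage sketch, assuming the open-source mtcnn Python package (one of several MTCNN implementations); it returns, for each detected face, a detection box and the five marker points described above:

```python
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def detect_faces_mtcnn(image_path):
    # MTCNN expects an RGB image; OpenCV loads BGR, so convert first
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    results = detector.detect_faces(img)            # empty list when no face is found
    boxes = [r['box'] for r in results]             # face detection frames (x, y, w, h)
    keypoints = [r['keypoints'] for r in results]   # eyes, nose, mouth corners
    return boxes, keypoints
```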
If the human face is detected in the user environment image and the average brightness value of the user environment image is within the brightness value range, the user environment image is determined to be an effective human face image, and then the attitude angle of the target human face in the effective human face image can be determined.
Fig. 4 is a flowchart illustrating the steps of determining the attitude angle according to an embodiment of the present invention.
Step S410, marking marker points for the target face in the effective face image.
The posture of the face comprises the pitch angle (raising or lowering the head in three-dimensional space), the yaw angle (turning the face to the left or right), and the rotation angle (rotating the face clockwise or counterclockwise in the image plane). Estimating the attitude angle of the target face depends on the marker points of each part of the target face: the more numerous and the finer the marker points, the more accurate the estimated attitude angle.
In this embodiment, when determining the pose angle of the target face, the target face in the effective face image may be marked with 5 marker points, either the 5 marker points output by the MTCNN or those of the 5-point landmark model in the open-source machine learning library dlib. Of course, to improve the accuracy of the pose estimation, the 68-point landmark model in dlib may also be used, i.e. 68 marker points are marked on the target face.
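A minimal sketch of marking the 68 points with dlib, assuming its pre-trained shape_predictor_68_face_landmarks.dat model file is available locally:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# pre-trained 68-point landmark model file from dlib (local path is an assumption)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mark_68_points(image_path):
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])        # landmarks for the first (target) face
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```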
Step S420, acquiring a preset three-dimensional human head portrait model; wherein, the face of the three-dimensional human head portrait model is marked with mark points, and the number and the types of the mark points marked on the face of the three-dimensional human head portrait model are the same as those of the mark points marked on the target human face.
The type of the mark point can reflect the position of the mark point on the face. Therefore, each mark point marked on the target human face has a corresponding mark point at the corresponding position of the face of the three-dimensional human head portrait model.
If the face of the three-dimensional human head portrait model is marked with 5 marking points, marking of the 5 marking points can be carried out aiming at the target face; if 68 marking points are marked on the face of the three-dimensional human head portrait model, marking the 68 marking points aiming at the target human face.
Step S430, determining the attitude angle of the target face according to the marker points of the three-dimensional human head portrait model and the marker points of the target face in the effective face image.
The three-dimensional human head portrait model is rotated in the three directions until the N marker points of the target face coincide (or approximately coincide) with the N marker points of the model; the posture of the three-dimensional human head portrait model is then the posture of the target face.
In this way, the pose angle estimation problem of the target face can be converted into the following optimization problem:
the attitude angle of the three-dimensional human head portrait model is assumed to be (theta, psi, phi), and the attitude angle, the deflection angle and the rotation angle are correspondingly arranged in sequence. As shown in fig. 5, with the camera (camera) parameters fixed, the rotation matrix R and translation vector t from the world coordinate system to the camera coordinate system are solved. The world coordinate system is a three-dimensional coordinate system where the three-dimensional human head portrait model is located, and the camera coordinate system is a plane coordinate system (two-dimensional coordinate system) where the target human face in the effective human face image is located.
After the rotation matrix R and the translation vector t are obtained, they are converted to Euler angles to obtain the pitch angle, yaw angle, and rotation angle of the target face.
Specifically, after N marker points are marked on the target face, each marker point on the target face is the projection of one marker point on the face of the three-dimensional human head portrait model. Let the three-dimensional coordinate of a marker point P on the model face be P_i, let its imaged (two-dimensional) coordinate on the target face plane be f(P_i; R, t), and let the two-dimensional coordinate of the real projection point p be p_i. To obtain the rotation matrix R and the translation vector t, only the following minimum projection mean-square-error problem needs to be solved.
The minimum projection mean square error may be expressed as (R*, t*) = argmin over (R, t) of Σ_i ‖ f(P_i; R, t) − p_i ‖².
thus, the minimum projection mean square error can be approximately solved by a Levenberg-Marquardt optimization method, and the optimization method has the following idea: and (3) slightly adjusting the three-dimensional human head portrait model to obtain the coordinates of the mark points on the three-dimensional human head portrait model projected on an image plane (the plane where the target human face is located) until the projected mean square error reaches a minimum value. In actual engineering application, a coordinate set of a mark point on the face of a three-dimensional human head portrait model on an image plane is obtained through a standard camera, then internal parameters (initial R and t) of the camera and the focal length of the camera are calibrated, and then functions such as solvePp and the like are called by using an open-source computer vision library OpenCV to complete posture estimation of a target face.
After the attitude angle of the target face is obtained, it is compared with the preset attitude angle range. If the attitude angle of the target face is within the preset range, the target face is considered valid; the target face in the effective face image can then be cropped so that only the face region of the target face is retained, giving a face image of the target face from which the structural measurement features are extracted.
In this embodiment, before the structural measurement features of the target face are extracted, a face alignment operation is performed on the target face: attitude angle compensation is applied through an affine transformation so that the face is transformed into a frontal or approximately frontal face. This operation is called face alignment, and after it the structural measurement features of the target face can be extracted.
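As an illustration of the alignment idea (not the patent's exact affine compensation), a face can be brought closer to a level, frontal orientation by rotating the image so the line between the eye centers becomes horizontal:

```python
import cv2
import numpy as np

def align_face(image, left_eye, right_eye):
    """Rotate the image so the eye line becomes horizontal (a simple stand-in
    for the affine pose-compensation step)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # in-plane rotation angle
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)    # 2x3 affine matrix
    h, w = image.shape[:2]
    return cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
```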
Fig. 6 is a flowchart illustrating the steps of structure metric feature extraction according to an embodiment of the present invention.
Step S610, marking marker points for the target face in the effective face image.
Step S620, extracting the face structure key points of the target face according to the marker points of the target face.
Step S630, extracting the structural measurement characteristics corresponding to the target face according to the face structural key points of the target face.
Face structure key points are marker points used to locate the facial structure, including but not limited to marker points locating key regions such as the eyebrows, eyes, nose, mouth, and face contour.
A structural measurement feature is a structural feature of the human face; together the structural measurement features form a multi-dimensional feature vector, for example the sizes and angles of the facial features.
Specifically, marking the face area of the target face here is similar to the marking performed when determining the pose angle, but in order to better capture the structural information of the target face, this embodiment uses the 68-point landmark model in dlib; the 68 marker points can outline each part of the target face, for example the eyebrow shape, the eyes, the nose, the mouth, and the face contour. If 68 marker points were already marked on the target face in the earlier steps, those existing marker points can be reused here.
Further, since the embodiment of the present invention needs to identify whether the target face is a mouth breathing face, additional marker points may be marked on the face area of the target face according to the characteristics of the mouth breathing face, for example: the supramental point (the chengjiang point, in the depression between the lower lip and the chin) and the midpoint of the hairline.
The supramental point is the depression between the lower edge of the lower lip and the tip of the chin; this depression is closely related to the structural measurement of the lower part of the face. The supramental point may be taken as the 69th marker point. It generally lies on the line segment between the lower lip and the chin, at the bottom of the depression, which is usually the point of lowest brightness on that segment. Let the quartile points of the line segment from the lower edge of the lower lip to the chin be a, b, and c in order; the point of lowest brightness on the sub-segment ab is searched for and taken as the supramental point. In this way, 69 marker points are found on the target face, as shown in FIG. 7, a schematic diagram of the marker points according to an embodiment of the present invention.
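A minimal sketch of the brightness search described above, assuming a grayscale image and the lower-lip and chin marker points are given; the sampling density is an assumption:

```python
import numpy as np

def supramental_point(gray, lower_lip_pt, chin_pt, n_samples=100):
    """Find the supramental point: the darkest sample on the a-b sub-segment of
    the lower-lip-to-chin line, where a, b, c are its quartile points."""
    p0, p1 = np.asarray(lower_lip_pt, float), np.asarray(chin_pt, float)
    # sample only between the first (a) and second (b) quartile points
    ts = np.linspace(0.25, 0.50, n_samples)
    pts = p0[None, :] + ts[:, None] * (p1 - p0)[None, :]
    xs, ys = pts[:, 0].round().astype(int), pts[:, 1].round().astype(int)
    brightness = gray[ys, xs]                  # grayscale image indexed [row, col]
    darkest = int(np.argmin(brightness))
    return int(xs[darkest]), int(ys[darkest])
```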
The hairline midpoint A is the intersection of the hairline (the boundary between the hair and the facial skin) with the vertical centre line of the face passing over the top of the head. Further, the face region in the sample image may be segmented using a deep-learning face segmentation technique, the vertical centre line of the face may be determined within the face region and extended upward past the top of the head, and its intersection with the boundary of the face region is the point sought. Specifically, a fully convolutional network is trained to distinguish which regions of the sample image belong to the face and which do not, so the face-region segmentation task is converted into a binary classification problem. In practice, generating the candidate (face) regions of the sample image is a critical step that directly affects the performance and efficiency of the segmentation. A mask approximating the face region can be obtained by first generating candidate regions with a superpixel-based segmentation method and then removing overly small regions with a non-maximum suppression algorithm. A relatively stable pair of points is then selected from the known 69 points (for example the nose tip point H and the philtrum point K), giving a vertical centre line of the face that passes through the centre of the nose bridge and the chin point; the intersection of this vertical line with the face mask is the hairline midpoint sought.
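The following sketch shows one way to compute the hairline midpoint A from a binary face mask, assuming the nose tip and philtrum points are already available among the 69 marker points; the step size and iteration cap are illustrative.

```python
import numpy as np

def hairline_midpoint(face_mask, philtrum, nose_tip, step=1.0, max_steps=2000):
    """Walk from the nose tip along the face centre line (philtrum -> nose tip
    direction, i.e. up the face) and return the last pixel still inside the
    binary face mask; this is taken as the hairline midpoint A."""
    p0 = np.asarray(philtrum, dtype=float)
    p1 = np.asarray(nose_tip, dtype=float)
    direction = (p1 - p0) / np.linalg.norm(p1 - p0)
    pt = p1.copy()
    last_inside = None
    h, w = face_mask.shape[:2]
    for _ in range(max_steps):
        pt = pt + step * direction
        x, y = int(round(pt[0])), int(round(pt[1]))
        if not (0 <= x < w and 0 <= y < h):
            break
        if face_mask[y, x]:
            last_inside = (x, y)
    return last_inside
```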
Fig. 8 is a schematic diagram of point A and the mentolabial point according to an embodiment of the present invention. Fig. 8 shows an image captured from the internet: 68 marker points are marked on the target face in the image, the mask of the target face is determined, and point A and the mentolabial point are added to the 68 marker points. The marker point at the forehead hairline is point A, the marker point between the lower lip and the chin is the mentolabial point, and the remaining points are the original 68 marker points.
These 70 marker points are used as the initial marker points from which the face structure key points are extracted, as shown in Table 1 below. It should be understood by those skilled in the art that the face structure key points in Table 1 only illustrate this embodiment and are not intended to limit it.
TABLE 1
Table 1 has three columns, which respectively give the name, the label and the serial number of each face structure key point, or the method for obtaining the key point from the marker points. A number of face structure key points for structure measurement can be extracted from the 70 marker points; based on the transverse and longitudinal proportional relationships of the human face and the distribution of the marker points on the face, this embodiment extracts 26 face structure key points, which are used to extract the structure metric features in the next step. Fig. 9 is a schematic diagram of extracting the face structure key points of a target face.
The facial structure metric features are then extracted from the face structure key points: a series of structure metric features can be computed from the 26 key points extracted in the previous step, so that the face is encoded into a corresponding structure metric feature vector.
The basic principle of structure metric feature selection is that each selected feature has a definite meaning and is closely related to mouth breathing face recognition. Various candidate structure metric features can be extracted freely, and whether a feature is closely related to mouth breathing face recognition is determined during training of the mouth breathing face recognition model: a feature is closely related if recognition of the mouth breathing face is accurate when the feature is used and inaccurate when it is removed. Once the closely related structure metric features are determined, the face structure key points that need to be extracted can be determined.
Various sets of structure metric features can be extracted from the face structure key points. As shown in Table 2, 25 structure metric features f0 to f24 are extracted from the 26 face structure key points (not including the face width labelled FF_, although FF_ could also be used as a feature), giving a 25-dimensional structure metric feature vector; these 25 features are only a reference for structure metric feature extraction. Because all distance features are computed from image pixels, every distance feature is normalised by the face width FF_ to keep the dimensions consistent, while ratio features and angle features are left unchanged.
TABLE 2
Through Table 2, the structure metric features of any face image can be encoded, giving a 25-dimensional structure metric feature vector representation of the face.
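Since Table 2 defines the actual 25 features, the sketch below only illustrates the encoding pattern: distance features are divided by the face width FF_ so that they are scale-independent, while ratio features are left unchanged. The key-point names in the dictionary are assumptions for illustration, not the names used in Table 1.

```python
import numpy as np

def dist(p, q):
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def encode_structure_features(kp):
    """Encode a face as a structure metric feature vector. `kp` maps
    key-point names (hypothetical here) to (x, y) pixel coordinates."""
    face_width = dist(kp["left_cheek"], kp["right_cheek"])     # FF_: normalisation baseline
    features = [
        dist(kp["hairline_mid"], kp["chin"]) / face_width,     # face length, normalised
        dist(kp["nose_base"], kp["chin"]) / face_width,        # lower-face height, normalised
        dist(kp["upper_lip"], kp["lower_lip"]) / face_width,   # lip gap, normalised
        dist(kp["nose_base"], kp["mentolabial"])
            / max(dist(kp["hairline_mid"], kp["chin"]), 1e-6), # a pure ratio, kept as-is
    ]
    return np.asarray(features, dtype=np.float32)
```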
In this embodiment, the 25-dimensional structure metric feature vector is used in place of the raw face pixels for the training and recognition of the mouth breathing face recognition model, which greatly improves the computational efficiency.
Based on the extracted structure metric feature vector of the target face, a pre-trained mouth breathing face recognition model can complete the mouth breathing face recognition task.
The process of mouth breathing face recognition is described below, taking the XGBoost model as an example of the mouth breathing face recognition model.
XGBoost is a boosting-based machine learning method that improves classification performance by ensembling classification and regression trees (CART). XGBoost optimizes the structure and weights of the trees with a stochastic gradient descent method and offers good training speed and accuracy. XGBoost can be used for both classification and regression, and the mouth breathing face recognition problem can be regarded as a typical binary classification problem. In the embodiment of the invention, the XGBoost model may be an XGBoost classifier model.
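As a hedged illustration of how such a binary classifier can be set up with the open-source xgboost Python package (the hyper-parameters and placeholder data below are illustrative, not the embodiment's tuned values):

```python
import numpy as np
from xgboost import XGBClassifier

# X: (n_samples, 25) structure metric vectors; y: 1 = mouth breathing face, 0 = normal.
X = np.random.rand(200, 25).astype(np.float32)    # placeholder data for the sketch
y = np.random.randint(0, 2, size=200)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    objective="binary:logistic")
clf.fit(X, y)
print(clf.predict(X[:5]))                          # 0/1 decision per face
```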
Before XGBoost is trained, a data set is constructed. The data set contains the data of a number of labelled sample images. The types of sample image are positive sample images and negative sample images, and the label of a sample image matches its type. The data of a sample image comprise the structure metric features of the face identified from that sample image, and may be obtained in the manner described with reference to Figs. 2 to 6.
The data set is divided into three sub data sets: the training data set, the validation data set and the test data set. The training data set includes the data of the preset negative sample images and the data of the positive sample images after image augmentation processing.
The training data set is used to train the structure and weights of XGBoost. It may account for 80% of the sample images in the data set and includes two parts: a training set of positive sample images after image augmentation processing and a training set of negative sample images. The training set of positive sample images contains the data of a number of labelled positive sample images; the training set of negative sample images contains the data of a number of labelled negative sample images.
The validation data set is used to tune the hyper-parameters of XGBoost, such as the maximum tree depth of the CART trees. The validation data set may be drawn from the training data set as a portion of the labelled positive sample images and a portion of the labelled negative sample images, for example 10% or 20% of the original data set.
The test data set is used to test the accuracy of XGBoost in predicting the mouth breathing face. It may account for 20% of the sample images in the data set and includes two parts: a test set of positive sample images and a test set of negative sample images, each containing the data of a number of labelled images.
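A minimal sketch of such an 80/10(or 20)/20 split with scikit-learn is shown below; note that in this embodiment the positive training set is the augmented data while the positive test set consists of the original images, so the split there is not a purely random one.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 25)               # placeholder structure metric vectors
y = np.random.randint(0, 2, size=500)     # placeholder 0/1 labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)       # 80% train / 20% test
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.125,                      # 0.125 * 80% = 10% of the whole set
    stratify=y_train, random_state=0)
```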
Because positive sample images of the mouth breathing face are difficult to acquire, a small number of positive sample images is not enough to complete the training task of mouth breathing face recognition. Therefore, this embodiment augments the structure metric features, that is, performs image augmentation processing on the acquired positive sample images.
FIG. 10 is a flowchart illustrating steps of an image augmentation process according to an embodiment of the present invention.
Step S1010: extract the structure metric features of the face from the positive sample image.
The structure metric features form a multi-dimensional structure metric feature vector; further, in this embodiment it is the 25-dimensional structure metric feature vector described above.
All collected positive sample images are acquired, each containing a face whose pose angle is within the pose angle range. For each positive sample image, if its image brightness standard deviation is smaller than the image brightness standard deviation threshold, image enhancement processing is performed on it using a gamma transform algorithm, and the structure metric features of the face are then extracted from the enhanced positive sample image.
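A minimal sketch of this brightness check and gamma enhancement is given below, assuming 8-bit images; the threshold and gamma value are illustrative, not the embodiment's preset values.

```python
import cv2
import numpy as np

def enhance_if_flat(image, std_threshold=40.0, gamma=0.6):
    """Apply gamma correction when the image brightness standard deviation is
    below the threshold; otherwise return the image unchanged."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if image.ndim == 3 else image
    if float(gray.std()) >= std_threshold:
        return image
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return cv2.LUT(image, lut)
```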
Step S1020: add Gaussian noise to each dimension of the structure metric features of the face to obtain a new positive sample.
A random Gaussian perturbation is added to each dimension, the sampled value is taken as the value of that dimension for the new positive sample, and once all dimensions have been processed a new positive sample image is formed.
Specifically, the value of the positive sample image S in the i-th dimension is obtained by resampling from a Gaussian distribution:
Si ~ G(μ, σ²)
where Si is the resampled value of the i-th dimension of the positive sample image S, i.e. the value after Gaussian noise is added, G is the Gaussian distribution, and μ and σ are preset sampling parameters: μ is the mean and σ is the standard deviation.
Further, assuming that several image augmentation operations are performed on one positive sample image to obtain several new positive samples, that the mean of the values of a given dimension over these new positive samples is d, and that the ratio of the standard deviation to the mean d is q, the sampling parameters are set as μ = d and σ = d·q.
The ratio q is an empirical value or a value obtained through experiments. When setting q, the criterion of adding only a small amount of noise to the features can be followed, so as to avoid large changes to the positive sample. In this embodiment, q = 0.1.
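The perturbation can be sketched as follows, with each dimension resampled around its original value d_i using σ = d_i · q and q = 0.1; the function name and the use of numpy's random generator are assumptions.

```python
import numpy as np

def augment_features(feat, n_new=2000, q=0.1, rng=None):
    """Generate `n_new` noisy copies of one positive sample's structure metric
    vector: dimension i is drawn from a Gaussian with mean d_i and
    standard deviation d_i * q."""
    rng = np.random.default_rng() if rng is None else rng
    feat = np.asarray(feat, dtype=np.float64)
    sigma = np.abs(feat) * q                      # per-dimension standard deviation
    return rng.normal(loc=feat, scale=sigma, size=(n_new, feat.size))
```

With 15 original positive samples and 2000 rounds each, this yields the 30000-sample positive training set described below.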
Through the image augmentation processing, a large number of new positive sample images can be generated from the original positive sample images. For example, after 2000 rounds of image augmentation on each positive sample image, 2000 new noise-added positive samples are generated per image; with 15 original positive sample images this yields 30000 positive samples. These 30000 positive samples can be labelled and their data used as the training set of positive sample images, while the 15 original positive sample images can be labelled and their data used as the test set of positive sample images.
In addition, negative sample images without the mouth breathing face problem also need to be acquired. Specifically, images can be crawled from the internet and negative sample images screened out. A questionnaire can also be set up: face images of users with a good sleep state (e.g. no breathing disorder, cough or snoring symptoms) extracted from the questionnaire are taken as negative sample images, and face images of users with a problematic sleep state (e.g. breathing disorder or severe snoring) are taken as positive sample images; when the number of positive sample images obtained in this way is sufficient, the image augmentation processing can be omitted. For example, 33324 negative sample images and a further 2981 negative sample images were obtained by questionnaire screening; the 33324 images were labelled and their data used as the training set of negative sample images, and the 2981 images were labelled and their data used as the test set of negative sample images.
The training set and test set settings are shown in Table 3. Of course, those skilled in the art should understand that Table 3 only illustrates this embodiment and is not intended to limit it.
TABLE 3
The mouth breathing face recognition model is then trained on the constructed data set.
FIG. 11 is a flowchart illustrating the training steps of the mouth-breathing face recognition model according to an embodiment of the present invention.
Step S1110, setting an initial value of the maximum tree depth of the CART tree in the XGBoost model.
Step S1120, training the structure and weight of the XGBoost model by using a preset training data set.
Each sample image in the training data set is input into the XGBoost model in turn, and the prediction output by the model is compared with the label of the sample image. If the prediction matches the label, the next sample image is input; if it does not, the structure and weights of the XGBoost model are adjusted.
Step S1130, verify the trained structure and weights of the XGBoost model using the preset validation data set, and perform the current adjustment of the maximum tree depth according to the verification result.
Each sample image in the validation data set is input into the XGBoost model in turn, and the prediction output by the model is compared with the label of the sample image; correct predictions and incorrect predictions are counted separately, and the accuracy of the XGBoost model is determined as: accuracy = (number of correct predictions) / (number of correct predictions + number of incorrect predictions).
Each adjustment of the maximum tree depth may add 1 to the value set in the previous adjustment.
Step S1140, determine whether the maximum tree depth of the previous adjustment is the optimal maximum tree depth using a preset grid search algorithm; if so, go to step S1150; if not, return to step S1120.
If the accuracy of the XGBoost model after the current adjustment of the maximum tree depth is higher than after the previous adjustment, the maximum tree depth continues to be increased; if it is lower, the maximum tree depth of the previous adjustment is determined to be the optimal maximum tree depth.
Step S1150, the maximum tree depth of the CART tree in the XGboost model is set as the optimal maximum tree depth.
The XGBoost model set to the optimal maximum tree depth is then tested with the preset test data set and its performance metric value is determined. If the performance metric value is within the preset performance range, the training of the XGBoost model is finished; if it is not, the process returns to step S1120. The preset performance range may be an empirical value or a value obtained through experiments.
Further, the performance metric value of the XGBoost model may be the classification accuracy. Specifically, each sample image in the test data set is input into the XGBoost model in turn, the prediction output by the model is compared with the label of the sample image, and correct and incorrect predictions are counted separately; the accuracy is then (number of correct predictions) / (number of correct predictions + number of incorrect predictions). The preset performance range may be that the prediction accuracy is greater than a preset convergence threshold, so the XGBoost model is considered converged when its prediction accuracy exceeds the convergence threshold. The convergence threshold may be an empirical value or a value obtained through experiments.
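Steps S1110 to S1150 can be sketched as the following tuning loop, where the validation accuracy decides when to stop growing the maximum tree depth; the starting depth and the cap are illustrative.

```python
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def tune_max_depth(X_fit, y_fit, X_val, y_val, start_depth=2, depth_cap=12):
    """Increase max_depth by 1 per round and keep the last depth that improved
    validation accuracy; stop as soon as accuracy drops."""
    best_depth, best_acc = None, -1.0
    for depth in range(start_depth, depth_cap + 1):
        model = XGBClassifier(max_depth=depth, n_estimators=200,
                              objective="binary:logistic")
        model.fit(X_fit, y_fit)
        acc = accuracy_score(y_val, model.predict(X_val))
        if acc > best_acc:
            best_depth, best_acc = depth, acc     # this depth is better, keep searching
        else:
            break                                 # worse than before: previous depth is optimal
    return best_depth, best_acc
```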
In this embodiment, the XGBoost model is iteratively updated as the data set grows, so its accuracy and overall performance keep improving.
In this embodiment, after multiple rounds of training, the optimal maximum depth of the CART trees was found to be 4, giving the best performance on the validation set. The model was then evaluated on the test set; through multiple rounds of cross validation, the mouth breathing face recognition model recognised the positive sample images (mouth breathing faces) with an accuracy of 14/15 and the negative sample images with an accuracy of 2665/2981, showing a high recognition accuracy for the mouth breathing face.
In this embodiment, whether the target face is a mouth breathing face can be identified by combining image processing techniques with the mouth breathing face recognition model. In the medical field, recognizing the mouth breathing face is a novel problem, and in the field of facial orthodontics, especially for jaw development problems in adolescents and children, the recognition method of this embodiment can greatly reduce medical costs.
Since the number of classical cases of mouth breathing is small, the data set is augmented with a feature-perturbation-based data enhancement method, which solves the problem of having too few sample images for the machine learning model. The approach of this embodiment is straightforward: from face image pre-processing, through structure metric feature extraction, to training and prediction with the mouth breathing face recognition model, the pipeline is clear and the recognition result has good interpretability.
On the basis of the feature vector encoding, this embodiment selects the XGBoost machine learning method; in practice many other methods could be used instead, such as linear regression methods and SVM methods.
The embodiment of the invention identifies the mouth breathing face by combining image processing techniques with the mouth breathing face recognition model; alternatively, a deep learning approach can be used, in which the original image containing the face, i.e. the face pixel matrix, is used directly as the input of a deep learning model and the model outputs whether the face in the image is a mouth breathing face. The deep learning model automatically learns an effective metric feature encoding from its network structure and outputs the mouth breathing face prediction at the same time as it generates the features; when the face data related to the mouth breathing face accumulates to a certain scale, the accuracy of this approach becomes higher.
This embodiment further provides a mouth breathing face recognition device. Fig. 12 is a block diagram of a mouth breathing face recognition apparatus according to an embodiment of the present invention.
In this embodiment, the mouth breathing face recognition device includes, but is not limited to: a processor 1210, and a memory 1220.
The processor 1210 is configured to execute the mouth-breathing face recognition program stored in the memory 1220 to implement the above-mentioned mouth-breathing face recognition method.
Specifically, the processor 1210 is configured to execute the mouth-breathing face recognition program stored in the memory 1220 to implement the following steps: collecting effective face images; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; inputting the structural measurement characteristics of the target face into a pre-trained mouth breathing face recognition model, and obtaining a mouth breathing face recognition result output by the mouth breathing face recognition model; and training the mouth breathing face recognition model by using the positive sample image subjected to the image amplification treatment and the preset negative sample image.
Wherein, the collecting of the effective face image comprises: collecting an environment image of a user; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is a valid face image; and if the average brightness value of the user environment image is not in the brightness value range, or a human face is not detected in the user environment image, carrying out re-acquisition prompting.
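A hedged sketch of this acquisition step, using OpenCV and dlib's frontal face detector, is shown below; the brightness bounds are illustrative rather than the preset range of the embodiment.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def acquire_valid_face(image, low=60.0, high=200.0):
    """Return the first detected face rectangle when the average brightness is
    within [low, high]; return None to signal a re-acquisition prompt."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    if not (low <= float(gray.mean()) <= high):
        return None
    faces = detector(gray, 1)                      # upsample once to catch small faces
    return faces[0] if len(faces) > 0 else None
```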
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
Wherein the determining the attitude angle of the target face in the effective face image comprises: marking points in the effective face image aiming at the target face; acquiring a preset three-dimensional human head portrait model; wherein, the face of the three-dimensional human head portrait model is marked with mark points, and the number and the types of the mark points marked on the face of the three-dimensional human head portrait model are the same as those of the mark points marked on the target human face; and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points aiming at the target face in the effective face image.
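One common way to obtain the pose angle from matched 2D/3D marker points is a perspective-n-point solve; the sketch below assumes the 3D head model points and the 2D image points are in corresponding order and uses a simple pinhole camera approximation.

```python
import cv2
import numpy as np

def estimate_pose_angles(model_points_3d, image_points_2d, image_size):
    """Return (success flag, Euler angles in degrees) for the target face,
    given Nx3 head model points and the matching Nx2 image points."""
    h, w = image_size
    camera_matrix = np.array([[w, 0, w / 2.0],
                              [0, w, h / 2.0],
                              [0, 0, 1]], dtype=np.float64)   # focal length ~ image width
    dist_coeffs = np.zeros((4, 1))                            # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    rotation, _ = cv2.Rodrigues(rvec)
    euler_angles = cv2.RQDecomp3x3(rotation)[0]               # (pitch, yaw, roll) in degrees
    return ok, euler_angles
```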
Extracting the structural measurement feature of the target face from the effective face image, wherein the extracting the structural measurement feature of the target face comprises the following steps: marking points in the effective face image aiming at the target face; extracting face structure key points of the target face according to the mark points of the target face; and extracting the structural measurement characteristics corresponding to the target face according to the face structural key points of the target face.
Wherein the image augmentation processing of the preset positive sample images includes: extracting the structure metric features of the face from the positive sample image; and adding Gaussian noise to each dimension of the structure metric features of the face to obtain a new positive sample image.
Wherein, if the mouth breathing face recognition model is an XGBoost model, before the structure metric features of the target face are input into the pre-trained mouth breathing face recognition model, the method further comprises: training the mouth breathing face recognition model according to a preset data set, the data set comprising a training data set and a verification data set, the training data set comprising data of preset negative sample images and data of positive sample images after image augmentation processing. Training the mouth breathing face recognition model according to the preset data set comprises: step 2, setting an initial value of the maximum tree depth of the classification and regression trees (CART) in the XGBoost model; step 4, training the structure and weights of the XGBoost model by using the preset training data set; step 6, verifying the trained structure and weights of the XGBoost model by using the preset verification data set, and performing the current adjustment of the maximum tree depth according to the verification result; and step 8, determining whether the previously adjusted maximum tree depth is the optimal maximum tree depth by using a preset grid search algorithm; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
Wherein the data set further comprises: testing the data set; after the maximum tree depth of the CART tree in the XGboost model is set as the optimal maximum tree depth, the method further comprises the following steps: testing the XGboost model which is set to the optimal maximum tree depth by using a preset test data set, and determining a performance metric value of the XGboost model; and finishing the training of the XGboost model if the performance metric value of the XGboost model is within a preset performance range.
The embodiment of the invention also provides a storage medium. The storage medium herein stores one or more programs. Among others, the storage medium may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
The one or more programs in the storage medium are executable by one or more processors to implement the mouth breathing face recognition method described above.
Specifically, the processor is configured to execute a mouth breathing face recognition program stored in the memory to implement the steps of: collecting effective face images; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; inputting the structural measurement characteristics of the target face into a pre-trained mouth breathing face recognition model, and obtaining a mouth breathing face recognition result output by the mouth breathing face recognition model; and training the mouth breathing face recognition model by using the positive sample image subjected to the image amplification treatment and the preset negative sample image.
Wherein, the collecting of the effective face image comprises: collecting an environment image of a user; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is a valid face image; and if the average brightness value of the user environment image is not in the brightness value range, or a human face is not detected in the user environment image, carrying out re-acquisition prompting.
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
Wherein the determining the attitude angle of the target face in the effective face image comprises: marking points in the effective face image aiming at the target face; acquiring a preset three-dimensional human head portrait model; wherein, the face of the three-dimensional human head portrait model is marked with mark points, and the number and the types of the mark points marked on the face of the three-dimensional human head portrait model are the same as those of the mark points marked on the target human face; and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points aiming at the target face in the effective face image.
Extracting the structural measurement feature of the target face from the effective face image, wherein the extracting the structural measurement feature of the target face comprises the following steps: marking points in the effective face image aiming at the target face; extracting face structure key points of the target face according to the mark points of the target face; and extracting the structural measurement characteristics corresponding to the target face according to the face structural key points of the target face.
Wherein the image augmentation processing of the preset positive sample images includes: extracting the structure metric features of the face from the positive sample image; and adding Gaussian noise to each dimension of the structure metric features of the face to obtain a new positive sample image.
Wherein, if the mouth breathing face recognition model is an XGBoost model, before the structure metric features of the target face are input into the pre-trained mouth breathing face recognition model, the method further comprises: training the mouth breathing face recognition model according to a preset data set, the data set comprising a training data set and a verification data set, the training data set comprising data of preset negative sample images and data of positive sample images after image augmentation processing. Training the mouth breathing face recognition model according to the preset data set comprises: step 2, setting an initial value of the maximum tree depth of the classification and regression trees (CART) in the XGBoost model; step 4, training the structure and weights of the XGBoost model by using the preset training data set; step 6, verifying the trained structure and weights of the XGBoost model by using the preset verification data set, and performing the current adjustment of the maximum tree depth according to the verification result; and step 8, determining whether the previously adjusted maximum tree depth is the optimal maximum tree depth by using a preset grid search algorithm; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
Wherein the data set further comprises: testing the data set; after the maximum tree depth of the CART tree in the XGboost model is set as the optimal maximum tree depth, the method further comprises the following steps: testing the XGboost model which is set to the optimal maximum tree depth by using a preset test data set, and determining a performance metric value of the XGboost model; and finishing the training of the XGboost model if the performance metric value of the XGboost model is within a preset performance range.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (9)
1. A mouth breathing face recognition method is characterized by comprising the following steps:
collecting effective face images;
determining the attitude angle of a target face in the effective face image;
if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; wherein, in the effective face image, extracting the structure metric feature of the target face includes:
marking points for the target face in the effective face image; wherein the marked points comprise a mentolabial point and a hairline midpoint;
extracting face structure key points of the target face according to the mark points of the target face;
extracting structural measurement features corresponding to the target face according to the face structural key points of the target face;
inputting the structural measurement characteristics of the target face into a pre-trained mouth breathing face recognition model, and obtaining a mouth breathing face recognition result output by the mouth breathing face recognition model; the mouth breathing face recognition model is obtained by training the following data:
carrying out image amplification processing on a preset positive sample image, and utilizing data of the positive sample image subjected to the image amplification processing and data of a preset negative sample image;
and/or,
setting a questionnaire, wherein face images of users with a good sleep state extracted from the questionnaire are taken as negative sample image data, and face images of users with a problematic sleep state are taken as positive sample image data.
2. The method of claim 1, wherein the acquiring of the valid facial image comprises:
collecting an environment image of a user;
determining an average brightness value of the user environment image;
if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image;
if a face is detected in the user environment image, determining that the user environment image is a valid face image;
and if the average brightness value of the user environment image is not in the brightness value range, or a human face is not detected in the user environment image, carrying out re-acquisition prompting.
3. The method of claim 2, further comprising, prior to the performing face detection on the image of the user environment:
determining an image brightness standard deviation of the user environment image;
and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
4. The method of claim 1, wherein determining the pose angle of the target face in the valid face image comprises:
marking points in the effective face image aiming at the target face;
acquiring a preset three-dimensional human head portrait model; wherein, the face of the three-dimensional human head portrait model is marked with mark points, and the number and the types of the mark points marked on the face of the three-dimensional human head portrait model are the same as those of the mark points marked on the target human face;
and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points aiming at the target face in the effective face image.
5. The method of claim 1, wherein the data augmentation process is performed on a preset positive sample image, and comprises:
extracting the structural measurement features of the human face from the positive sample image;
and adding Gaussian noise to each dimension of the structural measurement features of the human face to obtain a new positive sample image.
6. The method of claim 5, wherein if the mouth breathing face recognition model is an XGboost model, before inputting the structural metric features of the target face into a pre-trained mouth breathing face recognition model, further comprising:
training the mouth breathing face recognition model according to a preset data set; the data set comprises a training data set and a verification data set, wherein the training data set comprises preset data of negative sample images and data of positive sample images after image augmentation processing;
according to the preset data set, training the mouth breathing face recognition model comprises the following steps:
step 2, setting an initial value of the maximum tree depth of the classification and regression trees (CART) in the XGBoost model;
step 4, training the structure and the weight of the XGboost model by using a preset training data set;
step 6, verifying the trained structure and weights of the XGBoost model by using a preset verification data set, and performing the current adjustment of the maximum tree depth according to the verification result;
and 8, determining whether the maximum tree depth adjusted at the previous time is the optimal maximum tree depth or not by using a preset grid search algorithm, if so, setting the maximum tree depth of the CART tree in the XGboost model as the optimal maximum tree depth, and otherwise, jumping to the step 4.
7. The method of claim 6, wherein the data set further comprises: testing the data set; after the maximum tree depth of the CART tree in the XGboost model is set as the optimal maximum tree depth, the method further comprises the following steps:
testing the XGboost model which is set to the optimal maximum tree depth by using a preset test data set, and determining a performance metric value of the XGboost model;
and finishing the training of the XGboost model if the performance metric value of the XGboost model is within a preset performance range.
8. A mouth-breathing face recognition device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing a mouth-breathing face recognition method as claimed in any one of claims 1 to 7.
9. A storage medium having stored thereon a mouth-breathing face recognition program which, when executed by a processor, implements a mouth-breathing face recognition method according to any one of claims 1 to 7.