CN113436735A - Body weight index prediction method, device and storage medium based on face structure measurement


Info

Publication number
CN113436735A
Authority
CN
China
Prior art keywords
face
image
model
user environment
target face
Prior art date
Legal status
Pending
Application number
CN202010209872.4A
Other languages
Chinese (zh)
Inventor
罗冠
游强
田勇
殷晓珑
Current Assignee
Beijing Haola Technology Co ltd
Original Assignee
Beijing Haola Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Haola Technology Co ltd filed Critical Beijing Haola Technology Co ltd
Priority to CN202010209872.4A
Publication of CN113436735A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a body mass index (BMI) prediction method based on face structure measurement, together with a corresponding device and storage medium. The method comprises the following steps: collecting an effective face image; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement features of the target face from the effective face image; and inputting the structural measurement features of the target face into a pre-trained body mass index prediction model, and obtaining the body mass index corresponding to the target face output by the prediction model. The BMI prediction method is convenient to operate, has high prediction accuracy, and solves the problem that the BMI of a user cannot be obtained because the user's real height and weight cannot be effectively acquired.

Description

Body weight index prediction method, device and storage medium based on face structure measurement
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a Body Mass Index (BMI) prediction method based on face structure metrics, a device, and a storage medium.
Background
Body Mass Index (BMI) is an index that measures a person's body type, defined as weight divided by the square of height (kg/m²). Therefore, a person's height and weight must be known in order to calculate the BMI. However, in many cases it is difficult to acquire the user's real height and weight, so it is difficult to obtain an accurate BMI for the user.
In particular, in modern interpersonal communication, height and weight are regarded as personal privacy, and people generally avoid discussing such topics, so a user's height and weight are not easy to obtain. Collecting a user's personal information without the user's knowledge often carries great legal risk; even when the user is aware, the user may avoid providing the information, or may provide false information, out of concern that the collection will leak their privacy, so the height and weight obtained are often inaccurate. For example, in some physical examinations, users who worry that a physical index such as the BMI will not meet the standard may arrange for someone else to take the examination in their place, so the BMI obtained is inaccurate. As another example, when a website prompts the user to enter height and weight, the user may provide false values out of concern that personal information will be leaked, so the BMI calculated from them will be wrong. However, some application scenarios must use the user's accurate BMI, and these scenarios are negatively affected by the difficulty of acquiring the user's real height and weight. For example, in the information notification of business processes such as health insurance, the BMI is a very important index for underwriting, and the user's accurate BMI is required; if the acquired BMI is inaccurate, subsequent work is hindered.
Disclosure of Invention
The main object of the invention is to provide a body mass index prediction method, device and storage medium based on face structure measurement, so as to solve the problem that an accurate BMI of a user is difficult to obtain because the user's real height and weight are difficult to acquire.
To address the above technical problem, the invention adopts the following technical solution:
the invention provides a body mass index prediction method based on face structure measurement, which comprises the following steps: collecting effective face images; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; and inputting the structural measurement characteristics of the target face into a pre-trained body weight index prediction model, and acquiring the body weight index corresponding to the target face output by the body weight index prediction model.
Wherein, the collecting of the effective face image comprises: collecting an environment image of a user; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is a valid face image; and if the average brightness value of the user environment image is not in the brightness value range, or a human face is not detected in the user environment image, carrying out re-acquisition prompting.
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
Wherein the determining the attitude angle of the target face in the effective face image comprises: marking points in the effective face image for the target face; acquiring a preset three-dimensional human head portrait model, wherein mark points are marked on the face of the three-dimensional human head portrait model, and the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are of the same number and type when considered in the same dimensional space; and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points for the target face in the effective face image.
Wherein, still include: and if the attitude angle of the target face is within a preset attitude angle range, performing face alignment operation on the target face before extracting the structural measurement features of the target face from the effective face image.
Wherein extracting the structural measurement features of the target face from the effective face image comprises: marking points in the effective face image for the target face; extracting face structure key points of the target face according to the mark points of the target face; and extracting the structural measurement features corresponding to the target face according to the face structure key points of the target face.
Wherein, the body mass index prediction model comprises the following types: an extreme gradient lifting XGboost model, a linear regression model, a Support Vector Machine (SVM) model or a deep learning network.
If the body weight index prediction model is an XGBoost model, before inputting the structural measurement features of the target face into the pre-trained body weight index prediction model, the method further comprises: training the body mass index prediction model; wherein training the body mass index prediction model comprises: step 2, setting an initial value of the maximum tree depth of the classification and regression (CART) trees in the XGBoost model; step 4, training the structure and the weights of the XGBoost model using a preset training data set; step 6, verifying the trained structure and weights of the XGBoost model using a preset verification data set, and adjusting the maximum tree depth for the current round according to the verification result; and step 8, determining whether the maximum tree depth from the previous adjustment is the optimal maximum tree depth using a preset grid search algorithm; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
Wherein, after setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, the method further comprises: testing the XGBoost model configured with the optimal maximum tree depth using a preset test data set, and determining a performance metric value of the XGBoost model; and finishing the training of the XGBoost model if the performance metric value of the XGBoost model is within a preset performance range.
The invention also provides a body mass index prediction device based on the human face structure measurement, which comprises a processor and a memory; the processor is used for executing a body weight index prediction program based on the human face structure metric stored in the memory so as to realize the body weight index prediction method based on the human face structure metric.
The present invention also provides a storage medium, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement any one of the above-mentioned methods for body mass index prediction based on facial structure metrics.
The invention has the following beneficial effects:
the BMI prediction method is convenient to operate and high in prediction accuracy, and can solve the problem that the real height and weight of a user cannot be effectively acquired so that the BMI of the user cannot be acquired.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method for weight index prediction based on face structure metrics according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps for acquiring a valid face image according to one embodiment of the present invention;
FIG. 3 is a flowchart of the steps of an image enhancement process according to one embodiment of the invention;
FIG. 4 is a flowchart of the steps of attitude angle determination, according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of coordinate system conversion according to an embodiment of the present invention;
FIG. 6 is a flowchart of the steps of structure metric feature extraction, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a marker according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating the steps of training a body mass index prediction model according to one embodiment of the present invention;
fig. 9 is a block diagram of a body mass index prediction apparatus based on a face structure metric according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
According to the embodiment of the invention, a body mass index prediction method based on face structure measurement is provided. Fig. 1 is a flowchart illustrating a body mass index prediction method based on a face structure metric according to an embodiment of the present invention.
Step S110, collecting effective face images.
The valid face image refers to an image which contains a face and has an average brightness value within a preset average brightness value range.
And step S120, determining the attitude angle of the target face in the effective face image.
The target face refers to the face of a user whose BMI is to be determined.
The pose angles (θ, ψ, φ) of the target face include: pitch angle θ, yaw angle ψ, and rotation angle Φ.
In the present embodiment, the pose angle of the target face is determined from the face image of the target face.
If the effective face image was uploaded by the user, a face image of the user is also collected and matched against the face images of the one or more faces detected in the effective face image. If the user's face image successfully matches the face image of one of the faces in the effective face image, the face corresponding to the successfully matched face image is taken as the target face. In this way the identity of the user whose BMI is to be measured can be verified, avoiding the problem of BMI detection being performed by a substitute.
And if the effective face image is collected on site, selecting a face from the effective face image as a target face.
Step S130, if the pose angle of the target face is within a preset pose angle range, extracting a structural metric feature of the target face from the effective face image.
The structural measurement feature refers to the structural feature of a human face. Further, the structure metric features are multi-dimensional feature vectors. For example: size, angle, etc. of the five sense organs. The structure metric features can be used to predict the BMI of the user to which the target face belongs.
If the attitude angle of the target face is within the preset attitude angle range, the target face is essentially a frontal face. For example, the pitch angle range may be set to [-25°, 25°], the deflection (yaw) angle range to [-25°, 25°], and the rotation angle range to [-35°, 35°]. When θ = 0, ψ = 0 and φ = 0, the current target face is a standard frontal face. The attitude angle of the target face being within the attitude angle range means that the pitch angle of the target face is within the pitch angle range, the deflection angle is within the deflection angle range, and the rotation angle is within the rotation angle range; in that case the target face is judged to be valid.
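As a minimal illustration of this check, the following Python sketch tests whether an attitude angle triple lies within the example ranges above; the function name and structure are illustrative and not part of the patent.

```python
# Illustrative sketch: validity check of a face attitude angle (theta, psi, phi)
# against the example ranges given above. Names and structure are assumptions.

PITCH_RANGE = (-25.0, 25.0)   # theta, degrees
YAW_RANGE   = (-25.0, 25.0)   # psi, degrees
ROLL_RANGE  = (-35.0, 35.0)   # phi, degrees

def pose_is_valid(theta: float, psi: float, phi: float) -> bool:
    """Return True if the target face is close enough to a frontal face."""
    return (PITCH_RANGE[0] <= theta <= PITCH_RANGE[1]
            and YAW_RANGE[0] <= psi <= YAW_RANGE[1]
            and ROLL_RANGE[0] <= phi <= ROLL_RANGE[1])

print(pose_is_valid(10.0, 5.0, 0.0))   # True: accepted as a frontal face
print(pose_is_valid(40.0, 0.0, 0.0))   # False: prompt re-acquisition
```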
And if the attitude angle of the target face is not within the preset attitude angle range, indicating that the target face is not a front face, and then carrying out re-acquisition prompt so that the user acquires the user environment image again according to the re-acquisition prompt. And further, comparing the attitude angle of the target face with a preset attitude angle range, if the attitude angle range is exceeded, the target face is invalid, and sending a re-acquisition prompt to the user so as to prompt the user to upload an image containing the front face.
The BMI can be predicted by screening the face on the front side, so that the accuracy of the BMI can be improved. When the front face cannot be completely displayed, the face information is seriously lost, and the obtained BMI is inaccurate.
Step S140, inputting the structural measurement characteristics of the target face into a pre-trained body mass index prediction model, and obtaining the body mass index corresponding to the target face output by the body mass index prediction model.
And the body mass index prediction model is used for predicting the BMI corresponding to the target face according to the input structural measurement characteristics of the target face. Further, the body weight index corresponding to the target face is the body weight index of the user to which the target face belongs.
In the method for predicting the BMI of a target face provided by this embodiment, an effective face image is collected, the structural measurement features of the target face are extracted from the effective face image, and the BMI prediction model predicts the BMI corresponding to the target face from these structural measurement features. This embodiment can therefore solve the problem that the BMI of a user cannot be obtained because the user's real height and weight cannot be effectively acquired. The execution subject of this embodiment may be a server, a desktop device and/or a mobile device, any of which may be equipped with a camera. The mobile device may be a user device, for example a smartphone with a camera or an electronic scale with a camera.
This embodiment can be applied very widely, including but not limited to the fields of health insurance, health examination and body self-examination. For example, in a hospital environment the BMI of a user is obtained in order to complete tasks such as insurance underwriting and physical examination items; in a home environment the BMI of a user is obtained in order to know whether the user's body type is standard. Further, in scenarios where the user's identity needs to be verified while the BMI is measured, this embodiment can assist in verifying the user's identity while predicting the BMI.
The following is a detailed description of the steps for acquiring valid face images.
Fig. 2 is a flowchart illustrating steps of acquiring a valid face image according to an embodiment of the present invention.
And step S210, acquiring an environment image of the user.
The user environment image refers to an image in the camera view field acquired by the camera.
The user environment image may be acquired by calling a camera of the user equipment or of the BMI acquisition device, or may be an image uploaded by the user. For example, the user environment image may be collected in real time by the user equipment, or the user may be prompted to upload a user environment image.
One or more faces may be included in the user environment image. Of course, the user environment image may not include any human face.
Step S220, determining an average brightness value of the user environment image.
In this embodiment, the user environment image may be denoted I(x, y), where the width of the user environment image is w and the height is h, with x ∈ [0, w] and y ∈ [0, h]; I_xy denotes the brightness value of the pixel at position (x, y) in the user environment image, with I_xy ∈ [0, 255].
The average brightness value of the user environment image is calculated as:
Ī = (1 / (w × h)) × Σ_x Σ_y I_xy
Further, if the user environment image is a color image, I_xy = [I_R, I_G, I_B], where I_R, I_G and I_B are the brightness values of the red, green and blue channels respectively. The average brightness value of the user environment image may then be replaced by the mean of the per-channel brightness means, that is: average brightness value = (mean brightness of the red channel + mean brightness of the green channel + mean brightness of the blue channel) ÷ 3, where the mean brightness of a channel is the sum of the brightness values of all its pixels ÷ the number of pixels.
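A minimal sketch of this brightness computation, assuming OpenCV/NumPy and a color image read from disk (file name and variable names are illustrative):

```python
# Illustrative sketch: average brightness of a user environment image.
# Assumes OpenCV (cv2) and NumPy; the file path is a placeholder.
import cv2
import numpy as np

image = cv2.imread("user_environment.jpg")           # BGR color image, uint8

if image.ndim == 2:                                   # already grayscale
    mean_brightness = float(image.mean())
else:
    # Average of the per-channel brightness means, as described above.
    per_channel_means = image.reshape(-1, 3).mean(axis=0)
    mean_brightness = float(per_channel_means.mean())

print(f"average brightness: {mean_brightness:.1f}")  # compare against [25, 230]
```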
Step S230, determining that the average brightness value of the user environment image is within a preset brightness value range; if yes, go to step S240; if not, step S270 is executed.
A brightness value range [I0, I1] is preset. The end values I0 and I1 of the brightness value range may be empirical values or values obtained by experiment. When Ī < I0, the user environment image is on average too dark; when Ī > I1, the user environment image is on average too bright.
In this embodiment, in order to reduce the number of times the user environment image has to be re-acquired, relatively extreme situations are simulated in advance, for example the average brightness value of a user environment image in a night environment, and in a scene where a high-power light source shines directly on the face. The average brightness value of the user environment image in the night environment is taken as the lower limit I0 of the brightness value range, and the average brightness value of the user environment image with a high-power light source shining directly on the face is taken as the upper limit I1. Further, the lower limit I0 and the upper limit I1 may be set to 25 and 230 respectively. Such extreme average brightness values rarely appear in images shot under everyday conditions; once such an extreme case does appear, the image is almost certainly unusable and is discarded, and a preset rejection operation may be performed. The rejection may be a re-acquisition prompt. Judging the brightness of the user environment image in this way improves the precision of subsequent face detection.
Step S240, if the average brightness value of the user environment image is within the brightness value range, performing face detection on the user environment image.
The manner of performing face detection on the user environment image will be described in detail later.
Step S250, judging whether a human face is detected in the user environment image; if yes, go to step S260; if not, step S270 is executed.
Step S260, if a face is detected in the user environment image, determining that the user environment image is a valid face image.
After a face is detected in the user environment image, a face region is identified in the user environment image, and the identified face region is taken as a face image.
In this embodiment, a face detection frame may be used to identify an area where a face is located in the user environment image. And if a plurality of faces are detected in the user environment image, respectively identifying the area of each detected face by using a plurality of face detection frames.
Step S270, if the average brightness value of the user environment image is not within the brightness value range, or a human face is not detected in the user environment image, performing a re-acquisition prompt.
In this embodiment, before performing face detection on the user environment image, in order to ensure that the user environment image has good contrast, image enhancement processing may be performed on the user environment image.
The contrast of the user environment image refers to a measure of the different brightness levels between the brightest white and darkest black of the bright and dark regions in the user environment image, i.e., the magnitude of the brightness contrast (difference) of the user environment image. A larger brightness contrast represents a larger contrast, and a smaller brightness contrast represents a smaller contrast.
In this embodiment, the image enhancement processing method includes, but is not limited to: gamma transformation and logarithmic transformation. The following describes image enhancement processing performed on an environment image of a user with a small contrast.
FIG. 3 is a flowchart illustrating steps of an image enhancement process according to an embodiment of the present invention.
In step S310, an image brightness standard deviation of the user environment image is determined.
In order to determine whether the user environment image needs to be subjected to the image enhancement operation, an image brightness standard deviation σ of the user environment image may be calculated, and the image brightness standard deviation σ may be referred to as root-mean-square contrast.
In the present embodiment, the image brightness standard deviation σ is calculated as:
σ = sqrt( (1 / (w × h)) × Σ_x Σ_y (I_xy − Ī)² )
The greater the contrast of the user environment image, the greater the image brightness standard deviation σ; the smaller the contrast of the user environment image, the smaller σ.
Step S320, if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold, performing image enhancement processing on the user environment image by using a gamma conversion algorithm.
For a user environment image with low contrast, a gamma transformation algorithm may be used for image enhancement. The standard form of the gamma transformation is:
O(x, y) = I(x, y)^γ
where I(x, y) is the user environment image before enhancement, O(x, y) is the user environment image after enhancement, and γ is the control parameter, with γ > 0. That is, the following operation is performed for each pixel in the user environment image (with the brightness values normalized to [0, 1] and rescaled back to [0, 255]):
O_xy = 255 × (I_xy / 255)^γ
where O_xy is the brightness value of the pixel after image enhancement.
When γ is greater than 1, the user environment image becomes dark as a whole, which stretches the region of higher brightness in the image while compressing the portion of lower brightness.
When γ is equal to 1, the user environment image has no change.
When γ is larger than 0 and smaller than 1, the user environment image becomes brighter as a whole, which stretches the area of lower brightness in the image and compresses the portion of higher brightness.
In this embodiment, γ is determined from the average brightness value Ī of the user environment image. The optimal average brightness range of the user environment image is taken to be 165–175, and 170 may be taken as the average brightness threshold. γ is computed from Ī by a preset empirical formula with the following behaviour: when Ī = 170, γ = 1 and the user environment image is unchanged; when Ī tends to 0, γ tends to 0, the user environment image becomes brighter as a whole and the contrast increases; when Ī tends to 255, γ tends to infinity, the user environment image becomes darker as a whole and the contrast increases.
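The patent's empirical formula for γ appears only as an image in the original publication; the sketch below therefore uses an assumed mapping chosen purely to reproduce the behaviour described above (γ = 1 at mean brightness 170, γ → 0 as the mean tends to 0, γ → ∞ as it tends to 255), together with the contrast check.

```python
# Illustrative sketch: contrast check and gamma enhancement of a user
# environment image. The gamma formula below is an ASSUMPTION that merely
# reproduces the stated behaviour; it is not the patent's empirical formula.
import cv2
import numpy as np

SIGMA_THRESHOLD = 40.0          # assumed image-brightness-std-dev threshold
TARGET_MEAN = 170.0             # average brightness threshold from the text

def enhance_if_low_contrast(gray: np.ndarray) -> np.ndarray:
    """Apply gamma enhancement when the brightness standard deviation is small."""
    sigma = float(gray.std())                        # root-mean-square contrast
    if sigma >= SIGMA_THRESHOLD:
        return gray                                  # contrast already acceptable
    mean = float(gray.mean())
    gamma = mean / (2.0 * (255.0 - mean) + 1e-6)     # assumed mapping: 1 at mean=170
    normalized = gray.astype(np.float32) / 255.0
    enhanced = np.power(normalized, gamma) * 255.0   # O = 255 * (I / 255)^gamma
    return enhanced.clip(0, 255).astype(np.uint8)

gray = cv2.cvtColor(cv2.imread("user_environment.jpg"), cv2.COLOR_BGR2GRAY)
enhanced = enhance_if_low_contrast(gray)
```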
After the image enhancement processing is performed on the user environment image, denoising processing may be further performed on the user environment image after the image enhancement processing.
After the image enhancement processing is performed on the user environment image, face detection can be performed on the user environment image. Face detection is further described below.
The face detection method can be performed by adopting a sliding window method. Specifically, the sliding window moves in the user environment image in preset steps, the classifier performs face recognition on an image area in the sliding window based on the external outline of the face, and when a shape matched with the external outline of the face exists in the image area, the image area is classified into the face, which represents that the face is detected.
The sliding window may be regarded as a face detection box. Since faces vary in size, the size of the sliding window is scaled to match faces of different sizes. In the process of detecting faces with a sliding window, a face detection method based on the Histogram of Oriented Gradients (HOG) may be used to detect faces in the user environment image; a face detection method based on Haar-like features may also be used.
Of course, since the human face has its special structural and textural features, the embodiment of the present invention may also use a deep neural network to detect the human face in the user environment image.
The category of deep neural networks includes, but is not limited to: a Multi-Task cascaded convolutional Neural Network (MTCNN for short) and a MobileNet-SSD.
In the embodiment of the present invention, the MTCNN may be used to perform face detection on an input user environment image. The MTCNN may detect a face in the user environment image and identify an area in which the detected face is located using a face detection frame in the user environment image.
The MTCNN is a face detection deep learning model based on multi-task cascade CNN, and face frame regression and face key point (mark point) detection are comprehensively considered in the model. The user environment image input into the MTCNN can be scaled into user environment images with different sizes according to different scaling ratios, so that a characteristic pyramid of the image is formed, and faces with different sizes can be detected. MTCNN comprises three cascaded subnetworks, called PNet, RNet and ONet, respectively. Wherein, for each scale of the user environment image, PNet, RNet and ONet are respectively used for:
the PNet generates a regression vector of a candidate window and a bounding box for marking a face region according to an input user environment image; calibrating the generated candidate window by using the regression vector of the bounding box; and performing first deduplication processing on the calibrated candidate frame opening through a first Non-maximum suppression (NMS) algorithm to obtain a PNet deduplication candidate window.
RNet firstly uses the regression vector of the boundary frame to calibrate a candidate window subjected to PNet de-weight; and then, carrying out second-time duplicate removal processing on the calibrated candidate window by utilizing a second NMS algorithm to obtain the RNet duplicate-removed candidate window. In this way, further screening of candidate windows subject to PNet deduplication is achieved.
The ONet function is similar to the RNet function, and the regression vector of the bounding box is firstly utilized to calibrate a candidate window subjected to RNet de-weighting; and carrying out third-time de-duplication processing on the calibrated candidate window by using a third NMS algorithm, and simultaneously generating five marked point positions while removing the overlapped candidate window. In this way, while the ONet further screens the candidate windows subjected to RNet de-duplication, five marker points are detected on the face framed by each candidate window. The marking points refer to characteristic points marked at preset positions of the human face. The five marker points include: the mark points are respectively marked on the two pupils, the nose and the two corners of the mouth.
The overlap degree (IOU for short) set in the first NMS algorithm, the second NMS algorithm and the third NMS algorithm is different, the IOU is the first NMS algorithm, the second NMS algorithm and the third NMS algorithm from large to small, and therefore the PNet, the RNet and the ONet can finish the duplication elimination of the candidate windows from coarse to fine.
Since the user environment image input to the MTCNN is scaled according to different scaling ratios to form an image pyramid, that is, an image of multiple scales, and then the PNet, the RNet, and the ONet respectively perform face detection on the user environment image of each scale, it is necessary to normalize all candidate windows to the user environment image of the original size after face detection. For example: if the scale of some user environment images is twice of the original scale, then when the user environment images return to the original size, the candidate window needs to be normalized to the original size, that is, the size of the candidate window needs to be divided by 2. The candidate windows on multiple scales are normalized to the original scale for comparability.
In the present embodiment, before detecting a face in a user environment image based on a deep neural network, a face detection network MTCNN for face detection needs to be trained. Further, the training of the MTCNN includes: pre-training the MTCNN using an open-source face data set so as to pre-train weights in the MTCNN; the MTCNN is retrained using a pre-collected oriented face data set to perform fine-tune (fine-tune) training on weights in the MTCNN, so that the MTCNN can better detect a face image similar to the face type distribution of the oriented face data set. Face types, including but not limited to: age layer of the face, gender of the face, and skin color of the face.
Open-source face data sets include, but are not limited to, VGG-Face and FDDB. Such open-source data sets cover a very wide range of faces but lack pertinence: they contain faces of all ethnicities, with Caucasian faces predominating. The directional face data set is a set of face images of preset face types collected according to the characteristics of the application scenario; for example, the images in the directional face data set may be dominated by East Asian faces.
Whether for pre-training or fine-tuning, face images from the face data sets (the open-source face data set and the directional face data set) are input into the MTCNN, the MTCNN is used to detect the faces in those images, and the detection results are compared with the results pre-labelled for the images. If the detection result of the MTCNN is the same as the pre-labelled result, the MTCNN has classified (i.e. recognized) the sample correctly. When the recognition accuracy of the MTCNN no longer improves, the MTCNN is considered to have converged. The recognition accuracy is the number of correct recognitions ÷ (the number of correct recognitions + the number of incorrect recognitions).
After the MTCNN converges, the MTCNN may perform face detection on the user environment image after image enhancement.
The user environment image is input to the trained MTCNN. The user environment image input to the MTCNN network may or may not include a human face. When the user environment image does not contain the face, the output result of the MTCNN network is null; when the user environment image contains a face, the MTCNN network outputs the user environment image containing a face detection frame (identifying a face region). When a face is detected to appear in the user environment image, the face is framed by a face detection frame. When a plurality of faces are detected to appear in the user environment image, each face is framed out by one face detection frame.
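As a sketch of how such a detector might be invoked in practice, the open-source `mtcnn` Python package (not necessarily the implementation or weights used in the patent) returns, for each detected face, a bounding box and the five marker points described above:

```python
# Illustrative sketch using the open-source `mtcnn` package (pip install mtcnn).
# This is not necessarily the MTCNN implementation or training used in the patent.
import cv2
from mtcnn import MTCNN

detector = MTCNN()
bgr = cv2.imread("user_environment.jpg")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)            # the package expects RGB

detections = detector.detect_faces(rgb)               # empty list if no face
for det in detections:
    x, y, w, h = det["box"]                           # face detection frame
    keypoints = det["keypoints"]                      # pupils, nose, mouth corners
    cv2.rectangle(bgr, (x, y), (x + w, y + h), (0, 255, 0), 2)
    for px, py in keypoints.values():
        cv2.circle(bgr, (px, py), 2, (0, 0, 255), -1)

if not detections:
    print("no face detected -> prompt re-acquisition")
```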
If the human face is detected in the user environment image and the average brightness value of the user environment image is within the brightness value range, the user environment image is determined to be an effective human face image, and then the attitude angle of the target human face in the effective human face image can be determined.
Fig. 4 is a flowchart illustrating the steps of determining the attitude angle according to an embodiment of the present invention.
And step S410, marking points in the effective face image according to the target face.
The posture of the face includes a pitch angle (pitch angle) of a face which heads down in a three-dimensional space, a yaw angle (yaw angle) of the face which is deviated to the left or right side, and an angle (rotation angle) of the face which rotates counterclockwise or clockwise in a plane. The estimation of the attitude angle of the target face is completed depending on the mark points of each part of the target face, and the more the mark points are, the finer the mark points are, and the more accurate the estimated attitude angle is.
In this embodiment, when determining the pose angle of the target face, the target face in the effective face image may be marked with 5 marker points, either using the 5 marker points output by the MTCNN or using the 5-point landmark model of the open-source machine learning library dlib. Of course, in order to improve the accuracy of the pose estimation, the 68-point landmark model in dlib may also be used, i.e. 68 marker points are marked on the target face.
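A minimal sketch of marking the 68 dlib landmark points on a detected face; it assumes the publicly distributed `shape_predictor_68_face_landmarks.dat` model file is available locally.

```python
# Illustrative sketch: 68-point landmarking with dlib.
# Assumes dlib and the public 68-landmark model file; paths are placeholders.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

gray = cv2.cvtColor(cv2.imread("valid_face.jpg"), cv2.COLOR_BGR2GRAY)
faces = detector(gray)
if faces:
    shape = predictor(gray, faces[0])                      # landmarks of the target face
    landmarks = np.array([(shape.part(i).x, shape.part(i).y)
                          for i in range(shape.num_parts)])  # 68 (x, y) points
```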
Step S420, acquiring a preset three-dimensional human head portrait model; wherein mark points are marked on the face of the three-dimensional human head portrait model, and the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are of the same number and type when considered in the same dimensional space.
The type of the mark point can reflect the position of the mark point on the face. For example: the mark point positioned in the heart of the eyebrow can represent the mark point between the eyebrows.
Saying that the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are of the same type in the same dimensional space means the following: after the mark points of the target face are converted into three-dimensional space, they are of the same type as the mark points on the face of the three-dimensional human head portrait model; or, after the mark points on the face of the three-dimensional human head portrait model are converted into two-dimensional space, they are of the same type as the mark points of the target face. Therefore, every mark point marked on the target face has a corresponding mark point at the corresponding position on the face of the three-dimensional human head portrait model.
If the face of the three-dimensional human head portrait model is marked with 5 marking points, marking of the 5 marking points can be carried out aiming at the target face; if 68 marking points are marked on the face of the three-dimensional human head portrait model, marking the 68 marking points aiming at the target human face.
And step S430, determining the attitude angle of the target human face according to the mark points in the three-dimensional human head portrait model and the mark points aiming at the target human face in the effective human face image.
And rotating the three-dimensional human head portrait model in three directions to enable the N marking points of the target human face to be superposed (or approximately superposed) with the N marking points in the three-dimensional human head portrait model, so that the posture of the three-dimensional human head portrait model is the posture of the target human face.
In this way, the pose angle estimation problem of the target face can be converted into the following optimization problem:
the attitude angle of the three-dimensional human head portrait model is assumed to be (theta, psi, phi), and the attitude angle, the deflection angle and the rotation angle are correspondingly arranged in sequence. As shown in fig. 5, with the camera (camera) parameters fixed, the rotation matrix R and translation vector t from the world coordinate system to the camera coordinate system are solved. The world coordinate system is a three-dimensional coordinate system where the three-dimensional human head portrait model is located, and the camera coordinate system is a plane coordinate system (two-dimensional coordinate system) where the target human face in the effective human face image is located.
And after the rotation matrix R and the translation vector t are obtained, carrying out Euler angle conversion on the rotation matrix R and the translation vector t to obtain a pitch angle, a deflection angle and a rotation angle of the target face.
Specifically, after N marker points are marked on the target face, each marker point on the target face is the projection of one marker point on the face of the three-dimensional human head portrait model. Let the three-dimensional coordinate of marker point i on the face of the three-dimensional human head portrait model be P_i, let its imaging coordinate (two-dimensional coordinate) in the plane of the target face be f(P_i; R, t), and let the two-dimensional coordinate of the real projection point be p_i. To obtain the rotation matrix R and the translation vector t, it suffices to solve the following minimum projection mean square error problem:
min over (R, t) of (1/N) × Σ_{i=1..N} || f(P_i; R, t) − p_i ||²
This minimum projection mean square error can be solved approximately by the Levenberg-Marquardt optimization method, whose idea is as follows: the three-dimensional human head portrait model is adjusted slightly and the coordinates of its marker points projected onto the image plane (the plane where the target face lies) are recomputed, until the projection mean square error reaches a minimum. In practical engineering applications, the coordinate set of the marker points of the three-dimensional human head portrait model face on the image plane is obtained with a standard camera, the camera parameters (initial R and t) and the camera focal length are calibrated, and then functions such as solvePnP of the open-source computer vision library OpenCV are called to complete the pose estimation of the target face.
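A sketch of this OpenCV-based pose estimation follows; the 3D reference coordinates of the marker points and the camera intrinsics are illustrative placeholders, not values from the patent.

```python
# Illustrative sketch: pose estimation of the target face with cv2.solvePnP.
# The 3D reference coordinates and camera intrinsics below are placeholders.
import cv2
import numpy as np

# Five reference marker points of a generic 3D head model (assumed values, mm):
model_points = np.array([
    (-30.0,  35.0, -30.0),   # left pupil
    ( 30.0,  35.0, -30.0),   # right pupil
    (  0.0,   0.0,   0.0),   # nose tip
    (-25.0, -30.0, -30.0),   # left mouth corner
    ( 25.0, -30.0, -30.0),   # right mouth corner
], dtype=np.float64)

# Corresponding 2D marker points detected in the effective face image (pixels):
image_points = np.array([(250, 210), (330, 208), (292, 260),
                         (258, 310), (325, 308)], dtype=np.float64)

h, w = 480, 640                                    # image size (assumed)
focal = w                                          # rough focal-length guess
camera_matrix = np.array([[focal, 0, w / 2],
                          [0, focal, h / 2],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))                     # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                              camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)                         # rotation matrix

# Euler angles (pitch, yaw, roll) from the rotation matrix, in degrees.
pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
yaw = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
print(pitch, yaw, roll)
```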
After the attitude angle of the target face is obtained, it is compared with the preset attitude angle range. If it is within the preset attitude angle range, the target face is considered valid; the target face in the effective face image may then be cropped so that only the face region of the target face is retained, yielding a face image of the target face, and the structural measurement features of the target face are extracted from this face image.
In this embodiment, before extracting the structure metric feature of the target face, a face alignment operation is performed on the target face. A face alignment operation comprising: and performing attitude angle compensation through affine transformation to enable the human face to be transformed into a front face or an approximate front face, wherein the operations are called human face alignment, and after the human face alignment operation, the structural measurement features of the target human face can be extracted.
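A minimal sketch of one common face-alignment approach (rotating the image so the line between the pupils is horizontal); this is one simple affine-transformation variant, not necessarily the exact attitude-angle compensation used in the patent.

```python
# Illustrative sketch: in-plane face alignment by rotating the image so that
# the two pupils lie on a horizontal line. Assumes the pupil coordinates are
# already known (e.g. from the marker points detected earlier).
import cv2
import numpy as np

def align_by_eyes(image: np.ndarray, left_eye, right_eye) -> np.ndarray:
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # in-plane rotation angle
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)        # rotate around eye midpoint
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, M, (w, h))
```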
Fig. 6 is a flowchart illustrating the steps of structure metric feature extraction according to an embodiment of the present invention.
And step S610, marking points in the effective face image according to the target face.
And S620, extracting the key points of the face structure of the target face according to the mark points of the target face.
Step S630, extracting the structural measurement characteristics corresponding to the target face according to the face structural key points of the target face.
The key points of the face structure refer to mark points for positioning the face structure. Structural key points of the face, including but not limited to: and the marking points are used for positioning key area positions such as eyebrows, eyes, a nose, a mouth, face contours and the like.
The structural measurement feature refers to the structural feature of a human face. Further, the structure metric features are multi-dimensional feature vectors. For example: size, angle, etc. of the five sense organs.
Specifically, the step of marking the face region of the target face is similar to the marking performed when the pose angle is determined, but in order to better capture the structural information of the target face, the model used in this embodiment is the 68-point landmark model in dlib; the 68 marker points can outline each part of the target face, for example the eyebrow shape, the eyes, the nose, the mouth and the face contour. If 68 marker points were already marked on the target face during effective-face recognition, those existing marker points can be reused for marking the target face in the effective face image.
Further, according to the relationship between the BMI and the face structure, other marker points may also be marked in the face region of the target face, for example the chengjiang point (an acupoint below the lower lip).
The chengjiang point lies in the depression between the lower edge of the lower lip and the tip of the chin (called the "ground pavilion"), and this depression is closely related to the structural measurement features of the lower part of the face. It may be taken as the 69th marker point. Further, the chengjiang point generally lies on the line segment between the lower lip and the ground pavilion, at the bottom of the depression, and the bottom of the depression is usually the point of lowest brightness on that segment. Let the quartile points of the line segment from the lower edge of the lower lip to the ground pavilion be a, b and c in sequence; the point with the lowest brightness on the sub-segment ab is searched for and taken as the chengjiang point.
Thus, 69 markers are found on the target face, as shown in fig. 7, which is a schematic diagram of markers according to an embodiment of the present invention.
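A sketch of locating this 69th marker point as the darkest pixel on the sub-segment ab of the lip-to-chin segment; the dlib landmark indices used (57 for the lower lip, 8 for the chin tip) are assumptions for illustration.

```python
# Illustrative sketch: locate the chengjiang (69th) marker point as the darkest
# pixel on the sub-segment between the quartile points a and b of the segment
# from the lower lip to the chin. The dlib indices 57 and 8 are assumptions.
import numpy as np

def find_chengjiang(gray: np.ndarray, landmarks: np.ndarray) -> tuple:
    lip = landmarks[57].astype(np.float64)     # lower edge of the lower lip
    chin = landmarks[8].astype(np.float64)     # "ground pavilion" (chin tip)
    ts = np.linspace(0.25, 0.5, num=20)        # quartile points a (1/4) to b (1/2)
    candidates = [lip + t * (chin - lip) for t in ts]
    brightness = [gray[int(round(p[1])), int(round(p[0]))] for p in candidates]
    darkest = candidates[int(np.argmin(brightness))]
    return int(round(darkest[0])), int(round(darkest[1]))
```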
The 69 marked points are used as initial marked points to extract face structure key points, as shown in table 1 below, but it should be understood by those skilled in the art that the face structure key points in table 1 are only for illustrating the embodiment and are not used to limit the embodiment.
TABLE 1 (reproduced as an image in the original publication)
Table 1 has three columns, which respectively give the name of each face structure key point, its label, and the serial numbers of the marker points used to obtain it (or the method for deriving it from the marker points). Many face structure key points can be extracted from the 69 marker points; according to the relative proportions of the face in the transverse and longitudinal directions and the distribution of the marker points on the face, this embodiment extracts 25 face structure key points, which are used for the structural measurement feature extraction that follows.
And extracting facial structure measurement characteristics according to the extracted facial structure key points. A series of structure metric features can be extracted from the 25 facial structure key points extracted in the previous step, so that the face can be encoded into a corresponding structure metric feature vector.
The basic principle of selection of structural metric features is that these structural features have definite meanings and are closely related to BMI. Various structural metric features can be randomly extracted, and whether the structural metric features are closely related to the BMI or not can be determined in the process of training the body mass index prediction model. The close association of the structural metric features with the BMI means that: the BMI predicted by using the structural metrology feature is more accurate, while the BMI predicted without using the structural metrology feature is not accurate enough. After determining the structural metric features that are closely related to the BMI, the structural key points of the face that need to be extracted can be determined.
Various structural measurement features can be extracted from the face structure key points. As shown in table 2, 23 structural measurement features f0–f22 are extracted from the 25 extracted face structure key points, so that a 23-dimensional structural measurement feature vector is obtained; it should be understood by those skilled in the art, however, that these 23 structural measurement features serve only as a reference for structural measurement feature extraction. Because all the distance-type structural measurement features are computed from image pixels, in order to keep the dimensions uniform, every distance-type structural measurement feature is normalized by the face width FF_, while the ratio-type and angle-type structural measurement features are left unchanged.
TABLE 2 (reproduced as an image in the original publication)
The structure measurement feature of any image of a human face can be encoded through table 2, so that a 23-dimensional structure measurement feature vector representation of the human face is obtained.
In the embodiment, the 23-dimensional structure measurement feature vector can be used instead of the original face pixel to participate in the training and prediction of the body mass index prediction model, so that the calculation efficiency can be greatly improved.
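Since Table 2 is reproduced only as an image, the following sketch shows the general form of such a feature encoding with a few illustrative features (a normalized width, a width-to-height ratio, a jaw angle); the specific features f0–f22 of the patent are not reproduced here.

```python
# Illustrative sketch: encoding a face into a small structure-metric feature
# vector from 68-point landmarks. The features below are EXAMPLES of the kind
# of distance / ratio / angle measurements described in the text, not the
# actual f0..f22 defined in Table 2 of the patent.
import numpy as np

def dist(a, b):
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def angle_at(vertex, p1, p2):
    """Angle (degrees) at `vertex` formed by the rays towards p1 and p2."""
    v1 = np.asarray(p1, float) - np.asarray(vertex, float)
    v2 = np.asarray(p2, float) - np.asarray(vertex, float)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

def example_structure_features(lm: np.ndarray) -> np.ndarray:
    """lm: (68, 2) landmark array (dlib indexing assumed)."""
    face_width = dist(lm[0], lm[16])                    # jaw corner to jaw corner
    face_height = dist(lm[8], (lm[19] + lm[24]) / 2)    # chin to mid-brow
    features = [
        dist(lm[3], lm[13]) / face_width,               # lower-face width, normalized
        face_height / face_width,                       # height-to-width ratio
        angle_at(lm[8], lm[4], lm[12]),                 # jaw (chin) angle, degrees
    ]
    return np.array(features, dtype=np.float32)
```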
According to the extracted structural measurement feature vector of the target face, a body mass index prediction model can be trained in advance to complete a BMI prediction task.
The body mass index prediction model includes, but is not limited to: an eXtreme Gradient Boosting (XGBoost) model, a linear regression model, a Support Vector Machine (SVM) model, or a deep learning network.
Because there is no open-source data set for predicting BMI, a data set for predicting BMI needs to be constructed. For example, a questionnaire system is built to collect face images (for example hundreds of thousands of face images), and for each face image the user reports the corresponding sex, age, height and weight, so that face images together with BMI labels for those faces can be obtained.
In order to reduce the negative influence of data noise on the accuracy of the model, valid data are screened from the collected data during construction of the data set and used as training samples. Further, several information verification steps are set up first, and data meeting preset requirements are screened automatically, for example: the face image contains a face, the face is frontal, and the average brightness value of the face image is within the average brightness value range. Then part of the automatically screened data is randomly selected for manual review, to determine whether the screened samples meet the preset requirements. Finally, the data attributes can be filtered, for example retaining only data whose age lies within a preset age range. Only data that pass all of these screens are regarded as valid samples.
The process of BMI prediction will be described below, taking the XGBoost model as an example of the body mass index prediction model.
XGboost is a Boosting-based machine learning method. The XGBoost enhances the Classification performance by integrating Classification and regression tree (CART). The XGboost optimizes the structure and weight of the tree by using a random gradient descent method, and has good training speed and precision. XGBoost can be used for both classification and regression, and because the feature space of the BMI output is a continuous positive real space, the BMI prediction is a typical regression problem. In the embodiment of the invention, the XGboost model can be an XGboost Regressor model.
Before XGboost is trained, a dataset is constructed and divided into three categories of sub-datasets. The three categories of sub data sets include: training the data set, validating the data set, and testing the data set.
The training dataset is used to train the structure and weights of the XGBoost model and may account for 60% of the valid samples in the dataset. Each sample is labelled, and the label may be the correct BMI. The data of a sample comprise the structural metric features identified from the face in that sample.
The validation dataset is used to tune the hyper-parameters of the XGBoost model; these hyper-parameters include the maximum tree depth of the CART trees in XGBoost. The validation dataset may account for 20% of the valid samples in the dataset. Each sample is labelled, and the label may be the correct BMI. The data of a sample comprise the structural metric features identified from the face in that sample.
The test dataset is used to test the accuracy of the XGBoost model in predicting BMI and may account for 20% of the valid samples in the dataset. Each sample is labelled, and the label may be the correct BMI. The data of a sample comprise the structural metric features identified from the face in that sample.
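As a rough sketch of the 60%/20%/20% split described above (the sample count and BMI label range below are synthetic placeholders, not data from the patent):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 1000 samples of 23-dimensional structural metric
# vectors with BMI labels; in practice these come from the screened dataset.
rng = np.random.default_rng(0)
X = rng.random((1000, 23)).astype(np.float32)
y = rng.uniform(16.0, 35.0, size=1000).astype(np.float32)

# 60% training, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```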
FIG. 8 is a flowchart illustrating the steps of training a body mass index prediction model according to an embodiment of the present invention.
Step S810, setting an initial value of the maximum tree depth of the CART trees in the XGBoost model.
Step S820, training the structure and weights of the XGBoost model with a preset training dataset.
Each sample in the training dataset is input into the XGBoost model in turn, and the predicted BMI output by the model is compared with the correct BMI labelled on the sample. If the predicted BMI equals the correct BMI, the next sample is input into the XGBoost model; otherwise, the structure and weights of the XGBoost model are adjusted.
Step S830, verifying the trained structure and weights of the XGBoost model with a preset validation dataset, and performing the current adjustment of the maximum tree depth according to the verification result.
The root mean square error (RMSE) may be used to determine whether the trained structure and weights are appropriate. Assume there are m samples, the BMI value predicted by the XGBoost model for the k-th sample is $BMI'_k$, and the true value of that sample is $BMI''_k$; the root mean square error of the XGBoost model is then:

$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(BMI'_k-BMI''_k\right)^{2}}$$
If the root mean square error is smaller than a preset model error threshold, this round of training of the XGBoost model is complete and its performance can then be evaluated with the test dataset; if the root mean square error is greater than or equal to the model error threshold, the process jumps back to step S820 and the XGBoost model continues to be trained with the training dataset.
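A minimal sketch of this RMSE check; the error threshold used below is a placeholder, since the patent does not specify its value:

```python
import numpy as np

def rmse(bmi_pred, bmi_true):
    """Root mean square error between predicted and labelled BMI values."""
    bmi_pred = np.asarray(bmi_pred, dtype=np.float64)
    bmi_true = np.asarray(bmi_true, dtype=np.float64)
    return float(np.sqrt(np.mean((bmi_pred - bmi_true) ** 2)))

MODEL_ERROR_THRESHOLD = 3.0  # placeholder threshold, not a value from the patent

if rmse([22.1, 27.4], [21.5, 28.0]) < MODEL_ERROR_THRESHOLD:
    print("validation RMSE below threshold: this round of training is done")
else:
    print("validation RMSE too high: continue training (back to step S820)")
```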
When the maximum tree depth is adjusted, 1 may be added to the value from the previous adjustment.
Step S840, determining, with a preset grid search algorithm, whether the maximum tree depth from the previous adjustment is the optimal maximum tree depth; if yes, go to step S850; if not, go to step S820.
If the accuracy of the XGBoost model in predicting BMI after the current adjustment of the maximum tree depth is higher than after the previous adjustment, the maximum tree depth continues to be adjusted. If the accuracy after the current adjustment is lower than after the previous adjustment, further increasing the depth degrades the prediction accuracy, and the maximum tree depth from the previous adjustment is determined to be the optimal maximum tree depth.
Step S850, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth.
The XGBoost model configured with the optimal maximum tree depth is then tested with a preset test dataset to determine its performance metric value. If the performance metric value of the XGBoost model is within a preset performance range, training of the XGBoost model is complete; otherwise, the process jumps back to step S820. The preset performance range may be an empirical value or a value obtained through experiments.
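Continuing the split sketch above, the following is a hedged illustration of the depth search using the xgboost package's XGBRegressor; the candidate depth range, the number of trees, and the stopping rule (stop as soon as validation RMSE stops improving) are assumptions rather than values from the patent:

```python
import numpy as np
from xgboost import XGBRegressor

def eval_rmse(model, X, y):
    """RMSE of a fitted model on a labelled dataset."""
    return float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))

best_depth, best_rmse = None, np.inf
for depth in range(2, 11):                                    # candidate maximum tree depths
    model = XGBRegressor(max_depth=depth, n_estimators=200, random_state=0)
    model.fit(X_train, y_train)                               # train structure and weights (step S820)
    val_rmse = eval_rmse(model, X_val, y_val)                 # validate this depth (step S830)
    if val_rmse < best_rmse:                                  # accuracy improved: keep adjusting
        best_depth, best_rmse = depth, val_rmse
    else:                                                     # accuracy dropped: previous depth is optimal
        break

final_model = XGBRegressor(max_depth=best_depth, n_estimators=200, random_state=0)
final_model.fit(X_train, y_train)                             # retrain at the optimal depth (step S850)
print("optimal max_depth:", best_depth,
      "test RMSE:", eval_rmse(final_model, X_test, y_test))   # final check on the test dataset
```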
The performance metric of the XGBoost model may be the root mean square error, calculated as described in step S830.
In this embodiment, to augment the features, a random perturbation factor of at most 0.1 may be added to the structural metric feature of each dimension of the structural metric feature vector.
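One possible reading of this augmentation step is sketched below; whether the perturbation is additive or relative is not specified in the text, so an additive perturbation drawn uniformly from [-0.1, 0.1] is assumed:

```python
import numpy as np

def augment(feature_vector, max_perturbation=0.1, rng=None):
    """Return a perturbed copy of a structural metric feature vector.

    A random perturbation with magnitude at most ``max_perturbation`` is added
    to every dimension; an additive interpretation is assumed here.
    """
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.uniform(-max_perturbation, max_perturbation, size=len(feature_vector))
    return np.asarray(feature_vector, dtype=np.float64) + noise
```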
In this embodiment, the XGBoost model is updated iteratively as the dataset grows, so that its accuracy and overall performance keep improving.
This embodiment also provides a body mass index prediction device based on a face structure metric. Fig. 9 is a block diagram of a body mass index prediction apparatus based on a face structure metric according to an embodiment of the present invention.
In this embodiment, the body mass index prediction device based on the face structure metric includes, but is not limited to, a processor 910 and a memory 920.
The processor 910 is configured to execute a human face structure metric-based body mass index prediction program stored in the memory 920 to implement the human face structure metric-based body mass index prediction method described above.
Specifically, the processor 910 is configured to execute a face structure metric-based body mass index prediction program stored in the memory 920 to implement the following steps: collecting effective face images; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; and inputting the structural measurement characteristics of the target face into a pre-trained body weight index prediction model, and acquiring the body weight index corresponding to the target face output by the body weight index prediction model.
Wherein, the collecting of the effective face image comprises: collecting a user environment image; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is an effective face image; and if the average brightness value of the user environment image is not within the brightness value range, or no face is detected in the user environment image, issuing a re-acquisition prompt.
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
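A minimal OpenCV/NumPy sketch of this brightness screening and gamma enhancement; the brightness range, standard-deviation threshold, and gamma value are placeholders, not values taken from the patent:

```python
import cv2
import numpy as np

def screen_and_enhance(image_bgr,
                       brightness_range=(60.0, 200.0),  # placeholder average-brightness range
                       std_threshold=40.0,              # placeholder brightness-std threshold
                       gamma=0.7):                      # placeholder gamma value
    """Screen a user environment image by brightness and optionally enhance it.

    Returns the (possibly enhanced) image, or None when the caller should
    prompt the user to re-acquire the image.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mean_brightness, std_brightness = float(gray.mean()), float(gray.std())

    if not (brightness_range[0] <= mean_brightness <= brightness_range[1]):
        return None                                     # average brightness out of range

    if std_brightness < std_threshold:                  # low contrast: apply a gamma transform
        table = np.array([((i / 255.0) ** gamma) * 255 for i in range(256)], dtype=np.uint8)
        image_bgr = cv2.LUT(image_bgr, table)

    return image_bgr                                    # suitable for subsequent face detection
```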
Wherein, the determining of the attitude angle of the target face in the effective face image comprises: marking points for the target face in the effective face image; acquiring a preset three-dimensional human head portrait model, wherein mark points are marked on the face of the three-dimensional human head portrait model, and the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are the same in number and in type within the same dimensional space; and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points for the target face in the effective face image.
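One common way to realise this step is a perspective-n-point (PnP) fit between the 2D mark points and the corresponding 3D head-model points; the sketch below uses OpenCV's solvePnP with an assumed pinhole camera (focal length set to the image width, no distortion) and is an illustration rather than the patent's exact procedure:

```python
import cv2
import numpy as np

def face_pose_angles(image_points_2d, model_points_3d, image_size):
    """Estimate the (pitch, yaw, roll) attitude angles of the target face, in degrees.

    ``image_points_2d``: N x 2 mark points detected on the target face;
    ``model_points_3d``: the corresponding N x 3 mark points on the
    three-dimensional head model; ``image_size``: (height, width) of the image.
    """
    h, w = image_size
    camera_matrix = np.array([[w, 0, w / 2.0],
                              [0, w, h / 2.0],
                              [0, 0, 1.0]], dtype=np.float64)   # assumed pinhole camera
    dist_coeffs = np.zeros((4, 1))                              # assume no lens distortion

    ok, rvec, _tvec = cv2.solvePnP(np.asarray(model_points_3d, dtype=np.float64),
                                   np.asarray(image_points_2d, dtype=np.float64),
                                   camera_matrix, dist_coeffs)
    if not ok:
        return None

    rotation, _ = cv2.Rodrigues(rvec)                           # rotation vector -> rotation matrix
    projection = np.hstack((rotation, np.zeros((3, 1))))
    euler_angles = cv2.decomposeProjectionMatrix(projection)[-1]
    pitch, yaw, roll = (float(a) for a in euler_angles.flatten())
    return pitch, yaw, roll
```

The resulting angles can then be compared with the preset attitude angle range before feature extraction proceeds.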
Wherein, the method further comprises: if the attitude angle of the target face is within the preset attitude angle range, performing a face alignment operation on the target face before the structural metric features of the target face are extracted from the effective face image.
Wherein, extracting the structural metric features of the target face from the effective face image comprises the following steps: marking points for the target face in the effective face image; extracting face structure key points of the target face according to the mark points of the target face; and extracting the structural metric features corresponding to the target face according to the face structure key points of the target face.
Wherein, the body mass index prediction model comprises the following types: an extreme gradient lifting XGboost model, a linear regression model, a Support Vector Machine (SVM) model or a deep learning network.
If the body weight index prediction model is an XGBoost model, before the structural measurement features of the target face are input into the pre-trained body weight index prediction model, the method further comprises: training the body mass index prediction model; wherein training the body mass index prediction model comprises: step 2, setting an initial value of the maximum tree depth of the classification and regression (CART) trees in the XGBoost model; step 4, training the structure and weights of the XGBoost model with a preset training dataset; step 6, verifying the trained structure and weights of the XGBoost model with a preset validation dataset, and performing the current adjustment of the maximum tree depth according to the verification result; and step 8, determining, with a preset grid search algorithm, whether the maximum tree depth from the previous adjustment is the optimal maximum tree depth; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
After the maximum tree depth of the CART trees in the XGBoost model is set to the optimal maximum tree depth, the method further comprises: testing the XGBoost model set to the optimal maximum tree depth with a preset test dataset, and determining the performance metric value of the XGBoost model; and finishing the training of the XGBoost model if the performance metric value of the XGBoost model is within a preset performance range.
The embodiment of the invention also provides a storage medium. The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; and it may also comprise a combination of the above types of memory.
One or more programs in the storage medium are executable by one or more processors to implement the above-described method for weight index prediction based on facial structure metrics.
Specifically, the processor is configured to execute a face structure metric-based body mass index prediction program stored in the memory to implement the following steps: collecting effective face images; determining the attitude angle of a target face in the effective face image; if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image; and inputting the structural measurement characteristics of the target face into a pre-trained body weight index prediction model, and acquiring the body weight index corresponding to the target face output by the body weight index prediction model.
Wherein, the collecting of the effective face image comprises: collecting a user environment image; determining an average brightness value of the user environment image; if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image; if a face is detected in the user environment image, determining that the user environment image is an effective face image; and if the average brightness value of the user environment image is not within the brightness value range, or no face is detected in the user environment image, issuing a re-acquisition prompt.
Wherein, before the face detection for the user environment image, the method further comprises: determining an image brightness standard deviation of the user environment image; and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
Wherein, the determining of the attitude angle of the target face in the effective face image comprises: marking points for the target face in the effective face image; acquiring a preset three-dimensional human head portrait model, wherein mark points are marked on the face of the three-dimensional human head portrait model, and the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are the same in number and in type within the same dimensional space; and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points for the target face in the effective face image.
Wherein, the method further comprises: if the attitude angle of the target face is within the preset attitude angle range, performing a face alignment operation on the target face before the structural metric features of the target face are extracted from the effective face image.
Wherein, extracting the structural metric features of the target face from the effective face image comprises the following steps: marking points for the target face in the effective face image; extracting face structure key points of the target face according to the mark points of the target face; and extracting the structural metric features corresponding to the target face according to the face structure key points of the target face.
Wherein, the body mass index prediction model comprises the following types: an extreme gradient lifting XGboost model, a linear regression model, a Support Vector Machine (SVM) model or a deep learning network.
If the body weight index prediction model is an XGBoost model, before the structural measurement features of the target face are input into the pre-trained body weight index prediction model, the method further comprises: training the body mass index prediction model; wherein training the body mass index prediction model comprises: step 2, setting an initial value of the maximum tree depth of the classification and regression (CART) trees in the XGBoost model; step 4, training the structure and weights of the XGBoost model with a preset training dataset; step 6, verifying the trained structure and weights of the XGBoost model with a preset validation dataset, and performing the current adjustment of the maximum tree depth according to the verification result; and step 8, determining, with a preset grid search algorithm, whether the maximum tree depth from the previous adjustment is the optimal maximum tree depth; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
After the maximum tree depth of the CART trees in the XGBoost model is set to the optimal maximum tree depth, the method further comprises: testing the XGBoost model set to the optimal maximum tree depth with a preset test dataset, and determining the performance metric value of the XGBoost model; and finishing the training of the XGBoost model if the performance metric value of the XGBoost model is within a preset performance range.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A body mass index prediction method based on face structure measurement is characterized by comprising the following steps:
collecting effective face images;
determining the attitude angle of a target face in the effective face image;
if the attitude angle of the target face is within a preset attitude angle range, extracting the structural measurement feature of the target face from the effective face image;
and inputting the structural measurement characteristics of the target face into a pre-trained body weight index prediction model, and acquiring the body weight index corresponding to the target face output by the body weight index prediction model.
2. The method of claim 1, wherein the collecting of the effective face images comprises:
collecting an environment image of a user;
determining an average brightness value of the user environment image;
if the average brightness value of the user environment image is within a preset brightness value range, performing face detection on the user environment image;
if a face is detected in the user environment image, determining that the user environment image is an effective face image;
and if the average brightness value of the user environment image is not within the brightness value range, or no face is detected in the user environment image, issuing a re-acquisition prompt.
3. The method of claim 2, further comprising, prior to the performing of face detection on the user environment image:
determining an image brightness standard deviation of the user environment image;
and if the image brightness standard deviation is smaller than a preset image brightness standard deviation threshold value, performing image enhancement processing on the user environment image by utilizing a gamma conversion algorithm.
4. The method of claim 1, wherein the determining of the attitude angle of the target face in the effective face image comprises:
marking points in the effective face image aiming at the target face;
acquiring a preset three-dimensional human head portrait model; wherein mark points are marked on the face of the three-dimensional human head portrait model, and the mark points marked on the face of the three-dimensional human head portrait model and the mark points marked on the target face are the same in number and in type within the same dimensional space;
and determining the attitude angle of the target face according to the mark points in the three-dimensional human head portrait model and the mark points aiming at the target face in the effective face image.
5. The method of claim 1, wherein extracting the structural metric feature of the target face from the effective face image comprises:
marking points in the effective face image aiming at the target face;
extracting face structure key points of the target face according to the mark points of the target face;
and extracting the structural measurement characteristics corresponding to the target face according to the face structural key points of the target face.
6. The method of claim 1, wherein the body mass index prediction model is of a type comprising: an extreme gradient lifting XGboost model, a linear regression model, a Support Vector Machine (SVM) model or a deep learning network.
7. The method of claim 6, wherein, if the body weight index prediction model is an XGBoost model, before inputting the structural metric features of the target face into a pre-trained body weight index prediction model, the method further comprises:
training the body mass index prediction model; wherein training the body mass index prediction model comprises:
step 2, setting an initial value of the maximum tree depth of the classification and regression (CART) trees in the XGBoost model;
step 4, training the structure and weights of the XGBoost model with a preset training data set;
step 6, verifying the trained structure and weights of the XGBoost model with a preset validation data set, and performing the current adjustment of the maximum tree depth according to the verification result;
and step 8, determining, with a preset grid search algorithm, whether the maximum tree depth from the previous adjustment is the optimal maximum tree depth; if so, setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth, and otherwise jumping to step 4.
8. The method of claim 7, further comprising, after setting the maximum tree depth of the CART trees in the XGBoost model to the optimal maximum tree depth:
testing the XGboost model which is set to the optimal maximum tree depth by using a preset test data set, and determining a performance metric value of the XGboost model;
and finishing the training of the XGboost model if the performance metric value of the XGboost model is within a preset performance range.
9. A body mass index prediction device based on a face structure metric, characterized by comprising a processor and a memory; the processor is configured to execute a body weight index prediction program based on the face structure metric stored in the memory, so as to implement the body weight index prediction method based on the face structure metric according to any one of claims 1 to 8.
10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the method for body mass index prediction based on facial structure metrics as claimed in any one of claims 1 to 8.
CN202010209872.4A 2020-03-23 2020-03-23 Body weight index prediction method, device and storage medium based on face structure measurement Pending CN113436735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010209872.4A CN113436735A (en) 2020-03-23 2020-03-23 Body weight index prediction method, device and storage medium based on face structure measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010209872.4A CN113436735A (en) 2020-03-23 2020-03-23 Body weight index prediction method, device and storage medium based on face structure measurement

Publications (1)

Publication Number Publication Date
CN113436735A true CN113436735A (en) 2021-09-24

Family

ID=77752730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010209872.4A Pending CN113436735A (en) 2020-03-23 2020-03-23 Body weight index prediction method, device and storage medium based on face structure measurement

Country Status (1)

Country Link
CN (1) CN113436735A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104299250A (en) * 2014-10-15 2015-01-21 南京航空航天大学 Front face image synthesis method and system based on prior model
CN110869977A (en) * 2017-06-29 2020-03-06 吉斯艾集团有限责任公司 Animal body weight estimation based on regression
CN108875556A (en) * 2018-04-25 2018-11-23 北京旷视科技有限公司 Method, apparatus, system and the computer storage medium veritified for the testimony of a witness
CN108875590A (en) * 2018-05-25 2018-11-23 平安科技(深圳)有限公司 BMI prediction technique, device, computer equipment and storage medium
CN109637664A (en) * 2018-11-20 2019-04-16 平安科技(深圳)有限公司 A kind of BMI evaluating method, device and computer readable storage medium
CN110288546A (en) * 2019-06-27 2019-09-27 华侨大学 A kind of enhancement method of low-illumination image using two-way gamma transformation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Yuhua (何玉花) et al.: "Establishing a prediction model for large-for-gestational-age infants using machine learning methods" (应用机器学习方法建立大于胎龄儿预测模型), Progress in Modern Obstetrics and Gynecology (现代妇产科进展), vol. 28, no. 1, pages 48-50 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708634A (en) * 2022-03-30 2022-07-05 清华大学 Relative weight analysis method and device based on face image and electronic equipment
CN114496263A (en) * 2022-04-13 2022-05-13 杭州研极微电子有限公司 Neural network model establishing method for weight estimation and readable storage medium
CN117150369A (en) * 2023-10-30 2023-12-01 恒安标准人寿保险有限公司 Training method of overweight prediction model and electronic equipment
CN117150369B (en) * 2023-10-30 2024-01-26 恒安标准人寿保险有限公司 Training method of overweight prediction model and electronic equipment

Similar Documents

Publication Publication Date Title
US11775056B2 (en) System and method using machine learning for iris tracking, measurement, and simulation
JP7078803B2 (en) Risk recognition methods, equipment, computer equipment and storage media based on facial photographs
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
JP6664163B2 (en) Image identification method, image identification device, and program
CN105740780B (en) Method and device for detecting living human face
KR102462818B1 (en) Method of motion vector and feature vector based fake face detection and apparatus for the same
CN103914676B (en) A kind of method and apparatus used in recognition of face
JP4743823B2 (en) Image processing apparatus, imaging apparatus, and image processing method
JP2020522807A (en) System and method for guiding a user to take a selfie
CN111539911B (en) Mouth breathing face recognition method, device and storage medium
CN107798279B (en) Face living body detection method and device
CN106056064A (en) Face recognition method and face recognition device
CN113436735A (en) Body weight index prediction method, device and storage medium based on face structure measurement
CN109271930B (en) Micro-expression recognition method, device and storage medium
CN111222380B (en) Living body detection method and device and recognition model training method thereof
JP6071002B2 (en) Reliability acquisition device, reliability acquisition method, and reliability acquisition program
WO2020066257A1 (en) Classification device, classification method, program, and information recording medium
CN111222433A (en) Automatic face auditing method, system, equipment and readable storage medium
Pavoni et al. Semantic segmentation of benthic communities from ortho-mosaic maps
Szankin et al. Influence of thermal imagery resolution on accuracy of deep learning based face recognition
TW201828156A (en) Image identification method, measurement learning method, and image source identification method and device capable of effectively dealing with the problem of asymmetric object image identification so as to possess better robustness and higher accuracy
JP4708835B2 (en) Face detection device, face detection method, and face detection program
WO2015131710A1 (en) Method and device for positioning human eyes
RU2768797C1 (en) Method and system for determining synthetically modified face images on video
Méndez-Llanes et al. On the use of local fixations and quality measures for deep face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination