CN109086711B - Face feature analysis method and device, computer equipment and storage medium

Info

Publication number: CN109086711B
Authority: CN (China)
Prior art keywords: feature, face, feature point, facial, point detector
Legal status: Active
Application number: CN201810844936.0A
Other languages: Chinese (zh)
Other versions: CN109086711A (en)
Inventors: 罗雄文, 高英, 沈雄
Current Assignee: South China University of Technology SCUT
Original Assignee: South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN201810844936.0A
Publication of CN109086711A
Application granted
Publication of CN109086711B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172 Classification, e.g. identification

Abstract

The invention relates to a face feature analysis method and device, computer equipment and a storage medium. The method comprises the following steps: determining a rough position region of each facial part (eyes, eyebrows, nose, mouth, ears) in a target face image by detecting feature points over the whole face of the target face image; determining a precise position region of each facial part by detecting the feature points of that part on the part image corresponding to its rough position region; and performing face feature analysis on each precise position region to obtain face feature information and/or a statistical analysis result. By adopting the method, the accuracy of the feature analysis result can be improved.

Description

Face feature analysis method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of image analysis technologies, and in particular, to a method and an apparatus for analyzing human face features, a computer device, and a storage medium.
Background
The face feature analysis is one of the most critical processes of face recognition, and before a target face image is recognized, the face features in the image need to be analyzed first. The quality of the face feature analysis will directly determine the face recognition effect.
However, current face feature analysis approaches detect facial-part feature points only at the whole-face scale, inside the face alignment algorithm. Detection at this scale is easily disturbed by irrelevant regions, which degrades the accuracy of the feature analysis result.
Disclosure of Invention
In view of the above, it is necessary to provide a face feature analysis method and device, a computer device and a storage medium capable of improving the accuracy of feature analysis results.
A face feature analysis method, the method comprising: determining a rough position region of each facial part in a target face image by detecting face feature points over the whole face of the target face image; determining a precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region; and performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
A face feature analysis device, the device comprising:
a first region detection module, used for determining a rough position region of each facial part in the target face image by detecting face feature points over the whole face of the target face image;
a second region detection module, used for detecting the corresponding facial-part feature points on the part image corresponding to each rough position region, so as to determine the precise position region of each facial part;
and a feature analysis module, used for performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: determining a rough position region of each facial part in a target face image by detecting face feature points over the whole face of the target face image; determining a precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region; and performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of: determining a rough position region of each facial part in a target face image by detecting face feature points over the whole face of the target face image; determining a precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region; and performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
In the face feature analysis method and device, computer equipment and storage medium above, the rough position region of each facial part in the target face image is determined by detecting face feature points over the whole face of the target face image; the precise position region of each facial part is determined by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region; and face feature information and/or a statistical analysis result is obtained by performing face feature analysis according to each precise position region. In this scheme, after the approximate position (the rough position region) of each facial part is located, fine-grained feature point detection at the scale of the part is performed on the part image corresponding to each rough position region, so that the region where the part lies (the precise position region) can be confirmed more accurately. This reduces the interference of irrelevant regions or noise on the subsequent feature extraction from each part image and improves the accuracy of the feature analysis result.
Drawings
Fig. 1 is a schematic diagram of an internal structure of a terminal in one embodiment;
FIG. 2 is a schematic flow chart illustrating a method for analyzing human face features according to an embodiment;
FIG. 3 is a schematic flow chart of a face feature analysis method in another embodiment;
FIG. 4 is a flowchart illustrating the steps of analyzing facial features based on each pinpoint location area, according to one embodiment;
FIG. 5 is a schematic diagram of the components and principles of a face feature analyzer in one embodiment;
FIG. 6 is a schematic diagram illustrating the structure and principle of a human face feature point detector according to an embodiment;
FIG. 7 is a schematic flow chart illustrating the determination of the initial shape by heuristic feature point initialization, under an embodiment;
FIG. 8 is a schematic diagram showing the structure and principle of the eye feature point detector and the mouth feature point detector in one embodiment;
FIG. 9 is a schematic diagram of the training and detection process of the eyebrow feature point detector, the nose feature point detector and the ear feature point detector in one embodiment;
FIG. 10 illustrates the principle of omni-directional image feature extraction using gray level co-occurrence matrices, in one embodiment;
FIG. 11 is a block diagram showing the structure of a face feature analysis apparatus according to an embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first", "second", "third", etc. in the description, claims and drawings of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can be operated in sequences other than those illustrated or described herein. In addition, the term "and/or" describes an association relationship between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The face feature analysis method provided by the invention can be applied to the terminal shown in figure 1. The terminal comprises a processor, a nonvolatile storage medium, a network interface, an internal memory and an input device which are connected through a system bus. The nonvolatile storage medium of the terminal stores an operating system and a face feature analysis device, and the face feature analysis device is used for implementing a face feature analysis method. The processor provides computing and control capability and supports the operation of the whole terminal. The internal memory provides an environment for the operation of the face feature analysis device in the nonvolatile storage medium, and the network interface is used for communicating with a server or other terminals; for example, when the terminal responds to a click operation, a control command can be generated and sent to the server or other terminals. Specifically, the face feature analysis device of the terminal may determine a rough position region of each facial part in the target face image by detecting face feature points over the whole face of the target face image, determine a precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region, and perform face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer or a portable wearable device. It should be noted that fig. 1 shows only one application of the face feature analysis method of the present invention; the method can also be applied to a server, which may be implemented as a stand-alone server or as a server cluster of multiple servers.
In an embodiment, as shown in fig. 2, a face feature analysis method is provided, which is described by taking its application to the terminal in fig. 1 as an example, and comprises the following steps:
Step S201: determining a rough position region of each facial part in a target face image by detecting key feature points on the target face image;
here, the target face image is an image containing a face; the facial-part feature points may include any one or a combination of eye feature points, eyebrow feature points, mouth feature points, nose feature points and ear feature points; whole-face feature points may also be included.
This step detects the facial-part feature points at the scale of the whole face so as to roughly determine the position of each facial part; it is a coarse-precision detection at the face scale.
Step S202: determining the precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region;
for example, eye feature points are detected on the part image corresponding to the rough position region of the eyes, ear feature points on the part image of the ears, eyebrow feature points on the part image of the eyebrows, mouth feature points on the part image of the mouth, and nose feature points on the part image of the nose.
This step detects key feature points at the scale of each facial part; it is a fine-precision detection of the facial-part feature points.
Step S203: performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result;
specifically, a face region can be determined according to the precise position region of each facial part, and face feature statistics are computed from the pixel information of that region.
In this face feature analysis method, the rough position region of each facial part in the target face image is determined by detecting face feature points over the whole face; the precise position region of each facial part is determined by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region; and face feature information and/or statistical analysis results are obtained by performing face feature analysis according to each precise position region. After the approximate position (the rough position region) of each facial part is located, fine-grained feature point detection at the part scale is performed on the corresponding part image, so that the region where the part lies (the precise position region) can be confirmed more accurately. This reduces the interference of irrelevant regions or noise on subsequent feature extraction from each part image and improves the accuracy of the feature analysis result. It also enables comprehensive extraction and analysis of multi-scale face features at the face scale, at the facial-part scale, and even over any local face region.
It should be noted that "rough" and "precise" in this embodiment only distinguish the relative precision of the corresponding position regions; they do not limit that precision to any specific value.
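To make the two-scale idea above concrete, the following sketch derives the rough region of each facial part from whole-face landmarks. It assumes a 68-point landmark scheme; the index grouping and the padding margin are illustrative assumptions of the sketch, not values prescribed by this embodiment (a 68-point scheme carries no ear points):

```python
import numpy as np

# Hypothetical grouping of whole-face landmark indices by facial part,
# following the common 68-point scheme (which has no ear points).
PART_LANDMARKS = {
    "left_eyebrow": range(17, 22),
    "right_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "left_eye": range(36, 42),
    "right_eye": range(42, 48),
    "mouth": range(48, 68),
}

def coarse_part_regions(landmarks, margin=0.25):
    """Derive a rough bounding box for each facial part from the whole-face
    landmarks, padded by a relative margin so the part stays fully enclosed
    when it is cropped and passed to the fine-grained detector."""
    regions = {}
    for part, idx in PART_LANDMARKS.items():
        pts = np.asarray([landmarks[i] for i in idx], dtype=float)
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
        regions[part] = (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
    return regions
```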
In one embodiment, determining the precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region may include: inputting the part images corresponding to the rough position regions into different facial-part feature point detectors according to the category of the corresponding part, and acquiring the precise position region of the corresponding part output by each detector.
In this embodiment, the part images corresponding to the rough position regions are input into different facial-part feature point detectors, and the detectors can be configured differently to meet different detection requirements, thereby improving the flexibility of the application.
In one embodiment, different types of facial-part feature point detectors may be used according to how much the pose and expression of each part varies. For parts with rich pose changes, such as the eyes and the mouth, a feature point detector capable of modeling rich shape changes may be selected; for parts with relatively fixed shapes, such as the eyebrows, nose and ears, a simpler detector based on shape and texture may be selected.
Specifically, the facial-part feature point detectors may include an eye feature point detector, an eyebrow feature point detector, a mouth feature point detector, a nose feature point detector and an ear feature point detector; the eye feature point detector and the mouth feature point detector adopt a preset first type of facial-part feature point detection model, and the eyebrow, nose and ear feature point detectors adopt a preset second type of facial-part feature point detection model.
The first type of feature point detection model may be a two-stage neural network detection model combining a convolutional neural network (CNN) and a deep belief network (DBN), to better model rich feature point shape changes; the second type of feature point detection model may be an active appearance model (AAM) based on shape and texture information, which models the feature point positions for efficient detection of key feature points.
In one embodiment, the two-stage neural network detection model comprises a first-level convolutional neural network and a second-level deep belief network. The first-level convolutional neural network learns the mapping from the original facial-part picture to the initial shape; the second level provides, for each feature point, a deep belief network that fits the correction from the initial position of that feature point to its final position. The first-level convolutional neural network comprises two convolutional layers, two max-pooling layers and one fully connected layer, and the activation function adopts the ReLU function. Each feature point corrects its position with a deep belief network comprising three hidden layers; only the last layer of each deep belief network is a fully connected layer, and all hidden layers before it are restricted Boltzmann machines. Each restricted Boltzmann machine is pre-trained layer by layer with an unsupervised maximum likelihood method, and the result of the layer-by-layer pre-training is fine-tuned through the final fully connected layer.
With the scheme of this embodiment, the Boltzmann machine layers corresponding to different feature points can be trained in parallel, which improves training efficiency. The neural network used by the first type of facial-part feature point detection model can be built and trained with Keras.
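A minimal Keras sketch of the first-level network under the layout just described (two convolutional layers, two max-pooling layers, one fully connected output layer, ReLU activations); the input patch size, filter counts and number of feature points are assumptions made for illustration only:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_first_level_cnn(input_shape=(40, 40, 1), n_points=6):
    """First-level CNN: maps a normalized facial-part patch to an initial
    shape, i.e. the (x, y) coordinates of its n_points feature points."""
    model = keras.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(2 * n_points),  # fully connected layer -> initial shape
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```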
In one embodiment, as shown in fig. 3, a face feature analysis method is provided, which is described by taking its application to the terminal in fig. 1 as an example, and comprises the following steps:
Step S301: performing face angle detection on the target face image to obtain a face angle value;
Step S302: selecting a face feature detector matched with the face angle value;
specifically, a plurality of face feature detectors for detecting face images at different angles may be preset, each associated with a different face angle range; for example, (-15°, 15°) is associated with a first face feature detector, [-60°, -15°] and [15°, 60°] with a second, and (60°, 90°] and [-90°, -60°) with a third. Here (·,·) denotes an open interval, [·,·] a closed interval, and (·,·] and [·,·) half-open intervals. The way the face angle ranges are divided, and the number of ranges, are not limited to this example.
After the face angle value is obtained, the face angle range containing it is determined, and the associated face feature detector is then looked up according to that range.
Step S303: inputting the target face image into the selected face feature detector, and acquiring the rough position regions of the facial parts in the target face image output by the selected detector;
Step S304: detecting the corresponding facial-part feature points on the part images corresponding to the rough position regions, so as to determine the precise position region of each facial part;
Step S305: performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
In this embodiment, before face feature detection is performed, face angle detection is performed on the target face image to obtain a face angle value, a face feature detector matched with that angle value is selected, and the target face image is input into the selected detector, so that a face at any supported angle is handled by a detector suited to it.
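A minimal sketch of the range lookup described above, assuming the example partition; the detector names and the `detectors` mapping are illustrative:

```python
def select_face_detector(angle, detectors):
    """Return the face feature detector whose angle range contains the
    detected face angle; boundaries follow the open/closed intervals in
    the example above."""
    if -15 < angle < 15:
        return detectors["first"]    # (-15, 15)
    if -60 <= angle <= -15 or 15 <= angle <= 60:
        return detectors["second"]   # [-60, -15] and [15, 60]
    if 60 < angle <= 90 or -90 <= angle < -60:
        return detectors["third"]    # (60, 90] and [-90, -60)
    raise ValueError(f"face angle {angle} outside the supported range")
```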
In one embodiment, detecting the face angle of the target face image to obtain the face angle value may include: inputting the target face image into a preset multi-angle face detection model for face angle detection, and acquiring the face angle value output by the model, wherein the multi-angle face detection model comprises a plurality of face classifiers for different face angles.
In one embodiment, the face feature detector adopts a cascade shape regressor. The cascade shape regressor comprises two levels of integration: it takes an initial shape as input and completes the correction of that shape through a cascade of several weak regressors to obtain the final feature point shape. The local texture features of the feature points are fitted with a number of random forests; the local texture feature vector of each feature point corresponds to one random forest, and the mode of each feature value in the vector is computed by one random tree. The occlusion state of the feature points is learned through a shallow-model logistic regression, and a uniform binary feature vector describes the occlusion state of all the feature points.
If the feature point occlusion state vector indicates that the current feature point is occluded, the local texture features of that feature point are not used for fitting the feature point position correction. The face feature point detector used in this embodiment fits the partial change of the feature point position correction through an xgboost regression forest together with the texture information and occlusion states of all feature points. The xgboost regression forest uses a loss function with several regularization terms added, which balances the complexity of the regression tree structure against regression precision, controls the amplitude and precision of the feature point position correction, and improves the convergence speed of the feature point detection algorithm. The machine learning regressors used in the cascade shape regressor are all implemented through the sklearn machine learning standard library.
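A simplified sketch of one weak-regressor stage of this design: the per-point local texture features are assumed to be concatenated into one feature matrix, and the occlusion state vector is assumed to be expanded into a column mask that zeroes the features of occluded points. sklearn's MultiOutputRegressor wraps the xgboost forest because the correction is a multi-output target; this interface is an illustration, not the patent's implementation:

```python
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

class CascadeStage:
    """One weak regressor of the cascade: fits part of the correction
    (delta x, delta y for every feature point) from local texture features,
    with the features of occluded points masked out."""

    def __init__(self):
        # Regularized regression forest: reg_lambda / reg_alpha play the
        # role of the added regularization terms balancing tree complexity
        # against regression precision.
        base = XGBRegressor(n_estimators=100, max_depth=4,
                            reg_lambda=1.0, reg_alpha=0.1)
        self.model = MultiOutputRegressor(base)

    def fit(self, features, occlusion_mask, shape_deltas):
        # Zero the feature columns of occluded points so their local texture
        # does not drive the position correction, as described above.
        self.model.fit(features * (1 - occlusion_mask), shape_deltas)

    def predict(self, features, occlusion_mask):
        return self.model.predict(features * (1 - occlusion_mask))
```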
When a cascade shape regressor is adopted, an initial shape (an initial feature point position sequence) must be input first. The traditional approach randomly generates an initial shape inside the face region, which can make the detection result unstable. For this reason, the present embodiment proposes a method for determining the initial shape input to the cascade shape regressor, which proceeds as follows:
first, the average shape information obtained from the training samples is loaded, and the initial feature point and its position are determined from this information;
the average shape information includes the normalized relative positions of the first marked feature point with respect to the four corner points of the face region, and the relative position information between the first feature point and every other feature point (described by a triple of normalized relative distance, relative angle and pixel difference).
Second, the positions of the other feature points are determined one by one from the position of the initial feature point and its relative position information with respect to them;
here, the other feature points are the feature points other than the first one.
Finally, a feature point position sequence is determined from the position of the initial feature point and the positions of the other feature points, and this sequence is taken as the initial shape.
With the scheme of this embodiment, a more reasonable initial shape can be generated, which improves the stability of the detection result.
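A minimal sketch of this heuristic initialization, assuming the average shape information is stored as a plain dict; the field names and the use of the face box's longer side as the normalization scale are assumptions of the sketch (the pixel-difference element of each triple is not needed for placement and is ignored here):

```python
import numpy as np

def heuristic_initial_shape(face_box, mean_shape_info):
    """Build the initial feature point sequence: place the first point from
    its normalized position inside the face box, then place every other
    point from its (distance, angle, pixel difference) relation to it."""
    x0, y0, w, h = face_box
    rx, ry = mean_shape_info["first_point_rel"]      # normalized in [0, 1]
    first = np.array([x0 + rx * w, y0 + ry * h])
    scale = max(w, h)                                # normalization scale
    shape = [first]
    for dist, angle, _pixel_diff in mean_shape_info["relations"]:
        offset = dist * scale * np.array([np.cos(angle), np.sin(angle)])
        shape.append(first + offset)
    return np.vstack(shape)                          # the initial shape
```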
In one embodiment, as shown in fig. 4, performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result may include:
Step S401: determining a face region according to each precise position region, wherein the face region comprises the precise position regions of all facial parts or of some of them;
specifically, a face region is determined according to the precise position regions, and it may contain the information of the whole face or only part of the face (for example, only the eye region, or both the nose region and the mouth region).
Step S402: determining a gray level co-occurrence matrix according to the pixel information of the face region;
in addition, feature matrix templates at different angles can be used in the computation to increase the diversity of the feature data and improve the feature matching capability of the matrix template.
Step S403: determining the statistics of the gray level co-occurrence matrix according to the matrix and a plurality of preset operators;
the types of statistics can be selected according to actual needs. In one embodiment, 10 types of statistics are used, including contrast, energy, entropy, inverse variance, correlation, uniformity, dissimilarity, and mean and variance; more or fewer types can be used as needed.
Step S404: extracting preset types of image features according to the statistics, converting the extracted image features into image feature vectors and storing them as a feature file, or statistically analyzing the regularities of the image feature vectors to obtain statistical results and visualizing those results;
comprehensive features of the face or of the facial parts can be extracted from the above statistics, and the statistic-based feature extraction can be implemented in any suitable way, for example: the depth of texture grooves is described by the gray contrast, the local variation of face color by the correlation of regional pixels, the smoothness of the cheeks by the gray uniformity, and so on; these are not detailed here.
In addition, in order to display the image features of the target face image intuitively, the extracted image features can be statistically analyzed and then displayed as charts. The chart type may be, but is not limited to, a bar chart, pie chart, line chart, radar chart or histogram.
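A sketch of steps S402 to S404. The detailed embodiment below mentions OpenCV and MATLAB interfaces; scikit-image is used here instead only because it exposes both the co-occurrence matrix and several of its statistics in one place, and the gray-level quantization and statistic subset are illustrative choices:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_region, levels=64):
    """Compute a multi-angle gray level co-occurrence matrix for a face or
    facial-part region and derive a few of its statistics. Assumes an
    8-bit grayscale input array."""
    # Quantize to fewer gray levels to keep the matrix small and dense.
    q = (gray_region.astype(np.float64) * (levels / 256.0)).astype(np.uint8)
    # Four angle templates (0, 45, 90, 135 degrees) increase the diversity
    # of the feature data, as noted in step S402.
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    feats = {prop: float(graycoprops(glcm, prop).mean())
             for prop in ("contrast", "energy", "correlation",
                          "homogeneity", "dissimilarity")}
    pmat = glcm.mean(axis=(2, 3))          # average over distances and angles
    p = pmat[pmat > 0]
    feats["entropy"] = float(-np.sum(p * np.log2(p)))
    return feats
```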
In this embodiment, all-around feature extraction and analysis of the face or the facial parts is performed based on the gray level co-occurrence matrix, which removes the past difficulty of having to design a dedicated feature extraction algorithm for each kind of image feature.
To facilitate understanding of the present invention, a preferred embodiment is described in detail below, taking as an example a face feature analyzer implemented with the face feature analysis method of the present invention.
The face feature analyzer in this embodiment can perform multi-scale feature point detection on face photographs taken from various angles and automatically perform comprehensive extraction and analysis of facial-part features at multiple levels. The analyzer first detects faces of different angles in a picture (the target face image) and determines the region where each face lies; it then locates and extracts facial-part feature points at several scales according to the angle of the face in that region; finally, using the precise region information determined by the facial-part feature points, it extracts many kinds of image features with a unified feature matrix template, thereby achieving all-around facial-part feature analysis.
The face feature analyzer completes the comprehensive analysis of face features in three stages. The core functions of the first two stages are realized by reusable models trained with machine learning or deep learning, and the last stage realizes comprehensive extraction and analysis of image features with traditional image processing strategies. The analyzer first detects the faces in a picture; at this stage, several face classifiers specialized for different angles are trained and used to find faces of different angles and sizes in the image. In the facial-part feature point detection stage, the faces are sent, according to their characteristics, to different face feature point detectors to extract key face feature points; the positions of the facial parts within the face are determined from these coarse-grained face feature points, and the different parts, normalized to the same resolution level, are sent to the corresponding facial-part feature point detectors to extract fine-grained key feature points. Feature points at different scales enable extraction and analysis of features over the whole face region or over different local regions, and independent analysis of different facial parts at a fine-grained scale. Different feature point detectors train their detection models with different algorithms chosen for their detection scale. Finally, to extract and analyze different types of image features, different statistics of the gray level co-occurrence matrix are computed over the face regions or facial parts determined by the feature points, realizing comprehensive analysis and statistics of face features at different levels.
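The three stages can be pictured as the following skeleton. It is a sketch only: the `detect_faces`, `crop` and `crop_gray` helpers and the detector interfaces are assumptions, and `glcm_features` refers to the earlier sketch:

```python
class FaceFeatureAnalyzer:
    """Skeleton of the three-stage pipeline: multi-angle face detection,
    multi-scale feature point detection, then co-occurrence-matrix-based
    feature extraction over the regions the feature points determine."""

    def __init__(self, face_detectors, point_detectors, part_detectors):
        self.face_detectors = face_detectors    # multi-angle face detector group
        self.point_detectors = point_detectors  # per-angle face feature point detectors
        self.part_detectors = part_detectors    # fine-grained facial-part detectors

    def analyze(self, image):
        results = []
        # detect_faces yields a face box plus its angle bucket (assumed helper).
        for face_box, angle in detect_faces(image, self.face_detectors):
            coarse = self.point_detectors[angle].detect(image, face_box)
            precise = {part: det.detect(crop(image, coarse[part]))
                       for part, det in self.part_detectors.items()}
            results.append({part: glcm_features(crop_gray(image, box))
                            for part, box in precise.items()})
        return results
```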
Most traditional image feature extraction algorithms lack a fully automatic process to support them. To extract the features of some object in an image, one must first run a series of preprocessing steps on the image, including filtering and de-centering, and detect the specific position of the object in the image; the feature extraction operation is thus separated from the various image processing operations that support it, which is very inconvenient for users, who sometimes even spend a great deal of time designing a reasonable combination of operations to achieve efficient object feature extraction. This embodiment integrates the operations from face image preprocessing to all-around facial-part feature extraction into one effective, fully automatic face feature analysis scheme, which simplifies the whole face image feature extraction process, provides a convenient and unified interface to the user, and reduces the development burden of related programs.
To complete the automatic extraction and analysis of face features, face detection is performed on the image first. Existing face detection algorithms each have a narrow range of application: some are suited only to frontal faces, some only to pure profile faces, and classifiers for multi-angle faces perform poorly. In the scheme of this embodiment, face classifiers for different angles are therefore combined at the face detection stage; by integrating the advantages of face classifiers at different angles, a unified multi-angle face detector group is constructed, which makes detecting multi-angle faces in an image convenient and widens the range of application of existing face detection algorithms.
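A minimal stand-in for such a detector group, built from OpenCV's stock Haar cascades rather than the classifiers this embodiment trains itself: a frontal-face classifier plus a profile-face classifier run on the image and on its mirror, so that left- and right-facing profiles are both covered:

```python
import cv2

frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_multi_angle_faces(gray):
    """Run the detector group on a grayscale image and tag each face box
    with the angle bucket of the classifier that found it."""
    faces = [(tuple(b), "frontal") for b in frontal.detectMultiScale(gray, 1.1, 5)]
    faces += [(tuple(b), "left") for b in profile.detectMultiScale(gray, 1.1, 5)]
    flipped = cv2.flip(gray, 1)                 # mirror for right-facing profiles
    width = gray.shape[1]
    for (x, y, w, h) in profile.detectMultiScale(flipped, 1.1, 5):
        faces.append(((width - x - w, y, w, h), "right"))  # map box back
    return faces
```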
After the specific position of the face is determined by face detection, the scheme of this embodiment detects multi-scale key facial-part feature points in the region where the face lies. Existing face feature point detection algorithms only consider coarse-grained facial-part feature point detection at the face scale and compute with the feature information of the whole face, so the number of feature points inside a local facial-part region is too small, which hurts the subsequent extraction of feature points in a specific region. Therefore, on the basis of face feature point detection, the scheme of this embodiment first determines the rough region of each facial part from the detected coarse-grained feature points, and then performs fine-grained facial-part feature point detection inside those regions. The region where each part lies is thereby determined more accurately, interference of irrelevant regions or noise on the subsequent extraction of the part's image features is reduced, and all-around extraction and analysis of multi-scale face features is achieved at the face scale, the facial-part scale and even any local face region.
The scheme of this embodiment adopts a unified feature matrix template, the gray level co-occurrence matrix, to compute different types of image features, overcoming the past difficulty of having to design a dedicated extraction algorithm for each type of image feature. Computing many image features uniformly on the same feature matrix template greatly reduces repeated computation over the pixel information: when different types of image features are computed on the same picture, there is no need to recompute from the underlying pixels. The underlying pixel information of a picture only needs to be turned into a gray level co-occurrence matrix once, after which many image features can be extracted and analyzed by computing different statistics of that matrix. At the same time, the gray level co-occurrence matrix has a large number of different statistics, so this way of extracting face image features is broader and more comprehensive than designing feature extraction separately for each type of image feature, and it provides more comprehensive feature information for the analysis and statistics of the image features.
The face feature analyzer in the present embodiment is explained in detail below. Fig. 5 is a schematic diagram illustrating a composition structure and a principle of the face feature analyzer in the embodiment.
As shown in fig. 5, the face feature analysis performed by the face feature analyzer in this embodiment comprises three main processes: a multi-angle face detection stage, a multi-scale facial-part feature point detection stage, and a comprehensive image feature extraction and analysis stage. In the multi-angle face detection stage, several face classifiers for detecting faces at different angles are trained to construct an effective multi-angle face detector group, so that most faces can be identified accurately.
In the multi-scale facial-part feature point detection stage, key facial-part feature points of different granularities are detected at two scale levels. First, the faces detected in the previous stage are grouped into three face data sets according to angle range and sent to face feature point detectors for the corresponding angles to detect key facial-part feature points. The face feature point detector for each angle is an improved cascade shape regressor trained on its own face data set; before training, all faces in a data set are translated and aligned in orientation to the first face picture, which reduces interference with the initialization of the face feature point positions during training. By jointly considering local texture information, occlusion states and pixel difference information between feature points, the face feature point detector of the present invention locates the key facial-part feature points on the face more accurately than traditional detectors based on shape information alone. Detecting the coarse-grained facial-part feature points at the face scale roughly determines the position and region of each facial part on the face; after alignment, the parts are sent to the corresponding fine-grained facial-part feature point detectors, which yield more accurate feature point detection results and provide facial-part information at different scales for the subsequent feature extraction. Different parts adopt different facial-part feature point detectors according to how their pose and expression vary: the eyes and the mouth, parts with rich pose changes, use a two-stage neural network feature point detector combining a convolutional neural network (CNN) and a deep belief network (DBN) to better model the rich shape changes of their feature points; the eyebrows, nose and ears, parts with relatively fixed shapes, are modeled with a traditional active appearance model (AAM) based on shape and texture information, so that key feature points are detected efficiently.
Using the facial-part feature point information acquired at different scales, the invention finally performs all-around feature extraction and analysis of the face or facial parts through a unified multi-angle gray level co-occurrence matrix. For any precise face region determined by the feature points, a gray level co-occurrence matrix is first computed for that region, and then different types of face features are computed through different statistics of the matrix, completing comprehensive face feature extraction. The obtained feature information of the face or facial parts can be visualized by feature type through the feature visualization interface of OpenCV, statistically analyzed through the data analysis interface provided by matplotlib, and written to disk as a standard feature vector file for the whole face data set, for use by other modules.
Fig. 6 shows the structure and principle of the face feature point detector. In this embodiment, an improved cascade shape regressor based on local binary features detects the key feature points of the face. The cascade shape regressor comprises two levels of integration: it takes an initial shape as input and completes the shape correction through a cascade of several weak regressors, finally obtaining more accurate feature point positions. Each weak regressor learns only part of the correction from the initial feature point positions to the final ones; a more reasonable final feature point shape is obtained through the repeated corrections of several weak regressors. A weak regressor models the incremental change of the feature point position correction by constructing local texture features and occlusion state information for every feature point. The local texture features of the feature points are fitted with a series of random forests: the local texture feature vector of each feature point corresponds to one random forest, and the mode of each feature value in the vector is computed by one random tree. Since texture features are typically discrete information, tree-structured models fit such features well. In addition, the occlusion state of the feature points is learned through a simpler shallow-model logistic regression, and a uniform binary feature vector describes the occlusion state of all feature points; if this vector indicates that the current feature point is occluded, its local texture features are not used for fitting the position correction. In this embodiment, the face feature point detector fits the partial change of the feature point position correction through an xgboost regression forest together with the texture information and occlusion states of all feature points. The xgboost regression forest uses a loss function with several regularization terms added, which balances the complexity of the regression tree structure against regression precision, controls the amplitude and precision of the position correction, and improves the convergence speed of the feature point detection algorithm. The machine learning regressors in the cascade shape regressor are all implemented through the sklearn machine learning standard library.
Before face feature point detection with the cascade shape regressor, an initial shape (an initial feature point position sequence) must be input; in the past, an initial shape was usually generated randomly inside the face region. This embodiment therefore provides a heuristic feature point initialization method (the flow shown in fig. 7) that generates a more reasonable initial shape guided by prior knowledge of the feature points in the training set, improving the stability of the feature point detection result. Before the method runs, average shape information closely related to the feature point positions is computed in the training stage; it includes the normalized relative positions of the first marked feature point with respect to the four corner points of the face region, and the relative position information between the first feature point and the other feature points (described by triples of normalized relative distance, relative angle and pixel difference). The method uses the average shape information to determine the position of the initial feature point (the first feature point) in the face region, then determines the initial positions of all remaining feature points one by one from their relative position information with respect to the first, and inputs the resulting position sequence into the face feature point detector as the initial shape for shape correction.
Fig. 8 and fig. 9 are schematic diagrams of the fine-grained facial-part feature point detectors: fig. 8 shows the structure of the eye feature point detector and the mouth feature point detector, and fig. 9 the training and detection process of the eyebrow, nose and ear feature point detectors.
Because the shapes of the eye and mouth feature points vary greatly with pose and expression, the eye and mouth feature point detectors each fit the initial shape and the correction process of the feature points with more general deep neural networks. Each of these detectors consists of two levels of neural networks: the first level is a convolutional neural network and the second a deep belief network. The first-level convolutional neural network mainly learns the mapping from the original facial-part picture to the initial shape, and the second level provides, for each feature point, a DBN that fits the correction from the initial position to the final position. Since a facial-part region usually contains little pixel information, the first-level convolutional neural network in this embodiment is kept simple and shallow, containing only two convolutional layers, two max-pooling layers and one fully connected layer. To mitigate vanishing gradients and accelerate convergence, local response normalization, which pulls the distribution of the activation inputs toward a normal distribution, is applied before each activation; the activation functions uniformly adopt the rectified linear unit (ReLU), and the fully connected layer finally outputs the position coordinates of each initial feature point. Furthermore, because the correction of each feature point depends only on the pixel information near that point and its relative position to the other feature points, each feature point corrects its position with a deep belief network containing only three hidden layers. In every deep belief network, only the last layer is fully connected; all hidden layers before it are restricted Boltzmann machines, which can be pre-trained layer by layer with an unsupervised maximum likelihood method, so the final result only needs fine-tuning through the last fully connected layer. Meanwhile, the Boltzmann machine layers corresponding to all feature points can be trained in parallel, improving training efficiency. The neural networks used by the eye and mouth feature point detectors are also built and trained with Keras, a high-level neural network API.
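A sketch of the layer-by-layer pre-training of one per-point DBN, using scikit-learn's BernoulliRBM to stand in for the restricted Boltzmann machine layers (the embodiment builds its networks with Keras; the layer sizes and the linear output head are illustrative simplifications, and fine-tuning through the whole stack is omitted):

```python
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import BernoulliRBM

def pretrain_dbn(X, y, hidden_sizes=(64, 32, 16)):
    """Layer-wise unsupervised pre-training of a three-hidden-layer DBN:
    each RBM is trained on the previous layer's output (X scaled to [0, 1]);
    only the final fully connected layer is fit against the correction
    targets y, mirroring the fine-tuning step described above."""
    rbms, h = [], X
    for n in hidden_sizes:
        rbm = BernoulliRBM(n_components=n, learning_rate=0.05, n_iter=20)
        h = rbm.fit_transform(h)            # unsupervised pre-training
        rbms.append(rbm)
    head = LinearRegression().fit(h, y)     # final fully connected layer
    return rbms, head
```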
The shapes of the eyebrows, nose and ears change little with expression and pose, so their feature points can also be detected well with an active appearance model based on appearance information. The active appearance model first computes the initial shape of the facial-part feature points through a mean part model obtained by training, and then corrects the initial shape with a feature point correction model to obtain the final feature point positions. As shown in fig. 9, the training of the active appearance model is divided into two stages. The first stage trains the mean part model from the labeled facial-part pictures in the training set: the orientations and relative positions of all part images in the training set are aligned to the first one; the mean part model is then initialized and used to describe every part model in the training set (since every part model in the training set can be obtained by an affine transformation of the mean model); finally, the mean model is recomputed from all part models and all part models are updated with the recomputed mean, repeatedly, until the mean part model converges. The second stage takes the initial shape computed from the mean part model as input and the labeled feature point positions as the optimization target, and trains the feature point correction model. The learning process of the initial shape correction is as follows: first, a series of candidate points is chosen near each feature point; then gradient features (pixel differences of nearby pixels computed along the normal of the line joining adjacent feature points) and local texture features (described with local binary patterns) are constructed for all candidate points; the Mahalanobis distance between the feature information of a candidate point and that of the labeled feature point evaluates how close the candidate is to the target feature point; finally, the candidate closest to the labeled feature point replaces the feature point, and the process continues until the feature points converge. After each round of learning, the parameters of the feature point correction model are updated, and model optimization continues until the termination condition (a preset parameter threshold) is met.
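The first training stage, repeatedly aligning all part shapes to the current mean and recomputing it, can be sketched as follows (a similarity Procrustes alignment stands in for the affine alignment step, and the convergence test is an illustrative simplification):

```python
import numpy as np

def align_to(reference, shape):
    """Similarity-align one landmark shape to a reference: remove the
    translation, then solve the optimal rotation and scale in closed form."""
    ref = reference - reference.mean(axis=0)
    shp = shape - shape.mean(axis=0)
    u, s, vt = np.linalg.svd(shp.T @ ref)
    rot = u @ vt
    scale = s.sum() / (shp ** 2).sum()
    return scale * (shp @ rot) + reference.mean(axis=0)

def train_mean_shape(shapes, tol=1e-6, max_iter=50):
    """Stage one of the AAM training: align every training shape to the
    current mean part model, recompute the mean, repeat to convergence."""
    mean = np.asarray(shapes[0], dtype=float)
    for _ in range(max_iter):
        aligned = np.array([align_to(mean, np.asarray(s, float)) for s in shapes])
        new_mean = aligned.mean(axis=0)
        if np.linalg.norm(new_mean - mean) < tol:
            break
        mean = new_mean
    return mean
```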
Fig. 10 illustrates the principle of all-around image feature extraction with the gray level co-occurrence matrix, based on the key feature points of the face or facial parts. In this embodiment, after key feature point detection is completed, a face region is determined from the feature points; it may contain the information of the whole face or only part of it (for example, only the eyes, or the nose and the mouth together). A gray level co-occurrence matrix is then computed from the pixel information of the face region; feature matrix templates at different angles can be used in the computation to increase the diversity of the feature data and improve the feature matching capability of the matrix template. Different operators then compute the various statistics of the gray level co-occurrence matrix, and finally different types of image features are extracted with these statistics. The gray level co-occurrence matrix offers many statistics for image feature extraction; in this embodiment, 10 of them are selected, including contrast, energy, entropy, inverse variance, correlation, uniformity, dissimilarity, and mean and variance, for comprehensive feature extraction over the face or facial parts, for example: gray contrast describes the depth of texture grooves, the correlation of regional pixels describes local variation in face color, gray uniformity describes how smooth the cheeks are, and so on. The gray level co-occurrence matrix is computed with a function package provided by OpenCV; part of the statistics are computed with the computation interface provided by MATLAB, and the rest are implemented independently. The computed feature information can be used for statistical analysis or written out as a feature vector file for use by other modules.
It should be understood that although the steps in the flow charts of figs. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-4 may comprise several sub-steps or stages that are not necessarily completed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, a face feature analysis device is provided, comprising: a first region detection module 1101, a second region detection module 1102 and a feature analysis module 1103, wherein:
the first region detection module 1101 is used for determining a rough position region of each facial part in the target face image by detecting face feature points over the whole face of the target face image;
the second region detection module 1102 is used for determining the precise position region of each facial part by detecting the corresponding facial-part feature points on the part image corresponding to each rough position region;
the feature analysis module 1103 is used for performing face feature analysis according to each precise position region to obtain face feature information and/or a statistical analysis result.
In an embodiment, the second region detection module 1102 may input the facial feature images corresponding to the coarse position regions into different facial feature point detectors according to the categories of the corresponding facial feature parts to perform corresponding facial feature point detection, so as to obtain the precise position regions of the corresponding facial feature parts output by each facial feature point detector.
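A minimal sketch of this routing step follows, assuming each per-organ detector object exposes a detect(img) method returning the precise region for that organ (a hypothetical interface adopted for the sketch; the embodiment does not fix one):

    def detect_precise_regions(organ_crops, detectors):
        # organ_crops: {'eye': image, 'mouth': image, ...} cropped from the
        # coarse position regions; detectors: {'eye': detector, ...}, where
        # the eye/mouth entries would hold the first type of detection model
        # and the eyebrow/nose/ear entries the second type.
        return {organ: detectors[organ].detect(crop)
                for organ, crop in organ_crops.items()}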
In one embodiment, the above-mentioned facial feature point detector is a cascade shape regressor, and the process of determining the initial shape input to the cascade shape regressor includes: loading average shape information obtained from training samples, and determining an initial feature point and its position according to the average shape information; determining the positions of the other feature points one by one using the position of the initial feature point and the relative information between the initial feature point and the other feature points; and determining a feature point position sequence from the position of the initial feature point and the positions of the other feature points, and taking the feature point position sequence as the initial shape.
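Under the assumptions that the average shape is an (N, 2) coordinate array and that the initial feature point is indexed by anchor_idx (both representations chosen for this sketch, not specified by the embodiment), the determination of the initial shape can be illustrated as:

    import numpy as np

    def build_initial_shape(mean_shape, anchor_pos, anchor_idx=0):
        # mean_shape: (N, 2) average feature point coordinates loaded from
        # the training samples; anchor_pos: (2,) position assigned to the
        # initial feature point inside the detected region.
        offsets = mean_shape - mean_shape[anchor_idx]   # relative information
        return np.asarray(anchor_pos) + offsets         # ordered (N, 2) sequence

Each remaining point is placed from its stored offset to the initial point, and the resulting ordered position sequence serves as the initial shape fed to the cascade shape regressor.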
For the specific definition of the face feature analysis apparatus, reference may be made to the definition of the face feature analysis method above, which is not repeated here. All or part of the modules in the face feature analysis apparatus may be implemented by software, by hardware, or by a combination of the two. Each module may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a face feature analysis method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of part of the structure related to the inventive arrangements and does not limit the computer devices to which the inventive arrangements may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: determining a rough position area of each facial feature part in a target face image by detecting the facial feature points of the whole face on the target face image; determining the precise position area of each facial features part by respectively detecting the corresponding facial features characteristic points on the facial features part image corresponding to each rough position area; and performing face feature analysis according to each accurate position area to obtain face feature information or/and a statistical analysis result.
In one embodiment, when the processor executes the computer program to implement the step of determining the precise position region of each facial features by respectively detecting corresponding facial features on the facial features image corresponding to each coarse position region, the following steps are specifically implemented: inputting the facial feature images corresponding to the rough position areas into different facial feature point detectors according to the categories of the corresponding facial feature parts to perform corresponding facial feature point detection, and acquiring the precise position areas of the corresponding facial feature parts output by the facial feature point detectors.
In one embodiment, the processor, when executing the computer program, further performs the steps of: carrying out face angle detection on the target face image to obtain a face angle value; the step of determining the rough position area of each facial feature part in the target face image by detecting the facial feature points of the whole face on the target face image comprises the following steps: and selecting a face feature detector matched with the face angle value, inputting the target face image into the selected face feature detector, and acquiring the rough position area of each facial feature part in the target face image output by the selected face feature detector.
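The angle-matched selection might be sketched as below; the yaw ranges and the detectors_by_range registry are illustrative assumptions, since the embodiment states only that a detector matched to the face angle value is selected:

    def select_detector(angle_value, detectors_by_range):
        # detectors_by_range: list of ((lo, hi), detector) pairs, e.g. a
        # frontal detector for [-30, 30] degrees and profile detectors
        # beyond it (ranges shown here are examples only).
        for (lo, hi), detector in detectors_by_range:
            if lo <= angle_value <= hi:
                return detector
        raise ValueError(f"no feature point detector covers angle {angle_value}")

    # Usage sketch: coarse regions from the detector matched to the angle.
    # coarse_regions = select_detector(angle, registry).detect(face_image)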
In one embodiment, when the processor executes the computer program to perform the step of performing the face feature analysis according to each accurate position region to obtain the face feature information or/and the statistical analysis result, the following steps are specifically implemented: determining a face region according to each accurate position region, wherein the face region comprises the accurate position regions of all five sense organs or the accurate position regions of part of the five sense organs; determining a gray level co-occurrence matrix according to the pixel information of the face area; determining each statistic of the gray level co-occurrence matrix according to the gray level co-occurrence matrix and a plurality of preset operators; and extracting preset types of image features according to the statistics, wherein the extracted image features are used as the face feature statistics.
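For the first of these steps, the face region covering all or part of the five sense organs can be taken as the bounding-box union of the selected precise position regions. A minimal sketch, assuming each region is represented as an (x0, y0, x1, y1) box (a representation chosen for the sketch):

    import numpy as np

    def face_region(image, precise_regions):
        # precise_regions: iterable of (x0, y0, x1, y1) boxes for the chosen
        # organs; the union box may cover all organs or only a subset.
        boxes = np.asarray(list(precise_regions), dtype=float)
        x0, y0 = boxes[:, 0].min(), boxes[:, 1].min()
        x1, y1 = boxes[:, 2].max(), boxes[:, 3].max()
        return image[int(y0):int(y1), int(x0):int(x1)]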
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: determining a rough position area of each facial feature part in a target face image by detecting the facial feature points of the whole face on the target face image; determining the precise position area of each facial features part by respectively detecting the corresponding facial features characteristic points on the facial features part image corresponding to each rough position area; and performing face feature analysis according to each accurate position area to obtain face feature information or/and a statistical analysis result.
In one embodiment, when the computer program is executed by the processor to implement the step of determining the precise position region of each facial features by respectively detecting corresponding facial features on the facial features image corresponding to each coarse position region, the following steps are specifically implemented: inputting the facial feature images corresponding to the rough position areas into different facial feature point detectors according to the categories of the corresponding facial feature parts to perform corresponding facial feature point detection, and acquiring the precise position areas of the corresponding facial feature parts output by the facial feature point detectors.
In one embodiment, the computer program when executed by the processor further performs the steps of: carrying out face angle detection on the target face image to obtain a face angle value; the step of determining the rough position area of each facial feature part in the target face image by detecting the facial feature points of the whole face on the target face image comprises the following steps: and selecting a face feature detector matched with the face angle value, inputting the target face image into the selected face feature detector, and acquiring the rough position area of each facial feature part in the target face image output by the selected face feature detector.
In one embodiment, when the computer program is executed by the processor to implement the step of performing the face feature analysis according to each of the accurate position areas to obtain the face feature information or/and the statistical analysis result, the following steps are specifically implemented: determining a face region according to each accurate position region, wherein the face region comprises the accurate position regions of all five sense organs or the accurate position regions of part of the five sense organs; determining a gray level co-occurrence matrix according to the pixel information of the face area; determining each statistic of the gray level co-occurrence matrix according to the gray level co-occurrence matrix and a plurality of preset operators; and extracting preset types of image features according to the statistics, wherein the extracted image features are used as the face feature statistics.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (10)

1. A method for analyzing human face features, the method comprising:
carrying out face angle detection on a target face image to obtain a face angle value, selecting a face characteristic point detector matched with the face angle value, inputting the target face image to the selected face characteristic point detector, and acquiring a rough position area of each facial feature part in the target face image output by the selected face characteristic point detector; the face characteristic point detector is a cascade shape regressor; the cascade shape regressor comprises two levels of integration, takes an initial shape as input, and completes the correction of the initial shape through a cascade of a plurality of weak regressors to obtain a final characteristic point shape; each weak regressor builds local texture characteristics and occlusion state information for each characteristic point to model the change increment of the characteristic point position correction; wherein the local texture features of the feature points are fitted using a plurality of random forests, the local texture feature vector of each feature point corresponds to one random forest, and the mode of each feature value in the feature vector is calculated through one random tree; the occlusion state information of the feature points is learned through shallow-model logistic regression, and the occlusion state information of each feature point is described by a uniform binary feature vector;
inputting the facial feature images corresponding to the rough position areas into different facial feature point detectors according to the categories of the corresponding facial feature parts to perform corresponding facial feature point detection, and acquiring the precise position areas of the corresponding facial feature parts output by the facial feature point detectors; wherein the five sense organ feature point detector comprises an eye feature point detector, an eyebrow feature point detector, a mouth feature point detector, a nose feature point detector and an ear feature point detector; the eye feature point detector and the mouth feature point detector adopt a preset first type of feature point detection model of five sense organs, and the eyebrow feature point detector, the nose feature point detector and the ear feature point detector adopt a preset second type of feature point detection model of five sense organs;
performing face feature analysis according to each accurate position area to obtain face feature information or/and a statistical analysis result;
wherein the determining process of the initial shape of the cascade shape regressor input comprises: loading average shape information obtained through a training sample, and determining an initial characteristic point and the position of the initial characteristic point according to the average shape information; determining the positions of other feature points one by using the positions of the initial feature points and the relative information of the initial feature points and other feature points; and determining a feature point position sequence according to the position of the initial feature point and the positions of the other feature points, and taking the feature point position sequence as the initial shape.
2. The face feature analysis method according to claim 1, wherein the first type of feature point detection model for five sense organs is a two-stage neural network detection model combining a convolutional neural network and a deep belief network; the second type of feature point detection model for five sense organs is an active appearance model based on shape and texture information.
3. The face feature analysis method according to claim 2, wherein the two-stage neural network detection model includes a convolutional neural network of a first hierarchy and a deep belief network of a second hierarchy;
the convolutional neural network of the first level is used for learning the mapping of the original facial features picture to the initial shape;
the deep belief network of the second level is used for providing, for each feature point, a deep belief network that fits the corrective change of the feature point from its initial position to its final position;
the convolutional neural network of the first level comprises two convolutional layers, two max-pooling layers and a fully connected layer, with ReLU as the activation function; each feature point uses a deep belief network comprising three hidden layers to correct its position, in which only the last layer is a fully connected layer and the hidden layers before it are all restricted Boltzmann machines; each restricted Boltzmann machine is pre-trained layer by layer using the unsupervised maximum likelihood method, and the final result of the layer-by-layer pre-training is fine-tuned by the last fully connected layer.
4. The method according to claim 3, wherein the performing facial feature analysis according to each of the precise location areas to obtain facial feature information or/and statistical analysis results comprises:
determining a face region according to each accurate position region, wherein the face region comprises the accurate position regions of all five sense organs or the accurate position regions of part of the five sense organs;
determining a gray level co-occurrence matrix according to the pixel information of the face area;
determining each statistic of the gray level co-occurrence matrix according to the gray level co-occurrence matrix and a plurality of preset operators;
extracting preset image features according to the statistics, converting the extracted image features into image feature vectors, storing the image feature vectors in a feature file mode, or performing statistical analysis on rules of the image feature vectors to obtain statistical results, and performing visualization processing on the statistical results.
5. The method of claim 4, wherein the statistics include contrast, energy, entropy, inverse variance, correlation, homogeneity, dissimilarity, and mean and sum variance.
6. The method for analyzing facial features of claim 1, wherein the detecting facial angles of the target facial image to obtain facial angle values comprises:
inputting the target face image into a preset multi-angle face detection model;
and acquiring a face angle value output by the multi-angle face detection model, wherein the multi-angle face detection model comprises a plurality of face classifiers with different face angles.
7. An apparatus for analyzing a face feature, the apparatus comprising:
the first region detection module is used for carrying out face angle detection on a target face image to obtain a face angle value, selecting a face characteristic point detector matched with the face angle value, inputting the target face image to the selected face characteristic point detector, and obtaining a rough position region of each facial feature part in the target face image output by the selected face characteristic point detector; the multi-angle face detection model comprises a plurality of face classifiers with different face angles; the face characteristic point detector is a cascade shape regressor; the cascade shape regressor comprises two levels of integration, takes an initial shape as input, and completes the correction of the initial shape through a cascade of a plurality of weak regressors to obtain a final characteristic point shape; each weak regressor builds local texture characteristics and occlusion state information for each characteristic point to model the change increment of the characteristic point position correction; wherein the local texture features of the feature points are fitted using a plurality of random forests, the local texture feature vector of each feature point corresponds to one random forest, and the mode of each feature value in the feature vector is calculated through one random tree; the occlusion state information of the feature points is learned through shallow-model logistic regression, and the occlusion state information of each feature point is described by a uniform binary feature vector;
a second region detection module, configured to input the facial feature images corresponding to the rough position regions into different facial feature point detectors according to the categories of the corresponding facial feature parts to perform corresponding facial feature point detection, and obtain accurate position regions of the corresponding facial feature parts output by each facial feature point detector; wherein the five sense organ feature point detector comprises an eye feature point detector, an eyebrow feature point detector, a mouth feature point detector, a nose feature point detector and an ear feature point detector; the eye feature point detector and the mouth feature point detector adopt a preset first type of feature point detection model of five sense organs, and the eyebrow feature point detector, the nose feature point detector and the ear feature point detector adopt a preset second type of feature point detection model of five sense organs;
the characteristic analysis module is used for carrying out face characteristic analysis according to each accurate position area to obtain face characteristic information or/and a statistical analysis result;
the device further comprises an initial shape determination module, configured to load average shape information obtained from training samples and determine an initial feature point and its position according to the average shape information; determine the positions of the other feature points one by one using the position of the initial feature point and the relative information between the initial feature point and the other feature points; and determine a feature point position sequence from the position of the initial feature point and the positions of the other feature points, taking the feature point position sequence as the initial shape.
8. The apparatus according to claim 7, wherein the second region detection module further comprises: the first type of feature point detection model of the five sense organs is a two-stage neural network detection model combining a convolutional neural network and a deep belief network; the second type of feature point detection model for five sense organs is an active appearance model based on shape and texture information.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 6 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201810844936.0A 2018-07-27 2018-07-27 Face feature analysis method and device, computer equipment and storage medium Active CN109086711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810844936.0A CN109086711B (en) 2018-07-27 2018-07-27 Face feature analysis method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810844936.0A CN109086711B (en) 2018-07-27 2018-07-27 Face feature analysis method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109086711A CN109086711A (en) 2018-12-25
CN109086711B true CN109086711B (en) 2021-11-16

Family

ID=64831240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810844936.0A Active CN109086711B (en) 2018-07-27 2018-07-27 Face feature analysis method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109086711B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902635A (en) * 2019-03-04 2019-06-18 司法鉴定科学研究院 A kind of portrait signature identification method based on example graph
CN109919081A (en) * 2019-03-04 2019-06-21 司法鉴定科学研究院 A kind of automation auxiliary portrait signature identification method
CN109994202A * 2019-03-22 2019-07-09 华南理工大学 A deep learning-based method for generating traditional Chinese medicine prescriptions from faces
CN110298225A * 2019-03-28 2019-10-01 电子科技大学 A method for locating the five sense organs of a face in occluded environments
CN112000538B (en) * 2019-05-10 2023-09-15 百度在线网络技术(北京)有限公司 Page content display monitoring method, device and equipment and readable storage medium
CN110222651A (en) * 2019-06-10 2019-09-10 Oppo广东移动通信有限公司 A kind of human face posture detection method, device, terminal device and readable storage medium storing program for executing
CN111444887A (en) * 2020-04-30 2020-07-24 北京每日优鲜电子商务有限公司 Mask wearing detection method and device, storage medium and electronic equipment
CN112200005A (en) * 2020-09-15 2021-01-08 青岛邃智信息科技有限公司 Pedestrian gender identification method based on wearing characteristics and human body characteristics under community monitoring scene
CN112700427B (en) * 2021-01-07 2024-04-16 哈尔滨晓芯科技有限公司 Automatic evaluation method for hip joint X-ray
CN113233266A (en) * 2021-06-03 2021-08-10 昆山杜克大学 Non-contact elevator interaction system and method thereof
CN113553963A (en) * 2021-07-27 2021-10-26 广联达科技股份有限公司 Detection method and device of safety helmet, electronic equipment and readable storage medium
CN113963428B (en) * 2021-12-23 2022-03-25 北京的卢深视科技有限公司 Model training method, occlusion detection method, system, electronic device, and medium


Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102495005B (en) * 2011-11-17 2013-05-08 江苏大学 Method for diagnosing crop water deficit through hyperspectral image technology
CN103593654B (en) * 2013-11-13 2015-11-04 智慧城市系统服务(中国)有限公司 A kind of method and apparatus of Face detection
WO2015180100A1 (en) * 2014-05-29 2015-12-03 Beijing Kuangshi Technology Co., Ltd. Facial landmark localization using coarse-to-fine cascaded neural networks
CN106446766A (en) * 2016-07-25 2017-02-22 浙江工业大学 Stable detection method for human face feature points in video
CN107146196A (en) * 2017-03-20 2017-09-08 深圳市金立通信设备有限公司 A kind of U.S. face method of image and terminal
CN108229291A (en) * 2017-07-28 2018-06-29 北京市商汤科技开发有限公司 Characteristic point detection, network training method, device, electronic equipment and storage medium
CN108108808B (en) * 2018-01-08 2021-11-19 北京邮电大学 Position prediction method and device based on deep belief network

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN105787448A (en) * 2016-02-28 2016-07-20 南京信息工程大学 Facial shape tracking method based on space-time cascade shape regression
CN107679497A (en) * 2017-10-11 2018-02-09 齐鲁工业大学 Video face textures effect processing method and generation system

Non-Patent Citations (1)

Title
"Robust Face Landmark Estimation Under Occlusion"; Xavier PBA et al; IEEE; 2014-03-03; full text *

Also Published As

Publication number Publication date
CN109086711A (en) 2018-12-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant