WO2019200749A1 - Facial recognition method, apparatus, computing device and storage medium - Google Patents


Info

Publication number
WO2019200749A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
feature vector
face image
training
recognition model
Prior art date
Application number
PCT/CN2018/095498
Other languages
French (fr)
Chinese (zh)
Inventor
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019200749A1

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00  Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10  Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16  Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172  Classification, e.g. identification
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00  Pattern recognition
    • G06F 18/20  Analysing
    • G06F 18/21  Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214  Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00  Pattern recognition
    • G06F 18/20  Analysing
    • G06F 18/22  Matching criteria, e.g. proximity measures
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00  Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10  Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16  Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161  Detection; Localisation; Normalisation
    • G06V 40/165  Detection; Localisation; Normalisation using facial parts and geometric relationships

Definitions

  • the present application relates to the field of convolutional neural network applications, and in particular to a method, an apparatus, a computer device and a storage medium for recognizing a human face.
  • biometric identification technology has developed rapidly in recent decades; compared with other biometric modalities, face recognition is direct, friendly and convenient, and has therefore received more extensive research.
  • the face is composed of the eyes, nose, mouth, chin and other parts; because these parts differ in shape, size and distribution, every face in the world is different, so these parts can serve as important features for face recognition.
  • in existing face recognition, whether face detection or face verification, the face must be posed squarely toward the camera to be recognized accurately; moreover, existing face recognition only determines whether the specific entity types representing the face exist, without considering the spatial positional relationships among those specific entity types, so its accuracy is not high, and an image can only be recognized by comparison against an image captured in a specified pose.
  • as a result, existing face recognition is mechanically rigid and lacks a humanized design that fits natural human habits.
  • existing face recognition also has low accuracy for faces with very high mutual similarity, for example siblings such as quadruplets whose faces closely resemble one another.
  • the main purpose of the present application is to provide a method for recognizing a human face, which aims to solve the technical problems that the existing face recognition mechanism is rigid and its accuracy is not high.
  • the present application proposes a method for recognizing a face, including:
  • if the similarity is less than the preset threshold, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the application proposes a device for recognizing a face, comprising:
  • an acquiring module, configured to select, according to the first shooting angle of the acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the CapsNet network structure;
  • a first conversion module, configured to extract, in the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of the first face into a first feature vector of the frontal face image in the face recognition model;
  • a determining module, configured to determine whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold; and
  • a determination module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the present application also provides a computer device comprising a memory and a processor, the memory storing computer readable instructions, the processor implementing the steps of the above method when executing the computer readable instructions.
  • the present application also provides a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed by a processor, implement the steps of the methods described above.
  • the present application has the following beneficial technical effects: by changing the convolution structure, the present application changes the structure of the output training model, that is, the existing training model that only identifies specific entity categories is changed into a training model that identifies both the specific entity categories and the spatial positional relationships among them, realizing accurate representation and recognition of the face image within any angle range in which any facial features of the human face can be recognized;
  • the training model of the present application includes the spatial positional relationship of each facial feature; it recognizes not only the features of the facial organs but also their spatial positional relationships, which improves the accuracy of face recognition and thus enables accurate recognition;
  • because the training model includes the spatial positional relationships of the facial features, the face image captured from any angle at which the face can be recognized can be converted into the frontal face image through those spatial positional relationships, so recognition is possible under arbitrary postures, making face recognition more flexible, efficient and user-friendly; moreover, the amount of data required to train the model is greatly reduced, and training models that recognize accurately can be trained with fewer samples.
  • FIG. 1 is a schematic flow chart of a method for recognizing a human face according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an apparatus for recognizing a human face according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an apparatus for recognizing a human face according to another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a training module according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for recognizing a human face according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a determining module according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 10 is a schematic diagram showing the internal structure of a computer device according to an embodiment of the present application.
  • a method for recognizing a human face includes:
  • S1 Select, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the CapsNet network structure.
  • the shooting angles of this embodiment include any angle from which any facial features of the human face can be captured, defined relative to the frontal line of sight of the frontal face: for example, +90 degrees laterally is the direction directly facing the right side of the face, and -90 degrees laterally is the direction directly facing the left side of the face; likewise 90 degrees upward, 90 degrees downward, and so on, with all shooting angles distributed on a sphere centered on the frontal line of sight of the frontal face.
  • the shooting angle of this embodiment is acquired by the camera of the device equipped with the above face recognition model.
  • the first shooting angle of this embodiment is a shooting angle of the person to be verified, so as to distinguish it from the second shooting angles used when a registrant performs registration; likewise, the first feature extraction mode corresponds to the specified shooting angle of the person to be verified.
  • the second feature extraction modes corresponding to the second shooting angles used during registration are similar to the above; the terms "first" and "second" are used only for distinction and are not limiting.
  • the CapsNet network structure of this embodiment is a network structure based on capsule units.
  • a capsule unit is a vector of values, each value representing a feature of the object currently to be recognized, such as an eye among the facial features.
  • the capsule network is composed of capsule units, and a vector in the capsule network can represent not only the features of an object but also its orientation and state.
  • the probability that a face is present is represented by the length of the input and output vectors, while the direction of the vector represents some attribute of the facial features.
  • Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active.
  • the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule.
  • such attributes may include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectivity, color, texture, etc.
  • the probability of the appearance of a face is expressed according to the vector length of the input and output, and the probability value must be between 0 and 1.
  • the capsule network of this embodiment uses the Squashing nonlinear function, which ensures that the length of a short vector shrinks to almost zero while the length of a long vector is compressed to be close to, but not more than, 1.
  • the Squashing nonlinear function is expressed as $v_j = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$, in two parts: the first factor, $\|s_j\|^2 / (1 + \|s_j\|^2)$, is the scaling of the input vector $s_j$, and the second factor, $s_j / \|s_j\|$, is the unit vector of $s_j$. The function preserves the direction of the input vector and compresses its length into the interval (0, 1), so that the probability that an entity appears is expressed by the magnitude of the vector modulus: the larger the modulus, the greater the probability. When $s_j$ is the zero vector, $v_j$ is 0; as $\|s_j\|$ tends to infinity, the length of $v_j$ approaches 1.
  • this nonlinear function can be regarded as a compression and redistribution of vector lengths, and also as the way the input vector is activated to produce the output vector.
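  • as a minimal illustrative sketch of the Squashing nonlinearity described above (NumPy is assumed, and the small epsilon is an added numerical-stability term, not part of the formula):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    Maps a capsule's input vector s to an output vector v with the same
    direction and a length in (0, 1), so the length can act as a probability.
    """
    squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)     # first factor: length scaling
    unit = s / np.sqrt(squared_norm + eps)          # second factor: unit vector
    return scale * unit

# a short vector is squashed toward length 0, a long one toward length 1
print(np.linalg.norm(squash(np.array([0.01, 0.0]))))   # ~0.0001
print(np.linalg.norm(squash(np.array([100.0, 0.0]))))  # ~0.9999
```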
  • the input vector of a Capsule is equivalent to the scalar input of a CNN neuron, and the calculation of this vector is equivalent to the propagation and connection between two Capsule layers.
  • the calculation of the input vector is divided into two stages, linear combination and routing (the routing process of this embodiment is dynamic routing), expressed as $\hat{u}_{j|i} = W_{ij} u_i$ and $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be regarded as each capsule neuron in the upper layer feeding a given neuron in the next layer with a different strength.
  • the coupling coefficient $c_{ij}$ is calculated by the softmax $c_{ij} = \dfrac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$.
  • the input computation of the CapsNet capsule in this embodiment is similar to the linear weighted summation of a conventional network, except that the coupling coefficient $c_{ij}$ is added in the linear summation stage.
  • $b_{ij}$ is updated according to $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$, with initial value 0; in the forward propagation that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0, from which $c_{ij}$ is obtained; $u_i$ is the output of the upper capsule layer and $v_j$ is the output vector of Capsule $j$; from these relationships the input $s_j$ of the higher capsule layer is obtained.
  • the dot product $\hat{u}_{j|i} \cdot v_j$ can be positive, zero or negative; a positive result means the two vectors point in roughly the same direction.
  • if the update makes $b_{ij}$ larger, the coupling coefficient becomes high, indicating that the two vectors match very well; otherwise the coupling coefficient is small, indicating that the two vectors do not match. Determining $c_{ij}$ by iteration is equivalent to determining the route along which the capsule neuron outputs are particularly large, and the end of that route is the correctly predicted capsule.
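  • the dynamic routing stage described above can be sketched as follows (the shapes, the three-iteration count and all names are illustrative assumptions; the patent specifies only the update relations):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    n2 = np.sum(np.square(s), axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat[i, j] = W_ij @ u_i, shape [num_lower, num_upper, dim].
    Returns the output vectors v_j of the upper capsules, shape [num_upper, dim].
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))                      # logits b_ij start at 0
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_ij = softmax_j(b_ij)
        s = np.einsum('ij,ijd->jd', c, u_hat)                 # s_j = sum_i c_ij u_hat
        v = squash(s)                                         # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # b_ij += u_hat . v_j
    return v

# toy example: 6 lower capsules routed to 2 upper capsules of dimension 8
v = dynamic_routing(np.random.randn(6, 2, 8))
print(v.shape, np.linalg.norm(v, axis=-1))   # output lengths lie in (0, 1)
```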
  • the face recognition model of this embodiment includes a coordinate system for the face image at each angle, and represents each specific entity of the face, such as an eye, an ear or a nose, as a vector; each vector includes the attribute parameters of the facial feature, such as size, position, orientation, color and other pose attributes, to express the relative spatial positional relationships of the facial features.
  • the facial features recognized at different angles differ, and the corresponding ways of converting the face image into the frontal face according to the spatial positional relationships differ accordingly: for example, the vector representation of the nose obtained in the top-view state differs from that acquired from the left-face direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-face nose vector into the frontal nose vector.
  • different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ: for example, at the angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution among the facial features can be recognized, while perpendicular to the right side of the face only the lateral state of the right eye, the right half of the nose, the right ear and the like can be recognized.
  • the face recognition model is a three-dimensional model with spatial positional relationships, and a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to the inherent spatial positional relationships, into the image of the three-dimensional structure at another angle; for example, the lateral states recognized for the right eye, the right half of the nose, the right ear and so on are converted into the state seen perpendicular to the frontal face, thereby recognizing the frontal-state features of each organ among the facial features and the overall distribution state among the five organs, and outputting the first feature vector corresponding to the specific entity types of the frontal face, where the specific entity types correspond to the facial features of the face.
  • S3 Determine whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
  • the preset feature vector of this embodiment is the result output by inputting the pre-stored face image of the preset angle into the feature extraction mode of the corresponding angle.
  • the preset threshold of this embodiment is 0.8 or more.
  • for example, the feature vector of the face image of user A is pre-registered, and the face recognition model in a smart door lock performs face recognition verification; the lock can be opened only after verification succeeds.
  • when user A stands sideways under the camera of the smart door lock, the camera captures the facial features of A's side face and transmits them to the face recognition model; the face recognition model, according to the facial features of A's side face, invokes the database storing A's facial features and their spatial positions, converts A's side face into the frontal face, outputs the feature vector corresponding to the facial features of the frontal face, calculates the comparison between this feature vector and the preset feature vector registered by A, and, if the calculated value is within the set threshold, controls the smart door lock to open.
  • for example, the threshold is set to 0.8; when the calculated value is less than 0.8, it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • before step S1 of this embodiment, the method includes:
  • S10 Collect face image data of each shooting angle of a plurality of people to construct a training model sample library.
  • face image data of each angle, covering multiple people, is collected so as to train the model more accurately and improve its generalization ability; however, because the CapsNet network structure includes the spatial positional relationship factor, the sample data required to construct the training model is significantly reduced while a training model of higher accuracy is obtained.
  • in an existing CNN (Convolutional Neural Network), the pooling layer reduces the number of parameters and helps avoid over-fitting, but it also discards some information, such as positional information; a CNN pays no attention to the relative orientation and spatial relationships of components, caring only about whether specific features are present.
  • CapsNet (Capsule Networks), by contrast, does not need to eliminate the influence of viewpoint changes in neural activity, and can simultaneously process multiple different affine transformations or different parts of different objects, so that the training model can recognize face images at every angle.
  • the training model of this embodiment has the property of equivariant mapping: the face image at each angle retains the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize the face at every observed angle.
  • the CapsNet network structure of this embodiment outputs a training model corresponding to face images at every angle, and can accurately recognize the face within any viewing range in which any facial features can be recognized, such as top view, bottom view, side view or front view, without requiring a corrected pose; this improves the flexibility and efficiency of face recognition, avoids the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane, expands the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • for example, let A be an m×p feature data matrix of the facial features and B a p×n spatial position relationship matrix of the facial features; the combination of the facial feature data with the spatial positions of the facial features is then represented in the model by the matrix product, and matrix multiplication combines the facial feature data compactly, so that a face recognition model with spatial positional relationships is represented simply.
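  • purely as an illustration of this matrix formulation, with arbitrary example shapes:

```python
import numpy as np

# A: m x p feature data of the facial features (hypothetical values)
# B: p x n spatial position relationship of the facial features
m, p, n = 4, 3, 5
A = np.random.rand(m, p)   # feature data matrix
B = np.random.rand(p, n)   # spatial position relationship matrix

# the model combines feature data with spatial positions as a matrix product
M = A @ B                  # m x n combined representation
print(M.shape)             # (4, 5)
```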
  • S11 Input the face image data of the training model sample library into the convolutional neural network of the CapsNet network structure for training, to obtain the face recognition model.
  • the training model of this embodiment can recognize the facial features from any angle at which they can be recognized, and converts them into the facial features of the frontal face according to the spatial positional relationships in the training model, thereby performing face recognition.
  • step S11 of the embodiment includes:
  • S111 Input the face image data of each angle in the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with the first specified convolution kernel and the first specified stride, and output a tensor through the specified activation function.
  • the CapsNet network structure of this embodiment includes two convolution layers and a fully connected layer; the first convolution layer is a conventional convolution layer, which serves to detect pixel-level local features.
  • the first convolutional layer of this embodiment has 256 9*9 first designated convolution kernels, the first specified step size is 1, and the activation function is designated as ReLU.
  • the first convolution layer converts the pixel brightness into an activation of the local feature detector, and the output tensor of the first convolution layer serves as the input to the second convolution layer.
  • S112 Input the tensor into the second convolution layer of the CapsNet network structure, and convolve it with the second specified convolution kernel and the second specified stride to construct a tensor structure and output the Capsule vectors of the tensor structure.
  • the second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multi-dimensional entity categories; it has 32 channels, each channel is composed of an 8-dimensional convolution structure, and each channel outputs an 8-dimensional vector, achieving the effect of 8*1 Capsule feature encapsulation.
  • the Capsule of this embodiment is a set of neurons whose input and output vectors represent instantiation parameters of a particular entity class.
  • the specific entity category of this embodiment is a facial feature.
  • the second specified convolution kernel of this embodiment is a 9*9 convolution kernel and the second specified step size is 2.
  • eight convolution units are packaged together into a new Capsule unit.
  • the convolution calculation of the Primary Capsules layer does not use an activation function such as ReLU (Rectified Linear Unit); instead, it prepares the input to the next Capsule layer in vector form.
  • S113 Propagate and route the Capsule vectors through the DigitCaps layer (digital capsule layer) of the CapsNet network structure, and output the face recognition model.
  • the activation vector modulus of each Capsule gives an instance of each specific entity class.
  • the output range of the DigitCaps layer is between 0 and 1, where 0 indicates that the entity is absent and 1 indicates that it has appeared.
  • the input of the DigitCaps layer is the output vectors of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors of the DigitCaps layer have dimension [16, 1], and the 16-dimensional output of CapsNet makes the training model of this embodiment robust.
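  • a minimal PyTorch-style sketch of the layer dimensions described in steps S111 to S113 (the input size and channel count are assumptions; the patent specifies the kernel sizes, strides, capsule channels and the 8- and 16-dimensional capsule vectors):

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

class CapsNetFront(nn.Module):
    """Conv1 and Primary Capsules layers with the stated dimensions:
    256 kernels of 9*9 with stride 1 and ReLU, then 32 channels of
    8-dimensional capsules via 9*9 kernels with stride 2."""

    def __init__(self, in_channels=1):  # input channel count is an assumption
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 256, kernel_size=9, stride=1)
        # 32 capsule channels x 8 dimensions = 256 convolution output channels
        self.primary_caps = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)

    def forward(self, x):
        x = torch.relu(self.conv1(x))   # pixel-level local feature detection
        x = self.primary_caps(x)        # no ReLU: outputs become capsule vectors
        u = x.view(x.size(0), -1, 8)    # [batch, num_capsules, 8] capsule vectors
        return squash(u)

# e.g. a 28x28 input yields 32*6*6 = 1152 primary capsules of dimension 8,
# which a DigitCaps layer would then route into 16-dimensional output capsules
x = torch.randn(1, 1, 28, 28)
print(CapsNetFront()(x).shape)          # torch.Size([1, 1152, 8])
```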
  • before step S1, the method further includes: S30 Receive the face images of the registered face at each second shooting angle.
  • the second shooting angles of this embodiment are the shooting angles of the registrant, to distinguish them from the first shooting angle of the person to be verified.
  • the terms "first" and "second" are used only for distinction and are not limiting; the same applies to other passages and is not repeated here.
  • the registered face of the embodiment includes a face of one person or a plurality of people, so as to recognize a face image of a plurality of people on the same recognition device, and expand the application range.
  • S31 Select, according to each second shooting angle, the second feature extraction mode corresponding to that second shooting angle, extract one by one the features of the second face of the registered face corresponding to each second shooting angle, and convert, according to the spatial positional relationships in the face recognition model, the features of each second face into a second feature vector of the frontal face image of the registrant in the face recognition model.
  • the conversion mode and conversion principle from the second features of the face image at each second shooting angle of the registered face to the face image of the frontal face are the same as in step S2 and are not repeated here.
  • S32 Set the second feature vector as the preset feature vector, so that when the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the person to be verified and the registrant are determined to be the same person.
  • this embodiment further includes the case where there are multiple registrants, with correspondingly multiple preset feature vectors.
  • when the face recognition model is used for face recognition, databases of the facial features of the different registrants may be established separately; each database contains a registrant's facial features and the spatial positional relationships of the five facial organs. The feature vector among the plurality of preset feature vectors that matches the first feature vector is obtained, and it is determined that the currently acquired face image and the face image corresponding to the matched feature vector are face images of the same person.
  • during recognition, the corresponding database is first retrieved according to the recognized facial features, and the respective conversions are then carried out according to the corresponding spatial positional relationships; the conversion process and principle are the same as above and are not repeated here.
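  • a small illustrative sketch of matching the first feature vector against the preset feature vectors of multiple registrants (the nearest-match policy and all names are assumptions; the patent states only that the matching feature vector is obtained):

```python
import numpy as np

def match_registrant(first_vec, preset_vectors, threshold=0.8):
    """Return the id of the registrant whose preset vector matches first_vec,
    or None when no distance falls below the threshold."""
    best_id, best_dist = None, float('inf')
    for person_id, preset in preset_vectors.items():
        dist = float(np.linalg.norm(np.asarray(first_vec) - np.asarray(preset)))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < threshold else None

# hypothetical per-registrant preset feature vectors
presets = {"alice": np.random.rand(16), "bob": np.random.rand(16)}
probe = presets["alice"] + 0.01
print(match_registrant(probe, presets))   # "alice"
```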
  • step S3 of this embodiment includes: S300 Calculate the distance value between the first feature vector and the preset feature vector.
  • the distance value of this step may be a Euclidean distance, a cosine distance or a Mahalanobis distance.
  • the Euclidean distance is preferably used to represent the similarity between the first feature vector and the preset feature vector.
  • the Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $X$ is the face feature vector extracted from the face used for registration, $Y$ is the face feature vector extracted at verification time, and $n$ (a natural number) is the vector dimension.
  • S301 Determine whether the size of the distance value is less than a preset threshold.
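  • steps S300 and S301 can be sketched as follows (the 0.8 threshold follows the door-lock example above, and the 16-dimensional vectors are assumed to come from the DigitCaps layer):

```python
import numpy as np

def euclidean_distance(x, y):
    """d(X, Y) = sqrt(sum_i (x_i - y_i)^2)."""
    return float(np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

def is_same_person(first_vec, preset_vec, threshold=0.8):
    """Return True when the distance is below the preset threshold (S301)."""
    return euclidean_distance(first_vec, preset_vec) < threshold

# hypothetical 16-dimensional feature vectors from the DigitCaps layer
registered = np.random.rand(16)
probe = registered + 0.01 * np.random.rand(16)   # near-identical face
print(is_same_person(probe, registered))         # True for small distances
```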
  • face recognition is taken as an example here; the CapsNet-based multi-angle recognition method of this embodiment can also be extended to the recognition of target objects in other fields, and is not described here.
  • in this embodiment, the training model corresponding to face images at every angle is output through the CapsNet network structure, and the feature extraction mode for the face image changes with the viewing angle; without needing to eliminate the influence of viewpoint changes in neural activity, multiple different affine transformations or different components of different objects can be processed at the same time. The training model of this embodiment has the property of equivariant mapping: face images at each angle retain the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at various angles.
  • the training model includes the spatial positional relationship of each facial feature, and the face image captured from any angle at which any facial features can be recognized can be converted into the frontal face image through those spatial positional relationships; the face can thus be recognized accurately within any viewing range, such as top view, bottom view, side view or front view, avoiding the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane; this extends the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • after step S4, the method includes:
  • S5 issuing a control command to the security system equipped with the face recognition model to open the security system, so that the application entity controlled by the security system is in a usable state.
  • in this embodiment, the face recognition device is used in a designated security system; when the images are determined to be of the same person, a preset control command is sent to the security system, so as to better exercise the function of the security system.
  • the security system of this embodiment includes, but is not limited to, a smart door lock switch, an identity verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform, and a candidate authentication platform, etc., to improve the timeliness and accuracy of verification.
  • the application entities of this embodiment include physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
  • after step S5, the method further includes:
  • the statistics step counts whether the accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold.
  • the threshold range of this step can be specifically set according to different fields of use.
  • for example, this embodiment can be used in the field of game monitoring, to prevent the same person from remaining in a gaming state for a long time and harming their health; the threshold can be set, for instance, to 2 hours of accumulated gaming within a continuous 12-hour period.
  • this embodiment combines face recognition with time monitoring to further expand the application scenarios of face recognition; for example, time monitoring is combined with real-time face recognition to control game software and prevent excessive game addiction: the face recognition system monitors the use state of the game system and whether the user is the same person, and when the gaming time of the same person on the game system is monitored to exceed the preset value, the game system is controlled to enter a lock-screen state.
  • similarly, face recognition can be used to monitor the use of public facilities in the public domain: if the same person's continuous use time is determined to exceed the preset value, the public facility is shut down, and when another person is determined to have entered the public domain to use the facility, it is automatically opened for use, which is conducive to the rational allocation of public resources.
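  • a simplified sketch of the time-monitoring logic described above (the class, its names and the 2-hour within 12-hour values are illustrative assumptions):

```python
import time
from collections import deque

class UsageMonitor:
    """Tracks accumulated use per person within a sliding window (e.g. 12 h)
    and reports when it exceeds a threshold (e.g. 2 h)."""

    def __init__(self, window_s=12 * 3600, limit_s=2 * 3600):
        self.window_s = window_s
        self.limit_s = limit_s
        self.sessions = {}           # person_id -> deque of (start, end)

    def record_session(self, person_id, start, end):
        self.sessions.setdefault(person_id, deque()).append((start, end))

    def should_lock(self, person_id, now=None):
        now = now if now is not None else time.time()
        q = self.sessions.get(person_id, deque())
        while q and q[0][1] < now - self.window_s:   # drop sessions outside window
            q.popleft()
        used = sum(end - start for start, end in q)
        return used >= self.limit_s                   # lock screen if over the limit
```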
  • after step S4, the method further includes: summarizing the image data of the same face into the same specified file.
  • in summary, the structure of the output training model is changed, that is, the existing training model that only identifies specific entity categories is changed into a training model that considers both the specific entity categories and the spatial positional relationships of each specific entity category; the face image captured from any angle can be converted into the frontal face image through the spatial positional relationships, so recognition is possible under random postures without deliberately posing, making face recognition more flexible, efficient and user-friendly. The amount of data needed for the training model output through the CapsNet network is greatly reduced, and a training model with accurate recognition can be trained with fewer samples.
  • a CapsNet-based device for multi-angle face recognition includes:
  • the obtaining module 1 is configured to select, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the CapsNet network structure.
  • the shooting angles of this embodiment include any angle from which any facial features of the human face can be captured, defined relative to the frontal line of sight of the frontal face: for example, +90 degrees laterally is the direction directly facing the right side of the face, and -90 degrees laterally is the direction directly facing the left side of the face; likewise 90 degrees upward, 90 degrees downward, and so on, with all shooting angles distributed on a sphere centered on the frontal line of sight of the frontal face.
  • the shooting angle of this embodiment is acquired by the camera of the device equipped with the above face recognition model.
  • the first shooting angle of this embodiment is a shooting angle of the person to be verified, so as to distinguish it from the second shooting angles used when a registrant performs registration; likewise, the first feature extraction mode corresponds to the specified shooting angle of the person to be verified.
  • the second feature extraction modes corresponding to the second shooting angles used during registration are similar to the above; the terms "first" and "second" are used only for distinction and are not limiting.
  • the CapsNet network structure of this embodiment is a network structure based on capsule units.
  • a capsule unit is a vector of values, each value representing a feature of the object currently to be recognized, such as an eye among the facial features.
  • the capsule network is composed of capsule units, and a vector in the capsule network can represent not only the features of an object but also its orientation and state.
  • the probability that a face is present is represented by the length of the input and output vectors, while the direction of the vector represents some attribute of the facial features.
  • Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active.
  • the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule.
  • such attributes may include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectivity, color, texture, etc.
  • the length of the input and output vector indicates the probability of a face appearing, and the probability value must be between 0 and 1.
  • the capsule network of this embodiment uses the Squashing nonlinear function, which ensures that the length of a short vector shrinks to almost zero while the length of a long vector is compressed to be close to, but not more than, 1.
  • the Squashing nonlinear function is expressed as $v_j = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$, in two parts: the first factor, $\|s_j\|^2 / (1 + \|s_j\|^2)$, is the scaling of the input vector $s_j$, and the second factor, $s_j / \|s_j\|$, is the unit vector of $s_j$. The function preserves the direction of the input vector and compresses its length into the interval (0, 1), so that the probability that an entity appears is expressed by the magnitude of the vector modulus: the larger the modulus, the greater the probability. When $s_j$ is the zero vector, $v_j$ is 0; as $\|s_j\|$ tends to infinity, the length of $v_j$ approaches 1.
  • this nonlinear function can be regarded as a compression and redistribution of vector lengths, and also as the way the input vector is activated to produce the output vector.
  • the input vector of a Capsule is equivalent to the scalar input of a classical neural network neuron, and the calculation of this vector is equivalent to the propagation and connection between two Capsule layers.
  • the calculation of the input vector is divided into two stages, linear combination and routing (the routing process of this embodiment is dynamic routing), expressed as $\hat{u}_{j|i} = W_{ij} u_i$ and $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be regarded as each capsule neuron in the upper layer feeding a given neuron in the next layer with a different strength.
  • the coupling coefficient $c_{ij}$ is calculated by the softmax $c_{ij} = \dfrac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$.
  • the input computation of the CapsNet network structure in this embodiment is similar to the linear weighted summation of a conventional network, except that the coupling coefficient $c_{ij}$ is added in the linear summation stage.
  • $b_{ij}$ is updated according to $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$, with initial value 0; in the forward propagation that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0, from which $c_{ij}$ is obtained; $u_i$ is the output of the upper capsule layer and $v_j$ is the output vector of Capsule $j$; from these relationships the input $s_j$ of the higher capsule layer is obtained.
  • the dot product $\hat{u}_{j|i} \cdot v_j$ can be positive, zero or negative; a positive result means the two vectors point in roughly the same direction.
  • if the update makes $b_{ij}$ larger, the coupling coefficient becomes high, indicating that the two vectors match very well; otherwise the coupling coefficient is small, indicating that the two vectors do not match.
  • the face recognition model of this embodiment includes a coordinate system for the face image at each angle, and represents each specific entity of the face, such as an eye, an ear or a nose, as a vector; each vector includes the attribute parameters of the facial feature, such as size, position, orientation, color and other pose attributes, to express the relative spatial positional relationships of the facial features.
  • the facial features recognized at different angles differ, and the corresponding ways of converting the face image into the frontal face according to the spatial positional relationships differ accordingly: for example, the vector representation of the nose obtained in the top-view state differs from that acquired from the left-face direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-face nose vector into the frontal nose vector.
  • a first conversion module 2, configured to extract, in the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of the first face into a first feature vector of the frontal face image in the face recognition model.
  • different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ: for example, at the angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution among the facial features can be recognized, while perpendicular to the right side of the face only the lateral state of the right eye, the right half of the nose, the right ear and the like can be recognized.
  • the face recognition model is a three-dimensional model with spatial positional relationships, and a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to the inherent spatial positional relationships, into the image of the three-dimensional structure at another angle; for example, the lateral states recognized for the right eye, the right half of the nose, the right ear and so on are converted into the state seen perpendicular to the frontal face, thereby recognizing the frontal-state features of each organ among the facial features and the overall distribution state among the five organs, and outputting the first feature vector corresponding to the specific entity types of the frontal face, where the specific entity types correspond to the facial features of the face.
  • the determining module 3 is configured to determine whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
  • the preset feature vector of this embodiment is the result output by inputting the pre-stored face image of the preset angle into the feature extraction mode of the corresponding angle.
  • the preset threshold of this embodiment is 0.8 or more.
  • for example, the feature vector of the face image of user A is pre-registered, and the face recognition model in a smart door lock performs face recognition verification; the lock can be opened only after verification succeeds.
  • when user A stands sideways under the camera of the smart door lock, the camera captures the facial features of A's side face and transmits them to the face recognition model; the face recognition model, according to the facial features of A's side face, invokes the database storing A's facial features and their spatial positions, converts A's side face into the frontal face, outputs the feature vector corresponding to the facial features of the frontal face, calculates the comparison between this feature vector and the preset feature vector registered by A, and, if the calculated value is within the set threshold, controls the smart door lock to open.
  • the determining module 4 is configured to determine that the acquired face image and the face image corresponding to the preset feature vector are the face images of the same person if the similarity is less than the preset threshold.
  • for example, the threshold is set to 0.8; when the calculated value is less than 0.8, it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the face recognition device of this embodiment includes:
  • the acquisition module 10 is configured to collect facial image data of each shooting angle of a plurality of people to construct a training model sample library.
  • face image data of each angle, covering multiple people, is collected so as to train the model more accurately and improve its generalization ability; however, because the CapsNet network structure includes the spatial positional relationship factor, the sample data required to construct the training model is significantly reduced while a training model of higher accuracy is obtained.
  • in an existing CNN (Convolutional Neural Network), the pooling layer reduces the number of parameters and helps avoid over-fitting, but it also discards some information, such as positional information; a CNN pays no attention to the relative orientation and spatial relationships of components, caring only about whether specific features are present.
  • CapsNet (Capsule Networks), by contrast, does not need to eliminate the influence of viewpoint changes in neural activity, and can simultaneously process multiple different affine transformations or different components of different objects, so that the training model can recognize face images at every angle.
  • the training model of this embodiment has the property of equivariant mapping: the face image at each angle retains the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize the face at every observed angle.
  • the CapsNet network structure of this embodiment outputs a training model corresponding to face images at every angle, and can accurately recognize the face within any viewing range in which any facial features can be recognized, such as top view, bottom view, side view or front view, without requiring a corrected pose; this improves the flexibility and efficiency of face recognition, avoids the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane, expands the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • for example, let A be an m×p feature data matrix of the facial features and B a p×n spatial position relationship matrix of the facial features; the combination of the facial feature data with the spatial positions of the facial features is then represented in the model by the matrix product, and matrix multiplication combines the facial feature data compactly, so that a face recognition model with spatial positional relationships is represented simply.
  • the training module 11 is configured to input the face image data of the training model sample library into the convolutional neural network of the CapsNet network structure for training, to obtain the face recognition model.
  • the training model of this embodiment can recognize the facial features from any angle at which they can be recognized, and converts them into the facial features of the frontal face according to the spatial positional relationships in the training model, thereby performing face recognition.
  • the training module 11 of this embodiment includes:
  • a first input unit 111, configured to input the face image data of each angle in the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with the first specified convolution kernel and the first specified stride, and output a tensor through the specified activation function.
  • the CapsNet network structure of this embodiment includes two convolution layers and a fully connected layer; the first convolution layer is a conventional convolution layer, which serves to detect pixel-level local features.
  • the first convolutional layer of this embodiment has 256 9*9 first designated convolution kernels, the first specified step size is 1, and the activation function is designated as ReLU.
  • the first convolution layer converts the pixel brightness into an activation of the local feature detector, and the output tensor of the first convolution layer serves as the input to the second convolution layer.
  • a second input unit 112, configured to input the tensor into the second convolution layer of the CapsNet network structure, and to convolve it with the second specified convolution kernel and the second specified stride to construct a tensor structure and output the Capsule vectors of the tensor structure.
  • the second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multi-dimensional entity categories; it has 32 channels, each channel is composed of an 8-dimensional convolution structure, and each channel outputs an 8-dimensional vector, achieving the effect of 8*1 Capsule feature encapsulation.
  • the Capsule of this embodiment is a set of neurons whose input and output vectors represent instantiation parameters of a particular entity class.
  • the specific entity category of this embodiment is a facial feature.
  • the second specified convolution kernel of this embodiment is a 9*9 convolution kernel and the second specified step size is 2.
  • eight convolution units are packaged together into a new Capsule unit.
  • the convolution calculation of the Primary Capsules layer does not use an activation function such as ReLU, but prepares the input into the next layer of Capsule units in a vector manner.
  • the updating unit 113 is configured to propagate and update the Capsule vector through a DigitCaps layer (digital capsule layer) of the CapsNet network structure, and output a face recognition model.
  • the activation vector modulus of each Capsule gives an instance of each specific entity class.
  • one very special attribute is the presence of an instance of a specific entity category in the image; for example, the output range of the DigitCaps layer is between 0 and 1, where 0 indicates absence and 1 indicates presence.
  • the input of the DigitCaps layer is the output vectors u_i of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors v_j of the DigitCaps layer have dimension [16, 1], and the 16-dimensional output of CapsNet makes the training model of this embodiment robust.
  • an apparatus for recognizing a human face includes:
  • the receiving module 30 is configured to receive each second shooting angle face image of the registered human face.
  • the second shooting angles of this embodiment are the shooting angles of the registrant, to distinguish them from the first shooting angle of the person to be verified.
  • the registered face of the embodiment includes a face of one or more people, so as to recognize a face image of a plurality of people on the same recognition device, and expand the application range.
  • the second conversion module 31 is configured to select, according to the face image of each second shooting angle, the second feature extraction mode corresponding to that second shooting angle, to extract one by one the features of the second face of the registered face corresponding to each second shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of each second face into a second feature vector of the frontal face image of the registrant in the face recognition model.
  • the conversion mode and conversion principle from the second features of the face image at each second shooting angle of the registered face to the face image of the frontal face are the same as those of the first conversion module 2 and are not repeated here.
  • the setting module 32 is configured to set the second feature vector to be the preset feature vector.
  • the second feature vector is set as the preset feature vector, so that when the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the person to be verified and the registrant are determined to be the same person.
  • there may be multiple registrants, with correspondingly multiple preset feature vectors.
  • when the face recognition model is used for face recognition, databases of the facial features of different registrants may be established separately; each database contains a registrant's facial features and the spatial positional relationships of the five facial organs.
  • the setting module 32 includes an obtaining unit, configured to obtain the feature vector among the plurality of preset feature vectors that matches the first feature vector, and a determining unit, configured to determine that the currently acquired face image and the face image corresponding to the matched feature vector are face images of the same person.
  • during recognition, the corresponding database is first retrieved according to the recognized facial features, and the respective conversions are then carried out according to the corresponding spatial positional relationships.
  • the conversion process and the principle are the same as those described above, and will not be described again.
  • the determining module 3 of the embodiment includes:
  • the calculating unit 300 is configured to calculate a distance value between the first feature vector and the preset vector.
  • the distance value of this embodiment may be a Euclidean distance, a cosine distance or a Mahalanobis distance.
  • the Euclidean distance is preferably used to represent the similarity between the first feature vector and the preset feature vector.
  • the Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $X$ is the face feature vector extracted from the face used for registration, $Y$ is the face feature vector extracted at verification time, and $n$ (a natural number) is the vector dimension.
  • the determining unit 301 is configured to determine whether the size of the distance value is less than a preset threshold.
  • face recognition is taken as an example here; the CapsNet-based multi-angle recognition method can also be extended to the recognition of target objects in other fields.
  • in this embodiment, the training model corresponding to face images at every angle is output through the CapsNet network structure, and the feature extraction mode for the face image changes with the viewing angle; without needing to eliminate the influence of viewpoint changes in neural activity, multiple different affine transformations or different components of different objects can be processed at the same time. The training model of this embodiment has the property of equivariant mapping: face images at each angle retain the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at various angles.
  • the CapsNet model of this embodiment includes the spatial positional relationship of each facial feature, and the face image captured from any angle at which any facial features can be recognized can be converted into the frontal face image through those spatial positional relationships; the face can thus be recognized accurately within any viewing range, such as top view, bottom view, side view or front view, avoiding the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane; this extends the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • an apparatus for recognizing a human face includes:
  • the issuing module 5 is configured to issue a control instruction to the security system equipped with the face recognition model to open the security system to make the application entity controlled by the security system in a usable state.
  • in this embodiment, the face recognition device is used in a designated security system; when the images are determined to be of the same person, a preset control command is sent to the security system, so as to better exercise the function of the security system.
  • the security system of this embodiment includes, but is not limited to, a smart door lock switch, an identity verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform, and a candidate authentication platform, etc., to improve the timeliness and accuracy of verification.
  • the application entities of this embodiment include physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
  • an apparatus for recognizing a human face includes:
  • the statistics module 6, configured to count whether the cumulative length of time that the same person continues to use the application entity within a specified time period exceeds a threshold.
  • the threshold of this embodiment can be set specifically for different fields of use.
  • for example, in the field of game monitoring, this embodiment prevents the same person from remaining in a game state for too long, which affects physical health.
  • the threshold can be set to 2 hours of cumulative game time within a continuous 12-hour period.
  • the generating module 7, configured to generate an instruction to close the application entity if the time length exceeds the threshold, so as to prohibit continued use of the application entity.
  • this embodiment is combined with time monitoring to further expand the application scenarios of face recognition; for example, time monitoring is combined with real-time face recognition to manage game software and prevent excessive game addiction.
  • the face recognition system monitors the use state of the game system and whether the user is the same person; when the cumulative game time of the same person in the game system exceeds the preset value, the game system is placed in a lock-screen state.
  • face recognition can also monitor the use of public facilities in a designated public area: if the continuous use time of the same person exceeds the preset value, the facility is shut down, and when another person enters the area it is automatically reopened for use, which facilitates the rational allocation of public resources.
  • an apparatus for recognizing a human face includes:
  • the summary module 8 is configured to gather the image data of the same face into the same specified file.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface and a database connected by a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for running the operating system and the computer readable instructions stored in the non-volatile storage medium.
  • the database of the computer device is used to store data such as face recognition data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions, when executed, perform the flow of the method embodiments described above. It will be understood by those skilled in the art that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application, and does not limit the computer devices to which the present application may be applied.
  • An embodiment of the present application also provides a computer non-volatile readable storage medium having stored thereon computer readable instructions that, when executed, perform the processes of the method embodiments described above.
  • the above description covers only preferred embodiments of the present application and is not intended to limit the scope of the patent; equivalent structures or equivalent process transformations made using the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the present application.


Abstract

The present application provides a facial recognition method comprising: selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure; extracting, according to the first feature extraction manner, the facial features corresponding to the first capture angle, and converting them, according to the spatial positional relationship carried in the model, into a first feature vector of a frontal face image; determining whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold; and if so, determining that the two images are of the same face.

Description

Method, device, computer device and storage medium for recognizing a human face
This application claims priority to Chinese Patent Application No. 2018103446690, filed with the China Patent Office on April 17, 2018 and entitled "Method, device, computer device and storage medium for recognizing a human face", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of application of convolutional neural networks, and in particular to a method, a device, a computer device and a storage medium for recognizing a human face.
Background
With the continuous advancement of society and the urgent need for fast and effective automatic identity verification, biometric recognition technology has developed rapidly in recent decades. Compared with other biometrics, face recognition is direct, friendly and convenient, and has therefore been studied more widely. A face is composed of the eyes, nose, mouth, chin and other parts; it is precisely the differences in the shape, size and distribution of these parts that make every face in the world distinct, so these parts can serve as important features for face recognition. In existing face recognition, whether frontal or profile, the face must pose squarely toward the camera to be recognized accurately. Moreover, existing face recognition only checks whether the specific entity types representing a face are present and consistent, without considering the links between the spatial positions of those entity types, so its accuracy is limited and recognition is only possible by comparing images posed at a specified attitude. Existing face recognition is thus mechanically rigid and lacks a humanized design that fits human habits. In addition, existing face recognition has low accuracy for highly similar faces, for example twins whose facial features are extremely alike.
Technical problem
The main purpose of the present application is to provide a method for recognizing a human face, aiming to solve the technical problem that existing face recognition is mechanically rigid and of limited accuracy.
Technical solution
The present application proposes a method for recognizing a face, including:
selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure;
extracting, according to the first feature extraction manner, the features of the first specific entity types corresponding to the first capture angle, and converting the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of a frontal face image in the face recognition model;
determining whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold;
if it is less, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
The present application proposes a device for recognizing a face, including:
an acquisition module, configured to select, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure;
a first conversion module, configured to extract, according to the first feature extraction manner, the features of the first face corresponding to the first capture angle, and to convert the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of a frontal face image in the face recognition model;
a judging module, configured to determine whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold;
a determining module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
The present application also provides a computer device, including a memory and a processor, the memory storing computer readable instructions, and the processor implementing the steps of the above method when executing the computer readable instructions.
The present application also provides a computer non-volatile readable storage medium on which computer readable instructions are stored, the computer readable instructions implementing the steps of the above method when executed by a processor.
Beneficial effects
By changing the convolution structure, the present application changes the structure of the output training model: an existing training model that only recognizes specific entity categories is replaced with one that recognizes both the specific entity categories and their spatial positional relationships, enabling accurate representation and recognition of a face image from any angle at which any facial features can be observed. The training model of the present application contains the spatial positional relationships of the facial features of each face; it recognizes not only the facial features themselves but also their spatial arrangement, improving the precision of face recognition. Because these spatial positional relationships are contained in the model, the frontal image of a face can be reconstructed from any angle at which any facial features are visible, so a face can be recognized in an arbitrary pose without deliberate posing, making face recognition more flexible, efficient and user-friendly. Moreover, the amount of data required to train the model is greatly reduced, and a training model with accurate recognition can be trained from fewer samples.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a method for recognizing a human face according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for recognizing a human face according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an optimized structure of a device for recognizing a human face according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for recognizing a human face according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a judging module according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a device for recognizing a human face according to still another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a device for recognizing a human face according to yet another embodiment of the present application;
FIG. 9 is a schematic structural diagram of a device for recognizing a human face according to yet another embodiment of the present application;
FIG. 10 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
Best mode for carrying out the invention
Referring to FIG. 1, a method for recognizing a human face according to an embodiment of the present application includes:
S1: selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure.
The capture angles of this embodiment include any angle, referenced to the straight-ahead line of sight of a frontal face, from which any facial features can be photographed: for example, 90 degrees to the side is the exact right-profile or left-profile direction, 90 degrees above is the top-view direction, 90 degrees below is the bottom-view direction, and so on; all capture angles are distributed on a sphere-like surface centered on the straight-ahead line of sight of the frontal face. The capture angle in this embodiment is obtained by the camera fitted with the above face recognition model. The first capture angle is a specified capture angle of the person to be verified, distinguished from the second capture angles at which the registrant registers; likewise, the first feature extraction manner corresponds to that specified capture angle, distinguished from the second feature extraction manners corresponding to the second capture angles at registration. Throughout, "first" and "second" are only distinguishing labels and are not limiting.
The CapsNet network structure of this embodiment is based on capsule units. A capsule unit is a vector containing arbitrary values, each value representing one feature of the object currently to be recognized, such as the eyes among the facial features. The capsule network is composed of capsule units, and its vectors can represent not only the features of an object but also its orientation, state and so on. In this embodiment, the length of an input or output vector represents the probability that a face is present, and the direction of the vector represents certain attributes of the facial features. Capsules at one level predict the instantiation parameters of higher-level capsules through transformation matrices; when multiple predictions agree (this embodiment uses dynamic routing to bring predictions into agreement), the higher-level capsule becomes active. The activation of the neurons in a capsule expresses the various properties of the facial features present in the face image; these properties can include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectance, color and texture.
The probability that a face appears is expressed by the length of the input and output vectors, so the probability value must lie between 0 and 1. To achieve this compression and implement the capsule-level activation, the capsule network of this embodiment uses the squashing nonlinearity, which shortens short vectors to a length of almost zero while compressing long vectors to a length close to, but not exceeding, 1. The expression of the squashing nonlinearity consists of two parts,
$$\frac{\|s_j\|^2}{1+\|s_j\|^2} \qquad \text{and} \qquad \frac{s_j}{\|s_j\|},$$
and the full nonlinearity is:
$$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\cdot\frac{s_j}{\|s_j\|}.$$
The first part is the scaling factor of the input vector $s_j$ and the second part is the unit vector of $s_j$, so the function preserves the direction of the input vector while compressing its length into the interval (0, 1); the modulus of a vector thus measures the probability that an entity is present, and the larger the modulus, the larger the probability. When $s_j$ is the zero vector, $v_j$ takes the value 0; as $s_j$ grows without bound, the length of $v_j$ approaches 1. The nonlinearity can be viewed as a compression and redistribution of vector length, or as the activation that maps an input vector to an output vector.
The input vector of a capsule is the counterpart of the scalar input of a CNN, and its computation corresponds to the propagation and connection between two capsule layers. The computation is divided into two stages, linear combination and routing (dynamic routing in this embodiment):
$$\hat{u}_{j|i} = W_{ij}\,u_i, \qquad s_j = \sum_i c_{ij}\,\hat{u}_{j|i},$$
where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be seen as each capsule neuron of the previous layer connecting, with different strengths, to a given neuron of the next layer. The coupling coefficients $c_{ij}$ are computed by the softmax:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}.$$
Compared with a CNN network structure, the network input of the CapsNet structure of this embodiment, a linear weighted sum, is very similar, but a coupling coefficient $c_{ij}$ is added in the linear summation stage; to obtain $c_{ij}$, the logit $b_{ij}$ must be computed first, updated according to:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j.$$
The initial value of $b_{ij}$ is 0, so in the forward pass that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0 to obtain the coupling coefficients; $u$ is the output of the previous capsule layer, $v_j$ is the output vector of capsule $j$, and the higher-layer capsule input $s_j$ is obtained from the relations above. For two vectors of given lengths but different directions, the dot product can be positive, zero or negative: a positive product means the two vectors point in similar directions, so the update makes $b_{ij}$ larger and the coupling coefficient higher, indicating that the two vectors match well; a negative product makes $b_{ij}$ smaller and the coupling coefficient lower, indicating a mismatch. Determining the coupling coefficients by iteration amounts to determining a route along which the moduli of the capsule neurons are especially large; at the end of the route lies the correctly predicted capsule.
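As a concrete illustration, the following minimal NumPy sketch implements the squashing nonlinearity and the dynamic-routing loop described above. It is a sketch under assumed toy dimensions (1152 input capsules of 8 dimensions predicting 10 output capsules of 16 dimensions, as in the original CapsNet paper); the random weights, the capsule counts and the three routing iterations are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squashing nonlinearity: keeps direction, maps length into (0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between one capsule layer and the next.

    u_hat: predictions of shape (num_in, num_out, dim_out),
           i.e. u_hat[i, j] = W_ij @ u_i.
    Returns the output vectors v of shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                           # logits start at 0
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[:, :, None] * u_hat).sum(axis=0)               # weighted sum
        v = squash(s)                                         # output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # agreement update
    return v

# Toy example: 1152 input capsules (8D) predicting 10 output capsules (16D).
rng = np.random.default_rng(0)
u = rng.normal(size=(1152, 8))                # previous-layer capsule outputs
W = 0.01 * rng.normal(size=(1152, 10, 16, 8))
u_hat = np.einsum('ijkl,il->ijk', W, u)       # u_hat[i, j] = W_ij @ u_i
v = dynamic_routing(u_hat)
print(v.shape, np.linalg.norm(v, axis=-1))    # (10, 16), lengths in (0, 1)
```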
The face recognition model of this embodiment contains a coordinate system for face images at each angle, and a vector represents a specific facial-feature entity type: the vector encodes facial-feature categories such as the eyes, ears and nose of the face, and also includes the parameters of the facial features, such as pose attributes like size, position, orientation and color, so as to express the conversion relationships among the relative spatial positions of the facial features. The facial features recognized at each angle differ, and so does the corresponding way of converting them, by the spatial positional relationship, into a frontal face image. For example, the vector representation of the nose obtained from a top view differs from that obtained from the left-profile direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-profile nose vector into the frontal nose vector.
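As a hedged sketch of this view conversion: assuming a facial feature is described by a 3D direction vector observed from a known capture angle, re-expressing it in the frontal coordinate frame is a rotation, i.e. a matrix multiplication. The angles, the landmark vector and the helper name below are illustrative assumptions, not the patent's data.

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the vertical (yaw) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical: a nose direction vector as observed from the left-profile
# view (yaw offset of 90 degrees). Rotating by the inverse yaw re-expresses
# it in the frontal-face coordinate system; the sign convention is arbitrary.
observed_nose = np.array([1.0, 0.0, 0.0])   # as seen in the profile view
to_frontal = rotation_y(np.pi / 2)
frontal_nose = to_frontal @ observed_nose
print(frontal_nose)                          # -> [0, 0, -1] under this convention
```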
S2: extracting, according to the first feature extraction manner, the features of the first face corresponding to the first capture angle, and converting the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of the frontal face image in the face recognition model.
In this embodiment, face images at different angles correspond to different feature extraction manners, and the features of the first face recognized at different angles differ. For example, at an angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution of the facial features can be recognized, whereas perpendicular to the right-profile direction only the side states of the right eye, the right half of the nose, the right ear and similar organs can be recognized. However, the face recognition model of this embodiment is a three-dimensional model carrying spatial positional relationships, so a part of the three-dimensional structure recognized at the first capture angle can be converted, according to the inherent spatial positional relationship, into the image of that structure at another angle: for example, the side states of the right eye, the right half of the nose and the right ear recognized above are converted into the state captured perpendicular to the frontal face, whereupon the frontal state of each organ and the overall distribution of the facial features are recognized, and the first feature vector corresponding to the specific entity types of the frontal face is output, the specific entity types corresponding to the facial features.
S3: determining whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
The preset feature vector of this embodiment is the output obtained by feeding a face image collected at a preset angle into the feature extraction manner for that angle. The preset threshold of this embodiment is 0.8 or above. For example, a home smart door lock pre-registers the feature vector of A's face image; when A wants to enter the home, face recognition verification must be performed through the face recognition model in the smart lock, and the lock opens only if verification passes. When A stands in profile under the camera of the smart lock, the camera captures the facial features of A's profile and feeds them to the face recognition model; based on those profile features, the model retrieves A's recognition database, applies A's facial-feature spatial positional relationships, converts A's profile into a frontal face, and outputs the feature vector of the frontal facial features. This feature vector is compared against A's registered preset feature vector, and if the computed value is within the set threshold, the smart lock is controlled to open.
S4: if it is less, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
In this embodiment, the smaller the distance value, the higher the similarity. For example, with the threshold set to 0.8, when the value is less than 0.8 it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
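Putting S3 and S4 together, the decision in the door-lock example can be sketched as below. The stand-in feature vectors are invented for illustration; in the embodiment they would come from the CapsNet model's angle-specific extraction (S1/S2) and from registration, and the 0.8 threshold follows the example above.

```python
import numpy as np

THRESHOLD = 0.8  # per the door-lock example in this embodiment

def euclidean(x, y):
    """Distance between two face feature vectors (smaller = more similar)."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def verify(first_vector, preset_vector, threshold=THRESHOLD):
    """S3/S4: same person iff the distance is below the preset threshold."""
    return euclidean(first_vector, preset_vector) < threshold

# Stand-in vectors for the registered face and the probe face.
registered = np.array([0.12, 0.80, 0.33, 0.54])
probe      = np.array([0.10, 0.78, 0.35, 0.50])
print("open lock" if verify(probe, registered) else "stay locked")
```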
Further, before step S1 of this embodiment, the method includes:
S10: collecting face image data of multiple persons at each capture angle to construct a training model sample library.
This embodiment collects face image data at all angles, including face image data of multiple persons at each angle, so as to train the model more accurately and improve its generalization ability. Because the CapsNet network structure incorporates spatial positional relationships, the sample data needed to build the training model is markedly less than for an existing CNN network structure, yet a more accurate training model is obtained. In a CNN (Convolutional Neural Network), the pooling layer reduces parameters and avoids overfitting, but at the same time discards information such as position. A CNN does not attend to the orientation of components or their relative spatial relationships; it only cares whether specific features are present. For example, when face recognition is performed with a CNN, the two eyes, nose, mouth and so on must all be detected before the face can be recognized, so the face must be posed squarely for the needed features to be captured precisely; and even if the positions of the two eyes, nose and mouth change during recognition, the CNN still reports a face as long as all the features are detected, leading to large recognition errors, with especially high error rates when recognizing objects other than faces. The CapsNet (Capsule Networks) structure of this embodiment makes full use of spatial positional relationships and builds the training model by matrix multiplication; the neural activities adopted in CapsNet change with the viewing angle rather than discarding viewpoint information, and multiple different affine transformations, or different parts of different objects, can be processed at the same time, so the training model can recognize face images at all angles. The training model of this embodiment is equivariant: face images at each angle can still be represented and recognized after rotation, translation and scaling, so the training model can recognize a face from any angle at which its facial features are observed. For example, the training model output by the CapsNet network structure of this embodiment can accurately recognize a face within any viewing range, such as a top view, bottom view, side view or front view, in which any facial features are visible, without requiring a square pose. This improves the flexibility and efficiency of face recognition, avoids the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
In this embodiment, let A be the m×p feature data matrix of the facial features and B the p×n spatial positional relationship matrix of the facial features; then the m×n matrix M is called the product of A and B, written M = AB. The product is meaningful only when the number of columns of A equals the number of rows of B, and the element in row i, column j of M can be expressed as:
$$M_{ij} = \sum_{k=1}^{p} a_{ik}\,b_{kj},$$
where a denotes the entries of matrix A, b the entries of matrix B, and p is the shared dimension (the number of columns of A, equal to the number of rows of B). Through this matrix multiplication, the combination of the facial-feature data and the spatial positional relationships of the facial features is expressed in the model by the matrix M; matrix multiplication gathers the many facial-feature data compactly together, giving a simple representation of a face recognition model that carries spatial positional relationships.
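A minimal check of the element formula above, with assumed small dimensions (m = 2, p = 3, n = 2); A and B here are random stand-ins, not real feature or relationship data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))   # m x p feature data matrix (stand-in)
B = rng.normal(size=(3, 2))   # p x n spatial relationship matrix (stand-in)

M = A @ B                     # M = AB, shape m x n
# Element-wise definition: M[i, j] = sum_k A[i, k] * B[k, j]
M_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                      for j in range(2)] for i in range(2)])
assert np.allclose(M, M_manual)
print(M)
```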
S11: inputting the face image data of the training model sample library into the convolutional-layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
The training model of this embodiment can recognize facial features from any angle at which the facial features are observed, and converts them, according to the spatial positional relationship carried in the training model, into the frontal facial features, whereupon face recognition is performed.
Further, step S11 of this embodiment includes:
S111: inputting the face image data at each angle from the training model sample library into the first convolutional layer of the CapsNet network structure, convolving with a first specified convolution kernel at a first specified stride, and outputting a tensor through a specified activation function.
The CapsNet network structure of this embodiment includes two convolutional layers and one fully connected layer; the first convolutional layer is a conventional convolutional layer that detects pixel-level local features. The first convolutional layer has 256 first specified convolution kernels of size 9×9, the first specified stride is 1, and the specified activation function is ReLU. The first convolutional layer converts pixel intensities into activations of local feature detectors, and its output tensor serves as the input of the second convolutional layer.
S112: inputting the tensor into the second convolutional layer of the CapsNet network structure and convolving with a second specified convolution kernel at a second specified stride to construct the tensor structure, and outputting the Capsule vectors of the tensor structure.
The second convolutional layer of this embodiment is the Primary Capsules layer, the lowest layer of multi-dimensional entity categories; it has 32 channels, each composed of an 8-dimensional convolutional structure, and each channel outputs an 8-dimensional vector, achieving the feature encapsulation effect of 8×1 capsules. A capsule here is a group of neurons whose input and output vectors represent the instantiation parameters of a specific entity category; the specific entity categories of this embodiment are the facial features. For example, the second specified convolution kernel is a 9×9 kernel and the second specified stride is 2. In the CapsNet architecture of this embodiment, eight convolution units are packaged together into one new capsule unit. The convolution computation of the Primary Capsules layer uses no activation function such as ReLU (Rectified Linear Unit); instead, its outputs are prepared as vectors for input to the next capsule layer.
S113: propagating and routing-updating the Capsule vectors through the DigitCaps layer (digit capsule layer) of the CapsNet network structure, and outputting the face recognition model.
In the DigitCaps layer of the CapsNet network structure of this embodiment, the modulus of each capsule's activation vector indicates whether an instance of each specific entity category is present; for example, the output range of the DigitCaps layer is between 0 and 1, where 0 indicates absence and 1 indicates presence. The input of the DigitCaps layer is the output vectors of all capsules in the Primary Capsules layer, of vector dimension [8, 1]; the output vectors of the DigitCaps layer have vector dimension [16, 1]. The 16-dimensional output training model of the CapsNet of this embodiment is robust.
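The layer settings above (256 9×9 kernels at stride 1 with ReLU, then 32 channels of 8-dimensional primary capsules via 9×9 kernels at stride 2, then 16-dimensional digit capsules computed by routing) can be sketched at shape level as follows. This assumes a 28×28 single-channel input, as in the original CapsNet paper; the patent does not state an input resolution, the output-capsule count of 10 is a hypothetical placeholder, and the capsule grouping by naive reshape is an illustrative simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-9):
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class CapsNetSketch(nn.Module):
    def __init__(self, num_out_caps=10):  # output-capsule count: assumed
        super().__init__()
        self.conv1 = nn.Conv2d(1, 256, kernel_size=9, stride=1)          # S111
        self.primary = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)   # S112
        # S113: one transformation matrix W_ij per (input, output) capsule pair.
        self.W = nn.Parameter(0.01 * torch.randn(1152, num_out_caps, 16, 8))

    def forward(self, x):                        # x: (batch, 1, 28, 28) assumed
        t = F.relu(self.conv1(x))                # -> (batch, 256, 20, 20)
        p = self.primary(t)                      # -> (batch, 256, 6, 6)
        # Naively group activations into 1152 capsules of 8D for illustration.
        u = squash(p.reshape(x.size(0), -1, 8))  # -> (batch, 1152, 8)
        u_hat = torch.einsum('iokd,bid->biok', self.W, u)   # predictions
        b = torch.zeros(u_hat.shape[:3], device=x.device)   # routing logits
        for _ in range(3):                       # dynamic routing, 3 iterations
            c = F.softmax(b, dim=2)
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))  # (batch, o, 16)
            b = b + torch.einsum('biok,bok->bio', u_hat, v)
        return v                                 # capsule lengths give presence

model = CapsNetSketch()
out = model(torch.zeros(2, 1, 28, 28))
print(out.shape)                                 # torch.Size([2, 10, 16])
```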
In another embodiment of the present application, before step S3, the method includes:
S30: receiving face images of the registered face at each second capture angle.
The second capture angles of this embodiment are the capture angles of the registrant, as distinguished from the first capture angle of the person to be verified; here "first" and "second" are only distinguishing labels and are not limiting, and the same applies elsewhere in this application. The registered face of this embodiment includes the faces of one or more persons, so that face images of multiple persons can be recognized on the same recognition device, expanding the scope of application.
S31: selecting, according to each second capture angle, the second feature extraction manner corresponding to that angle; extracting, in one-to-one correspondence, the features of each second face of the registered face corresponding to each second capture angle; and converting the features of each second face, according to the spatial positional relationship carried in the face recognition model, into second feature vectors of the frontal face image of the registrant.
The conversion manner and principle by which the features of each second face at each second capture angle of the registered face are converted into the second features of the frontal face image are the same as in step S2 and are not repeated here.
S32: setting the second feature vector as the preset feature vector.
By taking the second feature vector registered through model verification as the reference preset feature vector, the face feature vector of the person to be tested can be identified accurately: when the face feature vector under test is close to the preset feature vector, the two are determined to be the same person.
The embodiments of the present application also cover the case where there are multiple registrants and correspondingly multiple preset feature vectors. When performing face recognition with the face recognition model, a corresponding database is established for each registrant according to the differences in their facial features; each database contains that registrant's facial features and the spatial positional relationships of those features. Among the multiple preset feature vectors, the feature vector matching the first feature vector is obtained, and the currently acquired face image is determined to be the same face as the face image corresponding to the matched feature vector. During recognition, the model first retrieves the corresponding database according to the recognized facial features and then performs the respective conversion according to the corresponding spatial positional relationship; the conversion process and principle are as described above and are not repeated here.
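A hedged sketch of the multi-registrant case: each enrolled person contributes one preset feature vector, the probe is matched to the nearest one, and the match is accepted only if the distance clears the threshold. The enrollment data, names and 0.8 threshold are invented for illustration.

```python
import numpy as np

THRESHOLD = 0.8  # assumed, following the earlier example

def match_registrant(first_vector, enrolled):
    """Return the best-matching registrant id, or None if no distance
    falls below the threshold. `enrolled` maps id -> preset vector."""
    best_id, best_dist = None, float("inf")
    for person_id, preset in enrolled.items():
        dist = float(np.sqrt(np.sum((first_vector - preset) ** 2)))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < THRESHOLD else None

enrolled = {                       # invented enrollment database
    "A": np.array([0.11, 0.82, 0.30]),
    "B": np.array([0.75, 0.20, 0.66]),
}
print(match_registrant(np.array([0.10, 0.80, 0.33]), enrolled))  # -> "A"
```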
Further, step S3 of this embodiment includes:
S300: calculating the distance value between the first feature vector and the preset feature vector.
The distance value of this step may be the Euclidean distance, the cosine distance or the Mahalanobis distance; this embodiment preferably uses the Euclidean distance to represent the similarity between the first feature vector and the preset feature vector. The Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as:
$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2},$$
where X is the face feature vector extracted from the face used at registration, Y is the face feature vector extracted at verification, and n is the dimension of the feature vectors (a natural number).
S301: determining whether the distance value is less than the preset threshold.
This embodiment takes face recognition as an example; the CapsNet-based multi-angle recognition method can also be extended to the identification of target objects in other fields, which is not detailed here.
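Since the step mentions Euclidean, cosine and Mahalanobis distance as candidate distance values, a short sketch of all three follows. The sample vectors are stand-ins, and the identity covariance for the Mahalanobis variant is an illustrative assumption (in practice it would be estimated from registered feature vectors).

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cosine_distance(x, y):
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def mahalanobis(x, y, cov):
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

x = np.array([0.2, 0.9, 0.4])
y = np.array([0.1, 0.8, 0.5])
cov = np.eye(3)  # assumed identity covariance; then it reduces to Euclidean
print(euclidean(x, y), cosine_distance(x, y), mahalanobis(x, y, cov))
```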
In this embodiment, the CapsNet network structure outputs a training model for face images at each angle; the feature extraction manner corresponding to each angle changes with the viewing angle rather than discarding viewpoint information from the neural activities, so multiple different affine transformations, or different parts of different objects, can be processed at the same time. The training model of this embodiment is equivariant: face images at each angle can still be represented and recognized after rotation, translation and scaling, so the training model can recognize face images at all angles. The training model output by the CapsNet network contains the spatial positional relationships of the facial features of each face, so from any angle at which any facial features can be observed, the frontal image of the face can be reconstructed through the spatial positional relationship. With the face recognition method of this embodiment, facial features can be accurately recognized from any viewing angle, such as a top view, bottom view, side view or front view, avoiding the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane; this extends the flexibility of face recognition and improves the user experience of face recognition devices, without changing the hardware of an existing face recognition system.
In still another embodiment of the present application, after step S4, the method includes:
S5: issuing a control instruction to a security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state.
In this embodiment, the face recognition device is used in a designated security system. When face recognition determines that the user is the same person, a preset control instruction for that person is sent to the security system, so that the security system functions more effectively. The security system of this embodiment includes, but is not limited to, smart door locks, identity-verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform and a candidate authentication platform, improving the timeliness and accuracy of verification. The application entity of this embodiment includes physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
In yet another embodiment of the present application, after step S5, the method further includes:
S6: counting whether the cumulative length of time that the same person continues to use the application entity within a specified time period exceeds a threshold.
The threshold of this step can be set specifically for different fields of use. For example, in the field of game monitoring, this embodiment prevents the same person from remaining in a game state for too long, which affects physical health; the threshold can be set to 2 hours of cumulative game time within a continuous 12-hour period.
S7: if it exceeds the threshold, generating an instruction to close the application entity, so as to prohibit continued use of the application entity.
This embodiment is combined with time monitoring to further expand the application scenarios of face recognition. For example, time monitoring is combined with real-time face recognition to manage game software and prevent excessive game addiction: the face recognition system monitors the use state of the game system and whether the user is the same person, and when the cumulative game time of the same person in the game system exceeds the preset value, the game system is placed in a lock-screen state.
Another example is the managed allocation of public resources: face recognition monitors the use of public facilities within a designated public area, and if the continuous use time of the same person exceeds the preset value, the facility is shut down; when another person enters the area to use the facility, it is automatically reopened for use, which facilitates the rational allocation of public resources.
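A minimal sketch of the S6/S7 bookkeeping, under the example thresholds above (2 hours of cumulative use within a continuous 12-hour window). The session log format is an illustrative assumption.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=12)    # specified time period (example value)
LIMIT = timedelta(hours=2)      # cumulative-use threshold (example value)

def should_close(sessions, now):
    """S6/S7: sessions is a list of (start, end) datetimes for one person;
    return True if cumulative use within the last WINDOW exceeds LIMIT."""
    cutoff = now - WINDOW
    used = timedelta()
    for start, end in sessions:
        overlap = min(end, now) - max(start, cutoff)
        if overlap > timedelta():
            used += overlap
    return used > LIMIT

now = datetime(2018, 4, 17, 20, 0)
sessions = [(datetime(2018, 4, 17, 9, 0), datetime(2018, 4, 17, 10, 30)),
            (datetime(2018, 4, 17, 19, 0), datetime(2018, 4, 17, 20, 0))]
print(should_close(sessions, now))   # 2.5 h used in the last 12 h -> True
```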
In yet another embodiment of the present application, after step S4, the method includes:
S8: gathering the image data of the same face into the same specified file.
This embodiment takes the classification of a registrant's electronic photo data as an example: by recognizing images of the same registered face, images of the same registrant are gathered into the same folder, so that the image data containing the same registrant is organized and the registrant's photos can be looked up more conveniently.
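A hedged sketch of S8: each photo is assigned to a registrant by the same nearest-preset-vector matching used above and grouped per person. The feature vectors, file names and the dictionary standing in for per-person folders are invented for illustration.

```python
import numpy as np
from collections import defaultdict

def identify(vector, enrolled, threshold=0.8):
    """Nearest enrolled preset vector, or 'unknown' above the threshold."""
    dists = {pid: float(np.linalg.norm(vector - v)) for pid, v in enrolled.items()}
    pid = min(dists, key=dists.get)
    return pid if dists[pid] < threshold else "unknown"

enrolled = {"A": np.array([0.1, 0.8]), "B": np.array([0.7, 0.2])}  # invented
photos = {"img_001.jpg": np.array([0.12, 0.79]),                  # invented
          "img_002.jpg": np.array([0.68, 0.22]),
          "img_003.jpg": np.array([0.09, 0.81])}

folders = defaultdict(list)        # stands in for per-person directories
for name, vec in photos.items():
    folders[identify(vec, enrolled)].append(name)
print(dict(folders))  # {'A': ['img_001.jpg', 'img_003.jpg'], 'B': ['img_002.jpg']}
```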
This embodiment, by changing the convolution structure, changes the structure of the output training model: an existing training model that only recognizes specific entity categories is replaced with one that recognizes both the specific entity types and the spatial positional relationships among them, enabling accurate representation and recognition of a face image from any angle at which any facial features can be observed. The training model output through the CapsNet network contains the spatial positional relationships of the facial features of each face; it recognizes not only the facial features themselves but also their spatial arrangement, improving the precision of face recognition. Because these spatial positional relationships are contained in the model, the frontal image of a face can be reconstructed from any angle at which any facial features are visible, so a face can be recognized in an arbitrary pose without deliberate posing, making face recognition more flexible, efficient and user-friendly. Moreover, the amount of data required to train the model through the CapsNet network is greatly reduced, and a training model with accurate recognition can be trained from fewer samples.
Referring to FIG. 2, a CapsNet-based multi-angle face recognition device according to an embodiment of the present application includes:
the acquisition module 1, configured to select, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure.
本实施例的拍摄角度包括以正面人脸的正前方视线为基准的能够拍摄到人脸任何五官的角度,比如侧向正90度为正右侧脸方向,侧向负90度为正左侧脸方向;再比如正上90度为俯视方向、正下90度为仰视方向等等,所有拍摄角度分布在以正面人脸的正前方视线为基准的类球面上。本实施例的拍摄角度通过装配上述人脸识别模型的摄像机获取。本实施例的第一拍摄角度为待验证人的某一指定拍摄角度,以区别于注册人进行注册时的各第二拍摄角度;第一特征提取方式相对于待验证人的某一指定拍摄角度,同样区别于注册人进行注册时的各第二拍摄角度对应的各第二特征提取方式,综上以上“第一”、“第二”仅为区别,不作限定。本实施例的CapsNet网络结构是基于胶囊单元(Capsule)的网络结构,胶囊单元就是一个向量,包含任意值,每个值代表了当前需要识别的物体的一个特征,比如人脸五官的眼睛等。胶囊网络由胶囊单元构成,胶囊网络的向量,不仅可表示物体的特征、也可以表示物体的方向、状态等。本实施例通过输入输出的向量的长度表征人脸存在的概率,向量的方向表示人脸的某些五官属性)。同一层级的Capsule通过变换矩阵对更高级别的Capsule的实例化参数进行预测。当多个预测一致时(本实施例使用动态路由使预测一致),更高级别的Capsule将变得活跃。本实施例通过Capsule中的神经元的激活情况表示了人脸影像中存在的人脸五官的各种性质,上述性质可以包含很多种不同的参数,例如姿势(位置,大小,方向)、变形、速度、反射率,色彩、纹理等。而输入输出的向量的长度表示了某个人脸出现的概率,概率值必须在0到1之间。为了实现概率压缩,并完成Capsule层级的激活功能,本实施例的胶囊网络中使用了Squashing的非线性函数,该非线性函数确保短向量的长度能够缩短到几乎等于零,而长向量的长度压缩到接近但不超过1的情况。Squashing的非线性函数的表达 式分为两部分:
Figure PCTCN2018095498-appb-000010
Figure PCTCN2018095498-appb-000011
非线性函数为:
Figure PCTCN2018095498-appb-000012
前一部分是输入向量S j的缩放尺度,第二部分是输入向量S j的单位向量,该非线性函数既保留了输入向量的方向,又将输入向量的长度压缩到区间(0,1)内,以实现用向量模的大小衡量某个实体出现的概率,模值越大,概率越大。S j向量为零向量时,V j能取到0,而S j无穷大时V j无限逼近1,该非线性函数可以看作是对向量长度的一种压缩和重分配,也可以看作是一种输入向量激活后的输出向量的方式。Capsule的输入向量就相当于经典神经网络神经元的标量输入,而该向量的计算就相当于两层Capsule间的传播与连接方式。输入向量的计算分为两个阶段,即线性组合和Routing(路由过程,本实施例为动态路由),表示为:
Figure PCTCN2018095498-appb-000013
Figure PCTCN2018095498-appb-000014
其中u是上一层胶囊网络的输出,W是每个输出要乘的权值,可以看作上一层每一个胶囊神经元以不同强弱的连接输出到后一层的某一个神经元。C根据下面公式计算:
Figure PCTCN2018095498-appb-000015
C为耦合系数。本实施例的Capsnet网络结构与CNN网络结构相比,网络的输入即线性加权求和很类似,但是在线性求和阶段上多加了一个耦合系数C,为了求C必须先求b,b根据下面公式计算:
Figure PCTCN2018095498-appb-000016
b初始值为0,故在前向传播求S的过程中,本实施例把W设计成随机值,b初始化为0可以得到C,U就是上一层胶囊网络的输出,V j为Capsule j的输出向量,根据上述关系得到更高层的胶囊输入S。对于给定长度但方向不同的两个向量而言,乘积有下列几种情况:正值、零、负值,当两个向量的相乘结果为正时,代表两个向量指向的方向相似,则b更新结果变大,那么耦合系数就高,说明该两向量十分匹配。相反,若是两个向量相乘结果为负,则b更新结果变小,那么耦合系数就小,说明两个向量不匹配。通过迭代确定C,也就等于确定了路线,该路线上胶囊神经元的模都特别大,路线的尽头就是正确预测的胶囊。
The shooting angle of this embodiment includes an angle that can capture any facial features of the human face based on the front line of sight of the frontal face, such as a lateral positive 90 degrees being a positive right face direction, and a lateral negative 90 degrees being a positive left side. Face orientation; for example, 90 degrees in the top direction, 90 degrees in the bottom direction, etc., all shooting angles are distributed on the spherical surface based on the front line of sight of the front face. The photographing angle of the present embodiment is acquired by a camera that assembles the above-described face recognition model. The first shooting angle of the embodiment is a certain shooting angle of the person to be verified to distinguish the second shooting angles when the registrant performs registration; the first feature extraction mode is relative to a specified shooting angle of the person to be verified. The second feature extraction method corresponding to each second shooting angle when the registrant performs registration is similar to the above, and the above-mentioned "first" and "second" are only differences, and are not limited. The CapsNet network structure of this embodiment is based on a capsule unit network structure. The capsule unit is a vector containing arbitrary values, each value representing a feature of an object that needs to be recognized currently, such as an eye of a facial expression. The capsule network is composed of a capsule unit, and the vector of the capsule network can represent not only the characteristics of the object but also the direction and state of the object. In this embodiment, the probability of the presence of the face is represented by the length of the vector of the input and output, and the direction of the vector represents some of the facial features of the face). Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active. In this embodiment, the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule. The above properties may include many different parameters, such as posture (position, size, direction), deformation, Speed, reflectivity, color, texture, etc. The length of the input and output vector indicates the probability of a face appearing, and the probability value must be between 0 and 1. In order to achieve probability compression and complete the activation function of the Capsule level, the capsule network of the present embodiment uses a nonlinear function of Squashing, which ensures that the length of the short vector can be shortened to almost equal to zero, and the length of the long vector is compressed to Close to but not more than one. The expression of Squashing's nonlinear function is divided into two parts:
$$\frac{\|s_j\|^2}{1+\|s_j\|^2} \qquad \text{and} \qquad \frac{s_j}{\|s_j\|}$$

The nonlinear function is:

$$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\cdot\frac{s_j}{\|s_j\|}$$
The first part is the scaling factor of the input vector S_j, and the second part is the unit vector of S_j: the function preserves the direction of the input vector while compressing its length into the interval (0, 1), so that the modulus of the vector expresses the probability that an entity is present; the larger the modulus, the higher the probability. When S_j is the zero vector, V_j is 0, and as the length of S_j grows without bound, the length of V_j approaches 1. This nonlinearity can be viewed as a compression and redistribution of vector length, or equally as the way an input vector is activated into an output vector. The input vector of a capsule plays the role of the scalar input of a classical neural-network neuron, and its computation corresponds to the propagation and connection between two capsule layers. The computation of the input vector is divided into two stages, linear combination and routing (dynamic routing in this embodiment), expressed as:
$$\hat{u}_{j|i} = W_{ij}\,u_i$$

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$
Here u is the output of a capsule in the previous layer, and W is the weight by which each output is multiplied; it can be viewed as each capsule neuron of the upper layer connecting, with different strengths, to a particular neuron of the next layer. C is calculated according to the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$$
C is the coupling coefficient. Compared with a CNN network structure, the input of the CapsNet structure of this embodiment is likewise a linear weighted sum, but a coupling coefficient C is added at the summation stage. To obtain C one must first obtain b, which is updated according to the following formula:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j$$
The initial value of b is 0, so in the forward pass that computes S, this embodiment initializes W to random values and b to 0 in order to obtain C; U is the output of the upper capsule layer, V_j is the output vector of Capsule j, and the higher-layer capsule input S is obtained from the relations above. For two vectors of given length but different directions, their dot product can be positive, zero or negative. When the product of two vectors is positive, the two vectors point in similar directions, the update makes b larger, the coupling coefficient is high, and the two vectors match well; conversely, if the product is negative, the update makes b smaller, the coupling coefficient is small, and the two vectors do not match. Iteratively determining C is thus equivalent to determining a route along which the moduli of the capsule neurons are particularly large; at the end of the route lies the correctly predicted capsule.
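By way of illustration only, the following minimal NumPy sketch renders the Squashing function and the dynamic-routing loop described above; the three routing iterations, the layer sizes and all identifiers are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def squash(s, eps=1e-9):
    # Shrinks short vectors toward length 0 and long ones toward length 1,
    # keeping direction: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u, W, num_iters=3):
    # u: lower-layer capsule outputs, shape (num_lower, d_in)
    # W: transformation weights, shape (num_lower, num_upper, d_out, d_in)
    u_hat = np.einsum('ijab,ib->ija', W, u)        # predictions of higher capsules
    b = np.zeros(u_hat.shape[:2])                  # routing logits, initialized to 0
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # coupling coefficients: softmax of b
        s = np.sum(c[:, :, None] * u_hat, axis=0)  # higher-layer input S_j
        v = squash(s)                              # output vectors V_j
        b = b + np.sum(u_hat * v[None], axis=-1)   # agreement (dot product) updates b
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(6, 8)),         # 6 lower capsules, 8-D each
                    rng.normal(size=(6, 4, 16, 8)))  # routed to 4 upper 16-D capsules
print(v.shape)  # (4, 16); each vector length lies in (0, 1), read as a presence probability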
The face recognition model of this embodiment contains a coordinate system for face images at every angle and represents each specific facial-feature type of the face as a vector: the vector encodes the feature categories of the face, such as the eyes, ears and nose, together with their pose attribute parameters, such as size, position, direction and color, so as to express the conversion relationships between the relative spatial positions of the facial features. In this embodiment, the facial features recognized at each angle differ, and so does the corresponding way of converting them, via the spatial positional relationships, into a frontal-face image. For example, the vector representation of a nose captured from above differs from that of a nose captured from the left-face direction, and the rotation that converts the overhead nose vector into the frontal nose vector differs from the rotation that converts the nose vector captured from the left-face direction into the frontal nose vector.
a first conversion module 2, configured to extract, according to the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model.
In this embodiment, face images at different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ. For example, from an angle perpendicular to the frontal face, the frontal state of each organ and the overall distribution of the five facial features can be recognized, whereas perpendicular to the right-face direction only the lateral states of the right eye, the right half of the nose, the right ear and so on can be recognized. The face recognition model of this embodiment, however, is a three-dimensional model carrying spatial positional relationships, so a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to those inherent relationships, into the image of the three-dimensional structure at another angle: for instance, the lateral states of the right eye, right half of the nose and right ear recognized above can be converted into the state seen perpendicular to the frontal face, so that the frontal state of each organ and the overall distribution of the facial features are recovered, and the first feature vector corresponding to the specific entity types of the frontal face (the five facial features) is output.
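Purely as a hypothetical sketch of this idea of re-expressing a part seen from one angle in the frontal frame (the matrix here is random; in a trained model it would be a learned viewpoint transformation, and all names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
pose_right_eye_side = rng.normal(size=8)   # 8-D capsule pose vector seen from the right side
W_side_to_front = rng.normal(size=(8, 8))  # stands in for the model's learned spatial relation
pose_right_eye_front = W_side_to_front @ pose_right_eye_side  # frontal-frame representation
```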
The judging module 3 is configured to judge whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold.
The preset feature vector of this embodiment is the result of feeding a face image collected at a preset angle into the feature extraction mode for the corresponding angle. The preset threshold of this embodiment is 0.8 or above. For example, a home smart door lock pre-registers the feature vector of A's face image. When A wants to enter the home, face recognition verification must pass through the face recognition model in the smart lock before the lock can open. When A stands sideways under the lock's camera, the camera captures the facial features of A's side face and feeds them into the face recognition model; the model, based on the facial features of A's side face, calls up A's recognition database, applies A's facial spatial positional relationships, converts A's side face into a frontal face, and outputs the feature vector corresponding to the facial features of the frontal face. That vector is then compared against the preset feature vector registered by A, and if the computed value falls within the set threshold, the smart door lock is driven to open.
The determining module 4 is configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
In this embodiment, the smaller the similarity value, the higher the degree of similarity (the similarity value here is a distance, so smaller means more alike). For example, with the threshold set to 0.8, when the value is less than 0.8 it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
Referring to FIG. 3, the face recognition device of this embodiment includes:
The acquisition module 10 is configured to collect face image data of a plurality of people at each shooting angle, to construct a training model sample library.
In this embodiment, face image data is collected at every angle, covering multiple people, so that the model can be trained more accurately and generalize better. Because the CapsNet network structure incorporates spatial positional relationships, markedly fewer samples are needed to build the training model than with an existing CNN structure, yet a more accurate training model is obtained. In an existing CNN (Convolutional Neural Network), by design, the pooling layer not only reduces parameters and helps avoid over-fitting but also discards some information, such as position. A CNN pays no attention to the orientation of components or their spatial relationships; it only cares whether specific features are present. For example, CNN-based face recognition must detect both eyes, the nose, the mouth and so on before a face can be recognized, so the subject must pose frontally before the required facial features can be captured accurately. Moreover, even if the positions of the two eyes, the nose and the mouth change during recognition, a CNN that has detected all the facial features still reports a face, which causes large recognition errors, and the error rate is even higher when recognizing objects other than faces. The CapsNet (Capsule Networks) structure of this embodiment makes full use of spatial positional relationships and builds the training model by matrix multiplication. The neural activities used in CapsNet vary with the viewpoint rather than eliminating the effect of viewpoint change from the neural activity, so multiple different affine transformations, or different parts of different objects, can be handled at the same time, allowing the training model to recognize face images at every angle. The training model of this embodiment is equivariant: face images at any angle can still be recognized and represented after rotation, translation and scaling, so the model can recognize a face from any angle at which its facial features are observable.
For example, the training models output by the CapsNet network structure for face images at each angle can accurately recognize a face within any viewing range in which any facial feature is visible, whether looking down, looking up, from the side or head-on, without requiring a posed frontal shot. This improves the flexibility and efficiency of face recognition, avoids the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, extends the flexibility of face recognition, and improves the user's experience with the face recognition device, all without changing the hardware of an existing face recognition system.
In this embodiment, let A be the m×p feature-data matrix of the facial features and B the p×n spatial-position-relationship matrix of the facial features; then the m×n matrix M is called the product of A and B, written M = AB. The product is meaningful only when the number of columns of A equals the number of rows of B, and the element in row i, column j of M can be expressed as:
$$M_{ij} = \sum_{k=1}^{p} a_{ik}\,b_{kj}$$
where a denotes entries of matrix A, b denotes entries of matrix B, and p is the shared dimension (the column count of A, equal to the row count of B). Through matrix multiplication, this embodiment expresses in the matrix M the combination of the facial-feature data with the spatial positional relationships of the facial features; matrix multiplication packs the many facial-feature quantities compactly together, giving a simple representation of a face recognition model that carries spatial positional relationships.
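A quick NumPy check of the element formula above (the matrix sizes are arbitrary example values, not taken from this application):

```python
import numpy as np

m, p, n = 4, 3, 5
A = np.arange(m * p, dtype=float).reshape(m, p)  # m×p facial-feature data matrix
B = np.arange(p * n, dtype=float).reshape(p, n)  # p×n spatial-position-relationship matrix
M = A @ B  # defined because A has p columns and B has p rows

i, j = 1, 2
assert M[i, j] == sum(A[i, k] * B[k, j] for k in range(p))  # M_ij equals the sum over k of a_ik * b_kj
```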
The training module 11 is configured to input the face image data of the training model sample library into a convolutional layer neural network of a CapsNet network structure for training, to obtain the face recognition model.
The training model of this embodiment can recognize facial features from any angle at which the features are visible and, using the spatial positional relationships carried in the model, convert them into the frontal facial features, after which face recognition is performed.
Referring to FIG. 4, the training module 11 of this embodiment includes:
a first input unit 111, configured to input the face image data at each angle from the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with a first specified convolution kernel and a first specified stride, and output a tensor through a specified activation function.
The CapsNet network structure of this embodiment comprises two convolution layers and one fully connected layer. The first convolution layer is a conventional convolution layer that detects pixel-level local features; it has 256 first specified convolution kernels of size 9×9, the first specified stride is 1, and the specified activation function is ReLU. The first convolution layer converts pixel intensities into activations of local feature detectors, and its output tensor serves as the input to the second convolution layer.
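A minimal PyTorch sketch of such a first layer; the single input channel and the 28×28 input size are assumptions made for illustration, while the 256 kernels, 9×9 size, stride 1 and ReLU follow the description above:

```python
import torch
import torch.nn as nn

conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=256, kernel_size=9, stride=1),  # 256 kernels, 9x9, stride 1
    nn.ReLU(inplace=True),                                                # specified activation
)

x = torch.randn(1, 1, 28, 28)  # dummy face image batch (assumed size)
t = conv1(x)                   # tensor passed on to the second convolution layer
print(t.shape)                 # torch.Size([1, 256, 20, 20])
```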
a second input unit 112, configured to input the tensor into the second convolution layer of the CapsNet network structure and convolve it with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and to output the Capsule vectors of that tensor structure.
The second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multidimensional entity categories. It has 32 channels, each consisting of an 8-dimensional convolutional structure and each outputting an 8-dimensional vector, achieving the feature encapsulation of 8×1 Capsules. A Capsule in this embodiment is a group of neurons whose input and output vectors represent the instantiation parameters of a specific entity category; the specific entity categories here are the facial features. For example, the second specified convolution kernel of this embodiment is a 9×9 kernel and the second specified stride is 2. In the CapsNet architecture of this embodiment, eight convolution units are packaged together into one new Capsule unit. The convolution computations of the Primary Capsules layer use no activation function such as ReLU; instead the results are prepared, in vector form, as input to the next Capsule layer.
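Continuing the sketch, here is one way a Primary Capsules layer matching this description could look (32 channels of 8-D capsules from a 9×9, stride-2 convolution without ReLU); the input shape follows the previous sketch and is an assumption:

```python
import torch
import torch.nn as nn

class PrimaryCapsules(nn.Module):
    def __init__(self, in_channels=256, num_channels=32, capsule_dim=8):
        super().__init__()
        self.capsule_dim = capsule_dim
        # 32 channels of 8-D capsules realized as one convolution with 256 output maps.
        self.conv = nn.Conv2d(in_channels, num_channels * capsule_dim,
                              kernel_size=9, stride=2)  # no ReLU here

    def forward(self, x):
        u = self.conv(x)                                # (B, 32*8, H', W')
        return u.view(u.size(0), -1, self.capsule_dim)  # (B, num_capsules, 8)

u = PrimaryCapsules()(torch.randn(1, 256, 20, 20))
print(u.shape)  # torch.Size([1, 1152, 8]): a 6x6 grid times 32 channels of 8-D vectors
```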
The updating unit 113 is configured to propagate the Capsule vectors and perform Routing updates through the DigitCaps layer (digital capsule layer) of the CapsNet network structure, and to output the face recognition model.
In the DigitCaps layer of the CapsNet network structure of this embodiment, the modulus of each Capsule's activation vector indicates whether an instance of each specific entity category exists; a very particular property is that it signals the presence of an instance of that specific entity category in the image. For example, the output of the DigitCaps layer ranges between 0 and 1, where 0 means absent and 1 means present. The input to the DigitCaps layer is the set of output vectors u_i of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors v_j of the DigitCaps layer have vector dimension [16, 1]. The 16-dimensional output of the CapsNet training model of this embodiment is robust.
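A shape-level sketch of how the [8, 1] Primary Capsules outputs could feed [16, 1] DigitCaps outputs; the number of output capsules (here 10) and all names are our assumptions, and the routing loop itself is the one sketched earlier:

```python
import numpy as np

num_lower, num_upper = 1152, 10  # 1152 primary capsules; 10 output entities assumed
d_in, d_out = 8, 16              # [8, 1] inputs, [16, 1] outputs as described

rng = np.random.default_rng(2)
W = rng.normal(scale=0.01, size=(num_lower, num_upper, d_out, d_in))
u = rng.normal(size=(num_lower, d_in))   # Primary Capsules outputs u_i
u_hat = np.einsum('ijab,ib->ija', W, u)  # per-capsule predictions for each output entity
print(u_hat.shape)  # (1152, 10, 16): these predictions then go through dynamic routing;
                    # each resulting 16-D vector v_j has a length in (0, 1),
                    # read as the presence probability of entity j
```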
Referring to FIG. 5, an apparatus for recognizing a human face according to another embodiment of the present application includes:
The receiving module 30 is configured to receive face images of a registered face at each second shooting angle.
The second shooting angles of this embodiment are the registrant's shooting angles, distinguished from the first shooting angle of the person to be verified. The registered faces of this embodiment may belong to one or more people, so that face images of multiple people can be recognized on the same recognition device, broadening the range of application.
The second conversion module 31 is configured to select, for the face image at each second shooting angle, the second feature extraction mode corresponding to that angle; to extract, in one-to-one correspondence, the features of each second face of the registered face at each second shooting angle; and to convert, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of the frontal-face image of the registrant within the face recognition model.
In this embodiment, the manner and principle of converting the features of each second face at each second shooting angle of the registered face into the second feature vectors of the frontal-face image are the same as in the first conversion module 2 and are not repeated here.
The setting module 32 is configured to set the second feature vector to be the preset feature vector.
By using the second feature vector registered through model verification as the reference preset feature vector, the face feature vector of the person to be verified can be identified accurately: when the face feature vector under test is close to the preset feature vector, the two are determined to belong to the same person.
In other embodiments of the present application, the registrants may be multiple people, with a corresponding plurality of preset feature vectors. When face recognition is performed with the face recognition model, a separate database can be established for each registrant according to the differences in their facial features; each database contains that registrant's facial features and the spatial positional relationships of those features. The setting module 32 includes an obtaining unit, configured to obtain, among the multiple preset feature vectors, the feature vector that matches the first feature vector, and a determining unit, configured to determine that the currently acquired face image and the face image corresponding to the matched feature vector are the same person's face image. During recognition, the model first retrieves the corresponding database according to the recognized facial features and then performs the respective conversion according to the corresponding spatial positional relationships; the conversion process and principle are as described above and are not repeated.
Referring to FIG. 6, the judging module 3 of this embodiment includes:
The calculating unit 300 is configured to calculate the distance value between the first feature vector and the preset feature vector.
The distance value of this embodiment may be the Euclidean distance, the cosine distance or the Mahalanobis distance; this embodiment preferably uses the Euclidean distance to represent the similarity between the first feature vector and the preset feature vector, expressed as:
$$d(X, Y) = \sqrt{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$
where X is the face feature vector extracted from the face used at registration, Y is the face feature vector extracted at verification, and n is a natural number.
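A direct NumPy rendering of this distance and the threshold test; the vectors here are dummies, while the 0.8 threshold follows the example given earlier in this embodiment:

```python
import numpy as np

def euclidean_distance(x, y):
    # d(X, Y) = sqrt of the sum over i of (x_i - y_i)^2
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

enrolled = np.array([0.10, 0.90, 0.30])  # X: vector registered at enrollment
probe = np.array([0.12, 0.88, 0.31])     # Y: vector extracted at verification
same_person = euclidean_distance(enrolled, probe) < 0.8
print(same_person)  # True: the distance falls below the preset threshold
```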
The judging unit 301 is configured to judge whether the distance value is less than the preset threshold.
This embodiment takes face recognition as an example; the CapsNet-based multi-angle recognition method can be extended to the recognition of target objects in other fields.
In this embodiment, the CapsNet network structure outputs training models corresponding to face images at each angle; the feature extraction mode corresponding to each angle varies with the viewpoint rather than eliminating viewpoint variation from the neural activity, so multiple different affine transformations, or different parts of different objects, can be processed at the same time. The training model of this embodiment is equivariant: face images at any angle can still be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at every angle. The CapsNet model of this embodiment contains the spatial positional relationships of the facial features, so from any angle at which any facial feature of the face can be recognized, the frontal image of the face can be recovered through those spatial positional relationships. With the face recognition method of this embodiment, facial features can be recognized accurately within any viewing range, whether looking down, looking up, from the side or head-on, avoiding the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, extending the flexibility of face recognition and improving the user's experience with the face recognition device, without changing the hardware of an existing face recognition system.
Referring to FIG. 7, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The issuing module 5 is configured to issue a control instruction to a security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state.
In this embodiment, the face recognition apparatus is used with a designated security system: once face recognition determines that the person is the same registered person, the preset control instruction for that person is sent to the security system, so the security system can perform its function better. The security systems of this embodiment include, but are not limited to, smart door-lock switches, identity-verification access control, and various Internet security platforms, such as tax registration platforms, bank account platforms and examinee identity-verification platforms, improving the timeliness and accuracy of verification. The application entities of this embodiment include physical objects, such as physical toys and public fitness equipment, and virtual platforms, such as online game platforms and online video platforms.
Referring to FIG. 8, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The statistics module 6 is configured to count whether the accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold.
The threshold range of this embodiment can be set specifically for different fields of use. For example, when this embodiment is used in the field of game monitoring, to prevent one person from staying in a game state for long stretches to the detriment of health, the threshold can be set so that the accumulated time in the game state within any consecutive 12 hours is 2 hours.
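One way such a rule could be checked, sketched in Python; the session bookkeeping and all function names are our own assumptions, and only the 2-hours-within-12-hours figures come from the example above:

```python
import time

LIMIT_SECONDS = 2 * 3600    # at most 2 hours of accumulated use...
WINDOW_SECONDS = 12 * 3600  # ...within any trailing 12-hour window

def over_limit(sessions, now=None):
    # sessions: list of (start, end) timestamps for one recognized person.
    now = time.time() if now is None else now
    window_start = now - WINDOW_SECONDS
    used = sum(max(0.0, min(end, now) - max(start, window_start))
               for start, end in sessions)
    return used > LIMIT_SECONDS

# Two sessions totalling 2.5 h inside the window: the entity should be closed.
now = time.time()
print(over_limit([(now - 11 * 3600, now - 10 * 3600),         # 1.0 h
                  (now - 2 * 3600, now - 0.5 * 3600)], now))  # 1.5 h, so True
```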
The generating module 7 is configured to generate, if the accumulated length of time exceeds the threshold, an instruction to close the application entity, so as to prohibit its continued use.
This embodiment is combined with time monitoring to further widen the application scenarios of face recognition. For example, combining time monitoring with real-time face recognition monitoring allows game software to be managed so as to prevent excessive gaming: the face recognition system monitors the usage state of the game system and whether the user remains the same person, and when the same person's game time is observed to exceed the preset value, the game system is switched to a lock-screen state. Another example is the managed allocation of public resources: face recognition monitors the usage state of public facilities within a designated public area, and if the same person's continuous usage time is judged to exceed the preset value, the facility is shut down; when someone else is determined to have entered the area to use it, it is automatically reopened, helping to allocate public resources reasonably.
Referring to FIG. 9, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The summarizing module 8 is configured to gather the image data of the same face into one specified file.
This embodiment takes the classification of a registrant's electronic photograph data as an example: by recognizing the same registered face image, the images of the same registrant are gathered into the same folder, organizing the image data containing that registrant so that the registrant's corresponding photographs can be located more conveniently.
Referring to FIG. 10, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected by a system bus, the processor providing computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores data such as face recognition data, and the network interface communicates with external terminals via a network connection. When executed, the computer-readable instructions perform the flows of the method embodiments described above. Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution of the present application is applied.
An embodiment of the present application further provides a computer non-volatile readable storage medium on which computer-readable instructions are stored; when executed, the computer-readable instructions perform the flows of the method embodiments described above. The above are only preferred embodiments of the present application and do not thereby limit its patent scope; any equivalent structural or flow transformation made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A method for recognizing a human face, comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  2. The method for recognizing a human face according to claim 1, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  3. The method for recognizing a human face according to claim 2, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
  4. The method for recognizing a human face according to claim 1, wherein before the step of judging whether the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the method comprises:
    receiving face images of a registered face at each second shooting angle;
    selecting, according to each second shooting angle, a second feature extraction mode corresponding to that second shooting angle, extracting in one-to-one correspondence features of each second face of the registered face corresponding to each second shooting angle, and converting, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of a frontal-face image of the registrant within the face recognition model;
    setting the second feature vectors as the preset feature vector.
  5. The method for recognizing a human face according to claim 4, wherein the step of judging whether the similarity between the first feature vector and the preset feature vector is less than the preset threshold comprises:
    calculating a distance value between the first feature vector and the preset feature vector;
    judging whether the distance value is less than the preset threshold.
  6. The method for recognizing a human face according to claim 1, wherein after the step of determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person, the method comprises:
    issuing a control instruction to a security system equipped with the face recognition model, so as to open the security system and put an application entity controlled by the security system into a usable state.
  7. The method for recognizing a human face according to claim 6, wherein after the step of issuing the control instruction to the security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state, the method comprises:
    counting whether an accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold;
    if it exceeds the threshold, generating an instruction to close the application entity, so as to prohibit continued use of the application entity.
  8. An apparatus for recognizing a human face, comprising:
    an acquisition module, configured to select, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    a first conversion module, configured to extract, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and to convert, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    a judging module, configured to judge whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    a determining module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  9. The apparatus for recognizing a human face according to claim 8, comprising:
    a collection module, configured to collect face image data of a plurality of people at each shooting angle to construct a training model sample library;
    a training module, configured to input the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  10. The apparatus for recognizing a human face according to claim 9, wherein the training module comprises:
    a first input unit, configured to input the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolve with a first specified convolution kernel and a first specified stride, and output a tensor through a specified activation function;
    a second input unit, configured to input the tensor into a second convolution layer of the CapsNet network structure and convolve with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and to output Capsule vectors of the tensor structure;
    an updating unit, configured to propagate the Capsule vectors and perform Routing updates through a DigitCaps layer of the CapsNet network structure, and to output the face recognition model.
  11. The apparatus for recognizing a human face according to claim 8, comprising:
    a receiving module, configured to receive face images of a registered face at each second shooting angle;
    a second conversion module, configured to select, according to each second shooting angle, a second feature extraction mode corresponding to that second shooting angle, to extract in one-to-one correspondence features of each second face of the registered face corresponding to each second shooting angle, and to convert, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of a frontal-face image of the registrant within the face recognition model;
    a setting module, configured to set the second feature vectors as the preset feature vector.
  12. The apparatus for recognizing a human face according to claim 11, wherein the judging module comprises:
    a calculating unit, configured to calculate a distance value between the first feature vector and the preset feature vector;
    a judging unit, configured to judge whether the distance value is less than the preset threshold.
  13. The apparatus for recognizing a human face according to claim 8, comprising:
    an issuing module, configured to issue a control instruction to a security system equipped with the face recognition model, so as to open the security system and put an application entity controlled by the security system into a usable state.
  14. The apparatus for recognizing a human face according to claim 13, comprising:
    a statistics module, configured to count whether an accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold;
    a generating module, configured to generate, if the accumulated length of time exceeds the threshold, an instruction to close the application entity, so as to prohibit continued use of the application entity.
  15. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements a method for recognizing a human face, the method comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  16. The computer device according to claim 15, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  17. The computer device according to claim 16, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
  18. A computer non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a method for recognizing a human face, the method comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  19. The computer non-volatile readable storage medium according to claim 18, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  20. The computer non-volatile readable storage medium according to claim 19, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
PCT/CN2018/095498 2018-04-17 2018-07-12 Facial recognition method, apparatus, computing device and storage medium WO2019200749A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810344669.0A CN108764031B (en) 2018-04-17 2018-04-17 Method, device, computer equipment and storage medium for recognizing human face
CN201810344669.0 2018-04-17

Publications (1)

Publication Number Publication Date
WO2019200749A1 true WO2019200749A1 (en) 2019-10-24

Family

ID=64010719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095498 WO2019200749A1 (en) 2018-04-17 2018-07-12 Facial recognition method, apparatus, computing device and storage medium

Country Status (2)

Country Link
CN (1) CN108764031B (en)
WO (1) WO2019200749A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062936A (en) * 2019-12-27 2020-04-24 中国科学院上海生命科学研究院 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN111062260A (en) * 2019-11-25 2020-04-24 杭州绿度信息技术有限公司 Automatic generation method of facial cosmetic recommendation scheme
CN111339990A (en) * 2020-03-13 2020-06-26 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111582305A (en) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Biological feature recognition method and device, computer equipment and storage medium
CN111860093A (en) * 2020-03-13 2020-10-30 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN112183394A (en) * 2020-09-30 2021-01-05 江苏智库智能科技有限公司 Face recognition method and device and intelligent security management system
CN112884049A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for detecting registration image in input image, and related device and equipment
CN113111679A (en) * 2020-01-09 2021-07-13 北京君正集成电路股份有限公司 Design method of human-shaped upper half monitoring network structure
CN113283313A (en) * 2021-05-10 2021-08-20 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN115471946A (en) * 2022-10-18 2022-12-13 深圳市盛思达通讯技术有限公司 Quick passing system and method of non-contact detection gate

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291773A (en) * 2018-12-06 2020-06-16 西安光启未来技术研究院 Feature identification method and device
CN109784243B (en) * 2018-12-29 2021-07-09 网易(杭州)网络有限公司 Identity determination method and device, neural network training method and device, and medium
CN109948509A (en) * 2019-03-11 2019-06-28 成都旷视金智科技有限公司 Obj State monitoring method, device and electronic equipment
CN110059560B (en) * 2019-03-18 2023-02-24 创新先进技术有限公司 Face recognition method, device and equipment
CN110197125A (en) * 2019-05-05 2019-09-03 上海资汇信息科技有限公司 Face identification method under unconfined condition
CN110263236B (en) * 2019-06-06 2022-11-08 太原理工大学 Social network user multi-label classification method based on dynamic multi-view learning model
CN112434547B (en) * 2019-08-26 2023-11-14 中国移动通信集团广东有限公司 User identity auditing method and device
CN110909766B (en) * 2019-10-29 2022-11-29 北京明略软件系统有限公司 Similarity determination method and device, storage medium and electronic device
CN110866962B (en) * 2019-11-20 2023-06-16 成都威爱新经济技术研究院有限公司 Virtual portrait and expression synchronization method based on convolutional neural network
CN111931882B (en) * 2020-07-20 2023-07-21 五邑大学 Automatic goods checkout method, system and storage medium
CN112036281B (en) * 2020-07-29 2023-06-09 重庆工商大学 Facial expression recognition method based on improved capsule network
CN112115998B (en) * 2020-09-11 2022-11-25 昆明理工大学 Method for overcoming catastrophic forgetting based on anti-incremental clustering dynamic routing network
CN113219870B (en) * 2021-05-07 2022-03-08 禹焱科技河北有限公司 Intelligent data acquisition and sharing device for industrial instrument
CN113642540B (en) * 2021-10-14 2022-01-28 中国科学院自动化研究所 Capsule network-based facial expression recognition method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899579A (en) * 2015-06-29 2015-09-09 小米科技有限责任公司 Face recognition method and face recognition device
CN106022317A (en) * 2016-06-27 2016-10-12 北京小米移动软件有限公司 Face identification method and apparatus
CN106780906B * 2016-12-28 2019-06-21 北京品恩科技股份有限公司 Method and system for person-and-ID unification recognition based on deep convolutional neural networks
CN107423690B (en) * 2017-06-26 2020-11-13 广东工业大学 Face recognition method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130266195A1 (en) * 2012-04-10 2013-10-10 Derek Shiell Hash-Based Face Recognition System
CN107844744A (en) * 2017-10-09 2018-03-27 平安科技(深圳)有限公司 With reference to the face identification method, device and storage medium of depth information
CN107729875A (en) * 2017-11-09 2018-02-23 上海快视信息技术有限公司 Three-dimensional face identification method and device
CN107832735A (en) * 2017-11-24 2018-03-23 百度在线网络技术(北京)有限公司 Method and apparatus for identifying face

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SABOUR, Sara; FROSST, Nicholas; HINTON, Geoffrey E.: "Dynamic Routing Between Capsules", 31st Conference on Neural Information Processing Systems (NIPS 2017), 7 November 2017 (2017-11-07), XP055559227 *
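
For context on this non-patent citation: the routing-by-agreement procedure it describes can be sketched in a few lines of NumPy. This is an illustrative sketch of the algorithm as published by Sabour et al., not code from the patent record itself; the array shapes, function names, and the three-iteration default are assumptions drawn from that paper.

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity that scales a vector so its length lies in [0, 1)
    # while preserving its direction (Eq. 1 of the cited paper).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: (num_lower, num_upper, dim) prediction vectors from each
    # lower-level capsule to each upper-level capsule.
    # Returns: (num_upper, dim) upper-level capsule output vectors.
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start at zero
    for _ in range(num_iterations):
        # Coupling coefficients: softmax over upper capsules for each
        # lower capsule (shifted by the max for numerical stability).
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Weighted sum of predictions per upper capsule, then squash.
        s = (c[..., None] * u_hat).sum(axis=0)   # (num_upper, dim)
        v = squash(s)
        # Agreement update: dot product between each prediction and the
        # current output increases the logit for that route.
        b += (u_hat * v[None, ...]).sum(axis=-1)
    return v

# Example: 6 lower capsules routing into 3 upper capsules of dimension 8.
u_hat = np.random.randn(6, 3, 8)
v = dynamic_routing(u_hat)
print(v.shape)  # (3, 8)

The iterative agreement step is what lets capsules encode the spatial relationships between facial parts that plain max-pooling discards, which is the property the application relies on.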

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062260A (en) * 2019-11-25 2020-04-24 杭州绿度信息技术有限公司 Automatic generation method of facial cosmetic recommendation scheme
CN111062260B (en) * 2019-11-25 2024-03-05 杭州绿度信息技术有限公司 Automatic generation method of face-beautifying recommendation scheme
CN111062936B (en) * 2019-12-27 2023-11-03 中国科学院上海营养与健康研究所 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN111062936A (en) * 2019-12-27 2020-04-24 中国科学院上海生命科学研究院 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN113111679A (en) * 2020-01-09 2021-07-13 北京君正集成电路股份有限公司 Design method of human-shaped upper half monitoring network structure
CN111339990B (en) * 2020-03-13 2023-03-24 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111339990A (en) * 2020-03-13 2020-06-26 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111860093B (en) * 2020-03-13 2024-05-14 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN111860093A (en) * 2020-03-13 2020-10-30 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN111582305B (en) * 2020-03-26 2023-08-18 平安科技(深圳)有限公司 Biological feature recognition method, apparatus, computer device and storage medium
CN111582305A (en) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Biological feature recognition method and device, computer equipment and storage medium
CN112183394A * 2020-09-30 2021-01-05 江苏智库智能科技有限公司 Face recognition method and device, and intelligent security management system
CN112884049A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for detecting registration image in input image, and related device and equipment
CN113283313B (en) * 2021-05-10 2022-10-11 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN113283313A (en) * 2021-05-10 2021-08-20 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN115471946A (en) * 2022-10-18 2022-12-13 深圳市盛思达通讯技术有限公司 Quick passing system and method of non-contact detection gate

Also Published As

Publication number Publication date
CN108764031A (en) 2018-11-06
CN108764031B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2019200749A1 (en) Facial recognition method, apparatus, computing device and storage medium
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
WO2019227479A1 (en) Method and apparatus for generating face rotation image
WO2018228218A1 (en) Identification method, computing device, and storage medium
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2021190296A1 (en) Dynamic gesture recognition method and device
JP6624794B2 (en) Image processing apparatus, image processing method, and program
WO2022188697A1 (en) Biological feature extraction method and apparatus, device, medium, and program product
Yu et al. Human action recognition using deep learning methods
WO2021218238A1 (en) Image processing method and image processing apparatus
WO2021047587A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
Cheng et al. Augmented reality dynamic image recognition technology based on deep learning algorithm
CN112395979A (en) Image-based health state identification method, device, equipment and storage medium
CN113298158B (en) Data detection method, device, equipment and storage medium
Papadopoulos et al. Human action recognition using 3d reconstruction data
WO2022052782A1 (en) Image processing method and related device
WO2021190433A1 (en) Method and device for updating object recognition model
CN112529149A (en) Data processing method and related device
Chang et al. Salgaze: Personalizing gaze estimation using visual saliency
Prakash et al. Accurate hand gesture recognition using CNN and RNN approaches
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
WO2023142886A1 (en) Expression transfer method, model training method, and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18915661

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 18915661

Country of ref document: EP

Kind code of ref document: A1