WO2019200749A1 - Facial recognition method, apparatus, computing device and storage medium - Google Patents


Info

Publication number
WO2019200749A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
feature vector
face image
training
recognition model
Prior art date
Application number
PCT/CN2018/095498
Other languages
French (fr)
Chinese (zh)
Inventor
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019200749A1

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00  Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10  Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16  Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172  Classification, e.g. identification
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00  Pattern recognition
    • G06F 18/20  Analysing
    • G06F 18/21  Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214  Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00  Pattern recognition
    • G06F 18/20  Analysing
    • G06F 18/22  Matching criteria, e.g. proximity measures
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00  Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10  Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16  Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161  Detection; Localisation; Normalisation
    • G06V 40/165  Detection; Localisation; Normalisation using facial parts and geometric relationships

Definitions

  • the present application relates to the field of convolutional neural network applications, and in particular to a method, an apparatus, a computer device and a storage medium for recognizing a human face.
  • biometric identification technology has developed rapidly in recent decades; compared with other biometric modalities, face recognition is direct, friendly and convenient, and has therefore received more extensive research.
  • the face is composed of the eyes, nose, mouth, chin and other parts; because these parts differ in shape, size and distribution, every face in the world is different, so these parts can serve as important features for face recognition.
  • in existing face recognition, whether face detection or face verification, the face must be posed squarely toward the camera to be recognized accurately; moreover, existing face recognition only determines whether the specific entity types representing the face exist, without considering the spatial positional relationships among those specific entity types, so its accuracy is not high, and an image can only be recognized by comparison against an image captured in a specified pose.
  • as a result, existing face recognition is mechanically rigid and lacks a humanized design that fits natural human habits.
  • existing face recognition also has low accuracy for faces with very high mutual similarity, for example siblings such as quadruplets whose faces closely resemble one another.
  • the main purpose of the present application is to provide a method for recognizing a human face, which aims to solve the technical problems that the existing face recognition mechanism is rigid and its accuracy is not high.
  • the present application proposes a method for recognizing a face, including:
  • if the similarity is less than the preset threshold, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the application proposes a device for recognizing a face, comprising:
  • an acquiring module, configured to select, according to the first shooting angle of the acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the CapsNet network structure;
  • a first conversion module, configured to extract, in the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of the first face into a first feature vector of the frontal face image in the face recognition model;
  • a determining module, configured to determine whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold; and
  • a determination module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the present application also provides a computer device comprising a memory and a processor, the memory storing computer readable instructions, the processor implementing the steps of the above method when executing the computer readable instructions.
  • the present application also provides a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed by a processor, implement the steps of the methods described above.
  • the present application has the following beneficial technical effects: by changing the convolution structure, the present application changes the structure of the output training model, that is, the existing training model that only identifies specific entity categories is changed into a training model that identifies both the specific entity categories and the spatial positional relationships among them, realizing accurate representation and recognition of the face image within any angle range in which any facial features of the human face can be recognized;
  • the training model of the present application includes the spatial positional relationship of each facial feature; it recognizes not only the features of the facial organs but also their spatial positional relationships, which improves the accuracy of face recognition and thus enables accurate recognition;
  • because the training model includes the spatial positional relationships of the facial features, the face image captured from any angle at which the face can be recognized can be converted into the frontal face image through those spatial positional relationships, so recognition is possible under arbitrary postures, making face recognition more flexible, efficient and user-friendly; moreover, the amount of data required to train the model is greatly reduced, and training models that recognize accurately can be trained with fewer samples.
  • FIG. 1 is a schematic flow chart of a method for recognizing a human face according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an apparatus for recognizing a human face according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an apparatus for recognizing a human face according to another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a training module according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for recognizing a human face according to another embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a determining module according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an apparatus for recognizing a human face according to still another embodiment of the present application.
  • FIG. 10 is a schematic diagram showing the internal structure of a computer device according to an embodiment of the present application.
  • a method for recognizing a human face includes:
  • S1 Select, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the CapsNet network structure.
  • the shooting angles of this embodiment include any angle from which any facial features of the human face can be captured, defined relative to the frontal line of sight of the frontal face: for example, +90 degrees laterally is the direction directly facing the right side of the face, and -90 degrees laterally is the direction directly facing the left side of the face; likewise 90 degrees upward, 90 degrees downward, and so on, with all shooting angles distributed on a sphere centered on the frontal line of sight of the frontal face.
  • the shooting angle of this embodiment is acquired by the camera of the device equipped with the above face recognition model.
  • the first shooting angle of this embodiment is a shooting angle of the person to be verified, so as to distinguish it from the second shooting angles used when a registrant performs registration; likewise, the first feature extraction mode corresponds to the specified shooting angle of the person to be verified.
  • the second feature extraction modes corresponding to the second shooting angles used during registration are similar to the above; the terms "first" and "second" are used only for distinction and are not limiting.
  • the CapsNet network structure of this embodiment is a network structure based on capsule units.
  • a capsule unit is a vector of values, each value representing a feature of the object currently to be recognized, such as an eye among the facial features.
  • the capsule network is composed of capsule units, and a vector in the capsule network can represent not only the features of an object but also its orientation and state.
  • the probability that a face is present is represented by the length of the input and output vectors, while the direction of the vector represents some attribute of the facial features.
  • Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active.
  • the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule.
  • such attributes may include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectivity, color, texture, etc.
  • the probability of the appearance of a face is expressed according to the vector length of the input and output, and the probability value must be between 0 and 1.
  • the capsule network of this embodiment uses the Squashing nonlinear function, which ensures that the length of a short vector shrinks to almost zero while the length of a long vector is compressed to be close to, but not more than, 1.
  • the Squashing nonlinear function is expressed as $v_j = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$, in two parts: the first factor, $\|s_j\|^2 / (1 + \|s_j\|^2)$, is the scaling of the input vector $s_j$, and the second factor, $s_j / \|s_j\|$, is the unit vector of $s_j$. The function preserves the direction of the input vector and compresses its length into the interval (0, 1), so that the probability that an entity appears is expressed by the magnitude of the vector modulus: the larger the modulus, the greater the probability. When $s_j$ is the zero vector, $v_j$ is 0; as $\|s_j\|$ tends to infinity, the length of $v_j$ approaches 1.
  • this nonlinear function can be regarded as a compression and redistribution of vector lengths, and also as the way the input vector is activated to produce the output vector.
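  • as a minimal illustrative sketch of the Squashing nonlinearity described above (NumPy is assumed, and the small epsilon is an added numerical-stability term, not part of the formula):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    Maps a capsule's input vector s to an output vector v with the same
    direction and a length in (0, 1), so the length can act as a probability.
    """
    squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)     # first factor: length scaling
    unit = s / np.sqrt(squared_norm + eps)          # second factor: unit vector
    return scale * unit

# a short vector is squashed toward length 0, a long one toward length 1
print(np.linalg.norm(squash(np.array([0.01, 0.0]))))   # ~0.0001
print(np.linalg.norm(squash(np.array([100.0, 0.0]))))  # ~0.9999
```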
  • the input vector of a Capsule is equivalent to the scalar input of a CNN neuron, and the calculation of this vector is equivalent to the propagation and connection between two Capsule layers.
  • the calculation of the input vector is divided into two stages, linear combination and routing (the routing process of this embodiment is dynamic routing), expressed as $\hat{u}_{j|i} = W_{ij} u_i$ and $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be regarded as each capsule neuron in the upper layer feeding a given neuron in the next layer with a different strength.
  • the coupling coefficient $c_{ij}$ is calculated by the softmax $c_{ij} = \dfrac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$.
  • the input computation of the CapsNet capsule in this embodiment is similar to the linear weighted summation of a conventional network, except that the coupling coefficient $c_{ij}$ is added in the linear summation stage.
  • $b_{ij}$ is updated according to $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$, with initial value 0; in the forward propagation that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0, from which $c_{ij}$ is obtained; $u_i$ is the output of the upper capsule layer and $v_j$ is the output vector of Capsule $j$; from these relationships the input $s_j$ of the higher capsule layer is obtained.
  • the dot product $\hat{u}_{j|i} \cdot v_j$ can be positive, zero or negative; a positive result means the two vectors point in roughly the same direction.
  • if the update makes $b_{ij}$ larger, the coupling coefficient becomes high, indicating that the two vectors match very well; otherwise the coupling coefficient is small, indicating that the two vectors do not match. Determining $c_{ij}$ by iteration is equivalent to determining the route along which the capsule neuron outputs are particularly large, and the end of that route is the correctly predicted capsule.
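  • the dynamic routing stage described above can be sketched as follows (the shapes, the three-iteration count and all names are illustrative assumptions; the patent specifies only the update relations):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    n2 = np.sum(np.square(s), axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat[i, j] = W_ij @ u_i, shape [num_lower, num_upper, dim].
    Returns the output vectors v_j of the upper capsules, shape [num_upper, dim].
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))                      # logits b_ij start at 0
    for _ in range(num_iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_ij = softmax_j(b_ij)
        s = np.einsum('ij,ijd->jd', c, u_hat)                 # s_j = sum_i c_ij u_hat
        v = squash(s)                                         # v_j = squash(s_j)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # b_ij += u_hat . v_j
    return v

# toy example: 6 lower capsules routed to 2 upper capsules of dimension 8
v = dynamic_routing(np.random.randn(6, 2, 8))
print(v.shape, np.linalg.norm(v, axis=-1))   # output lengths lie in (0, 1)
```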
  • the face recognition model of this embodiment includes a coordinate system for the face image at each angle, and represents each specific entity of the face, such as an eye, an ear or a nose, as a vector; each vector includes the attribute parameters of the facial feature, such as size, position, orientation, color and other pose attributes, to express the relative spatial positional relationships of the facial features.
  • the facial features recognized at different angles differ, and the corresponding ways of converting the face image into the frontal face according to the spatial positional relationships differ accordingly: for example, the vector representation of the nose obtained in the top-view state differs from that acquired from the left-face direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-face nose vector into the frontal nose vector.
  • different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ: for example, at the angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution among the facial features can be recognized, while perpendicular to the right side of the face only the lateral state of the right eye, the right half of the nose, the right ear and the like can be recognized.
  • the face recognition model is a three-dimensional model with spatial positional relationships, and a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to the inherent spatial positional relationships, into the image of the three-dimensional structure at another angle; for example, the lateral states recognized for the right eye, the right half of the nose, the right ear and so on are converted into the state seen perpendicular to the frontal face, thereby recognizing the frontal-state features of each organ among the facial features and the overall distribution state among the five organs, and outputting the first feature vector corresponding to the specific entity types of the frontal face, where the specific entity types correspond to the facial features of the face.
  • S3 Determine whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
  • the preset feature vector of this embodiment is the result output by inputting the pre-stored face image of the preset angle into the feature extraction mode of the corresponding angle.
  • the preset threshold of this embodiment is 0.8 or more.
  • for example, the feature vector of the face image of user A is pre-registered, and the face recognition model in a smart door lock performs face recognition verification; the lock can be opened only after verification succeeds.
  • when user A stands sideways under the camera of the smart door lock, the camera captures the facial features of A's side face and transmits them to the face recognition model; the face recognition model, according to the facial features of A's side face, invokes the database storing A's facial features and their spatial positions, converts A's side face into the frontal face, outputs the feature vector corresponding to the facial features of the frontal face, calculates the comparison between this feature vector and the preset feature vector registered by A, and, if the calculated value is within the set threshold, controls the smart door lock to open.
  • for example, the threshold is set to 0.8; when the calculated value is less than 0.8, it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • before step S1 of this embodiment, the method includes:
  • S10 Collect face image data of each shooting angle of a plurality of people to construct a training model sample library.
  • face image data of each angle, covering multiple people, is collected so as to train the model more accurately and improve its generalization ability; however, because the CapsNet network structure includes the spatial positional relationship factor, the sample data required to construct the training model is significantly reduced while a training model of higher accuracy is obtained.
  • in an existing CNN (Convolutional Neural Network), the pooling layer reduces the number of parameters and helps avoid over-fitting, but it also discards some information, such as positional information; a CNN pays no attention to the relative orientation and spatial relationships of components, caring only about whether specific features are present.
  • CapsNet (Capsule Networks), by contrast, does not need to eliminate the influence of viewpoint changes in neural activity, and can simultaneously process multiple different affine transformations or different parts of different objects, so that the training model can recognize face images at every angle.
  • the training model of this embodiment has the property of equivariant mapping: the face image at each angle retains the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize the face at every observed angle.
  • the CapsNet network structure of this embodiment outputs a training model corresponding to face images at every angle, and can accurately recognize the face within any viewing range in which any facial features can be recognized, such as top view, bottom view, side view or front view, without requiring a corrected pose; this improves the flexibility and efficiency of face recognition, avoids the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane, expands the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • for example, let A be an m×p feature data matrix of the facial features and B a p×n spatial position relationship matrix of the facial features; the combination of the facial feature data with the spatial positions of the facial features is then represented in the model by the matrix product, and matrix multiplication combines the facial feature data compactly, so that a face recognition model with spatial positional relationships is represented simply.
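  • purely as an illustration of this matrix formulation, with arbitrary example shapes:

```python
import numpy as np

# A: m x p feature data of the facial features (hypothetical values)
# B: p x n spatial position relationship of the facial features
m, p, n = 4, 3, 5
A = np.random.rand(m, p)   # feature data matrix
B = np.random.rand(p, n)   # spatial position relationship matrix

# the model combines feature data with spatial positions as a matrix product
M = A @ B                  # m x n combined representation
print(M.shape)             # (4, 5)
```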
  • S11 Input the face image data of the training model sample library into the convolutional neural network of the CapsNet network structure for training, to obtain the face recognition model.
  • the training model of this embodiment can recognize the facial features from any angle at which they can be recognized, and converts them into the facial features of the frontal face according to the spatial positional relationships in the training model, thereby performing face recognition.
  • step S11 of the embodiment includes:
  • S111 Input the face image data of each angle in the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with the first specified convolution kernel and the first specified stride, and output a tensor through the specified activation function.
  • the CapsNet network structure of this embodiment includes two convolution layers and a fully connected layer; the first convolution layer is a conventional convolution layer, which serves to detect pixel-level local features.
  • the first convolutional layer of this embodiment has 256 9*9 first designated convolution kernels, the first specified step size is 1, and the activation function is designated as ReLU.
  • the first convolution layer converts the pixel brightness into an activation of the local feature detector, and the output tensor of the first convolution layer serves as the input to the second convolution layer.
  • S112 Input the tensor into the second convolution layer of the CapsNet network structure, and convolve it with the second specified convolution kernel and the second specified stride to construct a tensor structure and output the Capsule vectors of the tensor structure.
  • the second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multi-dimensional entity categories; it has 32 channels, each channel is composed of an 8-dimensional convolution structure, and each channel outputs an 8-dimensional vector, achieving the effect of 8*1 Capsule feature encapsulation.
  • the Capsule of this embodiment is a set of neurons whose input and output vectors represent instantiation parameters of a particular entity class.
  • the specific entity category of this embodiment is a facial feature.
  • the second specified convolution kernel of this embodiment is a 9*9 convolution kernel and the second specified step size is 2.
  • eight convolution units are packaged together into a new Capsule unit.
  • the convolution calculation of the Primary Capsules layer does not use an activation function such as ReLU (Rectified Linear Unit); instead, it prepares the input to the next Capsule layer in vector form.
  • S113 Propagate and route the Capsule vectors through the DigitCaps layer (digital capsule layer) of the CapsNet network structure, and output the face recognition model.
  • the activation vector modulus of each Capsule gives an instance of each specific entity class.
  • the output range of the DigitCaps layer is between 0 and 1, where 0 indicates that the entity is absent and 1 indicates that it has appeared.
  • the input of the DigitCaps layer is the output vectors of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors of the DigitCaps layer have dimension [16, 1], and the 16-dimensional output of CapsNet makes the training model of this embodiment robust.
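  • a minimal PyTorch-style sketch of the layer dimensions described in steps S111 to S113 (the input size and channel count are assumptions; the patent specifies the kernel sizes, strides, capsule channels and the 8- and 16-dimensional capsule vectors):

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

class CapsNetFront(nn.Module):
    """Conv1 and Primary Capsules layers with the stated dimensions:
    256 kernels of 9*9 with stride 1 and ReLU, then 32 channels of
    8-dimensional capsules via 9*9 kernels with stride 2."""

    def __init__(self, in_channels=1):  # input channel count is an assumption
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 256, kernel_size=9, stride=1)
        # 32 capsule channels x 8 dimensions = 256 convolution output channels
        self.primary_caps = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)

    def forward(self, x):
        x = torch.relu(self.conv1(x))   # pixel-level local feature detection
        x = self.primary_caps(x)        # no ReLU: outputs become capsule vectors
        u = x.view(x.size(0), -1, 8)    # [batch, num_capsules, 8] capsule vectors
        return squash(u)

# e.g. a 28x28 input yields 32*6*6 = 1152 primary capsules of dimension 8,
# which a DigitCaps layer would then route into 16-dimensional output capsules
x = torch.randn(1, 1, 28, 28)
print(CapsNetFront()(x).shape)          # torch.Size([1, 1152, 8])
```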
  • before step S1, the method further includes: S30 Receive the face images of the registered face at each second shooting angle.
  • the second shooting angles of this embodiment are the shooting angles of the registrant, to distinguish them from the first shooting angle of the person to be verified.
  • the terms "first" and "second" are used only for distinction and are not limiting; the same applies to other passages and is not repeated here.
  • the registered face of the embodiment includes a face of one person or a plurality of people, so as to recognize a face image of a plurality of people on the same recognition device, and expand the application range.
  • S31 Select, according to each second shooting angle, the second feature extraction mode corresponding to that second shooting angle, extract one by one the features of the second face of the registered face corresponding to each second shooting angle, and convert, according to the spatial positional relationships in the face recognition model, the features of each second face into a second feature vector of the frontal face image of the registrant in the face recognition model.
  • the conversion mode and conversion principle from the second features of the face image at each second shooting angle of the registered face to the face image of the frontal face are the same as in step S2 and are not repeated here.
  • S32 Set the second feature vector as the preset feature vector, so that when the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the person to be verified and the registrant are determined to be the same person.
  • this embodiment further includes the case where there are multiple registrants, with correspondingly multiple preset feature vectors.
  • when the face recognition model is used for face recognition, databases of the facial features of the different registrants may be established separately; each database contains a registrant's facial features and the spatial positional relationships of the five facial organs. The feature vector among the plurality of preset feature vectors that matches the first feature vector is obtained, and it is determined that the currently acquired face image and the face image corresponding to the matched feature vector are face images of the same person.
  • during recognition, the corresponding database is first retrieved according to the recognized facial features, and the respective conversions are then carried out according to the corresponding spatial positional relationships; the conversion process and principle are the same as above and are not repeated here.
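  • a small illustrative sketch of matching the first feature vector against the preset feature vectors of multiple registrants (the nearest-match policy and all names are assumptions; the patent states only that the matching feature vector is obtained):

```python
import numpy as np

def match_registrant(first_vec, preset_vectors, threshold=0.8):
    """Return the id of the registrant whose preset vector matches first_vec,
    or None when no distance falls below the threshold."""
    best_id, best_dist = None, float('inf')
    for person_id, preset in preset_vectors.items():
        dist = float(np.linalg.norm(np.asarray(first_vec) - np.asarray(preset)))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < threshold else None

# hypothetical per-registrant preset feature vectors
presets = {"alice": np.random.rand(16), "bob": np.random.rand(16)}
probe = presets["alice"] + 0.01
print(match_registrant(probe, presets))   # "alice"
```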
  • step S3 of this embodiment includes: S300 Calculate the distance value between the first feature vector and the preset feature vector.
  • the distance value of this step may be a Euclidean distance, a cosine distance or a Mahalanobis distance.
  • the Euclidean distance is preferably used to represent the similarity between the first feature vector and the preset feature vector.
  • the Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $X$ is the face feature vector extracted from the face used for registration, $Y$ is the face feature vector extracted at verification time, and $n$ (a natural number) is the vector dimension.
  • S301 Determine whether the size of the distance value is less than a preset threshold.
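  • steps S300 and S301 can be sketched as follows (the 0.8 threshold follows the door-lock example above, and the 16-dimensional vectors are assumed to come from the DigitCaps layer):

```python
import numpy as np

def euclidean_distance(x, y):
    """d(X, Y) = sqrt(sum_i (x_i - y_i)^2)."""
    return float(np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

def is_same_person(first_vec, preset_vec, threshold=0.8):
    """Return True when the distance is below the preset threshold (S301)."""
    return euclidean_distance(first_vec, preset_vec) < threshold

# hypothetical 16-dimensional feature vectors from the DigitCaps layer
registered = np.random.rand(16)
probe = registered + 0.01 * np.random.rand(16)   # near-identical face
print(is_same_person(probe, registered))         # True for small distances
```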
  • face recognition is taken as an example here; the CapsNet-based multi-angle recognition method of this embodiment can also be extended to the recognition of target objects in other fields, and is not described here.
  • in this embodiment, the training model corresponding to face images at every angle is output through the CapsNet network structure, and the feature extraction mode for the face image changes with the viewing angle; without needing to eliminate the influence of viewpoint changes in neural activity, multiple different affine transformations or different components of different objects can be processed at the same time. The training model of this embodiment has the property of equivariant mapping: face images at each angle retain the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at various angles.
  • the training model includes the spatial positional relationship of each facial feature, and the face image captured from any angle at which any facial features can be recognized can be converted into the frontal face image through those spatial positional relationships; the face can thus be recognized accurately within any viewing range, such as top view, bottom view, side view or front view, avoiding the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane; this extends the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • after step S4, the method includes:
  • S5 issuing a control command to the security system equipped with the face recognition model to open the security system, so that the application entity controlled by the security system is in a usable state.
  • in this embodiment, the face recognition device is used in a designated security system; when the images are determined to be of the same person, a preset control command is sent to the security system, so as to better exercise the function of the security system.
  • the security system of this embodiment includes, but is not limited to, a smart door lock switch, an identity verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform, and a candidate authentication platform, etc., to improve the timeliness and accuracy of verification.
  • the application entities of this embodiment include physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
  • after step S5, the method further includes:
  • the statistics step counts whether the accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold.
  • the threshold range of this step can be specifically set according to different fields of use.
  • for example, this embodiment can be used in the field of game monitoring, to prevent the same person from remaining in a gaming state for a long time and harming their health; the threshold can be set, for instance, to 2 hours of accumulated gaming within a continuous 12-hour period.
  • this embodiment combines face recognition with time monitoring to further expand the application scenarios of face recognition; for example, time monitoring is combined with real-time face recognition to control game software and prevent excessive game addiction: the face recognition system monitors the use state of the game system and whether the user is the same person, and when the gaming time of the same person on the game system is monitored to exceed the preset value, the game system is controlled to enter a lock-screen state.
  • similarly, face recognition can be used to monitor the use of public facilities in the public domain: if the same person's continuous use time is determined to exceed the preset value, the public facility is shut down, and when another person is determined to have entered the public domain to use the facility, it is automatically opened for use, which is conducive to the rational allocation of public resources.
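  • a simplified sketch of the time-monitoring logic described above (the class, its names and the 2-hour within 12-hour values are illustrative assumptions):

```python
import time
from collections import deque

class UsageMonitor:
    """Tracks accumulated use per person within a sliding window (e.g. 12 h)
    and reports when it exceeds a threshold (e.g. 2 h)."""

    def __init__(self, window_s=12 * 3600, limit_s=2 * 3600):
        self.window_s = window_s
        self.limit_s = limit_s
        self.sessions = {}           # person_id -> deque of (start, end)

    def record_session(self, person_id, start, end):
        self.sessions.setdefault(person_id, deque()).append((start, end))

    def should_lock(self, person_id, now=None):
        now = now if now is not None else time.time()
        q = self.sessions.get(person_id, deque())
        while q and q[0][1] < now - self.window_s:   # drop sessions outside window
            q.popleft()
        used = sum(end - start for start, end in q)
        return used >= self.limit_s                   # lock screen if over the limit
```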
  • after step S4, the method further includes: summarizing the image data of the same face into the same specified file.
  • in summary, the structure of the output training model is changed, that is, the existing training model that only identifies specific entity categories is changed into a training model that considers both the specific entity categories and the spatial positional relationships of each specific entity category; the face image captured from any angle can be converted into the frontal face image through the spatial positional relationships, so recognition is possible under random postures without deliberately posing, making face recognition more flexible, efficient and user-friendly. The amount of data needed for the training model output through the CapsNet network is greatly reduced, and a training model with accurate recognition can be trained with fewer samples.
  • a CapsNet-based device for multi-angle face recognition includes:
  • the obtaining module 1 is configured to select, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the CapsNet network structure.
  • the shooting angles of this embodiment include any angle from which any facial features of the human face can be captured, defined relative to the frontal line of sight of the frontal face: for example, +90 degrees laterally is the direction directly facing the right side of the face, and -90 degrees laterally is the direction directly facing the left side of the face; likewise 90 degrees upward, 90 degrees downward, and so on, with all shooting angles distributed on a sphere centered on the frontal line of sight of the frontal face.
  • the shooting angle of this embodiment is acquired by the camera of the device equipped with the above face recognition model.
  • the first shooting angle of this embodiment is a shooting angle of the person to be verified, so as to distinguish it from the second shooting angles used when a registrant performs registration; likewise, the first feature extraction mode corresponds to the specified shooting angle of the person to be verified.
  • the second feature extraction modes corresponding to the second shooting angles used during registration are similar to the above; the terms "first" and "second" are used only for distinction and are not limiting.
  • the CapsNet network structure of this embodiment is a network structure based on capsule units.
  • a capsule unit is a vector of values, each value representing a feature of the object currently to be recognized, such as an eye among the facial features.
  • the capsule network is composed of capsule units, and a vector in the capsule network can represent not only the features of an object but also its orientation and state.
  • the probability that a face is present is represented by the length of the input and output vectors, while the direction of the vector represents some attribute of the facial features.
  • Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active.
  • the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule.
  • such attributes may include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectivity, color, texture, etc.
  • the length of the input and output vector indicates the probability of a face appearing, and the probability value must be between 0 and 1.
  • the capsule network of this embodiment uses the Squashing nonlinear function, which ensures that the length of a short vector shrinks to almost zero while the length of a long vector is compressed to be close to, but not more than, 1.
  • the Squashing nonlinear function is expressed as $v_j = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$, in two parts: the first factor, $\|s_j\|^2 / (1 + \|s_j\|^2)$, is the scaling of the input vector $s_j$, and the second factor, $s_j / \|s_j\|$, is the unit vector of $s_j$. The function preserves the direction of the input vector and compresses its length into the interval (0, 1), so that the probability that an entity appears is expressed by the magnitude of the vector modulus: the larger the modulus, the greater the probability. When $s_j$ is the zero vector, $v_j$ is 0; as $\|s_j\|$ tends to infinity, the length of $v_j$ approaches 1.
  • this nonlinear function can be regarded as a compression and redistribution of vector lengths, and also as the way the input vector is activated to produce the output vector.
  • the input vector of a Capsule is equivalent to the scalar input of a classical neural network neuron, and the calculation of this vector is equivalent to the propagation and connection between two Capsule layers.
  • the calculation of the input vector is divided into two stages, linear combination and routing (the routing process of this embodiment is dynamic routing), expressed as $\hat{u}_{j|i} = W_{ij} u_i$ and $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be regarded as each capsule neuron in the upper layer feeding a given neuron in the next layer with a different strength.
  • the coupling coefficient $c_{ij}$ is calculated by the softmax $c_{ij} = \dfrac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$.
  • the input computation of the CapsNet network structure in this embodiment is similar to the linear weighted summation of a conventional network, except that the coupling coefficient $c_{ij}$ is added in the linear summation stage.
  • $b_{ij}$ is updated according to $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$, with initial value 0; in the forward propagation that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0, from which $c_{ij}$ is obtained; $u_i$ is the output of the upper capsule layer and $v_j$ is the output vector of Capsule $j$; from these relationships the input $s_j$ of the higher capsule layer is obtained.
  • the dot product $\hat{u}_{j|i} \cdot v_j$ can be positive, zero or negative; a positive result means the two vectors point in roughly the same direction.
  • if the update makes $b_{ij}$ larger, the coupling coefficient becomes high, indicating that the two vectors match very well; otherwise the coupling coefficient is small, indicating that the two vectors do not match.
  • the face recognition model of this embodiment includes a coordinate system for the face image at each angle, and represents each specific entity of the face, such as an eye, an ear or a nose, as a vector; each vector includes the attribute parameters of the facial feature, such as size, position, orientation, color and other pose attributes, to express the relative spatial positional relationships of the facial features.
  • the facial features recognized at different angles differ, and the corresponding ways of converting the face image into the frontal face according to the spatial positional relationships differ accordingly: for example, the vector representation of the nose obtained in the top-view state differs from that acquired from the left-face direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-face nose vector into the frontal nose vector.
  • a first conversion module 2, configured to extract, in the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of the first face into a first feature vector of the frontal face image in the face recognition model.
  • different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ: for example, at the angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution among the facial features can be recognized, while perpendicular to the right side of the face only the lateral state of the right eye, the right half of the nose, the right ear and the like can be recognized.
  • the face recognition model is a three-dimensional model with spatial positional relationships, and a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to the inherent spatial positional relationships, into the image of the three-dimensional structure at another angle; for example, the lateral states recognized for the right eye, the right half of the nose, the right ear and so on are converted into the state seen perpendicular to the frontal face, thereby recognizing the frontal-state features of each organ among the facial features and the overall distribution state among the five organs, and outputting the first feature vector corresponding to the specific entity types of the frontal face, where the specific entity types correspond to the facial features of the face.
  • the determining module 3 is configured to determine whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
  • the preset feature vector of this embodiment is the result output by inputting the pre-stored face image of the preset angle into the feature extraction mode of the corresponding angle.
  • the preset threshold of this embodiment is 0.8 or more.
  • for example, the feature vector of the face image of user A is pre-registered, and the face recognition model in a smart door lock performs face recognition verification; the lock can be opened only after verification succeeds.
  • when user A stands sideways under the camera of the smart door lock, the camera captures the facial features of A's side face and transmits them to the face recognition model; the face recognition model, according to the facial features of A's side face, invokes the database storing A's facial features and their spatial positions, converts A's side face into the frontal face, outputs the feature vector corresponding to the facial features of the frontal face, calculates the comparison between this feature vector and the preset feature vector registered by A, and, if the calculated value is within the set threshold, controls the smart door lock to open.
  • the determining module 4 is configured to determine that the acquired face image and the face image corresponding to the preset feature vector are the face images of the same person if the similarity is less than the preset threshold.
  • for example, the threshold is set to 0.8; when the calculated value is less than 0.8, it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
  • the face recognition device of this embodiment includes:
  • the acquisition module 10 is configured to collect facial image data of each shooting angle of a plurality of people to construct a training model sample library.
  • face image data of each angle, covering multiple people, is collected so as to train the model more accurately and improve its generalization ability; however, because the CapsNet network structure includes the spatial positional relationship factor, the sample data required to construct the training model is significantly reduced while a training model of higher accuracy is obtained.
  • in an existing CNN (Convolutional Neural Network), the pooling layer reduces the number of parameters and helps avoid over-fitting, but it also discards some information, such as positional information; a CNN pays no attention to the relative orientation and spatial relationships of components, caring only about whether specific features are present.
  • CapsNet (Capsule Networks), by contrast, does not need to eliminate the influence of viewpoint changes in neural activity, and can simultaneously process multiple different affine transformations or different components of different objects, so that the training model can recognize face images at every angle.
  • the training model of this embodiment has the property of equivariant mapping: the face image at each angle retains the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize the face at every observed angle.
  • the CapsNet network structure of this embodiment outputs a training model corresponding to face images at every angle, and can accurately recognize the face within any viewing range in which any facial features can be recognized, such as top view, bottom view, side view or front view, without requiring a corrected pose; this improves the flexibility and efficiency of face recognition, avoids the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane, expands the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • for example, let A be an m×p feature data matrix of the facial features and B a p×n spatial position relationship matrix of the facial features; the combination of the facial feature data with the spatial positions of the facial features is then represented in the model by the matrix product, and matrix multiplication combines the facial feature data compactly, so that a face recognition model with spatial positional relationships is represented simply.
  • the training module 11 is configured to input the face image data of the training model sample library into the convolutional neural network of the CapsNet network structure for training, to obtain the face recognition model.
  • the training model of this embodiment can recognize the facial features from any angle at which they can be recognized, and converts them into the facial features of the frontal face according to the spatial positional relationships in the training model, thereby performing face recognition.
  • the training module 11 of this embodiment includes:
  • a first input unit 111, configured to input the face image data of each angle in the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with the first specified convolution kernel and the first specified stride, and output a tensor through the specified activation function.
  • the CapsNet network structure of this embodiment includes two convolution layers and a fully connected layer; the first convolution layer is a conventional convolution layer, which serves to detect pixel-level local features.
  • the first convolutional layer of this embodiment has 256 9*9 first designated convolution kernels, the first specified step size is 1, and the activation function is designated as ReLU.
  • the first convolution layer converts the pixel brightness into an activation of the local feature detector, and the output tensor of the first convolution layer serves as the input to the second convolution layer.
  • a second input unit 112, configured to input the tensor into the second convolution layer of the CapsNet network structure, and to convolve it with the second specified convolution kernel and the second specified stride to construct a tensor structure and output the Capsule vectors of the tensor structure.
  • the second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multi-dimensional entity categories; it has 32 channels, each channel is composed of an 8-dimensional convolution structure, and each channel outputs an 8-dimensional vector, achieving the effect of 8*1 Capsule feature encapsulation.
  • the Capsule of this embodiment is a set of neurons whose input and output vectors represent instantiation parameters of a particular entity class.
  • the specific entity category of this embodiment is a facial feature.
  • the second specified convolution kernel of this embodiment is a 9*9 convolution kernel and the second specified step size is 2.
  • eight convolution units are packaged together into a new Capsule unit.
  • the convolution calculation of the Primary Capsules layer does not use an activation function such as ReLU, but prepares the input into the next layer of Capsule units in a vector manner.
  • the updating unit 113 is configured to propagate and update the Capsule vector through a DigitCaps layer (digital capsule layer) of the CapsNet network structure, and output a face recognition model.
  • the activation vector modulus of each Capsule gives an instance of each specific entity class.
  • one very special attribute is the presence of an instance of a specific entity category in the image; for example, the output range of the DigitCaps layer is between 0 and 1, where 0 indicates absence and 1 indicates presence.
  • the input of the DigitCaps layer is the output vectors u_i of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors v_j of the DigitCaps layer have dimension [16, 1], and the 16-dimensional output of CapsNet makes the training model of this embodiment robust.
  • an apparatus for recognizing a human face includes:
  • the receiving module 30 is configured to receive each second shooting angle face image of the registered human face.
  • the second shooting angles of this embodiment are the shooting angles of the registrant, to distinguish them from the first shooting angle of the person to be verified.
  • the registered face of the embodiment includes a face of one or more people, so as to recognize a face image of a plurality of people on the same recognition device, and expand the application range.
  • the second conversion module 31 is configured to select, according to the face image of each second shooting angle, the second feature extraction mode corresponding to that second shooting angle, to extract one by one the features of the second face of the registered face corresponding to each second shooting angle, and to convert, according to the spatial positional relationships in the face recognition model, the features of each second face into a second feature vector of the frontal face image of the registrant in the face recognition model.
  • the conversion mode and conversion principle from the second features of the face image at each second shooting angle of the registered face to the face image of the frontal face are the same as those of the first conversion module 2 and are not repeated here.
  • the setting module 32 is configured to set the second feature vector to be the preset feature vector.
  • the second feature vector is set as the preset feature vector, so that when the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the person to be verified and the registrant are determined to be the same person.
  • there may be multiple registrants, with correspondingly multiple preset feature vectors.
  • when the face recognition model is used for face recognition, databases of the facial features of different registrants may be established separately; each database contains a registrant's facial features and the spatial positional relationships of the five facial organs.
  • the setting module 32 includes an obtaining unit, configured to obtain the feature vector among the plurality of preset feature vectors that matches the first feature vector, and a determining unit, configured to determine that the currently acquired face image and the face image corresponding to the matched feature vector are face images of the same person.
  • during recognition, the corresponding database is first retrieved according to the recognized facial features, and the respective conversions are then carried out according to the corresponding spatial positional relationships.
  • the conversion process and the principle are the same as those described above, and will not be described again.
  • the determining module 3 of the embodiment includes:
  • the calculating unit 300 is configured to calculate a distance value between the first feature vector and the preset vector.
  • the distance value of this embodiment may be a Euclidean distance, a cosine distance or a Mahalanobis distance.
  • the Euclidean distance is preferably used to represent the similarity between the first feature vector and the preset feature vector.
  • the Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$, where $X$ is the face feature vector extracted from the face used for registration, $Y$ is the face feature vector extracted at verification time, and $n$ (a natural number) is the vector dimension.
  • the determining unit 301 is configured to determine whether the size of the distance value is less than a preset threshold.
  • face recognition is taken as an example here; the CapsNet-based multi-angle recognition method can also be extended to the recognition of target objects in other fields.
  • in this embodiment, the training model corresponding to face images at every angle is output through the CapsNet network structure, and the feature extraction mode for the face image changes with the viewing angle; without needing to eliminate the influence of viewpoint changes in neural activity, multiple different affine transformations or different components of different objects can be processed at the same time. The training model of this embodiment has the property of equivariant mapping: face images at each angle retain the ability to be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at various angles.
  • the CapsNet model of this embodiment includes the spatial positional relationship of each facial feature, and the face image captured from any angle at which any facial features can be recognized can be converted into the frontal face image through those spatial positional relationships; the face can thus be recognized accurately within any viewing range, such as top view, bottom view, side view or front view, avoiding the drawback in existing face recognition that the face must be mechanically and squarely oriented toward the recognition plane; this extends the flexibility of face recognition, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
  • an apparatus for recognizing a human face includes:
  • the issuing module 5 is configured to issue a control instruction to the security system equipped with the face recognition model to open the security system to make the application entity controlled by the security system in a usable state.
  • in this embodiment, the face recognition device is used in a designated security system; when the images are determined to be of the same person, a preset control command is sent to the security system, so as to better exercise the function of the security system.
  • the security system of this embodiment includes, but is not limited to, a smart door lock switch, an identity verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform, and a candidate authentication platform, etc., to improve the timeliness and accuracy of verification.
  • the application entities of this embodiment include physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
  • an apparatus for recognizing a human face includes:
  • the statistics module 6, configured to count whether the cumulative length of time that the same person continues to use the application entity within a specified time period exceeds a threshold.
  • the threshold of this embodiment can be set specifically for different fields of use.
  • for example, in the field of game monitoring, this embodiment prevents the same person from remaining in a game state for too long, which affects physical health.
  • the threshold can be set to 2 hours of cumulative game time within a continuous 12-hour period.
  • the generating module 7, configured to generate an instruction to close the application entity if the time length exceeds the threshold, so as to prohibit continued use of the application entity.
  • this embodiment is combined with time monitoring to further expand the application scenarios of face recognition; for example, time monitoring is combined with real-time face recognition to manage game software and prevent excessive game addiction.
  • the face recognition system monitors the use state of the game system and whether the user is the same person; when the cumulative game time of the same person in the game system exceeds the preset value, the game system is placed in a lock-screen state.
  • face recognition can also monitor the use of public facilities in a designated public area: if the continuous use time of the same person exceeds the preset value, the facility is shut down, and when another person enters the area it is automatically reopened for use, which facilitates the rational allocation of public resources.
  • an apparatus for recognizing a human face includes:
  • the summary module 8 is configured to gather the image data of the same face into the same specified file.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface and a database connected by a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for running the operating system and the computer readable instructions stored in the non-volatile storage medium.
  • the database of the computer device is used to store data such as face recognition data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • the computer readable instructions, when executed, perform the flow of the method embodiments described above. It will be understood by those skilled in the art that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application, and does not limit the computer devices to which the present application may be applied.
  • An embodiment of the present application also provides a computer non-volatile readable storage medium having stored thereon computer readable instructions that, when executed, perform the processes of the method embodiments described above.
  • the above description covers only preferred embodiments of the present application and is not intended to limit the scope of the patent; equivalent structures or equivalent process transformations made using the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the present application.


Abstract

The present application provides a facial recognition method comprising: selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure; extracting, according to the first feature extraction manner, the facial features corresponding to the first capture angle, and converting them, according to the spatial positional relationship carried in the model, into a first feature vector of a frontal face image; determining whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold; and if so, determining that the two images are of the same face.

Description

Method, device, computer device and storage medium for recognizing a human face
This application claims priority to Chinese Patent Application No. 2018103446690, filed with the China Patent Office on April 17, 2018 and entitled "Method, device, computer device and storage medium for recognizing a human face", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of application of convolutional neural networks, and in particular to a method, a device, a computer device and a storage medium for recognizing a human face.
Background
With the continuous advancement of society and the urgent need for fast and effective automatic identity verification, biometric recognition technology has developed rapidly in recent decades. Compared with other biometrics, face recognition is direct, friendly and convenient, and has therefore been studied more widely. A face is composed of the eyes, nose, mouth, chin and other parts; it is precisely the differences in the shape, size and distribution of these parts that make every face in the world distinct, so these parts can serve as important features for face recognition. In existing face recognition, whether frontal or profile, the face must pose squarely toward the camera to be recognized accurately. Moreover, existing face recognition only checks whether the specific entity types representing a face are present and consistent, without considering the links between the spatial positions of those entity types, so its accuracy is limited and recognition is only possible by comparing images posed at a specified attitude. Existing face recognition is thus mechanically rigid and lacks a humanized design that fits human habits. In addition, existing face recognition has low accuracy for highly similar faces, for example twins whose facial features are extremely alike.
Technical problem
The main purpose of the present application is to provide a method for recognizing a human face, aiming to solve the technical problem that existing face recognition is mechanically rigid and of limited accuracy.
Technical solution
The present application proposes a method for recognizing a face, including:
selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure;
extracting, according to the first feature extraction manner, the features of the first specific entity types corresponding to the first capture angle, and converting the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of a frontal face image in the face recognition model;
determining whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold;
if it is less, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
The present application proposes a device for recognizing a face, including:
an acquisition module, configured to select, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure;
a first conversion module, configured to extract, according to the first feature extraction manner, the features of the first face corresponding to the first capture angle, and to convert the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of a frontal face image in the face recognition model;
a judging module, configured to determine whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold;
a determining module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
The present application also provides a computer device, including a memory and a processor, the memory storing computer readable instructions, and the processor implementing the steps of the above method when executing the computer readable instructions.
The present application also provides a computer non-volatile readable storage medium on which computer readable instructions are stored, the computer readable instructions implementing the steps of the above method when executed by a processor.
Beneficial effects
By changing the convolution structure, the present application changes the structure of the output training model: an existing training model that only recognizes specific entity categories is replaced with one that recognizes both the specific entity categories and their spatial positional relationships, enabling accurate representation and recognition of a face image from any angle at which any facial features can be observed. The training model of the present application contains the spatial positional relationships of the facial features of each face; it recognizes not only the facial features themselves but also their spatial arrangement, improving the precision of face recognition. Because these spatial positional relationships are contained in the model, the frontal image of a face can be reconstructed from any angle at which any facial features are visible, so a face can be recognized in an arbitrary pose without deliberate posing, making face recognition more flexible, efficient and user-friendly. Moreover, the amount of data required to train the model is greatly reduced, and a training model with accurate recognition can be trained from fewer samples.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a method for recognizing a human face according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for recognizing a human face according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an optimized structure of a device for recognizing a human face according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for recognizing a human face according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a judging module according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a device for recognizing a human face according to still another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a device for recognizing a human face according to yet another embodiment of the present application;
FIG. 9 is a schematic structural diagram of a device for recognizing a human face according to yet another embodiment of the present application;
FIG. 10 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
Best mode for carrying out the invention
Referring to FIG. 1, a method for recognizing a human face according to an embodiment of the present application includes:
S1: selecting, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure.
The capture angles of this embodiment include any angle, referenced to the straight-ahead line of sight of a frontal face, from which any facial features can be photographed: for example, 90 degrees to the side is the exact right-profile or left-profile direction, 90 degrees above is the top-view direction, 90 degrees below is the bottom-view direction, and so on; all capture angles are distributed on a sphere-like surface centered on the straight-ahead line of sight of the frontal face. The capture angle in this embodiment is obtained by the camera fitted with the above face recognition model. The first capture angle is a specified capture angle of the person to be verified, distinguished from the second capture angles at which the registrant registers; likewise, the first feature extraction manner corresponds to that specified capture angle, distinguished from the second feature extraction manners corresponding to the second capture angles at registration. Throughout, "first" and "second" are only distinguishing labels and are not limiting.
The CapsNet network structure of this embodiment is based on capsule units. A capsule unit is a vector containing arbitrary values, each value representing one feature of the object currently to be recognized, such as the eyes among the facial features. The capsule network is composed of capsule units, and its vectors can represent not only the features of an object but also its orientation, state and so on. In this embodiment, the length of an input or output vector represents the probability that a face is present, and the direction of the vector represents certain attributes of the facial features. Capsules at one level predict the instantiation parameters of higher-level capsules through transformation matrices; when multiple predictions agree (this embodiment uses dynamic routing to bring predictions into agreement), the higher-level capsule becomes active. The activation of the neurons in a capsule expresses the various properties of the facial features present in the face image; these properties can include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectance, color and texture.
The probability that a face appears is expressed by the length of the input and output vectors, so the probability value must lie between 0 and 1. To achieve this compression and implement the capsule-level activation, the capsule network of this embodiment uses the squashing nonlinearity, which shortens short vectors to a length of almost zero while compressing long vectors to a length close to, but not exceeding, 1. The expression of the squashing nonlinearity consists of two parts,
$$\frac{\|s_j\|^2}{1+\|s_j\|^2} \qquad \text{and} \qquad \frac{s_j}{\|s_j\|},$$
and the full nonlinearity is:
$$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\cdot\frac{s_j}{\|s_j\|}.$$
The first part is the scaling factor of the input vector $s_j$ and the second part is the unit vector of $s_j$, so the function preserves the direction of the input vector while compressing its length into the interval (0, 1); the modulus of a vector thus measures the probability that an entity is present, and the larger the modulus, the larger the probability. When $s_j$ is the zero vector, $v_j$ takes the value 0; as $s_j$ grows without bound, the length of $v_j$ approaches 1. The nonlinearity can be viewed as a compression and redistribution of vector length, or as the activation that maps an input vector to an output vector.
The input vector of a capsule is the counterpart of the scalar input of a CNN, and its computation corresponds to the propagation and connection between two capsule layers. The computation is divided into two stages, linear combination and routing (dynamic routing in this embodiment):
$$\hat{u}_{j|i} = W_{ij}\,u_i, \qquad s_j = \sum_i c_{ij}\,\hat{u}_{j|i},$$
where $u_i$ is the output of the previous capsule layer and $W_{ij}$ is the weight by which each output is multiplied; this can be seen as each capsule neuron of the previous layer connecting, with different strengths, to a given neuron of the next layer. The coupling coefficients $c_{ij}$ are computed by the softmax:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}.$$
Compared with a CNN network structure, the network input of the CapsNet structure of this embodiment, a linear weighted sum, is very similar, but a coupling coefficient $c_{ij}$ is added in the linear summation stage; to obtain $c_{ij}$, the logit $b_{ij}$ must be computed first, updated according to:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j.$$
The initial value of $b_{ij}$ is 0, so in the forward pass that computes $s_j$, this embodiment initializes $W$ to random values and $b$ to 0 to obtain the coupling coefficients; $u$ is the output of the previous capsule layer, $v_j$ is the output vector of capsule $j$, and the higher-layer capsule input $s_j$ is obtained from the relations above. For two vectors of given lengths but different directions, the dot product can be positive, zero or negative: a positive product means the two vectors point in similar directions, so the update makes $b_{ij}$ larger and the coupling coefficient higher, indicating that the two vectors match well; a negative product makes $b_{ij}$ smaller and the coupling coefficient lower, indicating a mismatch. Determining the coupling coefficients by iteration amounts to determining a route along which the moduli of the capsule neurons are especially large; at the end of the route lies the correctly predicted capsule.
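As a concrete illustration, the following minimal NumPy sketch implements the squashing nonlinearity and the dynamic-routing loop described above. It is a sketch under assumed toy dimensions (1152 input capsules of 8 dimensions predicting 10 output capsules of 16 dimensions, as in the original CapsNet paper); the random weights, the capsule counts and the three routing iterations are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squashing nonlinearity: keeps direction, maps length into (0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between one capsule layer and the next.

    u_hat: predictions of shape (num_in, num_out, dim_out),
           i.e. u_hat[i, j] = W_ij @ u_i.
    Returns the output vectors v of shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                           # logits start at 0
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = (c[:, :, None] * u_hat).sum(axis=0)               # weighted sum
        v = squash(s)                                         # output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # agreement update
    return v

# Toy example: 1152 input capsules (8D) predicting 10 output capsules (16D).
rng = np.random.default_rng(0)
u = rng.normal(size=(1152, 8))                # previous-layer capsule outputs
W = 0.01 * rng.normal(size=(1152, 10, 16, 8))
u_hat = np.einsum('ijkl,il->ijk', W, u)       # u_hat[i, j] = W_ij @ u_i
v = dynamic_routing(u_hat)
print(v.shape, np.linalg.norm(v, axis=-1))    # (10, 16), lengths in (0, 1)
```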
The face recognition model of this embodiment contains a coordinate system for face images at each angle, and a vector represents a specific facial-feature entity type: the vector encodes facial-feature categories such as the eyes, ears and nose of the face, and also includes the parameters of the facial features, such as pose attributes like size, position, orientation and color, so as to express the conversion relationships among the relative spatial positions of the facial features. The facial features recognized at each angle differ, and so does the corresponding way of converting them, by the spatial positional relationship, into a frontal face image. For example, the vector representation of the nose obtained from a top view differs from that obtained from the left-profile direction, and the rotation that converts the top-view nose vector into the frontal nose vector differs from the rotation that converts the left-profile nose vector into the frontal nose vector.
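As a hedged sketch of this view conversion: assuming a facial feature is described by a 3D direction vector observed from a known capture angle, re-expressing it in the frontal coordinate frame is a rotation, i.e. a matrix multiplication. The angles, the landmark vector and the helper name below are illustrative assumptions, not the patent's data.

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the vertical (yaw) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical: a nose direction vector as observed from the left-profile
# view (yaw offset of 90 degrees). Rotating by the inverse yaw re-expresses
# it in the frontal-face coordinate system; the sign convention is arbitrary.
observed_nose = np.array([1.0, 0.0, 0.0])   # as seen in the profile view
to_frontal = rotation_y(np.pi / 2)
frontal_nose = to_frontal @ observed_nose
print(frontal_nose)                          # -> [0, 0, -1] under this convention
```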
S2: extracting, according to the first feature extraction manner, the features of the first face corresponding to the first capture angle, and converting the features of the first face, according to the spatial positional relationship carried in the face recognition model, into a first feature vector of the frontal face image in the face recognition model.
In this embodiment, face images at different angles correspond to different feature extraction manners, and the features of the first face recognized at different angles differ. For example, at an angle perpendicular to the frontal face, the frontal state of each facial organ and the overall distribution of the facial features can be recognized, whereas perpendicular to the right-profile direction only the side states of the right eye, the right half of the nose, the right ear and similar organs can be recognized. However, the face recognition model of this embodiment is a three-dimensional model carrying spatial positional relationships, so a part of the three-dimensional structure recognized at the first capture angle can be converted, according to the inherent spatial positional relationship, into the image of that structure at another angle: for example, the side states of the right eye, the right half of the nose and the right ear recognized above are converted into the state captured perpendicular to the frontal face, whereupon the frontal state of each organ and the overall distribution of the facial features are recognized, and the first feature vector corresponding to the specific entity types of the frontal face is output, the specific entity types corresponding to the facial features.
S3: determining whether the similarity between the first feature vector and the preset feature vector is less than a preset threshold.
The preset feature vector of this embodiment is the output obtained by feeding a face image collected at a preset angle into the feature extraction manner for that angle. The preset threshold of this embodiment is 0.8 or above. For example, a home smart door lock pre-registers the feature vector of A's face image; when A wants to enter the home, face recognition verification must be performed through the face recognition model in the smart lock, and the lock opens only if verification passes. When A stands in profile under the camera of the smart lock, the camera captures the facial features of A's profile and feeds them to the face recognition model; based on those profile features, the model retrieves A's recognition database, applies A's facial-feature spatial positional relationships, converts A's profile into a frontal face, and outputs the feature vector of the frontal facial features. This feature vector is compared against A's registered preset feature vector, and if the computed value is within the set threshold, the smart lock is controlled to open.
S4: if it is less, determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
In this embodiment, the smaller the distance value, the higher the similarity. For example, with the threshold set to 0.8, when the value is less than 0.8 it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
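Putting S3 and S4 together, the decision in the door-lock example can be sketched as below. The stand-in feature vectors are invented for illustration; in the embodiment they would come from the CapsNet model's angle-specific extraction (S1/S2) and from registration, and the 0.8 threshold follows the example above.

```python
import numpy as np

THRESHOLD = 0.8  # per the door-lock example in this embodiment

def euclidean(x, y):
    """Distance between two face feature vectors (smaller = more similar)."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def verify(first_vector, preset_vector, threshold=THRESHOLD):
    """S3/S4: same person iff the distance is below the preset threshold."""
    return euclidean(first_vector, preset_vector) < threshold

# Stand-in vectors for the registered face and the probe face.
registered = np.array([0.12, 0.80, 0.33, 0.54])
probe      = np.array([0.10, 0.78, 0.35, 0.50])
print("open lock" if verify(probe, registered) else "stay locked")
```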
Further, before step S1 of this embodiment, the method includes:
S10: collecting face image data of multiple persons at each capture angle to construct a training model sample library.
This embodiment collects face image data at all angles, including face image data of multiple persons at each angle, so as to train the model more accurately and improve its generalization ability. Because the CapsNet network structure incorporates spatial positional relationships, the sample data needed to build the training model is markedly less than for an existing CNN network structure, yet a more accurate training model is obtained. In a CNN (Convolutional Neural Network), the pooling layer reduces parameters and avoids overfitting, but at the same time discards information such as position. A CNN does not attend to the orientation of components or their relative spatial relationships; it only cares whether specific features are present. For example, when face recognition is performed with a CNN, the two eyes, nose, mouth and so on must all be detected before the face can be recognized, so the face must be posed squarely for the needed features to be captured precisely; and even if the positions of the two eyes, nose and mouth change during recognition, the CNN still reports a face as long as all the features are detected, leading to large recognition errors, with especially high error rates when recognizing objects other than faces. The CapsNet (Capsule Networks) structure of this embodiment makes full use of spatial positional relationships and builds the training model by matrix multiplication; the neural activities adopted in CapsNet change with the viewing angle rather than discarding viewpoint information, and multiple different affine transformations, or different parts of different objects, can be processed at the same time, so the training model can recognize face images at all angles. The training model of this embodiment is equivariant: face images at each angle can still be represented and recognized after rotation, translation and scaling, so the training model can recognize a face from any angle at which its facial features are observed. For example, the training model output by the CapsNet network structure of this embodiment can accurately recognize a face within any viewing range, such as a top view, bottom view, side view or front view, in which any facial features are visible, without requiring a square pose. This improves the flexibility and efficiency of face recognition, avoids the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, improves the user experience of face recognition devices, and requires no change to the hardware of an existing face recognition system.
In this embodiment, let A be the m×p feature data matrix of the facial features and B the p×n spatial positional relationship matrix of the facial features; then the m×n matrix M is called the product of A and B, written M = AB. The product is meaningful only when the number of columns of A equals the number of rows of B, and the element in row i, column j of M can be expressed as:
$$M_{ij} = \sum_{k=1}^{p} a_{ik}\,b_{kj},$$
where a denotes the entries of matrix A, b the entries of matrix B, and p is the shared dimension (the number of columns of A, equal to the number of rows of B). Through this matrix multiplication, the combination of the facial-feature data and the spatial positional relationships of the facial features is expressed in the model by the matrix M; matrix multiplication gathers the many facial-feature data compactly together, giving a simple representation of a face recognition model that carries spatial positional relationships.
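A minimal check of the element formula above, with assumed small dimensions (m = 2, p = 3, n = 2); A and B here are random stand-ins, not real feature or relationship data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))   # m x p feature data matrix (stand-in)
B = rng.normal(size=(3, 2))   # p x n spatial relationship matrix (stand-in)

M = A @ B                     # M = AB, shape m x n
# Element-wise definition: M[i, j] = sum_k A[i, k] * B[k, j]
M_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                      for j in range(2)] for i in range(2)])
assert np.allclose(M, M_manual)
print(M)
```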
S11: inputting the face image data of the training model sample library into the convolutional-layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
The training model of this embodiment can recognize facial features from any angle at which the facial features are observed, and converts them, according to the spatial positional relationship carried in the training model, into the frontal facial features, whereupon face recognition is performed.
Further, step S11 of this embodiment includes:
S111: inputting the face image data at each angle from the training model sample library into the first convolutional layer of the CapsNet network structure, convolving with a first specified convolution kernel at a first specified stride, and outputting a tensor through a specified activation function.
The CapsNet network structure of this embodiment includes two convolutional layers and one fully connected layer; the first convolutional layer is a conventional convolutional layer that detects pixel-level local features. The first convolutional layer has 256 first specified convolution kernels of size 9×9, the first specified stride is 1, and the specified activation function is ReLU. The first convolutional layer converts pixel intensities into activations of local feature detectors, and its output tensor serves as the input of the second convolutional layer.
S112: inputting the tensor into the second convolutional layer of the CapsNet network structure and convolving with a second specified convolution kernel at a second specified stride to construct the tensor structure, and outputting the Capsule vectors of the tensor structure.
The second convolutional layer of this embodiment is the Primary Capsules layer, the lowest layer of multi-dimensional entity categories; it has 32 channels, each composed of an 8-dimensional convolutional structure, and each channel outputs an 8-dimensional vector, achieving the feature encapsulation effect of 8×1 capsules. A capsule here is a group of neurons whose input and output vectors represent the instantiation parameters of a specific entity category; the specific entity categories of this embodiment are the facial features. For example, the second specified convolution kernel is a 9×9 kernel and the second specified stride is 2. In the CapsNet architecture of this embodiment, eight convolution units are packaged together into one new capsule unit. The convolution computation of the Primary Capsules layer uses no activation function such as ReLU (Rectified Linear Unit); instead, its outputs are prepared as vectors for input to the next capsule layer.
S113: propagating and routing-updating the Capsule vectors through the DigitCaps layer (digit capsule layer) of the CapsNet network structure, and outputting the face recognition model.
In the DigitCaps layer of the CapsNet network structure of this embodiment, the modulus of each capsule's activation vector indicates whether an instance of each specific entity category is present; for example, the output range of the DigitCaps layer is between 0 and 1, where 0 indicates absence and 1 indicates presence. The input of the DigitCaps layer is the output vectors of all capsules in the Primary Capsules layer, of vector dimension [8, 1]; the output vectors of the DigitCaps layer have vector dimension [16, 1]. The 16-dimensional output training model of the CapsNet of this embodiment is robust.
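The layer settings above (256 9×9 kernels at stride 1 with ReLU, then 32 channels of 8-dimensional primary capsules via 9×9 kernels at stride 2, then 16-dimensional digit capsules computed by routing) can be sketched at shape level as follows. This assumes a 28×28 single-channel input, as in the original CapsNet paper; the patent does not state an input resolution, the output-capsule count of 10 is a hypothetical placeholder, and the capsule grouping by naive reshape is an illustrative simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-9):
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class CapsNetSketch(nn.Module):
    def __init__(self, num_out_caps=10):  # output-capsule count: assumed
        super().__init__()
        self.conv1 = nn.Conv2d(1, 256, kernel_size=9, stride=1)          # S111
        self.primary = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)   # S112
        # S113: one transformation matrix W_ij per (input, output) capsule pair.
        self.W = nn.Parameter(0.01 * torch.randn(1152, num_out_caps, 16, 8))

    def forward(self, x):                        # x: (batch, 1, 28, 28) assumed
        t = F.relu(self.conv1(x))                # -> (batch, 256, 20, 20)
        p = self.primary(t)                      # -> (batch, 256, 6, 6)
        # Naively group activations into 1152 capsules of 8D for illustration.
        u = squash(p.reshape(x.size(0), -1, 8))  # -> (batch, 1152, 8)
        u_hat = torch.einsum('iokd,bid->biok', self.W, u)   # predictions
        b = torch.zeros(u_hat.shape[:3], device=x.device)   # routing logits
        for _ in range(3):                       # dynamic routing, 3 iterations
            c = F.softmax(b, dim=2)
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))  # (batch, o, 16)
            b = b + torch.einsum('biok,bok->bio', u_hat, v)
        return v                                 # capsule lengths give presence

model = CapsNetSketch()
out = model(torch.zeros(2, 1, 28, 28))
print(out.shape)                                 # torch.Size([2, 10, 16])
```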
In another embodiment of the present application, before step S3, the method includes:
S30: receiving face images of the registered face at each second capture angle.
The second capture angles of this embodiment are the capture angles of the registrant, as distinguished from the first capture angle of the person to be verified; here "first" and "second" are only distinguishing labels and are not limiting, and the same applies elsewhere in this application. The registered face of this embodiment includes the faces of one or more persons, so that face images of multiple persons can be recognized on the same recognition device, expanding the scope of application.
S31: selecting, according to each second capture angle, the second feature extraction manner corresponding to that angle; extracting, in one-to-one correspondence, the features of each second face of the registered face corresponding to each second capture angle; and converting the features of each second face, according to the spatial positional relationship carried in the face recognition model, into second feature vectors of the frontal face image of the registrant.
The conversion manner and principle by which the features of each second face at each second capture angle of the registered face are converted into the second features of the frontal face image are the same as in step S2 and are not repeated here.
S32: setting the second feature vector as the preset feature vector.
By taking the second feature vector registered through model verification as the reference preset feature vector, the face feature vector of the person to be tested can be identified accurately: when the face feature vector under test is close to the preset feature vector, the two are determined to be the same person.
The embodiments of the present application also cover the case where there are multiple registrants and correspondingly multiple preset feature vectors. When performing face recognition with the face recognition model, a corresponding database is established for each registrant according to the differences in their facial features; each database contains that registrant's facial features and the spatial positional relationships of those features. Among the multiple preset feature vectors, the feature vector matching the first feature vector is obtained, and the currently acquired face image is determined to be the same face as the face image corresponding to the matched feature vector. During recognition, the model first retrieves the corresponding database according to the recognized facial features and then performs the respective conversion according to the corresponding spatial positional relationship; the conversion process and principle are as described above and are not repeated here.
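A hedged sketch of the multi-registrant case: each enrolled person contributes one preset feature vector, the probe is matched to the nearest one, and the match is accepted only if the distance clears the threshold. The enrollment data, names and 0.8 threshold are invented for illustration.

```python
import numpy as np

THRESHOLD = 0.8  # assumed, following the earlier example

def match_registrant(first_vector, enrolled):
    """Return the best-matching registrant id, or None if no distance
    falls below the threshold. `enrolled` maps id -> preset vector."""
    best_id, best_dist = None, float("inf")
    for person_id, preset in enrolled.items():
        dist = float(np.sqrt(np.sum((first_vector - preset) ** 2)))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < THRESHOLD else None

enrolled = {                       # invented enrollment database
    "A": np.array([0.11, 0.82, 0.30]),
    "B": np.array([0.75, 0.20, 0.66]),
}
print(match_registrant(np.array([0.10, 0.80, 0.33]), enrolled))  # -> "A"
```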
Further, step S3 of this embodiment includes:
S300: calculating the distance value between the first feature vector and the preset feature vector.
The distance value of this step may be the Euclidean distance, the cosine distance or the Mahalanobis distance; this embodiment preferably uses the Euclidean distance to represent the similarity between the first feature vector and the preset feature vector. The Euclidean distance between the first feature vector and the preset feature vector in this embodiment is expressed as:
$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2},$$
where X is the face feature vector extracted from the face used at registration, Y is the face feature vector extracted at verification, and n is the dimension of the feature vectors (a natural number).
S301: determining whether the distance value is less than the preset threshold.
This embodiment takes face recognition as an example; the CapsNet-based multi-angle recognition method can also be extended to the identification of target objects in other fields, which is not detailed here.
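Since the step mentions Euclidean, cosine and Mahalanobis distance as candidate distance values, a short sketch of all three follows. The sample vectors are stand-ins, and the identity covariance for the Mahalanobis variant is an illustrative assumption (in practice it would be estimated from registered feature vectors).

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cosine_distance(x, y):
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def mahalanobis(x, y, cov):
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

x = np.array([0.2, 0.9, 0.4])
y = np.array([0.1, 0.8, 0.5])
cov = np.eye(3)  # assumed identity covariance; then it reduces to Euclidean
print(euclidean(x, y), cosine_distance(x, y), mahalanobis(x, y, cov))
```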
In this embodiment, the CapsNet network structure outputs a training model for face images at each angle; the feature extraction manner corresponding to each angle changes with the viewing angle rather than discarding viewpoint information from the neural activities, so multiple different affine transformations, or different parts of different objects, can be processed at the same time. The training model of this embodiment is equivariant: face images at each angle can still be represented and recognized after rotation, translation and scaling, so the training model can recognize face images at all angles. The training model output by the CapsNet network contains the spatial positional relationships of the facial features of each face, so from any angle at which any facial features can be observed, the frontal image of the face can be reconstructed through the spatial positional relationship. With the face recognition method of this embodiment, facial features can be accurately recognized from any viewing angle, such as a top view, bottom view, side view or front view, avoiding the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane; this extends the flexibility of face recognition and improves the user experience of face recognition devices, without changing the hardware of an existing face recognition system.
In still another embodiment of the present application, after step S4, the method includes:
S5: issuing a control instruction to a security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state.
In this embodiment, the face recognition device is used in a designated security system. When face recognition determines that the user is the same person, a preset control instruction for that person is sent to the security system, so that the security system functions more effectively. The security system of this embodiment includes, but is not limited to, smart door locks, identity-verification access control, and various internet security platforms, such as a tax registration platform, a bank account platform and a candidate authentication platform, improving the timeliness and accuracy of verification. The application entity of this embodiment includes physical objects and virtual platforms: physical objects such as physical toys and public fitness equipment, and virtual platforms such as online game platforms and network video platforms.
In yet another embodiment of the present application, after step S5, the method further includes:
S6: counting whether the cumulative length of time that the same person continues to use the application entity within a specified time period exceeds a threshold.
The threshold of this step can be set specifically for different fields of use. For example, in the field of game monitoring, this embodiment prevents the same person from remaining in a game state for too long, which affects physical health; the threshold can be set to 2 hours of cumulative game time within a continuous 12-hour period.
S7: if it exceeds the threshold, generating an instruction to close the application entity, so as to prohibit continued use of the application entity.
This embodiment is combined with time monitoring to further expand the application scenarios of face recognition. For example, time monitoring is combined with real-time face recognition to manage game software and prevent excessive game addiction: the face recognition system monitors the use state of the game system and whether the user is the same person, and when the cumulative game time of the same person in the game system exceeds the preset value, the game system is placed in a lock-screen state.
Another example is the managed allocation of public resources: face recognition monitors the use of public facilities within a designated public area, and if the continuous use time of the same person exceeds the preset value, the facility is shut down; when another person enters the area to use the facility, it is automatically reopened for use, which facilitates the rational allocation of public resources.
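A minimal sketch of the S6/S7 bookkeeping, under the example thresholds above (2 hours of cumulative use within a continuous 12-hour window). The session log format is an illustrative assumption.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=12)    # specified time period (example value)
LIMIT = timedelta(hours=2)      # cumulative-use threshold (example value)

def should_close(sessions, now):
    """S6/S7: sessions is a list of (start, end) datetimes for one person;
    return True if cumulative use within the last WINDOW exceeds LIMIT."""
    cutoff = now - WINDOW
    used = timedelta()
    for start, end in sessions:
        overlap = min(end, now) - max(start, cutoff)
        if overlap > timedelta():
            used += overlap
    return used > LIMIT

now = datetime(2018, 4, 17, 20, 0)
sessions = [(datetime(2018, 4, 17, 9, 0), datetime(2018, 4, 17, 10, 30)),
            (datetime(2018, 4, 17, 19, 0), datetime(2018, 4, 17, 20, 0))]
print(should_close(sessions, now))   # 2.5 h used in the last 12 h -> True
```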
In yet another embodiment of the present application, after step S4, the method includes:
S8: gathering the image data of the same face into the same specified file.
This embodiment takes the classification of a registrant's electronic photo data as an example: by recognizing images of the same registered face, images of the same registrant are gathered into the same folder, so that the image data containing the same registrant is organized and the registrant's photos can be looked up more conveniently.
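A hedged sketch of S8: each photo is assigned to a registrant by the same nearest-preset-vector matching used above and grouped per person. The feature vectors, file names and the dictionary standing in for per-person folders are invented for illustration.

```python
import numpy as np
from collections import defaultdict

def identify(vector, enrolled, threshold=0.8):
    """Nearest enrolled preset vector, or 'unknown' above the threshold."""
    dists = {pid: float(np.linalg.norm(vector - v)) for pid, v in enrolled.items()}
    pid = min(dists, key=dists.get)
    return pid if dists[pid] < threshold else "unknown"

enrolled = {"A": np.array([0.1, 0.8]), "B": np.array([0.7, 0.2])}  # invented
photos = {"img_001.jpg": np.array([0.12, 0.79]),                  # invented
          "img_002.jpg": np.array([0.68, 0.22]),
          "img_003.jpg": np.array([0.09, 0.81])}

folders = defaultdict(list)        # stands in for per-person directories
for name, vec in photos.items():
    folders[identify(vec, enrolled)].append(name)
print(dict(folders))  # {'A': ['img_001.jpg', 'img_003.jpg'], 'B': ['img_002.jpg']}
```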
This embodiment, by changing the convolution structure, changes the structure of the output training model: an existing training model that only recognizes specific entity categories is replaced with one that recognizes both the specific entity types and the spatial positional relationships among them, enabling accurate representation and recognition of a face image from any angle at which any facial features can be observed. The training model output through the CapsNet network contains the spatial positional relationships of the facial features of each face; it recognizes not only the facial features themselves but also their spatial arrangement, improving the precision of face recognition. Because these spatial positional relationships are contained in the model, the frontal image of a face can be reconstructed from any angle at which any facial features are visible, so a face can be recognized in an arbitrary pose without deliberate posing, making face recognition more flexible, efficient and user-friendly. Moreover, the amount of data required to train the model through the CapsNet network is greatly reduced, and a training model with accurate recognition can be trained from fewer samples.
Referring to FIG. 2, a CapsNet-based multi-angle face recognition device according to an embodiment of the present application includes:
the acquisition module 1, configured to select, according to a first capture angle of an acquired face image, a corresponding first feature extraction manner in a face recognition model trained on the CapsNet network structure.
本实施例的拍摄角度包括以正面人脸的正前方视线为基准的能够拍摄到人脸任何五官的角度,比如侧向正90度为正右侧脸方向,侧向负90度为正左侧脸方向;再比如正上90度为俯视方向、正下90度为仰视方向等等,所有拍摄角度分布在以正面人脸的正前方视线为基准的类球面上。本实施例的拍摄角度通过装配上述人脸识别模型的摄像机获取。本实施例的第一拍摄角度为待验证人的某一指定拍摄角度,以区别于注册人进行注册时的各第二拍摄角度;第一特征提取方式相对于待验证人的某一指定拍摄角度,同样区别于注册人进行注册时的各第二拍摄角度对应的各第二特征提取方式,综上以上“第一”、“第二”仅为区别,不作限定。本实施例的CapsNet网络结构是基于胶囊单元(Capsule)的网络结构,胶囊单元就是一个向量,包含任意值,每个值代表了当前需要识别的物体的一个特征,比如人脸五官的眼睛等。胶囊网络由胶囊单元构成,胶囊网络的向量,不仅可表示物体的特征、也可以表示物体的方向、状态等。本实施例通过输入输出的向量的长度表征人脸存在的概率,向量的方向表示人脸的某些五官属性)。同一层级的Capsule通过变换矩阵对更高级别的Capsule的实例化参数进行预测。当多个预测一致时(本实施例使用动态路由使预测一致),更高级别的Capsule将变得活跃。本实施例通过Capsule中的神经元的激活情况表示了人脸影像中存在的人脸五官的各种性质,上述性质可以包含很多种不同的参数,例如姿势(位置,大小,方向)、变形、速度、反射率,色彩、纹理等。而输入输出的向量的长度表示了某个人脸出现的概率,概率值必须在0到1之间。为了实现概率压缩,并完成Capsule层级的激活功能,本实施例的胶囊网络中使用了Squashing的非线性函数,该非线性函数确保短向量的长度能够缩短到几乎等于零,而长向量的长度压缩到接近但不超过1的情况。Squashing的非线性函数的表达 式分为两部分:
Figure PCTCN2018095498-appb-000010
Figure PCTCN2018095498-appb-000011
非线性函数为:
Figure PCTCN2018095498-appb-000012
前一部分是输入向量S j的缩放尺度,第二部分是输入向量S j的单位向量,该非线性函数既保留了输入向量的方向,又将输入向量的长度压缩到区间(0,1)内,以实现用向量模的大小衡量某个实体出现的概率,模值越大,概率越大。S j向量为零向量时,V j能取到0,而S j无穷大时V j无限逼近1,该非线性函数可以看作是对向量长度的一种压缩和重分配,也可以看作是一种输入向量激活后的输出向量的方式。Capsule的输入向量就相当于经典神经网络神经元的标量输入,而该向量的计算就相当于两层Capsule间的传播与连接方式。输入向量的计算分为两个阶段,即线性组合和Routing(路由过程,本实施例为动态路由),表示为:
Figure PCTCN2018095498-appb-000013
Figure PCTCN2018095498-appb-000014
其中u是上一层胶囊网络的输出,W是每个输出要乘的权值,可以看作上一层每一个胶囊神经元以不同强弱的连接输出到后一层的某一个神经元。C根据下面公式计算:
Figure PCTCN2018095498-appb-000015
C为耦合系数。本实施例的Capsnet网络结构与CNN网络结构相比,网络的输入即线性加权求和很类似,但是在线性求和阶段上多加了一个耦合系数C,为了求C必须先求b,b根据下面公式计算:
Figure PCTCN2018095498-appb-000016
b初始值为0,故在前向传播求S的过程中,本实施例把W设计成随机值,b初始化为0可以得到C,U就是上一层胶囊网络的输出,V j为Capsule j的输出向量,根据上述关系得到更高层的胶囊输入S。对于给定长度但方向不同的两个向量而言,乘积有下列几种情况:正值、零、负值,当两个向量的相乘结果为正时,代表两个向量指向的方向相似,则b更新结果变大,那么耦合系数就高,说明该两向量十分匹配。相反,若是两个向量相乘结果为负,则b更新结果变小,那么耦合系数就小,说明两个向量不匹配。通过迭代确定C,也就等于确定了路线,该路线上胶囊神经元的模都特别大,路线的尽头就是正确预测的胶囊。
The shooting angle of this embodiment includes an angle that can capture any facial features of the human face based on the front line of sight of the frontal face, such as a lateral positive 90 degrees being a positive right face direction, and a lateral negative 90 degrees being a positive left side. Face orientation; for example, 90 degrees in the top direction, 90 degrees in the bottom direction, etc., all shooting angles are distributed on the spherical surface based on the front line of sight of the front face. The photographing angle of the present embodiment is acquired by a camera that assembles the above-described face recognition model. The first shooting angle of the embodiment is a certain shooting angle of the person to be verified to distinguish the second shooting angles when the registrant performs registration; the first feature extraction mode is relative to a specified shooting angle of the person to be verified. The second feature extraction method corresponding to each second shooting angle when the registrant performs registration is similar to the above, and the above-mentioned "first" and "second" are only differences, and are not limited. The CapsNet network structure of this embodiment is based on a capsule unit network structure. The capsule unit is a vector containing arbitrary values, each value representing a feature of an object that needs to be recognized currently, such as an eye of a facial expression. The capsule network is composed of a capsule unit, and the vector of the capsule network can represent not only the characteristics of the object but also the direction and state of the object. In this embodiment, the probability of the presence of the face is represented by the length of the vector of the input and output, and the direction of the vector represents some of the facial features of the face). Capsules of the same level predict the instantiation parameters of higher-level Capsules through transformation matrices. When multiple predictions are consistent (this embodiment uses dynamic routing to make predictions consistent), higher level Capsules will become active. In this embodiment, the various states of the facial features present in the face image are represented by the activation of the neurons in the Capsule. The above properties may include many different parameters, such as posture (position, size, direction), deformation, Speed, reflectivity, color, texture, etc. The length of the input and output vector indicates the probability of a face appearing, and the probability value must be between 0 and 1. In order to achieve probability compression and complete the activation function of the Capsule level, the capsule network of the present embodiment uses a nonlinear function of Squashing, which ensures that the length of the short vector can be shortened to almost equal to zero, and the length of the long vector is compressed to Close to but not more than one. The expression of Squashing's nonlinear function is divided into two parts:
$$\frac{\|s_j\|^2}{1+\|s_j\|^2} \qquad \text{and} \qquad \frac{s_j}{\|s_j\|}$$

The nonlinear function is:

$$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\cdot\frac{s_j}{\|s_j\|}$$
The first part is the scaling factor of the input vector S_j, and the second part is the unit vector of S_j: the function preserves the direction of the input vector while compressing its length into the interval (0, 1), so that the modulus of the vector expresses the probability that an entity is present; the larger the modulus, the higher the probability. When S_j is the zero vector, V_j is 0, and as the length of S_j grows without bound, the length of V_j approaches 1. This nonlinearity can be viewed as a compression and redistribution of vector length, or equally as the way an input vector is activated into an output vector. The input vector of a capsule plays the role of the scalar input of a classical neural-network neuron, and its computation corresponds to the propagation and connection between two capsule layers. The computation of the input vector is divided into two stages, linear combination and routing (dynamic routing in this embodiment), expressed as:
$$\hat{u}_{j|i} = W_{ij}\,u_i$$

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$
Here u is the output of a capsule in the previous layer, and W is the weight by which each output is multiplied; it can be viewed as each capsule neuron of the upper layer connecting, with different strengths, to a particular neuron of the next layer. C is calculated according to the following formula:
$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$$
C is the coupling coefficient. Compared with a CNN network structure, the input of the CapsNet structure of this embodiment is likewise a linear weighted sum, but a coupling coefficient C is added at the summation stage. To obtain C one must first obtain b, which is updated according to the following formula:
$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j$$
The initial value of b is 0, so in the forward pass that computes S, this embodiment initializes W to random values and b to 0 in order to obtain C; U is the output of the upper capsule layer, V_j is the output vector of Capsule j, and the higher-layer capsule input S is obtained from the relations above. For two vectors of given length but different directions, their dot product can be positive, zero or negative. When the product of two vectors is positive, the two vectors point in similar directions, the update makes b larger, the coupling coefficient is high, and the two vectors match well; conversely, if the product is negative, the update makes b smaller, the coupling coefficient is small, and the two vectors do not match. Iteratively determining C is thus equivalent to determining a route along which the moduli of the capsule neurons are particularly large; at the end of the route lies the correctly predicted capsule.
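By way of illustration only, the following minimal NumPy sketch renders the Squashing function and the dynamic-routing loop described above; the three routing iterations, the layer sizes and all identifiers are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def squash(s, eps=1e-9):
    # Shrinks short vectors toward length 0 and long ones toward length 1,
    # keeping direction: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u, W, num_iters=3):
    # u: lower-layer capsule outputs, shape (num_lower, d_in)
    # W: transformation weights, shape (num_lower, num_upper, d_out, d_in)
    u_hat = np.einsum('ijab,ib->ija', W, u)        # predictions of higher capsules
    b = np.zeros(u_hat.shape[:2])                  # routing logits, initialized to 0
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # coupling coefficients: softmax of b
        s = np.sum(c[:, :, None] * u_hat, axis=0)  # higher-layer input S_j
        v = squash(s)                              # output vectors V_j
        b = b + np.sum(u_hat * v[None], axis=-1)   # agreement (dot product) updates b
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(6, 8)),         # 6 lower capsules, 8-D each
                    rng.normal(size=(6, 4, 16, 8)))  # routed to 4 upper 16-D capsules
print(v.shape)  # (4, 16); each vector length lies in (0, 1), read as a presence probability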
The face recognition model of this embodiment contains a coordinate system for face images at every angle and represents each specific facial-feature type of the face as a vector: the vector encodes the feature categories of the face, such as the eyes, ears and nose, together with their pose attribute parameters, such as size, position, direction and color, so as to express the conversion relationships between the relative spatial positions of the facial features. In this embodiment, the facial features recognized at each angle differ, and so does the corresponding way of converting them, via the spatial positional relationships, into a frontal-face image. For example, the vector representation of a nose captured from above differs from that of a nose captured from the left-face direction, and the rotation that converts the overhead nose vector into the frontal nose vector differs from the rotation that converts the nose vector captured from the left-face direction into the frontal nose vector.
a first conversion module 2, configured to extract, according to the first feature extraction mode, the features of the first face corresponding to the first shooting angle, and to convert, according to the spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model.
In this embodiment, face images at different angles correspond to different feature extraction modes, and the features of the first face recognized at different angles differ. For example, from an angle perpendicular to the frontal face, the frontal state of each organ and the overall distribution of the five facial features can be recognized, whereas perpendicular to the right-face direction only the lateral states of the right eye, the right half of the nose, the right ear and so on can be recognized. The face recognition model of this embodiment, however, is a three-dimensional model carrying spatial positional relationships, so a part of the three-dimensional structure recognized at the first shooting angle can be converted, according to those inherent relationships, into the image of the three-dimensional structure at another angle: for instance, the lateral states of the right eye, right half of the nose and right ear recognized above can be converted into the state seen perpendicular to the frontal face, so that the frontal state of each organ and the overall distribution of the facial features are recovered, and the first feature vector corresponding to the specific entity types of the frontal face (the five facial features) is output.
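Purely as a hypothetical sketch of this idea of re-expressing a part seen from one angle in the frontal frame (the matrix here is random; in a trained model it would be a learned viewpoint transformation, and all names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
pose_right_eye_side = rng.normal(size=8)   # 8-D capsule pose vector seen from the right side
W_side_to_front = rng.normal(size=(8, 8))  # stands in for the model's learned spatial relation
pose_right_eye_front = W_side_to_front @ pose_right_eye_side  # frontal-frame representation
```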
The judging module 3 is configured to judge whether the similarity between the first feature vector and a preset feature vector is less than a preset threshold.
The preset feature vector of this embodiment is the result of feeding a face image collected at a preset angle into the feature extraction mode for the corresponding angle. The preset threshold of this embodiment is 0.8 or above. For example, a home smart door lock pre-registers the feature vector of A's face image. When A wants to enter the home, face recognition verification must pass through the face recognition model in the smart lock before the lock can open. When A stands sideways under the lock's camera, the camera captures the facial features of A's side face and feeds them into the face recognition model; the model, based on the facial features of A's side face, calls up A's recognition database, applies A's facial spatial positional relationships, converts A's side face into a frontal face, and outputs the feature vector corresponding to the facial features of the frontal face. That vector is then compared against the preset feature vector registered by A, and if the computed value falls within the set threshold, the smart door lock is driven to open.
The determining module 4 is configured to determine, if the similarity is less than the preset threshold, that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
In this embodiment, the smaller the similarity value, the higher the degree of similarity (the similarity value here is a distance, so smaller means more alike). For example, with the threshold set to 0.8, when the value is less than 0.8 it is determined that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person.
Referring to FIG. 3, the face recognition device of this embodiment includes:
The acquisition module 10 is configured to collect face image data of a plurality of people at each shooting angle, to construct a training model sample library.
In this embodiment, face image data is collected at every angle, covering multiple people, so that the model can be trained more accurately and generalize better. Because the CapsNet network structure incorporates spatial positional relationships, markedly fewer samples are needed to build the training model than with an existing CNN structure, yet a more accurate training model is obtained. In an existing CNN (Convolutional Neural Network), by design, the pooling layer not only reduces parameters and helps avoid over-fitting but also discards some information, such as position. A CNN pays no attention to the orientation of components or their spatial relationships; it only cares whether specific features are present. For example, CNN-based face recognition must detect both eyes, the nose, the mouth and so on before a face can be recognized, so the subject must pose frontally before the required facial features can be captured accurately. Moreover, even if the positions of the two eyes, the nose and the mouth change during recognition, a CNN that has detected all the facial features still reports a face, which causes large recognition errors, and the error rate is even higher when recognizing objects other than faces. The CapsNet (Capsule Networks) structure of this embodiment makes full use of spatial positional relationships and builds the training model by matrix multiplication. The neural activities used in CapsNet vary with the viewpoint rather than eliminating the effect of viewpoint change from the neural activity, so multiple different affine transformations, or different parts of different objects, can be handled at the same time, allowing the training model to recognize face images at every angle. The training model of this embodiment is equivariant: face images at any angle can still be recognized and represented after rotation, translation and scaling, so the model can recognize a face from any angle at which its facial features are observable.
For example, the training models output by the CapsNet network structure for face images at each angle can accurately recognize a face within any viewing range in which any facial feature is visible, whether looking down, looking up, from the side or head-on, without requiring a posed frontal shot. This improves the flexibility and efficiency of face recognition, avoids the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, extends the flexibility of face recognition, and improves the user's experience with the face recognition device, all without changing the hardware of an existing face recognition system.
In this embodiment, let A be the m×p feature-data matrix of the facial features and B the p×n spatial-position-relationship matrix of the facial features; then the m×n matrix M is called the product of A and B, written M = AB. The product is meaningful only when the number of columns of A equals the number of rows of B, and the element in row i, column j of M can be expressed as:
$$M_{ij} = \sum_{k=1}^{p} a_{ik}\,b_{kj}$$
where a denotes entries of matrix A, b denotes entries of matrix B, and p is the shared dimension (the column count of A, equal to the row count of B). Through matrix multiplication, this embodiment expresses in the matrix M the combination of the facial-feature data with the spatial positional relationships of the facial features; matrix multiplication packs the many facial-feature quantities compactly together, giving a simple representation of a face recognition model that carries spatial positional relationships.
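A quick NumPy check of the element formula above (the matrix sizes are arbitrary example values, not taken from this application):

```python
import numpy as np

m, p, n = 4, 3, 5
A = np.arange(m * p, dtype=float).reshape(m, p)  # m×p facial-feature data matrix
B = np.arange(p * n, dtype=float).reshape(p, n)  # p×n spatial-position-relationship matrix
M = A @ B  # defined because A has p columns and B has p rows

i, j = 1, 2
assert M[i, j] == sum(A[i, k] * B[k, j] for k in range(p))  # M_ij equals the sum over k of a_ik * b_kj
```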
The training module 11 is configured to input the face image data of the training model sample library into a convolutional layer neural network of a CapsNet network structure for training, to obtain the face recognition model.
The training model of this embodiment can recognize facial features from any angle at which the features are visible and, using the spatial positional relationships carried in the model, convert them into the frontal facial features, after which face recognition is performed.
Referring to FIG. 4, the training module 11 of this embodiment includes:
a first input unit 111, configured to input the face image data at each angle from the training model sample library into the first convolution layer of the CapsNet network structure, convolve it with a first specified convolution kernel and a first specified stride, and output a tensor through a specified activation function.
The CapsNet network structure of this embodiment comprises two convolution layers and one fully connected layer. The first convolution layer is a conventional convolution layer that detects pixel-level local features; it has 256 first specified convolution kernels of size 9×9, the first specified stride is 1, and the specified activation function is ReLU. The first convolution layer converts pixel intensities into activations of local feature detectors, and its output tensor serves as the input to the second convolution layer.
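A minimal PyTorch sketch of such a first layer; the single input channel and the 28×28 input size are assumptions made for illustration, while the 256 kernels, 9×9 size, stride 1 and ReLU follow the description above:

```python
import torch
import torch.nn as nn

conv1 = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=256, kernel_size=9, stride=1),  # 256 kernels, 9x9, stride 1
    nn.ReLU(inplace=True),                                                # specified activation
)

x = torch.randn(1, 1, 28, 28)  # dummy face image batch (assumed size)
t = conv1(x)                   # tensor passed on to the second convolution layer
print(t.shape)                 # torch.Size([1, 256, 20, 20])
```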
a second input unit 112, configured to input the tensor into the second convolution layer of the CapsNet network structure and convolve it with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and to output the Capsule vectors of that tensor structure.
The second convolution layer of this embodiment is the Primary Capsules layer (main capsule layer), the lowest layer of multidimensional entity categories. It has 32 channels, each consisting of an 8-dimensional convolutional structure and each outputting an 8-dimensional vector, achieving the feature encapsulation of 8×1 Capsules. A Capsule in this embodiment is a group of neurons whose input and output vectors represent the instantiation parameters of a specific entity category; the specific entity categories here are the facial features. For example, the second specified convolution kernel of this embodiment is a 9×9 kernel and the second specified stride is 2. In the CapsNet architecture of this embodiment, eight convolution units are packaged together into one new Capsule unit. The convolution computations of the Primary Capsules layer use no activation function such as ReLU; instead the results are prepared, in vector form, as input to the next Capsule layer.
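Continuing the sketch, here is one way a Primary Capsules layer matching this description could look (32 channels of 8-D capsules from a 9×9, stride-2 convolution without ReLU); the input shape follows the previous sketch and is an assumption:

```python
import torch
import torch.nn as nn

class PrimaryCapsules(nn.Module):
    def __init__(self, in_channels=256, num_channels=32, capsule_dim=8):
        super().__init__()
        self.capsule_dim = capsule_dim
        # 32 channels of 8-D capsules realized as one convolution with 256 output maps.
        self.conv = nn.Conv2d(in_channels, num_channels * capsule_dim,
                              kernel_size=9, stride=2)  # no ReLU here

    def forward(self, x):
        u = self.conv(x)                                # (B, 32*8, H', W')
        return u.view(u.size(0), -1, self.capsule_dim)  # (B, num_capsules, 8)

u = PrimaryCapsules()(torch.randn(1, 256, 20, 20))
print(u.shape)  # torch.Size([1, 1152, 8]): a 6x6 grid times 32 channels of 8-D vectors
```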
The updating unit 113 is configured to propagate the Capsule vectors and perform Routing updates through the DigitCaps layer (digital capsule layer) of the CapsNet network structure, and to output the face recognition model.
In the DigitCaps layer of the CapsNet network structure of this embodiment, the modulus of each Capsule's activation vector indicates whether an instance of each specific entity category exists; a very particular property is that it signals the presence of an instance of that specific entity category in the image. For example, the output of the DigitCaps layer ranges between 0 and 1, where 0 means absent and 1 means present. The input to the DigitCaps layer is the set of output vectors u_i of all Capsules in the Primary Capsules layer, with vector dimension [8, 1]; the output vectors v_j of the DigitCaps layer have vector dimension [16, 1]. The 16-dimensional output of the CapsNet training model of this embodiment is robust.
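A shape-level sketch of how the [8, 1] Primary Capsules outputs could feed [16, 1] DigitCaps outputs; the number of output capsules (here 10) and all names are our assumptions, and the routing loop itself is the one sketched earlier:

```python
import numpy as np

num_lower, num_upper = 1152, 10  # 1152 primary capsules; 10 output entities assumed
d_in, d_out = 8, 16              # [8, 1] inputs, [16, 1] outputs as described

rng = np.random.default_rng(2)
W = rng.normal(scale=0.01, size=(num_lower, num_upper, d_out, d_in))
u = rng.normal(size=(num_lower, d_in))   # Primary Capsules outputs u_i
u_hat = np.einsum('ijab,ib->ija', W, u)  # per-capsule predictions for each output entity
print(u_hat.shape)  # (1152, 10, 16): these predictions then go through dynamic routing;
                    # each resulting 16-D vector v_j has a length in (0, 1),
                    # read as the presence probability of entity j
```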
Referring to FIG. 5, an apparatus for recognizing a human face according to another embodiment of the present application includes:
The receiving module 30 is configured to receive face images of a registered face at each second shooting angle.
The second shooting angles of this embodiment are the registrant's shooting angles, distinguished from the first shooting angle of the person to be verified. The registered faces of this embodiment may belong to one or more people, so that face images of multiple people can be recognized on the same recognition device, broadening the range of application.
The second conversion module 31 is configured to select, for the face image at each second shooting angle, the second feature extraction mode corresponding to that angle; to extract, in one-to-one correspondence, the features of each second face of the registered face at each second shooting angle; and to convert, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of the frontal-face image of the registrant within the face recognition model.
In this embodiment, the manner and principle of converting the features of each second face at each second shooting angle of the registered face into the second feature vectors of the frontal-face image are the same as in the first conversion module 2 and are not repeated here.
The setting module 32 is configured to set the second feature vector to be the preset feature vector.
By using the second feature vector registered through model verification as the reference preset feature vector, the face feature vector of the person to be verified can be identified accurately: when the face feature vector under test is close to the preset feature vector, the two are determined to belong to the same person.
In other embodiments of the present application, the registrants may be multiple people, with a corresponding plurality of preset feature vectors. When face recognition is performed with the face recognition model, a separate database can be established for each registrant according to the differences in their facial features; each database contains that registrant's facial features and the spatial positional relationships of those features. The setting module 32 includes an obtaining unit, configured to obtain, among the multiple preset feature vectors, the feature vector that matches the first feature vector, and a determining unit, configured to determine that the currently acquired face image and the face image corresponding to the matched feature vector are the same person's face image. During recognition, the model first retrieves the corresponding database according to the recognized facial features and then performs the respective conversion according to the corresponding spatial positional relationships; the conversion process and principle are as described above and are not repeated.
Referring to FIG. 6, the judging module 3 of this embodiment includes:
The calculating unit 300 is configured to calculate the distance value between the first feature vector and the preset feature vector.
The distance value of this embodiment may be the Euclidean distance, the cosine distance or the Mahalanobis distance; this embodiment preferably uses the Euclidean distance to represent the similarity between the first feature vector and the preset feature vector, expressed as:
$$d(X, Y) = \sqrt{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}$$
where X is the face feature vector extracted from the face used at registration, Y is the face feature vector extracted at verification, and n is a natural number.
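A direct NumPy rendering of this distance and the threshold test; the vectors here are dummies, while the 0.8 threshold follows the example given earlier in this embodiment:

```python
import numpy as np

def euclidean_distance(x, y):
    # d(X, Y) = sqrt of the sum over i of (x_i - y_i)^2
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

enrolled = np.array([0.10, 0.90, 0.30])  # X: vector registered at enrollment
probe = np.array([0.12, 0.88, 0.31])     # Y: vector extracted at verification
same_person = euclidean_distance(enrolled, probe) < 0.8
print(same_person)  # True: the distance falls below the preset threshold
```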
The judging unit 301 is configured to judge whether the distance value is less than the preset threshold.
This embodiment takes face recognition as an example; the CapsNet-based multi-angle recognition method can be extended to the recognition of target objects in other fields.
In this embodiment, the CapsNet network structure outputs training models corresponding to face images at each angle; the feature extraction mode corresponding to each angle varies with the viewpoint rather than eliminating viewpoint variation from the neural activity, so multiple different affine transformations, or different parts of different objects, can be processed at the same time. The training model of this embodiment is equivariant: face images at any angle can still be recognized and represented after rotation, translation and scaling, so the training model can recognize face images at every angle. The CapsNet model of this embodiment contains the spatial positional relationships of the facial features, so from any angle at which any facial feature of the face can be recognized, the frontal image of the face can be recovered through those spatial positional relationships. With the face recognition method of this embodiment, facial features can be recognized accurately within any viewing range, whether looking down, looking up, from the side or head-on, avoiding the drawback of existing face recognition that the face must mechanically and squarely face the recognition plane, extending the flexibility of face recognition and improving the user's experience with the face recognition device, without changing the hardware of an existing face recognition system.
Referring to FIG. 7, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The issuing module 5 is configured to issue a control instruction to a security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state.
In this embodiment, the face recognition apparatus is used with a designated security system: once face recognition determines that the person is the same registered person, the preset control instruction for that person is sent to the security system, so the security system can perform its function better. The security systems of this embodiment include, but are not limited to, smart door-lock switches, identity-verification access control, and various Internet security platforms, such as tax registration platforms, bank account platforms and examinee identity-verification platforms, improving the timeliness and accuracy of verification. The application entities of this embodiment include physical objects, such as physical toys and public fitness equipment, and virtual platforms, such as online game platforms and online video platforms.
Referring to FIG. 8, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The statistics module 6 is configured to count whether the accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold.
The threshold range of this embodiment can be set specifically for different fields of use. For example, when this embodiment is used in the field of game monitoring, to prevent one person from staying in a game state for long stretches to the detriment of health, the threshold can be set so that the accumulated time in the game state within any consecutive 12 hours is 2 hours.
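One way such a rule could be checked, sketched in Python; the session bookkeeping and all function names are our own assumptions, and only the 2-hours-within-12-hours figures come from the example above:

```python
import time

LIMIT_SECONDS = 2 * 3600    # at most 2 hours of accumulated use...
WINDOW_SECONDS = 12 * 3600  # ...within any trailing 12-hour window

def over_limit(sessions, now=None):
    # sessions: list of (start, end) timestamps for one recognized person.
    now = time.time() if now is None else now
    window_start = now - WINDOW_SECONDS
    used = sum(max(0.0, min(end, now) - max(start, window_start))
               for start, end in sessions)
    return used > LIMIT_SECONDS

# Two sessions totalling 2.5 h inside the window: the entity should be closed.
now = time.time()
print(over_limit([(now - 11 * 3600, now - 10 * 3600),         # 1.0 h
                  (now - 2 * 3600, now - 0.5 * 3600)], now))  # 1.5 h, so True
```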
The generating module 7 is configured to generate, if the accumulated length of time exceeds the threshold, an instruction to close the application entity, so as to prohibit its continued use.
This embodiment is combined with time monitoring to further widen the application scenarios of face recognition. For example, combining time monitoring with real-time face recognition monitoring allows game software to be managed so as to prevent excessive gaming: the face recognition system monitors the usage state of the game system and whether the user remains the same person, and when the same person's game time is observed to exceed the preset value, the game system is switched to a lock-screen state. Another example is the managed allocation of public resources: face recognition monitors the usage state of public facilities within a designated public area, and if the same person's continuous usage time is judged to exceed the preset value, the facility is shut down; when someone else is determined to have entered the area to use it, it is automatically reopened, helping to allocate public resources reasonably.
Referring to FIG. 9, an apparatus for recognizing a human face according to still another embodiment of the present application includes:
The summarizing module 8 is configured to gather the image data of the same face into one specified file.
This embodiment takes the classification of a registrant's electronic photograph data as an example: by recognizing the same registered face image, the images of the same registrant are gathered into the same folder, organizing the image data containing that registrant so that the registrant's corresponding photographs can be located more conveniently.
Referring to FIG. 10, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected by a system bus, the processor providing computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores data such as face recognition data, and the network interface communicates with external terminals via a network connection. When executed, the computer-readable instructions perform the flows of the method embodiments described above. Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution of the present application is applied.
An embodiment of the present application further provides a computer non-volatile readable storage medium on which computer-readable instructions are stored; when executed, the computer-readable instructions perform the flows of the method embodiments described above. The above are only preferred embodiments of the present application and do not thereby limit its patent scope; any equivalent structural or flow transformation made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A method for recognizing a human face, comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  2. The method for recognizing a human face according to claim 1, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  3. The method for recognizing a human face according to claim 2, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
  4. The method for recognizing a human face according to claim 1, wherein before the step of judging whether the similarity between the first feature vector and the preset feature vector is less than the preset threshold, the method comprises:
    receiving face images of a registered face at each second shooting angle;
    selecting, according to each second shooting angle, a second feature extraction mode corresponding to that second shooting angle, extracting in one-to-one correspondence features of each second face of the registered face corresponding to each second shooting angle, and converting, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of a frontal-face image of the registrant within the face recognition model;
    setting the second feature vectors as the preset feature vector.
  5. The method for recognizing a human face according to claim 4, wherein the step of judging whether the similarity between the first feature vector and the preset feature vector is less than the preset threshold comprises:
    calculating a distance value between the first feature vector and the preset feature vector;
    judging whether the distance value is less than the preset threshold.
  6. The method for recognizing a human face according to claim 1, wherein after the step of determining that the acquired face image and the face image corresponding to the preset feature vector are face images of the same person, the method comprises:
    issuing a control instruction to a security system equipped with the face recognition model, so as to open the security system and put an application entity controlled by the security system into a usable state.
  7. The method for recognizing a human face according to claim 6, wherein after the step of issuing the control instruction to the security system equipped with the face recognition model, so as to open the security system and put the application entity controlled by the security system into a usable state, the method comprises:
    counting whether an accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold;
    if it exceeds the threshold, generating an instruction to close the application entity, so as to prohibit continued use of the application entity.
  8. An apparatus for recognizing a human face, comprising:
    an acquisition module, configured to select, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    a first conversion module, configured to extract, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and to convert, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    a judging module, configured to judge whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    a determining module, configured to determine, if the similarity is less than the preset threshold, that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  9. The apparatus for recognizing a human face according to claim 8, comprising:
    a collection module, configured to collect face image data of a plurality of people at each shooting angle to construct a training model sample library;
    a training module, configured to input the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  10. The apparatus for recognizing a human face according to claim 9, wherein the training module comprises:
    a first input unit, configured to input the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolve with a first specified convolution kernel and a first specified stride, and output a tensor through a specified activation function;
    a second input unit, configured to input the tensor into a second convolution layer of the CapsNet network structure and convolve with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and to output Capsule vectors of the tensor structure;
    an updating unit, configured to propagate the Capsule vectors and perform Routing updates through a DigitCaps layer of the CapsNet network structure, and to output the face recognition model.
  11. The apparatus for recognizing a human face according to claim 8, comprising:
    a receiving module, configured to receive face images of a registered face at each second shooting angle;
    a second conversion module, configured to select, according to each second shooting angle, a second feature extraction mode corresponding to that second shooting angle, to extract in one-to-one correspondence features of each second face of the registered face corresponding to each second shooting angle, and to convert, according to the spatial positional relationships carried in the face recognition model, the features of each second face into second feature vectors of a frontal-face image of the registrant within the face recognition model;
    a setting module, configured to set the second feature vectors as the preset feature vector.
  12. The apparatus for recognizing a human face according to claim 11, wherein the judging module comprises:
    a calculating unit, configured to calculate a distance value between the first feature vector and the preset feature vector;
    a judging unit, configured to judge whether the distance value is less than the preset threshold.
  13. The apparatus for recognizing a human face according to claim 8, comprising:
    an issuing module, configured to issue a control instruction to a security system equipped with the face recognition model, so as to open the security system and put an application entity controlled by the security system into a usable state.
  14. The apparatus for recognizing a human face according to claim 13, comprising:
    a statistics module, configured to count whether an accumulated length of time for which the same person continuously uses the application entity within a specified time period exceeds a threshold;
    a generating module, configured to generate, if the accumulated length of time exceeds the threshold, an instruction to close the application entity, so as to prohibit continued use of the application entity.
  15. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements a method for recognizing a human face, the method comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  16. The computer device according to claim 15, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  17. The computer device according to claim 16, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
  18. A computer non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a method for recognizing a human face, the method comprising:
    selecting, according to a first shooting angle of an acquired face image, a corresponding first feature extraction mode in a face recognition model trained on the basis of a CapsNet network structure;
    extracting, according to the first feature extraction mode, features of a first face corresponding to the first shooting angle, and converting, according to spatial positional relationships carried in the face recognition model, the features of the first face into a first feature vector of a frontal-face image within the face recognition model;
    judging whether a similarity between the first feature vector and a preset feature vector is less than a preset threshold;
    if it is less, determining that the acquired face image and a face image corresponding to the preset feature vector are face images of the same person.
  19. The computer non-volatile readable storage medium according to claim 18, wherein before the step of selecting, according to the first shooting angle of the acquired face image, the corresponding first feature extraction mode in the face recognition model trained on the basis of the CapsNet network structure, the method comprises:
    collecting face image data of a plurality of people at each shooting angle to construct a training model sample library;
    inputting the face image data of the training model sample library into a convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model.
  20. The computer non-volatile readable storage medium according to claim 19, wherein the step of inputting the face image data of the training model sample library into the convolutional layer neural network of the CapsNet network structure for training, to obtain the face recognition model, comprises:
    inputting the face image data at each angle from the training model sample library into a first convolution layer of the CapsNet network structure, convolving with a first specified convolution kernel and a first specified stride, and outputting a tensor through a specified activation function;
    inputting the tensor into a second convolution layer of the CapsNet network structure and convolving with a second specified convolution kernel and a second specified stride, so as to construct a tensor structure, and outputting Capsule vectors of the tensor structure;
    propagating the Capsule vectors and performing Routing updates through a DigitCaps layer of the CapsNet network structure, and outputting the face recognition model.
PCT/CN2018/095498 2018-04-17 2018-07-12 Facial recognition method, apparatus, computing device and storage medium WO2019200749A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810344669.0A CN108764031B (en) 2018-04-17 2018-04-17 Method, device, computer equipment and storage medium for recognizing human face
CN201810344669.0 2018-04-17

Publications (1)

Publication Number Publication Date
WO2019200749A1 true WO2019200749A1 (en) 2019-10-24

Family

ID=64010719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095498 WO2019200749A1 (en) 2018-04-17 2018-07-12 Facial recognition method, apparatus, computing device and storage medium

Country Status (2)

Country Link
CN (1) CN108764031B (en)
WO (1) WO2019200749A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062936A (en) * 2019-12-27 2020-04-24 中国科学院上海生命科学研究院 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN111062260A (en) * 2019-11-25 2020-04-24 杭州绿度信息技术有限公司 Automatic generation method of facial cosmetic recommendation scheme
CN111339990A (en) * 2020-03-13 2020-06-26 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111582305A (en) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Biological feature recognition method and device, computer equipment and storage medium
CN111860093A (en) * 2020-03-13 2020-10-30 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN112183394A (en) * 2020-09-30 2021-01-05 江苏智库智能科技有限公司 Face recognition method and device and intelligent security management system
CN112884049A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for detecting registration image in input image, and related device and equipment
CN113111679A (en) * 2020-01-09 2021-07-13 北京君正集成电路股份有限公司 Design method of human-shaped upper half monitoring network structure
CN113283313A (en) * 2021-05-10 2021-08-20 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN115471946A (en) * 2022-10-18 2022-12-13 深圳市盛思达通讯技术有限公司 Quick passing system and method of non-contact detection gate

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291773A (en) * 2018-12-06 2020-06-16 西安光启未来技术研究院 Feature identification method and device
CN109784243B (en) * 2018-12-29 2021-07-09 网易(杭州)网络有限公司 Identity determination method and device, neural network training method and device, and medium
CN109948509A (en) * 2019-03-11 2019-06-28 成都旷视金智科技有限公司 Obj State monitoring method, device and electronic equipment
CN110059560B (en) * 2019-03-18 2023-02-24 创新先进技术有限公司 Face recognition method, device and equipment
CN110197125A (en) * 2019-05-05 2019-09-03 上海资汇信息科技有限公司 Face identification method under unconfined condition
CN110263236B (en) * 2019-06-06 2022-11-08 太原理工大学 Social network user multi-label classification method based on dynamic multi-view learning model
CN112434547B (en) * 2019-08-26 2023-11-14 中国移动通信集团广东有限公司 User identity auditing method and device
CN110909766B (en) * 2019-10-29 2022-11-29 北京明略软件系统有限公司 Similarity determination method and device, storage medium and electronic device
CN110866962B (en) * 2019-11-20 2023-06-16 成都威爱新经济技术研究院有限公司 Virtual portrait and expression synchronization method based on convolutional neural network
CN111931882B (en) * 2020-07-20 2023-07-21 五邑大学 Automatic goods checkout method, system and storage medium
CN112036281B (en) * 2020-07-29 2023-06-09 重庆工商大学 Facial expression recognition method based on improved capsule network
CN112115998B (en) * 2020-09-11 2022-11-25 昆明理工大学 Method for overcoming catastrophic forgetting based on anti-incremental clustering dynamic routing network
CN113219870B (en) * 2021-05-07 2022-03-08 禹焱科技河北有限公司 Intelligent data acquisition and sharing device for industrial instrument
CN113642540B (en) * 2021-10-14 2022-01-28 中国科学院自动化研究所 Capsule network-based facial expression recognition method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899579A (en) * 2015-06-29 2015-09-09 小米科技有限责任公司 Face recognition method and face recognition device
CN106022317A (en) * 2016-06-27 2016-10-12 北京小米移动软件有限公司 Face identification method and apparatus
CN106780906B * 2016-12-28 2019-06-21 北京品恩科技股份有限公司 Method and system for person-and-ID unification recognition based on deep convolutional neural networks
CN107423690B (en) * 2017-06-26 2020-11-13 广东工业大学 Face recognition method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130266195A1 (en) * 2012-04-10 2013-10-10 Derek Shiell Hash-Based Face Recognition System
CN107844744A (en) * 2017-10-09 2018-03-27 平安科技(深圳)有限公司 With reference to the face identification method, device and storage medium of depth information
CN107729875A (en) * 2017-11-09 2018-02-23 上海快视信息技术有限公司 Three-dimensional face identification method and device
CN107832735A (en) * 2017-11-24 2018-03-23 百度在线网络技术(北京)有限公司 Method and apparatus for identifying face

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SABOUR, Sara; FROSST, Nicholas; HINTON, Geoffrey E.: "Dynamic Routing Between Capsules", 31st Conference on Neural Information Processing Systems (NIPS 2017), 7 November 2017 (2017-11-07), XP055559227 *
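
For context on this non-patent citation: the routing-by-agreement procedure it describes can be sketched in a few lines of NumPy. This is an illustrative sketch of the algorithm as published by Sabour et al., not code from the patent record itself; the array shapes, function names, and the three-iteration default are assumptions drawn from that paper.

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity that scales a vector so its length lies in [0, 1)
    # while preserving its direction (Eq. 1 of the cited paper).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: (num_lower, num_upper, dim) prediction vectors from each
    # lower-level capsule to each upper-level capsule.
    # Returns: (num_upper, dim) upper-level capsule output vectors.
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start at zero
    for _ in range(num_iterations):
        # Coupling coefficients: softmax over upper capsules for each
        # lower capsule (shifted by the max for numerical stability).
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Weighted sum of predictions per upper capsule, then squash.
        s = (c[..., None] * u_hat).sum(axis=0)   # (num_upper, dim)
        v = squash(s)
        # Agreement update: dot product between each prediction and the
        # current output increases the logit for that route.
        b += (u_hat * v[None, ...]).sum(axis=-1)
    return v

# Example: 6 lower capsules routing into 3 upper capsules of dimension 8.
u_hat = np.random.randn(6, 3, 8)
v = dynamic_routing(u_hat)
print(v.shape)  # (3, 8)

The iterative agreement step is what lets capsules encode the spatial relationships between facial parts that plain max-pooling discards, which is the property the application relies on.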

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062260A (en) * 2019-11-25 2020-04-24 杭州绿度信息技术有限公司 Automatic generation method of facial cosmetic recommendation scheme
CN111062260B (en) * 2019-11-25 2024-03-05 杭州绿度信息技术有限公司 Automatic generation method of face-beautifying recommendation scheme
CN111062936B (en) * 2019-12-27 2023-11-03 中国科学院上海营养与健康研究所 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN111062936A (en) * 2019-12-27 2020-04-24 中国科学院上海生命科学研究院 Quantitative index evaluation method for facial deformation diagnosis and treatment effect
CN113111679A (en) * 2020-01-09 2021-07-13 北京君正集成电路股份有限公司 Design method of human-shaped upper half monitoring network structure
CN111339990B (en) * 2020-03-13 2023-03-24 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111339990A (en) * 2020-03-13 2020-06-26 乐鑫信息科技(上海)股份有限公司 Face recognition system and method based on dynamic update of face features
CN111860093B (en) * 2020-03-13 2024-05-14 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN111860093A (en) * 2020-03-13 2020-10-30 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium
CN111582305B (en) * 2020-03-26 2023-08-18 平安科技(深圳)有限公司 Biological feature recognition method, apparatus, computer device and storage medium
CN111582305A (en) * 2020-03-26 2020-08-25 平安科技(深圳)有限公司 Biological feature recognition method and device, computer equipment and storage medium
CN112183394A * 2020-09-30 2021-01-05 江苏智库智能科技有限公司 Face recognition method and device, and intelligent security management system
CN112884049A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for detecting registration image in input image, and related device and equipment
CN113283313B (en) * 2021-05-10 2022-10-11 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN113283313A (en) * 2021-05-10 2021-08-20 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN115471946A (en) * 2022-10-18 2022-12-13 深圳市盛思达通讯技术有限公司 Quick passing system and method of non-contact detection gate

Also Published As

Publication number Publication date
CN108764031A (en) 2018-11-06
CN108764031B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2019200749A1 (en) Facial recognition method, apparatus, computing device and storage medium
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
WO2019227479A1 (en) Method and apparatus for generating face rotation image
WO2018228218A1 (en) Identification method, computing device, and storage medium
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2021190296A1 (en) Dynamic gesture recognition method and device
JP6624794B2 (en) Image processing apparatus, image processing method, and program
WO2022188697A1 (en) Biological feature extraction method and apparatus, device, medium, and program product
Yu et al. Human action recognition using deep learning methods
WO2021218238A1 (en) Image processing method and image processing apparatus
WO2021047587A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
Cheng et al. Augmented reality dynamic image recognition technology based on deep learning algorithm
CN112395979A (en) Image-based health state identification method, device, equipment and storage medium
CN113298158B (en) Data detection method, device, equipment and storage medium
Papadopoulos et al. Human action recognition using 3d reconstruction data
WO2022052782A1 (en) Image processing method and related device
WO2021190433A1 (en) Method and device for updating object recognition model
CN112529149A (en) Data processing method and related device
Chang et al. Salgaze: Personalizing gaze estimation using visual saliency
Prakash et al. Accurate hand gesture recognition using CNN and RNN approaches
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
WO2023142886A1 (en) Expression transfer method, model training method, and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18915661

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 18915661

Country of ref document: EP

Kind code of ref document: A1