CN111860456B - Face recognition method - Google Patents
- Publication number
- CN111860456B CN111860456B CN202010773738.7A CN202010773738A CN111860456B CN 111860456 B CN111860456 B CN 111860456B CN 202010773738 A CN202010773738 A CN 202010773738A CN 111860456 B CN111860456 B CN 111860456B
- Authority
- CN
- China
- Prior art keywords
- face
- mask
- recognition
- individual
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the technical field of artificial intelligence and discloses a face recognition method comprising the following steps: S1: acquiring a masked-face image dataset from non-mask face image data; S2: establishing a mask recognition model from the masked-face image dataset based on a neural network, a confidence adaptation mechanism, and a multi-frame clustering prediction mechanism; S3: acquiring live video in real time and performing recognition on it with the mask recognition model to obtain a recognition result. The invention solves the prior-art problems of poor recognition accuracy, low recognition efficiency, complex face recognition network structure, and strong sensitivity to the scene.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a face recognition method.
Background
Social security is becoming increasingly important, and identity authentication of masked individuals has become particularly critical. Although video surveillance is installed everywhere, face identification of masked individuals under large-scale monitoring remains difficult because the monitoring environment is complex, crowds are dense, motion is dispersed, and imaging conditions are often poor.
With economic development, the demand for recognizing masked faces has also become extremely prominent. Many leading manufacturers have developed face recognition algorithms targeted at mask occlusion. However, the research results of different manufacturers vary, and recognition accuracy is uneven. These algorithms are basically applied to static, short-range recognition scenarios such as access gates and face-based attendance.
The prior art has the following defects:
in the field of face recognition, wearing a mask or face covering constitutes large-area facial occlusion, a long-recognized hard problem. The difficulty is mainly manifested in the following four points:
1) Face recognition algorithms mainly judge identity from facial features. When a mask is worn, the algorithm cannot accurately detect the face position or locate the facial landmarks, so the recognition effect is greatly reduced.
2) The deep learning techniques used by face recognition algorithms depend on massive training data. It is difficult to collect and manually label a large number of mask-wearing photos in a short period, so recognition efficiency is low;
3) The face recognition neural network has a complex structure comprising multiple modules. Wearing a mask affects not only the face comparison module but also the face detection, tracking, and other modules, so the overall design is highly susceptible to interference.
4) Mask recognition in current surveillance scenarios is strongly affected by scene complexity and diversity; in particular, blurred imaging in complex scenes poses a great challenge to visual recognition of masked faces.
Disclosure of Invention
The present invention aims to solve at least to some extent one of the above technical problems.
Therefore, the invention aims to provide a face recognition method that solves the prior-art problems of poor recognition accuracy, low recognition efficiency, complex face recognition network structure, and strong sensitivity to the scene.
The technical scheme adopted by the invention is as follows:
a face recognition method comprises the following steps:
s1: acquiring a mask face image data set according to the non-mask face image data;
s2: based on a neural network, a confidence adaptation mechanism and a multi-frame clustering prediction mechanism, using a mask face image dataset to establish a mask recognition model;
s3: and acquiring the field video in real time, and identifying by using a mask identification model according to the field video to obtain an identification result.
Further, in step S1, the non-mask face image data is processed using the MaskGAN (diverse and interactive facial image manipulation) method to obtain a masked-face image dataset.
Further, in step S2, the neural network is a residual network provided with an LSTM (long short-term memory) module, and the LSTM module is provided with a forget gate.
Further, in step S2, the equation of the forget gate is:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

where f_t is the forget gate output; σ is the sigmoid activation function; h_{t−1} is the output of the previous time step (t−1); t is the time-step index; x_t is the input of the current time step; b_f is the convolutional-layer bias term; and W_f is the convolutional-layer weight.
Further, in step S2, the output formula of the neural network is:

o_i = Σ_{j=1}^{n} α_{ij} · h_j

where o_i is the attention-weighted output feature; α_{ij} is the normalized attention weight; i is the attention index; j is the unidirectional time-step index; n is the number of unidirectional time steps; and h_j is the output of time step j.
Further, the specific steps of step S2 are as follows:
s2-1: dividing the face image data set into a training set and a testing set;
s2-2: training the neural network by using the training set to obtain an initial mask recognition model;
s2-3: adding a confidence adaptation mechanism and a multi-frame clustering prediction mechanism into an initial mask recognition model, and optimizing the initial mask recognition model by using a test set to obtain and output an optimal mask recognition model.
Further, in step S2, the confidence adaptation mechanism is: and adaptively adjusting the confidence threshold of the current model according to the imaging ambiguity of the face image dataset input into the current model.
Further, the specific steps of step S3 are as follows:
s3-1: acquiring a live video in real time, detecting the head of a human body by the live video, and acquiring the head characteristics of the head of the current human body after detecting the head of the human body;
s3-2: taking the human head as the individual identifier and setting a head detection frame; continuously tracking the current individual in the live video according to the head features and the positions of the corresponding head detection frames, and performing individual tracking matching to obtain an individual matching result; if the matching result is a new or existing individual, acquiring a plurality of single-frame images of the current individual; otherwise, ending the face recognition method;
s3-3: performing face detection on the current single-frame image, and after a face is detected, performing masked-face detection on the current single-frame image using the mask recognition model;
s3-4: after the front face is detected, imaging ambiguity of the current single-frame image is obtained, a confidence threshold of a mask recognition model is adaptively adjusted according to the imaging ambiguity, and the adjusted mask recognition model is used for carrying out face recognition on the current single-frame image to obtain a face recognition result of the current single-frame image;
s3-5: updating the current single-frame image according to the plurality of Shan Zhen images acquired in the step S3-2, and returning to the step S3-3 until all the acquired single-frame images are subjected to face recognition to obtain a plurality of face recognition results;
s3-6: and obtaining the final recognition result of the current individual based on a multi-frame clustering prediction mechanism according to the plurality of face recognition results.
Further, in step S3-3, face detection is performed, and the face detection result is: the current single frame image is a masked face image or the current single frame image is a non-masked face image.
Further, in step S3-3, face detection is performed on a non-mask face image to obtain a non-masked face, and face recognition is performed directly on the non-masked face to obtain a recognition result;
face detection is performed on a masked-face image to obtain a masked face, the method proceeds to step S3-4, and the imaging ambiguity of the current single-frame image is obtained from the masked face.
The beneficial effects of the invention are as follows:
1) Mask recognition accuracy in surveillance scenes is relatively high, so adaptive masked-face recognition can be achieved in most surveillance scenarios with little influence from the scene;
2) The residual structure effectively prevents vanishing gradients, so the ability to learn target features is very strong, the running speed is ideal, and recognition efficiency is improved;
3) The neural network is tailored to the special scenario of mask/face-covering occlusion in surveillance scenes; an enhanced bidirectional LSTM network is used, which greatly improves the robustness of the algorithm, strengthens individual face recognition in surveillance scenes, and avoids an overly complex face recognition network structure;
4) The multi-frame clustering prediction post-processing logic improves the accuracy of mask recognition prediction and thus the overall recognition accuracy.
Other advantageous effects of the present invention will be described in detail in the detailed description.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a face recognition method;
Fig. 2 is a structural diagram of the LSTM module.
Detailed Description
The invention will be further elucidated with reference to the drawings and to specific embodiments. The present invention is not limited to these examples, although they are described in order to assist understanding of the present invention. Functional details disclosed herein are merely for describing example embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments of the invention. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, and do not preclude the presence or addition of one or more other features, amounts, steps, operations, elements, components, and/or groups thereof.
It should be appreciated that in some alternative embodiments, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
It should be understood that specific details are provided in the following description to provide a thorough understanding of the example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, a system may be shown in block diagrams in order to avoid obscuring the examples with unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the example embodiments.
Example 1
As shown in fig. 1, the present embodiment provides a face recognition method, including the following steps:
s1: acquiring a mask face image data set according to the non-mask face image data;
processing the non-mask face image data using the MaskGAN (diverse and interactive facial image manipulation) method to obtain a masked-face image dataset;
the non-mask face data image uses the existing 8 major class 32 minor class 200 ten thousand face database to carry out secondary automatic affine transformation labeling, carries out automatic mask wearing with affine transformation through 68 point pairs of faces, and uses a multiple and interactive face image operation mask GAN method to process the non-mask face image data, comprising the following steps:
1) The mask occlusion area: 5% is used as the occlusion-ratio step, giving 16 gears from 5% to 80%;
2) To adapt to the influence of diverse mask styles on face recognition, N95 masks, common medical masks, common masks, and face coverings are processed as separate classes;
3) To adapt to the influence of differently colored masks/face coverings on face recognition, four colors (blue, gray, white, and black) are applied across the N95, medical-mask, common-mask, and face-covering classes in classified combinations;
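As an illustrative sketch (not part of the patented method itself), the augmentation choices enumerated above can be sampled as follows; the gear list, style names, and color names mirror the text, while the function and variable names are hypothetical:

```python
import random

# 16 occlusion-ratio gears, 5% steps from 5% to 80%, per the description
OCCLUSION_GEARS = [round(0.05 * k, 2) for k in range(1, 17)]
MASK_STYLES = ["N95", "medical", "common", "face_covering"]
MASK_COLORS = ["blue", "gray", "white", "black"]

def sample_mask_config(rng=random):
    """Sample one synthetic-mask configuration for augmenting a bare face image."""
    return {
        "occlusion": rng.choice(OCCLUSION_GEARS),
        "style": rng.choice(MASK_STYLES),
        "color": rng.choice(MASK_COLORS),
    }
```

Each sampled configuration would then drive the landmark-based affine pasting of a mask template onto a non-mask face image.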
s2: based on the neural network, the confidence adaptation mechanism, and the multi-frame clustering prediction mechanism, a mask recognition model is established using the masked-face image dataset; the multi-frame clustering prediction post-processing logic improves the accuracy of mask recognition prediction and thus the recognition accuracy;
The neural network is a residual network provided with an LSTM (long short-term memory) module, as shown in Fig. 2, and the LSTM module is provided with a forget gate. The residual structure effectively prevents vanishing gradients, so the ability to learn target features is very strong, the running speed is ideal, and recognition efficiency is improved. The network is tailored to the special scenario of mask/face-covering occlusion in surveillance scenes: an enhanced bidirectional LSTM is used, which greatly improves the robustness of the algorithm, strengthens individual masked-face recognition in surveillance scenes, and avoids an overly complex face recognition network structure;
The formula of the forget gate is:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

where f_t is the forget gate output; σ is the sigmoid activation function; h_{t−1} is the output of the previous time step (t−1); t is the time-step index; x_t is the input of the current time step; b_f is the convolutional-layer bias term; and W_f is the convolutional-layer weight. The computation fuses the output of the previous time step with the input of the current time step through a convolutional layer and then activates the result with a sigmoid function; the output is limited between 0 and 1, where 0 means forget everything and 1 means retain everything;
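A minimal NumPy sketch of the forget-gate computation above, treating W_f as a plain weight matrix applied to the concatenated vector [h_{t−1}, x_t] (function names are hypothetical, not from the patent):

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid; squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f).

    h_prev: previous hidden state, shape (d_h,)
    x_t:    current input, shape (d_x,)
    W_f:    weights, shape (d_h, d_h + d_x); b_f: bias, shape (d_h,)
    Returns per-unit retention factors in (0, 1): 0 = forget all, 1 = keep all.
    """
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)
```

With zero weights and bias the gate sits at 0.5 for every unit, i.e. halfway between forgetting and retaining.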
The formula of the attention module of the neural network is:

e_{ij} = tanh((h_s · w) + b) · u

where e_{ij} is the attention weight before normalization; tanh is the hyperbolic tangent function; h_s is the output of each time step; w is the convolution weight; b is the convolution bias term; u is a scaling factor; i is the attention index; and j is the unidirectional time-step index;
The formula of the attention weight of the neural network is:

α_{ij} = exp(e_{ij}) / Σ_{k=1}^{n} exp(e_{ik})

where α_{ij} is the normalized attention weight; e_{ij} is the attention weight before normalization; i is the attention index; j is the unidirectional time-step index; k is the time-step summation index; and n is the number of unidirectional time steps. This computation is the normalized exponential (softmax) activation: the output is limited between 0 and 1, yielding the attention distribution;
The formula of the attention-weighted output feature of the neural network is:

o_i = Σ_{j=1}^{n} α_{ij} · h_j

where o_i is the attention-weighted output feature; α_{ij} is the normalized attention weight; i is the attention index; j is the unidirectional time-step index; n is the number of unidirectional time steps; and h_j is the output of time step j;
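The three attention formulas above (unnormalized scores e, softmax weights α, and the weighted output o) can be sketched together in NumPy; this is an illustrative implementation under the stated notation, not the patent's exact code, and the function name is hypothetical:

```python
import numpy as np

def attention_pool(H, w, b, u):
    """Pool per-time-step outputs H (shape (n, d)) into one feature vector.

    e_j     = tanh(h_j . w + b) * u          (pre-normalization score)
    alpha_j = softmax(e)_j                   (weights sum to 1)
    o       = sum_j alpha_j * h_j            (attention-weighted output)
    """
    e = np.tanh(H @ w + b) * u
    alpha = np.exp(e) / np.exp(e).sum()
    o = (alpha[:, None] * H).sum(axis=0)
    return alpha, o
```

With zero score weights the softmax degenerates to a uniform distribution, so the output is simply the mean of the time-step features.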
The confidence adaptation mechanism is: the confidence threshold of the current model is adaptively adjusted according to the imaging ambiguity of the face images input into it; when imaging ambiguity increases, the confidence threshold is lowered so that a single recognition result can still be obtained, avoiding cases where blurred imaging cannot be recognized at all;
the specific steps of the step S2 are as follows:
s2-1: dividing the face image data set into a training set and a testing set;
s2-2: training the neural network by using the training set to obtain an initial mask recognition model;
s2-3: adding a confidence adaptation mechanism and a multi-frame clustering prediction mechanism into an initial mask recognition model, and optimizing the initial mask recognition model by using a test set to obtain and output an optimal mask recognition model;
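Steps S2-1 to S2-3 begin with splitting the dataset. A minimal sketch of such a split is shown below; the 80/20 ratio and the function name are assumptions, since the patent does not specify a split ratio:

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle and split the masked-face dataset into training and test sets.

    The test_ratio default is a hypothetical choice, not taken from the patent.
    """
    rng = random.Random(seed)            # seeded for reproducible splits
    shuffled = samples[:]                # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```

The training portion would feed step S2-2 and the held-out portion the optimization in step S2-3.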
s3: the method comprises the steps of acquiring a field video in real time, and identifying by using a mask identification model according to the field video to obtain an identification result, wherein the mask identification accuracy under a monitoring scene is relatively high, so that the self-adaptive mask face identification under most monitoring scenes can be satisfied, the scene influence is avoided, and the method comprises the following specific steps:
s3-1: acquiring a live video in real time, detecting the head of a human body by the live video, and acquiring the head characteristics of the head of the current human body after detecting the head of the human body;
s3-2: taking the human head as the individual identifier and setting a head detection frame; continuously tracking the current individual in the live video according to the head features and the positions of the corresponding head detection frames, and performing individual tracking matching to obtain an individual matching result; if the matching result is a new or existing individual, acquiring a plurality of single-frame images of the current individual; otherwise, ending the face recognition method;
s3-3: performing face detection on the current single-frame image, and after a face is detected, performing masked-face detection on the current single-frame image using the mask recognition model;
face detection is carried out, and the face detection result is as follows: the current single frame image is a mask face image or the current single frame image is a non-mask face image;
face detection is performed on a non-mask face image to obtain a non-masked face, and face recognition is performed directly on the non-masked face to obtain a recognition result;
face detection is performed on a masked-face image to obtain a masked face, the method proceeds to step S3-4, and the imaging ambiguity of the current single-frame image is obtained from the masked face;
s3-4: after the front face is detected, imaging ambiguity of the current single-frame image is obtained, a confidence threshold of a mask recognition model is adaptively adjusted according to the imaging ambiguity, and the adjusted mask recognition model is used for carrying out face recognition on the current single-frame image to obtain a face recognition result of the current single-frame image;
imaging ambiguity calculation rules:
1) Pixel RGB to gray-value conversion:

Gray_n = R × 0.3 + G × 0.59 + B × 0.11;
2) Mean over all pixels of the gray-scale picture:

μ = Σ Gray_n / N

where μ is the mean gray value of the picture; Gray_n is the gray value of pixel n; and N is the total number of pixels;
3) Gray-value variance of the whole gray-scale picture:

S² = Σ (Gray_n − μ)² / (N − 1)

where S² is the gray-value variance; the smaller S is, the more blurred the imaging;
In this embodiment, for 10 ≤ S < 20 the confidence weight is set to 0.8, so the current confidence threshold is 0.8 × the preset confidence threshold;
for 20 ≤ S < 30 the confidence weight is set to 0.9, so the current confidence threshold is 0.9 × the preset confidence threshold;
for S ≥ 30 the confidence weight is set to 1, so the current confidence threshold equals the preset confidence threshold; S = 30 is the optimal critical value for blur detection;
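The blur measure and the threshold schedule above can be sketched as follows: `gray_std` computes S from the gray-value formula (sample variance with N−1), and `confidence_weight` applies the embodiment's thresholds. Behavior for S < 10 is not specified in the text and is assumed here to keep the lowest weight; the function names are hypothetical:

```python
import numpy as np

def gray_std(img_rgb):
    """Blur score S: standard deviation of the per-pixel gray values.

    img_rgb: float array of shape (H, W, 3) with R, G, B channels.
    Uses Gray = R*0.3 + G*0.59 + B*0.11 and sample variance (N - 1).
    """
    gray = img_rgb[..., 0] * 0.3 + img_rgb[..., 1] * 0.59 + img_rgb[..., 2] * 0.11
    return float(gray.std(ddof=1))

def confidence_weight(S):
    """Map blur score S to a confidence-threshold weight per the embodiment."""
    if S >= 30:
        return 1.0   # sharp enough: keep the preset threshold
    if S >= 20:
        return 0.9
    if S >= 10:
        return 0.8
    return 0.8       # S < 10 unspecified in the text; assumed lowest weight
```

The current confidence threshold is then `confidence_weight(S)` times the preset threshold.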
s3-5: updating the current single-frame image from the plurality of single-frame images acquired in step S3-2, i.e. taking the next single-frame image as the current single-frame image, and returning to step S3-3 until face recognition has been performed on all acquired single-frame images, yielding a plurality of face recognition results;
s3-6: obtaining the identity information of the current individual, i.e. the final recognition result, from the plurality of face recognition results based on the multi-frame clustering prediction mechanism. The individual ID is continuously updated during continuous tracking and recognition; in the initial tracking stage the recognition result may keep changing due to adverse factors such as mask occlusion, long distance, and insufficient light, but after continuous tracking the result stabilizes, accurate identity information is obtained, and the recognition success rate is greatly improved;
The multi-frame clustering prediction mechanism is as follows:
the matched individuals P_0 to P_k each accumulate a set of recognition confidences:
individual P_0 contains m0 results, individual P_1 contains m1 results, ..., and individual P_k contains mk results;
the inter-frame enhanced similarity of the individual recognition results is:

R(F_0 − F_n) = the individual P corresponding to Max(ΣP_0, ΣP_1, ..., ΣP_k);

that is, multiple individuals may be matched during continuous tracking recognition. The mechanism makes a comprehensive prediction from the match counts and confidences, computing a final confidence for each candidate individual and comparing them; the individual P_j whose summed confidence ΣP_j attains the maximum inter-frame enhanced similarity is taken as the recognition result R(F_i), where j = 0, 1, ..., k and i = 0, 1, ..., n. With continuous tracking, the recognition result becomes increasingly stable and the identity of the individual increasingly certain.
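The selection rule above (sum each matched individual's confidences across frames and take the maximum) can be sketched as a short function; the name and input shape are hypothetical:

```python
from collections import defaultdict

def predict_identity(frame_matches):
    """Multi-frame clustering prediction over tracked frames.

    frame_matches: list of (individual_id, confidence) pairs, one per frame match.
    Sums the confidences per individual and returns the id with the largest
    total, i.e. R = argmax_P sum(confidences of P).
    """
    totals = defaultdict(float)
    for pid, conf in frame_matches:
        totals[pid] += conf
    return max(totals, key=totals.get)
```

As more frames accumulate for the true individual, its summed confidence dominates, which is why the prediction stabilizes under continuous tracking.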
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, or they may alternatively be implemented in program code executable by computing devices, such that they may be stored in a memory device for execution by the computing devices, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents. Such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
The invention is not limited to the alternative embodiments described above, but any person may derive other various forms of products in the light of the present invention. The above detailed description should not be construed as limiting the scope of the invention, which is defined in the claims and the description may be used to interpret the claims.
Claims (7)
1. A face recognition method, characterized in that the method comprises the following steps:
s1: acquiring a mask face image data set according to the non-mask face image data;
s2: based on a neural network, a confidence adaptation mechanism and a multi-frame clustering prediction mechanism, using a mask face image dataset to establish a mask recognition model;
s3: acquiring a live video in real time, and performing recognition on the live video using the mask recognition model to obtain a recognition result;
in the step S2, the neural network is a residual network, the residual network is provided with an LSTM (long short-term memory) module, and the LSTM module is provided with a forget gate;
in the step S2, the confidence adaptation mechanism is as follows: adaptively adjusting a confidence threshold of the current model according to imaging ambiguity of a face image dataset input into the current model;
the specific steps of the step S3 are as follows:
s3-1: acquiring a live video in real time, performing human-head detection on the live video, and, after a human head is detected, acquiring the head features of the current human head;
s3-2: taking the human head as the individual identifier, setting a head detection frame, and continuously tracking the current individual in the live video according to the head features and the positions of the corresponding head detection frames; performing individual tracking and matching to obtain an individual matching result; if the matching result is a new or existing individual, acquiring a plurality of single-frame images of the current individual; otherwise, ending the face recognition method;
s3-3: performing face detection on the current single-frame image, and, after a face is detected, applying the mask recognition model to the current single-frame image;
s3-4: after the front face is detected, imaging ambiguity of the current single-frame image is obtained, a confidence threshold of a mask recognition model is adaptively adjusted according to the imaging ambiguity, and the adjusted mask recognition model is used for carrying out face recognition on the current single-frame image to obtain a face recognition result of the current single-frame image;
s3-5: updating the current single-frame image from the plurality of single-frame images acquired in step S3-2, and returning to step S3-3 until face recognition has been performed on all acquired single-frame images, yielding a plurality of face recognition results;
s3-6: obtaining a final recognition result of the current individual based on a multi-frame clustering prediction mechanism according to a plurality of face recognition results;
the multi-frame clustering prediction mechanism is as follows:
the confidence levels of the matched individuals P_0 to P_k are, respectively:
individual P_0 comprises m_0 results:
individual P_1 comprises m_1 results:
……
individual P_k comprises m_k results:
the inter-frame enhanced similarity of the recognition results of the individuals is:
...
R(F_0 - F_n) = Max(∑P_0, ∑P_1, ..., ∑P_k), i.e., the individual P corresponding to the maximum summed confidence.
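The selection rule of the multi-frame clustering prediction mechanism can be sketched in Python (a minimal illustration only; the identity labels and per-frame confidence values below are hypothetical, and any inter-frame enhancement weighting is omitted for brevity):

```python
from collections import defaultdict

def multi_frame_prediction(frame_results):
    """Each frame yields an (identity, confidence) pair; the final
    label is the identity P whose summed confidence is largest,
    mirroring R(F_0 - F_n) = Max(sum(P_0), ..., sum(P_k))."""
    totals = defaultdict(float)
    for identity, confidence in frame_results:
        totals[identity] += confidence
    # return the individual P with the maximum summed confidence
    return max(totals, key=totals.get)
```

Summing confidences over many frames lets a consistently recognized identity outvote a single spurious high-confidence frame.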
2. The face recognition method of claim 1, wherein: in the step S1, the non-mask face image data are processed with the MaskGAN diverse and interactive facial image manipulation method to obtain the mask face image dataset.
3. The face recognition method of claim 1, wherein: in the step S2, the formula of the forget gate is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
where f_t is the forget gate function; σ is the sigmoid activation function; h_{t-1} is the output of the previous time step (t-1); t is the time-step index; x_t is the input of the current time step; b_f is the convolutional-layer bias term; W_f is the convolutional-layer weight.
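As a numeric sketch of the forget-gate formula (an illustration only; the dimensions and values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): concatenate the
    previous hidden state and the current input, apply the gate's
    weight matrix and bias, and squash each component into (0, 1)."""
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]
    return sigmoid(W_f @ concat + b_f)
```

Components of f_t near 0 discard the corresponding cell-state entries; components near 1 retain them.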
4. The face recognition method of claim 1, wherein: in the step S2, the output formula of the neural network is:

o_i = ∑_{j=1}^{n} α_{ij} h_j

where o_i is the output feature weighted by attention; α_{ij} is the normalized attention weight; i is the attention index; j is the unidirectional time step; n is the number of unidirectional time steps; h_j is the output of each time step.
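A minimal sketch of this attention-weighted output: the claim's symbols α_{ij}, h_j, and n describe a softmax-normalized weighted sum over time-step outputs, assumed here in its standard form; the raw scores and hidden states below are hypothetical.

```python
import numpy as np

def attention_output(scores, h):
    """Normalize raw attention scores into weights alpha_j with a
    numerically stable softmax, then return the alpha-weighted sum
    of the time-step outputs h_j (one row of h per time step)."""
    e = np.exp(scores - scores.max())   # stable softmax numerator
    alpha = e / e.sum()                 # normalized attention weights
    return alpha @ h                    # sum over j of alpha_j * h_j
```

Equal scores reduce the output to the mean of the time-step outputs; a dominant score makes the output approach that single time step's h_j.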
5. The face recognition method of claim 1, wherein: the specific steps of the step S2 are as follows:
s2-1: dividing the mask face image dataset into a training set and a test set;
s2-2: training the neural network by using the training set to obtain an initial mask recognition model;
s2-3: adding a confidence adaptation mechanism and a multi-frame clustering prediction mechanism into an initial mask recognition model, and optimizing the initial mask recognition model by using a test set to obtain and output an optimal mask recognition model.
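The split in step S2-1 can be sketched as a generic shuffle-and-split (the 80/20 ratio and the seeded shuffle below are assumptions; the claim does not specify them):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """S2-1: shuffle the mask face image dataset reproducibly and
    split it into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```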
6. The face recognition method of claim 1, wherein: in the step S3-3, face detection is performed, and the detection result is either that the current single-frame image is a mask face image or that the current single-frame image is a non-mask face image.
7. The face recognition method of claim 6, wherein: in the step S3-3, face detection on a non-mask face image yields a non-mask face, and face recognition is performed directly on the non-mask face to obtain the recognition result;
face detection on a mask face image yields a mask face; the method then proceeds to step S3-4, where the imaging ambiguity of the current single-frame image is obtained from the mask face.
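One way to realize the imaging-ambiguity measurement and the confidence adaptation of steps S3-4 (a sketch only; the patent does not specify the blur metric or the mapping, so a variance-of-Laplacian measure and a linear mapping are assumed, with hypothetical constants `base`, `lo`, and `sharp_var`):

```python
import numpy as np

def imaging_ambiguity(gray):
    """Blur proxy: variance of a 4-neighbour Laplacian response.
    Lower variance means a blurrier, more ambiguous image."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def adapted_threshold(gray, base=0.8, lo=0.5, sharp_var=500.0):
    """Relax the recognition confidence threshold from `base` toward
    `lo` as the Laplacian variance falls toward zero (blurry input)."""
    sharpness = min(imaging_ambiguity(gray) / sharp_var, 1.0)
    return lo + (base - lo) * sharpness
```

Lowering the threshold for blurry frames lets the model still return a match when imaging conditions degrade, while sharp frames keep the stricter threshold.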
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010773738.7A CN111860456B (en) | 2020-08-04 | 2020-08-04 | Face recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111860456A CN111860456A (en) | 2020-10-30 |
CN111860456B true CN111860456B (en) | 2024-02-02 |
Family
ID=72953217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010773738.7A Active CN111860456B (en) | 2020-08-04 | 2020-08-04 | Face recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860456B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777114A (en) * | 2009-01-08 | 2010-07-14 | 北京中星微电子有限公司 | Intelligent analysis system and intelligent analysis method for video monitoring, and system and method for detecting and tracking head and shoulder |
CN102622584A (en) * | 2012-03-02 | 2012-08-01 | 成都三泰电子实业股份有限公司 | Method for detecting mask faces in video monitor |
CN104866843A (en) * | 2015-06-05 | 2015-08-26 | 中国人民解放军国防科学技术大学 | Monitoring-video-oriented masked face detection method |
CN108875763A (en) * | 2017-05-17 | 2018-11-23 | 北京旷视科技有限公司 | Object detection method and object detecting device |
CN109785363A (en) * | 2018-12-29 | 2019-05-21 | 中国电子科技集团公司第五十二研究所 | A kind of unmanned plane video motion Small object real-time detection and tracking |
CN109919977A (en) * | 2019-02-26 | 2019-06-21 | 鹍骐科技(北京)股份有限公司 | A kind of video motion personage tracking and personal identification method based on temporal characteristics |
CN110781784A (en) * | 2019-10-18 | 2020-02-11 | 高新兴科技集团股份有限公司 | Face recognition method, device and equipment based on double-path attention mechanism |
CN110853033A (en) * | 2019-11-22 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Video detection method and device based on inter-frame similarity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11034357B2 (en) * | 2018-09-14 | 2021-06-15 | Honda Motor Co., Ltd. | Scene classification prediction |
Non-Patent Citations (1)
Title |
---|
Facial Micro-expression Recognition Combining a Residual Network and Target Masks; Fang Ming et al.; Journal of Jilin University; Vol. 51, No. 01; 303-313 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||