CN114596618A - Face recognition training method and device for mask wearing, electronic equipment and storage medium

Info

Publication number: CN114596618A
Application number: CN202210322188.6A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Prior art keywords: face, image, sample image, sample, feature vector
Inventors: 苏安炀; 唐大闰
Applicant and current assignee: Beijing Minglue Zhaohui Technology Co Ltd
Priority application: CN202210322188.6A

Classifications

    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Pattern recognition; Classification techniques
    • G06N 3/08: Neural networks; Learning methods

Abstract

The invention discloses a face recognition training method and device for mask wearing, an electronic device and a computer-readable storage medium, belonging to the technical field of deep learning. The method comprises the following steps: acquiring a first sample image, wherein the first sample image is a face image of a first face not wearing a mask; generating a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask; constructing a positive sample pair using the first sample image and the second sample image; and training an initial model with the positive sample pair to obtain a target face recognition model. The method and device avoid the high cost of collecting masked face images and improve the efficiency of recognizing faces wearing masks.

Description

Face recognition training method and device for mask wearing, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of deep learning, and in particular to a face recognition training method and device for mask wearing, an electronic device and a computer-readable storage medium.
Background
Face recognition technology belongs to the field of deep learning applications. Based on the facial features of a person, it first judges whether a face is present in an input image or video stream; if so, it further obtains the position and size of each face and the positions of the main facial organs, extracts the identity features contained in each face from this information, and compares them with known faces to recognize the identity of each face.
A deep learning face feature extraction pipeline generally comprises face detection, face key point detection, face correction, face feature extraction and face feature comparison. Specifically, the face recognition process first detects the position of the face in a picture and crops it, obtains the positions of key points such as the facial features, computes the pose of the face from the mathematical relationships among the key points, and corrects the picture accordingly. The cropped and corrected frontal face picture is then input into a face feature extraction network to obtain a face feature vector, and finally the face feature vectors are compared to judge whether two faces belong to the same person.
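As an illustration of the final comparison step, the following is a minimal sketch; the cosine metric and the 0.5 threshold are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def is_same_face(feat_a: np.ndarray, feat_b: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Judge whether two face feature vectors belong to the same person.

    Cosine similarity and the 0.5 threshold are illustrative; in practice
    the threshold is tuned on a validation set.
    """
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(np.dot(a, b)) >= threshold
```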
Face recognition technology is now mature and widely applied in daily life, public security, judicial authentication and border inspection. However, most current face recognition systems place strict requirements on the face image and recognize occluded faces, such as faces wearing masks, with poor accuracy. Conventionally, face images of mask wearers can be collected directly to train a face recognition model, which is then used for face detection. However, this makes image collection expensive, and because a newly collected masked face image differs from an unmasked one in many details, the model may pay too much attention to those detail changes and neglect the mask itself, dispersing the emphasis of training and degrading recognition performance.
Disclosure of Invention
In order to solve the technical problems or at least partially solve the technical problems, the invention provides a face recognition training method and device for a mask, an electronic device and a computer-readable storage medium.
According to an aspect of an embodiment of the present application, there is provided a face recognition training method for mask wearing, including: acquiring a first sample image, wherein the first sample image is a face image of a first face not wearing a mask; generating a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask; constructing a positive sample pair using the first sample image and the second sample image; and training an initial model with the positive sample pair to obtain a target face recognition model.
Further, the training of the initial model by using the positive sample pair to obtain the target face recognition model includes: acquiring a third sample image, wherein the third sample image is a face image of a second face; constructing a negative sample pair by using the first sample image and the third sample image, or constructing a negative sample pair by using the second sample image and the third sample image; and training an initial model by adopting the positive sample pair and the negative sample pair to obtain a target face recognition model.
Further, the acquiring a third sample image includes: acquiring a third sample image, wherein the third sample image is a face image of a second face not wearing a mask; and generating a fourth sample image based on the third sample image, wherein the fourth sample image is a face image of the second face wearing a mask. The constructing a negative sample pair by using the first sample image and the third sample image, or constructing a negative sample pair by using the second sample image and the third sample image, includes: constructing a negative sample pair using the first sample image and the third sample image, or using the second sample image and the third sample image, or using the first sample image and the fourth sample image.
Further, the training of the initial model by using the positive sample pair and the negative sample pair to obtain a target face recognition model includes: inputting the positive sample pair or the negative sample pair into a feature extraction network to obtain a first feature vector and a second feature vector; and performing contrastive learning on the first feature vector and the second feature vector with a first loss function, inputting the first feature vector and the second feature vector into a classifier for feature classification, and performing supervised learning on the classification results to obtain a target face recognition model.
Further, the performing contrastive learning on the first feature vector and the second feature vector with the first loss function includes: inputting the first feature vector and the second feature vector into the first loss function for feature comparison; determining whether the first feature vector and the second feature vector come from a positive sample pair or a negative sample pair; if they come from a positive sample pair, using the first loss function to pull the first feature vector and the second feature vector closer together; and if they come from a negative sample pair, using the first loss function to push the first feature vector and the second feature vector apart.
Further, the inputting the first feature vector and the second feature vector into a classifier for feature classification, and performing supervised learning on classification results includes: inputting the first feature vector and the second feature vector into a classifier for feature classification to obtain a first classification result and a second classification result respectively; and respectively inputting the first classification result and the second classification result into a second loss function and a third loss function, and supervising the classification results through the second loss function and the third loss function.
Further, after the initial model is trained by using the positive sample pair and the negative sample pair to obtain a target face recognition model, the method further includes: extracting a feature extraction network in the target face recognition model; receiving a target face image to be recognized; and inputting the target face image into the feature extraction network, and outputting the target face feature.
According to another aspect of the embodiments of the present application, there is also provided a face recognition training device for wearing a mask, including: the image acquisition module is used for acquiring a first sample image, wherein the first sample image is a face image of a first face without wearing a mask; the mask generating module is used for generating a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask; a positive sample pair construction module for constructing a positive sample pair using the first sample image and the second sample image; and the recognition training module is used for training the initial model by adopting the positive sample pair to obtain a target face recognition model.
Further, the recognition training module comprises: the image acquisition submodule is used for acquiring a third sample image, wherein the third sample image is a face image of a second face; a negative sample pair construction module, configured to construct a negative sample pair by using the first sample image and the third sample image, or construct a negative sample pair by using the second sample image and the third sample image; and the first training unit is used for training the initial model by adopting the positive sample pair and the negative sample pair to obtain a target face recognition model.
Further, the image acquisition sub-module includes: a first acquisition unit, configured to acquire a third sample image, wherein the third sample image is a face image of a second face not wearing a mask; a first mask generating unit, configured to generate a fourth sample image based on the third sample image, wherein the fourth sample image is a face image of the second face wearing a mask; and the negative sample pair construction module is configured to construct a negative sample pair using the first sample image and the third sample image, or using the second sample image and the third sample image, or using the first sample image and the fourth sample image.
Further, the first training unit comprises: a feature extraction unit, configured to input the positive sample pair or the negative sample pair into a feature extraction network to obtain a first feature vector and a second feature vector; a feature comparison learning unit, configured to perform contrastive learning on the first feature vector and the second feature vector using a first loss function; a feature classification unit, configured to input the first feature vector and the second feature vector into a classifier for feature classification; and a result supervision unit, configured to perform supervised learning on the classification results to obtain the target face recognition model.
Further, the feature comparison learning unit includes: a first comparison learning unit, configured to input the first feature vector and the second feature vector into the first loss function for feature comparison; a feature vector determination unit, configured to determine whether the first feature vector and the second feature vector come from a positive sample pair or a negative sample pair; a feature vector approximation unit, configured to pull the first feature vector and the second feature vector closer with the first loss function if they come from a positive sample pair; and a feature vector distancing unit, configured to push the first feature vector and the second feature vector apart with the first loss function if they come from a negative sample pair.
Further, the feature classification unit includes: the first classification unit is used for inputting the first feature vector and the second feature vector into a classifier for feature classification to respectively obtain a first classification result and a second classification result; the result supervision unit includes: and the first supervision unit is used for respectively inputting the first classification result and the second classification result into a second loss function and a third loss function, and supervising the classification results through the second loss function and the third loss function.
Further, the apparatus further comprises: the network extraction module is used for extracting a feature extraction network in the target face recognition model; the image receiving module is used for receiving a target face image to be recognized; and the feature input module is used for inputting the target face image into the feature extraction network and outputting the target face features.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the face recognition training method for mask wearing described above.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the face recognition training method for mask wearing described above.
The embodiments of the present application further provide a computer program product containing instructions which, when run on a computer, cause the computer to execute the steps of the above face recognition training method for mask wearing.
The method can be applied to computer vision in the technical field of deep learning. A first sample image is obtained, where the first sample image is a face image of a first face not wearing a mask; a second sample image is generated based on the first sample image, where the second sample image is a face image of the first face wearing a mask; a positive sample pair is constructed from the first sample image and the second sample image; and an initial model is trained with the positive sample pair to obtain a target face recognition model. Because the unmasked face image and the generated masked face image serve together as the positive sample pair, no masked face images need to be collected, which saves collection cost and guarantees consistency between the masked face image and the original unmasked face image during training. The model therefore focuses on the mask features, improving both the efficiency of recognizing masked faces and the overall recognition performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware configuration of a computer according to an embodiment of the present invention;
FIG. 2 is a flow chart of a face recognition training method for mask wearing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a scenario implementation in an embodiment of the present invention;
FIG. 4 is a block diagram of a face recognition training device for mask wearing according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile phone, a computer, a tablet or a similar computing device. Taking an example of the present invention running on a computer, fig. 1 is a block diagram of a hardware structure of a computer according to an embodiment of the present invention. As shown in fig. 1, the computer may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 1 is illustrative only and is not intended to limit the configuration of the computer described above. For example, a computer may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the face recognition training method for mask wearing in an embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the above-mentioned method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a face recognition training method for mask wearing is provided. Fig. 2 is a flowchart of a face recognition training method for mask wearing according to an embodiment of the present invention. As shown in fig. 2, the flow includes the following steps:
s10, acquiring a first sample image, wherein the first sample image is a face image of a first face without wearing a mask;
In face recognition training, the dataset is generally organized evenly by identity: it contains hundreds of thousands to millions of ids, each id corresponding to one person. Each id has at least one face picture; usually an id has several face pictures, and in some cases hundreds.
The face pictures used as training samples in this embodiment have already undergone face detection, face alignment and similar operations, so the proportion of the face in each picture is consistent and the vertical position of the face is constrained. The embodiment of the invention can acquire unmasked face images from this dataset for training as the first sample image.
S20, generating a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask;
After the first sample image is acquired, this embodiment generates the masked face image based on it. The second sample image may be generated by adding a mask to the unoccluded face with a mask-generating tool, obtaining a face image of the first face wearing a mask. Optionally, in a specific embodiment, a suitable mask picture may be added to the face image by detecting the face key points in the picture and determining the size and offset angle of the mask picture algorithmically, thereby generating the face image of the first face wearing a mask, as the sketch below illustrates.
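The patent does not name a specific mask-generation tool; the following sketch illustrates the keypoint-based idea with OpenCV, assuming a standard 68-point landmark layout and an RGBA mask picture. The anchor indices and the function name are illustrative assumptions.

```python
import cv2
import numpy as np

def add_mask(face_bgr: np.ndarray, landmarks: np.ndarray,
             mask_rgba: np.ndarray) -> np.ndarray:
    """Paste a mask PNG onto a face image using detected keypoints.

    Assumes `landmarks` is a (68, 2) array in the common 68-point layout
    (0-16 jawline, 8 chin tip); the chosen anchor indices are
    illustrative, not taken from the patent.
    """
    h, w = mask_rgba.shape[:2]
    # Three point correspondences fix the scale, rotation and offset of
    # the mask relative to the face.
    src = np.float32([[0, 0], [w - 1, 0], [w / 2, h - 1]])
    dst = np.float32([landmarks[2], landmarks[14], landmarks[8]])
    m = cv2.getAffineTransform(src, dst)
    warped = cv2.warpAffine(mask_rgba, m,
                            (face_bgr.shape[1], face_bgr.shape[0]))
    alpha = warped[..., 3:4] / 255.0          # PNG alpha as blend weight
    out = face_bgr * (1 - alpha) + warped[..., :3] * alpha
    return out.astype(np.uint8)
```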
S30, constructing a positive sample pair by using the first sample image and the second sample image;
In a classification problem, a sample whose true label belongs to the target class is a positive sample. In this embodiment, face images belonging to the same id are positive samples. Specifically, the positive sample pair consists of the unmasked first sample image and the generated masked second sample image; since the masked face image is generated from the unmasked face image, the first sample image and the second sample image are guaranteed to belong to the same id.
And S40, training the initial model by adopting the positive sample pair to obtain a target face recognition model.
During training, the face recognition model can be regarded as a classification model composed of a backbone model (backbone) and a classifier (fully connected layer, FC). The backbone extracts specific information from an image and yields a high-dimensional representation of the face; the classifier classifies this high-dimensional information. If the face recognition dataset contains one million ids, the classifier outputs one million confidence scores to judge which id each photo belongs to.
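A minimal PyTorch sketch of this backbone-plus-classifier structure; the ResNet-50 backbone is an illustrative assumption, while the 512-dimensional embedding matches the feature size mentioned later in this description.

```python
import torch
import torch.nn as nn
import torchvision

class FaceRecognitionModel(nn.Module):
    """Backbone plus fully connected classifier, as described above.

    ResNet-50 is an illustrative backbone choice; the patent does not
    fix a specific backbone architecture.
    """

    def __init__(self, num_ids: int, embed_dim: int = 512):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        resnet.fc = nn.Linear(resnet.fc.in_features, embed_dim)
        self.backbone = resnet                            # extracts the face embedding
        self.classifier = nn.Linear(embed_dim, num_ids)   # one confidence per id

    def forward(self, x: torch.Tensor):
        feat = self.backbone(x)            # high-dimensional face information
        logits = self.classifier(feat)     # e.g. 1,000,000 ids -> 1,000,000 scores
        return feat, logits
```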
In this embodiment, the first sample image and the second sample image are used as a positive pair to train the initial model and obtain the trained target face recognition model. The process of building the initial face recognition model is not specifically limited here.
Through the above steps, a first sample image is obtained, where the first sample image is a face image of a first face not wearing a mask; a second sample image is generated based on the first sample image, where the second sample image is a face image of the first face wearing a mask; a positive sample pair is constructed from the two images; and the initial model is trained with the positive sample pair to obtain a target face recognition model. No masked face images need to be collected, which saves collection cost, guarantees consistency between the masked and original unmasked face images during training, improves the efficiency of recognizing masked faces, and improves the recognition performance of the model.
In one implementation of this embodiment, training the initial model with the positive sample pair to obtain a target face recognition model, S40, includes:
s41, acquiring a third sample image, wherein the third sample image is a face image of a second face;
Unlike the first face in the above embodiment, the face image of the second face in this embodiment has a different id from the first face; that is, it is a face image of a different person.
Similarly, the unmasked face image of the second face may be obtained from a face image dataset. When obtaining the third sample image, care should be taken not to obtain a face image belonging to the same id. Specifically, it can be judged whether the id of the obtained face image matches the id of the first sample image; if so, that image is discarded and the next face image in the dataset is fetched, until an image whose id differs from that of the first sample image is found and taken as the third sample image. In this way, face images belonging to different ids are obtained.
The third sample image in this embodiment may be a face image of the second face either not wearing or wearing a mask.
S42, constructing a negative sample pair by using the first sample image and the third sample image, or constructing a negative sample pair by using the second sample image and the third sample image;
samples that do not belong to a certain class and other target classes that do not correspond to true values are negative samples. In this embodiment, the negative sample pairs for training are formed by face images of different persons, and the constructed negative sample pairs further include a face image of a first face wearing a mask and a face image of a second face.
And S43, training the initial model by adopting the positive sample pair and the negative sample pair to obtain a target face recognition model.
In the embodiment of the invention, face images with different ids are paired as negative sample pairs, and the initial model is trained with both positive and negative samples. Because the constructed negative pairs include generated masked face images, the model's recognition of masked faces is improved.
In one implementation of this embodiment, the acquiring of the third sample image includes: acquiring a third sample image, wherein the third sample image is a face image of a second face not wearing a mask; and generating a fourth sample image based on the third sample image, wherein the fourth sample image is a face image of the second face wearing a mask. The constructing of a negative sample pair then includes: constructing a negative sample pair using the first sample image and the third sample image, or using the second sample image and the third sample image, or using the first sample image and the fourth sample image.
In this embodiment, the unmasked face image of the second face is acquired first, and the masked face image of the second face is then generated from it, either with a mask-generation tool or by algorithmically adding a suitable mask picture to the face image. A negative sample pair is then constructed from the first and third sample images, the second and third sample images, or the first and fourth sample images. The first and third sample images are unmasked face images with different ids; in the pairs formed by the second and third sample images and by the first and fourth sample images, the two images have different ids and one of them wears a mask while the other does not.
The negative sample pairs in this embodiment thus cover two cases: either both images are original unoccluded face images belonging to different people, or the two images belong to different people and one of them is a generated masked face image. In this way, face images with different ids serve as negative samples for contrast, and training the initial model with both positive and negative sample pairs improves the training effect of the model; a pair-sampling sketch follows.
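A sketch of how such positive and negative pairs could be drawn. It assumes a dataset mapping each id to its unmasked pictures and a generate_masked function like the one sketched earlier; all names and the 50/50 sampling ratios are illustrative assumptions.

```python
import random

def sample_pair(dataset: dict, generate_masked):
    """Draw one training pair (img_a, img_b, y).

    y = 1 marks a positive (same-id) pair and y = 0 a negative
    (different-id) pair, matching the label convention of the
    contrastive loss described below.
    """
    ids = list(dataset.keys())
    if random.random() < 0.5:
        # Positive pair: an unmasked image and its generated masked copy.
        pid = random.choice(ids)
        img = random.choice(dataset[pid])
        return img, generate_masked(img), 1
    # Negative pair: two different ids; one side may wear a mask.
    pid_a, pid_b = random.sample(ids, 2)
    img_a = random.choice(dataset[pid_a])
    img_b = random.choice(dataset[pid_b])
    if random.random() < 0.5:
        img_b = generate_masked(img_b)
    return img_a, img_b, 0
```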
In one example, training an initial model using the pair of positive samples and the pair of negative samples to obtain a target face recognition model includes:
s401, inputting the positive sample pair or the negative sample pair into a feature extraction network respectively to obtain a first feature vector and a second feature vector respectively;
the face feature extraction may be understood as a process of converting an input face image into a vector representation, and performing feature modeling on a face. The extracted features are generally classified into visual features, pixel statistical features, face image transformation coefficient features, face image algebraic features and the like. Generally, the methods for extracting the human face features include a knowledge-based characterization method, and also include a characterization method based on algebraic features or statistical learning. The knowledge-based characterization method mainly obtains feature data which is helpful for face classification according to shape description of face organs and distance characteristics between the face organs, and feature components of the feature data generally comprise Euclidean distance, curvature, angle and the like between feature points. The human face is composed of parts such as eyes, nose, mouth, chin and the like, and geometric description of the parts and the structural relationship among the parts can be used as important features for identifying the human face, and the features are called as geometric features. The knowledge-based face characterization mainly comprises a geometric feature-based method and a template matching method.
This embodiment does not specifically limit how the feature extraction network performs extraction or which features it extracts; the positive and negative sample pairs constructed in the implementation of the invention serve as its input. The first feature vector and the second feature vector are both output by the same feature extraction network, so the two vectors are identical in dimension and semantics and differ only in the input picture.
S402, performing comparison learning on the first feature vector and the second feature vector by adopting a first loss function, inputting the first feature vector and the second feature vector into a classifier for feature classification, and performing supervision learning on a classification result to obtain a target face recognition model.
Contrastive learning focuses on learning the common characteristics of samples of the same class and distinguishing the differences between samples of different classes. The embodiment of the invention applies a first loss function to perform contrastive learning on the first and second feature vectors. The first loss function is a contrastive learning loss used for feature comparison: after feature extraction, two originally similar samples should remain close in the feature space, while two originally dissimilar samples should remain far apart.
Further, the contrastive learning loss function can be written as:

L(W) = \frac{1}{2P} \sum_{j=1}^{P} \left[ Y D_w^2 + (1 - Y) \max(m - D_w, 0)^2 \right]

D_w = \lVert G_W(X_1) - G_W(X_2) \rVert_2

where the parameterized distance D_w is the Euclidean distance between the two sample features G_W(X_1) and G_W(X_2); (Y, (X_1, X_2)_j) is the j-th labeled sample pair; Y is a label indicating whether the two samples X_1 and X_2 match (Y = 1 when the two samples are similar or matched, Y = 0 for a mismatch); P is the number of training pairs; and m is the margin, a preset threshold.
For originally similar samples, a large Euclidean distance in the feature space means the current model performs poorly, so the loss is large; conversely, for originally dissimilar samples, a small Euclidean distance in the feature space makes the loss value large.
The contrastive learning loss thus expresses the matching degree of a sample pair well and is well suited for training a feature extraction model.
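A PyTorch sketch of this contrastive loss, using the label convention above (y = 1 for a matched pair); the margin default of 1.0 is an illustrative choice.

```python
import torch

def contrastive_loss(feat1: torch.Tensor, feat2: torch.Tensor,
                     y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss with the label convention above (y = 1: matched).

    feat1, feat2: (P, d) feature batches; y: (P,) labels in {0, 1}.
    """
    y = y.float()
    d = torch.norm(feat1 - feat2, p=2, dim=1)                  # D_w
    pos = y * d.pow(2)                                         # pull matched pairs together
    neg = (1 - y) * torch.clamp(margin - d, min=0).pow(2)      # push mismatches beyond m
    return (pos + neg).mean() / 2                              # 1/(2P) * sum
```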
In this embodiment, after the feature vectors of the two face images in a sample pair are extracted, the extracted feature vectors undergo contrastive learning through the first loss function, which trains the feature extraction network. Meanwhile, the first and second feature vectors are input into the classifier and matched against the feature templates stored in a database: a threshold is set, and when the similarity exceeds it, the matching result is output, yielding the ids matched to the first and second feature vectors. The first and second feature vectors are thereby classified, and the classification results are then supervised through a second and a third loss function, which are face recognition losses such as ArcFace and CosFace.
Further, in an example of this embodiment, performing contrastive learning on the first and second feature vectors with the first loss function includes: inputting the first and second feature vectors into the first loss function for feature comparison; determining whether they come from a positive or a negative sample pair; if they come from a positive sample pair, using the first loss function to pull them closer together; and if they come from a negative sample pair, using the first loss function to push them apart.
In this embodiment, the first loss function pulls the feature vectors extracted from a positive sample pair as close together as possible and pushes the feature vectors extracted from a negative sample pair apart, improving the model's discrimination between positive and negative pairs.
Further, in an example of this embodiment, the inputting the first feature vector and the second feature vector into a classifier for feature classification, and performing supervised learning on a classification result includes: inputting the first feature vector and the second feature vector into a classifier for feature classification to obtain a first classification result and a second classification result respectively; and respectively inputting the first classification result and the second classification result into a second loss function and a third loss function, and supervising the classification results through the second loss function and the third loss function.
Loss functions in supervised learning are often used to assess the degree of inconsistency between the true values of the samples and the predicted values of the models. In the embodiment of the invention, the classification results after the feature classification are respectively supervised by using the second loss function and the third loss function. The second and third loss functions are generally loss functions of face recognition, such as ArcFace, CosFace, and the like.
The ArcFace loss, in its standard form with feature scale s, is:

L_{arc} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos \theta_j}}

The starting point of the ArcFace loss is to optimize the inter-class distance in the inverse-cosine (angular) space: adding the margin m to the angle makes the cosine value smaller over the monotonic interval [0, π].
The CosFace loss is:

L_{cos} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos \theta_{y_i} - m)}}{e^{s(\cos \theta_{y_i} - m)} + \sum_{j \neq y_i} e^{s \cos \theta_j}}

The CosFace loss subtracts a positive margin m in the cosine space, so that over the monotonic interval [0, π] the logit of the true class becomes cos(θ) - m; when the loss converges, θ is smaller and the samples within each class are more tightly clustered.
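A sketch of an ArcFace/CosFace-style classification head implementing the two losses above; the scale s and margin m defaults are values commonly used in the literature, not fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginSoftmaxHead(nn.Module):
    """ArcFace / CosFace style classification head (a sketch).

    The s and m defaults are illustrative; the patent does not fix them.
    """

    def __init__(self, embed_dim: int, num_ids: int,
                 s: float = 64.0, m: float = 0.5, mode: str = "arcface"):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_ids, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m, self.mode = s, m, mode

    def forward(self, feat: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        # cos(theta) between normalized features and normalized class centers.
        cos = F.linear(F.normalize(feat), F.normalize(self.weight))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
        onehot = F.one_hot(label, cos.size(1)).float()
        if self.mode == "arcface":
            # ArcFace: add margin m to the angle of the true class.
            target = torch.cos(torch.acos(cos) + self.m)
        else:
            # CosFace: subtract margin m in cosine space.
            target = cos - self.m
        logits = self.s * (onehot * target + (1 - onehot) * cos)
        return F.cross_entropy(logits, label)
```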
Further, in an example of this embodiment, after the initial model is trained by using the pair of positive samples and the pair of negative samples to obtain a target face recognition model, the method further includes: extracting a feature extraction network in the target face recognition model; receiving a target face image to be recognized; and inputting the target face image into the feature extraction network, and outputting the target face feature.
In this embodiment, only the feature extraction network, i.e., the backbone model of the target face recognition model, is retained at deployment, and the face features of an input face image are extracted with the feature extraction network trained by the method of this embodiment. A target face image to be recognized is received, for example a real image of a masked face captured by a security-check camera; it is input into the trained feature extraction network to obtain the target face features, and face recognition is then performed based on the output features.
In the inference stage, that is, when the model is actually used, the feature extraction network trained in the above embodiment is used, so the feature dimensions at inference are exactly the same as in training and no parameters are added; the trained model is thus applied directly in practice.
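A minimal inference sketch under the naming of the earlier model sketch: only the backbone is kept, and its embeddings are used for matching.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_features(model, face_batch: torch.Tensor) -> torch.Tensor:
    """Keep only the trained backbone and output embeddings for matching."""
    model.backbone.eval()
    feat = model.backbone(face_batch)   # (N, 512) face embeddings
    return F.normalize(feat)            # unit-norm, ready for cosine comparison
```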
Referring to fig. 3, fig. 3 is a schematic diagram of a scenario implementation in the embodiment of the present invention.
As shown in fig. 3, the upper left corner is the original unoccluded face input picture and the lower left corner is the generated masked face sample. The two face images are input into the same backbone network, i.e., the feature extraction network, which outputs a first feature and a second feature. The first and second features are fed into the first loss function for feature comparison, and are also fed into the same classifier structure for feature classification. After classification, a first classification result and a second classification result are obtained, and the classification results are then supervised with the second and third loss functions.
In this embodiment, a large number of unoccluded face pictures are collected first, so the training set consists entirely of unoccluded face pictures, also called original pictures; it contains many ids, and each id contains at least 3 distinct face images. The original unoccluded input picture is a face image of an id randomly selected from the training set. A mask is then added to the unoccluded face with a mask-generation tool, forming a data sample pair of the original unoccluded face picture and the generated masked face; this pair is a positive pair with the same id. The two pictures in the pair are aligned, resized to 112 × 112 resolution, and input in turn into the backbone network to obtain the first and second features, each 512-dimensional. The first and second features are fed into the first loss function, a contrastive learning loss, for feature comparison: for positive pairs the first loss function keeps the two features as close as possible, while for negative pairs it pushes them apart. The first and second features are simultaneously fed into the same classifier structure for feature classification, yielding a first and a second classification result corresponding to the ids, and the classification results are supervised with the second and third loss functions, which are face recognition losses such as ArcFace and CosFace. At deployment, only the trained backbone network needs to be kept to extract face features for recognition. In this implementation the masked face images do not need to be collected, which saves collection cost, guarantees consistency of feature extraction between masked and original unmasked face images during training, and, by adopting contrastive learning, improves the recognition efficiency for masked faces and the recognition performance of the model. A sketch of one such training step follows.
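Putting the pieces together, a hypothetical training step over one batch of sample pairs, reusing the contrastive_loss and margin-head sketches above; the equal weighting of the three losses is an assumption, as is the use of two separate margin heads.

```python
import torch

def train_step(backbone, arc_head, cos_head, optimizer,
               img_a, img_b, y, labels_a, labels_b):
    """One hypothetical training step over a batch of sample pairs.

    img_a / img_b: aligned 112x112 image batches forming the pairs;
    y: 1 for positive pairs, 0 for negative pairs; labels_*: the ids.
    """
    feat_a = backbone(img_a)                        # first 512-d feature
    feat_b = backbone(img_b)                        # second 512-d feature
    loss1 = contrastive_loss(feat_a, feat_b, y)     # first loss: feature comparison
    # The patent supervises the two classification results with a second
    # and a third loss (e.g. ArcFace, CosFace); two margin heads are
    # used here to approximate that arrangement.
    loss2 = arc_head(feat_a, labels_a)
    loss3 = cos_head(feat_b, labels_b)
    loss = loss1 + loss2 + loss3                    # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```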
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a face recognition training device for mask wearing is further provided. The device is used to implement the above embodiments and preferred implementations, and what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the following embodiments are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a face recognition training device for mask wearing according to an embodiment of the present invention. As shown in fig. 4, the device includes: an image acquisition module 100, a mask generation module 200, a positive sample pair construction module 300, and a recognition training module 400, wherein,
the image acquisition module 100 is configured to acquire a first sample image, where the first sample image is a face image of a first face without wearing a mask;
a mask generating module 200, configured to generate a second sample image based on the first sample image, where the second sample image is a face image of the first face wearing a mask;
a positive sample pair construction module 300 configured to construct a positive sample pair using the first sample image and the second sample image;
and the recognition training module 400 is configured to train the initial model by using the positive sample pair to obtain a target face recognition model.
Optionally, the recognition training module 400 comprises: the image acquisition submodule is used for acquiring a third sample image, wherein the third sample image is a face image of a second face; a negative sample pair construction module, configured to construct a negative sample pair using the first sample image and the third sample image, or construct a negative sample pair using the second sample image and the third sample image; and the first training unit is used for training the initial model by adopting the positive sample pair and the negative sample pair to obtain a target face recognition model.
Optionally, the image acquisition sub-module includes: a first acquisition unit, configured to acquire a third sample image, wherein the third sample image is a face image of a second face not wearing a mask; a first mask generating unit, configured to generate a fourth sample image based on the third sample image, wherein the fourth sample image is a face image of the second face wearing a mask; and the negative sample pair construction module is configured to construct a negative sample pair using the first sample image and the third sample image, or using the second sample image and the third sample image, or using the first sample image and the fourth sample image.
Optionally, the first training unit comprises: a feature extraction unit, configured to input the positive sample pair or the negative sample pair into a feature extraction network to obtain a first feature vector and a second feature vector; a feature comparison learning unit, configured to perform contrastive learning on the first feature vector and the second feature vector using a first loss function; a feature classification unit, configured to input the first feature vector and the second feature vector into a classifier for feature classification; and a result supervision unit, configured to perform supervised learning on the classification results to obtain the target face recognition model.
Optionally, the feature contrast learning unit includes: a first comparison learning unit, configured to input the first feature vector and the second feature vector into the first loss function for feature comparison; a feature vector determination unit, configured to determine whether the first feature vector and the second feature vector come from a positive sample pair or a negative sample pair; a feature vector approximation unit, configured to pull the first feature vector and the second feature vector closer with the first loss function if they come from a positive sample pair; and a feature vector distancing unit, configured to push the first feature vector and the second feature vector apart with the first loss function if they come from a negative sample pair.
Optionally, the feature classification unit includes: the first classification unit is used for inputting the first feature vector and the second feature vector into a classifier for feature classification to respectively obtain a first classification result and a second classification result; the result supervision unit includes: and the first supervision unit is used for respectively inputting the first classification result and the second classification result into a second loss function and a third loss function, and supervising the classification results through the second loss function and the third loss function.
Optionally, the apparatus further comprises: the network extraction module is used for extracting a feature extraction network in the target face recognition model; the image receiving module is used for receiving a target face image to be recognized; and the characteristic input module is used for inputting the target face image into the characteristic extraction network and outputting the target face characteristic.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in this embodiment, the processor may be configured to execute the steps in any of the above method embodiments through a computer program.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another apparatus, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A face recognition training method for mask wearing, comprising:
acquiring a first sample image, wherein the first sample image is a face image of a first face without wearing a mask;
generating a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask;
constructing a positive sample pair using the first sample image and the second sample image;
and training an initial model using the positive sample pair to obtain a target face recognition model.
2. The face recognition training method for mask wearing according to claim 1, wherein training the initial model using the positive sample pair to obtain the target face recognition model comprises:
acquiring a third sample image, wherein the third sample image is a face image of a second face;
constructing a negative sample pair using the first sample image and the third sample image, or constructing a negative sample pair using the second sample image and the third sample image;
and training an initial model using the positive sample pair and the negative sample pair to obtain the target face recognition model.
3. The face recognition training method for mask wearing according to claim 2, wherein acquiring the third sample image comprises:
acquiring a third sample image, wherein the third sample image is a face image of a second face without wearing a mask;
generating a fourth sample image based on the third sample image, wherein the fourth sample image is a face image of the second face wearing a mask;
and the constructing of a negative sample pair using the first sample image and the third sample image, or using the second sample image and the third sample image, comprises:
constructing a negative sample pair using the first sample image and the third sample image, or constructing a negative sample pair using the second sample image and the third sample image, or constructing a negative sample pair using the first sample image and the fourth sample image.
4. The face recognition training method for mask wearing according to claim 2, wherein training the initial model using the positive sample pair and the negative sample pair to obtain the target face recognition model comprises:
inputting the two images of the positive sample pair or the negative sample pair into a feature extraction network to obtain a first feature vector and a second feature vector, respectively;
and performing contrastive learning on the first feature vector and the second feature vector using a first loss function, inputting the first feature vector and the second feature vector into a classifier for feature classification, and performing supervised learning on the classification results to obtain the target face recognition model.
5. The face recognition training method for mask wearing according to claim 4, wherein performing contrastive learning on the first feature vector and the second feature vector using the first loss function comprises:
inputting the first feature vector and the second feature vector into a first loss function for feature comparison;
determining whether the first feature vector and the second feature vector are from a positive sample pair or a negative sample pair;
if the first feature vector and the second feature vector are from a positive sample pair, using the first loss function to pull the first feature vector and the second feature vector closer together;
and if the first feature vector and the second feature vector are from a negative sample pair, using the first loss function to push the first feature vector and the second feature vector apart.
6. The face recognition training method for mask wearing according to claim 4, wherein inputting the first feature vector and the second feature vector into a classifier for feature classification and performing supervised learning on the classification results comprises:
inputting the first feature vector and the second feature vector into a classifier for feature classification to obtain a first classification result and a second classification result, respectively;
and inputting the first classification result and the second classification result into a second loss function and a third loss function, respectively, and supervising the classification results through the second loss function and the third loss function.
7. The face recognition training method for mask wearing according to claim 1, wherein after training the initial model using the positive sample pair and the negative sample pair to obtain the target face recognition model, the method further comprises:
extracting the feature extraction network from the target face recognition model;
receiving a target face image to be recognized;
and inputting the target face image into the feature extraction network, and outputting a target face feature.
8. A face recognition training apparatus for mask wearing, comprising:
an image acquisition module, configured to acquire a first sample image, wherein the first sample image is a face image of a first face without wearing a mask;
a mask generation module, configured to generate a second sample image based on the first sample image, wherein the second sample image is a face image of the first face wearing a mask;
a positive sample pair construction module, configured to construct a positive sample pair using the first sample image and the second sample image;
and a recognition training module, configured to train an initial model using the positive sample pair to obtain a target face recognition model.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the face recognition training method for mask wearing according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the face recognition training method for mask wearing according to any one of claims 1 to 7.
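
By way of illustration only, the following PyTorch-style sketch shows one possible reading of the training flow of claims 1 to 6. It is not the implementation disclosed in this application: the tiny backbone, the margin-based form assumed for the first loss function, the number of identities, and all names (FeatureExtractor, contrastive_loss, train_step, margin) are placeholders introduced here, and the synthetic mask overlay of claim 1 is assumed to be supplied by an external routine that produces img_b from img_a.

# Hypothetical sketch of the training flow in claims 1-6 (not the disclosed implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    # Placeholder backbone; any face embedding network could stand in here.
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        # Unit-length features so distances and similarities are comparable.
        return F.normalize(self.net(x), dim=1)

def contrastive_loss(f1, f2, is_positive, margin=0.5):
    # Assumed margin-based form of the "first loss function" (claim 5):
    # positive pairs are pulled together, negative pairs pushed apart.
    d2 = (f1 - f2).pow(2).sum(dim=1)                        # squared distance per pair
    d = (d2 + 1e-8).sqrt()
    pos = is_positive * d2                                  # attract positive pairs
    neg = (1.0 - is_positive) * F.relu(margin - d).pow(2)   # repel negative pairs
    return (pos + neg).mean()

extractor = FeatureExtractor()
classifier = nn.Linear(128, 1000)   # identity classifier (claim 6); 1000 identities assumed
params = list(extractor.parameters()) + list(classifier.parameters())
opt = torch.optim.SGD(params, lr=0.01)

def train_step(img_a, img_b, is_positive, label_a, label_b):
    # img_a/img_b: the two images of one sample pair; for a positive pair,
    # img_b is the synthetically masked copy of img_a (claim 1).
    # is_positive: float tensor of 1.0 (positive pair) or 0.0 (negative pair).
    f1, f2 = extractor(img_a), extractor(img_b)             # first/second feature vectors
    loss = contrastive_loss(f1, f2, is_positive)            # first loss function
    logits1, logits2 = classifier(f1), classifier(f2)       # feature classification
    loss = loss + F.cross_entropy(logits1, label_a)         # second loss function (claim 6)
    loss = loss + F.cross_entropy(logits2, label_b)         # third loss function (claim 6)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

Under this assumed formulation, minimizing the first loss drives the masked and unmasked features of the same face together while separating features of different faces, which is what lets a single extractor serve both masked and unmasked queries; the second and third loss functions meanwhile supervise identity classification of each feature vector.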
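Likewise, a minimal hypothetical sketch of the recognition stage of claim 7, in which only the trained feature extraction network is retained, the target face image is mapped to a target face feature, and that feature is matched against enrolled features. The cosine-similarity matcher and the threshold value are assumptions; the claims specify only feature extraction, not a particular matching rule.

# Hypothetical inference sketch for claim 7 (assumed matcher, not the disclosed one).
import torch

@torch.no_grad()
def recognize(extractor, target_img, gallery_feats, threshold=0.6):
    # target_img: preprocessed face tensor of shape (1, 3, H, W).
    # gallery_feats: (N, D) matrix of unit-normalized enrolled features.
    feat = extractor(target_img)                  # target face feature (claim 7)
    sims = (gallery_feats @ feat.t()).squeeze(1)  # cosine similarity (unit-length features)
    score, idx = sims.max(dim=0)
    if score.item() < threshold:
        return None, score.item()                 # no enrolled identity matches
    return idx.item(), score.item()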
CN202210322188.6A 2022-03-29 2022-03-29 Face recognition training method and device for mask wearing, electronic equipment and storage medium Pending CN114596618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210322188.6A CN114596618A (en) 2022-03-29 2022-03-29 Face recognition training method and device for mask wearing, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210322188.6A CN114596618A (en) 2022-03-29 2022-03-29 Face recognition training method and device for mask wearing, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114596618A true CN114596618A (en) 2022-06-07

Family

ID=81819814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210322188.6A Pending CN114596618A (en) 2022-03-29 2022-03-29 Face recognition training method and device for mask wearing, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114596618A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641637A (en) * 2022-11-11 2023-01-24 杭州海量信息技术有限公司 Face recognition method and system for mask
CN115641637B (en) * 2022-11-11 2023-05-23 杭州海量信息技术有限公司 Face recognition method and system for wearing mask
CN116884077A (en) * 2023-09-04 2023-10-13 上海任意门科技有限公司 Face image category determining method and device, electronic equipment and storage medium
CN116884077B (en) * 2023-09-04 2023-12-08 上海任意门科技有限公司 Face image category determining method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
US20200387748A1 (en) Facial image data collection method, apparatus, terminal device and storage medium
CN114596618A (en) Face recognition training method and device for mask wearing, electronic equipment and storage medium
US11804071B2 (en) Method for selecting images in video of faces in the wild
CN108319888B (en) Video type identification method and device and computer terminal
CN114937232B (en) Wearing detection method, system and equipment for medical waste treatment personnel protective appliance
CN110633624B (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN108121943B (en) Image-based distinguishing method and device and computing equipment
CN109815823B (en) Data processing method and related product
CN110163092A (en) Demographic method, device, equipment and storage medium based on recognition of face
CN112883867A (en) Student online learning evaluation method and system based on image emotion analysis
CN112101300A (en) Medicinal material identification method and device and electronic equipment
CN110728242A (en) Image matching method and device based on portrait recognition, storage medium and application
CA3050456C (en) Facial modelling and matching systems and methods
CN110414431B (en) Face recognition method and system based on elastic context relation loss function
CN115238309A (en) Data desensitization method and device, electronic equipment and storage medium
Torrisi et al. Selecting discriminative CLBP patterns for age estimation
CN114863499A (en) Finger vein and palm vein identification method based on federal learning
CN113971831A (en) Dynamically updated face recognition method and device and electronic equipment
CN113298158A (en) Data detection method, device, equipment and storage medium
CN113887519A (en) Artificial intelligence-based garbage throwing identification method, device, medium and server
CN112149517A (en) Face attendance checking method and system, computer equipment and storage medium
CN110675312A (en) Image data processing method, image data processing device, computer equipment and storage medium
CN115601807A (en) Face recognition method suitable for online examination system and working method thereof
CN112270361B (en) Face data processing method, system, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination