CN113963237B - Model training method, mask wearing state detection method, electronic device and storage medium


Info

Publication number
CN113963237B
Authority
CN
China
Prior art keywords
face
vector
mask
label
model
Prior art date
Legal status
Active
Application number
CN202111575831.8A
Other languages
Chinese (zh)
Other versions
CN113963237A (en)
Inventor
付贤强
寇鸿斌
朱海涛
何武
Current Assignee
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd filed Critical Beijing Dilusense Technology Co Ltd
Priority to CN202111575831.8A
Publication of CN113963237A
Application granted
Publication of CN113963237B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiments of the invention relate to the field of face recognition and disclose a model training method, a mask wearing state detection method, an electronic device and a storage medium. Standard depth images of a face wearing a mask and of the same face not wearing a mask are acquired; the masked standard depth image is used as an image sample, while the unmasked standard depth image and a category divided according to the tightness with which the mask is worn serve as the first label and the second label of that sample. The image sample and its first label undergo the same format conversion to obtain a sample vector and a first label vector. A feature extraction model is constructed that takes the sample vector as input and outputs a first feature vector; a classifier is constructed that takes the difference vector between the sample vector and the first feature vector as input and outputs the probability that the difference vector belongs to each second label. The feature extraction model and the classifier are then jointly trained. The trained model can detect, from the depth information of the face, how tightly the mask is worn on the face.

Description

Model training method, mask wearing state detection method, electronic device and storage medium
Technical Field
The invention relates to the field of face recognition, in particular to a model training and mask wearing state detection method, electronic equipment and a storage medium.
Background
During a public health emergency, masks must be worn in public places to reduce contact and to avoid excessive crowd density; wearing a mask is an effective means of epidemic prevention and control. At present, the common way to supervise whether workers wear masks is manual inspection, which is inefficient, incurs labor cost, and increases the inspectors' own risk of infection. Various automatic mask-wearing detection methods have therefore been developed. The mainstream approach is deep-learning-based detection and recognition of a person's mask-wearing state, which can identify whether a person wears a mask and whether the mask is worn in a standard way, thereby reducing manual supervision and helping to improve the work efficiency of the departments concerned.
Deep-learning-based detection and recognition of the mask-wearing state generally relies on a deep convolutional neural network, which imitates the hierarchical structure of the human brain and consists of interconnected neurons arranged in layers. The network extracts shallow and deep features from an image, and the original annotation data are used to correct the network parameters, so that the mask detection network learns to locate the position of the face in the image and, at the same time, to recognize the mask-wearing state of the face, including whether a mask is worn and whether it is worn in a standard way.
Existing deep-learning-based methods for detecting a person's mask-wearing state can only recognize whether a face wears a mask and whether a worn mask leaves the nose or mouth exposed (i.e., whether the wearing manner is standard); other irregular ways of wearing a mask still exist and cannot be detected.
Disclosure of Invention
The invention aims to provide a model training method, a mask wearing state detection method, electronic equipment and a storage medium, which can detect the tightness state of a face wearing a mask based on the depth information of the face, thereby enriching the detection of irregular wearing modes.
In order to solve the above technical problem, an embodiment of the present invention provides a model training method, including:
acquiring standard depth images of a face wearing mask and a face not wearing mask, and taking the standard depth image of the face wearing mask as an image sample, wherein the standard depth image of the face not wearing mask and the class divided according to the tightness state of the face wearing mask are sequentially taken as a first label and a second label corresponding to the image sample;
carrying out the same format conversion on the image sample and the first label thereof to respectively obtain a one-dimensional sample vector and a first label vector;
taking the sample vector as input, and taking a first feature vector describing the face with the mask removed as output to construct a feature extraction model; the sample vector is the same length as the first feature vector;
taking a difference vector of the sample vector and the first feature vector as an input, and taking the probability that the difference vector belongs to each second label as an output to construct a classifier;
and performing joint training on the feature extraction model and the classifier, wherein a loss function in the joint training is constructed based on a first loss between the first feature vector and the first label vector output by the feature extraction model and a second loss between a prediction category output by the classifier and the second label.
The embodiment of the invention also provides a wearing mask state detection method, which comprises the following steps:
acquiring a first standard depth image of a face wearing mask to be detected;
carrying out format conversion on the first standard depth image to obtain a one-dimensional detection vector;
and sequentially processing the detection vector with the feature extraction model and the classifier obtained by joint training with the above model training method, so as to obtain the first feature vector corresponding to the detection vector and the category of the tightness state of the worn mask.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method as described above, and the mask wearing state detection method as described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the model training method as described above, and the wearing mask state detection method as described above.
Compared with the prior art, the embodiments of the invention acquire standard depth images of a face wearing a mask and of the same face not wearing a mask, use the masked standard depth image as an image sample, and use the unmasked standard depth image and the category divided according to the tightness with which the mask is worn as the first label and the second label of the image sample; the image sample and its first label undergo the same format conversion to obtain a one-dimensional sample vector and a first label vector; a feature extraction model is constructed with the sample vector as input and a first feature vector describing the face with the mask removed as output, the sample vector and the first feature vector having the same length; a classifier is constructed with the difference vector between the sample vector and the first feature vector as input and the probability that the difference vector belongs to each second label as output; and the feature extraction model and the classifier are jointly trained, the loss function of the joint training being built from a first loss between the first feature vector output by the feature extraction model and the first label vector, and a second loss between the prediction category output by the classifier and the second label. Based on the relative depth information of the face, the scheme constructs standard depth images of the face with and without the mask as the image sample and the first label, and trains the model on these samples so that it learns, under standard conditions, how the facial depth information changes from wearing the mask to removing it. Using the differences in this change, together with the second labels pre-assigned to the image samples according to the tightness with which the mask is worn, the model learns the tightness state of the mask on the face, yielding a classifier of mask tightness across different faces. The trained model can therefore directly detect how tightly a face wears its mask, enriching the detection of irregular wearing modes.
Drawings
FIG. 1 is a first flowchart illustrating a first embodiment of a model training method according to the present invention;
FIG. 2 is a schematic diagram of the structure of an encryption model and a decryption model according to an embodiment of the invention;
FIG. 3 is a detailed flowchart II of a model training method according to an embodiment of the present invention;
fig. 4 is a first flowchart illustrating a face recognition method for a mask according to an embodiment of the present invention;
FIG. 5 is a second flowchart illustrating a face recognition method for a mask according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. It will be appreciated by those of ordinary skill in the art that numerous technical details are set forth in order to provide a better understanding of the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to a model training method, and as shown in fig. 1, the model training method provided in this embodiment includes the following steps.
Step 101: the method comprises the steps of obtaining standard depth images of a face wearing mask and a face not wearing mask, taking the standard depth image of the face wearing mask as an image sample, and sequentially taking the standard depth image of the face not wearing mask and the class divided according to the tightness state of the face wearing mask as a first label and a second label corresponding to the image sample.
In particular, a depth camera may be used to capture a face depth image. In an actual scene, the environmental conditions and states when different persons take depth maps are not completely the same, and the states when the same person takes depth maps at different times are also different. In order to obtain a better corresponding relationship between the image samples and the labels thereof and make different image samples have comparability, the present embodiment normalizes the originally acquired face depth image to obtain a depth image in a unified standard state, which is recorded as a "standard depth image".
A standard depth image is defined as an image of fixed size in which the face is in a frontal pose with a natural expression and the face region covers the entire image area.
It should be noted that the depth information in the standard depth images referred to in this embodiment is relative depth information; its purpose is to make depth images captured from different faces and/or in different states comparable after alignment. For example, any key point in the face region that is not covered by the mask (e.g. the middle of the forehead) may be used as a reference position whose relative depth value is set to 0; the relative depth value of every other position is then the difference between its original depth and that of the reference position. For example, if the original depth value of the reference position is 5 and the original depth value of the inner eye corner is 4, the relative depth value of the inner eye corner is -1.
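As an illustration, a minimal sketch of this relative-depth normalization is given below (NumPy assumed; the reference coordinates and the function name are illustrative and not fixed by the patent):

import numpy as np

def to_relative_depth(depth_map: np.ndarray, ref_yx=(20, 64)) -> np.ndarray:
    """Shift a depth map so that a chosen unoccluded key point (e.g. mid-forehead) has depth 0.

    ref_yx is an assumed (row, col) position of the reference key point; in practice it
    would come from a face landmark detector.
    """
    ref_depth = depth_map[ref_yx]                        # original depth at the reference point
    return depth_map.astype(np.float32) - ref_depth      # every pixel becomes depth relative to it

With the numbers from the example above (reference depth 5, inner eye corner 4), the inner eye corner receives relative depth -1.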
After standard depth images of different faces with and without masks are obtained, the masked standard depth image of a face can be used as an image sample and the unmasked standard depth image of the same face as the corresponding first label. For example, in a pair of depth images of the same face acquired at one time, one with and one without the mask, the standard depth image corresponding to the masked depth image is taken as the image sample, and the standard depth image corresponding to the unmasked depth image as its first label. When labeling the acquired standard depth images, the masked and unmasked standard depth images of the same person must be marked as belonging to the same sample, so that during model training the masked image can conveniently serve as the training sample (image sample) and the unmasked image of the same person as its first label.
Traditional mask-wearing detection only checks whether a mask is worn and whether it covers the designated positions (nose, mouth), i.e., whether the wearing manner is standard; it neither considers nor can effectively detect how tightly the mask is worn. Yet tightness is a key factor in evaluating the protective effect of a mask: if the mask is worn too loosely, excess airflow easily spreads the virus; if it is worn too tightly, wearing comfort is seriously affected. This embodiment therefore proposes an innovative way to detect the tightness state of the mask.
After the image sample is obtained, a second label can be marked for the image sample according to the classification (such as too loose, moderate and too tight) of the tightness state of the mask. The judgment principle of the second label can be determined according to the physiological feeling of the face wearing the mask, the distance from the key position (mouth and nose) of the face to the mask, the distance from the edge position of the mask to the face and other factors when the standard depth image of the face wearing the mask is acquired.
To give subsequent model training good generalization performance, depth images from more than 200 people need to be collected to build the image samples and their labels. Each person being captured should slightly rotate the head so that face depth images in different poses are collected.
Step 102: and carrying out the same format conversion on the image sample and the first label thereof to respectively obtain a one-dimensional sample vector and a first label vector.
Specifically, since the image sample and the first label are both two-dimensional depth information, format conversion can be performed on the two-dimensional image sample and the first label to facilitate model training, and a one-dimensional sample vector and a first label vector are obtained respectively. The present embodiment does not limit the specific format conversion method.
In one example, format conversion may be achieved by: respectively expanding the depth values in the image sample and the first label thereof according to the row sequence or the column sequence in the image to obtain a one-dimensional vector; the one-dimensional vector after the image sample is unfolded is a sample vector, and the one-dimensional vector after the first label is unfolded is a first label vector.
Specifically, for an image sample, the depth values of the pixel points in the image sample may be expanded according to the row order or the column order of the pixel points in the image sample, so as to obtain a one-dimensional vector as a sample vector, where the length of the sample vector is the number of the pixel points included in the sample image. Similarly, for the first label (first label image), the depth value of each pixel point in the first label can be expanded according to the row order or the column order of the pixel point in the image, and a one-dimensional vector is obtained as the first label vector, and the length of the first label vector is the number of pixel points included in the first label image.
The one-dimensional vector expanded by rows has the form (depth values of the pixels in the first row, depth values of the pixels in the second row, ..., depth values of the pixels in the last row), and the one-dimensional vector expanded by columns has the form (depth values of the pixels in the first column, depth values of the pixels in the second column, ..., depth values of the pixels in the last column).
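A short sketch of this format conversion, assuming NumPy arrays and an assumed 128×128 image size (both illustrative):

import numpy as np

def image_to_vector(depth_image: np.ndarray, order: str = "row") -> np.ndarray:
    """Expand a 2D standard depth image into a 1D vector by rows ('C' order) or columns ('F' order).

    The vector length equals the number of pixels in the image.
    """
    return depth_image.flatten(order="C" if order == "row" else "F")

# The image sample and its first label must be expanded in the same order, so that element i
# of both vectors refers to the same pixel position.
mask_depth_image = np.zeros((128, 128), dtype=np.float32)      # placeholder standard depth images
no_mask_depth_image = np.zeros((128, 128), dtype=np.float32)
sample_vector = image_to_vector(mask_depth_image)              # sample vector
first_label_vector = image_to_vector(no_mask_depth_image)      # first label vector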
Step 103: taking the sample vector as input, describing a first feature vector of the face with the mask removed as output, and constructing a feature extraction model; the sample vector is the same length as the first feature vector.
Specifically, a conventional deep learning network E (simply referred to as "model E") may be used as the feature extraction model, and its trainable parameters are denoted W_E. The input of model E is the sample vector, and its output is a first feature vector describing the face after the mask is removed. The feature extraction process does not change the vector length, i.e. the sample vector and the first feature vector have the same length.
In one example, the feature extraction model may include: an encryption model and a decryption model; accordingly, the process of constructing the feature extraction model may include the following steps.
The method comprises the following steps: and taking the sample vector as input and the one-dimensional second characteristic vector as output to construct an encryption model.
Specifically, a conventional deep learning network may be employed as the network structure of the encryption model. The input of the encryption model is the sample vector, and the output of the encryption model is a second feature vector obtained by compressing the sample vector. During encryption the length of the vector is compressed, i.e. the length of the second feature vector is smaller than the length of the sample vector. For example, the length of the second feature vector is set to a fixed value of 128 bits.
In one example, as shown in FIG. 2, the encryption model may include a convolutional layer, a pooling layer, a first fully-connected layer and a second fully-connected layer connected in series from front to back; the input of the convolutional layer is the input of the encryption model, and the output of the second fully-connected layer is the output of the encryption model.
Specifically, the sample vector passes through the convolutional layer and the pooling layer in turn, which extract the depth-information features of the image sample while compressing the vector length; it then passes through the two fully-connected layers and is output as a one-dimensional second feature vector of fixed length.
Step two: and taking the second characteristic vector output by the encryption model as input, and taking the first characteristic vector as output to construct a decryption model.
Specifically, a conventional deep learning network may be employed as the network structure of the decryption model. Its input is the second feature vector output by the encryption model, and its output is defined as the one-dimensional vector corresponding to the sample vector with the mask removed, namely the first feature vector. The role of the decryption model is therefore to restore the second feature vector, as closely as possible, to the first label corresponding to the image sample, i.e. the format-converted first label vector of the standard depth image of the face without a mask. During decryption the vector length is expanded: the first feature vector is longer than the second feature vector and has the same length as the sample vector, and hence the same length as the first label vector, which makes the subsequent loss computation between the two vectors more convenient.
In one example, as shown in FIG. 2, the decryption model may include a third fully-connected layer and a fourth fully-connected layer connected in series; the input of the third fully-connected layer is the input of the decryption model, and the output of the fourth fully-connected layer is the output of the decryption model.
Specifically, after the second feature vector output by the encryption model passes through the third and fourth fully-connected layers in turn, the vector length is expanded and restored to the same length as the sample vector and the first label vector.
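A possible PyTorch sketch of the encryption/decryption structure of FIG. 2 follows; only the layer order (convolution, pooling, two fully-connected layers, then two fully-connected layers) comes from the description, while the image size, channel count, kernel size and hidden widths are assumptions, and the 128-dimensional second feature vector follows the example above:

import torch
import torch.nn as nn

VEC_LEN = 128 * 128   # length of the sample / first feature vector (assumed image size)

class EncryptionModel(nn.Module):
    """Conv -> pool -> FC -> FC, compressing the sample vector to a short second feature vector."""
    def __init__(self, vec_len: int = VEC_LEN, code_len: int = 128):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=7, padding=3)
        self.pool = nn.MaxPool1d(4)
        self.fc1 = nn.Linear(8 * (vec_len // 4), 512)
        self.fc2 = nn.Linear(512, code_len)

    def forward(self, x):                                    # x: (batch, vec_len)
        h = self.pool(torch.relu(self.conv(x.unsqueeze(1))))
        h = h.flatten(1)
        return self.fc2(torch.relu(self.fc1(h)))             # second feature vector

class DecryptionModel(nn.Module):
    """FC -> FC, expanding the second feature vector back to the sample-vector length."""
    def __init__(self, code_len: int = 128, vec_len: int = VEC_LEN):
        super().__init__()
        self.fc3 = nn.Linear(code_len, 512)
        self.fc4 = nn.Linear(512, vec_len)

    def forward(self, code):
        return self.fc4(torch.relu(self.fc3(code)))          # first feature vector

class FeatureExtractionModel(nn.Module):
    """Model E: encryption model followed by decryption model."""
    def __init__(self):
        super().__init__()
        self.enc = EncryptionModel()
        self.dec = DecryptionModel()

    def forward(self, sample_vector):
        return self.dec(self.enc(sample_vector))

During encryption the vector length is compressed (here to an assumed 128), and during decryption it is expanded back to the sample-vector length, matching the constraint that the first feature vector and the sample vector have the same length.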
Step 104: and taking the difference vector of the sample vector and the first feature vector as input, and taking the probability that the difference vector belongs to each second label as output to construct a classifier.
Specifically, a conventional deep learning network C (simply referred to as "model C") is constructed as the classifier, and its trainable parameters are denoted W_C. The input of model C is the difference vector v between the sample vector and the first feature vector output by the feature extraction model in step 103, and its output is a c-dimensional vector p (c being the total number of second labels), where p_{j,i} denotes the predicted probability that the difference vector corresponding to the j-th image sample belongs to the i-th second label.
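A matching sketch of the classifier (model C); the hidden width is an assumption, and the three categories follow the "too loose / moderate / too tight" example given earlier:

import torch
import torch.nn as nn

class TightnessClassifier(nn.Module):
    """Model C: difference vector -> probability of each second label (tightness category)."""
    def __init__(self, vec_len: int = 128 * 128, num_labels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vec_len, 256),
            nn.ReLU(),
            nn.Linear(256, num_labels),
        )

    def forward(self, diff_vector):               # diff_vector = sample_vector - first_feature_vector
        return torch.softmax(self.net(diff_vector), dim=-1)   # p[j, i]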
Step 105: and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a first feature vector and a first label vector output by the feature extraction model and a second loss between a prediction category output by the classifier and a second label.
Specifically, the constructed feature extraction model (model E) and the classifier (model C) are jointly trained with the image samples until a convergence condition is met. The convergence condition may be, for example, that the loss value falls below a predetermined small value or that the number of iterations exceeds a predetermined maximum. Once the training process meets the convergence condition, the trained feature extraction model and classifier are obtained.
The loss function in the process of performing the joint training can be constructed based on a first loss between a first feature vector output by the feature extraction model and a first label vector corresponding to the first feature vector, and a second loss between a prediction category output by the classifier and a second label.
The methods of constructing the first loss and the second loss will be described below, respectively.
The first loss construction process can be realized by the following steps.
The first loss is calculated by the following formula:

L_E(W_E) = (1/n) Σ_{i=1}^{n} (g_i − p_i)²    (1)

where L_E(W_E) is the first loss, n is the vector length, g_i is the element of order i in the first label vector g, and p_i is the element of order i in the first feature vector p.

Here, the element of order i in a vector corresponds to the depth value of the i-th pixel point in the standard depth image.
Furthermore, the first feature vector and the first label vector need to be normalized before the loss function of the first loss (formula (1)) is constructed; for example, both vectors are normalized to the range 0 to 1 so that they are aligned on a comparable scale.
The second loss construction process can be realized by the following steps.
The second loss is calculated by formula (2):

L_C(W_C) = −(1/B) Σ_{j=1}^{B} Σ_{i=1}^{c} y_{j,i} log(p_{j,i})    (2)

where W_C denotes the trainable parameters of the classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, c is the total number of class labels, y_{j,i} is the actual probability that the j-th image sample in a batch belongs to the i-th class label, and p_{j,i} is the predicted probability that the j-th image sample belongs to the i-th class label.
The second loss constrains the trainable parameters W_C of the classifier C so that, for a difference vector whose ground-truth second label is a given category, the predicted probability of that second label is high and the predicted probabilities of the other labels are low, driving W_C to be trained in the corresponding direction.
On this basis, when the feature extraction model and the classifier are jointly trained, the loss function used in the joint training can be constructed as formula (3):

loss_all = L_E(W_E) + L_C(W_C)    (3)

where loss_all is the loss value in the joint training, L_E(W_E) is the first loss and L_C(W_C) is the second loss.

Specifically, when the loss of the joint training is calculated with formula (3), the parameters (W_E, W_C) of model E and model C are optimized according to conventional deep learning network optimization methods, namely:

(W_E*, W_C*) = argmin_{W_E, W_C} loss_all
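A sketch of one joint training step under loss_all = L_E(W_E) + L_C(W_C), reusing the FeatureExtractionModel and TightnessClassifier sketched above; the optimizer, learning rate, squared-error form of the first loss and the details of the [0, 1] normalization are assumptions consistent with the description rather than values fixed by the patent:

import torch
import torch.nn.functional as F

model_e = FeatureExtractionModel()
model_c = TightnessClassifier()
optimizer = torch.optim.Adam(list(model_e.parameters()) + list(model_c.parameters()), lr=1e-4)

def joint_training_step(sample_vec, first_label_vec, second_label):
    """One joint step: first loss (eq. 1) plus second loss (eq. 2), summed as in eq. 3.

    second_label is a tensor of class indices (ground-truth tightness categories) for the batch.
    """
    first_feature = model_e(sample_vec)                        # face features with the mask removed

    # Normalize both vectors to [0, 1] before the first loss, as described above.
    ff = (first_feature - first_feature.min()) / (first_feature.max() - first_feature.min() + 1e-8)
    fl = (first_label_vec - first_label_vec.min()) / (first_label_vec.max() - first_label_vec.min() + 1e-8)
    loss_e = F.mse_loss(ff, fl)                                # first loss L_E(W_E)

    diff_vector = sample_vec - first_feature                   # classifier input
    probs = model_c(diff_vector)
    loss_c = F.nll_loss(torch.log(probs + 1e-8), second_label) # second loss L_C(W_C), cross-entropy

    loss_all = loss_e + loss_c                                 # formula (3)
    optimizer.zero_grad()
    loss_all.backward()
    optimizer.step()
    return loss_all.item()

In practice the data loader would supply (sample_vec, first_label_vec, second_label) triples built from the image samples and their two labels, and the step would be repeated until the convergence condition is met.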
compared with the related art, the standard depth images of the face wearing mask and the face not wearing mask are obtained, the standard depth image of the face wearing mask is used as the image sample, and the standard depth image of the face not wearing mask and the class divided according to the tightness state of the face wearing mask are sequentially used as the first label and the second label corresponding to the image sample; carrying out the same format conversion on the image sample and the first label thereof to respectively obtain a one-dimensional sample vector and a first label vector; taking the sample vector as input, and taking a first feature vector describing the face with the mask removed as output to construct a feature extraction model; the sample vector is the same as the first feature vector in length; taking a difference vector of the sample vector and the first feature vector as input, and taking the probability that the difference vector belongs to each second label as output to construct a classifier; and performing combined training on the feature extraction model and the classifier, wherein a loss function in the combined training is constructed on the basis of a first loss between a first feature vector and a first label vector output by the feature extraction model and a second loss between a prediction category output by the classifier and a second label. This scheme is based on the relative degree of depth information of people's face, the standard depth image who constructs out the people's face and wear gauze mask and does not wear the gauze mask is as image sample and first label in proper order, and train the model through image sample, with study people's face under the standard condition, from wearing the gauze mask to picking off the face depth information change of gauze mask process, thereby utilize the difference of change, and the second label of marking for image sample according to the elasticity state of wearing the gauze mask divides in advance learns the elasticity state that the gauze mask was worn to the people's face, obtain the classifier of the elasticity state that different people's faces worn the gauze mask, the model that obtains through the training, can directly detect the elasticity state that the gauze mask was worn to the people's face, thereby richen the detection to the unnormal mode of wearing.
Another embodiment of the present invention relates to a model training method which is an improvement of the model training method shown in fig. 1, the improvement being: the process of obtaining standard depth images of a face wearing a mask and a face not wearing a mask is refined. As shown in fig. 3, the above step 101 may include the following sub-steps.
Substep 1011: original depth images of a plurality of faces of a wearer and a non-wearer are acquired.
Specifically, a depth camera may be used to capture face depth images; at each acquisition, the two depth images of the same person, one wearing a mask and one not, are taken as one group. Within each group, the facial expression and pose of the captured person should be kept as consistent as possible, the only difference being whether the mask is worn. In this way, under the same shooting conditions, the difference between the two captured depth images is in theory limited to the depth information of the area occluded by the mask, while the depth information of the other areas is the same.
Substep 1012: and selecting a face area from the original depth image, and adjusting the face angle in the face area to be the front face posture.
Specifically, face recognition is performed on the original depth image to obtain the face region (the position of the face when no mask is worn; the face plus the mask when a mask is worn), and the face region is selected with a rectangular frame. The face angle within the face region is then adjusted to the frontal pose.
In this embodiment, the method for evaluating the face angle in the face region and adjusting the face angle to the front face pose is not limited.
In one example, adjusting the face angle in the face region to a frontal face pose may be accomplished by the following steps.
The method comprises the following steps: and rotating the preset frontal face depth template to obtain different angles, calculating Euclidean distances of the depth maps between the frontal face depth template and the face region at different angles, and taking the angle with the minimum Euclidean distance as the Euler angle of the face region.
Specifically, a large number of face depth maps in the frontal pose can be collected in advance and fitted by the least-squares method to form a frontal face depth template, from which the corresponding face key points are extracted. The frontal face depth template is then rotated through different angles, and the Euclidean distance between the template and the obtained depth map of the face region is calculated at each angle; the rotation angle of the template corresponding to the minimum Euclidean distance is the face angle in the face region, i.e. the Euler angle of the face region. To reduce computation, when calculating the Euclidean distance between two depth maps, only the corresponding face key points in the two maps may be used. For the depth map of a face wearing a mask, the positions of the face key points occluded by the mask are first estimated, after which the Euclidean distance between corresponding key points can be calculated.
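A coarse sketch of this brute-force angle search over corresponding key points (the angle grid, the rotate_fn helper and the key-point format are assumptions introduced for illustration):

import numpy as np
from itertools import product

def estimate_euler_angles(face_keypoints_3d, template_keypoints_3d, rotate_fn, step_deg=5, max_deg=30):
    """Brute-force the Euler angles that best align the frontal template to the observed face.

    face_keypoints_3d, template_keypoints_3d: (K, 3) arrays of corresponding key points.
    rotate_fn(points, angles): rotates the template key points by candidate Euler angles.
    """
    candidates = np.deg2rad(np.arange(-max_deg, max_deg + 1, step_deg))
    best_angles, best_dist = (0.0, 0.0, 0.0), np.inf
    for angles in product(candidates, repeat=3):
        rotated = rotate_fn(template_keypoints_3d, angles)
        dist = np.linalg.norm(rotated - face_keypoints_3d)    # Euclidean distance between key points
        if dist < best_dist:
            best_angles, best_dist = angles, dist
    return best_angles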
Step two: and reversely rotating the face angle in the face region by an Euler angle to obtain the face region in the front face posture.
Specifically, after a face angle (euler angle) in the face region is obtained, the face may be rotated in a reverse direction by the euler angle, so as to obtain the face region in the front face posture.
The rotation matrices used for the face rotation are as follows. A coordinate system is established with the abscissa of the depth map as the x axis, the ordinate as the y axis and the depth value as the z axis, and the face pose is expressed by the Euler angles [θx, θy, θz]. Right-multiplying the depth image (as 3D points) by the corresponding rotation matrices R_x(θ), R_y(θ), R_z(θ) converts the face angle to the frontal pose, namely:

Rotation about the x axis:

R_x(θ) = [ 1, 0, 0; 0, cos θ, −sin θ; 0, sin θ, cos θ ]    (4)

Rotation about the y axis:

R_y(θ) = [ cos θ, 0, sin θ; 0, 1, 0; −sin θ, 0, cos θ ]    (5)

Rotation about the z axis:

R_z(θ) = [ cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1 ]    (6)

where θ in formulas (4), (5) and (6) corresponds to θx, θy and θz in turn.
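A sketch of the reverse rotation, treating each pixel of the face-region depth map as a 3D point (x, y, z) and right-multiplying by the rotation matrices of formulas (4)-(6) built from the negated Euler angles; re-projection of the rotated points back onto a regular pixel grid is omitted:

import numpy as np

def rotation_matrices(tx, ty, tz):
    """Rotation matrices about the x, y and z axes (formulas (4)-(6))."""
    rx = np.array([[1, 0, 0],
                   [0, np.cos(tx), -np.sin(tx)],
                   [0, np.sin(tx),  np.cos(tx)]])
    ry = np.array([[ np.cos(ty), 0, np.sin(ty)],
                   [0, 1, 0],
                   [-np.sin(ty), 0, np.cos(ty)]])
    rz = np.array([[np.cos(tz), -np.sin(tz), 0],
                   [np.sin(tz),  np.cos(tz), 0],
                   [0, 0, 1]])
    return rx, ry, rz

def rotate_to_frontal(depth_map, euler_xyz):
    """Rotate the face points by the negative of the estimated Euler angles [theta_x, theta_y, theta_z]."""
    tx, ty, tz = (-a for a in euler_xyz)                      # reverse rotation
    rx, ry, rz = rotation_matrices(tx, ty, tz)
    ys, xs = np.mgrid[0:depth_map.shape[0], 0:depth_map.shape[1]]
    points = np.stack([xs.ravel(), ys.ravel(), depth_map.ravel()], axis=1).astype(np.float64)
    return points @ rx @ ry @ rz                              # right-multiply, as in the description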
Substep 1013: the face area under the frontal face posture is adjusted to be in a unified preset size to form a standard depth image, the standard depth image of a mask worn by the same face is used as an image sample, and the standard depth image of the mask not worn by the same face is used as a first label corresponding to the image sample.
Specifically, after the face region in the frontal pose is obtained, it can be scaled to a uniform preset size to form the standard depth image corresponding to the original depth image, i.e. the standard depth images of the face with and without a mask. The standard depth image of the same face wearing a mask is then used as an image sample, and the standard depth image without a mask as the corresponding first label.
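The final resizing step can be as simple as the following sketch (OpenCV assumed; the 128×128 preset size is an assumption):

import cv2
import numpy as np

def to_standard_depth_image(frontal_face_region: np.ndarray, size=(128, 128)) -> np.ndarray:
    """Scale the frontal-pose face region to a uniform preset size (size is an assumed value)."""
    return cv2.resize(frontal_face_region.astype(np.float32), size, interpolation=cv2.INTER_LINEAR)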
Compared with the related art, this embodiment obtains original depth images of a plurality of faces with and without masks, selects the face region from each original depth image, adjusts the face angle in the face region to the frontal pose, and scales the face region in the frontal pose to a uniform preset size to form the standard depth image, thereby quickly obtaining the standard depth image corresponding to each original depth image.
One embodiment of the present invention relates to a method for detecting a wearing mask state, which is implemented based on the model training method in the above embodiment. As shown in fig. 4, the method for detecting the wearing mask state according to the present embodiment includes the following steps.
Step 201: acquiring a first standard depth image of the face mask to be detected.
Specifically, for the face to be detected, an original depth image of the face in the mask wearing state is obtained, and then a standard depth image of the face to be detected in the mask wearing state is obtained by adopting the same processing procedure as that of the standard depth image obtained in step 101. In this embodiment, the standard depth image of the face to be detected in the mask wearing state is referred to as a "first standard depth image".
Step 202: and carrying out format conversion on the first standard depth image to obtain a one-dimensional detection vector.
Specifically, for the first standard depth image, the format of the first standard depth image is converted by using the same processing procedure as that in step 102, so as to obtain a one-dimensional vector, and the one-dimensional vector is recorded as a "detection vector".
Step 203: and (3) sequentially processing the detection vectors by adopting a feature extraction model and a classifier obtained by joint training of a model training method to obtain a first feature vector corresponding to the detection vectors and the category of the tightness state of the wearing mask.
Specifically, for the first standard depth image, the feature extraction model and the classifier obtained through training by the training method in the above embodiment are sequentially processed to obtain a first feature vector output by the feature extraction model, and a category of tightness state of the mask worn in the first standard depth image output by the classifier, such as any one of too loose, moderate and too tight.
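A sketch of this detection step, chaining the trained modules and the image_to_vector helper from the sketches above (the function name and the ordering of the category names are illustrative):

import torch

@torch.no_grad()
def detect_mask_tightness(first_std_depth_image, model_e, model_c,
                          tightness_names=("too loose", "moderate", "too tight")):
    """Return the first feature vector and the predicted tightness category for one face."""
    detection_vector = torch.from_numpy(image_to_vector(first_std_depth_image)).float().unsqueeze(0)
    first_feature = model_e(detection_vector)                  # face features with the mask removed
    probs = model_c(detection_vector - first_feature)          # probability of each tightness label
    category = tightness_names[int(probs.argmax(dim=-1))]
    return first_feature.squeeze(0), category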
With this method, not only can the tightness category of the worn mask be detected, but the facial features of the face with the mask removed, i.e. the first feature vector, are also obtained; this first feature vector can be used to identify the face.
For example, as shown in fig. 5, in one example, the following steps may also be performed after step 203.
Step 204: and comparing the first feature vector corresponding to the detection vector with the one-dimensional vector after format conversion corresponding to the second standard depth image of the face without wearing the mask in the registry, and determining the identity information of the face to be detected.
The registry stores a plurality of one-dimensional vectors in advance, and each one-dimensional vector is obtained by converting the format of a standard depth image when the face does not wear the mask, namely a second standard depth image.
Specifically, when the face to be detected and the face corresponding to a one-dimensional vector in the registry belong to the same person, the first feature vector of the face to be detected should be similar to that one-dimensional vector. The first feature vector is therefore compared for similarity with each one-dimensional vector in the registry; among the vectors exceeding the similarity threshold, the face corresponding to the one with the largest similarity value is determined to be the face currently under detection, thereby determining the identity information of the face to be detected.
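One possible sketch of the registry comparison, assuming cosine similarity, a plain dict registry and an assumed similarity threshold:

import torch
import torch.nn.functional as F

def identify_face(first_feature, registry, threshold=0.8):
    """Compare the first feature vector against registered no-mask vectors.

    registry: dict mapping identity -> 1D tensor (format-converted second standard depth image).
    Returns the best-matching identity above the similarity threshold, else None.
    """
    best_id, best_sim = None, threshold
    for identity, registered_vec in registry.items():
        sim = F.cosine_similarity(first_feature.unsqueeze(0), registered_vec.unsqueeze(0)).item()
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id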
Compared with the prior art, this embodiment obtains the first standard depth image of the masked face to be detected, converts it into a one-dimensional detection vector, and processes the detection vector in turn with the feature extraction model and the classifier obtained by joint training with the above model training method, thereby obtaining the first feature vector corresponding to the detection vector and the category of the mask tightness state, i.e., detecting how tightly the mask is worn.
Furthermore, the first feature vector corresponding to the detection vector is compared with the one-dimensional vector after format conversion corresponding to the second standard depth image of the face without wearing the mask in the registry, so that the identity information of the face to be detected is determined, and the accuracy of face recognition of the wearing mask is ensured.
Another embodiment of the invention relates to an electronic device, as shown in FIG. 6, comprising at least one processor 302; and a memory 301 communicatively coupled to the at least one processor 302; the memory 301 stores instructions executable by the at least one processor 302, and the instructions are executed by the at least one processor 302 to enable the at least one processor 302 to perform any of the method embodiments described above.
Where the memory 301 and the processor 302 are connected via a bus, the bus may comprise any number of interconnected buses and bridges that couple the various circuits of the processor 302 and the memory 301 together. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore not described further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or a plurality of elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 302 is transmitted over a wireless medium through an antenna, which also receives data and passes it to the processor 302.
The processor 302 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 301 may be used to store data used by processor 302 in performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes any of the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (12)

1. A method of model training, comprising:
acquiring standard depth images of a face wearing mask and a face not wearing mask, and taking the standard depth image of the face wearing mask as an image sample, wherein the standard depth image of the face not wearing mask and the class divided according to the tightness state of the face wearing mask are sequentially taken as a first label and a second label corresponding to the image sample;
carrying out the same format conversion on the image sample and the first label thereof to respectively obtain a one-dimensional sample vector and a first label vector;
taking the sample vector as input, and taking a first feature vector describing the face with the mask removed as output to construct a feature extraction model; the sample vector is the same length as the first feature vector;
taking a difference vector of the sample vector and the first feature vector as an input, and taking the probability that the difference vector belongs to each second label as an output to construct a classifier;
performing joint training on the feature extraction model and the classifier, wherein a loss function in the joint training is constructed based on a first loss between the first feature vector and the first label vector output by the feature extraction model and a second loss between a prediction class output by the classifier and the second label;
the feature extraction model comprises an encryption model and a decryption model; and constructing the feature extraction model with the sample vector as input and the first feature vector describing the face with the mask removed as output comprises:
taking the sample vector as input and the one-dimensional second characteristic vector as output to construct the encryption model;
and taking the second feature vector output by the encryption model as an input, and taking the first feature vector as an output to construct the decryption model.
2. The method of claim 1, wherein said obtaining a standard depth image of a face wearing mask and a non-wearing mask comprises:
acquiring original depth images of a plurality of faces of a wearer and a non-wearer;
selecting a face region from the original depth image, and adjusting the face angle in the face region into a front face posture;
and adjusting the face area under the face-righting posture to be in a unified preset size to form the standard depth image.
3. The method of claim 2,
the adjusting of the face direction in the face region to the face pose comprises:
rotating a preset front face depth template to obtain different angles, calculating Euclidean distances of depth maps between the front face depth template and the face region at different angles, and taking the angle with the minimum Euclidean distance as an Euler angle of the face region;
and reversely rotating the face angle in the face region by the Euler angle to obtain the face region in the front face posture.
4. The method of claim 1, wherein performing the same format conversion on the image sample and the first label thereof to obtain a one-dimensional sample vector and a first label vector, respectively, comprises:
respectively expanding the depth values in the image sample and the first label thereof according to the row sequence or the column sequence in the image to obtain a one-dimensional vector; the one-dimensional vector after the image sample is unfolded is the sample vector, and the one-dimensional vector after the first label is unfolded is the first label vector.
5. The method of claim 1, wherein the encryption model comprises: a convolutional layer, a pooling layer, a first fully-connected layer and a second fully-connected layer connected in series from front to back; the input of the convolutional layer is the input of the encryption model, and the output of the second fully-connected layer is the output of the encryption model;
the decryption model comprises: a third fully-connected layer and a fourth fully-connected layer connected in series; the input of the third fully-connected layer is the input of the decryption model, and the output of the fourth fully-connected layer is the output of the decryption model.
6. The method of claim 1, wherein the first loss is constructed by:
calculating the first loss by the following equation:
L_E(W_E) = (1/n) Σ_{i=1}^{n} (g_i − p_i)²

wherein L_E(W_E) is the first loss, n is the vector length, g_i is the element of order i in the first label vector g, and p_i is the element of order i in the first feature vector p.
7. The method of claim 6, further comprising:
normalizing the first feature vector and the first label vector prior to constructing a loss function for the first loss.
8. The method of claim 1, wherein the second loss is constructed by:
calculating the second loss by the following formula:
L_C(W_C) = −(1/B) Σ_{j=1}^{B} Σ_{i=1}^{c} y_{j,i} log(p_{j,i})

wherein W_C denotes the trainable parameters of the classifier C, L_C(W_C) is the second loss, B is the batch size of the image samples, c is the total number of second labels, y_{j,i} is the actual probability that the j-th image sample belongs to the i-th second label, and p_{j,i} is the predicted probability that the j-th image sample belongs to the i-th second label.
9. A method for detecting the state of a wearing mask, comprising:
acquiring a first standard depth image of a face wearing mask to be detected;
carrying out format conversion on the first standard depth image to obtain a one-dimensional detection vector;
and sequentially processing the detection vector with a feature extraction model and a classifier obtained by joint training with the model training method according to any one of claims 1 to 8, to obtain the first feature vector corresponding to the detection vector and the category of the tightness state of the worn mask.
10. The method of claim 9, further comprising:
and comparing the first feature vector corresponding to the detection vector with the one-dimensional vector after format conversion corresponding to the second standard depth image of the face without wearing the mask in the registry, and determining the identity information of the face to be detected.
11. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of any one of claims 1 to 8 and the wear mask status detection method of claim 9 or 10.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the model training method according to any one of claims 1 to 8 and the wearing mask state detection method according to claim 9 or 10.
CN202111575831.8A 2021-12-22 2021-12-22 Model training method, mask wearing state detection method, electronic device and storage medium Active CN113963237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575831.8A CN113963237B (en) 2021-12-22 2021-12-22 Model training method, mask wearing state detection method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111575831.8A CN113963237B (en) 2021-12-22 2021-12-22 Model training method, mask wearing state detection method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113963237A CN113963237A (en) 2022-01-21
CN113963237B 2022-03-25

Family

ID=79473646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575831.8A Active CN113963237B (en) 2021-12-22 2021-12-22 Model training method, mask wearing state detection method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113963237B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612738B (en) * 2022-02-16 2022-11-11 中国科学院生物物理研究所 Training method of cell electron microscope image segmentation model and organelle interaction analysis method
CN116631019B (en) * 2022-03-24 2024-02-27 清华大学 Mask suitability detection method and device based on facial image

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107121372A (en) * 2017-05-03 2017-09-01 东华大学 A kind of nose respiratory air flow monitor and the method for obtaining mouth mask respiratory resistance
CN111611874A (en) * 2020-04-29 2020-09-01 杭州电子科技大学 Face mask wearing detection method based on ResNet and Canny
CN111914629A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating training data for face recognition
CN112115818A (en) * 2020-09-01 2020-12-22 燕山大学 Mask wearing identification method
CN112163016A (en) * 2020-09-24 2021-01-01 易显智能科技有限责任公司 Driving capability evaluation method based on Internet of vehicles cloud cooperation and related device
CN112200154A (en) * 2020-11-17 2021-01-08 苏州方正璞华信息技术有限公司 Face recognition method and device for mask, electronic equipment and storage medium
CN112528830A (en) * 2020-12-07 2021-03-19 南京航空航天大学 Lightweight CNN mask face pose classification method combined with transfer learning
CN113221667A (en) * 2021-04-20 2021-08-06 北京睿芯高通量科技有限公司 Face and mask attribute classification method and system based on deep learning
CN113537066A (en) * 2021-07-16 2021-10-22 烽火通信科技股份有限公司 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
CN113688793A (en) * 2021-09-22 2021-11-23 万章敏 Training method of face model and face recognition system
CN113785304A (en) * 2021-09-20 2021-12-10 商汤国际私人有限公司 Face recognition method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070248238A1 (en) * 2005-12-13 2007-10-25 Abreu Marcio M Biologically fit wearable electronics apparatus and methods
CN107735136B (en) * 2015-06-30 2021-11-02 瑞思迈私人有限公司 Mask sizing tool using mobile applications
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
US11030782B2 (en) * 2019-11-09 2021-06-08 Adobe Inc. Accurately generating virtual try-on images utilizing a unified neural network framework
CN113505768A (en) * 2021-09-10 2021-10-15 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107121372A (en) * 2017-05-03 2017-09-01 东华大学 A kind of nose respiratory air flow monitor and the method for obtaining mouth mask respiratory resistance
CN111611874A (en) * 2020-04-29 2020-09-01 杭州电子科技大学 Face mask wearing detection method based on ResNet and Canny
CN111914629A (en) * 2020-06-19 2020-11-10 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating training data for face recognition
CN112115818A (en) * 2020-09-01 2020-12-22 燕山大学 Mask wearing identification method
CN112163016A (en) * 2020-09-24 2021-01-01 易显智能科技有限责任公司 Driving capability evaluation method based on Internet of vehicles cloud cooperation and related device
CN112200154A (en) * 2020-11-17 2021-01-08 苏州方正璞华信息技术有限公司 Face recognition method and device for mask, electronic equipment and storage medium
CN112528830A (en) * 2020-12-07 2021-03-19 南京航空航天大学 Lightweight CNN mask face pose classification method combined with transfer learning
CN113221667A (en) * 2021-04-20 2021-08-06 北京睿芯高通量科技有限公司 Face and mask attribute classification method and system based on deep learning
CN113537066A (en) * 2021-07-16 2021-10-22 烽火通信科技股份有限公司 Wearing mask face recognition method based on multi-granularity mixed loss and electronic equipment
CN113785304A (en) * 2021-09-20 2021-12-10 商汤国际私人有限公司 Face recognition method and device
CN113688793A (en) * 2021-09-22 2021-11-23 万章敏 Training method of face model and face recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mask-wearing detection algorithm for persons in natural scenes based on improved YOLOv3; 程可欣 et al.; 《计算机系统应用》 (Computer Systems & Applications); 2021-02-15; pp. 231-236 *

Also Published As

Publication number Publication date
CN113963237A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN111191616A (en) Face shielding detection method, device, equipment and storage medium
CN113963237B (en) Model training method, mask wearing state detection method, electronic device and storage medium
Burl et al. Face localization via shape statistics
CN109819208A (en) A kind of dense population security monitoring management method based on artificial intelligence dynamic monitoring
WO2020107847A1 (en) Bone point-based fall detection method and fall detection device therefor
CN105740780B (en) Method and device for detecting living human face
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
US20050084140A1 (en) Multi-modal face recognition
CN104143076B (en) The matching process of face shape and system
CN111274916A (en) Face recognition method and face recognition device
CN110633624B (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN113963426B (en) Model training method, mask wearing face recognition method, electronic device and storage medium
CN114937232B (en) Wearing detection method, system and equipment for medical waste treatment personnel protective appliance
CN113963183B (en) Model training method, face recognition method, electronic device and storage medium
CN111639580B (en) Gait recognition method combining feature separation model and visual angle conversion model
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN113947803B (en) Model training, sample data generation method for face recognition and electronic equipment
CN111160121A (en) Portrait recognition system, method and device based on deep learning
CN115273150A (en) Novel identification method and system for wearing safety helmet based on human body posture estimation
CN115376184A (en) IR image in-vivo detection method based on generation countermeasure network
CN107742112A (en) A kind of face method for anti-counterfeit and device based on image
JP6876312B1 (en) Learning model generation method, computer program and information processing device
CN114038045A (en) Cross-modal face recognition model construction method and device and electronic equipment
CN205541026U (en) Double - circuit entrance guard device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230404

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Patentee after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Patentee before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Patentee before: Hefei lushenshi Technology Co.,Ltd.