CN116758617B - Campus student check-in method and campus check-in system under low-illuminance scene - Google Patents
- Publication number
- CN116758617B (application CN202311027739.7A)
- Authority
- CN
- China
- Prior art keywords
- low
- data set
- face
- image
- campus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/168 — Feature extraction; Face representation (G06V — Image or video recognition or understanding; G06V40/16 — Human faces)
- G06N3/048 — Activation functions (G06N — Computing arrangements based on specific computational models; G06N3/02 — Neural networks)
- G06N3/0499 — Feedforward networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting (G06V10/77 — Processing image or video features in feature spaces)
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The application provides a campus student check-in method and a campus check-in system for low-illumination scenes. The check-in method comprises the following steps: performing low-illumination enhancement of campus face images with a self-calibration technique; creating a face check-in encoder with a Transformer neural network model and encoding the images; performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder with a gradient reversal technique; creating a decoder with a Transformer neural network model and implementing a low-illumination detection head; and training and saving the model with an unsupervised domain adaptation technique to obtain face check-in results in low-illumination scenes. The check-in system comprises a processing unit, an encoding unit, an alignment unit and a recognition unit that together implement the check-in method. Through unsupervised domain adaptation, the application can effectively detect objects in low-illumination images and significantly reduce the model's dependence on labeled samples. The overall attributes of the image can be aligned to reduce feature bias.
Description
Technical Field
The application relates to the technical field of face recognition check-in, and in particular to a campus student check-in method and a campus check-in system for low-illumination scenes.
Background
Face recognition check-in is a basic task of computer vision and is widely applied in industrial scenarios such as attendance check-in, automatic driving and scene understanding. Low-light environments are an unavoidable part of everyday activity, yet they pose a significant challenge to computer vision. Images captured at night or in foggy weather generally exhibit low contrast, low brightness, noise and blur owing to insufficient light. Such images directly degrade the performance of existing face check-in models, resulting in significant detection errors. Despite major breakthroughs in face recognition, existing research focuses on well-lit images rather than dim ones. A campus face check-in method suited to low-illumination images is therefore very important for applying artificial intelligence on campus. Current face recognition systems for low-illumination scenes mainly comprise: (1) Detection methods based on image enhancement. To obtain reliable detection in adverse conditions such as night or cloudy weather, the low-light image must first be pre-processed to improve brightness and contrast, and detection is then performed on the enhanced image. (2) End-to-end detection methods. These build a detection model with supervised learning and require a large amount of annotated training data that is expensive and time-consuming to collect. (3) Unsupervised domain-adaptive detection methods. Using a labeled data set as the source domain and an unlabeled data set as the target domain helps the model learn domain-invariant feature representations at the domain or class level. When the normal-illumination data set has no or few labels, this approach can transfer features learned from normal-illumination data to low-illumination image detection.
Disclosure of Invention
The application aims to solve at least one of the following technical problems in the prior art:
(1) Inability to handle low-illumination scenes. In existing campus face check-in systems, images captured in foggy weather or at night suffer from low contrast, low brightness, noise, and blur caused by insufficient light. Such images directly degrade the performance of existing object detection models, resulting in significant detection errors.
(2) Loss of global image features. With a contrastive loss, the distributions of normal-illumination source domain images and low-illumination target domain images may not match exactly at the global image level, because the two domains have different scene layouts and object combinations.
(3) Loss of local image features. Perfectly matching local features such as texture and color between normal-illumination source domain images and low-illumination target domain images may fail owing to deviations in the class-level semantics of the two images. Most current approaches consider only local or only global feature alignment.
Therefore, a first aspect of the application provides a campus student check-in method for low-illumination scenes.
A second aspect of the application provides a campus check-in system.
The campus student check-in method for low-illumination scenes provided by the application comprises the following steps:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology;
s2, creating a face sign-in encoder by using a converter neural network model and encoding an image;
s3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology;
s4, creating a decoder by using a converter neural network model and realizing a low-illuminance detection head;
and S5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene.
The campus student check-in method for low-illumination scenes according to the above technical solution may further have the following additional technical features:
In the above technical solution, step S1 includes:
S11, establishing a low-illumination face recognition data set, extracting at least part of its data as a source domain data set and using the remaining data as a target domain data set;
S12, performing low-light image enhancement on the images in the source domain data set with a homomorphic filtering model, learning the illuminance relationship between the low-illuminance image and the expected clear image, performing illuminance estimation while enhancing the image, obtaining the enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relationship according to homomorphic filtering theory;
S13, self-calibrating the illuminance relationship: first, a self-calibration module is defined so that every stage of the low-light enhancement process converges to the same state, the input of each stage being defined as a low-light observation to bridge the stage inputs; then a self-calibration map is introduced and added to the low-light observation, expressing the illuminance difference between the input of each stage and that of the first stage; finally, the self-calibration model is formed.
In the above technical solution, step S1 further includes:
S14, training the self-calibration model; unsupervised learning is adopted to enhance the network's learning ability, and a fidelity term is defined as the total loss of the self-calibration model.
In the above technical solution, step S2 includes:
S21, creating a face check-in encoder; the encoder is a bidirectional encoding structure based on a Transformer neural network and comprises multi-head attention and a feed-forward neural network;
S22, dividing each image in the source domain data set into a plurality of image blocks;
S23, inputting each image block into the face check-in encoder for vector calculation to obtain the multi-head attention vectors of the face image, and creating a multi-head attention matrix to compute the attention scores of those vectors;
S24, normalizing the attention scores to obtain normalized attention scores;
S25, inputting the normalized attention scores into a feed-forward neural network FFN for linear transformation to obtain the shallow face image feature vector of the low-illumination face image.
In the above technical solution, in step S3, performing multi-scale local feature alignment includes:
S31, feeding the shallow face image feature vectors of the low-illumination face image into a gradient reversal layer GRL and applying an adversarial learning strategy: the loss of the multi-scale local feature alignment module is minimized during forward propagation, while during backward propagation the GRL multiplies the incoming error by a negative scalar to maximize that loss, thereby reducing the low-level feature differences between the source and target domain data sets;
S32, feeding the feature maps generated in S31 into several convolution layers with different channel sizes;
S33, inputting the feature maps processed in S32 into the corresponding domain classification layers, and training the loss function of the multi-scale local feature alignment module with a least-squares objective, the loss comprising the local feature alignment loss of the source domain data set and that of the target domain data set.
In the above technical solution, in step S3, multi-scale global feature alignment includes:
S34, feeding the output vectors after multi-scale local feature alignment into a gradient reversal layer GRL and applying an adversarial learning strategy: the GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes it by multiplying the incoming error by a negative scalar during backward propagation, thereby reducing low-level feature differences between the source and target domain data sets;
S35, feeding the feature maps generated in S34 into several convolution layers with different channel sizes;
S36, inputting the feature maps processed in S35 into a domain classification layer so that the domain classifier cannot distinguish whether the features come from the source or target domain data set, and training the loss function of the multi-scale global feature alignment module with a least-squares objective, the loss comprising the global feature alignment loss of the source domain data set and that of the target domain data set.
In the above technical solution, step S4 includes:
S41, creating a face check-in decoder; the decoder is a bidirectional structure based on a Transformer neural network and comprises multi-head attention and a feed-forward neural network;
S42, feeding the feature vectors obtained in step S3 into the decoder;
S43, traversing the decoder layers to implement the face detection head network;
S44, creating a face check-in detection head network, obtaining the weight vector and bias term by training a fully connected neural network, and computing the face recognition loss estimate from the detection network's predictions with a cross-entropy loss function.
In the above technical solution, step S43 includes:
S431, traversing each decoder layer in turn to obtain attention scores;
S432, creating a multi-head attention matrix comprising a query matrix, a key matrix and a value matrix;
S433, computing the attention scores from the multi-head attention matrix;
S434, normalizing the attention scores to obtain normalized attention scores;
S435, inputting the normalized attention scores into a feed-forward neural network, which outputs the image semantic vector.
In the above technical solution, step S5 includes:
S51, performing unsupervised domain-adaptive training: first, the model is trained on the source domain data set, then saved and designated as the source domain model; next, the knowledge learned by the model is transferred to the target domain data set, designated as the target domain; finally, the model is tested on the target domain data set;
S52, recognizing the face check-in: the output vectors of the model of step S51 are classified and regressed with a fully connected neural network to obtain the face recognition detection result.
The application also provides a campus check-in system that implements campus student check-in in low-illumination scenes using the method of any of the above technical solutions, comprising:
a processing unit for performing low-illumination enhancement of campus face images with a self-calibration technique, establishing a campus face image data preprocessing module;
an encoding unit for creating a face check-in encoder with a Transformer neural network model, encoding the images, and establishing a campus face encoding module;
an alignment unit for performing multi-scale local and global feature alignment on the encoder with a gradient reversal technique, establishing a campus face alignment module;
and a recognition unit for creating a decoder with a Transformer neural network model, implementing a low-illumination detection head, and training and saving the model with an unsupervised domain adaptation technique to obtain face check-in results in low-illumination scenes.
In summary, owing to the above technical features, the application has the following beneficial effects:
(1) The application improves a generic object detection network by using normal-illumination images as the source domain and low-illumination images as the target domain. Through unsupervised domain adaptation, it can effectively detect objects in low-illumination images and significantly reduces the model's dependence on labeled samples.
(2) The application develops a new domain-adaptive multi-scale local feature alignment module and a multi-scale global feature alignment module. Multi-scale local feature alignment on the feature maps aligns the receptive fields, reducing low-level feature bias; multi-scale global (image-level) feature alignment aligns the overall attributes of the image, reducing feature deviations in background, scene, target layout and the like.
(3) Comprehensive evaluation and comparison with current methods show that the proposed method improves performance on low-illumination campus face check-in and has good generalization ability.
Additional aspects and advantages of the application will be set forth in part in the description which follows, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a flow chart of a campus student check-in method in a low-light scene according to an embodiment of the present application;
fig. 2 is a block diagram of a campus check-in system according to an embodiment of the present application.
The correspondence between the reference numerals and the component names in fig. 1 to 2 is:
210. a processing unit; 220. a coding unit; 230. an alignment unit; 240. and an identification unit.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.
A campus student check-in method and a campus check-in system in a low-light scene according to some embodiments of the present application are described below with reference to fig. 1 to 2.
Some embodiments of the application provide a campus student check-in method for low-illumination scenes.
As shown in fig. 1, a first embodiment of the present application provides a campus student check-in method for low-illumination scenes, including the following steps:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology; specifically, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set; in a specific embodiment, the first 80% of the extracted low-light face recognition dataset is set as the source domain dataset X S Then, the remaining 20% of the low-illuminance face recognition data set is extracted and set as the target domain data set X T The method comprises the steps of carrying out a first treatment on the surface of the Finally, the target domain data set X T The tag in (a) is deleted.
S12, performing low-light image enhancement on the images in the source domain data set X_S with a homomorphic filtering model and learning the illuminance relationship between the low-illuminance image and the expected clear image. Following the homomorphic filtering assumption, this relationship can be written (in a reconstructed form consistent with the symbols defined here) as y = a ⊗ c, where a is the expected clear image, y is the low-illuminance image, and c is the adjustable illumination. Illuminance estimation is then performed while enhancing the image, the enhanced output brightness is obtained by removing the estimated illuminance, and an illuminance learning relationship is established according to homomorphic filtering theory, with a parameter θ introduced to map the illuminance relationship. The illuminance learning relationship F is a stage-wise residual update (reconstructed form):

u_t = H_θ(x_t),  x_{t+1} = x_t + u_t,  x_0 = y,  t = 0, ..., T-1,

where u_t is the residual term of the t-th stage, T is the total number of stages, x_t is the illuminance estimate at the t-th stage, and y is the low-illuminance image. A weight-sharing mechanism is adopted, i.e., each stage uses the same architecture H and weights θ.
S13, self-calibrating the illuminance relationship. First, a self-calibration module is defined so that every stage of the low-light enhancement process converges to the same state; the input of each stage is defined as a low-light observation, bridging the stage inputs. Then self-calibration maps s and v are introduced and added to the low-light observation, expressing the illuminance difference between the input of each stage and that of the first stage (reconstructed form):

z_t = y ⊘ x_t,  s_t = K_θ(z_t),  v_t = y + s_t  (t ≥ 1),

where z_t is the clear-image estimate of each stage, v_t is the converted (calibrated) input of each stage, s_t is the self-calibration map of each stage, and K_θ is an introduced parameterized operator with learnable parameters. Finally, the self-calibration model is formed: the illuminance learning unit of the t-th stage (t ≥ 1) is converted to

x_{t+1} = v_t + H_θ(v_t).
In some embodiments, step S1 further comprises:
S14, training the self-calibration model. In view of the inaccuracy of existing training methods, unsupervised learning is adopted to enhance the network's learning ability, with the total loss of the model defined as the fidelity loss L_f (reconstructed form):

L_f = Σ_{t=1}^{T} || d_t - e_{t-1} ||²,

where d_t is the predicted illuminance result at stage t and e_{t-1} is the output of the calibration at the previous stage.
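The weight-shared stage recursion of S12 and the fidelity-style loss of S14 can be simulated with a toy scalar stand-in; the concrete choice of H below (pulling the estimate toward full illumination) is purely illustrative and replaces the patent's learned network H_θ:

```python
# Toy scalar illustration of the weight-shared illumination stages (S12)
# and a fidelity-style loss (S14). H is a hand-picked stand-in for the
# learned operator H_theta, NOT the patent's network.

def H(x, theta=0.5):
    # illustrative residual: pulls the illuminance estimate toward 1.0
    return theta * (1.0 - x)

def run_stages(y, T=8):
    """Stage recursion x_0 = y, x_{t+1} = x_t + H(x_t), same H at every stage."""
    xs = [y]
    for _ in range(T):
        xs.append(xs[-1] + H(xs[-1]))
    return xs

def fidelity_loss(xs):
    """Scalar analogue of L_f: sum of squared gaps between successive stages;
    small when all stages have converged to the same state."""
    return sum((xs[t] - xs[t - 1]) ** 2 for t in range(1, len(xs)))
```

Starting from a dim observation (e.g. y = 0.2), the stage outputs converge toward a single state, which is exactly the behaviour the self-calibration module of S13 is designed to enforce.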
S2, creating a face check-in encoder using a Transformer neural network model and encoding the images; specifically, step S2 includes:
S21, creating a face check-in encoder; the encoders are bidirectional encoding structures based on a Transformer neural network, 12 in number in one specific embodiment, each comprising multi-head attention and a feed-forward neural network;
S22, dividing each image in the source domain data set X_S into a plurality of image blocks;
S23, inputting each image block into the face check-in encoder for vector calculation to obtain the multi-head attention vectors of the face image, and creating multi-head attention matrices to compute the attention scores. The calculation (in standard scaled dot-product form, reconstructed from the symbols defined here) is:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V
multihead = Concat(head_1, ..., head_h) × W_O

where multihead denotes the image multi-head attention, Concat() the attention concatenation function, head_i the i-th attention head, h the number of heads, W_O the output weight matrix, Q the query matrix, softmax() the normalization function, K^T the transposed key matrix, V the value matrix, and d_k the dimension of the key matrix.
S24, normalizing the attention scores through a residual connection to obtain the normalized attention scores:

M = U + Sublayer(U)

where M denotes the attention score vector after the residual connection, U the face image attention, and Sublayer() the residual sub-layer.
S25, inputting the normalized attention score vector M into the feed-forward neural network FFN for linear transformation to obtain the shallow face image feature vector of the low-illumination face image. Specifically, the following formula is employed (reconstructed form):

FFN(M) = max(0, M W_1 + b_1) W_2 + b_2

where max(0, ·) is the ReLU activation of the neurons, W_1 and W_2 are the layer-1 and layer-2 weights, and b_1 and b_2 are the parameters to be learned of the first and second layers.
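The encoder computations of S23–S25 can be sketched in miniature. The single-head, unprojected form below is a deliberate simplification of the patent's multi-head encoder, and all helper names are assumptions:

```python
import math

# Miniature, single-head sketch of the S23-S25 encoder computations:
# scaled dot-product attention followed by a ReLU feed-forward layer.
# The real encoder uses h heads and learned projection weights (omitted).

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """weights = softmax(Q K^T / sqrt(d_k)); returns (weights, weights x V)."""
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]
    scores = matmul(Q, KT)
    weights = [softmax([v / math.sqrt(d_k) for v in row]) for row in scores]
    return weights, matmul(weights, V)

def ffn(M, W1, b1, W2, b2):
    """FFN(M) = max(0, M W1 + b1) W2 + b2, applied row-wise."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(M, W1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, W2)]
```

Each attention weight row is a probability distribution over the image blocks (it sums to 1), and the ReLU in the FFN zeroes out negative components, which is where the shallow feature vector's nonlinearity comes from.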
S3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder using a gradient reversal technique; specifically, in step S3, performing multi-scale local feature alignment includes:
S31, feeding the shallow face image feature vectors of the low-illumination face image into a gradient reversal layer GRL and applying an adversarial learning strategy: the loss of the multi-scale local feature alignment module is minimized as far as possible during forward propagation, while during backward propagation the GRL multiplies the incoming error by a negative scalar to maximize that loss, thereby reducing the low-level feature differences between the source and target domain data sets;
S32, feeding the feature maps generated in S31 into several convolution layers with different channel sizes, so as to improve the domain invariance of the features produced by the feature extraction network;
S33, inputting the feature maps processed in S32 into the corresponding domain classification layer DC, and training the loss function of the multi-scale local feature alignment module with a least-squares objective, the loss comprising the local feature alignment loss of the source domain data set and that of the target domain data set (reconstructed form):

L_loc^s = (1/n_s) Σ_i (1/(W·H)) Σ_{w,h} D_j( F_j(x_i^s) )_{w,h}²
L_loc^t = (1/n_t) Σ_i (1/(W·H)) Σ_{w,h} ( 1 - D_j( F_j(x_i^t) )_{w,h} )²

where L_loc^s and L_loc^t are the local feature alignment losses in the source and target domains, x_i^s is the i-th source-domain input image, x_i^t the i-th target-domain input image, F_j the j-th multi-scale local feature extractor, D_j the output of the j-th multi-scale domain classifier layer, W and H the width and height of the feature map, w and h indices over its width and height, and n_s and n_t the total numbers of normal-illumination and low-illumination images, respectively.
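A minimal pure-Python sketch of the least-squares local alignment losses of step S33, assuming the common convention that the domain classifier should output 0 for source-domain locations and 1 for target-domain locations (the function names and grid representation are illustrative):

```python
# Least-squares local alignment losses (S33 sketch). `preds` holds the
# domain classifier's per-location outputs for each image, as a W x H grid
# of values in [0, 1]. Assumed convention: source -> 0, target -> 1.

def local_loss_source(preds):
    """Mean over images of the per-location average of D(...)^2."""
    total = 0.0
    for grid in preds:
        wh = sum(len(row) for row in grid)          # W * H locations
        total += sum(v * v for row in grid for v in row) / wh
    return total / len(preds)

def local_loss_target(preds):
    """Mean over images of the per-location average of (1 - D(...))^2."""
    total = 0.0
    for grid in preds:
        wh = sum(len(row) for row in grid)
        total += sum((1.0 - v) ** 2 for row in grid for v in row) / wh
    return total / len(preds)
```

The loss is zero exactly when the classifier labels every location with its own domain; the gradient reversal layer of S31 then forces the feature extractor to push this loss back up, which is what aligns the two domains.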
In step S3, performing the multi-scale global feature alignment includes:
S34, sending the output vector subjected to multi-scale local feature alignment into the gradient inversion layer GRL and using an adversarial learning strategy, wherein the gradient inversion layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and, by multiplying the incoming error by a negative scalar during backward propagation, maximizes the loss of the multi-scale global feature alignment module, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the feature map generated in S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in S35 into the domain classification layer DC, so that the domain classifier cannot distinguish whether the features come from the source domain data set or the target domain data set, improving the domain invariance of the generated network; the loss function $L_{gc}$ of the multi-scale global feature alignment module is trained by the least square method and comprises the global feature alignment loss of the source domain data set and the global feature alignment loss of the target domain data set:

$$L_{gc}^{s}=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\left(D_{j}\left(G_{j}(x_{i}^{s})\right)\right)^{2}$$

$$L_{gc}^{t}=\frac{1}{N_{t}}\sum_{i=1}^{N_{t}}\left(1-D_{j}\left(G_{j}(x_{i}^{t})\right)\right)^{2}$$

$$L_{gc}=L_{gc}^{s}+L_{gc}^{t}$$

wherein $x_{i}^{s}$ is the i-th image input from the source domain, $x_{i}^{t}$ is the i-th image input from the target domain, $G_{j}$ is the j-th multi-scale global feature extractor, $D_{j}$ is the output of the j-th multi-scale domain classification layer, and $N_{s}$ and $N_{t}$ are the total number of normally illuminated images and the total number of low-illuminance images, respectively.
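The global variant operates on one image-level classifier output per image rather than per pixel, so it can be sketched the same way; as before, the helper name and the source-0/target-1 labeling convention are assumptions for the example, not details fixed by the patent text.

```python
import numpy as np

def global_alignment_loss(d_src, d_tgt):
    """Least-squares global alignment loss over image-level
    domain-classifier outputs of shape (N,), values in [0, 1].
    Mirrors the local loss but with no spatial average over W and H."""
    return np.mean(d_src ** 2) + np.mean((1.0 - d_tgt) ** 2)
```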
S4, creating a decoder by using a Transformer neural network model and realizing a low-illuminance detection head; specifically, step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a Transformer neural network; in a specific embodiment, the number of decoder layers is 12, each decoder layer comprising a multi-head attention module and a feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network; the method specifically comprises the following steps:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention matrix comprises a query matrix Q, a key matrix K and a value matrix V;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
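Steps S431-S435 (attention scores from Q/K/V, score normalization, and the feed-forward output) can be sketched as a single decoder layer. Everything here is illustrative: the head count, the ReLU feed-forward network, and softmax-based score normalization are common Transformer defaults assumed for the example, not details fixed by the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax used to normalize attention scores (S434)."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoder_layer(x, Wq, Wk, Wv, W1, b1, W2, b2, n_heads=2):
    """One decoder layer over a sequence x of shape (T, D):
    Q/K/V projections (S432), scaled dot-product scores per head (S433),
    softmax normalization (S434), and a two-layer feed-forward network
    producing the image semantic vectors (S435)."""
    T, D = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    hd = D // n_heads
    outs = []
    for h in range(n_heads):
        q = Q[:, h * hd:(h + 1) * hd]
        k = K[:, h * hd:(h + 1) * hd]
        v = V[:, h * hd:(h + 1) * hd]
        scores = softmax(q @ k.T / np.sqrt(hd))  # normalized attention scores
        outs.append(scores @ v)
    attn = np.concatenate(outs, axis=1)
    # S435: feed-forward network with a ReLU hidden layer
    return np.maximum(attn @ W1 + b1, 0.0) @ W2 + b2
```

Traversing the 12 decoder layers (S43) amounts to calling `decoder_layer` repeatedly with each layer's own weights.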
S44, creating a face sign-in detection head network, and obtaining a weight vector $W$ and a bias term $b$ by training a fully connected neural network; for the input vector g of the first layer of the fully connected neural network, the number of input neurons is 768 and the number of output neurons is 2; according to the prediction result of the detection network, the face recognition loss estimate is calculated by using a cross-entropy loss function. The forward propagation function P of the fully connected network and the loss function $L_{f1}$ of the fully connected network used in the calculation are as follows:

$$P=f\left(Wg+b\right)$$

$$L_{f1}=-\frac{1}{S}\sum_{i=1}^{S}\left[y_{i}\log p_{i}+\left(1-y_{i}\right)\log\left(1-p_{i}\right)\right]$$

wherein f is the network activation function; g is a face sample; S is the number of samples; $L_{i}=-\left[y_{i}\log p_{i}+\left(1-y_{i}\right)\log\left(1-p_{i}\right)\right]$ is the loss of the i-th sample; $y_{i}$ is the label of sample i, taking the value 1 for a positive sample and 0 for a negative sample; $p_{i}$ is the probability that sample i is predicted to be positive; n is the number of categories.
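The forward propagation function P and the cross-entropy loss of step S44 can be sketched as follows; the sigmoid activation, the form P = f(Wg + b), and the binary (positive/negative) shape of the loss are assumptions consistent with the 0/1 labels described above, not a verbatim transcription of the patented network.

```python
import numpy as np

def detection_head(g, W, b):
    """Forward pass P = f(Wg + b) with a sigmoid activation f.
    g has 768 features; the head emits 2 scores (assumed face / no-face)."""
    z = W @ g + b
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy over S samples: y is the 0/1 label vector,
    p the predicted positive-class probabilities; eps guards log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```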
And S5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene. Specifically, step S5 includes:
s51, performing unsupervised domain adaptive training; firstly, training the model with the source domain data set, and after training, saving the model and setting it as the source domain; then, migrating the knowledge learned by the model to the target domain data set, which is set as the target domain; finally, testing the model with the target domain data set; the formula of the unsupervised domain adaptive training is as follows:

$$X_{out}=K_{s}+K_{t}$$

wherein $X_{out}$ is the final output vector of the model, $K_{s}$ is the knowledge learned on the source domain data set, and $K_{t}$ is the knowledge learned on the target domain data set.
S52, recognizing a face sign-in; and (5) classifying and regressing the output vector of the model in the step (S51) by using a fully connected neural network to obtain a detection result of face recognition. The formula of the face recognition classification Y is:
Y = f(K)×X+h
wherein f (-) is a network activation function, K represents a weight matrix, X is a semantic representation vector of a face image, and h is a model parameter to be learned.
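The classification formula Y = f(K)×X + h can be written out directly. Note that the formula as stated applies the activation f to the weight matrix K, and this sketch follows that literal reading; the tanh default for f is an assumption (a more conventional head would compute f(K·X + h)).

```python
import numpy as np

def classify(K, X, h, f=np.tanh):
    """Face recognition classification Y = f(K)·X + h, with the
    activation f applied to the weight matrix K as literally stated."""
    return f(K) @ X + h
```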
The second embodiment of the present application provides a campus check-in system, as shown in fig. 2, for implementing a campus student check-in in a low-illuminance scene by using the method described in the above embodiment, including:
the processing unit 210, which performs low-illuminance enhancement on the campus face image by using a self-calibration technology and establishes a campus face image data preprocessing module;
the encoding unit 220, which creates a face sign-in encoder by using the Transformer neural network model, encodes the image, and creates a campus face encoding unit;
the alignment unit 230, which performs multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using a gradient inversion technology and establishes a campus face alignment unit;
the recognition unit 240, which creates a decoder and implements a low-illuminance detection head by using the Transformer neural network model, trains and saves the model by using an unsupervised domain adaptive technique, and obtains a face check-in result in a low-illuminance scene.
In this specification, schematic references to the above-described embodiments do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (5)
1. The campus student check-in method under the low-light-intensity scene is characterized by comprising the following steps of:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology;
s2, creating a face sign-in encoder by using a Transformer neural network model and encoding an image;
s3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology;
s4, creating a decoder by using a Transformer neural network model and realizing a low-illuminance detection head;
s5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene;
wherein, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set;
s12, performing low-light image enhancement on the images in the source domain data set by using a homomorphic filtering model, learning the illuminance relation between the low-illuminance images and the expected clear images, performing illuminance estimation while enhancing the images, acquiring enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relation according to a homomorphic filtering theory;
s13, performing self-calibration on the illuminance relationship; firstly, defining a self-calibration module so that each stage in the low-light image enhancement process converges to the same state; the input of each stage is defined as a low-light observation, bridging the inputs of the stages; then, introducing a self-calibration map and adding it to the low-light observation, representing the illuminance difference between the input of each stage and that of the first stage; finally, forming the self-calibration model;
s14, training a self-calibration model; enhancing the network learning capability by adopting unsupervised learning, wherein the fidelity is defined as the total loss of the self-calibration model;
the step S2 comprises the following steps:
s21, creating a face sign-in encoder; the encoder is a bidirectional encoding structure based on a Transformer neural network, and comprises a multi-head attention module and a feed-forward neural network;
s22, dividing an image in the source domain data set into a plurality of image blocks;
s23, inputting each image block into the face sign-in encoder to perform vector calculation, obtaining a multi-head attention vector of the face image, and creating a multi-head attention matrix to calculate the attention score of the multi-head attention vector of the face image;
s24, carrying out normalization processing on the attention score to obtain the attention score after normalization processing;
s25, inputting the attention score subjected to normalization processing into a feedforward neural network FFN for linear transformation to obtain a shallow face image feature vector of the low-illumination face image;
the step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a Transformer neural network; the decoder includes a multi-head attention module and a feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network;
s44, creating a face sign-in detection head network, and obtaining a weight vector and a deviation term by training a fully-connected neural network; calculating face recognition effect loss estimation by using a cross entropy loss function according to the prediction result of the detection network;
step S43 includes:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention matrix comprises a query matrix, a key matrix and a value matrix;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
2. The method for checking in a campus student in a low light level scene as claimed in claim 1, wherein in step S3, the performing multi-scale local feature alignment includes:
s31, shallow face image feature vectors of the low-illuminance face images are sent into a gradient inversion layer GRL, and an adversarial learning strategy is used to reduce the loss of the multi-scale local feature alignment module in the forward propagation process; in the backward propagation process, the gradient inversion layer GRL multiplies the input error by a negative scalar to increase the loss of the multi-scale local feature alignment module, so that the low-level feature differences between the source domain data set and the target domain data set are reduced;
s32, feeding the feature map generated in S31 into a plurality of convolution layers with different channel sizes;
s33, inputting the feature map processed in the S32 into a corresponding domain classification layer, and training a loss function of a multi-scale local feature alignment module by using a least square method, wherein the loss function of the multi-scale local feature alignment module comprises local feature alignment loss of a source domain data set and local feature alignment loss of a target domain data set.
3. The campus student check-in method in a low light level scenario of claim 2, wherein in step S3, the multi-scale global feature alignment comprises:
S34, sending the output vector subjected to multi-scale local feature alignment into the gradient inversion layer GRL and using an adversarial learning strategy, wherein the gradient inversion layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes the loss of the multi-scale global feature alignment module by multiplying the input error by a negative scalar during backward propagation, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the feature map generated in S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in the S35 into a domain classification layer, so that a domain classifier cannot distinguish whether the features come from a source domain data set or a target domain data set, and training a loss function of a multi-scale global feature alignment module by using a least square method, wherein the loss function of the multi-scale global feature alignment module comprises global feature alignment loss of the source domain data set and global feature alignment loss of the target domain data set.
4. A campus student check-in method in a low light level scenario as claimed in claim 3, wherein step S5 comprises:
s51, performing self-adaptive training in an unsupervised domain; firstly, training a model by using a source domain data set, and storing the model and setting the model as a source domain after training; then, the learned knowledge of the model is migrated to a target domain data set and set as a target domain; finally, testing the model by using the target domain data set;
s52, recognizing a face sign-in; and (5) classifying and regressing the output vector of the model in the step (S51) by using a fully connected neural network to obtain a detection result of face recognition.
5. A campus check-in system, characterized in that it implements a campus student check-in in a low-light scene by using the method of any one of claims 1 to 4, and comprises:
the processing unit is used for enhancing low-illuminance of the campus face image by using a self-calibration technology, and a campus face image data preprocessing module is established;
the encoding unit is used for creating a face sign-in encoder by using the Transformer neural network model, encoding the image and creating a campus face encoding unit;
the alignment unit is used for carrying out multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology, and establishing a campus face alignment unit;
and the identification unit is used for creating a decoder by using the Transformer neural network model, realizing a low-light detection head, training and storing the model by using an unsupervised domain adaptive technology, and obtaining a face sign-in result in a low-light scene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311027739.7A CN116758617B (en) | 2023-08-16 | 2023-08-16 | Campus student check-in method and campus check-in system under low-illuminance scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116758617A CN116758617A (en) | 2023-09-15 |
CN116758617B true CN116758617B (en) | 2023-11-10 |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807740A (en) * | 2019-09-17 | 2020-02-18 | 北京大学 | Image enhancement method and system for window image of monitoring scene |
CN113052210A (en) * | 2021-03-11 | 2021-06-29 | 北京工业大学 | Fast low-illumination target detection method based on convolutional neural network |
CN113111947A (en) * | 2021-04-16 | 2021-07-13 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus and computer-readable storage medium |
CN113269903A (en) * | 2021-05-24 | 2021-08-17 | 上海应用技术大学 | Face recognition class attendance system |
CN113902915A (en) * | 2021-10-12 | 2022-01-07 | 江苏大学 | Semantic segmentation method and system based on low-illumination complex road scene |
CN114998145A (en) * | 2022-06-07 | 2022-09-02 | 湖南大学 | Low-illumination image enhancement method based on multi-scale and context learning network |
CN115861101A (en) * | 2022-11-29 | 2023-03-28 | 福州大学 | Low-illumination image enhancement method based on depth separable convolution |
CN115880225A (en) * | 2022-11-10 | 2023-03-31 | 北京工业大学 | Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism |
WO2023092386A1 (en) * | 2021-11-25 | 2023-06-01 | 中国科学院深圳先进技术研究院 | Image processing method, terminal device, and computer readable storage medium |
CN116580243A (en) * | 2023-05-24 | 2023-08-11 | 北京理工大学 | Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10997690B2 (en) * | 2019-01-18 | 2021-05-04 | Ramot At Tel-Aviv University Ltd. | Method and system for end-to-end image processing |