CN116758617B - Campus student check-in method and campus check-in system under low-illuminance scene - Google Patents
- Publication number
- CN116758617B (application CN202311027739.7A)
- Authority
- CN
- China
- Prior art keywords
- low
- data set
- face
- image
- campus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/168 — Feature extraction; Face representation (G06V — Image or video recognition or understanding; G06V40/16 — Human faces)
- G06N3/048 — Activation functions (G06N — Computing arrangements based on specific computational models; G06N3/02 — Neural networks)
- G06N3/0499 — Feedforward networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting (G06V10/77 — Processing image or video features in feature spaces)
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The application provides a campus student check-in method and a campus check-in system for low-illumination scenes. The check-in method comprises the following steps: performing low-illumination enhancement of campus face images with a self-calibration technique; creating a face check-in encoder with a Transformer neural network model and encoding the images; performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder with a gradient reversal technique; creating a decoder with a Transformer neural network model and implementing a low-illumination detection head; and training and saving the model with an unsupervised domain adaptation technique to obtain face check-in results in low-illumination scenes. The check-in system comprises a processing unit, an encoding unit, an alignment unit and a recognition unit that together implement the check-in method. Through unsupervised domain adaptation, the application can effectively detect objects in low-illumination images and significantly reduce the model's dependence on labeled samples. The overall attributes of the image can be aligned to reduce feature bias.
Description
Technical Field
The application relates to the technical field of face recognition check-in, and in particular to a campus student check-in method and a campus check-in system for low-illumination scenes.
Background
Face recognition check-in is a basic task of computer vision and is widely applied in industrial scenarios such as attendance check-in, automatic driving and scene understanding. Low-light environments are an unavoidable part of everyday activity, yet they pose a significant challenge to computer vision. Images captured at night or in foggy weather generally exhibit low contrast, low brightness, noise and blur owing to insufficient light. Such images directly degrade the performance of existing face check-in models, resulting in significant detection errors. Despite major breakthroughs in face recognition, existing research focuses on well-lit images rather than dim ones. A campus face check-in method suited to low-illumination images is therefore very important for applying artificial intelligence on campus. Current face recognition systems for low-illumination scenes mainly comprise: (1) Detection methods based on image enhancement. To obtain reliable detection in adverse conditions such as night or cloudy weather, the low-light image must first be pre-processed to improve brightness and contrast, and detection is then performed on the enhanced image. (2) End-to-end detection methods. These build a detection model with supervised learning and require a large amount of annotated training data that is expensive and time-consuming to collect. (3) Unsupervised domain-adaptive detection methods. Using a labeled data set as the source domain and an unlabeled data set as the target domain helps the model learn domain-invariant feature representations at the domain or class level. When the normal-illumination data set has no or few labels, this approach can transfer features learned from normal-illumination data to low-illumination image detection.
Disclosure of Invention
The application aims to solve at least one of the following technical problems in the prior art:
(1) Inability to handle low-illumination scenes. In existing campus face check-in systems, images captured in foggy weather or at night suffer from low contrast, low brightness, noise, and blur caused by insufficient light. Such images directly degrade the performance of existing object detection models, resulting in significant detection errors.
(2) Loss of global image features. With a contrastive loss, the distributions of normal-illumination source domain images and low-illumination target domain images may not match exactly at the global image level, because the two domains have different scene layouts and object combinations.
(3) Loss of local image features. Perfectly matching local features such as texture and color between normal-illumination source domain images and low-illumination target domain images may fail owing to deviations in the class-level semantics of the two images. Most current approaches consider only local or only global feature alignment.
Therefore, a first aspect of the application provides a campus student check-in method for low-illumination scenes.
A second aspect of the application provides a campus check-in system.
The campus student check-in method for low-illumination scenes provided by the application comprises the following steps:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology;
s2, creating a face sign-in encoder by using a converter neural network model and encoding an image;
s3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology;
s4, creating a decoder by using a converter neural network model and realizing a low-illuminance detection head;
and S5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene.
The campus student check-in method for low-illumination scenes according to the above technical solution may further have the following additional technical features:
In the above technical solution, step S1 includes:
S11, establishing a low-illumination face recognition data set, extracting at least part of its data as a source domain data set and using the remaining data as a target domain data set;
S12, performing low-light image enhancement on the images in the source domain data set with a homomorphic filtering model, learning the illuminance relationship between the low-illuminance image and the expected clear image, performing illuminance estimation while enhancing the image, obtaining the enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relationship according to homomorphic filtering theory;
S13, self-calibrating the illuminance relationship: first, a self-calibration module is defined so that every stage of the low-light enhancement process converges to the same state, the input of each stage being defined as a low-light observation to bridge the stage inputs; then a self-calibration map is introduced and added to the low-light observation, expressing the illuminance difference between the input of each stage and that of the first stage; finally, the self-calibration model is formed.
In the above technical solution, step S1 further includes:
S14, training the self-calibration model; unsupervised learning is adopted to enhance the network's learning ability, and a fidelity term is defined as the total loss of the self-calibration model.
In the above technical solution, step S2 includes:
S21, creating a face check-in encoder; the encoder is a bidirectional encoding structure based on a Transformer neural network and comprises multi-head attention and a feed-forward neural network;
S22, dividing each image in the source domain data set into a plurality of image blocks;
S23, inputting each image block into the face check-in encoder for vector calculation to obtain the multi-head attention vectors of the face image, and creating a multi-head attention matrix to compute the attention scores of those vectors;
S24, normalizing the attention scores to obtain normalized attention scores;
S25, inputting the normalized attention scores into a feed-forward neural network FFN for linear transformation to obtain the shallow face image feature vector of the low-illumination face image.
In the above technical solution, in step S3, performing multi-scale local feature alignment includes:
S31, feeding the shallow face image feature vectors of the low-illumination face image into a gradient reversal layer GRL and applying an adversarial learning strategy: the loss of the multi-scale local feature alignment module is minimized during forward propagation, while during backward propagation the GRL multiplies the incoming error by a negative scalar to maximize that loss, thereby reducing the low-level feature differences between the source and target domain data sets;
S32, feeding the feature maps generated in S31 into several convolution layers with different channel sizes;
S33, inputting the feature maps processed in S32 into the corresponding domain classification layers, and training the loss function of the multi-scale local feature alignment module with a least-squares objective, the loss comprising the local feature alignment loss of the source domain data set and that of the target domain data set.
In the above technical solution, in step S3, multi-scale global feature alignment includes:
S34, feeding the output vectors after multi-scale local feature alignment into a gradient reversal layer GRL and applying an adversarial learning strategy: the GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes it by multiplying the incoming error by a negative scalar during backward propagation, thereby reducing low-level feature differences between the source and target domain data sets;
S35, feeding the feature maps generated in S34 into several convolution layers with different channel sizes;
S36, inputting the feature maps processed in S35 into a domain classification layer so that the domain classifier cannot distinguish whether the features come from the source or target domain data set, and training the loss function of the multi-scale global feature alignment module with a least-squares objective, the loss comprising the global feature alignment loss of the source domain data set and that of the target domain data set.
In the above technical solution, step S4 includes:
S41, creating a face check-in decoder; the decoder is a bidirectional structure based on a Transformer neural network and comprises multi-head attention and a feed-forward neural network;
S42, feeding the feature vectors obtained in step S3 into the decoder;
S43, traversing the decoder layers to implement the face detection head network;
S44, creating a face check-in detection head network, obtaining the weight vector and bias term by training a fully connected neural network, and computing the face recognition loss estimate from the detection network's predictions with a cross-entropy loss function.
In the above technical solution, step S43 includes:
S431, traversing each decoder layer in turn to obtain attention scores;
S432, creating a multi-head attention matrix comprising a query matrix, a key matrix and a value matrix;
S433, computing the attention scores from the multi-head attention matrix;
S434, normalizing the attention scores to obtain normalized attention scores;
S435, inputting the normalized attention scores into a feed-forward neural network, which outputs the image semantic vector.
In the above technical solution, step S5 includes:
S51, performing unsupervised domain-adaptive training: first, the model is trained on the source domain data set, then saved and designated as the source domain model; next, the knowledge learned by the model is transferred to the target domain data set, designated as the target domain; finally, the model is tested on the target domain data set;
S52, recognizing the face check-in: the output vectors of the model of step S51 are classified and regressed with a fully connected neural network to obtain the face recognition detection result.
The application also provides a campus check-in system that implements campus student check-in in low-illumination scenes using the method of any of the above technical solutions, comprising:
a processing unit for performing low-illumination enhancement of campus face images with a self-calibration technique, establishing a campus face image data preprocessing module;
an encoding unit for creating a face check-in encoder with a Transformer neural network model, encoding the images, and establishing a campus face encoding module;
an alignment unit for performing multi-scale local and global feature alignment on the encoder with a gradient reversal technique, establishing a campus face alignment module;
and a recognition unit for creating a decoder with a Transformer neural network model, implementing a low-illumination detection head, and training and saving the model with an unsupervised domain adaptation technique to obtain face check-in results in low-illumination scenes.
In summary, owing to the above technical features, the application has the following beneficial effects:
(1) The application improves a generic object detection network by using normal-illumination images as the source domain and low-illumination images as the target domain. Through unsupervised domain adaptation, it can effectively detect objects in low-illumination images and significantly reduces the model's dependence on labeled samples.
(2) The application develops a new domain-adaptive multi-scale local feature alignment module and a multi-scale global feature alignment module. Multi-scale local feature alignment on the feature maps aligns the receptive fields, reducing low-level feature bias; multi-scale global (image-level) feature alignment aligns the overall attributes of the image, reducing feature deviations in background, scene, target layout and the like.
(3) Comprehensive evaluation and comparison with current methods show that the proposed method improves performance on low-illumination campus face check-in and has good generalization ability.
Additional aspects and advantages of the application will be set forth in part in the description which follows, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a flow chart of a campus student check-in method in a low-light scene according to an embodiment of the present application;
fig. 2 is a block diagram of a campus check-in system according to an embodiment of the present application.
The correspondence between the reference numerals and the component names in fig. 1 to 2 is:
210. a processing unit; 220. a coding unit; 230. an alignment unit; 240. and an identification unit.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.
A campus student check-in method and a campus check-in system in a low-light scene according to some embodiments of the present application are described below with reference to fig. 1 to 2.
Some embodiments of the application provide a campus student check-in method for low-illumination scenes.
As shown in fig. 1, a first embodiment of the present application provides a campus student check-in method for low-illumination scenes, including the following steps:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology; specifically, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set; in a specific embodiment, the first 80% of the extracted low-light face recognition dataset is set as the source domain dataset X S Then, the remaining 20% of the low-illuminance face recognition data set is extracted and set as the target domain data set X T The method comprises the steps of carrying out a first treatment on the surface of the Finally, the target domain data set X T The tag in (a) is deleted.
S12, performing low-light image enhancement on the images in the source domain data set X_S with a homomorphic filtering model and learning the illuminance relationship between the low-illuminance image and the expected clear image. Following the homomorphic filtering assumption, this relationship can be written (in a reconstructed form consistent with the symbols defined here) as y = a ⊗ c, where a is the expected clear image, y is the low-illuminance image, and c is the adjustable illumination. Illuminance estimation is then performed while enhancing the image, the enhanced output brightness is obtained by removing the estimated illuminance, and an illuminance learning relationship is established according to homomorphic filtering theory, with a parameter θ introduced to map the illuminance relationship. The illuminance learning relationship F is a stage-wise residual update (reconstructed form):

u_t = H_θ(x_t),  x_{t+1} = x_t + u_t,  x_0 = y,  t = 0, ..., T-1,

where u_t is the residual term of the t-th stage, T is the total number of stages, x_t is the illuminance estimate at the t-th stage, and y is the low-illuminance image. A weight-sharing mechanism is adopted, i.e., each stage uses the same architecture H and weights θ.
S13, self-calibrating the illuminance relationship. First, a self-calibration module is defined so that every stage of the low-light enhancement process converges to the same state; the input of each stage is defined as a low-light observation, bridging the stage inputs. Then self-calibration maps s and v are introduced and added to the low-light observation, expressing the illuminance difference between the input of each stage and that of the first stage (reconstructed form):

z_t = y ⊘ x_t,  s_t = K_θ(z_t),  v_t = y + s_t  (t ≥ 1),

where z_t is the clear-image estimate of each stage, v_t is the converted (calibrated) input of each stage, s_t is the self-calibration map of each stage, and K_θ is an introduced parameterized operator with learnable parameters. Finally, the self-calibration model is formed: the illuminance learning unit of the t-th stage (t ≥ 1) is converted to

x_{t+1} = v_t + H_θ(v_t).
In some embodiments, step S1 further comprises:
S14, training the self-calibration model. In view of the inaccuracy of existing training methods, unsupervised learning is adopted to enhance the network's learning ability, with the total loss of the model defined as the fidelity loss L_f (reconstructed form):

L_f = Σ_{t=1}^{T} || d_t - e_{t-1} ||²,

where d_t is the predicted illuminance result at stage t and e_{t-1} is the output of the calibration at the previous stage.
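The weight-shared stage recursion of S12 and the fidelity-style loss of S14 can be simulated with a toy scalar stand-in; the concrete choice of H below (pulling the estimate toward full illumination) is purely illustrative and replaces the patent's learned network H_θ:

```python
# Toy scalar illustration of the weight-shared illumination stages (S12)
# and a fidelity-style loss (S14). H is a hand-picked stand-in for the
# learned operator H_theta, NOT the patent's network.

def H(x, theta=0.5):
    # illustrative residual: pulls the illuminance estimate toward 1.0
    return theta * (1.0 - x)

def run_stages(y, T=8):
    """Stage recursion x_0 = y, x_{t+1} = x_t + H(x_t), same H at every stage."""
    xs = [y]
    for _ in range(T):
        xs.append(xs[-1] + H(xs[-1]))
    return xs

def fidelity_loss(xs):
    """Scalar analogue of L_f: sum of squared gaps between successive stages;
    small when all stages have converged to the same state."""
    return sum((xs[t] - xs[t - 1]) ** 2 for t in range(1, len(xs)))
```

Starting from a dim observation (e.g. y = 0.2), the stage outputs converge toward a single state, which is exactly the behaviour the self-calibration module of S13 is designed to enforce.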
S2, creating a face check-in encoder using a Transformer neural network model and encoding the images; specifically, step S2 includes:
S21, creating a face check-in encoder; the encoders are bidirectional encoding structures based on a Transformer neural network, 12 in number in one specific embodiment, each comprising multi-head attention and a feed-forward neural network;
S22, dividing each image in the source domain data set X_S into a plurality of image blocks;
S23, inputting each image block into the face check-in encoder for vector calculation to obtain the multi-head attention vectors of the face image, and creating multi-head attention matrices to compute the attention scores. The calculation (in standard scaled dot-product form, reconstructed from the symbols defined here) is:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V
multihead = Concat(head_1, ..., head_h) × W_O

where multihead denotes the image multi-head attention, Concat() the attention concatenation function, head_i the i-th attention head, h the number of heads, W_O the output weight matrix, Q the query matrix, softmax() the normalization function, K^T the transposed key matrix, V the value matrix, and d_k the dimension of the key matrix.
S24, normalizing the attention scores through a residual connection to obtain the normalized attention scores:

M = U + Sublayer(U)

where M denotes the attention score vector after the residual connection, U the face image attention, and Sublayer() the residual sub-layer.
S25, inputting the normalized attention score vector M into the feed-forward neural network FFN for linear transformation to obtain the shallow face image feature vector of the low-illumination face image. Specifically, the following formula is employed (reconstructed form):

FFN(M) = max(0, M W_1 + b_1) W_2 + b_2

where max(0, ·) is the ReLU activation of the neurons, W_1 and W_2 are the layer-1 and layer-2 weights, and b_1 and b_2 are the parameters to be learned of the first and second layers.
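The encoder computations of S23–S25 can be sketched in miniature. The single-head, unprojected form below is a deliberate simplification of the patent's multi-head encoder, and all helper names are assumptions:

```python
import math

# Miniature, single-head sketch of the S23-S25 encoder computations:
# scaled dot-product attention followed by a ReLU feed-forward layer.
# The real encoder uses h heads and learned projection weights (omitted).

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """weights = softmax(Q K^T / sqrt(d_k)); returns (weights, weights x V)."""
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]
    scores = matmul(Q, KT)
    weights = [softmax([v / math.sqrt(d_k) for v in row]) for row in scores]
    return weights, matmul(weights, V)

def ffn(M, W1, b1, W2, b2):
    """FFN(M) = max(0, M W1 + b1) W2 + b2, applied row-wise."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(M, W1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, W2)]
```

Each attention weight row is a probability distribution over the image blocks (it sums to 1), and the ReLU in the FFN zeroes out negative components, which is where the shallow feature vector's nonlinearity comes from.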
S3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder using a gradient reversal technique; specifically, in step S3, performing multi-scale local feature alignment includes:
S31, feeding the shallow face image feature vectors of the low-illumination face image into a gradient reversal layer GRL and applying an adversarial learning strategy: the loss of the multi-scale local feature alignment module is minimized as far as possible during forward propagation, while during backward propagation the GRL multiplies the incoming error by a negative scalar to maximize that loss, thereby reducing the low-level feature differences between the source and target domain data sets;
S32, feeding the feature maps generated in S31 into several convolution layers with different channel sizes, so as to improve the domain invariance of the features produced by the feature extraction network;
S33, inputting the feature maps processed in S32 into the corresponding domain classification layer DC, and training the loss function of the multi-scale local feature alignment module with a least-squares objective, the loss comprising the local feature alignment loss of the source domain data set and that of the target domain data set (reconstructed form):

L_loc^s = (1/n_s) Σ_i (1/(W·H)) Σ_{w,h} D_j( F_j(x_i^s) )_{w,h}²
L_loc^t = (1/n_t) Σ_i (1/(W·H)) Σ_{w,h} ( 1 - D_j( F_j(x_i^t) )_{w,h} )²

where L_loc^s and L_loc^t are the local feature alignment losses in the source and target domains, x_i^s is the i-th source-domain input image, x_i^t the i-th target-domain input image, F_j the j-th multi-scale local feature extractor, D_j the output of the j-th multi-scale domain classifier layer, W and H the width and height of the feature map, w and h indices over its width and height, and n_s and n_t the total numbers of normal-illumination and low-illumination images, respectively.
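A minimal pure-Python sketch of the least-squares local alignment losses of step S33, assuming the common convention that the domain classifier should output 0 for source-domain locations and 1 for target-domain locations (the function names and grid representation are illustrative):

```python
# Least-squares local alignment losses (S33 sketch). `preds` holds the
# domain classifier's per-location outputs for each image, as a W x H grid
# of values in [0, 1]. Assumed convention: source -> 0, target -> 1.

def local_loss_source(preds):
    """Mean over images of the per-location average of D(...)^2."""
    total = 0.0
    for grid in preds:
        wh = sum(len(row) for row in grid)          # W * H locations
        total += sum(v * v for row in grid for v in row) / wh
    return total / len(preds)

def local_loss_target(preds):
    """Mean over images of the per-location average of (1 - D(...))^2."""
    total = 0.0
    for grid in preds:
        wh = sum(len(row) for row in grid)
        total += sum((1.0 - v) ** 2 for row in grid for v in row) / wh
    return total / len(preds)
```

The loss is zero exactly when the classifier labels every location with its own domain; the gradient reversal layer of S31 then forces the feature extractor to push this loss back up, which is what aligns the two domains.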
In step S3, performing the multi-scale global feature alignment includes:
S34, sending the output vector subjected to multi-scale local feature alignment into the gradient inversion layer GRL and using an adversarial learning strategy, wherein the gradient inversion layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and, by multiplying the incoming error by a negative scalar during backward propagation, maximizes the loss of the multi-scale global feature alignment module, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the feature map generated in S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in S35 into the domain classification layer DC, so that the domain classifier cannot distinguish whether the features come from the source domain data set or the target domain data set, improving the domain invariance of the generated network; the loss function $L_{gc}$ of the multi-scale global feature alignment module is trained by the least square method and comprises the global feature alignment loss of the source domain data set and the global feature alignment loss of the target domain data set:

$$L_{gc}^{s}=\frac{1}{N_{s}}\sum_{i=1}^{N_{s}}\left(D_{j}\left(G_{j}(x_{i}^{s})\right)\right)^{2}$$

$$L_{gc}^{t}=\frac{1}{N_{t}}\sum_{i=1}^{N_{t}}\left(1-D_{j}\left(G_{j}(x_{i}^{t})\right)\right)^{2}$$

$$L_{gc}=L_{gc}^{s}+L_{gc}^{t}$$

wherein $x_{i}^{s}$ is the i-th image input from the source domain, $x_{i}^{t}$ is the i-th image input from the target domain, $G_{j}$ is the j-th multi-scale global feature extractor, $D_{j}$ is the output of the j-th multi-scale domain classification layer, and $N_{s}$ and $N_{t}$ are the total number of normally illuminated images and the total number of low-illuminance images, respectively.
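The global variant operates on one image-level classifier output per image rather than per pixel, so it can be sketched the same way; as before, the helper name and the source-0/target-1 labeling convention are assumptions for the example, not details fixed by the patent text.

```python
import numpy as np

def global_alignment_loss(d_src, d_tgt):
    """Least-squares global alignment loss over image-level
    domain-classifier outputs of shape (N,), values in [0, 1].
    Mirrors the local loss but with no spatial average over W and H."""
    return np.mean(d_src ** 2) + np.mean((1.0 - d_tgt) ** 2)
```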
S4, creating a decoder by using a Transformer neural network model and realizing a low-illuminance detection head; specifically, step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a Transformer neural network; in a specific embodiment, the number of decoder layers is 12, each decoder layer comprising a multi-head attention module and a feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network; the method specifically comprises the following steps:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention matrix comprises a query matrix Q, a key matrix K and a value matrix V;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
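Steps S431-S435 (attention scores from Q/K/V, score normalization, and the feed-forward output) can be sketched as a single decoder layer. Everything here is illustrative: the head count, the ReLU feed-forward network, and softmax-based score normalization are common Transformer defaults assumed for the example, not details fixed by the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax used to normalize attention scores (S434)."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoder_layer(x, Wq, Wk, Wv, W1, b1, W2, b2, n_heads=2):
    """One decoder layer over a sequence x of shape (T, D):
    Q/K/V projections (S432), scaled dot-product scores per head (S433),
    softmax normalization (S434), and a two-layer feed-forward network
    producing the image semantic vectors (S435)."""
    T, D = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    hd = D // n_heads
    outs = []
    for h in range(n_heads):
        q = Q[:, h * hd:(h + 1) * hd]
        k = K[:, h * hd:(h + 1) * hd]
        v = V[:, h * hd:(h + 1) * hd]
        scores = softmax(q @ k.T / np.sqrt(hd))  # normalized attention scores
        outs.append(scores @ v)
    attn = np.concatenate(outs, axis=1)
    # S435: feed-forward network with a ReLU hidden layer
    return np.maximum(attn @ W1 + b1, 0.0) @ W2 + b2
```

Traversing the 12 decoder layers (S43) amounts to calling `decoder_layer` repeatedly with each layer's own weights.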
S44, creating a face sign-in detection head network, and obtaining a weight vector $W$ and a bias term $b$ by training a fully connected neural network; for the input vector g of the first layer of the fully connected neural network, the number of input neurons is 768 and the number of output neurons is 2; according to the prediction result of the detection network, the face recognition loss estimate is calculated by using a cross-entropy loss function. The forward propagation function P of the fully connected network and the loss function $L_{f1}$ of the fully connected network used in the calculation are as follows:

$$P=f\left(Wg+b\right)$$

$$L_{f1}=-\frac{1}{S}\sum_{i=1}^{S}\left[y_{i}\log p_{i}+\left(1-y_{i}\right)\log\left(1-p_{i}\right)\right]$$

wherein f is the network activation function; g is a face sample; S is the number of samples; $L_{i}=-\left[y_{i}\log p_{i}+\left(1-y_{i}\right)\log\left(1-p_{i}\right)\right]$ is the loss of the i-th sample; $y_{i}$ is the label of sample i, taking the value 1 for a positive sample and 0 for a negative sample; $p_{i}$ is the probability that sample i is predicted to be positive; n is the number of categories.
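The forward propagation function P and the cross-entropy loss of step S44 can be sketched as follows; the sigmoid activation, the form P = f(Wg + b), and the binary (positive/negative) shape of the loss are assumptions consistent with the 0/1 labels described above, not a verbatim transcription of the patented network.

```python
import numpy as np

def detection_head(g, W, b):
    """Forward pass P = f(Wg + b) with a sigmoid activation f.
    g has 768 features; the head emits 2 scores (assumed face / no-face)."""
    z = W @ g + b
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy over S samples: y is the 0/1 label vector,
    p the predicted positive-class probabilities; eps guards log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```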
And S5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene. Specifically, step S5 includes:
s51, performing unsupervised domain adaptive training; firstly, training the model with the source domain data set, and after training, saving the model and setting it as the source domain; then, migrating the knowledge learned by the model to the target domain data set, which is set as the target domain; finally, testing the model with the target domain data set; the formula of the unsupervised domain adaptive training is as follows:

$$X_{out}=K_{s}+K_{t}$$

wherein $X_{out}$ is the final output vector of the model, $K_{s}$ is the knowledge learned on the source domain data set, and $K_{t}$ is the knowledge learned on the target domain data set.
S52, recognizing a face sign-in; and (5) classifying and regressing the output vector of the model in the step (S51) by using a fully connected neural network to obtain a detection result of face recognition. The formula of the face recognition classification Y is:
Y = f(K)×X+h
wherein f (-) is a network activation function, K represents a weight matrix, X is a semantic representation vector of a face image, and h is a model parameter to be learned.
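The classification formula Y = f(K)×X + h can be written out directly. Note that the formula as stated applies the activation f to the weight matrix K, and this sketch follows that literal reading; the tanh default for f is an assumption (a more conventional head would compute f(K·X + h)).

```python
import numpy as np

def classify(K, X, h, f=np.tanh):
    """Face recognition classification Y = f(K)·X + h, with the
    activation f applied to the weight matrix K as literally stated."""
    return f(K) @ X + h
```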
The second embodiment of the present application provides a campus check-in system, as shown in fig. 2, for implementing a campus student check-in in a low-illuminance scene by using the method described in the above embodiment, including:
the processing unit 210, which performs low-illuminance enhancement on the campus face image by using a self-calibration technology and establishes a campus face image data preprocessing module;
the encoding unit 220, which creates a face sign-in encoder by using the Transformer neural network model, encodes the image, and creates a campus face encoding unit;
the alignment unit 230, which performs multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using a gradient inversion technology and establishes a campus face alignment unit;
the recognition unit 240, which creates a decoder and implements a low-illuminance detection head by using the Transformer neural network model, trains and saves the model by using an unsupervised domain adaptive technique, and obtains a face check-in result in a low-illuminance scene.
In this specification, schematic references to the above-described embodiments do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (5)
1. The campus student check-in method under the low-light-intensity scene is characterized by comprising the following steps of:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology;
s2, creating a face sign-in encoder by using a Transformer neural network model and encoding an image;
s3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology;
s4, creating a decoder by using a Transformer neural network model and realizing a low-illuminance detection head;
s5, training and storing a model by using an unsupervised domain self-adaptive technology to obtain a face sign-in result in a low-illuminance scene;
wherein, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set;
s12, performing low-light image enhancement on the images in the source domain data set by using a homomorphic filtering model, learning the illuminance relation between the low-illuminance images and the expected clear images, performing illuminance estimation while enhancing the images, acquiring enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relation according to a homomorphic filtering theory;
s13, performing self-calibration on the illuminance relationship; firstly, defining a self-calibration module so that each stage in the low-light image enhancement process converges to the same state; the input of each stage is defined as a low-light observation, bridging the inputs of the stages; then, introducing a self-calibration map and adding it to the low-light observation, representing the illuminance difference between the input of each stage and that of the first stage; finally, forming the self-calibration model;
s14, training a self-calibration model; enhancing the network learning capability by adopting unsupervised learning, wherein the fidelity is defined as the total loss of the self-calibration model;
the step S2 comprises the following steps:
s21, creating a face sign-in encoder; the encoder is a bidirectional encoding structure based on a Transformer neural network, and comprises a multi-head attention module and a feed-forward neural network;
s22, dividing an image in the source domain data set into a plurality of image blocks;
s23, inputting each image block into the face sign-in encoder to perform vector calculation, obtaining a multi-head attention vector of the face image, and creating a multi-head attention matrix to calculate the attention score of the multi-head attention vector of the face image;
s24, carrying out normalization processing on the attention score to obtain the attention score after normalization processing;
s25, inputting the attention score subjected to normalization processing into a feedforward neural network FFN for linear transformation to obtain a shallow face image feature vector of the low-illumination face image;
the step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a Transformer neural network; the decoder includes a multi-head attention module and a feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network;
s44, creating a face sign-in detection head network, and obtaining a weight vector and a deviation term by training a fully-connected neural network; calculating face recognition effect loss estimation by using a cross entropy loss function according to the prediction result of the detection network;
step S43 includes:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention matrix comprises a query matrix, a key matrix and a value matrix;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
2. The method for checking in a campus student in a low light level scene as claimed in claim 1, wherein in step S3, the performing multi-scale local feature alignment includes:
s31, shallow face image feature vectors of the low-illuminance face images are sent into a gradient inversion layer GRL, and an adversarial learning strategy is used to reduce the loss of the multi-scale local feature alignment module in the forward propagation process; in the backward propagation process, the gradient inversion layer GRL multiplies the input error by a negative scalar to increase the loss of the multi-scale local feature alignment module, so that the low-level feature differences between the source domain data set and the target domain data set are reduced;
s32, feeding the feature map generated in S31 into a plurality of convolution layers with different channel sizes;
s33, inputting the feature map processed in the S32 into a corresponding domain classification layer, and training a loss function of a multi-scale local feature alignment module by using a least square method, wherein the loss function of the multi-scale local feature alignment module comprises local feature alignment loss of a source domain data set and local feature alignment loss of a target domain data set.
3. The campus student check-in method in a low light level scenario of claim 2, wherein in step S3, the multi-scale global feature alignment comprises:
S34, sending the output vector subjected to multi-scale local feature alignment into the gradient inversion layer GRL and using an adversarial learning strategy, wherein the gradient inversion layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes the loss of the multi-scale global feature alignment module by multiplying the input error by a negative scalar during backward propagation, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the feature map generated in S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in the S35 into a domain classification layer, so that a domain classifier cannot distinguish whether the features come from a source domain data set or a target domain data set, and training a loss function of a multi-scale global feature alignment module by using a least square method, wherein the loss function of the multi-scale global feature alignment module comprises global feature alignment loss of the source domain data set and global feature alignment loss of the target domain data set.
4. A campus student check-in method in a low light level scenario as claimed in claim 3, wherein step S5 comprises:
s51, performing self-adaptive training in an unsupervised domain; firstly, training a model by using a source domain data set, and storing the model and setting the model as a source domain after training; then, the learned knowledge of the model is migrated to a target domain data set and set as a target domain; finally, testing the model by using the target domain data set;
s52, recognizing a face sign-in; and (5) classifying and regressing the output vector of the model in the step (S51) by using a fully connected neural network to obtain a detection result of face recognition.
5. A campus check-in system, characterized in that it implements a campus student check-in in a low-light scene by using the method of any one of claims 1 to 4, and comprises:
the processing unit is used for enhancing low-illuminance of the campus face image by using a self-calibration technology, and a campus face image data preprocessing module is established;
the encoding unit is used for creating a face sign-in encoder by using the Transformer neural network model, encoding the image and creating a campus face encoding unit;
the alignment unit is used for carrying out multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology, and establishing a campus face alignment unit;
and the identification unit is used for creating a decoder by using the Transformer neural network model, realizing a low-light detection head, training and storing the model by using an unsupervised domain adaptive technology, and obtaining a face sign-in result in a low-light scene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311027739.7A CN116758617B (en) | 2023-08-16 | 2023-08-16 | Campus student check-in method and campus check-in system under low-illuminance scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116758617A CN116758617A (en) | 2023-09-15 |
CN116758617B true CN116758617B (en) | 2023-11-10 |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807740A (en) * | 2019-09-17 | 2020-02-18 | 北京大学 | Image enhancement method and system for window image of monitoring scene |
CN113052210A (en) * | 2021-03-11 | 2021-06-29 | 北京工业大学 | Fast low-illumination target detection method based on convolutional neural network |
CN113111947A (en) * | 2021-04-16 | 2021-07-13 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus and computer-readable storage medium |
CN113269903A (en) * | 2021-05-24 | 2021-08-17 | 上海应用技术大学 | Face recognition class attendance system |
CN113902915A (en) * | 2021-10-12 | 2022-01-07 | 江苏大学 | Semantic segmentation method and system based on low-illumination complex road scene |
CN114998145A (en) * | 2022-06-07 | 2022-09-02 | 湖南大学 | Low-illumination image enhancement method based on multi-scale and context learning network |
CN115861101A (en) * | 2022-11-29 | 2023-03-28 | 福州大学 | Low-illumination image enhancement method based on depth separable convolution |
CN115880225A (en) * | 2022-11-10 | 2023-03-31 | 北京工业大学 | Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism |
WO2023092386A1 (en) * | 2021-11-25 | 2023-06-01 | 中国科学院深圳先进技术研究院 | Image processing method, terminal device, and computer readable storage medium |
CN116580243A (en) * | 2023-05-24 | 2023-08-11 | 北京理工大学 | Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10997690B2 (en) * | 2019-01-18 | 2021-05-04 | Ramot At Tel-Aviv University Ltd. | Method and system for end-to-end image processing |