CN114299559A - Finger vein identification method based on lightweight fusion global and local feature network - Google Patents

Finger vein identification method based on lightweight fusion global and local feature network

Info

Publication number
CN114299559A
CN114299559A CN202111617415.XA
Authority
CN
China
Prior art keywords
fgl
network
finger vein
loss
mobilenet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111617415.XA
Other languages
Chinese (zh)
Inventor
Shen Lei (沈雷)
Mou Jiale (牟家乐)
Xu Wengui (徐文贵)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202111617415.XA priority Critical patent/CN114299559A/en
Publication of CN114299559A publication Critical patent/CN114299559A/en


Abstract

A finger vein identification method based on a lightweight network fusing global and local features is disclosed. 1. Construct a data set and divide it into a training set and a test set; 2. design a lightweight global-and-local-feature fusion network, FGL-MobileNet, based on a lightweight residual unit; 3. design an FGL-Net network based on the fusion of global and local features, comprising a backbone network based on an improved residual network together with global and local feature extraction modules; 4. design the network loss function; 5. train the whole FGL-MobileNet model until the entire training set has been iterated a number of times; 6. input the test set images into the trained FGL-MobileNet model to extract the finger vein features, then perform identification and comparison. According to the invention, FGL-MobileNet is built by stacking these units, so that the network can rapidly and effectively expand the receptive field of the finger vein features, obtaining richer finger vein detail features and making the extracted features more discriminative, while greatly reducing the number of network parameters.

Description

Finger vein identification method based on lightweight fusion global and local feature network
Technical Field
The invention belongs to the fields of biometric recognition and computer vision, and mainly relates to a finger vein recognition method based on a lightweight network fusing global and local features.
Background
At present, finger vein recognition technology is developing rapidly and is widely applied across industries, providing more and more people with faster and safer identity authentication services. Traditional finger vein recognition algorithms mainly perform recognition by extracting the lines and textures of the finger veins, together with minutiae such as end points and bifurcation points, as finger vein features. The gray-level distributions of the vein regions and the background regions of a finger vein image differ markedly; this difference forms the texture of the finger vein image, which itself has good discriminative performance. The Local Binary Pattern (LBP) is widely used for extracting image texture: a 3 × 3 neighborhood window slides over the image, the sign of the gray difference between the window's center pixel and each neighborhood pixel is encoded, and the encoding result is converted to a decimal number taken as the LBP value of the center pixel, reflecting the texture information of the local region. The method is insensitive to linear changes in the overall image illumination and simple to compute, but its noise resistance is weak. Building on LBP, Rosdi et al. proposed the Local Line Binary Pattern (LLBP) to extract local features of the finger veins in both the horizontal and vertical directions. Lee et al. used a support vector machine to classify LBP features and weighted the various features according to the classification results, improving the recognition rate. Fu Hua et al. combined NBP with a blocking strategy and proposed MMNBP to extract vein texture features, improving the robustness of the vein features; however, its grasp of global vein information is insufficient, and it struggles to highlight the structural character of the veins.
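For illustration, the basic LBP encoding described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not part of the patent; the clockwise neighbor ordering that determines the bit weights is an arbitrary choice here.

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP: threshold each of the 8 neighbors against the
    window's center pixel and pack the 8 sign bits into one byte."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    code = np.zeros((h - 2, w - 2), dtype=np.int32)
    center = img[1:-1, 1:-1]
    # neighbor offsets in a fixed clockwise order starting top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        code += (neigh >= center).astype(np.int32) << bit
    return code.astype(np.uint8)
```

On a perfectly flat region every neighbor ties with the center, so every bit is set and the LBP value is 255 — which is why LBP is insensitive to uniform illumination shifts.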
The vein minutiae features are further extracted on the basis of a vein topological structure, but due to the influences of factors such as finger placement pressure, illumination intensity and equipment exposure, the acquired finger vein images are fuzzy and have low contrast, so that enough vein minutiae cannot be extracted, and the final identification performance is greatly influenced.
The traditional finger vein features are based on artificially well-designed features, are shallow in feature level, and have good recognition performance, but the traditional features usually have high requirements on image quality and are difficult to adapt to complex finger posture changes in practical application scenes.
In recent years, deep learning has developed rapidly, and deep convolutional neural networks have attracted wide attention and research from many scholars. They can extract shallow image features such as contours and corner points as well as deeper high-level semantic features; as a result, their application in fields such as face recognition and object detection has achieved great success and brought huge economic benefits to society. Tao Zhiyong et al. proposed an AlexNet-based finger vein recognition algorithm that speeds up network training, reduces model complexity, and achieves a good classification effect. However, these methods all adopt classification networks: their accuracy on fingers in the training set is high, but their generalization is poor, i.e. classification accuracy on untrained fingers drops sharply. Tang et al. proposed a two-stage network, divided into a basic network and an extended network, supervised with an optimized contrastive loss function, thereby training more discriminative finger vein features and improving finger vein identification performance. However, existing finger vein data sets contain few finger categories and lack finger vein images with rich postures for training, and most deep learning finger vein algorithms rely on the global features of the finger veins for recognition. Their network structures do not fully account for the fact that, when the finger posture changes, the overall vein structure of images of the same finger changes greatly while the locally salient vein structures change only slightly; consequently, the recognition rate remains low under finger posture changes.
Rapid breakthroughs in the performance of hardware such as GPUs have driven continuous progress in deep learning. At present, the accuracy of deep convolutional neural networks has surpassed humans and traditional solutions in various application scenarios. However, as deep convolutional neural networks grow ever more powerful and deeper, the required numbers of parameters and computations also increase significantly, bringing the problems of large model storage footprints and high power consumption. These problems limit the application of neural networks on resource-constrained embedded devices. Since finger vein recognition systems are usually developed and operated on embedded devices, compressing the network model is very necessary.
Therefore, the invention aims to design a suitable lightweight finger vein feature extraction network for the finger vein recognition task, based on lightweight network design strategies, and to improve the generalization capability of the lightweight network by combining the idea of knowledge distillation.
Disclosure of Invention
A finger vein identification method based on a lightweight network fusing global and local features is designed. First, a lightweight residual unit with a fast receptive field (SE-ErfResBlock) is designed using depthwise separable convolution, and FGL-MobileNet is built by stacking this unit, so that the network can rapidly and effectively expand the receptive field of the finger vein features and obtain richer finger vein detail features, making the extracted features more discriminative while greatly reducing the number of network parameters. To make the network attach more importance to learning the important features of the finger veins and pay less attention to useless information, SE-ErfResBlock adds a channel attention mechanism. Experimental results show that the proposed FGL-MobileNet effectively reduces the model size while achieving better recognition performance.
A finger vein identification method based on a lightweight fusion global and local feature network comprises the following steps:
Step 1, construct a data set and divide it into a training set and a test set. The divided training set is expanded using translation, brightness change, rotation, and noise addition operations to cover different vein variations and improve the generalization capability of the network model.
Step 2, design the lightweight global-and-local-feature fusion network FGL-MobileNet, built from the fast-receptive-field lightweight residual unit SE-ErfResBlock.
Step 3, design the FGL-Net network based on the fusion of global and local features, mainly comprising a backbone network based on an improved residual network together with global and local feature extraction modules.
Step 4, design the network loss function. To improve the learning and representation capability of the network model, a joint supervision of cross entropy loss and CurricularFace loss is adopted during FGL-Net training. Compared with FGL-Net, FGL-MobileNet has a shallower depth and a markedly smaller parameter count, so its learning capability is limited; training FGL-MobileNet directly reduces the generalization performance of the model, makes stable finger vein features hard to learn, and yields a low recognition rate. The invention therefore introduces a teacher-student scheme, adding a feature map loss and a knowledge distillation loss to the loss terms: FGL-Net serves as the teacher network and transfers its more accurate learned finger vein feature distribution, as far as possible, to the student network FGL-MobileNet, thereby improving the generalization performance of FGL-MobileNet.
Step 5, train the whole model until the entire training set has been iterated a number of times.
Step 6, input the test set images into the trained model to extract the finger vein features, then perform recognition and comparison.
The invention has the following beneficial effects:
First, a lightweight residual unit with a fast receptive field (SE-ErfResBlock) is designed using depthwise separable convolution, and FGL-MobileNet is built by stacking this unit, so that the network can rapidly and effectively expand the receptive field of the finger vein features and obtain richer finger vein detail features, making the extracted features more discriminative while greatly reducing the number of network parameters. To make the network attach more importance to learning the important features of the finger veins and pay less attention to useless information, SE-ErfResBlock adds a channel attention mechanism. Experimental results show that the proposed FGL-MobileNet effectively reduces the model size while achieving better recognition performance.
Drawings
FIG. 1 is a diagram illustrating the detailed information and partitioning of the data sets used in the present invention.
FIG. 2 shows examples of the finger vein images produced by data set augmentation.
FIG. 3 shows the lightweight residual unit with a fast receptive field (SE-ErfResBlock) used to construct the FGL-MobileNet network of the present invention.
FIG. 4 shows the channel attention mechanism in the SE-ErfResBlock residual unit.
FIG. 5 is the overall structure diagram of the FGL-MobileNet network model.
FIG. 6 shows the detailed network structure of FGL-MobileNet.
FIG. 7 shows the improved residual block in the FGL-Net network.
FIG. 8 illustrates the knowledge distillation loss, in which FGL-MobileNet is supervised by the final feature classification layers of FGL-Net.
FIG. 9 illustrates the feature map loss, in which the feature maps at different stages of FGL-Net guide the training of the corresponding feature maps in FGL-MobileNet.
FIG. 10 is a flow chart of model training.
FIG. 11 compares Top-1 ranking performance under training with different loss functions, verifying the effectiveness of the loss function of the FGL-MobileNet network of the present invention.
FIG. 12 shows the ROC curves of VGG-16, the improved residual network for finger veins, FGL-Net, MobileNet v3, FGL-MobileNet, and KD-FGL-MobileNet for finger vein feature extraction, where (a) shows the ROC curves of each network on the FV-USM test set and (b) shows the ROC curves on the FV-Normal test set.
FIG. 13 compares the Top-1 ranking performance of the different networks on the FV-USM and FV-Normal test sets.
FIG. 14 compares the model sizes of VGG-16, the improved residual network for finger veins, FGL-Net, MobileNet v3, and FGL-MobileNet.
FIG. 15 shows the zero-false-acceptance recognition rate and Top-1 ranking of VGG-16, the improved residual network for finger veins, FGL-Net, MobileNet v3, and FGL-MobileNet on the FV-Special test set.
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention specifically realizes the following steps:
s1, constructing a data set, the finger vein data set used herein comprising 3 parts: a subject group collected Normal finger pose dataset (FV-Normal), a Special finger pose dataset (FV-Special), and a university of Malaysia finger vein dataset (FV-USM).
(1) The FV-Normal data set contains vein information from 4600 fingers in the normal resting posture, with 6 images per finger, a total of 24000 images, at an image resolution of 200 × 500. The first 4000 fingers come from 500 people, each providing 8 fingers; the last 600 fingers come from 100 people, each providing 6 fingers. The training and test sets are divided 5:1 by subject: the finger vein images of the first 500 people are used for training, with the training set extended by rotation, translation, and similar operations, and the finger vein images of the last 100 people are used for testing.
(2) The FV-Special data set comprises 64 fingers, each captured in 9 placement postures: normal, heavy pressing, bending, moving up, moving down, in-plane left rotation, in-plane right rotation, axial left rotation, and axial right rotation. 10 images are collected per posture, for a total of 5760 images at a resolution of 200 × 500. This data set is used to test the recognition performance of finger vein recognition algorithms under different finger placement postures.
(3) The fingers of the FV-USM data set belong to 123 people, 492 fingers in total, with 12 finger vein images per finger, giving 5904 images; the ROI image resolution is 100 × 300. The training and test sets are divided 4:1: the vein images of the first 392 fingers are used for training, with the training set extended by rotation, translation, and similar operations, and the images of the remaining 100 fingers are used for testing. Fig. 1 shows the specific information of the 3 data sets.
As can be seen from FIG. 1, the FV-USM data set has only 392 fingers for training with only 4704 finger vein images, and although the FV-Normal data set has 4000 fingers for training, it has only 6 finger vein images per finger. The insufficient number of images in the finger vein training set causes two problems: (1) the trained finger vein feature extraction network easily overfits, and its generalization to finger vein images outside the training set is insufficient; (2) too few training samples mean that each finger appears in a single posture, so when the trained model is used for feature extraction, the extracted finger vein features are not robust to finger posture changes. It is therefore necessary to enhance the generalization capability of the network model by appropriately expanding the training set. The training set is augmented herein using the following conventional image processing techniques.
(1) Translation transformation
The finger vein image in a translated posture can be computed from the transformation matrix T and the original image, as shown in formulas (1.1) and (1.2):

[x_1, y_1, 1]^T = T · [x_0, y_0, 1]^T        (1.1)

T = [ 1  0  t_x ]
    [ 0  1  t_y ]
    [ 0  0  1   ]        (1.2)

where (x_0, y_0) are the coordinates of a pixel on the original finger vein image, (x_1, y_1) are the coordinates of the corresponding pixel on the translated image, and (t_x, t_y) are the translation distances of the image in the horizontal and vertical directions, respectively. The translated finger vein images are shown in fig. 2.
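A minimal NumPy sketch of the translation augmentation in formulas (1.1)-(1.2) follows. It is illustrative only; the function name and the zero fill value for vacated pixels are assumptions, not from the patent.

```python
import numpy as np

def translate(img, tx, ty, fill=0):
    """Shift an image by tx columns and ty rows, matching the affine
    translation x1 = x0 + tx, y1 = y0 + ty; vacated pixels get `fill`."""
    h, w = img.shape
    out = np.full_like(img, fill)
    # clip the source/destination windows so the shift stays in bounds
    src_y = slice(max(0, -ty), min(h, h - ty))
    src_x = slice(max(0, -tx), min(w, w - tx))
    dst_y = slice(max(0, ty), min(h, h + ty))
    dst_x = slice(max(0, tx), min(w, w + tx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out
```

Shifting right by one column moves every pixel one position over and fills the exposed left column.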
(2) Variation of brightness
Influenced by factors such as the user's finger placement pressure, ambient illumination intensity, and device exposure, the collected finger vein images often show inconsistent overall brightness, and the small number of training images offers too little brightness diversity, so the trained network model generalizes poorly. The overall brightness of a finger vein image can therefore be changed by adjusting the gray-scale factor of a gamma transform, enriching the brightness diversity of the training set images. The gamma transform is shown in formula (1.3):

g_1 = 255 · (g_0 / 255)^λ        (1.3)

where g_0 is the gray value of a point on the original finger vein image, g_1 is the pixel value after the brightness transform, and λ is the gray adjustment coefficient. When λ > 1, the overall gray value of the image decreases and the image looks relatively dark; conversely, when λ < 1, the overall gray value increases and the image looks relatively bright, achieving the goal of brightness adjustment. Fig. 2 shows the brightness variants of the finger vein image.
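The gamma transform of formula (1.3) can be sketched in NumPy as follows (illustrative only; the normalization to [0, 1] before exponentiation is the standard convention assumed here):

```python
import numpy as np

def gamma_adjust(img, lam):
    """Gamma transform g1 = 255 * (g0/255)**lam:
    lam > 1 darkens the image, lam < 1 brightens it."""
    g = img.astype(np.float64) / 255.0
    return np.clip(255.0 * np.power(g, lam), 0, 255).astype(np.uint8)
```

For a mid-gray pixel of 128, λ = 2 maps it down to 64, while λ = 0.5 lifts it above 128.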
(3) Rotational transformation
The rotation transform is used to simulate finger vein images in in-plane rotated postures. The conversion formula is the same as formula (1.1), with the transformation matrix given by formula (1.4):

T = [ cos θ   -sin θ   x_c(1 - cos θ) + y_c · sin θ ]
    [ sin θ    cos θ   y_c(1 - cos θ) - x_c · sin θ ]
    [ 0        0       1                            ]        (1.4)

where (x_c, y_c) are the coordinates of the center pixel of the original finger vein image and θ is the rotation angle, taking values of ±2° and ±4°. Fig. 2 shows the finger vein images after ±2° rotation.
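A NumPy sketch of rotation about the image center, as in formula (1.4), using inverse nearest-neighbor mapping (an illustrative implementation choice; the patent does not specify the interpolation):

```python
import numpy as np

def rotate(img, theta_deg, fill=0):
    """Rotate an image by theta_deg about its center.
    Each output pixel is mapped back through the inverse rotation
    and sampled with nearest-neighbor interpolation."""
    h, w = img.shape
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    th = np.deg2rad(theta_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse rotation: where did this output pixel come from?
    x0 = np.cos(th) * (xs - xc) + np.sin(th) * (ys - yc) + xc
    y0 = -np.sin(th) * (xs - xc) + np.cos(th) * (ys - yc) + yc
    x0 = np.rint(x0).astype(int)
    y0 = np.rint(y0).astype(int)
    valid = (x0 >= 0) & (x0 < w) & (y0 >= 0) & (y0 < h)
    out = np.full_like(img, fill)
    out[valid] = img[y0[valid], x0[valid]]
    return out
```

Rotating by 0° returns the image unchanged, which is a quick sanity check on the mapping.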
(4) Image noise adding
During image acquisition, foreign matter such as dust may fall on the lens of the acquisition device, causing noise in the captured vein images. Random salt-and-pepper noise is added to simulate such dust-like noise, increasing the difficulty of network training and improving the noise resistance and feature extraction capability of the network. Fig. 2 shows a finger vein image with salt-and-pepper noise added.
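The salt-and-pepper corruption can be sketched as follows (illustrative; the noise fraction and the seeded generator are assumptions):

```python
import numpy as np

def add_salt_pepper(img, amount=0.02, rng=None):
    """Corrupt a fraction `amount` of pixels, half with pepper (0)
    and half with salt (255), simulating dust-like sensor noise."""
    rng = np.random.default_rng(rng)
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0                          # pepper
    out[(mask >= amount / 2) & (mask < amount)] = 255   # salt
    return out
```

With amount=1.0 every pixel is corrupted, which makes the behavior easy to verify.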
S2, designing the lightweight global-and-local-feature fusion network FGL-MobileNet, constructed from the fast-receptive-field lightweight residual unit (SE-ErfResBlock), to which a channel attention mechanism is added. The designed SE-ErfResBlock is shown in FIG. 3; the channel attention mechanism is the SE module shown in FIG. 4, which comprises three parts: a squeeze operation, an excitation operation, and a reweighting operation. Assume the input feature map is x with dimensions W × H × C_1. The first step is the squeeze operation: GAP is applied to the input x, compressing the finger vein spatial features on each channel into one real number, at which point the feature map size becomes 1 × 1 × C_1. Each real number represents the global finger vein feature on one channel, i.e. it has a global vein receptive field. The squeeze operation alone does not learn the relative importance of the different channels, so a second step, the excitation operation, is required. It contains two fully connected layers: the first reduces the dimensionality of the squeeze output to cut the number of parameters, and the second restores the original dimensionality. This process learns the nonlinear relationships among the different feature channels, yielding a weight matrix that represents the importance of each channel. The third step is the reweighting operation: the channel importance weights output by the excitation operation are multiplied onto the corresponding channels of the input feature map, thereby enhancing important finger vein features and suppressing the expression of useless information.
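The three SE steps above can be sketched in NumPy as follows. This is a minimal illustration of the mechanism, not the patent's implementation: the weight matrices w1 (reduction) and w2 (restoration) stand in for the two learned fully connected layers, and the ReLU-then-sigmoid activations follow the common SE convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map:
    squeeze  = global average pool per channel,
    excite   = two FC layers (reduce then restore) + sigmoid,
    reweight = scale each channel by its learned importance."""
    s = x.mean(axis=(1, 2))                      # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))    # excite: (C,) in (0, 1)
    return x * e[:, None, None]                  # channel-wise reweighting
```

Because the excitation output lies in (0, 1), the block can only attenuate channels, never amplify them — useless channels are suppressed while important ones pass nearly unchanged.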
The structure of the FGL-MobileNet network is shown in FIG. 5, and the detailed network structure in FIG. 6. FGL-MobileNet is mainly divided into a backbone network and a global and local feature extraction module, with the backbone composed chiefly of lightweight residual units. Specifically, it comprises a 5-layer convolution structure: Conv_1 consists of 1 batch normalization layer, 1 Mish activation layer, and 1 3 × 3 standard convolution layer; Conv_2 to Conv_4 consist of 2, 2, and 4 SE-ErfResBlock units respectively; and Conv_5 is divided into 3 branches, each composed of 2 SE-ErfResBlock units, learning respectively the global finger vein features and the locally salient vein features at the corresponding granularity. To obtain higher-resolution feature maps for blocking, the downsampling operation is removed in Branch-2 and Branch-3. In the global and local feature extraction module, global vein features are extracted by Global Average Pooling (GAP) and locally salient vein features by Global Max Pooling (GMP); the extracted global and local features are reduced in dimensionality by a 1 × 1 convolution layer, and the finger vein fusion feature vector is obtained by feature concatenation: 5 256-dimensional local features L and 3 256-dimensional global features G are concatenated into a 2048-dimensional fusion feature vector F.
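The global/local head can be sketched as follows: GAP on each branch gives a global vector, GMP over horizontal stripes gives local vectors, and everything is concatenated. Illustrative only — the per-branch stripe counts (1, 2, 2), chosen so that 3 global + 5 local vectors of 256 dimensions concatenate to 2048, are an assumption; the patent does not state how the 5 local features are distributed across branches, and the 1 × 1 dimensionality-reduction convolution is elided here.

```python
import numpy as np

def fuse_features(branch_maps, n_blocks=(1, 2, 2)):
    """For each branch feature map (C, H, W): GAP -> one global vector,
    GMP over n_blocks horizontal stripes -> local vectors.
    All vectors are concatenated into the fusion feature."""
    feats = []
    for fmap, nb in zip(branch_maps, n_blocks):
        feats.append(fmap.mean(axis=(1, 2)))        # global feature: GAP
        stripes = np.array_split(fmap, nb, axis=1)  # block along height
        for s in stripes:
            feats.append(s.max(axis=(1, 2)))        # local feature: GMP
    return np.concatenate(feats)
```

With three 256-channel branch maps this yields 8 vectors of 256 dimensions each, i.e. the 2048-dimensional fusion vector F described above.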
S3, designing the FGL-Net network based on the fusion of global and local features, mainly comprising a backbone network based on an improved residual network together with global and local feature extraction modules. The improved residual block is shown in fig. 7. The design of FGL-Net is the same as that of FGL-MobileNet, except that the backbone adopts a ResNet50 structure comprising a 5-layer convolution structure: Conv_T1 consists of 1 convolution layer with a 3 × 3 kernel, a BN layer, and a Mish activation layer, and Conv_T2 to Conv_T5 consist of 3, 4, 14, and 3 improved residual blocks respectively. Global and local features are extracted by the same global and local feature extraction module designed for FGL-MobileNet, and FGL-Net finally outputs a 2048-dimensional finger vein fusion feature vector.
S4, designing the loss function. To improve the learning and representation capability of the network model during FGL-Net training, a joint supervision of cross entropy loss and CurricularFace loss is adopted. The calculation formula is shown in formula (4.1):

L = L_CrossEntropy + L_CurricularFace        (4.1)

where L_CrossEntropy is the standard cross entropy loss and L_CurricularFace is the CurricularFace loss. The cross entropy loss is given by formula (4.2):

L_CrossEntropy = -(1/N) · Σ_{i=1}^{N} log( exp(W_{y_i}^T f_i + b_{y_i}) / Σ_{j=1}^{n} exp(W_j^T f_i + b_j) )        (4.2)

where N is the number of images in a training batch, n is the number of finger vein classes in the training set, f_i is the feature vector of the i-th finger vein image, y_i denotes the class of f_i, W_j denotes the j-th column of the weight matrix W, and b_j, b_{y_i} are bias terms.
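Formula (4.2) can be computed directly as a softmax over the class logits followed by the mean negative log-likelihood. A minimal NumPy sketch (illustrative; variable names are assumptions):

```python
import numpy as np

def cross_entropy_loss(F, W, b, y):
    """Softmax cross entropy of formula (4.2).
    F: (N, d) batch of feature vectors, W: (d, n) class weights,
    b: (n,) biases, y: (N,) integer class labels."""
    logits = F @ W + b                              # (N, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)               # softmax probabilities
    idx = np.arange(F.shape[0])
    return -np.mean(np.log(p[idx, y]))              # mean -log p(correct class)
```

With all-zero logits the prediction is uniform over n classes and the loss equals log(n), a handy sanity check.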
The invention introduces the CurricularFace loss in order to enlarge the inter-class differences of the finger vein feature vectors, reduce the intra-class differences, and at the same time mine special-posture finger samples adaptively online, so that the network better adapts to posture changes such as translation and axial rotation. The calculation formula is shown in formula (4.3):

L_CurricularFace = -(1/N) · Σ_{i=1}^{N} log( exp(s·cos(θ_{y_i} + m)) / ( exp(s·cos(θ_{y_i} + m)) + Σ_{j≠y_i} exp(s·N(t, cos θ_j)) ) )        (4.3)

where N is the number of images in a training batch, s is the radius of the hypersphere, m is the additive angular margin penalty, f_i is the feature vector of the i-th finger vein image, y_i denotes the class of f_i, θ_{y_i} denotes the angle between W_{y_i} and f_i, and N(t, cos θ_j) is the similarity function for the negative classes, shown in formula (4.4):

N(t, cos θ_j) = cos θ_j,                   if cos(θ_{y_i} + m) ≥ cos θ_j
N(t, cos θ_j) = cos θ_j · (t + cos θ_j),   otherwise        (4.4)

t^(k) = α · r^(k) + (1 − α) · t^(k−1)        (4.5)

where t is an adaptively estimated parameter with t^(0) = 0, α is the momentum, and r^(k) is the mean positive cosine similarity of the k-th mini-batch. When a finger vein image is classified correctly, it is regarded as a normal-posture image, and N(t, cos θ_j) = cos θ_j; when it is classified incorrectly, it is regarded as a special-posture image, and N(t, cos θ_j) = cos θ_j(t + cos θ_j). In the early stage of training, t approaches 0 and (t + cos θ_j) < 1, so normal fingers carry more weight than special-posture fingers and the network pays more attention to normal fingers, which helps accelerate model convergence. As training progresses, the positive cosine similarity of the samples grows, i.e. r^(k) keeps increasing and t^(k) with it, until finally (t + cos θ_j) > 1; special-posture fingers then carry more weight than normal fingers, and the network focuses more on training them, thereby better adapting to posture changes such as finger translation and axial rotation.
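The curriculum mechanism of formulas (4.4)-(4.5) reduces to a few lines. This is a sketch of the modulation only, not the full loss; the margin m = 0.5 and momentum α = 0.99 defaults are assumptions (the patent does not give values).

```python
import numpy as np

def curricular_modulation(cos_pos, cos_neg, t, m=0.5):
    """N(t, cos θ_j) of formula (4.4): easy negatives keep cos θ_j,
    hard negatives (exceeding the margin-penalized positive cosine)
    are re-weighted by the factor (t + cos θ_j)."""
    thresh = np.cos(np.arccos(cos_pos) + m)   # cos(θ_yi + m)
    hard = cos_neg > thresh
    return np.where(hard, cos_neg * (t + cos_neg), cos_neg)

def update_t(t_prev, r_k, alpha=0.99):
    """EMA update of formula (4.5): t(k) = α·r(k) + (1−α)·t(k−1)."""
    return alpha * r_k + (1.0 - alpha) * t_prev
```

Early in training t ≈ 0, so a hard negative with cos θ_j = 0.95 is modulated down to 0.95² ≈ 0.90 (weight below 1); once t has grown, (t + cos θ_j) exceeds 1 and hard samples are amplified instead.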
FGL-MobileNet and FGL-Net share the same overall network framework design, so the original training loss function of FGL-MobileNet is consistent with that of FGL-Net, as in formula (4.6):

Loss_Origin = Loss_CrossEntropy + Loss_CurricularFace        (4.6)

where Loss_CrossEntropy is the cross entropy loss and Loss_CurricularFace is the CurricularFace loss. The global feature vectors and the finger vein fusion feature vector both contain global vein information and are highly discriminative, so they are constrained with the CurricularFace loss; the blocked local features may suffer from misalignment, so the local feature vectors are constrained with the cross entropy loss, which still guarantees inter-class separation.
To let the network learn both global vein information and local vein information at different granularities and improve its recognition performance on different vein images, FGL-Net and FGL-MobileNet each contain 3 branches, producing 3 global features, 5 local features, and 1 finger vein fusion feature. As shown in FIG. 8, the invention performs knowledge distillation from the classification vectors of the 9 finger vein features of FGL-Net to the classification vectors of the 9 finger vein features of FGL-MobileNet, so that FGL-MobileNet fully imitates the output probability distribution of FGL-Net, better extracts the global and local finger vein features at different granularities, and becomes more robust to fingers in special postures. The knowledge distillation loss is given by formula (4.7):

Loss_KD = Σ_i Loss_S(G_si, G_ti) + Σ_{i,j} Loss_S(L_si-j, L_ti-j) + Loss_S(G_sF, G_tF)        (4.7)

where Loss_S denotes the distillation loss between a pair of finger vein classification vectors, Loss_KD denotes the total network knowledge distillation loss, G_si and G_ti denote the classification vectors of the global features of the i-th branches of FGL-MobileNet and FGL-Net respectively, G_sF and G_tF denote the classification vectors of the finger vein fusion features of FGL-MobileNet and FGL-Net respectively, and L_si-j and L_ti-j denote the classification vectors of the j-th local feature in the i-th branch of FGL-MobileNet and FGL-Net respectively; the sums run over the 3 global features and the 5 local features.
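A sketch of the distillation computation follows. The patent does not spell out the form of Loss_S, so the temperature-softened KL divergence used here — the most common choice for logit distillation — and the temperature T = 4 are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=4.0):
    """A common choice for the per-vector term Loss_S: KL divergence
    between temperature-softened class distributions, scaled by T^2."""
    ps = softmax(np.asarray(student_logits) / T)
    pt = softmax(np.asarray(teacher_logits) / T)
    return float(np.sum(pt * (np.log(pt) - np.log(ps))) * T * T)

def total_kd_loss(student_vecs, teacher_vecs, T=4.0):
    """Loss_KD of formula (4.7): sum of Loss_S over the 9 paired
    classification vectors (3 global, 5 local, 1 fusion)."""
    return sum(distill_loss(s, t, T) for s, t in zip(student_vecs, teacher_vecs))
```

When the student's logits exactly match the teacher's, every KL term is zero, so the loss vanishes only at perfect imitation.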
The FGL-MobileNet is supervised only by the last characteristic classification layer of the FGL-Net, and the problem of insufficient constraint force exists, so that the performance improvement of the FGL-MobileNet is limited. The backbone network is responsible for learning the mapping relation from the finger vein original image to the high-level semantic features of the finger vein. Because the FGL-Net network has deeper hierarchy, more parameter quantity and stronger learning capability, the finger vein information contained in the extracted feature map is more accurate and abundant, and the mapping can be better completed. The FGL-Net and FGL-MobileNet main networks respectively comprise 4 residual convolution layers from a shallow layer to a deep layer, 6 intermediate feature maps are generated in total, and finger vein feature information of different stages of the networks is extracted. Therefore, training of the feature map of the corresponding stage in the FGL-MobileNet is guided through the intermediate feature maps of different stages of the FGL-Net, the FGL-MobileNet is more powerfully restricted to continuously learn more effective finger vein feature representation of the FGL-Net from a shallow layer to a deep layer, and mapping from a finger vein original image to a finger vein high-level semantic feature is better completed. In order to better highlight the vein texture information, the distance between feature maps is measured by using L2 Loss. The loss of the characteristic diagram is shown in FIG. 9, which has the formula (4.8):
Loss_f = Σ_{i=1}^{6} ||U_i − V_i||₂²   (4.8)
where Loss_f denotes the feature map loss, U_i denotes the i-th feature map of the student network, and V_i denotes the i-th feature map of the teacher network.
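As a concrete illustration, formula (4.8) can be sketched in a few lines of NumPy. The sum-of-squared-L2-distances reduction over the feature-map pairs follows the formula above; the function and variable names are our own and not from the patent:

```python
import numpy as np

def feature_map_loss(student_maps, teacher_maps):
    # Equation (4.8): sum of squared L2 distances between the i-th
    # student feature map U_i and the i-th teacher feature map V_i.
    assert len(student_maps) == len(teacher_maps)
    total = 0.0
    for u, v in zip(student_maps, teacher_maps):
        diff = np.asarray(u, float) - np.asarray(v, float)
        total += float(np.sum(diff ** 2))
    return total
```

With identical student and teacher maps the loss is zero, and it grows with the squared element-wise difference otherwise.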
Therefore, the overall FGL-MobileNet loss function formula is shown in equation (4.9):
Loss_sum = α × Loss_Origin + (1 − α) × Loss_KD + λ × Loss_f   (4.9)
where Loss_sum denotes the total loss function, and α and λ are loss weighting coefficients that adjust the weights of the different loss terms. So that FGL-MobileNet can better learn the finger vein feature representation of the teacher network, they are set here to α = 0.05 and λ = 1.
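The weighted combination of formula (4.9) amounts to a one-line function; this sketch uses the α = 0.05 and λ = 1 values stated in the text as defaults (the argument names are ours):

```python
def total_loss(loss_origin, loss_kd, loss_f, alpha=0.05, lam=1.0):
    # Equation (4.9): alpha trades the original loss off against the
    # knowledge distillation loss; lam weights the feature map loss.
    return alpha * loss_origin + (1 - alpha) * loss_kd + lam * loss_f
```

Note that the original loss and distillation loss share one convex weight α, so increasing the emphasis on the teacher necessarily decreases the weight of the original supervision.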
S5, training the whole model, as shown in FIG. 10, until the whole training set has been iterated a number of times.
S6, inputting the test set images into the trained model to extract the finger vein features and performing recognition comparison.
The method adopts the Euclidean distance as the measure of vein feature similarity: the smaller the Euclidean distance, the higher the similarity of the two vein features, and vice versa. The finger vein features of the training set are first compared 1:1 across different classes to obtain a threshold T; the finger vein features of the test set are then likewise compared 1:1 within the same class. If the comparison result is smaller than the corresponding threshold, the comparison is judged successful; otherwise, it is judged failed.
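The matching rule above reduces to a distance computation and a threshold test; a minimal sketch (function names and the example threshold are ours, not the patent's):

```python
import numpy as np

def euclidean_distance(a, b):
    # Smaller distance means higher finger vein feature similarity.
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def is_match(feat_a, feat_b, threshold):
    # Accept the 1:1 comparison when the distance falls below threshold T.
    return euclidean_distance(feat_a, feat_b) < threshold
```

In practice `feat_a` and `feat_b` would be the 2048-dimensional fusion feature vectors produced by the network, and T would be chosen from the inter-class comparison distribution (e.g. at the zero-false-recognition operating point).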
The server GPU used in the experiments is an NVIDIA TITAN X, the CUDA version is 10.1, the operating system is 64-bit Ubuntu 16.04, the deep learning framework is PyTorch 1.7.1, and the programming language is Python 3.7.9. In the training stage, the batch size of finger vein images is set to 16 and training runs for 16 epochs; the initial learning rate is 0.0001 and decays by a factor of 10 every 5 epochs; the network parameters are updated with the Adaptive Moment Estimation (Adam) optimizer; the finger vein feature vector is 2048-dimensional; and the normalized image size is 112 × 112.
To verify the effectiveness of the designed FGL-MobileNet loss function, 3 groups of comparative experiments were designed on the FV-USM data set. In the group 1 experiment, the network is constrained only by the original FGL-MobileNet loss; the group 2 experiment adds the distillation loss on top of group 1; and the group 3 experiment adds both the distillation loss and the feature map loss on top of group 1, i.e. the loss function designed here. The experimental results are shown in FIG. 11. Compared with the original loss, adding the distillation loss improves the zero-false-recognition rate of the network by 0.92% and the Top1 ranking performance by 0.12%, showing that with the distillation loss FGL-MobileNet can learn the finger vein feature distribution output by the teacher model and thereby improve recognition performance. After adding both the knowledge distillation loss and the feature map loss, the zero-false-recognition rate of the network improves by 2.14% over the original loss and the Top1 ranking performance by 0.17%; the recognition performance of FGL-MobileNet is thus further improved once the feature map loss is added, because the feature map loss more strongly constrains the network, from shallow layers to deep layers, to approximate the teacher model and so learn the teacher's finger vein feature representation more effectively.
To verify the effectiveness of FGL-MobileNet, the following groups of comparative experiments were set up, where FGL-MobileNet denotes training with only the original loss and KD-FGL-MobileNet denotes training with the loss function designed in the invention.
(1) Comparison of Performance on FV-USM test set and FV-Normal test set
The VGG-16, the finger vein improved residual network, the FGL-Net, the MobileNetv3, the FGL-MobileNet and the KD-FGL-MobileNet are each used to extract finger vein features; the ROC curves of the networks are shown in FIG. 12. As the ROC curves show, when FAR = 0, the zero-false-recognition rates of FGL-MobileNet on the FV-USM and FV-Normal test sets improve on VGG-16, the finger vein improved residual network, and MobileNetv3 by 4.23%, 10.98%, and 9.13% and by 8.48%, 16.69%, and 6.21% respectively, indicating that FGL-MobileNet extracts finger vein feature information better; they are 2.2% and 1.90% lower than FGL-Net, indicating that the shallower network and smaller parameter count weaken the generalization performance of FGL-MobileNet. However, the zero-false-recognition rate of KD-FGL-MobileNet is 2.14% and 1.77% higher than that of FGL-MobileNet on the FV-USM and FV-Normal test sets, and differs from FGL-Net by only 0.02% and 0.13%, indicating that the loss function designed here effectively transfers the finger vein feature distribution learned by FGL-Net to FGL-MobileNet.
As can be seen from FIG. 13, the Top1 ranking performance of FGL-MobileNet on the FV-USM and FV-Normal test sets improves on VGG-16, the finger vein improved residual network, and MobileNetv3 by 1.34%, 4.10%, and 3.82% and by 1.58%, 8.09%, and 4.79% respectively. Compared with FGL-MobileNet, KD-FGL-MobileNet improves the Top1 ranking performance by 0.19% and 0.17% on the FV-USM and FV-Normal test sets, and differs from FGL-Net by only 0.02% and 0.00%, which shows that the FGL-MobileNet designed here and its loss function effectively improve the Top1 ranking performance of finger vein recognition.
As can be seen from FIG. 14, the model size of FGL-MobileNet is reduced 17-fold compared with FGL-Net, to only 12.25 M. Its parameter count is also greatly reduced compared with VGG-16 and the finger vein improved residual network, and is only slightly larger than that of the MobileNetv3 model, which greatly improves the practicability of FGL-MobileNet.
(2) Performance comparison on FV-Special test set
In order to verify the recognition performance and the Top1 sorting performance of the FGL-MobileNet on the finger vein images in the special postures, the MobileNet v3, the FGL-Net, the FGL-MobileNet and the KD-FGL-MobileNet are respectively used for feature extraction, and the zero-misrecognition recognition rate and the Top1 sorting of each network in different finger postures are shown in FIG. 15.
As can be seen from FIG. 15, under different finger placement postures, the average zero-false-recognition rate and average Top1 ranking performance of FGL-MobileNet are 17.93% and 8.68% higher than those of MobileNetv3 respectively, indicating that the finger vein features extracted by FGL-MobileNet are more robust to finger posture changes. Compared with FGL-Net, the average zero-false-recognition rate and average Top1 ranking performance of FGL-MobileNet drop by 6.19% and 3.78% respectively, a considerable loss; but when trained with the loss function designed here, KD-FGL-MobileNet performs almost identically to FGL-Net, indicating that this loss function effectively transfers the robustness of FGL-Net to finger posture changes to FGL-MobileNet and markedly improves the recognition performance of FGL-MobileNet on special-posture finger vein images.

Claims (8)

1. A finger vein identification method based on a lightweight fusion global and local feature network is characterized by comprising the following steps:
step 1, constructing a data set, and dividing the data set into a training set and a test set; expanding the divided training set by using translation, brightness change, rotation and noise adding operations;
step 2, designing a lightweight fusion global and local feature network FGL-MobileNet based on a lightweight residual error unit;
step 3, designing an FGL-Net network based on fusion of global and local characteristics, which mainly comprises a backbone network based on an improved residual error network and a global characteristic and local characteristic extraction module;
step 4, designing the network loss function: in the FGL-Net training process, the network is jointly supervised by the cross-entropy loss and the CurricularFace loss; for the FGL-MobileNet network, a teacher-student mode is introduced, adding a feature map loss and a knowledge distillation loss to the loss terms, with FGL-Net serving as the teacher network that teaches its more accurate learned finger vein feature distribution, as far as possible, to the student network FGL-MobileNet, thereby improving the generalization performance of FGL-MobileNet;
step 5, training the whole FGL-MobileNet network model until the whole training set is iteratively trained for a plurality of times;
step 6, inputting the test set images into the trained FGL-MobileNet network model to extract the finger vein features, and performing recognition comparison on the finger vein features.
2. The finger vein identification method based on the lightweight fusion global and local feature network according to claim 1, wherein the step 2 is implemented as follows:
FGL-MobileNet is constructed based on a lightweight residual error unit SE-ErfResBlock of a rapid receptive field, and a channel attention mechanism is added to the SE-ErfResBlock, wherein the channel attention mechanism is a SE module and comprises three parts: a compression operation, an excitation operation, and a weighting operation.
3. The finger vein identification method based on the lightweight fusion global and local feature network according to claim 2, wherein the SE module is specifically implemented as follows: let the input feature map be x, with dimensions W × H × C1; the first step is the compression operation, which applies GAP to the input x to compress the finger vein spatial features on each channel into a single real number, at which point the feature map size becomes 1 × 1 × C1; the second step is the excitation operation, which comprises two fully connected layers: the first fully connected layer reduces the dimension of the output of the compression operation, and the second fully connected layer restores the original dimension, yielding a weight matrix representing the importance of each channel; the third step is the weighting operation, which multiplies the channel-importance weight matrix output by the excitation operation with the corresponding channels of the input feature map, thereby enhancing important finger vein features and suppressing the expression of useless information.
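The compression, excitation, and weighting steps of the SE module can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: the weight matrices `w1` (dimension-reducing) and `w2` (dimension-restoring) are assumed to be given, and ReLU/sigmoid activations are assumed for the two fully connected layers, as is conventional for SE blocks:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    # Squeeze: global average pooling collapses each channel of the
    # (H, W, C) input to one real number -> shape (C,).
    squeezed = x.mean(axis=(0, 1))
    # Excite: FC layer w1 reduces the dimension, FC layer w2 restores it;
    # the sigmoid yields a per-channel importance weight in (0, 1).
    hidden = np.maximum(squeezed @ w1, 0.0)   # ReLU
    weights = sigmoid(hidden @ w2)            # shape (C,)
    # Weight: scale each input channel by its importance weight.
    return x * weights                        # broadcast over H and W
```

With all-zero weight matrices every channel weight is sigmoid(0) = 0.5, so the block halves the input, which makes the data flow easy to check.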
4. The finger vein recognition method based on the lightweight fusion global and local feature network of claim 2, wherein the FGL-MobileNet network structure mainly comprises a backbone network and a global and local feature extraction module; the backbone network mainly consists of lightweight residual units and specifically comprises a 5-layer convolution structure: Conv_1 is composed of 1 batch normalization layer, 1 Mish activation layer and 1 standard 3 × 3 convolution layer; Conv_2 to Conv_4 are composed of 2, 2 and 4 SE-ErfResBlock units respectively; Conv_5 is divided into 3 branches, each composed of 2 SE-ErfResBlock units, which learn the global finger vein features and the locally salient vein features at the corresponding granularity respectively.
5. The finger vein identification method based on the lightweight fusion global and local feature network of claim 4 is characterized in that in the global and local feature extraction module, global vein features are extracted through global average pooling, local significant vein features are extracted through global maximum pooling, the extracted global features and local features are subjected to dimension reduction through a layer of 1 x 1 convolution, and finger vein fusion feature vectors are obtained through feature splicing; 5 256-dimensional local features L and 3 256-dimensional global features G are spliced to obtain 2048-dimensional fusion feature vectors F.
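The feature splicing described in claim 5 is a simple concatenation once the 1 × 1 convolution has reduced each feature to 256 dimensions; a minimal sketch (assuming the 256-d inputs are already available, with our own function name):

```python
import numpy as np

def fuse_features(global_feats, local_feats):
    # Claim 5: splice 3 global and 5 local 256-d feature vectors
    # (each already reduced by a 1x1 convolution) into the
    # 2048-d finger vein fusion feature vector F.
    assert len(global_feats) == 3 and len(local_feats) == 5
    parts = [np.asarray(v, float) for v in list(global_feats) + list(local_feats)]
    return np.concatenate(parts)
```

The check that 3 + 5 vectors of 256 dimensions yield 8 × 256 = 2048 dimensions matches the fusion vector size stated in the claim.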
6. The finger vein recognition method based on the lightweight fusion global and local feature network of claim 1, which is characterized in that the FGL-Net network based on the fusion global and local features mainly comprises a backbone network based on an improved residual error network and a global feature and local feature extraction module; the design structure of the FGL-Net network is the same as that of the FGL-MobileNet network, the main network of the FGL-Net network adopts a ResNet50 structure and comprises a 5-layer convolution structure, Conv _ T1 comprises 1 convolution layer with convolution kernel size of 3 multiplied by 3, a BN layer and a Mish activation layer, and Conv _ T2 to Conv _ T5 respectively comprise 3, 4, 14 and 3 improved residual blocks; and finally outputting a 2048-dimensional finger vein fusion feature vector by the FGL-Net.
7. The finger vein recognition method based on the lightweight fusion global and local feature network of claim 1, wherein the loss function is designed as follows:
in the FGL-Net network training process, the network is jointly supervised by the cross-entropy loss and the CurricularFace loss, with the calculation formula shown in formula (4.1):
L = L_CrossEntropy + L_CurricularFace   (4.1)
where L_CrossEntropy is the standard cross-entropy loss and L_CurricularFace is the CurricularFace loss;
L_CrossEntropy = −(1/N) Σ_{i=1}^{N} log( e^{W_{y_i}^T f_i + b_{y_i}} / Σ_{j=1}^{n} e^{W_j^T f_i + b_j} )   (4.2)
where N is the number of images in a batch of the training set, n is the number of vein image classes in the training set, f_i is the feature vector of the i-th finger vein image, y_i denotes the class corresponding to f_i, W_j denotes the j-th column of the weight matrix W, and b_j is the bias term;
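A numerically stable sketch of the standard cross-entropy of formula (4.2) in NumPy, where each row of `logits` plays the role of the class scores W_j^T f_i + b_j (names and the log-sum-exp shift are ours):

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    # Equation (4.2): mean negative log-probability of the true class.
    logits = np.asarray(logits, float)
    # Subtracting the row max leaves the softmax unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

For a two-class sample with equal scores the loss is log 2, the entropy of a uniform guess, which is a handy sanity check.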
the formula for calculating the CurrickiarFace loss is shown as the formula (4.3):
Figure FDA0003436963160000033
in the formula, N is the number of images in a batch in the training set, s is the radius of the hypersphere, and m is the angleDegree increment penalty term, fiIs the feature vector of the ith finger vein image, yiDenotes fiThe corresponding category of the content file,
Figure FDA0003436963160000034
to represent
Figure FDA0003436963160000035
And fiAngle between, N (t, cos θ)j) The negative cosine similarity function is expressed, and the formula is shown as the formula (4.4):
Figure FDA0003436963160000036
t^(k) = α·r^(k) + (1 − α)·t^(k−1)   (4.5)
where t is the adaptive estimation parameter with t^(0) = 0, α is the momentum, and r^(k) is the mean positive cosine similarity of the k-th mini-batch. When a finger vein image is classified correctly, it is regarded as a normal-posture finger vein image, and N(t, cosθ_j) = cosθ_j; when it is classified incorrectly, it is regarded as a special-posture finger vein image, and N(t, cosθ_j) = cosθ_j(t + cosθ_j). In the early stage of training, t approaches 0 and (t + cosθ_j) < 1, so normal-posture fingers carry more weight than special-posture fingers; the network focuses more on normal fingers, which accelerates model convergence. As training proceeds, the positive cosine similarity of the samples becomes higher and higher, i.e. r^(k) keeps increasing and t^(k) keeps increasing, until finally (t + cosθ_j) > 1, i.e. special-posture fingers carry more weight than normal-posture fingers; the network then pays more attention to training on special-posture fingers and can better adapt to changes in finger translation posture and axial rotation posture.
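The adaptive parameter update of formula (4.5) and the modulation of formula (4.4) are each one-liners; a sketch follows, where the default momentum α = 0.99 is an assumed typical value and the boolean flag stands in for the classified-correctly test described above:

```python
def update_t(t_prev, r_k, alpha=0.99):
    # Equation (4.5): exponential moving average of the batch mean
    # positive cosine similarity r^(k), with momentum alpha.
    return alpha * r_k + (1 - alpha) * t_prev

def n_modulate(t, cos_theta_j, correctly_classified):
    # Equation (4.4): easy (correctly classified) negatives keep
    # cos(theta_j); hard ones are re-weighted by (t + cos(theta_j)).
    if correctly_classified:
        return cos_theta_j
    return cos_theta_j * (t + cos_theta_j)
```

Early in training t ≈ 0, so a hard negative with cosθ_j = 0.5 is down-weighted to 0.25; as t grows past 1 − cosθ_j, the same negative is amplified instead, exactly the curriculum behavior described in the text.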
8. The finger vein recognition method based on the lightweight fusion global and local feature network of claim 7, wherein FGL-MobileNet has the same overall network framework design as FGL-Net, so the original training loss function of FGL-MobileNet is consistent with that of FGL-Net, and the formula is as shown in formula (4.6):
Loss_Origin = Loss_CrossEntropy + Loss_CurricularFace   (4.6)
where Loss_CrossEntropy is the cross-entropy loss and Loss_CurricularFace is the CurricularFace loss; the global feature vector and the finger vein fusion feature vector both contain global vein information and are highly discriminative, so they are constrained with the CurricularFace loss; because the block-wise local features may be misaligned, the local feature vectors are instead constrained with the cross-entropy loss, which guarantees inter-class separability;
FGL-Net and FGL-MobileNet each comprise 3 branches, containing in total 3 global features, 5 local features and 1 finger vein fusion feature; knowledge distillation is performed on the classification vectors of the 9 finger vein features of FGL-MobileNet using the classification vectors of the 9 corresponding finger vein features of FGL-Net, so that FGL-MobileNet fully imitates the output probability distribution of FGL-Net; the knowledge distillation loss is given by formula (4.7):
Loss_KD = Σ_{i=1}^{3} Loss_S(G_si, G_ti) + Σ_{i,j} Loss_S(L_si-j, L_ti-j) + Loss_S(G_sF, G_tF)   (4.7)
where Loss_S denotes the distillation loss between two finger vein classification vectors, Loss_KD denotes the total knowledge distillation loss of the network, G_si and G_ti denote the classification vectors of the global feature of the i-th branch of FGL-MobileNet and FGL-Net respectively, G_sF and G_tF denote the classification vectors of the finger vein fusion features of FGL-MobileNet and FGL-Net respectively, and L_si-j and L_ti-j denote the classification vectors of the j-th local feature in the i-th branch of FGL-MobileNet and FGL-Net respectively;
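The structure of formula (4.7) — a per-pair distillation loss summed over the 9 classification-vector pairs — can be sketched as follows. The text does not fix the exact form of Loss_S, so the KL divergence between softmax outputs, a common choice for knowledge distillation, is assumed here; all names are ours:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def pair_distill_loss(student_logits, teacher_logits):
    # Assumed Loss_S: KL divergence from the teacher's softened class
    # distribution p to the student's distribution q.
    p = softmax(np.asarray(teacher_logits, float))
    q = softmax(np.asarray(student_logits, float))
    return float(np.sum(p * np.log(p / q)))

def kd_loss(student_vecs, teacher_vecs):
    # Equation (4.7): sum Loss_S over the 9 pairs of classification
    # vectors (3 global, 5 local, 1 fusion).
    return sum(pair_distill_loss(s, t) for s, t in zip(student_vecs, teacher_vecs))
```

The loss is zero when the student's classification vectors match the teacher's exactly, and positive otherwise, which is what drives FGL-MobileNet to imitate the output probability distribution of FGL-Net.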
the FGL-Net and FGL-MobileNet backbone networks each comprise 4 residual convolution layers from shallow to deep, generating 6 intermediate feature maps in total that extract the finger vein feature information at different stages of the networks; therefore, the intermediate feature maps of the different FGL-Net stages guide the training of the feature maps at the corresponding stages of FGL-MobileNet, more strongly constraining FGL-MobileNet to learn more effective finger vein feature representations from shallow to deep layers and to better complete the mapping from the original finger vein image to high-level semantic finger vein features; the feature map loss measures the distance between feature maps, as shown in formula (4.8):
Loss_f = Σ_{i=1}^{6} ||U_i − V_i||₂²   (4.8)
where Loss_f denotes the feature map loss, U_i denotes the i-th feature map of the student network, and V_i denotes the i-th feature map of the teacher network;
therefore, the overall FGL-MobileNet loss function formula is shown in equation (4.9):
Loss_sum = α × Loss_Origin + (1 − α) × Loss_KD + λ × Loss_f   (4.9)
where Loss_sum denotes the overall loss function, and α and λ are loss weighting coefficients.
CN202111617415.XA 2021-12-27 2021-12-27 Finger vein identification method based on lightweight fusion global and local feature network Pending CN114299559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111617415.XA CN114299559A (en) 2021-12-27 2021-12-27 Finger vein identification method based on lightweight fusion global and local feature network


Publications (1)

Publication Number Publication Date
CN114299559A true CN114299559A (en) 2022-04-08

Family

ID=80969497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111617415.XA Pending CN114299559A (en) 2021-12-27 2021-12-27 Finger vein identification method based on lightweight fusion global and local feature network

Country Status (1)

Country Link
CN (1) CN114299559A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063843A (en) * 2022-05-20 2022-09-16 华南理工大学 Palm vein data enhancement and feature extraction method oriented to high-degree-of-freedom application scene
CN115063843B (en) * 2022-05-20 2024-03-29 华南理工大学 Palm pulse data enhancement and feature extraction method for high-degree-of-freedom application scene
CN114863499A (en) * 2022-06-30 2022-08-05 广州脉泽科技有限公司 Finger vein and palm vein identification method based on federal learning
CN114863499B (en) * 2022-06-30 2022-12-13 广州脉泽科技有限公司 Finger vein and palm vein identification method based on federal learning
CN116701695A (en) * 2023-06-01 2023-09-05 中国石油大学(华东) Image retrieval method and system for cascading corner features and twin network
CN116701695B (en) * 2023-06-01 2024-01-30 中国石油大学(华东) Image retrieval method and system for cascading corner features and twin network
CN117095208A (en) * 2023-08-17 2023-11-21 浙江航天润博测控技术有限公司 Lightweight scene classification method for photoelectric pod reconnaissance image
CN117095208B (en) * 2023-08-17 2024-02-27 浙江航天润博测控技术有限公司 Lightweight scene classification method for photoelectric pod reconnaissance image
CN116958148A (en) * 2023-09-21 2023-10-27 曲阜师范大学 Method, device, equipment and medium for detecting defects of key parts of power transmission line
CN116958148B (en) * 2023-09-21 2023-12-12 曲阜师范大学 Method, device, equipment and medium for detecting defects of key parts of power transmission line
CN117456480A (en) * 2023-12-21 2024-01-26 华侨大学 Light vehicle re-identification method based on multi-source information fusion
CN117456480B (en) * 2023-12-21 2024-03-29 华侨大学 Light vehicle re-identification method based on multi-source information fusion

Similar Documents

Publication Publication Date Title
CN114299559A (en) Finger vein identification method based on lightweight fusion global and local feature network
CN112784764B (en) Expression recognition method and system based on local and global attention mechanism
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN108520216B (en) Gait image-based identity recognition method
CN105718889B (en) Based on GB (2D)2The face personal identification method of PCANet depth convolution model
CN109255289B (en) Cross-aging face recognition method based on unified generation model
Zhai et al. BeautyNet: Joint multiscale CNN and transfer learning method for unconstrained facial beauty prediction
CN111414862A (en) Expression recognition method based on neural network fusion key point angle change
Wang et al. Finger vein recognition based on multi-receptive field bilinear convolutional neural network
Kishore et al. Visual-verbal machine interpreter for sign language recognition under versatile video backgrounds
CN115830652B (en) Deep palm print recognition device and method
Baek et al. Generative adversarial ensemble learning for face forensics
CN110929558B (en) Pedestrian re-identification method based on deep learning
CN113516005A (en) Dance action evaluation system based on deep learning and attitude estimation
CN113011253A (en) Face expression recognition method, device, equipment and storage medium based on ResNeXt network
Zhang Analyzing body changes of high-level dance movements through biological image visualization technology by convolutional neural network
CN108710836B (en) Lip detection and reading method based on cascade feature extraction
CN103942545A (en) Method and device for identifying faces based on bidirectional compressed data space dimension reduction
CN112906520A (en) Gesture coding-based action recognition method and device
CN112800882A (en) Mask face posture classification method based on weighted double-flow residual error network
Qin et al. Multi-scaling detection of singular points based on fully convolutional networks in fingerprint images
CN114944002B (en) Text description-assisted gesture-aware facial expression recognition method
CN110135253A (en) A kind of finger vena identification method based on long-term recursive convolution neural network
CN115984924A (en) Expression recognition method in natural scene based on brain-like technology
CN114495163A (en) Pedestrian re-identification generation learning method based on category activation mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination