CN113538359A - System and method for finger vein image segmentation - Google Patents

System and method for finger vein image segmentation

Info

Publication number
CN113538359A
CN113538359A
Authority
CN
China
Prior art keywords
feature map
finger vein
vein
vein image
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110783187.7A
Other languages
Chinese (zh)
Other versions
CN113538359B (en)
Inventor
刘旭华
徐红
邹伟
韩烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shuguang Autopass Technology Co ltd
Original Assignee
Beijing Shuguang Autopass Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuguang Autopass Technology Co ltd filed Critical Beijing Shuguang Autopass Technology Co ltd
Priority to CN202110783187.7A priority Critical patent/CN113538359B/en
Publication of CN113538359A publication Critical patent/CN113538359A/en
Application granted granted Critical
Publication of CN113538359B publication Critical patent/CN113538359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30101 Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Collating Specific Patterns (AREA)

Abstract

An embodiment of the invention provides a system for segmenting a finger vein image, comprising: an encoder for downsampling the input finger vein image at different resolutions based on hole convolution to generate feature maps of the vein pattern of the finger vein image at a plurality of resolutions; and a decoder for performing feature fusion and upsampling on the corresponding vein-pattern feature maps to generate a segmented venation image. The encoder and decoder are trained as follows: venation images are generated by the encoder and decoder from a plurality of training samples comprising finger vein images and corresponding venation labels, a loss value is calculated from the generated venation image and the venation label with a loss function, and the parameters of the neural networks in the encoder and decoder are updated by back propagation based on the loss value. Because the input finger vein image is downsampled at different resolutions with hole convolution, a finer venation image with fewer noise points can be extracted.

Description

System and method for finger vein image segmentation
Technical Field
The present invention relates to the field of image processing, in particular to the field of finger vein extraction, and more particularly to a system and method for finger vein image segmentation.
Background
In modern society, with the continuous development of science and technology, information security has become a pressing problem, and how to effectively protect personal information is particularly important. The biometric characteristics widely used at present mainly include the human face, fingerprint, palm print, and iris, but these physiological characteristics are easy to steal. Research on biometric identification with high anti-counterfeiting performance and high security has therefore attracted the attention of researchers. The finger vein has the characteristics of liveness and high anti-counterfeiting performance and is difficult to steal, so finger vein recognition technology has important research significance and practical value, and how completely and accurately the venation of the finger vein is extracted largely determines recognition accuracy.
In finger vein recognition, the extraction of venation features is the key step. Traditional finger vein segmentation methods have great limitations, require specific application conditions, and generalize poorly; deep learning can alleviate these problems to a certain extent thanks to its strong feature expression capability. However, when existing finger vein segmentation models based on convolutional neural networks are trained on small samples of finger vein images, they are prone to overfitting and poor generalization, or suffer from loss of detail information after segmentation, more noise points, or discontinuous venation in the vein image. For example, the U-Net model proposed by Ronneberger et al. for biological cell image segmentation has been widely used in medical image segmentation in recent years, but when U-Net is used for finger vein segmentation, fine information is lost after segmentation, more noise points appear, or the venation in the segmented image is discontinuous.
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a system and method for finger vein image segmentation.
The purpose of the invention is realized by the following technical scheme:
according to a first aspect of the present invention, there is provided a system for finger vein image segmentation, comprising: the encoder is used for carrying out downsampling on the input finger vein image with different resolutions based on the hole convolution to generate a plurality of feature maps of vein lines with different resolutions of the finger vein image; and the decoder is used for performing feature fusion and upsampling on the basis of the corresponding feature map of the vein texture to generate a segmented venation image. Wherein the encoder and decoder are trained in the following manner: the method includes generating a vein image with a plurality of training samples encoder and decoder including finger vein images and corresponding vein labels, calculating a loss value from the vein image and the vein labels with a loss function, and updating parameters of a neural network in the encoder and decoder by back propagation based on the loss value.
In some embodiments of the present invention, downsampling the input finger vein image at different resolutions based on hole convolution includes downsampling with a normal hole convolution layer and with residual-block-based hole convolution layers.
In some embodiments of the present invention, the encoder performs a convolution operation with a step size of 2 through the corresponding hole convolution layer during the down-sampling process to reduce the resolution of the feature map of the vein pattern.
In some embodiments of the invention, the encoder does not reduce the resolution of the feature map of the vein print by a pooling operation during the downsampling process.
In some embodiments of the present invention, all activation functions of the system for finger vein image segmentation, except the activation function used to generate the segmented finger vein venation map, employ the Mish activation function.
In some embodiments of the invention, the encoder comprises: a first encoding module configured to perform downsampling using a normal hole convolution layer to obtain a first feature map of a first resolution; a second encoding module configured to down-sample the first feature map of the first resolution based on the hole convolution layer of the residual block to obtain a second feature map of a second resolution; a third encoding module configured to down-sample the second feature map of the second resolution based on the hole convolution layer of the residual block to obtain a third feature map of a third resolution; and the fourth coding module is configured to perform convolution operation for increasing the number of channels on the third feature map of the third resolution based on the hole convolution layer of the residual block to obtain a fourth feature map of the third resolution.
In some embodiments of the present invention, the residual block includes a first residual block comprising a first main branch and a first bypass branch; the first main branch includes at least two hole convolution layers, the first bypass branch includes a BatchNorm (BN) layer and a Mish activation function connected in sequence, and the sum of the output of the first main branch and the output of the first bypass branch, after passing through a Mish activation function, is used as the output of the first residual block.
In some embodiments of the invention, the residual block comprises a second residual block comprising a second main branch and a second bypass branch, the second main branch comprising at least two hole convolution layers, an output of the second bypass branch being equal to the input, and a sum of the output of the second main branch and the output of the second bypass branch being an output of the second residual block after a Mish activation function.
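The add-then-activate pattern shared by both residual blocks (the branch outputs are summed and then passed through a Mish activation) can be illustrated numerically; a minimal pure-Python sketch, where the scalar inputs and placeholder branch functions are illustrative assumptions, not the tensor operations of the actual network:

```python
import math

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)), softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))

def residual_block(x, main_branch, bypass_branch):
    # Both residual block types sum the two branch outputs and then
    # apply a Mish activation to form the block output.
    return mish(main_branch(x) + bypass_branch(x))

# Second residual block: the bypass output equals its input (identity shortcut).
# With a hypothetical zero main branch, the block reduces to Mish(x).
out = residual_block(3.0, main_branch=lambda x: 0.0, bypass_branch=lambda x: x)
```

With a real network the branches would be hole-convolution/BN stacks operating on feature maps; only the summation-then-Mish wiring is taken from the text.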
In some embodiments of the invention, the system may further comprise a link layer, which includes a two-dimensional convolutional neural network layer and is used to perform a channel-increasing convolution operation on the fourth feature map, obtaining a fifth feature map of the third resolution that is output to the decoder.
In some embodiments of the invention, the decoder comprises: a first decoding module configured to fuse the third and fifth feature maps of the third resolution, upsample, and perform a two-dimensional ordinary convolution operation that reduces the number of channels, to obtain a sixth feature map of the second resolution; a second decoding module configured to fuse the second and sixth feature maps of the second resolution, upsample, and perform a channel-reducing two-dimensional ordinary convolution operation to obtain a seventh feature map of the first resolution; a third decoding module configured to fuse the first and seventh feature maps of the first resolution, upsample, and perform a channel-reducing two-dimensional ordinary convolution operation to obtain an eighth feature map; a fourth decoding module configured to perform a channel-reducing two-dimensional ordinary convolution operation on the eighth feature map at the original resolution to obtain a ninth feature map; and a fifth decoding module configured to process the ninth feature map to generate the segmented venation image.
In some embodiments of the invention, the loss value is a weighted sum of the binary (two-class) cross-entropy loss and the Dice loss.
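The weighted combination of binary cross-entropy and Dice loss can be sketched in plain Python; the weight `w` and the smoothing constant below are illustrative assumptions, as the weighting is not specified here:

```python
import math

def bce_dice_loss(pred, label, w=0.5, eps=1e-7, smooth=1.0):
    # pred: predicted foreground probabilities in (0, 1); label: 0/1 ground truth.
    n = len(pred)
    # Binary cross-entropy, averaged over pixels.
    bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
               for p, y in zip(pred, label)) / n
    # Dice loss: 1 minus the (smoothed) Dice overlap coefficient.
    inter = sum(p * y for p, y in zip(pred, label))
    dice = 1 - (2 * inter + smooth) / (sum(pred) + sum(label) + smooth)
    # Loss value as a weighted sum of the two terms.
    return w * bce + (1 - w) * dice

loss = bce_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
```

Better predictions reduce both terms, so the combined loss decreases monotonically as the predicted probabilities approach the venation labels.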
According to a second aspect of the present invention, there is provided a method of extracting the venation of a finger vein, comprising: inputting the finger vein image to the system according to the first aspect, and outputting the segmented venation image of the finger vein.
According to a third aspect of the present invention, there is provided a method for identification based on finger veins, comprising: extracting a venation image of a finger vein of an identity to be identified by using the system of the first aspect or the method of the second aspect; and performing identity recognition based on the extracted vein image of the finger vein.
According to a fourth aspect of the present invention, there is provided an electronic apparatus comprising: one or more processors; and a memory, wherein the memory is used to store one or more executable instructions; the one or more processors are configured to implement the steps of the method of the second or third aspect via execution of the one or more executable instructions.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a system for finger vein image segmentation according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a first residual block according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a second residual block according to an embodiment of the present invention;
FIG. 4 is a graph comparing the Mish activation function and the ReLU activation function;
FIG. 5 is a block diagram of one embodiment of a system for finger vein image segmentation, in accordance with an embodiment of the present invention;
FIG. 6 is a schematic view of a finger vein collection scenario using a finger vein instrument;
FIG. 7 is a schematic diagram of the principle of collecting a finger vein using a finger vein instrument;
FIG. 8 is a schematic diagram of finger vein data collected by the finger vein machine;
FIG. 9 is a schematic diagram of a finger vein image and the corresponding venation label;
FIG. 10 is a graph of the accuracy on the training set, collected during experiments, of the network structure of the system of the present invention and of three existing network structures;
FIG. 11 shows the accuracy results on the validation set, collected during experiments, of the network structure of the system of the present invention and of three existing network structures;
FIG. 12 is a graph of the mean intersection-over-union on the training set, collected during experiments, of three existing network structures and of the network structure of the system of the present invention;
FIG. 13 is a graph of the mean intersection-over-union on the validation set, collected during experiments, of three existing network structures and of the network structure of the system of the present invention;
FIG. 14 shows the mean intersection-over-union results using three different activation functions, collected during experiments;
FIG. 15 is a comparison of the segmentation effects of three existing network structures and of the network structure of the system of the present invention on some finger vein images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As mentioned in the background section, after a finger vein image is segmented by the existing U-Net segmentation network, some fine information may be lost, more noise points may occur, or the venation in the venation image may be discontinuous. If the extracted venation image is then used in downstream tasks, their performance may also be affected; for example, the accuracy of subsequent identity recognition based on the extracted finger vein venation. Therefore, the invention constructs a system for segmenting the finger vein image that downsamples the input image at different resolutions based on hole convolution, thereby reducing information loss during finger vein extraction, avoiding the influence of noise points on the vein pattern, and extracting a finer venation image with fewer noise points.
Before describing embodiments of the present invention in detail, some of the terms used therein will be explained as follows:
Hole convolution (also called dilated convolution) refers to convolution in which holes are injected into a standard convolution kernel. Hole convolution adds one hyperparameter to standard convolution, called the dilation rate (hole rate).
Step size (stride) refers to the distance the convolution kernel moves at each step in the corresponding direction of the image. The default stride is 1, meaning the kernel moves one pixel at a time in the corresponding direction on the image; a stride of 2 means it moves two pixels at a time.
The activation function refers to a function for introducing a nonlinear factor in the neural network. E.g. Sigmoid, tanh, ReLU, Mish activation functions.
In the field of image segmentation, U-Net is a segmentation network with the classic U-shaped architecture. However, when U-Net is used for finger vein extraction, its pooling layers may lose part of the venation information, and the segmented venation map may have breakpoints, resulting in incomplete venation and more noise points. The accuracy of subsequent prediction tasks (such as authentication) performed on a finger vein venation map segmented by U-Net is therefore poor. The present invention improves on the U-shaped architecture by replacing conventional convolution with hole convolution for downsampling in the feature extraction part (corresponding to the encoder), which increases the receptive field and yields a segmentation network that better extracts finger vein venation information.
According to an embodiment of the present invention, a system for segmentation of a finger vein image is provided; see fig. 1. It includes an encoder and a decoder, wherein the encoder downsamples the input finger vein image at different resolutions based on hole convolution to generate feature maps of the vein pattern of the finger vein image at a plurality of resolutions, and the decoder performs feature fusion and upsampling on the corresponding vein-pattern feature maps to generate a segmented venation image. When the resolution of the input image is reduced by hole convolution, the resolution and the number of channels of the feature map output by an encoding module can be adjusted through the hyperparameters of the corresponding hole convolution.
According to one embodiment of the invention, in order to better extract the finger vein venation using feature maps at multiple scales, a plurality of encoding modules are arranged in the encoder. For example, the encoder may include a first, a second and a third encoding module, configured to downsample the input finger vein image at different resolutions based on hole convolution to generate feature maps of the vein pattern at a plurality of resolutions. To further enrich the extracted feature information, according to an embodiment of the present invention, a fourth encoding module is added to the encoder. The fourth encoding module does not resize the feature map: the resolution of its output feature map is the same as that of its input, but the output has more channels. The system may further include a link layer that links the encoder and the decoder, adjusting the channels of the output of the last encoding module and transmitting it to the decoder as input. According to an embodiment of the present invention, in order to better extract complete finger vein venation from feature maps at multiple scales, a plurality of decoding modules may be provided in the decoder; for example, a first, second, third, fourth and fifth decoding module. The feature maps of the corresponding encoding modules can be fed via skip branches into the decoding modules whose input feature maps have the same or similar resolution.
For example, the output of the first encoding module may be transmitted to the third decoding module via a skip branch, the output of the second encoding module may be transmitted to the second decoding module via a skip branch, and the output of the third encoding module may be transmitted to the first decoding module via a skip branch. The link layer performs channel adjustment on the output of the last encoding module and transmits it to the first decoding module of the decoder as input. The first, second and third decoding modules are connected in sequence, with the output of each module serving as the input of the next; each of these modules may be configured to fuse the feature map from its skip branch with the feature map from the main branch (i.e., the U-shaped link formed by connecting the encoder, the link layer and the decoder in sequence), then perform upsampling and a channel-reducing two-dimensional ordinary convolution operation. The fourth decoding module may be configured to perform a channel-reducing two-dimensional ordinary convolution operation on its input feature map. The fifth decoding module may be configured to process its input feature map to generate the segmented venation image.
It should be understood that the configuration shown in fig. 1 is only an alternative embodiment, and one skilled in the art could adapt it based on the principle of the present invention to obtain a system with similar effect. For example, on the basis of the above embodiment, the fourth encoding module is removed and the output of the third encoding module is passed to the link layer. For another example, a down-sampling coding module is added between the third coding module and the fourth coding module, an up-sampling decoding module is added between the link layer and the first decoding module, and a skip branch is added between the added coding module and the decoding module.
According to one embodiment of the invention, a residual network may be used to deepen the network during downsampling. Studies have shown that the U-Net network achieves good segmentation results in semantic segmentation and other pixel-level prediction tasks. However, like a conventional convolutional neural network, U-Net convolves the image first and then pools it: the receptive field increases while the size of the finger vein feature map is reduced, so although the receptive field grows, the resolution of the feature map decreases at the same time. The original image size is then restored through upsampling (UpSampling), and in this shrink-then-grow process some venation information of the finger vein image is lost, affecting the continuity of the segmented finger vein venation. In order to better extract the venation information of the finger veins, the system adopts hole convolution instead of the conventional convolution operation. The feature map size after hole convolution is as follows:
Map_o = floor((i + 2p - K_ern) / s) + 1,  where K_ern = k + (k - 1)(d - 1)
where i represents the size of the input feature map, p the amount of zero padding in the convolution, k the convolution kernel size, d the sampling (dilation) rate of the hole convolution, s the stride, and K_ern the effective convolution kernel size. From the above equation, increasing d enlarges the effective kernel K_ern, which in turn makes the output feature map Map_o smaller. By adjusting these hyperparameters accordingly, a feature map of the required resolution can be obtained during encoding.
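The output-size relation can be checked numerically; a small pure-Python helper under the definitions above:

```python
def dilated_conv_output_size(i, k, d, s, p):
    # Effective kernel size of a hole (dilated) convolution: K_ern = k + (k-1)(d-1)
    k_ern = k + (k - 1) * (d - 1)
    # Output size: Map_o = floor((i + 2p - K_ern) / s) + 1
    return (i + 2 * p - k_ern) // s + 1

# A 3x3 kernel with dilation rate 2 behaves like a 5x5 kernel: with stride 1
# and padding 2 the resolution is preserved; with stride 2 it is halved.
same = dilated_conv_output_size(i=100, k=3, d=2, s=1, p=2)   # 100
half = dilated_conv_output_size(i=100, k=3, d=2, s=2, p=2)   # 50
```

This illustrates the point made above: the receptive field grows with d while the stride hyperparameter, not pooling, controls the resolution reduction.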
In order to better exploit the residual network to extract venation features of finger veins, the applicant designed two types of residual blocks for data processing. According to an embodiment of the present invention, a residual block may include a first residual block and/or a second residual block. Referring to fig. 2, the first residual block may include a first main branch and a first bypass branch: the first main branch comprises, connected in sequence, a hole convolution layer, a BN layer, a Mish activation function, a hole convolution layer and a BN layer (i.e., at least two hole convolution layers, consistent with the embodiment above); the first bypass branch comprises a BN layer and a Mish activation function connected in sequence; and the sum of the outputs of the two branches is passed through a Mish activation function to form the output of the first residual block. Referring to fig. 3, the second residual block may include a second main branch and a second bypass branch: the second main branch has the same composition as the first main branch, the output of the second bypass branch is equal to its input (an identity shortcut), and the sum of the outputs of the two branches is passed through a Mish activation function to form the output of the second residual block. Here, the BN layer refers to Batch Normalization. According to an embodiment of the present invention, the first, second and third encoding modules may each perform downsampling using residual-block-based hole convolution layers; it will be appreciated that this is not essential.
For example, the downsampling of the input finger vein image at different resolutions by hole convolution may include downsampling with a normal hole convolution layer and with residual-block-based hole convolution layers: the first encoding module downsamples with a normal hole convolution layer, while the second and third encoding modules downsample with residual-block-based hole convolution layers. The technical scheme of this embodiment can realize at least the following beneficial technical effects: the advantage of hole convolution is that the receptive field can be increased even without a pooling operation, avoiding the loss of venation information caused by the feature map shrinking and then growing again; the global information of the vein pattern of the finger vein image can be better learned, improving the segmentation precision of the finger vein and the continuity of the venation.
According to an embodiment of the present invention, in order to reduce the resolution of the feature map more quickly and keep the neural network from becoming too large, the encoder performs convolution operations with a stride of 2 through the corresponding hole convolution layers during downsampling to reduce the resolution of the vein pattern feature map. To avoid any pooling operation affecting the venation information of the finger vein during downsampling, the encoder of the present invention does not reduce the resolution of the vein pattern feature map through pooling. Thus, by combining the hole convolution with the corresponding stride hyperparameter, the venation information of the finger vein can be better extracted.
According to an embodiment of the present invention, the activation functions in the system for finger vein image segmentation, other than the one used to generate the segmented finger vein venation map, may adopt ReLU, Mish or Sigmoid activation functions. However, the applicant found that different activation functions affect the extraction results differently. In network training, the Sigmoid and ReLU activation functions are the most commonly used convolutional-layer activation functions. The Sigmoid function has no sparsity, which can lead to slow convergence of network training and the vanishing-gradient problem. Although the ReLU function has many advantages over Sigmoid, it is not suitable for inputs with large gradients: after a parameter update, a ReLU neuron may never activate again, so that its gradient stays zero, i.e., the ReLU function can suffer from neuron death (the "dying ReLU" problem). Fig. 4 shows a schematic diagram of the ReLU and Mish activation functions, with the abscissa as input and the ordinate as output. As the dashed box indicates, compared with ReLU, the Mish activation function is not completely truncated for negative values but allows a small negative gradient to flow in, which helps preserve the integrity of the venation information of the finger vein. The Mish function is bounded below and unbounded above, which avoids the gradient-saturation problem, and its smoothness at every point gives a better gradient-descent behavior than ReLU; this activation function allows more finger vein venation information to flow into the neural network, improving the segmentation accuracy and generalization capability of the network and ensuring the continuity of the venation information. Therefore, in order to extract the venation information of the finger vein more completely, the system preferably adopts the Mish activation function instead of ReLU. That is, in both downsampling and upsampling, the Mish activation function replaces the ReLU activation function to better propagate vein pattern feature information, and it helps the decoder restore finger vein pattern details during upsampling. The formula of the Mish activation function is as follows:
f(x) = x · tanh(δ(x))

where x denotes the input feature map and δ(x) = ln(1 + e^x) is the softplus function.
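The formula above can be written directly in NumPy; this is a minimal sketch of the Mish activation itself (the `softplus` helper via `np.logaddexp` is an implementation choice for numerical stability, not part of the original text):

```python
import numpy as np

def softplus(x):
    # numerically stable ln(1 + e^x)
    return np.logaddexp(0.0, x)

def mish(x):
    """Mish activation: f(x) = x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))

x = np.array([-5.0, -1.2, 0.0, 1.0, 10.0])
print(mish(x))
```

The values illustrate the properties discussed above: for large positive inputs Mish behaves like the identity, at 0 it is 0, and for negative inputs it stays bounded below (near -0.31 at its minimum) instead of being truncated to 0 as ReLU is.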
One exemplary network architecture of a system for finger vein image segmentation according to an embodiment of the present invention is shown in fig. 5; it comprises a first, second, third and fourth encoding module, a link layer, and first through fifth decoding modules. The input of the first encoding module (assuming the input finger vein image is 100 × 200 × 1, i.e. 100 × 200 pixels, single channel) is processed in turn by a zero-padding module, a normal hole convolution layer, a BN layer and a Mish activation function to produce an output feature map, which is passed to the second encoding module and, through a skip branch, to the third decoding module. The stride of the normal hole convolution layer is set to 2, the convolution kernel size is 3 × 3, and the number of output channels is 64; the output feature map is therefore 50 × 100 × 64 (50 × 100 pixels, 64 channels; later sizes are read the same way and are not spelled out again). The zero-padding module pads zeros around the edge of the feature map so that the output feature map reaches the intended size; the outermost pixels of the input feature map are typically filled with 0s. The BN layer is a Batch Normalization layer.
The input of the second encoding module may be configured to be processed in turn by one first residual block and two second residual blocks to obtain an output feature map, which is passed to the third encoding module and, through the skip branch, to the second decoding module. Among the residual blocks of the second encoding module, exactly one hole convolution layer is given stride 2 to reduce the feature-map resolution, for example: the first hole convolution layer of the first residual block has stride 2, while the second hole convolution layer of the first residual block and both hole convolution layers of all second residual blocks have stride 1. The number of channels may be set to 128 (n = 128), and the output feature map is 25 × 50 × 128. The input of the third encoding module may be configured to be processed in turn by one first residual block and three second residual blocks to obtain an output feature map, which is passed to the fourth encoding module and, through the skip branch, to the first decoding module. Again exactly one hole convolution layer among its residual blocks has stride 2, for example: the first hole convolution layer of the first residual block has stride 2, while the second hole convolution layer of the first residual block and both hole convolution layers of all second residual blocks have stride 1. The number of channels may be set to 256 (n = 256), and the output feature map is 13 × 25 × 256.
The input of the fourth encoding module may be configured to be processed in turn by one first residual block and five second residual blocks to obtain an output feature map, which is passed to the link layer. The fourth encoding module may be configured not to change the feature-map resolution but only to increase the number of channels, for example: both hole convolution layers of its first residual block and both hole convolution layers of all its second residual blocks have stride 1, the number of channels may be set to 512 (n = 512), and the output feature map is 13 × 25 × 512. The input of the link layer may be configured to be processed in turn by a zero-padding module, a two-dimensional convolution module Conv2D, a BN layer and a Mish activation function to obtain an output feature map, which is passed to the first decoding module; the output feature map is 13 × 25 × 1024. The first decoding module may be configured to apply fusion, upsampling, a zero-padding module, a two-dimensional convolution module Conv2D, a BN layer and a Mish activation function to the feature map from the third encoding module and the feature map from the link layer to obtain an output feature map of size 25 × 50 × 512. During fusion, the first decoding module concatenates the 13 × 25 × 256 feature map from the third encoding module with the 13 × 25 × 1024 feature map from the link layer, i.e. performs a splicing operation, yielding a 13 × 25 × 1280 feature map. "UpSampling2D 2×2" denotes a two-dimensional deconvolution with a 2 × 2 convolution kernel.
The second decoding module may be configured to apply fusion, upsampling, a zero-padding module, a two-dimensional convolution module Conv2D, a BN layer and a Mish activation function to the feature map from the second encoding module and the feature map from the first decoding module to obtain an output feature map of size 50 × 100 × 256. During fusion, it concatenates the 25 × 50 × 128 feature map from the second encoding module with the 25 × 50 × 512 feature map from the first decoding module, yielding a 25 × 50 × 640 feature map. The third decoding module may be configured to apply fusion, upsampling, a zero-padding module, a two-dimensional convolution module Conv2D, a BN layer and a Mish activation function to the feature map from the first encoding module and the feature map from the second decoding module to obtain an output feature map of size 100 × 200 × 128. During fusion, it concatenates the 50 × 100 × 64 feature map from the first encoding module with the 50 × 100 × 256 feature map from the second decoding module, yielding a 50 × 100 × 320 feature map. The fourth decoding module may be configured to process the feature map from the third decoding module through a zero-padding module, a two-dimensional convolution module Conv2D, a BN layer and a Mish activation function to obtain an output feature map of size 100 × 200 × 64. The fifth decoding module may be configured to apply a 1 × 1 convolution (Conv 1 × 1) followed by a Sigmoid activation function to the feature map from the fourth decoding module, yielding the segmented venation image of size 100 × 200 × 1.
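The concatenation-based fusion in the decoding modules can be illustrated with plain NumPy arrays. The shapes follow the second decoding module described above; the nearest-neighbour 2× upsampling used here is only a stand-in for the 2 × 2 deconvolution in the figure, and the zero-valued arrays are placeholders for real feature maps.

```python
import numpy as np

# Feature map from the second encoding module (skip branch): 25 x 50 x 128
skip = np.zeros((25, 50, 128), dtype=np.float32)
# Feature map from the first decoding module: 25 x 50 x 512
decoded = np.zeros((25, 50, 512), dtype=np.float32)

# Fusion = channel-wise concatenation of the two same-resolution maps
fused = np.concatenate([skip, decoded], axis=-1)
print(fused.shape)  # (25, 50, 640)

# Stand-in for the 2x2 deconvolution: nearest-neighbour 2x upsampling
upsampled = fused.repeat(2, axis=0).repeat(2, axis=1)
print(upsampled.shape)  # (50, 100, 640)
```

The channel counts reproduce the 25 × 50 × 640 fused map from the text; a subsequent convolution (not shown) would then reduce the channels to 256.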
It should be understood that the input and output feature-map sizes of each module and the numbers of first and second residual blocks can be adjusted according to the actual application; the specific values above are only illustrative. In this embodiment, both the encoding and decoding parts of the system use the Mish activation function instead of ReLU; the first and second residual blocks deepen the network, with hole convolution replacing ordinary convolution for downsampling and no pooling at all. The network architecture of this embodiment retains the advantages of Resnet — rapid convergence, and protection against model degradation and vanishing gradients — while keeping the simple structure and reduced model size of a Unet network. The finger vein image is fed into the encoder for venation feature extraction: it is repeatedly downsampled and processed so that the network gradually learns a low-resolution feature map that efficiently distinguishes the finger vein lines, completing the encoding; decoding is then completed in the upsampling part. Feature fusion is performed three times between the encoding and decoding layers whose feature maps have the same size, which effectively mitigates problems such as model degradation.
According to one embodiment of the invention, the system for finger vein image segmentation is trained in the following manner: the system is run on a plurality of training samples, each comprising a finger vein image and its corresponding venation label, to generate a venation image; a loss value is computed from the generated venation image and the venation label with a loss function; and the parameters of the neural network in the system are updated by back propagation based on that loss value. In one embodiment, the neural network comprises the networks in the encoder and the decoder. In another embodiment, it comprises the encoder, the decoder and the link layer. During training, the choice of loss function also affects the system's performance. Since a finger vein image is, overall, a grey image with dark vein lines, segmenting the venation amounts to separating vein foreground from background. When the number of foreground pixels is far smaller than the number of background pixels, the negative-sample component of the loss dominates, biasing the model heavily toward the background and degrading the result. To avoid this imbalance, according to one embodiment of the invention the system is trained with a combination of Dice loss and Binary Cross-Entropy loss, so that it is robust to the unequal numbers of foreground and background pixels. The loss function can be expressed as:
Loss = α · L_p + (1 − α) · L_Dice

where L_p denotes the two-class (binary) cross-entropy loss, α denotes the weight of L_p, and L_Dice denotes the Dice loss.
L_p = −(1/N) · Σ_{i=1}^{N} [ y_i · log p(y_i) + (1 − y_i) · log(1 − p(y_i)) ]
N denotes the total number of pixels of a single image; y denotes the venation label, with y_i = 1 meaning the i-th pixel is a vein pixel and y_i = 0 meaning the i-th pixel is a background pixel; p(y_i) denotes the predicted probability that the i-th pixel is a vein pixel. For the two-class cross-entropy loss, the finger vein image contains two classes, vein pixels and background pixels: this is equivalent to treating vein pixels as positive samples (Label = 1) and background pixels as negative samples (Label = 0). For a positive sample y_i = 1 the loss is −log(p(y_i)): the larger p(y_i), the smaller the loss, and in the ideal case p(y_i) = 1 the corresponding loss is 0. For a negative sample y_i = 0 the loss is −log(1 − p(y_i)): the smaller p(y_i), the smaller the loss, and in the ideal case p(y_i) = 0 the corresponding loss is 0.
L_Dice = 1 − 2 · |Y ∩ P| / (|Y| + |P|)
where Y denotes the venation label, P denotes the segmentation output, ∩ denotes intersection, and |·| denotes the number of pixels. The Dice loss is based on the Dice coefficient, a set-similarity measure generally used to compute the similarity of two samples; its value lies in [0, 1], and the larger the Dice coefficient, the higher the similarity. Because the vein pixels and background pixels of a finger vein image are extremely unbalanced, adopting the Dice loss keeps the training process of the network stable. The proportion of the two sub-losses can be adjusted through α, which can be set as needed according to the engineering practice of the actual task. For example, after tuning, the applicant chose α = 0.35, reflecting a preference for the Dice term — that is, for making the segmented distribution of foreground and background closer to the real image — while the two-class cross-entropy term, which essentially reflects the prediction accuracy of the model, receives a slightly lower weight. It should be understood that if the batch size is not 1, the weighted sum of the two-class cross-entropy loss and the Dice loss is first computed for each sample in the batch, and the sum or the average of these per-sample losses is taken as the final total loss.
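The combined loss above can be sketched in NumPy. This is a minimal sketch, assuming a "soft" Dice formulation computed on probabilities (a common choice when training with gradients); the function name and the small smoothing term `eps` are illustrative, not taken from the original training code.

```python
import numpy as np

def bce_dice_loss(y_true, y_pred, alpha=0.35, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss.

    y_true: binary venation label (1 = vein pixel, 0 = background pixel)
    y_pred: predicted vein probability per pixel
    alpha = 0.35 is the weight the description reports after tuning.
    """
    p = np.clip(y_pred, eps, 1.0 - eps)
    # Two-class cross-entropy, averaged over all N pixels
    bce = -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    # Soft Dice: intersection and "pixel counts" computed on probabilities
    inter = np.sum(y_true * p)
    dice = 1.0 - 2.0 * inter / (np.sum(y_true) + np.sum(p) + eps)
    return alpha * bce + (1.0 - alpha) * dice

label = np.array([[1.0, 0.0], [0.0, 1.0]])
good = np.array([[0.99, 0.01], [0.01, 0.99]])
bad = np.array([[0.01, 0.99], [0.99, 0.01]])
print(bce_dice_loss(label, good), bce_dice_loss(label, bad))
```

A near-perfect prediction yields a loss close to 0, while an inverted prediction is penalized by both terms, matching the behaviour described for the positive- and negative-sample cases above.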
In order to verify the effect of the present invention, the applicant also performed corresponding experiments.
1. Environment of experiment
Hardware environment of the experiment: CPU, Intel(R) Core(TM) i7-7700HQ @ 2.80 GHz × 8; memory, 15.3 GiB; graphics card, GeForce GTX 1060.
Network environment and parameter settings: all models in the experiment were implemented with the open-source framework Keras under Ubuntu 16.04, with CUDA version 9.0 and Python 3.5 as the programming language. In the network training stage, the RMSprop optimizer was adopted; during training, the batch size was 8, the learning rate was 0.0001, and the Dropout factor of the fully connected layer was set to 0.5. The finger vein images used to train the network have a size of 256 × 256; the input is a finger vein image and the output is the segmented venation map. It should be understood that the batch size, learning rate and Dropout factor may be set as desired. Preferably, the batch size may range over 8-16, the learning rate over 0.0001-0.001 and the Dropout factor over 0.4-0.7, with the specific values selected from these ranges.
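The hyper-parameter settings and preferred ranges above can be captured in a small configuration sketch. The dictionary keys and the range-checking helper are illustrative assumptions, not part of the original training code; only the values and ranges come from the text.

```python
# Training configuration reported in the experiment (values from the text;
# the structure of this dict is an illustrative assumption).
config = {
    "optimizer": "RMSprop",
    "batch_size": 8,        # preferred range: 8-16
    "learning_rate": 1e-4,  # preferred range: 0.0001-0.001
    "dropout": 0.5,         # preferred range: 0.4-0.7
}

def in_preferred_range(cfg):
    """Check the tunable values against the ranges suggested in the text."""
    return (8 <= cfg["batch_size"] <= 16
            and 1e-4 <= cfg["learning_rate"] <= 1e-3
            and 0.4 <= cfg["dropout"] <= 0.7)

print(in_preferred_range(config))  # True
```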
2. Vena digitalis data acquisition and labeling
The finger vein data used in the experiment were acquired with a finger vein analyzer, shown in fig. 6; the acquisition device consists of a near-infrared light (LED) component above the finger rest and a charge-coupled device (CCD) sensor. A finger vein database was built from the collected finger veins for the experiment, and venation features were extracted from the veins for subsequent finger vein identification.
The finger vein collection principle is shown in fig. 7. The near-infrared LED light source above the finger penetrates it well; because the haemoglobin in venous blood absorbs more near-infrared light than the bones, muscles and other tissues in the finger, each venous vessel appears as a dark line under the CCD while the other tissues appear grey-black, producing the finger vein image. Vein data were collected from 60 volunteer subjects with the finger vein machine, as shown in fig. 8: the index (Index), middle (Middle) and ring (Ring) fingers of both the left and right hand were captured for each subject, with 6 images per finger. The images were collected in JPEG format at an image size (Pix) of 100 × 200, for a total of 2160 vein images.
A training data set was built from the collected finger vein data: a number of finger vein images were randomly selected from the 60 subjects' originally collected vein data, and the blood-vessel positions in each image were annotated with data-labelling software to obtain venation labels (also called label images). For example, for the original finger vein image shown on the left of fig. 9, the corresponding venation label is shown on the right of fig. 9; the label is an image containing only the vein vessels. In this experiment, 35 finger vein images were randomly selected and annotated, and the 35 vessel-only annotated images serve as the labels of the 35 original vein images. To prevent the network from overfitting during training, data augmentation can be applied to the 35 original vein images and their labels to increase the diversity and quantity of the data. The original images and labels are mixed with the augmented vein images and labels, split 8:2 into training and validation data, and the system's performance is tested on the validation set.
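The augmentation-and-split pipeline above can be sketched in a few lines. The text does not specify which augmentation operations were used, so the horizontal and vertical flips here are assumptions; the key constraint the sketch illustrates is that each label must receive exactly the same transform as its image so the pair stays aligned.

```python
import numpy as np

def augment_pair(image, label):
    """Return the original pair plus flipped copies.
    The label gets the same transform as the image so they stay aligned."""
    pairs = [(image, label)]
    pairs.append((np.fliplr(image), np.fliplr(label)))  # horizontal flip
    pairs.append((np.flipud(image), np.flipud(label)))  # vertical flip
    return pairs

rng = np.random.default_rng(0)
images = [rng.random((100, 200)) for _ in range(35)]  # 35 annotated originals
labels = [(im > 0.5).astype(np.uint8) for im in images]  # placeholder labels

dataset = [p for im, lb in zip(images, labels) for p in augment_pair(im, lb)]
print(len(dataset))  # 105: 35 originals + 70 augmented pairs

# 8:2 train/validation split, as in the text
split = int(0.8 * len(dataset))
train, val = dataset[:split], dataset[split:]
print(len(train), len(val))  # 84 21
```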
3. Performance evaluation
Among the many criteria for evaluating image semantic segmentation models, this application uses the Mean Intersection over Union (MIoU) as the final evaluation index for the vein segmentation algorithm. MIoU computes, for each class, the IoU — the ratio of the intersection to the union of the ground-truth set (Ground Truth) and the predicted set (Predicted Segmentation) — and averages it over the classes. The MIoU is calculated as follows:
MIoU = (1 / (k + 1)) · Σ_{i=0}^{k} p_ii / ( Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii )

where k + 1 is the number of label classes in the finger vein image, i denotes the true class, p_ii is the number of correctly predicted pixels of class i (true positives), p_ij is the number of pixels that truly belong to class i but are predicted as class j (false negatives for class i), and p_ji is the number of pixels predicted as class i but belonging to class j (false positives for class i).
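The MIoU formula above reduces to a per-class IoU computed from a confusion matrix and then averaged. A minimal NumPy sketch (the toy confusion matrix is an illustrative assumption):

```python
import numpy as np

def mean_iou(conf):
    """MIoU from a (k+1) x (k+1) confusion matrix.
    conf[i, j] = number of pixels of true class i predicted as class j."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)  # p_ii
    # row sum + column sum - diagonal = p_ij + p_ji - p_ii
    denom = conf.sum(axis=1) + conf.sum(axis=0) - tp
    return float(np.mean(tp / denom))

# Two classes (background / vein), toy confusion matrix
conf = np.array([[4, 1],
                 [1, 4]])
print(mean_iou(conf))  # 2/3: IoU is 4 / (5 + 5 - 4) = 4/6 for each class
```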
4. Analysis of Experimental results
To verify the segmentation effect of the proposed network structure on the venation of the finger vein, three other segmentation methods were run as comparison experiments on the self-built data set: the traditional Unet network, the Segnet network and the Res_Unet network were each used to segment the finger vein patterns. Fig. 10 shows the accuracy curves of the systems during training, with the training epoch on the abscissa and the accuracy on the ordinate; numbered arrows point to the corresponding curves, whose meanings are given at the lower right of the figure — curve 1 is the Unet network, curve 2 the Segnet network, curve 3 the Resnet50_Unet network (abbreviated Res_Unet below) and curve 4 the system of the present invention (named VeinSeg_Net); later figures use the same convention and it is not repeated. During training, the accuracy of the traditional Unet and Segnet changes slowly after epoch 10 and gradually stabilizes by epoch 40. As can be seen in fig. 10, the training accuracy of the traditional Unet network stabilizes at about 97%, while that of Segnet gradually stabilizes at about 95%. The Res_Unet network and the present system level off after epoch 5, but between epochs 20 and 30 the training accuracy of the network structure proposed by this application is higher than that of Res_Unet, after which both gradually stabilize. The performance on the validation set is shown in fig.
11, from which it is clear that under the same conditions the network structure of the present system outperforms the other three network structures and gradually stabilizes. Referring to fig. 12, the network of this application segments the venation of the finger vein better than the traditional Unet and Segnet networks, and after epoch 5 its training mean intersection over union (MIoU) is higher than that of Res_Unet and gradually stabilizes. For the intersection over union on the validation set (val_MIoU), see fig. 13: from the curves of the different network structures, although the validation IoU of the traditional Unet and Segnet fluctuates, it levels off around epoch 30, with the MIoU of the traditional Unet gradually stabilizing at about 0.65 and that of Segnet at about 0.58. Compared with Unet, Segnet and Res_Unet, the present system's mean intersection over union is lower than Res_Unet's in the first 5 epochs, but after epoch 5 it is higher than Res_Unet's and gradually stabilizes at about 0.73.
To verify the performance of the Mish activation function, three groups of comparison experiments were run on the self-built data set. The first group used the ReLU activation function; ReLU is unsuitable for inputs with large gradients, because after a parameter update a ReLU neuron may stop activating, so its gradient stays zero — the dying-ReLU ("dead network") problem. The second group used the Mish activation function, which is not completely truncated for negative values but lets a small negative gradient flow in, preserving the integrity of the venation information of the finger vein and alleviating the dying-ReLU problem of the ReLU activation. The third group tested the Swish activation function under the same experimental conditions. From fig. 14, the MIoU comparison on the validation set ranks the three activation functions as Mish > ReLU > Swish; the Mish activation adopted by this application therefore performs better than ReLU and Swish. The mean intersection over union obtained by testing the traditional Unet, Segnet and Res_Unet networks and the present system's network structure on the self-built data set is shown in Table 1; for the same test data and test conditions, the MIoU ranking of the network structures is: present system > Res_Unet > Unet > Segnet. The network structure proposed by this application therefore segments the venation of the finger vein better.
TABLE 1 MIoU test results for different network architectures

Segmentation network     Epochs    MIoU (%)
Unet                     50        64.96
Segnet                   50        58.32
Res_Unet                 50        71.94
Present system           50        72.70
Fig. 15 shows the segmentation results of the traditional Unet, Segnet, Res_Unet and the present system on the validation set; the comparison shows that the present system segments the venation features of the vein clearly. The traditional Unet and Segnet produce breakpoints, discontinuities and some noise points in the vein segmentation, with high false-positive and false-negative rates. Compared with Res_Unet, the vein lines segmented by the present system are more continuous and less affected by noise points; moreover, on the edge-oscillation problem that image segmentation models commonly face, the present system performs better and produces clearer edges. Finger vein identification is affected by many factors because finger veins are hard to distinguish and similar to one another; the present system can extract image features at multiple scales, so the algorithm is more robust.
According to an embodiment of the present invention, a method for extracting the venation of a finger vein is provided: the acquired finger vein image is input into the system for finger vein image segmentation of the foregoing embodiments, and the segmented venation image of the finger vein is output. For example, a finger vein grey-scale map is input into the system, which processes it to obtain the segmented venation image, such as a finger vein represented by a binary image.
According to an embodiment of the present invention, a method for identity recognition based on finger veins is provided: for example, the venation image of the finger vein to be recognized is extracted with the system for finger vein image segmentation of the foregoing embodiments or by the foregoing method, and identity recognition is performed based on the extracted venation image. For example, some identity-verification apparatuses deploy the system of the present invention so as to segment the acquired finger vein image of a person into a venation image to be recognized, and perform matching calculations between that venation image and the venation images of persons registered in the system, thereby recognizing identity.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (16)

1. A system for finger vein image segmentation, comprising:
the encoder is used for carrying out downsampling on the input finger vein image with different resolutions based on the hole convolution to generate a plurality of feature maps of vein lines with different resolutions of the finger vein image;
the decoder is used for carrying out feature fusion and up-sampling on the basis of the feature map of the corresponding vein line to generate a segmented venation image;
wherein the encoder and decoder are trained in the following manner:
the method includes generating a vein image with a plurality of training samples encoder and decoder including finger vein images and corresponding vein labels, calculating a loss value from the vein image and the vein labels with a loss function, and updating parameters of a neural network in the encoder and decoder by back propagation based on the loss value.
2. The system according to claim 1, wherein said downsampling at different resolutions of said input finger vein image based on hole convolution comprises downsampling based on normal hole convolution layer and residual block based hole convolution layer.
3. The system for finger vein image segmentation according to claim 1 or 2, wherein the encoder performs convolution operation with step size of 2 through the corresponding hole convolution layer during the down sampling process to reduce the resolution of the feature map of the vein pattern.
4. The system for finger vein image segmentation according to claim 3, wherein the encoder does not reduce the resolution of the feature map of vein prints by a pooling operation during the down-sampling.
5. The system for finger vein image segmentation according to claim 1, wherein all activation functions of the system for finger vein image segmentation except the activation function for generating the segmented finger vein context map adopt Mish activation function.
6. The system for finger vein image segmentation according to claim 1, 2 or 5, wherein the encoder comprises:
a first encoding module configured to perform downsampling with an ordinary hole convolution layer to obtain a first feature map of a first resolution;
a second encoding module configured to downsample the first feature map of the first resolution through a residual-block hole convolution layer to obtain a second feature map of a second resolution;
a third encoding module configured to downsample the second feature map of the second resolution through a residual-block hole convolution layer to obtain a third feature map of a third resolution;
and a fourth encoding module configured to perform a channel-increasing convolution operation on the third feature map of the third resolution through a residual-block hole convolution layer to obtain a fourth feature map of the third resolution.
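A shape-level sketch of the four encoding modules of claim 6: plain strided hole convolutions stand in for the residual-block layers, and all channel counts are assumptions; only the resolution relationships follow the claim:

```python
import torch
import torch.nn as nn

# Four encoding stages: three halve the resolution, the fourth keeps the
# third resolution and only increases the channel count.
enc1 = nn.Conv2d(1, 8, 3, stride=2, padding=2, dilation=2)    # -> first resolution
enc2 = nn.Conv2d(8, 16, 3, stride=2, padding=2, dilation=2)   # -> second resolution
enc3 = nn.Conv2d(16, 32, 3, stride=2, padding=2, dilation=2)  # -> third resolution
enc4 = nn.Conv2d(32, 64, 3, stride=1, padding=2, dilation=2)  # same resolution, more channels

x = torch.rand(1, 1, 64, 64)
f1 = enc1(x); f2 = enc2(f1); f3 = enc3(f2); f4 = enc4(f3)
print(f1.shape[-1], f2.shape[-1], f3.shape[-1], f4.shape[-1])  # 32 16 8 8
```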
7. The system of claim 6, wherein the residual blocks comprise a first residual block having a first main branch and a first bypass branch; the first main branch comprises at least two hole convolution layers; the first bypass branch comprises a BN layer and a Mish activation function connected in sequence; and the sum of the output of the first main branch and the output of the first bypass branch, after passing through a Mish activation function, serves as the output of the first residual block.
8. The system according to claim 6, wherein the residual blocks comprise a second residual block having a second main branch and a second bypass branch; the second main branch comprises at least two hole convolution layers; the output of the second bypass branch is equal to its input; and the sum of the output of the second main branch and the output of the second bypass branch, after passing through a Mish activation function, serves as the output of the second residual block.
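The second residual block of claim 8 (identity bypass) might be sketched as follows; the BN placement, the exact layer count beyond the claimed "at least two hole convolution layers", and the channel sizes are assumptions:

```python
import torch
import torch.nn as nn

class HoleResidualBlock(nn.Module):
    """Sketch of claim 8's second residual block: a main branch of two
    dilated (hole) convolutions, an identity bypass, and a Mish after the
    sum. Configuration details are illustrative assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.BatchNorm2d(channels),
            nn.Mish(),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.Mish()

    def forward(self, x):
        # bypass output equals the input (identity); its sum with the main
        # branch passes through a Mish activation to form the block output
        return self.act(self.main(x) + x)

block = HoleResidualBlock(8)
y = block(torch.rand(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 8, 32, 32])
```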
9. The system for finger vein image segmentation according to claim 6, further comprising: a link layer comprising a two-dimensional convolutional neural network layer, configured to perform a channel-increasing convolution operation on the fourth feature map to obtain a fifth feature map of the third resolution, which is output to the decoder.
10. The system for finger vein image segmentation according to claim 9, wherein the decoder comprises:
a first decoding module configured to fuse the third feature map and the fifth feature map of the third resolution, then upsample and apply a channel-reducing two-dimensional ordinary convolution, to obtain a sixth feature map of the second resolution;
a second decoding module configured to fuse the second feature map and the sixth feature map of the second resolution, then upsample and apply a channel-reducing two-dimensional ordinary convolution, to obtain a seventh feature map of the first resolution;
a third decoding module configured to fuse the first feature map and the seventh feature map of the first resolution, then upsample and apply a channel-reducing two-dimensional ordinary convolution, to obtain an eighth feature map;
a fourth decoding module configured to apply a channel-reducing two-dimensional ordinary convolution to the eighth feature map at the original resolution, to obtain a ninth feature map;
and a fifth decoding module configured to process the ninth feature map to generate a segmented venation image.
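One decoding step of claim 10 (fuse, upsample, channel-reducing ordinary convolution) can be sketched as below; concatenation as the fusion operation, nearest-neighbour upsampling, and all channel counts are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decode_step(skip: torch.Tensor, up_in: torch.Tensor,
                conv: nn.Conv2d) -> torch.Tensor:
    """Fuse an encoder skip feature map with the incoming feature map,
    upsample, and reduce channels with an ordinary 2-D convolution."""
    fused = torch.cat([skip, up_in], dim=1)            # feature fusion
    upsampled = F.interpolate(fused, scale_factor=2)   # double the resolution
    return conv(upsampled)                             # channel-reducing conv

conv = nn.Conv2d(24, 8, kernel_size=3, padding=1)      # 16 + 8 -> 8 channels
skip = torch.rand(1, 16, 16, 16)
up_in = torch.rand(1, 8, 16, 16)
out = decode_step(skip, up_in, conv)
print(out.shape)  # torch.Size([1, 8, 32, 32])
```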
11. The system for finger vein image segmentation according to claim 1, 2 or 5, wherein the loss value is a weighted sum of a binary cross entropy loss and a Dice loss.
12. The system for finger vein image segmentation according to claim 11, wherein the loss function is expressed as:
Loss = α·L_p + (1 - α)·L_Dice
where L_p represents the binary cross-entropy loss, α represents the weight of L_p, and L_Dice represents the Dice loss.
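The loss of claim 12 can be sketched directly; the value of α and the Dice formulation with a smoothing term eps are illustrative assumptions:

```python
import torch

def combined_loss(pred, target, alpha=0.5, eps=1e-6):
    """Loss = alpha * L_p + (1 - alpha) * L_Dice, per claim 12.
    alpha = 0.5 and the eps smoothing term are illustrative choices."""
    # binary cross-entropy term L_p
    bce = torch.nn.functional.binary_cross_entropy(pred, target)
    # Dice term L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * dice

pred = torch.tensor([[0.9, 0.1], [0.8, 0.2]])
target = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
loss = combined_loss(pred, target)
print(loss.item() > 0)  # True
```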
13. A method for extracting the venation of a finger vein, comprising:
inputting a finger vein image into the system of any one of claims 1 to 12 and outputting the segmented venation image of the finger vein.
14. A method for finger-vein-based identity recognition, characterized by comprising the following steps:
extracting a venation image of the finger vein of the identity to be recognized using the system of any one of claims 1 to 12 or the method of claim 13;
and performing identity recognition based on the extracted venation image of the finger vein.
15. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to perform the steps of the method of claim 13 or 14.
16. An electronic device, comprising:
one or more processors; and
a memory configured to store one or more executable instructions;
the one or more processors are configured to implement the steps of the method of claim 13 or 14 via execution of the one or more executable instructions.
CN202110783187.7A 2021-07-12 2021-07-12 System and method for finger vein image segmentation Active CN113538359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110783187.7A CN113538359B (en) 2021-07-12 2021-07-12 System and method for finger vein image segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110783187.7A CN113538359B (en) 2021-07-12 2021-07-12 System and method for finger vein image segmentation

Publications (2)

Publication Number Publication Date
CN113538359A true CN113538359A (en) 2021-10-22
CN113538359B CN113538359B (en) 2024-03-01

Family

ID=78127387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110783187.7A Active CN113538359B (en) 2021-07-12 2021-07-12 System and method for finger vein image segmentation

Country Status (1)

Country Link
CN (1) CN113538359B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926439A (en) * 2022-05-24 2022-08-19 福建自贸试验区厦门片区Manteia数据科技有限公司 Skeleton growth point delineation method and device, storage medium and processor
WO2023070447A1 (en) * 2021-10-28 2023-05-04 京东方科技集团股份有限公司 Model training method, image processing method, computing processing device, and non-transitory computer readable medium
CN117036952A (en) * 2023-08-15 2023-11-10 石河子大学 Red date water content grade detection method based on RGB image reconstruction hyperspectral image

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991368A (en) * 2017-02-20 2017-07-28 北京大学 A kind of finger vein checking personal identification method based on depth convolutional neural networks
CN110781773A (en) * 2019-10-10 2020-02-11 湖北工业大学 Road extraction method based on residual error neural network
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111767922A (en) * 2020-05-22 2020-10-13 上海大学 Image semantic segmentation method and network based on convolutional neural network
CN112488935A (en) * 2020-11-26 2021-03-12 杭州电子科技大学 Method for generating antagonistic finger vein image restoration based on texture constraint and Poisson fusion
CN112819801A (en) * 2021-02-10 2021-05-18 桂林电子科技大学 Pulmonary nodule segmentation method for improving U-Net
WO2021104056A1 (en) * 2019-11-27 2021-06-03 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method, and electronic device
US20210201499A1 (en) * 2019-12-30 2021-07-01 Medo Dx Pte. Ltd Apparatus and method for image segmentation using a deep convolutional neural network with a nested u-structure

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991368A (en) * 2017-02-20 2017-07-28 北京大学 A kind of finger vein checking personal identification method based on depth convolutional neural networks
CN110781773A (en) * 2019-10-10 2020-02-11 湖北工业大学 Road extraction method based on residual error neural network
WO2021104056A1 (en) * 2019-11-27 2021-06-03 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method, and electronic device
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
US20210201499A1 (en) * 2019-12-30 2021-07-01 Medo Dx Pte. Ltd Apparatus and method for image segmentation using a deep convolutional neural network with a nested u-structure
CN111767922A (en) * 2020-05-22 2020-10-13 上海大学 Image semantic segmentation method and network based on convolutional neural network
CN112488935A (en) * 2020-11-26 2021-03-12 杭州电子科技大学 Method for generating antagonistic finger vein image restoration based on texture constraint and Poisson fusion
CN112819801A (en) * 2021-02-10 2021-05-18 桂林电子科技大学 Pulmonary nodule segmentation method for improving U-Net

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023070447A1 (en) * 2021-10-28 2023-05-04 京东方科技集团股份有限公司 Model training method, image processing method, computing processing device, and non-transitory computer readable medium
CN114926439A (en) * 2022-05-24 2022-08-19 福建自贸试验区厦门片区Manteia数据科技有限公司 Skeleton growth point delineation method and device, storage medium and processor
CN114926439B (en) * 2022-05-24 2024-06-11 福建自贸试验区厦门片区Manteia数据科技有限公司 Bone growth point delineating method and device, storage medium and processor
CN117036952A (en) * 2023-08-15 2023-11-10 石河子大学 Red date water content grade detection method based on RGB image reconstruction hyperspectral image
CN117036952B (en) * 2023-08-15 2024-04-12 石河子大学 Red date water content grade detection method based on RGB image reconstruction hyperspectral image

Also Published As

Publication number Publication date
CN113538359B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN113077471B (en) Medical image segmentation method based on U-shaped network
CN111784671B (en) Pathological image focus region detection method based on multi-scale deep learning
Li et al. Connection sensitive attention U-NET for accurate retinal vessel segmentation
CN113538359A (en) System and method for finger vein image segmentation
CN111680706B (en) Dual-channel output contour detection method based on coding and decoding structure
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN115661144B (en) Adaptive medical image segmentation method based on deformable U-Net
CN114511798B (en) Driver distraction detection method and device based on transformer
CN113611360A (en) Protein-protein interaction site prediction method based on deep learning and XGboost
CN113449787A (en) Chinese character stroke structure-based font library completion method and system
CN117351550A (en) Grid self-attention facial expression recognition method based on supervised contrast learning
CN115661165A (en) Glioma fusion segmentation system and method based on attention enhancement coding and decoding network
Kaul et al. Focusnet++: Attentive aggregated transformations for efficient and accurate medical image segmentation
Patel et al. A novel approach for semantic segmentation of automatic road network extractions from remote sensing images by modified UNet
CN117011699A (en) GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof
Li Saliency prediction based on multi-channel models of visual processing
CN114783072A (en) Image identification method based on remote domain transfer learning
CN116563898A (en) Palm vein image recognition method, device, equipment and medium based on GhostNet network
CN113705480A (en) Gesture recognition method, device and medium based on gesture recognition neural network
CN112818978A (en) Optical symbol recognition method based on multi-resolution automatic encoder
CN118229712B (en) Liver tumor image segmentation system based on enhanced multidimensional feature perception
Dash et al. Automated signature inspection and forgery detection utilizing VGG-16: a deep convolutional neural network
Oyeniran et al. YORÙBÁNET: A deep convolutional neural network design for Yorùbá alphabets recognition
CN113177111B (en) Chinese text sentiment analysis method and device, computer equipment and storage medium
CN113487622B (en) Head-neck organ image segmentation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant