CN113537111A - Iris segmentation method based on double-branch deep convolutional network - Google Patents


Info

Publication number
CN113537111A
CN113537111A
Authority
CN
China
Prior art keywords
iris
branch
network
model
segmentation
Prior art date
Legal status
Pending
Application number
CN202110841762.4A
Other languages
Chinese (zh)
Inventor
陈思华
李军侠
高昂昂
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110841762.4A
Publication of CN113537111A

Classifications

    • G06F 18/214 — Pattern recognition; analysing; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Neural networks; architecture; combinations of networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/08 — Neural networks; learning methods


Abstract

The invention discloses an iris segmentation method based on a double-branch deep convolutional network, comprising the following steps. Step 1: construct a dual-branch deep network segmentation model comprising an encoding layer, an attention layer, a decoding layer, a mask branch, and an inner/outer edge branch. Step 2: set a loss function to constrain the dual-branch deep network segmentation model. Step 3: train the model using the PyTorch framework. Step 4: test the model. Step 5: input an eye image and perform iris segmentation with the trained model. The method improves the accuracy of segmenting complex human iris images, so that subsequent iris recognition is more precise.

Description

Iris segmentation method based on double-branch deep convolutional network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an iris segmentation method based on a double-branch deep convolutional network.
Background
In recent years, data and information have become ever more accessible, so information security has become an important research direction, and identity authentication is an important part of information security management. Traditional identification credentials, such as ID cards, passports, smart cards, and employee badges, can no longer meet the needs of modern society, so more reliable and more convenient identification methods are needed. Fingerprint-based identification, for example, is already used in everyday life, such as verifying identity when collecting express deliveries.
Because every person's fingerprint ridges differ, a fingerprint database can be built by extracting fingerprint texture information, and a person's identity can be judged by comparing their fingerprint against the database. However, for people doing manual labor, long-term work wears down the fingerprints, causing loss of fingerprint information. Moreover, in daily life people inevitably touch objects, and criminals can forge fingerprints by lifting the fingerprint information left on touched objects, leading to illegal authorization and loss of property.
Face recognition technology is mainly applied in security checks at railway stations and airports, face payment in payment apps, company attendance check-in, and so on. However, facial information can vary greatly with age, mood, occlusion (masks, cosmetics, sunglasses, etc.), and lighting, so the accuracy of face recognition is not very high.
The human iris is unique and, under normal conditions, does not change as the body grows, so it is highly stable. Iris recognition is therefore one of the more stable, accurate, and reliable biometric identification technologies.
A complete iris recognition system usually includes several stages: collecting iris images, segmenting and locating the iris, extracting iris features, and matching against an iris database. Iris segmentation and localization are key parts of iris preprocessing and directly determine the accuracy of subsequent iris recognition.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, against the defects of the prior art, an iris segmentation method based on a double-branch deep convolutional network that improves the accuracy of segmenting complex human iris images, so that subsequent iris recognition is more precise.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
the iris segmentation method based on the double-branch deep convolutional network comprises the following steps:
step 1: constructing a double-branch deep network segmentation model, which comprises a coding layer, an attention layer, a decoding layer, a mask branch and inner and outer edge branches;
step 2: setting a loss function for constraining the dual-branch deep network segmentation model;
and step 3: training a double-branch depth network segmentation model by adopting a Pythrch frame;
and 4, step 4: testing a double-branch depth network segmentation model;
and 5: inputting an eye image, and performing iris segmentation by using a dual-branch depth network segmentation model.
In order to optimize the technical scheme, the specific measures adopted further comprise:
in the coding layer, two times of convolution of 3x3 are adopted, each convolution is provided with a ReLu function, and the maximum pooling of 2x2 is adopted to reduce the data quantity after the convolution;
in the decoding layer, a 3x3 convolution band ReLu function is adopted, and then deconvolution of 2x2 is carried out, and in the process of each deconvolution, the data of the characteristic channel is doubled.
The attention layer extracts context features by processing spatial pyramid pooling and global average pooling in parallel;
The spatial pyramid pooling uses dilated (atrous) convolutions to capture context features at multiple scales, and finally reduces the number of channels to the desired value through a 1×1 convolution; global average pooling is processed in parallel with the pyramid pooling;
Global average pooling accumulates all pixel values in each feature map and then averages them;
After spatial pyramid pooling and global average pooling, the features are fused by a 1×1 convolution to obtain a feature map from which unimportant noise interference is largely filtered out.
In the dual-branch deep network segmentation model, the mask branch and the inner/outer edge branch output the iris mask and the inner and outer iris boundaries, respectively;
Furthermore, Canny edge detection is applied to the iris mask to obtain its inner and outer boundaries; these are coupled with the boundaries produced by the edge branch, and a morphological dilation-erosion operation is then applied to yield the final inner and outer iris boundaries.
The dual-branch deep network segmentation model uses two loss functions; the total loss is:
L = L_M + L_B
where L is the total loss, L_M is the Dice coefficient term for the mask branch, and L_B is the binary cross-entropy term for the inner/outer edge branch;
the Dice coefficient function is used for evaluating the similarity between a predicted result and a real result:
Figure BDA0003179126350000021
in the formula, p is a real result, q is a predicted result, the real result and the predicted result are accumulated and summed from i to j, and then divided by respective square sums to obtain a similar ratio of the predicted result to the real result, namely an accurate predicted result.
The binary cross-entropy function is used to make the segmented iris contour closer to the ground truth; it is computed as:

L_B = -\frac{1}{n}\sum_{i=1}^{output\_size}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]

where y_i is the ground-truth label of the i-th pixel, \hat{y}_i is its predicted value, n is the number of samples, and output_size indicates the output data size, so that summing i from 1 to the full image size evaluates the loss over the whole image.
In the model training of step 3, batch_size is set to 1 and the initial learning rate to 0.01;
training proceeds by stochastic gradient descent until the loss converges, with a validation set used throughout to evaluate model performance and tune the hyper-parameters; training stops once the loss value converges to a constant.
During the model training of step 3, L2 regularization is added to avoid overfitting; the regularization formula is:

\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j w_j^2

where λ is an adjustment factor.
In step 4, an acquired eye image is selected from the test set and input to the dual-branch deep network segmentation model; the trained model weights are loaded and segmentation produces a probability map, which is binarized to generate the final iris contour curve; this curve is superimposed on the original image to observe the segmentation result of the model.
The invention has the following beneficial effects:
the invention adds a class of output at the output end of the encoding and decoding semantic segmentation network, and adds the attention module at the middle part of the encoding and decoding network, thereby improving the segmentation accuracy.
Through convolution-kernel processing, two outputs are obtained at the end of the network: one is the iris mask image and the other is the inner and outer iris boundaries. Applying Canny edge detection to the iris mask also yields inner and outer boundaries; since both sets of boundaries come from the same iris image, coupling them once produces a more accurate iris boundary curve. Because the deep convolutional network introduces some boundary loss during processing, the resulting boundary curve contains breaks and noise, so the method additionally applies morphological dilation and erosion to the network output, yielding a smoother boundary curve.
To let the encoder-decoder network capture image feature information better, an attention layer is added at the center of the network; the attention module processes spatial pyramid pooling and global average pooling in parallel, which enlarges the receptive field and captures more feature information.
Drawings
FIG. 1 is a diagram of a network architecture;
FIG. 2 is a diagram of an attention module configuration;
FIG. 3 is a graph of the segmentation results;
in fig. 3, (a) an iris original image, (b) an iris mask, (c) an inner and outer iris outline, and (d) an iris prediction result;
FIG. 4 is a graph showing the results of comparison of different methods;
in FIG. 4: (a) the original image, (b) the iris pull-and-push elasticity model, (c) U-net, (d) V-net, (e) U-net + attention, (f) the method of the present invention;
FIG. 5 is a graph showing the results of comparing the accuracy of the method of the present invention;
FIG. 6 is a flow chart of the method of the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to fig. 6, the present invention provides an iris segmentation method based on a dual-branch deep convolutional network, which includes:
Step 1: construct a dual-branch deep network segmentation model comprising an encoding layer, an attention layer, a decoding layer, a mask branch, and an inner/outer edge branch;
Step 2: set a loss function to constrain the dual-branch deep network segmentation model;
Step 3: train the dual-branch deep network segmentation model using the PyTorch framework;
Step 4: test the dual-branch deep network segmentation model;
Step 5: input an eye image and perform iris segmentation with the dual-branch deep network segmentation model.
In this embodiment, the invention adds a second class of output at the output of the encoder-decoder semantic segmentation network and inserts an attention module in the middle of the network, improving segmentation accuracy.
Through convolution-kernel processing, two outputs are obtained at the end of the network: an iris mask image and the inner and outer iris boundaries. Applying Canny edge detection to the iris mask also yields inner and outer boundaries; since both come from the same iris image, coupling them once produces a more accurate iris boundary curve. Because the deep convolutional network introduces some boundary loss, the resulting curve contains breaks and noise, so morphological dilation and erosion are applied to the network output to obtain a smoother curve. To let the encoder-decoder network capture feature information better, an attention layer is added at the center of the network; it processes spatial pyramid pooling and global average pooling in parallel, enlarging the receptive field and capturing more feature information.
The dataset used for model training is CASIA.v4-distance, which contains 2567 iris images from 142 subjects, captured at a distance by the Institute of Automation, Chinese Academy of Sciences (CASIA) with a long-range iris camera under near-infrared (NIR) illumination; each image contains the complete upper half of the face and therefore includes both the left and right iris.
Specifically, the method comprises the following steps:
in step 1, as shown in fig. 1, a network structure diagram is shown, and a typical codec network is at the front end.
In the encoding layer, two 3×3 convolutions are applied, each followed by a ReLU activation; 2×2 max pooling after the convolutions reduces the amount of data and avoids the overfitting that a large number of convolution operations could cause.
In the decoding layer, a 3×3 convolution with ReLU is likewise used, followed by a 2×2 deconvolution; each deconvolution doubles the feature-map resolution so that a complete prediction can be obtained.
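The encoder and decoder stages described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's exact network: the channel counts and input size are assumptions chosen for the example.

```python
# Encoder stage: two 3x3 convs, each followed by ReLU, then 2x2 max pooling.
# Decoder stage: a 3x3 conv + ReLU followed by a 2x2 transposed convolution
# (deconvolution) that doubles the spatial resolution.
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)   # halves spatial size, reducing data volume

    def forward(self, x):
        feat = self.conv(x)           # kept for the skip connection
        return feat, self.pool(feat)

class DecoderStage(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2)  # 2x2 deconv

    def forward(self, x):
        return self.up(self.conv(x))

enc = EncoderStage(1, 64)
dec = DecoderStage(64, 32)
x = torch.randn(1, 1, 64, 64)         # a single grayscale eye image (toy size)
skip, down = enc(x)                   # down-sampled features go deeper
up = dec(down)                        # deconvolution restores the resolution
```

A full model would stack several such stages and concatenate each `skip` tensor into the matching decoder stage, U-net style.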
The network output is expanded from the original single class of output to two classes.
By processing with different convolution kernels, the network generates the different prediction targets:
the first output is the iris mask, and the second output is the inner and outer iris boundaries.
For the first path, the inner and outer iris boundaries can be obtained by applying Canny edge detection to the iris mask output by the network; since the network input is the eye image of the same person, the boundaries obtained from the first and second paths should in principle be consistent.
Based on this observation, the boundaries obtained via Canny from the first path are coupled once with the boundaries from the second path, yielding more accurate inner and outer iris boundaries.
However, the boundary obtained at this point still carries considerable noise, leaving the edges very rough; a morphological dilation-erosion operation is therefore applied afterwards to obtain a complete, smooth edge contour curve.
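The coupling and morphological clean-up can be illustrated with the following sketch. Two assumptions are made for self-containment: the mask boundary is extracted as mask-minus-erosion (standing in for Canny), and the coupling is taken to be the union of the two boundary estimates; all arrays are toy data.

```python
# Couple the boundary derived from the mask branch with the boundary predicted
# by the edge branch, then repair small breaks with dilation followed by
# erosion (a morphological closing).
import numpy as np
from scipy import ndimage

def mask_boundary(mask):
    """Edge pixels of a binary mask: the mask minus its erosion (Canny stand-in)."""
    return mask & ~ndimage.binary_erosion(mask)

def couple_boundaries(mask, edge_pred):
    """Union of the two boundary estimates, then dilate + erode to smooth."""
    coupled = mask_boundary(mask) | edge_pred
    return ndimage.binary_erosion(ndimage.binary_dilation(coupled))

# Toy example: a filled disc as the iris mask, and an edge prediction with a gap.
yy, xx = np.mgrid[:32, :32]
mask = (yy - 16) ** 2 + (xx - 16) ** 2 <= 100      # disc of radius 10
edge_pred = mask_boundary(mask).copy()
edge_pred[16, :] = False                           # simulate a broken edge
final_edge = couple_boundaries(mask, edge_pred)    # closed, smoother contour
```

Because closing is extensive, the coupled contour never loses pixels of either estimate; it only fills small gaps.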
Considering that a large number of interfering noise signals arise when iris images are acquired, an attention layer is added to the network. As shown in fig. 1, it is inserted between the encoding and decoding paths of the network, so that the network can focus on more feature information and reduce the influence of noise on that information.
Fig. 2 shows the structure of the attention layer. It draws on the atrous spatial pyramid pooling (ASPP) introduced in DeepLab V3, and is therefore designed to extract context features by processing spatial pyramid pooling and global average pooling in parallel.
The spatial pyramid pooling uses dilated (atrous) convolutions, which enlarge the receptive field of the convolution so that each convolution output contains information from a larger range.
Spatial pyramid pooling is thus equivalent to capturing context features at multiple scales; the number of channels is finally reduced to the desired value by a 1×1 convolution.
In parallel with the pyramid pooling, a global average pooling branch is processed, which accumulates all pixel values in each feature map and averages them. After spatial pyramid pooling and global average pooling, the features are fused by a 1×1 convolution, so the resulting feature map is largely free of unimportant noise interference, which greatly eases subsequent processing.
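A minimal PyTorch sketch of such an attention layer follows. The dilation rates and channel counts are illustrative assumptions, not values from the patent or from DeepLab V3.

```python
# Parallel dilated convolutions (spatial pyramid pooling) plus a global
# average pooling branch, concatenated and reduced by a 1x1 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPAttention(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12)):
        super().__init__()
        # one 3x3 dilated conv per rate: padding=rate keeps the spatial size
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.gap = nn.Sequential(          # global average pooling branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1),
        )
        # 1x1 conv fuses all branches and reduces channels to the target value
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.gap(x), size=(h, w),
                          mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [g], dim=1))

attn = ASPPAttention(64, 64)
y = attn(torch.randn(1, 64, 16, 16))   # same spatial size, fused channels
```

Larger dilation rates widen the receptive field without extra parameters, which matches the stated goal of capturing more context.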
In step 2, to give the model better segmentation performance, the invention designs a loss function for the network. Constraining the model through the loss function pushes the predicted values toward the true values, yielding higher segmentation precision. Two loss functions are used; the total loss is:
L = L_M + L_B
where L is the total loss, L_M is the Dice coefficient term for the mask branch, and L_B is the binary cross-entropy term for the inner/outer edge branch.
The Dice coefficient is mainly used to evaluate the similarity between the predicted result and the ground truth:

L_M = \frac{2\sum_i p_i q_i}{\sum_i p_i^2 + \sum_i q_i^2}

where L_M is the loss term on the mask branch, p is the ground truth, and q is the prediction; the products are summed over all pixels i and divided by the respective sums of squares, giving the similarity between the prediction and the ground truth. Its value generally lies in [0, 1]: 0 represents a complete miss, with no similarity between prediction and ground truth; 1 is the ideal state, where the prediction matches the true result exactly. During model training, L_M should therefore be made as large as possible, approaching 1.
The binary cross-entropy function mainly ensures that the segmented iris contour stays close to the ground truth; it is computed as:

L_B = -\frac{1}{n}\sum_{i=1}^{output\_size}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]

where L_B is the boundary loss on the inner/outer contour branch, y_i is the ground-truth label of the i-th pixel, \hat{y}_i is its predicted value, n is the number of samples, and output_size indicates the output data size, so that summing i from 1 to the full image size evaluates the loss over the whole image. This term constrains the prediction so that the segmented iris contour approaches the real image, improving segmentation precision.
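The two loss terms can be sketched in PyTorch as follows. Note one assumption: the patent describes L_M as the Dice coefficient itself (best at 1), so for gradient descent this sketch minimizes the common trainable form 1 − Dice instead.

```python
# Dice term for the mask branch plus binary cross-entropy for the edge
# branch, summed as in L = L_M + L_B.
import torch
import torch.nn.functional as F

def dice_coefficient(pred, target, eps=1e-6):
    """2*sum(p*q) / (sum(p^2) + sum(q^2)); 1 means a perfect match."""
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.pow(2).sum() + target.pow(2).sum() + eps)

def total_loss(mask_pred, mask_gt, edge_pred, edge_gt):
    l_m = 1 - dice_coefficient(mask_pred, mask_gt)    # mask branch (1 - Dice)
    l_b = F.binary_cross_entropy(edge_pred, edge_gt)  # edge branch
    return l_m + l_b

# Sanity check: perfect predictions give a near-zero loss.
gt = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
loss_perfect = total_loss(gt, gt, gt, gt)
```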
In step 3, the PyTorch framework is used for model training; PyTorch ships with a large number of packages, which keeps training simple.
The optimization algorithm used in training is stochastic gradient descent (SGD). SGD has many desirable properties: it is not only computationally fast but can also automatically escape relatively poor local optima. The solution found by SGD also generalizes well, i.e. it performs well on unseen data drawn from the same distribution.
During training, batch_size is set to 1 and the initial learning rate to 0.01. Training proceeds by stochastic gradient descent until the loss converges, with a validation set used to evaluate model performance and tune the hyper-parameters. The loss value is monitored during training, and training stops once the loss converges to a constant.
In the initial training runs, the model overfitted somewhat because the database samples are complex.
After the overfitting appeared, L2 regularization was added to model training, which effectively avoids the phenomenon.
The regularization formula adjusts the squared residuals by adding a shrinkage penalty:

\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j w_j^2

where λ is an adjustment factor that determines how the model's complexity is penalized; model complexity is reflected in the growth of the coefficients. A suitable λ must be chosen for this scheme; with a proper value, overfitting is effectively avoided and the model's generalization ability improves.
Example (b): in step 3 model training, 60% of the total amount of data is used as a training set, 20% is used as a validation set, and 20% is used as a test set. Since the images in the data set are the top images of the entire face, it is necessary to perform preprocessing to manually extract the left and right eye images for each subject.
The training adopts a Pythrch deep learning frame, the optimization algorithm adopts an SGD algorithm, and the hyper-parameters are set as follows: batch _ size is set to 1; the initial learning rate was set to 0.01. In order to avoid the situation that the model falls into overfitting in the training process, L2 regularization processing is added in the training process.
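The training configuration above can be sketched as follows. The tiny model, the random data, and the weight-decay value are placeholders (assumptions) standing in for the real network, the CASIA.v4-distance loader, and the tuned λ; L2 regularization is applied through SGD's weight_decay, PyTorch's usual route to that penalty.

```python
# SGD training sketch: batch_size 1, initial learning rate 0.01, L2
# regularization via weight_decay, iterating until the loss converges.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Conv2d(1, 1, 3, padding=1)        # placeholder for the real network
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,         # initial learning rate
                            weight_decay=1e-4)  # L2 penalty; value is a guess
criterion = nn.BCEWithLogitsLoss()           # stand-in pixel-wise loss

x = torch.randn(1, 1, 8, 8)                  # batch_size = 1
target = torch.rand(1, 1, 8, 8).round()     # toy binary ground truth

losses = []
for step in range(20):                       # in practice: until convergence
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
```

In a real run the loop would iterate over a DataLoader, and the validation set would be evaluated between epochs to tune the hyper-parameters.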
After the network is trained as described, an acquired eye image is selected from the test set and input to the dual-branch deep network segmentation model; the trained model weights are loaded and segmentation produces a probability map, which is binarized (values of 0.5 and above become 1, values below 0.5 become 0) to generate the final iris contour curve. To observe the segmentation result more intuitively, the generated iris contour curve is superimposed on the original image, as shown in fig. 3.
In fig. 3, (a) is the original iris image; a relatively complete iris was chosen so the segmentation can be discussed clearly. (b) is the iris mask annotated with the labelme software. (c) is the contour curve generated by the network; the edges are visibly not smooth, there is considerable interference, and local breaks are present. To address this, morphological processing is added at the network output: an erosion first removes noise points and burrs on the contour line, and since erosion can create small gaps, a dilation is then applied once, which closes small curve defects. The final segmentation result is shown in fig. 3(d): the contour curve essentially matches the inner and outer iris contours, and the curve is smooth rather than riddled with the noise points of the raw network output.
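The test-time post-processing described above can be sketched as follows; the arrays are toy stand-ins for the network's probability map and the original image, and the 0.5-inclusive threshold is an assumption about the tie-breaking at exactly 0.5.

```python
# Threshold the probability map at 0.5 into a binary contour map, then
# superimpose the contour on the original image for visual inspection.
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Values >= threshold become 1, values below it become 0."""
    return (prob_map >= threshold).astype(np.uint8)

def overlay_contour(image, contour, value=255):
    """Paint contour pixels onto a copy of the (grayscale) original image."""
    out = image.copy()
    out[contour.astype(bool)] = value
    return out

prob = np.array([[0.1, 0.7],
                 [0.5, 0.4]])                 # toy probability map
binary = binarize(prob)                       # binary contour map
img = np.zeros((2, 2), dtype=np.uint8)        # toy "original" image
vis = overlay_contour(img, binary)            # contour drawn in white
```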
Next, to demonstrate the advance of the invention, comparative experiments were conducted against several popular schemes. To compare the iris segmentation results of the various methods, one normal iris and four more complex iris images were used. As shown in fig. 4, the first row is a normal iris image. The second row uses an iris image with specular reflection at acquisition: a reflected highlight is visible on the subject's eyeball, and specular reflection directly affects the accuracy of iris localization. The third row uses an image in which eyelid occlusion causes loss of the iris: because the subject's eyes were not fully open, the upper half of the iris is completely occluded by the eyelid. The fourth row uses occlusion by glasses, which reflect illumination and greatly complicate segmentation. The fifth row is an image with gaze deviation: the subject's eyes were not focused on the acquisition device, causing the iris to be off-center. These situations are frequently encountered during iris segmentation, and interference with iris acquisition in real production and life scenarios is severe, so these four representative cases were selected for comparison.
Comparative tests were run against the classical iris pull-and-push elasticity model, the U-net network, the V-net network, and a U-net network with an attention module; the results are shown in fig. 4. These methods were reproduced by studying the relevant papers and implementing the code by hand. The attention module in the compared U-net sits at the center of the network, like the attention layer of the invention, but differs in that it is built from the most basic convolutions, whereas the invention uses an attention module that processes spatial pyramid pooling and global average pooling in parallel.
As fig. 4 shows, all of the methods segment a normal iris effectively. For the more complex irises, however, the U-net with attention module comes closest to the proposed method, yet its segmented region is often slightly larger than the iris, whereas the method of the invention fits the inner and outer iris contours exactly.
The classical iris segmentation method performs well on normal irises: because it classifies by boundary pixel gradients, and the pixel difference between iris and pupil at the inner boundary is large, its inner-boundary segmentation is good. At the outer boundary, however, heavy interference on complex irises can cause segmentation errors. Moreover, during acquisition the camera lens must not be reflected in the pupil, because such reflections cause large pixel-gradient changes that mislead the classical segmentation algorithm.
The segmentation results of the more widely used U-net and V-net networks are better than the classical algorithm: they segment the approximate iris boundary well, though with some deviation, and on some highly interfered complex irises both methods can deviate from the true boundary. The U-net with attention module outperforms the plain U-net, and in these comparison experiments its segmentation result is the closest to the proposed method: it locates the iris in the data well and predicts the segmentation, which shows that adding an attention module to the network indeed improves the performance of the network model.
To further verify the usability of the method of the invention and the precision of its segmentation, the accuracy of the method was compared experimentally against manual labeling, as shown in fig. 5. A simpler iris image is used so that the differences are clearly visible: (a) is a fairly complete iris image in which the iris and pupil are concentric circles, (b) is the manually marked ground-truth curve, an accurate true value, and the curve in (c) is the segmentation result of the invention. As is evident from fig. 5, the method of the invention and the ground truth coincide almost completely on the well-defined part of the iris, and the coincidence remains high where the iris is covered by the upper eyelid. This comparison shows that the precision of the method is close to the true value.
The model is then evaluated quantitatively. The Dice index is adopted; its formula is the same as that of the Dice-coefficient loss, and it likewise measures the similarity between the true and predicted values. Table 1 compares the Dice index of the method of the invention with those of the U-net, V-net, and attention-augmented U-net networks.
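The Dice index described in this paragraph can be sketched in a few lines; the helper name `dice_index` and the binary-mask inputs are illustrative assumptions, not part of the invention.

```python
import numpy as np

def dice_index(pred, truth, eps=1e-7):
    """Dice similarity between a binary prediction and ground truth.

    Same form as the Dice-coefficient loss used during training:
    2*|P ∩ T| / (|P| + |T|), with eps guarding against empty masks.
    """
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)
```

Identical masks score 1 and disjoint masks score 0, so a value such as the 0.88 reported for the method would indicate high overlap with the manual labels.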
TABLE 1 comparison of Dice indices for the inventive method and different methods
(Table 1 is reproduced as an image in the original publication.)
Table 1 confirms the comparison of fig. 4 in the previous section: the segmentation result of the U-net network with the attention module is the closest to that of the method of the invention. The plain U-net network, as analyzed above, declines somewhat on more complex images; the V-net network, which appeared after U-net, improves on its indices but still falls short overall. The segmentation accuracy of the proposed method reaches a Dice index of 0.88, nearly 10% higher than the U-net and V-net algorithms, which shows that the algorithm of the invention is superior to U-net and V-net in iris segmentation prediction.
Next, ablation experiments were carried out; the results are shown in Table 2.
Table 2 ablation test results of the present invention
(Table 2 is reproduced as an image in the original publication.)
With the attention module and the dual output branches removed, the network reduces to an encoder-decoder structure essentially the same as the U-net network, so the computed indices are the same. The index of the encoder-decoder structure with the attention module, however, is higher than that of the corresponding scheme in Table 1. This pair of results shows that when an attention module is added to a network, the internal design of the module matters greatly: an ordinary attention module does not segment as well as the parallel combination of pyramid pooling and global average pooling. Comparing, in Table 2, the encoder-decoder structure with the attention module alone against the structure with the multi-branch output module alone, it is easy to see that the attention module improves network performance more than the multi-branch output does. Finally, comparing the complete method with either single module demonstrates that the complete method is superior in segmentation accuracy to either component alone.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (8)

1. The iris segmentation method based on the double-branch deep convolutional network is characterized by comprising the following steps:
step 1: constructing a double-branch deep network segmentation model comprising a coding layer, an attention layer, a decoding layer, a mask branch, and an inner and outer edge branch;
step 2: setting loss functions for constraining the double-branch deep network segmentation model;
step 3: training the double-branch deep network segmentation model by adopting the PyTorch framework;
step 4: testing the double-branch deep network segmentation model;
step 5: inputting an eye image and performing iris segmentation by using the double-branch deep network segmentation model.
2. The method as claimed in claim 1, wherein in the coding layer, two 3x3 convolutions are adopted, each followed by a ReLU function, and a 2x2 max pooling follows the convolutions to reduce the data volume;
in the decoding layer, a 3x3 convolution with a ReLU function is adopted, followed by a 2x2 deconvolution, and with each deconvolution the feature-channel data is doubled.
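A minimal PyTorch sketch of the coding- and decoding-layer building blocks described in claim 2 might look as follows; the class names and channel arguments are hypothetical, and only the operations named in the claim (two 3x3 convolutions with ReLU plus 2x2 max pooling; a 3x3 convolution with ReLU plus a 2x2 deconvolution) are taken from the text.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two 3x3 convolutions, each followed by ReLU, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)  # halves spatial size, reducing data volume

    def forward(self, x):
        f = self.convs(x)            # full-resolution features (usable as a skip)
        return f, self.pool(f)       # also return the pooled, downsampled map

class DecoderBlock(nn.Module):
    """3x3 convolution with ReLU, then a 2x2 deconvolution (transposed conv)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2)  # doubles H and W

    def forward(self, x):
        return self.up(self.conv(x))
```

Stacking several of each block, with skip connections between them, would yield the encoder-decoder backbone the claims describe.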
3. The iris segmentation method based on the double-branch deep convolutional network of claim 1, wherein the attention layer extracts context features by processing spatial pyramid pooling and global average pooling in parallel;
the spatial pyramid pooling adopts atrous (dilated) convolution, captures context features at several scales, and finally reduces the number of channels to the desired value through a 1x1 convolution; global average pooling is processed in parallel with the pyramid pooling;
the global average pooling accumulates all pixel values of each feature map and takes their average;
after the spatial pyramid pooling and global average pooling, the features are processed by a 1x1 convolution to obtain a feature map from which unimportant noise interference is largely filtered out.
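One way the attention layer of claim 3 could be sketched in PyTorch: atrous-convolution pyramid branches and a global-average-pooling branch processed in parallel, fused by a final 1x1 convolution. The class name, the dilation rates, and the nearest-neighbor upsampling of the pooled branch are assumptions, not details given in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionContext(nn.Module):
    """Spatial pyramid pooling (atrous convolutions at several rates)
    in parallel with global average pooling, fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12)):
        super().__init__()
        # pyramid branches: 3x3 atrous convolutions at multiple dilation rates
        self.pyramid = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        # parallel branch: global average pooling followed by a 1x1 conv
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, 1))
        # final 1x1 conv reduces the concatenated channels to the expected value
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.pyramid]
        g = F.interpolate(self.gap(x), size=(h, w), mode='nearest')
        return self.fuse(torch.cat(feats + [g], dim=1))
```

This module would sit at the center of the network, between the coding and decoding layers, as the description of fig. 4 indicates.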
4. The method of claim 1, wherein in the model, the mask branch and the inner and outer edge branch output the iris mask and the inner and outer boundaries of the iris, respectively;
furthermore, Canny edge detection is performed on the iris mask to obtain its inner and outer boundaries, which are coupled with the boundaries obtained from the inner and outer edge branch; a morphological dilation-and-erosion operation is then applied to produce the final inner and outer boundaries of the iris.
5. The method of claim 1, wherein the double-branch deep convolutional network segmentation model adopts two loss functions, and the total loss function is:
L = L_M + L_B
where L is the total loss function, L_M is the Dice-coefficient loss for the mask branch, and L_B is the binary cross-entropy loss for the inner and outer edge branches;
the Dice coefficient function is used for evaluating the similarity between a predicted result and a real result:
Figure FDA0003179126340000021
in the formula, p is a real result, q is a predicted result, the real result and the predicted result are accumulated and summed from i to j, and then divided by respective square sums to obtain a similar ratio of the predicted result to the real result, namely an accurate predicted result.
The binary cross-entropy function is used to ensure that the segmented iris contour is closer to the true value; its calculation formula is:
L_B = -(1/output_size) · Σᵢ₌₁ⁿ [ yᵢ·log(ŷᵢ) + (1 - yᵢ)·log(1 - ŷᵢ) ]
where yᵢ is the true value of the i-th sample, ŷᵢ is the predicted value of the i-th sample, n is the number of samples, and output_size represents the output data size.
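The two loss terms of claim 5 can be sketched directly from the formulas above; the function names are illustrative, and the Dice term is written here as 1 minus the coefficient so that minimizing it increases similarity.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-7):
    """L_M: 1 - Dice coefficient, where the coefficient is
    2*sum(p*q) / (sum(p^2) + sum(q^2))."""
    inter = (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum() + eps
    return 1.0 - 2.0 * inter / denom

def total_loss(mask_pred, mask_true, edge_pred, edge_true):
    """L = L_M + L_B: Dice loss on the mask branch plus binary
    cross-entropy on the inner and outer edge branch."""
    l_m = dice_loss(mask_pred, mask_true)
    l_b = F.binary_cross_entropy(edge_pred, edge_true)
    return l_m + l_b
```

Each branch is thus constrained by the loss suited to its output: region overlap for the mask, per-pixel classification for the boundary curves.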
6. The iris segmentation method based on the double-branch deep convolutional network of claim 1, wherein in the model training process of step 3, the batch_size is set to 1 and the initial learning rate is set to 0.01;
training proceeds by stochastic gradient descent until the loss converges, with the validation set used throughout to evaluate model performance and adjust the hyperparameters; training stops when the loss value finally converges to a constant.
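A schematic training loop for the scheme of claim 6 (stochastic gradient descent, batch size 1, initial learning rate 0.01) might look like this; `model`, `loss_fn`, and the loader of (input, target) pairs are stand-ins for the invention's actual network and data pipeline, and a fixed epoch count replaces the convergence check and validation-set tuning for brevity.

```python
import torch
import torch.nn as nn

def train(model, loss_fn, train_loader, epochs=3, lr=0.01):
    """Stochastic gradient descent with the claimed initial learning rate.

    Returns the per-step loss history, which can be inspected to decide
    when the loss has converged to a constant."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:      # loader is assumed to yield batches of size 1
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            history.append(loss.item())
    return history
```

In the invention's setting, `loss_fn` would be the combined Dice plus binary cross-entropy loss of claim 5, and a validation pass would be run between epochs to tune the hyperparameters.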
7. The iris segmentation method based on the double-branch deep convolutional network of claim 1, wherein during the model training of step 3, L2 regularization is added to avoid overfitting; the regularization augments the loss with a squared-weight penalty of the form
L = L₀ + (λ/2n)·Σ_w w²
where L₀ is the unregularized loss, w ranges over the network weights, n is the number of samples, and λ is the adjustment coefficient.
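The L2 penalty of claim 7 can be added either as an explicit term or through the optimizer; the sketch below shows both, with `lam` standing in for the adjustment coefficient λ (the exact normalization in the patent's image-rendered formula is not recoverable from the text).

```python
import torch
import torch.nn as nn

def l2_penalty(model, lam=1e-4):
    """Explicit L2 term: lam times the sum of squared parameters,
    to be added to the task loss before backpropagation."""
    return lam * sum((p ** 2).sum() for p in model.parameters())

# Equivalently, PyTorch folds the same penalty into the update rule via
# the optimizer's weight_decay argument:
# opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```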
8. The iris segmentation method based on the double-branch deep convolutional network of claim 1, wherein in step 4 an eye image taken from any one of the test sets is input into the double-branch deep network segmentation model, the trained model weights are loaded, and segmentation produces a probability map; the probability map is binarized to generate the final iris contour curve, which is superimposed on the original image to observe the segmentation result of the model.
CN202110841762.4A 2021-07-26 2021-07-26 Iris segmentation method based on double-branch deep convolutional network Pending CN113537111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110841762.4A CN113537111A (en) 2021-07-26 2021-07-26 Iris segmentation method based on double-branch deep convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110841762.4A CN113537111A (en) 2021-07-26 2021-07-26 Iris segmentation method based on double-branch deep convolutional network

Publications (1)

Publication Number Publication Date
CN113537111A true CN113537111A (en) 2021-10-22

Family

ID=78120717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110841762.4A Pending CN113537111A (en) 2021-07-26 2021-07-26 Iris segmentation method based on double-branch deep convolutional network

Country Status (1)

Country Link
CN (1) CN113537111A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114474056A (en) * 2022-01-26 2022-05-13 北京航空航天大学 Grabbing operation-oriented monocular vision high-precision target positioning method
CN115661178A (en) * 2022-11-17 2023-01-31 博奥生物集团有限公司 Method and apparatus for segmenting an imprinted image
CN116110113A (en) * 2022-11-15 2023-05-12 南昌航空大学 Iris recognition method based on deep learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815850A (en) * 2019-01-02 2019-05-28 中国科学院自动化研究所 Iris segmentation and localization method, system, device based on deep learning
CN110059586A (en) * 2019-03-29 2019-07-26 电子科技大学 A kind of Iris Location segmenting system based on empty residual error attention structure
US10482603B1 (en) * 2019-06-25 2019-11-19 Artificial Intelligence, Ltd. Medical image segmentation using an integrated edge guidance module and object segmentation network
CN111127470A (en) * 2019-12-24 2020-05-08 江西理工大学 Image semantic segmentation method based on context and shallow space coding and decoding network
CN112070779A (en) * 2020-08-04 2020-12-11 武汉大学 Remote sensing image road segmentation method based on convolutional neural network weak supervised learning
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BO WANG et al.: "Boundary Aware U-Net for Retinal Layers Segmentation in Optical Coherence Tomography Images", IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 8, pages 3029-3040, XP011870399, DOI: 10.1109/JBHI.2021.3066208 *
CAIYONG WANG et al.: "Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework", arXiv, pages 1-13 *
CAIYONG WANG et al.: "Towards Complete and Accurate Iris Segmentation Using Deep Multi-Task Attention Network for Non-Cooperative Iris Recognition", IEEE Transactions on Information Forensics and Security, vol. 15, pages 2944-2959, XP011780915, DOI: 10.1109/TIFS.2020.2980791 *
YINGLIN ZHANG et al.: "A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation", arXiv, pages 1-10 *
WANG Caiyong et al.: "An evaluation benchmark for iris segmentation algorithms", Journal of Computer Research and Development, vol. 57, no. 2, pages 395-412 *
NIE Changming et al.: "Deep Learning: Introduction to Algorithms and Keras Programming Practice", Beijing Institute of Technology Press, pages 151-152 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114474056A (en) * 2022-01-26 2022-05-13 北京航空航天大学 Grabbing operation-oriented monocular vision high-precision target positioning method
CN114474056B (en) * 2022-01-26 2023-07-21 北京航空航天大学 Monocular vision high-precision target positioning method for grabbing operation
CN116110113A (en) * 2022-11-15 2023-05-12 南昌航空大学 Iris recognition method based on deep learning
CN115661178A (en) * 2022-11-17 2023-01-31 博奥生物集团有限公司 Method and apparatus for segmenting an imprinted image

Similar Documents

Publication Publication Date Title
Lajevardi et al. Retina verification system based on biometric graph matching
CN113537111A (en) Iris segmentation method based on double-branch deep convolutional network
CN108009520A (en) A kind of finger vein identification method and system based on convolution variation self-encoding encoder neutral net
Chirchi et al. Iris biometric recognition for person identification in security systems
Tan et al. Human identification from at-a-distance images by simultaneously exploiting iris and periocular features
CN108921019A (en) A kind of gait recognition method based on GEI and TripletLoss-DenseNet
CN109034016A (en) A kind of hand back vein image-recognizing method based on S-CNN model of universality
Nithya et al. Iris recognition techniques: a literature survey
Meng et al. Finger vein recognition based on convolutional neural network
CN114821682B (en) Multi-sample mixed palm vein identification method based on deep learning algorithm
CN113469143A (en) Finger vein image identification method based on neural network learning
Chen et al. Hierarchical clustering based band selection algorithm for hyperspectral face recognition
Xu et al. A novel method for iris feature extraction based on intersecting cortical model network
Farmanbar et al. A hybrid approach for person identification using palmprint and face biometrics
Murugan et al. Fragmented iris recognition system using BPNN
Pathak et al. Multimodal eye biometric system based on contour based E-CNN and multi algorithmic feature extraction using SVBF matching
Sathish et al. Multi-algorithmic iris recognition
Pathak et al. Entropy based CNN for segmentation of noisy color eye images using color, texture and brightness contour features
CN110443217A (en) One kind being based on multispectral fingerprint method for anti-counterfeit and system
Yu et al. Research on face recognition method based on deep learning
Al-jaberi et al. Palm vein recognition based on convolution neural network
Yang et al. Recognition and classification of damaged fingerprint based on deep learning fuzzy theory
Yani et al. A robust damaged fingerprint identification algorithm based on deep learning
Zuobin et al. Effective feature fusion for pattern classification based on intra-class and extra-class discriminative correlation analysis
Huang et al. Axially-enhanced local attention network for finger vein recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination