CN115830319A - Strabismus iris segmentation method based on attention mechanism and verification method - Google Patents

Strabismus iris segmentation method based on attention mechanism and verification method

Info

Publication number
CN115830319A
Authority
CN
China
Prior art keywords
iris
segmentation
contour
feature
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211513398.XA
Other languages
Chinese (zh)
Inventor
唐红雨
施冬梅
沙鸥
胡杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhenjiang College
Original Assignee
Zhenjiang College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhenjiang College filed Critical Zhenjiang College
Priority to CN202211513398.XA priority Critical patent/CN115830319A/en
Publication of CN115830319A publication Critical patent/CN115830319A/en
Withdrawn legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a squint (strabismus) iris segmentation method and a verification method based on an attention mechanism. By constructing a localization-and-segmentation model that combines an attention mechanism with an active contour method, and by considering the inner and outer iris boundaries together with the effective iris region, it provides an innovative technical scheme for the squint-iris segmentation problem. Exploiting the naturally closed curve contours of iris images, the inner and outer boundaries of the squint iris are localized from coarse to fine with an active contour model, using the pole coordinates obtained after coarse localization as the initial-contour-point strategy and fitting by contour-point offset prediction. The resulting 128 ordered contour points are then used to locate the effective iris region: the areas near and between the inner and outer boundaries are extracted as the effective prediction region of the segmentation stage, which narrows the range to be predicted and supplies iris boundary information for subsequent steps such as normalization.

Description

Strabismus iris segmentation method and verification method based on attention mechanism
Technical Field
The invention relates to an attention mechanism-based squint iris segmentation method and a verification method, and belongs to the technical field of artificial intelligence.
Background
In modern society, biometric recognition has become an essential part of daily life. Among recognition modalities such as palm prints, faces, fingerprints, gait, and irises, iris recognition offers the highest security: the iris is unique, stable, and contactless, differs between individuals and even between twins, and is difficult to counterfeit. An iris recognition task generally begins with image preprocessing, which mainly comprises iris image quality assessment, iris liveness detection, iris localization, iris segmentation, and iris normalization. However, iris recognition places high demands on the acquired image, and these demands are difficult to satisfy in today's increasingly unconstrained recognition scenes: the captured eye region cannot be guaranteed to look straight at the lens, and noise such as oblique gaze (squint), closed eyes, occlusion, and reflection interferes with acquisition. Unlike eye closure or occlusion, the squint problem still leaves an effective iris area usable for recognition in the captured image, yet most iris segmentation models lack the ability to handle squint interference.
The prior art segments iris images from two angles: pixel-based and boundary-based. Most traditional iris segmentation methods are not robust, and squint-iris detection must follow the precondition that the inner and outer iris boundaries are assumed circular. In reality, the inner and outer boundaries form irregular closed curves that only approximate circles, and for a squint iris they appear as free curves closer to ellipses than circles. Pixel-based methods take the pixel as the minimum unit and ignore the fact that the iris structure has continuous boundaries. Moreover, such methods require manually set factors and hand-picked features, and their feature extraction and classifier training are separated, so they perform poorly on iris segmentation of noisy images. With the wide adoption of deep learning frameworks, many researchers have applied deep learning to the iris segmentation task and proposed numerous segmentation methods. Most convert iris segmentation into a binary semantic segmentation problem, so many semantic segmentation methods achieve good results in the iris segmentation field. For example, multi-scale convolutional neural networks have been applied to iris segmentation, and attention mechanisms have been introduced into U-Net models; after an iris mask region is obtained, a salient point set is selected by K-means clustering to derive the diameters of the inner and outer iris boundaries, but this performs poorly when the iris region is not connected and treats the inner and outer boundaries as regular curves. More fundamentally, most deep-learning-based iris segmentation methods only achieve effective segmentation of the iris region and neglect the localization of the inner and outer iris boundaries, which matters more for iris recognition because it directly affects iris normalization; it is therefore difficult to apply these otherwise well-performing segmentation methods directly in an iris recognition system.
Disclosure of Invention
The invention aims to provide a squint iris segmentation method and a verification method based on an attention mechanism. By constructing a localization-and-segmentation model based on an attention mechanism and an active contour method, and by considering the inner and outer iris boundaries together with the effective iris region, it provides an innovative technical scheme for the squint-iris segmentation problem.
The purpose of the invention is realized by the following technical scheme:
an attention mechanism-based squint iris segmentation method comprises the following steps:
step one, human eye image acquisition:
collecting images of the human eyes to be identified with a camera and storing them in a database;
step two, preliminarily determining a model structure:
sending the human eye image into a ResNet50 (Residual Network 50) network for feature extraction, setting a plurality of ROIs (Regions Of Interest) for each point in the extracted feature map, sending the ROIs into an RPN (Region Proposal Network) module to select effective signals, and filtering out part of the ROIs; then sending the feature map and the remaining ROIs into a RoIAlign module and finally performing target prediction, wherein only the prediction information of the bounding box and the mask is output;
step three, target detection and coarse positioning:
in a Deep-Snake-based deep learning model, a top-down Feature Pyramid Network (FPN) structure is adopted to perform multi-scale feature extraction on the input image; ResNet50 is adopted as the bottom-up network; ResNet trains by propagating residuals through skip connections and has two basic modules, the Identity Block and the Conv Block, and the network is divided into five stages according to how the two modules are combined, denoted C1-C5; ResNet101 has 22 Identity Blocks at C4, while ResNet50 has only 5; in the top-down part of the FPN, the feature map is upsampled by a 3 × 3 deconvolution with stride 1 and no padding, then added pixel-wise to the corresponding C1-C5 bottom-up feature map to obtain the P2-P5 fused feature layers, wherein the C1 layer is not laterally connected in consideration of memory; the P5 feature layer is pooled by SoftPool to obtain the P6 feature layer used by the RPN module, with the pooling parameter set to 2; to keep the channel number consistent, the laterally connected feature maps are channel-aligned through a 1 × 1 × 256 convolution layer; finally, a 3 × 3 × 256 convolution layer eliminates the aliasing effect introduced by upsampling in the P2-P5 layers; the processed P2-P6 fused feature layers are sent to the RPN module.
Fourthly, accurately positioning and segmenting the target:
the active contour fitting indirectly solves the image segmentation problem by minimizing a defined energy function; driven toward the minimum of the energy function, the contour curve gradually approaches the edge of the detected object and finally segments the target; the formula is as follows:
$E = \int_0^1 \left[\, \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 + E_{ext}(v(s)) \,\right] ds \qquad (1)$
where v(s) = [x(s), y(s)], s ∈ [0, 1], is a set of contour points; connecting these points end to end with straight segments forms the contour line; x(s) and y(s) denote the image coordinates of each control point, and s is the independent variable parameterizing the boundary; the first term inside the integral is the squared norm of the first derivative of v, called the elastic energy; the second term is the squared norm of the second derivative of v, called the bending energy; the third term is the external energy (external force) E_ext; α and β are constant coefficients;
processing the NMS coarse-localization result to obtain the initial contour, then transforming the initial contour so that it better wraps the iris region: first, the midpoint of each side of the BBox is taken, these four points are sent into the backbone network as initial poles, the true pole coordinates serve as labels, four one-to-one offset vectors are predicted by network learning, and the positions of the four points are shifted by the offset vectors; then, at each of the four points, a segment of one quarter of the corresponding BBox side length serves as one side of an octagon, and the four sides are connected in order to form the octagon; finally, the octagon is sampled uniformly, the sampling points being the boundary points that represent the initial contour;
further, the initial contour is sent into the backbone network; with the coordinates of the corresponding sampling points on the actual contour as labels, the offset vector of every point is predicted iteratively, and after the boundary-point coordinates are changed by the offset vectors, the result serves as a new contour for the next iteration; to obtain contour features, feature extraction is first performed with a residual network structure as the main body; because the inner and outer boundaries of the squint iris are each a naturally closed free curve, the boundary points describing the contour connect end to end, each point linked to its neighbours by a line segment;
designating one point as the starting point, the entire boundary can be represented as an ordered set of points; it is expressed as a discrete periodic signal and processed by cyclic convolution, with the following expression:
$(f * k)(i) = \sum_{j=-r}^{r} f\big((i+j) \bmod N\big)\, k(j) \qquad (2)$
where f denotes the periodic signal of the N ordered contour-point features and k denotes the circular kernel of size 2r + 1;
furthermore, the feature extraction part consists of eight cyclic convolution modules in total; each module consists of a cyclic convolution layer, a BN (Batch Normalization) layer, and a ReLU (Rectified Linear Unit) activation layer, and the output of a module is summed with the output of the next module through a skip connection; meanwhile, the outputs of all modules are concatenated at the tail of the feature extraction part through skip connections as the final extracted feature; multi-scale contour feature fusion is then performed with a 1 × 1 convolution layer and a SoftPool layer; the feature expression is redefined by two further cyclic convolutions to improve representational capability; finally, the offset prediction vector of each contour point is output through a 1 × 1 convolution layer; SoftPool computes an exponential weighting coefficient for every pixel in the pooling region, multiplies each pixel value by its coefficient, and sums the products as the pooling result of that region; assuming a pooling factor of 2, the pooling operation is expressed as follows:
$\tilde{P} = \sum_{i \in R} \dfrac{e^{P_i}}{\sum_{j \in R} e^{P_j}}\, P_i \qquad (3)$
where P denotes the values of the pixels participating in pooling and R the 2 × 2 pooling region; after accurate localization, the accurate inner and outer iris edge contours together with the initial BBox are taken as constraints and sent, with the feature map, into the segmentation module for iris mask prediction;
furthermore, in the segmentation module, weighing segmentation accuracy against model complexity and running time, a lightweight attention mechanism, ECA (Efficient Channel Attention), is introduced to improve segmentation performance; after the input features undergo channel-wise global average pooling without dimensionality reduction, each feature channel considers the feature information of its K neighbouring channels through a one-dimensional convolution, so as to capture local cross-channel interaction information;
the loss function of the model is composed of three parts, namely BBox loss, contour point offset loss and Mask loss, and the expression is as follows:
$L_{total} = \alpha L_{BBox} + \beta L_{snake} + \gamma L_{mask} \qquad (4)$
where α, β, and γ are weight coefficients that adjust the influence of the three losses on the total loss; α = β = γ = 1 is taken, L_snake uses focal loss, and L_BBox and L_mask use the common smooth L1 loss and binary cross-entropy loss, respectively.
A strabismus iris segmentation verification method based on an attention mechanism, in which training and testing are performed on three data sets: SBVPI, CASIA-Iris-Complex-Off-angle, and CASIA-Iris-Africa; the data are first resized uniformly to 640 × 480, the training set is enlarged by online data augmentation, the pictures are processed randomly with the rotation, scaling, mirroring, noise-adding, and random-occlusion modes provided by PyTorch, and samples are then drawn randomly and sent into the network for training; during training, the step size (learning rate) of the detection stage is set to 1e-4 and gamma to 0.5; the Adam optimization algorithm is adopted, the input picture is processed in the segmentation stage by zeroing the pixel values inside the inner boundary and outside the outer boundary of the iris, and the prediction range of the segmentation network is restricted to between the inner and outer boundaries;
PS-RANSAC requires, in addition to the original squint iris image, four points on the inner iris boundary, whose coordinates are provided by manual annotation; in the IrisSeg experiment, some squint iris images yield no segmentation result, and these images are excluded when the segmentation performance indices are computed; since IrisSeg directly produces the segmentation result, its mask result is edge-fitted to serve as the inner and outer boundary result of that method;
the verification compares and analyses localization, segmentation, running time, and model complexity, using the two performance indices E1 and E2 as segmentation evaluation criteria;
E1 represents the mean segmentation error rate, expressed as:
$E1 = \dfrac{1}{n \times r \times c} \sum_{i=1}^{n} \sum_{r'=1}^{r} \sum_{c'=1}^{c} G_i(r',c') \oplus M_i(r',c') \qquad (5)$
where n, r, and c denote the number, height (number of rows), and width (number of columns) of the test images; G and M denote the true and predicted iris masks; r' and c' denote the row and column coordinates of pixels in G and M; and the operator ⊕ denotes the logical exclusive-or, which evaluates the pixels on which G and M disagree;
E2 is proposed as a complement to E1 to compensate for the disparity between the prior probabilities of iris and non-iris pixels in the image; it balances the imbalance between the false positive rate fp and the false negative rate fn, with the following specific expression:
$E2 = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{2}\big(fp_i + fn_i\big) \qquad (6)$
where n represents the number of test images; the values of the E1 and E2 indices both lie in [0, 1], values closer to 1 indicating poorer performance and values closer to 0 indicating better performance.
The object of the invention can be further achieved by the following technical measures:
in the second step, a binary-classification and bounding-box regression method is adopted to select effective signals; the RPN module consists of a 3 × 3 convolution and two branches, the convolution and activation operations further concentrating the feature information, the classification branch separating background from foreground, and the other branch regressing the anchor box BBox (Bounding Box); first, anchor boxes of different sizes are generated for every feature-map pixel, the anchors are sorted by output probability, those with the highest predicted foreground probability are retained, and the anchors are then corrected for the first time using the regression values output by the RPN.
In the foregoing attention-mechanism-based squint iris segmentation method, in the second step, the method for target prediction is as follows: after the RPN module, only 2000 preselected boxes are retained on the feature map; pixel correction is performed by bilinear interpolation, the ROI regions are screened by Non-Maximum Suppression (NMS), and the results are finally sent to the BBox branch and the binary classification branch for prediction.
In the fourth step, 40 sampling points are taken to represent the inner or outer boundary contour of the iris.
In the foregoing attention-mechanism-based squint iris segmentation method, in step four, the circular kernel size k = 5.
In the foregoing attention-mechanism-based squint iris segmentation method, in step four, the number of channels K of the ECA layer is selected adaptively; when the input channel dimension is C:
$K = \psi(C) = \left|\dfrac{\log_2 C}{\gamma} + \dfrac{b}{\gamma}\right|_{odd} \qquad (7)$
where |t|_odd denotes the odd number nearest to t, and γ and b are parameters taken as 2 and 1, respectively.
Compared with the prior art, the invention has the following beneficial effects: exploiting the naturally closed curve contours of iris images, the inner and outer boundaries of the squint iris are localized from coarse to fine through an active contour model, using the pole coordinates obtained after coarse localization as the initial-contour-point strategy and fitting by contour-point offset prediction; the resulting 128 ordered contour points then locate the effective iris region, the areas near and between the inner and outer boundaries are extracted as the effective prediction region of the segmentation stage, the range to be predicted is narrowed, and iris boundary information is provided for subsequent steps such as normalization. Meanwhile, the hierarchy of the localization network is adjusted and the MaxPooling (maximum pooling) layer is replaced by a SoftPool layer, so that more information is retained without adding parameters. An ECA (Efficient Channel Attention) mechanism is merged into the segmentation module to focus the model's attention region, the SoftPool layer is used to optimize the transfer of image information, and the bottom-level features and the final feature expression are improved by deepening the network and adding branches. Experimental results on the three data sets SBVPI, CASIA-Iris-Off-angle, and CASIA-Iris-Africa demonstrate the effectiveness of the method: compared with other methods, it improves the localization result by 0.316% and reduces the segmentation error rates by 3.26%, 3.9%, and 3.45%, respectively.
Drawings
FIG. 1 is a view of the overall structure of the model of the present invention;
FIG. 2 is a block diagram of an RPN module;
FIG. 3 is a diagram of an initial contour transformation process;
FIG. 4 is a diagram of a backbone network architecture;
FIG. 5 is a schematic diagram of an ECA attention model module;
FIG. 6 is a graph of positioning results;
FIG. 7 is a diagram of the results of testing of methods on a data set;
fig. 8 is a graph of the segmentation results of each method on three data sets.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
The technical characteristics of the invention are as follows: an initial iris localization and segmentation result is obtained with a ResNet50 network and a feature pyramid, the initial contour is positioned by poles, and the iris boundary is represented by an ordered set of point coordinates sampled at equal intervals. The initial contour is input into a Deep-Snake-based active contour fitting network to obtain an accurate iris localization result, and the region is segmented by a model constrained by the inner and outer boundaries. An ECA mechanism is introduced to improve segmentation precision so that the network's attention focuses on the effective iris area, and a SoftPool layer is used to reduce the loss of image detail caused by pooling.
The invention relates to an attention mechanism-based squint iris segmentation method, which comprises the following steps of:
first, human eye image acquisition
Images of the human eyes to be identified are collected with a camera and stored in a database.
Second, preliminarily determining the model structure
The invention provides a multi-task model that simultaneously solves the localization of the inner and outer squint-iris boundaries and the segmentation of the effective iris area. The overall structure of the adopted ResNet50-based network model is shown in FIG. 1; the input image size is 640 × 480 pixels. In the figure:
CNN: Convolutional Neural Network;
RPN: Region Proposal Network;
Fully connected layers;
MASK: the predicted mask;
Active contour positioning;
RoIAlign: Region of Interest Align.
First, the input picture is sent into the ResNet50 (Residual Network 50) network for feature extraction; compared with VGG (Visual Geometry Group), ResNet allows a deeper network and obtains higher-level semantic features. A plurality of ROIs (Regions Of Interest) are set for every point in the extracted feature map and sent into the RPN (Region Proposal Network) module for binary classification and box regression, filtering out part of the ROIs. The feature map and the remaining ROIs are then sent into the RoIAlign module, and target prediction is finally performed. Unlike Mask R-CNN (Mask Region-based Convolutional Neural Network), no category classification is needed, so only the prediction information of the bounding box and the mask is output. The box is then fed into the active contour positioning module as the initial contour. Experiments show that 40 contour points represent the inner or outer iris boundary contour well without wasting computation and memory. The deformed contour is used to predict the offset vector of each point a second time, accurate inner and outer boundaries are obtained by iterative fitting, and these boundaries finally constrain the mask prediction. Before mask prediction, attention is added to improve segmentation accuracy.
The binary classification and frame regression method comprises the following steps:
the RPN block consists of a 3 × 3 convolution and two branches, the convolution and activation operations being intended to further concentrate the feature information. The two classification branches are used for classifying the background and the foreground, the other branch is used for regression of BBox (Bounding Box anchor frame), and the RPN module is shown in fig. 2. Firstly, anchor frames with different sizes are generated for each feature image pixel point, the anchor frames are sorted according to the output probability, most of the predicted foreground color probability is reserved, and then the anchor frames are corrected for the first time by using the output regression value of the RPN.
The method for target prediction is as follows: after the RPN module, only 2000 preselected boxes are retained on the feature map; because the predicted values are floating-point numbers, simply rounding them onto the original grid would introduce a large deviation, so pixel correction is performed by bilinear interpolation. The ROI regions are then screened by Non-Maximum Suppression (NMS), and the results are finally sent to the BBox branch and the binary classification branch for prediction, as sketched below.
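For illustration, the proposal screening described above can be sketched in PyTorch as follows; the proposal count of 2000 comes from the text, while the IoU threshold, score ordering, and function names are assumptions for illustration only.

```python
import torch
from torchvision.ops import nms

def filter_proposals(boxes: torch.Tensor, scores: torch.Tensor,
                     pre_nms_top_n: int = 2000, iou_thresh: float = 0.7):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,) foreground probabilities."""
    # Keep the 2000 top-scoring preselected boxes, as stated in the text.
    top_scores, order = scores.topk(min(pre_nms_top_n, scores.numel()))
    top_boxes = boxes[order]
    # Non-maximum suppression screens overlapping ROI regions.
    keep = nms(top_boxes, top_scores, iou_thresh)
    return top_boxes[keep], top_scores[keep]
```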
Thirdly, target detection and coarse positioning
In the Deep-Snake-based deep learning model, a top-down Feature Pyramid Network (FPN) structure is adopted for multi-scale feature extraction of the input image. The FPN structure achieves better feature-map fusion, combining bottom-level detail information with high-level semantic information and making full use of the features of every stage. The invention adopts ResNet50 as the bottom-up network; ResNet trains by propagating residuals through skip connections and has two basic modules, the Identity Block and the Conv Block, and the network is divided into five stages according to how the two modules are combined, denoted C1-C5. ResNet101 has 22 Identity Blocks at C4, while ResNet50 has only 5. In the top-down part of the FPN, the feature map is upsampled with a 3 × 3 deconvolution of stride 1 and no padding, then added pixel-wise to the corresponding C1-C5 bottom-up feature map to obtain the P2-P5 fused feature layers; the C1 layer is not laterally connected in consideration of memory. The P5 feature layer is pooled by SoftPool to obtain the P6 feature layer for the RPN module, with the pooling parameter set to 2. To keep the channel number consistent, the laterally connected feature maps are channel-aligned through a 1 × 1 × 256 convolution layer. Finally, a 3 × 3 × 256 convolution layer eliminates the aliasing introduced by upsampling in the P2-P5 layers, and the processed P2-P6 fused feature layers are sent to the RPN module. A sketch of this fusion follows.
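A minimal PyTorch sketch of the top-down fusion, assuming the standard ResNet50 channel widths for C2-C5; nearest-neighbour interpolation and plain pooling stand in for the deconvolution and SoftPool steps named in the text, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """P2-P6 fusion sketch; channel widths follow a standard ResNet50."""
    def __init__(self, in_channels=(256, 512, 1024, 2048)):  # C2-C5
        super().__init__()
        # 1 x 1 x 256 lateral convolutions align the channel numbers.
        self.lateral = nn.ModuleList([nn.Conv2d(c, 256, 1) for c in in_channels])
        # 3 x 3 x 256 convolutions remove the aliasing introduced by upsampling.
        self.smooth = nn.ModuleList([nn.Conv2d(256, 256, 3, padding=1) for _ in in_channels])

    def forward(self, c2, c3, c4, c5):
        laterals = [l(f) for l, f in zip(self.lateral, (c2, c3, c4, c5))]
        # Top-down pathway: upsample and add pixel-wise to the lateral map.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        p2, p3, p4, p5 = (s(l) for s, l in zip(self.smooth, laterals))
        # P6 for the RPN; plain pooling stands in for the SoftPool of the text
        # (see the SoftPool sketch later in this description).
        p6 = F.max_pool2d(p5, kernel_size=2, stride=2)
        return p2, p3, p4, p5, p6
```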
Fourthly, accurately positioning and segmenting the target
Active contour fitting is a target contour description method whose core idea is to solve the image segmentation problem indirectly by minimizing a defined energy function. Driven toward the minimum of the energy function, the contour curve gradually approaches the edge of the detected object and finally segments the target. The formula is as follows:
$E = \int_0^1 \left[\, \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 + E_{ext}(v(s)) \,\right] ds \qquad (8)$
where v(s) = [x(s), y(s)], s ∈ [0, 1], is a set of contour points connected end to end by straight segments to form the contour line. x(s) and y(s) denote the image coordinates of each control point, and s is the independent variable parameterizing the boundary. The first term inside the integral is the squared norm of the first derivative of v, called the elastic energy; the second term is the squared norm of the second derivative of v, called the bending energy; the third term is the external energy (external force) E_ext. α and β are constant coefficients. A discrete version of this energy is sketched below.
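A discrete form of this energy over a closed polygonal contour can be sketched as follows; the circular finite-difference scheme and the unit weights are illustrative assumptions.

```python
import numpy as np

def snake_energy(points: np.ndarray, ext_energy: np.ndarray,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Discrete form of Eq. (8) for a closed contour.

    points: (N, 2) ordered contour coordinates v(s); ext_energy: (N,) external
    energy sampled at each point; alpha and beta weights are illustrative.
    """
    # Elastic term: squared norm of the first derivative, circular differences.
    d1 = np.roll(points, -1, axis=0) - points
    elastic = alpha * np.sum(d1 ** 2)
    # Bending term: squared norm of the second derivative.
    d2 = np.roll(points, -1, axis=0) - 2 * points + np.roll(points, 1, axis=0)
    bending = beta * np.sum(d2 ** 2)
    return float(elastic + bending + np.sum(ext_energy))
```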
The Snake model is sensitive to the initial position, and the initial contour is obtained by processing the NMS coarse-localization result. Specifically, as shown in FIG. 3, the model is inspired by the extreme-point idea of ExtremeNet and transforms the initial contour to wrap the iris region better. First, the midpoint of each side of the BBox is taken; these four points are sent into the backbone network as initial poles, the true pole coordinates serve as labels, four one-to-one offset vectors are predicted by network learning, and the positions of the four points are then shifted by the offset vectors. Next, at each of the four points, a segment of one quarter of the corresponding BBox side length serves as one side of an octagon, and the four sides are connected in order to form the octagon. Finally, the octagon is sampled uniformly, the sampling points being the boundary points that represent the initial contour; a sketch of this construction follows the paragraph. Experiments show that 40 sampling points describe the boundary well without occupying excessive computation and memory.
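The octagon construction and the uniform sampling of 40 boundary points can be sketched as follows, assuming the poles are ordered top, right, bottom, left; all function names are illustrative.

```python
import numpy as np

def octagon_from_poles(box, poles):
    """box: (x1, y1, x2, y2); poles: four adjusted extreme points in the
    order top, right, bottom, left, shape (4, 2). Each octagon edge is a
    segment of one quarter of the corresponding BBox side length, centred
    on its pole and parallel to that side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    (tx, ty), (rx, ry), (bx, by), (lx, ly) = poles
    return np.array([
        [tx - w / 8, ty], [tx + w / 8, ty],  # top edge
        [rx, ry - h / 8], [rx, ry + h / 8],  # right edge
        [bx + w / 8, by], [bx - w / 8, by],  # bottom edge
        [lx, ly + h / 8], [lx, ly - h / 8],  # left edge
    ])

def sample_contour(polygon: np.ndarray, n_points: int = 40) -> np.ndarray:
    """Uniformly resample a closed polygon to n_points boundary points."""
    closed = np.vstack([polygon, polygon[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    idx = np.searchsorted(cum, targets, side="right") - 1
    t = (targets - cum[idx]) / np.maximum(seg[idx], 1e-12)
    return closed[idx] + t[:, None] * (closed[idx + 1] - closed[idx])
```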
Further, the initial contour is sent into the backbone network, and the offset vectors of all points are predicted iteratively with the coordinates of the corresponding sampling points on the actual contour as labels. After the boundary-point coordinates are changed by the offset vectors, the result serves as a new contour for the next iteration. As shown in FIG. 4, feature extraction for the contour is performed with a residual network structure as the main body. Because the inner and outer boundaries of the squint iris are each a naturally closed free curve, the boundary points describing the contour connect end to end, each point linked to its neighbours by a line segment.
By designating one point as the starting point, the entire boundary can be represented as an ordered set of points. Feature extraction with a standard one-dimensional convolution would destroy the topology of the contour, so the boundary is represented as a discrete periodic signal and processed with cyclic convolution. Cyclic convolution is a periodic convolution with the following expression:
$(f * k)(i) = \sum_{j=-r}^{r} f\big((i+j) \bmod N\big)\, k(j) \qquad (9)$
where f denotes the periodic signal of contour-point features and k denotes the circular kernel. In the experiments herein, the circular kernel size k = 5 is taken; a minimal sketch follows.
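A minimal sketch of the cyclic convolution, assuming the contour-point features are stored as a (batch, channels, N) tensor; circular padding realizes the wrap-around of the periodic signal.

```python
import torch
import torch.nn.functional as F

def circular_conv1d(feat: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """feat: (B, C_in, N) ordered contour-point features treated as a periodic
    signal; weight: (C_out, C_in, k) with odd kernel size k (k = 5 here)."""
    r = weight.shape[-1] // 2
    # Circular padding wraps the signal so the first and last contour points join.
    feat = F.pad(feat, (r, r), mode="circular")
    return F.conv1d(feat, weight)
```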
Further, the feature extraction part consists of eight cyclic convolution modules in total; each module consists of a cyclic convolution layer, a BN (Batch Normalization) layer, and a ReLU (Rectified Linear Unit) activation layer. The output of a module is summed with the output of the next module over a skip connection, and the outputs of all modules are concatenated at the tail of the feature extraction part as the final extracted feature. Multi-scale contour feature fusion is then performed with 1 × 1 convolution layers and SoftPool layers, the feature expression is redefined by two further cyclic convolutions to improve representational capability, and the offset prediction vector of each contour point is finally output through a 1 × 1 convolution layer. SoftPool computes an exponential weighting coefficient for every pixel in the pooling region, multiplies each pixel value by its coefficient, and sums the products as the pooling result of that region. The benefit is that every element before pooling contributes to the result, preserving the basic attributes while amplifying the stronger feature activations; compared with traditional pooling, SoftPool is differentiable, providing a gradient for every input during back-propagation, while improving computational and memory efficiency. Assuming a pooling factor of 2, the pooling operation can be expressed as follows:
$\tilde{P} = \sum_{i \in R} \dfrac{e^{P_i}}{\sum_{j \in R} e^{P_j}}\, P_i \qquad (10)$
where P denotes the values of the pixels participating in pooling and R the 2 × 2 pooling region; a sketch of this operation follows. After accurate localization, the accurate inner and outer iris edge contours together with the initial BBox are taken as constraints and sent, with the feature map, into the segmentation module for iris mask prediction.
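A minimal sketch of SoftPool; it uses the identity that the exponentially weighted sum equals the ratio of two average-pooling results.

```python
import torch
import torch.nn.functional as F

def soft_pool2d(x: torch.Tensor, kernel_size: int = 2) -> torch.Tensor:
    """SoftPool with the factor-2 window from the text."""
    e = torch.exp(x)
    # avg_pool(x * e) / avg_pool(e) equals sum_i w_i * x_i with
    # w_i = exp(x_i) / sum_j exp(x_j) over each pooling window.
    return F.avg_pool2d(x * e, kernel_size) / F.avg_pool2d(e, kernel_size).clamp_min(1e-12)
```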
Further, in the segmentation module, weighing segmentation accuracy against model complexity and running time, a lightweight attention mechanism, ECA (Efficient Channel Attention), is introduced to improve segmentation performance. The mechanism is a deep exploration and refinement of the SE module, as shown in FIG. 5. After the input features undergo channel-wise global average pooling without dimensionality reduction, each feature channel considers the feature information of its K neighbouring channels through a one-dimensional convolution to capture local cross-channel interaction information. The mechanism improves the performance and efficiency of channel attention by avoiding dimensionality reduction and using appropriate cross-channel interaction; ECA is a lightweight module that introduces few parameters while markedly improving network performance. The method adds an ECA layer to the segmentation part; combined with the skip-layer network connections, it focuses the model's attention region well and optimizes the model's feature expression capability. The number of channels K of the ECA layer is selected adaptively; when the input channel dimension is C:
$K = \psi(C) = \left|\dfrac{\log_2 C}{\gamma} + \dfrac{b}{\gamma}\right|_{odd} \qquad (11)$
where |t|_odd denotes the odd number nearest to t, and γ and b are parameters taken as 2 and 1, respectively. A sketch of the module follows.
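A sketch of the ECA layer with the adaptive kernel size above; the structure follows the description (global average pooling, one-dimensional convolution over K neighbouring channels, sigmoid gating), while the variable names are illustrative.

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """ECA block with the adaptive kernel size K (gamma = 2, b = 1)."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                  # |t|_odd: nearest odd number
        self.gap = nn.AdaptiveAvgPool2d(1)         # channel-wise GAP, no dimension reduction
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        y = self.gap(x).view(n, 1, c)              # (B, 1, C)
        y = self.conv(y)                           # K-neighbour cross-channel interaction
        y = self.sigmoid(y).view(n, c, 1, 1)
        return x * y                               # re-weight the feature channels
```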
The loss function of the model consists of three parts, the BBox loss, the contour-point offset loss, and the Mask loss, with the following expression.
$L_{total} = \alpha L_{BBox} + \beta L_{snake} + \gamma L_{mask} \qquad (12)$
where α, β, and γ are weight coefficients that adjust the influence of the three losses on the total loss; in this experiment α = β = γ = 1. Focal loss alleviates the imbalance between positive and negative training samples and focuses on difficult samples. For the squint iris, the problem itself is harder to segment than a normal iris image and occlusion is present, both of which produce many difficult samples during training, so L_snake uses focal loss. L_BBox and L_mask use the common smooth L1 loss and binary cross-entropy loss, respectively; β = 1 is taken in L_BBox and α = 1 in L_snake. A sketch of the combined loss follows.
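The combined loss can be sketched as follows; since the exact focal formulation for the contour-offset term is not spelled out in the text, a plain smooth-L1 term stands in for L_snake and is marked as such in the comments.

```python
import torch.nn.functional as F

def total_loss(bbox_pred, bbox_gt, offset_pred, offset_gt, mask_logits, mask_gt,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0):
    """Weighted sum of Eq. (12) with alpha = beta = gamma = 1 as in the text."""
    l_bbox = F.smooth_l1_loss(bbox_pred, bbox_gt)        # smooth L1 for the BBox term
    # The text states that L_snake uses focal loss; its exact regression form
    # is not given, so a plain smooth-L1 term stands in here.
    l_snake = F.smooth_l1_loss(offset_pred, offset_gt)
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    return alpha * l_bbox + beta * l_snake + gamma * l_mask
```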
Example verification:
To evaluate the performance of the proposed algorithm in different scenarios, training and testing were performed on the three data sets SBVPI, CASIA-Iris-Complex-Off-angle, and CASIA-Iris-Africa.
The verification method comprises the following steps:
the data is first uniformly sized to 640 x 480 to facilitate the next preprocessing operations. Because the data quantity is less, the scale of the training set is enlarged by adopting an online data enhancement method, the pictures are randomly processed by adopting the modes of rotation, scaling, mirroring, noise addition, random shielding and the like provided by the pytorch, and then samples are randomly extracted and sent to a network for training. In the training process, the training step size of the detection stage is set to be 1e-4, and the gamma is set to be 0.5. An Adam optimization algorithm is selected. Processing the input picture in the segmentation stage, setting the pixel values inside and outside the inner boundary and the outer boundary of the iris to zero, and simultaneously limiting the prediction range of the segmentation network between the inner boundary and the outer boundary.
The experiments comprise a comparison experiment and an ablation experiment, both trained and tested on the three data sets mentioned above. In the comparison test, PS-RANSAC requires, in addition to the original squint iris image, four points on the inner iris boundary, which are used to fit the inner boundary so that it assists the inference of the outer boundary; the coordinates of the four points are provided by manual annotation. In the IrisSeg experiment, some squint iris images yield no segmentation result, and these images are excluded when the segmentation performance indices are computed. Since IrisSeg directly produces the segmentation result, its mask result is edge-fitted to serve as the inner and outer boundary result of that method.
The experiments compare and analyse localization, segmentation, running time, and model complexity, using the two performance indices E1 and E2 as segmentation evaluation criteria.
The segmentation indices come from the NICE.I iris segmentation competition and are widely used to evaluate the performance of iris segmentation algorithms. E1 represents the mean segmentation error rate, expressed as:
$E1 = \dfrac{1}{n \times r \times c} \sum_{i=1}^{n} \sum_{r'=1}^{r} \sum_{c'=1}^{c} G_i(r',c') \oplus M_i(r',c') \qquad (13)$
where n, r, and c denote the number, height (number of rows), and width (number of columns) of the test images; G and M denote the true and predicted squint-iris masks; r' and c' denote the row and column coordinates of pixels in G and M; and the operator ⊕ denotes the logical exclusive-or, which evaluates the pixels on which G and M disagree.
E2 is proposed as a complement to E1 to compensate for the disparity between the prior probabilities of iris and non-iris pixels in the image. It balances the imbalance between the false positive rate fp and the false negative rate fn. The specific expression is as follows:
$E2 = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{2}\big(fp_i + fn_i\big) \qquad (14)$
where n represents the number of test images. The values of the E1 and E2 indices both lie in [0, 1]; values closer to 1 indicate poorer performance and values closer to 0 better performance. A sketch of both indices follows.
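Both indices can be sketched as follows, reading fp and fn as per-image fractions of all pixels, which is one common reading of the NICE protocol.

```python
import numpy as np

def e1_e2(gt_masks: np.ndarray, pred_masks: np.ndarray):
    """gt_masks, pred_masks: (n, r, c) binary arrays of true and predicted masks."""
    n, r, c = gt_masks.shape
    # E1: mean pixel disagreement (logical exclusive-or) over all images.
    e1 = np.logical_xor(gt_masks, pred_masks).sum() / (n * r * c)
    # E2: mean of the per-image false-positive and false-negative rates,
    # balancing the unequal priors of iris and non-iris pixels.
    fp = np.logical_and(pred_masks == 1, gt_masks == 0).mean(axis=(1, 2))
    fn = np.logical_and(pred_masks == 0, gt_masks == 1).mean(axis=(1, 2))
    e2 = float(np.mean(0.5 * (fp + fn)))
    return float(e1), e2
```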
Running time generally refers to the time a model takes to compute and output results, so the time from feeding data into the network to outputting the result is taken as the running time; the superiority of a network model is judged by comparing the times taken by the different models on the different data sets.
Experimental results and analysis
The localization results of the algorithm, PS-RANSAC, and IrisSeg on the different test sets are shown in FIG. 6.
It can be seen intuitively that the model proposed herein lies closer to the actual inner and outer boundaries, and that the fitting mode based on the active contour works well for fitting the iris boundary. FIG. 7 (a), (b), and (c) are AUC curves on the different data sets (SBVPI, CASIA-Iris-Complex-Off-angle, and CASIA-Iris-Africa); some differences can be seen at the same threshold on different data sets.
In the evaluation of the segmentation results, experiments were performed with the test sets of SBVPI, CASIA-Iris-Off-angle, and CASIA-Iris-Africa respectively, and the results were comparatively analysed with the E1 and E2 indices. The segmentation results of the methods on the three data sets are shown in FIG. 8, where the first row is the original squint iris image, the second row is the corresponding iris mask, and the last three rows show the results of PS-RANSAC (Polar Spline Random Sample Consensus), IrisSeg, and our method, respectively.
As can be seen from FIG. 8, IrisSeg makes obvious errors when facing the squint-iris problem, showing that the performance of general iris segmentation algorithms degrades on squint-iris segmentation in complex scenes. The iris mask regions generated by PS-RANSAC on the CASIA-Iris-Off-angle and CASIA-Iris-Africa data sets are noticeably enlarged and differ in shape from the true mask regions, showing that its performance degrades on the near-infrared squint-iris problem; this may be because the pixel-value differences across the iris-sclera-pupil boundary regions are smaller in these two data sets, and because the squint problem makes the iris region deviate from a circle-like shape.
The E1 and E2 indices of the method, PS-RANSAC, and IrisSeg on the three data sets are shown in Table 1. The method achieves a clear performance advantage on every data set: compared with PS-RANSAC and IrisSeg, it obtains performance improvements of 3.26%, 3.9%, and 3.45% on the SBVPI, CASIA-Iris-Off-angle, and CASIA-Iris-Africa data sets, respectively.
TABLE 1 Segmentation performance indices of each method (reproduced as an image in the original publication)
In addition to the above embodiments, the present invention may have other embodiments, and any technical solutions formed by equivalent substitutions or equivalent transformations fall within the scope of the claims of the present invention.

Claims (7)

1. An attention mechanism-based squint iris segmentation method is characterized by comprising the following steps:
step one, human eye image acquisition:
collecting images of the human eyes to be identified with a camera and storing them in a database;
step two, preliminarily determining a model structure:
sending the human eye image into a ResNet50 network for feature extraction, setting a plurality of ROIs (Regions Of Interest) for each point in the extracted feature map, sending the ROIs into an RPN (Region Proposal Network) module to select effective signals, and filtering out part of the ROIs; then sending the feature map and the remaining ROIs into a RoIAlign module and finally performing target prediction, wherein only the prediction information of the bounding box and the mask is output;
step three, target detection and coarse positioning:
in a Deep-Snake-based deep learning model, multi-scale feature extraction is performed on the input image with a top-down feature pyramid FPN structure; ResNet50 is adopted as the bottom-up network; ResNet trains by propagating residuals through skip connections and has two basic modules, the Identity Block and the Conv Block, and the network is divided into five stages according to how the two modules are combined, denoted C1-C5; ResNet101 has 22 Identity Blocks at C4, while ResNet50 has only 5; in the top-down part of the FPN, the feature map is upsampled with a 3 × 3 deconvolution of stride 1 and no padding, then added pixel-wise to the corresponding C1-C5 bottom-up feature map to obtain the P2-P5 fused feature layers, wherein the C1 layer is not laterally connected in consideration of memory; the P5 feature layer is pooled by SoftPool to obtain the P6 feature layer used by the RPN module, with the pooling parameter set to 2; to keep the channel number consistent, the laterally connected feature maps are channel-aligned through a 1 × 1 × 256 convolution layer; finally, a 3 × 3 × 256 convolution layer eliminates the aliasing introduced by upsampling in the P2-P5 layers; the processed P2-P6 fused feature layers are sent to the RPN module;
fourthly, accurately positioning and segmenting the target:
the active contour fitting indirectly solves the image segmentation problem by minimizing a defined energy function; driven toward the minimum of the energy function, the contour curve gradually approaches the edge of the detected object and finally segments the target; the formula is as follows:
$E = \int_0^1 \left[\, \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 + E_{ext}(v(s)) \,\right] ds \qquad (1)$
where v(s) = [x(s), y(s)], s ∈ [0, 1], is a set of contour points connected end to end by straight segments to form the contour line; x(s) and y(s) denote the image coordinates of each control point, and s is the independent variable parameterizing the boundary; the first term inside the integral is the squared norm of the first derivative of v, called the elastic energy; the second term is the squared norm of the second derivative of v, called the bending energy; the third term is the external energy E_ext; α and β are constant coefficients;
processing the NMS coarse-localization result to obtain the initial contour, then transforming the initial contour so that it better wraps the iris region: first, the midpoint of each side of the BBox is taken, these four points are sent into the backbone network as initial poles, the true pole coordinates serve as labels, four one-to-one offset vectors are predicted by network learning, and the positions of the four points are shifted by the offset vectors; then, at each of the four points, a segment of one quarter of the corresponding BBox side length serves as one side of an octagon, and the four sides are connected in order to form the octagon; finally, the octagon is sampled uniformly, the sampling points being the boundary points that represent the initial contour;
sending the initial contour into the backbone network, taking the coordinates of the corresponding sampling points on the actual contour as labels, iteratively predicting the offset vector of every point, and, after the boundary-point coordinates are changed by the offset vectors, iterating again with the result as a new contour; to obtain contour features, feature extraction is first performed with a residual network structure as the main body; because the inner and outer boundaries of the squint iris are each a naturally closed free curve, the boundary points describing the contour connect end to end, each point linked to its neighbours by a line segment; designating one point as the starting point, the entire boundary can be represented as an ordered set of points; it is expressed as a discrete periodic signal and processed by cyclic convolution, with the following expression:
$(f * k)(i) = \sum_{j=-r}^{r} f\big((i+j) \bmod N\big)\, k(j) \qquad (2)$
where f denotes the periodic signal of contour-point features and k denotes the circular kernel;
the feature extraction part consists of eight cyclic convolution modules in total; each module consists of a cyclic convolution layer, a BN layer, and a ReLU activation layer, and the output of a module is summed with the output of the next module through a skip connection; meanwhile, the outputs of all modules are concatenated at the tail of the feature extraction part through skip connections as the final extracted feature; multi-scale contour feature fusion is then performed with a 1 × 1 convolution layer and a SoftPool layer; the feature expression is redefined by two further cyclic convolutions to improve representational capability; finally, the offset prediction vector of each contour point is output through a 1 × 1 convolution layer; SoftPool computes an exponential weighting coefficient for every pixel in the pooling region, multiplies each pixel value by its coefficient, and sums the products as the pooling result of that region; assuming a pooling factor of 2, the pooling operation is expressed as follows:
$\tilde{P} = \sum_{i \in R} \dfrac{e^{P_i}}{\sum_{j \in R} e^{P_j}}\, P_i \qquad (3)$
where P denotes the values of the pixels participating in pooling and R the 2 × 2 pooling region; after accurate localization, the accurate inner and outer iris edge contours together with the initial BBox are taken as constraints and sent, with the feature map, into the segmentation module for iris mask prediction;
in the segmentation module, weighing segmentation accuracy against model complexity and running time, a lightweight attention mechanism ECA is introduced to improve segmentation performance; after the input features undergo channel-wise global average pooling without dimensionality reduction, each feature channel considers the feature information of its K neighbouring channels through a one-dimensional convolution, so as to capture local cross-channel interaction information;
the loss function of the model is composed of three parts, namely BBox loss, contour point offset loss and Mask loss, and the expression is as follows:
$L_{total} = \alpha L_{BBox} + \beta L_{snake} + \gamma L_{mask} \qquad (4)$
where α, β, and γ are weight coefficients that adjust the influence of the three losses on the total loss; α = β = γ = 1 is taken, L_snake uses focal loss, and L_BBox and L_mask use the common smooth L1 loss and binary cross-entropy loss, respectively.
2. The method as claimed in claim 1, wherein in the second step, a binary-classification and bounding-box regression method is adopted to select effective signals; the RPN module consists of a 3 × 3 convolution and two branches, the convolution and activation operations further concentrating the feature information, the classification branch separating background from foreground, and the other branch regressing the anchor box BBox; first, anchor boxes of different sizes are generated for every feature-map pixel, the anchors are sorted by output probability, those with the highest predicted foreground probability are retained, and the anchors are then corrected for the first time using the regression values output by the RPN.
3. The method for strabismus iris segmentation based on attention mechanism as claimed in claim 1, wherein in the second step, the method for target prediction comprises: after the RPN module, only 2000 preselected boxes are retained on the feature map; pixel correction is performed by bilinear interpolation, the ROI regions are then screened by non-maximum suppression, and the results are finally sent to the BBox branch and the binary classification branch for prediction.
4. The method of claim 1, wherein in step four, 40 sampling points are taken to represent the inner or outer boundary contour of the iris.
5. The method of claim 1, wherein in step four, the circular kernel size k = 5.
6. The method for squint iris segmentation based on attention mechanism as claimed in claim 1, wherein in step four, the number of channels K of the ECA layer is selected adaptively; when the input channel dimension is C:
$K = \psi(C) = \left|\dfrac{\log_2 C}{\gamma} + \dfrac{b}{\gamma}\right|_{odd} \qquad (7)$
where |t|_odd denotes the odd number nearest to t, and γ and b are parameters taken as 2 and 1, respectively.
7. The method for verifying squint iris segmentation based on attention mechanism as claimed in claim 1, wherein training and testing are performed on three data sets: SBVPI, CASIA-Iris-Complex-Off-angle, and CASIA-Iris-Africa; the data are first resized uniformly to 640 × 480, the training set is enlarged by online data augmentation, the pictures are processed randomly with the rotation, scaling, mirroring, noise-adding, and random-occlusion modes provided by PyTorch, and samples are then drawn randomly and sent into the network for training; during training, the step size (learning rate) of the detection stage is set to 1e-4 and gamma to 0.5; the Adam optimization algorithm is adopted, the input picture is processed in the segmentation stage by zeroing the pixel values inside the inner boundary and outside the outer boundary of the iris, and the prediction range of the segmentation network is restricted to between the inner and outer boundaries;
PS-RANSAC requires, in addition to the original squint iris image, four points on the inner iris boundary, whose coordinates are provided by manual annotation; in the IrisSeg experiment, some squint iris images yield no segmentation result, and these images are excluded when the segmentation performance indices are computed; since IrisSeg directly produces the segmentation result, its mask result is edge-fitted to serve as the inner and outer boundary result of that method;
the verification compares and analyses localization, segmentation, running time, and model complexity, using the two performance indices E1 and E2 as segmentation evaluation criteria;
E1 represents the mean segmentation error rate, expressed as:
$E1 = \dfrac{1}{n \times r \times c} \sum_{i=1}^{n} \sum_{r'=1}^{r} \sum_{c'=1}^{c} G_i(r',c') \oplus M_i(r',c') \qquad (5)$
where n, r, and c denote the number, height, and width of the test images; G and M denote the true and predicted squint-iris masks; r' and c' denote the row and column coordinates of pixels in G and M; and the operator ⊕ denotes the logical exclusive-or, which evaluates the pixels on which G and M disagree;
E2 is proposed as a complement to E1 to compensate for the disparity between the prior probabilities of iris and non-iris pixels in the image; it balances the imbalance between the false positive rate fp and the false negative rate fn, with the following specific expression:
$E2 = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{2}\big(fp_i + fn_i\big) \qquad (6)$
where n represents the number of test images; the values of the E1 and E2 indices both lie in [0, 1], values closer to 1 indicating poorer performance and values closer to 0 indicating better performance.
CN202211513398.XA 2022-11-29 2022-11-29 Strabismus iris segmentation method based on attention mechanism and verification method Withdrawn CN115830319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211513398.XA CN115830319A (en) 2022-11-29 2022-11-29 Strabismus iris segmentation method based on attention mechanism and verification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211513398.XA CN115830319A (en) 2022-11-29 2022-11-29 Strabismus iris segmentation method based on attention mechanism and verification method

Publications (1)

Publication Number Publication Date
CN115830319A true CN115830319A (en) 2023-03-21

Family

ID=85532735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211513398.XA Withdrawn CN115830319A (en) 2022-11-29 2022-11-29 Strabismus iris segmentation method based on attention mechanism and verification method

Country Status (1)

Country Link
CN (1) CN115830319A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542049A (en) * 2024-01-09 2024-02-09 吉林建筑大学 Image recognition method and system based on deep learning
CN117542049B (en) * 2024-01-09 2024-03-26 吉林建筑大学 Image recognition method and system based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20230321)