CN116824681B - Eye detection method, system and equipment based on deep convolutional neural network - Google Patents

Eye detection method, system and equipment based on deep convolutional neural network

Info

Publication number
CN116824681B
Authority
CN
China
Prior art keywords
image
position information
neural network
rectangular frame
network model
Prior art date
Legal status
Active
Application number
CN202311070940.3A
Other languages
Chinese (zh)
Other versions
CN116824681A (en)
Inventor
刘岸
商海峰
韩玉佳
焦欣悦
Current Assignee
Beijing Gicom Network Technology Co ltd
Original Assignee
Beijing Gicom Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Gicom Network Technology Co ltd
Priority to CN202311070940.3A
Publication of CN116824681A
Application granted
Publication of CN116824681B


Landscapes

  • Image Analysis (AREA)

Abstract

The application provides an eye detection method, system and equipment based on a deep convolutional neural network. The method comprises the following steps: marking a training data set to obtain ground truth; constructing a deep neural network model; training the deep neural network model with the training data set; carrying out forward inference on an iris image to be processed with the trained deep neural network model to obtain an inference result; and post-processing the inference result, resolving the eye positions and the left/right eye judgment results in the iris image according to the position difference between the rectangular frame and the predicted frame. The eye detection method, system and equipment based on a deep convolutional neural network can solve the problems of insufficient detection precision and a high missed-detection rate in existing iris recognition.

Description

Eye detection method, system and equipment based on deep convolutional neural network
Technical Field
The application relates to the technical field of iris recognition, and in particular to an eye detection method, system and equipment based on a deep convolutional neural network.
Background
Iris recognition is a form of human biometric identification. It has the characteristics of uniqueness, stability, contactless operation, high security and the like, is widely acknowledged as one of the most accurate and convenient biometric technologies, and is applied in many scenes requiring accurate identity authentication, such as finance, security, border checkpoints, access control and insurance.
Human eye detection is the first link in iris recognition, and its accuracy and performance strongly influence the effect of the subsequent computation, yet it is also the link most easily overlooked. In conventional iris processing pipelines, iris localization is typically accomplished by directly detecting the iris. This approach cannot accurately obtain complete eye information and requires additional methods to make the left/right eye decision. Other machine-learning-based human eye detection methods are designed mainly for face recognition and, when applied to iris recognition, suffer from insufficient detection precision, a high missed-detection rate, and similar problems.
Disclosure of Invention
The application provides an eye detection method, system and equipment based on a deep convolutional neural network, which can solve the problems of insufficient detection precision and a high missed-detection rate in existing iris recognition.
In a first aspect, the present application provides an eye detection method based on a deep convolutional neural network, the method comprising:
S1: marking a training data set to obtain ground truth, wherein the training data set contains iris images to be trained, and the ground truth contains position information of rectangular frames in the training data set;
S2: constructing a deep neural network model;
S3: training the deep neural network model by using the training data set;
S4: carrying out forward inference on an iris image to be processed by using the trained deep neural network model to obtain an inference result, wherein the inference result contains the position information of the predicted rectangular frame in the iris image;
S5: post-processing the inference result, and resolving the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame.
In a second aspect, the present application also provides an eye detection system based on a deep convolutional neural network, the system comprising:
the marking module is used for marking the training data set to obtain ground truth, wherein the training data set contains the iris images to be trained, and the ground truth contains the position information of the rectangular frames in the training data set;
the construction module is used for constructing a deep neural network model;
the training module is used for training the deep neural network model by using the training data set;
the inference module is used for carrying out forward inference on the iris image to be processed by using the trained deep neural network model to obtain an inference result, wherein the inference result comprises the position information of the predicted rectangular frame in the iris image;
and the post-processing module is used for post-processing the inference result and resolving the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame.
In a third aspect, the present application also provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the eye detection method based on the deep convolutional neural network when executing the executable instructions stored in the memory.
One or more technical solutions provided by the application have at least the following technical effects or advantages:
by marking rectangular frames on the iris images in the training data set, completing model training, and judging whether an eye is a left eye or a right eye from the position information of the predicted rectangular frame given by the model, the method effectively improves the precision of left/right eye detection and reduces the missed-detection rate.
Drawings
Fig. 1 is a flowchart of an eye detection method based on a deep convolutional neural network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an eye mark according to an embodiment of the present application;
FIG. 3 is a network configuration diagram provided in an embodiment of the present application;
fig. 4 is a view of a ConvBlock structure provided by an embodiment of the present application;
FIG. 5 is a diagram of DwBlock according to an embodiment of the present application;
FIG. 6 is a diagram of an InceptionBlock according to an embodiment of the present application;
FIG. 7 is a block diagram of an eye detection system based on a deep convolutional neural network according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Aiming at the problems of insufficient detection precision and a high missed-detection rate in existing iris recognition, the embodiments of the application provide an eye detection method, system and equipment based on a deep convolutional neural network.
Fig. 1 shows a flowchart of an eye detection method based on a deep convolutional neural network according to an embodiment of the present application. Referring to fig. 1, the eye detection method based on the deep convolutional neural network includes the following steps:
S1: marking the training data set to obtain ground truth, wherein the training data set contains the iris images to be trained, and the ground truth contains the position information of the rectangular frames in the training data set.
S2: constructing a deep neural network model.
S3: training the deep neural network model using the training data set.
S4: carrying out forward inference on the iris image to be processed by using the trained deep neural network model to obtain an inference result, wherein the inference result contains the position information of the predicted rectangular frame in the iris image.
S5: post-processing the inference result, and resolving the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame.
Through this processing procedure, the deep neural network model takes as input an iris image requiring a left/right eye judgment and outputs the position of the predicted rectangular frame; the left-eye/right-eye judgment is then made from the position of the predicted rectangular frame. This greatly improves the detection precision and greatly reduces the missed-detection rate.
The processing of the method is described in detail below.
In step S1, the training data set is marked to obtain GT (Ground Truth) data. The marking information includes the coordinate position information (loc) of the eyes in an iris image, the classification information (class) of whether an eye is present, and the left/right eye classification information (lr_class).
The coordinate position information corresponds to a rectangular frame that exactly encloses a complete eye; specifically, it is bounded exactly by the upper eyelid, lower eyelid, left eyelid, and right eyelid boundaries. It is expressed by the upper-left boundary point coordinates (x1, y1) and the lower-right boundary point coordinates (x2, y2) of the rectangular frame.
For the eye/non-eye classification, 1 indicates that a correct eye is present in the rectangular frame, and 0 indicates that no eye is present.
For the left/right eye classification information, 1 indicates a left eye and 0 indicates a right eye.
The Ground Truth of a complete eye is expressed as GT(x1, y1, x2, y2, class, lr_class), where (x1, y1, x2, y2) is the position information, class is the eye/non-eye classification information, and lr_class is the left/right eye classification information. When more than one eye is present in an iris image, all eyes within the image should be marked. Fig. 2 illustrates the marking of an image in which both a left eye and a right eye are present.
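For illustration only, the following minimal Python sketch shows this GT layout; the tuple fields follow the GT definition above, while the concrete coordinate values and the helper name are hypothetical.

```python
# Hypothetical GT records in the layout GT(x1, y1, x2, y2, class, lr_class);
# the coordinates are invented for illustration, not taken from the patent.
gt_left_eye = (120, 80, 260, 170, 1, 1)    # class=1: a correct eye; lr_class=1: left eye
gt_right_eye = (380, 85, 520, 175, 1, 0)   # lr_class=0: right eye

def is_left_eye(gt):
    """Return True when the marked region is a correct eye and a left eye."""
    x1, y1, x2, y2, cls, lr = gt
    return cls == 1 and lr == 1
```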
In step S2, the deep neural network model is designed. This comprises key elements such as the network model structure, data preprocessing, candidate rectangular frame design, position information encoding, positive/negative sample selection, and loss design.
Network model structure: the network model structure used in this example is shown in Fig. 3; the details of the individual blocks are given in Table 1 and Figs. 4, 5 and 6.
Table 1: Detailed network structure (example)
The model is divided into 3 consecutive stages, as follows:
S11: the first Stage comprises a number of consecutive downsampling steps, whose function is to extract features while rapidly reducing the image resolution. In this example, 4 downsampling steps are used, each built from a DwBlock structure with a downsampling rate of 2.
S12: the second Stage comprises a number of consecutive feature fusion steps, whose function is to fuse the preliminarily extracted features. In this example, 3 fusion steps are used, each built from an InceptionBlock structure; the fusion steps perform no downsampling.
S13: the third Stage outputs predicted rectangular frame information at multiple scales, i.e., prediction results at different scales and different aspect ratios, including the position information of the predicted rectangular frames, the classification information of the predicted rectangular frames, the left/right eye classification information, and the like. This example uses 2 output channels at different scales.
The data preprocessing includes preprocessing of the image data and preprocessing of the marking data.
The image preprocessing comprises the following steps:
S21: the image data are first subjected to the necessary data augmentation operations. Deformation, cropping, horizontal flipping, rotation, etc. affect the position information of the marking data, so the corresponding marked position information should be adjusted at the same time.
S22: the augmented image must contain at least one complete, marked eye.
S23: the final input image is scaled to a single-channel grayscale image of a specified size (width 640 × height 360) and, after standardization, is input into the network model to be trained.
S24: the standardization is $x_{new} = (x - \mu) / \sigma$, where $x_{new}$ is the standardized image pixel value, $x$ is the image pixel value before processing, $\mu$ is the statistical mean of the pixel values of the training data set, and $\sigma$ is the standard deviation.
The marking data preprocessing comprises the following steps:
S31: in coordination with any augmentation operation that affects the position information of the marking data, the corresponding position information is adjusted.
S32: finally, after normalization, the position information is used as the new Ground Truth information and input into the network model to be trained.
S33: the normalization is $x_{new} = x / \text{width}$, $y_{new} = y / \text{high}$, where $(x_{new}, y_{new})$ is the normalized coordinate point, $(x, y)$ is the coordinate point before processing, width is the pixel width of the corresponding image, and high is the pixel height of the corresponding image.
Candidate rectangular frame design: the candidate rectangular frames are designed with 2 width scales, 2 aspect ratios, and 2 densities, all expressed in normalized lengths.
A candidate rectangular frame is denoted (cx, cy, w, h), where (cx, cy) are the normalized center coordinates of the candidate rectangular frame and (w, h) are its normalized width and height.
For the normalization, horizontal quantities are divided by the pixel width of the corresponding image (width) and vertical quantities by its pixel height (high).
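A sketch of candidate-frame generation over a feature-map grid is given below; the patent states only the counts (2 of each), so the concrete width scales, aspect ratios, and the per-cell offset scheme are assumptions.

```python
# Sketch only: generate (cx, cy, w, h) candidate frames with assumed values.
import itertools

def candidate_boxes(grid_w, grid_h,
                    widths=(0.15, 0.30),      # assumed normalized width scales
                    ratios=(1.0, 0.6),        # assumed height/width aspect ratios
                    densities=(1, 2)):        # assumed anchor densities per cell
    boxes = []                                # entries are (cx, cy, w, h), normalized
    for gy, gx in itertools.product(range(grid_h), range(grid_w)):
        for w, r, d in itertools.product(widths, ratios, densities):
            for k in range(d):                # density d places d offset copies per cell
                cx = (gx + (k + 0.5) / d) / grid_w
                cy = (gy + (k + 0.5) / d) / grid_h
                boxes.append((cx, cy, w, w * r))
    return boxes
```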
Position information encoding comprises the following.
The input is the normalized position information Loc(x1, y1, x2, y2) of a rectangular frame, where (x1, y1) are the upper-left boundary point coordinates of the rectangular frame and (x2, y2) are the lower-right boundary point coordinates.
The output is (cx, cy, w, h), where (cx, cy) is the center encoding result of the rectangular frame and (w, h) is the width-height encoding result.
The center encoding is $cx = \frac{(x_1 + x_2)/2 - px}{pw \cdot v_c}$ and $cy = \frac{(y_1 + y_2)/2 - py}{ph \cdot v_c}$, where (px, py) are the normalized center coordinates of the candidate rectangular frame, (pw, ph) are the normalized width and height of the candidate rectangular frame, and $v_c$ is the coding coefficient.
The width-height encoding is $w = \frac{1}{v_s} \log\frac{x_2 - x_1}{pw}$ and $h = \frac{1}{v_s} \log\frac{y_2 - y_1}{ph}$, where $v_s$ is the coding coefficient.
The decoding process is the inverse of the encoding process.
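Below is an SSD-style encode/decode sketch consistent with the reconstruction above. The coding coefficients v_c and v_s and the log-ratio width-height form are assumptions based on the standard formulation, not values taken from the patent.

```python
# Sketch only: box encoding and its exact inverse, with assumed coefficients.
import math

V_C, V_S = 0.1, 0.2                           # assumed coding coefficients

def encode(loc, prior):
    """loc: normalized GT corners (x1, y1, x2, y2); prior: (px, py, pw, ph)."""
    x1, y1, x2, y2 = loc
    px, py, pw, ph = prior
    bx, by = (x1 + x2) / 2, (y1 + y2) / 2     # GT center
    bw, bh = x2 - x1, y2 - y1                 # GT width and height
    return ((bx - px) / (pw * V_C), (by - py) / (ph * V_C),
            math.log(bw / pw) / V_S, math.log(bh / ph) / V_S)

def decode(code, prior):
    """Exact inverse of encode(); returns normalized corners (x1, y1, x2, y2)."""
    cx, cy, w, h = code
    px, py, pw, ph = prior
    bx, by = cx * pw * V_C + px, cy * ph * V_C + py
    bw, bh = pw * math.exp(w * V_S), ph * math.exp(h * V_S)
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)
```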
Positive and negative sample selection comprises the following steps:
S41: the candidate rectangular frames whose IOU with the real frame is greater than 0 are ranked, and the N top-ranked samples are positive samples; the others are candidate negative samples.
S42: the classification losses of all candidate negative samples are calculated and ranked.
S43: the hardest negative samples, i.e. the M candidate negative samples with the largest loss, are taken as the final negative samples.
S44: M = αN, where α is the negative-sample magnification; here α = 5.
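A sketch of S41-S44 in NumPy, assuming the per-candidate IOU values and classification losses have already been computed as arrays:

```python
# Sketch of positive/negative selection with hard-negative mining (S41-S44).
import numpy as np

def select_samples(ious, cls_losses, n_pos, alpha=5):
    """Top-N candidates by IOU (> 0) are positives; the M = alpha*N candidate
    negatives with the largest classification loss are the final negatives."""
    order = np.argsort(-ious)
    top = order[:n_pos]
    pos = top[ious[top] > 0]                          # S41: positive samples
    neg = np.setdiff1d(np.arange(len(ious)), pos)     # remaining candidates
    hardest = neg[np.argsort(-cls_losses[neg])]       # S42: rank by loss
    return pos, hardest[: alpha * len(pos)]           # S43/S44: M = alpha * N
```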
In the loss design, the network model training loss is designed as a joint loss, a combination of 4 losses with different weights, comprising regression loss functions and classification loss functions:
1. The regression loss functions for the position information are used to improve processing precision; two regression loss functions are used simultaneously.
Regression loss function 1: SmoothL1, with the formula
$$\text{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
where x is the encoded numerical difference between the predicted rectangular frame and the real frame over the upper-left and lower-right corner coordinates.
Regression loss function 2: CIoU loss, calculated as
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
where $IoU = \frac{|b \cap b^{gt}|}{|b \cup b^{gt}|}$ is the intersection-over-union of the predicted rectangular frame $b$ and the real frame $b^{gt}$; $\rho(b, b^{gt})$ is the distance between the center points of the predicted rectangular frame $b$ and the real frame $b^{gt}$; $c$ is the diagonal length of the minimum enclosing rectangle of the predicted rectangular frame and the real frame; $(w, h)$ and $(w^{gt}, h^{gt})$ are the width and height of the predicted rectangular frame and of the real frame, respectively; and
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}.$$
2. The classification functions use the cross-entropy loss, which is used both to confirm whether the predicted rectangular frame contains a correct eye and to judge whether the predicted eye is the left or the right eye. It is calculated as
$$L = -\frac{1}{N}\sum_{i}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$
where $y_i$ is the label of sample $i$ (1 for the positive class, 0 for the negative class) and $p_i$ is the probability that sample $i$ is predicted as the positive class; in the left/right eye judgment, the left eye is the positive class and the right eye is the negative class.
In summary, the above 4 losses are combined according to their different weights:
$$Loss = \sum_{i=1}^{4} \lambda_i L_i$$
where $\lambda_i$ is the weight of the $i$-th loss and $L_i$ is the calculated value of the $i$-th loss. In this embodiment the weights $\lambda_i$ take fixed preset values.
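The four terms can be combined as sketched below in PyTorch; smooth-L1 and cross-entropy use the library's built-ins, the CIoU term implements the standard formula reproduced above, and the weights are placeholders, since the embodiment's values are not reproduced here.

```python
# Sketch of the 4-term joint loss with assumed weight values.
import math
import torch
import torch.nn.functional as F

def ciou_loss(pred, gt):
    """pred, gt: (N, 4) normalized corner boxes (x1, y1, x2, y2)."""
    ix1, iy1 = torch.max(pred[:, 0], gt[:, 0]), torch.max(pred[:, 1], gt[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], gt[:, 2]), torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    cpx, cpy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cgx, cgy = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2            # squared center distance
    ex1, ey1 = torch.min(pred[:, 0], gt[:, 0]), torch.min(pred[:, 1], gt[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], gt[:, 2]), torch.max(pred[:, 3], gt[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9       # squared enclosing diagonal
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / hg) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return (1 - iou + rho2 / c2 + alpha * v).mean()

def joint_loss(loc_pred, loc_gt, box_pred, box_gt,
               eye_logits, eye_gt, lr_logits, lr_gt,
               weights=(1.0, 1.0, 1.0, 1.0)):             # assumed lambda_i values
    losses = (F.smooth_l1_loss(loc_pred, loc_gt),         # regression loss 1
              ciou_loss(box_pred, box_gt),                # regression loss 2
              F.cross_entropy(eye_logits, eye_gt),        # eye / non-eye
              F.cross_entropy(lr_logits, lr_gt))          # left / right
    return sum(w * l for w, l in zip(weights, losses))
```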
In step S3, the network model is trained using the training data set, comprising the following steps:
S51: the marked images and data are split into two sets, a training data set and a validation data set, in a ratio of 9:1.
S52: training is performed with the training data set; network model verification and parameter screening are performed with the validation data set.
S53: the preprocessed images of the training data set and the marking data are input into the network, and the network performs forward inference on the input images to obtain the current inference result.
S54: classification loss and regression loss are calculated from the current inference result and the corresponding marking data to obtain the current joint loss.
S55: the loss is propagated back into the network through the reverse gradient, and the current network weights are adjusted together with other parameters such as the learning rate.
S56: the above process is repeated until training ends.
S57: the training end condition is: the specified number of training rounds (epoch = 100) is completed.
S58: the network model parameter screening condition is: the metrics on the validation data set are the best results of the current training and meet expectations. The example metrics are an IOU metric (IOU > 0.99) and an F1 Score metric (F1 Score > 0.99).
In step S4, forward inference is performed on the iris image to be processed using the trained network model to obtain the inference result.
The inference result is a prediction result comprising all the encoded predicted rectangular frame position information, the corresponding classification information, and the left/right eye classification information. This information must be further processed to resolve the best, correct result.
In step S5, the inference result is post-processed, and the eye positions and the left/right eye judgment results in the iris image are resolved. The method comprises the following steps:
S61: the inference result is first decoded.
S62: the decoded data are filtered by non-maximum suppression to screen out the best-matching predicted rectangular frames. The IOU threshold used for non-maximum suppression is 0.5 and the classification threshold is 0.6.
S63: the screened predicted rectangular frames are inverse-normalized to obtain their image coordinate information.
S64: the left/right eye judgment is made according to the set left/right eye classification threshold. The classification threshold is taken as 0.5.
At this point, the best, correct predicted rectangular frame position information has been obtained, and whether it corresponds to the left eye or the right eye has been judged.
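A sketch of the post-processing chain (S61-S64), reusing the decode() sketch from the position-encoding section and torchvision's non-maximum suppression; the score arrays are assumed to be torch tensors of per-box probabilities, and the thresholds follow the values stated above.

```python
# Sketch of post-processing (S61-S64) with assumed tensor inputs.
import torch
from torchvision.ops import nms

def postprocess(codes, priors, eye_scores, lr_scores, width, high,
                iou_thr=0.5, cls_thr=0.6, lr_thr=0.5):
    boxes = torch.tensor([decode(c, p) for c, p in zip(codes, priors)])  # S61
    keep = eye_scores >= cls_thr                  # drop low-confidence predictions
    boxes, eye_scores, lr_scores = boxes[keep], eye_scores[keep], lr_scores[keep]
    for i in nms(boxes, eye_scores, iou_thr):     # S62: non-maximum suppression
        x1, y1, x2, y2 = boxes[i].tolist()
        pixel_box = (x1 * width, y1 * high, x2 * width, y2 * high)   # S63
        side = "left" if float(lr_scores[i]) >= lr_thr else "right"  # S64
        yield pixel_box, side
```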
Fig. 7 is a block diagram of an eye detection system based on a deep convolutional neural network according to an embodiment of the present application. Referring to fig. 7, the eye detection system based on the deep convolutional neural network includes: a marking module 71, a construction module 72, a training module 73, an inference module 74, and a post-processing module 75.
The marking module 71 is configured to mark the training data set to obtain ground truth, where the training data set includes the iris images to be trained and the ground truth includes the position information of the rectangular frames in the training data set.
The construction module 72 is used to construct a deep neural network model.
The training module 73 is configured to train the deep neural network model using the training data set.
The inference module 74 is configured to perform forward inference on an iris image to be processed using the trained deep neural network model to obtain an inference result, where the inference result includes the position information of the predicted rectangular frame in the iris image.
The post-processing module 75 is configured to post-process the inference result and resolve the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame.
In some embodiments, the marking information produced by the marking module 71 for the training data set includes: the coordinate position information of the eyes in the iris images, the classification information of whether an eye is present, and the left/right eye classification information.
In some embodiments, the construction module 72 comprises: a preprocessing unit, a design unit, a coding unit, a selection unit, and a calculation unit.
The preprocessing unit is used for preprocessing the image and the marking data.
The design unit is used for designing rectangular frames for the eye images in the training data set.
The coding unit is used for encoding the position information of the rectangular frames.
The selection unit is used for selecting positive and negative samples from the training data set.
The calculation unit is used for calculating the joint loss.
In some embodiments, the computing unit is specifically configured to: calculating a classification loss; calculating regression loss; and carrying out weighted average on the obtained classification loss and regression loss to obtain the joint loss.
In some embodiments, the construction module 72 further comprises: a building unit.
The building unit is configured to build a network model structure comprising three stages: a downsampling stage, a feature fusion stage, and a multi-scale predicted rectangular frame information output stage.
In some embodiments, the training module 73 comprises: an inference unit, a loss calculation unit, an adjustment unit, and a repeated execution unit.
The inference unit is used for inputting the preprocessed images of the training data set and the marking data into the network; the network performs forward inference on the input images to obtain the current inference result.
The loss calculation unit is used for calculating the classification loss and the regression loss from the current inference result and the corresponding marking data to obtain the current joint loss.
The adjustment unit is used for propagating the loss into the network through the reverse gradient and adjusting the current network weights together with other parameters.
The repeated execution unit is used for repeating the above process until training ends.
In some implementations, the current inference result includes a plurality of neighboring result values.
In some embodiments, the post-processing module 75 includes: the device comprises a decoding unit, a suppressing unit, an inverse normalization unit and a judging unit.
The decoding unit is used for decoding the inference result.
The suppression unit is used for filtering the decoded data by non-maximum suppression to screen out the best-matching predicted rectangular frames.
The inverse normalization unit is used for inverse-normalizing the screened predicted rectangular frames to obtain their image coordinate information.
The judging unit is used for making the left/right eye judgment according to the set classification threshold.
Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application, and shows a block diagram of an exemplary electronic device suitable for implementing an embodiment of the present application. The electronic device shown in fig. 8 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application. As shown in fig. 8, the electronic apparatus includes a processor 81, a memory 82, an input device 83, and an output device 84; the number of processors 81 in the electronic device may be one or more, in fig. 8, one processor 81 is taken as an example, and the processors 81, the memory 82, the input device 83, and the output device 84 in the electronic device may be connected by a bus or other means, in fig. 8, by a bus connection is taken as an example.
The memory 82 is used as a computer readable storage medium for storing a software program, a computer executable program, and modules, such as program instructions/modules corresponding to an eye detection method based on a deep convolutional neural network in an embodiment of the present application. The processor 81 executes various functional applications of the computer device and data processing by running software programs, instructions and modules stored in the memory 82, i.e., implements the above-described eye detection method based on the deep convolutional neural network.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, while the application has been described in connection with the above embodiments, the application is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the application, which is set forth in the following claims.

Claims (9)

1. An eye detection method based on a deep convolutional neural network, comprising:
S1: marking a training data set to obtain ground truth, wherein the training data set contains iris images to be trained, and the ground truth contains position information of rectangular frames in the training data set;
S2: constructing a deep neural network model;
S3: training the deep neural network model by using the training data set;
S4: carrying out forward inference on an iris image to be processed by using the trained deep neural network model to obtain an inference result, wherein the inference result contains the position information of the predicted rectangular frame in the iris image;
S5: post-processing the inference result, and resolving the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame;
in step S2, a deep neural network model is constructed, including:
preprocessing the image and the marking data;
designing a rectangular frame for an eye image in the training dataset;
position information coding is carried out on the candidate rectangular frames;
positive and negative sample selection is carried out on the training data set;
calculating joint loss;
preprocessing the image and the marking data, including preprocessing of the image data and preprocessing of the marking data, wherein the image preprocessing includes the following steps:
S21: the image data are subjected to data augmentation processing, wherein deformation, cropping, horizontal flipping and rotation affect the position information of the marking data, and the corresponding marked position information is adjusted at the same time;
S22: the augmented image contains at least one complete, marked eye;
S23: finally, the input image is scaled to a single-channel grayscale image of a specified size and, after standardization, is input into the network model to be trained;
S24: the standardization is $x_{new} = (x - \mu) / \sigma$, wherein $x_{new}$ is the standardized image pixel value, $x$ is the image pixel value before processing, $\mu$ is the statistical mean of the pixel values of the training data set, and $\sigma$ is the standard deviation;
wherein the marking data preprocessing comprises the following steps:
S31: in coordination with the augmentation processing operations that affect the position information of the marking data, adjusting the corresponding position information;
S32: finally, after normalization, using the position information as new Ground Truth information and inputting it into the network model to be trained;
S33: the normalization is $x_{new} = x / \text{width}$, $y_{new} = y / \text{high}$, wherein $(x_{new}, y_{new})$ is the normalized coordinate point, $(x, y)$ is the coordinate point before processing, width is the pixel width of the corresponding image, and high is the pixel height of the corresponding image.
2. The method of claim 1, wherein the marking information for the training data set comprises: coordinate position information of the eyes in the iris images, classification information of whether an eye is present, and left/right eye classification information.
3. The method of claim 1, wherein calculating the joint loss comprises:
calculating a classification loss;
calculating regression loss;
and carrying out weighted average on the obtained classification loss and regression loss to obtain the joint loss.
4. The method of claim 1, wherein constructing a deep neural network model further comprises:
constructing a network model structure comprising three stages, wherein the three stages comprise: a downsampling process, a feature fusion process and a multi-scale predicted rectangular frame information output process.
5. The method of claim 1, wherein training the deep neural network model using the training dataset in step S3 comprises:
inputting the preprocessed images of the training data set and the marking data into a network, the network performing forward inference on the input images to obtain a current inference result;
performing classification loss calculation and regression loss calculation by using the current inference result and the corresponding marking data to obtain a current joint loss;
propagating the loss into the network through a reverse gradient, and adjusting the current network weights together with other parameters;
the above process is repeated until the training is finished.
6. The method of claim 5, wherein the current inference result comprises a plurality of neighboring result values.
7. The method according to claim 1, wherein in step S5, post-processing the inference result and analyzing the eye positions and the left/right eye judgment results in the iris image based on the position difference between the rectangular frame and the predicted rectangular frame comprises:
S61: first decoding the inference result;
S62: filtering the decoded data by non-maximum suppression to screen out the best-matching predicted rectangular frame;
S63: performing inverse normalization on the screened predicted rectangular frame to obtain the image coordinate information of the predicted rectangular frame;
S64: judging the left and right eyes according to the set classification threshold.
8. An eye detection system based on a deep convolutional neural network, comprising:
the marking module is used for marking the training data set to obtain ground truth, wherein the training data set contains the iris images to be trained, and the ground truth contains the position information of the rectangular frames in the training data set;
the construction module is used for constructing a deep neural network model;
the training module is used for training the deep neural network model by using the training data set;
the inference module is used for carrying out forward inference on an iris image to be processed by using the trained deep neural network model to obtain an inference result, wherein the inference result comprises the position information of the predicted rectangular frame in the iris image;
the post-processing module is used for post-processing the inference result and resolving the eye positions and the left/right eye judgment results in the iris image according to the position information of the predicted rectangular frame;
constructing a deep neural network model, comprising:
preprocessing the image and the marking data;
designing a rectangular frame for an eye image in the training dataset;
position information coding is carried out on the candidate rectangular frames;
positive and negative sample selection is carried out on the training data set;
calculating joint loss;
preprocessing the image and the marking data, including preprocessing of the image data and preprocessing of the marking data, wherein the image preprocessing includes the following steps:
S21: the image data are subjected to data augmentation processing, wherein deformation, cropping, horizontal flipping and rotation affect the position information of the marking data, and the corresponding marked position information is adjusted at the same time;
S22: the augmented image contains at least one complete, marked eye;
S23: finally, the input image is scaled to a single-channel grayscale image of a specified size and, after standardization, is input into the network model to be trained;
S24: the standardization is $x_{new} = (x - \mu) / \sigma$, wherein $x_{new}$ is the standardized image pixel value, $x$ is the image pixel value before processing, $\mu$ is the statistical mean of the pixel values of the training data set, and $\sigma$ is the standard deviation;
wherein the marking data preprocessing comprises the following steps:
S31: in coordination with the augmentation processing operations that affect the position information of the marking data, adjusting the corresponding position information;
S32: finally, after normalization, using the position information as new Ground Truth information and inputting it into the network model to be trained;
S33: the normalization is $x_{new} = x / \text{width}$, $y_{new} = y / \text{high}$, wherein $(x_{new}, y_{new})$ is the normalized coordinate point, $(x, y)$ is the coordinate point before processing, width is the pixel width of the corresponding image, and high is the pixel height of the corresponding image.
9. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing a deep convolutional neural network-based eye detection method as claimed in any one of claims 1 to 7 when executing executable instructions stored in said memory.
CN202311070940.3A 2023-08-24 2023-08-24 Eye detection method, system and equipment based on deep convolutional neural network Active CN116824681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311070940.3A CN116824681B (en) 2023-08-24 2023-08-24 Eye detection method, system and equipment based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311070940.3A CN116824681B (en) 2023-08-24 2023-08-24 Eye detection method, system and equipment based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN116824681A CN116824681A (en) 2023-09-29
CN116824681B (en) 2023-11-24

Family

ID=88120524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311070940.3A Active CN116824681B (en) 2023-08-24 2023-08-24 Eye detection method, system and equipment based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN116824681B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101862639B1 (en) * 2017-05-30 2018-07-04 동국대학교 산학협력단 Device and method for iris recognition using convolutional neural network
CN110321844A (en) * 2019-07-04 2019-10-11 北京万里红科技股份有限公司 A kind of quick iris detection method based on convolutional neural networks
CN111027464A (en) * 2019-12-09 2020-04-17 大连理工大学 Iris identification method for convolutional neural network and sequence feature coding joint optimization
CN111274997A (en) * 2020-02-17 2020-06-12 天津中科智能识别产业技术研究院有限公司 Iris recognition neural network model training method based on binocular fusion
CN113706469A (en) * 2021-07-29 2021-11-26 天津中科智能识别产业技术研究院有限公司 Iris automatic segmentation method and system based on multi-model voting mechanism
CN114596622A (en) * 2022-03-17 2022-06-07 吉林大学 Iris and periocular antagonism adaptive fusion recognition method based on contrast knowledge drive
CN115205956A (en) * 2022-08-11 2022-10-18 北京万里红科技有限公司 Left and right eye detection model training method, method and device for identifying left and right eyes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884064B (en) * 2021-03-12 2022-07-29 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of Convolutional Neural Networks to Iris Recognition in the Transfer Learning Mode; Zhao Yong; Lei Huan; Ma Jingqi; Xiao Renxiang; Zhang Shouming; Electronic Measurement Technology (09); 119-125 *

Also Published As

Publication number Publication date
CN116824681A (en) 2023-09-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant