CN113591815B - Method for generating canthus recognition model and method for recognizing canthus in eye image


Info

Publication number
CN113591815B
CN113591815B (application CN202111147584.1A)
Authority
CN
China
Prior art keywords: eye, corner, feature map, eye image, canthus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111147584.1A
Other languages
Chinese (zh)
Other versions
CN113591815A (en)
Inventor
张小亮
王秀贞
戚纪纲
杨占金
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Superred Technology Co Ltd
Original Assignee
Beijing Superred Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Superred Technology Co Ltd
Priority to CN202111147584.1A
Publication of CN113591815A
Application granted
Publication of CN113591815B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method for generating an eye corner (canthus) recognition model and a method for recognizing the eye corner in an eye image. The method for generating the eye corner recognition model comprises the following steps: labeling the eye corner key points in an eye image to obtain annotation data; preprocessing the eye image and the annotation data respectively to obtain a preprocessed eye image and preprocessed annotation data; constructing an eye corner recognition model and setting initial model parameters; inputting the preprocessed eye image into the eye corner recognition model to recognize the eye corner key points therein, and outputting a feature map containing the eye corner key points; processing the feature map to generate numerical coordinates of the eye corner key points; and determining a loss value based on the annotation data, the numerical coordinates, and the feature map so as to update the parameters of the eye corner recognition model, and, when the loss value satisfies a preset condition, taking the corresponding model as the finally generated eye corner recognition model.

Description

Method for generating canthus recognition model and method for recognizing canthus in eye image
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to a method for generating an eye corner recognition model and a method for identifying whether the eye corner in an eye image is visible.
Background
In identity recognition and related technical fields, iris features have broad market prospects and scientific research value owing to their stability, uniqueness, and non-invasiveness.
In practical application scenarios, acquired eye images containing human eyes sometimes exhibit conditions unfavorable for iris recognition. For example, due to occlusion, an eye image may contain only part of the eye and lack features such as the eye corner; as another example, the human eye may be tilted to different degrees in the image. Both problems limit the efficiency and accuracy of iris recognition.
In view of the above, a solution for processing an eye image is needed to solve the above problem.
Disclosure of Invention
To this end, the present disclosure provides a method of generating an eye corner identification model and a method of identifying an eye corner in an eye image in an attempt to solve or at least alleviate the problems presented above.
According to a first aspect of the present disclosure, there is provided a method of generating an eye corner recognition model, comprising the following steps: labeling the eye corner key points in an eye image to obtain annotation data; preprocessing the eye image and the annotation data respectively to obtain a preprocessed eye image and preprocessed annotation data; constructing an eye corner recognition model and setting initial model parameters; inputting the preprocessed eye image into the eye corner recognition model to recognize the eye corner key points therein, and outputting a feature map containing the eye corner key points; processing the feature map to generate numerical coordinates of the eye corner key points; and determining a loss value based on the annotation data, the numerical coordinates, and the feature map so as to update the model parameters of the eye corner recognition model, taking the corresponding model as the finally generated eye corner recognition model when the loss value satisfies a predetermined condition.
Optionally, in the method according to the present disclosure, the step of respectively preprocessing the eye image and the annotation data to obtain a preprocessed eye image and preprocessed annotation data includes: cutting out a human eye region from the eye image according to a preset size, and taking the cut-out image as a preprocessed eye image; and normalizing the labeling coordinates of the human eye key points in the labeling data to obtain the preprocessed labeling data.
Optionally, in a method according to the present disclosure, the eye corner recognition model comprises a convolution processing component and a classification component coupled to each other; the convolution processing component at least comprises a plurality of processing stages, and an attention module is coupled between each processing stage, wherein the plurality of processing stages are suitable for extracting the features of the preprocessed eye image, and the attention module is suitable for enhancing the features of the preprocessed eye image.
Optionally, in the method according to the present disclosure, the step of inputting the preprocessed eye image into the eye corner recognition model to recognize the eye corner key points therein, and outputting the feature map including the eye corner key points includes: extracting the key points of the canthus of the preprocessed eye image through a convolution processing component, and outputting a first feature map containing the position coordinates of the key points of the canthus; by means of the classification component, the probability that the extracted corner of the eye keypoints are visible is predicted, and a second feature map comprising the probability is output.
Optionally, in the method according to the present disclosure, the step of processing the feature map to generate numerical coordinates of the eye corner key points includes: generating a feature map template of the same size as the second feature map, the values of the feature map template being distributed in the interval [-1, 1]; and performing a dot product operation between the feature map template and the second feature map to generate the numerical coordinates.
Optionally, in a method according to the present disclosure, the annotation data comprises the labeled coordinates of the eye corner key points and an attribute label indicating whether each eye corner key point is visible, the attribute label being 1 when the key point is visible and 0 when it is not visible.
Optionally, the method according to the present disclosure further comprises the steps of: if the attribute label is 0, only calculating the loss value of the numerical value coordinate; if the attribute label is 1, calculating the loss value of the numerical coordinate and the loss value of the second characteristic diagram.
According to a second aspect of the present disclosure, there is provided a method of identifying an eye corner in an eye image, comprising the following steps: inputting the eye image into an eye corner recognition model and, after processing, outputting a feature map containing the eye corner key points; converting the coordinates of the eye corner key points in the feature map to generate converted coordinates; if the converted coordinates are not within the coordinate range of the eye image, determining that the eye corner in the eye image is not visible; and if the converted coordinates are within the coordinate range of the eye image, confirming whether the eye corner in the eye image is visible by calculating the confidence of the eye corner key point, wherein the eye corner recognition model is generated by performing the method described above.
Optionally, in the method according to the present disclosure, the step of confirming whether the corner of the eye in the eye image is visible by calculating the confidence of the corner key point includes: calculating a variance based on the feature map and the feature map template; determining confidence of the corner key points based on the variance; if the confidence coefficient is greater than the threshold value, confirming that the canthus in the eye image is visible; and if the confidence coefficient is not greater than the threshold value, confirming that the canthus in the eye image is invisible.
According to a third aspect of the present disclosure, there is provided a computing device comprising: at least one processor; and a memory storing program instructions that, when read and executed by the processor, cause the computing device to perform the above-described method.
According to a fourth aspect of the present disclosure, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the above method.
According to the technical solution of the present disclosure, when the eye corner recognition model is trained, the annotation data is first standardized through preprocessing. A convolutional neural network is then designed according to the structural characteristics of the human eye to extract eye feature information, and the constructed eye corner recognition model ensures detection accuracy while also taking network latency into account.
The foregoing description is only an overview of the technical solutions of the present disclosure, and the embodiments of the present disclosure are described below in order to make the technical means of the present disclosure more clearly understood and to make the above and other objects, features, and advantages of the present disclosure more clearly understandable.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a computing device 100, according to one embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a method 200 of generating an eye corner identification model according to one embodiment of the present disclosure;
FIG. 3 illustrates a schematic structural diagram of an eye corner identification model 300 according to one embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a process of attention structure according to one embodiment of the present disclosure;
FIGS. 5 and 6 show schematic diagrams of the major modules of an eye corner identification model 300 according to one embodiment of the present disclosure;
fig. 7 shows a flowchart of a method 700 of identifying an eye corner in an eye image according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To address the problems in the prior art, the present disclosure provides a solution for identifying the eye corners in an eye image, by which the eye corners can be recognized rapidly. In embodiments according to the present disclosure, the identified eye corners are indicated by eye corner key points, more specifically by the left and right eye corner points. On the one hand, this makes it possible to filter out eye images in which the eye corner does not appear, for example because of occlusion, and thus to obtain eye images that better meet the specification. This helps subsequent tasks: for example, when determining whether an iris belongs to the left eye or the right eye, the presence or absence of the eye corner has a considerable influence on the determination result. On the other hand, in an iris recognition system, once the eye corners are recognized, the iris image can be rotated according to the two eye corner points (i.e., the left and right eye corner points) so that they lie on the same horizontal line, which speeds up subsequent iris recognition.
According to an embodiment of the present disclosure, a method of generating an eye corner recognition model and a method of recognizing an eye corner in an eye image are performed by a computing device. Fig. 1 is a configuration diagram of an exemplary computing device 100.
As shown in FIG. 1, in a basic configuration 102, a computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The physical memory in the computing device is usually volatile memory (RAM), and data on disk needs to be loaded into the physical memory before it can be read by the processor 104. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some implementations, the applications 122 can be arranged to execute instructions on the operating system with the program data 124 by the one or more processors 104. The operating system 120 may be, for example, Linux, Windows, etc., and includes program instructions for handling basic system services and performing hardware-dependent tasks. The applications 122 include program instructions for implementing various user-desired functions; an application 122 may be, for example, but not limited to, a browser, an instant messenger, or a software development tool (e.g., an integrated development environment IDE, a compiler, etc.). When an application 122 is installed into the computing device 100, a driver module may be added to the operating system 120.
When the computing device 100 is started, the processor 104 reads program instructions of the operating system 120 from the memory 106 and executes them. The application 122 runs on top of the operating system 120, utilizing the operating system 120 and interfaces provided by the underlying hardware to implement various user-desired functions. When the user starts the application 122, the application 122 is loaded into the memory 106, and the processor 104 reads the program instructions of the application 122 from the memory 106 and executes the program instructions.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display 153 or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its data set or its changes made in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
The computing device 100 also includes a memory interface bus 134 coupled to the bus/interface controller 130. The memory interface bus 134 is coupled to the memory device 132, and the memory device 132 is adapted for data storage. An exemplary storage device 132 may include removable storage 136 (e.g., CD, DVD, U-disk, removable hard disk, etc.) and non-removable storage 138 (e.g., hard disk drive, HDD, etc.).
In a computing device 100 according to the present disclosure, the application 122 includes instructions for performing the method 200 of generating an eye corner identification model of the present disclosure, and/or the method 700 of identifying an eye corner in an eye image. Training data for training the generation of the corner of the eye recognition model and model parameters associated with the corner of the eye recognition model may be included in program data 124, and the disclosure is not limited thereto. Further, the instructions may instruct the processor 104 to perform the above-described method of the present disclosure to identify the corner of the eye in the eye image (i.e., to confirm whether the corner of the eye in the eye image is visible).
Fig. 2 shows a flow diagram of a method 200 of generating an eye corner recognition model according to one embodiment of the present disclosure. According to one embodiment, method 200 may be performed in a computing device (computing device 100 as described above). The method 200 is directed to generating an eye corner recognition model through training, wherein the eye corner recognition model can be applied to an eye corner recognition scheme to recognize an eye corner from an eye image.
As shown in fig. 2, the method 200 begins at step S210. In step S210, the canthus key points in the eye image are labeled to obtain labeled data.
The eye image may be acquired, for example, as follows: the camera faces the front of the human face and collects eye images containing human eyes. In general, the eye image needs to emphasize a human eye portion, in other words, in the eye image, the human eye portion should occupy a large area in the image.
According to embodiments of the present disclosure, the collected eye images are annotated; the labeling strategy is to label the eye corner coordinate points of the human eye as the eye corner key points. The annotation data includes the labeled coordinates of the eye corner key points and attribute labels indicating whether the corresponding key points are visible. When an eye corner key point is visible (i.e., the key point is present in the eye image), its attribute label is 1; when it is not visible (i.e., the key point is not present in the eye image), its attribute label is 0.
In one embodiment, if the canthus key point is not visible, the recorded labeled coordinates are (-1, -1), and the attribute label of the point is recorded as 0; if the corner key point is visible, the recorded labeling coordinate is the actual coordinate of the corner key point in the eye image, and the attribute label of the point is recorded as 1. Further, the content and order of the annotation data are: the labeling coordinates of the left eye corner points, the attribute labels (0 or 1) of the left eye corner points, the labeling coordinates of the right eye corner points and the attribute labels (0 or 1) of the right eye corner points.
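For illustration only, the following is a minimal sketch, in Python, of what an annotation record following this labeling strategy could look like; the list layout and the variable names are assumptions made here for clarity, not a format prescribed by the disclosure.

```python
# Hypothetical annotation records in the order described above:
# left corner coords, left attribute label, right corner coords, right attribute label.
visible_sample = [132.0, 87.5, 1, 301.0, 92.0, 1]    # both eye corner points visible
occluded_sample = [-1.0, -1.0, 0, 298.5, 90.0, 1]    # left corner not visible: (-1, -1), label 0
```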
Subsequently, in step S220, the eye image and the annotation data are preprocessed respectively to obtain a preprocessed eye image and preprocessed annotation data.
According to an embodiment of the present disclosure, step S220 includes two parts: the method comprises the steps of preprocessing an eye image and preprocessing marking data, wherein the two steps are respectively carried out. These are described below.
1) Pre-processing for eye images
An image corresponding to the human eye region is cropped from the eye image according to a preset size, and the cropped image is used as the preprocessed eye image. The preset size is, for example, the input image size required by the eye corner recognition model.
Certainly, in order to make the trained eye corner recognition model more robust, in the preprocessing stage, necessary data enhancement processing can be performed on the eye image, and then the eye region can be cut out. Data enhancement processing includes, but is not limited to, gaussian blur, luminance adjustment, affine transformation, and the like.
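As a rough illustration of the kind of data enhancement mentioned above, the following sketch applies Gaussian blur, a brightness adjustment, and a small affine transformation with OpenCV; all parameter values are arbitrary examples, and geometric transforms would of course also require transforming the labeled coordinates accordingly.

```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> np.ndarray:
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)                  # Gaussian blur
    img = cv2.convertScaleAbs(img, alpha=1.1, beta=10)               # brightness adjustment
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle=5, scale=1.0)  # small rotation (affine)
    return cv2.warpAffine(img, M, (w, h))                            # affine transformation
```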
2) Preprocessing for annotated data
Because the image corresponding to the human eye region is cropped from the eye image, the coordinate information of the eye corner changes accordingly. It is therefore necessary to normalize the labeled coordinates of the eye corner key points in the annotation data and to use the normalized coordinates, together with the corresponding attribute labels, as the preprocessed annotation data.
In one embodiment, the labeled coordinates of the eye corner key points are transformed according to equation (1), which normalizes their values to the interval [-1, 1]; this is done so that they follow the same distribution as the output of the eye corner recognition model:

$$\hat{x} = \frac{2x}{w} - 1, \qquad \hat{y} = \frac{2y}{h} - 1 \tag{1}$$

where $x$, $y$ are the labeled coordinates before normalization, $\hat{x}$, $\hat{y}$ are the normalized coordinates, and $w$, $h$ are the width and height of the cropped eye image.

Further, for convenience of calculation, in the preprocessed annotation data: if an eye corner key point is visible, its labeled coordinates are the normalized coordinates $(\hat{x}, \hat{y})$ corresponding to the cropped image; if the eye corner key point is not visible, its labeled coordinates are both recorded as 0.
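A minimal sketch of this preprocessing step is given below, assuming equation (1) as written above and assuming that the crop origin (x0, y0) and the preset size are known; these assumptions are illustrative only.

```python
import numpy as np

def preprocess(img, keypoints, x0, y0, preset_w=192, preset_h=128):
    """img: eye image; keypoints: [(x, y, visible), ...] in original-image coordinates."""
    crop = img[y0:y0 + preset_h, x0:x0 + preset_w]      # cropped eye region
    labels = []
    for x, y, visible in keypoints:
        if not visible:
            labels.append((0.0, 0.0, 0))                # invisible: coordinates recorded as 0
            continue
        cx, cy = x - x0, y - y0                         # coordinates inside the crop
        nx = 2.0 * cx / preset_w - 1.0                  # equation (1)
        ny = 2.0 * cy / preset_h - 1.0
        labels.append((nx, ny, 1))
    return crop, labels
```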
Subsequently, in step S230, an eye corner recognition model is constructed, and initial model parameters are set.
According to an embodiment of the present disclosure, an eye corner recognition model includes a convolution processing component and a classification component coupled to each other.
The convolution processing component extracts the eye corner key points from the preprocessed eye image by performing convolution and related operations, locates their position coordinates, and generates a first feature map indicating the positions of the eye corner key points. In other words, the first feature map contains the position coordinates of the identified eye corner key points. More specifically, two first feature maps are generated, indicating the position coordinates of the left eye corner point and the right eye corner point respectively. The first feature map may be, for example, a heatmap, although the disclosure is not limited thereto.
The 2 first feature maps are then input into the classification component, which performs operations such as normalization and correspondingly outputs 2 second feature maps. In a second feature map, each pixel value represents the probability that the pixel at the corresponding position is visible, and the sum of all pixel values is 1. Because the position coordinates of the eye corner key points are indicated in the first feature maps fed to the classification component, the probability that an identified eye corner key point is visible can be determined from the corresponding second feature map. More specifically, the 2 second feature maps indicate the probabilities that the left eye corner point and the right eye corner point are visible, respectively.
In one embodiment, the convolution processing component is designed around the structured information possessed by the human eye. Such structured information may include, for example, the eyelids and the iris: the curves formed by the upper and lower eyelids intersect at two points (i.e., the eye corner points), and the iris lies between the curves formed by the upper and lower eyelids. In the present disclosure this is referred to as the structural features of the eye image (i.e., the "features" described below).
The convolution processing component includes at least a plurality of processing stages, and attention modules are coupled between the processing stages. The plurality of processing stages are used for extracting the features of the pre-processed eye image, and the attention module is used for enhancing the features of the pre-processed eye image.
An example of an eye corner recognition model according to an embodiment of the present disclosure is shown below. It should be understood that the canthus identification model is only used as an example, and any canthus identification model constructed based on the description of the embodiments of the present disclosure is within the scope of the present disclosure.
Fig. 3 shows a schematic structural diagram of an eye corner identification model 300 according to an embodiment of the present disclosure. As shown in FIG. 3, the corner of the eye recognition model 300 includes a convolution processing component 310 and a classification component 320 coupled to each other. The pre-processed eye images are input to the eye corner recognition model 300 and processed by the classification component 320 to output a feature map.
As shown in fig. 3, the convolution processing component 310 further includes a convolution layer (Conv), a first processing Stage (Stage 1), an intermediate processing module, a second processing Stage (Stage 2), an intermediate processing module, a third processing Stage (Stage 3), an intermediate processing module, an Up sampling module (Up Sample), an intermediate processing module, and a convolution layer (Conv) coupled in sequence.
Wherein, the intermediate processing module comprises an attention module and a convolution module. The first processing Stage (Stage 1), the second processing Stage (Stage 2) and the third processing Stage (Stage 3) are used for extracting the features of the pre-processed eye image. And the attention module in the intermediate processing module is used for enhancing the characteristics of the preprocessed eye image.
According to one embodiment, when the eye corner recognition model 300 is constructed, the input size and the memory access cost must be limited in order to balance its accuracy against its inference speed. The main strategy in the structural design of the eye corner recognition model 300 is to reduce branching and to reduce the use of repeated features.
A specific network structure is shown in table 1. It should be noted that table 1 is merely an example, and the present disclosure is not limited to a specific network structure of the corner of the eye recognition model.
TABLE 1
| Layer name | Output size | Kernel size | Stride | Repetitions | Output channels |
| --- | --- | --- | --- | --- | --- |
| Input | 128×192 | | | | 3 |
| Conv1 | 64×96 | 3 | 2 | 1 | 32 |
| Stage1 | 32×48 | 3 | 2/1 | 4 | 64 |
| Attention | 32×48 | | | 1 | 64 |
| Conv2 | 32×48 | 3 | 1 | 1 | 64 |
| Stage2 | 16×24 | 3 | 2/1 | 6 | 128 |
| Attention | 16×24 | | | 1 | 128 |
| Conv3 | 16×24 | 3 | 1 | 1 | 128 |
| Stage3 | 8×12 | 3 | 2/1 | 4 | 256 |
| Attention | 8×12 | | | 1 | 256 |
| Conv4 | 8×12 | 3 | 1 | 1 | 256 |
| Up Sample | 16×24 | 3 | | 1 | 128 |
| Attention | 16×24 | | | 1 | 128 |
| Conv5 | 16×24 | 3 | 1 | 1 | 128 |
| Conv6 | 16×24 | 1 | 1 | 1 | 2 |
| Softmax | 16×24 | | | 1 | 2 |
The layer structure from Conv1 to Conv6 constitutes the convolution processing component 310, and the Softmax layer represents the classification component 320. Conv1 and Conv6 are the convolutional layers immediately preceding the first processing stage Stage1 and the classification module Softmax, respectively.
According to one embodiment, the intermediate processing modules in turn comprise Attention modules (Attention) and convolution modules (Conv 2, Conv3, Conv 4, Conv 5), as in table 1, Attention and Conv2 constitute the intermediate processing modules connecting Stage1 and Stage2, Attention and Conv3 constitute the intermediate processing modules connecting Stage2 and Stage3, and so on. The output size and the number of channels of the feature diagram processed by the intermediate processing module are not changed.
As shown in Table 1, in the convolution processing component 310, the eye corner key points of the preprocessed eye image are extracted through the convolution and related operations of each layer. Once the output size of the feature map has been reduced to 8 × 12 it is not reduced further; an upsampling operation is applied so that the feature map size becomes 16 × 24, and two feature maps are then obtained through Conv6 as the first feature maps, representing the left eye corner point and the right eye corner point respectively.
Then, the two first feature maps are input into the classification component 320, which predicts the probabilities that the extracted left and right eye corner points are visible and outputs two feature maps as the second feature maps, representing those probabilities respectively. According to one embodiment, a first feature map is passed through a Softmax function, so the resulting second feature map is a discrete probability distribution whose pixel values sum to 1. Whether the left eye corner point is visible can be judged from the probability value at its corresponding coordinate position; similarly, whether the right eye corner point is visible can be judged from the probability value at the corresponding coordinate position in the other second feature map.
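As a simple illustration of this classification step, the sketch below applies a Softmax over all pixels of one first feature map, producing a second feature map whose values form a discrete probability distribution summing to 1.

```python
import numpy as np

def spatial_softmax(first_map: np.ndarray) -> np.ndarray:
    """first_map: (h, w) first feature map for one eye corner point."""
    z = first_map - first_map.max()      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()                   # (h, w) second feature map, pixel values sum to 1
```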
As described above, in embodiments of the present disclosure the Attention module is designed around the structural information of the human eye in order to enhance particular features. FIG. 4 illustrates the processing of an attention module according to one embodiment of the present disclosure. As shown in FIG. 4, the input feature map F is first passed through an activation (e.g., ReLU) to obtain F̂; the purpose of this step is to normalize the feature map, the activation acting as a regularization function. F̂ is then fed into a Sigmoid function module, whose aim is to enhance the boundary information between the eyelid, the iris, and other regions. Finally, the output of the Sigmoid function module and F undergo a Hadamard (element-wise) product operation to produce the output feature map O.
The Sigmoid function module is expressed by equation (2), where δ denotes the Sigmoid function and Avg denotes taking the mean value of the feature map F̂.
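The following sketch illustrates the attention flow of FIG. 4 (ReLU activation, Sigmoid module, Hadamard product). Since equation (2) is only reproduced as an image, the exact argument of the Sigmoid is not known here; applying it to the deviation of F̂ from its mean Avg(F̂) is an assumption made purely for illustration.

```python
import numpy as np

def attention(F: np.ndarray) -> np.ndarray:
    F_hat = np.maximum(F, 0.0)                             # ReLU activation (regularization)
    A = 1.0 / (1.0 + np.exp(-(F_hat - F_hat.mean())))      # assumed form of the Sigmoid module
    return F * A                                           # Hadamard product -> output map O
```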
In addition, as can be seen from Table 1, each of the processing stages (Stage1, Stage2, and Stage3) has a module repetition count greater than 1, which indicates how many times its main module occurs. It should be understood that when the repetition count is 1, the module appears only once, i.e., the structure of that layer is not repeated.
According to the present embodiment, the main modules in each Stage come in the same two configurations, referred to for convenience of description as module A and module B. The first module in each Stage is a module B, and the following modules are modules A. Continuing with Table 1, the module repetition count of Stage1 and Stage3 is 4, i.e., each of Stage1 and Stage3 is formed by sequentially coupling one module B and three modules A. The module repetition count of Stage2 is 6, i.e., Stage2 is formed by sequentially coupling one module B and five modules A.
Fig. 5 and 6 show schematic diagrams of the main blocks of the corner of the eye recognition model 300. Fig. 5 shows a block a, and fig. 6 shows a block B. In the figure, h × w × c indicates the width w, height h, and channel c of the input feature map, and the number of input feature maps is omitted.
In FIG. 5, the channels of the input feature map undergo a channel split operation; half of the channels pass through a 1 × 1 convolution whose output has t × c/2 channels, where t is the channel expansion coefficient; a 3 × 3 convolution then restores them to c/2 channels; finally, the two branches are joined by a concatenation operation (concat). This process involves no downsampling.
In FIG. 6, s = 2 indicates that the feature map is downsampled, and t plays the same role as in FIG. 5. The two branches are directly concatenated (concat), and the input channels c are not split in this process; the purpose is to compensate for the feature loss caused by downsampling by increasing the number of channels of the feature map.
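A minimal PyTorch-style sketch of the two main modules is given below. Module A follows FIG. 5 (channel split, 1 × 1 expansion by t, 3 × 3 convolution back to c/2, concatenation). The internal branch layout of module B is not fully specified above, so the composition used here (each branch: 1 × 1 expansion followed by a stride-2 3 × 3 convolution) is an assumption; only its split-free, channel-doubling, downsampling behaviour is taken from the description.

```python
import torch
import torch.nn as nn

class ModuleA(nn.Module):
    """FIG. 5: no downsampling, input and output both have c channels."""
    def __init__(self, c: int, t: int = 2):
        super().__init__()
        half = c // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, t * half, kernel_size=1, bias=False),              # expand to t*c/2
            nn.BatchNorm2d(t * half), nn.ReLU(inplace=True),
            nn.Conv2d(t * half, half, kernel_size=3, padding=1, bias=False),   # restore to c/2
            nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)                  # channel split
        return torch.cat([x1, self.branch(x2)], dim=1)     # concat back to c channels

class ModuleB(nn.Module):
    """FIG. 6: stride-2 downsampling, no channel split; concatenation doubles the channels."""
    def __init__(self, c: int, t: int = 2):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(c, t * c, kernel_size=1, bias=False),
                nn.BatchNorm2d(t * c), nn.ReLU(inplace=True),
                nn.Conv2d(t * c, c, kernel_size=3, stride=2, padding=1, bias=False),  # s = 2
                nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            )
        self.b1, self.b2 = branch(), branch()

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x)], dim=1)  # c -> 2c channels, h and w halved
```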
It should be noted that, here, the structure of the corner of the eye recognition model according to an embodiment of the present disclosure is shown only as an example. The present disclosure is not so limited.
Subsequently, in step S240, the preprocessed eye image is input into the eye corner recognition model to recognize the eye corner key points therein, and the feature map containing the eye corner key points is output.
As described above, the pre-processed eye image is input to the eye corner recognition model, the eye corner key points of the pre-processed eye image are extracted by the convolution processing component, and the first feature map including the position coordinates of the eye corner key points is output. The first feature map is then input to a classification component, which predicts the probability that the extracted corner key points are visible, and outputs a second feature map containing the probability. According to the present embodiment, the second feature map is used as the feature map including the corner key points that is finally output.
Since the eye corner recognition model outputs a feature map, the size of that feature map has a large influence on the model's inference speed; therefore, to improve efficiency at deployment time, a small feature map is used in the last layer of the network. However, this also causes a loss of accuracy, mainly because the output feature map is much smaller than the network input. In the example of Table 1, the output feature map size is 16 × 24 while the input image size is 128 × 192. To mitigate the accuracy loss caused by using small feature maps, the output feature map can be converted into numerical coordinates, and the loss can then be computed on the numerical coordinates.
Therefore, in the subsequent step S250, the feature map is processed to generate numerical coordinates regarding the corner key points.
According to one embodiment, a feature map template having the same size as the feature map (i.e., the second feature map) is generated, i.e., the feature map template has a size of 16 × 24, and the pixel values corresponding to the coordinates in the feature map template are distributed in the interval [ -1,1] corresponding to the pre-processing of the labeled data in step S220. In one embodiment, for each feature map, a feature map template X corresponding to an abscissa and a feature map template Y corresponding to an ordinate are generated respectively. For example, the feature map template may be generated by formula (3), corresponding to the horizontal and vertical coordinates:
$$X_{ij} = \frac{2(j-1)}{w-1} - 1, \qquad Y_{ij} = \frac{2(i-1)}{h-1} - 1 \tag{3}$$

where $i$ and $j$ index the rows and columns of the second feature map, $X$ and $Y$ are the feature map templates for the horizontal and vertical coordinates respectively, the values of $X$ are identical within each column and increase from left to right, the values of $Y$ are identical within each row and increase from top to bottom, and $w$, $h$ are the width and height of the feature map template.
To further illustrate the computation of the feature map template, consider the following example. Assume a feature map is:
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.1 0.0
0.0 0.0 0.1 0.6 0.1
0.0 0.0 0.0 0.1 0.0
0.0 0.0 0.0 0.0 0.0
that is, in equation (3), j = 1.. w and i = 1.. h correspond to the feature map templates of the horizontal and vertical coordinates obtained by calculation according to equation (3) if w = h = 5. Wherein, the characteristic diagram template X is:
-1 -0.5 0 0.5 1
-1 -0.5 0 0.5 1
-1 -0.5 0 0.5 1
-1 -0.5 0 0.5 1
-1 -0.5 0 0.5 1
the characteristic diagram template Y is as follows:
-1 -1 -1 -1 -1
-0.5 -0.5 -0.5 -0.5 -0.5
0 0 0 0 0
0.5 0.5 0.5 0.5 0.5
1 1 1 1 1
and after calculating the feature map templates of the two second feature maps, performing dot product operation on the feature map templates and the second feature maps to generate numerical coordinates. More specifically, the two feature map templates and the corresponding second feature maps are respectively subjected to dot product operation to obtain two numerical values, and the two numerical values respectively correspond to the abscissa and the ordinate, namely the numerical coordinate of the key point of the canthus. The two second characteristic graphs are calculated as above, and then: the numerical coordinates of the left eye corner point and the numerical coordinates of the right eye corner point.
In one embodiment, the second feature maps are denoted Z1 and Z2, and each corresponds to feature map templates X and Y. The dot product of Z1 with X gives x, and the dot product of Z1 with Y gives y; (x, y) is then the numerical coordinate converted from that feature map. In other words, the two output second feature maps are converted into two coordinate pairs (x, y).
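The conversion can be sketched as follows: build the templates X and Y of equation (3) and take their dot products with the second feature map (a soft-argmax). The values match the 5 × 5 example above.

```python
import numpy as np

def make_templates(h: int, w: int):
    j = np.arange(1, w + 1)
    i = np.arange(1, h + 1)
    X = np.tile(2.0 * (j - 1) / (w - 1) - 1.0, (h, 1))               # equal values down each column
    Y = np.tile((2.0 * (i - 1) / (h - 1) - 1.0)[:, None], (1, w))    # equal values across each row
    return X, Y

def numerical_coords(Z: np.ndarray):
    """Z: (h, w) second feature map, a discrete probability distribution."""
    X, Y = make_templates(*Z.shape)
    x = float((Z * X).sum())      # dot product with X -> abscissa in [-1, 1]
    y = float((Z * Y).sum())      # dot product with Y -> ordinate in [-1, 1]
    return x, y
```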
Subsequently, in step S260, a loss value is determined based on the labeling data, the numerical coordinates and the feature map to update the model parameters of the corner of the eye recognition model, and when the loss value satisfies a predetermined condition, the corresponding corner of the eye recognition model is used as the finally generated corner of the eye recognition model.
In one embodiment, the loss value is divided into two parts, determined by equation (4):
$$\mathrm{Loss} = L_d(\hat{p}) + t \cdot \lambda \cdot L_f(\hat{F}) \tag{4}$$

where $\hat{F}$ denotes the second feature map, $\hat{p}$ denotes the numerical coordinates, $t$ is the attribute label, the function $L_d$ is the loss computed on the numerical coordinates, the function $L_f$ is the loss computed on the second feature map, and $\lambda$ is a hyperparameter that balances the two loss terms. The $L_f$ term also supervises the numerical coordinates, making the final result more accurate.

In one embodiment, the $L_d$ term can be calculated using the Euclidean distance or a similar measure, and the $L_f$ term can be calculated using a function such as the KL divergence; the disclosure is of course not limited to these.

In addition, when the loss value is actually computed, if the attribute label in the annotation data is 0, only the loss on the numerical coordinates is calculated, i.e., only the $L_d$ term of equation (4); if the attribute label is 1, the loss value is calculated according to the full equation (4), i.e., both the loss on the numerical coordinates and the loss on the second feature map.
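A minimal sketch of this two-part loss is shown below, assuming the Euclidean distance for the L_d term and the KL divergence for the L_f term as suggested above. The target distribution gt_map used for the KL term is an assumption (the disclosure does not spell out how it is built), and lam stands for the balancing hyperparameter λ.

```python
import numpy as np

def corner_loss(pred_xy, gt_xy, pred_map, gt_map, t, lam=0.1, eps=1e-8):
    """t: attribute label (0 = corner not visible, 1 = visible)."""
    diff = np.asarray(pred_xy) - np.asarray(gt_xy)
    L_d = float(np.sqrt((diff ** 2).sum()))                # Euclidean loss on numerical coords
    if t == 0:
        return L_d                                         # invisible corner: coordinate loss only
    kl = float((gt_map * (np.log(gt_map + eps) - np.log(pred_map + eps))).sum())  # KL(gt || pred)
    return L_d + t * lam * kl                              # equation (4)
```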
According to the method 200 of the present disclosure, when the eye corner recognition model is trained, the annotation data is first standardized through preprocessing. A convolutional neural network is then designed according to the structural characteristics of the human eye to extract eye feature information, and the constructed eye corner recognition model ensures detection accuracy while also taking network latency into account.
In addition, the loss function is designed so that a loss is computed on the feature map output by the eye corner recognition model, which makes the regression result more accurate. Specifically, on the one hand, converting the feature map into numerical coordinates when computing the loss effectively mitigates the accuracy loss caused by the small feature map size; on the other hand, learning the feature map distribution through the KL divergence supervises the numerical-coordinate loss term, so the loss value is computed more accurately.
Fig. 7 shows a flowchart of a method 700 of identifying an eye corner in an eye image according to an embodiment of the present disclosure. The method 700 may be implemented on the basis of the method 200. As shown in fig. 7, the method 700 begins at step S710.
In step S710, the eye image to be recognized is input into the eye corner recognition model, and the feature map including the key points of the eye corner is output after processing.
In one embodiment, two feature maps (i.e., the second feature map mentioned above) are output via the canthus recognition model, wherein one feature map indicates coordinates of a left eye corner point and a probability that the left eye corner point is visible, and the other feature map indicates coordinates of a right eye corner point and a probability that the right eye corner point is visible.
Wherein, the eye corner recognition model can be generated by training of the method 200. Regarding the processing flow of the corner of the eye recognition model, reference may be made to the related description in the method 200, and details are not repeated here.
Furthermore, according to embodiments of the present disclosure, the eye image to be recognized may be preprocessed before it is input into the eye corner recognition model. The preprocessing may refer to the preprocessing of the eye image in method 200: an image corresponding to the human eye region is cropped from the eye image according to a preset size (for example, the input image size required by the eye corner recognition model) and used as the preprocessed eye image. It should be appreciated that data enhancement processing of the eye image is not required here.
Subsequently, in step S720, the coordinates of the corner of the eye key points in the feature map are converted, and converted coordinates are generated.
As described above, the values in the feature map output by the eye corner recognition model (including the coordinates of the recognized eye corner key points) are distributed in [-1, 1]; here they are converted to coordinates on the original image. According to an embodiment, referring to the preprocessing of the annotation data in method 200, the coordinates of the left eye corner point and of the right eye corner point are each converted using the inverse of equation (1) to obtain the corresponding converted coordinates. Specifically, formula (5) is:

$$x' = \frac{(\hat{x} + 1)\, w}{2}, \qquad y' = \frac{(\hat{y} + 1)\, h}{2} \tag{5}$$

where $x'$, $y'$ are the converted coordinates, $\hat{x}$, $\hat{y}$ are the coordinates of the eye corner key point in the output feature map, and $w$, $h$ are the dimensions of the feature map.
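A small sketch of this decoding and of the range check of step S730 is given below; which width and height apply follows the statement above.

```python
def decode_and_check(x_hat, y_hat, w, h):
    """x_hat, y_hat: predicted key-point coordinates in [-1, 1]."""
    x = (x_hat + 1.0) * w / 2.0          # equation (5)
    y = (y_hat + 1.0) * h / 2.0
    inside = (0.0 <= x < w) and (0.0 <= y < h)
    return (x, y), inside                # not inside -> the eye corner is judged not visible
```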
According to embodiments of the present disclosure, whether an eye corner key point is visible needs to be determined jointly from the position of its numerical coordinates (i.e., the converted coordinates) and from a confidence derived from the distribution of the feature map. The specific determination process is described in steps S730 and S740 below.
In the following step S730, it is determined whether the converted coordinates are within the coordinate range of the eye image to be processed, and if the converted coordinates are not within the coordinate range of the eye image, it is determined that the eye corner in the eye image is invisible.
Specifically, if the converted coordinates of the left eye corner point are not in the coordinate range of the eye image, the left eye corner point is determined to be invisible; and if the converted coordinates of the right eye corner points are not in the coordinate range of the eye image, confirming that the right eye corner points are invisible.
Subsequently, in step S740, if the transformed coordinates are within the coordinate range of the eye image, it is determined whether the canthus in the eye image is visible by calculating the confidence of the canthus key point.
It should be noted that the processing manner of the two feature maps is the same, so that the present disclosure does not specifically distinguish the feature map corresponding to the left-eye corner point or the right-eye corner point, and the feature maps are collectively referred to as feature maps for description. It should be understood that, when the converted coordinates of the left eye corner point and/or the converted coordinates of the right eye corner point are within the coordinate range of the eye image, the following steps are respectively performed on the respective corresponding feature maps to correspondingly calculate the confidence of the left eye corner point and/or the right eye corner point, so as to confirm whether the left eye corner point and/or the right eye corner point is visible.
In one embodiment, the confidence of the corner key points is calculated as follows.
In the first step, the variance is calculated based on the feature map and the feature map template. The feature map template is the same as the one used in step S250 and is not described again here. For each feature map, a Hadamard product is taken between the feature map and each of the feature map templates X and Y, giving two intermediate images; the pixel values of each intermediate image are then summed, and the resulting sums are recorded as mx and my. Because the feature map has been processed by Softmax and can be treated as the probability distribution of a discrete coordinate variable, the sum mx obtained from the Hadamard product of the feature map and template X represents the mean in the x direction; likewise, my represents the mean in the y direction.
Then, the variances corresponding to mx and my are calculated according to equations (6) and (7):

$$\mathrm{var1} = \sum_{i,j} M_{ij}\,\bigl(X_{ij} - m_x\bigr)^2 \tag{6}$$

$$\mathrm{var2} = \sum_{i,j} M_{ij}\,\bigl(Y_{ij} - m_y\bigr)^2 \tag{7}$$

where $M_{ij}$ is the value of each pixel in the feature map, $X$ and $Y$ are the feature map templates, and var1 and var2 denote the variances.
Similarly, the above operations are performed on the two feature maps, which correspond to the two variance values, respectively.
Second, for each feature map, the confidences of the eye corner key point, denoted conf1 and conf2, are calculated from the two variances var1 and var2 according to equations (8) and (9). Two hyperparameters appear in these equations; one of them is required to be greater than 0 to prevent the denominator from becoming zero, and the specific values can be selected according to the specific situation.
Then, the two confidences conf1 and conf2 are averaged to obtain the confidence corresponding to the feature map.
Thus, when the confidence is greater than the threshold, the corner of the eye in the eye image is confirmed to be visible. When the confidence is not greater than the threshold, it is confirmed that the corner of the eye in the eye image is not visible.
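The confidence check can be sketched as follows. The means mx, my and the variances of equations (6) and (7) follow directly from the text; the exact form of equations (8) and (9) is only available as images, so the expression a / (var + b) used here, with b > 0 keeping the denominator non-zero, is an assumption, as are the hyperparameter and threshold values.

```python
import numpy as np

def corner_visible(Z, X, Y, a=1.0, b=1e-3, threshold=0.5):
    """Z: (h, w) second feature map; X, Y: feature map templates."""
    mx = float((Z * X).sum())                      # mean in the x direction
    my = float((Z * Y).sum())                      # mean in the y direction
    var1 = float((Z * (X - mx) ** 2).sum())        # equation (6)
    var2 = float((Z * (Y - my) ** 2).sum())        # equation (7)
    conf1 = a / (var1 + b)                         # assumed form of equation (8)
    conf2 = a / (var2 + b)                         # assumed form of equation (9)
    conf = (conf1 + conf2) / 2.0                   # average the two confidences
    return conf > threshold, conf                  # above threshold -> eye corner visible
```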
According to the method 700 of identifying the eye corner in an eye image of the present disclosure, the eye corners in the eye image are predicted by the eye corner recognition model. The prediction result includes the position coordinates of the left and right eye corner points and the probability values that they are visible. A series of data decoding operations is then performed on the prediction result to obtain the final recognition result.
Based on the scheme of the present disclosure, when an eye image or iris image is acquired, whether the eye corner in the current image frame is visible is analyzed, so that images in which the eye is occluded or the eye corner is not in the frame can be identified in advance and filtered out, leaving images that better meet the standard for iris recognition. This helps subsequent tasks. For example, when determining whether an iris belongs to the left eye or the right eye, the presence or absence of the eye corner has a large influence on the determination result. As another example, in an iris recognition system, after analysing the visibility of the eye corners, the iris image can be rotated according to the two eye corner points so that they lie on the same horizontal line, which speeds up subsequent iris recognition. The scheme therefore has important implications for fields such as iris recognition and VR.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present disclosure, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the disclosure.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the methods of the present disclosure according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with examples of the present disclosure. The required structure for constructing such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the present disclosure as described herein, and any descriptions above of specific languages are provided for disclosure of preferred embodiments of the present disclosure.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the disclosure and form different embodiments. For example, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purposes of this disclosure.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the disclosure has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the disclosure as described herein. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the disclosed subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present disclosure is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.

Claims (8)

1. A method of generating an eye corner recognition model, comprising the steps of:
labeling the canthus key points in the eye image to obtain labeling data, wherein the labeling data comprises: the method comprises the following steps of marking coordinates of an canthus key point and an attribute label indicating whether the canthus key point is visible or not, wherein the attribute label is 1 when the canthus key point is visible, and the attribute label is 0 when the canthus key point is invisible;
respectively preprocessing the eye image and the annotation data to obtain a preprocessed eye image and preprocessed annotation data;
constructing an eye corner identification model and setting initial model parameters;
inputting the preprocessed eye image into the eye corner recognition model to recognize the eye corner key points therein, and outputting a feature map containing the eye corner key points;
processing the feature map to generate numerical coordinates of key points of the canthus;
and determining a loss value based on the labeling data, the numerical coordinate and the feature map to update model parameters of the eye corner identification model, and taking the corresponding eye corner identification model as a finally generated eye corner identification model when the loss value meets a preset condition, wherein if the attribute label is 0, only the loss value of the numerical coordinate is calculated, and if the attribute label is 1, the loss value of the numerical coordinate and the loss value of the feature map are calculated.
2. The method of claim 1, wherein,
the corner of the eye recognition model comprises a convolution processing component and a classification component which are coupled with each other;
the convolution processing component comprises a plurality of processing stages, and attention modules are coupled among the processing stages,
wherein the plurality of processing stages are adapted to extract features of the pre-processed eye image and the attention module is adapted to enhance the features of the pre-processed eye image.
3. The method of claim 2, wherein the step of inputting the pre-processed eye image into the eye corner recognition model to recognize the eye corner key points therein and outputting the feature map containing the eye corner key points comprises:
extracting the canthus key points of the preprocessed eye image through the convolution processing component, and outputting a first feature map containing position coordinates of the canthus key points;
predicting, by the classification component, a probability that the extracted corner of the eye keypoint is visible, and outputting a second feature map comprising the probability.
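A schematic sketch of such a two-headed architecture is shown below. The claims do not fix the backbone, the number of processing stages, or the type of attention module, so the depth, channel counts, and squeeze-and-excitation-style attention used here are illustrative assumptions; only the overall split into a convolution processing component with attention coupled between stages and a classification component follows claims 2 and 3.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style module standing in for the attention modules."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        weights = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * weights          # re-weight (enhance) the stage features

class CornerRecognitionModel(nn.Module):
    """Schematic eye-corner model: staged convolutions with attention, plus two heads."""
    def __init__(self, stages: int = 3, channels: int = 32):
        super().__init__()
        blocks, in_channels = [], 1                     # single-channel eye image assumed
        for _ in range(stages):
            blocks += [nn.Conv2d(in_channels, channels, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True),
                       ChannelAttention(channels)]      # attention between processing stages
            in_channels = channels
        self.convolution_component = nn.Sequential(*blocks)
        # First feature map: per-corner position heatmaps (two eye corners).
        self.position_head = nn.Conv2d(channels, 2, kernel_size=1)
        # Second feature map: per-pixel visibility probabilities for the two corners.
        self.classification_component = nn.Sequential(
            nn.Conv2d(channels, 2, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        features = self.convolution_component(x)
        return self.position_head(features), self.classification_component(features)

For instance, CornerRecognitionModel()(torch.randn(1, 1, 128, 128)) returns two 2-channel maps at 1/8 of the input resolution.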
4. The method of claim 3, wherein the processing the feature map to generate numerical coordinates for the corner of the eye keypoints comprises:
generating a feature map template with the same size as the second feature map, wherein the values of the feature map template are distributed in the interval [-1, 1];
and performing dot product operation on the feature map template and the second feature map to generate the numerical value coordinate.
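Claim 4 can be read as a soft-argmax-style decoding: a template whose values run linearly over [-1, 1] is multiplied elementwise with the second feature map and summed to give a numerical coordinate. The NumPy sketch below illustrates this reading; the normalization of the feature map into a probability-like map is an assumption added so that the dot product yields a coordinate in [-1, 1], and is not spelled out in the claim.

import numpy as np

def decode_numerical_coords(feature_map: np.ndarray):
    """Soft-argmax-style decoding of one H x W corner feature map."""
    h, w = feature_map.shape
    # Coordinate templates the same size as the feature map, values in [-1, 1].
    template_x = np.tile(np.linspace(-1.0, 1.0, w), (h, 1))
    template_y = np.tile(np.linspace(-1.0, 1.0, h).reshape(h, 1), (1, w))
    # Normalization assumption: treat the feature map as a spatial distribution.
    prob = feature_map / (feature_map.sum() + 1e-8)
    x = float((prob * template_x).sum())   # dot product with template X
    y = float((prob * template_y).sum())   # dot product with template Y
    return x, y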
5. The method of claim 1, wherein the loss value is determined by:
Loss = L_d(d) + t · λ · L_f(f)
wherein f denotes the second feature map, d denotes the numerical coordinates, t is the attribute label, the function L_d refers to the loss value calculated for the numerical coordinates, the function L_f refers to the loss value calculated for the second feature map, and λ is a hyperparameter used to balance the two part loss values.
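Read together with claim 1, this loss reduces to the numerical-coordinate term when the attribute label t is 0 and adds the weighted feature-map term when t is 1. The PyTorch-style sketch below illustrates that gating; the choice of L1 loss for the coordinates, mean-squared error for the feature map, and the default value of lam are assumptions, since the claim only fixes the overall structure.

import torch.nn.functional as F

def corner_loss(pred_coords, gt_coords, pred_map, gt_map, t, lam=0.1):
    """Total loss gated by the visibility attribute label t (0 or 1)."""
    loss_d = F.l1_loss(pred_coords, gt_coords)   # numerical-coordinate loss L_d
    loss_f = F.mse_loss(pred_map, gt_map)        # second-feature-map loss L_f
    return loss_d + t * lam * loss_f             # t = 0 keeps only L_d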
6. A method of identifying an eye corner in an eye image, comprising the steps of:
inputting the eye image into an eye corner recognition model, and outputting a feature map containing key points of the eye corner after processing;
converting the coordinates of the key points of the canthus in the feature map to generate converted coordinates;
if the converted coordinates are not in the coordinate range of the eye image, confirming that the eye corner in the eye image is invisible;
if the converted coordinates are in the coordinate range of the eye image, determining whether the canthus in the eye image is visible or not by calculating the confidence degree of the canthus key points,
wherein the corner of the eye recognition model is generated by performing the method of any one of claims 1-5,
wherein the coordinates of the eye corner key points are converted by a coordinate conversion formula in which (x', y') are the converted coordinates, (x, y) are the coordinates of the corner key points in the output feature map, and W and H are the dimensions of the feature map.
7. The method of claim 6, wherein the step of confirming whether the corner of the eye in the eye image is visible by calculating the confidence of the corner key points comprises:
calculating a variance based on the feature map and a feature map template, wherein the feature map template is generated according to the coordinates of the feature map, and for each feature map, the feature map template X corresponding to the abscissa and the feature map template Y corresponding to the ordinate are provided;
determining a confidence level of the corner of the eye keypoints based on the variance;
if the confidence is greater than a threshold value, confirming that the canthus in the eye image is visible;
if the confidence is not greater than a threshold, confirming that the corner of the eye in the eye image is not visible,
wherein the step of calculating the variance comprises: performing a Hadamard product operation on the feature map with the feature map template X and with the feature map template Y, respectively, to obtain two intermediate operation images; summing the pixel values of each intermediate operation image to obtain corresponding sum values, recorded as mx and my; and calculating the variances of the feature map with respect to mx and my, respectively.
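A minimal NumPy sketch of this variance step follows. The exact variance definition is not reproduced in the text, so normalizing the feature map and taking second moments of the template coordinates about mx and my is an assumption made for illustration; the Hadamard products with templates X and Y and the sums mx and my follow the claim.

import numpy as np

def corner_variances(feature_map: np.ndarray,
                     template_x: np.ndarray,
                     template_y: np.ndarray):
    """Spatial variances of a corner feature map about its soft means mx and my."""
    prob = feature_map / (feature_map.sum() + 1e-8)   # normalization assumption
    mx = (prob * template_x).sum()                    # Hadamard product with X, then sum
    my = (prob * template_y).sum()                    # Hadamard product with Y, then sum
    var1 = (prob * (template_x - mx) ** 2).sum()      # spread along the x template
    var2 = (prob * (template_y - my) ** 2).sum()      # spread along the y template
    return float(var1), float(var2)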
8. A computing device, comprising:
at least one processor and a memory storing program instructions;
the program instructions, when read and executed by the processor, cause the computing device to perform the method of any of claims 1-5, and/or the method of claim 6 or 7.
CN202111147584.1A 2021-09-29 2021-09-29 Method for generating canthus recognition model and method for recognizing canthus in eye image Active CN113591815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111147584.1A CN113591815B (en) 2021-09-29 2021-09-29 Method for generating canthus recognition model and method for recognizing canthus in eye image

Publications (2)

Publication Number Publication Date
CN113591815A CN113591815A (en) 2021-11-02
CN113591815B (en) 2021-12-21

Family

ID=78242530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111147584.1A Active CN113591815B (en) 2021-09-29 2021-09-29 Method for generating canthus recognition model and method for recognizing canthus in eye image

Country Status (1)

Country Link
CN (1) CN113591815B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781718A (en) * 2019-08-28 2020-02-11 浙江零跑科技有限公司 Cab infrared vision system and driver attention analysis method
CN110956071A (en) * 2019-06-21 2020-04-03 初速度(苏州)科技有限公司 Eye key point labeling and detection model training method and device
CN111476151A (en) * 2020-04-03 2020-07-31 广州市百果园信息技术有限公司 Eyeball detection method, device, equipment and storage medium
WO2020187705A1 (en) * 2019-03-15 2020-09-24 Retinai Medical Ag Feature point detection

Also Published As

Publication number Publication date
CN113591815A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
US11527055B2 (en) Feature density object classification, systems and methods
US11164027B2 (en) Deep learning based license plate identification method, device, equipment, and storage medium
US11151363B2 (en) Expression recognition method, apparatus, electronic device, and storage medium
JP6397986B2 (en) Image object region recognition method and apparatus
CN111274977B (en) Multitasking convolutional neural network model, using method, device and storage medium
EP1918850A2 (en) Method and apparatus for detecting faces in digital images
EP1835460A1 (en) Image processing system, learning device and method, and program
US20120114250A1 (en) Method and system for detecting multi-view human face
El Bahi et al. Text recognition in document images obtained by a smartphone based on deep convolutional and recurrent neural network
JP2009211178A (en) Image processing apparatus, image processing method, program and storage medium
US11354883B2 (en) Image processing method and apparatus, and electronic device
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
US8027978B2 (en) Image search method, apparatus, and program
CN111401312A (en) PDF drawing character recognition method, system and equipment
Makhmudov et al. Improvement of the end-to-end scene text recognition method for “text-to-speech” conversion
CN114648756B (en) Book character recognition and reading method and system based on pointing vector
Mammeri et al. Road-sign text recognition architecture for intelligent transportation systems
CN110232381B (en) License plate segmentation method, license plate segmentation device, computer equipment and computer readable storage medium
CN114514555A (en) Similar area detection device, similar area detection method, and program
CN112801067B (en) Method for detecting iris light spot and computing equipment
CN112200191B (en) Image processing method, image processing device, computing equipment and medium
JP7121132B2 (en) Image processing method, apparatus and electronic equipment
CN113591815B (en) Method for generating canthus recognition model and method for recognizing canthus in eye image
CN116434071A (en) Determination method, determination device, equipment and medium for normalized building mask
US20230368576A1 (en) Image processing apparatus, image processing method, and non-transitory storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant