NZ788647A - Neural network for eye image segmentation and image quality estimation - Google Patents
Abstract
A method for determining eye contours in a semantically segmented eye image, the method is under control of a hardware processor and comprises: receiving a semantically segmented eye image of an eye image comprising a plurality of pixels, wherein a pixel of the semantically segmented eye image has a color value, wherein the color value of the pixel of the semantically segmented eye image is a first color value, a second color value, a third color value, or a fourth color value, wherein the first color value corresponds to a background of the eye image, wherein the second color value corresponds to a sclera of the eye in the eye image, wherein the third color value corresponds to an iris of the eye in the eye image, and wherein the fourth color value corresponds to a pupil of the eye in the eye image; determining a pupil contour using the semantically segmented eye image; and determining an iris contour using the semantically segmented eye image, wherein, the pupil contour is determined using a first binary image created based on the semantically segmented eye image, wherein a color value of a first binary image pixel of the first binary image is the fourth color value or the third color value, and/or the iris contour is determined using a second binary image created based on the semantically segmented eye image, wherein a color value of a second binary image pixel of the second binary image is the third color value or the second color value.
Description
MLEAP.056WO PATENT
NEURAL NETWORK FOR EYE IMAGE SEGMENTATION AND IMAGE
QUALITY ESTIMATION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of priority to Russian Patent
Application Number 2016138608, filed September 29, 2016, entitled NEURAL NETWORK
FOR EYE IMAGE SEGMENTATION AND IMAGE QUALITY ESTIMATION, which is
hereby incorporated by reference herein in its entirety.
[0001A] This application is a divisional of New Zealand Patent Application No.
751997, the entire content of which is incorporated herein by reference.
BACKGROUND
Field
The present disclosure relates generally to systems and methods for eye
image segmentation and more particularly to using a convolutional neural network for both
eye image segmentation and image quality estimation.
Description of the Related Art
In the field of personal biometric identification, one of the most effective
known methods is to use the naturally occurring patterns in the human eye, predominantly
the iris or the retina. In both the iris and the retina, patterns of color, either from the fibers of
the stroma in the case of the iris or from the patterns of blood vessels in the case of the retina,
are used for personal biometric identification. In either case, these patterns are generated
epigenetically by random events in the morphogenesis of this tissue; this means that they will
be distinct for even genetically identical (monozygotic) twins.
A conventional iris code is a bit string extracted from an image of the iris.
To compute the iris code, an eye image is segmented to separate the iris from the pupil and
sclera, the segmented eye image is mapped into polar or pseudo-polar coordinates, and phase
information is extracted using complex-valued two-dimensional wavelets (e.g., Gabor or
Haar). A typical iris code is a bit string based on the signs of the wavelet convolutions and
has 2048 bits. The iris code may be accompanied by a mask with an equal number of bits
that signify whether an analyzed region was occluded by eyelids, eyelashes, specular
reflections, or corrupted by noise. Use of such an iris code is the standard for many common
iris-based biometric tasks such as identification of passengers from passport data.
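The wavelet-and-sign construction described above can be illustrated with a toy sketch. This is not the patented method or a production iris encoder; the single-frequency complex carriers standing in for Gabor wavelets, and the row-wise projection, are assumptions chosen only to show how the signs of complex wavelet responses become bits (the conventional 2048-bit length corresponds to specific filter and sampling counts not given here).

```python
import numpy as np

def iris_code(polar_iris, n_scales=4):
    """Toy iris-code sketch: per row of the normalized polar iris image,
    take complex responses at a few frequencies and keep only the signs
    of the real and imaginary parts (two bits per response)."""
    rows, cols = polar_iris.shape
    bits = []
    for s in range(n_scales):
        x = np.arange(cols)
        # crude complex carrier standing in for a 2-D Gabor wavelet
        carrier = np.exp(2j * np.pi * (s + 1) * x / cols)
        resp = polar_iris @ carrier  # one complex response per row
        bits.extend(int(v.real >= 0) for v in resp)
        bits.extend(int(v.imag >= 0) for v in resp)
    return bits
```

An accompanying mask of equal length would flag bits computed over occluded or noisy regions, as the text describes.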
SUMMARY
The process of segmenting an eye image to separate the iris from the pupil
and sclera has many challenges.
In one aspect, a method for eye image segmentation and image quality
estimation is disclosed. The method is under control of a hardware processor and comprises:
receiving an eye image; processing the eye image using a convolution neural network to
generate a segmentation of the eye image; and processing the eye image using the
convolution neural network to generate a quality estimation of the eye image, wherein the
convolution neural network comprises a segmentation tower and a quality estimation tower,
wherein the segmentation tower comprises segmentation layers and shared layers, wherein
the quality estimation tower comprises quality estimation layers and the shared layers,
wherein a first output layer of the shared layers is connected to a first input layer of the
segmentation tower and a second input layer of the segmentation tower, wherein the first
output layer of the shared layers is connected to an input layer of the quality estimation layer,
and wherein receiving the eye image comprises receiving the eye image by an input layer of
the shared layers.
In another aspect, a method for eye image segmentation and image quality
estimation is disclosed. The method is under control of a hardware processor and comprises:
receiving an eye image; processing the eye image using a convolution neural network to
generate a segmentation of the eye image; and processing the eye image using the
convolution neural network to generate a quality estimation of the eye image.
In yet another aspect, a method for training a convolution neural network
for eye image segmentation and image quality estimation is disclosed. The method is under
control of a hardware processor and comprises: obtaining a training set of eye images;
providing a convolutional neural network with the training set of eye images; and training the
convolutional neural network with the training set of eye images, wherein the convolution
neural network comprises a segmentation tower and a quality estimation tower, wherein the
segmentation tower comprises segmentation layers and shared layers, wherein the quality
estimation tower comprises quality estimation layers and the shared layers, wherein an output
layer of the shared layers is connected to a first input layer of the segmentation tower and a
second input layer of the segmentation tower, and wherein the output layer of the shared
layers is connected to an input layer of the quality estimation layer.
In a further aspect, a method for determining eye contours in a
semantically segmented eye image is disclosed. The method is under control of a hardware
processor and comprises: receiving a semantically segmented eye image of an eye image
comprising a plurality of pixels, wherein a pixel of the semantically segmented eye image
has a color value, wherein the color value of the pixel of the semantically segmented eye
image is a first color value, a second color value, a third color value, and a fourth color value,
wherein the first color value corresponds to a background of the eye image, wherein the
second color value corresponds to a sclera of the eye in the eye image, wherein the third
color value corresponds to an iris of the eye in the eye image, and wherein the fourth color
value corresponds to a pupil of the eye in the eye image; determining a pupil contour using
the semantically segmented eye image; determining an iris contour using the semantically
segmented eye image; and determining a mask for an irrelevant area in the semantically
segmented eye image.
In another aspect, a method for determining eye contours in a semantically
segmented eye image is disclosed. The method is under control of a hardware processor and
comprises: receiving a semantically segmented eye image of an eye image; determining a
pupil contour of an eye in the eye image using the semantically segmented eye image;
determining an iris contour of the eye in the eye image using the semantically segmented eye
image; and determining a mask for an irrelevant area in the eye image.
[0010A] In one broad form, the present invention seeks to provide a method for
determining eye contours in a semantically segmented eye image, the method is under
control of a hardware processor and comprises: receiving a semantically segmented eye
image of an eye image comprising a plurality of pixels, wherein a pixel of the semantically
segmented eye image has a color value, wherein the color value of the pixel of the
semantically segmented eye image is a first color value, a second color value, a third color
value, or a fourth color value, wherein the first color value corresponds to a background of
the eye image, wherein the second color value corresponds to a sclera of the eye in the eye
image, wherein the third color value corresponds to an iris of the eye in the eye image, and
wherein the fourth color value corresponds to a pupil of the eye in the eye image;
determining a pupil contour using the semantically segmented eye image; and determining an
iris contour using the semantically segmented eye image, wherein, the pupil contour is
determined using a first binary image created based on the semantically segmented eye
image, wherein a color value of a first binary image pixel of the first binary image is the
fourth color value or the third color value, and/or the iris contour is determined using a
second binary image created based on the semantically segmented eye image, wherein a
color value of a second binary image pixel of the second binary image is the third color value
or the second color value.
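The pairing of binary images with eye boundaries can be sketched as follows. The numeric color values, and the reading that the first binary image isolates the pupillary boundary while the second isolates the limbic boundary, are assumptions for illustration; the claims fix only which two colors each binary image may take.

```python
import numpy as np

# Illustrative (hypothetical) color values, ordered as in the claims:
# background > sclera > iris > pupil. The actual values are unspecified.
BACKGROUND, SCLERA, IRIS, PUPIL = 4, 3, 2, 1

def binary_images(seg):
    """Build the two binary images used for contour fitting.

    One plausible reading: the first binary image separates the pupil from
    everything else (pixels take the pupil or iris color), and the second
    separates iris-plus-pupil from sclera and background (pixels take the
    iris or sclera color)."""
    first = np.where(seg == PUPIL, PUPIL, IRIS)    # pupillary-boundary image
    second = np.where(seg <= IRIS, IRIS, SCLERA)   # limbic-boundary image
    return first, second
```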
[0010B] In one embodiment, the first color value is greater than the second color
value, wherein the second color value is greater than the third color value, and wherein the
third color value is greater than the fourth color value.
[0010C] In one embodiment, determining the pupil contour using the semantically
segmented eye image comprises: determining a pupil contour border; removing a plurality of
pixels from the pupil contour border; and determining the pupil contour as an ellipse from
remaining pixels of the pupil contour border.
[0010D] In one embodiment, determining a pupil contour border further comprises:
determining contours in the first binary image; and selecting a longest contour of the
determined contours in the first binary image as a pupil contour border.
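A minimal sketch of the border-determination step, under stated assumptions: a full implementation would trace closed contours (e.g. with a Suzuki-Abe style tracer) and select the longest one, as the text requires. This sketch instead keeps every pupil-valued pixel with at least one non-pupil 4-neighbour, which approximates the pupil contour border when the pupil is a single blob.

```python
import numpy as np

def pupil_contour_border(first_binary, pupil_value):
    """Collect candidate pupil-contour border pixels from the first binary
    image: pupil pixels whose up/down/left/right neighbourhood is not
    entirely pupil. Image-edge pixels are ignored for simplicity."""
    p = (first_binary == pupil_value)
    border = np.zeros_like(p)
    inner = p[1:-1, 1:-1]
    # True where all four 4-neighbours are pupil pixels
    neigh_all = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    border[1:-1, 1:-1] = inner & ~neigh_all
    return np.argwhere(border)  # (row, col) border coordinates
```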
[0010E] In one embodiment, the color value of the first binary image pixel of the
first binary image is the fourth color value if a corresponding pixel in the semantically
segmented eye image has a value greater than or equal to the fourth color value, and the third
color value if the corresponding pixel in the semantically segmented eye image has a value
not greater than or equal to the fourth color value.
[0010F] In one embodiment, the method further comprises: determining a pupil
contour points bounding box enclosing the pupil contour border; computing a pupil points
area size as a diagonal of the pupil contours points bounding box; and determining a pupil
contour threshold based on the pupil points area size.
[0010G] In one embodiment, the pupil contour threshold is a fraction multiplied by
the pupil points area size, and wherein the fraction is in a range from 0.02 to 0.20.
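The bounding-box threshold of [0010F]-[0010G] reduces to a few lines; the default fraction of 0.10 below is an arbitrary pick from the claimed 0.02-0.20 range.

```python
import math

def contour_threshold(points, fraction=0.10):
    """Threshold = fraction of the diagonal of the bounding box enclosing
    the contour border points, per [0010F]-[0010G]."""
    rows = [r for r, c in points]
    cols = [c for r, c in points]
    diag = math.hypot(max(rows) - min(rows), max(cols) - min(cols))
    return fraction * diag
```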
[0010H] In one embodiment, the method further comprises creating a third binary
image comprising a plurality of pixels, wherein a color value of a third binary image pixel of
the plurality of pixels of the third binary image is the third color value or the second color
value.
[0010I] In one embodiment, the color value of the third binary image pixel of the
plurality of pixels of the third binary image is the third color value if a corresponding pixel in
the semantically segmented eye image has a value greater than or equal to the third color
value, and the second color value if the corresponding pixel in the semantically segmented
eye image has a value not greater than or equal to the third color value.
[0010J] In one embodiment, removing a plurality of pixels from the pupil contour
border comprises, for a pupil contour border pixel of the pupil contour border: determining a
closest pixel in the third binary image that has a color value of the second color value and
that is closest to the pupil contour border pixel; determining a distance between the pupil
contour border pixel and the closest pixel in the third binary image; and removing the pupil
contour border pixel from the pupil contour border if the distance between the pupil contour
border pixel and the closest pixel in the third binary image is smaller than a pupil contour
threshold.
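The pruning step of [0010J] can be sketched by brute force; a distance transform over the third binary image would be the usual fast implementation. `reference_pts` here stands for the pixels having the second color value (e.g. sclera), an assumed naming.

```python
import math

def prune_border(border_pts, reference_pts, threshold):
    """Drop border pixels lying closer than `threshold` to the nearest
    reference (second-color) pixel; such pixels typically sit on an eyelid
    occlusion rather than on the true boundary."""
    kept = []
    for p in border_pts:
        d = min(math.dist(p, q) for q in reference_pts)
        if d >= threshold:  # removed only when strictly smaller, per the claim
            kept.append(p)
    return kept
```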
[0010K] In one embodiment, determining the iris contour using the semantically
segmented eye image comprises: determining an iris contour border; removing a plurality of
pixels from the iris contour border; and determining the iris contour as an ellipse from
remaining pixels of the iris contour border.
[0010L] In one embodiment, determining the iris contour border comprises:
determining contours in the second binary image; and selecting a longest contour of the
determined contours in the second binary image as an iris contour border.
[0010M] In one embodiment, the method further comprises: determining an iris
contour points bounding box enclosing the iris contour border; computing an iris points area
size as a diagonal of the iris contours points bounding box; and determining an iris contour
threshold based on the iris points area size.
[0010N] In one embodiment, a color value of the second binary image pixel of the
plurality of pixels of the second binary image is the third color value if a corresponding pixel
in the semantically segmented eye image has a value greater than or equal to the third color
value, and the second color value if the corresponding pixel in the semantically segmented
eye image has a value not greater than or equal to the third color value.
[0010O] In one embodiment, the iris contour threshold is a fraction multiplied by the
iris points area size, and wherein the fraction is in a range from 0.02 to 0.20.
[0010P] In one embodiment, the method further comprises creating a fourth binary
image comprising a plurality of pixels, wherein a color value of a fourth binary image pixel
of the plurality of pixels of the fourth binary image is the second color value or the first color
value.
[0010Q] In one embodiment, a color value of a fourth binary image pixel of the
plurality of pixels of the fourth binary image is the second color value if a corresponding
pixel in the semantically segmented eye image has a value greater than or equal to the second
color value, and the first color value if the corresponding pixel in the semantically segmented
eye image has a value not greater than or equal to the second color value.
[0010R] In one embodiment, removing a plurality of pixels from the iris contour
border comprises, for an iris contour border pixel of the contour border: determining a closest
pixel in the fourth binary image that has a color value of the first color value and that is
closest to the iris contour border pixel; determining a distance between the iris contour border
pixel and the closest pixel in the fourth binary image; and removing the iris contour border
pixel from the iris contour border if the distance between the iris contour border pixel and the
closest pixel in the fourth binary image is smaller than an iris contour threshold.
[0010S] In one embodiment, the method further comprises determining a binary
mask to cover an irrelevant area in the semantically segmented eye image.
[0010T] In one embodiment, determining the binary mask to cover the irrelevant
area in the eye image comprises: creating a binary mask image comprising a plurality of
pixels, wherein a binary mask image pixel of the binary mask image has a color value;
setting the color value of the binary mask image pixel to the third color value if a
corresponding pixel in the semantically segmented eye image has a value greater than or
equal to the third color value; and setting the color value of the binary mask image pixel to
the second color value if a corresponding pixel in the semantically segmented eye image has
a value not greater than or equal to the third color value.
[0010U] In one embodiment, the irrelevant area comprises a portion of the
semantically segmented eye image outside of an area defined by the iris contour.
[0010V] In one embodiment, the method further comprises: applying the binary
mask to the semantically segmented eye image to generate a relevant eye image comprising a
portion of the eye image that excludes the irrelevant area; and calculating a biometric
signature from the relevant eye image.
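Paragraphs [0010T]-[0010V] amount to a threshold followed by a masked copy. The comparison direction follows the claim text literally; treating third-color pixels as the relevant area and zeroing the rest is an interpretive assumption for illustration.

```python
import numpy as np

def mask_and_apply(seg, eye, second, third):
    """Build the binary mask per [0010T] and apply it to the eye image:
    pixels whose segmented value is at least the third color value get the
    third color (kept); all others get the second color (masked out)."""
    mask = np.where(seg >= third, third, second)
    relevant = np.where(mask == third, eye, 0)  # zero the irrelevant area
    return mask, relevant
```

The `relevant` image is what a downstream biometric-signature step (e.g. iris-code computation) would consume.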
[0010W] In one embodiment, the biometric signature comprises an iris code.
[0010X] In one embodiment, the method further comprises creating a polar image
of an iris of an eye in the eye image from the eye image using the pupil contour, the iris
contour, and the mask for the irrelevant area in the semantically segmented eye image.
[0010Y] In one embodiment, receiving the semantically segmented eye image of an
eye image comprising a plurality of pixels comprises: receiving an eye image; processing the
eye image using a convolution neural network to generate the semantically segmented eye
image; and processing the eye image using the convolution neural network to generate a
quality estimation of the eye image, wherein the convolution neural network comprises a
segmentation tower and a quality estimation tower, wherein the segmentation tower
comprises segmentation layers and shared layers, wherein the quality estimation tower
comprises quality estimation layers and the shared layers, wherein a first output layer of the
shared layers is connected to a first input layer of the segmentation tower and a second input
layer of the segmentation tower, wherein the first output layer of the shared layers is
connected to an input layer of the quality estimation layer, and wherein receiving the eye
image comprises receiving the eye image by an input layer of the shared layers.
Details of one or more implementations of the subject matter described in
this specification are set forth in the accompanying drawings and the description below.
Other features, aspects, and advantages will become apparent from the description, the
drawings, and the claims. Neither this summary nor the following detailed description
purports to define or limit the scope of the inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an example convolutional neural network
with a merged architecture that includes a segmentation tower and a quality estimation tower
sharing shared layers.
FIG. 2 schematically illustrates an example eye in an eye image.
FIGS. 3A-3C depict an example convolutional neural network with a
merged architecture.
FIG. 4 shows example results of segmenting eye images using a
convolutional neural network with the merged convolutional network architecture illustrated
in FIGS. 3A-3C.
FIG. 5 is a flow diagram of an example process of creating a convolutional
neural network with a merged architecture.
FIG. 6 is a flow diagram of an example process of segmenting an eye
image using a convolutional neural network with a merged architecture.
FIG. 7 is a flow diagram of an example process of determining a pupil
contour, an iris contour, and a mask for irrelevant image area in a segmented eye image.
FIG. 8 schematically illustrates an example semantically segmented eye
image.
FIG. 9 is a flow diagram of an example process of determining a pupil
contour or an iris contour in a segmented eye image.
FIGS. 10A-10C schematically illustrate an example pupil contour
determination.
FIG. 11 shows example results of determining pupil contours, iris
contours, and masks for irrelevant image areas using the example process illustrated in FIGS.
7 and 9.
FIGS. 12A-12B show example results of training a convolutional neural
network with a triplet network architecture on iris images in polar coordinates obtained after
fitting pupil contours and iris contours with the example processes shown in FIGS. 7 and 9.
FIG. 13 is a block diagram of an example convolutional neural network
with a triplet network architecture.
FIG. 14 schematically illustrates an example of a wearable display system.
Throughout the drawings, reference numbers may be re-used to indicate
correspondence between referenced elements. The drawings are provided to illustrate
example embodiments described herein and are not intended to limit the scope of the
disclosure.
DETAILED DESCRIPTION
A conventional wavelet-based iris code with 2048 bits can be used for iris
identification. However, the iris code can be sensitive to variations including image
cropping, image blurring, lighting conditions while capturing images, occlusion by eyelids
and eyelashes, and image angle of view. Additionally, prior to computing the iris code, an
eye image needs to be segmented to separate the iris region from the pupil region and the
surrounding sclera region.
A convolutional neural network (CNN) may be used for segmenting eye
images. Eye images can include the periocular region of the eye, which includes the eye and
portions around the eye such as eyelids, eyebrows, eyelashes, and skin surrounding the eye.
An eye image can be segmented to generate the pupil region, iris region, or sclera region of
an eye in the eye image. An eye image can also be segmented to generate the background of
the eye image, including skin such as an eyelid around an eye in the eye image. The
segmented eye image can be used to compute an iris code, which can in turn be used for iris
identification. To generate an eye image segmentation useful or suitable for iris
identification, quality of the eye image or segmented eye image may be determined or
estimated. With the quality of the eye image or segmented eye image determined, eye
images that may not be useful or suitable for iris identification can be determined and filtered
out from subsequent iris identification. For example, eye images which capture blinking
eyes, blurred eye images, or improperly segmented eye images may not be useful or suitable
for iris identification. By filtering out poor quality eye images or segmented eye images, iris
identification can be improved. One possible cause of generating improperly segmented eye
images is having an insufficient number of eye images that are similar to the improperly
segmented eye images when training the convolutional neural network to segment eye
images.
Systems and methods disclosed herein address various challenges related
to eye image segmentation and image quality estimation. For example, a convolutional
neural network such as a deep neural network (DNN) can be used to perform both eye image
segmentation and image quality estimation. A CNN for performing both eye image
segmentation and image quality estimation can have a merged architecture. A CNN with a
merged architecture can include a segmentation tower, which segments eye images, and a
quality estimation tower, which determines quality estimations of eye images so poor quality
eye images can be filtered out. The segmentation tower can include segmentation layers
connected to shared layers. The segmentation layers can be CNN layers unique to the
segmentation tower and not shared with the quality estimation tower. The quality estimation
tower can include quality estimation layers connected to the shared layers. The quality
estimation layers can be CNN layers unique to the quality estimation tower and not shared
with the segmentation tower. The shared layers can be CNN layers that are shared by the
segmentation tower and the quality estimation tower.
The segmentation tower can segment eye images to generate
segmentations of the eye images. The shared layers of the segmentation tower (or the quality
estimation tower) can receive as its input an eye image, for example a 120 x 160 grayscale
image. The segmentation tower can generate segmentation tower output. The segmentation
tower output can include multiple images, e.g., four images, one for each of the pupil region,
iris region, sclera region, or background region of the eye image. The quality estimation
tower can generate quality estimations of the eye images or segmented eye images.
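One common way (assumed here, not stated in the text) to turn the four per-region output images into a single segmentation label map is a per-pixel argmax over the region responses:

```python
import numpy as np

def labels_from_tower_output(out):
    """out: array of shape (4, H, W), one response map per region
    (pupil, iris, sclera, background). Returns an (H, W) label image
    where each pixel carries the index of the strongest region response."""
    return np.argmax(out, axis=0)
```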
When training the convolutional neural network with the merged
architecture, many kernels can be learned. A kernel, when applied to its input, produces a
resulting feature map showing the response to that particular learned kernel. The resulting
feature map can then be processed by a kernel of another layer of the CNN which down
samples the resulting feature map through a pooling operation to generate a smaller feature
map. The process can then be repeated to learn new kernels for computing their resulting
feature maps.
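The kernel-response-then-pooling cycle described above can be sketched directly; the 3x3 all-ones kernel and 2x2 max pooling below are illustrative choices, not parameters taken from the disclosure.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide a learned kernel over its input; each output value is the
    response of one spatial position to that kernel (a feature map)."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(fm):
    """Down-sample a feature map with 2x2 max pooling, producing the
    smaller feature map described in the text."""
    h, w = fm.shape[0] // 2, fm.shape[1] // 2
    return fm[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))
```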
The segmentation tower (or the quality estimation tower) in the merged
CNN architecture can implement an encoding-decoding architecture. The early layers of the
segmentation tower (or the quality estimation tower) such as the shared layers can encode the
eye image by gradually decreasing spatial dimension of feature maps and increasing the
number of feature maps computed by the layers. Some layers of the segmentation tower (or
the quality estimation tower) such as the last layers of the segmentation layers (or the quality
estimation layers) can decode the encoded eye image by gradually increasing spatial
dimension of feature maps back to the original eye image size and decreasing the number of
feature maps computed by the layers.
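The encoding-decoding shape schedule can be traced without any deep-learning framework. The depth and channel counts below are illustrative assumptions; only the halve-spatial/double-channels pattern and its reversal come from the text.

```python
def encode_decode_shapes(h, w, depth=3, base_maps=8):
    """Trace (channels, height, width) through an encoder-decoder: each
    encoder stage halves the spatial dimensions and doubles the number of
    feature maps; each decoder stage reverses this until the original
    spatial size returns."""
    shapes = [(base_maps, h, w)]
    for d in range(1, depth + 1):                      # encoder
        shapes.append((base_maps * 2**d, h // 2**d, w // 2**d))
    for d in range(depth - 1, -1, -1):                 # decoder
        shapes.append((base_maps * 2**d, h // 2**d, w // 2**d))
    return shapes
```

For the 120 x 160 input mentioned above, two stages give 120x160 -> 60x80 -> 30x40 and back.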
A possible advantage of the merged CNN architecture including both a
segmentation tower and a quality estimation tower is that during training, the shared layers of
the CNN find feature maps that are useful for both segmentation and image quality.
Accordingly, such a CNN can be beneficial compared to use of separate CNNs, one for
segmentation and another one for quality estimation, in which the feature maps for each
separate CNN may have little or no relationship.
Example Convolutional Neural Network
FIG. 1 is a block diagram of an example convolutional neural network 100
with a merged architecture that includes a segmentation tower 104 and a quality estimation
tower 108 sharing shared layers 112. The convolutional neural network 100 such as a deep
neural network (DNN) can be used to perform both eye image segmentation and image
quality estimation. A CNN 100 with a merged architecture can include a segmentation tower
104 and a quality estimation tower 108. The segmentation tower 104 can include
segmentation layers 116 connected to the shared layers 112. The shared layers 112 can be
CNN layers that are shared by the segmentation tower 104 and the quality estimation tower
108. An output layer of the shared layers 112 can be connected to an input layer of the
segmentation layers 116. One or more output layers of the shared layers 112 can be
connected to one or more input layers of the segmentation layers 116. The segmentation
layers 116 can be CNN layers unique to the segmentation tower 104 and not shared with the
quality estimation tower 108.
The quality estimation tower 108 can include quality estimation layers 120
and the shared layers 112. The quality estimation layers 120 can be CNN layers unique to
the quality estimation tower 108 and not shared with the segmentation tower 104. An output
layer of the shared layers 112 can be a shared layer 112 that is connected to an input layer of
the quality estimation layers 120. An input layer of the quality estimation layers 120 can be
connected to an output layer of the shared layers 112. One or more output layers of the
shared layers 112 can be connected to one or more input layers of the quality estimation
layers 120.
The shared layers 112 can be connected to the segmentation layers 116 or
the quality estimation layers 120 differently in different implementations. For example, an
output layer of the shared layers 112 can be connected to one or more input layers of the
segmentation layers 116 or one or more input layers of the quality estimation layers 120. As
another example, an output layer of the shared layers 112 can be connected to one or more
input layers of the segmentation layers 116 and one or more input layers of the quality
estimation layers 120. Different numbers of output layers of the shared layers 112, such as 1,
2, 3, or more output layers, can be connected to the input layers of the segmentation layers
116 or the quality estimation layers 120. Different numbers of input layers of the
segmentation layers 116 or the quality estimation layers 120, such as 1, 2, 3, or more input
layers, can be connected to the output layers of the shared layers 112.
The segmentation tower 104 can process an eye image 124 to generate
segmentations of the eye image. FIG. 2 schematically illustrates an example eye 200 in an
eye image 124. The eye 200 includes eyelids 204, a sclera 208, an iris 212, and a pupil 216.
A curve 216a shows the pupillary boundary between the pupil 216 and the iris 212, and a
curve 212a shows the limbic boundary between the iris 212 and the sclera 208 (the “white”
of the eye). The eyelids 204 include an upper eyelid 204a and a lower eyelid 204b.
With reference to FIG. 1, an input layer of the shared layers 112 of the
segmentation tower 104 (or the quality estimation tower 108) can receive as its input an eye
image 124, for example a 120 x 160 grayscale image. The segmentation tower 104 can
generate segmentation tower output 128. The segmentation tower output 128 can include
multiple images, e.g., four images, one for each region corresponding to the pupil 216, the
iris 212, the sclera 208, or the background in the eye image 124. The background of the eye
image can include regions that correspond to eyelids, eyebrows, eyelashes, or skin
surrounding an eye in the eye image 124. In some implementations, the segmentation tower
output 128 can include a segmented eye image. A segmented eye image can include
segmented pupil, iris, sclera, or background.
The quality estimation tower 108 can process an eye image 124 to
generate quality estimation tower output such as a quality estimation of the eye image 124.
A quality estimation of the eye image 124 can be a binary classification: a good quality
estimation classification or a bad quality estimation classification. A quality estimation of
the eye image 124 can comprise a probability of the eye image 124 having a good quality
estimation classification. If the probability of the eye image 124 being good exceeds a high
quality threshold (such as 75%, 85%, 95%), the image can be classified as being good.
Conversely, in some embodiments, if the probability is below a low quality threshold (such
as 25%, 15%, 5%), then the eye image 124 can be classified as being poor.
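The two-threshold decision can be sketched as follows; the 85%/15% defaults are one of the example pairs given above, and returning an undecided label between the thresholds is an assumption, since the text leaves that region unspecified.

```python
def classify_quality(p_good, hi=0.85, lo=0.15):
    """Binary quality decision on the probability of a good-quality
    classification: above the high threshold -> good, below the low
    threshold -> poor, otherwise left undecided (an assumed convention)."""
    if p_good >= hi:
        return "good"
    if p_good <= lo:
        return "poor"
    return "undecided"
```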
When training the convolutional neural network 100, many kernels are
learned. A kernel, when applied to the input eye image 124 or a feature map computed by a
previous CNN layer, produces a resulting feature map showing the response of its input to
that particular kernel. The resulting feature map can then be processed by a kernel of another
layer of the convolutional neural network 100 which down samples the resulting feature map
through a pooling operation to generate a smaller feature map. The process can then be
repeated to learn new kernels for computing their resulting feature maps. Accordingly, the
shared layers can be advantageously trained aneously when training the segmentation
tower 104 and the quality estimation tower 108.
The segmentation tower 104 (or the quality estimation tower 108) can
implement an encoding-decoding architecture. The early layers of the segmentation tower
104 (or the quality estimation tower 108) such as the shared layers 112 can encode an eye
image 124 by gradually decreasing spatial dimension of feature maps and increasing the
number of feature maps computed by the layers. Decreasing spatial dimension may
advantageously result in the feature maps of middle layers of the segmentation tower 104 (or
the quality estimation tower 108) being global context aware.
However, decreasing spatial dimension may result in accuracy degradation,
for example, at segmentation boundaries such as the pupillary boundary or the limbic
boundary. In some implementations, a layer of the segmentation tower 104 (or the quality
estimation tower 108) can concatenate feature maps from different layers such as output
layers of the shared layers 112. The resulting concatenated feature maps may
advantageously be multi-scale because features extracted at multiple scales can be used to
provide both local and global context and the feature maps of the earlier layers can retain
more high frequency details leading to sharper segmentation boundaries.
In some implementations, a convolution layer with a kernel size greater
than 3 pixels x 3 pixels can be replaced with consecutive 3 pixels x 3 pixels convolution
layers. With consecutive 3 pixels x 3 pixels convolution layers, the convolutional neural
network 100 can advantageously be smaller or faster.
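The saving can be illustrated by counting weights: a single 5 x 5 convolution uses 25 weights per input-output channel pair, while two stacked 3 x 3 convolutions use 18, with the same effective 5 x 5 receptive field. A sketch of the arithmetic (the channel count of 32 is illustrative):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a single k x k convolution layer (bias omitted)."""
    return c_out * c_in * k * k

c = 32                                   # illustrative channel count
one_5x5 = conv_params(5, c, c)           # 32 * 32 * 25 weights
two_3x3 = 2 * conv_params(3, c, c)       # 2 * 32 * 32 * 9 weights
```

Two 3 x 3 layers are both smaller and, because each convolution is typically followed by a nonlinearity, no less expressive, which is the usual rationale for this replacement.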
Some layers of the segmentation tower 104 (or the quality estimation
tower 108) such as the last layers of the segmentation layers 116 (or the quality estimation
layers 120) can decode the encoded eye image by gradually increasing spatial dimension of
feature maps back to the original eye image size and decreasing the number of feature maps.
Some layers of the convolutional neural network 100, for example the last two layers of the
quality estimation layers 120, can be fully connected.
Example Convolutional Neural Network Layers
The convolutional neural network 100 can include one or more neural
network layers. A neural network layer can apply linear or non-linear transformations to its
input to generate its output. A neural network layer can be a convolution layer, a
normalization layer (e.g., a brightness normalization layer, a batch normalization (BN) layer,
a local contrast normalization (LCN) layer, or a local response normalization (LRN) layer), a
rectified linear layer, an upsampling layer, a concatenation layer, a pooling layer, a fully
connected layer, a linear fully connected layer, a softsign layer, a recurrent layer, or any
combination thereof.
A convolution layer can apply a set of kernels that convolve or apply
convolutions to its input to generate its output. The normalization layer can be a brightness
normalization layer that normalizes the brightness of its input to generate its output with, for
example, L2 normalization. A normalization layer can be a batch normalization (BN) layer
that can normalize the brightness of a plurality of images with respect to one another at once
to generate a plurality of normalized images as its output. Non-limiting examples of methods
for normalizing brightness include local contrast normalization (LCN) or local response
normalization (LRN). Local contrast normalization can normalize the contrast of an image
non-linearly by normalizing local regions of the image on a per pixel basis to have mean of
zero and variance of one. Local response normalization can normalize an image over local
input regions to have mean of zero and variance of one. The normalization layer may speed
up the computation of the eye segmentations and quality estimations.
A rectified linear layer can be a rectified linear layer unit (ReLU) layer or
a parameterized rectified linear layer unit (PReLU) layer. The ReLU layer can apply a ReLU
function to its input to generate its output. The ReLU function ReLU(x) can be, for example,
max(0, x). The PReLU layer can apply a PReLU function to its input to generate its output.
The PReLU function PReLU(x) can be, for example, x if x ≥ 0 and ax if x < 0, where a is a
positive number.
An upsampling layer can upsample its input to generate its output. For
example, the upsampling layer can upsample a 4 pixels x 5 pixels input to generate a 8 pixels
x 10 pixels output using upsampling methods such as the nearest neighbor method or the
bicubic interpolation method. The concatenation layer can concatenate its input to generate
its output. For example, the concatenation layer can concatenate four 5 pixels x 5 pixels
feature maps to generate one 20 pixels x 20 pixels feature map. As another example, the
concatenation layer can concatenate four 5 pixels x 5 pixels feature maps and four 5 pixels x
5 pixels feature maps to generate eight 5 pixels x 5 pixels feature maps. The pooling layer
can apply a pooling function which down samples its input to generate its output. For
example, the pooling layer can down sample a 20 pixels x 20 pixels image into a 10 pixels x
10 pixels image. Non-limiting examples of the pooling function include maximum pooling,
average pooling, or minimum pooling.
A node in a fully connected layer is connected to all nodes in the previous
layer. A linear fully connected layer, similar to a linear classifier, can be a fully connected
layer with two output values such as good quality or bad quality. The softsign layer can
apply a softsign function to its input. The softsign function (softsign(x)) can be, for example,
(x / (1 + |x|)). The softsign layer may neglect the impact of per-element outliers. A per-element
outlier may occur because of eyelid occlusion or an accidental bright spot in the eye images.
At a time point t, the recurrent layer can compute a hidden state s(t), and a
recurrent connection can provide the hidden state s(t) at time t to the recurrent layer as an
input at a subsequent time point t+1. The recurrent layer can compute its output at time t+1
based on the hidden state s(t) at time t. For example, the recurrent layer can apply the
softsign function to the hidden state s(t) at time t to compute its output at time t+1. The
hidden state of the recurrent layer at time t+1 has as an input the hidden state s(t) of the
recurrent layer at time t. The recurrent layer can compute the hidden state s(t+1) by
applying, for example, a ReLU function to its input.
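The recurrent update described above can be sketched as follows. The weight shapes (16-dimensional input, 8-dimensional hidden state) are illustrative assumptions; the ReLU applied to the weighted input plus recurrent term follows the example in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # input-to-hidden weights (hypothetical sizes)
U = rng.standard_normal((8, 8))    # hidden-to-hidden (recurrent) weights

def recurrent_step(x, s):
    """Compute the hidden state s(t+1) from the input at t+1 and the
    hidden state s(t) fed back through the recurrent connection."""
    return np.maximum(0.0, W @ x + U @ s)

s = np.zeros(8)                    # initial hidden state
for _ in range(3):                 # unroll three time steps
    s = recurrent_step(rng.standard_normal(16), s)
```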
The number of the neural network layers in the convolutional neural
network 100 can be different in different implementations. For example, the number of the
neural network layers in the convolutional neural network 100 can be 100. The input type of
a neural network layer can be different in different implementations. For example, a neural
network layer can receive the output of a neural network layer as its input. The input of a
neural network layer can be different in different implementations. For example, the input of
a neural network layer can include the output of a neural network layer.
The input size or the output size of a neural network layer can be quite
large. The input size or the output size of a neural network layer can be n x m, where n
denotes the height in pixels and m denotes the width in pixels of the input or the output. For
example, n x m can be 120 pixels x 160 pixels. The channel size of the input or the output of
a neural network layer can be different in different implementations. For example, the
channel size of the input or the output of a neural network layer can be eight. Thus, a
neural network layer can receive eight channels or feature maps as its input or generate eight
channels or feature maps as its output. The kernel size of a neural network layer can be
different in different implementations. The kernel size can be n x m, where n denotes the
height in pixels and m denotes the width in pixels of the kernel. For example, n or m can be
3 pixels. The stride size of a neural network layer can be different in different
implementations. For example, the stride size of a neural network layer can be three. A
neural network layer can apply a padding to its input, for example an n x m padding, where n
denotes the height and m denotes the width of the padding. For example, n or m can be one
pixel.
Example Shared Layers
FIGS. 3A-3C depict an example convolutional neural network 100 with a
merged architecture. FIG. 3A depicts an example architecture of the shared layers 112 of the
segmentation tower 104 of the convolutional neural network 100. An input layer of the
shared layers 112 can be a convolution layer 302a that convolves an input eye image 124 (a
120 x 160 grayscale image) with 3 x 3 kernels (3 pixels x 3 pixels) after adding a 1 x 1
padding (1 pixel x 1 pixel). After adding a padding and convolving its input, the convolution
layer 302a generates 8 channels of output with each channel being a 120 x 160 feature map,
denoted as 8 x 120 x 160 in the block representing the convolution layer 302a. The 8
channels of output can be processed by a local response normalization (LRN) layer 302b, a
batch normalization (BN) layer 302c, and a rectified linear layer unit (ReLU) layer 302d.
The ReLU layer 302d can be connected to a convolution layer 304a that
convolves the output of the ReLU layer 302d with 3 x 3 kernels after adding a 1 x 1 padding
to generate eight channels of output (120 x 160 feature maps). The eight channels of output
can be processed by a batch normalization layer 304c and a ReLU layer 304d. The ReLU
layer 304d can be connected to a maximum pooling (MAX POOLING) layer 306a that pools
the output of the ReLU layer 304d with 2 x 2 kernels using 2 x 2 stride (2 pixels x 2 pixels)
to generate 8 channels of output (60 x 80 feature maps).
The maximum pooling layer 306a can be connected to a convolution layer
308a that convolves the output of the maximum pooling layer 306a with 3 x 3 kernels after
adding a 1 x 1 padding to generate 16 channels of output (60 x 80 feature maps). The 16
channels of output can be processed by a batch normalization layer 308c and a ReLU layer
308d.
The ReLU layer 308d can be connected to a convolution layer 310a that
convolves the output of the ReLU layer 308d with 3 x 3 kernels after adding a 1 x 1 padding
to generate 16 channels of output (60 x 80 feature maps). The 16 channels of output can be
processed by a batch normalization layer 310c and a ReLU layer 310d. The ReLU layer
310d can be connected to a maximum pooling layer 312a that pools the output of the ReLU
layer 310d with 2 x 2 kernels using 2 x 2 stride to generate 16 channels of output (30 x 40
feature maps).
The maximum pooling layer 312a can be connected to a convolution layer
314a that convolves the output of the maximum pooling layer 312a with 3 x 3 kernels after
adding a 1 x 1 padding to generate 32 channels of output (30 x 40 feature maps). During a
training cycle when training the convolutional neural network 100, 30 % of weight values of
the convolution layer 314a can be randomly set to values of zero, for a dropout ratio of 0.3.
The 32 channels of output can be processed by a batch normalization layer 314c and a ReLU
layer 314d.
The ReLU layer 314d can be connected to a convolution layer 316a that
convolves the output of the ReLU layer 314d with 3 x 3 kernels after adding a 1 x 1 padding
to generate 32 channels of output (30 x 40 feature maps). The 32 channels of output can be
processed by a batch normalization layer 316c and a ReLU layer 316d. The ReLU layer
316d can be connected to a maximum pooling layer 318a that pools the output of the ReLU
layer 316d with 2 x 2 kernels using 2 x 2 stride to generate 32 channels of output (15 x 20
feature maps).
The maximum pooling layer 318a can be connected to a convolution layer
320a that convolves the output of the maximum pooling layer 318a with 3 x 3 kernels after
adding a 1 x 1 padding to generate 32 channels of output (15 x 20 feature maps). During a
training cycle when training the convolutional neural network 100, 30 % of weight values of
the convolution layer 320a can be randomly set to values of zero, for a dropout ratio of 0.3.
The 32 channels of output can be processed by a batch normalization layer 320c and a ReLU
layer 320d.
The ReLU layer 320d can be connected to a convolution layer 322a that
convolves the output of the ReLU layer 320d with 3 x 3 kernels after adding a 1 x 1 padding
to generate 32 channels of output (15 x 20 feature maps). The 32 channels of output can be
processed by a batch normalization layer 322c and a ReLU layer 322d. The ReLU layer
322d can be connected to a maximum pooling layer 324a that pools the output of the ReLU
layer 322d with 2 x 2 kernels using 2 x 2 stride after adding a 1 x 0 padding to generate 32
channels of output (8 x 10 feature maps). The maximum pooling layer 324a can be
connected to an input layer of the segmentation layers 116.
The maximum pooling layer 324a can be connected to a convolution layer
326a that convolves the output of the maximum pooling layer 324a with 3 x 3 kernels after
adding a 1 x 1 padding to generate 32 channels of output (8 x 10 feature maps). During a
training cycle when training the convolutional neural network 100, 30 % of weight values of
the convolution layer 326a can be randomly set to values of zero, for a dropout ratio of 0.3.
The 32 channels of output can be processed by a batch normalization layer 326c and a ReLU
layer 326d. The maximum pooling layer 324a can be connected to the segmentation layers
116. The ReLU layer 326d can be connected to a convolution layer 328a that
convolves the output of the ReLU layer 326d with 3 x 3 kernels after adding a 1 x 1 padding
to generate 32 channels of output (8 x 10 feature maps). The 32 channels of output can be
processed by a batch normalization layer 328c and a ReLU layer 328d. The ReLU layer
328d can be connected to a maximum pooling layer 330a that pools the output of the ReLU
layer 328d with 2 x 2 kernels using 2 x 2 stride to generate 32 channels of output (4 x 5
feature maps). The maximum pooling layer 330a can be connected to the segmentation
layers 116 and the quality estimation layers 120.
The example shared layers 112 in FIG. 3A implement an encoding
architecture. The example shared layers 112 encode an eye image 124 by gradually
decreasing spatial dimension of feature maps and increasing the number of feature maps
computed by the layers. For example, the convolution layer 302a generates 8 channels of
output with each channel being a 120 x 160 feature map while the convolution layer 326a
generates 32 channels of output with each channel being an 8 x 10 feature map.
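The spatial-dimension progression described above can be verified by tracing feature-map shapes through the layers with the standard output-size formulas for convolution and pooling. This is an illustrative sketch; the layer references in the comments follow the description above.

```python
def conv2d_out(hw, k=3, p=1, s=1):
    """Output height/width of a k x k convolution with padding p and stride s."""
    h, w = hw
    return ((h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1)

def pool_out(hw, k=2, s=2, p=(0, 0)):
    """Output height/width of k x k pooling with stride s and (ph, pw) padding."""
    h, w = hw
    return ((h + 2 * p[0] - k) // s + 1, (w + 2 * p[1] - k) // s + 1)

shape = (120, 160)                       # input eye image 124
shape = conv2d_out(conv2d_out(shape))    # 302a, 304a: 8 channels, 120 x 160
shape = pool_out(shape)                  # 306a: 60 x 80
shape = conv2d_out(conv2d_out(shape))    # 308a, 310a: 16 channels, 60 x 80
shape = pool_out(shape)                  # 312a: 30 x 40
shape = conv2d_out(conv2d_out(shape))    # 314a, 316a: 32 channels, 30 x 40
shape = pool_out(shape)                  # 318a: 15 x 20
shape = conv2d_out(conv2d_out(shape))    # 320a, 322a: 32 channels, 15 x 20
shape = pool_out(shape, p=(1, 0))        # 324a: 1 x 0 padding -> 8 x 10
shape = conv2d_out(conv2d_out(shape))    # 326a, 328a: 32 channels, 8 x 10
shape = pool_out(shape)                  # 330a: 4 x 5
```

Note that the 1 x 0 padding at layer 324a is what turns the odd height 15 into 8 rather than 7, so the final maximum pooling layer 330a produces the 4 x 5 feature maps consumed by the segmentation layers 116 and quality estimation layers 120.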
Example Segmentation Layers
FIG. 3B depicts an example architecture of the segmentation layers 116 of
the segmentation tower 104 of the convolutional neural network 100. An input layer of the
segmentation layers 116 can be an average pooling layer 332a that is connected to the
maximum pooling layer 330a of the shared layers 112. The average pooling layer 332a can
pool the output of the maximum pooling layer 330a with 4 x 5 kernels (4 pixels x 5 pixels) to
generate 32 channels of output (1 x 1 feature maps, i.e. feature maps each with a dimension
of 1 pixel x 1 pixel). The average pooling layer 332a can be connected to an upsampling
layer 334a that uses the nearest neighbor method with a -1 x 0 padding (-1 pixel x 0 pixel) to
generate 32 channels of output (4 x 5 feature maps).
A concatenation layer 336a can be an input layer of the segmentation
layers 116 that is connected to the maximum pooling layer 330a of the shared layers 112.
The concatenation layer 336a can also be connected to the upsampling layer 334a. After
concatenating its input received from the maximum pooling layer 330a and the upsampling
layer 334a, the concatenation layer 336a can generate 64 channels of output (4 x 5 feature
maps). By concatenating the outputs from two layers, features extracted at multiple scales
can be used to provide both local and global context and the feature maps of the earlier layers
can retain more high frequency details leading to sharper segmentation boundaries. Thus, the
resulting concatenated feature maps generated by the concatenation layer 336a may
advantageously be multi-scale. The concatenation layer 336a can be connected to an
upsampling layer 338a that uses the nearest neighbor method to generate 64 channels of
output (8 x 10 feature maps). During a training cycle when training the convolutional neural
network 100, 30 % of weight values of the upsampling layer 338a can be randomly set to
values of zero, for a dropout ratio of 0.3.
The upsampling layer 338a can be connected to a convolution layer 340a
that convolves the output of the upsampling layer 338a with 3 x 3 kernels after adding a 1 x 1
padding to generate 32 channels of output (8 x 10 feature maps). The 32 channels of output
can be processed by a batch normalization layer 340c and a ReLU layer 340d. The ReLU
layer 340d can be connected to a convolution layer 342a that convolves the output of the
ReLU layer 340d with 3 x 3 kernels after adding a 1 x 1 padding to generate 32 channels of
output (8 x 10 feature maps). The 32 channels of output can be processed by a batch
normalization layer 342c and a ReLU layer 342d.
A concatenation layer 344a can be an input layer of the segmentation
layers 116 that is connected to the maximum pooling layer 324a of the shared layers 112.
The concatenation layer 344a can also be connected to the ReLU layer 342d. After
concatenating its input received from the ReLU layer 342d and the maximum pooling layer
324a, the concatenation layer 344a generates 64 channels of output (64 8 x 10 feature maps).
The concatenation layer 344a can be connected to an upsampling layer 346a that uses the
nearest neighbor method to generate 64 channels of output (15 x 20 feature maps). During a
training cycle when training the convolutional neural network 100, 30 % of weight values of
the upsampling layer 346a can be randomly set to values of zero, for a dropout ratio of 0.3.
The upsampling layer 346a can be connected to a convolution layer 348a
that convolves the output of the upsampling layer 346a with 3 x 3 kernels after adding a 1 x 1
padding to generate 32 channels of output (15 x 20 feature maps). The 32 channels of output
can be processed by a batch normalization layer 348c and a ReLU layer 348d. The ReLU
layer 348d can be connected to a convolution layer 350a that convolves the output of the
ReLU layer 348d with 3 x 3 kernels after adding a 1 x 1 padding to generate 32 channels of
output (15 x 20 feature maps). The 32 channels of output can be processed by a batch
normalization layer 350c and a ReLU layer 350d.
The ReLU layer 350d can be connected to an upsampling layer 352a that
uses the nearest neighbor method to generate 32 channels of output (30 x 40 feature maps).
During a training cycle when training the convolutional neural network 100, 30 % of weight
values of the upsampling layer 352a can be randomly set to values of zero, for a dropout ratio
of 0.3.
The upsampling layer 352a can be connected to a convolution layer 354a
that convolves the output of the upsampling layer 352a with 3 x 3 kernels after adding a 1 x 1
padding to generate 32 channels of output (30 x 40 feature maps). The 32 channels of output
can be processed by a batch normalization layer 354c and a ReLU layer 354d. The ReLU
layer 354d can be connected to a convolution layer 356a that convolves the output of the
ReLU layer 354d with 3 x 3 kernels after adding a 1 x 1 padding to generate 32 channels of
output (30 x 40 feature maps). The 32 channels of output can be processed by a batch
normalization layer 356c and a ReLU layer 356d.
The ReLU layer 356d can be connected to an upsampling layer 358a that
uses the nearest neighbor method to generate 32 channels of output (60 x 80 feature maps).
The upsampling layer 358a can be connected to a convolution layer 360a that convolves the
output of the upsampling layer 358a with 3 x 3 kernels after adding a 1 x 1 padding to
generate 16 channels of output (60 x 80 feature maps). The 16 channels of output can be
processed by a batch normalization layer 360c and a ReLU layer 360d. The ReLU layer
360d can be connected to a convolution layer 362a that convolves the output of the ReLU
layer 360d with 3 x 3 kernels after adding a 1 x 1 padding to generate 16 channels of output
(60 x 80 feature maps). The 16 channels of output can be processed by a batch normalization
layer 362c and a ReLU layer 362d.
The ReLU layer 362d can be connected to an upsampling layer 364a that
uses the nearest neighbor method to generate 16 channels of output (120 x 160 feature
maps). The upsampling layer 364a can be connected to a convolution layer 366a that
convolves the output of the upsampling layer 364a with 5 x 5 kernels after adding a 2 x 2
padding to generate 4 channels of output (120 x 160 output images). The convolution layer
366a can be an output layer of the segmentation layers 116. The 4 output images can be the
segmentation tower output 128, one for each region corresponding to the pupil 216, the iris
212, the sclera 208, or the background of the eye image 124. In some implementations, the
segmentation tower output 128 can be an image with four color values, one for each region
corresponding to the pupil 216, the iris 212, the sclera 208, or the background of the eye
image 124.
The example segmentation layers 116 in FIG. 3B implement a decoding
architecture. The example segmentation layers 116 decode the encoded eye image by
gradually increasing spatial dimension of feature maps back to the original eye image size
and decreasing the number of feature maps. For example, the average pooling layer 332a
generates 32 channels of output with each channel being a 1 x 1 feature map, while the
convolution layer 366a generates 4 channels of output with each channel being a 120 x 160
feature map.
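The two decoding steps, nearest-neighbor upsampling followed by concatenation with feature maps from an earlier layer, can be sketched in a few lines. The shapes follow the 4 x 5 to 8 x 10 example above; the NumPy realization (repeat along each spatial axis, concatenate along the channel axis) is illustrative.

```python
import numpy as np

def upsample_nearest(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbor upsampling of a (C, H, W) feature map: each pixel
    is repeated `factor` times along both spatial axes."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

encoded = np.ones((32, 4, 5))     # e.g. output of the deepest shared layer
skip = np.ones((32, 8, 10))       # e.g. output of an earlier shared layer

up = upsample_nearest(encoded)                 # (32, 8, 10)
merged = np.concatenate([up, skip], axis=0)    # (64, 8, 10) multi-scale maps
```

The concatenated result carries both the global context of the deep features and the high frequency detail of the earlier layer, matching the multi-scale motivation given above.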
Example Quality Estimation Layers
FIG. 3C depicts an example architecture of the quality estimation layers
120 of the quality estimation tower 108 of the convolutional neural network 100. An input
layer of the quality estimation layers 120 can be a convolution layer 368a. The convolution
layer 368a can convolve the output of the maximum pooling layer 330a of the shared layers
112 with 3 x 3 kernels (3 pixels x 3 pixels) after adding a 1 x 1 padding (1 pixel x 1 pixel) to
generate 32 channels of output (4 x 5 feature maps, i.e. feature maps with a dimension of 4
pixels x 5 pixels). During a training cycle when training the convolutional neural network
100, 50 % of weight values of the convolution layer 368a can be randomly set to values of
zero, for a dropout ratio of 0.5. The 32 channels of output can be processed by a batch
normalization layer 368c and a ReLU layer 368d.
The ReLU layer 368d can be connected to a convolution layer 370a that
convolves the output of the ReLU layer 368d with 3 x 3 kernels after adding a 1 x 1 padding
to generate 16 channels of output (4 x 5 feature maps). The 16 channels of output can be
processed by a batch normalization layer 370c and a ReLU layer 370d. The ReLU layer
370d can be connected to an average pooling layer 372a that can pool the output of the ReLU
layer 370d with 4 x 5 kernels to generate 16 channels of output (1 x 1 feature maps).
The average pooling layer 372a can be connected to a linear, fully
connected layer 374a that generates 8 channels of output (1 pixel x 1 pixel feature maps).
During a training cycle when training the convolutional neural network 100, 50 % of weight
values of the linear, fully connected layer 374a can be randomly set to values of zero, for a
dropout ratio of 0.5. The 8 channels of output can be processed by a batch normalization
layer 374c and a ReLU layer 374d. The ReLU layer 374d can be connected to a linear, fully
connected layer 376a that generates at least two channels of output (1 x 1 feature maps). The
linear, fully connected layer 376a can be an output layer of the quality estimation layers 120.
The at least two channels of output can be the quality estimation tower output 132 with one
channel corresponding to the good quality estimation and one channel corresponding to the
bad quality estimation.
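The quality estimation head can be sketched end to end: average pooling down to 1 x 1, two linear fully connected layers, and two output channels. The softmax normalization and the random weight values are illustrative assumptions; the description specifies only the layer sizes and the two output channels.

```python
import numpy as np

rng = np.random.default_rng(1)
fc1 = rng.standard_normal((8, 16))   # linear fully connected: 16 -> 8
fc2 = rng.standard_normal((2, 8))    # linear fully connected: 8 -> 2

def quality_head(feature_maps: np.ndarray) -> np.ndarray:
    """feature_maps: (16, 4, 5) output of the last convolution layer.

    Returns two values, one per quality class (good, bad); the softmax
    turning the two channels into probabilities is an assumed choice."""
    pooled = feature_maps.mean(axis=(1, 2))        # 4 x 5 average pooling
    logits = fc2 @ np.maximum(0.0, fc1 @ pooled)   # two output channels
    e = np.exp(logits - logits.max())
    return e / e.sum()

p = quality_head(rng.standard_normal((16, 4, 5)))
```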
Example Training of Convolutional Neural Networks
Different convolutional neural networks (CNNs) can be different from one
another in two ways. The architecture of the CNNs, for example the number of layers and
how the layers are interconnected, can be different. The weights which can affect the
strength of effect propagated from one layer to another can be different. The output of a
layer can be some nonlinear function of the weighted sum of its inputs. The weights of a
CNN can be the numbers that appear in these summations, and can be approximately
analogous to the synaptic strength of a neural connection in a biological system.
The process of training a CNN 100 is the process of presenting the CNN
100 with a training set of eye images 124. The training set can include both example inputs
and corresponding reference outputs. Through the process of training, the weights of the CNN
100 can be incrementally learned such that the output of the network, given a particular input
data from the training set, comes to match (as closely as possible) the reference output
corresponding to that input data.
Thus, in some implementations, a CNN 100 having a merged architecture
is trained, using a training set of eye images 124, to learn segmentations and quality
estimations of the eye images 124. During a training cycle, the segmentation tower 104
being trained can process an eye image 124 of the training set to generate a segmentation
tower output 128 which can include 4 output images, one for each region corresponding to
the pupil 216, the iris 212, the sclera 208, or the background of the eye image 124. The
quality estimation tower 108 being trained can process an eye image 124 of the training set to
generate a quality estimation tower output 132 of the eye image 124. A difference between
the segmentation tower output 128 of the eye image 124 and a reference segmentation tower
output of the eye image 124 can be computed. The reference segmentation tower output of
the eye image 124 can include four reference output images, one for each region
corresponding to the pupil 216, the iris 212, the sclera 208, or the background of the eye
image 124. A difference between the quality estimation tower output 132 of the eye image
124 and a reference quality estimation tower output of the eye image 124 can be computed.
Parameters of the CNN 100 can be updated based on one or both of the
differences. For example, parameters of the segmentation layers 116 of the CNN 100 can be
updated based on the difference between the segmentation tower output 128 of the eye image
124 and the reference segmentation tower output of the eye image 124. As another example,
parameters of the quality estimation layers 120 of the CNN 100 can be updated based on the
difference between the quality estimation tower output 132 of the eye image 124 and the
reference quality estimation tower output of the eye image 124. As yet another example,
parameters of the shared layers 112 can be updated based on both differences. As a further
example, parameters of the segmentation layers 116 of the CNN 100 or parameters of the
quality estimation layers 120 of the CNN 100 can be updated based on both differences. The
two differences can affect the parameters of the shared layers 112, the segmentation layers
116, or the quality estimation layers 120 differently in different implementations. For
example, the difference between the segmentation tower output 128 and the reference
segmentation tower output can affect the parameters of the shared layers 112 or the
segmentation layers 116 to a greater extent compared to the effect of the difference between
the quality estimation tower output 132 and the reference quality estimation tower output.
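One way to realize such unequal influence is a weighted sum of the two differences used as the training loss. The weights and the linear combination are hypothetical; the description does not specify how the two differences are combined.

```python
def total_loss(seg_loss: float, quality_loss: float,
               w_seg: float = 1.0, w_quality: float = 0.5) -> float:
    """Weighted combination of the two tower differences.

    Hypothetical weights: a larger w_seg makes the segmentation difference
    dominate the updates to the shared layers 112."""
    return w_seg * seg_loss + w_quality * quality_loss
```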
During a training cycle, a percentage of the parameters of the
convolutional neural network 100 can be set to values of zero. The percentage can be, for
example, 5 % – 50%, for a dropout ratio of 0.05 – 0.50. The parameters of the CNN 100 set
to values of zero during a training cycle can be different in different implementations. For
example, parameters of the CNN 100 set to values of zero can be randomly selected. As
another example, if 30% of the parameters of the CNN 100 are set to values of zero, then
approximately 30% of parameters of each layer of the CNN 100 can be randomly set to
values of zero.
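A sketch of setting a fraction of a layer's parameters to zero during a training cycle. Selecting exactly 30 % of the entries without replacement is one plausible reading of the description; an independent Bernoulli mask per parameter would be another.

```python
import numpy as np

def drop_parameters(weights: np.ndarray, ratio: float = 0.3,
                    rng=None) -> np.ndarray:
    """Return a copy of `weights` with `ratio` of the entries set to zero,
    chosen at random without replacement."""
    if rng is None:
        rng = np.random.default_rng()
    w = weights.copy().ravel()
    n_drop = int(round(ratio * w.size))
    idx = rng.choice(w.size, size=n_drop, replace=False)
    w[idx] = 0.0
    return w.reshape(weights.shape)

w = np.ones((10, 10))
dropped = drop_parameters(w, ratio=0.3, rng=np.random.default_rng(0))
```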
When training the convolutional neural network 100 with the merged
architecture, many kernels are learned. A kernel, when applied to its input, produces a
resulting feature map showing the response to that particular learned kernel. The resulting
feature map can then be processed by a kernel of another layer of the CNN which down samples
the resulting feature map through a pooling operation to generate a smaller feature map. The
process can then be repeated to learn new kernels for computing their resulting feature maps.
Example Eye Images and Segmented Eye Images
FIG. 4 shows example results of segmenting eye images 124 using a
convolutional neural network 100 with the merged convolutional network architecture
illustrated in FIGS. 3A-3C. Panel a shows a segmentation of the eye image shown in panel b.
The segmentation of the eye image included a background region 404a, a sclera
region 408a, an iris region 412a, or a pupil region 416a of the eye image. The quality
estimation of the eye image shown in panel b was a good quality estimation of 1.000.
Accordingly, the quality estimation of the eye image was a good quality estimation.
Panel c shows a segmentation of the eye image shown in
panel d. The segmentation of the eye image included a background region 404c, a sclera
region 408c, an iris region 412c, or a pupil region 416c of the eye image. The quality
estimation of the eye image shown in panel d was a good quality estimation of 0.997.
Accordingly, the quality estimation of the eye image was a good quality estimation.
Panel e shows a segmentation of the eye image shown in
panel f. A sclera, an iris, and a pupil of an eye in the eye image shown in panel f
were occluded by eyelids of the eye. The segmentation of the eye image included a
background region 404e, a sclera region 408e, an iris region 412e, or a pupil region 416e of
the eye image. The quality estimation of the eye image shown in panel f was a good
quality estimation of 0.009. Accordingly, the quality estimation of the eye image was a bad
quality estimation.
Panel g shows a segmentation of the eye image shown in
panel h. A sclera, an iris, and a pupil of an eye in the eye image shown in panel h
were occluded by eyelids of the eye. Furthermore, the eye image is blurry. The
segmentation of the eye image included a background region 404g, a sclera region 408g, an
iris region 412g, or a pupil region 416g of the eye image. The quality of the eye image
shown in panel h was a good quality estimation of 0.064. Accordingly, the quality
estimation of the eye image was a bad quality estimation.
Example Process for Eye Image Segmentation and Image Quality Estimation
FIG. 5 is a flow diagram of an example process 500 of creating a
convolutional neural network 100 with a merged architecture. The process 500 starts at
block 504. At block 508, shared layers 112 of a convolutional neural network (CNN) 100 are
created. The shared layers 112 can include a plurality of layers and a plurality of kernels.
Creating the shared layers 112 can include creating the plurality of layers, creating the
plurality of kernels with appropriate kernel sizes, strides, or paddings, or connecting the
successive layers of the plurality of layers.
At block 512, segmentation layers 116 of the CNN 100 are created. The
segmentation layers 116 can include a plurality of layers and a plurality of kernels. Creating
the segmentation layers 116 can include creating the plurality of layers, creating the plurality
of kernels with appropriate kernel sizes, strides, or paddings, or connecting the successive
layers of the plurality of layers. At block 516, an output layer of the shared layers 112 can be
connected to an input layer of the segmentation layers 116 to generate a segmentation tower
104 of the CNN 100.
At block 520, quality estimation layers 120 of the CNN 100 are created.
The quality estimation layers 120 can include a plurality of layers and a plurality of kernels.
Creating the quality estimation layers 120 can include creating the plurality of layers,
creating the plurality of kernels with appropriate kernel sizes, strides, or paddings, or
connecting the successive layers of the plurality of layers. At block 524, an output layer of
the shared layers 112 can be connected to an input layer of the quality estimation layers 120
to generate a quality estimation tower 108 of the CNN 100. The process 500 ends at block
528.
FIG. 6 is a flow diagram of an example process 600 of segmenting an eye
image 124 using a convolutional neural network 100 with a merged architecture. The
process 600 starts at block 604. At block 608, a neural network receives an eye image 124.
For example, an input layer of shared layers 112 of a CNN 100 can receive the eye image
124. An image sensor (e.g., a digital camera) of a user device can capture the eye image 124
of a user, and the neural network can receive the eye image 124 from the image sensor.
After receiving the eye image 124 at block 608, the neural network
segments the eye image 124 at block 612. For example, a segmentation tower 104 of the
CNN 100 can generate a segmentation of the eye image 124. An output layer of the
segmentation tower 104 can, together with other layers of the segmentation tower 104,
compute the segmentation of the eye image 124, including a pupil region, an iris region, a
sclera region, or a background region of an eye in the eye image 124.
At block 616, the neural network computes a quality estimation of the eye
image 124. For example, a quality estimation tower 108 of the CNN 100 can generate the
quality estimation of the eye image 124. An output layer of the quality estimation tower 108
can, together with other layers of the quality estimation tower 108, compute the quality
estimation of the eye image 124, such as a good quality estimation or a bad quality
estimation.
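The two-tower layout described above can be sketched in plain Python (a structural illustration only; the callables below are hypothetical stand-ins for the patent's convolutional layers, not its actual kernels):

```python
# Structural sketch of the merged architecture: the segmentation tower 104 and
# the quality estimation tower 108 both begin with the same shared layers 112.
def make_tower(shared_layers, head_layers):
    """Connect the output of the shared layers to the head's input layer."""
    layers = shared_layers + head_layers

    def tower(x):
        for layer in layers:
            x = layer(x)
        return x

    return tower

shared = [lambda x: x * 2]             # stand-in for shared layers 112
segmentation_head = [lambda x: x + 1]  # stand-in for segmentation layers 116
quality_head = [lambda x: x - 1]       # stand-in for quality estimation layers 120

segmentation_tower = make_tower(shared, segmentation_head)  # tower 104
quality_tower = make_tower(shared, quality_head)            # tower 108
```

Because both towers reference the same `shared` list, the trunk is defined once, mirroring how the output of the shared layers feeds both the segmentation branch and the quality estimation branch.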
Example Process of Determining a Pupil Contour, an Iris Contour, and a Mask for Irrelevant
Image Area
A conventional iris code is a bit string extracted from an image of the iris.
To compute the iris code, an eye image is segmented to separate the iris from the pupil and
sclera, for example, using the convolutional neural network 100 with the merged architecture
illustrated in FIGS. 3A-3C. The segmented eye image can then be mapped into polar or pseudo-polar
coordinates before phase information can be extracted using complex-valued two-dimensional
wavelets (e.g., Gabor or Haar). One method of creating a polar (or pseudo-polar)
image of the iris can include determining a pupil contour, determining an iris contour,
and using the determined pupil contour and the determined iris contour to create the polar
image.
FIG. 7 is a flow diagram of an example process 700 of determining a pupil
contour, an iris contour, and a mask for irrelevant image area in a segmented eye image. The
process 700 starts at block 704. At block 708, a segmented eye image is received. The
segmented eye image can include segmented pupil, iris, sclera, or background. A user device
can capture an eye image 124 of a user and compute the segmented eye image. A user
device can implement the example convolutional neural network (CNN) 100 with the merged
architecture illustrated in FIGS. 3A-3C or the example process 600 illustrated in FIG. 6 to
compute the segmented eye image.
The segmented eye image can be a semantically segmented eye image.
FIG. 8 schematically illustrates an example semantically segmented eye image 800. The
semantically segmented eye image 800 can be computed from an image of the eye 200
illustrated in FIG. 2. The semantically segmented eye image 800 can have a dimension of n
pixels x m pixels, where n denotes the height in pixels and m denotes the width in pixels of
the semantically segmented eye image 800.
A pixel of the semantically segmented eye image 800 can have one of four
color values. For example, a pixel 804 of the semantically segmented eye image 800 can
have a color value that corresponds to a background 808 of the eye image (denoted as “first
color value” in FIG. 8). The color value that corresponds to the background 808 of the eye
image can have a numeric value such as one. The background 808 of the eye image can
include regions that correspond to eyelids, eyebrows, eyelashes, or skin surrounding the eye
200. As another example, a pixel of the semantically segmented eye image 800 can have a
color value that corresponds to a sclera 208 of the eye 200 in the eye image (denoted as
“second color value” in FIG. 8). The color value that corresponds to the sclera 208 of the eye
200 in the eye image can have a numeric value such as two. As yet another example, a pixel of the
semantically segmented eye image 800 can have a color value that corresponds to an iris 212
of the eye 200 in the eye image (denoted as “third color value” in FIG. 8). The color value
that corresponds to the iris 212 of the eye 200 in the eye image can have a numeric value
such as three. As another example, a pixel 812 of the semantically segmented eye image 800
can have a color value that corresponds to a pupil 216 of the eye 200 in the eye image
(denoted as “fourth color value” in FIG. 8). The color value that corresponds to the pupil
216 of the eye 200 in the eye image can have a numeric value such as four. In FIG. 8, curve
216a shows the pupillary boundary between the pupil 216 and the iris 212, and curve 212a
shows the limbic boundary between the iris 212 and the sclera 208 (the “white” of the eye).
With reference to FIG. 7, at block 712, a pupil contour of the eye 200 in
the eye image can be determined. The pupil contour can be the curve 216a that shows the
pupillary boundary between the pupil 216 and the iris 212. The pupil contour can be
determined using an example process 900 illustrated in FIG. 9 (described in greater detail
below). At block 716, an iris contour of the eye 200 in the eye image can be determined.
The iris contour can be the curve 212a that shows the limbic boundary between the iris 212
and the sclera 208. The iris contour can be determined using the example process 900
illustrated in FIG. 9 (described in greater detail below). The processes used for determining
the pupil contour and the iris contour can be the same or can be optimized for each
determination because, for example, the pupil size and the iris size can be different.
At block 720, a mask image for an irrelevant area in the eye image can be
determined. The mask image can have a dimension of n pixels x m pixels, where n denotes
the height in pixels and m denotes the width in pixels of the mask image. A dimension of the
semantically segmented eye image 800 and a dimension of the mask image can be the same
or can be different. The mask can be a binary mask image. A pixel of the binary mask
image can have a value of zero or a value of one. The pixel of the binary mask image can
have a value of zero if a corresponding pixel in the semantically segmented eye image 800
has a value greater than or equal to, for example, the third color value such as the numeric
value of three. The pixel of the binary mask image can have a value of one if a
corresponding pixel in the semantically segmented eye image 800 does not have a value
greater than or equal to, for example, the third color value such as the numeric value of three.
In some implementations, the process 700 can optionally create a polar image of the iris 212
of the eye 200 in the eye image using the pupil contour, the iris contour, and the mask for the
irrelevant area in the semantically segmented eye image. The process 700 ends at block 724.
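Assuming the numeric color values described above (one through four) and a NumPy array for the semantically segmented eye image, the binary mask at block 720 might be sketched as:

```python
import numpy as np

# Numeric color values from the description: 1 = background, 2 = sclera,
# 3 = iris, 4 = pupil (assumed encoding for this sketch).
def make_irrelevant_area_mask(segmented, third_color_value=3):
    """Zero where the pixel is iris or pupil (value >= the third color value),
    one for the remaining (irrelevant) background and sclera pixels."""
    return (segmented < third_color_value).astype(np.uint8)

seg = np.array([[1, 2, 2],
                [3, 3, 4],
                [3, 4, 4]], dtype=np.uint8)  # tiny hypothetical segmentation
mask = make_irrelevant_area_mask(seg)
```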
Example Process of Determining a Pupil Contour or an Iris Contour
FIG. 9 is a flow diagram of an example process 900 of determining a pupil
contour or an iris contour in a segmented eye image. The process 900 starts at block 904. At
block 908, a binary image can be created from a segmented eye image, such as the
semantically segmented eye image 800. FIG. 10A schematically illustrates an example
binary image 1000A created at block 908. The binary image 1000A can have a dimension of
n pixels x m pixels, where n denotes the height in pixels and m denotes the width in pixels of
the binary image 1000A. The dimension of the segmented eye image or the semantically
segmented eye image 800 and the dimension of the binary image 1000A can be the same or
can be different.
A pixel 1004a of the binary image 1000A can have a color value of zero if
a corresponding pixel in the semantically segmented eye image 800 has a value not greater
than or equal to a threshold color value, for example the “fourth color value.” A pixel 1012a
of the binary image 1000A can have a color value of one if a corresponding pixel in the
semantically segmented eye image 800 has a value greater than or equal to a threshold color
value, for example the “fourth color value.” In some implementations, pixels of the binary
image 1000A can have values other than zero or one. For example, the pixel 1004a of the
binary image 1000A can have a color value of the “third color value,” such as the numeric value
three. The pixel 1012a of the binary image 1000A can have a color value of the “fourth color
value,” such as the numeric value four, where the “fourth color value” is greater than the
“third color value.”
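As a minimal sketch (assuming NumPy and the zero/one variant described first), the binary image at block 908 can be produced by thresholding the segmented image at a chosen color value:

```python
import numpy as np

def make_binary_image(segmented, threshold_color_value):
    """Pixels >= the threshold color value become 1; all others become 0."""
    return (segmented >= threshold_color_value).astype(np.uint8)

# Tiny hypothetical semantically segmented eye image
# (1 = background, 2 = sclera, 3 = iris, 4 = pupil).
seg = np.array([[1, 2, 2],
                [3, 3, 4],
                [3, 4, 4]], dtype=np.uint8)

pupil_binary = make_binary_image(seg, 4)  # threshold at the "fourth color value"
iris_binary = make_binary_image(seg, 3)   # threshold at the "third color value"
```

The same function serves both contours: thresholding at the fourth color value isolates the pupil, while thresholding at the third color value isolates the iris together with the pupil.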
With reference to FIG. 9, at block 912, contours in the binary image
1000A are determined. For example, contours in the binary image 1000A can be determined
using, for example, the OpenCV findContours function (available from opencv.org). FIG.
10B schematically illustrates an example contour 1016 in the binary image 1000A. Referring
to FIG. 9, at block 916, a contour border can be determined. The contour border can be the
longest contour in the binary image 1000A. The contour 1016 in the binary image 1000A
can be the longest contour in the binary image 1000A. The contour 1016 can include a
plurality of pixels of the binary image 1000A, such as the pixel 1024a.
At block 920, a contour points bounding box (e.g., a contour points
bounding box 1020 in FIG. 10B) is determined. The contour points bounding box 1020 can
be the smallest rectangle enclosing the longest contour border such as the contour border 1016.
At block 924, a points area size can be determined. The points area size can be a diagonal
1028 of the contour points bounding box 1020 in the binary image 1000A in FIG. 10B.
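Blocks 916-924 can be sketched as follows (NumPy only; in practice the contour list would come from the OpenCV findContours function, and the point layout below is hypothetical):

```python
import numpy as np

def longest_contour(contours):
    """Pick the contour with the most points as the contour border."""
    return max(contours, key=len)

def bounding_box_diagonal(contour):
    """Diagonal of the smallest axis-aligned rectangle enclosing the contour,
    used as the points area size."""
    pts = np.asarray(contour).reshape(-1, 2)  # (x, y) contour points
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return float(np.hypot(x1 - x0, y1 - y0))

contours = [np.array([[1, 1], [4, 1], [4, 5], [1, 5]]),  # a 3 x 4 box
            np.array([[0, 0], [1, 0]])]
points_area_size = bounding_box_diagonal(longest_contour(contours))
```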
At block 928, a second binary image can be created from a segmented eye
image, such as the semantically segmented eye image 800. FIG. 10C schematically
illustrates an example second binary image 1000C. The second binary image 1000C can
have a dimension of n pixels x m pixels, where n denotes the height in pixels and m denotes
the width in pixels of the second binary image 1000C. The dimension of the binary image
1000A and the dimension of the second binary image 1000C can be the same or can be different.
A pixel 1004c of the second binary image 1000C can have a color value of
zero if a corresponding pixel in the semantically segmented eye image 800 has a value not
greater than or equal to a threshold color value, for example the “third color value.” A pixel
1012c of the second binary image 1000C can have a color value of one if a corresponding
pixel in the semantically segmented eye image 800 has a value greater than or equal to a
threshold color value, for example the “third color value.” In some implementations, pixels
of the second binary image 1000C can have values other than zero or one. For example, the
pixel 1004c of the second binary image 1000C can have a color value of the “second color
value,” such as the numeric value two. The pixel 1012c of the second binary image 1000C
can have a color value of the “third color value,” such as the numeric value three, where the
“third color value” is greater than the “second color value.”
With reference to FIG. 9, at block 932, a pixel (e.g., a pixel 1024c in FIG.
10C) in the second binary image 1000C that corresponds to the pixel 1024a in the binary
image 1000A is determined. If a dimension of the second binary image 1000C and a
dimension of the binary image 1000A are the same, then the pixel 1024c can have a
coordinate of (m1; n1) in the second binary image 1000C and the pixel 1024a can have a
coordinate of (m1; n1) in the binary image 1000A, wherein m1 denotes the coordinate in the
width direction and n1 denotes the coordinate in the height direction. A distance between the
pixel 1024c and a pixel in the second binary image 1000C that has a color value of 0 and is
closest to the pixel 1024c is determined. For example, the distance can be a distance 1032 in
FIG. 10C between the pixel 1024c and the pixel 1036 in the second binary image 1000C that
has a color value of 0 and is closest to the pixel 1024c. The distance 1032 can be determined
using, for example, the OpenCV distanceTransform function.
At block 936, the pixel 1024a can be removed from the pixels of the
contour 1016 if it is inappropriate for determining a pupil contour. The pixel 1024a can be
inappropriate for determining a pupil contour if the distance 1032 is smaller than a
predetermined threshold. The predetermined threshold can be a fraction multiplied by a size
of the contour points bounding box 1020, such as the points area size or a size of a diagonal
1028 of the contour points bounding box 1020 in FIG. 10B. The fraction can be in the range
from 0.02 to 0.20. For example, the fraction can be 0.08.
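Blocks 932-936 might be sketched as below (a brute-force nearest-zero search stands in for the OpenCV distanceTransform function; the arrays are hypothetical):

```python
import numpy as np

def distance_to_nearest_zero(binary, point):
    """Distance from `point` (row, col) to the closest zero-valued pixel."""
    zeros = np.argwhere(binary == 0)
    return float(np.min(np.hypot(zeros[:, 0] - point[0],
                                 zeros[:, 1] - point[1])))

def keep_contour_pixels(contour_pts, second_binary, points_area_size,
                        fraction=0.08):
    """Remove contour pixels whose distance to the nearest zero pixel in the
    second binary image is below fraction * points_area_size."""
    threshold = fraction * points_area_size
    return [p for p in contour_pts
            if distance_to_nearest_zero(second_binary, p) >= threshold]

second_binary = np.ones((10, 10), dtype=np.uint8)
second_binary[0, :] = 0                # zero pixels along the top row
pts = [(1, 5), (8, 5)]                 # one pixel near the zeros, one far away
kept = keep_contour_pixels(pts, second_binary, points_area_size=50.0)
```

With the fraction 0.08 and a points area size of 50, the threshold is 4, so the pixel one row from the zero border is removed while the distant pixel survives for the curve fit.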
At block 940, a pupil contour can be determined from the remaining pixels
of the contour border 1016 by fitting a curve (such as an ellipse) to the remaining pixels. The
ellipse can be determined using, for example, the OpenCV fitEllipse function. The process
900 ends at block 944. Although FIGS. 10A-10C have been used to illustrate using the
process 900 to determine a pupil contour, the process 900 can also be used to determine an
iris contour.
Example Pupil Contour and Iris Contour Determination
FIG. 11 shows example results of determining iris contours, pupil contours,
and masks for irrelevant image areas using the example processes 700 and 900 illustrated in
FIGS. 7 and 9. Panels a-f show example results of determining an iris contour, a
pupil contour, and a mask for irrelevant image area of an eye image. Panel a shows
an eye image. Panel b shows a semantically segmented eye image of the eye image
in panel a, computed using a convolutional neural network 100 with the merged convolutional
network architecture illustrated in FIGS. 3A-3C. The semantically segmented eye image included
a background region 1104a with a numeric color value of one, a sclera region 1108a with a
numeric color value of two, an iris region 1112a with a numeric color value of three, or a
pupil region 1116a of the eye image with a numeric color value of four.
Panel c shows the remaining pixels 1120a of a contour border of
the pupil and the remaining pixels 1124a of a contour border of the iris overlaid on the eye
image shown in panel a, determined using the process 900 at block 936. Panel d
shows the remaining pixels 1120a of the contour border of the pupil and the
remaining pixels 1124a of the contour border of the iris overlaid on the semantically
segmented eye image shown in panel b. Panel e shows an ellipse of the
pupil 1128a and an ellipse of the iris 1132a determined by fitting the remaining pixels of the
contour border of the pupil 1120a and the contour border of the iris 1124a by the process 900
at block 940. Panel f shows a binary mask image for an irrelevant area in the eye
image determined by the process 700 at block 720. The binary mask image includes a region 1136a that
corresponds to the iris region 1112a and the pupil region 1116a of the semantically
segmented eye image shown in panel b. The binary mask image also includes a
region 1140a that corresponds to the background region 1104a and the sclera region 1108a.
Similar to panels a-f, panels g-l show example results of
determining an iris contour, a pupil contour, and a mask for irrelevant image area of another
eye image.
Example Iris Authentication Using a CNN with a Triplet Network Architecture Trained on
Segmented Polar Images
FIGS. 12A-12B show example results of training a convolutional neural
network (CNN) with a triplet network architecture on iris images in polar coordinates
obtained after fitting pupil contours and iris contours with the example processes shown in
FIGS. 7 and 9. The triplet network architecture is shown in FIG. 13 and described in greater
detail below.
FIG. 12A is a histogram plot of the probability density vs. embedding
distance. The iris images of the same subjects were closer together in the embedding space,
and the iris images of different subjects were further away from one another in the
embedding space. FIG. 12B is a receiver operating characteristic (ROC) curve of true positive rate
(TPR) vs. false positive rate (FPR). The area under the ROC curve was 99.947%. Using iris
images in polar coordinates to train the CNN with a triplet network architecture, 0.884%
EER was achieved.
Triplet Network Architecture
Using images of the human eye, a convolutional neural network (CNN)
with a triplet network architecture can be trained to learn an embedding that maps from the
higher dimensional eye image space to a lower dimensional embedding space. The
dimension of the eye image space can be quite large. For example, an eye image of 256
pixels by 256 pixels can potentially include thousands or tens of thousands of degrees of
freedom. FIG. 13 is a block diagram of an example convolutional neural network 1300 with
a triplet network architecture. A CNN 1300 can be trained to learn an embedding 1304
(Emb). The embedding 1304 can be a function that maps an eye image (Img) 1308 in the
higher dimensional eye image space into an embedding space representation (EmbImg) of
the eye image in a lower dimensional embedding space. For example, Emb(Img) = EmbImg.
The eye image (Img) 1308 can be an iris image in polar coordinates computed using a pupil
contour and an iris contour determined with the example processes shown in FIGS. 7 and 9.
The embedding space representation, a representation of the eye image in
the embedding space, can be an n-dimensional real number vector. The embedding space
representation of an eye image can be an n-dimensional eye description. The dimension of
the representations in the embedding space can be different in different implementations. For
example, the dimension can be in a range from 16 to 2048. In some implementations, n is
128. The elements of the embedding space representations can be represented by real
numbers. In some architectures, the embedding space representation is represented as n
floating point numbers during training, but it may be quantized to n bytes for authentication.
Thus, in some cases, each eye image is represented by an n-byte representation.
Representations in an embedding space with larger dimension may perform better than those
with lower dimension but may require more training. The embedding space representation
can have, for example, unit length.
The CNN 1300 can be trained to learn the embedding 1304 such that the
distance between eye images, independent of imaging conditions, of one person (or of one
person's left or right eye) in the embedding space is small because they are clustered together
in the embedding space. In contrast, the distance between a pair of eye images of different
persons (or of a person's different eyes) can be large in the embedding space because they are
not clustered together in the embedding space. Thus, the distance between the eye images
from the same person in the embedding space, the embedding distance, can be smaller than
the distance between the eye images from different persons in the embedding space. The
distance between two eye images can be, for example, the Euclidean distance (an L2 norm)
between the embedding space representations of the two eye images.
The distance between two eye images of one person, for example an
anchor eye image (ImgA) 1312a and a positive eye image (ImgP) 1312p, can be small in the
embedding space. The distance between two eye images of different persons, for example
the anchor eye image (ImgA) 1312a and a negative eye image (ImgN) 1312n, can be larger in
the embedding space. The ImgA 1312a is an “anchor” image because its embedding space
representation can be compared to embedding space representations of eye images of the
same person (e.g., the ImgP 1312p) and different persons (e.g., the ImgN 1312n). The ImgP 1312p
is a “positive” image because the ImgP 1312p and the ImgA 1312a are eye images of the
same person. The ImgN 1312n is a “negative” image because the ImgN 1312n and the ImgA
1312a are eye images of different persons. Thus, the distance between the ImgA 1312a and
the ImgP 1312p in the embedding space can be smaller than the distance between the ImgA
1312a and the ImgN 1312n in the embedding space.
The embedding network (Emb) 1304 can map the ImgA 1312a, the ImgP
1312p, and the ImgN 1312n in the higher dimensional eye image space into an anchor
embedding image (EmbA) 1316a, a positive embedding image (EmbP) 1316p, and a negative
embedding image (EmbN) 1316n. For example, Emb(ImgA) = EmbA; Emb(ImgP) = EmbP;
and Emb(ImgN) = EmbN. Thus, the distance between the EmbA 1316a and the EmbP 1316p
in the embedding space can be smaller than the distance between the EmbA 1316a and the EmbN
1316n in the embedding space.
To learn the embedding 1304, a training set T1 of eye images 1308 can be
used. The eye images 1308 can be iris images in polar coordinates computed using a pupil
contour and an iris contour determined with the example processes shown in FIGS. 7-9. The
eye images 1308 can include the images of left eyes and right eyes. The eye images 1308
can be associated with labels, where the labels distinguish the eye images of one person from
eye images of another person. The labels can also distinguish the eye images of the left eye
and the right eye of a person. The training set T1 can include pairs of eye image and label
(Img; Label). The training set T1 of (Img; Label) pairs can be received from an eye image
data store.
To learn the embedding 1304, the CNN 1300 with a triplet network
architecture can include three identical embedding networks, for example an anchor
embedding network (ENetworkA) 1320a, a positive embedding network (ENetworkP)
1320p, and a negative embedding network (ENetworkN) 1320n. The embedding networks
1320a, 1320p, or 1320n can map eye images from the eye image space into embedding space
representations of the eye images in the embedding space. For example, the ENetworkA
1320a can map an ImgA 1312a into an EmbA 1316a. The ENetworkP 1320p can map an
ImgP 1312p into an EmbP 1316p. The ENetworkN 1320n can map an ImgN 1312n into an
EmbN 1316n.
The convolutional neural network 1300 with the triplet network
architecture can learn the embedding 1304 with a triplet training set T2 including triplets of
eye images. Two eye images of a triplet are from the same person, for example the ImgA
1312a and the ImgP 1312p. The third eye image of the triplet is from a different person, for
example the ImgN 1312n. The ENetworkA 1320a, the ENetworkP 1320p, and the
ENetworkN 1320n can map triplets of (ImgA; ImgP; ImgN) into triplets of (EmbA; EmbP;
EmbN). The eye authentication trainer 1304 can generate the triplet training set T2 from the
training set T1 of (Img; Label) pairs.
The ImgA 1312a, the ImgP 1312p, or the ImgN 1312n can be different in
different implementations. For example, the ImgA 1312a and the ImgP 1312p can be eye
images of one person, and the ImgN 1312n can be an eye image of another person. As
another example, the ImgA 1312a and the ImgP 1312p can be images of one person's left
eye, and the ImgN 1312n can be an image of the person's right eye or an eye image of
another person.
The triplet network architecture can be used to learn the embedding 1304
such that an eye image of a person in the embedding space is closer to all other eye images of
the same person in the embedding space than it is to an eye image of any other person in the
embedding space. For example, |EmbA – EmbP| < |EmbA – EmbN|, where |EmbA – EmbP|
denotes the absolute distance between the EmbA 1316a and the EmbP 1316p in the
embedding space, and |EmbA – EmbN| denotes the absolute distance between the EmbA
1316a and the EmbN 1316n in the embedding space.
In some implementations, the triplet network architecture can be used to
learn the embedding 1304 such that an image of a person's left eye in the embedding space is
closer to all images of the same person's left eye in the embedding space than it is to any
image of the person's right eye or any eye image of another person in the embedding space.
The dimension of the embedding space representations can be different in
different implementations. The dimension of the EmbA 1316a, EmbP 1316p, and EmbN
1316n can be the same, for example 431. The length of the embedding space representation
can be different in different implementations. For example, the EmbA 1316a, EmbP 1316p,
or EmbN 1316n can be normalized to have unit length in the embedding space using L2
normalization. Thus, the embedding space representations of the eye images are on a
hypersphere in the embedding space.
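The L2 normalization and the embedding distance can be sketched as follows (NumPy, with a hypothetical 2-dimensional embedding for brevity):

```python
import numpy as np

def l2_normalize(embedding):
    """Scale an embedding space representation to unit length."""
    return embedding / np.linalg.norm(embedding)

def embedding_distance(emb1, emb2):
    """Euclidean (L2) distance between two embedding space representations."""
    return float(np.linalg.norm(emb1 - emb2))

emb = l2_normalize(np.array([3.0, 4.0]))  # lies on the unit hypersphere
```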
The triplet network architecture can include a triplet loss layer 1324
configured to compare the EmbA 1316a, the EmbP 1316p, and the EmbN 1316n. The
embedding 1304 learned with the triplet loss layer 1324 can map eye images of one person
onto a single point or a cluster of points in close proximity in the embedding space. The
triplet loss layer 1324 can minimize the distance between eye images of the same person in
the embedding space, for example the EmbA 1316a and the EmbP 1316p. The triplet loss
layer 1324 can maximize the distance between eye images of different persons in the
embedding space, for example the EmbA 1316a and the EmbN 1316n.
The triplet loss layer 1324 can compare the EmbA 1316a, the EmbP
1316p, and the EmbN 1316n in a number of ways. For example, the triplet loss layer 1324
can compare the EmbA 1316a, the EmbP 1316p, and the EmbN 1316n by computing:
Maximum(0, |EmbA – EmbP|² – |EmbA – EmbN|² + m), Equation (1)
where |EmbA – EmbP| denotes the absolute distance between the EmbA 1316a and the
EmbP 1316p in the embedding space, |EmbA – EmbN| denotes the absolute distance between
the EmbA 1316a and the EmbN 1316n, and m denotes a margin. The margin can be different
in different implementations. For example, the margin can be 0.16 or another number in a
range from 0.01 to 1.0. Thus, in some implementations, the embedding 1304 can be learned
from eye images of a plurality of persons, such that the distance in the embedding space
between the eye images from the same person is smaller than the distance in the embedding
space between eye images from different persons. In terms of the particular implementation
of Equation (1), the squared distance in the embedding space between all eye images from
the same person is small, and the squared distance in the embedding space between a pair of
eye images from different persons is large.
The function of the margin m used in comparing the EmbA 1316a, the
EmbP 1316p, and the EmbN 1316n can be different in different implementations. For
example, the margin m can enforce a margin between each pair of eye images of one person
and eye images of all other persons in the embedding space. Accordingly, the embedding
space representations of one person's eye images can be clustered closely together in the
embedding space. At the same time, the distances between the embedding space representations of different
persons' eye images can be maintained or maximized. As another example, the margin m
can enforce a margin between each pair of images of one person's left eye and images of the
person's right eye or eye images of all other persons.
During an iteration of the learning of the embedding 1304, the triplet loss
layer 1324 can compare the EmbA 1316a, the EmbP 1316p, and the EmbN 1316n for
different sets of triplets. For example, the triplet loss layer 1324 can compare the EmbA
1316a, the EmbP 1316p, and the EmbN 1316n for all triplets (EmbA; EmbP; EmbN) in the
triplet training set T2. As another example, the triplet loss layer 1324 can compare the EmbA
1316a, the EmbP 1316p, and the EmbN 1316n for a batch of triplets (EmbA; EmbP; EmbN) in
the triplet training set T2. The number of triplets in the batch can be different in different
implementations. For example, the batch can include 64 triplets of (EmbA; EmbP; EmbN).
As another example, the batch can include all the triplets (EmbA; EmbP; EmbN) in the triplet
training set T2.
During an iteration of learning the embedding 1304, the triplet loss layer
1324 can compare the EmbA 1316a, the EmbP 1316p, and the EmbN 1316n for a batch of
triplets (EmbA; EmbP; EmbN) by computing a triplet loss. The triplet loss can be, for
example,
Σᵢ₌₁ⁿ Maximum(0, |EmbA(i) – EmbP(i)|² – |EmbA(i) – EmbN(i)|² + m), Equation (2)
where n denotes the number of triplets in the batch of triplets; and EmbA(i), EmbP(i), and
EmbN(i) denote the ith EmbA 1316a, EmbP 1316p, and EmbN 1316n in the batch of
triplets.
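The batch triplet loss of Equation (2) can be sketched directly in NumPy (the batch below is hypothetical and uses 2-dimensional embeddings for brevity):

```python
import numpy as np

def triplet_loss(emb_a, emb_p, emb_n, margin=0.16):
    """Sum over the batch of max(0, |EmbA - EmbP|^2 - |EmbA - EmbN|^2 + m),
    as in Equation (2)."""
    d_ap = np.sum((emb_a - emb_p) ** 2, axis=1)  # squared anchor-positive distances
    d_an = np.sum((emb_a - emb_n) ** 2, axis=1)  # squared anchor-negative distances
    return float(np.sum(np.maximum(0.0, d_ap - d_an + margin)))

anchors = np.array([[0.0, 0.0], [1.0, 0.0]])
positives = np.array([[0.1, 0.0], [1.0, 0.1]])  # close to their anchors
negatives = np.array([[1.0, 0.0], [0.0, 0.0]])  # far from their anchors
loss = triplet_loss(anchors, positives, negatives)  # zero: margin satisfied
```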
During the learning of the embedding 1304, the eye authentication trainer
1304 can update the ENetworkA 1320a, the ENetworkP 1320p, and the ENetworkN 1320n
based on the comparison between a batch of triplets (EmbA; EmbP; EmbN), for example the
triplet loss between a batch of triplets (EmbA; EmbP; EmbN). The eye authentication trainer
1304 can update the ENetworkA 1320a, the ENetworkP 1320p, and the ENetworkN
1320n periodically, for example every iteration or every 1,000 iterations. The eye authentication
trainer 1304 can update the ENetworkA 1320a, the ENetworkP 1320p, and the ENetworkN
1320n to optimize the embedding space. Optimizing the embedding space can be different in
different implementations. For example, optimizing the embedding space can include
minimizing Equation (2). As another example, optimizing the embedding space can include
minimizing the distance between the EmbA 1316a and the EmbP 1316p and maximizing the
distance between the EmbA 1316a and the EmbN 1316n.
After iterations of optimizing the embedding space, one or more of the
following can be computed: an embedding 1304 that maps eye images from the higher
dimensional eye image space into representations of the eye images in a lower dimensional
embedding space; or a threshold value 1328 for a user device to determine whether the
embedding space representation of a user's eye image is similar enough to an authorized
user's eye image in the embedding space such that the user should be authenticated as the
authorized user. The embedding 1304 or the threshold value 1328 can be determined without
specifying the features of eye images that can or should be used in computing the embedding
1304 or the threshold value 1328.
The threshold value 1328 can be different in different implementations.
For example, the threshold value 1328 can be the largest distance between eye images of the
same person determined from the (ImgA; ImgP; ImgN) triplets during the last iteration of
learning the embedding 1304. As another example, the threshold value 1328 can be the
median distance between eye images of the same person determined from the (ImgA; ImgP;
ImgN) triplets during the last iteration of learning the embedding 1304. As yet another
example, the threshold value 1328 can be smaller than the largest distance between eye
images of the different persons determined from the (ImgA; ImgP; ImgN) triplets during the
last iteration of learning the embedding 1304.
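A threshold check of this kind might be sketched as follows (the embeddings and threshold below are hypothetical illustration values, not outputs of the trained network):

```python
import numpy as np

def authenticate(user_emb, authorized_emb, threshold):
    """Authenticate when the embedding distance is within the threshold 1328."""
    distance = float(np.linalg.norm(user_emb - authorized_emb))
    return distance <= threshold

authorized = np.array([0.6, 0.8])     # enrolled unit-length embedding
probe_same = np.array([0.62, 0.785])  # a capture of the same eye
probe_other = np.array([-0.8, 0.6])   # a capture of a different person's eye
```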
The number of iterations required to learn the embedding 1304 can be
different in different implementations. For example, the number of iterations can be
100,000. As another example, the number of iterations may not be predetermined and can
depend on iterations required to learn an embedding 1304 with satisfactory characteristics
such as having an equal error rate (EER) of 2%. As yet another example, the number of
iterations can depend on iterations required to obtain a satisfactory triplet loss.
The ability of the embedding 1304 to distinguish unauthorized users and
ized users can be different in different implementations. For example, the false
positive rate (FPR) of the embedding 1304 can be 0.01%; and the true ve rate (TPR) of
the embedding 1304 can be 99.99%. As another example, the false negative rate (FNR) of
the ing 1304 can be 0.01%; and the true negative rate (TNR) of the embedding 1304
can be 99.99%. The equal error rate (EER) of the embedding 1304 can be 1%, for example.
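The equal error rate mentioned above can be illustrated with a minimal threshold-sweep estimator (a sketch assuming hypothetical genuine and impostor distance arrays; this sweep-based convention is illustrative, not a method recited herein):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds and return the error rate at the point
    where the false rejection rate (on genuine pairs) is closest to the
    false acceptance rate (on impostor pairs)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine > t)    # genuine pairs wrongly rejected
        far = np.mean(impostor <= t)  # impostor pairs wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer

# Hypothetical, well-separated score distributions give an EER of zero.
genuine = np.array([0.1, 0.2, 0.3])
impostor = np.array([0.8, 0.9, 1.0])
eer = equal_error_rate(genuine, impostor)
```

Overlapping genuine and impostor distributions would yield a nonzero EER, such as the 1% figure given above.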
Example Wearable Display System
In some embodiments, a user device can be, or can be included, in a
wearable display device, which may advantageously provide a more immersive virtual reality
(VR), augmented reality (AR), or mixed reality (MR) experience, where digitally reproduced
images or portions thereof are presented to a wearer in a manner wherein they seem to be, or
may be perceived as, real.
Without being limited by theory, it is believed that the human eye
typically can interpret a finite number of depth planes to provide depth perception.
Consequently, a highly believable simulation of perceived depth may be achieved by
providing, to the eye, different presentations of an image corresponding to each of these
limited number of depth planes. For example, displays containing a stack of waveguides
may be configured to be worn positioned in front of the eyes of a user, or viewer. The stack
of waveguides may be utilized to provide three-dimensional perception to the eye/brain by
using a plurality of waveguides to direct light from an image injection device (e.g., discrete
displays or output ends of a multiplexed display which pipe image information via one or
more optical fibers) to the viewer’s eye at particular angles (and amounts of divergence)
corresponding to the depth plane associated with a particular waveguide.
In some embodiments, two stacks of waveguides, one for each eye of a
viewer, may be utilized to provide different images to each eye. As one example, an
augmented reality scene may be such that a wearer of an AR technology sees a real-world
park-like setting featuring people, trees, buildings in the background, and a concrete
platform. In addition to these items, the wearer of the AR technology may also perceive that
he “sees” a robot statue standing upon the real-world platform, and a cartoon-like avatar
character flying by which seems to be a personification of a bumble bee, even though the
robot statue and the bumble bee do not exist in the real world. The stack(s) of waveguides
may be used to generate a light field corresponding to an input image and in some
implementations, the wearable display comprises a wearable light field display. Examples of
wearable display device and waveguide stacks for providing light field images are described
in U.S. Patent Publication No. 2015/0016777, which is hereby incorporated by reference
herein in its entirety for all it contains.
FIG. 14 illustrates an example of a wearable display system 1400 that can
be used to present a VR, AR, or MR experience to a display system wearer or viewer 1404.
The wearable display system 1400 may be programmed to perform any of the applications or
embodiments described herein (e.g., eye image segmentation, eye image quality estimation,
pupil contour determination, or iris contour determination). The display system 1400
includes a display 1408, and various mechanical and electronic modules and systems to
support the functioning of that display 1408. The display 1408 may be coupled to a frame
1412, which is wearable by the display system wearer or viewer 1404 and which is
configured to position the display 1408 in front of the eyes of the wearer 1404. The display
1408 may be a light field display. In some embodiments, a speaker 1416 is coupled to the
frame 1412 and positioned adjacent the ear canal of the user; in some embodiments, another
speaker, not shown, is positioned adjacent the other ear canal of the user to provide for
stereo/shapeable sound control. The display 1408 is operatively coupled 1420, such as by a
wired lead or wireless connectivity, to a local data processing module 1424 which may be
mounted in a variety of configurations, such as fixedly attached to the frame 1412, fixedly
attached to a helmet or hat worn by the user, embedded in headphones, or otherwise
removably attached to the user 1404 (e.g., in a backpack-style configuration, in a
belt-coupling style configuration).
The local processing and data module 1424 may comprise a hardware
processor, as well as non-transitory digital memory, such as non-volatile memory (e.g., flash
memory), both of which may be utilized to assist in the processing, caching, and storage of
data. The data may include data (a) captured from sensors (which may be, e.g., operatively
coupled to the frame 1412 or otherwise attached to the wearer 1404), such as image capture
devices (such as cameras), microphones, inertial measurement units, accelerometers,
compasses, GPS units, radio devices, and/or gyros; and/or (b) acquired and/or processed
using remote processing module 1428 and/or remote data repository 1432, possibly for
passage to the display 1408 after such processing or retrieval. The local processing and data
module 1424 may be operatively coupled to the remote processing module 1428 and remote
data repository 1432 by communication links 1436, 1440, such as via a wired or wireless
communication links, such that these remote modules 1428, 1432 are operatively coupled to
each other and available as resources to the local processing and data module 1424. The
image capture device(s) can be used to capture the eye images used in the eye image
segmentation, eye image quality estimation, pupil contour determination, or iris contour
determination procedures.
In some embodiments, the remote processing module 1428 may comprise
one or more processors configured to analyze and process data and/or image information
such as video information captured by an image capture device. The video data may be
stored locally in the local processing and data module 1424 and/or in the remote data
repository 1432. In some embodiments, the remote data repository 1432 may comprise a
digital data storage facility, which may be available through the internet or other networking
configuration in a “cloud” resource configuration. In some embodiments, all data is stored
and all computations are performed in the local processing and data module 1424, allowing
fully autonomous use from a remote module.
In some implementations, the local processing and data module 1424
and/or the remote processing module 1428 are programmed to perform embodiments of eye
image segmentation, eye image quality estimation, pupil contour determination, or iris
contour determination disclosed herein. For example, the local processing and data module
1424 and/or the remote processing module 1428 can be programmed to perform
embodiments of the processes 500, 600, 700, or 900 described with reference to FIGS. 5, 6,
7, or 9. The local processing and data module 1424 and/or the remote processing module
1428 can be programmed to use the eye image segmentation, eye image quality estimation,
pupil contour determination, or iris contour determination techniques disclosed herein in
biometric extraction, for example to identify or authenticate the identity of the wearer 1404.
The image capture device can capture video for a particular application (e.g., video of the
wearer’s eye for an eye-tracking application or video of a wearer’s hand or finger for a
gesture identification application). The video can be analyzed using the CNN 100 by one or
both of the processing modules 1424, 1428. In some cases, off-loading at least some of the
eye image segmentation, eye image quality estimation, pupil contour determination, or iris
contour determination to a remote processing module (e.g., in the “cloud”) may improve
efficiency or speed of the computations. The parameters of the CNN 100 (e.g., weights, bias
terms, subsampling factors for pooling layers, number and size of kernels in different layers,
number of feature maps, etc.) can be stored in data modules 1424 and/or 1432.
The results of the video analysis (e.g., the output of the CNN 100) can be
used by one or both of the processing modules 1424, 1428 for additional operations or
processing. For example, in various CNN applications, biometric identification, eye-tracking,
recognition or classification of gestures, objects, poses, etc. may be used by the
wearable display system 1400. For example, video of the wearer’s eye(s) can be used for eye
image segmentation or image quality estimation, which, in turn, can be used by the
processing modules 1424, 1428 for iris contour determination or pupil contour determination
of the wearer 1404 through the display 1408. The processing modules 1424, 1428 of the
wearable display system 1400 can be programmed with one or more embodiments of eye
image segmentation, eye image quality estimation, pupil contour determination, or iris
contour determination to perform any of the video or image processing applications
described herein.
Embodiments of the CNN 100 can be used to segment eye images and
provide image quality estimation in other biometric applications. For example, an eye
scanner in a biometric security system (such as, e.g., those used at transportation depots such
as airports, train stations, etc., or in secure facilities) that is used to scan and analyze the eyes
of users (such as, e.g., passengers or workers at the secure facility) can include an eye-imaging
camera and hardware programmed to process eye images using embodiments of the
CNN 100. Other applications of the CNN 100 are possible such as for biometric
identification (e.g., generating iris codes), eye gaze tracking, and so forth.
Additional Aspects
In a 1st aspect, a method for eye image segmentation and image quality
estimation is disclosed. The method is under control of a hardware processor and comprises:
receiving an eye image; processing the eye image using a convolution neural network to
generate a segmentation of the eye image; and processing the eye image using the
convolution neural network to generate a quality estimation of the eye image, wherein the
convolution neural network comprises a segmentation tower and a quality estimation tower,
wherein the segmentation tower comprises segmentation layers and shared layers, wherein
the quality estimation tower comprises quality estimation layers and the shared layers,
wherein a first output layer of the shared layers is connected to a first input layer of the
segmentation tower and a second input layer of the segmentation tower, wherein the first
output layer of the shared layers is connected to an input layer of the quality estimation layer,
and wherein receiving the eye image comprises receiving the eye image by an input layer of
the shared layers.
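The shared-layer and two-tower arrangement recited in this aspect can be illustrated with a minimal forward-pass sketch (plain NumPy functions stand in for the actual convolutional layers; all shapes, stand-in weights, and function names are hypothetical and illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_layers(eye_image):
    # Stand-in for the shared layers: produce a feature map whose
    # output feeds both towers.
    return eye_image.mean(axis=-1, keepdims=True)  # (H, W, 1) "features"

def segmentation_tower(features):
    # Stand-in for the segmentation layers: one channel per region
    # (background, sclera, iris, pupil), then a per-pixel label.
    logits = np.concatenate(
        [features * w for w in (1.0, 0.5, -0.5, -1.0)], axis=-1)
    return logits.argmax(axis=-1)  # per-pixel label in {0, 1, 2, 3}

def quality_estimation_tower(features):
    # Stand-in for the quality estimation layers: one scalar score.
    return float(1.0 / (1.0 + np.exp(-features.mean())))

eye_image = rng.random((4, 4, 3))            # hypothetical 4x4 RGB eye image
features = shared_layers(eye_image)          # computed once by shared layers...
segmentation = segmentation_tower(features)  # ...consumed by both towers
quality = quality_estimation_tower(features)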
In a 2nd aspect, the method of aspect 1, wherein a second output layer of
the shared layers is connected to a third input layer of the segmentation tower.
In a 3rd aspect, the method of any one of aspects 1-2, wherein processing
the eye image using the convolution neural network to generate the segmentation of the eye
image comprises generating the segmentation of the eye image using the segmentation tower,
and wherein an output of an output layer of the segmentation tower is the segmentation of the
eye image.
In a 4th aspect, the method of aspect 3, wherein the segmentation of the
eye image includes a background, a sclera, an iris, or a pupil of the eye image.
In a 5th aspect, the method of any one of aspects 1-4, wherein processing
the eye image using the convolution neural network to generate the quality estimation of the
eye image comprises generating the quality estimation of the eye image using the quality
estimation tower, and wherein an output of an output layer of the quality estimation tower
comprises the quality estimation of the eye image.
In a 6th aspect, the method of any one of aspects 1-5, wherein the quality
estimation of the eye image is a good quality estimation or a bad quality estimation.
In a 7th aspect, the method of any one of aspects 1-6, wherein the shared
layers, the segmentation layers, or the quality estimation layers comprise a convolution layer,
a brightness normalization layer, a batch normalization layer, a rectified linear layer, an
upsampling layer, a concatenation layer, a pooling layer, a fully connected layer, a linear
fully connected layer, a softsign layer, or any combination thereof.
In an 8th aspect, a method for eye image segmentation and image quality
estimation is disclosed. The method is under control of a hardware processor and comprises:
receiving an eye image; processing the eye image using a convolution neural network to
generate a segmentation of the eye image; and processing the eye image using the
convolution neural network to generate a quality estimation of the eye image.
In a 9th aspect, the method of aspect 8, wherein the convolution neural
network comprises a segmentation tower and a quality estimation tower, wherein the
segmentation tower comprises segmentation layers and shared layers, wherein the quality
estimation tower comprises quality estimation layers and the shared layers, and wherein
receiving the eye image comprises receiving the eye image by an input layer of the shared
layers.
In a 10th aspect, the method of aspect 9, wherein a first output layer of the
shared layers is connected to a first input layer of the segmentation tower.
In an 11th aspect, the method of aspect 10, wherein the first output layer of
the shared layers is connected to a second input layer of the segmentation tower.
In a 12th aspect, the method of any one of aspects 10-11, wherein the first
output layer of the shared layers is connected to an input layer of the quality estimation
tower.
In a 13th aspect, the method of any one of aspects 9-12, wherein
processing the eye image using the convolution neural network to generate the segmentation
of the eye image comprises generating the segmentation of the eye image using the
segmentation tower, and wherein an output of an output layer of the segmentation tower is
the segmentation of the eye image.
In a 14th aspect, the method of any one of aspects 9-13, wherein the
segmentation of the eye image includes a background, a sclera, an iris, or a pupil of the eye
image.
In a 15th aspect, the method of any one of aspects 9-14, wherein
processing the eye image using the convolution neural network to generate the quality
estimation of the eye image comprises generating the quality estimation of the eye image
using the quality estimation tower, and wherein an output of an output layer of the quality
estimation tower is the quality estimation of the eye image.
In a 16th aspect, the method of any one of aspects 9-15, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a
convolution layer, a batch normalization layer, a rectified linear layer, an upsampling layer, a
concatenation layer, a pooling layer, a fully connected layer, a linear fully connected layer, or
any combination thereof.
In a 17th aspect, the method of aspect 16, wherein the batch normalization
layer is a batch local contrast normalization layer or a batch local response normalization
layer.
In a 18th aspect, the method of any one of aspects 9-17, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a brightness
normalization layer, a softsign layer, or any combination thereof.
In a 19th aspect, the method of any one of aspects 8-18, wherein the eye
image is captured by an image sensor of a user device for authentication.
In a 20th aspect, the method of any one of aspects 8-19, wherein the
segmentation of the eye image comprises mostly the iris portion of the eye image.
In a 21st aspect, the method of any one of aspects 8-19, wherein the
segmentation of the eye image comprises mostly the retina portion of the eye image.
In a 22nd aspect, a method for training a convolution neural network for
eye image segmentation and image quality estimation is disclosed. The method is under
control of a hardware processor and comprises: obtaining a training set of eye images;
providing a convolutional neural network with the training set of eye images; and training the
convolutional neural network with the training set of eye images, wherein the convolution
neural network comprises a segmentation tower and a quality estimation tower, wherein the
segmentation tower comprises segmentation layers and shared layers, wherein the quality
estimation tower comprises quality estimation layers and the shared layers, wherein an output
layer of the shared layers is connected to a first input layer of the segmentation tower and a
second input layer of the segmentation tower, and wherein the output layer of the shared
layers is connected to an input layer of the quality estimation layer.
In a 23rd aspect, the method of aspect 22, wherein training the
convolutional neural network with the training set of eye images comprises: processing an
eye image of the training set using the segmentation tower to generate a segmentation of the
eye image; processing the eye image of the training set using the quality estimation tower to
generate a quality estimation of the eye image; computing a first difference between the
segmentation of the eye image and a reference segmentation of the eye image; computing a
second difference between the quality estimation of the eye image and a reference quality
estimation of the eye image; and updating parameters of the convolutional neural network
using the first difference and the second difference.
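The update step recited in this aspect can be sketched as follows (the gradient arrays are hypothetical stand-ins for gradients backpropagated from the first and second differences; the function name, learning rate, and values are illustrative only):

```python
import numpy as np

def training_step(params, seg_grad, quality_grad, lr=0.1):
    """One parameter update driven jointly by the segmentation difference
    and the quality-estimation difference. A real implementation would
    backpropagate both differences through the network; here the two
    gradient terms are supplied directly."""
    return params - lr * (seg_grad + quality_grad)

params = np.array([1.0, -2.0, 0.5])        # hypothetical network parameters
seg_grad = np.array([0.2, 0.0, -0.1])      # from the first difference
quality_grad = np.array([0.0, 0.4, 0.1])   # from the second difference
params = training_step(params, seg_grad, quality_grad)
```

Because both towers share layers, one such update simultaneously improves the segmentation and the quality estimation.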
In a 24th aspect, the method of aspect 23, wherein updating the parameters
of the convolutional neural network using the first difference and the second difference
comprises setting a first percentage of the parameters of the convolutional neural network to
values of zero during a first training cycle when training the convolutional neural network.
In a 25th aspect, the method of aspect 24, wherein setting the first
percentage of the parameters of the convolutional neural network to values of zero during the
first training cycle when training the convolutional neural network comprises randomly
setting the first percentage of the parameters of the convolutional neural network to values of
zero during the first training cycle when training the convolutional neural network.
In a 26th aspect, the method of any one of aspects 24-25, wherein
updating the parameters of the convolutional neural network using the first difference and the
second difference further comprises setting a second percentage of the parameters of the
convolutional neural network to values of zero during a second training cycle when training
the convolutional neural network.
In a 27th aspect, the method of aspect 26, wherein setting the second
percentage of the parameters of the convolutional neural network to values of zero during the
second training cycle when training the convolutional neural network comprises randomly
setting the second percentage of the parameters of the convolutional neural network to values
of zero during the second training cycle when training the convolutional neural network.
In a 28th aspect, the method of aspect 27, wherein the first percentage or
the second percentage is between 30% and 50%.
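The random parameter-zeroing recited in aspects 24-28 resembles dropout-style regularization and can be sketched as follows (the function name is hypothetical, and the 40% fraction is merely one value in the recited range):

```python
import numpy as np

def zero_random_fraction(params, fraction, rng):
    """Randomly set approximately the given fraction of parameters to
    zero for one training cycle (a dropout-style sketch)."""
    keep_mask = rng.random(params.shape) >= fraction
    return params * keep_mask

rng = np.random.default_rng(42)
params = np.ones(1000)                            # hypothetical parameters
dropped = zero_random_fraction(params, 0.4, rng)  # ~40% set to zero
```

A second training cycle would draw a fresh mask, so a different random subset of parameters is zeroed each cycle.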
In a 29th aspect, the method of any one of aspects 23-28, wherein the
segmentation of the eye image comprises a background, a sclera, an iris, or a pupil of the eye
image, and wherein the reference segmentation of the eye image comprises a reference
background, a reference sclera, a reference iris, or a reference pupil of the eye image.
In a 30th aspect, the method of any one of aspects 22-28, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a
convolution layer, a brightness normalization layer, a batch normalization layer, a rectified
linear layer, an upsampling layer, a concatenation layer, a pooling layer, a fully connected
layer, a linear fully connected layer, a softsign layer, or any combination thereof.
In a 31st aspect, a computer system is disclosed. The computer system
comprises: a hardware processor; and non-transitory memory having instructions stored
thereon, which when executed by the hardware processor cause the processor to perform the
method of any one of aspects 1-30.
In a 32nd aspect, the computer system of aspect 31, wherein the computer
system comprises a mobile device.
In a 33rd aspect, the computer system of aspect 32, wherein the mobile
device comprises a wearable display system.
In a 34th aspect, a method for determining eye contours in a semantically
segmented eye image is disclosed. The method is under control of a hardware processor and
comprises: receiving a semantically segmented eye image of an eye image comprising a
plurality of pixels, wherein a pixel of the semantically segmented eye image has a color
value, wherein the color value of the pixel of the semantically segmented eye image is a first
color value, a second color value, a third color value, or a fourth color value, wherein the
first color value corresponds to a background of the eye image, wherein the second color
value corresponds to a sclera of the eye in the eye image, wherein the third color value
corresponds to an iris of the eye in the eye image, and wherein the fourth color value
corresponds to a pupil of the eye in the eye image; determining a pupil contour using the
semantically segmented eye image; determining an iris contour using the semantically
segmented eye image; and determining a mask for an irrelevant area in the semantically
segmented eye image.
In a 35th aspect, the method of aspect 34, wherein the first color value is
greater than the second color value, wherein the second color value is greater than the third
color value, and wherein the third color value is greater than the fourth color value.
In a 36th aspect, the method of any one of aspects 34-35, wherein
determining the pupil contour using the semantically segmented eye image comprises:
creating a first binary image comprising a plurality of pixels, wherein a color value of a first
binary image pixel of the first binary image is the fourth color value if a corresponding pixel
in the semantically segmented eye image has a value greater than or equal to the fourth color
value, and the third color value if the corresponding pixel in the semantically segmented eye
image has a value not greater than or equal to the fourth color value; determining contours in
the first binary image; selecting a longest contour of the determined contours in the first
binary image as a pupil contour border; determining a pupil contour points bounding box
enclosing the pupil contour border; computing a pupil points area size as a diagonal of the
pupil contour points bounding box; creating a second binary image comprising a plurality of
pixels, wherein a color value of a second binary image pixel of the plurality of pixels of the
second binary image is the third color value if a corresponding pixel in the semantically
segmented eye image has a value greater than or equal to the third color value, and the
second color value if the corresponding pixel in the semantically segmented eye image has a
value not greater than or equal to the third color value; for a pupil contour border pixel of the
pupil contour border: determining a closest pixel in the second binary image that has a color
value of the second color value and that is closest to the pupil contour border pixel;
determining a distance between the pupil contour border pixel and the closest pixel in the
second binary image; and removing the pupil contour border pixel from the pupil contour
border if the distance between the pupil contour border pixel and the closest pixel in the
second binary image is smaller than a predetermined pupil contour threshold; and
determining the pupil contour as an ellipse from remaining pixels of the pupil contour border.
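A minimal sketch of the binarization, bounding-box diagonal, and distance-based pruning steps of this aspect follows. The color codes assume the ordering of aspect 35 (background > sclera > iris > pupil), the comparisons are read as separating darker labels from brighter ones, and the pupil region stands in for a traced contour border; a full implementation would additionally trace contours and fit an ellipse to the remaining pixels (e.g., with OpenCV's findContours and fitEllipse). All names and values are illustrative only:

```python
import numpy as np

# Hypothetical color codes ordered as recited in aspect 35.
BACKGROUND, SCLERA, IRIS, PUPIL = 4, 3, 2, 1

# A toy semantically segmented eye image.
seg = np.full((10, 10), BACKGROUND)
seg[2:8, 2:8] = SCLERA
seg[3:7, 3:7] = IRIS
seg[4:6, 4:6] = PUPIL

# First binary image: separates pupil pixels from everything brighter.
first_binary = np.where(seg <= PUPIL, PUPIL, IRIS)
# Second binary image: separates pupil-or-iris pixels from everything brighter.
second_binary = np.where(seg <= IRIS, IRIS, SCLERA)

# Pupil border pixels (stand-in for the longest traced contour).
pupil = first_binary == PUPIL
border = list(zip(*np.nonzero(pupil)))

# Pupil points area size: diagonal of the bounding box of the border.
rows, cols = np.nonzero(pupil)
diagonal = np.hypot(rows.max() - rows.min(), cols.max() - cols.min())

# Prune border pixels lying too close to a sclera pixel of the second
# binary image (e.g., pixels distorted by an occluding eyelid).
threshold = 0.1 * diagonal  # a fraction in the recited 0.02-0.20 range
sclera_pts = np.argwhere(second_binary == SCLERA)
kept = [p for p in border
        if np.hypot(*(sclera_pts - p).T).min() >= threshold]
```

The pupil contour would then be determined as an ellipse fitted to the `kept` pixels.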
In a 37th aspect, the method of any one of aspects 34-36, wherein
determining the iris contour using the semantically segmented eye image comprises: creating
a third binary image comprising a plurality of pixels, wherein a color value of a third binary
image pixel of the plurality of pixels of the third binary image is the third color value if a
corresponding pixel in the semantically segmented eye image has a value greater than or
equal to the third color value, and the second color value if the corresponding pixel in the
semantically segmented eye image has a value not greater than or equal to the third color
value; determining contours in the third binary image; selecting a longest contour of the
determined contours in the third binary image as an iris contour border; determining an iris
contour points bounding box enclosing the iris contour border; computing an iris points area
size as a diagonal of the iris contour points bounding box; creating a fourth binary image
comprising a plurality of pixels, wherein a color value of a fourth binary image pixel of the
plurality of pixels of the fourth binary image is the second color value if a corresponding
pixel in the semantically segmented eye image has a value greater than or equal to the second
color value, and the first color value if the corresponding pixel in the semantically segmented
eye image has a value not greater than or equal to the second color value; for an iris contour
border pixel of the iris contour border: determining a closest pixel in the fourth binary image that
has a color value of the first color value and that is closest to the iris contour border pixel;
determining a distance between the iris contour border pixel and the closest pixel in the
fourth binary image; and removing the iris contour border pixel from the iris contour border
if the distance between the iris contour border pixel and the closest pixel in the fourth binary
image is smaller than a predetermined iris contour threshold; and determining the iris contour
by determining an ellipse from remaining pixels of the iris contour border.
In a 38th aspect, the method of any one of aspects 34-37, wherein determining the
mask for the irrelevant area in the eye image comprises: creating a binary mask image
comprising a plurality of pixels, wherein a binary mask image pixel of the binary mask image
has a color value; setting the color value of the binary mask image pixel to the third color
value if a corresponding pixel in the semantically segmented eye image has a value greater
than or equal to the third color value; and setting the color value of the binary mask image
pixel to the second color value if a corresponding pixel in the semantically segmented eye
image has a value not greater than or equal to the third color value.
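The mask creation of this aspect can be sketched as follows (hypothetical color codes ordered per aspect 35; the comparison is read as selecting iris-and-pupil pixels as the relevant area, with the rest marked as the irrelevant area):

```python
import numpy as np

# Hypothetical color codes ordered as recited in aspect 35.
BACKGROUND, SCLERA, IRIS, PUPIL = 4, 3, 2, 1

# A toy semantically segmented eye image.
seg = np.array([[BACKGROUND, SCLERA, IRIS],
                [SCLERA,     IRIS,   PUPIL],
                [IRIS,       PUPIL,  PUPIL]])

# Binary mask image: third color value where the segmented pixel is iris
# or pupil (relevant), second color value elsewhere (irrelevant area).
mask = np.where(seg <= IRIS, IRIS, SCLERA)
```

Consistent with aspect 44, the mask image has the same dimensions as the semantically segmented eye image.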
In a 39th aspect, the method of any one of aspects 36-38, wherein the
predetermined pupil contour threshold is a fraction multiplied by the pupil points area size,
and wherein the fraction is in a range from 0.02 to 0.20.
In a 40th aspect, the method of any one of aspects 37-39, wherein the
predetermined iris contour threshold is a fraction multiplied by the iris points area size, and
wherein the fraction is in a range from 0.02 to 0.20.
In a 41st aspect, the method of any one of aspects 34-40, further
comprising creating a polar image of an iris of an eye in the eye image from the eye image
using the pupil contour, the iris contour, and the mask for the irrelevant area in the
semantically segmented eye image.
In a 42nd aspect, the method of any one of aspects 34-41, wherein
receiving the semantically segmented eye image of an eye image comprising a plurality of
pixels comprises: receiving an eye image; processing the eye image using a convolution
neural network to generate the semantically segmented eye image; and processing the eye
image using the convolution neural network to generate a quality estimation of the eye
image, wherein the convolution neural network comprises a segmentation tower and a quality
estimation tower, wherein the segmentation tower comprises segmentation layers and shared
layers, wherein the quality estimation tower comprises quality estimation layers and the
shared layers, wherein a first output layer of the shared layers is connected to a first input
layer of the segmentation tower and a second input layer of the segmentation tower, wherein
the first output layer of the shared layers is connected to an input layer of the quality
estimation layer, and wherein receiving the eye image comprises receiving the eye image by
an input layer of the shared layers.
In a 43rd aspect, a method for determining eye contours in a semantically
segmented eye image is disclosed. The method is under control of a hardware processor and
comprises: receiving a semantically segmented eye image of an eye image; determining a
pupil contour of an eye in the eye image using the semantically segmented eye image;
determining an iris contour of the eye in the eye image using the semantically segmented eye
image; and determining a mask for an irrelevant area in the eye image.
In a 44th aspect, the method of aspect 43, wherein a dimension of the
semantically segmented eye image and a dimension of the mask image are the same.
In a 45th aspect, the method of any one of aspects 43-44, wherein the
semantically segmented eye image comprises a plurality of pixels, and wherein a color value
of a pixel of the semantically segmented eye image corresponds to a background of the eye
image, a sclera of the eye in the eye image, an iris of the eye in the eye image, or a pupil of
the eye in the eye image.
In a 46th aspect, the method of aspect 45, wherein the color value of the
pixel of the semantically segmented eye image is a first color value, a second color value, a
third color value, or a fourth color value, wherein the first color value corresponds to the
background of the eye image, wherein the second color value corresponds to the sclera of the
eye in the eye image, wherein the third color value corresponds to the iris of the eye in the
eye image, and wherein the fourth color value corresponds to the pupil of the eye in the eye
image.
In a 47th aspect, the method of aspect 46, wherein the first color value is
greater than the second color value, wherein the second color value is greater than the third
color value, and wherein the third color value is greater than the fourth color value.
In a 48th aspect, the method of any one of aspects 46-47, wherein
determining the pupil contour using the semantically segmented eye image comprises:
creating a first binary image from the semantically segmented eye image; determining a
longest pupil contour in the first binary image; creating a second binary image from the
segmented eye image; removing a longest pupil contour pixel of the longest pupil contour
using the second binary image that is inappropriate for determining the pupil contour; and
determining the pupil contour as an ellipse from remaining pixels of the longest pupil contour
in the first binary image.
In a 49th aspect, the method of aspect 48, wherein a pixel of the first
binary image has a first binary image color value if a corresponding pixel in the semantically
segmented eye image has a value greater than or equal to the fourth color value, and a second
binary image color value otherwise, wherein the first binary image color value is greater than
the second binary image color value, and wherein a pixel of the second binary image has the
first binary image color value if a corresponding pixel in the semantically segmented eye
image has a value greater than or equal to the third color value, and the second binary image
color value otherwise.
In a 50th aspect, the method of any one of aspects 48-49, wherein
removing the longest pupil contour pixel of the longest pupil contour using the second binary
image that is inappropriate for determining the pupil contour comprises: determining a
distance between the longest pupil contour pixel and a pixel in the second binary image that
has the second binary image color value and is closest to the longest pupil contour pixel; and
removing the longest pupil contour pixel from the longest pupil contour if the distance is
smaller than a predetermined pupil contour threshold.
In a 51st aspect, the method of aspect 50, wherein determining the
distance between the longest pupil contour pixel and the pixel in the second binary image
that has the second binary image color value and is closest to the longest pupil contour pixel
comprises: determining a distance between a pixel in the second binary image corresponding
to the longest pupil contour pixel and the pixel in the second binary image that has the
second binary image color value and is closest to the pixel in the second binary image
corresponding to the longest pupil contour pixel.
In a 52nd aspect, the method of any one of aspects 48-49, further
comprising determining a smallest bounding box enclosing the longest pupil contour in the
first binary image.
In a 53rd aspect, the method of aspect 52, further comprising determining
a size of the smallest bounding box enclosing the longest pupil contour in the first binary
image.
In a 54th aspect, the method of aspect 53, wherein the size of the smallest
bounding box enclosing the longest pupil contour in the first binary image is a diagonal of
the smallest bounding box enclosing the longest pupil contour in the first binary image.
In a 55th aspect, the method of any one of aspects 53-54, wherein the
predetermined pupil contour threshold is a fraction multiplied by the size of the smallest
bounding box enclosing the longest pupil contour in the first binary image, and wherein the
fraction is in a range from 0.02 to 0.20.
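Aspects 52 to 55 can be sketched as follows; the axis-aligned bounding box and the fraction value 0.1 are illustrative assumptions within the stated 0.02 to 0.20 range:

```python
import math

def pupil_contour_threshold(contour_pixels, fraction=0.1):
    """Threshold = fraction * diagonal of the smallest (axis-aligned) bounding
    box enclosing the longest pupil contour."""
    rows = [r for r, _ in contour_pixels]
    cols = [c for _, c in contour_pixels]
    diagonal = math.hypot(max(rows) - min(rows), max(cols) - min(cols))
    return fraction * diagonal

# A contour spanning a 3x4 box has diagonal 5, giving a threshold of 0.5.
threshold = pupil_contour_threshold([(0, 0), (3, 0), (0, 4), (3, 4)])
```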
In a 56th aspect, the method of any one of aspects 48-55, wherein
determining the iris contour using the semantically segmented eye image comprises: creating
a third binary image from the semantically segmented eye image; determining a longest iris
contour in the third binary image; creating a fourth binary image from the semantically
segmented eye image; removing a longest iris contour pixel of the longest iris contour using
the fourth binary image that is inappropriate for determining the iris contour; and
determining the iris contour as an ellipse from remaining pixels of the longest iris contour in
the third binary image.
In a 57th aspect, the method of aspect 56, wherein a pixel of the third
binary image has the first binary image color value if a corresponding pixel in the
semantically segmented eye image has a value greater than or equal to the third color value,
and the second binary image color value otherwise, and wherein a pixel of the fourth binary
image has the first binary image color value if a corresponding pixel in the semantically
segmented eye image has a value greater than or equal to the second color value, and the
second binary image color value otherwise.
In a 58th aspect, the method of any one of aspects 56-57, wherein
removing the longest iris contour pixel of the longest iris contour using the fourth binary
image that is inappropriate for determining the iris contour comprises: determining a distance
between the longest iris contour pixel and a pixel in the fourth binary image that has the
second binary image color value and is closest to the longest iris contour pixel; and removing
the longest iris contour pixel from the longest iris contour if the distance between the longest
iris contour pixel and the pixel in the fourth binary image is smaller than a predetermined iris
contour threshold.
In a 59th aspect, the method of aspect 58, wherein determining the
distance between the longest iris contour pixel and the pixel in the fourth binary image that
has the second binary image color value and is closest to the longest iris contour pixel
comprises: determining a distance between a pixel in the fourth binary image corresponding
to the longest iris contour pixel and the pixel in the fourth binary image that has a color value
of the second binary image color value and is closest to the pixel in the fourth binary image
corresponding to the longest iris contour pixel.
In a 60th aspect, the method of any one of aspects 56-57, further
comprising determining a smallest bounding box enclosing the longest iris contour in the
third binary image.
In a 61st aspect, the method of aspect 60, further comprising determining a
size of the smallest bounding box enclosing the longest iris contour in the third binary image.
In a 62nd aspect, the method of aspect 61, wherein the size of the smallest
bounding box enclosing the longest iris contour in the third binary image is a diagonal of the
smallest bounding box enclosing the longest iris contour in the third binary image.
In a 63rd aspect, the method of any one of aspects 61-62, wherein the
predetermined iris contour threshold is a fraction multiplied by the size of the smallest
bounding box enclosing the longest iris contour in the third binary image, wherein the fraction
is in a range from 0.02 to 0.20.
In a 64th aspect, the method of any one of aspects 49-63, wherein
determining the mask for the irrelevant area in the eye image comprises creating a binary
mask image comprising a plurality of pixels, wherein a pixel of the binary mask image has
the first binary image color value if a corresponding pixel in the semantically segmented eye
image has a value greater than or equal to the third color value, and the second binary image
color value otherwise.
In a 65th aspect, the method of any one of aspects 43-64, further
comprising creating a polar image of an iris of an eye in the eye image from the eye image
using the pupil contour, the iris contour, and the mask for the irrelevant area in the
semantically segmented eye image.
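One common way to realize such a polar image is a "rubber sheet" unwrapping of the annulus between the pupil and iris contours. The sketch below is an assumption-laden simplification: it treats both contours as concentric circles (the aspects determine them as ellipses) and zeroes out pixels excluded by the mask:

```python
import numpy as np

def unwrap_iris(eye, mask, center, pupil_radius, iris_radius,
                n_radii=8, n_angles=16):
    """Sample the iris annulus into an (n_radii x n_angles) polar image."""
    h, w = eye.shape
    polar = np.zeros((n_radii, n_angles), dtype=eye.dtype)
    for i in range(n_radii):
        # Radii interpolate from the pupil contour out to the iris contour.
        r = pupil_radius + (iris_radius - pupil_radius) * i / (n_radii - 1)
        for j in range(n_angles):
            theta = 2.0 * np.pi * j / n_angles
            y = int(round(center[0] + r * np.sin(theta)))
            x = int(round(center[1] + r * np.cos(theta)))
            if 0 <= y < h and 0 <= x < w and mask[y, x]:
                polar[i, j] = eye[y, x]
    return polar

eye = np.full((20, 20), 7, dtype=np.uint8)
polar = unwrap_iris(eye, np.ones((20, 20), dtype=bool), (10, 10), 2.0, 6.0)
```

A production version would interpolate along the fitted ellipses rather than round to the nearest pixel.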
In a 66th aspect, the method of any one of aspects 43-65, wherein
receiving the semantically segmented eye image of an eye image comprises: receiving an eye
image; processing the eye image using a convolution neural network to generate the
segmentation of the eye image; and processing the eye image using the convolution neural
network to generate a quality estimation of the eye image.
In a 67th aspect, the method of any one of aspects 43-66, wherein
receiving the semantically segmented eye image of an eye image comprises: receiving an eye
image; processing the eye image using a convolution neural network to generate the
semantically segmented eye image; and processing the eye image using the convolution
neural network to generate a quality estimation of the eye image.
In a 68th aspect, a computer system is disclosed. The computer system
comprises: a hardware processor; and non-transitory memory having instructions stored
thereon, which when executed by the hardware processor cause the processor to perform the
method of any one of aspects 34-67.
In a 69th aspect, the computer system of aspect 68, wherein the computer
system comprises a mobile device.
In a 70th aspect, the computer system of aspect 69, wherein the mobile
device comprises a wearable display system. The wearable display system may comprise a
head-mounted augmented or virtual reality display system.
In a 71st aspect, a system for eye image segmentation and image quality
estimation, the system comprising: an eye-imaging camera configured to obtain an eye
image; non-transitory memory configured to store the eye image; a hardware processor in
communication with the non-transitory memory, the hardware processor programmed to:
receive the eye image; process the eye image using a convolution neural network to generate
a segmentation of the eye image; and process the eye image using the convolution neural
network to generate a quality estimation of the eye image, wherein the convolution neural
network comprises a segmentation tower and a quality estimation tower, wherein the
segmentation tower comprises segmentation layers and shared layers, wherein the quality
estimation tower comprises quality estimation layers and the shared layers, wherein a first
output layer of the shared layers is connected to a first input layer of the segmentation tower
and to a second input layer of the segmentation tower, at least one of the first input layer or
the second input layer comprising a concatenation layer, wherein the first output layer of the
shared layers is connected to an input layer of the quality estimation layer, and wherein the
eye image is received by an input layer of the shared layers.
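The tower wiring of this aspect can be illustrated with a stand-in dataflow sketch (NumPy only; each function below is an arbitrary fixed transform standing in for trained layers, not the actual convolutional architecture):

```python
import numpy as np

def shared_layers(eye_image):
    """Stands in for the shared layers; its result is the first output layer."""
    return eye_image * 0.5

def segmentation_tower(shared_out):
    """The shared output feeds a first input layer and, via a concatenation
    layer, a second input layer of the segmentation tower."""
    x = shared_out + 1.0                         # first input layer
    x = np.concatenate([x, shared_out], axis=0)  # second input layer (concatenation)
    return x                                     # stands in for the segmentation

def quality_estimation_tower(shared_out):
    """The same shared output also feeds the quality estimation tower."""
    return float(shared_out.mean())              # stands in for the quality estimation

eye_image = np.ones((4, 4))            # eye image received by the shared layers
shared_out = shared_layers(eye_image)  # computed once, consumed by both towers
segmentation = segmentation_tower(shared_out)
quality = quality_estimation_tower(shared_out)
```

The point of the merged architecture is visible in the sketch: the shared computation runs once and both outputs branch from it.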
In a 72nd aspect, the system of aspect 71, wherein a second output layer of
the shared layers is connected to a third input layer of the segmentation tower, the third input
layer comprising a concatenation layer.
In a 73rd aspect, the system of any one of aspects 71 or 72, wherein to
process the eye image using the convolution neural network to generate the segmentation of
the eye image, the hardware processor is programmed to: generate the segmentation of the
eye image using the segmentation tower, wherein an output of an output layer of the
segmentation tower comprises the segmentation of the eye image.
In a 74th aspect, the system of any one of aspects 71 to 73, wherein the
segmentation of the eye image includes a background, a sclera, an iris, or a pupil of the eye
image.
In a 75th aspect, the system of aspect 74, wherein the hardware processor
is further programmed to: determine a pupil contour of an eye in the eye image using the
segmentation of the eye image; determine an iris contour of the eye in the eye image using
the segmentation of the eye image; and determine a mask for an irrelevant area in the eye
image.
In a 76th aspect, the system of any one of aspects 71 to 75, wherein the
shared layers are configured to encode the eye image by decreasing a spatial dimension of
feature maps and increasing a number of feature maps computed by the shared layers.
In a 77th aspect, the system of aspect 76, wherein the segmentation layers
are configured to decode the eye image encoded by the shared layers by increasing the spatial
dimension of the feature maps and reducing the number of feature maps.
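Aspects 76 and 77 describe a typical encoder-decoder shape progression. A shape-only sketch follows; the halving and doubling factors are assumptions for illustration, as the aspects only fix the directions of change:

```python
def encode(spatial, channels, n_layers=3):
    """Shared layers: the spatial dimension of the feature maps decreases
    while the number of feature maps increases."""
    dims = [(spatial, channels)]
    for _ in range(n_layers):
        spatial //= 2
        channels *= 2
        dims.append((spatial, channels))
    return dims

def decode(spatial, channels, n_layers=3):
    """Segmentation layers: upsampling restores the spatial dimension while
    the number of feature maps is reduced."""
    dims = [(spatial, channels)]
    for _ in range(n_layers):
        spatial *= 2
        channels //= 2
        dims.append((spatial, channels))
    return dims

enc = encode(128, 8)    # [(128, 8), (64, 16), (32, 32), (16, 64)]
dec = decode(*enc[-1])  # mirrors the encoder back to (128, 8)
```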
In a 78th aspect, the system of any one of aspects 71 to 77, wherein to
process the eye image using the convolution neural network to generate the quality
estimation of the eye image, the hardware processor is programmed to: generate the quality
estimation of the eye image using the quality estimation tower, wherein an output of an
output layer of the quality estimation tower comprises the quality estimation of the eye
image.
In a 79th aspect, the system of any one of aspects 71 to 78, wherein the
quality estimation tower is configured to output at least two channels of output, wherein a
first of the at least two channels comprises a good quality estimation and a second of the at
least two channels comprises a bad quality estimation.
In an 80th aspect, the system of any one of aspects 71 to 79, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a
convolution layer, a brightness normalization layer, a batch normalization layer, a rectified
linear layer, an upsampling layer, a concatenation layer, a pooling layer, a fully connected
layer, a linear fully connected layer, a softsign layer, or any combination thereof.
In an 81st aspect, a system for eye image segmentation and image quality
estimation, the system comprising: an eye-imaging camera configured to obtain an eye
image; non-transitory memory configured to store the eye image; a hardware processor in
communication with the non-transitory memory, the hardware processor programmed to:
receive the eye image; process the eye image using a convolution neural network to generate
a segmentation of the eye image; and process the eye image using the convolution neural
network to generate a quality estimation of the eye image, wherein the convolution neural
network comprises a segmentation tower and a quality estimation tower, wherein the
segmentation tower comprises segmentation layers and shared layers, wherein the quality
estimation tower comprises quality estimation layers and the shared layers, wherein the
segmentation layers are not shared with the quality estimation tower, wherein the quality
estimation layers are not shared with the segmentation tower, and wherein the eye image is
received by an input layer of the shared layers.
In an 82nd aspect, the system of aspect 81, wherein a first output layer of
the shared layers is connected to a first input layer of the segmentation tower.
In an 83rd aspect, the system of aspect 82, wherein the first output layer of
the shared layers is connected to a second input layer of the segmentation tower, wherein the
first input layer or the second input layer comprises a concatenation layer.
In an 84th aspect, the system of aspect 82 or 83, wherein the first output
layer of the shared layers is further connected to an input layer of the quality estimation
tower.
In an 85th aspect, the system of any one of aspects 81 to 84, wherein to
process the eye image using the convolution neural network to generate the segmentation of
the eye image, the hardware processor is programmed to: generate the segmentation of the
eye image using the segmentation tower, wherein an output of an output layer of the
segmentation tower comprises the segmentation of the eye image.
In an 86th aspect, the system of any one of aspects 81 to 85, wherein the
segmentation of the eye image includes a background, a sclera, an iris, or a pupil of the eye
image.
In an 87th aspect, the system of any one of aspects 81 to 86, wherein to
process the eye image using the convolution neural network to generate the quality
estimation of the eye image, the hardware processor is programmed to: generate the quality
estimation of the eye image using the quality estimation tower, wherein an output of an
output layer of the quality estimation tower comprises the quality estimation of the eye
image.
In an 88th aspect, the system of any one of aspects 81 to 87, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a
convolution layer, a batch normalization layer, a rectified linear layer, an upsampling layer, a
concatenation layer, a pooling layer, a fully connected layer, a linear fully connected layer, or
any combination thereof.
In an 89th aspect, the system of aspect 88, wherein the batch
normalization layer is a batch local contrast normalization layer or a batch local response
normalization layer.
In a 90th aspect, the system of any one of aspects 81 to 89, wherein the
shared layers, the segmentation layers, or the quality estimation layers comprise a brightness
normalization layer, a softsign layer, or any combination thereof.
In a 91st aspect, the system of any one of aspects 71 to 90, further
comprising a display configured to display virtual images to a user of the system.
In a 92nd aspect, the system of aspect 91, wherein the display comprises a
light field display or a display configured to display the virtual images at multiple depth
planes.
In a 93rd aspect, the system of any one of aspects 71 to 92, wherein the
hardware processor is further programmed to calculate a biometric signature from a
segmentation of the eye image, wherein the segmentation is generated by the segmentation
tower of the convolution neural network.
In a 94th aspect, the system of aspect 93, wherein the biometric signature
comprises an iris code.
Each of the processes, methods, and algorithms described herein and/or
depicted in the attached figures may be embodied in, and fully or partially automated by,
code modules executed by one or more physical computing systems, hardware computer
processors, application-specific circuitry, and/or electronic hardware configured to execute
specific and particular computer instructions. For example, computing systems can include
general purpose computers (e.g., servers) programmed with specific computer instructions or
special purpose computers, special purpose circuitry, and so forth. A code module may be
compiled and linked into an executable program, installed in a dynamic link library, or may
be written in an interpreted programming language. In some implementations, particular
operations and methods may be performed by circuitry that is specific to a given function.
Further, certain implementations of the functionality of the present
disclosure are sufficiently mathematically, computationally, or technically complex that
application-specific hardware or one or more physical computing devices (utilizing
appropriate specialized executable instructions) may be necessary to perform the
functionality, for example, due to the volume or complexity of the calculations involved or to
provide results substantially in real-time. For example, a video may include many frames,
with each frame having millions of pixels, and specifically programmed computer hardware
is necessary to process the video data to provide a desired image processing task (e.g., eye
image segmentation and quality estimation using the CNN 100 with the merged architecture)
or application in a commercially reasonable amount of time.
Code modules or any type of data may be stored on any type of non-transitory
computer-readable medium, such as physical computer storage including hard
drives, solid state memory, random access memory (RAM), read only memory (ROM),
optical disc, volatile or non-volatile storage, combinations of the same and/or the like. The
methods and modules (or data) may also be transmitted as generated data signals (e.g., as part
of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable
transmission mediums, including wireless-based and cable-based mediums,
and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as
multiple discrete digital packets or frames). The results of the disclosed processes or process
steps may be stored, persistently or otherwise, in any type of non-transitory, tangible
computer storage or may be communicated via a computer-readable transmission medium.
Any processes, blocks, states, steps, or functionalities in flow diagrams
described herein and/or depicted in the attached figures should be understood as potentially
representing code modules, segments, or portions of code which include one or more
executable instructions for implementing specific functions (e.g., logical or arithmetical) or
steps in the process. The various processes, blocks, states, steps, or functionalities can be
combined, rearranged, added to, deleted from, modified, or otherwise changed from the
illustrative examples provided herein. In some embodiments, additional or different
computing systems or code modules may perform some or all of the functionalities described
herein. The methods and processes described herein are also not limited to any particular
sequence, and the blocks, steps, or states relating thereto can be performed in other sequences
that are appropriate, for example, in serial, in parallel, or in some other manner. Tasks or
events may be added to or removed from the disclosed example embodiments. Moreover,
the separation of various system components in the implementations described herein is for
illustrative purposes and should not be understood as requiring such separation in all
implementations. It should be understood that the described program components, methods,
and systems can generally be integrated together in a single computer product or packaged
into multiple computer products. Many implementation variations are possible.
The processes, methods, and systems may be implemented in a network
(or distributed) computing environment. Network environments include enterprise-wide
computer networks, intranets, local area networks (LAN), wide area networks (WAN),
personal area networks (PAN), cloud computing networks, crowd-sourced computing
networks, the Internet, and the World Wide Web. The network may be a wired or a wireless
network or any other type of communication network.
The systems and methods of the disclosure each have several innovative
aspects, no single one of which is solely responsible or required for the desirable attributes
disclosed herein. The various features and processes described above may be used
independently of one another, or may be combined in various ways. All possible
combinations and subcombinations are intended to fall within the scope of this disclosure.
Various modifications to the implementations described in this disclosure may be readily
apparent to those skilled in the art, and the generic principles defined herein may be applied
to other implementations without departing from the spirit or scope of this disclosure. Thus,
the claims are not intended to be limited to the implementations shown herein, but are to be
accorded the widest scope consistent with this disclosure, the principles and the novel
features disclosed herein.
Certain features that are described in this specification in the context of
separate implementations also can be implemented in combination in a single
implementation. Conversely, various features that are described in the context of a single
implementation also can be implemented in multiple implementations separately or in any
suitable subcombination. Moreover, although features may be described above as acting in
certain combinations and even initially claimed as such, one or more features from a claimed
combination can in some cases be excised from the combination, and the claimed
combination may be directed to a subcombination or variation of a subcombination. No
single feature or group of features is necessary or indispensable to each and every
embodiment.
Conditional language used herein, such as, among others, “can,” “could,”
“might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise
understood within the context as used, is generally intended to convey that certain
embodiments include, while other embodiments do not include, certain features, elements
and/or steps. Thus, such conditional language is not generally intended to imply that
features, elements and/or steps are in any way required for one or more embodiments or that
one or more embodiments necessarily include logic for deciding, with or without author input
or prompting, whether these features, elements and/or steps are included or are to be
performed in any particular embodiment. The terms “comprising,” “including,” “having,”
and the like are synonymous and are used inclusively, in an open-ended fashion, and do not
exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is
used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to
connect a list of elements, the term “or” means one, some, or all of the elements in the list. In
addition, the articles “a,” “an,” and “the” as used in this application and the appended claims
are to be construed to mean “one or more” or “at least one” unless specified otherwise.
As used herein, a phrase referring to “at least one of” a list of items refers
to any combination of those items, including single members. As an example, “at least one
of: A, B, or C” is intended to cover: A, B, C, A and B, A and C, B and C, and A, B, and C.
Conjunctive language such as the phrase “at least one of X, Y and Z,” unless specifically
stated otherwise, is otherwise understood with the context as used in general to convey that
an item, term, etc. may be at least one of X, Y or Z. Thus, such conjunctive language is not
generally intended to imply that certain embodiments require at least one of X, at least one of
Y and at least one of Z to each be present.
Similarly, while operations may be depicted in the drawings in a particular
order, it is to be recognized that such operations need not be performed in the particular order
shown or in sequential order, or that all illustrated operations be performed, to achieve
desirable results. Further, the drawings may schematically depict one or more example
processes in the form of a flowchart. However, other operations that are not depicted can be
incorporated in the example methods and processes that are schematically illustrated. For
example, one or more additional operations can be performed before, after, simultaneously,
or between any of the illustrated operations. Additionally, the operations may be rearranged
or reordered in other implementations. In certain circumstances, multitasking and parallel
processing may be advantageous. Moreover, the separation of various system components in
the implementations described above should not be understood as requiring such separation
in all implementations, and it should be understood that the described program components
and systems can generally be integrated together in a single software product or packaged
into multiple software products. Additionally, other implementations are within the scope of
the following claims. In some cases, the actions recited in the claims can be performed in a
different order and still achieve desirable results.
The reference in this specification to any prior publication (or information
derived from it), or to any matter which is known, is not, and should not be taken as an
acknowledgment or admission or any form of suggestion that the prior publication (or
information derived from it) or known matter forms part of the common general knowledge
in the field of endeavour to which this specification relates.
Claims (25)
1. A method for determining eye contours in a semantically segmented eye image, the method is under control of a hardware processor and comprises: receiving a semantically segmented eye image of an eye image comprising a plurality of pixels, wherein a pixel of the semantically segmented eye image has a color value, wherein the color value of the pixel of the semantically segmented eye image is a first color value, a second color value, a third color value, or a fourth color value, wherein the first color value corresponds to a background of the eye image, wherein the second color value corresponds to a sclera of the eye in the eye image, wherein the third color value corresponds to an iris of the eye in the eye image, wherein the fourth color value corresponds to a pupil of the eye in the eye image; determining a pupil contour using the semantically segmented eye image; and determining an iris contour using the semantically segmented eye image, wherein, the pupil contour is determined using a first binary image created based on the semantically segmented eye image, wherein a color value of a first binary image pixel of the first binary image is the fourth color value or the third color value, and/or the iris contour is determined using a second binary image created based on the semantically segmented eye image, wherein a color value of a second binary image pixel of the second binary image is the third color value or the second color value.
2. The method of claim 1, wherein the first color value is greater than the second color value, wherein the second color value is greater than the third color value, and wherein the third color value is greater than the fourth color value.
3. The method of claim 1 or claim 2, wherein determining the pupil contour using the semantically segmented eye image comprises: determining a pupil contour border; removing a plurality of pixels from the pupil contour border; and determining the pupil contour as an ellipse from remaining pixels of the pupil contour border.
4. The method of claim 3, wherein determining a pupil contour border further comprises: determining contours in the first binary image; and selecting a longest contour of the determined contours in the first binary image as a pupil contour border.
5. The method of claim 4, wherein the color value of the first binary image pixel of the first binary image is the fourth color value if a corresponding pixel in the semantically segmented eye image has a value greater than or equal to the fourth color value, and the third color value if the corresponding pixel in the semantically segmented eye image has a value not greater than or equal to the fourth color value.
6. The method of claim 4 or claim 5, comprising: determining a pupil contour points bounding box enclosing the pupil contour border; computing a pupil points area size as a diagonal of the pupil contour points bounding box; and determining a pupil contour threshold based on the pupil points area size.
7. The method of claim 6, wherein the pupil contour threshold is a fraction multiplied by the pupil points area size, and wherein the fraction is in a range from 0.02 to 0.20.
8. The method of any one of the claims 3 to 7, comprising creating a third binary image comprising a plurality of pixels, wherein a color value of a third binary image pixel of the plurality of pixels of the third binary image is the third color value or the second color value.
9. The method of claim 8, wherein the color value of the third binary image pixel of the plurality of pixels of the third binary image is the third color value if a corresponding pixel in the semantically segmented eye image has a value greater than or equal to the third color value, and the second color value if the corresponding pixel in the semantically segmented eye image has a value not greater than or equal to the third color value.
10. The method of claim 8 or claim 9, wherein removing a plurality of pixels from the pupil contour border comprises, for a pupil contour border pixel of the pupil contour border: determining a closest pixel in the third binary image that has a color value of the second color value and that is closest to the pupil contour border pixel; determining a distance between the pupil contour border pixel and the closest pixel in the third binary image; and removing the pupil contour border pixel from the pupil contour border if the distance between the pupil contour border pixel and the closest pixel in the third binary image is smaller than a pupil contour threshold.
11. The method of any one of the claims 1 to 10, wherein determining the iris contour using the semantically segmented eye image comprises: determining an iris contour border; removing a plurality of pixels from the iris contour border; and determining the iris contour as an ellipse from remaining pixels of the iris contour border.
12. The method of claim 11, wherein determining the iris contour border comprises: determining contours in the second binary image; and selecting a longest contour of the determined contours in the second binary image as an iris contour border.
13. The method of claim 12 comprising: determining an iris contour points bounding box enclosing the iris contour border; computing an iris points area size as a diagonal of the iris contour points bounding box; and determining an iris contour threshold based on the iris points area size.
14. The method of any one of the claims 11 to 13, wherein a color value of the second binary image pixel of the plurality of pixels of the second binary image is the third color value if a corresponding pixel in the semantically segmented eye image has a value greater than or equal to the third color value, and the second color value if the corresponding pixel in the semantically segmented eye image has a value not greater than or equal to the third color value.
15. The method of claim 14, wherein the iris contour threshold is a fraction multiplied by the iris points area size, and wherein the fraction is in a range from 0.02 to 0.20.
16. The method of any one of the claims 11 to 15, comprising creating a fourth binary image comprising a plurality of pixels, wherein a color value of a fourth binary image pixel of the plurality of pixels of the fourth binary image is the second color value or the first color value.
17. The method of claim 16, wherein a color value of a fourth binary image pixel of the plurality of pixels of the fourth binary image is the second color value if a corresponding pixel in the semantically segmented eye image has a value greater than or equal to the second color value, and the first color value if the corresponding pixel in the semantically segmented eye image has a value not greater than or equal to the second color value.
18. The method of claim 16 or claim 17, wherein removing a plurality of pixels from the iris contour border comprises, for an iris contour border pixel of the iris contour border: determining a closest pixel in the fourth binary image that has a color value of the first color value and that is closest to the iris contour border pixel; determining a distance between the iris contour border pixel and the closest pixel in the fourth binary image; and removing the iris contour border pixel from the iris contour border if the distance between the iris contour border pixel and the closest pixel in the fourth binary image is smaller than an iris contour threshold.
19. The method of any one of the claims 1 to 18, comprising determining a binary mask to cover an irrelevant area in the semantically segmented eye image.
20. The method of claim 19 wherein determining the binary mask to cover the irrelevant area in the eye image comprises: creating a binary mask image comprising a plurality of pixels, wherein a binary mask image pixel of the binary mask image has a color value; setting the color value of the binary mask image pixel to the third color value if a corresponding pixel in the semantically segmented eye image has a value greater than or equal to the third color value; and setting the color value of the binary mask image pixel to the second color value if a corresponding pixel in the semantically segmented eye image has a value not greater than or equal to the third color value.
21. The method of claim 19 or claim 20, wherein the irrelevant area comprises a portion of the semantically segmented eye image outside of an area defined by the iris contour.
22. The method of any one of the claims 19 to 21, comprising: applying the binary mask to the semantically segmented eye image to generate a relevant eye image comprising a portion of the eye image that excludes the irrelevant area; and calculating a biometric signature from the relevant eye image.
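Applying the mask of claims 19-21 can be sketched as an element-wise select; the zero fill value and the function name are assumptions, and the downstream biometric-signature computation (e.g. an iris code, per claim 23) is only indicated by a comment:

```python
import numpy as np

def relevant_eye_image(eye: np.ndarray, mask: np.ndarray, iris_value: int = 2) -> np.ndarray:
    """Zero out pixels outside the mask's iris region. The retained pixels
    would then feed a biometric-signature computation (e.g. an iris code)."""
    return np.where(mask == iris_value, eye, 0)
```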
23. The method of claim 22, wherein the biometric signature comprises an iris code.
24. The method of any one of the claims 1 to 23, further comprising creating a polar image of an iris of an eye in the eye image from the eye image using the pupil contour, the iris contour, and the mask for the irrelevant area in the semantically segmented eye image.
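The polar image of claim 24 can be sketched as a rubber-sheet style unwrapping of the iris annulus. This sketch simplifies the claim's contours to concentric circles (center, pupil radius, iris radius) and omits the irrelevant-area mask; nearest-neighbor sampling stands in for proper interpolation:

```python
import numpy as np

def polar_iris(eye, pupil_center, pupil_r, iris_r, n_theta=64, n_rad=16):
    """Sample the annulus between the pupil and iris circles into a
    rectangular (radius x angle) polar image, nearest-neighbor style."""
    cy, cx = pupil_center
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rads = np.linspace(pupil_r, iris_r, n_rad)
    out = np.zeros((n_rad, n_theta), dtype=eye.dtype)
    for i, r in enumerate(rads):
        for j, t in enumerate(thetas):
            y = int(round(cy + r * np.sin(t)))
            x = int(round(cx + r * np.cos(t)))
            if 0 <= y < eye.shape[0] and 0 <= x < eye.shape[1]:
                out[i, j] = eye[y, x]
    return out
```

Each column of the output follows one radial spoke from the pupil contour out to the iris contour, which is what makes the subsequent iris-code computation translation- and dilation-tolerant.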
25. The method of any one of the claims 1 to 24, wherein receiving the semantically segmented eye image of an eye image comprising a plurality of pixels comprises: receiving an eye image; processing the eye image using a convolution neural network to generate the semantically segmented eye image; and processing the eye image using the convolution neural network to generate a quality estimation of the eye image, wherein the convolution neural network comprises a segmentation tower and a quality estimation tower, wherein the segmentation tower comprises segmentation layers and shared layers, wherein the quality estimation tower comprises quality estimation layers and the shared layers, wherein a first output layer of the shared layers is connected to a first input layer of the segmentation tower and a second input layer of the segmentation tower, wherein the first output layer of the shared layers is connected to an input layer of the quality estimation layers, and wherein receiving the eye image comprises receiving the eye image by an input layer of the shared layers.

[FIG. 1: convolutional neural network 100 with shared layers receiving an eye image 124; a segmentation tower (segmentation layers, segmentation tower output 128) and a quality estimation tower (quality estimation layers, quality estimation tower output 132) both branch from the shared layers.]
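The shared-trunk/two-tower data flow of claim 25 can be sketched with toy dense layers standing in for convolution layers; the weight shapes, layer count, and function names are all assumptions made purely to show how one shared output feeds both towers:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    # Stand-in for a convolution layer: a dense map with ReLU.
    return np.maximum(x @ w, 0)

# Toy weights; the real towers would be stacks of convolution layers.
w_shared = rng.standard_normal((8, 8))
w_seg = rng.standard_normal((8, 4))    # 4 classes: background/sclera/iris/pupil
w_qual = rng.standard_normal((8, 1))   # scalar quality estimate

def forward(eye_features):
    shared = layer(eye_features, w_shared)   # shared layers' first output layer
    seg_logits = layer(shared, w_seg)        # segmentation tower
    quality = layer(shared, w_qual)          # quality estimation tower
    return seg_logits, quality
```

Because both towers consume the same shared-layer output, one forward pass through the trunk yields both the semantic segmentation and the quality estimation, which is the computational point of the shared-layers design.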
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2016138608 | 2016-09-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
NZ788647A true NZ788647A (en) | 2022-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2017337317B2 (en) | Neural network for eye image segmentation and image quality estimation | |
JP7237137B2 (en) | Deep neural network for iris identification | |
CN112446270B (en) | Training method of pedestrian re-recognition network, pedestrian re-recognition method and device | |
US10353271B2 (en) | Depth estimation method for monocular image based on multi-scale CNN and continuous CRF | |
US8774499B2 (en) | Embedded optical flow features | |
EP3861526A1 (en) | Semantic segmentation with soft cross-entropy loss | |
WO2021021554A1 (en) | Real time perspective correction on faces | |
CN113065645A (en) | Twin attention network, image processing method and device | |
CN109934112A (en) | A kind of face alignment method and camera | |
NZ788647A (en) | Neural network for eye image segmentation and image quality estimation | |
CN116363764A (en) | Fake video detection method and system based on forensic embedding and contextual embedding |