CN113269148A - Sight estimation method, sight estimation device, computer equipment and storage medium - Google Patents

Sight estimation method, sight estimation device, computer equipment and storage medium

Info

Publication number
CN113269148A
CN113269148A (application CN202110703027.7A)
Authority
CN
China
Prior art keywords
eye
image
abscissa
sight
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110703027.7A
Other languages
Chinese (zh)
Inventor
邹泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202110703027.7A priority Critical patent/CN113269148A/en
Publication of CN113269148A publication Critical patent/CN113269148A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18 Eye characteristics, e.g. of the iris
    • G06V 40/193 Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of artificial intelligence technologies, and in particular, to a gaze estimation method, apparatus, computer device, and storage medium. The gaze estimation method comprises: obtaining a face image and a gaze estimation model trained on a capsule network, wherein the gaze estimation model comprises an encoder and a decoder; performing feature point localization on the face image to obtain an eye region image; performing feature encoding on the eye region image through the encoder to obtain a human eye feature map, the human eye feature map comprising a plurality of human eye key points; decoding the human eye feature map through the decoder to obtain a heat map corresponding to the eye region image, the heat map comprising position information of each human eye key point; and performing gaze estimation according to the position information of the human eye key points to obtain a gaze estimation result. The gaze estimation method can effectively improve estimation accuracy. The invention also relates to the field of blockchain technology, and the face image may be stored in a blockchain.

Description

Sight estimation method, sight estimation device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a gaze estimation method, apparatus, computer device, and storage medium.
Background
Human eye gaze is important non-verbal information in human communication, and is currently applied in scenarios such as medicine, gaming, driving assistance and interactive applications. Early conventional gaze estimation methods estimate gaze mainly by fitting a two-dimensional regression function between changes in eye features and the fixation position, using geometric features of the eye such as the pupil, the cornea, or the Purkinje spots formed by the reflection of infrared light. However, conventional algorithms usually require certain prior experience and are therefore limited by environmental factors: in natural environments, the accuracy of gaze estimation varies under different conditions such as illumination changes, face occlusion and motion blur, so the accuracy of gaze estimation cannot be guaranteed.
Disclosure of Invention
The embodiments of the present invention provide a gaze estimation method, a gaze estimation apparatus, a computer device and a storage medium, aiming to solve the problem that existing gaze estimation is easily disturbed by environmental factors, resulting in inaccurate estimation.
A gaze estimation method, comprising:
acquiring a face image and a sight estimation model based on capsule network training; wherein the gaze estimation model comprises an encoder and a decoder;
positioning the characteristic points of the face image to obtain an eye area image;
carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map; wherein the human eye feature map comprises a plurality of human eye key points;
decoding the human eye characteristic graph through the decoder to obtain a heat map corresponding to the eye area image; wherein the heat map comprises location information for each of the human eye keypoints;
and performing sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
A gaze estimation device, comprising:
the data acquisition module is used for acquiring a face image and a capsule network training-based sight estimation model; wherein the gaze estimation model comprises an encoder and a decoder;
the characteristic point positioning module is used for positioning the characteristic points of the face image to obtain an eye region image;
the coding module is used for carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map; wherein the human eye feature map comprises a plurality of human eye key points;
the decoding module is used for decoding the human eye characteristic graph through the decoder to obtain a heat map corresponding to the eye area image; wherein the heat map comprises location information for each of the human eye keypoints;
and the sight estimation module is used for carrying out sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the gaze estimation method when executing the computer program.
A computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described gaze estimation method.
In the gaze estimation method, the gaze estimation device, the computer equipment and the storage medium, a face image and a gaze estimation model trained on a capsule network are obtained, feature point localization is then performed on the face image to obtain an eye region image, and feature encoding is performed on the eye region image through the encoder to obtain a human eye feature map, corresponding to the eye region image, that comprises a plurality of human eye key points. The capsule network is applied to the regression task of gaze estimation, so that the human eye key points are feature-encoded by means of the dynamic routing mechanism of the capsule network; spatial feature information is retained and the relevant eye feature information can be obtained effectively, making the gaze estimation model more robust. Then, feature regression is performed on the human eye feature map through the decoder to obtain a heat map corresponding to the eye region image; the heat map regression introduces spatially local features and enhances the spatial generalization capability of the gaze estimation model, which is more favorable for regressing the human eye key points. Finally, gaze estimation is performed according to the position information of the human eye key points to obtain a gaze estimation result, so that the estimation accuracy can be effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a gaze estimation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a gaze estimation method in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of a gaze estimation method in accordance with an embodiment of the present invention;
FIG. 4 is a flowchart of a gaze estimation method in accordance with an embodiment of the present invention;
FIG. 5 is a detailed flowchart of step S203 in FIG. 2;
FIG. 6 is a detailed flowchart of step S205 in FIG. 2;
FIG. 7 is a specific flowchart of step S601 in FIG. 6;
FIG. 8 is a detailed flowchart of step S602 in FIG. 6;
FIG. 9 is a schematic diagram of a gaze estimation device in accordance with an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The gaze estimation method may be applied in an application environment as in fig. 1, where a computer device communicates with a server over a network. The computer device may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server.
In an embodiment, as shown in fig. 2, a method for estimating a gaze is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
s201: acquiring a face image and a sight estimation model based on capsule network training; wherein the gaze estimation model comprises an encoder and a decoder.
The method can be applied to VR, video surveillance or other gaze tracking fields, and is used to track the gaze of a target person and achieve accurate gaze estimation. In this embodiment, the face image may be an image containing a face captured frame by frame from a live video stream, or an image containing a face captured from an uploaded offline video, which is not limited herein. Furthermore, to ensure the effectiveness of the subsequent gaze estimation, it may first be detected whether a frame captured from the video contains a face; if it does, the frame is used as the face image for subsequent gaze estimation. If it does not, the frame can be discarded and the next frame selected, or a corresponding frame can be selected from the video at a preset time interval, and the face detection step is repeated until an image containing a face is obtained.
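As an illustration of this frame-selection step, the following sketch scans a video with OpenCV and keeps the first frame that contains a face; the Haar-cascade detector, the sampling interval and the helper name are assumptions made for illustration and are not part of the disclosed method.

```python
# Minimal sketch of the frame-selection step described above (assumptions:
# OpenCV for capture/detection, a Haar-cascade face detector, and a 500 ms
# sampling interval; none of these are specified by the method itself).
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def next_face_image(video_path, interval_ms=500):
    """Return the first sampled frame that contains a face, or None."""
    cap = cv2.VideoCapture(video_path)
    pos_ms = 0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, pos_ms)
        ok, frame = cap.read()
        if not ok:                       # end of video: no face found
            cap.release()
            return None
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:               # frame contains a face: use it
            cap.release()
            return frame
        pos_ms += interval_ms            # otherwise skip ahead and try again
```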
Specifically, the gaze estimation model is obtained based on capsule network training. It can be understood that current convolutional neural networks reduce the dimension of the data through pooling operations such as max pooling, so that only the most active neurons are retained and passed to the next layer; this loses valuable spatial information and lowers the spatial resolution, the output stays essentially unchanged when the input changes slightly, and additional steps are then required to restore image details. In a capsule network, by contrast, each neuron is a vector neuron that carries a direction; the traditional pooling operation is abandoned, and low-level features are routed through a dynamic routing mechanism to the high-level capsules with which they agree most strongly. Detailed sample information (such as the precise position, rotation and size of a target) is therefore preserved rather than lost and later recovered, more image detail is retained, and the network is more interpretable.
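For reference, the squash non-linearity and the routing-by-agreement loop used by capsule networks can be sketched as follows; this follows the generic capsule-network formulation rather than the exact layer configuration of this model, and the array shapes are illustrative assumptions.

```python
# Generic routing-by-agreement sketch for a capsule layer (NumPy). Shapes are
# illustrative; this is not the exact configuration of the model above.
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink vector length into (0, 1) while preserving its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: (num_lower, num_upper, dim) prediction vectors from lower capsules."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)    # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)                  # weighted sum per upper capsule
        v = squash(s)                                           # upper-capsule outputs
        b = b + np.einsum("ijk,jk->ij", u_hat, v)               # agreement update
    return v
```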
Further, lower layer features in the capsule network will only be passed to the higher layers that match them, for example: low-level features such as eyes, mouth, etc., high-level features that will be transferred to the "face"; the low-level features of fingers, palms and the like are transmitted to the high-level features of the hands to construct the spatial relationship between the low-level features and the high-level features, so that the sight estimation model obtained by training is more robust.
S202: and carrying out characteristic point positioning on the face image to obtain an eye area image.
The face feature points are feature coordinate points obtained by inputting a standard face image into a feature point detection model for recognition. The face feature points comprise five points: the left eye, the right eye, the nose tip, the left mouth corner and the right mouth corner. Specifically, the standard face image is input into the feature point detection model for recognition, and the feature point detection model obtains the positioning coordinates of the five feature points. Further, the server crops the face image based on the positioning coordinates of the face feature points to obtain the eye region image.
The face feature point detection model can be obtained by training a DCNN (Deep Convolutional Neural Network) on images annotated with the positions of the face feature points.
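As an illustration of the cropping step just described, the sketch below cuts fixed-size patches around the located left-eye and right-eye points; the crop size and the landmark dictionary keys are illustrative assumptions.

```python
# Sketch of cropping eye regions around the located eye feature points.
# Assumptions: landmarks are (x, y) pixel coordinates and the 60 x 36 crop
# size is an arbitrary illustrative choice.
import numpy as np

def crop_eye(face_img: np.ndarray, eye_center, w: int = 60, h: int = 36):
    cx, cy = int(eye_center[0]), int(eye_center[1])
    x0, y0 = max(cx - w // 2, 0), max(cy - h // 2, 0)
    return face_img[y0:y0 + h, x0:x0 + w]

def crop_eye_regions(face_img, landmarks):
    """landmarks: dict with hypothetical keys 'left_eye' and 'right_eye'."""
    return crop_eye(face_img, landmarks["left_eye"]), crop_eye(face_img, landmarks["right_eye"])
```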
S203: Carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map; the human eye feature map comprises a plurality of human eye key points.
S204: decoding the eye characteristic graph through a decoder to obtain a heat map corresponding to the eye area image; wherein the heat map includes location information for each of the eye's keypoints.
The gaze estimation model comprises an encoder and a decoder. The encoder is used to encode low-level network features and high-level network features to obtain eye feature point information, and the decoder performs feature regression on the obtained eye feature point information to obtain the position information of the relevant feature points. Specifically, the gaze estimation model may be obtained by training on pre-prepared eye images and the heat maps corresponding to those eye images as training samples. Each heat map includes a plurality of labeled eye key points. Further, the human eye key points include a plurality of (e.g., 8) boundary points of the corneal limbus, a plurality of (e.g., 8) boundary points of the iris edge, an iris center point and a cornea center point.
In this embodiment, the heat map has the same size as the original image, and in the heat map, the closer a pixel is to a target key point, the larger its value, and vice versa. It can be understood that, compared with directly regressing coordinates, a heat map regression approach can introduce spatially local features and has strong spatial generalization capability, which is more favorable for regressing the key points.
In this embodiment, basic features are first extracted by a conventional convolutional network and then input into the capsule layer of the capsule network, where the main capsule layer combines the basic features through the dynamic routing mechanism and outputs 18-dimensional vectors focused on the eye feature point information. The 18-dimensional vectors output by the main capsule layer are then input into the decoder, which decodes the relevant eye feature information to obtain the position information of the 18 human eye key points.
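The readout from the decoder's heat maps to concrete key point positions can be illustrated with a simple arg-max per channel, as sketched below; the channels-first layout and the absence of sub-pixel refinement are assumptions made for brevity.

```python
# Sketch: read each human eye key point position from its heat map channel
# with an arg-max. A (18, H, W) channels-first layout is assumed.
import numpy as np

def heatmaps_to_keypoints(heatmaps: np.ndarray) -> np.ndarray:
    """heatmaps: (18, H, W); returns an (18, 2) array of (x, y) positions."""
    n, h, w = heatmaps.shape
    keypoints = np.zeros((n, 2))
    for k in range(n):
        idx = int(np.argmax(heatmaps[k]))
        y, x = divmod(idx, w)            # row-major flattening
        keypoints[k] = (x, y)            # (abscissa, ordinate)
    return keypoints
```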
S205: and performing sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
The gaze estimation result includes, but is not limited to, a pitch angle and a yaw angle. The pitch angle (pitch) refers to the rotation of the target object about the X-axis, and the yaw angle (yaw) refers to the rotation of the target object about the Y-axis. The position information includes the position coordinates of each human eye key point. Specifically, coordinate calculation is performed according to the position information, i.e., the position coordinates, of the plurality of human eye key points to obtain the pitch angle and the yaw angle, which can be used to determine the gaze direction of the human eye.
Further, by comparing the pitch angle and the yaw angle with preset thresholds, the gaze direction of the human eye, such as left, right, upper left or lower right, can be determined.
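A minimal sketch of such a thresholding rule follows; the 10-degree threshold, the sign conventions and the direction labels are illustrative assumptions, since no concrete values are specified here.

```python
# Sketch of mapping (pitch, yaw) to a coarse gaze direction with preset
# thresholds. The 10-degree threshold and the sign conventions are assumptions.
def gaze_direction(pitch_deg: float, yaw_deg: float, thresh: float = 10.0) -> str:
    vertical = "up" if pitch_deg > thresh else "down" if pitch_deg < -thresh else ""
    horizontal = "left" if yaw_deg > thresh else "right" if yaw_deg < -thresh else ""
    return f"{vertical} {horizontal}".strip() or "center"
```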
In this embodiment, a face image and a gaze estimation model trained on a capsule network are obtained, feature point localization is then performed on the face image to obtain an eye region image, and the eye region image is feature-encoded by the encoder to obtain a human eye feature map, corresponding to the eye region image, that comprises a plurality of human eye key points. The capsule network is applied to the regression task of gaze estimation, so the human eye key points are feature-encoded through its dynamic routing mechanism, spatial feature information is retained, and the relevant eye feature information can be obtained effectively, making the gaze estimation model more robust. Then, feature regression is performed on the human eye feature map through the decoder to obtain a heat map corresponding to the eye region image; regressing heat maps introduces spatially local features and enhances the spatial generalization capability of the gaze estimation model, which is more favorable for regressing the human eye key points. Finally, gaze estimation is performed according to the position information of the human eye key points to obtain a gaze estimation result, so that the estimation accuracy can be effectively improved.
In an embodiment, as shown in fig. 3, before step S203, the method further includes the following steps:
S301: Carrying out eye opening detection on the eye region image to obtain a detection result.
S302: If the detection result is that the eyes are open, performing the step of carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map.
In this embodiment, in order to further ensure the effectiveness of the subsequent gaze estimation, eye opening detection may be performed on the face image to detect whether the human eyes are open or closed. If open eyes are detected, the eye region image is considered valid and can be input into the gaze estimation model for gaze estimation. If closed eyes are detected, the eye region image is considered invalid, i.e., gaze estimation cannot be performed; the image can be discarded and the next frame selected, or a corresponding frame can be selected from the video at a preset time interval, and the eye opening detection step is repeated until a valid eye region image is obtained. The eye opening detection may be implemented by a pre-trained eye opening detection model, which may be trained on labeled open-eye and closed-eye images.
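The gating described above can be sketched as follows; `open_eye_model` and `gaze_model` are hypothetical objects whose interfaces are assumed for illustration.

```python
# Sketch of the eye-open gate: only open-eye regions are forwarded to the
# gaze estimation model. `open_eye_model` and `gaze_model` are hypothetical
# objects; their interfaces are assumed for illustration.
def estimate_if_eyes_open(eye_img, open_eye_model, gaze_model):
    if open_eye_model.predict(eye_img) == "open":
        return gaze_model.estimate(eye_img)   # valid image: run gaze estimation
    return None                               # closed eyes: discard this frame
```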
In one embodiment, as shown in fig. 4, the gaze estimation model corresponds to a target recognition region; wherein the target identification area is used for indicating a left eye or a right eye; before step S203, the method further includes the following steps:
s401: acquiring a face image and a sight estimation model based on capsule network training; wherein the gaze estimation model comprises an encoder and a decoder.
Specifically, step S401 is consistent with the specific execution step of step S201, and is not described herein again to avoid repetition.
S402: and carrying out characteristic point positioning on the face image to obtain an eye area image.
Specifically, step S402 is consistent with the specific execution step of step S202, and is not described herein again to avoid repetition.
S403: and if the eye area corresponding to the eye area image is different from the target identification area, performing mirror symmetry processing on the eye area image to obtain a symmetrical image corresponding to the eye area image.
The eye area corresponding to the eye area image, i.e. the left eye or the right eye, can be obtained directly in step S202, which is not described herein again.
In this embodiment, the gaze estimation model mainly estimates monocular images, and human eyes are distinguished as left and right. During training, therefore, monocular images of only the left eye or only the right eye are uniformly used as training samples, and in practical application the eye region images need to be further distinguished as left or right. If the left eye is used as the training sample, i.e., the target recognition area is the left eye, then in practical application a right-eye image needs to be mirrored to obtain a symmetric image, which is then input into the gaze estimation model for estimation. Conversely, if the right eye is used as the training sample, i.e., the target recognition area is the right eye, then in practical application a left-eye image needs to be mirrored to obtain a symmetric image, which is then input into the gaze estimation model for gaze estimation, thereby ensuring the effectiveness and accuracy of the gaze estimation.
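A minimal sketch of this mirroring step, assuming the model was trained on left-eye samples (an illustrative default, not a requirement of the method):

```python
# Sketch of the mirror-symmetry step: if the eye region does not match the eye
# the model was trained on, flip it horizontally first. Training on left eyes
# is an illustrative default.
import cv2

def to_model_eye(eye_img, eye_side: str, trained_side: str = "left"):
    if eye_side != trained_side:
        return cv2.flip(eye_img, 1)   # flipCode=1: mirror around the vertical axis
    return eye_img
```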
S404: and carrying out feature coding on the symmetrical image through an encoder to obtain a human eye feature map.
Specifically, the feature encoding step of step S404 is consistent with step S203, and is not repeated here to avoid repetition.
S405: decoding the eye characteristic graph through a decoder to obtain a heat map corresponding to the eye area image; wherein the heat map includes location information for each of the eye's keypoints.
Specifically, the step of step S405 is consistent with step S204, and is not described herein again to avoid repetition.
S406: and performing sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
Specifically, the step of step S406 is consistent with step S205, and is not repeated here to avoid repetition.
In one embodiment, as shown in fig. 5, the encoder includes a convolutional layer, a main capsule layer, and a sight line estimation capsule layer connected in sequence; in step S203, performing feature coding on the eye region image through an encoder to obtain an eye feature map, which specifically includes the following steps:
S501: Performing feature extraction on the eye region image through the convolutional layer to obtain a first feature output by the convolutional layer.
S502: Performing feature combination on the output of the convolutional layer through the main capsule layer to obtain a second feature output by the main capsule layer.
S503: Performing feature transformation on the second feature through the gaze estimation capsule layer to obtain the human eye feature map.
Specifically, when feature encoding is performed on the eye region image, feature extraction is first performed by the convolutional layer to obtain a plurality of basic feature vectors, i.e., the first features. The basic features extracted by the convolutional layer are then combined by the main capsule layer through the dynamic routing mechanism to obtain the second features output by the main capsule layer; the routing decides to which higher-level capsule each lower-level capsule is output, which replaces the pooling operation in existing convolutional networks and retains more spatial information. Finally, the second features are transformed by the gaze estimation capsule layer, i.e., a fully connected layer, to obtain a human eye feature map with a preset dimension (for example, 18 dimensions).
In this embodiment, the gaze estimation capsule layer is constructed for the practical application scenario of the present application so as to output feature vectors of a preset dimension. Since 18 human eye key points ultimately need to be regressed (i.e., 8 boundary points of the corneal limbus, the boundary between the cornea and the white of the eye; 8 boundary points of the iris edge; the cornea center; and the iris center), the gaze estimation capsule layer performs feature transformation on the second features to obtain a human eye feature map with 18 feature channels, each feature channel corresponding to one human eye key point.
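A skeleton of such an encoder (convolutional layer, main capsule layer, and a capsule layer with one capsule per key point) is sketched below in PyTorch; all layer sizes, the input resolution, and the replacement of full dynamic routing by a single averaging pass are simplifying assumptions, so this is not the exact architecture disclosed here.

```python
# Skeleton of the encoder structure (convolutional layer -> main capsule layer
# -> gaze estimation capsule layer with 18 output capsules). Channel counts,
# capsule dimensions, the 36x60 grayscale input and the single averaging pass
# in place of full dynamic routing are all simplifying assumptions.
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

class CapsuleEncoder(nn.Module):
    def __init__(self, num_keypoints=18, prim_dim=8, out_dim=16):
        super().__init__()
        self.conv = nn.Sequential(                       # basic feature extraction
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.primary = nn.Conv2d(128, 32 * prim_dim, 3, stride=2, padding=1)
        self.gaze_caps = nn.Linear(prim_dim, num_keypoints * out_dim)
        self.prim_dim, self.num_keypoints, self.out_dim = prim_dim, num_keypoints, out_dim

    def forward(self, x):                                # x: (B, 1, 36, 60) assumed
        f = self.conv(x)
        p = self.primary(f)                              # (B, 32*prim_dim, h, w)
        p = p.view(x.size(0), -1, self.prim_dim)         # main-capsule vectors
        p = squash(p)
        u = self.gaze_caps(p)                            # per-capsule predictions
        u = u.view(x.size(0), -1, self.num_keypoints, self.out_dim)
        return squash(u.mean(dim=1))                     # (B, 18, out_dim) "human eye feature map"
```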
In one embodiment, as shown in fig. 6, the key points of the human eye include an iris center point, a cornea center point, and a plurality of limbus boundary points; in step S205, performing gaze estimation according to the position information of the key points of the human eyes to obtain a gaze estimation result, specifically including the following steps:
s601: and calculating the yaw angle based on the abscissa corresponding to the iris central point, the abscissa corresponding to the cornea central point and the abscissa corresponding to the corneal limbus boundary point.
S602: and calculating the pitch angle based on the vertical coordinate corresponding to the iris center point, the vertical coordinate of the cornea center point and the vertical coordinate corresponding to the corneal limbus boundary point.
In this embodiment, the yaw angle may be calculated from the abscissa corresponding to the iris center, the abscissa corresponding to the cornea center and the abscissas corresponding to the limbus boundary points, and the pitch angle may be calculated from the ordinate corresponding to the iris center, the ordinate of the cornea center and the ordinates corresponding to the limbus boundary points.
In an embodiment, as shown in fig. 7, in step S601, the method specifically includes the following steps of calculating a yaw angle based on an abscissa corresponding to the iris center point, an abscissa of the cornea center point, and an abscissa corresponding to the limbus boundary point:
S701: If the abscissa corresponding to the iris center point is not larger than the abscissa of the cornea center point, acquiring the minimum abscissa from the abscissas corresponding to the plurality of limbus boundary points.
S702: Calculating the yaw angle based on the minimum abscissa, the abscissa corresponding to the cornea center point and the abscissa corresponding to the iris center point.
S703: If the abscissa corresponding to the iris center point is larger than the abscissa of the cornea center point, acquiring the maximum abscissa from the abscissas corresponding to the plurality of limbus boundary points.
S704: Calculating the yaw angle based on the maximum abscissa, the abscissa corresponding to the cornea center point and the abscissa corresponding to the iris center point.
Specifically, the above calculation of the yaw angle is expressed by the following equation: [equation presented as an image in the original publication; in it, θ represents the yaw angle, and the coordinates of the cornea center, the iris center and the plurality of limbus boundary points are those described above].
In an embodiment, as shown in fig. 8, in step S602, calculating the pitch angle based on the vertical coordinate corresponding to the iris center point, the vertical coordinate of the cornea center point and the vertical coordinates corresponding to the limbus boundary points specifically includes the following steps:
S801: If the vertical coordinate corresponding to the iris center point is not larger than the vertical coordinate of the cornea center point, acquiring the minimum vertical coordinate from the vertical coordinates corresponding to the plurality of limbus boundary points;
S802: Calculating the pitch angle based on the minimum vertical coordinate, the vertical coordinate corresponding to the cornea center point and the vertical coordinate corresponding to the iris center point;
S803: If the vertical coordinate corresponding to the iris center point is larger than the vertical coordinate of the cornea center point, acquiring the maximum vertical coordinate from the vertical coordinates corresponding to the plurality of limbus boundary points;
S804: Calculating the pitch angle based on the maximum vertical coordinate, the vertical coordinate corresponding to the cornea center point and the vertical coordinate corresponding to the iris center point.
Specifically, the above calculation of the pitch angle is expressed by the following equation: [equation presented as an image in the original publication; in it, φ represents the pitch angle, and the coordinates of the cornea center, the iris center and the plurality of limbus boundary points are those described above].
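Because the exact formulas above are available only as images in the original publication, the following sketch reproduces just the branching logic of steps S701 to S704 and S801 to S804; the arcsin-based normalized-offset formula inside it is an assumption added for illustration, not the disclosed equation.

```python
# Sketch of the yaw/pitch computation from the key points. The branching on the
# minimum or maximum limbus coordinate follows steps S701-S704 and S801-S804;
# the arcsin-based formula itself is an illustrative assumption, since the
# disclosed equations are published only as images.
import math

def _angle(center, iris, coord_min, coord_max):
    ref = coord_min if iris <= center else coord_max     # pick the relevant extreme
    radius = abs(ref - center) or 1.0                    # avoid division by zero
    ratio = max(-1.0, min(1.0, (iris - center) / radius))
    return math.degrees(math.asin(ratio))

def yaw_pitch(cornea_center, iris_center, limbus_points):
    xs = [p[0] for p in limbus_points]
    ys = [p[1] for p in limbus_points]
    yaw = _angle(cornea_center[0], iris_center[0], min(xs), max(xs))
    pitch = _angle(cornea_center[1], iris_center[1], min(ys), max(ys))
    return yaw, pitch
```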
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a gaze estimation apparatus is provided, which corresponds to the gaze estimation method in the above embodiments one to one. As shown in fig. 9, the gaze estimation apparatus includes a data acquisition module 10, a feature point positioning module 20, an encoding module 30, a decoding module 40, and a gaze estimation module 50.
The functional modules are explained in detail as follows:
the data acquisition module 10 is used for acquiring a face image and a sight estimation model based on capsule network training; wherein the gaze estimation model comprises an encoder and a decoder.
And the characteristic point positioning module 20 is configured to perform characteristic point positioning on the face image to obtain an eye region image.
The encoding module 30 is configured to perform feature encoding on the eye region image through the encoder to obtain an eye feature map; wherein the human eye feature map comprises a plurality of human eye key points.
The decoding module 40 is configured to decode the human eye feature map through the decoder to obtain a heat map corresponding to the eye region image; wherein the heat map includes location information for each of the human eye keypoints.
And the sight estimation module 50 is used for performing sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
Specifically, the sight line estimation device further includes an eye opening detection module and a detection result processing module.
And the eye opening detection module is used for detecting the eye opening of the eye region image to obtain a detection result.
And the detection result processing module is used for executing the step of carrying out characteristic coding on the eye region image through the encoder to obtain a human eye characteristic diagram if the detection result is that the eyes are open.
Specifically, the sight line estimation model corresponds to a target identification area; wherein the target recognition area is for indicating a left eye or a right eye; the sight line estimation device also comprises a symmetry processing module and an encoding module.
And the symmetry processing module is used for carrying out mirror symmetry processing on the eye area image to obtain a symmetric image corresponding to the eye area image if the eye area corresponding to the eye area image is different from the target identification area.
And the coding module is used for carrying out characteristic coding on the symmetrical image through the coder to obtain a human eye characteristic diagram.
Specifically, the encoder comprises a convolution layer, a main capsule layer and a sight line estimation capsule layer which are connected in sequence; the encoding module 30 includes a convolution unit, a feature combination unit, and a feature change unit.
And the convolution unit is used for extracting the features of the eye region image through the convolution layer to obtain the first features output by the convolution layer.
And the characteristic combination unit is used for carrying out characteristic combination on the output of the convolution layer through the main capsule layer to obtain a second characteristic output by the main capsule layer.
And the characteristic change unit is used for carrying out characteristic transformation on the second characteristic through the sight line estimation capsule layer to obtain the human eye characteristic diagram.
Specifically, the human eye key points comprise an iris center point, a cornea center point and a plurality of limbus boundary points; the sight line estimation result comprises a pitch angle and a yaw angle; the sight line estimation module includes a yaw angle calculation unit and a pitch angle calculation unit.
And the yaw angle calculation unit is used for calculating the yaw angle based on the abscissa corresponding to the iris central point, the abscissa of the cornea central point and the abscissa corresponding to the corneal limbus boundary point.
And the pitch angle calculation unit is used for calculating the pitch angle based on the vertical coordinate corresponding to the iris central point, the vertical coordinate of the cornea central point and the vertical coordinate corresponding to the corneal limbus boundary point.
Specifically, the yaw angle calculation unit includes a minimum abscissa determination subunit, a first calculation subunit, a maximum abscissa determination subunit, and a second calculation subunit.
A minimum abscissa determining subunit, configured to, if the abscissa corresponding to the iris center point is not greater than the abscissa of the cornea center point, obtain the minimum abscissa from the abscissas corresponding to the plurality of limbus boundary points.
A first calculating subunit, configured to calculate the yaw angle based on the minimum abscissa, the abscissa corresponding to the corneal central point, and the abscissa corresponding to the iris central point.
And the maximum abscissa determining subunit is used for acquiring the maximum abscissa from the abscissas corresponding to the plurality of limbus boundary points if the abscissa corresponding to the iris central point is larger than the abscissa of the cornea central point.
And the second calculating subunit is used for calculating the yaw angle based on the maximum abscissa, the abscissa corresponding to the cornea central point and the abscissa corresponding to the iris central point.
Specifically, the pitch angle calculation unit includes a minimum ordinate determination subunit, a first calculation subunit, a maximum ordinate determination subunit, and a second calculation subunit.
And the minimum vertical coordinate determining subunit is used for acquiring a minimum vertical coordinate from the vertical coordinates corresponding to the plurality of limbus boundary points if the vertical coordinate corresponding to the iris center point is not larger than the vertical coordinate of the cornea center point.
And the first calculating subunit is used for calculating the pitch angle based on the minimum vertical coordinate, the vertical coordinate corresponding to the cornea central point and the vertical coordinate corresponding to the iris central point.
And the maximum vertical coordinate determining subunit is used for acquiring the maximum vertical coordinate from the vertical coordinates corresponding to the plurality of limbus boundary points if the vertical coordinate corresponding to the iris center point is larger than the vertical coordinate of the cornea center point.
And the second calculating subunit is used for calculating the pitch angle based on the maximum vertical coordinate, the vertical coordinate corresponding to the cornea central point and the vertical coordinate corresponding to the iris central point.
For specific definition of the sight line estimation device, reference may be made to the above definition of the sight line estimation method, which is not described herein again. The respective modules in the above-described sight line estimation device may be entirely or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a computer storage medium and an internal memory. The computer storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the computer storage media. The database of the computer device is used to store data generated or acquired during execution of the gaze estimation method, such as a gaze estimation model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a gaze estimation method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the steps of the gaze estimation method in the above-described embodiments are implemented, for example, steps S201 to S205 shown in fig. 2, or steps shown in fig. 3 to 8. Alternatively, the processor implements the functions of each module/unit in the embodiment of the gaze estimation apparatus when executing the computer program, for example, the functions of each module/unit shown in fig. 9, and are not described here again to avoid repetition.
In an embodiment, a computer storage medium is provided, and a computer program is stored on the computer storage medium, and when being executed by a processor, the computer program implements the steps of the gaze estimation method in the foregoing embodiments, such as steps S201 to S205 shown in fig. 2 or steps shown in fig. 3 to 8, which are not repeated herein for avoiding repetition. Alternatively, the computer program is executed by the processor to implement the functions of the modules/units in the embodiment of the gaze estimation apparatus, for example, the functions of the modules/units shown in fig. 9, and are not described herein again to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above.
The above examples are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the foregoing examples, those of ordinary skill in the art should understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A gaze estimation method, comprising:
acquiring a face image and a sight estimation model based on capsule network training; wherein the gaze estimation model comprises an encoder and a decoder;
positioning the characteristic points of the face image to obtain an eye area image;
carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map; wherein the human eye feature map comprises a plurality of human eye key points;
decoding the human eye characteristic graph through the decoder to obtain a heat map corresponding to the eye area image; wherein the heat map comprises location information for each of the human eye keypoints;
and performing sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
2. The gaze estimation method of claim 1, wherein before said feature encoding of the eye region image by the encoder to obtain a human eye feature map, the gaze estimation method further comprises:
carrying out eye opening detection on the eye region image to obtain a detection result;
and if the detection result is that the eyes are open, performing the step of carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map.
3. The gaze estimation method of claim 1, wherein the gaze estimation model corresponds to a target recognition region, the target recognition region being used for indicating a left eye or a right eye; before the feature coding is performed on the eye region image through the encoder to obtain a human eye feature map, the gaze estimation method further comprises:
if the eye area corresponding to the eye area image is different from the target identification area, mirror symmetry processing is carried out on the eye area image to obtain a symmetrical image corresponding to the eye area image;
the feature coding is performed on the eye region image through the encoder to obtain an eye feature map, and the method comprises the following steps:
and carrying out feature coding on the symmetrical image through the coder to obtain a human eye feature map.
4. The gaze estimation method of claim 1, wherein the encoder comprises a convolutional layer, a main capsule layer, and a gaze estimation capsule layer connected in sequence; the feature coding is performed on the eye region image through the encoder to obtain an eye feature map, and the method comprises the following steps:
performing feature extraction on the eye region image through the convolutional layer to obtain a first feature output by the convolutional layer;
performing feature combination on the output of the convolutional layer through the main capsule layer to obtain a second feature output by the main capsule layer;
and performing feature transformation on the second features through the sight line estimation capsule layer to obtain the human eye feature map.
5. The gaze estimation method of claim 1, wherein the human eye key points comprise an iris center point, a cornea center point, and a plurality of limbal boundary points; the sight line estimation result comprises a pitch angle and a yaw angle;
the sight line estimation according to the position information of the key points of the human eyes to obtain a sight line estimation result comprises the following steps:
calculating the yaw angle based on the abscissa corresponding to the iris central point, the abscissa of the cornea central point and the abscissa corresponding to the limbus boundary point;
and calculating the pitch angle based on the vertical coordinate corresponding to the iris central point, the vertical coordinate of the cornea central point and the vertical coordinate corresponding to the corneal limbus boundary point.
6. The gaze estimation method of claim 5, wherein the calculating the yaw angle based on the abscissa corresponding to the iris center point, the abscissa of the corneal center point, and the abscissa corresponding to the limbus boundary point comprises:
if the abscissa corresponding to the iris central point is not larger than the abscissa of the cornea central point, acquiring the minimum abscissa from the abscissas corresponding to the plurality of limbus boundary points;
calculating the yaw angle based on the minimum abscissa, the abscissa corresponding to the corneal central point and the abscissa corresponding to the iris central point;
if the abscissa corresponding to the iris central point is larger than the abscissa of the cornea central point, acquiring the maximum abscissa from the abscissas corresponding to the plurality of limbus boundary points;
and calculating the yaw angle based on the maximum abscissa, the abscissa corresponding to the corneal central point and the abscissa corresponding to the iris central point.
7. The gaze estimation method of claim 5, wherein the calculating the pitch angle based on the ordinate corresponding to the iris center point, the ordinate corresponding to the corneal center point, and the ordinate corresponding to the limbus boundary point comprises:
if the longitudinal coordinate corresponding to the iris center point is not larger than the longitudinal coordinate of the cornea center point, acquiring the minimum longitudinal coordinate from the longitudinal coordinates corresponding to the plurality of limbus boundary points;
calculating the pitch angle based on the minimum ordinate, the ordinate corresponding to the cornea center point and the ordinate corresponding to the iris center point;
if the vertical coordinate corresponding to the iris center point is larger than the vertical coordinate of the cornea center point, acquiring the maximum vertical coordinate from the vertical coordinates corresponding to the plurality of limbus boundary points;
and calculating the pitch angle based on the maximum vertical coordinate, the vertical coordinate corresponding to the cornea central point and the vertical coordinate corresponding to the iris central point.
8. A gaze estimation device, comprising:
the data acquisition module is used for acquiring a face image and a capsule network training-based sight estimation model; wherein the gaze estimation model comprises an encoder and a decoder;
the characteristic point positioning module is used for positioning the characteristic points of the face image to obtain an eye region image;
the coding module is used for carrying out feature coding on the eye region image through the encoder to obtain a human eye feature map; wherein the human eye feature map comprises a plurality of human eye key points;
the decoding module is used for decoding the human eye characteristic graph through the decoder to obtain a heat map corresponding to the eye area image; wherein the heat map comprises location information for each of the human eye keypoints;
and the sight estimation module is used for carrying out sight estimation according to the position information of the key points of the human eyes to obtain a sight estimation result.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the gaze estimation method according to any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the gaze estimation method of any one of claims 1 to 7.
CN202110703027.7A 2021-06-24 2021-06-24 Sight estimation method, sight estimation device, computer equipment and storage medium Pending CN113269148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703027.7A CN113269148A (en) 2021-06-24 2021-06-24 Sight estimation method, sight estimation device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110703027.7A CN113269148A (en) 2021-06-24 2021-06-24 Sight estimation method, sight estimation device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113269148A 2021-08-17

Family

ID=77235827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703027.7A Pending CN113269148A (en) 2021-06-24 2021-06-24 Sight estimation method, sight estimation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113269148A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743254A (en) * 2021-08-18 2021-12-03 北京格灵深瞳信息技术股份有限公司 Sight estimation method, sight estimation device, electronic equipment and storage medium
CN113743254B (en) * 2021-08-18 2024-04-09 北京格灵深瞳信息技术股份有限公司 Sight estimation method, device, electronic equipment and storage medium
CN115482574A (en) * 2022-09-29 2022-12-16 珠海视熙科技有限公司 Screen fixation point estimation method, device, medium and equipment based on deep learning
CN115482574B (en) * 2022-09-29 2023-07-21 珠海视熙科技有限公司 Screen gaze point estimation method, device, medium and equipment based on deep learning

Similar Documents

Publication Publication Date Title
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
CN108399367B (en) Hand motion recognition method and device, computer equipment and readable storage medium
KR102592270B1 (en) Facial landmark detection method and apparatus, computer device, and storage medium
WO2022134337A1 (en) Face occlusion detection method and system, device, and storage medium
CN112419170B (en) Training method of shielding detection model and beautifying processing method of face image
WO2021169637A1 (en) Image recognition method and apparatus, computer device and storage medium
EP3907653A1 (en) Action recognition method, apparatus and device and storage medium
US11928893B2 (en) Action recognition method and apparatus, computer storage medium, and computer device
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
CN113269148A (en) Sight estimation method, sight estimation device, computer equipment and storage medium
KR20200118076A (en) Biometric detection method and device, electronic device and storage medium
CN111931594A (en) Face recognition living body detection method and device, computer equipment and storage medium
US20190102905A1 (en) Head pose estimation from local eye region
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
WO2022252642A1 (en) Behavior posture detection method and apparatus based on video image, and device and medium
CN111695462A (en) Face recognition method, face recognition device, storage medium and server
US11218629B2 (en) Tracking system and method thereof
CN113435362A (en) Abnormal behavior detection method and device, computer equipment and storage medium
CN112749655A (en) Sight tracking method, sight tracking device, computer equipment and storage medium
CN116863522A (en) Acne grading method, device, equipment and medium
CN113705685A (en) Disease feature recognition model training method, disease feature recognition device and disease feature recognition equipment
CN112766028B (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN113971841A (en) Living body detection method and device, computer equipment and storage medium
CN112836682A (en) Method and device for identifying object in video, computer equipment and storage medium
US11048926B2 (en) Adaptive hand tracking and gesture recognition using face-shoulder feature coordinate transforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination