CN112597872A - Gaze angle estimation method and device, storage medium, and electronic device

Gaze angle estimation method and device, storage medium, and electronic device

Info

Publication number
CN112597872A
Authority
CN
China
Prior art keywords: eye, angle, region, eyes, network
Prior art date
Legal status
Pending
Application number
CN202011502278.0A
Other languages
Chinese (zh)
Inventor
沈丽娜
牛建伟
储刘火
陶冶
杨超
张宏志
黄赫
Current Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN202011502278.0A
Publication of CN112597872A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 - Eye characteristics, e.g. of the iris
    • G06V40/197 - Matching; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 - Eye characteristics, e.g. of the iris
    • G06V40/193 - Preprocessing; Feature extraction

Abstract

Embodiments of the present disclosure disclose a gaze angle estimation method and apparatus, a storage medium, and an electronic device, wherein the method includes: determining an eye region included in an image to be detected; performing feature extraction on the eye region to obtain eye region features corresponding to the eye region; determining, based on the eye region features, angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region; and determining the gaze angles corresponding to the two eyes in the eye region based on the angle interval vectors and the offset angle vectors. According to the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because the processing is performed on the basis of the eye region, the method can adapt to head poses of various amplitudes, improving the robustness of the gaze angle estimation method.

Description

Gaze angle estimation method and device, storage medium, and electronic device
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a gaze angle estimation method and apparatus, a storage medium, and an electronic device.
Background
Gaze angle estimation infers the direction in which a person is looking from an eye image or a face image. In human-computer interaction, having a computer automatically estimate the viewing direction of the human eyes, i.e., gaze angle estimation, is an important application. For example, during driving, the driver's current state, such as distraction or fatigue, can be inferred from the estimated gaze angle, so that the vehicle is controlled to execute a series of corresponding operations and/or reminders.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide a gaze angle estimation method and apparatus, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a gaze angle estimation method, including:
determining an eye region included in an image to be detected;
performing feature extraction on the eye region to obtain eye region features corresponding to the eye region;
determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features;
determining a gaze angle corresponding to both eyes in the eye region based on the angle interval vector and the offset angle vector.
According to another aspect of the embodiments of the present disclosure, there is provided a neural network training method, the neural network including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; the method comprises the following steps:
inputting a sample eye image into the neural network, and performing feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain a sample eye feature; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
processing the sample eye features by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye image;
determining a first loss based on the predicted gaze angle and the known gaze angle;
determining a network loss based on the first loss, and training the neural network based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a gaze angle estimation apparatus including:
the region extraction module is used for determining an eye region included in the image to be detected;
the feature extraction module is used for performing feature extraction on each eye region obtained by the region extraction module to obtain eye region features corresponding to each eye region;
the interval estimation module is used for determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features obtained by the feature extraction module;
and the angle determining module is used for determining the sight line angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimation module.
According to still another aspect of the embodiments of the present disclosure, there is provided a neural network training apparatus, the neural network including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; the apparatus includes:
the feature extraction module is used for inputting the sample eye image into the neural network and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known gaze angles;
the angle estimation module is used for processing the sample eye features obtained by the feature extraction module by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye images;
a loss determination module for determining a first loss based on the predicted gaze angle and the known gaze angle obtained by the angle estimation module;
and the network training module is used for determining the network loss based on the first loss obtained by the loss determining module and training the neural network based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the gaze angle estimation method of any of the above embodiments or the neural network training method of any of the above embodiments.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the gaze angle estimation method according to any of the above embodiments or the neural network training method according to any of the above embodiments.
Based on the gaze angle estimation method and apparatus, the storage medium, and the electronic device provided by the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because the embodiments of the present disclosure perform processing on the basis of the eye region, the method can adapt to head poses of various amplitudes, improving the robustness of the gaze angle estimation method.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic structural diagram of a gaze angle estimation system according to an exemplary embodiment of the present disclosure.
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 2b is a schematic view of a gaze angle of a sample eye image applied in a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a gaze angle estimation method according to an exemplary embodiment of the disclosure.
Fig. 4 is a schematic flow chart of step 303 in the embodiment shown in fig. 3 of the present disclosure.
Fig. 5 is a schematic flow chart of step 3031 in the embodiment shown in fig. 4 of the present disclosure.
Fig. 6a is a schematic flow chart of step 3032 in the embodiment shown in fig. 4 of the present disclosure.
Fig. 6b is a schematic diagram of an eye region in an example of a gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 6c is a schematic diagram of an eye region subjected to data enhancement in one example of a gaze angle estimation method provided by the embodiment of the present disclosure.
Fig. 6d is a schematic diagram of an eye region in another example of the gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 6e is a schematic diagram of an eye region subjected to data enhancement in another example of the gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 7 is a flowchart illustrating a neural network training method according to an exemplary embodiment of the disclosure.
Fig. 8 is a schematic structural diagram of a gaze angle estimation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a gaze angle estimation apparatus according to another exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a neural network training device according to another exemplary embodiment of the present disclosure.
Fig. 12 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those skilled in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In the course of implementing the present disclosure, the inventors found that, in existing solutions, the mainstream approach to gaze detection is generally model-based, that is, the gaze direction vector is calculated using infrared light source reflections and a geometric model. Existing schemes have at least the following problems: the cost is high, the requirements on image quality are high, complex equipment (multiple infrared light sources and a high-definition camera) is needed, and the head pose must remain essentially still, which greatly limits the applicability of such schemes.
Exemplary System
The present disclosure provides a gaze angle estimation method that determines the gaze angle in two stages, from coarse to fine, based on confidences and offsets, and that improves the contrast of the eye region with a data enhancement step in the data preprocessing stage, thereby improving the accuracy and robustness of gaze angle estimation and the adaptability to head movement.
Fig. 1 is a schematic structural diagram of a gaze angle estimation system according to an exemplary embodiment of the present disclosure. As shown in fig. 1, wherein the neural network comprises: a feature extraction branch network 102, a first positioning branch network 103 and a second positioning branch network 104;
the acquired image to be detected is processed by a face detection model and an eye key point model 101 to obtain an eye region (other preposition information such as eye key points).
And data enhancement processing is carried out on the eye area, the contrast of the eye area is enhanced, the attention to the eye area is strengthened, and the generalization capability of the model is improved.
Inputting the data-enhanced eye region into a neural network, and performing feature extraction on the eye region through a feature extraction branch network 102 of the neural network to obtain features of the eye region; the eye region characteristics are processed through a first positioning branch network 103 and a second positioning branch network 104 of a neural network to realize two-stage sight direction estimation, a sight region is divided into a set number of horizontal angle intervals and vertical angle intervals, the horizontal angle interval and the vertical angle interval where the sight is roughly positioned by the first positioning branch network 103 output corresponding confidence degrees, and an angle interval vector corresponding to the sight direction in the eye region is determined; the second positioning branch network 104 performs more detailed classification and regression, first performs classification, selects and maps the classification result to a certain interval according to the confidence, and then obtains the offset angle vector by using regression offset with finer granularity.
And adding the offset angle vector to the angle interval vector obtained by the first-stage coarse positioning, and determining the horizontal sight angle and the vertical sight angle of the left eye and the horizontal sight angle and the vertical sight angle of the right eye in the eyes of the eye region.
The positioning precision of the sight angle is improved by a two-stage coarse-fine positioning method; the label improves the contrast of the eye region by adopting data enhancement aiming at the eye region, and improves the robustness of the sight angle estimation method.
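For illustration only, the following is a minimal PyTorch sketch of such a three-branch structure; the backbone layers, feature dimensions, input resolution, and the choice of 6 angle intervals per direction are assumptions made for the example and are not prescribed by the present disclosure.

import torch
from torch import nn

class GazeEstimationNet(nn.Module):
    """Illustrative three-branch structure: feature extraction branch (102),
    first positioning branch (103), second positioning branch (104)."""

    def __init__(self, num_intervals: int = 6):
        super().__init__()
        # Feature extraction branch network (102): a small convolutional backbone.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # First positioning branch network (103): per eye, one confidence per
        # horizontal interval and per vertical interval -> 2 eyes x 2 directions.
        self.first_branch = nn.Linear(32, 2 * 2 * num_intervals)
        # Second positioning branch network (104): takes the eye region features
        # together with the interval confidences and regresses 4 offset angles.
        self.second_branch = nn.Linear(32 + 2 * 2 * num_intervals, 4)

    def forward(self, eye_region: torch.Tensor):
        feat = self.feature_extractor(eye_region)
        interval_confidences = self.first_branch(feat)                       # coarse stage
        offset_angles = self.second_branch(
            torch.cat([feat, interval_confidences], dim=1))                  # fine stage
        return interval_confidences, offset_angles

# Usage: a single-channel 64x128 eye-region crop (the crop size is an assumption).
net = GazeEstimationNet()
confs, offsets = net(torch.randn(1, 1, 64, 128))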
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure. As shown in fig. 2a, in the training process, the neural network includes: a feature extraction branch network 102, a first positioning branch network 103, a second positioning branch network 104, an eye segmentation branch network 205, and a key point prediction branch network 206;
The sample eye image is input into the neural network; the sample eye image may be obtained by face detection and human eye detection from a sample image that includes eyes, and may be input into the neural network after data enhancement. Both eyes in the sample eye image, as shown in fig. 2b, have corresponding known gaze angles, where the known gaze angles may include: a known horizontal gaze angle and a known vertical gaze angle for the left eye, and a known horizontal gaze angle and a known vertical gaze angle for the right eye. A two-stage gaze direction estimation is realized through the feature extraction branch network 102, the first positioning branch network 103, and the second positioning branch network 104 in the neural network, determining the predicted gaze angles of the two eyes in the eye region, where the predicted gaze angles may include: the predicted horizontal and vertical gaze angles for the left eye, and the predicted horizontal and vertical gaze angles for the right eye.
A first loss is determined based on the predicted gaze angle and the known gaze angle.
On the basis of determining the first loss, the eye segmentation branch network 205 may be used to perform segmentation processing on the sample eye image, and a second loss is determined; the sample eye image includes known eye keypoints.
An auxiliary eye keypoint detection task determines a third loss based on the eye keypoints predicted by the keypoint prediction branch network 206 and the known eye keypoints of the sample eye image.
A network loss is determined based on the first loss, the second loss, and the third loss, and the network parameters of the neural network are adjusted by gradient back-propagation based on the network loss. When applied, the trained neural network only includes the feature extraction branch network 102, the first positioning branch network 103, and the second positioning branch network 104.
In this embodiment, the losses corresponding to the pupil segmentation task and the gaze angle task are fed back to the neural network at the same time, so that the neural network learns the features of both tasks; the intrinsic relation between the two tasks is mutually reinforcing, allowing the model to obtain better gaze results and improving the adaptability of the neural network to head movement. In addition, combining the loss corresponding to the eye keypoint detection task can further improve the prediction accuracy of the neural network.
Exemplary method
Fig. 3 is a flowchart illustrating a gaze angle estimation method according to an exemplary embodiment of the disclosure. The embodiment can be applied to an electronic device, as shown in fig. 3, and includes the following steps:
step 301, determining an eye region included in an image to be detected.
The image to be detected includes at least one pair of eyes of a person, the eye region in this embodiment refers to an image region including the pair of eyes of the person, and optionally, the eye region may be obtained by the face detection model and the eye keypoint model 101 in the embodiment provided in fig. 1.
Step 302, performing feature extraction on the eye region to obtain eye region features corresponding to the eye region.
Optionally, feature extraction may be performed on the eye region by using a feature extraction network or by using a feature extraction branch network in the overall neural network to obtain features of the eye region, and optionally, the feature extraction branch network is the feature extraction branch network 102 in the embodiment provided in fig. 1.
Step 303, determining an angle interval vector corresponding to both eyes in the eye region and an offset angle vector corresponding to both eyes in the eye region based on the eye region feature.
In this embodiment, the two branch networks in the neural network may be used to process the eye region respectively, so as to obtain an angle interval vector and an offset angle vector, determine the large direction of the sight angle through the angle interval vector, and improve the accuracy of the sight angle direction through the offset angle vector.
And step 304, determining the sight line angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector.
Alternatively, as shown in fig. 2b, the gaze angle includes a pitch angle (pitch) of the eyeball, which represents the rotation angle of the eyeball about the X axis, and a yaw angle (yaw), which represents the rotation angle of the eyeball about the Y axis. The angle interval vector and the offset angle vector are summed; the offset angle vector determines the specific offset of the binocular gaze within the located angle interval, improving the accuracy of the determined gaze angle.
According to the gaze angle estimation method provided by the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because the processing is performed on the basis of the eye region, the method can adapt to head poses of various amplitudes, improving the robustness of the gaze angle estimation method.
Optionally, step 302 in the above embodiment may include: and performing feature extraction on the eye region by using the feature extraction branch network of the neural network to obtain eye region features corresponding to the eye region.
The feature extraction branch network in this embodiment is a part of the neural network; performing feature extraction on the eye region through the feature extraction branch network provides a basis for the subsequent gaze angle estimation by the other parts of the neural network and improves the efficiency of gaze estimation.
As shown in fig. 4, based on the embodiment shown in fig. 3, step 303 may include the following steps:
step 3031, determining angle interval vectors corresponding to the two eyes in the eye region based on the eye region characteristics by using a first positioning branch network of the neural network.
Optionally, as shown in fig. 1, the first positioning branch network 103 and the second positioning branch network 104 in this embodiment are respectively connected to the feature extraction branch network 102, and the first positioning branch network 103 and the second positioning branch network 104 may process the eye region features at the same time, so as to improve the network processing efficiency.
Step 3032, determining offset angle vectors corresponding to the two eyes in the eye region based on the eye region features and the angle interval vector by using a second positioning branch network of the neural network.
The neural network in this embodiment includes three branch networks: the feature extraction branch network, the first positioning branch network, and the second positioning branch network. Feature extraction is performed once through the feature extraction branch network, and the resulting eye region features are input into the first positioning branch network and the second positioning branch network respectively, which avoids repeated feature extraction and improves the processing efficiency of the neural network.
As shown in fig. 5, based on the embodiment shown in fig. 4, step 3031 may include the following steps:
and 501, dividing the included angles of the image to be detected in the horizontal direction and the vertical direction relative to the normal vector into a set number of horizontal angle intervals and a set number of vertical angle intervals respectively.
In this embodiment, the direction perpendicular to the image to be detected is taken as the normal vector direction (corresponding to the yaw angle of the eyeball in the embodiment shown in fig. 2b). The included angle between the horizontal direction and the normal vector ranges from -90 degrees to 90 degrees (corresponding to the pitch angle of the eyeball in the embodiment shown in fig. 2b), where -90 degrees means horizontally to the left in the plane of the image to be detected, 90 degrees means horizontally to the right in that plane, and 0 degrees coincides with the normal vector, perpendicular to the image to be detected. The included angle between the vertical direction and the normal vector likewise ranges from -90 degrees to 90 degrees, where -90 degrees means vertically upward in the plane of the image to be detected, 90 degrees means vertically downward in that plane, and 0 degrees coincides with the normal vector, perpendicular to the image to be detected. This embodiment divides the horizontal included angle and the vertical included angle into a set number of intervals, each interval covering a range of angles, and the intervals can be numbered to distinguish them. For example, the included angle between the horizontal direction and the normal vector can be divided into 6 horizontal angle intervals: interval 1 [-90°, -60°], interval 2 [-60°, -30°], interval 3 [-30°, 0°], interval 4 [0°, 30°], interval 5 [30°, 60°], and interval 6 [60°, 90°]. Locating the angle interval in which the gaze direction lies realizes the coarse positioning of the gaze direction.
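As a purely illustrative sketch of this coarse discretization (using the 6-interval example above; the number and width of intervals, and the use of the interval lower bound as its representative value, are assumptions of the example rather than requirements of the disclosure):

# Illustrative only: 6 intervals of 30 degrees each, matching the example above.
INTERVAL_EDGES = [-90, -60, -30, 0, 30, 60, 90]

def interval_index(angle_deg: float) -> int:
    """Return the 1-based index of the interval containing angle_deg."""
    for i in range(len(INTERVAL_EDGES) - 1):
        if INTERVAL_EDGES[i] <= angle_deg <= INTERVAL_EDGES[i + 1]:
            return i + 1
    raise ValueError("angle outside [-90, 90] degrees")

def interval_lower_bound(index: int) -> float:
    """Lower bound of interval `index`; the fine stage adds an offset to it in this sketch."""
    return float(INTERVAL_EDGES[index - 1])

print(interval_index(-75))        # -> 1, i.e. interval [-90, -60]
print(interval_lower_bound(4))    # -> 0.0, i.e. interval [0, 30]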
Step 502, the first positioning branch network of the neural network is used to process the eye region features, and a first confidence degree corresponding to each horizontal angle interval and a second confidence degree corresponding to each vertical angle interval are obtained for both eyes in the eye region.
In this embodiment, the first positioning branch network may be implemented as a classification network. The classification network determines a first confidence for each horizontal angle interval for each of the two eyes in the eye region; optionally, a left-eye first confidence is determined for the left eye and a right-eye first confidence is determined for the right eye. It likewise determines a second confidence for each vertical angle interval for the two eyes; optionally, a left-eye second confidence is determined for the left eye and a right-eye second confidence is determined for the right eye.
Step 503, determining two horizontal angle intervals corresponding to the two eyes based on the first confidence degree, and determining two vertical angle intervals corresponding to the two eyes based on the second confidence degree.
Optionally, a horizontal angle interval and a vertical angle interval corresponding to the left eye, and a horizontal angle interval and a vertical angle interval corresponding to the right eye, may be determined separately for the two eyes. The horizontal angle interval corresponding to the largest left-eye first confidence among the set number of left-eye first confidences is taken as the horizontal angle interval corresponding to the left eye, and the horizontal angle interval corresponding to the largest right-eye first confidence among the set number of right-eye first confidences is taken as the horizontal angle interval corresponding to the right eye. Likewise, the vertical angle interval corresponding to the largest left-eye second confidence among the set number of left-eye second confidences is taken as the vertical angle interval corresponding to the left eye, and the vertical angle interval corresponding to the largest right-eye second confidence among the set number of right-eye second confidences is taken as the vertical angle interval corresponding to the right eye.
Step 504, determining angle interval vectors corresponding to the two eyes based on the two horizontal angle intervals and the two vertical angle intervals corresponding to the two eyes.
In this embodiment, a one-dimensional 4-element vector is formed from the two horizontal angle intervals and the two vertical angle intervals, so that the horizontal and vertical angle intervals corresponding to the two eyes are expressed by a single angle interval vector; this vector represents the result of the coarse positioning of the binocular gaze. Because the coarse positioning divides the angles into a set number of intervals, the corresponding angle interval can be determined merely by computing a confidence for each interval, which greatly reduces the amount of data processing and improves the efficiency of the coarse positioning.
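A minimal sketch of forming the angle interval vector from the per-interval confidences, under the same illustrative assumptions (6 intervals of 30 degrees, interval lower bounds as representative values, and a left-horizontal/left-vertical/right-horizontal/right-vertical element ordering that the disclosure does not specify):

import numpy as np

def angle_interval_vector(left_h_conf, left_v_conf, right_h_conf, right_v_conf,
                          interval_width=30.0, lower_bound=-90.0):
    """Pick the interval with maximum confidence for each eye and direction and
    return the 4-element angle interval vector (interval lower bounds, in degrees)."""
    indices = [np.argmax(c) for c in (left_h_conf, left_v_conf, right_h_conf, right_v_conf)]
    return np.array([lower_bound + i * interval_width for i in indices])

# Usage with random confidences over 6 intervals per eye and direction.
vec = angle_interval_vector(np.random.rand(6), np.random.rand(6),
                            np.random.rand(6), np.random.rand(6))
print(vec)  # e.g. [ 30. -30.   0.  60.]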
As shown in fig. 6a, based on the embodiment shown in fig. 4, step 3032 may include the following steps:
step 601, processing the eye region features by using a second positioning branch network of the neural network to obtain two horizontal deviation angles of each eye in the two eyes in the horizontal direction and two vertical deviation angles in the vertical direction.
In this embodiment, the second positioning branch network may perform a finer classification and regression for the eyes in the eye region; for example, each interval of the coarse classification is subdivided into a plurality of subintervals, in which case the second positioning branch network can be regarded as a classification network that determines the confidence of each subinterval for the two eyes, thereby determining a finer-grained gaze angle. Alternatively, the second positioning branch network is an offset prediction network that, based on the input eye region features, directly regresses the offsets of the two eyes within the determined angle interval vector, thereby determining the finer-grained gaze angle.
Step 602, determining offset angle vectors corresponding to the two eyes based on the two horizontal offset angles and the two vertical offset angles.
In this embodiment, a one-dimensional 4-element vector is formed from the two horizontal offset angles and the two vertical offset angles, so that the horizontal and vertical offset angles corresponding to the two eyes are expressed by a single offset angle vector; this vector represents the fine-grained positioning result of the binocular gaze. The coarse positioning divides the angles into a set number of intervals and therefore only determines the approximate gaze direction; the specific gaze angle can then be determined through the offset angle vector, which improves the accuracy of the binocular gaze angle positioning.
In some alternative embodiments, the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for the left eye, and a horizontal gaze angle and a vertical gaze angle for the right eye; step 304 in the above embodiments may include:
and adding the angle interval vector and the offset angle vector to obtain a sight angle vector.
As can be seen from the above embodiments, the angle interval vector and the offset angle vector are each one-dimensional 4-element vectors; the two vectors are added element-wise to obtain a new one-dimensional 4-element vector, i.e., the gaze angle vector.
A horizontal line-of-sight angle and a vertical line-of-sight angle for a left eye, and a horizontal line-of-sight angle and a vertical line-of-sight angle for a right eye in both eyes of the eye region are determined based on the line-of-sight angle vectors.
In this embodiment, the offset angle vector is superimposed on the coarse-positioning angle interval vector, so that the gaze direction is accurately located within the coarsely located angle interval; this realizes a coarse-to-fine gaze direction positioning scheme and improves the robustness and accuracy of the gaze angle estimation method.
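For illustration, the element-wise addition of the two vectors described above, with arbitrary example values and the same assumed element ordering:

import numpy as np

# Coarse stage output: interval lower bounds in degrees (illustrative ordering:
# left-horizontal, left-vertical, right-horizontal, right-vertical).
angle_interval_vector = np.array([0.0, -30.0, 0.0, -30.0])
# Fine stage output: offsets inside those intervals, in degrees.
offset_angle_vector = np.array([12.4, 7.9, 10.8, 6.5])

# Element-wise addition yields the gaze angle vector.
gaze_angle_vector = angle_interval_vector + offset_angle_vector

left_h, left_v, right_h, right_v = gaze_angle_vector
print(f"left eye:  horizontal {left_h:.1f} deg, vertical {left_v:.1f} deg")
print(f"right eye: horizontal {right_h:.1f} deg, vertical {right_v:.1f} deg")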
In some optional embodiments, step 301 may comprise:
and executing face detection processing on the image to be detected to obtain at least one face area in the image to be detected.
Optionally, as shown in the embodiment provided in fig. 1, the face detection model in the face detection model and the eye keypoint model 101 performs face detection on the image to be detected, and determines at least one face region included in the image to be detected.
And performing face key point identification on each face region to obtain eye key points in the face region.
Optionally, the eye keypoint model in the face detection model and eye keypoint model 101 shown in the embodiment provided in fig. 1 performs eye keypoint recognition on each face region and determines the eye keypoints included in the face region; optionally, the eye keypoints may include, but are not limited to, 16 keypoints for each of the two eyes.
And obtaining eye regions corresponding to the face regions based on the human eye key points in each face region.
This embodiment determines the positions of the two eyes in each face region based on the detected eye keypoints and thereby obtains the eye region in each face region. Gaze angle estimation is performed on the eye region; because the other parts of the face are removed, the adaptability of the gaze angle estimation method to head movement is improved.
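A minimal sketch of cropping such an eye region from the detected eye keypoints; the bounding-box-plus-margin rule is an assumption for the example, since the disclosure does not fix a specific cropping method:

import numpy as np

def crop_eye_region(image: np.ndarray, eye_keypoints: np.ndarray, margin: float = 0.3):
    """Crop a region covering both eyes from a face image, given detected eye
    keypoints of shape (N, 2) in (x, y) pixel coordinates."""
    x_min, y_min = eye_keypoints.min(axis=0)
    x_max, y_max = eye_keypoints.max(axis=0)
    dx, dy = (x_max - x_min) * margin, (y_max - y_min) * margin
    h, w = image.shape[:2]
    x0, x1 = int(max(0, x_min - dx)), int(min(w, x_max + dx))
    y0, y1 = int(max(0, y_min - dy)), int(min(h, y_max + dy))
    return image[y0:y1, x0:x1]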
In some optional embodiments, before step 302, the method may further include:
and performing local histogram equalization processing on the eye region to enhance the contrast of the eye region, so as to obtain the eye region with enhanced contrast.
This embodiment enhances the contrast of the eyes through local histogram equalization applied to the eye region, which strengthens the attention on the eye region and improves the accuracy of gaze angle estimation. For example, in an alternative example, fig. 6b is a schematic diagram of an eye region in one example of the gaze angle estimation method provided by the embodiment of the present disclosure, and fig. 6c is a schematic diagram of that eye region after data enhancement; fig. 6d is a schematic diagram of an eye region in another example of the gaze angle estimation method provided by the embodiment of the present disclosure, and fig. 6e is a schematic diagram of that eye region after data enhancement.
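A minimal sketch of such a contrast-enhancement step; CLAHE on the luminance channel via OpenCV is one possible implementation of local histogram equalization, and the parameter values are illustrative only:

import cv2

def enhance_eye_region(eye_region_bgr):
    """Contrast enhancement of the eye region via local histogram equalization,
    applied to the luminance channel of a BGR image."""
    ycrcb = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    ycrcb[:, :, 0] = clahe.apply(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)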
Fig. 7 is a flowchart illustrating a neural network training method according to an exemplary embodiment of the disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network; as shown in fig. 7, the method comprises the following steps:
step 701, inputting the sample eye image into a neural network, and performing feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain a sample eye feature.
The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles.
In this embodiment, the process of extracting features of the sample image may be implemented with reference to step 302 in the gaze angle estimation method.
Step 702, processing the sample eye features by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to two eyes in the sample eye image.
This step can be implemented with reference to the above-described gaze angle estimation method, steps 303 and 304 in the embodiments shown in fig. 3-5, the only difference being that the network parameters in the neural network differ.
A first loss is determined based on the predicted gaze angle and the known gaze angle, step 703.
In this embodiment, the known gaze angle is used as the supervision information for neural network training, and the loss is computed from the difference between the gaze angle predicted by the current neural network and the known gaze angle; for example, as shown in fig. 2a, the first loss is obtained from the predicted gaze angles output by the first positioning branch network 103 and the second positioning branch network 104. The first loss may be calculated with any suitable loss function, such as an L1 loss function.
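For illustration, computing such a first loss with an L1 loss function in PyTorch, assuming the gaze angles are represented as 4-element vectors (two eyes, horizontal and vertical angles):

import torch

# Example values in degrees; the 4-element layout is an assumption of the sketch.
predicted_gaze = torch.tensor([[12.4, -22.1, 10.8, -23.5]], requires_grad=True)
known_gaze = torch.tensor([[11.0, -20.0, 12.0, -21.0]])

first_loss = torch.nn.functional.l1_loss(predicted_gaze, known_gaze)
print(first_loss.item())  # mean absolute error over the four angle components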
And step 704, determining a network loss based on the first loss, and training the neural network based on the network loss.
Optionally, parameter adjustment of the neural network may be implemented by gradient back-propagation based on the network loss; after each parameter adjustment, the adjusted neural network executes the above steps 701 to 703 again, until a preset condition is met and the trained neural network is obtained. The preset condition may be reaching a set number of iterations, the decrease in network loss between two consecutive iterations being less than a set value, and the like. In this embodiment, the neural network is trained using the known gaze angles, so that the resulting neural network can better predict the gaze angle in the eye region, improving the accuracy of gaze angle estimation.
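A minimal training-loop sketch for steps 701 to 704 using only the first loss; the optimizer, learning rate, epoch count, and the abstract decode_gaze function (which maps the two branch outputs to a predicted gaze angle) are assumptions of the example, and in practice the branch outputs may also be supervised with separate classification and regression losses:

import torch

def train(net, data_loader, decode_gaze, num_epochs=10):
    """net follows the two-output interface of the GazeEstimationNet sketch above."""
    optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)
    for epoch in range(num_epochs):
        for sample_eye_image, known_gaze_angle in data_loader:
            optimizer.zero_grad()
            interval_confs, offsets = net(sample_eye_image)            # steps 701-702
            predicted_gaze = decode_gaze(interval_confs, offsets)      # predicted gaze angle
            first_loss = torch.nn.functional.l1_loss(predicted_gaze, known_gaze_angle)  # step 703
            network_loss = first_loss                                  # step 704 (first loss only)
            network_loss.backward()                                    # gradient back-propagation
            optimizer.step()
    return net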
In some optional embodiments, the neural network further comprises an eye segmentation branch network; at this point, both eyes in the sample eye image also have corresponding known eye keypoints.
Step 701 may further include:
and determining a supervision iris region, a supervision scleral region and a supervision background region in the sample eye image based on the known eye key points corresponding to the sample eye image.
The eye keypoints may include, but are not limited to, 16 eye keypoints; ellipse fitting is performed on the eye keypoints to determine the supervised iris region and the supervised eyelid region in the sample eye image, and the remaining regions of the sample eye image may be used as the supervised background region.
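A minimal sketch of building such supervision masks by ellipse fitting with OpenCV; which keypoints belong to the iris versus the eyelid contour, and the use of filled ellipses as region masks, are assumptions of the example (cv2.fitEllipse requires at least 5 points per set):

import cv2
import numpy as np

def supervision_masks(image_shape, iris_keypoints, eyelid_keypoints):
    """Build supervised iris / eyelid / background masks from known eye keypoints
    by fitting an ellipse to each keypoint set and rasterizing it as a filled mask."""
    h, w = image_shape[:2]
    iris_mask = np.zeros((h, w), np.uint8)
    eyelid_mask = np.zeros((h, w), np.uint8)
    cv2.ellipse(iris_mask, cv2.fitEllipse(iris_keypoints.astype(np.float32)), 255, -1)
    cv2.ellipse(eyelid_mask, cv2.fitEllipse(eyelid_keypoints.astype(np.float32)), 255, -1)
    eyelid_mask[iris_mask > 0] = 0                      # keep the three regions disjoint
    background_mask = cv2.bitwise_not(cv2.bitwise_or(iris_mask, eyelid_mask))
    return iris_mask, eyelid_mask, background_mask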
And carrying out segmentation processing on the eye features of the sample by using the eye segmentation branch network to obtain a segmented predicted iris region, a predicted eyelid region and a predicted background region.
The eye segmentation branch network in the embodiment realizes region segmentation of the sample eye features, and segments the sample eye features into a predicted iris region, a predicted eyelid region and a predicted background region.
The second loss is determined based on the supervised iris region, the supervised eyelid region, and the supervised background region, and the predicted iris region, the predicted eyelid region, and the predicted background region.
Optionally, the supervised iris region, the supervised eyelid region, and the supervised background region are used as supervision information, and the second loss is determined jointly from the difference between the predicted iris region and the supervised iris region, the difference between the predicted eyelid region and the supervised eyelid region, and the difference between the predicted background region and the supervised background region; for example, in the embodiment shown in fig. 2a, the second loss is derived based on the output of the eye segmentation branch network 205.
In this embodiment, step 704 may include:
weighting and summing the first loss and the second loss to obtain network loss; the neural network is trained based on the network losses.
In this embodiment, the network loss is determined by combining the first loss and the second loss, and the process of training the neural network based on the network loss may refer to step 704, which is not described here again. By incorporating the second loss, this embodiment alleviates the inaccuracy that arises when the gaze angle is effectively determined by the face orientation in cases where the head pose amplitude is small but the gaze angle is large, and enhances the capability of accurate regression. Feeding the pupil segmentation result and the gaze angle result back to the neural network at the same time lets the neural network learn the features of both tasks simultaneously, and the intrinsic relation between the two tasks is mutually reinforcing, so that the neural network obtains better gaze angle results.
In some optional embodiments, the neural network further comprises a keypoint prediction branch network;
step 701 may further include:
and performing key point prediction on the eye features of the sample by using the key point prediction branch network to obtain predicted eye key points.
Alternatively, the keypoint prediction branch network may be a keypoint detection network that detects the predicted eye keypoints in the sample eye image; optionally, the number of predicted eye keypoints is the same as the number of known eye keypoints, for example, 16 eye keypoints per eye.
A third loss is determined based on the predicted ocular keypoints and the known ocular keypoints.
Optionally, each of the known eye keypoints has known coordinates, and predicted coordinates are obtained for each of the predicted eye keypoints. For convenience of correspondence, the known eye keypoints and the predicted eye keypoints can each be numbered, the differences between corresponding known and predicted eye keypoints are determined through the numbering, and the third loss is determined based on the differences between all known eye keypoints and the corresponding predicted eye keypoints; for example, in the embodiment shown in fig. 2a, the third loss is derived based on the output of the keypoint prediction branch network 206.
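For illustration, a third loss computed from index-matched keypoint coordinates, assuming 16 keypoints per eye and a mean absolute coordinate difference:

import torch

# 2 eyes x 16 keypoints, each with (x, y) coordinates; values here are random placeholders.
known_keypoints = torch.rand(1, 32, 2)
predicted_keypoints = torch.rand(1, 32, 2)

# Keypoints are matched by index (the "numbering" described above).
third_loss = (predicted_keypoints - known_keypoints).abs().mean()
print(third_loss.item())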
In this embodiment, step 704 may include:
weighting and summing the first loss, the second loss and the third loss to obtain the network loss; the neural network is trained based on the network losses.
In this embodiment, the network loss is determined by combining the first loss, the second loss, and the third loss, and the process of training the neural network based on the network loss may refer to step 704, which is not described here again. On the basis of combining the second loss, this embodiment further trains the neural network with the third loss, and the eye keypoint supervision further improves the prediction accuracy of the neural network for the gaze angle.
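A minimal sketch of the weighted summation of the three losses into the network loss; the weight values are illustrative assumptions and are not specified by the disclosure:

import torch

def network_loss(first_loss, second_loss, third_loss,
                 w_gaze=1.0, w_seg=0.5, w_keypoint=0.5):
    """Weighted sum of the gaze, segmentation, and keypoint losses."""
    return w_gaze * first_loss + w_seg * second_loss + w_keypoint * third_loss

# Usage with placeholder loss values; the result is back-propagated during training.
loss = network_loss(torch.tensor(1.8), torch.tensor(0.6), torch.tensor(0.9))
print(loss.item())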
In some alternative embodiments, step 702 may include:
and determining a prediction angle interval vector corresponding to the two eyes in the sample eye image based on the sample eye features by utilizing a first positioning branch network of the neural network.
The implementation and effect of this step can refer to step 3031 in the above embodiment, and will not be described herein again.
And determining a prediction offset angle vector corresponding to the two eyes in the sample eye image based on the sample eye feature and the prediction angle interval vector by utilizing a second positioning branch network of the neural network.
The implementation and effect of this step can refer to step 3032 in the above embodiment, and will not be described herein again.
And determining the predicted sight line angles corresponding to the two eyes in the sample eye image based on the prediction angle interval vector and the prediction offset angle vector.
The implementation and effect of this step can refer to step 304 in the above embodiment, and will not be described herein again.
The network structure of the neural network provided in this embodiment does not change during training; therefore, the process of determining the predicted gaze angle during network training is the same as the process of estimating the gaze angle with the trained neural network, the only difference being the values of the network parameters, and the predicted gaze angle obtained in this embodiment is not yet the optimal gaze angle result. This embodiment predicts the gaze angle using the feature extraction branch network, the first positioning branch network, and the second positioning branch network of the neural network, so that the network parameters of all three branch networks are adjusted, improving the performance of each branch network and thus the prediction accuracy of the neural network.
Any of the gaze angle estimation or neural network training methods provided by embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any of the gaze angle estimation or neural network training methods provided by embodiments of the present disclosure may be performed by a processor, such as a processor that executes any of the gaze angle estimation or neural network training methods mentioned by embodiments of the present disclosure by invoking corresponding instructions stored in a memory. And will not be described in detail below.
Exemplary devices
Fig. 8 is a schematic structural diagram of a gaze angle estimation apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 8, the apparatus provided in this embodiment includes:
and the region extraction module 81 is configured to determine an eye region included in the image to be detected.
And the feature extraction module 82 is configured to perform feature extraction on the eye region obtained by the region extraction module 81 to obtain an eye region feature corresponding to the eye region.
The interval estimation module 83 is configured to determine an angle interval vector corresponding to two eyes in the eye region and an offset angle vector corresponding to two eyes in the eye region based on the eye region feature obtained by the feature extraction module 82.
An angle determining module 84, configured to determine gaze angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimating module 83.
According to the gaze angle estimation apparatus provided by the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because the eye region is used as the basis for processing, the apparatus can adapt to head poses of various amplitudes and improves the robustness of gaze angle estimation.
Fig. 9 is a schematic structural diagram of a gaze angle estimation apparatus according to another exemplary embodiment of the present disclosure. As shown in fig. 9, the apparatus provided in this embodiment includes:
the feature extraction module 82 is specifically configured to perform feature extraction on the eye region by using a feature extraction branch network of the neural network, so as to obtain an eye region feature corresponding to the eye region.
An interval estimation module 83, comprising:
a first branch positioning unit 831, configured to determine, by using a first positioning branch network of the neural network, angle interval vectors corresponding to both eyes in the eye region based on the eye region features;
a second branch positioning unit 832 for determining an offset angle vector corresponding to both eyes in the eye region based on the eye region feature and the angle interval vector using a second positioning branch network of the neural network.
Optionally, the first branch positioning unit 831 is specifically configured to divide the included angles of the to-be-detected image in the horizontal direction and the vertical direction with respect to the normal vector into a set number of horizontal angle intervals and a set number of vertical angle intervals, respectively; processing the eye region features by using a first positioning branch network of the neural network to obtain a first confidence coefficient of each horizontal angle interval corresponding to each eye in the eye region and a second confidence coefficient of each vertical angle interval corresponding to each eye in the eye region; determining two horizontal angle intervals corresponding to the two eyes based on the first confidence coefficient, and determining two vertical angle intervals corresponding to the two eyes based on the second confidence coefficient; and determining angle interval vectors corresponding to the two eyes based on the two horizontal angle intervals and the two vertical angle intervals corresponding to the two eyes.
Optionally, the second branch positioning unit 832 is specifically configured to utilize a second positioning branch network of the neural network to process the eye region features, so as to obtain two horizontal offset angles in the horizontal direction and two vertical offset angles in the vertical direction of each of the two eyes; based on the two horizontal offset angles and the two vertical offset angles, offset angle vectors corresponding to the two eyes are determined.
Optionally, the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for the left eye, and a horizontal gaze angle and a vertical gaze angle for the right eye;
an angle determination module 84, comprising:
an angle vector determining unit 841, configured to add the angle interval vector and the offset angle vector to obtain a gaze angle vector;
a binocular angle determination unit 842, configured to determine a horizontal line-of-sight angle and a vertical line-of-sight angle for the left eye and a horizontal line-of-sight angle and a vertical line-of-sight angle for the right eye in the eyes of the eye region based on the line-of-sight angle vectors.
A region extraction module 81 comprising:
the face detection unit 811 is used for performing face detection processing on the image to be detected to obtain at least one face region in the image to be detected;
a human eye detection unit 812, configured to perform face key point identification on each face region to obtain human eye key points in the face region;
an eye region determining unit 813, configured to obtain the eye region corresponding to each face region based on the human eye key points in that face region.
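As an illustration only, the region extraction module could be approximated with off-the-shelf OpenCV detectors. The Haar cascades used here merely stand in for the face detection and face key point steps described above and are not the disclosed implementation:

```python
import cv2

# Stand-ins for the face detection and eye key point steps; Haar cascades are
# used here only to keep the sketch self-contained.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_eye_regions(image_bgr):
    """Return one eye region crop (covering both eyes) per detected face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    regions = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        face = gray[y:y + h, x:x + w]
        eyes = eye_cascade.detectMultiScale(face, 1.1, 5)
        if len(eyes) < 2:
            continue                       # need both eyes for gaze estimation
        # Bounding box that covers every detected eye in this face
        x0 = min(ex for (ex, ey, ew, eh) in eyes)
        y0 = min(ey for (ex, ey, ew, eh) in eyes)
        x1 = max(ex + ew for (ex, ey, ew, eh) in eyes)
        y1 = max(ey + eh for (ex, ey, ew, eh) in eyes)
        regions.append(face[y0:y1, x0:x1])
    return regions
```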
In this embodiment, the apparatus may further include, before the feature extraction module 82:
A data enhancement module 85, configured to perform local histogram equalization on the eye region to enhance the contrast of the eye region, so as to obtain a contrast-enhanced eye region.
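A minimal sketch of the local histogram equalization performed by the data enhancement module, using OpenCV's CLAHE; the clip limit and tile grid size are illustrative assumptions:

```python
import cv2

def enhance_eye_region(eye_gray):
    """Local (tile-wise) histogram equalization to boost eye region contrast."""
    # Clip limit and tile grid size are illustrative assumptions.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(eye_gray)           # expects a single-channel 8-bit image
```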
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network; as shown in fig. 10, the apparatus provided in this embodiment includes:
The feature extraction module 11 is configured to input the sample eye image into the neural network and perform feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain sample eye features.
The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles.
The angle estimation module 12 is configured to process the sample eye features obtained by the feature extraction module 11 by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to two eyes in the sample eye image;
A loss determining module 13, configured to determine a first loss based on the predicted gaze angle obtained by the angle estimation module 12 and the known gaze angle.
A network training module 14, configured to determine a network loss based on the first loss obtained by the loss determining module 13, and to train the neural network based on the network loss.
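A minimal training-step sketch for this device, assuming a PyTorch model whose forward pass outputs the predicted gaze angles for both eyes; the choice of an L1 loss and the tensor shapes are assumptions, since the disclosure does not fix the loss form:

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, sample_eye_images, known_gaze_angles):
    """One update of the neural network using only the first loss.

    sample_eye_images: (B, C, H, W) tensor of sample eye images (both eyes).
    known_gaze_angles: (B, 4) tensor [left_h, left_v, right_h, right_v].
    """
    predicted = model(sample_eye_images)             # predicted gaze angles, (B, 4)
    first_loss = nn.functional.l1_loss(predicted, known_gaze_angles)
    network_loss = first_loss                        # network loss = first loss here
    optimizer.zero_grad()
    network_loss.backward()
    optimizer.step()
    return network_loss.item()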
In this embodiment, the neural network is trained using the known gaze angles, so that the resulting neural network can better predict the gaze angle in an eye region, thereby improving the accuracy of gaze angle estimation.
Fig. 11 is a schematic structural diagram of a neural network training device according to another exemplary embodiment of the present disclosure. As shown in fig. 11, the apparatus provided in this embodiment includes:
In this embodiment, the neural network further comprises an eye segmentation branch network, and both eyes in the sample eye image also have corresponding known eye key points.
After the feature extraction module 11, the apparatus further includes:
A segmentation supervision determining module 15, configured to determine a supervised iris region, a supervised eyelid region and a supervised background region in the sample eye image based on the known eye key points corresponding to the sample eye image.
A segmentation prediction module 16, configured to perform segmentation processing on the sample eye features by using the eye segmentation branch network to obtain a predicted iris region, a predicted eyelid region and a predicted background region.
A second loss determining module 17, configured to determine a second loss based on the supervised iris region, the supervised eyelid region and the supervised background region, and on the predicted iris region, the predicted eyelid region and the predicted background region.
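For the second loss, a per-pixel cross-entropy over the three segmented classes is one natural choice; both the use of cross-entropy and the channel ordering are assumptions of this sketch:

```python
import torch
import torch.nn as nn

def second_loss(seg_logits, supervision_mask):
    """Per-pixel cross-entropy between predicted and supervised segmentation.

    seg_logits:       (B, 3, H, W) output of the eye segmentation branch;
                      channel order assumed to be iris, eyelid, background.
    supervision_mask: (B, H, W) long tensor with values 0/1/2 derived from
                      the known eye key points.
    """
    return nn.functional.cross_entropy(seg_logits, supervision_mask)
```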
The network training module 14 is specifically configured to perform weighted summation on the first loss and the second loss to obtain a network loss, and to train the neural network based on the network loss.
Optionally, the neural network further comprises a keypoint prediction branch network;
after the feature extraction module 11, the apparatus further includes:
A key point prediction module 18, configured to perform key point prediction on the sample eye features by using the key point prediction branch network to obtain predicted eye key points.
A third loss determination module 19 for determining a third loss based on the predicted eye keypoints and the known eye keypoints.
The network training module 14 is specifically configured to perform weighted summation on the first loss, the second loss and the third loss to obtain a network loss, and to train the neural network based on the network loss.
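The network loss described here is a weighted sum of the individual losses; the weight values in the sketch below are placeholders, as the disclosure does not specify them:

```python
# Placeholder weights; the disclosure does not specify their values.
W_GAZE, W_SEG, W_KPT = 1.0, 0.5, 0.5

def network_loss(first_loss, second_loss, third_loss):
    """Weighted sum of the gaze, segmentation and key point losses."""
    return W_GAZE * first_loss + W_SEG * second_loss + W_KPT * third_loss
```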
The angle estimation module 12 is specifically configured to determine, by using a first positioning branch network of a neural network, prediction angle interval vectors corresponding to both eyes in a sample eye image based on the sample eye features; determining a prediction offset angle vector corresponding to the two eyes in the sample eye image based on the sample eye feature and the prediction angle interval vector by utilizing a second positioning branch network of the neural network; and determining the predicted sight line angles corresponding to the two eyes in the sample eye image based on the prediction angle interval vector and the prediction offset angle vector.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 12. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.
FIG. 12 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 12, the electronic device 120 includes one or more processors 121 and a memory 122.
The processor 121 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 120 to perform desired functions.
Memory 122 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 121 to implement the gaze angle estimation or neural network training methods of the various embodiments of the disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 120 may further include: an input device 123 and an output device 124, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 123 may be the microphone or the microphone array described above for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input means 123 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 123 may also include, for example, a keyboard, a mouse, and the like.
The output device 124 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 124 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 120 relevant to the present disclosure are shown in fig. 12, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 120 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the gaze angle estimation or neural network training method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations for embodiments of the present disclosure in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the gaze angle estimation or neural network training method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. However, it is noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description only and is not intended to be limiting, since the disclosure is not limited to the specific details described above.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A gaze angle estimation method, comprising:
determining an eye region included in an image to be detected;
performing feature extraction on the eye region to obtain eye region features corresponding to the eye region;
determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features;
determining a gaze angle corresponding to both eyes in the eye region based on the angle interval vector and the offset angle vector.
2. The method according to claim 1, wherein the extracting the features of the eye region to obtain the features of the eye region corresponding to the eye region comprises:
and performing feature extraction on the eye region by using a feature extraction branch network of the neural network to obtain eye region features corresponding to the eye region.
3. The method of claim 2, wherein the determining, based on the ocular region features, an angle interval vector corresponding to both eyes in the ocular region and an offset angle vector corresponding to both eyes in the ocular region comprises:
determining angle interval vectors corresponding to the two eyes in the eye region based on the eye region features by utilizing a first positioning branch network of the neural network;
determining, with a second positioning branch network of the neural network, offset angle vectors corresponding to both eyes in the eye region based on the eye region features and the angle interval vector.
4. The method of claim 1, wherein the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for a left eye, and a horizontal gaze angle and a vertical gaze angle for a right eye;
the determining, based on the angle interval vector and the offset angle vector, gaze angles corresponding to both eyes in the eye region includes:
adding the angle interval vector and the offset angle vector to obtain a sight line angle vector;
determining a horizontal line-of-sight angle and a vertical line-of-sight angle for a left eye, and a horizontal line-of-sight angle and a vertical line-of-sight angle for a right eye of both eyes of the eye region based on the line-of-sight angle vectors.
5. A neural network training method, wherein the neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network, the method comprising:
inputting a sample eye image into the neural network, and performing feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain a sample eye feature; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
processing the sample eye features by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye image;
determining a first loss based on the predicted gaze angle and the known gaze angle;
determining a net loss based on the first loss, training the neural network based on the net loss.
6. The method of claim 5, the neural network further comprising an eye segmentation branch network; both eyes in the sample eye image also have corresponding known eye keypoints;
after the step of inputting the sample eye image into the neural network and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain the sample eye feature, the method further includes:
determining a supervised iris region, a supervised eyelid region and a supervised background region in the sample eye image based on known eye key points corresponding to the sample eye image;
carrying out segmentation processing on the sample eye features by using the eye segmentation branch network to obtain a segmented predicted iris region, a predicted eyelid region and a predicted background region;
determining a second loss based on the supervised iris, eyelid, and background regions, and the predicted iris, eyelid, and background regions;
the determining a net loss based on the first loss, training the neural network based on the net loss, comprising:
weighting and summing the first loss and the second loss to obtain a network loss;
training the neural network based on the network loss.
7. A gaze angle estimation device, comprising:
the region extraction module is used for determining an eye region included in the image to be detected;
the feature extraction module is used for performing feature extraction on the eye region obtained by the region extraction module to obtain eye region features corresponding to the eye region;
the interval estimation module is used for determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features obtained by the feature extraction module;
and the angle determining module is used for determining the sight line angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimation module.
8. A neural network training device, wherein the neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network, the device comprising:
the characteristic extraction module is used for inputting the sample eye image into the neural network and extracting the characteristics of the sample eye image through the characteristic extraction branch network of the neural network to obtain the sample eye characteristics; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
the angle estimation module is used for processing the sample eye features obtained by the feature extraction module by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye images;
a loss determination module for determining a first loss based on the predicted gaze angle and the known gaze angle obtained by the angle estimation module;
and the network training module is used for determining the network loss based on the first loss obtained by the loss determining module and training the neural network based on the network loss.
9. A computer-readable storage medium storing a computer program for executing the gaze angle estimation method of any one of claims 1 to 4 or the neural network training method of any one of claims 5 to 6.
10. An electronic device, the electronic device comprising:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the gaze angle estimation method of any one of claims 1 to 4 or the neural network training method of any one of claims 5 to 6.
CN202011502278.0A 2020-12-18 2020-12-18 Gaze angle estimation method and device, storage medium, and electronic device Pending CN112597872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011502278.0A CN112597872A (en) 2020-12-18 2020-12-18 Gaze angle estimation method and device, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN112597872A true CN112597872A (en) 2021-04-02

Family

ID=75199376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011502278.0A Pending CN112597872A (en) 2020-12-18 2020-12-18 Gaze angle estimation method and device, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN112597872A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102548465A (en) * 2010-09-27 2012-07-04 松下电器产业株式会社 Line-of-sight estimation device
CN107871107A (en) * 2016-09-26 2018-04-03 北京眼神科技有限公司 Face authentication method and device
JP2019098024A (en) * 2017-12-06 2019-06-24 国立大学法人静岡大学 Image processing device and method
CN110310321A (en) * 2018-03-23 2019-10-08 爱信精机株式会社 Direction of visual lines estimates device, direction of visual lines evaluation method and direction of visual lines estimation program
CN108734086A (en) * 2018-03-27 2018-11-02 西安科技大学 The frequency of wink and gaze estimation method of network are generated based on ocular
CN109033957A (en) * 2018-06-20 2018-12-18 同济大学 A kind of gaze estimation method based on quadratic polynomial
CN109492514A (en) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 A kind of method and system in one camera acquisition human eye sight direction
CN110889332A (en) * 2019-10-30 2020-03-17 中国科学院自动化研究所南京人工智能芯片创新研究院 Lie detection method based on micro expression in interview

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468956A (en) * 2021-05-24 2021-10-01 北京迈格威科技有限公司 Attention judging method, model training method and corresponding device
CN113506328A (en) * 2021-07-16 2021-10-15 北京地平线信息技术有限公司 Method and device for generating sight line estimation model and method and device for estimating sight line
CN113705349A (en) * 2021-07-26 2021-11-26 电子科技大学 Attention power analysis method and system based on sight estimation neural network
CN113705349B (en) * 2021-07-26 2023-06-06 电子科技大学 Attention quantitative analysis method and system based on line-of-sight estimation neural network
CN113470114A (en) * 2021-08-31 2021-10-01 北京世纪好未来教育科技有限公司 Sight estimation method, sight estimation device, electronic equipment and computer-readable storage medium
WO2024051345A1 (en) * 2022-09-07 2024-03-14 浙江极氪智能科技有限公司 Driver's line of sight identification method and apparatus, vehicle and storage medium

Similar Documents

Publication Publication Date Title
CN112597872A (en) Gaze angle estimation method and device, storage medium, and electronic device
US11295474B2 (en) Gaze point determination method and apparatus, electronic device, and computer storage medium
US11182592B2 (en) Target object recognition method and apparatus, storage medium, and electronic device
EP3525165B1 (en) Method and apparatus with image fusion
US9626553B2 (en) Object identification apparatus and object identification method
US11163978B2 (en) Method and device for face image processing, storage medium, and electronic device
JP2018525718A (en) Face recognition system and face recognition method
WO2018219180A1 (en) Method and apparatus for determining facial image quality, as well as electronic device and computer storage medium
US11367310B2 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
JP6822482B2 (en) Line-of-sight estimation device, line-of-sight estimation method, and program recording medium
JP6071002B2 (en) Reliability acquisition device, reliability acquisition method, and reliability acquisition program
US11386710B2 (en) Eye state detection method, electronic device, detecting apparatus and computer readable storage medium
KR20210012012A (en) Object tracking methods and apparatuses, electronic devices and storage media
CN106471440A (en) Eye tracking based on efficient forest sensing
CN111898571A (en) Action recognition system and method
CN112651321A (en) File processing method and device and server
JP7227385B2 (en) Neural network training and eye open/close state detection method, apparatus and equipment
CN113592706A (en) Method and device for adjusting homography matrix parameters
CN112200109A (en) Face attribute recognition method, electronic device, and computer-readable storage medium
CN109858355B (en) Image processing method and related product
EP2998928B1 (en) Apparatus and method for extracting high watermark image from continuously photographed images
US10902628B1 (en) Method for estimating user eye orientation using a system-independent learned mapping
CN113506328A (en) Method and device for generating sight line estimation model and method and device for estimating sight line
JP7103443B2 (en) Information processing equipment, information processing methods, and programs
CN112199978A (en) Video object detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination