CN112597872A - Gaze angle estimation method and device, storage medium, and electronic device
- Publication number: CN112597872A (application CN202011502278.0A)
- Authority: CN (China)
- Prior art keywords: eye, angle, region, eyes, network
- Legal status: Granted
Classifications
- G06V40/197 - Eye characteristics, e.g. of the iris; Matching; Classification
- G06N3/045 - Neural networks; Architecture; Combinations of networks
- G06N3/084 - Neural networks; Learning methods; Backpropagation, e.g. using gradient descent
- G06V40/193 - Eye characteristics, e.g. of the iris; Preprocessing; Feature extraction
Abstract
The embodiments of the present disclosure disclose a gaze angle estimation method and apparatus, a storage medium, and an electronic device, wherein the method includes: determining an eye region included in an image to be detected; performing feature extraction on the eye region to obtain eye region features corresponding to the eye region; determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features; and determining gaze angles corresponding to the two eyes in the eye region based on the angle interval vectors and the offset angle vectors. In the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because processing is performed on the basis of the eye region, the embodiments can adapt to head postures of various amplitudes, improving the robustness of the gaze angle estimation method.
Description
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a gaze angle estimation method and apparatus, a storage medium, and an electronic device.
Background
Gaze angle estimation infers a person's gaze direction from an eye image or a face image. In human-computer interaction, having a computer automatically estimate the viewing direction of the human eyes, i.e. gaze angle estimation, is an important application. For example, during driving, the current state of the driver, such as whether distraction or fatigue occurs, can be predicted through gaze angle estimation, so that the vehicle is controlled to execute a series of corresponding operations and/or reminders.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a sight angle estimation method and device, a storage medium and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a gaze angle estimation method, including:
determining an eye region included in an image to be detected;
performing feature extraction on the eye region to obtain eye region features corresponding to the eye region;
determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features;
determining a gaze angle corresponding to both eyes in the eye region based on the angle interval vector and the offset angle vector.
According to another aspect of the embodiments of the present disclosure, there is provided a neural network training method, the neural network including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; the method comprises the following steps:
inputting a sample eye image into the neural network, and performing feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain a sample eye feature; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
processing the sample eye features by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye image;
determining a first loss based on the predicted gaze angle and the known gaze angle;
determining a network loss based on the first loss, and training the neural network based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a gaze angle estimation apparatus including:
the region extraction module is used for determining an eye region included in the image to be detected;
the feature extraction module is used for performing feature extraction on each eye region obtained by the region extraction module to obtain eye region features corresponding to each eye region;
the interval estimation module is used for determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features obtained by the feature extraction module;
and the angle determining module is used for determining the sight line angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimation module.
According to still another aspect of the embodiments of the present disclosure, there is provided a neural network training apparatus, the neural network including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; the apparatus includes:
the characteristic extraction module is used for inputting the sample eye image into the neural network and extracting the characteristics of the sample eye image through the characteristic extraction branch network of the neural network to obtain the sample eye characteristics; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
the angle estimation module is used for processing the sample eye features obtained by the feature extraction module by utilizing the first positioning branch network and the second positioning branch network to obtain predicted sight angles corresponding to two eyes in the sample eye images;
a loss determination module for determining a first loss based on the predicted gaze angle and the known gaze angle obtained by the angle estimation module;
and the network training module is used for determining the network loss based on the first loss obtained by the loss determining module and training the neural network based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the gaze angle estimation method of any of the above embodiments or the neural network training method of any of the above embodiments.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the gaze angle estimation method according to any of the above embodiments or the neural network training method according to any of the above embodiments.
Based on the gaze angle estimation method and apparatus, the storage medium, and the electronic device provided by the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because the embodiments of the present disclosure perform processing on the basis of the eye region, the method can adapt to head postures of various amplitudes, improving the robustness of the gaze angle estimation method.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic structural diagram of a gaze angle estimation system according to an exemplary embodiment of the present disclosure.
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 2b is a schematic view of a gaze angle of a sample eye image applied in a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a gaze angle estimation method according to an exemplary embodiment of the disclosure.
Fig. 4 is a schematic flow chart of step 303 in the embodiment shown in fig. 3 of the present disclosure.
Fig. 5 is a schematic flow chart of step 3031 in the embodiment shown in fig. 4 of the present disclosure.
Fig. 6a is a schematic flow chart of step 3032 in the embodiment shown in fig. 4 of the present disclosure.
Fig. 6b is a schematic diagram of an eye region in an example of a gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 6c is a schematic diagram of an eye region subjected to data enhancement in one example of a gaze angle estimation method provided by the embodiment of the present disclosure.
Fig. 6d is a schematic diagram of an eye region in another example of the gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 6e is a schematic diagram of an eye region subjected to data enhancement in another example of the gaze angle estimation method provided by the embodiment of the disclosure.
Fig. 7 is a flowchart illustrating a neural network training method according to an exemplary embodiment of the disclosure.
Fig. 8 is a schematic structural diagram of a gaze angle estimation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a gaze angle estimation apparatus according to another exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a neural network training device according to another exemplary embodiment of the present disclosure.
Fig. 12 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning, nor is the necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In the course of implementing the present disclosure, the inventors found that in existing solutions the mainstream approach to gaze detection is model-based, that is, the gaze direction vector is calculated using infrared light source reflections and a geometric model. Existing solutions have at least the following problems: the cost is high, the requirement on image quality is high, complex equipment (multiple infrared light sources and a high-definition camera) is needed, and the head posture must remain essentially still, which greatly limits the applicable range of such solutions.
Exemplary System
The present disclosure provides a gaze angle estimation method that determines the gaze angle from coarse to fine in two stages based on confidence and offset, and improves the contrast of the eye region by a data enhancement method in the data preprocessing stage, thereby improving the precision and robustness of gaze angle estimation and the adaptability to head movement.
Fig. 1 is a schematic structural diagram of a gaze angle estimation system according to an exemplary embodiment of the present disclosure. As shown in fig. 1, wherein the neural network comprises: a feature extraction branch network 102, a first positioning branch network 103 and a second positioning branch network 104;
the acquired image to be detected is processed by a face detection model and an eye key point model 101 to obtain an eye region (other preposition information such as eye key points).
Data enhancement processing is then performed on the eye region to enhance its contrast, strengthen the attention to the eye region, and improve the generalization capability of the model.
The data-enhanced eye region is input into the neural network, and feature extraction is performed on the eye region through the feature extraction branch network 102 of the neural network to obtain the eye region features. The eye region features are then processed by the first positioning branch network 103 and the second positioning branch network 104 of the neural network to realize two-stage gaze direction estimation: the gaze range is divided into a set number of horizontal angle intervals and vertical angle intervals; the first positioning branch network 103 coarsely locates the horizontal angle interval and the vertical angle interval in which the gaze lies by outputting a confidence for each interval, thereby determining the angle interval vector corresponding to the gaze direction in the eye region; the second positioning branch network 104 performs finer classification and regression, first classifying, selecting and mapping the classification result to an interval according to the confidence, and then regressing a finer-grained offset to obtain the offset angle vector.
The offset angle vector is added to the angle interval vector obtained by the first-stage coarse positioning, determining the horizontal and vertical gaze angles of the left eye and the horizontal and vertical gaze angles of the right eye in the eye region.
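In one alternative example, this three-branch structure may be sketched roughly as follows; the backbone, layer sizes, number of intervals, and branch names below are illustrative assumptions rather than the concrete network of this disclosure:

```python
import torch
import torch.nn as nn

NUM_BINS = 6  # set number of horizontal / vertical angle intervals (illustrative)

class GazeNet(nn.Module):
    """Rough sketch of the three-branch structure; the backbone and layer sizes are placeholders."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Feature extraction branch network 102 (a tiny CNN stands in for the real backbone).
        self.feature_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # First positioning branch network 103: per-eye confidences over horizontal and
        # vertical angle intervals (2 eyes x 2 directions x NUM_BINS values).
        self.coarse_branch = nn.Linear(feat_dim, 2 * 2 * NUM_BINS)
        # Second positioning branch network 104: per-eye horizontal and vertical offsets (4 values).
        self.offset_branch = nn.Linear(feat_dim, 4)

    def forward(self, eye_region):
        feat = self.feature_branch(eye_region)
        # Confidences are reshaped to [batch, 4, NUM_BINS]:
        # rows are [left horizontal, left vertical, right horizontal, right vertical].
        confidences = self.coarse_branch(feat).view(-1, 4, NUM_BINS)
        offsets = self.offset_branch(feat)  # offset angle vector, in degrees
        return confidences, offsets
```

At inference time only these three branches are needed; the segmentation and keypoint branches used during training (described below) can be dropped.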
The two-stage coarse-to-fine positioning improves the accuracy of the gaze angle; data enhancement targeted at the eye region improves the contrast of the eye region and improves the robustness of the gaze angle estimation method.
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure. As shown in fig. 2a, in the training process, the neural network includes: a feature extraction branch network 102, a first positioning branch network 103, a second positioning branch network 104, an eye segmentation branch network 205, and a key point prediction branch network 206;
The sample eye image is input into the neural network; the sample eye image may be obtained by face detection and human eye detection on a sample image including eyes, and may be input into the neural network after data enhancement. As shown in fig. 2b, both eyes in the sample eye image have corresponding known gaze angles, where the known gaze angles may include a known horizontal gaze angle and a known vertical gaze angle of the left eye, and a known horizontal gaze angle and a known vertical gaze angle of the right eye. Two-stage gaze direction estimation is realized through the feature extraction branch network 102, the first positioning branch network 103, and the second positioning branch network 104 of the neural network, and the predicted gaze angles of the two eyes in the eye region are determined; the predicted gaze angles may include the predicted horizontal and vertical gaze angles of the left eye and the predicted horizontal and vertical gaze angles of the right eye.
A first loss is determined based on the predicted gaze angle and the known gaze angle.
On the basis of determining the first loss, the eye segmentation branch network 205 can be utilized to perform segmentation processing on the sample eye image, and a second loss is determined; the sample eye image includes known eye keypoints.
An auxiliary eye keypoint detection task determines a third loss based on the predicted eye key points determined by the keypoint prediction branch network 206 and the known eye key points of the sample eye image.
Determining a network loss based on the first loss, the second loss, and the third loss, and adjusting network parameters in the neural network by inverse gradient propagation based on the network loss. When applied, the trained neural network only includes the feature extraction branch network 102, the first positioning branch network 103 and the second positioning branch network 104.
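A minimal training-step sketch under these assumptions follows; the loss weights, the forward signature returning all three branch outputs, and the choice of L1 and cross-entropy losses are illustrative, not mandated by this disclosure:

```python
import torch.nn.functional as F

def training_step(model, batch, optimizer, w_gaze=1.0, w_seg=0.5, w_kp=0.5):
    """One multi-task update; the loss weights and the forward signature are illustrative."""
    images, gt_gaze, gt_masks, gt_keypoints = batch
    # During training the network exposes all branches: predicted gaze angles,
    # per-pixel iris/eyelid/background logits, and predicted eye key points.
    pred_gaze, seg_logits, pred_kp = model(images)
    loss_gaze = F.l1_loss(pred_gaze, gt_gaze)          # first loss (gaze angle)
    loss_seg = F.cross_entropy(seg_logits, gt_masks)   # second loss (eye segmentation)
    loss_kp = F.l1_loss(pred_kp, gt_keypoints)         # third loss (eye key points)
    net_loss = w_gaze * loss_gaze + w_seg * loss_seg + w_kp * loss_kp
    optimizer.zero_grad()
    net_loss.backward()   # gradients propagate back through all branch networks
    optimizer.step()
    return net_loss.item()
```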
In this embodiment, the losses corresponding to the pupil segmentation task and the gaze angle task are fed back to the neural network simultaneously, so that the neural network learns the features of both tasks; the internal relation between the two tasks allows them to promote each other, so the model obtains a better gaze result and the adaptability of the neural network to head movement is improved. Moreover, by also combining the loss corresponding to the eye keypoint detection task, the prediction accuracy of the neural network can be further improved to a certain extent.
Exemplary method
Fig. 3 is a flowchart illustrating a gaze angle estimation method according to an exemplary embodiment of the disclosure. The embodiment can be applied to an electronic device, as shown in fig. 3, and includes the following steps:
Step 301, determining an eye region included in an image to be detected. The image to be detected includes at least one pair of human eyes; the eye region in this embodiment refers to an image region including a pair of human eyes, and optionally the eye region may be obtained by the face detection model and eye keypoint model 101 in the embodiment provided in fig. 1.
Step 302, performing feature extraction on the eye region to obtain eye region features corresponding to the eye region. Optionally, feature extraction may be performed on the eye region by a feature extraction network or by a feature extraction branch network in the overall neural network to obtain the eye region features; optionally, the feature extraction branch network is the feature extraction branch network 102 in the embodiment provided in fig. 1.
Step 303, determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features. In this embodiment, two branch networks in the neural network may be used to process the eye region features respectively, so as to obtain an angle interval vector and an offset angle vector; the coarse direction of the gaze angle is determined by the angle interval vector, and the accuracy of the gaze direction is refined by the offset angle vector.
Step 304, determining the gaze angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector.
Optionally, as shown in fig. 2b, the gaze angle includes a pitch angle (pitch) of the eyeball, which represents the rotation angle of the eyeball about the X axis, and a yaw angle (yaw), which represents the rotation angle of the eyeball about the Y axis. The angle interval vector and the offset angle vector are accumulated, so that the specific offset of the binocular gaze within the located angle interval is determined from the offset angle vector, improving the accuracy of the determined gaze angle.
According to the sight angle estimation method provided by the embodiment of the disclosure, the accuracy of sight angle estimation is improved by combining the positioning angle interval with the offset angle, and the method can adapt to head postures of various amplitudes and improve the robustness of the sight angle estimation method as the eye region is processed on the basis.
Optionally, step 302 in the above embodiment may include: and performing feature extraction on the eye region by using the feature extraction branch network of the neural network to obtain eye region features corresponding to the eye region.
The feature extraction branch network in the embodiment is a part of the neural network, and feature extraction is performed on the eye region through the feature extraction branch network, so that a basis is provided for subsequently estimating the view angle through other parts in the neural network, and the efficiency of view estimation is improved.
As shown in fig. 4, based on the embodiment shown in fig. 3, step 303 may include the following steps:
Step 3031, determining, by using a first positioning branch network of the neural network, the angle interval vectors corresponding to the two eyes in the eye region based on the eye region features.
Step 3032, determining, by using a second positioning branch network of the neural network, the offset angle vectors corresponding to the two eyes in the eye region based on the eye region features and the angle interval vectors.
Optionally, as shown in fig. 1, the first positioning branch network 103 and the second positioning branch network 104 in this embodiment are respectively connected to the feature extraction branch network 102, and the first positioning branch network 103 and the second positioning branch network 104 may process the eye region features at the same time, so as to improve the network processing efficiency.
The neural network in this embodiment includes three branch networks: the feature extraction branch network, the first positioning branch network and the second positioning branch network are used for carrying out feature extraction once through the feature extraction branch network, and the eye region features obtained by feature extraction are respectively input into the first positioning branch network and the second positioning branch network, so that repeated feature extraction operation is avoided, and the processing efficiency of the neural network is improved.
As shown in fig. 5, based on the embodiment shown in fig. 4, step 3031 may include the following steps:
Step 501, dividing the included angles of the image to be detected in the horizontal direction and in the vertical direction relative to the normal vector into a set number of horizontal angle intervals and a set number of vertical angle intervals, respectively.
In this embodiment, the direction perpendicular to the image to be detected is taken as the normal vector direction. The included angle in the horizontal direction relative to the normal vector ranges from -90 degrees to 90 degrees (corresponding to the yaw angle of the eyeball in the embodiment shown in fig. 2b), where -90 degrees represents horizontally to the left in the plane of the image to be detected, 90 degrees represents horizontally to the right in that plane, and 0 degrees coincides with the normal vector, i.e. is perpendicular to the image to be detected. The included angle in the vertical direction relative to the normal vector likewise ranges from -90 degrees to 90 degrees (corresponding to the pitch angle of the eyeball in the embodiment shown in fig. 2b), where -90 degrees represents vertically upward in the plane of the image to be detected, 90 degrees represents vertically downward in that plane, and 0 degrees coincides with the normal vector. This embodiment divides the horizontal included angle and the vertical included angle into a set number of intervals respectively; each interval covers a range of angles and can be numbered to distinguish it. For example, the horizontal included angle with the normal vector may be divided into 6 horizontal angle intervals: interval 1 [-90 degrees, -60 degrees], interval 2 [-60 degrees, -30 degrees], interval 3 [-30 degrees, 0 degrees], interval 4 [0 degrees, 30 degrees], interval 5 [30 degrees, 60 degrees], and interval 6 [60 degrees, 90 degrees]. Coarse positioning of the gaze direction can be realized by locating the angle interval in which the gaze direction lies.
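For illustration, with 6 intervals of 30 degrees over [-90, 90] degrees, mapping an angle to its interval and back to an interval boundary could be sketched as follows; the clamping of the +90-degree boundary is an assumption of this sketch:

```python
NUM_BINS = 6                   # set number of intervals, as in the 6-interval example above
BIN_WIDTH = 180.0 / NUM_BINS   # each interval spans 30 degrees over [-90, 90]

def angle_to_interval(angle_deg):
    """Map an angle in [-90, 90] degrees to its 0-based interval index."""
    idx = int((angle_deg + 90.0) // BIN_WIDTH)
    return min(max(idx, 0), NUM_BINS - 1)  # clamp the +90-degree boundary into the last interval

def interval_lower_bound(idx):
    """Lower bound of an interval, used when the offset angle is added later."""
    return -90.0 + idx * BIN_WIDTH

# Example: -35 degrees falls into 0-based index 1, i.e. interval 2 ([-60, -30] degrees) above.
assert angle_to_interval(-35.0) == 1
```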
In this embodiment, the first positioning branch network may be implemented by using a classification network, and the classification network determines a first confidence level of each horizontal angle interval corresponding to each of the two eyes in the eye region, and optionally, may determine a first confidence level of a left eye for a left eye of the two eyes, and may determine a first confidence level of a right eye for a right eye of the two eyes; and determining a second confidence degree of each vertical angle interval corresponding to the two eyes, optionally, determining a second confidence degree of the left eye for the left eye of the two eyes, and determining a second confidence degree of the right eye for the right eye.
Optionally, a horizontal angle interval and a vertical angle interval corresponding to the left eye, and a horizontal angle interval and a vertical angle interval corresponding to the right eye may be respectively determined for both eyes, wherein the horizontal angle interval corresponding to the maximum left-eye first confidence coefficient in the set number of left-eye first confidence coefficients corresponding to the left eye is taken as the horizontal angle interval corresponding to the left eye, and the horizontal angle interval corresponding to the maximum right-eye first confidence coefficient in the set number of right-eye first confidence coefficients corresponding to the right eye is taken as the horizontal angle interval corresponding to the right eye; and taking the vertical angle interval corresponding to the maximum left-eye second confidence coefficient in the set number of left-eye second confidence coefficients corresponding to the left eye as the vertical angle interval corresponding to the left eye, and taking the vertical angle interval corresponding to the maximum right-eye second confidence coefficient in the set number of right-eye second confidence coefficients corresponding to the right eye as the vertical angle interval corresponding to the right eye.
In this embodiment, a one-dimensional 4-element vector is formed from the two horizontal angle intervals and the two vertical angle intervals, so that the horizontal and vertical angle intervals corresponding to the two eyes are expressed by a single angle interval vector; that is, the coarse positioning result for the binocular gaze is represented by one angle interval vector. Since the coarse positioning divides the angle range into a set number of intervals, the corresponding angle interval can be determined simply by determining the confidence of each interval, which greatly reduces the amount of data processing and improves the efficiency of coarse positioning.
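A possible way to assemble the 1x4 angle interval vector from the per-interval confidences is sketched below; it reuses the `interval_lower_bound` helper from the previous sketch, and the element ordering [left horizontal, left vertical, right horizontal, right vertical] is an assumption:

```python
import numpy as np

def interval_vector_from_confidences(h_conf_left, v_conf_left, h_conf_right, v_conf_right):
    """Pick the most confident horizontal / vertical interval per eye and return a
    one-dimensional 4-element angle interval vector of interval lower bounds."""
    confidences = (h_conf_left, v_conf_left, h_conf_right, v_conf_right)
    indices = [int(np.argmax(c)) for c in confidences]
    return np.array([interval_lower_bound(i) for i in indices], dtype=np.float32)
```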
As shown in fig. 6a, based on the embodiment shown in fig. 4, step 3032 may include the following steps:
In this embodiment, the second positioning branch network may perform finer classification and regression on the two eyes in the eye region. For example, each interval of the coarse classification may be subdivided into a plurality of subintervals; in this case the second positioning branch network may be regarded as a classification network that determines the confidence of each subinterval corresponding to the two eyes, thereby determining a finer-grained gaze angle. Alternatively, the second positioning branch network is an offset prediction network that, based on the input eye region features, directly obtains the offsets of the two eyes within the determined angle interval vector, thereby determining the finer-grained gaze angle.
In this embodiment, a one-dimensional 4-element vector is formed from the two horizontal offset angles and the two vertical offset angles, so that the horizontal and vertical offset angles corresponding to the two eyes are expressed by a single offset angle vector; that is, the fine-grained positioning result for the binocular gaze is represented by the offset angle vector. The coarse positioning divides the angle range into a set number of intervals and therefore only determines the approximate gaze direction; the specific gaze angle can then be determined through the offset angle vector, improving the accuracy of the binocular gaze angle positioning.
In some alternative embodiments, the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for the left eye, and a horizontal gaze angle and a vertical gaze angle for the right eye; step 304 in the above embodiments may include:
Adding the angle interval vector and the offset angle vector to obtain a gaze angle vector.
As can be seen from the above embodiments, the angle interval vector and the offset angle vector are each one-dimensional 4-element vectors; the two vectors may be added element by element to obtain a new one-dimensional 4-element vector, i.e. the gaze angle vector.
The horizontal and vertical gaze angles of the left eye and the horizontal and vertical gaze angles of the right eye among the two eyes of the eye region are determined based on the gaze angle vector.
In the embodiment, the offset angle vector is superimposed on the angle interval vector of the coarse positioning, so that the sight line direction is accurately positioned in the angle interval vector of the coarse positioning, a coarse-to-fine sight line direction positioning mode is realized, and the robustness and the accuracy of the sight line angle estimation method are improved.
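Under the same assumed 1x4 layout, the final decode is simply an element-wise addition; the function and dictionary layout below are illustrative:

```python
import numpy as np

def decode_gaze(interval_vector, offset_vector):
    """Element-wise addition of the coarse angle interval vector and the fine offset
    angle vector (both 1x4, in degrees), then split per eye."""
    gaze_vector = np.asarray(interval_vector) + np.asarray(offset_vector)
    left_h, left_v, right_h, right_v = gaze_vector
    return {"left": (left_h, left_v), "right": (right_h, right_v)}
```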
In some optional embodiments, step 301 may comprise:
Performing face detection processing on the image to be detected to obtain at least one face region in the image to be detected.
Optionally, as shown in the embodiment provided in fig. 1, the face detection model in the face detection model and the eye keypoint model 101 performs face detection on the image to be detected, and determines at least one face region included in the image to be detected.
And performing face key point identification on each face region to obtain eye key points in the face region.
Optionally, the eye keypoints in the face detection model and the eye keypoint model 101 shown in the embodiment provided in fig. 1 perform eye keypoint recognition on each face region respectively, and determine the eye keypoints included in the face region, where optionally, the eye keypoints may include, but are not limited to, 16 keypoints of each eye in both eyes.
Obtaining the eye region corresponding to each face region based on the human eye key points in that face region.
The embodiment determines the positions of two eyes in each face region based on a plurality of eye key points obtained by detection, and further obtains the eye region in each face region; the sight angle estimation is carried out by the eye region, and the adaptability of the sight angle estimation method to head movement is improved due to the fact that other parts of the human face are removed.
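As an illustration only, cropping an eye region from the detected eye key points could look like the sketch below; the margin ratio and the bounding-box strategy are assumptions rather than the claimed procedure:

```python
import numpy as np

def crop_eye_region(image, eye_keypoints, margin=0.4):
    """Crop a rectangle covering both eyes from the detected eye key points.
    `eye_keypoints` is an (N, 2) array of (x, y) coordinates; the margin ratio is illustrative."""
    pts = np.asarray(eye_keypoints, dtype=np.float32)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    h, w = image.shape[:2]
    x0, y0 = max(int(x_min - pad_x), 0), max(int(y_min - pad_y), 0)
    x1, y1 = min(int(x_max + pad_x), w), min(int(y_max + pad_y), h)
    return image[y0:y1, x0:x1]
```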
In some optional embodiments, before step 302, the method may further include:
Performing local histogram equalization processing on the eye region to enhance the contrast of the eye region, obtaining a contrast-enhanced eye region.
According to the embodiment, the contrast of the eyes is enhanced through local histogram equalization aiming at the eye area, the attention to the eye area is enhanced, and the accuracy of the sight angle estimation is improved. For example, in an alternative example, fig. 6b is a schematic diagram of an eye region in an example of a gaze angle estimation method provided by the embodiment of the present disclosure; fig. 6c is a schematic diagram of an eye region subjected to data enhancement in one example of a gaze angle estimation method provided by the embodiment of the disclosure; fig. 6d is a schematic diagram of an eye region in another example of a gaze angle estimation method provided by the embodiment of the present disclosure; fig. 6e is a schematic diagram of an eye region subjected to data enhancement in another example of the gaze angle estimation method provided by the embodiment of the disclosure.
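One common way to realize such local histogram equalization is CLAHE; a sketch with illustrative parameters (not mandated by this disclosure) is:

```python
import cv2

def enhance_eye_region(eye_region_bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Tile-wise (local) histogram equalization of the eye region using CLAHE;
    the clip limit and tile grid are illustrative parameters."""
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(gray)
```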
Fig. 7 is a flowchart illustrating a neural network training method according to an exemplary embodiment of the disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network; as shown in fig. 7, the method comprises the following steps:
Step 701, inputting a sample eye image into the neural network, and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain sample eye features. The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known gaze angles.
In this embodiment, the process of extracting features of the sample image may be implemented with reference to step 302 in the gaze angle estimation method.
Step 702, processing the sample eye features by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to the two eyes in the sample eye image. This step can be implemented with reference to steps 303 and 304 of the gaze angle estimation method in the embodiments shown in figs. 3-5; the only difference is that the network parameters in the neural network differ.
Step 703, determining a first loss based on the predicted gaze angle and the known gaze angle.
In this embodiment, the known gaze angle is used as the supervision information for neural network training, and the loss used to train the neural network is determined from the difference between the predicted gaze angle output by the current neural network and the known gaze angle. For example, as shown in fig. 2a, the first loss is obtained from the predicted gaze angles output by the first positioning branch network 103 and the second positioning branch network 104; the first loss may be calculated by any loss function, such as an L1 loss function.
Step 704, determining a network loss based on the first loss, and training the neural network based on the network loss.
Optionally, the parameters of the neural network may be adjusted by gradient back propagation based on the network loss; after each adjustment, steps 701 to 703 are executed again with the adjusted neural network until a preset condition is met, yielding the trained neural network. The preset condition may be, for example, reaching a set number of iterations, or the decrease of the network loss between two consecutive iterations being less than a set value. In this embodiment, the neural network is trained using the known gaze angles, so that the resulting network can better predict the gaze angle in the eye region, improving the accuracy of gaze angle estimation.
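A training-loop sketch consistent with this description, reusing the `training_step` helper sketched earlier and with illustrative stopping thresholds, is:

```python
def train(model, data_loader, optimizer, max_epochs=50, min_delta=1e-4):
    """Iterate parameter updates until a preset condition is met; the maximum number
    of epochs and the minimum loss decrease are illustrative thresholds."""
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for batch in data_loader:
            epoch_loss += training_step(model, batch, optimizer)
        epoch_loss /= max(len(data_loader), 1)
        # Stop once the network loss no longer decreases meaningfully between iterations.
        if prev_loss - epoch_loss < min_delta:
            break
        prev_loss = epoch_loss
    return model
```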
In some optional embodiments, the neural network further comprises an eye segmentation branch network; at this point, both eyes in the sample eye image also have corresponding known eye keypoints.
Step 701 may further include:
Determining a supervised iris region, a supervised eyelid region, and a supervised background region in the sample eye image based on the known eye key points corresponding to the sample eye image.
The eye key points may include, but are not limited to, 16 eye key points; ellipse fitting is performed on the eye key points to determine the supervised iris region and the supervised eyelid region in the sample eye image, and the remaining region of the sample eye image may be used as the supervised background region.
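For illustration, assuming the 16 key points can be split into an iris subset and an eyelid subset (this split is an assumption of the sketch), the supervision label map could be rasterized with OpenCV ellipse fitting:

```python
import cv2
import numpy as np

def build_supervision_masks(image_shape, iris_keypoints, eyelid_keypoints):
    """Fit ellipses to the iris and eyelid key points and rasterize them into a label
    map (0 = supervised background, 1 = supervised eyelid, 2 = supervised iris)."""
    h, w = image_shape[:2]
    label_map = np.zeros((h, w), dtype=np.uint8)
    # cv2.fitEllipse needs at least 5 points, given as an (N, 1, 2) int32 array.
    eyelid_ellipse = cv2.fitEllipse(np.asarray(eyelid_keypoints, dtype=np.int32).reshape(-1, 1, 2))
    iris_ellipse = cv2.fitEllipse(np.asarray(iris_keypoints, dtype=np.int32).reshape(-1, 1, 2))
    cv2.ellipse(label_map, eyelid_ellipse, color=1, thickness=-1)  # filled eyelid ellipse
    cv2.ellipse(label_map, iris_ellipse, color=2, thickness=-1)    # filled iris ellipse on top
    return label_map
```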
Carrying out segmentation processing on the sample eye features by using the eye segmentation branch network to obtain a predicted iris region, a predicted eyelid region, and a predicted background region.
The eye segmentation branch network in the embodiment realizes region segmentation of the sample eye features, and segments the sample eye features into a predicted iris region, a predicted eyelid region and a predicted background region.
The second loss is determined based on the supervised iris area, the supervised eyelid area, and the supervised background area, and the predicted iris area, the predicted eyelid area, and the predicted background area.
Optionally, taking the supervised iris region, the supervised eyelid region, and the supervised background region as supervision information, the second loss is determined jointly from the difference between the predicted iris region and the supervised iris region, the difference between the predicted eyelid region and the supervised eyelid region, and the difference between the predicted background region and the supervised background region; for example, in the embodiment shown in fig. 2a, the second loss is derived based on the output of the eye segmentation branch network 205.
In this embodiment, step 704 may include:
weighting and summing the first loss and the second loss to obtain network loss; the neural network is trained based on the network losses.
In this embodiment, the network loss is determined by combining the first loss and the second loss; the process of training the neural network based on the network loss may refer to step 704 and is not repeated here. By combining the second loss, this embodiment alleviates the inaccuracy that arises when the gaze angle is determined from the face orientation in cases where the head posture amplitude is small but the gaze angle is large, and enhances the capability of accurate regression. Feeding the pupil segmentation result and the gaze angle result back to the neural network at the same time makes the neural network learn the features of both tasks simultaneously, and the internal relation between the two tasks allows them to promote each other, so that the neural network obtains a better gaze angle result.
In some optional embodiments, the neural network further comprises a keypoint prediction branch network;
step 701 may further include:
Performing key point prediction on the sample eye features by using the key point prediction branch network to obtain predicted eye key points.
Optionally, the key point prediction branch network may be a key point detection network, and the predicted eye key points in the sample eye image are obtained through detection; optionally, the number of predicted eye key points is the same as the number of known eye key points, for example, both include 16 eye key points.
A third loss is determined based on the predicted ocular keypoints and the known ocular keypoints.
Optionally, each known eye key point corresponds to a known coordinate, and each predicted eye key point corresponds to a predicted coordinate. For ease of correspondence, the known eye key points and the predicted eye key points can each be numbered; the difference between corresponding known and predicted eye key points is determined through the numbering, and the third loss is determined based on the differences over all the known and predicted eye key points. For example, in the embodiment shown in fig. 2a, the third loss is obtained based on the output of the keypoint prediction branch network 206.
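A minimal sketch of this third loss, assuming prediction and ground truth share the same key-point numbering and shape (batch, num_keypoints, 2), and using L1 as an illustrative choice:

```python
import torch.nn.functional as F

def keypoint_loss(pred_keypoints, known_keypoints):
    """Third loss: key points are matched by their shared numbering and compared
    coordinate-wise; both tensors have shape (batch, num_keypoints, 2)."""
    return F.l1_loss(pred_keypoints, known_keypoints)
```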
In this embodiment, step 704 may include:
weighting and summing the first loss, the second loss and the third loss to obtain the network loss; the neural network is trained based on the network losses.
In this embodiment, the network loss is determined by combining the first loss, the second loss, and the third loss, and the process of training the neural network based on the network loss may refer to step 704, which is not described herein again; in the embodiment, on the basis of combining the second loss, the neural network is trained by combining the third loss, and the prediction accuracy of the neural network on the sight angle is further improved through the key points of the eyes.
In some alternative embodiments, step 702 may include:
and determining a prediction angle interval vector corresponding to the two eyes in the sample eye image based on the sample eye features by utilizing a first positioning branch network of the neural network.
The implementation and effect of this step can refer to step 3031 in the above embodiment, and will not be described herein again.
And determining a prediction offset angle vector corresponding to the two eyes in the sample eye image based on the sample eye feature and the prediction angle interval vector by utilizing a second positioning branch network of the neural network.
The implementation and effect of this step can refer to step 3032 in the above embodiment, and will not be described herein again.
And determining the predicted sight line angles corresponding to the two eyes in the sample eye image based on the prediction angle interval vector and the prediction offset angle vector.
The implementation and effect of this step can refer to step 304 in the above embodiment, and will not be described herein again.
The network structure of the neural network provided in this embodiment does not change during training. Therefore, the process of determining the predicted gaze angle during network training is the same as the process of estimating the gaze angle with the trained neural network; the only differences are that the network parameters differ and that the predicted gaze angle obtained in this embodiment is not yet the optimal result. This embodiment realizes the prediction of the gaze angle by using the feature extraction branch network, the first positioning branch network, and the second positioning branch network of the neural network, so that the network parameters in all three branch networks are adjusted, the performance of each branch network is improved, and the prediction accuracy of the neural network is further improved.
Any of the gaze angle estimation or neural network training methods provided by embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any of the gaze angle estimation or neural network training methods provided by embodiments of the present disclosure may be performed by a processor, such as a processor that executes any of the gaze angle estimation or neural network training methods mentioned by embodiments of the present disclosure by invoking corresponding instructions stored in a memory. And will not be described in detail below.
Exemplary devices
Fig. 8 is a schematic structural diagram of a gaze angle estimation apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 8, the apparatus provided in this embodiment includes:
and the region extraction module 81 is configured to determine an eye region included in the image to be detected.
And the feature extraction module 82 is configured to perform feature extraction on the eye region obtained by the region extraction module 81 to obtain an eye region feature corresponding to the eye region.
The interval estimation module 83 is configured to determine an angle interval vector corresponding to two eyes in the eye region and an offset angle vector corresponding to two eyes in the eye region based on the eye region feature obtained by the feature extraction module 82.
An angle determining module 84, configured to determine gaze angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimating module 83.
According to the gaze angle estimation apparatus provided by the embodiments of the present disclosure, combining the located angle interval with the offset angle improves the accuracy of gaze angle estimation, and because processing is performed on the basis of the eye region, the apparatus can adapt to head postures of various amplitudes, improving the robustness of gaze angle estimation.
Fig. 9 is a schematic structural diagram of a gaze angle estimation apparatus according to another exemplary embodiment of the present disclosure. As shown in fig. 9, the apparatus provided in this embodiment includes:
the feature extraction module 82 is specifically configured to perform feature extraction on the eye region by using a feature extraction branch network of the neural network, so as to obtain an eye region feature corresponding to the eye region.
An interval estimation module 83, comprising:
a first branch positioning unit 831, configured to determine, by using a first positioning branch network of the neural network, angle interval vectors corresponding to both eyes in the eye region based on the eye region features;
a second branch positioning unit 832 for determining an offset angle vector corresponding to both eyes in the eye region based on the eye region feature and the angle interval vector using a second positioning branch network of the neural network.
Optionally, the first branch positioning unit 831 is specifically configured to divide the included angles of the to-be-detected image in the horizontal direction and the vertical direction with respect to the normal vector into a set number of horizontal angle intervals and a set number of vertical angle intervals, respectively; processing the eye region features by using a first positioning branch network of the neural network to obtain a first confidence coefficient of each horizontal angle interval corresponding to each eye in the eye region and a second confidence coefficient of each vertical angle interval corresponding to each eye in the eye region; determining two horizontal angle intervals corresponding to the two eyes based on the first confidence coefficient, and determining two vertical angle intervals corresponding to the two eyes based on the second confidence coefficient; and determining angle interval vectors corresponding to the two eyes based on the two horizontal angle intervals and the two vertical angle intervals corresponding to the two eyes.
Optionally, the second branch positioning unit 832 is specifically configured to utilize a second positioning branch network of the neural network to process the eye region features, so as to obtain two horizontal offset angles in the horizontal direction and two vertical offset angles in the vertical direction of each of the two eyes; based on the two horizontal offset angles and the two vertical offset angles, offset angle vectors corresponding to the two eyes are determined.
Optionally, the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for the left eye, and a horizontal gaze angle and a vertical gaze angle for the right eye;
an angle determination module 84, comprising:
an angle vector determining unit 841, configured to add the angle interval vector and the offset angle vector to obtain a gaze angle vector;
a binocular angle determination unit 842, configured to determine a horizontal line-of-sight angle and a vertical line-of-sight angle for the left eye and a horizontal line-of-sight angle and a vertical line-of-sight angle for the right eye in the eyes of the eye region based on the line-of-sight angle vectors.
A region extraction module 81 comprising:
the face detection unit 811 is used for performing face detection processing on the image to be detected to obtain at least one face region in the image to be detected;
a human eye detection unit 812, configured to perform face key point identification on each face region to obtain human eye key points in the face region;
The eye region determining unit 813 is configured to obtain the eye region corresponding to each face region based on the human eye key points in that face region.
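A minimal sketch of this region extraction pipeline is shown below, assuming OpenCV's Haar cascade as a stand-in face detector and a hypothetical detect_eye_keypoints landmark model; neither detector is prescribed by this disclosure, and the padding around the key points is an illustrative assumption.

```python
# A hedged sketch of: face detection -> eye key points -> eye region crop.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def extract_eye_regions(image: np.ndarray, detect_eye_keypoints):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    eye_regions = []
    # Step 1: face detection yields at least one face region.
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        face = image[y:y + h, x:x + w]
        # Step 2: face key point identification yields eye key points
        # (e.g., eye corners) inside the face region; detect_eye_keypoints
        # is a hypothetical landmark model returning (N, 2) pixel coords.
        keypoints = detect_eye_keypoints(face)
        # Step 3: the eye region is the bounding box of the eye key points,
        # padded slightly so both eyes are fully contained.
        x0, y0 = keypoints.min(axis=0).astype(int) - 10
        x1, y1 = keypoints.max(axis=0).astype(int) + 10
        eye_regions.append(face[max(y0, 0):y1, max(x0, 0):x1])
    return eye_regions
```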
In this embodiment, the apparatus may further include, before the feature extraction module 82:
The data enhancement module 85 is configured to perform local histogram equalization processing on the eye region to enhance its contrast, so as to obtain a contrast-enhanced eye region.
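One way to realize the local histogram equalization is OpenCV's CLAHE (contrast limited adaptive histogram equalization), as in the sketch below; the clip limit and tile size are illustrative assumptions rather than values fixed by this disclosure.

```python
# A hedged sketch of the contrast enhancement step using OpenCV CLAHE.
import cv2


def enhance_eye_region(eye_region_bgr):
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    # Equalization is applied locally per tile, which boosts iris/eyelid
    # contrast without blowing out already-bright areas of the eye region.
    return clahe.apply(gray)
```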
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network. As shown in Fig. 10, the apparatus provided in this embodiment includes:
The feature extraction module 11 is configured to input the sample eye image into the neural network, and to perform feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain sample eye features.
The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known gaze angles.
The angle estimation module 12 is configured to process the sample eye features obtained by the feature extraction module 11 by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to two eyes in the sample eye image;
A loss determination module 13 is configured to determine a first loss based on the predicted gaze angles obtained by the angle estimation module 12 and the known gaze angles.
A network training module 14 is configured to determine a network loss based on the first loss obtained by the loss determination module 13, and to train the neural network based on the network loss.
In this embodiment, the neural network is trained using known gaze angles, so that the trained network predicts gaze angles in the eye region more accurately, improving the accuracy of gaze angle estimation.
Fig. 11 is a schematic structural diagram of a neural network training device according to another exemplary embodiment of the present disclosure. In this embodiment, the neural network further comprises an eye segmentation branch network, and both eyes in the sample eye image also have corresponding known eye key points. As shown in Fig. 11, the apparatus provided in this embodiment further includes, after the feature extraction module 11:
The segmentation supervision determining module 15 is configured to determine a supervised iris region, a supervised eyelid region and a supervised background region in the sample eye image based on the known eye key points corresponding to the sample eye image.
The segmentation prediction module 16 is configured to perform segmentation processing on the sample eye features by using the eye segmentation branch network to obtain a predicted iris region, a predicted eyelid region, and a predicted background region.
The second loss determination module 17 is configured to determine a second loss based on the supervised iris, eyelid and background regions and the predicted iris, eyelid and background regions.
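The following sketch illustrates one possible way to construct the supervised iris, eyelid and background regions from the known eye key points, assuming the key points provide an eyelid contour polygon and an iris center with a radius; this geometric construction is an assumption made for the example, not the construction fixed by this disclosure.

```python
# A hedged sketch of building a 3-class supervision mask from eye key points.
import cv2
import numpy as np

BACKGROUND, EYELID, IRIS = 0, 1, 2


def build_supervision_mask(shape, eyelid_contour, iris_center, iris_radius):
    """shape: (H, W); eyelid_contour: (N, 2) points; returns an (H, W) label map."""
    mask = np.full(shape, BACKGROUND, dtype=np.uint8)
    # Supervised eyelid region: everything inside the eyelid contour.
    cv2.fillPoly(mask, [eyelid_contour.astype(np.int32)], EYELID)
    # Supervised iris region: the iris disk, clipped to the eyelid opening.
    iris = np.zeros(shape, dtype=np.uint8)
    cv2.circle(iris, tuple(int(v) for v in iris_center), int(iris_radius), 1, -1)
    mask[(iris == 1) & (mask == EYELID)] = IRIS
    return mask
```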
The network training module 14 is specifically configured to perform weighted summation on the first loss and the second loss to obtain a network loss, and to train the neural network based on the network loss.
Optionally, the neural network further comprises a key point prediction branch network. In this case, after the feature extraction module 11, the apparatus further includes:
The key point prediction module 18 is configured to perform key point prediction on the sample eye features by using the key point prediction branch network to obtain predicted eye key points.
A third loss determination module 19 is configured to determine a third loss based on the predicted eye key points and the known eye key points.
The network training module 14 is specifically configured to perform weighted summation on the first loss, the second loss and the third loss to obtain a network loss, and to train the neural network based on the network loss.
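A minimal sketch of the weighted summation of the three losses is given below, assuming a PyTorch setup; the individual loss functions and the weights w1, w2 and w3 are illustrative assumptions, as this disclosure only specifies that the losses are weighted and summed to obtain the network loss.

```python
# A hedged sketch of the weighted multi-task network loss.
import torch.nn.functional as F


def network_loss(pred_angles, known_angles,
                 seg_logits, seg_labels,        # labels: 0=background, 1=eyelid, 2=iris
                 pred_keypoints, known_keypoints,
                 w1=1.0, w2=0.5, w3=0.5):
    # First loss: predicted vs. known gaze angles for both eyes.
    loss1 = F.smooth_l1_loss(pred_angles, known_angles)
    # Second loss: predicted vs. supervised iris / eyelid / background regions.
    loss2 = F.cross_entropy(seg_logits, seg_labels)
    # Third loss: predicted vs. known eye key points.
    loss3 = F.smooth_l1_loss(pred_keypoints, known_keypoints)
    # Network loss: weighted summation of the three losses.
    return w1 * loss1 + w2 * loss2 + w3 * loss3
```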
The angle estimation module 12 is specifically configured to determine, by using the first positioning branch network of the neural network, predicted angle interval vectors corresponding to both eyes in the sample eye image based on the sample eye features; determine, by using the second positioning branch network of the neural network, predicted offset angle vectors corresponding to both eyes in the sample eye image based on the sample eye features and the predicted angle interval vectors; and determine the predicted gaze angles corresponding to both eyes in the sample eye image based on the predicted angle interval vectors and the predicted offset angle vectors.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 12. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.
Fig. 12 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in Fig. 12, the electronic device 120 includes one or more processors 121 and a memory 122.
The processor 121 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 120 to perform desired functions.
In one example, the electronic device 120 may further include: an input device 123 and an output device 124, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 123 may be the microphone or the microphone array described above for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input means 123 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 123 may also include, for example, a keyboard, a mouse, and the like.
The output device 124 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 124 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 120 relevant to the present disclosure are shown in fig. 12, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 120 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the gaze angle estimation or neural network training method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
Program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the gaze angle estimation or neural network training method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments; however, it should be noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purposes of illustration and description and is not intended to be limiting; the disclosure is not limited to the specific details described above.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts, the embodiments may be referred to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively brief, and reference may be made to the corresponding parts of the method embodiments for relevant details.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
Claims (10)
1. A gaze angle estimation method, comprising:
determining an eye region included in an image to be detected;
performing feature extraction on the eye region to obtain eye region features corresponding to the eye region;
determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features;
determining a gaze angle corresponding to both eyes in the eye region based on the angle interval vector and the offset angle vector.
2. The method according to claim 1, wherein the performing feature extraction on the eye region to obtain the eye region features corresponding to the eye region comprises:
performing feature extraction on the eye region by using a feature extraction branch network of a neural network to obtain the eye region features corresponding to the eye region.
3. The method of claim 2, wherein the determining, based on the eye region features, angle interval vectors corresponding to both eyes in the eye region and offset angle vectors corresponding to both eyes in the eye region comprises:
determining angle interval vectors corresponding to the two eyes in the eye region based on the eye region features by utilizing a first positioning branch network of the neural network;
determining, with a second positioning branch network of the neural network, offset angle vectors corresponding to both eyes in the eye region based on the eye region features and the angle interval vector.
4. The method of claim 1, wherein the gaze angle comprises a horizontal gaze angle and a vertical gaze angle for a left eye, and a horizontal gaze angle and a vertical gaze angle for a right eye;
the determining, based on the angle interval vector and the offset angle vector, gaze angles corresponding to both eyes in the eye region includes:
adding the angle interval vector and the offset angle vector to obtain a gaze angle vector;
determining, based on the gaze angle vector, the horizontal gaze angle and the vertical gaze angle for the left eye and the horizontal gaze angle and the vertical gaze angle for the right eye of the two eyes in the eye region.
5. A neural network training method, wherein the neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network, the method comprising:
inputting a sample eye image into the neural network, and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain sample eye features, wherein the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known gaze angles;
processing the sample eye features by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to the two eyes in the sample eye image;
determining a first loss based on the predicted gaze angles and the known gaze angles;
determining a network loss based on the first loss, and training the neural network based on the network loss.
6. The method of claim 5, wherein the neural network further comprises an eye segmentation branch network, and both eyes in the sample eye image also have corresponding known eye key points;
after the step of inputting the sample eye image into the neural network and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain the sample eye features, the method further comprises:
determining a supervised iris region, a supervised eyelid region and a supervised background region in the sample eye image based on known eye key points corresponding to the sample eye image;
carrying out segmentation processing on the sample eye features by using the eye segmentation branch network to obtain a segmented predicted iris region, a predicted eyelid region and a predicted background region;
determining a second loss based on the supervised iris, eyelid, and background regions, and the predicted iris, eyelid, and background regions;
the determining a network loss based on the first loss and training the neural network based on the network loss comprises:
weighting and summing the first loss and the second loss to obtain a network loss;
training the neural network based on the network loss.
7. A gaze angle estimation device, comprising:
the region extraction module is used for determining an eye region included in the image to be detected;
the feature extraction module is used for performing feature extraction on the eye region obtained by the region extraction module to obtain eye region features corresponding to the eye region;
the interval estimation module is used for determining angle interval vectors corresponding to the two eyes in the eye region and offset angle vectors corresponding to the two eyes in the eye region based on the eye region features obtained by the feature extraction module;
and the angle determining module is used for determining the sight line angles corresponding to the two eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimation module.
8. A neural network training device, wherein the neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network, the device comprising:
the feature extraction module is used for inputting the sample eye image into the neural network and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain the sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known gaze angles;
the angle estimation module is used for processing the sample eye features obtained by the feature extraction module by using the first positioning branch network and the second positioning branch network to obtain predicted gaze angles corresponding to the two eyes in the sample eye image;
a loss determination module for determining a first loss based on the predicted gaze angles obtained by the angle estimation module and the known gaze angles;
and the network training module is used for determining the network loss based on the first loss obtained by the loss determining module and training the neural network based on the network loss.
9. A computer-readable storage medium storing a computer program for executing the gaze angle estimation method of any one of claims 1 to 4 or the neural network training method of any one of claims 5 to 6.
10. An electronic device, the electronic device comprising:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the gaze angle estimation method of any one of claims 1 to 4 or the neural network training method of any one of claims 5 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011502278.0A CN112597872B (en) | 2020-12-18 | 2020-12-18 | Sight angle estimation method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011502278.0A CN112597872B (en) | 2020-12-18 | 2020-12-18 | Sight angle estimation method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112597872A true CN112597872A (en) | 2021-04-02 |
CN112597872B CN112597872B (en) | 2024-06-28 |
Family
ID=75199376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011502278.0A Active CN112597872B (en) | 2020-12-18 | 2020-12-18 | Sight angle estimation method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112597872B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113379644A (en) * | 2021-06-30 | 2021-09-10 | 北京字跳网络技术有限公司 | Training sample obtaining method and device based on data enhancement and electronic equipment |
CN113468956A (en) * | 2021-05-24 | 2021-10-01 | 北京迈格威科技有限公司 | Attention judging method, model training method and corresponding device |
CN113470114A (en) * | 2021-08-31 | 2021-10-01 | 北京世纪好未来教育科技有限公司 | Sight estimation method, sight estimation device, electronic equipment and computer-readable storage medium |
CN113506328A (en) * | 2021-07-16 | 2021-10-15 | 北京地平线信息技术有限公司 | Method and device for generating sight line estimation model and method and device for estimating sight line |
CN113705349A (en) * | 2021-07-26 | 2021-11-26 | 电子科技大学 | Attention power analysis method and system based on sight estimation neural network |
CN114879843A (en) * | 2022-05-12 | 2022-08-09 | 平安科技(深圳)有限公司 | Sight line redirection method based on artificial intelligence and related equipment |
WO2024051345A1 (en) * | 2022-09-07 | 2024-03-14 | 浙江极氪智能科技有限公司 | Driver's line of sight identification method and apparatus, vehicle and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102548465A (en) * | 2010-09-27 | 2012-07-04 | 松下电器产业株式会社 | Line-of-sight estimation device |
CN107871107A (en) * | 2016-09-26 | 2018-04-03 | 北京眼神科技有限公司 | Face authentication method and device |
CN108734086A (en) * | 2018-03-27 | 2018-11-02 | 西安科技大学 | The frequency of wink and gaze estimation method of network are generated based on ocular |
CN109033957A (en) * | 2018-06-20 | 2018-12-18 | 同济大学 | A kind of gaze estimation method based on quadratic polynomial |
CN109492514A (en) * | 2018-08-28 | 2019-03-19 | 初速度(苏州)科技有限公司 | A kind of method and system in one camera acquisition human eye sight direction |
JP2019098024A (en) * | 2017-12-06 | 2019-06-24 | 国立大学法人静岡大学 | Image processing device and method |
CN110310321A (en) * | 2018-03-23 | 2019-10-08 | 爱信精机株式会社 | Direction of visual lines estimates device, direction of visual lines evaluation method and direction of visual lines estimation program |
CN110889332A (en) * | 2019-10-30 | 2020-03-17 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Lie detection method based on micro expression in interview |
- 2020
  - 2020-12-18 CN CN202011502278.0A patent/CN112597872B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102548465A (en) * | 2010-09-27 | 2012-07-04 | 松下电器产业株式会社 | Line-of-sight estimation device |
CN107871107A (en) * | 2016-09-26 | 2018-04-03 | 北京眼神科技有限公司 | Face authentication method and device |
JP2019098024A (en) * | 2017-12-06 | 2019-06-24 | 国立大学法人静岡大学 | Image processing device and method |
CN110310321A (en) * | 2018-03-23 | 2019-10-08 | 爱信精机株式会社 | Direction of visual lines estimates device, direction of visual lines evaluation method and direction of visual lines estimation program |
CN108734086A (en) * | 2018-03-27 | 2018-11-02 | 西安科技大学 | The frequency of wink and gaze estimation method of network are generated based on ocular |
CN109033957A (en) * | 2018-06-20 | 2018-12-18 | 同济大学 | A kind of gaze estimation method based on quadratic polynomial |
CN109492514A (en) * | 2018-08-28 | 2019-03-19 | 初速度(苏州)科技有限公司 | A kind of method and system in one camera acquisition human eye sight direction |
CN110889332A (en) * | 2019-10-30 | 2020-03-17 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Lie detection method based on micro expression in interview |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113468956A (en) * | 2021-05-24 | 2021-10-01 | 北京迈格威科技有限公司 | Attention judging method, model training method and corresponding device |
CN113379644A (en) * | 2021-06-30 | 2021-09-10 | 北京字跳网络技术有限公司 | Training sample obtaining method and device based on data enhancement and electronic equipment |
CN113506328A (en) * | 2021-07-16 | 2021-10-15 | 北京地平线信息技术有限公司 | Method and device for generating sight line estimation model and method and device for estimating sight line |
CN113705349A (en) * | 2021-07-26 | 2021-11-26 | 电子科技大学 | Attention power analysis method and system based on sight estimation neural network |
CN113705349B (en) * | 2021-07-26 | 2023-06-06 | 电子科技大学 | Attention quantitative analysis method and system based on line-of-sight estimation neural network |
CN113470114A (en) * | 2021-08-31 | 2021-10-01 | 北京世纪好未来教育科技有限公司 | Sight estimation method, sight estimation device, electronic equipment and computer-readable storage medium |
CN114879843A (en) * | 2022-05-12 | 2022-08-09 | 平安科技(深圳)有限公司 | Sight line redirection method based on artificial intelligence and related equipment |
WO2024051345A1 (en) * | 2022-09-07 | 2024-03-14 | 浙江极氪智能科技有限公司 | Driver's line of sight identification method and apparatus, vehicle and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112597872B (en) | 2024-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112597872B (en) | Sight angle estimation method and device, storage medium and electronic equipment | |
US11295474B2 (en) | Gaze point determination method and apparatus, electronic device, and computer storage medium | |
US11182592B2 (en) | Target object recognition method and apparatus, storage medium, and electronic device | |
EP3525165B1 (en) | Method and apparatus with image fusion | |
US9626553B2 (en) | Object identification apparatus and object identification method | |
US8958609B2 (en) | Method and device for computing degree of similarly between data sets | |
JP2018525718A (en) | Face recognition system and face recognition method | |
US11163978B2 (en) | Method and device for face image processing, storage medium, and electronic device | |
WO2018219180A1 (en) | Method and apparatus for determining facial image quality, as well as electronic device and computer storage medium | |
US20210124928A1 (en) | Object tracking methods and apparatuses, electronic devices and storage media | |
US11367310B2 (en) | Method and apparatus for identity verification, electronic device, computer program, and storage medium | |
JP6822482B2 (en) | Line-of-sight estimation device, line-of-sight estimation method, and program recording medium | |
US11386710B2 (en) | Eye state detection method, electronic device, detecting apparatus and computer readable storage medium | |
JP6071002B2 (en) | Reliability acquisition device, reliability acquisition method, and reliability acquisition program | |
CN106471440A (en) | Eye tracking based on efficient forest sensing | |
JP7227385B2 (en) | Neural network training and eye open/close state detection method, apparatus and equipment | |
CN112651321A (en) | File processing method and device and server | |
US10902628B1 (en) | Method for estimating user eye orientation using a system-independent learned mapping | |
EP2998928B1 (en) | Apparatus and method for extracting high watermark image from continuously photographed images | |
CN109858355B (en) | Image processing method and related product | |
CN113592706A (en) | Method and device for adjusting homography matrix parameters | |
CN112200109A (en) | Face attribute recognition method, electronic device, and computer-readable storage medium | |
CN112347843B (en) | Method and related device for training wrinkle detection model | |
CN113506328A (en) | Method and device for generating sight line estimation model and method and device for estimating sight line | |
CN113129333A (en) | Multi-target real-time tracking method and system and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |