CN112597872B - Sight angle estimation method and device, storage medium and electronic equipment - Google Patents

Sight angle estimation method and device, storage medium and electronic equipment

Info

Publication number
CN112597872B
Authority
CN
China
Prior art keywords
eye
angle
region
network
sight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011502278.0A
Other languages
Chinese (zh)
Other versions
CN112597872A (en)
Inventor
沈丽娜
牛建伟
储刘火
陶冶
杨超
张宏志
黄赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd filed Critical Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN202011502278.0A
Publication of CN112597872A
Application granted
Publication of CN112597872B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose a sight angle estimation method and apparatus, a storage medium, and an electronic device. The method includes: determining an eye region included in an image to be detected; performing feature extraction on the eye region to obtain eye region features corresponding to the eye region; determining, based on the eye region features, an angle interval vector and an offset angle vector corresponding to the two eyes in the eye region; and determining, based on the angle interval vector and the offset angle vector, the line-of-sight angles corresponding to the two eyes in the eye region. By combining the located angle interval with the offset angle, the embodiments of the disclosure improve the accuracy of line-of-sight angle estimation; and because processing is based on the eye region, the method can adapt to head poses of various amplitudes, which improves the robustness of the line-of-sight angle estimation method.

Description

Sight angle estimation method and device, storage medium and electronic equipment
Technical Field
The disclosure relates to computer vision technology, in particular to a sight angle estimation method and device, a storage medium and electronic equipment.
Background
Line-of-sight angle estimation infers a person's gaze direction from an eye image or a face image. Enabling a computer to automatically estimate the direction in which human eyes are looking, i.e., line-of-sight angle estimation, is an important application in human-computer interaction. For example, during driving, the driver's current state, such as distraction or fatigue, can be predicted from the estimated line-of-sight angle, so that the vehicle is controlled to perform a series of corresponding operations and/or issue reminders.
Disclosure of Invention
The present disclosure has been made in order to solve the above technical problems. The embodiment of the disclosure provides a sight angle estimation method and device, a storage medium and electronic equipment.
According to an aspect of the embodiments of the present disclosure, there is provided a line-of-sight angle estimation method, including:
determining an eye region included in the image to be detected;
performing feature extraction on the eye region to obtain eye region features corresponding to the eye region;
determining, based on the eye region features, an angle interval vector corresponding to the two eyes in the eye region and an offset angle vector corresponding to the two eyes in the eye region;
and determining, based on the angle interval vector and the offset angle vector, the line-of-sight angles corresponding to the two eyes in the eye region.
According to another aspect of the disclosed embodiments, there is provided a neural network training method, the neural network including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; comprising the following steps:
inputting a sample eye image into the neural network, and carrying out feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
processing the sample eye features by using the first positioning branch network and the second positioning branch network to obtain predicted line-of-sight angles corresponding to the two eyes in the sample eye image;
determining a first loss based on the predicted line of sight angle and the known line of sight angle;
a network loss is determined based on the first loss, and the neural network is trained based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a line-of-sight angle estimation apparatus including:
The region extraction module is used for determining an eye region included in the image to be detected;
The feature extraction module is used for extracting features of each eye region obtained by the region extraction module to obtain eye region features corresponding to each eye region;
The interval estimation module is used for determining an angle interval vector corresponding to the eyes in the eye region and an offset angle vector corresponding to the eyes in the eye region based on the eye region features obtained by the feature extraction module;
The angle determining module is used for determining the line-of-sight angles corresponding to the eyes in the eye region based on the angle interval vector and the offset angle vector determined by the interval estimation module.
According to yet another aspect of an embodiment of the present disclosure, there is provided a neural network training apparatus including a feature extraction branch network, a first positioning branch network, and a second positioning branch network; comprising the following steps:
The feature extraction module is used for inputting the sample eye image into the neural network, and extracting features of the sample eye image through a feature extraction branch network of the neural network to obtain sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles;
The angle estimation module is used for processing the sample eye characteristics obtained by the characteristic extraction module by utilizing the first positioning branch network and the second positioning branch network to obtain a predicted line-of-sight angle corresponding to eyes in the sample eye image;
A loss determination module configured to determine a first loss based on the predicted line-of-sight angle and the known line-of-sight angle obtained by the angle estimation module;
The network training module is used for determining a network loss based on the first loss obtained by the loss determination module and training the neural network based on the network loss.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer readable storage medium storing a computer program for executing the line-of-sight angle estimation method according to any one of the embodiments or the neural network training method according to any one of the embodiments.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the line-of-sight angle estimation method according to any one of the embodiments or the neural network training method according to any one of the embodiments.
According to the sight angle estimation method and apparatus, the storage medium, and the electronic device provided by the embodiments of the present disclosure, the accuracy of line-of-sight angle estimation is improved by combining the located angle interval with the offset angle; and because the embodiments of the disclosure perform processing based on the eye region, head poses of various amplitudes can be accommodated, which improves the robustness of the line-of-sight angle estimation method.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing embodiments thereof in more detail with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a schematic diagram of a view angle estimation system according to an exemplary embodiment of the present disclosure.
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 2b is a schematic view of a view angle of a sample eye image applied in a neural network training system according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flow chart illustrating a method for estimating a line of sight angle according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic flow chart of step 303 in the embodiment shown in fig. 3 of the present disclosure.
Fig. 5 is a schematic flow chart of step 3031 in the embodiment shown in fig. 4 of the present disclosure.
Fig. 6a is a schematic flow chart of step 3032 in the embodiment of fig. 4 of the present disclosure.
Fig. 6b is a schematic view of an eye region in one example of a gaze angle estimation method provided by an embodiment of the present disclosure.
Fig. 6c is a schematic diagram of an eye region subjected to data enhancement in one example of a gaze angle estimation method provided by an embodiment of the present disclosure.
Fig. 6d is a schematic view of an eye region in another example of a gaze angle estimation method provided by an embodiment of the present disclosure.
Fig. 6e is a schematic diagram of an eye region subjected to data enhancement in another example of a gaze angle estimation method provided by an embodiment of the present disclosure.
Fig. 7 is a flow chart of a neural network training method according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic view of a view angle estimating apparatus according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic structural view of a viewing angle estimating device provided in another exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural view of a neural network training device according to another exemplary embodiment of the present disclosure.
Fig. 12 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the front and rear association objects are an or relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the application
In the course of making the present disclosure, the inventors found that in existing solutions the dominant approach to line-of-sight detection is model-based, i.e., infrared light-source reflections and a geometric model are used to calculate the line-of-sight direction vector. Such schemes have at least the following problems: the cost is high, the requirements on image quality are high, complex equipment (multiple infrared light sources and high-definition cameras) is needed, and the head pose must remain essentially still, which greatly limits the applicable range of the scheme.
Exemplary System
The present disclosure provides a line-of-sight angle estimation method that determines the line-of-sight angle from coarse to fine in two stages based on confidence and offset, and may further employ data enhancement to boost the contrast of the eye region in a data preprocessing stage, thereby improving the accuracy and robustness of line-of-sight angle estimation and the adaptability to head movement.
Fig. 1 is a schematic diagram of a view angle estimation system according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the neural network includes: a feature extraction branch network 102, a first positioning branch network 103, and a second positioning branch network 104;
The acquired image to be detected is processed by the face detection model and eye keypoint model 101 to obtain an eye region (and other prior information such as eye keypoints).
Data enhancement is then performed on the eye region, which enhances the contrast of the eye region, strengthens the attention paid to the eye region, and improves the generalization ability of the model.
The eye region after data enhancement is input into the neural network, and features of the eye region are extracted through the feature extraction branch network 102 of the neural network to obtain eye region features. The eye region features are then processed through the first positioning branch network 103 and the second positioning branch network 104 of the neural network to realize two-stage gaze direction estimation: the gaze range is divided into a set number of horizontal angle intervals and vertical angle intervals; the first positioning branch network 103 coarsely locates the line of sight by outputting a confidence for each horizontal angle interval and each vertical angle interval, thereby determining an angle interval vector corresponding to the gaze direction in the eye region; the second positioning branch network 104 performs finer classification and regression, first classifying and mapping the classification result to a particular interval according to the confidence, and then regressing a finer-grained offset to obtain the offset angle vector.
The offset angle vector is then added to the angle interval vector obtained by the coarse positioning of the first stage, thereby determining the horizontal and vertical line-of-sight angles of the left eye and the horizontal and vertical line-of-sight angles of the right eye in the eye region.
According to this embodiment of the disclosure, the positioning accuracy of the line-of-sight angle is improved through a two-stage, coarse-to-fine positioning method; in addition, data enhancement targeted at the eye region improves the contrast of the eye region, which improves the robustness of the line-of-sight angle estimation method.
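As a concrete illustration of this two-stage design, the following PyTorch-style sketch shows a feature extraction branch feeding a coarse interval-classification head and a fine offset-regression head whose outputs are combined into per-eye angles. It is a minimal, assumption-laden example and not the patented implementation: the backbone, layer sizes, the number of intervals (6 of 30 degrees each), and the ordering of the four outputs are all placeholders.

```python
import torch
import torch.nn as nn

NUM_BINS = 6          # assumed number of angle intervals per axis
BIN_WIDTH = 30.0      # degrees, covering [-90, 90]
BIN_START = -90.0

class GazeNet(nn.Module):
    """Minimal coarse-to-fine gaze head: interval classification + offset regression."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # Feature extraction branch (stand-in for the patent's backbone).
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # First positioning branch: confidences over angle intervals for
        # (left/right eye) x (horizontal/vertical) = 4 classification problems.
        self.interval_head = nn.Linear(feat_dim, 4 * NUM_BINS)
        # Second positioning branch: one fine offset per (eye, axis).
        self.offset_head = nn.Linear(feat_dim, 4)

    def forward(self, eye_region):
        f = self.features(eye_region)
        logits = self.interval_head(f).view(-1, 4, NUM_BINS)   # interval confidences
        offsets = self.offset_head(f)                           # fine offsets (degrees)
        bin_idx = logits.argmax(dim=-1)                         # coarse interval per eye/axis
        interval_vec = BIN_START + bin_idx.float() * BIN_WIDTH  # interval start angles
        gaze = interval_vec + offsets                           # coarse + fine
        return gaze, logits, offsets
```

For example, `GazeNet()(torch.randn(1, 3, 64, 160))` would return a (1, 4) tensor of line-of-sight angles under the assumed element order.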
Fig. 2a is a schematic structural diagram of a neural network training system according to an exemplary embodiment of the present disclosure. As shown in fig. 2a, in the training process, the neural network includes: a feature extraction branch network 102, a first positioning branch network 103, a second positioning branch network 104, an eye segmentation branch network 205, and a keypoint prediction branch network 206;
A sample eye image is input into the neural network; the sample eye image may be obtained through face detection and eye detection on a sample image containing two eyes, and may be input into the neural network after data enhancement. As shown in fig. 2b, the two eyes in the sample eye image have corresponding known line-of-sight angles, where the known line-of-sight angles may include a known horizontal line-of-sight angle and a known vertical line-of-sight angle of the left eye, and a known horizontal line-of-sight angle and a known vertical line-of-sight angle of the right eye. Two-stage gaze direction estimation is achieved through the feature extraction branch network 102, the first positioning branch network 103, and the second positioning branch network 104 in the neural network, determining the predicted line-of-sight angles of the two eyes in the eye region, where the predicted line-of-sight angles may include the predicted horizontal and vertical line-of-sight angles of the left eye and the predicted horizontal and vertical line-of-sight angles of the right eye.
A first loss is determined based on the predicted line of sight angle and the known line of sight angle.
In addition to determining the first loss, the eye segmentation branch network 205 may be used to segment the sample eye image so as to determine a second loss; the sample eye image includes known eye keypoints.
An auxiliary eye keypoint detection task determines a third loss based on the known eye keypoints of the sample eye image and the predicted eye keypoints determined by the keypoint prediction branch network 206.
The network loss is determined based on the first, second, and third losses, and the network parameters of the neural network are adjusted by gradient back-propagation based on the network loss. When applied, the trained neural network includes only the feature extraction branch network 102, the first positioning branch network 103, and the second positioning branch network 104.
In this embodiment, the losses corresponding to the pupil segmentation task and the line-of-sight angle task are fed back to the neural network at the same time, so that the neural network learns features for both tasks; the internal connection between the two tasks allows them to promote each other, so that the model obtains better line-of-sight results and the adaptability of the neural network to head movement is improved. Moreover, combining the loss corresponding to the eye keypoint detection task can further improve the prediction accuracy of the neural network.
Exemplary method
Fig. 3 is a flow chart illustrating a method for estimating a line of sight angle according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, as shown in fig. 3, and includes the following steps:
In step 301, an eye region included in an image to be detected is determined.
The eye region in this embodiment refers to an image region including two eyes of one person, and optionally, the eye region may be obtained by the face detection model and the eye keypoint model 101 in the embodiment provided in fig. 1.
Step 302, extracting features of the eye region to obtain eye region features corresponding to the eye region.
Optionally, the eye region may be feature extracted using a feature extraction network or using a feature extraction branch network in an overall neural network to obtain eye region features, optionally the feature extraction branch network is a feature extraction branch network 102 as in the embodiment provided in fig. 1.
Step 303, determining an angle interval vector corresponding to eyes in the eye region and an offset angle vector corresponding to eyes in the eye region based on the eye region features.
In this embodiment, two branch networks in the neural network may be used to process the eye region features respectively, so as to obtain an angle interval vector and an offset angle vector; the angle interval vector determines the rough direction of the line-of-sight angle, and the offset angle vector refines the accuracy of that direction.
Step 304, determining the corresponding sight line angles of the eyes in the eye area based on the angle interval vector and the offset angle vector.
Optionally, as shown in fig. 2b, the line-of-sight angle includes a pitch angle (pitch) of the eyeball, which represents the rotation angle of the eyeball about the X axis, and a yaw angle (yaw), which represents the rotation angle of the eyeball about the Y axis. The angle interval vector and the offset angle vector are summed; the offset angle vector determines the specific offset of the binocular line of sight within the interval given by the angle interval vector, improving the accuracy of the determined line-of-sight angle.
According to the line-of-sight angle estimation method provided by this embodiment of the disclosure, combining the located angle interval with the offset angle improves the accuracy of line-of-sight angle estimation; and because processing is based on the eye region, the method can adapt to head poses of various amplitudes, improving its robustness.
Optionally, step 302 in the above embodiment may include: and performing feature extraction on the eye region by using a feature extraction branch network of the neural network to obtain eye region features corresponding to the eye region.
The feature extraction branch network in the embodiment is a part of the neural network, and feature extraction is performed on the eye region through the feature extraction branch network, so that a basis is provided for estimating the sight angle through other parts in the neural network, and the sight estimation efficiency is improved.
As shown in fig. 4, step 303 may include the following steps, based on the embodiment shown in fig. 3, described above:
Step 3031, determining angle interval vectors corresponding to eyes in the eye region based on the eye region characteristics by using a first positioning branch network of the neural network.
Alternatively, as shown in fig. 1, the first positioning branch network 103 and the second positioning branch network 104 in the present embodiment are respectively connected to the feature extraction branch network 102, and the first positioning branch network 103 and the second positioning branch network 104 may process the eye region feature at the same time, so as to improve the network processing efficiency.
Step 3032, determining offset angle vectors corresponding to both eyes in the eye region based on the eye region characteristics and the angle interval vectors using a second positioning branch network of the neural network.
The neural network in this embodiment includes three branch networks: the feature extraction branch network, the first positioning branch network and the second positioning branch network perform feature extraction once through the feature extraction branch network, and the eye region features obtained by the feature extraction are respectively input into the first positioning branch network and the second positioning branch network, so that repeated feature extraction operation is avoided, and the processing efficiency of the neural network is improved.
As shown in fig. 5, on the basis of the embodiment shown in fig. 4, step 3031 may include the following steps:
In step 501, the angles of the image to be detected relative to the normal vector in the horizontal direction and the vertical direction are respectively divided into a set number of horizontal angle intervals and vertical angle intervals.
In this embodiment, the direction perpendicular to the image to be detected is taken as the normal vector direction (corresponding to the yaw angle of the eyeball in the embodiment shown in fig. 2b). The included angle between the horizontal direction and the normal vector ranges from -90 degrees to 90 degrees (corresponding to the pitch angle of the eyeball in the embodiment shown in fig. 2b), where 90 degrees means horizontally to the left in the plane of the image to be detected, -90 degrees means horizontally to the right in that plane, and 0 degrees means coinciding with the normal vector, i.e., perpendicular to the image to be detected. The included angle between the vertical direction and the normal vector likewise ranges from -90 degrees to 90 degrees, where 90 degrees means vertically upward in the plane of the image to be detected, -90 degrees means vertically downward, and 0 degrees means coinciding with the normal vector. In this embodiment, the horizontal included angle and the vertical included angle are each divided into a set number of intervals; each interval covers a range of angles, and the intervals may be numbered for distinction. For example, the included angle between the horizontal direction and the normal vector may be divided into 6 horizontal angle intervals: interval 1 [-90 degrees, -60 degrees], interval 2 [-60 degrees, -30 degrees], interval 3 [-30 degrees, 0 degrees], interval 4 [0 degrees, 30 degrees], interval 5 [30 degrees, 60 degrees], and interval 6 [60 degrees, 90 degrees]. Coarse positioning of the gaze direction is achieved by locating the gaze direction in one of these angle intervals.
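For illustration only, a small helper with zero-based interval indices (the description above numbers the intervals 1 to 6; the choice of 6 intervals of 30 degrees is the example layout, not a requirement of the method) that maps a continuous angle in [-90, 90] degrees to its interval and back to the interval's start angle:

```python
NUM_BINS = 6
BIN_WIDTH = 180.0 / NUM_BINS   # 30 degrees per interval over [-90, 90]

def angle_to_bin(angle_deg: float) -> int:
    """Return the zero-based index of the interval containing angle_deg."""
    idx = int((angle_deg + 90.0) // BIN_WIDTH)
    return min(max(idx, 0), NUM_BINS - 1)   # clamp the +90-degree boundary into the last bin

def bin_to_start_angle(idx: int) -> float:
    """Return the start angle (degrees) of interval idx."""
    return -90.0 + idx * BIN_WIDTH

# Example: 42 degrees falls into [30, 60), i.e. zero-based index 4 (interval 5 above).
assert angle_to_bin(42.0) == 4 and bin_to_start_angle(4) == 30.0
```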
Step 502, processing the eye region features by using a first positioning branch network of the neural network to obtain a first confidence level corresponding to each horizontal angle interval and a second confidence level corresponding to each vertical angle interval of both eyes in the eye region.
In this embodiment, the first positioning branch network may be implemented by using a classification network, and the first confidence level of each horizontal angle interval corresponding to two eyes in the eye area is determined by using the classification network, optionally, the first confidence level of the left eye may be determined for the left eye of the two eyes, and the first confidence level of the right eye may be determined for the right eye; and simultaneously determining a second confidence level of the eyes corresponding to each vertical angle interval, optionally determining a left-eye second confidence level for the left eye and a right-eye second confidence level for the right eye.
In step 503, two horizontal angle intervals corresponding to the eyes are determined based on the first confidence level, and two vertical angle intervals corresponding to the eyes are determined based on the second confidence level.
Optionally, a horizontal angle interval and a vertical angle interval are determined separately for the left eye and for the right eye. The horizontal angle interval whose left-eye first confidence is the largest among the set number of left-eye first confidences is taken as the horizontal angle interval corresponding to the left eye, and the horizontal angle interval whose right-eye first confidence is the largest among the set number of right-eye first confidences is taken as the horizontal angle interval corresponding to the right eye. Likewise, the vertical angle interval whose left-eye second confidence is the largest is taken as the vertical angle interval corresponding to the left eye, and the vertical angle interval whose right-eye second confidence is the largest is taken as the vertical angle interval corresponding to the right eye.
In step 504, an angle interval vector corresponding to both eyes is determined based on the two horizontal angle intervals and the two vertical angle intervals corresponding to both eyes.
In this embodiment, a one-dimensional four-element vector is formed from the two horizontal angle intervals and the two vertical angle intervals, so that the horizontal and vertical angle intervals corresponding to the two eyes are expressed by a single angle interval vector, which represents the coarse positioning result of the binocular line of sight. Because coarse positioning divides the angle range into a set number of intervals, the corresponding angle interval can be determined simply by computing the confidence of each interval, which greatly reduces the amount of data processing and improves the efficiency of coarse positioning.
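A sketch of how the per-interval confidences could be reduced to such a four-element angle interval vector; the row ordering, the number of intervals, and the use of interval start angles as the vector elements are assumptions for illustration:

```python
import torch

BIN_START, BIN_WIDTH = -90.0, 30.0   # assumed 6 intervals of 30 degrees over [-90, 90]

def interval_vector(confidences: torch.Tensor) -> torch.Tensor:
    """confidences: (4, num_intervals) tensor, rows ordered as
    (left horizontal, left vertical, right horizontal, right vertical).
    Returns the 4-element angle interval vector of interval start angles (degrees)."""
    best = confidences.argmax(dim=-1)            # highest-confidence interval per row
    return BIN_START + best.float() * BIN_WIDTH

conf = torch.softmax(torch.randn(4, 6), dim=-1)  # stand-in first/second confidences
print(interval_vector(conf))                     # e.g. tensor([ 30., -60.,   0.,  30.])
```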
As shown in fig. 6a, step 3032 may include the following steps, based on the embodiment shown in fig. 4, as described above:
In step 601, the second positioning branch network of the neural network is used to process the eye region characteristics, so as to obtain two horizontal offset angles of each eye in the two eyes in the horizontal direction and two vertical offset angles in the vertical direction.
In this embodiment, the second positioning branch network may perform finer classification and regression for the two eyes in the eye region. For example, each interval of the coarse classification may be subdivided into a plurality of subintervals, in which case the second positioning branch network can be regarded as a classification network that determines the confidence of each subinterval corresponding to the two eyes and thereby determines a finer-grained line-of-sight angle. Alternatively, the second positioning branch network may be an offset prediction network that, based on the input eye region features, directly regresses the offsets of the two eyes within the determined angle interval vector to determine a finer-grained line-of-sight angle.
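By way of illustration of the direct-offset variant mentioned above (layer sizes and the bounding strategy are assumptions, not the patented design), the second positioning branch can be a small regression head whose outputs are bounded to lie inside one angle interval:

```python
import torch
import torch.nn as nn

class OffsetHead(nn.Module):
    """Predicts one offset per (eye, axis), bounded to the width of an angle interval."""
    def __init__(self, feat_dim=128, bin_width=30.0):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)
        self.bin_width = bin_width

    def forward(self, eye_region_features):
        # Sigmoid keeps each offset in [0, bin_width), i.e. inside the coarse interval.
        return torch.sigmoid(self.fc(eye_region_features)) * self.bin_width
```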
In step 602, offset angle vectors corresponding to both eyes are determined based on the two horizontal offset angles and the two vertical offset angles.
In this embodiment, a one-dimensional four-element vector is formed from the two horizontal offset angles and the two vertical offset angles, so that the horizontal and vertical offset angles corresponding to the two eyes are expressed by a single offset angle vector, which represents the fine-grained positioning result of the binocular line of sight. Because coarse positioning only divides the angle range into a set number of intervals and therefore only determines the general direction of the line of sight, the specific line-of-sight angle is determined by the offset angle vector, which improves the accuracy of locating the binocular line-of-sight angles.
In some alternative embodiments, the line of sight angles include horizontal and vertical line of sight angles for the left eye, and horizontal and vertical line of sight angles for the right eye; step 304 in the above embodiment may include:
and adding the angle interval vector and the offset angle vector to obtain the sight angle vector.
As shown in the above embodiments, the angle interval vector and the offset angle vector are each one-dimensional four-element vectors; the two vectors can be added element-wise to obtain a new one-dimensional four-element vector, namely the line-of-sight angle vector.
The horizontal and vertical line-of-sight angles of the left eye and the horizontal and vertical line-of-sight angles of the right eye in the eye region are then determined from the line-of-sight angle vector.
In this embodiment, the offset angle vector is superimposed onto the coarsely located angle interval vector, so that the gaze direction is precisely located within the coarse angle interval. This realizes a coarse-to-fine way of locating the gaze direction and improves the robustness and accuracy of the line-of-sight angle estimation method.
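A short worked example of this element-wise addition, under the same assumed element ordering used in the earlier sketches (the numeric values are arbitrary):

```python
import torch

# Assumed order: (left horizontal, left vertical, right horizontal, right vertical).
angle_interval_vec = torch.tensor([30.0, -30.0, 30.0, -30.0])   # coarse interval start angles
offset_angle_vec   = torch.tensor([12.3,  14.1, 10.8,  15.6])   # fine offsets within each interval

gaze_angle_vec = angle_interval_vec + offset_angle_vec          # element-wise sum
left_h, left_v, right_h, right_v = gaze_angle_vec.tolist()
print(left_h, left_v)   # approximately 42.3 and -15.9: left-eye horizontal / vertical angles
```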
In some alternative embodiments, step 301 may include:
and carrying out face detection processing on the image to be detected to obtain at least one face area in the image to be detected.
Optionally, the face detection model shown in the embodiment provided in fig. 1 and the face detection model in the eye key point model 101 perform face detection on the image to be detected, and determine at least one face area included in the image to be detected.
And executing face key point recognition on each face area to obtain the eye key points in the face area.
Optionally, the eye keypoint model in the face detection model and the eye keypoint model 101 as shown in the embodiment provided in fig. 1 performs eye keypoint identification on each face region, and determines eye keypoints included in the face region, where optionally, the eye keypoints may include, but are not limited to, 16 keypoints of each of two eyes.
And obtaining an eye region corresponding to the face region based on the eye key points in each face region.
This embodiment determines the positions of the eyes in each face region based on the multiple detected eye keypoints, so as to obtain the eye region within each face region. The eye region is then used for line-of-sight angle estimation; because the other parts of the face are removed, the adaptability of the line-of-sight angle estimation method to head movement is improved.
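A minimal sketch of this cropping step, assuming a face detector and an eye-keypoint model have already produced the keypoints; the margin value and the keypoint layout (e.g. 16 points per eye) are assumptions:

```python
import numpy as np

def crop_eye_region(image: np.ndarray, eye_keypoints: np.ndarray, margin: float = 0.3) -> np.ndarray:
    """eye_keypoints: (N, 2) array of (x, y) eye landmarks covering both eyes.
    Returns the image patch containing both eyes plus a proportional margin."""
    x_min, y_min = eye_keypoints.min(axis=0)
    x_max, y_max = eye_keypoints.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    x0 = max(int(x_min - margin * w), 0)
    y0 = max(int(y_min - margin * h), 0)
    x1 = min(int(x_max + margin * w), image.shape[1])
    y1 = min(int(y_max + margin * h), image.shape[0])
    return image[y0:y1, x0:x1]
```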
In some alternative embodiments, prior to step 302, it may further include:
And performing local histogram equalization processing on the eye region to enhance the contrast of the eye region and obtain the eye region with enhanced contrast.
This embodiment strengthens the contrast of the eyes through local histogram equalization applied to the eye region, which strengthens the attention paid to the eye region and improves the accuracy of line-of-sight angle estimation. For example, fig. 6b is a schematic view of an eye region in one example of the gaze angle estimation method provided by an embodiment of the present disclosure, and fig. 6c shows that eye region after data enhancement; fig. 6d is a schematic view of an eye region in another example, and fig. 6e shows that eye region after data enhancement.
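One common way to realize local histogram equalization is contrast-limited adaptive histogram equalization (CLAHE). The sketch below uses OpenCV and is only one possible instantiation of the data enhancement described here; the clip limit and tile size are arbitrary choices, not values from the disclosure:

```python
import cv2

def enhance_eye_region(eye_region_bgr):
    """Apply local (adaptive) histogram equalization to boost eye-region contrast."""
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)
```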
Fig. 7 is a flow chart of a neural network training method according to an exemplary embodiment of the present disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network; as shown in fig. 7, the method comprises the following steps:
Step 701, inputting the sample eye image into a neural network, and performing feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain sample eye features.
The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles.
In this embodiment, the process of extracting the features of the sample image may be implemented with reference to step 302 in the above-mentioned line-of-sight angle estimation method.
Step 702, processing the sample eye feature by using the first positioning branch network and the second positioning branch network to obtain a predicted line of sight angle corresponding to both eyes in the sample eye image.
This step may be implemented with reference to steps 303 and 304 in the embodiments of fig. 3-5 of the line of sight angle estimation method described above, the only difference being the network parameters in the neural network.
Step 703 determines a first loss based on the predicted line of sight angle and the known line of sight angle.
In this embodiment, the known line-of-sight angle serves as the supervision information for training the neural network, and the loss is computed from the difference between the predicted line-of-sight angle output by the current neural network and the known line-of-sight angle. For example, as shown in fig. 2a, the first loss is obtained from the predicted line-of-sight angles output by the first positioning branch network 103 and the second positioning branch network 104; the first loss may be calculated with any suitable loss function, for example an L1 loss function.
Step 704, determining a network loss based on the first loss, and training the neural network based on the network loss.
Optionally, the parameters of the neural network may be adjusted by gradient back-propagation based on the network loss, and steps 701 to 703 are repeated on the neural network with the adjusted parameters until a preset condition is met, yielding the trained neural network; the preset condition may be a set number of iterations, the decrease of the network loss over two consecutive iterations being smaller than a set value, and so on. In this embodiment, the neural network is trained with known line-of-sight angles, so that the resulting neural network can better predict the line-of-sight angles in an eye region, improving the accuracy of line-of-sight angle estimation.
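A compressed training-loop sketch of steps 701 to 704 under heavy assumptions: it reuses the illustrative GazeNet sketched earlier, uses dummy tensors in place of a real dataset, adds a cross-entropy term to supervise the coarse intervals (an assumed design choice; the disclosure only requires a first loss such as L1 between predicted and known angles), and picks an arbitrary optimizer and stopping condition:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for sample eye images and known line-of-sight angles (degrees).
images = torch.randn(32, 3, 64, 160)
known = torch.rand(32, 4) * 180.0 - 90.0
train_loader = DataLoader(TensorDataset(images, known), batch_size=8)

model = GazeNet()                    # illustrative model sketched earlier in this description
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1, ce = nn.L1Loss(), nn.CrossEntropyLoss()

for epoch in range(10):                                         # assumed stopping condition
    for eye_images, known_angles in train_loader:
        predicted_angles, interval_logits, _ = model(eye_images)
        # Coarse supervision: the (zero-based) interval each known angle falls into.
        target_bins = ((known_angles + 90.0) / 30.0).floor().clamp(0, 5).long()
        coarse_loss = ce(interval_logits.reshape(-1, 6), target_bins.reshape(-1))
        fine_loss = l1(predicted_angles, known_angles)          # first loss: predicted vs known
        network_loss = fine_loss + coarse_loss                  # assumed equal weighting
        optimizer.zero_grad()
        network_loss.backward()                                 # gradient back-propagation
        optimizer.step()
```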
In some alternative embodiments, the neural network further comprises an eye-segmentation branch network; at this time, both eyes in the sample eye image also have corresponding known eye keypoints.
After step 701, it may further include:
Determining a supervision iris region, a supervision eyelid region, and a supervision background region in the sample eye image based on the known eye keypoints corresponding to the sample eye image.
The eye keypoints may include, but are not limited to, 16 eye keypoints. By performing ellipse fitting on the eye keypoints, the supervision iris region and the supervision eyelid region in the sample eye image can be determined, and the remaining areas of the sample eye image serve as the supervision background region.
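A sketch of how such supervision masks could be produced with OpenCV ellipse fitting; which keypoints delimit the iris contour versus the eyelid contour, and the label values, are assumptions made only for illustration:

```python
import cv2
import numpy as np

def build_supervision_masks(image_shape, iris_points, eyelid_points):
    """iris_points / eyelid_points: (N, 2) arrays of eye keypoints (N >= 5, int32 or float32).
    Returns a label mask: 0 = supervision background, 1 = supervision eyelid region,
    2 = supervision iris region."""
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    eyelid_box = cv2.fitEllipse(eyelid_points)   # rotated rect describing the fitted ellipse
    cv2.ellipse(mask, eyelid_box, 1, -1)         # fill the eyelid region with label 1
    iris_box = cv2.fitEllipse(iris_points)
    cv2.ellipse(mask, iris_box, 2, -1)           # fill the iris region with label 2 (overwrites eyelid)
    return mask
```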
And carrying out segmentation processing on the sample eye features by utilizing an eye segmentation branch network to obtain a segmented predicted iris region, a predicted eyelid region and a predicted background region.
The eye segmentation branch network in the embodiment realizes the region segmentation of the sample eye features, and the sample eye features are segmented into three parts, namely a predicted iris region, a predicted eyelid region and a predicted background region.
The second loss is determined based on the supervision iris region, the supervision eyelid region, and the supervision background region, together with the predicted iris region, the predicted eyelid region, and the predicted background region.
Optionally, the supervision iris region, the supervision eyelid region, and the supervision background region are taken as supervision information, and the second loss is determined jointly from the difference between the predicted iris region and the supervision iris region, the difference between the predicted eyelid region and the supervision eyelid region, and the difference between the predicted background region and the supervision background region; for example, in the embodiment shown in fig. 2a, the second loss is derived from the output of the eye segmentation branch network 205.
In this embodiment, step 704 may include:
The first loss and the second loss are weighted and summed to obtain the network loss, and the neural network is trained based on the network loss.
In this embodiment, the network loss is determined by combining the first loss and the second loss; for the training process based on the network loss, reference may be made to step 704 above, which is not repeated here. By incorporating the second loss, this embodiment alleviates the inaccuracy that arises when the line-of-sight angle is determined mainly from the face orientation in cases where the head pose has a relatively small amplitude but the line-of-sight angle is large, and strengthens the capability of accurate regression. Feeding the pupil segmentation result and the line-of-sight angle result back to the neural network at the same time lets the network learn features for both tasks simultaneously, and the internal connection between the two tasks allows them to promote each other, so that the neural network obtains better line-of-sight angle results.
In some alternative embodiments, the neural network further comprises a keypoint prediction branch network;
after step 701, it may further include:
and carrying out key point prediction on the sample eye features by using a key point prediction branch network to obtain predicted eye key points.
Optionally, the keypoint prediction branch network may be a keypoint detection network, and the predicted eye keypoints in the sample eye image are obtained by detection; optionally, the number of predicted eye keypoints is the same as the number of known eye keypoints, for example both comprising 16 eye keypoints.
A third loss is determined based on the predicted eye keypoints and the known eye keypoints.
Optionally, each known eye keypoint corresponds to a known coordinate and each predicted eye keypoint corresponds to a predicted coordinate. For ease of correspondence, the known eye keypoints and the predicted eye keypoints may each be numbered, the difference between each pair of corresponding known and predicted eye keypoints is determined by the numbering, and the third loss is determined based on the differences between all the known eye keypoints and the predicted eye keypoints; for example, in the embodiment shown in fig. 2a, the third loss is derived from the output of the keypoint prediction branch network 206.
In this embodiment, step 704 may include:
The first loss, the second loss, and the third loss are weighted and summed to obtain the network loss, and the neural network is trained based on the network loss.
In this embodiment, the network loss is determined by combining the first loss, the second loss, and the third loss; for the training process based on the network loss, reference may be made to step 704 above, which is not repeated here. By training the neural network with the second and third losses as well, the eye keypoints further improve the accuracy of the neural network's line-of-sight angle prediction.
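A compressed sketch of this loss combination; the loss weights are placeholders, and the choices of an L1 angle loss, a cross-entropy segmentation loss, and an L2 keypoint loss are assumptions consistent with, but not dictated by, the description above:

```python
import torch.nn as nn

angle_criterion = nn.L1Loss()            # first loss: predicted vs known line-of-sight angles
seg_criterion = nn.CrossEntropyLoss()    # second loss: iris / eyelid / background segmentation
kpt_criterion = nn.MSELoss()             # third loss: predicted vs known eye keypoints

W1, W2, W3 = 1.0, 0.5, 0.5               # assumed loss weights

def network_loss(pred_angles, gt_angles, seg_logits, seg_labels, pred_kpts, gt_kpts):
    first = angle_criterion(pred_angles, gt_angles)
    second = seg_criterion(seg_logits, seg_labels)
    third = kpt_criterion(pred_kpts, gt_kpts)
    return W1 * first + W2 * second + W3 * third   # weighted sum used for back-propagation
```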
In some alternative embodiments, step 702 may include:
And determining a predicted angle interval vector corresponding to both eyes in the sample eye image based on the sample eye feature by using a first positioning branch network of the neural network.
The implementation and effect of this step may refer to step 3031 in the above embodiment, and will not be described herein.
And determining a predicted offset angle vector corresponding to both eyes in the sample eye image based on the sample eye feature and the predicted angle interval vector by using a second positioning branch network of the neural network.
The implementation and effect of this step may refer to step 3032 in the above embodiment, and will not be described herein.
And determining the predicted line-of-sight angles corresponding to the eyes in the sample eye image based on the predicted angle interval vector and the predicted offset angle vector.
The implementation and effect of this step may refer to step 304 in the above embodiment, and will not be described herein.
The network structure of the neural network provided in this embodiment is not changed during training, so the process of determining the predicted line-of-sight angle during network training is the same as the process of estimating the line-of-sight angle with the trained neural network; the differences are that the network parameters differ and that the predicted line-of-sight angle obtained during training is not yet the optimal result.
Any of the line-of-sight angle estimation or neural network training methods provided by embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including, but not limited to: terminal equipment, servers, etc. Or any of the line-of-sight angle estimation or neural network training methods provided by the embodiments of the present disclosure may be executed by a processor, such as the processor executing any of the line-of-sight angle estimation or neural network training methods mentioned by the embodiments of the present disclosure by invoking corresponding instructions stored in a memory. And will not be described in detail below.
Exemplary apparatus
Fig. 8 is a schematic view of a view angle estimating apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 8, the apparatus provided in this embodiment includes:
The region extraction module 81 is configured to determine an eye region included in the image to be detected.
The feature extraction module 82 is configured to perform feature extraction on the eye region obtained by the region extraction module 81, so as to obtain an eye region feature corresponding to the eye region.
The interval estimation module 83 is configured to determine an angle interval vector corresponding to both eyes in the eye region and an offset angle vector corresponding to both eyes in the eye region based on the eye region features obtained by the feature extraction module 82.
The angle determining module 84 is configured to determine the line of sight angles corresponding to the eyes in the eye area based on the angle interval vector and the offset angle vector determined by the interval estimating module 83.
According to the line-of-sight angle estimation apparatus provided by this embodiment of the disclosure, combining the located angle interval with the offset angle improves the accuracy of line-of-sight angle estimation; and because processing is based on the eye region, the apparatus can adapt to head poses of various amplitudes, improving the robustness of the line-of-sight angle estimation method.
Fig. 9 is a schematic structural view of a viewing angle estimating device provided in another exemplary embodiment of the present disclosure. As shown in fig. 9, the apparatus provided in this embodiment includes:
The feature extraction module 82 is specifically configured to perform feature extraction on the eye region by using a feature extraction branch network of the neural network, so as to obtain an eye region feature corresponding to the eye region.
The section estimation module 83 includes:
a first branch positioning unit 831 for determining an angle interval vector corresponding to both eyes in the eye area based on the eye area characteristics using a first positioning branch network of the neural network;
A second branch positioning unit 832 is configured to determine offset angle vectors corresponding to eyes in the eye region based on the eye region features and the angle interval vectors by using a second positioning branch network of the neural network.
Optionally, the first branch positioning unit 831 is specifically configured to divide the included angle of the image to be detected with respect to the normal vector in the horizontal direction and the vertical direction into a set number of horizontal angle intervals and vertical angle intervals, respectively; processing the eye region characteristics by using a first positioning branch network of the neural network to obtain a first confidence coefficient of each horizontal angle interval and a second confidence coefficient of each vertical angle interval corresponding to the eyes in the eye region; determining two horizontal angle intervals corresponding to the eyes based on the first confidence coefficient, and determining two vertical angle intervals corresponding to the eyes based on the second confidence coefficient; and determining the angle interval vector corresponding to the two eyes based on the two horizontal angle intervals and the two vertical angle intervals corresponding to the two eyes.
Optionally, the second branch positioning unit 832 is specifically configured to process the eye region feature by using a second positioning branch network of the neural network, so as to obtain two horizontal offset angles of each eye in the two eyes in the horizontal direction and two vertical offset angles in the vertical direction; based on the two horizontal offset angles and the two vertical offset angles, offset angle vectors corresponding to both eyes are determined.
Optionally, the line of sight angle includes a horizontal line of sight angle and a vertical line of sight angle of the left eye, and a horizontal line of sight angle and a vertical line of sight angle of the right eye;
the angle determination module 84 includes:
An angle vector determining unit 841, configured to add the angle interval vector and the offset angle vector to obtain a line-of-sight angle vector;
The binocular angle determining unit 842 is configured to determine the horizontal line of sight angle and the vertical line of sight angle of the left eye and the horizontal line of sight angle and the vertical line of sight angle of the right eye in the eye region based on the line of sight angle vector.
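As a hedged illustration of how the angle determining module could combine the two vectors, assuming a four-element ordering of (left horizontal, left vertical, right horizontal, right vertical):

def combine_angles(interval_vec, offset_vec):
    # Add the angle interval vector to the offset angle vector to obtain the
    # line-of-sight angle vector, then unpack the per-eye angles.
    # The element ordering (left-h, left-v, right-h, right-v) is an assumption.
    gaze_vec = [a + b for a, b in zip(interval_vec, offset_vec)]
    left_h, left_v, right_h, right_v = gaze_vec
    return {"left": (left_h, left_v), "right": (right_h, right_v)}

# Example: an interval starting at 10 degrees plus a 3.2 degree offset
# yields a 13.2 degree horizontal line of sight for the left eye.
print(combine_angles([10.0, -10.0, 10.0, -10.0], [3.2, 1.5, 2.8, 0.9]))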
The region extraction module 81 includes:
A face detection unit 811, configured to perform face detection processing on an image to be detected, to obtain at least one face area in the image to be detected;
a human eye detection unit 812, configured to perform facial key point recognition on each face region, so as to obtain the eye key points in the face region;
The eye region determining unit 813 is configured to obtain an eye region corresponding to the face region based on the eye key points in each face region.
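The following sketch illustrates one possible realization of the region extraction module; detect_faces and detect_eye_keypoints are hypothetical helpers standing in for any face detector and facial landmark model, and the margin value is an assumption.

def extract_eye_regions(image, detect_faces, detect_eye_keypoints, margin=0.3):
    # Face detection, facial key point recognition, then cropping an eye region
    # that contains both eyes around the detected eye key points.
    eye_regions = []
    for face_box in detect_faces(image):                    # at least one face area
        keypoints = detect_eye_keypoints(image, face_box)   # e.g. eye corners, pupils
        xs, ys = keypoints[:, 0], keypoints[:, 1]
        w, h = xs.max() - xs.min(), ys.max() - ys.min()
        x0 = int(max(xs.min() - margin * w, 0))
        y0 = int(max(ys.min() - margin * h, 0))
        x1 = int(min(xs.max() + margin * w, image.shape[1]))
        y1 = int(min(ys.max() + margin * h, image.shape[0]))
        eye_regions.append(image[y0:y1, x0:x1])             # cropped eye region
    return eye_regions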
Before the feature extraction module 82, the apparatus may further include:
The data enhancement module 85 is configured to perform local histogram equalization processing on the eye region to enhance the contrast of the eye region, obtaining a contrast-enhanced eye region.
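A minimal sketch of this contrast enhancement step, assuming OpenCV's CLAHE as one possible implementation of local histogram equalization; the disclosure does not mandate a particular library, and the parameter values are illustrative only.

import cv2

def enhance_eye_region(eye_region_bgr):
    # Local histogram equalization (CLAHE) to enhance the contrast of the
    # eye region before feature extraction.
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)   # contrast-enhanced eye region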
Fig. 10 is a schematic structural diagram of a neural network training device according to an exemplary embodiment of the present disclosure. The neural network comprises a feature extraction branch network, a first positioning branch network and a second positioning branch network; as shown in fig. 10, the apparatus provided in this embodiment includes:
The feature extraction module 11 is configured to input the sample eye image into a neural network, and perform feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain a sample eye feature.
The sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles.
The angle estimation module 12 is configured to process the sample eye feature obtained by the feature extraction module 11 by using the first positioning branch network and the second positioning branch network, so as to obtain a predicted line-of-sight angle corresponding to both eyes in the sample eye image;
the loss determination module 13 is configured to determine a first loss based on the predicted line of sight angle and the known line of sight angle obtained by the angle estimation module 12.
The network training module 14 is configured to determine a network loss based on the first loss obtained by the loss determination module 13, and train the neural network based on the network loss.
According to this embodiment, the neural network is trained by using the known sight angles, so that the trained neural network can better predict the sight angle in the eye region, improving the accuracy of sight angle estimation.
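As a hedged sketch of the basic training step described above, in which only the first loss contributes to the network loss; the loss form and optimizer are assumptions, since the disclosure does not fix them.

import torch.nn.functional as F

def train_step(model, optimizer, sample_eyes, known_gaze):
    # One training step: predict the line-of-sight angles for the sample eye
    # image, compute the first loss against the known angles, and update the
    # network by backpropagation.
    optimizer.zero_grad()
    predicted_gaze = model(sample_eyes)                 # predicted line-of-sight angles
    first_loss = F.l1_loss(predicted_gaze, known_gaze)  # assumed L1 form of the first loss
    network_loss = first_loss                           # network loss = first loss here
    network_loss.backward()
    optimizer.step()
    return network_loss.item()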
Fig. 11 is a schematic structural diagram of a neural network training device according to another exemplary embodiment of the present disclosure. As shown in fig. 11, the apparatus provided in this embodiment includes:
The neural network also includes an eye segmentation branch network; both eyes in the sample eye image also have corresponding known eye keypoints;
After the feature extraction module 11, the apparatus further includes:
The segmentation supervision determining module 15 is configured to determine a supervision iris area, a supervision eyelid area and a supervision background area in the sample eye image based on the known eye key points corresponding to the sample eye image.
The segmentation prediction module 16 is configured to perform segmentation processing on the sample eye feature by using an eye segmentation branch network, so as to obtain a segmented predicted iris region, a predicted eyelid region and a predicted background region.
The second loss determination module 17 is configured to determine a second loss based on the supervised iris region, the supervised eyelid region and the supervised background region, and the predicted iris region, the predicted eyelid region and the predicted background region.
The network training module 14 is specifically configured to perform a weighted summation of the first loss and the second loss to obtain the network loss, and to train the neural network based on the network loss.
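The sketch below illustrates, under assumed class indices (0 = background, 1 = eyelid, 2 = iris) and assumed weight values, how the second loss and the two-term network loss could be computed.

import torch.nn.functional as F

def second_loss_fn(seg_logits, supervision_mask):
    # Pixel-wise cross entropy between the predicted iris / eyelid / background
    # segmentation and the supervision regions built from the known eye key points.
    # seg_logits: (batch, 3, H, W); supervision_mask: (batch, H, W) with class indices.
    return F.cross_entropy(seg_logits, supervision_mask)

def two_term_network_loss(first_loss, second_loss, w1=1.0, w2=0.5):
    # Weighted sum of the first and second losses; the weights are illustrative
    # assumptions, not values specified by the disclosure.
    return w1 * first_loss + w2 * second_loss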
Optionally, the neural network further comprises a keypoint prediction branch network;
After the feature extraction module 11, the apparatus further includes:
The keypoint predicting module 18 is configured to predict the keypoints of the sample eye feature by using the keypoint predicting branch network, so as to obtain predicted eye keypoints.
A third loss determination module 19 for determining a third loss based on the predicted eye keypoints and the known eye keypoints.
The network training module 14 is specifically configured to perform a weighted summation of the first loss, the second loss and the third loss to obtain the network loss, and to train the neural network based on the network loss.
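Extending the two-term sketch above, one possible form of the third loss and of the three-term network loss; the regression loss form and the weight values are again assumptions.

import torch.nn.functional as F

def third_loss_fn(predicted_keypoints, known_keypoints):
    # Smooth L1 between the predicted and known eye key point coordinates
    # (the specific regression loss is assumed, not mandated by the disclosure).
    return F.smooth_l1_loss(predicted_keypoints, known_keypoints)

def three_term_network_loss(first_loss, second_loss, third_loss,
                            w1=1.0, w2=0.5, w3=0.5):
    # Weighted sum of the first, second and third losses; weights are illustrative.
    return w1 * first_loss + w2 * second_loss + w3 * third_loss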
The angle estimation module 12 is specifically configured to determine, using a first positioning branch network of the neural network, a predicted angle interval vector corresponding to both eyes in the sample eye image based on the sample eye feature; determining a predicted offset angle vector corresponding to both eyes in the sample eye image based on the sample eye feature and the predicted angle interval vector by using a second positioning branch network of the neural network; and determining the predicted line-of-sight angles corresponding to the eyes in the sample eye image based on the predicted angle interval vector and the predicted offset angle vector.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 12. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.
Fig. 12 illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 12, the electronic device 120 includes one or more processors 121 and memory 122.
Processor 121 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in electronic device 120 to perform desired functions.
Memory 122 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 121 to implement the sight angle estimation or neural network training methods of the various embodiments of the present disclosure described above, and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 120 may further include: an input device 123 and an output device 124, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input means 123 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 123 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
In addition, the input device 123 may include, for example, a keyboard, a mouse, and the like.
The output device 124 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 124 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, fig. 12 shows only those components of the electronic device 120 that are relevant to the present disclosure; components such as buses, input/output interfaces, and the like are omitted for simplicity. In addition, the electronic device 120 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a gaze angle estimation or neural network training method according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in a line-of-sight angle estimation or neural network training method according to various embodiments of the present disclosure described in the above "exemplary methods" section of the present disclosure.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, each embodiment is described in a progressive manner, and each embodiment focuses on its differences from the other embodiments, so the same or similar parts between the embodiments may be referred to one another. For the system embodiments, the description is relatively simple because they essentially correspond to the method embodiments; for relevant details, reference may be made to the description of the method embodiments.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words meaning "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A line of sight angle estimation method, comprising:
determining an eye region included in the image to be detected; performing local histogram equalization processing on the eye region to enhance the contrast of the eye region and obtain the eye region with enhanced contrast;
Extracting features of the eye region to obtain eye region features corresponding to the eye region;
based on the eye region characteristics, determining angle interval vectors corresponding to eyes in the eye region and offset angle vectors corresponding to eyes in the eye region;
and determining the sight angles corresponding to the eyes in the eye area based on the angle interval vector and the offset angle vector.
2. The method of claim 1, wherein the feature extraction of the eye region to obtain eye region features corresponding to the eye region comprises:
And performing feature extraction on the eye region by using a feature extraction branch network of the neural network to obtain eye region features corresponding to the eye region.
3. The method of claim 2, wherein the determining, based on the eye region features, an angle interval vector corresponding to both eyes in the eye region and an offset angle vector corresponding to both eyes in the eye region comprises:
Determining angle interval vectors corresponding to eyes in the eye region based on the eye region characteristics by using a first positioning branch network of the neural network;
and determining offset angle vectors corresponding to eyes in the eye region based on the eye region characteristics and the angle interval vectors by using a second positioning branch network of the neural network.
4. The method of claim 1, wherein the line of sight angles include horizontal and vertical line of sight angles for the left eye and horizontal and vertical line of sight angles for the right eye;
The determining, based on the angle interval vector and the offset angle vector, a line of sight angle corresponding to eyes in the eye region includes:
Adding the angle interval vector and the offset angle vector to obtain a sight angle vector;
And determining the horizontal sight angle and the vertical sight angle of the left eye and the horizontal sight angle and the vertical sight angle of the right eye in the eye region based on the sight angle vector.
5. A neural network training method, the neural network comprising a feature extraction branch network, a first positioning branch network and a second positioning branch network; comprising the following steps:
Inputting a sample eye image into the neural network, and carrying out feature extraction on the sample eye image through a feature extraction branch network of the neural network to obtain sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles; performing local histogram equalization processing on the sample eye image to enhance the contrast of an eye region and obtain a sample eye image with enhanced contrast;
Processing the sample eye features by using the first positioning branch network and the second positioning branch network to obtain predicted line-of-sight angles corresponding to eyes in the sample eye images;
determining a first loss based on the predicted line of sight angle and the known line of sight angle;
a network loss is determined based on the first loss, and the neural network is trained based on the network loss.
6. The method of claim 5, the neural network further comprising an eye-segmentation branch network; both eyes in the sample eye image also have corresponding known eye keypoints;
wherein, after inputting the sample eye image into the neural network and performing feature extraction on the sample eye image through the feature extraction branch network of the neural network to obtain the sample eye features, the method further comprises:
Determining a supervision iris area, a supervision eyelid area and a supervision background area in the sample eye image based on known eye key points corresponding to the sample eye image;
Dividing the sample eye features by using the eye division branch network to obtain a divided predicted iris region, a predicted eyelid region and a predicted background region;
Determining a second loss based on the supervised iris region, the supervised eyelid region, and the supervised background region, and the predicted iris region, the predicted eyelid region, and the predicted background region;
the determining a network loss based on the first loss, training the neural network based on the network loss, comprising:
Weighting and summing the first loss and the second loss to obtain network loss;
training the neural network based on the network loss.
7. A line of sight angle estimation apparatus, comprising:
The region extraction module is used for determining an eye region included in the image to be detected;
The feature extraction module is used for extracting features of the eye region obtained by the region extraction module to obtain eye region features corresponding to the eye region;
the interval estimation module is used for determining an angle interval vector corresponding to the eyes in the eye area and an offset angle vector corresponding to the eyes in the eye area based on the eye area characteristics obtained by the characteristic extraction module;
The angle determining module is used for determining the sight angles corresponding to the eyes in the eye area based on the angle interval vector and the offset angle vector determined by the interval estimating module;
further comprises: and the data enhancement module is used for executing local histogram equalization processing on the eye region so as to enhance the contrast of the eye region and obtain the eye region with enhanced contrast.
8. A neural network training device, the neural network comprising a feature extraction branch network, a first positioning branch network, and a second positioning branch network; comprising the following steps:
The feature extraction module is used for inputting the sample eye image into the neural network, and extracting features of the sample eye image through a feature extraction branch network of the neural network to obtain sample eye features; the sample eye image comprises two eyes of a human face, and the two eyes in the sample eye image have corresponding known sight angles; performing local histogram equalization processing on the sample eye image to enhance the contrast of an eye region and obtain a sample eye image with enhanced contrast;
The angle estimation module is used for processing the sample eye characteristics obtained by the characteristic extraction module by utilizing the first positioning branch network and the second positioning branch network to obtain a predicted line-of-sight angle corresponding to eyes in the sample eye image;
A loss determination module configured to determine a first loss based on the predicted line-of-sight angle and the known line-of-sight angle obtained by the angle estimation module;
and the network training module is used for determining network loss based on the first loss obtained by the loss determination module and training the neural network based on the network loss.
9. A computer readable storage medium storing a computer program for executing the line of sight angle estimation method of any one of the preceding claims 1 to 4 or the neural network training method of any one of the preceding claims 5 to 6.
10. An electronic device, the electronic device comprising:
a processor; a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the line-of-sight angle estimation method according to any one of claims 1 to 4 or the neural network training method according to any one of claims 5 to 6.
CN202011502278.0A 2020-12-18 2020-12-18 Sight angle estimation method and device, storage medium and electronic equipment Active CN112597872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011502278.0A CN112597872B (en) 2020-12-18 2020-12-18 Sight angle estimation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112597872A CN112597872A (en) 2021-04-02
CN112597872B true CN112597872B (en) 2024-06-28

Family

ID=75199376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011502278.0A Active CN112597872B (en) 2020-12-18 2020-12-18 Sight angle estimation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112597872B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468956A (en) * 2021-05-24 2021-10-01 北京迈格威科技有限公司 Attention judging method, model training method and corresponding device
CN113379644A (en) * 2021-06-30 2021-09-10 北京字跳网络技术有限公司 Training sample obtaining method and device based on data enhancement and electronic equipment
CN113506328A (en) * 2021-07-16 2021-10-15 北京地平线信息技术有限公司 Method and device for generating sight line estimation model and method and device for estimating sight line
CN113705349B (en) * 2021-07-26 2023-06-06 电子科技大学 Attention quantitative analysis method and system based on line-of-sight estimation neural network
CN113470114A (en) * 2021-08-31 2021-10-01 北京世纪好未来教育科技有限公司 Sight estimation method, sight estimation device, electronic equipment and computer-readable storage medium
CN114879843B (en) * 2022-05-12 2024-07-02 平安科技(深圳)有限公司 Sight redirection method based on artificial intelligence and related equipment
CN116189153A (en) * 2022-09-07 2023-05-30 浙江极氪智能科技有限公司 Method and device for identifying sight line of driver, vehicle and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033957A (en) * 2018-06-20 2018-12-18 同济大学 A kind of gaze estimation method based on quadratic polynomial

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5466610B2 (en) * 2010-09-27 2014-04-09 パナソニック株式会社 Gaze estimation device
CN107871107A (en) * 2016-09-26 2018-04-03 北京眼神科技有限公司 Face authentication method and device
JP7046347B2 (en) * 2017-12-06 2022-04-04 国立大学法人静岡大学 Image processing device and image processing method
JP6840697B2 (en) * 2018-03-23 2021-03-10 株式会社豊田中央研究所 Line-of-sight direction estimation device, line-of-sight direction estimation method, and line-of-sight direction estimation program
CN108734086B (en) * 2018-03-27 2021-07-27 西安科技大学 Blink frequency and sight line estimation method based on eye area generation network
CN109492514A (en) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 A kind of method and system in one camera acquisition human eye sight direction
CN110889332A (en) * 2019-10-30 2020-03-17 中国科学院自动化研究所南京人工智能芯片创新研究院 Lie detection method based on micro expression in interview


Also Published As

Publication number Publication date
CN112597872A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN112597872B (en) Sight angle estimation method and device, storage medium and electronic equipment
US11295474B2 (en) Gaze point determination method and apparatus, electronic device, and computer storage medium
WO2022027912A1 (en) Face pose recognition method and apparatus, terminal device, and storage medium.
US11163978B2 (en) Method and device for face image processing, storage medium, and electronic device
US10891473B2 (en) Method and device for use in hand gesture recognition
EP3525165A1 (en) Method and apparatus with image fusion
JP2018525718A (en) Face recognition system and face recognition method
KR101612605B1 (en) Method for extracting face feature and apparatus for perforimg the method
US9213897B2 (en) Image processing device and method
US11367310B2 (en) Method and apparatus for identity verification, electronic device, computer program, and storage medium
EP4024270A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
WO2019205633A1 (en) Eye state detection method and detection apparatus, electronic device, and computer readable storage medium
WO2021208767A1 (en) Facial contour correction method and apparatus, and device and storage medium
CN112308006A (en) Sight line area prediction model generation method and device, storage medium and electronic equipment
KR20210113621A (en) Method, apparatus and apparatus for training neural network and detecting eye opening/closing state
CN112651321A (en) File processing method and device and server
CN112115790A (en) Face recognition method and device, readable storage medium and electronic equipment
EP2998928B1 (en) Apparatus and method for extracting high watermark image from continuously photographed images
KR102415507B1 (en) Image processing method and image processing apparatus
CN115795355B (en) Classification model training method, device and equipment
CN109858355B (en) Image processing method and related product
KR101909326B1 (en) User interface control method and system using triangular mesh model according to the change in facial motion
WO2009096208A1 (en) Object recognition system, object recognition method, and object recognition program
CN115830715A (en) Unmanned vehicle control method, device and equipment based on gesture recognition
CN115311723A (en) Living body detection method, living body detection device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant