CN112308932B - Gaze detection method, device, equipment and storage medium - Google Patents

Gaze detection method, device, equipment and storage medium

Info

Publication number
CN112308932B
CN112308932B (application CN202011217737.0A)
Authority
CN
China
Prior art keywords
reference image
camera
coordinate system
system corresponding
target object
Prior art date
Legal status
Active
Application number
CN202011217737.0A
Other languages
Chinese (zh)
Other versions
CN112308932A (en)
Inventor
朱冬晨
李嘉茂
李航
林敏静
张晓林
Current Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202011217737.0A
Publication of CN112308932A
Application granted
Publication of CN112308932B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The application discloses a gaze detection method, device, equipment and storage medium. A base image and a reference image of a target object collected by a target camera, together with the internal and external parameters of the target camera, are acquired; a head coordinate system corresponding to the base image is constructed based on the pixel coordinate system corresponding to the base image; a first coordinate conversion relation between that head coordinate system and the camera coordinate system corresponding to the base image is determined; first camera coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image are determined based on the first and second pixel coordinates of the pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera; the binocular eyeball centers of the target object and their first head coordinates are determined based on an average face model and the head coordinate system corresponding to the base image; and the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image are determined, enabling an ordinary camera to be used as the acquisition device for gaze detection.

Description

Gaze detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of gaze detection, and in particular, to a gaze detection method, apparatus, device, and storage medium.
Background
Gaze detection is a technique for acquiring the direction of a subject's line of sight by mechanical, electronic, optical or other detection means. As gaze detection research has developed, researchers have abandoned invasive detection equipment in favor of peripheral cameras and similar devices that capture images of the face and eyes, estimating the line-of-sight direction by processing and analyzing those images. As a direct expression of a person's points of interest and attention, the line of sight can reflect psychological and physiological states, so gaze detection has wide application in fields such as human-computer interaction, medical diagnosis, human-factors analysis and virtual reality.
Line-of-sight detection methods can be classified into invasive and non-invasive methods according to how the acquisition device is placed. Early gaze tracking approaches were invasive, such as electrooculography, which estimates eye movement from the potential differences measured by electrodes placed at the edges of the eye. Invasive gaze tracking causes discomfort and suffers from unstable signals during long-term use. Non-invasive approaches mostly employ gaze point detection or three-dimensional gaze detection. Gaze point detection estimates the subject's point of regard on a two-dimensional plane such as a mobile phone screen, tablet screen or computer screen.
Three-dimensional line-of-sight detection methods can be divided into geometry-based and appearance-based methods according to how the line-of-sight features are extracted. Appearance-based methods learn the mapping from eye images to gaze directly, usually with machine learning or deep learning, training a mapping model on large amounts of data annotated with ground-truth gaze directions; because of the computing power this requires, such methods are unsuitable for platforms with limited computational capability. Geometry-based methods extract salient or invariant local features of the eye, such as pupil contours, the boundary between iris and sclera, eye corners and corneal reflections (generated under infrared illumination); these impose high requirements on the quality and resolution of the acquired images and require additional infrared illumination. Both kinds of methods therefore depend on substantial computing power and hardware, making them computationally complex and costly.
Disclosure of Invention
To solve the above technical problems, the present invention provides a gaze detection method, apparatus, device and storage medium that can perform gaze detection on a target object using a camera as the acquisition device, determining the gaze direction and gaze point of the target object.
To achieve the above object, the present application provides a gaze detection method, comprising:
acquiring a base image and a reference image of a target object acquired by a target camera, and internal and external parameters of the target camera;
constructing a head coordinate system corresponding to the base image based on the pixel coordinate system corresponding to the base image;
determining a first coordinate conversion relation between the head coordinate system corresponding to the base image and a camera coordinate system corresponding to the base image;
determining first camera coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on first pixel coordinates and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera;
determining binocular eyeball centers of the target object based on an average face model and the head coordinate system corresponding to the base image, and first head coordinates of the binocular eyeball centers in the head coordinate system corresponding to the base image;
determining second camera coordinates of the binocular eyeball centers of the target object in the camera coordinate system corresponding to the base image based on the first head coordinates and the first coordinate conversion relation;
and determining the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image based on the first camera coordinates and the second camera coordinates.
In another aspect, the present application also provides a gaze detection apparatus, which may include:
an acquisition module, used for acquiring a base image and a reference image of a target object acquired by a target camera, and internal and external parameters of the target camera;
a head coordinate system construction module, used for constructing a head coordinate system corresponding to the base image based on the pixel coordinate system corresponding to the base image;
a conversion relation determining module, used for determining a first coordinate conversion relation between the head coordinate system corresponding to the base image and a camera coordinate system corresponding to the base image;
a pupil center calculation module, used for determining first camera coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on first pixel coordinates and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera;
an eyeball center first calculation module, used for determining binocular eyeball centers of the target object based on an average face model and the head coordinate system corresponding to the base image, and first head coordinates of the binocular eyeball centers in the head coordinate system corresponding to the base image;
an eyeball center second calculation module, used for determining second camera coordinates of the binocular eyeball centers of the target object in the camera coordinate system corresponding to the base image based on the first head coordinates and the first coordinate conversion relation;
and a gaze determination module, used for determining the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image based on the first camera coordinates and the second camera coordinates.
In another aspect, the present application also provides a gaze detection device, which may include:
a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or program is loaded and executed by the processor to implement the gaze detection method described above.
In addition, the application also provides a storage medium, wherein at least one instruction or at least one program is stored in the storage medium, and the at least one instruction or program is loaded and executed by a processor to implement the gaze detection method described above.
The implementation of the application has the following beneficial effects:
Implementation of the application acquires the base image and reference image of the target object collected by the target camera, together with the internal and external parameters of the target camera; constructs the head coordinate system corresponding to the base image from the pixel coordinate system corresponding to the base image; determines the first coordinate conversion relation between the head coordinate system corresponding to the base image and the camera coordinate system corresponding to the base image; determines the first camera coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image from the first and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera; determines the binocular eyeball centers of the target object and their first head coordinates from the average face model and the head coordinate system corresponding to the base image; determines the second camera coordinates of the binocular eyeball centers in the camera coordinate system corresponding to the base image from the first head coordinates and the first coordinate conversion relation; and, from the first and second camera coordinates, determines the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image. A camera can thus serve as the acquisition device for gaze detection of an object, determining its gaze direction and gaze point while reducing dependence on specialized acquisition equipment and the computing power required of the hardware platform.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a gaze detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a pixel coordinate system according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of constructing a head coordinate system corresponding to the base image according to an embodiment of the present application;
fig. 4 is a schematic diagram of a camera coordinate system according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of determining a first coordinate transformation relationship according to an embodiment of the present application;
fig. 6 is a flow chart of a gaze detection method according to another embodiment of the present application;
fig. 7 is a schematic flow chart of determining a first camera coordinate according to an embodiment of the present application;
fig. 8 is a schematic flow chart of determining the pupil center depth according to an embodiment of the present application;
Fig. 9 is a schematic flow chart of determining a sight line direction and a gaze point according to an embodiment of the present application;
fig. 10 is a schematic diagram of a gaze detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, shall fall within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To make the technical solution of the application easier for engineers to understand and apply, its working principle is further explained below with reference to specific embodiments.
The application can be applied in the field of gaze detection. A binocular or multi-view camera collects images of the object to be detected; the head pose and eyeball pose are determined by analyzing these images, and the intersection of the two lines of sight is obtained from the head pose and eyeball pose, thereby determining the gaze direction and gaze point of the target object. When a multi-view camera is used to collect the images and any reference image is problematic, analysis can proceed from the other reference images, ensuring that gaze detection continues normally. The gaze detection system can run under natural illumination, can detect the three-dimensional gaze direction even when the head pose undergoes large translations and rotations, and reduces the computing power required of the hardware platform.
An embodiment of a gaze detection method of the present application is described below; as shown in FIG. 1, the method may include:
s101: and acquiring a base image and a reference image of the target object acquired by the target camera and internal and external parameters of the target camera.
The target camera in the embodiments of the application may be a binocular camera or a multi-view camera. When a binocular camera collects images of the target object, one of the two collected images serves as the base image and the other as the reference image. When a multi-view camera is used, one collected image serves as the base image and the remaining images (at least two) serve as reference images. The target camera may be calibrated and rectified before the base image and reference image of the target object are collected. Calibrating the target camera yields the internal and external parameters of each camera, together with the rotation matrix and translation vector between the cameras. For example, for a binocular camera with the left camera's image as the base image, calibration gives the internal parameter matrices C_L and C_R of the left and right cameras, respectively, and the rotation matrix R_R2L and translation vector T_R2L from the right camera to the left camera.
Specifically, the internal parameters and external parameters of the target camera in the embodiment of the present application may include a center point of the camera, a principal axis of the camera, an image plane of the camera, a principal point of the camera (a point where the principal axis intersects with the image plane), a focal length of the camera, and the like.
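By way of an illustrative sketch only (not part of the original disclosure), this calibration step might be carried out with OpenCV; the chessboard-corner correspondences objpoints, imgpoints_l and imgpoints_r and the image_size are assumptions, collected beforehand:

```python
import cv2

# objpoints: list of (N, 3) chessboard corner positions in the board frame
# imgpoints_l / imgpoints_r: matching (N, 1, 2) corner pixels in left/right views
# image_size: (width, height) of the images; all assumed collected beforehand

# Calibrate each camera individually to obtain the internal parameters C_L and C_R
_, C_L, dist_l, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, image_size, None, None)
_, C_R, dist_r, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, image_size, None, None)

# Stereo calibration recovers the rotation matrix R_R2L and translation vector T_R2L
# mapping right-camera coordinates to left-camera (base) coordinates
ret, C_R, dist_r, C_L, dist_l, R_R2L, T_R2L, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_r, imgpoints_l, C_R, dist_r, C_L, dist_l, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```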
S103: constructing a head coordinate system corresponding to the base image based on the pixel coordinate system corresponding to the base image.
Specifically, a pixel coordinate system can be determined from the image plane of the target camera. As shown in FIG. 2, the quadrilateral ABCD is the image plane of the camera; the origin of the pixel coordinate system is the intersection O_0 of the u-axis and the v-axis, and the horizontal axis u and the vertical axis v correspond to the rows and columns in which the pixels lie. The pixel coordinate system is a two-dimensional coordinate system, while the head coordinate system is a three-dimensional coordinate system.
In some embodiments, as shown in FIG. 3, constructing the head coordinate system corresponding to the base image based on the pixel coordinate system corresponding to the base image may include:
S1031: extracting the facial key points of the target object from the base image.
Specifically, a face detection algorithm may be used to analyze the base image and extract the facial key points of the target object. For example, the Dlib face recognition library can be used to recognize the face in the base image and extract its key points; in addition, when using Dlib, the number of facial key points to be extracted can be set, which reduces the amount of computation while ensuring that the number of key points meets the needs of subsequent analysis.
S1033: determining the third pixel coordinates of the facial key points of the target object in the pixel coordinate system corresponding to the base image.
Specifically, the third pixel coordinates are two-dimensional coordinates. For example, if 68 facial key points of the target object are extracted from the base image, a set P_landmarks(u_L, v_L) of the two-dimensional coordinates of those 68 key points can be determined in the pixel coordinate system corresponding to the base image.
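As a hedged illustration of this landmark step (assumed tooling, not mandated by the application), the 68 Dlib landmarks and their pixel coordinates might be collected as follows; the model file name is the standard public Dlib download:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Standard pre-trained 68-point landmark model (assumed to be available locally)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(gray_base_image):
    """Return a (68, 2) array of (u, v) landmark pixel coordinates, or None."""
    faces = detector(gray_base_image, 1)
    if not faces:
        return None
    shape = predictor(gray_base_image, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
```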
S1035: determining the head coordinate system corresponding to the base image based on the average face model and the third pixel coordinates.
The average face model in the embodiments of the application is a preset three-dimensional face model, which may include key points corresponding to the facial key points of the target object. From the average face model and the facial key points of the target object, the origin and coordinate axes of the head coordinate system corresponding to the base image can be chosen; combining these with the third pixel coordinates determines the origin position and axis directions, and thereby the head coordinate system corresponding to the base image.
S105: determining a first coordinate conversion relation between the head coordinate system corresponding to the base image and the camera coordinate system corresponding to the base image.
Specifically, the first coordinate conversion relation may comprise the rotation matrix R_H2C and translation vector T_H2C between the head coordinate system corresponding to the base image and the camera coordinate system corresponding to the base image; the two coordinate systems can be converted into one another through this rotation matrix and translation vector. Before determining the first coordinate conversion relation, the method may include determining a second coordinate conversion relation between the camera coordinate system corresponding to the base image and the pixel coordinate system corresponding to the base image based on the internal and external parameters of the target camera. As shown in FIG. 4, the camera coordinate system takes the principal point of the camera, among the internal and external parameters of the target camera, as its origin, the axes parallel to the camera's image plane as the X_c and Y_c axes, and the principal axis of the camera as the Z_c axis, with Z_c perpendicular to the acquisition plane of the target object. In the pixel coordinate system, taking the intersection O_0 of the u-axis and v-axis of the camera's image plane as the origin and the rows and columns of the image plane as the horizontal axis u and vertical axis v, and combining the internal and external parameters of the target camera, the mapping between the camera coordinate system corresponding to the base image and the pixel coordinate system corresponding to the base image, i.e. the second coordinate conversion relation, can be obtained.
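The pinhole relation underlying this second coordinate conversion can be sketched as follows (standard camera geometry shown for illustration; the helper name is an assumption):

```python
import numpy as np

def project_to_pixels(P_cam, K):
    """Project a 3-D point in camera coordinates to (u, v) pixel coordinates.

    P_cam: (3,) point [x_c, y_c, z_c]; K: 3x3 internal parameter matrix with
    focal lengths f_u, f_v and principal point (c_u, c_v).
    """
    x, y, z = P_cam
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v])
```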
Accordingly, as shown in FIG. 5, determining the first coordinate conversion relation between the head coordinate system corresponding to the base image and the camera coordinate system corresponding to the base image may include:
S1051: determining the second head coordinates of the facial key points of the target object in the head coordinate system corresponding to the base image.
Specifically, the second head coordinates of the facial key points of the target object in the head coordinate system corresponding to the base image are three-dimensional coordinates.
S1053: analyzing the second head coordinates, the third pixel coordinates and the second coordinate conversion relation to determine the first coordinate conversion relation.
In the embodiments of the application, once the three-dimensional second head coordinates, the two-dimensional third pixel coordinates and the second coordinate conversion relation have been obtained, the motion from each facial key point represented by its three-dimensional second head coordinate to the corresponding facial key point represented by its two-dimensional third pixel coordinate can be solved through reprojection and the PnP (Perspective-n-Point) algorithm. Given several facial key points of the target object and their projected positions, this algorithm determines the head pose relative to the base camera of the target camera, i.e. the rotation matrix R_H2C and translation vector T_H2C between the head coordinate system and the camera coordinate system of the base image, thereby determining the first coordinate conversion relation.
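A minimal sketch of this PnP step, assuming OpenCV and the arrays introduced above (the average-face key points in head coordinates and their third pixel coordinates); it illustrates the computation rather than prescribing the application's exact implementation:

```python
import cv2

# model_points: (68, 3) key points of the average face model in head coordinates
# image_points: (68, 2) third pixel coordinates of the same key points in the base image
# C_L: 3x3 internal parameter matrix of the base (left) camera; distortion assumed corrected
ok, rvec, tvec = cv2.solvePnP(model_points, image_points, C_L, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R_H2C, _ = cv2.Rodrigues(rvec)   # rotation matrix from head frame to camera frame
T_H2C = tvec.reshape(3)          # translation vector of the same transform
```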
S107: determining the first camera coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the first pixel coordinates and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera.
In some embodiments, before determining the first camera coordinates of the binocular pupil centers based on the first pixel coordinates, the second pixel coordinates and the internal and external parameters of the target camera, the method may further include, as shown in FIG. 6:
S1061: determining the binocular regions of interest in the base image and the reference image, respectively, based on the facial key points of the target object.
Specifically, a region of interest containing both eyes may be extracted from the base image and from the reference image based on the facial key points of the target object.
S1063: determining a gradient map and a gray-weight map of the binocular region of interest in the base image, and a gradient map and a gray-weight map of the binocular region of interest in the reference image.
Specifically, gradients along the u and v directions are computed for the binocular regions of interest in the base image and the reference image, and the magnitude map of the two-dimensional gradient vectors is calculated. A dynamic threshold derived from the magnitude map is used to grade and normalize the gradients, yielding a set of pixel regions, each corresponding to a region of the real scene whose interior has consistent attributes that differ from those of adjacent regions; this produces the normalized gradient map. Similarly, the dynamic threshold can be used to divide the image pixels into gray-level sets, each corresponding to an area of the real scene with an internally uniform gray level that differs from adjacent areas. Because the pupil center is darker than the rest of the eye, the gray map of the binocular region of interest is inverted to obtain the gray-weight map.
S1065: taking the point in the base image that simultaneously satisfies the preset gradient condition and gray condition as the binocular pupil center of the target object in the base image, and the point in the reference image that simultaneously satisfies the preset gradient condition and gray condition as the binocular pupil center of the target object in the reference image.
Let x_i be any point in the binocular region of interest and c a candidate center point, let g_i be the normalized gradient vector at x_i, and let d_i = (x_i - c)/||x_i - c|| be the normalized displacement vector from c to x_i. Whether c is the pupil center can be measured by the inner product d_i·g_i: when d_i and g_i point in the same direction, x_i is an iris boundary point whose gradient is aligned with the ray from c, so the probability that c is the pupil center is high. The gray weight ω_c raises this probability further the darker the point c is. From d_i·g_i and ω_c, each pupil center can be found as:

c* = argmax_c { (1/N)·Σ_i ω_c·(d_i·g_i)² }
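A compact sketch of this search, assuming NumPy and a grayscale eye patch; the exhaustive scan over candidate pixels is kept deliberately simple and is an illustration, not the application's exact procedure:

```python
import numpy as np

def pupil_center(eye_gray):
    """Locate the pupil center in a grayscale eye region (gradient/gray-weight style)."""
    gy, gx = np.gradient(eye_gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    thresh = mag.mean() + 0.3 * mag.std()                # dynamic threshold (assumed factor)
    mask = mag > thresh
    gx, gy = gx[mask] / mag[mask], gy[mask] / mag[mask]  # normalized gradients g_i
    ys, xs = np.nonzero(mask)                            # positions x_i of the gradient points
    weights = 255.0 - eye_gray.astype(np.float64)        # inverted gray map: dark pupil -> high weight

    best_score, best_c = -1.0, (0, 0)
    h, w = eye_gray.shape
    for cy in range(h):
        for cx in range(w):
            dx, dy = xs - cx, ys - cy
            norm = np.hypot(dx, dy) + 1e-9
            dots = (dx / norm) * gx + (dy / norm) * gy   # inner products d_i . g_i
            score = weights[cy, cx] * np.mean(np.maximum(dots, 0.0) ** 2)
            if score > best_score:
                best_score, best_c = score, (cx, cy)
    return best_c   # (u, v) within the eye patch
```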
in some embodiments, as shown in fig. 7, determining the first camera coordinates of the binocular pupil center of the target object in the camera coordinate system corresponding to the reference image based on the first pixel coordinates, the second pixel coordinates, and the internal and external parameters of the target camera in the pixel coordinate systems corresponding to the reference image and the reference image, respectively, may include:
s1071: and determining the depth of the pupil center of the eyes of the target object in the camera coordinate system corresponding to the reference image based on the first pixel coordinate, the second pixel coordinate and the internal and external parameters of the target camera.
Specifically, as shown in FIG. 8, determining the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the first pixel coordinates, the second pixel coordinates and the internal and external parameters of the target camera may include:
S10711: determining the disparity of the binocular pupil centers of the target object from the first pixel coordinates and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively.
Specifically, taking a base image and a reference image acquired by a binocular camera as an example, let the u-axis coordinate of the left-eye pupil center in the base image be P_pupilL(u_L) and that of the right-eye pupil center be P_pupilR(u_L), and let the u-axis coordinate of the left-eye pupil center in the reference image be P_pupilL(u_R) and that of the right-eye pupil center be P_pupilR(u_R). Then the disparity d_pupilL of the left-eye pupil center and the disparity d_pupilR of the right-eye pupil center are given by:

d_pupilL = P_pupilL(u_L) - P_pupilL(u_R)

d_pupilR = P_pupilR(u_L) - P_pupilR(u_R)

Both d_pupilL and d_pupilR are one-dimensional (scalar) values.
S10713: determining the camera baseline length based on the internal and external parameters of the target camera.
Specifically, the camera baseline length of the target camera is the modulus of the translation vector from the camera that acquires the reference image to the camera that acquires the base image; since that translation vector is an external parameter between the cameras, the baseline length can be determined from the external parameters of the target camera.
S10715: determining the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the camera focal length among the internal parameters of the target camera, the camera baseline length and the disparity of the binocular pupil centers.
Specifically, the depths of the binocular pupil centers in the camera coordinate system of the base image follow from the internal and external parameters of the target camera. The depth of the left-eye pupil center is z_L = f·L_R2L/d_pupilL and the depth of the right-eye pupil center is z_R = f·L_R2L/d_pupilR, where f is the camera focal length and L_R2L is the modulus of the right-to-left translation vector T_R2L, i.e. the baseline length. In embodiments where a multi-view camera acquires the base image and reference images, each pair of base and reference images can be used to compute the disparities of the binocular pupil centers; the disparity entering the left-eye depth formula may then be the average of the left-eye disparities after outliers are removed, and correspondingly the disparity entering the right-eye depth formula may be the average of the right-eye disparities after outliers are removed.
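A small illustrative helper (assumed names, not patent text) tying the disparity and depth formulas together:

```python
import numpy as np

def pupil_depth(u_base, u_ref, f, T_R2L):
    """Depth of a pupil center from its u-coordinates in the base and reference images.

    u_base, u_ref: u-axis pixel coordinates of the same pupil center in the two images;
    f: camera focal length in pixels; T_R2L: right-to-left translation vector.
    """
    d = u_base - u_ref                 # disparity d_pupil
    L = np.linalg.norm(T_R2L)          # baseline length L_R2L
    return f * L / d                   # z = f * L / d
```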
S1073: determining the first camera coordinates based on the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image and the first pixel coordinates.
Specifically, according to the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image and the first pixel coordinates, the coordinates of the binocular pupil centers along the x_C and y_C axes can be obtained.
The three-dimensional coordinates of the binocular pupil centers of the target object can thus be determined from their depth in the camera coordinate system corresponding to the base image and their two-dimensional pixel coordinates. In an embodiment in which a binocular camera captures the images of the target object, the three-dimensional coordinates P_pupilL(x_C, y_C, z_C) of the left-eye pupil center and P_pupilR(x_C, y_C, z_C) of the right-eye pupil center follow the standard pinhole back-projection:

x_C = (u - c_u)·z_C/f_u

y_C = (v - c_v)·z_C/f_v

where (u, v) are the first pixel coordinates of the pupil center, (c_u, c_v) is the principal point, and f_u and f_v are the focal lengths among the internal parameters of the base camera.
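A hedged sketch of this back-projection (standard pinhole geometry; the helper name is an assumption):

```python
import numpy as np

def back_project(u, v, z, K):
    """Lift pixel (u, v) with known depth z to 3-D camera coordinates."""
    f_u, f_v = K[0, 0], K[1, 1]
    c_u, c_v = K[0, 2], K[1, 2]
    x = (u - c_u) * z / f_u
    y = (v - c_v) * z / f_v
    return np.array([x, y, z])   # e.g. P_pupil in the base-camera coordinate system
```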
s109: and determining the center of the eyes of the target object based on the average face model and the head coordinate system corresponding to the reference image, and determining the first head coordinate of the center of the eyes of the target object in the head coordinate system corresponding to the reference image.
Specifically, in the head coordinate system of the base image, the three-dimensional coordinates of the left-eye eyeball center may be denoted P_centerL(x_H, y_H, z_H) and those of the right-eye eyeball center P_centerR(x_H, y_H, z_H).
S111: determining the second camera coordinates of the binocular eyeball centers of the target object in the camera coordinate system corresponding to the base image based on the first head coordinates and the first coordinate conversion relation.
Specifically, applying the rotation matrix R_H2C and translation vector T_H2C between the head coordinate system and the camera coordinate system of the base image to the eyeball center coordinates P_centerL(x_H, y_H, z_H) and P_centerR(x_H, y_H, z_H) yields the three-dimensional coordinates of the binocular eyeball centers in the camera coordinate system of the base image, P_centerL(x_C, y_C, z_C) for the left eye and P_centerR(x_C, y_C, z_C) for the right eye, specifically:

P_centerL(x_C, y_C, z_C) = R_H2C · P_centerL(x_H, y_H, z_H) + T_H2C

P_centerR(x_C, y_C, z_C) = R_H2C · P_centerR(x_H, y_H, z_H) + T_H2C
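The same rigid transform written as a trivial sketch (assuming the R_H2C and T_H2C obtained in the PnP step above):

```python
import numpy as np

def head_to_camera(P_head, R_H2C, T_H2C):
    """Map a head-coordinate point (e.g. an eyeball center) into camera coordinates."""
    return R_H2C @ np.asarray(P_head) + T_H2C

# e.g. P_centerL_cam = head_to_camera(P_centerL_head, R_H2C, T_H2C)
```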
s113: based on the first camera coordinates and the second camera coordinates, a gaze direction and a gaze point of the target object in the camera coordinate system corresponding to the reference image are determined.
Specifically, based on the first camera coordinates and the second camera coordinates, the line connecting each eyeball center to the corresponding pupil center in the camera coordinate system of the base image can be obtained; this line is the gaze direction of the target object.
In some embodiments, as shown in FIG. 9, determining the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image based on the first camera coordinates and the second camera coordinates may include:
S1131: determining the line-of-sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the base image based on the first camera coordinates and the second camera coordinates.
Specifically, denoting the line-of-sight vector of the left eye g_L and that of the right eye g_R:

g_L = P_pupilL(x_C, y_C, z_C) - P_centerL(x_C, y_C, z_C)

g_R = P_pupilR(x_C, y_C, z_C) - P_centerR(x_C, y_C, z_C)

S1133: determining the gaze direction of the target object in the camera coordinate system corresponding to the base image based on the binocular line-of-sight vectors.
Specifically, the gaze direction of the target object is the direction of the line-of-sight vectors.
S1135: determining the two straight lines along the binocular line-of-sight vectors in the camera coordinate system corresponding to the base image.
Specifically, from the three-dimensional coordinates P_pupilL(x_C, y_C, z_C) and P_pupilR(x_C, y_C, z_C) of the binocular pupil centers in the camera coordinate system of the base image and the line-of-sight vectors g_L and g_R, points on the two lines of sight of the target object can be written as

P = P_pupilL + a_1·g_L

Q = P_pupilR + a_2·g_R

where a_1 and a_2 are scalar parameters along g_L and g_R respectively, P is a point on the left-eye line of sight, Q is a point on the right-eye line of sight, and the segment |PQ| is the connecting segment between the two straight lines.
Solving min |PQ|² for a_1 and a_2 by the least-squares method yields P and Q, so the point in space closest to both lines of sight can be obtained as the gaze point.
S1137: determining the intersection point of the two straight lines (in practice, their point of closest approach) as the gaze point of the target object in the camera coordinate system corresponding to the base image.
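A sketch of the least-squares solve for min |PQ|² (standard skew-line geometry; the helper is an illustration, with the midpoint of the two closest points taken as the gaze point when the lines do not intersect exactly):

```python
import numpy as np

def gaze_point(p_l, g_l, p_r, g_r):
    """Closest approach of the gaze lines p_l + a1*g_l and p_r + a2*g_r.

    Solves min |PQ|^2 over (a1, a2) by least squares and returns the midpoint
    of the two closest points as the gaze point.
    """
    p_l, g_l = np.asarray(p_l, float), np.asarray(g_l, float)
    p_r, g_r = np.asarray(p_r, float), np.asarray(g_r, float)
    A = np.column_stack((g_l, -g_r))       # Q - P = (p_r - p_l) - A @ [a1, a2]
    b = p_r - p_l
    (a1, a2), *_ = np.linalg.lstsq(A, b, rcond=None)
    P = p_l + a1 * g_l
    Q = p_r + a2 * g_r
    return (P + Q) / 2.0
```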
The above embodiments show that the application acquires the base image and reference image of the target object collected by the target camera, together with the internal and external parameters of the target camera; constructs the head coordinate system corresponding to the base image from the pixel coordinate system corresponding to the base image; determines the first coordinate conversion relation between the head coordinate system corresponding to the base image and the camera coordinate system corresponding to the base image; determines the first camera coordinates of the binocular pupil centers in the camera coordinate system corresponding to the base image from the first and second pixel coordinates of the pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively, and the internal and external parameters of the target camera; determines the binocular eyeball centers of the target object and their first head coordinates from the average face model and the head coordinate system corresponding to the base image; determines the second camera coordinates of the eyeball centers from the first head coordinates and the first coordinate conversion relation; and, from the first and second camera coordinates, determines the gaze direction and gaze point of the target object in the camera coordinate system corresponding to the base image. A camera can thus serve as the acquisition device for gaze detection of an object, determining its gaze direction and gaze point. The gaze detection system can run under natural illumination, can detect the three-dimensional gaze direction even under large translations and rotations of the head pose, and reduces the computing power required of the hardware platform.
Another aspect of the present application also provides an embodiment of a gaze detection apparatus, as shown in fig. 10, which may include:
the acquisition module 201 is configured to acquire a baseline image and a reference image of a target object acquired by a target camera, and internal and external parameters of the target camera.
The head coordinate system construction module 203 is configured to construct a head coordinate system corresponding to the reference image based on the pixel coordinate system corresponding to the reference image.
The conversion relation determining module 205 is configured to determine a first coordinate conversion relation between a head coordinate system corresponding to the reference image and a camera coordinate system corresponding to the reference image.
The pupil center calculating module 207 is configured to determine, based on the first pixel coordinate and the second pixel coordinate of the pupil center of the two eyes of the target object in the pixel coordinate system corresponding to the reference image and the reference image, and the inner parameter and the outer parameter of the target camera, a first camera coordinate of the pupil center of the two eyes of the target object in the camera coordinate system corresponding to the reference image.
The eyeball center first calculation module 209 is configured to determine a binocular eyeball center of the target object based on the average face model and a head coordinate system corresponding to the reference image, and a first head coordinate of the binocular eyeball center of the target object in the head coordinate system corresponding to the reference image.
The eyeball center second calculation module 211 is configured to determine, based on the first head coordinate and the first coordinate conversion relationship, a second camera coordinate of the center of the binocular eyeball of the target object in the camera coordinate system corresponding to the reference image.
A gaze determination module 213 for determining a gaze direction and a gaze point of the target object in the camera coordinate system corresponding to the reference image based on the first camera coordinate and the second camera coordinate.
Specifically, the head coordinate system construction module 203 may include:
a face key point extraction unit, configured to extract the facial key points of the target object from the base image;
a third pixel coordinate determining unit, configured to determine the third pixel coordinates of the facial key points of the target object in the pixel coordinate system corresponding to the base image;
and a head coordinate system construction unit, configured to analyze the average face model and the third pixel coordinates and determine the head coordinate system corresponding to the base image.
Specifically, the device may further include:
a binocular region-of-interest determining module, configured to determine the binocular regions of interest in the base image and the reference image, respectively, based on the facial key points of the target object;
an image feature determining module, configured to determine a gradient map and a gray-weight map of the binocular region of interest in the base image, and a gradient map and a gray-weight map of the binocular region of interest in the reference image;
and a pupil center determining module, configured to take the point in the base image that simultaneously satisfies the preset gradient condition and gray condition as the binocular pupil center of the target object in the base image, and the point in the reference image that simultaneously satisfies the preset gradient condition and gray condition as the binocular pupil center of the target object in the reference image.
Specifically, the pupil center calculation module 207 may include:
a pupil depth determining unit, configured to determine the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the first pixel coordinates, the second pixel coordinates and the internal and external parameters of the target camera;
a pupil two-dimensional coordinate determining unit, configured to determine the two-dimensional coordinates of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the first pixel coordinates;
and a first camera coordinate determining unit, configured to determine the first camera coordinates based on the depth of the binocular pupil centers in the camera coordinate system corresponding to the base image and their two-dimensional coordinates in that coordinate system.
Specifically, the pupil depth determining unit may include:
a parallax determining unit, configured to determine the disparity of the binocular pupil centers of the target object based on the first pixel coordinates and second pixel coordinates of the binocular pupil centers in the pixel coordinate systems corresponding to the base image and the reference image, respectively;
a camera baseline length determining unit, configured to determine the camera baseline length based on the internal and external parameters of the target camera;
and a pupil depth calculation unit, configured to determine the depth of the binocular pupil centers of the target object in the camera coordinate system corresponding to the base image based on the camera focal length among the internal and external parameters of the target camera, the camera baseline length and the disparity of the binocular pupil centers.
Specifically, the device may further include:
a second coordinate conversion relation determining module, configured to determine a second coordinate conversion relation between the camera coordinate system corresponding to the base image and the pixel coordinate system corresponding to the base image according to the internal and external parameters of the target camera.
Accordingly, the conversion relation determining module 205 may include:
a second head coordinate determining unit, configured to determine the second head coordinates of the facial key points of the target object in the head coordinate system corresponding to the base image;
and a first coordinate conversion relation determining unit, configured to analyze the second head coordinates, the third pixel coordinates and the second coordinate conversion relation to determine the first coordinate conversion relation.
Specifically, the gaze determination module 213 may include:
a line-of-sight vector determining unit, configured to determine the line-of-sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the base image based on the first camera coordinates and the second camera coordinates;
a line-of-sight direction determining unit, configured to determine the gaze direction of the target object in the camera coordinate system corresponding to the base image based on the binocular line-of-sight vectors;
a straight line determining unit, configured to determine the two straight lines along the binocular line-of-sight vectors in the camera coordinate system corresponding to the base image;
and a gaze point determining unit, configured to determine the intersection point of the two straight lines as the gaze point of the target object in the camera coordinate system corresponding to the base image.
In another aspect, the present application also provides an embodiment of a gaze detection apparatus, the apparatus including a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement a gaze detection method in any of the above embodiments.
In another aspect, the present application further provides an embodiment of a computer storage medium, where at least one instruction or at least one program is stored, where the at least one instruction or at least one program is loaded and executed by a processor to implement the gaze detection method in any of the foregoing embodiments.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while the embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims of the present invention, any of the claimed embodiments may be used in any combination.
The present invention may also be embodied as a device or system program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order, and the words may be interpreted as names.

Claims (9)

1. A gaze detection method, the method comprising:
acquiring a benchmark image and a reference image of a target object acquired by a target camera, and internal and external parameters of the target camera; the target camera is a binocular camera or a multi-view camera; the benchmark image is one of at least two images acquired by the target camera, and the reference image is an image of the at least two images other than the benchmark image;
constructing a head coordinate system corresponding to the benchmark image based on the pixel coordinate system corresponding to the benchmark image;
determining a first coordinate conversion relation between the head coordinate system corresponding to the benchmark image and the camera coordinate system corresponding to the benchmark image;
determining a first camera coordinate of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image based on a first pixel coordinate and a second pixel coordinate of the binocular pupil center of the target object in the pixel coordinate systems corresponding to the benchmark image and the reference image respectively, and the internal and external parameters of the target camera;
determining the binocular eyeball center of the target object based on an average face model and the head coordinate system corresponding to the benchmark image, and a first head coordinate of the binocular eyeball center of the target object in the head coordinate system corresponding to the benchmark image;
determining a second camera coordinate of the binocular eyeball center of the target object in the camera coordinate system corresponding to the benchmark image based on the first head coordinate and the first coordinate conversion relation;
determining sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the benchmark image based on the first camera coordinate and the second camera coordinate;
determining the sight direction of the target object in the camera coordinate system corresponding to the benchmark image based on the sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the benchmark image;
determining the two straight lines on which the sight vectors of the two eyes of the target object lie in the camera coordinate system corresponding to the benchmark image;
and determining the intersection point of the two straight lines as the fixation point of the target object in the camera coordinate system corresponding to the benchmark image.
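For illustration only, and not part of the claims: the final steps of claim 1 reduce to vector geometry. Each sight vector is the unit vector from the eyeball center (second camera coordinate) through the pupil center (first camera coordinate), and because two measured 3D lines are rarely exactly concurrent, a practical reading of the "intersection point" is the least-squares point closest to both lines. The sketch below assumes NumPy and that all coordinates are already expressed in the camera coordinate system corresponding to the benchmark image; the function names are illustrative.

    import numpy as np

    def sight_vector(eyeball_center, pupil_center):
        # Unit vector from the eyeball center through the pupil center.
        v = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
        return v / np.linalg.norm(v)

    def fixation_point(origin_left, dir_left, origin_right, dir_right):
        # Least-squares point closest to both sight lines o + t * d.
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for o, d in ((origin_left, dir_left), (origin_right, dir_right)):
            d = np.asarray(d, float) / np.linalg.norm(d)
            P = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
            A += P
            b += P @ np.asarray(o, float)
        return np.linalg.solve(A, b)

If the two sight lines are exactly parallel, the matrix A becomes singular; a real implementation would guard against this degenerate case.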
2. The method of claim 1, wherein the constructing a head coordinate system corresponding to the benchmark image based on the pixel coordinate system corresponding to the benchmark image comprises:
extracting facial key points of the target object from the benchmark image;
determining third pixel coordinates of the facial key points of the target object in the pixel coordinate system corresponding to the benchmark image;
and analyzing the average face model and the third pixel coordinates to determine the head coordinate system corresponding to the benchmark image.
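For illustration only: the patent does not fix a particular convention for the head coordinate system, so the sketch below shows one plausible construction from three 3D key points of an average face model (NumPy; the choice of landmarks, origin and axes is hypothetical).

    import numpy as np

    def build_head_frame(left_eye_corner, right_eye_corner, chin):
        # Origin midway between the outer eye corners; x toward the right
        # eye corner, y toward the chin (orthogonalized), z completing a
        # right-handed frame.
        l = np.asarray(left_eye_corner, float)
        r = np.asarray(right_eye_corner, float)
        c = np.asarray(chin, float)
        origin = 0.5 * (l + r)
        x_axis = (r - l) / np.linalg.norm(r - l)
        down = c - origin
        y_axis = down - np.dot(down, x_axis) * x_axis  # Gram-Schmidt step
        y_axis /= np.linalg.norm(y_axis)
        z_axis = np.cross(x_axis, y_axis)
        R = np.stack([x_axis, y_axis, z_axis], axis=1)  # columns = head axes
        return origin, R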
3. The method of claim 2, wherein before the determining a first camera coordinate of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image based on the first pixel coordinate and the second pixel coordinate of the binocular pupil center of the target object in the pixel coordinate systems corresponding to the benchmark image and the reference image respectively, and the internal and external parameters of the target camera, the method further comprises:
determining binocular regions of interest in the benchmark image and the reference image respectively based on the facial key points of the target object;
determining a gradient map and a gray weight map of the binocular region of interest in the benchmark image, and a gradient map and a gray weight map of the binocular region of interest in the reference image;
and taking a point in the benchmark image that simultaneously satisfies a preset gradient condition and a preset gray-level condition as the binocular pupil center of the target object in the benchmark image, and taking a point in the reference image that simultaneously satisfies the preset gradient condition and the preset gray-level condition as the binocular pupil center of the target object in the reference image.
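For illustration only: one published technique that matches the gradient-map-plus-gray-weight-map description of claim 3 is a means-of-gradients search in the style of Timm and Barth, which scores each candidate center by how well the surrounding image gradients point away from it (gradient condition), weighted by how dark the candidate pixel is (gray-level condition). The sketch below assumes OpenCV and NumPy and a small, already-cropped grayscale eye region; the threshold choices are hypothetical.

    import numpy as np
    import cv2

    def pupil_center(eye_gray):
        # Returns the (x, y) ROI pixel that best combines the gradient
        # condition and the gray-level condition.
        g = eye_gray.astype(np.float64)
        gx = cv2.Sobel(g, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(g, cv2.CV_64F, 0, 1, ksize=3)
        mag = np.hypot(gx, gy)
        mask = mag > mag.mean() + 0.5 * mag.std()  # keep strong gradients only
        ys, xs = np.nonzero(mask)
        gxn, gyn = gx[mask] / mag[mask], gy[mask] / mag[mask]
        weight = 255.0 - cv2.GaussianBlur(g, (5, 5), 0)  # dark pixels weigh more
        h, w = g.shape
        best, best_score = (w // 2, h // 2), -1.0
        for cy in range(h):
            for cx in range(w):
                dx, dy = xs - cx, ys - cy
                norm = np.hypot(dx, dy) + 1e-9
                dots = (dx / norm) * gxn + (dy / norm) * gyn
                score = weight[cy, cx] * np.mean(np.maximum(dots, 0.0) ** 2)
                if score > best_score:
                    best_score, best = score, (cx, cy)
        return best

The exhaustive double loop is acceptable here because an eye region of interest is small; a production implementation would vectorize or restrict the candidate set.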
4. The method of claim 1, wherein the determining a first camera coordinate of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image based on the first pixel coordinate and the second pixel coordinate of the binocular pupil center of the target object in the pixel coordinate systems corresponding to the benchmark image and the reference image respectively, and the internal and external parameters of the target camera comprises:
determining the depth of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image according to the first pixel coordinate, the second pixel coordinate and the internal and external parameters of the target camera;
and determining the first camera coordinate according to the depth of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image and the first pixel coordinate.
5. The method of claim 4, wherein the determining the depth of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image according to the first pixel coordinate, the second pixel coordinate and the internal and external parameters of the target camera comprises:
determining the disparity of the binocular pupil center of the target object according to the first pixel coordinate and the second pixel coordinate of the binocular pupil center of the target object in the pixel coordinate systems corresponding to the benchmark image and the reference image respectively;
determining the camera baseline length based on the internal and external parameters of the target camera;
and determining the depth of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image based on the camera focal length among the internal and external parameters of the target camera, the camera baseline length, and the disparity of the binocular pupil center of the target object.
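For illustration only: for a rectified, horizontally aligned stereo pair, claims 4 and 5 reduce to the standard triangulation Z = f * B / d followed by back-projection through the pinhole model. The sketch below assumes NumPy, a rectified pair in which the benchmark image is the left view, and intrinsics fx, fy, cx, cy taken from the camera's internal parameters; the function name is illustrative.

    import numpy as np

    def pupil_camera_coordinate(px_benchmark, px_reference, fx, fy, cx, cy, baseline):
        # px_benchmark = (u0, v0): first pixel coordinate in the benchmark
        # image; px_reference = (u1, v1): second pixel coordinate in the
        # reference image. In a rectified pair the disparity is the
        # horizontal pixel offset between the two.
        (u0, v0), (u1, _) = px_benchmark, px_reference
        disparity = u0 - u1
        depth = fx * baseline / disparity   # Z = f * B / d
        x = (u0 - cx) * depth / fx          # back-project through the pinhole model
        y = (v0 - cy) * depth / fy
        return np.array([x, y, depth])      # first camera coordinate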
6. The method according to claim 2, wherein the method further comprises:
determining a second coordinate conversion relation between the camera coordinate system corresponding to the benchmark image and the pixel coordinate system corresponding to the benchmark image according to the internal and external parameters of the target camera;
correspondingly, the determining a first coordinate conversion relation between the head coordinate system corresponding to the benchmark image and the camera coordinate system corresponding to the benchmark image comprises:
determining second head coordinates of the facial key points of the target object in the head coordinate system corresponding to the benchmark image;
and analyzing the second head coordinates, the third pixel coordinates and the second coordinate conversion relation to determine the first coordinate conversion relation.
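For illustration only: one standard way to obtain the first coordinate conversion relation from the second head coordinates (3D key points in the head frame), the third pixel coordinates (their 2D projections), and the projection fixed by the intrinsics is to solve a perspective-n-point problem. The sketch below assumes OpenCV; solvePnP and Rodrigues are real OpenCV calls, while the wrapper itself is illustrative and not necessarily the patent's exact analysis.

    import numpy as np
    import cv2

    def head_to_camera(head_points_3d, image_points_2d, camera_matrix, dist_coeffs):
        # Returns rotation R and translation t with X_cam = R @ X_head + t.
        ok, rvec, tvec = cv2.solvePnP(
            np.asarray(head_points_3d, np.float64),
            np.asarray(image_points_2d, np.float64),
            camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        if not ok:
            raise RuntimeError("PnP did not converge")
        R, _ = cv2.Rodrigues(rvec)          # axis-angle -> rotation matrix
        return R, tvec.reshape(3)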
7. A gaze detection apparatus, the apparatus comprising:
an acquisition module, configured to acquire a benchmark image and a reference image of a target object acquired by a target camera, and internal and external parameters of the target camera; the target camera is a binocular camera or a multi-view camera; the benchmark image is one of at least two images acquired by the target camera, and the reference image is an image of the at least two images other than the benchmark image;
a head coordinate system construction module, configured to construct a head coordinate system corresponding to the benchmark image based on the pixel coordinate system corresponding to the benchmark image;
a conversion relation determining module, configured to determine a first coordinate conversion relation between the head coordinate system corresponding to the benchmark image and the camera coordinate system corresponding to the benchmark image;
a pupil center calculation module, configured to determine a first camera coordinate of the binocular pupil center of the target object in the camera coordinate system corresponding to the benchmark image based on a first pixel coordinate and a second pixel coordinate of the binocular pupil center of the target object in the pixel coordinate systems corresponding to the benchmark image and the reference image respectively, and the internal and external parameters of the target camera;
a first eyeball center calculation module, configured to determine the binocular eyeball center of the target object based on the average face model and the head coordinate system corresponding to the benchmark image, and a first head coordinate of the binocular eyeball center of the target object in the head coordinate system corresponding to the benchmark image;
a second eyeball center calculation module, configured to determine a second camera coordinate of the binocular eyeball center of the target object in the camera coordinate system corresponding to the benchmark image based on the first head coordinate and the first coordinate conversion relation;
and a gaze determination module, configured to determine sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the benchmark image based on the first camera coordinate and the second camera coordinate; determine the sight direction of the target object in the camera coordinate system corresponding to the benchmark image based on the sight vectors of the two eyes of the target object in the camera coordinate system corresponding to the benchmark image; determine the two straight lines on which the sight vectors of the two eyes of the target object lie in the camera coordinate system corresponding to the benchmark image; and determine the intersection point of the two straight lines as the fixation point of the target object in the camera coordinate system corresponding to the benchmark image.
8. A gaze detection device, characterized in that the device comprises a processor and a memory, the memory having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the gaze detection method of any one of claims 1 to 6.
9. A computer storage medium, characterized in that at least one instruction or at least one program is stored in the storage medium, the at least one instruction or the at least one program being loaded and executed by a processor to implement the gaze detection method of any one of claims 1 to 6.
CN202011217737.0A 2020-11-04 2020-11-04 Gaze detection method, device, equipment and storage medium Active CN112308932B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011217737.0A CN112308932B (en) 2020-11-04 2020-11-04 Gaze detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011217737.0A CN112308932B (en) 2020-11-04 2020-11-04 Gaze detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112308932A CN112308932A (en) 2021-02-02
CN112308932B true CN112308932B (en) 2023-12-08

Family

ID=74326026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011217737.0A Active CN112308932B (en) 2020-11-04 2020-11-04 Gaze detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112308932B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553920A (en) * 2021-07-02 2021-10-26 黑芝麻智能科技(上海)有限公司 Gazing direction characteristic acquisition method and device, computer equipment and storage medium
CN113627267A (en) * 2021-07-15 2021-11-09 中汽创智科技有限公司 Sight line detection method, device, equipment and medium
CN113723293B (en) * 2021-08-30 2024-01-05 中国科学院上海微系统与信息技术研究所 Method and device for determining sight direction, electronic equipment and storage medium
CN114067420B (en) * 2022-01-07 2023-02-03 深圳佑驾创新科技有限公司 Sight line measuring method and device based on monocular camera
CN115471557B (en) * 2022-09-22 2024-02-02 南京博视医疗科技有限公司 Monocular camera image target point three-dimensional positioning method, pupil positioning method and device
CN116704589A (en) * 2022-12-01 2023-09-05 荣耀终端有限公司 Gaze point estimation method, electronic device and computer readable storage medium
CN117133043A (en) * 2023-03-31 2023-11-28 荣耀终端有限公司 Gaze point estimation method, electronic device, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017179279A1 (en) * 2016-04-12 2017-10-19 パナソニックIpマネジメント株式会社 Eye tracking device and eye tracking method
CN110032278B (en) * 2019-03-29 2020-07-14 华中科技大学 Pose identification method, device and system for human eye interested object

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740846A (en) * 2016-03-02 2016-07-06 河海大学常州校区 Horizontal visual angle estimation and calibration method based on depth camera
EP3413234A1 (en) * 2017-06-09 2018-12-12 Aisin Seiki Kabushiki Kaisha Gaze-tracking device, program, and method
CN110969061A (en) * 2018-09-29 2020-04-07 北京市商汤科技开发有限公司 Neural network training method, neural network training device, visual line detection method, visual line detection device and electronic equipment
CN109656373A (en) * 2019-01-02 2019-04-19 京东方科技集团股份有限公司 One kind watching independent positioning method and positioning device, display equipment and storage medium attentively
CN110619303A (en) * 2019-09-16 2019-12-27 Oppo广东移动通信有限公司 Method, device and terminal for tracking point of regard and computer readable storage medium
CN111046744A (en) * 2019-11-21 2020-04-21 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111612731A (en) * 2020-04-01 2020-09-01 中国科学院上海微系统与信息技术研究所 Measuring method, device, system and medium based on binocular microscopic vision
CN111638799A (en) * 2020-06-09 2020-09-08 京东方科技集团股份有限公司 Sight tracking method, sight tracking device, computer equipment and medium
CN111862234A (en) * 2020-07-22 2020-10-30 中国科学院上海微系统与信息技术研究所 Binocular camera self-calibration method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Real time gaze estimation with a consumer depth camera; Li Sun et al.; Information Sciences; Vol. 320; full text *
Research on estimating human gaze direction based on KINECT; Song Jinmiao, Wang Nannan, Wang Xinhan, Duan Xiaodong; Journal of Dalian Minzu University (No. 03); full text *
Research on gaze point trajectory description methods based on eye-movement video and their application; Liang Mengying; China Masters' Theses Full-text Database; full text *
Research on adaptive gaze detection and tracking algorithms; Zhu Yuanzhi et al.; Computer & Telecommunication (No. 10); full text *

Also Published As

Publication number Publication date
CN112308932A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112308932B (en) Gaze detection method, device, equipment and storage medium
CN111325823B (en) Method, device and equipment for acquiring face texture image and storage medium
CN106503671B (en) The method and apparatus for determining human face posture
EP3674852B1 (en) Method and apparatus with gaze estimation
CN110807451B (en) Face key point detection method, device, equipment and storage medium
CN106372629B (en) Living body detection method and device
US20210342990A1 (en) Image coordinate system transformation method and apparatus, device, and storage medium
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
CN110363817B (en) Target pose estimation method, electronic device, and medium
CN113227878A (en) Method and system for gaze estimation
JP4936491B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
US11945125B2 (en) Auxiliary photographing device for dyskinesia analysis, and control method and apparatus for auxiliary photographing device for dyskinesia analysis
Loureiro et al. Using a skeleton gait energy image for pathological gait classification
CN111091075A (en) Face recognition method and device, electronic equipment and storage medium
CN105631931B (en) A kind of heart surface three-dimensional configuration line modeling system and method for low complex degree
US8351650B2 (en) Foreground action estimating apparatus and foreground action estimating method
Al-Rahayfeh et al. Enhanced frame rate for real-time eye tracking using circular hough transform
CN106778660A (en) A kind of human face posture bearing calibration and device
Su et al. Cross-validated locally polynomial modeling for 2-D/3-D gaze tracking with head-worn devices
Setiawan et al. Robust pupil localization algorithm based on circular Hough transform for extreme pupil occlusion
CN115035546A (en) Three-dimensional human body posture detection method and device and electronic equipment
Benalcazar et al. A 3D iris scanner from multiple 2D visible light images
JP4682372B2 (en) Gaze direction detection device, gaze direction detection method, and program for causing computer to execute gaze direction detection method
Xia et al. IR image based eye gaze estimation
Choi et al. Appearance-based gaze estimation using kinect

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant