CN111524175A - Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras

Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras

Info

Publication number
CN111524175A
Authority
CN
China
Prior art keywords
camera
visible light
depth
positions
feature points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010300631.0A
Other languages
Chinese (zh)
Inventor
王荃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Dongquan Intelligent Technology Co ltd
Original Assignee
Dongguan Dongquan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Dongquan Intelligent Technology Co ltd filed Critical Dongguan Dongquan Intelligent Technology Co ltd
Priority to CN202010300631.0A
Publication of CN111524175A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The invention discloses a depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras, wherein the method comprises the following steps: acquiring images of an object through a camera group consisting of at least one visible light camera and at least one infrared camera fixed at different positions; acquiring paired feature points in the two images, determining the positions of the paired feature points, and reconstructing the depth and three-dimensional coordinates of the feature points from those positions. The infrared camera accurately captures the pupil boundary and pupil center point, while the visible light camera serves applications with various color requirements. The asymmetric multi-camera depth reconstruction system can be applied to facial feature points for depth reconstruction and three-dimensional head tracking, thereby achieving accurate eye movement tracking.

Description

Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras
Technical Field
The invention relates to the field of artificial intelligence and human-computer interaction, and in particular to a depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras.
Background
Eye tracking can be divided into two categories according to the imaging modality:
the first type is eye tracking using infrared fill-in light and an imaging mode of an infrared camera (or an infrared filter). On one hand, the boundary between the pupil and the iris is clearer, so that the boundary of the pupil can be found more accurately; meanwhile, the pupil is not easy to be shielded by the upper eyelid and the lower eyelid, so that the pupil boundary alignment is facilitated. On the other hand, the infrared lamp for light supplement can let the formation of image have stable illumination environment, can use evening or under the low light condition, and the infrared lamp of light supplement can form specular reflection on the sclera simultaneously to become the reference object of eye movement. The drawback of infrared imaging is that with only single-channel imaging (only grey-scale values and no color channels that differ from RGB red green blue), color monotony is not suitable for color-related tasks. Therefore, the infrared imaging has a relatively narrow application range compared with the visible light, and is usually designed for eye tracking.
The second category performs eye movement tracking under visible light with ordinary visible light imaging: the eye movement and the user's point of regard are typically determined by capturing the user's facial feature points and the center of the pupil (or of the iris-plus-pupil ensemble). The advantage of visible light imaging lies in the universality and simplicity of the hardware: visible light cameras are already present in many existing devices, such as the front or rear camera of a mobile phone and the camera of a tablet or notebook computer. The disadvantage is that the boundary between the pupil and the iris is indistinct under visible light, especially for people with dark irises, where the pupil color is close to that of the iris. In addition, the iris is easily occluded by the eyelids, so the pupil center is hard to obtain accurately under visible light. Visible light imaging is also easily affected by reflections and glare in the environment, which further hinders accurate localization of the pupil center.
Visible light binocular imaging is currently in wide use. It generally adopts symmetric binocular imaging elements, common in robots and artificial intelligence interaction devices: the left and right lenses are both visible light cameras or both infrared cameras, with identical or similar parameters. Its drawbacks are that it is strongly affected by illumination conditions, and that extracting corresponding feature points for depth reconstruction is computationally expensive. When applied to eye movement tracking, the main shortcoming of visible light binocular imaging is that an accurate pupil center position is difficult to obtain.
Disclosure of Invention
The invention provides a depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras, to solve the technical problem that color imaging and an accurate pupil center position are difficult to obtain simultaneously in eye movement tracking.
In order to solve the above technical problem, the technical solution provided by the invention is as follows:
a depth reconstruction method of asymmetric multi-camera comprises the following steps:
respectively acquiring images of an object through a camera group consisting of at least one visible light camera and at least one infrared camera which are fixed at different positions;
acquiring paired feature points in the two images, and determining the coordinates of the feature points on the object on the imaging plane of the visible light camera and the coordinates of the feature points on the imaging plane of the infrared camera respectively;
determining the central positions of the visible light camera and the infrared camera and the positions of the imaging planes of the visible light camera and the infrared camera respectively, determining the positions of paired feature points, and reconstructing the depth and the three-dimensional coordinates of the feature points according to the positions of the paired feature points.
Preferably, determining the positions of the paired feature points includes: calculating the coordinates of the feature points in the respective image planes of the visible light camera and the infrared camera, and converting them into the same fixed coordinate system.
Preferably, the depth of the feature points is reconstructed according to the positions of the paired feature points, and the calculation formula is as follows:
Z=b*f/(XL-XR)
wherein Z is the depth of the feature point, i.e. the distance from the feature point to the camera plane; b is the optical center distance between the visible light camera and the infrared camera; f is the focal length; and XR and XL are the distances from the two paired imaging points of the feature point to the left edge of the image on the visible light camera image plane and the infrared camera image plane, respectively.
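For illustration only (these example numbers are not taken from the patent): with an optical center distance b = 60 mm, a focal length f = 800 pixels, and a disparity XL − XR = 40 pixels, the depth is Z = 60 × 800 / 40 = 1200 mm, i.e. the feature point lies 1.2 m from the camera plane.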
The invention also provides an eye movement tracking method for asymmetric multiple cameras, comprising the following steps:
the depth reconstruction method of the asymmetric multi-camera is adopted to obtain the positions of the facial feature points of the human face in pairs, and reconstruct the depth and the three-dimensional coordinates of the facial feature points to obtain the three-dimensional image of the human face;
and acquiring the positions of the pupil center points of the human eyes in pairs, and reconstructing the depth of the pupil center points to acquire the depth and the three-dimensional coordinates of the human eyes.
Preferably, the method further comprises: establishing a spatial geometric model according to the depth of the human eyes, the three-dimensional coordinates of the pupil center points, the positions of the infrared or visible light camera and the light source, and the refraction relation of light entering the eye from air; and calculating the optical axis direction of the eyeball from the spatial geometric model;
determining the visual axis direction of the eyeball according to the optical axis direction and a preset individual calibration mode;
and acquiring the visual axis direction of the eyeballs at adjacent moments to realize eye movement tracking.
Preferably, the pupil center point is determined according to the infrared camera.
Preferably, the optical axis direction of the eyeball is the line connecting the pupil center p and the corneal curvature center c, and the distance between the pupil center and the corneal curvature center is K = ||p − c||.
As a general inventive concept, the present invention also provides an asymmetric multi-camera depth reconstruction system, including:
the camera group comprises at least one visible light camera and at least one infrared camera which are fixed at different positions so as to respectively acquire images of an object;
the position calibration unit is used for determining the positions of the paired characteristic points according to the coordinates of the characteristic points on the object on the image plane of the visible light camera, the coordinates on the image plane of the infrared camera and the relative positions of the visible light camera and the infrared camera;
and the depth reconstruction unit is used for reconstructing the depth and the three-dimensional coordinates of the characteristic points according to the positions of the paired characteristic points after the images of the characteristic points on the object are acquired in pairs through the position calibration unit and the positions of the characteristic points are acquired.
Preferably, the position calibration unit includes a feature point coordinate calculation subunit, and the feature point coordinate calculation subunit is configured to calculate and obtain a coordinate of the feature point in a fixed image plane coordinate system of the visible light camera and the infrared camera.
The invention also provides an asymmetric multi-camera eye tracking system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of any one of the methods when executing the computer program.
The invention has the following beneficial effects:
1. The depth reconstruction method and system for asymmetric multiple cameras combine an infrared camera with a visible light camera, giving full play to the respective advantages of infrared and visible light imaging in eye movement tracking. The invention can be applied where color is required, can accurately obtain boundary positions, and can accurately reconstruct the depth of each feature point. Applied to the depth reconstruction of facial feature points, it reduces the amount of computation of SLAM-like algorithms, quickly obtains the distance from the face and the pupil to the camera, and realizes three-dimensional face tracking.
2. According to the eye tracking method and system of the asymmetric multi-camera, the infrared camera and the visible light camera are combined to form 3D face tracking and eye tracking in an asymmetric multi-camera mode. The advantages of infrared imaging and visible light imaging in eye movement tracking are fully played, pupils can be accurately positioned and sclera reflection points can be formed under infrared light, but a visible light camera is used simultaneously, the depth of the face (the distance between the face and the camera) can be obtained, and the visible light camera is used for other imaging functions, such as daily self-photographing, photographing including face detection, face recognition, expression recognition, heartbeat measurement and the like by using computer vision or a machine learning method.
3. In addition, the eye movement tracking method and system use the asymmetric binocular pair to obtain the depth of the pupil, while infrared imaging reduces the interference of external illumination with imaging and eye tracking, so the pupil boundary and pupil center point can be obtained more accurately and the eye movement trajectory tracked better.
In addition to the objects, features, and advantages described above, the present invention has other objects, features, and advantages, which will be described in further detail below with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for depth reconstruction with asymmetric multi-camera in accordance with a preferred embodiment of the present invention;
FIG. 2 is a schematic view of a depth calculation model in accordance with a preferred embodiment of the present invention;
FIG. 3 is a schematic flow chart of an asymmetric multi-camera eye tracking method according to a preferred embodiment of the present invention;
FIG. 4 shows an eye image under visible light (a) and under infrared imaging (b) according to a preferred embodiment of the present invention;
FIG. 5 shows the pupil + iris center position under visible light (a) and the pupil center under infrared imaging (b) according to a preferred embodiment of the present invention;
FIG. 6 is another schematic flow chart of an asymmetric multi-camera eye tracking method according to a preferred embodiment of the present invention;
fig. 7 is a schematic diagram of a geometric model of the eye space in accordance with a preferred embodiment of the present invention.
Detailed Description
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Referring to fig. 1, the depth reconstruction method for asymmetric multiple cameras of the present invention comprises the following steps:
respectively acquiring images of an object through a camera group consisting of at least one visible light camera and at least one infrared camera which are fixed at different positions;
acquiring paired feature points in the two images, and determining the coordinates of the feature points on the object on the imaging plane of the visible light camera and the coordinates of the feature points on the imaging plane of the infrared camera respectively;
and determining the position difference between the visible light camera and the infrared camera or the lens center positions of the visible light camera and the infrared camera and the positions of the visible light camera and the infrared camera on respective imaging planes, determining the positions of the paired feature points, and reconstructing the depth and the three-dimensional coordinates of the feature points according to the positions of the paired feature points.
Binocular depth calculation has the advantages that the larger the baseline between the two cameras, the longer the distance that can be measured, and that it can be applied both indoors and outdoors. Its disadvantages are that configuration and calibration are complicated, and that calculating parallax by finding corresponding feature points in the two images with methods such as SLAM (simultaneous localization and mapping, also called concurrent mapping and localization) consumes considerable computing resources, usually requiring acceleration by GPU (Graphics Processing Unit) or FPGA (Field Programmable Gate Array) devices.
The above steps combine the infrared camera and the visible light camera, giving full play to the advantages of infrared and visible light imaging in eye movement tracking. The invention can be applied where color is required, can accurately obtain boundary positions, and can accurately reconstruct the depth of each feature point. Applied to the depth reconstruction of facial feature points, it reduces the amount of SLAM-like computation, quickly obtains the distance from the face and the pupil to the camera, and realizes three-dimensional face tracking.
In implementation, determining the positions of the feature points includes calculating the coordinates of the feature points in a fixed image plane coordinate system for the visible light camera and the infrared camera. The obtained coordinates are the x-y coordinates of the feature points on the imaging element (CCD or CMOS). After the coordinates of the visible light camera's imaging plane and the infrared camera's imaging plane are expressed in the same fixed imaging plane coordinate system, that plane serves as the calculation plane, and subsequent depth reconstruction and three-dimensional coordinates are established on the basis of its two-dimensional coordinates. Note that since the relative positions of the visible light camera and the infrared camera are fixed at mounting time, the relative positions of their imaging planes are known.
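As an illustrative sketch only (the patent does not prescribe an implementation; the pixel pitch, principal point, offset values, and function names below are all assumptions), the conversion from sensor pixel indices to metric coordinates in a shared image plane coordinate system might look as follows in Python:

    import numpy as np

    def pixel_to_plane(u, v, pixel_pitch_mm, principal_point):
        # Convert sensor pixel indices (u, v) into metric x/y coordinates
        # on the imaging element (CCD/CMOS), relative to the optical axis.
        cx, cy = principal_point
        return np.array([(u - cx) * pixel_pitch_mm, (v - cy) * pixel_pitch_mm])

    def to_common_frame(xy_mm, camera_offset_mm):
        # Shift one camera's image plane coordinates into the shared fixed
        # coordinate system using the mounting offset, which is known because
        # the relative positions of the two cameras are fixed at installation.
        return xy_mm + np.asarray(camera_offset_mm)

    # Hypothetical values: the same feature point seen by both cameras.
    p_visible = pixel_to_plane(452, 300, 0.003, (320, 240))
    p_infrared = to_common_frame(pixel_to_plane(412, 300, 0.003, (320, 240)),
                                 camera_offset_mm=(60.0, 0.0))  # 60 mm baseline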
After conventional camera calibration and related steps, the depth of a target point (its distance to the camera) can be calculated through binocular imaging. For depth reconstruction, referring to fig. 2, Target is a target point (feature point) in space, f is the focal length, and OL and OR are the optical centers of the left and right cameras, whose optical axes are parallel. The small boxes on the lines connecting the target point to the camera optical centers are the imaging points of the target point on the left and right image planes, and XL and XR are the distances of these two imaging points from the left edge of the image on the left and right image planes. The formula for reconstructing the depth of a feature point from the positions of the paired feature points is therefore:
Z=b*f/(XL-XR)
wherein Z is the depth of the feature point, i.e. the distance from the feature point to the camera plane; b is the optical center distance between the visible light camera and the infrared camera; f is the focal length (to simplify the computational model, the same focal length is usually chosen for the visible light camera and the infrared camera); and XL and XR are the distances from the two paired imaging points of the feature point to the left edge of the image on the two image planes. It should be noted that the above formula is given by way of example; when it is applied to different scenes, the depth formula may be adjusted according to the actual geometric relationship of the cameras. However it is transformed, it serves the purpose of calculating depth, so any modification of the invention within this computational principle falls within the scope of the invention.
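As a minimal sketch of this computation (not code from the patent; all names and values are illustrative), the depth Z and the three-dimensional coordinates of a feature point can be recovered from a rectified, parallel-axis pair as follows:

    def reconstruct_point(x_l, y_l, x_r, b_mm, f_px, cx, cy):
        # Z = b*f/(XL - XR) for a rectified pair with parallel optical axes,
        # then back-project through the pinhole model to get X and Y.
        disparity = x_l - x_r
        if disparity <= 0:
            raise ValueError("non-positive disparity: point at infinity or bad match")
        z = b_mm * f_px / disparity      # depth in mm (f expressed in pixels)
        x = (x_l - cx) * z / f_px        # lateral offset from the left optical axis
        y = (y_l - cy) * z / f_px
        return x, y, z

    # Hypothetical values: 60 mm baseline, 800 px focal length, 40 px disparity.
    print(reconstruct_point(x_l=452.0, y_l=300.0, x_r=412.0,
                            b_mm=60.0, f_px=800.0, cx=320.0, cy=240.0))
    # -> (198.0, 90.0, 1200.0), i.e. the point is 1.2 m from the camera plane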
Referring to fig. 3, the present embodiment further provides an asymmetric multi-camera eye tracking method, including the following steps:
the depth reconstruction method of the asymmetric multi-camera is adopted to obtain the positions of the facial feature points of the human face in pairs, and reconstruct the depth of the facial feature points to obtain a three-dimensional image of the human face;
and acquiring the positions of the pupil center points of the human eyes in pairs, and reconstructing the depth of the pupil center points to acquire the depth and the three-dimensional coordinates of the human eyes.
When reconstructing the three-dimensional image of the face and the depth of the eyes, the two can be calculated separately. The three-dimensional image of the whole face and the depth of the eyes are obtained by separate calculations, and the positions of the facial feature points and of the eyes can then be cross-checked against each other when the three-dimensional model of the whole head is established.
Many tools are available for face detection and facial feature point detection, and both can be implemented under visible or infrared light, for example with the dlib toolkit among others. Facial feature points including the inner and outer corners of the left and right eyes and the left and right mouth corners can be detected; see fig. 4(a) and 4(b) for eye images under visible light and infrared imaging, respectively, and fig. 5(a) and 5(b) for the pupil + iris center position under visible light and the pupil center under infrared imaging, respectively. After images of the feature points are obtained in pairs, the depth reconstruction steps above yield the three-dimensional image of the face and the depth of the eyes, giving full play to the advantages of infrared and visible light imaging in eye movement tracking.
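For example, a minimal dlib-based sketch (the model file name and the standard 68-point landmark layout are assumptions based on common dlib usage, not requirements of the patent):

    import dlib

    detector = dlib.get_frontal_face_detector()
    # Pre-trained 68-point landmark model, downloaded separately from dlib.net.
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    img = dlib.load_rgb_image("face.jpg")  # works on visible light or infrared frames
    for face in detector(img, 1):
        shape = predictor(img, face)
        # In the 68-point layout, indices 36, 39, 42, 45 are the outer/inner
        # eye corners and 48, 54 are the mouth corners.
        corners = [shape.part(i) for i in (36, 39, 42, 45, 48, 54)]
        print([(pt.x, pt.y) for pt in corners])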
In this embodiment, the pupil center point is determined with the infrared camera, which yields more accurate localization.
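A minimal OpenCV sketch of one common way to localize the pupil center in an infrared eye image (the thresholding approach and all parameter values are assumptions; the patent does not prescribe an algorithm): under infrared illumination the pupil appears as the darkest blob, so it can be segmented and its center taken from a fitted ellipse.

    import cv2

    def pupil_center(eye_gray):
        # Estimate the pupil center in a grayscale infrared eye crop.
        blur = cv2.GaussianBlur(eye_gray, (7, 7), 0)
        # The pupil is the darkest region under infrared illumination.
        _, mask = cv2.threshold(blur, 40, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        pupil = max(contours, key=cv2.contourArea)  # largest dark blob
        if len(pupil) < 5:                          # fitEllipse needs >= 5 points
            return None
        (cx, cy), _axes, _angle = cv2.fitEllipse(pupil)
        return cx, cy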
As an alternative embodiment, the iris reflection point may also be determined by the infrared camera, and the spatial geometric model may then be established from the pupil center point combined with the iris reflection point, making the model more accurate. Meanwhile, the visible light camera can acquire the depth of the face (the distance between the face and the camera) while remaining available for other imaging functions, such as everyday selfies and photography, or face detection, face recognition, expression recognition, and heartbeat measurement using computer vision or machine learning methods. In this embodiment, the asymmetric multi-camera arrangement combining the infrared camera and the visible light camera therefore retains the everyday photographing function while achieving eye movement tracking more accurately.
In implementation, as shown in fig. 6, the refraction relation of light entering the eye from air can further be determined according to the depth of the eyes, the three-dimensional coordinates of the pupil center, and the positions of the infrared or visible light camera and the light source; a spatial geometric model is then established, and the optical axis direction of the eyeball is calculated from it. The spatial geometric model of fig. 7 shows a ray tracing diagram of the system and the eye, with all points represented as three-dimensional column vectors in a right-handed Cartesian world coordinate system (WCS).
A ray from a light source l is traced, which is reflected at a point q on the corneal surface such that the reflected ray passes through the camera center O and intersects the camera detector plane at a point u.
A ray from the pupil center p is traced, which is refracted at a point r on the corneal surface such that the refracted ray passes through the camera center O and intersects the camera detector plane at a point v.
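The refraction step can be written in vector form with Snell's law. A sketch under stated assumptions (unit input vectors; refractive indices n1 for air and n2 for the cornea/aqueous, for which a value near 1.336 is often used in such models but is not given by the patent):

    import numpy as np

    def refract(d, n, n1, n2):
        # Vector form of Snell's law: refract unit direction d at a surface
        # with unit normal n pointing toward the incident side.
        eta = n1 / n2
        cos_i = -np.dot(n, d)
        sin2_t = eta**2 * (1.0 - cos_i**2)
        if sin2_t > 1.0:
            return None  # total internal reflection; no refracted ray
        cos_t = np.sqrt(1.0 - sin2_t)
        return eta * d + (eta * cos_i - cos_t) * n

    # A ray entering the cornea from air at 20 degrees to the surface normal.
    d = np.array([0.0, -np.sin(np.radians(20)), -np.cos(np.radians(20))])
    n = np.array([0.0, 0.0, 1.0])           # outward unit normal at point r
    print(refract(d, n, n1=1.0, n2=1.336))  # refracted ray, bent toward the normal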
The optical axis of the eye passes through the pupil center p and the corneal curvature center c, so the optical axis in space can be reconstructed as the line defined by these two points; the distance between them is K = ||p − c||. Before eye tracking, a calibration is performed: it is well established that each person has a fixed deviation angle (the Beta angle) between the visual axis and the optical axis, and this angle varies from person to person, since eyeball geometry differs between individuals. A per-user calibration procedure is therefore required to obtain this fixed deviation. Specifically, the user is asked to look at points at known positions in space, and the deviation of the visual axis from the optical axis while fixating each known point is recorded to obtain the Beta angle. The visual axis is then determined from the calibrated Beta angle and the optical axis, making eye movement tracking more accurate. In this embodiment, the eyeball model and the fixed deviation (the value of the Beta angle) are calculated by a multi-point calibration method. Subsequently, from the points c and the pupil centers p at successive adjacent moments, the visual axis is computed to achieve eye movement tracking.
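As an illustrative sketch (the rotation parameterization, angle names, and numeric values are assumptions; the patent only states that a fixed per-user deviation is calibrated), the optical axis follows directly from p and c, and the calibrated offset is then applied to obtain the visual axis:

    import numpy as np

    def optical_axis(p, c):
        # Unit vector of the optical axis, from the corneal curvature center c
        # through the pupil center p; K = ||p - c|| is the user-specific distance.
        v = np.asarray(p, float) - np.asarray(c, float)
        return v / np.linalg.norm(v)

    def visual_axis(opt, beta_deg, alpha_deg):
        # Rotate the optical axis by the calibrated per-user offset
        # (vertical component beta, horizontal component alpha).
        b, a = np.radians(beta_deg), np.radians(alpha_deg)
        rot_x = np.array([[1, 0, 0],
                          [0, np.cos(b), -np.sin(b)],
                          [0, np.sin(b),  np.cos(b)]])
        rot_y = np.array([[ np.cos(a), 0, np.sin(a)],
                          [0, 1, 0],
                          [-np.sin(a), 0, np.cos(a)]])
        return rot_y @ rot_x @ opt

    axis = optical_axis(p=[1.0, 2.0, 30.0], c=[1.2, 2.4, 35.0])
    print(visual_axis(axis, beta_deg=1.5, alpha_deg=5.0))  # hypothetical calibration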
In this way, the pupil boundary and the pupil center point can be obtained more accurately, and the eye movement trajectory can be tracked better.
The invention also provides a depth reconstruction system for asymmetric multiple cameras, comprising:
the camera group comprises at least one visible light camera and at least one infrared camera which are fixed at different positions so as to respectively acquire images of an object;
the position calibration unit is used for determining the positions of the paired characteristic points according to the coordinates of the characteristic points on the object on the image plane of the visible light camera, the coordinates on the image plane of the infrared camera and the relative positions of the visible light camera and the infrared camera; the position calibration unit comprises a feature point coordinate calculation subunit, and the feature point coordinate calculation subunit is used for calculating and obtaining the coordinate of the feature point in a fixed image plane coordinate system of the visible light camera and the infrared camera.
And the depth reconstruction unit is used for reconstructing the depth and the three-dimensional coordinates of the characteristic points according to the positions of the paired characteristic points after the images of the characteristic points on the object are acquired in pairs through the position calibration unit and the positions of the characteristic points are acquired.
The depth reconstruction unit reconstructs the depth of the feature points according to the positions of the paired feature points, and the calculation formula is as follows:
Z=b*f/(XL-XR)
wherein Z is the depth of the feature point, i.e. the distance from the feature point to the camera plane; b is the optical center distance between the visible light camera and the infrared camera; f is the focal length; and XR and XL are the distances from the two paired imaging points of the feature point to the left edge of the image on the visible light camera image plane and the infrared camera image plane, respectively.
The present embodiment also provides an asymmetric multi-camera eye tracking system, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any of the above embodiments when executing the computer program.
In summary, the invention combines an infrared camera and a visible light camera into an asymmetric multi-camera arrangement for 3D face tracking and eye movement tracking. The advantages of infrared and visible light imaging in eye tracking are fully exploited: under infrared light the pupil can be accurately located and scleral reflection points formed, while the visible light camera both acquires the depth of the face (its distance to the camera) and remains available for other imaging functions, such as everyday selfies and photography, or face detection, face recognition, expression recognition, and heartbeat measurement using computer vision or machine learning methods. The method reduces SLAM-like computation, quickly obtains the distance from the face and the pupil to the camera, realizes three-dimensional face tracking, obtains the pupil boundary and pupil center point more accurately, and tracks the eye movement trajectory better.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its protection scope.

Claims (10)

1. A depth reconstruction method for asymmetric multiple cameras, characterized by comprising the following steps:
respectively acquiring images of an object through a camera group consisting of at least one visible light camera and at least one infrared camera which are fixed at different positions;
acquiring paired feature points in the two images, and determining the coordinates of the feature points on the object on the imaging plane of the visible light camera and the coordinates of the feature points on the imaging plane of the infrared camera respectively;
determining the central positions of the visible light camera and the infrared camera and the positions of the imaging planes of the visible light camera and the infrared camera respectively, determining the positions of paired feature points, and reconstructing the depth and the three-dimensional coordinates of the feature points according to the positions of the paired feature points.
2. The depth reconstruction method for asymmetric multiple cameras according to claim 1, wherein determining the positions of the paired feature points comprises: calculating the coordinates of the feature points in the respective image planes of the visible light camera and the infrared camera, and converting them into the same fixed coordinate system.
3. The asymmetric multi-camera depth reconstruction method according to claim 2, wherein the depth of the feature points is reconstructed according to the positions of the pair of feature points, and the calculation formula is as follows:
Z=b*f/(XL-XR)
wherein Z is the depth of the feature point, i.e. the distance from the feature point to the camera plane; b is the optical center distance between the visible light camera and the infrared camera; f is the focal length; and XR and XL are the distances from the two paired imaging points of the feature point to the left edge of the image on the visible light camera image plane and the infrared camera image plane, respectively.
4. An eye movement tracking method for asymmetric multiple cameras, characterized by comprising the following steps:
adopting the depth reconstruction method of the asymmetric multi-camera as claimed in any one of claims 1 to 3, obtaining the positions of the facial feature points of the human face in pairs, and reconstructing the depth and three-dimensional coordinates of the facial feature points to obtain a three-dimensional image of the human face;
and acquiring the positions of the pupil center points of the human eyes in pairs, and reconstructing the depth of the pupil center points to acquire the depth and the three-dimensional coordinates of the human eyes.
5. The eye movement tracking method for asymmetric multiple cameras according to claim 4, further comprising: establishing a spatial geometric model according to the depth of the human eye, the three-dimensional coordinates of the pupil center point, the positions of the infrared or visible light camera and the light source, and the refraction relation of light entering the eye from air; and calculating the optical axis direction of the eyeball according to the spatial geometric model;
determining the visual axis direction of the eyeball according to the optical axis direction and a preset individual calibration mode;
and acquiring the visual axis direction of the eyeballs at adjacent moments to realize eye movement tracking.
6. The asymmetric multi-camera eye tracking method according to claim 5, wherein the pupil center point is determined according to the infrared camera.
7. The eye movement tracking method for asymmetric multiple cameras according to claim 5, wherein the optical axis direction of the eyeball is the line connecting the pupil center p and the corneal curvature center c, and the distance between the pupil center and the corneal curvature center is K = ||p − c||.
8. An asymmetric multi-camera depth reconstruction system, comprising:
the camera group comprises at least one visible light camera and at least one infrared camera which are fixed at different positions so as to respectively acquire images of an object;
the position calibration unit is used for determining the positions of the paired characteristic points according to the coordinates of the characteristic points on the object on the image plane of the visible light camera, the coordinates on the image plane of the infrared camera and the relative positions of the visible light camera and the infrared camera;
and the depth reconstruction unit is used for reconstructing the depth and the three-dimensional coordinates of the characteristic points according to the positions of the paired characteristic points after the images of the characteristic points on the object are acquired in pairs through the position calibration unit and the positions of the characteristic points are acquired.
9. The asymmetric multi-camera depth reconstruction system as in claim 8, wherein the position calibration unit comprises a feature point coordinate calculation subunit, and the feature point coordinate calculation subunit is configured to calculate coordinates of the feature point in a fixed image plane coordinate system of the visible light camera and the infrared camera.
10. An asymmetric multi-camera eye tracking system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program performs the steps of the method of any of claims 4 to 7.
CN202010300631.0A 2020-04-16 2020-04-16 Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras Pending CN111524175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300631.0A CN111524175A (en) 2020-04-16 2020-04-16 Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010300631.0A CN111524175A (en) 2020-04-16 2020-04-16 Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras

Publications (1)

Publication Number Publication Date
CN111524175A 2020-08-11

Family

ID=71912005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300631.0A Pending CN111524175A (en) 2020-04-16 2020-04-16 Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras

Country Status (1)

Country Link
CN (1) CN111524175A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190115A (en) * 2021-04-25 2021-07-30 歌尔股份有限公司 Eyeball tracking method and virtual reality equipment
CN114143419A (en) * 2020-09-04 2022-03-04 聚晶半导体股份有限公司 Dual-sensor camera system and depth map calculation method thereof
CN115880348A (en) * 2023-02-27 2023-03-31 荣耀终端有限公司 Face depth determination method, electronic device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107773248A (en) * 2017-09-30 2018-03-09 优视眼动科技(北京)有限公司 Eye tracker and image processing method
WO2018076202A1 (en) * 2016-10-26 2018-05-03 中国科学院深圳先进技术研究院 Head-mounted display device that can perform eye tracking, and eye tracking method
US20200099912A1 (en) * 2018-09-26 2020-03-26 Snap Inc. Depth sculpturing of three-dimensional depth images utilizing two-dimensional input selection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018076202A1 (en) * 2016-10-26 2018-05-03 中国科学院深圳先进技术研究院 Head-mounted display device that can perform eye tracking, and eye tracking method
CN107773248A (en) * 2017-09-30 2018-03-09 优视眼动科技(北京)有限公司 Eye tracker and image processing method
US20200099912A1 (en) * 2018-09-26 2020-03-26 Snap Inc. Depth sculpturing of three-dimensional depth images utilizing two-dimensional input selection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周小龙; 汤帆扬; 管秋; 华敏: "Survey of gaze tracking techniques based on 3D human eye models" (基于3D人眼模型的视线跟踪技术综述) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143419A (en) * 2020-09-04 2022-03-04 聚晶半导体股份有限公司 Dual-sensor camera system and depth map calculation method thereof
CN114143419B (en) * 2020-09-04 2023-12-26 聚晶半导体股份有限公司 Dual-sensor camera system and depth map calculation method thereof
CN113190115A (en) * 2021-04-25 2021-07-30 歌尔股份有限公司 Eyeball tracking method and virtual reality equipment
WO2022227594A1 (en) * 2021-04-25 2022-11-03 歌尔股份有限公司 Eyeball tracking method and virtual reality device
CN113190115B (en) * 2021-04-25 2022-11-22 歌尔股份有限公司 Eyeball tracking method and virtual reality equipment
CN115880348A (en) * 2023-02-27 2023-03-31 荣耀终端有限公司 Face depth determination method, electronic device and storage medium
CN115880348B (en) * 2023-02-27 2023-10-24 荣耀终端有限公司 Face depth determining method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11867978B2 (en) Method and device for determining parameters for spectacle fitting
CN106168853B (en) A kind of free space wear-type gaze tracking system
Nishino et al. The world in an eye [eye image interpretation]
US9779512B2 (en) Automatic generation of virtual materials from real-world materials
WO2017211066A1 (en) Iris and pupil-based gaze estimation method for head-mounted device
Plopski et al. Corneal-imaging calibration for optical see-through head-mounted displays
Nishino et al. Corneal imaging system: Environment from eyes
EP3371781B1 (en) Systems and methods for generating and using three-dimensional images
US11294455B2 (en) Method and device for determining gaze placement, computer readable storage medium
US9292765B2 (en) Mapping glints to light sources
US20150029322A1 (en) Method and computations for calculating an optical axis vector of an imaged eye
US10775647B2 (en) Systems and methods for obtaining eyewear information
CN111524175A (en) Depth reconstruction and eye movement tracking method and system for asymmetric multiple cameras
JP6631951B2 (en) Eye gaze detection device and eye gaze detection method
US20180214022A1 (en) Computer-implemented method for detecting a corneal vertex
KR20210122271A (en) Eye tracking devices and methods
CA3085733A1 (en) System and method of obtaining fit and fabrication measurements for eyeglasses using simultaneous localization and mapping
JP2018099174A (en) Pupil detector and pupil detection method
WO2018154272A1 (en) Systems and methods for obtaining information about the face and eyes of a subject
Nagamatsu et al. Calibration-free gaze tracking using a binocular 3D eye model
WO2016142489A1 (en) Eye tracking using a depth sensor
US20220207919A1 (en) Methods, devices and systems for determining eye parameters
CN108537103B (en) Living body face detection method and device based on pupil axis measurement
JPH02224637A (en) System for detecting glance
Liu et al. 3D model-based gaze tracking via iris features with a single camera and a single light source

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination