WO2023272453A1 - Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle - Google Patents
Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle
- Publication number: WO2023272453A1 (application PCT/CN2021/102861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- gaze
- image
- line
- sight
- Prior art date
Classifications
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
(All under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; the first three also under G06N3/04—Architecture, e.g. interconnection topology.)
Definitions
- Gaze tracking is an important support for upper-level applications in the smart cockpit such as distraction detection, takeover level estimation, and gaze interaction. Because people differ in the external characteristics of the eyes and the internal structure of the eyeball, it is usually impossible to train a gaze tracking model that is accurate for everyone. At the same time, because of camera installation errors and other causes, directly using the line-of-sight angle output by the gaze tracking model incurs a certain loss of accuracy, resulting in inaccurate line-of-sight estimation. If the error of gaze estimation can be corrected, the user experience of upper-level applications based on gaze tracking can be effectively improved.
- The present application provides a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, which can effectively improve the accuracy of line-of-sight estimation for a specific user.
- A first aspect of the present application provides a line-of-sight calibration method, including: according to a first image that is captured by a first camera and includes the user's eyes, obtaining the three-dimensional position of the user's eyes and a first line-of-sight direction; according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, obtaining the user's gaze area in a second image, where the second image is captured by the second camera and includes the scene outside the vehicle seen by the user; according to the user's gaze area in the second image and the second image, obtaining the position of the user's gaze point in the second image; according to the position of the gaze point and the intrinsic parameters of the second camera, obtaining the three-dimensional position of the user's gaze point; and according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtaining the user's second line-of-sight direction, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's gaze estimation accuracy for a specific user, thereby yielding a user-level gaze tracking model.
- the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
- the gaze point of the user in the second image can be obtained efficiently, accurately and stably.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
- the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
- A second aspect of the present application provides a line-of-sight calibration method, including: in response to the user's gaze operation on a reference point in a display screen, obtaining the three-dimensional position of the user's gaze point; according to a first image that is captured by the first camera and includes the user's eyes, obtaining the three-dimensional position of the user's eyes; and according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtaining the user's second line-of-sight direction, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
- the display screen is an augmented reality head-up display.
- the method further includes: using the user's second gaze direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a small sample learning method.
- the third aspect of the present application provides a line of sight calibration device, including:
- the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
- the first line-of-sight determination unit is configured to obtain the first line-of-sight direction of the user according to the first image including the eyes of the user captured by the first camera;
- the gaze area unit is configured to obtain the user's gaze area in the second image according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera, where the second image is captured by the second camera and includes the scene outside the vehicle seen by the user;
- the gaze point calibration unit is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
- the second line-of-sight determination unit is configured to obtain a second line-of-sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, and the second line-of-sight direction is used as the calibrated line-of-sight direction.
- the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- the gaze area unit is configured to obtain the user's gaze area in the second image according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, thereby yielding a user-level gaze tracking model.
- the gaze point calibration unit is further configured to screen the gaze points according to the confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
- the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
- the gaze point position determination unit is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
- the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
- the second line of sight determination unit is configured to obtain the user's second line of sight direction according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
- the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
- the driver's line of sight calibration can be realized without affecting the safe driving of the driver.
- the device further includes: an optimization unit configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a small sample learning method.
- a fifth aspect of the present application provides a computing device, including:
- A sixth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, wherein, when the program instructions are executed by a computer, the computer executes the above line-of-sight calibration method.
- the seventh aspect of the present application provides a driver monitoring system, including:
- At least one memory stores program instructions, and when the program instructions are executed by the at least one processor, the at least one processor executes the line-of-sight calibration method of the first aspect above.
- In this way, the accuracy of line-of-sight estimation for users such as drivers in the vehicle cockpit scene can be effectively improved, thereby improving the user experience of the driver monitoring system and of upper-layer applications such as distraction detection, takeover level estimation, and gaze interaction in the smart cockpit.
- the eighth aspect of the present application provides a vehicle, including the above-mentioned driver monitoring system.
- Fig. 1 is a schematic diagram of an exemplary architecture of a system in an embodiment of the present application.
- Fig. 2 is a schematic diagram of the installation position of the sensor in an embodiment of the present application.
- Fig. 3 is a schematic flowchart of a line of sight calibration method in an embodiment of the present application.
- Fig. 5 is a schematic flow chart of eye three-dimensional position estimation in an embodiment of the present application.
- Fig. 6 is an example diagram of a cockpit scene applicable to the embodiment of the present application.
- FIG. 7 is a schematic diagram of the gaze area in the reference coordinate system in the scene in FIG. 6 .
- FIG. 8 is a schematic diagram of the gaze area in the second image in the scene in FIG. 6 .
- Fig. 9 is a schematic flowchart of determining the gaze area of the user in the second image in an embodiment of the present application.
- Fig. 10 is a projection example diagram between the gaze area in the reference coordinate system and the gaze area in the second image.
- Fig. 11 is a schematic structural diagram of a gaze point calibration model in an embodiment of the present application.
- Fig. 14 is a schematic diagram of the driver's line of sight calibration and model optimization process in the cockpit scene.
- Fig. 16 is a schematic diagram of an exemplary architecture of a system in another embodiment of the present application.
- Fig. 17 is a schematic flowchart of a line of sight calibration method in another embodiment of the present application.
- Fig. 18 is a schematic structural diagram of a line-of-sight calibration device in another embodiment of the present application.
- FIG. 19 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- Eye tracking/gaze tracking model: a machine learning model that can estimate the gaze direction or gaze point of human eyes from images containing human eyes or faces, for example, a neural network model.
- Driver Monitoring System (DMS): a system that monitors the state of the driver in the car based on image processing technology, voice processing technology, etc. It includes components installed in the cockpit such as an in-vehicle camera, a processor, and a fill light; the in-vehicle camera can capture images including the driver's face, head, and part of the torso (e.g., arms) (i.e., the DMS images herein).
- Time-of-flight (TOF) camera: a camera that emits light pulses toward a target object while recording the travel time of the reflected pulses, thereby calculating the distance between the pulse emitter and the target object and generating a 3D image of the target object.
- the 3D image includes the depth information of the target object and the information of the reflected light intensity.
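- Concretely, with c the speed of light and Δt the measured round-trip time of a pulse, the depth of the corresponding point is d = c · Δt / 2; the factor 1/2 accounts for the pulse traveling to the object and back.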
- Landmark algorithm: a technique for extracting facial feature points.
- The intrinsic parameters of a camera determine the projection relationship from three-dimensional space to the two-dimensional image and are related only to the camera itself.
- The intrinsic parameters can include the scale factors of the camera along the two image coordinate axes u and v, the principal point coordinates (x0, y0), and the axis skew parameter s; the u-axis scale factor is the ratio of the physical length of each pixel in the x direction of the image coordinate system to the camera focal length f, and the v-axis scale factor is the ratio of the pixel's physical length in the y direction to the focal length.
- The intrinsic parameters can also include the principal point coordinates relative to the imaging plane coordinate system, the axis skew parameter, and distortion parameters; the distortion parameters can include the camera's three radial distortion parameters and two tangential distortion parameters.
- the internal and external parameters of the camera can be obtained through Zhang Zhengyou calibration.
- The intrinsic and extrinsic parameters of the first camera and those of the second camera are all calibrated in the same world coordinate system.
- The imaging plane coordinate system, i.e., the image coordinate system, takes the center of the image plane as the coordinate origin, with the X-axis and Y-axis parallel to two perpendicular edges of the image plane; P(x, y) is usually used to represent a coordinate value in it. The image coordinate system expresses the position of a pixel in the image in physical units (for example, millimeters).
- The pixel coordinate system, i.e., the image coordinate system in pixels, takes the upper-left vertex of the image plane as the origin, with its X-axis and Y-axis parallel to those of the image coordinate system; p(u, v) is usually used to represent a coordinate value in it. The pixel coordinate system expresses the position of a pixel in the image in units of pixels.
- The coordinates in the pixel coordinate system and the coordinates in the camera coordinate system satisfy relationship (2): Zc · [u, v, 1]^T = K · [Xc, Yc, Zc]^T, where (u, v) are the image coordinates in units of pixels, (Xc, Yc, Zc) are the coordinates in the camera coordinate system, and K is the matrix representation of the camera intrinsic parameters.
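- As a minimal sketch of relationship (2) and its inverse (not code from the patent; the intrinsic values are illustrative and distortion is ignored):

```python
import numpy as np

# Assumed intrinsic matrix K: focal lengths fx, fy in pixels, principal point (cx, cy).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

def project(p_cam: np.ndarray) -> np.ndarray:
    """Relationship (2): map (Xc, Yc, Zc) in the camera frame to pixel (u, v)."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]          # divide by the depth Zc

def back_project(uv: np.ndarray, depth: float) -> np.ndarray:
    """Inverse of (2): recover (Xc, Yc, Zc) from (u, v) given the depth Zc."""
    return depth * (np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]))

uv = project(np.array([0.5, -0.2, 10.0]))
print(uv, back_project(uv, 10.0))    # round-trips to the original point
```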
- Few-shot learning means that, after a neural network has pre-learned a large number of samples of known categories, it can learn a new category quickly from only a small number of labeled samples.
- Meta-learning is an important branch of few-shot learning research.
- Its main idea is that, when the target task has few training samples, the neural network is first trained on a large number of few-shot tasks similar to the target few-shot task, so that the trained network has a good initialization for the target task; a small number of training samples of the target task are then used to adjust the trained network.
- Model-agnostic meta-learning (MAML) algorithm: a specific meta-learning algorithm. Its idea is to train the initialization parameters of a machine learning model so that, after one or more learning steps on a small amount of data from a new task, the model achieves better performance on that task.
- soft argmax: an algorithm or function that obtains the coordinates of key points from a heat map; it can be implemented as a neural network layer, and the layer that realizes soft argmax can be called a soft argmax layer.
- Head-up display (HUD): also known as a parallel display system, it can project important driving information such as speed, engine speed, battery level, and navigation onto the windshield in front of the driver, so that the driver can see these vehicle parameters and driving information through the windshield display area without lowering or turning the head.
- the first possible implementation is to collect a large amount of gaze data to train the gaze tracking model, deploy the trained gaze tracking model on the vehicle end, and the vehicle end uses the gaze tracking model to process the real-time collected images to finally obtain the user's line of sight.
- This implementation mainly has the following defect: there may be large individual differences between the samples used to train the gaze tracking model and the current user (for example, individual differences in the internal structure of the human eye), so the gaze tracking model matches the current user poorly, resulting in inaccurate estimation of the current user's line of sight.
- The second possible implementation is to display a specific image on a screen and calibrate the gaze tracking device through the interaction between the device's user and the specific image on the screen, obtaining parameters for that user and thereby improving the accuracy of the gaze tracking device for the user. This implementation mainly has the following defects: it relies on the active cooperation of the user, the operation is cumbersome, and improper human operation may introduce calibration errors, ultimately affecting the accuracy of the eye-tracking device for the user. Moreover, because it is difficult to deploy a sufficiently large display screen directly in front of the driver in the cockpit, this implementation is not suitable for the cockpit scene.
- In the embodiments of the present application, optimization samples including the user's second line-of-sight direction and the first image are also used to optimize the gaze tracking model through a few-shot learning method, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight, thereby obtaining a user-level gaze tracking model. This solves the problems of the gaze tracking model being difficult to optimize and of low line-of-sight estimation accuracy for some users.
- the embodiments of the present application may be applicable to any application scenario that requires real-time calibration or estimation of a person's gaze direction.
- The embodiments of the present application may be applicable to the calibration or estimation of the line of sight of drivers and/or passengers in the cockpit environment of vehicles such as cars, boats, and aircraft.
- the embodiments of the present application may also be applicable to other scenarios, for example, performing line-of-sight calibration or estimation on a person wearing wearable glasses or other devices.
- the embodiment of the present application may also be applied to other scenarios, which will not be listed one by one here.
- the first camera 110 is responsible for capturing the user's eye image (ie, the first image hereinafter).
- the first camera 110 may be an in-vehicle camera in the DMS, and the in-vehicle camera is used to photograph the driver in the cockpit.
- the in-vehicle camera is a DMS camera that can be installed near the A-pillar of the car (position 1 in Figure 2) or near the steering wheel, and the DMS camera is preferably a higher-resolution RGB camera.
- the human eye image (i.e., the first image hereinafter) generally refers to various types of images including human eyes, for example, a human face image, a bust image including a human face, and the like.
- the human eye image (that is, the first image hereinafter) may be a human face image.
- The second camera 120 is responsible for collecting a scene image (i.e., the second image below), which includes the scene outside the vehicle seen by the user; that is, the field of view of the second camera 120 and the field of view of the user at least partially overlap.
- the second camera 120 may be an exterior camera, and the exterior camera may be used to capture the scene in front of the vehicle seen by the driver.
- The camera outside the vehicle can be a front camera installed above the front windshield of the vehicle (position 2 in Figure 2), which can capture the scene in front of the vehicle, that is, the scene outside the vehicle seen by the driver.
- the front camera is preferably a TOF camera, which can collect depth images, so as to obtain the distance between the vehicle and the target object in front (for example, the object that the user is looking at) through the image.
- The image processing system 130 is an image processing system capable of processing DMS images and scene images. It can run the gaze tracking model to obtain the user's preliminary line-of-sight data (i.e., the first line-of-sight direction below), and use the preliminary line-of-sight data to perform the line-of-sight calibration method described below to obtain the user's calibrated line-of-sight data (i.e., the second line-of-sight direction below), thereby improving the accuracy of the user's line-of-sight data.
- The model optimization system 140 is responsible for optimizing the gaze tracking model: it can optimize the model using the user's calibrated line-of-sight data provided by the image processing system 130 and provide the optimized gaze tracking model back to the image processing system 130, thereby improving the accuracy of the gaze tracking model's estimate of the user's line of sight.
- the above exemplary system 100 may further include a model training system 150, which is responsible for training a gaze tracking model, which may be deployed in the cloud.
- the model optimization system 140 and the model training system 150 can be realized by the same system.
- the camera coordinate system of the first camera 110 may be a Cartesian coordinate system Xc1-Yc1-Zc1
- the camera coordinate system of the second camera 120 may be a Cartesian coordinate system Xc2-Yc2-Zc2
- the image coordinate system and pixel coordinate system of the first camera 110 and the second camera 120 are not shown in FIG. 2 .
- In this embodiment, the camera coordinate system of the first camera 110 is used as the reference coordinate system; unless otherwise stated, three-dimensional positions and line-of-sight directions below are expressed as coordinates and/or angles in the camera coordinate system of the first camera 110.
- the reference coordinate system can be freely selected according to various factors such as actual needs, specific application scenarios, and calculation complexity requirements, but is not limited thereto.
- the cockpit coordinate system of the vehicle may also be used as the reference coordinate system.
- FIG. 3 shows an exemplary flow of the line of sight calibration method in this embodiment.
- an exemplary line of sight calibration method in this embodiment may include the following steps:
- Step S301: according to a first image that is captured by the first camera 110 and includes the user's eyes, obtain the three-dimensional position of the user's eyes and the first line-of-sight direction;
- Step S302: according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera 110, and the extrinsic and intrinsic parameters of the second camera 120, obtain the user's gaze area in the second image, where the second image is captured by the second camera 120 and includes the scene outside the vehicle seen by the user;
- Step S303: according to the user's gaze area in the second image and the second image, obtain the position of the user's gaze point in the second image;
- Step S304: according to the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120, obtain the three-dimensional position of the user's gaze point;
- Step S305: according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtain the user's second line-of-sight direction, which is used as the calibrated line-of-sight direction.
- The line-of-sight calibration method of this embodiment can use the second image to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and further improving the user experience of upper-layer applications based on gaze tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- the eye-tracking model can be trained by the model training system 150 deployed on the cloud and provided to the image processing system 130 deployed on the user's vehicle.
- The image processing system 130 runs the gaze tracking model on the first image including the user's eyes to obtain the user's first line-of-sight direction.
- the gaze direction may be represented by a viewing angle and/or a gaze vector in a reference coordinate system.
- the angle of view may be the angle between the line of sight and the axis of the eyes, and the intersection of the line of sight and the axis of the eyes is the three-dimensional position of the user's eyes.
- the sight vector is a direction vector starting from the position of the eye in the reference coordinate system and ending at the position of the gaze point in the reference coordinate system.
- The direction vector can be determined from the three-dimensional coordinates of the eye reference point in the reference coordinate system and the three-dimensional coordinates of the gaze point in the reference coordinate system.
- The fixation point refers to the point at which the user's eyes are gazing. Taking the cockpit scene as an example, the driver's gaze point is the specific position the driver's eyes are looking at. A gaze point can be represented by its position in space; in this embodiment, the three-dimensional position of the gaze point is represented by the gaze point's three-dimensional coordinates in the reference coordinate system.
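- As a minimal illustration of this representation (a sketch, not the patent's code), the second line-of-sight direction obtained in step S305 below reduces to normalizing the vector from the eye position to the gaze point, both expressed in the reference coordinate system:

```python
import numpy as np

def gaze_direction(eye_xyz: np.ndarray, gaze_point_xyz: np.ndarray) -> np.ndarray:
    """Unit sight vector from the eye reference point to the gaze point,
    both given in the reference (first-camera) coordinate system."""
    v = gaze_point_xyz - eye_xyz
    return v / np.linalg.norm(v)

# e.g., eyes at the reference-frame origin, gaze point about 10 m ahead
print(gaze_direction(np.zeros(3), np.array([0.5, -0.2, 10.0])))
```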
- Fig. 5 shows an exemplary process of eye three-dimensional position estimation.
- The exemplary process of eye three-dimensional position estimation may include: step S501, processing the first image with a face detection algorithm and a facial feature point detection algorithm to obtain the positions of the user's facial feature points in the first image; step S502, solving PnP with the positions of the user's facial feature points in the first image and a pre-acquired standard 3D face model to obtain the 3D coordinates of the user's facial feature points in the reference coordinate system; step S503, extracting the 3D coordinates of the user's eye reference points from the 3D coordinates of the user's facial feature points in the reference coordinate system as the 3D coordinates of the user's eyes.
- FIG. 5 is only an example, and is not intended to limit a specific implementation manner of eye three-dimensional position estimation in this embodiment.
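- A minimal sketch of steps S501 to S503 using OpenCV's generic PnP solver; the six-point face model, the intrinsics, and the eye-point indices are illustrative assumptions, and the synthetic landmarks stand in for a real detector's output:

```python
import cv2
import numpy as np

# Assumed standard 3D face model points (meters, model frame) -- illustrative only.
face_model_3d = np.array([
    [ 0.00,  0.00,  0.00],   # nose tip
    [ 0.00, -0.06, -0.02],   # chin (coarse)
    [-0.03,  0.03, -0.02],   # left eye corner
    [ 0.03,  0.03, -0.02],   # right eye corner
    [-0.02, -0.03, -0.02],   # left mouth corner
    [ 0.02, -0.03, -0.02],   # right mouth corner
])

K1 = np.array([[800.0, 0.0, 640.0],
               [0.0, 800.0, 360.0],
               [0.0, 0.0, 1.0]])    # assumed intrinsics of the first camera
dist1 = np.zeros(5)                 # assume negligible lens distortion

# Synthetic 2D landmarks standing in for step S501's detector output.
rvec_true = np.array([0.05, -0.1, 0.0])
tvec_true = np.array([0.02, 0.01, 0.6])   # face about 0.6 m from the camera
landmarks_2d, _ = cv2.projectPoints(face_model_3d, rvec_true, tvec_true, K1, dist1)

# Step S502: PnP recovers the face pose in the first camera's frame.
ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, K1, dist1)
R, _ = cv2.Rodrigues(rvec)

# Step S503: transform the model's eye reference points into the reference frame.
eyes_xyz = (R @ face_model_3d[2:4].T).T + tvec.ravel()
print(eyes_xyz)   # 3D eye positions in the first camera's coordinate system
```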
- The camera perspective projection model can be used to determine the gaze area of the user in the second image (hereinafter, the "gaze area in the second image" is simply referred to as "the second gaze area").
- the camera perspective projection model may be a pinhole imaging model or a nonlinear perspective projection model.
- Step S302 may include: according to the three-dimensional position of the user's eyes, the first line-of-sight direction, the extrinsic parameters of the first camera 110, the intrinsic and extrinsic parameters of the second camera 120, and the accuracy of the gaze tracking model, obtaining the gaze area of the user in the second image.
- the error caused by the limitation of the accuracy of the line-of-sight tracking model can be eliminated in the finally obtained second line-of-sight direction.
- Fig. 6 shows a scene where a driver (not shown in the figure) in the cockpit environment looks at pedestrians in the crosswalk in front of the vehicle.
- Fig. 9 shows an exemplary flow of determining the second gaze area of the user.
- the process of obtaining the user's second gaze area may include the following steps:
- Step S901 determine the gaze area S1 of the user in the reference coordinate system according to the three-dimensional position of the user's eyes and the first line of sight direction.
- The user's line of sight ON in the reference coordinate system is obtained.
- The average accuracy of the gaze tracking model is expressed as φ, where φ represents the error of the viewing angle; the lower the accuracy of the gaze tracking model, the greater the value of φ.
- The line-of-sight angle θ can accordingly be widened to the interval [θ − φ, θ + φ], and the cone bounded by the line of sight with angle θ − φ and the line of sight with angle θ + φ is used as the user's gaze area S1 in the reference coordinate system.
- Fig. 7 shows a visualization of the driver's gaze area S1 in the reference coordinate system for the scene shown in Fig. 6, in which:
- O represents the three-dimensional position of the eyes
- the solid line with arrows represents the first line of sight direction ON
- θ represents the first line-of-sight angle
- φ represents the average accuracy value of the gaze tracking model
- the dotted cone represents the user's gaze area S1 in the reference coordinate system.
- Fig. 8 shows the second image of the scene of Fig. 6 captured by the second camera; only the part the driver is looking at is shown, content of the scene in Fig. 6 irrelevant to this embodiment is omitted, and the user's second gaze area Q is marked.
- The projection process in this step can be realized by formula (1) and relationship (2). Specifically, first, based on the extrinsic parameters of the first camera 110 and of the second camera 120, the gaze area S1 is transformed into the camera coordinate system of the second camera 120 according to formula (1) to obtain the gaze area S2; then, based on the intrinsic parameters of the second camera 120, the gaze area S2 is projected into the pixel coordinate system of the second camera 120 according to relationship (2) to obtain the user's second gaze area Q.
- the extrinsics of the first camera 110 and the extrinsics of the second camera 120 are calibrated in the same world coordinate system.
- the gaze area S1 is projected on the imaging surface of the second camera 120 as a quadrilateral second gaze area Q through the external parameters of the first camera 110 and the internal parameters and external parameters of the second camera 120.
- The lower the accuracy of the gaze tracking model, the larger the value of φ, the larger the cone angle of the user's gaze area S1 in the reference coordinate system, and the larger the width of the quadrilateral second gaze area Q.
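- A minimal sketch of this two-step projection, assuming both cameras' extrinsics (a rotation R and translation t mapping world to camera coordinates) are calibrated in a shared world frame; all numeric values are illustrative:

```python
import numpy as np

# Assumed extrinsics: world -> camera maps x_cam = R @ x_world + t.
R1, t1 = np.eye(3), np.zeros(3)                  # first (in-cabin) camera
R2, t2 = np.eye(3), np.array([0.0, -0.5, -0.1])  # second (exterior) camera
K2 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])                 # assumed second-camera intrinsics

def cam1_to_cam2(p_c1: np.ndarray) -> np.ndarray:
    """Formula (1): re-express a point from camera-1 coordinates in camera-2."""
    p_world = R1.T @ (p_c1 - t1)
    return R2 @ p_world + t2

def to_pixels(p_c2: np.ndarray) -> np.ndarray:
    """Relationship (2): project a camera-2 point onto the second image."""
    uvw = K2 @ p_c2
    return uvw[:2] / uvw[2]

# Sample one edge ray of the gaze cone at a few depths and project each point;
# the projected points bound the quadrilateral gaze area Q in the second image.
for depth in (5.0, 10.0, 20.0):
    p = cam1_to_cam2(np.array([0.2, 0.0, 1.0]) * depth)
    print(depth, to_pixels(p))
```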
- FIG. 10 shows an exemplary projection diagram of a line of sight OX.
- The projection onto the imaging plane of the second camera 120 of points x at different depths along the line of sight OX is O'X'.
- The mapping point of the origin of the human line of sight is O'.
- The first line-of-sight direction L is mapped to line of sight L'.
- FIGS. 7 to 10 are only examples, and the method for obtaining the second attention region in the embodiment of the present application is not limited thereto.
- In step S303, the gaze point of the user in the second image can be obtained based on the second gaze area and the second image through a pre-trained gaze point calibration model (herein, "the gaze point in the second image" is referred to as "the second gaze point" for short).
- the gaze point calibration model can be any machine learning model available for image processing. Considering the high precision and good stability of the neural network, in the embodiment of the present application, the gaze point calibration model is preferably a neural network model.
- the decoding network outputs the heat map Fig3, and the gray value of each pixel in the heat map Fig3 indicates the probability that the corresponding pixel is the fixation point.
- The heat map Fig3 is processed by the soft-argmax normalization layer to obtain the position of the gaze point in the second image, that is, the coordinates (x, y) of the gaze point's corresponding pixel in the second image.
- A line of sight has one fixation point, and each fixation point may correspond to one or more pixels in the second image.
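- A minimal sketch of the soft-argmax step, assuming the heat map is a 2D array of per-pixel gaze probabilities; the sharpness parameter beta is an assumption, as the patent does not specify one:

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, beta: float = 10.0) -> tuple[float, float]:
    """Differentiable (x, y) estimate of the gaze point from a heat map."""
    h, w = heatmap.shape
    weights = np.exp(beta * heatmap).ravel()
    weights /= weights.sum()             # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    x = float(weights @ xs.ravel())      # probability-weighted x coordinate
    y = float(weights @ ys.ravel())      # probability-weighted y coordinate
    return x, y

hm = np.zeros((9, 9))
hm[6, 2] = 1.0                           # toy heat map: hot pixel at row 6, col 2
print(soft_argmax_2d(hm))                # close to (2.0, 6.0)
```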
- the fixation point calibration model can be obtained by pre-training.
- During training, a scene image and its corresponding gaze-area grayscale image (in which the extent of the gaze area is a set value) are used as a sample, and the real gaze area of the sample is known.
- The ResNet part and the soft-argmax normalization layer are trained at the same time but with different loss functions.
- the embodiment of this application does not limit the specific loss function used.
- the loss function of the ResNet part can be binary cross entropy (BCE loss)
- the loss function of the soft-argmax standard layer can be mean square error (MSE loss).
- the decoding network in the ResNet part can use pixel-level binary cross-entropy as a loss function, and the expression is shown in the following formula (3).
- y_i is the label of whether pixel i is the fixation point: 1 when it is the fixation point, and 0 when it is not.
- p(y_i) is the probability that pixel i is the fixation point in the heat map Fig3 output by the decoding network.
- N is the total number of pixels of the second image Fig1, that is, the total number of pixels of the heat map Fig3.
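- Formula (3) itself is not reproduced legibly in this text; from the variable definitions above it is the standard pixel-wise binary cross-entropy, reconstructed here:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr) \Bigr] \qquad (3)$$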
- In step S304, there are many specific implementations for obtaining the three-dimensional position of the user's gaze point according to the position of the gaze point in the second image and the intrinsic parameters of the second camera 120.
- The three-dimensional position of the gaze point is the gaze point's three-dimensional coordinates in the reference coordinate system (the camera coordinate system of the first camera 110). It can be understood that any algorithm that obtains the spatial position of a point from its position in an image can be applied to step S304.
- step S304 it is preferable to obtain the three-dimensional position of the gaze point through inverse perspective transformation.
- The Z-axis coordinate of the gaze point can be obtained simply from the depth of the second gaze point; combined with the position of the second gaze point obtained in step S303, i.e., the pixel coordinates (u, v), a simple inverse perspective transformation yields the three-dimensional coordinates of the gaze point in the reference coordinate system, that is, the three-dimensional position of the gaze point.
- Step S304 may include: step S3041, using the second image to obtain the depth of the second gaze point based on a monocular depth estimation algorithm, the depth being the distance h of the gaze point relative to the second camera 120; the distance h estimates the Z-axis coordinate Zc2 of the gaze point in the camera coordinate system of the second camera; step S3042, according to the position of the second gaze point, i.e., the pixel coordinates (u, v), and the gaze point's Z-axis coordinate in the camera coordinate system of the second camera, and based on the intrinsic and extrinsic parameters of the second camera 120 and the extrinsic parameters of the first camera 110, obtaining the three-dimensional coordinates of the gaze point in the reference coordinate system.
- The distance h of each pixel in the second image relative to the second camera 120 can be computed from the second image by a monocular depth estimation algorithm such as FastDepth, and the distance h of the second gaze point relative to the second camera 120 can then be extracted from the result according to the position of the second gaze point, that is, its pixel coordinates.
- various applicable algorithms may be used for depth estimation.
- In step S3042, according to the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate Zc2, and the intrinsic parameters of the second camera 120, the coordinates (Xc2, Yc2, Zc2) of the gaze point in the camera coordinate system of the second camera 120 are obtained; then, based on the extrinsic parameters of the second camera 120 and of the first camera 110, the coordinates (Xc1, Yc1, Zc1) of the gaze point in the camera coordinate system of the first camera 110 are derived from (Xc2, Yc2, Zc2) by formula (1), and (Xc1, Yc1, Zc1) is the three-dimensional position of the gaze point.
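- A minimal sketch of steps S3041 and S3042, with the monocular depth estimator stubbed out and the same illustrative intrinsics and extrinsics assumed as in the earlier sketches:

```python
import numpy as np

K2 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])                 # assumed second-camera intrinsics
R1, t1 = np.eye(3), np.zeros(3)                  # assumed extrinsics (shared world frame)
R2, t2 = np.eye(3), np.array([0.0, -0.5, -0.1])

def estimate_depth(u: int, v: int) -> float:
    """Stub for a monocular depth estimator such as FastDepth (step S3041)."""
    return 12.0   # pretend the gaze point is 12 m from the second camera

def gaze_point_3d(u: float, v: float) -> np.ndarray:
    """Step S3042: back-project (u, v) with depth Zc2, then move to camera 1."""
    zc2 = estimate_depth(int(u), int(v))
    p_c2 = zc2 * (np.linalg.inv(K2) @ np.array([u, v, 1.0]))  # inverse of (2)
    p_world = R2.T @ (p_c2 - t2)                              # formula (1)
    return R1 @ p_world + t1                                  # reference frame

print(gaze_point_3d(700.0, 400.0))
```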
- a line of sight has one fixation point, but due to the limitation of accuracy, multiple fixation points may be obtained corresponding to the same line of sight.
- The gaze points can be screened according to the confidence of the user's gaze point in the second image, so that the subsequent steps are performed only on the screened gaze points to obtain the second line-of-sight direction; this ensures the accuracy of the second line-of-sight direction while reducing the amount of calculation and improving processing efficiency.
- the screening of gaze points can be performed before step S304, and can also be performed after step S304.
- the gaze point calibration model also provides the probability value of the second gaze point, and the confidence degree of the second gaze point can be determined by the probability value.
- The heat map provided by the gaze point calibration model includes a probability value for the second gaze point, which represents the probability that the second gaze point is the real gaze point; a higher probability value indicates a higher possibility that the corresponding second gaze point is the real gaze point. The probability value may be used directly as the confidence of the second gaze point, or a value proportional to the probability may be used. The confidence of the second gaze point can therefore be obtained without separate calculation, improving processing efficiency and reducing computational complexity.
- Only gaze points whose confidence exceeds a preset first confidence threshold (for example, 0.9), or the gaze point with the relatively highest confidence, may be selected. If multiple gaze points exceed the first confidence threshold or tie for the relatively highest confidence, one or more gaze points may be randomly selected from them; alternatively, all of these gaze points may be retained.
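- A small sketch of this screening rule (the 0.9 threshold comes from the example above; the data layout is an assumption):

```python
# Candidate gaze points as (u, v, confidence) triples -- illustrative values.
candidates = [(700, 400, 0.95), (705, 398, 0.92), (640, 360, 0.40)]

FIRST_CONF_THRESHOLD = 0.9   # example threshold from the text

kept = [c for c in candidates if c[2] > FIRST_CONF_THRESHOLD]
if not kept:                 # fall back to the relatively highest confidence
    kept = [max(candidates, key=lambda c: c[2])]
print(kept)
```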
- the line-of-sight calibration in steps S301 to S305 in the embodiment of the present application may be performed by the image processing system 130 in the system 100 .
- FIG. 13 shows an exemplary implementation process of eye-tracking model optimization in step S306.
- The exemplary process may include: step S3061, the image processing system 130 stores the second line-of-sight direction and its corresponding first image as an optimization sample of the user in the user's sample library; the sample library can be associated with user information (for example, user identity information) to facilitate queries, and is deployed in the model optimization system 140.
- In step S3062, the model optimization system 140 uses the newly added optimization samples in the user's sample library to optimize, based on the few-shot learning method, the user's gaze tracking model obtained in the previous optimization.
- step S3062 can be performed regularly or when the number of newly added optimization samples reaches a certain number or other preset conditions are met.
- The sample library update of step S3061 can be performed in real time.
- the user's optimized sample may be selectively uploaded to improve the quality of the optimized sample, reduce unnecessary optimization operations, and reduce hardware loss caused by model optimization.
- the second gaze direction may be screened according to the confidence of the second gaze point, and only optimized samples formed by the screened second gaze direction and its corresponding first image are uploaded.
- The screening of the second line-of-sight direction may include, but is not limited to: 1) selecting second line-of-sight directions for which the confidence of the second gaze point is greater than a preset second confidence threshold (for example, 0.95); 2) selecting the second line-of-sight direction for which the confidence of the second gaze point is relatively highest.
- the few-shot learning method can be implemented by any algorithm that can optimize the gaze tracking model with a small number of samples.
- For example, the user's optimization samples can be used to optimize the gaze tracking model with the MAML algorithm, thereby realizing optimization of the gaze tracking model based on the few-shot learning method.
- a gaze tracking model that is more suitable for a specific user's individual characteristics can be obtained through a small number of samples, with a small amount of data and low computational complexity, which is conducive to reducing hardware loss and hardware cost.
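- A toy sketch of the MAML-style inner/outer loop under strong simplifications (a linear gaze model and plain NumPy gradients); it illustrates only the idea of learning an initialization that adapts from few samples, not the patent's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.01):
    """One first-order MAML outer update over a batch of user-specific tasks."""
    outer_grad = np.zeros_like(w)
    for X_sup, y_sup, X_qry, y_qry in tasks:
        w_adapted = w - inner_lr * loss_grad(w, X_sup, y_sup)  # inner adaptation
        outer_grad += loss_grad(w_adapted, X_qry, y_qry)       # query-loss gradient
    return w - outer_lr * outer_grad / len(tasks)

# Toy "users": each task maps 4 eye features to a gaze angle with its own offset.
def make_task():
    w_true = np.array([0.5, -0.2, 0.1, 0.3]) + 0.1 * rng.normal(size=4)
    X = rng.normal(size=(16, 4))
    return X[:8], X[:8] @ w_true, X[8:], X[8:] @ w_true

w = np.zeros(4)
for _ in range(200):
    w = maml_step(w, [make_task() for _ in range(4)])
# w is now an initialization that adapts to a new user from few samples.
```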
- FIG. 14 illustrates an exemplary process flow for the system 100 to perform line of sight calibration and model optimization in a cockpit environment.
- The processing flow may include: step S1401, the in-vehicle camera of vehicle G captures a DMS image (i.e., the first image) of driver A in the cockpit, the DMS image including driver A's face; the image processing system 130 at the vehicle end of vehicle G then runs the gaze tracking model to infer the initial line-of-sight direction (i.e., the first line-of-sight direction).
- Fig. 15 shows an exemplary structure of a sight calibration device 1500 provided in this embodiment.
- the line of sight calibration device 1500 of this embodiment may include:
- the eye position determination unit 1501 is configured to obtain the three-dimensional position of the user's eyes according to the first image collected by the first camera including the user's eyes;
- the gaze point calibration unit 1504 is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
- the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye, and the second line of sight direction is used as a calibrated line of sight direction.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- The gaze calibration device further includes: an optimization unit 1507 configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and to optimize the gaze tracking model based on a few-shot learning method.
- the gaze point calibration unit 1504 can also be configured to filter the gaze point according to the confidence of the user's gaze point in the second image; and/or, the optimization unit 1507 is also configured to filter the gaze point according to the user's The confidence of the gaze point in the second image filters the second gaze direction.
- the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence level is determined by the probability value.
- FIG. 16 shows an exemplary architecture of a system 1600 applicable to this embodiment.
- The exemplary system 1600 of this embodiment is basically the same as the system 100 of Embodiment 1; the differences are that the second camera 120 is an optional component in system 1600, and that system 1600 includes a display screen 160, which can be deployed at the vehicle end and realized through existing display components in the vehicle-end equipment.
- The other parts of the system 1600 in this embodiment, namely the first camera 110, the image processing system 130, the model optimization system 140, and the model training system 150, have basically the same functions as the corresponding parts of the system 100 in Embodiment 1 and will not be repeated here.
- This embodiment uses a display screen 160 whose positional relationship with the first camera 110 (that is, the in-vehicle camera) has been calibrated, and relies on the user gazing at a reference point on the display screen 160 to realize calibration of the user's line of sight and to obtain optimization samples, on which the gaze tracking model performs few-shot learning to improve its accuracy.
- Fig. 17 shows an exemplary flow of the line of sight calibration method in this embodiment.
- the line of sight calibration method of this embodiment may include the following steps:
- Step S1701 in response to the user's gazing operation on the reference point on the display screen 160, obtain the three-dimensional position of the user's gazing point;
- Step S1701 may also include: controlling the display screen 160 to provide a line-of-sight calibration interface to the user, the interface including a visual prompt reminding the user to gaze at the reference point, so that the user performs the corresponding gaze operation according to the visual prompt.
- the specific form of the line-of-sight calibration interface is not limited by this embodiment.
- the gazing operation may be any operation related to the user gazing at the reference point on the display screen 160, and the embodiment of the present application does not limit the specific implementation or expression of the gazing operation.
- the gaze operation may include inputting confirmation information in the gaze calibration interface while the user gazes at a reference point in the gaze calibration interface.
- the display screen 160 may be, but not limited to, an AR-HUD of a vehicle, a dashboard of a vehicle, a portable electronic device of a user, or others.
- the line of sight calibration in the cockpit scene is mainly aimed at the driver or the co-pilot. Therefore, in order to ensure that the line of sight calibration does not affect safe driving, the display screen 160 is preferably an AR-HUD.
- the three-dimensional coordinates of each reference point on the display screen 160 in the camera coordinate system of the first camera 110 may be pre-calibrated through the positional relationship between the display screen 160 and the first camera 110 . In this way, if the user gazes at a reference point, the reference point is the user's gaze point, and the three-dimensional coordinates of the reference point in the camera coordinate system of the first camera 110 are the three-dimensional position of the user's gaze point.
- The specific implementation of this step is the same as that of obtaining the three-dimensional position of the eyes in step S301 of the first embodiment, and will not be repeated here.
- Step S1703 according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, the second line of sight direction of the user is obtained.
- The specific implementation of this step is the same as that of step S305 in the first embodiment, and will not be repeated here.
- The line-of-sight calibration method of this embodiment can obtain the three-dimensional position of the user's gaze point by using the reference point and, at the same time, obtain the three-dimensional position of the user's eyes from the first image, thereby obtaining a second line-of-sight direction with high accuracy. It can be seen that the line-of-sight calibration method of this embodiment not only effectively improves the accuracy of user line-of-sight estimation, but is also simple to operate, low in computational complexity, and high in processing efficiency, and is suitable for the cockpit environment.
- the gaze calibration method of this embodiment may further include: step S1704, using the user's second gaze direction and the first image as the user's optimization samples, and optimizing the gaze tracking model based on the small sample learning method.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, thereby yielding a user-level gaze tracking model.
- the specific implementation manner of this step is the same as that of step S306 in the first embodiment, and will not be repeated here. Since the three-dimensional position of the gaze point in this step is obtained through calibration, its accuracy is relatively high. Therefore, there is no need to screen the second gaze direction before step S1704 in this embodiment.
- Fig. 18 shows an exemplary structure of a sight calibration device 1800 provided in this embodiment.
- the line of sight calibration device 1800 of this embodiment may include:
- the gaze point position determination unit 1801 is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
- the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye.
- the display screen is an augmented reality head-up display.
- the device further includes: an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.
- an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.
- FIG. 19 is a schematic structural diagram of a computing device 1900 provided by an embodiment of the present application.
- the computing device 1900 includes: a processor 1910 and a memory 1920 .
- the computing device 1900 may also include a communication interface 1930 and a bus 1940 . It should be understood that the communication interface 1930 in the computing device 1900 shown in FIG. 19 can be used to communicate with other devices.
- the memory 1920 and the communication interface 1930 can be connected to the processor 1910 through the bus 1940 .
- although only one line is used to represent the bus in FIG. 19 , this does not mean that there is only one bus or only one type of bus.
- the processor 1910 may be connected to the memory 1920 .
- the memory 1920 may be used to store program code and data. The memory 1920 may therefore be a storage unit inside the processor 1910, an external storage unit independent of the processor 1910, or a combination of a storage unit inside the processor 1910 and an external storage unit independent of the processor 1910.
- the processor 1910 may be a central processing unit (central processing unit, CPU).
- the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the processor 1910 may use one or more integrated circuits to execute related programs, so as to implement the technical solutions provided by the embodiments of the present application.
- the memory 1920 may include read-only memory and random-access memory, and provides instructions and data to the processor 1910 .
- a portion of processor 1910 may also include non-volatile random access memory.
- processor 1910 may also store device type information.
- the processor 1910 executes the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the above-mentioned embodiments.
- the computing device 1900 may correspond to a corresponding body executing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 1900 are respectively intended to realize the corresponding processes of the methods in the embodiments, which are not repeated here for brevity.
- the embodiment of the present application also provides a driver monitoring system, which includes the above-mentioned first camera 110 , second camera 120 and computing device 1900 .
- the first camera 110 is configured to capture a first image including the eyes of the user;
- the second camera 120 is configured to capture a second image including the scene outside the vehicle seen by the user;
- both the first camera 110 and the second camera 120 can communicate with the computing device 1900.
- the processor 1910 uses the first image provided by the first camera 110 and the second image provided by the second camera 120, executing the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the first embodiment above.
- the driver monitoring system may further include a display screen configured to display reference points to the user.
- the processor 1910 uses the first image provided by the first camera 110 and the three-dimensional position of the reference point displayed on the display screen, executing the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the second embodiment above.
- the driver monitoring system may also include a cloud server, which can be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as the user's optimization samples, optimize the gaze tracking model based on the few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight.
- a cloud server, which can be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as the user's optimization samples, optimize the gaze tracking model based on the few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight.
- the architecture of the driver monitoring system can refer to the system shown in FIG. 1 in the first embodiment and the system shown in FIG. 16 in the second embodiment.
- the image processing system 130 can be deployed in the computing device 1900, and the above-mentioned model optimization system 140 can be deployed in the cloud server.
- an embodiment of the present application also provides a vehicle, which may include the above-mentioned driver monitoring system.
- the vehicle is a motor vehicle, which may be, but is not limited to, a sport utility vehicle, a bus, a large truck, or a passenger vehicle among various commercial vehicles; it may also be, but is not limited to, various watercraft, aircraft, and the like; and it may further be, but is not limited to, a hybrid vehicle, an electric vehicle, a plug-in hybrid electric vehicle, a hydrogen-powered vehicle, or another alternative-fuel vehicle.
- the hybrid vehicle can be any vehicle with two or more power sources, for example, a vehicle with gasoline and electric power sources.
- the disclosed systems, devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it is used to execute a line-of-sight calibration method, the method including at least one of the solutions described in the above-mentioned embodiments.
- the computer storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more leads, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
Abstract
The present application relates to the field of intelligent driving, and discloses a gaze calibration method and apparatus, a device, a computer-readable storage medium, a system, and a vehicle. In the present application, an eye three-dimensional position of a user is obtained by means of a first image comprising eyes of the user, a gaze point three-dimensional position of the user is obtained by means of a calibration position on a display screen or a second image comprising a scene outside a vehicle as seen by the user, and a high-accuracy second gaze direction is obtained according to the eye three-dimensional position and gaze point three-dimensional position of the user, such that the accuracy of gaze estimation of the user is effectively improved, and the present application may be suitable for a cockpit scene. In addition, in the application, an optimization sample comprising the second gaze direction of the user and the first image thereof is also used to optimize a gaze tracking model by means of a few‑shot learning method, such that the accuracy of gaze estimation of the gaze tracking model for a specific user is improved, and a gaze tracking model having high accuracy for the specific user can be obtained.
Description
The present application relates to the field of intelligent driving, and in particular to a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle.

Gaze tracking is an important support for upper-layer applications such as distraction detection, takeover level estimation, and gaze interaction in the smart cockpit. Owing to differences between people in the external features of the eyes and the internal structure of the eyeball, it is usually impossible to train a gaze tracking model that is accurate for everyone. At the same time, due to camera installation errors and other causes, directly using the line-of-sight angle output by a gaze tracking model incurs a certain loss of accuracy, resulting in inaccurate line-of-sight estimation. If the error of line-of-sight estimation can be corrected, the user experience of upper-layer applications based on gaze tracking can be effectively improved.
Summary of the Invention
In view of the above problems in the related art, the present application provides a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, which can effectively improve the accuracy of line-of-sight estimation for a specific user.

To achieve the above object, a first aspect of the present application provides a line-of-sight calibration method, including: obtaining a three-dimensional position of a user's eyes and a first line-of-sight direction according to a first image that is captured by a first camera and includes the user's eyes; obtaining a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and including the scene outside the vehicle seen by the user; obtaining the position of the user's gaze point in the second image according to the user's gaze area in the second image and the second image; obtaining a three-dimensional position of the user's gaze point according to the position of the gaze point and the intrinsic parameters of the second camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.

In this way, the second image can be used to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and, in turn, the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the first aspect, the first line-of-sight direction is extracted from the first image based on a gaze tracking model.

In this way, the user's initial line-of-sight direction can be obtained efficiently.

As a possible implementation of the first aspect, obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera includes: obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.

In this way, the error caused by the limited accuracy of the gaze tracking model can be eliminated from the finally obtained second line-of-sight direction.

As a possible implementation of the first aspect, the method further includes: using the user's second line-of-sight direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.

As a possible implementation of the first aspect, the method further includes: screening the gaze point or the second line-of-sight direction according to the confidence of the user's gaze point in the second image.

In this way, the amount of computation can be reduced, and the processing efficiency and line-of-sight calibration accuracy can be improved.

As a possible implementation of the first aspect, the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the user's gaze area in the second image and the second image.

In this way, the user's gaze point in the second image can be obtained efficiently, accurately, and stably.

As a possible implementation of the first aspect, the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.

In this way, the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
A second aspect of the present application provides a line-of-sight calibration method, including:

obtaining a three-dimensional position of a user's gaze point in response to the user's gaze operation on a reference point in a display screen;

obtaining a three-dimensional position of the user's eyes according to a first image that is captured by a first camera and includes the user's eyes; and

obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.

In this way, the accuracy of the user's line-of-sight data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the second aspect, the display screen is an augmented reality head-up display.

In this way, the driver's line of sight can be calibrated without affecting safe driving.

As a possible implementation of the second aspect, the method further includes: using the user's second line-of-sight direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.
A third aspect of the present application provides a line-of-sight calibration apparatus, including:

an eye position determination unit configured to obtain a three-dimensional position of a user's eyes according to a first image that is captured by a first camera and includes the user's eyes;

a first line-of-sight determination unit configured to obtain a first line-of-sight direction of the user according to the first image captured by the first camera and including the user's eyes;

a gaze area unit configured to obtain a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and including the scene outside the vehicle seen by the user;

a gaze point calibration unit configured to obtain the position of the user's gaze point in the second image according to the user's gaze area in the second image and the second image;

a gaze point conversion unit configured to obtain a three-dimensional position of the user's gaze point according to the position of the gaze point and the intrinsic parameters of the second camera; and

a second line-of-sight determination unit configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.

In this way, the second image can be used to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and, in turn, the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the third aspect, the first line-of-sight direction is extracted from the first image based on a gaze tracking model.

In this way, the user's initial line-of-sight direction can be obtained efficiently.

As a possible implementation of the third aspect, the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.

In this way, the error caused by the limited accuracy of the gaze tracking model can be eliminated from the finally obtained second line-of-sight direction.

As a possible implementation of the third aspect, the apparatus further includes: an optimization unit configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.

As a possible implementation of the third aspect, the gaze point calibration unit is further configured to screen the gaze point according to the confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.

In this way, the amount of computation can be reduced, and the processing efficiency and line-of-sight calibration accuracy can be improved.

As a possible implementation of the third aspect, the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the user's gaze area in the second image and the second image.

In this way, the user's gaze point in the second image can be obtained efficiently, accurately, and stably.

As a possible implementation of the third aspect, the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.

In this way, the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
A fourth aspect of the present application provides a line-of-sight calibration apparatus, including:

a gaze point position determination unit configured to obtain a three-dimensional position of a user's gaze point in response to the user's gaze operation on a reference point in a display screen;

an eye position determination unit configured to obtain a three-dimensional position of the user's eyes according to a first image that is captured by a first camera and includes the user's eyes; and

a second line-of-sight determination unit configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.

In this way, the accuracy of the user's line-of-sight data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the fourth aspect, the display screen is the display screen of an augmented reality head-up display system.

In this way, the driver's line of sight can be calibrated without affecting safe driving.

As a possible implementation of the fourth aspect, the apparatus further includes: an optimization unit configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.
A fifth aspect of the present application provides a computing device, including:

at least one processor; and

at least one memory storing program instructions that, when executed by the at least one processor, cause the at least one processor to execute the above line-of-sight calibration method.

A sixth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, wherein the program instructions, when executed by a computer, cause the computer to execute the above line-of-sight calibration method.
A seventh aspect of the present application provides a driver monitoring system, including:

a first camera configured to capture a first image including a user's eyes;

a second camera configured to capture a second image including the scene outside the vehicle seen by the user;

at least one processor; and

at least one memory storing program instructions that, when executed by the at least one processor, cause the at least one processor to execute the line-of-sight calibration method of the first aspect above.

In this way, the accuracy of line-of-sight estimation for users such as the driver in the vehicle cockpit can be effectively improved, thereby improving the user experience of the driver monitoring system and of upper-layer applications in the smart cockpit such as distraction detection, takeover level estimation, and gaze interaction.

As a possible implementation of the seventh aspect, the driver monitoring system further includes a display screen configured to display a reference point to the user; and the program instructions, when executed by the at least one processor, cause the at least one processor to execute the line-of-sight calibration method of the second aspect.

An eighth aspect of the present application provides a vehicle including the above driver monitoring system.

In this way, the accuracy of line-of-sight estimation for users such as the driver in the vehicle cockpit can be effectively improved, thereby improving the user experience of upper-layer applications in the vehicle cockpit such as distraction detection, takeover level estimation, and gaze interaction, and ultimately improving the safety of intelligent driving.
In the embodiments of the present application, the three-dimensional position of the user's eyes is obtained from a first image including the user's eyes, and the three-dimensional position of the user's gaze point is obtained from a calibrated position on a display screen or from a second image including the scene outside the vehicle seen by the user; a second line-of-sight direction with high accuracy is then obtained, which effectively improves the accuracy of user line-of-sight estimation and is applicable to cockpit scenarios. In addition, the second line-of-sight direction and the first image can also serve as personalized samples of the user for optimizing the gaze tracking model, so that a gaze tracking model specific to the user can be obtained, thereby solving the difficulty of optimizing gaze tracking models and the problem of low line-of-sight estimation accuracy for some users.

These and other aspects of the invention will be more clearly understood from the following description of the embodiment(s).

Various features of the present invention and the relationships between them are further described below with reference to the accompanying drawings. The drawings are exemplary; some features are not shown to actual scale, and some drawings may omit features that are customary in the field to which the present application pertains and are not essential to the present application, or may additionally show features that are not essential to the present application. The combinations of features shown in the drawings are not intended to limit the present application. In addition, throughout this specification, the same reference numerals denote the same content. The drawings are described as follows:
Fig. 1 is a schematic diagram of an exemplary architecture of a system in an embodiment of the present application.

Fig. 2 is a schematic diagram of the installation positions of sensors in an embodiment of the present application.

Fig. 3 is a schematic flowchart of a line-of-sight calibration method in an embodiment of the present application.

Fig. 4 is an example diagram of eye reference points in an embodiment of the present application.

Fig. 5 is a schematic flowchart of eye three-dimensional position estimation in an embodiment of the present application.

Fig. 6 is an example diagram of a cockpit scene to which embodiments of the present application are applicable.

Fig. 7 is a schematic diagram of the gaze area in the reference coordinate system in the scene of Fig. 6.

Fig. 8 is a schematic diagram of the gaze area in the second image in the scene of Fig. 6.

Fig. 9 is a schematic flowchart of determining the user's gaze area in the second image in an embodiment of the present application.

Fig. 10 is an example diagram of the projection between the gaze area in the reference coordinate system and the gaze area in the second image.

Fig. 11 is a schematic structural diagram of a gaze point calibration model in an embodiment of the present application.

Fig. 12 is a schematic flowchart of obtaining the three-dimensional position of the gaze point in an embodiment of the present application.

Fig. 13 is a schematic diagram of an exemplary flow of optimizing a gaze tracking model in an embodiment of the present application.

Fig. 14 is a schematic diagram of the driver's line-of-sight calibration and model optimization process in a cockpit scene.

Fig. 15 is a schematic structural diagram of a line-of-sight calibration apparatus in an embodiment of the present application.

Fig. 16 is a schematic diagram of an exemplary architecture of a system in another embodiment of the present application.

Fig. 17 is a schematic flowchart of a line-of-sight calibration method in another embodiment of the present application.

Fig. 18 is a schematic structural diagram of a line-of-sight calibration apparatus in another embodiment of the present application.

Fig. 19 is a schematic structural diagram of a computing device according to an embodiment of the present application.
The words "first", "second", "third", and the like, or similar terms such as module A, module B, and module C, in the specification and claims are used only to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, specific orders or sequences can be interchanged so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein.

In the following description, the reference numerals denoting steps, such as S110 and S120, do not necessarily indicate that the steps are executed in that order; where permitted, the order of steps may be interchanged, or steps may be executed simultaneously.

The term "comprising" used in the specification and claims should not be interpreted as being restricted to what is listed thereafter; it does not exclude other elements or steps. It should therefore be interpreted as specifying the presence of the mentioned features, integers, steps, or components, without excluding the presence or addition of one or more other features, integers, steps, or components, or groups thereof. Therefore, the expression "a device comprising means A and B" should not be limited to a device consisting of components A and B only.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places in this specification do not necessarily all refer to the same embodiment, but they may. Furthermore, in one or more embodiments, the particular features, structures, or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. In case of any inconsistency, the meaning stated in this specification, or the meaning derived from the content recorded in this specification, shall prevail. In addition, the terms used herein are only for the purpose of describing the embodiments of the application and are not intended to limit the application.

In order to accurately describe the technical content of this application and to accurately understand the present invention, the terms used in this specification are explained or defined as follows before the specific embodiments are described.
Gaze tracking (eye tracking): a technology for measuring the gaze direction or gaze point of human eyes.

Gaze tracking model (eye tracking model): a machine learning model, for example a neural network model, that can estimate the gaze direction or gaze point of human eyes from an image containing the eyes or face.

Driver Monitoring System (DMS): a system that monitors the state of the driver in the vehicle based on image processing technology, speech processing technology, and the like. It includes components such as an in-vehicle camera installed in the cockpit, a processor, and a fill light; the in-vehicle camera can capture images including the driver's face, head, and part of the torso (for example, the arms), referred to herein as DMS images.

Exterior camera, also called the front camera: used to capture images containing the scene outside the vehicle (especially the scene in front of the vehicle), which include the scene outside the vehicle seen by the driver.

Color (RGB) camera: images an object in color by sensing the natural light or near-infrared light reflected from the object.
Time-of-flight (TOF) camera: emits light pulses toward a target object while recording the round-trip time of the reflected pulses, from which the distance between the light pulse emitter and the target object is calculated; a 3D image of the target object is generated on this basis, including the depth information of the target object and the reflected light intensity.
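For illustration only, the depth reported by a TOF camera can be back-projected into a 3D point in the camera frame using the camera intrinsics; the intrinsic matrix K below is an assumed placeholder, not a calibrated value from this application:

```python
import numpy as np

K = np.array([[800.0,   0.0, 640.0],   # fx, skew, cx (illustrative intrinsics)
              [  0.0, 800.0, 360.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

def back_project(u, v, depth):
    """Recover the 3D point (in the camera frame) for pixel (u, v) whose
    measured depth along the optical axis is `depth` (meters)."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])
```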
PnP (Perspective-n-Point): the problem of computing the pose of a camera or object from the projection relationship between N feature points in the world coordinate system and the corresponding N image points in the image coordinate system. Solving PnP means: given n matching pairs of 3D reference points {c1, c2, ..., cn} and their 2D projections {u1, u2, ..., un} on the camera image, with the coordinates of the 3D reference points in the world coordinate system, the coordinates of the 2D points in the image coordinate system, and the camera intrinsics K known, find the pose transformation {R|t} between the world coordinate system and the camera coordinate system, where R is the rotation matrix and t is the translation vector.
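As a hedged sketch, the PnP problem defined above can be solved with OpenCV's solvePnP; the 3D-2D correspondences and intrinsics below are placeholder values for illustration:

```python
import numpy as np
import cv2

# n >= 4 known 3D reference points in the world frame and their 2D
# projections in the image (placeholder values).
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]],
                         dtype=np.float64)
image_points = np.array([[320, 240], [420, 242], [318, 340], [422, 338]],
                        dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix R and translation tvec give {R|t}
```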
Landmark algorithm: a technique for extracting facial feature points.

World coordinate system, also called the measurement coordinate system or objective coordinate system: a three-dimensional rectangular coordinate system used as a reference to describe the three-dimensional positions of the camera and the object to be measured. It is the absolute coordinate system of the objective three-dimensional world, and its coordinate values are usually denoted Pw(Xw, Yw, Zw).

Camera coordinate system: a three-dimensional rectangular coordinate system whose origin is the optical center of the camera, whose Z axis is the camera's optical axis, and whose X and Y axes are parallel to the X and Y axes of the image coordinate system; its coordinate values are usually denoted Pc(Xc, Yc, Zc).

Camera extrinsic parameters: determine the relative positional relationship between the camera coordinate system and the world coordinate system; they are the parameters for converting from the world coordinate system to the camera coordinate system, including the rotation matrix R and the translation vector T. Taking pinhole imaging as an example, the camera extrinsics, world coordinates, and camera coordinates satisfy relation (1): Pc = R·Pw + T, where Pw is the world coordinate, Pc is the camera coordinate, T = (Tx, Ty, Tz) is the translation vector, and R = R(α, β, γ) is the rotation matrix, with rotation angles γ, β, and α around the Z, Y, and X axes of the camera coordinate system, respectively. These six parameters, namely α, β, γ, Tx, Ty, and Tz, constitute the camera's extrinsic parameters.
Camera intrinsic parameters: determine the projection relationship from three-dimensional space to the two-dimensional image and are related only to the camera. Taking the pinhole imaging model as an example and ignoring image distortion, the intrinsics may include the camera's scale factors along the two coordinate axes u and v of the image coordinate system, the principal point coordinates (x0, y0) relative to the imaging plane coordinate system, and the axis skew parameter s. The u-axis scale factor is the ratio of the physical length of each pixel in the x direction of the image coordinate system to the camera focal length f, and the v-axis scale factor is the ratio of the physical length of each pixel in the y direction of the image coordinate system to the camera focal length. If image distortion is considered, the intrinsics may include the camera's scale factors along the u and v axes of the image coordinate system, the principal point coordinates relative to the imaging plane coordinate system, the axis skew parameter, and distortion parameters; the distortion parameters may include the camera's three radial distortion parameters and two tangential distortion parameters.
The camera's intrinsic and extrinsic parameters can be obtained through Zhang Zhengyou's calibration method. In the embodiments of the present application, the intrinsics and extrinsics of the first camera and those of the second camera are all calibrated in the same world coordinate system.
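A minimal sketch of Zhang Zhengyou's checkerboard-based calibration using OpenCV follows; the board size and image paths are assumptions for illustration only:

```python
import numpy as np
import cv2
import glob

pattern = (9, 6)  # inner-corner grid of an assumed checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K, distortion coefficients, and per-view
# extrinsics (rotation and translation vectors).
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```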
Imaging plane coordinate system, i.e., the image coordinate system: its origin is the center of the image plane, and its X and Y axes are parallel to two perpendicular sides of the image plane; coordinate values are usually denoted P(x, y). The image coordinate system expresses the position of a pixel in the image in physical units (for example, millimeters).

Pixel coordinate system: the image coordinate system in pixel units, with the top-left corner of the image plane as the origin and the X and Y axes parallel to the X and Y axes of the image coordinate system; coordinate values are usually denoted p(u, v). The pixel coordinate system expresses the position of a pixel in the image in units of pixels.
Taking the pinhole camera model as an example, the coordinate values of the pixel coordinate system and the camera coordinate system satisfy relation (2):

Zc · [u, v, 1]^T = K · [Xc, Yc, Zc]^T    (2)

where (u, v) are the coordinates in the pixel-based image coordinate system, (Xc, Yc, Zc) are the coordinates in the camera coordinate system, and K is the matrix representation of the camera intrinsics.
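Combining relations (1) and (2), a minimal sketch of projecting a world point to pixel coordinates under the pinhole model (function and variable names are illustrative):

```python
import numpy as np

def project(Pw, R, T, K):
    """World point -> pixel coordinates under the pinhole model.

    Relation (1): Pc = R @ Pw + T
    Relation (2): Zc * [u, v, 1]^T = K @ [Xc, Yc, Zc]^T
    """
    Pc = R @ np.asarray(Pw, dtype=float) + T
    uv1 = K @ Pc / Pc[2]    # divide by depth Zc to normalize
    return uv1[:2]          # (u, v)
```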
Few-shot learning: after a neural network has pre-learned from a large number of samples of known categories, it can learn a new category quickly from only a small number of labeled samples.

Meta-learning: an important branch of few-shot learning research. The main idea is that, when the target task has few training samples, the neural network is trained using a large number of few-shot tasks similar to the target few-shot task, so that the trained network has good initial values on the target task; the trained network is then adjusted using the small number of training samples of the target few-shot task.
Model-agnostic meta-learning (MAML) algorithm: a specific meta-learning algorithm whose idea is to train the initialization parameters of a machine learning model so that the model achieves good performance after one or more parameter updates on a small amount of data from a new task.
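A hedged sketch of one MAML meta-update is shown below; it assumes PyTorch 2.x (for torch.func.functional_call), and the task format, loss, and learning rates are illustrative assumptions rather than a prescribed implementation:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch 2.x

def maml_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    """One MAML meta-update over a batch of few-shot tasks.

    Each task is a (support_x, support_y, query_x, query_y) tuple.
    Adapted parameters are computed by differentiable gradient descent
    on the support set; the meta-gradient comes from the query-set loss.
    """
    meta_loss = 0.0
    for sx, sy, qx, qy in tasks:
        params = dict(model.named_parameters())
        for _ in range(inner_steps):
            inner_loss = F.mse_loss(functional_call(model, params, (sx,)), sy)
            grads = torch.autograd.grad(
                inner_loss, list(params.values()), create_graph=True)
            params = {name: p - inner_lr * g
                      for (name, p), g in zip(params.items(), grads)}
        meta_loss = meta_loss + F.mse_loss(
            functional_call(model, params, (qx,)), qy)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```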
soft argmax: an algorithm or function that obtains keypoint coordinates from a heatmap. It can be implemented as a neural network layer, and the layer implementing soft argmax may be called a soft argmax layer.
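As an illustrative implementation (the temperature parameter beta is an assumption), a differentiable 2D soft argmax over a heatmap can be written as:

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmap, beta=100.0):
    """Differentiable keypoint coordinates from an (H, W) heatmap.

    A softmax over all pixels turns the heatmap into a probability map;
    the expected pixel coordinates under that distribution are the
    soft-argmax estimate of the keypoint location.
    """
    h, w = heatmap.shape
    probs = F.softmax(beta * heatmap.reshape(-1), dim=0).reshape(h, w)
    ys = torch.arange(h, dtype=heatmap.dtype)
    xs = torch.arange(w, dtype=heatmap.dtype)
    y = (probs.sum(dim=1) * ys).sum()  # expectation over rows
    x = (probs.sum(dim=0) * xs).sum()  # expectation over columns
    return x, y
```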
Binary cross-entropy: a type of loss function.
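For reference, with a ground-truth label y in {0, 1} and a predicted probability p, the binary cross-entropy is commonly written as:

```latex
\mathrm{BCE}(y, p) = -\bigl(\, y \log p + (1 - y)\log(1 - p) \,\bigr)
```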
Monocular depth estimation (Fast Depth): a method of estimating, from a single RGB image taken from one viewpoint, the distance of each pixel in the image relative to the capturing camera.

Head-up display (HUD) system, also called a parallel display system: projects important driving information such as speed, engine RPM, battery level, and navigation onto the windshield in front of the driver, so that the driver can see vehicle parameters and driving information through the windshield display area without lowering or turning the head.

Augmented reality head-up display (AR-HUD) system: precisely combines image information with actual traffic conditions through a specially designed internal optical system, projecting information such as tire pressure, speed, and RPM onto the front windshield so that the driver can view vehicle-related information while driving without looking down.
A first possible implementation is to collect a large amount of gaze data to train a gaze tracking model and deploy the trained model on the vehicle, where it processes images captured in real time to finally obtain the user's line-of-sight direction. The main drawback of this implementation is that there may be large individual differences between the samples used to train the gaze tracking model and the current user (for example, individual differences in the internal structure of the human eye), so the model does not match the current user well, resulting in inaccurate line-of-sight estimation for the current user.

A second possible implementation is to display a specific image on a screen and calibrate the gaze tracking device through the user's interaction with that image, obtaining parameters specific to the user and thereby improving the device's accuracy for that user. The main drawbacks of this implementation are that it relies on the user's active cooperation, the operation is cumbersome, and improper manual operation may cause calibration errors, ultimately affecting the accuracy of the gaze tracking device for the user. Moreover, since it is difficult to deploy a sufficiently large display screen directly in front of the driver in a vehicle cockpit, this implementation is not suitable for the cockpit scene.

A third possible implementation is that, while a picture is being played on a screen, a basic gaze tracking model first predicts a preliminary line-of-sight direction, a preliminary gaze area on the screen is obtained from that direction, and the predicted gaze area is then corrected by combining the preliminary gaze area with the picture being played, thereby improving line-of-sight estimation accuracy. The main drawbacks of this implementation are that it applies only to scenes where the user is watching the screen, and its accuracy is low when the gaze point changes constantly.

All of the above implementations suffer from inaccurate line-of-sight estimation in the cockpit scene. In view of this, the embodiments of the present application propose a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, in which the three-dimensional position of the user's eyes is obtained from a first image including the user's eyes, the three-dimensional position of the user's gaze point is obtained from a calibrated position on a display screen or from a second image including the scene outside the vehicle seen by the user, and a second line-of-sight direction with high accuracy is obtained from the three-dimensional eye position and the three-dimensional gaze point position. The embodiments can therefore effectively improve the accuracy of user line-of-sight estimation and are applicable to cockpit scenarios. In addition, the embodiments of the present application also use optimization samples containing the user's second line-of-sight direction and the corresponding first image to optimize the gaze tracking model through a few-shot learning method, improving the model's line-of-sight estimation accuracy for the user, thereby obtaining a user-level gaze tracking model and solving the problems of difficult gaze tracking model optimization and low line-of-sight estimation accuracy for some users.
The embodiments of the present application are applicable to any application scenario that requires real-time calibration or estimation of a person's line-of-sight direction. In some examples, the embodiments are applicable to line-of-sight calibration or estimation for drivers and/or passengers in the cockpit environment of vehicles such as motor vehicles, boats, and aircraft. In other examples, the embodiments are also applicable to other scenarios, for example, line-of-sight calibration or estimation for a person wearing devices such as wearable glasses. Of course, the embodiments of the present application can also be applied to other scenarios, which are not listed one by one here.
[Embodiment 1]
The system to which this embodiment applies is first described by way of example.
Fig. 1 is a schematic architecture diagram of an exemplary system 100 of this embodiment in a cockpit environment. As shown in Fig. 1, the exemplary system 100 may include a first camera 110, a second camera 120, an image processing system 130, and a model optimization system 140.
The first camera 110 captures images of the user's eyes (i.e., the first image described below). As shown in Fig. 1, taking the cockpit scenario as an example, the first camera 110 may be an in-vehicle camera of a driver monitoring system (DMS) that photographs the driver in the cockpit. Taking the driver as an example and referring to Fig. 2, the in-vehicle camera may be a DMS camera mounted on the A-pillar (position ① in Fig. 2) or near the steering wheel, preferably a relatively high-resolution RGB camera. Here, an eye image (i.e., the first image below) broadly refers to any type of image containing human eyes, for example a face image or a half-body image that includes the face. In some embodiments, to obtain other user information along with the eye position from the first image while keeping the image data volume small, the eye image (i.e., the first image below) may be a face image.
The second camera 120 captures a scene image (i.e., the second image described below) containing the scene outside the vehicle as seen by the user; in other words, the field of view of the second camera 120 at least partially overlaps the user's field of view. As shown in Fig. 2, taking the cockpit scenario and the driver as an example, the second camera 120 may be an exterior camera that photographs the scene in front of the vehicle as seen by the driver. In the example of Fig. 2, the exterior camera may be a front camera mounted above the front windshield (position ② in Fig. 2), which captures the scene ahead of the vehicle, i.e., the exterior scene the driver sees. The front camera is preferably a TOF camera, which can capture depth images and thus makes it easy to obtain, from the image, the distance between the vehicle and a target object ahead (for example, the object the user is gazing at).
The image processing system 130 can process DMS images and scene images. It can run a gaze tracking model to obtain the user's preliminary gaze data (the first gaze direction below) and use that data to perform the gaze calibration method described below, obtaining the user's calibrated gaze data (the second gaze direction below) and thereby improving the accuracy of the user's gaze data.
The model optimization system 140 is responsible for optimizing the gaze tracking model. It can optimize the model using the user's calibrated gaze data provided by the image processing system 130 and supply the optimized model back to the image processing system 130, improving the gaze tracking model's estimation accuracy for that user.
In practice, the first camera 110, the second camera 120, and the image processing system 130 can all be deployed on the vehicle side, i.e., in the vehicle. The model optimization system 140 can be deployed on the vehicle side and/or in the cloud as required. The image processing system 130 and the model optimization system 140 may communicate over a network.
In some embodiments, the exemplary system 100 may further include a model training system 150, which trains the gaze tracking model and may be deployed in the cloud. In practice, the model optimization system 140 and the model training system 150 can be implemented by the same system.
As shown in Fig. 2, the camera coordinate system of the first camera 110 may be a Cartesian coordinate system Xc1-Yc1-Zc1, and the camera coordinate system of the second camera 120 may be a Cartesian coordinate system Xc2-Yc2-Zc2; the image coordinate systems and pixel coordinate systems of the first camera 110 and the second camera 120 are not shown in Fig. 2. In this embodiment, to make it easy to optimize the gaze tracking model with the second gaze direction obtained by calibration, the camera coordinate system of the first camera 110 is used as the reference coordinate system, and the gaze direction, the three-dimensional gaze point position, and the three-dimensional eye position can all be expressed as coordinates and/or angles in that coordinate system. In a specific application, the reference coordinate system can be chosen freely according to factors such as actual requirements, the application scenario, and computational-complexity constraints; it is not limited to this choice. For example, the vehicle's cockpit coordinate system may also serve as the reference coordinate system.
The gaze calibration method of this embodiment is described in detail below.
Fig. 3 shows an exemplary flow of the gaze calibration method of this embodiment. As shown in Fig. 3, an exemplary gaze calibration method of this embodiment may include the following steps:
Step S301: obtain the user's three-dimensional eye position and first gaze direction from a first image containing the user's eyes captured by the first camera 110;
Step S302: obtain the user's gaze region in a second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, and the extrinsic and intrinsic parameters of the second camera 120, the second image being captured by the second camera 120 and containing the scene outside the vehicle as seen by the user;
Step S303: obtain the position of the user's gaze point in the second image from the user's gaze region in the second image and the second image itself;
Step S304: obtain the user's three-dimensional gaze point position from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120;
Step S305: obtain the user's second gaze direction from the three-dimensional gaze point position and the three-dimensional eye position, the second gaze direction serving as the calibrated gaze direction.
The gaze calibration method of this embodiment can use the second image to calibrate the user's gaze direction and obtain a more accurate second gaze direction, effectively improving the accuracy of the user's gaze data and, in turn, the user experience of upper-layer applications built on gaze tracking.
The first gaze direction is extracted from the first image by a gaze tracking model. Taking the system 100 as an example, the gaze tracking model can be trained by the model training system 150 deployed in the cloud and provided to the image processing system 130 deployed on the user's vehicle; the image processing system 130 runs the model on the first image containing the user's eyes to obtain the user's first gaze direction.
The three-dimensional eye position can be expressed as the coordinates of a pre-selected eye reference point in the reference coordinate system. In at least some embodiments, the eye reference point can be selected according to the needs of the application scenario, how the gaze direction will be used, computational-complexity requirements, hardware capabilities, and the user's own needs. Fig. 4 shows examples of eye reference points, which may include, without limitation, one or more of the midpoint O between the centers of the two eyes, the left-eye center O1, and the right-eye center O2. Here, an eye center may be the pupil center, the eyeball center, the corneal center, or another position of the eye, chosen freely as needed.
For a user in a cockpit, the distance from the gaze point to the eyes is much larger than the distance between the two eyes, so the midpoint O between the eye centers can be selected as the eye reference point; this reduces the data volume and computational complexity and improves processing efficiency without affecting gaze estimation accuracy. If the second gaze direction is to be used to optimize the gaze tracking model and the user expects higher model accuracy, the left-eye center O1 and the right-eye center O2 can be selected as eye reference points.
The gaze direction can be expressed as a viewing angle and/or a gaze vector in the reference coordinate system. The viewing angle may be the angle between the line of sight and the eye axis, with the intersection of the line of sight and the eye axis being the user's three-dimensional eye position. The gaze vector is a direction vector whose origin is the eye position in the reference coordinate system and whose endpoint is the gaze point position in that system; it can contain the three-dimensional coordinates of the eye reference point and of the gaze point in the reference coordinate system.
The gaze point is the point at which the user's eyes are looking. In the cockpit scenario, the driver's gaze point is the specific location the driver's eyes are directed at. A gaze point can be represented by its position in space; in this embodiment, its three-dimensional position is represented by its three-dimensional coordinates in the reference coordinate system.
In step S301, the user's three-dimensional eye position can be determined in any applicable way. In some implementations, it can be obtained by a facial landmark detection algorithm combined with a pre-built 3D face model. In other implementations, it can be obtained with a landmark algorithm from the two-dimensional position found in the first image combined with the first image's depth information. It will be appreciated that any method able to derive the three-dimensional position of a point in an image from image data can be used to determine the user's three-dimensional eye position in step S301; they are not enumerated here.
Fig. 5 shows an exemplary eye 3D position estimation flow. As shown in Fig. 5, it may include: step S501, processing the first image with a face detection algorithm and a facial landmark detection algorithm to obtain the positions of the user's facial landmarks in the first image; step S502, solving a PnP problem that combines those landmark positions with a pre-obtained standard 3D face model, yielding the 3D coordinates of the user's facial landmarks in the reference coordinate system; and step S503, extracting the 3D coordinates of the user's eye reference points from those landmark coordinates as the 3D coordinates of the user's eyes. Note that Fig. 5 is only an example and does not limit how eye 3D position estimation is implemented in this embodiment.
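As a concrete, simplified sketch of this flow (not code from the application itself), the snippet below recovers the face pose with OpenCV's solvePnP against a generic 3D face model and takes the midpoint of the two model eye points as the eye reference point O; the landmark positions, model points, and camera intrinsics are all placeholder assumptions:

```python
import cv2
import numpy as np

# Hypothetical 2D landmarks detected in the first (DMS) image, in pixels.
landmarks_2d = np.array([[320, 240], [380, 238], [350, 300],
                         [330, 360], [300, 250], [400, 248]], dtype=np.float64)

# Matching points of a generic (standard) 3D face model, in millimetres.
face_model_3d = np.array([[-30, 35, 0], [30, 35, 0], [0, 0, 30],
                          [0, -40, 10], [-45, 38, -20], [45, 38, -20]],
                         dtype=np.float64)

# Placeholder intrinsics of the first camera; real values come from calibration.
K1 = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)  # assume an undistorted image

# PnP yields the pose of the face model in the first camera's coordinate system.
ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, K1, dist)
R, _ = cv2.Rodrigues(rvec)

# Transform the model's eye points into camera coordinates and take the midpoint
# of both eyes as the eye reference point O (cf. Fig. 4).
left_eye, right_eye = face_model_3d[4], face_model_3d[5]
eye_3d = R @ ((left_eye + right_eye) / 2) + tvec.ravel()
print("eye reference point in camera-1 coordinates:", eye_3d)
```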
In step S302, the user's gaze region in the second image can be determined with a camera perspective projection model from the user's three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, and the intrinsic and extrinsic parameters of the second camera 120 (below, the "gaze region in the second image" is abbreviated to "second gaze region"). The camera perspective projection model may be a pinhole imaging model or a nonlinear perspective projection model.
To obtain a more accurate second gaze region, step S302 may include: obtaining the user's gaze region in the second image from the user's three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, the intrinsic and extrinsic parameters of the second camera 120, and the accuracy of the gaze tracking model. The error introduced by the model's limited accuracy can thereby be eliminated from the final second gaze direction.
The process of obtaining the second gaze region is described in detail below with reference to a concrete scenario.
Fig. 6 shows a scene in which a driver (not shown) in the cockpit looks at a pedestrian in the crosswalk ahead of the vehicle.
Fig. 9 shows an exemplary flow for determining the user's second gaze region. As shown in Fig. 9, the process may include the following steps:
Step S901: determine the user's gaze region S1 in the reference coordinate system from the user's three-dimensional eye position and the first gaze direction.
Specifically, the user's line of sight ON in the reference coordinate system is obtained from the coordinates (Xc1, Yc1, Zc1) of the user's eye reference point in the reference coordinate system and the first gaze direction ON (viewing angle θ) obtained from the first image. Suppose the average accuracy of the gaze tracking model is expressed as ±α, where α is the viewing-angle error; the lower the model's accuracy, the larger α. In this step, the viewing angle θ can be widened to the interval [θ−α, θ+α], and the cone bounded by the line of sight at angle θ−α and the line of sight at angle θ+α is taken as the user's gaze region S1 in the reference coordinate system.
Fig. 7 visualizes the driver's gaze region S1 in the reference coordinate system for the scene of Fig. 6: O is the three-dimensional eye position, the solid arrow is the first gaze direction ON, θ is the viewing angle of ON, α is the average accuracy of the gaze tracking model, and the dashed cone is the user's gaze region S1 in the reference coordinate system.
Step S902: project the user's gaze region S1 in the reference coordinate system into the pixel coordinate system of the second camera 120 to obtain the user's second gaze region Q.
Fig. 8 shows the second image obtained when the second camera photographs the scene of Fig. 6; only the part the driver gazes at is shown, content of Fig. 6 irrelevant to this embodiment is omitted, and the user's second gaze region Q is marked.
Taking the pinhole imaging model as an example and referring to Figs. 6 to 8, the projection in this step can be carried out with formulas (1) and (2). Specifically, based on the extrinsic parameters of the first camera 110 and of the second camera 120, the gaze region S1 is first transformed into the camera coordinate system of the second camera 120 according to formula (1), yielding a gaze region S2; then, based on the intrinsic parameters of the second camera 120, the gaze region S2 is projected into the pixel coordinate system of the second camera 120 according to formula (2), yielding the user's second gaze region Q. Here, the extrinsic parameters of the first camera 110 and of the second camera 120 are calibrated in the same world coordinate system.
Via the extrinsic parameters of the first camera 110 and the intrinsic and extrinsic parameters of the second camera 120, the gaze region S1 projects onto the imaging plane of the second camera 120 as a quadrilateral second gaze region Q. In general, the lower the gaze tracking model's accuracy, the larger α, the wider the angle of the gaze region S1 in the reference coordinate system, and the wider the quadrilateral region Q.
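The two-stage mapping described above can be sketched for a single 3D point as follows, assuming a pinhole model with world-to-camera extrinsics; all numeric parameters are placeholders, and formulas (1) and (2) themselves are not reproduced in this text:

```python
import numpy as np

def cam1_to_cam2_pixels(p_c1, R1, t1, R2, t2, K2):
    """Map a 3D point from camera-1 coordinates to camera-2 pixel coordinates."""
    # Formula (1) analogue: camera-1 -> world -> camera-2
    # (extrinsics follow the world-to-camera convention p_cam = R p_world + t).
    p_world = R1.T @ (p_c1 - t1)
    p_c2 = R2 @ p_world + t2
    # Formula (2) analogue: pinhole projection onto the camera-2 image plane.
    uvw = K2 @ p_c2
    return uvw[:2] / uvw[2]

# Placeholder calibration parameters, for illustration only.
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([0.0, -0.3, -1.0])   # cameras ~1 m apart
K2 = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1.0]])

# Sampling points on the boundary rays of the cone [theta - alpha, theta + alpha]
# and projecting them like this outlines the quadrilateral region Q.
point_on_ray = np.array([0.2, 0.0, 5.0])  # a point 5 m ahead on one boundary ray
print(cam1_to_cam2_pixels(point_on_ray, R1, t1, R2, t2, K2))
```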
Fig. 10 shows an example projection of one line of sight OX. As shown in Fig. 10, points x at different depths along the line of sight OX project onto the imaging plane of the second camera 120 as O'X'. With O on the left as the origin of the human line of sight in space and OX as the first gaze direction L, mapping into the imaging plane of the second camera 120 takes the gaze origin to the point O' and the first gaze direction L to the line of sight L'.
Note that the methods shown in Figs. 7 to 10 are only examples; the method of obtaining the second gaze region in the embodiments of the present application is not limited to them.
The second gaze region can be represented by grayscale image data whose pixels correspond one-to-one with those of the second image; the gray value of each pixel indicates whether it belongs to the gaze region. In the example of Fig. 11 below, if Fig1 is the visualization of the second image, then the black-and-white image Fig2 is the visualization of the second gaze region: black pixels in Fig2 do not belong to the second gaze region, while white pixels do. Taking the cockpit scenario as an example, when the second camera is a TOF camera the second image is a TOF image, and the gray value of each of its pixels can indicate the distance from the corresponding point on the target object to the second camera.
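As an illustration of this representation (not code from the application itself), the sketch below rasterizes a projected quadrilateral Q into such a grayscale region image with OpenCV; the image size and quadrilateral corners are placeholder values:

```python
import cv2
import numpy as np

# Rasterize the projected quadrilateral Q into the gaze-region grayscale image
# (the Fig2 analogue): white (255) inside the region, black (0) elsewhere.
h, w = 72, 128                                   # second-image size used in Fig. 11
quad_px = np.array([[40, 20], [90, 18], [95, 55], [38, 58]], dtype=np.int32)
mask = np.zeros((h, w), dtype=np.uint8)
cv2.fillPoly(mask, [quad_px], color=255)
print(int(mask.sum()) // 255, "pixels marked as gaze region")
```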
In step S303, the position of the user's gaze point in the second image (below, the "gaze point in the second image" is abbreviated to "second gaze point") can be obtained from the second gaze region and the second image using a pre-trained gaze point calibration model. That model can be any machine learning model suitable for image processing; given the high accuracy and stability of neural networks, in the embodiments of the present application it is preferably a neural network model.
An exemplary implementation of the gaze point calibration model is described in detail below.
Fig. 11 shows an exemplary network structure for the gaze point calibration model. As shown in Fig. 11, the model may be an encoder-decoder neural network comprising a channel-wise concat layer, a ResNet-18 based encoder, a Convolutional GRU cell, a ResNet-18 based decoder, and a soft-argmax + scaling layer.
As shown in Fig. 11, the model's processing is as follows. At the input, the channel-wise concat layer merges the image of the second gaze region with the second image along the channel dimension into a new image: if both are single-channel grayscale images, the merged image has 2 channels; if the second image is a three-channel RGB image and the gaze-region image is single-channel grayscale, the merged image has 4 channels. The merged image is fed into the encoder and processed in turn by the encoder, the convolutional GRU cell, and the decoder; the decoder outputs a heatmap Fig3 in which the gray value of each pixel indicates the probability that the corresponding pixel is the gaze point. The heatmap Fig3 then passes through the soft-argmax layer, which computes the position of the gaze point in the second image, i.e., the coordinates (x, y) of its corresponding pixel. In general, one line of sight has one gaze point, and each gaze point may cover one or more pixels of the second image.
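A heavily simplified sketch of this processing is given below: the ResNet-18 encoder, convolutional GRU cell, and ResNet-18 decoder are stood in for by a few plain convolutions, while the channel-wise concatenation and the differentiable soft-argmax readout are shown as described; all layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_argmax_2d(heatmap):
    """Differentiable heatmap -> (x, y) pixel coordinates (the soft-argmax layer)."""
    b, _, h, w = heatmap.shape
    prob = F.softmax(heatmap.view(b, -1), dim=1).view(b, 1, h, w)
    xs = torch.linspace(0, w - 1, w).view(1, 1, 1, w)
    ys = torch.linspace(0, h - 1, h).view(1, 1, h, 1)
    x = (prob * xs).sum(dim=(1, 2, 3))
    y = (prob * ys).sum(dim=(1, 2, 3))
    return torch.stack([x, y], dim=1), prob

class GazePointCalibrator(nn.Module):
    """Toy stand-in for the ResNet-18 encoder / ConvGRU / decoder stack."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, scene_img, region_mask):
        # Channel-wise concat of the second image and the gaze-region image.
        x = torch.cat([scene_img, region_mask], dim=1)
        heatmap = self.net(x)              # decoder output (Fig3 analogue)
        return soft_argmax_2d(heatmap)     # gaze point (x, y) plus probabilities

model = GazePointCalibrator(in_ch=4)       # RGB scene + 1-channel region mask
xy, prob = model(torch.rand(1, 3, 72, 128), torch.rand(1, 1, 72, 128))
print(xy.shape)                            # torch.Size([1, 2])
```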
The gaze point calibration model can be pre-trained. Training samples consist of a scene image and its corresponding gaze-region grayscale image (in which the extent of the gaze region is a set value), with the sample's true gaze region known. During training, the ResNet part and the soft-argmax layer are trained simultaneously but with different loss functions; the embodiments of the present application do not restrict which loss functions are used. For example, the loss function of the ResNet part may be binary cross-entropy (BCE loss), and that of the soft-argmax layer may be mean squared error (MSE loss).
In some examples, the decoder in the ResNet part can use pixel-level binary cross-entropy as its loss function, as in formula (3):

BCE = −(1/N) Σ_{i=1..N} [ y_i · log p(y_i) + (1 − y_i) · log(1 − p(y_i)) ]    (3)

where y_i is the label indicating whether pixel i is the gaze point (1 if it is, 0 otherwise), p(y_i) is the probability that pixel i is the gaze point in the heatmap Fig3 output by the decoder, and N is the total number of pixels of the second image Fig1, which equals the total number of pixels of the heatmap Fig3. In the example of Fig. 11, the second image is 128×72, so N = 128 × 72 = 9216.
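Under these definitions, the joint training objective of the ResNet part (pixel-level BCE) and the soft-argmax layer (MSE) can be sketched as follows; the 0.5 weighting between the two terms is an illustrative assumption, since the text does not fix one:

```python
import torch
import torch.nn.functional as F

def calibration_loss(pred_prob, pred_xy, gt_mask, gt_xy):
    # Pixel-level BCE between the heatmap probabilities p(y_i) and the
    # 0/1 gaze-point labels y_i (formula (3)).
    bce = F.binary_cross_entropy(pred_prob, gt_mask)
    # MSE between the soft-argmax coordinates and the true gaze point.
    mse = F.mse_loss(pred_xy, gt_xy)
    return bce + 0.5 * mse  # illustrative weighting

# Example with the 128x72 image size of Fig. 11 (N = 9216 pixels).
pred_prob = torch.rand(1, 1, 72, 128)
gt_mask = torch.zeros(1, 1, 72, 128)
gt_mask[0, 0, 40, 60] = 1.0                # single true gaze-point pixel
loss = calibration_loss(pred_prob, torch.tensor([[60.0, 40.0]]),
                        gt_mask, torch.tensor([[61.0, 40.0]]))
print(float(loss))
```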
In step S304, the user's three-dimensional gaze point position, i.e., the three-dimensional coordinates of the gaze point in the reference coordinate system (the camera coordinate system of the first camera 110), can be obtained from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120 in a variety of ways. It will be appreciated that any algorithm which derives a point's position in space from its position in an image is applicable to step S304.
Since inverse perspective transformation is relatively mature and computationally cheap, step S304 preferably obtains the three-dimensional gaze point position through an inverse perspective transformation. Specifically, it suffices to obtain the depth of the second gaze point to get the gaze point's Z-axis coordinate; combining this with the position of the second gaze point, i.e., the pixel coordinates (u, v) obtained in step S303, a simple inverse perspective transformation yields the three-dimensional coordinates of the gaze point in the reference coordinate system, i.e., the gaze point's three-dimensional position.
Fig. 12 shows an exemplary concrete implementation of step S304. As shown in Fig. 12, step S304 may include: step S3041, obtaining the depth of the second gaze point from the second image with a monocular depth estimation algorithm, the depth being the distance h from the gaze point to the second camera 120, from which the Z-axis coordinate Zc2 of the gaze point in the second camera's coordinate system is estimated; and step S3042, obtaining the three-dimensional coordinates of the gaze point in the reference coordinate system from the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate in the second camera's coordinate system, the intrinsic and extrinsic parameters of the second camera 120, and the extrinsic parameters of the first camera 110.
In step S3041, the distance h of each pixel of the second image from the second camera 120 can be computed from the second image with a monocular depth estimation algorithm such as FastDepth; the distance h of the second gaze point from the second camera 120 can then be read off at the gaze point's pixel coordinates. Any applicable depth estimation algorithm may be used here. In one example, the depth of each pixel of the second image is preferably computed with the FastDepth monocular depth estimation algorithm, which has low computational complexity, high processing efficiency, and is mature and stable, placing relatively low demands on hardware and making it easy to implement on vehicle-side devices with comparatively limited computing power.
In step S3042, from the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate Zc2, and the intrinsic parameters of the second camera 120, formula (2) is inverted to obtain the gaze point's coordinates (Xc2, Yc2, Zc2) in the camera coordinate system of the second camera 120; then, based on the extrinsic parameters of the second camera 120 and of the first camera 110, formula (1) is applied to (Xc2, Yc2, Zc2) to obtain the gaze point's coordinates (Xc1, Yc1, Zc1) in the camera coordinate system of the first camera 110. The coordinates (Xc1, Yc1, Zc1) are the gaze point's three-dimensional position.
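A minimal sketch of this inverse projection and coordinate transfer, with placeholder calibration values and the same world-to-camera extrinsics convention as in the earlier projection sketch, might look like this:

```python
import numpy as np

def backproject_to_cam1(u, v, z_c2, K2, R1, t1, R2, t2):
    """Lift pixel (u, v) with depth z_c2 to camera-2 3D coordinates (inverting the
    pinhole projection of formula (2)), then transfer the point into the camera-1
    (reference) frame via the extrinsics (formula (1) analogue)."""
    fx, fy = K2[0, 0], K2[1, 1]
    cx, cy = K2[0, 2], K2[1, 2]
    # Inverse pinhole projection: Xc2 = (u - cx) * Z / fx, Yc2 = (v - cy) * Z / fy.
    p_c2 = np.array([(u - cx) * z_c2 / fx, (v - cy) * z_c2 / fy, z_c2])
    # Camera-2 -> world -> camera-1 (world-to-camera extrinsics).
    p_world = R2.T @ (p_c2 - t2)
    return R1 @ p_world + t1

# Placeholder calibration data, for illustration only.
K2 = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1.0]])
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([0.0, -0.3, -1.0])
gaze_3d = backproject_to_cam1(700, 380, z_c2=12.5, K2=K2,
                              R1=R1, t1=t1, R2=R2, t2=t2)
print("gaze point in the reference coordinate system:", gaze_3d)
```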
In general, one line of sight has one gaze point, but because of accuracy limits, multiple gaze points may be obtained for the same line of sight. In that case the gaze points can be screened according to the confidence of the user's gaze point in the second image, so that the subsequent steps need only be performed on the screened gaze points to obtain the second gaze direction; this keeps the second gaze direction accurate while reducing computation and improving processing efficiency. The screening of gaze points can be performed before or after step S304.
The gaze point calibration model of step S303 also provides a probability value for each second gaze point, from which the confidence of that gaze point can be determined. In some embodiments, the heatmap provided by the model contains the probability that a second gaze point is the true gaze point; the higher the probability, the more likely the corresponding second gaze point is real. That probability, or a value proportional to it, can be used directly as the confidence of the second gaze point. The confidence is thus obtained without separate computation, improving processing efficiency while lowering computational complexity.
Confidence-based screening of gaze points can be implemented in several ways. In some examples, only gaze points whose second-gaze-point confidence exceeds a preset first confidence threshold (for example, 0.9), or whose confidence is relatively highest, are selected. If several gaze points have the relatively highest confidence or exceed the first threshold, one or more of them can be chosen at random; alternatively, all of them can be kept. Screening in this way not only makes the final second gaze direction more accurate but also reduces the computation and data volume of steps S304, S305, and step S306 below, effectively improving processing efficiency while lowering hardware load, which makes implementation easier on vehicle-side devices with limited computing power and storage.
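One plausible reading of this screening rule, with an assumed threshold of 0.9, is sketched below; the fallback to the single highest-confidence point is one of the options named above:

```python
def filter_gaze_points(candidates, threshold=0.9):
    """Keep candidate gaze points (x, y, confidence) whose confidence exceeds the
    first threshold; otherwise fall back to the single best candidate."""
    kept = [c for c in candidates if c[2] > threshold]
    if not kept:  # nothing passes the threshold: keep the highest-confidence point
        kept = [max(candidates, key=lambda c: c[2])]
    return kept

print(filter_gaze_points([(60, 40, 0.95), (61, 41, 0.72), (10, 5, 0.91)]))
```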
In step S305, the second gaze direction can be expressed by the vector or viewing angle determined by the three-dimensional gaze point position and the three-dimensional eye position. In some embodiments, in the first camera's coordinate system, the second gaze direction is represented by the vector whose origin is the three-dimensional eye position and whose endpoint is the three-dimensional gaze point position. In other embodiments, in the first camera's coordinate system, it is represented by the angle (i.e., the viewing angle) between the line of sight that starts at the eye position and points to the gaze point position and the axis through the user's eye reference point.
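Both representations can be derived from the two 3D positions in a few lines; in the sketch below the yaw/pitch angle convention is an illustrative assumption, since the text only requires angles in the first camera's coordinate system:

```python
import numpy as np

def second_gaze_direction(eye_3d, gaze_3d):
    """Calibrated gaze as a unit vector from the eye to the gaze point, plus
    yaw/pitch angles (assumed convention: Y down, Z forward)."""
    v = np.asarray(gaze_3d, float) - np.asarray(eye_3d, float)
    v = v / np.linalg.norm(v)
    yaw = np.degrees(np.arctan2(v[0], v[2]))   # left/right, about the Y axis
    pitch = np.degrees(np.arcsin(-v[1]))       # up/down
    return v, yaw, pitch

vec, yaw, pitch = second_gaze_direction([0.0, 0.0, 0.0], [0.7, -0.2, 12.5])
print(vec, yaw, pitch)
```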
The gaze calibration of steps S301 to S305 in the embodiments of the present application can be performed by the image processing system 130 of the system 100.
Deep learning models can usually improve their accuracy for a specific user through few-shot learning on a small number of samples. For a gaze tracking model, however, the data required are the user's gaze data in the camera coordinate system (for example, gaze angles), and numerical data of this kind are hard to obtain directly in ordinary environments, which makes user-level optimization of gaze tracking models difficult. For this reason, the result of step S305 can be used to optimize the gaze tracking model.
After step S305, the gaze calibration method of the embodiments of the present application may further include step S306: taking the user's second gaze direction and the first image as the user's optimization samples and optimizing the gaze tracking model with a few-shot learning method. The gaze tracking model's estimation accuracy for a specific user can thus be improved continuously with few samples and small-scale training, yielding a user-level gaze tracking model.
Taking the exemplary system of Fig. 1 as an example, Fig. 13 shows an exemplary implementation of the gaze tracking model optimization of step S306. As shown in Fig. 13, the flow may include: step S3061, in which the image processing system 130 stores the second gaze direction and its corresponding first image in the user's sample library as an optimization sample; the library can be associated with user information (for example, user identity information) for easy lookup and is deployed in the model optimization system 140. Step S3062, in which the model optimization system 140 uses the newly added optimization samples in the user's sample library to optimize, with a few-shot learning method, the user's gaze tracking model obtained in the previous optimization. Step S3063, in which the model optimization system 140 delivers the newly optimized gaze tracking model to the image processing system 130 on the user's vehicle, so that the image processing system 130 uses the optimized model to obtain the first gaze direction in the user's next gaze calibration. The parameter data of the previously optimized gaze tracking model and the user's sample library can both be associated with user information (for example, user identity information), so that during the current optimization the optimization samples and the previous model parameters can be looked up directly by user information. In this way, the user's optimization samples can be collected in real time and the gaze tracking model continuously optimized without the user noticing: the longer and more frequently the user uses the model, the more accurate its gaze estimation for that user becomes and the better the user experience. This improves real-time gaze estimation accuracy while solving the technical problems that gaze tracking models estimate some users' gaze poorly and are hard to optimize.
In practice, the optimization of step S3062 can be performed periodically, once the newly added optimization samples reach a certain number, or when other preset conditions are met; the sample library update of step S3061 can be performed in real time whenever the image processing system 130 and the model optimization system 140 can communicate normally.
Optionally, in step S3061 the user's optimization samples can be uploaded selectively to improve sample quality, reduce unnecessary optimization, and lower the hardware load of model optimization. Specifically, second gaze directions can be screened by the confidence of the second gaze point, and only the optimization samples formed by the screened second gaze directions and their corresponding first images are uploaded. The screening may include, without limitation: 1) selecting second gaze directions whose second-gaze-point confidence exceeds a preset second confidence threshold (for example, 0.95); 2) selecting the second gaze direction whose second-gaze-point confidence is relatively highest. For the confidence of the second gaze point, see the description above, which is not repeated here.
The few-shot learning method can be realized by any algorithm capable of optimizing the gaze tracking model with a small number of samples. For example, the user's optimization samples can be used to optimize the gaze tracking model with the MAML algorithm, realizing few-shot optimization of the model. A gaze tracking model better matched to the individual characteristics of a specific user can thus be obtained from a small number of samples, with small data volume and low computational complexity, which helps reduce hardware load and cost.
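As an illustration, a first-order simplification of MAML (which drops the second-order terms of the full algorithm) for one per-user update might be sketched as follows; the toy model, data shapes, and step sizes are placeholders:

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_user_update(model, support, query, inner_lr=0.01, meta_lr=0.001, steps=3):
    """One first-order MAML-style update of the gaze tracking model from a
    user's few optimization samples (first image -> second gaze direction)."""
    learner = copy.deepcopy(model)
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(steps):                      # inner loop on the support set
        x, y = support
        loss = F.mse_loss(learner(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    x, y = query                                # outer loss on the query set
    query_loss = F.mse_loss(learner(x), y)
    grads = torch.autograd.grad(query_loss, learner.parameters())
    with torch.no_grad():                       # first-order meta update
        for p, g in zip(model.parameters(), grads):
            p -= meta_lr * g
    return float(query_loss)

# Toy stand-in for the gaze model: 32 image features -> 2 gaze angles.
model = torch.nn.Linear(32, 2)
support = (torch.randn(8, 32), torch.randn(8, 2))
query = (torch.randn(4, 32), torch.randn(4, 2))
print(fomaml_user_update(model, support, query))
```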
The following uses the cockpit scenario to illustrate a concrete implementation of this embodiment.
Fig. 14 shows an exemplary flow in which the system 100 performs gaze calibration and model optimization in a cockpit environment. As shown in Fig. 14, the flow may include: step S1401, in which the in-vehicle camera of vehicle G captures a DMS image (the first image) of driver A in the cockpit, the DMS image containing driver A's face; the image processing system 130 on vehicle G runs the gaze tracking model to infer an initial gaze direction (the first gaze direction) and uses the DMS image for eye position estimation to obtain driver A's three-dimensional eye position. Step S1402, in which the image processing system 130 performs inference on the exterior image (the second image) captured by the exterior camera together with the gaze region of the initial gaze direction to obtain driver A's calibrated gaze direction (the second gaze direction); the exterior image contains the scene driver A currently sees and is captured in synchronization with the DMS image. Step S1403, in which, upon judging that the calibrated gaze direction is highly credible (for example, the confidence of the second gaze point meets the requirements above), the image processing system 130 uploads driver A's DMS image and calibrated gaze direction as driver A's personalized data (i.e., optimization samples) to the model optimization system 140, which optimizes driver A's gaze tracking model with few-shot learning and delivers the resulting model to the image processing system 130 on vehicle G. This embodiment thus uses exterior images to calibrate the initial gaze data estimated by the gaze tracking model, improving the accuracy of the gaze data, and uses the calibrated gaze data as the user's personalized gaze data to optimize the gaze tracking model, improving its gaze estimation accuracy for that user. The embodiment therefore solves both the problem that a gaze tracking model's estimates are inaccurate in actual cockpit use and the problem that the model is hard to optimize in the cockpit because the user's gaze data cannot otherwise be obtained. The system is also self-improving: in the vehicle scenario the above flow can run continuously without the user noticing, and the more the user uses it, the more accurate the system's gaze estimation for that user and the higher the gaze tracking model's accuracy for that user.
Fig. 15 shows an exemplary structure of the gaze calibration apparatus 1500 provided in this embodiment. As shown in Fig. 15, the gaze calibration apparatus 1500 of this embodiment may include:
an eye position determination unit 1501, configured to obtain the user's three-dimensional eye position from a first image containing the user's eyes captured by the first camera;
a first gaze determination unit 1502, configured to obtain the user's first gaze direction from the first image containing the user's eyes captured by the first camera;
a gaze region unit 1503, configured to obtain the user's gaze region in a second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera, the second image being captured by the second camera and containing the scene outside the vehicle as seen by the user;
a gaze point calibration unit 1504, configured to obtain the position of the user's gaze point in the second image from the user's gaze region in the second image and the second image;
a gaze point conversion unit 1505, configured to obtain the user's three-dimensional gaze point position from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera;
a second gaze determination unit 1506, configured to obtain the user's second gaze direction from the three-dimensional gaze point position and the three-dimensional eye position, the second gaze direction serving as the calibrated gaze direction.
In some embodiments, the first gaze direction is extracted from the first image by a gaze tracking model.
In some embodiments, the gaze region unit 1503 being configured to obtain the user's gaze region in the second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera includes: obtaining the user's gaze region in the second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
In some embodiments, the gaze calibration apparatus further includes an optimization unit 1507, configured to take the user's second gaze direction and the first image as the user's optimization samples and optimize the gaze tracking model with a few-shot learning method.
In some embodiments, the gaze point calibration unit 1504 is further configured to screen the gaze points by the confidence of the user's gaze point in the second image; and/or the optimization unit 1507 is further configured to screen second gaze directions by the confidence of the user's gaze point in the second image.
In some embodiments, the position of the user's gaze point in the second image is obtained by a gaze point calibration model from the user's gaze region in the second image and the second image.
In some embodiments, the gaze point calibration model also provides a probability value for the user's gaze point in the second image, and the confidence is determined from that probability value.
[Embodiment 2]
Fig. 16 shows an exemplary architecture of the system 1600 to which this embodiment applies. As shown in Fig. 16, the exemplary system 1600 of this embodiment is largely the same as the system 100 of Embodiment 1, except that the second camera 120 is an optional component and the system includes a display screen 160, which can be deployed on the vehicle side and realized with a display component already present in the vehicle-side device. The other parts of the system 1600, namely the first camera 110, the image processing system 130, the model optimization system 140, and the model training system 150, function largely the same as the corresponding parts of the system 100 of Embodiment 1 and are not described again. This embodiment uses a display screen 160 whose positional relationship to the first camera 110 (i.e., the in-vehicle camera) has been marked; by having the user gaze at reference points on the display screen 160, the user's gaze is calibrated and optimization samples are obtained, which are used for few-shot learning of the gaze tracking model to improve its accuracy.
The gaze calibration method of this embodiment is described in detail below.
Fig. 17 shows an exemplary flow of the gaze calibration method of this embodiment. As shown in Fig. 17, the method may include the following steps:
Step S1701: in response to the user's gaze operation on a reference point on the display screen 160, obtain the user's three-dimensional gaze point position;
本步骤之前,还可以包括:控制显示屏160向用户提供视线校准界面,所述视线校准界面中包含用于提醒用户注视参考点的可视化提示,以便用户根据该可视化提示执行相应的注视操作。这里,视线校准界面的具体形式,本实施例不予限制。Before this step, it may also include: controlling the display screen 160 to provide a line of sight calibration interface to the user, the line of sight calibration interface including a visual prompt for reminding the user to gaze at the reference point, so that the user performs a corresponding gaze operation according to the visual prompt. Here, the specific form of the line-of-sight calibration interface is not limited by this embodiment.
本步骤中,注视操作可以是任何用户注视显示屏160中参考点的相关操作,对于注视操作的具体实现方式或表现形式,本申请实施例不予限制。举例来说,注视操作可以包括用户注视视线校准界面中参考点的同时在视线校准界面中输入确认信息。In this step, the gazing operation may be any operation related to the user gazing at the reference point on the display screen 160, and the embodiment of the present application does not limit the specific implementation or expression of the gazing operation. For example, the gaze operation may include inputting confirmation information in the gaze calibration interface while the user gazes at a reference point in the gaze calibration interface.
以座舱场景为例,显示屏160可以是但不限于车辆的AR-HUD、车辆的仪表盘、用户的便携式电子设备或其他。通常,座舱场景中的视线校准主要针对驾驶员或副驾驶员,因此,为确保视线校准不影响安全驾驶,显示屏160优选为AR-HUD。Taking the cockpit scene as an example, the display screen 160 may be, but not limited to, an AR-HUD of a vehicle, a dashboard of a vehicle, a portable electronic device of a user, or others. Usually, the line of sight calibration in the cockpit scene is mainly aimed at the driver or the co-pilot. Therefore, in order to ensure that the line of sight calibration does not affect safe driving, the display screen 160 is preferably an AR-HUD.
本步骤中,显示屏160中每个参考点在第一摄像头110的相机坐标系中的三维坐标可通过显示屏160与第一摄像头110的位置关系预先标定。如此,用户注视一个参考点,则该参考点即为用户的注视点,该参考点在第一摄像头110的相机坐标系中的三维坐标即为用户的注视点三维位置。In this step, the three-dimensional coordinates of each reference point on the display screen 160 in the camera coordinate system of the first camera 110 may be pre-calibrated through the positional relationship between the display screen 160 and the first camera 110 . In this way, if the user gazes at a reference point, the reference point is the user's gaze point, and the three-dimensional coordinates of the reference point in the camera coordinate system of the first camera 110 are the three-dimensional position of the user's gaze point.
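By way of illustration only, such a pre-calibrated display-to-camera mapping can be realized as a rigid transform. The minimal Python sketch below assumes a rotation R_dc and a translation t_dc obtained from an offline calibration of the display screen 160 against the first camera 110; the function and variable names are hypothetical and do not appear in this disclosure.

```python
import numpy as np

def reference_point_in_camera(p_display, R_dc, t_dc):
    """Map a reference point from the display's coordinate frame into the
    first camera's coordinate system via a pre-calibrated rigid transform.

    p_display: (3,) point in the display frame, in meters.
    R_dc:      (3, 3) rotation from display frame to camera frame.
    t_dc:      (3,) translation from display frame to camera frame.
    """
    return R_dc @ np.asarray(p_display, dtype=float) + np.asarray(t_dc, dtype=float)
```

When the user gazes at the reference point, the returned coordinates would serve directly as the three-dimensional gaze point position of step S1701.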
Step S1702: obtain the three-dimensional position of the user's eyes from a first image, captured by the first camera 110, that contains the user's eyes.
The specific implementation of this step is the same as the implementation for obtaining the three-dimensional eye position in step S301 of Embodiment 1 and is not repeated here.
Step S1703: obtain the user's second line-of-sight direction from the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
The specific implementation of this step is the same as step S305 of Embodiment 1 and is not repeated here.
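As a hedged sketch of the vector arithmetic implied by step S1703 (the function name and the error handling are assumptions, not part of the disclosed embodiments), the second line-of-sight direction can be computed as the unit vector from the eye position to the gaze point, with both positions expressed in the first camera's coordinate system:

```python
import numpy as np

def second_gaze_direction(eye_pos, gaze_pos):
    """Unit direction vector from the user's eye to the gaze point."""
    d = np.asarray(gaze_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0.0:
        raise ValueError("eye position and gaze point coincide")
    return d / norm
```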
With the line-of-sight calibration method of this embodiment, the three-dimensional position of the user's gaze point is obtained directly from the reference point, while the three-dimensional position of the user's eyes is obtained from the first image, yielding a second line-of-sight direction of high accuracy. The method therefore not only effectively improves the accuracy of the user's line-of-sight estimation, but is also simple to operate, computationally light, and efficient, making it well suited to the cockpit environment.
In the method of this embodiment, the camera coordinate system of the first camera 110 is preferably used as the reference coordinate system, so that the resulting second line-of-sight direction can be used directly for optimizing the gaze tracking model. Both the three-dimensional gaze point position and the three-dimensional eye position are expressed as three-dimensional coordinate values in the camera coordinate system of the first camera 110, and the second line-of-sight direction may be expressed as view angles or as a direction vector in that coordinate system. For details, see the related description of Embodiment 1, which is not repeated here.
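The two representations mentioned here (view angles and direction vector) are interchangeable. The sketch below assumes a common camera-axis convention (x right, y down, z forward); the convention itself is not fixed by this document, so the sign choices are illustrative assumptions:

```python
import numpy as np

def vector_to_angles(d):
    """Unit gaze vector -> (yaw, pitch) view angles in radians."""
    x, y, z = d
    yaw = np.arctan2(x, z)   # horizontal deviation from the optical axis
    pitch = np.arcsin(-y)    # vertical deviation (positive = looking up)
    return yaw, pitch

def angles_to_vector(yaw, pitch):
    """(yaw, pitch) view angles -> unit gaze vector."""
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])
```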
As in Embodiment 1, the line-of-sight calibration method of this embodiment may further include: step S1704, using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing the gaze tracking model based on a few-shot learning method. In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, yielding a user-level gaze tracking model. The specific implementation of this step is the same as step S306 of Embodiment 1 and is not repeated here. Because the three-dimensional gaze point position in this step is obtained by calibration and is therefore highly accurate, there is no need to screen the second line-of-sight direction before step S1704 in this embodiment.
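Step S1704 is specified only at the level of "few-shot learning". The loop below is one plausible realization rather than the disclosed implementation: it assumes a pretrained PyTorch model mapping an eye image to a 3-D gaze direction, and the function name, loss choice, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def few_shot_finetune(model, samples, lr=1e-4, steps=50):
    """Adapt a pretrained gaze tracking model to one user from a handful
    of (image, direction) optimization samples.

    samples: list of (image_tensor, target_direction) pairs, where
             target_direction is the calibrated second line-of-sight
             direction in the first camera's coordinate system.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for image, target in samples:
            pred = model(image.unsqueeze(0)).squeeze(0)
            # Cosine loss penalizes the angular deviation between the
            # predicted and calibrated gaze directions.
            loss = 1.0 - F.cosine_similarity(pred, target, dim=0)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

In a deployment such as the one in FIG. 16, such a loop could run either on the vehicle-end computing device or on a cloud server.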
FIG. 18 shows an exemplary structure of a line-of-sight calibration apparatus 1800 provided in this embodiment. As shown in FIG. 18, the apparatus 1800 of this embodiment may include:
a gaze point position determination unit 1801, configured to obtain the three-dimensional position of the user's gaze point in response to the user's gaze operation on a reference point on the display screen;
an eye position determination unit 1501, configured to obtain the three-dimensional position of the user's eyes from a first image, captured by the first camera, that contains the user's eyes; and
a second line-of-sight determination unit 1506, configured to obtain the user's second line-of-sight direction from the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
In some embodiments, the display screen is an augmented reality head-up display.
In some embodiments, the apparatus further includes an optimization unit 1507, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user and to optimize the gaze tracking model based on a few-shot learning method.
The computing device and the computer-readable storage medium of the embodiments of the present application are described below.
FIG. 19 is a schematic structural diagram of a computing device 1900 provided by an embodiment of the present application. The computing device 1900 includes a processor 1910 and a memory 1920.
The computing device 1900 may further include a communication interface 1930 and a bus 1940. It should be understood that the communication interface 1930 of the computing device 1900 shown in FIG. 19 can be used to communicate with other devices. The memory 1920 and the communication interface 1930 may be connected to the processor 1910 through the bus 1940. For ease of illustration, only one line is drawn in FIG. 19, but this does not mean that there is only one bus or only one type of bus.
The processor 1910 may be connected to the memory 1920. The memory 1920 may be used to store program code and data. Accordingly, the memory 1920 may be a storage unit inside the processor 1910, an external storage unit independent of the processor 1910, or a component comprising both.
It should be understood that, in the embodiments of the present application, the processor 1910 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 1910 may employ one or more integrated circuits for executing related programs to implement the technical solutions provided by the embodiments of the present application.
The memory 1920 may include read-only memory and random-access memory, and provides instructions and data to the processor 1910. A portion of the processor 1910 may also include non-volatile random-access memory. For example, the processor 1910 may also store device type information.
When the computing device 1900 runs, the processor 1910 executes the computer-executable instructions in the memory 1920 to perform the operational steps of the line-of-sight calibration methods of the foregoing embodiments.
It should be understood that the computing device 1900 according to the embodiments of the present application may correspond to the entity that executes the methods of the various embodiments of the present application, and that the foregoing and other operations and/or functions of the modules in the computing device 1900 serve to implement the corresponding flows of those methods; for brevity, they are not repeated here.
The system architecture of the embodiments of the present application and its related applications are described below by way of example.
An embodiment of the present application further provides a driver monitoring system, which includes the first camera 110, the second camera 120, and the computing device 1900 described above.
In some embodiments, the first camera 110 is configured to capture a first image containing the user's eyes, the second camera 120 is configured to capture a second image containing the scene seen by the user, and both cameras can communicate with the computing device 1900. In the computing device 1900, the processor 1910 uses the first image provided by the first camera 110 and the second image provided by the second camera 120 to execute the computer-executable instructions in the memory 1920, thereby performing the operational steps of the line-of-sight calibration method of Embodiment 1 above.
In some embodiments, the driver monitoring system may further include a display screen configured to display a reference point to the user. In the computing device 1900, the processor 1910 uses the first image provided by the first camera 110 and the three-dimensional position of the reference point displayed on the display screen to execute the computer-executable instructions in the memory 1920, thereby performing the operational steps of the line-of-sight calibration method of Embodiment 2 above.
In some embodiments, the driver monitoring system may further include a cloud server, which may be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as an optimization sample of the user, optimize the gaze tracking model based on a few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, thereby improving the model's gaze estimation accuracy for the user.
Specifically, for the architecture of the driver monitoring system, refer to the system shown in FIG. 1 of Embodiment 1 and the system shown in FIG. 16 of Embodiment 2. The image processing system 130 may be deployed in the computing device 1900, and the model optimization system 140 described above may be deployed in the cloud server.
An embodiment of the present application further provides a vehicle, which may include the driver monitoring system described above. In specific applications, the vehicle is a motor vehicle, which may be, but is not limited to, a sport utility vehicle, a bus, a heavy truck, or a passenger car among the various commercial vehicle types; it may also be, but is not limited to, a watercraft such as a boat or ship, an aircraft, or the like; and it may further be, but is not limited to, a hybrid vehicle, an electric vehicle, a plug-in hybrid electric vehicle, a hydrogen-powered vehicle, or another alternative-fuel vehicle. A hybrid vehicle may be any vehicle having two or more power sources, for example a vehicle powered by both gasoline and electricity.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to be beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs a line-of-sight calibration method that includes at least one of the solutions described in the foregoing embodiments.
The computer storage medium of the embodiments of the present application may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical-fiber cable, RF, or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the foregoing are merely the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present application has been described in some detail through the above embodiments, the present invention is not limited to those embodiments and may include further equivalent embodiments without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.
Claims (25)
- A line-of-sight calibration method, comprising: obtaining a three-dimensional eye position and a first line-of-sight direction of a user according to a first image that contains the user's eyes and is captured by a first camera; obtaining a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, extrinsic parameters of the first camera, and extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and containing a scene outside the vehicle seen by the user; obtaining a position of the user's gaze point in the second image according to the gaze area of the user in the second image and the second image; obtaining a three-dimensional gaze point position of the user according to the position of the gaze point and the intrinsic parameters of the second camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- The line-of-sight calibration method according to claim 1, wherein the first line-of-sight direction is extracted from the first image based on a gaze tracking model.
- The line-of-sight calibration method according to claim 2, wherein obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera comprises: obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- The line-of-sight calibration method according to claim 2 or 3, further comprising: using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing the gaze tracking model based on a few-shot learning method.
- The line-of-sight calibration method according to any one of claims 1 to 4, further comprising: screening the gaze point or the second line-of-sight direction according to a confidence of the user's gaze point in the second image.
- The line-of-sight calibration method according to any one of claims 1 to 5, wherein the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the gaze area of the user in the second image and the second image.
- The line-of-sight calibration method according to claim 6, wherein the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined from the probability value.
- A line-of-sight calibration method, comprising: obtaining a three-dimensional gaze point position of a user in response to the user's gaze operation on a reference point on a display screen; obtaining a three-dimensional eye position of the user according to a first image that contains the user's eyes and is captured by a first camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration method according to claim 8, wherein the display screen is a display screen of an augmented reality head-up display system.
- The line-of-sight calibration method according to claim 8, further comprising: using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing a gaze tracking model based on a few-shot learning method.
- A line-of-sight calibration apparatus, comprising: an eye position determination unit, configured to obtain a three-dimensional eye position of a user according to a first image that contains the user's eyes and is captured by a first camera; a first line-of-sight determination unit, configured to obtain a first line-of-sight direction of the user according to the first image; a gaze area unit, configured to obtain a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, extrinsic parameters of the first camera, and extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and containing a scene outside the vehicle seen by the user; a gaze point calibration unit, configured to obtain a position of the user's gaze point in the second image according to the gaze area of the user in the second image and the second image; a gaze point conversion unit, configured to obtain a three-dimensional gaze point position of the user according to the position of the gaze point and the intrinsic parameters of the second camera; and a second line-of-sight determination unit, configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration apparatus according to claim 11, wherein the first line-of-sight direction is extracted from the first image based on a gaze tracking model.
- The line-of-sight calibration apparatus according to claim 12, wherein the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- The line-of-sight calibration apparatus according to any one of claims 11 to 13, further comprising: an optimization unit, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user, and to optimize the gaze tracking model based on a few-shot learning method.
- The line-of-sight calibration apparatus according to any one of claims 11 to 14, wherein the gaze point calibration unit is further configured to screen the gaze point according to a confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.
- The line-of-sight calibration apparatus according to any one of claims 11 to 15, wherein the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the gaze area of the user in the second image and the second image.
- The line-of-sight calibration apparatus according to claim 16, wherein the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined from the probability value.
- A line-of-sight calibration apparatus, comprising: a gaze point position determination unit, configured to obtain a three-dimensional gaze point position of a user in response to the user's gaze operation on a reference point on a display screen; an eye position determination unit, configured to obtain a three-dimensional eye position of the user according to a first image that contains the user's eyes and is captured by a first camera; and a second line-of-sight determination unit, configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration apparatus according to claim 18, wherein the display screen is an augmented reality head-up display.
- The line-of-sight calibration apparatus according to claim 18, further comprising: an optimization unit, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user, and to optimize a gaze tracking model based on a few-shot learning method.
- A computing device, comprising: at least one processor; and at least one memory storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 10.
- A computer-readable storage medium having program instructions stored thereon, wherein the program instructions, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
- A driver monitoring system, comprising: a first camera, configured to capture a first image containing a user's eyes; a second camera, configured to capture a second image containing a scene outside the vehicle seen by the user; at least one processor; and at least one memory storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 7.
- The driver monitoring system according to claim 23, further comprising: a display screen configured to display a reference point to the user, wherein the program instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 8 to 10.
- A vehicle, comprising the driver monitoring system according to claim 23 or 24.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180001805.6A CN113661495A (en) | 2021-06-28 | 2021-06-28 | Sight line calibration method, sight line calibration device, sight line calibration equipment, sight line calibration system and sight line calibration vehicle |
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023272453A1 true WO2023272453A1 (en) | 2023-01-05 |
Family
ID=78494760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113661495A (en) |
WO (1) | WO2023272453A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115116039A (en) * | 2022-01-14 | 2022-09-27 | 长城汽车股份有限公司 | Vehicle cabin outside sight line tracking method and device, vehicle and storage medium |
CN116052235B (en) * | 2022-05-31 | 2023-10-20 | 荣耀终端有限公司 | Gaze point estimation method and electronic equipment |
CN115661913A (en) * | 2022-08-19 | 2023-01-31 | 北京津发科技股份有限公司 | Eye movement analysis method and system |
CN115840502B (en) * | 2022-11-23 | 2023-07-21 | 深圳市华弘智谷科技有限公司 | Three-dimensional sight tracking method, device, equipment and storage medium |
CN116704589B (en) * | 2022-12-01 | 2024-06-11 | 荣耀终端有限公司 | Gaze point estimation method, electronic device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018170538A1 (en) * | 2017-03-21 | 2018-09-27 | Seeing Machines Limited | System and method of capturing true gaze position data |
CN109849788A (en) * | 2018-12-29 | 2019-06-07 | 北京七鑫易维信息技术有限公司 | Information providing method, apparatus and system |
CN110341617A (en) * | 2019-07-08 | 2019-10-18 | 北京七鑫易维信息技术有限公司 | Eyeball tracking method, apparatus, vehicle and storage medium |
CN111427451A (en) * | 2020-03-25 | 2020-07-17 | 中国人民解放军海军特色医学中心 | Method for determining position of fixation point in three-dimensional scene by adopting scanner and eye tracker |
US20210042520A1 (en) * | 2019-06-14 | 2021-02-11 | Tobii Ab | Deep learning for three dimensional (3d) gaze prediction |
- 2021-06-28 WO PCT/CN2021/102861 patent/WO2023272453A1/en active Application Filing
- 2021-06-28 CN CN202180001805.6A patent/CN113661495A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018170538A1 (en) * | 2017-03-21 | 2018-09-27 | Seeing Machines Limited | System and method of capturing true gaze position data |
CN109849788A (en) * | 2018-12-29 | 2019-06-07 | 北京七鑫易维信息技术有限公司 | Information providing method, apparatus and system |
US20210042520A1 (en) * | 2019-06-14 | 2021-02-11 | Tobii Ab | Deep learning for three dimensional (3d) gaze prediction |
CN110341617A (en) * | 2019-07-08 | 2019-10-18 | 北京七鑫易维信息技术有限公司 | Eyeball tracking method, apparatus, vehicle and storage medium |
CN111427451A (en) * | 2020-03-25 | 2020-07-17 | 中国人民解放军海军特色医学中心 | Method for determining position of fixation point in three-dimensional scene by adopting scanner and eye tracker |
Also Published As
Publication number | Publication date |
---|---|
CN113661495A (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023272453A1 (en) | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle | |
US11699293B2 (en) | Neural network image processing apparatus | |
WO2021197189A1 (en) | Augmented reality-based information display method, system and apparatus, and projection device | |
WO2021013193A1 (en) | Traffic light identification method and apparatus | |
CN110167823B (en) | System and method for driver monitoring | |
CN110703904B (en) | Visual line tracking-based augmented virtual reality projection method and system | |
EP3033999B1 (en) | Apparatus and method for determining the state of a driver | |
US20220058407A1 (en) | Neural Network For Head Pose And Gaze Estimation Using Photorealistic Synthetic Data | |
WO2019137065A1 (en) | Image processing method and apparatus, vehicle-mounted head up display system, and vehicle | |
US20190279009A1 (en) | Systems and methods for monitoring driver state | |
US11112791B2 (en) | Selective compression of image data during teleoperation of a vehicle | |
WO2022241638A1 (en) | Projection method and apparatus, and vehicle and ar-hud | |
US11948315B2 (en) | Image composition in multiview automotive and robotics systems | |
JP7176520B2 (en) | Information processing device, information processing method and program | |
CN110341617B (en) | Eyeball tracking method, device, vehicle and storage medium | |
CN111854620B (en) | Monocular camera-based actual pupil distance measuring method, device and equipment | |
WO2023272725A1 (en) | Facial image processing method and apparatus, and vehicle | |
WO2022257120A1 (en) | Pupil position determination method, device and system | |
CN114463832B (en) | Point cloud-based traffic scene line of sight tracking method and system | |
CN113822174B (en) | Sight line estimation method, electronic device and storage medium | |
CN113780125A (en) | Fatigue state detection method and device for multi-feature fusion of driver | |
CN116543266A (en) | Automatic driving intelligent model training method and device guided by gazing behavior knowledge | |
JP2021009503A (en) | Personal data acquisition system, personal data acquisition method, face sensing parameter adjustment method for image processing device and computer program | |
WO2024031709A1 (en) | Display method and device | |
CN117441190A (en) | Position positioning method and device |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21947417; Country of ref document: EP; Kind code of ref document: A1 |