WO2023272453A1 - Gaze calibration method and apparatus, device, computer-readable storage medium, system and vehicle - Google Patents

Gaze calibration method and apparatus, device, computer-readable storage medium, system and vehicle

Info

Publication number
WO2023272453A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
gaze
image
line
sight
Prior art date
Application number
PCT/CN2021/102861
Other languages
English (en)
Chinese (zh)
Inventor
张代齐
张国华
袁麓
郑爽
李腾
黄为
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180001805.6A (CN113661495A)
Priority to PCT/CN2021/102861 (WO2023272453A1)
Publication of WO2023272453A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods

Definitions

  • Gaze tracking is an important support for upper-level applications such as distraction detection, takeover level estimation, and gaze interaction in the smart cockpit. Because the external characteristics of the eyes and the internal structure of the eyeball differ from person to person, it is usually impossible to train a gaze tracking model that is accurate for "everyone". At the same time, due to camera installation errors and other reasons, directly using the line of sight angle output by the line of sight tracking model causes a certain loss of accuracy, resulting in inaccurate line of sight estimation. If the error of gaze estimation can be corrected, the user experience of upper-level applications based on gaze tracking can be effectively improved.
  • the present application provides a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, which can effectively improve the accuracy of line-of-sight estimation for a specific user.
  • the first aspect of the present application provides a line of sight calibration method, including: according to a first image collected by a first camera and including the user's eyes, obtaining the three-dimensional position of the user's eyes and a first line of sight direction; according to the three-dimensional position of the eyes, the first line of sight direction, the external parameters of the first camera, and the external and internal parameters of the second camera, obtaining the gaze area of the user in a second image, the second image being collected by the second camera and including the scene outside the vehicle seen by the user; according to the user's gaze area in the second image and the second image, obtaining the position of the user's gaze point in the second image; according to the position of the gaze point and the internal parameters of the second camera, obtaining the three-dimensional position of the user's gaze point; and according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtaining a second line of sight direction of the user, the second line of sight direction being used as the calibrated line of sight direction.
  • the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
  • the first gaze direction is extracted from the first image based on a gaze tracking model.
  • a small number of samples and small-scale training can continuously improve the gaze tracking model's gaze estimation accuracy for a specific user, and then obtain a user-level gaze tracking model.
  • the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
  • the gaze point of the user in the second image can be obtained efficiently, accurately and stably.
  • the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
  • the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
  • the second aspect of the present application provides a line of sight calibration method, including:
  • the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
  • the display screen is an augmented reality head-up display.
  • the method further includes: using the user's second gaze direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a small sample learning method.
  • the third aspect of the present application provides a line of sight calibration device, including:
  • the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
  • the first line-of-sight determination unit is configured to obtain the first line-of-sight direction of the user according to the first image including the eyes of the user captured by the first camera;
  • the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional position of the eye, the first line of sight direction, the external parameters of the first camera, and the external parameters and internal parameters of the second camera, and the second image is collected by the second camera and includes the scene outside the car seen by the user;
  • the gaze point calibration unit is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
  • the second line-of-sight determination unit is configured to obtain a second line-of-sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, and the second line-of-sight direction is used as the calibrated line-of-sight direction.
  • the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
  • the first gaze direction is extracted from the first image based on a gaze tracking model.
  • the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional position of the eye, the first line of sight direction, the external parameters of the first camera, the external and internal parameters of the second camera, and the accuracy of the line-of-sight tracking model.
  • a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's gaze, and then obtain a user-level gaze tracking model.
  • the gaze point calibration unit is further configured to screen the gaze point according to the confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second gaze direction according to the confidence of the user's gaze point in the second image.
  • the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
  • the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
  • the gaze point position determination unit is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
  • the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
  • the second line of sight determination unit is configured to obtain the user's second line of sight direction according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
  • the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
  • the driver's line of sight calibration can be realized without affecting the safe driving of the driver.
  • the device further includes: an optimization unit configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a small sample learning method.
  • a fifth aspect of the present application provides a computing device, including:
  • the sixth aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, wherein, when the program instructions are executed by a computer, the computer executes the above sight calibration method.
  • the seventh aspect of the present application provides a driver monitoring system, including:
  • At least one memory stores program instructions, and when the program instructions are executed by the at least one processor, the at least one processor executes the line-of-sight calibration method of the first aspect above.
  • the accuracy of line-of-sight estimation for users such as drivers in the vehicle cockpit scene can be effectively improved, thereby improving the user experience of the driver monitoring system and of upper-layer applications such as distraction detection, takeover level estimation, and line-of-sight interaction in the smart cockpit.
  • the eighth aspect of the present application provides a vehicle, including the above-mentioned driver monitoring system.
  • Fig. 1 is a schematic diagram of an exemplary architecture of a system in an embodiment of the present application.
  • Fig. 2 is a schematic diagram of the installation position of the sensor in an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of a line of sight calibration method in an embodiment of the present application.
  • Fig. 5 is a schematic flow chart of eye three-dimensional position estimation in an embodiment of the present application.
  • Fig. 6 is an example diagram of a cockpit scene applicable to the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the gaze area in the reference coordinate system in the scene in FIG. 6 .
  • FIG. 8 is a schematic diagram of the gaze area in the second image in the scene in FIG. 6 .
  • Fig. 9 is a schematic flowchart of determining the gaze area of the user in the second image in an embodiment of the present application.
  • Fig. 10 is a projection example diagram between the gaze area in the reference coordinate system and the gaze area in the second image.
  • Fig. 11 is a schematic structural diagram of a gaze point calibration model in an embodiment of the present application.
  • Fig. 14 is a schematic diagram of the driver's line of sight calibration and model optimization process in the cockpit scene.
  • Fig. 16 is a schematic diagram of an exemplary architecture of a system in another embodiment of the present application.
  • Fig. 17 is a schematic flowchart of a line of sight calibration method in another embodiment of the present application.
  • Fig. 18 is a schematic structural diagram of a line-of-sight calibration device in another embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a computing device according to an embodiment of the present application.
  • Eye tracking/gaze tracking model: a machine learning model that can estimate the direction or point of gaze of human eyes from images containing human eyes or faces, for example, a neural network model.
  • Driver Monitoring System (DMS): based on image processing technology, voice processing technology, etc., monitors the status of the driver in the car. It includes components such as an in-car camera, a processor, and a fill light installed in the cockpit of the car, and the in-car camera can capture images including the driver's face, head, and part of the torso (e.g., arm) (i.e., the DMS image herein).
  • Time of Flight (TOF) camera: calculates the distance between the light pulse emitter and the target object by emitting light pulses to the target object and recording the round-trip time of the reflected light pulses, and generates a 3D image of the target object.
  • the 3D image includes the depth information of the target object and the information of the reflected light intensity.
  • Landmark algorithm: a type of facial feature point extraction technique.
  • the internal parameters of the camera determine the projection relationship from the three-dimensional space to the two-dimensional image, which is only related to the camera.
  • the internal parameters can include the scale factors of the camera along the two coordinate axes u and v of the image coordinate system, the principal point coordinates (x0, y0), and the coordinate axis tilt parameter s; the scale factor of the u axis is the ratio of the physical length of each pixel in the x direction of the image coordinate system to the camera focal length f, and the scale factor of the v axis is the ratio of the physical length of each pixel in the y direction of the image coordinate system to the camera focal length.
  • the internal parameters can include the scale factors of the camera along the two coordinate axes u and v of the image coordinate system, the principal point coordinates relative to the imaging plane coordinate system, the coordinate axis tilt parameter, and the distortion parameters; the distortion parameters can include three radial distortion parameters and two tangential distortion parameters of the camera.
  • the internal and external parameters of the camera can be obtained through Zhang Zhengyou calibration.
  • the internal reference and external reference of the first camera, and the internal reference and external reference of the second camera are all calibrated in the same world coordinate system.
  • the imaging plane coordinate system, i.e., the image coordinate system, takes the center of the image plane as the coordinate origin, with the X-axis and Y-axis respectively parallel to two perpendicular sides of the image plane; P(x, y) is usually used to represent its coordinate value. The image coordinate system expresses the position of a pixel in an image in physical units (for example, millimeters).
  • the pixel coordinate system, i.e., the image coordinate system in pixels, takes the upper left vertex of the image plane as the origin, with the X-axis and Y-axis parallel to the X-axis and Y-axis of the image coordinate system; p(u, v) is usually used to represent its coordinate value, and the pixel coordinate system expresses the position of a pixel in the image in units of pixels.
  • the coordinate value of the pixel coordinate system and the coordinate value of the camera coordinate system satisfy the relationship (2).
  • (u, v) represent the coordinates of the image coordinate system in units of pixels
  • (Xc, Yc, Zc) represent the coordinates in the camera coordinate system
  • K is the matrix representation of the internal camera parameters.
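  • Relation (2) itself is not reproduced in this extract; a standard pinhole-projection form consistent with the definitions above, given here as an assumption, is:

$$ Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2} $$

where f_x and f_y are the scale factors along the u and v axes, (x_0, y_0) is the principal point, and s is the coordinate axis tilt parameter.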
  • Few-shot learning means that after a neural network has pre-learned a large number of samples of certain known categories, it needs only a small number of labeled samples to learn a new category rapidly.
  • Meta-learning is an important branch of small-sample learning research.
  • Its main idea is that, when the target task has few training samples, the neural network is trained using a large number of small-sample tasks similar to the target small-sample task, so that the trained network has a good initial value on the target task; a small number of training samples of the target task are then used to adjust the trained network.
  • Model-agnostic meta-learning (MAML) algorithm: a specific meta-learning algorithm. Its idea is to train the initialization parameters of a machine learning model so that, after one or more updates of the parameters on a small amount of data from a new task, the model can obtain better performance.
  • soft argmax: an algorithm or function that obtains the coordinates of key points from a heat map; it can be implemented as a layer of a neural network, and the layer that realizes soft argmax may be called a soft argmax layer.
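  • As a rough illustration of this definition, the following sketch (the function name and the beta sharpening parameter are assumptions, not taken from the source) turns a heat map into key-point coordinates in the soft-argmax manner; in a network it would be realized as a differentiable layer:

```python
import numpy as np

def soft_argmax_2d(heat_map: np.ndarray, beta: float = 100.0) -> tuple:
    """Turn a heat map into (x, y) key-point coordinates in a differentiable way."""
    h, w = heat_map.shape
    # Softmax over all pixels so the map becomes a probability distribution.
    weights = np.exp(beta * (heat_map - heat_map.max()))
    weights /= weights.sum()
    # Expected pixel coordinates under that distribution.
    ys, xs = np.mgrid[0:h, 0:w]
    return float((weights * xs).sum()), float((weights * ys).sum())
```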
  • Head-Up Display (HUD), also known as a parallel display system, can project important driving information such as speed, engine speed, battery power, and navigation onto the windshield in front of the driver, so that the driver can see such vehicle parameters and driving information through the windshield display area without lowering or turning the head.
  • the first possible implementation is to collect a large amount of gaze data to train the gaze tracking model, deploy the trained gaze tracking model on the vehicle end, and the vehicle end uses the gaze tracking model to process the real-time collected images to finally obtain the user's line of sight.
  • This implementation mainly has the following defect: there may be large individual differences between the samples used to train the gaze tracking model and the current user (for example, individual differences in the internal structure of the human eye), so the gaze tracking model does not match the current user well, resulting in inaccurate estimation of the current user's line of sight.
  • the second possible implementation is: use a screen to display a specific image, calibrate the gaze tracking device through the interaction between the user of the gaze tracking device and the specific image on the screen, and obtain parameters for the user, thereby improving the accuracy of the gaze tracking device for that user. This implementation method mainly has the following defects: it relies on the active cooperation of the user, the operation is cumbersome, and calibration errors may be caused by improper human operation, which ultimately affects the accuracy of the eye-tracking device for the user. At the same time, because it is difficult to deploy a large enough display screen directly in front of the driver in the cockpit, this implementation method is not suitable for the cockpit scene.
  • the optimization samples including the user's second gaze direction and the first image are also used to optimize the gaze tracking model through a few-shot learning method, so as to improve the gaze tracking model's estimation accuracy for the user's gaze, thereby obtaining a user-level gaze tracking model. This solves the problems that the gaze tracking model is difficult to optimize and that gaze estimation accuracy is low for some users.
  • the embodiments of the present application may be applicable to any application scenario that requires real-time calibration or estimation of a person's gaze direction.
  • the embodiments of the present application may be applicable to the calibration or estimation of the line of sight of drivers and/or passengers in the cockpit environment of automobiles, boats, aircraft, and other vehicles.
  • the embodiments of the present application may also be applicable to other scenarios, for example, performing line-of-sight calibration or estimation on a person wearing wearable glasses or other devices.
  • the embodiment of the present application may also be applied to other scenarios, which will not be listed one by one here.
  • the first camera 110 is responsible for capturing the user's eye image (ie, the first image hereinafter).
  • the first camera 110 may be an in-vehicle camera in the DMS, and the in-vehicle camera is used to photograph the driver in the cockpit.
  • the in-vehicle camera is a DMS camera that can be installed near the A-pillar of the car (position 1 in Figure 2) or near the steering wheel, and the DMS camera is preferably a higher-resolution RGB camera.
  • the human eye image (i.e., the first image hereinafter) generally refers to various types of images including human eyes, for example, a human face image, a bust image including a human face, and the like.
  • the human eye image (that is, the first image hereinafter) may be a human face image.
  • the second camera 120 is responsible for collecting a scene image (that is, the second image below), which includes the scene outside the vehicle seen by the user, that is, the field of view of the second camera 120 and the field of view of the user at least partially overlap .
  • the second camera 120 may be an exterior camera, and the exterior camera may be used to capture the scene in front of the vehicle seen by the driver.
  • the camera outside the vehicle can be a front camera installed above the front windshield of the vehicle (position 2 in Figure 2), which can capture the scene in front of the vehicle, that is, the scene outside the vehicle seen by the driver,
  • the front camera is preferably a TOF camera, which can collect depth images, so as to obtain the distance between the vehicle and the target object in front (for example, the object that the user is looking at) through the image.
  • the image processing system 130 is an image processing system capable of processing DMS images and scene images; it can run a gaze tracking model to obtain the user's preliminary gaze data and use the preliminary gaze data (i.e., the first gaze direction below) to perform the line of sight calibration method described hereinafter, obtaining the user's calibrated gaze data (i.e., the second gaze direction hereinafter), thereby improving the accuracy of the user's gaze data.
  • the model optimization system 140 can be responsible for optimizing the gaze tracking model: it can optimize the gaze tracking model using the user's calibrated gaze data provided by the image processing system 130 and provide the optimized gaze tracking model to the image processing system 130, thereby improving the accuracy of the gaze tracking model's estimation of the user's line of sight.
  • the above exemplary system 100 may further include a model training system 150, which is responsible for training a gaze tracking model, which may be deployed in the cloud.
  • the model optimization system 140 and the model training system 150 can be realized by the same system.
  • the camera coordinate system of the first camera 110 may be a Cartesian coordinate system Xc1-Yc1-Zc1
  • the camera coordinate system of the second camera 120 may be a Cartesian coordinate system Xc2-Yc2-Zc2
  • the image coordinate system and pixel coordinate system of the first camera 110 and the second camera 120 are not shown in FIG. 2 .
  • the camera coordinate system of the first camera 110 is used as the reference coordinate system, that is, positions and angles hereinafter are expressed as coordinates and/or angles in the camera coordinate system of the first camera 110.
  • the reference coordinate system can be freely selected according to various factors such as actual needs, specific application scenarios, and calculation complexity requirements, but is not limited thereto.
  • the cockpit coordinate system of the vehicle may also be used as the reference coordinate system.
  • FIG. 3 shows an exemplary flow of the line of sight calibration method in this embodiment.
  • an exemplary line of sight calibration method in this embodiment may include the following steps:
  • Step S301 according to the first image collected by the first camera 110 including the user's eyes, obtain the three-dimensional position of the user's eyes and the first line of sight direction;
  • Step S302 according to the three-dimensional position of the eyes, the first line of sight direction, the external parameters of the first camera 110 and the external parameters and internal parameters of the second camera 120, obtain the gaze area of the user in the second image, and the second image is collected by the second camera 120 and includes the scene outside the car seen by the user;
  • Step S303 according to the user's gaze area in the second image and the second image, obtain the position of the user's gaze point in the second image;
  • Step S304 according to the position of the user's gaze point in the second image and the internal parameters of the second camera 120, obtain the three-dimensional position of the user's gaze point;
  • Step S305 according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, the second line of sight direction of the user is obtained, and the second line of sight direction is used as the line of sight direction after calibration.
  • the line of sight calibration method of this embodiment can use the second image to calibrate the user’s line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user’s line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking .
  • the first gaze direction is extracted from the first image based on a gaze tracking model.
  • the eye-tracking model can be trained by the model training system 150 deployed on the cloud and provided to the image processing system 130 deployed on the user's vehicle.
  • the image processing system 130 runs the eye-tracking model to process the first image including the user's eyes, obtaining the user's first gaze direction.
  • the gaze direction may be represented by a viewing angle and/or a gaze vector in a reference coordinate system.
  • the angle of view may be the angle between the line of sight and the axis of the eyes, and the intersection of the line of sight and the axis of the eyes is the three-dimensional position of the user's eyes.
  • the sight vector is a direction vector starting from the position of the eye in the reference coordinate system and ending at the position of the gaze point in the reference coordinate system.
  • the direction vector can include the three-dimensional coordinates of the eye reference point in the reference coordinate system and the three-dimensional coordinates of the gaze point in the reference coordinate system.
  • the fixation point refers to the point at which the user's eyes are fixed. Taking the cockpit scene as an example, the driver's gaze point is the specific position where the driver's eyes are looking. A gaze point can be represented by its position in space. In this embodiment, the three-dimensional position of the gaze point is represented by the three-dimensional coordinates of the gaze point in the reference coordinate system.
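  • The sight vector defined above follows directly from the two three-dimensional positions; a minimal sketch (function name and example coordinates are illustrative only):

```python
import numpy as np

def gaze_vector(eye_xyz, gaze_point_xyz) -> np.ndarray:
    """Unit direction vector from the eye reference point to the gaze point,
    both given as 3D coordinates in the same reference coordinate system."""
    v = np.asarray(gaze_point_xyz, dtype=float) - np.asarray(eye_xyz, dtype=float)
    return v / np.linalg.norm(v)

# Example: eye near the reference-coordinate origin, gaze point a few metres ahead.
direction = gaze_vector([0.0, 0.0, 0.0], [0.2, -0.1, 5.0])
```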
  • Fig. 5 shows an exemplary process of eye three-dimensional position estimation.
  • the exemplary process of eye three-dimensional position estimation may include: step S501, use a face detection algorithm and a facial feature point detection algorithm to process the first image, and obtain the positions of the user's facial feature points in the first image; step S502, combine the positions of the user's facial feature points in the first image with a pre-acquired standard 3D face model for a PnP solution, and solve the 3D coordinates of the user's facial feature points in the reference coordinate system; step S503, extract the 3D coordinates of the user's eye reference point from the 3D coordinates of the user's facial feature points in the reference coordinate system as the 3D coordinates of the user's eyes.
  • FIG. 5 is only an example, and is not intended to limit a specific implementation manner of eye three-dimensional position estimation in this embodiment.
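  • A hedged sketch of how the flow of Fig. 5 could be realized with OpenCV (the landmark detector itself is omitted, and the function name, the averaging over eye landmark indices, and the use of cv2.solvePnP and cv2.Rodrigues are illustrative assumptions rather than the patent's prescribed implementation):

```python
import cv2
import numpy as np

def eye_position_from_landmarks(image_points_2d: np.ndarray,
                                model_points_3d: np.ndarray,
                                camera_matrix: np.ndarray,
                                dist_coeffs: np.ndarray,
                                eye_indices: list) -> np.ndarray:
    """Steps S501-S503: solve PnP between detected 2D facial landmarks and a standard
    3D face model, then read off the eye reference point in the camera (reference) frame."""
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d,
                                  camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP solution failed")
    rotation, _ = cv2.Rodrigues(rvec)                  # rotation vector -> 3x3 matrix
    pts_cam = (rotation @ model_points_3d.T + tvec).T  # model points in camera coordinates
    return pts_cam[eye_indices].mean(axis=0)           # 3D eye position
```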
  • the camera perspective projection model can be used to determine the gaze area of the user in the second image (hereinafter, the "gaze area in the second image" is simply referred to as the "second gaze area").
  • the camera perspective projection model may be a pinhole imaging model or a nonlinear perspective projection model.
  • step S302 may include: according to the three-dimensional position of the user's eyes, the first line of sight direction, the external parameters of the first camera 110, the internal and external parameters of the second camera 120, and the accuracy of the gaze tracking model, obtaining the gaze area of the user in the second image.
  • the error caused by the limitation of the accuracy of the line-of-sight tracking model can be eliminated in the finally obtained second line-of-sight direction.
  • Fig. 6 shows a scene where a driver (not shown in the figure) in the cockpit environment looks at pedestrians in the crosswalk in front of the vehicle.
  • Fig. 9 shows an exemplary flow of determining the second gaze area of the user.
  • the process of obtaining the user's second gaze area may include the following steps:
  • Step S901 determine the gaze area S1 of the user in the reference coordinate system according to the three-dimensional position of the user's eyes and the first line of sight direction.
  • the user's line of sight ON in the reference coordinate system is obtained.
  • the average accuracy value of the gaze tracking model is expressed as Δθ, where Δθ represents the error value of the viewing angle; the lower the accuracy of the gaze tracking model, the greater the value of Δθ.
  • the line of sight angle θ can therefore be widened to an interval [θ-Δθ, θ+Δθ], and the cone bounded by the line of sight with viewing angle θ-Δθ and the line of sight with viewing angle θ+Δθ is used as the user's gaze area S1 in the reference coordinate system.
  • Fig. 7 shows the visualized graphics of the driver's gaze area S1 in the reference coordinate system in the scene shown in Fig. 6, where O represents the three-dimensional position of the eyes, the solid line with arrows represents the first line of sight direction ON, θ represents the first line of sight angle, Δθ represents the average precision value of the gaze tracking model, and the dotted cone represents the user's gaze area S1 in the reference coordinate system.
  • Fig. 8 shows the second image, captured by the second camera, of the scene shown in Fig. 6; only the part the driver is looking at is shown, content irrelevant to this embodiment in the scene of Fig. 6 is omitted, and Fig. 8 marks the user's second gaze area Q.
  • the projection process in this step can be realized by formula (1) and relation (2). Specifically, first, based on the external parameters of the first camera 110 and the external parameters of the second camera 120, the gaze area S1 is transformed into the camera coordinate system of the second camera 120 according to formula (1), obtaining the gaze area S2; then, based on the internal parameters of the second camera 120, the gaze area S2 is projected into the pixel coordinate system of the second camera 120 according to relation (2), obtaining the second gaze area Q of the user.
  • the extrinsics of the first camera 110 and the extrinsics of the second camera 120 are calibrated in the same world coordinate system.
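  • As a rough Python sketch of this two-step projection (the 4×4 homogeneous extrinsic matrices, the function name, and treating the gaze cone through a set of boundary vertices are assumptions, since formula (1) and relation (2) are not reproduced in this extract):

```python
import numpy as np

def project_gaze_region(vertices_cam1: np.ndarray,
                        T_cam1_to_world: np.ndarray,
                        T_cam2_to_world: np.ndarray,
                        K2: np.ndarray) -> np.ndarray:
    """Map 3D boundary points of gaze area S1 (first-camera coordinates) to pixel
    coordinates in the second image, yielding the second gaze area Q."""
    pts_h = np.hstack([vertices_cam1, np.ones((len(vertices_cam1), 1))])
    # Extrinsic chaining (role of formula (1)): camera 1 -> world -> camera 2.
    pts_cam2 = (np.linalg.inv(T_cam2_to_world) @ T_cam1_to_world @ pts_h.T).T[:, :3]
    # Intrinsic projection (role of relation (2)): camera-2 coordinates -> pixels.
    uvw = (K2 @ pts_cam2.T).T
    return uvw[:, :2] / uvw[:, 2:3]
```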
  • the gaze area S1 is projected on the imaging surface of the second camera 120 as a quadrilateral second gaze area Q through the external parameters of the first camera 110 and the internal parameters and external parameters of the second camera 120.
  • the lower the accuracy of the gaze tracking model, the larger the value of Δθ, the larger the opening angle of the user's gaze area S1 in the reference coordinate system, and the larger the width of the quadrilateral second gaze area Q.
  • FIG. 10 shows an exemplary projection diagram of a line of sight OX.
  • the projection of a point x with different depths on the line of sight OX on the imaging plane of the second camera 120 is O'X'.
  • the mapping point of the origin of the human line of sight is O'
  • the first line of sight direction L is mapped to line of sight L'.
  • FIGS. 7 to 10 are only examples, and the method for obtaining the second gaze area in the embodiment of the present application is not limited thereto.
  • the gaze point of the user in the second image can be obtained based on the second gaze area and the second image through a pre-trained gaze point calibration model (herein, "the gaze point in the second image" is referred to as "the second gaze point" for short).
  • the gaze point calibration model can be any machine learning model available for image processing. Considering the high precision and good stability of the neural network, in the embodiment of the present application, the gaze point calibration model is preferably a neural network model.
  • the decoding network outputs the heat map Fig3, and the gray value of each pixel in the heat map Fig3 indicates the probability that the corresponding pixel is the fixation point.
  • the heat map Fig3 is processed by the soft-argmax normalization layer to obtain the position of the gaze point in the second image, that is, the coordinates (x, y) of the pixel corresponding to the gaze point in the second image.
  • a line of sight has a fixation point, and each fixation point may contain one or more pixels in the second image.
  • the fixation point calibration model can be obtained by pre-training.
  • the scene image and its corresponding gaze-area grayscale image (the extent of the gaze area in the grayscale image is a set value) are used as samples, and the real gaze area of each sample is known.
  • the ResNet part and the soft-argmax normalization layer are trained at the same time, but different loss functions are used.
  • the embodiment of this application does not limit the specific loss function used.
  • the loss function of the ResNet part can be binary cross entropy (BCE loss)
  • the loss function of the soft-argmax normalization layer can be the mean square error (MSE loss).
  • the decoding network in the ResNet part can use pixel-level binary cross-entropy as a loss function, and the expression is shown in the following formula (3).
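  • Formula (3) itself is not reproduced in this extract; the standard pixel-level binary cross-entropy consistent with the symbol definitions below would be:

$$ \mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right) \right] \tag{3} $$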
  • y_i is the label of whether pixel i is the fixation point, which is 1 when it is the fixation point and 0 when it is not the fixation point.
  • p(y_i) is the probability value that pixel i is the fixation point in the heat map Fig3 output by the decoding network.
  • N is the total number of pixels of the second image Fig1, that is, the total number of pixels of the heat map Fig3.
  • step S304 according to the position of the user's gaze point in the second image and the internal reference of the second camera 120, there are many specific implementations for obtaining the three-dimensional position of the user's gaze point.
  • the three-dimensional position of the gaze point is the three-dimensional coordinates of the gaze point in the reference coordinate system (the camera coordinate system of the first camera 110). It can be understood that any algorithm that obtains the position of a point in space based on its position in an image can be applied to step S304.
  • step S304 it is preferable to obtain the three-dimensional position of the gaze point through inverse perspective transformation.
  • the Z-axis coordinate of the gaze point in the reference coordinate system can be obtained simply by obtaining the depth of the second gaze point; combined with the position of the second gaze point obtained in step S303, that is, the pixel coordinates (u, v), the three-dimensional coordinates of the gaze point in the reference coordinate system, that is, the three-dimensional position of the gaze point, can be obtained through a simple inverse perspective transformation.
  • step S304 may include: step S3041, using the second image, obtain the depth of the second gaze point based on a monocular depth estimation algorithm, the depth being the distance h of the gaze point relative to the second camera 120, and use the distance h to estimate the Z-axis coordinate Zc2 of the gaze point in the camera coordinate system of the second camera; step S3042, according to the position of the second gaze point, that is, the pixel coordinates (u, v), and the Z-axis coordinate of the gaze point in the camera coordinate system of the second camera, obtain the three-dimensional coordinates of the gaze point in the reference coordinate system based on the internal and external parameters of the second camera 120 and the external parameters of the first camera 110.
  • the distance of each pixel in the second image relative to the second camera 120 can be calculated from the second image through a monocular depth estimation algorithm such as FastDepth, and the distance h of the second gaze point relative to the second camera 120 can be extracted from it according to the position of the second gaze point, that is, its pixel coordinates.
  • various applicable algorithms may be used for depth estimation.
  • in step S3042, according to the position of the second gaze point, that is, the pixel coordinates (u, v), the Z-axis coordinate Zc2 of the gaze point obtained in step S3041, and the internal parameters of the second camera 120, the coordinate values (Xc2, Yc2, Zc2) of the gaze point in the camera coordinate system of the second camera 120 are obtained; then, based on the external parameters of the second camera 120 and the external parameters of the first camera 110, the coordinate values (Xc2, Yc2, Zc2) of the gaze point in the camera coordinate system of the second camera 120 can be transformed by formula (1) to obtain the coordinate values (Xc1, Yc1, Zc1) of the gaze point in the camera coordinate system of the first camera 110, and the coordinate values (Xc1, Yc1, Zc1) are the three-dimensional position of the gaze point.
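  • A minimal sketch of this inverse perspective step (the function name and the 4×4 transform T_cam2_to_cam1, assembled from the two cameras' extrinsic parameters, are assumptions, since formula (1) is not reproduced here):

```python
import numpy as np

def gaze_point_3d(u: float, v: float, depth_zc2: float,
                  K2: np.ndarray, T_cam2_to_cam1: np.ndarray) -> np.ndarray:
    """Back-project the second gaze point (pixel (u, v) plus estimated depth Zc2) into
    the second camera's frame, then transform it into the reference coordinate system."""
    # Inverse perspective: pixel + depth -> 3D point in the second camera's coordinates.
    xyz_cam2 = depth_zc2 * (np.linalg.inv(K2) @ np.array([u, v, 1.0]))
    # Rigid transform into the first camera's coordinate system (role of formula (1)).
    return (T_cam2_to_cam1 @ np.append(xyz_cam2, 1.0))[:3]
```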
  • a line of sight has one fixation point, but due to the limitation of accuracy, multiple fixation points may be obtained corresponding to the same line of sight.
  • the gaze points can be screened according to the confidence of the user's gaze point in the second image, so that the subsequent steps for obtaining the second line of sight direction are performed only on the screened gaze points, which ensures the accuracy of the second line of sight direction while reducing the amount of calculation and improving processing efficiency.
  • the screening of gaze points can be performed before step S304, and can also be performed after step S304.
  • the gaze point calibration model also provides the probability value of the second gaze point, and the confidence degree of the second gaze point can be determined by the probability value.
  • the heat map provided by the gaze point calibration model includes a probability value for the second gaze point, which represents the probability that the second gaze point is the real gaze point; a higher probability value indicates a higher possibility that the corresponding second gaze point is the real gaze point. The probability value may be used directly as the confidence of the second gaze point, or a proportional function value of the probability value may be used as the confidence of the second gaze point. Therefore, the confidence of the second gaze point can be obtained without separate calculation, which improves processing efficiency and reduces computational complexity.
  • only gaze points whose confidence level of the second gaze point exceeds a preset first confidence threshold (for example, 0.9) or whose confidence level is relatively highest may be selected. If there are still multiple gaze points with the relatively highest confidence level of the second gaze point or exceeding the first confidence threshold, one or more gaze points may be randomly selected from these gaze points. Certainly, if there are still multiple gaze points whose confidence level of the second gaze point exceeds the first confidence threshold or the gaze point with the relatively highest confidence degree of the second gaze point, these multiple gaze points may also be reserved at the same time.
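  • A minimal sketch of this screening, assuming the candidate second gaze points are available as pixel coordinates paired with the heat-map probability used as confidence (the data structure and the fallback to the single most confident candidate are illustrative choices):

```python
def filter_gaze_points(candidates, confidence_threshold=0.9):
    """candidates: list of ((u, v), confidence) pairs for one line of sight.
    Keep those above the threshold; otherwise keep only the most confident one."""
    kept = [c for c in candidates if c[1] > confidence_threshold]
    return kept if kept else [max(candidates, key=lambda c: c[1])]
```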
  • the line-of-sight calibration in steps S301 to S305 in the embodiment of the present application may be performed by the image processing system 130 in the system 100 .
  • FIG. 13 shows an exemplary implementation process of eye-tracking model optimization in step S306.
  • the exemplary process may include: step S3061, the image processing system 130 stores the second line of sight direction and its corresponding first image as the user's optimization sample in the user's sample library; the sample library can be associated with user information (for example, user identity information) to facilitate query, and is deployed in the model optimization system 140.
  • the model optimization system 140 optimizes the user's gaze tracking model obtained in the previous optimization based on the small-sample learning method by using the newly added optimization samples in the user's sample library.
  • step S3062 can be performed regularly or when the number of newly added optimization samples reaches a certain number or other preset conditions are met.
  • the sample library update of step S3061 can be performed in real time.
  • the user's optimized sample may be selectively uploaded to improve the quality of the optimized sample, reduce unnecessary optimization operations, and reduce hardware loss caused by model optimization.
  • the second gaze direction may be screened according to the confidence of the second gaze point, and only optimized samples formed by the screened second gaze direction and its corresponding first image are uploaded.
  • the screening of the second gaze direction may include but is not limited to: 1) selecting second gaze directions whose second-gaze-point confidence is greater than a preset second confidence threshold (for example, 0.95); 2) selecting the second gaze direction whose second gaze point has the relatively highest confidence.
  • the few-shot learning method can be implemented by any algorithm that can optimize the gaze tracking model with a small number of samples.
  • the user's optimization samples can be used to optimize the gaze tracking model using the MAML algorithm, so as to realize the optimization of the gaze tracking model based on the small sample learning method.
  • a gaze tracking model that is more suitable for a specific user's individual characteristics can be obtained through a small number of samples, with a small amount of data and low computational complexity, which is conducive to reducing hardware loss and hardware cost.
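  • As a hedged illustration of per-user adaptation in the spirit of MAML's inner loop (the optimizer, loss, learning rate, and step count are assumptions, and the meta-training that produces the initialization is not shown), a few gradient steps on the user's optimization samples might look like:

```python
import torch

def adapt_gaze_model(model: torch.nn.Module, user_samples, inner_lr=1e-3, steps=5):
    """user_samples: a small list of (first_image_batch, calibrated_gaze_batch) tensors.
    Fine-tune meta-learned initial parameters on the user's few optimization samples."""
    optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)
    loss_fn = torch.nn.MSELoss()
    model.train()
    for _ in range(steps):
        for image, calibrated_gaze in user_samples:
            optimizer.zero_grad()
            loss = loss_fn(model(image), calibrated_gaze)
            loss.backward()
            optimizer.step()
    return model
```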
  • FIG. 14 illustrates an exemplary process flow for the system 100 to perform line of sight calibration and model optimization in a cockpit environment.
  • the processing flow may include: step S1401, the camera in vehicle G captures the DMS image (i.e., the first image) of driver A in the cockpit of the vehicle, the DMS image including the face of driver A; the image processing system 130 at the vehicle end of vehicle G runs the gaze tracking model to infer the initial line-of-sight direction (i.e., the first line-of-sight direction).
  • Fig. 15 shows an exemplary structure of a sight calibration device 1500 provided in this embodiment.
  • the line of sight calibration device 1500 of this embodiment may include:
  • the eye position determination unit 1501 is configured to obtain the three-dimensional position of the user's eyes according to the first image collected by the first camera including the user's eyes;
  • the gaze point calibration unit 1504 is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
  • the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye, and the second line of sight direction is used as a calibrated line of sight direction.
  • the first gaze direction is extracted from the first image based on a gaze tracking model.
  • the gaze calibration device further includes: an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking based on a small sample learning method Model.
  • the gaze point calibration unit 1504 can also be configured to screen the gaze points according to the confidence of the user's gaze point in the second image; and/or the optimization unit 1507 can also be configured to screen the second gaze direction according to the confidence of the user's gaze point in the second image.
  • the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
  • the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence level is determined by the probability value.
  • FIG. 16 shows an exemplary architecture of a system 1600 applicable to this embodiment.
  • the exemplary system 1600 of this embodiment is basically the same as the system 100 of Embodiment 1; the difference is that in the exemplary system 1600 of this embodiment the second camera 120 is an optional component, and the system includes a display screen 160, which can be deployed at the vehicle end and realized through existing display components in the vehicle-end equipment.
  • Other parts of the system 1600 in this embodiment, namely the first camera 110, the image processing system 130, the model optimization system 140, and the model training system 150, have basically the same functions as the corresponding parts in the system 100 in Embodiment 1 and will not be repeated here.
  • This embodiment uses a display screen 160 whose positional relationship with the first camera 110 (that is, the in-car camera) is calibrated, and relies on the user gazing at reference points on the display screen 160 to realize the calibration of the user's line of sight and obtain optimization samples, with which the eye-tracking model performs few-shot learning to improve its accuracy.
  • Fig. 17 shows an exemplary flow of the line of sight calibration method in this embodiment.
  • the line of sight calibration method of this embodiment may include the following steps:
  • Step S1701 in response to the user's gazing operation on the reference point on the display screen 160, obtain the three-dimensional position of the user's gazing point;
  • the method may further include: controlling the display screen 160 to provide a line of sight calibration interface to the user, the line of sight calibration interface including a visual prompt for reminding the user to gaze at the reference point, so that the user performs the corresponding gaze operation according to the visual prompt.
  • the specific form of the line-of-sight calibration interface is not limited by this embodiment.
  • the gazing operation may be any operation related to the user gazing at the reference point on the display screen 160, and the embodiment of the present application does not limit the specific implementation or expression of the gazing operation.
  • the gaze operation may include inputting confirmation information in the gaze calibration interface while the user gazes at a reference point in the gaze calibration interface.
  • the display screen 160 may be, but not limited to, an AR-HUD of a vehicle, a dashboard of a vehicle, a portable electronic device of a user, or others.
  • the line of sight calibration in the cockpit scene is mainly aimed at the driver or the co-pilot. Therefore, in order to ensure that the line of sight calibration does not affect safe driving, the display screen 160 is preferably an AR-HUD.
  • the three-dimensional coordinates of each reference point on the display screen 160 in the camera coordinate system of the first camera 110 may be pre-calibrated through the positional relationship between the display screen 160 and the first camera 110 . In this way, if the user gazes at a reference point, the reference point is the user's gaze point, and the three-dimensional coordinates of the reference point in the camera coordinate system of the first camera 110 are the three-dimensional position of the user's gaze point.
  • The three-dimensional position of the user's eyes is obtained according to the first image, collected by the first camera, that includes the user's eyes; the specific implementation is the same as that of obtaining the three-dimensional position of the eyes in step S301 in the first embodiment, and will not be repeated here.
  • Step S1703 according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, the second line of sight direction of the user is obtained.
  • The specific implementation of this step is the same as that of step S305 in the first embodiment, and will not be repeated here.
  • the sight calibration method of this embodiment can obtain the three-dimensional position of the user's gaze point by using the reference point and, at the same time, obtain the three-dimensional position of the user's eyes in combination with the first image, thereby obtaining the second sight direction with high accuracy. It can be seen that the line-of-sight calibration method of this embodiment can not only effectively improve the accuracy of user line-of-sight estimation, but is also simple to operate, has low computational complexity and high processing efficiency, and is suitable for the cockpit environment.
  • the gaze calibration method of this embodiment may further include: step S1704, using the user's second gaze direction and the first image as the user's optimization samples, and optimizing the gaze tracking model based on the small sample learning method.
  • a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's gaze, and then obtain a user-level gaze tracking model.
  • the specific implementation manner of this step is the same as that of step S306 in the first embodiment, and will not be repeated here. Since the three-dimensional position of the gaze point in this step is obtained through calibration, its accuracy is relatively high. Therefore, there is no need to screen the second gaze direction before step S1704 in this embodiment.
  • Fig. 18 shows an exemplary structure of a sight calibration device 1800 provided in this embodiment.
  • the line of sight calibration device 1800 of this embodiment may include:
  • the gaze point position determination unit 1801 is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
  • the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye.
  • the display screen is an augmented reality head-up display.
  • the device further includes: an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.
  • FIG. 19 is a schematic structural diagram of a computing device 1900 provided by an embodiment of the present application.
  • the computing device 1900 includes: a processor 1910 and a memory 1920 .
  • the computing device 1900 may also include a communication interface 1930 and a bus 1940 . It should be understood that the communication interface 1930 in the computing device 1900 shown in FIG. 19 can be used to communicate with other devices.
  • the memory 1920 and the communication interface 1930 can be connected to the processor 1910 through the bus 1940 .
  • only one line is used to represent the bus in FIG. 19, but it does not mean that there is only one bus or one type of bus.
  • the processor 1910 may be connected to the memory 1920 .
  • the memory 1920 can be used to store program codes and data. Therefore, the memory 1920 may be a storage unit inside the processor 1910, an external storage unit independent of the processor 1910, or a combination of a storage unit inside the processor 1910 and an external storage unit independent of the processor 1910.
  • the processor 1910 may be a central processing unit (central processing unit, CPU).
  • the processor can also be other general-purpose processors, digital signal processors (digital signal processors, DSPs), application specific integrated circuits (application specific integrated circuits, ASICs), off-the-shelf programmable gate arrays (field programmable gate arrays, FPGAs) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 1910 adopts one or more integrated circuits for executing related programs, so as to realize the technical solutions provided by the embodiments of the present application.
  • the memory 1920 may include read-only memory and random-access memory, and provides instructions and data to the processor 1910.
  • a portion of the processor 1910 may also include non-volatile random access memory; for example, the processor 1910 may also store device type information.
  • the processor 1910 executes the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the above-mentioned embodiments.
  • the computing device 1900 may correspond to the entity that executes the methods according to the embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 1900 are intended to implement the corresponding processes of the methods in those embodiments; for brevity, they are not repeated here.
  • the embodiment of the present application also provides a driver monitoring system, which includes the above-mentioned first camera 110, second camera 120, and computing device 1900.
  • the first camera 110 is configured to capture a first image including the eyes of the user;
  • the second camera 120 is configured to capture a second image including the scene seen by the user;
  • both the first camera 110 and the second camera 120 can communicate with the computing device 1900.
  • the processor 1910 executes the computer-executable instructions in the memory 1920, using the first image provided by the first camera 110 and the second image provided by the second camera 120, to perform the operation steps of the line-of-sight calibration method in the first embodiment above.
  • the driver monitoring system may further include a display screen configured to display reference points to the user.
  • the processor 1910 executes the computer-executable instructions in the memory 1920, using the first image provided by the first camera 110 and the three-dimensional position of the reference point displayed on the display screen, to perform the operation steps of the line-of-sight calibration method in the second embodiment above.
  • the driver monitoring system can also include a cloud server, which can be configured to use the user's second gaze direction and the first image provided by the computing device 1900 as the user's optimization samples, optimize the gaze tracking model based on the few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, so as to improve the estimation accuracy of the gaze tracking model for the user's gaze.
  • the architecture of the driver monitoring system can refer to the system shown in FIG. 1 in the first embodiment and the system shown in FIG. 16 in the second embodiment.
  • the image processing system 130 can be deployed in the computing device 1900, and the above-mentioned model optimization system 140 can be deployed in the cloud server.
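  • Purely for orientation, the cooperation of the first camera 110, the second camera 120, the computing device 1900, and an optional cloud server could be sketched as the loop below; all class and method names, and the sample batch size, are hypothetical placeholders rather than interfaces defined by this application.

```python
def monitoring_loop(first_camera, second_camera, computing_device, cloud_server=None):
    """Hypothetical driver-monitoring loop: collect optimization samples on the computing
    device and, optionally, let a cloud server run the few-shot optimization and return
    an updated gaze tracking model."""
    samples = []
    while computing_device.is_running():
        first_image = first_camera.capture()       # image containing the user's eyes
        second_image = second_camera.capture()     # image of the scene seen by the user
        second_dir = computing_device.calibrate_gaze(first_image, second_image)
        if second_dir is not None:
            samples.append((first_image, second_dir))
        if cloud_server is not None and len(samples) >= 32:   # batch size is an assumption
            new_model = cloud_server.optimize_gaze_model(samples)
            computing_device.load_gaze_model(new_model)
            samples.clear()
```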
  • an embodiment of the present application also provides a vehicle, which may include the above-mentioned driver monitoring system.
  • the vehicle is a motor vehicle, which may be, but is not limited to, a sport utility vehicle, a bus, a large truck, or any of various commercial passenger vehicles; it may also be, but is not limited to, any of various watercraft or aircraft, as well as a hybrid vehicle, an electric vehicle, a plug-in hybrid electric vehicle, a hydrogen-powered vehicle, or another alternative-fuel vehicle.
  • the hybrid vehicle can be any vehicle with two or more power sources, for example, a vehicle with gasoline and electric power sources.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it is used to execute a line-of-sight calibration method, and the method includes at least one of the solutions described in the above-mentioned embodiments.
  • the computer storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more leads, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, or GTE).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of intelligent driving and provides a gaze calibration method and apparatus, as well as a device, a computer-readable storage medium, a system, and a vehicle. According to the present application, a three-dimensional position of a user's eyes is obtained by means of a first image including the user's eyes, a three-dimensional position of the user's gaze point is obtained by means of a calibration position on a display screen or of a second image including the scene outside a vehicle as seen by the user, and a high-precision second gaze direction is obtained from the three-dimensional position of the eyes and the three-dimensional position of the user's gaze point, so that the estimation accuracy of the user's gaze is effectively improved and the present application is suitable for a vehicle-cabin scenario. In addition, according to the application, an optimization sample including the user's second gaze direction and the corresponding first image is also used to optimize a gaze tracking model by means of a few-shot learning method, which makes it possible to improve the gaze estimation accuracy of the gaze tracking model for a specific user and thus to obtain a gaze tracking model with high accuracy for that specific user.
PCT/CN2021/102861 2021-06-28 2021-06-28 Procédé et appareil d'étalonnage du regard, dispositif, support de stockage lisible par ordinateur, système et véhicule WO2023272453A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180001805.6A CN113661495A (zh) 2021-06-28 2021-06-28 视线校准方法及装置、设备、计算机可读存储介质、系统、车辆
PCT/CN2021/102861 WO2023272453A1 (fr) 2021-06-28 2021-06-28 Procédé et appareil d'étalonnage du regard, dispositif, support de stockage lisible par ordinateur, système et véhicule

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102861 WO2023272453A1 (fr) 2021-06-28 2021-06-28 Procédé et appareil d'étalonnage du regard, dispositif, support de stockage lisible par ordinateur, système et véhicule

Publications (1)

Publication Number Publication Date
WO2023272453A1 true WO2023272453A1 (fr) 2023-01-05

Family

ID=78494760

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102861 WO2023272453A1 (fr) 2021-06-28 2021-06-28 Procédé et appareil d'étalonnage du regard, dispositif, support de stockage lisible par ordinateur, système et véhicule

Country Status (2)

Country Link
CN (1) CN113661495A (fr)
WO (1) WO2023272453A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052235B (zh) * 2022-05-31 2023-10-20 荣耀终端有限公司 注视点估计方法及电子设备
CN115661913A (zh) * 2022-08-19 2023-01-31 北京津发科技股份有限公司 一种眼动分析方法及系统
CN115840502B (zh) * 2022-11-23 2023-07-21 深圳市华弘智谷科技有限公司 三维视线追踪方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018170538A1 (fr) * 2017-03-21 2018-09-27 Seeing Machines Limited Système et procédé de capture de données de position de regard réelle
CN109849788A (zh) * 2018-12-29 2019-06-07 北京七鑫易维信息技术有限公司 信息提供方法、装置及系统
CN110341617A (zh) * 2019-07-08 2019-10-18 北京七鑫易维信息技术有限公司 眼球追踪方法、装置、车辆和存储介质
CN111427451A (zh) * 2020-03-25 2020-07-17 中国人民解放军海军特色医学中心 采用扫描仪与眼动仪确定注视点在三维场景中位置的方法
US20210042520A1 (en) * 2019-06-14 2021-02-11 Tobii Ab Deep learning for three dimensional (3d) gaze prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018170538A1 (fr) * 2017-03-21 2018-09-27 Seeing Machines Limited Système et procédé de capture de données de position de regard réelle
CN109849788A (zh) * 2018-12-29 2019-06-07 北京七鑫易维信息技术有限公司 信息提供方法、装置及系统
US20210042520A1 (en) * 2019-06-14 2021-02-11 Tobii Ab Deep learning for three dimensional (3d) gaze prediction
CN110341617A (zh) * 2019-07-08 2019-10-18 北京七鑫易维信息技术有限公司 眼球追踪方法、装置、车辆和存储介质
CN111427451A (zh) * 2020-03-25 2020-07-17 中国人民解放军海军特色医学中心 采用扫描仪与眼动仪确定注视点在三维场景中位置的方法

Also Published As

Publication number Publication date
CN113661495A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2023272453A1 (fr) Procédé et appareil d'étalonnage du regard, dispositif, support de stockage lisible par ordinateur, système et véhicule
US11699293B2 (en) Neural network image processing apparatus
WO2021197189A1 (fr) Procédé, système et appareil d'affichage d'informations basé sur la réalité augmentée, et dispositif de projection
CN110167823B (zh) 用于驾驶员监测的系统和方法
WO2021013193A1 (fr) Procédé et appareil d'identification de feu de circulation
EP3033999B1 (fr) Appareil et procédé de détermination de l'état d'un conducteur
WO2019137065A1 (fr) Procédé et appareil de traitement d'image, système d'affichage tête haute monté sur véhicule, et véhicule
CN110703904B (zh) 一种基于视线跟踪的增强虚拟现实投影方法及系统
US20190279009A1 (en) Systems and methods for monitoring driver state
US11112791B2 (en) Selective compression of image data during teleoperation of a vehicle
US20220058407A1 (en) Neural Network For Head Pose And Gaze Estimation Using Photorealistic Synthetic Data
CN110341617B (zh) 眼球追踪方法、装置、车辆和存储介质
US11948315B2 (en) Image composition in multiview automotive and robotics systems
CN109889807A (zh) 车载投射调节方法、装置、设备和存储介质
WO2022241638A1 (fr) Procédé et appareil de projection, et véhicule et ar-hud
JP7176520B2 (ja) 情報処理装置、情報処理方法及びプログラム
CN111854620B (zh) 基于单目相机的实际瞳距测定方法、装置以及设备
WO2022068193A1 (fr) Dispositif portable, procédé et appareil de guidage intelligents, système de guidage et support de stockage
WO2021227969A1 (fr) Procédé de traitement de données et dispositif correspondant
WO2023272725A1 (fr) Procédé et appareil de traitement d'image faciale, et véhicule
WO2022257120A1 (fr) Procédé, dispositif et système de détermination de position de pupille
CN116543266A (zh) 注视行为知识引导的自动驾驶智能模型训练方法及装置
CN113780125A (zh) 一种驾驶员多特征融合的疲劳状态检测方法及装置
JP2021009503A (ja) 個人データ取得システム、個人データ取得方法、画像処理装置用顔センシングパラメータの調整方法及びコンピュータプログラム
CN113822174B (zh) 视线估计的方法、电子设备及存储介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE