WO2023272453A1 - Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle - Google Patents
Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle
- Publication number: WO2023272453A1 (application PCT/CN2021/102861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- gaze
- image
- line
- sight
- Prior art date
Classifications
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
(All under G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; the first three also under G06N3/04—Architecture, e.g. interconnection topology.)
Definitions
- Gaze tracking is an important support for upper-level applications in the smart cockpit such as distraction detection, takeover level estimation, and gaze interaction. Because people differ in the external characteristics of the eyes and the internal structure of the eyeball, it is usually impossible to train a gaze tracking model that is accurate for everyone. At the same time, because of camera installation errors and other causes, directly using the line-of-sight angle output by the gaze tracking model incurs a certain loss of accuracy, resulting in inaccurate line-of-sight estimation. If the error of gaze estimation can be corrected, the user experience of upper-level applications based on gaze tracking can be effectively improved.
- The present application provides a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, which can effectively improve the accuracy of line-of-sight estimation for a specific user.
- A first aspect of the present application provides a line-of-sight calibration method, including: according to a first image that is captured by a first camera and includes the user's eyes, obtaining the three-dimensional position of the user's eyes and a first line-of-sight direction; according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, obtaining the user's gaze area in a second image, where the second image is captured by the second camera and includes the scene outside the vehicle seen by the user; according to the user's gaze area in the second image and the second image, obtaining the position of the user's gaze point in the second image; according to the position of the gaze point and the intrinsic parameters of the second camera, obtaining the three-dimensional position of the user's gaze point; and according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtaining the user's second line-of-sight direction, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's gaze estimation accuracy for a specific user, thereby yielding a user-level gaze tracking model.
- the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
- the gaze point of the user in the second image can be obtained efficiently, accurately and stably.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
- the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
- A second aspect of the present application provides a line-of-sight calibration method, including: in response to the user's gaze operation on a reference point in a display screen, obtaining the three-dimensional position of the user's gaze point; according to a first image that is captured by the first camera and includes the user's eyes, obtaining the three-dimensional position of the user's eyes; and according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtaining the user's second line-of-sight direction, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
- the display screen is an augmented reality head-up display.
- the method further includes: using the user's second gaze direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a small sample learning method.
- the third aspect of the present application provides a line of sight calibration device, including:
- the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
- the first line-of-sight determination unit is configured to obtain the first line-of-sight direction of the user according to the first image including the eyes of the user captured by the first camera;
- the gaze area unit is configured to obtain the user's gaze area in the second image according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera, where the second image is captured by the second camera and includes the scene outside the vehicle seen by the user;
- the gaze point calibration unit is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
- the second line-of-sight determination unit is configured to obtain a second line-of-sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, and the second line-of-sight direction is used as the calibrated line-of-sight direction.
- the second image can be used to calibrate the user's line of sight direction to obtain a second line of sight direction with high accuracy, effectively improving the accuracy of the user's line of sight data, and further improving the user experience of upper-layer applications based on line of sight tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- the gaze area unit is configured to obtain the user's gaze area in the second image according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, thereby yielding a user-level gaze tracking model.
- the gaze point calibration unit is further configured to screen the gaze points according to the confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.
- the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
- the gaze point position determination unit is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
- the eye position determination unit is configured to obtain the three-dimensional position of the user's eyes according to the first image including the user's eyes captured by the first camera;
- the second line of sight determination unit is configured to obtain the user's second line of sight direction according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
- the accuracy of the user's gaze data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.
- the driver's line of sight calibration can be realized without affecting the safe driving of the driver.
- the device further includes: an optimization unit configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a small sample learning method.
- a fifth aspect of the present application provides a computing device, including:
- A sixth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, wherein, when the program instructions are executed by a computer, the computer executes the above line-of-sight calibration method.
- the seventh aspect of the present application provides a driver monitoring system, including:
- At least one memory stores program instructions, and when the program instructions are executed by the at least one processor, the at least one processor executes the line-of-sight calibration method of the first aspect above.
- In this way, the accuracy of line-of-sight estimation for users such as drivers in the vehicle cockpit scene can be effectively improved, thereby improving the user experience of the driver monitoring system and of upper-layer applications such as distraction detection, takeover level estimation, and gaze interaction in the smart cockpit.
- the eighth aspect of the present application provides a vehicle, including the above-mentioned driver monitoring system.
- Fig. 1 is a schematic diagram of an exemplary architecture of a system in an embodiment of the present application.
- Fig. 2 is a schematic diagram of the installation position of the sensor in an embodiment of the present application.
- Fig. 3 is a schematic flowchart of a line of sight calibration method in an embodiment of the present application.
- Fig. 5 is a schematic flow chart of eye three-dimensional position estimation in an embodiment of the present application.
- Fig. 6 is an example diagram of a cockpit scene applicable to the embodiment of the present application.
- FIG. 7 is a schematic diagram of the gaze area in the reference coordinate system in the scene in FIG. 6 .
- FIG. 8 is a schematic diagram of the gaze area in the second image in the scene in FIG. 6 .
- Fig. 9 is a schematic flowchart of determining the gaze area of the user in the second image in an embodiment of the present application.
- Fig. 10 is a projection example diagram between the gaze area in the reference coordinate system and the gaze area in the second image.
- Fig. 11 is a schematic structural diagram of a gaze point calibration model in an embodiment of the present application.
- Fig. 14 is a schematic diagram of the driver's line of sight calibration and model optimization process in the cockpit scene.
- Fig. 16 is a schematic diagram of an exemplary architecture of a system in another embodiment of the present application.
- Fig. 17 is a schematic flowchart of a line of sight calibration method in another embodiment of the present application.
- Fig. 18 is a schematic structural diagram of a line-of-sight calibration device in another embodiment of the present application.
- FIG. 19 is a schematic structural diagram of a computing device according to an embodiment of the present application.
- Eye tracking/gaze tracking model: a machine learning model that can estimate the gaze direction or gaze point of human eyes from images containing human eyes or faces, for example, a neural network model.
- Driver Monitoring System (DMS): a system that monitors the state of the driver in the car based on image processing technology, voice processing technology, etc. It includes components installed in the cockpit such as an in-vehicle camera, a processor, and a fill light; the in-vehicle camera can capture images including the driver's face, head, and part of the torso (e.g., arms) (i.e., the DMS images herein).
- Time-of-flight (TOF) camera: a camera that emits light pulses toward a target object while recording the travel time of the reflected pulses, thereby calculating the distance between the pulse emitter and the target object and generating a 3D image of the target object.
- the 3D image includes the depth information of the target object and the information of the reflected light intensity.
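- Concretely, with c the speed of light and Δt the measured round-trip time of a pulse, the depth of the corresponding point is d = c · Δt / 2; the factor 1/2 accounts for the pulse traveling to the object and back.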
- Landmark algorithm: a technique for extracting facial feature points.
- The intrinsic parameters of a camera determine the projection relationship from three-dimensional space to the two-dimensional image and are related only to the camera itself.
- The intrinsic parameters can include the scale factors of the camera along the two image coordinate axes u and v, the principal point coordinates (x0, y0), and the axis skew parameter s; the u-axis scale factor is the ratio of the physical length of each pixel in the x direction of the image coordinate system to the camera focal length f, and the v-axis scale factor is the ratio of the pixel's physical length in the y direction to the focal length.
- The intrinsic parameters can also include the principal point coordinates relative to the imaging plane coordinate system, the axis skew parameter, and distortion parameters; the distortion parameters can include the camera's three radial distortion parameters and two tangential distortion parameters.
- the internal and external parameters of the camera can be obtained through Zhang Zhengyou calibration.
- The intrinsic and extrinsic parameters of the first camera and those of the second camera are all calibrated in the same world coordinate system.
- The imaging plane coordinate system, i.e., the image coordinate system, takes the center of the image plane as the coordinate origin, with the X-axis and Y-axis parallel to two perpendicular edges of the image plane; P(x, y) is usually used to represent a coordinate value in it. The image coordinate system expresses the position of a pixel in the image in physical units (for example, millimeters).
- The pixel coordinate system, i.e., the image coordinate system in pixels, takes the upper-left vertex of the image plane as the origin, with its X-axis and Y-axis parallel to those of the image coordinate system; p(u, v) is usually used to represent a coordinate value in it. The pixel coordinate system expresses the position of a pixel in the image in units of pixels.
- The coordinates in the pixel coordinate system and the coordinates in the camera coordinate system satisfy relationship (2): Zc · [u, v, 1]^T = K · [Xc, Yc, Zc]^T, where (u, v) are the image coordinates in units of pixels, (Xc, Yc, Zc) are the coordinates in the camera coordinate system, and K is the matrix representation of the camera intrinsic parameters.
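- As a minimal sketch of relationship (2) and its inverse (not code from the patent; the intrinsic values are illustrative and distortion is ignored):

```python
import numpy as np

# Assumed intrinsic matrix K: focal lengths fx, fy in pixels, principal point (cx, cy).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

def project(p_cam: np.ndarray) -> np.ndarray:
    """Relationship (2): map (Xc, Yc, Zc) in the camera frame to pixel (u, v)."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]          # divide by the depth Zc

def back_project(uv: np.ndarray, depth: float) -> np.ndarray:
    """Inverse of (2): recover (Xc, Yc, Zc) from (u, v) given the depth Zc."""
    return depth * (np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0]))

uv = project(np.array([0.5, -0.2, 10.0]))
print(uv, back_project(uv, 10.0))    # round-trips to the original point
```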
- Few-shot learning means that, after a neural network has pre-learned a large number of samples of known categories, it can learn a new category quickly from only a small number of labeled samples.
- Meta-learning is an important branch of few-shot learning research.
- Its main idea is that, when the target task has few training samples, the neural network is first trained on a large number of few-shot tasks similar to the target few-shot task, so that the trained network has a good initialization for the target task; a small number of training samples of the target task are then used to adjust the trained network.
- Model-agnostic meta-learning (MAML) algorithm: a specific meta-learning algorithm. Its idea is to train the initialization parameters of a machine learning model so that, after one or more learning steps on a small amount of data from a new task, the model achieves better performance on that task.
- soft argmax: an algorithm or function that obtains the coordinates of key points from a heat map; it can be implemented as a neural network layer, and the layer that realizes soft argmax can be called a soft argmax layer.
- Head-up display (HUD): also known as a parallel display system, it can project important driving information such as speed, engine speed, battery level, and navigation onto the windshield in front of the driver, so that the driver can see these vehicle parameters and driving information through the windshield display area without lowering or turning the head.
- the first possible implementation is to collect a large amount of gaze data to train the gaze tracking model, deploy the trained gaze tracking model on the vehicle end, and the vehicle end uses the gaze tracking model to process the real-time collected images to finally obtain the user's line of sight.
- This implementation mainly has the following defect: there may be large individual differences between the samples used to train the gaze tracking model and the current user (for example, individual differences in the internal structure of the human eye), so the gaze tracking model matches the current user poorly, resulting in inaccurate estimation of the current user's line of sight.
- The second possible implementation is to display a specific image on a screen and calibrate the gaze tracking device through the interaction between the device's user and the specific image on the screen, obtaining parameters for that user and thereby improving the accuracy of the gaze tracking device for the user. This implementation mainly has the following defects: it relies on the active cooperation of the user, the operation is cumbersome, and improper human operation may introduce calibration errors, ultimately affecting the accuracy of the eye-tracking device for the user. Moreover, because it is difficult to deploy a sufficiently large display screen directly in front of the driver in the cockpit, this implementation is not suitable for the cockpit scene.
- In the embodiments of the present application, optimization samples including the user's second line-of-sight direction and the first image are also used to optimize the gaze tracking model through a few-shot learning method, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight, thereby obtaining a user-level gaze tracking model. This solves the problems of the gaze tracking model being difficult to optimize and of low line-of-sight estimation accuracy for some users.
- the embodiments of the present application may be applicable to any application scenario that requires real-time calibration or estimation of a person's gaze direction.
- The embodiments of the present application may be applicable to the calibration or estimation of the line of sight of drivers and/or passengers in the cockpit environment of vehicles such as cars, boats, and aircraft.
- the embodiments of the present application may also be applicable to other scenarios, for example, performing line-of-sight calibration or estimation on a person wearing wearable glasses or other devices.
- the embodiment of the present application may also be applied to other scenarios, which will not be listed one by one here.
- the first camera 110 is responsible for capturing the user's eye image (ie, the first image hereinafter).
- the first camera 110 may be an in-vehicle camera in the DMS, and the in-vehicle camera is used to photograph the driver in the cockpit.
- the in-vehicle camera is a DMS camera that can be installed near the A-pillar of the car (position 1 in Figure 2) or near the steering wheel, and the DMS camera is preferably a higher-resolution RGB camera.
- the human eye image (i.e., the first image hereinafter) generally refers to various types of images including human eyes, for example, a human face image, a bust image including a human face, and the like.
- the human eye image (that is, the first image hereinafter) may be a human face image.
- The second camera 120 is responsible for collecting a scene image (i.e., the second image below), which includes the scene outside the vehicle seen by the user; that is, the field of view of the second camera 120 and the field of view of the user at least partially overlap.
- the second camera 120 may be an exterior camera, and the exterior camera may be used to capture the scene in front of the vehicle seen by the driver.
- The camera outside the vehicle can be a front camera installed above the front windshield of the vehicle (position 2 in Figure 2), which can capture the scene in front of the vehicle, that is, the scene outside the vehicle seen by the driver.
- the front camera is preferably a TOF camera, which can collect depth images, so as to obtain the distance between the vehicle and the target object in front (for example, the object that the user is looking at) through the image.
- The image processing system 130 is an image processing system capable of processing DMS images and scene images. It can run the gaze tracking model to obtain the user's preliminary line-of-sight data (i.e., the first line-of-sight direction below), and use the preliminary line-of-sight data to perform the line-of-sight calibration method described below to obtain the user's calibrated line-of-sight data (i.e., the second line-of-sight direction below), thereby improving the accuracy of the user's line-of-sight data.
- The model optimization system 140 is responsible for optimizing the gaze tracking model: it can optimize the model using the user's calibrated line-of-sight data provided by the image processing system 130 and provide the optimized gaze tracking model back to the image processing system 130, thereby improving the accuracy of the gaze tracking model's estimate of the user's line of sight.
- the above exemplary system 100 may further include a model training system 150, which is responsible for training a gaze tracking model, which may be deployed in the cloud.
- the model optimization system 140 and the model training system 150 can be realized by the same system.
- the camera coordinate system of the first camera 110 may be a Cartesian coordinate system Xc1-Yc1-Zc1
- the camera coordinate system of the second camera 120 may be a Cartesian coordinate system Xc2-Yc2-Zc2
- the image coordinate system and pixel coordinate system of the first camera 110 and the second camera 120 are not shown in FIG. 2 .
- In this embodiment, the camera coordinate system of the first camera 110 is used as the reference coordinate system; unless otherwise stated, three-dimensional positions and line-of-sight directions below are expressed as coordinates and/or angles in the camera coordinate system of the first camera 110.
- the reference coordinate system can be freely selected according to various factors such as actual needs, specific application scenarios, and calculation complexity requirements, but is not limited thereto.
- the cockpit coordinate system of the vehicle may also be used as the reference coordinate system.
- FIG. 3 shows an exemplary flow of the line of sight calibration method in this embodiment.
- an exemplary line of sight calibration method in this embodiment may include the following steps:
- Step S301: according to a first image that is captured by the first camera 110 and includes the user's eyes, obtain the three-dimensional position of the user's eyes and the first line-of-sight direction;
- Step S302: according to the three-dimensional position of the eyes, the first line-of-sight direction, the extrinsic parameters of the first camera 110, and the extrinsic and intrinsic parameters of the second camera 120, obtain the user's gaze area in the second image, where the second image is captured by the second camera 120 and includes the scene outside the vehicle seen by the user;
- Step S303: according to the user's gaze area in the second image and the second image, obtain the position of the user's gaze point in the second image;
- Step S304: according to the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120, obtain the three-dimensional position of the user's gaze point;
- Step S305: according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, obtain the user's second line-of-sight direction, which is used as the calibrated line-of-sight direction.
- The line-of-sight calibration method of this embodiment can use the second image to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and further improving the user experience of upper-layer applications based on gaze tracking.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- the eye-tracking model can be trained by the model training system 150 deployed on the cloud and provided to the image processing system 130 deployed on the user's vehicle.
- The image processing system 130 runs the gaze tracking model on the first image including the user's eyes to obtain the user's first line-of-sight direction.
- the gaze direction may be represented by a viewing angle and/or a gaze vector in a reference coordinate system.
- the angle of view may be the angle between the line of sight and the axis of the eyes, and the intersection of the line of sight and the axis of the eyes is the three-dimensional position of the user's eyes.
- the sight vector is a direction vector starting from the position of the eye in the reference coordinate system and ending at the position of the gaze point in the reference coordinate system.
- The direction vector can be determined from the three-dimensional coordinates of the eye reference point in the reference coordinate system and the three-dimensional coordinates of the gaze point in the reference coordinate system.
- The fixation point refers to the point at which the user's eyes are gazing. Taking the cockpit scene as an example, the driver's gaze point is the specific position the driver's eyes are looking at. A gaze point can be represented by its position in space; in this embodiment, the three-dimensional position of the gaze point is represented by the gaze point's three-dimensional coordinates in the reference coordinate system.
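- As a minimal illustration of this representation (a sketch, not the patent's code), the second line-of-sight direction obtained in step S305 below reduces to normalizing the vector from the eye position to the gaze point, both expressed in the reference coordinate system:

```python
import numpy as np

def gaze_direction(eye_xyz: np.ndarray, gaze_point_xyz: np.ndarray) -> np.ndarray:
    """Unit sight vector from the eye reference point to the gaze point,
    both given in the reference (first-camera) coordinate system."""
    v = gaze_point_xyz - eye_xyz
    return v / np.linalg.norm(v)

# e.g., eyes at the reference-frame origin, gaze point about 10 m ahead
print(gaze_direction(np.zeros(3), np.array([0.5, -0.2, 10.0])))
```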
- Fig. 5 shows an exemplary process of eye three-dimensional position estimation.
- The exemplary process of eye three-dimensional position estimation may include: step S501, processing the first image with a face detection algorithm and a facial feature point detection algorithm to obtain the positions of the user's facial feature points in the first image; step S502, solving PnP with the positions of the user's facial feature points in the first image and a pre-acquired standard 3D face model to obtain the 3D coordinates of the user's facial feature points in the reference coordinate system; step S503, extracting the 3D coordinates of the user's eye reference points from the 3D coordinates of the user's facial feature points in the reference coordinate system as the 3D coordinates of the user's eyes.
- FIG. 5 is only an example, and is not intended to limit a specific implementation manner of eye three-dimensional position estimation in this embodiment.
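- A minimal sketch of steps S501 to S503 using OpenCV's generic PnP solver; the six-point face model, the intrinsics, and the eye-point indices are illustrative assumptions, and the synthetic landmarks stand in for a real detector's output:

```python
import cv2
import numpy as np

# Assumed standard 3D face model points (meters, model frame) -- illustrative only.
face_model_3d = np.array([
    [ 0.00,  0.00,  0.00],   # nose tip
    [ 0.00, -0.06, -0.02],   # chin (coarse)
    [-0.03,  0.03, -0.02],   # left eye corner
    [ 0.03,  0.03, -0.02],   # right eye corner
    [-0.02, -0.03, -0.02],   # left mouth corner
    [ 0.02, -0.03, -0.02],   # right mouth corner
])

K1 = np.array([[800.0, 0.0, 640.0],
               [0.0, 800.0, 360.0],
               [0.0, 0.0, 1.0]])    # assumed intrinsics of the first camera
dist1 = np.zeros(5)                 # assume negligible lens distortion

# Synthetic 2D landmarks standing in for step S501's detector output.
rvec_true = np.array([0.05, -0.1, 0.0])
tvec_true = np.array([0.02, 0.01, 0.6])   # face about 0.6 m from the camera
landmarks_2d, _ = cv2.projectPoints(face_model_3d, rvec_true, tvec_true, K1, dist1)

# Step S502: PnP recovers the face pose in the first camera's frame.
ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, K1, dist1)
R, _ = cv2.Rodrigues(rvec)

# Step S503: transform the model's eye reference points into the reference frame.
eyes_xyz = (R @ face_model_3d[2:4].T).T + tvec.ravel()
print(eyes_xyz)   # 3D eye positions in the first camera's coordinate system
```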
- The camera perspective projection model can be used to determine the gaze area of the user in the second image (hereinafter, the "gaze area in the second image" is simply referred to as "the second gaze area").
- the camera perspective projection model may be a pinhole imaging model or a nonlinear perspective projection model.
- Step S302 may include: according to the three-dimensional position of the user's eyes, the first line-of-sight direction, the extrinsic parameters of the first camera 110, the intrinsic and extrinsic parameters of the second camera 120, and the accuracy of the gaze tracking model, obtaining the gaze area of the user in the second image.
- the error caused by the limitation of the accuracy of the line-of-sight tracking model can be eliminated in the finally obtained second line-of-sight direction.
- Fig. 6 shows a scene where a driver (not shown in the figure) in the cockpit environment looks at pedestrians in the crosswalk in front of the vehicle.
- Fig. 9 shows an exemplary flow of determining the second gaze area of the user.
- the process of obtaining the user's second gaze area may include the following steps:
- Step S901 determine the gaze area S1 of the user in the reference coordinate system according to the three-dimensional position of the user's eyes and the first line of sight direction.
- The user's line of sight ON in the reference coordinate system is obtained.
- The average accuracy of the gaze tracking model is expressed as φ, where φ represents the error of the viewing angle; the lower the accuracy of the gaze tracking model, the greater the value of φ.
- The line-of-sight angle θ can accordingly be widened to the interval [θ − φ, θ + φ], and the cone bounded by the line of sight with angle θ − φ and the line of sight with angle θ + φ is used as the user's gaze area S1 in the reference coordinate system.
- Fig. 7 shows a visualization of the driver's gaze area S1 in the reference coordinate system for the scene shown in Fig. 6, in which:
- O represents the three-dimensional position of the eyes
- the solid line with arrows represents the first line of sight direction ON
- θ represents the first line-of-sight angle
- φ represents the average accuracy value of the gaze tracking model
- the dotted cone represents the user's gaze area S1 in the reference coordinate system.
- Fig. 8 shows the second image of the scene of Fig. 6 captured by the second camera; only the part the driver is looking at is shown, content of the scene in Fig. 6 irrelevant to this embodiment is omitted, and the user's second gaze area Q is marked.
- The projection process in this step can be realized by formula (1) and relationship (2). Specifically, first, based on the extrinsic parameters of the first camera 110 and of the second camera 120, the gaze area S1 is transformed into the camera coordinate system of the second camera 120 according to formula (1) to obtain the gaze area S2; then, based on the intrinsic parameters of the second camera 120, the gaze area S2 is projected into the pixel coordinate system of the second camera 120 according to relationship (2) to obtain the user's second gaze area Q.
- the extrinsics of the first camera 110 and the extrinsics of the second camera 120 are calibrated in the same world coordinate system.
- the gaze area S1 is projected on the imaging surface of the second camera 120 as a quadrilateral second gaze area Q through the external parameters of the first camera 110 and the internal parameters and external parameters of the second camera 120.
- The lower the accuracy of the gaze tracking model, the larger the value of φ, the larger the cone angle of the user's gaze area S1 in the reference coordinate system, and the larger the width of the quadrilateral second gaze area Q.
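- A minimal sketch of this two-step projection, assuming both cameras' extrinsics (a rotation R and translation t mapping world to camera coordinates) are calibrated in a shared world frame; all numeric values are illustrative:

```python
import numpy as np

# Assumed extrinsics: world -> camera maps x_cam = R @ x_world + t.
R1, t1 = np.eye(3), np.zeros(3)                  # first (in-cabin) camera
R2, t2 = np.eye(3), np.array([0.0, -0.5, -0.1])  # second (exterior) camera
K2 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])                 # assumed second-camera intrinsics

def cam1_to_cam2(p_c1: np.ndarray) -> np.ndarray:
    """Formula (1): re-express a point from camera-1 coordinates in camera-2."""
    p_world = R1.T @ (p_c1 - t1)
    return R2 @ p_world + t2

def to_pixels(p_c2: np.ndarray) -> np.ndarray:
    """Relationship (2): project a camera-2 point onto the second image."""
    uvw = K2 @ p_c2
    return uvw[:2] / uvw[2]

# Sample one edge ray of the gaze cone at a few depths and project each point;
# the projected points bound the quadrilateral gaze area Q in the second image.
for depth in (5.0, 10.0, 20.0):
    p = cam1_to_cam2(np.array([0.2, 0.0, 1.0]) * depth)
    print(depth, to_pixels(p))
```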
- FIG. 10 shows an exemplary projection diagram of a line of sight OX.
- The projection onto the imaging plane of the second camera 120 of points x at different depths along the line of sight OX is O'X'.
- The mapping point of the origin of the human line of sight is O'.
- The first line-of-sight direction L is mapped to line of sight L'.
- FIGS. 7 to 10 are only examples, and the method for obtaining the second attention region in the embodiment of the present application is not limited thereto.
- In step S303, the gaze point of the user in the second image can be obtained based on the second gaze area and the second image through a pre-trained gaze point calibration model (herein, "the gaze point in the second image" is referred to as "the second gaze point" for short).
- the gaze point calibration model can be any machine learning model available for image processing. Considering the high precision and good stability of the neural network, in the embodiment of the present application, the gaze point calibration model is preferably a neural network model.
- the decoding network outputs the heat map Fig3, and the gray value of each pixel in the heat map Fig3 indicates the probability that the corresponding pixel is the fixation point.
- The heat map Fig3 is processed by the soft-argmax normalization layer to obtain the position of the gaze point in the second image, that is, the coordinates (x, y) of the gaze point's corresponding pixel in the second image.
- A line of sight has one fixation point, and each fixation point may correspond to one or more pixels in the second image.
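- A minimal sketch of the soft-argmax step, assuming the heat map is a 2D array of per-pixel gaze probabilities; the sharpness parameter beta is an assumption, as the patent does not specify one:

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray, beta: float = 10.0) -> tuple[float, float]:
    """Differentiable (x, y) estimate of the gaze point from a heat map."""
    h, w = heatmap.shape
    weights = np.exp(beta * heatmap).ravel()
    weights /= weights.sum()             # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    x = float(weights @ xs.ravel())      # probability-weighted x coordinate
    y = float(weights @ ys.ravel())      # probability-weighted y coordinate
    return x, y

hm = np.zeros((9, 9))
hm[6, 2] = 1.0                           # toy heat map: hot pixel at row 6, col 2
print(soft_argmax_2d(hm))                # close to (2.0, 6.0)
```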
- the fixation point calibration model can be obtained by pre-training.
- During training, a scene image and its corresponding gaze-area grayscale image (in which the extent of the gaze area is a set value) are used as a sample, and the real gaze area of the sample is known.
- The ResNet part and the soft-argmax normalization layer are trained at the same time but with different loss functions.
- the embodiment of this application does not limit the specific loss function used.
- the loss function of the ResNet part can be binary cross entropy (BCE loss)
- the loss function of the soft-argmax standard layer can be mean square error (MSE loss).
- the decoding network in the ResNet part can use pixel-level binary cross-entropy as a loss function, and the expression is shown in the following formula (3).
- y_i is the label of whether pixel i is the fixation point: 1 when it is the fixation point, and 0 when it is not.
- p(y_i) is the probability that pixel i is the fixation point in the heat map Fig3 output by the decoding network.
- N is the total number of pixels of the second image Fig1, that is, the total number of pixels of the heat map Fig3.
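- Formula (3) itself is not reproduced legibly in this text; from the variable definitions above it is the standard pixel-wise binary cross-entropy, reconstructed here:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr) \Bigr] \qquad (3)$$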
- In step S304, there are many specific implementations for obtaining the three-dimensional position of the user's gaze point according to the position of the gaze point in the second image and the intrinsic parameters of the second camera 120.
- The three-dimensional position of the gaze point is the gaze point's three-dimensional coordinates in the reference coordinate system (the camera coordinate system of the first camera 110). It can be understood that any algorithm that obtains the spatial position of a point from its position in an image can be applied to step S304.
- step S304 it is preferable to obtain the three-dimensional position of the gaze point through inverse perspective transformation.
- The Z-axis coordinate of the gaze point can be obtained simply from the depth of the second gaze point; combined with the position of the second gaze point obtained in step S303, i.e., the pixel coordinates (u, v), a simple inverse perspective transformation yields the three-dimensional coordinates of the gaze point in the reference coordinate system, that is, the three-dimensional position of the gaze point.
- Step S304 may include: step S3041, using the second image to obtain the depth of the second gaze point based on a monocular depth estimation algorithm, the depth being the distance h of the gaze point relative to the second camera 120; the distance h estimates the Z-axis coordinate Zc2 of the gaze point in the camera coordinate system of the second camera; step S3042, according to the position of the second gaze point, i.e., the pixel coordinates (u, v), and the gaze point's Z-axis coordinate in the camera coordinate system of the second camera, and based on the intrinsic and extrinsic parameters of the second camera 120 and the extrinsic parameters of the first camera 110, obtaining the three-dimensional coordinates of the gaze point in the reference coordinate system.
- The distance h of each pixel in the second image relative to the second camera 120 can be computed from the second image by a monocular depth estimation algorithm such as FastDepth, and the distance h of the second gaze point relative to the second camera 120 can then be extracted from the result according to the position of the second gaze point, that is, its pixel coordinates.
- various applicable algorithms may be used for depth estimation.
- In step S3042, according to the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate Zc2, and the intrinsic parameters of the second camera 120, the coordinates (Xc2, Yc2, Zc2) of the gaze point in the camera coordinate system of the second camera 120 are obtained; then, based on the extrinsic parameters of the second camera 120 and of the first camera 110, the coordinates (Xc1, Yc1, Zc1) of the gaze point in the camera coordinate system of the first camera 110 are derived from (Xc2, Yc2, Zc2) by formula (1), and (Xc1, Yc1, Zc1) is the three-dimensional position of the gaze point.
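- A minimal sketch of steps S3041 and S3042, with the monocular depth estimator stubbed out and the same illustrative intrinsics and extrinsics assumed as in the earlier sketches:

```python
import numpy as np

K2 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])                 # assumed second-camera intrinsics
R1, t1 = np.eye(3), np.zeros(3)                  # assumed extrinsics (shared world frame)
R2, t2 = np.eye(3), np.array([0.0, -0.5, -0.1])

def estimate_depth(u: int, v: int) -> float:
    """Stub for a monocular depth estimator such as FastDepth (step S3041)."""
    return 12.0   # pretend the gaze point is 12 m from the second camera

def gaze_point_3d(u: float, v: float) -> np.ndarray:
    """Step S3042: back-project (u, v) with depth Zc2, then move to camera 1."""
    zc2 = estimate_depth(int(u), int(v))
    p_c2 = zc2 * (np.linalg.inv(K2) @ np.array([u, v, 1.0]))  # inverse of (2)
    p_world = R2.T @ (p_c2 - t2)                              # formula (1)
    return R1 @ p_world + t1                                  # reference frame

print(gaze_point_3d(700.0, 400.0))
```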
- a line of sight has one fixation point, but due to the limitation of accuracy, multiple fixation points may be obtained corresponding to the same line of sight.
- The gaze points can be screened according to the confidence of the user's gaze point in the second image, so that the subsequent steps are performed only on the screened gaze points to obtain the second line-of-sight direction; this ensures the accuracy of the second line-of-sight direction while reducing the amount of calculation and improving processing efficiency.
- the screening of gaze points can be performed before step S304, and can also be performed after step S304.
- the gaze point calibration model also provides the probability value of the second gaze point, and the confidence degree of the second gaze point can be determined by the probability value.
- The heat map provided by the gaze point calibration model includes a probability value for the second gaze point, which represents the probability that the second gaze point is the real gaze point; a higher probability value indicates a higher possibility that the corresponding second gaze point is the real gaze point. The probability value may be used directly as the confidence of the second gaze point, or a value proportional to the probability may be used. The confidence of the second gaze point can therefore be obtained without separate calculation, improving processing efficiency and reducing computational complexity.
- Only gaze points whose confidence exceeds a preset first confidence threshold (for example, 0.9), or the gaze point with the relatively highest confidence, may be selected. If multiple gaze points exceed the first confidence threshold or tie for the relatively highest confidence, one or more gaze points may be randomly selected from them; alternatively, all of these gaze points may be retained.
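- A small sketch of this screening rule (the 0.9 threshold comes from the example above; the data layout is an assumption):

```python
# Candidate gaze points as (u, v, confidence) triples -- illustrative values.
candidates = [(700, 400, 0.95), (705, 398, 0.92), (640, 360, 0.40)]

FIRST_CONF_THRESHOLD = 0.9   # example threshold from the text

kept = [c for c in candidates if c[2] > FIRST_CONF_THRESHOLD]
if not kept:                 # fall back to the relatively highest confidence
    kept = [max(candidates, key=lambda c: c[2])]
print(kept)
```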
- the line-of-sight calibration in steps S301 to S305 in the embodiment of the present application may be performed by the image processing system 130 in the system 100 .
- FIG. 13 shows an exemplary implementation process of eye-tracking model optimization in step S306.
- The exemplary process may include: step S3061, the image processing system 130 stores the second line-of-sight direction and its corresponding first image as an optimization sample of the user in the user's sample library; the sample library can be associated with user information (for example, user identity information) to facilitate queries, and is deployed in the model optimization system 140.
- In step S3062, the model optimization system 140 uses the newly added optimization samples in the user's sample library to optimize, based on the few-shot learning method, the user's gaze tracking model obtained in the previous optimization.
- step S3062 can be performed regularly or when the number of newly added optimization samples reaches a certain number or other preset conditions are met.
- The sample library update of step S3061 can be performed in real time.
- the user's optimized sample may be selectively uploaded to improve the quality of the optimized sample, reduce unnecessary optimization operations, and reduce hardware loss caused by model optimization.
- the second gaze direction may be screened according to the confidence of the second gaze point, and only optimized samples formed by the screened second gaze direction and its corresponding first image are uploaded.
- The screening of the second line-of-sight direction may include, but is not limited to: 1) selecting second line-of-sight directions for which the confidence of the second gaze point is greater than a preset second confidence threshold (for example, 0.95); 2) selecting the second line-of-sight direction for which the confidence of the second gaze point is relatively highest.
- the few-shot learning method can be implemented by any algorithm that can optimize the gaze tracking model with a small number of samples.
- For example, the user's optimization samples can be used to optimize the gaze tracking model with the MAML algorithm, thereby realizing optimization of the gaze tracking model based on the few-shot learning method.
- a gaze tracking model that is more suitable for a specific user's individual characteristics can be obtained through a small number of samples, with a small amount of data and low computational complexity, which is conducive to reducing hardware loss and hardware cost.
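- A toy sketch of the MAML-style inner/outer loop under strong simplifications (a linear gaze model and plain NumPy gradients); it illustrates only the idea of learning an initialization that adapts from few samples, not the patent's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.01):
    """One first-order MAML outer update over a batch of user-specific tasks."""
    outer_grad = np.zeros_like(w)
    for X_sup, y_sup, X_qry, y_qry in tasks:
        w_adapted = w - inner_lr * loss_grad(w, X_sup, y_sup)  # inner adaptation
        outer_grad += loss_grad(w_adapted, X_qry, y_qry)       # query-loss gradient
    return w - outer_lr * outer_grad / len(tasks)

# Toy "users": each task maps 4 eye features to a gaze angle with its own offset.
def make_task():
    w_true = np.array([0.5, -0.2, 0.1, 0.3]) + 0.1 * rng.normal(size=4)
    X = rng.normal(size=(16, 4))
    return X[:8], X[:8] @ w_true, X[8:], X[8:] @ w_true

w = np.zeros(4)
for _ in range(200):
    w = maml_step(w, [make_task() for _ in range(4)])
# w is now an initialization that adapts to a new user from few samples.
```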
- FIG. 14 illustrates an exemplary process flow for the system 100 to perform line of sight calibration and model optimization in a cockpit environment.
- The processing flow may include: step S1401, the in-vehicle camera of vehicle G captures a DMS image (i.e., the first image) of driver A in the cockpit, the DMS image including driver A's face; the image processing system 130 at the vehicle end of vehicle G then runs the gaze tracking model to infer the initial line-of-sight direction (i.e., the first line-of-sight direction).
- Fig. 15 shows an exemplary structure of a sight calibration device 1500 provided in this embodiment.
- the line of sight calibration device 1500 of this embodiment may include:
- the eye position determination unit 1501 is configured to obtain the three-dimensional position of the user's eyes according to the first image collected by the first camera including the user's eyes;
- the gaze point calibration unit 1504 is configured to obtain the position of the gaze point of the user in the second image according to the gaze area of the user in the second image and the second image;
- the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye, and the second line of sight direction is used as a calibrated line of sight direction.
- the first gaze direction is extracted from the first image based on a gaze tracking model.
- The gaze calibration device further includes: an optimization unit 1507 configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and to optimize the gaze tracking model based on a few-shot learning method.
- the gaze point calibration unit 1504 can also be configured to filter the gaze point according to the confidence of the user's gaze point in the second image; and/or, the optimization unit 1507 is also configured to filter the gaze point according to the user's The confidence of the gaze point in the second image filters the second gaze direction.
- the position of the gaze point of the user in the second image is obtained according to the gaze area of the user in the second image and the second image by using a gaze point calibration model.
- the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence level is determined by the probability value.
- FIG. 16 shows an exemplary architecture of a system 1600 applicable to this embodiment.
- The exemplary system 1600 of this embodiment is basically the same as the system 100 of Embodiment 1; the differences are that the second camera 120 is an optional component in system 1600, and that system 1600 includes a display screen 160, which can be deployed at the vehicle end and realized through existing display components in the vehicle-end equipment.
- The other parts of the system 1600 in this embodiment, namely the first camera 110, the image processing system 130, the model optimization system 140, and the model training system 150, have basically the same functions as the corresponding parts of the system 100 in Embodiment 1 and will not be repeated here.
- This embodiment uses a display screen 160 whose positional relationship with the first camera 110 (that is, the in-vehicle camera) has been calibrated, and relies on the user gazing at a reference point on the display screen 160 to realize calibration of the user's line of sight and to obtain optimization samples, on which the gaze tracking model performs few-shot learning to improve its accuracy.
- Fig. 17 shows an exemplary flow of the line of sight calibration method in this embodiment.
- the line of sight calibration method of this embodiment may include the following steps:
- Step S1701 in response to the user's gazing operation on the reference point on the display screen 160, obtain the three-dimensional position of the user's gazing point;
- Step S1701 may also include: controlling the display screen 160 to provide a line-of-sight calibration interface to the user, the interface including a visual prompt reminding the user to gaze at the reference point, so that the user performs the corresponding gaze operation according to the visual prompt.
- the specific form of the line-of-sight calibration interface is not limited by this embodiment.
- the gazing operation may be any operation related to the user gazing at the reference point on the display screen 160, and the embodiment of the present application does not limit the specific implementation or expression of the gazing operation.
- the gaze operation may include inputting confirmation information in the gaze calibration interface while the user gazes at a reference point in the gaze calibration interface.
- the display screen 160 may be, but not limited to, an AR-HUD of a vehicle, a dashboard of a vehicle, a portable electronic device of a user, or others.
- the line of sight calibration in the cockpit scene is mainly aimed at the driver or the co-pilot. Therefore, in order to ensure that the line of sight calibration does not affect safe driving, the display screen 160 is preferably an AR-HUD.
- the three-dimensional coordinates of each reference point on the display screen 160 in the camera coordinate system of the first camera 110 may be pre-calibrated through the positional relationship between the display screen 160 and the first camera 110 . In this way, if the user gazes at a reference point, the reference point is the user's gaze point, and the three-dimensional coordinates of the reference point in the camera coordinate system of the first camera 110 are the three-dimensional position of the user's gaze point.
- The specific implementation of this step is the same as that of obtaining the three-dimensional position of the eyes in step S301 of the first embodiment, and will not be repeated here.
- Step S1703 according to the three-dimensional position of the gaze point and the three-dimensional position of the eyes, the second line of sight direction of the user is obtained.
- The specific implementation of this step is the same as that of step S305 in the first embodiment, and will not be repeated here.
- The line-of-sight calibration method of this embodiment can obtain the three-dimensional position of the user's gaze point by using the reference point and, at the same time, obtain the three-dimensional position of the user's eyes from the first image, thereby obtaining a second line-of-sight direction with high accuracy. It can be seen that the line-of-sight calibration method of this embodiment not only effectively improves the accuracy of user line-of-sight estimation, but is also simple to operate, low in computational complexity, and high in processing efficiency, and is suitable for the cockpit environment.
- the gaze calibration method of this embodiment may further include: step S1704, using the user's second gaze direction and the first image as the user's optimization samples, and optimizing the gaze tracking model based on the small sample learning method.
- In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, thereby yielding a user-level gaze tracking model.
- the specific implementation manner of this step is the same as that of step S306 in the first embodiment, and will not be repeated here. Since the three-dimensional position of the gaze point in this step is obtained through calibration, its accuracy is relatively high. Therefore, there is no need to screen the second gaze direction before step S1704 in this embodiment.
- Fig. 18 shows an exemplary structure of a sight calibration device 1800 provided in this embodiment.
- the line of sight calibration device 1800 of this embodiment may include:
- the gaze point position determination unit 1801 is configured to obtain the three-dimensional position of the gaze point of the user in response to the user's gaze operation on the reference point in the display screen;
- the second line of sight determining unit 1506 is configured to obtain a second line of sight direction of the user according to the three-dimensional position of the gaze point and the three-dimensional position of the eye.
- the display screen is an augmented reality head-up display.
- the device further includes: an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.
- an optimization unit 1507 configured to use the user's second gaze direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.
- FIG. 19 is a schematic structural diagram of a computing device 1900 provided by an embodiment of the present application.
- the computing device 1900 includes: a processor 1910 and a memory 1920 .
- the computing device 1900 may also include a communication interface 1930 and a bus 1940 . It should be understood that the communication interface 1930 in the computing device 1900 shown in FIG. 19 can be used to communicate with other devices.
- the memory 1920 and the communication interface 1930 can be connected to the processor 1910 through the bus 1940 .
- although only one line is used to represent the bus in FIG. 19 , this does not mean that there is only one bus or only one type of bus.
- the processor 1910 may be connected to the memory 1920 .
- the memory 1920 may be used to store program code and data. The memory 1920 may therefore be a storage unit inside the processor 1910, an external storage unit independent of the processor 1910, or a combination of a storage unit inside the processor 1910 and an external storage unit independent of the processor 1910.
- the processor 1910 may be a central processing unit (central processing unit, CPU).
- the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the processor 1910 may use one or more integrated circuits to execute related programs, so as to implement the technical solutions provided by the embodiments of the present application.
- the memory 1920 may include read-only memory and random-access memory, and provides instructions and data to the processor 1910 .
- a portion of processor 1910 may also include non-volatile random access memory.
- processor 1910 may also store device type information.
- the processor 1910 executes the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the above-mentioned embodiments.
- the computing device 1900 may correspond to a corresponding body executing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 1900 are respectively intended to realize the corresponding processes of the methods in the embodiments, which are not repeated here for brevity.
- the embodiment of the present application also provides a driver monitoring system, which includes the above-mentioned first camera 110 , second camera 120 and computing device 1900 .
- the first camera 110 is configured to capture a first image including the eyes of the user;
- the second camera 120 is configured to capture a second image including the scene outside the vehicle seen by the user;
- both the first camera 110 and the second camera 120 can communicate with the computing device 1900.
- the processor 1910 uses the first image provided by the first camera 110 and the second image provided by the second camera 120, executing the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the first embodiment above.
- the driver monitoring system may further include a display screen configured to display reference points to the user.
- the processor 1910 uses the first image provided by the first camera 110 and the three-dimensional position of the reference point displayed on the display screen, executing the computer-executable instructions in the memory 1920 to perform the operation steps of the line-of-sight calibration method in the second embodiment above.
- the driver monitoring system may also include a cloud server, which can be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as the user's optimization samples, optimize the gaze tracking model based on the few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight.
- a cloud server, which can be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as the user's optimization samples, optimize the gaze tracking model based on the few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, so as to improve the gaze tracking model's estimation accuracy for the user's line of sight.
- the architecture of the driver monitoring system can refer to the system shown in FIG. 1 in the first embodiment and the system shown in FIG. 16 in the second embodiment.
- the image processing system 130 can be deployed in the computing device 1900, and the above-mentioned model optimization system 140 can be deployed in the cloud server.
- an embodiment of the present application also provides a vehicle, which may include the above-mentioned driver monitoring system.
- the vehicle is a motor vehicle, which may be, but is not limited to, a sport utility vehicle, a bus, a large truck, or a passenger vehicle among various commercial vehicles; it may also be, but is not limited to, various watercraft, aircraft, and the like; and it may further be, but is not limited to, a hybrid vehicle, an electric vehicle, a plug-in hybrid electric vehicle, a hydrogen-powered vehicle, or another alternative-fuel vehicle.
- the hybrid vehicle can be any vehicle with two or more power sources, for example, a vehicle with gasoline and electric power sources.
- the disclosed systems, devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it is used to execute a line-of-sight calibration method, the method including at least one of the solutions described in the above-mentioned embodiments.
- the computer storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more leads, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
Abstract
The present application relates to the field of intelligent driving, and discloses a gaze calibration method and apparatus, a device, a computer-readable storage medium, a system, and a vehicle. In the present application, an eye three-dimensional position of a user is obtained by means of a first image comprising eyes of the user, a gaze point three-dimensional position of the user is obtained by means of a calibration position on a display screen or a second image comprising a scene outside a vehicle as seen by the user, and a high-accuracy second gaze direction is obtained according to the eye three-dimensional position and gaze point three-dimensional position of the user, such that the accuracy of gaze estimation of the user is effectively improved, and the present application may be suitable for a cockpit scene. In addition, in the application, an optimization sample comprising the second gaze direction of the user and the first image thereof is also used to optimize a gaze tracking model by means of a few‑shot learning method, such that the accuracy of gaze estimation of the gaze tracking model for a specific user is improved, and a gaze tracking model having high accuracy for the specific user can be obtained.
Description
The present application relates to the field of intelligent driving, and in particular to a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle.

Gaze tracking is an important support for upper-layer applications such as distraction detection, takeover level estimation, and gaze interaction in the smart cockpit. Owing to differences between people in the external features of the eyes and the internal structure of the eyeball, it is usually impossible to train a gaze tracking model that is accurate for everyone. At the same time, due to camera installation errors and other causes, directly using the line-of-sight angle output by a gaze tracking model incurs a certain loss of accuracy, resulting in inaccurate line-of-sight estimation. If the error of line-of-sight estimation can be corrected, the user experience of upper-layer applications based on gaze tracking can be effectively improved.
Summary of the Invention
In view of the above problems in the related art, the present application provides a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, which can effectively improve the accuracy of line-of-sight estimation for a specific user.

To achieve the above object, a first aspect of the present application provides a line-of-sight calibration method, including: obtaining a three-dimensional position of a user's eyes and a first line-of-sight direction according to a first image that is captured by a first camera and includes the user's eyes; obtaining a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and including the scene outside the vehicle seen by the user; obtaining the position of the user's gaze point in the second image according to the user's gaze area in the second image and the second image; obtaining a three-dimensional position of the user's gaze point according to the position of the gaze point and the intrinsic parameters of the second camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.

In this way, the second image can be used to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and, in turn, the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the first aspect, the first line-of-sight direction is extracted from the first image based on a gaze tracking model.

In this way, the user's initial line-of-sight direction can be obtained efficiently.

As a possible implementation of the first aspect, obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera includes: obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.

In this way, the error caused by the limited accuracy of the gaze tracking model can be eliminated from the finally obtained second line-of-sight direction.

As a possible implementation of the first aspect, the method further includes: using the user's second line-of-sight direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.

As a possible implementation of the first aspect, the method further includes: screening the gaze point or the second line-of-sight direction according to the confidence of the user's gaze point in the second image.

In this way, the amount of computation can be reduced, and the processing efficiency and line-of-sight calibration accuracy can be improved.

As a possible implementation of the first aspect, the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the user's gaze area in the second image and the second image.

In this way, the user's gaze point in the second image can be obtained efficiently, accurately, and stably.

As a possible implementation of the first aspect, the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.

In this way, the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
A second aspect of the present application provides a line-of-sight calibration method, including:

obtaining a three-dimensional position of a user's gaze point in response to the user's gaze operation on a reference point in a display screen;

obtaining a three-dimensional position of the user's eyes according to a first image that is captured by a first camera and includes the user's eyes; and

obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.

In this way, the accuracy of the user's line-of-sight data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the second aspect, the display screen is an augmented reality head-up display.

In this way, the driver's line of sight can be calibrated without affecting safe driving.

As a possible implementation of the second aspect, the method further includes: using the user's second line-of-sight direction and the first image as optimization samples of the user, and optimizing the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.
A third aspect of the present application provides a line-of-sight calibration apparatus, including:

an eye position determination unit configured to obtain a three-dimensional position of a user's eyes according to a first image that is captured by a first camera and includes the user's eyes;

a first line-of-sight determination unit configured to obtain a first line-of-sight direction of the user according to the first image captured by the first camera and including the user's eyes;

a gaze area unit configured to obtain a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and including the scene outside the vehicle seen by the user;

a gaze point calibration unit configured to obtain the position of the user's gaze point in the second image according to the user's gaze area in the second image and the second image;

a gaze point conversion unit configured to obtain a three-dimensional position of the user's gaze point according to the position of the gaze point and the intrinsic parameters of the second camera; and

a second line-of-sight determination unit configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.

In this way, the second image can be used to calibrate the user's line-of-sight direction to obtain a second line-of-sight direction with high accuracy, effectively improving the accuracy of the user's line-of-sight data and, in turn, the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the third aspect, the first line-of-sight direction is extracted from the first image based on a gaze tracking model.

In this way, the user's initial line-of-sight direction can be obtained efficiently.

As a possible implementation of the third aspect, the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.

In this way, the error caused by the limited accuracy of the gaze tracking model can be eliminated from the finally obtained second line-of-sight direction.

As a possible implementation of the third aspect, the apparatus further includes: an optimization unit configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.

As a possible implementation of the third aspect, the gaze point calibration unit is further configured to screen the gaze point according to the confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.

In this way, the amount of computation can be reduced, and the processing efficiency and line-of-sight calibration accuracy can be improved.

As a possible implementation of the third aspect, the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the user's gaze area in the second image and the second image.

In this way, the user's gaze point in the second image can be obtained efficiently, accurately, and stably.

As a possible implementation of the third aspect, the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined by the probability value.

In this way, the data provided by the gaze point calibration model can be fully utilized to improve processing efficiency.
A fourth aspect of the present application provides a line-of-sight calibration apparatus, including:

a gaze point position determination unit configured to obtain a three-dimensional position of a user's gaze point in response to the user's gaze operation on a reference point in a display screen;

an eye position determination unit configured to obtain a three-dimensional position of the user's eyes according to a first image that is captured by a first camera and includes the user's eyes; and

a second line-of-sight determination unit configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.

In this way, the accuracy of the user's line-of-sight data can be effectively improved, thereby improving the user experience of upper-layer applications based on gaze tracking.

As a possible implementation of the fourth aspect, the display screen is the display screen of an augmented reality head-up display system.

In this way, the driver's line of sight can be calibrated without affecting safe driving.

As a possible implementation of the fourth aspect, the apparatus further includes: an optimization unit configured to use the user's second line-of-sight direction and the first image as optimization samples of the user, and optimize the gaze tracking model based on a few-shot learning method.

In this way, small-scale training on a small number of samples can continuously improve the gaze tracking model's line-of-sight estimation accuracy for a specific user, yielding a user-level gaze tracking model.
A fifth aspect of the present application provides a computing device, including:

at least one processor; and

at least one memory storing program instructions that, when executed by the at least one processor, cause the at least one processor to execute the above line-of-sight calibration method.

A sixth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, wherein the program instructions, when executed by a computer, cause the computer to execute the above line-of-sight calibration method.
A seventh aspect of the present application provides a driver monitoring system, including:

a first camera configured to capture a first image including a user's eyes;

a second camera configured to capture a second image including the scene outside the vehicle seen by the user;

at least one processor; and

at least one memory storing program instructions that, when executed by the at least one processor, cause the at least one processor to execute the line-of-sight calibration method of the first aspect above.

In this way, the accuracy of line-of-sight estimation for users such as the driver in the vehicle cockpit can be effectively improved, thereby improving the user experience of the driver monitoring system and of upper-layer applications in the smart cockpit such as distraction detection, takeover level estimation, and gaze interaction.

As a possible implementation of the seventh aspect, the driver monitoring system further includes a display screen configured to display a reference point to the user; and the program instructions, when executed by the at least one processor, cause the at least one processor to execute the line-of-sight calibration method of the second aspect.

An eighth aspect of the present application provides a vehicle including the above driver monitoring system.

In this way, the accuracy of line-of-sight estimation for users such as the driver in the vehicle cockpit can be effectively improved, thereby improving the user experience of upper-layer applications in the vehicle cockpit such as distraction detection, takeover level estimation, and gaze interaction, and ultimately improving the safety of intelligent driving.
In the embodiments of the present application, the three-dimensional position of the user's eyes is obtained from a first image including the user's eyes, and the three-dimensional position of the user's gaze point is obtained from a calibrated position on a display screen or from a second image including the scene outside the vehicle seen by the user; a second line-of-sight direction with high accuracy is then obtained, which effectively improves the accuracy of user line-of-sight estimation and is applicable to cockpit scenarios. In addition, the second line-of-sight direction and the first image can also serve as personalized samples of the user for optimizing the gaze tracking model, so that a gaze tracking model specific to the user can be obtained, thereby solving the difficulty of optimizing gaze tracking models and the problem of low line-of-sight estimation accuracy for some users.

These and other aspects of the invention will be more clearly understood from the following description of the embodiment(s).

Various features of the present invention and the relationships between them are further described below with reference to the accompanying drawings. The drawings are exemplary; some features are not shown to actual scale, and some drawings may omit features that are customary in the field to which the present application pertains and are not essential to the present application, or may additionally show features that are not essential to the present application. The combinations of features shown in the drawings are not intended to limit the present application. In addition, throughout this specification, the same reference numerals denote the same content. The drawings are described as follows:
Fig. 1 is a schematic diagram of an exemplary architecture of a system in an embodiment of the present application.

Fig. 2 is a schematic diagram of the installation positions of sensors in an embodiment of the present application.

Fig. 3 is a schematic flowchart of a line-of-sight calibration method in an embodiment of the present application.

Fig. 4 is an example diagram of eye reference points in an embodiment of the present application.

Fig. 5 is a schematic flowchart of eye three-dimensional position estimation in an embodiment of the present application.

Fig. 6 is an example diagram of a cockpit scene to which embodiments of the present application are applicable.

Fig. 7 is a schematic diagram of the gaze area in the reference coordinate system in the scene of Fig. 6.

Fig. 8 is a schematic diagram of the gaze area in the second image in the scene of Fig. 6.

Fig. 9 is a schematic flowchart of determining the user's gaze area in the second image in an embodiment of the present application.

Fig. 10 is an example diagram of the projection between the gaze area in the reference coordinate system and the gaze area in the second image.

Fig. 11 is a schematic structural diagram of a gaze point calibration model in an embodiment of the present application.

Fig. 12 is a schematic flowchart of obtaining the three-dimensional position of the gaze point in an embodiment of the present application.

Fig. 13 is a schematic diagram of an exemplary flow of optimizing a gaze tracking model in an embodiment of the present application.

Fig. 14 is a schematic diagram of the driver's line-of-sight calibration and model optimization process in a cockpit scene.

Fig. 15 is a schematic structural diagram of a line-of-sight calibration apparatus in an embodiment of the present application.

Fig. 16 is a schematic diagram of an exemplary architecture of a system in another embodiment of the present application.

Fig. 17 is a schematic flowchart of a line-of-sight calibration method in another embodiment of the present application.

Fig. 18 is a schematic structural diagram of a line-of-sight calibration apparatus in another embodiment of the present application.

Fig. 19 is a schematic structural diagram of a computing device according to an embodiment of the present application.
The words "first", "second", "third", and the like, or similar terms such as module A, module B, and module C, in the specification and claims are used only to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, specific orders or sequences can be interchanged so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein.

In the following description, the reference numerals denoting steps, such as S110 and S120, do not necessarily indicate that the steps are executed in that order; where permitted, the order of steps may be interchanged, or steps may be executed simultaneously.

The term "comprising" used in the specification and claims should not be interpreted as being restricted to what is listed thereafter; it does not exclude other elements or steps. It should therefore be interpreted as specifying the presence of the mentioned features, integers, steps, or components, without excluding the presence or addition of one or more other features, integers, steps, or components, or groups thereof. Therefore, the expression "a device comprising means A and B" should not be limited to a device consisting of components A and B only.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places in this specification do not necessarily all refer to the same embodiment, but they may. Furthermore, in one or more embodiments, the particular features, structures, or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. In case of any inconsistency, the meaning stated in this specification, or the meaning derived from the content recorded in this specification, shall prevail. In addition, the terms used herein are only for the purpose of describing the embodiments of the application and are not intended to limit the application.

In order to accurately describe the technical content of this application and to accurately understand the present invention, the terms used in this specification are explained or defined as follows before the specific embodiments are described.
Gaze tracking (eye tracking): a technology for measuring the gaze direction or gaze point of human eyes.

Gaze tracking model (eye tracking model): a machine learning model, for example a neural network model, that can estimate the gaze direction or gaze point of human eyes from an image containing the eyes or face.

Driver Monitoring System (DMS): a system that monitors the state of the driver in the vehicle based on image processing technology, speech processing technology, and the like. It includes components such as an in-vehicle camera installed in the cockpit, a processor, and a fill light; the in-vehicle camera can capture images including the driver's face, head, and part of the torso (for example, the arms), referred to herein as DMS images.

Exterior camera, also called the front camera: used to capture images containing the scene outside the vehicle (especially the scene in front of the vehicle), which include the scene outside the vehicle seen by the driver.

Color (RGB) camera: images an object in color by sensing the natural light or near-infrared light reflected from the object.
Time-of-flight (TOF) camera: emits light pulses toward a target object while recording the round-trip time of the reflected pulses, from which the distance between the light pulse emitter and the target object is calculated; a 3D image of the target object is generated on this basis, including the depth information of the target object and the reflected light intensity.
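For illustration only, the depth reported by a TOF camera can be back-projected into a 3D point in the camera frame using the camera intrinsics; the intrinsic matrix K below is an assumed placeholder, not a calibrated value from this application:

```python
import numpy as np

K = np.array([[800.0,   0.0, 640.0],   # fx, skew, cx (illustrative intrinsics)
              [  0.0, 800.0, 360.0],   # fy, cy
              [  0.0,   0.0,   1.0]])

def back_project(u, v, depth):
    """Recover the 3D point (in the camera frame) for pixel (u, v) whose
    measured depth along the optical axis is `depth` (meters)."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])
```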
PnP (Perspective-n-Point): the problem of computing the pose of a camera or object from the projection relationship between N feature points in the world coordinate system and the corresponding N image points in the image coordinate system. Solving PnP means: given n matching pairs of 3D reference points {c1, c2, ..., cn} and their 2D projections {u1, u2, ..., un} on the camera image, with the coordinates of the 3D reference points in the world coordinate system, the coordinates of the 2D points in the image coordinate system, and the camera intrinsics K known, find the pose transformation {R|t} between the world coordinate system and the camera coordinate system, where R is the rotation matrix and t is the translation vector.
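As a hedged sketch, the PnP problem defined above can be solved with OpenCV's solvePnP; the 3D-2D correspondences and intrinsics below are placeholder values for illustration:

```python
import numpy as np
import cv2

# n >= 4 known 3D reference points in the world frame and their 2D
# projections in the image (placeholder values).
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]],
                         dtype=np.float64)
image_points = np.array([[320, 240], [420, 242], [318, 340], [422, 338]],
                        dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix R and translation tvec give {R|t}
```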
Landmark algorithm: a technique for extracting facial feature points.

World coordinate system, also called the measurement coordinate system or objective coordinate system: a three-dimensional rectangular coordinate system used as a reference to describe the three-dimensional positions of the camera and the object to be measured. It is the absolute coordinate system of the objective three-dimensional world, and its coordinate values are usually denoted Pw(Xw, Yw, Zw).

Camera coordinate system: a three-dimensional rectangular coordinate system whose origin is the optical center of the camera, whose Z axis is the camera's optical axis, and whose X and Y axes are parallel to the X and Y axes of the image coordinate system; its coordinate values are usually denoted Pc(Xc, Yc, Zc).

Camera extrinsic parameters: determine the relative positional relationship between the camera coordinate system and the world coordinate system; they are the parameters for converting from the world coordinate system to the camera coordinate system, including the rotation matrix R and the translation vector T. Taking pinhole imaging as an example, the camera extrinsics, world coordinates, and camera coordinates satisfy relation (1): Pc = R·Pw + T, where Pw is the world coordinate, Pc is the camera coordinate, T = (Tx, Ty, Tz) is the translation vector, and R = R(α, β, γ) is the rotation matrix, with rotation angles γ, β, and α around the Z, Y, and X axes of the camera coordinate system, respectively. These six parameters, namely α, β, γ, Tx, Ty, and Tz, constitute the camera's extrinsic parameters.
Camera intrinsic parameters: determine the projection relationship from three-dimensional space to the two-dimensional image and are related only to the camera. Taking the pinhole imaging model as an example and ignoring image distortion, the intrinsics may include the camera's scale factors along the two coordinate axes u and v of the image coordinate system, the principal point coordinates (x0, y0) relative to the imaging plane coordinate system, and the axis skew parameter s. The u-axis scale factor is the ratio of the physical length of each pixel in the x direction of the image coordinate system to the camera focal length f, and the v-axis scale factor is the ratio of the physical length of each pixel in the y direction of the image coordinate system to the camera focal length. If image distortion is considered, the intrinsics may include the camera's scale factors along the u and v axes of the image coordinate system, the principal point coordinates relative to the imaging plane coordinate system, the axis skew parameter, and distortion parameters; the distortion parameters may include the camera's three radial distortion parameters and two tangential distortion parameters.
The camera's intrinsic and extrinsic parameters can be obtained through Zhang Zhengyou's calibration method. In the embodiments of the present application, the intrinsics and extrinsics of the first camera and those of the second camera are all calibrated in the same world coordinate system.
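A minimal sketch of Zhang Zhengyou's checkerboard-based calibration using OpenCV follows; the board size and image paths are assumptions for illustration only:

```python
import numpy as np
import cv2
import glob

pattern = (9, 6)  # inner-corner grid of an assumed checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K, distortion coefficients, and per-view
# extrinsics (rotation and translation vectors).
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```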
Imaging plane coordinate system, i.e., the image coordinate system: its origin is the center of the image plane, and its X and Y axes are parallel to two perpendicular sides of the image plane; coordinate values are usually denoted P(x, y). The image coordinate system expresses the position of a pixel in the image in physical units (for example, millimeters).

Pixel coordinate system: the image coordinate system in pixel units, with the top-left corner of the image plane as the origin and the X and Y axes parallel to the X and Y axes of the image coordinate system; coordinate values are usually denoted p(u, v). The pixel coordinate system expresses the position of a pixel in the image in units of pixels.
Taking the pinhole camera model as an example, the coordinate values of the pixel coordinate system and the camera coordinate system satisfy relation (2):

Zc · [u, v, 1]^T = K · [Xc, Yc, Zc]^T    (2)

where (u, v) are the coordinates in the pixel-based image coordinate system, (Xc, Yc, Zc) are the coordinates in the camera coordinate system, and K is the matrix representation of the camera intrinsics.
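Combining relations (1) and (2), a minimal sketch of projecting a world point to pixel coordinates under the pinhole model (function and variable names are illustrative):

```python
import numpy as np

def project(Pw, R, T, K):
    """World point -> pixel coordinates under the pinhole model.

    Relation (1): Pc = R @ Pw + T
    Relation (2): Zc * [u, v, 1]^T = K @ [Xc, Yc, Zc]^T
    """
    Pc = R @ np.asarray(Pw, dtype=float) + T
    uv1 = K @ Pc / Pc[2]    # divide by depth Zc to normalize
    return uv1[:2]          # (u, v)
```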
Few-shot learning: after a neural network has pre-learned from a large number of samples of known categories, it can learn a new category quickly from only a small number of labeled samples.

Meta-learning: an important branch of few-shot learning research. The main idea is that, when the target task has few training samples, the neural network is trained using a large number of few-shot tasks similar to the target few-shot task, so that the trained network has good initial values on the target task; the trained network is then adjusted using the small number of training samples of the target few-shot task.
Model-agnostic meta-learning (MAML) algorithm: a specific meta-learning algorithm whose idea is to train the initialization parameters of a machine learning model so that the model achieves good performance after one or more parameter updates on a small amount of data from a new task.
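A hedged sketch of one MAML meta-update is shown below; it assumes PyTorch 2.x (for torch.func.functional_call), and the task format, loss, and learning rates are illustrative assumptions rather than a prescribed implementation:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch 2.x

def maml_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    """One MAML meta-update over a batch of few-shot tasks.

    Each task is a (support_x, support_y, query_x, query_y) tuple.
    Adapted parameters are computed by differentiable gradient descent
    on the support set; the meta-gradient comes from the query-set loss.
    """
    meta_loss = 0.0
    for sx, sy, qx, qy in tasks:
        params = dict(model.named_parameters())
        for _ in range(inner_steps):
            inner_loss = F.mse_loss(functional_call(model, params, (sx,)), sy)
            grads = torch.autograd.grad(
                inner_loss, list(params.values()), create_graph=True)
            params = {name: p - inner_lr * g
                      for (name, p), g in zip(params.items(), grads)}
        meta_loss = meta_loss + F.mse_loss(
            functional_call(model, params, (qx,)), qy)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```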
soft argmax: an algorithm or function that obtains keypoint coordinates from a heatmap. It can be implemented as a neural network layer, and the layer implementing soft argmax may be called a soft argmax layer.
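As an illustrative implementation (the temperature parameter beta is an assumption), a differentiable 2D soft argmax over a heatmap can be written as:

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmap, beta=100.0):
    """Differentiable keypoint coordinates from an (H, W) heatmap.

    A softmax over all pixels turns the heatmap into a probability map;
    the expected pixel coordinates under that distribution are the
    soft-argmax estimate of the keypoint location.
    """
    h, w = heatmap.shape
    probs = F.softmax(beta * heatmap.reshape(-1), dim=0).reshape(h, w)
    ys = torch.arange(h, dtype=heatmap.dtype)
    xs = torch.arange(w, dtype=heatmap.dtype)
    y = (probs.sum(dim=1) * ys).sum()  # expectation over rows
    x = (probs.sum(dim=0) * xs).sum()  # expectation over columns
    return x, y
```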
Binary cross-entropy: a type of loss function.
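For reference, with a ground-truth label y in {0, 1} and a predicted probability p, the binary cross-entropy is commonly written as:

```latex
\mathrm{BCE}(y, p) = -\bigl(\, y \log p + (1 - y)\log(1 - p) \,\bigr)
```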
Monocular depth estimation (Fast Depth): a method of estimating, from a single RGB image taken from one viewpoint, the distance of each pixel in the image relative to the capturing camera.

Head-up display (HUD) system, also called a parallel display system: projects important driving information such as speed, engine RPM, battery level, and navigation onto the windshield in front of the driver, so that the driver can see vehicle parameters and driving information through the windshield display area without lowering or turning the head.

Augmented reality head-up display (AR-HUD) system: precisely combines image information with actual traffic conditions through a specially designed internal optical system, projecting information such as tire pressure, speed, and RPM onto the front windshield so that the driver can view vehicle-related information while driving without looking down.
A first possible implementation is to collect a large amount of gaze data to train a gaze tracking model and deploy the trained model on the vehicle, where it processes images captured in real time to finally obtain the user's line-of-sight direction. The main drawback of this implementation is that there may be large individual differences between the samples used to train the gaze tracking model and the current user (for example, individual differences in the internal structure of the human eye), so the model does not match the current user well, resulting in inaccurate line-of-sight estimation for the current user.

A second possible implementation is to display a specific image on a screen and calibrate the gaze tracking device through the user's interaction with that image, obtaining parameters specific to the user and thereby improving the device's accuracy for that user. The main drawbacks of this implementation are that it relies on the user's active cooperation, the operation is cumbersome, and improper manual operation may cause calibration errors, ultimately affecting the accuracy of the gaze tracking device for the user. Moreover, since it is difficult to deploy a sufficiently large display screen directly in front of the driver in a vehicle cockpit, this implementation is not suitable for the cockpit scene.

A third possible implementation is that, while a picture is being played on a screen, a basic gaze tracking model first predicts a preliminary line-of-sight direction, a preliminary gaze area on the screen is obtained from that direction, and the predicted gaze area is then corrected by combining the preliminary gaze area with the picture being played, thereby improving line-of-sight estimation accuracy. The main drawbacks of this implementation are that it applies only to scenes where the user is watching the screen, and its accuracy is low when the gaze point changes constantly.

All of the above implementations suffer from inaccurate line-of-sight estimation in the cockpit scene. In view of this, the embodiments of the present application propose a line-of-sight calibration method, apparatus, device, computer-readable storage medium, system, and vehicle, in which the three-dimensional position of the user's eyes is obtained from a first image including the user's eyes, the three-dimensional position of the user's gaze point is obtained from a calibrated position on a display screen or from a second image including the scene outside the vehicle seen by the user, and a second line-of-sight direction with high accuracy is obtained from the three-dimensional eye position and the three-dimensional gaze point position. The embodiments can therefore effectively improve the accuracy of user line-of-sight estimation and are applicable to cockpit scenarios. In addition, the embodiments of the present application also use optimization samples containing the user's second line-of-sight direction and the corresponding first image to optimize the gaze tracking model through a few-shot learning method, improving the model's line-of-sight estimation accuracy for the user, thereby obtaining a user-level gaze tracking model and solving the problems of difficult gaze tracking model optimization and low line-of-sight estimation accuracy for some users.
The embodiments of the present application are applicable to any application scenario that requires real-time calibration or estimation of a person's line-of-sight direction. In some examples, the embodiments are applicable to line-of-sight calibration or estimation for drivers and/or passengers in the cockpit environment of vehicles such as motor vehicles, boats, and aircraft. In other examples, the embodiments are also applicable to other scenarios, for example, line-of-sight calibration or estimation for a person wearing devices such as wearable glasses. Of course, the embodiments of the present application can also be applied to other scenarios, which are not listed one by one here.
[Embodiment 1]
The system to which this embodiment applies is first described by way of example.
Fig. 1 is a schematic architecture diagram of an exemplary system 100 of this embodiment in a cockpit environment. As shown in Fig. 1, the exemplary system 100 may include a first camera 110, a second camera 120, an image processing system 130, and a model optimization system 140.
The first camera 110 captures images of the user's eyes (i.e., the first image described below). As shown in Fig. 1, taking the cockpit scenario as an example, the first camera 110 may be an in-vehicle camera of a driver monitoring system (DMS) that photographs the driver in the cockpit. Taking the driver as an example and referring to Fig. 2, the in-vehicle camera may be a DMS camera mounted on the A-pillar (position ① in Fig. 2) or near the steering wheel, preferably a relatively high-resolution RGB camera. Here, an eye image (i.e., the first image below) broadly refers to any type of image containing human eyes, for example a face image or a half-body image that includes the face. In some embodiments, to obtain other user information along with the eye position from the first image while keeping the image data volume small, the eye image (i.e., the first image below) may be a face image.
The second camera 120 captures a scene image (i.e., the second image described below) containing the scene outside the vehicle as seen by the user; in other words, the field of view of the second camera 120 at least partially overlaps the user's field of view. As shown in Fig. 2, taking the cockpit scenario and the driver as an example, the second camera 120 may be an exterior camera that photographs the scene in front of the vehicle as seen by the driver. In the example of Fig. 2, the exterior camera may be a front camera mounted above the front windshield (position ② in Fig. 2), which captures the scene ahead of the vehicle, i.e., the exterior scene the driver sees. The front camera is preferably a TOF camera, which can capture depth images and thus makes it easy to obtain, from the image, the distance between the vehicle and a target object ahead (for example, the object the user is gazing at).
The image processing system 130 can process DMS images and scene images. It can run a gaze tracking model to obtain the user's preliminary gaze data (the first gaze direction below) and use that data to perform the gaze calibration method described below, obtaining the user's calibrated gaze data (the second gaze direction below) and thereby improving the accuracy of the user's gaze data.
The model optimization system 140 is responsible for optimizing the gaze tracking model. It can optimize the model using the user's calibrated gaze data provided by the image processing system 130 and supply the optimized model back to the image processing system 130, improving the gaze tracking model's estimation accuracy for that user.
In practice, the first camera 110, the second camera 120, and the image processing system 130 can all be deployed on the vehicle side, i.e., in the vehicle. The model optimization system 140 can be deployed on the vehicle side and/or in the cloud as required. The image processing system 130 and the model optimization system 140 may communicate over a network.
In some embodiments, the exemplary system 100 may further include a model training system 150, which trains the gaze tracking model and may be deployed in the cloud. In practice, the model optimization system 140 and the model training system 150 can be implemented by the same system.
As shown in Fig. 2, the camera coordinate system of the first camera 110 may be a Cartesian coordinate system Xc1-Yc1-Zc1, and the camera coordinate system of the second camera 120 may be a Cartesian coordinate system Xc2-Yc2-Zc2; the image coordinate systems and pixel coordinate systems of the first camera 110 and the second camera 120 are not shown in Fig. 2. In this embodiment, to make it easy to optimize the gaze tracking model with the second gaze direction obtained by calibration, the camera coordinate system of the first camera 110 is used as the reference coordinate system, and the gaze direction, the three-dimensional gaze point position, and the three-dimensional eye position can all be expressed as coordinates and/or angles in that coordinate system. In a specific application, the reference coordinate system can be chosen freely according to factors such as actual requirements, the application scenario, and computational-complexity constraints; it is not limited to this choice. For example, the vehicle's cockpit coordinate system may also serve as the reference coordinate system.
The gaze calibration method of this embodiment is described in detail below.
Fig. 3 shows an exemplary flow of the gaze calibration method of this embodiment. As shown in Fig. 3, an exemplary gaze calibration method of this embodiment may include the following steps:
Step S301: obtain the user's three-dimensional eye position and first gaze direction from a first image containing the user's eyes captured by the first camera 110;
Step S302: obtain the user's gaze region in a second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, and the extrinsic and intrinsic parameters of the second camera 120, the second image being captured by the second camera 120 and containing the scene outside the vehicle as seen by the user;
Step S303: obtain the position of the user's gaze point in the second image from the user's gaze region in the second image and the second image itself;
Step S304: obtain the user's three-dimensional gaze point position from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120;
Step S305: obtain the user's second gaze direction from the three-dimensional gaze point position and the three-dimensional eye position, the second gaze direction serving as the calibrated gaze direction.
The gaze calibration method of this embodiment can use the second image to calibrate the user's gaze direction and obtain a more accurate second gaze direction, effectively improving the accuracy of the user's gaze data and, in turn, the user experience of upper-layer applications built on gaze tracking.
The first gaze direction is extracted from the first image by a gaze tracking model. Taking the system 100 as an example, the gaze tracking model can be trained by the model training system 150 deployed in the cloud and provided to the image processing system 130 deployed on the user's vehicle; the image processing system 130 runs the model on the first image containing the user's eyes to obtain the user's first gaze direction.
The three-dimensional eye position can be expressed as the coordinates of a pre-selected eye reference point in the reference coordinate system. In at least some embodiments, the eye reference point can be selected according to the needs of the application scenario, how the gaze direction will be used, computational-complexity requirements, hardware capabilities, and the user's own needs. Fig. 4 shows examples of eye reference points, which may include, without limitation, one or more of the midpoint O between the centers of the two eyes, the left-eye center O1, and the right-eye center O2. Here, an eye center may be the pupil center, the eyeball center, the corneal center, or another position of the eye, chosen freely as needed.
For a user in a cockpit, the distance from the gaze point to the eyes is much larger than the distance between the two eyes, so the midpoint O between the eye centers can be selected as the eye reference point; this reduces the data volume and computational complexity and improves processing efficiency without affecting gaze estimation accuracy. If the second gaze direction is to be used to optimize the gaze tracking model and the user expects higher model accuracy, the left-eye center O1 and the right-eye center O2 can be selected as eye reference points.
The gaze direction can be expressed as a viewing angle and/or a gaze vector in the reference coordinate system. The viewing angle may be the angle between the line of sight and the eye axis, with the intersection of the line of sight and the eye axis being the user's three-dimensional eye position. The gaze vector is a direction vector whose origin is the eye position in the reference coordinate system and whose endpoint is the gaze point position in that system; it can contain the three-dimensional coordinates of the eye reference point and of the gaze point in the reference coordinate system.
The gaze point is the point at which the user's eyes are looking. In the cockpit scenario, the driver's gaze point is the specific location the driver's eyes are directed at. A gaze point can be represented by its position in space; in this embodiment, its three-dimensional position is represented by its three-dimensional coordinates in the reference coordinate system.
In step S301, the user's three-dimensional eye position can be determined in any applicable way. In some implementations, it can be obtained by a facial landmark detection algorithm combined with a pre-built 3D face model. In other implementations, it can be obtained with a landmark algorithm from the two-dimensional position found in the first image combined with the first image's depth information. It will be appreciated that any method able to derive the three-dimensional position of a point in an image from image data can be used to determine the user's three-dimensional eye position in step S301; they are not enumerated here.
Fig. 5 shows an exemplary eye 3D position estimation flow. As shown in Fig. 5, it may include: step S501, processing the first image with a face detection algorithm and a facial landmark detection algorithm to obtain the positions of the user's facial landmarks in the first image; step S502, solving a PnP problem that combines those landmark positions with a pre-obtained standard 3D face model, yielding the 3D coordinates of the user's facial landmarks in the reference coordinate system; and step S503, extracting the 3D coordinates of the user's eye reference points from those landmark coordinates as the 3D coordinates of the user's eyes. Note that Fig. 5 is only an example and does not limit how eye 3D position estimation is implemented in this embodiment.
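As a concrete, simplified sketch of this flow (not code from the application itself), the snippet below recovers the face pose with OpenCV's solvePnP against a generic 3D face model and takes the midpoint of the two model eye points as the eye reference point O; the landmark positions, model points, and camera intrinsics are all placeholder assumptions:

```python
import cv2
import numpy as np

# Hypothetical 2D landmarks detected in the first (DMS) image, in pixels.
landmarks_2d = np.array([[320, 240], [380, 238], [350, 300],
                         [330, 360], [300, 250], [400, 248]], dtype=np.float64)

# Matching points of a generic (standard) 3D face model, in millimetres.
face_model_3d = np.array([[-30, 35, 0], [30, 35, 0], [0, 0, 30],
                          [0, -40, 10], [-45, 38, -20], [45, 38, -20]],
                         dtype=np.float64)

# Placeholder intrinsics of the first camera; real values come from calibration.
K1 = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)  # assume an undistorted image

# PnP yields the pose of the face model in the first camera's coordinate system.
ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, K1, dist)
R, _ = cv2.Rodrigues(rvec)

# Transform the model's eye points into camera coordinates and take the midpoint
# of both eyes as the eye reference point O (cf. Fig. 4).
left_eye, right_eye = face_model_3d[4], face_model_3d[5]
eye_3d = R @ ((left_eye + right_eye) / 2) + tvec.ravel()
print("eye reference point in camera-1 coordinates:", eye_3d)
```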
In step S302, the user's gaze region in the second image can be determined with a camera perspective projection model from the user's three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, and the intrinsic and extrinsic parameters of the second camera 120 (below, the "gaze region in the second image" is abbreviated to "second gaze region"). The camera perspective projection model may be a pinhole imaging model or a nonlinear perspective projection model.
To obtain a more accurate second gaze region, step S302 may include: obtaining the user's gaze region in the second image from the user's three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera 110, the intrinsic and extrinsic parameters of the second camera 120, and the accuracy of the gaze tracking model. The error introduced by the model's limited accuracy can thereby be eliminated from the final second gaze direction.
The process of obtaining the second gaze region is described in detail below with reference to a concrete scenario.
Fig. 6 shows a scene in which a driver (not shown) in the cockpit looks at a pedestrian in the crosswalk ahead of the vehicle.
Fig. 9 shows an exemplary flow for determining the user's second gaze region. As shown in Fig. 9, the process may include the following steps:
Step S901: determine the user's gaze region S1 in the reference coordinate system from the user's three-dimensional eye position and the first gaze direction.
Specifically, the user's line of sight ON in the reference coordinate system is obtained from the coordinates (Xc1, Yc1, Zc1) of the user's eye reference point in the reference coordinate system and the first gaze direction ON (viewing angle θ) obtained from the first image. Suppose the average accuracy of the gaze tracking model is expressed as ±α, where α is the viewing-angle error; the lower the model's accuracy, the larger α. In this step, the viewing angle θ can be widened to the interval [θ−α, θ+α], and the cone bounded by the line of sight at angle θ−α and the line of sight at angle θ+α is taken as the user's gaze region S1 in the reference coordinate system.
Fig. 7 visualizes the driver's gaze region S1 in the reference coordinate system for the scene of Fig. 6: O is the three-dimensional eye position, the solid arrow is the first gaze direction ON, θ is the viewing angle of ON, α is the average accuracy of the gaze tracking model, and the dashed cone is the user's gaze region S1 in the reference coordinate system.
Step S902: project the user's gaze region S1 in the reference coordinate system into the pixel coordinate system of the second camera 120 to obtain the user's second gaze region Q.
Fig. 8 shows the second image obtained when the second camera photographs the scene of Fig. 6; only the part the driver gazes at is shown, content of Fig. 6 irrelevant to this embodiment is omitted, and the user's second gaze region Q is marked.
Taking the pinhole imaging model as an example and referring to Figs. 6 to 8, the projection in this step can be carried out with formulas (1) and (2). Specifically, based on the extrinsic parameters of the first camera 110 and of the second camera 120, the gaze region S1 is first transformed into the camera coordinate system of the second camera 120 according to formula (1), yielding a gaze region S2; then, based on the intrinsic parameters of the second camera 120, the gaze region S2 is projected into the pixel coordinate system of the second camera 120 according to formula (2), yielding the user's second gaze region Q. Here, the extrinsic parameters of the first camera 110 and of the second camera 120 are calibrated in the same world coordinate system.
Via the extrinsic parameters of the first camera 110 and the intrinsic and extrinsic parameters of the second camera 120, the gaze region S1 projects onto the imaging plane of the second camera 120 as a quadrilateral second gaze region Q. In general, the lower the gaze tracking model's accuracy, the larger α, the wider the angle of the gaze region S1 in the reference coordinate system, and the wider the quadrilateral region Q.
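The two-stage mapping described above can be sketched for a single 3D point as follows, assuming a pinhole model with world-to-camera extrinsics; all numeric parameters are placeholders, and formulas (1) and (2) themselves are not reproduced in this text:

```python
import numpy as np

def cam1_to_cam2_pixels(p_c1, R1, t1, R2, t2, K2):
    """Map a 3D point from camera-1 coordinates to camera-2 pixel coordinates."""
    # Formula (1) analogue: camera-1 -> world -> camera-2
    # (extrinsics follow the world-to-camera convention p_cam = R p_world + t).
    p_world = R1.T @ (p_c1 - t1)
    p_c2 = R2 @ p_world + t2
    # Formula (2) analogue: pinhole projection onto the camera-2 image plane.
    uvw = K2 @ p_c2
    return uvw[:2] / uvw[2]

# Placeholder calibration parameters, for illustration only.
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([0.0, -0.3, -1.0])   # cameras ~1 m apart
K2 = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1.0]])

# Sampling points on the boundary rays of the cone [theta - alpha, theta + alpha]
# and projecting them like this outlines the quadrilateral region Q.
point_on_ray = np.array([0.2, 0.0, 5.0])  # a point 5 m ahead on one boundary ray
print(cam1_to_cam2_pixels(point_on_ray, R1, t1, R2, t2, K2))
```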
Fig. 10 shows an example projection of one line of sight OX. As shown in Fig. 10, points x at different depths along the line of sight OX project onto the imaging plane of the second camera 120 as O'X'. With O on the left as the origin of the human line of sight in space and OX as the first gaze direction L, mapping into the imaging plane of the second camera 120 takes the gaze origin to the point O' and the first gaze direction L to the line of sight L'.
Note that the methods shown in Figs. 7 to 10 are only examples; the method of obtaining the second gaze region in the embodiments of the present application is not limited to them.
The second gaze region can be represented by grayscale image data whose pixels correspond one-to-one with those of the second image; the gray value of each pixel indicates whether it belongs to the gaze region. In the example of Fig. 11 below, if Fig1 is the visualization of the second image, then the black-and-white image Fig2 is the visualization of the second gaze region: black pixels in Fig2 do not belong to the second gaze region, while white pixels do. Taking the cockpit scenario as an example, when the second camera is a TOF camera the second image is a TOF image, and the gray value of each of its pixels can indicate the distance from the corresponding point on the target object to the second camera.
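As an illustration of this representation (not code from the application itself), the sketch below rasterizes a projected quadrilateral Q into such a grayscale region image with OpenCV; the image size and quadrilateral corners are placeholder values:

```python
import cv2
import numpy as np

# Rasterize the projected quadrilateral Q into the gaze-region grayscale image
# (the Fig2 analogue): white (255) inside the region, black (0) elsewhere.
h, w = 72, 128                                   # second-image size used in Fig. 11
quad_px = np.array([[40, 20], [90, 18], [95, 55], [38, 58]], dtype=np.int32)
mask = np.zeros((h, w), dtype=np.uint8)
cv2.fillPoly(mask, [quad_px], color=255)
print(int(mask.sum()) // 255, "pixels marked as gaze region")
```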
In step S303, the position of the user's gaze point in the second image (below, the "gaze point in the second image" is abbreviated to "second gaze point") can be obtained from the second gaze region and the second image using a pre-trained gaze point calibration model. That model can be any machine learning model suitable for image processing; given the high accuracy and stability of neural networks, in the embodiments of the present application it is preferably a neural network model.
An exemplary implementation of the gaze point calibration model is described in detail below.
Fig. 11 shows an exemplary network structure for the gaze point calibration model. As shown in Fig. 11, the model may be an encoder-decoder neural network comprising a channel-wise concat layer, a ResNet-18 based encoder, a Convolutional GRU cell, a ResNet-18 based decoder, and a soft-argmax + scaling layer.
As shown in Fig. 11, the model's processing is as follows. At the input, the channel-wise concat layer merges the image of the second gaze region with the second image along the channel dimension into a new image: if both are single-channel grayscale images, the merged image has 2 channels; if the second image is a three-channel RGB image and the gaze-region image is single-channel grayscale, the merged image has 4 channels. The merged image is fed into the encoder and processed in turn by the encoder, the convolutional GRU cell, and the decoder; the decoder outputs a heatmap Fig3 in which the gray value of each pixel indicates the probability that the corresponding pixel is the gaze point. The heatmap Fig3 then passes through the soft-argmax layer, which computes the position of the gaze point in the second image, i.e., the coordinates (x, y) of its corresponding pixel. In general, one line of sight has one gaze point, and each gaze point may cover one or more pixels of the second image.
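A heavily simplified sketch of this processing is given below: the ResNet-18 encoder, convolutional GRU cell, and ResNet-18 decoder are stood in for by a few plain convolutions, while the channel-wise concatenation and the differentiable soft-argmax readout are shown as described; all layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_argmax_2d(heatmap):
    """Differentiable heatmap -> (x, y) pixel coordinates (the soft-argmax layer)."""
    b, _, h, w = heatmap.shape
    prob = F.softmax(heatmap.view(b, -1), dim=1).view(b, 1, h, w)
    xs = torch.linspace(0, w - 1, w).view(1, 1, 1, w)
    ys = torch.linspace(0, h - 1, h).view(1, 1, h, 1)
    x = (prob * xs).sum(dim=(1, 2, 3))
    y = (prob * ys).sum(dim=(1, 2, 3))
    return torch.stack([x, y], dim=1), prob

class GazePointCalibrator(nn.Module):
    """Toy stand-in for the ResNet-18 encoder / ConvGRU / decoder stack."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, scene_img, region_mask):
        # Channel-wise concat of the second image and the gaze-region image.
        x = torch.cat([scene_img, region_mask], dim=1)
        heatmap = self.net(x)              # decoder output (Fig3 analogue)
        return soft_argmax_2d(heatmap)     # gaze point (x, y) plus probabilities

model = GazePointCalibrator(in_ch=4)       # RGB scene + 1-channel region mask
xy, prob = model(torch.rand(1, 3, 72, 128), torch.rand(1, 1, 72, 128))
print(xy.shape)                            # torch.Size([1, 2])
```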
The gaze point calibration model can be pre-trained. Training samples consist of a scene image and its corresponding gaze-region grayscale image (in which the extent of the gaze region is a set value), with the sample's true gaze region known. During training, the ResNet part and the soft-argmax layer are trained simultaneously but with different loss functions; the embodiments of the present application do not restrict which loss functions are used. For example, the loss function of the ResNet part may be binary cross-entropy (BCE loss), and that of the soft-argmax layer may be mean squared error (MSE loss).
In some examples, the decoder in the ResNet part can use pixel-level binary cross-entropy as its loss function, as in formula (3):

BCE = −(1/N) Σ_{i=1..N} [ y_i · log p(y_i) + (1 − y_i) · log(1 − p(y_i)) ]    (3)

where y_i is the label indicating whether pixel i is the gaze point (1 if it is, 0 otherwise), p(y_i) is the probability that pixel i is the gaze point in the heatmap Fig3 output by the decoder, and N is the total number of pixels of the second image Fig1, which equals the total number of pixels of the heatmap Fig3. In the example of Fig. 11, the second image is 128×72, so N = 128 × 72 = 9216.
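Under these definitions, the joint training objective of the ResNet part (pixel-level BCE) and the soft-argmax layer (MSE) can be sketched as follows; the 0.5 weighting between the two terms is an illustrative assumption, since the text does not fix one:

```python
import torch
import torch.nn.functional as F

def calibration_loss(pred_prob, pred_xy, gt_mask, gt_xy):
    # Pixel-level BCE between the heatmap probabilities p(y_i) and the
    # 0/1 gaze-point labels y_i (formula (3)).
    bce = F.binary_cross_entropy(pred_prob, gt_mask)
    # MSE between the soft-argmax coordinates and the true gaze point.
    mse = F.mse_loss(pred_xy, gt_xy)
    return bce + 0.5 * mse  # illustrative weighting

# Example with the 128x72 image size of Fig. 11 (N = 9216 pixels).
pred_prob = torch.rand(1, 1, 72, 128)
gt_mask = torch.zeros(1, 1, 72, 128)
gt_mask[0, 0, 40, 60] = 1.0                # single true gaze-point pixel
loss = calibration_loss(pred_prob, torch.tensor([[60.0, 40.0]]),
                        gt_mask, torch.tensor([[61.0, 40.0]]))
print(float(loss))
```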
In step S304, the user's three-dimensional gaze point position, i.e., the three-dimensional coordinates of the gaze point in the reference coordinate system (the camera coordinate system of the first camera 110), can be obtained from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera 120 in a variety of ways. It will be appreciated that any algorithm which derives a point's position in space from its position in an image is applicable to step S304.
Since inverse perspective transformation is relatively mature and computationally cheap, step S304 preferably obtains the three-dimensional gaze point position through an inverse perspective transformation. Specifically, it suffices to obtain the depth of the second gaze point to get the gaze point's Z-axis coordinate; combining this with the position of the second gaze point, i.e., the pixel coordinates (u, v) obtained in step S303, a simple inverse perspective transformation yields the three-dimensional coordinates of the gaze point in the reference coordinate system, i.e., the gaze point's three-dimensional position.
Fig. 12 shows an exemplary concrete implementation of step S304. As shown in Fig. 12, step S304 may include: step S3041, obtaining the depth of the second gaze point from the second image with a monocular depth estimation algorithm, the depth being the distance h from the gaze point to the second camera 120, from which the Z-axis coordinate Zc2 of the gaze point in the second camera's coordinate system is estimated; and step S3042, obtaining the three-dimensional coordinates of the gaze point in the reference coordinate system from the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate in the second camera's coordinate system, the intrinsic and extrinsic parameters of the second camera 120, and the extrinsic parameters of the first camera 110.
In step S3041, the distance h of each pixel of the second image from the second camera 120 can be computed from the second image with a monocular depth estimation algorithm such as FastDepth; the distance h of the second gaze point from the second camera 120 can then be read off at the gaze point's pixel coordinates. Any applicable depth estimation algorithm may be used here. In one example, the depth of each pixel of the second image is preferably computed with the FastDepth monocular depth estimation algorithm, which has low computational complexity, high processing efficiency, and is mature and stable, placing relatively low demands on hardware and making it easy to implement on vehicle-side devices with comparatively limited computing power.
In step S3042, from the position of the second gaze point, i.e., the pixel coordinates (u, v), the gaze point's Z-axis coordinate Zc2, and the intrinsic parameters of the second camera 120, formula (2) is inverted to obtain the gaze point's coordinates (Xc2, Yc2, Zc2) in the camera coordinate system of the second camera 120; then, based on the extrinsic parameters of the second camera 120 and of the first camera 110, formula (1) is applied to (Xc2, Yc2, Zc2) to obtain the gaze point's coordinates (Xc1, Yc1, Zc1) in the camera coordinate system of the first camera 110. The coordinates (Xc1, Yc1, Zc1) are the gaze point's three-dimensional position.
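A minimal sketch of this inverse projection and coordinate transfer, with placeholder calibration values and the same world-to-camera extrinsics convention as in the earlier projection sketch, might look like this:

```python
import numpy as np

def backproject_to_cam1(u, v, z_c2, K2, R1, t1, R2, t2):
    """Lift pixel (u, v) with depth z_c2 to camera-2 3D coordinates (inverting the
    pinhole projection of formula (2)), then transfer the point into the camera-1
    (reference) frame via the extrinsics (formula (1) analogue)."""
    fx, fy = K2[0, 0], K2[1, 1]
    cx, cy = K2[0, 2], K2[1, 2]
    # Inverse pinhole projection: Xc2 = (u - cx) * Z / fx, Yc2 = (v - cy) * Z / fy.
    p_c2 = np.array([(u - cx) * z_c2 / fx, (v - cy) * z_c2 / fy, z_c2])
    # Camera-2 -> world -> camera-1 (world-to-camera extrinsics).
    p_world = R2.T @ (p_c2 - t2)
    return R1 @ p_world + t1

# Placeholder calibration data, for illustration only.
K2 = np.array([[1000, 0, 640], [0, 1000, 360], [0, 0, 1.0]])
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([0.0, -0.3, -1.0])
gaze_3d = backproject_to_cam1(700, 380, z_c2=12.5, K2=K2,
                              R1=R1, t1=t1, R2=R2, t2=t2)
print("gaze point in the reference coordinate system:", gaze_3d)
```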
In general, one line of sight has one gaze point, but because of accuracy limits, multiple gaze points may be obtained for the same line of sight. In that case the gaze points can be screened according to the confidence of the user's gaze point in the second image, so that the subsequent steps need only be performed on the screened gaze points to obtain the second gaze direction; this keeps the second gaze direction accurate while reducing computation and improving processing efficiency. The screening of gaze points can be performed before or after step S304.
The gaze point calibration model of step S303 also provides a probability value for each second gaze point, from which the confidence of that gaze point can be determined. In some embodiments, the heatmap provided by the model contains the probability that a second gaze point is the true gaze point; the higher the probability, the more likely the corresponding second gaze point is real. That probability, or a value proportional to it, can be used directly as the confidence of the second gaze point. The confidence is thus obtained without separate computation, improving processing efficiency while lowering computational complexity.
Confidence-based screening of gaze points can be implemented in several ways. In some examples, only gaze points whose second-gaze-point confidence exceeds a preset first confidence threshold (for example, 0.9), or whose confidence is relatively highest, are selected. If several gaze points have the relatively highest confidence or exceed the first threshold, one or more of them can be chosen at random; alternatively, all of them can be kept. Screening in this way not only makes the final second gaze direction more accurate but also reduces the computation and data volume of steps S304, S305, and step S306 below, effectively improving processing efficiency while lowering hardware load, which makes implementation easier on vehicle-side devices with limited computing power and storage.
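One plausible reading of this screening rule, with an assumed threshold of 0.9, is sketched below; the fallback to the single highest-confidence point is one of the options named above:

```python
def filter_gaze_points(candidates, threshold=0.9):
    """Keep candidate gaze points (x, y, confidence) whose confidence exceeds the
    first threshold; otherwise fall back to the single best candidate."""
    kept = [c for c in candidates if c[2] > threshold]
    if not kept:  # nothing passes the threshold: keep the highest-confidence point
        kept = [max(candidates, key=lambda c: c[2])]
    return kept

print(filter_gaze_points([(60, 40, 0.95), (61, 41, 0.72), (10, 5, 0.91)]))
```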
In step S305, the second gaze direction can be expressed by the vector or viewing angle determined by the three-dimensional gaze point position and the three-dimensional eye position. In some embodiments, in the first camera's coordinate system, the second gaze direction is represented by the vector whose origin is the three-dimensional eye position and whose endpoint is the three-dimensional gaze point position. In other embodiments, in the first camera's coordinate system, it is represented by the angle (i.e., the viewing angle) between the line of sight that starts at the eye position and points to the gaze point position and the axis through the user's eye reference point.
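Both representations can be derived from the two 3D positions in a few lines; in the sketch below the yaw/pitch angle convention is an illustrative assumption, since the text only requires angles in the first camera's coordinate system:

```python
import numpy as np

def second_gaze_direction(eye_3d, gaze_3d):
    """Calibrated gaze as a unit vector from the eye to the gaze point, plus
    yaw/pitch angles (assumed convention: Y down, Z forward)."""
    v = np.asarray(gaze_3d, float) - np.asarray(eye_3d, float)
    v = v / np.linalg.norm(v)
    yaw = np.degrees(np.arctan2(v[0], v[2]))   # left/right, about the Y axis
    pitch = np.degrees(np.arcsin(-v[1]))       # up/down
    return v, yaw, pitch

vec, yaw, pitch = second_gaze_direction([0.0, 0.0, 0.0], [0.7, -0.2, 12.5])
print(vec, yaw, pitch)
```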
The gaze calibration of steps S301 to S305 in the embodiments of the present application can be performed by the image processing system 130 of the system 100.
Deep learning models can usually improve their accuracy for a specific user through few-shot learning on a small number of samples. For a gaze tracking model, however, the data required are the user's gaze data in the camera coordinate system (for example, gaze angles), and numerical data of this kind are hard to obtain directly in ordinary environments, which makes user-level optimization of gaze tracking models difficult. For this reason, the result of step S305 can be used to optimize the gaze tracking model.
After step S305, the gaze calibration method of the embodiments of the present application may further include step S306: taking the user's second gaze direction and the first image as the user's optimization samples and optimizing the gaze tracking model with a few-shot learning method. The gaze tracking model's estimation accuracy for a specific user can thus be improved continuously with few samples and small-scale training, yielding a user-level gaze tracking model.
Taking the exemplary system of Fig. 1 as an example, Fig. 13 shows an exemplary implementation of the gaze tracking model optimization of step S306. As shown in Fig. 13, the flow may include: step S3061, in which the image processing system 130 stores the second gaze direction and its corresponding first image in the user's sample library as an optimization sample; the library can be associated with user information (for example, user identity information) for easy lookup and is deployed in the model optimization system 140. Step S3062, in which the model optimization system 140 uses the newly added optimization samples in the user's sample library to optimize, with a few-shot learning method, the user's gaze tracking model obtained in the previous optimization. Step S3063, in which the model optimization system 140 delivers the newly optimized gaze tracking model to the image processing system 130 on the user's vehicle, so that the image processing system 130 uses the optimized model to obtain the first gaze direction in the user's next gaze calibration. The parameter data of the previously optimized gaze tracking model and the user's sample library can both be associated with user information (for example, user identity information), so that during the current optimization the optimization samples and the previous model parameters can be looked up directly by user information. In this way, the user's optimization samples can be collected in real time and the gaze tracking model continuously optimized without the user noticing: the longer and more frequently the user uses the model, the more accurate its gaze estimation for that user becomes and the better the user experience. This improves real-time gaze estimation accuracy while solving the technical problems that gaze tracking models estimate some users' gaze poorly and are hard to optimize.
In practice, the optimization of step S3062 can be performed periodically, once the newly added optimization samples reach a certain number, or when other preset conditions are met; the sample library update of step S3061 can be performed in real time whenever the image processing system 130 and the model optimization system 140 can communicate normally.
Optionally, in step S3061 the user's optimization samples can be uploaded selectively to improve sample quality, reduce unnecessary optimization, and lower the hardware load of model optimization. Specifically, second gaze directions can be screened by the confidence of the second gaze point, and only the optimization samples formed by the screened second gaze directions and their corresponding first images are uploaded. The screening may include, without limitation: 1) selecting second gaze directions whose second-gaze-point confidence exceeds a preset second confidence threshold (for example, 0.95); 2) selecting the second gaze direction whose second-gaze-point confidence is relatively highest. For the confidence of the second gaze point, see the description above, which is not repeated here.
The few-shot learning method can be realized by any algorithm capable of optimizing the gaze tracking model with a small number of samples. For example, the user's optimization samples can be used to optimize the gaze tracking model with the MAML algorithm, realizing few-shot optimization of the model. A gaze tracking model better matched to the individual characteristics of a specific user can thus be obtained from a small number of samples, with small data volume and low computational complexity, which helps reduce hardware load and cost.
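As an illustration, a first-order simplification of MAML (which drops the second-order terms of the full algorithm) for one per-user update might be sketched as follows; the toy model, data shapes, and step sizes are placeholders:

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_user_update(model, support, query, inner_lr=0.01, meta_lr=0.001, steps=3):
    """One first-order MAML-style update of the gaze tracking model from a
    user's few optimization samples (first image -> second gaze direction)."""
    learner = copy.deepcopy(model)
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(steps):                      # inner loop on the support set
        x, y = support
        loss = F.mse_loss(learner(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    x, y = query                                # outer loss on the query set
    query_loss = F.mse_loss(learner(x), y)
    grads = torch.autograd.grad(query_loss, learner.parameters())
    with torch.no_grad():                       # first-order meta update
        for p, g in zip(model.parameters(), grads):
            p -= meta_lr * g
    return float(query_loss)

# Toy stand-in for the gaze model: 32 image features -> 2 gaze angles.
model = torch.nn.Linear(32, 2)
support = (torch.randn(8, 32), torch.randn(8, 2))
query = (torch.randn(4, 32), torch.randn(4, 2))
print(fomaml_user_update(model, support, query))
```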
The following uses the cockpit scenario to illustrate a concrete implementation of this embodiment.
Fig. 14 shows an exemplary flow in which the system 100 performs gaze calibration and model optimization in a cockpit environment. As shown in Fig. 14, the flow may include: step S1401, in which the in-vehicle camera of vehicle G captures a DMS image (the first image) of driver A in the cockpit, the DMS image containing driver A's face; the image processing system 130 on vehicle G runs the gaze tracking model to infer an initial gaze direction (the first gaze direction) and uses the DMS image for eye position estimation to obtain driver A's three-dimensional eye position. Step S1402, in which the image processing system 130 performs inference on the exterior image (the second image) captured by the exterior camera together with the gaze region of the initial gaze direction to obtain driver A's calibrated gaze direction (the second gaze direction); the exterior image contains the scene driver A currently sees and is captured in synchronization with the DMS image. Step S1403, in which, upon judging that the calibrated gaze direction is highly credible (for example, the confidence of the second gaze point meets the requirements above), the image processing system 130 uploads driver A's DMS image and calibrated gaze direction as driver A's personalized data (i.e., optimization samples) to the model optimization system 140, which optimizes driver A's gaze tracking model with few-shot learning and delivers the resulting model to the image processing system 130 on vehicle G. This embodiment thus uses exterior images to calibrate the initial gaze data estimated by the gaze tracking model, improving the accuracy of the gaze data, and uses the calibrated gaze data as the user's personalized gaze data to optimize the gaze tracking model, improving its gaze estimation accuracy for that user. The embodiment therefore solves both the problem that a gaze tracking model's estimates are inaccurate in actual cockpit use and the problem that the model is hard to optimize in the cockpit because the user's gaze data cannot otherwise be obtained. The system is also self-improving: in the vehicle scenario the above flow can run continuously without the user noticing, and the more the user uses it, the more accurate the system's gaze estimation for that user and the higher the gaze tracking model's accuracy for that user.
Fig. 15 shows an exemplary structure of the gaze calibration apparatus 1500 provided in this embodiment. As shown in Fig. 15, the gaze calibration apparatus 1500 of this embodiment may include:
an eye position determination unit 1501, configured to obtain the user's three-dimensional eye position from a first image containing the user's eyes captured by the first camera;
a first gaze determination unit 1502, configured to obtain the user's first gaze direction from the first image containing the user's eyes captured by the first camera;
a gaze region unit 1503, configured to obtain the user's gaze region in a second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera, the second image being captured by the second camera and containing the scene outside the vehicle as seen by the user;
a gaze point calibration unit 1504, configured to obtain the position of the user's gaze point in the second image from the user's gaze region in the second image and the second image;
a gaze point conversion unit 1505, configured to obtain the user's three-dimensional gaze point position from the position of the user's gaze point in the second image and the intrinsic parameters of the second camera;
a second gaze determination unit 1506, configured to obtain the user's second gaze direction from the three-dimensional gaze point position and the three-dimensional eye position, the second gaze direction serving as the calibrated gaze direction.
In some embodiments, the first gaze direction is extracted from the first image by a gaze tracking model.
In some embodiments, the gaze region unit 1503 being configured to obtain the user's gaze region in the second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera includes: obtaining the user's gaze region in the second image from the three-dimensional eye position, the first gaze direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
In some embodiments, the gaze calibration apparatus further includes an optimization unit 1507, configured to take the user's second gaze direction and the first image as the user's optimization samples and optimize the gaze tracking model with a few-shot learning method.
In some embodiments, the gaze point calibration unit 1504 is further configured to screen the gaze points by the confidence of the user's gaze point in the second image; and/or the optimization unit 1507 is further configured to screen second gaze directions by the confidence of the user's gaze point in the second image.
In some embodiments, the position of the user's gaze point in the second image is obtained by a gaze point calibration model from the user's gaze region in the second image and the second image.
In some embodiments, the gaze point calibration model also provides a probability value for the user's gaze point in the second image, and the confidence is determined from that probability value.
[Embodiment 2]
Fig. 16 shows an exemplary architecture of the system 1600 to which this embodiment applies. As shown in Fig. 16, the exemplary system 1600 of this embodiment is largely the same as the system 100 of Embodiment 1, except that the second camera 120 is an optional component and the system includes a display screen 160, which can be deployed on the vehicle side and realized with a display component already present in the vehicle-side device. The other parts of the system 1600, namely the first camera 110, the image processing system 130, the model optimization system 140, and the model training system 150, function largely the same as the corresponding parts of the system 100 of Embodiment 1 and are not described again. This embodiment uses a display screen 160 whose positional relationship to the first camera 110 (i.e., the in-vehicle camera) has been marked; by having the user gaze at reference points on the display screen 160, the user's gaze is calibrated and optimization samples are obtained, which are used for few-shot learning of the gaze tracking model to improve its accuracy.
The gaze calibration method of this embodiment is described in detail below.
Fig. 17 shows an exemplary flow of the gaze calibration method of this embodiment. As shown in Fig. 17, the method may include the following steps:
Step S1701: in response to the user's gaze operation on a reference point on the display screen 160, obtain the user's three-dimensional gaze point position;
本步骤之前,还可以包括:控制显示屏160向用户提供视线校准界面,所述视线校准界面中包含用于提醒用户注视参考点的可视化提示,以便用户根据该可视化提示执行相应的注视操作。这里,视线校准界面的具体形式,本实施例不予限制。Before this step, it may also include: controlling the display screen 160 to provide a line of sight calibration interface to the user, the line of sight calibration interface including a visual prompt for reminding the user to gaze at the reference point, so that the user performs a corresponding gaze operation according to the visual prompt. Here, the specific form of the line-of-sight calibration interface is not limited by this embodiment.
本步骤中,注视操作可以是任何用户注视显示屏160中参考点的相关操作,对于注视操作的具体实现方式或表现形式,本申请实施例不予限制。举例来说,注视操作可以包括用户注视视线校准界面中参考点的同时在视线校准界面中输入确认信息。In this step, the gazing operation may be any operation related to the user gazing at the reference point on the display screen 160, and the embodiment of the present application does not limit the specific implementation or expression of the gazing operation. For example, the gaze operation may include inputting confirmation information in the gaze calibration interface while the user gazes at a reference point in the gaze calibration interface.
以座舱场景为例,显示屏160可以是但不限于车辆的AR-HUD、车辆的仪表盘、用户的便携式电子设备或其他。通常,座舱场景中的视线校准主要针对驾驶员或副驾驶员,因此,为确保视线校准不影响安全驾驶,显示屏160优选为AR-HUD。Taking the cockpit scene as an example, the display screen 160 may be, but not limited to, an AR-HUD of a vehicle, a dashboard of a vehicle, a portable electronic device of a user, or others. Usually, the line of sight calibration in the cockpit scene is mainly aimed at the driver or the co-pilot. Therefore, in order to ensure that the line of sight calibration does not affect safe driving, the display screen 160 is preferably an AR-HUD.
本步骤中,显示屏160中每个参考点在第一摄像头110的相机坐标系中的三维坐标可通过显示屏160与第一摄像头110的位置关系预先标定。如此,用户注视一个参考点,则该参考点即为用户的注视点,该参考点在第一摄像头110的相机坐标系中的三维坐标即为用户的注视点三维位置。In this step, the three-dimensional coordinates of each reference point on the display screen 160 in the camera coordinate system of the first camera 110 may be pre-calibrated through the positional relationship between the display screen 160 and the first camera 110 . In this way, if the user gazes at a reference point, the reference point is the user's gaze point, and the three-dimensional coordinates of the reference point in the camera coordinate system of the first camera 110 are the three-dimensional position of the user's gaze point.
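By way of illustration only, such a pre-calibrated display-to-camera mapping can be realized as a rigid transform. The minimal Python sketch below assumes a rotation R_dc and a translation t_dc obtained from an offline calibration of the display screen 160 against the first camera 110; the function and variable names are hypothetical and do not appear in this disclosure.

```python
import numpy as np

def reference_point_in_camera(p_display, R_dc, t_dc):
    """Map a reference point from the display's coordinate frame into the
    first camera's coordinate system via a pre-calibrated rigid transform.

    p_display: (3,) point in the display frame, in meters.
    R_dc:      (3, 3) rotation from display frame to camera frame.
    t_dc:      (3,) translation from display frame to camera frame.
    """
    return R_dc @ np.asarray(p_display, dtype=float) + np.asarray(t_dc, dtype=float)
```

When the user gazes at the reference point, the returned coordinates would serve directly as the three-dimensional gaze point position of step S1701.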
Step S1702: obtain the three-dimensional position of the user's eyes from a first image, captured by the first camera 110, that contains the user's eyes.
The specific implementation of this step is the same as the implementation for obtaining the three-dimensional eye position in step S301 of Embodiment 1 and is not repeated here.
Step S1703: obtain the user's second line-of-sight direction from the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
The specific implementation of this step is the same as step S305 of Embodiment 1 and is not repeated here.
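As a hedged sketch of the vector arithmetic implied by step S1703 (the function name and the error handling are assumptions, not part of the disclosed embodiments), the second line-of-sight direction can be computed as the unit vector from the eye position to the gaze point, with both positions expressed in the first camera's coordinate system:

```python
import numpy as np

def second_gaze_direction(eye_pos, gaze_pos):
    """Unit direction vector from the user's eye to the gaze point."""
    d = np.asarray(gaze_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0.0:
        raise ValueError("eye position and gaze point coincide")
    return d / norm
```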
With the line-of-sight calibration method of this embodiment, the three-dimensional position of the user's gaze point is obtained directly from the reference point, while the three-dimensional position of the user's eyes is obtained from the first image, yielding a second line-of-sight direction of high accuracy. The method therefore not only effectively improves the accuracy of the user's line-of-sight estimation, but is also simple to operate, computationally light, and efficient, making it well suited to the cockpit environment.
In the method of this embodiment, the camera coordinate system of the first camera 110 is preferably used as the reference coordinate system, so that the resulting second line-of-sight direction can be used directly for optimizing the gaze tracking model. Both the three-dimensional gaze point position and the three-dimensional eye position are expressed as three-dimensional coordinate values in the camera coordinate system of the first camera 110, and the second line-of-sight direction may be expressed as view angles or as a direction vector in that coordinate system. For details, see the related description of Embodiment 1, which is not repeated here.
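The two representations mentioned here (view angles and direction vector) are interchangeable. The sketch below assumes a common camera-axis convention (x right, y down, z forward); the convention itself is not fixed by this document, so the sign choices are illustrative assumptions:

```python
import numpy as np

def vector_to_angles(d):
    """Unit gaze vector -> (yaw, pitch) view angles in radians."""
    x, y, z = d
    yaw = np.arctan2(x, z)   # horizontal deviation from the optical axis
    pitch = np.arcsin(-y)    # vertical deviation (positive = looking up)
    return yaw, pitch

def angles_to_vector(yaw, pitch):
    """(yaw, pitch) view angles -> unit gaze vector."""
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        -np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])
```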
As in Embodiment 1, the line-of-sight calibration method of this embodiment may further include: step S1704, using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing the gaze tracking model based on a few-shot learning method. In this way, a small number of samples and small-scale training can continuously improve the gaze tracking model's estimation accuracy for a specific user's line of sight, yielding a user-level gaze tracking model. The specific implementation of this step is the same as step S306 of Embodiment 1 and is not repeated here. Because the three-dimensional gaze point position in this step is obtained by calibration and is therefore highly accurate, there is no need to screen the second line-of-sight direction before step S1704 in this embodiment.
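Step S1704 is specified only at the level of "few-shot learning". The loop below is one plausible realization rather than the disclosed implementation: it assumes a pretrained PyTorch model mapping an eye image to a 3-D gaze direction, and the function name, loss choice, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def few_shot_finetune(model, samples, lr=1e-4, steps=50):
    """Adapt a pretrained gaze tracking model to one user from a handful
    of (image, direction) optimization samples.

    samples: list of (image_tensor, target_direction) pairs, where
             target_direction is the calibrated second line-of-sight
             direction in the first camera's coordinate system.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for image, target in samples:
            pred = model(image.unsqueeze(0)).squeeze(0)
            # Cosine loss penalizes the angular deviation between the
            # predicted and calibrated gaze directions.
            loss = 1.0 - F.cosine_similarity(pred, target, dim=0)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

In a deployment such as the one in FIG. 16, such a loop could run either on the vehicle-end computing device or on a cloud server.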
FIG. 18 shows an exemplary structure of a line-of-sight calibration apparatus 1800 provided in this embodiment. As shown in FIG. 18, the apparatus 1800 of this embodiment may include:
a gaze point position determination unit 1801, configured to obtain the three-dimensional position of the user's gaze point in response to the user's gaze operation on a reference point on the display screen;
an eye position determination unit 1501, configured to obtain the three-dimensional position of the user's eyes from a first image, captured by the first camera, that contains the user's eyes; and
a second line-of-sight determination unit 1506, configured to obtain the user's second line-of-sight direction from the three-dimensional position of the gaze point and the three-dimensional position of the eyes.
In some embodiments, the display screen is an augmented reality head-up display.
In some embodiments, the apparatus further includes an optimization unit 1507, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user and to optimize the gaze tracking model based on a few-shot learning method.
The computing device and the computer-readable storage medium of the embodiments of the present application are described below.
FIG. 19 is a schematic structural diagram of a computing device 1900 provided by an embodiment of the present application. The computing device 1900 includes a processor 1910 and a memory 1920.
The computing device 1900 may further include a communication interface 1930 and a bus 1940. It should be understood that the communication interface 1930 of the computing device 1900 shown in FIG. 19 can be used to communicate with other devices. The memory 1920 and the communication interface 1930 may be connected to the processor 1910 through the bus 1940. For ease of illustration, only one line is drawn in FIG. 19, but this does not mean that there is only one bus or only one type of bus.
The processor 1910 may be connected to the memory 1920. The memory 1920 may be used to store program code and data. Accordingly, the memory 1920 may be a storage unit inside the processor 1910, an external storage unit independent of the processor 1910, or a component comprising both.
It should be understood that, in the embodiments of the present application, the processor 1910 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. Alternatively, the processor 1910 may employ one or more integrated circuits for executing related programs to implement the technical solutions provided by the embodiments of the present application.
The memory 1920 may include read-only memory and random-access memory, and provides instructions and data to the processor 1910. A portion of the processor 1910 may also include non-volatile random-access memory. For example, the processor 1910 may also store device type information.
When the computing device 1900 runs, the processor 1910 executes the computer-executable instructions in the memory 1920 to perform the operational steps of the line-of-sight calibration methods of the foregoing embodiments.
It should be understood that the computing device 1900 according to the embodiments of the present application may correspond to the entity that executes the methods of the various embodiments of the present application, and that the foregoing and other operations and/or functions of the modules in the computing device 1900 serve to implement the corresponding flows of those methods; for brevity, they are not repeated here.
The system architecture of the embodiments of the present application and its related applications are described below by way of example.
An embodiment of the present application further provides a driver monitoring system, which includes the first camera 110, the second camera 120, and the computing device 1900 described above.
In some embodiments, the first camera 110 is configured to capture a first image containing the user's eyes, the second camera 120 is configured to capture a second image containing the scene seen by the user, and both cameras can communicate with the computing device 1900. In the computing device 1900, the processor 1910 uses the first image provided by the first camera 110 and the second image provided by the second camera 120 to execute the computer-executable instructions in the memory 1920, thereby performing the operational steps of the line-of-sight calibration method of Embodiment 1 above.
In some embodiments, the driver monitoring system may further include a display screen configured to display a reference point to the user. In the computing device 1900, the processor 1910 uses the first image provided by the first camera 110 and the three-dimensional position of the reference point displayed on the display screen to execute the computer-executable instructions in the memory 1920, thereby performing the operational steps of the line-of-sight calibration method of Embodiment 2 above.
In some embodiments, the driver monitoring system may further include a cloud server, which may be configured to use the user's second line-of-sight direction and the first image provided by the computing device 1900 as an optimization sample of the user, optimize the gaze tracking model based on a few-shot learning method, and provide the optimized gaze tracking model to the computing device 1900, thereby improving the model's gaze estimation accuracy for the user.
Specifically, for the architecture of the driver monitoring system, refer to the system shown in FIG. 1 of Embodiment 1 and the system shown in FIG. 16 of Embodiment 2. The image processing system 130 may be deployed in the computing device 1900, and the model optimization system 140 described above may be deployed in the cloud server.
An embodiment of the present application further provides a vehicle, which may include the driver monitoring system described above. In specific applications, the vehicle is a motor vehicle, which may be, but is not limited to, a sport utility vehicle, a bus, a heavy truck, or a passenger car among the various commercial vehicle types; it may also be, but is not limited to, a watercraft such as a boat or ship, an aircraft, or the like; and it may further be, but is not limited to, a hybrid vehicle, an electric vehicle, a plug-in hybrid electric vehicle, a hydrogen-powered vehicle, or another alternative-fuel vehicle. A hybrid vehicle may be any vehicle having two or more power sources, for example a vehicle powered by both gasoline and electricity.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to be beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs a line-of-sight calibration method that includes at least one of the solutions described in the foregoing embodiments.
The computer storage medium of the embodiments of the present application may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical-fiber cable, RF, or any suitable combination of the foregoing.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the foregoing are merely the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present application has been described in some detail through the above embodiments, the present invention is not limited to those embodiments and may include further equivalent embodiments without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.
Claims (25)
- A line-of-sight calibration method, comprising: obtaining a three-dimensional eye position and a first line-of-sight direction of a user according to a first image that contains the user's eyes and is captured by a first camera; obtaining a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, extrinsic parameters of the first camera, and extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and containing a scene outside the vehicle seen by the user; obtaining a position of the user's gaze point in the second image according to the gaze area of the user in the second image and the second image; obtaining a three-dimensional gaze point position of the user according to the position of the gaze point and the intrinsic parameters of the second camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position, the second line-of-sight direction serving as the calibrated line-of-sight direction.
- The line-of-sight calibration method according to claim 1, wherein the first line-of-sight direction is extracted from the first image based on a gaze tracking model.
- The line-of-sight calibration method according to claim 2, wherein obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, and the extrinsic and intrinsic parameters of the second camera comprises: obtaining the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- The line-of-sight calibration method according to claim 2 or 3, further comprising: using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing the gaze tracking model based on a few-shot learning method.
- The line-of-sight calibration method according to any one of claims 1 to 4, further comprising: screening the gaze point or the second line-of-sight direction according to a confidence of the user's gaze point in the second image.
- The line-of-sight calibration method according to any one of claims 1 to 5, wherein the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the gaze area of the user in the second image and the second image.
- The line-of-sight calibration method according to claim 6, wherein the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined from the probability value.
- A line-of-sight calibration method, comprising: obtaining a three-dimensional gaze point position of a user in response to the user's gaze operation on a reference point on a display screen; obtaining a three-dimensional eye position of the user according to a first image that contains the user's eyes and is captured by a first camera; and obtaining a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration method according to claim 8, wherein the display screen is a display screen of an augmented reality head-up display system.
- The line-of-sight calibration method according to claim 8, further comprising: using the user's second line-of-sight direction and the first image as an optimization sample of the user, and optimizing a gaze tracking model based on a few-shot learning method.
- A line-of-sight calibration apparatus, comprising: an eye position determination unit, configured to obtain a three-dimensional eye position of a user according to a first image that contains the user's eyes and is captured by a first camera; a first line-of-sight determination unit, configured to obtain a first line-of-sight direction of the user according to the first image; a gaze area unit, configured to obtain a gaze area of the user in a second image according to the three-dimensional eye position, the first line-of-sight direction, extrinsic parameters of the first camera, and extrinsic and intrinsic parameters of a second camera, the second image being captured by the second camera and containing a scene outside the vehicle seen by the user; a gaze point calibration unit, configured to obtain a position of the user's gaze point in the second image according to the gaze area of the user in the second image and the second image; a gaze point conversion unit, configured to obtain a three-dimensional gaze point position of the user according to the position of the gaze point and the intrinsic parameters of the second camera; and a second line-of-sight determination unit, configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration apparatus according to claim 11, wherein the first line-of-sight direction is extracted from the first image based on a gaze tracking model.
- The line-of-sight calibration apparatus according to claim 12, wherein the gaze area unit is configured to obtain the gaze area of the user in the second image according to the three-dimensional eye position, the first line-of-sight direction, the extrinsic parameters of the first camera, the extrinsic and intrinsic parameters of the second camera, and the accuracy of the gaze tracking model.
- The line-of-sight calibration apparatus according to any one of claims 11 to 13, further comprising: an optimization unit, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user, and to optimize the gaze tracking model based on a few-shot learning method.
- The line-of-sight calibration apparatus according to any one of claims 11 to 14, wherein the gaze point calibration unit is further configured to screen the gaze point according to a confidence of the user's gaze point in the second image; and/or the optimization unit is further configured to screen the second line-of-sight direction according to the confidence of the user's gaze point in the second image.
- The line-of-sight calibration apparatus according to any one of claims 11 to 15, wherein the position of the user's gaze point in the second image is obtained by a gaze point calibration model according to the gaze area of the user in the second image and the second image.
- The line-of-sight calibration apparatus according to claim 16, wherein the gaze point calibration model also provides a probability value of the user's gaze point in the second image, and the confidence is determined from the probability value.
- A line-of-sight calibration apparatus, comprising: a gaze point position determination unit, configured to obtain a three-dimensional gaze point position of a user in response to the user's gaze operation on a reference point on a display screen; an eye position determination unit, configured to obtain a three-dimensional eye position of the user according to a first image that contains the user's eyes and is captured by a first camera; and a second line-of-sight determination unit, configured to obtain a second line-of-sight direction of the user according to the three-dimensional gaze point position and the three-dimensional eye position.
- The line-of-sight calibration apparatus according to claim 18, wherein the display screen is an augmented reality head-up display.
- The line-of-sight calibration apparatus according to claim 18, further comprising: an optimization unit, configured to use the user's second line-of-sight direction and the first image as an optimization sample of the user, and to optimize a gaze tracking model based on a few-shot learning method.
- A computing device, comprising: at least one processor; and at least one memory storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 10.
- A computer-readable storage medium having program instructions stored thereon, wherein the program instructions, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
- A driver monitoring system, comprising: a first camera, configured to capture a first image containing a user's eyes; a second camera, configured to capture a second image containing a scene outside the vehicle seen by the user; at least one processor; and at least one memory storing program instructions which, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 7.
- The driver monitoring system according to claim 23, further comprising: a display screen configured to display a reference point to the user, wherein the program instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 8 to 10.
- A vehicle, comprising the driver monitoring system according to claim 23 or 24.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180001805.6A CN113661495A (en) | 2021-06-28 | 2021-06-28 | Sight line calibration method, sight line calibration device, sight line calibration equipment, sight line calibration system and sight line calibration vehicle |
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023272453A1 true WO2023272453A1 (en) | 2023-01-05 |
Family
ID=78494760
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/102861 WO2023272453A1 (en) | 2021-06-28 | 2021-06-28 | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113661495A (en) |
WO (1) | WO2023272453A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115116039A (en) * | 2022-01-14 | 2022-09-27 | 长城汽车股份有限公司 | Vehicle cabin outside sight line tracking method and device, vehicle and storage medium |
CN116052235B (en) * | 2022-05-31 | 2023-10-20 | 荣耀终端有限公司 | Gaze point estimation method and electronic equipment |
CN115661913A (en) * | 2022-08-19 | 2023-01-31 | 北京津发科技股份有限公司 | Eye movement analysis method and system |
CN115840502B (en) * | 2022-11-23 | 2023-07-21 | 深圳市华弘智谷科技有限公司 | Three-dimensional sight tracking method, device, equipment and storage medium |
CN116704589B (en) * | 2022-12-01 | 2024-06-11 | 荣耀终端有限公司 | Gaze point estimation method, electronic device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018170538A1 (en) * | 2017-03-21 | 2018-09-27 | Seeing Machines Limited | System and method of capturing true gaze position data |
CN109849788A (en) * | 2018-12-29 | 2019-06-07 | 北京七鑫易维信息技术有限公司 | Information providing method, apparatus and system |
CN110341617A (en) * | 2019-07-08 | 2019-10-18 | 北京七鑫易维信息技术有限公司 | Eyeball tracking method, apparatus, vehicle and storage medium |
CN111427451A (en) * | 2020-03-25 | 2020-07-17 | 中国人民解放军海军特色医学中心 | Method for determining position of fixation point in three-dimensional scene by adopting scanner and eye tracker |
US20210042520A1 (en) * | 2019-06-14 | 2021-02-11 | Tobii Ab | Deep learning for three dimensional (3d) gaze prediction |
- 2021-06-28 WO PCT/CN2021/102861 patent/WO2023272453A1/en active Application Filing
- 2021-06-28 CN CN202180001805.6A patent/CN113661495A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018170538A1 (en) * | 2017-03-21 | 2018-09-27 | Seeing Machines Limited | System and method of capturing true gaze position data |
CN109849788A (en) * | 2018-12-29 | 2019-06-07 | 北京七鑫易维信息技术有限公司 | Information providing method, apparatus and system |
US20210042520A1 (en) * | 2019-06-14 | 2021-02-11 | Tobii Ab | Deep learning for three dimensional (3d) gaze prediction |
CN110341617A (en) * | 2019-07-08 | 2019-10-18 | 北京七鑫易维信息技术有限公司 | Eyeball tracking method, apparatus, vehicle and storage medium |
CN111427451A (en) * | 2020-03-25 | 2020-07-17 | 中国人民解放军海军特色医学中心 | Method for determining position of fixation point in three-dimensional scene by adopting scanner and eye tracker |
Also Published As
Publication number | Publication date |
---|---|
CN113661495A (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023272453A1 (en) | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle | |
US11699293B2 (en) | Neural network image processing apparatus | |
WO2021197189A1 (en) | Augmented reality-based information display method, system and apparatus, and projection device | |
WO2021013193A1 (en) | Traffic light identification method and apparatus | |
CN110167823B (en) | System and method for driver monitoring | |
CN110703904B (en) | Visual line tracking-based augmented virtual reality projection method and system | |
EP3033999B1 (en) | Apparatus and method for determining the state of a driver | |
US20220058407A1 (en) | Neural Network For Head Pose And Gaze Estimation Using Photorealistic Synthetic Data | |
WO2019137065A1 (en) | Image processing method and apparatus, vehicle-mounted head up display system, and vehicle | |
US20190279009A1 (en) | Systems and methods for monitoring driver state | |
US11112791B2 (en) | Selective compression of image data during teleoperation of a vehicle | |
WO2022241638A1 (en) | Projection method and apparatus, and vehicle and ar-hud | |
US11948315B2 (en) | Image composition in multiview automotive and robotics systems | |
JP7176520B2 (en) | Information processing device, information processing method and program | |
CN110341617B (en) | Eyeball tracking method, device, vehicle and storage medium | |
CN111854620B (en) | Monocular camera-based actual pupil distance measuring method, device and equipment | |
WO2023272725A1 (en) | Facial image processing method and apparatus, and vehicle | |
WO2022257120A1 (en) | Pupil position determination method, device and system | |
CN114463832B (en) | Point cloud-based traffic scene line of sight tracking method and system | |
CN113822174B (en) | Sight line estimation method, electronic device and storage medium | |
CN113780125A (en) | Fatigue state detection method and device for multi-feature fusion of driver | |
CN116543266A (en) | Automatic driving intelligent model training method and device guided by gazing behavior knowledge | |
JP2021009503A (en) | Personal data acquisition system, personal data acquisition method, face sensing parameter adjustment method for image processing device and computer program | |
WO2024031709A1 (en) | Display method and device | |
CN117441190A (en) | Position positioning method and device |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21947417; Country of ref document: EP; Kind code of ref document: A1 |