CN110263657B - Human eye tracking method, device, system, equipment and storage medium
- Publication number: CN110263657B (application CN201910438457.3A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/19—Sensors therefor
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
Abstract
The embodiment of the invention discloses a human eye tracking method, device, system, equipment and storage medium. The method comprises the following steps: calling a first camera to acquire user images in a preset viewing area, and determining three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user images; determining a first preset number of target second cameras from a plurality of second cameras according to the three-dimensional head orientation information; calling each target second camera to collect a face image of the user, and determining two-dimensional binocular pupil orientation information of the user according to each face image; and determining three-dimensional binocular pupil orientation information of the user according to at least two pieces of the two-dimensional binocular pupil orientation information. The technical scheme of the embodiment of the invention improves the calculation speed and the calculation precision simultaneously and is suitable for multi-user eye tracking scenes.
Description
Technical Field
The present invention relates to image processing technologies, and in particular, to a method, an apparatus, a system, a device, and a storage medium for tracking human eyes.
Background
Human eye tracking technology is mainly applied in fields such as human-computer interaction, naked-eye 3D display and virtual reality, where a person's gaze viewpoint is obtained by tracking the movement of the eyeballs. Current naked-eye 3D display screens use eye tracking to judge the viewer's current watching position and modify the displayed image, reducing crosstalk between the left-eye and right-eye views of the 3D image.
Existing eye tracking is mainly realized with PCCR (pupil center corneal reflection) combined with image-processing recognition, as in the Tobii eye tracker. The Tobii eye tracker uses a near-infrared light source to create reflection images on the cornea and pupil of a user's eye, and then captures the eye and the reflection images with two image sensors. The spatial position of the eye and the gaze direction are then accurately calculated based on image processing algorithms and a three-dimensional eyeball model.
However, the existing eye tracking method has no user identification capability and is only suitable for single-user, short-distance scenes such as using a computer, VR glasses or an eye examination. In addition, the existing method usually recognizes a face region in the user image and then calculates the pupil positions of the user's two eyes from that region; because the face region may occupy only a small pixel area of the user image, calculating the binocular pupil positions directly in the user image means that calculation accuracy and calculation speed cannot be improved at the same time.
Therefore, a human eye tracking method that improves both the calculation speed and the calculation accuracy is urgently needed.
Disclosure of Invention
The embodiments of the invention provide a human eye tracking method, device, system, equipment and storage medium, which improve the calculation speed and accuracy simultaneously and are suitable for multi-user eye tracking scenes.
In a first aspect, an embodiment of the present invention provides an eye tracking method, including:
calling a first camera to collect user images in a preset viewing area, and determining three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user images;
determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information;
calling each target second camera to collect a face image of the user, and determining two-dimensional binocular pupil orientation information of the user according to each face image;
and determining three-dimensional binocular pupil orientation information of the user according to at least two pieces of the two-dimensional binocular pupil orientation information.
In a second aspect, an embodiment of the present invention further provides an eye tracking apparatus, including:
the three-dimensional head orientation information determining module is used for calling a first camera to collect user images in a preset viewing area and determining three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user images;
the target second camera determining module is used for determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information;
the two-dimensional binocular pupil orientation information determining module is used for calling each target second camera to acquire a face image of the user and determining the two-dimensional binocular pupil orientation information of the user according to each face image;
and the three-dimensional binocular pupil orientation information determining module is used for determining the three-dimensional binocular pupil orientation information of the user according to at least two pieces of the two-dimensional binocular pupil orientation information.
In a third aspect, an embodiment of the present invention further provides an eye tracking system, where the system includes: the system comprises a first camera, a plurality of second cameras and a human eye tracking device; the eye tracking device is used for realizing the eye tracking method provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a memory for storing one or more programs;
an input device for acquiring an image;
output means for displaying screen information;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the eye tracking method provided by any embodiment of the invention.
In a fifth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the eye tracking method according to any embodiment of the present invention.
According to the embodiments of the invention, the three-dimensional head orientation information corresponding to each user in the preset viewing area is determined according to the user images acquired by the first camera, so that the user's head is tracked. Based on the user's three-dimensional head orientation information, a first preset number of target second cameras is determined from the plurality of second cameras, so that the binocular pupil region occupies a large pixel area in the face image collected by each target second camera; the user's three-dimensional binocular pupil orientation information can therefore be determined accurately and quickly from the several face images, achieving high-speed eye tracking while improving calculation precision. In the embodiments of the invention, a plurality of second cameras collect face images of users in the preset viewing area, so that different second cameras can be called to collect face images of different users simultaneously, making the scheme suitable for multi-user eye tracking scenes.
Drawings
Fig. 1 is a flowchart of an eye tracking method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a second camera matching according to an embodiment of the present invention;
FIG. 3 is an example of the distance sensitivity of light rays to the eye position in the depth direction according to an embodiment of the present invention;
FIG. 4 is an example of estimating the second three-dimensional head orientation information according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of receiving face image data according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an eye tracking method according to a second embodiment of the present invention;
FIG. 7 is an example of a spiral search pattern according to a second embodiment of the present invention;
fig. 8 is a schematic diagram of several second cameras juxtaposed at one layout position according to the second embodiment of the present invention;
fig. 9 is an example of a three-layer data table corresponding to a second camera according to a second embodiment of the present invention;
FIG. 10 is a schematic diagram of a human eye tracking apparatus according to a third embodiment of the present invention;
fig. 11 is a schematic structural diagram of an eye tracking system according to a fourth embodiment of the present invention;
fig. 12 is a layout example of the first cameras when the preset viewing area is an area inside a circle according to the fourth embodiment of the present invention;
fig. 13 is a layout example of a first camera when a preset viewing area is a circular outer area according to a fourth embodiment of the present invention;
fig. 14 is a layout example of a first camera when a preset viewing zone is a straight-line one-sided viewing zone according to a fourth embodiment of the present invention;
fig. 15 is an example of an intersection area of two adjacent second cameras according to a fourth embodiment of the present invention;
fig. 16 is a layout example of the second camera when the preset viewing area is an area inside a circle according to the fourth embodiment of the present invention;
fig. 17 is a layout example of a second camera when a preset viewing area is a circular outer area according to a fourth embodiment of the present invention;
fig. 18 is a layout example of second cameras when a preset viewing zone is a straight-line one-sided viewing zone according to a fourth embodiment of the present invention;
fig. 19 is a schematic structural diagram of another eye tracking system according to the fourth embodiment of the present invention;
fig. 20 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.
Example one
Fig. 1 is a flowchart of an eye tracking method according to the first embodiment of the present invention. This embodiment is applicable to tracking and positioning the pupils of a user's two eyes while the user views a 3D display screen. The method can be executed by a human eye tracking device, which can be implemented in software and/or hardware and integrated in a device with a 3D display function, such as a naked-eye 3D advertising machine or a naked-eye 3D display. As shown in fig. 1, the method specifically includes the following steps:
s110, calling a first camera to collect user images in the preset watching area, and determining three-dimensional head direction information corresponding to each user in the preset watching area according to the user images.
The first camera is a camera used for tracking the user's head. It may be a 3D camera or several 2D cameras. Because the first camera is only responsible for tracking the head, the requirements on facial detail and tracking speed are not high, and a first camera with a large visual angle can be selected. The preset viewing area is the region in which users view the 3D display screen; it can be predetermined according to the shape and position of the screen. For example, if the 3D display screen is circular and displays toward the center of the circle, the preset viewing area may be the circular area enclosed by that circle. One or more users may view within the preset viewing area simultaneously. In this embodiment, the number and shooting positions (i.e., the layout) of the first cameras can be preset according to the shape and position of the 3D display screen, so that the total detection area of the first cameras covers the preset viewing area and the head of every user in it can be tracked. The heads of at least 10 users can be tracked simultaneously in this embodiment; the exact number depends on the shooting performance of the first camera. The three-dimensional head orientation information may include the user's three-dimensional head position information and head orientation information, where the head orientation information reflects the state of the user's head, such as the head being raised, lowered or turned.
Specifically, each first camera can be called to collect user images in the preset viewing area; tracking and positioning of users' heads can be achieved by visual positioning principles and the like, and the user images can be processed with technologies such as face matching to recognize different users' heads, so that the three-dimensional head orientation information corresponding to each user in the preset viewing area can be calculated. In this embodiment, the first camera may preferably be a high-definition color camera, so that the user's hair color, skin color and so on can be identified from the RGB information in the acquired user image and the three-dimensional head orientation information can be determined more accurately.
Illustratively, the three-dimensional head orientation information corresponding to each user in the preset viewing area may be stored in a data structure with the following fields:
hid is an identification number for a tracked user head, distinguishing different users' heads; (x, y, z) are the user's three-dimensional head position coordinates, in millimeters (mm); (angle_x, angle_y, angle_z) is the orientation vector corresponding to the user's head orientation. In this embodiment, the accuracy of the rotation of the user's head in the horizontal direction (i.e., the rotation angle about the y-axis) is greater than in the other two directions (i.e., the rotation angles about the x-axis and the z-axis).
And S120, determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information.
The second camera is a camera used for tracking the pupils of the user's two eyes, and may be a 2D camera. In this embodiment, the specific number and shooting positions (i.e., the layout) of the second cameras can be preset according to the shape and position of the 3D display screen, so that the total detection area of the second cameras covers the preset viewing area; the eyes of every user in the preset viewing area can then be tracked with the second cameras, achieving multi-user eye tracking. A target second camera is a second camera, selected from the plurality of second cameras, that is best positioned to acquire the user's face image. The first preset number may be set according to service requirements and the scenario; in this embodiment it is at least two.
Specifically, for each user, the target second cameras corresponding to that user can be determined from the plurality of second cameras based on the user's three-dimensional head orientation information, so that the binocular pupil region occupies a large pixel area in the face image acquired by each target second camera, which preserves resolution and improves the accuracy of eye tracking. For example, the second camera in this embodiment may be a black-and-white camera with an infrared illumination source disposed at its mounting position, so that images can be acquired under supplementary lighting. The resolution of the second camera may be smaller than that of the first camera, to further increase the image processing speed.
Exemplarily, S120 may include: determining, according to the three-dimensional head orientation information and the orientation configuration information of each second camera, a second preset number of candidate second cameras corresponding to the user and the matching degree corresponding to each candidate second camera; and screening a first preset number of target second cameras from the candidate second cameras according to the matching degrees and the current number of calls corresponding to each candidate second camera.
The orientation configuration information of a second camera may include, but is not limited to, its installation position, resolution, depth-of-field parameter and viewing-angle range. For example, the orientation configuration information of each second camera may be stored in a data structure with the following fields:
cid is an identification number distinguishing different second cameras; (x, y, z) are the installation-position coordinates of the second camera, in millimeters (mm); (angle_x, angle_y, angle_z) is the direction vector of the second camera's shooting center; width and height represent the resolution of the second camera; fov_h and fov_v are its viewing angles in the horizontal and vertical directions, in degrees; dof is its depth-of-field parameter; and Type is the layout mode of the second cameras, such as circular inward, circular outward or flat. Circular inward means that the shooting directions of the second cameras distributed on a circle all point toward the circle's center; circular outward means that they all point away from the center; flat means that the second cameras distributed along a straight line all shoot toward the viewing area on one side of that line. A corresponding screening optimization strategy can be selected based on the layout mode. Note that the installation-position coordinates (x, y, z) stored for each second camera are coordinates in the world coordinate system, so that data matching can be performed.
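The original listing of this structure is likewise absent from this text; a sketch with the fields described above (layout assumed) might be:

```python
from dataclasses import dataclass

@dataclass
class SecondCameraConfig:
    """Orientation configuration of one second camera; fields follow the description above."""
    cid: int        # identification number distinguishing different second cameras
    x: float        # installation position in the world coordinate system, in mm
    y: float
    z: float
    angle_x: float  # direction vector of the camera's shooting center
    angle_y: float
    angle_z: float
    width: int      # resolution
    height: int
    fov_h: float    # horizontal viewing angle, in degrees
    fov_v: float    # vertical viewing angle, in degrees
    dof: float      # depth-of-field parameter
    type: str       # layout mode: "circular inward", "circular outward" or "flat"
```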
Specifically, in this embodiment the three-dimensional head orientation information is matched against the orientation configuration information of each second camera to determine a second preset number of candidate second cameras capable of shooting the user's face image, and the matching degree of each candidate can be determined from the included angle between the user's head orientation and the center line of the candidate's shooting angle: the larger that included angle, the smaller the matching degree. Whether shooting occlusion exists can also be detected from the positional relationship between multiple users in the preset viewing area, and the matching degree adjusted accordingly. For example, if a user B would block user A when a candidate second camera shoots user A, the matching degree of that candidate can be reduced, or set to the minimum, to avoid using it for user A. When multiple users are tracked simultaneously, each second camera may carry multiple shooting tasks, each corresponding to one call of the camera, so that it can shoot for different users; the current number of calls of a candidate second camera is the number of its pending shooting tasks at the current time. The first preset number in this embodiment is less than or equal to the second preset number, whose size can be determined by the service scenario and actual operating conditions. When the first preset number is smaller than the second preset number, the optimal first preset number of second cameras, i.e., the target second cameras, can be further screened from the second preset number of candidates based on each candidate's matching degree and current number of calls. When the first preset number equals the second preset number, every determined candidate second camera may be directly taken as a target second camera.
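The text only states that the matching degree falls as the angle between the head orientation and a camera's shooting-angle center line grows. One plausible scoring function, reusing the SecondCameraConfig sketch above, is the cosine of the angle between the head orientation and the head-to-camera direction; this is an assumption, not the patent's formula:

```python
import numpy as np

def matching_degree(head_pos, head_dir, cam) -> float:
    """Score a candidate second camera for one user's head.

    head_pos: (x, y, z) head position; head_dir: head orientation vector;
    cam: a SecondCameraConfig. Returns the cosine of the angle between the
    head orientation and the direction from the head to the camera, so a
    larger angle yields a smaller matching degree, as described above.
    """
    to_cam = np.array([cam.x, cam.y, cam.z]) - np.asarray(head_pos, float)
    to_cam /= np.linalg.norm(to_cam)
    d = np.asarray(head_dir, float)
    d /= np.linalg.norm(d)
    return float(d @ to_cam)
```

Occlusion by another user would then be handled by lowering or zeroing this score, per the description above.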
Fig. 2 shows an exemplary schematic diagram of second camera matching. As shown in fig. 2, the second cameras are arranged circularly inward: several layout positions are evenly distributed on a circle, several second cameras can be installed at each layout position, and every camera shoots toward the circle's center. In fig. 2, the dotted lines indicate the directions of the users' heads; C1 is the shooting angle of view of the second camera C1 at layout position 1; C2 is that of the second camera C2 at layout position 2; C3 is that of the second camera C3 at layout position 2; and C4 is that of the second camera C4 at layout position 3. The head of user A faces between layout position 1 and layout position 2, so for user A an optimal second camera can be matched at both positions; the optimal camera at a layout position can be determined from the shooting angle of each second camera there. As shown in fig. 2, the two target second cameras matched to user A are the second cameras C1 and C2. Similarly, the two target second cameras matched to user B are the second cameras C3 and C4.
S130, calling each target second camera to collect the face image of the user, and determining the two-dimensional binocular pupil orientation information of the user according to each face image.
The two-dimensional binocular pupil orientation information may include the user's two-dimensional binocular pupil position information and eye gaze direction information, where the two-dimensional binocular pupil position information comprises two-dimensional left-eye and right-eye pupil position information. Illustratively, the two-dimensional binocular pupil orientation information of the user may be stored in a data structure with the following fields:
(left_x, left_y) and (right_x, right_y) are the two-dimensional position coordinates of the left-eye and right-eye pupils, where the x-axis runs from left to right and the y-axis from top to bottom; (angle_x, angle_y) represents the eye gaze direction, where the x direction is divided into the four ranges "none", "left", "middle" and "right", and the y direction into the four ranges "none", "up", "middle" and "down", with "none" meaning the direction is uncertain. Time is the time point of the pupil-position calculation, in ms, used to distinguish pupil positions computed at different times. Mode is the detected eye mode, for example one of the four modes binocular, left eye, right eye and monocular, where monocular means that only one eye's pupil position is detected and it cannot be recognized whether it is the left or right eye; the two-dimensional coordinates in monocular mode may be stored in either the left pupil position (left_x, left_y) or the right pupil position (right_x, right_y). The precision of the two-dimensional binocular pupil coordinates calculated in this embodiment is ±1 pixel.
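As with the structures above, a sketch of this record with assumed layout:

```python
from dataclasses import dataclass

@dataclass
class Pupil2D:
    """Two-dimensional binocular pupil orientation record; fields follow the description above."""
    left_x: float   # left-eye pupil position; x runs left to right, precision ±1 pixel
    left_y: float   # y runs top to bottom
    right_x: float  # right-eye pupil position
    right_y: float
    angle_x: str    # gaze direction in x: "none", "left", "middle" or "right"
    angle_y: str    # gaze direction in y: "none", "up", "middle" or "down"
    time: int       # time point of the pupil-position calculation, in ms
    mode: str       # "binocular", "left eye", "right eye" or "monocular"
```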
Specifically, this embodiment uses the screened target second cameras to collect the user's face images in a targeted way, so that even when a target second camera with lower resolution is used, the binocular pupil region still occupies a large pixel area in the face image, and calculation accuracy can be improved while the calculation speed is increased. After the first preset number of target second cameras is determined, each face image corresponding to the user is obtained by calling each target second camera, and each collected face image can be processed with an image processing algorithm, so that the two-dimensional binocular pupil orientation information in each face image can be calculated quickly and accurately.
And S140, determining three-dimensional binocular pupil orientation information of the user according to at least two pieces of the two-dimensional binocular pupil orientation information.
The three-dimensional binocular pupil orientation information may include the user's three-dimensional binocular pupil position information and eye gaze direction information. In light field display, because of the angles of the light rays, pixels in different directions differ in how sensitive their light is to the eye position in the depth direction. Fig. 3 gives an example of this distance sensitivity. The light field display screen in fig. 3 is circular and is viewed from within the circular area. In fig. 3, the light emitted from "pixel 1" constrains the eye pupil position in the y-axis direction to a precision of d1, and the light emitted from "pixel 2" to a precision of d2, with d2 < d1; that is, the light from "pixel 2" demands the higher precision in the y-axis direction. If the eye pupil position cannot be determined to within d2 in the y-axis direction, the light emitted by "pixel 2" may fail to irradiate the user's eye pupil position, resulting in a missing-pixel phenomenon; the precision in the depth distance therefore needs to be improved.
Specifically, this embodiment performs three-dimensional reconstruction of the user's binocular pupils from at least two pieces of two-dimensional binocular pupil orientation information and calculates the user's three-dimensional binocular pupil orientation information, improving the precision in the depth distance and avoiding the missing-pixel phenomenon described above. The three-dimensional binocular pupil orientation information can be fed into the 3D display screen driver, so that the 3D display screen determines the corresponding display data according to it and the user views the correct three-dimensional picture.
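The patent does not spell out the reconstruction step. As a hedged illustration only: with calibrated cameras, a standard way to recover a 3D pupil position from two 2D detections is to back-project a ray through each detected pixel and take the midpoint of the shortest segment between the two rays:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint triangulation of one pupil from two camera rays.

    o1, o2: the two target second cameras' optical centers; d1, d2: unit
    direction vectors of the rays back-projected through the detected pupil
    pixels. This is a generic two-view method, assumed here rather than
    taken from the patent.
    """
    o1, d1, o2, d2 = (np.asarray(v, float) for v in (o1, d1, o2, d2))
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = o1 - o2
    denom = a * c - b * b               # near zero if the rays are parallel
    s = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t = (a * (d2 @ w) - b * (d1 @ w)) / denom
    return ((o1 + s * d1) + (o2 + t * d2)) / 2.0
```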
In the technical scheme of this embodiment, the three-dimensional head orientation information corresponding to each user in the preset viewing area is determined from the user images collected by the first camera, so that users' heads are tracked. A first preset number of target second cameras is determined from the plurality of second cameras based on a user's three-dimensional head orientation information, so that the binocular pupil region occupies a large pixel area in the face image collected by each target second camera; the user's three-dimensional binocular pupil orientation information can therefore be determined accurately and quickly from the several face images, achieving high-speed eye tracking while improving calculation accuracy. In addition, because several second cameras collect face images of users in the preset viewing area, different second cameras can be called to collect face images of different users simultaneously, realizing multi-user eye tracking.
On the basis of the above technical solution, "screening out a first preset number of target second cameras from each candidate second camera according to each matching degree and the current calling number corresponding to each candidate second camera", may include: screening out candidate second cameras with the current calling times smaller than or equal to the preset calling times according to the current calling times corresponding to the candidate second cameras as second cameras to be selected; and based on the matching degree corresponding to the second cameras to be selected, performing descending order arrangement on the cameras, and determining the first preset number of the second cameras to be selected after arrangement as target second cameras.
The preset calling times may refer to a maximum value of the number of the shooting tasks to be executed corresponding to the second camera, and may be preset according to the service requirement and the scene. For example, the preset number of calls may be set to 5. Specifically, in this embodiment, in each candidate second camera, a candidate second camera whose current calling frequency is less than or equal to a preset calling frequency is screened, and the screened candidate second camera is used as a second camera to be selected, then the matching degrees of the second cameras to be selected are arranged from large to small, and the arranged first preset number of second cameras to be selected is determined as the target second camera.
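Put together, the screening reads as a filter-then-sort selection; a minimal sketch (the default of 5 preset calls is only the example value mentioned above):

```python
def select_target_cameras(candidates, first_preset_number, preset_calls=5):
    """Screen target second cameras from candidates as described above.

    candidates: list of (camera, matching_degree, current_call_count)
    tuples. Cameras whose current number of calls exceeds the preset
    number of calls are dropped; the rest are sorted by matching degree
    in descending order and the first preset number of them returned.
    """
    eligible = [c for c in candidates if c[2] <= preset_calls]
    eligible.sort(key=lambda c: c[1], reverse=True)
    return [c[0] for c in eligible[:first_preset_number]]
```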
On the basis of the above technical solution, the eye tracking device in this embodiment may invoke the first camera periodically, acquiring a user image each period, determining the corresponding user's three-dimensional head orientation information from each acquired image, using that information to determine the target second cameras, and determining the user's two-dimensional binocular pupil orientation information from the face images collected by those target second cameras. That is, corresponding three-dimensional head orientation information can be determined from the periodically acquired user images. If at least two pieces of two-dimensional binocular pupil position information could not be determined from the user images collected over a preset number of past acquisitions, the user's head has remained occluded or the user has remained in motion; in that case, when determining the current target second cameras, all of the second preset number of candidate second cameras can be determined as target second cameras, increasing the probability of locating the eyes and improving the tracking efficiency.
Alternatively, before S140, the method may further include: if only one two-dimensional binocular pupil orientation information is determined according to each face image, re-screening at least one target second camera from the remaining candidate second cameras after the target second camera is screened, calling the re-screened target second camera to collect the face image of the user, and determining corresponding two-dimensional binocular pupil orientation information according to the re-collected face image; or if the two-dimensional binocular pupil orientation information cannot be determined according to each face image, re-screening at least two target second cameras from the remaining candidate second cameras after the target second cameras are screened, calling each re-screened target second camera to acquire the face image of the user, and determining corresponding two-dimensional binocular pupil orientation information according to each re-acquired face image.
Specifically, after the first preset number of target second cameras is called to acquire the first preset number of the user's face images, the acquired face images may fail to contain the user's binocular pupil information, for example if the user suddenly turns or is located at a junction between second cameras, so the corresponding two-dimensional binocular pupil orientation information cannot be determined from them. If only one piece of two-dimensional binocular pupil orientation information is determined from the face images, at least one further target second camera can be screened from the remaining candidate second cameras using the same screening rule, i.e., screening by the candidates' current number of calls and matching degrees; the newly screened target second camera re-collects the user's face image, and the corresponding two-dimensional binocular pupil orientation information is determined from it anew, so that at least two pieces are obtained. Similarly, if no two-dimensional binocular pupil orientation information can be determined from the face images, i.e., none of them contains binocular pupil information, at least two target second cameras can be re-screened from the remaining candidates, the user's face images re-acquired with the re-screened cameras, and the corresponding two-dimensional binocular pupil orientation information determined anew, again yielding at least two pieces.
On the basis of the above technical solution, after S140 and before the next S110, the method further includes: estimating second three-dimensional head orientation information of the user at the next moment according to the user's three-dimensional binocular pupil orientation information at the current moment and at historical moments; and predicting the user's three-dimensional binocular pupil orientation information at the next moment according to the second three-dimensional head orientation information.
The first camera in this embodiment collects user images in the preset viewing area frame by frame at a preset frame rate, so the user's three-dimensional head orientation information can be determined periodically from the periodically collected images. The delay in calculating three-dimensional binocular pupil orientation information from three-dimensional head orientation information is shorter than the delay in acquiring a user image and determining three-dimensional head orientation information from it. That is, after the user's three-dimensional binocular pupil orientation information at the current moment is determined from the current frame, the three-dimensional head orientation information at the next moment, determined from the next frame, is not yet available: head positioning and tracking is slower than eye positioning and tracking. In this embodiment, after the three-dimensional binocular pupil orientation information at the current moment is determined from the current frame, i.e., after the operations of steps S110 to S140, and before the three-dimensional head orientation information at the next moment is determined from the next frame, the user's head orientation at the next moment can be estimated, alleviating the slow head positioning and tracking and further increasing the tracking speed. The second three-dimensional head orientation information in this embodiment is head orientation information estimated from the user's existing three-dimensional binocular pupil orientation information; the three-dimensional binocular pupil orientation information at a historical moment is that accurately determined from the user image at that moment.
Specifically, if after S140 the three-dimensional head orientation information at the next moment, determined from a user image, has not yet been obtained, the user's second three-dimensional head orientation information at the next moment can be accurately estimated from the three-dimensional binocular pupil orientation information calculated at the current moment and at historical moments; the user's three-dimensional binocular pupil orientation information at the next moment can then be estimated from the second three-dimensional head orientation information, alleviating the slow head positioning and tracking. When the accurate three-dimensional head orientation information determined from a user image becomes available, the operations of steps S120 to S140 are performed with it, reducing the delay time and further increasing the tracking speed.
Illustratively, the second three-dimensional head orientation information includes a three-dimensional head position and a rotation angle; accordingly, second three-dimensional head orientation information of the user at the next moment can be estimated according to the following formula:
where (X_p1, Y_p1, Z_p1) and α1 are the user's three-dimensional eye pupil position and gaze direction angle at the current moment P1; (X_p2, Y_p2, Z_p2) and α2 are those at the historical moment P2; (X_p3, Y_p3, Z_p3) and α3 are those at the historical moment P3; and (X, Y, Z) and α are the estimated three-dimensional head position and rotation angle of the user at the next moment. The three-dimensional eye pupil positions at the moments P1, P2 and P3 may all be either the user's three-dimensional right-eye or left-eye pupil positions. Fig. 4 gives an example of estimating the second three-dimensional head orientation information. In fig. 4, point A and angle a are respectively the user's three-dimensional eye pupil position and gaze direction angle at the next moment; point A1 and angle a1 are those at the current moment P1; point A2 and angle a2 are those at the historical moment P2; point A3 and angle a3 are those at the historical moment P3. In this embodiment, the user's three-dimensional head position and rotation angle at the next moment can be estimated from the accurate three-dimensional eye pupil positions and gaze direction angles at the current moment P1 and the two preceding moments P2 and P3 by a proportionally variable prediction mechanism, so the operations of steps S120 to S140 can continue with the estimated second three-dimensional head orientation information, increasing the tracking speed.
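The formula itself is not legible in this text. Under the stated setup (three known samples at P3, P2, P1, predicting the next moment), one natural reading is a constant-acceleration extrapolation over uniform time steps; the sketch below is that assumption, not the patent's exact "proportionally variable" formula:

```python
import numpy as np

def predict_next(p1, p2, p3):
    """Extrapolate the next-moment value from the last three samples.

    p1: value at the current moment P1; p2, p3: values at the historical
    moments P2 and P3. Fitting a quadratic through three uniformly spaced
    samples and evaluating one step ahead gives 3*p1 - 3*p2 + p3; this
    applies elementwise to (X, Y, Z) positions and to the angle alpha.
    """
    p1, p2, p3 = (np.asarray(v, float) for v in (p1, p2, p3))
    return 3 * p1 - 3 * p2 + p3
```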
On the basis of the above technical solution, after S130, the method further includes: determining the position and size of a target face region in the face image collected by each target second camera according to the three-dimensional head orientation information. The target face region is the image region bounded by the face contour in the face image. Demarcating its position and size based on the user's three-dimensional head orientation information shrinks the calculation region: image processing only needs to be performed on the target face region, so the user's two-dimensional binocular pupil orientation information can be calculated more quickly, further increasing the calculation speed of eye positioning. A sketch of such a demarcation follows.
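As a hedged sketch of that demarcation, reusing the Head3D and SecondCameraConfig sketches from above: under a simple pinhole assumption, the head position can be projected into the camera image and the region sized from the head's distance. The 180 mm nominal face width and the projection model are assumptions, not values from the patent:

```python
import numpy as np

def face_roi(head, cam, face_width_mm=180.0):
    """Estimate the target face region (x, y, w, h) in a second camera's image.

    head: a Head3D; cam: a SecondCameraConfig. For brevity the head position
    is assumed to be already expressed in the camera's coordinate frame,
    with the camera looking down its local +z axis.
    """
    f = (cam.width / 2) / np.tan(np.radians(cam.fov_h) / 2)  # focal length, pixels
    u = cam.width / 2 + f * head.x / head.z                  # projected face center
    v = cam.height / 2 + f * head.y / head.z
    size = f * face_width_mm / head.z                        # face extent, pixels
    return (u - size / 2, v - size / 2, size, size)
```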
On the basis of the foregoing technical solution, the "determining the two-dimensional binocular pupil orientation information of the user according to each face image" in S130 may include: determining the time for receiving data through scan lines according to the position and size of the target face region, receiving the target face region data sent by the target second camera according to that time, and determining the user's two-dimensional binocular pupil orientation information from the received target face image data.
The target second camera may send the acquired face image data to the eye tracking device line by line through a CMOS (complementary metal oxide semiconductor) sensor interface. The target second camera may store image data in a contiguous memory module and transmit the face image data by directly sending the row-pointer list of the face image, i.e., by scan lines, improving the data transmission speed. From the position and size of the target face region, this embodiment can calculate the time needed to receive data through the scan lines, so that little useless data is received beyond the face region; this reduces the delay caused by the target second camera's image acquisition and further increases the tracking speed. Illustratively, fig. 5 gives a schematic diagram of receiving face image data; the face image resolution in fig. 5 is 640 × 480. When the target second camera starts to transmit a frame of the face image, it first sends a frame-start synchronization signal (transmission time denoted Ts) and, at the end of the transmission, a frame-end synchronization signal (transmission time denoted Te); the line transmission speed of the target second camera follows from these two signals as (Te − Ts)/480. From the three-dimensional head orientation information, the eye tracking device can determine that the last line of the target face region lies at height 300, so the data receiving time corresponding to the target face region is 300 × (Te − Ts)/480. Timing starts when the frame-start synchronization signal is received; once the receiving time reaches 300 × (Te − Ts)/480, the target face image data containing the face region has been received and the remaining face image data need not be received. This raises the data transmission speed, so the user's two-dimensional binocular pupil orientation information can be determined more quickly from the received target face image data, reducing the delay time and further increasing the tracking and calculation speed.
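The timing in the 640 × 480 example above can be written down directly; a small sketch:

```python
def roi_receive_time(ts, te, roi_bottom_row, image_height=480):
    """Receive time for scan lines up to the last row of the face region.

    ts, te: times of the frame-start and frame-end synchronization signals,
    so (te - ts) / image_height is the per-line transmission time. With
    roi_bottom_row=300 and image_height=480 this reproduces the
    300 * (Te - Ts) / 480 figure from the example above.
    """
    return roi_bottom_row * (te - ts) / image_height
```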
Example two
Fig. 6 is a flowchart of an eye tracking method according to the second embodiment of the present invention. On the basis of the above embodiment, this embodiment optimizes "determining three-dimensional binocular pupil orientation information of the user according to at least two pieces of two-dimensional binocular pupil orientation information". Explanations of terms identical or corresponding to those of the above embodiment are omitted.
Referring to fig. 6, the eye tracking method provided in this embodiment specifically includes the following steps:
s210, calling a first camera to collect user images in the preset watching area, and determining three-dimensional head direction information corresponding to each user in the preset watching area according to the user images.
S220, determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information.
And S230, calling each target second camera to acquire the face image of the user.
S240, if at least two pieces of two-dimensional binocular pupil orientation information corresponding to the current moment of the user cannot be determined according to each face image, determining a spiral searching rule according to the three-dimensional binocular pupil orientation information at the historical moment.
Specifically, when the user suddenly turns around, is located at a boundary between second cameras, or the like, calling the target second cameras fails to detect the user's binocular pupil information, i.e., at least two pieces of two-dimensional binocular pupil orientation information cannot be determined from the face images. This indicates that the three-dimensional head orientation information determined from the user image is inaccurate, and the user's three-dimensional head orientation at the current moment needs to be predicted so as to adjust it. Generally, when a user watches the screen, the head rotates at a faster angular speed than it translates, so besides establishing a head kinematics model, searching outward in a spiral improves the prediction speed. In this embodiment, the radius value at each node of the spiral and the motion trajectory can be determined from the three-dimensional binocular pupil orientation information determined at historical moments, thereby determining the spiral search rule. Fig. 7 gives an example of a spiral search pattern: nodes "0", "1", "2", "3", "4" and "5" represent different head positions, and the straight line at each node represents the head orientation corresponding to that node. The spiral search rule in this embodiment may take node "0" as the starting position and search outward repeatedly in the order of nodes "1" to "5".
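The patent derives each node's radius and the trajectory from historical binocular pupil data; the concrete spiral below (an Archimedean spiral with a fixed per-node radius step) is only one plausible shape for the search pattern of fig. 7:

```python
import numpy as np

def spiral_search_offsets(step_radius, n_nodes=5):
    """Generate outward spiral offsets for re-predicting the head position.

    Node "0" is the starting position; nodes "1"..n_nodes spiral outward
    with growing radius, matching the repeated outward search described
    above. step_radius would come from historical pupil data in the patent.
    """
    offsets = [np.zeros(2)]                                # node "0"
    for i in range(1, n_nodes + 1):
        theta = 2.0 * np.pi * i / n_nodes                  # angle around spiral
        r = step_radius * i                                # radius grows per node
        offsets.append(np.array([r * np.cos(theta), r * np.sin(theta)]))
    return offsets
```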
And S250, taking the three-dimensional head orientation information determined according to the user image as the three-dimensional head orientation information of the user at the current moment.
Specifically, when the user's three-dimensional head orientation is to be predicted with the spiral search rule, the three-dimensional head orientation information determined from the user image is inaccurate, i.e., the target second cameras called according to it cannot detect the user's binocular pupil information. In this embodiment that three-dimensional head orientation information is first taken as the user's three-dimensional head orientation information at the current moment, so that it can then be adjusted.
And S260, adjusting the three-dimensional head orientation information at the current moment according to the spiral search rule, and taking the adjusted three-dimensional head orientation information at the current moment as first three-dimensional head orientation information.
Specifically, the three-dimensional head orientation information at the current time may be adjusted according to the spiral search graph corresponding to the spiral search rule. Illustratively, as in fig. 7, a node "0" represents three-dimensional head position information determined from the user image, that is, three-dimensional head position information at the current time. When the re-search is performed outward according to the sequence of the nodes "1" to "5", the three-dimensional head position and the head orientation corresponding to the next node "1" of the node "0" may be determined as the adjusted three-dimensional head orientation information at the current time, that is, the first three-dimensional head orientation information, so that the head orientation information is reasonably adjusted and predicted.
And S270, determining at least two pieces of first two-dimensional binocular pupil orientation information of the user at the current moment according to the first three-dimensional head orientation information.
Specifically, after the first three-dimensional head orientation information is predicted, a first preset number of target second cameras can be determined from the plurality of second cameras according to it, each target second camera called to collect the user's face image, and at least two pieces of first two-dimensional binocular pupil orientation information of the user at the current moment determined from the face images.
S280, determining three-dimensional binocular pupil orientation information of the user according to the at least two first two-dimensional binocular pupil orientation information.
Specifically, the embodiment may perform three-dimensional reconstruction on the pupils of the two eyes of the user according to the at least two pieces of first two-dimensional pupil orientation information, and calculate the three-dimensional pupil orientation information of the two eyes of the user, so that the calculation accuracy and speed may be improved.
In the technical scheme of this embodiment, when the target second cameras called according to the three-dimensional head orientation information determined from the user image cannot detect the user's binocular pupil information, that orientation information can be adjusted and predicted with the spiral search rule, yielding more accurate first three-dimensional head orientation information; based on it, at least two pieces of first two-dimensional binocular pupil orientation information can be obtained, so that the user's three-dimensional binocular pupil orientation information can be determined. This solves the failure to track the eyes when, for example, the user turns suddenly, and further improves the eye tracking speed and precision.
On the basis of the above technical scheme, if at least two pieces of first two-dimensional binocular pupil orientation information at the current time cannot be determined according to the first three-dimensional head orientation information, and the current test duration is less than the preset duration, the first three-dimensional head orientation information is taken as the three-dimensional head orientation information at the current time, and step S260 is performed again.
The preset duration may be determined according to the frame rate of the first camera; for example, the time interval between two adjacent frames captured by the first camera may be used as the preset duration, which ensures that the prediction adjustment finishes before the next three-dimensional head orientation information determined from a user image becomes available. Specifically, after a first preset number of target second cameras are determined from the plurality of second cameras according to the first three-dimensional head orientation information, and each target second camera is called to collect a face image of the user, if at least two pieces of first two-dimensional binocular pupil orientation information of the user at the current moment cannot be determined from the face images and the current test duration is less than the preset duration, the predicted first three-dimensional head orientation information is wrong but time remains for another attempt. In that case, the first three-dimensional head orientation information is taken as the three-dimensional head orientation information at the current time, and the operations of steps S260-S280 are performed again, adjusting the three-dimensional head orientation information at the current time once more and updating the first three-dimensional head orientation information. For example, when node "1" in fig. 7 serves as the three-dimensional head orientation information at the current time and the search continues outward in the order of nodes "1" to "5", the three-dimensional head position and head orientation corresponding to node "2", the node next to node "1", may be taken as the adjusted three-dimensional head orientation information at the current time, that is, the updated first three-dimensional head orientation information, so that the head orientation information is reasonably adjusted and predicted again.
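The retry loop described above can be summarized as follows. This is a minimal sketch: the node offsets, the two-coordinate pose representation, and the detect_pupils callback are illustrative assumptions, not parts of the claimed method.

```python
import time

# hypothetical (dx, dy) steps for nodes "1" to "5" around node "0" in fig. 7
SPIRAL_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def predict_head_pose(initial_pose, detect_pupils, step_size, frame_interval):
    """Walk the spiral nodes until pupils are found or the frame interval expires.

    initial_pose   -- head pose ("node 0") determined from the user image, as (x, y)
    detect_pupils  -- callable: pose -> list of 2D binocular pupil detections
    step_size      -- spatial distance between adjacent spiral nodes
    frame_interval -- preset duration, i.e. the time between two first-camera frames
    """
    start = time.monotonic()
    for dx, dy in SPIRAL_OFFSETS:
        if time.monotonic() - start >= frame_interval:
            break  # the next user-image head pose is imminent; stop predicting
        # take the next spiral node as the adjusted ("first") head orientation
        pose = (initial_pose[0] + dx * step_size, initial_pose[1] + dy * step_size)
        detections = detect_pupils(pose)
        if len(detections) >= 2:
            return pose, detections  # enough views for 3D reconstruction
    return None, []  # caller falls back to the previous moment's 3D pupil info
```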
On the basis of the above technical scheme, if at least two pieces of first two-dimensional binocular pupil orientation information at the current moment cannot be determined according to the first three-dimensional head orientation information, and the current test duration is equal to the preset duration, the three-dimensional binocular pupil orientation information determined for the user at the previous moment is used as the user's three-dimensional binocular pupil orientation information at the current moment.
Specifically, if at least two pieces of first two-dimensional binocular pupil orientation information at the current time cannot be determined according to the first three-dimensional head orientation information, and the current test duration is equal to the preset duration, new three-dimensional head orientation information determined from the next user image is about to become available. The prediction operation may therefore be stopped, to avoid searching indefinitely, and the three-dimensional binocular pupil orientation information determined at the previous time is directly used as the three-dimensional binocular pupil orientation information at the current time. As a refinement, if exactly one piece of first two-dimensional binocular pupil orientation information at the current time has been determined when the test duration reaches the preset duration, the depth distance may be calculated from the two-dimensional binocular pupil orientation information determined at the previous time, and the three-dimensional binocular pupil orientation information at the current time may then be calculated more accurately from that depth distance and the single piece of first two-dimensional binocular pupil orientation information.
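For the single-view fallback just described, one common way to combine a carried-over depth with a single 2D detection is pinhole back-projection. The intrinsics model below is an assumption; the patent only states that a depth distance from the previous moment is reused.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Recover a 3D point from one 2D pixel (u, v) plus a known depth, under
    an assumed pinhole model with focal lengths (fx, fy) and principal point
    (cx, cy)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
```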
On the basis of the above technical scheme, before calling the first camera to acquire the user image in the preset viewing area, the method further comprises: determining the number of first layout positions corresponding to the first camera according to a viewing angle range corresponding to a preset viewing area and a first preset orientation error corresponding to the first camera; and determining the number of the first cameras corresponding to each first layout position according to the first visual angle of the first cameras.
The first preset orientation error may refer to the angle the user's head must rotate before a new first camera needs to be matched, and may be preset according to the service requirements and the scene. For example, if the first cameras are arranged on a circle, the first preset orientation error may be set to 60 degrees, that is, for every 60 degrees of head rotation there is a new first camera able to capture a frontal image of the user. The viewing angle range refers to the viewing angle corresponding to the preset viewing area; for example, if the preset viewing area is a circular area, the corresponding viewing angle range is 360 degrees. The first view angle of a first camera refers to its range of shooting angles. The preset detection distance corresponding to the preset viewing area may be the maximum distance from a user in the preset viewing area to the first camera. The first depth of field of each first camera is larger than the preset detection distance, so that the first camera can clearly capture user images anywhere in the preset viewing area.
Specifically, the present embodiment may determine, as the number of the first layout positions, a result obtained by dividing the viewing angle range corresponding to the preset viewing area by the first preset orientation error corresponding to the first camera. And determining the number of the first cameras corresponding to each first layout position according to the view angle range required by each first layout position and the first view angle of the first cameras, so that the whole preset viewing area can be covered. For example, if the required viewing angle range of the first layout positions is 150 degrees and the first viewing angle of the first camera is 150 degrees, it may be determined that only one first camera needs to be installed at each first layout position.
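The arithmetic of this step can be sketched as follows, assuming the divisions are rounded up to whole cameras; the patent only says the counts are obtained by division.

```python
import math

def first_camera_layout(view_range_deg, orientation_error_deg,
                        per_position_range_deg, camera_fov_deg):
    """Number of first layout positions, and first cameras per position."""
    num_positions = math.ceil(view_range_deg / orientation_error_deg)
    cams_per_position = math.ceil(per_position_range_deg / camera_fov_deg)
    return num_positions, cams_per_position

# The example in the text: 360 / 60 = 6 positions, 150 / 150 = 1 camera each.
print(first_camera_layout(360, 60, 150, 150))  # -> (6, 1)
```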
On the basis of the above technical scheme, before calling the first camera to acquire the user image in the preset viewing area, the method further comprises: determining the number of second layout positions corresponding to the second camera according to a viewing angle range corresponding to the preset viewing area and a second preset orientation error corresponding to the second camera; determining the number of depth layers corresponding to each second layout position according to the second depth of field of the second camera and the preset detection distance corresponding to the preset viewing area; and determining the number of second cameras corresponding to each layer of depth of field according to the second visual angle of the second cameras.
The second preset orientation error may refer to a degree of rotation of the head of the user when the optimal second camera is matched, and may be preset according to a service requirement and a scene. For example, if the first camera is laid out on a circle, the first preset orientation error may be set to 30 degrees, so that an optimal second camera may exist to capture the front image of the user every 30 degrees of head rotation of the user, and it may be ensured that the left and right eyes of the user are distinguished. The viewing angle range may refer to a viewing angle corresponding to the preset viewing area, for example, if the preset viewing area is a circular area, the corresponding preset viewing angle is 360 degrees. The preset detection distance corresponding to the preset viewing area may be a maximum distance from the user to the second camera in the preset viewing area. The second depth of field of the second camera can be smaller than the preset detection distance, so that at least two layers of depth of field layouts can be performed, the resolution of the image is guaranteed, and the contrast is improved. The second angle of view of the second camera may refer to a range of shooting angles of the second camera, which may be obtained by lens parameters of the second camera. In this embodiment, the first camera is used for tracking the head of the user, and has low requirements on the face details and the tracking speed, and the second camera is used for tracking the eye position, and has high requirements on the tracking speed and the positioning accuracy, so that the first preset orientation error can be set to be greater than or equal to the second preset orientation error, and the first view angle is greater than the second view angle.
Specifically, this embodiment may determine the number of second layout positions as the result of dividing the viewing angle range corresponding to the preset viewing area by the second preset orientation error corresponding to the second camera. The appropriate number of depth-of-field layers is then selected according to the second depth of field of the second camera and the preset detection distance corresponding to the preset viewing area, so that the whole preset viewing area is covered and images remain sharp at different shooting distances, improving image resolution and contrast. Illustratively, this embodiment may use three depth-of-field layers (near, mid, far) for the multi-layer layout, and each layer may increase the tracking coverage by arranging its second cameras side by side. The number of second cameras corresponding to each depth-of-field layer is determined according to the viewing angle range required by that layer and the second view angle of the second camera, so that the whole preset viewing area can be covered. For example, if the horizontal viewing angle range required for each depth-of-field layer is 150 degrees and the second view angle of the second camera is 30 degrees, 6 second cameras may be assigned to each layer to guarantee 150 degrees of horizontal coverage, as shown in fig. 8. To improve the coverage angle range in the vertical direction, at least two second camera groups may be stacked at each second layout position, each group comprising at least two layers of second cameras with different depths of field, so that the user can still be tracked when lowering or raising the head. For example, when a second layout position uses 3 depth-of-field layers, those 3 layers (i.e., one second camera group) correspond to 18 second cameras, and if two second camera groups are stacked at each layout position, 36 second cameras need to be installed at each second layout position; this count is worked through in the sketch below.
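A worked count using the figures quoted above. The groupings come from the text; treating the counts as straight products is an assumption.

```python
# Figures quoted in the text for a circular viewing area:
view_range_deg = 360
orientation_error_deg = 30
num_positions = view_range_deg // orientation_error_deg      # 12 second layout positions

depth_layers = 3          # near / mid / far, spanning the preset detection distance
cams_per_layer = 6        # 150 degrees of horizontal coverage with 30-degree cameras
groups_per_position = 2   # stacked groups for head-down / head-up coverage

cams_per_position = depth_layers * cams_per_layer * groups_per_position  # 36
total_cameras = num_positions * cams_per_position                        # 432
print(num_positions, cams_per_position, total_cameras)  # 12 36 432
```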
After the corresponding number of second cameras is installed at each second layout position, the orientation configuration information of each second camera can be stored in a three-level data table to speed up matching and lookup of the second cameras, and thus speed up determination of the target second cameras. Illustratively, the three-level data table may include one second-layout-position pointer table, a plurality of structure pointer tables, and a plurality of data structure tables. Fig. 9 shows an example of the three-level data table corresponding to the second cameras. As shown in fig. 9, the second-layout-position pointer table arranges the second layout positions sequentially along the layout direction (e.g., clockwise or counterclockwise). Each second layout position corresponds to a structure pointer table, which stores the identification code cid of each second camera at that layout position. The data structure tables store the orientation configuration information struct camera_position corresponding to each second camera.
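A minimal sketch of this three-level lookup, in which Python dictionaries and lists stand in for the pointer tables. The field names cid and camera_position follow the text; everything else is illustrative.

```python
from dataclasses import dataclass

@dataclass
class CameraPosition:            # one data-structure-table entry (struct camera_position)
    x: float
    y: float
    z: float
    yaw_deg: float

# level 3: data structure tables, keyed by camera identification code cid
camera_positions = {"cid_0": CameraPosition(0.0, 1.2, 3.0, 180.0)}

# level 2: one structure pointer table per second layout position, listing its cids
structure_tables = [["cid_0"]]

# level 1: second-layout-position pointer table, ordered along the layout direction
position_table = [0]             # each entry indexes a structure pointer table

def cameras_at(position_index):
    """Resolve a second layout position to its cameras' orientation configuration."""
    return [camera_positions[cid]
            for cid in structure_tables[position_table[position_index]]]
```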
The following is an embodiment of an eye tracking apparatus provided in an embodiment of the present invention, which belongs to the same inventive concept as the eye tracking method in the above embodiments, and reference may be made to the above embodiment of the eye tracking method for details that are not described in detail in the embodiment of the eye tracking apparatus.
EXAMPLE III
Fig. 10 is a schematic structural diagram of a human eye tracking device according to a third embodiment of the present invention, applicable to tracking and positioning the binocular pupils of a user watching a 3D display screen. The device includes: a three-dimensional head orientation information determining module 310, a target second camera determining module 320, a two-dimensional binocular pupil orientation information determining module 330, and a three-dimensional binocular pupil orientation information determining module 340.
The three-dimensional head orientation information determining module 310 is configured to invoke a first camera to acquire a user image in a preset viewing area, and determine three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user image; the target second camera determining module 320 is configured to determine a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information; the two-dimensional binocular pupil orientation information determining module 330 is configured to call each target second camera to acquire a face image of the user, and determine two-dimensional binocular pupil orientation information of the user according to each face image; the three-dimensional binocular pupil orientation information determining module 340 is configured to determine three-dimensional binocular pupil orientation information of the user according to at least two pieces of the two-dimensional binocular pupil orientation information.
Optionally, the target second camera determining module 320 includes:
the candidate second camera determining unit is used for determining a second preset number of candidate second cameras corresponding to the user and the matching degree corresponding to each candidate second camera according to the three-dimensional head position information and the position configuration information of each second camera;
the target second camera determining unit is used for screening out a first preset number of target second cameras from the candidate second cameras according to the matching degrees and the current calling times corresponding to each candidate second camera; wherein the first preset number is smaller than the second preset number.
Optionally, the target second camera determining unit is specifically configured to: screening out candidate second cameras with the current calling times smaller than or equal to the preset calling times according to the current calling times corresponding to the candidate second cameras to serve as second cameras to be selected; and performing descending order arrangement on the second cameras to be selected based on the matching degrees corresponding to the second cameras to be selected, and determining the first preset number of the second cameras to be selected after arrangement as target second cameras.
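The screening just described amounts to a filter followed by a sort; a minimal sketch follows, in which the tuple layout and names are assumptions.

```python
def select_target_cameras(candidates, first_preset_number, max_calls):
    """candidates: list of (camera_id, match_degree, current_call_count) tuples."""
    eligible = [c for c in candidates if c[2] <= max_calls]   # call-count screening
    eligible.sort(key=lambda c: c[1], reverse=True)           # descending match degree
    return [c[0] for c in eligible[:first_preset_number]]

# select_target_cameras([("a", 0.9, 1), ("b", 0.7, 5), ("c", 0.8, 0)], 2, 3)
# -> ["a", "c"]   ("b" is screened out because it has been called too often)
```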
Optionally, the three-dimensional binocular pupil orientation information determining module 340 is specifically configured to: if at least two pieces of two-dimensional binocular pupil orientation information corresponding to the current moment of the user cannot be determined according to each face image, determine a spiral search rule according to the three-dimensional binocular pupil orientation information at historical moments; take the three-dimensional head orientation information determined according to the user image as the three-dimensional head orientation information of the user at the current moment; adjust the three-dimensional head orientation information at the current moment according to the spiral search rule, and take the adjusted three-dimensional head orientation information at the current moment as first three-dimensional head orientation information; determine at least two pieces of first two-dimensional binocular pupil orientation information of the user at the current moment according to the first three-dimensional head orientation information; and determine the three-dimensional binocular pupil orientation information of the user according to the at least two pieces of first two-dimensional binocular pupil orientation information.
Optionally, the apparatus further comprises:
the second three-dimensional head orientation information determining module is used for estimating second three-dimensional head orientation information of the user at the next moment according to the three-dimensional binocular pupil orientation information of the user at the current moment and the three-dimensional binocular pupil orientation information at historical moments, after the three-dimensional binocular pupil orientation information of the user is determined according to the at least two pieces of two-dimensional binocular pupil orientation information and before the three-dimensional head orientation information corresponding to each user in the preset viewing area is determined according to the user image;
and the three-dimensional binocular pupil position information estimation module is used for estimating the three-dimensional binocular pupil position information of the user at the next moment according to the second three-dimensional head position information.
Optionally, the second three-dimensional head position information comprises a three-dimensional head position and a rotation angle; correspondingly, second three-dimensional head orientation information of the user at the next moment is estimated according to the following formula:
where (X_{p1}, Y_{p1}, Z_{p1}) and α_1 are the three-dimensional binocular pupil position and gaze direction angle of the user at the current moment P1; (X_{p2}, Y_{p2}, Z_{p2}) and α_2 are the three-dimensional binocular pupil position and gaze direction angle of the user at the historical moment P2; (X_{p3}, Y_{p3}, Z_{p3}) and α_3 are the three-dimensional binocular pupil position and gaze direction angle of the user at the historical moment P3; and (X, Y, Z) and α are the estimated three-dimensional head position and rotation angle of the user at the next moment.
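The formula itself appears only as a drawing in the original. As a stand-in consistent with three equally spaced samples, the sketch below uses second-order extrapolation (next = 3·P1 − 3·P2 + P3, applied per coordinate); this is an assumption, not the claimed formula.

```python
def predict_next(pose1, pose2, pose3):
    """pose = (X, Y, Z, alpha); pose1 is the current sample P1, pose2 and
    pose3 the historical samples P2 and P3 (assumed equally spaced in time)."""
    return tuple(3.0 * a - 3.0 * b + c for a, b, c in zip(pose1, pose2, pose3))

# e.g. a head moving at constant speed along X:
# predict_next((2, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)) -> (3.0, 0.0, 0.0, 0.0)
```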
Optionally, the apparatus further comprises:
and the target face area determining module is used for determining the position and the size of a target face area in the face image collected by each target second camera according to the three-dimensional head orientation information after determining a first preset number of target second cameras from the plurality of second cameras.
Optionally, the two-dimensional binocular pupil orientation information determining module 330 is specifically configured to: determine, according to the position and size of the target face area, the time at which its data arrive on the scan lines; receive the target face area data sent by the target second camera at that time; and determine the two-dimensional binocular pupil orientation information of the user from the received target face area data.
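One way to realize this timed reception, assuming a sequential row readout; the arithmetic and parameter names are illustrative, since the patent does not fix them.

```python
def roi_arrival_window(frame_start, line_time, roi_top_row, roi_height):
    """Time window in which the target face area's rows arrive, assuming the
    camera reads out one scan line per `line_time` starting at `frame_start`."""
    begin = frame_start + roi_top_row * line_time
    end = frame_start + (roi_top_row + roi_height) * line_time
    return begin, end

# e.g. a 480-row sensor at 90 frames per second: line_time ~= (1 / 90) / 480 s,
# so a face region spanning rows 200-299 arrives about 4.6-6.9 ms into the frame.
```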
Optionally, the first camera is a color camera or a 3D camera, the second camera is a black-and-white camera, and an illumination infrared light source is arranged at a mounting position of the second camera.
Optionally, the apparatus further comprises: the number determining module of the first layout positions is used for determining the number of the first layout positions corresponding to the first camera according to the viewing angle range corresponding to the preset viewing area and the first preset orientation error corresponding to the first camera before the first camera is called to collect the user image in the preset viewing area;
and the first camera number determining module is used for determining the number of the first cameras corresponding to each first layout position according to the first visual angle of the first cameras, wherein the first depth of field of each first camera is greater than the preset detection distance corresponding to the preset viewing area.
Optionally, the apparatus further comprises: the number determining module of the second layout positions is used for determining the number of the second layout positions corresponding to the second camera according to the viewing angle range corresponding to the preset viewing area and the second preset orientation error corresponding to the second camera before the first camera is called to collect the user image in the preset viewing area;
the depth-of-field layer number determining module is used for determining the depth-of-field layer number corresponding to each second layout position according to the second depth of field of the second camera and the preset detection distance corresponding to the preset viewing area, wherein the second depth of field is smaller than the preset detection distance;
and the second camera number determining module is used for determining the number of the second cameras corresponding to each layer of depth of field according to the second visual angle of the second cameras.
The eye tracking device provided by the embodiment of the invention can execute the eye tracking method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the eye tracking method.
It should be noted that, in the embodiment of the eye tracking apparatus, the included units and modules are merely divided according to the functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for the convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
EXAMPLE IV
Fig. 11 is a schematic structural diagram of an eye tracking system according to a fourth embodiment of the present invention. Referring to fig. 11, the system includes: a first camera 410, a plurality of second cameras 420, and a human eye tracking device 430; the eye tracking apparatus 430 may be used to implement the eye tracking method according to any embodiment of the present invention.
The first camera 410 is used for capturing an image of a user in a preset viewing area, so as to track the head of the user. The first camera 410 in this embodiment may be a 3D camera or a plurality of 2D cameras. Since the first camera 410 is responsible for tracking the head of the user, the requirements on the details of the face and the tracking speed are not high, so that the first camera 410 with a large viewing angle can be selected. The second camera 420 is used for collecting the face image of the user so as to realize high-speed tracking of human eyes. The second camera 420 in this embodiment may be a 2D camera. Illustratively, the first camera 410 may be a high-definition color camera and the second camera 420 may be a black and white camera. The resolution of the first camera in this embodiment may be greater than the resolution of the second camera. According to the embodiment, the calculation speed can be increased on the premise of ensuring the calculation accuracy by using the second camera with lower resolution.
The layout requirement of the first camera 410 in this embodiment may be set as, but is not limited to: (1) The depth of field of the first camera 410 is greater than the preset detection distance corresponding to the preset viewing area, so that only the first cameras with one depth of field need to be utilized within the preset detection distance range, that is, the depth of field of each first camera is the same. (2) The area occupied by the head region in the user image acquired by the first camera 410 may be larger than the precision requirement, such as 60 × 60 pixels, so as to make the captured head region clearer. (3) A sufficient number of first cameras are set according to a first preset orientation error, which may be set to 60 degrees, and may cover the entire preset viewing area.
The present embodiment may layout the first camera based on the layout requirements of the first camera 410 and the preset viewing area. Specifically, at least one first camera is arranged at each first layout position in the preset viewing area, and the total detection area corresponding to each first camera is the preset viewing area, that is, the total shooting range of each first camera can cover the preset viewing area. For example, when the preset viewing area is an area in a circle, the first layout positions may be distributed on the circle corresponding to the preset viewing area, and at least one first camera is disposed at each first layout position, and the shooting direction of each first camera faces the center position of the circle. The distance between every two adjacent first layout positions can be equal, namely, the first layout positions are uniformly distributed on the closed shape corresponding to the preset viewing area; or within a preset allowable error range. For example, fig. 12 shows a layout example of the first cameras when the preset viewing area is an area inside a circle. The dotted line in fig. 12 indicates the optical axis of the first camera, and the solid lines on both sides of the dotted line indicate the viewing angle range of the first camera. As shown in fig. 12, 6 first layout positions are uniformly distributed on the circle, and one first camera is disposed at each first layout position, that is, the entire circular inner area can be covered by disposing 6 first cameras, so that user images at various positions in the circular inner area can be collected, and the layout manner of the first cameras can be referred to as a circular inward type.
For example, when the preset viewing area is a circular outer area, the first layout positions may be distributed on a circle corresponding to the preset viewing area, and each first layout position is provided with at least one first camera, and the shooting direction of each first camera deviates from the center position of the circle. The distance between every two adjacent first layout positions can be equal, namely, the first layout positions are uniformly distributed on the closed shape corresponding to the preset viewing area; or within a preset allowable error range. For example, fig. 13 shows a layout example of the first camera when the preset viewing area is a circular outer area. The broken line in fig. 13 indicates the optical axis of the first camera, and the solid lines on both sides of the broken line indicate the viewing angle range of the first camera. As shown in fig. 13, 6 first layout positions are uniformly distributed on the circle, and one first camera is disposed at each first layout position, that is, the entire circular outer area can be covered by disposing 6 first cameras, so that the user image in the circular outer area can be captured, and the layout manner of the first cameras can be referred to as a circular outward shape.
For example, when the preset viewing area is a straight single-sided viewing area, such as a viewing area in a movie theater, the first layout positions may be distributed on a straight line of the straight single-sided viewing area, and at least one first camera is disposed at each first layout position, and a photographing direction of each first camera is photographed toward the straight single-sided viewing area. The distance between every two adjacent first layout positions can be equal, namely, the first layout positions are uniformly distributed on a straight line of a straight line single-side viewing area; or within a preset allowable error range. For example, fig. 14 shows a layout example of the first cameras when the preset viewing zone is a straight-line one-sided viewing zone. The broken line in fig. 14 indicates the optical axis of the first camera, and the solid lines on both sides of the broken line indicate the viewing angle range of the first camera. As shown in fig. 14, 3 first layout positions are uniformly distributed on a straight line, and one first camera is provided at each first layout position, and the three first cameras can shoot toward a straight line one-sided viewing area in a fan-shaped direction so as to cover the entire straight line one-sided viewing area, and this layout manner of the first cameras can be referred to as a planar type.
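The circular inward, circular outward, and planar layouts above differ only in where the positions sit and which way the optical axes point. A small sketch for the two circular cases follows; the coordinate conventions are assumed.

```python
import math

def camera_poses_on_circle(num_positions, radius, inward=True):
    """Evenly spaced positions on a circle; each camera's optical axis points
    toward the center (circular inward) or away from it (circular outward)."""
    poses = []
    for i in range(num_positions):
        theta = 2.0 * math.pi * i / num_positions
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        yaw = theta + math.pi if inward else theta   # aim at or away from the center
        poses.append((x, y, math.degrees(yaw) % 360.0))
    return poses

# camera_poses_on_circle(6, 3.0) reproduces the six inward-facing first cameras of fig. 12.
```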
The layout requirements of the second camera 420 in this embodiment may be set as, but are not limited to: (1) At least two depth-of-field layers are used at each second layout position, to improve image resolution and contrast. (2) The frame rate of the second camera is greater than 60 frames per second. (3) The area occupied by the face region in the face image acquired by the second camera may exceed the precision requirement, such as 100 × 100 pixels, to improve the calculation precision of the two-dimensional binocular pupil orientation information. (4) The shortest distance d across the depth-of-field crossing region of two adjacent second cameras at each second layout position (such as the shaded region in fig. 15) is greater than the distance between the two pupils, to ensure that a face image containing both the left and right pupils can be acquired; d may be set to 6.5 cm, a typical interpupillary distance. (5) A sufficient number of second layout positions are set according to the second preset orientation error, which may be set to 30 degrees, to cover the entire preset viewing area. (6) When the second camera is a black-and-white camera, an infrared illumination source is arranged at the center of each second layout position to provide fill light.
For example, at least one second camera group may be disposed at each second layout position; each second camera group includes at least two layers of second cameras, each layer consists of several second cameras with the same depth of field, and different layers have different depths of field. As shown in fig. 8, when the user is within a detection range of 1 to 5 meters from the second cameras, each second camera may be a high-speed (greater than 90 frames per second), low-resolution (640 × 480) black-and-white camera with a second view angle of 30 degrees. Each second camera group may then include three second cameras with different focal lengths, so that users at different distances are imaged clearly, and by arranging 6 cameras side by side per layer the horizontal coverage can reach 150 degrees at maximum.
This embodiment may lay out the second cameras based on the layout requirements of the second cameras 420 and the preset viewing area. Specifically, at least one second camera is arranged at each second layout position in the preset viewing area, and the total detection area of all second cameras covers the preset viewing area, that is, their combined shooting range covers the preset viewing area. For example, when the preset viewing area is the area inside a circle, the second layout positions may be distributed on the circle corresponding to the preset viewing area, with at least one second camera at each second layout position and the shooting direction of each second camera facing the center of the circle. The distance between every two adjacent second layout positions can be equal, that is, the second layout positions are uniformly distributed on the closed shape corresponding to the preset viewing area, or can be within a preset allowable error range. For example, fig. 16 shows a layout example of the second cameras when the preset viewing area is the area inside a circle. As shown in fig. 16, 12 second layout positions are uniformly distributed on the circle; if each is laid out per fig. 8, 18 second cameras can be arranged horizontally at each second layout position (each second camera group includes 18 second cameras), guaranteeing a 150-degree horizontal shooting angle per position, for a total of 12 × 18 = 216 second cameras. To improve the coverage angle range in the vertical direction, at least two second camera groups may additionally be stacked at each second layout position, so that the user can still be tracked when looking down or up; in that case at least 12 × 18 × 2 = 432 second cameras are needed in total, so that for every 30 degrees of head rotation an optimal target second camera can capture a frontal image of the user, reducing occlusion and improving tracking accuracy. This arrangement of the second cameras may be referred to as circular inward.
For example, when the preset viewing area is a circular outer area, the second layout positions may be distributed on a closed shape corresponding to the preset viewing area, and at least one second camera is disposed at each second layout position, and the shooting direction of each second camera deviates from the center position of the circle. The distance between every two adjacent second layout positions can be equal, namely, the second layout positions are uniformly distributed on the closed shape corresponding to the preset viewing area; or within a preset allowable error range. For example, fig. 17 shows a layout example of the second camera when the preset viewing area is a circular outer area. As shown in fig. 17, 12 second layout positions 1 to 12 are uniformly distributed on the circle, and each second layout position is provided with 18 second cameras in the horizontal direction, so as to ensure that the shooting angle of each second layout position in the horizontal direction can be 150 degrees.
For example, when the preset viewing area is a straight single-sided viewing area, such as a viewing area in a movie theater, the second layout positions may be distributed on a straight line of the straight single-sided viewing area, and at least one second camera is disposed at each second layout position, and a photographing direction of each second camera is photographed toward the straight single-sided viewing area. The distance between every two adjacent second layout positions can be equal, namely, the second layout positions are uniformly distributed on a straight line of the straight line single-side viewing area; and can be within the preset allowable error range. For example, fig. 18 shows an example of the layout of the second camera when the preset viewing zone is a straight-line one-sided viewing zone. As shown in fig. 18, 3 second layout positions are uniformly distributed on a straight line, if each second layout position is laid out according to the manner of fig. 8, 18 second cameras are arranged to ensure that the shooting angle of each second layout position in the horizontal direction can be 150 degrees, and the shooting direction corresponding to the three second layout positions can be a fan shape so as to cover the whole straight line single-side viewing area of the second layout positions in the horizontal direction, and this manner of laying out the second cameras can be referred to as a plane type.
The working process of the eye tracking system in this embodiment is as follows. The eye tracking device 430 calls the first camera 410, which collects a user image in the preset viewing area and transmits it to the eye tracking device 430. The eye tracking device 430 determines the three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user image, determines a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information, and calls each target second camera. Each target second camera collects a face image of the user and transmits it to the eye tracking device 430, which determines the two-dimensional binocular pupil orientation information of the user from each face image and then determines the three-dimensional binocular pupil orientation information of the user from at least two pieces of two-dimensional binocular pupil orientation information, thereby realizing high-speed tracking of the user's binocular pupils while improving calculation accuracy.
According to the eye tracking system provided by the embodiment, the first camera and the second camera are used for respectively realizing head identification and high-speed eye tracking, and the eye tracking device 430 is used for scheduling and managing the second camera, so that high-speed tracking of pupils of both eyes of a user can be realized, and meanwhile, the calculation precision is improved.
Based on the above technical solutions, the eye tracking apparatus 430 may be integrated into one server to implement the eye tracking method provided by any embodiment of the present invention, or the first client, the plurality of second clients, and the central server may be utilized to implement the eye tracking method provided by any embodiment of the present invention. Fig. 19 is a schematic structural diagram of another eye tracking system provided in this embodiment, and as shown in fig. 19, the system includes: a first camera 410, a plurality of second cameras 420, a first client 440, a plurality of second clients 450, and a central server 460.
The first camera 410 is connected to the first client 440, and is configured to collect a user image in the preset viewing area and send the user image to the first client 440. The first client 440 is connected with the central server 460, and is configured to determine the three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user image sent by the first camera 410, and send the three-dimensional head orientation information to the central server 460. Each second camera 420 is connected with a corresponding second client 450, and is configured to collect a face image of the user and send the face image to that second client 450. The second client 450 is connected with the central server 460, and is configured to determine the two-dimensional binocular pupil orientation information of the user according to each face image, and send it to the central server 460. The central server 460 is configured to determine a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information sent by the first client 440, call the second client connected to each target second camera, obtain the two-dimensional binocular pupil orientation information sent by each called second client, and determine the three-dimensional binocular pupil orientation information of the user according to at least two pieces of two-dimensional binocular pupil orientation information.
It should be noted that the first client may be, but is not limited to, a high-performance PC (Personal Computer), and the second client may be, but is not limited to, an embedded computer, to increase response speed. In this embodiment there may be one or more first cameras 410, to extend the head tracking range and improve head tracking accuracy. When there are multiple first cameras 410, each is connected to the first client 440, so that the first client 440 can process the user images acquired by all first cameras and determine the three-dimensional head orientation information of each user in the preset viewing area more accurately.
Specifically, the working process of the eye tracking system of fig. 19 is as follows. The first client 440 calls the first camera 410, which collects a user image in the preset viewing area and sends it to the first client 440. The first client 440 determines the three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user image, and sends it to the central server 460. The central server 460 determines a first preset number of target second cameras from the plurality of second cameras 420 according to the three-dimensional head orientation information sent by the first client 440, and calls the second client connected to each target second camera. Each such second client 450 calls its target second camera, which collects a face image of the user and sends it back to that second client 450. The second client 450 determines the two-dimensional binocular pupil orientation information of the user from each received face image and sends it to the central server 460, which determines the three-dimensional binocular pupil orientation information of the user from at least two pieces of two-dimensional binocular pupil orientation information. The central server 460 may further be connected to the 3D display screen driver, inputting the user's three-dimensional binocular pupil orientation information into the driver so that the 3D display screen can determine the corresponding display data and the user can view the corresponding three-dimensional picture. In this embodiment, the first client, the second clients, and the central server are each responsible for one link of the eye tracking process: the first client processes the user image, the second clients process the face images, and the central server matches and schedules the second cameras and calculates the user's three-dimensional binocular pupil orientation information. The system therefore runs faster and processes more efficiently, further improving the eye tracking speed.
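The division of labor just described can be summarized in a short sketch. Every callable below is a stand-in, since the patent fixes which node performs which step but not any API.

```python
def run_tracking_cycle(first_client, central_server, second_clients, n_targets):
    # 1. first client: user image -> per-user 3D head orientation information
    for head_pose in first_client["locate_heads"]():
        # 2. central server: match and schedule the target second cameras
        target_ids = central_server["match_cameras"](head_pose)[:n_targets]
        # 3. each scheduled second client: face image -> 2D binocular pupil info
        pupil_2d = [second_clients[cid]["locate_pupils"]() for cid in target_ids]
        # 4. central server: two or more views -> 3D pupil info -> display driver
        if len(pupil_2d) >= 2:
            central_server["to_display"](central_server["triangulate"](pupil_2d))
```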
EXAMPLE V
Fig. 20 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention. Referring to fig. 20, the apparatus includes:
one or more processors 510;
a memory 520 for storing one or more programs;
an input device 530 for acquiring an image;
an output device 540 for displaying screen information;
the one or more programs, when executed by the one or more processors 510, cause the one or more processors 510 to implement a method for eye tracking as set forth in any of the embodiments above.
In FIG. 20, one processor 510 is taken as an example; the processor 510, the memory 520, the input device 530 and the output device 540 in the apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 20.
The memory 520 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the eye tracking method in the embodiment of the present invention (for example, the three-dimensional head position information determining module 310, the target second camera determining module 320, the two-dimensional binocular pupil position information determining module 330, and the three-dimensional binocular pupil position information determining module 340 in the eye tracking apparatus). The processor 510 implements the eye tracking method described above by executing software programs, instructions, and modules stored in the memory 520 to execute various functional applications of the device and data processing.
The memory 520 mainly includes a program storage area and a data storage area; the program storage area can store an operating system and the application program required by at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 520 can further include memory located remotely from the processor 510, connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may include a camera or other capturing device for capturing the user image and the face image and inputting them to the processor 510 for data processing.
The output device 540 may include a display device such as a display screen for displaying screen information.
The apparatus of the present embodiment and the eye tracking method of the present embodiment belong to the same inventive concept, and the technical details that are not described in detail in the present embodiment can be referred to the above embodiments, and the present embodiment has the same advantageous effects as the eye tracking method.
EXAMPLE VI
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the eye tracking method as provided by any of the embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (20)
1. A method of eye tracking, comprising:
calling a first camera to collect user images in a preset viewing area, and determining three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user images;
determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head orientation information;
calling each target second camera to collect the face image of the user, and determining the two-dimensional binocular pupil orientation information of the user according to each face image;
determining three-dimensional binocular pupil orientation information of the user according to the at least two pieces of two-dimensional binocular pupil orientation information;
if at least two pieces of two-dimensional binocular pupil orientation information corresponding to the current moment of the user cannot be determined according to each face image, determining a spiral search rule according to the three-dimensional binocular pupil orientation information at historical moments; taking the three-dimensional head orientation information determined according to the user image as the three-dimensional head orientation information of the user at the current moment; adjusting the three-dimensional head orientation information at the current moment according to the spiral search rule, and taking the adjusted three-dimensional head orientation information at the current moment as first three-dimensional head orientation information; determining at least two pieces of first two-dimensional binocular pupil orientation information of the user at the current moment according to the first three-dimensional head orientation information; and determining the three-dimensional binocular pupil orientation information of the user according to the at least two pieces of first two-dimensional binocular pupil orientation information.
2. The method of claim 1, wherein determining a first preset number of target second cameras from a plurality of second cameras based on the three-dimensional head position information comprises:
determining a second preset number of candidate second cameras corresponding to the user and a matching degree corresponding to each candidate second camera according to the three-dimensional head position information and the position configuration information of each second camera;
screening a first preset number of target second cameras from the candidate second cameras according to the matching degrees and the current calling times corresponding to the candidate second cameras;
wherein the first preset number is less than or equal to the second preset number.
3. The method of claim 2, wherein the step of screening a first preset number of target second cameras from the candidate second cameras according to the matching degrees and the current calling times corresponding to each candidate second camera comprises:
screening out candidate second cameras with the current calling times smaller than or equal to preset calling times according to the current calling times corresponding to the candidate second cameras, and taking the candidate second cameras as second cameras to be selected;
and based on the matching degree corresponding to the second cameras to be selected, performing descending order arrangement on the second cameras to be selected, and determining the arranged first preset number of second cameras to be selected as target second cameras.
4. The method according to claim 1, further comprising, after determining three-dimensional binocular pupil orientation information of the user according to at least two of the two-dimensional binocular pupil orientation information, and before determining three-dimensional head orientation information corresponding to each user in the preset viewing area according to the user image:
according to the three-dimensional binocular pupil position information of the user at the current moment and the three-dimensional binocular pupil position information of the user at the historical moment, second three-dimensional head position information of the user at the next moment is estimated;
and estimating the three-dimensional binocular pupil orientation information of the user at the next moment according to the second three-dimensional head orientation information.
5. The method of claim 4, wherein the second three-dimensional head position information comprises a three-dimensional head position and a rotation angle;
correspondingly, second three-dimensional head orientation information of the user at the next moment is estimated according to the following formula:
where (X_{p1}, Y_{p1}, Z_{p1}) and α_1 are the three-dimensional binocular pupil position and gaze direction angle of the user at the current moment P1; (X_{p2}, Y_{p2}, Z_{p2}) and α_2 are the three-dimensional binocular pupil position and gaze direction angle of the user at the historical moment P2; (X_{p3}, Y_{p3}, Z_{p3}) and α_3 are the three-dimensional binocular pupil position and gaze direction angle of the user at the historical moment P3; and (X, Y, Z) and α are the estimated three-dimensional head position and rotation angle of the user at the next moment.
6. The method of claim 1, after determining a first preset number of target second cameras from the plurality of second cameras, further comprising:
and determining the position and the size of a target face area in the face image acquired by each target second camera according to the three-dimensional head position information.
7. The method of claim 6, wherein determining the two-dimensional binocular pupil position information of the user from each of the face images comprises:
determining, according to the position and the size of the target face area, the time at which data is to be received over a scan line; receiving, at that time, the target face area data sent by the target second camera; and determining the two-dimensional binocular pupil position information of the user according to the received target face area data.
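Claim 7's partial readout can be pictured as simple scan-line arithmetic. The sketch below assumes a rolling-shutter camera with a fixed line period; the claim itself only requires that the receive time be derived from the target face area's position and size, so the timing model here is an assumption.

```python
def face_readout_window(face_top_line, face_height_lines, line_period_us):
    """Return the (start, end) time in microseconds during which the
    scan lines covering the target face area arrive, so only that
    window of data needs to be received (hypothetical timing model).
    """
    start_us = face_top_line * line_period_us
    end_us = (face_top_line + face_height_lines + 1) * line_period_us
    return start_us, end_us
```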
8. The method according to any one of claims 1 to 7, wherein the first camera is a color camera or a 3D camera, the second camera is a black and white camera, and an illuminating infrared light source is arranged at the installation position of the second camera.
9. The method of claim 1, prior to calling the first camera to acquire the user image in the preset viewing area, further comprising:
determining the number of first layout positions corresponding to the first camera according to the viewing angle range corresponding to the preset viewing area and the first preset orientation error corresponding to the first camera;
and determining the number of the first cameras corresponding to each first layout position according to the first visual angle of the first cameras, wherein the first depth of field of each first camera is greater than the preset detection distance corresponding to the preset viewing area.
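Claim 9 leaves the exact counting rule open; one plausible reading is the ceiling arithmetic sketched below, where the rounding policy and the even split of the angle range across positions are assumptions, not the patent's stated method.

```python
import math

def first_camera_layout(viewing_angle_deg, preset_error_deg, camera_fov_deg):
    """Sketch of claim 9's two counts (not the patent's exact rule).

    Layout positions: enough that each serves a slice of the viewing
    angle range within the first preset orientation error. Cameras per
    position: enough to cover that slice given the first visual angle.
    """
    positions = math.ceil(viewing_angle_deg / preset_error_deg)
    per_position = math.ceil((viewing_angle_deg / positions) / camera_fov_deg)
    return positions, per_position
```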
10. The method of claim 1, prior to calling the first camera to acquire the user image in the preset viewing area, further comprising:
determining the number of second layout positions corresponding to the second camera according to the viewing angle range corresponding to the preset viewing area and a second preset orientation error corresponding to the second camera;
determining the number of depth-of-field layers corresponding to each second layout position according to a second depth of field of the second camera and a preset detection distance corresponding to the preset viewing area, wherein the second depth of field is smaller than the preset detection distance;
and determining the number of the second cameras corresponding to each layer of depth of field according to the second visual angle of the second cameras.
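Claim 10 extends the same counting to depth: because a second camera's depth of field is shorter than the preset detection distance, the distance is tiled in layers. A sketch under the same assumed rounding as the claim-9 sketch above:

```python
import math

def second_camera_layers(detection_distance, camera_dof,
                         viewing_angle_deg, preset_error_deg, camera_fov_deg):
    """Sketch of claim 10's counts (assumed rounding, as above)."""
    # Layout positions, determined as for the first cameras.
    positions = math.ceil(viewing_angle_deg / preset_error_deg)
    # Depth-of-field layers needed to tile the detection distance.
    layers = math.ceil(detection_distance / camera_dof)
    # Second cameras per layer, from the second visual angle.
    per_layer = math.ceil((viewing_angle_deg / positions) / camera_fov_deg)
    return positions, layers, per_layer
```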
11. An eye tracking device, comprising:
the three-dimensional head position information determining module is used for calling a first camera to acquire a user image in a preset viewing area and determining three-dimensional head position information corresponding to each user in the preset viewing area according to the user image;
the target second camera determining module is used for determining a first preset number of target second cameras from the plurality of second cameras according to the three-dimensional head position information;
the two-dimensional binocular pupil position information determining module is used for calling each target second camera to acquire a face image of the user and determining the two-dimensional binocular pupil position information of the user according to each face image;
the three-dimensional binocular pupil position information determining module is used for determining the three-dimensional binocular pupil position information of the user according to at least two pieces of two-dimensional binocular pupil position information;
the three-dimensional binocular pupil position information determining module is specifically configured to: if at least two pieces of two-dimensional binocular pupil position information of the user at the current moment cannot be determined from the face images, determine a spiral search rule from the three-dimensional binocular pupil position information at historical moments; take the three-dimensional head position information determined from the user image as the three-dimensional head position information of the user at the current moment; adjust the three-dimensional head position information at the current moment according to the spiral search rule to obtain first three-dimensional head position information; determine at least two pieces of first two-dimensional binocular pupil position information of the user at the current moment from the first three-dimensional head position information; and determine the three-dimensional binocular pupil position information of the user from the at least two pieces of first two-dimensional binocular pupil position information.
12. An eye tracking system, the system comprising: the system comprises a first camera, a plurality of second cameras and a human eye tracking device; wherein the eye tracking apparatus is used to implement the eye tracking method of any one of claims 1-10.
13. The system of claim 12, wherein the first camera is a color camera or a 3D camera and the second camera is a black and white camera.
14. The system according to claim 12, wherein at least one first camera is disposed at each first layout position within a preset viewing area, and the total detection area covered by the first cameras together is the preset viewing area;
at least one second camera is disposed at each second layout position within the preset viewing area, and the total detection area covered by the second cameras together is the preset viewing area.
15. The system of claim 14, wherein an illuminating infrared light source is disposed at the center of each of the second layout positions.
16. The system according to any one of claims 12-15, wherein the depth of field of each first camera is greater than the preset detection distance corresponding to the preset viewing area.
17. The system according to claim 14, wherein at least one second camera group is disposed at each second layout position, the second camera group comprises at least two layers of second cameras, each layer consists of a plurality of second cameras having the same depth of field, and second cameras in different layers have different depths of field.
18. The system according to claim 14, wherein, at each second layout position, the minimum extent of the region where the depth-of-field ranges of two adjacent second cameras overlap is greater than the distance between the pupils of the two eyes.
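Claim 18's constraint is easy to state as a check: the overlap of two adjacent second cameras' depth-of-field ranges must be wider than the interpupillary distance, so both pupils always fall inside at least one camera's sharp range during a handoff between cameras. A sketch follows; the 65 mm default is a typical interpupillary distance, not a value from the patent.

```python
def dof_overlap_ok(near_a, far_a, near_b, far_b, ipd_mm=65.0):
    """Check the claim-18 handoff constraint for two adjacent second
    cameras with depth-of-field ranges [near_a, far_a] and
    [near_b, far_b]; all values must share one unit (here mm).
    """
    overlap = min(far_a, far_b) - max(near_a, near_b)
    return overlap > ipd_mm
```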
19. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
an input device for acquiring an image;
output means for displaying screen information;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the eye tracking method of any one of claims 1-10.
20. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the eye tracking method according to any one of claims 1 to 10.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910438457.3A CN110263657B (en) | 2019-05-24 | 2019-05-24 | Human eye tracking method, device, system, equipment and storage medium |
PCT/CN2019/106701 WO2020237921A1 (en) | 2019-05-24 | 2019-09-19 | Eye tracking method, apparatus and system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910438457.3A CN110263657B (en) | 2019-05-24 | 2019-05-24 | Human eye tracking method, device, system, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263657A CN110263657A (en) | 2019-09-20 |
CN110263657B true CN110263657B (en) | 2023-04-18 |
Family
ID=67915324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910438457.3A Active CN110263657B (en) | 2019-05-24 | 2019-05-24 | Human eye tracking method, device, system, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110263657B (en) |
WO (1) | WO2020237921A1 (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112929638B (en) * | 2019-12-05 | 2023-12-15 | 北京芯海视界三维科技有限公司 | Eye positioning method and device and multi-view naked eye 3D display method and device |
CN113132643B (en) * | 2019-12-30 | 2023-02-07 | Oppo广东移动通信有限公司 | Image processing method and related product |
CN113128243B (en) * | 2019-12-31 | 2024-07-26 | 苏州协尔智能光电有限公司 | Optical recognition system, optical recognition method and electronic equipment |
CN111158162B (en) * | 2020-01-06 | 2022-08-30 | 亿信科技发展有限公司 | Super multi-viewpoint three-dimensional display device and system |
CN113448428B (en) * | 2020-03-24 | 2023-04-25 | 中移(成都)信息通信科技有限公司 | Sight focal point prediction method, device, equipment and computer storage medium |
CN111586352A (en) * | 2020-04-26 | 2020-08-25 | 上海鹰觉科技有限公司 | Multi-photoelectric optimal adaptation joint scheduling system and method |
CN111881861B (en) * | 2020-07-31 | 2023-07-21 | 北京市商汤科技开发有限公司 | Display method, device, equipment and storage medium |
CN111935473B (en) * | 2020-08-17 | 2022-10-11 | 广东申义实业投资有限公司 | Rapid eye three-dimensional image collector and image collecting method thereof |
KR20220039113A (en) * | 2020-09-21 | 2022-03-29 | 삼성전자주식회사 | Method and apparatus for transmitting video content using edge computing service |
CN112417977B (en) * | 2020-10-26 | 2023-01-17 | 青岛聚好联科技有限公司 | Target object searching method and terminal |
CN112711982B (en) * | 2020-12-04 | 2024-07-09 | 科大讯飞股份有限公司 | Visual detection method, device, system and storage device |
CN112583980A (en) * | 2020-12-23 | 2021-03-30 | 重庆蓝岸通讯技术有限公司 | Intelligent terminal display angle adjusting method and system based on visual identification and intelligent terminal |
CN112804504B (en) * | 2020-12-31 | 2022-10-04 | 成都极米科技股份有限公司 | Image quality adjusting method, image quality adjusting device, projector and computer readable storage medium |
CN114697602B (en) * | 2020-12-31 | 2023-12-29 | 华为技术有限公司 | Conference device and conference system |
CN112799407A (en) * | 2021-01-13 | 2021-05-14 | 信阳师范学院 | Pedestrian navigation-oriented gaze direction estimation method |
CN113138664A (en) * | 2021-03-30 | 2021-07-20 | 青岛小鸟看看科技有限公司 | Eyeball tracking system and method based on light field perception |
CN113476037A (en) * | 2021-06-29 | 2021-10-08 | 京东方科技集团股份有限公司 | Sleep monitoring method based on child sleep system and terminal processor |
CN114449250A (en) * | 2022-01-30 | 2022-05-06 | 纵深视觉科技(南京)有限责任公司 | Method and device for determining viewing position of user relative to naked eye 3D display equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090196460A1 (en) * | 2008-01-17 | 2009-08-06 | Thomas Jakobs | Eye tracking system and method |
US8970452B2 (en) * | 2011-11-02 | 2015-03-03 | Google Inc. | Imaging method |
WO2016146486A1 (en) * | 2015-03-13 | 2016-09-22 | SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH | Method for operating an eye tracking device for multi-user eye tracking and eye tracking device |
2019
- 2019-05-24: CN application CN201910438457.3A, patent CN110263657B (status: active)
- 2019-09-19: WO application PCT/CN2019/106701, publication WO2020237921A1 (application filing)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101068342A (en) * | 2007-06-05 | 2007-11-07 | 西安理工大学 | Video frequency motion target close-up trace monitoring method based on double-camera head linkage structure |
CN103324284A (en) * | 2013-05-24 | 2013-09-25 | 重庆大学 | Mouse control method based on face and eye detection |
WO2016142489A1 (en) * | 2015-03-11 | 2016-09-15 | SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH | Eye tracking using a depth sensor |
CN105930821A (en) * | 2016-05-10 | 2016-09-07 | 上海青研信息技术有限公司 | Method for identifying and tracking human eye and apparatus for applying same to naked eye 3D display |
CN107609516A (en) * | 2017-09-13 | 2018-01-19 | 重庆爱威视科技有限公司 | Adaptive eye moves method for tracing |
CN109598253A (en) * | 2018-12-14 | 2019-04-09 | 北京工业大学 | Mankind's eye movement measuring method based on visible light source and camera |
CN109688403A (en) * | 2019-01-25 | 2019-04-26 | 广州杏雨信息科技有限公司 | One kind being applied to perform the operation indoor naked eye 3D human eye method for tracing and its equipment |
Non-Patent Citations (3)
Title |
---|
Multi-user eye tracking suitable for 3D display applications; Hopf K, et al.; 3DTV-CON; 2011-06-16; full text *
Research on collaborative eye-movement tracking technology and its interactive applications; Shen Xiaoquan; China Masters' Theses Full-text Database, Information Science and Technology; 2019-01-15; full text *
Human eye gaze estimation based on dark-pupil images; Zhang Taining, et al.; Acta Physica Sinica; 2013-07-08 (No. 13); Chapters 2 and 4 *
Also Published As
Publication number | Publication date |
---|---|
CN110263657A (en) | 2019-09-20 |
WO2020237921A1 (en) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263657B (en) | Human eye tracking method, device, system, equipment and storage medium | |
US10477149B2 (en) | Holographic video capture and telepresence system | |
EP3804301B1 (en) | Re-creation of virtual environment through a video call | |
EP3414742B1 (en) | Optimized object scanning using sensor fusion | |
RU2722495C1 (en) | Perception of multilayer augmented entertainment | |
CN108292489A (en) | Information processing unit and image generating method | |
KR101675567B1 (en) | Apparatus and system for acquiring panoramic images, method using it, computer program and computer readable recording medium for acquiring panoramic images | |
CN108139803A (en) | For the method and system calibrated automatically of dynamic display configuration | |
CN113272863A (en) | Depth prediction based on dual pixel images | |
CN108028887A (en) | Focusing method of taking pictures, device and the equipment of a kind of terminal | |
WO2016108720A1 (en) | Method and device for displaying three-dimensional objects | |
CN108885342A (en) | Wide Baseline Stereo for low latency rendering | |
KR20160094190A (en) | Apparatus and method for tracking an eye-gaze | |
KR20220044897A (en) | Wearable device, smart guide method and device, guide system, storage medium | |
US20200159339A1 (en) | Desktop spatial stereoscopic interaction system | |
JP5741353B2 (en) | Image processing system, image processing method, and image processing program | |
CN115393182A (en) | Image processing method, device, processor, terminal and storage medium | |
KR101788005B1 (en) | Method for generating multi-view image by using a plurality of mobile terminal | |
JP2024013947A (en) | Imaging device, imaging method of imaging device and program | |
CN114581514A (en) | Method for determining fixation point of eyes and electronic equipment | |
JP6855493B2 (en) | Holographic video capture and telepresence system | |
EP4439250A1 (en) | Gaze-driven autofocus camera for mixed-reality passthrough | |
JP2024017297A (en) | Imaging apparatus, wearable device, control method, program, and system | |
US20230185371A1 (en) | Electronic device | |
US20230188828A1 (en) | Electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40008454; Country of ref document: HK |
GR01 | Patent grant | ||