GB2574410A - Apparatus and method for eye-tracking based text input

Apparatus and method for eye-tracking based text input

Info

Publication number
GB2574410A
Authority
GB
United Kingdom
Prior art keywords
camera
scene
information
image stream
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1809138.9A
Other versions
GB201809138D0 (en)
Inventor
Jan Stefan Hamminga Derk
Beliaeva Marila
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robot Protos Ltd
Original Assignee
Robot Protos Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robot Protos Ltd filed Critical Robot Protos Ltd
Priority to GB1809138.9A
Publication of GB201809138D0
Publication of GB2574410A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/147 Details of sensors, e.g. sensor lenses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/0346 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 Indexing scheme relating to G06F3/01
    • G06F2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Abstract

Constructing a 3D scene from multiple camera image streams for the purpose of text input using eye-tracking and microexpression emotion analysis. Includes a primary camera and a secondary rotatable camera of higher angular resolution, together used to incrementally construct and refine a 3D scene of the objects in the primary camera view. The constructed scene is then used to perform eye gaze pattern tracking over a displayed keyboard and emotion classification by microexpression extraction, wherein the emotional information is used as a real-time feedback system for auto-correcting the text input. The 3D scene may be iteratively constructed based on corresponding features found between the primary camera image stream and the secondary camera image stream combined with the secondary camera rotation angle. Alternatively, the 3D scene may be constructed from features located in the image streams of secondary cameras, the location of the features determined using pixel coordinates, position, orientation, and optical properties of the cameras.

Description

Apparatus and method for eye-tracking based text input
Field and background of the Invention
The present application relates to a method and apparatus for tracking the gaze point of a user or information consumer (eye-tracking), in particular in a closed-loop, non-contact, and non-emitting manner, and to using said method and apparatus as a means of text input. Comparable methods for such text input are, for example, pressing keys on the touchscreen of a phone or using a voice recognition system.
Eye-tracking as a text input method offers great potential in situations where operation by hands or voice commands is either impossible or undesired. Examples are sterile environments where physical interaction with the phone is prohibited, or simply situations requiring unobtrusive communication. Current technical solutions, however, are either too large to fit in most computing devices, too unpleasant to use because they require helmets, glasses, or contact lenses to be worn, or too inaccurate to serve as a practical, everyday text input method.
Summary of the Invention
The invention proposes an apparatus and method offering substantial benefits over current solutions, combining a very compact physical form with a novel feedback system that overcomes the accuracy issues associated with small form-factor solutions. The apparatus comprises a primary camera module mounted fixedly, one or multiple secondary camera modules mounted to allow rotation around a hinge-point at the lens opening face, sensors to measure said rotation angles, and a processor to determine and control the secondary camera rotations.
The method comprises extracting visually distinct features from the primary camera image stream, locating one or more of said features in the image stream of one or more secondary cameras, and, when found, determining the location of each located feature in 3D space using the pixel coordinates, position, orientation, and optical properties of each camera. This process runs continuously to incrementally construct and refine a 3D scene.
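By way of illustration only, the following Python sketch shows one conventional way such a triangulation from pixel coordinates and camera geometry may be carried out; the pinhole camera model, the pan/tilt rotation convention, the placement of the primary camera at the origin, and all function and parameter names are illustrative assumptions rather than part of the claimed method:

    import numpy as np

    def pixel_to_ray(uv, focal_px, principal_point):
        # Unit viewing ray in the camera frame for a pixel, assuming a simple pinhole model.
        u, v = uv
        cx, cy = principal_point
        ray = np.array([(u - cx) / focal_px, (v - cy) / focal_px, 1.0])
        return ray / np.linalg.norm(ray)

    def rotation_yx(pan_rad, tilt_rad):
        # Assumed convention: the secondary camera pans about the y axis, then tilts about the x axis.
        cp, sp = np.cos(pan_rad), np.sin(pan_rad)
        ct, st = np.cos(tilt_rad), np.sin(tilt_rad)
        ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
        rx = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
        return rx @ ry

    def triangulate(uv_primary, uv_secondary, secondary_position, pan_rad, tilt_rad,
                    focal_primary, pp_primary, focal_secondary, pp_secondary):
        # Midpoint of closest approach between the two viewing rays; primary camera at the origin.
        d1 = pixel_to_ray(uv_primary, focal_primary, pp_primary)
        d2 = rotation_yx(pan_rad, tilt_rad) @ pixel_to_ray(uv_secondary, focal_secondary, pp_secondary)
        p1, p2 = np.zeros(3), np.asarray(secondary_position, dtype=float)
        w0 = p1 - p2
        b, d, e = d1 @ d2, d1 @ w0, d2 @ w0
        denom = 1.0 - b * b                        # near zero when the rays are almost parallel
        t, s = (b * e - d) / denom, (e - b * d) / denom
        closest1, closest2 = p1 + t * d1, p2 + s * d2
        gap = np.linalg.norm(closest1 - closest2)  # residual; usable as a crude certainty measure
        return (closest1 + closest2) / 2.0, gap

    # Example: both cameras see the feature at their image centres; the secondary camera sits
    # 10 cm to the right and is panned roughly -5.7 degrees towards the primary optical axis.
    point, gap = triangulate((640, 480), (640, 480), (0.10, 0.0, 0.0),
                             pan_rad=-0.0997, tilt_rad=0.0,
                             focal_primary=1000.0, pp_primary=(640, 480),
                             focal_secondary=2000.0, pp_secondary=(640, 480))
    print(point, gap)   # approximately (0, 0, 1): the feature lies about 1 m in front of the apparatus

In such a sketch, the residual gap between the two rays is one possible ingredient of the certainty score kept for each mapped point, as discussed in the detailed description below.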
The 3D scene is simultaneously and continuously scanned for known patterns. If a human face is found, the refinement process is directed to focus on the eye area, and a new process is started to continuously analyse the face for specific gaze patterns. Gaze patterns over a displayed keyboard are used to determine which words a user is spelling by looking at the sequence of letters. When a word is deemed complete, it is displayed to the user along with one or more less likely alternatives.
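Purely as an illustration of how a gazed letter sequence could be turned into ranked word candidates, the short Python sketch below scores a small example vocabulary against the gazed letters with a generic string-similarity measure; the vocabulary, the scoring function, and all names are assumptions of the sketch, not part of the claimed method:

    from difflib import SequenceMatcher

    def rank_candidates(gazed_letters, vocabulary, max_candidates=3):
        # Rank vocabulary words by similarity to the (possibly noisy) sequence of gazed letters.
        gazed = "".join(gazed_letters).lower()
        scored = sorted(
            vocabulary,
            key=lambda word: SequenceMatcher(None, gazed, word.lower()).ratio(),
            reverse=True,
        )
        return scored[:max_candidates]

    # Example: a noisy gaze trace produced "helko" while the user was spelling "hello".
    vocabulary = ["hello", "help", "hold", "halo", "yellow"]
    print(rank_candidates(list("helko"), vocabulary))   # "hello" is ranked first

In terms of the description above, the top-ranked candidate corresponds to the completed word displayed to the user, and the remaining entries to the less likely alternatives.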
The user's microexpression is captured and categorized as 'approval' or 'disapproval' at the moment of first gazing upon each displayed word, after which the best candidate word is selected for input. Alternatively, each completed word is immediately added to a text input field and the user's microexpression is captured upon first gaze on the added word; the word is replaced with the next best candidate each time the captured microexpression signifies disapproval.
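The selection step can be pictured as the small feedback loop sketched below; the classifier is represented by a caller-supplied function because the microexpression classification itself is not specified here, and all names are illustrative:

    def select_word(candidates, classify_reaction):
        # Walk the ranked candidates and keep the first word whose first-gaze microexpression
        # is classified as approval; otherwise fall back to the top-ranked candidate.
        for word in candidates:
            # classify_reaction stands in for the microexpression classifier that is
            # invoked at the moment the user first gazes upon the displayed word.
            if classify_reaction(word) == "approval":
                return word
        return candidates[0]

    # Example with a simulated classifier: the user disapproves of the first candidate only.
    reactions = iter(["disapproval", "approval"])
    print(select_word(["halo", "hello", "help"], lambda word: next(reactions)))   # -> hello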
Through the greatly enhanced speed of the feedback loop, compared to existing methods of input correction, a user will experience the correction as a natural and integral part of the input session. The user is continuously incentivized and trained to provide clear microexpressions, improving the system's effectiveness over a longer period of use.
Brief description of the drawings
Figure 1 shows an example of the apparatus, the fields-of-view of the two cameras, and the eyes gazing upon a keyboard key.
Detailed description
An apparatus according to claims 1-10 and a method according to claims 11-19 are further detailed below:
The invention comprises an apparatus with two or more cameras (figure 1), wherein one (primary) camera is fixedly mounted and serves as a fixed reference, featuring a relatively wide-angle lens to capture most of the target object in its field-of-view (5), comparable to what is customarily used in mobile phone front-facing cameras. At least one (secondary) camera, using a lens system characterized by a narrower field-of-view (7) and greater angular resolution, is mounted offset from the primary camera optical axis in such a way that a hinge point at the lens face allows it to image a significant part of the field-of-view of the primary camera by rotating (6). A series of electromagnetic (voice coil) actuators is mounted in the apparatus, at the image sensor side of the secondary camera, allowing exact control of the rotation of the camera unit.
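As a rough illustration of how the processor might translate a feature position in the primary image into a commanded secondary camera rotation, the sketch below computes first-estimate pan and tilt angles under a pinhole model; it neglects the baseline between the cameras, and the axis convention, focal length, and pixel values are assumptions of the sketch rather than properties of the apparatus:

    import math

    def aim_secondary_camera(uv, focal_px, principal_point):
        # Approximate pan/tilt angles (radians) that point the secondary camera's optical
        # axis towards the scene direction behind a primary-image pixel.  The small
        # baseline between the cameras is neglected, so this is only a first aiming estimate.
        u, v = uv
        cx, cy = principal_point
        dx, dy, dz = (u - cx) / focal_px, (v - cy) / focal_px, 1.0
        pan = math.atan2(dx, math.hypot(dy, dz))   # rotation about the vertical axis
        tilt = math.atan2(-dy, dz)                 # rotation about the horizontal axis
        return pan, tilt

    # Example: a feature detected to the right of and below the primary image centre.
    pan, tilt = aim_secondary_camera((760, 620), focal_px=1000.0, principal_point=(640, 480))
    print(math.degrees(pan), math.degrees(tilt))   # roughly 6.8 and -8.0 degrees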
A continuous process analyses the respective image streams from each camera: a heuristic method, such as a neural network, is used to find distinctive image features in both image streams that correspond to the same physical object. If no match can be found, the secondary camera is rotated to the next best orientation; if a match is found, the pixel coordinates of the matching feature in each image are combined with the secondary camera rotation to perform a triangulation in 3-dimensional (3D) space. The determined geometry points are mapped to a 3D scene along with information on colour and time; optionally, the secondary camera rotation and a certainty score of the triangulation are kept and associated with the mapped points.
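One possible in-memory representation of such a mapped scene, keeping the colour, time, rotation, and certainty metadata mentioned above, is sketched below; the field names, the merge radius, and the keep-the-more-certain-point rule are illustrative choices, not part of the described method:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ScenePoint:
        # One mapped geometry point together with the metadata mentioned above.
        position: Tuple[float, float, float]                    # metres, in the primary camera frame
        colour: Tuple[int, int, int]
        timestamp: float
        secondary_rotation: Tuple[float, float] = (0.0, 0.0)    # pan, tilt in radians
        certainty: float = 0.0

    @dataclass
    class Scene:
        points: List[ScenePoint] = field(default_factory=list)
        merge_radius: float = 0.005   # metres; tolerance for treating observations as the same feature

        def add(self, new: ScenePoint) -> None:
            # Keep the more certain of two nearby observations; otherwise append a new point.
            for i, existing in enumerate(self.points):
                dist = sum((a - b) ** 2 for a, b in zip(existing.position, new.position)) ** 0.5
                if dist < self.merge_radius:
                    if new.certainty > existing.certainty:
                        self.points[i] = new
                    return
            self.points.append(new)

    # Example: a second, more certain observation of the same feature replaces the first.
    scene = Scene()
    scene.add(ScenePoint((0.0, 0.0, 1.0), (200, 180, 170), timestamp=0.0, certainty=0.4))
    scene.add(ScenePoint((0.001, 0.0, 1.0), (205, 182, 171), timestamp=0.1, certainty=0.9))
    print(len(scene.points))   # 1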
A simultaneous process continuously analyses the 3D scene for known objects, i.e. objects of fixed size, for example part of the apparatus itself, a standardized object such as a coin, or a specific object whose size has previously been established. Found objects are then used to assess and improve the overall scene accuracy.
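A minimal sketch of such a correction, assuming that a single uniform scale factor derived from one recognised object is sufficient (the object sizes and coordinates below are invented for the example):

    def rescale_scene(points, measured_size, known_size):
        # Rescale triangulated points so that a recognised object of known physical size
        # (for example a coin) takes on its correct dimensions in the reconstructed scene.
        scale = known_size / measured_size
        return [tuple(scale * coordinate for coordinate in point) for point in points]

    # Example: an object known to be 22.5 mm across was reconstructed as 25.0 mm (units: metres).
    points = [(0.10, 0.02, 0.50), (0.12, 0.03, 0.52)]
    print(rescale_scene(points, measured_size=0.0250, known_size=0.0225))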
Once a human face has been detected, the eyes (4) are continuously monitored to determine the pattern of gazing on keys (2) displayed on a screen (1).
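For illustration, the sketch below maps a gaze ray onto a key by intersecting it with the screen plane; placing the screen in the plane z = 0, the key rectangles, and the example coordinates are all assumptions of the sketch rather than part of the described method:

    def gaze_to_key(eye_position, gaze_direction, key_rects):
        # Intersect the gaze ray with the screen plane z = 0 and return the key, if any,
        # whose rectangle contains the intersection point.
        ex, ey, ez = eye_position
        dx, dy, dz = gaze_direction
        if abs(dz) < 1e-9:
            return None                      # gaze is parallel to the screen plane
        t = -ez / dz
        if t <= 0:
            return None                      # the screen lies behind the eye
        x, y = ex + t * dx, ey + t * dy
        for key, (x0, y0, x1, y1) in key_rects.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return key
        return None

    # Example with two illustrative 18 mm wide keys near the screen origin (units: metres).
    keys = {"q": (0.000, 0.000, 0.018, 0.018), "w": (0.019, 0.000, 0.037, 0.018)}
    print(gaze_to_key((0.02, 0.05, 0.30), (0.01, -0.05, -0.30), keys))   # -> w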
Microexpressions, facial expressions generally lasting less than half a second, are used to separate the person's instant reaction to a displayed word from the user's general emotional state and mood changes, such as reactions to the environment, the text to input, or any other factor. A user may, during an input session, see a word incorrectly interpreted by the method and show an instinctive and direct reaction before reverting to the user's overall mood.
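A minimal sketch of how such a filter could separate short-lived reactions from the prevailing mood, assuming an expression label is already available for each sampled video frame (the labels, the half-second threshold passed as a parameter, and the dominant-label baseline rule are illustrative assumptions):

    from itertools import groupby

    def extract_microexpressions(samples, max_duration=0.5):
        # samples: list of (timestamp_seconds, expression_label) in time order.
        # Returns the short-lived runs that differ from the dominant baseline label,
        # the baseline being taken as the user's prevailing mood.
        if not samples:
            return []
        runs = []
        for label, group in groupby(samples, key=lambda s: s[1]):
            group = list(group)
            runs.append((label, group[0][0], group[-1][0]))
        totals = {}
        for label, start, end in runs:
            totals[label] = totals.get(label, 0.0) + (end - start)
        baseline = max(totals, key=totals.get)
        return [(label, start) for label, start, end in runs
                if label != baseline and (end - start) < max_duration]

    # Example: a brief flash of disapproval inside an otherwise neutral input session.
    samples = [(0.00, "neutral"), (0.50, "neutral"), (1.00, "neutral"),
               (1.10, "disapproval"), (1.30, "disapproval"),
               (1.60, "neutral"), (2.50, "neutral")]
    print(extract_microexpressions(samples))   # -> [('disapproval', 1.1)]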
Combining microexpressions with tracking a user's gaze pattern is also advantageous in situations where there is a need to understand exactly if and how a user reacts to a display of information. For example, the designer of a railway information system will want to understand which displayed information is consumed first and which information is ineffective.

Claims (19)

What is claimed is:
1. An apparatus comprising: a primary camera module mounted fixedly relative to the apparatus; one or multiple secondary camera modules mounted to allow rotation around a first axis perpendicular to the primary camera optical axis and a second axis of which the direction vector is perpendicular to both the primary camera optical axis and the first rotation axis; one or more actuators to rotate a secondary camera around said axes; sensors to measure the secondary camera rotation angle for each of said rotation axes; a processor coupled to all actuators and sensors used to control secondary camera rotation; said processor iteratively constructing and locally refining a 3-dimensional (3D) scene, based on corresponding features found between the primary camera image stream and any secondary camera image stream, combined with the corresponding secondary camera rotation angle.
2. The apparatus of claim 1, wherein: the primary camera has a wider angle of view than the secondary cameras; secondary cameras each have greater angular resolution than the primary camera.
3. The apparatus of claim 1, wherein: the lens opening side of a secondary camera module is mounted in a flexible medium which functions as a hinge for rotation of said camera module; conductors for powering and signalling between the apparatus and said camera module are integrated in said flexible medium.
4. The apparatus of claim 1, wherein at least one of the cameras enables measuring the distance between camera and subject by determining the elapsed time between the apparatus emitting either a series of photons, a pattern of photons, or both and said camera(s) identifying the reflected photons.
5. The apparatus of claim 1, further comprising inertia and orientation measurement sensors to account for and compensate for orientation changes of the apparatus with respect to the 3D scene.
6. The apparatus of claim 1, wherein the 3D scene constructed by the apparatus is used to analyse eye movements and gaze points.
7. The apparatus of claim 6, wherein a geometrical representation of a known display of text or graphical information is part of the 3D scene and the eye movements and gaze points are used to determine which displayed information a person is gazing upon at any specific time.
8. The apparatus of claim 7, wherein said display comprises a keyboard layout to allow a person to input text based on said gazing information.
9. The apparatus as in any of claims 1-8, wherein: the display is part of the apparatus; the apparatus is a mobile device, such as, but not limited to, a phone, tablet, or laptop.
10. The apparatus as in any of claims 1-8, wherein the apparatus is mounted in a stationary manner in or near a display of textual information, such as, but not limited to, advertising billboards or public announcement screens.
11. A method comprising: extracting visually distinct features from a primary camera image stream; locating one or more of said features in the image stream of one or more secondary cameras; determining the location of each said located feature in 3D space using pixel coordinates, position, orientation, and optical properties of each camera; incrementally constructing and refining a 3D scene from said located features.
12. The method of claim 11, further comprising a database of predefined 3D geometrical patterns, such as, but not limited to, facial geometry, to generate visual features of interest that may appear in the primary camera image stream.
13. The method of claim 11, wherein an image stream of higher angular resolution and narrower angle of view can be oriented to enhance detail in specific areas of the 3D scene.
14. The method of claim 11, wherein the 3D scene comprises at least one human face and the 3D scene is used to determine: gaze points, gaze durations, and motion patterns between said gaze points of each face present in the scene.
15. The method of claim 14, wherein: the 3D scene is used to determine microexpressions; said microexpressions are used to determine a category of emotion, such as, but not limited to, approval or disapproval; said microexpressions and emotional information are coupled to eye gazing patterns.
16. The method of claim 14, wherein a keyboard is displayed to a user and said user's pattern of sequentially gazing at letters, spelling a word, is converted to words to be used as textual input.
17. The method as in any of claims 14-15, wherein the gathered information is used to analyse consumption of information on a display, such as, but not limited to: determining which information on a billboard is most effective in catching attention; or establishing comparison information for multiple information display locations.
18. The method according to claims 15 and 16, wherein: the words best fitting a gaze pattern are presented to the user on a display device; for each said word, at the moment the user gazes on it, the user's microexpression is classified as approving or disapproving; the best word candidate is selected based upon said classification.
19. The method according to claims 15 and 16, wherein: the words best fitting a gaze pattern are continuously written to an input field, presented to the user on a display device; for each of said written words a certainty score is kept; for each said word a list of alternative choices is kept; if the user gazes on a specific word in said input field, the user's microexpression is classified as approving or disapproving; if disapproval is detected, the disapproved word is swapped out for the next best candidate.
GB1809138.9A 2018-06-04 2018-06-04 Apparatus and method for eye-tracking based text input Withdrawn GB2574410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1809138.9A GB2574410A (en) 2018-06-04 2018-06-04 Apparatus and method for eye-tracking based text input

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1809138.9A GB2574410A (en) 2018-06-04 2018-06-04 Apparatus and method for eye-tracking based text input

Publications (2)

Publication Number Publication Date
GB201809138D0 GB201809138D0 (en) 2018-07-18
GB2574410A true GB2574410A (en) 2019-12-11

Family

ID=62872830

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1809138.9A Withdrawn GB2574410A (en) 2018-06-04 2018-06-04 Apparatus and method for eye-tracking based text input

Country Status (1)

Country Link
GB (1) GB2574410A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140098198A1 (en) * 2012-10-09 2014-04-10 Electronics And Telecommunications Research Institute Apparatus and method for eye tracking
US20160210503A1 (en) * 2011-07-14 2016-07-21 The Research Foundation For The State University Of New York Real time eye tracking for human computer interaction

Also Published As

Publication number Publication date
GB201809138D0 (en) 2018-07-18

Similar Documents

Publication Publication Date Title
CN110647237B (en) Gesture-based content sharing in an artificial reality environment
US10132633B2 (en) User controlled real object disappearance in a mixed reality display
EP2978218B1 (en) Computer display device mounted on eyeglasses
CN103561635B (en) Sight line tracking system
US20150379770A1 (en) Digital action in response to object interaction
US20130083063A1 (en) Service Provision Using Personal Audio/Visual System
KR101563312B1 (en) System for gaze-based providing education content
Mehrubeoglu et al. Real-time eye tracking using a smart camera
US20110213664A1 (en) Local advertising content on an interactive head-mounted eyepiece
KR20130000401A (en) Local advertising content on an interactive head-mounted eyepiece
US9442571B2 (en) Control method for generating control instruction based on motion parameter of hand and electronic device using the control method
KR20190030140A (en) Method for eye-tracking and user terminal for executing the same
US11567569B2 (en) Object selection based on eye tracking in wearable device
WO2012119371A1 (en) User interaction system and method
WO2021073743A1 (en) Determining user input based on hand gestures and eye tracking
Lander et al. hEYEbrid: A hybrid approach for mobile calibration-free gaze estimation
EP3991013A1 (en) Method, computer program and head-mounted device for triggering an action, method and computer program for a computing device and computing device
KR20190113252A (en) Method for eye-tracking and terminal for executing the same
WO2018122709A1 (en) Wearable augmented reality eyeglass communication device including mobile phone and mobile computing via virtual touch screen gesture control and neuron command
CN103713387A (en) Electronic device and acquisition method
US10558951B2 (en) Method and arrangement for generating event data
GB2574410A (en) Apparatus and method for eye-tracking based text input
KR20200079748A (en) Virtual reality education system and method for language training of disabled person
AlKassim et al. Sixth sense technology: Comparisons and future predictions
Bilal et al. Design a Real-Time Eye Tracker

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)