EP1050010A1 - Voice-output reading system with gesture-based navigation - Google Patents

Voice-output reading system with gesture-based navigation

Info

Publication number
EP1050010A1
EP1050010A1 EP98953891A EP98953891A EP1050010A1 EP 1050010 A1 EP1050010 A1 EP 1050010A1 EP 98953891 A EP98953891 A EP 98953891A EP 98953891 A EP98953891 A EP 98953891A EP 1050010 A1 EP1050010 A1 EP 1050010A1
Authority
EP
European Patent Office
Prior art keywords
text
user
finger
image
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98953891A
Other languages
German (de)
English (en)
Inventor
James T. Sears
David A. Goldberg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ascent Technology Inc
Original Assignee
Ascent Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/016 Input arrangements with force or tactile feedback as computer generated output to the user
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/014 Hand-worn input/output arrangements, e.g. data gloves
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 Detection arrangements using opto-electronic means
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/041 Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/042 Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means
    • G06F3/0425 Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means using a single imaging device like a video camera for tracking the absolute position of a single or a plurality of objects with respect to an imaged reference surface, e.g. video camera imaging a display or a projection screen, a table or a wall surface, on which a computer generated image is displayed or projected
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001 Teaching or communicating with blind persons
    • G09B21/006 Teaching or communicating with blind persons using audible presentation of the information
    • G09B21/007 Teaching or communicating with blind persons using both tactile and audible presentation of the information
    • G09B21/008 Teaching or communicating with blind persons using visual presentation of the information for the partially sighted
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/033 Indexing scheme relating to G06F3/033
    • G06F2203/0331 Finger worn pointing device

Definitions

  • the present invention relates to an electronic reading system for converting text to synthesized speech that may be used by low-vision and blind people, as well as others who have difficulty reading printed text, and more particularly relates to an electronic reading system that includes improved functionality for allowing the user to navigate within the text
  • OCR optical character recognition
  • magnifying systems generally consist of an electronic video capture system (usually with a CCD camera) connected to a video display.
  • the book to be read is placed on a mechanical tracking mechanism beneath the video capture system, which assists the user in moving the book horizontally so as to keep the current line of text within the field of view of the camera.
  • Means are generally provided to the user to adjust the contrast of the image, invert the colors of the image, and adjust the focus through manual controls on the face of the magnifying systems.
  • Because people with residual vision feel empowered using their remaining vision, and because they can use the magnifying systems to see information that is outside the scope of reading machines (e.g. seeing graphics on a page), and because they are generally less expensive than electronic reading machines, magnifying systems currently enjoy a far larger market than electronic reading machines. There are a large number of such magnifying systems currently available, including ones from Telesensory of Mountain View, CA, Magnisight of Colorado Springs, CO, and Optelec of Westford, MA. However, conventional magnifying systems suffer from a number of problems.
  • the mechanisms for tracking lines of text are often difficult to use, since they are manually guided mechanical systems that require relatively precise and steady hand movements to guide the movement.
  • the present invention is directed to a method for electronically reading text under interactive control by a user
  • the method includes obtaining a digital image that includes text to be read, performing symbology recognition on the digital image, determining a command signal from a sequence of user-generated spatial configurations of at least one pointer, choosing a subset of the recognized symbology to
  • the present invention is also directed to an electronic reading apparatus for converting text to spoken words for a user
  • the apparatus includes a digital imaging device that converts text to a digital imaging signal, and a character recognizer receptive of the digital imaging signal, the recognizer generating a recognized character signal comprising the symbolic identity of the recognized text and the location of the recognized text relative to the digital imaging signal
  • the apparatus also includes a pointer that is operated by the user to indicate commands, wherein commands are encoded in the location and movement of the pointer, and a pointer tracker receptive of the pointer location and movement, the tracker generating a pointer location and movement signal
  • the apparatus further includes a command interpreter receptive of the pointer location and movement signal and the recognized character signal, the interpreter generating a command signal, and a controller receptive of the command signal and the recognized character signal, the controller generating an output signal representative of at least portions of the text recognized
  • the apparatus includes a transducer receptive of the output signal for converting the output signal to a humanly perceptible form
  • Fig 1a is a perspective view of a device incorporating the first embodiment of the present invention
  • Fig 1b is a perspective view from below of the camera mount depicted in Fig 1a
  • Fig 2 is a flow diagram of the steps of information processing of the device of Fig 1a
  • Fig 3 is a perspective view of a device incorporating the second embodiment of the present invention
  • Fig 4 is a perspective view of a device incorporating the third embodiment of the present invention
  • Fig 5a is a side view of a device incorporating the fourth embodiment of the present invention
  • Fig 5b is a side view of the device of Fig 5a with the finger in a different configuration
  • Fig 5c is a front view of the device of Fig 5a
  • Fig 5d is a side view of a variation of the device of Fig 5a with a cut-away view of the lens system
  • Fig 6 is a flow diagram of the steps of pointer tracking as used in the flow diagram of Fig 2
  • Fig 1a is a perspective diagram of the first preferred embodiment of the present invention
  • the electronic reading machine 29 is mounted on top of a video monitor 31 with the field of view onto the surface below on which printed material 33 is placed
  • the printed material 33 can be text in a variety of formats on a variety of substrates, including books, magazines, newspapers, food packaging, medicine bottles, bus schedules, utility bills, or CD-ROM labels
  • the electronic reading machine 29 comprises a main system 35, from which a camera mount
  • the camera mount 37 comprises one or more electronic imaging devices (such as CCD or CMOS cameras)
  • a view of the camera mount 37 from the underside is shown in Fig 1b, a perspective diagram
  • a camera 39, which may comprise a CCD or CMOS imaging sensor 41 along with an attached lens 43, is angled away from the main system 35, so that it is directed towards the printed material 33
  • the camera mount 37 may incorporate one or more illumination sources, so as to provide constant illumination over the field of view. In Fig 1b, such illumination is provided by two rows of illumination sources 45 along the lateral edges of the mount 37
  • These illumination sources 45 may comprise rows of LEDs, thin fluorescent sources (such as T1 lamps often used as illumination for backlit displays on portable computers), or may be other sources including incandescent sources
  • these illumination sources 45 may be combined with reflectors behind the source and may also be optionally combined with focusing lenses, which may comprise Fresnel optics or lenses to provide relatively even illumination on the surface of the printed material 33
  • diffusing means may be optionally included, in order to provide for even illumination on the paper
  • illumination sources need not be in rows, as shown in Fig 1b, but may also comprise point sources or sources located in varied arrangements around the camera 39. In general, it is convenient to juxtapose the illumination source and camera so that any shadows thus formed by the illumination source will be minimized or absent in the image formed by the camera assembly
  • the image or images obtained by the camera 39 are transmitted to an electronic computing device located within the main system 35
  • the device may comprise either a general-purpose personal computer, or an embedded computer optimized for use in the reading system
  • the computing device processes the images in order to optimize the contrast and brightness of the image, and then further processes the image in order to extract textual information (e.g. by optical character recognition (OCR)) or to interpret graphical information
  • Fig 2 is a flow diagram that depicts the use of the system described in Figs 1a and 1b for reading text on the printed material 33
  • the user places printed information into the field of view of the camera assembly, comprising the image sensor 41 and lens 43. During an image capture step 51, the image is read by the image sensor 41, and is then converted to a digital signal and processed during video digitizing 53
  • the output digital image, consisting of a two-dimensional array of pixel values (generally either 8-bit gray-scale or 24-bit color), is then sent to a digital computer where the image is analyzed in at least two modes. In the first mode, the image is converted into its text representation in an optical character recognition step 55, whereas in the second mode, the image is analyzed for the presence, orientation and movement of a pointer object (e.g. a finger 34) which is under the influence of the user and which is located on top of the printed material 33, in a pointer tracking step 57
  • the pointer that is being tracked in the tracking step 57 may alternatively comprise an object attached to a finger or hand, such as a colored dot or a blinking light, or may be an object held by the user, such as a wooden, plastic or metal rod, which may have passive or active markings to make it more easily tracked
  • the output of optical character recognition 55 and pointer tracking 57 is both a text representation of the printed material 33 and an indication, from the pointer tracker 57, of the text to be read
  • the user indicates the text to be read through pointer gestures, which might include presenting his finger 34 in a particular orientation, forming a distinctive shape with two or more fingers 34, waving his finger 34 back and forth, or tapping his finger 34 at a location. During pointer tracking 57, the movements of the pointer are interpreted, and the text that is indicated to be read is determined. This text to be read is converted to speech during speech synthesis 63. In general, there will be a prior or concurrent step of speech rate adjustment 61.
  • pointer tracking 57 also supplies input to a step of feedback generation 65 through a step of feedback transduction 69, which is used to indicate to the user information other than the vocalized text on the page supplied through the steps of text selection 59, speech rate adjustment 61 and speech synthesis 63
  • sounds could be used to indicate whether the printed material 33 was oriented properly, whether the paper 33 needed to be moved in order to place additional text within the field of view of the image sensor 41, or the manner in which the pointer 34 is aligned with respect to existing text (e.g. whether it is pointing at text or not)
  • image enhancement 73 can improve image readability using analog or digital enhancement techniques such as increasing contrast, changing the image brightness, emphasizing edges, or inverting color polarity
  • This image may be combined in a step of video mixing 67 with an overlay of feedback information which could include placing a box around the text currently being vocalized
  • the combined signals are then presented to the user in a step of video display 71
  • Detailed Description of the First Preferred Embodiment
  • the step of image capture 51 can involve either color or black and white images
  • the advantage of color images is balanced by the higher data throughput required to transmit the image to the computing device present within the main system 35
  • Either CMOS or CCD sensors may be used for the image sensor 41, and are selected on the basis of cost, pixel density, noise and other variables
  • the image sensor may communicate with the main system 35 computer through various means, including parallel, universal serial bus (USB), IEEE 1394, 16-bit (PCMCIA), or 32-bit (CardBus) connections
  • the choice of communications interface is made on the basis of cost, throughput, and direct memory access (DMA) capabilities
  • the main system 35 computer should be of sufficient power to perform the remaining steps of the process
  • any Intel Pentium or compatible chip of 150 MHz speed will be sufficient, although a faster speed will provide improved results
  • other non-Intel processors, such as those that are used in Windows CE systems, will suffice if they are of a similar performance
  • Windows 98 and Windows NT 4.0 operating systems are suitable for system operation
  • Windows CE is also suitable, if support programs for functions such as optical character recognition and speech synthesis are available
  • the computer of the main system 35 may be part of a separate system, such as an office or home desktop computer
  • a general purpose computer greatly reduces the cost of a system of the present invention
  • the main computing functions of the desktop computer processor, power supply, motherboard functions, etc
  • input from microphones and output from speakers and video displays integrated with the computer can be used.
  • the number of pixels to be obtained during image capture 51 is determined by the size of the area to be read, and the requirements of the optical character recognition (OCR) program
  • the higher the pixel density, the better the accuracy of the OCR. It is preferred to have a pixel density of 125 pixels per inch (dpi), which is slightly less than most facsimile (FAX) machines, although pixel densities of 300 dpi or better provide even better OCR accuracy
  • the image sensor 41 must have a sufficient number of pixels, and the optics of the lens 43 must allow a small FOV at short operating distances
  • the DVC-323 digital camera from Kodak (Rochester, NY) has minimal but sufficient operating characteristics for the present invention
  • the camera operates in "still" mode, capturing images of 640 by 480 pixels with a "macro" image size of 4.7 by 3.5 inches, translating to about 140 dpi with the standard lens
  • the camera transfers the image to the host computer via a USB connection
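As a rough check on the figures quoted for this camera, pixel density is the pixel count divided by the imaged length along each axis. The sketch below is illustrative only; the function name `pixel_density` is an assumption and does not come from the source.

```python
def pixel_density(pixels: int, inches: float) -> float:
    """Return the pixel density (pixels per inch) along one axis."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


# 640 x 480 pixels spread over a 4.7 x 3.5 inch macro field of view
horizontal = pixel_density(640, 4.7)
vertical = pixel_density(480, 3.5)
print(f"{horizontal:.0f} dpi horizontal, {vertical:.0f} dpi vertical")
```

Both axes come out at roughly 136 to 137 dpi, which agrees with the "about 140 dpi" figure and sits above the preferred 125 dpi.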
  • Video digitizing 53 includes analog-to-digital conversion, if it is not an integral part of the image sensor 41 (many CMOS sensors include integral analog-to-digital converters). Once the image is transferred to the main system 35, it can be digitally manipulated to make the input more appropriate for subsequent interpretation. For example, the signal may be converted from a color image to a gray-scale or binarized black-and-white image, since many OCR programs operate most effectively on such images. In addition, the image may be gain adjusted, despeckled, and otherwise manipulated to improve the image for subsequent processing.
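The color-to-gray-scale and binarization steps mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of the source: the luma weights (the common ITU-R BT.601 coefficients), the fixed threshold of 128, and the function names are all assumptions, and images are represented as plain nested lists.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 8-bit gray-scale values.

    The BT.601 luma weights are an assumed choice; the source does not
    specify a conversion formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map each gray value to 255 (white) or 0 (black).

    The threshold is a hypothetical default; a real system would likely
    derive it from the image.
    """
    return [[255 if p >= threshold else 0 for p in row] for row in gray_image]


page = [[(255, 255, 255), (20, 20, 20)], [(200, 190, 180), (0, 0, 0)]]
print(binarize(to_gray(page)))
```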
  • the optical character recognition step 55 is carried out in the main system 35 using standard OCR algorithms, such as those employed by the Tiger program of Cognitive Technology of Corte Madera, CA. These programs not only convert the image to its text representation, but also identify the location of particular letters, the font sizes and styles used, and basic text formatting such as indenting and paragraph margins
  • the pointer tracking step 57 operates using commonly used tracking algorithms. While many pointers may be used, it is most convenient for the pointer object to be part of the user's hand, since it is always available, it is easily placed in the vicinity of the printed material 33, and fingers and hands are naturally used to point at objects, and have ranges of both large scale and small scale motion appropriate for that task. More specifically, for purposes of this description, the use of one or more fingers of the user's hand will be used as illustration of pointer tracking and basic gesture-based navigational commands, as shown using the finger 34 of Fig 1
  • FIG. 6 is a flow diagram of the steps of an alternative method of pointer tracking 57, in this case for tracking a finger
  • the input to a step of edge detection 161 is the digitized video image from video digitizing 53
  • Edge detection finds large positional changes in pixel value, which may be performed by convolving the image using multipoint edge enhancement operators, or by simpler arithmetic manipulation of adjacent pixels
  • This edge enhanced image is then subtracted from a similarly edge enhanced image of the sheet without the finger, taken before the finger is placed into the field of view, in a step of image subtraction 163
  • This image should have small amounts of noise due to changes in illumination and movement of the printed material 33 that occurs between the time that the two images were taken. Therefore, noise, determined by both the magnitude of the residual pixel information, as well as its degree of localization, is removed in a thresholding and signal extraction step 165
  • the continuous values present until this point are converted into binary (black versus white) values through thresholding. Individual pixels are now grouped together into
  • an edge thinning step 169 looks for such parallel and closely spaced lines, and resolves them into a single line, generally at the midpoint
  • the image has been reduced to lines representing the current position of the pointer, and in a step 177, these lines can be compared with biometric information 177 which indicates norms for finger length, width, and the like. From these comparisons, finger position and orientation can be established
  • the current finger information is stored in a finger database 175 sorted on the basis of time
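The first stages of the Fig 6 flow (edge detection 161, image subtraction 163, thresholding 165) can be sketched as follows. This is a simplified illustration under stated assumptions: edges are found by the "simpler arithmetic manipulation of adjacent pixels" option, images are nested lists of gray values, and the function names and the threshold level of 50 are hypothetical. Edge thinning and the biometric comparison are not shown.

```python
def edge_enhance(img):
    """Mark large changes between each pixel and its left/upper neighbor."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(img[y][x] - img[y][x - 1]) if x > 0 else 0
            dy = abs(img[y][x] - img[y - 1][x]) if y > 0 else 0
            out[y][x] = max(dx, dy)
    return out


def subtract(a, b):
    """Absolute per-pixel difference of two equally sized images."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def threshold(img, level):
    """Convert continuous values to binary, dropping small residual noise."""
    return [[1 if p > level else 0 for p in row] for row in img]


def finger_edges(frame, background, level=50):
    """Edges present in the current frame but not in the finger-free page."""
    diff = subtract(edge_enhance(frame), edge_enhance(background))
    return threshold(diff, level)


background = [[200] * 6 for _ in range(4)]                   # blank page
frame = [[200, 200, 60, 60, 200, 200] for _ in range(4)]     # dark finger
for row in finger_edges(frame, background):
    print(row)
```

In this toy input the finger appears as two parallel vertical edges, which is the situation the edge thinning step 169 would then resolve into a single line.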
  • the index finger 34 may be inserted to varying degrees within the field of view of the image sensor 41
  • it should be roughly between 12 and 25 mm in width
  • two fingers 34 should be between 30 and 50 mm in width (it should be noted that these width ranges do not overlap)
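Because the two width ranges do not overlap, a measured pointer width can be mapped directly to a finger count. A minimal sketch, assuming the width has already been converted to millimeters; the function name is hypothetical.

```python
def fingers_from_width(width_mm):
    """Return 1 or 2 for a plausible finger width, or None if implausible.

    The ranges are the rough norms quoted in the text: 12-25 mm for one
    finger and 30-50 mm for two fingers side by side.
    """
    if 12 <= width_mm <= 25:
        return 1
    if 30 <= width_mm <= 50:
        return 2
    return None


for width in (18, 27, 40, 80):
    print(width, fingers_from_width(width))
```

Widths that fall in the gap between the ranges, or outside both, are reported as `None` so that a caller can treat them as noise or as some other object.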
  • the current finger information is then compared with past finger position and orientation in a finger motion detection step 173, in order to determine the motion of the finger over time. For example, if the finger travels first in one direction and then the other direction over a period of one-half a second, a wagging motion of 2 hertz would be returned. If a color camera 39 is employed, the finger 34 could be identified on the basis of its color in distinction with the color of the background printed material 33. This would still require an initial detection of the finger in order to determine the skin color for later use.
  • the pointer tracking 57 could look for colors with the known hue of the finger, and use this to determine the location of the finger 34. It should be appreciated that there are many algorithms that may be employed for the detection of the presence, location, orientation and movement of the finger 34.
  • the algorithm of Fig 6 is only an indication of a method that will provide the necessary information
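The wagging example given for the finger motion detection step 173 (one direction, then the other, within half a second, reported as 2 hertz) can be sketched from time-stamped finger positions such as those held in the finger database 175. The sample format, the function name and the reversal-counting approach are assumptions made for illustration.

```python
def wag_frequency(samples):
    """Estimate wagging frequency in hertz from (time_s, x_mm) samples.

    One cycle is a movement in one direction followed by a movement in
    the other. Returns 0.0 when the finger never reverses direction.
    """
    if len(samples) < 3:
        return 0.0
    directions = []
    for (_, x0), (_, x1) in zip(samples, samples[1:]):
        if x1 != x0:
            directions.append(1 if x1 > x0 else -1)
    reversals = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    duration = samples[-1][0] - samples[0][0]
    if reversals == 0 or duration <= 0:
        return 0.0
    half_cycles = reversals + 1
    return (half_cycles / 2) / duration


# Finger moves right for 0.25 s, then back left for 0.25 s
track = [(0.0, 0), (0.125, 10), (0.25, 20), (0.375, 10), (0.5, 0)]
print(wag_frequency(track))
```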
  • Other algorithms may be more accurate or consume less computing resources or have other advantages over the method given. Tapping motions by fingers 34 can be readily detected by a variety of means
  • the apparent width of the finger 34 slightly increases as it is raised, and then decreases as it is lowered. In a subtraction of successive images, this is seen as an outline difference of the finger 34, especially since the finger 34 will not be moving in general directly in the direction of the image sensor 41
  • the finger 34 is
  • the finger 34 locator defines a "reading window" comprising text that is contextually related. For instance, text within a paragraph is more closely related than text in a prior or succeeding paragraph. Text in the same column generally has (except
  • the text selector 59 determines that text to be immediately read, and is linked to text to be successively read.
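Since the recognizer reports where each piece of text lies in the image and the pointer tracker reports where the finger is, text selection can be pictured as a lookup that combines the two. The sketch below picks the line of words nearest above the finger. The `Word` record, the pixel coordinate convention (y increasing downward), the `line_tolerance` parameter and the function name are all assumptions and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word, in image pixels
    y: float  # baseline of the word, in image pixels (grows downward)


def line_above(words, finger_y, line_tolerance=2.0):
    """Return the words of the nearest text line at or above the finger."""
    above = [w for w in words if w.y <= finger_y]
    if not above:
        return []
    nearest_y = max(w.y for w in above)
    line = [w for w in above if nearest_y - w.y <= line_tolerance]
    return [w.text for w in sorted(line, key=lambda w: w.x)]


page = [
    Word("cat", 40, 10), Word("The", 5, 10),
    Word("down", 45, 30), Word("sat", 5, 30),
]
print(line_above(page, finger_y=35))
print(line_above(page, finger_y=20))
```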
  • the user indicates through gestural movements the manner in which the text is to be read. For example, text may be read continuously, either at a fast or slow rate, or as single lines or paragraphs
  • moving one finger 34 back and forth sideways over text may indicate that the text should be read continuously. Tapping on the text may indicate that only a single line of text should be read. Curling the finger up (bringing the fingernail vertically under the hand) could indicate that a paragraph of text should be
  • gestural movements could be used not only to select the text to be read, but also the manner in which the text output should be generated, or other parameters of the electronic reading process
  • the speed with which the single finger 34 moves back and forth across the page, as described above, could be used to determine the rate at which synthesized speech is read
  • the user could move his finger 34 down the page through the text, and the system would adjust speech rate in order that the current speech output would be approximately at the text which is in front of the finger 34
  • Spreading two fingers apart (e.g. the index finger and thumb)
  • a closed fist could be used to direct the electronic reader to shut itself off
  • the step of speech rate adjustment 61 sets a rate of speech output
  • a predetermined default rate, generally chosen from the range of 80-160 words per minute, which may be user selected, as well as range limits beyond which speech recognition by the user will be challenging
  • a set of gestural movements along with the command interpretations constitutes a gestural user interface
  • One such interface would comprise the following gestures and commands
  • One or more fingers moving back and forth would constitute a clear command, stopping any current reading
  • 4 fingers would be laid on the printed material 33 until reading begins, where such reading could be stopped with the clear command as described above
  • the user would put his thumb and index finger together to form a
  • Moving a single finger horizontally across a page reads the text in the line above the finger at a rate such that the vocalized text keeps pace with the movement of the finger; moving the finger vertically reads the single word in each line closest to the finger as the line is passed by the finger
  • Moving a double finger (two fingers extended side-by-side) vertically through the text reads the text at a rate whose speed is roughly proportional to the speed of the hand, but which has lower and higher predetermined rates which may not be exceeded
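The last gesture above describes a speech rate that follows hand speed but is held between a lower and a higher limit. A minimal sketch of that clamped proportional rule follows. The gain and both limits are hypothetical placeholder values chosen for illustration; the source gives only 80-160 words per minute as the range from which a default rate is chosen.

```python
def speech_rate_wpm(hand_speed_mm_s, gain=4.0, min_wpm=80.0, max_wpm=300.0):
    """Map hand speed to a speech rate in words per minute.

    The rate is proportional to hand speed (gain is in wpm per mm/s) and
    is clamped so that it never leaves [min_wpm, max_wpm]. All three
    parameters are assumed values.
    """
    if hand_speed_mm_s < 0:
        raise ValueError("hand speed must be non-negative")
    return max(min_wpm, min(max_wpm, gain * hand_speed_mm_s))


for speed in (5.0, 30.0, 500.0):
    print(speed, speech_rate_wpm(speed))
```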
  • gestural movements that can be distinguished by processing of visual images by a computer (e.g. one, two or more fingers placed flat, wiggling one or more fingers left to right, tapping a finger, curling a finger inwards, making a fist, etc.), as well as commands which the user wishes to make with these gestures (e.g. read the text above the finger, move to the next block of text, read the text faster, read more loudly, stop reading, remember this text)
  • the particular linkage of a gesture with a command may be cognitively linked, e.g. a flat hand, like a "stop" motion, may be used to stop reading
  • many different gestures may be linked with different commands within the spirit of the present invention
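A gestural user interface of this kind amounts to a table that links recognized gestures to commands, which makes it easy to swap one linkage for another. The sketch below encodes only the pairings from the example interface that are stated in full; the enum and function names are assumptions.

```python
from enum import Enum, auto


class Gesture(Enum):
    FINGERS_BACK_AND_FORTH = auto()
    SINGLE_FINGER_HORIZONTAL = auto()
    SINGLE_FINGER_VERTICAL = auto()
    DOUBLE_FINGER_VERTICAL = auto()
    UNRECOGNIZED = auto()


class Command(Enum):
    CLEAR = auto()                       # stop any current reading
    READ_LINE_ABOVE_FINGER = auto()      # paced by the finger
    READ_NEAREST_WORD_PER_LINE = auto()
    READ_AT_HAND_SPEED = auto()          # rate follows hand speed


# One possible linkage; other interfaces could use a different table.
GESTURE_COMMANDS = {
    Gesture.FINGERS_BACK_AND_FORTH: Command.CLEAR,
    Gesture.SINGLE_FINGER_HORIZONTAL: Command.READ_LINE_ABOVE_FINGER,
    Gesture.SINGLE_FINGER_VERTICAL: Command.READ_NEAREST_WORD_PER_LINE,
    Gesture.DOUBLE_FINGER_VERTICAL: Command.READ_AT_HAND_SPEED,
}


def interpret(gesture):
    """Return the command linked to a gesture, or None if there is none."""
    return GESTURE_COMMANDS.get(gesture)


print(interpret(Gesture.FINGERS_BACK_AND_FORTH))
print(interpret(Gesture.UNRECOGNIZED))
```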
  • the gesture-based commands may be supplemented with physical controls (such as buttons, knobs, sliders and keyboards) to allow other modes of input
  • In step 63, the text selected in text selection 59 will be synthesized at a rate determined by speech rate adjustment 61
  • the means of synthesizing speech may include both software and hardware components
  • a preferred method of speech generation would use software programs such as Lernout & Hauspie's Text-to-Speech (Burlington, MA)
  • the output speech is encoded by the speech synthesis software in an appropriate format, such as 16-bit linear PCM encoding, and then output through a speaker 47 (see Fig 1) located on the main system 35. If the user wishes for more privacy when operating the system, a jack 46 is provided into which headphones may be inserted
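For readers unfamiliar with the format, 16-bit linear PCM stores each audio sample as a signed 16-bit integer proportional to its amplitude. The sketch below shows one way to pack floating-point samples this way. The little-endian byte order, the scaling by 32767 and the function name are assumptions; the source names only the encoding.

```python
import struct


def encode_pcm16(samples):
    """Encode floats in [-1.0, 1.0] as signed 16-bit linear PCM bytes.

    Out-of-range samples are clamped. Little-endian byte order is an
    assumed choice.
    """
    ints = []
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))
        ints.append(int(round(clamped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


data = encode_pcm16([0.0, 1.0, -1.0])
print(len(data), struct.unpack("<3h", data))
```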
  • the locational information is provided to the user by way of feedback means, which may comprise tactile, audio and visual feedback, or a combination of these different modalities
  • the tactile feedback mechanism may comprise a worn, held or sub-surface (below the printed material 33) transducer that vibrates in response to the presence of textual information within the reading window
  • the transducer may be attached or clipped to the tip of the finger. Vibrating pins or rotating eccentrics would generate the skin deflection associated with a tactile feeling
  • the held transducer may be cupped or grasped within the user's hand that is directing the reading process (i.e. on which the finger locator is based), and includes similar vibration means as for the worn device described above
  • the sub-surface transducer comprises one or more vibratory transducers which are located beneath the surface of the textual information. For instance, a raised reading platform could be placed within the field of view, delimiting the extent of the field of view, and additionally incorporate tactile feedback means that transmits tactile feedback through
  • Information is provided by the tactile means through the presence or absence of vibration, the intensity of vibration, the frequency of vibration, the periodic timing of vibrations, and the direction of vibration
  • Combinations and variations of the vibrational characteristics can thereby convey information about the density of text (e.g. lines per inch), the size of the text font, closeness of the locator finger to the text, direction of the closest text outside of the reading window, alignment of the text relative to the horizontal of the camera assembly image, and other such information as is useful to navigate through textual information
  • a characteristic pulsing vibration would indicate nearby text, and the frequency and intensity of this pulsing vibration would guide the user to the text
  • characteristic vibratory patterns can indicate when the reading window is positioned over graphics
  • the use of tactile information to guide the user in reading is also described in PCT patent application PCT/US97/02079 to Sears titled "Tactilely-Guided Voice-Output Reading Device", which is incorporated herein by reference
  • a finger-mounted tactile unit may produce displacement of a movable member underneath the tip of the finger locator, giving the perception to the user that their finger is moving over a topologically elevated text
  • the member would push up on the finger from below, raising the finger, and giving the impression that the line of text was raised relative to the surrounding surface
  • the mechanical actuator may also provide physical tilt to the perceived elevated component
  • the physical actuator may have two vertical actuator elements beneath an inflexible, relatively horizontal cross-member. As the height of the two vertical actuator elements changes, the slope of the joining cross-member will change, resulting in the perception of slope. This reinforces the perception described previously in this paragraph of traversing up and over an elevated line of text, which in actuality is flat
  • a tactile feedback mechanism is attached to the user's finger 34, this provides a convenient platform for means to locate and track the finger
  • a blinking LED facing upwards towards the image sensor 41 may be placed on the tactile transducer housing wherein the blinking is synchronized with image capture 51 such that during successive image captures, the LED is on and then off. By comparing the two successive images, the location of the finger can be easily tracked
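The comparison of two successive captures can be sketched as a simple frame difference. This is a minimal illustration only: the function name, the grayscale list-of-lists frame layout and the threshold value are assumptions, not details given in the description.

```python
def locate_blinking_led(frame_on, frame_off, threshold=50):
    """Return (row, col) of the pixel whose brightness rises the most between
    the LED-off and LED-on captures, or None if no rise exceeds the threshold.

    Frames are equal-sized 2D lists of grayscale values (0-255); this layout
    is an assumption made for the sketch.
    """
    best = None
    best_diff = threshold
    for r, (row_on, row_off) in enumerate(zip(frame_on, frame_off)):
        for c, (on, off) in enumerate(zip(row_on, row_off)):
            diff = on - off
            if diff > best_diff:
                best_diff = diff
                best = (r, c)
    return best


# Static page content cancels out in the difference; only the LED stands out.
page = [[200, 200, 40, 200],
        [200, 40, 40, 200],
        [200, 200, 200, 200]]
lit = [row[:] for row in page]
lit[1][2] = 255  # LED switched on over a dark pixel
led_position = locate_blinking_led(lit, page)
```

Because the printed material does not change between the two captures, subtracting them removes the text and leaves the LED as the dominant feature.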
  • the audible feedback means includes the generation of sounds of various volumes, frequencies, timbres, repetition frequency and directional source location (with the use of multiple speakers and techniques to produce three-dimensional holographic sound, such as that provided from SRS 3D Sound from SRS Labs of Santa Ana, CA) that conveys information such as that described for tactile feedback means. For instance, if there is no textual information within the reading window, the frequency and/or intensity of a sound can increase as the finger locator is brought closer to readable text
  • spoken information may be used to guide or inform the user. For example, the word "graphic" can be enunciated to indicate the presence of graphical information
  • perceptually distinctive background sounds can indicate the density of graphical information (e.g. keyed to the spatial frequencies within the graphic or the distribution of color densities). Visual - Many potential users of this system have complete vision, yet have trouble reading (e.g. the learning disabled, dyslexic, or alexic) or have low vision where acuity is insufficient for reading common printed text sizes. In such cases, the residual vision may be well employed to guide the user through the text information
  • the system would incorporate either a monitor (such as a computer display or television screen) or alternatively a visual display that might comprise a bank of LEDs, a liquid crystal display or scanned laser beams projected onto the printed material 33
  • the image of the printed material is presented to the user. This image may be enhanced by affecting the brightness and contrast of the image
  • a magnified view of the image around the reading window may be called upon through a signal input by the user
  • This signal may be input either by a pressure-sensitive button attached under the tip of the finger locator, or alternatively, may be a visual gestural cue interpretable by the computer
  • the thumb and index finger may be spread apart to indicate the desired horizontal or diagonal extent of the field of view in the magnified image
  • that text which is currently within the reading window may be indicated through changing the text color or by highlighting the text which comprises the reading window
  • the image displayed on the screen need not be real-time captured by the camera assembly, including the finger locator, but may be derived from a previously captured image in which the finger is not present, so that a clean image of just the source reading material is displayed
  • the image of the user's finger may be replaced with an icon representing the finger locator
  • if the visual feedback means is a visual display that does not directly project pixel images from the camera input, then that display may be located on the directing finger or hand, or may be at a fixed location, such as being incorporated into the camera assembly housing. Location on the directing hand allows the user to simultaneously view the material being read, as well as the visual feedback information
  • a preferred embodiment of this form of visual feedback means would be a pair of rows of LEDs, operating similarly to the tactile display pins and lights described in PCT patent application PCT/US97/02079 to Sears titled "Tactilely-guided voice-output reading apparatus." However, instead of the LEDs being pointed back towards the user, as in the patent application referenced above, the lights would preferably be pointing forwards, illuminating the text currently in the field of view that is to be vocalized
  • Control for this feedback is provided in a feedback generation step 65, which accepts input from pointer tracking 57 and text selection 59, which contain information about the position and movement of the finger 34, as well as the location of text elements on the printed material 33 and the text elements being read
  • the feedback so generated is provided through feedback transduction 69, via either tactile, audible or visual signals as previously described
  • output may be through a step of video display 71, in forms of visual feedback as previously described, such as the highlighting of certain text
  • this video feedback is performed in conjunction with display of images from the step of image capture 51, and thus may require a step of video mixing
  • the digitized video images from the digitizing 53 may be digitally altered in the feedback generation 65, and then provided as digital images for video display 71
  • the feedback device, whether tactile, audible or visual, or a combination of these, can direct the user how to move their finger locator along the text line of which the current reading window is a part, which we will call here the "track line." With such means, feedback is given to the user to indicate when the finger locator is moving off of the track line
  • the intensity and/or frequency of tactile or audible feedback can peak when the finger locator is located precisely below the track line, and drop off in intensity and/or frequency in rough
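One way such a peaked feedback signal might be computed is sketched below. The Gaussian falloff shape, the `falloff_px` width and the function name are assumptions chosen for illustration; the description does not specify the falloff law.

```python
import math


def feedback_intensity(offset_px, peak=1.0, falloff_px=20.0):
    """Feedback strength for a finger displaced offset_px (in image pixels)
    from its ideal position relative to the track line.

    The intensity is greatest at zero offset and decays smoothly on either
    side. The Gaussian shape is an assumed choice, not taken from the source.
    """
    return peak * math.exp(-(offset_px / falloff_px) ** 2)


on_line = feedback_intensity(0.0)
drifting = feedback_intensity(15.0)
far_off = feedback_intensity(60.0)
```

The same value could drive vibration amplitude, pulse rate or tone volume, so the user feels or hears the signal weaken as the finger drifts away from the line.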
  • the user may direct the reading system to read according to parsed textual content. That is, the reading system will read blocks of contiguous text at a preset rate until some selection delimiter is reached
  • This selection delimiter may either be intrinsic to the text (such as the end of a paragraph), or it may be bounded by a cue provided by the user
  • the user may direct the system to provide continuous speech through the use of two fingers instead of one, and stroke the fingers vertically along the section of the text to be read
  • an audible cue such as a beep
  • an audible cue indicates that the user should further instruct the system as to the next selection
  • buttons may be available in a unit accessible to the free hand on which the finger locator is not located
  • This keyboard may include positional navigation keys (such as arrow keys), contextual navigation keys (e.g. "next word" or "previous paragraph" keys) or mode selection keys (e.g. "read continuously" or "check spelling" keys)
  • a microphone on the main system 35 may be positioned so as to receive vocal input from the user, which allows the user to select different modes of action or to navigate through the computer interpreted text using spoken commands
  • the field of view in macro mode is 4.7 by 3.5 inches, providing a resolution near the lowest possible for optical character recognition
  • Four cameras arranged in a rectangular arrangement with minimal 0.2 inch overlap in their fields of view would provide a composite field of view of 9.0 by 6.6 inches, which is adequate to cover a standard
  • this invention could also be used for machine translation of text from one language to another
  • the apparatus and methods of the present invention would allow a person to hear the text in their native language. Language translation would occur after the OCR program interpretation of the captured image into text input
  • the computer may correct for syntax and other language construction differences in order to create proper speech in the native language of the user (this is opposed, for instance, to word-by-word translation, which would be a separate option)
  • the text and images captured by the system of the present invention can be used to input the text and images for storage and use on the main system 35 computer. This might be used, for instance, as a low-resolution scanner and text input mechanism for general application by users who may or may not have a disability
  • PaperPort system produced by Visioneer (Fremont, CA) is that localized portions of pages may be classified independently, that valuable desktop surface is not consumed with a bulky scanner, the system of the present invention may be used while sitting at a work desk, and that the time required for scanning is not required
  • the user for example, can open the letter, visually scan it for pertinent data, manually gesture for the data to keep, speak into a computer voice recognition system to indicate the disposition of the data, and then dispose of the letter
  • a user in a warehouse could point to a bar code to read
  • the system using a digital image instead of a conventional laser scanning bar code reader to obtain printed information would then read the one-dimensional or two-dimensional bar code, and enter it into the system. Because the user would not need to hold a bar code scanner in his hand, this would permit more efficient two-handed movement in the inventory system and thereby permit increased speeds of data input
  • Fig 3 is a perspective diagram of a reading machine that incorporates two cameras. A multiplicity of legs
  • a low-magnification wide-angle FOV camera 87 is used to track command gestures
  • This camera 87 may be fixed in its orientation, provided that the field of view is sufficiently large to capture images from the entire printed material of interest
  • the camera 87 may be outfitted with a wide-angle lens that may have a constant non-linear distortion (e g a barrel or fish-eye effect)
  • software within the computer would be required to remove this constant distortion
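As an illustration of how software might remove a constant lens distortion, the sketch below uses a one-parameter radial model and inverts it by fixed-point iteration. The model, the coefficient `k` and both function names are assumptions; a real lens would need calibrated coefficients and possibly a richer model.

```python
def distort(x, y, k):
    """Apply a one-parameter radial distortion to normalized image
    coordinates (origin at the image centre). In this assumed model a
    negative k pulls points toward the centre, as in barrel distortion."""
    scale = 1.0 + k * (x * x + y * y)
    return x * scale, y * scale


def undistort(xd, yd, k, iterations=20):
    """Recover the undistorted coordinates by fixed-point iteration:
    repeatedly divide the distorted point by the scale factor evaluated at
    the current estimate. Converges quickly for mild distortion."""
    xu, yu = xd, yd
    for _ in range(iterations):
        scale = 1.0 + k * (xu * xu + yu * yu)
        xu, yu = xd / scale, yd / scale
    return xu, yu


k = -0.2                      # assumed barrel coefficient
original = (0.5, 0.3)
seen = distort(*original, k)  # where the lens places the point
recovered = undistort(*seen, k)
```

Because the distortion is constant for a fixed lens, the coefficient need only be measured once and can then be applied to every captured frame.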
  • the extent of the field of view of the fixed wide-angle camera encompasses the entire printed material 33. This range may be large enough to allow an entire unfolded page of newspaper to be read without repositioning of the paper
  • a pan-tilt camera 89 is provided with a generally smaller FOV than the wide-angle camera 87 previously mentioned
  • This camera 89 may or may not be outfitted with zoom capability, and if the camera 89 does have zoom capability, the range of magnifications needed will be more limited than in a single camera embodiment, since many low-magnification requirements are satisfied by the low-magnification wide-angle FOV camera used to track command gestures
  • the extent of the field of view of the pan-tilt camera is shown by the area 91 on the printed material 33. This area is of such a size that the pixel density on the imaging sensor of the camera 89 allows for accurate optical character recognition of text in the field of view
  • a laser scanning mechanism 95 can be mounted in such a way as to be able to illuminate small sections of all printed material to be read
  • the purpose of the laser scanner 95 is to highlight the words being read and spoken, providing feedback to partially-sighted users as to what is currently being read
  • the scanning mechanism 95 is controlled to produce an illuminated box 93 around or fully including the current word being read. In this way.
  • the laser scanning may be timed so as not to overlap in time with the exposure of the cameras 87 and 89
  • the word or words of interest may be shown on a display screen, as described previously for other embodiments of the present invention, in order to provide feedback to users
  • this laser scanning mechanism 95 could also be used in other reading systems such as that of Fig 1
  • the laser scanner 95 may have the additional function of highlighting text that is searched for under direction from the user
  • the user may direct the system to search for a specific word such as "pay" or for classes of words or text, such as those dealing with currency (e.g. text preceded by a currency symbol such as "$" which involves a number with two decimal digits, or which contains the word "dollars"), or alternatively to scan for non-text symbology such as a bar code or location encoded data such as the page number, which is located in generally predictable locations on a page
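A search of this kind over OCR output might look like the following sketch. The token format (recognized text paired with a bounding box), the regular expression and all function names are assumptions made for illustration; the returned boxes are what a highlighter such as the laser scanner would be asked to illuminate.

```python
import re

# "$" followed by digits (commas allowed) and exactly two decimal digits.
AMOUNT = re.compile(r"^\$\d[\d,]*\.\d{2}$")


def _clean(text):
    """Strip surrounding punctuation left over from OCR tokenization."""
    return text.strip(".,;:!?\"'()")


def is_currency(text):
    """True for a dollar amount with two decimals or the word 'dollars'."""
    token = _clean(text)
    return bool(AMOUNT.match(token)) or token.lower() == "dollars"


def boxes_to_highlight(tokens, word=None):
    """tokens: list of (text, box) pairs, where box is any position record.
    With word given, match that word case-insensitively; otherwise match
    the currency class. Returns the boxes of all matching tokens."""
    hits = []
    for text, box in tokens:
        if word is not None:
            matched = _clean(text).lower() == word.lower()
        else:
            matched = is_currency(text)
        if matched:
            hits.append(box)
    return hits


tokens = [("Please", (0, 0, 40, 10)), ("pay", (45, 0, 65, 10)),
          ("$1,250.00", (70, 0, 130, 10)), ("or", (135, 0, 145, 10)),
          ("fifty", (150, 0, 185, 10)), ("dollars.", (190, 0, 240, 10))]
pay_boxes = boxes_to_highlight(tokens, word="pay")
currency_boxes = boxes_to_highlight(tokens)
```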
  • the laser scanner 95 may be affixed to the pan-tilt mechanism of the high-resolution camera 89, so that the laser is always pointing roughly in the direction of the camera 89 field of view. In this way the laser scanner 95 will need a smaller range of motion
  • a wide-field illuminator 97 which is mounted on the platform 85 near to the location of the cameras, and pointed in such a direction as to illuminate text beneath the platform 85
  • the range of the illuminator 97 is such as to provide light that is incident on the widest physical range accessible by both the wide-field and pan-tilt cameras 87 and 89
  • the wide-field illuminator 97 is a fluorescent lamp with reflector and optics to spread the light roughly evenly over the largest field of view of the wide-field camera 87
  • the pan-tilt mechanism of the camera 89 should preferably be oriented so that movement along either the pan or the tilt axis scans horizontally across the printed material, roughly following a text line, while movement in the other axis scans roughly vertically across the page. While this orientation of the camera 89 is not required, it will generally reduce the amount of complex combined pan-tilt movement as text in a line is read. It should also be understood that the mechanism pointing the camera may be served by gimbal mechanisms different from pan-tilt mechanisms, as long as accurate control in two dimensions is available and a sufficient range of motion is provided. Instead of moving the camera 89, it is also within the spirit of the present invention to rotate one or more mirrors while the camera 89 remains fixed in location and orientation
  • the two cameras 87 and 89 may be replaced by a single camera with zoom capabilities
  • In reading text newly placed under the camera, the camera may be in low magnification zoom, where large areas of the page can be observed within a frame
  • the camera can scan the observed page for control signals in the form of user hand signals or motion. During this time, before the user has indicated a command, the camera may scan both horizontally and vertically over the area of the page looking for the presence of the user's hand
  • the hand can be tracked until a command is received, either through hand movement, finger orientation or position, or other input modality. At this point, the magnification of the camera is increased to an extent that allows the text to be reliably interpreted by the OCR program. Thus, the zoom mechanism will magnify large font headline text to a lesser extent than small fonts, for example in a footnote
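The font-dependent zoom choice can be sketched as picking the smallest magnification that brings characters up to a target pixel height. The 20-pixel target, the 12x limit and the function name are assumed values for illustration; the actual thresholds would depend on the OCR program and lens in use.

```python
def required_zoom(char_height_px_at_1x, min_char_height_px=20.0, max_zoom=12.0):
    """Smallest zoom factor that makes characters at least min_char_height_px
    tall on the sensor, clamped to the range the lens supports.

    char_height_px_at_1x is the character height measured in the
    low-magnification frame. The default target and limit are assumptions.
    """
    if char_height_px_at_1x <= 0:
        raise ValueError("character height must be positive")
    zoom = min_char_height_px / char_height_px_at_1x
    return max(1.0, min(zoom, max_zoom))


headline_zoom = required_zoom(40.0)  # large headline text needs no extra zoom
footnote_zoom = required_zoom(4.0)   # small footnote text needs more
```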
  • a light mounted on the camera assembly which is oriented in the direction of the camera field of view may provide additional illumination whose intensity can be variably increased as the magnification of the zoom element of the camera increases
  • the actual control of the illumination source intensity is through feedback involving analysis of the images captured by the camera
  • the exposure time of the camera can be increased in response to changes in the magnification in order to compensate for the available light at different magnifications
  • the coordinated action of the cameras 87 and 89, as well as the laser scanner 95, is preferably controlled by the computer located in the main system 35 that is engaged in the analysis of images from the camera
  • all of these elements are generally, though not necessarily, connected electronically to the main system 35, which may be located on the platform 85. Additionally, instead of being separately mounted to the platform 85.
  • the zoom camera is particularly valuable if the image captured by the camera is projected on a computer screen, since the hardware zoom can present a magnification with full pixel information to the user, without need for variable software magnification which may be of lower quality due to the use of smaller numbers of pixels
  • OCR 55 optical character recognition
  • pointer tracking 57 when printed material 33 is placed within the field of view of the image capture 51 means.
  • OCR 55 may begin immediately, before gestural input from the user has begun. Image capture 51, video digitizing 53 and OCR 55 may proceed opportunistically given text within the field of view, and if the gestural command directs the system to read text already interpreted, vocalization of the text through speech synthesis 63 can begin almost immediately. If the text to be read is not among that already interpreted, then image capture 51 of the indicated text using high pixel densities suitable for OCR 55 can begin.
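The opportunistic behaviour described here resembles a cache in front of the capture-and-OCR pipeline. The sketch below is one possible arrangement; the class, its method names and the `ocr_region` callable (standing in for image capture 51, digitizing 53 and OCR 55) are hypothetical.

```python
class OpportunisticReader:
    """Caches OCR results for page regions so that text recognized before a
    gesture arrives can be vocalized without a fresh capture."""

    def __init__(self, ocr_region):
        # ocr_region: hypothetical callable mapping a region key to text.
        self._ocr_region = ocr_region
        self._cache = {}

    def prefetch(self, region):
        """Interpret a region ahead of any command, if not already done."""
        if region not in self._cache:
            self._cache[region] = self._ocr_region(region)

    def read(self, region):
        """Return (text, was_cached). A cache hit means speech can begin
        almost immediately; a miss triggers capture and OCR on demand."""
        if region in self._cache:
            return self._cache[region], True
        text = self._ocr_region(region)
        self._cache[region] = text
        return text, False


calls = []

def fake_ocr(region):
    """Stand-in for the real pipeline; records each invocation."""
    calls.append(region)
    return f"text of {region}"

reader = OpportunisticReader(fake_ocr)
reader.prefetch("column-1")
first = reader.read("column-1")   # already interpreted
second = reader.read("column-2")  # interpreted on demand
```

The same cache would also let reading continue when the user's hand later covers text that has already been interpreted.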
  • This mixing of optical character recognition 55 and pointer tracking 57 can be performed by a single camera with zoom capabilities, changing rapidly from narrow to wide field in order to both capture text and gestural commands, but the use of two cameras allows high resolution text capture to occur simultaneous with low resolution wide field image capture 51
  • the reading system can read text that is obscured by the user's hand during gestural commands. For instance, if the system has begun reading a passage, and the user inadvertently covers some of the text to be read with his hand, the information under his hand may already be stored. Thus, not only can text vocalization continue, but also images of the text where the user's hand is currently placed can be shown in video display 71 even though current unobscured images of the text are not available
  • the user may view the text on a video display, similar to that used in the first embodiment Fig
  • FIG. 3 shows the use of a touch-screen video display 32, which may be alternatively used. With the touch screen display 32, instead of making the gesture-based navigational commands within the field of view of the imaging system, the commands are placed directly via finger 34 movements on a touch-sensitive surface 50 of the touchscreen video display 32
  • the touch-sensitive surface 50 can use capacitive, resistive, surface acoustic wave or other techniques to determine the presence and motion of fingers on the screen, such as resistive digital touch screens manufactured by Jayco of Orange, California. While these surfaces 50 generally allow feedback of a single point, and are therefore generally incapable of interpreting the differences between a single finger and multiple fingers used in gesture-based commands, even the use of a single point allows the system to distinguish left-right versus up-down motion, tapping motions, and even back-and-forth motions from moving, lifting, returning, and moving motions. This provides a vocabulary of motions that can be used in commanding the system. Instead of having to interpret images for gesture-based commands, the system must interpret only the presence or absence of touch contact, and the
  • When using the touch-screen display 32, the text within the system field of view is presented on the touch screen 32, and the user indicates by gesture-based commands not only the text to read, but the manner and speed of reading, as well. Because the user interacts with an image, rather than the actual printed material 33, only a single view is permitted at a time. This encourages the use of a single camera with pan.
  • the user can control the pan and tilt by appropriate command gestures on the touch screen 32 (e.g. dragging a finger in the direction of panning, or drawing a circle of smaller or larger radius to increase or decrease the zoom), or the system can automatically track lines of text through OCR-based motion control
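A single-point vocabulary of this kind could be recognized from the sequence of contact positions between touch-down and lift-off. The sketch below is illustrative only: the function name, the category labels and the `tap_radius` tolerance are assumptions.

```python
def classify_stroke(points, tap_radius=5.0):
    """Classify one continuous contact from its (x, y) samples.

    - "tap": the contact never strays beyond tap_radius from where it began
    - "back-and-forth": it travelled away but ended near its starting point
    - "left-right" / "up-down": net motion is mainly horizontal / vertical
    """
    if not points:
        raise ValueError("stroke must contain at least one sample")
    x0, y0 = points[0]
    x1, y1 = points[-1]
    dx, dy = x1 - x0, y1 - y0
    # Farthest excursion from the starting point along either axis.
    extent = max(max(abs(x - x0), abs(y - y0)) for x, y in points)
    if extent <= tap_radius:
        return "tap"
    if abs(dx) <= tap_radius and abs(dy) <= tap_radius:
        return "back-and-forth"
    return "left-right" if abs(dx) >= abs(dy) else "up-down"


tap = classify_stroke([(10, 10), (11, 10)])
sweep = classify_stroke([(0, 0), (50, 3), (100, 5)])
scroll = classify_stroke([(0, 0), (2, 40), (3, 90)])
rub = classify_stroke([(0, 0), (60, 0), (2, 1)])
```

A move, lift, return and move sequence would arrive as two separate contacts, which is how it can be told apart from a back-and-forth rub made without lifting.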
  • the image shown on the screen need not necessarily be the current field of view of the currently active camera, but may be instead a stored image, allowing the cameras 87 and 89 to be capturing images of the printed material 33 for later reading
  • the user may interact with text that is modified in the step of image enhancement 73, which may render it more visible to users with residual vision than the printed material 33 from which the text comes
  • This enhancement may include, as previously discussed, contrast and brightness control, and the image may be further modified by highlighting certain text (such as the text or text line currently being read)
  • operation using a touch screen display 32 even allows for the use of a flat-bed scanner to obtain images of the printed material 33, with the user providing gesture-based commands through the touch screen display 32
  • This mode of operation has the virtue of using inexpensive flatbed scanners, but suffers from the difficulty of using scanners described in the background section above
  • scanners require up to a minute or more to scan a standard page of text, whereas image capture using digital cameras supports near immediate reading once the printed material 33 is placed in the field of view of the system
  • Another enhancement of this embodiment of the present invention is to import images for optical character reading directly from the screen image buffer of the computer of the main system 35
  • the computer of the main system 35 is connected to the World Wide Web graphic interface to the Internet (hereinafter referred to simply as the Web)
  • Much of the text interface to the Web is graphic in nature - that is, is presented as pixel images of text, rather than as text which is displayed through Hypertext Markup Language (HTML) text primitives
  • Web interface software (e.g. Web browsers)
  • Web interface software is typically unable to provide access to this graphics-based, non-HTML text to vision-impaired or blind users. It is within the teachings of the present invention to access a screen image buffer of the computer of the main system 35.
  • the system preferentially operates in hybrid mode, where text displayed in HTML-code is directly interpreted from the code whereas text displayed as graphics is interpreted through OCR 55 of the present invention
  • the reason for this is to avoid the need to OCR-interpret text whose symbology is already known to the system
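The hybrid mode can be illustrated with a small extractor that takes text nodes straight from the markup and hands only images to a recognizer. This is a rough sketch: the class and function names are invented here, and `ocr_image` is a hypothetical callable standing in for OCR 55 applied to the rendered graphic.

```python
from html.parser import HTMLParser


class HybridTextExtractor(HTMLParser):
    """Collects HTML text directly and delegates images to OCR."""

    def __init__(self, ocr_image):
        super().__init__()
        self._ocr_image = ocr_image  # hypothetical: image source -> text
        self.pieces = []

    def handle_data(self, data):
        # Text already known from the code needs no OCR.
        text = data.strip()
        if text:
            self.pieces.append(text)

    def handle_starttag(self, tag, attrs):
        # Text rendered as graphics must be recognized from pixels.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.pieces.append(self._ocr_image(src))


def extract_text(html, ocr_image):
    """Return the page's readable text in document order."""
    parser = HybridTextExtractor(ocr_image)
    parser.feed(html)
    parser.close()
    return " ".join(parser.pieces)


page = '<p>Welcome</p><img src="banner.gif"><p>to the site</p>'
spoken = extract_text(page, lambda src: f"[{src}]")
```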
  • One method would be to use a touch screen display 32 in which the position touched by the user is directly mapped onto the pixels beneath the user's finger. The effect then becomes directly comparable to that of the user making gestural commands on printed material 33, except that the text is present on a screen rather than paper
  • An alternative method of interfacing with the screen-based text is to use the cameras 87 and 89 to record gestural movements made within their field of view without respect to material beneath the gestures. That is, there may or may not be printed material 33 within the field of view of the cameras 87 and 89, and what is there is ignored by the system. Instead, the system maps the position of the user's fingers within the field of view and maps the location of the hand and fingers relative to the field of view to the relative positions of recognized text from the screen image in the field of view. Thus, if the user's index fingertip is about 12% from the left of the field of view, and 47% from the top of the field of view of the
  • This embodiment of the present invention may also be used as a reading device for children, both for its entertainment effects as well as educational value
  • a child user who could not currently read would bring their favorite children's book to the system of the present invention, and place it in the field of view of the system
  • the system could not only read the book for the child, but also highlight words as they are being spoken through use of the laser scanner 95, thereby providing feedback to the child useful for gaining the ability to read
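The relative mapping from fingertip position to screen text can be sketched as follows. The function names, the screen dimensions and the text-box record format are assumptions for illustration.

```python
def fraction_to_screen(u, v, screen_w, screen_h):
    """Map a fingertip position, given as fractions of the camera field of
    view (u from the left, v from the top), to a screen pixel."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("fractions must lie within the field of view")
    x = min(int(u * screen_w), screen_w - 1)
    y = min(int(v * screen_h), screen_h - 1)
    return x, y


def text_at(point, text_boxes):
    """Return the recognized text whose box contains the point, if any.
    text_boxes: list of (text, (left, top, right, bottom)) in screen pixels."""
    px, py = point
    for text, (left, top, right, bottom) in text_boxes:
        if left <= px <= right and top <= py <= bottom:
            return text
    return None


# A fingertip 12% from the left and 47% from the top, on an assumed
# 1024 x 768 screen image.
pixel = fraction_to_screen(0.12, 0.47, 1024, 768)
boxes = [("Home", (0, 0, 100, 50)), ("News", (100, 340, 200, 380))]
selected = text_at(pixel, boxes)
```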
  • the platform 85 may be supported on collapsible or hinged legs, or may even be available in forms without leg supports, and be worn by the user
  • the cameras, illuminators and scanners or some subset of these may be worn on a head-mount, such as on a pair of glasses, telephone headset, headphones, or cap
  • An example of such a worn reading machine is shown in Fig 4, a perspective diagram of an eyeglass reading machine 100
  • An eyeglass frame 101 provides the basic platform for the reading machine
  • a wide-field camera 103 on one eyeglass earpiece provides functionality similar to that of the wide-field camera 87 of Fig 3, and a narrower field camera 105 provides functionality similar to that of the pan-tilt camera 89
  • Suitable cameras for this embodiment of the present invention include the DXC-LS1 lipstick camera from Sony (Japan)
  • a speaker 107 which provides audible feedback to the user, which may be stereo encoded. For instance, to direct the user to turn their head to the right, thereby repointing the cameras 103 and 105 fields of view, a noise may be fed through the right speaker
  • This audible feedback is supplemented or replaced by tactile feedback transducer 109 that vibrates one or more pins 111 on the inside surface of the earpiece, against the bones above the ear
  • the power and communications are brought to this reading machine 100 through a pair of cords 113 that feed along the earpiece
  • These cords can be incorporated into an eyeglass support (not shown) that lies along the back of the user's neck, preventing the eyeglass reading apparatus from dropping
  • the cords 113 lead to a computer that may be carried in various means, including backpacks, hip packs, shoulder bags or an article of clothing such as a vest
  • the major functional difference between this embodiment and that described in Fig 3 above is that the narrow-field camera 105 does not
  • these feedback means may be supplemented by a laser pointer on the eyeglass oriented so that its light falls near to or directly on the center of the field of view of the narrow field camera 105. This will allow users with residual vision to identify the field of view of this camera 105, and thus track lines of text. If combined with a pan and tilt mechanism, this laser could also be used to highlight text on the page in the manner of the laser scanner 95 in Fig 3 above
  • this embodiment of the present invention leaves the hands of the user free to hold and manipulate the text and also to perform the gestural commands described above
  • the device of Fig 4 may also be used to interpret text not located on printed material brought to the system, but rather may also include text on public signage, computer screens, directions affixed to a wall, or book covers on a library shelf, to which the reading apparatus has been brought
  • the ability to read such text will be conditioned by either a variable focussing means or through use of a camera with a very great depth of field (e.g. a "pinhole" camera), so that text at various distances can be read
  • An alternative embodiment of the present invention is to have the camera assembly mounted on the user's hand, as in a portable system
  • the camera or cameras capturing the images of text to be read are either at a fixed location, or located relatively distantly from the text (e.g. mounted on the user's head or chest)
  • the camera received commands, at least in part from hand and finger gestures of the user that were captured by the camera or cameras
  • Fig 5a and Fig 5b present side views of a fourth embodiment of the present invention
  • Fig 5c presents a frontal view of the device
  • a camera is mounted directly on the user's fingertip 121 in a finger housing 123
  • the camera in the finger housing 123 is naturally pointing in the same direction
  • Images are then transferred by a cable 125 connecting the finger housing to a general-purpose or special purpose computer, such as contained in the main system 35, as in the previous embodiments
  • the finger housing 123 is strapped onto the user's index finger 121 with two straps, a medial strap 127 encircling the middle segment of the index finger, and a distal strap 129 which encircles the distal segment of the index finger
  • the medial strap 127 is longer in the longitudinal finger direction, and is the primary structural stabilizer of the finger housing 123 on the index finger 121
  • the medial strap 127 is conveniently fabricated from fabric or plastic
  • the finger-housing 123 rests on top of the finger 121, with a lens 131 above the distal-most segment, and points along the axis of the finger 121.
  • a supporting member 139, made of a less flexible material, connects the medial and distal straps 127 and 129 so as to provide support for the distal strap 129, as well as to maintain a fixed distance between the two straps
  • a Spandex or other fabric sheath may be placed around the finger housing 123 and associated straps 127 and 129 and supporting member 139
  • Illumination is provided for the camera by illuminators 133 around the periphery of the camera, pointing the same direction as the camera, as can be seen in Fig 5c
  • the illuminators 133 are conveniently light-emitting diodes (LEDs), and may be of different colors to aid in the discrimination of different colored text, or text on different colored backgrounds. In the case of different colored LEDs, the LEDs 133 would be turned on in sequence or in combination to provide illumination with the greatest contrast of text to its background
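Selecting the illumination with the greatest contrast could be done by capturing a frame under each LED color in turn and comparing the resulting text and background brightness. In the sketch below, the Michelson contrast measure, the function names and the sample brightness figures are all assumptions for illustration.

```python
def contrast(text_level, background_level):
    """Michelson contrast between mean text and mean background brightness.
    The choice of this particular measure is an assumption."""
    total = text_level + background_level
    return abs(text_level - background_level) / total if total else 0.0


def best_led(measurements):
    """measurements: dict mapping an LED name to the (text, background)
    mean brightness observed under that LED. Returns the name of the LED
    giving the greatest text-to-background contrast."""
    return max(measurements, key=lambda name: contrast(*measurements[name]))


# Hypothetical readings for colored text on a colored background.
readings = {"red": (180, 200), "green": (60, 190), "blue": (90, 120)}
chosen = best_led(readings)
```

The same comparison could be extended to combinations of LEDs by adding an entry for each combination that is tried.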
  • One such arrangement of LEDs is shown in Fig 5c, although a smaller number or different topological arrangement of LEDs is within the spirit of the present invention
  • ambient illumination may be sufficient to provide images of the text without additional illumination from the device
  • the user's finger 121 will generally be inclined to the page at an angle of greater than 45 degrees, as shown in Fig 5a
  • the captured image will not be square and will appear distorted if compensation is not made either in the optical hardware, the camera positioning or image capture software
  • the optical path within the finger housing 123 may include either tilted mirrors or prisms to remove some or most of the optical distortion caused by the non-orthogonal camera angle
  • these methods cannot entirely remove the distortion of the non-orthogonal image, since the angle at which the user positions the camera cannot be entirely controlled or predicted, and small amounts of distortion may remain. This final distortion may be somewhat compensated for by image processing software within the computer, which may detect the angle of the camera position by assessing various features of the image
  • the lighting from the illuminators can be known and calibrated for a vertical camera arrangement. If the camera is angled, that portion of the image that is divergent will generally also have less reflected light, since the incident light from the illuminators is spread over a larger area
  • the variation in illumination intensity can be used to detect spreading of the image, and provide the information necessary to remove the distortion
  • a miniature tilt sensor, such as those that use a fluid sensing device, may be used to detect camera tilt
  • the image processing software within the computer may remove the effects of tilt
  • a circular beam of light of known spread may be projected during certain image captures, and the tilt and distance of the surface can be unambiguously determined from the size and shape of the beam captured in the images. Using this method, the illumination spread angle must be different from, and preferably smaller than, the camera field of view in order to distinguish distance
  • detecting the angle of camera tilt can include looking at the divergence of angle in vertical parts of letters, such as the vertical bars on "h", "l", "b", "K", and many other letters. If the camera is not orthogonal to the text, the angle of the vertical bars will vary within different parts of the image
  • the user may want to pull the camera away from the printed text in order to increase the field of view of the camera
  • because the lens system of the camera will generally operate with a very short focal length, it is generally hard to allow the lens to accommodate a very large range of focal depth. In part, this can be accomplished by using a very small lens aperture, creating a pinhole camera with large depth of field
  • This strategy is limited by the reduced light capturing of such a pinhole lens system, and the need to compensate for this effect with higher illumination than may be available
  • the camera can be outfitted with a movable lens system, which provides variable focus
  • the finger housing 123 is primarily positioned and stabilized on the middle segment of the index finger 121 by the medial strap 127
  • the strap 129 on the distal segment pulls a stiff actuator 135 which is attached tangentially to the camera lens 131, and thus rotates the lens 131, which is attached to the camera by a screw mechanism
  • the distance from the lens 131 to the camera is adjusted, thereby changing the focal point of the camera assembly
  • an actuator may extend from the bottom of the lens 131 and rest on the distal finger 121 segment under the influence of spring pressure. As the finger 121 flexes, the actuator would move downward to rest on the new position of the finger 121
  • the camera does not capture images containing the user's finger or hand, and so images of user hand or finger gestures cannot be used directly to communicate commands to the computer
  • three different methods, used in isolation or in combination, are used to allow the user to issue hand-based commands to the computer
  • a small button 137 may be placed on the distal strap 129 on the finger housing 123.
  • the button 137 is actuated
  • the electrical connections for this button may be transmitted through wires placed within the medial and distal straps 127 and 129, and the support member 139
  • the button 137 permits both single and double "clicking" as command inputs
  • the user may click once to activate reading, and a second click would stop reading. Double clicking could command activation of voice input, change lighting, or indicate another function
  • the sequences of images from the camera can indicate special finger gestures as command inputs
  • the camera can detect changes in illumination, and by detecting offsets of common image elements from frame to frame, determine direction and speed of finger movement. For example, if the user's finger 121 is above the page, and then brought down rapidly in a tapping motion, the illumination intensity on the page from the LEDs 133 will increase rapidly, as the lights are brought closer to the paper. Then, as the finger 121 is brought into contact with the surface of the reading material
  • Accelerometers located within or on the finger housing 123 can detect and communicate the direction and magnitude of acceleration. Thus a downward tapping motion would be detected as a moderate acceleration downwards, followed by a very sharp, impulsive upwards acceleration as the finger strikes the page surface and stops
  • Such accelerometer devices are widely available in piezoelectric, piezoresistive and variable capacitor form from companies such as Endevco of San Juan Capistrano, CA
  • the use of the button, of image analysis, and of accelerometer information, or other methods of determining finger position and movement, may all be used to determine and interpret finger gestures for user input of commands to the system
  • this information may be used to determine the location and movement of the hand, for interpreting hand gestural commands to the computer. Additionally, this information might be used for an automatic focusing mechanism, in which either the camera or the lens is moved according to the dictates of the object distance. By varying the distance from the lens to the camera imaging sensor, different focal points may be accommodated
  • a convenient method for determining the distance from the camera face to the reading material is the common triangulation technique used in industrial photoelectric sensors and handheld cameras
  • a roughly collimated beam that is co-aligned with the camera line of sight, but offset by a small distance, is projected onto the printed material
  • the location of the beam contact with the printed material within the camera image will vary predictably
  • the distance from the camera to the printed material may be computed
  • the beam may be switched on and off between successive camera frames, and through the process of image subtraction, the location of the beam within the image will be easily identified
  • a diode laser with a collimating lens is placed within the finger housing
  • a narrow-output beam LED can be placed within a hole in the finger housing, such that a roughly collimated beam emerges from the hole
  • the diode laser has the advantage of a longer working distance, although the LED system has the advantage of cost and size in its favor
  • multiple beams measuring distance can be used to additionally determine tilt and curvature of the imaged surface
  • other means of communicating commands to the computer are useful, most notably verbal commands that are input to the computer using a microphone and interpreted by a voice recognition program
  • This microphone will generally be integrated near, on, or in the computer system to which the system is connected by cord 125
  • Other input may be available through one or more buttons 141 located on the exterior of the finger housing 123. These buttons may be used to "wake up" the system when the system is in a sleep or power-saving mode, turn the system off, alert the system that audible input from the microphone is to be entered by the user, or issue other such commands
  • a tactile interface 143 could be included in the finger housing for this embodiment, and the audible and visual feedback can be handled by the computer in the same manner as in the previous embodiments
  • the tactile feedback stimulators 143 on the device may be located at a number of positions within the spirit of the present invention
  • one or more stimulators 143 may be located on the inside surface of the straps 127 and 129 used to attach the finger housing to the user's index finger
  • the tactile stimulators 143 may be located on the underside of the finger housing 123, against the dorsal surface of the finger 121
  • the sensitivity of the finger 121 varies substantially with position, and the highest sensitivity occurs on the ventral surface of the distal segment of the finger, which is the optimal location for the positioning of the tactile sensors although other locations may suffice
  • finger housing 123 in Fig 5a through Fig 5c is shown resting primarily on the dorsal surface of the finger 121. It is within the spirit of the present invention for the finger housing 123 to be both more substantial in size, as well as encompass a wider range of circumference around the finger 121. In this case, the user's finger would insert in a hole in the device, and electronics would be placed around the finger 121. Tactile stimulators
  • the finger housing 123 may be located on any segment of the finger, and may be conveniently located not on the middle segment, as shown in Fig 51 through
  • Fig 5d presents a side view of this embodiment of the present invention, in which the optics of the camera are presented in schematic cross-section
  • the finger housing 123 is located on the proximal finger 121 segment, secured to the finger via a housing strap 145
  • a bellows arrangement 151 shown in cross-section
  • the prism 147 redirects the light from a field of view 155 near the tip of the finger to the input path to the finger housing 123
  • the bellows is secured to the medial strap 127 by a bellows attachment 153.
  • the prism 147 may alternatively be a fluid-filled prism, so that as the finger 121 moves, instead of moving the prism 147, it changes the relative angle of the faces of the prism, thereby adjusting the optics in the required manner
  • The placement of elements shown in Fig 5d has a number of advantages, including a larger field of view, given the larger distance to the printed material, a larger depth of field, greater comfort (since the weight of the device is closer to the point of rotation at the knuckle, and therefore presents less torque around the knuckle), and the fact that some of the weight of the device may be carried not on the finger but over the knuckle
  • the present invention provides a number of advantages relative to magnifying and electronic reading devices practiced in the prior art, including:
  • the systems may be used with general-purpose computers, which are becoming ubiquitous in office and home environments. These computer systems provide both the computing power necessary, as well as ancillary input and output devices, including video displays and audio feedback
  • the system will be very inexpensive for the end-user who already has a suitable computer
  • the performance of the reading systems will correspondingly improve
  • the system of the present invention can be used from a sitting position, as the printed material need be placed only on the desktop, rather than in a raised scanner of current reading machines
  • the third (eyeglass) and fourth (fingertip) embodiments of the present invention are easily made portable, so that reading can be performed wherever and whenever printed material is encountered, whether at school, at work, at the store, or at a restaurant
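
The triangulation ranging described above (a roughly collimated beam offset from the camera line of sight, switched on and off between successive frames and located by image subtraction) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the `min_diff` threshold, the pinhole camera model, and the assumption that the beam runs parallel to the optical axis are all illustrative choices. Frames are plain lists of rows of grayscale values.

```python
def locate_beam(frame_on, frame_off, min_diff=30):
    """Locate the beam spot by subtracting a beam-off frame from a beam-on frame.

    Returns the (row, col) centroid of pixels that brightened by at least
    min_diff (a hypothetical threshold), or None if no pixel did.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, (row_on, row_off) in enumerate(zip(frame_on, frame_off)):
        for c, (on, off) in enumerate(zip(row_on, row_off)):
            diff = on - off
            if diff >= min_diff:
                # Weight each pixel by how much the beam brightened it.
                total += diff
                row_sum += diff * r
                col_sum += diff * c
    if total == 0:
        return None
    return (row_sum / total, col_sum / total)


def distance_from_spot(spot_col, center_col, focal_px, baseline_mm):
    """Estimate the distance to the page from the spot's offset in the image.

    Assumes a pinhole camera and a beam parallel to the optical axis at a
    lateral offset of baseline_mm, so that offset_px = focal_px * baseline_mm / Z.
    """
    offset_px = abs(spot_col - center_col)
    if offset_px == 0:
        return float("inf")  # spot at the image center: surface is very far away
    return focal_px * baseline_mm / offset_px


if __name__ == "__main__":
    # Synthetic 5x9 frames: uniform background, one bright spot when the beam is on.
    off = [[10] * 9 for _ in range(5)]
    on = [row[:] for row in off]
    on[2][6] = 200
    spot = locate_beam(on, off)
    print("spot:", spot)
    print("distance (mm):", distance_from_spot(spot[1], 4.0, 500.0, 10.0))
```

Under these assumptions the spot moves toward the image center as the page recedes, which is the predictable variation in beam location that the description relies on. A real device would need calibration of the focal length and baseline.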
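
The accelerometer-based tap gesture described above (a moderate downward acceleration followed by a sharp, impulsive upward acceleration as the finger strikes the page) can be sketched as follows. This is a hedged illustration only: the function name, the sign convention (positive values are upward), the units (g), and every threshold value are assumptions, not figures from the patent.

```python
def detect_taps(vertical_accel, down_threshold=0.3, impact_threshold=2.0, max_gap=20):
    """Return sample indices at which a tap impact is detected.

    vertical_accel: sequence of vertical acceleration samples, positive upward.
    A tap is reported when a sample at or below -down_threshold is followed,
    within max_gap samples, by a sample at or above impact_threshold.
    All thresholds are hypothetical and would need tuning on real hardware.
    """
    taps = []
    last_down = None  # index of the most recent downward-acceleration sample
    for i, a in enumerate(vertical_accel):
        if a <= -down_threshold:
            last_down = i
        elif a >= impact_threshold and last_down is not None:
            if i - last_down <= max_gap:
                taps.append(i)
            last_down = None  # require a fresh downward phase for the next tap
    return taps


if __name__ == "__main__":
    signal = [0.0, 0.0, -0.5, -0.6, -0.4, 3.0, 0.1, 0.0]
    print(detect_taps(signal))
```

Requiring the downward phase before the impact spike is what distinguishes a deliberate tap from an isolated jolt, in line with the two-phase signature in the description.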
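
The single and double "clicking" of button 137 described above could be told apart by the time between presses. The Python sketch below is one hypothetical way to do this; the function name and the 0.4 second window are assumptions, and the patent does not specify how clicks are classified.

```python
def classify_clicks(press_times, double_click_window=0.4):
    """Group button-press timestamps (in seconds, ascending) into click events.

    Two presses no more than double_click_window apart form one "double"
    event; any other press is a "single". The window is an assumed value.
    """
    events = []
    i = 0
    while i < len(press_times):
        has_next = i + 1 < len(press_times)
        if has_next and press_times[i + 1] - press_times[i] <= double_click_window:
            events.append("double")
            i += 2  # both presses are consumed by the double click
        else:
            events.append("single")
            i += 1
    return events


if __name__ == "__main__":
    print(classify_clicks([0.0, 0.3, 2.0]))
```

A live system would have to wait out the window after each press before reporting a single click, which adds a short delay to commands such as starting or stopping reading.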

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a reading system for printed text (29) with optical input and voice output (47, 63), intended for blind or visually impaired persons, in which the user provides input to the system with hand gestures. Images of the text to be read (37, 51), which the user manipulates by means of gestural commands with the fingers or hand, are input to a computer, which decodes the text images into their symbolic meanings through optical character recognition (55) and then tracks (57) the positions and movements of the hand and fingers in order to interpret the gestural movements into their command meanings. To allow the user to select text and align printed material, feedback is provided through audible and tactile means. A speech synthesizer reads the text aloud. For users with residual vision, feedback (71) is provided through magnified and image-enhanced text. Performance can be improved by using multiple cameras with the same or different fields of view. In addition, alternative device configurations of the invention can be used in portable mode, including on wearable platforms such as eyeglasses (100) or a fingertip system (123). The use of gestural commands is natural and is notable for rapid learning and ease of use. The device can also be used as an aid in learning to read, and for data input and image capture for home or business use.
EP98953891A 1997-10-22 1998-10-22 Systeme de lecture a sortie vocale avec navigation gestuelle Withdrawn EP1050010A1 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US6313597P 1997-10-22 1997-10-22
US63135P 1997-10-22
US6871397P 1997-12-29 1997-12-29
US68713P 1997-12-29
PCT/US1998/022392 WO1999021122A1 (fr) 1997-10-22 1998-10-22 Systeme de lecture a sortie vocale avec navigation gestuelle

Publications (1)

Publication Number Publication Date
EP1050010A1 true EP1050010A1 (fr) 2000-11-08

Family

ID=26743083

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98953891A Withdrawn EP1050010A1 (fr) 1997-10-22 1998-10-22 Systeme de lecture a sortie vocale avec navigation gestuelle

Country Status (4)

Country Link
EP (1) EP1050010A1 (fr)
AU (1) AU1114899A (fr)
CA (1) CA2308213A1 (fr)
WO (1) WO1999021122A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217031B (zh) * 2007-01-05 2011-02-16 林其禹 全自主式读谱及演奏音乐的机器人及方法
US20180088782A1 (en) * 2005-06-20 2018-03-29 Samsung Electronics Co., Ltd. Method for realizing user interface using camera and mobile communication terminal for the same

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2794560B1 (fr) * 1999-06-01 2001-09-21 Thomson Csf Procede d'aide a la lecture notamment pour malvoyant
JP2001091218A (ja) * 1999-09-14 2001-04-06 Mitsubishi Electric Inf Technol Center America Inc 3次元運動検知装置及び3次元運動検知方法
US6901561B1 (en) * 1999-10-19 2005-05-31 International Business Machines Corporation Apparatus and method for using a target based computer vision system for user interaction
JP4052498B2 (ja) 1999-10-29 2008-02-27 株式会社リコー 座標入力装置および方法
JP2001184161A (ja) 1999-12-27 2001-07-06 Ricoh Co Ltd 情報入力方法、情報入力装置、筆記入力装置、筆記データ管理方法、表示制御方法、携帯型電子筆記装置および記録媒体
US20030023446A1 (en) * 2000-03-17 2003-01-30 Susanna Merenyi On line oral text reader system
US6803906B1 (en) 2000-07-05 2004-10-12 Smart Technologies, Inc. Passive touch system and method of detecting user input
JP5042437B2 (ja) 2000-07-05 2012-10-03 スマート テクノロジーズ ユーエルシー カメラベースのタッチシステム
GB2381687B (en) * 2001-10-31 2005-08-24 Hewlett Packard Co Assisted reading method and apparatus
EP1570374A4 (fr) * 2002-10-16 2010-06-02 Korea Electronics Telecomm Procede et systeme de transformation adaptative d'un contenu visuel en fonction des symptomes caracteristiques de basse vision et des preferences de presentation d'un utilisateur
US6954197B2 (en) 2002-11-15 2005-10-11 Smart Technologies Inc. Size/scale and orientation determination of a pointer in a camera-based touch system
US7532206B2 (en) 2003-03-11 2009-05-12 Smart Technologies Ulc System and method for differentiating between pointers used to contact touch surface
US7274356B2 (en) 2003-10-09 2007-09-25 Smart Technologies Inc. Apparatus for determining the location of a pointer within a region of interest
US7355593B2 (en) 2004-01-02 2008-04-08 Smart Technologies, Inc. Pointer tracking across multiple overlapping coordinate input sub-regions defining a generally contiguous input region
US7460110B2 (en) 2004-04-29 2008-12-02 Smart Technologies Ulc Dual mode touch system
US8120596B2 (en) 2004-05-21 2012-02-21 Smart Technologies Ulc Tiled touch system
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
US9442607B2 (en) 2006-12-04 2016-09-13 Smart Technologies Inc. Interactive input system and method
US8094137B2 (en) 2007-07-23 2012-01-10 Smart Technologies Ulc System and method of detecting contact on a display
US8487881B2 (en) * 2007-10-17 2013-07-16 Smart Technologies Ulc Interactive input system, controller therefor and method of controlling an appliance
US8902193B2 (en) 2008-05-09 2014-12-02 Smart Technologies Ulc Interactive input system and bezel therefor
IT1390595B1 (it) 2008-07-10 2011-09-09 Universita' Degli Studi Di Brescia Dispositivo di ausilio nella lettura di un testo stampato
US8339378B2 (en) 2008-11-05 2012-12-25 Smart Technologies Ulc Interactive input system with multi-angle reflector
US8692768B2 (en) 2009-07-10 2014-04-08 Smart Technologies Ulc Interactive input system
US8577146B2 (en) 2010-04-09 2013-11-05 Sony Corporation Methods and devices that use an image-captured pointer for selecting a portion of a captured image
NO341403B1 (no) * 2012-02-27 2017-10-30 Ablecon As Hjelpemiddelsystem for synshemmede
GB2507963A (en) * 2012-11-14 2014-05-21 Renergy Sarl Controlling a Graphical User Interface
DE102014005088A1 (de) * 2014-04-08 2015-10-08 Köppern Und Eberts Ug (Haftungsbeschränkt) Lesehilfe und Verfahren zum Helfen des Lesens von Text enthaltenden Dokumenten
US20150310767A1 (en) * 2014-04-24 2015-10-29 Omnivision Technologies, Inc. Wireless Typoscope
FR3061150B1 (fr) * 2016-12-22 2023-05-05 Thales Sa Systeme de designation interactif pour vehicule, notamment pour aeronef, comportant un serveur de donnees
JP2019067166A (ja) * 2017-10-02 2019-04-25 富士ゼロックス株式会社 電子機器
CN111027556B (zh) * 2019-03-11 2023-12-22 广东小天才科技有限公司 一种基于图像预处理的搜题方法及学习设备
CN110032994B (zh) * 2019-06-10 2019-09-20 上海肇观电子科技有限公司 文字检测方法、阅读辅助设备、电路及介质
KR20220027081A (ko) 2019-06-10 2022-03-07 넥스트브이피유 (상하이) 코포레이트 리미티드 텍스트 검출 방법, 판독 지원 디바이스 및 매체
KR102373960B1 (ko) * 2021-09-10 2022-03-15 (주)웅진씽크빅 독서 지원 장치 및 이를 이용한 사용자 입력 감지 방법

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USD287021S (en) * 1984-12-26 1986-12-02 Johnson Carlos J Combined camera and eyeglasses
US5168531A (en) * 1991-06-27 1992-12-01 Digital Equipment Corporation Real-time recognition of pointing information from video
US5325123A (en) * 1992-04-16 1994-06-28 Bettinardi Edward R Method and apparatus for variable video magnification
JPH07271818A (ja) * 1994-03-31 1995-10-20 Toshiba Corp ハイパーメディアシステム
US5736978A (en) * 1995-05-26 1998-04-07 The United States Of America As Represented By The Secretary Of The Air Force Tactile graphics display

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9921122A1 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180088782A1 (en) * 2005-06-20 2018-03-29 Samsung Electronics Co., Ltd. Method for realizing user interface using camera and mobile communication terminal for the same
US10545645B2 (en) * 2005-06-20 2020-01-28 Samsung Electronics Co., Ltd Method for realizing user interface using camera and mobile communication terminal for the same
CN101217031B (zh) * 2007-01-05 2011-02-16 林其禹 全自主式读谱及演奏音乐的机器人及方法

Also Published As

Publication number Publication date
CA2308213A1 (fr) 1999-04-29
AU1114899A (en) 1999-05-10
WO1999021122A1 (fr) 1999-04-29

Similar Documents

Publication Publication Date Title
US6115482A (en) Voice-output reading system with gesture-based navigation
WO1999021122A1 (fr) Systeme de lecture a sortie vocale avec navigation gestuelle
US10741167B2 (en) Document mode processing for portable reading machine enabling document navigation
US9626000B2 (en) Image resizing for optical character recognition in portable reading machine
US8036895B2 (en) Cooperative processing for portable reading machine
US7659915B2 (en) Portable reading device with mode processing
US8284999B2 (en) Text stitching from multiple images
US7627142B2 (en) Gesture processing with low resolution images with high resolution processing for optical character recognition for a reading machine
US7629989B2 (en) Reducing processing latency in optical character recognition for portable reading machine
US7325735B2 (en) Directed reading mode for portable reading machine
US8249309B2 (en) Image evaluation for reading mode in a reading machine
US8186581B2 (en) Device and method to assist user in conducting a transaction with a machine
US20150043822A1 (en) Machine And Method To Assist User In Selecting Clothing
US20060071950A1 (en) Tilt adjustment for optical character recognition in portable reading machine
US20060017810A1 (en) Mode processing in portable reading machine
WO2005096760A2 (fr) Dispositif de lecture portatif avec traitement modal
AU709833B2 (en) Tactiley-guided, voice-output reading apparatus

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000428

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): BE CH DE DK ES FI FR GB GR IE IT LI NL SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030503