WO2019014163A1 - Handheld text scanner and reader - Google Patents

Handheld text scanner and reader

Info

Publication number
WO2019014163A1
WO2019014163A1 PCT/US2018/041363 US2018041363W WO2019014163A1 WO 2019014163 A1 WO2019014163 A1 WO 2019014163A1 US 2018041363 W US2018041363 W US 2018041363W WO 2019014163 A1 WO2019014163 A1 WO 2019014163A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
light emitting
scanning
images
lens
Prior art date
Application number
PCT/US2018/041363
Other languages
French (fr)
Inventor
Jamee MILLER
Payden MILLER
Shane PAGE
Original Assignee
Hidden Abilities, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hidden Abilities, Inc.
Publication of WO2019014163A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/142 Image acquisition using hand-held instruments; Constructional details of the instruments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present invention relates to systems and methods for machine reading of printed text. More particularly, the present invention relates to systems and methods that scan printed text and output an audio transcription of the text in real-time for individuals with dyslexia or other reading challenges.
  • Systems and methods in accordance with the present invention convert printed text to audio as a user scans the text with a camera.
  • printed text can be "read" in real-time by a handheld device with the audio corresponding to the text generated by a speaker.
  • a system in accordance with the present invention for scanning text and generating an audio signal derived from the text may comprise a case that contains at least some of the electrical and optical components of the system.
  • the case may extend from a scanning tip to a terminal end, such that the case may be gripped by a hand of a user and the scanning tip may be moved along a line of printed text that is to be scanned and read by the system.
  • At least one light emitting diode may be provided in, on, or near the scanning tip. The at least one light emitting diode may be oriented to project light from the scanning tip onto text to be scanned when the light emitting diode is activated.
  • At least two light emitting diodes may be provided in, on, or near the scanning tip, and a lens may be retained in the scanning tip oriented between the at least two light emitting diodes to focus images on a sensor within the case and located along a focal axis of the lens.
  • the sensor may detect images focused upon the sensor by the lens.
  • a memory unit may be contained in the case and may receive images from the sensor. The memory unit may retain a series of images taken by the sensor at a predetermined scan rate.
  • a processor under the control of instructions embodied in a computer readable medium retained in a non-transitory form may operate to analyze images retained by the memory unit in the order in which the images were taken to identify changes in contrast within the images and, based upon identified changes in contrast, may determine when a letter within the scanned text begins and ends, may analyze spaces between identified letters to determine when words begin and end, may convert images of words to text, and may convert the text to an audio signal reading the text.
  • a system may further comprise at least one audio speaker that outputs the audio signal to the user.
  • the lens may be oriented at an angle relative to a long-axis of the case, and further the at least one light emitting diode may be oriented at an angle relative to the focal axis of the lens.
  • the lens may be oriented at an angle of forty-five degrees from the long axis of the case, while in other examples the lens may be oriented at an angle of less than forty-five degrees, and in further examples the lens may be oriented at an angle between thirty and sixty degrees.
  • at least two light emitting diodes may be provided and may be inclined inward toward the lens, thereby permitting the light emitted by the diodes to illuminate text being focused on the sensor by the lens.
  • the at least two light emitting diodes may be inclined inward different amounts.
  • the at least two light emitting diodes may comprise exactly two light emitting diodes.
  • At least one of the light emitting diodes may comprise an RGB light emitting diode that, under the control of the processor executing instructions embodied in a computer readable medium retained in a non-transitory form, outputs information descriptive of the status of the system by changing the color of light emitted by the RGB light emitting diode.
  • the present invention may comprise a method for converting printed text into audio.
  • the method may comprise scanning printed text using a handheld device, retaining captured images in a memory device, analyzing the captured images of text sequentially to detect letters and words using a computer processor executing computer readable instructions retained in a non-transitory format, converting the captured images to text using the computer processor executing computer readable instructions retained in a non-transitory format, and producing an audio signal reading the text using a computer processor executing computer readable instructions retained in a non-transitory format.
  • scanning the printed text may comprise capturing images of the text using a lens and sensor to capture images while a scanning tip of the handheld device containing the lens and the sensor is moved along a line of text.
  • Scanning printed text may further comprise illuminating the printed text using at least two light emitting diodes affixed to the scanning tip of the handheld device.
  • Examples of methods in accordance with the present invention may further comprise outputting the audio signal to at least one speaker, which in some examples may comprise transmitting the audio signal to at least one earpiece.
  • transmitting the audio signal to at least one earpiece may comprise wirelessly transmitting the audio signal.
  • At least one of the at least two light emitting diodes may be an RGB light emitting diode that, under the control of the computer processor executing computer readable instructions retained in a non-transitory format, emits light of different colors to indicate a status of the handheld device.
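The method steps summarized above (scan, retain, analyze and convert to text, produce and output audio) can be pictured as a simple chain of stages. The following is only a minimal sketch: the function name and all four stage callables are hypothetical placeholders introduced for illustration, not components named in the publication.

```python
def convert_printed_text_to_audio(capture_frames, recognize_text,
                                  synthesize_speech, output_audio):
    """Chain the stages of the summarized method.

    All four arguments are hypothetical callables supplied by the caller:
      capture_frames()      -> iterable of images in the order captured
      recognize_text(frames)-> str
      synthesize_speech(s)  -> audio payload
      output_audio(audio)   -> sends the audio to a speaker or earpiece
    """
    # Scan the text and retain the captured images in capture order.
    frames = list(capture_frames())
    # Analyze the retained images sequentially and convert them to text.
    text = recognize_text(frames)
    # Produce an audio signal reading the text, then output it.
    audio = synthesize_speech(text)
    output_audio(audio)
    return text, audio


# Usage with stand-in stages (no real camera, OCR, or speech engine).
played = []
text, audio = convert_printed_text_to_audio(
    capture_frames=lambda: ["frame-1", "frame-2"],
    recognize_text=lambda frames: "hi",
    synthesize_speech=lambda s: b"audio:" + s.encode(),
    output_audio=played.append,
)
```

Passing the stages in as callables keeps the sketch independent of any particular recognition or speech software.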
  • the present invention may comprise a handheld scanning device for converting printed text to computer-synthesized speech in real-time.
  • the handheld scanning device may comprise at least one computer processor that executes computer readable instructions retained in a non-transitory form in at least one non-transitory media.
  • the computer readable instructions may cause the computer processor to at least convert images captured from printed text to text and then to convert the resulting text to synthesized speech.
  • the handheld scanning device may further comprise at least one sensor that captures images of printed text focused upon the at least one sensor as the handheld device is scanned along a line of printed text and at least one memory device that receives captured images from the at least one sensor and provides the received images to the at least one computer processor.
  • the handheld scanning device may comprise at least one light emitting diode that illuminates the line of printed text being scanned.
  • the at least one light emitting diode may comprise an RGB light emitting diode.
  • the at least one light emitting diode may comprise at least two light emitting diodes.
  • Exemplary handheld scanning devices may further comprise at least one speaker that produces the synthesized speech.
  • a case may retain the at least one computer processor, the at least one memory device, and the at least one sensor within the interior of the case, the case further retaining the at least one or, alternatively, at least two light emitting diodes in a position to illuminate the line of printed text being scanned.
  • a handheld scanning device in accordance with the present invention may further comprise a lens positioned between at least two light emitting diodes that focuses an image of the line of text being scanned onto the sensor.
  • the at least two light emitting diodes may be oriented in a non-parallel fashion with the focal axis of the lens, such that the light emitted by each of the at least two light emitting diodes at least partially falls on the extension of the focal axis of the lens on the line of text being scanned.
  • the at least two light emitting diodes may comprise two light emitting diodes oriented at different angles relative to the focal axis of the lens.
  • a handheld scanning device may provide a wireless communication interface that transmits the synthesized speech from the at least one computer processor to the at least one speaker, and the at least one speaker may comprise an earpiece.
  • the handheld scanning device may further comprise at least one rechargeable battery retained within the case, the at least one rechargeable battery powering the at least one computer processor and the at least two light emitting diodes while the handheld scanning device is in use.
  • FIG. 1 schematically illustrates an exemplary handheld text scanning device in accordance with the present invention
  • FIG. 2 illustrates an example scanning tip of a handheld text scanning device in accordance with the present invention
  • FIG. 3 further illustrates an example scanning tip of a handheld scanning device in accordance with the present invention
  • FIG. 4 illustrates an exemplary handheld text scanning device in accordance with the present invention in use to scan printed text
  • FIG. 5 illustrates an example of text scanned and analyzed by a handheld text scanning device in accordance with the present invention
  • FIG. 6 illustrates an example of a scanning tip of a handheld scanning device in accordance with the present invention scanning text
  • FIG. 7 schematically illustrates an exemplary handheld text scanning device in accordance with the present invention
  • FIG. 8 illustrates an example of a method in accordance with the present invention.
  • FIG. 9 illustrates a further example of a scanning tip of a handheld scanning device in accordance with the present invention.
  • the present invention provides systems and methods for scanning printed text and generating audio that reads the text for a user.
  • the present invention particularly addresses the needs of students with dyslexia or other conditions impacting the ability to read printed materials, but also may benefit individuals of various ages and circumstances. While solutions like audio books or text-to-speech of digital files may provide some benefit to individuals who struggle to read printed text, systems and methods in accordance with the present invention do not require the material to be prepared in advance of scanning.
  • the present invention does not even require a digital file of the text, as systems and methods in accordance with the present invention prepare a digital file as text is scanned using a handheld device in accordance with the present invention.
  • the present invention may be used to read a library book immediately after it is plucked from the shelf, a restaurant menu, a textbook, or any other type of printed text.
  • Because the scanner used in accordance with the present invention may be handheld, a user may select the specific portion of printed material to read. For example, a student may choose to read the caption of a textbook figure, or a diner may choose to read the entrees section of a menu. If a statement is interesting or unclear to a user, the present invention permits that individual to re-read the text by simply re-scanning it.
  • Systems and methods in accordance with the present invention may provide a handheld device with an optical sensor that receives images of text.
  • Optical sensors used in accordance with the present invention may provide a pixel array and an associated integrated circuit that produces an image based upon the interaction of the pixels with electromagnetic radiation, such as visible light, incident upon each individual pixel.
  • Examples of optical sensors that may receive an image in systems and methods in accordance with the present invention are active pixel sensors and charge-coupled devices (CCDs).
  • Examples of active pixel sensors that may be used as optical sensors in accordance with the present invention are complementary metal-oxide-semiconductor (CMOS) sensors.
  • a sensor may comprise a linear sensor array.
  • Images of text may be focused on an optical sensor using a pinhole, a lens, or multiple lenses.
  • a lens used to focus an image of text may have a diameter of 8 millimeters and a focal length of 8 millimeters, radius of curvature of 7.84 millimeters, a thickness of approximately 2 millimeters, and a magnification of four times, while being located 2 millimeters from the sensor used to detect images.
  • other lens geometries may be used in accordance with the present invention.
  • a series of sequential images of text may be captured by the sensor as a scanning tip of a device is moved along a line of text. Those sequential images may be retained in computer memory and processed to identify characters and words within the images of scanned text.
  • Images of the text may be converted to text. Audio reading that text may be generated and output to one or more speaker.
  • the one or more speaker used to produce audio may be integral with the handheld scanner or may comprise an external speaker(s).
  • An external speaker(s) may comprise one or more earbud or headset that discreetly provides audio to the ear(s) of a user.
  • Systems and methods in accordance with the present invention may provide at least one light emitting diode (LED) that illuminates text for scanning.
  • the at least one LED may be provided on the scanning tip of a handheld device in accordance with the invention near a lens used to focus images of text upon the optical sensor.
  • the at least one LED may comprise at least two LEDs positioned on opposing sides of a lens.
  • any number of LEDs, such as four or more, may be used to illuminate text as it is scanned.
  • One or more LED provided may be oriented at an angle relative to the focal axis of the lens in order to emit light on text being scanned using the lens. In situations where two or more LEDs are used to illuminate text, different LEDs may be inclined at different angles relative to the focal axis of a lens.
  • the at least one LED used to illuminate text in accordance with the present invention may be of any color, but in some examples one or more white LED may be used.
  • at least one white LED may be used in conjunction with at least one RGB LED.
  • the at least one RGB LED may be used to produce white light (by simultaneously generating red, green, and blue light in the appropriate relative luminosities) during the scanning of text, but the at least one RGB LED may be further used to output information regarding the handheld device to a user.
  • the RGB LED may emit red light to indicate that a battery powering the device requires charging, may emit green light to indicate that the battery is fully charged, may emit a blue light to indicate that the device is powering on, etc.
  • a variety of states may be communicated to a user using various light colors produced by the RGB LED (which may, of course, comprise more than red, green, or blue colors) and patterns of flashes of light of different colors.
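The status signalling described above amounts to a lookup from device state to LED color. Below is a minimal sketch of that mapping. The `DeviceStatus` names, the `rgb_for_status` helper, and the specific 0-255 channel values are assumptions made for illustration; in particular, equal full-scale channels are only a stand-in for the "appropriate relative luminosities" needed for white light.

```python
from enum import Enum


class DeviceStatus(Enum):
    """Hypothetical device states, named after the examples in the text."""
    BATTERY_LOW = "battery_low"      # battery requires charging
    BATTERY_FULL = "battery_full"    # battery is fully charged
    POWERING_ON = "powering_on"      # device is powering on
    SCANNING = "scanning"            # LED illuminates text with white light


# (red, green, blue) channel intensities on an assumed 0-255 scale.
STATUS_COLORS = {
    DeviceStatus.BATTERY_LOW: (255, 0, 0),      # red
    DeviceStatus.BATTERY_FULL: (0, 255, 0),     # green
    DeviceStatus.POWERING_ON: (0, 0, 255),      # blue
    DeviceStatus.SCANNING: (255, 255, 255),     # all channels on: white
}


def rgb_for_status(status):
    """Return the (r, g, b) tuple the RGB LED should show for a status."""
    return STATUS_COLORS[status]


color = rgb_for_status(DeviceStatus.BATTERY_LOW)
```

Further states, other colors, or flash patterns could be added to the table without changing the lookup.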
  • a handheld device in accordance with the present invention may be powered by one or more rechargeable battery.
  • a battery may be chargeable using an interface such as one of the various Universal Serial Bus (USB) standards, which may permit both charging of a battery and the exchange of data between computing components of the handheld device and external computing devices.
  • other types of connections may be used for charging and/or data exchange, such as wireless connections. If a data exchange connection, whether wired or wireless, is provided in conjunction with a handheld device in accordance with the present invention, the connection may be used to provide software updates and/or provide maintenance for the handheld device and its various components.
  • Systems and methods in accordance with the present invention may use one or more computer processor executing computer readable code retained in a non-transitory medium to execute methods in accordance with the present invention.
  • Such methods may receive images of text captured by the optical sensor and may analyze those images to identify the abrupt changes in an image indicating the boundary of a letter or other character of text. Based upon the distance between the boundaries of characters, individual words within the text may be identified. Images of words, letters, or other characters may be used to generate text, for example using optical character recognition (OCR) techniques.
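One simple way to picture the boundary detection described above is a column scan: a column that contains any dark pixel belongs to a character, and a transition between "ink" and "no ink" columns marks a character boundary. The sketch below is an illustration only, not the publication's algorithm. The function name, the 0-255 grayscale convention, and the fixed threshold of 128 are assumptions.

```python
def find_letter_spans(image, threshold=128):
    """Return (start, end) column spans of characters, end exclusive.

    `image` is a list of rows, each a list of grayscale values where
    0 is black ink and 255 is white paper (an assumed convention).
    A column counts as ink if any pixel in it is darker than `threshold`.
    """
    if not image:
        return []
    width = len(image[0])
    has_ink = [any(row[x] < threshold for row in image) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                     # paper -> ink: a character begins
        elif not ink and start is not None:
            spans.append((start, x))      # ink -> paper: the character ends
            start = None
    if start is not None:                 # character touches the right edge
        spans.append((start, width))
    return spans


W, B = 255, 0
sample = [
    [W, B, B, W, B, W, W, W, B, B],
    [W, W, W, W, W, W, W, W, W, W],
]
spans = find_letter_spans(sample)
```

For `sample`, the ink columns are 1, 2, 4, 8, and 9, so the result is three spans: `(1, 3)`, `(4, 5)`, and `(8, 10)`.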
  • One example of optical character recognition software that may be used in accordance with the present invention is OpenCV, a library of functions available for use in accordance with an open source license, but systems and methods in accordance with the present invention may use any type of optical character recognition techniques and/or software.
  • the text may then be processed using text-to-speech techniques, and the resulting audio may be output to one or more speaker to be heard by the user.
  • Various types of software and/or techniques may be used to convert text to speech, such as software known as "Acapela Text to Speech" available from Acapela Group France.
  • a handheld text scanning device in accordance with the present invention may have a resilient case, such as may be made from molded plastic, nylon, or other materials.
  • a case may contain within it or retain upon its surface at least some of the components of a device in accordance with the present invention. Electrical and/or computing components may be retained within the case, and may comprise components such as integrated circuit boards, computer memory devices (such as RAM and ROM), sensors, switches, batteries, buses, connectors, antennae, ports, processors, and/or other devices, as well as structures to support and/or retain the components.
  • the case may further retain a lens in a scanning tip, and may retain at least one LED on the exterior of the scanning tip while permitting appropriate electrical connections to power and control the at least one LED.
  • a case may be molded for easy use as a handheld device, for example having an elongated structure permitting it to be held comfortably like a pencil by the user scanning text.
  • FIG. 1 illustrates one example of a device 100 in accordance with the present invention.
  • a handheld text scanning device 100 may have a first end or scanning tip 110 that may retain at least one LED and a lens as described in examples below.
  • the opposing end 120 of the device 100 may be located opposite of the scanning tip 110 on a long axis 130 of the device 100.
  • the scanning tip may have a face inclined at an angle 150 relative to the long axis 130 of the device 100.
  • the angle 150 of inclination of a scanning tip 110 may be between thirty and sixty degrees, while in other examples the angle 150 of inclination may comprise forty-five degrees.
  • FIG. 2 illustrates an example scanning tip 110 of a device in accordance with the present invention in greater detail. While in some examples a pinhole may be used to focus images of text, in many examples a lens 210 may be provided to focus images of text on an optical sensor 220 contained within the case 140. In some examples, a lens assembly 215 may retain both the lens 210 and the optical sensor 220, and the assembly 215 may be retained in a corresponding hole or opening provided in the case 140 at the scanning tip 110.
  • a camera board 240 or other circuitry, which may include an appropriate microcontroller, may be provided in conjunction with the optical sensor 220 to receive and initially process images captured by the optical sensor 220.
  • images captured by the optical sensor 220 may be received from the camera board 240 by other electrical and/or computing components of the device for additional processing in accordance with the present invention.
  • camera board 240 may be omitted, for example by combining the circuitry of camera board 240 with other components of the device.
  • the first LED 230 may be oriented to project light at a first angle 231 relative to the face of the scanning tip 110, and the second LED 232 may be oriented to project light at a second angle 233.
  • the lens 210 may be co-planar or parallel to the face of the scanning tip 110, in which case the focal axis of the lens 210 may be perpendicular to the face of the scanning tip 110, and in such an example the magnitudes of first angle 231 and the second angle 233 will determine the angles of the light emitted by the respective first LED 230 and second LED 232 with respect to the focal axis of the lens 210.
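When the focal axis is perpendicular to the face of the scanning tip, the relationship between the two angle conventions is plain geometry: an LED aimed at an angle θ from the face is aimed at |90° − θ| from the focal axis. The sketch below illustrates this under an assumption about how angles 231 and 233 are measured (in degrees from the plane of the face, 0 to 180). The function name is hypothetical.

```python
def angle_to_focal_axis(angle_to_face_deg):
    """Convert an LED angle measured from the tip face into an angle
    measured from the focal axis, assuming the focal axis is
    perpendicular to the face.
    """
    if not 0.0 <= angle_to_face_deg <= 180.0:
        raise ValueError("angle must be between 0 and 180 degrees")
    return abs(90.0 - angle_to_face_deg)


# An LED inclined 60 degrees from the face is 30 degrees off the focal axis.
first_led_offset = angle_to_focal_axis(60.0)
```

Two LEDs with different angles to the face therefore end up at different angles to the focal axis, as in the examples where the LEDs are inclined inward by different amounts.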
  • FIG. 9 illustrates an example of a scanning tip for a handheld device in accordance with the present invention wherein a first LED 930 and a second LED 932 are retained within the interior of the case 940 of the handheld scanning device 910. While FIG. 9 illustrates an example in accordance with the present invention that provides two LEDs, more or fewer LEDs may be provided. While some or all of the LEDs provided may comprise white LEDs, at least one of the provided LEDs may optionally comprise an RGB LED for use in indicating an operational status to a user. While at least one LED may be oriented to project light at a desired location by securing the LED to the exterior of a scanning tip, one or more LED may be oriented to project light at a desired location using at least one fiber optic element.
  • a first fiber optic element 950 transmits light emitted by the first LED 930 through a first opening in the case 940, while a second fiber optic element 952 transmits light emitted by the second LED 932 through a second opening in the case 940.
  • Fiber optic elements 950, 952 may terminate on opposing sides of lens 910. The terminal ends of the fiber optic elements 950, 952 may be flush with the exterior of the scanning tip, may protrude from the scanning tip, or may be recessed within the scanning tip. In an example such as depicted in FIG. 9, a fiber optic element may be provided to correspond with each of the LEDs retained in the case.
  • any number of LEDs may be retained within a case with light emitted by the LED transmitted using a fiber optic element, such as at least one LED and fiber optic element, at least two LEDs and fiber optic elements, etc.
  • one or more LED may be positioned entirely or partially on the exterior of the case, while at least one additional LED may be retained within the case with a corresponding fiber optic element transmitting light from the LED out of the case to illuminate text to be scanned.
  • fiber optic elements may be rigid or flexible. If flexible fiber optic elements are used to transmit light from one or more LED, the one or more LED within the case need not be oriented along the axis of desired illumination from that LED.
  • a fiber optic element may be retained at a desired location within a case by providing indentations within the casing, such as a rib or ribs provided within the interior of the case. Adhesives may additionally or alternatively be used to retain a fiber optic element at a desired location within the case. An adhesive may be used to affix a fiber optic element to the corresponding LED and/or at the location of the opening of the case or other aperture through which the light will be transmitted to illuminate text to be scanned.
  • FIG. 3 illustrates an example of the scanning tip 110 from along the focal axis of the lens 210.
  • the first LED 230 and the second LED 232 may be on opposing sides of the lens 210, such that the first LED 230 is above the lens 210 when the device is held for scanning text and the second LED 232 is below the lens 210 when the device is held for scanning text.
  • additional LEDs such as a third LED 234 and a fourth LED 236 may also be provided, for example on opposing lateral sides of lens 210.
  • both the number and arrangement of LEDs illustrated in FIG. 3 are exemplary only. In many examples in accordance with the present invention, only the first LED 230 and the second LED 232 are provided, while in further examples only a single LED is provided.
  • various numbers of LEDs may be evenly spaced around the lens 210, while in other examples various numbers of LEDs may be unevenly spaced around the lens 210. In some examples, LEDs may be omitted or replaced with other types of light sources.
  • FIG. 4 illustrates an example of text 415 printed on a substrate 410, such as paper, being scanned by a device 100 held by a hand 420 of a user. Audio generated based upon the scanned text 415 may be output to a speaker such as earbud 430 to produce sound 435 discernable by the user.
  • FIG. 4 is exemplary only, and the distance illustrated between the text 415 and the scanning tip of the device 100 is exaggerated beyond what will be needed in some examples in order to show the entirety of the exemplary text 415.
  • While an earbud 430 operating using a wireless connection, such as a connection using the Bluetooth protocol, with device 100 is illustrated, a speaker used to output sound may take other forms.
  • a speaker may be integral to the device 100 or may be connected through a cord or cable to device 100.
  • a speaker used in conjunction with the present invention may comprise any type of external speaker, and in further examples multiple speakers may be used to output audio of the scanned text being read.
  • FIG. 5 illustrates an example of a first word 510 and a second word 550 such as may be scanned using systems and methods in accordance with the present invention.
  • the first word 510 comprises a first letter 512, a second letter 514, and a third letter 516, with a first space 522 between the first letter 512 and the second letter 514 and a second space 524 between the second letter 514 and the third letter 516.
  • the second word 550 may comprise a first letter 552, a second letter 554, a third letter 556, and a fourth letter 558, separated by a first space 562, second space 564, and third space 566, respectively.
  • a space 540 may separate the first word 510 from the second word 550, that is to say the last letter 516 of the first word 510 and the first letter 552 of the second word 550, such that space 540 separating words is larger than the other spaces (i.e., spaces 522, 524, 526, 562, 564, 566, 568) separating letters within a word.
  • the boundaries of individual letters or other types of characters, if present
  • the images of text may be broken into individual words for further processing.
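Because the space separating words is larger than the spaces separating letters within a word, letters can be grouped into words by comparing each gap to a cutoff. The following is a minimal sketch of that idea. The function name, the use of the median gap as a baseline, and the `gap_factor` of 1.5 are assumptions chosen for illustration, not values from the publication.

```python
from statistics import median


def group_letters_into_words(spans, gap_factor=1.5):
    """Group letter spans into words.

    `spans` is a left-to-right list of (start, end) column positions,
    end exclusive. A gap wider than `gap_factor` times the median gap
    is treated as a space between words (an assumed heuristic).
    """
    if not spans:
        return []
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    if not gaps:
        return [list(spans)]              # a single letter is one word

    cutoff = gap_factor * median(gaps)
    words = [[spans[0]]]
    for gap, span in zip(gaps, spans[1:]):
        if gap > cutoff:
            words.append([span])          # wide gap: start a new word
        else:
            words[-1].append(span)        # narrow gap: same word
    return words


# Seven letters laid out like FIG. 5: three letters, a wide space, four letters.
letters = [(0, 4), (6, 10), (12, 16), (24, 28), (30, 34), (36, 40), (42, 46)]
words = group_letters_into_words(letters)
word_lengths = [len(w) for w in words]
```

Here the gaps are 2, 2, 8, 2, 2, 2 columns, so only the 8-column gap exceeds the cutoff of 3 and the letters split into a three-letter word and a four-letter word.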
  • words of various lengths beyond those depicted in FIG. 5 may be scanned in accordance with the present invention.
  • While the text depicted in FIG. 5 comprises two words and seven letters, the images captured while a device in accordance with the present invention is moved along a line of text will often contain far fewer letters than shown in FIG. 5.
  • the optical sensor may capture images at a rate that provides images of only single letters or portions of single letters at a time, but by computationally stitching these series of images together entire words and sets of words may be identified and captured in the images.
  • accelerometers or other devices may be provided to enable the device to measure the rate at which the device is being moved by a user across a line of text, but in many examples accelerometers may be omitted.
  • Accelerometers or other components may be particularly unnecessary when the rate at which the optical sensor captures images is rapid enough to provide substantial overlap between subsequent image captures, thereby permitting the images to be readily stitched together.
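The stitching described above relies on overlap between consecutive captures. A minimal sketch of the idea: for each new frame, find the longest run of leading columns that matches the trailing columns already stitched, then append only the new columns. The function name and the exact-match comparison are assumptions; real sensor data is noisy and would need a tolerance-based comparison such as correlation.

```python
def stitch_frames(frames, min_overlap=1):
    """Stitch overlapping frames into one strip of columns.

    Each frame is a sequence of columns captured left to right. A column
    can be any comparable value, for example a tuple of pixel intensities.
    Overlap is detected by exact equality (an assumed simplification).
    """
    stitched = []
    for frame in frames:
        frame = list(frame)
        overlap = 0
        max_k = min(len(stitched), len(frame))
        # Try the largest possible overlap first and shrink until it matches.
        for k in range(max_k, min_overlap - 1, -1):
            if stitched[-k:] == frame[:k]:
                overlap = k
                break
        stitched.extend(frame[overlap:])   # append only the unseen columns
    return stitched


# Integers stand in for image columns; each frame overlaps the previous by two.
strip = stitch_frames([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7]])
```

With the sample frames the result is the seven columns 1 through 7 in order. If no overlap is found, the whole frame is appended, which is where a motion estimate such as an accelerometer reading could otherwise help.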
  • While the present examples use printed English, which is read left-to-right, the present invention may also be adapted for use with languages read right-to-left, top-to-bottom, or bottom-to-top.
  • FIG. 6 shows an example of a single letter 552 being scanned by a handheld device in accordance with the present invention.
  • a lens 210 may have a focal axis 610, and the device may be positioned such that letter 552 is on or near the focal axis 610.
  • an optical sensor 220, also positioned along the focal axis 610 of lens 210, may have an image of the letter 552 focused upon the pixels of the sensor 220.
  • the letter 552 (as well as other text characters) may be illuminated as the text is scanned by the first LED 230 and the second LED 232.
  • FIG. 7 schematically illustrates one exemplary arrangement of components that may be used to scan printed text using a device 100 in accordance with the present invention.
  • an optical sensor 220 and a camera board 240 may receive images of text as it is scanned by moving the scanning tip of a device 100 along a line of text.
  • a bus 790 or other connection mechanism(s) may operably connect the various components described in order to permit them to exchange information and to receive power from a battery 760 and a power controller 750.
  • a temporary memory device 710, such as random access memory, may serve as a buffer to temporarily retain images or text generated from images, audio generated from text, or other transient information produced in accordance with the present invention.
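The buffering role of temporary memory can be illustrated with a bounded first-in, first-out queue that hands frames to the processor in the order they were captured. This is only a sketch: the `FrameBuffer` class, its capacity parameter, and the policy of discarding the oldest frame when full are assumptions, not details given in the publication.

```python
from collections import deque


class FrameBuffer:
    """Bounded FIFO buffer for captured frames (hypothetical helper)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen silently drops the oldest item when full.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Store a newly captured frame."""
        self._frames.append(frame)

    def pop_oldest(self):
        """Return the oldest retained frame, or None if the buffer is empty."""
        return self._frames.popleft() if self._frames else None

    def __len__(self):
        return len(self._frames)


buffer = FrameBuffer(capacity=2)
for frame in ("frame-a", "frame-b", "frame-c"):
    buffer.push(frame)
oldest = buffer.pop_oldest()   # "frame-a" was dropped when "frame-c" arrived
```

Reading frames with `pop_oldest` preserves capture order, which matches the requirement that images be analyzed in the order in which they were taken.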
  • An interface 740 may provide a connection with an audio device.
  • interface 740 may comprise a Bluetooth module (which may include an antenna) permitting audio to be transmitted from the interface 740 to a wireless earbud.
  • a permanent memory device 730 may contain computer readable code in a non-transitory form that causes the various components of device 100, some of which are depicted in the example of FIG. 7, to execute methods in accordance with the present invention.
  • permanent memory 730 may be updatable, such as to provide software updates, upgrades, bug fixes, patches, etc., to the device 100.
  • Permanent memory may comprise flash memory such as a 15nm EMMC04G-M627-X02U flash memory unit available from Kingston Technology.
  • a processing unit 720 may comprise a computer processor executing the instructions retained in permanent memory 730 and interfacing with the other components (such as sensor 220 and camera board 240, temporary memory 710, etc.) of device 100 to perform methods in accordance with the present invention. While a variety of computer processors may be used for processing unit 720, one example of an acceptable processor is an embedded processor such as a Texas Instruments AM3358BZCZ100.
  • a first LED 230 and a second LED 232 may be activated or, if one is an RGB LED used to indicate a status of the device 100, controlled by processor 720 in accordance with computer readable instructions retained in permanent memory 730.
  • a charging port 780 may permit a battery 760 to be charged.
  • Battery 760 may comprise a lithium ion or other rechargeable battery, although in some examples disposable batteries may be used in conjunction with a scanning device in accordance with the present invention.
  • a button 770 may be used to activate or deactivate the device 100. When activated, LEDs 230, 232 may be illuminated, and optical sensor 220 may be actively capturing images which are then processed by the various components of device to identify letters and words within those images to convert to text and ultimately audio.
  • a variety of optical sensors may be used in accordance with the present invention, such as a TSL1401CL available from Mouser Electronics.
  • FIG. 8 depicts one example of a method 800 in accordance with the present invention. Method 800 may be implemented using handheld text scanning devices in accordance with the present invention, and may be executed at least in part by one or more computer processor executing computer readable code retained in a non-transitory form.
  • In step 810, sequential images of text may be captured along a line of text.
  • Step 810 may occur as the scanning tip of a handheld device in accordance with the present invention is moved along the line of text by a user. Images of the printed text may be captured by an optical sensor as the device is moved. The printed text may be illuminated, for example by at least one LED, as step 810 is performed.
  • In step 820, the images may be retained in memory, and in step 830 the images may be analyzed to identify areas of high contrast corresponding to the boundaries of characters in the text.
  • In step 840, images of characters may be identified in the images, as well as the distances between characters.
  • In step 850, based upon the linear distance between characters, individual letters and words may be identified within the images. Steps 820, 830, and 840 may involve stitching multiple sequential captured images into one or more larger images for processing.
  • In step 860, the images of the words may be converted to text via techniques such as optical character recognition.
  • In step 870, the text may be converted to audio, for example using text-to-speech techniques.
  • In step 880, the audio may be output using a speaker, thereby reading the scanned text to the user.
  • each of the steps of method 800 may be performed simultaneously as lines of text are scanned in accordance with the present invention.
  • a continuous series of images are available for conversion to text in step 860, which then provides a continuous series of text to be converted to audio in step 870 and output for a user in step 880.
  • Additional steps beyond those depicted in the example of FIG. 8 may be added to method 800 and/or other methods in accordance with the present invention.
  • at least one LED that may optionally be used to illuminate text to be scanned in step 810 may comprise an RGB LED that alters the color of light output in order to provide information regarding the status of a device.
  • Examples of other optional steps include checking for software updates when connected to an external computing device by a port or a wireless connection, connecting wirelessly to one or more speaker (such as a Bluetooth earbud), detecting a scan rate at which a user is scanning text (for example, using an accelerometer and/or by analyzing the degree of overlap for consecutive images), etc.
  • systems and methods in accordance with the present invention are not limited to these examples.
  • the present invention may be practiced using a pinhole instead of a lens or, in other examples, multiple lenses.
  • more than a single optical sensor may be used to capture images, and various types of optical sensors may be used in accordance with the present invention. While depicted as powered by a rechargeable battery, devices in accordance with the present invention may be powered through other means, such as using an electrical plug-in.
  • Various dimensions may be used for devices in accordance with the present invention.
  • different intended users may prefer or require different dimensions.
  • a young schoolchild may prefer a smaller device than an adult.
  • Cases may enclose and/or carry at least some of the components of a device in accordance with the present invention.
  • the material used to create the case, and the manner in which the case is fabricated and assembled, may vary.
  • a case may be formed from a pair of injection molded plastic pieces that mate to retain the various operable components.
  • the interior of the case may be configured to receive and hold the components of the device, such as printed circuit boards, integrated circuits, memory units, optical sensors, processors, batteries, etc.
  • Additional components such as additional input or output devices for use in controlling a device in accordance with the present invention may be provided beyond those described in examples herein.
  • further input mechanisms such as various buttons, dials, switches, touchscreens, or other input mechanisms may be provided to receive user input to control a device.
  • the output mechanisms comprise at least one speaker and, optionally, an RGB LED that alters its output based upon the state of the device, but other types of output mechanisms, such as display screens, additional lights (such as further LEDs), gauges, vibrational generators, etc., may also be provided.
  • a single device may be capable of use to scan text from a variety of languages, with a user being able to use an input mechanism to select which language is being scanned.
  • a device in accordance with the present invention may be wirelessly paired with software, such as an "app," operating on a computing device such as a mobile phone, and the software may provide an interface the user may use to select the desired language.
  • translation capabilities may be provided, by using machine translation of text from the language scanned to a second language, with the second language being used to produce audio.
  • a device in accordance with the present invention need not connect to a computer network such as an intranet or the internet.
  • a device in accordance with the present invention may connect to a network using a port or an appropriate wireless protocol, such as one of the 802.11 family of protocols.
  • a network connection may be useful to provide updates to the software of a device in accordance with the present invention from a remote server, and may further be used to provide enhanced functionality such as translation capabilities.
  • such connectivity is optional.
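The list above mentions stitching sequential captured images into a larger image and detecting a scan rate by analyzing the degree of overlap between consecutive images. The following is a minimal Python sketch of that idea, not the method of the invention. It assumes each frame is a list of column values and that overlapping columns match exactly; a real device would likely need a tolerance-based comparison such as correlation. The function names, the frame rate, and the column width are hypothetical.

```python
def overlap_columns(prev, curr):
    """Return the largest k such that the last k columns of prev equal the first k of curr."""
    for k in range(min(len(prev), len(curr)), 0, -1):
        if prev[-k:] == curr[:k]:
            return k
    return 0


def stitch(frames):
    """Join consecutive frames into one sequence, dropping the overlapping columns."""
    if not frames:
        return []
    stitched = list(frames[0])
    for prev, curr in zip(frames, frames[1:]):
        stitched.extend(curr[overlap_columns(prev, curr):])
    return stitched


def scan_speed(prev, curr, frame_rate_hz, column_width_mm):
    """Estimate scan speed in mm/s from the number of new columns per frame."""
    new_columns = len(curr) - overlap_columns(prev, curr)
    return new_columns * column_width_mm * frame_rate_hz


# Three hypothetical frames; integers stand in for whole pixel columns.
frames = [[1, 2, 3, 4], [3, 4, 5, 6], [6, 7, 8, 9]]
print(stitch(frames))                               # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(scan_speed(frames[0], frames[1], 100, 0.5))   # 100.0
```

A smaller overlap between consecutive frames indicates a faster scan, which is why the same overlap measurement can serve both stitching and scan-rate detection.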

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)
  • Facsimile Scanning Arrangements (AREA)
  • Image Input (AREA)

Abstract

A handheld text scanner may capture images of printed text. The images may be converted to text, and the text may then be converted to audio. The audio may be output using a speaker, such as an earbud. At least one LED may illuminate the text being scanned. An LED may comprise an RGB LED that alters the color of light output to communicate information regarding the status of the device.

Description

HANDHELD TEXT SCANNER AND READER
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of United States provisional patent application Serial Number 62/604,638, entitled "User operated hand-held text scanner with audio playback," filed on July 12, 2017, and United States non-provisional patent application Serial Number 15/872,742, entitled "Handheld Text Scanner and Reader," both of which are incorporated herein by reference.
FIELD OF INVENTION
[0002] The present invention relates to systems and methods for machine reading of printed text. More particularly, the present invention relates to systems and methods that scan printed text and output an audio transcription of the text in real-time for individuals with dyslexia or other reading challenges.
BACKGROUND AND DESCRIPTION OF THE RELATED ART
[0003] Conditions such as dyslexia make reading text difficult for individuals. Because most school curriculums require reading of texts and other materials, individuals with dyslexia can struggle academically from elementary school into college and beyond. Even outside of educational systems, functioning in work environments or simply interacting in daily life often requires an individual to read printed text, which is a challenge if that individual has dyslexia.
SUMMARY OF THE INVENTION
[0004] Systems and methods in accordance with the present invention convert printed text to audio as a user scans the text with a camera. In accordance with the present invention, printed text can be "read" in real-time by a handheld device with the audio corresponding to the text generated by a speaker.
[0005] In some examples, a system in accordance with the present invention for scanning text and generating an audio signal derived from the text may comprise a case that contains at least some of the electrical and optical components of the system. The case may extend from a scanning tip to a terminal end, such that the case may be gripped by a hand of a user and the scanning tip may be moved along a line of printed text that is to be scanned and read by the system. At least one light emitting diode may be provided in, on, or near the scanning tip. The at least one light emitting diode may be oriented to project light from the scanning tip onto text to be scanned when the light emitting diode is activated. In some examples, at least two light emitting diodes may be provided in, on, or near the scanning tip, and a lens may be retained in the scanning tip oriented between the at least two light emitting diodes to focus images on a sensor within the case and located along a focal axis of the lens. The sensor may detect images focused upon the sensor by the lens. A memory unit may be contained in the case and may receive images from the sensor. The memory unit may retain a series of images taken by the sensor at a predetermined scan rate. A processor under the control of instructions embodied in a computer readable medium retained in a non-transitory form may operate to analyze images retained by the memory unit in the order in which the images were taken to identify changes in contrast within the images and, based upon identified changes in contrast, may determine when a letter within the scanned text begins and ends, may analyze spaces between identified letters to determine when words begin and end, may convert images of words to text, and may convert the text to an audio signal reading the text. A system may further comprise at least one audio speaker that outputs the audio signal to the user. 
In some example systems in accordance with the present invention, the lens may be oriented at an angle relative to a long-axis of the case, and further the at least one light emitting diode may be oriented at an angle relative to the focal axis of the lens. In some further examples in accordance with the present invention, the lens may be oriented at an angle of forty-five degrees from the long axis of the case, while in other examples the lens may be oriented at an angle of less than forty-five degrees, and in further examples the lens may be oriented at an angle between thirty and sixty degrees. In some examples, at least two light emitting diodes may be provided and may be inclined inward toward the lens, thereby permitting the light emitted by the diodes to illuminate text being focused on the sensor by the lens.
[0006] In further examples of a system for scanning text and generating an audio signal derived from the text in accordance with the present invention, the at least two light emitting diodes may be inclined inward different amounts. In yet further examples in accordance with the present invention, the at least two light emitting diodes may comprise exactly two light emitting diodes. At least one of the light emitting diodes may comprise an RGB light emitting diode that, under the control of the processor executing instructions embodied in a computer readable medium retained in a non-transitory form, outputs information descriptive of the status of the system by changing the color of light emitted by the RGB light emitting diode.
[0007] In some examples, the present invention may comprise a method for converting printed text into audio. In some such examples, the method may comprise scanning printed text using a handheld device, retaining captured images in a memory device, analyzing the captured images of text sequentially to detect letters and words using a computer processor executing computer readable instructions retained in a non-transitory format, converting the captured images to text using the computer processor executing computer readable instructions retained in a non-transitory format, and producing an audio signal reading the text using a computer processor executing computer readable instructions retained in a non-transitory format. In some examples, scanning the printed text may comprise capturing images of the text using a lens and sensor to capture images while a scanning tip of the handheld device containing the lens and the sensor is moved along a line of text. Scanning printed text may further comprise illuminating the printed text using at least two light emitting diodes affixed to the scanning tip of the handheld device. Examples of methods in accordance with the present invention may further comprise outputting the audio signal to at least one speaker, which in some examples may comprise transmitting the audio signal to at least one earpiece. In some examples, transmitting the audio signal to at least one earpiece may comprise wirelessly transmitting the audio signal. In yet further examples of methods in accordance with the present invention, at least one of the at least two light emitting diodes may be an RGB light emitting diode that, under the control of the computer processor executing computer readable instructions retained in a non-transitory format, emits light of different colors to indicate a status of the handheld device.
[0008] In some examples, the present invention may comprise a handheld scanning device for converting printed text to computer-synthesized speech in real-time. In examples, the handheld scanning device may comprise at least one computer processor that executes computer readable instructions retained in a non-transitory form in at least one non-transitory media. The computer readable instructions may cause the computer processor to at least convert images captured from printed text to text and then to convert the resulting text to synthesized speech. The handheld scanning device may further comprise at least one sensor that captures images of printed text focused upon the at least one sensor as the handheld device is scanned along a line of printed text and at least one memory device that receives captured images from the at least one sensor and provides the received images to the at least one computer processor. In further examples, the handheld scanning device may comprise at least one light emitting diode that illuminates the line of printed text being scanned. In further examples, the at least one light emitting diode may comprise an RGB light emitting diode. In yet further examples, the at least one light emitting diode may comprise at least two light emitting diodes. Exemplary handheld scanning devices may further comprise at least one speaker that produces the synthesized speech. In examples of handheld scanning devices in accordance with the present invention, a case may retain the at least one computer processor, the at least one memory device, and the at least one sensor within the interior of the case, the case further retaining the at least one or, alternatively, at least two light emitting diodes in a position to illuminate the line of printed text being scanned.
[0009] In further examples, a handheld scanning device in accordance with the present invention may further comprise a lens positioned between at least two light emitting diodes that focuses an image of the line of text being scanned onto the sensor. In some examples, the at least two light emitting diodes may be oriented in a non-parallel fashion with the focal axis of the lens, such that the light emitted by each of the at least two light emitting diodes at least partially falls on the extension of the focal axis of the lens on the line of text being scanned. In some examples of a handheld scanning device in accordance with the present invention, the at least two light emitting diodes may comprise two light emitting diodes oriented at different angles relative to the focal axis of the lens. In examples in accordance with the present invention, a handheld scanning device may provide a wireless communication interface that transmits the synthesized speech from the at least one computer processor to the at least one speaker, and the at least one speaker may comprise an earpiece. In some examples, the handheld scanning device may further comprise at least one rechargeable battery retained within the case, the at least one rechargeable battery powering the at least one computer processor and the at least two light emitting diodes while the handheld scanning device is in use.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0010] Examples of systems and methods in accordance with the present invention are described in conjunction with the attached drawings, wherein:
[0011] FIG. 1 schematically illustrates an exemplary handheld text scanning device in accordance with the present invention;
[0012] FIG. 2 illustrates an example scanning tip of a handheld text scanning device in accordance with the present invention;
[0013] FIG. 3 further illustrates an example scanning tip of a handheld scanning device in accordance with the present invention;
[0014] FIG. 4 illustrates an exemplary handheld text scanning device in accordance with the present invention in use to scan printed text;
[0015] FIG. 5 illustrates an example of text scanned and analyzed by a handheld text scanning device in accordance with the present invention;
[0016] FIG. 6 illustrates an example of a scanning tip of a handheld scanning device in accordance with the present invention scanning text;
[0017] FIG. 7 schematically illustrates an exemplary handheld text scanning device in accordance with the present invention;
[0018] FIG. 8 illustrates an example of a method in accordance with the present invention; and
[0019] FIG. 9 illustrates a further example of a scanning tip of a handheld scanning device in accordance with the present invention.
DETAILED DESCRIPTION
[0020] The present invention provides systems and methods for scanning printed text and generating audio that reads the text for a user. The present invention particularly addresses the needs of students with dyslexia or other conditions impacting the ability to read printed materials, but also may benefit individuals of various ages and circumstances. While solutions like audio books or text-to-speech of digital files may provide some benefit to individuals who struggle to read printed text, systems and methods in accordance with the present invention do not require the material to be prepared in advance of scanning. The present invention does not even require a digital file of the text, as systems and methods in accordance with the present invention prepare a digital file as text is scanned using a handheld device in accordance with the present invention. The present invention may be used to read a library book immediately after it is plucked from the shelf, a restaurant menu, a textbook, or any other type of printed text.
[0021] Because the scanner used in accordance with the present invention may be handheld, a user may select the specific portion of printed material to read. For example, a student may choose to read the caption of a textbook figure, or a diner may choose to read the entrees section of a menu. If a statement is interesting or unclear to a user, the present invention permits that individual to re-read the text by simply re-scanning it.
[0022] Systems and methods in accordance with the present invention may provide a handheld device with an optical sensor that receives images of text. Optical sensors used in accordance with the present invention may provide a pixel array and an associated integrated circuit that produces an image based upon the interaction of the pixels with electromagnetic radiation, such as visible light, incident upon each individual pixel. Examples of optical sensors that may receive an image in systems and methods in accordance with the present invention are active pixel sensors and charge-coupled devices (CCDs). Examples of active pixel sensors that may be used as optical sensors in accordance with the present invention are complementary metal-oxide-semiconductor (CMOS) sensors. In examples in accordance with the present invention, a sensor may comprise a linear sensor array. Images of text may be focused on an optical sensor using a pinhole, a lens, or multiple lenses. In some examples, a lens used to focus an image of text may have a diameter of 8 millimeters and a focal length of 8 millimeters, radius of curvature of 7.84 millimeters, a thickness of approximately 2 millimeters, and a magnification of four times, while being located 2 millimeters from the sensor used to detect images. However, other lens geometries may be used in accordance with the present invention. A series of sequential images of text may be captured by the sensor as a scanning tip of a device is moved along a line of text. Those sequential images may be retained in computer memory and processed to identify characters and words within the images of scanned text. Images of the text may be converted to text. Audio reading that text may be generated and output to one or more speaker. The one or more speaker used to produce audio may be integral with the handheld scanner or may comprise an external speaker(s). 
An external speaker(s) may comprise one or more earbud or headset that discreetly provides audio to the ear(s) of a user.
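As one way to picture how sequential captures from a linear sensor array might be retained in memory, the following Python sketch keeps the most recent line scans in a fixed-size buffer and assembles them into a two-dimensional image. It is only an illustration under stated assumptions: each capture is taken to be a single column of pixel intensities, and the class name `ScanBuffer` and its capacity are hypothetical rather than taken from the source.

```python
from collections import deque


class ScanBuffer:
    """Fixed-size buffer that retains the most recent line scans (hypothetical helper)."""

    def __init__(self, max_lines):
        # A deque with maxlen silently discards the oldest scan when full.
        self._lines = deque(maxlen=max_lines)

    def add_line(self, pixels):
        """Store one capture from the linear sensor as an immutable column."""
        self._lines.append(tuple(pixels))

    def as_image(self):
        """Return a rows-by-columns image in which each retained scan is one column."""
        lines = list(self._lines)
        if not lines:
            return []
        height = len(lines[0])
        return [[line[row] for line in lines] for row in range(height)]


buffer = ScanBuffer(max_lines=3)
for scan in ([1, 2], [3, 4], [5, 6], [7, 8]):
    buffer.add_line(scan)
print(buffer.as_image())  # [[3, 5, 7], [4, 6, 8]]
```

The bounded buffer mirrors the role of a temporary memory: older scans drop out once they have been processed, so memory use stays constant however long the user scans.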
[0023] Systems and methods in accordance with the present invention may provide at least one light emitting diode (LED) that illuminates text for scanning. The at least one LED may be provided on the scanning tip of a handheld device in accordance with the invention near a lens used to focus images of text upon the optical sensor. In some examples, the at least one LED may comprise at least two LEDs positioned on opposing sides of a lens. However, any number of LEDs, such as four or more, may be used to illuminate text as it is scanned. One or more LED provided may be oriented at an angle relative to the focal axis of the lens in order to emit light on text being scanned using the lens. In situations where two or more LEDs are used to illuminate text, different LEDs may be inclined at different angles relative to the focal axis of a lens.
[0024] The at least one LED used to illuminate text in accordance with the present invention may be of any color, but in some examples one or more white LED may be used. In some examples in accordance with the present invention, at least one white LED may be used in conjunction with at least one RGB LED. In such examples, the at least one RGB LED may be used to produce white light (by simultaneously generating red, green, and blue light in the appropriate relative luminosities) during the scanning of text, but the at least one RGB LED may be further used to output information regarding the handheld device to a user. For example, the RGB LED may emit red light to indicate that a battery powering the device requires charging, may emit green light to indicate that the battery is fully charged, may emit a blue light to indicate that the device is powering on, etc. A variety of states may be communicated to a user using various light colors produced by the RGB LED (which may, of course, comprise more than red, green, or blue colors) and patterns of flashes of light of different colors.
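The status signaling described above can be pictured as a lookup from device state to LED color. The Python sketch below follows the example colors given in the text (white while scanning, red for a battery requiring charging, green for a fully charged battery, blue while powering on). The state names, the 0-255 intensity values, and the function `led_color` are assumptions made for illustration.

```python
from enum import Enum
from typing import Tuple


class DeviceState(Enum):
    """Hypothetical device states; the names are not taken from the source."""
    SCANNING = "scanning"
    BATTERY_LOW = "battery_low"
    BATTERY_FULL = "battery_full"
    POWERING_ON = "powering_on"


# (red, green, blue) intensities on an assumed 0-255 scale.
STATE_COLORS = {
    DeviceState.SCANNING: (255, 255, 255),  # white light to illuminate text
    DeviceState.BATTERY_LOW: (255, 0, 0),   # red: battery requires charging
    DeviceState.BATTERY_FULL: (0, 255, 0),  # green: battery fully charged
    DeviceState.POWERING_ON: (0, 0, 255),   # blue: device is powering on
}


def led_color(state: DeviceState) -> Tuple[int, int, int]:
    """Return the RGB color the LED should show for the given state."""
    return STATE_COLORS[state]


print(led_color(DeviceState.BATTERY_LOW))  # (255, 0, 0)
```

Further states, mixed colors, or flash patterns could be added as extra table entries without changing the lookup itself.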
[0025] A handheld device in accordance with the present invention may be powered by one or more rechargeable battery. Such a battery may be chargeable using an interface such as one of the various Universal Serial Bus (USB) standards, which may permit both charging of a battery and the exchange of data between computing components of the handheld device and external computing devices. However, other types of connections may be used for charging and/or data exchange, such as wireless connections. If a data exchange connection, whether wired or wireless, is provided in conjunction with a handheld device in accordance with the present invention, the connection may be used to provide software updates and/or provide maintenance for the handheld device and its various components.
[0026] Systems and methods in accordance with the present invention may use one or more computer processor executing computer readable code retained in a non-transitory medium to execute methods in accordance with the present invention. Such methods may receive images of text captured by the optical sensor and may analyze those images to identify the abrupt changes in an image indicating the boundary of a letter or other character of text. Based upon the distance between the boundaries of characters, individual words within the text may be identified. Images of words, letters, or other characters may be used to generate text, for example using optical character recognition (OCR) techniques. One example of optical character recognition software that may be used in accordance with the present invention is OpenCV, a library of functions available for use in accordance with an open source license, but systems and methods in accordance with the present invention may use any type of optical character recognition techniques and/or software. The text may then be processed using text-to-speech techniques, and the resulting audio may be output to one or more speaker to be heard by the user. 
Various types of software and/or techniques may be used to convert text to speech, such as software known as "Acapela Text to Speech" available from Acapela Group France.
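The boundary and spacing analysis described in this paragraph can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of the invention: it assumes the image has already been thresholded into rows of 0 (background) and 1 (ink), treats each unbroken run of inked columns as one character, and starts a new word whenever the gap between characters reaches a hypothetical `word_gap` threshold. Character recognition and speech synthesis are outside the sketch.

```python
def find_characters(image):
    """Return (start, end) column spans, end exclusive, for each run of inked columns."""
    width = len(image[0]) if image else 0
    spans = []
    start = None
    for col in range(width):
        ink = any(row[col] for row in image)
        if ink and start is None:
            start = col                 # background-to-ink transition: character begins
        elif not ink and start is not None:
            spans.append((start, col))  # ink-to-background transition: character ends
            start = None
    if start is not None:
        spans.append((start, width))    # character touching the right edge
    return spans


def group_into_words(spans, word_gap):
    """Group character spans into words; a gap of at least word_gap columns splits words."""
    words = []
    for span in spans:
        if words and span[0] - words[-1][-1][1] < word_gap:
            words[-1].append(span)
        else:
            words.append([span])
    return words


# One-row toy image: two characters close together, then a wider gap, then a third.
image = [[int(c) for c in "1101100011"]]
spans = find_characters(image)
print(spans)                                # [(0, 2), (3, 5), (8, 10)]
print(group_into_words(spans, word_gap=3))  # [[(0, 2), (3, 5)], [(8, 10)]]
```

In practice the word-gap threshold would probably need to scale with the detected character size, since spacing varies with font and with how fast the tip is moved.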
[0027] A handheld text scanning device in accordance with the present invention may have a resilient case, such as may be made from molded plastic, nylon, or other materials. A case may contain within it or retain upon its surface at least some of the components of a device in accordance with the present invention. Electrical and/or computing components may be retained within the case, and may comprise components such as integrated circuit boards, computer memory devices (such as RAM and ROM), sensors, switches, batteries, buses, connectors, antennae, ports, processors, and/or other devices, as well as structures to support and/or retain the components. The case may further retain a lens in a scanning tip, and may retain at least one LED on the exterior of the scanning tip while permitting appropriate electrical connections to power and control the at least one LED. A case may be molded for easy use as a handheld device, for example having an elongated structure permitting it to be held comfortably like a pencil by the user scanning text.
[0028] FIG. 1 illustrates one example of a device 100 in accordance with the present invention. A handheld text scanning device 100 may have a first end or scanning tip 110 that may retain at least one LED and a lens as described in examples below. The opposing end 120 of the device 100 may be located opposite of the scanning tip 110 on a long axis 130 of the device 100. To facilitate scanning printed text in a normal reading position, the scanning tip may have a face inclined at an angle 150 relative to the long axis 130 of the device 100. In some examples, the angle 150 of inclination of a scanning tip 110 may be between thirty and sixty degrees, while in other examples the angle 150 of inclination may comprise forty-five degrees.
[0029] FIG. 2 illustrates an example scanning tip 110 of a device in accordance with the present invention in greater detail. While in some examples a pinhole may be used to focus images of text, in many examples a lens 210 may be provided to focus images of text on an optical sensor 220 contained within the case 140. In some examples, a lens assembly 215 may retain both the lens 210 and the optical sensor 220, and the assembly 215 may be retained in a corresponding hole or opening provided in the case 140 at the scanning tip 110. A camera board 240 or other circuitry, which may include an appropriate microcontroller, may be provided in conjunction with the optical sensor 220 to receive and initially process images captured by the optical sensor 220. As described in some examples herein, images captured by the optical sensor 220 may be received from the camera board 240 by other electrical and/or computing components of the device for additional processing in accordance with the present invention. In some examples, camera board 240 may be omitted, for example by combining the circuitry of camera board 240 with other components of the device.
[0030] Still referring to the example of FIG. 2, a first LED 230 and a second LED 232 may be provided on the exterior of the scanning tip 110 proximate to the lens 210 such that, when activated, the LEDs 230, 232 will illuminate printed text being scanned. The first LED 230 may be oriented to project light at a first angle 231 relative to the face of the scanning tip 110, and the second LED 232 may be oriented to project light at a second angle 233 relative to the face of the scanning tip. In the example of FIG. 2, and as discussed further in some examples below, the lens 210 may be co-planar or parallel to the face of the scanning tip 110, in which case the focal axis of the lens 210 may be perpendicular to the face of the scanning tip 110, and in such an example the magnitudes of first angle 231 and the second angle 233 will determine the angles of the light emitted by the respective first LED 230 and second LED 232 with respect to the focal axis of the lens 210.
[0031] FIG. 9 illustrates an example of a scanning tip for a handheld device in accordance with the present invention wherein a first LED 930 and a second LED 932 are retained within the interior of the case 940 of the handheld scanning device 910. While FIG. 9 illustrates an example in accordance with the present invention that provides two LEDs, more or fewer LEDs may be provided. While some or all of the LEDs provided may comprise white LEDs, at least one of the provided LEDs may optionally comprise an RGB LED for use in indicating an operational status to a user. While at least one LED may be oriented to project light at a desired location by securing the LED to the exterior of a scanning tip, one or more LED may be oriented to project light at a desired location using at least one fiber optic element.
[0032] In the example of FIG. 9, a first fiber optic element 950 transmits light emitted by the first LED 930 through a first opening in the case 940, while a second fiber optic element 952 transmits light emitted by the second LED 932 through a second opening in the case 940. Fiber optic elements 950, 952 may terminate on opposing sides of lens 910. The terminal ends of the fiber optic elements 950, 952 may be flush with the exterior of the scanning tip, may protrude from the scanning tip, or may be recessed within the scanning tip. In an example such as depicted in FIG. 9, a fiber optic element may be provided to correspond with each of the LEDs retained in the case. Any number of LEDs may be retained within a case with light emitted by each LED transmitted using a fiber optic element, such as at least one LED and fiber optic element, at least two LEDs and fiber optic elements, etc. In further examples, one or more LEDs may be positioned entirely or partially on the exterior of the case, while at least one additional LED may be retained within the case with a corresponding fiber optic element transmitting light from the LED out of the case to illuminate text to be scanned. If used, fiber optic elements may be rigid or flexible. If flexible fiber optic elements are used to transmit light from one or more LEDs, the one or more LEDs within the case need not be oriented along the axis of desired illumination from that LED. A fiber optic element may be retained at a desired location within a case by providing indentations within the casing, such as a rib or ribs provided within the interior of the case. Adhesives may additionally or alternatively be used to retain a fiber optic element at a desired location within the case. An adhesive may be used to affix a fiber optic element to the corresponding LED and/or at the location of the opening of the case or other aperture through which the light will be transmitted to illuminate text to be scanned.
[0033] FIG. 3 illustrates an example of the scanning tip 110 from along the focal axis of the lens 210. The first LED 230 and the second LED 232 may be on opposing sides of the lens 210, such that the first LED 230 is above the lens 210 when the device is held for scanning text and the second LED 232 is below the lens 210 when the device is held for scanning text. Optionally, additional LEDs, such as a third LED 234 and a fourth LED 236, may also be provided, for example on opposing lateral sides of lens 210. However, both the number and arrangement of LEDs illustrated in FIG. 3 are exemplary only. In many examples in accordance with the present invention, only the first LED 230 and the second LED 232 are provided, while in further examples only a single LED is provided. In yet further examples in accordance with the present invention, various numbers of LEDs may be evenly spaced around the lens 210, while in other examples various numbers of LEDs may be unevenly spaced around the lens 210. In some examples, LEDs may be omitted or replaced with other types of light sources.
[0034] FIG. 4 illustrates an example of text 415 printed on a substrate 410, such as paper, being scanned by a device 100 held by a hand 420 of a user. Audio generated based upon the scanned text 415 may be output to a speaker such as earbud 430 to produce sound 435 discernable by the user. FIG. 4 is exemplary only, and the distance illustrated between the text 415 and the scanning tip of the device 100 is exaggerated beyond what will be needed in some examples in order to show the entirety of the exemplary text 415. Further, while an earbud 430 operating using a wireless connection with the device 100, such as a connection using the Bluetooth protocol, is illustrated, a speaker used to output sound may take other forms. For example, a speaker may be integral to the device 100 or may be connected through a cord or cable to the device 100. Further, a speaker used in conjunction with the present invention may comprise any type of external speaker, and in further examples multiple speakers may be used to output audio of the scanned text being read.
[0035] FIG. 5 illustrates an example of a first word 510 and a second word 550 such as may be scanned using systems and methods in accordance with the present invention. The first word 510 comprises a first letter 512, a second letter 514, and a third letter 516, with a first space 522 between the first letter 512 and the second letter 514 and a second space 524 between the second letter 514 and the third letter 516. Similarly, the second word 550 may comprise a first letter 552, a second letter 554, a third letter 556, and a fourth letter 558, separated by a first space 562, second space 564, and third space 566, respectively. As is typical in contemporary printed text, a space 540 may separate the first word 510 from the second word 550, that is to say the last letter 516 of the first word 510 and the first letter 552 of the second word 550, such that space 540 separating words is larger than the other spaces (i.e., spaces 522, 524, 526, 562, 564, 566, 568) separating letters within a word. By analyzing captured images to identify high contrast areas, the boundaries of individual letters (or other types of characters, if present) may be identified, thereby enabling both the characters and the spaces between the characters to be identified. By identifying larger spaces, such as space 540 in the example of FIG. 5, the images of text may be broken into individual words for further processing. Of course, words of various lengths beyond those depicted in FIG. 5 may be scanned in accordance with the present invention.
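By way of a non-limiting illustration, the identification of letters and words from high contrast areas and from the sizes of the intervening spaces may be sketched in Python as follows. The sketch assumes a grayscale image given as rows of pixel values, a fixed darkness threshold, and a simple column-by-column test for ink. These choices, along with the function names and the `word_gap` parameter, are hypothetical simplifications and are not details taken from the present description.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start column, end column), end exclusive


def dark_columns(image: List[List[int]], threshold: int = 128) -> List[bool]:
    """Mark each pixel column that contains at least one dark (ink) pixel."""
    width = len(image[0])
    return [any(row[x] < threshold for row in image) for x in range(width)]


def find_runs(flags: List[bool]) -> List[Run]:
    """Return the column ranges in which consecutive flags are True."""
    runs: List[Run] = []
    start = None
    for x, flag in enumerate(flags):
        if flag and start is None:
            start = x  # a character begins at a light-to-dark transition
        elif not flag and start is not None:
            runs.append((start, x))  # the character ends at a dark-to-light transition
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def group_into_words(letters: List[Run], word_gap: int) -> List[List[Run]]:
    """Start a new word wherever the space before a letter is at least word_gap."""
    words: List[List[Run]] = []
    for run in letters:
        if words and run[0] - words[-1][-1][1] < word_gap:
            words[-1].append(run)
        else:
            words.append([run])
    return words
```

In this sketch each run of inked columns stands in for one character, so touching or overlapping characters would need a more capable segmentation step than the one shown.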
[0036] While the text depicted in FIG. 5 comprises two words and seven letters, it should be appreciated that the images captured while a device in accordance with the present invention is moved along a line of text will often contain far fewer letters than shown in FIG. 5. Indeed, as the scanning tip of a device is moved along a line of printed text, the optical sensor may capture images at a rate that provides images of only single letters or portions of single letters at a time, but by computationally stitching this series of images together, entire words and sets of words may be identified and captured in the images. In some examples, accelerometers or other devices may be provided to enable the device to measure the rate at which the device is being moved by a user across a line of text, but in many examples accelerometers may be omitted. Accelerometers or other components may be particularly unnecessary when the rate at which the optical sensor captures images is rapid enough to provide substantial overlap between subsequent image captures, thereby permitting the images to be readily stitched together. It should be further noted that while the present examples use printed English, which is read left-to-right, the present invention may also be adapted for use with languages read right-to-left, top-to-bottom, or bottom-to-top. Once images of individual words have been captured, the images may be converted to text using optical character recognition techniques. The resulting text may then be used to produce audio for play over at least one speaker using text-to-speech techniques.
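By way of a non-limiting illustration, the stitching of overlapping sequential images may be sketched in Python as follows. The sketch treats each captured image as a sequence of pixel columns and joins consecutive images at the widest overlap in which the columns match exactly. Exact matching, the function names, and the `min_overlap` parameter are hypothetical simplifications. Images from an actual sensor would contain noise and would likely call for a tolerance-based or correlation-based comparison.

```python
from typing import List, Sequence, TypeVar

Column = TypeVar("Column")  # one pixel column, in any comparable representation


def overlap_width(left: Sequence[Column], right: Sequence[Column],
                  min_overlap: int = 1) -> int:
    """Largest k such that the last k columns of left equal the first k of right."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and list(left[-k:]) == list(right[:k]):
            return k
    return 0


def stitch(frames: Sequence[Sequence[Column]]) -> List[Column]:
    """Join sequential frames into one strip, dropping the overlapping columns."""
    stitched: List[Column] = list(frames[0])
    for frame in frames[1:]:
        k = overlap_width(stitched, frame)
        stitched.extend(frame[k:])  # append only the columns not already present
    return stitched
```

A faster capture rate yields wider overlaps, which makes a match of this kind less ambiguous. Repetitive patterns in the text may still produce false matches with a comparison this simple.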
[0037] FIG. 6 shows an example of a single letter 552 being scanned by a handheld device in accordance with the present invention. A lens 210 may have a focal axis 610, and the device may be positioned such that letter 552 is on or near the focal axis 610. As a result, an optical sensor 220, also positioned along the focal axis 610 of lens 210, may have an image of the letter 552 focused upon the pixels of the sensor 220. The letter 552 (as well as other text characters) may be illuminated as the text is scanned by the first LED 230 and the second LED 232.
[0038] FIG. 7 schematically illustrates one exemplary arrangement of components that may be used to scan printed text using a device 100 in accordance with the present invention. As described above, an optical sensor 220 and a camera board 240 may receive images of text as it is scanned by moving the scanning tip of a device 100 along a line of text. A bus 790 or other connection mechanism(s) may operably connect the various components described in order to permit them to exchange information and to receive power from a battery 760 and a power controller 750. A temporary memory device 710, such as random access memory, may serve as a buffer to temporarily retain images or text generated from images, audio generated from text, or other transient information produced in accordance with the present invention. An interface 740 may provide a connection with an audio device. For example, interface 740 may comprise a Bluetooth module (which may include an antenna) permitting audio to be transmitted from the interface 740 to a wireless earbud.
[0039] A permanent memory device 730 may contain computer readable code in a non-transitory form that causes the various components of device 100, some of which are depicted in the example of FIG. 7, to execute methods in accordance with the present invention. In some examples, permanent memory 730 may be updatable, such as to provide software updates, upgrades, bug fixes, patches, etc., to the device 100. Permanent memory may comprise flash memory such as a 15nm EMMC04G-M627-X02U flash memory unit available from Kingston Technology. A processing unit 720 may comprise a computer processor executing the instructions retained in permanent memory 730 and interfacing with the other components (such as sensor 220 and camera board 240, temporary memory 710, etc.) of device 100 to perform methods in accordance with the present invention. While a variety of computer processors may be used for processing unit 720, one example of an acceptable processor is an embedded processor such as a Texas Instruments AM3358BZCZ100. A first LED 230 and a second LED 232 may be activated or, if one is an RGB LED used to indicate a status of the device 100, controlled by processor 720 in accordance with computer readable instructions retained in permanent memory 730. A charging port 780 may permit a battery 760 to be charged. Battery 760 may comprise a lithium ion or other rechargeable battery, although in some examples disposable batteries may be used in conjunction with a scanning device in accordance with the present invention. A button 770 may be used to activate or deactivate the device 100. When activated, LEDs 230, 232 may be illuminated, and optical sensor 220 may be actively capturing images which are then processed by the various components of the device 100 to identify letters and words within those images to convert to text and ultimately audio. A variety of optical sensors may be used in accordance with the present invention, such as a TSL1401CL available from Mouser Electronics.
[0040] FIG. 8 depicts one example of a method 800 in accordance with the present invention. Method 800 may be implemented using handheld text scanning devices in accordance with the present invention, and may be executed at least in part by one or more computer processors executing computer readable code retained in a non-transitory form.
[0041] In step 810, sequential images of text may be captured along a line of text. Step 810 may occur as the scanning tip of a handheld device in accordance with the present invention is moved along the line of text by a user. Images of the printed text may be captured by an optical sensor as the device is moved. The printed text may be illuminated, for example by at least one LED, as step 810 is performed. In step 820, the images may be retained in memory, and in step 830 the images may be analyzed to identify areas of high contrast corresponding to the boundaries of characters in the text. In step 840, images of characters may be identified in the images, as well as the distances between characters. In step 850, based upon the linear distance between characters, individual letters and words may be identified within the images. Steps 820, 830, and 840 may involve stitching multiple sequential captured images into one or more larger images for processing. In step 860, the images of the words may be converted to text via techniques such as optical character recognition. In step 870, the text may be converted to audio, for example using text-to-speech techniques. In step 880, the audio may be output using a speaker, thereby reading the scanned text to the user.
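By way of a non-limiting illustration, the order of the steps of method 800 may be sketched in Python as follows. The stitching, word segmentation, optical character recognition, text-to-speech, and audio output stages are passed in as functions, because no particular implementation of those stages is specified herein. The name `read_line`, its parameters, and the `Frame` type are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[int]  # hypothetical: one captured image reduced to a sequence of columns


def read_line(
    frames: Iterable[Frame],
    stitch: Callable[[List[Frame]], Frame],
    split_words: Callable[[Frame], List[Frame]],
    recognize: Callable[[Frame], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """Run captured frames through the stages of method 800 and return the text."""
    retained = list(frames)           # steps 810 and 820: capture and retain images
    strip = stitch(retained)          # combine sequential images into a larger image
    word_images = split_words(strip)  # steps 830 to 850: characters, spaces, words
    text = " ".join(recognize(word) for word in word_images)  # step 860: OCR
    audio = synthesize(text)          # step 870: text-to-speech
    play(audio)                       # step 880: output through a speaker
    return text
```

Passing the stages in as functions keeps the sketch independent of any particular optical character recognition or text-to-speech library, and allows each stage to be replaced or tested separately.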
[0042] Each of the steps of method 800 may be performed simultaneously as lines of text are scanned in accordance with the present invention. By continuously capturing images in step 810, while continuously analyzing those images in steps 830, 840, 850, a continuous series of images is available for conversion to text in step 860, which then provides a continuous series of text to be converted to audio in step 870 and output for a user in step 880.
[0043] Additional steps beyond those depicted in the example of FIG. 8 may be added to method 800 and/or other methods in accordance with the present invention. For example, at least one LED that may optionally be used to illuminate text to be scanned in step 810 may comprise an RGB LED that alters the color of light output in order to provide information regarding the status of a device. Examples of other optional steps include checking for software updates when connected to an external computing device by a port or a wireless connection, connecting wirelessly to one or more speakers (such as a Bluetooth earbud), detecting a scan rate at which a user is scanning text (for example, using an accelerometer and/or by analyzing the degree of overlap for consecutive images), etc.
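By way of a non-limiting illustration, detecting a scan rate from the degree of overlap between consecutive images may be sketched in Python as follows. The sketch assumes that every image has the same width, that images are captured at a fixed rate, and that the size of one pixel on the page is known. The function name and all numerical values are hypothetical.

```python
def scan_speed_mm_per_s(frame_width_px: int, overlap_px: int,
                        frames_per_second: float, mm_per_px: float) -> float:
    """Estimate how fast the scanning tip moves along the line of text.

    Between two consecutive frames the tip advances by the part of the
    frame that does not overlap the previous frame.
    """
    if not 0 <= overlap_px <= frame_width_px:
        raise ValueError("overlap must be between 0 and the frame width")
    advance_px = frame_width_px - overlap_px
    return advance_px * mm_per_px * frames_per_second


if __name__ == "__main__":
    # Hypothetical values: 64-pixel frames, 48 pixels of overlap,
    # 100 frames per second, and 0.05 mm of page per pixel.
    print(scan_speed_mm_per_s(64, 48, 100.0, 0.05))
```

An overlap of zero indicates that the tip may be moving too quickly for consecutive images to be stitched, which is one condition a device could report to the user.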
[0044] While described in examples herein, systems and methods in accordance with the present invention are not limited to these examples. In addition to the single-lens example illustrated and described, the present invention may be practiced using a pinhole instead of a lens or, in other examples, multiple lenses. Further, more than a single optical sensor may be used to capture images, and various types of optical sensors may be used in accordance with the present invention. While depicted as powered by a rechargeable battery, devices in accordance with the present invention may be powered through other means, such as using an electrical plug-in.
[0045] Various dimensions may be used for devices in accordance with the present invention. In some examples, different intended users may prefer or require different dimensions. For example, a young schoolchild may prefer a smaller device than an adult.
[0046] Cases may enclose and/or carry at least some of the components of a device in accordance with the present invention. The material used to create the case, and the manner in which the case is fabricated and assembled, may vary. In some examples, a case may be formed from a pair of injection molded plastic pieces that mate to retain the various operable components. The interior of the case may be configured to receive and hold the components of the device, such as printed circuit boards, integrated circuits, memory units, optical sensors, processors, batteries, etc.
[0047] Additional components, such as additional input or output devices for use in controlling a device in accordance with the present invention, may be provided beyond those described in examples herein. For example, while depicted in examples herein as simply having a single button to turn the device on and off, further input mechanisms such as various buttons, dials, switches, touchscreens, or other input mechanisms may be provided to receive user input to control a device. Meanwhile, in the examples previously described herein the output mechanisms comprise at least one speaker and, optionally, an RGB LED that alters its output based upon the state of the device, but other types of output mechanisms, such as display screens, additional lights (such as further LEDs), gauges, vibrational generators, etc., may also be provided.
[0048] While generally described herein as being used to scan printed English text and generate audio corresponding to that text, systems and methods in accordance with the present invention are not limited to any single language. In some examples, a single device may be capable of scanning text in a variety of languages, with a user being able to use an input mechanism to select which language is being scanned. In some examples, a device in accordance with the present invention may be wirelessly paired with software, such as an "app," operating on a computing device such as a mobile phone, and the software may provide an interface the user may use to select the desired language. In some further examples, translation capabilities may be provided by using machine translation of text from the language scanned to a second language, with the second language being used to produce audio.
[0049] In the examples depicted herein, a device in accordance with the present invention need not connect to a computer network such as an intranet or the internet. For instances such as classroom use, the lack of network connectivity may be preferred, as it prevents outside resources from being surreptitiously used for classwork. However, in some examples, a device in accordance with the present invention may connect to a network using a port or an appropriate wireless protocol, such as one of the 802.11 family of protocols. A network connection may be useful to provide updates to the software of a device in accordance with the present invention from a remote server, and may further be used to provide enhanced functionality such as translation capabilities. However, such connectivity is optional.

Claims

1. A system for scanning text and generating an audio signal derived from the text, the system comprising: a case extending from a scanning tip to a terminal end, such that the case may be gripped by a hand of a user and the scanning tip may be moved along a line of printed text to be scanned and read by the system; at least one light emitting diode on the scanning tip oriented to project light from the scanning tip when the light emitting diode is activated; a lens retained in the scanning tip; a sensor within the case and along a focal axis of the lens, the sensor detecting images focused upon the sensor by the lens; a memory unit that receives images from the sensor, the memory unit retaining a series of images taken by the sensor at a predetermined scan rate; a processor that, under the control of instructions embodied in a computer readable medium retained in a non-transitory form: analyzes images retained by the memory unit in the order in which the images were taken to identify changes in contrast within the images and, based upon identified changes in contrast, determines when a letter within the scanned text begins and ends,
analyzes spaces between identified letters to determine when words begin and end,
converts images of words to text, and
converts the text to an audio signal reading the text; and at least one audio speaker that outputs the audio signal to the user.
2. The system for scanning text and generating an audio signal derived from the text of claim 1, wherein the at least one light emitting diode comprises an RGB light emitting diode.
3. The system for scanning text and generating an audio signal derived from the text of claim 1, wherein the at least one light emitting diode comprises at least two light emitting diodes and the lens is retained in the scanning tip between the at least two light emitting diodes.
4. The system for scanning text and generating an audio signal derived from the text of claim 3, wherein the lens is oriented at an angle relative to a long-axis of the case and the at least two light emitting diodes are oriented at an angle relative to the focal axis of the lens.
5. The system for scanning text and generating an audio signal derived from the text of claim 4, wherein the lens is oriented at an angle of forty-five degrees from the long axis of the case.
6. The system for scanning text and generating an audio signal derived from the text of claim 5, wherein the at least two light emitting diodes are inclined inward toward the lens.
7. The system for scanning text and generating an audio signal derived from the text of claim 6, wherein the at least two light emitting diodes are inclined inward different amounts.
8. The system for scanning text and generating an audio signal derived from the text of claim 7, wherein the at least two light emitting diodes comprise two light emitting diodes.
9. The system for scanning text and generating an audio signal derived from the text of claim 8, wherein at least one of the light emitting diodes comprises an RGB light emitting diode that, under the control of the processor executing instructions embodied in a computer readable medium retained in a non-transitory form, outputs information descriptive of the status of the system by changing the color of light emitted by the RGB light emitting diode.
10. A method for converting printed text into audio, the method comprising: scanning printed text using a handheld device, scanning the printed text comprising capturing images of the text using a lens and sensor to capture images while a scanning tip of the handheld device containing the lens and sensor is moved along a line of text, scanning printed text further comprising illuminating the printed text using at least two light emitting diodes affixed to the scanning tip of the handheld device; retaining captured images in a memory device; analyzing the captured images of text sequentially to detect letters and words using a computer processor executing computer readable instructions retained in a non-transitory format; converting the captured images to text using the computer processor executing computer readable instructions retained in a non-transitory format; and producing an audio signal reading the text using a computer processor executing computer readable instructions retained in a non-transitory format.
11. The method for converting printed text into audio of claim 10, further comprising outputting the audio signal to at least one speaker.
12. The method for converting printed text into audio of claim 11, wherein outputting the audio signal to at least one speaker comprises transmitting the audio signal to at least one earpiece.
13. The method for converting printed text into audio of claim 12, wherein transmitting the audio signal to at least one earpiece comprises wirelessly transmitting the audio signal.
14. The method for converting printed text into audio of claim 10, wherein one of the at least two light emitting diodes is an RGB light emitting diode that, under the control of the computer processor executing computer readable instructions retained in a non-transitory format, emits light of different colors to indicate a status of the handheld device.
15. A handheld scanning device for converting printed text to computer-synthesized speech in real-time, the handheld scanning device comprising: at least one computer processor that executes computer readable instructions retained in a non-transitory form in at least one non-transitory media, the computer readable instructions causing the computer processor to at least convert images captured from printed text to text and to convert text to synthesized speech; at least one sensor that captures images of printed text focused upon the at least one sensor as the handheld device is scanned along a line of printed text; at least one memory device that receives captured images from the at least one sensor and provides the received images to the at least one computer processor; at least two light emitting diodes that illuminate the line of printed text being scanned; at least one speaker that produces the synthesized speech; and a case that retains the at least one computer processor, the at least one memory device, and the at least one sensor within the interior of the case, the case further retaining the at least two light emitting diodes in a position to illuminate the line of printed text being scanned.
16. The handheld scanning device for converting printed text to computer-synthesized speech in real-time of claim 15, further comprising a lens positioned between the at least two light emitting diodes that focuses an image of the line of text being scanned onto the sensor.
17. The handheld scanning device for converting printed text to computer-synthesized speech in real-time of claim 16, wherein the at least two light emitting diodes are oriented in a non-parallel fashion with the focal axis of the lens, such that the light emitted by each of the at least two light emitting diodes at least partially falls on the extension of the focal axis of the lens on the line of text being scanned.
18. The handheld scanning device for converting printed text to computer-synthesized speech in real-time of claim 17, wherein the at least two light emitting diodes comprise two light emitting diodes oriented at different angles relative to the focal axis of the lens.
19. The handheld scanning device for converting printed text to computer-synthesized speech in real-time of claim 18, further comprising a wireless communication interface that transmits the synthesized speech from the at least one computer processor to the at least one speaker, wherein the at least one speaker comprises an earpiece.
20. The handheld scanning device for converting printed text to computer-synthesized speech in real-time of claim 19, further comprising at least one rechargeable battery retained within the case, the at least one rechargeable battery powering the at least one computer processor and the at least two light emitting diodes while the handheld scanning device is in use.
PCT/US2018/041363 2017-07-12 2018-07-10 Handheld text scanner and reader WO2019014163A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762604638P 2017-07-12 2017-07-12
US62/604,638 2017-07-12
US201815872742A 2018-01-16 2018-01-16
US15/872,742 2018-01-16

Publications (1)

Publication Number Publication Date
WO2019014163A1 true WO2019014163A1 (en) 2019-01-17

Family

ID=65001504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/041363 WO2019014163A1 (en) 2017-07-12 2018-07-10 Handheld text scanner and reader

Country Status (1)

Country Link
WO (1) WO2019014163A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999666A (en) * 1997-09-09 1999-12-07 Gobeli; Garth W. Device and method for optical scanning of text
US20040057228A1 (en) * 2002-09-19 2004-03-25 Veutron Corporation Light source of LED for scanner
US20040138872A1 (en) * 2000-09-05 2004-07-15 Nir Einat H. In-context analysis and automatic translation
US7686227B2 (en) * 2004-03-17 2010-03-30 Socket Mobile, Inc. Cordless hand scanner with improved user feedback

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18831078

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18831078

Country of ref document: EP

Kind code of ref document: A1