WO2021142242A1 - Systems, and programs for visualization of auditory signals - Google Patents

Systems, and programs for visualization of auditory signals

Info

Publication number
WO2021142242A1
WO2021142242A1 (PCT/US2021/012677)
Authority
WO
WIPO (PCT)
Prior art keywords
user
auditory
source
magnitude
display
Prior art date
Application number
PCT/US2021/012677
Other languages
French (fr)
Inventor
Eyal Dror
Gil SEGAL
Original Assignee
Format Civil Engineering Ltd.
The IP Law Firm of Guy Levi, LLC
Application filed by Format Civil Engineering Ltd., The IP Law Firm of Guy Levi, LLC filed Critical Format Civil Engineering Ltd.
Publication of WO2021142242A1 publication Critical patent/WO2021142242A1/en

Classifications

    • G: PHYSICS
    • G02: OPTICS
    • G02C: SPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C 11/00: Non-optical adjuncts; Attachment thereof
    • G02C 11/10: Electronic devices other than hearing aids
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61F: FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
    • A61F 11/00: Methods or devices for treatment of the ears or hearing sense; Non-electric hearing aids; Methods or devices for enabling ear patients to achieve auditory perception through physiological senses other than hearing sense; Protective devices for the ears, carried on the body or in the hand
    • A61F 11/04: Methods or devices for enabling ear patients to achieve auditory perception through physiological senses other than hearing sense, e.g. through the touch sense
    • G: PHYSICS
    • G02: OPTICS
    • G02B: OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00: Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01: Head-up displays
    • G02B 27/017: Head mounted
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10: Transforming into visible information
    • G: PHYSICS
    • G02: OPTICS
    • G02B: OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00: Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01: Head-up displays
    • G02B 27/0101: Head-up displays characterised by optical features
    • G02B 2027/014: Head-up displays characterised by optical features comprising information/image processing systems

Definitions

  • the disclosure is directed to assistive devices for the Deaf and/or hard-of-hearing (DHH). Specifically, the disclosure is directed to systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers.
  • a system for providing visual indication of an auditory signal to a user comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
  • a non-transitory processor readable storage medium having thereon a set of executable instructions, configured, when executed by at least one processor included in a system comprising a wearable display device sized and configured to receive auditory signals and display visual markers to the user and a central processing module (CPM), to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
  • FIG. 1 illustrates a schematic of the system’s components
  • FIG. 2 is a schematic flow chart of an exemplary implementation of the process of providing real-time, source-, location- and magnitude-specific rendering of auditory triggers
  • FIG. 3 is a schematic illustrating the system architecture.
  • exemplary implementations of systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers.
  • a system for providing visual indication of an auditory signal (via, for example, automatic transcription), in such a way that the user receives (perceives) the auditory signal with no distraction, or minimal interference, from the system to the user’s general point/area of attention/concentration/focus/regard.
  • systems for real-time auditory-to-visual transcription by surround auditory input analysis (separation of different sound types, voices, their directions and amplitudes, etc.) and rendering of their visual interpretation or expression on a transparent near-eye display, AR lenses, Smart Lenses or another display, e.g. a PC monitor, TV or smartphone display screen.
  • the systems and programs embodied on non-transitory memory device(s) are sized and configured to allow the user to control streaming of visual markers, such as text, symbols and animation, onto a selectable display; and to selectably position those markers in the user’s field of view.
  • the user can set the location; the text follows the dialog partner’s face on the left/above/below, and the same applies to symbols.
  • control movements and gestures, as well as the reading of the rendered text/symbols/GIFs, are adapted and configured to have minimal interference and/or disturbance and/or distraction and/or diversion effects on the user’s point of regard, and/or area of observation and/or particular subject of interest and/or attention.
  • the systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers disclosed are configured to allow the user to control text format, such as, for example, at least one of: initial text size, background, font, bold, italics, and the like on the display, e.g. pre-set or online, for optimizing the view and understanding of the overlaid displayed info under different scenarios and backgrounds. These scenarios can arise indoors in the dark, or in a noisy outdoor environment.
  • the systems and programs are configured to pre-select a field of attention, either automatically, based on acoustic density parameters, or by the user.
  • displaying, or rendering, of the outdoor environment to the user can be carried out using pseudo-code such as the example given in the detailed description below.
  • a system for providing visual indication of an auditory signal to a user comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, speaker identification and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
  • the term “render” does not make any assumptions as to whether the rendering process is performed by software rendering or by hardware rendering, but rather refers to producing a 2D graphics image on an output device (e.g., transparent near eye display 106, see e.g., FIG. 1).
  • the systems and programs disclosed herein are configured to provide an indication of a direction from the wearable device of a source of sound and/or an intensity of the sound.
  • a user may be at a crosswalk attempting to cross a street, and an oncoming car may be honking at the user in order to alert the user that the car is driving through the crosswalk.
  • the wearable display device capable of addressing that need can be wearable glasses, comprising a microphone array 105i configured to provide a surround auditory input.
  • the wearable device, in conjunction with the rest of the system, is configured to provide holistic integration of the auditory surround field with the visual field. In other words, it incorporates space geometry and incidental sounds and echoes into the auditory analysis for filtering and additional processing.
  • the microphone array is sized and configured to map the surrounding acoustic environment by emitting a 360° frequency sweep and analyzing its reception.
  • the microphone array may comprise a plurality of microphones, for example, a plurality of directional microphones.
  • the plurality of microphones configured to detect the sound of the surrounding environment may be arranged in a manner to detect sound coming from a plurality of directions.
  • the microphones may be arranged in a microphone array.
  • FIG. 1 illustrates an example wearable device 150 having a microphone array 105i.
  • wearable device 150 comprises an array 105i of directional (and/or omni) microphones.
  • Each directional microphone 105i is arranged so as to capture audio in its respective corresponding region 108j (configured to sense sound at specific directions, amplitude, distance and width of lobe).
  • although each directional microphone is positioned to primarily detect sound coming from its corresponding region, the microphones may also detect sound coming from other regions.
  • the term “directional microphone” is used as the generic term for all microphones which are not omnidirectional microphones, and refers to microphones that respond differently to sounds arriving from different directions, wherein the direction dependent sensitivity to acoustic signals is defined or described by the directional microphones' directivity pattern and encompasses, for example, figure eight microphones, cardioid heart shaped microphones, super cardioid microphones and hyper cardioid microphones but, as indicated, not omnidirectional microphones.
  • wearable device 150 further comprises imaging module 107, configured to capture gestures from the user, using for example glove 102, and/or fingertip (wireless) sensor 103 (e.g., a tactile sensor for detecting a contact pressure), whereby the user can organize the marker scene on the near-eye transparent display 106.
  • imaging module, whether coupled to wearable device 150 or on board portable computing device 110, as used herein refers to a unit that can include at least one built-in image and/or optic sensor that outputs electrical signals, which have been obtained through photoelectric conversion, as an image, and/or alternatively is configured to enable the user to change the location of images on near-eye transparent display 106.
  • module refers to software, hardware, for example, a processor, or a combination thereof that is programmed with instructions for carrying out an algorithm or method (see e.g., processing unit 120).
  • the modules described herein may communicate through a wired connection, for example, a hard-wired connection or a local area network, or the modules may communicate wirelessly.
  • the imaging module may comprise charge coupled devices (CCDs), a complementary metal-oxide semiconductor (CMOS), a hyperspectral camera, or a combination comprising one or more of the foregoing.
  • the imaging module can comprise a digital frame camera, where the field of view (FOV, referring to the extent of the observable world that is seen at any given moment by a user or the near-eye display, which can be adjustable) can be predetermined by, for example, the camera size and the distance from the subject’s face.
  • the cameras used in the imaging modules of the systems and programs disclosed can be digital cameras.
  • digital camera refers, in an exemplary implementation, to a digital still camera, a digital video recorder that can capture a still image of an object, and the like.
  • the digital camera can comprise an image capturing unit or module, a capture controlling module, and a processing unit (which can be the same as or separate from the central processing module).
  • Setting up the device can be done, in certain exemplary implementations, using the following pseudo-code:
  • Mic Array module receives control inputs from user:
  • Beam direction: if front, keep beam direction; if other, change to the other direction. Beam opening angle is defined by the operating mode selection by the user.
  • the device is operable to focus on the speaker (person) or location, to the exclusion of others.
  • the user may use the wireless finger (or hand/head/body) sensor to point and/or select the person/area of interest without losing focus.
  • the user can read the text and symbols AND keep visual attention. In these circumstances the user doesn't need to look at the display (or anywhere else for selecting the subject of interest or attention).
  • the display is additionally configured to show control symbols/icons as part of the near-eye display so the user can actuate these control symbols by “blind pointing” at them and activate the wanted feature.
  • Display of dynamic/virtual keyboard on the near-eye display is operable in certain configurations for fast writing controlled by the (wireless) fingers/wrist sensor. Availability of the virtual keyboard and actionable icons, enables the user to write and/or select features fast.
  • the term "operable” means the system and/or the device and/or the program, or a certain element or step is fully functional, sized, adapted and calibrated, comprises elements for, and meets applicable operability requirements to perform a recited function when activated, coupled, implemented, actuated, effected, realized, or when an executable program is executed by at least one processor associated with the system and/or the device.
  • the term "operable” means the system and/or the circuit is fully functional and calibrated, comprises logic for, having the hardware and firmware necessary, as well as the circuitry for, and meets applicable operability requirements to perform a recited function when executed by at least one processor.
  • actionable icon is used herein to mean graphics, and /or icons that can be used to trigger one or more actions on the GUI displayed on the near-eye display.
  • actionable icons may include, but are not limited to, specific speakers, boundaries of given areas of interest, volume level, color assignment, or a combination comprising the foregoing.
  • for a hearing-impaired user who can speak, but with a heavy accent, the virtual keyboard can enable the user to interactively correct his or her speech (provided the system features a tailored AI interpreter for the deaf user’s accent, diction, enunciation and the like), if the automatic transcription has errors.
  • Imaging module 107 can also be configured to transmit video feed to either near-eye transparent display 106, and/or CPM 120, and/or portable computing device 110 (e.g., smartphone).
  • the video feed can be used, with other audible triggers, to determine optical and acoustic flow, thus adding to the maintenance of the direction of audible trigger flow through the user’s scene.
  • the imaging module can comprise video cameras, configured to transmit at the format commensurate with local standard, for example, NTSC, PAL, or as encoded and encrypted packets.
  • the video can be transmitted using radio frequency, or if intended for indoor use, using Bluetooth, ZigBee, or cellular networks provided the structure is equipped with the proper beacons.
  • the video encoding can have a format of H.264, H.265, MPEG, and the like.
  • the combination of video stream and recorded audio obtained from the microphone array is used for machine learning to optimize the display and learn the user’s preferences, as well as filter out those sounds that are irrelevant to the user in a given circumstance.
  • both portable computing device 110 and CPM 120 may further comprise a user interface module, whereby, “user interface module” broadly refers to any visual, graphical, tactile, audible, sensory, or other means of providing information to and/or receiving information from the user or other entity.
  • a set of instructions 111-114 which enable presenting a graphical user interface (GUI) on near-eye transparent display 106 to the user 700 (not shown) for displaying, changing and/or inputting data associated with a data object (e.g., wearable device 150, near-eye transparent display 106) in data fields.
  • the user interface module is capable of displaying any data that it reads from imaging module 107, and microphone array 105i.
  • near eye refers to a display where near-eye display 106 is disposed in near proximity to eyes of user 700 in one exemplary implementation.
  • Near proximity can include various ranges according to system parameters and design criteria; for example, near-eye solutions are within 10 cm (e.g., within 7.5 cm, and within 5 cm) of the eyes of a user 700 when system 10 is in use.
  • portable wearable device 110 is positioned within 2.5 cm.
  • near-eye transparent display 106 of wearable device 150 can be monocular in certain exemplary implementations, or binocular in others. Furthermore, if the display is bi-ocular, the display of the markers can be limited to a single ocular display, or to both, and can further move from one to the other. In implementations using monocular display, near-eye transparent display 106 may cover a single eye of the user or both.
  • the term “bi-ocular near-eye transparent display” refers to near-eye transparent display 106 which is intended to display audible markers in front of both eyes simultaneously. Consequently, bi-ocular near-eye transparent display 106 comprises optical elements configured to present data to both eyes. Conversely, the term “monocular” as used herein refers to optical elements and near-eye transparent display 106 having a single eyepiece or ocular designed for viewing an objective image, typically in front of one eye, while the other eye remains uncovered.
  • the near-eye display is a module capable of being assembled as a kit of parts to form the system disclosed herein.
  • the module is operable to couple to other parts in the kit of parts, for example as a clip-on to a frame of existing eye glasses, a helmet and other wearable devices that will allow the near-eye display to provide the display to user 700.
  • the near eye display whether as a module or integral component can be comprised of one or more of, by way of example only; OLED, TOLED, iLED (micro-LED), PHOLED (Phosphorescent OLED), WOLED (White OLED), FOLED (Flexible OLED), ELED (Electroluminescent display), TFEL (Thin Film Electroluminescent), TDEL (Thick dielectric electroluminescent), or Quantum Dot Laser.
  • the system further comprises means, such as at least one of: SONAR, LIDAR, Doppler radar, RGBD camera, Time-of-flight (ToF) Camera and the like, each operable to provide acoustic mapping of a given volume, for example, a room.
  • the mapping is used in certain exemplary implementations for better isolation of input sources and/or noise cancellation from other sound input sources.
  • mapping can also be used for sound orientation (e.g. distance resolution of the source); for example, where the user is sitting in front of a PC near a wall and a dog at the far end of the room barks, the bark sound is received from the dog as well as from the wall in front of the user.
  • mapping enables the system to define that the sound (e.g., the barking) is from behind on the far side.
  • mapping is used for selection of a desired area of interest and isolating a source by cancelling all other noises (as well as echoes) out of the area of audible interest (in other words, an area of acoustic interest AAI).
  • the systems and devices disclosed are operable to focus on a predefined segment (in other words, beam focus), as well as provide resolution of sound 360° around the user, both indoors and outdoors.
  • the system comprises processor readable media, in communication with a non-volatile memory device having thereon a set of executable instructions, configured when executed to perform machine learning capabilities for allowing the user to discern the different audible sources and, based on the selected field of attention (FOAT), to select a predetermined one out of several sources by the user saying its name.
  • the system is then configured to display the possible descriptors of the marker, and the user can then choose one by, in certain examples, a finger touch on the user interface module’s touch screen (e.g., of portable computing device 110), or by indicating verbally.
  • the systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers by a specific marker are further configured to change the marker (for example, a person’s image or avatar, a symbol, a caption and the like) in size as a function of the auditory signal’s magnitude.
  • the marker of the sound source identified and selected will increase (or decrease) in size the closer (or farther) the audible source is to the user.
  • the markers can be, for example, at least one of: the voice of a known person (recognized by the system or defined by the user), sirens (e.g., ambulance, police, fire, air-raid, hurricane), baby cry, doorbell, ringtone (recognized by the system or defined by the user), car/truck/ship/train horns, boiling kettle, various appliance sounds (e.g., microwave, washing machine, dishwasher and the like), door slam, sliding window, creaking of hinges/steps, lock opening, fan sounds (e.g., room fan, ceiling fan, computer fan and the like), and a combination comprising the foregoing. It is noted that each source would be assigned a specific marker (e.g., symbol) that is unique to that source (a non-limiting sketch of such source-specific markers and magnitude-based sizing is given at the end of this Definitions section).
  • the systems used herein can be computerized systems, further comprising a central processing module; a display module; and a user interface module.
  • the display modules can include display elements, which may include any type of element that acts as a display.
  • a typical example is a Liquid Crystal Display (LCD).
  • an LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal.
  • other examples are OLED displays and bi-stable displays.
  • New display technologies are also being developed constantly. Therefore, the term display should be interpreted widely and should not be associated with a single display technology.
  • the display module may be mounted on a printed circuit board (PCB) (121, see e.g., FIG. 1) of electronic device 120, arranged within a protective housing and the display module is protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
  • a non-transitory processor readable storage medium having thereon a set of executable instructions, configured, when executed by at least one processor included in a system comprising a wearable display device sized and configured to receive auditory signals and display visual markers to the user and a central processing module (CPM, see e.g., 304, FIG. 3), to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
  • a text box is rendered under the speaker’s (in other words, the source of the audible signal’s) chin, with the width of the speaker’s shoulders, and if the speaker moves, the text box is operable to follow and maintain the predefined configuration (a minimal caption-box sketch is given at the end of this Definitions section).
  • Non-volatile media can be, for example, optical or magnetic disks, such as a storage device.
  • Volatile media includes dynamic memory, such as main memory.
  • Memory device as used in the programs and systems described herein can be any of various types of memory devices or storage devices.
  • the term “memory device” is intended to encompass an installation medium, e.g., a CD-ROM, floppy disks, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; or a non-volatile memory such as a magnetic media, e.g., a hard drive, optical storage, or ROM, EPROM, FLASH, etc.
  • the memory device may comprise other types of memory as well, or combinations thereof.
  • the memory medium may be located in a first computer in which the programs are executed (e.g., wearable device 150), and/or may be located in a second different computer [or micro controller, e.g., portable computing device 110] which connects to the first computer over a network, such as the Internet.
  • the second computer may further provide program instructions to the first computer for execution.
  • the term “memory device” can also include two or more memory devices which may reside in different locations, e.g., in different computers that are connected over a network.
  • an exemplary implementation of the system 200 is provided in FIG. 2, illustrating both audible capture 201 and visual capture 202 of an audible trigger by a person. The trigger is captured by an i-th microphone in microphone array 105i and transmitted 203 wirelessly to be processed by sound data module 204 and audio and sounds module 205, where, following pre-classification, the system determines whether the source is recognized (which may also be affected by the user), based on which sound and direction symbols (interchangeable with markers) are assigned 206; these, together with linguistic data module 207, equipped for example with natural language recognition, and speech-to-text module 208, convert the sound to text 209, which is input into sound merge module 210, comprising lip-reading capabilities.
  • the source is captured 202 with, for example, a video stream, then transmitted 216 to image processing module 217, where, optionally using wireless transmission 221, the processed image can undergo 218 facial recognition and lip-reading analysis 219, and is then uploaded to sound merge module 210, where both input streams (audible 201 and visual 202), using wireless 215 (or wired) communication, are input 211 into display generator 212, which renders 213 the processed data on near-eye display 214 (near-eye transparent display 106, see e.g., FIG. 1); a simplified sketch of this flow is given at the end of this Definitions section.
  • FIG. 3 illustrates an exemplary configuration of system 300.
  • audio inputs 301 are generally captured and provided by one or more microphones.
  • the microphones are in wired or wireless communication with the CPM 304.
  • the microphone arrays are configured as a module, capable of being assembled as a kit of parts to form system 300 (or 10), for example by operably coupling to regular glasses (or AR/VR/MR devices), e.g. by clipping onto and/or adhering to the frame.
  • video inputs 302 generally comprise one or more cameras forming a portion of imaging module 107. Similar to the audio module, imaging module 302 is in wired or wireless communication with the CPM 304.
  • manual inputs 303 are supplied from various sensors (e.g., fingertip/hand sensors 102, see e.g., FIG. 1) again available in certain exemplary implementations as a separate module, capable of being assembled as a kit of parts to form system 300 (or 10) operable to be in wired or wireless communication with CPM 304.
  • CPM 304 is operable in certain implementations, to perform at least two processes: i) Speech to text transcription ii) Display Process.
  • Processing module can be locally embedded within portable computing device 110, capable of being assembled as a kit of parts to form system 300 (or 10), or be remote (cloud- based).
  • CPM 304 can have additional processes such as Spatial Visual analysis, operable to detect and identify visible objects, e.g. the upper part of a person’s (e.g., speaker’s or user’s) face, lips, eyes and the like.
  • Speech to text process 305 is operable to transcribe, in real time, words and sentences from voice input to text, which is further manipulated before display.
  • the system is configured to execute more than one Speech to text process at the same time.
  • Spatial Auditory analysis 306 detects, separates and retrieves all the sound types other than speech, transcribes and converts (renders) these inputs to graphic format (text, symbols, actionable icons, etc.) and sends them to the Text/Graphic Display Process for final arrangement and determination of if, how and what to display.
  • Spatial Video analysis 307 is configured to detect surrounding objects, such as, for example, people, their direction, face and lips, and sends the information to the Text/Graphic Display Process for final arrangement and determination of if, how and what to display (render).
  • Display process 308 in system 300 is operable to receive interpreted and analyzed processable data about the auditory and visual surroundings and, based on it, arrange it in the form of a dynamic graphic presentation such as text, symbols, actionable icons and animations adapted to the specific needs of user 700.
  • Systems 300 used herein can be computerized systems further comprising: central processing module 304; a transparent near-eye display module 106; and user interface module (e.g., sensor 102).
  • Display modules, e.g., near-eye display 106, which can include display elements, may further include any type of element that acts as a display.
  • a typical example is a Liquid Crystal Display (LCD).
  • an LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal.
  • other examples are OLED displays and bi-stable displays. New display technologies are also being developed constantly. Therefore, the term display should be interpreted broadly and should not be associated with a single display technology.
  • the display module 106 may be mounted on a printed circuit board (PCB) of an electronic device, arranged within a protective housing and the display module is protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
  • module is used herein to refer to software computer program code and/or any hardware or circuitry utilized to provide the functionality attributed to the module.
  • module or component can also refer to software objects or routines that execute on the computing system.
  • the different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads).
  • processing unit may be a single processing device or a plurality of processing devices.
  • Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on hard coding of the circuitry and/or operational instructions (in other words, firmware).
  • the at least one processor, processing circuit, and/or processing unit may have an associated memory and/or an integrated memory element, which may be a single memory device, a plurality of memory devices, and/or embedded circuitry of the processing module, module, processing circuit, and/or processing unit.
  • Such a memory device may be a read-only memory, random access memory, transient memory, non transient memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information.
  • the processing devices may be centrally located or may be distributed (e.g., cloud computing via indirect coupling via a local area network and/or a wide area network).
  • the memory element may store, and processor, module, processing circuit, and/or processing unit executes, hard coded and/or operational instructions corresponding to at least some of the steps and/or functions.
  • Such a memory device or memory element can be and is included in an exemplary implementation, as an article of manufacture.
  • the at least one processor may be operably coupled to the various modules and components with appropriate circuitry. As may also be used herein, the term(s) “operably coupled to”, “coupled to”, and/or “coupling” includes direct coupling between items and/or indirect coupling between items via an intervening item (e.g., an item includes, but is not limited to, a component, an element, a circuit, an engine, and/or a module) where, for indirect coupling, the intervening item does not modify the information of a signal but may adjust its current level, voltage level, and/or power level.
  • “inferred coupling” includes direct and indirect coupling between two items in the same manner as “coupled to”.
  • “operable to” or “operably coupled to” indicates that an item includes one or more of power connections, input(s), output(s), etc., to perform, when activated, one or more of its corresponding functions and may further include inferred coupling to one or more other items.
  • “associated with” includes direct and/or indirect coupling of separate items and/or one item being embedded within another item.
  • a system for providing visual indication of an auditory signal to a user comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, speaker identification, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker, wherein (i) the user is hard-of-hearing, (ii) the wearable display device further comprises: a first microphone array configured to provide a surround auditory input; and an additional (second) wearable microphone array configured to provide a surround auditory input, (iii) each microphone array comprises a plurality of directional microphones, wherein (iv) the wearable display device
  • an article of manufacture comprising a central processing module having at least one processor, in communication with a non-transitory processor readable storage medium having thereon a set of executable instructions, configured, when executed by the at least one processor, to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and render the marker, wherein (xi) the article of manufacture further comprises a wearable display device sized and configured to receive auditory signals and display visual markers to the user, (xii) the wearable display device comprises a microphone array of at least three microphones configured to provide a surround auditory input and wherein the set of executable instructions are further configured, when executed, to: determine at least one of the direction, source, and magnitude of the auditory signal based on acoustic flow
  • kits of parts capable of being assembled for providing visual indication of auditory signal to a user
  • the kit comprising: a microphone array configured to provide a surround auditory input; an imaging module, the imaging module configured to capture an image of at least one of: the auditory signal source and a user gesture; a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, the microphone array, and the imaging module, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, speaker identification, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
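By way of a non-limiting illustration of the source-specific markers and magnitude-dependent sizing described in the excerpts above, the following Python sketch shows one possible mapping; the symbol table, decibel range and pixel sizes are assumptions for illustration only, not values taken from the disclosure.
from dataclasses import dataclass

# Hypothetical mapping of recognized sources to unique marker symbols (illustrative only).
SOURCE_MARKERS = {
    "known_person": "PERSON",
    "siren": "SIREN",
    "baby_cry": "BABY",
    "doorbell": "BELL",
    "car_horn": "HORN",
}

@dataclass
class Marker:
    symbol: str
    direction_deg: float  # direction of the source relative to the user
    size_px: int          # rendered size, scaled with magnitude/proximity

def marker_for(source_type: str, direction_deg: float, magnitude_db: float) -> Marker:
    """Assign the source-specific symbol and scale it with the signal magnitude,
    so the marker grows as the source gets louder/closer and shrinks otherwise."""
    base_px, max_px = 24, 96                                 # assumed display limits
    scale = min(max(magnitude_db - 30.0, 0.0) / 60.0, 1.0)   # assumed 30-90 dB range
    size = int(base_px + scale * (max_px - base_px))
    return Marker(symbol=SOURCE_MARKERS.get(source_type, "UNKNOWN"),
                  direction_deg=direction_deg,
                  size_px=size)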
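The placement of a transcription box under the speaker's chin, with the width of the speaker's shoulders, can be illustrated with the following minimal sketch, assuming per-frame face and shoulder bounding boxes from the imaging module; the bounding-box format and box height are assumptions.
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in display pixels

def caption_box(face: Box, shoulders: Box, height_px: int = 60) -> Box:
    """Place the transcription box just under the speaker's chin, as wide as the
    speaker's shoulders; recomputing it every frame makes the box follow the speaker."""
    fx, fy, fw, fh = face
    sx, sy, sw, sh = shoulders
    chin_y = fy + fh  # bottom edge of the face box approximates the chin line
    return (sx, chin_y, sw, height_px)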
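The FIG. 2 flow (capture, classification, transcription, lip reading, merge and display) can be outlined with the following simplified sketch; the injected callables stand in for the numbered modules and are placeholders, not APIs from the disclosure.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fig2Pipeline:
    classify_sound: Callable   # stands in for modules 204/205 (source pre-classification)
    speech_to_text: Callable   # stands in for modules 207/208 (transcription)
    lip_read: Callable         # stands in for modules 217-219 (face recognition + lip reading)
    render: Callable           # stands in for modules 212-214 (display generator + near-eye display)

    def process(self, audio_frame, video_frame):
        source, direction = self.classify_sound(audio_frame)
        marker = {"source": source, "direction": direction}            # 206: sound/direction symbol
        text = self.speech_to_text(audio_frame)                        # 209: sound converted to text
        lip_text = self.lip_read(video_frame)                          # 219: lip-reading hypothesis
        merged = text if text == lip_text else f"{text} [{lip_text}]"  # 210: sound merge
        self.render({"marker": marker, "text": merged})                # 211-213: display and rendering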

Abstract

The disclosure relates to assistive devices for the Deaf and/or hard-of-hearing (DHH). Specifically, the disclosure relates to systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of markers for auditory triggers.

Description

SYSTEMS, AND PROGRAMS FOR VISUALIZATION OF AUDITORY SIGNALS
BACKGROUND
[0001] The disclosure is directed to assistive devices for the Deaf and/or hard-of-hearing (DHH). Specifically, the disclosure is directed to systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers.
[0002] Although significant progress has been made in providing subjects who have hearing deficiencies with improved hearing aids, wide adoption of hearing aids among the population has still not been achieved. Moreover, in a noise-rich environment, it is hard for an individual to identify a sound source and follow the source’s movement in space.
[0003] These and other shortcomings of the existing technology are sought to be resolved herein.
SUMMARY
[0004] Disclosed, in various exemplary implementations, are systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers.
[0005] In an exemplary implementation, provided herein is a system for providing visual indication of an auditory signal to a user, the system comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
[0006] In another exemplary implementation, provided herein is a non-transitory processor readable storage medium having thereon a set of executable instructions, configured, when executed by at least one processor included in a system comprising a wearable display device sized and configured to receive auditory signals and display visual markers to the user and a central processing module (CPM), to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of: the direction, source, and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
[0007] These and other features of the systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers will become apparent from the following detailed description when read in conjunction with the figures and examples, which are exemplary, not limiting.
BRIEF DESCRIPTION OF THE FIGURES
[0008] For a better understanding of the systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers, with regard to the exemplary implementations thereof, reference is made to the accompanying examples and figures, in which:
[0009] FIG. 1 illustrates a schematic of the system’s components;
[00010] FIG. 2 is a schematic flow chart of an exemplary implementation of the process of providing real-time, source-, location- and magnitude-specific rendering of auditory triggers; and
[00011] FIG. 3 is a schematic illustrating the system architecture.
DETAILED DESCRIPTION
[00012] Provided herein are exemplary implementations of systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers. Accordingly, provided herein is a system for providing visual indication of an auditory signal (via, for example, automatic transcription), in such a way that the user receives (perceives) the auditory signal with no distraction, or minimal interference, from the system to the user’s general point/area of attention/concentration/focus/regard.
[00013] In certain implementations, provided herein are systems for real-time auditory-to-visual transcription by surround auditory input analysis (separation of different sound types, voices, their directions and amplitudes, etc.), which render their visual interpretation or expression on a transparent near-eye display, AR lenses, Smart Lenses or another display, e.g. a PC monitor, TV or smartphone display screen. The systems and programs embodied on non-transitory memory device(s) are sized and configured to allow the user to control streaming of visual markers, such as text, symbols and animation, onto a selectable display, and to selectably position those markers in the user’s field of view. For example, the user can set the location; the text follows the dialog partner’s face on the left/above/below, and the same applies to symbols.
[00014] In the systems provided, using the methods and programs disclosed, the user’s control movements and gestures, as well as the reading of the rendered text/symbols/GIFs, are adapted and configured to have minimal interference and/or disturbance and/or distraction and/or diversion effects on the user’s point of regard, and/or area of observation and/or particular subject of interest and/or attention.
[00015] In certain implementations, the systems and non-transitory storage media for providing real-time, source-, location- and magnitude-specific rendering of auditory triggers disclosed are configured to allow the user to control text format, such as, for example, at least one of: initial text size, background, font, bold, italics, and the like on the display, e.g. pre-set or online, for optimizing the view and understanding of the overlaid displayed info under different scenarios and backgrounds. These scenarios can arise indoors in the dark, or in a noisy outdoor environment. In certain implementations, the systems and programs are configured to pre-select a field of attention, either automatically, based on acoustic density parameters, or by the user.
[00016] For example, and in an exemplary implementation, displaying, or rendering, of the outdoor environment to the user can be carried out using the following pseudo-code:
Check mode (it is set to outdoor)
Set beam angle to 360 degrees
Get sound
If sound is from a siren, display symbol on alert window; otherwise ignore
If sound increases in volume, then increase symbol/direction size
If symbol size has reached max, then change color to a hotter red
Check direction
If direction has changed, then set direction vector of alarm symbol to the new direction
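A hedged translation of the outdoor pseudo-code above into Python might look as follows; the symbol dictionary, size step, and color ramp are illustrative assumptions rather than values specified by the disclosure.
def update_outdoor_alert(symbol, sound, prev_volume, max_size=96):
    """symbol: dict with 'size', 'color', 'direction'; sound: dict with 'type',
    'volume', 'direction' for the current audio frame."""
    if sound["type"] != "siren":
        return symbol, prev_volume                 # ignore anything that is not a siren
    if sound["volume"] > prev_volume:              # sound increased in volume
        symbol["size"] = min(symbol["size"] + 8, max_size)
    if symbol["size"] >= max_size:                 # at maximum size, shift to a hotter red
        symbol["color"] = "red"
    if sound["direction"] != symbol["direction"]:  # direction changed
        symbol["direction"] = sound["direction"]   # re-point the alarm symbol
    return symbol, sound["volume"]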
[00017] Accordingly, and in an exemplary implementation, provided herein is a system for providing visual indication of an auditory signal to a user, the system comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of: the direction, source, speaker identification and magnitude of each of the plurality of auditory signals; convert at least one of the analyzed auditory signals to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker. In the context of the various implementations and examples disclosed herein, the term “render” does not make any assumptions as to whether the rendering process is performed by software rendering or by hardware rendering, but rather refers to producing a 2D graphics image on an output device (e.g., transparent near eye display 106, see e.g., FIG. 1).
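As a non-limiting illustration of the receive/analyze/convert/render behavior of the CPM described in the preceding paragraph, a minimal Python skeleton could look as follows; the class and method names are placeholders rather than the disclosure's API.
from typing import Iterable

class CentralProcessingModule:
    def __init__(self, display):
        self.display = display                       # wearable display device (e.g., 106/150)

    def handle(self, auditory_signals: Iterable[dict]):
        for signal in auditory_signals:              # receive a plurality of auditory signals
            analysis = {                             # analyze direction, source and magnitude
                "direction": signal.get("direction"),
                "source": signal.get("source"),
                "magnitude": signal.get("magnitude"),
            }
            marker = self.to_marker(analysis)        # convert the analysis to a specific marker
            self.display.render(marker)              # cause the wearable display to render it

    def to_marker(self, analysis: dict) -> dict:
        return {"symbol": analysis["source"] or "unknown",
                "angle": analysis["direction"],
                "size": analysis["magnitude"]}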
[00018] Under certain circumstances, it may be difficult for a user to hear the sound of the surrounding environment. In an exemplary implementation, the systems and programs disclosed herein are configured to provide an indication of a direction from the wearable device of a source of sound and/or an intensity of the sound. As an example, a user may be at a crosswalk attempting to cross a street, and an oncoming car may be honking at the user in order to alert the user that the car is driving through the crosswalk. In such a case, it may be helpful to indicate to the user the direction from which the honk is coming (e.g., from the left or the right), the intensity of the honk (e.g., in order to indicate how close the oncoming car is to the user), and the direction in which the car is progressing (e.g., to the left or the right). The wearable display device (see e.g., 150, FIG. 1) capable of addressing that need can be wearable glasses, comprising a microphone array 105i configured to provide a surround auditory input. The wearable device, in conjunction with the rest of the system, is configured to provide holistic integration of the auditory surround field with the visual field; in other words, it incorporates space geometry and incidental sounds and echoes into the auditory analysis for filtering and additional processing. In certain implementations, the microphone array is sized and configured to map the surrounding acoustic environment by emitting a 360° frequency sweep and analyzing its reception.
[00019] The microphone array may comprise a plurality of microphones, for example, a plurality of directional microphones. The plurality of microphones configured to detect the sound of the surrounding environment may be arranged in a manner to detect sound coming from a plurality of directions. For instance, the microphones may be arranged in a microphone array. FIG. 1 illustrates an example wearable device 150 having a microphone array 105i. In particular, wearable device 150 comprises an array 105i of directional (and/or omni) microphones. Each directional microphone 105i is arranged so as to capture audio in its respective corresponding region 108j (configured to sense sound at specific directions, amplitude, distance and width of lobe).
[00020] Note that although each directional microphone is positioned to primarily detect sound coming from the corresponding region, the microphones may also detect sound coming from other regions. In the context of the disclosure, the term “directional microphone” is used as the generic term for all microphones which are not omnidirectional microphones, and refers to microphones that respond differently to sounds arriving from different directions, wherein the direction dependent sensitivity to acoustic signals is defined or described by the directional microphones' directivity pattern and encompasses, for example, figure eight microphones, cardioid heart shaped microphones, super cardioid microphones and hyper cardioid microphones but, as indicated, not omnidirectional microphones.
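As a non-limiting illustration of how per-microphone directivity and energies could yield a direction estimate for the array 105i, consider the following sketch; the first-order directivity model and the energy-weighted circular mean are standard textbook approximations, not the method specified by the disclosure.
import math

def capsule_gain(theta_rad: float, pattern: float = 0.5) -> float:
    """First-order directivity: pattern=1.0 is omnidirectional, 0.5 is cardioid,
    0.0 is figure-eight (absolute value taken for the rear lobe)."""
    return abs(pattern + (1.0 - pattern) * math.cos(theta_rad))

def estimate_direction(mic_azimuths_deg, mic_energies):
    """Energy-weighted circular mean of the microphones' look directions,
    a crude stand-in for the array's direction-of-arrival estimate."""
    x = sum(e * math.cos(math.radians(a)) for a, e in zip(mic_azimuths_deg, mic_energies))
    y = sum(e * math.sin(math.radians(a)) for a, e in zip(mic_azimuths_deg, mic_energies))
    return math.degrees(math.atan2(y, x)) % 360.0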
[00021] As illustrated in FIG. 1, wearable device 150 further comprises imaging module 107, configured to capture gestures from the user, using for example glove 102, and/or fingertip (wireless) sensor 103 (e.g., a tactile sensor for detecting a contact pressure), whereby the user can organize the marker scene on the near-eye transparent display 106. It is noted that the term “imaging module”, whether coupled to wearable device 150 or on board portable computing device 110, as used herein refers to a unit that can include at least one built-in image and/or optic sensor that outputs electrical signals, which have been obtained through photoelectric conversion, as an image, and/or alternatively is configured to enable the user to change the location of images on near-eye transparent display 106, while the term “module” refers to software, hardware, for example, a processor, or a combination thereof that is programmed with instructions for carrying out an algorithm or method (see e.g., processing unit 120). The modules described herein may communicate through a wired connection, for example, a hard-wired connection or a local area network, or the modules may communicate wirelessly. The imaging module may comprise charge coupled devices (CCDs), a complementary metal-oxide semiconductor (CMOS), a hyperspectral camera, or a combination comprising one or more of the foregoing. If static images are acquired, the imaging module can comprise a digital frame camera, where the field of view (FOV, referring to the extent of the observable world that is seen at any given moment by a user or the near-eye display, which can be adjustable) can be predetermined by, for example, the camera size and the distance from the subject’s face. Furthermore, the FOV can be selectably variable to coincide with the field of attention determined automatically and/or by the user. The cameras used in the imaging modules of the systems and programs disclosed can be digital cameras. The term “digital camera” refers in an exemplary implementation to a digital still camera, a digital video recorder that can capture a still image of an object, and the like. The digital camera can comprise an image capturing unit or module, a capture controlling module, and a processing unit (which can be the same as or separate from the central processing module).
[00022] Setting up the device can be done, in certain exemplary implementations, using the following pseudo-code:
Operating Mode Setup (affects Audio input setup as well as Rendering and Display windows setup)
Operating modes: Home; Street/Outdoor; Room/conference; Auditorium/Theatre; Smartphone, tablet, Laptop; TV
Conversation types: Voice only; Dialog; Trialog; Multi-conversation
Beam direction: default beam of mic array points to front
Mic Array module receives control inputs from user:
Beam direction: if front, keep beam direction; if other, change to the other direction
Beam opening angle is defined by the operating mode selected by the user
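The setup pseudo-code above can be sketched as a mapping from the user-selected operating mode to a beam configuration for the microphone array; the opening-angle values below are illustrative assumptions, not figures from the disclosure.
BEAM_OPENING_DEG = {                  # beam opening angle defined by the operating mode
    "home": 360,
    "street/outdoor": 360,
    "room/conference": 180,
    "auditorium/theatre": 60,
    "smartphone/tablet/laptop": 30,
    "tv": 30,
}

def configure_mic_array(mode: str, beam_direction: str = "front") -> dict:
    """Default beam points to the front; a user control input can steer it,
    and the opening angle follows the selected operating mode."""
    return {
        "direction": beam_direction,                       # keep 'front' unless the user changes it
        "opening_deg": BEAM_OPENING_DEG.get(mode.lower(), 360),
    }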
[00023] For example, using a regular gesture on a smartphone display pad (or any other control feature) to select a person of interest (or location of interest), the device is operable to focus on the speaker (person) or location, to the exclusion of others. Similarly, the user may use the wireless finger (or hand/head/body) sensor to point and/or select the person/area of interest without losing focus. In other words, the user can read the text and symbols AND keep visual attention. In these circumstances the user doesn't need to look at the display (or anywhere else) for selecting the subject of interest or attention. Moreover, the display is additionally configured to show control symbols/icons as part of the near-eye display so the user can actuate these control symbols by “blind pointing” at them and activate the wanted feature. Display of a dynamic/virtual keyboard on the near-eye display is operable in certain configurations for fast writing controlled by the (wireless) fingers/wrist sensor. Availability of the virtual keyboard and actionable icons enables the user to write and/or select features fast.
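A minimal sketch of the "blind pointing" selection described above, assuming the system tracks each speaker's azimuth and the finger/hand sensor reports a pointing direction; the angular threshold and data shapes are assumptions for illustration.
def select_by_pointing(pointing_deg: float, speakers: dict, max_error_deg: float = 15.0):
    """speakers maps a speaker id to its current azimuth (degrees) in the user's frame;
    returns the id of the speaker closest to the pointed direction, or None."""
    def angular_error(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)        # shortest angular distance
    best = min(speakers, key=lambda s: angular_error(pointing_deg, speakers[s]), default=None)
    if best is not None and angular_error(pointing_deg, speakers[best]) <= max_error_deg:
        return best                                        # focus on this speaker, excluding others
    return None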
[00024] In the context of the disclosure, the term "operable" means the system and/or the device and/or the program, or a certain element or step is fully functional, sized, adapted and calibrated, comprises elements for, and meets applicable operability requirements to perform a recited function when activated, coupled, implemented, actuated, effected, realized, or when an executable program is executed by at least one processor associated with the system and/or the device. In relation to systems and circuits, the term "operable" means the system and/or the circuit is fully functional and calibrated, comprises logic for, having the hardware and firmware necessary, as well as the circuitry for, and meets applicable operability requirements to perform a recited function when executed by at least one processor. Additionally, the term "actionable icon" is used herein to mean graphics, and /or icons that can be used to trigger one or more actions on the GUI displayed on the near-eye display. For example, actionable icons, may include, but are not limited to, specific speakers, boundaries of given areas of interest, volume level, color assignment, or a combination comprising the foregoing.
[00025] In certain exemplary implementations, for a hearing-impaired user who can speak, but with a heavy accent, the virtual keyboard can enable the user to interactively correct his or her speech (provided the system features a tailored AI interpreter for the deaf user’s accent, diction, enunciation and the like), if the automatic transcription has errors.
[00026] Imaging module 107 can also be configured to transmit a video feed to either near-eye transparent display 106, and/or CPM 120, and/or portable computing device 110 (e.g., smartphone). The video feed can be used, with other audible triggers, to determine optical and acoustic flow, thus adding to the maintenance of the direction of audible trigger flow through the user’s scene. Moreover, the imaging module can comprise video cameras, configured to transmit in a format commensurate with the local standard, for example, NTSC, PAL, or as encoded and encrypted packets. The video can be transmitted using radio frequency, or if intended for indoor use, using Bluetooth, ZigBee, or cellular networks provided the structure is equipped with the proper beacons. In an exemplary implementation, the video encoding can have a format of H.264, H.265, MPEG, and the like. In certain implementations, the combination of the video stream and recorded audio obtained from the microphone array is used for machine learning to optimize the display and learn the user’s preferences, as well as filter out those sounds that are irrelevant to the user in a given circumstance.
[00027] Likewise, both portable computing device 110 and CPM 120 may further comprise a user interface module, whereby “user interface module” broadly refers to any visual, graphical, tactile, audible, sensory, or other means of providing information to and/or receiving information from the user or other entity. For example, a set of instructions 111-114 enables presenting a graphical user interface (GUI) on near-eye transparent display 106 to the user 700 (not shown) for displaying, changing, and/or inputting data associated with a data object (e.g., wearable device 150, near-eye transparent display 106) in data fields. In an exemplary implementation, the user interface module is capable of displaying any data that it reads from imaging module 107 and microphone array 105i.
[00028] In the context of the disclosure, the term “near eye” refers to a display where near-eye display 106 is disposed in near proximity to the eyes of user 700, in one exemplary implementation. Near proximity can include various ranges according to system parameters and design criteria; for example, near-eye solutions are within 10 cm (e.g., within 7.5 cm, or within 5 cm) of the eyes of user 700 when system 10 is in use. In certain implementations, the portable wearable device 110 is positioned within 2.5 cm.
[00029] It is noted that near-eye transparent display 106 of wearable device 150 can be monocular in certain exemplary implementations, or bi-ocular in others. Furthermore, if the display is bi-ocular, the display of the markers can be limited to a single ocular display, or extend to both, and can further move from one to the other. In implementations using a monocular display, near-eye transparent display 106 may cover a single eye of the user or both. In the context of this description and the subsequent claims, the term “bi-ocular near-eye transparent display” refers to near-eye transparent display 106 that is intended to display audible markers in front of both eyes simultaneously. Consequently, bi-ocular near-eye transparent display 106 comprises optical elements configured to present data to both eyes. Conversely, the term "monocular" as used herein refers to optical elements and near-eye transparent display 106 having a single eyepiece or ocular designed for viewing an objective image, typically in front of one eye, while the other eye remains uncovered.
[00030] Furthermore, in an exemplary implementation, the near-eye display is a module capable of being assembled as a kit of parts to form the system disclosed herein. The module is operable to couple to other parts in the kit of parts, for example as a clip-on to the frame of existing eyeglasses, a helmet, and other wearable devices that allow the near-eye display to provide the display to user 700. For example, the near-eye display, whether a module or an integral component, can be comprised of one or more of, by way of example only: OLED, TOLED, iLED (micro-LED), PHOLED (Phosphorescent OLED), WOLED (White OLED), FOLED (Flexible OLED), ELED (Electroluminescent display), TFEL (Thin Film Electroluminescent), TDEL (Thick dielectric electroluminescent), or Quantum Dot Laser.
[00031] In an exemplary implementation, the system further comprises means such as at least one of: SONAR, LIDAR, Doppler radar, an RGBD camera, a Time-of-Flight (ToF) camera and the like, each operable to provide acoustic mapping of a given volume, for example a room. The mapping is used in certain exemplary implementations for better isolation of input sources and/or noise cancellation from other sound input sources. In certain exemplary implementations, mapping can also be used for sound orientation (e.g., distance resolution of the source). For example, where the user is sitting in front of a PC near a wall and a dog at the far end of the room barks, the bark is received both directly from the dog and as a reflection from the wall in front of the user; the mapping enables the system to determine that the sound (e.g., the barking) comes from behind, on the far side of the room. In certain exemplary implementations, mapping is used for selection of a desired area of interest and for isolating a source by cancelling all other noises (as well as echoes) outside the area of audible interest (in other words, an area of acoustic interest, AAI). In an exemplary implementation, the systems and devices disclosed are operable to focus on a predefined segment (in other words, beam focus), as well as to provide resolution of sound 360° around the user, both indoors and outdoors.
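A minimal sketch of the AAI-selection step is given below, assuming the sound sources have already been localized (e.g., by the mapping sensors above) and the AAI is modeled as a user-selected angular sector around the user; the coordinates, sector width, and function name in_aai are illustrative assumptions only.

# Illustrative sketch only: keeps sound sources whose estimated position falls
# inside a user-selected area of acoustic interest (AAI) and discards the rest.
import math

def in_aai(source_xy, aai_center_deg, aai_width_deg, max_range_m):
    """AAI modeled as an angular sector around the user at (0, 0)."""
    x, y = source_xy
    bearing = math.degrees(math.atan2(x, y)) % 360.0
    distance = math.hypot(x, y)
    diff = (bearing - aai_center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= aai_width_deg / 2.0 and distance <= max_range_m

# Example: a speaker ~2 m in front of the user is kept; the barking dog behind
# the user falls outside the selected sector and is suppressed.
sources = {"speaker_A": (0.5, 2.0), "dog": (-1.0, -6.0)}
kept = {name: pos for name, pos in sources.items()
        if in_aai(pos, aai_center_deg=0.0, aai_width_deg=60.0, max_range_m=4.0)}
print(kept)  # only sources inside the AAI are rendered/amplified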
[00032] In certain implementations, the system comprises processor-readable media, in communication with a non-volatile memory device having thereon a set of executable instructions, configured, when executed, to provide machine learning capabilities allowing the user to discern the different audible sources and, based on the selected field of attention (FOAT), to select a predetermined number of sources out of several by the user saying a source's name. The system is then configured to display the possible descriptors of the mark, and the user can then choose one by, in certain examples, a finger touch on the user interface module’s touch screen (e.g., of portable computing device 110), or by indicating verbally.
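One way such name-based selection could look in software is sketched below, assuming the speech-to-text stage already supplies the spoken word and the separated sources already carry user-visible descriptors; the use of difflib for tolerant matching and the select_by_name helper are illustrative assumptions, not the disclosed method.

# Illustrative sketch only: narrows the field of attention (FOAT) by matching a
# transcribed spoken word against the descriptors of the labeled sources.
import difflib

sources = {"Anna": "V1", "delivery door": "V2", "kitchen": "V3"}

def select_by_name(spoken_word, sources, cutoff=0.6):
    match = difflib.get_close_matches(spoken_word.lower(),
                                      [s.lower() for s in sources], n=1, cutoff=cutoff)
    if not match:
        return None  # nothing selected; keep displaying all candidate descriptors
    for name, source_id in sources.items():
        if name.lower() == match[0]:
            return source_id
    return None

print(select_by_name("anna", sources))    # -> 'V1'
print(select_by_name("garden", sources))  # -> None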
[00033] As indicated, the systems and non-transitory storage media for providing real-time, source, location and magnitude-specific rendering of auditory triggers by a specific marker are further configured to change the marker (for example, a person’s image or avatar, a symbol, a caption and the like) in size as a function of the auditory signal’s magnitude. In other words, in certain implementations, the marker of the identified and selected sound source will increase (or decrease) in size the closer (or farther) the audible source is to the user. The marked sources can be, for example, at least one of: the voice of a known person (recognized by the system or defined by the user), sirens (e.g., ambulance, police, fire, air-raid, hurricane), a baby cry, a doorbell, a ringtone (recognized by the system or defined by the user), car/truck/ship/train horns, a boiling kettle, various appliance sounds (e.g., microwave, washing machine, dishwasher and the like), a door slam, a sliding window, creaking of hinges/steps, a lock opening, fan sounds (e.g., room fan, ceiling fan, computer fan and the like), and a combination comprising the foregoing. It is noted that each source would be assigned a specific marker (e.g., a symbol) that is unique to that source.
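A minimal sketch of the magnitude-to-size mapping is given below; the dB range, the scale bounds, and the marker_scale function are arbitrary illustrative assumptions and not values from the disclosure.

# Illustrative sketch only: scales a source's marker with the measured signal
# level so the icon grows as the source gets louder/closer and shrinks as it
# gets quieter/farther.
import numpy as np

def marker_scale(samples, floor_db=-60.0, ceil_db=0.0, min_scale=0.5, max_scale=2.5):
    """samples: mono PCM block in [-1, 1]; returns a size multiplier."""
    rms = np.sqrt(np.mean(np.square(samples))) + 1e-12
    level_db = 20.0 * np.log10(rms)
    t = np.clip((level_db - floor_db) / (ceil_db - floor_db), 0.0, 1.0)
    return min_scale + t * (max_scale - min_scale)

quiet = 0.01 * np.random.randn(1024)   # roughly -40 dBFS
loud = 0.5 * np.random.randn(1024)     # roughly -6 dBFS
print(round(marker_scale(quiet), 2), round(marker_scale(loud), 2))  # small vs. large marker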
[00034] For example, separation of three speakers (in other words, people speaking), as well as a crying baby, using the systems and programs disclosed herein can be achieved using the following pseudo-code:
Check mode
Set proper beam angle and direction
Set display mode (Closed Caption rows at bottom, M words by N rows box, Flashing, etc.)
Get voices
Detect number of voices.
Give each recognized voice an ID (e.g., V1, V2 and V3)
    if a voice is not identified, ask the user to give a name for that voice
    if a voice is identified, attach the name to the proper text box
Get word (for unique pitch detection)
    if the voice is from V1, V2 or V3, display the word in text window No. 1, 2 or 3 correspondingly
    if the voice is not from V1, V2 or V3, ignore
Get sound
    if the sound is a baby crying, display a symbol in the alert window, otherwise ignore
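The pseudo-code above could be realized, for example, along the following lines in Python; the event format and the callbacks (show_text, show_alert, ask_user_for_name) are placeholders standing in for whatever speech, classification, and rendering engines the system actually uses, and are named here for illustration only.

# Illustrative sketch only: a Python rendering of the pseudo-code above.
def route_audio_events(events, known_voices, show_text, show_alert, ask_user_for_name):
    """events: iterable of (kind, payload) tuples, e.g. ('word', ('V1', 'hello'))
    or ('sound', 'baby_cry'). known_voices maps voice IDs to text windows."""
    for kind, payload in events:
        if kind == "word":
            voice_id, word = payload
            if voice_id not in known_voices:
                # Unrecognized voice: ask the user to name it, then open a window for it.
                known_voices[voice_id] = ask_user_for_name(voice_id)
            show_text(known_voices[voice_id], word)
        elif kind == "sound" and payload == "baby_cry":
            show_alert("baby crying")  # symbol in the alert window
        # all other sounds are ignored

# Minimal demo with stub callbacks:
events = [("word", ("V1", "hello")), ("word", ("V2", "coffee?")), ("sound", "baby_cry")]
route_audio_events(events,
                   known_voices={"V1": "window 1"},
                   show_text=lambda win, w: print(f"{win}: {w}"),
                   show_alert=lambda msg: print(f"ALERT: {msg}"),
                   ask_user_for_name=lambda vid: f"window for {vid}")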
[00035] The systems used herein can be computerized systems, further comprising a central processing module; a display module; and a user interface module. The display module can include display elements, which may include any type of element that acts as a display. A typical example is a Liquid Crystal Display (LCD). An LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal. There are, however, many other forms of displays, for example OLED displays and bi-stable displays. New display technologies are also being developed constantly. Therefore, the term display should be interpreted widely and should not be associated with a single display technology. Also, the display module may be mounted on a printed circuit board (PCB) (121, see e.g., FIG. 1) of electronic device 120, arranged within a protective housing, and the display module is protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
[00036] In an exemplary implementation, disclosed herein is a non-transitory processor-readable storage medium having thereon a set of executable instructions, configured, when executed by at least one processor included in a system comprising a wearable display device sized and configured to receive auditory signals and display visual markers to the user and a central processing module (CPM, see e.g., 304, FIG. 3), to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of the: direction, source, and magnitude of each of the plurality of the auditory signal; convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker. In certain exemplary implementations, a text box is rendered under the speaker’s chin (in other words, under the source of the audible signal), with the width of the speaker’s shoulders, and if the speaker moves, the text box is operable to follow and maintain the predefined configuration.
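A minimal sketch of the caption-box geometry is given below, assuming the imaging module (or a pose estimator downstream of it) supplies chin and shoulder keypoints each frame; the Keypoints structure, the margins, and the caption_box helper are illustrative assumptions only.

# Illustrative sketch only: computes the caption-box rectangle anchored under a
# speaker's chin and spanning the shoulder width, so the box follows the
# speaker when the keypoints are updated each frame.
from dataclasses import dataclass

@dataclass
class Keypoints:
    chin: tuple            # (x, y) in display/pixel coordinates
    left_shoulder: tuple
    right_shoulder: tuple

def caption_box(kp: Keypoints, lines: int = 2, line_height: int = 18, margin: int = 8):
    x_left = min(kp.left_shoulder[0], kp.right_shoulder[0])
    x_right = max(kp.left_shoulder[0], kp.right_shoulder[0])
    top = kp.chin[1] + margin
    return (x_left, top, x_right - x_left, lines * line_height)  # (x, y, width, height)

kp = Keypoints(chin=(320, 260), left_shoulder=(250, 300), right_shoulder=(390, 300))
print(caption_box(kp))  # -> (250, 268, 140, 36)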
[00037] The term “computer-readable medium”, or “processor-readable medium”, as used herein, in addition to having its ordinary meaning, refers to any medium that participates in providing instructions to at least one processor for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media can be, for example, optical or magnetic disks, such as a storage device. Volatile media includes dynamic memory, such as main memory.
[00038] Memory device as used in the programs and systems described herein can be any of various types of memory devices or storage devices. The term “memory device” is intended to encompass an installation medium, e.g., a CD-ROM, floppy disks, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; or a non-volatile memory such as a magnetic media, e.g., a hard drive, optical storage, or ROM, EPROM, FLASH, etc. The memory device may comprise other types of memory as well, or combinations thereof. In addition, the memory medium may be located in a first computer in which the programs are executed (e.g., wearable device 150), and/or may be located in a second different computer [or micro controller, e.g., portable computing device 110] which connects to the first computer over a network, such as the Internet. In the latter instance, the second computer may further provide program instructions to the first computer for execution. The term “memory device” can also include two or more memory devices which may reside in different locations, e.g., in different computers that are connected over a network.
[00039] An exemplary implementation of the system 200 is provided in FIG. 2, illustrating both audible capture 201 and visual capture 202 of an audible trigger produced by a person. The trigger is captured by an ith microphone in microphone array 105i and transmitted 203 wirelessly to be processed by sound data module 204 and audio and sounds module 205, where, following pre-classification, the system determines whether the source is recognized (a determination that may be affected by the user as well); based on this, sound and direction symbols (interchangeable with markers) are assigned 206. Together with linguistic data module 207, equipped for example with natural language recognition, and speech-to-text module 208, the sound is converted to text 209 and input into sound merge module 210, which comprises lip-reading capabilities.

[00040] Simultaneously, using e.g., imaging module 107 (see e.g., FIG. 1), the source is captured 202 with, for example, a video stream, which is then transmitted 216 to image processing module 217, where, optionally using wireless transmission 221, the processed image can undergo 218 facial recognition and lip-reading analysis 219, and is then uploaded to sound merge module 210. There, both input streams (audible 201 and visual 202), using wireless 215 (or wired) communication, are input 211 into display generator 212, which renders 213 the processed data on near-eye display 214 (near-eye transparent display 106, see e.g., FIG. 1).
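The merge step of FIG. 2 could be reduced, very schematically, to the data flow sketched below; the record types, the confidence threshold, and the rule of falling back to lip-reading when audio confidence is low are illustrative assumptions, not the disclosed merge logic.

# Illustrative sketch only: merges the audible and visual analysis results
# before handing them to the display generator.
from typing import NamedTuple

class AudioResult(NamedTuple):
    text: str
    confidence: float
    direction_deg: float

class VisionResult(NamedTuple):
    speaker_name: str
    lip_text: str

def merge(audio: AudioResult, vision: VisionResult, min_conf: float = 0.7):
    text = audio.text if audio.confidence >= min_conf else vision.lip_text
    return {"speaker": vision.speaker_name, "text": text, "direction": audio.direction_deg}

audio = AudioResult(text="meet me at noon", confidence=0.55, direction_deg=42.0)
vision = VisionResult(speaker_name="Anna", lip_text="meet me at one")
print(merge(audio, vision))  # low audio confidence -> the lip-reading text is displayed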
[00041] Turning now to FIG. 3, illustrating an exemplary configuration of system 300. As illustrated, audio inputs 301 are generally captured and provided by one or more microphones. The microphones are in wired or wireless communication with CPM 304. In certain implementations, the microphone arrays are configured as a module capable of being assembled as a kit of parts to form system 300 (or 10), for example by operably coupling to regular glasses, such as by clipping onto, and/or adhering to, the frame of regular glasses (or AR/VR/MR devices). Likewise, video inputs 302 generally comprise one or more cameras forming a portion of imaging module 107. Similar to the audio module, imaging module 302 is in wired or wireless communication with CPM 304. Moreover, manual inputs 303 are supplied from various sensors (e.g., fingertip/hand sensors 102, see e.g., FIG. 1), again available in certain exemplary implementations as a separate module capable of being assembled as a kit of parts to form system 300 (or 10), operable to be in wired or wireless communication with CPM 304. CPM 304 is operable in certain implementations to perform at least two processes: (i) speech-to-text transcription, and (ii) a display process. The processing module can be locally embedded within portable computing device 110, capable of being assembled as a kit of parts to form system 300 (or 10), or be remote (cloud-based). In addition, CPM 304 can run additional processes, such as a Spatial Visual analysis operable to detect and identify visible objects, e.g., the upper part of a person (e.g., speaker or user), face, lips, eyes and the like.
[00042] As illustrated, Speech-to-text process 305 is operable to transcribe in real time words and sentences from voice input to text, which are further manipulated before display. In certain implementations, the system is configured to execute more than one speech-to-text process at the same time. In addition, Spatial Auditory analysis 306 detects, separates and retrieves all the sound types other than speech, transcribes and converts (renders) these inputs to a graphic format (text, symbols, actionable icons, etc.) and sends them to the Text/Graphic Display Process for final arrangement and determination of if, how and what to display. Likewise, Spatial Video analysis 307 is configured to detect surrounding objects, such as, for example, people, their direction, face and lips, and sends the information to the Text/Graphic Display Process for final arrangement and determination of if, how and what to display (render). Display process 308 in system 300 is operable to receive interpreted and analyzed processable data about the auditory and visual surroundings and, based on it, to arrange it in the form of a dynamic graphic presentation such as text, symbols, actionable icons and animations adapted to the specific needs of user 700.
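A minimal sketch of how the auditory-analysis output might be routed into graphics for the display process is given below; the class-to-symbol table, the direction thresholds, and the to_graphic helper are arbitrary illustrative assumptions rather than the disclosed arrangement logic.

# Illustrative sketch only: converts classified non-speech sounds into a small
# graphic description (symbol, coarse placement, emphasis) for the display process.
SYMBOLS = {"siren": "!", "doorbell": "D", "baby_cry": "B", "kettle": "K"}

def to_graphic(sound_class, direction_deg, magnitude):
    symbol = SYMBOLS.get(sound_class)
    if symbol is None:
        return None  # not a sound the user asked to see; do not render
    if direction_deg < -30:
        position = "left"
    elif direction_deg > 30:
        position = "right"
    else:
        position = "center"
    return {"symbol": symbol, "position": position, "emphasis": magnitude > 0.8}

print(to_graphic("siren", -75.0, 0.9))  # {'symbol': '!', 'position': 'left', 'emphasis': True}
print(to_graphic("hum", 10.0, 0.2))     # None -> filtered out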
[00043] Systems 300 used herein can be computerized systems further comprising: central processing module 304; a transparent near-eye display module 106; and a user interface module (e.g., sensor 102). Display modules (e.g., near-eye display 106), which can include display elements, may further include any type of element that acts as a display. A typical example is a Liquid Crystal Display (LCD). An LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal. There are, however, many other forms of displays, for example OLED displays and bi-stable displays. New display technologies are also being developed constantly. Therefore, the term display should be interpreted broadly and should not be associated with a single display technology. Also, the display module 106 may be mounted on a printed circuit board (PCB) of an electronic device, arranged within a protective housing, and the display module is protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
[00044] In the context of the disclosure, the term “module” is used herein to refer to software computer program code and/or any hardware or circuitry utilized to provide the functionality attributed to the module. Further, the term “module” or “component” can also refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). As may also be used herein, the terms “module”, “processing circuit”, and/or “processing unit” may be a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on hard coding of the circuitry and/or operational instructions (in other words, firmware). The at least one processor, processing circuit, and/or processing unit may have an associated memory and/or an integrated memory element, which may be a single memory device, a plurality of memory devices, and/or embedded circuitry of the processing module, module, processing circuit, and/or processing unit. Such a memory device may be a read-only memory, random access memory, transient memory, non-transient memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information.
[00045] Note that if the at least one processor, module, servers, network, etc., processing circuit, and/or processing unit includes more than one processing device, the processing devices may be centrally located or may be distributed (e.g., cloud computing via indirect coupling through a local area network and/or a wide area network). Still further, it is noted that the memory element may store, and the processor, module, processing circuit, and/or processing unit may execute, hard-coded and/or operational instructions corresponding to at least some of the steps and/or functions. Such a memory device or memory element can be, and in an exemplary implementation is, included as an article of manufacture.
[00046] The term "comprising" and its derivatives, as used herein, are intended to be open ended terms that specify the presence of the stated features, elements, components, groups, integers, and/or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and/or steps. The foregoing also applies to words having similar meanings such as the terms, "including", "having" and their derivatives.
[00047] The terms “a”, “an” and “the” herein do not denote a limitation of quantity, and are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The suffix “(s)” as used herein is intended to include both the singular and the plural of the term that it modifies, thereby including one or more of that term (e.g., the microphone(s) includes one or more microphone). Reference throughout the specification to “one exemplary implementation”, “another exemplary implementation”, “an exemplary implementation”, and so forth, when present, means that a particular element (e.g., feature, structure, instruction, and/or characteristic) described in connection with the exemplary implementation is included in at least one exemplary implementation described herein, and may or may not be present in other exemplary implementations. In addition, it is to be understood that the described elements may be combined in any suitable manner in the various exemplary implementations.
[00048] Unless specifically stated otherwise, as apparent from the discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “loading,” “in communication,” “detecting,” “calculating,” “determining”, “analyzing,” “isolating” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical quantities, such as the auditory signal, into other data similarly represented, such as the visual markers converted from the audible signals.
[00049] Further, the at least one processor (e.g., CPM 304) may be operably coupled to the various modules and components with appropriate circuitry. As may also be used herein, the term(s) “operably coupled to”, “coupled to”, and/or “coupling” includes direct coupling between items and/or indirect coupling between items via an intervening item (e.g., an item includes, but is not limited to, a component, an element, a circuit, an engine, and/or a module) where, for indirect coupling, the intervening item does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. As may further be used herein, inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two items in the same manner as “coupled to”. As may even further be used herein, the term “operable to” or “operably coupled to” indicates that an item includes one or more of power connections, input(s), output(s), etc., to perform, when activated, one or more of its corresponding functions and may further include inferred coupling to one or more other items. As may still further be used herein, the term “associated with” includes direct and/or indirect coupling of separate items and/or one item being embedded within another item.
[00050] Although the foregoing disclosure for systems, and non-transitory storage medium for providing real-time, source, location and magnitude -specific rendering of auditory triggers, has been described in terms of some exemplary implementations, other exemplary implementations will be apparent to those of ordinary skill in the art from the disclosure herein. Moreover, the described exemplary implementations have been presented by way of example only, and are not intended to limit the scope of the exemplary implementations. Indeed, the novel methods, programs, and systems described herein may be embodied in a variety of other forms without departing from the spirit thereof. Accordingly, other combinations, omissions, substitutions and modifications will be apparent to the skilled artisan in view of the disclosure herein.
[00051] Accordingly and in an embodiment, provided herein is a system for providing visual indication of auditory signal to a user, the system comprising: a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, configured to: receive a plurality of auditory signals; analyze at least one of the: direction, source, speaker identification, and magnitude of each of the plurality of the auditory signal; convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker, wherein (i) the user is hard-of-hearing, (ii) the wearable display device further comprises: a first microphone array configured to provide a surround auditory input; and an additional (second) wearable microphone array configured to provide a surround auditory input, (iii) each microphone array comprises a plurality of directional microphones, wherein (iv) the wearable display device is a monocular device, or a bi-ocular device comprising a near-eye transparent display, (v) the wearable display device further comprises an imaging module, the imaging module configured to capture an image of at least one of: the auditory signal source, and the user gesture, wherein (vi) the marker specific for at least one of: the direction, source, and magnitude is further configured to change in at least one of: size, color, and shape as a function of the auditory signal’s magnitude, (vii) is further configured to change in location on the display as a function of direction, wherein (viii) the auditory signal’s source is at least one of: a known person, a siren, a baby cry, a doorbell, a ringtone, a vehicle horn, a boiling kettle, an appliance sound, a door slam, a sliding window, a lock opening, and a combination comprising the foregoing, wherein the system (ix) further comprises at least one of: a fingertip sensor, a finger sensor, and a hand sensor, configured to input user preferences for initial location of the marker on the near eye transparent display, (x) the at least one of: a fingertip sensor, a finger sensor, and a hand sensor, is further operable to provide interactive control to the user.
[00052] In another exemplary implementation, provided herein is an article of manufacture comprising a central processing module having at least one processor, in communication with a non-transitory processor readable storage medium having thereon a set of executable instructions, configured, when executed by the at least one processor, to cause the central processing module to: receive a plurality of auditory signals; analyze at least one of the: direction, source, and magnitude of each of the plurality of the auditory signal; convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and render the marker, wherein (xi) the article of manufacture further comprises a wearable display device sized and configured to receive auditory signals and display visual markers to the user, (xii) the wearable display device comprises a microphone array of at least three microphones configured to provide a surround auditory input and wherein the set of executable instructions are further configured, when executed, to: determine at least one of the direction, source, and magnitude of the auditory signal based on acoustic flow among the three microphones in the microphone array, wherein (xiii) the wearable display device is a monocular device, or a bi-ocular device comprising a near-eye transparent display, wherein (xiv) the set of executable instructions are further configured, when executed, to: change at least one of: size, color, and shape of at least one marker on the near-eye transparent display as a function of the auditory signal’s magnitude, and wherein (xv) the set of executable instructions are further configured, when executed, to: change the location of at least one marker on the near-eye transparent display as a function of the auditory signal’s direction and source relative to the user.
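A very simplified sketch of the direction-from-acoustic-flow idea is given below, reduced to a single microphone pair under a far-field assumption; the sample rate, microphone spacing, and synthetic test signal are arbitrary, and a real implementation would use all microphones of the array and a more robust delay estimator (e.g., GCC-PHAT), which is not shown here.

# Illustrative sketch only: estimates the time difference of arrival (TDOA)
# between two microphones by cross-correlation and converts it to a bearing.
import numpy as np

FS = 16000            # samples per second
MIC_SPACING = 0.12    # metres between the two microphones
SPEED_OF_SOUND = 343.0

def estimate_bearing(sig_left, sig_right):
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)  # +lag: left channel is delayed
    tau = lag / FS
    sin_theta = np.clip(tau * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))      # +angle: source toward the right mic

# Synthetic check: a burst arriving 3 samples earlier at the right microphone.
rng = np.random.default_rng(0)
src = rng.standard_normal(1024)
left = np.concatenate([np.zeros(3), src])
right = np.concatenate([src, np.zeros(3)])
print(round(estimate_bearing(left, right), 1))   # roughly +32 degrees off-axis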
[00053] In yet another exemplary implementation, provided herein is a kit of parts capable of being assembled for providing visual indication of auditory signal to a user, the kit comprising: a microphone array configured to provide a surround auditory input; an imaging module, the imaging module configured to capture an image of at least one of: the auditory signal source and user gesture; a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and a central processing module (CPM) in communication with the wearable display device, the microphone array, and the imaging module, configured to: receive a plurality of auditory signals; analyze at least one of the: direction, source, speaker identification, and magnitude of each of the plurality of the auditory signal; convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and cause the wearable display device to render the marker.
[00054] It will be apparent to one of ordinary skill in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof. Accordingly, it is intended that the present disclosure covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

What is claimed:
1. A system for providing visual indication of auditory signal to a user, the system comprising: a) a wearable display device sized and configured to receive auditory signals and display visual markers to the user; and b) a central processing module (CPM) in communication with the wearable display device, configured to: i. receive a plurality of auditory signals; ii. analyze at least one of the: direction, source, speaker identification, and magnitude of each of the plurality of the auditory signal; iii. convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and iv. cause the wearable display device to render the marker.
2. The system of claim 1, wherein the user is hard-of-hearing.
3. The system of claim 1, wherein the wearable display device comprises: a) a microphone array configured to provide a surround auditory input.
4. The system of claim 3, wherein each microphone array comprises a plurality of directional and/or omni microphones.
5. The system of claim 4, wherein the wearable display device is a monocular device, or a bi-ocular device comprising a near-eye transparent display.
6. The system of claim 5, wherein the wearable display device further comprises an imaging module, the imaging module configured to capture an image of at least one of: the auditory signal source and user gesture.
7. The system of claim 1, wherein the marker specific for at least one of: the direction, source, and magnitude is further configured to change in at least one of: size, color, and shape as a function of at least one of: the auditory signal’s magnitude, and the auditory signal’s direction.
8.
9. The system of claim 1, wherein the auditory signal’s source comprises at least one of: a known person, an unknown person, a siren, a baby cry, a doorbell, a ringtone, a vehicle horn, a boiling kettle, an appliance sound, a door slam, a sliding window, a lock opening, and a combination comprising the foregoing.
10. The system of claim 5, further comprising at least one of: a fingertip sensor, a finger sensor, and a hand sensor, configured to input user preferences for initial location of the marker on the near eye transparent display.
11. The system of claim 10, wherein the at least one of: a fingertip sensor, a finger sensor, and a hand sensor, is further operable to provide interactive control to the user.
12. An article of manufacture comprising a central processing module having at least one processor, in communication with a non-transitory computer readable storage medium having thereon a set of executable instructions, configured, when executed by the at least one processor, cause the central processing module to: a. receive a plurality of auditory signals; b. analyze at least one of the: direction, source, and magnitude of each of the plurality of the auditory signal; c. convert at least one of the auditory signal’s analyzed to a marker specific for at least one of the direction, source, and magnitude; and d. render the marker.
13. The article of claim 12, wherein the article of manufacture further comprises a wearable display device sized and configured to receive auditory signals and display visual markers to the user.
14. The article of claim 13, wherein the wearable display device comprises a microphone array of at least three microphones configured to provide a surround auditory input and wherein the set of executable instructions are further configured, when executed, to: determine at least one of the direction, source, and magnitude of the auditory signal based on acoustic flow among the three microphones in the microphone array.
15. The article of claim 12, wherein the wearable display device is a monocular device, or a bi-ocular device comprising a near-eye transparent display.
16. The article of claim 12, wherein the set of executable instructions are further configured, when executed, to: change at least one of: size, color, and shape of at least one marker on the near eye transparent display as a function of the auditory signal’s magnitude.
17. The article of claim 12, wherein the set of executable instructions are further configured, when executed, to: change the location of at least one marker on the near-eye transparent display as a function of the auditory signal’s direction and source relative to the user.
PCT/US2021/012677 2020-01-08 2021-01-08 Systems, and programs for visualization of auditory signals WO2021142242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062958426P 2020-01-08 2020-01-08
US62/958,426 2020-01-08

Publications (1)

Publication Number Publication Date
WO2021142242A1 true WO2021142242A1 (en) 2021-07-15

Family

ID=76788874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/012677 WO2021142242A1 (en) 2020-01-08 2021-01-08 Systems, and programs for visualization of auditory signals

Country Status (1)

Country Link
WO (1) WO2021142242A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130079061A1 (en) * 2010-05-17 2013-03-28 Tata Consultancy Services Limited Hand-held communication aid for individuals with auditory, speech and visual impairments
US20140236594A1 (en) * 2011-10-03 2014-08-21 Rahul Govind Kanegaonkar Assistive device for converting an audio signal into a visual representation
US20130279705A1 (en) * 2011-11-14 2013-10-24 Google Inc. Displaying Sound Indications On A Wearable Computing System
US20160142830A1 (en) * 2013-01-25 2016-05-19 Hai Hu Devices And Methods For The Visualization And Localization Of Sound
US20170018281A1 (en) * 2015-07-15 2017-01-19 Patrick COSSON Method and device for helping to understand an auditory sensory message by transforming it into a visual message

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023097277A1 (en) * 2021-11-23 2023-06-01 Hearing Glasses Llc Smart glasses to assist those who are deaf or hard of hearing
US20230252418A1 (en) * 2022-02-09 2023-08-10 My Job Matcher, Inc. D/B/A Job.Com Apparatus for classifying candidates to postings and a method for its use


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21738374

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21738374

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.01.2023)