CN113168225A - Locating spatialized sound nodes for echo location using unsupervised machine learning - Google Patents
- Publication number
- CN113168225A (application CN201980076681.0A)
- Authority
- CN
- China
- Prior art keywords
- sound
- dimensional space
- echo
- user
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S15/00—Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
- G01S15/02—Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems using reflection of acoustic waves
- G01S15/04—Systems determining presence of a target
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F9/00—Games not otherwise provided for
- A63F9/0001—Games specially adapted for handicapped, blind or bed-ridden persons
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S15/00—Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
- G01S15/88—Sonar systems specially adapted for specific applications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/001—Teaching or communicating with blind persons
- G09B21/006—Teaching or communicating with blind persons using audible presentation of the information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F9/00—Games not otherwise provided for
- A63F9/0001—Games specially adapted for handicapped, blind or bed-ridden persons
- A63F2009/0003—Games specially adapted for blind or partially sighted people
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
Abstract
Described herein are systems for generating echolocation sounds to assist a user with no or limited vision in navigating a three-dimensional space (e.g., a physical environment, a computer gaming experience, and/or a virtual reality experience). Input is received from a user to generate an echo location sound for navigating the three-dimensional space. Based at least on the received input, a digital representation of the three-dimensional space is segmented into one or more depth planes using an unsupervised machine learning algorithm. For each depth plane, an object segment is determined for each object within that depth plane; locations of a plurality of echo sound nodes are determined according to the depth level and surface area of each object defined by the determined segments; and an echo location sound is generated comprising spatialized sound from each echo sound node originating from its determined location.
Description
Background
Echolocation allows people to perceive their surroundings by emitting audible sounds and listening to the reflections of the sound waves produced by nearby objects. Visually impaired people can navigate using echolocation.
Disclosure of Invention
Described herein is a system for generating echo location sounds to assist a user in navigating a three-dimensional space, comprising: a processing system comprising a processor and a memory having computer-executable instructions stored thereon that, when executed by the processor, cause the processing system to: receive input from a user to generate an echo location sound to navigate a three-dimensional space; and, based at least on the received input: segment a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm; for each depth plane, determine an object segment for each object within that depth plane; determine locations of a plurality of echo sound nodes according to the depth level and surface area of each object defined by the determined segments; and generate an echo location sound comprising spatialized sound from each echo sound node originating from its determined location.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
FIG. 1 is a functional block diagram illustrating a system for generating echo location sounds to assist a user in navigating a three-dimensional space.
Fig. 2 is a diagram illustrating an exemplary initial depth level.
Fig. 3 is a diagram illustrating exemplary depth level and segmentation objects.
Fig. 4 is a diagram illustrating an exemplary echo node located on an object.
FIG. 5 is a flow diagram illustrating a method of generating echo location sounds to assist a user in navigating in three-dimensional space.
FIG. 6 is a flow chart illustrating a method of generating echo location sounds to assist a user in navigating a three-dimensional space.
FIG. 7 is a functional block diagram illustrating an exemplary computing system.
Detailed Description
Various technologies pertaining to generating echo location sounds to assist a user in navigating a three-dimensional space are now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It may be evident, however, that such aspect(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more aspects. Further, it should be understood that functionality that is described as being performed by certain system components may be performed by multiple components. Similarly, for example, a component may be configured to perform functionality described as being performed by multiple components.
The subject disclosure supports various products and processes that perform or are configured to perform various actions with respect to generating echo location sounds to assist a user in navigating a three-dimensional space. The following are one or more exemplary systems and methods.
Aspects of the subject disclosure relate to the technical problem of assisting users (e.g., those with no or limited vision) in navigating a three-dimensional space (e.g., a physical environment, a computer gaming experience, and/or a virtual reality experience). Technical features associated with solving the problem involve receiving input from a user to generate echo location sounds for navigating a three-dimensional space; and, based on (e.g., in response to) at least the received input: segmenting a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm; for each depth plane, determining an object segment for each object within that depth plane; determining locations of a plurality of echo sound nodes according to the depth level and surface area of each object defined by the determined segments; and generating an echo location sound comprising spatialized sound from each echo sound node originating from its determined location. Thus, aspects of these technical features exhibit the technical effect of more efficiently and effectively assisting users with no or limited vision in navigating a three-dimensional space using computer-generated echolocation sounds.
Furthermore, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, the phrase "X employs A or B" is intended to mean any of the natural inclusive permutations. That is, the phrase "X employs A or B" is satisfied by any of the following instances: X employs A; X employs B; or X employs both A and B. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form.
As used herein, the terms "component" and "system," as well as various forms thereof (e.g., component, system, subsystem, etc.), are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an instance, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. Further, as used herein, the term "exemplary" is intended to mean serving as an illustration or example of something, and is not intended to indicate priority.
People with no or low vision may have difficulty navigating the physical world and/or a virtual world (e.g., one associated with a gaming experience). Echolocation allows a person with no or low vision to perceive their surroundings by emitting auditory sounds and listening to the reflections of the sound waves generated by nearby objects.
Described herein are systems and methods of generating echolocation sounds to assist a user in navigating a three-dimensional space (e.g., a physical space and/or a virtual environment). In response to a user input (e.g., a request) to generate an echo to navigate a three-dimensional space, a representation associated with the three-dimensional space (e.g., a computer image) is segmented into depth levels (e.g., planes), from the perspective of the user, using an unsupervised machine learning algorithm (e.g., a clustering algorithm). For each depth level, an object segment is determined for each object within that depth level. The locations of a plurality of echo sound nodes (e.g., a predetermined number and/or a number dynamically determined based on the three-dimensional space) are determined according to the depth level and surface area of each object defined by the determined segments. An echo location sound is then generated comprising the spatialized sound from each echo sound node originating from its determined location. For example, spatialized sounds associated with closer and/or larger objects may be louder relative to spatialized sounds associated with farther and/or smaller objects.
In some embodiments, the systems and methods may provide accessibility features that can be incorporated with three-dimensional game(s) to allow a wider player population (e.g., user(s) with limited vision and/or user(s) without vision). The spatial audio cues provided by the systems and methods may allow a user to navigate a three-dimensional game. For example, using unsupervised machine learning, the systems and methods may determine a best/optimal location and sound for each echoed sound node to assist the user in navigating the three-dimensional space.
Referring to fig. 1, a system 100 for generating echo location sounds to assist a user in navigating a three-dimensional space is illustrated. The system 100 may assist person(s) with no or low vision in navigating the physical world and/or a virtual world (e.g., a gaming experience) by locating echo sound nodes and generating echo location sounds from those nodes.
The system 100 includes an input component 110, the input component 110 receiving input from a user to generate an echo location sound to assist in navigating a three-dimensional space. In some embodiments, input may be received via an input device (e.g., a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, another computer, etc.). In some embodiments, the input may be received via, for example, buttons or touch-sensitive inputs of a virtual reality/augmented reality headset. In some embodiments, the input may be a voice command received via a microphone. In some embodiments, the input may be inferred based on gesture(s) and/or motion(s) of the user.
In some embodiments, the input may be based on gestures, e.g., gestures from various touch-based (e.g., touch screen(s)) and/or motion-sensitive systems (e.g., virtual reality visualization/manipulation systems). In some embodiments, the input component 110 may receive gestures from a gesture-sensitive display, which may be an integrated system with a display and sensors, or a separate display and sensors (not shown). In some embodiments, the input component 110 may receive the gesture via a virtual reality visualization/manipulation system (not shown) or an augmented reality visualization/manipulation system (not shown). The virtual reality visualization/manipulation system and/or the augmented reality visualization/manipulation system may include an accelerometer/gyroscope, a 3D display, head tracking, eye tracking, gaze tracking, and/or an immersive augmented reality system.
In some embodiments, the three-dimensional space comprises a computer-generated gaming experience. For example, the three-dimensional space may be displayed via a computer display (e.g., LCD, LED, plasma), a virtual reality headset, and/or an augmented reality headset. In some embodiments, when the user navigates the three-dimensional space using the echo location sounds generated by the system 100, the three-dimensional space is not displayed to the user (because the user has no or limited vision). In some embodiments, the three-dimensional space includes a virtual reality, mixed reality, and/or augmented reality environment.
In some embodiments, the three-dimensional space comprises a physical environment. For example, a digital representation of a three-dimensional space may be captured using a digital camera, a three-dimensional camera, and/or a depth camera. For example, the representation may be generated based on depth image(s) from a depth sensing camera.
The system 100 also includes a depth plane component(s) 120 that segments the digital representation of the three-dimensional space into one or more depth plane(s) using an unsupervised machine learning algorithm in response to the received input. In some embodiments, the digital representation of the three-dimensional space is based on a current (e.g., stationary) location of the user. In some embodiments, the digital representation of the three-dimensional space is based on a predicted or inferred position of the user (e.g., based on a direction and speed of movement of the user in the physical environment and/or the computer-generated virtual environment).
In some embodiments, when the three-dimensional space includes a computer-generated gaming experience, the digital representation may be a view of the gaming experience from a user perspective (e.g., directional or in the direction of travel). In some embodiments, when the three-dimensional space includes a virtual reality, mixed reality, and/or augmented reality environment, the digital representation may be a view of the virtual reality, mixed reality, and/or augmented reality environment from the perspective of the user (e.g., oriented or in the direction of travel). In some embodiments, when the three-dimensional space includes a physical environment, the digital representation may be a view (e.g., a directional view or a view in a direction of travel) of the physical environment (e.g., an image and/or a three-dimensional image) obtained from a user perspective.
In some embodiments, the segmentation of the digital representation of the three-dimensional space into depth planes may be performed based on a predetermined number of planes and associated distances (e.g., three planes: (1) zero to five feet, (2) greater than five feet to ten feet, and (3) greater than ten feet). In some embodiments, the segmentation of the digital representation of the three-dimensional space into depth planes may be performed using a clustering algorithm to identify an appropriate number of clusters (e.g., depth planes). In some embodiments, the clustering algorithm comprises a k-means clustering algorithm (e.g., where k is equal to the number of data clusters) that employs the elbow method, which examines the percentage of variance explained as a function of the number of clusters. For example, the number of clusters (k) may be selected at the point where the marginal gain from an additional cluster drops (e.g., below a threshold amount). In this way, an optimal number of clusters can be determined, beyond which adding further cluster(s) would not significantly improve the modeling of the data.
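The clustering step described above can be sketched in code. The following is a minimal, illustrative sketch only: a 1-D k-means over per-pixel depth values with deterministic quantile initialization and a simple elbow rule. The function names, the initialization scheme, and the 10% variance-drop threshold are assumptions made for the example, not details specified in this disclosure.

```python
def kmeans_1d(values, k, iters=30):
    """Cluster 1-D depth values into k depth planes (Lloyd's algorithm
    with deterministic quantile initialization). Returns (centers, sse)."""
    vals = sorted(values)
    if k == 1:
        centers = [sum(vals) / len(vals)]
    else:
        centers = [vals[(len(vals) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vals:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    sse = sum(min(abs(v - c) for c in centers) ** 2 for v in vals)
    return centers, sse


def choose_k_elbow(depths, k_max=6, drop_threshold=0.10):
    """Elbow method: stop adding depth planes once the marginal reduction
    in within-cluster variance falls below a threshold fraction of the
    total variance."""
    total = kmeans_1d(depths, 1)[1]
    if total == 0:                       # all depths identical: one plane
        return 1
    prev = total
    for k in range(2, k_max + 1):
        sse = kmeans_1d(depths, k)[1]
        if (prev - sse) / total < drop_threshold:
            return k - 1
        prev = sse
    return k_max
```

For a scene whose pixel depths form three tight bands, `choose_k_elbow` settles on three depth planes, since a fourth cluster barely reduces the remaining variance.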
Once the representation of the three-dimensional space has been segmented into depth plane(s), the object segmentation component 130 may determine, for each depth level, object segments for the object(s) within the particular depth level. In some embodiments, the object segmentation component 130 may utilize an unsupervised machine learning algorithm to determine the object segmentation. These determined segments may define the surface area of a particular object at a particular depth level. Referring to fig. 2, a diagram 200 illustrating an exemplary initial depth level is shown. Turning to fig. 3, a diagram 300 illustrates exemplary depth levels and segmented objects of diagram 200 of fig. 2. The illustration 300 includes four depth levels 310, 320, 330, and 340.
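The disclosure does not specify how the object segments are computed within a depth level, so the following is a hypothetical stand-in: 4-connected flood fill (connected-component labeling) over a grid of per-pixel depth-plane labels, yielding one segment per contiguous object together with its surface area in pixels.

```python
from collections import deque

def segment_objects(depth_labels):
    """Group 4-connected pixels that share a depth-plane label into object
    segments. `depth_labels` is a 2-D grid (list of lists) of plane indices;
    returns a list of (depth_label, pixel_area) pairs in scan order."""
    rows, cols = len(depth_labels), len(depth_labels[0])
    seen = [[False] * cols for _ in range(rows)]
    segments = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = depth_labels[r][c]
            queue, area = deque([(r, c)]), 0
            seen[r][c] = True
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and depth_labels[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            segments.append((label, area))
    return segments
```

The pixel area of each segment plays the role of the per-object surface area that the node-placement step consumes.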
Referring back to fig. 1, the spatial node localization component 140 next determines a location for each of a plurality of echo sound nodes as a function of the depth level and the surface area of the object defined by the determined segmentation. In some embodiments, the number of echo sound nodes is predetermined (e.g., thirty). In some embodiments, the number of echo sound nodes is dynamically determined based on the surface areas of the objects and their associated depth levels. In some embodiments, the number of echo sound nodes does not exceed a predetermined maximum (e.g., thirty). In some embodiments, the number of echo sound nodes is greater than or equal to a predetermined minimum (e.g., three).
In some embodiments, the echo sound nodes may be placed in decreasing order of depth, commensurate with the size and shape of the particular object (e.g., based on the surface area of the particular object defined by the determined object segments). Referring briefly to fig. 4, a diagram 400 illustrates exemplary echo sound nodes 410 located on the objects of figs. 2 and 3.
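A depth-ordered, size-commensurate placement might be realized by allocating a capped node budget across segments in proportion to surface area. The proportional rule, the one-node minimum per object, and the helper name below are assumptions for this sketch, not the disclosure's own formula.

```python
def allocate_nodes(segments, max_nodes=30):
    """`segments` is a list of (depth_level, surface_area) pairs. Returns
    (depth_level, node_count) pairs, nearest depth first, splitting at most
    max_nodes proportionally to surface area (at least one node per object)."""
    ordered = sorted(segments, key=lambda s: s[0])   # nearest plane first
    total_area = sum(area for _, area in ordered)
    allocation, remaining = [], max_nodes
    for depth, area in ordered:
        if remaining <= 0:
            break
        n = min(remaining, max(1, round(max_nodes * area / total_area)))
        allocation.append((depth, n))
        remaining -= n
    return allocation
```

Because segments are visited nearest-first, any budget shortfall falls on the farthest (least urgent) objects.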
In some embodiments, a first sound generated by a first echo sound node may be output at a high volume and with a short delay to indicate that its object is proximate to the user. A second sound generated by a second echo sound node may be output at a lower volume and with a longer delay than the first sound to indicate that the second object is farther from the user than the first object.
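The nearer-is-louder-and-sooner behaviour can be modeled with a simple distance mapping. The inverse-distance gain and the round-trip acoustic delay below are illustrative choices, not formulas from this disclosure:

```python
def node_playback_params(depth_m, max_depth_m=10.0, speed_of_sound=343.0):
    """Map a node's depth (meters) to a playback gain (closer = louder,
    inverse-distance rolloff clamped at 1.0) and an echo delay (seconds,
    round trip at the speed of sound, clamped at max_depth_m)."""
    gain = min(1.0, 1.0 / max(depth_m, 1.0))
    delay_s = 2.0 * min(depth_m, max_depth_m) / speed_of_sound
    return gain, delay_s
```

Under this mapping a node one meter away plays at full gain with a roughly 6 ms delay, while a node five meters away plays at one-fifth gain with a roughly 29 ms delay.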
The system 100 includes a sound generation component 150 that generates echo location sounds, each echo location sound including spatialized sound originating from a particular echo sound node at its determined location. The system 100 also includes an output component 160, the output component 160 providing the generated echo location sound to the user. In some embodiments, the echo location sound is provided through computer speaker(s), headphones (e.g., stereo, virtual reality, augmented reality, mixed reality), and/or room speakers.
In some embodiments, the generated echo location sound may be provided using channel-based audio output (e.g., using a Dolby 5.1 surround sound system), a spherical sound representation (e.g., ambisonics, higher-order ambisonics), and/or object-based audio output.
In some embodiments, the generated echo location sound may be provided to the user via a head-mounted device configured to modify the audio signal based on a head-related transfer function (HRTF) to produce a spatialized audio signal corresponding to the echo location sound. An HRTF modifies an audio signal based on a simulated source position to account for the changes in volume and perceived direction of a sound originating from that location (e.g., an echo sound node).
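A real HRTF renderer convolves each node's signal with measured head-related filters. As a crude stand-in for the two dominant cues an HRTF encodes, the interaural time difference (here via Woodworth's spherical-head approximation) and a constant-power level difference can be computed as follows; the head radius and the clamping to +/-90 degrees are assumptions of the sketch.

```python
import math

def interaural_cues(azimuth_deg, rate=44100, head_radius_m=0.0875, c=343.0):
    """Approximate interaural cues for a source at azimuth_deg
    (0 = straight ahead, +90 = hard right): Woodworth ITD in samples plus
    constant-power left/right gains (a stand-in for the ILD)."""
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    itd_samples = int(round(rate * head_radius_m * (az + math.sin(az)) / c))
    pan = az / (math.pi / 2)                    # -1 (left) .. +1 (right)
    left_gain = math.cos((pan + 1) * math.pi / 4)
    right_gain = math.sin((pan + 1) * math.pi / 4)
    return itd_samples, left_gain, right_gain
```

A source dead ahead yields zero ITD and equal ear gains; a source hard right delays the left ear by roughly 29 samples at 44.1 kHz and attenuates it heavily.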
Figures 5 and 6 illustrate exemplary methods related to generating echo location sounds to assist a user in navigating in three-dimensional space. While the method is shown and described as a series of acts performed in a sequence, it is to be understood and appreciated that the method is not limited by the sequence. For example, some acts may occur in an order different than that described herein. Additionally, an action may occur concurrently with another action. Further, in some instances, not all acts may be required to implement a methodology described herein.
Further, the acts described herein may be computer-executable instructions that may be implemented by one or more processors and/or stored on computer-readable medium(s). The computer-executable instructions may include routines, subroutines, programs, threads of execution, and/or the like. Still further, results of acts of the methods may be stored in a computer readable medium, displayed on a display device, and/or the like.
Referring to fig. 5, a method 500 of generating echo location sounds to assist a user in navigating a three-dimensional space is illustrated. In some embodiments, method 500 is performed by system 100.
At 510, an input is received from a user to generate an echo location sound to navigate a three-dimensional space. At 520, based on (e.g., in response to) at least the received input, the digital representation of the three-dimensional space is segmented into one or more depth planes using an unsupervised machine learning algorithm (e.g., from the perspective of the user).
At 530, for each depth plane, an object segment is determined for each object within that depth plane. At 540, the locations of the plurality of echo sound nodes are determined according to the depth level and surface area of each object defined by the determined segments.
At 550, an echo location sound is generated that includes spatialized sound from each echo sound node originating from its determined location. In some embodiments, the generated echo location sound may be provided using channel-based audio output (e.g., using a Dolby 5.1 surround sound system), a spherical sound representation (e.g., ambisonics, higher-order ambisonics), and/or object-based audio output.
Turning to fig. 6, a method 600 of generating echo location sounds to assist a user in navigating a three-dimensional space is illustrated. In some embodiments, method 600 is performed by system 100.
At 610, input is received from a user to generate an echo location sound to navigate a three-dimensional space. At 620, a digital representation of the three-dimensional space is captured. For example, a digital representation of a three-dimensional space may be captured using a digital camera, a three-dimensional camera, and/or a depth camera.
At 630, in response to the received input, the digital representation of the three-dimensional space is segmented into one or more depth planes using an unsupervised machine learning algorithm (e.g., from the perspective of the user).
At 640, for each depth plane, an object segment is determined for each object within that depth plane. At 650, locations of the plurality of echo sound nodes are determined as a function of the depth level and surface area of each object defined by the determined segments.
At 660, echo location sounds are generated that include spatialized sound from each echo sound node originating from its determined location. In some embodiments, the generated echo location sound may be provided using channel-based audio output (e.g., using a Dolby 5.1 surround sound system), a spherical sound representation (e.g., ambisonics, higher-order ambisonics), and/or object-based audio output.
Described herein is a system for generating echo location sounds to assist a user in navigating a three-dimensional space, the system comprising: a processing system comprising one or more processors; and a memory having computer-executable instructions stored thereon that, when executed by the one or more processors, cause the processing system to: receive input from a user to generate an echo location sound to navigate a three-dimensional space; and, based at least on the received input: segment a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm; for each depth plane, determine an object segment for each object within that depth plane; determine locations of a plurality of echo sound nodes according to the depth level and surface area of each object defined by the determined segments; and generate an echo location sound comprising spatialized sound from each echo sound node originating from its determined location.
The system may further include wherein the unsupervised machine learning algorithm comprises a clustering algorithm, wherein each cluster identified by the clustering algorithm comprises a depth level.
The system may include a memory having stored thereon further computer-executable instructions that, when executed by the one or more processors, cause the processing system to: a digital representation of a three-dimensional space is captured. The system may further include wherein the digital representation of the three-dimensional space is captured using at least one of a digital camera, a three-dimensional camera, or a depth camera. The system may further include wherein the echo location sound is generated by at least one of a virtual reality headset, a mixed reality headset, or an augmented reality headset. The system may further include wherein the echo location sound is generated using at least one of a channel-based audio output, a spherical sound representation, or an object-based audio output.
The system may further include wherein the input is inferred based on at least one of a gesture or a movement of the user. The system may further include wherein the input is based on a gesture of the user. The system may further comprise wherein the three-dimensional space comprises a computer-generated gaming experience. The system may further comprise wherein the three-dimensional space comprises a physical environment.
Described herein is a method of generating echo location sounds to assist a user in navigating a three-dimensional space, the method comprising: receiving input from a user to generate an echo location sound to navigate a three-dimensional space; and, based at least on the received input: segmenting a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm; for each depth plane, determining an object segment for each object within a particular depth plane; determining locations of a plurality of echo sound nodes based on a depth level and a surface area of each object defined by the determined object segments; and generating an echo location sound comprising spatialized sound from each echo sound node originating from the determined locations.
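The claimed sequence (depth-plane segmentation, per-plane object segments, node placement by depth level and surface area) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the digital representation is a dense depth map held in a 2-D NumPy array, treats each depth plane as one object, and places a single sound node at each plane's pixel centroid with a gain proportional to its surface area (pixel count).

```python
import numpy as np

def segment_depth_planes(depth_map, k):
    """Cluster pixel depths into k depth planes with a basic 1-D k-means."""
    depths = depth_map.ravel().astype(float)
    centers = np.linspace(depths.min(), depths.max(), k)  # spread initial centers
    for _ in range(50):
        labels = np.argmin(np.abs(depths[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([depths[labels == j].mean() if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(depth_map.shape), centers

def place_sound_nodes(depth_map, k=2):
    """One spatialized sound node per depth plane: located at the plane's pixel
    centroid, with a gain proportional to its surface area (pixel count)."""
    labels, centers = segment_depth_planes(depth_map, k)
    nodes = []
    for j in range(k):
        ys, xs = np.nonzero(labels == j)
        if len(xs) == 0:
            continue  # an empty plane gets no node
        nodes.append({"x": xs.mean(), "y": ys.mean(),    # node location in the image
                      "depth": centers[j],               # depth level of the plane
                      "gain": len(xs) / depth_map.size})  # larger surface, louder node
    return nodes

# Toy scene: a near object (~1 m) in front of a far wall (~5 m)
scene = np.full((8, 8), 5.0)
scene[2:6, 2:6] = 1.0
nodes = place_sound_nodes(scene, k=2)
```

A real system would feed these node positions and gains to a spatial audio renderer rather than leave them as a Python list.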
The method may further include wherein the unsupervised machine learning algorithm comprises a k-means clustering algorithm using the elbow rule, the k-means clustering algorithm examining the percentage of variance explained as a function of the number of clusters in order to determine the number of clusters. The method may further include capturing the digital representation of the three-dimensional space using at least one of a digital camera, a three-dimensional camera, or a depth camera.
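As a concrete illustration of the elbow rule just described, the sketch below (an assumption-laden toy, not the claimed implementation) runs 1-D k-means for increasing numbers of clusters over the raw depth values and stops once an additional cluster raises the percentage of variance explained by less than a chosen threshold; the 5% threshold and the evenly spaced initialization are illustrative choices.

```python
import numpy as np

def kmeans_1d(depths, k, iters=50):
    """Minimal 1-D k-means over depth values; returns centers and labels."""
    centers = np.linspace(depths.min(), depths.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(depths[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([depths[labels == j].mean() if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(depths[:, None] - centers[None, :]), axis=1)
    return centers, labels

def choose_k_elbow(depths, k_max=6, gain_threshold=0.05):
    """Pick the number of depth planes: stop when one more cluster no longer
    raises the percentage of variance explained by at least gain_threshold."""
    total_var = ((depths - depths.mean()) ** 2).sum()
    explained = []
    for k in range(1, k_max + 1):
        centers, labels = kmeans_1d(depths, k)
        within = ((depths - centers[labels]) ** 2).sum()
        explained.append(1.0 - within / total_var)
    for k in range(1, k_max):
        if explained[k] - explained[k - 1] < gain_threshold:
            return k  # the elbow: k clusters already explain the variance
    return k_max

# Depths sampled from three well-separated planes (~1 m, ~3 m, ~6 m)
rng = np.random.default_rng(0)
depths = np.concatenate([rng.normal(m, 0.05, 100) for m in (1.0, 3.0, 6.0)])
```

Here `choose_k_elbow(depths)` settles on three clusters, matching the three planes in the synthetic data.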
The method may further include wherein the echo location sound is generated by at least one of a virtual reality headset, a mixed reality headset, or an augmented reality headset. The method may further include wherein the echo location sound is generated using at least one of a channel-based audio output, a spherical sound representation, or an object-based audio output. The method may further include wherein the three-dimensional space includes a computer-generated gaming experience.
Described herein are computer storage media storing computer readable instructions that, when executed, cause a computing device to: receive input from a user to generate an echo location sound to navigate a three-dimensional space; and, based at least on the received input: segment a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm; for each depth plane, determine an object segment for each object within a particular depth plane; determine locations of a plurality of echo sound nodes based on a depth level and a surface area of each object defined by the determined object segments; and generate an echo location sound comprising spatialized sound from each echo sound node originating from the determined locations.
The computer storage media may further include wherein the unsupervised machine learning algorithm comprises a clustering algorithm, wherein each cluster identified by the clustering algorithm corresponds to a depth level. The computer storage media may further include wherein the echo location sound is generated by at least one of a virtual reality headset, a mixed reality headset, or an augmented reality headset, and wherein the echo location sound is generated using at least one of a channel-based audio output, a spherical sound representation, or an object-based audio output. The computer storage media may also include wherein the three-dimensional space includes a computer-generated gaming experience.
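For the channel-based audio output mentioned throughout, a stereo renderer can be sketched as follows. This is only an illustrative stand-in for a real spatializer (HRTF- or Ambisonics-based output would normally be used): each node emits a short click delayed by its round-trip travel time, attenuated with depth, and panned with constant-power stereo panning. The node fields `depth`, `azimuth` (assumed to lie in [-1, 1], left to right), and `gain` are hypothetical names, not terms from the claims.

```python
import numpy as np

def render_echo(nodes, sample_rate=44100, speed_of_sound=343.0, duration=1.0):
    """Render a stereo echo buffer: each sound node emits a short click whose
    delay grows with depth (round-trip time) and whose left/right balance
    follows the node's horizontal position (constant-power panning)."""
    out = np.zeros((int(sample_rate * duration), 2))
    n = np.arange(256)
    click = np.hanning(256) * np.sin(2 * np.pi * 2000 * n / sample_rate)
    for node in nodes:
        delay = int(2 * node["depth"] / speed_of_sound * sample_rate)  # round trip
        pan = (node["azimuth"] + 1) / 2                  # [-1, 1] -> [0, 1]
        gains = (np.cos(pan * np.pi / 2), np.sin(pan * np.pi / 2))
        attenuated = click * node["gain"] / (1.0 + node["depth"])  # distance falloff
        end = min(delay + len(click), len(out))
        for ch, g in enumerate(gains):
            out[delay:end, ch] += g * attenuated[:end - delay]
    return out

# One near node dead center, one far node off to the right
nodes = [{"depth": 1.0, "azimuth": 0.0, "gain": 1.0},
         {"depth": 5.0, "azimuth": 0.8, "gain": 0.5}]
buf = render_echo(nodes)
```

Playing `buf` through any stereo output yields a near, centered click followed by a quieter, right-panned click, a rough analogue of the spatialized cues the claims describe.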
Referring to fig. 7, illustrated is an example general purpose processing system, computer, or computing device 702 (e.g., a mobile phone, desktop, laptop, tablet, watch, server, handheld, programmable consumer or industrial electronic product, set-top box, gaming system, computing node, etc.). For example, the computing device 702 may be used in the system 100 for generating echo location sounds to assist a user in navigating a three-dimensional space.
Processor(s) 720 may utilize a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. Processor(s) 720 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, a multi-core processor, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In one embodiment, processor(s) 720 may be a graphics processor.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes storage devices such as memory devices (e.g., Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), etc.), magnetic storage devices (e.g., hard disk, floppy disk, cassette, tape, etc.), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD), etc.), and solid state devices (e.g., Solid State Drive (SSD), flash memory drive (e.g., card, stick, key drive), etc.), or any other similar medium that can be used to store the desired information and that is accessible by the computer 702. Accordingly, computer storage media excludes modulated data signals, which fall under communication media as described below.
Communication media embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The mass storage device(s) 750 include removable/non-removable, volatile/non-volatile computer storage media for storage of large amounts of data relative to the memory 730. For example, mass storage device(s) 750 include, but are not limited to, one or more devices such as a magnetic or optical disk drive, floppy disk drive, flash memory, solid state drive, or memory stick.
The memory 730 and mass storage device(s) 750 may include, or have stored thereon, an operating system 760, one or more applications 762, one or more program modules 764, and data 766. The operating system 760 acts to control and allocate resources of the computer 702. Applications 762 include one or both of system and application software and can exploit the management of resources by the operating system 760, through the program modules 764 and data 766 stored in the memory 730 and/or mass storage device(s) 750, to perform one or more actions. Thus, the applications 762 may transform the general-purpose computer 702 into a specialized machine in accordance with the logic they provide.
All or portions of the claimed subject matter can be implemented using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed functionality. By way of example, and not limitation, the system 100, or portions thereof, can be or form part of an application 762, and can include one or more modules 764 and data 766 stored in memory and/or mass storage device(s) 750 whose functionality is realized when executed by the one or more processor(s) 720.
According to a particular embodiment, processor(s) 720 may correspond to a system on a chip (SOC) or similar architecture that includes, or in other words integrates, hardware and software on a single integrated circuit substrate. Here, processor(s) 720 can include one or more processors and memory at least similar to processor(s) 720 and memory 730, among other things. Conventional processors include a minimal amount of hardware and software and depend extensively on external hardware and software. By contrast, an SOC implementation of a processor is more powerful because it embeds hardware and software that can implement particular functionality with minimal or no dependence on external hardware and software. For example, the system 100 and/or associated functionality may be embedded within the hardware of an SOC architecture.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim.
Claims (15)
1. A system for generating echo location sounds to assist a user in navigating a three-dimensional space, comprising:
a processing system comprising one or more processors and a memory having computer-executable instructions stored thereon that, when executed by the one or more processors, cause the processing system to:
receive input from a user to generate an echo location sound to navigate a three-dimensional space;
based at least on the received input:
segment a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm;
for each depth plane, determine an object segment for each object within a particular depth plane;
determine locations of a plurality of echo sound nodes based on a depth level and a surface area of each object defined by the determined object segments; and
generate the echo location sound comprising spatialized sound from each echo sound node originating from the determined locations.
2. The system of claim 1, wherein the unsupervised machine learning algorithm comprises a clustering algorithm, each cluster identified by the clustering algorithm corresponding to a depth level.
3. The system of claim 1, the memory further having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the processing system to:
capturing the digital representation of the three-dimensional space.
4. The system of claim 3, wherein the digital representation of the three-dimensional space is captured using at least one of: a digital camera, a three-dimensional camera, or a depth camera.
5. The system of claim 1, wherein the echo location sound is generated by at least one of a virtual reality headset, a mixed reality headset, or an augmented reality headset.
6. The system of claim 1, wherein the echo location sound is generated using at least one of a channel-based audio output, a spherical sound representation, or an object-based audio output.
7. The system of claim 1, wherein the input is inferred based on at least one of a gesture or a motion of the user.
8. The system of claim 1, wherein the input is based on a gesture of the user.
9. The system of claim 1, wherein the three-dimensional space comprises a computer-generated gaming experience.
10. The system of claim 1, wherein the three-dimensional space comprises a physical environment.
11. A method of generating echo location sounds to assist a user in navigating a three-dimensional space, comprising:
receiving input from the user to generate echo location sounds to navigate the three-dimensional space;
based at least on the received input:
segmenting the digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm;
for each depth plane, determining an object segment for each object within a particular depth plane;
determining locations of a plurality of echo sound nodes based on a depth level and a surface area of each object defined by the determined object segments; and
generating the echo location sound comprising spatialized sound from each echo sound node originating from the determined locations.
12. The method of claim 11, wherein the unsupervised machine learning algorithm comprises a k-means clustering algorithm employing the elbow rule, the k-means clustering algorithm examining the percentage of variance explained as a function of the number of clusters to determine the number of clusters.
13. The method of claim 11, wherein the three-dimensional space comprises a computer-generated gaming experience.
14. A computer storage medium storing computer readable instructions that, when executed, cause a computing device to:
receive input from a user to generate an echo location sound to navigate a three-dimensional space;
based at least on the received input:
segment a digital representation of the three-dimensional space into one or more depth planes using an unsupervised machine learning algorithm;
for each depth plane, determine an object segment for each object within a particular depth plane;
determine locations of a plurality of echo sound nodes based on a depth level and a surface area of each object defined by the determined object segments; and
generate the echo location sound comprising spatialized sound from each echo sound node originating from the determined locations.
15. The computer storage medium of claim 14, wherein the three-dimensional space comprises a computer-generated gaming experience.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/198,238 | 2018-11-21 | ||
US16/198,238 US11287526B2 (en) | 2018-11-21 | 2018-11-21 | Locating spatialized sounds nodes for echolocation using unsupervised machine learning |
PCT/US2019/060184 WO2020106458A1 (en) | 2018-11-21 | 2019-11-07 | Locating spatialized sounds nodes for echolocation using unsupervised machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113168225A true CN113168225A (en) | 2021-07-23 |
CN113168225B CN113168225B (en) | 2024-03-01 |
Family
ID=69160054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980076681.0A Active CN113168225B (en) | 2018-11-21 | 2019-11-07 | Locating spatialized acoustic nodes for echo location using unsupervised machine learning |
Country Status (4)
Country | Link |
---|---|
US (1) | US11287526B2 (en) |
EP (1) | EP3864494B1 (en) |
CN (1) | CN113168225B (en) |
WO (1) | WO2020106458A1 (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5097326A (en) * | 1989-07-27 | 1992-03-17 | U.S. Philips Corporation | Image-audio transformation system |
US20050208457A1 (en) * | 2004-01-05 | 2005-09-22 | Wolfgang Fink | Digital object recognition audio-assistant for the visually impaired |
CN101931853A (en) * | 2009-06-23 | 2010-12-29 | 索尼公司 | Audio signal processing apparatus and acoustic signal processing method |
CN102027440A (en) * | 2008-03-18 | 2011-04-20 | 艾利普提克实验室股份有限公司 | Object and movement detection |
US20120124470A1 (en) * | 2010-11-17 | 2012-05-17 | The Johns Hopkins University | Audio display system |
CN104885438A (en) * | 2012-10-31 | 2015-09-02 | 思杰系统有限公司 | Systems and methods of monitoring performance of acoustic echo cancellation |
US20170323485A1 (en) * | 2016-05-09 | 2017-11-09 | Magic Leap, Inc. | Augmented reality systems and methods for user health analysis |
US20180310116A1 (en) * | 2017-04-19 | 2018-10-25 | Microsoft Technology Licensing, Llc | Emulating spatial perception using virtual echolocation |
US20180314416A1 (en) * | 2017-04-27 | 2018-11-01 | Magic Leap, Inc. | Light-emitting user input device |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4907136A (en) * | 1989-03-03 | 1990-03-06 | Jorgensen Adam A | Echo location system for vision-impaired persons |
US5475651A (en) * | 1994-10-18 | 1995-12-12 | The United States Of America As Represented By The Secretary Of The Navy | Method for real-time extraction of ocean bottom properties |
US6671226B1 (en) * | 2001-06-01 | 2003-12-30 | Arizona Board Of Regents | Ultrasonic path guidance for visually impaired |
AU2003241125A1 (en) * | 2002-06-13 | 2003-12-31 | I See Tech Ltd. | Method and apparatus for a multisensor imaging and scene interpretation system to aid the visually impaired |
US8068644B2 (en) * | 2006-03-07 | 2011-11-29 | Peter Thomas Tkacik | System for seeing using auditory feedback |
JP5709906B2 (en) * | 2010-02-24 | 2015-04-30 | アイピープレックス ホールディングス コーポレーション | Augmented reality panorama for the visually impaired |
US20130278631A1 (en) * | 2010-02-28 | 2013-10-24 | Osterhout Group, Inc. | 3d positioning of augmented reality information |
WO2012068280A1 (en) * | 2010-11-16 | 2012-05-24 | Echo-Sense Inc. | Remote guidance system |
US8797386B2 (en) * | 2011-04-22 | 2014-08-05 | Microsoft Corporation | Augmented auditory perception for the visually impaired |
WO2014106085A1 (en) * | 2012-12-27 | 2014-07-03 | Research Foundation Of The City University Of New York | Wearable navigation assistance for the vision-impaired |
WO2014113891A1 (en) * | 2013-01-25 | 2014-07-31 | Hu Hai | Devices and methods for the visualization and localization of sound |
GB201313128D0 (en) * | 2013-07-23 | 2013-09-04 | Stanier James G | Acoustic Spatial Sensory Aid |
US9488833B2 (en) * | 2014-02-07 | 2016-11-08 | International Business Machines Corporation | Intelligent glasses for the visually impaired |
US10409548B2 (en) * | 2016-09-27 | 2019-09-10 | Grabango Co. | System and method for differentially locating and modifying audio sources |
US10436593B2 (en) * | 2016-11-08 | 2019-10-08 | Reem Jafar ALATAAS | Augmented reality assistance system for the visually impaired |
EP3370133B1 (en) * | 2017-03-02 | 2023-10-18 | Nokia Technologies Oy | Audio processing |
JP7175281B2 (en) * | 2017-03-28 | 2022-11-18 | マジック リープ, インコーポレイテッド | Augmented reality system with spatialized audio associated with user-scanned virtual objects |
US10251011B2 (en) * | 2017-04-24 | 2019-04-02 | Intel Corporation | Augmented reality virtual reality ray tracing sensory enhancement system, apparatus and method |
US10496157B2 (en) * | 2017-05-09 | 2019-12-03 | Microsoft Technology Licensing, Llc | Controlling handheld object light sources for tracking |
GB2569576A (en) * | 2017-12-20 | 2019-06-26 | Sony Interactive Entertainment Inc | Audio generation system |
EP3540566B1 (en) * | 2018-03-12 | 2022-06-08 | Nokia Technologies Oy | Rendering a virtual scene |
US10909372B2 (en) * | 2018-05-28 | 2021-02-02 | Microsoft Technology Licensing, Llc | Assistive device for the visually-impaired |
US10735882B2 (en) * | 2018-05-31 | 2020-08-04 | At&T Intellectual Property I, L.P. | Method of audio-assisted field of view prediction for spherical video streaming |
US10712900B2 (en) * | 2018-06-06 | 2020-07-14 | Sony Interactive Entertainment Inc. | VR comfort zones used to inform an In-VR GUI editor |
US10867061B2 (en) * | 2018-09-28 | 2020-12-15 | Todd R. Collart | System for authorizing rendering of objects in three-dimensional spaces |
US10997828B2 (en) * | 2019-08-09 | 2021-05-04 | Accenture Global Solutions Limited | Sound generation based on visual data |
- 2018-11-21: US application US 16/198,238 (US11287526B2, active)
- 2019-11-07: PCT application PCT/US2019/060184 (WO2020106458A1)
- 2019-11-07: CN application CN201980076681.0A (CN113168225B, active)
- 2019-11-07: EP application EP19836016.6A (EP3864494B1, active)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024088336A1 (en) * | 2022-10-28 | 2024-05-02 | International Business Machines Corporation | Multimodal machine learning for generating three-dimensional audio |
CN116402906A (en) * | 2023-06-08 | 2023-07-07 | 四川省医学科学院·四川省人民医院 | Signal grade coding method and system based on kidney echo |
CN116402906B (en) * | 2023-06-08 | 2023-08-11 | 四川省医学科学院·四川省人民医院 | Signal grade coding method and system based on kidney echo |
Also Published As
Publication number | Publication date |
---|---|
US11287526B2 (en) | 2022-03-29 |
EP3864494A1 (en) | 2021-08-18 |
US20200158865A1 (en) | 2020-05-21 |
CN113168225B (en) | 2024-03-01 |
EP3864494B1 (en) | 2023-07-05 |
WO2020106458A1 (en) | 2020-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10503996B2 (en) | Context-aware display of objects in mixed environments | |
US11443453B2 (en) | Method and device for detecting planes and/or quadtrees for use as a virtual substrate | |
US10088900B2 (en) | Information processing method and information processing system | |
US10542366B1 (en) | Speaker array behind a display screen | |
KR20240011871A (en) | Method for controlling virtual object, and related apparatus | |
CN107281753B (en) | Scene sound effect reverberation control method and device, storage medium and electronic equipment | |
US8788978B2 (en) | Pinch zoom velocity detent | |
JP2015515075A (en) | 3D graphic user interface | |
US10187737B2 (en) | Method for processing sound on basis of image information, and corresponding device | |
US10278001B2 (en) | Multiple listener cloud render with enhanced instant replay | |
CN110869983A (en) | Interactive input control in a simulated three-dimensional (3D) environment | |
CN113168225B (en) | Locating spatialized acoustic nodes for echo location using unsupervised machine learning | |
WO2021103609A1 (en) | Method and apparatus for driving interaction object, electronic device and storage medium | |
CN107688426B (en) | Method and device for selecting target object | |
EP2879038A1 (en) | Input system with parallel input data | |
US20230350536A1 (en) | Displaying an environment from a selected point-of-view | |
US20220286802A1 (en) | Spatial audio modification | |
US11354011B2 (en) | Snapping range for augmented reality | |
WO2018076927A1 (en) | Operating method and device applicable to space system, and storage medium | |
WO2022066361A1 (en) | Transposing virtual objects between viewing arrangements | |
CN117202624A (en) | Heat dissipation processing method and device of MR (magnetic resonance) equipment, electronic equipment and readable storage medium | |
CN114788306A (en) | Placing sounds within content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||