WO2022150640A1 - Using a smart speaker to estimate the location of a user - Google Patents

Using a smart speaker to estimate the location of a user

Info

Publication number
WO2022150640A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
recited
music
reflectors
differencing
Prior art date
Application number
PCT/US2022/011692
Other languages
English (en)
Inventor
Lili Qiu
Mei Wang
Wei Sun
Original Assignee
Board Of Regents, The University Of Texas System
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Board Of Regents, The University Of Texas System filed Critical Board Of Regents, The University Of Texas System
Publication of WO2022150640A1 publication Critical patent/WO2022150640A1/fr


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00 Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80 Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802 Systems for determining direction or deviation from predetermined direction
    • G01S3/808 Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 Position of source determined by a plurality of spaced direction-finders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/03 Synergistic effects of band splitting and sub-band processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/23 Direction finding using a sum-delay beam-former

Definitions

  • the present invention relates generally to user localization systems, and more particularly to using a smart device, such as a smart speaker, to estimate the location of the user.
  • User localization systems attempt to estimate the location of the user. For example, such systems may attempt to use audio and vision-based schemes to estimate the location of the user. Such vision-based schemes may involve the use of cameras. However, it may not be prudent to deploy cameras throughout a home due to privacy concerns.
  • a method for identifying a location of a user comprises collecting signals emanating from the user by a microphone array of a smart device. The method further comprises estimating an angle of arrival of propagation paths of one or more of the collected signals emanating from the user. The method additionally comprises estimating positions and orientations of reflectors of a room structure. Furthermore, the method comprises identifying the location of the user by retracing the propagation paths of the one or more signals collected from the user based on the positions and the orientations of the reflectors of the room structure.
  • Figure 1 illustrates a diagram of a placement of a smart device, such as a smart speaker, within a living space, such as a house, apartment, etc., in accordance with an embodiment of the present invention
  • Figure 2 is a diagram of the software components of the smart speaker used to estimate the location of the user within the living space in accordance with an embodiment of the present invention
  • Figure 3 illustrates an embodiment of the present invention of the hardware configuration of the smart speaker which is representative of a hardware environment for practicing the present invention
  • Figure 4 is a flowchart of a method for identifying the location of the user in accordance with an embodiment of the present invention
  • Figure 5 is a flowchart of a method for estimating the angle of arrival (AoA) of the propagation paths of one or more of the collected signals in accordance with an embodiment of the present invention
  • Figure 6 is a diagram of the multi-resolution analysis algorithm in accordance with an embodiment of the present invention.
  • Figure 7 is a diagram illustrating the comparison of the AoA derived from Short-Time Fourier Transform (STFT) and wavelet with and without differencing in accordance with an embodiment of the present invention
  • Figure 8 is a flowchart of a method for estimating the AoA of the propagation paths of one or more signals reflected from the reflectors of the room structure in accordance with an embodiment of the present invention
  • Figure 9 illustrates an exemplary azimuth-distance profile in accordance with an embodiment of the present invention
  • Figure 10A illustrates retracing using a ray structure for each of the two near parallel paths in accordance with an embodiment of the present invention
  • Figure 10B illustrates retracing using a cone structure for each of the paths in accordance with an embodiment of the present invention.
  • user localization systems attempt to estimate the location of the user. For example, such systems may attempt to use audio and vision-based schemes to estimate the location of the user. Such vision-based schemes may involve the use of cameras. However, it may not be prudent to deploy cameras throughout a home due to privacy concerns.
  • Other such user localization systems utilize device-based tracking, which requires the user, whose location is to be estimated, to carry a device (e.g., smartphone), which may not be convenient for the user at home.
  • the embodiments of the present invention provide a means for accurately estimating the location of the user using standard inexpensive equipment (e.g., smart speaker) that may already be located in the user’s home.
  • the principles of the present invention estimate the user’s location, such as within a particular room in a house, by retracing multiple propagation paths that the user’s sound traverses.
  • the angles of arrival (AoAs) of the multiple paths traversed by the voice signals from the user to a microphone array, such as on a smart speaker, are estimated.
  • the multipath may include a direct path (referring to the path of a signal propagating between the user and the microphone array without any reflections) and the reflected paths (referring to the path of a signal propagating between the user and the microphone array with reflections, such as via walls, ceilings, etc.).
  • the indoor space structure is estimated by emitting wideband chirp pulses to estimate the angle of arrival (AoA) and distance to the reflectors (e.g., walls) in the room.
  • the propagation paths of the signals are retraced based on the estimated AoA of the voice signals and the reflected chirp signals to localize the voice.
  • “Localizing the voice,” as used herein, refers to estimating the location of the source of the voice, which in the case of the present invention, represents the location of the user speaking such words.
  • the principles of the present invention may actively map indoor rooms and localize voice sources using only a smart device, such as a smart speaker, without additional hardware.
  • the present invention may localize voice in both line of sight (LoS) and non-line of sight (NLoS) scenarios.
  • LoS scenarios refer to the user being within sight of the smart device, such as the smart speaker; whereas, the NLoS scenarios refer to the user not being within sight of the smart device (e.g., smart speaker), such as being behind a wall or in a different room.
  • Prior user localization systems are not capable of estimating the user’s location in NLoS scenarios.
  • Such words spoken by the user may refer to a command, such as a command to turn on a light.
  • the ability to localize human voice benefits smart devices, such as smart speakers, in many ways. First, knowing the user's location allows the smart speaker to beamform its transmission to the user so that it can both hear from and transmit to a faraway user. Second, the user location gives context information, which can assist in interpreting the user's intent.
  • the smart speaker can resolve the ambiguity and tell which light to turn on depending on the user’s location.
  • knowing the user's location also enables location-based services. For instance, a smart speaker can automatically adjust the temperature and lighting conditions near the user.
  • location information can also help with speech recognition and natural language processing by providing important context information. For example, when a user says "orange" in the kitchen, the system knows that the word refers to a fruit; whereas, when the same user says "orange" elsewhere, it may be interpreted as a color.
  • a microphone array widely available on a smart speaker is utilized to collect the received signals from the user.
  • the earliest arriving voice signals are captured so that the signal traversing via the shortest path has little or no overlap with those traversing via the longer paths.
  • wavelet and Short-Time Fourier Transform (STFT) analysis are performed on the signals emanating from the user over different time windows to benefit from both transient signals with low coherence and long signals with high cumulative energy. Furthermore, differencing is applied to the wavelet and STFT analysis to cancel the signals in the time-frequency domain to reduce coherence, thereby improving the AoA accuracy.
  • In one embodiment, the room contour (i.e., the distances and directions of the walls, ceilings, etc.) is estimated by having the smart device (e.g., smart speaker) emit wide-band Frequency Modulated Continuous Wave (FMCW) chirp pulses and utilize the wideband 3D multiple signal classification (MUSIC) algorithm to estimate the multiple propagation paths from the reflected chirp pulses simultaneously.
  • the wide bandwidth not only improves distance resolution, but also allows one to leverage the frequency diversity to estimate the AoAs of coherent signals.
  • the AoA estimation is improved by leveraging the assumption of a rectangular room (which is common in real world scenarios).
  • the accuracy of the distance estimation, such as the distance to a wall, is improved by using beamforming.
  • Figure 1 illustrates a diagram of a placement of a smart device, such as a smart speaker, within a living space, such as a house, apartment, etc., in accordance with an embodiment of the present invention.
  • “Living space,” as used herein, refers to space within a building. In such a space, a person (or people) may or may not live.
  • a living space may include spaces, such as a house, apartment, etc., where people live.
  • a living space may also include spaces, such as hospital rooms, offices, etc., where people work or interact with others.
  • living space 100 includes a smart device 101, such as a smart speaker as shown in Figure 1, that is placed within a particular room of the home.
  • smart speaker 101 is located in room 102A (identified as “Room 1” in Figure 1), separated from other rooms, such as room 102B (identified as “Room 2” in Figure 1), room 102C (identified as “Room 3” in Figure 1) and room 102D (identified as “Room 4” in Figure 1) via a wall, door, etc.
  • a "smart device," as used herein, refers to an electronic device, generally connected to other devices or networks via different wireless protocols, such as Bluetooth, Zigbee, NFC, Wi-Fi, LiFi, 5G, etc., that can operate to some extent interactively and autonomously.
  • Examples of smart devices include smartphones, smart televisions or displays, smart thermostats, smart doorbells, smart locks, smart refrigerators, phablets and tablets, smartwatches, smart bands, smart key chains, smart speakers, and others.
  • a "smart speaker," as used herein, refers to a speaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "hot word" (or several "hot words").
  • smart speaker 101 functions as a smart device that utilizes Wi-Fi, Bluetooth and other protocol standards to extend usage beyond audio playback, such as to control home automation devices, such as light source 103.
  • This can include, but is not limited to, features, such as compatibility across a number of services and platforms, peer-to-peer connection through mesh networking, virtual assistants, and others. Each can have its own designated interface and features in-house, usually launched or controlled via application or home automation software.
  • Some smart speakers also include a screen to show the user a visual response.
  • smart speaker 101 may be utilized to control home automation devices, such as light source 103.
  • a user 104 may be located in room 102A and instruct light source 103 to be turned on using voice commands captured by smart speaker 101.
  • smart speaker 101 includes a microphone array 105 for extracting voice input, such as voice signals emanating from user 104 or signals reflected from reflectors of the room structure, such as walls, ceilings, etc.
  • a "reflector," as used herein, refers to an object in living space 100 that causes the reflection of a signal, whether a signal emanating from user 104 or a chirp pulse emanating from smart speaker 101.
  • the reflector may be a wall, ceiling, table, etc. located within living space 100.
  • While Figure 1 illustrates a living space with four rooms, living space 100 may include any number of rooms, which may be separated in any number of ways (e.g., doors, walls, ceilings, etc.).
  • user 104 may be located in any room, including being located in a different room from smart speaker 101.
  • living space 100 may include any number of smart speakers 101.
  • Figure 2 is a diagram of the software components of smart speaker 101 (Figure 1) used to estimate the location of the user 104 (Figure 1) within living space 100 (Figure 1) in accordance with an embodiment of the present invention.
  • smart speaker 101 includes an angle of arrival estimator 201 configured to estimate the angle of arrival of signals emanating from user 104 as discussed further below in connection with Figures 4-9 and 10A-10B.
  • Smart speaker 101 further includes a room structure estimator 202 configured to estimate the room contour as discussed further below in connection with Figures 4-9 and 10A-10B.
  • Room contour refers to the structure or outline of living space 100.
  • Smart speaker 101 additionally includes a constrained beam retracing engine 203 configured to retrace the propagation paths of the signals received from user 104 as well as the signals received from the reflectors of the room structure as discussed further below in connection with Figures 4-9 and 10A-10B.
  • “Retracing,” as used herein, refers to tracing back the propagation path of the signal(s) collected from user 104 and tracing back the propagation path of the signal(s) collected from the reflectors of the room structure.
  • Figure 3 illustrates an embodiment of the present invention of the hardware configuration of smart speaker 101 (Figures 1 and 2) which is representative of a hardware environment for practicing the present invention.
  • Smart speaker 101 may be a machine that operates as a standalone device or may be networked to other machines. Further, while smart speaker 101 is shown only as a single machine, the term "system" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • Smart speaker 101 may include one or more speakers 301, one or more processors 302, a main memory 303, and a static memory 304, which communicate with each other via a link 305 (e.g., a bus).
  • Smart speaker 101 may further include a video display unit 306, an alphanumeric input device 307 (e.g., a keyboard), and a user interface (UI) navigation device 308.
  • Video display unit 306, alphanumeric input device 307, and UI navigation device 308 may be incorporated into a touch screen display.
  • a UI of smart speaker 101 can be realized by a set of instructions that can be executed by processor 302 to control operation of video display unit 306, alphanumeric input device 307, and UI navigation device 308.
  • Video display unit 306, alphanumeric input device 307, and UI navigation device 308 may be implemented on smart speaker 101 arranged as a virtual assistant to manage parameters of the virtual assistant.
  • smart speaker 101 includes microphone array 105 and a set of optical sensors 309 having source(s) 310 and detector(s) 311.
  • Smart speaker 101 may include a set of acoustic sensors 312 having transmitter(s) 313 and receiver(s) 314.
  • Smart speaker 101 may also include a network interface device 315, and one or more sensors (not shown), such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor.
  • the communications may be provided using a bus 305, which can include a link in a wired transmission or a wireless transmission.
  • network interface device 315 interconnects bus 305 with an outside network (network 316) thereby allowing smart speaker 101 to communicate with other devices, such as other smart devices (not shown), etc.
  • Network 316 may be, for example, a local area network, a wide area network, a wireless wide area network, a circuit-switched telephone network, a Global System for Mobile Communications (GSM) network, a Wireless Application Protocol (WAP) network, a WiFi network, an IEEE 802.11 standards network, various combinations thereof, etc.
  • main memory 303 may store application 317, which may include, for example, angle of arrival estimator 201 (Figure 2), room structure estimator 202 (Figure 2) and constrained beam retracing engine 203 (Figure 2). Furthermore, application 317 may include, for example, a program for estimating the location of the user, such as user 104 (Figure 1), as discussed further below in connection with Figures 4-9 and 10A-10B.
  • Processor(s) 302 may include instructions to completely or at least partially operate smart speaker 101 as an activated smart home speaker with user localization capabilities.
  • Components of smart speaker 101, as taught herein, can be distributed as modules having instructions in one or more of main memory 303, static memory 304, and/or within instructions 318 of processor(s) 302.
  • application 317 of smart device 101 includes the software components of angle of arrival estimator 201, room structure estimator 202 and constrained beam retracing engine 203.
  • such components may be implemented in hardware, where such hardware components would be connected to bus 305.
  • the functions discussed above performed by such components are not generic computer functions.
  • smart speaker 101 is a particular machine that is the result of implementing specific, non-generic computer functions.
  • the present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the "C" programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the Figures.
  • two blocks shown in succession may, in fact, be accomplished as one step, executed concurrently, substantially concurrently, in a partially or wholly temporally overlapping manner, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • user localization systems attempt to estimate the location of the user. For example, such systems may attempt to use audio and vision-based schemes to estimate the location of the user. Such vision-based schemes may involve the use of cameras. However, it may not be prudent to deploy cameras throughout a home due to privacy concerns. Other such user localization systems utilize device-based tracking, which requires the user, whose location is to be estimated, to carry a device (e.g., smartphone), which may not be convenient for the user at home. Furthermore, other such user localization systems may utilize device-free radio frequency. However, such a scheme requires a large bandwidth as well as many antennas or millimeter wave chirps to achieve high accuracy, which is not easy to deploy at home.
  • the embodiments of the present invention provide a means for accurately estimating the location of the user using standard inexpensive equipment (e.g., smart speaker) that may already be located in the user’s home as discussed below in connection with Figures 4-9 and 10A-10B.
  • Figure 4 is a flowchart of a method 400 for identifying the location of the user (e.g., user 104 of Figure 1) in accordance with an embodiment of the present invention.
  • smart speaker 101 collects signals emanating from user 104 by microphone array 105 of smart speaker 101.
  • user 104 may speak the command to turn on light source 103. Such words may then be collected by microphone array 105 of smart speaker 101.
  • angle of arrival estimator 201 of smart speaker 101 estimates an angle of arrival (AoA) of the propagation paths of one or more of the signals collected in step 401.
  • Figure 5 is a flowchart of a method 500 for estimating the AoA of the propagation paths of one or more of the collected signals (collected in step 401) in accordance with an embodiment of the present invention.
  • angle of arrival estimator 201 of smart speaker 101 performs wavelet and Short-Time Fourier Transform (STFT) analysis on the signals emanating from user 104 over different time windows.
  • In step 502, angle of arrival estimator 201 of smart speaker 101 applies differencing to the wavelet and STFT analyses to cancel signals in the time-frequency domain.
  • angle of arrival estimator 201 of smart speaker 101 uses the results from the STFT analysis on the signals emanating from user 104 to form a base.
  • In step 504, angle of arrival estimator 201 of smart speaker 101 searches for a nearest neighbor in the results of the wavelet analysis and in the results of applying the differencing to the wavelet and STFT analyses for each point in the base.
  • In step 505, angle of arrival estimator 201 of smart speaker 101 selects a number of peaks from a selected number of nearest neighbors as corresponding to the estimated angle of arrival of the propagation paths of one or more of the collected signals.
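By way of illustration, the sketch below shows one way steps 501 and 502 could be realized in Python; the helper name, array shapes, and parameters are assumptions for illustration rather than part of the specification (an analogous differencing step would be applied to the wavelet spectrogram, e.g., obtained via pywt.cwt).

```python
import numpy as np
from scipy.signal import stft

def time_frequency_differencing(mic_signals, fs, win_len):
    """Per-microphone STFT (step 501) followed by differencing of
    consecutive time windows (step 502) to suppress coherent multipath.
    mic_signals: (num_mics, num_samples) recording from the array."""
    diffed = []
    for x in mic_signals:
        _, _, Z = stft(x, fs=fs, nperseg=win_len)   # (freq_bins, frames)
        # Paths present in both adjacent windows cancel in the difference,
        # reducing coherence before MUSIC is applied per frequency bin.
        diffed.append(np.diff(Z, axis=1))
    return np.stack(diffed)   # (num_mics, freq_bins, frames - 1)
```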
  • smart speaker 101 utilizes time-frequency analysis to reduce coherence in voice signals since signals that differ in either time or frequency will be separated out.
  • smart speaker 101 separates coherent signals across different frequency bins, and then cancels the paths in each frequency bin by taking the difference between two consecutive time windows. It is noted that such a process is useful for voice signals since different pitches may occur at different times.
  • An important decision in time-frequency analysis is selecting the sizes of the time window and frequency bin used to perform the analysis.
  • aggregating the signals over a larger time window and a larger frequency bin improves the signal-to-noise ratio (SNR) and in turn improves the AoA estimation accuracy.
  • a larger time window and a larger frequency bin also means more coherent signals.
  • the frequency of voice signals varies unpredictably over time, which makes it challenging to determine a fixed time window and frequency bin.
  • Figure 6 is a diagram of the multi-resolution analysis algorithm in accordance with an embodiment of the present invention.
  • frequency analysis is performed using smaller windows 602, where the difference between adjacent windows is taken (see 605) to reduce the coherent signals and improve AoA estimation under the coherent multipath.
  • wavelet 603 is utilized, which has a higher time resolution for relatively high frequency signals.
  • the transient voice signals that have low or no coherence can be captured thereby reducing outliers in the MUSIC AoA estimation.
  • In one embodiment, wavelet analysis is combined with STFT using different window sizes as shown in Figure 6. These methods are discussed below.
  • the evanescent pitches are selected in the time-frequency domain to reduce error from coherence.
  • the next step is to further reduce coherent signals by taking the difference between two consecutive time windows for each antenna (see 605). This cancels the paths with different delay in the time-frequency domain, and is more effective than cancelling in the time domain alone. If the difference between two adjacent windows is greater than the delay difference of any two paths, this process can remove the old paths. As a result, coherence is reduced in the short time window.
  • wavelet is a multi-resolution analysis.
  • short basis functions to isolate signal discontinuities and long basis functions to perform detailed frequency analysis are used.
  • Wavelet 603 has superior resolution for relatively high frequency signals. Transient signals in the small time window have less energy and may yield large errors. As a result, to improve accuracy, the differences of the wavelet spectrum in the two consecutive time windows are taken (see 606) to further reduce the coherence.
  • the AoA derived from applying MUSIC to STFT and wavelet are compared.
  • Figure 7 shows the results for the case where a woman speaks at 2.4 meters away from microphone array 105.
  • Figure 7 is a diagram illustrating the comparison of the AoA derived from STFT and wavelet with and without differencing in accordance with an embodiment of the present invention.
  • dashed lines 701 are ground truth AoAs of different paths.
  • the wavelet results without taking the difference are plotted as circles 703, which also deviate considerably from dashed lines 701 because of low energy.
  • Points 704 are the AoA estimates derived from MUSIC when differencing is applied to STFT and wavelet, referred to herein as the STFT Diff and Wavelet Diff methods. Compared with the original results (shown in circles 702, 703), differencing brings the estimation closer to the ground truth angles (shown as dashed lines 701).
  • Figure 6 illustrates the algorithm for deriving the results using different time windows, where the combined results 607 are synthesized to select the final AoA results as discussed herein.
  • the results from STFT, STFT differencing and wavelet differencing are combined using a linear regression.
  • In one embodiment, training traces that contain the result from each method (e.g., MUSIC, STFT+MUSIC, and wavelet+MUSIC) together with the ground truth Angle of Arrival (AoA) are used to train the linear model.
  • the least squares method may be utilized to determine the parameters A and b.
  • the trained model outputs the combined result using the estimations from these individual methods.
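A minimal sketch of this linear combiner, assuming the training traces have already been reduced to per-method AoA estimates (the variable and function names are illustrative assumptions):

```python
import numpy as np

def fit_combiner(X, y):
    """X: (num_traces, num_methods) AoA estimates from STFT, STFT
    differencing, and wavelet differencing; y: ground-truth AoAs.
    Solves min ||X*A + b - y||^2 by least squares."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    params, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return params[:-1], params[-1]                  # weights A, offset b

def combine(A, b, estimates):
    """Combined AoA from the individual methods' estimates."""
    return estimates @ A + b
```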
  • Alternatively, a non-linear model (e.g., a neural network) may be used, in which case the non-linear model outputs the combined result using the estimations from these individual methods.
  • the multiple signal classification (MUSIC) profiles from STFT, STFT differencing, and wavelet differencing results are combined in which the peaks from the combined MUSIC profiles are selected.
  • the weighted cluster of these points is computed, where the weight is set according to the magnitude of the MUSIC peak.
  • the top K clusters from each algorithm are selected, where K is a positive whole number greater than zero.
  • Since STFT with a large window provides more stable results without significant outliers, its results are used to form the base. For each point in the base, the nearest neighbor is searched in the results of the other two methods as they contain both more accurate real peaks and outlier peaks. Finally, the top P peaks from the selected nearest neighbors are picked as the final AoA estimates.
  • A pseudo code of the algorithm for estimating the AoA of the propagation paths of one or more of the collected signals is shown below as Algorithm 1.
  • Algorithm 1: Multi-Resolution Analysis Algorithm.
  • spectShortDiff = diff(STFT(signal, ShortWindow));
  • spectWaveletDiff = diff(Wavelet(signal));
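A fuller sketch of the nearest-neighbor fusion in steps 503-505 is given below; the peak lists are assumed to be 1-D arrays of candidate AoAs (in degrees) already extracted from each MUSIC profile, and P is the number of final paths to keep (the function name and defaults are assumptions for illustration):

```python
import numpy as np

def fuse_aoa_peaks(stft_peaks, stft_diff_peaks, wavelet_diff_peaks, P=3):
    """STFT with a large window forms the base (step 503). For each base
    peak, find the nearest neighbor among the other two methods' peaks
    (step 504) and keep the P peaks with the tightest agreement (step 505)."""
    others = np.concatenate([stft_diff_peaks, wavelet_diff_peaks])
    candidates = []
    for base in stft_peaks:
        nn = others[np.argmin(np.abs(others - base))]
        candidates.append((abs(nn - base), nn))
    candidates.sort(key=lambda t: t[0])   # tight agreement first
    return [aoa for _, aoa in candidates[:P]]
```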
  • room structure estimator 202 of smart speaker 101 estimates the room structure by emitting chirp pulses in the room structure, such as living space 100.
  • the chirp pulses correspond to frequency-modulated continuous-wave (FMCW) chirps.
  • room structure estimator 202 of smart speaker 101 collects signals reflected by the reflectors (e.g., ceilings, doors, walls) of the room structure (e.g., living space 100) from the emitted chirp pulses.
  • the frequency of the FMCW signals reflected by the reflectors of the room structure from the emitted chirp pulses that are collected by smart speaker 101 is between 1 kHz and 3 kHz.
  • In step 405, room structure estimator 202 of smart speaker 101 estimates an angle of arrival (AoA) of the propagation paths of one or more signals reflected from the reflectors of the room structure.
  • Figure 8 is a flowchart of a method 800 for estimating the AoA of the propagation paths of one or more signals reflected from the reflectors of the room structure in accordance with an embodiment of the present invention.
  • room structure estimator 202 of smart speaker 101 divides an FMCW signal into multiple subbands in the time domain.
  • room structure estimator 202 of smart speaker 101 runs a 3D multiple signal classification (MUSIC) algorithm in each of the multiple subbands to generate a 3D MUSIC profile for each of the multiple subbands, where the 3D MUSIC profile corresponds to an azimuth AoA-distance profile.
  • In step 803, room structure estimator 202 of smart speaker 101 sums the generated 3D MUSIC profiles.
  • In step 804, room structure estimator 202 of smart speaker 101 searches for and selects azimuth AoAs from the summed 3D MUSIC profiles that minimize a fitting error with a rectangular room.
  • In step 805, room structure estimator 202 of smart speaker 101 estimates the angle of arrival (AoA) of the propagation paths of one or more signals reflected from the reflectors of the room structure by adjusting the angles of the selected azimuth AoAs so that adjacent azimuth AoAs differ by π/2.
  • In step 806, room structure estimator 202 of smart speaker 101 performs a delay-and-sum beamforming algorithm on the FMCW signals reflected by the reflectors of the room structure, forming FMCW profiles containing the estimated distances of the reflectors to smart speaker 101.
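Steps 801-803 can be organized as sketched below; music_3d_profile stands in for a full 3D MUSIC implementation (spatial smoothing, eigendecomposition of the subband covariance, and a scan over distance, azimuth, and elevation), which is omitted here as an assumption about the surrounding code:

```python
import numpy as np

def summed_subband_profiles(fmcw_rx, num_subbands, music_3d_profile):
    """Because the chirp frequency rises linearly over time, slicing the
    received FMCW signal in the time domain yields frequency subbands
    (step 801). Run 3D MUSIC per subband (step 802) and sum the resulting
    azimuth-distance profiles (step 803)."""
    subbands = np.array_split(fmcw_rx, num_subbands, axis=-1)
    total = None
    for sb in subbands:
        p = music_3d_profile(sb)            # azimuth x distance grid
        total = p if total is None else total + p
    return total
```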
  • the principles of the present invention estimate the room contour using wideband 3D MUSIC algorithms. Accuracy is improved by leveraging constraints of the azimuth AoA and applying beamforming.
  • smart speaker 101 estimates the room structure once unless it is moved to a new position. In one embodiment, smart speaker 101 estimates the room structure by sending FMCW chirps. Let $f_c$, $B$ and $T$ denote the center frequency, bandwidth, and duration of the chirp, respectively. Upon receiving the reflected signals, smart speaker 101 applies the 3D MUSIC algorithm.
  • the 2D Range-Azimuth MUSIC algorithm is generalized to a 3D joint estimation of distance, azimuth AoA and elevation AoA.
  • the 3D MUSIC algorithm has better resolution than the 2D MUSIC algorithm since the peaks that differ in any of the three dimensions are separated out.
  • the received signals are transformed into a 3D sinusoid whose frequencies are proportional to the distance and a function of the two angles.
  • the steering vector is extended to have three input parameters: distance $R$, azimuth angle $\theta$, and elevation angle $\varphi$. For a circular microphone array, the steering vector element for microphone $i$ and temporal smoothing index $k$ ($k = 0, \ldots, M_s - 1$) has the general form
    $$\alpha_{i,k}(R,\theta,\varphi)=\exp\!\left(-j2\pi\left[\frac{2BR}{cT}\,kN_sT_s+\frac{f_c\,r\cos\!\left(\theta-\frac{2\pi i}{N}\right)\cos\varphi}{c}\right]\right), \qquad (1)$$
    where:
  • i is the array index
  • N is the number of microphones
  • r is the radius of the microphone array
  • c is sound speed
  • N s is the subsampling rate
  • M s is the temporal smoothing window
  • T s is the time interval.
  • FMCW signals from 1 kHz to 3 kHz are used for AoA estimation.
  • the 2 kHz bandwidth is divided into 20 subbands each with 100 Hz.
  • Since the frequency of the FMCW signal increases linearly over time, room structure estimator 202 divides the FMCW signal into multiple subbands in the time domain, runs the 3D MUSIC algorithm in each subband to generate a 3D MUSIC profile (azimuth AoA-distance profile) for each of the subbands, and then sums up the 3D MUSIC profiles from all the subbands.
  • the transmission signal is aligned with the received signal so that they span the same subband.
  • the alignment is determined by the distance. Therefore, a peak is identified in the 3D MUSIC profile (azimuth AoA-distance profile), which is obtained by mixing the received signal with the transmitted signal that is sent δT ago, where δT is the propagation delay determined based on the distance.
  • the azimuth AoA and distance output from the 3D MUSIC algorithm are used as shown in Figure 9.
  • Figure 9 illustrates an exemplary azimuth-distance profile in accordance with an embodiment of the present invention.
  • the MUSIC profile can be noisy, which makes it difficult to determine the right peaks to use for distance and AoA estimation of the walls.
  • Since the shapes of most rooms in a living area are rectangular, such information is leveraged to improve peak selection.
  • room structure estimator 202 selects the peaks such that the difference in the azimuth AoA of two consecutive peaks are as close to 90° as possible.
  • room structure estimator 202 searches for the 4 peaks $\theta_0, \theta_1, \theta_2, \theta_3$ from the 3D MUSIC profile that minimize the fitting error with a rectangular room (i.e., $\min \sum_i \left|\mathrm{PhaseDiff}(\theta_i, \theta_{i+1}) - \pi/2\right|$, where $\mathrm{PhaseDiff}(\cdot)$ is the difference between two angles taking into account that the phase wraps every $2\pi$).
  • room structure estimator 202 adjusts the solutions so that the difference between adjacent AoAs is exactly $\pi/2$. In one embodiment, this can be done by finding the $\theta_0$ that minimizes the total fitting error, after which the $j$-th azimuth AoA is set to $\theta_j = \theta_0 + j\pi/2$.
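A brute-force sketch of steps 804 and 805 under the stated rectangular-room assumption follows; peak_azimuths is a list of candidate azimuths (in radians) from the summed profile, and the helper names are illustrative:

```python
import numpy as np
from itertools import combinations

def wrap(a):
    """Wrap an angle difference into [0, 2*pi)."""
    return np.mod(a, 2 * np.pi)

def fit_rectangle(peak_azimuths):
    """Pick 4 peaks whose consecutive differences are closest to pi/2
    (step 804), then snap them to exact pi/2 spacing (step 805)."""
    best, best_err = None, np.inf
    for combo in combinations(sorted(peak_azimuths), 4):
        err = sum(abs(wrap(combo[(i + 1) % 4] - combo[i]) - np.pi / 2)
                  for i in range(4))
        if err < best_err:
            best, best_err = combo, err
    # Snap: average the implied theta_0 of each selected peak (a circular
    # mean would be more robust near the 0/2*pi boundary), space by pi/2.
    theta0 = np.mean([wrap(t - j * np.pi / 2) for j, t in enumerate(best)])
    return [wrap(theta0 + j * np.pi / 2) for j in range(4)]
```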
  • smart speaker 101 sends 1 kHz - 10 kHz FMCW chirps.
  • 1 kHz - 3 kHz FMCW chirps are used for AoA estimation to reduce computational cost since MUSIC requires expensive eigenvalue decomposition, but the 1 kHz - 10 kHz FMCW chirps are used for distance estimation.
  • the SNR is increased using beamforming.
  • the delay-and-sum (DAS) beamforming algorithm is utilized by room structure estimator 202 towards the estimated azimuth AoAs. Then, room structure estimator 202 searches for a peak in the beamformed FMCW profile. It has been discovered that after beamforming, the peak magnitude increases significantly with a more accurate distance estimation.
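A minimal delay-and-sum sketch for a circular array, restricted to the horizontal plane for brevity; mic_angles are the microphones' angular positions and r the array radius, both assumptions about the geometry rather than values from the specification:

```python
import numpy as np

def das_beamform(mic_signals, fs, theta, mic_angles, r, c=343.0):
    """Delay-and-sum toward azimuth theta: advance each microphone by its
    geometric delay r*cos(theta - psi_i)/c via a frequency-domain phase
    shift, then average across microphones."""
    num_mics, n = mic_signals.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    acc = np.zeros(len(freqs), dtype=complex)
    for x, psi in zip(mic_signals, mic_angles):
        delay = r * np.cos(theta - psi) / c    # seconds vs. array center
        acc += np.fft.rfft(x) * np.exp(2j * np.pi * freqs * delay)
    return np.fft.irfft(acc / num_mics, n=n)
```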
  • constrained beam retracing engine 203 of smart speaker 101 identifies a location of user 104 by retracing the propagation path of the signal(s) collected from user 104 and retracing the propagation paths of the signals(s) collected from the reflectors of the room structure.
  • “Retracing,” as used herein, refers to tracing back the propagation path of the signal(s) collected from user 104 and tracing back the propagation path of the signal(s) collected from the reflectors of the room structure.
  • the location of user 104 is determined by retracing each of the propagation paths of the signals collected from user 104 and retracing each of the propagation paths of the signals collected from the reflectors of the room structure as a cone structure resulting in a plurality of cone structures, where the location of user 104 corresponds to a point in the cone structures such that a circle centered at the point with a radius of 0.5 m overlaps with a maximum number of cones as discussed further below.
  • the width of each of the cone structures corresponds to a peak width obtained using the 3D MUSIC algorithm on the FMCW signals reflected by the reflectors of the room structure.
  • the user can be localized by retracing the paths using the estimated AoA of the voice signals and room structure.
  • constrained beam retracing engine 203 may first find the reflection points on the walls by the propagation path derived from the estimated AoA.
  • Figure 10A illustrates retracing using a ray structure for each of the two near parallel paths in accordance with an embodiment of the present invention. Then, constrained beam retracing engine 203 traces back the incoming path of voice signals before the wall reflection based on the reflection property.
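Tracing back through a wall reflection relies on the specular (mirror) property; a 2-D sketch, assuming the wall's unit normal is already known from the room-structure step:

```python
import numpy as np

def retrace_through_wall(incoming_dir, wall_normal):
    """Given the direction of the ray arriving at the reflection point,
    return the direction the path had before the wall reflection
    (specular model: reflect the direction about the wall normal)."""
    d = np.asarray(incoming_dir, dtype=float)
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n
```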
  • the following strategies may be employed by constrained beam retracing engine 203.
  • the propagation paths are treated as a cone where the cone center is determined by the estimated AoA and the cone width is determined by the MUSIC peak width. This allows one to capture the uncertainty in the AoA estimation.
  • constrained beam retracing engine 203 lets the AoA estimation procedure return more paths so that the room structure is incorporated to make an informed decision on which paths to use for localization. Specifically, for each of the K paths returned by the AoA estimation, constrained beam retracing engine 203 traces back using the cone structure as shown in Figure 10B.
  • Figure 10B illustrates retracing using a cone structure for each of the paths in accordance with an embodiment of the present invention.
  • constrained beam retracing engine 203 searches for a point O such that the circle centered at the point with a radius of 0.5 m overlaps with the maximum number of cones corresponding to the other K-1 paths.
  • the user is localized at the point O.
  • K is any whole number greater than 2.
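The search for the point O can be done over a coarse grid, as sketched below; cones are represented as (apex, unit direction, angular half-width) tuples, and all helper structures are hypothetical:

```python
import numpy as np

def point_in_cone(p, apex, direction, half_width):
    """True if 2-D point p lies inside the cone (direction must be unit)."""
    v = p - apex
    cos_a = v @ direction / (np.linalg.norm(v) + 1e-9)
    return np.arccos(np.clip(cos_a, -1.0, 1.0)) <= half_width

def localize(cones, grid, radius=0.5):
    """Pick the grid point whose circle of the given radius overlaps the
    most cones; overlap is tested by sampling points on the circle."""
    ring = [radius * np.array([np.cos(a), np.sin(a)])
            for a in np.linspace(0, 2 * np.pi, 16, endpoint=False)]
    def overlaps(p, cone):
        return any(point_in_cone(p + o, *cone) for o in ring + [np.zeros(2)])
    return max(grid, key=lambda p: sum(overlaps(p, c) for c in cones))
```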
  • the location of the user is determined by retracing each of the propagation paths of the signals collected from the user based on the positions and orientations of the reflectors of the room structure as a cone structure resulting in a plurality of cone structures, where each point in the cone structure is assigned a probability based on the distance from the peaks in the multiple signal classification (MUSIC) profiles.
  • the joint probability of a point in space is computed as the product of probabilities from the cone structures corresponding to all the reflectors.
  • the location of the user is then derived as the weighted centroid of all the points where the weights are the joint probabilities.
  • the width of the cone structure is determined by the width of the peaks in the multiple signal classification (MUSIC) profile.
  • the location of the user then corresponds to a point in the plurality of cone structures such that a circle centered at the point overlaps with a maximum number of cones.
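The probabilistic variant reduces to a product of per-cone probabilities followed by a weighted centroid, as sketched below (grid is an assumed (num_points, 2) array of candidate locations; each entry of cone_probs is a probability over the grid derived from the distance to the MUSIC peak):

```python
import numpy as np

def weighted_centroid(grid, cone_probs):
    """Joint probability of each point = product across cones; the user
    location is the grid centroid weighted by that joint probability."""
    joint = np.prod(np.stack(cone_probs), axis=0)   # (num_points,)
    w = joint / joint.sum()
    return (grid * w[:, None]).sum(axis=0)
```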
  • the location of the user is determined by retracing each of the propagation paths of the signals collected from the user based on the positions and orientations of the reflectors of the room structure as an intersection of the propagation paths from multiple reflectors.
  • the positions and orientations of the reflectors of the room structure are estimated using the multiple signal classification (MUSIC) algorithm.
  • smart speaker 101 interprets a command from user 104 in connection with the identified location of user 104. For example, the user may have instructed smart speaker 101 to turn on the light; however, in a living space 100 containing multiple lights, smart speaker 101 may not know which light to turn on without knowing the user’s location.
  • smart speaker 101 may perform a table look-up in a table containing a listing of commands, objects and locations. Such a table may be stored in a storage device (e.g., memory 304).
  • such a statement may be associated with the command (e.g., “turn on light source”).
  • speech recognition software may be used to recognize and translate spoken words into text. Examples of such speech recognition software include Braina Pro, e-Speaking, IBM® Watson Speech to Text, Amazon® Transcribe, etc.
  • Such text may be listed in a table and associated with a command, such as “turn on light source.”
  • Each command may then be associated with an object, such as a light source, and a location within living area 100 (e.g., room 102A). After identifying the location of user 104, the appropriate object associated with the command may be identified.
  • the command to turn on the light associated with light source 103 in room 102A will be identified.
  • smart speaker 101 will be able to determine which light source 103 to be activated.
  • the location of the user can be determined by the smart device (e.g., smart speaker) localizing the user's voice, even in situations when the user is not within the line of sight of the smart device, which could not be performed in prior user localization systems.
  • the user location gives context information, which can help to better interpret the user's intent.
  • knowing the location also enables location-based services.
  • location information can also help with speech recognition and natural language processing by providing important context information. For example, when a user says "orange" in the kitchen, the system knows that the word refers to a fruit; when the same user says "orange" elsewhere, it may be interpreted as a color.
  • the principles of the present invention may identify the location of the user via the use of a single smart speaker, such as in a building with multiple rooms, without the need for a smart speaker to be located in each room.
  • the principles of the present invention may identify the location of the user via the use of multiple smart speakers, such as in a building.
  • the smart speakers may collectively be utilized to estimate the location of the user.
  • each smart speaker independently records the received sound.
  • the smart speaker that receives the highest volume will run the algorithm of the present invention to localize the sound.
  • each smart speaker uses MUSIC to determine the AoA and beamform to the estimated AoA, and record the sound from each beamforming angle. Then the smart speakers share the recorded sounds and cluster them to determine the number of unique sounds.
  • the present invention identifies the smart speaker that “hears” the loudest sound (or highest peak in the MUSIC profile) and appoints that smart speaker to localize the sound using the approach of the present invention.
  • the approach of the present invention to localize the sound can be applied.
  • the MUSIC algorithm may be implemented to estimate the Angle of Arrival (AoA) as well as to beamform towards each AoA.
  • the AoAs are clustered based on the similarity of the sound, and the AoAs corresponding to the sound in the same cluster are used as the AoA for the sound.
  • the approach of the present invention is then applied to localize sound in each cluster.
  • In one embodiment, a clustering algorithm (e.g., the k-means clustering algorithm) is used to cluster the recorded sounds.
  • Spectral clustering may then be used to automatically determine the number of clusters.
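A hedged sketch of this clustering step: each beamformed recording is embedded by its magnitude spectrum and grouped with k-means (scikit-learn's SpectralClustering could replace KMeans once the number of clusters is chosen, e.g., by an eigengap heuristic, which is not shown); the feature choice and function name are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_aoas(beamformed_sounds, aoas, n_clusters):
    """beamformed_sounds: (num_beams, num_samples); aoas: one AoA per beam.
    AoAs whose sounds fall in the same cluster are treated as one source."""
    feats = np.abs(np.fft.rfft(np.asarray(beamformed_sounds), axis=1))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return {k: [a for a, lbl in zip(aoas, labels) if lbl == k]
            for k in range(n_clusters)}
```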
  • the present invention tracks a sound generated from a moving source, such as a user.
  • the approach of the present invention may be used to localize the sound during each snapshot.
  • accuracy is enhanced by leveraging the temporal relationship in the movement.
  • when cones are used to retrace, a term is added to minimize the change in the positions during two consecutive intervals.
  • the location of the user corresponds to a point in the plurality of cone structures such that a circle centered at the point overlaps with a maximum number of cones and is also close to the previous position.
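Carrying the earlier grid search over to tracking only requires a displacement penalty; lam is a tunable weight and overlaps a passed-in test like the one sketched above, both assumptions rather than values from the specification:

```python
import numpy as np

def localize_tracked(cones, grid, prev_pos, overlaps, lam=1.0):
    """Score each candidate point by cone overlaps minus a penalty on the
    distance from the previous estimate, favoring smooth trajectories."""
    def score(p):
        return (sum(overlaps(p, c) for c in cones)
                - lam * np.linalg.norm(p - prev_pos))
    return max(grid, key=score)
```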
  • the present invention localizes a user in a different room based on the room structure and the AoA for multiple paths that the user’s sound traverses.
  • embodiments of the present invention provide a means for improving the technology or technical field of user localization systems by more accurately estimating the location of the user using standard inexpensive equipment (e.g., smart speaker).
  • the present invention improves the technology or technical field involving user localization systems.
  • user localization systems attempt to estimate the location of the user.
  • such systems may attempt to use audio and vision-based schemes to estimate the location of the user.
  • vision-based schemes may involve the use of cameras.
  • Other such user localization systems utilize device-based tracking, which requires the user, whose location is to be estimated, to carry a device (e.g., smartphone), which may not be convenient for the user at home.
  • other such user localization systems may utilize device-free radio frequency.
  • Embodiments of the present invention improve such technology by having a smart device, such as a smart speaker, collect signals emanating from a user in a living space by a microphone array of the smart speaker. The angle of arrival of the propagation paths of one or more of the collected signals are then estimated. Furthermore, the smart speaker estimates a room structure by emitting chirp pulses in the room structure. The smart speaker then collects the signal reflected by reflectors of the room structure from the emitted chirp pulses. The smart speaker then estimates the angle of arrival of the propagation paths of one or more signals collected from the reflectors of the room structure.
  • the location of the user is then identified by retracing the propagation paths of the one or more signals collected from the user and the propagation paths of the one or more signals collected from the reflectors of the room structure.
  • the location of the user can be determined by the smart device (e.g., smart speaker) localizing the user’s voice, even in situations where the user is not within the line of sight of the smart device, which prior user localization systems could not do.
  • the location of the user can be more accurately identified than in prior user localization systems.
  • the present invention utilizes inexpensive equipment to identify the location of the user, as opposed to the expensive equipment used in prior user localization systems. Consequently, there is an improvement in the technical field involving user localization systems.
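
To make the AoA step concrete, the following is a minimal sketch of narrowband MUSIC over a linear microphone array. It is illustrative only: the array geometry, the single frequency bin, and the names `music_spectrum` and `mic_positions` are assumptions for the example, not the implementation disclosed in the patent.

```python
import numpy as np

def music_spectrum(snapshots, mic_positions, freq, angles, n_sources=1, c=343.0):
    """MUSIC pseudo-spectrum over candidate AoAs for one narrowband frequency bin.

    snapshots: (n_mics, n_snapshots) complex STFT values at `freq`
    mic_positions: (n_mics,) mic coordinates along a linear array, in meters
    """
    # Sample covariance of the array snapshots.
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]
    # Eigendecomposition (ascending); smallest eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(R)
    noise = eigvecs[:, : snapshots.shape[0] - n_sources]
    spectrum = []
    for theta in angles:
        # Steering vector for a plane wave arriving from angle theta.
        delays = mic_positions * np.sin(theta) / c
        a = np.exp(-2j * np.pi * freq * delays)
        # MUSIC peaks where the steering vector is orthogonal to the noise subspace.
        spectrum.append(1.0 / np.abs(a.conj() @ noise @ noise.conj().T @ a))
    return np.array(spectrum)

# Example: 4-mic linear array with 5 cm spacing; the AoA estimate is the
# angle at the highest peak of the profile, and beamforming towards each
# peak isolates the corresponding sound.
# mics = np.arange(4) * 0.05
# grid = np.deg2rad(np.arange(-90, 91))
# profile = music_spectrum(X, mics, freq=1000.0, angles=grid)
```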
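The clustering of AoAs could look like the sketch below, which pairs k-means with an eigengap heuristic (the usual way spectral methods pick a cluster count). The feature construction, the constants, and the helper name `cluster_aoas` are assumptions for illustration; the patent does not prescribe them.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_aoas(aoa_features, max_clusters=4):
    """Group per-snapshot AoAs by similarity of the associated sound.

    aoa_features: (n_aoas, n_features) — e.g., each row an AoA estimate
    concatenated with a spectral fingerprint of the signal beamformed
    towards that AoA (the feature choice is an assumption).
    Requires n_aoas > max_clusters.
    """
    # Gaussian affinity between AoA feature vectors.
    dists = np.linalg.norm(aoa_features[:, None] - aoa_features[None, :], axis=-1)
    affinity = np.exp(-dists**2 / (2 * dists.std() ** 2 + 1e-9))
    # Unnormalized graph Laplacian; the eigengap suggests the cluster count.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    n_clusters = int(np.argmax(np.diff(eigvals[: max_clusters + 1])) + 1)
    # All AoAs carrying the same label are treated as one sound source.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(aoa_features)
    return n_clusters, labels
```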
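The tracking bullets combine cone overlap with a temporal prior. A grid-search sketch under assumed scoring (the penalty weight, circle radius, and name `localize_with_prior` are illustrative, not taken from the patent):

```python
import numpy as np

def localize_with_prior(cones, prev_pos, candidates, radius=0.3, weight=0.5):
    """Pick the user position from re-traced cones, favoring temporal smoothness.

    cones: list of (apex, unit_axis, half_angle) tuples, one per re-traced path.
    candidates: (n, 2) array of candidate positions on a search grid.
    """
    best, best_score = None, -np.inf
    for p in candidates:
        hits = 0
        for apex, axis, half_angle in cones:
            v = p - apex
            d = np.linalg.norm(v)
            if d < 1e-9:
                continue
            ang = np.arccos(np.clip(np.dot(v / d, axis), -1.0, 1.0))
            # A circle of `radius` around p overlaps the cone if its angular
            # distance to the cone is within radius/d (small-angle approximation).
            if ang <= half_angle + radius / d:
                hits += 1
        # Temporal term: penalize candidates far from the previous position,
        # keeping estimates consistent across consecutive intervals.
        score = hits - weight * np.linalg.norm(p - prev_pos)
        if score > best_score:
            best, best_score = p, score
    return best
```

In practice `candidates` could simply be a uniform grid over the room, re-scored each snapshot.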
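For the room-structure step, a chirp pulse can be matched-filtered against the recording to find echo delays. The sketch below assumes a linear chirp and a naive peak-picking rule; a real system would also suppress the direct path and merge adjacent correlation samples belonging to one echo. All names and constants here are assumptions.

```python
import numpy as np

def chirp(fs=48000, f0=1000.0, f1=3000.0, duration=0.01):
    """Linear frequency-modulated (chirp) pulse for probing the room."""
    t = np.arange(int(fs * duration)) / fs
    return np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * duration)))

def reflector_ranges(recorded, probe, fs=48000, c=343.0, n_reflectors=3):
    """Estimate reflector distances by matched-filtering the recording
    against the emitted chirp; correlation peaks mark echo arrivals."""
    corr = np.correlate(recorded, probe, mode="valid")
    # Strongest correlation peaks taken as reflector echoes (a simplification).
    peaks = np.argsort(np.abs(corr))[-n_reflectors:]
    # Round-trip delay -> one-way distance to each reflector.
    return np.sort(peaks) / fs * c / 2
```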
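Finally, re-tracing a reflected path reduces to mirroring an AoA ray off a known wall, which is how a non-line-of-sight user can still be placed. A sketch, assuming unit direction and normal vectors and a hypothetical `retrace_reflection` helper:

```python
import numpy as np

def retrace_reflection(mic, aoa_dir, wall_point, wall_normal):
    """Re-trace one reflected propagation path: intersect the AoA ray from
    the microphone array with a wall plane, then mirror the direction about
    the wall normal; the (possibly out-of-sight) user lies along that ray.

    aoa_dir and wall_normal are assumed to be unit vectors.
    """
    denom = np.dot(aoa_dir, wall_normal)
    if abs(denom) < 1e-9:
        return None  # ray parallel to the wall: no reflection point
    t = np.dot(wall_point - mic, wall_normal) / denom
    hit = mic + t * aoa_dir                   # where the path bounced off the wall
    out = aoa_dir - 2 * denom * wall_normal   # specular reflection of the ray
    return hit, out
```

Intersecting the mirrored rays from several walls (or several AoAs) yields the user's position estimate.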

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Otolaryngology (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

Method, smart device, and computer program product for identifying the location of a user. A smart device, such as a smart speaker, collects signals emanating from a user in a living space via a microphone array of the smart speaker. The angle of arrival of the propagation paths of one or more of the collected signals is then estimated. Furthermore, the smart speaker estimates a room structure by emitting frequency-modulated (chirp) pulses into the room structure. The smart speaker then collects the signals reflected by reflectors of the room structure from the emitted chirp pulses. The smart speaker then estimates the positions and orientations of the reflectors in the room structure. The location of the user is then identified by retracing the propagation paths of the one or more signals received from the user based on the positions and orientations of the reflectors.
PCT/US2022/011692 2021-01-08 2022-01-07 Use of a smart speaker to estimate the location of a user WO2022150640A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163135231P 2021-01-08 2021-01-08
US63/135,231 2021-01-08

Publications (1)

Publication Number Publication Date
WO2022150640A1 (fr) 2022-07-14

Family

ID=82357460

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/011692 WO2022150640A1 (fr) 2022-01-07 Use of a smart speaker to estimate the location of a user

Country Status (1)

Country Link
WO (1) WO2022150640A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120044786A1 (en) * 2009-01-20 2012-02-23 Sonitor Technologies As Acoustic position-determination system
US8174931B2 (en) * 2010-10-08 2012-05-08 HJ Laboratories, LLC Apparatus and method for providing indoor location, position, or tracking of a mobile computer using building information
US20130002488A1 (en) * 2011-06-30 2013-01-03 Sony Corporation Wideband beam forming device; wideband beam steering device and corresponding methods
US20180299527A1 (en) * 2015-12-22 2018-10-18 Huawei Technologies Duesseldorf Gmbh Localization algorithm for sound sources with known statistics
US20200309930A1 (en) * 2017-10-30 2020-10-01 The Research Foundation For The State University Of New York System and Method Associated with User Authentication Based on an Acoustic-Based Echo-Signature

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shen, Sheng; Chen, Daguan; Wei, Yu-Lin; Yang, Zhijian; Choudhury, Romit Roy: "Voice Localization Using Nearby Wall Reflections", Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, 21 September 2020 (2020-09-21), pages 1-14, XP058463910, DOI: 10.1145/3372224.3380884 *

Similar Documents

Publication Publication Date Title
Shen et al. Voice localization using nearby wall reflections
US11064294B1 (en) Multiple-source tracking and voice activity detections for planar microphone arrays
JP2020128987A (ja) Wireless positioning system
CN103229071B (zh) System and method for object position estimation based on ultrasonic reflection signals
Wang et al. Symphony: localizing multiple acoustic sources with a single microphone array
Moutinho et al. Indoor localization with audible sound—Towards practical implementation
US20150287422A1 (en) Methods and systems for improved measurement, entity and parameter estimation, and path propagation effect measurement and mitigation in source signal separation
Wang et al. MAVL: Multiresolution analysis of voice localization
US20130096922A1 (en) Method, apparatus and computer program product for determining the location of a plurality of speech sources
EP3227720A1 (fr) Method and apparatus for performing ultrasound presence detection
JP2017516131A (ja) Method and system for improved measurement, entity and parameter estimation, and path propagation effect measurement and mitigation in source signal separation
CN105277921B (zh) Smartphone-based passive sound source localization method
JP2017516131A5 (fr)
Tervo et al. Acoustic reflection localization from room impulse responses
CN112098942B (zh) Positioning method for a smart device, and smart device
CN104041075A (zh) Audio source position estimation
Ayllón et al. Indoor blind localization of smartphones by means of sensor data fusion
Di Carlo et al. Mirage: 2d source localization using microphone pair augmentation with echoes
US11474194B2 (en) Controlling a device by tracking movement of hand using acoustic signals
Daniel et al. Echo-enabled direction-of-arrival and range estimation of a mobile source in ambisonic domain
Jensen et al. An EM method for multichannel TOA and DOA estimation of acoustic echoes
Wang et al. Localizing multiple acoustic sources with a single microphone array
WO2022150640A1 (fr) Use of a smart speaker to estimate the location of a user
Di Carlo et al. dEchorate: a calibrated room impulse response database for echo-aware signal processing
Chen et al. Voicemap: Autonomous mapping of microphone array for voice localization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22737208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22737208

Country of ref document: EP

Kind code of ref document: A1