WO2006070044A1 - A method and a device for localizing a sound source and performing a related action - Google Patents

A method and a device for localizing a sound source and performing a related action

Info

Publication number
WO2006070044A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
sound
devices
location
sound source
Prior art date
Application number
PCT/FI2004/000805
Other languages
French (fr)
Inventor
Jussi Virolainen
Pauli Laine
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/FI2004/000805 priority Critical patent/WO2006070044A1/en
Publication of WO2006070044A1 publication Critical patent/WO2006070044A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/043Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means using propagating acoustic waves
    • G06F3/0433Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means using propagating acoustic waves in which the acoustic waves are either generated by a movable member and propagated within a surface layer or propagated within a surface layer and captured by a movable member

Definitions

  • the present invention relates to user interfaces and input devices such as game controllers. Especially the invention pertains to sound source localization based control of mobile devices and software thereof.
  • Modern mobile devices such as PDAs (Personal Digital Assistants) and mobile terminals, or generally portable or hand-held devices like game controllers, have evolved tremendously during the last 10 years or so.
  • Likewise, means for providing such devices with necessary input information have improved as to comfort, learning curve, reliability, size, etc.
  • Classic input means, the word "classic" referring herein, in the sense of the electronics industry, to only the few years or few tens of years preceding the present, such as a keyboard/keypad, a mouse, and various styles of joystick, have in certain areas retained their strong position, whereas in some others newer means have also gained popularity.
  • the total number of buttons or other pressure-sensitive areas is still somewhat limited due to the size of the surface area on hand for the purpose and to the increasing cost/technical difficulties in producing miniature scale but still user accessible button placements.
  • common "button-tapping" based physical interaction used for inputting user data or control information is limited to certain predetermined areas of the device, such areas being dedicated to that very specific purpose already during the product development and manufacturing stage.
  • a PDA may incorporate a virtual keyboard feature or an external add-on by which a common keyboard layout is optically superimposed on a (preferably solid and smooth) surface. Then the device user's finger position(s) on these virtual keys is determined by optical/visual pattern detection techniques.
  • Audio, particularly speech, based user interfaces quite often support a certain vocabulary, either a predefined or user-adjustable one, including a number of elements, e.g. words or syllables, that alone or as a certain group form commands linked to a certain function/action to be executed by the device.
  • The reliability of such recognition depends on e.g. the used vocabulary size, the computational (case-dependent) distance between elements, etc.
  • the object of the present invention is to circumvent or at least alleviate the aforementioned imperfections in remote device access.
  • the object is met by introducing a method and a related device for audio localization based execution of actions such as various device control functions and/or data input.
  • When the user e.g. taps the device or a nearby surface, a sound is generated.
  • The resulting sound that propagates through the terminal body or the air is detected by transducers, e.g. a microphone array consisting of 2 to N microphones.
  • These audio signals are analysed by signal processing techniques in electric form, and as a result, tapping location relative to the microphone array/target device is determined.
  • By location, both the distance and the angle of bearing, or relative/absolute coordinates from the target device's viewpoint, can be referred to.
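As a rough illustration of the localization analysis described above, the inter-microphone arrival-time difference of a tap can be estimated by cross-correlating the signals of two microphones. The function below is a self-contained sketch; its names and signal layout are illustrative and not taken from the patent.

```python
def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`,
    searched over [-max_lag, max_lag] by brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += x * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With the delay in hand, the analysis can map it to a distance difference or bearing as discussed in the surrounding text.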
  • The location estimate can be relative, in which case e.g. the target device is itself in the origin of the local coordinate system, as the device does not contain additional data to be able to localize itself.
  • the target device may even determine the absolute position of the sound sources if it is aware of its position due to e.g. GPS (Global Positioning System) capability or some localization service provided by the mobile network.
  • the target device may optionally localize itself by receiving predetermined audio signals from sources with known positions and by analysing them.
  • the location estimate of the sound source is finally used as control information for performing an action.
  • orientation detection of the terminal might be desirable.
  • a magnetometer can be used as a sort of compass.
  • Also gyros (gyroscopes) can be utilized for orientation detection.
  • The microphone array can be mounted in the terminal itself; a simple example is a ready-made stereo microphone installed therein. However, the microphone array can also be a separate device that establishes a data transfer connection to the terminal, e.g. over a wired or wireless link such as Bluetooth.
  • The array may even be constituted from a group of terminals or multipart devices containing the transducers. Terminals having only a single microphone can form an ad hoc array. In this case, proximity detection and time delay estimation or clock synchronization between the terminals is preferred.
  • the above basic solution may be cultivated based on each particular application and more components can be flexibly added thereto, or respectively, some components may be taken away or replaced by other alternatives as explained in the detailed description hereinafter.
  • the utility of the invention arises from a plurality of issues.
  • the invention can be implemented in many target devices substantially without installing new hardware or accessory devices.
  • The suggested solution is technically feasible also in a global sense, as the necessary hardware for realizing it (processors, memory chips, and audio transducers such as microphones, together with A/D converters) has already been adopted worldwide.
  • The invention core resides in the program logic, be it software, tailored hardware, or both, and especially in how the available hardware resources are arranged to function together.
  • The adoption curve for utilizing the input method proposed herein is short, as the user is not expected to learn the mechanics of any new device as such; it is basically enough to get familiar with one's own physique and, maybe on a general level, with the basic acoustic properties of the surrounding materials to be used as a part of the sound generation process.
  • Tapping foamed plastic, for example, does not typically make a sound loud enough, i.e. a sound that is not suppressed too much by background noise, to enable reliable localization.
  • synthetic or non-human (initiated) sounds may be applied as input.
  • a method for performing an action in an electronic device based on sound source localization having the steps of
  • an electronic device comprising data input means for receiving sound or related structural vibration data, further comprising processing means and memory means for processing and storing instructions and data, and in particular, for determining on the basis of the received data one or more location estimates of one or more sound sources, is characterized in that it is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
  • a mobile terminal comprises a number of microphones and necessary software to detect incoming sounds and localize the sound source(s). The estimated position of the sound source(s) is then used to control a parameter value of the terminal, to execute certain action, etc.
  • Such solution can also be exploited in more direct control of the device by implementing a "virtual" or “audio” keyboard feature, in particular the response analysis part thereof, by localizing tapping events on a "keyboard” that may actually be e.g. a natural size paper copy of a true keyboard layout placed on a table or the aforementioned optically projected one.
  • One additional use case is a virtual drum set application in which a number of tap locations are mapped to different drums such as a bass drum, snare, hi-hat etc. Tapping a location with active mapping triggers playing the associated drum sound. The received tap intensity may be used to adjust the playback dynamics.
  • This approach can be also extended towards other audio applications, e.g. a piano/synthesizer simulator, where different piano/synth sounds are played based on sound source location estimates.
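The drum-mapping idea above can be sketched as a simple lookup from an estimated tap location to a drum sample, with the tap intensity clamped into a playback-velocity range. The zone coordinates, radius, and sample names below are invented for illustration.

```python
import math

DRUM_ZONES = {            # (x, y) table coordinates -> drum sample name
    (0.0, 0.0): "bass",
    (0.3, 0.0): "snare",
    (0.3, 0.2): "hihat",
}

def trigger_drum(x, y, intensity, radius=0.1):
    """Return (sample, velocity) for the zone closest to the tap,
    or None if no zone lies within `radius` of the tap location."""
    pos, sample = min(DRUM_ZONES.items(),
                      key=lambda kv: math.dist(kv[0], (x, y)))
    if math.dist(pos, (x, y)) > radius:
        return None
    return sample, max(0.0, min(1.0, intensity))  # clamp velocity to [0, 1]
```

The same mapping generalizes directly to the piano/synthesizer variant mentioned in the text by changing the zone-to-sample table.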
  • A mobile terminal, a PDA, or a mobile gaming console is further adapted to detect vibration advancing from the sound source through the device structure, e.g. a cover.
  • an accelerometer is additionally installed in the device. Initial sound source location estimate acquired by utilizing the microphones can be adjusted by information provided by the accelerometer to make the detection more robust in a noisy environment.
  • Audio keyboard applications and alike can benefit from the installation of the accelerometer; the target device may be situated on the very surface to be tapped with fingers, which boosts the accelerometer readings and thus indirectly also improves the overall localization performance.
  • the device cover itself is used as an audio source together with another object like a pen, a stylus, or a fingertip; this type of solution, although being feasible even without the accelerometer in accordance with the first embodiment, could be used to implement audio type keypad in a mobile device, for example, to replace all or some of the original keys or to work along with those.
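The accelerometer-assisted refinement mentioned in these embodiments could, under simple assumptions, amount to a confidence-weighted blend of a microphone-based and an accelerometer-based location estimate. This is only one possible fusion rule, not the patent's prescribed method.

```python
def fuse_estimates(mic_xy, mic_conf, acc_xy, acc_conf):
    """Confidence-weighted average of two 2-D location estimates."""
    total = mic_conf + acc_conf
    if total == 0:
        raise ValueError("at least one estimate must carry confidence")
    return tuple((m * mic_conf + a * acc_conf) / total
                 for m, a in zip(mic_xy, acc_xy))
```

In a noisy environment the microphone confidence would drop, pulling the fused estimate toward the structure-borne accelerometer reading.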
  • a group of mobile terminals comprising microphones is connected together to form a microphone array, or in more general terms, a localization system consisting of two or more separable devices.
  • the terminals shall be aware of the distance/time delay or clock sync between them to be able to properly analyse the received sounds as an aggregate.
  • The user teaches the target device to detect taps at certain locations and to associate controls with them. The user wants to define three arbitrary locations on a table, and when each of them is tapped, a different control action will be performed.
  • The preferred methodology of teaching could comprise first setting the device in a learning/teaching mode and tapping the first tap area several times.
  • the device runs, for example, neural network or hidden Markov model based methods for first learning the locations of new tap areas in the teaching mode and subsequently for classifying the tap positions in the use mode. This two-step technique can be seen analogous to teaching a machine to automatically recognise user's handwriting.
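The patent names neural networks and hidden Markov models for the teach/classify steps; as a stand-in, the same two-step flow can be shown with a much simpler nearest-centroid classifier, an assumption made purely to keep the sketch short.

```python
import math

class TapAreaClassifier:
    """Teach-then-classify sketch standing in for the neural-network or
    HMM methods mentioned in the text: each tap area is learned as the
    centroid of its training taps, and later taps snap to the nearest."""

    def __init__(self):
        self.centroids = {}          # area name -> (x, y) centroid

    def teach(self, name, taps):
        xs, ys = zip(*taps)
        self.centroids[name] = (sum(xs) / len(xs), sum(ys) / len(ys))

    def classify(self, x, y):
        return min(self.centroids,
                   key=lambda n: math.dist(self.centroids[n], (x, y)))
```

Teaching mode calls `teach` repeatedly; use mode calls `classify` on each new location estimate.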
  • Fig. 1A discloses a typical prior art computer control arrangement with a keyboard and a mouse.
  • Fig. 1B discloses a classic game controller with a control stick and a few buttons.
  • Fig. 2 illustrates the first embodiment of the invention wherein a mobile terminal has been equipped with necessary hardware such as a number of microphones and software for implementing the inventive concept.
  • Fig. 3 illustrates the second embodiment of the invention wherein a mobile terminal has been additionally provided with an accelerometer to further enhance the localization performance.
  • Fig. 4 illustrates the third embodiment of the invention in which three terminals constitute a system for audio localization and action execution.
  • Fig. 5 discloses a flow diagram of the method of the invention for localizing sound signals and performing actions.
  • Fig. 6 discloses a block diagram of an electronic device in accordance with the invention.
  • FIG. 2 illustrates, by way of example only, mobile terminal 202 that contains at least two microphones 208 or one stereo microphone.
  • Using only one (mono) microphone instead is possible for distance detection but more accurate localization is then quite challenging.
  • Either the microphones 208 have been fixedly installed in the terminal during manufacturing, or at least one of them is a so-called accessory/add-on device attached to the terminal by the user. Such attachment may be wired or wireless, e.g. Bluetooth or IR connection.
  • The originating sounds 204, 206 will eventually reach the microphones, which convert the oscillation of the medium (air) into electric signals.
  • The electric signals are funnelled to processing means, e.g. a microprocessor or a DSP, that use the microphone data for estimating the location of the sound sources and further for controlling a number of the terminal's functions or parameters thereby.
  • determination of the location estimates and further control of the functions can be divided to be performed by e.g. several dedicated chips functionally connected with each other.
  • Tap detection could be improved by using a priori knowledge of the "excitation" signal: if the device knew the tap sound characteristics beforehand, performance in noisy conditions would improve. Further, this approach could be combined with the location-learning embodiment. Several tap sounds are recorded and an averaged estimate of the tap sound characteristics is stored. When the tap detection process is running, the detector may compare the expected tap sound to the detected version and decide whether the former was present in the microphone input signal. This method could enhance the sound detection/localization results if the surface tapped on was of different material, made a different sound, etc. Microphone placement is substantially fictitious in the figure, and in reality a plurality of variables has to be evaluated for proper device design. Some microphone array architectures that have been found suitable for localization are listed below:
  • Binaural localization: a binaural headset in which the microphones are in the headphones that the user wears, or alternatively, a stereo microphone in the terminal.
  • time reverse-type methods may be applied in location detection: Fink, M., “Time reversed acoustics”, Scientific American, pp.67-73, Nov. 1999.
  • Nearfield refers to a situation where the distance of the audio event is comparable to the distance between separate microphones, and farfield to a situation where the distance of the audio event is several times greater than the distance between separate microphones.
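In the farfield case, the bearing of a sound source follows from the inter-microphone delay via theta = arcsin(c * tau / d), where c is the speed of sound, tau the delay, and d the microphone spacing. A hedged sketch of that conversion:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def bearing_from_delay(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field bearing (radians, 0 = broadside) from the inter-microphone
    time delay: theta = arcsin(c * tau / d). Raises ValueError when the
    delay implies a path difference longer than the microphone spacing."""
    ratio = c * delay_s / mic_spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("delay inconsistent with microphone spacing")
    return math.asin(ratio)
```

In the nearfield case this plane-wave assumption breaks down and a range-dependent model would be needed instead.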
  • Compact game console 302, or a corresponding electronic apparatus of a similar hand-held nature, further includes at least one accelerometer 308 in addition to a number of microphones 310.
  • accelerometer 308 can be used for detecting vibrations not directly introduced to console 302 itself.
  • Accelerometers 308 are basically transducers that measure changes in the velocity of related objects for determining acceleration or vibration thereof. Typically the measurements are performed by a somewhat basic arrangement including a mass (m) spring (k) system, a so-called pendulous accelerometer, wherein the displacement of the mass attached to the spring is proportional to and caused by the acceleration (~inertia) and is, at the same time, measurable by utilizing e.g. electrical means, i.e. the displacement causes a change in capacitance between two measurement points.
  • Often an accelerometer is made sensitive to a certain axis only, and therefore a plurality of them is needed to cover multiple axes.
  • Also other accelerometer types like piezoelectric and electromagnetic ones exist, and a person skilled in the art may test and compare different available solutions in connection with the implementation of the invention to find the best-tailored solution for each use case.
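For the pendulous mass-spring arrangement described above, equilibrium of the spring force and the inertial force gives a = k * x / m, so the measured proof-mass displacement maps linearly to acceleration. The spring constant and mass below are purely illustrative values.

```python
def acceleration_from_displacement(x_m, spring_k=50.0, mass_kg=0.001):
    """a = k * x / m: proof-mass displacement x (metres) to acceleration
    (m/s^2), assuming an ideal linear spring at equilibrium."""
    return spring_k * x_m / mass_kg
```

In a capacitive design, x itself would first be derived from the measured change in capacitance between the two measurement points.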
  • microphones and accelerometers are not the only transducer types deemed applicable in providing console 302 or some other target device with fully useful input information.
  • Other transducers, even ones with a completely different technical approach, are likewise exploitable for further analysis and sound source localization as, at least, supplementary means.
  • a detectable audio event such as tapping may be realized by hitting fingers against a surface.
  • a stylus or some other pointing device possibly producing more localizable sounds than a finger due to a smaller cross-section and therefore reduced contacting surface thereof, could be used to improve the detectability and localizability of tapping. Understandably, harder material of the stylus in comparison with relatively soft human skin also introduces less elasticity and therefore more power to be emitted in the impact between the stylus and the target surface.
  • the third embodiment of the invention is depicted in figure 4.
  • Three devices, namely game console 402, mobile terminal 404, and PDA 406, have been functionally connected together: console 402 and terminal 404 via a wireless, e.g. Bluetooth, connection, and terminal 404 and PDA 406 through a traditional wired connection, e.g. Firewire, to establish an aggregate entity.
  • The devices can be functionally connected all together to form e.g. a network in which each device communicates directly with the others.
  • Alternatively, the devices may communicate serially with one device 404 remaining in the centre of the transmission chain as a data forwarding link. Irrespective of the used communication mode, the devices shall be kept aware of the distances/time delays between them. In case there is only one master device substantially taking care of the localization algorithms etc., the other devices do not have to be aware of the locations of the remaining ones.
  • Devices belonging to the aggregate entity may mutually localize themselves by sending e.g. predetermined test (audio) signals to each other. Based on the received signal spectrum/power data and/or possible other parameters determinable from the received test signals the devices can first estimate their internal placement before starting to localize external sources 408.
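The mutual-placement step could, assuming already synchronised clocks as the text requires, reduce each device pair to a time-of-flight distance estimate. A minimal sketch with invented names:

```python
SPEED_OF_SOUND = 343.0  # m/s

def pairwise_distance(emit_time_s, receive_time_s, c=SPEED_OF_SOUND):
    """Distance between emitter and receiver from the time of flight of a
    predetermined test tone (assumes synchronised device clocks)."""
    tof = receive_time_s - emit_time_s
    if tof < 0:
        raise ValueError("receive time precedes emit time")
    return c * tof
```

Collecting such pairwise distances for all device pairs lets the aggregate estimate its internal placement before localizing external sources 408.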
  • determination of internal/external locations of sound sources may be handled by the aggregate entity either
  • a) distributedly, where each aggregate device is adapted to calculate its own or external sound sources' position based on internal or at least aggregate-internal data,
  • b) concentratedly, where basically one or more devices perform all the location analysis on behalf of the other devices belonging to the aggregate, although the latter may provide the former with some (microphone/accelerometer) measurement data, or c) jointly, where a certain type of information is calculated in one or more devices and delivered to a number of devices of the aggregate.
  • Time codes or other timing data may be included in the exchanged data packets (measurement or analysis data) to enable the other device(s) to properly map such data onto the time axis for further analysis/exploitation.
  • The third embodiment may be applicable especially in situations wherein devices 402, 404, 406 forming the aggregate entity do not otherwise bear the necessary capabilities to implement the localization applications described hereinbefore. Deficiencies may reside in hardware features, software features, or both. Secondly, whenever a plurality of users join together to exploit a localization-controlled application and have their localization-enabled devices with them, the otherwise available redundancy can be cleverly put into use by utilizing many devices instead of merely one for better localization performance, for example. Such an application could pertain to various games, plays, party events, etc.
  • a further embodiment of the invention follows the general pattern of the previous embodiments with a target device adapted to detect taps at certain locations and to associate controls with them.
  • the user determines three (or some other preferred number) arbitrary locations on a table, e.g. locations corresponding to taps 204, 206 and an additional third one not shown. When each of the locations is tapped a different control action shall be performed.
  • The device could be adapted to the prevailing use case by the user by first setting the device in a learning mode, e.g. through some UI function, and then by tapping the first tap area several times. The device runs a classifier such as a neural network or hidden Markov model to learn the position of the area. The procedure is repeated for each location.
  • The user may manually define the number of recognizable locations and other related parameters, like sound characteristics, through the UI and, for example, store them as default values for future use; alternatively, at least some of the parameters may be automatically detected by the device, e.g. if, despite the possible variance in taps related to the same location, the overall separability between the different locations is good enough according to some predetermined classification criteria.
  • a potential musical (including rhythmical) application of this arrangement is e.g. a virtual drum simulator.
  • Each drum is mapped to a certain location on the table. Tapping of such locations then triggers the associated drum samples. Tap amplitude can be used to control the dynamics of the sample playback.
  • Exemplary embodiments set forth above are not contradictory and therefore their features may be creatively combined and cultivated for each emerging scenario without significant problems by a person skilled in the art.
  • Keypad symbols could be projected onto a table surface by optical projection techniques. Then, the sound generated by tapping at a certain location representing a key would be detected by the microphone array, and the detected location would be mapped to the corresponding symbol presented in the projected keypad.
  • keypad of 4 x 3 buttons could be implemented this way.
  • Time constants shall be selected to carry out the functionality of a multipurpose keypad and to detect whether the user had tapped the same key location once, twice, or three times to produce e.g. the letter j, k, or l as on a conventional keypad of a contemporary mobile phone.
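The projected 4 x 3 keypad and the multi-tap time constants described above can be sketched together: a tap location is first snapped to a key cell, and repeated taps on the same key within a time window cycle through its letters. The cell size, window length, and layout below are assumptions, not values from the patent.

```python
MULTITAP = {"5": "jkl"}          # classic phone mapping for the '5' key
TAP_WINDOW_S = 1.0               # assumed max gap between taps of one letter

def key_at(x, y, cell=0.03):
    """Map a tap position to a key of a 4-row x 3-column grid whose
    origin is the top-left corner of the projected keypad."""
    col, row = int(x // cell), int(y // cell)
    if not (0 <= col < 3 and 0 <= row < 4):
        return None
    return "123456789*0#"[row * 3 + col]

def multitap_letter(key, tap_times):
    """Resolve consecutive taps on `key` into a letter: taps closer than
    TAP_WINDOW_S count toward the same letter (j -> k -> l for '5')."""
    count = 1
    for prev, cur in zip(tap_times, tap_times[1:]):
        count = count + 1 if cur - prev <= TAP_WINDOW_S else 1
    letters = MULTITAP.get(key, "")
    return letters[(count - 1) % len(letters)] if letters else None
```

A tap outside the grid is simply ignored, matching the idea that only the projected key areas carry active mappings.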
  • Also the location of the microphone array with respect to the sound event can be estimated.
  • The location of the device, either relative or absolute depending on the known parameters, could be detected versus a reference tone (possibly ultrasonic to avoid disturbing the users) or some other reference audio event as explained hereinbefore.
  • Multiple sound events originating from different (known) locations can be used to improve the localization.
  • microphone array's orientation and possibly relative or even absolute location may be locally determinable from the received sound(s).
  • A basic principle applies: the more information available, the more accurate localization results can be expected.
  • Acoustics based localization associated with larger distances may be used in game applications to measure the position of players or some other factor.
  • Players themselves can also utilize additional communication-enabled gear such as accelerometers for controlling purposes. This could make possible some new game concepts in which a player receives instructions (by audio or visual representation) to move into some physical location in a space. After a certain duration of time an audio event is played from a loudspeaker that everybody can hear, and each device detects its own location versus this reference tone. If the player is in the right/wrong place, appropriate feedback is provided.
  • Such "audio buttons" can replace some buttons of a mobile terminal or another preferred target device.
  • The main purpose of the "buttons" is to make a sound that can be detected by the microphone array mounted in the terminal.
  • Conventional button mechanics could be set aside.
  • A sensitive display (~touch screen) could correspondingly be complemented or replaced.
  • In step 502 the method is initialised, which may refer to, for example, launching the necessary software, initialising the related variables, and setting up connections to other devices in case of multiple device interconnection.
  • In step 504 the device(s) are adapted to receive and process sound signals, by this referring to producing a number of audio samples with an A/D converter from microphone output data, for example, via the available microphone arrays and additional transducers such as accelerometers. Multiple devices may also exchange input data between them if seen useful. Transducers converting the input signals into electrical counterparts may also encompass more sophisticated calculation/analysis means (both hw and sw), e.g. frequency-axis analysis/sound parameterisation tools, to refine the raw measurement data directly into a more usable form.
  • Step 504 may further incorporate an optional learning/teaching stage, i.e. receipt of test signals for estimating the predetermined (~allowed) sound source locations and/or sound characteristics.
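The "sound parameterisation tools" mentioned for step 504 could be as simple as reducing each frame of A/D samples to a couple of coarse descriptors. The two features below (frame energy and zero-crossing rate) are illustrative choices, not the patent's.

```python
def parameterise_frame(samples):
    """Return (energy, zero_crossing_rate) for one frame of samples:
    mean squared amplitude, and the fraction of adjacent sample pairs
    whose signs differ."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0.0) != (b < 0.0))
    return energy, crossings / (len(samples) - 1)
```

A tap detector could then compare such features against the averaged tap-sound characteristics stored during the teaching stage.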
  • In step 506 it is checked, by comparing a regularly updated time counter (~timer) or a mere sample counter value with a predetermined threshold, or by some other means, whether a data acquisition period has ended. If that is the case, method execution shall be continued from step 508; otherwise step 504 is repeated.
  • data acquisition process according to step 504 may be continuous and happen in the background whereas the rest of the procedure steps still take place only occasionally, for example when there is enough new input data for further processing/analysis, or continuously, i.e. sliding window type or some other continuous estimation process is applied to the substantially continuously produced input data.
  • The word "continuous" can, however, refer to a discrete time resolution of one sample, for example.
  • In step 508, location estimate(s) are determined for the sound source(s).
  • one or more sources may be determinable from the same source data.
  • the actual methodology to perform the localization step does not belong to the scope of the invention, and as briefly introduced earlier, the person skilled in the art may utilize the techniques that seem to fit the prevailing scenario best.
  • Step 510 indicates the execution of actions associated with the localization results.
  • The associations may be application-specific and at least partly user-definable on a case-by-case basis. For example, a certain application-specific parameter value (e.g. playback volume) can be dependent on the estimated location of a certain sound source; some localization result may trigger launching/closing a program; some other localization result could indicate typing a certain character on the display (and optionally inserting it in the document displayed) as with the virtual/audio keyboard application, etc. Naturally a localization result of a sound event can also result in giving a certain response to the user of the device through visual, tactile or acoustic means.
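The location-to-action associations of step 510 can be pictured as a dispatch table from a recognised location label to a callback; all names here are invented examples.

```python
def make_dispatcher(bindings):
    """bindings: dict mapping a recognised location label to a
    zero-argument callable; unknown labels are silently ignored."""
    def dispatch(label):
        action = bindings.get(label)
        return action() if action else None
    return dispatch
```

An application would fill the table during configuration or the teaching stage, then feed each classified tap location into the dispatcher.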
  • the localization arrangement may be used to control different functionalities (e.g.
  • Step 512 implies the conditional termination of the method as a consequence of finishing the execution of the associated actions of step 510. However, if the method execution shall be continued, execution thereof is reverted back to step 504; if not, the method is ended in step 514.
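The overall flow of steps 502-514 can be condensed into a plain loop; the acquisition, localization, and action routines are passed in as callables so the skeleton stays application-neutral. This is a reading of Fig. 5, not code from the patent.

```python
def run(acquire, period_full, localize, act, keep_running):
    buffer = []                        # step 502: initialisation
    while True:
        buffer.append(acquire())       # step 504: receive/process sound
        if not period_full(buffer):    # step 506: acquisition period over?
            continue
        estimates = localize(buffer)   # step 508: determine location estimate(s)
        act(estimates)                 # step 510: execute associated actions
        if not keep_running():         # step 512: continue, or end (step 514)
            return
        buffer.clear()
```

As the text notes, a real implementation could instead run acquisition continuously in the background and apply a sliding-window estimator.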
  • FIG. 6, which shall be taken only as an example, is a generic block diagram of an electronic device capable of executing the suggested method.
  • the device can be a mobile terminal, a PDA, or a hand-held game console/game controller (preferably a wireless one), for example.
  • It comprises memory 604, divided between one or more physical memory chips, including necessary code, e.g. in a form of a computer program/application, and other data, e.g. current configuration and input sound data provided by various transducers.
  • Processing unit 602, e.g. a microprocessor, a DSP, a microcontroller, or a programmable logic chip, is required for the actual execution of the method in accordance with instructions stored in memory 604.
  • Display 606 and keypad 608, or other applicable user input means, provide the user with optional device control and data visualization means (~user interface). Respectively, various other output means, such as a number of loudspeakers, are optionally included to provide the users with the preferred response (not shown).
  • Data input means 610, either fixedly or detachably mounted, include microphone arrays and additional transducers such as accelerometers for inputting the necessary data to the device. Also wire-based or wireless transceivers (for transmission and/or reception) may be included for communication with other devices or accessories like external loudspeakers.
  • Optional (light-)projecting means 612 superimpose e.g. a virtual keypad/keyboard on a nearby surface.
  • The invention may be implemented as a combination of tailored software and more generic hardware, or exclusively through specialized hardware such as ASICs (Application Specific Integrated Circuits).
  • Software for carrying out the method of the invention can be delivered on a carrier medium such as a floppy disk, a CD-ROM, a memory card, or a hard disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and a device for performing actions (510) based on localization (508) of sound source(s). The estimated location of a sound source specifies launching a certain program or performing some other predetermined action. In addition to a microphone, an accelerometer can be used in the device for enhancing the localization results. The device may be a mobile terminal or a PDA (Personal Digital Assistant) with invention dependent additional software and/or hardware, for example.

Description

A method and a device for localizing a sound source and performing a related action
FIELD OF THE INVENTION
The present invention relates to user interfaces and input devices such as game controllers. Especially the invention pertains to sound source localization based control of mobile devices and software thereof.
BACKGROUND OF THE INVENTION
Modern mobile devices such as PDAs (Personal Digital Assistant) and mobile terminals, or generally portable or hand-held devices like game controllers, have evolved tremendously during the last 10 years or so. Likewise, means for providing such devices with the necessary input information have improved as to comfort, learning curve, reliability, size, etc. Classic input means (by the word "classic", in the sense of the electronics industry, referring herein to a period of only a few or a few tens of years before the present), such as a keyboard/keypad, a mouse, and various-style joysticks, have in certain areas retained their strong position, whereas in others newer means have also gained popularity.
Next, we shall consider a typical button arrangement (~keyboard) and a joystick as input means in more detail. See figure 1a for a standard keyboard 102 and mouse 104 set-up for controlling a computer, and figure 1b for an illustrative example of a basic-shape joystick 106, i.e. the most common game controller. Certainly joysticks with one multi-axis adjustable controller stick 108 or cross controller (D-pad) and a (minor) number of buttons 110, or button arrangements 102 like the de facto standard QWERTY keyboard, possibly accompanied by mouse 104, are useful in many traditional cases wherein neither the delivery speed of user input nor flexible replacement of different buttons, i.e. separate input locations, is that essential a factor.
Quite often the total number of buttons or other pressure-sensitive areas (e.g. a touch pad) is still somewhat limited due to the size of the surface area at hand for the purpose and to the increasing cost/technical difficulties of producing miniature-scale but still user-accessible button placements. Notwithstanding the problems possibly emerging from positioning the buttons over a device's surface area, another inconvenient issue remains; common "button-tapping" based physical interaction used for inputting user data or control information is limited to certain predetermined areas of the device, such areas being dedicated to that very specific purpose already during the product development and manufacturing stage.
Some modern applications also utilize direct contact-free user input means for providing the target devices with necessary information. Such means may exploit optical arrangements, for example. A PDA may incorporate a virtual keyboard feature or an external add-on by which a common keyboard layout is optically superimposed on a (preferably solid and smooth) surface. Then the device user's finger position(s) on these virtual keys is determined by optical/visual pattern detection techniques.
Even sound may be used to provide input to a device equipped with the necessary reception means such as a microphone. Audio-based, particularly speech-based, user interfaces quite often support a certain vocabulary, either a predefined or user-adjustable one, including a number of elements, e.g. words or syllables, that alone or as a certain group form commands linked to a certain function/action to be executed by the device. In practice it is a question of the audio/speech recognition ability (the algorithm itself, performance in noisy conditions, etc) of the device and the used vocabulary (size, computational (case-dependent) distance between elements, etc) how problem-free and useful such an approach finally is in different use scenarios.
Nevertheless, the numerous prior art techniques and technologies described above do not address the problem of remotely and flexibly inputting control or other information to a target device. Typical wired, infrared, or radio frequency remote controllers can be used for the purpose to a limited extent, but even they do not cope with all the requirements set by various applications. First, a separate remote controller unit may quite easily be lost or forgotten somewhere. Secondly, it nevertheless takes some additional space, which is not preferable e.g. in the case of otherwise maximally small-sized mobile devices that should be kept as portable as possible. As a third note, (real-time) multi-user applications like games would, with a high probability, require using multiple controllers, one per user. That fact, besides putting additional strain on the users in the form of acquiring controllers and particularly getting familiar with their different functionalities, also contravenes the design rules of convenience and simplicity for keeping the use experience as transparent as possible.
SUMMARY OF THE INVENTION
The object of the present invention is to circumvent or at least alleviate the aforementioned imperfections in remote device access. The object is met by introducing a method and a related device for audio localization based execution of actions such as various device control functions and/or data input.
In a basic solution, and considering e.g. a mobile terminal acting as a target device, when the user taps his/her fingers against or merely touches a surface near the terminal or on the terminal itself, a sound is generated. The resulting sound that propagates through the terminal body or air (as an exemplary medium) is detected by transducers, e.g. a microphone array consisting of 2 to N microphones. These audio signals are analysed in electric form by signal processing techniques, and as a result, the tapping location relative to the microphone array/target device is determined. Location can refer both to the distance and angle of bearing and to relative/absolute coordinates from the target device's viewpoint. The location estimate can be relative, in which case e.g. the target device is itself at the origin of the local coordinate system, as the device does not contain additional data to be able to localize itself. Alternatively, the target device may even determine the absolute position of the sound sources if it is aware of its own position due to e.g. GPS (Global Positioning System) capability or some localization service provided by the mobile network. The target device may optionally localize itself by receiving predetermined audio signals from sources with known positions and by analysing them. The location estimate of the sound source is finally used as control information for performing an action. Together with GPS or some other localization-enabling service, also orientation detection of the terminal might be desirable. E.g. a magnetometer can be used as a sort of compass. Correspondingly, gyros (gyroscopes) may be utilized.
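The tapping-location analysis sketched above commonly starts from a time-difference-of-arrival (TDOA) estimate between a pair of microphones. The description does not prescribe a particular algorithm; as an illustrative sketch only, the TDOA of two channels can be taken from the peak of their cross-correlation (NumPy assumed available):

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time-difference-of-arrival (seconds) between two
    microphone channels from the peak of their full cross-correlation.
    Positive values mean sig_a lags sig_b; negative values mean the
    sound reached sig_a's microphone first."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs

# Synthetic check: the same short click arrives 5 samples later at the
# right microphone than at the left one.
fs = 8000
click = np.hanning(16)
left = np.zeros(256)
left[100:116] = click
right = np.zeros(256)
right[105:121] = click
tau = estimate_tdoa(left, right, fs)   # negative: left heard it first
```

With the microphone spacing known, such pairwise delays are the raw material for the actual bearing/position estimation.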
The microphone array can be mounted in the terminal itself. A simple example is a ready-made stereo microphone installed therein. However, the microphone array can also be a separate device that establishes a data transfer connection to the terminal, e.g. over Bluetooth. The array may even be formed from a group of terminals or multipart devices containing the transducers. Terminals having only a single microphone can form an ad hoc array. In this case, proximity detection and time delay estimation or clock synchronization between terminals is preferred. The above basic solution may be cultivated for each particular application and more components can be flexibly added thereto, or respectively, some components may be taken away or replaced by other alternatives as explained in the detailed description hereinafter.
The utility of the invention arises from a plurality of issues. First of all, the invention can be implemented in many target devices substantially without installing new hardware or accessory devices. The suggested solution is technically feasible also in a global sense, as the necessary hardware for realizing it (processors, memory chips, and audio transducers such as microphones and A/D converters) is already adopted worldwide. The core of the invention resides in the program logic, be it software, tailored hardware, or both, and especially in how the available hardware resources are arranged to function together. From the usage point of view, the adoption curve for the input method proposed herein is short, as the user is not expected to learn the mechanics of any new device as such; it is basically enough to get familiar with one's own physique and maybe, on a general level, with the basic acoustic properties of the surrounding materials to be used as part of the sound generation process. For example, tapping foamed plastic does not typically make a sound loud enough, i.e. a sound that is not suppressed too much by background noise, for enabling reliable localization. Also synthetic or non-human (initiated) sounds may be applied as input. A variety of different applications may make use of the invention due to its natural flexibility, as audio sources and thus the actual emitted sounds are not limited to any particular type or (relative) location; on the device itself, for example, "tapping the body" is possible instead of mere button presses, or an external location like the table surface on which the device lies can be used as a "sound source" if knocked with fingers etc. Other benefits of the invention are mentioned in connection with the description of related embodiments.
In one aspect of the invention, a method for performing an action in an electronic device based on sound source localization, comprising the steps of
-receiving sound or related structural vibration data through a plurality of transducers, and
-determining on the basis of the received data one or more location estimates for one or more sound sources,
is characterized in that it further comprises the step of
-performing an action in the electronic device, said action being dependent on said one or more location estimates of said one or more sound sources.
In another aspect of the invention, an electronic device comprising data input means for receiving sound or related structural vibration data, further comprising processing means and memory means for processing and storing instructions and data, and in particular for determining on the basis of the received data one or more location estimates of one or more sound sources, is characterized in that it is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
In an embodiment of the invention a mobile terminal comprises a number of microphones and the necessary software to detect incoming sounds and localize the sound source(s). The estimated position of the sound source(s) is then used to control a parameter value of the terminal, to execute a certain action, etc. Such a solution can also be exploited in more direct control of the device by implementing a "virtual" or "audio" keyboard feature, in particular the response analysis part thereof, by localizing tapping events on a "keyboard" that may actually be e.g. a natural-size paper copy of a true keyboard layout placed on a table, or the aforementioned optically projected one. One additional use case is a virtual drum set application in which a number of tap locations are mapped to different drums such as a bass drum, snare, hi-hat, etc. Tapping a location with an active mapping triggers playing the associated drum sound. The received tap intensity may be used to adjust the playback dynamics. This approach can also be extended towards other audio applications, e.g. a piano/synthesizer simulator, where different piano/synth sounds are played based on sound source location estimates.
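The drum-set mapping described above can be miniaturized as follows; the zone coordinates (in cm, relative to the terminal) and drum names are purely hypothetical illustrations of mapping a location estimate and tap intensity to a sample trigger:

```python
# Hypothetical mapping from tap-location zones to drum samples,
# as in the virtual drum set use case. Each zone is (x_range, y_range).
DRUM_ZONES = {
    "bass":  ((-20, -10), (-5, 5)),
    "snare": ((-5, 5),    (-5, 5)),
    "hihat": ((10, 20),   (-5, 5)),
}

def trigger_drum(x, y, intensity):
    """Return (drum, velocity) for a localized tap, or None if the tap
    falls outside every active zone. Intensity scales playback dynamics
    and is clamped to the range [0, 1]."""
    for drum, ((x0, x1), (y0, y1)) in DRUM_ZONES.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            velocity = max(0.0, min(1.0, intensity))
            return drum, velocity
    return None
```

A real application would hand the returned pair to a sampler; the same table-driven structure extends directly to the piano/synthesizer simulator mentioned above.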
In another embodiment a mobile terminal, a PDA or a mobile gaming console is further adapted to detect vibration advancing from the sound source through the device structure, e.g. a cover. To enhance the detection of internally progressing vibrations, although such may be recognizable through microphones as well, an accelerometer is additionally installed in the device. The initial sound source location estimate acquired by utilizing the microphones can be adjusted by information provided by the accelerometer to make the detection more robust in a noisy environment. Likewise, audio keyboard applications and the like can benefit from the installation of the accelerometer; the target device may be situated on the very surface that is to be tapped with fingers, to first boost the accelerometer readings and thus indirectly also improve the overall localization performance. The extreme case is that the device cover itself is used as an audio source together with another object like a pen, a stylus, or a fingertip; this type of solution, although feasible even without the accelerometer in accordance with the first embodiment, could be used to implement an audio-type keypad in a mobile device, for example, to replace all or some of the original keys or to work along with those.
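One simple way to realize the accelerometer-assisted robustness described above is to accept an acoustically detected tap only when the accelerometer registered an impulse at (almost) the same moment. A minimal sketch; the 20 ms coincidence window is an assumed value, not one given in the description:

```python
def fuse_tap_detection(mic_tap_time, accel_events, window=0.02):
    """Confirm a microphone-detected tap against accelerometer impulses.
    The tap is accepted only if some accelerometer event occurred within
    `window` seconds of the acoustic detection time, which suppresses
    spurious acoustic detections caused by environmental noise."""
    return any(abs(t - mic_tap_time) <= window for t in accel_events)
```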
In a further embodiment of the invention a group of mobile terminals comprising microphones is connected together to form a microphone array, or in more general terms, a localization system consisting of two or more separable devices. The terminals shall be aware of the distance/time delay or clock synchronization between them to be able to properly analyse the received sounds as an aggregate.
Yet in a further embodiment the user teaches the target device to detect taps at certain locations and to associate controls with them. The user may, for example, want to define three arbitrary locations on a table such that when each of them is tapped a different control action will be performed. The preferred teaching methodology could involve first setting the device in a learning/teaching mode and tapping the first tap area several times. The device runs, for example, neural network or hidden Markov model based methods, first for learning the locations of new tap areas in the teaching mode and subsequently for classifying the tap positions in the use mode. This two-step technique can be seen as analogous to teaching a machine to automatically recognise a user's handwriting.
Accompanying dependent claims disclose embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Hereinafter the invention is described in more detail by reference to the attached drawings, wherein
Fig. 1A discloses a typical prior art computer control arrangement with a keyboard and a mouse.
Fig. 1B discloses a classic game controller with a control stick and a few buttons.
Fig. 2 illustrates the first embodiment of the invention wherein a mobile terminal has been equipped with necessary hardware such as a number of microphones and software for implementing the inventive concept.
Fig. 3 illustrates the second embodiment of the invention wherein a mobile terminal has been additionally provided with an accelerometer to further enhance the localization performance.
Fig. 4 illustrates the third embodiment of the invention in which three terminals constitute a system for audio localization and action execution.
Fig. 5 discloses a flow diagram of the method of the invention for localizing sound signals and performing actions.
Fig. 6 discloses a block diagram of an electronic device in accordance with the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION
Figures 1A and 1B were already covered in conjunction with the description of related prior art.
Figure 2 illustrates, by way of example only, mobile terminal 202 that contains at least two microphones 208 or one stereo microphone. Using only one (mono) microphone instead is possible for distance detection, but more accurate localization is then quite challenging. Either the microphones 208 have been fixedly installed in the terminal during manufacturing, or at least one of them is a so-called accessory/add-on device attached to the terminal by the user. Such an attachment may be wired or wireless, e.g. a Bluetooth or IR connection. When the user taps on nearby objects like the table on which terminal 202 has been positioned, the originating sounds 204, 206 will eventually reach the microphones, which convert the oscillation of the medium (air) into electric signals. The electric signals are funnelled to processing means, e.g. a microprocessor or a DSP, that use the microphone data for estimating the location of the sound sources and further for controlling a number of the terminal's functions or parameters thereby. Alternatively, determination of the location estimates and further control of the functions can be divided to be performed by e.g. several dedicated chips functionally connected with each other.
Optionally, tap detection could be improved by using a priori knowledge of the "excitation" signal. If the device knew the tap sound characteristics beforehand, performance in noisy conditions would be boosted. Further, this approach could be combined with the location-learning embodiment. Several tap sounds are recorded and an averaged estimate of the tap sound characteristics is stored. When the tap detection process is running, the detector may compare the expected tap sound to the detected version and decide whether the former was present in the microphone input signal. This method could enhance the sound detection/localization results if the surface tapped on is of a different material, makes a different sound, etc.

Microphone placement is substantially fictitious in the figure, and in reality a plurality of variables has to be evaluated for proper device design. Some microphone array architectures that have been found suitable for localization are listed below:
1) binaural localization (binaural headset in which microphones are in the headphones that the user wears, or alternatively, a stereo microphone in the terminal)
2) static linear multi-microphone array of N microphones
3) static arbitrary multi-microphone array of N microphones
4) dynamic arbitrary multi-microphone array (ad hoc array comprised of multiple devices, proximity detection between devices required)
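The a priori knowledge of the excitation signal discussed above can be illustrated with a matched-template detector: the stored, averaged tap sound is correlated against incoming frames and the peak normalized correlation is thresholded. This is only a sketch; it assumes zero-mean signals and uses an illustrative threshold of 0.7:

```python
import numpy as np

def matches_template(frame, template, threshold=0.7):
    """Decide whether a stored (averaged) tap template is present in a
    microphone frame, via peak normalized cross-correlation. Signals
    are assumed zero-mean; the threshold is an illustrative value."""
    L = len(template)
    corr = np.correlate(frame, template, mode="valid")
    # Sliding sum of squares gives the per-window frame energy,
    # aligned index-for-index with the "valid" correlation lags.
    energy = np.convolve(frame ** 2, np.ones(L), mode="valid")
    norms = np.linalg.norm(template) * np.sqrt(np.maximum(energy, 1e-12))
    return bool(np.max(corr / norms) >= threshold)
```

A frame that actually contains the template scores 1.0 at the matching lag; an unrelated transient of the same energy scores much lower and is rejected.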
Principles of binaural localization (two microphones) have been described in Roman, N., Wang, D., Brown, G., "Speech segregation based on sound localization", Journal of the Acoustical Society of America, Vol. 114, no. 4, Pt. 1, Oct. 2003.
One method for speech source localization is disclosed in Brandstein, M. S., Silverman, H. F., "A practical methodology for speech source localization with microphone arrays", Computer Speech and Language, no. 11, pp. 91-126, 1997.
As a special case, time-reverse-type methods may be applied in location detection: Fink, M., "Time reversed acoustics", Scientific American, pp. 67-73, Nov. 1999.
Detection of the sound source location either near the microphone area or farther away therefrom can be considered as two different problems. In the following, nearfield refers to a situation where the distance of the audio event is comparable to the distance between separate microphones, and farfield to a situation where the distance of the audio event is several times greater than the distance between separate microphones.
When the sound source is located in the nearfield, localization can be largely based on direct sound localization. It can be expected that the first early reflections from surfaces and walls reach the microphones only a relatively long time (4-20 ms) after the direct sound. In addition, the amplitude difference between the direct sound and reflected sound can be relatively large. Location detection in the nearfield can thus be accomplished without big problems. The detected wave fronts can be considered spherical, which makes accurate distance estimation possible. Meanwhile, in the farfield, the time difference between the direct sound and reflected sound can be short and the amplitude difference small. The room effect generally makes accurate location detection much more difficult than in the nearfield scenario. Incoming wave fronts are planar, which complicates localization in its turn.
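For the farfield case, where the incoming wavefronts are planar, the bearing of a source follows directly from the TDOA between two microphones a known distance apart. A small sketch, assuming a speed of sound of 343 m/s:

```python
import math

def farfield_bearing(tdoa, mic_distance, c=343.0):
    """Angle of incidence (radians, 0 = broadside) of a planar wavefront,
    from the time-difference-of-arrival between two microphones that are
    `mic_distance` metres apart. The sine argument is clamped so that
    slightly noisy TDOA values cannot push it outside [-1, 1]."""
    s = max(-1.0, min(1.0, c * tdoa / mic_distance))
    return math.asin(s)
```

For spherical nearfield wavefronts this simple formula no longer holds, which is one reason the two cases are treated as different problems above.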
Instead of utilizing only one microphone array, several arrays can be used for the detection. Overlapping regions improve location accuracy.
In the second embodiment of the invention, and in accordance with figure 3, compact game console 302 or a corresponding electronic apparatus of the same hand-held nature further includes at least one accelerometer 308 in addition to a number of microphones 310. When both microphone array (stereo microphone) 310 and accelerometer 308 are in use, taps 304, 306 are localizable once again using microphone signals, as was the case in the first embodiment above, but now taps are practically simultaneously also detected by accelerometer(s) 308 to provide an additional parameter or parameters for the signal analysis stage. This way the system can be made more robust against environmental noise. Likewise, accelerometer 308 can be used for detecting vibrations not directly introduced to console 302 itself.
Accelerometers 308 are basically transducers that measure changes in the velocity of related objects for determining their acceleration or vibration. Typically the measurements are performed by a somewhat basic arrangement including a mass (m) spring (k) system, a so-called pendulous accelerometer, wherein the displacement of the mass attached to the spring is proportional to and caused by the acceleration (~inertia) and is at the same time measurable by electrical means, i.e. the displacement causes a change in capacitance between two measurement points. Sometimes an accelerometer is made sensitive to a certain axis only, and therefore a plurality of them is needed to cover multiple axes. Also other accelerometer types, such as piezoelectric and electromagnetic ones, exist, and a person skilled in the art may test and compare different available solutions in connection with the implementation of the invention to find a best-tailored solution for each use case.
It shall be kept in mind, though, that microphones and accelerometers are not the only transducer types deemed applicable for providing console 302 or some other target device with fully useful input information. Thus either other audio signal transducers, or even transducers with a completely different technical approach (optical, thermal, etc), are exploitable for further analysis and sound source localization as, at least, supplementary means.
A detectable audio event such as tapping may be realized by hitting fingers against a surface. Alternatively, a stylus or some other pointing device, possibly producing more localizable sounds than a finger due to its smaller cross-section and therefore reduced contacting surface, could be used to improve the detectability and localizability of tapping. Understandably, the harder material of the stylus in comparison with relatively soft human skin also introduces less elasticity and therefore more power to be emitted in the impact between the stylus and the target surface.
Regardless of the used materials, even footsteps can be detected, as well as other preferred "natural" or synthetic audio events. Instead of detecting only the event, additional information about the detectable sound itself can be used to improve the detection process in a noisy environment. Time-domain and frequency-domain methods such as different transformations (e.g. Fourier analysis) can be used to analyse the sounds and to reduce problems caused by environmental noise and reverberation. One possibility is to detect the sharpness of the event. If the event is too smooth, for example the attack and decay times or spectral content do not fall within predetermined limits, it is not considered a proper event. One may use different types of tap sounds, e.g. fingertip or nails, to make a difference in the intended effect. A possible enhancement related to this is the utilization of direction-sensitive microphone surface layer 'hairs', which make a different sound when moved in different directions.
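The sharpness test mentioned above (rejecting events whose attack is too slow) might be realized roughly as follows; the attack-time and amplitude thresholds are illustrative assumptions, not values given in the description:

```python
import numpy as np

def is_sharp_event(envelope, fs, max_attack_ms=5.0, min_peak=0.1):
    """Accept an audio event only if its amplitude envelope rises from
    onset to peak quickly enough; smooth events (slow attack) and weak
    events (peak below min_peak) are rejected as improper events."""
    peak_idx = int(np.argmax(envelope))
    peak = float(envelope[peak_idx])
    if peak < min_peak:
        return False
    # Onset: first sample exceeding 10 % of the peak value.
    onset_idx = int(np.argmax(envelope > 0.1 * peak))
    attack_ms = (peak_idx - onset_idx) / fs * 1000.0
    return attack_ms <= max_attack_ms
```

A corresponding decay-time or spectral-content check could be added in the same spirit.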
The third embodiment of the invention is depicted in figure 4. Three devices, namely game console 402, mobile terminal 404, and PDA 406, have been functionally connected together, console 402 and terminal 404 via a wireless connection, e.g. Bluetooth, and terminal 404 and PDA 406 through a traditional wired connection, e.g. Firewire, to establish an aggregate entity. It is not a crucial detail from the invention's standpoint, however, whether the connected devices are essentially identical or not; it is sufficient that the devices are able to communicate with each other to the extent required by the invention. The devices can be functionally connected all together to form e.g. a circle-shaped entity, in which case one device multicasts its own measurement/analysis data to the other two devices, or alternatively, the devices may communicate serially with one device 404 remaining in the centre of the transmission chain as a data forwarding link. Irrespective of the used communication mode, the devices shall be kept aware of the distances/time delays between them. In case there is only one master device substantially taking care of the localization algorithms etc., the other devices do not have to be aware of the locations of the remaining ones. Devices belonging to the aggregate entity may mutually localize themselves by sending e.g. predetermined test (audio) signals to each other. Based on the received signal spectrum/power data and/or other parameters determinable from the received test signals, the devices can first estimate their internal placement before starting to localize external sources 408. Also the exchange of received microphone/accelerometer signals, analysis results, etc. can be actualised between devices. Alternatively/additionally, e.g. GPS-equipped devices may directly inform each other about their individual positions to set up the necessary initial location information of the devices forming the aggregate entity.
To attain an even more sophisticated solution, also (relational) further movement of the aggregate devices shall be possible by adaptively/continuously updating the necessary location information between them.
In general terms, determination of internal/external locations of sound sources may be handled by the aggregate entity either
a) independently, where each aggregate device is adapted to calculate its own or external sound sources' positions based on internal or at least aggregate-internal data,
b) concentratedly, where basically one or more devices perform all the location analysis on behalf of the other devices belonging to the aggregate, although the latter may provide the former with some (microphone/accelerometer) measurement data, or
c) jointly, where a certain type of information is calculated in one or more devices for a number of devices of the aggregate.
Additionally, to enable rapid information exchange between devices, they shall advantageously be synchronized to ease instant data transmission/reception. Alternatively, time code or other timing data may be included in exchanged data packets (measurement or analysis data) to enable the other device(s) to properly map such data into time-axis for further analysis/exploitation.
The third embodiment may be applicable especially in situations wherein devices 402, 404, 406 forming the aggregate entity do not otherwise possess the necessary capabilities to implement the localization applications described hereinbefore. Deficiencies may reside in hardware features, software features, or both. Secondly, whenever a plurality of users join together to exploit a localization-controlled application and have their localization-enabled devices with them, the otherwise unused redundancy can be cleverly put to use by utilizing many devices instead of merely one, for better localization performance, for example. Such applications could pertain to various games, plays, party events, etc.
A further embodiment of the invention follows the general pattern of the previous embodiments with a target device adapted to detect taps at certain locations and to associate controls with them.
Referring back to figure 2, the localization process is now enhanced at the expense of its transparency. The user determines three (or some other preferred number of) arbitrary locations on a table, e.g. locations corresponding to taps 204, 206 and an additional third one not shown. When each of the locations is tapped, a different control action shall be performed. The device could be adapted to the prevailing use case by the user by first setting the device in a learning mode, e.g. through some UI function, and then by tapping several times on the first tap area. The device runs a classifier such as a neural network or hidden Markov model to learn the position of the area. The procedure is repeated for each location. The user may manually define the number of recognizable locations and other related parameters like sound characteristics through the UI and, for example, store them as default values for future use, or at least some of the parameters may be automatically detected by the device, if e.g. despite the possible variance in taps related to the same location the overall separability between the different locations is good enough according to some predetermined classification criteria.
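The teach-then-classify flow could, in miniature, look like the sketch below. The description suggests neural network or hidden Markov model based classifiers; a nearest-centroid classifier over per-tap feature vectors (e.g. inter-microphone TDOAs) is substituted here purely to keep the example short:

```python
import math

class TapLocationLearner:
    """Minimal stand-in for the learning/teaching mode: store feature
    vectors per taught location, then classify new taps by nearest
    centroid. Real implementations would use stronger classifiers."""

    def __init__(self):
        self.samples = {}          # label -> list of feature vectors

    def teach(self, label, features):
        """Learning mode: record one tap's feature vector for a label."""
        self.samples.setdefault(label, []).append(list(features))

    def _centroid(self, vecs):
        n = len(vecs)
        return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

    def classify(self, features):
        """Use mode: return the label whose centroid is closest."""
        best, best_d = None, math.inf
        for label, vecs in self.samples.items():
            d = math.dist(self._centroid(vecs), features)
            if d < best_d:
                best, best_d = label, d
        return best
```

The per-location variance collected during teaching could additionally be used to implement the separability check mentioned above.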
A potential musical (including rhythmical) application of this arrangement is e.g. a virtual drum simulator. Each drum is mapped to a certain location on the table. Tapping such locations then triggers the associated drum samples. The tap amplitude can be used to control the dynamics of the sample playback.
Exemplary embodiments set forth above are not contradictory and therefore their features may be creatively combined and cultivated for each emerging scenario without significant problems by a person skilled in the art. For example, keypad symbols could be projected onto a table surface by optical projection techniques. Then, the sound generated by tapping at a certain location representing a key would be detected by the microphone array and the detected location mapped to the corresponding symbol presented in the projected keypad. For example, a keypad of 4 x 3 buttons could be implemented this way. Time constants shall be selected to carry out the functionality of a multipurpose keypad and to detect whether the user has tapped the same key location once, twice or three times to produce e.g. the letter j, k or l, as on a conventional keypad of a contemporary mobile phone.
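The multi-tap time-constant logic mentioned above might be sketched as follows; the 0.8 s inter-tap gap and the letter set are assumed values for illustration:

```python
def multitap_letter(tap_times, letters=("j", "k", "l"), gap=0.8):
    """Resolve successive taps on the same projected key into a letter,
    as on a conventional multi-tap phone keypad: taps closer together
    than `gap` seconds cycle through the key's letters, while a longer
    pause restarts the cycle."""
    count = 1
    for prev, cur in zip(tap_times, tap_times[1:]):
        count = count + 1 if cur - prev <= gap else 1
    return letters[(count - 1) % len(letters)]
```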
According to the common principle of reciprocity, instead of detecting the location of a sound event directly, the location of the microphone array relative to the sound event can be estimated. The location of the device, either relative or absolute depending on the known parameters, could be detected versus a reference tone (possibly ultrasonic to avoid disturbing the users) or some other reference audio event as explained hereinbefore. Multiple sound events originating from different (known) locations can be used to improve the localization. Thus, depending on the available a priori information about the reference tone characteristics, the absolute location of the sound source sending the reference tone, and the transmission path properties, the microphone array's orientation and possibly relative or even absolute location may be locally determinable from the received sound(s). A basic principle applies: the more information available, the more accurate the localization results that can be expected.
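When the distances to several reference sources at known positions are available (e.g. from the travel times of reference tones), the device's own position follows from elementary trilateration. A 2-D sketch with three anchors, assuming exact, noise-free distances:

```python
def trilaterate(anchors, distances):
    """Solve the 2-D position of the device from its distances to three
    reference sound sources at known positions. Subtracting the circle
    equations pairwise removes the quadratic terms, leaving a linear
    2x2 system."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2 ** 2 - r3 ** 2 + x3 ** 2 - x2 ** 2 + y3 ** 2 - y2 ** 2
    det = a1 * b2 - a2 * b1          # zero if the anchors are collinear
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```

With noisy distances, or more than three reference events, a least-squares formulation of the same linear system would be the natural extension.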
Acoustics-based localization associated with larger distances may be used in game applications to measure the position of players or some other factor. Players themselves can also utilize additional communication-enabled gear such as accelerometers for controlling purposes. This could make possible some new game concepts in which a player receives instructions (by audio or visual representation) to move to some physical location in a space. After a certain duration of time an audio event is played from a loudspeaker that everybody can hear, and each device detects its own location versus this reference tone. If the player is in the right/wrong place, appropriate feedback is provided.
Considering the applicability of the invention even further, microphones can replace some buttons of a mobile terminal or another preferred target device. In this case the main purpose of the "buttons" is to make a sound that can be detected by the microphone array mounted in the terminal; conventional button mechanics could be set aside. Likewise, a sensitive display (~touch screen) can be implemented by microphones.

A flow diagram illustrating the general inventive concept and the core of the invention as a step-by-step method is shown in figure 5. In step 502 the method is initialised, which may refer to, for example, launching the necessary software, initialising the related variables, and setting up connections to other devices in case of multiple device interconnection. In step 504 the device(s) receive and process sound signals, by this referring to producing a number of audio samples with an A/D converter inputting microphone output data, for example, via the available microphone arrays and additional transducers such as accelerometers. Multiple devices may also exchange input data between them if seen useful. Transducers converting the input signals into electrical counterparts may also encompass more sophisticated calculation/analysis means (both hardware and software), e.g. frequency-axis analysis/sound parameterisation tools, to refine the raw measurement data directly into a more usable form. Step 504 may further incorporate an optional learning/teaching stage, i.e. the receipt of test signals for estimating the predetermined (~allowed) sound source locations and/or sound characteristics.
In step 506 it is checked whether a data acquisition period has ended, by comparing a regularly updated time counter (~timer) or a mere sample counter value with a predetermined threshold, or by some other means. If so, method execution continues from step 508; otherwise step 504 is repeated. Alternatively, the data acquisition process of step 504 may be continuous and happen in the background, whereas the rest of the procedure steps still take place either only occasionally, for example when there is enough new input data for further processing/analysis, or continuously, i.e. a sliding-window type or some other continuous estimation process is applied to the substantially continuously produced input data. When working in the digital domain, the word "continuous" can, however, refer to a discrete time resolution of one sample, for example.
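The step 506 check and the sliding-window alternative can be sketched together as follows; the window length, class name and sample values are illustrative assumptions, not from the patent:

```python
from collections import deque

WINDOW_SAMPLES = 4  # threshold for one acquisition period (tiny, for demonstration)

class SlidingAcquisition:
    """Step 504 acquisition with a step 506 style readiness check."""

    def __init__(self, window=WINDOW_SAMPLES):
        # deque with maxlen keeps only the newest samples: a sliding window
        self.buffer = deque(maxlen=window)

    def feed(self, sample):
        """Store one A/D sample; return True once the window is full,
        i.e. the condition for proceeding to localization (step 508)."""
        self.buffer.append(sample)
        return len(self.buffer) == self.buffer.maxlen

acq = SlidingAcquisition()
ready = [acq.feed(s) for s in (0.1, -0.2, 0.05, 0.3, -0.1)]
# Once the first full window is available, the check stays True for every
# new sample, so estimation can run continuously in the background.
```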
In step 508 location estimate(s) are determined for the sound source(s). Depending on the localization method used and the application, one or more sources may be determinable from the same source data. The actual methodology for performing the localization step does not belong to the scope of the invention; as briefly introduced earlier, the person skilled in the art may utilize whichever techniques seem to fit the prevailing scenario best.
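As one example of a technique the skilled person might plug into step 508, a time-difference-of-arrival (TDOA) estimate between two microphones can be obtained by cross-correlation. The signals below are synthetic and the implementation is a plain sketch; practical systems often prefer variants such as generalized cross-correlation:

```python
def xcorr_at(a, b, lag):
    """Cross-correlation of a[n] with b[n + lag] over the valid overlap."""
    if lag >= 0:
        pairs = zip(a, b[lag:])
    else:
        pairs = zip(a[-lag:], b)
    return sum(x * y for x, y in pairs)

def tdoa_samples(a, b, max_lag):
    """Lag (in samples) at which b best matches a; a positive lag means
    the sound reached microphone a before microphone b."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: xcorr_at(a, b, lag))

# Synthetic pulse arriving at mic_a two samples before mic_b:
mic_a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lag = tdoa_samples(mic_a, mic_b, max_lag=3)  # 2 for this synthetic pair
```

At sampling rate fs the arrival-time difference is lag / fs seconds, which, together with the known microphone geometry, constrains the source direction or position.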
Step 510 indicates the execution of actions associated with the localization results. The associations may be application-specific and at least partly user-definable on a case-by-case basis. For example, a certain application-specific parameter value (e.g. playback volume) can depend on the estimated location of a certain sound source; one localization result may trigger launching/closing a program; another could indicate typing a certain character on the display (and optionally inserting it in the displayed document), as with the virtual/audio keyboard application, etc. Naturally, the localization result of a sound event can also result in a certain response being given to the user of the device through visual, tactile or acoustic means. The localization arrangement may be used to control different functionalities of the device (e.g. scroll up/down, play a CD, etc.). Associations of this nature are relatively easy to implement by modern programming means and, as the total number of different applications is self-evidently enormous, there is no point in attempting an exhaustive listing here.
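Such location-to-action associations can be sketched as a simple lookup; the region bounds, action names and state layout below are hypothetical, chosen only to illustrate the dispatch pattern:

```python
# Hypothetical regions on the device surface and their associated actions.
REGIONS = {
    "volume_up":   (0.0, 0.5),  # x-range of a tap location, metres
    "volume_down": (0.5, 1.0),
}
ACTIONS = {
    "volume_up":   lambda state: {**state, "volume": state["volume"] + 1},
    "volume_down": lambda state: {**state, "volume": state["volume"] - 1},
}

def dispatch(x, state):
    """Execute the action whose region contains location estimate x (step 510)."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= x < hi:
            return ACTIONS[name](state)
    return state  # no associated action for this location: state unchanged

s1 = dispatch(0.2, {"volume": 5})  # tap localized in the "volume_up" region
s2 = dispatch(0.7, s1)             # tap localized in the "volume_down" region
s3 = dispatch(1.5, s2)             # outside all regions: no effect
```

The same table-driven pattern extends to launching programs, inserting characters, or any of the other associations listed above.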
Step 512 implies the conditional termination of the method upon finishing the execution of the associated actions of step 510. If the method execution is to be continued, execution reverts back to step 504; if not, the method ends in step 514.
Figure 6, which shall be taken only as an example, is a generic block diagram of an electronic device capable of executing the suggested method. The device can be a mobile terminal, a PDA, or a hand-held game console/game controller (preferably a wireless one), for example. It comprises memory 604, divided between one or more physical memory chips, including the necessary code, e.g. in the form of a computer program/application, and other data, e.g. the current configuration and input sound data provided by various transducers. Processing unit 602, e.g. a microprocessor, a DSP, a microcontroller, or a programmable logic chip, is required for the actual execution of the method in accordance with instructions stored in memory 604. Display 606 and keypad 608, or other applicable user input means, provide the user with optional device control and data visualization means (~user interface). Respectively, various other output means, such as a number of loudspeakers, are optionally included to give the user the preferred response (not shown). Data input means 610, either fixedly or detachably mounted, include microphone arrays and additional transducers such as an accelerometer for inputting the necessary data to the device. Wire-based or wireless transceivers (for transmission and/or reception) may also be included for communication with other devices or accessories such as external loudspeakers. Optional (light-)projecting means 612 superimpose e.g. a virtual keypad/keyboard on a nearby surface. The invention may be implemented as a combination of tailored software and more generic hardware, or exclusively through specialized hardware such as ASICs (Application Specific Integrated Circuits).
Software for carrying out the method of the invention can be delivered on a carrier medium such as a floppy disk, a CD-ROM, a memory card, or a hard disk.
The scope of the invention is found in the following claims. Although certain focused examples were given throughout the text about the invention's applicability, feasible method steps, or related device internals, their purpose was not to limit the usage area of the core of the invention to any particular field, which should be evident to any rational reader. Rather, the invention shall be considered a novel, practical method for providing control or other information to an electronic apparatus by utilizing especially audio signals and the localization thereof.

Claims
1. A method for performing an action in an electronic device based on sound source localization, comprising the steps of
-receiving sound or related structural vibration data through a plurality of transducers (504),
-determining on the basis of the received data one or more location estimates for one or more sound sources (508), characterized in that it further comprises the step of
-performing an action in the electronic device (510), said action being dependent on said one or more location estimates of said one or more sound sources.
2. The method of claim 1, wherein said plurality of transducers includes a microphone.
3. The method of any of claims 1-2, wherein said plurality of transducers includes an accelerometer.
4. The method of claim 2, wherein two or more microphones form a microphone array.
5. The method of claim 4, wherein said array comprises at least one of the following: binaural array, static linear multi-microphone array, static arbitrary multi-microphone array, and dynamic arbitrary multi-microphone array.
6. The method of any of claims 1-5, wherein said action is at least one of the following: controlling the device's functionality, updating a parameter value, launching/closing an application, inserting a character or a symbol on a device display, inserting a character or a symbol in a document visible on the display, carrying out a control event relating to a game, and giving a response to the user of the device.
7. The method of any of claims 1-6, wherein a number of key areas are projected on an external surface, and touching or tapping a surface onto which a certain key area has been projected is detected, the tapping location estimate then being determined for performing the action.
8. The method of claim 1, wherein a plurality of devices forms an aggregate device, said aggregate device being the electronic device.
9. The method of claim 8, wherein a connection between two devices belonging to said plurality of devices is either wireless or wired.
10. The method of claim 8, wherein a device belonging to said plurality of devices determines its position in relation to another device belonging to said plurality of devices.
11. The method of claim 1, wherein said receiving step includes receipt of sound or structural vibration caused by touching or tapping the surface of the electronic device.
12. The method of claim 11, wherein said receiving step includes receiving sound or structural vibration caused by touching a display or a key area of the electronic device.
13. The method of any of claims 1-12, wherein said electronic device is portable.
14. The method of any of claims 1-13, further including a learning stage to estimate at least one of the following based on received sound or vibration data: allowed or predetermined sound source location, and characteristic of a sound emitted by a sound source.
15. The method of any of claims 1-14, wherein orientation of the electronic device is estimated preferably by utilizing a magnetometer or a gyro.
16. The use of method of any of claims 1-15 substantially in a game or musical application.
17. An electronic device comprising data input means (610) for receiving sound or related structural vibration data, further comprising processing means (602) and memory means (604) for processing and storing instructions and data, and in particular, for determining on the basis of the received data one or more location estimates of one or more sound sources (508), characterized in that said device is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
18. The electronic device of claim 17, wherein said data input means include at least one transducer.
19. The electronic device of claim 18, wherein said transducer is a microphone or an accelerometer.
20. The electronic device of any of claims 17-19, wherein two or more microphones, either included in or functionally connected to the electronic device, form a microphone array.
21. The electronic device of claim 20, wherein said array comprises at least one of the following: binaural array, static linear multi-microphone array, static arbitrary multi-microphone array, and dynamic arbitrary multi-microphone array.
22. The electronic device of any of claims 17-21, wherein said action is at least one of the following: controlling the device's functionality, updating a parameter value, launching/closing an application, inserting a character or a symbol on a device display, inserting a character or a symbol in a document visible on the display, carrying out a control event relating to a game, and generating a response to the device user.
23. The electronic device of any of claims 17-22, further comprising projecting means (612) for superimposing a number of key areas on an external surface, whereby the processing means (602) are adapted to localize the key area subjected to a touch or tapping as the location estimate of the sound source.
24. The electronic device of claim 17, adapted to receive and localize sound or structural vibration caused by touching or tapping the surface of the electronic device.
25. The electronic device of claim 24, adapted to receive and localize sound or structural vibration caused by touching a display or a keypad of the electronic device.
26. The electronic device of claim 17 that is an aggregate entity comprising a plurality of separable devices.
27. The electronic device of claim 26, wherein a connection between two devices belonging to said plurality of devices is either wireless or wired.
28. The electronic device of claim 26, wherein a device belonging to said plurality of devices is adapted to determine its position in relation to another device belonging to said plurality of devices.
29. The electronic device of claim 17 that is substantially a mobile terminal, a PDA (Personal Digital Assistant), a desktop game console, a hand-held game console, a wireless game controller, or an input device.
30. The electronic device of claim 29 that is GSM (Global System for Mobile Communications), WLAN (Wireless Local Area Network) or UMTS (Universal Mobile Telecommunication System) compatible.
31. The electronic device of any of claims 17-30, further adapted to localize itself based on one or more received sound signals.
32. The device of any of claims 17-31, further adapted to estimate at least one of the following based on received sound or vibration data: allowed or predetermined sound source location, and characteristic of a sound emitted by a sound source.
33. The device of any of claims 17-32, further adapted to estimate its orientation by preferably utilizing a magnetometer or a gyro.
34. The device of any of claims 17-33 that is a musical accessory.
35. The device of any of claims 17-34, adapted to play a sound associated with the estimated location.
36. A computer program comprising code means to execute the method steps of claim 1.
37. A carrier medium carrying the computer executable program of claim 36.
PCT/FI2004/000805 2004-12-29 2004-12-29 A method and a device for localizing a sound source and performing a related action WO2006070044A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2004/000805 WO2006070044A1 (en) 2004-12-29 2004-12-29 A method and a device for localizing a sound source and performing a related action


Publications (1)

Publication Number Publication Date
WO2006070044A1 true WO2006070044A1 (en) 2006-07-06

Family

ID=36614535


Country Status (1)

Country Link
WO (1) WO2006070044A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1182643A1 (en) * 2000-08-03 2002-02-27 Sony Corporation Apparatus for and method of processing audio signal
US20020167862A1 (en) * 2001-04-03 2002-11-14 Carlo Tomasi Method and apparatus for approximating a source position of a sound-causing event for determining an input used in operating an electronic device
GB2385125A (en) * 2002-02-06 2003-08-13 Soundtouch Ltd Using vibrations generated by movement along a surface to determine position
US20040004600A1 (en) * 2000-02-17 2004-01-08 Seiko Epson Corporation Input device using tapping sound detection


Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008047294A3 (en) * 2006-10-18 2008-06-26 Koninkl Philips Electronics Nv Electronic system control using surface interaction
WO2008047294A2 (en) * 2006-10-18 2008-04-24 Koninklijke Philips Electronics N.V. Electronic system control using surface interaction
US20110018825A1 (en) * 2009-07-27 2011-01-27 Sony Corporation Sensing a type of action used to operate a touch panel
CN101968696A (en) * 2009-07-27 2011-02-09 索尼公司 Sensing a type of action used to operate a touch panel
EP2280337A3 (en) * 2009-07-27 2011-06-22 Sony Corporation Sensing a type of action used to operate a touch panel
WO2011138071A1 (en) * 2010-05-07 2011-11-10 Robert Bosch Gmbh Device and method for operating a device
US9226069B2 (en) 2010-10-29 2015-12-29 Qualcomm Incorporated Transitioning multiple microphones from a first mode to a second mode
WO2012058465A3 (en) * 2010-10-29 2012-08-23 Qualcomm Incorporated Transitioning multiple microphones from a first mode to a second mode
WO2012098425A1 (en) * 2011-01-17 2012-07-26 Nokia Corporation An audio scene processing apparatus
US9851841B2 (en) 2011-10-18 2017-12-26 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
US9465494B2 (en) 2011-10-18 2016-10-11 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
WO2013059488A1 (en) * 2011-10-18 2013-04-25 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
US10642407B2 (en) 2011-10-18 2020-05-05 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
EP2926228A4 (en) * 2013-01-08 2016-09-07 Sony Corp Controlling a user interface of a device
WO2014109916A1 (en) 2013-01-08 2014-07-17 Sony Corporation Controlling a user interface of a device
US9329688B2 (en) 2013-02-28 2016-05-03 Qeexo, Co. Input tools having vibro-acoustically distinct regions and computing device for use with the same
US11175698B2 (en) 2013-03-19 2021-11-16 Qeexo, Co. Methods and systems for processing touch inputs based on touch type and touch intensity
US9864454B2 (en) 2013-03-25 2018-01-09 Qeexo, Co. Method and apparatus for classifying finger touch events on a touchscreen
US10949029B2 (en) 2013-03-25 2021-03-16 Qeexo, Co. Method and apparatus for classifying a touch event on a touchscreen as related to one of multiple function generating interaction layers
US11262864B2 (en) 2013-03-25 2022-03-01 Qeexo, Co. Method and apparatus for classifying finger touch events
US10599250B2 (en) 2013-05-06 2020-03-24 Qeexo, Co. Using finger touch types to interact with electronic devices
US10969957B2 (en) 2013-05-06 2021-04-06 Qeexo, Co. Using finger touch types to interact with electronic devices
US9355418B2 (en) 2013-12-19 2016-05-31 Twin Harbor Labs, LLC Alerting servers using vibrational signals
US11048355B2 (en) 2014-02-12 2021-06-29 Qeexo, Co. Determining pitch and yaw for touchscreen interactions
US9778783B2 (en) 2014-02-12 2017-10-03 Qeexo, Co. Determining pitch and yaw for touchscreen interactions
US10599251B2 (en) 2014-09-11 2020-03-24 Qeexo, Co. Method and apparatus for differentiating touch screen users based on touch event analysis
US9329715B2 (en) 2014-09-11 2016-05-03 Qeexo, Co. Method and apparatus for differentiating touch screen users based on touch event analysis
US11619983B2 (en) 2014-09-15 2023-04-04 Qeexo, Co. Method and apparatus for resolving touch screen ambiguities
US9864453B2 (en) 2014-09-22 2018-01-09 Qeexo, Co. Method and apparatus for improving accuracy of touch screen event analysis by use of edge classification
US11029785B2 (en) 2014-09-24 2021-06-08 Qeexo, Co. Method for improving accuracy of touch screen event analysis by use of spatiotemporal touch patterns
US10606417B2 (en) 2014-09-24 2020-03-31 Qeexo, Co. Method for improving accuracy of touch screen event analysis by use of spatiotemporal touch patterns
US10712858B2 (en) 2014-09-25 2020-07-14 Qeexo, Co. Method and apparatus for classifying contacts with a touch sensitive device
US10282024B2 (en) 2014-09-25 2019-05-07 Qeexo, Co. Classifying contacts or associations with a touch sensitive device
US10095402B2 (en) 2014-10-01 2018-10-09 Qeexo, Co. Method and apparatus for addressing touch discontinuities
CN104598193A (en) * 2014-12-29 2015-05-06 联想(北京)有限公司 Information processing method and electronic equipment
CN104598193B (en) * 2014-12-29 2020-04-24 联想(北京)有限公司 Information processing method and electronic equipment
US10564761B2 (en) 2015-07-01 2020-02-18 Qeexo, Co. Determining pitch for proximity sensitive interactions
US10642404B2 (en) 2015-08-24 2020-05-05 Qeexo, Co. Touch sensitive device with multi-sensor stream synchronized data
US10365763B2 (en) 2016-04-13 2019-07-30 Microsoft Technology Licensing, Llc Selective attenuation of sound for display devices
US10191595B2 (en) 2016-05-03 2019-01-29 Lg Electronics Inc. Electronic device with plurality of microphones and method for controlling same based on type of audio input received via the plurality of microphones
KR20170124890A (en) * 2016-05-03 2017-11-13 엘지전자 주식회사 Electronic device and method for controlling the same
KR102434104B1 (en) * 2016-05-03 2022-08-19 엘지전자 주식회사 Electronic device and method for controlling the same
WO2017191894A1 (en) * 2016-05-03 2017-11-09 Lg Electronics Inc. Electronic device and controlling method thereof
US9922637B2 (en) 2016-07-11 2018-03-20 Microsoft Technology Licensing, Llc Microphone noise suppression for computing device
CN109753191A (en) * 2017-11-03 2019-05-14 迪尔阿扣基金两合公司 A kind of acoustics touch-control system
CN109753191B (en) * 2017-11-03 2022-07-26 迪尔阿扣基金两合公司 Acoustic touch system
US11009989B2 (en) 2018-08-21 2021-05-18 Qeexo, Co. Recognizing and rejecting unintentional touch events associated with a touch sensitive device
US10942603B2 (en) 2019-05-06 2021-03-09 Qeexo, Co. Managing activity states of an application processor in relation to touch or hover interactions with a touch sensitive device
US11231815B2 (en) 2019-06-28 2022-01-25 Qeexo, Co. Detecting object proximity using touch sensitive surface sensing and ultrasonic sensing

Similar Documents

Publication Publication Date Title
WO2006070044A1 (en) A method and a device for localizing a sound source and performing a related action
US9928835B1 (en) Systems and methods for determining content preferences based on vocal utterances and/or movement by a user
CN109256146B (en) Audio detection method, device and storage medium
Essl et al. Interactivity for mobile music-making
CN103235642B (en) Virtual musical instrument system that a kind of 6 dimension sense organs are mutual and its implementation
JP6737996B2 (en) Handheld controller for computer, control system for computer and computer system
TW200813795A (en) Method, apparatus, and computer program product for entry of data or commands based on tap detection
WO2014179096A1 (en) Detection of and response to extra-device touch events
WO2020059245A1 (en) Information processing device, information processing method and information processing program
JP7140083B2 (en) Electronic wind instrument, control method and program for electronic wind instrument
WO2010112677A1 (en) Method for controlling an apparatus
CN105373220B (en) It is interacted using position sensor and loudspeaker signal with the user of device
US12100380B2 (en) Audio cancellation system and method
US20220405047A1 (en) Audio cancellation system and method
US20240221714A1 (en) Transfer function generation system and method
US20230252963A1 (en) Computing Device
CN214504972U (en) Intelligent musical instrument
CN109739388B (en) Violin playing method and device based on terminal and terminal
JP6111526B2 (en) Music generator
Overholt Advancements in violin-related human-computer interaction
KR100650890B1 (en) Mobile communication terminal having music player and music playing method in that terminal
JP7353136B2 (en) controller
CN107404581B (en) Musical instrument simulation method and device for mobile terminal, storage medium and mobile terminal
CN109801613B (en) Terminal-based cello playing method and device and terminal
US20220270576A1 (en) Emulating a virtual instrument from a continuous movement via a midi protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 04805200
Country of ref document: EP
Kind code of ref document: A1
WWW Wipo information: withdrawn in national office
Ref document number: 4805200
Country of ref document: EP