WO2006070044A1 - A method and a device for localizing a sound source and performing a related action - Google Patents

A method and a device for localizing a sound source and performing a related action

Info

Publication number
WO2006070044A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
sound
devices
location
sound source
Prior art date
Application number
PCT/FI2004/000805
Other languages
French (fr)
Inventor
Jussi Virolainen
Pauli Laine
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/FI2004/000805 priority Critical patent/WO2006070044A1/en
Publication of WO2006070044A1 publication Critical patent/WO2006070044A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F3/043Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means using propagating acoustic waves
    • G06F3/0433Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means using propagating acoustic waves in which the acoustic waves are either generated by a movable member and propagated within a surface layer or propagated within a surface layer and captured by a movable member

Definitions

  • the present invention relates to user interfaces and input devices such as game controllers. Especially the invention pertains to sound source localization based control of mobile devices and software thereof.
  • Modern mobile devices such as PDAs (Personal Digital Assistants) and mobile terminals, or generally portable or hand-held devices like game controllers, have evolved tremendously during the last 10 years or so.
  • Likewise, means for providing such devices with necessary input information have improved as to comfort, learning curve, reliability, size, etc.
  • Classic input means, the word "classic" referring herein, in the sense of the electronics industry, to only the few years or few tens of years preceding the present, such as a keyboard/keypad, a mouse, and various styles of joystick, have in certain areas retained their strong position, whereas in some others newer means have also gained popularity.
  • the total number of buttons or other pressure-sensitive areas is still somewhat limited due to the size of the surface area on hand for the purpose and to the increasing cost/technical difficulties in producing miniature scale but still user accessible button placements.
  • common "button-tapping" based physical interaction used for inputting user data or control information is limited to certain predetermined areas of the device, such areas being dedicated to that very specific purpose already during the product development and manufacturing stage.
  • a PDA may incorporate a virtual keyboard feature or an external add-on by which a common keyboard layout is optically superimposed on a (preferably solid and smooth) surface. Then the device user's finger position(s) on these virtual keys is determined by optical/visual pattern detection techniques.
  • Audio, particularly speech, based user interfaces quite often support a certain vocabulary, either a predefined or user-adjustable one, including a number of elements, e.g. words or syllables, that alone or as a certain group form commands linked to a certain function/action to be executed by the device.
  • The reliability of such recognition depends on e.g. the used vocabulary size, the computational (case-dependent) distance between elements, etc.
  • the object of the present invention is to circumvent or at least alleviate the aforementioned imperfections in remote device access.
  • the object is met by introducing a method and a related device for audio localization based execution of actions such as various device control functions and/or data input.
  • When the user e.g. taps the device or a nearby surface, a sound is generated.
  • The resulting sound that propagates through the terminal body or the air is detected by transducers, e.g. a microphone array consisting of 2 to N microphones.
  • These audio signals are analysed by signal processing techniques in electric form, and as a result, tapping location relative to the microphone array/target device is determined.
  • By location, both the distance and the angle of bearing, or relative/absolute coordinates from the target device's viewpoint, can be referred to.
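As a rough illustration of the localization analysis described above, the inter-microphone arrival-time difference of a tap can be estimated by cross-correlating the signals of two microphones. The function below is a self-contained sketch; its names and signal layout are illustrative and not taken from the patent.

```python
def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`,
    searched over [-max_lag, max_lag] by brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += x * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With the delay in hand, the analysis can map it to a distance difference or bearing as discussed in the surrounding text.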
  • The location estimate can be relative, in which case e.g. the target device is itself in the origin of the local coordinate system, as the device does not contain additional data to be able to localize itself.
  • the target device may even determine the absolute position of the sound sources if it is aware of its position due to e.g. GPS (Global Positioning System) capability or some localization service provided by the mobile network.
  • the target device may optionally localize itself by receiving predetermined audio signals from sources with known positions and by analysing them.
  • the location estimate of the sound source is finally used as control information for performing an action.
  • orientation detection of the terminal might be desirable.
  • a magnetometer can be used as a sort of compass.
  • Also gyros (gyroscopes) can be utilized for orientation detection.
  • The microphone array can be mounted in the terminal itself; a simple example is a ready-made stereo microphone installed therein. However, the microphone array can also be a separate device that establishes a data transfer connection to the terminal, e.g. over a wired or wireless link such as Bluetooth.
  • The array may even be constituted from a group of terminals or multipart devices containing the transducers. Terminals having only a single microphone can form an ad hoc array. In this case, proximity detection and time delay estimation or clock synchronization between the terminals is preferred.
  • the above basic solution may be cultivated based on each particular application and more components can be flexibly added thereto, or respectively, some components may be taken away or replaced by other alternatives as explained in the detailed description hereinafter.
  • the utility of the invention arises from a plurality of issues.
  • the invention can be implemented in many target devices substantially without installing new hardware or accessory devices.
  • The suggested solution is technically feasible also in a global sense, as the necessary hardware for realizing it (processors, memory chips, and audio transducers such as microphones, together with A/D converters) has already been adopted worldwide.
  • The invention core resides in the program logic, be it software, tailored hardware, or both, and especially in how the available hardware resources are arranged to function together.
  • The adoption curve for utilizing the input method proposed herein is short, as the user is not expected to learn the mechanics of any new device as such; it is basically enough to get familiar with one's own physique and, maybe on a general level, with the basic acoustic properties of the surrounding materials to be used as a part of the sound generation process.
  • Tapping foamed plastic, for example, does not typically make a sound loud enough, i.e. a sound that is not suppressed too much by background noise, to enable reliable localization.
  • synthetic or non-human (initiated) sounds may be applied as input.
  • a method for performing an action in an electronic device based on sound source localization having the steps of
  • an electronic device comprising data input means for receiving sound or related structural vibration data, further comprising processing means and memory means for processing and storing instructions and data, and in particular, for determining on the basis of the received data one or more location estimates of one or more sound sources, is characterized in that it is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
  • a mobile terminal comprises a number of microphones and necessary software to detect incoming sounds and localize the sound source(s). The estimated position of the sound source(s) is then used to control a parameter value of the terminal, to execute certain action, etc.
  • Such solution can also be exploited in more direct control of the device by implementing a "virtual" or “audio” keyboard feature, in particular the response analysis part thereof, by localizing tapping events on a "keyboard” that may actually be e.g. a natural size paper copy of a true keyboard layout placed on a table or the aforementioned optically projected one.
  • One additional use case is a virtual drum set application in which a number of tap locations are mapped to different drums such as a bass drum, snare, hi-hat etc. Tapping a location with active mapping triggers playing the associated drum sound. The received tap intensity may be used to adjust the playback dynamics.
  • This approach can be also extended towards other audio applications, e.g. a piano/synthesizer simulator, where different piano/synth sounds are played based on sound source location estimates.
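The drum-mapping idea above can be sketched as a simple lookup from an estimated tap location to a drum sample, with the tap intensity clamped into a playback-velocity range. The zone coordinates, radius, and sample names below are invented for illustration.

```python
import math

DRUM_ZONES = {            # (x, y) table coordinates -> drum sample name
    (0.0, 0.0): "bass",
    (0.3, 0.0): "snare",
    (0.3, 0.2): "hihat",
}

def trigger_drum(x, y, intensity, radius=0.1):
    """Return (sample, velocity) for the zone closest to the tap,
    or None if no zone lies within `radius` of the tap location."""
    pos, sample = min(DRUM_ZONES.items(),
                      key=lambda kv: math.dist(kv[0], (x, y)))
    if math.dist(pos, (x, y)) > radius:
        return None
    return sample, max(0.0, min(1.0, intensity))  # clamp velocity to [0, 1]
```

The same mapping generalizes directly to the piano/synthesizer variant mentioned in the text by changing the zone-to-sample table.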
  • A mobile terminal, a PDA, or a mobile gaming console is further adapted to detect vibration advancing from the sound source through the device structure, e.g. a cover.
  • an accelerometer is additionally installed in the device. Initial sound source location estimate acquired by utilizing the microphones can be adjusted by information provided by the accelerometer to make the detection more robust in a noisy environment.
  • Audio keyboard applications and alike can benefit from the installation of the accelerometer; the target device may be situated on the very surface to be tapped with fingers, which boosts the accelerometer readings and thus indirectly also improves the overall localization performance.
  • the device cover itself is used as an audio source together with another object like a pen, a stylus, or a fingertip; this type of solution, although being feasible even without the accelerometer in accordance with the first embodiment, could be used to implement audio type keypad in a mobile device, for example, to replace all or some of the original keys or to work along with those.
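The accelerometer-assisted refinement mentioned in these embodiments could, under simple assumptions, amount to a confidence-weighted blend of a microphone-based and an accelerometer-based location estimate. This is only one possible fusion rule, not the patent's prescribed method.

```python
def fuse_estimates(mic_xy, mic_conf, acc_xy, acc_conf):
    """Confidence-weighted average of two 2-D location estimates."""
    total = mic_conf + acc_conf
    if total == 0:
        raise ValueError("at least one estimate must carry confidence")
    return tuple((m * mic_conf + a * acc_conf) / total
                 for m, a in zip(mic_xy, acc_xy))
```

In a noisy environment the microphone confidence would drop, pulling the fused estimate toward the structure-borne accelerometer reading.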
  • a group of mobile terminals comprising microphones is connected together to form a microphone array, or in more general terms, a localization system consisting of two or more separable devices.
  • the terminals shall be aware of the distance/time delay or clock sync between them to be able to properly analyse the received sounds as an aggregate.
  • The user teaches the target device to detect taps at certain locations and to associate controls with them. The user wants to define three arbitrary locations on a table, and when each of them is tapped, a different control action will be performed.
  • The preferred methodology of teaching could comprise first setting the device in a learning/teaching mode and tapping the first tap area several times.
  • the device runs, for example, neural network or hidden Markov model based methods for first learning the locations of new tap areas in the teaching mode and subsequently for classifying the tap positions in the use mode. This two-step technique can be seen analogous to teaching a machine to automatically recognise user's handwriting.
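The patent names neural networks and hidden Markov models for the teach/classify steps; as a stand-in, the same two-step flow can be shown with a much simpler nearest-centroid classifier, an assumption made purely to keep the sketch short.

```python
import math

class TapAreaClassifier:
    """Teach-then-classify sketch standing in for the neural-network or
    HMM methods mentioned in the text: each tap area is learned as the
    centroid of its training taps, and later taps snap to the nearest."""

    def __init__(self):
        self.centroids = {}          # area name -> (x, y) centroid

    def teach(self, name, taps):
        xs, ys = zip(*taps)
        self.centroids[name] = (sum(xs) / len(xs), sum(ys) / len(ys))

    def classify(self, x, y):
        return min(self.centroids,
                   key=lambda n: math.dist(self.centroids[n], (x, y)))
```

Teaching mode calls `teach` repeatedly; use mode calls `classify` on each new location estimate.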
  • Fig. 1A discloses a typical prior art computer control arrangement with a keyboard and a mouse.
  • Fig. 1B discloses a classic game controller with a control stick and a few buttons.
  • Fig. 2 illustrates the first embodiment of the invention wherein a mobile terminal has been equipped with necessary hardware such as a number of microphones and software for implementing the inventive concept.
  • Fig. 3 illustrates the second embodiment of the invention wherein a mobile terminal has been additionally provided with an accelerometer to further enhance the localization performance.
  • Fig. 4 illustrates the third embodiment of the invention in which three terminals constitute a system for audio localization and action execution.
  • Fig. 5 discloses a flow diagram of the method of the invention for localizing sound signals and performing actions.
  • Fig. 6 discloses a block diagram of an electronic device in accordance with the invention.
  • FIG. 2 illustrates, by way of example only, mobile terminal 202 that contains at least two microphones 208 or one stereo microphone.
  • Using only one (mono) microphone instead is possible for distance detection but more accurate localization is then quite challenging.
  • Either the microphones 208 have been fixedly installed in the terminal during manufacturing, or at least one of them is a so-called accessory/add-on device attached to the terminal by the user. Such attachment may be wired or wireless, e.g. Bluetooth or IR connection.
  • The originating sounds 204, 206 will eventually reach the microphones, which convert the oscillation of the medium (air) into electric signals.
  • The electric signals are funnelled to processing means, e.g. a microprocessor or a DSP, that use the microphone data for estimating the location of the sound sources and further for controlling a number of the terminal's functions or parameters thereby.
  • determination of the location estimates and further control of the functions can be divided to be performed by e.g. several dedicated chips functionally connected with each other.
  • Tap detection could be improved by using a priori knowledge of the "excitation" signal: if the device knew the tap sound characteristics beforehand, performance in noisy conditions would improve. Further, this approach could be combined with the location-learning embodiment. Several tap sounds are recorded and an averaged estimate of the tap sound characteristics is stored. When the tap detection process is running, the detector may compare the expected tap sound to the detected version and decide whether the former was present in the microphone input signal. This method could enhance the sound detection/localization results if the surface tapped on was of different material, made a different sound, etc. Microphone placement is substantially fictitious in the figure, and in reality a plurality of variables has to be evaluated for proper device design. Some microphone array architectures that have been found suitable for localization are listed below:
  • Binaural localization: a binaural headset in which the microphones are in the headphones that the user wears, or alternatively, a stereo microphone in the terminal.
  • time reverse-type methods may be applied in location detection: Fink, M., “Time reversed acoustics”, Scientific American, pp.67-73, Nov. 1999.
  • Nearfield refers to a situation where the distance of the audio event is comparable to the distance between separate microphones, and farfield to a situation where the distance of the audio event is several times greater than the distance between separate microphones.
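In the farfield case, the bearing of a sound source follows from the inter-microphone delay via theta = arcsin(c * tau / d), where c is the speed of sound, tau the delay, and d the microphone spacing. A hedged sketch of that conversion:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def bearing_from_delay(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field bearing (radians, 0 = broadside) from the inter-microphone
    time delay: theta = arcsin(c * tau / d). Raises ValueError when the
    delay implies a path difference longer than the microphone spacing."""
    ratio = c * delay_s / mic_spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("delay inconsistent with microphone spacing")
    return math.asin(ratio)
```

In the nearfield case this plane-wave assumption breaks down and a range-dependent model would be needed instead.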
  • Compact game console 302, or a corresponding electronic apparatus of a similar hand-held nature, further includes at least one accelerometer 308 in addition to a number of microphones 310.
  • accelerometer 308 can be used for detecting vibrations not directly introduced to console 302 itself.
  • Accelerometers 308 are basically transducers that measure changes in the velocity of related objects for determining acceleration or vibration thereof. Typically the measurements are performed by a somewhat basic arrangement including a mass (m) spring (k) system, a so-called pendulous accelerometer, wherein the displacement of the mass attached to the spring is proportional to and caused by the acceleration (~inertia) and is, at the same time, measurable by utilizing e.g. electrical means, i.e. the displacement causes a change in capacitance between two measurement points.
  • Often an accelerometer is made sensitive to a certain axis only, and therefore a plurality of them is needed to cover multiple axes.
  • Also other accelerometer types like piezoelectric and electromagnetic ones exist, and a person skilled in the art may test and compare different available solutions in connection with the implementation of the invention to find the best-tailored solution for each use case.
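For the pendulous mass-spring arrangement described above, equilibrium of the spring force and the inertial force gives a = k * x / m, so the measured proof-mass displacement maps linearly to acceleration. The spring constant and mass below are purely illustrative values.

```python
def acceleration_from_displacement(x_m, spring_k=50.0, mass_kg=0.001):
    """a = k * x / m: proof-mass displacement x (metres) to acceleration
    (m/s^2), assuming an ideal linear spring at equilibrium."""
    return spring_k * x_m / mass_kg
```

In a capacitive design, x itself would first be derived from the measured change in capacitance between the two measurement points.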
  • microphones and accelerometers are not the only transducer types deemed applicable in providing console 302 or some other target device with fully useful input information.
  • Other transducers, even ones with a completely different technical approach, are likewise exploitable for further analysis and sound source localization as, at least, supplementary means.
  • a detectable audio event such as tapping may be realized by hitting fingers against a surface.
  • a stylus or some other pointing device possibly producing more localizable sounds than a finger due to a smaller cross-section and therefore reduced contacting surface thereof, could be used to improve the detectability and localizability of tapping. Understandably, harder material of the stylus in comparison with relatively soft human skin also introduces less elasticity and therefore more power to be emitted in the impact between the stylus and the target surface.
  • the third embodiment of the invention is depicted in figure 4.
  • Three devices, namely game console 402, mobile terminal 404, and PDA 406, have been functionally connected together: console 402 and terminal 404 via a wireless, e.g. Bluetooth, connection, and terminal 404 and PDA 406 through a traditional wired connection, e.g. Firewire, to establish an aggregate entity.
  • The devices can be functionally connected all together to form e.g. a network in which each device communicates directly with the others.
  • Alternatively, the devices may communicate serially with one device 404 remaining in the centre of the transmission chain as a data forwarding link. Irrespective of the used communication mode, the devices shall be kept aware of the distances/time delays between them. In case there is only one master device substantially taking care of the localization algorithms etc., the other devices do not have to be aware of the locations of the remaining ones.
  • Devices belonging to the aggregate entity may mutually localize themselves by sending e.g. predetermined test (audio) signals to each other. Based on the received signal spectrum/power data and/or possible other parameters determinable from the received test signals the devices can first estimate their internal placement before starting to localize external sources 408.
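The mutual-placement step could, assuming already synchronised clocks as the text requires, reduce each device pair to a time-of-flight distance estimate. A minimal sketch with invented names:

```python
SPEED_OF_SOUND = 343.0  # m/s

def pairwise_distance(emit_time_s, receive_time_s, c=SPEED_OF_SOUND):
    """Distance between emitter and receiver from the time of flight of a
    predetermined test tone (assumes synchronised device clocks)."""
    tof = receive_time_s - emit_time_s
    if tof < 0:
        raise ValueError("receive time precedes emit time")
    return c * tof
```

Collecting such pairwise distances for all device pairs lets the aggregate estimate its internal placement before localizing external sources 408.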
  • determination of internal/external locations of sound sources may be handled by the aggregate entity either
  • a) distributedly, where each aggregate device is adapted to calculate its own or external sound sources' position based on internal or at least aggregate-internal data,
  • b) concentratedly, where basically one or more devices perform all the location analysis on behalf of the other devices belonging to the aggregate, although the latter may provide the former with some (microphone/accelerometer) measurement data, or c) jointly, where a certain type of information is calculated in one or more devices and delivered to a number of devices of the aggregate.
  • Time codes or other timing data may be included in the exchanged data packets (measurement or analysis data) to enable the other device(s) to properly map such data onto the time axis for further analysis/exploitation.
  • The third embodiment may be applicable especially in situations wherein devices 402, 404, 406 forming the aggregate entity do not otherwise bear the necessary capabilities to implement the localization applications described hereinbefore. Deficiencies may reside in hardware features, software features, or both. Secondly, whenever a plurality of users join together to exploit a localization-controlled application and have their localization-enabled devices with them, the otherwise available redundancy can be cleverly put into use by utilizing many devices instead of merely one for better localization performance, for example. Such an application could pertain to various games, plays, party events, etc.
  • a further embodiment of the invention follows the general pattern of the previous embodiments with a target device adapted to detect taps at certain locations and to associate controls with them.
  • the user determines three (or some other preferred number) arbitrary locations on a table, e.g. locations corresponding to taps 204, 206 and an additional third one not shown. When each of the locations is tapped a different control action shall be performed.
  • The device could be adapted to the prevailing use case by the user by first setting the device in a learning mode, e.g. through some UI function, and then by tapping the first tap area several times. The device runs a classifier such as a neural network or hidden Markov model to learn the position of the area. The procedure is repeated for each location.
  • The user may manually define the number of recognizable locations and other related parameters, like sound characteristics, through the UI and, for example, store them as default values for future use; alternatively, at least some of the parameters may be automatically detected by the device, e.g. if, despite the possible variance in taps related to the same location, the overall separability between the different locations is good enough according to some predetermined classification criteria.
  • a potential musical (including rhythmical) application of this arrangement is e.g. a virtual drum simulator.
  • Each drum is mapped to a certain location on the table. Tapping of such locations then triggers the associated drum samples. Tap amplitude can be used to control the dynamics of the sample playback.
  • Exemplary embodiments set forth above are not contradictory and therefore their features may be creatively combined and cultivated for each emerging scenario without significant problems by a person skilled in the art.
  • Keypad symbols could be projected onto a table surface by optical projection techniques. Then, the sound generated by tapping at a certain location representing a key would be detected by the microphone array, and the detected location would be mapped to the corresponding symbol presented in the projected keypad.
  • keypad of 4 x 3 buttons could be implemented this way.
  • Time constants shall be selected to carry out the functionality of a multipurpose keypad and to detect whether the user had tapped the same key location once, twice, or three times to produce e.g. the letter j, k, or l as on a conventional keypad of a contemporary mobile phone.
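The projected 4 x 3 keypad and the multi-tap time constants described above can be sketched together: a tap location is first snapped to a key cell, and repeated taps on the same key within a time window cycle through its letters. The cell size, window length, and layout below are assumptions, not values from the patent.

```python
MULTITAP = {"5": "jkl"}          # classic phone mapping for the '5' key
TAP_WINDOW_S = 1.0               # assumed max gap between taps of one letter

def key_at(x, y, cell=0.03):
    """Map a tap position to a key of a 4-row x 3-column grid whose
    origin is the top-left corner of the projected keypad."""
    col, row = int(x // cell), int(y // cell)
    if not (0 <= col < 3 and 0 <= row < 4):
        return None
    return "123456789*0#"[row * 3 + col]

def multitap_letter(key, tap_times):
    """Resolve consecutive taps on `key` into a letter: taps closer than
    TAP_WINDOW_S count toward the same letter (j -> k -> l for '5')."""
    count = 1
    for prev, cur in zip(tap_times, tap_times[1:]):
        count = count + 1 if cur - prev <= TAP_WINDOW_S else 1
    letters = MULTITAP.get(key, "")
    return letters[(count - 1) % len(letters)] if letters else None
```

A tap outside the grid is simply ignored, matching the idea that only the projected key areas carry active mappings.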
  • Also the location of the microphone array with respect to the sound event can be estimated.
  • The location of the device, either relative or absolute depending on the known parameters, could be detected versus a reference tone (possibly ultrasonic to avoid disturbing the users) or some other reference audio event as explained hereinbefore.
  • Multiple sound events originating from different (known) locations can be used to improve the localization.
  • microphone array's orientation and possibly relative or even absolute location may be locally determinable from the received sound(s).
  • A basic principle applies: the more information available, the more accurate localization results can be expected.
  • Acoustics based localization associated with larger distances may be used in game applications to measure the position of players or some other factor.
  • Players themselves can also utilize additional communication-enabled gear such as accelerometers for controlling purposes. This could make possible some new game concepts in which a player receives instructions (by audio or visual representation) to move into some physical location in a space. After a certain duration of time an audio event is played from a loudspeaker that everybody can hear, and each device detects its own location versus this reference tone. If the player is in the right/wrong place, appropriate feedback is provided.
  • Such "audio buttons" can replace some buttons of a mobile terminal or another preferred target device.
  • The main purpose of the "buttons" is to make a sound that can be detected by the microphone array mounted in the terminal.
  • Conventional button mechanics could be set aside.
  • A sensitive display (~touch screen) could correspondingly be complemented or replaced.
  • In step 502 the method is initialised, which may refer to, for example, launching the necessary software, initialising the related variables, and setting up connections to other devices in case of multiple device interconnection.
  • In step 504 the device(s) are adapted to receive and process sound signals, by this referring to producing a number of audio samples with an A/D converter from microphone output data, for example, via the available microphone arrays and additional transducers such as accelerometers. Multiple devices may also exchange input data between them if seen useful. Transducers converting the input signals into electrical counterparts may also encompass more sophisticated calculation/analysis means (both hw and sw), e.g. frequency-axis analysis/sound parameterisation tools, to refine the raw measurement data directly into a more usable form.
  • Step 504 may further incorporate an optional learning/teaching stage, i.e. receipt of test signals for estimating the predetermined (~allowed) sound source locations and/or sound characteristics.
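The "sound parameterisation tools" mentioned for step 504 could be as simple as reducing each frame of A/D samples to a couple of coarse descriptors. The two features below (frame energy and zero-crossing rate) are illustrative choices, not the patent's.

```python
def parameterise_frame(samples):
    """Return (energy, zero_crossing_rate) for one frame of samples:
    mean squared amplitude, and the fraction of adjacent sample pairs
    whose signs differ."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0.0) != (b < 0.0))
    return energy, crossings / (len(samples) - 1)
```

A tap detector could then compare such features against the averaged tap-sound characteristics stored during the teaching stage.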
  • In step 506 it is checked, by comparing a regularly updated time counter (~timer) or a mere sample counter value with a predetermined threshold, or by some other means, whether a data acquisition period has ended. If that is the case, method execution shall be continued from step 508; otherwise step 504 is repeated.
  • data acquisition process according to step 504 may be continuous and happen in the background whereas the rest of the procedure steps still take place only occasionally, for example when there is enough new input data for further processing/analysis, or continuously, i.e. sliding window type or some other continuous estimation process is applied to the substantially continuously produced input data.
  • The word "continuous" can, however, refer to a discrete time resolution of one sample, for example.
  • In step 508, location estimate(s) are determined for the sound source(s).
  • one or more sources may be determinable from the same source data.
  • the actual methodology to perform the localization step does not belong to the scope of the invention, and as briefly introduced earlier, the person skilled in the art may utilize the techniques that seem to fit the prevailing scenario best.
  • Step 510 indicates the execution of actions associated with the localization results.
  • The associations may be application-specific and at least partly user-definable on a case-by-case basis. For example, a certain application-specific parameter value (e.g. playback volume) can be dependent on the estimated location of a certain sound source; some localization result may trigger launching/closing a program; some other localization result could indicate typing a certain character on the display (and optionally inserting it in the document displayed) as with the virtual/audio keyboard application, etc. Naturally a localization result of a sound event can also result in giving a certain response to the user of the device through visual, tactile or acoustic means.
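The location-to-action associations of step 510 can be pictured as a dispatch table from a recognised location label to a callback; all names here are invented examples.

```python
def make_dispatcher(bindings):
    """bindings: dict mapping a recognised location label to a
    zero-argument callable; unknown labels are silently ignored."""
    def dispatch(label):
        action = bindings.get(label)
        return action() if action else None
    return dispatch
```

An application would fill the table during configuration or the teaching stage, then feed each classified tap location into the dispatcher.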
  • the localization arrangement may be used to control different functionalities (e.g.
  • Step 512 implies the conditional termination of the method as a consequence of finishing the execution of the associated actions of step 510. However, if the method execution shall be continued, execution thereof is reverted back to step 504; if not, the method is ended in step 514.
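The overall flow of steps 502-514 can be condensed into a plain loop; the acquisition, localization, and action routines are passed in as callables so the skeleton stays application-neutral. This is a reading of Fig. 5, not code from the patent.

```python
def run(acquire, period_full, localize, act, keep_running):
    buffer = []                        # step 502: initialisation
    while True:
        buffer.append(acquire())       # step 504: receive/process sound
        if not period_full(buffer):    # step 506: acquisition period over?
            continue
        estimates = localize(buffer)   # step 508: determine location estimate(s)
        act(estimates)                 # step 510: execute associated actions
        if not keep_running():         # step 512: continue, or end (step 514)
            return
        buffer.clear()
```

As the text notes, a real implementation could instead run acquisition continuously in the background and apply a sliding-window estimator.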
  • FIG. 6, which shall be taken only as an example, is a generic block diagram of an electronic device capable of executing the suggested method.
  • the device can be a mobile terminal, a PDA, or a hand-held game console/game controller (preferably a wireless one), for example.
  • It comprises memory 604, divided between one or more physical memory chips, including necessary code, e.g. in a form of a computer program/application, and other data, e.g. current configuration and input sound data provided by various transducers.
  • Processing unit 602, e.g. a microprocessor, a DSP, a microcontroller, or a programmable logic chip, is required for the actual execution of the method in accordance with instructions stored in memory 604.
  • Display 606 and keypad 608, or other applicable user input means, provide the user with optional device control and data visualization means (~user interface). Respectively, various other output means, such as a number of loudspeakers, are optionally included to provide the users with the preferred response (not shown).
  • Data input means 610, either fixedly or detachably mounted, include microphone arrays and additional transducers such as accelerometers for inputting the necessary data to the device. Also wire-based or wireless transceivers (for transmission and/or reception) may be included for communication with other devices or accessories like external loudspeakers.
  • Optional (light-)projecting means 612 superimpose e.g. a virtual keypad/keyboard on a nearby surface.
  • The invention may be implemented as a combination of tailored software and more generic hardware, or exclusively through specialized hardware such as ASICs (Application Specific Integrated Circuits).
  • Software for carrying out the method of the invention can be delivered on a carrier medium such as a floppy disk, a CD-ROM, a memory card, or a hard disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and a device for performing actions (510) based on localization (508) of sound source(s). The estimated location of a sound source specifies launching a certain program or performing some other predetermined action. In addition to a microphone, an accelerometer can be used in the device for enhancing the localization results. The device may be a mobile terminal or a PDA (Personal Digital Assistant) with invention dependent additional software and/or hardware, for example.

Description

A method and a device for localizing a sound source and performing a related action
FIELD OF THE INVENTION
The present invention relates to user interfaces and input devices such as game controllers. Especially the invention pertains to sound source localization based control of mobile devices and software thereof.
BACKGROUND OF THE INVENTION
Modern mobile devices such as PDAs (Personal Digital Assistant) and mobile terminals, or generally portable or hand-held devices like game controllers, have evolved tremendously during the last 10 years or so. Likewise, means for providing such devices with the necessary input information have improved as to comfort, learning curve, reliability, size, etc. Classic input means (by the word "classic", in the sense of the electronics industry, referring herein to a period of only a few or a few tens of years before the present), such as a keyboard/keypad, a mouse, and various-style joysticks, have in certain areas retained their strong position, whereas in others newer means have also gained popularity.
Next, we shall consider a typical button arrangement (~keyboard) and a joystick as input means in more detail. See figure 1a for a standard keyboard 102 and mouse 104 set-up for controlling a computer, and figure 1b for an illustrative example of a basic-shape joystick 106, i.e. the most common game controller. Certainly joysticks with one multi-axis adjustable controller stick 108 or cross controller (D-pad) and a (minor) number of buttons 110, or button arrangements 102 like the de facto standard QWERTY keyboard, possibly accompanied by mouse 104, are useful in many traditional cases wherein neither the delivery speed of user input nor flexible replacement of different buttons, i.e. separate input locations, is that essential a factor.
Quite often the total number of buttons or other pressure-sensitive areas (e.g. a touch pad) is still somewhat limited due to the size of the surface area at hand for the purpose and to the increasing cost/technical difficulties of producing miniature-scale but still user-accessible button placements. Notwithstanding the problems possibly emerging from positioning the buttons over a device's surface area, another inconvenient issue remains; common "button-tapping" based physical interaction used for inputting user data or control information is limited to certain predetermined areas of the device, such areas being dedicated to that very specific purpose already during the product development and manufacturing stage.
Some modern applications also utilize direct contact-free user input means for providing the target devices with necessary information. Such means may exploit optical arrangements, for example. A PDA may incorporate a virtual keyboard feature or an external add-on by which a common keyboard layout is optically superimposed on a (preferably solid and smooth) surface. Then the device user's finger position(s) on these virtual keys is determined by optical/visual pattern detection techniques.
Even sound may be used to provide input to a device equipped with the necessary reception means such as a microphone. Audio-based, particularly speech-based, user interfaces quite often support a certain vocabulary, either a predefined or user-adjustable one, including a number of elements, e.g. words or syllables, that alone or as a certain group form commands linked to a certain function/action to be executed by the device. In practice it is a question of the audio/speech recognition ability (the algorithm itself, performance in noisy conditions, etc) of the device and the used vocabulary (size, computational (case-dependent) distance between elements, etc) how problem-free and useful such an approach finally is in different use scenarios.
Nevertheless, the numerous prior art techniques and technologies described above do not address the problem of remotely and flexibly inputting control or other information to a target device. Typical wired, infrared, or radio frequency remote controllers can be used for the purpose to a limited extent, but even they do not cope with all the requirements set by various applications. First, a separate remote controller unit may quite easily be lost or forgotten somewhere. Secondly, it nevertheless takes some additional space, which is not preferable e.g. in the case of otherwise maximally small-sized mobile devices that should be kept as portable as possible. As a third note, (real-time) multi-user applications like games would, with a high probability, require using multiple controllers, one per user. That fact, besides putting additional strain on the users in the form of acquiring controllers and particularly getting familiar with their different functionalities, also contravenes the design rules of convenience and simplicity for keeping the use experience as transparent as possible.
SUMMARY OF THE INVENTION
The object of the present invention is to circumvent or at least alleviate the aforementioned imperfections in remote device access. The object is met by introducing a method and a related device for audio localization based execution of actions such as various device control functions and/or data input.
In a basic solution, and considering e.g. a mobile terminal acting as a target device, when the user taps his/her fingers against or merely touches a surface near the terminal or on the terminal itself, a sound is generated. The resulting sound that propagates through the terminal body or air (as an exemplary medium) is detected by transducers, e.g. a microphone array consisting of 2 to N microphones. These audio signals are analysed in electric form by signal processing techniques, and as a result, the tapping location relative to the microphone array/target device is determined. Location can refer both to the distance and angle of bearing and to relative/absolute coordinates from the target device's viewpoint. The location estimate can be relative, in which case e.g. the target device is itself at the origin of the local coordinate system, as the device does not contain additional data to be able to localize itself. Alternatively, the target device may even determine the absolute position of the sound sources if it is aware of its own position due to e.g. GPS (Global Positioning System) capability or some localization service provided by the mobile network. The target device may optionally localize itself by receiving predetermined audio signals from sources with known positions and by analysing them. The location estimate of the sound source is finally used as control information for performing an action. Together with GPS or some other localization-enabling service, also orientation detection of the terminal might be desirable. E.g. a magnetometer can be used as a sort of compass. Correspondingly, gyros (gyroscopes) may be utilized.
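The tapping-location analysis sketched above commonly starts from a time-difference-of-arrival (TDOA) estimate between a pair of microphones. The description does not prescribe a particular algorithm; as an illustrative sketch only, the TDOA of two channels can be taken from the peak of their cross-correlation (NumPy assumed available):

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time-difference-of-arrival (seconds) between two
    microphone channels from the peak of their full cross-correlation.
    Positive values mean sig_a lags sig_b; negative values mean the
    sound reached sig_a's microphone first."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs

# Synthetic check: the same short click arrives 5 samples later at the
# right microphone than at the left one.
fs = 8000
click = np.hanning(16)
left = np.zeros(256)
left[100:116] = click
right = np.zeros(256)
right[105:121] = click
tau = estimate_tdoa(left, right, fs)   # negative: left heard it first
```

With the microphone spacing known, such pairwise delays are the raw material for the actual bearing/position estimation.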
The microphone array can be mounted in the terminal itself. A simple example is a ready-made stereo microphone installed therein. However, the microphone array can also be a separate device that establishes a data transfer connection to the terminal, e.g. over Bluetooth. The array may even be formed from a group of terminals or multipart devices containing the transducers. Terminals having only a single microphone can form an ad hoc array. In this case, proximity detection and time delay estimation or clock synchronization between terminals is preferred. The above basic solution may be cultivated for each particular application and more components can be flexibly added thereto, or respectively, some components may be taken away or replaced by other alternatives as explained in the detailed description hereinafter.
The utility of the invention arises from a plurality of issues. First of all, the invention can be implemented in many target devices substantially without installing new hardware or accessory devices. The suggested solution is technically feasible also in a global sense, as the necessary hardware for realizing it (processors, memory chips, and audio transducers such as microphones and A/D converters) is already adopted worldwide. The core of the invention resides in the program logic, be it software, tailored hardware, or both, and especially in how the available hardware resources are arranged to function together. From the usage point of view, the adoption curve for the input method proposed herein is short, as the user is not expected to learn the mechanics of any new device as such; it is basically enough to get familiar with one's own physique and maybe, on a general level, with the basic acoustic properties of the surrounding materials to be used as part of the sound generation process. For example, tapping foamed plastic does not typically make a sound loud enough, i.e. a sound that is not suppressed too much by background noise, for enabling reliable localization. Also synthetic or non-human (initiated) sounds may be applied as input. A variety of different applications may make use of the invention due to its natural flexibility, as audio sources and thus the actual emitted sounds are not limited to any particular type or (relative) location; on the device itself, for example, "tapping the body" is possible instead of mere button presses, or an external location like the table surface on which the device lies can be used as a "sound source" if knocked with fingers etc. Other benefits of the invention are mentioned in connection with the description of related embodiments.
In one aspect of the invention, a method for performing an action in an electronic device based on sound source localization, comprising the steps of
-receiving sound or related structural vibration data through a plurality of transducers, and
-determining on the basis of the received data one or more location estimates for one or more sound sources,
is characterized in that it further comprises the step of
-performing an action in the electronic device, said action being dependent on said one or more location estimates of said one or more sound sources.
In another aspect of the invention, an electronic device comprising data input means for receiving sound or related structural vibration data, further comprising processing means and memory means for processing and storing instructions and data, and in particular for determining on the basis of the received data one or more location estimates of one or more sound sources, is characterized in that it is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
In an embodiment of the invention a mobile terminal comprises a number of microphones and the necessary software to detect incoming sounds and localize the sound source(s). The estimated position of the sound source(s) is then used to control a parameter value of the terminal, to execute a certain action, etc. Such a solution can also be exploited in more direct control of the device by implementing a "virtual" or "audio" keyboard feature, in particular the response analysis part thereof, by localizing tapping events on a "keyboard" that may actually be e.g. a natural-size paper copy of a true keyboard layout placed on a table, or the aforementioned optically projected one. One additional use case is a virtual drum set application in which a number of tap locations are mapped to different drums such as a bass drum, snare, hi-hat, etc. Tapping a location with an active mapping triggers playing the associated drum sound. The received tap intensity may be used to adjust the playback dynamics. This approach can also be extended towards other audio applications, e.g. a piano/synthesizer simulator, where different piano/synth sounds are played based on sound source location estimates.
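The drum-set mapping described above can be miniaturized as follows; the zone coordinates (in cm, relative to the terminal) and drum names are purely hypothetical illustrations of mapping a location estimate and tap intensity to a sample trigger:

```python
# Hypothetical mapping from tap-location zones to drum samples,
# as in the virtual drum set use case. Each zone is (x_range, y_range).
DRUM_ZONES = {
    "bass":  ((-20, -10), (-5, 5)),
    "snare": ((-5, 5),    (-5, 5)),
    "hihat": ((10, 20),   (-5, 5)),
}

def trigger_drum(x, y, intensity):
    """Return (drum, velocity) for a localized tap, or None if the tap
    falls outside every active zone. Intensity scales playback dynamics
    and is clamped to the range [0, 1]."""
    for drum, ((x0, x1), (y0, y1)) in DRUM_ZONES.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            velocity = max(0.0, min(1.0, intensity))
            return drum, velocity
    return None
```

A real application would hand the returned pair to a sampler; the same table-driven structure extends directly to the piano/synthesizer simulator mentioned above.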
In another embodiment a mobile terminal, a PDA or a mobile gaming console is further adapted to detect vibration advancing from the sound source through the device structure, e.g. a cover. To enhance the detection of internally progressing vibrations, although such may be recognizable through microphones as well, an accelerometer is additionally installed in the device. The initial sound source location estimate acquired by utilizing the microphones can be adjusted by information provided by the accelerometer to make the detection more robust in a noisy environment. Likewise, audio keyboard applications and the like can benefit from the installation of the accelerometer; the target device may be situated on the very surface that is to be tapped with fingers, to first boost the accelerometer readings and thus indirectly also improve the overall localization performance. The extreme case is that the device cover itself is used as an audio source together with another object like a pen, a stylus, or a fingertip; this type of solution, although feasible even without the accelerometer in accordance with the first embodiment, could be used to implement an audio-type keypad in a mobile device, for example, to replace all or some of the original keys or to work along with those.
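One simple way to realize the accelerometer-assisted robustness described above is to accept an acoustically detected tap only when the accelerometer registered an impulse at (almost) the same moment. A minimal sketch; the 20 ms coincidence window is an assumed value, not one given in the description:

```python
def fuse_tap_detection(mic_tap_time, accel_events, window=0.02):
    """Confirm a microphone-detected tap against accelerometer impulses.
    The tap is accepted only if some accelerometer event occurred within
    `window` seconds of the acoustic detection time, which suppresses
    spurious acoustic detections caused by environmental noise."""
    return any(abs(t - mic_tap_time) <= window for t in accel_events)
```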
In a further embodiment of the invention a group of mobile terminals comprising microphones is connected together to form a microphone array, or in more general terms, a localization system consisting of two or more separable devices. The terminals shall be aware of the distance/time delay or clock synchronization between them to be able to properly analyse the received sounds as an aggregate.
Yet in a further embodiment the user teaches the target device to detect taps at certain locations and to associate controls with them. The user may, for example, want to define three arbitrary locations on a table such that when each of them is tapped a different control action will be performed. The preferred teaching methodology could involve first setting the device in a learning/teaching mode and tapping the first tap area several times. The device runs, for example, neural network or hidden Markov model based methods, first for learning the locations of new tap areas in the teaching mode and subsequently for classifying the tap positions in the use mode. This two-step technique can be seen as analogous to teaching a machine to automatically recognise a user's handwriting.
Accompanying dependent claims disclose embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Hereinafter the invention is described in more detail by reference to the attached drawings, wherein
Fig. 1A discloses a typical prior art computer control arrangement with a keyboard and a mouse.
Fig. 1B discloses a classic game controller with a control stick and a few buttons.
Fig. 2 illustrates the first embodiment of the invention wherein a mobile terminal has been equipped with necessary hardware such as a number of microphones and software for implementing the inventive concept.
Fig. 3 illustrates the second embodiment of the invention wherein a mobile terminal has been additionally provided with an accelerometer to further enhance the localization performance.
Fig. 4 illustrates the third embodiment of the invention in which three terminals constitute a system for audio localization and action execution.
Fig. 5 discloses a flow diagram of the method of the invention for localizing sound signals and performing actions.
Fig. 6 discloses a block diagram of an electronic device in accordance with the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION
Figures 1A and 1B were already covered in conjunction with the description of related prior art.
Figure 2 illustrates, by way of example only, mobile terminal 202 that contains at least two microphones 208 or one stereo microphone. Using only one (mono) microphone instead is possible for distance detection, but more accurate localization is then quite challenging. Either the microphones 208 have been fixedly installed in the terminal during manufacturing, or at least one of them is a so-called accessory/add-on device attached to the terminal by the user. Such an attachment may be wired or wireless, e.g. a Bluetooth or IR connection. When the user taps on nearby objects like the table on which terminal 202 has been positioned, the originating sounds 204, 206 will eventually reach the microphones, which convert the oscillation of the medium (air) into electric signals. The electric signals are funnelled to processing means, e.g. a microprocessor or a DSP, that use the microphone data for estimating the location of the sound sources and further for controlling a number of the terminal's functions or parameters thereby. Alternatively, determination of the location estimates and further control of the functions can be divided to be performed by e.g. several dedicated chips functionally connected with each other.
Optionally, tap detection could be improved by using a priori knowledge of the "excitation" signal. If the device knew the tap sound characteristics beforehand, performance in noisy conditions would be boosted. Further, this approach could be combined with the location-learning embodiment. Several tap sounds are recorded and an averaged estimate of the tap sound characteristics is stored. When the tap detection process is running, the detector may compare the expected tap sound to the detected version and decide whether the former was present in the microphone input signal. This method could enhance the sound detection/localization results if the surface tapped on is of a different material, makes a different sound, etc.

Microphone placement is substantially fictitious in the figure, and in reality a plurality of variables has to be evaluated for proper device design. Some microphone array architectures that have been found suitable for localization are listed below:
1) binaural localization (binaural headset in which microphones are in the headphones that the user wears, or alternatively, a stereo microphone in the terminal)
2) static linear multi-microphone array of N microphones
3) static arbitrary multi-microphone array of N microphones
4) dynamic arbitrary multi-microphone array (ad hoc array comprised of multiple devices, proximity detection between devices required)
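The a priori knowledge of the excitation signal discussed above can be illustrated with a matched-template detector: the stored, averaged tap sound is correlated against incoming frames and the peak normalized correlation is thresholded. This is only a sketch; it assumes zero-mean signals and uses an illustrative threshold of 0.7:

```python
import numpy as np

def matches_template(frame, template, threshold=0.7):
    """Decide whether a stored (averaged) tap template is present in a
    microphone frame, via peak normalized cross-correlation. Signals
    are assumed zero-mean; the threshold is an illustrative value."""
    L = len(template)
    corr = np.correlate(frame, template, mode="valid")
    # Sliding sum of squares gives the per-window frame energy,
    # aligned index-for-index with the "valid" correlation lags.
    energy = np.convolve(frame ** 2, np.ones(L), mode="valid")
    norms = np.linalg.norm(template) * np.sqrt(np.maximum(energy, 1e-12))
    return bool(np.max(corr / norms) >= threshold)
```

A frame that actually contains the template scores 1.0 at the matching lag; an unrelated transient of the same energy scores much lower and is rejected.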
Principles of binaural localization (two microphones) have been described in Roman, N., Wang, D., Brown, G., "Speech segregation based on sound localization", Journal of the Acoustical Society of America, Vol. 114, no. 4, Pt. 1, Oct. 2003.
One method for speech source localization is disclosed in Brandstein, M. S., Silverman, H. F., "A practical methodology for speech source localization with microphone arrays", Computer Speech and Language, no. 11, pp. 91-126, 1997.
As a special case, time-reverse-type methods may be applied in location detection: Fink, M., "Time reversed acoustics", Scientific American, pp. 67-73, Nov. 1999.
Detection of the sound source location either near the microphone area or farther away therefrom can be considered as two different problems. In the following, nearfield refers to a situation where the distance of the audio event is comparable to the distance between separate microphones, and farfield to a situation where the distance of the audio event is several times greater than the distance between separate microphones.
When the sound source is located in the nearfield, localization can be largely based on direct sound localization. It can be expected that the first early reflections from surfaces and walls reach the microphones only a relatively long time (4-20 ms) after the direct sound. In addition, the amplitude difference between the direct sound and reflected sound can be relatively large. Location detection in the nearfield can thus be accomplished without big problems. The detected wave fronts can be considered spherical, which makes accurate distance estimation possible. Meanwhile, in the farfield, the time difference between the direct sound and reflected sound can be short and the amplitude difference small. The room effect generally makes accurate location detection much more difficult than in the nearfield scenario. Incoming wave fronts are planar, which complicates localization in its turn.
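For the farfield case, where the incoming wavefronts are planar, the bearing of a source follows directly from the TDOA between two microphones a known distance apart. A small sketch, assuming a speed of sound of 343 m/s:

```python
import math

def farfield_bearing(tdoa, mic_distance, c=343.0):
    """Angle of incidence (radians, 0 = broadside) of a planar wavefront,
    from the time-difference-of-arrival between two microphones that are
    `mic_distance` metres apart. The sine argument is clamped so that
    slightly noisy TDOA values cannot push it outside [-1, 1]."""
    s = max(-1.0, min(1.0, c * tdoa / mic_distance))
    return math.asin(s)
```

For spherical nearfield wavefronts this simple formula no longer holds, which is one reason the two cases are treated as different problems above.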
Instead of utilizing only one microphone array, several arrays can be used for the detection. Overlapping regions improve location accuracy.
In the second embodiment of the invention, and in accordance with figure 3, compact game console 302 or a corresponding electronic apparatus of the same hand-held nature further includes at least one accelerometer 308 in addition to a number of microphones 310. When both microphone array (stereo microphone) 310 and accelerometer 308 are in use, taps 304, 306 are localizable once again using microphone signals, as was the case in the first embodiment above, but now taps are practically simultaneously also detected by accelerometer(s) 308 to provide an additional parameter or parameters for the signal analysis stage. This way the system can be made more robust against environmental noise. Likewise, accelerometer 308 can be used for detecting vibrations not directly introduced to console 302 itself.
Accelerometers 308 are basically transducers that measure changes in the velocity of related objects for determining their acceleration or vibration. Typically the measurements are performed by a somewhat basic arrangement including a mass (m) spring (k) system, a so-called pendulous accelerometer, wherein the displacement of the mass attached to the spring is proportional to and caused by the acceleration (~inertia) and is at the same time measurable by electrical means, i.e. the displacement causes a change in capacitance between two measurement points. Sometimes an accelerometer is made sensitive to a certain axis only, and therefore a plurality of them is needed to cover multiple axes. Also other accelerometer types, such as piezoelectric and electromagnetic ones, exist, and a person skilled in the art may test and compare different available solutions in connection with the implementation of the invention to find a best-tailored solution for each use case.
It shall be kept in mind, though, that microphones and accelerometers are not the only transducer types deemed applicable for providing console 302 or some other target device with fully useful input information. Thus either other audio signal transducers, or even transducers with a completely different technical approach (optical, thermal, etc), are exploitable for further analysis and sound source localization as, at least, supplementary means.
A detectable audio event such as tapping may be realized by hitting fingers against a surface. Alternatively, a stylus or some other pointing device, possibly producing more localizable sounds than a finger due to its smaller cross-section and therefore reduced contacting surface, could be used to improve the detectability and localizability of tapping. Understandably, the harder material of the stylus in comparison with relatively soft human skin also introduces less elasticity and therefore more power to be emitted in the impact between the stylus and the target surface.
Regardless of the used materials, even footsteps can be detected, as well as other preferred "natural" or synthetic audio events. Instead of detecting only the event, additional information about the detectable sound itself can be used to improve the detection process in a noisy environment. Time-domain and frequency-domain methods such as different transformations (e.g. Fourier analysis) can be used to analyse the sounds and to reduce problems caused by environmental noise and reverberation. One possibility is to detect the sharpness of the event. If the event is too smooth, for example the attack and decay times or spectral content do not fall within predetermined limits, it is not considered a proper event. One may use different types of tap sounds, e.g. fingertip or nails, to make a difference in the intended effect. A possible enhancement related to this is the utilization of direction-sensitive microphone surface layer 'hairs', which make a different sound when moved in different directions.
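The sharpness test mentioned above (rejecting events whose attack is too slow) might be realized roughly as follows; the attack-time and amplitude thresholds are illustrative assumptions, not values given in the description:

```python
import numpy as np

def is_sharp_event(envelope, fs, max_attack_ms=5.0, min_peak=0.1):
    """Accept an audio event only if its amplitude envelope rises from
    onset to peak quickly enough; smooth events (slow attack) and weak
    events (peak below min_peak) are rejected as improper events."""
    peak_idx = int(np.argmax(envelope))
    peak = float(envelope[peak_idx])
    if peak < min_peak:
        return False
    # Onset: first sample exceeding 10 % of the peak value.
    onset_idx = int(np.argmax(envelope > 0.1 * peak))
    attack_ms = (peak_idx - onset_idx) / fs * 1000.0
    return attack_ms <= max_attack_ms
```

A corresponding decay-time or spectral-content check could be added in the same spirit.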
The third embodiment of the invention is depicted in figure 4. Three devices, namely game console 402, mobile terminal 404, and PDA 406, have been functionally connected together, console 402 and terminal 404 via a wireless connection, e.g. Bluetooth, and terminal 404 and PDA 406 through a traditional wired connection, e.g. Firewire, to establish an aggregate entity. It is not a crucial detail from the invention's standpoint, however, whether the connected devices are essentially identical or not; it is sufficient that the devices are able to communicate with each other to the extent required by the invention. The devices can be functionally connected all together to form e.g. a circle-shaped entity, in which case one device multicasts its own measurement/analysis data to the other two devices, or alternatively, the devices may communicate serially with one device 404 remaining in the centre of the transmission chain as a data forwarding link. Irrespective of the used communication mode, the devices shall be kept aware of the distances/time delays between them. In case there is only one master device substantially taking care of the localization algorithms etc., the other devices do not have to be aware of the locations of the remaining ones. Devices belonging to the aggregate entity may mutually localize themselves by sending e.g. predetermined test (audio) signals to each other. Based on the received signal spectrum/power data and/or other parameters determinable from the received test signals, the devices can first estimate their internal placement before starting to localize external sources 408. Also the exchange of received microphone/accelerometer signals, analysis results, etc. can be actualised between devices. Alternatively/additionally, e.g. GPS-equipped devices may directly inform each other about their individual positions to set up the necessary initial location information of the devices forming the aggregate entity.
To attain an even more sophisticated solution, also (relational) further movement of the aggregate devices shall be possible by adaptively/continuously updating the necessary location information between them.
In general terms, determination of internal/external locations of sound sources may be handled by the aggregate entity either
a) independently, where each aggregate device is adapted to calculate its own or external sound sources' positions based on internal or at least aggregate-internal data,
b) concentratedly, where basically one or more devices perform all the location analysis on behalf of the other devices belonging to the aggregate, although the latter may provide the former with some (microphone/accelerometer) measurement data, or
c) jointly, where a certain type of information is calculated in one or more devices for a number of devices of the aggregate.
Additionally, to enable rapid information exchange between devices, they shall advantageously be synchronized to ease instant data transmission/reception. Alternatively, time code or other timing data may be included in exchanged data packets (measurement or analysis data) to enable the other device(s) to properly map such data into time-axis for further analysis/exploitation.
The third embodiment may be applicable especially in situations wherein devices 402, 404, 406 forming the aggregate entity do not otherwise possess the necessary capabilities to implement the localization applications described hereinbefore. Deficiencies may reside in hardware features, software features, or both. Secondly, whenever a plurality of users join together to exploit a localization-controlled application and have their localization-enabled devices with them, the otherwise unused redundancy can be cleverly put to use by utilizing many devices instead of merely one, for better localization performance, for example. Such applications could pertain to various games, plays, party events, etc.
A further embodiment of the invention follows the general pattern of the previous embodiments with a target device adapted to detect taps at certain locations and to associate controls with them.
Referring back to figure 2, the localization process is now enhanced at the expense of its transparency. The user determines three (or some other preferred number of) arbitrary locations on a table, e.g. locations corresponding to taps 204, 206 and an additional third one not shown. When each of the locations is tapped, a different control action shall be performed. The device could be adapted to the prevailing use case by the user by first setting the device in a learning mode, e.g. through some UI function, and then by tapping several times on the first tap area. The device runs a classifier such as a neural network or hidden Markov model to learn the position of the area. The procedure is repeated for each location. The user may manually define the number of recognizable locations and other related parameters like sound characteristics through the UI and, for example, store them as default values for future use, or at least some of the parameters may be automatically detected by the device, if e.g. despite the possible variance in taps related to the same location the overall separability between the different locations is good enough according to some predetermined classification criteria.
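The teach-then-classify flow could, in miniature, look like the sketch below. The description suggests neural network or hidden Markov model based classifiers; a nearest-centroid classifier over per-tap feature vectors (e.g. inter-microphone TDOAs) is substituted here purely to keep the example short:

```python
import math

class TapLocationLearner:
    """Minimal stand-in for the learning/teaching mode: store feature
    vectors per taught location, then classify new taps by nearest
    centroid. Real implementations would use stronger classifiers."""

    def __init__(self):
        self.samples = {}          # label -> list of feature vectors

    def teach(self, label, features):
        """Learning mode: record one tap's feature vector for a label."""
        self.samples.setdefault(label, []).append(list(features))

    def _centroid(self, vecs):
        n = len(vecs)
        return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

    def classify(self, features):
        """Use mode: return the label whose centroid is closest."""
        best, best_d = None, math.inf
        for label, vecs in self.samples.items():
            d = math.dist(self._centroid(vecs), features)
            if d < best_d:
                best, best_d = label, d
        return best
```

The per-location variance collected during teaching could additionally be used to implement the separability check mentioned above.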
A potential musical (including rhythmical) application of this arrangement is e.g. a virtual drum simulator. Each drum is mapped to a certain location on the table. Tapping such locations then triggers the associated drum samples. The tap amplitude can be used to control the dynamics of the sample playback.
Exemplary embodiments set forth above are not contradictory and therefore their features may be creatively combined and cultivated for each emerging scenario without significant problems by a person skilled in the art. For example, keypad symbols could be projected onto a table surface by optical projection techniques. Then, the sound generated by tapping at a certain location representing a key would be detected by the microphone array and the detected location mapped to the corresponding symbol presented in the projected keypad. For example, a keypad of 4 x 3 buttons could be implemented this way. Time constants shall be selected to carry out the functionality of a multipurpose keypad and to detect whether the user has tapped the same key location once, twice or three times to produce e.g. the letter j, k or l, as on a conventional keypad of a contemporary mobile phone.
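The multi-tap time-constant logic mentioned above might be sketched as follows; the 0.8 s inter-tap gap and the letter set are assumed values for illustration:

```python
def multitap_letter(tap_times, letters=("j", "k", "l"), gap=0.8):
    """Resolve successive taps on the same projected key into a letter,
    as on a conventional multi-tap phone keypad: taps closer together
    than `gap` seconds cycle through the key's letters, while a longer
    pause restarts the cycle."""
    count = 1
    for prev, cur in zip(tap_times, tap_times[1:]):
        count = count + 1 if cur - prev <= gap else 1
    return letters[(count - 1) % len(letters)]
```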
According to the common principle of reciprocity, instead of detecting the location of a sound event directly, the location of the microphone array relative to the sound event can be estimated. The location of the device, either relative or absolute depending on the known parameters, could be detected versus a reference tone (possibly ultrasonic to avoid disturbing the users) or some other reference audio event as explained hereinbefore. Multiple sound events originating from different (known) locations can be used to improve the localization. Thus, depending on the available a priori information about the reference tone characteristics, the absolute location of the sound source sending the reference tone, and the transmission path properties, the microphone array's orientation and possibly relative or even absolute location may be locally determinable from the received sound(s). A basic principle applies: the more information available, the more accurate the localization results that can be expected.
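When the distances to several reference sources at known positions are available (e.g. from the travel times of reference tones), the device's own position follows from elementary trilateration. A 2-D sketch with three anchors, assuming exact, noise-free distances:

```python
def trilaterate(anchors, distances):
    """Solve the 2-D position of the device from its distances to three
    reference sound sources at known positions. Subtracting the circle
    equations pairwise removes the quadratic terms, leaving a linear
    2x2 system."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2 ** 2 - r3 ** 2 + x3 ** 2 - x2 ** 2 + y3 ** 2 - y2 ** 2
    det = a1 * b2 - a2 * b1          # zero if the anchors are collinear
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```

With noisy distances, or more than three reference events, a least-squares formulation of the same linear system would be the natural extension.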
Acoustics-based localization associated with larger distances may be used in game applications to measure the position of players or some other factor. Players themselves can also utilize additional communication-enabled gear such as accelerometers for controlling purposes. This could make possible some new game concepts in which a player receives instructions (by audio or visual representation) to move to some physical location in a space. After a certain duration of time an audio event is played from a loudspeaker that everybody can hear, and each device detects its own location versus this reference tone. If the player is in the right/wrong place, appropriate feedback is provided.
Considering the applicability of the invention even further, microphones can replace some buttons of a mobile terminal or another preferred target device. In this case the main purpose of the "buttons" is to make a sound that can be detected by the microphone array mounted in the terminal; conventional button mechanics could be set aside. Likewise, a sensitive display (~touch screen) can be implemented by microphones.

A flow diagram illustrating the general inventive concept and the core of the invention as a step-by-step method is shown in figure 5. In step 502 the method is initialised, which may refer to, for example, launching the necessary software, initialising the related variables, and setting up connections to other devices in case of multiple device interconnection. In step 504 the device(s) receive and process sound signals, by this referring to producing a number of audio samples with an A/D converter inputting microphone output data, for example, via the available microphone arrays and additional transducers such as accelerometers. Multiple devices may also exchange input data between them if seen useful. Transducers converting the input signals into electrical counterparts may also encompass more sophisticated calculation/analysis means (both hardware and software), e.g. frequency-axis analysis/sound parameterisation tools, to refine the raw measurement data directly into a more usable form. Step 504 may further incorporate an optional learning/teaching stage, i.e. the receipt of test signals for estimating the predetermined (~allowed) sound source locations and/or sound characteristics.
In step 506 it is checked whether a data acquisition period has ended, by comparing a regularly updated time counter (~timer) or a mere sample counter value with a predetermined threshold, or by some other means. If so, method execution continues from step 508; otherwise step 504 is repeated. Alternatively, the data acquisition process of step 504 may be continuous and happen in the background, whereas the rest of the procedure steps still take place either only occasionally, for example when there is enough new input data for further processing/analysis, or continuously, i.e. a sliding-window type or some other continuous estimation process is applied to the substantially continuously produced input data. When working in the digital domain, the word "continuous" can, however, refer to a discrete time resolution of one sample, for example.
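The step 506 check and the sliding-window alternative can be sketched together as follows; the window length, class name and sample values are illustrative assumptions, not from the patent:

```python
from collections import deque

WINDOW_SAMPLES = 4  # threshold for one acquisition period (tiny, for demonstration)

class SlidingAcquisition:
    """Step 504 acquisition with a step 506 style readiness check."""

    def __init__(self, window=WINDOW_SAMPLES):
        # deque with maxlen keeps only the newest samples: a sliding window
        self.buffer = deque(maxlen=window)

    def feed(self, sample):
        """Store one A/D sample; return True once the window is full,
        i.e. the condition for proceeding to localization (step 508)."""
        self.buffer.append(sample)
        return len(self.buffer) == self.buffer.maxlen

acq = SlidingAcquisition()
ready = [acq.feed(s) for s in (0.1, -0.2, 0.05, 0.3, -0.1)]
# Once the first full window is available, the check stays True for every
# new sample, so estimation can run continuously in the background.
```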
In step 508 location estimate(s) are determined for the sound source(s). Depending on the localization method used and the application, one or more sources may be determinable from the same source data. The actual methodology for performing the localization step does not belong to the scope of the invention; as briefly introduced earlier, the person skilled in the art may utilize whichever techniques seem to fit the prevailing scenario best.
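As one example of a technique the skilled person might plug into step 508, a time-difference-of-arrival (TDOA) estimate between two microphones can be obtained by cross-correlation. The signals below are synthetic and the implementation is a plain sketch; practical systems often prefer variants such as generalized cross-correlation:

```python
def xcorr_at(a, b, lag):
    """Cross-correlation of a[n] with b[n + lag] over the valid overlap."""
    if lag >= 0:
        pairs = zip(a, b[lag:])
    else:
        pairs = zip(a[-lag:], b)
    return sum(x * y for x, y in pairs)

def tdoa_samples(a, b, max_lag):
    """Lag (in samples) at which b best matches a; a positive lag means
    the sound reached microphone a before microphone b."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: xcorr_at(a, b, lag))

# Synthetic pulse arriving at mic_a two samples before mic_b:
mic_a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lag = tdoa_samples(mic_a, mic_b, max_lag=3)  # 2 for this synthetic pair
```

At sampling rate fs the arrival-time difference is lag / fs seconds, which, together with the known microphone geometry, constrains the source direction or position.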
Step 510 indicates the execution of actions associated with the localization results. The associations may be application-specific and at least partly user-definable on a case-by-case basis. For example, a certain application-specific parameter value (e.g. playback volume) can depend on the estimated location of a certain sound source; one localization result may trigger launching/closing a program; another could indicate typing a certain character on the display (and optionally inserting it in the displayed document), as with the virtual/audio keyboard application, etc. Naturally, the localization result of a sound event can also result in a certain response being given to the user of the device through visual, tactile or acoustic means. The localization arrangement may be used to control different functionalities of the device (e.g. scroll up/down, play a CD, etc.). Associations of this nature are relatively easy to implement by modern programming means and, as the total number of different applications is self-evidently enormous, there is no point in attempting an exhaustive listing here.
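Such location-to-action associations can be sketched as a simple lookup; the region bounds, action names and state layout below are hypothetical, chosen only to illustrate the dispatch pattern:

```python
# Hypothetical regions on the device surface and their associated actions.
REGIONS = {
    "volume_up":   (0.0, 0.5),  # x-range of a tap location, metres
    "volume_down": (0.5, 1.0),
}
ACTIONS = {
    "volume_up":   lambda state: {**state, "volume": state["volume"] + 1},
    "volume_down": lambda state: {**state, "volume": state["volume"] - 1},
}

def dispatch(x, state):
    """Execute the action whose region contains location estimate x (step 510)."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= x < hi:
            return ACTIONS[name](state)
    return state  # no associated action for this location: state unchanged

s1 = dispatch(0.2, {"volume": 5})  # tap localized in the "volume_up" region
s2 = dispatch(0.7, s1)             # tap localized in the "volume_down" region
s3 = dispatch(1.5, s2)             # outside all regions: no effect
```

The same table-driven pattern extends to launching programs, inserting characters, or any of the other associations listed above.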
Step 512 implies the conditional termination of the method upon finishing the execution of the associated actions of step 510. If the method execution is to be continued, execution reverts back to step 504; if not, the method ends in step 514.
Figure 6, which shall be taken only as an example, is a generic block diagram of an electronic device capable of executing the suggested method. The device can be a mobile terminal, a PDA, or a hand-held game console/game controller (preferably a wireless one), for example. It comprises memory 604, divided between one or more physical memory chips, including the necessary code, e.g. in the form of a computer program/application, and other data, e.g. the current configuration and input sound data provided by various transducers. Processing unit 602, e.g. a microprocessor, a DSP, a microcontroller, or a programmable logic chip, is required for the actual execution of the method in accordance with instructions stored in memory 604. Display 606 and keypad 608, or other applicable user input means, provide the user with optional device control and data visualization means (~user interface). Respectively, various other output means, such as a number of loudspeakers, are optionally included to give the user the preferred response (not shown). Data input means 610, either fixedly or detachably mounted, include microphone arrays and additional transducers such as an accelerometer for inputting the necessary data to the device. Wire-based or wireless transceivers (for transmission and/or reception) may also be included for communication with other devices or accessories such as external loudspeakers. Optional (light-)projecting means 612 superimpose e.g. a virtual keypad/keyboard on a nearby surface. The invention may be implemented as a combination of tailored software and more generic hardware, or exclusively through specialized hardware such as ASICs (Application Specific Integrated Circuits).
Software for carrying out the method of the invention can be delivered on a carrier medium such as a floppy disk, a CD-ROM, a memory card, or a hard disk.
The scope of the invention is found in the following claims. Although certain focused examples were given throughout the text about the invention's applicability, feasible method steps, or related device internals, their purpose was not to limit the usage area of the core of the invention to any particular field, which should be evident to any rational reader. Rather, the invention shall be considered a novel, practical method for providing control or other information to an electronic apparatus by utilizing especially audio signals and the localization thereof.

Claims
1. A method for performing an action in an electronic device based on sound source localization, comprising the steps of
-receiving sound or related structural vibration data through a plurality of transducers (504),
-determining on the basis of the received data one or more location estimates for one or more sound sources (508), characterized in that it further comprises the step of
-performing an action in the electronic device (510), said action being dependent on said one or more location estimates of said one or more sound sources.
2. The method of claim 1, wherein said plurality of transducers includes a microphone.
3. The method of any of claims 1-2, wherein said plurality of transducers includes an accelerometer.
4. The method of claim 2, wherein two or more microphones form a microphone array.
5. The method of claim 4, wherein said array comprises at least one of the following: binaural array, static linear multi-microphone array, static arbitrary multi-microphone array, and dynamic arbitrary multi-microphone array.
6. The method of any of claims 1-5, wherein said action is at least one of the following: controlling the device's functionality, updating a parameter value, launching/closing an application, inserting a character or a symbol on a device display, inserting a character or a symbol in a document visible on the display, carrying out a control event relating to a game, and giving a response to the user of the device.
7. The method of any of claims 1-6, wherein a number of key areas are projected on an external surface, and touching or tapping a surface onto which a certain key area has been projected is detected, the tapping location estimate then being determined for performing the action.
8. The method of claim 1, wherein a plurality of devices forms an aggregate device, said aggregate device being the electronic device.
9. The method of claim 8, wherein a connection between two devices belonging to said plurality of devices is either wireless or wired.
10. The method of claim 8, wherein a device belonging to said plurality of devices determines its position in relation to another device belonging to said plurality of devices.
11. The method of claim 1, wherein said receiving step includes receipt of sound or structural vibration caused by touching or tapping the surface of the electronic device.
12. The method of claim 11, wherein said receiving step includes receiving sound or structural vibration caused by touching a display or a key area of the electronic device.
13. The method of any of claims 1-12, wherein said electronic device is portable.
14. The method of any of claims 1-13, further including a learning stage to estimate at least one of the following based on received sound or vibration data: allowed or predetermined sound source location, and characteristic of a sound emitted by a sound source.
15. The method of any of claims 1-14, wherein orientation of the electronic device is estimated preferably by utilizing a magnetometer or a gyro.
16. The use of method of any of claims 1-15 substantially in a game or musical application.
17. An electronic device comprising data input means (610) for receiving sound or related structural vibration data, further comprising processing means (602) and memory means (604) for processing and storing instructions and data, and in particular, for determining on the basis of the received data one or more location estimates of one or more sound sources (508), characterized in that said device is adapted to perform an action dependent on said one or more location estimates of said one or more sound sources.
18. The electronic device of claim 17, wherein said data input means include at least one transducer.
19. The electronic device of claim 18, wherein said transducer is a microphone or an accelerometer.
20. The electronic device of any of claims 17-19, wherein two or more microphones, either included in or functionally connected to the electronic device, form a microphone array.
21. The electronic device of claim 20, wherein said array comprises at least one of the following: binaural array, static linear multi-microphone array, static arbitrary multi-microphone array, and dynamic arbitrary multi-microphone array.
22. The electronic device of any of claims 17-21, wherein said action is at least one of the following: controlling the device's functionality, updating a parameter value, launching/closing an application, inserting a character or a symbol on a device display, inserting a character or a symbol in a document visible on the display, carrying out a control event relating to a game, and generating a response to the device user.
23. The electronic device of any of claims 17-22, further comprising projecting means (612) for superimposing a number of key areas on an external surface, whereby the processing means (602) are adapted to localize the key area subjected to a touch or tapping as the location estimate of the sound source.
24. The electronic device of claim 17, adapted to receive and localize sound or structural vibration caused by touching or tapping the surface of the electronic device.
25. The electronic device of claim 24, adapted to receive and localize sound or structural vibration caused by touching a display or a keypad of the electronic device.
26. The electronic device of claim 17 that is an aggregate entity comprising a plurality of separable devices.
27. The electronic device of claim 26, wherein a connection between two devices belonging to said plurality of devices is either wireless or wired.
28. The electronic device of claim 26, wherein a device belonging to said plurality of devices is adapted to determine its position in relation to another device belonging to said plurality of devices.
29. The electronic device of claim 17 that is substantially a mobile terminal, a PDA (Personal Digital Assistant), a desktop game console, a hand-held game console, a wireless game controller, or an input device.
30. The electronic device of claim 29 that is GSM (Global System for Mobile Communications), WLAN (Wireless Local Area Network) or UMTS (Universal Mobile Telecommunication System) compatible.
31. The electronic device of any of claims 17-30, further adapted to localize itself based on one or more received sound signals.
32. The device of any of claims 17-31, further adapted to estimate at least one of the following based on received sound or vibration data: allowed or predetermined sound source location, and characteristic of a sound emitted by a sound source.
33. The device of any of claims 17-32, further adapted to estimate its orientation by preferably utilizing a magnetometer or a gyro.
34. The device of any of claims 17-33 that is a musical accessory.
35. The device of any of claims 17-34, adapted to play a sound associated with the estimated location.
36. A computer program comprising code means to execute the method steps of claim 1.
37. A carrier medium carrying the computer executable program of claim 36.
PCT/FI2004/000805 2004-12-29 2004-12-29 A method and a device for localizing a sound source and performing a related action WO2006070044A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2004/000805 WO2006070044A1 (en) 2004-12-29 2004-12-29 A method and a device for localizing a sound source and performing a related action


Publications (1)

Publication Number Publication Date
WO2006070044A1 true WO2006070044A1 (en) 2006-07-06

Family

ID=36614535


Country Status (1)

Country Link
WO (1) WO2006070044A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1182643A1 (en) * 2000-08-03 2002-02-27 Sony Corporation Apparatus for and method of processing audio signal
US20020167862A1 (en) * 2001-04-03 2002-11-14 Carlo Tomasi Method and apparatus for approximating a source position of a sound-causing event for determining an input used in operating an electronic device
GB2385125A (en) * 2002-02-06 2003-08-13 Soundtouch Ltd Using vibrations generated by movement along a surface to determine position
US20040004600A1 (en) * 2000-02-17 2004-01-08 Seiko Epson Corporation Input device using tapping sound detection


Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008047294A3 (en) * 2006-10-18 2008-06-26 Koninkl Philips Electronics Nv Electronic system control using surface interaction
WO2008047294A2 (en) * 2006-10-18 2008-04-24 Koninklijke Philips Electronics N.V. Electronic system control using surface interaction
US20110018825A1 (en) * 2009-07-27 2011-01-27 Sony Corporation Sensing a type of action used to operate a touch panel
CN101968696A (en) * 2009-07-27 2011-02-09 索尼公司 Sensing a type of action used to operate a touch panel
EP2280337A3 (en) * 2009-07-27 2011-06-22 Sony Corporation Sensing a type of action used to operate a touch panel
WO2011138071A1 (en) * 2010-05-07 2011-11-10 Robert Bosch Gmbh Device and method for operating a device
US9226069B2 (en) 2010-10-29 2015-12-29 Qualcomm Incorporated Transitioning multiple microphones from a first mode to a second mode
WO2012058465A3 (en) * 2010-10-29 2012-08-23 Qualcomm Incorporated Transitioning multiple microphones from a first mode to a second mode
WO2012098425A1 (en) * 2011-01-17 2012-07-26 Nokia Corporation An audio scene processing apparatus
US9851841B2 (en) 2011-10-18 2017-12-26 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
US9465494B2 (en) 2011-10-18 2016-10-11 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
WO2013059488A1 (en) * 2011-10-18 2013-04-25 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
US10642407B2 (en) 2011-10-18 2020-05-05 Carnegie Mellon University Method and apparatus for classifying touch events on a touch sensitive surface
EP2926228A4 (en) * 2013-01-08 2016-09-07 Sony Corp Controlling a user interface of a device
WO2014109916A1 (en) 2013-01-08 2014-07-17 Sony Corporation Controlling a user interface of a device
US9329688B2 (en) 2013-02-28 2016-05-03 Qeexo, Co. Input tools having vibro-acoustically distinct regions and computing device for use with the same
US11175698B2 (en) 2013-03-19 2021-11-16 Qeexo, Co. Methods and systems for processing touch inputs based on touch type and touch intensity
US9864454B2 (en) 2013-03-25 2018-01-09 Qeexo, Co. Method and apparatus for classifying finger touch events on a touchscreen
US10949029B2 (en) 2013-03-25 2021-03-16 Qeexo, Co. Method and apparatus for classifying a touch event on a touchscreen as related to one of multiple function generating interaction layers
US11262864B2 (en) 2013-03-25 2022-03-01 Qeexo, Co. Method and apparatus for classifying finger touch events
US10599250B2 (en) 2013-05-06 2020-03-24 Qeexo, Co. Using finger touch types to interact with electronic devices
US10969957B2 (en) 2013-05-06 2021-04-06 Qeexo, Co. Using finger touch types to interact with electronic devices
US9355418B2 (en) 2013-12-19 2016-05-31 Twin Harbor Labs, LLC Alerting servers using vibrational signals
US11048355B2 (en) 2014-02-12 2021-06-29 Qeexo, Co. Determining pitch and yaw for touchscreen interactions
US9778783B2 (en) 2014-02-12 2017-10-03 Qeexo, Co. Determining pitch and yaw for touchscreen interactions
US10599251B2 (en) 2014-09-11 2020-03-24 Qeexo, Co. Method and apparatus for differentiating touch screen users based on touch event analysis
US9329715B2 (en) 2014-09-11 2016-05-03 Qeexo, Co. Method and apparatus for differentiating touch screen users based on touch event analysis
US11619983B2 (en) 2014-09-15 2023-04-04 Qeexo, Co. Method and apparatus for resolving touch screen ambiguities
US9864453B2 (en) 2014-09-22 2018-01-09 Qeexo, Co. Method and apparatus for improving accuracy of touch screen event analysis by use of edge classification
US11029785B2 (en) 2014-09-24 2021-06-08 Qeexo, Co. Method for improving accuracy of touch screen event analysis by use of spatiotemporal touch patterns
US10606417B2 (en) 2014-09-24 2020-03-31 Qeexo, Co. Method for improving accuracy of touch screen event analysis by use of spatiotemporal touch patterns
US10712858B2 (en) 2014-09-25 2020-07-14 Qeexo, Co. Method and apparatus for classifying contacts with a touch sensitive device
US10282024B2 (en) 2014-09-25 2019-05-07 Qeexo, Co. Classifying contacts or associations with a touch sensitive device
US10095402B2 (en) 2014-10-01 2018-10-09 Qeexo, Co. Method and apparatus for addressing touch discontinuities
CN104598193A (en) * 2014-12-29 2015-05-06 联想(北京)有限公司 Information processing method and electronic equipment
CN104598193B (en) * 2014-12-29 2020-04-24 联想(北京)有限公司 Information processing method and electronic equipment
US10564761B2 (en) 2015-07-01 2020-02-18 Qeexo, Co. Determining pitch for proximity sensitive interactions
US10642404B2 (en) 2015-08-24 2020-05-05 Qeexo, Co. Touch sensitive device with multi-sensor stream synchronized data
US10365763B2 (en) 2016-04-13 2019-07-30 Microsoft Technology Licensing, Llc Selective attenuation of sound for display devices
US10191595B2 (en) 2016-05-03 2019-01-29 Lg Electronics Inc. Electronic device with plurality of microphones and method for controlling same based on type of audio input received via the plurality of microphones
KR20170124890A (en) * 2016-05-03 2017-11-13 엘지전자 주식회사 Electronic device and method for controlling the same
KR102434104B1 (en) * 2016-05-03 2022-08-19 엘지전자 주식회사 Electronic device and method for controlling the same
WO2017191894A1 (en) * 2016-05-03 2017-11-09 Lg Electronics Inc. Electronic device and controlling method thereof
US9922637B2 (en) 2016-07-11 2018-03-20 Microsoft Technology Licensing, Llc Microphone noise suppression for computing device
CN109753191A (en) * 2017-11-03 2019-05-14 迪尔阿扣基金两合公司 A kind of acoustics touch-control system
CN109753191B (en) * 2017-11-03 2022-07-26 迪尔阿扣基金两合公司 Acoustic touch system
US11009989B2 (en) 2018-08-21 2021-05-18 Qeexo, Co. Recognizing and rejecting unintentional touch events associated with a touch sensitive device
US10942603B2 (en) 2019-05-06 2021-03-09 Qeexo, Co. Managing activity states of an application processor in relation to touch or hover interactions with a touch sensitive device
US11231815B2 (en) 2019-06-28 2022-01-25 Qeexo, Co. Detecting object proximity using touch sensitive surface sensing and ultrasonic sensing

Similar Documents

Publication Publication Date Title
WO2006070044A1 (en) A method and a device for localizing a sound source and performing a related action
US9928835B1 (en) Systems and methods for determining content preferences based on vocal utterances and/or movement by a user
CN109256146B (en) Audio detection method, device and storage medium
Essl et al. Interactivity for mobile music-making
CN103235642B (en) Virtual musical instrument system that a kind of 6 dimension sense organs are mutual and its implementation
JP6737996B2 (en) Handheld controller for computer, control system for computer and computer system
TW200813795A (en) Method, apparatus, and computer program product for entry of data or commands based on tap detection
WO2014179096A1 (en) Detection of and response to extra-device touch events
WO2020059245A1 (en) Information processing device, information processing method and information processing program
JP7140083B2 (en) Electronic wind instrument, control method and program for electronic wind instrument
WO2010112677A1 (en) Method for controlling an apparatus
CN105373220B (en) It is interacted using position sensor and loudspeaker signal with the user of device
US12100380B2 (en) Audio cancellation system and method
US20220405047A1 (en) Audio cancellation system and method
US20240221714A1 (en) Transfer function generation system and method
US20230252963A1 (en) Computing Device
CN214504972U (en) Intelligent musical instrument
CN109739388B (en) Violin playing method and device based on terminal and terminal
JP6111526B2 (en) Music generator
Overholt Advancements in violin-related human-computer interaction
KR100650890B1 (en) Mobile communication terminal having music player and music playing method in that terminal
JP7353136B2 (en) controller
CN107404581B (en) Musical instrument simulation method and device for mobile terminal, storage medium and mobile terminal
CN109801613B (en) Terminal-based cello playing method and device and terminal
US20220270576A1 (en) Emulating a virtual instrument from a continuous movement via a midi protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 04805200
Country of ref document: EP
Kind code of ref document: A1
WWW Wipo information: withdrawn in national office
Ref document number: 4805200
Country of ref document: EP