WO2010112677A1 - Method for controlling an apparatus - Google Patents

Method for controlling an apparatus

Info

Publication number
WO2010112677A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
gesture
sonic
discrete
feature
Application number
PCT/FI2010/050251
Other languages
French (fr)
Inventor
Antti JYLHÄ
Cumhur Erkut
Original Assignee
Aalto-Korkeakoulusäätiö
Application filed by Aalto-Korkeakoulusäätiö
Publication of WO2010112677A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G08 SIGNALLING
    • G08C TRANSMISSION SYSTEMS FOR MEASURED VALUES, CONTROL OR SIMILAR SIGNALS
    • G08C23/00 Non-electrical signal transmission systems, e.g. optical systems
    • G08C23/02 Non-electrical signal transmission systems, e.g. optical systems using infrasonic, sonic or ultrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/06 Receivers
    • H04B1/16 Circuits
    • H04B1/20 Circuits for coupling gramophone pick-up, recorder output, or microphone to receiver
    • H04B1/202 Circuits for coupling gramophone pick-up, recorder output, or microphone to receiver by remote control

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a method for controlling an apparatus, which comprises obtaining (530) a discrete feature of an event from a received sonic gesture, obtaining (540) a continuous feature relating to the event from the received sonic gesture, and controlling (560) the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event. The invention also relates to the apparatus, which is controlled by the obtained discrete feature of the event and the obtained continuous feature relating to the event, a computer program product for controlling the apparatus, and a carrier medium comprising the computer program product for controlling the apparatus.

Description

METHOD FOR CONTROLLING AN APPARATUS
TECHNICAL FIELD OF THE INVENTION
The invention relates to a method for controlling an apparatus by a sonic gesture. The invention also relates to an apparatus, which is controlled by a sonic gesture. In addition, the invention relates to a computer program product for controlling an apparatus by a sonic gesture. Furthermore, the invention relates to a carrier medium, which comprises a computer program product for controlling an apparatus by a sonic gesture.
BACKGROUND OF THE INVENTION
The lives and environments of modern people are filled with appliances and applications, and people try to interact with them. This interaction is still largely bound to conventional means of commanding devices with e.g. buttons, keyboards, and point-and-click interfaces. For example, the user of a portable music player is forced to take the player out of the pocket and press buttons in order to change the song being played.
Similar restrictions apply to other mobile devices, as well as to household appliances, which do not enable remote interaction. Even remote controls for televisions and game consoles require hands-on actions and also engage the eyes of the user. As a consequence, the user is detached from his/her immediate activities and surroundings.
People interact with their surroundings continuously and get continuous multimodal information in response to their continuous multimodal actions. Sound is a key modality in interaction between people, not only in speech but also in sonic gestures such as hand clapping, whistling, and non-speech utterances. These sonic gestures are often culturally bound to specific meanings and contexts, and can be seen as an additional language alongside written and spoken languages; a language that can be much simpler but still efficient. The use of such sonic gestures in interactive contexts between humans and their appliances, e.g. personal computers, mobile phones, and stereo sets, should therefore provide natural means for interacting with these appliances.
Current user interface solutions for executing the interaction between a user and a device are based mostly on graphical user interfaces, buttons, or point-and-click approaches. In addition, there are some attempts to realize hands-free or eyes-free interaction, such as audio input and speech recognition with a computer, but these are in general notorious for their lack of robustness and reliability.
Furthermore, there are several applications demonstrating the detection of single sound events, but they are restricted to event-based interactions, which is not in line with the way people naturally interact with their surroundings.
The natural way to interact with the surroundings also includes continuous interaction, of which turning a door handle is an everyday example. When the handle is turned, the force applied by the person who wants to open the door varies continuously between that person and the handle, instead of following an event-based "press the button and the door opens" paradigm. Similar natural closed-loop continuous interactions are everywhere, but they are not implemented in mobile devices and other appliances.
SUMMARY
One object of the invention is to provide richness in the information carried by sonic gestures, which are used in audio-based remote control of apparatuses.
The object of the invention is fulfilled by providing a method, wherein a discrete feature of an event is obtained from a received sonic gesture, a continuous feature relating to the event is obtained from the received sonic gesture, and the apparatus is controlled by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
The object of the invention is also fulfilled by providing an apparatus, which is configured to obtain a discrete feature of an event from a received sonic gesture, obtain a continuous feature relating to the event from the received sonic gesture, and be controlled by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
The object of the invention is also fulfilled by providing a computer program product, which, when the computer program product is run in a computer, obtains a discrete feature of an event from a received sonic gesture, obtains a continuous feature relating to the event from the received sonic gesture, and controls the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
The object of the invention is also fulfilled by providing a carrier medium, which comprises a computer program product, which, when the computer program product is run in a computer, obtains a discrete feature of an event from a received sonic gesture, obtains a continuous feature relating to the event from the received sonic gesture, and controls the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
According to an embodiment of the invention the user of the apparatus, such as a mobile device, claps his/her hands at a certain tempo in order to produce a sonic gesture indicating a control command for the apparatus. The apparatus receives the user-induced sonic gesture by a microphone and transforms the audio input into a form from which the control command can be extracted by means of suitable software stored in the memory of the apparatus. The software determines from the received and transformed input at least one discrete parameter and one continuous parameter and associates the determined discrete parameter(s) and continuous parameter(s) with a desired control command, which is then performed in the apparatus. Finally, the result of the control command is indicated to the user e.g. by an audio output through the loudspeaker of the apparatus or a visible output through the display of the apparatus.
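For illustration only, a minimal Python sketch of one way such a clap-to-command chain could be realized, assuming the microphone signal is available as a mono NumPy array; the energy-threshold detector, the 100 ms debounce, and the command mapping are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def detect_claps(signal, sr, frame=512, threshold=6.0):
    """Crude onset detector: flag frames whose short-term energy jumps
    well above a running average; a 100 ms debounce merges double hits."""
    n = len(signal) // frame * frame
    energy = (signal[:n].reshape(-1, frame) ** 2).sum(axis=1)
    baseline = np.convolve(energy, np.ones(8) / 8.0, mode="same") + 1e-12
    onsets, last = [], -1.0
    for i, (e, b) in enumerate(zip(energy, baseline)):
        t = i * frame / sr
        if e > threshold * b and t - last > 0.1:
            onsets.append(t)
            last = t
    return onsets

def tempo_bpm(onsets):
    """Continuous parameter: clapping tempo from the median inter-onset interval."""
    return 60.0 / float(np.median(np.diff(onsets))) if len(onsets) > 1 else None

# Hypothetical use: bpm = tempo_bpm(detect_claps(mic_buffer, 44100)),
# after which bpm ranges would be mapped to control commands.
```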
An embodiment of the present invention relates to a method according to independent claim 1.
Also, an embodiment of the present invention relates to an apparatus according to independent claim 10.
In addition, an embodiment of the present invention relates to a computer program product according to independent claim 11.
Furthermore, an embodiment of the present invention relates to a carrier medium according to independent claim 12.
Further embodiments are defined in dependent claims.
According to an embodiment of the invention a method for controlling an apparatus comprises obtaining a discrete feature of an event from a received sonic gesture, obtaining a continuous feature relating to the event from the received sonic gesture, and controlling the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
The term "event" refers to e.g. a human-induced hand clap, finger tap, foot step, or whistle.
The term "sonic gesture" refers to a human-induced audio input, which is directed to an apparatus in order to control the apparatus. The sonic gesture comprises at least one event and possibly an interval or intervals around the event.
The term "apparatus" refers to any device, which is capable of receiving sounds (i.e. equipped with a microphone), extracting control information from the received sounds and executing a function indicated by the extracted control information. Such device can be e.g. a mobile phone, laptop, navigator, personal digital assistant (PDA), personal computer, game console, vehicle, device relating to a vehicle control or entertainment in a vehicle, household appliance, television, DVD player, set-top box, home entertainment system, watch, toy, music instrument, public display in e.g. a shopping center, and stereo set.
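As a sketch of the terminology above, one possible in-memory representation of events and gestures; the class and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    onset: float                                   # seconds from gesture start
    kind: str                                      # e.g. "clap", "snap", "whistle"
    features: Dict[str, float] = field(default_factory=dict)

@dataclass
class SonicGesture:
    events: List[Event]                            # at least one event
    def intervals(self) -> List[float]:
        """Intervals between consecutive events, i.e. the temporal pattern."""
        onsets = [e.onset for e in self.events]
        return [b - a for a, b in zip(onsets, onsets[1:])]

g = SonicGesture([Event(0.0, "clap"), Event(0.5, "clap"), Event(1.0, "snap")])
print(g.intervals())   # [0.5, 0.5]
```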
According to an embodiment of the invention the method, which is disclosed in the previous embodiment, comprises that obtaining the discrete feature of the event and the continuous feature relating to the event from the received sonic gesture is based on a predefined model, a model created during the execution of the method, or a combination of the two. The discrete and continuous parameters can be determined by means of e.g. pattern recognition. In the case of the predefined model, the user or users of the apparatus teach the software in the apparatus to determine the discrete parameter(s) and/or continuous parameter(s) before the user's actual command (supervised learning). The model learnt during the method execution, in turn, is based on e.g. statistical pattern regularities (unsupervised learning). The learning process can also be carried out as a combination of supervised and unsupervised learning.
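A toy sketch of the supervised variant, assuming each event has already been reduced to a small feature vector; the feature values, labels, and nearest-centroid rule are illustrative assumptions, not the patent's specified method:

```python
import numpy as np

class NearestCentroidModel:
    """Minimal 'teach then recognize' model: the user provides labelled
    example events, and new events are assigned to the nearest centroid."""
    def fit(self, X, labels):
        self.classes = sorted(set(labels))
        self.centroids = np.array(
            [np.mean([x for x, l in zip(X, labels) if l == c], axis=0)
             for c in self.classes])
        return self

    def predict(self, x):
        d = np.linalg.norm(self.centroids - np.asarray(x, dtype=float), axis=1)
        return self.classes[int(np.argmin(d))]

# Teaching phase with made-up 2-D features (spectral centroid Hz, decay s):
model = NearestCentroidModel().fit(
    [[3200, 0.04], [3100, 0.05], [1900, 0.09], [2000, 0.08]],
    ["flat_clap", "flat_clap", "curved_clap", "curved_clap"])
print(model.predict([2050, 0.07]))   # -> "curved_clap"
```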
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, comprises that obtaining the discrete feature of the event from the received sonic gesture comprises determining a type of the event or the temporal pattern of the event. The type of the event can be e.g. a hand clap, a hand clap with a certain hand configuration, a finger snap, a foot step, a whistle, or a whistle melody. The temporal pattern of the event(s) can be e.g. the temporal pattern of hand claps, finger snaps, foot steps, or whistles, the temporal pattern of a whistle, or the temporal pattern of different event types, whereupon the temporal pattern comprises e.g. hand claps and finger snaps.
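One plausible way to separate event types from audio alone, assumed here for illustration rather than specified by the patent, is a single spectral descriptor per isolated event, since different hand configurations colour the clap sound differently:

```python
import numpy as np

def spectral_centroid(clip, sr):
    """Magnitude-weighted mean frequency of one isolated event clip;
    comparing it against learnt per-class values is one crude way to
    tell clapping modes (hand configurations) apart."""
    windowed = clip * np.hanning(len(clip))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))
```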
The term "hand configuration" refers to hand clap types (clapping modes). For example, one clapping mode can be a mode wherein hands are kept parallel and flat. In another clapping mode the hands are in an angle in relation to each other with a natural hand curvature.
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, comprises that obtaining the discrete feature of the event from the received sonic gesture further comprises determining various types of events from the received sonic gesture. The sonic gestures can also comprise several types of event, whereupon a sonic gesture can comprise e.g. a hand clap and a finger snap.
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, further comprises determining an entity producing the sonic gesture or a location of the entity producing the sonic gesture. The user of the apparatus is identified by means of supervised and/or unsupervised learning. The location of the user can be determined e.g. by means of two microphones implemented in the apparatus. It is possible to use the user's location determination as a user identification method, especially when the apparatus has more than one user at the same time. The user identification can naturally be improved by using both the pattern recognition based user identification and the user's location determination. The preceding user identification and user location determination can be executed in the context of the discrete parameter determination and/or the continuous parameter determination.
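A sketch of the two-microphone location idea via the classic time-difference-of-arrival estimate; the microphone spacing and the far-field geometry are illustrative assumptions:

```python
import numpy as np

def arrival_angle(left, right, sr, mic_distance=0.15, speed_of_sound=343.0):
    """Estimate the direction (degrees from broadside) of one event from
    the cross-correlation delay between two microphone channels."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)   # delay in samples
    tau = lag / sr                                  # delay in seconds
    sin_theta = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```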
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, comprises that obtaining the continuous feature relating to the event from the received sonic gesture comprises determining a tempo of events or variations of types of events. The tempo of the events refers to e.g. a hand clapping, finger snapping, or foot stepping tempo. The event type variations can comprise e.g. the continuous variation of the hand configuration (clap type) or the continuous variation of finger snaps and foot steps. Also, the determined continuous feature can be e.g. the pitch or variance of a whistle or the direction of the sonic gesture. The continuous feature determination can be executed together with the discrete parameter determination or after it.
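A sketch of treating the tempo as a genuinely continuous control signal rather than a one-off value, assuming onset times are already available; the smoothing factor is an illustrative choice:

```python
def tempo_track(onsets, alpha=0.5):
    """Exponentially smoothed 60/IOI estimate, updated at every event,
    usable as a continuous control parameter (e.g. drumming speed)."""
    tempo, track = None, []
    for prev, cur in zip(onsets, onsets[1:]):
        inst = 60.0 / (cur - prev)                     # instantaneous BPM
        tempo = inst if tempo is None else alpha * inst + (1 - alpha) * tempo
        track.append((cur, tempo))
    return track                                       # [(time s, BPM), ...]

print(tempo_track([0.0, 0.5, 1.0, 1.4]))   # tempo drifts upward at the end
```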
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, comprises that controlling the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event comprises determining a command by the discrete feature of the event and the continuous feature relating to the event.
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, comprises that controlling the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event further comprises performing the command in the apparatus for controlling the apparatus.
According to an embodiment of the invention the method, which is disclosed in any of the previous embodiments, wherein the event is a hand clap, finger snap, finger tap, foot step, or whistle.
According to an embodiment of the invention an apparatus is configured to obtain a discrete feature of an event from a received sonic gesture, obtain a continuous feature relating to the event from the received sonic gesture, and be controlled by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
According to an embodiment of the invention the apparatus, which is disclosed in the previous embodiment, is configured to obtain the discrete feature of the event and the continuous feature relating to the event from the received sonic gesture by means of a predefined model or a model created during the execution of the method or a combination of the predefined model and the model created during the execution of the method.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is configured to determine a type of the event or the temporal pattern of the event.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is configured to determine various types of events from the received sonic gesture.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is further configured to determine an entity producing the sonic gesture or a location of the entity producing the sonic gesture.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is configured to determine a tempo of events or variations of types of events.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is configured to determine a command by the discrete feature of the event and the continuous feature relating to the event.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, is configured to perform the command.
According to an embodiment of the invention the apparatus, which is disclosed in any of the previous embodiments, wherein the event is a hand clap, finger snap, finger tap, foot step, or whistle.
The method according to embodiments of the invention enables remote interaction, i.e. the user does not need to be in contact with the apparatus he/she is interacting with, by using user-induced sounds as input in human-computer interaction (HCI). Naturally this also applies to cases wherein the apparatus is controlled by more than one user.
The method according to embodiments of the invention uses both the discrete and continuous characteristics of the sonic gestures and, thus, provides richness in the information conveyed by the sonic gestures. For example, from hand clapping, in addition to the detection of the isolated claps, one can extract discrete parameters, such as a hand configuration and temporal patterns of several claps, and continuous parameters, such as the clapping tempo and the continuous variation of the hand configuration. All of these parameters can be estimated from the sound of the natural act of the clapping hands.
The method according to embodiments of the invention can be executed on widely available existing hardware. Any apparatus capable of capturing sound and processing audio signals is suitable. These capabilities are readily available in e.g. mobile phones and laptop computers. The method according to embodiments of the invention can also be implemented using multisensory input, which offers more richness to the sonic gestures. The sound of walking is one example of combining the audio input with a kinesthetic input, wherein e.g. the mobile device in the user's pocket receives additional input by means of accelerometers implemented in the mobile device.
BRIEF DESCRIPTION OF THE DRAWINGS
Next, the embodiments of the invention will be described in greater detail with reference to exemplary embodiments in accordance with the accompanying drawings, of which
Figure 1 illustrates an exemplary view of an arrangement according to an advantageous embodiment of the invention,
Figure 2 illustrates an exemplary view of an arrangement according to an advantageous embodiment of the invention,
Figure 3 illustrates an exemplary view of a method according to an advantageous embodiment of the invention,
Figure 4 illustrates an exemplary flowchart of an event learning method according to an advantageous embodiment of the invention,
Figure 5 illustrates an exemplary flowchart of a method for controlling an apparatus according to an advantageous embodiment of the invention, and
Figure 6 illustrates an exemplary view of an apparatus configured to perform a method according to an advantageous embodiment of the invention.
DETAILED DESCRIPTION
Figure 1 illustrates an arrangement 100, wherein an apparatus 110, such as a mobile station, comprises a microphone or microphones for receiving sonic gestures 120 produced by a user 130. A processor in the mobile station 110 runs an application, e.g. a music application, wherein the user 130 can control a drummer in order to play drums. If the user 130 wants to control the drummer in the music application so that the drummer plays in a certain way, he/she sends commands through an air interface to the music application in the mobile station 110, e.g. by clapping his/her hands. The mobile station 110 provides feedback 140 to the user's sonic gesture 120, which can be e.g. visual feedback, audio feedback, tactile feedback, or a combination of several feedback types.
Figure 2, in turn, illustrates an arrangement 200, wherein a computer 210 has microphones for receiving sonic gestures 220a, 220b produced by users 230a, 230b. Also in this case, the users 230a, 230b control an application run by the computer by means of sonic gestures 220a, 220b such as hand claps, finger snaps, foot steps, and whistles. Let us assume that the application is a game, wherein each user 230a, 230b has his/her own game figure to control.
The computer 210 is capable of determining the direction from which a user-induced sonic gesture 220a, 220b arrives, e.g. by means of two microphones. This determination of the arrival direction of the sonic gesture 220a, 220b provides one way to establish the identity of the user 230a, 230b, or it can be used together with learning techniques, wherein the application in the computer 210 is taught to identify the user 230a, 230b who has produced the received sonic gesture 220a, 220b.
So, when the user 230a sends the sonic gesture 220a through an air interface to the computer 210, the computer associates the received sonic gesture 220a with the game figure of the user 230a. The computer 210 derives from the sonic gesture 220a a desired action, which the game figure performs, and provides feedback 240, which can be e.g. visual feedback through a display 250, audio feedback through loudspeakers 260a, 260b, or a combination of visual and audio feedback.
As a countermove, the user 230b orders his/her own game figure to perform a desired action by the sonic gesture 220b and he/she receives the feedback 240 similarly.
Figure 3 depicts a sonic gesture 300 used for control, which comprises several event types within the same sonic gesture. Also in this example the user of the apparatus controls a music application, wherein e.g. a drummer plays drums.
A first hand configuration 310, wherein the user claps his/her hands so that the hands are parallel and flat, indicates a drumbeat for a "tom tom" drum, and a second hand configuration 320, wherein the user's hands are at an angle in relation to each other with a natural curvature, indicates a drumbeat for a bass drum.
The user of the music application starts to drum by clapping his/her hands according to the first hand configuration 310 in order to beat the "tom tom" drum. Immediately after a short interval 330a, the bass drum is beaten by a hand clap according to the second hand configuration 320. In addition, after an interval 330b slightly longer than the interval 330a, the user beats the tom tom drum once and then the bass drum twice. Between the last three drumbeats there are short intervals 330c, 330d, as the upper time period shows. Thus, the user has provided the sonic gesture 300, which includes several different hand clap types (configurations) and the intervals 330a-330d forming a certain temporal pattern.
The controlled apparatus, which has a display and at least one loudspeaker, receives the user's sonic gesture 300 and converts the received command into a suitable form, from which the music application is capable of recognizing the sonic gesture (command) and performing it so that the drummer of the music application beats the tom tom drum and the bass drum on the display of the apparatus according to the user-induced command. In addition, the user can of course hear the desired drumming pattern from the loudspeaker(s) of the apparatus. The received sonic gesture 300 is also stored in the memory of the apparatus.
Discrete features found in the produced sonic gesture 300 are e.g. the hand clap types 310, 320 and the temporal pattern of the hand claps. Continuous features can be e.g. the hand clap tempo and the continuously varying hand clap type.
The above-mentioned sonic gesture 300 can also be produced e.g. by finger snaps and foot steps; in fact, only imagination limits the events and event combinations that can be used.
Since the apparatus has received and stored the sonic gesture 300, the user can control the playback of the stored sonic gesture (sequence) with new sonic gestures. For example, when the user claps his/her hands at a fast tempo, the stored sequence 300 is reproduced at a faster tempo than the original sequence 300, as can be seen from the sequence 340 in the lower time period. Respectively, when the hands are clapped at a slower tempo, the sequence 300 is reproduced at a slower tempo than the original sequence 300.
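As a sketch, the tempo-controlled playback described above reduces to a simple ratio between the new and the original clapping tempo; the clamping range is an illustrative safeguard, not from the patent:

```python
def playback_rate(control_bpm, original_bpm):
    """Speed factor for reproducing the stored sequence: clapping faster
    than the original tempo plays it back faster, and vice versa."""
    return max(0.25, min(4.0, control_bpm / original_bpm))

print(playback_rate(160.0, 120.0))   # faster claps -> ~1.33x playback
```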
Another example of utilizing discrete and continuous parameters can be e.g. a flamenco clapping application, wherein the application plays an example clapping sequence comprising different hand clap types. A user tries to reproduce the example sequence by clapping his/her hands continuously with the same hand configurations and clapping tempo. The application receives the user's continuous clapping sounds and determines whether he/she manages to clap with similar hand configurations and clapping tempo. After that, the application produces visual and/or audible feedback.
Figure 4 presents, by means of an example only, a general flowchart describing an event learning method 400 according to an embodiment of the invention.
In the method start-up in step 410, an apparatus and/or an application executing the method are turned on and necessary stages, such as a set up definition and different parameters' initialisation, are provided.
Next, in step 420, data is collected, i.e. sonic gestures comprising at least one event, such as a hand clap, finger snap, foot step, or whistle, used in the control process.
The collected data is processed during step 430 so that, e.g. in the case of a statistical method, statistical regularities can be found in the processed data and statistical models created for the events. In this way the application can be taught to identify the entity (user) who produces a sonic gesture.
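A minimal sketch of the unsupervised side, clustering one scalar feature per collected event into event classes; plain 1-D k-means stands in here for whatever statistical model an actual implementation would use:

```python
import numpy as np

def kmeans_1d(values, k=2, iters=50):
    """Cluster a scalar per-event feature (e.g. spectral centroid) into
    k classes; the resulting centers act as learnt event models."""
    v = np.asarray(values, dtype=float)
    centers = np.linspace(v.min(), v.max(), k)
    labels = np.zeros(len(v), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = v[labels == j].mean()
    return centers, labels

centers, labels = kmeans_1d([3150, 3230, 1980, 2040, 3190, 1890])
print(centers)   # roughly one center per clap type
```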
The created models are stored in the apparatus in step 440 so that the application can afterwards utilize the predefined models in order to identify events from a received sonic gesture and the entity providing the sonic gesture. The created models are compared to new real data in order to discover similarities between the created models and a real received event.
In step 450 the user estimates the need for further data, and if such a need exists, it is possible to return to step 420.
Otherwise, the method is successfully completed and is ended in step 460.
Figure 5 discloses, by means of an example only, a flow chart describing a method 500 for controlling an apparatus according to one embodiment of the invention.
During the method start-up in step 510, an apparatus and/or an application performing the method are turned on and necessary stages, such as application set-up definition and initialisation of different variables and parameters, are provided before the apparatus control. In this case, a user defines the set-up, variables, and parameters in view of the apparatus control.
Next, in step 520 the apparatus receives a sonic gesture accomplished by the user of the apparatus through at least one microphone. The received sonic gesture is processed in the apparatus into a suitable form, e.g. an electric signal, in order to obtain desired discrete and continuous features.
In step 530, the discrete features, such as event types and the temporal patterns of the events, are determined from the processed signal. The determination is performed using e.g. a pattern recognition method. The pattern recognition used in the context of the discrete and continuous features is based on predefined models stored in the memory of the apparatus, on models learnt during the apparatus control, or on a combination in which both the predefined models and the models learnt during the process are utilised.
Furthermore, during step 530 the identity of the user who induced the sonic gesture can be determined on the basis of the direction from which the sonic gesture arrived, when there is more than one user. Pattern recognition is also usable for user identification, and both techniques can be used together in order to improve the user identification.
Together with the determination of the discrete features in step 530, or after it, the continuous features are determined during step 540 from the processed signal. Such continuous features can be e.g. an event tempo or event type variations.
In step 550, a command, which the user wants to execute in the apparatus, is determined on the grounds of the obtained discrete and continuous features. The determined command is performed in the apparatus, as step 560 describes, and its consequences are displayed in the application on the display of the apparatus or reproduced through e.g. a loudspeaker or loudspeaker system connected to the apparatus.
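One conceivable organisation of the mapping in steps 550-560 is a dispatch table keyed by the discrete event type and parameterised by the continuous feature; the command names below are purely hypothetical:

def make_command(event_type, tempo_bpm):
    # Map a (discrete, continuous) feature pair to an apparatus command.
    table = {
        "clap":    lambda t: ("set_playback_rate", t / 120.0 if t else 1.0),
        "snap":    lambda t: ("next_track", None),
        "whistle": lambda t: ("toggle_pause", None),
    }
    handler = table.get(event_type)
    return handler(tempo_bpm) if handler else None

# e.g. make_command("clap", 144) -> ("set_playback_rate", 1.2)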
In step 570 the user estimates the need for further commands (sonic gestures) to the application; if further commands are needed, the method returns to step 520.
If the user decides that there is no need for further commands, i.e. the application control is successfully completed, the control method ends in step 580.

Figure 6 presents one example of a mobile device 600 adapted to control an application. The mobile device comprises a processor 610 for performing instructions and handling data, a memory unit 620 for storing data such as instructions and application data, a user interface 630, which can be e.g. a keyboard, touchpad, or other selection means, data transfer means 640 for transmitting and receiving data, means 650 for receiving sonic gestures, such as one or more microphones, and a loudspeaker 660 for establishing audio feedback. The mobile device can also comprise a display 670 for providing graphical or tactile feedback.
Memory 620 stores at least a user interface application 622 and an application 624 for determining both discrete and continuous features. The means 650 for receiving sonic gestures obtain sonic gestures, and the processor 610 manipulates the received audio data according to the instructions of the corresponding application 624 in order to determine both discrete and continuous features, and obtains a command on the grounds of the determined discrete and continuous features. Then, the processor 610 performs the obtained command, whose outcome is reproduced through the loudspeaker 660 and/or the display 670.
The invention has now been explained above with reference to the aforesaid embodiments, and several advantages of the invention have been demonstrated. It is clear that the invention is not restricted to these embodiments only, but comprises all possible embodiments within the spirit and scope of the inventive thought and the following patent claims.

Claims

1. Method for controlling an apparatus, which method comprises
obtaining (530) a discrete feature of an event from a received sonic gesture,
obtaining (540) a continuous feature relating to the event from the received sonic gesture, and
controlling (560) the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event.
2. The method according to claim 1, wherein obtaining the discrete feature of the event and the continuous feature relating to the event from the received sonic gesture is based on a predefined model or a model created during the execution of the method or a combination of the predefined model and the model created during the execution of the method.
3. The method according to claim 1 or 2, wherein obtaining the discrete feature of the event from the received sonic gesture comprises determining a type of the event or the temporal pattern of the event.
4. The method according to any of the previous claims, wherein obtaining the discrete feature of the event from the received sonic gesture further comprises determining various types of events from the received sonic gesture.
5. The method according to any of the previous claims, wherein the method further comprises determining an entity producing the sonic gesture or a location of the entity producing the sonic gesture.
6. The method according to any of the previous claims, wherein obtaining the continuous feature relating to the event from the received sonic gesture comprises determining a tempo of events or variations of types of events.
7. The method according to any of the previous claims, wherein controlling the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event comprises determining (550) a command by the discrete feature of the event and the continuous feature relating to the event.
8. The method according to any of the previous claims, wherein controlling the apparatus by the obtained discrete feature of the event and the obtained continuous feature relating to the event further comprises performing (560) the command in the apparatus for controlling the apparatus.
9. The method according to any of the previous claims, wherein the event is a hand clap, finger snap, finger tap, foot step, or whistle.
10. An apparatus configured to perform the method of any of claims 1-9.
11. A computer program product configured to perform the method of any of claims 1-9, when said computer program product is run in a computer.
12. A carrier medium comprising a computer program product according to claim 11.
PCT/FI2010/050251 2009-04-03 2010-03-30 Method for controlling an apparatus WO2010112677A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20095371A FI20095371A (en) 2009-04-03 2009-04-03 A method for controlling the device
FI20095371 2009-04-03

Publications (1)

Publication Number Publication Date
WO2010112677A1 (en) 2010-10-07

Family

ID=40590254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2010/050251 WO2010112677A1 (en) 2009-04-03 2010-03-30 Method for controlling an apparatus

Country Status (2)

Country Link
FI (1) FI20095371A (en)
WO (1) WO2010112677A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002093344A1 (en) * 2001-05-14 2002-11-21 Koninklijke Philips Electronics N.V. Device for interacting with real-time streams of content
EP1640845A2 (en) * 2004-09-10 2006-03-29 Sony Corporation User identification method, user identification device and corresponding electronic system
US20060143326A1 (en) * 2004-12-27 2006-06-29 Hauck Lane T Impulsive communication activated computer control device and method
US20070100633A1 (en) * 2005-11-03 2007-05-03 International Business Machines Corporation Controlling a computer user interface with sound
US20080001951A1 (en) * 2006-05-07 2008-01-03 Sony Computer Entertainment Inc. System and method for providing affective characteristics to computer generated avatar during gameplay

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANTTI JYLHÄ, CUMHUR ERKUT: "Inferring the Hand Configuration from Hand Clapping Sounds", DIGITAL AUDIO EFFECTS 2008 (DAFX-08), PROCEEDINGS OF THE 11TH CONFERENCE ON DIGITAL AUDIO EFFECTS (DAFX-08), SEPTEMBER 1-4, 2008, ESPOO,FINLAND, 4 September 2008 (2008-09-04), pages 301 - 304, XP002590052 *
ANTTI JYLHÄ, CUMHUR ERKUT: "Sonic interactions with hand clap sounds", AUDIO MOSTLY 2008. PROCEEDINGS OF THE AUDIO MOSTLY CONFERENCE, OCTOBER 22-23, 2008, PITEA, SWEDEN, 23 October 2008 (2008-10-23), pages 93 - 100, XP002590051 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2458524A1 (en) * 2010-11-25 2012-05-30 Deutsche Telekom AG Identifying a user of a mobile electronic device
WO2013072554A2 (en) * 2011-11-17 2013-05-23 Nokia Corporation Spatial visual effect creation and display such as for a screensaver
WO2013072554A3 (en) * 2011-11-17 2013-07-11 Nokia Corporation Spatial visual effect creation and display such as for a screensaver
US9285452B2 (en) 2011-11-17 2016-03-15 Nokia Technologies Oy Spatial visual effect creation and display such as for a screensaver
US10148903B2 (en) 2012-04-05 2018-12-04 Nokia Technologies Oy Flexible spatial audio capture apparatus
US10419712B2 (en) 2012-04-05 2019-09-17 Nokia Technologies Oy Flexible spatial audio capture apparatus
CN105723635A (en) * 2013-09-16 2016-06-29 Lg电子株式会社 Home appliance and mobile terminal
EP3047583A4 (en) * 2013-09-16 2017-08-02 LG Electronics Inc. Home appliance and mobile terminal
US10015021B2 (en) 2013-09-16 2018-07-03 Lg Electronics Inc. Home appliance and mobile terminal
CN111684522A (en) * 2019-05-15 2020-09-18 深圳市大疆创新科技有限公司 Voice recognition method, interaction method, voice recognition system, computer-readable storage medium, and removable platform

Also Published As

Publication number Publication date
FI20095371A (en) 2010-10-04
FI20095371A0 (en) 2009-04-03

Legal Events

Code Title / Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10715910; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10715910; Country of ref document: EP; Kind code of ref document: A1)