WO2020009651A1 - Voice based user interface for controlling at least one function - Google Patents

Voice based user interface for controlling at least one function

Info

Publication number
WO2020009651A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
user interface
based user
activation trigger
processor
Application number
PCT/SE2019/050669
Other languages
French (fr)
Inventor
Adam HENRIKSSON
Carl Johan Uggla
Tor HAUKSSON NETTELBLADT
Original Assignee
Zound Industries International Ab
Application filed by Zound Industries International Ab
Publication of WO2020009651A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/24 - Speech recognition using non-acoustical features
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/26 - Devices for calling a subscriber
    • H04M1/27 - Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 - Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Definitions

  • The voice-recording device 25, 225 may, according to one embodiment, be comprised within an audio interface 20, 220.
  • The audio interface 20, 220 may, for example, be configured to both receive and produce audio.
  • The at least one memory 60, 260 may be configured to store the voice sample representation associated with said activation trigger.
  • The at least one memory 260 may be located remotely from the voice based user interface 10, for example in a remote device 200, or the memory 60 may be located within the same device 20 as the voice based user interface 10.
  • The voice based user interface 10 may, according to one embodiment, be communicatively connected to a communications interface 30.
  • The controlled at least one function may be executed in at least one processor 240 located remotely from the voice based user interface 10.
  • The communications interface 30 may be configured to transmit the voice sample representation to the at least one processor 240.
  • Hence, the voice based user interface 10 may control functions with a processor 240 that may be located remotely from the voice based user interface 10.
  • Thereby, a more flexible solution is provided. It may be possible to locate the at least one processor 40, 240 where it may be the most suitable and hence, the voice based user interface 10 does not have to be located in a device comprising a processor for controlling the at least one function.
  • The most suitable location may, for example, depend on the prevailing conditions such as available component space and cost constraints.
  • This embodiment does not exclude the possibility to also communicatively couple the voice based user interface 10 to a processor located within the same device as the voice based user interface 10.
  • The voice sample representation may for example be a code.
  • The code may for example be a computer code.
  • The code may alternatively be a machine code or any other suitable code that may be readable by the at least one processor 40, 240.
  • The at least one memory 60, 260 may comprise instructions that are executable by the at least one processor 40, 240, whereby the at least one processor 40, 240 may be operative to initiate the voice sample representation by executing the code.
  • The voice based user interface 10 may be further communicatively connected to an activation button 50.
  • The activation trigger may comprise an action associated with the activation button 50.
  • The activation button 50 may be any type of button, as described above.
  • The activation button 50 may for example be a pushbutton.
  • The activation button may be at least one button, i.e. there may be multiple buttons. These activation buttons 50 may all be of the same type, or they may be of different types.
  • When the activation button 50 is a pushbutton, the activation trigger may comprise a depression on the pushbutton.
  • The activation trigger may, however, be any action with the activation button 50.
  • The only thing that may limit the associated activation trigger may be the software and/or the hardware within the button.
  • If the activation button 50 is a turnable knob, the associated activation trigger may be a turning of the turnable knob.
  • If the activation button 50 is a software button, as described above, the activation trigger may be a tap or a touch action with the software button.
  • The activation trigger may alternatively or additionally comprise an action associated with multiple activation buttons 50. All the activation buttons may for example be pushbuttons. Accordingly, the voice sample representation may not be initiated until the action associated with all the involved pushbuttons has been performed.
  • The action associated with multiple pushbuttons may comprise, without limitation, a depression on each of at least two pushbuttons. Accordingly, if there are two activation buttons 50, there may be at least four different activation triggers associated with four different voice sample representations, i.e. the number of possible activation triggers may be increased without increasing the number of activation buttons 50.
  • The voice sample representation may be associated with the activation trigger by means of the at least one processor 40, 240 by recording an activation trigger and associating the voice sample representation with the recorded action. Hence, longer and more complex sequences of activation triggers may be recorded and associated with a voice sample representation.
  • The activation trigger may be associated with different voice sample representations depending on a context.
  • The context may include at least one of a time and a location.
  • The at least one memory 60, 260 may comprise instructions that are executable by the at least one processor 40, 240, whereby the at least one processor 40, 240 may be operative to generate a confirmation when the voice sample representation is associated with the activation trigger.
  • An audio device 20 may comprise a voice based user interface 10 according to the second aspect.
  • The audio device 20 may for example be, without limitation, a loudspeaker, a hearing aid, a pair of ear protection devices or a pair of headphones. If the audio device 20 is a pair of headphones, the pair of headphones may for example be in-ear headphones. Accordingly, the audio device 20 may be any device comprising a voice based user interface 10 for controlling at least one function.
  • The function may in some embodiments be related to audio, such as playing and pausing music. However, the controlled at least one function may, additionally or alternatively, not be related to audio.
  • The audio device 20 may further comprise a visual user interface 310, as illustrated in Figure 3.
  • The visual user interface 310 may display visual information.
  • The visual user interface 310 may, for example, display information about stored voice sample representations with their associated activation triggers. This information may for example be comprised within an app, where it may be possible to get an overview of the created activation triggers used for controlling the at least one function.
  • The app may enable the user to update, delete or even create new activation triggers with associated voice sample representations.
  • Alternatively, the audio device 20 may be communicatively coupled to a remote device 200 which may comprise a visual user interface 310 as previously discussed.
  • In the example embodiment of Figure 4, the voice based user interface 10 may be comprised within an audio device 20 such as a pair of headphones.
  • The headphones may comprise at least one activation button 50 in the form of a pushbutton.
  • The headphones may further be communicatively coupled to a remote device 200.
  • The remote device 200 may, for example, be a smartphone.
  • The headphones may be communicatively coupled to a voice-recording device 25, 225 within the headphones or within the smartphone.
  • The voice-recording device 25, 225 may obtain 420 a voice command.
  • The voice command may constitute a command for controlling at least one function.
  • The voice-recording device 25, 225 may be configured to obtain the voice command when a button is pressed 410.
  • The button may be the activation button 50, or the button may be a different button.
  • The button may be located at the pair of headphones comprising the voice based user interface 10, or the button may be located at the smartphone communicatively connected to the headphones. When the button is released, this may indicate that the obtaining of the voice command is completed.
  • When the voice command is obtained, the voice command may be converted 440 into a voice sample representation. As this, in this exemplary embodiment, is performed remotely from the headphones, the voice command may first be transmitted 430 via the communications interfaces 30, 230 to the smartphone.
  • The converted voice sample representation may represent the action that the received voice command is to be translated into.
  • The obtained voice command may be converted into the voice sample representation by communicating with the hosted app service 450 via an app 460.
  • The response to the action may then be transmitted back to the smartphone 470, 480 and possibly further to the headphones 490, depending on which voice command was received by the headphones.
  • The converted voice sample representation may then be associated with an activation trigger, which may be stored within a memory at the headphones or within the smartphone.
  • Thereafter, an activation trigger may be activated in order to control the at least one function, instead of using the voice command.
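The Figure 4 flow described above can be summarized in a short end-to-end sketch. It is an illustration only: the class and method names are hypothetical, and direct method calls stand in for the Bluetooth or network links between the headphones, the smartphone and the hosted app service.

```python
# Hypothetical sketch of the Figure 4 flow: the headphones record while the
# button is held (410/420), the raw command is transmitted to the smartphone
# (430), converted via an app and a hosted service (440-460), and the
# resulting representation is stored against an activation trigger.

class HostedAppService:
    def convert(self, raw_audio):                        # 450
        return f"action:{raw_audio.decode(errors='ignore')}"

class SmartphoneApp:
    def __init__(self, service):
        self.service = service
        self.shortcuts = {}

    def convert_and_store(self, raw_audio, trigger_id):  # 440, 460
        representation = self.service.convert(raw_audio)
        self.shortcuts[trigger_id] = representation
        return "confirmation"                            # 470/480

class Headphones:
    def __init__(self, phone):
        self.phone = phone
        self._buffer = None

    def button_pressed(self):                            # 410: start recording
        self._buffer = b""

    def record(self, chunk):                             # 420: obtain command
        self._buffer += chunk

    def button_released(self, trigger_id):               # recording completed
        return self.phone.convert_and_store(self._buffer, trigger_id)  # 430

phone = SmartphoneApp(HostedAppService())
headphones = Headphones(phone)
headphones.button_pressed()
headphones.record(b"play my running playlist")
print(headphones.button_released("triple_press"))        # 490: confirmation
```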

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Selective Calling Equipment (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure generally relates to the field of user interfaces and in particular to a method for providing a voice based user interface, a voice based user interface, and an audio device comprising a voice based user interface. Among other things, the present disclosure presents a method of providing a voice based user interface for controlling at least one function. The method may comprise obtaining (110) a voice command by means of a voice-recording device. The voice command may constitute a command for controlling the at least one function. The obtained voice command may be converted (120) into a voice sample representation by means of at least one processor and the voice sample representation may be associated (140) with an activation trigger by means of the at least one processor. Responsive to an activation of the activation trigger, the voice sample representation may be initiated (170), by means of the at least one processor, for controlling the at least one function.

Description

VOICE BASED USER INTERFACE FOR CONTROLLING AT LEAST ONE
FUNCTION
Technical field
The present disclosure generally relates to the field of user interfaces and in particular to a method for providing a voice based user interface, a voice based user interface, and an audio device comprising a voice based user interface.
Background
This section is intended to provide a background to the various embodiments of the invention that are described in this disclosure. Therefore, unless otherwise indicated herein, what is described in this section should not be interpreted to be prior art by its mere inclusion in this section.
A user interface is where the interactions between humans and machines, or devices, occur. The goal of this interaction is to support the user in operating the underlying technical device. It may allow efficient operation and control of the device from the human end, whilst the device simultaneously feeds back information that aids the users' decision-making process. Examples of this broad concept of user interfaces include the interactive aspects of computer operating systems, heavy machinery operator controls, and process controls.
Generally, the goal of user interfaces is to make it easy, efficient, and enjoyable, e.g. user-friendly, to operate a device in the way which produces the desired result. This generally means that the user should preferably need to provide minimal input to achieve the desired output, and also that the device minimizes undesired outputs to the user.
Recently, voice based user interfaces have evolved. A voice based user interface, also known as voice user interface (VUI), may allow a user to interact with a computer, smartphone or other voice enabled device through voice or speech commands. The primary advantage of a VUI is that it may allow for a hands-free and/or eyes-free way in which a user may interact with the device while focusing his or her attention elsewhere. It may allow users to initiate automated services and execute their day-to- day tasks in a faster and/or more intuitive manner.
The market of voice enabled devices, comprising VUIs, is growing exponentially. The development towards making them even more intuitive and easy to use continues. The VUIs are quickly growing smarter, learning the user’s speech patterns over time and even building their own vocabulary.
Today, a user who wants to activate a certain function, such as starting his/her favorite playlist, on a voice enabled device, e.g. a mobile phone, may achieve this in two ways. The user may achieve this by using an app or by using the voice based user interface. In order to use the app, the user has to unlock the mobile phone and select the playlist in a visual interface. This may be distracting and also takes time. On the other hand, in order to use the voice based user interface, the user has to activate the voice based user interface and then talk into the microphone of the device. This may be stigmatizing and awkward in public places. Accordingly, available voice based user interfaces may not be optimal in all situations.
Summary
It is in view of the above background and other considerations that the various embodiments of the present disclosure have been made.
The present disclosure recognizes the fact that there are moments when VUIs may be inadequate. For example, when repetitive commands are sent to voice enabled products, or when a user is among other people. However, even in these situations, there is still a need for a voice based user interface that may be both easy and time efficient to use.
In view of the above, it is therefore a general object of the aspects and embodiments described throughout this disclosure to provide an easy and time efficient solution for controlling at least one function via a voice based user interface.
This general object has been addressed by the appended independent claims. Advantageous embodiments are defined in the appended dependent claims.
According to a first aspect, there is provided a method of providing a voice based user interface for controlling at least one function.
The method comprises obtaining a voice command by means of a voice-recording device. The voice command constitutes a command for controlling the at least one function. The method further comprises converting the obtained voice command into a voice sample representation by means of at least one processor; and associating the voice sample representation with an activation trigger, by means of the at least one processor. Responsive to an activation of the activation trigger, the method comprises initiating, by means of the at least one processor, the voice sample representation for controlling the at least one function.
In one embodiment, the method further comprises storing the voice sample representation associated with the activation trigger in at least one memory.
In one embodiment, controlling the at least one function comprises controlling the at least one function in at least one processor that is located remotely from the voice based user interface. The method then further comprises transmitting said voice sample representation to said at least one processor by means of a communications interface. The voice sample representation may for example be a code and initiating the voice sample representation then comprises executing said code.
In one embodiment, the activation trigger comprises an action associated with an activation button. The activation button may for example be a pushbutton. The activation trigger may then be a depression on the pushbutton. In one exemplary embodiment, the activation trigger comprises an action associated with multiple pushbuttons.
In one embodiment, associating the voice sample representation with the activation trigger further comprises recording, by the at least one processor, an activation trigger, and associating the voice sample representation with the recorded action.
In one embodiment, the activation trigger is associated with different voice sample representations depending on a context. The context includes at least one of a time and a location.
In one embodiment, the method further comprises generating a confirmation after the voice sample representation has been associated with the activation trigger.
According to a second aspect, there is provided a voice based user interface for implementing the method according to the first aspect.
In one embodiment, the voice based user interface is communicatively connected to i) a voice-recording device, wherein the voice-recording device is configured to obtain a voice command, the voice command constituting a command for controlling the at least one function; ii) at least one processor; and iii) at least one memory. Said at least one memory comprises instructions that are executable by the at least one processor, whereby the at least one processor is operative to convert the received voice command into a voice sample representation, and associate the voice sample representation with an activation trigger. In response to an activation of said activation trigger, the at least one processor is operative to initiate the voice sample representation for controlling the at least one function.
In one embodiment, the at least one memory is configured to store the voice sample representation associated with said activation trigger.
In one embodiment, the voice based user interface is communicatively connected to a communications interface. The controlled at least one function is executed in at least one processor located remotely from the voice based user interface, and the communications interface is configured to transmit the voice sample representation to said at least one processor.
In one embodiment, the voice sample representation is a code. The at least one memory comprises instructions that are executable by the at least one processor, whereby the at least one processor is operative to initiate said voice sample representation by executing said code.
In one embodiment, the voice based user interface is further communicatively connected to an activation button. Said activation trigger comprises an action associated with the activation button. The activation button may for example be a pushbutton. Then, the activation trigger may for example comprise a depression on the pushbutton.
In one embodiment, the activation trigger comprises an action associated with multiple pushbuttons.
In one embodiment, the voice sample representation is associated with the activation trigger by means of the at least one processor by recording an activation trigger and associating the voice sample representation with the recorded action.
In one embodiment, the activation trigger is associated with different voice sample representations depending on a context. The context includes at least one of a time and a location.
In one embodiment, the at least one memory comprises instructions that are executable by the at least one processor, whereby the at least one processor is operative to generate a confirmation when the voice sample representation is associated with the activation trigger.
According to a third aspect, there is provided an audio device comprising a voice based user interface according to the second aspect. The audio device may for example be a loudspeaker, a pair of headphones, a hearing aid or a pair of ear protection devices. When the audio device is a pair of headphones, they may for example be in-ear headphones. In one embodiment, the audio device comprises a voice based user interface for controlling at least one function. The voice based user interface is communicatively connected to a voice-recording device, wherein the voice-recording device is configured to obtain a voice command. The voice command constitutes a command for controlling the at least one function. The voice based user interface of the audio device is further communicatively connected to at least one processor; and at least one memory. The at least one memory comprises instructions that are executable by the at least one processor, whereby the at least one processor is operative to convert the received voice command into a voice sample representation, and associate the voice sample representation with an activation trigger. In response to an activation of the activation trigger, the at least one processor is operative to initiate the voice sample representation for controlling the at least one function.
The various embodiments described herein provide an easy and time efficient solution for controlling at least one function via a voice based user interface. The various embodiments allow a user to control at least one function without using his/her voice, while still not requiring the user to look at a visual interface. This accordingly makes the controlling of the at least one function smoother, by cutting away or otherwise reducing unnecessary steps, so that the user may achieve the intended task faster.
Brief Description of Drawings
These and other aspects, features and advantages will be apparent and elucidated from the following description of various embodiments, reference being made to the accompanying drawings, in which:
Figure 1 is a flowchart according to an example method;
Figure 2 shows an example implementation of a voice based user interface;
Figure 3 illustrates an audio device according to one embodiment; and
Figure 4 shows an example embodiment.
Detailed Description
The present invention will now be described more fully hereinafter. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those persons skilled in the relevant art. Like reference numbers refer to like elements throughout the description.
According to a first aspect, there is provided a method 100 of providing a voice based user interface for controlling at least one function. This method 100 will now be described with reference to Figure 1.
The method 100 starts with a voice command being obtained 110. The voice command may be obtained by means of a voice-recording device. The voice command may constitute a command for controlling the at least one function. Accordingly, the voice command may be any command uttered by a user with the intention to control at least one function. The controlled at least one function may be any function associated with any application or device.
The method 100 may further comprise converting 120 the obtained voice command into a voice sample representation by means of at least one processor. The voice sample representation may be in any format that may be interpreted by an application or device to be, or otherwise represent, a command for controlling the at least one function.
Thereafter, the voice sample representation may be associated 140 with an activation trigger by means of the at least one processor. An activation trigger may be a trigger that may activate the voice sample representation in order to control the at least one function. Accordingly, responsive to an activation of the activation trigger, the voice sample representation may be initiated 170, by means of the at least one processor, for controlling the at least one function.
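To make the flow of Figure 1 concrete, the following is a minimal Python sketch of steps 110 through 170 under stated assumptions: the class name, the string-token representation and the trigger identifier are hypothetical inventions for illustration, not part of the disclosure.

```python
# Minimal sketch of the method 100; all names are hypothetical.

class VoiceShortcutInterface:
    def __init__(self):
        self.bindings = {}  # activation trigger id -> voice sample representation

    def create_shortcut(self, trigger_id, raw_audio):
        representation = self.convert(raw_audio)    # step 120: convert
        self.bindings[trigger_id] = representation  # step 140: associate

    def convert(self, raw_audio):
        # Placeholder for step 120; a real system would run speech
        # recognition or forward the audio to a voice service here.
        return f"cmd:{raw_audio.decode(errors='ignore')}"

    def on_trigger(self, trigger_id):
        representation = self.bindings.get(trigger_id)
        if representation is not None:
            self.initiate(representation)           # step 170: initiate

    def initiate(self, representation):
        print(f"controlling function: {representation}")

ui = VoiceShortcutInterface()
ui.create_shortcut("double_press", b"play my favorite playlist")  # step 110
ui.on_trigger("double_press")  # silent replay of the stored command
```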
The controlled at least one function may be any function that may be performed by the at least one processor. The function may be an elementary function comprising a single command, such as dialing a certain device associated with a person. Alternatively, the function may be a more advanced function comprising several combined commands, such as sending a text message containing the daily weather forecast to a certain device. Further examples of functions may be, without limitation, playing a certain playlist, starting a workout app when a user intends to go out running, sending a text message saying the user is busy, etc.
Accordingly, the disclosed embodiment may link a voice command with an activation trigger, such that a user that wants to control a certain function only has to activate the associated activation trigger. Hence, when a user is among several other people, or in a place where silence is required - e.g. a library, the user just has to activate the activation trigger in order to control the at least one function. The user does not have to utter the voice command and the user does not have to look at a visual user interface. Thus, the disclosed embodiment may enable the user to create silent and personal “shortcuts” to their favorite skills and actions.
It is realized that it may be more time efficient to use an activation trigger instead of uttering a long complicated screed of voice commands. The intuitive way to eliminate a long voice command would be to replace it with a shorter one. However, according to that solution, a voice command still has to be uttered. Instead, with the disclosed embodiment, these uncomfortable situations may be eliminated.
Another intuitive way to solve the above stated problem with long complicated voice commands would be to replace them with a visual user interface. However, as discussed above, that solution would not be particularly time efficient as it would require the user to look at the visual user interface instead.
According to one embodiment, the method 100 may further comprise storing 150 the voice sample representation associated with the activation trigger in at least one memory. By storing the voice sample representation associated with the activation trigger, it may be possible to re-use the activation trigger, to change, or update, the voice sample representation associated with the activation trigger, and to delete the voice sample representation associated with the activation trigger. Accordingly, the disclosed embodiment may provide a more flexible solution.
Furthermore, by storing the voice sample representation associated with the activation trigger, it may be possible to store multiple voice sample representations, wherein each voice sample representation may be associated with an activation trigger for controlling at least one function. Hence, one activation trigger may be stored to initiate one voice sample representation for controlling one function and another activation trigger may be stored to initiate another voice sample representation for controlling another function.
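A minimal sketch of how such stored bindings might be kept and updated, assuming a simple JSON file stands in for the at least one memory; the file name, class and method names are hypothetical.

```python
import json
from pathlib import Path

class ShortcutStore:
    """Sketch of a memory holding trigger -> representation bindings so
    that shortcuts can be re-used, updated and deleted (step 150)."""

    def __init__(self, path="shortcuts.json"):
        self.path = Path(path)
        self.bindings = (json.loads(self.path.read_text())
                         if self.path.exists() else {})

    def store(self, trigger_id, representation):
        self.bindings[trigger_id] = representation  # create or update
        self._flush()

    def delete(self, trigger_id):
        self.bindings.pop(trigger_id, None)
        self._flush()

    def _flush(self):
        self.path.write_text(json.dumps(self.bindings))

store = ShortcutStore()
store.store("double_press", "cmd:play favorite playlist")
store.store("triple_press", "cmd:start workout app")
```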
In one embodiment, controlling the at least one function may comprise controlling the at least one function in at least one processor that is located remotely from the voice based user interface. The method 100 may then further comprise transmitting 180 the voice sample representation to the at least one processor by means of a communications interface. Accordingly, the activation trigger may be activated at a device that is remote from where the at least one function may be controlled. The proposed method 100 may accordingly be implemented in several devices communicatively coupled to each other and may provide a more flexible solution.
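A hedged sketch of the transmission step 180 follows, with a direct method call standing in for the communications interface; an actual device might use Bluetooth or a network link, and all names here are hypothetical.

```python
# Sketch of step 180; the transport is a direct call for illustration only.

class RemoteProcessor:
    def receive(self, representation):
        print(f"remote processor executing: {representation}")

class CommunicationsInterface:
    def __init__(self, remote_processor):
        self.remote = remote_processor

    def transmit(self, representation):  # step 180
        self.remote.receive(representation)

interface = CommunicationsInterface(RemoteProcessor())
interface.transmit("cmd:play favorite playlist")
```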
The voice sample representation may for example be a code. Thus, initiating 170 the voice sample representation may comprise executing 190 the code. By executing said code, the controlled at least one function may be performed.
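As a hedged illustration of a representation that is a code, the sketch below stores a zero-argument callable, so that initiating the representation is simply executing it; the command strings and function names are invented for illustration.

```python
# Sketch: the representation as executable code, here a callable, so that
# initiating it (step 170) amounts to executing it (step 190).

def make_representation(command_text):
    if "playlist" in command_text:
        return lambda: print("playing favorite playlist")
    if "busy" in command_text:
        return lambda: print("sending 'I am busy' text message")
    return lambda: print(f"unrecognized command: {command_text}")

code = make_representation("start my favorite playlist")
code()  # executing the code performs the controlled function
```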
In one advantageous embodiment, the activation trigger may comprise an action associated with an activation button. Accordingly, in order to initiate the voice sample representation for controlling the at least one function, an action may be performed with the activation button. The action may be any action involving the activation button. Examples of such actions may for example be touching, dragging, depressing and turning the activation button. The activation button may be at least one activation button, i.e. may also include several activation buttons.
The activation button(s) may for example be a pushbutton. A pushbutton is a traditional hardware button that is physically visible to a user. When the activation button is a pushbutton, the activation trigger may for example be a depression, such as a push, on the at least one pushbutton.
The activation button(s) may alternatively, or additionally, be a capacitive button. When an object, such as a finger, comes in contact with the capacitive button, it causes an interference with the capacitance. It changes the total capacitance, and the button may be engineered to initiate the voice sample representation when this disturbance occurs. The activation button may also, additionally or alternatively, be a software button. Software buttons are exact replicas of capacitive buttons, but mirrored onto a screen. Hence, the activation button(s) may be any type of button as long as it in some way indicates to the user that it is a button where an activation trigger may be performed.
In one embodiment, the activation trigger may comprise an action associated with multiple, i.e. two or more, pushbuttons. In advantageous embodiments, the number of multiple pushbuttons is two. According to such embodiments, the activation trigger may for example comprise a first depression on the first pushbutton and then another depression on the other one of the two pushbuttons in order to initiate the voice sample representation. Hence, in this example, if a depression on just one of the two pushbuttons occurs, the voice sample representation will not be initiated.
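One possible detection logic for such a two-pushbutton trigger is sketched below; the one-second window and the class name are assumptions, since the disclosure does not specify any timing.

```python
import time

class TwoButtonTrigger:
    """Fires only when the first button is pressed and then the second,
    within a timeout; a press on just one of the two buttons does nothing."""

    def __init__(self, first, second, timeout=1.0, on_fire=None):
        self.first, self.second = first, second
        self.timeout = timeout
        self.on_fire = on_fire or (lambda: print("trigger activated"))
        self._first_pressed_at = None

    def press(self, button):
        now = time.monotonic()
        if button == self.first:
            self._first_pressed_at = now
        elif (button == self.second
              and self._first_pressed_at is not None
              and now - self._first_pressed_at <= self.timeout):
            self._first_pressed_at = None
            self.on_fire()
        else:
            self._first_pressed_at = None  # wrong order: reset

trigger = TwoButtonTrigger("A", "B")
trigger.press("A"); trigger.press("B")  # fires
trigger.press("B")                      # alone: does not fire
```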
By associating the activation trigger with multiple buttons, it may be possible to expand the possible alternatives of activation triggers when a limited number of activation buttons is available. Hence, the number of activation triggers may be increased without having to increase the number of buttons. It may for example be desirable to limit the number of buttons because of limited space available for buttons, or simply for design reasons.
According to one embodiment, associating the voice sample representation with the activation trigger may further comprise recording 130, by the at least one processor, an activation trigger, and associating 140 the voice sample representation with the recorded action. With this proposed embodiment, it may be possible to associate the voice sample representation with more complicated activation triggers. For example, the voice sample representation may be associated with three depressions, such as pushes, on the activation button, or with a short depression followed by a long depression. This may expand the possible number of activation triggers further.
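Recording such press patterns might look like the following sketch, where press durations are classified into short and long pushes; the 0.5 s threshold is an assumption for illustration.

```python
# Sketch of recording an activation trigger as a press pattern (step 130)
# and matching later presses against it.

SHORT_LONG_THRESHOLD = 0.5  # seconds; assumed value

def classify(durations):
    return tuple("long" if d >= SHORT_LONG_THRESHOLD else "short"
                 for d in durations)

class RecordedTrigger:
    def __init__(self):
        self.pattern = None

    def record(self, press_durations):
        # e.g. three short pushes, or a short push followed by a long one
        self.pattern = classify(press_durations)

    def matches(self, press_durations):
        return self.pattern is not None and classify(press_durations) == self.pattern

t = RecordedTrigger()
t.record([0.1, 0.9])          # a short press followed by a long press
print(t.matches([0.2, 1.2]))  # True
print(t.matches([0.2, 0.2]))  # False
```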
According to one exemplary embodiment, the activation trigger may comprise an action associated with a sensor. The sensor may be any kind of sensor that may register an activation trigger. For example, the sensor may be a gyro or an accelerometer and the activation trigger may be an action associated with that sensor. Accordingly, in order to initiate the voice sample representation for controlling the at least one function, an action may be initiated with the sensor. The action may be any action involving the sensor. Examples of such actions may be a certain movement of the sensor.
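As a hedged illustration of a sensor based trigger, a single accelerometer sample could be tested against a magnitude threshold; both the threshold value and the shake gesture are assumptions.

```python
import math

SHAKE_THRESHOLD = 25.0  # m/s^2; assumed value for illustration

def is_shake(sample):
    """Test one accelerometer sample (ax, ay, az) against a magnitude
    threshold; a real implementation would filter a stream of samples."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az) > SHAKE_THRESHOLD

if is_shake((3.0, 30.0, 2.0)):
    print("sensor trigger activated: initiating voice sample representation")
```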
In one further embodiment, the activation trigger may be associated with different voice sample representations depending on a context. The context may include at least one of a time and a location. Accordingly, different voice sample representations may be initiated when the same activation trigger is activated depending on the time of day or the location. It may however be noted, that context does not have to be limited to location and time. Context may further include, for example, weather, surrounding people, mood etc.
When the context includes time, the function that may be controlled by the activation trigger will depend on the current time. If an activation trigger is performed in the morning, before 8 am, this may for example give a daily weather forecast. If the same activation trigger is performed at breakfast, between 8 and 9 am, it may for example broadcast the daily news flash briefing. Furthermore, if the same activation trigger is performed in the evening, between 6 and 9 pm, it may for example play a certain playlist. Alternatively, when the context includes a location, the activation trigger may be associated with the location. If an activation trigger is performed at home, it may initiate one voice sample representation, while if the same activation trigger is performed at the office, it may initiate another voice sample representation.
Alternatively, the context may include both location and time. Hence, it may be possible to, for example, distinguish between weekdays and weekends. At nine o’clock on a weekday, a user may be at the office and the activation trigger may, for example, initiate one voice sample representation for sending a text message saying that the user is busy, while at nine o’clock on a weekend, a user may be at home and the activation trigger may initiate another voice sample representation, for example for playing a favorite playlist.
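The context-dependent resolution described above could be sketched as a simple rule lookup, shown here in Python. The rule set mirrors the time and weekday/weekend examples in this description; the representation names and the decision order are assumptions of the sketch.

```python
from datetime import datetime

def resolve_representation(now: datetime, location: str) -> str:
    """Map one activation trigger to a representation based on context."""
    weekday = now.weekday() < 5                # Monday..Friday
    if weekday and location == "office" and now.hour == 9:
        return "send_busy_text"                # assumed stored representation names
    if not weekday and location == "home" and now.hour == 9:
        return "play_favorite_playlist"
    if now.hour < 8:
        return "daily_weather_forecast"        # morning, before 8 am
    if 8 <= now.hour < 9:
        return "news_flash_briefing"           # breakfast, 8-9 am
    if 18 <= now.hour < 21:
        return "play_playlist"                 # evening, 6-9 pm
    return "default_action"
```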
The above described embodiment has the advantage that a single activation trigger may be used for several different purposes. By associating the voice sample representation in dependence on a context, it may be possible to adapt the controlled function to prevailing conditions, to achieve a more flexible solution, and to more easily satisfy a user’s needs.
In one embodiment, the method 100 may further comprise generating 160, or otherwise producing or creating, a confirmation after the voice sample representation has been associated with the activation trigger. Hence, the user may be assured that the voice sample representation actually has been associated with the intended activation trigger, and the risk of the user unnecessarily redoing the method may be reduced. The confirmation may, for example, be an audio confirmation, a visual confirmation, a tactile confirmation, or a combination of the previously mentioned confirmations.
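A minimal sketch of such a confirmation step follows; the print calls stand in for device-specific drivers (a tone generator, an LED, a vibration motor), all of which are assumptions of the example.

```python
def confirm(channels=("audio",)):
    """Signal that the association succeeded on one or several channels."""
    actions = {
        "audio":   lambda: print("play confirmation tone"),   # e.g. a short beep
        "visual":  lambda: print("blink LED"),                # e.g. an LED flash
        "tactile": lambda: print("pulse vibration motor"),    # e.g. a haptic pulse
    }
    for channel in channels:
        actions[channel]()   # the channels may be combined, as described above
```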
According to a second aspect, there is provided a voice based user interface for implementing the method according to the first aspect.
Figure 2 discloses an example implementation of a voice based user interface 10 for controlling at least one function. The voice based user interface 10 may be communicatively connected to a voice-recording device 25, 225. The voice-recording device 25, 225 may be configured to obtain a voice command. The voice command may constitute a command for controlling the at least one function. The voice based user interface 10 may further be communicatively connected to at least one processor 40, 240; and at least one memory 60, 260. In one exemplary embodiment, the at least one memory 60, 260 may comprise instructions that are executable by the at least one processor 40, 240 whereby the at least one processor 40, 240 may be operative to convert the received voice command into a voice sample representation, and associate the voice sample representation with an activation trigger. In response to an activation of the activation trigger, the voice sample representation may be initiated for controlling the at least one function.
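Condensing Figure 2 into code, the interface could be sketched as below. The voice sample representation is modeled as a Python callable for simplicity, and convert_to_representation() stands in for whatever speech-processing backend is used; both are assumptions of this sketch, not details from the disclosure.

```python
class VoiceBasedUserInterface:
    def __init__(self, recorder, convert_to_representation):
        self.recorder = recorder                    # voice-recording device 25, 225
        self.convert = convert_to_representation    # runs on the processor 40, 240
        self.bindings = {}                          # held in the memory 60, 260

    def learn(self, trigger_id):
        command = self.recorder.record()            # obtain the voice command
        representation = self.convert(command)      # convert it to a representation
        self.bindings[trigger_id] = representation  # associate it with the trigger
        return representation

    def on_trigger(self, trigger_id):
        representation = self.bindings.get(trigger_id)
        if representation is not None:
            representation()                        # initiate the controlled function
```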
The above disclosed voice based user interface 10 is more time efficient to use. The at least one function may be controlled without having to utter a command, thus making it possible to control the at least one function in places where it is not appropriate to speak.
The voice-recording device 25, 225, may according to one embodiment be comprised within an audio interface 20, 220. The audio interface 20, 220 may, for example, be configured to both receive and produce audio.
In one embodiment, the at least one memory 60, 260 may be configured to store the voice sample representation associated with said activation trigger. The at least one memory 260 may be located remotely from the voice based user interface 10, for example in a remote device 200, or the memory 60 may be located within the same device 20 as the voice based user interface 10. By storing the voice sample representation associated with the activation trigger, it may be possible to reuse the saved activation trigger multiple times, and hence to retain the activation triggers associated with the voice commands that are used the most.
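Persisting the bindings could, as one illustrative assumption, look like the following sketch, where trigger identifiers are strings and the JSON file stands in for whatever storage the device or companion app actually uses.

```python
import json

def save_bindings(bindings, path="bindings.json"):
    # keys identify activation triggers, values name stored representations
    with open(path, "w") as f:
        json.dump(bindings, f)

def load_bindings(path="bindings.json"):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}   # no bindings stored yet
```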
The voice based user interface 10 may, according to one embodiment, be communicatively connected to a communications interface 30. Thus, the controlled at least one function may be executed in at least one processor 240 located remotely from the voice based user interface 10. The communications interface 30 may be configured to transmit the voice sample representation to the at least one processor 240. Accordingly, the voice based user interface 10 may control functions with a processor 240 that may be located remotely from the voice based user interface 10.
By providing a voice based user interface 10 according to the above, a more flexible solution is provided. It may be possible to locate the at least one processor 40, 240 where it is most suitable; hence, the voice based user interface 10 does not have to be located in a device comprising a processor for controlling the at least one function. The most suitable location may, for example, depend on prevailing conditions such as available component space and cost constraints. However, it may be noted that this embodiment does not exclude the possibility of also communicatively coupling the voice based user interface 10 to a processor located within the same device as the voice based user interface 10.
The voice sample representation may for example be a code. The code may for example be a computer code. The code may alternatively be a machine code or any other suitable code that may be readable by the at least one processor 40, 240. The at least one memory 60, 260 may comprise instructions that are executable by the at least one processor 40, 240, whereby the at least one processor 40, 240 may be operative to initiate the voice sample representation by executing the code.
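One way to picture a representation stored as processor-readable code is a registry of named functions, where initiating the representation means executing the registered code. The registry pattern and the example function are assumptions of this sketch.

```python
REGISTRY = {}

def representation(name):
    """Register a function as the executable code for a voice command."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@representation("play_playlist")
def play_playlist():
    print("playing playlist")   # stands in for the real audio function

def initiate(name):
    REGISTRY[name]()            # the processor initiates the representation by executing it
```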
In one embodiment, the voice based user interface 10 may further be communicatively connected to an activation button 50. The activation trigger may comprise an action associated with the activation button 50. Hence, in order for the at least one function to be controlled, the activation trigger may comprise an action with the activation button 50. The activation button 50 may be any type of button, as described above, for example a pushbutton. There may also be multiple activation buttons 50, which may all be of the same type or of different types.
When the activation button 50 is a pushbutton, the activation trigger may comprise a depression on the pushbutton. However, it may be realized that the activation trigger may be any action with the activation button 50. The only thing that may limit the associated activation trigger is the software and/or the hardware of the button. For example, if the activation button 50 is a turnable knob, the associated activation trigger may be a turning of the knob, or if the activation button 50 is a software button, as described above, the activation trigger may be a tap or a touch action on the software button.
In one embodiment, the activation trigger may alternatively or additionally comprise an action associated with multiple activation buttons 50, which may for example all be pushbuttons. Accordingly, the voice sample representation may not be initiated until the action associated with all the involved pushbuttons has been performed. For example, the action associated with multiple pushbuttons may comprise, without limitation, a depression on each of at least two pushbuttons. Accordingly, if there are two activation buttons 50, there may be at least four different activation triggers associated with four different voice sample representations: a first voice sample representation associated with a depression on the first activation button; a second voice sample representation associated with a depression on the second activation button; a third voice sample representation associated with a depression on the first activation button followed by a depression on the second activation button; and a fourth voice sample representation associated with a depression on the second activation button followed by a depression on the first activation button. Hence, the number of possible activation triggers may be increased without increasing the number of activation buttons 50.
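The four triggers enumerated above can be expressed as a simple lookup table, sketched here with illustrative button and representation names.

```python
TRIGGERS = {
    ("A",):     "first_representation",    # press button A
    ("B",):     "second_representation",   # press button B
    ("A", "B"): "third_representation",    # press A, then B
    ("B", "A"): "fourth_representation",   # press B, then A
}

def dispatch(press_sequence):
    """Return the representation bound to an ordered sequence of button presses."""
    return TRIGGERS.get(tuple(press_sequence))
```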
In one embodiment, the voice sample representation may be associated with the activation trigger by means of the at least one processor 40, 240 recording an activation trigger and associating the voice sample representation with the recorded activation trigger. Hence, longer and more complex sequences of activation triggers may be recorded and associated with a voice sample representation.
In one embodiment, the activation trigger may be associated with different voice sample representations depending on a context. The context may include at least one of a time and a location.
In one embodiment, the at least one memory 60, 260 may comprise instructions that are executable by the at least one processor 40, 240, whereby the at least one processor 40, 240 may be operative to generate a confirmation when the voice sample representation is associated with the activation trigger.
In a third aspect, there is provided an audio device 20. The audio device 20 may comprise a voice based user interface 10 according to the second aspect.
The audio device 20 may for example be, without limitation, a loudspeaker, a hearing aid, a pair of ear protection devices or a pair of headphones. If the audio device 20 is a pair of headphones, the pair of headphones may for example be in-ear headphones. Accordingly, the audio device 20 may be any device comprising a voice based user interface 10 for controlling at least one function. The function may in some embodiments be related to audio, such as playing and pausing music. However, the controlled at least one function may, additionally or alternatively, be unrelated to audio.
In one embodiment, the audio device 20 may further comprise a visual based user interface 310, as illustrated in Figure 3. The visual user interface 310 may display visual information. The visual user interface 310 may, for example, display information about stored voice sample representations with their associated activation triggers. This information may for example be comprised within an app, where it may be possible to get an overview of the created activation triggers used for controlling the at least one function. The app may enable the user to update, delete or even create new activation triggers with associated voice sample representations.
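The app-side overview described above amounts to simple create, update and delete operations over the stored bindings. The sketch below assumes an in-memory dictionary as the store; a real app would back this with the device's or phone's storage.

```python
class TriggerOverview:
    def __init__(self, bindings):
        self.bindings = bindings                 # trigger id -> representation name

    def list(self):
        return sorted(self.bindings.items())     # shown to the user as an overview

    def create(self, trigger_id, representation):
        self.bindings[trigger_id] = representation

    def update(self, trigger_id, representation):
        if trigger_id in self.bindings:
            self.bindings[trigger_id] = representation

    def delete(self, trigger_id):
        self.bindings.pop(trigger_id, None)
```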
In one embodiment, the audio device 20 may be communicatively coupled to a remote device 200 which may comprise a visual user interface 310 as previously discussed.
One non-limiting exemplary embodiment related to the above disclosed voice based user interface 10 may now be described with reference to Figure 4. The voice based user interface 10 may be comprised within an audio device 20 such as a pair of headphones. The headphones may comprise at least one activation button 50 in the form of a pushbutton. The headphones may further be communicatively coupled to a remote device 200. The remote device 200 may, for example, be a smartphone.
The headphones may be communicatively coupled to a voice-recording device 25, 225 within the headphones or within the smartphone. The voice-recording device 25, 225 may obtain 420 a voice command. The voice command may constitute a command for controlling at least one function. In one embodiment, the voice-recording device 25, 225 may be configured to obtain the voice command when a button is pressed 410. The button may be the activation button 50, or a different button. The button may be located on the pair of headphones communicatively coupled to the voice based user interface 10, or on the smartphone communicatively connected to the headphones. When the button is released, this may indicate that the obtaining of the voice command is completed.
When the voice command has been obtained, it may be converted 440 into a voice sample representation. As this, in this exemplary embodiment, is performed remotely from the headphones, the voice command may first be transmitted 430 via the communications interfaces 30, 230 to the smartphone. The converted voice sample representation may represent the action that the received voice command is to be translated into.
According to this exemplary embodiment, the obtained voice command may be converted into the voice sample representation by communicating with a hosted app service 450 via an app 460. The response to the action may then be transmitted back to the smartphone 470, 480 and possibly further to the headphones 490, depending on which voice command was received by the headphones.
The converted voice sample representation may then be associated with an activation trigger, which may be stored within a memory in the headphones or in the smartphone. Thus, the next time the function is intended to be controlled, the activation trigger may be activated in order to control the at least one function, instead of uttering the voice command. Hence, a voice based user interface is provided that may be both easy and time efficient to use.
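Tying the Figure 4 steps together, the end-to-end flow could be sketched as follows. The headphones and smartphone objects and their method names are assumptions introduced only to connect the numbered steps of the figure.

```python
def teach_new_trigger(headphones, smartphone, trigger_id):
    headphones.wait_for_button_press()             # 410: the press starts recording
    command = headphones.record_until_release()    # 420: obtain the voice command
    representation = smartphone.convert(command)   # 430-480: transmit, then convert via app 460 and app service 450
    smartphone.store(trigger_id, representation)   # associate with the trigger and store
    return representation

def use_trigger(smartphone, trigger_id):
    # later: activating the trigger replaces uttering the voice command
    smartphone.initiate(trigger_id)
```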
Modifications and other variants of the described embodiments will come to mind to one skilled in the art having benefit of the teachings presented in the foregoing description and associated drawings. Therefore, it is to be understood that the embodiments are not limited to the specific example embodiments described in this disclosure and that modifications and other variants are intended to be included within the scope of this disclosure. Furthermore, although specific terms may be employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation. Therefore, a person skilled in the art would recognize numerous variations to the described embodiments that would still fall within the scope of the disclosure. As used herein, the terms “comprise/comprises” or “include/includes” do not exclude the presence of other elements or steps. Furthermore, although individual features may be included in different numbered example embodiments, these may possibly advantageously be combined, and the inclusion of different numbered example embodiments does not imply that a combination of features is not feasible and/or advantageous. In addition, singular references do not exclude a plurality.

Claims

1. A method of providing a voice based user interface for controlling at least one function, the method comprising:
obtaining (110) a voice command by means of a voice-recording device, the voice command constituting a command for controlling the at least one function;
converting (120) the obtained voice command into a voice sample representation by means of at least one processor;
associating (140) the voice sample representation with an activation trigger by means of the at least one processor; and
responsive to an activation of said activation trigger, initiating (170), by means of the at least one processor, the voice sample representation for controlling the at least one function.
2. The method according to claim 1, further comprising:
storing (150) the voice sample representation associated with the activation trigger in at least one memory.
3. The method according to claim 1 or 2, wherein controlling the at least one function comprises controlling the at least one function in at least one processor that is located remotely from the voice based user interface, and wherein the method further comprises:
transmitting (180) said voice sample representation to said at least one processor by means of a communications interface.
4. The method according to any one of the previous claims, wherein the voice sample representation is a code, and wherein initiating the voice sample representation comprises:
executing (190) said code.
5. The method according to any one of the previous claims, wherein the activation trigger comprises an action associated with an activation button.
6. The method according to claim 5, wherein the activation button is a pushbutton.
7. The method according to claim 6, wherein the activation trigger is a depression on the pushbutton.
8. The method according to claim 6 or 7, wherein the activation trigger comprises an action associated with multiple pushbuttons.
9. The method according to any one of the previous claims, wherein associating the voice sample representation with the activation trigger further comprises:
recording (130), by the at least one processor, an activation trigger, and
associating (140) the voice sample representation with the recorded activation trigger.
10. The method according to any one of the previous claims, wherein the activation trigger is associated with different voice sample representations depending on a context, and wherein the context includes at least one of a time and a location.
11. The method according to any one of the previous claims, wherein the method further comprises:
generating (160) a confirmation after the voice sample representation has been associated with the activation trigger.
12. A voice based user interface (10) for controlling at least one function, the voice based user interface being communicatively connected to:
i) a voice-recording device (25, 225), wherein the voice-recording device (25, 225) is configured to obtain a voice command, the voice command constituting a command for controlling the at least one function;
ii) at least one processor (40, 240); and
iii) at least one memory (60, 260);
wherein said at least one memory (60, 260) comprises instructions that are executable by the at least one processor (40, 240) whereby the at least one processor (40, 240) is operative to:
convert the received voice command into a voice sample representation, and associate the voice sample representation with an activation trigger, and in response to an activation of said activation trigger,
initiate the voice sample representation for controlling the at least one function.
13. The voice based user interface (10) according to claim 12, wherein
the at least one memory (60, 260) is configured to store the voice sample representation associated with said activation trigger.
14. The voice based user interface (10) according to claim 12 or 13, wherein the voice based user interface is communicatively connected to a communications interface (30), wherein the controlled at least one function is executed in at least one processor (240) located remotely from the voice based user interface (10), and wherein the communications interface (30) is configured to transmit the voice sample representation to said at least one processor (240).
15. The voice based user interface (10) according to any of claims 12 to 14, wherein the voice sample representation is a code, and wherein the at least one memory (60, 260) comprises instructions that are executable by the at least one processor (40, 240) whereby the at least one processor (40, 240) is operative to initiate said voice sample representation by executing said code.
16. The voice based user interface (10) according to any one of claims 12 to 15, wherein the voice based user interface (10) is further communicatively connected to an activation button (50), and wherein said activation trigger comprises an action associated with the activation button (50).
17. The voice based user interface (10) according to claim 16, wherein the activation button (50) is a pushbutton.
18. The voice based user interface (10) according to claim 17, wherein the activation trigger comprises a depression on the pushbutton.
19. The voice based user interface (10) according to claim 17 or 18, wherein the activation trigger comprises an action associated with multiple pushbuttons.
20. The voice based user interface (10) according to any one of claims 12 to 19, wherein the voice sample representation is associated with the activation trigger by means of the at least one processor (40, 240) by:
recording an activation trigger, and
associating the voice sample representation with the recorded activation trigger.
21. The voice based user interface (10) according to any one of claims 12 to 20, wherein the activation trigger is associated with different voice sample representations depending on a context, and wherein the context includes at least one of a time and a location.
22. The voice based user interface (10) according to any one of claims 12 to 21, wherein the at least one memory (60, 260) comprises instructions that are executable by the at least one processor (40, 240) whereby the at least one processor (40, 240) is operative to generate a confirmation when the voice sample representation is associated with the activation trigger.
23. An audio device (20) comprising a voice based user interface (10) according to any one of claims 12 to 22.
24. The audio device (20) according to claim 23, wherein the audio device (20) is a loudspeaker.
25. The audio device (20) according to claim 23, wherein the audio device (20) is a pair of headphones.
26. The audio device (20) according to claim 25, wherein the pair of headphones are in-ear headphones.
27. The audio device (20) according to claim 23, wherein the audio device (20) is a hearing aid.
28. The audio device (20) according to claim 23, wherein the audio device (20) is a pair of ear protection devices.