WO2009104195A1 - Voice based man-machine interface (MMI) for mobile communication devices - Google Patents


Info

Publication number
WO2009104195A1
Authority
WO
WIPO (PCT)
Application number
PCT/IN2008/000094
Other languages
French (fr)
Inventor
Krishnamoorthy Karungulam Ramachandran
Shyam Prasad Kompadav Shetty
Original Assignee
Krishnamoorthy Karungulam Ramachandran
Shyam Prasad Kompadav Shetty
Application filed by Krishnamoorthy Karungulam Ramachandran and Shyam Prasad Kompadav Shetty
Priority to PCT/IN2008/000094
Publication of WO2009104195A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/247: Telephone sets including user guidance or feature selection means facilitating their use
    • H04M 1/2474: Telephone terminals specially adapted for disabled people

Definitions

  • FIG. 6 shows the overall method of communication provided by the VoMM communication device, wherein the voice commands of the owner are received at 601. The said voice commands are processed and recognized using the MMI engine at 602. The said device performs the functions according to said received voice commands at 603 and produces voice output in response to said commands at 604.
  • FIG. 7 shows the Learning mode of operation of the VoMM communication device according to various embodiments, wherein voice input is conditioned and digitized by the Signal Conditioning and Analog to Digital Converter (ADC) (402) unit at 701.
  • The voice characteristic parameters of said input signal from the Signal Conditioning and ADC (402) are extracted by the VR (403) unit at 702.
  • The conditioned signal and extracted parameters are passed to the Keyword Store Control (409) unit, which stores a data table of said voice parameters at 704; parameters are estimated for the rest of the keywords at 705, and said data table of estimated keywords is saved in the Complete Keyword Dataset (408) at 706.
  • FIG. 8 shows the Normal mode operation of the VoMM communication device according to various embodiments, wherein the voice input is conditioned and digitized by the signal conditioning and ADC unit at 801.
  • the voice recognition unit (403) recognizes the spoken keywords with reference to the Context Specific Keyword Dataset (407) at 802 and sends said signals to the Menu Item Objects (405).
  • the signals are queued in a Signal Queue at 803, before being processed. Queuing of signals is incorporated to ensure reliability of operation. This also enhances user experience by being able to recognize and respond to user's voice inputs without missing any keyword.
  • Each signal triggers a Menu Item Object corresponding to the spoken keyword and performs the actions specified for that Menu Item Object.
  • FIG. 9 shows the working of Menu Item Object (405) unit according to an embodiment of VoMM communication device in Normal Mode wherein said unit is invoked through signal from VR (403) as shown in FIG. 4 at 901.
  • The Menu Item Object (405) performs the action requested by said signal at 902 and outputs a stored voice message at 903.
  • FIG. 10 shows an example according to an embodiment of VoMM communication device, the owner requests for the Menu using keywords 'Menu Please' indicated by VCl (Voice Command 1), which is conditioned and digitized by the Signal Conditioning and ADC (402).
  • the VR (403) module recognizes said keyword ("Menu") and signals the Menu Item Object (405) via the Signal Q (404).
  • the corresponding Menu Item Object (405) reads out the menu options which includes Phonebook, SMS, Called Numbers, Missed Calls and Help, and asks the owner to choose the option and sends the context as 'Menu' to Context Control (406) unit.
  • The Context Control (406) unit retrieves the dataset corresponding to the 'Menu' context and stores it in the Context Specific Keyword Dataset (407).
  • the owner selects the keyword option 'Phonebook' represented by VC2 which is recognized by VR (403) by referring to said Context Specific Keyword Dataset (407) and the request is queued and sent to the Menu Item Object (405) which processes the signal by reading out the relevant options Call, Recall, Add, Edit or Delete, which are updated in the Context Specific Keyword Dataset (407) by the context control as described herein.
  • the conditioned and digitized 'Call' keyword option, represented by VC3 selected by said owner is recognized by the VR (403) by referring to the Context Specific Keyword Dataset (407) and processed by the Menu Item Objects (405) for an entry selected by owner from the Phonebook.
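The 'Menu Please' → 'Phonebook' → 'Call' walkthrough above can be sketched as a small dispatch table. The menu tree and function names below are illustrative assumptions, not structures taken from the patent.

```python
# Hypothetical sketch of Menu Item Object (405) dispatch for the FIG. 10
# walkthrough; the menu tree is assumed purely for illustration.
MENU_TREE = {
    "menu": ["Phonebook", "SMS", "Called Numbers", "Missed Calls", "Help"],
    "phonebook": ["Call", "Recall", "Add", "Edit", "Delete"],
}

def menu_item_object(keyword, context_stack):
    """Read out the options for the recognized keyword and record the new
    context (a stand-in for signalling the Context Control unit, 406)."""
    context_stack.append(keyword)
    options = MENU_TREE.get(keyword)
    if options is None:
        return "Performing action"          # leaf keyword such as 'call'
    return "Options: " + ", ".join(options)

ctx = []
print(menu_item_object("menu", ctx))        # VC1: 'Menu Please'
print(menu_item_object("phonebook", ctx))   # VC2
print(menu_item_object("call", ctx))        # VC3: leaf action
```

Each recognized keyword both triggers a voice response and pushes the context that determines which keywords are valid next, which is the behaviour FIG. 10 describes.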
  • said VoMM communication device can be used for more than 15 days without recharging, with a usage pattern of approximately 120 minutes of call time per day.
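As a sanity check on the stated usage pattern, the required battery capacity can be budgeted. The current-draw figures below are illustrative assumptions for a display-less single-chip device; the patent gives no such figures.

```python
# Rough battery budget for 15 days at ~120 minutes of talk time per day.
# TALK_CURRENT_MA and STANDBY_CURRENT_MA are assumed values, not from the patent.
TALK_CURRENT_MA = 60.0
STANDBY_CURRENT_MA = 1.0
TALK_MIN_PER_DAY = 120
DAYS = 15

talk_hours = DAYS * TALK_MIN_PER_DAY / 60.0   # 30 h of calls over 15 days
standby_hours = DAYS * 24.0 - talk_hours      # 330 h idle

required_mah = (talk_hours * TALK_CURRENT_MA
                + standby_hours * STANDBY_CURRENT_MA)
print(f"~{required_mah:.0f} mAh needed under these assumptions")
```

Under these assumed currents the budget comes to roughly 2100 mAh; eliminating the display and keypad is what makes the standby term small enough for a multi-day figure to be plausible.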

Abstract

A VoMM communication device is provided, where the communication device is adapted to take only voice commands as input, perform the commanded functions and produce voice output. The communication device is trained to respond only to the commands of a specific user, providing inherent security, while remaining language and accent independent. The absence of a keypad and visual display panel, together with a Voice Recognition engine configured to recognize only a limited set of context-specific keywords, reduces the power requirement and allows a small form factor that can be worn on the person with built-in headsets.

Description

Voice based Man-Machine Interface (MMI) for mobile communication devices
FIELD OF INVENTION
[001] The embodiments herein generally relate to mobile communications, and, more particularly, to voice-only-MMI mobile (VoMM) communication devices.
BACKGROUND AND PRIOR ART
[002] Conventional mobile communication devices, in addition to performing the basic functions of making and receiving calls, perform a wide range of other functions such as text messaging, camera, internet access, email, hand-held computer applications, radio, music playback, etc. These and other features provide more and improved ways and means of communication, computing and entertainment. However, such proliferation of features, and the evolution of the mobile device into a general purpose communication, computing and entertainment device, has complicated the user interface and led to dissatisfaction among various sections of mobile device users. Some of these users may be physically challenged and hence unable to use many of the features of a typical mobile device with a typical interface like a keypad in an effective way. Others may be users who desire simpler access to basic features rather than a complicated device with too many features that one may never use. As communication technologies evolve, users should be able to avail of various services, including basic services like making and receiving calls and sending and receiving text/voice messages, very easily. The devices must also be easy to handle. However, in reality, devices are becoming more complicated, more power consuming, more costly to buy and maintain, and difficult to use in general.
[003] Various attempts have been made to provide voice-based activation of devices for various purposes and to make it easier for users to access services. However, none of these devices relies on a voice Man-Machine Interface (MMI) alone.
[004] Hence, there is a need for a simple mobile communication device that provides easy access to services using a voice MMI alone, is cost effective, has a small form factor and has low power requirements.
SUMMARY OF INVENTION
[005] A mobile communication device using Voice-only-MMI and devoid of display and keypad units is provided. The device performs functions based on the input voice commands and responds using natural voice output.
[006] The VoMM communication device comprises an MMI engine with a signal conditioning unit, a voice recognition unit, a keyword store control unit, a menu items object unit, a context control unit, a complete keyword dataset, and a context specific keyword dataset.
[007] According to an embodiment, a method to provide services on a VoMM communication device is disclosed where voice commands from the owner are received by the device, and the MMI engine processes, recognizes and performs functions according to the received voice commands and produces voice output.
[008] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF DRAWINGS
[009] The embodiments herein will be better understood from the following description with reference to the drawings, in which: FIG. 1 shows voice-only-MMI single-chip mobile communication device in accordance with various embodiments as described herein;
FIG. 2 shows voice-only-MMI two-chip mobile communication device in accordance with various embodiments as described herein; FIG. 3 shows voice signal flow and control operations of VoMM communication device in accordance with various embodiments as described herein;
FIG. 4 shows schematic diagram of MMI engine in accordance with various embodiments as described herein;
FIG. 5 shows various wearable articles and their shape and size (form-factors) in which VoMM communication device can be incorporated in accordance with various embodiments described herein;
FIG. 6 shows the flow diagram of the overall method of communication provided by the
VoMM communication device;
FIG. 7 shows a flow diagram of the learning mode operation of the VoMM communication device;
FIG. 8 shows a flow diagram of the normal mode of operation of the VoMM communication device;
FIG. 9 shows a flow diagram of the working of the Menu Item Object unit; and
FIG. 10 shows an example of the various embodiments of the VoMM communication device, when operating in Normal mode.
DESCRIPTION OF EMBODIMENTS
[0010] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description.
Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0011] As mentioned, there remains a need for a simple mobile communication device that allows users to access services in an easy manner. The embodiments herein achieve this by providing a mobile communication device with Voice-only-MMI, which has a small form factor and low power requirements. Referring now to the drawings, and more particularly to FIGS. 1 through 10, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.
[0012] FIG. 1 illustrates a single chip mobile device (101) comprising a phone semiconductor integrated chip (IC) wherein complete mobile communication device functionality, including but not limited to RF, Baseband and Stack, runs from said single chip.
[0013] According to various embodiments of the voice-only-MMI mobile (VoMM) communication device, the MMI Engine and Voice Synthesis Module work in conjunction with a CPU which can be internal or external to said modules.
[0014] According to an embodiment of the VoMM communication device, FIG. 2 illustrates a block diagram of a mobile communication device wherein the single chip mobile device (101) is connected to a small external CPU (202) comprising components to provide additional functionality to said single chip mobile device (101), wherein the components can include but are not limited to RAM (203), Flash (204), CODEC (205) and Timer (206) as shown in the figure. The said external CPU (202) can be provided when said single chip mobile device (101) cannot load additional functionality through a software or firmware upgrade to perform the Voice Recognition/Voice Synthesis (VR/VS) functionality. The voice is routed through the external CPU for voice processing (VR/VS) as shown in FIG. 2. The Control Path is used to communicate information from said external CPU (202) to the single chip mobile device (101).
[0015] According to an embodiment of the VoMM communication device, the interactions with said mobile communication device are only through the voice based MMI. The said mobile communication device incorporates Voice Recognition algorithms to identify commands and inputs unambiguously from the authentic user, wherein said authentic user can be set in said phone as described herein.
[0016] FIG. 3 illustrates an embodiment of the VoMM communication device wherein the MMI Engine (301) processes the voice input by working in conjunction with the external CPU (202). The Voice Synthesis Module (302) artificially produces human speech in conjunction with the external CPU (202), which comprises the recorded speech database, wherein said module can incorporate a model including but not limited to vocal tract and other human voice characteristics to create voice output.
[0017] FIG. 4 illustrates the core engine according to various embodiments of said VoMM communication device. According to an embodiment of the VoMM communication device, said device can be in Learning mode, wherein said mode is the default mode when the power of said device is switched on for the first time, or can be selected as the mode of operation by a user, herein referred to as 'owner', using a voice command. In Learning mode, the VR (403) engine operates in "wide-band mode", i.e. it operates in "speaker independent" mode. The voice input (401) is conditioned and digitized by the Signal Conditioning and ADC (402) Unit and recognized by the Voice Recognition (VR) Unit (403), wherein characteristic parameters of the voice input of said owner are extracted. The conditioned and digitized voice from the Signal Conditioning and ADC (402) and said parameters from the VR (403) are used by the Keyword Store Control (KSC) (409), which is triggered when said device is in Learning mode by the Learning Mode Control (410), to compile a data table comprising but not limited to keywords with their corresponding extracted parameters described herein. In Learning Mode, the mobile device guides the user on how to use the mobile device using intuitive voice prompts. During this process, it also extracts and stores the parameters for the keywords used for triggering various Menu Items. It also extracts the unique "voice signature", which is specific to the user (now the owner), during Learning Mode.
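The conditioning and parameter-extraction step of the Learning-mode pipeline can be sketched as follows. The pre-emphasis coefficient and the two toy "characteristic parameters" (log energy and zero-crossing rate) are illustrative stand-ins; the patent does not specify which features the VR unit extracts.

```python
import math

def condition_and_digitize(samples, dc_offset=0.0, pre_emphasis=0.95):
    """Stand-in for the Signal Conditioning and ADC unit (402): remove a
    DC offset and apply a first-order pre-emphasis filter."""
    centered = [s - dc_offset for s in samples]
    return [centered[0]] + [centered[i] - pre_emphasis * centered[i - 1]
                            for i in range(1, len(centered))]

def extract_parameters(frame):
    """Stand-in for the VR unit's (403) parameter extraction: log energy
    and zero-crossing rate of one voice frame."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for i in range(1, len(frame))
              if frame[i - 1] * frame[i] < 0) / len(frame)
    return math.log(energy + 1e-12), zcr

# Example: a 200 Hz tone sampled at 8 kHz, as a crude voiced-speech proxy
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
params = extract_parameters(condition_and_digitize(tone))
```

In the device, the resulting parameter tuple is what the KSC (409) would pair with each keyword in its data table.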
The said data table is stored in Complete Keyword Dataset (408) by said KSC (409).
[0018] Said KSC (409) is the core module of the Learning Mode. Said KSC is implemented as a simple state machine which guides the user to utter specific keywords using natural voice prompts. Once a keyword is uttered in response to a voice prompt, said KSC confirms this with the user using a voice prompt. After acquiring a limited set of keywords, the "Learning Process" is completed by said KSC (409) by estimating parameters for the remaining keywords and storing said keywords in the Complete Keyword Dataset (408). Learning mode is exited when all keywords are acquired, or through user action to exit Learning mode, or by timeout. According to an embodiment, if the user tries to use Learning mode in a noisy environment or if more than one person is speaking, the device automatically detects this, does not update its Complete Keyword Dataset (408) and retains the old data. It also prompts the user to use the Learning mode in a quiet environment with only the "Owner" speaking to it. This is to ensure the user is able to use the mobile device reliably under all conditions.
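The prompt-utter-confirm loop of the KSC state machine can be sketched as below; the keyword list, prompt wording and placeholder parameters are assumptions for illustration only.

```python
# Illustrative sketch of the KSC (409) as a simple state machine; the
# keyword list and prompt wording are assumed, not taken from the patent.
class KeywordStoreControl:
    def __init__(self, keywords):
        self.pending = list(keywords)   # keywords still to be acquired
        self.dataset = {}               # keyword -> extracted parameters (408)

    def next_prompt(self):
        """Voice prompt for the next keyword, or None when learning is done."""
        return f"Please say '{self.pending[0]}'" if self.pending else None

    def on_utterance(self, params, confirmed):
        """Store a keyword's parameters only after voice-prompt confirmation."""
        if confirmed and self.pending:
            self.dataset[self.pending.pop(0)] = params

ksc = KeywordStoreControl(["menu", "call", "exit"])
while ksc.next_prompt() is not None:
    # A real device would extract params from the owner's utterance here.
    ksc.on_utterance(params=(0.0, 0.0), confirmed=True)
```

The loop terminates exactly when every keyword has been acquired, matching the first of the three exit conditions described above; the user-exit and timeout paths would be extra transitions out of the same machine.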
[0019] According to an embodiment, as the KSC makes the association between the relevant keywords and the sounds produced by the owner, the parameters for these sounds are language independent and accent independent. However, they are purposely retained to be speaker dependent, in order to enable only the owner of the device to use the device. Hence, it is possible to adapt the device for any natural language spoken on earth. It is also possible to adapt it to any set of sounds corresponding to the keywords that can be produced during the learning process. However, the corresponding voice prompts have to be made available for the language the user intends to use.
[0020] Table 1 illustrates an example of voice interaction between said owner and said device when the VoMM communication device is switched ON for the first time, according to various embodiments of the VoMM communication device. The owner enquires how to use the device and is required to provide some information, which includes but is not limited to the reference names of the owner and device, and setting information which includes the date and time. The said device prompts said owner in a natural voice on how to use said device in said Learning mode.
User: How do I use the phone?
Device: Please tell me your name.
User: Shyam
Device: Hello Shyam, please follow the instructions. At any point you can say Exit, to exit this mode. If you exit in the middle, you may need to provide some information again. Shyam, how do you want to name me?
User: Diamond (or any other name)
Device: Thanks! What is today's date?
User: 2nd January 2008
Device: Today is 2nd January 2008. Confirm or Reject?
User: Confirm
Device: What is the time now?
User: 4:30 pm
Device: Time now is 4:30 pm. Confirm or Reject?
User: Confirm
Device: (By now, the mobile has collected all keywords, extracted the "Owner's" voice signature and estimated parameters for all remaining keywords. It exits Learning mode and enters Normal mode.) Thank you! Now you are ready to use the phone. You can call others, receive calls, use the Phonebook or the SMS. What would you like to do?
User: Nothing (or user does not utter anything)
Device: (The mobile goes to Normal mode, or starts an inactivity timer)
Table. 1: Example interaction of User with VoMM communication device when switched ON for the first time
[0021] According to various embodiments, the reference names of the owner and device are used in various instances as described herein. Further, according to various embodiments, voice characteristics of said owner are recognized, which are used to set the security as described herein and extrapolated to recognize other keywords as described herein.
[0022] According to an embodiment of the VoMM communication device, security of said device is set during Learning mode by extracting the unique "voice signature" of the owner, which is represented by the unique pattern of the voice parameters for various keywords. The parameters extracted by the VR (403) from a limited set of keywords during the initial settings of said device are further used to extrapolate parameters of other keywords for the voice of said owner, which can include but is not limited to representing each keyword by a set of coefficients of a linear equation which, when fed to the speech model, would provide the sequence of speech samples representing the keyword. By collecting the parameters for a limited set of keywords spoken by said owner, the parameters for several other keywords are estimated. Further, said security enables providing access to features of the phone, which include but are not limited to calling, messaging, alarm and profile settings on said device, only to the owner of said device as described herein, without any further involvement of said owner.
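One way to read "a set of coefficients of a linear equation" is as a per-speaker linear map, fitted from the few keywords the owner actually spoke against speaker-independent templates and then applied to the templates of the remaining keywords. The one-dimensional template and measurement values below are invented purely for illustration.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y ~ a*x + b from paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Speaker-independent template parameter per keyword (one dimension,
# invented values for illustration)
templates = {"menu": 1.0, "call": 2.0, "exit": 3.0, "phonebook": 4.0}
# Parameters measured from the owner's voice during Learning mode
spoken = {"menu": 2.1, "call": 3.9, "exit": 6.1}

a, b = fit_linear([templates[k] for k in spoken], [spoken[k] for k in spoken])
# Extrapolate owner-specific parameters for keywords never spoken
estimated = {k: a * t + b for k, t in templates.items() if k not in spoken}
```

The fitted (a, b) pair plays the role of the owner's "voice signature" in this sketch: applying it to any template yields an owner-specific estimate, while a different speaker's utterances would not match the estimates it produces.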
[0023] In various embodiments, when said mobile communication device is in Normal mode, said features of the device can be accessed by said owner. The voice input (401) is conditioned and digitized by the Signal Conditioning and ADC (402) unit and recognized by the VR (403), wherein said VR refers to the Context Specific Keyword Dataset (407) to recognize said voice input and sends a signal to the corresponding Menu Item Object (405). The signal is queued in the Signal Queue (404). The Menu Item Object (405) processes said signal and sends context information to the Context Control Unit (406), wherein context can include but is not limited to the messaging, calling or profile setting features of said device. The Context Control Unit (406) reads parameters of the context specific keywords from the Complete Keyword Dataset (408) and updates said Context Specific Keyword Dataset (407). Further, in the event of multiple pending commands in the Signal Queue (404), which may become out of context, the Context Control Unit (406) purges the queue according to the preferences set by the owner, which can include but is not limited to purging all subsequent voice commands in said queue once the first queued command in the context has been executed.
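The queue-and-purge behaviour of the Context Control Unit (406) might be sketched as follows. The class name, the toy keyword datasets and the purge policy are illustrative assumptions; only the general flow (queue commands valid in the current context, purge pending commands on a context switch) comes from the text.

```python
from collections import deque

class ContextControl:
    """Minimal sketch of the Signal Queue purge described in [0023]."""

    # Stand-in for the Complete Keyword Dataset (408); contents invented.
    COMPLETE_DATASET = {
        "menu": ["phonebook", "sms", "help"],
        "phonebook": ["call", "add", "delete"],
    }

    def __init__(self):
        self.queue = deque()  # the Signal Queue (404)
        # Context Specific Keyword Dataset (407), initially 'menu' context
        self.context_keywords = self.COMPLETE_DATASET["menu"]

    def recognize(self, keyword):
        # only keywords valid in the current context are queued
        if keyword in self.context_keywords:
            self.queue.append(keyword)

    def switch_context(self, context):
        # load the context-specific keyword dataset for the new context...
        self.context_keywords = self.COMPLETE_DATASET[context]
        # ...and purge pending commands that are now out of context
        self.queue = deque(k for k in self.queue
                           if k in self.context_keywords)
```

For example, commands queued under the 'menu' context are discarded when the device moves to the 'phonebook' context, mirroring step 807 described later.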
[0024] According to an embodiment of the VoMM communication device, said mobile communication device needs at most a single button, and that only for switching the power ON or OFF.
[0025] According to an embodiment of the VoMM communication device, a hands-free mechanism is inherently provided, wherein said device has a microphone and a speaker for a user to communicate with the device. In other embodiments, the device may provide appropriate slots so that users can use external microphones and speakers.

[0026] FIG. 5 illustrates some of the form factors of the VoMM communication device, including Helmet, Necklace, Pendant, Watch and Bracelet. Further, the form factors can include ear-piece, behind-ear, ear-ring, eye-glass mounted, helmet mounted and steering-wheel mounted, in addition to several other form factors not mentioned herein. The display panel and keypad interface of the device are eliminated, thus reducing the power requirement, which also reduces the weight of said communication device significantly. Since the device operates on a limited Keyword Dataset for voice recognition functionality, the computing power required is also very limited, thus further reducing the power consumption. Hence, said different forms can be worn on the person of the owner. Further, said forms can operate in various interaction modes, including but not limited to 'loud speaker' mode, and the owner can hold the form close to the ear and mouth.
[0027] According to an embodiment of the VoMM communication device, in the event of the mobile communication device being turned ON, said mobile communication device can be located by said owner voicing a pre-defined keyword and its name, whereupon said device responds by annunciating its presence using a voice output.

[0028] FIG. 6 shows the overall method of communication provided by the VoMM communication device, wherein the voice commands of the owner are received at 601. Said voice commands are processed and recognized using the MMI engine at 602. Said device performs the functions according to said received voice commands at 603 and produces voice output in response to said commands at 604.

[0029] FIG. 7 shows the Learning mode of operation of the VoMM communication device according to various embodiments, wherein voice input is conditioned and digitized by the Signal Conditioning and Analog to Digital Converter (ADC) (402) unit at 701. The voice characteristic parameters of said input signal from the Signal Conditioning and ADC (402) are extracted by the VR (403) unit at 702. The user is prompted through natural voice to verify the input and to provide the next input at 703. The extracted parameters are fed to the Keyword Store Control (409) unit, which stores a data table of said voice parameters at 704; parameters are then estimated for the rest of the keywords at 705, and said data table of estimated keywords is saved in the Complete Keyword Dataset (408) at 706.

[0030] FIG. 8 shows the Normal mode operation of the VoMM communication device according to various embodiments, wherein the voice input is conditioned and digitized by the Signal Conditioning and ADC unit at 801. The voice recognition unit (403) recognizes the spoken keywords with reference to the Context Specific Keyword Dataset (407) at 802 and sends said signals to the Menu Item Objects (405).
The signals are queued in a Signal Queue at 803 before being processed. Queuing of signals is incorporated to ensure reliability of operation. This also enhances the user experience by enabling the device to recognize and respond to the user's voice inputs without missing any keyword. Each signal triggers a Menu Item Object corresponding to the spoken keyword and performs the actions specified for that Menu Item Object. Subsequently, the Menu Item Object (405) sends the context information to the Context Control (406) unit at 805, which retrieves the relevant context dataset from the Complete Keyword Dataset (408) and enters it in the Context Specific Keyword Dataset (407) at 806. In the new context, if the signals queued up in the Signal Queue (404) are irrelevant, they are purged as described herein (807).

[0031] FIG. 9 shows the working of the Menu Item Object (405) unit according to an embodiment of the VoMM communication device in Normal mode, wherein said unit is invoked through a signal from the VR (403), as shown in FIG. 4, at 901. The Menu Item Object (405) performs the action requested by said signal at 902 and outputs a stored voice message at 903. The context of the device is updated at 904, and said updated context is sent to the Context Control (406) unit.

[0032] FIG. 10 shows an example according to an embodiment of the VoMM communication device, wherein the owner requests the Menu using the keywords 'Menu Please', indicated by VC1 (Voice Command 1), which is conditioned and digitized by the Signal Conditioning and ADC (402). The VR (403) module recognizes said keyword ("Menu") and signals the Menu Item Object (405) via the Signal Queue (404). The corresponding Menu Item Object (405) reads out the menu options, which include Phonebook, SMS, Called Numbers, Missed Calls and Help, asks the owner to choose an option, and sends the context as 'Menu' to the Context Control (406) unit.
The Context Control (406) unit retrieves the dataset corresponding to the 'Menu' context and stores it in the Context Specific Keyword Dataset (407). The owner selects the keyword option 'Phonebook', represented by VC2, which is recognized by the VR (403) by referring to said Context Specific Keyword Dataset (407), and the request is queued and sent to the Menu Item Object (405), which processes the signal by reading out the relevant options Call, Recall, Add, Edit or Delete; these are updated in the Context Specific Keyword Dataset (407) by the context control as described herein. Further, the conditioned and digitized 'Call' keyword option, represented by VC3 and selected by said owner, is recognized by the VR (403) by referring to the Context Specific Keyword Dataset (407) and processed by the Menu Item Objects (405) for an entry selected by the owner from the Phonebook.
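The VC1 to VC3 interaction of FIG. 10 can be modelled as traversal of a small menu tree. The option lists below come directly from the text; the traversal helper and its return convention are assumptions made for illustration.

```python
# Menu tree for the FIG. 10 walkthrough. Keys are contexts; values are
# the options the corresponding Menu Item Object would read out.
MENU_TREE = {
    "menu": ["phonebook", "sms", "called numbers", "missed calls", "help"],
    "phonebook": ["call", "recall", "add", "edit", "delete"],
}

def handle_command(context, keyword):
    """Return the next context and the options the device would read out.
    A keyword with sub-options switches context; anything else is treated
    as a terminal action (e.g. 'call') in the current context."""
    if keyword in MENU_TREE:
        return keyword, MENU_TREE[keyword]  # context switch + voice prompt
    return context, []                      # terminal action, no new prompt

ctx, options = handle_command(None, "menu")       # VC1: "Menu Please"
ctx, options = handle_command(ctx, "phonebook")   # VC2: "Phonebook"
ctx, options = handle_command(ctx, "call")        # VC3: "Call"
```

Each context switch corresponds to the Context Control (406) unit loading that context's keyword dataset into the Context Specific Keyword Dataset (407).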
[0033] With a battery of 800 milliampere-hour (mAh) capacity, said VoMM communication device can be used for more than 15 days without recharging, with a usage pattern of approximately 120 minutes of call time per day.
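As a rough arithmetic check of the figures in paragraph [0033]: an 800 mAh battery lasting 15 days implies an average drain of about 2.2 mA, or a daily energy budget of roughly 53 mAh. The split of that budget between talk time and standby is not given in the text, so it is left symbolic here.

```python
# Consistency check of the stated battery life. Only battery_mah, days
# and the 120 min/day call time come from the text; the derived numbers
# are straightforward arithmetic.
battery_mah = 800
days = 15

avg_current_ma = battery_mah / (days * 24)  # average drain, ~2.2 mA
daily_budget_mah = battery_mah / days       # ~53.3 mAh available per day
# With 2 h of calls per day, daily drain is 2*i_talk + 22*i_idle (mAh),
# which must stay at or below daily_budget_mah for the claim to hold.
```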
[0034] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.

Claims

What is claimed is:
1. A VoMM communication device adapted to take input voice commands only, perform functions based on said input voice commands, and produce voice output, where said device has no keys and display.
2. The VoMM communication device of claim 1, where said device has a single button interface which can be used to power on and power off said device.
3. The VoMM communication device of claim 1, where said device comprises: a. an MMI engine to process input voice commands, and perform necessary actions; b. a voice synthesis unit to synthesize output from said MMI engine to produce voice output,
where said MMI engine provides keywords recognized as input to said synthesis unit after performing necessary actions.
4. The VoMM communication device of claim 1, where said MMI engine comprises: a. a signal conditioning unit to convert voice input from analog to digital; b. a voice recognition (VR) unit to perform speaker recognition and speech recognition functions upon digital signals received from said signal conditioning unit; c. a Keyword Store Control (KSC) unit coupled to said voice recognition unit to compile a data table of keywords and corresponding voice characteristics and store said data table of keywords in a keyword dataset, where said keywords are keywords used by owner to identify menu items on said device; d. a menu items object unit for performing actions based on signals from said voice recognition unit; e. a context control unit coupled to said menu items object unit to update a context specific keyword dataset based on signals from said menu items object unit; f. a complete keyword dataset with voice characteristics of all keywords and corresponding menu items used by owner of said device; and g. a context specific keyword dataset with keyword characteristics of context specific keywords.
5. The VoMM communication device of claim 1, where a voice command is any of the commands to make a call, accept an incoming call, reject an incoming call, add an entry into phone book, remove an entry from phone book, send an SMS, open an SMS from SMS inbox, delete an SMS, browse SMS messages, move to learning mode, move to normal mode, locate said device, set alarm, review alarm, remove alarm, set vibration mode, set normal ring tone mode, set a combination of vibration and ring tone mode.
6. The VoMM communication device of claim 1, where said device has a small form factor and is light in weight such that said device is amenable to be incorporated into a wearable article, where wearable article is one of and is not limited to pendant, bracelet, wrist watch, eyeglass, helmet and necklace.
7. The VoMM communication device of claim 1, where keywords spoken to said device are language independent.
8. The VoMM communication device of claim 1, where keywords spoken to said device are accent independent.
9. The VoMM communication device of claim 1, where spoken voice prompts from said device depends on language of said keywords.
10. The VoMM communication device of claim 1, where said device is adapted to be operated in learning mode.
11. The VoMM communication device of claim 1, where said device comprises instructions to guide owner of said device with natural voice on how to use said device, when the device is in learning mode.
12. The VoMM communication device of claim 1, where said device comprises instructions to learn owner voice characteristics, said instructions perform the steps of: a. said signal conditioning unit conditioning owner voice signal; b. said signal conditioning unit digitizing owner voice signal; c. feeding said digitized owner voice signal to said VR unit and KSC unit; d. said VR unit extracting owner voice characteristics specific to keywords used to identify Menu Items; e. feeding said owner voice characteristics to said KSC unit; f. said KSC unit extrapolating voice characteristics of other keywords using voice characteristics of limited keywords; g. said KSC unit compiling a data table of keywords and corresponding voice characteristics; and h. said KSC unit storing said data table in said complete Keyword dataset.
13. The VoMM communication device of claim 1, where said device is adapted to be operated in normal mode.
14. The VoMM communication device of claim 1, where said device is in learning mode when switched on for the first time.
15. The VoMM communication device of claim 1, where said KSC unit comprises instructions to switch to learning mode on receiving a voice command from said owner.
16. The VoMM communication device of claim 1, where said device is adapted to respond only to owner's voice.
17. The VoMM communication device of claim 1, where said device uses an 800 milliampere-hour battery, said device capable of operating for more than 15 days with a call time of approximately 120 minutes per day.
18. Method of providing service on a VoMM communication device comprising of an MMI engine, said MMI engine comprising a signal conditioning unit; a voice recognition unit; a keyword store control unit; a menu items object unit; a context control unit; a complete keyword dataset; and a context specific keyword dataset, the method comprising the steps of: a. receiving voice commands from owner of said device; b. processing and recognizing said voice commands using said MMI engine; c. performing functions according to said received voice commands; and d. producing voice output to signal further input or completion of a command.
19. The method of claim 18, where said device can be powered off and powered on using a single button interface provided on said device.
20. The method of claim 18, where a voice command is any of the commands to make a call, accept an incoming call, reject an incoming call, add an entry into phone book, remove an entry from phone book, browse and make calls from missed call list, browse and make calls from called number list, send an SMS, open an SMS from SMS inbox, delete an SMS, browse SMS messages, move to learning mode, move to normal mode, locate said device, set alarm, remove alarm, set vibration mode, set normal ring tone mode, set a combination of vibration and ring tone mode, and provide help on how to use said device.
21. The method of claim 18, where said device is in normal mode.
22. The method of claim 18, the method further comprising the steps of: a. conditioning voice input using said signal conditioning unit; b. digitizing analog voice input using said signal conditioning unit; c. performing speaker recognition and speech recognition functions upon said digital signals using said voice recognition unit to identify menu items; d. queuing signals from said voice recognition unit; e. performing actions according to queued signals by said menu item object unit; f. updating context information on said context control unit; g. said context control unit updating context specific keyword information in said context specific keyword dataset; h. context control unit purging any queued signals to the menu item objects unit; i. performing any further functions based on signal queue by said menu item objects unit; and j. playing stored voice output by said voice synthesis unit,
where said device is in normal mode.
23. The method of claim 18, where said device is in learning mode.
24. The method of claim 18, where the method further comprises step of said device prompting owner of said device with voice on how to use said device, when the device is in learning mode.
25. The method of claim 18, the method further comprising the steps of: a. extracting owner voice characteristics from keywords used to identify menu items; b. feeding said owner voice characteristics to said keyword store control unit; c. said keyword store control unit compiling a data table of said voice characteristics and corresponding keywords; and d. said keyword store control updating said complete keyword dataset with said data table, where said device is in learning mode.
26. The method of claim 18, where the method further comprises the step of owner changing the mode of operation through a voice command.
27. The method of claim 18, where the method further comprises the step of the owner locating said device through a pre-determined voice command, said device annunciating its presence upon receiving said pre-determined voice command, where the owner has misplaced said device.
28. The method of claim 18, where said device is in learning mode when switched on for the first time.
29. The method of claim 18, where said device responds only to owner's voice by checking voice characteristics of input voice with owner's characteristics already stored in said context specific keyword datasets by said VR unit.
PCT/IN2008/000094 2008-02-18 2008-02-18 Voice based man- machine interface (mmi) for mobile communication devices WO2009104195A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IN2008/000094 WO2009104195A1 (en) 2008-02-18 2008-02-18 Voice based man- machine interface (mmi) for mobile communication devices


Publications (1)

Publication Number Publication Date
WO2009104195A1 (en) 2009-08-27

Family

ID=39777119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2008/000094 WO2009104195A1 (en) 2008-02-18 2008-02-18 Voice based man- machine interface (mmi) for mobile communication devices

Country Status (1)

Country Link
WO (1) WO2009104195A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6240303B1 (en) * 1998-04-23 2001-05-29 Motorola Inc. Voice recognition button for mobile telephones
US20020065944A1 (en) * 2000-11-29 2002-05-30 Marianne Hickey Enhancement of communication capabilities
US20040224717A1 (en) * 2003-05-09 2004-11-11 Todd Hertzberg Communication device with a voice user interface
US20060229881A1 (en) * 2005-04-11 2006-10-12 Global Target Enterprise Inc. Voice recognition gateway apparatus


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130124207A1 (en) * 2011-11-15 2013-05-16 Microsoft Corporation Voice-controlled camera operations
US9031847B2 (en) * 2011-11-15 2015-05-12 Microsoft Technology Licensing, Llc Voice-controlled camera operations
US20140281628A1 (en) * 2013-03-15 2014-09-18 Maxim Integrated Products, Inc. Always-On Low-Power Keyword spotting
US9703350B2 (en) * 2013-03-15 2017-07-11 Maxim Integrated Products, Inc. Always-on low-power keyword spotting


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08738363; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 08738363; Country of ref document: EP; Kind code of ref document: A1)