GB2406471A - Mobile phone with speech-to-text conversion system - Google Patents

Info

Publication number
GB2406471A
Authority
GB
United Kingdom
Prior art keywords
text message
mobile communication
command
communication device
message
Prior art date
Legal status
Granted
Application number
GB0322512A
Other versions
GB2406471B (en)
GB0322512D0 (en)
Inventor
Vaia Sdralia
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
2003-09-25
Filing date
2003-09-25
Publication date
2005-03-30
2003-09-25 Application filed by Samsung Electronics Co Ltd
2003-09-25 Priority to GB0322512A (GB2406471B)
2003-10-29 Publication of GB0322512D0
2005-03-30 Publication of GB2406471A
Application granted
2007-05-23 Publication of GB2406471B
Anticipated expiration
Status: Expired - Fee Related (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/265
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/26 Devices for calling a subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
    • H04M1/72552
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/70 Details of telephonic subscriber devices methods for entering alphabetical characters, e.g. multi-tap or dictionary disambiguation

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A mobile communication device comprises means for converting a spoken utterance into a command associated with the construction of a text message. The command may be the input of a single character, including a letter, number, punctuation mark or predefined expression, into a text message. The command may also provide means for amending or deleting characters, or additional means for processing the text message, including selecting a recipient or requesting the transmission and storage of the message. A method of constructing a text message using a mobile communication device is also disclosed.

Description

IMPROVEMENTS IN MOBILE COMMUNICATION DEVICES
The present invention relates to text message applications in mobile communication devices.
Along with the proliferation of mobile communication device usage, such as mobile telephone usage, text messaging has become an increasingly popular method of communication. Text messaging services, such as the short message service (SMS) system used in GSM compatible telephones, allow communication between users using simple textual matter, which is constructed and transmitted in a predetermined message format.
In prior systems, when a user wishes to construct a text message, the relevant mode of the mobile telephone is entered and a text edit screen is displayed. The user presses the keypad keys that are associated with a specific letter in order to construct the message. Due to the small size of mobile telephones and the necessity to have keys close together, a large number of errors can occur when constructing a message in this manner.
Once the message has been constructed, the user sends it to their chosen recipient via the communication network the mobile telephone is currently connected to. The recipient will then receive the text message and may reply if they choose to.
It is known from UK patent application GB 2380906 to include a voice recognition processing unit on a mobile device in order to record conversations carried out by the user. The conversations are stored in a text string for sending to another device by, for example, email. This system is a word recognition system that requires substantial processing power due to the requirement to recognise a large vocabulary of words.
It is also known from European Patent application EP 1051015 to incorporate dictation functionality within a cellular telephone. This system allows verbal passages to be recorded into a digital memory and includes a speech converter in order to display the verbal passages on a screen. This system requires a substantial amount of processing power in order to recognise the large number of possible spoken words and convert spoken passages made up from these words into a displayable format, as well as requiring a large amount of digital memory to store the passages.
The present invention aims to overcome or at least alleviate some or all of the aforementioned problems.
An aspect of the present invention provides a mobile communication device that converts a spoken utterance received by the mobile communication device into an individual command associated with the construction of a text message.
A further aspect of the present invention provides a method of constructing a text message in a mobile communication device including the step of: converting a spoken utterance received by the mobile communication device into an individual command associated with the text message.
The present invention provides the advantage of requiring only a minimal amount of the limited processing power and memory available in a mobile communication device in order to provide a voice activated text construction system.
The present invention provides the additional advantage that a user is able to construct a standard text message without the need for hand-eye coordination. Thus, the user is able to construct the text message while concurrently carrying out a different manual task, or while remaining visually aware of the surrounding environment. Also, any lack of manual dexterity on the part of the operator of the mobile communication device does not affect the number of errors contained in the message.
The present invention provides the further advantage that the user is able to construct a text message using a familiar construction technique as used when constructing a message manually. Additionally, text messages may be constructed in a timelier manner using this system than by constructing messages manually.
Specific embodiments of the invention will now be described by way of example only, with reference to the accompanying drawings, in which: Figure 1 shows a system block diagram of a mobile telephone according to an embodiment of the present invention; Figure 2 shows a mobile telephone according to an embodiment of the present invention; Figure 3 shows an operational flow chart of a speech recognition system used in an embodiment of the present invention; and Figure 4 shows an operational flow chart of a voice activated text construction system according to an embodiment of the present invention.
FIRST EMBODIMENT
Figure 1 shows a mobile telephone system according to this embodiment. The system includes a power supply 101, such as a battery, and a transmitting and receiving device 103 for sending and receiving data via a wireless communication channel utilising a system such as, for example, UMTS, GPRS or GSM. The device further includes a smart card 105, such as a SIM (Subscriber Identity Module), a controller 107 and a memory storage device 109, such as an E2PROM. All the components work in conjunction with each other in order for a user to communicate with users of other devices. Embedded into the controller 107 is a digital signal processor. The digital signal processor converts component parts of a text message uttered by the user into a text message for sending to a chosen recipient, as will be explained later. A speaker independent speech recognition system and a local memory device are incorporated within the embedded digital signal processor of the controller 107.
Figure 2 shows a mobile telephone being used according to this embodiment. The mobile telephone 205 includes a display screen 201, a keypad 203 and a microphone 209. A user 211 enables a text construction mode by pressing an appropriate key on the keypad 203. The user 211 is then able to construct a text message 207 by uttering the individual characters required to make up the message 207 into the microphone 209, such as the letters H, E and L to start forming the word HELLO.
The speech recognition system recognises the individual characters spoken by the user 211 and instructs the controller 107 to display the characters on the screen 201 as the message 207 is constructed.
The user 211 is able to input any characters that he wishes to use in the text message, along with punctuation marks, spaces, and predefined expressions such as smiley faces. For example, in order to send a brief text message of the form HELLO JANE:-) the user spells out the letters H, E, L, L and O, then utters the character "space", followed by the letters J, A, N and E, followed by the character "smiley face".
Additionally, a limited number of command signals allow the user to edit the text message 207 by moving the cursor on the screen using commands such as "forward" and "back", and to delete characters by uttering the command "delete". The user may also scroll through the address book using the commands "up" and "down", select the recipient using the command "select", and send the message to that recipient using the command "send".
The digital signal processor includes a background noise filter that determines whether the audio input received via the microphone 209 is intentional speech from the user or merely background noise created in the local environment. The digital signal processor can determine whether the input is background noise by, for example, measuring the amplitude of the signal or determining whether its frequency content is characteristic of a human voice. The received input may then be rejected if the amplitude of the signal is below a predetermined level, or if the frequency response of the signal is not within the normal speech frequencies.
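The spelled-input scheme above amounts to mapping each recognised utterance to the characters it contributes. The following Python sketch illustrates that mapping under stated assumptions: the spoken names in `SPECIAL_TOKENS`, their expansions (for example ":-)" for "smiley face") and the helper names `token_to_text` and `build_message` are hypothetical, and speech recognition is assumed to have already produced the tokens.

```python
from typing import List

# Hypothetical spoken names for non-letter characters and predefined expressions.
SPECIAL_TOKENS = {
    "space": " ",
    "smiley face": ":-)",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
}

def token_to_text(token: str) -> str:
    """Map one recognised utterance to the characters it adds to the message."""
    token = token.strip().lower()
    if len(token) == 1 and token.isalnum():
        # A single spelled letter or digit is inserted directly (letters upper-cased).
        return token.upper()
    if token in SPECIAL_TOKENS:
        return SPECIAL_TOKENS[token]
    raise ValueError(f"not a character token: {token!r}")

def build_message(tokens: List[str]) -> str:
    """Concatenate the characters contributed by each utterance, in spoken order."""
    return "".join(token_to_text(t) for t in tokens)

utterances = ["H", "E", "L", "L", "O", "space", "J", "A", "N", "E", "smiley face"]
print(build_message(utterances))  # HELLO JANE:-)
```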
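A minimal sketch of how the editing and addressing commands could drive a text buffer with a cursor. The class `MessageEditor` and its behaviour are assumptions for illustration: "back" and "forward" move the cursor by one character, "delete" removes the character before the cursor, and "up", "down" and "select" walk a plain list standing in for the address book.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MessageEditor:
    """Hypothetical text buffer with a cursor, driven by recognised command words."""
    text: str = ""
    cursor: int = 0                                      # insertion point in text
    contacts: List[str] = field(default_factory=list)    # stand-in address book
    contact_index: int = 0
    recipient: Optional[str] = None
    sent: bool = False

    def insert(self, chars: str) -> None:
        """Insert recognised characters at the cursor and advance past them."""
        self.text = self.text[:self.cursor] + chars + self.text[self.cursor:]
        self.cursor += len(chars)

    def command(self, word: str) -> None:
        """Execute one spoken command word."""
        if word == "back":
            self.cursor = max(0, self.cursor - 1)
        elif word == "forward":
            self.cursor = min(len(self.text), self.cursor + 1)
        elif word == "delete":
            if self.cursor > 0:  # remove the character just before the cursor
                self.text = self.text[:self.cursor - 1] + self.text[self.cursor:]
                self.cursor -= 1
        elif word == "up":
            self.contact_index = max(0, self.contact_index - 1)
        elif word == "down":
            if self.contacts:
                self.contact_index = min(len(self.contacts) - 1, self.contact_index + 1)
        elif word == "select":
            if self.contacts:
                self.recipient = self.contacts[self.contact_index]
        elif word == "send":
            if self.recipient is None:
                raise RuntimeError("no recipient selected")
            self.sent = True
        else:
            raise ValueError(f"unknown command: {word!r}")

ed = MessageEditor(contacts=["Jane", "John"])
ed.insert("HELLX")
ed.command("delete")   # removes the mis-recognised X
ed.insert("O")
ed.command("down")     # move from Jane to John
ed.command("select")
ed.command("send")
print(ed.text, ed.recipient, ed.sent)  # HELLO John True
```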
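The two rejection tests described above (amplitude below a predetermined level, frequency outside normal speech) can be sketched as follows. The threshold values are illustrative assumptions, since the description gives none, and the frequency test uses a zero-crossing-rate estimate of the dominant frequency, which is a simple stand-in chosen here rather than a technique named in the patent.

```python
import math
from typing import Sequence

# Illustrative thresholds only: the description names no values.
MIN_RMS = 0.02                      # quieter input is treated as background noise
SPEECH_BAND_HZ = (85.0, 3400.0)     # assumed rough band for telephone speech

def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Estimate the dominant frequency from the zero-crossing rate.

    A sinusoid crosses zero twice per period, so f ~= crossings / (2 * duration).
    """
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

def is_background_noise(samples: Sequence[float], sample_rate: int) -> bool:
    """Reject quiet input, or input whose estimated frequency lies outside the speech band."""
    if rms(samples) < MIN_RMS:
        return True
    low, high = SPEECH_BAND_HZ
    return not (low <= zero_crossing_frequency(samples, sample_rate) <= high)
```

Zero-crossing counting is cheap enough for a small DSP but is easily confused by mixtures of tones; a spectral test would be more robust.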
The digital signal processor is thus able to recognise all letters of the alphabet along with all the usual punctuation marks, spaces, and predefined expressions, as well as a limited number of command signals. Therefore, due to the limited amount of recognition material required to be stored and processed to use the system, the system is particularly suited for mobile telephone devices with a limited amount of processing power, memory and physical storage space.
As explained earlier, the digital signal processor has a speech recognition system incorporated into it that converts a spoken utterance into a component part of a text message. Figure 3 shows an operational flow chart of a speech recognition system according to this embodiment. At step 301 the speech recognition system receives an audio signal. Spectral components of the audio signal are obtained at step 303 by using known digital signal processing techniques such as band pass filtering, normalization and Fast Fourier Transformations (FFTs). At step 305 the components of the audio signal received are compared with stored templates corresponding to actionable responses used by the system.
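Step 303 can be illustrated with a small feature extractor. This sketch assumes 8 log-energy bands between 300 Hz and 3400 Hz. It applies normalisation and a Hamming window, computes a magnitude spectrum with a naive DFT (an FFT produces the same values more efficiently), and performs the band-pass step in the frequency domain. The function name and all parameter values are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def spectral_features(frame: Sequence[float], sample_rate: int, n_bands: int = 8,
                      band: Tuple[float, float] = (300.0, 3400.0)) -> List[float]:
    """Return log band energies of one audio frame (hypothetical feature layout)."""
    n = len(frame)
    # 1. Normalisation: remove any DC offset and scale to unit peak amplitude.
    mean = sum(frame) / n
    centred = [x - mean for x in frame]
    peak = max(abs(x) for x in centred) or 1.0
    normed = [x / peak for x in centred]
    # 2. Hamming window to reduce spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(normed)]
    # 3. Magnitude spectrum via a naive DFT over bins 0..n/2.
    mags = [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    # 4. Band-pass in the frequency domain: keep bins inside `band`, pool into n_bands.
    lo, hi = band
    edges = [lo + (hi - lo) * i / n_bands for i in range(n_bands + 1)]
    energies = [0.0] * n_bands
    for k, m in enumerate(mags):
        f = k * sample_rate / n
        for b in range(n_bands):
            if edges[b] <= f < edges[b + 1]:
                energies[b] += m * m
                break
    # Log compression keeps the values in a similar range across utterances.
    return [math.log(e + 1e-10) for e in energies]
```

The resulting vectors are what step 305 would compare against the stored templates.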
Hidden Markov Model (HMM) techniques are used, as are well known in the industry, to determine whether the audio signal received matches that of an actionable response, at step 307. Software for implementing HMM speech recognisers is widely available, as are chipsets pre-programmed to do so. The templates (reference patterns) used by the HMM recogniser comprise one (or more) template(s) for each letter, plus templates for specific command words.
If a match is found, at step 309 the system determines the relevant output associated with the matched response. However, if no match is found then the process is restarted from the beginning.
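Steps 305 to 309, together with the restart when nothing matches, can be sketched with one small discrete HMM per template, scored with the forward algorithm. The assumptions here are that each audio frame has already been vector-quantised to a symbol index, that the per-frame log-likelihood threshold is a free tuning parameter, and that the helper names are hypothetical. Practical recognisers typically use continuous-density HMMs instead.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A discrete-observation HMM: (initial probs, transition matrix, emission matrix).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]

def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf

def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_likelihood(model: HMM, obs: Sequence[int]) -> float:
    """Forward algorithm in log space: returns log P(obs | model)."""
    pi, a, b = model
    n = len(pi)
    alpha = [_log(pi[i]) + _log(b[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [_logsumexp([alpha[i] + _log(a[i][j]) for i in range(n)]) + _log(b[j][o])
                 for j in range(n)]
    return _logsumexp(alpha)

def recognise(templates: Dict[str, HMM], obs: Sequence[int],
              threshold: float) -> Optional[str]:
    """Return the best-scoring template name, or None (restart) if none is good enough."""
    # Per-frame score so a single threshold works for short and long utterances.
    scores = {name: log_likelihood(m, obs) / len(obs) for name, m in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```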
Figure 4 shows an operational flow chart of the mobile telephone text construction system according to this embodiment. The application is started at step 401 by the user pressing a predetermined button on the keypad to enter the appropriate text construction mode. At step 403 the application continuously listens for any audible signals received via the microphone of the mobile telephone. Any received signals are temporarily stored in the memory portion of the digital signal processor.
The system is able to determine whether the signal received is merely background noise at step 405, as discussed earlier. If the signal is considered to be background noise then the system reverts back to step 403 to continuously listen for any further signals. If however the signal is not considered to be background noise, then, at step 407, spectral components of the received signal are obtained.
The spectral components are obtained using known digital signal processing techniques such as those discussed earlier with reference to Figure 3. The system then determines whether the received signal matches a stored actionable response at step 409, using Hidden Markov Model matching techniques as discussed earlier. If there is not considered to be a match, the system reverts to step 403. However, if a match is found then the system determines whether the stored actionable response is that of a text message character at step 411.
At step 413, if the actionable response is that of a stored text message character, the system displays the matched character on the screen of the mobile telephone and reverts back to step 403. If the system determines that the response is not a character then it is assumed that the response must be a command signal. Therefore, at step 415, it is determined which command is associated with the actionable response and the command is executed.
If it is determined at step 417 that the user has reached the end of the message, for example through the use of the command "end" or "send", then the user can review the message at step 419. If the system determines that the user has not reached the end of the message, then the system reverts back to step 403.
After the message has been reviewed at step 419, an option to modify the message is provided at step 421. If the message is required to be modified then the system reverts back to step 403. If the message does not require any modification then the completed text message is sent to the chosen recipient at step 423.
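The Figure 4 loop can be summarised as the following control-flow sketch. Noise detection, recognition and the review decision are passed in as callables (hypothetical stand-ins for the DSP stages and the user), and only the "delete" command is modelled, to keep the example short. Comments tie each branch to the step numbers above.

```python
from typing import Callable, Iterable, Optional

# Assumed vocabulary for the sketch.
CHARACTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") | {" ", ":-)"}
END_COMMANDS = {"end", "send"}

def run_text_mode(utterances: Iterable[str],
                  is_noise: Callable[[str], bool],
                  match: Callable[[str], Optional[str]],
                  wants_changes: Callable[[str], bool]) -> Optional[str]:
    """Walk the Figure 4 loop; return the message to send, or None if input runs out."""
    text = []
    for raw in utterances:                 # step 403: keep listening
        if is_noise(raw):                  # step 405: background noise, listen again
            continue
        response = match(raw)              # steps 407-409: features + template match
        if response is None:               # no match: back to step 403
            continue
        if response in CHARACTERS:         # step 411: is it a character?
            text.append(response)          # step 413: display it, back to step 403
            continue
        if response == "delete" and text:  # step 415: execute the command
            text.pop()                     # (only "delete" is modelled here)
        if response in END_COMMANDS:       # step 417: end of message?
            draft = "".join(text)          # step 419: user reviews the draft
            if wants_changes(draft):       # step 421: modify, back to step 403
                continue
            return draft                   # step 423: send to the recipient
    return None
```

Called as `run_text_mode(["H", "I", "send"], lambda u: False, lambda u: u, lambda d: False)`, the sketch returns "HI".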
The digital signal processor includes a local memory that is able to store a predetermined number of letter combinations that are typically used by the user. This allows the system to determine whether a particular letter combination is an error, such as the combination of the letters Q and S, and so highlight any unusual combinations in order to indicate to the user that an error may have occurred. The user can amend the message and correct any errors at the review stage.
The system described above allows a user to construct a text message in a fashion they are used to, by entering each individual component part of the text message in the order they would if creating the text message by manually pressing the appropriate keys. The system further allows users to create short versions of words such as, for example, "CU" for the words "see you".
Also, the need for hand-eye coordination is removed, so the system provides a text message construction method that produces few errors. Any errors that are produced would usually result in a single isolated character being shown in error, and so would normally still allow a recipient to determine the correct message. Further, the system described can be implemented in devices with limited physical space, power and memory.
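One way to realise the stored letter combinations is a set of adjacent letter pairs collected from the user's earlier messages, with any pair outside that set flagged for review. The sketch below assumes that representation; the function names and the use of two-letter pairs (rather than longer combinations) are illustrative choices, not details from the description.

```python
from typing import Iterable, List, Set, Tuple

def letter_pairs(text: str) -> List[Tuple[int, str]]:
    """(index, pair) for every two adjacent letters; spaces and digits break pairs."""
    text = text.upper()
    return [(i, text[i:i + 2]) for i in range(len(text) - 1)
            if text[i].isalpha() and text[i + 1].isalpha()]

def learn_common_pairs(history: Iterable[str]) -> Set[str]:
    """Collect the letter pairs that appear in the user's previous messages."""
    return {pair for msg in history for _, pair in letter_pairs(msg)}

def flag_unusual(message: str, common: Set[str]) -> List[Tuple[int, str]]:
    """Return the pairs that should be highlighted as possible recognition errors."""
    return [(i, pair) for i, pair in letter_pairs(message) if pair not in common]

common = learn_common_pairs(["HELLO JANE", "SEE YOU SOON"])
print(flag_unusual("HELQS", common))  # [(2, 'LQ'), (3, 'QS')]
```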
ADDITIONAL EMBODIMENTS
It will be understood that embodiments of the present invention are described by way of example only, and that modifications may be made, and alternatives may be used without departing from the scope of the invention.
It will be understood that the system may use known good examples of speech to adapt the set of actionable responses stored in the system. The system would be able to assume the examples of speech were good if, for example, the user sends the text message without any modification of the component parts of the text message. The previous utterances made to prepare the last text message may be stored in a temporary storage device for this purpose. Thus, the system may begin with standard recognition templates (corresponding to speaker-independent recognition) and adapt to speaker dependent recognition based on good samples from a particular user, which should be more reliable in the long run if the phone is only used by that user.
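The adaptation idea can be sketched with templates stored as plain feature vectors, each pulled a little toward the user's own samples whenever a message is sent unmodified. This is an assumed, simplified scheme (an exponential moving average with a hypothetical `rate` of 0.2); adapting HMM templates in practice would normally use re-estimation techniques such as MAP or MLLR adaptation instead.

```python
from typing import Dict, List, Sequence, Tuple

def adapt_templates(templates: Dict[str, List[float]],
                    pending: Sequence[Tuple[str, Sequence[float]]],
                    message_modified: bool,
                    rate: float = 0.2) -> Dict[str, List[float]]:
    """Nudge each template toward the user's own utterances of that label.

    pending holds (label, feature vector) pairs kept in temporary storage while the
    last message was composed; they are trusted only if the message was sent unmodified.
    """
    if message_modified:
        return templates                     # untrusted samples: leave templates alone
    updated = {label: list(vec) for label, vec in templates.items()}
    for label, features in pending:
        old = updated[label]
        # Exponential moving average: (1 - rate) * old + rate * new sample.
        updated[label] = [(1 - rate) * o + rate * f for o, f in zip(old, features)]
    return updated
```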
It will further be understood that an external microphone may be used with this system. For example, a hands free kit may be utilised so that the user of the system may construct text messages without holding the mobile telephone. This may be useful, for example, when constructing text messages whilst in a car.
It will further be understood that a user may provide alternative sounds for use when recognising certain letters, and alternative templates are therefore stored for mapping the alternative sounds to the letter concerned.
For example, instead of uttering the letters "P" and "T", the user may swap the recognition sounds for the letter "T" to the sound "tango", or swap the recognition sound "P" to the sound "peter". Therefore, the text message is still being constructed by recognising sounds associated with the individual characters making up the message but additionally allows the user to customise the speech recognition system such that errors may be reduced for certain problem letters.
It will further be understood that a user may customise (for example, replace or modify) the list of available voice commands, such as those for editing, deleting and sending messages. For example, the user may decide to use the command signal 'reverse' instead of 'back' in order to move the cursor.
Several alternative commands may be pre-stored, or a user may be able to record a new template for a new command word, and select which existing command to map it onto.
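Both customisations (alternative letter sounds such as "tango" for T, and new command words such as "reverse" for "back") can be modelled as a user-editable lookup from recognised words to outputs. The sketch below is hypothetical: it works on word labels and assumes that recording and storing the new acoustic template for each word is handled by the recogniser.

```python
from typing import Dict, Optional, Tuple

class VocabularyMap:
    """Hypothetical user-editable map from recognised words to letters and commands."""

    def __init__(self) -> None:
        # Default: each letter is recognised by its own name ("a" -> "A", ...).
        self.letters: Dict[str, str] = {chr(c).lower(): chr(c)
                                        for c in range(ord("A"), ord("Z") + 1)}
        # Default command words map onto themselves.
        self.commands: Dict[str, str] = {w: w for w in
                                         ("forward", "back", "delete", "send",
                                          "up", "down", "select")}

    def alias_letter(self, word: str, letter: str, replace: bool = False) -> None:
        """Add (or, with replace=True, swap in) a spoken word for a letter."""
        letter = letter.upper()
        if replace:
            self.letters = {w: ch for w, ch in self.letters.items() if ch != letter}
        self.letters[word.lower()] = letter

    def alias_command(self, word: str, existing: str, replace: bool = False) -> None:
        """Map a new command word onto an existing command."""
        if existing not in self.commands.values():
            raise KeyError(f"unknown command: {existing!r}")
        if replace:
            self.commands = {w: c for w, c in self.commands.items() if c != existing}
        self.commands[word.lower()] = existing

    def resolve(self, word: str) -> Optional[Tuple[str, str]]:
        """Return ("char", letter), ("command", command) or None."""
        w = word.lower()
        if w in self.letters:
            return ("char", self.letters[w])
        if w in self.commands:
            return ("command", self.commands[w])
        return None
```

With `replace=True` the old word stops being recognised, matching the "swap" described above; without it the new word is an additional alternative.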
It will be understood that any adaptation or analysis performed by the voice recognition system, such as that performed on the speaker's voice or to any background noise, may be carried out either in real time during use of the system, or at a later time prior to sending a text message (if the sound is stored). This adaptation analysis may then be used to confirm the spoken sounds are recognizable and usable.
Although Hidden Markov Models are used in the first embodiment, it will be understood that any other suitable type of pattern matching system may be used, such as dynamic time warping or neural networks for example.
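As an illustration of the dynamic time warping alternative, the sketch below computes the classic DTW distance between an utterance and each stored template (both given as sequences of feature vectors) and picks the closest. Euclidean frame distance and the function names are assumptions; a practical system would also reject matches whose distance exceeds a threshold.

```python
import math
from typing import Dict, Sequence

Frames = Sequence[Sequence[float]]

def dtw_distance(a: Frames, b: Frames) -> float:
    """Classic DTW: minimum cumulative frame distance over all monotonic alignments."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])   # Euclidean distance between frames
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b frame reused
                                 d[i][j - 1],      # b advances, a frame reused
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]

def classify(utterance: Frames, templates: Dict[str, Frames]) -> str:
    """Pick the template with the smallest DTW distance to the utterance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))
```

Because the alignment may repeat frames, a slowly spoken letter still matches its template at zero cost.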
It will further be understood that the technique of detecting unusual letter combinations may be implemented by allowing a memory area within the digital signal processor to contain an adaptable list of more common letter combinations. This list may be increased, or modified, over time depending upon the user's more typical text message construction.
It will further be understood that the system may incorporate a spell checking function, whereby words are spell checked after the utterance of certain characters indicating the end of a word. For example, the detection of the utterance "space" or any punctuation character indicates that the previous set of letters should spell out a word, and the spell check function is then run on that word.
It will be further understood that text construction may be started in a way other than by pressing a predetermined key on the keypad. For example, the user may utter a particular command, such as "start", to enable the application.
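An adaptable list of this kind could be kept as pair counts that grow with every sent message, with a pair treated as common once it has been seen a minimum number of times. The class name, `min_count` and the `capacity` limit (standing in for the limited DSP memory) are hypothetical parameters of this sketch.

```python
from collections import Counter

class AdaptivePairList:
    """Hypothetical adaptive store of letter pairs, grown from sent messages."""

    def __init__(self, min_count: int = 2, capacity: int = 500) -> None:
        self.counts: Counter = Counter()
        self.min_count = min_count      # sightings needed before a pair counts as common
        self.capacity = capacity        # maximum number of pairs kept in memory

    @staticmethod
    def _pairs(text: str):
        t = text.upper()
        return [t[i:i + 2] for i in range(len(t) - 1)
                if t[i].isalpha() and t[i + 1].isalpha()]

    def learn(self, sent_message: str) -> None:
        """Update pair counts from a message the user actually sent."""
        self.counts.update(self._pairs(sent_message))
        # Keep memory bounded: retain only the most frequent pairs.
        if len(self.counts) > self.capacity:
            self.counts = Counter(dict(self.counts.most_common(self.capacity)))

    def is_common(self, pair: str) -> bool:
        return self.counts[pair.upper()] >= self.min_count
```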
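A minimal sketch of the word-boundary trigger: characters are consumed as they are recognised, and each completed word is checked as soon as a space or punctuation character arrives. The small upper-case lexicon, the boundary set and the use of Python's `difflib.get_close_matches` for suggestions are assumptions of this example; a word still being spelled at the end of the input is not checked.

```python
import difflib
from typing import Iterable, List, Set, Tuple

WORD_END_TOKENS = {" ", ".", ",", "?", "!"}   # assumed word-ending characters

def check_as_typed(chars: Iterable[str], lexicon: Set[str]) -> List[Tuple[str, List[str]]]:
    """Spell-check each word when a word-ending character arrives.

    lexicon is assumed to hold upper-case words; returns (word, suggestions) pairs.
    """
    problems = []
    word: List[str] = []
    for ch in chars:
        if ch in WORD_END_TOKENS:
            if word:
                w = "".join(word).upper()
                if w not in lexicon:
                    problems.append((w, difflib.get_close_matches(w, lexicon, n=3)))
                word = []
        else:
            word.append(ch)
    return problems
```

User abbreviations such as "CU" would need to be added to the lexicon so that they are not flagged.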

Claims (14)

1. A mobile communication device that converts a spoken utterance received by the mobile communication device into an individual command associated with the construction of a text message.
2. The mobile communication device of claim 1 wherein the command is that of inputting a single character into the text message.
3. The mobile communication device of claim 2 wherein the single character is one of the following: a letter, number, punctuation mark or predefined expression.
4. The mobile communication device of claim 1 wherein the command provides means to amend or delete characters in the text message.
5. The mobile communication device of claim 1 wherein the command provides means for choosing a recipient to which the text message is to be sent.
6. The mobile communication device of claim 1 wherein the command provides means for sending the text message.
7. The mobile communication device of claim 1 wherein the command provides means for storing a text message.
8. A method of constructing a text message in a mobile communication device including the step of: converting a spoken utterance received by the mobile communication device into an individual command associated with the text message.
9. The method of claim 8 wherein the individual command inputs a single character into the text message.
10. The method of claim 9 wherein the single character is one of the following: a letter, number, punctuation mark or predefined expression.
11. The method of claim 8 wherein the individual command deletes or amends characters in the text message.
12. The method of claim 8 wherein the individual command chooses a recipient to which the text message is to be sent.
13. The method of claim 8 wherein the individual command sends the text message.
14. The method of claim wherein the individual command stores the text message.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0322512A GB2406471B (en) 2003-09-25 2003-09-25 Improvements in mobile communication devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0322512A GB2406471B (en) 2003-09-25 2003-09-25 Improvements in mobile communication devices

Publications (3)

Publication Number Publication Date
GB0322512D0 (en) 2003-10-29
GB2406471A (en) 2005-03-30
GB2406471B (en) 2007-05-23

Family

ID=29286849

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0322512A Expired - Fee Related GB2406471B (en) 2003-09-25 2003-09-25 Improvements in mobile communication devices

Country Status (1)

Country Link
GB (1) GB2406471B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2433000A (en) * 2005-12-02 2007-06-06 Data Transfer & Comm Ltd Mobile phone accessory device
GB2406476B (en) * 2003-09-25 2008-04-30 Canon Europa Nv Cellular telephone
US8032084B2 (en) 2001-07-18 2011-10-04 Data Transfer & Communications Limited Data security device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020077143A1 (en) * 2000-07-11 2002-06-20 Imran Sharif System and method for internet appliance data entry and navigation
US20020142787A1 (en) * 2001-03-27 2002-10-03 Koninklijke Philips Electronics N.V. Method to select and send text messages with a mobile
EP1293962A2 (en) * 2001-09-13 2003-03-19 Matsushita Electric Industrial Co., Ltd. Focused language models for improved speech input of structured documents
US20030139922A1 (en) * 2001-12-12 2003-07-24 Gerhard Hoffmann Speech recognition system and method for operating same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6895257B2 (en) * 2002-02-18 2005-05-17 Matsushita Electric Industrial Co., Ltd. Personalized agent for portable devices and cellular phone
GB2413040B (en) * 2002-12-09 2006-10-18 Voice Signal Technologies Inc Provider-activated software for mobile communication devices

Also Published As

Publication number Publication date
GB2406471B (en) 2007-05-23
GB0322512D0 (en) 2003-10-29

Similar Documents

Publication Publication Date Title
US6895257B2 (en) Personalized agent for portable devices and cellular phone
US8244540B2 (en) System and method for providing a textual representation of an audio message to a mobile device
US7203651B2 (en) Voice control system with multiple voice recognition engines
KR101149135B1 (en) Method and apparatus for voice interactive messaging
US20100145696A1 (en) Method, system and apparatus for improved voice recognition
US20090198497A1 (en) Method and apparatus for speech synthesis of text message
US6526292B1 (en) System and method for creating a digit string for use by a portable phone
CN102984666B (en) Address list voice information processing method in a kind of communication process and system
CN100521708C (en) Voice recognition and voice tag recoding and regulating method of mobile information terminal
KR20030044899A (en) Method and apparatus for a voice controlled foreign language translation device
JP2000056792A (en) Method and device for recognizing user's utterance
US7676364B2 (en) System and method for speech-to-text conversion using constrained dictation in a speak-and-spell mode
KR20080054591A (en) Method for communicating voice in wireless terminal
US20030135371A1 (en) Voice recognition system method and apparatus
KR102666826B1 (en) Speaker classification system using STT
KR20100081022A (en) Method for updating phonebook and mobile terminal using the same
KR100759728B1 (en) Method and apparatus for providing a text message
KR100467593B1 (en) Voice recognition key input wireless terminal, method for using voice in place of key input in wireless terminal, and recording medium therefore
GB2406471A (en) Mobile phone with speech-to-text conversion system
US20080146197A1 (en) Method and device for emitting an audible alert
KR20220121456A (en) Speaker classification system that categorizes and stores conversation text
CN111274828A (en) Language translation method, system, computer program and handheld terminal based on message leaving
KR100381970B1 (en) portable telephone having lies searching function and searching method therefor
KR20100121072A (en) Management method for recording communication history of portable device and supporting portable device using the same
KR100817284B1 (en) A method and apparatus of offering effective sound for mobile station

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20090925