GB2427500A - Mobile telephone text entry employing remote speech to text conversion

Publication number
GB2427500A
Authority
GB
United Kingdom
Prior art keywords
text
speech
network
capability
wireless communication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0512759A
Other versions
GB0512759D0 (en
Inventor
Mats Hellman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Symbian Software Ltd
Original Assignee
Symbian Software Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Symbian Software Ltd
Priority to GB0512759A
Publication of GB0512759D0
Publication of GB2427500A
Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Text is input into a mobile telephone or similar wireless communications device by speaking the required text. The device transmits the speech input to a network, which converts the speech into text and transmits the created text back to the device. A unique identifier may be used so that, on receipt, the device is able to insert the text into an appropriate storage location on the device. The speech may be recorded on the device and extraneous matter lacking semantic content, such as hesitation, removed prior to sending the speech to the network.

Description

Over the Air Speech to Text Input
This invention relates to an improved method of inputting quantities of text on a computing device having a wireless capability, and in particular to a method for creating such textual input through an over the air speech to text conversion facility.
Currently, one of the most unsatisfactory aspects of most mobile telephones is the facilities that they offer for the input of text. Because the standard telephone keypad is principally for numeric input, letters and other characters are generated by pressing the numeric keys multiple times. So, entering the five-letter word SORRY requires 16 key presses (4 presses of the seven key, 3 presses of the six key, 3 presses of the seven key, another 3 presses of the seven key, and finally 3 presses of the nine key).
Apart from the obvious inconvenience of this mechanism to a user, it is error-prone. Telephones are small devices, and the keys are both small and placed close together; this makes it relatively easy to press a wrong key by accident. Furthermore, if too long a gap is left when entering a letter requiring multiple key presses, it is very easy for the wrong letter to be generated. It is also easy for the wrong letter to be generated if too short a gap is left between letters where the same key is used for consecutive letters in a word (such as the word HIGH, which requires the four key to be pressed 8 times with correct intervals).
Even the simplest mobile phones demand quantities of textual input; entering names into the address book and generating SMS (short message service) messages are facilities which almost all mobile phone users take advantage of on a regular basis. However, even more text needs to be generated to take advantage of the facilities on more advanced mobile phones, which have more complex address books, calendar, alarm and agenda functionality, can handle email with document attachments and also run web browsers.
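The key-press counts quoted above for SORRY and HIGH can be checked with a short calculation. The sketch below is illustrative only and is not part of the patent: it assumes the conventional letter layout found on telephone keypads (2 = ABC through 9 = WXYZ), and the names `KEYPAD`, `multitap_presses` and `same_key_pauses` are hypothetical.

```python
# Conventional telephone keypad letter layout (an assumption; the
# description does not spell the layout out).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Map each letter to (key, number of presses needed to reach it).
PRESSES = {
    letter: (key, position + 1)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters)
}


def multitap_presses(word):
    """Return the total number of key presses needed to multi-tap `word`."""
    return sum(PRESSES[ch][1] for ch in word.upper())


def same_key_pauses(word):
    """Count adjacent letter pairs that share a key and so need a timed pause."""
    keys = [PRESSES[ch][0] for ch in word.upper()]
    return sum(1 for a, b in zip(keys, keys[1:]) if a == b)


print(multitap_presses("SORRY"))  # 16
print(multitap_presses("HIGH"))   # 8
print(same_key_pauses("HIGH"))    # 3
```

HIGH is the awkward case: every letter sits on the four key, so all three letter boundaries depend on the user pausing for the right length of time.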
Various solutions have been proposed to solve this problem. One common method is for textual data to be entered using a standard personal computer, with its proper keyboard and large screen, with the text then being transferred to the telephone over a wired or wireless connection. The main disadvantage of this method is that it cannot be easily used when the telephone is actually mobile, and used for the purposes and in the context for which it was designed.
Other improved methods of text input which are capable of being used on the device do exist, but involve compromises which affect usability. For example, the inclusion by some manufacturers of a proper alphanumeric keyboard on the device generally either affects portability by making the device too big to fit comfortably in a pocket, or else yields a keyboard so miniaturized that it is too small to be easily used; it can also affect the usability of the device as a normal telephone. Other manufacturers have included touch screens with either virtual keyboards or handwriting recognition. All these attempts have achieved some success, but none of them seems likely to be accepted as a universally convenient method, if for no other reason than that they cannot easily be used one-handed while mobile.
However, it has been known for some time that voice recognition technology can be used on mobile telephones; voice dialing has been the main use of this in the past. Since the mobile telephone is by design ideally suited to the input of speech, the provision of a speech-to-text conversion facility offers a means of inputting text into such a device that takes advantage of its primary functionality rather than making an attempt to convert it into a quite different type of device. Mobile phone devices with speech recognition capability are now becoming available.
The SGH-P207 telephone from Samsung is the first example of this art; however, the speech-to-text conversion facilities are necessarily limited on such a device. This is because there will inevitably be significant practical problems in implementing speech-to-text technology on mobile phones, which are resource constrained devices. It is known, for example, that desktop computers require significant computing resources to perform this task; it is recommended for a typical product, Dragon 8, that a computer should have a 500MHz processor, 512MB of system RAM and 500MB of free disk space (see http://www.scansoft.com/naturallyspeaking/preferred/sysreqs.asp and the independent assessment at http://www.dyslexic.com/database/articles/print/dicttech.html). It should be noted that even with these resources, desktop computers using speech recognition software work best with good quality microphones and as little ambient sound as possible.
Mobile phones have difficulty meeting the required hardware specification because in comparison to desktop computers they are resource constrained devices. Furthermore, because they will typically need to be used when mobile, and often in environments where ambient sound is ubiquitous, the speech recognition software will almost certainly be even more computationally expensive than on a desktop computer. Also, the extra power consumption needed for a speech engine would inevitably shorten operational battery life for the device, which is a serious concern for all mobile devices.
It is therefore an object of the present invention to provide an improved method of providing speech recognition in a mobile computing device having a wireless communication capability.
According to a first aspect of the present invention there is provided a method of creating textual data on a computing device including a capability for wireless communication, the method comprising transmitting speech from the device to a network comprising an ability to convert speech into text; and arranging for the device to receive from the network text generated from the transmitted speech.
According to a second aspect of the present invention there is provided a computing device including a capability for wireless communication and arranged to operate in accordance with a method of the first aspect.
According to a third aspect of the present invention there is provided an operating system for causing a computing device including a capability for wireless communication to operate in accordance with a method of the first aspect.
Embodiments of the present invention will now be described, by way of further example only.
With the present invention, it has been realized that the problems caused by the lack of computing, memory, storage and power resources on a mobile telephone may be overcome, in relation to speech recognition, by making use of the device's communication capabilities. Instead of siting the computationally expensive speech recognition engines on the mobile phone itself, the present invention sites them in servers or other entities within a communication network to which the mobile telephone can be connected.
Speech may either be recorded on the mobile phone and subsequently transmitted to the network, or alternatively may be transmitted to the network in real time, as spoken. The former option has the benefit that a user can ensure that the speech is filtered to exclude extraneous matter which is deplete of semantic content, such as "Err .." or "You know ..", so that the speech is well-formed before transmission.
The latter option allows for the network to give real time feedback on the quality of the speech, and the extraneous "Err .." or "You know .." or other such phrases or utterances can be filtered out by the use of intelligent software.
In both cases, the speech recognition software in the network, which is not resource constrained in any way and can therefore be of the highest quality, converts the sounds it receives into text and sends that text back to the mobile telephone; many possible transport mechanisms exist for this, including SMS, Email, Instant Messaging and WAP Push. Dedicated protocols and transport mechanisms are also possible and these are also considered to fall within the capability of this invention. In this way, this invention makes possible the addition of quantities of text to emails, messages, documents, notes and any textual application in an intuitive, efficient and user friendly way without making impossible demands on the resource constrained mobile phone.
The functionality for requesting the conversion of speech to text may be provided as part of the operating system on the mobile telephone; in this implementation, at any point where alphabetic or text input is expected, an icon or menu item or a dedicated or programmable hardware button enables speech to be transmitted to the network (either directly or via a previous recording) and text to be received in return. The speech is, ideally, tagged with a unique ID, which is also attached to the returned text; this enables the text received to be routed to its correct location. Preferably, this is handled automatically within the software.
Alternatively, application writers may provide facilities for the use of this functionality by adding a speech transmission (or record and transmit) option to access the required speech to text network functionality; this may be reached by a menu option or a user setting in appropriate applications in the software on the device.
A third option is for network operators to add this functionality themselves to existing mobile telephones; this option recognises the fact that it is the networks that would be providing the speech recognition engines, and that this may be done on some networks and not on others, or done on different networks in incompatible ways. The network operator may also provide a dedicated or programmable hardware button for initiating this functionality. Any current or future method of provisioning telephone services by network operators is likely to be suitable for use with this invention. Examples include:
  • Telephone firmware updates
  • Provisioning of new versions of applications or operating system components
  • Addition of an option to an existing application, for example, by the use of methodology as disclosed in GB Patent Application 0422092.7 for embedding externally defined commands.
Typical usage of this invention might be as follows:
1. The user creates a new entity for containing text (such as a message, email, note or document) and then selects (either via a hardware button or a software control such as a menu or icon) the option to input text via speech.
2. The speech is either recorded on the device and then transmitted to the network, or alternatively, a connection is opened to the network for the speech to be dictated directly in real time.
3. The speech is received by the speech recognition component in the network, and converted to text.
4. The text is sent back in as short a time as possible and is automatically inserted in the entity for containing text from which it was originally requested.
Other aspects of speech to text conversion, such as the need to train the recognition engine to recognise the peculiarities of particular voices and accents, are well known to those versed in that particular art and are not a part of this invention. They will not be referred to in greater detail, therefore, in this description.
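The typical usage steps above, together with the unique ID tagging and the filtering of utterances such as "Err .." described earlier, might be sketched as follows. This is a minimal simulation under stated assumptions, not the patented implementation: the "speech" is stood in for by a raw transcript string rather than audio, the network is an in-process object, and the names `FILLERS`, `strip_fillers`, `NetworkSpeechService` and `Device`, as well as the use of `uuid4` for the identifier, are all hypothetical choices the description does not specify.

```python
import re
import uuid

# Hypothetical filler pattern; the description only names "Err" and
# "You know" as examples of extraneous matter.
FILLERS = re.compile(r"\b(?:err+|um+|you know)\b[\s.,]*", re.IGNORECASE)


def strip_fillers(text):
    """Remove filler utterances and collapse the remaining whitespace."""
    return re.sub(r"\s+", " ", FILLERS.sub("", text)).strip()


class NetworkSpeechService:
    """Stand-in for the network-side recognition engine (step 3)."""

    def convert(self, request_id, speech):
        # A real engine would decode audio; here the input is already text.
        # The request ID is echoed back so the device can route the reply.
        return {"id": request_id, "text": strip_fillers(speech)}


class Device:
    """Stand-in for the mobile telephone (steps 1, 2 and 4)."""

    def __init__(self, service):
        self.service = service
        self.entities = {}  # entity name -> accumulated text
        self.pending = {}   # request ID -> entity awaiting text

    def dictate(self, entity, speech):
        """Tag the speech with a unique ID and send it to the network."""
        request_id = str(uuid.uuid4())
        self.pending[request_id] = entity
        return self.service.convert(request_id, speech)

    def on_text_received(self, reply):
        """Insert returned text into the entity that originally requested it."""
        entity = self.pending.pop(reply["id"])
        self.entities[entity] = self.entities.get(entity, "") + reply["text"]


device = Device(NetworkSpeechService())
sms_reply = device.dictate("sms", "Err .. running late")
note_reply = device.dictate("note", "buy milk you know ..")

# Replies may arrive in any order; the ID still routes each one correctly.
device.on_text_received(note_reply)
device.on_text_received(sms_reply)
print(device.entities["sms"])   # running late
print(device.entities["note"])  # buy milk
```

Keeping a table of pending request IDs on the device is one simple way to let replies arrive out of order, or over a different transport such as SMS or email, and still land in the right message, note or document.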
It should also be noted that while the description of this invention concentrates on its implementation on mobile telephones and mobile telephone networks, it is capable of being applied to any computing devices capable of wireless communication with a network; for example, a microphone-equipped tablet PC with 802.11b wireless networking capability could also use this invention to request of an entity on the network that it convert speech to text using the methods disclosed herein. The description here in terms of mobile telephones is therefore intended to illustrate the utility of the invention and not to restrict its applicability in any way.
It can be appreciated, therefore, that significant advantages are obtained through the use of the present invention, including:
  • It enables the input of any quantity of text without the necessity to resort either to awkward alphabetic input via a numeric keypad, or two-handed operation of some type of alphanumeric keyboard.
  • By offloading the speech to text conversion on to the network, it avoids the necessity for a mobile telephone to have the resources necessary for such a task; this both reduces the cost of building such a device and also ensures that device power consumption is minimised.
  • The location of the speech recognition engine in the network means that advances in this type of technology can be implemented transparently to mobile phone users and manufacturers.
Although the present invention has been described with reference to particular embodiments, it will be appreciated that modifications may be effected whilst remaining within the scope of the present invention as defined by the appended claims.

Claims (12)

Claims:
1. A method of creating textual data on a computing device including a capability for wireless communication, the method comprising
   a. transmitting speech from the device to a network comprising an ability to convert speech into text; and
   b. arranging for the device to receive from the network text generated from the transmitted speech.
2. A method according to claim 1 wherein speech is transmitted from the device to the network in real time as it is spoken.
3. A method according to claim 1 wherein speech is recorded on the device and the recording is subsequently transmitted to the network.
4. A method according to any one of claims 1 to 3 wherein the capability of wireless communication includes at least one of:
   a. A cellular telephone network
   b. 802.11b wireless networking
   c. Bluetooth
   d. Infrared.
5. A method according to any one of the preceding claims wherein the computing device is selected to comprise a mobile telephone.
6. A method according to any one of the preceding claims in which the means by which text is received from the network includes any one or more of:
   a. SMS
   b. Email
   c. Instant Messaging
   d. WAP Push
7. A method according to any one of the preceding claims wherein the text generated from the speech is processed to exclude extraneous matter deplete of semantic content.
8. A method according to any one of the preceding claims wherein the functionality for converting speech to text is selectable by means of at least one of:
   a. A dedicated or programmable hardware button
   b. A menu item
   c. An icon
9. A method according to any one of the preceding claims wherein a speech transmission for conversion into text is identified by means of an identifier attached to the speech transmission and associated with an application location for the text, and in which the identifier is attached to text generated in response to the respective speech transmission to enable the device to use the identifier to insert the text into the application location on the device.
10. A method according to any one of the preceding claims in which the functionality for requesting a conversion of speech into text is
   a. Included in the operating system of the device;
   b. Included in a specific application on the device;
   c. Provided as an option by a network operator.
11. A computing device including a capability for wireless communication and arranged to operate in accordance with a method as claimed in any one of claims 1 to 10.
12. An operating system for causing a computing device including a capability for wireless communication to operate in accordance with a method as claimed in any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0512759A GB2427500A (en) 2005-06-22 2005-06-22 Mobile telephone text entry employing remote speech to text conversion

Publications (2)

Publication Number Publication Date
GB0512759D0 GB0512759D0 (en) 2005-07-27
GB2427500A GB2427500A (en) 2006-12-27

Family

ID=34855993

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0512759A Withdrawn GB2427500A (en) 2005-06-22 2005-06-22 Mobile telephone text entry employing remote speech to text conversion

Country Status (1)

Country Link
GB (1) GB2427500A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0851403A2 (en) * 1996-12-27 1998-07-01 Casio Computer Co., Ltd. Apparatus for generating text data on the basis of speech data input from terminal
DE10017503A1 (en) * 2000-04-07 2001-10-25 Duque Anton Manuel Speech recognition method in wireless communication terminal, involves recognizing words held on voice server, and digitally transferring recognized results over Internet
US6366882B1 (en) * 1997-03-27 2002-04-02 Speech Machines, Plc Apparatus for converting speech to text
US20030078777A1 (en) * 2001-08-22 2003-04-24 Shyue-Chin Shiau Speech recognition system for mobile Internet/Intranet communication
EP1394771A1 (en) * 2002-04-04 2004-03-03 NEC Corporation Speech recognition conversation selection device, speech recognition conversation system, speech recognition conversation selection method, and program

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245522B2 (en) 2006-04-17 2016-01-26 Iii Holdings 1, Llc Methods and systems for correcting transcribed audio files
US9715876B2 (en) 2006-04-17 2017-07-25 Iii Holdings 1, Llc Correcting transcribed audio files with an email-client interface
US9858256B2 (en) 2006-04-17 2018-01-02 Iii Holdings 1, Llc Methods and systems for correcting transcribed audio files
US10861438B2 (en) 2006-04-17 2020-12-08 Iii Holdings 1, Llc Methods and systems for correcting transcribed audio files
US11594211B2 (en) 2006-04-17 2023-02-28 Iii Holdings 1, Llc Methods and systems for correcting transcribed audio files
EP2126902A4 (en) * 2007-03-07 2011-07-20 Vlingo Corp Speech recognition of speech recorded by a mobile communication facility
EP2198527A1 (en) * 2007-09-12 2010-06-23 Microsoft Corporation Speech-to-text transcription for personal communication devices
JP2011504304A (en) * 2007-09-12 2011-02-03 マイクロソフト コーポレーション Speech to text transcription for personal communication devices
EP2198527A4 (en) * 2007-09-12 2011-09-28 Microsoft Corp Speech-to-text transcription for personal communication devices
WO2009073768A1 (en) * 2007-12-04 2009-06-11 Vovision, Llc Correcting transcribed audio files with an email-client interface

Also Published As

Publication number Publication date
GB0512759D0 (en) 2005-07-27

Similar Documents

Publication Publication Date Title
KR100394305B1 (en) E-mail processing system, processing method and processing device
US7912186B2 (en) Selectable state machine user interface system
US6125284A (en) Communication system with handset for distributed processing
US7308082B2 (en) Method to enable instant collaboration via use of pervasive messaging
US20010034225A1 (en) One-touch method and system for providing email to a wireless communication device
US8731525B2 (en) Single button contact request and response
US20080305742A1 (en) Interface for pda and computing device
EP1104155A2 (en) Voice recognition based user interface for wireless devices
JPH11215248A (en) Communication system and its radio communication terminal
CA2644931A1 (en) System and method for voice-enabled instant messaging
EP1593050B1 (en) System and method for processing a message in a server of a computer network sent by a mobile computer device
KR100363656B1 (en) Internet service system using voice
US20070263827A1 (en) System and method of receiving a call having an identified or unidentified number and an identified or unidentified name
CN1997022B (en) Remotely controllable soft keys
GB2427500A (en) Mobile telephone text entry employing remote speech to text conversion
KR20050083716A (en) A system and method for wireless audio communication with a computer
US20080147409A1 (en) System, apparatus and method for providing global communications
KR100380829B1 (en) System and method for managing conversation -type interface with agent and media for storing program source thereof
US8379809B2 (en) One-touch user voiced message
KR20080027572A (en) Voice message transmission system for using multimodal plug-in in end terminal browser based and method thereof
GB2444755A (en) Improved message handling for mobile devices
WO2015011217A1 (en) User-interface using rfid-tags or voice as input
KR20010097580A (en) Business Model that phone list and schedule,..., etc are saved automatically to mobile phone on electric wave by internet.
KR20010077316A (en) the system can use internet by sound with cellular-phone
KR200245838Y1 (en) Telephone Message Memo System Using Automatic Speech Recognition

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)