GB2427500A

GB2427500A - Mobile telephone text entry employing remote speech to text conversion

Info

Publication number: GB2427500A
Application number: GB0512759A
Authority: GB
Inventors: Mats Hellman
Original assignee: Symbian Software Ltd
Current assignee: Symbian Software Ltd
Priority date: 2005-06-22
Filing date: 2005-06-22
Publication date: 2006-12-27
Also published as: GB0512759D0

Abstract

Text is input into a mobile telephone or similar wireless communications device by speaking the required text. The device transmits the speech input to a network, which converts the speech into text and transmits the created text back to the device. A unique identifier may be used so that, on receipt, the device is able to insert the text into an appropriate storage location on the device. The speech may be recorded on the device and extraneous matter lacking semantic content, such as hesitation, removed prior to sending the speech to the network.

Description

Over the Air Speech to Text Input
This invention relates to an improved method of inputting quantities of text on a computing device having a wireless capability, and in particular to a method for creating such textual input through an over the air speech to text conversion facility.
Currently, one of the most unsatisfactory aspects of most mobile telephones is the facility they offer for the input of text. Because the standard telephone keypad is principally for numeric input, letters and other characters are generated by pressing the numeric keys multiple times. So, entering the five letter word SORRY requires 16 key presses (4 presses of the 7 key, 3 presses of the 6 key, 3 presses of the 7 key, another 3 presses of the 7 key, and finally 3 presses of the 9 key).
Apart from the obvious inconvenience of this mechanism to a user, it is error-prone. Telephones are small devices, and the keys are both small and placed close together; this makes it relatively easy to press a wrong key by accident. Furthermore, if too long a gap is left when entering a letter requiring multiple key presses, it is very easy for the wrong letter to be generated. It is also easy for the wrong letter to be generated if too short a gap is left between letters where the same key is used for consecutive letters in a word (such as the word HIGH, which requires the 4 key to be pressed 8 times with correct intervals).
Even the simplest mobile phones demand quantities of textual input; entering names into the address book and generating SMS (short message service) messages are facilities which almost all mobile phone users take advantage of on a regular basis. However, even more text needs to be generated to take advantage of the facilities on more advanced mobile phones, which have more complex address books, calendar, alarm and agenda functionality, can handle email with document attachments and also run web browsers.
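The key press counts quoted above for SORRY and HIGH can be checked with a short sketch. It assumes the conventional letter layout of a 12-key telephone keypad (2 = ABC through 9 = WXYZ); the function names are illustrative only and are not part of the disclosure.

```python
# Conventional letter layout of a 12-key telephone keypad (assumed here).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Map each letter to (key, number of presses needed to reach it).
PRESSES = {
    letter: (key, position + 1)
    for key, letters in KEYPAD.items()
    for position, letter in enumerate(letters)
}


def multitap_presses(word):
    """Return the total number of key presses needed to multi-tap `word`."""
    return sum(PRESSES[ch][1] for ch in word.upper())


def same_key_pairs(word):
    """Count adjacent letters that share a key, each needing a timed pause."""
    keys = [PRESSES[ch][0] for ch in word.upper()]
    return sum(1 for a, b in zip(keys, keys[1:]) if a == b)


print(multitap_presses("SORRY"))  # 16
print(multitap_presses("HIGH"))   # 8
print(same_key_pairs("HIGH"))     # 3
```

In HIGH every letter is on the 4 key, so each of the three letter boundaries needs a correctly timed pause, which is the source of the timing errors described above.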
Various solutions have been proposed to solve this problem. One common method is for textual data to be entered using a standard personal computer, with its proper keyboard and large screen, with the text then being transferred to the telephone over a wired or wireless connection. The main disadvantage of this method is that it cannot easily be used when the telephone is actually mobile, and being used for the purposes and in the context for which it was designed.
Other improved methods of text input which are capable of being used on the device do exist, but involve compromises which affect usability. For example, the inclusion by some manufacturers of a proper alphanumeric keyboard on the device generally either affects portability by making the device too big to fit comfortably in a pocket, or else results in a keyboard so miniaturized that it is too small to be easily used; it can also affect the usability of the device as a normal telephone. Other manufacturers have included touch screens with either virtual keyboards or handwriting recognition. All these attempts have achieved some success, but none of them seems likely to be accepted as a universally convenient method, if for no other reason than that they cannot easily be used one-handed while mobile.
However, it has been known for some time that voice recognition technology can be used on mobile telephones; voice dialing has been the main use of this in the past. Since the mobile telephone is by design ideally suited to the input of speech, the provision of a speech-to-text conversion facility offers a means of inputting text into such a device that takes advantage of its primary functionality rather than making an attempt to convert it into a quite different type of device. Mobile phone devices with speech recognition capability are now becoming available.
The SGH-P207 telephone from Samsung is the first example of this art; however, the speech-to-text conversion facilities are necessarily limited on such a device. This is because there will inevitably be significant practical problems in implementing speech-to-text technology on mobile phones, because they are resource constrained devices. It is known, for example, that desktop computers require significant computing resources to perform this task; it is recommended for a typical product, Dragon 8, that a computer should have a 500MHz processor, 512MB system RAM and 500MB of free disk space (see http://www.scansoft.com/naturallyspeaking/preferred/sysreqs.asp and the independent assessment at http://www.dyslexic.com/database/articles/print/dicttech.html). It should be noted that even with these resources, desktop computers using speech recognition software work best with good quality microphones and as little ambient sound as possible.
Mobile phones have difficulty meeting the required hardware specification because, in comparison to desktop computers, they are resource constrained devices. Furthermore, because they will typically need to be used when mobile, and often in environments where ambient sound is ubiquitous, the speech recognition software will almost certainly be even more computationally expensive than on a desktop computer. Also, the extra power consumption needed for a speech engine would inevitably shorten operational battery life for the device, which is a serious concern for all mobile devices.
It is therefore an object of the present invention to provide an improved method of providing speech recognition in a mobile computing device having a wireless communication capability.
According to a first aspect of the present invention there is provided a method of creating textual data on a computing device including a capability for wireless communication, the method comprising transmitting speech from the device to a network comprising an ability to convert speech into text; and arranging for the device to receive from the network text generated from the transmitted speech.
According to a second aspect of the present invention there is provided a computing device including a capability for wireless communication and arranged to operate in accordance with a method of the first aspect.
According to a third aspect of the present invention there is provided an operating system for causing a computing device including a capability for wireless communication to operate in accordance with a method of the first aspect.
Embodiments of the present invention will now be described, by way of further example only.
With the present invention, it has been realized that the problems caused by the lack of computing, memory, storage and power resources on a mobile telephone may be overcome, in relation to speech recognition, by making use of the device's communication capabilities. Instead of siting the computationally expensive speech recognition engines on the mobile phone itself, the present invention sites them in servers or other entities within a communication network to which the mobile telephone can be connected.
Speech may either be recorded on the mobile phone and subsequently transmitted to the network, or alternatively may be transmitted to the network in real time, as spoken. The former option has the benefit that a user can ensure that the speech is filtered to exclude extraneous matter which is deplete of semantic content, so that it is well-formed before transmission: in other words the speech can be filtered for extraneous matter such as "Err .." or "You know ..".
The latter option allows for the network to give real time feedback on the quality of the speech, and the extraneous "Err .." or "You know .." or other such phrases or utterances can be filtered out by the use of intelligent software.
In both cases, the speech recognition software in the network, which is not resource constrained in any way and can therefore be of the highest quality, converts the sounds it receives into text and sends that text back to the mobile telephone; many possible transport mechanisms exist for this, including SMS, Email, Instant Messaging and WAP Push. Dedicated protocols and transport mechanisms are also possible and these are also considered to fall within the scope of this invention. In this way, this invention makes possible the addition of quantities of text to emails, messages, documents, notes and any textual application in an intuitive, efficient and user friendly way without making impossible demands on the resource constrained mobile phone.
The functionality for requesting the conversion of speech to text may be provided as part of the operating system on the mobile telephone; in this implementation, at any point where alphabetic or text input is expected, an icon or menu item or a dedicated or programmable hardware button enables speech to be transmitted to the network (either directly or via a previous recording) and text to be received in return. The speech is, ideally, tagged with a unique ID, which is also attached to the returned text; this enables the text received to be routed to its correct location. Preferably, this is handled automatically within the software.
Alternatively, application writers may provide facilities for the use of this functionality by adding a speech transmission (or record and transmit) option to access the required speech to text network functionality; this may be reached by a menu option or a user setting in appropriate applications in the software on the device.
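The removal of utterances such as "Err .." or "You know .." described above can be illustrated with a minimal sketch. It operates on already recognised text rather than on audio, which is a simplification; the filler list and the function name strip_fillers are assumptions for illustration, and the "intelligent software" referred to in the description is not specified in the source.

```python
import re

# Hypothetical filler list; the description itself names only "Err" and "You know".
FILLERS = ["err", "erm", "um", "uh", "you know"]

# Match a filler as a whole word, together with surrounding commas, dots and spaces.
_FILLER_RE = re.compile(
    r"[\s,]*\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[\s,.]*",
    re.IGNORECASE,
)


def strip_fillers(text):
    """Remove filler utterances from recognised text and tidy the spacing.

    Limitation: a meaningful "you know" (as in "Do you know him?") would
    also be removed; a real filter would need context to tell them apart.
    """
    without_fillers = _FILLER_RE.sub(" ", text)
    return re.sub(r"\s+", " ", without_fillers).strip()


print(strip_fillers("Err .. I will be, you know, late"))
print(strip_fillers("Meet me at um, err noon"))
```

A purely lexical filter like this is only a stand-in: it shows where filtering sits in the flow, not how a production recogniser would decide that an utterance lacks semantic content.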
A third option is for network operators to add this functionality themselves to existing mobile telephones; this option recognises the fact that it is the networks who would be providing the speech recognition engines and that it is possible that this may be done on some networks and not on others, or done on different networks in incompatible ways. The network operator may also provide a dedicated or programmable hardware button for initiating this functionality. Any current or future method of provisioning telephone services by network operators is likely to be suitable for use with this invention. Examples include:
- Telephone firmware updates
- Provisioning of new versions of applications or operating system components
- Addition of an option to an existing application, for example, by the use of methodology as disclosed in GB Patent Application 0422092.7 for embedding externally defined commands.
Typical usage of this invention might be as follows:
1. The user creates a new entity for containing text (such as a message, email, note or document) and then selects (either via a hardware button or a software control such as a menu or icon) the option to input text via speech.
2. The speech is either recorded on the device and then transmitted to the network, or alternatively, a connection is opened to the network for the speech to be dictated directly in real time.
3. The speech is received by the speech recognition component in the network, and converted to text.
4. The text is sent back in as short a time as possible and is automatically inserted in the entity for containing text from which it was originally requested.
Other aspects of speech to text conversion, such as the need to train the recognition engine to recognise the peculiarities of particular voices and accents, are well known to those versed in that particular art and are not a part of this invention. They will not be referred to in greater detail, therefore, in this description.
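The typical usage steps above, together with the unique ID used to route returned text, can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the class and function names are hypothetical, the transport (SMS, Email and so on) is omitted, and the network recognition engine is replaced by a stand-in that simply decodes the "audio" bytes as UTF-8 text.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SpeechRequest:
    request_id: str  # unique ID attached to the transmitted speech
    audio: bytes


@dataclass
class TextResponse:
    request_id: str  # same ID, attached to the returned text
    text: str


def fake_network_recognizer(request):
    """Step 3 stand-in (hypothetical): a real engine would decode the audio."""
    return TextResponse(request.request_id, request.audio.decode("utf-8"))


@dataclass
class Device:
    entities: dict = field(default_factory=dict)  # entity name -> text so far
    pending: dict = field(default_factory=dict)   # request ID -> entity name

    def dictate(self, entity, audio):
        """Steps 1 and 2: create the entity if needed and tag the speech."""
        request_id = uuid.uuid4().hex
        self.pending[request_id] = entity
        self.entities.setdefault(entity, "")
        return SpeechRequest(request_id, audio)

    def receive(self, response):
        """Step 4: insert returned text into the entity that requested it."""
        entity = self.pending.pop(response.request_id, None)
        if entity is None:
            return False  # unknown or already handled ID: ignore the text
        self.entities[entity] += response.text
        return True


device = Device()
note_request = device.dictate("note", b"Buy milk")
mail_request = device.dictate("email", b"Running late")

# Responses may arrive in any order; the ID still routes each one correctly.
device.receive(fake_network_recognizer(mail_request))
device.receive(fake_network_recognizer(note_request))
print(device.entities)
```

Keeping a table of pending IDs on the device means the returned text needs no knowledge of the application that asked for it, which is what allows routing to be handled automatically within the software.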
It should also be noted that while the description of this invention concentrates on its implementation on mobile telephones and mobile telephone networks, it is capable of being applied to any computing device capable of wireless communication with a network; for example, a microphone-equipped tablet PC with 802.11b wireless networking capability could also use this invention to request of an entity on the network that it convert speech to text using the methods disclosed herein. The description here in terms of mobile telephones is therefore intended to illustrate the utility of the invention and not to restrict its applicability in any way.
It can be appreciated, therefore, that significant advantages are obtained through the use of the present invention, including:
- It enables the input of any quantity of text without the necessity to resort either to awkward alphabetic input via a numeric keypad, or two-handed operation of some type of alphanumeric keyboard.
- By offloading the speech to text conversion on to the network, it avoids the necessity for a mobile telephone to have the resources necessary for such a task; this both reduces the cost of building such a device and also ensures that device power consumption is minimised.
- The location of the speech recognition engine in the network means that advances in this type of technology can be implemented transparently to mobile phone users and manufacturers.
Although the present invention has been described with reference to particular embodiments, it will be appreciated that modifications may be effected whilst remaining within the scope of the present invention as defined by the appended claims.

Claims

1. A method of creating textual data on a computing device including a capability for wireless communication, the method comprising
a. transmitting speech from the device to a network comprising an ability to convert speech into text; and
b. arranging for the device to receive from the network text generated from the transmitted speech.

2. A method according to claim 1 wherein speech is transmitted from the device to the network in real time as it is spoken.

3. A method according to claim 1 wherein speech is recorded on the device and the recording is subsequently transmitted to the network.

4. A method according to any one of claims 1 to 3 wherein the capability of wireless communication includes at least one of:
a. A cellular telephone network
b. 802.11b wireless networking
c. Bluetooth
d. Infrared.

5. A method according to any one of the preceding claims wherein the computing device is selected to comprise a mobile telephone.

6. A method according to any one of the preceding claims in which the means by which text is received from the network includes any one or more of:
a. SMS
b. Email
c. Instant Messaging
d. WAP Push.

7. A method according to any one of the preceding claims wherein the text generated from the speech is processed to exclude extraneous matter deplete of semantic content.

8. A method according to any one of the preceding claims wherein the functionality for converting speech to text is selectable by means of at least one of:
a. A dedicated or programmable hardware button
b. A menu item
c. An icon.

9. A method according to any one of the preceding claims wherein a speech transmission for conversion into text is identified by means of an identifier attached to the speech transmission and associated with an application location for the text, and in which the identifier is attached to text generated in response to the respective speech transmission to enable the device to use the identifier to insert the text into the application location on the device.

10. A method according to any one of the preceding claims in which the functionality for requesting a conversion of speech into text is
a. Included in the operating system of the device;
b. Included in a specific application on the device;
c. Provided as an option by a network operator.

11. A computing device including a capability for wireless communication and arranged to operate in accordance with a method as claimed in any one of claims 1 to 10.

12. An operating system for causing a computing device including a capability for wireless communication to operate in accordance with a method as claimed in any one of claims 1 to 10.