MXPA04011787A - Method for entering text. - Google Patents

Method for entering text.

Info

Publication number
MXPA04011787A
Authority
MX
Mexico
Prior art keywords
word
vocalization
probable
candidate
character
Prior art date
Application number
MXPA04011787A
Other languages
Spanish (es)
Inventor
Kuansan Wang
Xuedong David Huang
Alejandro Acero
Milind V Mahajan
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Publication of MXPA04011787A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 1/00 Details of transmission systems, not covered by a single one of groups H04B 3/00 - H04B 13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B 1/38 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B 1/40 Circuits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F 3/0233 Character input methods
    • G06F 3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L 2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Telephone Function (AREA)
  • Character Discrimination (AREA)
  • User Interface Of Digital Computer (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

In a method of entering text into a device a first character input is provided that is indicative of a first character of a text entry. Next, a vocalization of the text entry is captured. A probable word candidate is then identified for a first word of the vocalization based upon the first character input and an analysis of the vocalization. Finally, the probable word candidate is displayed for a user.

Description

METHOD FOR ENTERING TEXT

FIELD OF THE INVENTION
The invention relates generally to a method for entering text into a device. More particularly, the invention relates to character-assisted entry of vocalized text into a device.
BACKGROUND OF THE INVENTION
Small computing devices such as mobile phones and personal digital assistants (PDAs) are used with ever-increasing frequency. The computing power of these devices has allowed them to be used to access and browse the Internet as well as to store contact information, review and edit text documents, and perform other tasks. Additionally, sending and receiving text messages with mobile devices has become very popular. For example, the Short Message Service (SMS) for mobile phones has been a tremendous success in the text messaging arena, and the recently introduced Enhanced Messaging Service (EMS), an application-level extension of SMS, is expected to offer a smooth transition to the forthcoming Multimedia Messaging Service (MMS). As a result, these devices provide many applications in which text entry is required. Unfortunately, text entry on mobile devices can be inconvenient because they lack a standard full-sized keyboard. Currently, there are two common ways to accomplish text entry using the numeric keypads found on most mobile phones: a multi-tap method and a single-tap method. With the multi-tap method, a user presses a number key a number of times to enter the desired letter, where most of the number keys represent three or four letters of the alphabet. For example, the two key typically represents the letters A, B and C. If the user presses the two key once, the letter A is entered. If the user presses the two key twice, the letter B is entered, and if the user presses the two key three times, the letter C is entered. Pauses between entering successive letters of a word are sometimes necessary so that the device knows when to advance the cursor to the next letter entry position.
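The multi-tap scheme described above can be sketched in a few lines of code. This is an illustrative sketch, not code from the patent; the `KEYPAD` table and function names are assumptions for the example.

```python
# Illustrative sketch of the multi-tap method: pressing a number key
# repeatedly cycles through the letters printed on that key.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

def multitap_letter(key: str, presses: int) -> str:
    """Return the letter produced by pressing `key` `presses` times."""
    letters = KEYPAD[key]
    return letters[(presses - 1) % len(letters)]

def multitap_word(taps):
    """Decode a word from (key, press-count) pairs separated by pauses."""
    return "".join(multitap_letter(k, n) for k, n in taps)

# "cab": three presses of the two key, pause, one press, pause, two presses.
print(multitap_word([("2", 3), ("2", 1), ("2", 2)]))  # CAB
```

Note that the word "cab" costs six keystrokes here, versus three on a full keyboard, which is the inefficiency the passage describes.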
For example, to enter the word "cab", the user presses the two key three times to enter the letter C, pauses, presses the two key once to enter the letter A, pauses again, and presses the two key twice to enter the letter B. Other keys present on numeric keypads, such as the pound ("#") and asterisk ("*") keys, among others, are typically used to enter symbols or to toggle between uppercase and lowercase letters. While the multi-tap method is usable in that the user can enter any word using only the number keys, it is disadvantageous for quick and intuitive text entry. A word such as "cab" that requires pressing only three keys on a standard keyboard, one for each letter, requires six keystrokes on the number keys using the multi-tap method. Compared to a standard keyboard, using number keys with the multi-tap method to accomplish text entry means that the user presses many keys even for a short message. In addition, errors can be frequent. For example, if the user intends to enter the letter B but pauses too long between the first and second presses of the two key, two letter A's will be entered instead. The device in this case interprets the pause as meaning that the user has finished entering the current letter, and proceeds to the next letter entry position, where it also enters an A. Another method for entering text using number keys is the single-tap-with-dictionary method, such as "T9", popularized by a company called Tegic. Under the single-tap method, the user presses the number key bearing the desired letter only once, even though the number key may represent three or four different letters. From the resulting sequence of numbers for a word, the device attempts to discern the word the user intends to enter based on the numeric sequence. Each numeric sequence represents a common word corresponding to that sequence.
For example, the numeric sequence 43556 can potentially correspond to any five-letter word that has a first letter G, H or I, since the four key typically represents these letters. Similarly, the sequence potentially corresponds to any five-letter word that has a second letter D, E or F, a third and fourth letter selected from the letters J, K and L, and a fifth letter M, N or O, since the three, five and six keys typically represent these respective letters. However, because the most common five-letter word corresponding to the numeric sequence 43556 is the word "hello", the single-tap method will always enter this word when the user presses the four, three, five, five and six keys in succession to enter this numeric sequence. The single-tap method has advantages over the multi-tap method, but presents new disadvantages. Advantageously, the single-tap method ensures, with high probability, that the user only has to press the same number of keys as the number of letters in the desired word. For example, the multi-tap method requires the user to press the two key six times to enter the word "cab". By contrast, the single-tap method potentially requires the user to press the two key only three times to enter this word, assuming that the numeric sequence 222 represents the word "cab". Therefore, the single-tap method is more key-efficient than the multi-tap method for entering text using number keys. It is almost as efficient as using a standard keyboard that has an individual key for each letter. The single-tap method is disadvantageous in that the word represented by a given numeric sequence may not be the word that the user intends to enter when keying the sequence. For example, the numeric key sequence 7333 corresponds to both the word "seed" and the word "reed".
Because only one word is represented for each sequence of numeric keys, the word "seed" may be entered when the user types the number key sequence 7333, while the user may have intended to enter the word "reed". The single-tap method is mainly useful where there is only a single word for a given numeric key sequence or, where there are several words for a given sequence, when the user wishes to enter the most common word associated with the sequence. When the word represented by the single-tap method is not the intended word, text entry may fall back to the multi-tap method or to an error correction mode. The final text entry of the intended word may then require more keystrokes than if the user had started with the multi-tap method. Another method of entering text, other than using a conventional keyboard, is through the use of a speech recognition system. In such systems, the user vocalizes the text entry, which is captured by the computing device through a microphone and digitized. Spectral analysis is applied to samples of the captured speech, and feature vectors are generated for each sample. Output probabilities can then be computed against statistical models such as Hidden Markov Models, which are subsequently used in the execution of a Viterbi decoding process or a similar type of processing technique. An acoustic model representing speech units is searched to determine probable phonemes that are represented by the feature vectors and, therefore, the utterance received from the user of the system. A lexicon of vocalized word candidates is searched to determine the word that most likely represents the feature vectors. Additionally, language models can be used to improve the accuracy of the words produced by speech recognition systems.
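The single-tap dictionary lookup and its ambiguity problem can be sketched as follows. This is an illustrative sketch under assumed names; the toy lexicon is not from the patent.

```python
# Illustrative sketch of single-tap ("T9"-style) lookup: each word maps
# to one digit sequence, and one sequence may map to several words.
from collections import defaultdict

KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {l: d for d, letters in KEYPAD.items() for l in letters}

def digit_sequence(word: str) -> str:
    """Digit sequence a user would key to enter `word` with one tap per letter."""
    return "".join(LETTER_TO_DIGIT[c] for c in word.upper())

# Build the sequence-to-words table from a (toy) lexicon.
lexicon = ["SEED", "REED", "CAB", "HELLO"]
table = defaultdict(list)
for w in lexicon:
    table[digit_sequence(w)].append(w)

print(digit_sequence("HELLO"))  # 43556
print(table["7333"])            # ['SEED', 'REED'] -- ambiguous sequence
```

The 7333 collision between "seed" and "reed" is exactly the ambiguity the passage describes: the device must pick one word, which may not be the one intended.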
Language models generally operate to improve the accuracy of the speech recognition system by limiting the candidate words to those most likely given the preceding words. Once the words of the captured vocalized text entry are identified, they are entered as text into the computer system. Speech recognition systems require significant processing power in order to process vocalized text input and produce reasonably accurate results. Although the mobile devices of the future may be able to implement such speech recognition systems, current mobile computing devices lack the processing power necessary to do so in a useful way. Additionally, mobile computing devices typically lack the memory capacity required for large-vocabulary continuous speech recognition. Accordingly, mobile computing devices have relied on the text entry methods discussed above, which use limited keypads. There is a continuing demand for improved methods of entering text into devices, including mobile computing devices.
SUMMARY OF THE INVENTION
The invention relates generally to a method for entering text into a device. In the method, a first character input is provided that is indicative of a first character of a text entry. Next, a vocalization of the text entry is captured. A probable word candidate is then identified for a first word of the vocalization based on the first character input and an analysis of the vocalization. Finally, the probable word candidate is displayed to the user.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a simplified block diagram of an illustrative computing device in which the invention can be used. Figure 2 is a schematic diagram of a mobile telephone in which the invention can be used. Figure 3 is a flow chart illustrating a method for entering text into a device, according to embodiments of the invention. Figure 4 is a block diagram of an illustrative system that can be used to implement the method of the invention. Figure 5 is a flow chart illustrating a method for entering text into a device according to embodiments of the invention. Figure 6 is a flow chart illustrating a method for entering text into a device according to embodiments of the invention.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The present invention relates generally to a method for entering text into computing devices. Although the method of the present invention can be implemented in computing devices that include a conventional full-sized keyboard, it is most useful when used in connection with mobile computing devices that lack such a keyboard. Figure 1 is a block diagram of an illustrative computing device 100 in which the invention can be implemented. The device 100 may be a mobile computing device such as, for example, a mobile telephone, a personal digital assistant (PDA), a mobile storage system (e.g., an MP3 player), a remote control, or another mobile computing device that lacks a conventional full-sized keyboard. The device 100 is only one example of a computing environment suitable for the present invention and is not intended to suggest any limitation on the scope of use or functionality of the invention. Additionally, the device 100 should not be construed as having any dependency or requirement relating to any one or combination of the components illustrated in Figure 1. The device 100 may include a controller or processor 102, a screen 106, a microphone 108, and a character input device 110. The memory 104 is a computer-readable memory that can be accessed by the processor 102. The memory 104 can comprise volatile and non-volatile memory storage technologies, and may be removable from the device 100 or fixed within it. For example, the memory 104 may include, but is not limited to, RAM, ROM, EEPROM, non-volatile memory or other memory storage devices. The memory 104 is configured to store instructions, such as program modules, which can be executed by the processor 102 to implement the method of the present invention. Generally, program modules include routines, programs, objects, components, data structures, etc. that carry out particular tasks or implement particular abstract data types.
The invention can also be practiced in distributed computing environments where tasks are carried out through remote processing devices that are linked through a communications network. In a distributed computing environment, the program modules can be located in both local and remote storage media.
The processor 102 is configured to display text and images on the screen 106 in accordance with conventional computing device operations. The screen 106 can be any suitable display. For mobile computing devices, the screen 106 is typically a small flat screen, such as a liquid crystal display (LCD), which may also be touch-sensitive. Alternatively, the screen 106 may be a larger screen, such as a cathode ray tube (CRT) screen, or another type of larger screen, such as a large flat-panel display.
The microphone 108 of the device 100 can be used by a user to capture a vocalization. The vocalization is preferably converted to digital form by an analog-to-digital (A/D) converter 112. As will be discussed below in greater detail, the device 100 can process the digitized vocalization to extract probable word candidates that are contained in the vocalization. This is generally accomplished by executing a speech recognition or language processing module contained in the memory 104 using the processor 102 to process the digitized vocalization. The character input device 110 is used by the user to enter alphanumeric characters, symbols, spaces, etc. as a text entry into the device 100. Additionally, the character input device 110 can be used to make selections, move a cursor, scroll through a page, navigate options and menus, and carry out other functions. Although the character input device 110 could be a conventional keyboard, the present invention is most useful with computing devices 100 having a limited character input device 110 that is generally smaller, has fewer keys, and has limited functionality relative to conventional full-sized keyboards. Character entry using such limited character input devices 110 can be slow and tedious. Limited character input devices 110 can take many different forms. Some limited character input devices 110, typically used in PDAs, are formed by a touch-sensitive screen, such as the screen 106. One such limited character input device 110 is formed by the display of a miniature keyboard on the touch-sensitive screen 106. The user can select the desired characters for text entry by touching the displayed characters with a stylus, in a manner similar to a conventional keyboard.
Another such limited character input device 110 allows users to write characters on the screen 106, or to designate input characters through a particular sequence of strokes applied to the touch-sensitive screen 106 using the stylus. Once the user provides the text input using any form of the character input device 110, the text entry is displayed on the screen 106. Mobile computing devices, such as a mobile phone, use a limited character input device 110 in the form of a numeric keypad. Figure 2 is a simplified diagram of a device 100 in the form of a mobile telephone 114 that includes such a keypad 116, a screen 106 and a microphone 108. The mobile telephone 114 may also include a speaker 118, an antenna 120, communications circuitry in the form of a transceiver (not shown), and other components that are not pertinent to the present invention. The numeric keypad 116 includes a number of number keys 122 and other keys. In general, the numeric keypad 116 differs from a standard keyboard in that it does not have a unique key for each character. As a result, the numeric keypad 116 is a limited character input device 110. The numeric keypad 116 has the following number keys: a one key 122A, a two key 122B, a three key 122C, a four key 122D, a five key 122E, a six key 122F, a seven key 122G, an eight key 122H, a nine key 122I, and a zero key 122J. The numeric keypad 116 also has an asterisk (*) key 122K and a pound (#) key 122L. The keypad 116 may also have other specialized keys beyond those shown in Figure 2, or fewer keys than those shown in Figure 2. The keys 122 of the keypad 116 may be physical, real keys, or virtual soft keys displayed on the screen 106, where the screen 106 is a touch-sensitive screen. All the number keys 122 of the numeric keypad 116, except for the one key 122A and the zero key 122J, correspond to three or four letters of the alphabet.
The two key 122B corresponds to the letters A, B and C. The three key 122C corresponds to the letters D, E and F. The four key 122D corresponds to the letters G, H and I. The five key 122E corresponds to the letters J, K and L. The six key 122F corresponds to the letters M, N and O. The seven key 122G corresponds to the letters P, Q, R and S. The eight key 122H corresponds to the letters T, U and V. Finally, the nine key 122I corresponds to the letters W, X, Y and Z. Punctuation characters and symbols can be included either on otherwise unused keys, such as the one key 122A, or on other number keys 122 along with the letters. Additionally, each number key 122 can be used to enter the number or symbol with which it is labeled. Prior art mobile computing devices, such as mobile phones, use the multi-tap and single-tap methods to enter text into the device 100. Such methods can be tedious and inefficient, not least because of the need to provide at least one input using the keys 122 for each character of text. Additionally, the single-tap method often fails to recognize the word the user is trying to enter. For example, to enter the word "hello" the user presses the four key 122D, the three key 122C, the five key 122E twice, and the six key 122F in succession. Because the entered numeric sequence 43556 may correspond to words other than the word "hello", the intended word is ambiguous. Additionally, the lexicon used by the device, which contains the words that are matched to specific numeric sequences, may not contain the word that the user wishes to enter. This generally results in an out-of-vocabulary (OOV) error, which typically requires the user to change the device's text input mode from single-tap mode to multi-tap mode and re-enter the desired text entry from the beginning.
As a result, the user may be forced to perform significantly more keystrokes than the number of letters contained in the word. The present invention operates to significantly reduce the number of keystrokes required to enter the desired text into the device 100 as compared to prior art methods. This is achieved through a combination of speech recognition with user input. The result is a text entry system that is simple, efficient and accurate. Figure 3 is a flow chart illustrating the steps of the method according to various embodiments of the present invention. Figure 4 is a block diagram of an illustrative system 128 that can be used to implement the embodiments of the method in the device 100. The components of the system 128 generally correspond to program modules and instructions that are contained, for example, in the memory 104 and are executable by the processor 102 of Figure 1 to carry out the various steps of the method. Once the device 100 is set to a text entry mode, a first character input 130 is provided by the user, in step 132. The first character input 130 is indicative of the first character of a text entry that is to be entered by the user. For example, when the desired text entry is "BERRY", the user provides a first character input 130 that is indicative of the letter "B". The first character input 130 may be the actual first character of the text entry, directly entered by the user using, for example, the multi-tap method on the numeric keypad 116 (Figure 2), a touch-sensitive screen, a conventional keyboard, another type of input device 110 (Figure 1), or other means. A disadvantage of this embodiment of the invention is that limited character input devices 110, such as the numeric keypad 116, can force the user to press a key 122 multiple times to enter the desired character, as explained above.
The first character input 130 can also be entered by the user according to the single-tap method. In this case, for the numeric keypad 116, the user need only press the key 122 corresponding to the desired character once. Thus, to enter "B" the user simply presses the two key 122B once. According to this embodiment of the invention, the first character input 130 is representative of "B" as well as "A" and "C". In step 134 of the method, a vocalization 136 of the text entry is captured. This is typically accomplished by the user speaking the text entry into the microphone 108, which is digitized by the A/D converter 112 and stored in the memory 104, or otherwise processed by the processor 102, in accordance with conventional speech recognition methods. Preferably, the vocalization 136 is captured after the first character input 130 has been provided by the user. The capture of the vocalization 136 can be triggered to begin in many different ways. Preferably, an indicator is provided by the device 100 on, for example, the screen 106, to inform the user that the vocalization of the text entry should begin. According to one embodiment of the invention, the capture step 134 begins in response to the user providing the first character input 130 in step 132 of the method. Accordingly, for the single-tap input method, pressing the number key corresponding to the first character of the text entry while the device 100 is in its text entry mode initiates the capture step 134. According to another embodiment of the invention, the capture step 134 begins when a key of the character input device 110 is pressed and released. This is particularly useful for the single-tap method, where only an individual key is pressed to designate the first character input 130, but it can also be implemented together with the multi-tap method and other text entry methods.
The device 100 may also include a dedicated hard or soft key that is used to activate the capture step 134.
In accordance with another embodiment of the invention, the capture step 134 can be configured to compensate for situations where a user prematurely speaks before pressing the key, or before another vocalization capture activation event is detected. One way to address this is to continuously store a few hundred milliseconds of any vocalization made by the user in the memory 104 while the device 100 operates in the text entry mode. The vocalization stored in this temporary buffer can be used to capture a "false start" vocalization of the text entry that began before the activating event, and can be included as part of the vocalization input 136 that is provided to the voice recognizer 142 (Figure 4) during the capture step 134. The capture step 134 can be terminated either by the expiration of a predetermined period of time or by the release of the button or key that was held down to start capturing the vocalized text entry. Alternatively, the capture step 134 can be terminated after the system detects the end of the vocalization of the text entry. When the capture step 134 is terminated, the device 100 preferably provides notice of this to the user, such as by terminating the indicator that was provided when the capture step 134 was initiated. According to one embodiment of the invention, the text entry provided by the user must be in isolated or individual word increments. Accordingly, the vocalization 136 of the text entry corresponds to an individual or isolated word of the text entry. The process of entering text by providing a first character input and then speaking or vocalizing an individual word of the text entry is rather natural when the single-tap method is used to enter the first character input 130. Additionally, individual-word text entry has its advantages in the context of a mobile computing device. In particular, less memory is required to temporarily store the captured vocalization 136.
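The "false start" buffering described above amounts to keeping a small rolling window of recent audio while in text entry mode. The sketch below illustrates the idea with a fixed-length ring buffer; the class name, frame representation and buffer size are assumptions for the example, not details from the patent.

```python
from collections import deque

class FalseStartBuffer:
    """Continuously retain the most recent audio frames so that a
    vocalization beginning slightly before the capture trigger is not lost."""

    def __init__(self, max_frames: int):
        # A bounded deque discards the oldest frame automatically,
        # approximating "a few hundred milliseconds" of retained audio.
        self._ring = deque(maxlen=max_frames)

    def feed(self, frame):
        """Called for every frame while the device is in text entry mode."""
        self._ring.append(frame)

    def start_capture(self):
        """On the trigger event, seed the captured vocalization with the
        buffered pre-trigger audio."""
        return list(self._ring)

buf = FalseStartBuffer(max_frames=3)
for frame in ["f1", "f2", "f3", "f4"]:
    buf.feed(frame)
print(buf.start_capture())  # ['f2', 'f3', 'f4'] -- oldest frame discarded
```

The buffered frames would be prepended to the audio captured after the trigger before the whole sequence is handed to the recognizer.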
Additionally, less computational power is required to analyze the vocalization 136, and more accurate speech recognition results are possible, as will be discussed below. According to another embodiment of the invention, the text entry is provided by the user in the form of multiple words. Due to the generally limited memory capacity and processing power of mobile computing devices, the length of the text entry is preferably limited. Accordingly, the user is preferably only allowed to enter a short phrase or sentence. According to one embodiment of the invention, the indicator that notifies the user of the start and end of the capture step may be in the form of a stopwatch (i.e., a countdown timer) or the display of an extending bar indicating the elapsed time and completion of the capture step 134. Both the individual-word and multiple-word text entry modes of the present invention initially operate in substantially the same way with respect to the first word of the text entry and the corresponding first word of the vocalization. In step 138 of the method, a probable word candidate 140 is identified for a first word of the vocalization 136 of the text entry based on the first character input 130 and an analysis of the vocalization 136. In general, the method operates to narrow down a list of potential word candidates for the first word of the text entry (multiple-word text entry mode) or the word of the text entry (individual or isolated word text entry mode) by eliminating words that fail to match the criterion established by the first character input 130. For example, when a single-tap first character input 130 corresponds to the multiple characters "ABC", the list of potential word candidates can be narrowed to only those words that start with "A", "B" or "C".
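The narrowing step just described is a simple filter over the candidate list. The sketch below illustrates it for a single-tap first character input; the `KEYPAD` table and candidate words are illustrative assumptions, not data from the patent.

```python
# Illustrative sketch: narrow the word candidate list using a single-tap
# first character input, which designates all letters on the pressed key.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

def narrow_by_first_key(candidates, key: str):
    """Keep only candidates whose first letter appears on the pressed key."""
    allowed = set(KEYPAD[key])
    return [w for w in candidates if w and w[0].upper() in allowed]

words = ["BERRY", "CHERRY", "APPLE", "MELON"]
# Pressing the two key ("ABC") eliminates words not starting with A, B or C.
print(narrow_by_first_key(words, "2"))  # ['BERRY', 'CHERRY', 'APPLE']
```

Because the filter runs before (or alongside) recognition, the recognizer has far fewer hypotheses to score, which is the speed and accuracy benefit the next paragraph describes.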
As a result, the system 128 of the device 100 can not only produce more accurate results, but can produce results much more quickly than would be possible if all potential word candidates for the vocalization 136 were analyzed. This is particularly beneficial for mobile computing devices 100 that lack the processing power of other computer systems that implement speech recognition systems. The analysis of the vocalization 136 is generally carried out by a speech recognizer 142 (Figure 4). The speech recognizer 142 generally performs spectral analysis of digital samples of the vocalization 136 to identify a list of probable word candidates 144, from a lexicon or vocalized word candidate list 146, that likely correspond to the vocalization 136 of the text entry. Preferably, the list of probable word candidates 144 produced by the speech recognizer 142 is ranked according to likelihood of matching the vocalization 136. The speech recognizer 142 may also include a language model 148 that can improve the recognition accuracy of the speech recognizer 142. The language model 148 operates to specify which sequences of words in the vocabulary are possible or, more generally, provides information about the probability of various word sequences. Examples of language models are the 1-gram, 2-gram and N-gram language models. The 1-gram language model considers only the probability of an individual word, whereas the 2-gram language model considers the preceding word in the text entry as having an influence on what the current vocalized word of the text entry is. Similarly, the 3-gram, 4-gram and N-gram language models consider the two, three or N-1 words immediately preceding the desired text entry in determining the match with the vocalization 136. Due to the general lack of processing power in mobile computing devices 100, it may be necessary to limit the language model 148 to 1-gram or 2-gram language models.
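The 2-gram (bigram) model mentioned above can be sketched with maximum-likelihood estimates from word-pair counts. The counts and words below are toy values chosen for illustration only, not data from the patent.

```python
# Illustrative sketch of a 2-gram language model: the probability of a
# candidate word is conditioned on the immediately preceding word.
from collections import defaultdict

# Toy bigram counts (previous word, current word) -> count.
bigram_counts = {
    ("send", "help"): 8, ("send", "hello"): 2,
    ("say", "hello"): 9, ("say", "help"): 1,
}

# Totals for each history word, so probabilities normalize per history.
history_totals = defaultdict(int)
for (prev, _), count in bigram_counts.items():
    history_totals[prev] += count

def bigram_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigram_counts.get((prev, word), 0) / history_totals[prev]

# The same acoustic candidates rank differently given different histories.
print(bigram_prob("send", "help"))  # 0.8
print(bigram_prob("say", "help"))   # 0.1
```

A 1-gram model would drop the history entirely and score each word by its standalone frequency, which is cheaper but less discriminative, matching the trade-off the passage notes for low-power devices.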
The identification step 138 is generally carried out by a prediction module 150. According to one embodiment of the invention, the prediction module 150 receives the list of probable word candidates 144 and the character input 130. The prediction module 150 identifies the probable word candidate 140 from the list of probable word candidates 144 based on the first character input 130. The predictor 150 preferably selects, as the probable word candidate 140, the highest ranked word in the list of probable word candidates 144 that has the character input 130 as its first letter. According to another embodiment of the invention, the identification step 138 is carried out by first reducing the lexicon or list of vocalized word candidates 146 of the speech recognizer 142 using the first character input 130, as indicated by dotted line 152 in Figure 4. As a result, the list of vocalized word candidates 146 is reduced to a smaller list of vocalized word candidates 154 through the elimination of all vocalized word candidates that fail to start with the character or characters identified by the first character input 130. The reduced list of vocalized word candidates 154 is further reduced to form the list of probable word candidates 144 for the first word of the vocalization 136 based on an analysis by the speech recognizer 142. As a result, each of the probable word candidates 144 provided to the predictor 150 starts with the character or characters identified by the character input 130. The predictor 150 then identifies the probable word candidate 140, which is preferably the highest ranked candidate on the list of probable word candidates 144.
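The predictor's selection rule in the first embodiment can be sketched as follows. This is an illustrative Python sketch; the candidate list and letter set are assumptions, and the recognizer's ranked output is simulated by an ordered list.

```python
def predict(ranked_candidates, key_letters):
    """Return the highest-ranked recognizer candidate whose first letter
    is among the letters implied by the key press, or None if no match."""
    for word in ranked_candidates:  # list is assumed sorted best-first
        if word[0].lower() in key_letters:
            return word
    return None

# Recognizer output ranked by acoustic likelihood (illustrative values);
# the "2" key covers the letters a, b, c:
candidates = ["very", "berry", "bury", "ferry"]
best = predict(candidates, set("abc"))
```

Here "very" outranks "berry" acoustically but is vetoed by the key press, so "berry" is returned: the character input resolves an ambiguity the acoustics alone cannot.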
Another embodiment of the identification step 138 includes performing a single-tap analysis on the first character input 130. In general, the predictor 150 uses the first character input 130 to reduce a lexicon or list of input word candidates 156 to only those words having first characters corresponding to the first character input 130. In this way, the list of input word candidates 156 is reduced to a short list of input word candidates 158 for the first word of the vocalization 136. The predictor 150 then compares the list of vocalized word candidates 144, produced in response to the analysis of the vocalization 136 by the speech recognizer 142, with the short list of input word candidates 158. The predictor 150 then identifies the probable word candidate 140 as the word candidate that appears in both the list of vocalized word candidates and the short list of input word candidates. Preferably, the predictor 150 selects, as the probable word candidate 140, the word having the highest ranking in the list of probable word candidates 144 that has a match in the short list of input word candidates 158. In the final step 160 of the method, the probable word candidate 140 is displayed to the user on, for example, the screen 106 of the device 100. Alternatively, multiple probable word candidates that satisfy the identification step 138 can be displayed to the user. The display of the probable word candidate can be construed as entry of the probable word candidate into the device 100 even though it has not yet been accepted by the user. The displayed probable word 140 can then be accepted to complete the text entry of the word, or rejected by the user. Usually, the probable word candidate 140 is accepted and inserted as the text entry in the device 100 in response to a selection by the user.
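The intersection of the two lists while preserving the recognizer's ranking can be sketched as follows. This is an illustrative Python sketch; both input lists are assumptions standing in for the recognizer output 144 and the key-derived shortlist 158.

```python
def intersect_preserving_rank(voiced_candidates, input_shortlist):
    """Pick the best-ranked voiced candidate that also appears in the
    key-derived input shortlist; words are assumed lowercase."""
    shortlist = set(input_shortlist)
    return next((w for w in voiced_candidates if w in shortlist), None)

voiced = ["very", "berry", "bury"]   # ranked best-first by the recognizer
keyed = ["berry", "bury", "barry"]   # words compatible with the key press
match = intersect_preserving_rank(voiced, keyed)
```

The first voiced candidate found in the shortlist wins, so the acoustic ranking decides among the words the key press allows.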
According to one embodiment of the invention, the user inserts the probable word candidate 140 by pressing a soft or hard key on the device 100. The user's selection is preferably carried out by pressing one of the keys 122 of the numeric keypad that does not correspond to alphanumeric characters, such as the asterisk key 122K or the pound key 122L. However, it should be understood that many conventional methods for making a selection can be used to insert the displayed probable word candidate. When the user is inserting text one word at a time, and the displayed probable word is accepted by the user and inserted, the method may continue according to the flow chart of Figure 5. In step 162, the user provides an input of a second character that is indicative of a first character of a second text entry. The second character input can be provided in accordance with the procedures explained above for providing the first character input 130. Next, in step 164, a vocalization of the second text input is captured in the manner described above with respect to step 134 (Figure 3). A probable word candidate is then identified for the vocalization of the second text input, in step 166, based on the second character input and an analysis of the vocalization of the second text input. This step is carried out substantially in the manner described above with respect to step 138 of the method of Figure 3. Finally, the probable word candidate is displayed in step 168. The user then has the option to select or reject the probable word candidate as described earlier. The language model 148 of the speech recognizer 142 can take into account the preceding words in the text entry to identify the current word that the user is trying to insert.
Accordingly, step 166 of identifying a probable word candidate for the vocalization of the second text entry can further be based on the previously inserted probable word candidate 140. When the user is inserting text in the multiple-word format, the displayed probable word is accepted by the user, and the words of the vocalization 136 have not all been identified, the method may continue according to the flow chart of Figure 6. In step 170 of the method, an input of a second character is provided that is indicative of a first letter of a second word of the vocalization 136 captured in step 134 of Figure 3. As mentioned above, the second character input can be provided according to the procedure explained above for providing the first character input 130. Then, in step 172, a probable word candidate for the second word of the vocalization 136 is identified based on an analysis of the vocalization 136 and the second character input. The probable word candidate is then displayed in step 174 for the user to either accept or reject. If the user accepts this probable word candidate, the method returns to step 170 and repeats until all the words of the vocalization 136 are identified. As above, step 172 of identifying a probable word candidate for the second word of the vocalization can further be based on the previously inserted probable word candidate 140 using an appropriate language model 148 of the speech recognizer 142. As mentioned previously, the user also has an opportunity to reject the displayed probable word candidate 140 by providing an appropriate input. According to one embodiment of the invention, a key is provided on the device 100 that, when pressed, results in the rejection of the displayed probable word candidate 140. Said key can be a soft key or a hard key of the device 100.
For example, when the asterisk key 122K is used to accept a displayed probable word candidate 140, the pound key 122L can be used to reject the probable word candidate. Many other methods to reject the displayed probable word candidate can be used as well. According to one embodiment of the invention, once the user has rejected the probable word candidate, one or more probable word candidates that match the criteria of the identification step 138 (Figure 3) are displayed to the user according to their ranking. For example, when the desired word to be inserted is "BURY", the probable word 140 displayed by the system 128 could be "BERRY"; after the user rejects the displayed probable word, the system 128 can display the most likely alternatives, such as the desired word "BURY" as well as "BARRY", for example. The user is then provided with an option to select among the displayed alternative probable word candidates. According to another embodiment of the invention, the rejection of the displayed probable word candidate 140 occurs in response to the user providing an input of a second character that is indicative of a second character of the first word of the vocalization 136 of the desired text input. The second character input can be made in the manner described above for the first character input 130. The system 128 of the device 100 locates one or more probable word candidates that satisfy the method implemented in step 138 (Figure 3) and that have first and second characters corresponding to the first and second character inputs. The alternative probable word candidates can then be displayed to the user for selection or rejection. This process can be repeated by continuing to insert the third and subsequent characters of the text entry.
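The progressive refinement on rejection can be sketched as follows. This is an illustrative Python sketch; representing each key press as a set of letters is an assumption, as are the example candidates drawn from the "BURY"/"BERRY"/"BARRY" scenario above.

```python
def refine(candidates, typed_prefix):
    """Filter ranked candidates by the letters implied by each key press so far.
    typed_prefix is a list of letter-sets, one per keypad press (assumption)."""
    def matches(word):
        return len(word) >= len(typed_prefix) and all(
            word[i] in letters for i, letters in enumerate(typed_prefix))
    return [w for w in candidates if matches(w)]

ranked = ["berry", "bury", "barry"]
# First press covers {a, b, c}; a second press covering {t, u, v} after a
# rejection narrows the alternatives further:
remaining = refine(ranked, [set("abc"), set("tuv")])
```

After the second key press only "bury" survives, so each additional character the user enters disambiguates among the remaining alternatives without further speech input.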
In the case where the alternative probable words still fail to match the text input word desired by the user, the text insertion mode of the device 100 can be switched to multi-tap mode to allow the user to insert the word directly on the device 100.
Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes can be made in form and detail without departing from the spirit and scope of the invention. Also, although most of the discussion here has centered on alphabetic languages, the principles described in this invention are also applicable to other languages, such as those of East Asia, whose insertion methods are not based on alphabets.

Claims (40)

1. A method for inserting text into a device, comprising: a) providing an input of a first character that is indicative of a first character of a text entry word; b) capturing a vocalization of the text entry word; c) identifying a probable word candidate for the vocalization based on the input of the first character and an analysis of the vocalization; and d) displaying the probable word candidate.
2. The method according to claim 1, wherein the capture step b) starts in response to the provision of step a).
3. The method according to claim 1, wherein the capture step b) is initiated before providing step a).
4. The method according to claim 1, wherein the capture step b) ends after a predetermined period of time.
5. The method according to claim 1, wherein the capture step b) ends after the termination of the vocalization is detected.
6. The method according to claim 1, wherein providing step a) includes pressing a key corresponding to multiple characters.
7. The method according to claim 1, wherein: providing step a) includes pressing and holding a key; and the capture step b) begins in response to providing step a).
8. The method according to claim 7, wherein the capture step b) ends after a predetermined period of time.
9. The method according to claim 7, wherein the capture step b) ends when the key is released.
10. The method according to claim 1, wherein the identification step c) includes: producing a list of probable word candidates based on an analysis of the vocalization; and identifying the probable word candidate from the list of probable word candidates for the vocalization based on the input of the first character.
11. The method according to claim 10, which includes: rejecting the probable word candidate in response to a user input; and displaying an alternative probable word candidate from the list of probable word candidates.
12. The method according to claim 1, wherein the identification step c) includes: reducing a list of vocalized word candidates using the input of the first character to form a reduced list of vocalized word candidates; reducing the reduced list of vocalized word candidates to a list of probable word candidates for the vocalization based on an analysis of the vocalization; and identifying the probable word candidate from the list of probable word candidates.
13. The method according to claim 12, which includes: rejecting the probable word candidate in response to a user input; and displaying a probable word candidate from the list of probable word candidates.
14. The method according to claim 1, wherein the identification step c) includes: analyzing the vocalization to produce a list of vocalized word candidates; reducing a list of input word candidates using the input of the first character to form a short list of input word candidates for the vocalization; comparing the list of vocalized word candidates with the short list of input word candidates; and identifying the probable word candidate as a word candidate that appears in both the list of vocalized word candidates and the short list of input word candidates.
15. The method according to claim 14, which includes: rejecting the probable word candidate in response to a user input; and displaying an alternative probable word candidate that appears in both the list of vocalized word candidates and the short list of input word candidates.
16. The method according to claim 1, which includes providing an input of a second character that is indicative of a second character of the text entry word, wherein the probable word candidate identified in step c) is based on the inputs of the first and second characters and the analysis of the vocalization.
17. The method according to claim 1, which includes inserting the probable word candidate in response to a user selection.
18. The method according to claim 17, which includes: providing an input of a second character that is indicative of a first character of a second text entry word; capturing a vocalization of the second text entry word; identifying a probable word candidate for the vocalization of the second text entry word based on the input of the second character and an analysis of the vocalization of the second text entry word; and displaying the probable word candidate for the vocalization of the second text entry word.
19. The method according to claim 18, wherein the step of identifying a probable word candidate for the vocalization of the second text entry word is further based on the inserted probable word candidate.
20. A method for inserting text into a device, comprising: a) providing an input of a first character that is indicative of a first character of a text entry; b) capturing a vocalization of the text entry; c) identifying a probable word candidate for a first word of the vocalization based on the input of the first character and an analysis of the vocalization; and d) displaying the probable word candidate.
21. The method according to claim 20, wherein the text entry consists of an individual word.
22. The method according to claim 20, wherein the text entry comprises multiple words.
23. The method according to claim 20, wherein the capture step b) starts in response to the provision of step a).
24. The method according to claim 23, wherein the capture step b) ends after a predetermined period of time.
25. The method according to claim 20, wherein providing step a) includes pressing a key corresponding to multiple characters.
26. The method according to claim 20, wherein: providing step a) includes pressing and holding a key; and the capture step b) is initiated in response to providing step a).
27. The method according to claim 26, wherein the capture step b) ends after a predetermined period of time.
28. The method according to claim 26, wherein the capture step b) ends when the key is released.
29. The method according to claim 20, wherein the identification step c) includes: producing a list of probable word candidates based on an analysis of the vocalization; and identifying the probable word candidate from the list of probable word candidates for the first word of the vocalization based on the input of the first character.
30. The method according to claim 29, which includes: rejecting the probable word candidate in response to a user input; and displaying an alternative probable word candidate from the list of probable word candidates.
31. The method according to claim 20, wherein the identification step c) includes: reducing a list of word candidates using the input of the first character to form a short list of vocalized word candidates; reducing the short list of vocalized word candidates to form a list of probable word candidates for the first word of the vocalization based on an analysis of the vocalization; and identifying the probable word candidate from the list of probable word candidates.
32. The method according to claim 31, which includes: rejecting the probable word candidate in response to a user input; and displaying a probable word candidate from the list of probable word candidates.
33. The method according to claim 20, wherein the identification step c) includes: analyzing the vocalization to produce a list of vocalized word candidates; reducing a list of input word candidates using the input of the first character to form a short list of input word candidates for the first word of the vocalization; comparing the list of vocalized word candidates with the short list of input word candidates; and identifying the probable word candidate as a word candidate that appears in both the list of vocalized word candidates and the short list of input word candidates.
34. The method according to claim 33, which includes: rejecting the probable word candidate in response to a user input; and displaying an alternative probable word candidate that appears in both the list of vocalized word candidates and the short list of input word candidates.
35. The method according to claim 20, which includes providing an input of a second character that is indicative of a second character of the text entry, wherein the probable word candidate identified in step c) is based on the inputs of the first and second characters and the analysis of the vocalization.
36. The method according to claim 20, which includes inserting the probable word candidate in response to a user selection.
37. The method according to claim 36, which includes: providing an input of a second character that is indicative of a first character of a second text entry; capturing a vocalization of the second text entry; identifying a probable word candidate for the vocalization of the second text entry based on the input of the second character and an analysis of the vocalization of the second text entry; and displaying the probable word candidate for the vocalization of the second text entry.
38. The method according to claim 37, wherein the step of identifying a probable word candidate for the vocalization of the second text entry is further based on the inserted probable word candidate.
39. The method according to claim 36, which includes: providing an input of a second character that is indicative of a first character of a second word of the vocalization; identifying a probable word candidate for the second word of the vocalization based on the input of the second character and an analysis of the vocalization; and displaying the probable word candidate for the second word of the vocalization.
40. The method according to claim 39, wherein the step of identifying a probable word candidate for the second word of the vocalization is further based on the inserted probable word candidate.
MXPA04011787A 2003-12-30 2004-11-26 Method for entering text. MXPA04011787A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/748,404 US7363224B2 (en) 2003-12-30 2003-12-30 Method for entering text

Publications (1)

Publication Number Publication Date
MXPA04011787A true MXPA04011787A (en) 2007-11-14

Family

ID=34574762

Family Applications (1)

Application Number Title Priority Date Filing Date
MXPA04011787A MXPA04011787A (en) 2003-12-30 2004-11-26 Method for entering text.

Country Status (10)

Country Link
US (1) US7363224B2 (en)
EP (1) EP1550939A3 (en)
JP (1) JP2005196140A (en)
KR (1) KR101109265B1 (en)
CN (1) CN1637702A (en)
AU (1) AU2004231171A1 (en)
BR (1) BRPI0405164A (en)
CA (1) CA2487614A1 (en)
MX (1) MXPA04011787A (en)
RU (1) RU2377664C2 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4012143B2 (en) * 2003-12-16 2007-11-21 キヤノン株式会社 Information processing apparatus and data input method
WO2005088607A1 (en) * 2004-03-12 2005-09-22 Siemens Aktiengesellschaft User and vocabulary-adaptive determination of confidence and rejecting thresholds
US7873149B2 (en) * 2004-06-01 2011-01-18 Verizon Business Global Llc Systems and methods for gathering information
US8392193B2 (en) * 2004-06-01 2013-03-05 Verizon Business Global Llc Systems and methods for performing speech recognition using constraint based processing
JP2006011641A (en) * 2004-06-23 2006-01-12 Fujitsu Ltd Information input method and device
JP4027357B2 (en) * 2004-10-08 2007-12-26 キヤノン株式会社 Character string input device and control method thereof
US8434116B2 (en) 2004-12-01 2013-04-30 At&T Intellectual Property I, L.P. Device, system, and method for managing television tuners
US7436346B2 (en) * 2005-01-20 2008-10-14 At&T Intellectual Property I, L.P. System, method and interface for controlling multiple electronic devices of a home entertainment system via a single control device
US20060293890A1 (en) * 2005-06-28 2006-12-28 Avaya Technology Corp. Speech recognition assisted autocompletion of composite characters
JP4702936B2 (en) * 2005-06-28 2011-06-15 キヤノン株式会社 Information processing apparatus, control method, and program
US8249873B2 (en) * 2005-08-12 2012-08-21 Avaya Inc. Tonal correction of speech
US8924212B1 (en) 2005-08-26 2014-12-30 At&T Intellectual Property Ii, L.P. System and method for robust access and entry to large structured data using voice form-filling
US20070076862A1 (en) * 2005-09-30 2007-04-05 Chatterjee Manjirnath A System and method for abbreviated text messaging
JP4878471B2 (en) * 2005-11-02 2012-02-15 キヤノン株式会社 Information processing apparatus and control method thereof
US20070100619A1 (en) * 2005-11-02 2007-05-03 Nokia Corporation Key usage and text marking in the context of a combined predictive text and speech recognition system
US20080141125A1 (en) * 2006-06-23 2008-06-12 Firooz Ghassabian Combined data entry systems
KR20090019198A (en) * 2007-08-20 2009-02-25 삼성전자주식회사 Method and apparatus for automatically completed text input using speech recognition
KR101502003B1 (en) * 2008-07-08 2015-03-12 엘지전자 주식회사 Mobile terminal and method for inputting a text thereof
JP5318030B2 (en) * 2010-05-19 2013-10-16 ヤフー株式会社 Input support apparatus, extraction method, program, and information processing apparatus
US9037459B2 (en) * 2011-03-14 2015-05-19 Apple Inc. Selection of text prediction results by an accessory
US9636582B2 (en) 2011-04-18 2017-05-02 Microsoft Technology Licensing, Llc Text entry by training touch models
WO2013006215A1 (en) * 2011-07-01 2013-01-10 Nec Corporation Method and apparatus of confidence measure calculation
US9105073B2 (en) * 2012-04-24 2015-08-11 Amadeus S.A.S. Method and system of producing an interactive version of a plan or the like
KR102313353B1 (en) * 2013-07-29 2021-10-18 삼성전자주식회사 Character inputting method and display apparatus
JP6165619B2 (en) 2013-12-13 2017-07-19 株式会社東芝 Information processing apparatus, information processing method, and information processing program
CN104267922B (en) * 2014-09-16 2019-05-31 联想(北京)有限公司 A kind of information processing method and electronic equipment
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10446143B2 (en) * 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US11170757B2 (en) * 2016-09-30 2021-11-09 T-Mobile Usa, Inc. Systems and methods for improved call handling
CN106802725B (en) * 2017-03-09 2018-07-24 重庆字曌教育科技有限公司 Chinese character word-building component, the joinery and its construction hanzi system of formation and Chinese character input method
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK179549B1 (en) 2017-05-16 2019-02-12 Apple Inc. Far-field extension for digital assistant services
JP7056185B2 (en) * 2018-01-31 2022-04-19 トヨタ自動車株式会社 Information processing equipment and information processing method
CN108281142A (en) * 2018-02-05 2018-07-13 北京唱吧科技股份有限公司 A kind of requesting songs method and system
CN112578968B (en) * 2019-09-30 2024-04-30 菜鸟智能物流控股有限公司 Interface processing method and device of bar code acquisition equipment and electronic equipment

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5031206A (en) * 1987-11-30 1991-07-09 Fon-Ex, Inc. Method and apparatus for identifying words entered on DTMF pushbuttons
US5303299A (en) * 1990-05-15 1994-04-12 Vcs Industries, Inc. Method for continuous recognition of alphanumeric strings spoken over a telephone network
KR950008022B1 (en) * 1991-06-19 1995-07-24 가부시끼가이샤 히다찌세이사꾸쇼 Charactor processing method and apparatus therefor
US6092043A (en) * 1992-11-13 2000-07-18 Dragon Systems, Inc. Apparatuses and method for training and operating speech recognition systems
WO1996010795A1 (en) * 1994-10-03 1996-04-11 Helfgott & Karas, P.C. A database accessing system
US5787230A (en) * 1994-12-09 1998-07-28 Lee; Lin-Shan System and method of intelligent Mandarin speech input for Chinese computers
EP1016078B1 (en) 1997-06-27 2003-09-03 M.H. Segan Limited Partnership Speech recognition computer input method and device
KR100552085B1 (en) * 1997-09-25 2006-02-20 테직 커뮤니케이션 인코포레이티드 Reduced keyboard disambiguating system
US6223158B1 (en) * 1998-02-04 2001-04-24 At&T Corporation Statistical option generator for alpha-numeric pre-database speech recognition correction
US20020069058A1 (en) * 1999-07-06 2002-06-06 Guo Jin Multimodal data input device
WO2002005263A1 (en) 2000-07-07 2002-01-17 Siemens Aktiengesellschaft Method for voice input and voice recognition
GB2365188B (en) 2000-07-20 2004-10-20 Canon Kk Method for entering characters
US6405172B1 (en) * 2000-09-09 2002-06-11 Mailcode Inc. Voice-enabled directory look-up based on recognized spoken initial characters
US7010490B2 (en) * 2001-01-26 2006-03-07 International Business Machines Corporation Method, system, and apparatus for limiting available selections in a speech recognition system
US7369997B2 (en) * 2001-08-01 2008-05-06 Microsoft Corporation Controlling speech recognition functionality in a computing device
US7124085B2 (en) * 2001-12-13 2006-10-17 Matsushita Electric Industrial Co., Ltd. Constraint-based speech recognition system and method
US7174288B2 (en) * 2002-05-08 2007-02-06 Microsoft Corporation Multi-modal entry of ideogrammatic languages
JP4012143B2 (en) 2003-12-16 2007-11-21 キヤノン株式会社 Information processing apparatus and data input method

Also Published As

Publication number Publication date
JP2005196140A (en) 2005-07-21
BRPI0405164A (en) 2005-09-20
KR101109265B1 (en) 2012-01-30
US7363224B2 (en) 2008-04-22
RU2004135023A (en) 2006-05-10
KR20050071334A (en) 2005-07-07
EP1550939A3 (en) 2007-05-02
CA2487614A1 (en) 2005-06-30
RU2377664C2 (en) 2009-12-27
AU2004231171A1 (en) 2005-07-14
CN1637702A (en) 2005-07-13
EP1550939A2 (en) 2005-07-06
US20050149328A1 (en) 2005-07-07

Similar Documents

Publication Publication Date Title
US7363224B2 (en) Method for entering text
KR101312849B1 (en) Combined speech and alternate input modality to a mobile device
JP4829901B2 (en) Method and apparatus for confirming manually entered indeterminate text input using speech input
CA2556065C (en) Handwriting and voice input with automatic correction
US7319957B2 (en) Handwriting and voice input with automatic correction
US9786273B2 (en) Multimodal disambiguation of speech recognition
TWI266280B (en) Multimodal disambiguation of speech recognition
US20070100619A1 (en) Key usage and text marking in the context of a combined predictive text and speech recognition system
US20040153975A1 (en) Text entry mechanism for small keypads
US20060293890A1 (en) Speech recognition assisted autocompletion of composite characters
JP2001509290A (en) Reduced keyboard disambiguation system
JP2011254553A (en) Japanese language input mechanism for small keypad
US20070038456A1 (en) Text inputting device and method employing combination of associated character input method and automatic speech recognition method
JP2002366543A (en) Document generation system
JP2004227156A (en) Character input method
JP2001067097A (en) Document preparation device and document preparing method

Legal Events

Date Code Title Description
FG Grant or registration