CN1369830A - Divergence elimination language model - Google Patents

Divergence elimination language model Download PDF

Info

Publication number
CN1369830A
CN1369830A (Application CN02106530A)
Authority
CN
China
Prior art keywords
character
language model
character string
computer
word phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 02106530
Other languages
Chinese (zh)
Other versions
CN100568222C (en)
Inventor
朱云正
F·A·阿列瓦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/773,342 external-priority patent/US6507453B2/en
Application filed by Microsoft Corp filed Critical Microsoft Corp
Publication of CN1369830A publication Critical patent/CN1369830A/en
Application granted granted Critical
Publication of CN100568222C publication Critical patent/CN100568222C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Images

Landscapes

  • Machine Translation (AREA)

Abstract

A language model for a language processing system, such as a speech recognition system, is disclosed, in which character strings, word phrases, and context markers are associated with one another. A method and apparatus for forming a training corpus used to train the language model are also provided, as is a system or module that uses the disclosed language model.

Description

Divergence elimination language model
Background of invention
The present invention relates to language modeling. More particularly, the present invention relates to creating and using a language model that minimizes ambiguity, for example during recognition of characters in input speech.
Accurate speech recognition requires more than an acoustic model to select the correct word spoken by the user. In other words, if a speech recognizer must choose or determine which word was uttered, and all candidate words have the same pronunciation, the recognizer obviously cannot perform satisfactorily. A language model provides a method or means of specifying which word sequences in a vocabulary are probable, or, more generally, provides information about the likelihood of various word sequences.
Speech recognition is often viewed as a form of top-down language processing. Two common styles of language processing are "top-down" and "bottom-up". Top-down language processing begins recognition with the largest unit of language — for example, a sentence — and processes it by classifying it into smaller units, such as phrases, which in turn are divided into still smaller units, such as words. In contrast, bottom-up language processing begins with words and builds larger phrases and/or sentences from them. Both styles of language processing can benefit from a language model.
One known technique of classifying is the use of an N-gram language model. Because N-grams can be trained with a large amount of data, the correlations among N words usually capture much of the surface structure of syntax and semantics. However, although an N-gram language model performs well for general dictation, homophones can produce significant errors. A homophone is an element of a language, such as a character or syllable, that sounds the same as one or more other elements having different spellings. For example, when a user is spelling characters, the speech recognition module may output the wrong character because some characters are pronounced identically. Likewise, characters that sound similar when pronounced (for example, "m" and "n") can also cause the speech recognition module to produce erroneous output.
The ambiguity problem is especially prevalent in languages such as Japanese or Chinese, which are written primarily with the Chinese character writing system. The characters of these languages are numerous, complex pictographs that express both sound and meaning. The characters map onto a limited set of syllables, which in turn produces a large number of homophones and greatly increases the time required to generate a document by dictation. In particular, erroneously recognized homophone characters in the document must be identified and replaced with the correct homophone characters.
There is therefore a continuing need to develop methods that minimize ambiguity when homophones, or similar-sounding speech with different meanings, are uttered. As technology develops, speech recognition is provided in more and more applications, which makes a more accurate language model necessary.
Summary of the invention
Speech recognizers commonly use a language model, such as an N-gram language model, to improve accuracy. A first aspect of the present invention comprises generating a language model that is particularly useful when a speaker is identifying a character or characters (for example, syllables), such as when spelling a word. The language model aids disambiguation of homophones and of characters that sound alike. The language model is built from a training corpus of associated elements, each comprising a character string (which can be a single character), a word phrase containing the character string (which can be a single word), and a context marker. Using a word list or dictionary, the training corpus can be generated automatically by forming, for each word phrase, a sentence or phrase comprising part of the word phrase, the context marker, and the word phrase. In another embodiment, a phrase is generated for each character of the word phrase.
Another aspect of the present invention is a system or module that uses the above-described language model to recognize spoken characters. From the context marker in an associated word phrase, the speech recognition module determines that the user is spelling, or identifying, a character when uttering a character string. The speech recognition module then outputs only the recognized character, and not the context marker or the associated word phrase. In yet another embodiment, the speech recognition module compares the recognized character with the recognized word phrase to verify that the correct character was recognized. If the recognized character is not present in the recognized word phrase, the output character is a character taken from the recognized word phrase.
Brief description of drawings
FIG. 1 is a block diagram of a language processing system.
FIG. 2 is a block diagram of an exemplary computing environment.
FIG. 3 is a block diagram of an exemplary speech recognition system.
FIG. 4 is a flow diagram of a method of the present invention.
FIG. 5 is a block diagram of modules for implementing the method of FIG. 4.
FIG. 6 is a block diagram of a speech recognition module and an optional character verification module.
The detailed description of illustrative embodiment
FIG. 1 shows a language processing system 10 that receives a language input 12 and processes the language input 12 to provide a language output 14. For example, the language processing system 10 can be embodied as a speech recognition system or module that receives, as the language input 12, speech spoken or recorded by a user. The language processing system 10 processes the spoken language and provides, as an output, recognized words and/or characters in text form.
During processing, the speech recognition system or module 10 can access a language model 16 in order to determine which words — and in particular, which homophones or other similar-sounding elements — are present in the spoken language. The language model 16 encodes a particular language, such as English, Chinese, or Japanese. In the illustrated embodiment, the language model 16 can be a statistical language model, such as an N-gram language model, a context-free grammar, or a hybrid of the same, all of which are well known in the art. One broad aspect of the present invention is a method of creating and building the language model 16. Another broad aspect is the use of this method in speech recognition.
Before discussing the present invention in detail, an overview of an operating environment may be useful. FIG. 2 and the related discussion provide a brief, general description of a suitable computing environment 20 in which the invention may be implemented. The computing environment 20 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 20 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in the exemplary operating environment 20.
The invention is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. In addition, the invention may be used in a telephone system.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices. Tasks performed by the programs and modules are described below with the aid of the figures.
With reference to FIG. 2, an exemplary system for implementing the invention includes a general-purpose computing device in the form of a computer 30. Components of the computer 30 may include, but are not limited to, a processing unit 40, a system memory 50, and a system bus 41 that couples various system components, including the system memory, to the processing unit 40. The system bus 41 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as the Mezzanine bus.
The computer 30 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 30 and include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 30. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 50 includes computer storage media in the form of volatile and/or nonvolatile memory such as read-only memory (ROM) 51 and random access memory (RAM) 52. A basic input/output system 53 (BIOS), containing the basic routines that help to transfer information between elements within the computer 30, such as during start-up, is typically stored in ROM 51. RAM 52 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processing unit 40. By way of example, and not limitation, FIG. 2 illustrates an operating system 54, application programs 55, other program modules 56, and program data 57.
The computer 30 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 2 illustrates a hard disk drive 61 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 71 that reads from or writes to a removable, nonvolatile magnetic disk 72, and an optical disc drive 75 that reads from or writes to a removable, nonvolatile optical disc 76 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile discs, digital video tape, solid-state RAM, solid-state ROM, and the like. The hard disk drive 61 is typically connected to the system bus 41 through a non-removable memory interface such as interface 60, and the magnetic disk drive 71 and optical disc drive 75 are typically connected to the system bus 41 by a removable memory interface, such as interface 70.
The drives and their associated computer storage media discussed above and illustrated in FIG. 2 provide storage of computer-readable instructions, data structures, program modules, and other data for the computer 30. In FIG. 2, for example, the hard disk drive 61 is illustrated as storing an operating system 64, application programs 65, other program modules 66, and program data 67. Note that these components can either be the same as or different from the operating system 54, application programs 55, other program modules 56, and program data 57. The operating system 64, application programs 65, other program modules 66, and program data 67 are given different numbers here to illustrate that, at a minimum, they are different copies.
A user may enter commands and information into the computer 30 through input devices such as a keyboard 82, a microphone 83, and a pointing device 81, such as a mouse, trackball, or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 40 through a user input interface 80 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port, or universal serial bus (USB). A monitor 84 or other type of display device is also connected to the system bus 41 via an interface, such as a video interface 85. In addition to the monitor, computers may also include other peripheral output devices such as speakers 87 and a printer 86, which may be connected through an output peripheral interface 88.
The computer 30 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 94. The remote computer 94 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computer 30. The logical connections depicted in FIG. 2 include a local area network (LAN) 91 and a wide area network (WAN) 93, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
When used in a LAN networking environment, the computer 30 is connected to the LAN 91 through a network interface or adapter 90. When used in a WAN networking environment, the computer 30 typically includes a modem 92 or other means for establishing communications over the WAN 93, such as the Internet. The modem 92, which may be internal or external, may be connected to the system bus 41 via the user input interface 80 or another appropriate mechanism. In a networked environment, program modules depicted relative to the computer 30, or portions thereof, may be stored in a remote memory storage device. By way of example, and not limitation, FIG. 2 illustrates remote application programs 95 as residing on the remote computer 94. It will be appreciated that the network connections shown are exemplary, and other means of establishing a communications link between the computers may be used.
An exemplary embodiment of a speech recognition system 100 is illustrated in FIG. 3. The speech recognition system 100 includes the microphone 83, an analog-to-digital (A/D) converter 104, a training module 105, a feature extraction module 106, a lexicon storage module 110, an acoustic model 112 with associated senone trees, a tree search engine 114, the language model 16, and a general language model 111. It should be noted that the entire system 100, or part of the speech recognition system 100, can be implemented in the environment illustrated in FIG. 2. For example, the microphone 83 can serve as an input device to the computer 30 through an appropriate interface and through the A/D converter 104. The training module 105 and the feature extraction module 106 can be either hardware modules in the computer 30 or software modules stored in any of the information storage devices disclosed in FIG. 2 and accessible by the processing unit 40 or another suitable processor. In addition, the lexicon storage module 110, the acoustic model 112, and the language models 16 and 111 are also preferably stored in any of the memory devices shown in FIG. 2. Furthermore, the tree search engine 114 is implemented in the processing unit 40 (which can include one or more processors), or can be performed by a dedicated speech recognition processor employed by the computer 30.
In the illustrated embodiment, during speech recognition, speech is provided as an input to the system 100 in the form of an audible voice signal by the user speaking into the microphone 83. The microphone 83 converts the audible voice signal into an analog electrical signal, which is provided to the A/D converter 104. The A/D converter 104 converts the analog speech signal into a sequence of digital signals, which is provided to the feature extraction module 106. In one embodiment, the feature extraction module 106 is a conventional array processor that performs spectral analysis on the digital signals and computes a magnitude value for each frequency band of the frequency spectrum. In one illustrative embodiment, the signals are provided to the feature extraction module 106 by the A/D converter 104 at a sample rate of approximately 16 kHz.
The feature extraction module 106 divides the digital signal received from the A/D converter 104 into frames that include a plurality of digital samples. Each frame is approximately 10 milliseconds in duration. The frames are then encoded by the feature extraction module 106 into feature vectors reflecting the spectral characteristics of a plurality of frequency bands. In the case of discrete and semi-continuous hidden Markov modeling, the feature extraction module 106 also encodes the feature vectors into one or more codewords using vector quantization techniques and a codebook derived from training data. Thus, the feature extraction module 106 provides, at its output, the feature vectors (or codewords) for each spoken utterance. The feature extraction module 106 provides the feature vectors (or codewords) at a rate of approximately one feature vector (or codeword) every 10 milliseconds.
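As a rough illustration of the framing step described above, the following hypothetical Python sketch splits a 16 kHz digital signal into consecutive 10 ms frames of 160 samples each. The function and constant names, and the use of plain lists instead of a signal-processing library, are illustrative assumptions, not part of the patent.

```python
# Illustrative sketch (not the patented implementation): framing a
# 16 kHz digital signal into 10 ms frames before feature extraction.

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 10
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per frame

def frame_signal(samples):
    """Split a sequence of samples into consecutive 10 ms frames."""
    return [samples[i:i + FRAME_LEN]
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]

signal = [0.0] * SAMPLE_RATE_HZ   # one second of (silent) audio
frames = frame_signal(signal)
print(len(frames))                # 100 frames per second
```

Each of these frames would then be encoded into a feature vector (or codeword) by the spectral analysis described above.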
Output probability distributions are then computed against hidden Markov models using the feature vectors (or codewords) of the particular frame being analyzed. These probability distributions are later used in executing a Viterbi decoding process or a similar type of processing technique.
Upon receiving the codewords from the feature extraction module 106, the tree search engine 114 accesses information stored in the acoustic model 112. The model 112 stores acoustic models, such as hidden Markov models, which represent speech units to be detected by the speech recognition system 100. In one embodiment, the acoustic model 112 includes a senone tree associated with each Markov state in a hidden Markov model. The hidden Markov models represent, in one illustrative embodiment, phonemes. Based on the senones in the acoustic model 112, the tree search engine 114 determines the most likely phonemes represented by the feature vectors (or codewords) received from the feature extraction module 106, and hence representative of the utterance received from the user of the system.
The tree search engine 114 also accesses the lexicon stored in the module 110. The information received by the tree search engine 114 based on its accessing of the acoustic model 112 is used in searching the lexicon storage module 110 to determine a word that most likely represents the codewords or feature vectors received from the feature extraction module 106. The search engine 114 also accesses the language models 16 and 111. In one embodiment, the language model 16 is an N-gram used to identify the most likely character or characters represented by the input speech when the user is identifying a word, and it comprises a character (or characters), a context marker, and a word phrase used to identify the character. For example, the input speech can be "N as in Nancy", where "N" (which can also be lowercase) is the desired character, "as in" is the context marker, and "Nancy" is a word phrase associated with the character "N" used to clarify or identify the desired character. For the phrase "N as in Nancy", the output of the speech recognition system 100 may be only the character "N". In other words, the speech recognition system 100, after analyzing the input speech data for the phrase "N as in Nancy", determines that the user has chosen to spell a character. Accordingly, the context marker and the associated word phrase are omitted from the output text. The search engine 114 can delete the context marker and the associated word phrase as necessary.
It should be noted that in this embodiment, the language model 111 is a word N-gram used to identify the most likely word represented by the input speech for general dictation. For example, when the speech recognition system 100 is embodied as a dictation system, the language model 111 provides an indication of the most likely words for general dictation; however, when the user utters a phrase having a context marker, the output from the language model 16 will have a higher score than that of the language model 111 for the same phrase. The higher score from the language model 16 serves as an indication to the system 100 that the user is identifying a character using a context marker and a word phrase. Thus, for an input phrase with a context marker, the search engine 114 or other processing elements of the speech recognition system 100 can ignore the context marker and word phrase, and output only the desired character. The use of the language model 16 is discussed further below.
Although the speech recognition system 100 described herein uses HMM modeling and senone trees, it should be understood that this is but one illustrative embodiment. Those skilled in the art will recognize that the speech recognition system 100 can take many forms; all that is required is that it use the language model 16 and provide, as an output, the text spoken by the user.
As is well known, a statistical N-gram language model produces a probability estimate for a word given the word sequence up to that word (i.e., given the word history H). An N-gram language model considers only (n-1) prior words in the history H as having any influence on the probability of the next word. For example, a bigram (or 2-gram) language model considers the single previous word as having an influence on the next word. Therefore, in an N-gram language model, the probability of a word occurring is represented as follows:
P(w|H) = P(w|w1, w2, … w(n-1))    (1)

where w is the word of interest;
w1 is the word located n-1 positions prior to the word w;
w2 is the word located n-2 positions prior to the word w; and
w(n-1) is the first word prior to the word w in the sequence.
Likewise, the probability of a word sequence is determined based on the product of the probabilities of each word given its history. Therefore, the probability of a word sequence (w1 … wm) is represented as:

P(w1 … wm) = ∏_{i=1}^{m} P(wi|Hi)    (2)
The N-gram model is obtained by applying an N-gram algorithm to a corpus of textual training data (a collection of phrases, sentences, sentence fragments, paragraphs, etc.). An N-gram algorithm may use, for example, known statistical techniques such as Katz's technique, or the binomial posterior distribution backoff technique. In using these techniques, the algorithm estimates the probability that a word w(n) will follow a sequence of words w1, w2, …, w(n-1). These probability values collectively form the N-gram language model. Some aspects of the invention described below can be applied to building a standard statistical N-gram model.
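The estimation described by equations (1) and (2) can be sketched as follows. This is a minimal maximum-likelihood bigram trainer in Python, without the Katz or backoff smoothing the patent mentions; the function names and the toy corpus are illustrative assumptions.

```python
# Illustrative sketch: maximum-likelihood bigram estimation,
# P(w | w_prev) = count(w_prev, w) / count(w_prev). No smoothing.
from collections import Counter

def train_bigram(corpus):
    """Train a bigram model from a corpus of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # sentence-start marker
        unigrams.update(tokens[:-1])         # histories
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    def prob(w, prev):
        return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

corpus = [["N", "as", "in", "Nancy"], ["P", "as", "in", "Paul"]]
p = train_bigram(corpus)
print(p("in", "as"))     # 1.0 — "as" is always followed by "in" here
print(p("Nancy", "in"))  # 0.5 — "in" is followed by "Nancy" half the time
```

Per equation (2), the probability of a whole phrase is the product of these conditional probabilities over its words.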
A first broad aspect of the present invention is illustrated in FIG. 4 as a method 140 for creating a language model, for a language processing system, that indicates characters. Reference is also made to FIG. 5, which shows a system or apparatus 142 with modules and instructions for implementing the method 140. Generally, the method 140 includes, at step 144, associating, for each word phrase of a list of word phrases, a character string of the word phrase and the word phrase itself with a context marker indicating that the character string is being identified. It should be noted that the character string can comprise a single character. Likewise, a word phrase can comprise a single word. For example, for a character string equal to a single character and a word phrase equal to a single word, step 144 associates a character of the word with the word and a context marker, for each word in a word list 141. A context marker is typically a word or word phrase used by speakers of the particular language to identify a language element by means of a word phrase. Examples of context markers in English include "as in", "for example", "as found in", "like", "such as", and the like. Similar words or word phrases can be found in other languages, for example, in Japanese and in Chinese. In one embodiment, step 144 includes building a corpus of word phrases 143, where each phrase includes a character string, a word phrase, and a context marker. Typically, when a single character is associated with a word, the first character is used, although another character of the word can also be used. Examples of such phrases include "N as in Nancy", "P as in Paul", and "Z as in zebra".
In another embodiment, other characters of the word are associated with the word and the context marker. In some languages, for example Chinese, in which many words include only one, two, or three characters, it can be helpful to associate each character of the word with the word in a context marker phrase. As indicated above, forming phrases of this form is a simple way to associate a desired character with a corresponding word and context marker. Thus, given a word list 141, a corpus of word phrases 143 for training the language model, containing all desired context markers, can be easily generated.
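The automatic corpus generation described above can be sketched as follows: for each word in a word list, associate its first character with the word via a context marker, yielding training phrases such as "N as in Nancy". The function name and default marker are illustrative assumptions; the patent contemplates other markers and other characters of the word as well.

```python
# Illustrative sketch: generating the training corpus of
# character / context-marker / word-phrase triples from a word list.

def build_training_corpus(word_list, context_marker="as in"):
    """Form one training phrase per word, using the word's first character."""
    return [f"{word[0].upper()} {context_marker} {word}" for word in word_list]

phrases = build_training_corpus(["Nancy", "Paul", "zebra"])
print(phrases)  # ['N as in Nancy', 'P as in Paul', 'Z as in zebra']
```

The resulting corpus 143 is then fed to a conventional N-gram model builder, as described next.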
Based on the corpus 143, the language model 16 is built with a conventional model builder 146, such as an N-gram model builder, implementing known techniques for building the language model 16. Block 148 represents the language model 16 built by the method 140, wherein the language model 16 includes, but is not limited to, an N-gram language model, a context-free grammar, or a hybrid of the same.
The generated phrases can be assigned suitable values so that suitable probabilities result upon creation of the language model. In the example above, "N as in Nancy" is more likely to be spoken than the phrase "N as in notch". Accordingly, another feature of the present invention includes adjusting the probability scores of associated character strings and word phrases in the language model. The probability scores can be adjusted at creation of the language model 16. In another embodiment, the probability scores for associated characters and word phrases can be given suitable values by including a sufficient number of identical word phrases in the corpus 143. The probability value can also be a function of the likelihood of use of the word phrase. Generally, some word phrases are used more frequently than others to identify a character or characters. Such word phrases can be assigned, or otherwise provided with, a higher probability value in the language model.
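The second weighting approach described above — including more copies of a phrase in the corpus so that the trained model assigns it a higher probability — can be sketched as follows. The names and weights are illustrative assumptions, not values from the patent.

```python
# Illustrative sketch: weighting training phrases by replication so that
# a standard N-gram trainer assigns common phrases higher probability.

def weighted_corpus(phrase_weights):
    """Replicate each phrase in proportion to its integer weight."""
    corpus = []
    for phrase, weight in phrase_weights:
        corpus.extend([phrase] * weight)
    return corpus

# "N as in Nancy" is far more likely to be dictated than "N as in notch".
corpus = weighted_corpus([("N as in Nancy", 5), ("N as in notch", 1)])
print(corpus.count("N as in Nancy"))  # 5
```

Replication leaves the model builder itself unchanged, which is why the patent presents it as an alternative to adjusting scores directly at model-creation time.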
Fig. 6 shows a speech recognition module 180 with the language model 16. The speech recognition module 180 can be of the type described above; however, it should be understood that the speech recognition module 180 is not limited to that embodiment and can take many forms. As indicated above, the speech recognition module 180 receives data representing input speech and accesses the language model 16 to determine whether the input speech contains a phrase with a contextual tagging. When a word phrase with a contextual tagging is detected, the speech recognition module 180 provides as output the character or characters associated with that word phrase and contextual tagging, rather than the contextual tagging or the word phrase itself. In other words, although the speech recognition module detects the complete phrase "N as in Nancy", it outputs only "N". This output is particularly useful in a dictation system, where the speaker has chosen to individually indicate the desired character or characters.
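The output behavior described here, emitting only "N" when "N as in Nancy" is detected, can be approximated as a text post-processing step. The regular expression and the English-only "as in" tag below are assumptions for illustration, not the module's actual mechanism.

```python
import re

# Matches "<single letter> as in <word>" and captures the letter.
TAG_PHRASE = re.compile(r"\b([A-Za-z]) as in (\w+)\b")

def emit_text(recognized):
    """Replace each detected tagged phrase with its character alone."""
    return TAG_PHRASE.sub(lambda m: m.group(1), recognized)

emit_text("N as in Nancy")           # -> "N"
emit_text("call A as in apple now")  # -> "call A now"
```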
In this regard, it should be noted that the language model 16 described above is constructed substantially from associated character strings, word phrases and contextual taggings, which makes the language model 16 sensitive to input speech of this form. In the embodiment of Fig. 3, the general language model 111 can be used for input speech that does not take the particular form of character string, word phrase and contextual tagging. It should nevertheless be understood that the language models 16 and 111 of these two embodiments can be merged if desired.
Upon receiving the input speech and accessing the language model 16, the speech recognition module 180 determines a recognized character string and a recognized word phrase for the input speech. In many cases the recognized character string will be correct because the language model 16 was used. In a further embodiment, however, a character verification module 182 can be included to correct at least some of the errors made by the speech recognition module 180. The character verification module 182 accesses the recognized character string and the recognized word phrase determined by the speech recognition module 180 and compares them; in particular, it verifies that the recognized character string is present in the recognized word phrase. If the recognized character string is not in the recognized word phrase, an error has clearly occurred, whether because the speaker dictated a mistaken phrase such as "M as in Nancy", or because the speech recognition module 180 misrecognized the character string or the word phrase. In one embodiment, the character verification module 182 can assume that the latter error is more likely and therefore substitute, for the recognized character string, a character that is present in the recognized word phrase. The substitution can be made by comparing the recognized character string with the characters of the recognized word phrase according to acoustic similarity. To this end, the character verification module 182 can access separately stored data for the sounds of the individual characters. Using the characters present in the recognized word phrase, the character verification module 182 compares the stored acoustic data of each character in the recognized word phrase with the recognized character string and provides the closest character as output. As will be appreciated by those skilled in the art, the character verification module 182 can be included in the speech recognition module 180; it is illustrated separately only for purposes of explanation.
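The verification-and-substitution logic of module 182 can be sketched as follows. The acoustic comparison is replaced here by a toy letter-confusion table, an assumption standing in for the stored per-character acoustic data the text describes.

```python
# Toy stand-in for acoustic similarity: letters commonly confused in speech.
CONFUSABLE = {"M": {"N"}, "N": {"M"}, "B": {"D", "P", "V"}}

def verify(recognized_char, recognized_word):
    """Return recognized_char if it occurs in recognized_word; otherwise
    substitute the closest (here: confusable) character of the word."""
    chars = {c.upper() for c in recognized_word}
    ch = recognized_char.upper()
    if ch in chars:
        return ch  # consistent recognition: pass the character through
    # Prefer a character of the word that the recognized one is commonly
    # confused with acoustically.
    for candidate in CONFUSABLE.get(ch, set()):
        if candidate in chars:
            return candidate
    return min(chars)  # deterministic fallback: some character of the word

verify("M", "Nancy")  # -> "N": a dictated "M as in Nancy" is corrected
verify("N", "Nancy")  # -> "N": a consistent input is unchanged
```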
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.

Claims (40)

1. A method of creating a language model with character indication for a speech recognition system, the method comprising:
for each word phrase in a list of word phrases, associating the word phrase with a character string and a contextual tagging indicating that the character string identifies the word phrase; and
constructing a language model as a function of the associated word phrases and character strings.
2. The method of claim 1 wherein the language model comprises a statistical language model.
3. The method of claim 2 wherein the language model comprises an N-gram language model.
4. The method of claim 2 wherein the language model comprises a context-free grammar.
5. The method of claim 1 wherein associating comprises constructing a corpus of associated character strings, word phrases and contextual taggings, and wherein constructing the language model comprises accessing the corpus.
6. The method of claim 1 wherein associating comprises associating a first character of each word phrase with the word phrase.
7. The method of claim 6 wherein associating comprises associating another character of at least some of the word phrases, other than the first character, with the corresponding word phrase.
8. The method of claim 7 wherein associating comprises associating each character of at least some of the word phrases with the corresponding word phrase.
9. The method of claim 7 wherein associating comprises associating each character of each word phrase with the corresponding word phrase.
10. The method of claim 1 and further comprising adjusting a probability score for each associated character string and word phrase in the language model.
11. The method of claim 1 wherein associating comprises forming, for each word phrase of the list, a phrase comprising the character string, the word phrase and the contextual tagging.
12. The method of claim 11 wherein the contextual tagging is similar to "as in" in English.
13. The method of claim 11 wherein the contextual tagging comprises an equivalent expression in Chinese.
14. The method of claim 11 wherein the contextual tagging comprises an equivalent expression in Japanese.
15. The method of claim 1 wherein each word phrase is a single word.
16. The method of claim 15 wherein each character string is a single character.
17. The method of claim 1 wherein each character string is a single character.
18. A computer-readable medium having instructions which, when executed by a processor, perform a method for recognizing spoken characters, the method comprising:
receiving input speech having a character string, a word phrase containing the character string, and a contextual tagging; and
outputting the character string as text, without the word phrase and the contextual tagging.
19. The computer-readable medium of claim 18 and further comprising instructions for accessing a language model representing a plurality of phrases, each phrase having a character string, a word phrase containing the character string, and a contextual tagging.
20. The computer-readable medium of claim 19 wherein the language model substantially represents phrases formed of associated character strings, word phrases containing the character strings, and contextual taggings.
21. The computer-readable medium of claim 19 wherein outputting the character string comprises outputting the character string as a function of recognizing the character string using the language model.
22. The computer-readable medium of claim 21 wherein the language model comprises a statistical language model.
23. The computer-readable medium of claim 22 wherein the language model comprises an N-gram language model.
24. The computer-readable medium of claim 21 wherein outputting the character string comprises outputting the character string as a sole function of an N-gram of the received input speech.
25. The computer-readable medium of claim 21 wherein outputting the character string comprises outputting the character string as a function of comparing a recognized character string with a recognized word phrase.
26. The computer-readable medium of claim 25 wherein, when the recognized character string is not present in the recognized word phrase, the output character string is a character string of the recognized word phrase.
27. The computer-readable medium of claim 21 wherein the language model comprises a context-free grammar.
28. The computer-readable medium of claim 18 wherein each word phrase is a single word.
29. The computer-readable medium of claim 28 wherein each character string is a single character.
30. The computer-readable medium of claim 18 wherein each character string is a single character.
31. A computer-readable medium having instructions which, when executed by a processor, recognize spoken characters, the instructions comprising:
a language model substantially representing phrases formed of associated character strings, word phrases containing the character strings, and contextual taggings; and
a recognition module adapted to receive data representing input speech from a user speaking a character string, to access the language model, and to provide an output, wherein the input speech comprises a word phrase containing the character string and a contextual tagging.
32. The computer-readable medium of claim 31 wherein the recognition module outputs the character string.
33. The computer-readable medium of claim 31 wherein the language model comprises a statistical language model.
34. The computer-readable medium of claim 31 wherein the language model comprises an N-gram language model.
35. The computer-readable medium of claim 31 wherein the language model comprises a context-free grammar.
36. The computer-readable medium of claim 31 wherein the recognition module outputs the character string as a function of comparing a recognized character string with a recognized word phrase.
37. The computer-readable medium of claim 31 wherein, when the recognized character string is not present in the recognized word phrase, the output character string is a character string of the recognized word phrase.
38. The computer-readable medium of claim 31 wherein each word phrase is a single word.
39. The computer-readable medium of claim 38 wherein each character string is a single character.
40. The computer-readable medium of claim 31 wherein each character string is a single character.
CNB021065306A 2001-01-31 2002-01-29 Divergence elimination language model Expired - Fee Related CN100568222C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/773,342 US6507453B2 (en) 2000-02-02 2001-01-31 Thin floppy disk drive capable of preventing an eject lever from erroneously operating
US09/773,342 2001-01-31

Publications (2)

Publication Number Publication Date
CN1369830A true CN1369830A (en) 2002-09-18
CN100568222C CN100568222C (en) 2009-12-09

Family

ID=25097940

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021065306A Expired - Fee Related CN100568222C (en) 2001-01-31 2002-01-29 Divergence elimination language model

Country Status (1)

Country Link
CN (1) CN100568222C (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101248407B (en) * 2005-05-27 2010-12-08 索尼爱立信移动通讯股份有限公司 Automatic language selection for text input in messaging context
US8364134B2 (en) 2005-05-27 2013-01-29 Sony Ericsson Mobile Communications Ab Automatic language selection for text input in messaging context
CN1940915B (en) * 2005-09-29 2010-05-05 国际商业机器公司 Corpus expansion system and method
CN101256624B (en) * 2007-02-28 2012-10-10 微软公司 Method and system for establishing HMM topological structure being suitable for recognizing hand-written East Asia character
CN105045777A (en) * 2007-08-01 2015-11-11 金格软件有限公司 Automatic context sensitive language correction and enhancement using an internet corpus
US8326333B2 (en) 2009-11-11 2012-12-04 Sony Ericsson Mobile Communications Ab Electronic device and method of controlling the electronic device
CN105340003A (en) * 2013-06-20 2016-02-17 株式会社东芝 Speech synthesis dictionary creation device and speech synthesis dictionary creation method
CN105340003B (en) * 2013-06-20 2019-04-05 株式会社东芝 Speech synthesis dictionary creating apparatus and speech synthesis dictionary creating method
CN103943109A (en) * 2014-04-28 2014-07-23 深圳如果技术有限公司 Method and device for converting voice to characters
CN106663428A (en) * 2014-07-16 2017-05-10 索尼公司 Apparatus, method, non-transitory computer-readable medium and system
CN108780444A (en) * 2016-03-10 2018-11-09 微软技术许可有限责任公司 Expansible equipment and natural language understanding dependent on domain
CN113034995A (en) * 2021-04-26 2021-06-25 读书郎教育科技有限公司 Method and system for generating dictation content by student tablet
CN113034995B (en) * 2021-04-26 2023-04-11 读书郎教育科技有限公司 Method and system for generating dictation content by student tablet

Also Published As

Publication number Publication date
CN100568222C (en) 2009-12-09

Similar Documents

Publication Publication Date Title
CN100568223C (en) The method and apparatus that is used for the multi-mode input of ideographic language
EP3469585B1 (en) Scalable dynamic class language modeling
CN1667700B (en) Method for adding voice or acoustic description, pronunciation in voice recognition dictionary
US7831911B2 (en) Spell checking system including a phonetic speller
US7251600B2 (en) Disambiguation language model
CN1667699B (en) Generating large units of graphonemes with mutual information criterion for letter to sound conversion
US6327566B1 (en) Method and apparatus for correcting misinterpreted voice commands in a speech recognition system
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
JP4267385B2 (en) Statistical language model generation device, speech recognition device, statistical language model generation method, speech recognition method, and program
KR101153078B1 (en) Hidden conditional random field models for phonetic classification and speech recognition
US8065149B2 (en) Unsupervised lexicon acquisition from speech and text
EP3417451A1 (en) Speech recognition system and method for speech recognition
JP2559998B2 (en) Speech recognition apparatus and label generation method
AU2010212370B2 (en) Generic spelling mnemonics
US6876967B2 (en) Speech complementing apparatus, method and recording medium
CN1760972A (en) Testing and tuning of speech recognition systems using synthetic inputs
CN110335608B (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN100568222C (en) Divergence elimination language model
US20020184016A1 (en) Method of speech recognition using empirically determined word candidates
JP4499389B2 (en) Method and apparatus for generating decision tree questions for speech processing
Anastasopoulos Computational tools for endangered language documentation
Ogawa et al. Error type classification and word accuracy estimation using alignment features from word confusion network
US7272560B2 (en) Methodology for performing a refinement procedure to implement a speech recognition dictionary
JP2002221984A (en) Voice retrieving method and device for different kind of environmental voice data
CN115298736A (en) Speech recognition and training for data input

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091209

Termination date: 20150129

EXPY Termination of patent right or utility model