CN1398395A - Global approach for segmenting characters into words - Google Patents
- Publication number
- CN1398395A (application CN99817082.8A)
- Authority
- CN
- China
- Prior art keywords
- probability
- path
- word
- character
- segmentation path
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/083—Recognition networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
Abstract
In some embodiments, the invention includes a method. The method involves using a vocabulary to create a path list of segmentation paths of characters. A probability of a first segmentation path is determined and that path is designated the best segmentation path. The probability of an additional one of the segmentation paths is determined and compared with the probability of the best segmentation path. If the probability of the additional segmentation path exceeds that of the best segmentation path, the additional segmentation path is designated the best segmentation path. This is repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path. In some embodiments, the invention is an apparatus including a computer-readable medium that carries out such a method.
Description
Technical field
The present invention relates to speech recognition systems and, more particularly, to segmenting characters into words within a speech recognition system.
Background
One component of a speech recognizer is a language model. A common way of capturing the syntactic structure of a given language is to use conditional probabilities to capture the sequential information embedded in the word strings of its sentences. For example, if the current word is W1, a language model can be constructed that represents the probability that some other word W2, W3, ... Wn follows W1. These probabilities can be expressed as follows: P21 is the probability that word W2 follows word W1, where P21 = P(W2|W1). In this notation, P31 is the probability that W3 follows W1, P41 is the probability that W4 follows W1, and so on, up to Pn1, the probability that Wn follows W1. The maximum of P21, P31, ... Pn1 can be determined and used in the language model. The example above uses bigram probabilities, although trigram conditional probabilities can also be computed.
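The conditional probabilities described above can be estimated from counts over a corpus. The sketch below is a minimal illustration (the function name and toy corpus are our own, not from the patent), computing P(W2|W1) as count(W1 W2) / count(W1):

```python
from collections import Counter

def bigram_probabilities(corpus_sentences):
    """Estimate P(w2 | w1) for every adjacent word pair in a corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus_sentences:
        for w1, w2 in zip(sentence, sentence[1:]):
            bigrams[(w1, w2)] += 1
        unigrams.update(sentence)
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_probabilities([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(probs[("the", "cat")])  # 0.5
```

From such a table, the maximum over P21, P31, ... Pn1 for a given W1 is simply the largest entry whose first element is W1.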
A language model is often generated by examining a body of text (such as newspapers) and determining the conditional probabilities of words in a vocabulary with respect to other words in that vocabulary.
In some languages, such as Chinese and Japanese, words are written as one or more characters, for example the Hanzi of Chinese and the Kanji of Japanese. A sentence is made up of a string of characters in which the words are implicit, because there are no spaces between adjacent words. A particular character may be a word by itself, or it may combine with the character before it or after it (or possibly both) to form a word. The meaning of the sentence changes depending on how the characters are combined into or separated from words. In written form, however, there are no spaces, so it is not visually apparent whether a particular character is a word by itself or forms a word with one or more other characters; which word a particular character belongs to must be understood from context. To apply statistical methods to a language model, the words are made explicit by placing delimiters at the word boundaries.
Segmenting characters into words has traditionally been done with a "greedy algorithm". The greedy algorithm includes the following steps:
(1) Starting from the beginning of the sentence being processed, exhaustively list all words in the vocabulary that match the initial portion of the character string.
(2) Pick the longest word (that is, the word with the most characters), place a delimiter after the matched substring in the sentence, treat the remaining character string as a new sentence, and repeat step (1) until all characters in the sentence have been processed.
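Steps (1) and (2) above can be sketched as follows (the identifiers and toy vocabulary are illustrative, not from the patent):

```python
def greedy_segment(text, vocabulary, max_word_len=8):
    """Longest-match-first segmentation: at each position, take the longest
    vocabulary word that matches the head of the remaining string."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_word_len, len(text) - i), 0, -1):
            # A single character is always accepted as a word of its own.
            if n == 1 or text[i:i + n] in vocabulary:
                words.append(text[i:i + n])
                i += n
                break
    return words

# The greedy choice "abc" blocks the arguably better split "ab" + "cd":
print(greedy_segment("abcd", {"ab", "abc", "cd"}))  # ['abc', 'd']
```

The toy run shows exactly the local-choice problem the next paragraph describes: picking the longest match at one position can leave a poor remainder.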
From a global viewpoint, the greedy algorithm does not make the best selection. In fact, the combination it selects may be neither optimal nor syntactically correct. As T. Cormen et al. put it in Introduction to Algorithms (The MIT Press, 1990), page 329: "A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution."
Summary of the invention
In some embodiments, the present invention includes a method. The method uses a vocabulary to create a path list of segmentation paths of the characters. The probability of a first segmentation path is determined and that path is designated the best segmentation path. The probability of another segmentation path is determined and compared with the probability of the best segmentation path. If the probability of the other segmentation path exceeds that of the best segmentation path, the other segmentation path is designated the best segmentation path. This is repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
In some embodiments, the present invention is an apparatus including a computer-readable medium that carries out such a method. In still other embodiments, the present invention is a computer system.
Additional embodiments and the claims are presented below.
Brief Description Of Drawings
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
Fig. 1 is a high-level schematic block diagram representation of a computer system with which some embodiments of the present invention may be used.
Fig. 2 is a high-level schematic representation of a handheld computer system with which some embodiments of the present invention may be used.
Embodiment
The present invention relates to a system and method for segmenting characters into words. That is, the invention concerns determining which word a character should belong to. The invention has particular application to languages, such as Chinese and Japanese, in which word divisions are not indicated by spaces between characters, but the invention is not limited to such uses. The invention is designed to make a preferable word segmentation of any given sentence. A language model generated in this way is better than one obtained with the traditional greedy-algorithm method described above. A better language model leads to better recognition accuracy, because it better describes the language in terms of its word strings.
In some embodiments, the present invention performs the segmentation using a dynamic-programming algorithm equipped with a statistical language model. A dynamic-programming algorithm can be carried out in a variety of ways; one example is as follows. First, an n-gram language model is computed over the corpus (that is, the characters that are to be segmented into words) as segmented by the traditional greedy algorithm. Then, the Viterbi algorithm is used to re-segment each sentence. The Viterbi algorithm is a form of dynamic programming that can be used for global optimization; see T. Cormen et al., Introduction to Algorithms (The MIT Press, 1990), pages 301-328. The Viterbi algorithm used can be described by the following equation (1):

    P(w_i) = max[ P(w_{i-1}) x prob(w_i | w_{i-1}) ]    (1)

In equation (1), P is a probability and "prob" embodies the language model. Here w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, P(w_{i-1}) is the probability of the word sequence ending at word w_{i-1}, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred. Equation (1) involves finding the words w_i that maximize it. By solving equation (1), the resulting word sequence (w0 w1 ... wN) is guaranteed to be the best segmentation in the maximum-likelihood sense. In some embodiments, when i = N, that is, when the end of the sentence is reached, a global maximum is attained.
Equation (1) is a bigram form; however, other forms, such as trigram or unigram forms, can also be used if they are present in the language model. Other techniques, such as back-off weighting, can also be used.
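The recursion of equation (1) can be sketched as a compact dynamic program. This is an illustrative reading, not the patent's implementation: the data structures, the sentence-start marker, the back-off floor for unseen bigrams, and the toy inputs are all our own assumptions.

```python
import math

def viterbi_segment(text, vocabulary, bigram_prob, max_word_len=8):
    """For each prefix length j, keep the best (log-probability, word list)
    segmentation of text[:j], extending shorter prefixes by one word,
    as in the Viterbi recursion of equation (1)."""
    best = {0: (0.0, ["<s>"])}  # "<s>" marks the sentence start
    for j in range(1, len(text) + 1):
        candidates = []
        for n in range(1, min(max_word_len, j) + 1):
            word = text[j - n:j]
            if n > 1 and word not in vocabulary:
                continue  # single characters are always allowed as words
            if j - n not in best:
                continue
            logp, words = best[j - n]
            # Unseen bigrams get a small floor probability (a crude back-off).
            p = bigram_prob.get((words[-1], word), 1e-6)
            candidates.append((logp + math.log(p), words + [word]))
        if candidates:
            best[j] = max(candidates)
    return best[len(text)][1][1:]  # drop the start marker

probs = {("<s>", "ab"): 0.5, ("ab", "cd"): 0.9, ("<s>", "abc"): 0.5}
print(viterbi_segment("abcd", {"ab", "abc", "cd"}, probs))  # ['ab', 'cd']
```

Unlike the greedy algorithm, which would pick "abc" first, the dynamic program recovers the globally better split "ab" + "cd".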
As mentioned above, in some languages each character may itself be a word. The present invention, however, concerns determining whether a character is better combined with other characters to form a word or left as a word on its own. A word composed of multiple characters may also be called a term or a phrase.
A version of the greedy algorithm is given below in pseudocode form:

    read vocabulary;        // the vocabulary is the list of possible words
    open corpus;            // the corpus contains the characters to be segmented into words
    while (not at end of corpus)
    {
        read a line from the corpus into the line buffer;
        // the line buffer is simply a group of memory locations and is not
        // limited to any particular form
        while (line buffer is not empty)
        {
            find the longest word in the vocabulary matching the head of the line buffer;
            output that word and a word separator;
            remove the matched head from the line buffer;
        }
        output a line separator;
    }
    close corpus;
In some embodiments according to the present invention, a segmentation algorithm using a language model includes the following steps, given in pseudocode form:

    read language model;    // the language model is loaded into memory or otherwise made available
    read vocabulary;
    open corpus;
    while (not at end of corpus)
    {
        read a line from the corpus into the line buffer;
        // the number of characters in a line can vary with the embodiment;
        // a line can be a sentence
        using the vocabulary, generate the path list containing all possible segmentation paths;
        // a segmentation path is one possible segmentation of the characters;
        // the paths can be stored in different forms, for example a list or a tree structure
        find the greedy segmentation path and save it as the best path;
        // any of several greedy algorithms can be used, such as the one given above;
        // in this embodiment the greedy segmentation path is initially treated as
        // the best path, but other initial paths can be used
        compute the probability of this path using the language model, and set the
        maximum probability to this value;
        // the language model specifies the probability that a word occurs and the
        // probability that one word follows another; equation (1) or another
        // formula can be used to compute the probability
        while (path list is not empty)
        {
            select a path from the path list and make it the current path;
            compute the probability of the current path using the language model;
            if (probability of current path > maximum probability)
            {
                maximum probability = probability of current path;
                save current path as best path;
            }
            remove current path from the path list;
        }
        output best path;
    }
    close corpus;
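The path-list loop above can be sketched directly: enumerate every segmentation path the vocabulary allows, then keep the highest-scoring one. This is an illustrative reading of the method (the names and scoring function are our own; note that an exhaustive path list grows quickly with line length, which is why the Viterbi formulation is attractive):

```python
def all_paths(text, vocabulary, max_word_len=8):
    """Every segmentation path the vocabulary allows; a single character
    is always permitted as a word by itself."""
    if not text:
        return [[]]
    paths = []
    for n in range(1, min(max_word_len, len(text)) + 1):
        word = text[:n]
        if n == 1 or word in vocabulary:
            for rest in all_paths(text[n:], vocabulary):
                paths.append([word] + rest)
    return paths

def best_path(paths, score):
    """Designate the first path best, then compare each remaining path's
    score against the best so far, as in the pseudocode's inner loop."""
    best, best_score = paths[0], score(paths[0])
    for path in paths[1:]:
        s = score(path)
        if s > best_score:
            best, best_score = path, s
    return best

paths = all_paths("abcd", {"ab", "abc", "cd"})
print(["ab", "cd"] in paths and ["abc", "d"] in paths)  # True
```

In practice `score` would be the language-model probability of the path, computed with equation (1) or a similar formula.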
An example of the algorithm is given below in connection with the following Chinese words.
Original text:
Segmentation result using the greedy method:
Segmentation result using the language model:
Example 1.
When correctly segmented, the sentence means "there are ways and strength to solve the problem." The present invention segmented this sentence successfully; the traditional method did not.
In Example 1, the original text is considered as the following eight characters in order: C1, C2, C3, C4, C5, C6, C7, and C8. From the original text it is not visually apparent how the characters group to form words. Table 1 below gives two possible ways in which the characters group into five words W1-W5.
Table 1:
Word | Characters in the word, per the prior-art greedy algorithm | Characters in the word, per the present invention
---|---|---
W1 | C1 | C1
W2 | C2C3 | C2C3
W3 | C4C5 | C4
W4 | C6 | C5C6
W5 | C7C8 | C7C8
The greedy segmentation path is produced with a greedy algorithm as follows. The longest word in the vocabulary beginning with character C1 that matches consecutive characters in the corpus is the word consisting of C1 alone; in other words, C1C2 is not a word in the vocabulary. Word W1 is therefore character C1. In some embodiments, W1 leaves the line buffer and the next character becomes the head of the line, although this is an implementation detail that is not required. In this example, the next character is C2. The longest matching vocabulary word beginning with C2 is the word made of characters C2C3; that is, C2C3 is in the vocabulary but C2C3C4 is not. Word W2 is therefore C2C3. The longest matching vocabulary word beginning with C4 is C4C5, so word W3 is C4C5. The longest matching vocabulary word beginning with C6 is C6 alone, so word W4 is C6. The longest matching vocabulary word beginning with C7 is C7C8, so word W5 is C7C8.
The probability of this greedy segmentation path is computed. For words W1 and W2 and characters C1, C2, and C3, the only segmentation path the vocabulary allows is the one selected by the greedy algorithm. One way to handle this situation is not to compute another probability when no other vocabulary-allowed path exists. Another way is to recompute the probability of the same path; it will merely be found identical, so the current path does not replace the maximum probability.
For words W3 and W4, however, there are two possible paths. The first, selected by the greedy algorithm, has W3 = C4C5 and W4 = C6. The other segmentation path the vocabulary allows has W3 = C4 and W4 = C5C6. In this example, assume the probability of C4 followed by the combination C5C6 is greater than that of the combination C4C5 followed by C6. (W5 is the same in either case.) Then, under equation (1), the probability of the current path can be greater than that of the greedy segmentation path, and the current path replaces the greedy path. Note the following possibility: suppose the combination C4C5 has a higher probability than C4 by itself. On that single piece of information, the greedy segmentation path would be selected. Yet that would not lead to the better global solution, because C4 followed by C5C6 is more probable than C4C5 followed by C6.
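The comparison can be made concrete with numbers. The probability values below are invented purely for illustration (the real values would come from the trained language model), but they reproduce the situation described: C4C5 is locally likelier, yet the other path wins globally.

```python
import math

# Hypothetical bigram probabilities for the two Table 1 groupings.
prob = {
    ("W2", "C4C5"): 0.04, ("C4C5", "C6"): 0.01,  # greedy path
    ("W2", "C4"):   0.03, ("C4", "C5C6"): 0.20,  # alternative path
}

def path_log_prob(bigrams):
    """Sum of log bigram probabilities along a path (log of the product)."""
    return sum(math.log(prob[b]) for b in bigrams)

greedy_lp = path_log_prob([("W2", "C4C5"), ("C4C5", "C6")])
other_lp = path_log_prob([("W2", "C4"), ("C4", "C5C6")])

# C4C5 is locally likelier after W2 (0.04 > 0.03), yet the path through
# C4 + C5C6 scores higher overall:
print(other_lp > greedy_lp)  # True
```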
A line can be a sentence. As used here, the term "sentence" means a group of consecutive words ending with a symbol such as a period. Different embodiments can consider different groups of characters in a segmentation path. For example, a segmentation path can consider all the characters in a sentence. A segmentation path can instead consider a moving window of characters without regard to sentence endings, noting only that the language model does not allow the last character of one sentence to combine with the first character of the next sentence. The window may be a set number of characters. If the last character of the previous path is not within a word, a new segmentation path starting from it may contain X characters. Other possibilities also exist.
A variety of computer systems can be used for training and for speech recognition. Merely as an example, Fig. 1 shows a high-level schematic of a computer system 10 that includes a processor 14, memory 16, and an input/output and control block 18. Memory 16 may include a line buffer 22. A line buffer is simply a group of memory locations and need not have any particular feature; for example, it need not occupy contiguous memory locations. There may be a substantial amount of memory in processor 14, and memory 16 may represent both memory off the chip of processor 14 and memory partly on and partly off that chip. (Alternatively, memory 16 could be entirely on the chip of processor 14.) In some embodiments, a line buffer 24 is in processor 14, although a line buffer is not required to be in processor 14. Moreover, not every embodiment of the invention has a line buffer, and the segmentation paths need not be held in a line buffer. At least some of the input/output and control block 18 may be on the same chip as processor 14, or it may be on another chip. A microphone 26, monitor 30, additional memory 34, input devices (such as a keyboard and mouse 38), a network connection 42, and a speaker 44 may interact with input/output and control block 18. Memory 34 represents a variety of memories, such as a hard drive and CD-ROM or DVD discs. These include computer-readable media that can hold instructions which, when executed, cause some embodiments of the invention to occur. It is emphasized that Fig. 1 is merely schematic and the invention is not limited to use with such a computer system. Computer system 10 and the other computer systems used to implement the invention may take a variety of forms, such as desktop, mainframe, and portable computers.
For example, Fig. 2 shows a handheld device 60 with a display 62, which may incorporate some or all of the features of Fig. 1. The handheld device may often interface with another computer system, such as that of Fig. 1. The shapes and relative sizes of the objects in Figs. 1 and 2 are not intended to suggest actual shapes and relative sizes.
Additional information and embodiments
The quality of a language model is traditionally measured by its perplexity, which is an entropy measure of language complexity. For the same training and evaluation body of text, a model with lower perplexity is better than one with higher perplexity. As an experiment, trigram models estimated under the different segmentation methods were evaluated using People's Daily data from 1994 to 1998. The perplexity of the traditional (greedy) method was 182, while the result for an embodiment of the invention was 143. Compared with the prior art, this is a significant improvement in modeling accuracy.
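As a reminder of what the perplexity figures mean: perplexity is the exponential of the negative average per-word log-probability (a standard definition; the sketch below is illustrative and not taken from the patent).

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp of the negative mean per-word log-probability."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))

# A model that assigns every word probability 1/182 has perplexity 182,
# i.e. it is as uncertain as a uniform choice among 182 words:
print(round(perplexity([math.log(1 / 182)] * 100)))  # 182
```

On this scale, reducing perplexity from 182 to 143 means the model's average per-word uncertainty shrinks from a 182-way to a 143-way uniform choice.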
Reference in this specification to "an embodiment", "one embodiment", "some embodiments", or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the invention. The various appearances of "an embodiment", "one embodiment", or "some embodiments" are not necessarily all referring to the same embodiments.
If the specification states that a component, feature, structure, or characteristic "may", "might", or "could" be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claims refer to "a" or "an" element, that does not mean there is only one of the element. If the specification or claims refer to "an additional" element, that does not preclude there being more than one of the additional element.
Those skilled in the art, having the benefit of this disclosure, will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Accordingly, it is the following claims, including any amendments thereto, that define the scope of the invention.
Claims (22)
1. A method comprising:
(a) using a vocabulary to create a path list of segmentation paths of characters;
(b) determining a probability of a first one of the segmentation paths and designating it the best segmentation path;
(c) determining a probability of an additional one of the segmentation paths and determining whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designating the additional segmentation path the best segmentation path; and
repeating (c) until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
2. The method of claim 1, wherein the first segmentation path is obtained through a greedy algorithm.
3. The method of claim 1, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after the corresponding probabilities have been compared.
4. The method of claim 1, wherein the characters included in the segmentation paths are those of a single sentence.
5. The method of claim 1, wherein the characters included in the segmentation paths are within a sliding window.
6. The method of claim 1, wherein the probabilities are determined through use of a language model.
7. The method of claim 1, wherein the probabilities are determined through calculations involving the following equation:
P(w_i) = max[ P(w_{i-1}) x prob(w_i | w_{i-1}) ],
wherein w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, P(w_{i-1}) is the probability of the word sequence ending at word w_{i-1}, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
8. An apparatus comprising:
a computer-readable medium containing instructions which, when executed, cause a computer system to:
(a) use a vocabulary to create a path list of segmentation paths of characters;
(b) determine a probability of a first one of the segmentation paths and designate it the best segmentation path;
(c) determine a probability of an additional one of the segmentation paths and determine whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designate the additional segmentation path the best segmentation path; and
repeat (c) until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
9. The apparatus of claim 8, wherein the first segmentation path is obtained through a greedy algorithm.
10. The apparatus of claim 8, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after the corresponding probabilities have been compared.
11. The apparatus of claim 8, wherein the characters included in the segmentation paths are those of a single sentence.
12. The apparatus of claim 8, wherein the characters included in the segmentation paths are within a sliding window.
13. The apparatus of claim 8, wherein the probabilities are determined through use of a language model.
14. The apparatus of claim 8, wherein the probabilities are determined through calculations involving the following equation:
P(w_i) = max[ P(w_{i-1}) x prob(w_i | w_{i-1}) ],
wherein w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, P(w_{i-1}) is the probability of the word sequence ending at word w_{i-1}, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
15. The apparatus of claim 8, wherein the apparatus is a disc.
16. A computer system comprising:
a memory to hold a path list of segmentation paths of characters to be segmented into words of a vocabulary; and
a processor to:
(a) determine a probability of a first one of the segmentation paths and designate it the best segmentation path;
(b) determine a probability of an additional one of the segmentation paths and determine whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designate the additional segmentation path the best segmentation path; and
repeat (b) until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
17. The computer system of claim 16, wherein the first segmentation path is obtained through a greedy algorithm.
18. The computer system of claim 16, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after the corresponding probabilities have been compared.
19. The computer system of claim 16, wherein the characters included in the segmentation paths are those of a single sentence.
20. The computer system of claim 16, wherein the characters included in the segmentation paths are within a sliding window.
21. The computer system of claim 16, wherein the probabilities are determined through use of a language model.
22. The computer system of claim 16, wherein the probabilities are determined through calculations involving the following equation:
P(w_i) = max[ P(w_{i-1}) x prob(w_i | w_{i-1}) ],
wherein w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, P(w_{i-1}) is the probability of the word sequence ending at word w_{i-1}, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN1999/000213 WO2001048738A1 (en) | 1999-12-23 | 1999-12-23 | A global approach for segmenting characters into words |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1398395A true CN1398395A (en) | 2003-02-19 |
CN1192354C CN1192354C (en) | 2005-03-09 |
Family
ID=4575157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB998170828A Expired - Fee Related CN1192354C (en) | 1999-12-23 | 1999-12-23 | Global approach for segmenting characters into words |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN1192354C (en) |
AU (1) | AU1767200A (en) |
WO (1) | WO2001048738A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101609671B (en) * | 2009-07-21 | 2011-09-07 | 北京邮电大学 | Method and device for continuous speech recognition result evaluation |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4059725A (en) * | 1975-03-12 | 1977-11-22 | Nippon Electric Company, Ltd. | Automatic continuous speech recognition system employing dynamic programming |
JPS58132298A (en) * | 1982-02-01 | 1983-08-06 | 日本電気株式会社 | Pattern matching apparatus with window restriction |
JPH02195400A (en) * | 1989-01-24 | 1990-08-01 | Canon Inc | Speech recognition device |
US5706397A (en) * | 1995-10-05 | 1998-01-06 | Apple Computer, Inc. | Speech recognition system with multi-level pruning for acoustic matching |
US5799276A (en) * | 1995-11-07 | 1998-08-25 | Accent Incorporated | Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals |
US5862519A (en) * | 1996-04-02 | 1999-01-19 | T-Netix, Inc. | Blind clustering of data with application to speech processing systems |
JP2001516904A (en) * | 1997-09-18 | 2001-10-02 | シーメンス アクチエンゲゼルシヤフト | How to recognize keywords in spoken language |
US6374220B1 (en) * | 1998-08-05 | 2002-04-16 | Texas Instruments Incorporated | N-best search for continuous speech recognition using viterbi pruning for non-output differentiation states |
-
1999
- 1999-12-23 AU AU17672/00A patent/AU1767200A/en not_active Abandoned
- 1999-12-23 CN CNB998170828A patent/CN1192354C/en not_active Expired - Fee Related
- 1999-12-23 WO PCT/CN1999/000213 patent/WO2001048738A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2001048738A1 (en) | 2001-07-05 |
AU1767200A (en) | 2001-07-09 |
CN1192354C (en) | 2005-03-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20050309 Termination date: 20121223 |