CN1398395A - Global approach for segmenting characters into words - Google Patents

Global approach for segmenting characters into words Download PDF

Info

Publication number
CN1398395A
CN1398395A CN99817082.8A CN99817082A CN1398395A CN 1398395 A CN1398395 A CN 1398395A CN 99817082 A CN99817082 A CN 99817082A CN 1398395 A CN1398395 A CN 1398395A
Authority
CN
China
Prior art keywords
probability
path
speech
word
division path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN99817082.8A
Other languages
Chinese (zh)
Other versions
CN1192354C (en)
Inventor
Yan Yonghong (transliteration)
Tuo Lingyun (transliteration)
Lin Zhiwei (transliteration)
Zhang Xiangdong (transliteration)
Robert Yong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Architecture Development Shanghai Co Ltd
Intel Corp
Original Assignee
Intel Architecture Development Shanghai Co Ltd
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Architecture Development Shanghai Co Ltd, Intel Corp filed Critical Intel Architecture Development Shanghai Co Ltd
Publication of CN1398395A publication Critical patent/CN1398395A/en
Application granted granted Critical
Publication of CN1192354C publication Critical patent/CN1192354C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/083Recognition networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197Probabilistic grammars, e.g. word n-grams

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

In some embodiments, the invention includes a method. The method involves creating a path list of segmentation paths of characters using a vocabulary. A probability of a first segmentation path is determined and designated as the best segmentation path. The probability of an additional one of the segmentation paths is determined and compared with the probability of the best segmentation path. If the probability of the additional segmentation path exceeds that of the best segmentation path, the additional segmentation path is designated the best segmentation path. This is repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path. In some embodiments, the invention is an apparatus including a computer-readable medium that performs such a method.

Description

A Global Approach for Segmenting Characters into Words
Technical field
The present invention relates to speech recognition systems and, more particularly, to segmenting characters into words in speech recognition systems.
Background technology
A part of a speech recognizer is a language model. A common method of capturing the syntactic structure of a given language is to use conditional probabilities to capture the sequential information embedded in the word strings of sentences. For example, if the current word is W1, a language model can be constructed to express the probability that some other word W2, W3, ... Wn will follow W1. These probabilities can be expressed as follows: P21 is the probability that word W2 will follow word W1, where P21 = P(W2|W1). In this notation, P31 is the probability that word W3 will follow word W1; P41 is the probability that word W4 will follow word W1; and so forth, with Pn1 the probability that word Wn will follow word W1. The maximum among P21, P31, ... Pn1 can be determined and used in the language model. The foregoing example concerns bigram probabilities, although trigram conditional probabilities can also be computed.
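As a brief illustration (not part of the patent), bigram conditional probabilities of the kind described above can be estimated from word-segmented text by counting adjacent pairs; the toy corpus and words below are invented for the sketch.

```python
from collections import Counter

def bigram_model(sentences):
    """Estimate P(w2 | w1) by counting adjacent word pairs in segmented text."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def prob(w2, w1):
        # Conditional probability that w2 follows w1; 0 if w1 is unseen.
        if unigrams[w1] == 0:
            return 0.0
        return bigrams[(w1, w2)] / unigrams[w1]

    return prob

# Invented two-sentence corpus for illustration only.
corpus = [["we", "have", "a", "way"], ["we", "have", "strength"]]
prob = bigram_model(corpus)
print(prob("have", "we"))  # P(have | we) = 2/2 = 1.0
```

In a real system the counts would come from a large corpus such as newspaper text, and the maximum over the conditional probabilities would feed the language model as the paragraph describes.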
A language model is often generated by examining a body of literature (such as newspapers) and determining the conditional probabilities of certain words in a vocabulary with respect to other words in the vocabulary.
In some languages, such as Chinese and Japanese, words are written as one or more characters, for example Hanzi in Chinese and Kanji in Japanese. A sentence is composed of a string of characters, and the words in it are implicit, because there are no spaces between adjacent words. A particular character may itself be a word, or it may combine with the character before it or after it (or possibly both) to form a word. How the characters combine or separate when forming words changes the meaning. Yet in written form there are no spaces, so it is not visually apparent whether a particular character is a word by itself or forms a word with one or more other characters. Which word a particular character belongs to must be understood from context. To apply statistical methods to the language model, words are made explicit by placing spaces at word boundaries.
Segmenting characters into words has traditionally been done with a "greedy algorithm." The greedy algorithm includes the following steps:
(1) Starting from the beginning of the given sentence to be processed, exhaustively find all possible words in the vocabulary that match the initial portion of the sentence's character string.
(2) Pick the longest such word (that is, the word with the greatest number of characters), place a space at the end of the matching substring in the sentence, treat the remaining character string as a new sentence, and repeat step (1) until all characters in the sentence have been processed.
From a global point of view, the greedy algorithm does not make the best selection. In fact, the combinations it selects may be neither optimal nor syntactically correct. As T. Cormen et al. say on page 329 of "Introduction to Algorithms" (The MIT Press, 1990): "A greedy algorithm always makes the choice that looks best at the moment. That is, it makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution."
Summary of the invention
In some embodiments, the present invention includes a method. The method involves using a vocabulary to produce a path list of character segmentation paths. The probability of a first segmentation path is determined and designated as the best segmentation path. The probability of another segmentation path is determined and compared with the probability of the best segmentation path. If the probability of the other segmentation path exceeds that of the best segmentation path, the other segmentation path is designated the best segmentation path. This is repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
In some embodiments, the present invention is an apparatus including a computer-readable medium that performs such a method. In still other embodiments, the present invention is a computer system.
Additional embodiments and claims are described below.
Brief Description Of Drawings
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
Fig. 1 is a high-level schematic block diagram of a computer system with which some embodiments of the present invention may be used.
Fig. 2 is a high-level schematic representation of a handheld computer system with which some embodiments of the present invention may be used.
Embodiment
The present invention relates to a system and method for segmenting characters into words. That is, the invention involves determining which word a character should belong to. The invention has particular application to languages, such as Chinese and Japanese, in which spaces between characters do not indicate word divisions; however, the invention is not limited to such uses. The invention is designed to make a good word segmentation of any given sentence. A language model generated this way is better than a model obtained through the traditional greedy-algorithm method described above. A better language model leads to better recognition accuracy, because it better describes the language in terms of word strings.
In some embodiments, the invention performs the segmentation using a dynamic programming algorithm equipped with a statistical language model. Dynamic programming can be carried out in a variety of ways; one example is as follows. First, an n-gram language model is computed over the corpus (that is, the characters to be segmented into words) processed by the traditional greedy algorithm. Then, the Viterbi algorithm is used to re-segment the sentences. The Viterbi algorithm is a form of dynamic programming that can be used for global optimization; see pages 301-328 of "Introduction to Algorithms" by T. Cormen et al. (The MIT Press, 1990). The Viterbi algorithm we use can be described by the following equation (1):

    Pw_i = max_i ( Pw_{i-1} + prob( w_i | w_{i-1} ) )        (1)
In equation (1), P is a probability and "prob" embodies the language model. In equation (1), w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, Pw_{i-1} is the probability of word w_{i-1} occurring, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred. Equation (1) involves finding the word w_i that maximizes equation (1). By solving equation (1), the resulting word sequence (w0 w1 ... wN) is guaranteed to be the best of the chosen segmentations in the maximum-likelihood sense. In some embodiments, when i = N, that is, when the end of the sentence is reached, a global maximum is obtained.
Equation (1) is a bigram form; however, if other forms, such as trigram or unigram forms, are available in the language model, they can also be used. Back-off weighting and other techniques can also be used.
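As a minimal sketch (not the patent's implementation) of dynamic-programming segmentation in the spirit of equation (1), the code below scores each word by its log-probability so that scores add along a path; the tiny vocabulary, its probabilities, and the unigram simplification are all invented assumptions.

```python
import math

# Invented toy vocabulary: word -> probability (illustrative only).
VOCAB = {"a": 0.3, "ab": 0.4, "b": 0.1, "bc": 0.35, "c": 0.2}

def viterbi_segment(text, vocab=VOCAB, max_len=2):
    """best[i] holds (score, path) for the best segmentation of text[:i].
    Each word contributes its log-probability, a unigram simplification
    of the bigram recurrence in equation (1)."""
    best = {0: (0.0, [])}
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if j in best and word in vocab:
                score, path = best[j]
                candidates.append((score + math.log(vocab[word]), path + [word]))
        if candidates:
            best[i] = max(candidates)
    return best.get(len(text), (float("-inf"), None))[1]

print(viterbi_segment("abc"))  # ['a', 'bc'] beats ['ab', 'c'] and ['a', 'b', 'c']
```

Because each prefix keeps only its best-scoring path, the search is global over all vocabulary-permitted segmentations rather than committing to the locally longest match.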
As mentioned above, in some languages each character may itself be a word. The present invention involves determining whether a character is better combined with other characters to form a word or left as a word on its own. A word composed of multiple characters may also be called a term or phrase.
A version of the greedy algorithm is given below in pseudocode form:

    read vocabulary;        // the vocabulary is the list of possible words
    open corpus;            // the corpus contains the characters to be segmented into words
    while (not at end of corpus)
    {
        read a line from the corpus into the line buffer;
            // the line buffer is a group of memory, not limited to any particular form
        while (line buffer is not empty)
        {
            find the longest word in the vocabulary matching the head of the line buffer;
            output that word and a word separator;
            remove the matched head from the line buffer;
        }
        output a line separator;
    }
    close corpus;
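The pseudocode above can be rendered as a short runnable sketch (my rendering, not the patent's code); the sample vocabulary and strings are invented, with Latin letters standing in for characters.

```python
def greedy_segment(line, vocab):
    """Longest-match segmentation: repeatedly take the longest vocabulary
    word matching the head of the line buffer."""
    words = []
    buf = line
    while buf:
        match = buf[0]  # fall back to a single character if nothing matches
        for length in range(len(buf), 0, -1):
            if buf[:length] in vocab:
                match = buf[:length]
                break
        words.append(match)
        buf = buf[len(match):]  # remove the matched head from the buffer
    return words

# Invented vocabulary for illustration.
vocab = {"a", "ab", "bc", "c", "abc"}
print(greedy_segment("abcc", vocab))  # ['abc', 'c']
```

Note how the first, longest match is kept unconditionally; this is exactly the local commitment that the language-model approach below avoids.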
In some embodiments, a segmentation algorithm according to the present invention that uses a language model includes the following steps:

    read language model;    // the language model is loaded into memory or otherwise made available
    read vocabulary;
    open corpus;
    while (not at end of corpus)
    {
        read a line from the corpus into the line buffer;
            // the number of characters in a line can vary by embodiment; a line may be a sentence
        using the vocabulary, produce the path list of all possible segmentation paths;
            // a segmentation path is one possible segmentation of the characters; paths may be
            // stored in different forms, for example a list or a tree structure
        find the greedy segmentation path and save it as the best path;
            // any of several greedy algorithms may be used, such as the one given above; in this
            // embodiment of the invention, the greedy segmentation path is initially treated as
            // the best path, but other initial paths could be used
        compute the probability of this path using the language model, and set the maximum
        probability to this value;
            // the language model specifies the probability that a word occurs and the
            // probability that one word follows another; the probability may be computed with
            // equation (1) or another formula
        while (path list is not empty)
        {
            select a path from the path list and make it the current path;
            compute the probability of the current path using the language model;
            if (probability of current path > maximum probability)
            {
                maximum probability = probability of current path;
                save current path as best path;
            }
            remove current path from the path list;
        }
        output the best path;
    }
    close corpus;
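As a hedged sketch of the path-list approach above (not the patent's code), all vocabulary-permitted segmentation paths of a line can be enumerated and each scored with the language model, keeping the best; the vocabulary, word-probability function, and unigram scoring are invented assumptions.

```python
import math

def all_paths(line, vocab):
    """Recursively enumerate every segmentation path the vocabulary allows."""
    if not line:
        return [[]]
    paths = []
    for length in range(1, len(line) + 1):
        head = line[:length]
        if head in vocab:
            for rest in all_paths(line[length:], vocab):
                paths.append([head] + rest)
    return paths

def best_path(line, vocab, prob):
    """Score each path and keep the best, mirroring the best-path loop
    in the pseudocode above (maximum probability, best path)."""
    best, best_score = None, float("-inf")
    for path in all_paths(line, vocab):
        score = sum(math.log(prob(w)) for w in path)
        if score > best_score:
            best, best_score = path, score
    return best

# Invented vocabulary and word probabilities.
vocab = {"a", "ab", "b", "c", "bc"}
prob = {"a": 0.3, "ab": 0.25, "b": 0.05, "c": 0.1, "bc": 0.4}.get
print(best_path("abc", vocab, prob))  # ['a', 'bc']
```

Exhaustive enumeration is exponential in the worst case; the Viterbi recurrence of equation (1) reaches the same best path in polynomial time, which is why the patent pairs the path-list formulation with dynamic programming.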
An example of this algorithm is given below in conjunction with the following Chinese sentence.
Original text:
    "there is a way" [characters shown in Figure A9981708200101] "solve"
Segmentation result using the greedy method:
    "there is a way" [characters shown in Figure A9981708200102] "solve"
Segmentation result using the language model:
    "there is a way" [characters shown in Figure A9981708200103] "solve"
Example 1.
When correctly segmented, the meaning of this sentence is "having a way and strength to deal with the problem." The present invention segments this sentence successfully; the traditional method does not.
In Example 1, the original text can be viewed as the following eight characters in order: C1, C2, C3, C4, C5, C6, C7, and C8. From the original text, it is not visually apparent how the characters group to form words. Table 1 below gives two possible ways of grouping the characters into five words W1-W5.
Table 1:

    Word | Characters in the word per the prior-art greedy algorithm | Characters in the word per the present invention
    W1   | C1    | C1
    W2   | C2C3  | C2C3
    W3   | C4C5  | C4
    W4   | C6    | C5C6
    W5   | C7C8  | C7C8
Generating the greedy segmentation path with a greedy algorithm proceeds as follows. The longest word in the vocabulary beginning with character C1 that matches consecutive characters in the corpus is the word consisting of C1 alone; in other words, C1C2 is not a word in the vocabulary. So word W1 is character C1. In some embodiments, word W1 leaves the line buffer and the next character becomes the head of the line, although this is an implementation detail that need not apply. In this example, the next character is C2. The longest vocabulary word beginning with C2 that matches consecutive characters in the corpus is the word comprising characters C2C3; in other words, C2C3 is in the vocabulary but C2C3C4 is not. So word W2 is C2C3. The longest vocabulary word beginning with C4 that matches consecutive characters in the corpus is the word comprising C4C5, so word W3 is C4C5. The longest vocabulary word beginning with C6 is the word comprising C6, so word W4 is C6. The longest vocabulary word beginning with C7 is the word comprising C7C8, so word W5 is C7C8.
The probability of this greedy segmentation path is then computed. For words W1 and W2 (characters C1, C2, and C3), the only segmentation the vocabulary allows is the one the greedy algorithm selected. One way to handle this situation is not to recompute the probability: when no other vocabulary-permitted path exists, no alternative probability is computed. Another way is to recompute the probability of the same path, only to determine that the probabilities are identical, so that the current path does not replace the maximum probability.
For words W3 and W4, however, there are two paths. The first is the one the greedy algorithm selected: W3 is C4C5 and W4 is C6. The other segmentation path the vocabulary allows is: W3 is C4 and W4 is C5C6. In this example, assume the probability of the combination C5C6 following C4 is greater than the probability of C6 following the combination C4C5. (W5 is the same in either case.) Then, under equation (1), the probability of the current path can be greater than the probability of the greedy segmentation path, and the current path can replace the greedy path. Note the following possibility of interest. Suppose the probability of the combination C4C5 is greater than that of C4 alone. Based on that single piece of information, the greedy segmentation path would be selected. Yet that does not lead to the better global solution, because the probability of C5C6 following C4 is greater than that of C6 following C4C5.
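A short numeric illustration of the point above (the probabilities are invented for the sketch): even when P(C4C5) alone exceeds P(C4), the path through C4 can win once the following word's conditional probability is taken into account, which is exactly why a global comparison is needed.

```python
import math

# Invented word and conditional probabilities for the two competing paths.
p = {
    ("C4C5",): 0.20,       # P(C4C5): the local choice favors C4C5 over C4
    ("C4",): 0.10,         # P(C4)
    ("C4C5", "C6"): 0.02,  # P(C6 | C4C5)
    ("C4", "C5C6"): 0.30,  # P(C5C6 | C4)
}

# Log-probabilities of each complete path (W5 is identical, so it cancels).
greedy_path = math.log(p[("C4C5",)]) + math.log(p[("C4C5", "C6")])  # log 0.004
other_path = math.log(p[("C4",)]) + math.log(p[("C4", "C5C6")])     # log 0.03
print(other_path > greedy_path)  # True: the globally better path wins
```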
A line can be a sentence. As used herein, the term "sentence" means a group of consecutive words ending with a symbol such as a period. In different embodiments, different groups of characters can be considered in a segmentation path. For example, a segmentation path may consider all the characters in a sentence. A segmentation path may instead consider a moving window of characters without regard to sentence endings, noting only that the language model does not allow the character ending one sentence to combine with the first character of the next sentence. The window may be a set number of characters. If the last character of the previous path is not in a word, a new segmentation path starting from it may contain X characters. Other possibilities also exist.
A variety of computer systems can be used for training and for the speech recognition system. Merely as an example, Fig. 1 shows a high-level schematic of a computer system 10 that includes a processor 14, memory 16, and an input/output and control block 18. Memory 16 may include a line buffer 22. The line buffer is merely a group of memory and need not have any particular feature; for example, it need not have contiguous memory cells. There can be a substantial amount of memory in processor 14, and memory 16 may represent memory entirely off the chip of processor 14, or partly on and partly off the chip of processor 14. (Alternatively, memory 16 could be entirely on the chip of processor 14.) In some embodiments, a line buffer 24 is in processor 14; however, a line buffer is not required to be in processor 14. Moreover, not every embodiment of the invention has a line buffer, and segmentation paths need not be held in a line buffer. At least some of the input/output and control block 18 may be on the same chip as processor 14, or it may be on a different chip. A microphone 26, monitor 30, additional memory 34, input devices (such as a keyboard and mouse 38), a network connection 42, and speakers 44 may interact with the input/output and control block 18. Memory 34 represents various kinds of memory, such as a hard drive and CD-ROM or DVD discs. These include computer-readable media that can hold instructions which, when executed, cause some embodiments of the invention to occur. It is emphasized that Fig. 1 is merely schematic, and the invention is not limited to use with such a computer system. Computer system 10 and other computer systems used to implement the invention can take a variety of forms, such as desktop, mainframe, and portable computers.
For example, Fig. 2 shows a handheld device 60 with a display 62, which may include some or all of the features of Fig. 1. The handheld device may often interface with another computer system, such as that of Fig. 1. The shapes and relative sizes of the objects in Figs. 1 and 2 are not intended to suggest actual shapes and relative sizes.
Additional Information and Embodiments
The quality of a language model is traditionally measured by perplexity, an entropy-based measure of language complexity. For identical training and evaluation text corpora, a model with lower perplexity is better than a model with higher perplexity. As an experiment, trigram models estimated under the different segmentation methods were evaluated using data from the People's Daily from 1994 to 1998. The perplexity of the traditional (greedy) method was 182, while the result for an embodiment of the present invention was 143. Compared with the prior art, this is a significant improvement in modeling accuracy.
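Perplexity as used above can be sketched with the standard formula (this is general background, not taken from the patent; the per-word probabilities below are invented): the exponential of the negative average log-probability that the model assigns to each word of the evaluation text.

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp(-average log-probability per word)."""
    n = len(word_log_probs)
    return math.exp(-sum(word_log_probs) / n)

# Toy per-word log-probabilities from two hypothetical models scoring the
# same evaluation text; the better model assigns higher probabilities.
better = [math.log(0.02)] * 10
worse = [math.log(0.005)] * 10
print(round(perplexity(better)), round(perplexity(worse)))
```

A lower value means the model is, on average, less "surprised" by the text, which is the sense in which the 143-perplexity model above beats the 182-perplexity greedy baseline.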
Reference in this specification to "an embodiment," "one embodiment," "some embodiments," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the invention. The various appearances of "an embodiment," "one embodiment," or "some embodiments" are not necessarily all referring to the same embodiments.
If the specification states a component, feature, structure, or characteristic "may," "might," or "could" be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claims refer to "a" or "an" element, that does not mean there is only one of the element. If the specification or claims refer to "an additional" element, that does not preclude there being more than one of the additional element.
Those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Accordingly, it is the following claims, including any amendments thereto, that define the scope of the invention.

Claims (22)

1. A method comprising:
(a) using a vocabulary to create a path list of segmentation paths of characters;
(b) determining a probability of a first one of the segmentation paths and designating it as the best segmentation path; and
(c) determining the probability of an additional one of the segmentation paths, determining whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designating the additional segmentation path as the best segmentation path,
(c) being repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
2. The method of claim 1, wherein the first segmentation path is obtained through a greedy algorithm.
3. The method of claim 1, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after their corresponding probabilities have been compared.
4. The method of claim 1, wherein the characters included in the segmentation paths are those of a single sentence.
5. The method of claim 1, wherein the characters included in the segmentation paths are within a sliding window.
6. The method of claim 1, wherein the probabilities are determined through the use of a language model.
7. The method of claim 1, wherein the probabilities are determined through calculations involving the equation Pw_i = max_i(Pw_{i-1} + prob(w_i | w_{i-1})), where w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, Pw_{i-1} is the probability of word w_{i-1} occurring, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
8. An apparatus comprising:
a computer-readable medium containing instructions that, when executed, cause a computer system to:
(a) use a vocabulary to create a path list of segmentation paths of characters;
(b) determine a probability of a first one of the segmentation paths and designate it as the best segmentation path; and
(c) determine the probability of an additional one of the segmentation paths, determine whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designate the additional segmentation path as the best segmentation path,
(c) being repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
9. The apparatus of claim 8, wherein the first segmentation path is obtained through a greedy algorithm.
10. The apparatus of claim 8, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after their corresponding probabilities have been compared.
11. The apparatus of claim 8, wherein the characters included in the segmentation paths are those of a single sentence.
12. The apparatus of claim 8, wherein the characters included in the segmentation paths are within a sliding window.
13. The apparatus of claim 8, wherein the probabilities are determined through the use of a language model.
14. The apparatus of claim 8, wherein the probabilities are determined through calculations involving the equation Pw_i = max_i(Pw_{i-1} + prob(w_i | w_{i-1})), where w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, Pw_{i-1} is the probability of word w_{i-1} occurring, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
15. The apparatus of claim 8, wherein the apparatus is a disc.
16. A computer system comprising:
memory to hold a path list of segmentation paths of characters forming words of a vocabulary; and
a processor to:
(a) determine a probability of a first one of the segmentation paths and designate it as the best segmentation path; and
(b) determine the probability of an additional one of the segmentation paths, determine whether the probability of the additional segmentation path exceeds the probability of the best segmentation path, and if so, designate the additional segmentation path as the best segmentation path,
(b) being repeated until the probabilities of all remaining segmentation paths have been determined and compared with the probability of the best segmentation path.
17. The computer system of claim 16, wherein the first segmentation path is obtained through a greedy algorithm.
18. The computer system of claim 16, wherein the segmentation paths are held in a line buffer and are removed from the line buffer after their corresponding probabilities have been compared.
19. The computer system of claim 16, wherein the characters included in the segmentation paths are those of a single sentence.
20. The computer system of claim 16, wherein the characters included in the segmentation paths are within a sliding window.
21. The computer system of claim 16, wherein the probabilities are determined through the use of a language model.
22. The computer system of claim 16, wherein the probabilities are determined through calculations involving the equation Pw_i = max_i(Pw_{i-1} + prob(w_i | w_{i-1})), where w_i is the i-th word, w_{i-1} is the word immediately preceding w_i, Pw_{i-1} is the probability of word w_{i-1} occurring, and prob(w_i | w_{i-1}) is the conditional probability that word w_i occurs given that word w_{i-1} has occurred.
CNB998170828A 1999-12-23 1999-12-23 Global approach for segmenting characters into words Expired - Fee Related CN1192354C (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN1999/000213 WO2001048738A1 (en) 1999-12-23 1999-12-23 A global approach for segmenting characters into words

Publications (2)

Publication Number Publication Date
CN1398395A true CN1398395A (en) 2003-02-19
CN1192354C CN1192354C (en) 2005-03-09

Family

ID=4575157

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB998170828A Expired - Fee Related CN1192354C (en) 1999-12-23 1999-12-23 Global approach for segmenting characters into words

Country Status (3)

Country Link
CN (1) CN1192354C (en)
AU (1) AU1767200A (en)
WO (1) WO2001048738A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609671B (en) * 2009-07-21 2011-09-07 北京邮电大学 Method and device for continuous speech recognition result evaluation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4059725A (en) * 1975-03-12 1977-11-22 Nippon Electric Company, Ltd. Automatic continuous speech recognition system employing dynamic programming
JPS58132298A (en) * 1982-02-01 1983-08-06 日本電気株式会社 Pattern matching apparatus with window restriction
JPH02195400A (en) * 1989-01-24 1990-08-01 Canon Inc Speech recognition device
US5706397A (en) * 1995-10-05 1998-01-06 Apple Computer, Inc. Speech recognition system with multi-level pruning for acoustic matching
US5799276A (en) * 1995-11-07 1998-08-25 Accent Incorporated Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals
US5862519A (en) * 1996-04-02 1999-01-19 T-Netix, Inc. Blind clustering of data with application to speech processing systems
JP2001516904A (en) * 1997-09-18 2001-10-02 シーメンス アクチエンゲゼルシヤフト How to recognize keywords in spoken language
US6374220B1 (en) * 1998-08-05 2002-04-16 Texas Instruments Incorporated N-best search for continuous speech recognition using viterbi pruning for non-output differentiation states

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609671B (en) * 2009-07-21 2011-09-07 北京邮电大学 Method and device for continuous speech recognition result evaluation

Also Published As

Publication number Publication date
WO2001048738A1 (en) 2001-07-05
AU1767200A (en) 2001-07-09
CN1192354C (en) 2005-03-09

Similar Documents

Publication Publication Date Title
CN1226717C (en) Automatic new term fetch method and system
CN1159661C (en) System for Chinese tokenization and named entity recognition
CN1269102C (en) Method for compressing dictionary data
CN1207664C (en) Error correcting method for voice identification result and voice identification system
CN1135485C (en) Identification of words in Japanese text by a computer system
CN1260704C (en) Method for voice synthesizing
US6678409B1 (en) Parameterized word segmentation of unsegmented text
CN101079028A (en) On-line translation model selection method of statistic machine translation
EP1687738A2 (en) Clustering of text for structuring of text documents and training of language models
CN1193779A (en) Method for dividing sentences in Chinese language into words and its use in error checking system for texts in Chinese language
CN110853625B (en) Speech recognition model word segmentation training method and system, mobile terminal and storage medium
CN1102779C (en) Simplified Chinese character-the original complex form changingover apparatus
CN1108572C (en) Mechanical Chinese to japanese two-way translating machine
CN1192354C (en) Global approach for segmenting characters into words
CN1114165C (en) Segmentation of Chinese text into words
CN102945231B (en) Construction method and system of incremental-translation-oriented structured language model
CN1607526A (en) Document sorting apparatus, method and program adopting freezing mode
CN1068688C (en) Literal information processing method and apparatus
CN1271550C (en) Sentence boundary identification method in spoken language dialogue
CN1302415C (en) English-Chinese translation machine
CN1203389C (en) Initial four-stroke Chinese sentence input method for computer
CN1144141C (en) Change-over processor for Chinese input and method of change-over processing for Chinese input
CN1679023A (en) Method and system of creating and using chinese language data and user-corrected data
CN1257445C (en) Chinese-character 'Pronunciation-meaning code' input method
CN107168997A (en) The original appraisal procedure of webpage, device and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050309

Termination date: 20121223