CN101685438B - Chinese article debugging device, Chinese article debugging method - Google Patents

Publication number
CN101685438B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200810149253A
Other languages
Chinese (zh)
Other versions
CN101685438A (en
Inventor
谷圳
吴世弘
王文男
谢文泰
洪大弘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute for Information Industry
Original Assignee
Institute for Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Information Industry
Priority to CN200810149253A
Publication of CN101685438A
Application granted
Publication of CN101685438B

Abstract

The invention discloses a Chinese article debugging device and a Chinese article debugging method. The Chinese article debugging device is suitable for processing a plurality of Chinese character strings, including a first Chinese character string. The device comprises an article cutting module, a database, a candidate word generation module, a candidate sentence generating and grading module, and a display device. The article cutting module cuts the first Chinese character string into a plurality of first word groups, each consisting of any two consecutive or non-consecutive characters in the first Chinese character string. The database holds a plurality of first correct word strings and a plurality of first indexes, each first index consisting of any two consecutive or non-consecutive characters in a first correct word string. The candidate word generation module obtains the first indexes corresponding to the first word groups and then the corresponding first correct word strings. The candidate sentence generating and grading module generates an optimal candidate sentence, and the display device displays the Chinese character strings and the optimal candidate sentence.

Description

Chinese article debugging device and Chinese article debugging method
Technical field
The invention relates to a Chinese article debugging device, and in particular to a Chinese article debugging device that uses a bi-gram (two-character) cutting mechanism.
Background technology
As computers become ever more widespread, most people rely on them to write articles. In Chinese, a single pronunciation may correspond to many characters, and many characters closely resemble one another in shape. This makes written Chinese intricate, and writers can very easily use wrongly written or mispronounced characters in an article.
Summary of the invention
In view of the above, a system and method for debugging Chinese articles is needed to solve the wrong-character problem caused by the intricacy of Chinese.
Accordingly, the present invention discloses a Chinese article debugging device for processing a plurality of Chinese character strings, where the Chinese character strings are cut from a Chinese article according to punctuation marks and include a first Chinese character string. The device comprises an article cutting module, a database, a candidate word generation module, a candidate sentence generating and grading module, and a display device. The article cutting module cuts the first Chinese character string into a plurality of first word groups, each formed by any two consecutive or non-consecutive characters in the first Chinese character string. The database has a plurality of first correct word strings and a plurality of first indexes corresponding to them, each first index being formed by any two consecutive or non-consecutive characters in a first correct word string. The candidate word generation module obtains the first index corresponding to a first word group and then obtains the corresponding first correct word string according to that first index. The candidate sentence generating and grading module produces an optimal candidate sentence according to the obtained first correct word string. The display device displays the Chinese character strings and the optimal candidate sentence.
The present invention also provides a Chinese article debugging method for processing a plurality of Chinese character strings, where the Chinese character strings are cut from a Chinese article according to punctuation marks and include a first Chinese character string. The method comprises cutting the first Chinese character string into a plurality of first word groups, each formed by any two consecutive or non-consecutive characters in the first Chinese character string. A database is provided that has a plurality of first correct word strings and a plurality of first indexes corresponding to them, each first index being formed by any two consecutive or non-consecutive characters in a first correct word string. The first index corresponding to a first word group is obtained, and the corresponding first correct word string is obtained according to that first index. An optimal candidate sentence is produced according to the obtained first correct word string. Finally, the Chinese character strings and the optimal candidate sentence are displayed on a display device.
Description of drawings
Figure 1 shows an embodiment of the Chinese article debugging device 100 according to the invention;
Figure 2 shows a flowchart of the Chinese article debugging device 100 according to the invention;
Figure 3 shows the structure of the Chinese character string Str according to an embodiment of the invention;
Figure 4 shows the candidate sentence generation mechanism according to an embodiment of the invention; and
Figure 5 shows the candidate sentence scoring according to an embodiment of the invention.
Drawing reference numeral:
110~article receiver module
120~article cutting module
130~correct language database
140~wrong language database
150~candidate word generation module
160~candidate sentence generating and grading module
170~similar character database
180~phonetically similar word database
190~language model database
200~article marking module
210~display device
Art~Chinese article
Str~Chinese character string
Embodiment
To make the above objects, features and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings:
Figure 1 shows an embodiment of the Chinese article debugging device 100 according to the invention. The Chinese article debugging device 100 comprises an article receiver module 110, an article cutting module 120, a correct language database 130, a wrong language database 140, a candidate word generation module 150, a candidate sentence generating and grading module 160, a similar character database 170, a phonetically similar word database 180, a language model database 190, an article marking module 200 and a display device 210.
The article receiver module 110 receives a Chinese article Art and sends it to the article cutting module 120 for cutting. The correct language database 130 stores correct language data such as idioms and allusions, slang, proper nouns and poems (examples only), and has a plurality of first correct word strings and a plurality of first indexes corresponding to them. The wrong language database 140 stores commonly miswritten vocabulary together with the correct vocabulary, and has a plurality of second indexes and the second correct word strings to which they correspond. The candidate word generation module 150 obtains a first correct word string and sets it as a first candidate word, and obtains a second correct word string and sets it as a second candidate word. The candidate sentence generating and grading module 160 produces a plurality of candidate sentences from the first and second candidate words, and scores them with a candidate sentence scoring mechanism, using the data of the similar character database 170, the phonetically similar word database 180 and the language model database 190, to produce an optimal candidate sentence. The article marking module 200 marks the Chinese article Art and the optimal candidate sentence on the display device 210.
The above is a brief account of the Chinese article debugging device 100; the operation of each element is described in detail below.
Figure 2 shows a flowchart of the Chinese article debugging device 100 according to the invention. In step S100, the article receiver module 110 receives the Chinese article Art. In step S110, the article cutting module 120 cuts the article Art. The article cutting module 120 first cuts the article Art into a plurality of Chinese sentences according to punctuation marks, each sentence representing one Chinese character string. For instance, suppose the text of the Chinese article Art is as follows: "outside world is full of banners and flags, makes him be unable to bear jumping the bathroom, it is cured also therefore to taste flat various sour-sweet hardship." The article cutting module 120 then cuts the Chinese article Art at the punctuation marks (here, the commas and the full stop) into three Chinese character strings: "outside world is full of banners and flags", "making him be unable to bear jumping the bathroom", and "it is cured also therefore to taste flat various sour-sweet hardship". After the Chinese article Art has been cut into a plurality of Chinese character strings, each individual Chinese character string is cut in turn.
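The punctuation-based cutting of step S110 can be sketched as follows. This is a minimal illustration only: the function name `cut_article` and the punctuation set are assumptions, since the source does not list which punctuation marks are treated as boundaries, and the sample text in the usage note is a placeholder.

```python
import re

# Assumed set of sentence-boundary punctuation (ASCII and full-width forms).
# The source only mentions commas and full stops; the rest is a guess.
PUNCTUATION = ",.!?;:，。！？；：、"

def cut_article(article):
    """Cut an article into Chinese character strings at punctuation marks."""
    pattern = "[" + re.escape(PUNCTUATION) + "]"
    parts = re.split(pattern, article)
    # Drop empty pieces, e.g. the one after a trailing full stop.
    return [p.strip() for p in parts if p.strip()]
```

For a placeholder article such as `"甲乙丙，丁戊。己庚"`, `cut_article` returns three character strings, one per punctuation-delimited segment.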
Before the cutting of a Chinese character string is described, the related definitions are given first. Take the Chinese character string Str "it is cured also therefore to taste flat various sour-sweet hardship" as an example. It has a first wrong word string "sour-sweet hardship is cured" at a first position and a second wrong word string "tasting flat" at a second position. The first position is the position of the 8th to 11th characters in Str, and the second position is the position of the 4th to 5th characters in Str, as shown in Fig. 3.
In this embodiment, the article cutting module 120 uses a skipping bi-gram mechanism to cut any two consecutive or non-consecutive characters in the Chinese character string Str into a plurality of first word groups. Any two consecutive characters in Str form the following first word groups: 12, 23, 34, ..., where 12 denotes the first word group formed by the 1st and 2nd characters of Str, 23 denotes the first word group formed by the 2nd and 3rd characters, and so on. Any two non-consecutive characters in Str separated by one character form the following first word groups: 13, 35, 57, ..., 24, 46, 68, ..., where 13 denotes the first word group formed by the 1st and 3rd characters and 35 the first word group formed by the 3rd and 5th characters. Likewise, any two non-consecutive characters in Str separated by two characters form the following first word groups: 14, 47, ..., 25, 58, ..., 36, 69, ..., where 14 denotes the first word group formed by the 1st and 4th characters and 47 the first word group formed by the 4th and 7th characters.
In summary, the Chinese character string Str "it is cured also therefore to taste flat various sour-sweet hardship" can be cut into the following first word groups, written here as pairs of character positions in Str (for example, 1-2 is the first word group "also because of" and 2-3 is "therefore"):
Table one: the first word groups of Chinese character string Str
N | First word groups
0 | 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11
1 | 1-3, 3-5, 5-7, 7-9, 9-11, 2-4, 4-6, 6-8, 8-10
2 | 1-4, 4-7, 7-10, 2-5, 5-8, 8-11, 3-6, 6-9
Here N is the number of characters in Str that separate the two characters of a first word group: N=0 means the two characters are adjacent, N=1 means one character lies between them, and N=2 means two characters lie between them.
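The skipping bi-gram cutting described above can be sketched in a few lines. The function name `cut_word_groups` and the `max_gap` parameter are illustrative assumptions; the source only describes gaps N of 0, 1 and 2.

```python
def cut_word_groups(s, max_gap=2):
    """Cut string s into two-character word groups.

    Returns a dict mapping the gap N (number of characters skipped between
    the two characters) to the list of word groups for that gap.
    """
    groups = {}
    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two character positions
        groups[gap] = [s[i] + s[i + step] for i in range(len(s) - step)]
    return groups
```

With the Latin placeholder string `"ABCDE"`, the gap-0 groups are the adjacent pairs, the gap-1 groups skip one character, and the gap-2 groups skip two. An 11-character string yields 10, 9 and 8 groups respectively.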
After the cutting of the Chinese article Art is completed in step S110, the correct language database 130 and the wrong language database 140 are provided in step S120. Note that step S120 may instead provide a single database that stores the data of both the correct language database 130 and the wrong language database 140; the two databases are described separately only for convenience, and this is not intended to limit the invention.
As stated, the correct language database 130 has a plurality of first correct word strings and a plurality of first indexes corresponding to them. The first indexes are obtained by cutting each first correct word string with the bi-gram mechanism described above, in the same way that the article cutting module 120 cuts the Chinese character string Str into the first word groups of table one. For example, suppose the correct language database 130 holds two first correct word strings, the idiom "life's joys and sorrows" and the proper noun "Ethernet" (two groups are used for illustration only; an actual database may hold more). In this case, the data stored in the correct language database 130 can be as shown in table two:
Table two: the data layout of correct language database 130
First index | First correct word string
sour-sweet, sweet-bitter, bitter-peppery, sour-bitter, sweet-peppery, sour-peppery | life's joys and sorrows
the two-character pairs cut from the Chinese term for Ethernet | Ethernet
In step S130, the candidate word generation module 150 obtains the first index corresponding to a first word group, and then obtains the corresponding first correct word string according to that first index. More specifically, for each first word group in table one, the candidate word generation module 150 checks whether table two contains an identical first index; if so, it obtains that first index and then the first correct word string to which it corresponds. For instance, the module first checks whether table two contains a first index identical to the first word group "also because of". Table two has no such first index, so the module goes on to the next first word group, "therefore". Table two has no first index "therefore" either, so the module goes on to the next first word group, "this tastes", and so on, repeating the step until every first word group has been checked. Along the way, when the first word group "sour-sweet" is processed, the module finds the first index "sour-sweet" in table two and obtains it, and from it obtains the corresponding first correct word string, "life's joys and sorrows". Likewise, when the first word group "sweet-bitter" is processed, the module also finds the first index "sweet-bitter" in table two and again obtains the first correct word string "life's joys and sorrows".
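The index construction of step S120 and the lookup of step S130 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, and the Chinese strings in the usage note are stand-ins chosen for illustration, because this translation gives only English glosses of the original example.

```python
from collections import defaultdict

def word_group_set(s, max_gap=2):
    """All two-character word groups of s with 0..max_gap skipped characters."""
    return {
        s[i] + s[i + gap + 1]
        for gap in range(max_gap + 1)
        for i in range(len(s) - gap - 1)
    }

def build_first_index(correct_strings, max_gap=2):
    """Map every first index (word group) to the correct strings it came from."""
    index = defaultdict(set)
    for word in correct_strings:
        for key in word_group_set(word, max_gap):
            index[key].add(word)
    return index

def lookup_correct_strings(s, index, max_gap=2):
    """Return the correct strings whose index matches any word group of s."""
    found = set()
    for group in word_group_set(s, max_gap):
        found |= index.get(group, set())
    return found
```

Taking `"酸甜苦辣"` and `"乙太网路"` as stand-ins for the two correct word strings and `"也因此尝扁各种酸甜苦腊"` as a stand-in for Str, the lookup returns only the first correct string, because three of its word groups also occur in Str.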
After the corresponding first correct word string "life's joys and sorrows" is obtained, step S140 filters the obtained first correct word strings by original-word similarity and sets each first correct word string that passes the filter as a first candidate word. The filter determines the original-word similarity of an obtained first correct word string from the number of its characters that appear in the Chinese character string Str, and checks whether that similarity exceeds an empirical threshold. In this example, three characters ("sour-sweet hardship") of the four-character first correct word string "life's joys and sorrows" appear in Str "it is cured also therefore to taste flat various sour-sweet hardship"; only "peppery" does not. Its original-word similarity is therefore 75% (three of four characters). With a preset empirical threshold of 60% (not limiting), the similarity exceeds the threshold, so the first correct word string "life's joys and sorrows" is set as the first candidate word, which corresponds to the first wrong word string and the first position in Fig. 3. The first candidate word is used to determine the optimal candidate sentence, as described in detail below.
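The original-word similarity filter of step S140 can be sketched as follows. The function names are hypothetical, the 0.6 default mirrors the non-limiting 60% threshold in the text, and the Chinese strings in the usage note are illustrative stand-ins rather than the source's original characters.

```python
def original_word_similarity(correct_string, s):
    """Fraction of the characters of correct_string that appear in s."""
    hits = sum(1 for ch in correct_string if ch in s)
    return hits / len(correct_string)

def filter_first_candidates(correct_strings, s, threshold=0.6):
    """Keep only correct strings whose similarity to s exceeds the threshold."""
    return [
        c for c in correct_strings
        if original_word_similarity(c, s) > threshold
    ]
```

With the stand-in correct string `"酸甜苦辣"` and the stand-in Str `"也因此尝扁各种酸甜苦腊"`, three of four characters appear in Str, giving a similarity of 0.75, which exceeds 0.6, so the string is kept as a first candidate word.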
The processing described above addresses the first wrong word string "sour-sweet hardship is cured" in the Chinese character string Str. The processing of the second wrong word string "tasting flat" in Str is discussed next.
As stated, the wrong language database 140 has a plurality of second indexes and the second correct word strings to which they correspond. In this case, suppose the wrong language database 140 stores five second indexes and their corresponding second correct word strings (for illustration only; an actual database may hold more), as shown in table three:
Table three: the data layout of wrong language database 140
Second index | Second correct word string
tasting flat, tasting sheet, tasting partially | taste all of
palm fibre son, combining son | pyramid-shaped dumpling
Here a second index represents a word that users commonly miswrite, and a second correct word string represents the correct word for it. In table three, the second correct word string "pyramid-shaped dumpling" is the correct term, but users often miswrite it as "palm fibre son" or "combining son", mainly because these miswritten forms are similar in shape to the correct word. Likewise, users often miswrite "taste all of" as "tasting flat" (similar shape) or "tasting sheet" (same sound). The invention therefore predefines such commonly miswritten words, for example "palm fibre son" and "combining son", as second indexes stored in the wrong language database 140, and defines the correct word for each as the second correct word string, also stored in the wrong language database 140.
With the data layout of the wrong language database 140 explained, the flow proceeds to step S150.
In step S150, the candidate word generation module 150 produces a second candidate word according to the second indexes. The process is as follows: the candidate word generation module 150 first determines whether the second wrong word string is identical to any second index; when it is identical to one of them, the module obtains the second correct word string corresponding to that second index and sets it as the second candidate word. In the example above, the module first checks whether the first word group "also because of" in table one is identical to a second index in table three. None of the five second indexes in table three is "also because of", so the module goes on to check the next first word group, "therefore". None of the five second indexes is "therefore" either, so the module goes on to "this tastes", and so on, repeating the step until every first word group has been checked. Along the way, when the first word group "tasting flat" (that is, the second wrong word string) is processed, the module finds that it is indeed identical to one of the second indexes, and therefore obtains the corresponding second correct word string, "taste all of". The module then sets the obtained second correct word string "taste all of" as the second candidate word, which corresponds to the second wrong word string and the second position in Fig. 3.
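The second-index matching of step S150 can be sketched as a dictionary lookup. Several things here are assumptions: the function name, the dictionary contents (stand-in Chinese strings, since the translation gives only English glosses), and the simplification of scanning only adjacent (N=0) word groups so that the position of each match is known.

```python
# Hypothetical wrong language database: second index -> second correct string.
WRONG_DB = {
    "尝扁": "尝遍",
    "尝片": "尝遍",
    "尝偏": "尝遍",
    "棕子": "粽子",
    "综子": "粽子",
}

def second_candidates(s, wrong_db):
    """Return (start, wrong string, candidate word) for each matching group.

    start is the 0-based position of the wrong word string in s.
    """
    result = []
    for i in range(len(s) - 1):
        group = s[i:i + 2]  # adjacent two-character word group
        if group in wrong_db:
            result.append((i, group, wrong_db[group]))
    return result
```

For the stand-in Str `"也因此尝扁各种酸甜苦腊"`, the only match is the group at 0-based position 3, which corresponds to the 4th and 5th characters named in the text.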
With the second candidate word produced, both a first candidate word and a second candidate word are now available, and step S160 processes them.
In step S160, the candidate sentence generating and grading module 160 produces a plurality of candidate sentences from the first wrong word string, the second wrong word string, the first candidate word and the second candidate word, and produces the optimal candidate sentence. To generate the candidate sentences, the module places either the wrong word string or the corresponding candidate word at the first position and at the second position of the Chinese character string Str, producing every possible combination. It then scores the candidate sentences with a candidate sentence scoring mechanism and sets the highest-scoring candidate sentence as the optimal candidate sentence.
Figure 4 shows all possible combinations of candidate sentences for the Chinese character string Str according to an embodiment of the invention. As shown in Fig. 4, the candidate sentence generating and grading module 160 can produce the following four candidate sentences from the first wrong word string, the second wrong word string, the first candidate word and the second candidate word: "it is cured also therefore to taste flat various sour-sweet hardship", "also therefore tasting the flat various life's joys and sorrows", "it is cured also therefore to taste all of various sour-sweet hardships" and "also therefore tasting all of the various life's joys and sorrows".
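The combination step can be sketched with `itertools.product`: each position keeps either the wrong word string or its candidate word. The function name and the tuple layout `(start, wrong, candidate)` are assumptions made for this sketch, and the Chinese strings in the usage note are illustrative stand-ins.

```python
from itertools import product

def candidate_sentences(s, corrections):
    """Generate every combination of wrong strings and candidate words.

    corrections is a list of (start, wrong, candidate) tuples, where start
    is the 0-based position of the wrong word string in s.
    """
    options = [(wrong, cand) for _, wrong, cand in corrections]
    sentences = []
    for choice in product(*options):
        text = s
        # Apply replacements right to left so earlier positions stay valid
        # even if a candidate word differs in length from the wrong string.
        pairs = sorted(zip(corrections, choice),
                       key=lambda pair: pair[0][0], reverse=True)
        for (start, wrong, _), replacement in pairs:
            text = text[:start] + replacement + text[start + len(wrong):]
        sentences.append(text)
    return sentences
```

With two corrections on the stand-in Str, for example `(7, "酸甜苦腊", "酸甜苦辣")` and `(3, "尝扁", "尝遍")`, the function yields four sentences: the original, the two partial corrections, and the fully corrected sentence.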
Figure 5 shows the scoring of candidate sentences according to an embodiment of the invention. The candidate sentence generating and grading module 160 scores the four candidate sentences using the usage frequency of the candidate sentence (PPL) together with the sentence similarity (SS), sound similarity (PS) and shape similarity (WS) between the candidate sentence and the Chinese character string Str (the original sentence). The usage frequency of a candidate sentence is given by a language model of a specific domain, for example medicine or astronomy. The three similarities are defined as follows, where Lc is the number of characters in the candidate sentence and Lo is the number of characters in the original sentence:
SS=(Lc-Dd)/Lo, where Dd is the number of characters that differ between the candidate sentence and the original sentence;
PS=(Lc-Dp)/Lo, where Dp is the number of characters that are not homophones between the candidate sentence and the original sentence;
WS=(Lc-Dw)/Lo, where Dw is the number of characters that are not similar in shape between the candidate sentence and the original sentence.
The candidate sentences are scored according to these four factors, and the score SCORE is computed as follows:
SCORE=w1*PPL+w2*SS+w3*PS+w4*WS
Here w1 is the weight of the usage frequency of the candidate sentence, w2 is the weight of the sentence similarity between the candidate sentence and the original sentence, w3 is the weight of their sound similarity, and w4 is the weight of their shape similarity. The usage frequency of a candidate sentence can also combine language models from several domains; according to Fig. 5, the usage frequency PPL can be calculated with the following formula:
PPL=(1-α)*PPL1+α*PPL2
where PPL1 represents the first language model and PPL2 represents the second language model.
The candidate sentences are scored with the above formulas, using the following parameters in the experiment:
α=0.6, w1=-0.0001, w2=1, w3=1, w4=1
Sentence D of Fig. 4, "also therefore tasting all of the various life's joys and sorrows", then obtains the highest score, so the candidate sentence generating and grading module 160 sets it as the optimal candidate sentence.
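The scoring formulas can be sketched directly from the text. The function names are hypothetical, and the defaults are the experimental parameters quoted above. The sentence similarity below assumes that the candidate and the original are compared position by position, which the source does not state explicitly; the sound and shape similarities need the homophone and similar-character databases and are passed in as precomputed numbers.

```python
def sentence_similarity(candidate, original):
    """SS: (candidate length - differing characters) / original length."""
    differing = sum(1 for a, b in zip(candidate, original) if a != b)
    return (len(candidate) - differing) / len(original)

def combined_ppl(ppl1, ppl2, alpha=0.6):
    """PPL = (1 - alpha) * PPL1 + alpha * PPL2."""
    return (1 - alpha) * ppl1 + alpha * ppl2

def score(ppl, ss, ps, ws, w1=-0.0001, w2=1.0, w3=1.0, w4=1.0):
    """SCORE = w1*PPL + w2*SS + w3*PS + w4*WS."""
    return w1 * ppl + w2 * ss + w3 * ps + w4 * ws
```

As a purely numerical illustration with made-up inputs, language model values of 100 and 200 combine to a PPL of about 160, and with all three similarities equal to 1 the score is about 2.984.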
Finally, in step S170, the article marking module 200 shows on the display device 210 the parts that were revised between the original sentence and the optimal candidate sentence.
The operation of the invention has been described in detail above. Note that the flow may be changed without departing from the spirit of the invention. For instance, the first candidate word need not be produced before the second candidate word; the two may be produced in the reverse order or at the same time.
In the embodiment above, the second wrong word string is "tasting flat", which consists of two characters. In other cases it may consist of more characters. For instance, consider the following Chinese character string: "wanting using delicious delicacies". Here "want to use" is in itself a correct expression, but when "want to use" and "delicacies" appear in the same sentence, "want to use" may be wrong: the correct wording is "enjoying delicious delicacies", and because "enjoy" and "want to use" are homophones, users easily write the homophonous but wrong word. The following embodiment of the invention addresses this problem.
In this embodiment, table three of the wrong language database 140 is retained and extended with a new parameter and new content, as shown in table four:
Table four: the data layout of wrong language database 140
Second index | Second correct word string | Context
tasting flat, tasting sheet, tasting partially | taste all of |
palm fibre son, combining son | pyramid-shaped dumpling |
Jia Jia | every household | in every family
want to use | enjoy | delicacies
In table four, the first and second rows are the original content, and the third and fourth rows are added in this embodiment. In the third and fourth rows, the second index corresponds not only to a second correct word string but also to a context. The above data are for illustration only and are not intended to limit the invention.
Consider again the Chinese character string "wanting using delicious delicacies". "Jia Jia" and "want to use" are each correct terms in themselves, unlike "tasting flat" and "combining son", which are wrong in themselves and for which the correct terms "taste all of" and "pyramid-shaped dumpling" can be found at once. Although "Jia Jia" and "want to use" are correct terms, they become wrong when a specific word string appears in the same sentence. In this embodiment, the invention therefore defines these specific word strings as contexts (the third column of table four) and stores them in advance in the wrong language database 140. The debugging steps of this embodiment are described below.
First, the Chinese character string "wanting using delicious delicacies" has the wrong word string "want to use" and is likewise cut into a plurality of word groups with the bi-gram mechanism; the cutting follows the same principle as table one and is not repeated here. The candidate word generation module 150 first determines whether the word group "want to use" is identical to a second index of table four. Because table four contains the second index "want to use", the module obtains the context corresponding to that second index, namely "delicacies". The module then determines whether the Chinese character string contains the obtained context ("delicacies"). If it does, "want to use" is a wrong word string; if it does not, "want to use" is a correct word string and the module goes on to process the other word groups. Because the Chinese character string does contain "delicacies", the module obtains the second correct word string ("enjoy") corresponding to the second index ("want to use") that is identical to the wrong word string, and sets it as the second candidate word.
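The context check of this embodiment can be sketched as follows: a second index only counts as a wrong word string when its stored context also appears in the same character string. The function name, the database layout and the Chinese stand-in strings are assumptions for illustration only.

```python
# Hypothetical context-aware entries:
# second index -> (second correct word string, required context).
CONTEXT_DB = {
    "想用": ("享用", "佳肴"),
}

def contextual_candidates(s, context_db):
    """Return (start, wrong string, candidate word) for context-confirmed hits."""
    result = []
    for index, (correct, context) in context_db.items():
        start = s.find(index)
        # The index alone is a correct term; it is treated as wrong only
        # when the stored context occurs in the same character string.
        if start != -1 and context in s:
            result.append((start, index, correct))
    return result
```

With the stand-in string `"想用美味佳肴"`, the index and its context both occur, so one candidate is produced; with `"想用电脑"`, the context is missing and no candidate is produced.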
In the present embodiment, the Chinese character string "want using delicious delicacies" contains only one wrong word string, "wanting to use", so only one candidate word is produced. Although the step above produces what is called the second candidate word, it is also the only candidate word. Those skilled in the art will appreciate that if a Chinese character string contains N wrong word strings, the present invention produces N candidate words and, from those N candidate words, produces 2^N candidate sentences (including the original sentence), one for each combination.
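The count of candidate sentences follows from each of the N wrong word strings being either replaced or left as it is. A minimal Python sketch of that enumeration is given below. The function name and the placeholder strings are hypothetical, the corrections are assumed to be distinct and non-overlapping, and the grading of the resulting sentences is not shown.

```python
from itertools import product


def candidate_sentences(sentence, corrections):
    """Enumerate every sentence obtained by applying or skipping each correction.

    corrections: list of (wrong_string, candidate_word) pairs, assumed to be
    distinct and non-overlapping within the sentence.
    """
    results = []
    # One True/False choice per correction gives 2**N combinations.
    for choices in product((False, True), repeat=len(corrections)):
        current = sentence
        for apply_it, (wrong, candidate) in zip(choices, corrections):
            if apply_it:
                current = current.replace(wrong, candidate, 1)
        results.append(current)
    return results


# Placeholder strings: "X" and "Y" stand for two wrong word strings.
sentences = candidate_sentences("aXbYc", [("X", "1"), ("Y", "2")])
print(len(sentences))  # 4 combinations for N = 2
print(sentences[0])    # the original sentence, with no correction applied
```

The first combination applies no correction, which is why the original sentence is always among the candidates.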
In addition, the Chinese article debugging method of the present invention may be recorded in the form of a program on a storage medium (for example an optical disc, a magnetic disk, or a removable hard drive) so as to carry out the flow described above. The program is essentially composed of a plurality of program code segments, whose functions correspond to the steps of the method and to the functional blocks of the system described above.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the invention. Any person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the appended claims.

Claims (14)

1. A Chinese article debugging method, suitable for processing a plurality of Chinese character strings, characterized in that the Chinese character strings are cut from a Chinese article according to punctuation marks and comprise a first Chinese character string, the method comprising:
cutting the first Chinese character string into a plurality of first word groups, wherein each first word group is formed by any two characters, consecutive or non-consecutive, in the first Chinese character string;
providing a database, wherein the database has a plurality of first correct word strings and a plurality of first indexes corresponding to the first correct word strings, and each first index is formed by any two characters, consecutive or non-consecutive, in the first correct word strings;
obtaining, according to the first word groups, the first indexes corresponding to the first word groups, and obtaining the corresponding first correct word strings according to the obtained first indexes;
producing an optimal candidate sentence according to the obtained first correct word strings; and
displaying the Chinese character strings and the optimal candidate sentence on a display device.
2. The Chinese article debugging method as claimed in claim 1, characterized in that the optimal candidate sentence is produced by replacing the first Chinese character string in the Chinese character strings with the obtained first correct word string.
3. The Chinese article debugging method as claimed in claim 1, characterized in that the Chinese character strings further comprise a second Chinese character string, the database further has a plurality of second indexes and a plurality of second correct word strings corresponding to the second indexes, and the method further comprises: cutting the second Chinese character string into a plurality of second word groups; obtaining, according to the second word groups, the second indexes corresponding to the second word groups; obtaining the corresponding second correct word strings according to the obtained second indexes; setting the obtained first correct word string as a first candidate word; and setting the second correct word string as a second candidate word.
4. The Chinese article debugging method as claimed in claim 3, characterized by further comprising judging whether the second Chinese character string is identical to any of the second indexes.
5. The Chinese article debugging method as claimed in claim 4, characterized in that, when the second Chinese character string is identical to one of the second indexes, the second correct word string corresponding to the second index identical to the second Chinese character string is obtained, and the obtained second correct word string is set as the second candidate word.
6. The Chinese article debugging method as claimed in claim 4, characterized in that the database further has a plurality of specific word strings corresponding to the second indexes, and, when the second Chinese character string is identical to one of the second indexes, the specific word string corresponding to the second index identical to the second Chinese character string is obtained, and it is judged whether the Chinese character strings contain the obtained specific word string.
7. The Chinese article debugging method as claimed in claim 6, characterized by further comprising, when the Chinese character strings contain the obtained specific word string, obtaining the second correct word string corresponding to the second index identical to the second Chinese character string, and setting the obtained second correct word string as the second candidate word.
8. A Chinese article debugging device, suitable for processing a plurality of Chinese character strings, characterized in that the Chinese character strings are cut from a Chinese article according to punctuation marks and comprise a first Chinese character string, the device comprising:
an article cutting module, which cuts the first Chinese character string into a plurality of first word groups, wherein each first word group is formed by any two characters, consecutive or non-consecutive, in the first Chinese character string;
a database, which has a plurality of first correct word strings and a plurality of first indexes corresponding to the first correct word strings, wherein each first index is formed by any two characters, consecutive or non-consecutive, in the first correct word strings;
a candidate word generation module, which obtains, according to the first word groups, the first indexes corresponding to the first word groups, and obtains the corresponding first correct word strings according to the obtained first indexes;
a candidate sentence generating and grading module, which produces an optimal candidate sentence according to the obtained first correct word strings; and
a display device, which displays the Chinese character strings and the optimal candidate sentence.
9. The Chinese article debugging device as claimed in claim 8, characterized in that the optimal candidate sentence is produced by replacing the first Chinese character string in the Chinese character strings with the obtained first correct word string.
10. The Chinese article debugging device as claimed in claim 8, characterized in that the Chinese character strings further comprise a second Chinese character string, the database further has a plurality of second indexes and a plurality of second correct word strings corresponding to the second indexes, and the candidate word generation module further cuts the second Chinese character string into a plurality of second word groups, obtains, according to the second word groups, the second indexes corresponding to the second word groups, obtains the corresponding second correct word strings according to the obtained second indexes, sets the obtained first correct word string as a first candidate word, and sets the second correct word string as a second candidate word.
11. The Chinese article debugging device as claimed in claim 10, characterized in that the candidate word generation module further judges whether the second Chinese character string is identical to any of the second indexes.
12. The Chinese article debugging device as claimed in claim 11, characterized in that, when the second Chinese character string is identical to one of the second indexes, the candidate word generation module obtains the second correct word string corresponding to the second index identical to the second Chinese character string, and sets the obtained second correct word string as the second candidate word.
13. The Chinese article debugging device as claimed in claim 11, characterized in that the database further has a plurality of specific word strings corresponding to the second indexes, and, when the second Chinese character string is identical to one of the second indexes, the candidate word generation module obtains the specific word string corresponding to the second index identical to the second Chinese character string, and judges whether the Chinese character strings contain the obtained specific word string.
14. The Chinese article debugging device as claimed in claim 13, characterized in that, when the Chinese character strings contain the obtained specific word string, the candidate word generation module obtains the second correct word string corresponding to the second index identical to the second Chinese character string, and sets the obtained second correct word string as the second candidate word.
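
The cutting and lookup steps recited in claims 1 and 8 can be illustrated with a short Python sketch. It is only one possible reading of the claim language, under stated assumptions: the function names are hypothetical, a word group is taken to be any two characters in their order of appearance, and the strings are ASCII placeholders rather than data from the patent.

```python
from collections import defaultdict
from itertools import combinations


def char_pairs(text):
    """Return every two-character group of text, consecutive or not,
    keeping the characters in their order of appearance."""
    return {a + b for a, b in combinations(text, 2)}


def build_index(correct_strings):
    """Map each two-character index to the correct word strings containing it."""
    index = defaultdict(set)
    for word in correct_strings:
        for pair in char_pairs(word):
            index[pair].add(word)
    return index


def lookup(text, index):
    """Collect the correct word strings reachable from any word group of text."""
    hits = set()
    for pair in char_pairs(text):
        hits |= index.get(pair, set())
    return hits


# "abcd" and "wxyz" stand for two correct word strings; "abxd" is a
# variant of "abcd" with one wrong character.
index = build_index(["abcd", "wxyz"])
print(lookup("abxd", index))
```

Because non-consecutive pairs are indexed as well, a string with one wrong character still shares several word groups with its correct form, which is what allows the correct word string to be retrieved.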
CN200810149253A 2008-09-22 2008-09-22 Chinese article debugging device, Chinese article debugging method Active CN101685438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810149253A CN101685438B (en) 2008-09-22 2008-09-22 Chinese article debugging device, Chinese article debugging method

Publications (2)

Publication Number Publication Date
CN101685438A CN101685438A (en) 2010-03-31
CN101685438B true CN101685438B (en) 2012-09-12

Family

ID=42048602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810149253A Active CN101685438B (en) 2008-09-22 2008-09-22 Chinese article debugging device, Chinese article debugging method

Country Status (1)

Country Link
CN (1) CN101685438B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1172992A (en) * 1996-06-25 1998-02-11 微软公司 Method and system for identifying and resolving commonly confused words in natural language parser
CN1755671A (en) * 2004-09-30 2006-04-05 北京大学 Automatic error correction method for query words in search engine
CN101206673A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Intelligent error correcting system and method in network searching process

Also Published As

Publication number Publication date
CN101685438A (en) 2010-03-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant