CN102323921A - Word-by-word sentence comparison method, system, computer program product and recording media - Google Patents

Info

Publication number
CN102323921A
Authority
CN
China
Prior art keywords
language
phrase
sentence
version
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110271090A
Other languages
Chinese (zh)
Inventor
陈淮琰
唐海波
郑建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Xian Co Ltd
Priority to CN201110271090A
Publication of CN102323921A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a word-by-word sentence comparison method, a system, a computer program product and a computer-readable recording medium. The method comprises the following steps: searching a first control word bank to judge whether a first language phrase is in the first control word bank; if the first language phrase is in the first control word bank, obtaining a group of second language definitions from the first control word bank; searching a second language phrase variation vocabulary to judge whether a specific second language definition is in the second language phrase variation vocabulary; if the specific second language definition is in the second language phrase variation vocabulary, obtaining a group of second language variation phrases from the second language phrase variation vocabulary; comparing a specific second language variation phrase with a second language sentence to judge whether the specific second language variation phrase is in the second language sentence; and if the specific second language variation phrase is in the second language sentence, marking the first language phrase and the specific second language variation phrase so as to show the correspondence between them. The invention gives accurate comparison results with high efficiency.

Description

Word-by-word sentence comparison method, system, computer program product and recording medium
Technical field
The present invention relates to a method, system, computer program product and computer-readable recording medium for word-by-word sentence comparison, and in particular to a method, system, computer program product and computer-readable recording medium for word-by-word sentence comparison that take synonyms into account.
Background art
In daily life it is often necessary to read, side by side, articles with the same content written in different languages, in order to confirm that the content is interpreted correctly. This raises the problem of how to compare sentences in different languages. A simple word-by-word comparison method for sentences in different languages would be a great help in understanding the content of an article. The most common case is the comparison of Chinese with English.
In the prior art, the comparison usually runs from Chinese to English: English contains a large number of phrases and idioms, so a comparison from English to Chinese gives lower accuracy. However, because Chinese and English differ in usage and grammar, even the Chinese-to-English path leaves a large number of phrases without a counterpart. A new word-by-word sentence comparison method is therefore needed to overcome the shortcomings of the prior art.
Summary of the invention
To solve the above technical problems in the background art, the present invention proposes a method, system, computer program product and computer-readable recording medium for word-by-word sentence comparison.
The technical solution of the present invention is a method of word-by-word sentence comparison, for comparing word by word a first language sentence and a second language sentence in different languages, so as to determine the correspondence between a plurality of first language phrases contained in the first language sentence and a plurality of second language phrases contained in the second language sentence. The method is characterized in that it comprises the following steps:
1) search the first contrast dictionary to judge whether a first language phrase among the plurality of first language phrases is present in the first contrast dictionary;
2) if the first language phrase is present in the first contrast dictionary, obtain from the first contrast dictionary the group of second language definitions corresponding to the first language phrase;
3) search the second language phrase version vocabulary to judge whether a specific second language definition in the group of second language definitions is present in the second language phrase version vocabulary;
4) if the specific second language definition is present in the second language phrase version vocabulary, obtain from the second language phrase version vocabulary the group of second language version phrases of the specific second language definition;
5) compare a specific second language version phrase in the group of second language version phrases with the second language sentence, to judge whether the specific second language version phrase matches a second language phrase among the plurality of second language phrases;
6) if the specific second language version phrase matches the second language phrase, mark the first language phrase and the second language phrase through a marking module controlled by a processor, to indicate that the two correspond.
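Steps 1) to 6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `align_phrases`, the use of plain dictionaries for the first contrast dictionary and the second language phrase version vocabulary, the lower-casing of lookup keys, the substring test used for step 5), and the sample entries are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def align_phrases(
    first_phrases: List[str],
    second_sentence: str,
    contrast_dict: Dict[str, List[str]],
    version_vocab: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (first language phrase, matched second language phrase) pairs."""
    marks: List[Tuple[str, str]] = []
    for phrase in first_phrases:
        # Steps 1-2: look the phrase up and fetch its group of definitions.
        definitions = contrast_dict.get(phrase.lower())
        if definitions is None:
            continue  # not in the contrast dictionary: go to the next phrase
        matched = None
        for definition in definitions:
            # Steps 3-4: fetch the version phrases of this definition.
            versions = version_vocab.get(definition)
            if versions is None:
                continue  # not in the version vocabulary: next definition
            # Step 5: compare each version phrase with the second sentence
            # (a plain substring test is assumed here).
            matched = next((v for v in versions if v in second_sentence), None)
            if matched is not None:
                break
        # Step 6: record ("mark") the correspondence.
        if matched is not None:
            marks.append((phrase, matched))
    return marks


# Hypothetical sample data, invented for this sketch.
contrast_dict = {"chinese": ["中国", "中华"], "people": ["人民"]}
version_vocab = {"中国": ["中國", "中国"], "人民": ["人民"]}
print(align_phrases(["Chinese", "people"], "中国人民很友好", contrast_dict, version_vocab))
```

The sketch stops at the first version phrase found in the sentence; the text above does not say whether the search continues after a hit, so that choice is also an assumption.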
The method further comprises step 7): if one of the second language phrases is marked as corresponding to two or more first language phrases, calculate a weight value for each of those first language phrases, to decide which first language phrase the second language phrase should correspond to.
The method further comprises: for a first language phrase and a second language phrase that correspond, analyzing whether the second language phrase is a combination of a stem and a modifier; if so, judging whether the second language phrase, with the modifier removed, has a new correspondence with the first language phrase; and if so, adding the stem of the second language phrase together with the first language phrase to a second contrast dictionary according to the new correspondence.
The first language sentence is an English sentence and the second language sentence is a Chinese sentence. The method is characterized in that it comprises: marking, through the marking module, any combination in the first language sentence that has one of the following specific patterns as a first language phrase:
A. article + (noun or adjective);
B. linking verb + (adjective or verb);
C. (verb or noun) + preposition; or
D. preposition + (verb or noun).
A computer program product storing a word-by-word sentence comparison program, characterized in that, after a computer loads and executes the program, the method of claim 1 is carried out.
A computer-readable recording medium storing a program, characterized in that, after a computer loads and executes the program, the method of claim 1 is carried out.
A system of word-by-word sentence comparison, for comparing word by word a first language sentence and a second language sentence in different languages, so as to determine the correspondence between a plurality of first language phrases contained in the first language sentence and a plurality of second language phrases contained in the second language sentence. The system is characterized in that it comprises: a first contrast dictionary, storing a plurality of groups of first language entries and a corresponding plurality of groups of second language definition data, each group of first language entries corresponding to one group of second language definition data; a second language phrase version vocabulary, storing a plurality of groups of second language entries and a corresponding plurality of groups of second language version phrase data, each group of second language entries corresponding to one group of second language version phrase data; a search module, connected to the first contrast dictionary and the second language phrase version vocabulary, for searching the first contrast dictionary to judge whether a first language phrase among the plurality of first language phrases is present among the first language entries of the first contrast dictionary and, if so, obtaining from the first contrast dictionary the group of second language definitions corresponding to the first language phrase, and for searching the second language phrase version vocabulary to judge whether a specific second language definition in the group of second language definitions is present among the second language entries of the second language phrase version vocabulary and, if so, obtaining from the second language phrase version vocabulary the group of second language version phrases of the specific second language definition; a comparison module, connected to the search module, for comparing a specific second language version phrase in the group of second language version phrases with the second language sentence, to judge whether the specific second language version phrase matches a second language phrase among the plurality of second language phrases; and a marking module, connected to the comparison module, for marking the first language phrase and the second language phrase, when the comparison module judges that the specific second language version phrase matches the second language phrase, to indicate that the two correspond.
The system further comprises a weight value calculation module: if one of the second language phrases is marked as corresponding to two or more first language phrases, the weight value calculation module calculates a weight value for each of those first language phrases, to decide which first language phrase the second language phrase should correspond to.
The system further comprises a secondary data mining module which, for a first language phrase and a second language phrase that correspond, analyzes whether the second language phrase is a combination of a stem and a modifier; if so, judges whether the second language phrase, with the modifier removed, has a new correspondence with the first language phrase; and if so, adds the stem of the second language phrase together with the first language phrase to a second contrast dictionary according to the new correspondence.
In the above system, the first language sentence is an English sentence and the second language sentence is a Chinese sentence, and the marking module marks any combination in the first language sentence that has one of the following specific patterns as a first language phrase:
A. article + (noun or adjective);
B. linking verb + (adjective or verb);
C. (verb or noun) + preposition; or
D. preposition + (verb or noun).
The present invention provides a method, system, computer program product and computer-readable recording medium for word-by-word sentence comparison that give accurate comparison results with high efficiency.
Description of drawings
Fig. 1.1 and Fig. 1.2 are schematic diagrams of the operating environment of the word-by-word sentence comparison system of the present invention;
Fig. 2 is a schematic diagram of the word-by-word sentence comparison system of the present invention;
Fig. 3 is a flow chart of the steps of the word-by-word sentence comparison method of the present invention;
Fig. 4.1 to Fig. 4.4 are schematic diagrams of the word-by-word comparison in a specific embodiment of the present invention;
Fig. 5 is a schematic diagram of the search process in a specific embodiment of the present invention;
Fig. 6.1 and Fig. 6.2 are schematic diagrams of an English sentence with specific patterns in a specific embodiment of the present invention;
Fig. 7.1 and Fig. 7.2 are schematic diagrams of a many-to-one relationship in a specific embodiment of the present invention;
Fig. 8 is a flow chart of the secondary mining steps in a specific embodiment of the present invention.
Reference numerals: 1, word-by-word sentence comparison system; 10, search module; 11, comparison module; 12, marking module; 13, weight value calculation module; 14, secondary data mining module; 15, first contrast dictionary; 16, second language phrase version vocabulary; 26, English-Chinese contrast dictionary; 261, Chinese definition 1; 262, Chinese definition 2; 27, Chinese phrase version vocabulary; 271, Chinese version phrase 1; 272, Chinese version phrase 2; 273, Chinese version phrase 3; 274, Chinese version phrase 4; 71, repeated English phrase 1; 72, repeated English phrase 2; 73, Chinese phrase; 74, farthest corresponding phrase; 80, electronic device; 81, display interface; 82, processor; 83, memory; 84, storage medium; 90, word-by-word sentence comparison program.
Embodiments
Referring to Fig. 1.1, the word-by-word sentence comparison system 1 of the present invention can be connected to an electronic device 80; it compares a first language sentence and a second language sentence word by word and shows the comparison result on display interface 81. The system 1 can also run directly on electronic device 80, as a software program or as a software program combined with hardware. Referring to Fig. 1.2, the word-by-word sentence comparison program 90 is stored in storage medium 84; processor 82 loads it into memory 83 and executes it, carrying out the word-by-word sentence comparison method of the present invention. In an embodiment of the present invention, electronic device 80 may be a tablet computer and display interface 81 the screen of the tablet computer, but the invention is not limited to this. It should also be noted that, in an embodiment of the present invention, storage medium 84 may be a hard disk drive; it may equally be any device that can store a computer program, such as a magnetic disk, a magnetic tape or an optical disc.
Referring to Fig. 2, the word-by-word sentence comparison system 1 of the present invention comprises search module 10, comparison module 11, marking module 12, weight value calculation module 13, secondary data mining module 14, first contrast dictionary 15 and second language phrase version vocabulary 16. It should be noted that, in an embodiment of the present invention, each of these modules may be configured as a hardware device, a software program or firmware, or alternatively as a circuit or in another suitable form; each module may be configured on its own or in combination with other modules.
In an embodiment of the present invention, first contrast dictionary 15 stores a plurality of groups of first language entries and a corresponding plurality of groups of second language definition data, each group of first language entries corresponding to one group of second language definition data; second language phrase version vocabulary 16 stores a plurality of groups of second language entries and a corresponding plurality of groups of second language version phrase data, each group of second language entries corresponding to one group of second language version phrase data. Search module 10 is connected to first contrast dictionary 15 and second language phrase version vocabulary 16. It searches first contrast dictionary 15 to judge whether a given first language phrase is present among the first language entries of first contrast dictionary 15; if so, it obtains from first contrast dictionary 15 the group of second language definitions corresponding to the first language phrase. It then searches second language phrase version vocabulary 16 to judge whether a specific second language definition is present among the second language entries of second language phrase version vocabulary 16; if so, it obtains from second language phrase version vocabulary 16 the group of second language version phrases of the specific second language definition. Comparison module 11 is electrically connected to search module 10 and compares a specific second language version phrase with the second language sentence, to judge whether the specific second language version phrase matches one of the plurality of second language phrases. If it does, marking module 12 marks the first language phrase and the second language phrase to indicate that the two correspond. In addition, marking module 12 can also mark a combination with a specific pattern in the first language sentence as a first language phrase; the process is described in detail later.
In an embodiment of the present invention, the word-by-word sentence comparison system 1 also comprises weight value calculation module 13. If a plurality of first language phrases are marked as corresponding to one second language phrase, weight value calculation module 13 calculates a weight value for each of those first language phrases, to decide which first language phrase the second language phrase should correspond to. The calculation is described in detail later. The system 1 also comprises secondary data mining module 14 which, for a first language phrase and a second language phrase that correspond, analyzes whether the second language phrase is a combination of a stem and a modifier; if so, it judges whether the second language phrase, with the modifier removed, has a new correspondence with the first language phrase; and if so, it adds the stem of the second language phrase together with the first language phrase to the second contrast dictionary according to this new correspondence. This process is also described in detail later. It should be noted that weight value calculation module 13 and secondary data mining module 14 are not essential components of the present invention; both can be omitted and the object of the invention is still achieved.
Referring to Fig. 3, Fig. 4.1 to Fig. 4.4 and Fig. 5, the word-by-word sentence comparison method of the present invention is explained in this embodiment with English sentence 60 as the first language sentence, Chinese sentence 50 as the second language sentence, English-Chinese contrast dictionary 26 as the first contrast dictionary, and Chinese phrase version vocabulary 27 as the second language phrase version vocabulary. It should be noted that the method is not limited to comparing Chinese and English sentences; any first language sentence and second language sentence in different languages can be compared word by word with this method. In addition, although the method is described below with reference to the word-by-word sentence comparison system 1 shown in Fig. 1.1, Fig. 1.2 and Fig. 2, the method is not limited to use in that system.
First, step 301 is carried out: obtain the Chinese sentence and the English sentence.
Referring to Fig. 4.1, the present invention first obtains Chinese sentence 50 and English sentence 60 shown on display interface 81, for use in the subsequent steps. How Chinese sentence 50 and English sentence 60 are obtained is not the focus of the present invention and is not described further here.
Next, step 302 is carried out: segment the English sentence and the Chinese sentence into a plurality of English phrases and a plurality of Chinese phrases, perform a preliminary word-by-word comparison, and mark the comparison result.
Referring to Fig. 4.2, before the word-by-word comparison the present invention first segments English sentence 60 into seven English phrases: English phrase 1 61, English phrase 2 62, English phrase 3 63, English phrase 4 64, English phrase 5 65, English phrase 6 66 and English phrase 7 67. Referring also to Fig. 4.3, Chinese sentence 50 is likewise segmented into a plurality of Chinese phrases, comprising Chinese phrase 1 51, Chinese phrase 2 52, Chinese phrase 3 53 and Chinese phrase 4 54, and a Chinese-to-English word-by-word comparison of Chinese sentence 50 and English sentence 60 is performed first, with the correspondences marked. How English sentence 60 and Chinese sentence 50 are segmented, and how the Chinese-to-English word-by-word comparison is performed, are covered by many existing techniques and are not described further here.
As Fig. 4.3 shows, a simple Chinese-to-English word-by-word comparison alone finds no corresponding English phrase for Chinese phrase 1 51 ("the Chinese people"). This is because the Chinese-English contrast dictionary used for the Chinese-to-English comparison has no entry for "the Chinese people", and, because Chinese word segmentation is highly ambiguous, the phrase cannot be split into the two words "China" and "people"; no corresponding English phrase can therefore be found. The method of the present invention aims to solve this kind of problem by adding a word-by-word comparison from English to Chinese, which raises the hit rate of the word-by-word comparison.
It should be noted that segmenting the Chinese and English sentences and performing the Chinese-to-English word-by-word comparison before step 303 is not a necessary step of the method; the present invention can omit this process and perform the English-to-Chinese word-by-word comparison directly.
Next, step 303 is carried out: search the English-Chinese contrast dictionary to judge whether a specific English phrase is present in it.
After the segmentation and preliminary comparison are complete, the present invention carries out step 303: for a specific English phrase, search module 10 searches English-Chinese contrast dictionary 26 to determine whether the specific English phrase is present in it. If it is, step 304 is carried out; otherwise processing of this English phrase ends and the next English phrase is processed. For ease of explanation, English phrase 1 61 ("Chinese") is used below as the specific English phrase.
If the specific English phrase is present in the English-Chinese contrast dictionary, step 304 is carried out: obtain from the English-Chinese contrast dictionary the group of Chinese definitions corresponding to the specific English phrase.
Referring to Fig. 5, after English phrase 1 61 ("Chinese") is confirmed to be present in English-Chinese contrast dictionary 26, search module 10 obtains its Chinese definitions from the dictionary: Chinese definition 1 261 (China) and Chinese definition 2 262 (China). The present invention does not limit how English-Chinese contrast dictionary 26 is searched or how the Chinese definitions are obtained; many existing methods can do this, so they are not described further here.
Next, step 305 is carried out: search the Chinese phrase version vocabulary to judge whether a specific Chinese definition is present in it.
Referring to Fig. 5, after the group of Chinese definitions of English phrase 1 61 ("Chinese") is obtained, search module 10 searches Chinese phrase version vocabulary 27 for one specific Chinese definition in the group, to judge whether that definition is present in Chinese phrase version vocabulary 27. If it is, step 306 is carried out; otherwise processing of this Chinese definition ends and the next Chinese definition is processed. For ease of explanation, Chinese definition 1 261 (China) is used below as the specific Chinese definition.
If the specific Chinese definition is present in Chinese phrase version vocabulary 27, step 306 is carried out: obtain from the Chinese phrase version vocabulary the group of Chinese version phrases of the specific Chinese definition.
After Chinese definition 1 261 (China) is confirmed to be present in Chinese phrase version vocabulary 27, search module 10 obtains its Chinese version phrases from the vocabulary: Chinese version phrase 1 271 (China), Chinese version phrase 2 272 (China), Chinese version phrase 3 273 (China) and Chinese version phrase 4 274 (China), which are respectively the traditional-character synonym, the simplified-character synonym, the traditional-character form and the simplified-character form of "China". The Chinese version phrases are not limited to these four kinds and may include other names for "China". The present invention does not limit how Chinese phrase version vocabulary 27 is searched or how the Chinese version phrases are obtained; many existing methods can do this, so they are not described further here.
Next, step 307 is carried out: compare a specific Chinese version phrase with the Chinese sentence, to judge whether the specific Chinese version phrase is present in the Chinese sentence.
This step is carried out by comparison module 11, which compares one specific Chinese version phrase in the group with Chinese sentence 50, to judge whether that phrase is present in Chinese sentence 50. If it is, step 308 is carried out; otherwise processing of this Chinese version phrase ends and the next Chinese version phrase is processed. For ease of explanation, Chinese version phrase 1 271 (China) is used below as the specific Chinese version phrase. How the comparison is performed is covered by many existing techniques and is not the focus of the present invention, so it is not described further here.
If the specific Chinese version phrase is present in Chinese sentence 50, step 308 is carried out: mark the specific English phrase and the specific Chinese version phrase, to indicate that the two correspond.
Referring to Fig. 4.4, after comparison module 11 confirms that Chinese version phrase 1 271 (China) is present in Chinese sentence 50, marking module 12 marks Chinese version phrase 1 271 (China) and English phrase 1 61 ("Chinese") to indicate that the two correspond. Once English phrase 1 61 ("Chinese") has been processed, the method goes on to the next English phrase: the same steps are carried out for English phrase 2 62 ("people"), "people" is found in Chinese sentence 50, and the correspondence between the two is marked. How the marking is performed, and in what form the correspondence is shown, are covered by many existing techniques and are not the focus of the present invention, so they are not described further here.
In addition, when the English sentence is segmented in step 302, if English sentence 60 contains a combination with a specific pattern, the present invention can also mark that combination as an English phrase through marking module 12. In an embodiment of the present invention, the specific patterns comprise:
article + (noun or adjective);
linking verb + (adjective or verb);
(verb or noun) + preposition; and
preposition + (verb or noun).
Referring to Fig. 6.1 and Fig. 6.2, the word-by-word comparison of the English sentence and the Chinese sentence gives the result shown in Fig. 6.1, in which "to" and "the" are not marked. However, "to accept" fits the pattern "preposition + (verb or noun)" and "the job" fits the pattern "article + (noun or adjective)", so marking module 12 can mark each of them as a new English phrase; the result after marking is shown in Fig. 6.2. How the part of speech of an English phrase is recognized is covered by many existing techniques and is not the focus of the present invention, so it is not described further here. This step is not a necessary step of the present invention, and omitting it does not affect the execution of the word-by-word sentence comparison method.
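The pattern-based marking can be illustrated with a short Python sketch. It assumes the words already carry coarse part-of-speech tags (the source leaves tagging to existing techniques); the tag names, the function name `merge_pattern_phrases` and the greedy left-to-right merging are assumptions of the sketch, since the text does not say how overlapping patterns are resolved.

```python
from typing import List, Tuple

# The four specific patterns, written as (first tag, second tag) pairs.
# The tag names themselves are illustrative assumptions.
PATTERNS = {
    ("article", "noun"), ("article", "adjective"),
    ("linking_verb", "adjective"), ("linking_verb", "verb"),
    ("verb", "preposition"), ("noun", "preposition"),
    ("preposition", "verb"), ("preposition", "noun"),
}


def merge_pattern_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Merge adjacent (word, tag) pairs that match a pattern into one phrase."""
    phrases: List[str] = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        # Greedy choice: merge with the next word whenever the tag pair matches.
        if i + 1 < len(tagged) and (tag, tagged[i + 1][1]) in PATTERNS:
            phrases.append(word + " " + tagged[i + 1][0])
            i += 2
        else:
            phrases.append(word)
            i += 1
    return phrases


# "to" is tagged as a preposition here, following the example in the text.
tagged = [("to", "preposition"), ("accept", "verb"), ("the", "article"), ("job", "noun")]
print(merge_pattern_phrases(tagged))  # ['to accept', 'the job']
```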
In addition, in step 308, when a plurality of English phrases are marked as corresponding to one Chinese phrase, the present invention can calculate a weight value for each of those English phrases through weight value calculation module 13, to decide the single English phrase to which the Chinese phrase should correspond. Referring to Fig. 7.1 and Fig. 7.2, the English sentence contains two occurrences of the English phrase "I", namely repeated English phrase 1 71 and repeated English phrase 2 72; either may correspond to the single Chinese phrase "I" in the Chinese sentence, namely Chinese phrase 73. Weight value calculation module 13 therefore calculates a weight value for each of repeated English phrase 1 71 and repeated English phrase 2 72, to decide which of them corresponds to Chinese phrase 73. The calculation relies on the local continuity of phrases: if two words that appear consecutively within a small range of the Chinese sentence also appear consecutively in the English sentence, then a third word appearing within that small range of the Chinese sentence is very likely to appear within the corresponding small range of the English sentence.
The detailed calculation process is as follows:
Calculate the Nmax value:
First, the Nmax value is calculated. It is the maximum distance between the rank of Chinese phrase 73 in the Chinese sentence and the ranks of the other Chinese phrases that already have a correspondence. Referring to Fig. 7.1, all Chinese phrases other than Chinese phrase 73 already have a correspondence, and the maximum distance occurs between Chinese phrase 73 (rank 2 in this embodiment) and the farthest corresponding phrase 74 (rank 5 in this embodiment). This distance is 3, so Nmax is 3.
For each Chinese phrase that already has a correspondence, calculate the magnification:
The magnification formula is: BaseThr(x) = 2^(Nmax - |nX - nX'|), where nX' is the rank of Chinese phrase 73 in the Chinese sentence (2 in this embodiment) and nX is the rank in the Chinese sentence of each Chinese phrase that already has a correspondence (1, 3, 4 and 5 in this embodiment). The following calculations are obtained:
Rank nX of Chinese phrase    |nX - nX'|     Nmax - |nX - nX'|    Magnification
1                            |1 - 2| = 1    3 - 1 = 2            4
3                            |3 - 2| = 1    3 - 1 = 2            4
4                            |4 - 2| = 2    3 - 2 = 1            2
5                            |5 - 2| = 3    3 - 3 = 0            1
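As a check on the table, the Nmax value and the magnifications can be reproduced with a short Python sketch. The function name magnifications and its parameters are illustrative assumptions; only the ranks (2 for Chinese phrase 73; 1, 3, 4 and 5 for the phrases that already have a correspondence) come from this embodiment.

```python
def magnifications(target_rank, matched_ranks):
    """Return (Nmax, {rank: magnification}) for the Chinese side.

    target_rank: rank nX' of the Chinese phrase still to be resolved.
    matched_ranks: ranks nX of Chinese phrases that already have a
    correspondence.
    """
    # Nmax is the largest rank distance to any already matched phrase.
    n_max = max(abs(rank - target_rank) for rank in matched_ranks)
    # BaseThr(x) = 2^(Nmax - |nX - nX'|): closer phrases get larger values.
    table = {
        rank: 2 ** (n_max - abs(rank - target_rank))
        for rank in matched_ranks
    }
    return n_max, table

n_max, base_thr = magnifications(2, [1, 3, 4, 5])
print(n_max)     # 3
print(base_thr)  # {1: 4, 3: 4, 4: 2, 5: 1}
```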
For each English phrase that already has a correspondence, calculate its distance weight relative to each repeated English phrase:
The distance weight formula is: DesStep(x) = DesLen - |Des(x) - Des(x')|
where DesLen is the number of English phrases, which is 6 in this example; x is the rank of a repeated English phrase (2 and 4 in this embodiment), and x' is the rank of each English phrase that already has a correspondence (1, 3, 5 and 6 in this embodiment). The following calculations are obtained:
Distance weights relative to repeated English phrase 1 (rank 2 in this embodiment):
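(Table provided as image BDA0000091947100000111 in the original publication: distance weights relative to repeated English phrase 1.)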
Distance weights relative to repeated English phrase 2 72 (rank 4 in this embodiment):
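(Table provided as image BDA0000091947100000112 in the original publication: distance weights relative to repeated English phrase 2 72.)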
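The distance weights for both repeated English phrases follow directly from the formula and the ranks stated in this embodiment. The Python sketch below is an illustration under that assumption; the function name distance_weights is hypothetical, and the printed values are what the formula yields for DesLen = 6, not a transcription of the table images.

```python
def distance_weights(des_len, repeated_rank, matched_ranks):
    """DesStep = DesLen - |Des(x) - Des(x')| for every English phrase that
    already has a correspondence, relative to one repeated English phrase."""
    return [des_len - abs(repeated_rank - rank) for rank in matched_ranks]

matched_english_ranks = [1, 3, 5, 6]  # ranks given in this embodiment

# Relative to repeated English phrase 1 (rank 2).
steps_phrase_1 = distance_weights(6, 2, matched_english_ranks)
# Relative to repeated English phrase 2 (rank 4).
steps_phrase_2 = distance_weights(6, 4, matched_english_ranks)

print(steps_phrase_1)  # [5, 5, 3, 2]
print(steps_phrase_2)  # [3, 5, 5, 4]
```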
For each repeated English phrase, calculate the weighted value, and select the one with the larger weighted value as the corresponding English phrase:
The weighted value formula is: DesThr(n) = ∑ (BaseThr(x) * DesStep(x))
With the above formula, the weighted value of each repeated English phrase can be calculated.
Weighted value of repeated English phrase 1 =
DesThr(1) = 4*5 + 4*5 + 2*3 + 1*2 = 48.
Weighted value of repeated English phrase 2 72 =
DesThr(3) = 4*3 + 4*5 + 2*5 + 1*4 = 46.
Because the weighted value of repeated English phrase 1 is higher, repeated English phrase 1 is selected as the corresponding English phrase; the final marking result is shown in Fig. 7.2.
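The selection can be summarized in a few lines of Python. This is a sketch only: the function name weighted_value and the dictionary keys are illustrative assumptions, and the magnifications and distance weights are taken as given from the sums shown in this embodiment.

```python
def weighted_value(magnification, distance_weight):
    """Sum of magnification * distance weight over all phrase pairs that
    already have a correspondence."""
    return sum(m * d for m, d in zip(magnification, distance_weight))

magnification = [4, 4, 2, 1]  # Chinese phrases at ranks 1, 3, 4, 5
candidates = {
    "repeated English phrase 1": [5, 5, 3, 2],
    "repeated English phrase 2": [3, 5, 5, 4],
}

scores = {
    name: weighted_value(magnification, weights)
    for name, weights in candidates.items()
}
# The candidate with the largest weighted value is kept.
best = max(scores, key=scores.get)

print(scores)  # {'repeated English phrase 1': 48, 'repeated English phrase 2': 46}
print(best)    # repeated English phrase 1
```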
Note that although this embodiment calculates the weighted value in the manner described above, the present invention is not limited to this manner; any calculation based on a similar concept can be used. In addition, the above steps are not necessary steps of the present invention and do not affect the execution of the word-by-word sentence comparison method.
In addition, the present invention can use the data secondary mining module 14 to perform secondary mining on existing dictionary data with the word-by-word sentence comparison method, increasing the amount of data stored in the dictionary. Referring to Fig. 8, for ease of explanation, the following uses the English phrase "beautiful" as the first language phrase, the Chinese phrase meaning "beautiful" as the second language phrase, and the Chinese-English contrast dictionary as the second contrast dictionary to explain the secondary mining flow of the present invention.
First, step 801 is performed: judge whether the part of speech of the English phrase is adjective and whether the Chinese phrase ends with " ".
First judge whether the part of speech of "beautiful" is adjective, and whether the Chinese phrase meaning "beautiful" ends with " ". Here the Chinese word meaning "beautiful" is regarded as the stem and " " is regarded as the modifier. If both judgments are yes, step 802 is performed. Note that many existing techniques can judge the part of speech of an English phrase and whether a Chinese phrase ends with " "; this is not the focus of the present invention and is not described further here.
If both are yes, step 802 is performed: judge whether the word length of the Chinese phrase after " " is removed exceeds one Chinese character.
If the result of step 801 is yes, continue to judge whether the word length of the Chinese phrase meaning "beautiful", after " " is removed, is greater than one Chinese character. If so, step 803 is performed.
If the judgment is yes, step 803 is performed: provide a transformed Chinese phrase, which is the Chinese phrase with " " removed.
Because in step 802 the word length of the Chinese phrase meaning "beautiful", after " " is removed, is still greater than one Chinese character, step 803 can be executed. After removing " " from the Chinese phrase, the judging module 41 provides the transformed Chinese phrase meaning "beautiful".
Next, step 804 is performed: search the Chinese-English natural vocabulary dictionary to judge whether the transformed Chinese phrase is present in it.
The Chinese-English natural vocabulary dictionary is then searched to judge whether the transformed Chinese phrase meaning "beautiful" is present in it; the Chinese-English natural vocabulary dictionary of the present invention comprises the set of all English and Chinese natural vocabulary. If so, step 805 is performed.
If the judgment is yes, step 805 is performed: add the transformed Chinese phrase and the English phrase to the Chinese-English contrast dictionary.
If the result of step 804 is yes, step 805 is performed: the transformed Chinese phrase and "beautiful" are added to the Chinese-English contrast dictionary, and the correspondence between them is marked.
Note that the above secondary mining steps 801 to 805 are not necessary steps of the present invention and do not affect the execution of the word-by-word sentence comparison method.
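Steps 801 to 805 can be sketched in Python as follows. Several things here are assumptions made only for illustration: the modifier is taken to be the Chinese particle "的", the example entry is taken to be "美丽的", the part-of-speech label "adj" is supplied by the caller, and the function name mine_entry and the dictionary layout are hypothetical. None of these details is confirmed by the text above.

```python
MODIFIER = "的"  # assumed modifier particle; the text above leaves it blank

def mine_entry(english, english_pos, chinese, natural_vocab, contrast_dict):
    """Try to add (english, stem of chinese) to the contrast dictionary.

    Returns True if a new pair was added, False if any check fails.
    """
    # Step 801: the English phrase must be an adjective and the Chinese
    # phrase must end with the modifier.
    if english_pos != "adj" or not chinese.endswith(MODIFIER):
        return False
    # Step 803 (computed early): the transformed phrase is the Chinese
    # phrase without the modifier.
    stem = chinese[: -len(MODIFIER)]
    # Step 802: the remaining word must be longer than one Chinese character.
    if len(stem) <= 1:
        return False
    # Step 804: the transformed phrase must exist in the natural vocabulary.
    if stem not in natural_vocab:
        return False
    # Step 805: record the new correspondence.
    contrast_dict.setdefault(english, set()).add(stem)
    return True

cn_en = {}
added = mine_entry("beautiful", "adj", "美丽的", {"美丽"}, cn_en)
print(added, cn_en)
```

A one-character stem is rejected at step 802, so an entry such as "好的" would not produce a new pair in this sketch.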

Claims (10)

1. A word-by-word sentence comparison method for comparing, word by word, a first language sentence and a second language sentence in different languages, so as to determine the correspondence between a plurality of first language phrases contained in the first language sentence and a plurality of second language phrases contained in the second language sentence, characterized in that the method comprises the following steps:
1) searching a first contrast dictionary to judge whether a first language phrase of the plurality of first language phrases is present in the first contrast dictionary;
2) if the first language phrase is present in the first contrast dictionary, obtaining from the first contrast dictionary a group of second language definitions to which the first language phrase corresponds;
3) searching a second language phrase variation vocabulary to judge whether a specific second language definition in the group of second language definitions is present in the second language phrase variation vocabulary;
4) if the specific second language definition is present in the second language phrase variation vocabulary, obtaining from the second language phrase variation vocabulary a group of second language variation phrases of the specific second language definition;
5) comparing a specific second language variation phrase in the group of second language variation phrases with the second language sentence to judge whether the specific second language variation phrase matches a second language phrase of the plurality of second language phrases;
6) if the specific second language variation phrase matches the second language phrase, marking the first language phrase and the second language phrase by means of a marking module controlled by a processor, to indicate that the two have a correspondence.
2. The word-by-word sentence comparison method according to claim 1, characterized in that the method comprises step 7): if one of the second language phrases is marked as having a correspondence with two or more first language phrases, calculating a weighted value for each of those first language phrases to decide the first language phrase to which the second language phrase should correspond.
3. The word-by-word sentence comparison method according to claim 1, characterized in that: for a first language phrase and a second language phrase having a correspondence, it is analyzed whether the second language phrase is a combination of a stem and a modifier; if so, it is judged whether the second language phrase, after the modifier is removed, has a new correspondence with the first language phrase; if so, the stem of the second language phrase and the first language phrase are added to a second contrast dictionary according to the new correspondence.
4. The word-by-word sentence comparison method according to claim 1, wherein the first language sentence is an English sentence and the second language sentence is a Chinese sentence, characterized in that the method comprises: marking, by means of the marking module, a combination in the first language sentence that has any one of the following specific patterns as a first language phrase:
A. article + (noun or adjective);
B. linking verb + (adjective or verb);
C. (verb or noun) + preposition; or
D. preposition + (verb or noun).
5. A computer program product storing a program for word-by-word sentence comparison, characterized in that: after a computer loads and executes the program, the method of claim 1 is carried out.
6. A computer-readable recording medium storing a program, characterized in that: after a computer loads and executes the program, the method of claim 1 is carried out.
7. A word-by-word sentence comparison system for comparing, word by word, a first language sentence and a second language sentence in different languages, so as to determine the correspondence between a plurality of first language phrases contained in the first language sentence and a plurality of second language phrases contained in the second language sentence, characterized in that the system comprises:
a first contrast dictionary, storing a plurality of groups of first language entries and a corresponding plurality of groups of second language definition data, wherein each group of first language entries corresponds to one group of second language definition data;
a second language phrase variation vocabulary, storing a plurality of groups of second language entries and a corresponding plurality of groups of second language variation phrase data, wherein each group of second language entries corresponds to one group of second language variation phrase data;
a search module, connected to the first contrast dictionary and the second language phrase variation vocabulary, for searching the first contrast dictionary to judge whether a first language phrase of the plurality of first language phrases is present in the groups of first language entries of the first contrast dictionary and, if so, obtaining from the first contrast dictionary a group of second language definitions to which the first language phrase corresponds; and for searching the second language phrase variation vocabulary to judge whether a specific second language definition in the group of second language definitions is present in the groups of second language entries of the second language phrase variation vocabulary and, if so, obtaining from the second language phrase variation vocabulary a group of second language variation phrases of the specific second language definition;
a comparison module, connected to the search module, for comparing a specific second language variation phrase in the group of second language variation phrases with the second language sentence to judge whether the specific second language variation phrase matches a second language phrase of the plurality of second language phrases; and
a marking module, connected to the comparison module, wherein if the comparison module judges that the specific second language variation phrase matches the second language phrase, the marking module marks the first language phrase and the second language phrase to indicate that the two have a correspondence.
8. The word-by-word sentence comparison system according to claim 7, characterized in that the system comprises a weighted value calculation module; if one of the second language phrases is marked as having a correspondence with two or more first language phrases, the weighted value calculation module calculates a weighted value for each of those first language phrases to decide the first language phrase to which the second language phrase should correspond.
9. The word-by-word sentence comparison system according to claim 7, characterized in that the system comprises a data secondary mining module which, for a first language phrase and a second language phrase having a correspondence, analyzes whether the second language phrase is a combination of a stem and a modifier; if so, judges whether the second language phrase, after the modifier is removed, has a new correspondence with the first language phrase; and if so, adds the stem of the second language phrase and the first language phrase to a second contrast dictionary according to the new correspondence.
10. The word-by-word sentence comparison system according to claim 7, characterized in that the first language sentence is an English sentence and the second language sentence is a Chinese sentence, and the marking module marks a combination in the first language sentence that has any one of the following specific patterns as a first language phrase:
A. article + (noun or adjective);
B. linking verb + (adjective or verb);
C. (verb or noun) + preposition; or
D. preposition + (verb or noun).
CN201110271090A 2011-09-16 2011-09-16 Word-by-word sentence comparison method, system, computer program product and recording media Pending CN102323921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110271090A CN102323921A (en) 2011-09-16 2011-09-16 Word-by-word sentence comparison method, system, computer program product and recording media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110271090A CN102323921A (en) 2011-09-16 2011-09-16 Word-by-word sentence comparison method, system, computer program product and recording media

Publications (1)

Publication Number Publication Date
CN102323921A true CN102323921A (en) 2012-01-18

Family

ID=45451665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110271090A Pending CN102323921A (en) 2011-09-16 2011-09-16 Word-by-word sentence comparison method, system, computer program product and recording media

Country Status (1)

Country Link
CN (1) CN102323921A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942188A (en) * 2013-01-22 2014-07-23 腾讯科技(深圳)有限公司 Method and device for identifying corpus languages
US9336197B2 (en) 2013-01-22 2016-05-10 Tencent Technology (Shenzhen) Company Limited Language recognition based on vocabulary lists

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120118