CN107066454A - Number and sequence number replacement method and system for machine translation - Google Patents

Info

Publication number
CN107066454A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710187175.1A
Other languages
Chinese (zh)
Inventor
海同舟
李明
王兴强
彭成超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Original Assignee
Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Application filed by Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Priority to CN201710187175.1A
Publication of CN107066454A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The invention discloses a number and sequence number replacement method for machine translation. By selecting a reference sentence, a reference translation, a sequence number table and a number table, the method replaces the sequence numbers and numbers in the source text with sequence numbers and numbers that follow the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the applicability and versatility of the translation. The invention also discloses a number and sequence number replacement system for machine translation, in which a selection unit selects the reference sentence, reference translation, sequence number table and number table, and a replacement unit replaces the sequence numbers and numbers in the source text with sequence numbers and numbers that follow the usage conventions of the translation. The system can likewise be used for multiple languages and usage contexts, improving the applicability and versatility of the translation.

Description

Number and sequence number replacement method and system for machine translation
Technical field
The present invention relates to the field of machine translation, and in particular to a number and sequence number replacement method and system for machine translation.
Background
As international cooperation grows, both Chinese and foreign companies need to translate large volumes of engineering, scientific research and legal documents when approving and carrying out international engineering and research projects. Many domestic engineering and research projects also need related documents translated for foreign investment, international bank loans and similar purposes. Such documents often contain large quantities of numbers and sequence numbers. Because languages differ, the way these numbers and sequence numbers are written often differs greatly, which makes them harder to translate.
Traditionally, numbers and sequence numbers are translated by hand, which is time-consuming, laborious and costly. When they are machine translated, a simple literal rendering is usually used, so the applicability of the translation is poor. If a dedicated translation method is built for one language in order to improve applicability, the method lacks versatility and multilingual translation becomes expensive.
Summary of the invention
The technical problem to be solved by the invention is that, when numbers and sequence numbers are machine translated, the applicability and versatility of the translation are poor and multilingual translation is costly. The aim is to provide a number and sequence number replacement method and system for machine translation that solve this problem.
The present invention is achieved through the following technical solutions:
The number and sequence number replacement method for machine translation comprises the following steps. S1: select from the semantic base the reference sentence most similar to the source sentence, and select the reference translation that matches the reference sentence. S2: judge whether the difference between the source sentence and the reference sentence is a sequence number or a number. S3: if the difference is a sequence number, replace the sequence number of the source sentence according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, replace the number of the source sentence according to the reference sentence, the reference translation and the number table.
In the prior art, numbers and sequence numbers are usually machine translated by simple literal rendering, so the applicability of the translation is poor. For example, legal provisions in English use sequence numbers of the form "(1)". When such provisions are machine translated, the sequence number is often carried over directly as "(1)", whereas Chinese legal provisions use the form "(一)" (the Chinese numeral for one). In addition, a translation method built for one language can only be used for that language, so its versatility is poor; translating numbers and sequence numbers for several languages requires several translation methods, and the translation cost is high.
When the invention is applied, the reference sentence most similar to the source sentence is first selected from the semantic base, together with the reference translation that matches it. It is then judged whether the difference between the source sentence and the reference sentence is a sequence number or a number. If it is a sequence number, the source sequence number is replaced according to the reference sentence, the reference translation and the sequence number table; if it is a number, the source number is replaced according to the reference sentence, the reference translation and the number table. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the source text with ones that follow the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the versatility of the translation.
Further, step S1 also includes the following sub-step: select from the semantic base the sentence that has the same length as the source sentence and the smallest WER value as the reference sentence most similar to the source sentence. The WER value is the minimum number of steps required to modify the source sentence into the reference sentence, and is computed as the Levenshtein distance.
When the invention is applied, the reference sentence is chosen among sentences of the same length as the source sentence, and the most similar one is found by computing WER values. Because the reference sentence has the same length as the source sentence, insertions and deletions of sentence content need not be considered during replacement, which ensures that the replacement position is accurate and improves the accuracy of the replaced translation.
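The selection sub-step can be illustrated with a minimal Python sketch. It assumes that sentences are already split into tokens and that "same length" means the same token count; the text does not say whether length is measured in words or characters. The function names `levenshtein` and `choose_reference` are hypothetical and not taken from the patent.

```python
from typing import Optional, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute, or keep on a match
        prev = curr
    return prev[-1]


def choose_reference(source: Sequence[str],
                     corpus: Sequence[Sequence[str]]) -> Optional[Sequence[str]]:
    """Return the equal-length corpus sentence with the smallest distance to source."""
    candidates = [s for s in corpus if len(s) == len(source)]
    if not candidates:
        return None  # no equal-length sentence is available
    return min(candidates, key=lambda s: levenshtein(source, s))
```

Restricting the candidates to equal-length sentences before computing the distance mirrors the stated rationale: the remaining differences can then be treated position by position as substitutions.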
Further, the invention also includes the following step: build the sequence number table and the number table. The sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
When the invention is applied, the sequence number table and the number table are built as needed before they are used. The sequence number table includes sequence number types and the sequence number value corresponding to each sequence number, and the number table includes number types and the number value corresponding to each number. This correspondence is clearly structured and improves the accuracy of replacement when sequence numbers and numbers are replaced.
Further, the sequence number replacement comprises the following steps. If the sequence numbers of the reference sentence, the reference translation and the source sentence all belong to sequence number types in the sequence number table, the sequence number value of the source sequence number is determined from the sequence number table, and the sequence number that corresponds to this value and to the sequence number type of the reference translation is looked up in the table and used to replace the source sequence number. If any of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, the source sequence number is left unmodified. If any of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table.
When the invention is applied, if the sequence numbers of the reference sentence, the reference translation and the source sentence all belong to sequence number types in the sequence number table, the source sequence number is replaced using the sequence number value of the source text and the sequence number type of the reference translation. This way of replacing both follows the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate. If any of the three sequence numbers does not belong to a type in the table and all three are multi-level Arabic numerals, no replacement is made. Multi-level Arabic numerals take forms such as "3.1.1" and "2-3-1"; they usually appear in documents as heading numbers, and international usage for such heading numbers is fairly uniform, so the source sequence number is left unmodified and used directly. If any of the three sequence numbers does not belong to a type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the table. The expanded table is more general, and with each expansion it can serve more languages and fields of application, which further improves the versatility of the invention.
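The three cases of the sequence number replacement can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: markers are assumed to have been stripped of brackets and punctuation already, the table holds only a few sample rows with hypothetical type labels, and returning the unchanged marker together with an `"expand_table_requested"` flag is an assumed way of modelling the expansion request.

```python
import re
from typing import Dict, List, Optional, Tuple

# Sample table (assumption): one row of markers per sequence number type.
VALUES: List[int] = [1, 2, 3, 4, 20]
TABLE: Dict[str, List[str]] = {
    "arabic": ["1", "2", "3", "4", "20"],
    "lower_letter": ["a", "b", "c", "d", "t"],
    "upper_roman": ["I", "II", "III", "IV", "XX"],
    "lower_roman": ["i", "ii", "iii", "iv", "xx"],
}
# Multi-level Arabic numerals such as "3.1.1" or "2-3-1".
MULTI_LEVEL = re.compile(r"[0-9]+(?:[.-][0-9]+)+")


def find(marker: str) -> Optional[Tuple[str, int]]:
    """Return (type label, value) if the marker is in the table, else None."""
    for type_name, markers in TABLE.items():
        if marker in markers:
            return type_name, VALUES[markers.index(marker)]
    return None


def replace_sequence_number(source: str, reference: str,
                            reference_translation: str) -> Tuple[str, str]:
    """Return (resulting marker, action taken)."""
    markers = (reference, reference_translation, source)
    hits = [find(m) for m in markers]
    if all(hit is not None for hit in hits):
        target_type = hits[1][0]  # type used by the reference translation
        value = hits[2][1]        # value carried by the source marker
        return TABLE[target_type][VALUES.index(value)], "replaced"
    if all(MULTI_LEVEL.fullmatch(m) for m in markers):
        return source, "unchanged"  # heading-style numbers are kept as-is
    return source, "expand_table_requested"
```

For instance, with a reference sentence numbered "1" whose reference translation is numbered "I", a source marker "2" would come out as "II".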
Further, the number replacement comprises the following steps. If the numbers of the reference sentence, the reference translation and the source sentence all belong to number types in the number table, the number value of the source number is determined from the number table, and the number that corresponds to this value and to the number type of the reference translation is looked up in the table and used to replace the source number. If any of the three numbers does not belong to a number type in the number table, a request is made to expand the number types of the number table.
When the invention is applied, if the numbers of the reference sentence, the reference translation and the source sentence all belong to number types in the number table, the source number is replaced using the number value of the source text and the number type of the reference translation. This way of replacing both follows the usage conventions for numbers in the translation and keeps the translated number accurate. If any of the three numbers does not belong to a number type in the table, a request is made to expand the number types of the table. The expanded table is more general, and with each expansion it can serve more languages and fields of application, which further improves the versatility of the invention.
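A hedged sketch of the number replacement follows. Instead of an enumerated number table, it models each number type as a parse/format pair so that values outside a small sample table can be handled; this substitution is an assumption made for illustration, as are the type labels `"plain"` and `"grouped"` and the `"expand_table_requested"` flag.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def parse_plain(text: str) -> Optional[int]:
    """Digits only, e.g. "10000"."""
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


def parse_grouped(text: str) -> Optional[int]:
    """Digits with thousands separators, e.g. "10,000"."""
    if re.fullmatch(r"[0-9]{1,3}(?:,[0-9]{3})+", text):
        return int(text.replace(",", ""))
    return None


# Each number type pairs a parser with a formatter (assumed representation).
NUMBER_TYPES: Dict[str, Tuple[Callable[[str], Optional[int]],
                              Callable[[int], str]]] = {
    "plain": (parse_plain, str),
    "grouped": (parse_grouped, lambda value: format(value, ",")),
}


def classify(text: str) -> Optional[Tuple[str, int]]:
    """Return (type label, value) for a known number type, else None."""
    for name, (parse, _) in NUMBER_TYPES.items():
        value = parse(text)
        if value is not None:
            return name, value
    return None


def replace_number(source: str, reference: str,
                   reference_translation: str) -> Tuple[str, str]:
    """Return (resulting number text, action taken)."""
    hits = [classify(t) for t in (reference, reference_translation, source)]
    if any(hit is None for hit in hits):
        return source, "expand_table_requested"
    target_type = hits[1][0]  # type used by the reference translation
    value = hits[2][1]        # value carried by the source number
    return NUMBER_TYPES[target_type][1](value), "replaced"
```

Under these assumptions, a source number "3,000,000" with reference "10,000,000" and reference translation "10000000" would be rewritten as "3000000".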
Further, sequence number types are added to the sequence number table as the language and field of application change, and number types are added to the number table as the language and field of application change.
When the invention is applied and the language or field of application changes, sequence number types and number types can be added to the sequence number table and the number table accordingly, which further improves the versatility of the invention.
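Expanding a table can be pictured as appending a new row of markers aligned with the existing values. The sketch below assumes the table is a dictionary from a type label to a list of markers; the function name `add_sequence_type` and the validation rules are hypothetical.

```python
from typing import Dict, List


def add_sequence_type(table: Dict[str, List[str]], values: List[int],
                      name: str, markers: List[str]) -> None:
    """Add a new sequence number type whose markers line up with values."""
    if name in table:
        raise ValueError(f"type {name!r} already exists")
    if len(markers) != len(values):
        raise ValueError("one marker is required per sequence number value")
    table[name] = list(markers)  # store a copy so later edits do not leak in
```

Checking that every value has a marker keeps the value-to-marker correspondence intact after each expansion.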
A number and sequence number replacement system for machine translation, characterized by comprising: a selection unit for selecting from the semantic base the reference sentence most similar to the source sentence and selecting the reference translation that matches the reference sentence; a judging unit for judging whether the difference between the source sentence and the reference sentence is a sequence number or a number; and a replacement unit for replacing the source sequence number according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and for replacing the source number according to the reference sentence, the reference translation and the number table when the difference is a number.
In the prior art, numbers and sequence numbers are usually machine translated by simple literal rendering, so the applicability of the translation is poor. For example, legal provisions in English use sequence numbers of the form "(1)". When such provisions are machine translated, the sequence number is often carried over directly as "(1)", whereas Chinese legal provisions use the form "(一)" (the Chinese numeral for one). In addition, a translation system built for one language can only be used for that language, so its versatility is poor; translating numbers and sequence numbers for several languages requires several translation systems, and the translation cost is high.
When the invention is applied, the selection unit selects from the semantic base the reference sentence most similar to the source sentence, together with the reference translation that matches it. The judging unit then judges whether the difference between the source sentence and the reference sentence is a sequence number or a number, and the replacement unit performs the replacement according to the result of the judging unit. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the source text with ones that follow the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the versatility of the translation.
Further, the selection unit selects from the corpus the sentence that has the same length as the source sentence and the smallest WER value as the reference sentence most similar to the source sentence. The WER value is the minimum number of steps required to modify the source sentence into the reference sentence, and is computed as the Levenshtein distance.
When the invention is applied, the selection unit chooses the reference sentence among sentences of the same length as the source sentence and finds the most similar one by computing WER values. Because the reference sentence has the same length as the source sentence, insertions and deletions of sentence content need not be considered during replacement, which ensures that the replacement position is accurate and improves the accuracy of the replaced translation.
Further, the invention also includes a building unit for building, storing and modifying the sequence number table and the number table. The sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
When the invention is applied, the sequence number table built by the building unit includes sequence number types and the sequence number value corresponding to each sequence number, and the number table includes number types and the number value corresponding to each number. This correspondence is clearly structured and improves the accuracy of replacement when sequence numbers and numbers are replaced.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. In the number and sequence number replacement method for machine translation, a reference sentence, reference translation, sequence number table and number table are selected, and the sequence numbers and numbers in the source text are replaced with ones that follow the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the applicability and versatility of the translation;
2. In the method, the reference sentence is chosen among sentences of the same length as the source sentence, and the most similar one is found by computing WER values. Because the lengths match, insertions and deletions of sentence content need not be considered during replacement, which ensures that the replacement position is accurate and improves the accuracy of the replaced translation;
3. In the method, the correspondence stored in the sequence number table and the number table is clearly structured, which improves the accuracy of replacement when sequence numbers and numbers are replaced;
4. In the method, the way of replacing both follows the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate;
5. In the method, the sequence number table and the number table can serve more languages and fields of application with each expansion, which further improves the versatility of the invention;
6. In the method, when the language or field of application changes, sequence number types and number types can be added to the sequence number table and the number table accordingly, which further improves the versatility of the invention;
7. In the number and sequence number replacement system for machine translation, the selection unit selects the reference sentence, reference translation, sequence number table and number table, and the replacement unit replaces the sequence numbers and numbers in the source text with ones that follow the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the applicability and versatility of the translation;
8. In the system, the selection unit chooses the reference sentence among sentences of the same length as the source sentence and finds the most similar one by computing WER values. Because the lengths match, insertions and deletions of sentence content need not be considered during replacement, which ensures that the replacement position is accurate and improves the accuracy of the replaced translation;
9. In the system, the correspondence stored in the sequence number table and the number table built by the building unit is clearly structured, which improves the accuracy of replacement when sequence numbers and numbers are replaced.
Brief description of the drawings
The accompanying drawings described here provide further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic diagram of the steps of the method of the invention;
Fig. 2 is a schematic diagram of the structure of the system of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1
As shown in Fig. 1, the number and sequence number replacement method for machine translation comprises the following steps. S1: select from the semantic base the reference sentence most similar to the source sentence, and select the reference translation that matches the reference sentence. S2: judge whether the difference between the source sentence and the reference sentence is a sequence number or a number. S3: if the difference is a sequence number, replace the sequence number of the source sentence according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, replace the number of the source sentence according to the reference sentence, the reference translation and the number table.
When this embodiment is implemented, the reference sentence most similar to the source sentence is first selected from the semantic base, together with the reference translation that matches it. It is then judged whether the difference between the source sentence and the reference sentence is a sequence number or a number. If it is a sequence number, the source sequence number is replaced according to the reference sentence, the reference translation and the sequence number table; if it is a number, the source number is replaced according to the reference sentence, the reference translation and the number table. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the source text with ones that follow the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table can be changed as the language and usage context change, the invention can be used for multiple languages and usage contexts, which improves the versatility of the translation.
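Step S2 can be sketched as a position-by-position comparison of two equal-length token lists. The text does not specify how a sequence number or a number is recognized, so the two regular expressions below are assumptions chosen for illustration, as are the labels `"sequence_number"`, `"number"` and `"other"` and the English tokens in the usage example.

```python
import re
from typing import List, Sequence, Tuple

# Assumed surface forms: "1)", "(b)", "(iv)" for sequence numbers,
# "3000", "3,000,000" or "2.5" for numbers.
SEQ = re.compile(r"\(?(?:[0-9]+|[A-Za-z]|[ivx]+|[IVX]+)\)")
NUM = re.compile(r"[0-9]+(?:,[0-9]{3})*(?:\.[0-9]+)?")


def classify_token(token: str) -> str:
    if SEQ.fullmatch(token):
        return "sequence_number"
    if NUM.fullmatch(token):
        return "number"
    return "other"


def classify_differences(source: Sequence[str],
                         reference: Sequence[str]) -> List[Tuple[int, str]]:
    """List (position, kind) for every position where the sentences differ."""
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same length")
    result = []
    for i, (s, r) in enumerate(zip(source, reference)):
        if s != r:
            kind_s, kind_r = classify_token(s), classify_token(r)
            # A difference counts as replaceable only if both sides agree on the kind.
            result.append((i, kind_s if kind_s == kind_r else "other"))
    return result


source = ["B)", "Lee", "invests", "3,000,000", "yuan"]
reference = ["1)", "Lee", "invests", "10,000,000", "yuan"]
print(classify_differences(source, reference))
```

Positions labelled `"sequence_number"` would be handed to the sequence number replacement and positions labelled `"number"` to the number replacement.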
Embodiment 2
As shown in Fig. 1, this embodiment builds on Embodiment 1, and step S1 also includes the following sub-step: select from the semantic base the sentence that has the same length as the source sentence and the smallest WER value as the reference sentence most similar to the source sentence. The WER value is the minimum number of steps required to modify the source sentence into the reference sentence, and is computed as the Levenshtein distance.
When this embodiment is implemented, the reference sentence is chosen among sentences of the same length as the source sentence, and the most similar one is found by computing WER values. Because the reference sentence has the same length as the source sentence, insertions and deletions of sentence content need not be considered during replacement, which ensures that the replacement position is accurate and improves the accuracy of the replaced translation.
Embodiment 3
This embodiment builds on Embodiment 1 and further includes the following step: build the sequence number table and the number table. The sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
When this embodiment is implemented, the sequence number table and the number table are built as needed before they are used. The sequence number table includes sequence number types and the sequence number value corresponding to each sequence number, and the number table includes number types and the number value corresponding to each number. This correspondence is clearly structured and improves the accuracy of replacement when sequence numbers and numbers are replaced.
The sequence number table built is as follows:
Table 1: Sequence number table
Sequence number value    1        2        3        4        20
Sequence number type     1        2        3        4        20
Sequence number type     One      Two      Three    Four     20
Sequence number type     a        b        c        d        t
Sequence number type     I        II       III      IV       XX
Sequence number type     First    Second   Third    Fourth   Twentieth
Sequence number type     i        ii       iii      iv       xx
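One possible in-memory form of the sequence number table is sketched below. The type labels are hypothetical, and the row that reads "One Two Three Four 20" in this text is left out because its original form cannot be recovered with certainty from the translation. A reverse index from marker to type and value makes conversion between types a pair of lookups.

```python
from typing import Dict, List, Optional, Tuple

SEQUENCE_VALUES: List[int] = [1, 2, 3, 4, 20]
SEQUENCE_TABLE: Dict[str, List[str]] = {
    "arabic": ["1", "2", "3", "4", "20"],
    "lower_letter": ["a", "b", "c", "d", "t"],
    "upper_roman": ["I", "II", "III", "IV", "XX"],
    "ordinal_word": ["First", "Second", "Third", "Fourth", "Twentieth"],
    "lower_roman": ["i", "ii", "iii", "iv", "xx"],
}

# Reverse index: marker text -> (type label, sequence number value).
MARKER_INDEX: Dict[str, Tuple[str, int]] = {
    marker: (type_name, value)
    for type_name, row in SEQUENCE_TABLE.items()
    for marker, value in zip(row, SEQUENCE_VALUES)
}


def convert(marker: str, target_type: str) -> Optional[str]:
    """Render the value of marker in target_type, or None if either is unknown."""
    hit = MARKER_INDEX.get(marker)
    if hit is None or target_type not in SEQUENCE_TABLE:
        return None
    _, value = hit
    return SEQUENCE_TABLE[target_type][SEQUENCE_VALUES.index(value)]
```

The reverse index relies on every marker belonging to exactly one type, which holds for these sample rows but would need an explicit disambiguation rule in a fuller table.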
The number table built is as follows:
Table 2: Number table
Number value    1      2      100    1000     10000
Number type     1      2      100    1000     10000
Number type                          1,000    10,000
Number type     One    Two    100    1000     10000
Number type                   100    1000     10000
Embodiment 4
This embodiment builds on Embodiment 3, and the sequence number replacement comprises the following steps. If the sequence numbers of the reference sentence, the reference translation and the source sentence all belong to sequence number types in the sequence number table, the sequence number value of the source sequence number is determined from the sequence number table, and the sequence number that corresponds to this value and to the sequence number type of the reference translation is looked up in the table and used to replace the source sequence number. If any of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, the source sequence number is left unmodified. If any of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table.
When this embodiment is implemented, if the sequence numbers of the reference sentence, the reference translation and the source sentence all belong to sequence number types in the sequence number table, the source sequence number is replaced using the sequence number value of the source text and the sequence number type of the reference translation. This way of replacing both follows the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate. If any of the three sequence numbers does not belong to a type in the table and all three are multi-level Arabic numerals, no replacement is made. Multi-level Arabic numerals take forms such as "3.1.1" and "2-3-1"; they usually appear in documents as heading numbers, and international usage for such heading numbers is fairly uniform, so the source sequence number is left unmodified and used directly. If any of the three sequence numbers does not belong to a type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the table. The expanded table is more general, and with each expansion it can serve more languages and fields of application, which further improves the versatility of the invention.
Embodiment 5
This embodiment builds on embodiment 3. Number replacement comprises the following steps: if the reference sentence number, the reference translation number and the original sentence number all belong to number types in the number table, the number value of the original sentence number is calculated from the number table, and the corresponding number is found in the number table according to that value and the number type of the reference translation and replaces the original sentence number; if any one of the three does not belong to a number type in the number table, a request is made to expand the number types of the number table.
When this embodiment is implemented, if the reference sentence number, the reference translation number and the original sentence number all belong to number types in the number table, the original sentence number is replaced according to the number value of the original text and the number type of the reference translation. This replacement both follows the number conventions of the target language and ensures that the translated number is accurate. If any one of the three does not belong to a number type in the number table, a request is made to expand the number types of the number table. Each expansion lets the number table cover more languages and application fields, which further improves the generality of the present invention.
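Number replacement can be sketched in the same way. In the following Python sketch the number table is assumed to hold, for each number type, a pattern that recognizes it, a parser that yields the number value, and a formatter that writes a value in that type; the two types shown (`plain` and `comma_grouped`) and the exception name are hypothetical.

```python
import re

# Hypothetical number table: type name -> (pattern, parser to int, formatter from int).
NUMBER_TABLE = {
    "plain": (re.compile(r"\d+"), int, str),
    "comma_grouped": (
        re.compile(r"\d{1,3}(?:,\d{3})+"),
        lambda s: int(s.replace(",", "")),
        lambda n: f"{n:,}",
    ),
}


class TableExpansionRequest(Exception):
    """Raised when a number type is missing from the number table."""


def number_type(token):
    """Return the number type of token, or None if no type in the table matches."""
    for type_name, (pattern, _parse, _fmt) in NUMBER_TABLE.items():
        if pattern.fullmatch(token):
            return type_name
    return None


def replace_number(original, reference, reference_translation):
    tokens = (original, reference, reference_translation)
    types = [number_type(t) for t in tokens]
    if any(t is None for t in types):
        # At least one number is of an unknown type: request a table expansion.
        raise TableExpansionRequest(tokens)
    parse = NUMBER_TABLE[types[0]][1]    # value comes from the original number
    fmt = NUMBER_TABLE[types[2]][2]      # type comes from the reference translation
    return fmt(parse(original))
```

Under these assumptions, `replace_number("3000000", "10000000", "10,000,000")` returns `"3,000,000"`, because the value is read from the original number and written in the type used by the reference translation.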
Embodiment 6
This embodiment builds on embodiments 1 to 5:
The original sentence is: b) "Lee X" invests 3,000,000 yuan in X Investment Center (limited partnership)
The reference sentence is chosen as: 1) "Lee X" invests 10,000,000 yuan in X Investment Center (limited partnership)
The reference translation is chosen as: I) LI X invested RMB 10,000,000 to X Investment Center (limited partnership)
The WER value is calculated and is 2;
It is determined whether each difference between the original sentence and the reference sentence is a sequence number or a number: the original sentence and the reference sentence differ in two places. The first difference is at "1" of the reference sentence and the second is at "10,000,000" of the reference sentence; the first is judged to be a sequence number and the second a number;
According to table 1, for the first difference, the sequence number types of "b" of the original sentence, "1" of the reference sentence and "I" of the reference translation are all in table 1. The sequence number value corresponding to "b" of the original sentence is 2; referring again to table 1, the replaced sequence number "II" is obtained according to the sequence number type of the reference translation.
According to table 2, for the second difference, the number types of "3,000,000" of the original sentence, "10,000,000" of the reference sentence and "10,000,000" of the reference translation are all in table 2. The number value corresponding to "3,000,000" of the original sentence is 3000000; referring again to table 2, the replaced number "3,000,000" is obtained according to the number type of the reference translation.
The complete sentence after replacement is: II) LI X invested RMB 3,000,000 to X Investment Center (limited partnership).
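The steps of this embodiment can be traced with a short Python sketch. The tokenization below is hypothetical (an English gloss split by hand), and the positions of the sequence number and the number within the reference translation are supplied by hand, since how those positions are located is not described in this passage.

```python
def differing_positions(original_tokens, reference_tokens):
    """Return the indices at which two equal-length token lists differ."""
    if len(original_tokens) != len(reference_tokens):
        raise ValueError("sentences must be of equal length")
    return [
        i
        for i, (o, r) in enumerate(zip(original_tokens, reference_tokens))
        if o != r
    ]


# Hypothetical tokenization of the three sentences in this embodiment.
original = ["b", ")", "Lee X", "invests", "X Investment Center", "3,000,000", "yuan"]
reference = ["1", ")", "Lee X", "invests", "X Investment Center", "10,000,000", "yuan"]
reference_translation = [
    "I", ")", "LI X", "invested", "RMB", "10,000,000", "to",
    "X Investment Center", "(limited partnership)",
]

# Two positions differ, which agrees with the WER value of 2 stated above.
diffs = differing_positions(original, reference)

# Replacement values stated in the text, keyed by their (assumed) position
# in the reference translation.
replacements = {0: "II", 5: "3,000,000"}
result = [replacements.get(i, tok) for i, tok in enumerate(reference_translation)]
```

Here `diffs` holds the positions of the sequence number and of the number, and `result` is the reference translation with "I" replaced by "II" and "10,000,000" replaced by "3,000,000".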
Embodiment 7
This embodiment builds on embodiments 1 to 5:
The original sentence is: 2.1.1. Liabilities for Breach of Contract
The reference sentence is chosen as: 3.1.2. Liabilities for Breach of Contract
The reference translation is chosen as: 3.1.2, Application of law and resolution of disputes
The WER value is calculated and is 1;
It is determined whether each difference between the original sentence and the reference sentence is a sequence number or a number: the original sentence and the reference sentence differ in one place, at "3.1.2" of the reference sentence, which is judged to be a sequence number;
According to table 1, the sequence number types of "2.1.1" of the original sentence, "3.1.2" of the reference sentence and "3.1.2" of the reference translation are not in table 1, and all three are multi-level Arabic numerals, so the sequence number "2.1.1" of the original sentence is applied directly;
The complete sentence is: 2.1.1, Application of law and resolution of disputes.
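Carrying a multi-level Arabic numeral over unchanged can be sketched as follows. The regular expression and the function names are assumptions for illustration; the sketch only handles a numeral at the very start of the sentence.

```python
import re

# A leading multi-level Arabic numeral such as "2.1.1" or "2-3-1".
LEADING_MULTILEVEL = re.compile(r"^(\d+(?:[.-]\d+)+)")


def leading_multilevel(sentence):
    """Return the multi-level numeral at the start of sentence, or None."""
    match = LEADING_MULTILEVEL.match(sentence)
    return match.group(1) if match else None


def carry_over(original, reference_translation):
    """Put the original sentence's heading number onto the reference translation."""
    source_number = leading_multilevel(original)
    target_number = leading_multilevel(reference_translation)
    if source_number is None or target_number is None:
        # Nothing to carry over; leave the reference translation untouched.
        return reference_translation
    return source_number + reference_translation[len(target_number):]
```

For example, `carry_over("2.1.1. Liabilities for Breach of Contract", "3.1.2, heading text")` returns `"2.1.1, heading text"`: the delimiter and wording of the reference translation are kept, and only the numeral is taken from the original sentence.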
Embodiment 7
This embodiment builds on embodiments 1 to 5:
The original sentence is: Б.Конституции64-74 20
The reference sentence is chosen as: А.Конституции64-74 20
The reference translation is chosen as: A. Constitution 64-74 20
The WER value is calculated and is 1;
It is determined whether each difference between the original sentence and the reference sentence is a sequence number or a number: the original sentence and the reference sentence differ in one place, at "А" of the reference sentence, where the reference translation has "A"; this is judged to be a sequence number;
According to table 1, "Б" of the original sentence and "А" of the reference sentence are not in table 1 and are not multi-level Arabic numerals, so a request is made to expand the sequence number table, and the user adds a new entry in response to the request:
Sequence number type А Б В Г У
Then, according to the expanded sequence number table, the sequence number value corresponding to "Б" of the original sentence is 2; referring to the expanded sequence number table, the replaced sequence number "B" is obtained according to the sequence number type of the reference translation;
The complete sentence after replacement is: B. Constitution 64-74 20.
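The expansion step in this embodiment can be sketched in Python as follows. The type names and helper functions are hypothetical, the Latin row is an assumed part of the table, and the Cyrillic row is the entry listed above.

```python
def expand_table(table, type_name, symbols):
    """Add a new sequence number type; refuse to overwrite an existing one."""
    if type_name in table:
        raise KeyError(f"type {type_name!r} already exists")
    table[type_name] = list(symbols)


def convert(table, token, target_type):
    """Rewrite token in target_type, keeping its 1-based sequence number value."""
    for row in table.values():
        if token in row:
            return table[target_type][row.index(token)]
    raise LookupError(f"{token!r} is not in the table; expansion needed")


# Assumed starting table with an upper-case Latin type only.
table = {"upper_latin": ["A", "B", "C", "D", "E"]}

# The user adds the Cyrillic entry in response to the expansion request.
expand_table(table, "upper_cyrillic", ["А", "Б", "В", "Г", "У"])

# "Б" has sequence number value 2, so in the Latin type it becomes "B".
replaced = convert(table, "Б", "upper_latin")
```

Before the expansion, `convert` would raise `LookupError` for "Б", which corresponds to the request to expand the table.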
Embodiment 8
This embodiment builds on embodiments 1 to 5. The sequence number table gains sequence number types as languages and application fields change;
When this embodiment is implemented, translating Russian requires the sequence number table to be expanded:
The new entry is as follows:
Sequence number type А Б В Г У
After the sequence number table is expanded, it is suitable for Russian translation.
Embodiment 8
This embodiment builds on embodiment 8. The number table gains number types as languages and application fields change;
When this embodiment is implemented, translating in the banking and finance field requires the number table to be expanded:
The new entry is as follows:
Number type One Two 100 1000 Yi Wan
After the number table is expanded, it is suitable for the banking and finance field.
Embodiment 9
As shown in Fig. 2, the number and sequence number replacement system for machine translation of the present invention comprises: a selection unit for choosing from the semantic base the reference sentence most similar to the original sentence and selecting the reference translation that matches the reference sentence; a judging unit for determining whether each difference between the original sentence and the reference sentence is a sequence number or a number; and a replacement unit for performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and for performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table when the difference is a number.
When this embodiment is implemented, the selection unit chooses from the semantic base the reference sentence most similar to the original sentence and selects the matching reference translation; the judging unit then determines whether each difference between the original sentence and the reference sentence is a sequence number or a number; and the replacement unit performs the replacement according to the result of the judging unit. By means of the chosen reference sentence, reference translation, sequence number table and number table, the present invention replaces the sequence numbers and numbers in the original text with ones that follow the conventions of the target language, which improves the applicability of the translation. When the reference sentence, reference translation, sequence number table and number table change with the language and the terminology environment, the present invention can be applied to multiple languages and terminology environments, which improves the generality of the translation.
Embodiment 10
This embodiment builds on embodiment 9. The selection unit chooses from the corpus the sentence that is of equal length to the original sentence and has the minimum WER value as the reference sentence most similar to the original sentence; the WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
When this embodiment is implemented, the selection unit, in choosing the reference sentence most similar to the original sentence, considers only sentences of equal length to the original sentence and chooses the most similar one by calculating WER values. Because the reference sentence is of equal length to the original sentence, no insertion or deletion of sentence content has to be considered during replacement, which guarantees that the replacement positions are accurate and improves the accuracy of the replacement translation of the present invention.
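The selection rule can be sketched in Python with a standard dynamic-programming Levenshtein distance. The text does not state the unit of comparison, so the sketch assumes pre-tokenized sentences (it works equally on strings, character by character); the corpus layout as a list of (reference tokens, reference translation) pairs and the function names are also assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,              # delete x
                current[j - 1] + 1,           # insert y
                previous[j - 1] + (x != y),   # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def choose_reference(original, corpus):
    """Pick the equal-length corpus entry with the smallest distance to original.

    corpus is a list of (reference_tokens, reference_translation) pairs.
    Returns None when no entry has the same length as the original sentence.
    """
    candidates = [pair for pair in corpus if len(pair[0]) == len(original)]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: levenshtein(original, pair[0]))
```

Restricting the candidates to equal-length sentences means the remaining differences are substitutions at fixed positions, which is what lets the later replacement step work position by position.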
Embodiment 11
As shown in Fig. 2, this embodiment builds on embodiment 9 and further comprises: a production unit for making, storing and modifying the sequence number table and the number table; the sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
When this embodiment is implemented, the sequence number table made by the production unit includes sequence number types and the sequence number value corresponding to each sequence number, and the number table it makes includes number types and the number value corresponding to each number. This correspondence is clearly structured and can improve the accuracy of replacement when sequence numbers and numbers are replaced during translation.
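One possible shape for such a table is sketched below in Python. The class name, the JSON storage format and the convention that a symbol's value is its 1-based position within its type are assumptions; the convention fits a sequence number table, while a number table with arbitrary values would need an explicit value for each entry.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ValueTable:
    """Maps each type name to its symbols; a symbol's value is its 1-based position."""

    rows: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, symbol):
        """Return (type name, value) for symbol, or None if it is not in the table."""
        for type_name, symbols in self.rows.items():
            if symbol in symbols:
                return type_name, symbols.index(symbol) + 1
        return None

    def symbol(self, type_name, value):
        """Return the symbol of the given type that carries the given value."""
        return self.rows[type_name][value - 1]

    def to_json(self):
        """Serialize the table for storage, keeping non-ASCII symbols readable."""
        return json.dumps(self.rows, ensure_ascii=False)

    @classmethod
    def from_json(cls, text):
        """Rebuild a table from its stored JSON form."""
        return cls(rows=json.loads(text))
```

Storing the table as plain JSON keeps it easy to modify by hand when a new type has to be added.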
The embodiments above describe the purpose, technical solution and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A number and sequence number replacement method for machine translation, characterized by comprising the following steps:
S1: choosing from the semantic base the reference sentence most similar to the original sentence, and selecting the reference translation that matches the reference sentence;
S2: determining whether each difference between the original sentence and the reference sentence is a sequence number or a number;
S3: if the difference is a sequence number, performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table.
2. The number and sequence number replacement method for machine translation according to claim 1, characterized in that step S1 further comprises the following sub-step:
choosing from the semantic base the sentence that is of equal length to the original sentence and has the minimum WER value as the reference sentence most similar to the original sentence; the WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
3. The number and sequence number replacement method for machine translation according to claim 1, characterized by further comprising the following step:
making a sequence number table and a number table; the sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
4. The number and sequence number replacement method for machine translation according to claim 3, characterized in that the sequence number replacement comprises the following steps:
if the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number all belong to sequence number types in the sequence number table, calculating the sequence number value of the original sentence sequence number from the sequence number table, finding the corresponding sequence number in the sequence number table according to that value and the sequence number type of the reference translation, and replacing the original sentence sequence number with it;
if any one of the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number does not belong to a sequence number type in the sequence number table, and all three are multi-level Arabic numerals, leaving the original sentence sequence number unmodified;
if any one of the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number does not belong to a sequence number type in the sequence number table, and they are not multi-level Arabic numerals, requesting expansion of the sequence number types of the sequence number table.
5. The number and sequence number replacement method for machine translation according to claim 3, characterized in that the number replacement comprises the following steps:
if the reference sentence number, the reference translation number and the original sentence number all belong to number types in the number table, calculating the number value of the original sentence number from the number table, finding the corresponding number in the number table according to that value and the number type of the reference translation, and replacing the original sentence number with it;
if any one of the reference sentence number, the reference translation number and the original sentence number does not belong to a number type in the number table, requesting expansion of the number types of the number table.
6. The number and sequence number replacement method for machine translation according to claim 1, characterized in that the sequence number table gains sequence number types as languages and application fields change; the number table gains number types as languages and application fields change.
7. A number and sequence number replacement system for machine translation using the method of any one of claims 1 to 6, characterized by comprising:
a selection unit for choosing from the semantic base the reference sentence most similar to the original sentence, and selecting the reference translation that matches the reference sentence;
a judging unit for determining whether each difference between the original sentence and the reference sentence is a sequence number or a number;
a replacement unit for performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and for performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table when the difference is a number.
8. The number and sequence number replacement system for machine translation according to claim 7, characterized in that the selection unit chooses from the corpus the sentence that is of equal length to the original sentence and has the minimum WER value as the reference sentence most similar to the original sentence; the WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
9. The number and sequence number replacement system for machine translation according to claim 7, characterized by further comprising:
a production unit for making, storing and modifying the sequence number table and the number table; the sequence number table includes sequence number types and the sequence number value corresponding to each sequence number; the number table includes number types and the number value corresponding to each number.
CN201710187175.1A 2017-03-27 2017-03-27 Number and sequence number replacement method and system for machine translation Pending CN107066454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710187175.1A CN107066454A (en) 2017-03-27 2017-03-27 Number and sequence number replacement method and system for machine translation

Publications (1)

Publication Number Publication Date
CN107066454A true CN107066454A (en) 2017-08-18

Family

ID=59621157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710187175.1A Pending CN107066454A (en) 2017-03-27 2017-03-27 Number and sequence number replacement method and system for machine translation

Country Status (1)

Country Link
CN (1) CN107066454A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170818