CN107066454A - Number and sequence number replacement method and system for machine translation - Google Patents
- Publication number
- CN107066454A (application number CN201710187175.1A)
- Authority
- CN
- China
- Prior art keywords
- sequence number
- sentence
- translation
- former
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/45—Example-based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The invention discloses a number and sequence number replacement method for machine translation. By selecting a reference sentence, a reference translation, a sequence number table and a number table, the method replaces the sequence numbers and numbers in the original text with sequence numbers and numbers that conform to the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the applicability and versatility of the translation. The invention also discloses a number and sequence number replacement system for machine translation, in which a selection unit selects the reference sentence, reference translation, sequence number table and number table, and a replacement unit replaces the sequence numbers and numbers in the original text with ones that conform to the usage conventions of the translation, giving the same improvement in applicability and versatility.
Description
Technical field
The present invention relates to the field of machine translation, and in particular to a number and sequence number replacement method and system for machine translation.
Background technology
As international cooperation grows, both Chinese and foreign companies need to translate large volumes of engineering, scientific research and legal documents when initiating and carrying out international engineering and research projects. Many domestic engineering and research projects also need related documents translated for purposes such as foreign investment and loans from international banks. Engineering, scientific research and legal documents often contain large quantities of numbers and sequence numbers. Because languages differ, the ways these numbers and sequence numbers are written often differ greatly, which makes them harder to translate.
Traditionally, numbers and sequence numbers are translated manually, which is time-consuming, laborious and costly. When numbers and sequence numbers are machine translated, simple literal translation is mostly used, so the translation has poor applicability. If a dedicated translation method is built for one language in order to improve applicability, the method has poor versatility and multilingual translation becomes expensive.
Summary of the invention
The technical problem to be solved by the invention is that, when numbers and sequence numbers are machine translated, the translation has poor applicability and versatility and multilingual translation is costly. The object of the invention is to provide a number and sequence number replacement method and system for machine translation that solves this problem.
The present invention is achieved through the following technical solutions:
A number and sequence number replacement method for machine translation comprises the following steps:
S1: select from the corpus the reference sentence most similar to the original sentence, and select the reference translation that matches the reference sentence;
S2: determine whether each difference between the original sentence and the reference sentence is a sequence number or a number;
S3: if the difference is a sequence number, replace the sequence number of the original sentence according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, replace the number of the original sentence according to the reference sentence, the reference translation and the number table.
In the prior art, machine translation of numbers and sequence numbers mostly uses simple literal translation, which gives the translation poor applicability. For example, English legal provisions number their clauses in the form "(1)", and machine translation usually carries this sequence number over unchanged as "(1)", whereas Chinese legal provisions use the form "(一)" (the Chinese numeral for one). Moreover, a translation method built for one language can only be used for that language, so it has poor versatility; translating numbers and sequence numbers for several languages requires building several translation methods, which is costly.
When the invention is applied, the reference sentence most similar to the original sentence is first selected from the corpus, together with the reference translation that matches the reference sentence. It is then determined whether each difference between the original sentence and the reference sentence is a sequence number or a number. If the difference is a sequence number, the sequence number of the original sentence is replaced according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, the number of the original sentence is replaced according to the reference sentence, the reference translation and the number table. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the original text with ones that conform to the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the versatility of the translation.
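Step S2 can be illustrated with a small sketch. The code below is a minimal example under stated assumptions, not the patented implementation: sentences are assumed to be pre-tokenized lists of equal length, and the two regular expressions used to recognize sequence numbers and numbers are hypothetical stand-ins for a lookup in the sequence number table and number table.

```python
import re

# Hypothetical patterns (assumptions, not from the patent text):
# a sequence number is a short marker such as "1)", "(2)", "b)" or "IV)".
SEQUENCE_PATTERN = re.compile(r"\(?(?:[0-9]+|[a-z]|[ivx]+|[IVX]+)\)")
# a number is a run of digits, optionally grouped with commas.
NUMBER_PATTERN = re.compile(r"[0-9][0-9,]*")


def differing_positions(original, reference):
    """Return the indices at which two equal-length token lists differ."""
    if len(original) != len(reference):
        raise ValueError("sentences must have the same number of tokens")
    return [i for i, (a, b) in enumerate(zip(original, reference)) if a != b]


def classify(token):
    """Label a token as 'sequence number', 'number' or 'other'."""
    if SEQUENCE_PATTERN.fullmatch(token):
        return "sequence number"
    if NUMBER_PATTERN.fullmatch(token):
        return "number"
    return "other"


# Illustrative tokens only; they are not quoted from the patent.
original = ["b)", "Lee", "invests", "3,000,000", "yuan"]
reference = ["1)", "Lee", "invests", "10,000,000", "yuan"]
for i in differing_positions(original, reference):
    print(i, original[i], classify(original[i]))
```

Each differing position is then handed to the sequence number replacement or the number replacement of step S3, depending on its label.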
Further, step S1 also includes the following sub-step: select from the corpus the sentence that has the same length as the original sentence and the smallest WER value as the reference sentence most similar to the original sentence. The WER value is the minimum number of steps required to change the original sentence into the reference sentence, and is computed as the Levenshtein distance.
When the invention is applied, the reference sentence is chosen among sentences of the same length as the original sentence, and the most similar one is identified by computing the WER value. Because the reference sentence has the same length as the original sentence, the replacement does not need to account for insertions or deletions of sentence content, which ensures that the replacement positions are accurate and improves the accuracy of the replacement translation.
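The selection in step S1 can be sketched as follows. This is a minimal illustration, assuming that the WER value is a token-level Levenshtein distance and that the corpus is a list of (reference tokens, reference translation) pairs; the function names and the sample data are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def choose_reference(original, corpus):
    """Pick the equal-length corpus entry with the smallest distance, or None."""
    candidates = [pair for pair in corpus if len(pair[0]) == len(original)]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: levenshtein(original, pair[0]))


# Illustrative data only; it is not quoted from the patent.
original = ["b)", "Lee", "invests", "3,000,000", "yuan"]
corpus = [
    (["1)", "Lee", "invests", "10,000,000", "yuan"], "I) LI invested RMB 10,000,000"),
    (["1)", "Wang", "sells", "10,000,000", "shares"], "I) WANG sold 10,000,000 shares"),
    (["1)", "Lee", "invests"], "I) LI invested"),
]
reference_tokens, reference_translation = choose_reference(original, corpus)
```

Restricting the candidates to equal-length sentences means every edit counted by the distance is a substitution at a fixed position, which is what lets the later replacement step work position by position.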
Further, the invention also includes the following step: build the sequence number table and the number table. The sequence number table contains sequence number types and the sequence number value corresponding to each sequence number; the number table contains number types and the number value corresponding to each number.
When the invention is applied, the sequence number table and the number table are built as needed before they are used. The sequence number table contains sequence number types and the sequence number value corresponding to each sequence number, and the number table contains number types and the number value corresponding to each number. This correspondence has a clear structure and improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
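One possible in-memory layout for the sequence number table is sketched below. It is an assumption for illustration: each sequence number type is a list whose position encodes the sequence number value, the type names are hypothetical, and only the first four entries of each type are listed.

```python
# Hypothetical type names; position i in each list stands for value i + 1.
SEQUENCE_NUMBER_TABLE = {
    "arabic": ["1", "2", "3", "4"],
    "lower_letter": ["a", "b", "c", "d"],
    "upper_roman": ["I", "II", "III", "IV"],
    "english_ordinal": ["First", "Second", "Third", "Fourth"],
}


def lookup(symbol):
    """Return (type_name, value) for a symbol, or None if no type contains it."""
    for type_name, symbols in SEQUENCE_NUMBER_TABLE.items():
        if symbol in symbols:
            return type_name, symbols.index(symbol) + 1
    return None


def render(type_name, value):
    """Return the symbol of the given type that stands for the given value."""
    return SEQUENCE_NUMBER_TABLE[type_name][value - 1]
```

With this layout, converting a sequence number between types is a `lookup` followed by a `render`, for example from `"c"` to value 3 to `"III"`.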
Further, the sequence number replacement comprises the following steps. If the sequence numbers of the reference sentence, the reference translation and the original sentence all belong to sequence number types in the sequence number table, the sequence number value of the original sentence's sequence number is determined from the sequence number table, and the corresponding sequence number is then looked up in the table according to that value and the sequence number type of the reference translation and used to replace the original sentence's sequence number. If any one of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, the original sentence's sequence number is left unchanged. If any one of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table.
When the invention is applied, if the sequence numbers of the reference sentence, the reference translation and the original sentence all belong to sequence number types in the sequence number table, the original sentence's sequence number is replaced with the sequence number that has the same value in the sequence number type of the reference translation. This replacement both conforms to the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate. If any one of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, no replacement is made. Multi-level Arabic numerals have forms such as "3.1.1" and "2-3-1"; they usually appear in documents as heading numbers, and international practice for such heading numbers is fairly uniform, so the original sentence's sequence number is left unchanged and used directly. If any one of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table. The expanded table is more versatile, and with each expansion it becomes applicable to more languages and application fields, which further improves the versatility of the invention.
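The three cases of the sequence number replacement can be sketched as below. This is an illustrative reading of the rules, not the patent's code: the table contents, the type names, the `TableExpansionNeeded` exception and the pattern for multi-level Arabic numerals are all assumptions, and sequence numbers are passed without their brackets.

```python
import re

# Hypothetical, truncated table; position i in each list stands for value i + 1.
SEQUENCE_NUMBER_TABLE = {
    "arabic": ["1", "2", "3", "4"],
    "lower_letter": ["a", "b", "c", "d"],
    "upper_roman": ["I", "II", "III", "IV"],
}
# Assumed shape of multi-level Arabic numerals such as "3.1.1" or "2-3-1".
MULTI_LEVEL = re.compile(r"\d+(?:[.-]\d+)+")


class TableExpansionNeeded(Exception):
    """Signals that the sequence number table lacks a required type."""


def type_of(symbol):
    """Return the name of the type containing the symbol, or None."""
    for name, symbols in SEQUENCE_NUMBER_TABLE.items():
        if symbol in symbols:
            return name
    return None


def replace_sequence_number(original, reference, reference_translation):
    symbols = (original, reference, reference_translation)
    types = [type_of(s) for s in symbols]
    if all(t is not None for t in types):
        # Case 1: keep the original's value, adopt the reference translation's type.
        value = SEQUENCE_NUMBER_TABLE[types[0]].index(original) + 1
        return SEQUENCE_NUMBER_TABLE[types[2]][value - 1]
    if all(MULTI_LEVEL.fullmatch(s) for s in symbols):
        # Case 2: multi-level heading numbers are carried over unchanged.
        return original
    # Case 3: ask for the table to be expanded.
    raise TableExpansionNeeded(f"no sequence number type covers {symbols}")
```

For instance, with an original "b", a reference "1" and a reference translation "I", the sketch returns "II": the value 2 comes from the original and the Roman type from the reference translation.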
Further, the number replacement comprises the following steps. If the numbers of the reference sentence, the reference translation and the original sentence all belong to number types in the number table, the number value of the original sentence's number is determined from the number table, and the corresponding number is then looked up in the table according to that value and the number type of the reference translation and used to replace the original sentence's number. If any one of the three numbers does not belong to a number type in the number table, a request is made to expand the number types of the number table.
When the invention is applied, if the numbers of the reference sentence, the reference translation and the original sentence all belong to number types in the number table, the original sentence's number is replaced with the number that has the same value in the number type of the reference translation. This replacement both conforms to the usage conventions for numbers in the translation and keeps the translated number accurate. If any one of the three numbers does not belong to a number type in the number table, a request is made to expand the number types of the number table. The expanded table is more versatile, and with each expansion it becomes applicable to more languages and application fields, which further improves the versatility of the invention.
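The number replacement follows the same pattern and can be sketched as below. As an assumption made for brevity, each number type is described here by a recognizer, a parser and a formatter instead of enumerated table rows; the two type names and the `TableExpansionNeeded` exception are hypothetical.

```python
import re

# Hypothetical number types: (recognizer, parse to value, format from value).
NUMBER_TYPES = {
    "plain": (re.compile(r"\d+"), int, str),
    "comma_grouped": (
        re.compile(r"\d{1,3}(?:,\d{3})+"),
        lambda s: int(s.replace(",", "")),
        lambda n: f"{n:,}",
    ),
}


class TableExpansionNeeded(Exception):
    """Signals that the number table lacks a required type."""


def number_type(text):
    """Return the name of the first type whose recognizer matches, or None."""
    for name, (pattern, _parse, _format) in NUMBER_TYPES.items():
        if pattern.fullmatch(text):
            return name
    return None


def replace_number(original, reference, reference_translation):
    texts = (original, reference, reference_translation)
    types = [number_type(t) for t in texts]
    if any(t is None for t in types):
        raise TableExpansionNeeded(f"no number type covers {texts}")
    value = NUMBER_TYPES[types[0]][1](original)  # value of the original number
    return NUMBER_TYPES[types[2]][2](value)      # written in the translation's type
```

A usage example: `replace_number("3000000", "10000000", "10,000,000")` keeps the original value and adopts the comma-grouped style of the reference translation.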
Further, sequence number types are added to the sequence number table as the language and application field change, and number types are added to the number table as the language and application field change.
When the invention is applied and the language or application field changes, sequence number types and number types can be added to the sequence number table and the number table accordingly, which further improves the versatility of the invention.
A number and sequence number replacement system for machine translation is characterized by comprising:
a selection unit for selecting from the corpus the reference sentence most similar to the original sentence and selecting the reference translation that matches the reference sentence;
a judging unit for determining whether each difference between the original sentence and the reference sentence is a sequence number or a number;
a replacement unit for replacing the sequence number of the original sentence according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and for replacing the number of the original sentence according to the reference sentence, the reference translation and the number table when the difference is a number.
In the prior art, machine translation of numbers and sequence numbers mostly uses simple literal translation, which gives the translation poor applicability. For example, English legal provisions number their clauses in the form "(1)", and machine translation usually carries this sequence number over unchanged as "(1)", whereas Chinese legal provisions use the form "(一)" (the Chinese numeral for one). Moreover, a translation system built for one language can only be used for that language, so it has poor versatility; translating numbers and sequence numbers for several languages requires building several translation systems, which is costly.
When the invention is applied, the selection unit selects from the corpus the reference sentence most similar to the original sentence, together with the reference translation that matches the reference sentence. The judging unit then determines whether each difference between the original sentence and the reference sentence is a sequence number or a number, and the replacement unit performs the replacement according to the result from the judging unit. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the original text with ones that conform to the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the versatility of the translation.
Further, the selection unit selects from the corpus the sentence that has the same length as the original sentence and the smallest WER value as the reference sentence most similar to the original sentence. The WER value is the minimum number of steps required to change the original sentence into the reference sentence, and is computed as the Levenshtein distance.
When the invention is applied, the selection unit chooses the reference sentence among sentences of the same length as the original sentence and identifies the most similar one by computing the WER value. Because the reference sentence has the same length as the original sentence, the replacement does not need to account for insertions or deletions of sentence content, which ensures that the replacement positions are accurate and improves the accuracy of the replacement translation.
Further, the invention also comprises a production unit for building, storing and modifying the sequence number table and the number table. The sequence number table contains sequence number types and the sequence number value corresponding to each sequence number; the number table contains number types and the number value corresponding to each number.
When the invention is applied, the sequence number table built by the production unit contains sequence number types and the sequence number value corresponding to each sequence number, and the number table it builds contains number types and the number value corresponding to each number. This correspondence has a clear structure and improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
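The role of the production unit (building, storing and modifying the tables) can be sketched with a small class. The class name `TableStore`, its methods and the choice of JSON as the storage format are assumptions for illustration; the text does not specify how the tables are stored.

```python
import json


class TableStore:
    """Holds the sequence number table and the number table and allows expansion."""

    def __init__(self):
        # Each table maps a type name to a list whose position i means value i + 1.
        self.tables = {"sequence": {}, "number": {}}

    def add_type(self, table, type_name, symbols):
        """Add a new type to one of the tables, or overwrite an existing type."""
        if table not in self.tables:
            raise KeyError(f"unknown table: {table}")
        self.tables[table][type_name] = list(symbols)

    def dumps(self):
        """Serialize both tables to a JSON string for storage."""
        return json.dumps(self.tables, ensure_ascii=False)

    @classmethod
    def loads(cls, text):
        """Rebuild a store from a JSON string produced by dumps()."""
        store = cls()
        store.tables = json.loads(text)
        return store


store = TableStore()
store.add_type("sequence", "lower_letter", ["a", "b", "c", "d"])
store.add_type("sequence", "circled", ["①", "②", "③", "④"])  # a later expansion
restored = TableStore.loads(store.dumps())
```

Adding a type is the only operation needed when the language or application field changes, so an expansion request from the replacement step can be served without touching the replacement logic.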
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. In the number and sequence number replacement method for machine translation, a reference sentence, reference translation, sequence number table and number table are selected, and the sequence numbers and numbers in the original text are replaced with ones that conform to the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the applicability and versatility of the translation.
2. In the method, a sentence of the same length as the original sentence is chosen and the most similar reference sentence is identified by computing the WER value. Because the reference sentence has the same length as the original sentence, the replacement does not need to account for insertions or deletions of sentence content, which ensures that the replacement positions are accurate and improves the accuracy of the replacement translation.
3. In the method, the correspondence in the sequence number table and number table has a clear structure, which improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
4. In the method, the replacement both conforms to the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate.
5. In the method, the sequence number table and number table become applicable to more languages and application fields with each expansion, which further improves the versatility of the invention.
6. In the method, when the language or application field changes, sequence number types and number types can be added to the sequence number table and number table accordingly, which further improves the versatility of the invention.
7. In the number and sequence number replacement system for machine translation, the selection unit selects the reference sentence, reference translation, sequence number table and number table, and the replacement unit replaces the sequence numbers and numbers in the original text with ones that conform to the usage conventions of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the applicability and versatility of the translation.
8. In the system, the selection unit chooses a sentence of the same length as the original sentence and identifies the most similar reference sentence by computing the WER value. Because the reference sentence has the same length as the original sentence, the replacement does not need to account for insertions or deletions of sentence content, which ensures that the replacement positions are accurate and improves the accuracy of the replacement translation.
9. In the system, the correspondence in the sequence number table and number table built by the production unit has a clear structure, which improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the invention and form a part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic diagram of the steps of the method of the invention;
Fig. 2 is a schematic diagram of the structure of the system of the invention.
Detailed description of the embodiments
To make the object, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1
As shown in Fig. 1, the number and sequence number replacement method for machine translation of the invention comprises the following steps:
S1: select from the corpus the reference sentence most similar to the original sentence, and select the reference translation that matches the reference sentence;
S2: determine whether each difference between the original sentence and the reference sentence is a sequence number or a number;
S3: if the difference is a sequence number, replace the sequence number of the original sentence according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, replace the number of the original sentence according to the reference sentence, the reference translation and the number table.
When this embodiment is implemented, the reference sentence most similar to the original sentence is first selected from the corpus, together with the reference translation that matches the reference sentence. It is then determined whether each difference between the original sentence and the reference sentence is a sequence number or a number. If the difference is a sequence number, the sequence number of the original sentence is replaced according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, the number of the original sentence is replaced according to the reference sentence, the reference translation and the number table. By selecting a reference sentence, reference translation, sequence number table and number table, the invention replaces the sequence numbers and numbers in the original text with ones that conform to the usage conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the usage environment, the invention can be used for multiple languages and usage environments, which improves the versatility of the translation.
Embodiment 2
As shown in Fig. 1, this embodiment builds on Embodiment 1. Step S1 also includes the following sub-step: select from the corpus the sentence that has the same length as the original sentence and the smallest WER value as the reference sentence most similar to the original sentence. The WER value is the minimum number of steps required to change the original sentence into the reference sentence, and is computed as the Levenshtein distance.
When this embodiment is implemented, the reference sentence is chosen among sentences of the same length as the original sentence, and the most similar one is identified by computing the WER value. Because the reference sentence has the same length as the original sentence, the replacement does not need to account for insertions or deletions of sentence content, which ensures that the replacement positions are accurate and improves the accuracy of the replacement translation.
Embodiment 3
This embodiment builds on Embodiment 1 and further comprises the following step: build the sequence number table and the number table. The sequence number table contains sequence number types and the sequence number value corresponding to each sequence number; the number table contains number types and the number value corresponding to each number.
When this embodiment is implemented, the sequence number table and the number table are built as needed before they are used. The sequence number table contains sequence number types and the sequence number value corresponding to each sequence number, and the number table contains number types and the number value corresponding to each number. This correspondence has a clear structure and improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
The sequence number table is as follows:
Table 1: Sequence number table
Sequence number value | 1 | 2 | 3 | 4 | … | 20 | … |
Sequence number type | 1 | 2 | 3 | 4 | … | 20 | … |
Sequence number type | One | Two | Three | Four | … | 20 | … |
Sequence number type | a | b | c | d | … | t | … |
Sequence number type | I | II | III | IV | … | XX | … |
Sequence number type | First | Second | Third | Fourth | … | Twentieth | … |
Sequence number type | i | ii | iii | iv | … | xx | … |
… | … | … | … | … | … | … | … |
The number table is as follows:
Table 2: Number table
Number value | 1 | 2 | … | 100 | … | 1000 | … | 10000 | … |
Number type | 1 | 2 | … | 100 | … | 1000 | … | 10000 | … |
Number type | | | | | | 1,000 | … | 10,000 | … |
Number type | One | Two | … | 100 | … | 1000 | … | 10000 | … |
Number type | | | | 100 | … | 1000 | … | 10000 | … |
… | … | … | … | … | … | … | … | … | … |
Embodiment 4
This embodiment builds on Embodiment 3. The sequence number replacement comprises the following steps. If the sequence numbers of the reference sentence, the reference translation and the original sentence all belong to sequence number types in the sequence number table, the sequence number value of the original sentence's sequence number is determined from the sequence number table, and the corresponding sequence number is then looked up in the table according to that value and the sequence number type of the reference translation and used to replace the original sentence's sequence number. If any one of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, the original sentence's sequence number is left unchanged. If any one of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table.
When this embodiment is implemented, if the sequence numbers of the reference sentence, the reference translation and the original sentence all belong to sequence number types in the sequence number table, the original sentence's sequence number is replaced with the sequence number that has the same value in the sequence number type of the reference translation. This replacement both conforms to the usage conventions for sequence numbers in the translation and keeps the translated sequence number accurate. If any one of the three sequence numbers does not belong to a sequence number type in the table, and all three are multi-level Arabic numerals, no replacement is made. Multi-level Arabic numerals have forms such as "3.1.1" and "2-3-1"; they usually appear in documents as heading numbers, and international practice for such heading numbers is fairly uniform, so the original sentence's sequence number is left unchanged and used directly. If any one of the three does not belong to a sequence number type in the table and they are not multi-level Arabic numerals, a request is made to expand the sequence number types of the sequence number table. The expanded table is more versatile, and with each expansion it becomes applicable to more languages and application fields, which further improves the versatility of the invention.
Embodiment 5
This embodiment builds on Embodiment 3. The number replacement comprises the following steps. If the numbers of the reference sentence, the reference translation and the original sentence all belong to number types in the number table, the number value of the original sentence's number is determined from the number table, and the corresponding number is then looked up in the table according to that value and the number type of the reference translation and used to replace the original sentence's number. If any one of the three numbers does not belong to a number type in the number table, a request is made to expand the number types of the number table.
When this embodiment is implemented, if the reference sentence number, the reference translation number and the original sentence number all belong to number types in the number table, the original sentence number is replaced, according to its number value, with the corresponding number of the reference translation's number type. This replacement both follows the number conventions of the target language and keeps the translated number accurate. If any one of the three does not belong to a number type in the number table, a request is made to expand the number types of the number table. Each expansion makes the number table applicable to more languages and application fields, which further improves the versatility of the invention.
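A minimal sketch of the number replacement step, under the assumption that the number table maps each written form to its number value. The table contents and the names `lookup` and `replace_number` are hypothetical; the actual number table (Table 2) is not reproduced here.

```python
# Hypothetical number table: type name -> {written form: number value}.
NUMBER_TABLE = {
    "arabic": {"1": 1, "2": 2, "3": 3, "10": 10},
    "english": {"one": 1, "two": 2, "three": 3, "ten": 10},
}


def lookup(word):
    """Return (type name, number value) for `word`, or None if unknown."""
    for type_name, entries in NUMBER_TABLE.items():
        if word in entries:
            return type_name, entries[word]
    return None


def replace_number(original, reference, reference_translation):
    """Return the replacement number, or None to request table expansion."""
    found = [lookup(w) for w in (original, reference, reference_translation)]
    if any(f is None for f in found):
        return None  # at least one number type is missing from the table
    value = found[0][1]        # number value of the original sentence number
    target_type = found[2][0]  # number type used by the reference translation
    for word, word_value in NUMBER_TABLE[target_type].items():
        if word_value == value:
            return word
    return None  # target type has no written form for this value
```

Returning `None` stands in for the "request to expand the number table" branch; a real system would presumably surface that request to a user.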
Embodiment 6
This embodiment builds on Embodiments 1 to 5:
The original sentence is: b) "Lee X" invests 3,000,000 yuan in X Investment Centre (limited partnership)
The reference sentence is chosen as: 1) "Lee X" invests 10,000,000 yuan in X Investment Centre (limited partnership)
The reference translation is chosen as: I) LI X invested RMB 10,000,000 to X Investment Center (limited partnership)
The WER value is calculated; the WER value is 2.
It is judged whether the differences between the original sentence and the reference sentence are sequence numbers or numbers: the original sentence differs from the reference sentence in two places. The first difference is at "1" in the reference sentence and is judged to be a sequence number; the second difference is at "10,000,000" in the reference sentence and is judged to be a number.
According to Table 1, for the first difference, the sequence number types of "b" in the original sentence, "1" in the reference sentence and "I" in the reference translation are all in Table 1. The sequence number value corresponding to "b" in the original sentence is 2; referring again to Table 1, the replacement sequence number "II" is obtained from the sequence number type of the reference translation.
According to Table 2, for the second difference, the number types of "3,000,000" in the original sentence, "10,000,000" in the reference sentence and "10,000,000" in the reference translation are all in Table 2. The number value corresponding to "3,000,000" in the original sentence is 3000000; referring again to Table 2, the replacement number "3,000,000" is obtained from the number type of the reference translation.
The complete sentence after replacement is: II) LI X invested RMB 3,000,000 to X Investment Center (limited partnership).
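The worked example above can be traced in a few lines of code. This is only a sketch under stated assumptions: the tokenisation of the three sentences is hypothetical (the text does not say how sentences are segmented), the positions in the translation are written by hand, and the value-to-symbol conversion ("b" has value 2, hence "II") is taken from the example rather than computed from Table 1.

```python
def differing_positions(original_tokens, reference_tokens):
    """Indices at which two equal-length token lists differ."""
    if len(original_tokens) != len(reference_tokens):
        raise ValueError("sentences must be equal in length")
    pairs = zip(original_tokens, reference_tokens)
    return [i for i, (o, r) in enumerate(pairs) if o != r]


def regroup(number_text):
    """Parse a comma-grouped number and re-emit it with comma grouping."""
    value = int(number_text.replace(",", ""))  # "3,000,000" -> 3000000
    return f"{value:,}"


# Hypothetical tokenisation of the three sentences in the example.
original = ["b", ")", "Lee X", "invests", "3,000,000", "yuan"]
reference = ["1", ")", "Lee X", "invests", "10,000,000", "yuan"]
translation = ["I", ")", "LI X", "invested", "RMB", "10,000,000",
               "to", "X Investment Center"]

positions = differing_positions(original, reference)

# Replacements keyed by position in the reference translation.
# "II" is the result stated in the example for sequence number value 2.
by_position = {0: "II", 5: regroup(original[4])}
result = [by_position.get(i, token) for i, token in enumerate(translation)]
```

Replacing by position rather than by token text avoids touching an unrelated token that happens to look like the sequence number.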
Embodiment 7
This embodiment builds on Embodiments 1 to 5:
The original sentence is: 2.1.1.Liabilities for Breach of Contract
The reference sentence is chosen as: 3.1.2.Liabilities for Breach of Contract
The reference translation is chosen as: 3.1.2, the solution of application of law and dispute
The WER value is calculated; the WER value is 1.
It is judged whether the difference between the original sentence and the reference sentence is a sequence number or a number: the original sentence differs from the reference sentence in one place, at "3.1.2" in the reference sentence, which is judged to be a sequence number.
According to Table 1, the sequence number types of "2.1.1" in the original sentence, "3.1.2" in the reference sentence and "3.1.2" in the reference translation are not in Table 1, and all three are multi-level Arabic numerals, so the sequence number "2.1.1" of the original sentence is used directly.
The complete sentence is: 2.1.1, the solution of application of law and dispute.
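Carrying the heading number over unchanged, as in the example above, could look like the following sketch. The regular expression and the function name `carry_over_heading_number` are assumptions for illustration; the text only states that forms such as "3.1.1" and "2-3-1" count as multi-level Arabic numerals.

```python
import re

# A leading multi-level Arabic numeral such as "2.1.1" or "2-3-1".
LEADING_MULTILEVEL = re.compile(r"^\d+(?:[.-]\d+)+")


def carry_over_heading_number(original, reference_translation):
    """Put the original sentence's heading number onto the reference translation."""
    source = LEADING_MULTILEVEL.match(original)
    target = LEADING_MULTILEVEL.match(reference_translation)
    if not (source and target):
        # Not a heading-number case; leave the translation untouched.
        return reference_translation
    return source.group() + reference_translation[target.end():]
```

The pattern requires a digit after every separator, so the trailing "." in "2.1.1.Liabilities" is not treated as part of the number.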
Embodiment 7
This embodiment builds on Embodiments 1 to 5:
The original sentence is: Б.Конституции64-74 20
The reference sentence is chosen as: А.Конституции64-74 20
The reference translation is chosen as: A. constitution 64-74 20
The WER value is calculated; the WER value is 1.
It is judged whether the difference between the original sentence and the reference sentence is a sequence number or a number: the original sentence differs from the reference sentence in one place, at "А" in the reference sentence, which corresponds to "A" in the reference translation; this difference is judged to be a sequence number.
According to Table 1, "Б" in the original sentence and "А" in the reference sentence are not in Table 1 and are not multi-level Arabic numerals, so a request is made to expand the sequence number table, and the user adds a new entry in response to the request:
Sequence number type | А | Б | В | Г | … | У | … |
According to the expanded sequence number table, the sequence number value corresponding to "Б" in the original sentence is 2; referring again to the expanded sequence number table, the replacement sequence number "B" is obtained from the sequence number type of the reference translation.
The complete sentence after replacement is: B. constitution 64-74 20.
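The expansion request in this example can be sketched as a failed lookup followed by the addition of a new table entry. The table layout and the names `convert` and `expand` are hypothetical, and only the first four Cyrillic letters are listed because the entry in the text elides the rest. Unicode escapes are used so that Cyrillic letters cannot be confused with similar-looking Latin ones.

```python
# Hypothetical sequence number table before expansion.
sequence_table = {
    "upper_latin": ["A", "B", "C", "D"],
    "arabic": ["1", "2", "3", "4"],
}


def convert(symbol, target_type, table):
    """Map `symbol` to the entry with the same value in `target_type`."""
    for symbols in table.values():
        if symbol in symbols:
            return table[target_type][symbols.index(symbol)]
    raise LookupError(f"{symbol!r} is not in the sequence number table")


def expand(table, type_name, symbols):
    """Add a new sequence number type supplied by the user."""
    if type_name in table:
        raise ValueError(f"type {type_name!r} already exists")
    table[type_name] = list(symbols)


# First attempt fails: Cyrillic "Б" (U+0411) is not in the table yet.
try:
    convert("\u0411", "upper_latin", sequence_table)
    needs_expansion = False
except LookupError:
    needs_expansion = True

if needs_expansion:
    # The user adds the Cyrillic letters А, Б, В, Г (U+0410 to U+0413).
    expand(sequence_table, "upper_cyrillic",
           ["\u0410", "\u0411", "\u0412", "\u0413"])

replacement = convert("\u0411", "upper_latin", sequence_table)
```

After the expansion, "Б" has value 2 and maps to the Latin "B", matching the result stated in the example.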
Embodiment 8
This embodiment builds on Embodiments 1 to 5; the sequence number table gains sequence number types as the language and application field change.
When this embodiment is implemented, translating Russian requires expanding the sequence number table.
The new entry is as follows:
Sequence number type | А | Б | В | Г | … | У | … |
After the expansion, the sequence number table is applicable to Russian translation.
Embodiment 8
This embodiment builds on Embodiment 8: the number table gains number types as the language and application field change.
When this embodiment is implemented, translating texts in the banking and finance field requires expanding the number table.
The new entry is as follows:
Number type | One | Two | … | 100 | … | 1000 | … | Yi Wan | … |
After the expansion, the number table is applicable to the banking and finance field.
Embodiment 9
As shown in Fig. 2, the number and sequence number replacement system for machine translation of the present invention comprises: a selection unit for choosing from the corpus the reference sentence most similar to the original sentence and selecting the reference translation that matches the reference sentence; a judging unit for judging whether a difference between the original sentence and the reference sentence is a sequence number or a number; and a replacement unit for performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table when the difference is a number.
When this embodiment is implemented, the selection unit chooses from the corpus the reference sentence most similar to the original sentence and selects the reference translation that matches the reference sentence; the judging unit then judges whether the difference between the original sentence and the reference sentence is a sequence number or a number; and the replacement unit performs the replacement according to the result from the judging unit. By choosing a reference sentence, a reference translation, a sequence number table and a number table, the invention replaces the sequence numbers and numbers of the original text with sequence numbers and numbers that follow the conventions of the translation, which improves the applicability of the translation. Because the reference sentence, reference translation, sequence number table and number table change with the language and the terminology environment, the invention can be used for multiple languages and terminology environments, which improves the versatility of the translation.
Embodiment 10
This embodiment builds on Embodiment 9. The selection unit chooses from the corpus the sentence that is equal in length to the original sentence and has the smallest WER value as the reference sentence most similar to the original sentence. The WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
When this embodiment is implemented, the selection unit, in choosing the reference sentence most similar to the original sentence, considers sentences equal in length to the original sentence and picks the most similar one by calculating WER values. Because the reference sentence is equal in length to the original sentence, additions and deletions of sentence content need not be considered during replacement, which guarantees that the replacement position is correct and improves the accuracy of the replaced translation.
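The selection rule above (equal length, smallest Levenshtein distance) can be sketched as follows. Computing the distance over tokens rather than characters is an assumption, as are the function names `levenshtein` and `choose_reference`; the text only says that the WER value is the minimum number of modification steps.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x with y (or keep)
            ))
        previous = current
    return previous[-1]


def choose_reference(original_tokens, corpus):
    """Pick the equal-length corpus sentence with the smallest distance."""
    candidates = [s for s in corpus if len(s) == len(original_tokens)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: levenshtein(original_tokens, s))
```

For two token lists of equal length that differ only by substitutions at two positions, such as the sentences in Embodiment 6, this distance is 2.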
Embodiment 11
As shown in Fig. 2, this embodiment builds on Embodiment 9 and further comprises: a production unit for making, storing and modifying the sequence number table and the number table. The sequence number table comprises sequence number types and the sequence number value corresponding to each sequence number; the number table comprises number types and the number value corresponding to each number.
When this embodiment is implemented, the sequence number table made by the production unit comprises sequence number types and the sequence number value corresponding to each sequence number, and the number table comprises number types and the number value corresponding to each number. This correspondence is clearly structured and improves the accuracy of replacement when sequence numbers and numbers are replaced during translation.
The embodiments above describe the purpose, technical solution and beneficial effects of the present invention in further detail. It should be understood that the foregoing are merely embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (9)
1. A number and sequence number replacement method for machine translation, characterised by comprising the following steps:
S1: choosing from the corpus the reference sentence most similar to the original sentence, and selecting the reference translation that matches the reference sentence;
S2: judging whether a difference between the original sentence and the reference sentence is a sequence number or a number;
S3: if the difference is a sequence number, performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table; if the difference is a number, performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table.
2. The number and sequence number replacement method for machine translation according to claim 1, characterised in that step S1 further comprises the following sub-step:
choosing from the corpus the sentence that is equal in length to the original sentence and has the smallest WER value as the reference sentence most similar to the original sentence; the WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
3. The number and sequence number replacement method for machine translation according to claim 1, characterised by further comprising the following step:
making the sequence number table and the number table; the sequence number table comprises sequence number types and the sequence number value corresponding to each sequence number; the number table comprises number types and the number value corresponding to each number.
4. The number and sequence number replacement method for machine translation according to claim 3, characterised in that the sequence number replacement comprises the following steps:
if the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number all belong to sequence number types in the sequence number table, calculating the sequence number value of the original sentence sequence number from the sequence number table, finding the corresponding sequence number in the sequence number table according to that sequence number value and the sequence number type of the reference translation, and replacing the original sentence sequence number with it;
if any one of the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number does not belong to a sequence number type in the sequence number table, and the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number are all multi-level Arabic numerals, leaving the original sentence sequence number unmodified;
if any one of the reference sentence sequence number, the reference translation sequence number and the original sentence sequence number does not belong to a sequence number type in the sequence number table and is not a multi-level Arabic numeral either, requesting expansion of the sequence number types of the sequence number table.
5. The number and sequence number replacement method for machine translation according to claim 3, characterised in that the number replacement comprises the following steps:
if the reference sentence number, the reference translation number and the original sentence number all belong to number types in the number table, calculating the number value of the original sentence number from the number table, finding the corresponding number in the number table according to that number value and the number type of the reference translation, and replacing the original sentence number with it;
if any one of the reference sentence number, the reference translation number and the original sentence number does not belong to a number type in the number table, requesting expansion of the number types of the number table.
6. The number and sequence number replacement method for machine translation according to claim 1, characterised in that the sequence number table gains sequence number types as the language and application field change, and the number table gains number types as the language and application field change.
7. A number and sequence number replacement system for machine translation using the method of any one of claims 1 to 6, characterised by comprising:
a selection unit for choosing from the corpus the reference sentence most similar to the original sentence and selecting the reference translation that matches the reference sentence;
a judging unit for judging whether a difference between the original sentence and the reference sentence is a sequence number or a number;
a replacement unit for performing sequence number replacement on the original sentence sequence number according to the reference sentence, the reference translation and the sequence number table when the difference is a sequence number, and performing number replacement on the original sentence number according to the reference sentence, the reference translation and the number table when the difference is a number.
8. The number and sequence number replacement system for machine translation according to claim 7, characterised in that the selection unit chooses from the corpus the sentence that is equal in length to the original sentence and has the smallest WER value as the reference sentence most similar to the original sentence; the WER value is the minimum number of steps required to modify the original sentence into the reference sentence; the WER value uses the Levenshtein distance.
9. The number and sequence number replacement system for machine translation according to claim 7, characterised by further comprising:
a production unit for making, storing and modifying the sequence number table and the number table; the sequence number table comprises sequence number types and the sequence number value corresponding to each sequence number; the number table comprises number types and the number value corresponding to each number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710187175.1A CN107066454A (en) | 2017-03-27 | 2017-03-27 | Number and sequence number replacement method and system for machine translation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107066454A (en) | 2017-08-18 |
Family
ID=59621157
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710187175.1A Pending CN107066454A (en) | 2017-03-27 | 2017-03-27 | Number and sequence number replacement method and system for machine translation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107066454A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8244519B2 (en) * | 2008-12-03 | 2012-08-14 | Xerox Corporation | Dynamic translation memory using statistical machine translation |
CN103020044A (en) * | 2012-12-03 | 2013-04-03 | 江苏乐买到网络科技有限公司 | Machine-aided webpage translation method and system thereof |
Non-Patent Citations (1)
Title |
---|
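Zhang Bingquan (张秉权): "Lexical analysis techniques for digits and numeral-related expressions in machine translation" (机器翻译中数字和数词相关表达形式的词法分析技术), Computer Engineering and Applications (《计算机工程与应用》) * |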
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170818 |