CN106844355A - A kind of date-time automatic translation control method - Google Patents
A kind of date-time automatic translation control method
- Publication number
- CN106844355A (application CN201710028790.8A)
- Authority
- CN
- China
- Prior art keywords
- date
- translation
- rule
- time
- control method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The invention discloses a date-time automatic translation control method. The method uses rule-based recognition and translation of times, numerals, dates, and currency amounts: for any given English sentence to be translated, the time, numeral, date, and currency content is first translated into Chinese automatically, and statistical machine translation then translates the remaining untranslated parts to complete the translation. The invention addresses the tendency of traditional statistical machine translation to mistranslate content that is highly regular in form, and it optimizes the overall translation process, so that English-to-Chinese translation quality improves somewhat. Because only a small rule-based translation system is added, overall translation speed is not affected, while English source text within the coverage of the rules is guaranteed to be translated correctly.
Description
Technical field
The invention belongs to the field of machine translation technology, and in particular relates to a date-time automatic translation control method.
Background art
In traditional statistical machine translation, the corpus is the most important factor in translation quality and largely determines how well a machine translation system can translate. Highly regular content such as times, numerals, dates, and currency amounts takes many surface forms, and the range of possible numbers is unbounded, so a corpus can rarely cover every case and high accuracy is hard to achieve. Existing systems show this weakness: given the sentence "There are 7bn people in the earth.", for example, the Baidu translation system cannot correctly translate "7bn" as 7,000,000,000.
In summary, traditional statistical machine translation tends to mistranslate text that follows regular patterns.
Summary of the invention
The object of the invention is to provide a date-time automatic translation control method that solves the problem, present in traditional statistical machine translation, that text following regular patterns is easily mistranslated.
The invention is realized as follows. The date-time automatic translation control method uses rule-based recognition and translation of times, numerals, dates, and currency amounts: for any given English sentence to be translated, the time, numeral, date, and currency content is first translated into Chinese automatically, and statistical machine translation then translates the remaining untranslated parts to complete the translation.
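The two-stage flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation disclosed in the source: the single date pattern, the `__RULEn__` placeholder scheme, and the function names are hypothetical, and `smt_translate` is a stand-in that returns its input unchanged where a real statistical engine would be called.

```python
import re

MONTHS = {"January": 1, "February": 2, "March": 3, "April": 4,
          "May": 5, "June": 6, "July": 7, "August": 8,
          "September": 9, "October": 10, "November": 11, "December": 12}

# One illustrative rule: "<Month> <day>, <year>", e.g. "January 2, 2016".
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b")

def translate_rule_spans(sentence):
    """Replace rule-covered spans with placeholders; return masked text and translations."""
    translations = []

    def repl(m):
        month, day, year = MONTHS[m.group(1)], int(m.group(2)), int(m.group(3))
        translations.append(f"{year}年{month}月{day}日")
        return f"__RULE{len(translations) - 1}__"

    return DATE_RE.sub(repl, sentence), translations

def smt_translate(text):
    """Hypothetical stand-in for the statistical engine (assumption: identity)."""
    return text

def translate(sentence):
    # Stage 1: rule-based translation of the regular content.
    masked, translations = translate_rule_spans(sentence)
    # Stage 2: statistical translation of everything that is left.
    output = smt_translate(masked)
    # Put the rule-based translations back in place of the placeholders.
    for i, zh in enumerate(translations):
        output = output.replace(f"__RULE{i}__", zh)
    return output
```

Masking the rule-covered span before the statistical stage is one possible way to keep the engine from altering it; the source does not say how the two stages are actually joined.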
Further, the date-time automatic translation control method comprises the following steps:
Step 1, pretreatment: preprocess the source text, including adding a space before punctuation marks, normalizing word-form (morphological) variation, and expanding abbreviations; for example, 105bn is converted to 105 billion;
Step 2, dictionary lookup and word segmentation: using maximum-matching segmentation, scan the whole sentence from left to right and, according to the segmentation dictionary, find every longest phrase in the sentence that is present in the dictionary. For example, if the sentence contains "Greenwich Mean Time" and the segmentation dictionary contains an entry for "Greenwich Mean Time" (part of speech N, with its Chinese translation), the phrase is identified in the sentence and marked with the time-noun attribute;
Step 3, date-time rules and matching: enumerate the date and time forms that may occur in real text and write a corresponding rule for each; for date translation, rule matching must correctly identify the year, month, and day;
Step 4, conversion and generation: provide a translation pattern for every rule, which is used to convert the match and generate the translation.
Further, the corresponding rules include:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1].
Further, step 3 requires a number of functions to be defined, such as a numeric-range function and functions for the year, month, and day. The day function DAY accepts the numbers 1-31 or the ordinals 1st-31st; the month function MONTH accepts January to December and their abbreviations; the year function YEAR generally accepts numbers between 1200 and 2500.
Rules are then defined according to the ways dates are written in English, for example:
January 2, 2016
2 January, 2016
The following rules are then defined:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1]
These rules match the two forms above.
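One way to read the function and rule notation above is as predicates over tokens, combined into fixed-length patterns. The sketch below follows that reading under assumptions: the names `is_day`, `is_month`, `is_year`, `RULES`, and `match_rule` are illustrative, three-letter month abbreviations are assumed, and the comma is assumed to have been split off as its own token during pretreatment.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november", "december"]

# Map full month names and three-letter abbreviations to month numbers.
MONTH_INDEX = {}
for i, name in enumerate(MONTH_NAMES, start=1):
    MONTH_INDEX[name] = i
    MONTH_INDEX[name[:3]] = i

def is_day(tok):
    """DAY: 1-31, optionally written as an ordinal (1st-31st)."""
    m = re.fullmatch(r"(\d{1,2})(st|nd|rd|th)?", tok.lower())
    return bool(m) and 1 <= int(m.group(1)) <= 31

def is_month(tok):
    """MONTH: January to December, or an abbreviation such as Feb or Feb."""
    return tok.lower().rstrip(".") in MONTH_INDEX

def is_year(tok):
    """YEAR: a number in the range 1200-2500 given in the text."""
    return tok.isdecimal() and 1200 <= int(tok) <= 2500

def is_comma(tok):
    return tok == ","

# Each rule is a sequence of predicates, one per token position.
RULES = {
    "MONTH DAY , YEAR": [is_month, is_day, is_comma, is_year],
    "DAY MONTH , YEAR": [is_day, is_month, is_comma, is_year],
}

def match_rule(tokens):
    """Return the name of the first rule that matches the whole token list, else None."""
    for name, preds in RULES.items():
        if len(tokens) == len(preds) and all(p(t) for p, t in zip(preds, tokens)):
            return name
    return None
```

Range checks inside the predicates are what let a rule reject a token sequence such as `January 32 , 2016` even though its shape looks like a date.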
Another object of the invention is to provide a date-time automatic translation control system for the date-time automatic translation control method, the system comprising:
a pretreatment module, for preprocessing the source text, including adding a space before punctuation marks, normalizing word-form variation, and expanding abbreviations;
a dictionary lookup and word segmentation module, for scanning the whole sentence from left to right using the maximum matching method and, according to the segmentation dictionary, finding every longest phrase in the sentence that is present in the dictionary;
a date-time rule and matching module, for enumerating the date and time forms that may occur in real text and writing a corresponding rule for each; for date translation, rule matching must correctly identify the year, month, and day;
a conversion and generation module, for providing a translation pattern for every rule, which is used to convert the match and generate the translation.
Another object of the invention is to provide a machine translation system that uses the date-time automatic translation control method.
To improve the translation of such highly regular content, the date-time automatic translation control method provided by the invention uses rule-based recognition and translation of times, numerals, dates, and currency amounts. For any given English sentence to be translated, the invention first translates the time, numeral, date, and currency content into Chinese automatically, and then uses traditional statistical machine translation to translate the remaining untranslated parts, completing the translation.
The invention addresses the tendency of traditional statistical machine translation to mistranslate highly regular content, and it optimizes the overall translation process, so that numerals, dates, and similar content are translated from English to Chinese more accurately. In comparative automatic evaluation the BLEU score improved by 0.3 points (out of 100), and in comparative human evaluation the score improved by 0.12 points (out of 4). Because only a small rule-based translation system is added, overall translation speed is not affected, while English source text within the coverage of the rules is guaranteed to be translated correctly.
The invention solves the problem, present in traditional statistical machine translation, that some regular text is easily mistranslated, by using algorithms to recognize and translate this content, which comprises numerals, dates, currency amounts, and times. For example, Baidu Translate renders "Jul 3rd, I went to home." as the Chinese equivalent of "In March, I went home." Because the corpus contains no such expression, "Jul 3rd" cannot be correctly translated as July 3; with the method of the invention, this kind of mistranslation does not occur.
Brief description of the drawings
Fig. 1 is a flow chart of the date-time automatic translation control method provided by an embodiment of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The application principle of the invention is explained in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the date-time automatic translation control method provided by an embodiment of the invention comprises the following steps:
S101, pretreatment: preprocess the source text, including adding a space before punctuation marks, normalizing word-form variation, and expanding abbreviations; for example, 105bn is converted to 105 billion.
S102, dictionary lookup and word segmentation: using the maximum matching method, scan the whole sentence from left to right and, according to the segmentation dictionary, find every longest phrase in the sentence that is present in the dictionary, for example "Greenwich Mean Time" (part of speech N, with its Chinese translation).
S103, date-time rules and matching: enumerate the date and time forms that may occur in real text and write a corresponding rule for each; for date translation, rule matching must correctly identify the year, month, and day.
S104, conversion and generation: provide a translation pattern for every rule, which is used to convert the match and generate the translation.
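The pretreatment of S101 might look like the following sketch. The abbreviation table `ABBREVIATIONS` and the function name `preprocess` are assumptions made for illustration; the source gives only the 105bn example, so the `mn` entry is hypothetical, and word-form normalization is left out here.

```python
import re

# Hypothetical abbreviation table; only "bn" comes from the example in the text.
ABBREVIATIONS = {"bn": "billion", "mn": "million"}

def preprocess(sentence):
    """Expand numeric abbreviations and put a space before punctuation marks."""
    def expand(m):
        # "105bn" -> "105 billion"
        return f"{m.group(1)} {ABBREVIATIONS[m.group(2).lower()]}"

    pattern = r"\b(\d+(?:\.\d+)?)(" + "|".join(ABBREVIATIONS) + r")\b"
    sentence = re.sub(pattern, expand, sentence, flags=re.IGNORECASE)

    # Insert a space before punctuation so each mark becomes its own token.
    # A period followed by a digit is left alone to protect decimals such as 1.5;
    # a fuller version would also protect thousands separators and clock times.
    sentence = re.sub(r"\s*([,;!?]|\.(?!\d))", r" \1", sentence)
    return sentence
```

Splitting the comma from its neighbours is what allows a later rule to treat it as a separate position in a pattern such as day, month, comma, year.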
Further, in S102, if the sentence contains "Greenwich Mean Time" and the segmentation dictionary also contains an entry for "Greenwich Mean Time" (part of speech N, with its Chinese translation), the phrase is identified in the sentence and marked with the time-noun attribute.
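Forward maximum matching, as described for S102, can be sketched as below. The dictionary contents, the attribute label `TIME_NOUN`, and the names `DICTIONARY`, `MAX_LEN`, and `max_match` are illustrative assumptions; the actual segmentation dictionary is not given in the source.

```python
# Hypothetical segmentation dictionary: phrase (as a lowercase token tuple) -> attribute.
DICTIONARY = {
    ("greenwich", "mean", "time"): "TIME_NOUN",
    ("mean", "time"): "TIME_NOUN",
}

# Longest phrase in the dictionary, which bounds the lookup window.
MAX_LEN = max(len(key) for key in DICTIONARY)

def max_match(tokens):
    """Scan left to right, taking the longest dictionary phrase at each position."""
    result, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in DICTIONARY:
                result.append((" ".join(tokens[i:i + n]), DICTIONARY[key]))
                i += n
                break
        else:
            # No dictionary phrase starts here; emit the single token unlabeled.
            result.append((tokens[i], None))
            i += 1
    return result
```

Because the longest candidate is tried first, "Greenwich Mean Time" is returned as one unit even though the shorter "Mean Time" is also in this sample dictionary.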
Further, S103 specifically includes the following. A number of functions must be defined, such as a numeric-range function and functions for the year, month, and day. The day function DAY accepts the numbers 1-31 or the ordinals 1st-31st; the month function MONTH accepts January to December and their abbreviations; the year function YEAR generally accepts numbers between 1200 and 2500.
Rules are then defined according to the ways dates are written in English, for example:
January 2, 2016
2 January, 2016
The following rules are then defined:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1]
These rules match the two forms above.
The application principle of the invention is further described below with reference to a specific embodiment.
The date-time automatic translation control method provided by an embodiment of the invention comprises the following steps:
Step 1, pretreatment: preprocess the source text, including adding spaces and expanding abbreviations; for example, 105bn is converted to 105 billion;
Step 2, dictionary lookup and word segmentation: using the maximum matching method, find the dictionary phrases in the sentence, for example "Greenwich Mean Time" (part of speech N, with its Chinese translation);
Step 3, date-time rules and matching: enumerate the date and time forms that may occur in real text and write a corresponding rule for each, for example:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
This rule matches dates of the form "17 Feb, 2016".
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
This rule matches dates of the form "May 12, 2016".
For date translation, rule matching must correctly identify the year, month, and day: for example, years are usually 18XX-20XX, months have various abbreviations, days are usually 1-31, and the other fields are handled similarly.
Step 4, conversion and generation: a translation pattern is provided for every rule in step 3, which is used to convert the match and generate the translation, for example:
(0)CAT[U]&&M_SEM[A|B]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->%3 Year %1%RI [%0]
Large numbers must likewise be converted into correct Chinese.
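The conversion and generation step can be illustrated with a small sketch that fills a Chinese date template from the matched positions. It assumes that `%3`, `%1`, and `%0` in the pattern above refer to the matched year, month, and day positions, and it writes the result in standard Chinese date notation (year 年, month 月, day 日). The names `MONTH_INDEX` and `generate_date`, and the way slot names are passed in, are hypothetical and do not reproduce the patent's template engine.

```python
import re

# Three-letter month prefixes mapped to month numbers
# (covers full names and common abbreviations).
MONTH_INDEX = {abbr: i for i, abbr in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def generate_date(tokens, order):
    """Build a Chinese date string from matched tokens.

    tokens: the matched tokens, e.g. ["17", "Feb", ",", "2016"]
    order:  the slot name for each position, e.g. ("DAY", "MONTH", ",", "YEAR")
    """
    slots = dict(zip(order, tokens))
    # Keep only the leading digits so that "3rd" is read as 3.
    day = int(re.match(r"\d+", slots["DAY"]).group())
    month = MONTH_INDEX[slots["MONTH"].lower().rstrip(".")[:3]]
    year = int(slots["YEAR"])
    # Chinese order is year, month, day regardless of the English order.
    return f"{year}年{month}月{day}日"
```

Both English orders produce the same Chinese order, which is why each rule carries its own mapping from positions to slots.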
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A date-time automatic translation control method, characterized in that the method uses rule-based recognition and translation of times, numerals, dates, and currency amounts: for any given English sentence to be translated, the time, numeral, date, and currency content is first translated into Chinese automatically, and statistical machine translation is then used to translate the remaining untranslated parts, completing the translation.
2. The date-time automatic translation control method according to claim 1, characterized in that the method comprises the following steps:
Step 1, pretreatment: preprocess the source text, including adding a space before punctuation marks, normalizing word-form variation, and expanding abbreviations;
Step 2, dictionary lookup and word segmentation: using maximum-matching segmentation, scan the whole sentence from left to right and, according to the segmentation dictionary, find every longest phrase in the sentence that is present in the dictionary;
Step 3, date-time rules and matching: enumerate the date and time forms that may occur in real text and write a corresponding rule for each; for date translation, rule matching must correctly identify the year, month, and day;
Step 4, conversion and generation: provide a translation pattern for every rule, which is used to convert the match and generate the translation.
3. The date-time automatic translation control method according to claim 2, characterized in that the corresponding rules in step 3 include:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1].
4. The date-time automatic translation control method according to claim 2, characterized in that, in step 3, the date-time rules and matching specifically include: defining functions, including a numeric-range function, a year function, a month function, and a day function; the day function is DAY, accepting the numbers 1-31 or the ordinals 1st-31st; the month function is MONTH, accepting January to December; the year function is YEAR, accepting numbers between 1200 and 2500;
rules are then defined according to the ways dates are written in English, the defined rules being:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1].
5. A date-time automatic translation control system for the date-time automatic translation control method according to claim 1, characterized in that the system comprises:
a pretreatment module, for preprocessing the source text, including adding a space before punctuation marks, normalizing word-form variation, and expanding abbreviations;
a dictionary lookup and word segmentation module, for scanning the whole sentence from left to right using the maximum matching method and, according to the segmentation dictionary, finding every longest phrase in the sentence that is present in the dictionary;
a date-time rule and matching module, for enumerating the date and time forms that may occur in real text and writing a corresponding rule for each; for date translation, rule matching must correctly identify the year, month, and day;
a conversion and generation module, for providing a translation pattern for every rule, which is used to convert the match and generate the translation.
6. A machine translation system using the date-time automatic translation control method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710028790.8A CN106844355A (en) | 2017-01-16 | 2017-01-16 | A kind of date-time automatic translation control method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106844355A true CN106844355A (en) | 2017-06-13 |
Family
ID=59123897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710028790.8A Pending CN106844355A (en) | 2017-01-16 | 2017-01-16 | A kind of date-time automatic translation control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844355A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652007A (en) * | 2020-06-09 | 2020-09-11 | 北京中科凡语科技有限公司 | Translation method and device for multi-language mixed file |
WO2023045873A1 (en) * | 2021-09-26 | 2023-03-30 | 北京字节跳动网络技术有限公司 | Application program translation method and apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101950286A (en) * | 2010-09-14 | 2011-01-19 | 传神联合(北京)信息技术有限公司 | Error correction module and method in software translation system |
US20150278201A1 (en) * | 2014-03-26 | 2015-10-01 | Microsoft Technology Licensing, Llc | Temporal translation grammar for language translation |
CN106326206A (en) * | 2015-06-24 | 2017-01-11 | 北京京东尚科信息技术有限公司 | Entity extraction method based on grammar templates |
Non-Patent Citations (2)
Title |
---|
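Zhu Jiangtao et al.: "A web-based method for automatic extraction of English abbreviation information", Proceedings of the 8th National Joint Conference on Computational Linguistics * |
Zheng Hong: "Recognition and translation techniques for Chinese-English bidirectional time, numeral, and quantifier expressions", China Master's Theses Full-text Database, Information Science and Technology * |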
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107463553B (en) | Text semantic extraction, representation and modeling method and system for elementary mathematic problems | |
CN110532573B (en) | Translation method and system | |
US8874433B2 (en) | Syntax-based augmentation of statistical machine translation phrase tables | |
US9984071B2 (en) | Language ambiguity detection of text | |
CN103678288B (en) | A kind of method of Automatic proper noun translation | |
Serrano et al. | Interactive handwriting recognition with limited user effort | |
CN110334362B (en) | Method for solving and generating untranslated words based on medical neural machine translation | |
CN106844355A (en) | A kind of date-time automatic translation control method | |
CN111144210A (en) | Image structuring processing method and device, storage medium and electronic equipment | |
CN103678270B (en) | Semantic primitive abstracting method and semantic primitive extracting device | |
Wong et al. | isentenizer-: Multilingual sentence boundary detection model | |
Mammadzada | A review of existing transliteration approaches and methods | |
CN102135957A (en) | Clause translating method and device | |
US11704505B2 (en) | Language processing method and device | |
CN109977430B (en) | Text translation method, device and equipment | |
CN101882158A (en) | Automatic translation sequence adjusting method based on contexts | |
JP2014229275A (en) | Query answering device and method | |
CN106339367A (en) | Method for automatically correcting Mongolian | |
CN102609410A (en) | Authority file auxiliary writing system and authority file generating method | |
CN106775914B (en) | A kind of code method for internationalizing and device for automatically generating key assignments | |
CN109871550A (en) | A method of the raising digital translation quality based on post-processing technology | |
CN107423293A (en) | The method and apparatus of data translation | |
CN109446537B (en) | Translation evaluation method and device for machine translation | |
Lu et al. | Language model for Mongolian polyphone proofreading | |
CN106681982B (en) | English novel abstraction generating method |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
CB02 | Change of applicant information | Address after: 16th floor, Railway Building, Shijingshan District, Beijing 100040. Applicant after: Chinese translation language through Polytron Technologies Inc. Address before: 16th floor, Railway Building, Shijingshan District, Beijing 100040. Applicant before: Mandarin Technology (Beijing) Co., Ltd.
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170613