CN106844355A - A kind of date-time automatic translation control method - Google Patents

A kind of date-time automatic translation control method

Info

Publication number
CN106844355A
Authority
CN
China
Prior art keywords
date
translation
rule
time
control method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710028790.8A
Other languages
Chinese (zh)
Inventor
程国艮
宗浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mandarin Technology (beijing) Co Ltd
Original Assignee
Mandarin Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mandarin Technology (beijing) Co Ltd
Priority to CN201710028790.8A
Publication of CN106844355A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The invention discloses a date-time automatic translation control method. The method uses rule-based recognition and translation of times, numbers, dates, and currency amounts: for any given English sentence to be translated, the time, number, date, and currency content is automatically translated into Chinese by rules, and the remaining untranslated part is then translated by statistical machine translation to complete the whole translation. The invention addresses the tendency of traditional statistical machine translation to mistranslate content that follows strong regular patterns, and optimizes the overall translation process, so that the overall quality of English-to-Chinese translation improves to some extent. By incorporating a small rule-based translation system, it ensures, without affecting overall translation speed, that English source text within the coverage of the rules is translated correctly.

Description

A kind of date-time automatic translation control method
Technical field
The invention belongs to the field of machine translation technology, and in particular relates to a date-time automatic translation control method.
Background art
In traditional statistical machine translation, the corpus is the most important factor determining translation quality and often determines the translation ability of a machine translation system. For special content with regular structure, such as times, numbers, dates, and currency amounts, the forms vary widely and the range of numbers is unbounded, so a corpus can rarely cover every case and high accuracy cannot be achieved. For example, when an existing machine translation system such as Baidu Translate translates the sentence "There are 7bn people in the earth.", it cannot correctly translate 7bn as 7 billion (7,000,000,000).
In summary, conventional statistical machine translation tends to mistranslate text that follows regular patterns.
Summary of the invention
The object of the present invention is to provide a date-time automatic translation control method, intended to solve the problem that conventional statistical machine translation tends to mistranslate text that follows regular patterns.
The present invention is realized as follows: a date-time automatic translation control method that uses rule-based recognition and translation of times, numbers, dates, and currency amounts. For any given English sentence to be translated, the time, number, date, and currency content is automatically translated into Chinese, and statistical machine translation then translates the remaining untranslated part to complete the whole translation.
Further, the date-time automatic translation control method comprises the following steps:
Step one, preprocessing: the source text is preprocessed, including adding a space before punctuation marks, handling morphological changes of words, and handling abbreviations; for example, 105bn must be converted to 105 billion;
Step two, dictionary lookup and word segmentation: using the maximum matching segmentation method, the whole sentence is scanned from left to right and, according to the segmentation dictionary, all the longest phrases in the sentence that are present in the dictionary are found. Suppose the sentence contains Greenwich Mean Time and the segmentation dictionary contains the entry "Greenwich Mean Time" with category N and its Chinese translation; the entry is then found in the sentence and marked with the time-noun attribute;
Step three, date-time rules and their matching: the various date and time forms that may occur in real language are enumerated and the corresponding rules are written; for date translation, rule matching must ensure that the year, month, and day are correctly recognized;
Step four, conversion and generation: a translation pattern is provided for every rule so that the translation can be generated by conversion.
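As an illustration of the preprocessing in step one, the following is a minimal Python sketch, not the patented implementation. The function name preprocess and the abbreviation table are assumptions made for this example; the source only gives the bn to billion case, and the mn entry is added purely for illustration.

```python
import re

# Hypothetical abbreviation table; only "bn" -> "billion" comes from the text,
# "mn" is an assumed extra entry for illustration.
ABBREVIATIONS = {"bn": "billion", "mn": "million"}

_ABBREV_PATTERN = re.compile(
    r"\b(\d+(?:\.\d+)?)\s*(" + "|".join(ABBREVIATIONS) + r")\b",
    flags=re.IGNORECASE,
)


def preprocess(sentence: str) -> str:
    """Expand number abbreviations and put a space before punctuation."""

    def expand(match: re.Match) -> str:
        # "105bn" -> "105 billion"
        return f"{match.group(1)} {ABBREVIATIONS[match.group(2).lower()]}"

    sentence = _ABBREV_PATTERN.sub(expand, sentence)
    # Separate punctuation that ends a word, so it becomes its own token.
    # Punctuation inside a number such as "3.5" is left untouched because
    # it is not followed by whitespace or the end of the sentence.
    return re.sub(r"\s*([,.!?;:])(?=\s|$)", r" \1", sentence)


print(preprocess("There are 7bn people on earth."))
```

Handling of morphological changes is omitted here because the text does not describe how it is done.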
Further, the corresponding rules include:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1].
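The rule notation above can be read into a simple data structure. The following Python sketch only captures the visible syntax: numbered elements joined by +, constraints joined by &&, and actions after --> separated by semicolons. The names Element, Rule, and parse_rule are hypothetical, reading && as a conjunction is an assumption, and the meaning of the TREE and PUT actions is not defined in the text, so the sketch stores them without interpreting them.

```python
import re
from dataclasses import dataclass

# NAME[argument], tolerating a stray space before the bracket.
ITEM = re.compile(r"(\w+)\s*\[(.*?)\]")


@dataclass
class Element:
    position: int
    constraints: list  # (name, argument) pairs that were joined by "&&"


@dataclass
class Rule:
    pattern: list  # list of Element
    actions: list  # (name, [arguments]) pairs, stored uninterpreted


def parse_rule(text: str) -> Rule:
    left, _, right = text.partition("-->")
    pattern = []
    for part in left.split("+"):
        match = re.match(r"\((\d+)\)(.*)", part.strip())
        position, rest = int(match.group(1)), match.group(2)
        constraints = [ITEM.fullmatch(c.strip()).groups() for c in rest.split("&&")]
        pattern.append(Element(position, constraints))
    actions = []
    for action in right.split(";"):
        action = action.strip()
        if not action:
            continue  # trailing ";" leaves an empty piece
        name, args = ITEM.fullmatch(action).groups()
        actions.append((name, [a.strip() for a in args.split(",")]))
    return Rule(pattern, actions)


rule = parse_rule(
    "(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];"
)
print(rule.pattern[1].constraints)
print(rule.actions)
```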
Further, in step three, a number of functions must be defined, such as numeric range functions and functions for year, month, and day. The day function DAY covers the numbers 1-31 or the ordinals 1st-31st; the month function MONTH covers January to December and their abbreviations; the year function YEAR typically covers numbers between 1200 and 2500.
Rules are then defined according to the ways dates are written in English, for example:
January 2, 2016
2 January, 2016
The following rules are then defined:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1]
These rules match the two forms above.
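The DAY, MONTH, and YEAR functions and the two rules above can be sketched in Python as follows. This is an assumed reading, not the patent's code: each function returns the recognized value or None, the three-letter month abbreviations are an assumption (the text only says abbreviations are covered), and the input is taken to be already tokenized with the comma as a separate token.

```python
import re

_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"]
MONTH_NAMES = {name: i + 1 for i, name in enumerate(_MONTHS)}
# Assumed abbreviation scheme: first three letters, optional trailing period.
MONTH_NAMES.update({name[:3]: i + 1 for i, name in enumerate(_MONTHS)})


def day(token):
    """DAY: 1-31 or 1st-31st."""
    m = re.fullmatch(r"([0-9]{1,2})(?:st|nd|rd|th)?", token.lower())
    return int(m.group(1)) if m and 1 <= int(m.group(1)) <= 31 else None


def month(token):
    """MONTH: January to December and abbreviations."""
    return MONTH_NAMES.get(token.lower().rstrip("."))


def year(token):
    """YEAR: a number between 1200 and 2500."""
    if re.fullmatch(r"[0-9]{4}", token) and 1200 <= int(token) <= 2500:
        return int(token)
    return None


def comma(token):
    """CHI[,]: a literal comma token."""
    return "," if token == "," else None


# Each rule: a sequence of checks and the positions of (year, month, day).
RULES = [
    ([month, day, comma, year], (3, 0, 1)),  # January 2 , 2016
    ([day, month, comma, year], (3, 1, 0)),  # 2 January , 2016
]


def match_date(tokens, start=0):
    """Return (year, month, day) if a rule matches at tokens[start:], else None."""
    for checks, (yi, mi, di) in RULES:
        window = tokens[start:start + len(checks)]
        if len(window) < len(checks):
            continue
        values = [check(tok) for check, tok in zip(checks, window)]
        if all(v is not None for v in values):
            return values[yi], values[mi], values[di]
    return None


print(match_date(["January", "2", ",", "2016"]))
print(match_date(["2", "January", ",", "2016"]))
```

Range checks inside the functions are what let the matcher tell a day from a year, which is the point the text makes about correctly recognizing year, month, and day.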
Another object of the present invention is to provide a date-time automatic translation control system for the date-time automatic translation control method, the system comprising:
A preprocessing module, for preprocessing the source text, including adding a space before punctuation marks, handling morphological changes of words, and handling abbreviations;
A dictionary lookup and word segmentation module, for scanning the whole sentence from left to right using the maximum matching method and, according to the segmentation dictionary, finding all the longest phrases in the sentence that are present in the dictionary;
A date-time rule and matching module, for enumerating the various date and time forms that may occur in real language and writing the corresponding rules; for date translation, rule matching must ensure that the year, month, and day are correctly recognized;
A conversion and generation module, for providing a translation pattern for every rule so that the translation can be generated by conversion.
Another object of the present invention is to provide a machine translation system using the date-time automatic translation control method.
To improve the translation of content that follows strong regular patterns, the date-time automatic translation control method provided by the present invention uses rule-based recognition and translation of times, numbers, dates, and currency amounts. For any given English sentence to be translated, the invention automatically translates the time, number, date, and currency content into Chinese and then uses traditional statistical machine translation to translate the remaining untranslated part, completing the whole translation.
The present invention addresses the tendency of traditional statistical machine translation to mistranslate content that follows strong regular patterns, and optimizes the overall translation process, so that numbers, dates, and similar content are translated from English into Chinese more accurately. In comparative automatic evaluation the BLEU score improved by 0.3 points (on a 100-point scale), and in comparative human evaluation the score improved by 0.12 points (on a 4-point scale). By incorporating a small rule-based translation system, the invention ensures, without affecting overall translation speed, that English source text within the coverage of the rules is translated correctly.
The present invention solves the problem that conventional statistical machine translation tends to mistranslate text that follows regular patterns, using algorithms to recognize and translate such content, which comprises numbers, dates, currency amounts, and times. For example, Baidu Translate renders "Jul 3rd, I went to home." as a Chinese sentence meaning "In March, I went home." Because no such expression exists in the corpus, Jul 3rd cannot be correctly translated as July 3; with the method of the invention, this kind of translation error does not occur.
Brief description of the drawings
Fig. 1 is a flow chart of the date-time automatic translation control method provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The application principle of the invention is explained in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the date-time automatic translation control method provided by the embodiment of the present invention comprises the following steps:
S101: Preprocessing: the source text is preprocessed, including adding a space before punctuation marks, handling morphological changes of words, and handling abbreviations; for example, 105bn must be converted to 105 billion.
S102: Dictionary lookup and word segmentation: using the maximum matching segmentation method, the whole sentence is scanned from left to right and, according to the segmentation dictionary, all the longest phrases in the sentence that are present in the dictionary are found, such as the entry "Greenwich Mean Time" with category N and its Chinese translation.
S103: Date-time rules and their matching: the various date and time forms that may occur in real language are enumerated and the corresponding rules are written; for date translation, rule matching must ensure that the year, month, and day are correctly recognized.
S104: Conversion and generation: a translation pattern is provided for every rule so that the translation can be generated by conversion.
Further, in S102, suppose the sentence contains Greenwich Mean Time and the segmentation dictionary also contains the entry "Greenwich Mean Time" with category N and its Chinese translation; the entry is then found in the sentence and marked with the time-noun attribute.
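The left-to-right maximum matching lookup in S102 can be sketched in Python as below. The dictionary layout, the max_len limit, and the function name segment are assumptions for this example, and the Chinese strings are illustrative translations supplied here, not taken from the patent's dictionary.

```python
# Hypothetical segmentation dictionary:
# lowercased phrase (tuple of words) -> (category, Chinese translation).
DICTIONARY = {
    ("greenwich", "mean", "time"): ("N", "格林尼治标准时间"),
    ("greenwich",): ("N", "格林尼治"),
}


def segment(words, dictionary, max_len=5):
    """Scan left to right, always taking the longest phrase found in the dictionary."""
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in dictionary:
                category, translation = dictionary[key]
                result.append((" ".join(words[i:i + n]), category, translation))
                i += n
                break
        else:
            # No dictionary phrase starts here; keep the word unmarked.
            result.append((words[i], None, None))
            i += 1
    return result


for item in segment("It is noon Greenwich Mean Time".split(), DICTIONARY):
    print(item)
```

Because the three-word candidate is tried before the one-word candidate, "Greenwich Mean Time" is kept as a single phrase even though "Greenwich" alone is also in the dictionary.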
Further, S103 specifically includes the following. A number of functions must be defined, such as numeric range functions and functions for year, month, and day. The day function DAY covers the numbers 1-31 or the ordinals 1st-31st; the month function MONTH covers January to December and their abbreviations; the year function YEAR typically covers numbers between 1200 and 2500.
Rules are then defined according to the ways dates are written in English, for example:
January 2, 2016
2 January, 2016
The following rules are then defined:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1]
These rules match the two forms above.
The application principle of the invention is further described below with reference to a specific embodiment.
The date-time automatic translation control method provided by the embodiment of the present invention comprises the following steps:
Step one, preprocessing: the source text is preprocessed, including adding spaces and handling abbreviations; for example, 105bn must be converted to 105 billion;
Step two, dictionary lookup and word segmentation: using the maximum matching method, the phrases in the sentence are found, such as the entry "Greenwich Mean Time" with category N and its Chinese translation;
Step three, date-time rules and their matching: the various date and time forms that may occur in real language are enumerated and the corresponding rules are written, for example:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
This rule matches dates of the form "17 Feb, 2016".
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
This rule matches dates of the form "May 12, 2016".
For date translation, rule matching must ensure that the year, month, and day are correctly recognized: for example, the year is usually 18XX-20XX, months have various abbreviations, and the day is usually 1-31; other cases are similar.
Step four, conversion and generation: a translation pattern is provided for every rule in step three so that the translation can be generated by conversion, for example:
(0)CAT[U]&&M_SEM[A|B]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->%3 Year %1%RI[%0]
For the translation of large numbers, the number must likewise be converted into correct Chinese.
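The conversion and generation step can be illustrated with a short Python sketch. It assumes that the translation pattern emits the year, month, and day in Chinese order with the markers 年, 月, and 日, and that large numbers are regrouped into the Chinese units 万 (ten thousand) and 亿 (hundred million). The function names and the SCALES table are hypothetical, and digits are kept as Arabic numerals for simplicity.

```python
# English scale words and their values (assumed table for this sketch).
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def date_to_chinese(year: int, month: int, day: int) -> str:
    """Generate a Chinese date in year-month-day order."""
    return f"{year}年{month}月{day}日"


def number_to_chinese(amount: int, scale_word: str) -> str:
    """Regroup an English scaled number into Chinese units.

    English groups digits by thousands, Chinese by ten-thousands, so
    "7 billion" (7 x 10^9) becomes 70 x 10^8, written 70亿.
    """
    value = amount * SCALES[scale_word]
    for unit, size in (("亿", 10**8), ("万", 10**4)):
        if value >= size and value % size == 0:
            return f"{value // size}{unit}"
    # Values that do not divide evenly are left as plain digits here.
    return str(value)


print(date_to_chinese(2016, 2, 17))
print(number_to_chinese(7, "billion"))
```

The regrouping is the reason a phrase table alone handles such numbers poorly: the digits in front of the unit change between the two languages.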
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A date-time automatic translation control method, characterized in that the method uses rule-based recognition and translation of times, numbers, dates, and currency amounts: for any given English sentence to be translated, the time, number, date, and currency content is automatically translated into Chinese, and statistical machine translation then translates the remaining untranslated part to complete the whole translation.
2. The date-time automatic translation control method of claim 1, characterized in that the method comprises the following steps:
Step one, preprocessing: the source text is preprocessed, including adding a space before punctuation marks, handling morphological changes of words, and handling abbreviations;
Step two, dictionary lookup and word segmentation: using the maximum matching segmentation method, the whole sentence is scanned from left to right and, according to the segmentation dictionary, all the longest phrases in the sentence that are present in the dictionary are found;
Step three, date-time rules and their matching: the various date and time forms that may occur in real language are enumerated and the corresponding rules are written; for date translation, rule matching must ensure that the year, month, and day are correctly recognized;
Step four, conversion and generation: a translation pattern is provided for every rule so that the translation can be generated by conversion.
3. The date-time automatic translation control method of claim 2, characterized in that the corresponding rules in step three include:
(0)DAY[1]+(1)CAT[N]&&MONTH[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1];
(0)CAT[N]&&MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]-->TREE[0,3,1];PUT[0,DATE,1].
4. The date-time automatic translation control method of claim 2, characterized in that, in step three, the date-time rules and their matching specifically include: defining functions, including numeric range functions, a year function, a month function, and a day function; the day function DAY covers the numbers 1-31 or the ordinals 1st-31st; the month function MONTH covers January to December; the year function YEAR covers numbers between 1200 and 2500;
rules are then defined according to the ways dates are written in English, the defined rules being:
(0)MONTH[1]+(1)DAY[1]+(2)CHI[,]+(3)YEAR[1]
(0)DAY[1]+(1)MONTH[1]+(2)CHI[,]+(3)YEAR[1].
5. A date-time automatic translation control system for the date-time automatic translation control method of claim 1, characterized in that the system comprises:
A preprocessing module, for preprocessing the source text, including adding a space before punctuation marks, handling morphological changes of words, and handling abbreviations;
A dictionary lookup and word segmentation module, for scanning the whole sentence from left to right using the maximum matching method and, according to the segmentation dictionary, finding all the longest phrases in the sentence that are present in the dictionary;
A date-time rule and matching module, for enumerating the various date and time forms that may occur in real language and writing the corresponding rules; for date translation, rule matching must ensure that the year, month, and day are correctly recognized;
A conversion and generation module, for providing a translation pattern for every rule so that the translation can be generated by conversion.
6. A machine translation system using the date-time automatic translation control method of any one of claims 1 to 3.
CN201710028790.8A 2017-01-16 2017-01-16 A kind of date-time automatic translation control method Pending CN106844355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710028790.8A CN106844355A (en) 2017-01-16 2017-01-16 A kind of date-time automatic translation control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710028790.8A CN106844355A (en) 2017-01-16 2017-01-16 A kind of date-time automatic translation control method

Publications (1)

Publication Number Publication Date
CN106844355A true CN106844355A (en) 2017-06-13

Family

ID=59123897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710028790.8A Pending CN106844355A (en) 2017-01-16 2017-01-16 A kind of date-time automatic translation control method

Country Status (1)

Country Link
CN (1) CN106844355A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652007A (en) * 2020-06-09 2020-09-11 北京中科凡语科技有限公司 Translation method and device for multi-language mixed file
WO2023045873A1 (en) * 2021-09-26 2023-03-30 北京字节跳动网络技术有限公司 Application program translation method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101950286A (en) * 2010-09-14 2011-01-19 传神联合(北京)信息技术有限公司 Error correction module and method in software translation system
US20150278201A1 (en) * 2014-03-26 2015-10-01 Microsoft Technology Licensing, Llc Temporal translation grammar for language translation
CN106326206A (en) * 2015-06-24 2017-01-11 北京京东尚科信息技术有限公司 Entity extraction method based on grammar templates

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱江涛 等: "一种基于网络的英文缩略语信息的自动抽取方法", 《全国第八届计算语言学联合学术会议论文集》 *
郑宏: "汉英双向时间数字和数量词的识别与翻译技术", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100040 Shijingshan District railway building, Beijing, the 16 floor

Applicant after: Chinese translation language through Polytron Technologies Inc

Address before: 100040 Shijingshan District railway building, Beijing, the 16 floor

Applicant before: Mandarin Technology (Beijing) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170613