CN102508878A - Method for generating standard foreign language page by means of machine translation system

Publication number
CN102508878A
CN102508878A CN2011103160329A CN201110316032A CN102508878A CN 102508878 A CN102508878 A CN 102508878A CN 2011103160329 A CN2011103160329 A CN 2011103160329A CN 201110316032 A CN201110316032 A CN 201110316032A CN 102508878 A CN102508878 A CN 102508878A
Authority
CN
China
Prior art keywords
translation
file
standard
web page
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103160329A
Other languages
Chinese (zh)
Inventor
周明明
王志波
汪澜
王明贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Gongjin Electronics Co Ltd
Original Assignee
Shenzhen Gongjin Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Gongjin Electronics Co Ltd
Priority to CN2011103160329A
Publication of CN102508878A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for generating a standard foreign language page by means of a machine translation system. It belongs to the technical field of networks, relates to web page development, and in particular to generating web page files in multiple languages. The method comprises the following steps: extracting the character strings that need translation from a fully functional web page file; submitting the character strings to the Google machine translation system for preliminary translation; standardizing the result and proofreading it manually to form a target translation file; and writing the final translated content into the web page file when the page is loaded, thereby generating web page versions in different languages.

Description

A method for generating a standard foreign language page by means of a machine translation system
Technical field
The invention belongs to the field of network technology, in particular to web page development, and specifically concerns generating web page files in multiple languages.
Background art
As markets develop, the customer base grows ever wider. To meet the needs of customers in different regions, the web page content of company websites and products needs multilingual support.
The approach mainly taken at present is as follows: once functional development meets the requirements, the developer extracts the character strings of the whole set of pages into an Excel table; after the customer has translated them all, the developer places the customer's translations back into the pages. Every step is done by hand.
A set of pages contains tens or even hundreds of pages, with between 10 and 50 character strings per page, so the workload is very large. Moreover, handing all the content to be translated to the customer without any preprocessing also puts considerable pressure on the customer.
In addition, users require the translation of web page content to be standardized. Machine translation alone cannot meet this requirement, while human translation is prone to omissions and adds manual workload.
It is therefore necessary to adopt a new method that automatically extracts the content needing translation from the web page files, corrects the preliminary translation through standardization, and then hands the result to the customer for checking and modification. When the page needs to be loaded, the translation result in the target language is automatically written into the web page file in place of the content needing translation, generating a web page file in the target language.
Summary of the invention
To achieve the above object, the present invention provides a method for generating a standard foreign language page by means of a machine translation system. The content needing translation is extracted from the page files and submitted to a machine translation system for preliminary translation; the result is then standardized to generate a target translation file, which replaces the corresponding content in the web page file when the web page is loaded.
The technical scheme adopted by the present invention is a method for generating a standard foreign language page by means of a machine translation system. The method is implemented on a computer that is connected to, or has installed, a machine translation system; fully functional page files have been generated on the computer and stored in its memory. In addition, a file extraction module and a file loading module are set up on the computer; a storage unit is set up to hold a standard translation lookup table, which stores source language character strings together with the standard translation of each string in the target language; and a standard translation module is set up. The automatic translation method comprises the following steps:
A. The file extraction module extracts the content needing translation from the web page file, then generates and stores a source xml file;
B. The generated source xml file is submitted to the machine translation system, and the returned target xml file is received and stored;
C. The standard translation module traverses the source xml file, finds the source language character strings that match the standard translation lookup table, and replaces the corresponding content in the target xml file with the corresponding standard translation in the target language;
D. When the page needs to be loaded, the file loading module writes the target xml file into the web page file in place of the content needing translation, generating a web page file in the target language.
The following step is also performed before step C:
The source xml file and the target xml file are converted into Excel format files, the converted files are handed over for manual proofreading, and the proofread target file is then converted back into an xml format file.
The part of a web page file that needs translation is the displayed content. The other parts of the file, such as commands, tags, and comments, neither need nor permit translation, so the content needing translation must first be extracted from the web page file. The machine translation system imposes certain format requirements on the files it receives, so the content must be organized into the specific file format before it is sent to the translation system.
The results of current machine translation systems are not yet satisfactory. To make the translation more accurate, manual proofreading can also be carried out.
To meet the standardization requirement, the translation result is processed automatically by comparing it against the standard translation lookup table, which yields a standardized translated text.
The standard translation lookup table contains all source character strings that require standardized translation and, correspondingly, the standard translation of each source string in every target language. These contents are specified according to the requirements of the user or project.
Depending on the target languages, the above steps are applied to a web page to generate several translated texts, each corresponding to one target language.
When a page is requested, the final translated text for the selected language is loaded through the file loading module and replaces the content needing translation on the page, generating a web page file in the target language.
With the method provided by the invention, the content needing translation can be extracted from the web pages automatically, and the result can be standardized after machine translation and handed to the customer for further proofreading to produce the target file. When a page is requested, the selected target file is automatically written into the web page file. This reduces manual workload, standardizes the translation of web pages, lightens the customer's workload, and speeds up development.
Description of drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a flow chart of the file extraction module.
Fig. 3 is a flow chart of the file loading module.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is the flow chart of the method. First, web page files in html format are generated. The content needing translation is then extracted and converted into xml format, and the file is submitted to the machine translation system. The translated result is proofread manually and then standardized to generate the target xml file. When a web page request is loaded, the content needing translation is replaced with the content of the target xml file, generating the required web page file.
Among existing machine translation systems, Google Translate supports the most languages and has the highest accuracy, and Google's huge database and efficient processing are a convenience to us. What is used here is the translation js library that Google provides to users free of charge: google translate api. Google supports translating a whole file in xml format. When translation is needed, connect to the internet and enter the following in the browser:
http://translate.google.com/translate?u=' + escape(need_translate_url) + '&langpair=en%7Czh-CN
Replace need_translate_url above with the network path of the file to be translated, and the automatic translation is carried out.
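The request address above can be assembled programmatically. The following is a minimal sketch in Python, assuming the endpoint and the u and langpair parameters behave as the description states; they reflect the service as described at filing time and may no longer be available. The function name build_translate_url and the example file address are hypothetical, and urllib.parse.quote stands in for the JavaScript escape() call (it encodes a few more characters, such as "/").

```python
from urllib.parse import quote

# Endpoint and parameter names are taken from the description above;
# whether the service still answers at this address is not guaranteed.
TRANSLATE_ENDPOINT = "http://translate.google.com/translate"


def build_translate_url(need_translate_url: str,
                        source: str = "en",
                        target: str = "zh-CN") -> str:
    """Return the address that asks the service to translate a hosted file."""
    # "|" separates source and target language and is percent-encoded as %7C.
    langpair = quote(f"{source}|{target}", safe="")
    encoded_file_url = quote(need_translate_url, safe="")
    return f"{TRANSLATE_ENDPOINT}?u={encoded_file_url}&langpair={langpair}"


if __name__ == "__main__":
    # Hypothetical network path of a source xml file.
    print(build_translate_url("http://example.com/source.xml"))
```

Changing the source and target arguments gives the address for another language pair, which is how the same source file could be submitted once per target language.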
To carry out machine translation, fully functional page files are first generated on the computer; at this stage they are files in html format. Because Google Translate requires a file in xml format, the html web page files must be converted.
Analysis of the pages reveals a rule: almost all character strings needing translation sit in leaf tags (that is, tags that contain no child tags), and the few strings that do not can be moved into leaf tags by modifying the page. Using this rule, the strings needing translation can be obtained by recursive traversal, as shown in Fig. 2. With batch processing added on top of this method, the file extraction module can complete string extraction and format conversion within minutes, generating the source xml file.
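The leaf-tag rule and the recursive traversal can be illustrated with a short Python sketch. It is not the patent's extraction module: it assumes the page is well-formed markup that an XML parser accepts, it skips script and style tags as an added assumption, and the source xml layout (a strings root with numbered s elements) is a hypothetical format chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: script and style hold code, not displayed text, so they are skipped.
SKIP_TAGS = {"script", "style"}


def collect_leaf_strings(node, found):
    """Recursively visit the tree and record the text of every leaf tag."""
    if node.tag in SKIP_TAGS:
        return
    children = list(node)
    if not children:
        # A leaf tag contains no child tags; keep its text if it is not empty.
        text = (node.text or "").strip()
        if text:
            found.append(text)
        return
    for child in children:
        collect_leaf_strings(child, found)


def extract_to_source_xml(page_markup: str) -> str:
    """Build a source xml document holding the strings that need translation."""
    found = []
    collect_leaf_strings(ET.fromstring(page_markup), found)
    root = ET.Element("strings")
    for index, text in enumerate(found):
        item = ET.SubElement(root, "s", id=str(index))
        item.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = ("<html><head><title>Setup</title><script>var a = 1;</script></head>"
            "<body><div><label>User name</label><span>Password</span></div></body></html>")
    print(extract_to_source_xml(page))
```

Numbering the strings in traversal order keeps each string's position stable, which matters later because the source and target files are matched by position.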
After the file extraction module has completed extraction and conversion, a language is selected and the source xml file is submitted to Google for translation by the method described above; the translation result is still in xml format. This work is repeated for each target language, generating a target xml file for each translation result.
To make the translation more accurate, manual proofreading can be carried out. For today's computer users Excel is a fairly common file format, so, according to the customer's actual situation and convenience, the xml files are converted into Excel format before being given to the customer. Because the customer is given a preliminary translation, the customer's workload is also reduced. After the customer finishes proofreading and returns the Excel file, the file is converted back into xml format.
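The round trip between xml and a spreadsheet can be sketched as follows. This is only an illustration under stated assumptions: CSV is used as a stand-in because Excel can open it and the Python standard library can write it, whereas a real Excel workbook would need a third-party library. The xml layout (a strings root with s elements) and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_pair_to_csv(source_xml: str, target_xml: str) -> str:
    """Put each source string next to its machine translation, one row per string."""
    source_items = list(ET.fromstring(source_xml))
    target_items = list(ET.fromstring(target_xml))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "source", "translation"])
    # Source and target strings are paired by position in their files.
    for src, tgt in zip(source_items, target_items):
        writer.writerow([src.get("id"), src.text or "", tgt.text or ""])
    return buffer.getvalue()


def csv_to_target_xml(csv_text: str) -> str:
    """Rebuild the target xml document from the proofread sheet."""
    root = ET.Element("strings")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "s", id=row["id"])
        item.text = row["translation"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    src = '<strings><s id="0">User name</s><s id="1">Password</s></strings>'
    tgt = '<strings><s id="0">Nom d\'utilisateur</s><s id="1">Mot de passe</s></strings>'
    sheet = xml_pair_to_csv(src, tgt)
    print(sheet)
    print(csv_to_target_xml(sheet))
```

Showing the source string beside its preliminary translation lets the proofreader correct the translation column only, and the id column keeps the rows aligned when the sheet is converted back.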
The standard translation lookup table stores source language character strings together with the standard translation of each string in the target language. For example, suppose the source language is Chinese and the web page contains a character string meaning "project". We stipulate that the standard English translation is "Project", the standard French translation is "projet", and the standard Japanese translation is "プロジェクト", and write these entries into the standard translation lookup table.
The standard translation module traverses the source xml file. Suppose it finds the "project" string. Because a source string and its translation occupy the same position in the source xml file and the target xml file, the position of the translated string can be located. The replacement then depends on the target language: if the target language is English, whatever content is at the corresponding position in the target file is replaced with "Project".
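The position-based replacement can be sketched in Python as follows. The lookup table contents come from the example in the description, with the key written as "project" as rendered in this text (in real use the key would be the string in the source language). The table structure, the xml layout, and the function name standardize are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: source string -> {target language: standard translation}.
STANDARD_TABLE = {
    "project": {"en": "Project", "fr": "projet", "ja": "プロジェクト"},
}


def standardize(source_xml: str, target_xml: str, language: str,
                table=STANDARD_TABLE) -> str:
    """Overwrite machine translations with the standard ones, matched by position."""
    source_items = list(ET.fromstring(source_xml))
    target_root = ET.fromstring(target_xml)
    target_items = list(target_root)
    # The n-th source string corresponds to the n-th target string.
    for src, tgt in zip(source_items, target_items):
        standard = table.get(src.text or "", {}).get(language)
        if standard is not None:
            # Whatever the machine translation produced here is replaced.
            tgt.text = standard
    return ET.tostring(target_root, encoding="unicode")


if __name__ == "__main__":
    src = '<strings><s id="0">project</s><s id="1">other</s></strings>'
    tgt = '<strings><s id="0">Item</s><s id="1">Other</s></strings>'
    print(standardize(src, tgt, "en"))
```

Strings that have no entry in the table for the requested language are left as the machine translation (or the proofread text) produced them.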
Likewise, the above work is repeated according to the kinds and number of target languages, generating several translation result files.
When a page is requested, the file loading module, following the process shown in Fig. 3, loads the target xml file for the selected target language and replaces the content needing translation on the page, generating a web page file in the target language.
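The loading step can be sketched in Python as well. This is an assumption-laden illustration rather than the process of Fig. 3: it assumes well-formed page markup, assumes the target xml lists strings in the same order in which leaf tags are visited, and uses hypothetical names (leaf_nodes, load_page, and a dictionary that maps a language code to the text of its target xml file).

```python
import xml.etree.ElementTree as ET

# Assumption: script and style hold code, not displayed text, so they are skipped.
SKIP_TAGS = {"script", "style"}


def leaf_nodes(node):
    """Yield leaf tags with visible text, in the order extraction would visit them."""
    if node.tag in SKIP_TAGS:
        return
    children = list(node)
    if not children:
        if (node.text or "").strip():
            yield node
        return
    for child in children:
        yield from leaf_nodes(child)


def load_page(page_markup: str, target_files: dict, language: str) -> str:
    """Write the translations for the selected language into the page."""
    page = ET.fromstring(page_markup)
    translations = [item.text or "" for item in ET.fromstring(target_files[language])]
    # The n-th leaf string on the page receives the n-th translated string.
    for node, text in zip(leaf_nodes(page), translations):
        node.text = text
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    page = "<html><body><label>User name</label><span>Password</span></body></html>"
    files = {"fr": '<strings><s id="0">Nom</s><s id="1">Mot de passe</s></strings>'}
    print(load_page(page, files, "fr"))
```

Keeping one target file per language means that serving another language only changes which file is looked up; the page markup itself stays the same.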

Claims (2)

1. A method for generating a standard foreign language page by means of a machine translation system, the method being implemented on a computer that is connected to, or has installed, a machine translation system, fully functional page files having been generated on the computer and stored in its memory, characterized in that: a file extraction module and a file loading module are set up on the computer; a storage unit is set up to hold a standard translation lookup table, which stores source language character strings together with the standard translation of each string in the target language; a standard translation module is set up; and the automatic translation method comprises the following steps:
A. The file extraction module extracts the content needing translation from the web page file, then generates and stores a source xml file;
B. The generated source xml file is submitted to the machine translation system, and the returned target xml file is received and stored;
C. The standard translation module traverses the source xml file, finds the source language character strings that match the standard translation lookup table, and replaces the corresponding content in the target xml file with the corresponding standard translation in the target language;
D. When the page needs to be loaded, the file loading module writes the target xml file into the web page file in place of the content needing translation, generating a web page file in the target language.
2. The method for generating a standard foreign language page by means of a machine translation system according to claim 1, characterized in that the following step is also performed before step C:
The source xml file and the target xml file are converted into Excel format files, the converted files are handed over for manual proofreading, and the proofread target file is then converted back into an xml format file.
CN2011103160329A 2011-10-18 2011-10-18 Method for generating standard foreign language page by means of machine translation system Pending CN102508878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103160329A CN102508878A (en) 2011-10-18 2011-10-18 Method for generating standard foreign language page by means of machine translation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103160329A CN102508878A (en) 2011-10-18 2011-10-18 Method for generating standard foreign language page by means of machine translation system

Publications (1)

Publication Number Publication Date
CN102508878A true CN102508878A (en) 2012-06-20

Family

ID=46220964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103160329A Pending CN102508878A (en) 2011-10-18 2011-10-18 Method for generating standard foreign language page by means of machine translation system

Country Status (1)

Country Link
CN (1) CN102508878A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101086731A (en) * 2006-06-05 2007-12-12 李钢 Multi-country instant online translation system based on server
CN101494621A (en) * 2009-03-16 2009-07-29 西安六度科技有限公司 Translation system and translation method for multi-language instant communication terminal
CN101968783A (en) * 2010-09-19 2011-02-09 深圳市万兴软件有限公司 Method and device of converting XML document into Excel document

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《现代图书情报技术》 2003-01-31, 黄晓斌, "Research on the conversion of HTML to XML" (HTML向XML转换的研究), pages 18 to 21, 1-2 *
黄晓斌: "Research on the conversion of HTML to XML" (HTML向XML转换的研究), 《现代图书情报技术》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929865A (en) * 2012-10-12 2013-02-13 广西大学 PDA (Personal Digital Assistant) translation system for inter-translating Chinese and languages of ASEAN (the Association of Southeast Asian Nations) countries
CN102981851A (en) * 2012-11-15 2013-03-20 深圳市共进电子股份有限公司 Rapid development and maintenance system and method for embedded type network device interface language
CN102982127A (en) * 2012-11-15 2013-03-20 深圳市共进电子股份有限公司 Method of replacing characters in batch to achieve multi-language version and batch processing device
CN102982127B (en) * 2012-11-15 2015-10-21 深圳市共进电子股份有限公司 Batch substitute character string realizes method and the batch-processed devices of multi-lingual version
CN106372065A (en) * 2016-10-27 2017-02-01 新疆大学 Method and system for developing multi-language website
CN107015971A (en) * 2017-03-30 2017-08-04 唐亮 The post-processing module of multilingual intelligence pretreatment real-time statistics machine translation system
CN108021423A (en) * 2017-12-15 2018-05-11 语联网(武汉)信息技术有限公司 A kind of Multilingual website generating method, system and computer-readable recording medium
CN108021423B (en) * 2017-12-15 2021-05-04 语联网(武汉)信息技术有限公司 Multilingual website generation method and system and computer readable storage medium
CN110419033A (en) * 2018-02-26 2019-11-05 乐夫兰度株式会社 Web page translation system, web page translation device, webpage provide device and web page translation method
CN109446496A (en) * 2018-11-05 2019-03-08 北京锐安科技有限公司 A kind of conversion method, device, equipment and the storage medium of test language file
CN110377917A (en) * 2019-07-08 2019-10-25 天津大学 A kind of the translation reading tables and its shared interpretation method in library
CN111862847A (en) * 2020-07-07 2020-10-30 深圳康佳电子科技有限公司 Electronic table board and translation system
CN111857934A (en) * 2020-07-29 2020-10-30 香港乐蜜有限公司 Page loading method and device, electronic equipment and storage medium
CN113761953A (en) * 2021-08-25 2021-12-07 深圳市道通科技股份有限公司 Translation engine-based professional vocabulary translation method, tool and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120620