CN106372065A - Method and system for developing multi-language website - Google Patents

Info

Publication number
CN106372065A
Authority
CN
China
Prior art keywords
data
translation
language
website
language website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610958116.5A
Other languages
Chinese (zh)
Other versions
CN106372065B (en)
Inventor
努尔布力
陈海蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang University
Original Assignee
Xinjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang University
Priority to CN201610958116.5A
Publication of CN106372065A
Application granted
Publication of CN106372065B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G06F8/4442 Reducing the number of cache misses; Data prefetching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a method and system for developing a multi-language website. The method comprises: step a, developing the static web pages of the multi-language website; step b, calling a machine translation interface to translate the Chinese data dynamically added to the multi-language website into multiple languages; and step c, reading the translated data, then loading and rendering the dynamic web pages of the multi-language website from the translated data. Combining machine translation with manual correction greatly reduces translation errors, so the displayed pages are more accurate. Using the UTF-8 encoding of the Unicode character set avoids garbled characters when pages are rendered. A cache for dynamically loaded data avoids the resource consumption and loading delay caused by calling the machine translation interface on every real-time translation load, and reduces manual intervention.

Description

Method and system for developing a multi-language website
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and system for developing a multi-language website.
Background
With the rapid development of Internet commerce, e-commerce websites have emerged in large numbers and market competition has grown increasingly fierce. In recent years, e-commerce in China has developed rapidly: its application in every field keeps expanding and deepening, transaction volumes repeatedly reach new highs, related industries flourish, supporting systems are continually improved, and the drive and capacity for innovation keep growing.
Uighur is a written language with a long history, and a great many books, documents, and historical materials are written in it. They preserve a great deal of information about the Uighur people and their life, and their historical significance and cultural value are precious. Humanities information processing technology is therefore closely tied to the future of the Uighur language. As the cultural and technical level of the Uighur people rises, the number of people able to produce Uighur web pages keeps growing. Many individuals and groups have already built Uighur websites of various kinds to disseminate information, and, like ordinary Chinese websites, these sites offer news browsing, information download, and similar functions. However, the Uighur software used to build these sites uses different encodings, so Uighur web pages have long been fragmented and mutually incompatible: most Uighur web content cannot be shared, and converting between encodings consumes a great deal of working time and research resources.
The Xinjiang Uygur Autonomous Region is home to many ethnic groups and languages. Online shopping has become a popular trend, and the success of Taobao confirms that this trend will continue. Within the region, however, most shopping platforms are ordinary Chinese-language websites that are difficult to use for the many Uighur users unfamiliar with Chinese, so a standardized bilingual Uighur-Chinese shopping platform is urgently needed. Developing a standardized Uighur-language e-commerce platform is not simply a matter of translating static web pages into Uighur: a complete shopping system must be managed dynamically in real time, with records continually added, deleted, modified, and queried. Manual translation alone cannot keep up with such large volumes of changing data, so machine translation is needed to assist the platform with dynamic changes.
Machine translation is the process of using a computer to convert one natural language into another. Over its development, machine translation systems based on a variety of principles have appeared. By method they fall roughly into four classes: rule-based machine translation, example-based machine translation, statistical machine translation, and hybrid machine translation. Each has its own strengths. For example, rule-based systems are good at translating well-formed sentences and produce higher-quality output, while statistical systems are general-purpose and learn linguistic knowledge automatically from a corpus.
To address Uighur-Chinese machine translation, Chinese Patent Application No. 201310740830.3 discloses a Uighur translation engine method for self-service electricity-bill payment terminals. The user selects a display language, such as Chinese or Uighur, at the terminal. If Chinese is selected, no machine translation is performed; if Uighur is selected, the translation engine is started to translate the information in the database and display it on the terminal interface, which greatly reduces the cost and time of manual Chinese-Uighur translation. The shortcoming of that patent is that it translates in real time whenever Uighur is selected: although the cost and time of manual translation are greatly reduced, it has no caching mechanism and does not store Uighur text in the database in advance to reduce the delay when pages load.
Another Chinese Patent Application, No. 201310197369.1, discloses an enterprise integrated information management system. The client submits an information management request, containing the selected language and application mode, to an internationalization synchronization module. That module receives the request, manages it by language, and passes it to a unified information management module, which evaluates and classifies the different kinds of information in the request. The classified information is passed to a history module, which receives it and sends it to the client. That patent solves the problem of keeping pages synchronized across language environments and gives users a full view of details such as personnel, wages, archives, tasks, and property within the enterprise. Every user operation is saved synchronously in the history module and can be restored and reviewed at any time without obstruction. Its shortcoming is that the internationalization synchronization module manages languages module by module, so when the client updates a large amount of data, every module must be updated synchronously in real time. On the one hand, there is no preprocessing, so returned data is subject to refresh delay; on the other hand, updated data may contain errors, and there is no manual correction step.
In summary, existing Uighur-Chinese bilingual machine translation schemes are all fairly simple: they generally translate dynamically in real time, with no caching mechanism or data preprocessing, so under the B/C mode page rendering may suffer from garbled characters and loading delay.
Summary of the invention
The invention provides a method and system for developing a multi-language website, intended to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, the technical solution of the invention is a method for developing a multi-language website, comprising:
Step a: developing the static web pages of the multi-language website;
Step b: calling a machine translation interface to translate the Chinese data dynamically added to the multi-language website into multiple languages;
Step c: reading the translated data, and loading and rendering the dynamic web pages of the multi-language website from the translated data.
The technical solution adopted by the embodiment of the invention further includes: in step a, the multi-language website includes at least Chinese and Uighur and/or Kazak; developing the static web pages of the multi-language website specifically means developing them in the UTF-8 encoding of the Unicode character set.
The technical solution adopted by the embodiment of the invention further includes: in step b, translating the Chinese data dynamically added to the multi-language website into multiple languages specifically includes:
Step b1: wrapping the translation interface; taking the dynamically added Chinese data out of the website database in batches and storing it in a document; and reading the Chinese data in the document line by line, calling the machine translation interface for automatic translation each time a line is read;
Step b2: manually correcting the stored translated data;
Step b3: storing the manually corrected translated data in the website database in the corresponding format.
The technical solution adopted by the embodiment of the invention further includes: in step c, loading and rendering the dynamic web pages of the multi-language website from the translated data specifically includes: when the translated data is stored, converting the code of each Uighur or Kazak character into a four-digit hexadecimal string; and when a page is rendered, applying the code conversion once more to the Uighur or Kazak text read from the website database.
The technical solution adopted by the embodiment of the invention further includes: step c also includes caching the loaded web pages; the web page caching includes file caching and memory caching.
Another technical solution adopted by the embodiment of the invention is a system for developing a multi-language website, comprising:
Static web page development module: for developing the static web pages of the multi-language website;
Machine translation module: for calling a machine translation interface to translate the Chinese data dynamically added to the multi-language website into multiple languages;
Web page rendering module: for reading the translated data, and loading and rendering the dynamic web pages of the multi-language website from the translated data.
The technical solution adopted by the embodiment of the invention further includes: the multi-language website includes at least Chinese and Uighur and/or Kazak; the static web page development module develops the static web pages of the multi-language website specifically in the UTF-8 encoding of the Unicode character set.
The technical solution adopted by the embodiment of the invention further includes a website database module for storing the Chinese data dynamically added to the multi-language website; the machine translation module further includes:
Translation unit: for wrapping the translation interface; taking the dynamically added Chinese data out of the website database module in batches and storing it in a document; and reading the Chinese data in the document line by line, calling the machine translation interface for automatic translation each time a line is read;
Error correction unit: for manually correcting the stored translated data;
Storage unit: for storing the manually corrected translated data in the website database module in the corresponding format.
The technical solution adopted by the embodiment of the invention further includes: the web page rendering module loads and renders the dynamic web pages of the multi-language website from the translated data specifically as follows: when the translated data is stored, the code of each Uighur or Kazak character is converted into a four-digit hexadecimal string; and when a page is rendered, the code conversion is applied once more to the Uighur or Kazak text read from the website database module.
The technical solution adopted by the embodiment of the invention further includes a data cache module for caching the loaded web pages; the web page caching includes file caching and memory caching.
Compared with the prior art, the beneficial effects of the embodiment of the invention are as follows. The method and system combine template-based development of static web pages with calls to a machine translation interface for dynamic data, greatly reducing the cost and time of manual translation. Combining machine translation with manual correction greatly reduces translation errors, so the displayed pages are more accurate. Using the UTF-8 encoding of the Unicode character set avoids garbled characters when pages are rendered. The cache for dynamically loaded data removes the resource consumption and loading delay caused by calling the machine translation interface again on every real-time translation load, and also reduces manual intervention.
Brief description of the drawings
Fig. 1 is a flow chart of the multi-language website development method of the embodiment of the invention;
Fig. 2 is the overall architecture diagram of the multi-language website of the embodiment of the invention;
Fig. 3 is the training flow chart of statistical machine translation;
Fig. 4 is the human-assisted multilingual translation flow chart of the embodiment of the invention;
Fig. 5 is a structural diagram of the multi-language website development system of the embodiment of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Refer to Fig. 1 and Fig. 2. Fig. 1 is a flow chart of the multi-language website development method of the embodiment of the invention, and Fig. 2 is the overall architecture diagram of the multi-language website of the embodiment of the invention. The method of the embodiment includes the following steps:
Step 10: develop the template themes of the multi-language website, developing the static web pages in the UTF-8 encoding of the Unicode character set;
In step 10, the multi-language website of the embodiment includes at least Chinese, Uighur, and Kazak. Unified encoding is a key technology in developing Uighur and Kazak web pages. Uighur and Kazak belong to the Altaic family, and their scripts borrow Arabic and some Persian letters; Uighur has 32 letters and Kazak has 33. Both are cursive scripts: depending on its position in the word, each letter takes one of four forms (isolated, initial, medial, or final), and the form displayed is determined by that position at writing time. Uighur and Kazak text therefore has some peculiarities in input and editing: (1) Text is written from right to left, with lines running from top to bottom, so the cursor moves in the opposite direction to Chinese and English during input, which complicates mixed editing of Uighur or Kazak with Chinese or English. (2) Kazak has 33 letters, of which 9 are vowels and 24 are consonants; its vowel harmony is fairly strict and consonant assimilation is common. Uighur has 32 letters and more than 120 character forms. Each letter has four written forms: the initial form, whose tail joins the next letter; the medial form, joined to its neighbours on both sides; the final form, whose head joins the previous letter; and the isolated form, joined on neither side. Which form is used is determined by the letter's position in the word. (3) Uighur punctuation marks, such as the comma and question mark, face the opposite direction to their Chinese and English counterparts.
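The four positional forms described above can be inspected with Python's standard `unicodedata` module. The sketch below is only an illustration and is not part of the patent: it uses the Arabic letter BEH (U+0628) as an assumed example and lists its four compatibility "presentation form" code points. In ordinary Unicode text only the base letter is stored, and the rendering engine chooses the form from the letter's position in the word.

```python
import unicodedata


def positional_forms(first_codepoint: int, count: int = 4):
    """Return (code point, form tag, base letter) for consecutive presentation forms."""
    rows = []
    for cp in range(first_codepoint, first_codepoint + count):
        # Compatibility decompositions look like "<isolated> 0628".
        tag, base = unicodedata.decomposition(chr(cp)).split()
        rows.append((f"U+{cp:04X}", tag.strip("<>"), f"U+{base}"))
    return rows


# U+FE8F..U+FE92 are the four presentation forms of BEH (illustrative choice).
for row in positional_forms(0xFE8F):
    print(row)
```

Because all four forms normalize back to the same base letter, a website can store one code point per letter and leave shaping to the browser.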
Many different character sets are used in computing, and users browsing pages in another language often see garbled characters because the character sets differ. Chinese websites generally use the Simplified Chinese (GB2312) character set, which does not support Uighur or Kazak. A website offering Uighur, Kazak, and Chinese versions should therefore choose a character set that supports all three. The Unicode character set assigns every character of every language a single unified code, represented in two bytes (or in some cases four), which meets the needs of cross-language, cross-platform text conversion and processing. However, because that two-byte representation is incompatible with the ISO-8859-1 character set and takes more space (an English letter also needs two bytes), the UTF encodings were created. There are two kinds, UTF-8 and UTF-16. UTF-16 follows the same coding rule as Unicode itself, whereas UTF-8 defines a "range rule" that stays as compatible as possible with ISO-8859-1 (its single-byte range coincides with 7-bit ASCII) while still being able to represent the characters of every language. UTF-8 is therefore the best choice when designing and developing a Uighur, Kazak, and Chinese multi-language website. With UTF-8, the static pages of the multi-language website display normally without garbled characters, and when the client selects the Uighur version, the system loads the Uighur template theme.
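The trade-offs between these encodings can be checked directly. The following minimal sketch, which is an illustration rather than part of the patent, encodes one Latin letter, one Chinese character, and one Arabic-script letter (the sample characters are assumptions chosen for the example) and reports how many bytes each codec needs, or `None` when the codec cannot represent the character at all.

```python
def byte_lengths(ch: str) -> dict:
    """Map codec name -> encoded size in bytes, or None if not representable."""
    sizes = {}
    for codec in ("utf-8", "utf-16-le", "gb2312", "iso-8859-1"):
        try:
            sizes[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None  # the codec has no code for this character
    return sizes


# Latin "A", Chinese U+4E2D, and Arabic-script U+0626 (illustrative samples).
for ch in ("A", "\u4e2d", "\u0626"):
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

Only the two UTF encodings cover all three samples, and UTF-8 keeps the Latin letter at one byte, which matches the reasoning above for choosing it.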
Step 20: the client selects a language version, and the static web pages of the corresponding template theme are displayed according to the selected language version;
Step 30: wrap the machine translation interface as a Web API, call it, and translate the Chinese data dynamically added to the website database into multiple languages;
In step 30, multi-language website template development in the embodiment covers both the development of static web pages and the process by which dynamically loaded data calls the machine translation interface. Developing a multilingual e-commerce platform (for example, Uighur and Kazak) requires more than static pages: content added dynamically in batches must also be translated into each language version. The embodiment wraps the machine translation interface and calls it when content is added in batches, which handles multilingual translation efficiently. After the encoding of the static pages has been preprocessed, dynamically added Chinese data is stored in the website database, and a corresponding Uighur or Kazak field is added alongside the Chinese data to hold the translated Uighur or Kazak text for dynamic page rendering and loading.
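The storage layout described in step 30, a Chinese field with companion Uighur and Kazak fields, might look like the sketch below. It is a hedged illustration only: the patent does not give a schema, so the table name `product` and the columns `name_zh`, `name_ug`, and `name_kk` are hypothetical, and Python's built-in `sqlite3` stands in for the website database.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # One Chinese source column plus one nullable column per target language.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product ("
        "id INTEGER PRIMARY KEY, name_zh TEXT NOT NULL, "
        "name_ug TEXT, name_kk TEXT)"
    )


def pending_rows(conn: sqlite3.Connection):
    """Rows whose Uighur or Kazak field has not been filled in yet."""
    return conn.execute(
        "SELECT id, name_zh FROM product "
        "WHERE name_ug IS NULL OR name_kk IS NULL ORDER BY id"
    ).fetchall()


def store_translation(conn, row_id: int, name_ug: str, name_kk: str) -> None:
    conn.execute(
        "UPDATE product SET name_ug = ?, name_kk = ? WHERE id = ?",
        (name_ug, name_kk, row_id),
    )
    conn.commit()
```

Keeping the translations in the same row as the source text means a page render needs only one query per record, whichever language version the client selected.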
At present, machine translation approaches include the rule-based method and the statistical method;
First, the rule-based method: the text is analyzed according to linguistic rules and then translated by a computer program. Most commercial machine translation systems use the rule-based method. A rule-based system runs through three successive stages: analysis, transfer, and generation. By the complexity of these stages, rule-based translation is divided into three levels:
1. Direct translation: word-for-word translation;
2. Transfer translation: the translation process refers to and takes into account the morphological, syntactic, and semantic information of the source text. Because the range of information sources is too broad, and the grammar rules are too numerous and contradict and conflict with one another, transfer translation is complex and error-prone;
3. Interlingua translation.
Second, the statistical method (SMT), shown in Fig. 3, the training flow chart of statistical machine translation. A large parallel corpus is analyzed statistically to build a statistical translation model (vocabulary, alignment, or language patterns), and the model is then used to translate, generally choosing the statistically most probable entry as the translation; the probability computation is based on Bayes' theorem. Suppose an English sentence a is to be translated into Chinese; every Chinese sentence b is a possible, or impossible, candidate translation of a. Pr(a) is the probability that an expression like a occurs, and Pr(b|a) is the probability that a is translated as b. Maximizing over these quantities narrows the range of sentences and corresponding translations to search, and so finds the most suitable translation. SMT is divided into two kinds according to the level of text analysis: word-based SMT and phrase-based SMT. The latter is in common use at present, and Google's system is of this kind. The text to be translated is split automatically into word sequences of fixed length, each word sequence is analyzed statistically against the corpus, and the translation with the highest corresponding probability is found.
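The Bayes' theorem step mentioned above is usually written as the noisy-channel decision rule. The formula below is the standard textbook form, stated in the paragraph's own symbols (source sentence a, candidate translation b); it is offered as a clarifying sketch and is not taken from the patent text.

```latex
\hat{b}
  = \arg\max_{b} \Pr(b \mid a)
  = \arg\max_{b} \frac{\Pr(a \mid b)\,\Pr(b)}{\Pr(a)}
  = \arg\max_{b} \Pr(a \mid b)\,\Pr(b)
```

The denominator Pr(a) can be dropped because it does not depend on b. The search is then driven by two learned quantities: a translation model Pr(a | b) estimated from the parallel corpus, and a language model Pr(b) that scores how natural the candidate b is.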
Specifically, refer also to Fig. 4, the human-assisted multilingual translation flow chart of the embodiment of the invention. Translating the Chinese data dynamically added to the website database into multiple languages includes the following steps:
Step 31: take the dynamically added Chinese data (for example, commodity data) out of the website database in batches and store it in a document; read the Chinese data in the document line by line, calling the machine translation interface for each line read to translate it automatically into Uighur or Kazak; and store the translated Uighur or Kazak data in a result document in Unicode encoding;
In step 31, the translation interface is first wrapped so that it accepts strings one at a time. The Chinese data taken from the website database is stored in a document and read line by line, calling the translation interface for each line in turn. The returned results are stored as a list in a result document and then inserted in order into the corresponding fields of the website database. This is an automated process that runs periodically, and a uniform character encoding is enforced throughout data transfer and calls to the translation interface.
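The line-by-line batch loop of step 31 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `translate_file` is hypothetical, and the machine translation interface is represented by a caller-supplied `translate` callable, since the patent does not name a specific service or API.

```python
from pathlib import Path
from typing import Callable


def translate_file(src: Path, dst: Path, translate: Callable[[str], str]) -> int:
    """Read src line by line, translate each line, and write results to dst.

    Both documents are opened as UTF-8 so the encoding stays uniform
    end to end. Returns the number of lines translated.
    """
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            text = line.rstrip("\n")
            if not text:
                continue  # skip blank lines rather than calling the interface
            fout.write(translate(text) + "\n")  # one interface call per line
            count += 1
    return count
```

Passing the translator in as a parameter keeps the loop independent of any one service, and the result document keeps the source line order so rows can be matched back to their database records.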
Step 32: manually correct the automatically translated Uighur or Kazak data;
In step 32, because the Uighur and Kazak dictionaries are limited in size, not every Chinese phrase can be translated accurately. To improve translation accuracy, the invention has people correct the small number of errors that arise during translation, which greatly reduces translation errors and makes the displayed pages more accurate.
Step 33: read the manually corrected Uighur or Kazak data and store it, in the corresponding format, in the corresponding fields of the website database. The whole workflow is processed automatically on a periodic schedule, completing both the static and the dynamic rendering processes.
Step 40: read the translated data from the website database, load and render the dynamic web pages of the corresponding template theme from the translated data, and cache the loaded pages;
In step 40, because computer support for Uighur is limited, rendering a Uighur website raises the problem of converting between Unicode and the encodings supported by the operating system, the browser, and the website database on input and output. Almost all website database drivers default to ISO-8859-1 when passing data between the program and the database. So when the website platform stores Uighur data in the website database, the driver converts the stored text from Unicode to ISO-8859-1, and when the page is rendered, the Uighur data read back from the database has become garbled. To solve the garbling caused by incompatible ways of reading, writing, and storing Uighur, Kazak, and Chinese, the embodiment proposes a code conversion method for Uighur and Kazak: when translated data is stored, the code of each Uighur or Kazak character is converted into a four-digit hexadecimal string (for example, a three-character word becomes "062a 0648 064a" after conversion); when the page is rendered, the code conversion is applied once more to the Uighur or Kazak text read from the website database, so no garbling occurs.
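The code conversion described in step 40 can be sketched in a few lines. The function names `to_hex4` and `from_hex4` are hypothetical, and the space-separated layout is an assumption taken from the "062a 0648 064a" example in the text; the patent does not specify the exact storage format.

```python
def to_hex4(text: str) -> str:
    """Encode each character as a four-digit lowercase hex code point."""
    return " ".join(f"{ord(ch):04x}" for ch in text)


def from_hex4(encoded: str) -> str:
    """Reverse to_hex4: turn the hex tokens back into characters."""
    return "".join(chr(int(token, 16)) for token in encoded.split())


stored = to_hex4("\u062a\u0648\u064a")   # what would be written to the database
restored = from_hex4(stored)             # what the renderer would display
print(stored)
```

The stored form contains only ASCII digits, letters, and spaces, so it passes unchanged through a driver that assumes ISO-8859-1. Four digits suffice for Uighur and Kazak letters because they lie in the Basic Multilingual Plane; characters beyond U+FFFF would yield longer tokens, which this sketch still round-trips.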
Caching generally includes the following:
(1) Data caching: the caching of website database query results. On each page access, the system first checks whether the corresponding cached data exists; if it does not, the system connects to the database, fetches the data, serializes the query result, and saves it to a file. Later identical queries are served directly from the cache table or file.
(2) Page caching: on each page access, the system first checks whether the corresponding cached page file exists; if it does not, the system connects to the database, fetches the data, displays the page, and generates the cached page file at the same time, so that the page file is used on the next access (template engines and common PHP caching classes available online generally provide this function).
(3) Time-triggered caching: the system checks whether the file exists and whether its timestamp is within the configured expiry time. If the file's modification timestamp is greater than the current timestamp minus the expiry time, the cache is used; otherwise the cache is updated.
(4) Content-triggered caching: when data is inserted or updated, the cache is forcibly updated.
(5) Static caching: static caching means staticization, that is, directly generating HTML, XML, or similar text files and regenerating them when there is an update; it suits pages that change rarely.
(6) Memory caching: memcached is a high-performance, distributed memory object caching system used to reduce database load in dynamic applications and improve access speed.
(7) PHP caching.
(8) MySQL caching.
(9) Web caching based on reverse proxy.
(10) DNS round-robin.
The caching in the embodiment of the present invention mainly comprises file caching and memory caching. The main purpose of caching is to reduce the load on the database and on PHP computation, to reduce the delay caused by fetching machine-translated website database data when a webpage is rendered, and to solve the resource consumption caused by having to call the machine translation interface again on every real-time translation load; it also reduces manual intervention. Queried data is stored directly in the cache, so the website database need not be queried repeatedly and the load on MySQL is reduced. The saving in PHP computation shows mainly in cases such as caching the result of a complex recursive operation, so that CPU time is not wasted repeating the computation each time. During caching, Uighur text is still encoded according to the Unicode encoding rule described above.
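The file caching with time-triggered expiry described above might look like the following sketch. The patent describes a PHP-based system; Python is used here only for illustration, and the function name `cached_query`, the JSON serialization, and the `run_query` callback are assumptions of this sketch rather than details from the source.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def cached_query(cache_path: str, ttl_seconds: float,
                 run_query: Callable[[], Any]) -> Any:
    """Return the cached result if the cache file is fresh, else refresh it."""
    path = Path(cache_path)
    # Time trigger: use the cache only while the file's modification time
    # is newer than (current time - expiry time).
    if path.exists() and time.time() - path.stat().st_mtime < ttl_seconds:
        return json.loads(path.read_text(encoding="utf-8"))
    # Cache miss or expired: query the database and serialize the result.
    result = run_query()
    path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

A caller would pass a function that performs the real database query; repeated calls within `ttl_seconds` then read the file instead of touching the database or the translation interface.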
The embodiment of the present invention is not limited to solving the encoding compatibility problem between Uighur and Chinese or between Kazakh and Chinese. Similar language pairs, such as Kirghiz and Chinese, can use the same kind of numeric encoding method, and the whole process of calling the machine translation interface, processing the website database data, and manual participation applies equally to other multilingual work.
Refer to Fig. 5, which is a structural diagram of the multi-language website development system of the embodiment of the present invention. The multi-language website development system of the embodiment of the present invention includes a static webpage development module, a static webpage display module, a website database module, a machine translation module, a webpage rendering module, and a data cache module;
The static webpage development module is used to develop the multi-language website template themes and to develop the static webpages of the multi-language website in the UTF-8 encoding format of the Unicode character set. The multi-language website in the embodiment of the present invention includes at least Chinese, Uighur, Kazakh, and so on. Unified encoding is a key technology in developing Uighur and Kazakh webpages. Uighur and Kazakh belong to the Altaic language family, and their scripts borrow the Arabic alphabet and some Persian letters; Uighur has 32 letters and Kazakh has 33 letters. The Uighur and Kazakh scripts are cursive: each letter takes one of four forms (isolated, initial, medial, or final) depending on its position in the word, and the form displayed when writing is determined by that position. Uighur and Kazakh characters therefore have some particularities in input and editing:
(1) The writing direction is from right to left, with lines running from top to bottom. During input the cursor moves in the opposite direction to Chinese and English writing, which makes mixed editing of Uighur or Kazakh with Chinese or English more complicated.
(2) Kazakh has 33 letters, of which 9 are vowels and 24 are consonants. Kazakh vowel harmony is relatively regular, and consonant assimilation is common. Uighur consists of 32 letters with more than 120 character forms. Each letter has four different written forms: the initial form, whose tail joins the next letter; the medial form, which joins the adjacent letters on both sides; the final form, whose head joins the previous letter; and the isolated form, which joins neither neighbor. Which form is used is determined by the letter's position in the word.
(3) Uighur punctuation marks, such as the comma and the question mark, face the opposite direction to their Chinese and English counterparts.
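The four positional forms and the right-to-left direction mentioned above are visible in the Unicode character database. The sketch below uses Python's standard `unicodedata` module and the Arabic letter BEH purely as an example; the helper name `positional_forms` is hypothetical. Note that the legacy "presentation form" code points shown here exist for compatibility only: in normal text a single code point is stored per letter, and the rendering engine picks the form from the letter's position.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, stored once regardless of position

# Legacy presentation forms of BEH: isolated, final, initial, medial.
BEH_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]


def positional_forms(forms):
    """Return (name, folds_back_to_BEH) for each presentation form."""
    return [
        (unicodedata.name(form), unicodedata.normalize("NFKC", form) == BEH)
        for form in forms
    ]


for name, same_letter in positional_forms(BEH_FORMS):
    print(name, same_letter)

# Bidirectional class: "AL" marks a right-to-left Arabic letter,
# "L" a left-to-right character such as a Chinese ideograph.
print(unicodedata.bidirectional(BEH), unicodedata.bidirectional("\u6c49"))
```

Mixed Uighur and Chinese text therefore carries characters of both directional classes in one string, which is why mixed editing needs extra care.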
Many different character sets exist in the field of computer applications, and users of different languages often see garbled text when browsing webpages in other languages because the character sets in use differ. Chinese website systems generally use the Simplified Chinese (GB2312) character set, which does not support Uighur or Kazakh. A website that offers Uighur, Kazakh, and Chinese versions should therefore choose a character set that supports all three languages. The Unicode character set assigns each character of every language a single, unique code represented in two bytes (some in four bytes), which meets the needs of cross-language, cross-platform character encoding, conversion, and processing. However, because the Unicode character set is not compatible with the ISO-8859-1 character set and occupies more space (even an English letter needs two bytes in Unicode), the UTF encodings were created. There are two kinds, UTF-8 and UTF-16. The encoding rule of UTF-16 is consistent with Unicode itself, whereas UTF-8 is different: it defines an "interval rule" that stays compatible with ISO-8859-1 encoding to the greatest extent possible while still being able to represent the characters of all languages. UTF-8 is therefore the best choice when designing and developing a Uighur, Kazakh, and Chinese multi-language website. With the UTF-8 encoding format, the static webpages of the multi-language website display normally without garbled text; when the client selects the Uighur version, the system loads the Uighur template theme.
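The space and compatibility trade-off described above can be checked directly. The following sketch is only an illustration; the helper name `encoded_sizes` and the sample characters are chosen here, not taken from the source. It also makes one nuance explicit: UTF-8 is byte-for-byte identical to ISO-8859-1 only in the ASCII range, so accented Latin-1 characters are encoded differently.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte length, UTF-16 byte length) of one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


samples = {
    "English letter": "A",
    "Uighur/Arabic letter": "\u062a",
    "Chinese character": "\u6c49",
}
for label, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# ASCII text is byte-identical in UTF-8 and ISO-8859-1 ...
ascii_same = "A".encode("utf-8") == "A".encode("iso-8859-1")
# ... but a non-ASCII Latin-1 character such as U+00E9 is not.
latin1_same = "\u00e9".encode("utf-8") == "\u00e9".encode("iso-8859-1")
print(ascii_same, latin1_same)
```

An English letter costs one byte in UTF-8 instead of two in UTF-16, while Arabic-script and Chinese characters remain representable, which is the property the paragraph relies on.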
The static webpage display module is used to display the static webpage of the corresponding template theme according to the language version selected by the client;
The website database module is used to store the Chinese data dynamically added to the multi-language website;
The machine translation module is used to encapsulate the machine translation interface as a web API, call the machine translation interface, and perform multilingual translation on the Chinese data dynamically added to the website database module. The multi-language website template development of the embodiment of the present invention includes both the development of static webpages and the process of calling the machine translation interface when dynamic data is loaded. Developing a multilingual e-commerce platform for Uighur, Kazakh, and other languages requires more than static webpages: content added dynamically in batches must also be translated into the multiple language versions. The embodiment of the present invention encapsulates the machine translation interface and calls it when content is added in batches, which efficiently solves multilingual translation. After the static webpage encoding preprocessing, the dynamically added Chinese data is stored in the database module, and a Uighur field or Kazakh field corresponding to the Chinese data is added in the database module to store the translated Uighur or Kazakh text for loading when the dynamic webpage is rendered.
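The idea of adding a Uighur or Kazakh field next to each piece of Chinese data can be sketched with an in-memory SQLite database. The source describes a MySQL-backed site and gives no schema, so the table name `content`, the column names `text_zh`, `text_ug`, and `text_kk`, and the helper functions below are all hypothetical.

```python
import sqlite3

# Hypothetical mapping from language code to translation column.
TRANSLATION_COLUMNS = {"ug": "text_ug", "kk": "text_kk"}


def create_content_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS content ("
        "id INTEGER PRIMARY KEY, text_zh TEXT NOT NULL, "
        "text_ug TEXT, text_kk TEXT)"
    )


def add_chinese(conn: sqlite3.Connection, text: str) -> int:
    """Store dynamically added Chinese data; translations start empty."""
    return conn.execute(
        "INSERT INTO content (text_zh) VALUES (?)", (text,)
    ).lastrowid


def untranslated(conn: sqlite3.Connection, lang: str) -> list:
    """Rows whose field for the given language has not been filled yet."""
    column = TRANSLATION_COLUMNS[lang]  # whitelist, never raw user input
    return conn.execute(
        f"SELECT id, text_zh FROM content WHERE {column} IS NULL ORDER BY id"
    ).fetchall()


def store_translation(conn: sqlite3.Connection, row_id: int,
                      lang: str, text: str) -> None:
    column = TRANSLATION_COLUMNS[lang]
    conn.execute(f"UPDATE content SET {column} = ? WHERE id = ?", (text, row_id))
```

Keeping the translations in sibling fields of the same row lets the rendering step pick the field that matches the selected language version without a second lookup.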
Specifically, the machine translation module includes a translation unit, an error correction unit, and a storage unit;
The translation unit is used to take the Chinese data out of the database module in batches, store the Chinese data in a document, and run a program that reads the Chinese data in the document line by line; for each line read, the machine translation interface is called to translate it automatically into Uighur or Kazakh data, and the translated Uighur or Kazakh data is stored in a result document in the Unicode encoding format. Specifically, the translation interface is first encapsulated so that it accepts character strings one at a time; the Chinese data in the database module is taken out and stored in a document; the translation interface is called line by line, and the returned results are stored in order in the result document and then inserted in order into the corresponding fields of the database module. This is an automated process executed periodically, and a uniform character encoding format is enforced throughout all data transfers and calls to the translation interface.
The error correction unit is used to manually correct the automatically translated Uighur or Kazakh data. Because the size of the Uighur and Kazakh dictionaries is limited, not every Chinese phrase can be translated with high accuracy. To improve translation accuracy, the present invention has a person correct the small number of errors that arise during translation, which greatly reduces translation errors and makes the displayed webpages more accurate.
The storage unit is used to read the manually corrected Uighur or Kazakh data in the corresponding format and store it in the corresponding fields of the database module. The whole operation flow is processed automatically on a periodic basis, completing the entire static and dynamic bidirectional rendering process.
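The line-by-line step performed by the translation unit can be sketched as below. The source does not name the machine translation service or its signature, so the `translate` parameter stands in for the encapsulated interface and is an assumption of this sketch, as is the function name `translate_document`. UTF-8 is used for both files to keep the character encoding uniform, as the description requires.

```python
from typing import Callable


def translate_document(source_path: str, result_path: str,
                       translate: Callable[[str], str]) -> int:
    """Translate a document line by line and return the number of lines sent.

    `translate` is a placeholder for the encapsulated machine translation
    interface: it takes one Chinese string and returns the translation.
    """
    translated = 0
    with open(source_path, encoding="utf-8") as source, \
            open(result_path, "w", encoding="utf-8") as result:
        for line in source:
            text = line.rstrip("\n")
            if not text:
                # Keep blank lines so result rows stay aligned with the input.
                result.write("\n")
                continue
            result.write(translate(text) + "\n")
            translated += 1
    return translated
```

Because output line N always corresponds to input line N, the manually corrected result document can later be written back into the matching database fields in order.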
The webpage rendering module is used to read the translation data from the website database module and to load and render the dynamic webpage of the corresponding template theme according to the translation data. Because of limitations in computer support for Uighur, rendering a Uighur website raises the problem of converting, on input and output, between the Unicode encoding and the encoding formats supported by the operating system, the browser, and the database. Almost all database drivers default to the ISO-8859-1 encoding format when passing data between the program and the database. Consequently, when the website platform stores Uighur data in the database module, the database driver converts the Unicode-encoded data to ISO-8859-1 before storing it, and the Uighur data read back from the database module at rendering time appears as garbled text. To solve the garbled-text problem caused by the incompatible read/write and storage modes of Uighur, Kazakh, and Chinese, the embodiment of the present invention proposes a code conversion method for Uighur and Kazakh: when translation data is stored, each Uighur or Kazakh character is converted into a four-digit hexadecimal string (for example, after code conversion: "062a 0648 064a"); when the webpage is rendered, the Uighur or Kazakh data read from the database module is converted back, so that no garbled text occurs.
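The failure mode described above, and why the hexadecimal form avoids it, can be reproduced in a few lines. This is an illustrative sketch only: it simulates the driver's behaviour by encoding and decoding with ISO-8859-1 in Python, and the helper name `survives_latin1` is hypothetical.

```python
def survives_latin1(value: str) -> bool:
    """True if the string passes unchanged through an ISO-8859-1 channel."""
    try:
        return value.encode("iso-8859-1").decode("iso-8859-1") == value
    except UnicodeEncodeError:
        # The channel has no representation for these characters at all.
        return False


uighur_text = "\u062a\u0648\u064a"
hex_form = " ".join(f"{ord(ch):04x}" for ch in uighur_text)

# Raw Arabic-script text cannot be represented in ISO-8859-1 ...
raw_ok = survives_latin1(uighur_text)
# ... but its hexadecimal form is plain ASCII and passes through intact.
hex_ok = survives_latin1(hex_form)

# Typical garbling: UTF-8 bytes misread as ISO-8859-1 characters.
garbled = uighur_text.encode("utf-8").decode("iso-8859-1")
print(raw_ok, hex_ok, garbled != uighur_text)
```

The conversion trades storage size and searchability for robustness: the stored value is longer and cannot be matched by ordinary text queries, but no driver setting can corrupt it.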
The data cache module is used to perform caching on the loaded webpage. Caching generally includes the following:
(1) Data caching: the caching of database query results. On each page access, the system first checks whether the corresponding cached data exists; if it does not, the system connects to the database, fetches the data, serializes the query result, and saves it to a file. Later identical queries are served directly from the cache table or file.
(2) Page caching: on each page access, the system first checks whether the corresponding cached page file exists; if it does not, the system connects to the database, fetches the data, displays the page, and generates the cached page file at the same time, so that the page file is used on the next access (template engines and common PHP caching classes available online generally provide this function).
(3) Time-triggered caching: the system checks whether the file exists and whether its timestamp is within the configured expiry time. If the file's modification timestamp is greater than the current timestamp minus the expiry time, the cache is used; otherwise the cache is updated.
(4) Content-triggered caching: when data is inserted or updated, the cache is forcibly updated.
(5) Static caching: static caching means staticization, that is, directly generating HTML, XML, or similar text files and regenerating them when there is an update; it suits pages that change rarely.
(6) Memory caching: memcached is a high-performance, distributed memory object caching system used to reduce database load in dynamic applications and improve access speed.
(7) PHP caching.
(8) MySQL caching.
(9) Web caching based on reverse proxy.
(10) DNS round-robin.
The caching in the embodiment of the present invention mainly comprises file caching and memory caching. The main purpose of caching is to reduce the load on the database and on PHP computation, to reduce the delay caused by fetching machine-translated database data when a webpage is rendered, and to solve the resource consumption caused by having to call the machine translation interface again on every real-time translation load; it also reduces manual intervention. Queried data is stored directly in the cache, so the database need not be queried repeatedly and the load on MySQL is reduced. The saving in PHP computation shows mainly in cases such as caching the result of a complex recursive operation, so that CPU time is not wasted repeating the computation each time. During caching, Uighur text is still encoded according to the Unicode encoding rule described above.
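Caching the result of a recursive computation in memory, as mentioned above, can be illustrated with a small sketch. The source names memcached as the memory cache; the example below instead uses Python's in-process `functools.lru_cache` as a simple stand-in, and the `MENU` tree and the function `count_descendants` are invented for the illustration.

```python
from functools import lru_cache

# Hypothetical site menu: each entry maps a page to its child pages.
MENU = {
    "home": ("news", "products"),
    "news": (),
    "products": ("phones", "laptops"),
    "phones": (),
    "laptops": (),
}


@lru_cache(maxsize=None)
def count_descendants(page: str) -> int:
    """Recursively count all pages below `page`; results stay in memory."""
    return sum(1 + count_descendants(child) for child in MENU[page])
```

The first call for a page walks the tree; later calls for the same page are answered from memory without recomputation. A shared cache such as memcached plays the same role across processes and servers, at the cost of a network round trip.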
The multi-language website development method and system of the embodiment of the present invention combine template development of static webpages with calls to the machine translation interface for dynamic data, which greatly reduces the cost and time of manual translation. Combining machine translation with manual correction greatly reduces translation errors and makes the displayed webpages more accurate. Selecting the UTF-8 encoding format of Unicode avoids garbled text when webpages are rendered. The caching mechanism for dynamically loaded data solves the resource consumption and loading delay caused by having to call the machine translation interface again on every real-time translation load, and also reduces manual intervention.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A multi-language website development method, characterized by comprising:
Step a: developing the static webpages of a multi-language website;
Step b: calling a machine translation interface to perform multilingual translation on the Chinese data dynamically added to the multi-language website;
Step c: reading the translation data, and loading and rendering the dynamic webpages of the multi-language website according to the translation data.
2. The multi-language website development method according to claim 1, characterized in that, in step a, the multi-language website includes at least Chinese and Uighur and/or Kazakh; and developing the static webpages of the multi-language website specifically comprises: developing the static webpages of the multi-language website in the UTF-8 encoding format of the Unicode character set.
3. The multi-language website development method according to claim 2, characterized in that, in step b, performing multilingual translation on the Chinese data dynamically added to the multi-language website specifically comprises:
Step b1: encapsulating the translation interface, taking the dynamically added Chinese data out of the website database in batches, storing the Chinese data in a document, reading the Chinese data in the document line by line, and calling the machine translation interface for automatic translation each time a line is read;
Step b2: performing manual correction on the stored translation data;
Step b3: storing the manually corrected translation data in the website database in the corresponding format.
4. The multi-language website development method according to claim 2, characterized in that, in step c, loading and rendering the dynamic webpages of the multi-language website according to the translation data specifically comprises: when the translation data is stored, converting each Uighur or Kazakh character into a four-digit hexadecimal string; and when the webpage is rendered, performing the code conversion again on the Uighur or Kazakh data read from the website database.
5. The multi-language website development method according to claim 4, characterized in that step c further comprises: performing caching on the loaded webpage; the webpage caching includes file caching and memory caching.
6. A multi-language website development system, characterized by comprising:
a static webpage development module, for developing the static webpages of a multi-language website;
a machine translation module, for calling a machine translation interface to perform multilingual translation on the Chinese data dynamically added to the multi-language website;
a webpage rendering module, for reading the translation data and loading and rendering the dynamic webpages of the multi-language website according to the translation data.
7. The multi-language website development system according to claim 6, characterized in that the multi-language website includes at least Chinese and Uighur and/or Kazakh; and the static webpage development module developing the static webpages of the multi-language website specifically comprises: developing the static webpages of the multi-language website in the UTF-8 encoding format of the Unicode character set.
8. The multi-language website development system according to claim 7, characterized by further comprising a website database module, the website database module being used to store the Chinese data dynamically added to the multi-language website; the machine translation module further comprises:
a translation unit, for encapsulating the translation interface, taking the dynamically added Chinese data out of the website database module in batches, storing the Chinese data in a document, reading the Chinese data in the document line by line, and calling the machine translation interface for automatic translation each time a line is read;
an error correction unit, for performing manual correction on the stored translation data;
a storage unit, for storing the manually corrected translation data in the website database module in the corresponding format.
9. The multi-language website development system according to claim 7, characterized in that the webpage rendering module loading and rendering the dynamic webpages of the multi-language website according to the translation data specifically comprises: when the translation data is stored, converting each Uighur or Kazakh character into a four-digit hexadecimal string; and when the webpage is rendered, performing the code conversion again on the Uighur or Kazakh data read from the website database module.
10. The multi-language website development system according to claim 9, characterized by further comprising a data cache module, the data cache module being used to perform caching on the loaded webpage; the webpage caching includes file caching and memory caching.
CN201610958116.5A 2016-10-27 2016-10-27 Multi-language website development method and system Expired - Fee Related CN106372065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610958116.5A CN106372065B (en) 2016-10-27 2016-10-27 Multi-language website development method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610958116.5A CN106372065B (en) 2016-10-27 2016-10-27 Multi-language website development method and system

Publications (2)

Publication Number Publication Date
CN106372065A true CN106372065A (en) 2017-02-01
CN106372065B CN106372065B (en) 2020-07-21

Family

ID=57893794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610958116.5A Expired - Fee Related CN106372065B (en) 2016-10-27 2016-10-27 Multi-language website development method and system

Country Status (1)

Country Link
CN (1) CN106372065B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021423A (en) * 2017-12-15 2018-05-11 语联网(武汉)信息技术有限公司 A kind of Multilingual website generating method, system and computer-readable recording medium
CN108280219A (en) * 2018-02-07 2018-07-13 深圳壹账通智能科技有限公司 Text interpretation method, device, computer equipment and storage medium
CN108563645A (en) * 2018-04-24 2018-09-21 成都智信电子技术有限公司 The metadata interpretation method and device of HIS systems
CN108664247A (en) * 2018-04-26 2018-10-16 微梦创科网络科技(中国)有限公司 A kind of method and device of Page Template data interaction
CN109088995A (en) * 2018-10-17 2018-12-25 永德利硅橡胶科技(深圳)有限公司 Support the method and mobile phone of global languages translation
CN109684096A (en) * 2018-12-29 2019-04-26 北京超图软件股份有限公司 A kind of software program recycling processing method and device
CN109783579A (en) * 2019-01-22 2019-05-21 南京焦点领动云计算技术有限公司 A kind of method of quick copy and translation web site
CN109828775A (en) * 2018-12-06 2019-05-31 中国电子进出口有限公司 A kind of WEB management system and method for multilingual translation content of text

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000330992A (en) * 1999-05-17 2000-11-30 Nec Software Shikoku Ltd Multilinguistic www server system and its processing method
CN1295292A (en) * 1999-11-05 2001-05-16 国际商业机器公司 Method and system for multi-language wide world web service device thereof
CN101957815A (en) * 2009-07-13 2011-01-26 白劲实 Automatic translation method and system based on correct translation result and corresponding relation
CN102193914A (en) * 2011-05-26 2011-09-21 中国科学院计算技术研究所 Computer aided translation method and system
CN102508878A (en) * 2011-10-18 2012-06-20 深圳市共进电子股份有限公司 Method for generating standard foreign language page by means of machine translation system
CN102567384A (en) * 2010-12-29 2012-07-11 盛乐信息技术(上海)有限公司 Webpage multi-language dynamic switching method and system based on webpage browser engine
CN102929865A (en) * 2012-10-12 2013-02-13 广西大学 PDA (Personal Digital Assistant) translation system for inter-translating Chinese and languages of ASEAN (the Association of Southeast Asian Nations) countries
CN103823796A (en) * 2014-02-25 2014-05-28 武汉传神信息技术有限公司 System and method for translation
CN104375808A (en) * 2013-07-11 2015-02-25 携程计算机技术(上海)有限公司 Method and device for displaying interfaces

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000330992A (en) * 1999-05-17 2000-11-30 Nec Software Shikoku Ltd Multilinguistic www server system and its processing method
CN1295292A (en) * 1999-11-05 2001-05-16 国际商业机器公司 Method and system for multi-language wide world web service device thereof
CN101957815A (en) * 2009-07-13 2011-01-26 白劲实 Automatic translation method and system based on correct translation result and corresponding relation
CN102567384A (en) * 2010-12-29 2012-07-11 盛乐信息技术(上海)有限公司 Webpage multi-language dynamic switching method and system based on webpage browser engine
CN102193914A (en) * 2011-05-26 2011-09-21 中国科学院计算技术研究所 Computer aided translation method and system
CN102508878A (en) * 2011-10-18 2012-06-20 深圳市共进电子股份有限公司 Method for generating standard foreign language page by means of machine translation system
CN102929865A (en) * 2012-10-12 2013-02-13 广西大学 PDA (Personal Digital Assistant) translation system for inter-translating Chinese and languages of ASEAN (the Association of Southeast Asian Nations) countries
CN104375808A (en) * 2013-07-11 2015-02-25 携程计算机技术(上海)有限公司 Method and device for displaying interfaces
CN103823796A (en) * 2014-02-25 2014-05-28 武汉传神信息技术有限公司 System and method for translation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XML中国论坛: "《XML实用进阶教程》", 31 March 2001 *
王业 等: "一种多语言网站解决方案", 《计算机系统应用》 *
黄河清 等: "基于动态数据库的多国语言网站开发", 《计算机工程》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021423A (en) * 2017-12-15 2018-05-11 语联网(武汉)信息技术有限公司 A kind of Multilingual website generating method, system and computer-readable recording medium
CN108021423B (en) * 2017-12-15 2021-05-04 语联网(武汉)信息技术有限公司 Multilingual website generation method and system and computer readable storage medium
CN108280219A (en) * 2018-02-07 2018-07-13 深圳壹账通智能科技有限公司 Text interpretation method, device, computer equipment and storage medium
CN108280219B (en) * 2018-02-07 2021-06-22 深圳壹账通智能科技有限公司 Text translation method and device, computer equipment and storage medium
CN108563645A (en) * 2018-04-24 2018-09-21 成都智信电子技术有限公司 The metadata interpretation method and device of HIS systems
CN108664247A (en) * 2018-04-26 2018-10-16 微梦创科网络科技(中国)有限公司 A kind of method and device of Page Template data interaction
CN108664247B (en) * 2018-04-26 2022-02-01 微梦创科网络科技(中国)有限公司 Page template data interaction method and device
CN109088995B (en) * 2018-10-17 2020-11-13 永德利硅橡胶科技(深圳)有限公司 Method and mobile phone for supporting global language translation
CN109088995A (en) * 2018-10-17 2018-12-25 永德利硅橡胶科技(深圳)有限公司 Support the method and mobile phone of global languages translation
CN109828775A (en) * 2018-12-06 2019-05-31 中国电子进出口有限公司 A kind of WEB management system and method for multilingual translation content of text
CN109828775B (en) * 2018-12-06 2021-12-07 中国电子进出口有限公司 WEB management system and method for multilingual translation text content
CN109684096A (en) * 2018-12-29 2019-04-26 北京超图软件股份有限公司 A kind of software program recycling processing method and device
CN109783579B (en) * 2019-01-22 2020-06-02 南京焦点领动云计算技术有限公司 Method for quickly copying and translating website
CN109783579A (en) * 2019-01-22 2019-05-21 南京焦点领动云计算技术有限公司 A kind of method of quick copy and translation web site

Also Published As

Publication number Publication date
CN106372065B (en) 2020-07-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200721