CN101916248A - Method for translating internet webpage - Google Patents

Method for translating internet webpage

Info

Publication number
CN101916248A
Authority
CN
China
Prior art keywords
text
content
language
translated
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010271775
Other languages
Chinese (zh)
Inventor
俞晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI NUOBAO TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
WUXI NUOBAO TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUXI NUOBAO TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN 201010271775
Publication of CN101916248A

Landscapes

  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for translating an internet webpage. The method comprises the following steps: analyzing content with a standardized structure and separating it into a frame part and a content part, wherein the frame part comprises items such as the attribute names of webpage content units, and the content part comprises items such as the contents of those units; translating the frame part once, storing the original text in a text A and the translated text in a text B; for the content part, establishing two tables in a database, a table C for the original language and a table D for the target language; storing the content of each specific webpage content unit record in the corresponding entry of the original-language table; and translating the contents of the original-language table one by one and filling the results into the corresponding entries of the target-language table. The method prevents the distortion of meaning that arises when long sentences are translated between languages with different word orders, and because the frame part is translated only once, it saves part of the translation processing time and improves efficiency.

Description

A method for translating internet webpages
Technical field
The present invention relates to a method for translating internet webpages, applicable to the fields of Internet of Things information technology and the internet.
Background art
Existing webpage translation technology, such as Google Translate, is widely used on the internet today, mainly for translating pasted text and for translating whole webpages. The relationship between its parts is fairly intuitive, and it has three basic parts. The first is a blank box into which the user pastes or types the content to be translated, or a webpage address. The second is background processing, which translates the pasted content or the full text of the webpage at the given address. The third is the display part, which shows the translation result passed on by the second part. The Google translation system is simple and practical; its weakness is that it translates the content as one undifferentiated block and returns it whole. For longer passages, because word order conventions differ between languages, the meanings in the translation are often jumbled together or distorted. This matters particularly for webpages with a standardized content structure, such as product introductions on e-commerce websites. Many such webpages have a frame structure: the frame (Frame) is usually constant while the specific content (Content) varies, so translating the whole page each time translates the same frame repeatedly.
Summary of the invention
To address the problems of current automatic translation, the present invention provides a method for translating internet webpages. It designs a translation system in which the frame is separated from the content, in order to improve the accuracy and efficiency of automatic translation.
The present invention concerns translating a webpage from one language into another, especially a webpage with a standardized content structure such as a product introduction on an e-commerce website, and provides a technical scheme with the following steps:
A. Analyze the webpage and separate the page text into framework text and content text;
B. Translate the framework text only once; store the original in a first text and the translation in a second text;
C. For the content text, create two tables in a database: an original-language table and a target-language table;
D. Read the specific webpage content text and store each unit of the content text in the corresponding item of the original-language table;
E. Translate the content entered in the original-language table item by item and enter each translation result in the corresponding item of the target-language table;
F. The first text fetches one record from the original-language table through a database connection and is combined with the webpage format frame to form the original webpage text; the second text fetches one record from the target-language table through a database connection and is combined with the webpage format frame to form the translated webpage.
The first text and the second text correspond one to one in structure.
The first text and the second text each have a database connection mechanism and can fetch the record with the same sequence number from the original-language table and the target-language table.
The original-language table and the target-language table are two pre-designed tables in the database. They have the same structure, and every record has a unique sequence number. The sequence numbers are generated by automatic increment in the original-language table; in the target-language table they are copied from the original-language table when the corresponding content is translated.
When the framework text and the content text are separated, the attribute names of the content units in the page text form the framework text, and the specific contents of the content units form the content text.
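The separation of attribute names from contents can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: it assumes the page text has already been reduced to lines of the form `attribute name: content`, and the function name `split_page` and the colon separator are hypothetical.

```python
from typing import List, Tuple


def split_page(lines: List[str], sep: str = ":") -> Tuple[List[str], List[str]]:
    """Separate page text into framework text (attribute names) and content text.

    Assumes each line looks like "attribute name: content". A line without
    the separator is treated as an attribute whose content is empty.
    """
    frame, content = [], []
    for line in lines:
        # Split on the first separator only, so contents may contain it.
        name, _, value = line.partition(sep)
        frame.append(name.strip())
        content.append(value.strip())
    return frame, content


# Illustrative input; the line format is an assumption.
page = [
    "purchase mode: direct deal",
    "area: Shandong, Zibo",
    "issuing time: 2009-11-9",
]
frame_text, content_text = split_page(page)
```

Keeping the two lists in the same order preserves the one-to-one pairing between an attribute name and its content, which the later steps rely on.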
Advantage of the present invention is:
1. Because languages differ in word order, lexical ambiguity, and similar features, automatic translation is hard to make accurate. For information with a canonical format and content, processing the frame and the content separately conveys meaning more accurately than the usual full-text translation and reduces errors in which the automatic translation system attaches a meaning to the wrong item, especially for text with a canonical format.
2. As stated above, once frame and content are separated, the frame needs to be translated only once, which improves efficiency.
Description of drawings
Fig. 1 is a schematic diagram of the principle of the invention.
Embodiment
The invention is further described below with reference to the drawing and an embodiment. The invention uses software technology, database technology, and internet technology from the modern information field. Applied to translation between different languages, it completes the translation of information faster, more accurately, and more effectively, and is particularly suited to information items whose content follows a standard format.
The invention can be used in internet systems and in other automatic translation systems, and is particularly suited to information with a canonical format and content, such as product introductions in an e-commerce system. Attribute names and contents are processed separately. For example, keywords such as "commodity object" and "purchase mode" are attribute names, that is, the frame, while attribute contents such as "whole world" and "direct deal" are variable: once the contents are changed or replaced, a new product introduction results. Accordingly, a translation system using the method processes frame and content separately. The frame needs to be translated only once, and the result is kept in a frame translation result store. The content differs from product to product and must be translated from the source language each time. A result compositor then combines the frame result with the content result for display to the user.
The structure is shown in Fig. 1. The method has been successfully applied to language translation in a new e-commerce system. In the overall scheme, the attribute contents are placed in a pre-designed database table. A web user interface is connected to this table, through which a translator enters or revises the translation of the content part. The frame attributes are translated in advance and placed in the translation result compositor. The compositor additionally fetches one record of the translated content from the database, combines the two item by item, and passes the result to the end-user interface. Frame and content follow separate translation flows; the labels in Fig. 1 correspond to the steps below.
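The idea that the frame is translated once and then reused by the compositor for every product page can be sketched as below. This is a hedged illustration, not the system described in the patent: the class `FrameStore`, the function `compose`, and the upper-casing `toy_translate` (a stand-in for a real translation step) are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional


class FrameStore:
    """Keeps the frame labels and caches their translation after the first use."""

    def __init__(self, labels: List[str], translate: Callable[[str], str]) -> None:
        self.labels = list(labels)
        self._translate = translate
        self._translated: Optional[List[str]] = None

    def translated_labels(self) -> List[str]:
        # Translate on first request only; later calls reuse the stored result.
        if self._translated is None:
            self._translated = [self._translate(label) for label in self.labels]
        return self._translated


def compose(labels: List[str], record: List[str]) -> str:
    """Result compositor: pair each frame label with the matching content item."""
    return "\n".join(f"{label}: {value}" for label, value in zip(labels, record))


calls: List[str] = []  # records every call made to the stand-in translator


def toy_translate(text: str) -> str:
    """Stand-in for a real translator (assumption): upper-cases the text."""
    calls.append(text)
    return text.upper()


store = FrameStore(["commodity object", "purchase mode"], toy_translate)
# Already-translated content records for two products (illustrative values).
translated_records = [["WHOLE WORLD", "DIRECT DEAL"], ["NORTH AMERICA", "DIRECT DEAL"]]
pages = [compose(store.translated_labels(), record) for record in translated_records]
```

However many product pages are composed, `calls` only ever holds one entry per frame label, which is the efficiency gain the description attributes to separating frame from content.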
For example, one page from a group of webpages with a standardized structure contains the following text:
Trade name: knitted underwear, crew-neck couples' set
Commodity object: mainland, Hong Kong, Macao and Taiwan region, North America, whole world
Purchase mode: direct deal
Supplier: so-and-so company
Commodity specification:
Area: Shandong, Zibo
Place: No. so-and-so, so-and-so Road, Zibo, Shandong
Issuing time: 2009-11-9
The specific processing steps are then:
1. Analyze the webpage and separate the page text into framework text and content text. The framework text comprises: "trade name", "commodity object", "purchase mode", "supplier", "commodity specification", "area", "place", "issuing time". The content text comprises: "knitted underwear, crew-neck couples' set", "mainland, Hong Kong, Macao and Taiwan region, North America, whole world", "direct deal", "so-and-so company", "Shandong, Zibo", "No. so-and-so, so-and-so Road, Zibo, Shandong", "2009-11-9".
2. Translate the framework text only once; store the original in the first text A and the translation in the second text B. The first text A is saved as:
"trade name
commodity object
purchase mode
supplier
commodity specification
area
place
issuing time"
The second text B holds the corresponding translations of the text above. The format of texts A and B can be defined freely, as long as software can write to them and read specific content from them.
3. For the content text of the page, create two tables in the database: original-language table C and target-language table D. Tables C and D contain the items to be filled with the content text under the frame: trade name, commodity object, purchase mode, supplier, commodity specification, area, place, issuing time.
4. Read the specific webpage content text and store each unit of the content text in the corresponding item of original-language table C. The content text of one page is treated as one unit: the content text from step 1 is entered into the corresponding items from step 3 as one record of the table.
5. Translate the content entered in original-language table C item by item and enter each translation result in the corresponding item of target-language table D.
6. The first text A fetches one record from original-language table C through the database connection and is combined with the webpage frame part to form the original webpage text; the second text B fetches one record from target-language table D through the database connection and is combined with the webpage frame part to form the translated webpage.
The content of target-language table D is produced promptly from the corresponding records in original-language table C. It can be produced by machine translation, for example by the common dictionary lookup method, with the translation dictionary placed in advance in the database or in another connectable electronic record. Alternatively, it can be produced by human translation: through a user interface connected to tables C and D, the translator reads values from original-language table C, translates them, and stores the results in target-language table D.
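The two ways of filling table D, dictionary lookup and human translation, can be combined in a small sketch. Everything here is illustrative: the function names are hypothetical, and the Chinese strings in the dictionary are assumed back-translations of the example terms, not text quoted from the patent.

```python
from typing import Dict, List, Optional, Tuple


def translate_by_dictionary(
    record: List[str], dictionary: Dict[str, str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Look up each item of a table C record in a translation dictionary.

    Returns the translated items (None where the dictionary has no entry)
    and the positions that still need a human translator.
    """
    translated: List[Optional[str]] = []
    pending: List[int] = []
    for index, item in enumerate(record):
        hit = dictionary.get(item)
        translated.append(hit)
        if hit is None:
            pending.append(index)
    return translated, pending


def apply_manual(
    translated: List[Optional[str]], manual: Dict[int, str]
) -> List[Optional[str]]:
    """Fill in items entered by a human translator, keyed by position."""
    return [manual.get(index, item) for index, item in enumerate(translated)]


# Assumed example data: a tiny dictionary and one record from table C.
dictionary = {"直接交易": "direct deal", "全球": "whole world"}
record_c = ["直接交易", "某某公司"]

translated, pending = translate_by_dictionary(record_c, dictionary)
record_d = apply_manual(translated, {1: "so-and-so company"})
```

Items the dictionary cannot resolve are left as `None` and reported in `pending`, so they can be routed to the translator's user interface rather than stored in table D untranslated.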
The translation method provided by the invention avoids the distortion of meaning caused by differing word orders when long sentences are translated between languages. Because the frame part is translated only once, part of the translation processing time is also saved and efficiency is improved.

Claims (6)

1. A method for translating internet webpages, characterized in that the method comprises the following steps:
A. Analyze the webpage and separate the page text into framework text and content text;
B. Translate the framework text only once; store the original in a first text and the translation in a second text;
C. For the content text, create two tables in a database: an original-language table and a target-language table;
D. Read the specific webpage content text and store each unit of the content text in the corresponding item of the original-language table;
E. Translate the content entered in the original-language table item by item and enter each translation result in the corresponding item of the target-language table;
F. The first text fetches one record from the original-language table through a database connection and is combined with the webpage format frame to form the original webpage text; the second text fetches one record from the target-language table through a database connection and is combined with the webpage format frame to form the translated webpage.
2. The method for translating internet webpages according to claim 1, characterized in that the first text and the second text correspond one to one in structure.
3. The method for translating internet webpages according to claim 1, characterized in that the first text and the second text each have a database connection mechanism and can fetch the record with the same sequence number from the original-language table and the target-language table.
4. The method for translating internet webpages according to claim 1, characterized in that the original-language table and the target-language table are two pre-designed tables in the database, have the same structure, and assign every record a unique sequence number.
5. The method for translating internet webpages according to claim 4, characterized in that the sequence numbers are generated by automatic increment in the original-language table and, in the target-language table, are copied from the original-language table when the corresponding content is translated.
6. The method for translating internet webpages according to claim 1, characterized in that the attribute names of the content units in the page text form the framework text, and the specific contents of the content units form the content text.
CN 201010271775 2010-09-03 2010-09-03 Method for translating internet webpage Pending CN101916248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010271775 CN101916248A (en) 2010-09-03 2010-09-03 Method for translating internet webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010271775 CN101916248A (en) 2010-09-03 2010-09-03 Method for translating internet webpage

Publications (1)

Publication Number Publication Date
CN101916248A true CN101916248A (en) 2010-12-15

Family

ID=43323762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010271775 Pending CN101916248A (en) 2010-09-03 2010-09-03 Method for translating internet webpage

Country Status (1)

Country Link
CN (1) CN101916248A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982127A (en) * 2012-11-15 2013-03-20 深圳市共进电子股份有限公司 Method of replacing characters in batch to achieve multi-language version and batch processing device
CN104978683A (en) * 2015-07-24 2015-10-14 沈阳云鼎科技有限公司 Usage method of China and Russia wood e-commerce platform
CN105760542A (en) * 2016-03-15 2016-07-13 腾讯科技(深圳)有限公司 Display control method, terminal and server
CN107329958A (en) * 2017-06-08 2017-11-07 努比亚技术有限公司 Language transfer method and device based on webpage
CN109783579A (en) * 2019-01-22 2019-05-21 南京焦点领动云计算技术有限公司 A kind of method of quick copy and translation web site

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101023425A (en) * 2004-06-07 2007-08-22 株式会社日本英柏斯 WEB page translation device and WEB page translation method
CN101470705A (en) * 2007-12-29 2009-07-01 英业达股份有限公司 Dynamic web page translation system and method
CN101615181A (en) * 2008-06-27 2009-12-30 国际商业机器公司 Create the system and method for internationalization network application

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982127A (en) * 2012-11-15 2013-03-20 深圳市共进电子股份有限公司 Method of replacing characters in batch to achieve multi-language version and batch processing device
CN102982127B (en) * 2012-11-15 2015-10-21 深圳市共进电子股份有限公司 Batch substitute character string realizes method and the batch-processed devices of multi-lingual version
CN104978683A (en) * 2015-07-24 2015-10-14 沈阳云鼎科技有限公司 Usage method of China and Russia wood e-commerce platform
CN105760542A (en) * 2016-03-15 2016-07-13 腾讯科技(深圳)有限公司 Display control method, terminal and server
CN107329958A (en) * 2017-06-08 2017-11-07 努比亚技术有限公司 Language transfer method and device based on webpage
CN107329958B (en) * 2017-06-08 2021-03-26 努比亚技术有限公司 Language conversion method and device based on webpage
CN109783579A (en) * 2019-01-22 2019-05-21 南京焦点领动云计算技术有限公司 A kind of method of quick copy and translation web site

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20101215