CN106844354A - Web page word-capture Chinese-minority language translation method and device - Google Patents

Web page word-capture Chinese-minority language translation method and device

Info

Publication number
CN106844354A
CN106844354A CN201710019958.9A CN201710019958A CN106844354A CN 106844354 A CN106844354 A CN 106844354A CN 201710019958 A CN201710019958 A CN 201710019958A CN 106844354 A CN106844354 A CN 106844354A
Authority
CN
China
Prior art keywords
translation
chinese
word
module
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710019958.9A
Other languages
Chinese (zh)
Inventor
陈雷
高翊
胡泽林
李淼
杨振新
孙凯
高进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Committee Of Ethnic And Religious Affairs
Hefei Institutes of Physical Science of CAS
Original Assignee
Yunnan Committee Of Ethnic And Religious Affairs
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-01-11
Filing date
2017-01-11
Publication date
2017-06-13
Application filed by Yunnan Committee Of Ethnic And Religious Affairs, Hefei Institutes of Physical Science of CAS
Priority to CN201710019958.9A
Publication of CN106844354A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The web page word-capture Chinese-minority language translation device of the invention combines machine translation with data retrieval. Text identified by the web page text acquisition module is first looked up in the Chinese-minority intertranslation database module; if a match is retrieved, the corresponding translation is returned directly. Otherwise the machine translation module is called to parse the acquired content step by step, from paragraph to sentence to word. After translation the final result is displayed, and the user may re-edit the result to provide a better translation. The invention is not limited to translating words: it can translate whole sentences and whole paragraphs, which preserves the integrity of the translation result. Because machine translation is combined with data retrieval, the machine translation module need not be called every time, which can greatly improve translation speed. The translation re-editing module improves the translation results, and the set of Chinese-minority intertranslation pairs keeps growing as usage increases.

Description

Web page word-capture Chinese-minority language translation method and device
Technical field
The present invention relates to the field of computer application technology, and in particular to a web page word-capture Chinese-minority language translation method and device that combine machine translation and data retrieval.
Background
With the development of the Internet, more and more knowledge is disseminated through web pages. China is a unified multi-ethnic country, and in some areas where ethnic minorities are concentrated there are still many minority compatriots who have difficulty using Chinese. Most existing translation software targets major languages such as Chinese and English and lacks translation functions for minority languages. Moreover, some screen word-capture software, such as Kingsoft Powerword, can only translate words and cannot translate at the chapter, paragraph, or sentence level, so users sometimes find it hard to understand the meaning of a whole paragraph or sentence. It is therefore of practical significance to capture specified content on a web page and translate it into the required minority language. In recent years natural language processing technology, especially machine translation, has developed steadily. Minority-language informatization work has also made considerable progress and accumulated a certain amount of minority-language resources, providing the language basis and technical support for Chinese-minority translation based on machine translation.
Summary of the invention
In response to the practical demand for minority-language informatization, the present invention provides a web page word-capture Chinese-minority language translation method and device that combine machine translation and data retrieval. Text on a Chinese web page is captured and translated step by step downward, from paragraph to sentence to word, using machine translation together with data retrieval. This effectively integrates the two techniques and improves the speed and accuracy of Chinese-minority translation.
The present invention is achieved by the following technical solutions:
A web page word-capture Chinese-minority language translation method combining machine translation and data retrieval, comprising the following steps:
Step S1: Establish a language translation model, a decoder, a Chinese-minority font library, and a Chinese-minority input method;
Step S2: Establish a Chinese-minority bilingual parallel corpus, stored in one-to-one form;
Step S3: Establish a Chinese-minority bilingual comparison database, stored in one-to-one form;
Step S4: In non-body parts of the web page such as the navigation bar, menus, and titles, obtain the text content of the entire web page element; in the body of the web page, with the paragraph as the upper limit, identify and obtain the text content at the mouse position in maximum-length mode;
Step S5: Compare the acquired text content with the data in the Chinese-minority bilingual parallel corpus. If an intertranslation pair matching the acquired text content is found, return the corresponding translation data. If none is found, the decoder parses the acquired text content step by step by paragraph, sentence, and word, compares it with the corresponding data in the Chinese-minority bilingual comparison database, and returns the parsed data after comparison;
Step S6: The language translation model reorganizes the returned translation data or parsed data and submits the organized translation result; the Chinese-minority font library is called according to the translation language and its encoding identifier, and the final translation result is displayed;
Step S7: Re-edit the final translation result: the user is allowed to call the Chinese-minority input method to edit and modify the translation, and the acquired web page text and the modified translation are added to the Chinese-minority bilingual parallel corpus as an intertranslation pair.
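The retrieval-first flow of step S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionaries `parallel_corpus` and `bilingual_db` stand in for the stores of steps S2 and S3, the function names are hypothetical, and the punctuation-based sentence splitter and whitespace word splitter are placeholders, since the patent does not say how the decoder segments text. The target strings are invented placeholder tokens, not real translations.

```python
import re

# Hypothetical stand-ins for the two one-to-one stores (steps S2 and S3).
# Target strings are placeholder tokens, not real minority-language text.
parallel_corpus = {"Hello world. Good day.": "<T:para-1>"}
bilingual_db = {"Hello world.": "<T:sent-1>", "Good": "<T:good>", "day": "<T:day>"}

PUNCT = ".,!?。！？，"


def split_sentences(paragraph):
    """Split after sentence-final punctuation (assumed segmentation rule)."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]?", paragraph)
    return [p.strip() for p in pieces if p.strip()]


def split_words(sentence):
    """Whitespace split with punctuation stripped (assumed; real Chinese
    text would need a proper word segmenter)."""
    words = (w.strip(PUNCT) for w in sentence.split())
    return [w for w in words if w]


def lookup(text):
    """Return (source, units): exact corpus hit first, else stepwise parsing."""
    if text in parallel_corpus:
        return "corpus", [parallel_corpus[text]]
    if text in bilingual_db:  # paragraph level
        return "database", [bilingual_db[text]]
    units = []
    for sentence in split_sentences(text):
        if sentence in bilingual_db:  # sentence level
            units.append(bilingual_db[sentence])
            continue
        for word in split_words(sentence):  # word level
            # Unknown words are passed through unchanged in this sketch.
            units.append(bilingual_db.get(word, word))
    return "database", units
```

In this sketch the units returned by `lookup` would still need the reordering of step S6, which is not modeled here.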
A web page word-capture Chinese-minority language translation device combining machine translation and data retrieval, comprising a web page text acquisition module, a Chinese-minority intertranslation database module, a machine translation module, a display module, and a translation re-editing module. The Chinese-minority intertranslation database module comprises a data retrieval module, a Chinese-minority bilingual parallel corpus, and a Chinese-minority bilingual comparison database. The machine translation module comprises a language translation model, a decoder, a Chinese-minority font library, and a Chinese-minority input method.
The web page word-capture Chinese-minority language translation device provided by the invention combines machine translation with data retrieval. Text identified by the web page text acquisition module is first looked up in the Chinese-minority intertranslation database module; if a match is retrieved, the corresponding translation is returned directly. Otherwise the machine translation module is called to parse the acquired content step by step, from paragraph to sentence to word. After translation the final result is displayed, and the user may re-edit the result to provide a better translation. The invention is not limited to translating words: it can translate whole sentences and whole paragraphs, which preserves the integrity of the translation result. Because machine translation is combined with data retrieval, the machine translation module need not be called every time, which can greatly improve translation speed. The translation re-editing module improves the translation results, and the set of Chinese-minority intertranslation pairs keeps growing as usage increases.
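The claimed speed benefit and the growth of the pair store through re-editing can be illustrated with a small sketch. Everything named here is hypothetical: the class `TranslationService`, its methods, and the injected `machine_translate` callable are assumptions made for illustration. Following step S7, only user-edited translations are written back to the store in this sketch; raw machine output is not cached.

```python
class TranslationService:
    """Retrieval first, machine translation as fallback (illustrative only)."""

    def __init__(self, machine_translate):
        self.corpus = {}  # source text -> user-approved translation
        self.machine_translate = machine_translate  # assumed callable: str -> str
        self.mt_calls = 0  # counts how often the fallback was needed

    def translate(self, text):
        # A stored pair is returned directly, without calling the MT module.
        if text in self.corpus:
            return self.corpus[text]
        self.mt_calls += 1
        return self.machine_translate(text)

    def submit_edit(self, text, edited_translation):
        # Step S7: the captured text and the edited translation become a new pair.
        self.corpus[text] = edited_translation
```

After a user submits an edit for a given text, later requests for the same text are served from the store, so the fallback counter stops growing for that text.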
Brief description of the drawings
Fig. 1 is a flow chart of the web page word-capture Chinese-minority language translation method of the invention.
Fig. 2 is a structural diagram of the web page word-capture Chinese-minority language translation device of the invention.
Detailed description
The technical solution is described in detail below with reference to Fig. 1 and Fig. 2.
As shown in Fig. 1 and Fig. 2, the web page word-capture Chinese-minority language translation device of the invention comprises a web page text acquisition module, a Chinese-minority intertranslation database module, a machine translation module, a display module, and a translation re-editing module. The Chinese-minority intertranslation database module comprises a data retrieval module, a Chinese-minority bilingual parallel corpus, and a Chinese-minority bilingual comparison database. The machine translation module comprises a language translation model, a decoder, a Chinese-minority font library, and a Chinese-minority input method. The intertranslation pairs in the parallel corpus and in the comparison database are stored in one-to-one form.
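The text does not define "one-to-one form" further. One possible reading, sketched below purely as an assumption, is a store in which each source text is paired with exactly one target text and vice versa, so that pairs can be looked up in either direction. The class name `PairStore` and its methods are hypothetical.

```python
class PairStore:
    """Keeps intertranslation pairs strictly one-to-one (assumed interpretation)."""

    def __init__(self):
        self._forward = {}  # source -> target
        self._reverse = {}  # target -> source

    def add(self, source, target):
        # Remove any stale pairing so each side appears in at most one pair.
        old_target = self._forward.pop(source, None)
        if old_target is not None:
            self._reverse.pop(old_target, None)
        old_source = self._reverse.pop(target, None)
        if old_source is not None:
            self._forward.pop(old_source, None)
        self._forward[source] = target
        self._reverse[target] = source

    def to_target(self, source):
        return self._forward.get(source)

    def to_source(self, target):
        return self._reverse.get(target)

    def __len__(self):
        return len(self._forward)
```

Replacing a pair rather than keeping alternatives is a design choice of this sketch; it matches the re-editing step, where a modified translation supersedes the earlier one.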
When translation is needed, the web page text acquisition module is started. In non-body parts of the web page such as the navigation bar, menus, and titles, it obtains the text content of the entire web page element; in the body of the web page, with the paragraph as the upper limit, it identifies and obtains the text content at the mouse position in maximum-length mode. The acquired text content is then compared with the data in the Chinese-minority bilingual parallel corpus of the Chinese-minority intertranslation database module. If an intertranslation pair matching the acquired text content is found, the corresponding translation data is returned. If none is found, the decoder of the machine translation module parses the acquired text content step by step by paragraph, sentence, and word, compares it with the corresponding data in the Chinese-minority bilingual comparison database of the intertranslation database module, and returns the parsed data after comparison. The language translation model of the machine translation module then reorganizes the returned translation data or parsed data and submits the organized translation result. The display module calls the Chinese-minority font library according to the translation language and its encoding identifier and displays the final translation result. The user can call the Chinese-minority input method of the machine translation module to edit and modify the translation, and the acquired web page text content and the modified translation are added as an intertranslation pair to the Chinese-minority bilingual parallel corpus of the intertranslation database module.
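The text acquisition rule described above can be sketched with a toy element tree. This is an interpretation, not the patent's implementation: the `Node` class, the tag sets, and the function `text_at` are hypothetical, and mapping "navigation bar, menu, title" to the tags `nav`, `menu`, and `title` and the paragraph boundary to `p` is an assumption. A real implementation would run against the browser DOM.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NON_BODY_TAGS = {"nav", "menu", "title"}  # assumed mapping for non-body regions
PARAGRAPH_TAG = "p"  # assumed paragraph boundary in the page body


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def full_text(self):
        # Concatenate own text and all descendant text in document order.
        parts = [self.text] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


def ancestors_and_self(node):
    while node is not None:
        yield node
        node = node.parent


def text_at(hovered):
    """Text to translate for the element under the mouse."""
    chain = list(ancestors_and_self(hovered))
    # Non-body region: take the whole hovered element.
    if any(n.tag in NON_BODY_TAGS for n in chain):
        return hovered.full_text()
    # Body region: grow to the enclosing paragraph, the stated upper limit.
    for n in chain:
        if n.tag == PARAGRAPH_TAG:
            return n.full_text()
    return hovered.full_text()
```

Hovering over an inline element inside a paragraph therefore yields the whole paragraph, while hovering over a single navigation link yields only that link's text.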
The above is only a preferred embodiment of the present invention. The protection scope of the invention is not limited to the above embodiment; all technical solutions within the concept of the invention fall within its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principle of the invention should also be regarded as falling within the protection scope of the invention.

Claims (2)

1. A web page word-capture Chinese-minority language translation method, characterized by comprising the following steps:
Step S1: Establish a language translation model, a decoder, a Chinese-minority font library, and a Chinese-minority input method;
Step S2: Establish a Chinese-minority bilingual parallel corpus, stored in one-to-one form;
Step S3: Establish a Chinese-minority bilingual comparison database, stored in one-to-one form;
Step S4: In non-body parts of the web page such as the navigation bar, menus, and titles, obtain the text content of the entire web page element; in the body of the web page, with the paragraph as the upper limit, identify and obtain the text content at the mouse position in maximum-length mode;
Step S5: Compare the acquired text content with the data in the Chinese-minority bilingual parallel corpus; if an intertranslation pair matching the acquired text content is found, return the corresponding translation data; if none is found, parse the acquired text content step by step by paragraph, sentence, and word using the decoder, compare it with the corresponding data in the Chinese-minority bilingual comparison database, and return the parsed data after comparison;
Step S6: Reorganize the returned translation data or parsed data using the language translation model and submit the organized translation result; call the Chinese-minority font library according to the translation language and its encoding identifier, and display the final translation result;
Step S7: Re-edit the final translation result, allowing the user to call the Chinese-minority input method to edit and modify the translation, and add the acquired web page text and the modified translation to the Chinese-minority bilingual parallel corpus as an intertranslation pair.
2. A translation device for the web page word-capture Chinese-minority language translation method according to claim 1, characterized by comprising a web page text acquisition module, a Chinese-minority intertranslation database module, a machine translation module, a display module, and a translation re-editing module, wherein the Chinese-minority intertranslation database module comprises a data retrieval module, a Chinese-minority bilingual parallel corpus, and a Chinese-minority bilingual comparison database, and the machine translation module comprises a language translation model, a decoder, a Chinese-minority font library, and a Chinese-minority input method.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710019958.9A CN106844354A (en) 2017-01-11 2017-01-11 Web page word-capture Chinese-minority language translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710019958.9A CN106844354A (en) 2017-01-11 2017-01-11 Web page word-capture Chinese-minority language translation method and device

Publications (1)

Publication Number Publication Date
CN106844354A (en) 2017-06-13

Family

ID=59118115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710019958.9A Pending CN106844354A (en) 2017-01-11 2017-01-11 Web page word-capture Chinese-minority language translation method and device

Country Status (1)

Country Link
CN (1) CN106844354A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487791A (en) * 2020-11-27 2021-03-12 江苏省舜禹信息技术有限公司 Multi-language hybrid intelligent translation method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1661593A (en) * 2004-02-24 2005-08-31 北京中专翻译有限公司 Method for translating computer language and translation system
CN102270198A (en) * 2011-08-16 2011-12-07 上海交通大学出版社有限公司 Computer assisted translation system
CN102662933A (en) * 2012-03-28 2012-09-12 成都优译信息技术有限公司 Distributive intelligent translation method
CN103020044A (en) * 2012-12-03 2013-04-03 江苏乐买到网络科技有限公司 Machine-aided webpage translation method and system thereof
CN103631773A (en) * 2013-12-16 2014-03-12 哈尔滨工业大学 Statistical machine translation method based on field similarity measurement method
CN103885939A (en) * 2012-12-19 2014-06-25 新疆信息产业有限责任公司 Uyghur-Chinese bi-directional translation memory system construction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
潘学权 (Pan Xuequan), ed.: 《计算机辅助翻译教程》 (Computer-Aided Translation Tutorial), 30 June 2016 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170613