CN111401000B - Real-time translation previewing method for online auxiliary translation - Google Patents


Info

Publication number
CN111401000B
Authority
CN
China
Prior art keywords
translation
atom
html
segment
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010260294.7A
Other languages
Chinese (zh)
Other versions
CN111401000A (en
Inventor
陈件
张井
成延
刘旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yizhe Information Technology Co ltd
Original Assignee
Shanghai Yizhe Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yizhe Information Technology Co ltd
Priority to CN202010260294.7A
Publication of CN111401000A
Application granted
Publication of CN111401000B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a real-time translation preview method for online assisted translation, which relates to the field of computer-aided translation and comprises the following steps: converting the original text file into HTML through a file format converter; parsing the original text and dividing it into sentence-level Segments; embedding the element id of each Segment into the sub-tags of the converted HTML by using a cyclic recursive algorithm, so that Segments and sub-tags correspond one to one; and linking the Segments to the HTML at the front end through the HTML DOM nodes, so that the translation can be previewed in real time. The invention provides an algorithm that renders the translation produced during assisted translation in a browser in real time for the translator to check and reference, which saves translation time and improves efficiency.

Description

Real-time translation previewing method for online auxiliary translation
Technical Field
The invention relates to the field of computer-aided translation, and in particular to a real-time translation preview method for online assisted translation.
Background
Contemporary computer-aided translation extracts the text from a document, translates it into the specified target language, and then fills the translated text back into the document. Typically, a translator cannot view the original and the translation in the context of the document from within the editor while translating. The conventional approach is to convert the original file into HTML with a file conversion tool and render that HTML in a browser for the translator to view. However, the translations that the translator produces during editing cannot be viewed in real time.
Disclosure of Invention
The embodiment of the invention provides a real-time translation preview method for online assisted translation. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to a first aspect of an embodiment of the present invention, there is provided a real-time translation preview method for online assisted translation, comprising the following steps:
converting the original text file into HTML format;
parsing the original text and dividing it into an array of sentence-level Segments;
embedding the element id of each Segment into a sub-tag of the HTML file by using a cyclic recursive algorithm, forming a one-to-one correspondence;
and linking the Segments to the HTML through the HTML DOM nodes, so that the translation can be previewed in real time.
Preferably, the original document format is one of doc, docx, rtf, xls, xlsx, ppt, pptx, pdf, sxw, stw, sxc, or stc.
Preferably, the original document is converted into HTML format by using Word's own conversion function or another third-party tool.
Preferably, when the original text is parsed and divided into the Segment array, each Segment is a word, a phrase, or a sentence.
Preferably, the Segment array is a Segment list that records the text content of each Segment and its corresponding text label.
Preferably, the cyclic recursive algorithm comprises the steps of:
defining an Atom class with two types, Tag and text;
defining the Segment content in the Segment list as the text of an Atom, and the label of the Segment as the Tag of the Atom;
reading each Atom in a loop and deciding, according to its type, whether to put it into a text pool;
and mapping each Atom in the text pool to its Tag, finally forming a new set of HTML sub-tags with an id mapping.
Preferably, the Atom class is a custom class.
Preferably, each Segment is composed of one or more Atoms.
Preferably, each HTML sub-tag is formed from the Tag of an Atom.
Preferably, the Segments are linked to the HTML by embedding the Tag of the Atom in the HTML sub-tag.
The technical scheme provided by the embodiment of the invention can comprise the following beneficial effects:
The invention provides an algorithm that renders the translation produced during assisted translation in a browser in real time for the translator to check and reference, which saves translation time and improves efficiency. As shown in FIG. 7, while translating the 181st sentence, the translator can see in real time how the translated sentence appears in the translated document.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of a real-time translation preview method for online assisted translation according to an exemplary embodiment;
FIG. 2 is a logic diagram of the cyclic recursive algorithm according to an exemplary embodiment;
FIG. 3 is an example of an original document according to an exemplary embodiment;
FIG. 4 is a schematic diagram of Segments divided by sentence according to an exemplary embodiment;
FIG. 5 is a diagram illustrating conversion of an original document into HTML through a file format converter according to an exemplary embodiment;
FIG. 6 is a schematic diagram showing a transUnitId embedded in a tag according to an exemplary embodiment;
FIG. 7 is a view of the real-time translation preview according to an exemplary embodiment.
Description of the embodiments
The following description and the drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice them. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, as well as all available equivalents of the claims. Embodiments may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed. Relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed. The embodiments are described herein in a progressive manner, each focusing on its differences from the others; for parts that are identical or similar across embodiments, the embodiments may be referred to one another. Because the structures, products and the like disclosed in the embodiments correspond to the method parts disclosed in the embodiments, they are described only briefly; for the relevant details, refer to the description of the method parts.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It should be noted that although the steps of the methods of the present disclosure are illustrated in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order or that all of the illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The invention is further described below with reference to the accompanying drawings and examples:
As shown in FIG. 1, the real-time translation preview method for online assisted translation comprises the following steps:
S1: converting the original text file into HTML format;
S2: parsing the original text and dividing it into an array of sentence-level Segments;
S3: embedding the element id of each Segment into an HTML sub-tag by using a cyclic recursive algorithm, forming a one-to-one correspondence;
S4: linking the Segments to the HTML through the HTML DOM nodes, so that the translation can be previewed in real time.
Further to the above scheme, the original document may be a Word, Excel, PPT, or PDF file; in FIG. 3, the original document is a Word file.
In specific embodiments, the file format conversion shown in FIG. 5 may use Word's own conversion function or other third-party open-source tools.
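One possible third-party open-source route for this conversion step is LibreOffice's headless converter. The sketch below only builds the command line and does not run it. The choice of LibreOffice, the function name `build_convert_command`, and the file paths are assumptions for illustration; the patent does not name a specific tool.

```python
from __future__ import annotations

from pathlib import Path


def build_convert_command(source: Path, out_dir: Path) -> list[str]:
    """Build a LibreOffice command line that converts `source` to HTML.

    Assumption: LibreOffice is installed and `soffice` is on PATH.
    """
    return [
        "soffice",
        "--headless",            # run without a GUI
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(source),
    ]


if __name__ == "__main__":
    cmd = build_convert_command(Path("original.docx"), Path("converted"))
    print(" ".join(cmd))
    # To actually convert: subprocess.run(cmd, check=True)
```

Building the command separately from running it keeps the sketch testable on machines where no converter is installed.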
Further to the above scheme, the text is divided by sentence into Segments, and a Segment may also be a word or a phrase. As shown in FIG. 3, the Word document contains two sentences: sentence 1, "test.", and sentence 2, "fast."
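The description does not state the segmentation rule, so the following is only a minimal sketch under one assumption: a sentence ends at `.`, `!`, `?` or the corresponding CJK punctuation. The function name `split_sentences` is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: these characters terminate a sentence.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentence-level pieces, keeping the end punctuation."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]


if __name__ == "__main__":
    print(split_sentences("test. fast."))
```

A rule this simple also splits inside abbreviations and decimal numbers, so a production system would need language-specific rules.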
Further to the above scheme, the Segment array is a Segment list that records the text content of each Segment and its corresponding text label. As shown in FIG. 4, in the code implementation the two sentences are defined as two objects, Segment1 and Segment2, where transUnitId is the sentence label and srcAtom is the sentence content.
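The two objects could be modeled as follows. This is a minimal sketch: the field names `transUnitId` and `srcAtom` come from the description, while the `Segment` dataclass, the integer id type, and the sample texts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    transUnitId: int  # sentence label (integer type is an assumption)
    srcAtom: str      # sentence content


# Placeholder sentences standing in for the two sentences of the sample document.
segment1 = Segment(transUnitId=1, srcAtom="test.")
segment2 = Segment(transUnitId=2, srcAtom="fast.")
segments = [segment1, segment2]

if __name__ == "__main__":
    for seg in segments:
        print(seg.transUnitId, seg.srcAtom)
```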
Further to the above scheme, FIG. 2 shows the logic of the cyclic recursive algorithm in a specific embodiment. The algorithm comprises the following steps:
S31: defining an Atom class with two types, Tag and text;
S32: defining the Segment content in the Segment list as the text of an Atom, and the label of the Segment as the Tag of the Atom;
S33: reading each Atom in a loop and deciding, according to its type, whether to put it into a text pool;
S34: mapping each Atom in the text pool to its Tag, finally forming a new set of HTML sub-tags with an id mapping.
In a specific embodiment, the Atom class is a custom class rather than a built-in class.
Further to the above scheme, each Segment is composed of one or more Atoms.
In a specific embodiment, each HTML sub-tag is formed from the Tag of an Atom.
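The Atom loop can be sketched roughly as below. This is not the patented implementation: it flattens the HTML with Python's standard `html.parser` instead of explicit recursion, and it assumes that each Segment corresponds to exactly one text node, whereas the description allows several Atoms per Segment. The names `AtomType`, `AtomReader`, `build_text_pool`, and `trans_unit_id` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from html.parser import HTMLParser


class AtomType(Enum):
    TAG = "tag"
    TEXT = "text"


@dataclass
class Atom:
    kind: AtomType
    value: str                        # tag markup or text content
    trans_unit_id: int | None = None  # set once a text Atom is matched


class AtomReader(HTMLParser):
    """Flattens HTML into a list of Tag and text Atoms."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.atoms: list[Atom] = []

    def handle_starttag(self, tag, attrs):
        self.atoms.append(Atom(AtomType.TAG, self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.atoms.append(Atom(AtomType.TAG, f"</{tag}>"))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text
            self.atoms.append(Atom(AtomType.TEXT, data))


def build_text_pool(html: str, segments: list[tuple[int, str]]) -> list[Atom]:
    """Collect text Atoms and attach segment ids in document order."""
    reader = AtomReader()
    reader.feed(html)
    reader.close()
    pending = list(segments)  # (transUnitId, source text) pairs
    text_pool: list[Atom] = []
    for atom in reader.atoms:
        if atom.kind is not AtomType.TEXT:
            continue  # Tag Atoms stay out of the text pool
        text_pool.append(atom)
        if pending and atom.value.strip() == pending[0][1]:
            atom.trans_unit_id = pending.pop(0)[0]
    return text_pool


if __name__ == "__main__":
    pool = build_text_pool(
        "<p><span>test.</span><span>fast.</span></p>",
        [(1, "test."), (2, "fast.")],
    )
    for atom in pool:
        print(atom.trans_unit_id, atom.value)
```

Matching segments in document order rather than by text alone keeps the mapping one to one even when the same sentence appears twice.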
Further to the above scheme, the Segments are linked to the HTML by embedding the Tag of the Atom in the HTML sub-tag. As shown in FIG. 6, to display the translated content of sentence 1 in the HTML page in real time, the first span tag under the p tag in the figure must be located. The simplest approach is to embed the transUnitId in that tag.
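The embedding and the subsequent lookup can be illustrated offline with Python's `xml.etree.ElementTree` on a well-formed fragment. This is a sketch under stated assumptions: the attribute name `data-trans-unit-id`, the helper names `embed_ids` and `update_preview`, and the sample translation are hypothetical, and in the actual front end the lookup would be performed on the browser's DOM nodes rather than on a parsed tree.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ID_ATTR = "data-trans-unit-id"  # hypothetical attribute name


def embed_ids(root: ET.Element, segments: list[tuple[int, str]]) -> None:
    """Tag each <span> whose text equals the next pending segment."""
    pending = list(segments)
    for span in root.iter("span"):
        if pending and (span.text or "").strip() == pending[0][1]:
            span.set(ID_ATTR, str(pending.pop(0)[0]))


def update_preview(root: ET.Element, trans_unit_id: int, translation: str) -> bool:
    """Replace the text of the element carrying the given id."""
    for element in root.iter():
        if element.get(ID_ATTR) == str(trans_unit_id):
            element.text = translation
            return True
    return False


page = ET.fromstring("<p><span>test.</span><span>fast.</span></p>")
embed_ids(page, [(1, "test."), (2, "fast.")])
update_preview(page, 1, "essai.")
preview_html = ET.tostring(page, encoding="unicode")

if __name__ == "__main__":
    print(preview_html)
```

Because every span carries its own id, saving a translation for one Segment only touches the matching node and leaves the rest of the preview unchanged.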
The real-time translation preview method for online assisted translation renders the translation produced during assisted translation in the browser in real time for the translator to check and reference, which saves translation time and improves efficiency. As shown in FIG. 7, while translating the 181st sentence, the translator can see in real time how the translated sentence appears in the translated document.
It is to be understood that the invention is not limited to the arrangements and instrumentality shown in the drawings and described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (3)

1. A real-time translation preview method for online assisted translation, characterized by comprising the following steps:
converting the original text file into HTML format;
parsing the original text and dividing it into an array of sentence-level Segments, wherein each Segment is a word, a phrase, or a sentence, and the Segment array is a Segment list that records the text content of each Segment and its corresponding text label;
embedding the element id of each Segment into a sub-tag of the HTML file by using a cyclic recursive algorithm, forming a one-to-one correspondence;
linking the Segments to the HTML through the HTML DOM nodes, so that the translation can be previewed in real time;
wherein the cyclic recursive algorithm comprises the following steps:
defining an Atom class with two types, Tag and text, the Atom class being a custom class;
defining the Segment content in the Segment list as the text of an Atom, and the label of the Segment as the Tag of the Atom;
reading each Atom in a loop and deciding, according to its type, whether to put it into a text pool;
mapping each Atom in the text pool to its Tag, finally forming a new set of HTML sub-tags with an id mapping;
wherein each Segment is composed of one or more Atoms, and each HTML sub-tag is formed from the Tag of an Atom;
and wherein the Segments are linked to the HTML by embedding the Tag of the Atom in the HTML sub-tag.
2. The real-time translation preview method for online assisted translation according to claim 1, wherein the original document format is one of doc, docx, rtf, xls, xlsx, ppt, pptx, pdf, sxw, stw, sxc, or stc.
3. The real-time translation preview method for online assisted translation according to claim 2, wherein the original document is converted into HTML format by using Word's own conversion function or another third-party tool.
CN202010260294.7A 2020-04-03 2020-04-03 Real-time translation previewing method for online auxiliary translation Active CN111401000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010260294.7A CN111401000B (en) 2020-04-03 2020-04-03 Real-time translation previewing method for online auxiliary translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010260294.7A CN111401000B (en) 2020-04-03 2020-04-03 Real-time translation previewing method for online auxiliary translation

Publications (2)

Publication Number Publication Date
CN111401000A CN111401000A (en) 2020-07-10
CN111401000B CN111401000B (en) 2023-06-20

Family

ID=71434942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010260294.7A Active CN111401000B (en) 2020-04-03 2020-04-03 Real-time translation previewing method for online auxiliary translation

Country Status (1)

Country Link
CN (1) CN111401000B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985255A (en) * 2020-09-01 2020-11-24 北京中科凡语科技有限公司 Translation method, translation device, electronic device and storage medium
CN113705158A (en) * 2021-09-26 2021-11-26 上海一者信息科技有限公司 Method for intelligently restoring original text style in document translation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102388383A (en) * 2006-12-08 2012-03-21 帕特里克·J·霍尔 Online computer-aided translation
CN102567384A (en) * 2010-12-29 2012-07-11 盛乐信息技术(上海)有限公司 Webpage multi-language dynamic switching method and system based on webpage browser engine
CN102929867A (en) * 2011-11-03 2013-02-13 微软公司 Technology used for automatically translating a document
CN104965866A (en) * 2015-06-05 2015-10-07 小米科技有限责任公司 Method and apparatus for establishing label and style rule binding relation
CN105069000A (en) * 2015-08-24 2015-11-18 中译语通科技(北京)有限公司 Interactive prediction input method
CN105468697A (en) * 2015-11-18 2016-04-06 成都优译信息技术有限公司 Automatic positioning method used for translation teaching system
CN105573969A (en) * 2006-10-02 2016-05-11 谷歌公司 Displaying original text in a user interface with translated text
CN105760542A (en) * 2016-03-15 2016-07-13 腾讯科技(深圳)有限公司 Display control method, terminal and server
CN106649271A (en) * 2016-12-19 2017-05-10 成都优译信息技术股份有限公司 Translation-based word document analysis method
CN107885735A (en) * 2017-11-21 2018-04-06 语联网(武汉)信息技术有限公司 A kind of unrelated document translation method and system of form
CN109145260A (en) * 2018-08-24 2019-01-04 北京科技大学 A kind of text information extraction method
CN110263351A (en) * 2019-06-17 2019-09-20 深圳前海微众银行股份有限公司 A kind of multi-language translation method of webpage, device and equipment

Also Published As

Publication number Publication date
CN111401000A (en) 2020-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant