WO2016206336A1 - File extraction and restoration method facilitating translation work - Google Patents

File extraction and restoration method facilitating translation work

Info

Publication number
WO2016206336A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
document
sentence
translator
translated
Prior art date
Application number
PCT/CN2015/098668
Other languages
English (en)
Chinese (zh)
Inventor
江潮
罗伟峰
Original Assignee
武汉传神信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-06-25
Filing date
2015-12-24
Publication date
2016-12-29
Application filed by 武汉传神信息技术有限公司
Publication of WO2016206336A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language

Definitions

  • The invention relates to an artificial intelligence document processing method that facilitates translation work.
  • The technical problem to be solved by the invention is to simplify translation work and improve translation efficiency; to that end, it proposes a file extraction and restoration method that facilitates translation work.
  • The file extraction and restoration method proposed by the present invention for facilitating translation work includes the following steps:
  • The translator processing document has three fields, "original", "translation" and id; the "original" field holds the original text of a sentence, and the "translation" field holds its translation;
  • Disassembling the document object to be translated into a data set to be translated, with the sentence as the minimum unit, includes the following steps:
  • The Aspose component provides a paragraph object, a child node object, and a Run object that facilitate character operations; a Run object is a continuous segment of characters that share a consistent character format in the document.
  • Merging a Run object that contains only one sentence fragment into a subsequent Run object includes the following steps:
  • The invention also includes establishing a dictionary object whose key is the original text and whose value is the translation, so that each original-translation pair is a key-value pair; when the translator processing document is traversed, the original text and translation of each record are written into the dictionary object.
  • In step 5, if the "translation" field of the record with a given id is empty, the original text of that record is used as a key to look up a matching translation in the dictionary object; if one is found, it is filled into the "translation" field.
  • The translator processing document is traversed, and repeated sentences are marked to remind the translator that they do not need to be translated again.
  • The translator processing document is traversed, and each sentence in the original text is automatically matched against the terms in the termbase; if a sentence contains a matching term, the sentence is annotated with that term so that translation work goes more smoothly.
  • The translator processing document is traversed, and the sentences in the original text are matched one by one against the corpus; if a sentence matches, the corresponding translation in the corpus is filled into the "translation" field of the matching sentence.
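The three pre-processing passes just described (marking repeats, annotating terms, pre-filling from the corpus) might be sketched as follows. This is only an illustration under stated assumptions: records are plain dictionaries with the "id", "original" and "translation" fields, the termbase and corpus are simple dictionaries, corpus matching is exact string equality, and the extra "repeated" and "terms" keys are hypothetical names that do not come from the source.

```python
def preprocess(records, termbase, corpus):
    """Mark repeats, annotate terms, and pre-fill translations in place.

    records:  list of dicts with "id", "original", "translation" keys
    termbase: dict mapping a source term to its preferred translation (assumed shape)
    corpus:   dict mapping a source sentence to a stored translation (assumed shape)
    """
    seen = set()
    for rec in records:
        src = rec["original"]
        # A sentence seen earlier in the document is flagged as a repeat.
        rec["repeated"] = src in seen  # hypothetical field name
        seen.add(src)
        # Attach every termbase entry whose term occurs in the sentence.
        rec["terms"] = {t: v for t, v in termbase.items() if t in src}  # hypothetical field
        # Pre-fill from the corpus on an exact match (the matching rule is an assumption).
        if not rec["translation"] and src in corpus:
            rec["translation"] = corpus[src]
    return records
```

A real termbase or corpus lookup would probably need normalisation or fuzzy matching; the source does not say which matching rule is used.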
  • The present invention simplifies the translator's work: the translator does not need to master the various mainstream document programs such as PPT, Word, Excel, and PDF, and can therefore devote more energy to translating the text.
  • All repeated sentences need to be translated only once, and the others are filled in automatically; each translation result is collected, so that when a new manuscript is received, the previously accumulated corpus and terminology can be used directly to further improve translation efficiency.
  • FIG. 1 is a screenshot of a translator's translation processing interface according to a specific embodiment of the present invention. The figure mainly shows a translator processing document filled with the original text.
  • FIG. 2 is a screenshot of another translation processing interface according to a specific embodiment of the present invention. The figure mainly shows a translator processing document after pre-processing.
  • FIG. 3 is an overall flow chart of the present invention.
  • The file extraction and restoration method for facilitating translation work proposed by the present invention comprises the following steps:
  • The translator processing document has three fields, "original", "translation" and id; the "original" field holds the original text of a sentence, and the "translation" field holds its translation;
  • The paragraph object contains all the text information of the document object and excludes symbols, images, and other non-text information that does not need to be translated;
  • The Aspose component provides paragraph objects, child node objects, and Run objects that facilitate character operations.
  • A Run object is a set of characters that share a consistent character format within a document.
  • An obtained Run object falls into one of four cases: (1) the Run object contains multiple complete sentences; (2) the Run object contains multiple complete sentences and a sentence fragment; (3) the Run object contains only one sentence fragment; (4) the Run object contains exactly one complete sentence. Further sentence processing is therefore required: the existing Run objects are split and merged until each Run object contains exactly one complete sentence.
  • S4: traverse each Run object and split every Run object into Run objects that each contain either exactly one complete sentence or exactly one sentence fragment.
  • The method used is, for example:
  • The Run object contains multiple complete sentences:
  • The Run object is split at each sentence terminator into several Run objects that each contain only one complete sentence.
  • The Run object contains multiple complete sentences and a sentence fragment:
  • The Run object is split at each sentence terminator into several Run objects that each contain only one complete sentence, plus one Run object that contains the sentence fragment.
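The splitting in step S4 can be illustrated on plain strings. The sketch below is not the Aspose API: it treats the text of a Run as a Python string, and the terminator set is an assumption, since the source does not list which characters count as sentence terminators. The names `split_run_text` and `is_fragment` are hypothetical.

```python
import re

# Assumed terminator set (Latin and CJK full stop, question and exclamation marks).
TERMINATORS = ".!?。！？"

# A piece is either text ending in one or more terminators, or a trailing fragment.
_PIECE = re.compile(r"[^.!?。！？]*[.!?。！？]+|[^.!?。！？]+$")


def split_run_text(text):
    """Split a Run's text so that each piece holds one sentence or one fragment."""
    return _PIECE.findall(text)


def is_fragment(text):
    """True when the text does not end with a sentence terminator."""
    stripped = text.rstrip()
    return not stripped or stripped[-1] not in TERMINATORS
```

For instance, `split_run_text("First sentence. Second sentence. A trailing frag")` yields two complete sentences followed by one fragment. Leading whitespace stays with the following piece so that joining the pieces reproduces the input. A terminator-only rule also splits at decimal points and abbreviations, which a production splitter would have to handle.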
  • Run-1 "In order to solve the above problem, a special will be proposed"
  • Run-2 "Word, Excel, PPT, PDF"
  • Run-3 "A variety of mainstream document formats are converted into a unified standard style"
  • Run-4 "Word"
  • Run-5 "Documents and can also be converted in turn The standard obtained"
  • Run-6 "Word"
  • Run-7 "The method of restoring the document to the original format. To simplify the translation work and improve the translation efficiency."
  • Run-1 to Run-6 above only contain one sentence fragment, and Run-7 contains two seemingly complete sentences.
  • Run-1 through Run-6 need to be merged, and Run-7 needs to be split further.
  • S5-1: take out the character content of a Run object that contains only one sentence fragment, store it in a temporary storage unit, and then delete that Run object from the paragraph object;
  • S5-2: check the next Run object. If its character content is also only a sentence fragment, take out that content, append it to the temporary storage unit, delete the Run object from the paragraph object, and continue with the next Run object; otherwise, take the character content stored in the temporary storage unit, add it to the character content of that Run object, and then empty the temporary storage unit.
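Steps S5-1 and S5-2 amount to a single pass with a buffer. The sketch below again works on plain strings rather than Aspose Run objects. The terminator set, the function names, and the handling of a fragment that has no following Run (kept as its own item here) are assumptions that the source does not specify.

```python
TERMINATORS = ".!?。！？"  # assumed terminator set


def is_fragment(text):
    """True when the text does not end with a sentence terminator."""
    stripped = text.rstrip()
    return not stripped or stripped[-1] not in TERMINATORS


def merge_fragments(run_texts):
    """Fold each sequence of fragment Runs into the next Run that ends a sentence."""
    merged = []
    buffer = ""  # plays the role of the temporary storage unit
    for text in run_texts:
        if is_fragment(text):
            # S5-1 / S5-2: stash the fragment; the Run itself is dropped.
            buffer += text
        else:
            # S5-2 "otherwise" branch: put the buffered text in front, then clear it.
            merged.append(buffer + text)
            buffer = ""
    if buffer:
        # Assumption: a trailing fragment with no following Run is kept as-is.
        merged.append(buffer)
    return merged
```

With `["In order to solve the above problem, ", "Word", " documents are converted into a unified style. ", "Translation then becomes simpler."]` as input, the first three items collapse into one sentence and the fourth is left alone. Since the merged text ends up in the later Run, it presumably inherits that Run's character format.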
  • The translator processing document is sent to the translator, who translates the original text in each "original" field one by one and fills in the corresponding "translation" field until the document is complete;
  • The key of the dictionary object is the original text and the value is the translation, so each original-translation pair is a key-value pair; when the translator processing document is traversed, the original text and translation of each record are written into the dictionary object.
  • The original text of the record with that id is used as a key to look up a matching translation; if one is found, it is filled into the "translation" field.
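The dictionary-object mechanism for repeated sentences can be sketched in two passes: one that collects original-translation pairs and one that fills the empty "translation" fields. Records are assumed to be plain dictionaries with the "id", "original" and "translation" fields, and `fill_repeats` is a hypothetical name.

```python
def fill_repeats(records):
    """Fill empty translations from sentences already translated elsewhere."""
    memory = {}  # the dictionary object: original text -> translation
    # Pass 1: record every original-translation pair that the translator supplied.
    for rec in records:
        if rec["translation"]:
            memory[rec["original"]] = rec["translation"]
    # Pass 2: for records left empty, look the original text up as a key.
    for rec in records:
        if not rec["translation"] and rec["original"] in memory:
            rec["translation"] = memory[rec["original"]]
    return records
```

Using two passes means a repeat can be filled even when it appears before the occurrence the translator actually translated. If the same original text carries different translations, the last one written wins in this sketch; the source does not say how such conflicts are resolved.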
  • For the Excel, PPT, and PDF documents mentioned in the present invention, those skilled in the art can use the Aspose component to obtain the character information contained in these documents and, following the method disclosed by the present invention, split and merge it into a data set with the sentence as the unit and generate the translator processing document, which is convenient for the translator; after the translator has finished, the translations are processed to produce the translated document.
  • For example, an Excel, PDF, or PPT document can first be converted into a corresponding Word document with an existing tool and then processed by the method of the above embodiment.
  • For Excel documents, the Aspose component can also be used directly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an artificial intelligence file processing method that facilitates translation work. The method comprises: using the file processing support of the Aspose component, disassembling a file object to be translated into a data set to be translated, in which a single sentence is the minimum unit; creating a standard translator processing document and copying each sentence of the data set to be translated into the translator processing document one by one; a translator filling translations into the translator processing document one by one; traversing the data set to be translated and the translator processing document, and writing the translations into the data set to be translated; and restoring the data set to be translated into a document in the original manuscript format. Manuscripts in various formats can be converted into one standard translator processing document. Sentences that appear repeatedly do not need to be translated repeatedly, the translator's work is simplified, translation efficiency is improved, the extraction logic and the restoration logic execute efficiently, and the restored translated manuscript preserves the original manuscript format.
PCT/CN2015/098668 2015-06-25 2015-12-24 File extraction and restoration method facilitating translation work WO2016206336A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2015103576722 2015-06-25
CN201510357672.2A CN104933041B (zh) 2015-06-25 2015-06-25 File extraction and restoration method facilitating translation work

Publications (1)

Publication Number Publication Date
WO2016206336A1 2016-12-29

Family

ID=54120210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098668 WO2016206336A1 (fr) File extraction and restoration method facilitating translation work 2015-06-25 2015-12-24

Country Status (2)

Country Link
CN (1) CN104933041B (fr)
WO (1) WO2016206336A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
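CN109617974A (zh) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 Request processing method, apparatus, and server
CN110555196A (zh) * 2018-05-30 2019-12-10 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for automatically generating articles
CN110688863A (zh) * 2019-09-25 2020-01-14 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN112766003A (zh) * 2021-01-20 2021-05-07 语联网(武汉)信息技术有限公司 Method and apparatus for assisted document translation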

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
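CN104933041B (zh) * 2015-06-25 2017-09-01 武汉传神信息技术有限公司 File extraction and restoration method facilitating translation work
CN106919558B (zh) * 2015-12-24 2020-12-01 姚珍强 Translation method and translation apparatus for mobile devices based on natural conversation
CN105808528B (zh) * 2016-03-04 2019-01-25 张广睿 Method for processing document text
CN105760368B (zh) * 2016-03-11 2019-02-12 张广睿 Method for deep processing of document text
CN105677643A (zh) * 2016-03-14 2016-06-15 张广睿 Written translation method combining human and machine
CN106021242B (zh) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 System for writing back translation data of DWG-format drawings and write-back method thereof
CN105975461B (zh) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 Method for adding translations to a DWG-format file
CN106021197B (zh) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 Translation system and translation method for DWG-format files
CN106055529B (zh) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 System for parsing text data to be translated in DWG-format files and parsing method thereof
CN105975451B (zh) * 2016-05-27 2019-04-23 成都优译信息技术有限公司 System for processing translation data of DWG-format files and processing method thereof
CN107590140B (zh) * 2017-10-17 2020-09-25 语联网(武汉)信息技术有限公司 Method for processing missed-translation entries in a document
CN107885735B (zh) * 2017-11-21 2021-05-04 语联网(武汉)信息技术有限公司 Format-independent document translation method and system
CN108563645B (zh) * 2018-04-24 2022-03-22 成都智信电子技术有限公司 Metadata translation method and apparatus for HIS systems
CN109446531B (zh) * 2018-09-06 2023-05-05 语联网(武汉)信息技术有限公司 Method, apparatus, and electronic device for detecting translation progress
CN109783826B (zh) * 2019-01-15 2023-11-21 四川译讯信息科技有限公司 Automatic document translation method
CN111143074B (zh) * 2019-12-30 2024-04-09 文思海辉智科科技有限公司 Method and apparatus for allocating translation files
CN111144070B (zh) * 2019-12-31 2023-08-01 北京迈迪培尔信息技术有限公司 Document parsing and translation method and apparatus
CN111291575B (zh) * 2020-02-28 2023-04-18 北京字节跳动网络技术有限公司 Text processing method, apparatus, electronic device, and storage medium
CN112052648B (zh) * 2020-09-02 2021-11-16 文思海辉智科科技有限公司 String translation method, apparatus, electronic device, and storage medium
CN113705158A (zh) * 2021-09-26 2021-11-26 上海一者信息科技有限公司 Method for intelligently restoring the original text style in document translation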

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848386A (en) * 1996-05-28 1998-12-08 Ricoh Company, Ltd. Method and system for translating documents using different translation resources for different portions of the documents
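CN102982027A (zh) * 2011-09-02 2013-03-20 北大方正集团有限公司 Method and apparatus for extracting content from a document
CN104331399A (zh) * 2014-07-25 2015-02-04 一朵云(北京)科技有限公司 Dictionary tree translation method
CN104933041A (zh) * 2015-06-25 2015-09-23 武汉传神信息技术有限公司 File extraction and restoration method facilitating translation work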

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
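CN110555196A (zh) * 2018-05-30 2019-12-10 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for automatically generating articles
CN110555196B (zh) * 2018-05-30 2023-07-18 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for automatically generating articles
CN109617974A (zh) * 2018-12-21 2019-04-12 珠海金山办公软件有限公司 Request processing method, apparatus, and server
CN110688863A (zh) * 2019-09-25 2020-01-14 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN110688863B (zh) * 2019-09-25 2023-04-07 六维联合信息科技(北京)有限公司 Document translation system and document translation method
CN112766003A (zh) * 2021-01-20 2021-05-07 语联网(武汉)信息技术有限公司 Method and apparatus for assisted document translation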

Also Published As

Publication number Publication date
CN104933041B (zh) 2017-09-01
CN104933041A (zh) 2015-09-23

Similar Documents

Publication Publication Date Title
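WO2016206336A1 (fr) File extraction and restoration method facilitating translation work
CN108415887A (zh) Method for converting a PDF file into an OFD file
CN104346319B (zh) Method and system for checking document styles
CN109582647B (zh) Analysis method and system for unstructured evidence files
CN101558405B (zh) Conversion apparatus and method for converting a mainframe system database into an open system database
CN112149399A (zh) RPA- and AI-based table information extraction method, apparatus, device, and medium
CN105138575A (zh) Method and apparatus for parsing speech text strings
CN111309313A (zh) Method for rapidly generating HTML and storing form data
CN104199871A (zh) High-speed test question import method for smart teaching
CN111176650A (zh) Parser generation method, retrieval method, server, and storage medium
CN104750472A (zh) Resource package management method and apparatus for terminal applications
CN111068336A (zh) Method, apparatus, electronic device, and storage medium for generating a translated game version
CN112527291A (zh) Web page generation method, apparatus, electronic device, and storage medium
CN112766000A (zh) Machine translation method and system based on a pre-trained model
CN113867694B (zh) Method and system for intelligently generating front-end code
Clausner et al. Efficient OCR training data generation with Aletheia
US20180032544A1 (en) Distributed processing management method and distributed processing management apparatus
CN109947711B (zh) Method for automated management of multilingual files during iOS project development
CN106815181B (zh) Method and apparatus for converting an InDesign-typeset INDD file into an Office file
CN104331399A (zh) Dictionary tree translation method
CN110889266A (zh) Meeting record integration method and apparatus
CN111143642A (zh) Web page classification method, apparatus, electronic device, and computer-readable storage medium
US8930808B2 (en) Processing rich text data for storing as legacy data records in a data storage system
CN102629244B (zh) Multilingual job card generation system and method
CN103440231A (zh) Device and method for comparing text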

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 15896219

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 15896219

Country of ref document: EP

Kind code of ref document: A1