WO2022141788A1 - Document translation method, apparatus, electronic device, and storage medium - Google Patents

Document translation method, apparatus, electronic device, and storage medium

Info

Publication number
WO2022141788A1
Authority
WO
WIPO (PCT)
Prior art keywords
segment
translated
document
translation
segments
Prior art date
Application number
PCT/CN2021/078816
Other languages
English (en)
French (fr)
Inventor
张芃
Original Assignee
语联网(武汉)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 语联网(武汉)信息技术有限公司
Publication of WO2022141788A1 (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/47: Machine-assisted translation, e.g. using translation memory

Definitions

  • the present application relates to the field of computer technology, and in particular, to a document translation method, apparatus, electronic device, and storage medium.
  • the present application provides a document translation method, device, electronic device and storage medium, which are used to solve the problems of long translation time and low translation efficiency in the prior art.
  • This application provides a document translation method, including:
  • hash value matching is performed between any segment to be translated and all original text segments in the translation corpus, and the translated segment of that segment to be translated is determined;
  • the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment;
  • a translation result of the document is determined.
  • the hash value matching of any segment to be translated with all original segments in the translation corpus, and determining the translated segment of any segment to be translated includes:
  • a translation segment of any of the to-be-translated segments is determined.
  • the determination of the translation segment of any segment to be translated based on the segment to be translated, the plurality of original segments and the translated segment of each original segment includes:
  • the translation segment corresponding to the candidate source text segment is used as the translation segment of any segment to be translated.
  • the associated segment of any segment to be translated is a context segment of the segment to be translated in the document.
  • the determining of multiple segments to be translated in the document includes:
  • segmenting the document to determine all segments of the document
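  • based on the semantic similarity between the associated segments of segments with the same hash value in the document, the segments with the same hash value are clustered to obtain a plurality of semantic similarity classes, and any segment in each semantic similarity class is used as the segment to be translated corresponding to that class;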
  • a plurality of to-be-translated segments in the document are determined.
  • determining the translation result of the document based on the translation segment of each segment to be translated includes:
  • a translation result of the document is determined.
  • the document is divided into segments, and all segments of the document are determined, including:
  • the document is segmented based on paragraph identifiers and/or punctuation marks in the document, and all segments of the document are determined.
  • the present application provides a document translation device, including:
  • a fragment determination unit used to determine a plurality of fragments to be translated in the document
  • a segment translation unit configured to perform hash value matching between any segment to be translated and all original segments in the translation corpus, and determine the translated segment of any segment to be translated;
  • the translation corpus includes a plurality of original segments and each The hash value corresponding to the original segment and the translated segment;
  • a result output unit configured to determine a translation result of the document based on the translation segment of each segment to be translated.
  • the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of any one of the document translation methods described above are implemented.
  • the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of any one of the above document translation methods.
  • hash value matching is performed between any segment to be translated in the document and all original text segments in the translation corpus to determine the translated segment of that segment, and the translation result of the document is determined from the translated segment of each segment to be translated.
  • the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment.
  • existing historical translation data is reused, which reduces the translators' workload, automates document translation, and improves document translation efficiency. It also avoids different translators producing inconsistent translations of the same segment, which keeps the translation results consistent.
  • FIG. 2 is a schematic structural diagram of a document translation device provided by the application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by the present application.
  • FIG. 1 is a schematic flowchart of a document translation method provided by the application, and as shown in FIG. 1 , the method includes:
  • Step 110: Determine multiple segments to be translated in the document.
  • the document is the text to be translated, and the language type of the document may be Chinese, or may be English, Japanese, French, German, Arabic, and the like.
  • a fragment is a basic unit of a document, which can be a natural paragraph or a sentence.
  • a document to be translated can be divided into multiple segments to be translated.
  • for example, a document to be translated can be divided into segments to obtain multiple segments to be translated, which can be expressed as the set $S = \{S_1, S_2, \ldots, S_n\}$, where:
  • $S$ is the document to be translated
  • $S_i$ is the $i$-th segment to be translated
  • $n$ is the number of segments to be translated, $1 \le i \le n$.
  • Step 120: Perform hash value matching between any segment to be translated and all original text segments in the translation corpus to determine the translated segment of that segment; the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment.
  • a segment of any length is transformed into a fixed-length output through a hashing algorithm, and the output is the hash value.
  • a hash value is usually represented by a short string of random letters and numbers.
  • the hash algorithm realizes data compression, reduces the amount of data, and fixes the format of the data.
  • hash algorithms include MD5 (Message Digest Algorithm), SHA-1 (Secure Hash Algorithm 1), and SHA-256 (Secure Hash Algorithm 256).
  • the translation corpus is a bilingual parallel corpus composed of the original text segment and the translated text segment corresponding to the original text segment, established according to the historical translation data.
  • the original text segment is the text that needs to be translated in the historical data
  • the translated text segment is the text obtained by translating the original text segment.
  • the translation corpus also includes hash values corresponding to the original text segment and the translated text segment.
  • the original text segment and the translated text segment are relative, that is, any segment of the translation corpus can be the original text segment or the translated text segment, and correspondingly, the translated text corresponding to the segment is the translated text segment or the original text segment.
  • the calculation method of the hash value corresponding to the original text segment and the translation text segment in the translation corpus is the same as the calculation method of the hash value of the to-be-translated segment in the document to be translated during the translation process.
  • the translation corpus can be continuously updated and expanded based on historical translation data.
  • a matching original segment is searched in the translation corpus, that is, the hash value of any segment to be translated is matched with all the original segments in the translation corpus.
  • for each hash value $h_i$ computed from a segment to be translated, a hash value lookup is performed in the translation corpus to obtain all original text segments whose hash value equals $h_i$.
  • when the hash value of the segment to be translated is the same as the hash value of an original text segment in the translation corpus, the two can be considered the same segment, and the translated segment of that original text segment can be used as the translation of the segment to be translated.
  • Step 130: Determine the translation result of the document based on the translated segment of each segment to be translated.
  • the translation result of the document is the translated text of the document, aligned with the original.
  • the to-be-translated segments in the document correspond one-to-one with the translated segments in the translation result.
  • the translation segments of each segment to be translated are arranged according to the segment order of each segment to be translated in the document, and the translation result of the document can be obtained.
  • for each segment to be translated $S_i$ in the document $S$, the corresponding translated segment is $Y_i$
  • the translation set corresponding to $\{S_1, S_2, \ldots, S_n\}$ is $\{Y_1, Y_2, \ldots, Y_n\}$.
  • hash value matching is performed between any segment to be translated in the document and all original text segments in the translation corpus to determine the translated segment of that segment, and the translation result of the document is determined from the translated segment of each segment to be translated.
  • the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment.
  • existing historical translation data is reused, which reduces the translators' workload, automates document translation, and improves document translation efficiency. It also avoids different translators producing inconsistent translations of the same segment and keeps the translation results consistent.
  • step 120 includes:
  • Hash value matching of any segment to be translated with all original segments in the translation corpus to determine multiple original segments that match the segment to be translated
  • a translation segment of the to-be-translated segment is determined.
  • for any segment to be translated, the lookup in the translation corpus is likely to find multiple original text segments.
  • the found multiple original text fragments can be filtered.
  • the to-be-translated segment and each original text segment may be compared for semantic similarity, or the to-be-translated segment and each original text segment may be subjected to word frequency statistics, and the original text segment corresponding to the to-be-translated segment is determined according to the semantic similarity comparison result or the word frequency statistics result.
  • the context segment of the segment to be translated and the context segment of each original text segment can also be compared for semantic similarity or subjected to word frequency statistics, and the original text segment corresponding to the segment to be translated is determined from the semantic similarity comparison result or the word frequency statistics result.
  • determining the translated text segment of any segment to be translated including:
  • the translation segment corresponding to the candidate source segment is used as the translation segment of any segment to be translated.
  • the associated segment of any segment to be translated is a segment that is semantically associated with the segment to be translated, and the associated segment may include the segment to be translated itself.
  • the associated segment of any segment to be translated may also include the context segment of the segment to be translated in the document, the segment above it in the document, the segment below it in the document, or the starting or ending segment of the document in which it is located, such as the first segment or the last segment.
  • for the segment to be translated $S_i$, obtain its context segments $S_{i-1}$ and $S_{i+1}$ in the document, and combine $S_i$, the preceding segment $S_{i-1}$, and the following segment $S_{i+1}$ together as the associated segment of $S_i$.
  • a threshold can be set and used as a benchmark to filter the semantic similarity values. For example, with the threshold set to 60, a semantic similarity value below 60 indicates that the associated segment of the original text segment and the associated segment of the segment to be translated have low semantic similarity. A value above 60 indicates that the two associated segments have high semantic similarity, so the original text segment and the segment to be translated are similar in context and semantics.
  • the calculation methods of semantic similarity include calculation methods based on vector space model, calculation methods based on Hamming distance, and calculation methods based on semantic understanding.
  • the original text segment whose semantic similarity value is the highest and exceeds the threshold is used as the candidate original text segment of the segment to be translated.
  • the candidate source text segment is the source text segment that is similar in context and semantics to the segment to be translated.
  • the translation segment corresponding to the candidate source text segment can be used as the translation segment of the segment to be translated.
  • the candidate original text segment of any segment to be translated is determined, and the translated segment corresponding to the candidate original text segment is used as the translated segment of that segment to be translated, so that the original text segment retrieved from the translation corpus is highly semantically similar to the segment to be translated, which improves the accuracy of the translation result.
  • the associated segment of any segment to be translated is a context segment in the document of any segment to be translated.
  • the associated segment of any segment to be translated is preferably the segment to be translated and a context segment of the segment to be translated in the document to be translated.
  • the relevant segment of any original text segment in the translation corpus is the original text segment and the context segment of the original text segment in the translated document.
  • when the translation corpus stores original text segments and their translated segments, they can be stored in the order in which the segments appear in the translated documents, and the translated documents can be aligned with their corresponding translation results. Alignment can be done at the sentence level or the paragraph level.
  • the context segment of the segment to be translated in the document may be one or more segments above the segment to be translated, or one or more segments below it; the number of segments selected may be determined according to the actual situation and is not specifically limited in this embodiment of the present application.
  • step 110 includes:
  • based on the semantic similarity between the associated segments of segments with the same hash value in the document, the segments with the same hash value are clustered to obtain multiple semantic similarity classes, and any segment in each semantic similarity class is used as the segment to be translated corresponding to that class;
  • a plurality of to-be-translated segments in the document are determined based on the plurality of semantically similar classes and the segments to be translated corresponding to each of the semantically similar classes.
  • the fragments in the fragment set E and their corresponding associated documents can be clustered.
  • the clustering method can use the K-means algorithm. After clustering, the segments in segment set E are divided into several classes, and the segments within one class can be considered the same segment to be translated.
  • the embodiment of the present application provides a clustering method based on the semantic similarity of associated segments, and the steps of the method are:
  • Step 2: Using $S_{e1}$ as the reference, calculate the semantic similarity between the associated segment of $S_{e1}$ and the associated segments of the remaining segments in segment set E, select all segments whose semantic similarity is greater than a given threshold, and form the first semantic similarity class $ES_1$ together with $S_{e1}$;
  • Step 3: Among all remaining segments in segment set E other than those in $ES_1$, obtain the second semantic similarity class $ES_2$ by the method of Step 2;
  • Step 4: Repeat the methods of Steps 2 and 3 until every segment in segment set E has been assigned to a semantic similarity class, finally obtaining multiple semantic similarity classes.
  • all segments in a semantic similarity class can be translated as one segment to be translated; that is, any segment in the class can serve as the representative of the class, and its translated segment can be taken as the translated segment of all segments in the class.
  • all segments of the document are clustered to obtain a plurality of semantic similarity classes, and the semantic similarity classes are used to determine the multiple segments to be translated in the document. Cluster analysis of the segments in the document reduces the translation workload, improves document translation efficiency, and keeps the translation results consistent.
  • the translation result of the document is determined, including:
  • the translation result of the document is determined.
  • the translated segment can be used as the translated segment of all segments in the semantic similarity class. That is, the translated segment of any segment in each semantic similarity class can be used as the translated segment of all segments in that class.
  • the document is segmented, and all segments of the document are determined, including:
  • the document is segmented based on paragraph identifiers and/or punctuation marks in the document, and all segments of the document are determined.
  • when the document is segmented, it can be divided by natural paragraph, by sentence, or by both natural paragraph and sentence.
  • if dividing by natural paragraph, the paragraph identifier can be used as the basis for division. If dividing by sentence, punctuation marks can be used as the basis.
  • the punctuation marks here are punctuation marks that can indicate the end of a complete sentence. Examples include periods, question marks, exclamation marks, and carriage returns.
  • the document translation method provided by the embodiment of the present application divides the document into segments according to the paragraph identifiers and/or punctuation marks in the document and determines all segments of the document. This is simple and easy to implement, reduces the translators' workload, automates document translation, and improves document translation efficiency.
  • the document translation apparatus provided by the present application is described below, and the document translation apparatus described below and the document translation method described above may refer to each other correspondingly.
  • FIG. 2 is a schematic structural diagram of a document translation device provided by this application. As shown in FIG. 2 , the device includes:
  • a fragment determination unit 210 configured to determine a plurality of fragments to be translated in the document
  • the segment translation unit 220 is configured to perform hash value matching between any segment to be translated and all original text segments in the translation corpus to determine the translated segment of that segment to be translated; the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment;
  • the result output unit 230 is configured to determine the translation result of the document based on the translation segment of each segment to be translated.
  • the segment determination unit 210 is configured to determine a plurality of segments to be translated in the document.
  • the segment translation unit 220 is configured to perform hash value matching between any segment to be translated and all original segments in the translation corpus to determine the translated segment of any segment to be translated.
  • the result output unit 230 is used to determine the translation result of the document.
  • the document translation apparatus performs hash value matching between any segment to be translated in the document and all original text segments in the translation corpus to determine the translated segment of that segment, and determines the translation result of the document from the translated segment of each segment to be translated.
  • the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment.
  • existing historical translation data is reused, which reduces the translators' workload, automates document translation, and improves document translation efficiency. It also avoids different translators producing inconsistent translations of the same segment and keeps the translation results consistent.
  • the segment translation unit 220 includes:
  • the matching subunit is used to perform hash value matching between any fragment to be translated and all the original text fragments in the translation corpus, and determine a plurality of original text fragments that match the fragment to be translated;
  • the translation subunit is configured to determine the translation segment of the segment to be translated based on the segment to be translated, a plurality of original segments and the translated segment of each original segment.
  • the translation subunit includes:
  • a similarity comparison module configured to determine the candidate original text segment of the to-be-translated segment based on the semantic similarity between the associated segment of any to-be-translated segment and the associated segment of each original text segment;
  • the translation segment determination module is configured to use the translation segment corresponding to the candidate original text segment as the translation segment of the segment to be translated.
  • the associated segment of any segment to be translated is the context segment of the segment to be translated in the document.
  • the segment determination unit 210 includes:
  • the segment division subunit is used to segment the document to determine all the segments of the document
  • a clustering subunit configured to cluster the segments with the same hash value based on the semantic similarity between the associated segments of the segments with the same hash value in the document, to obtain a plurality of semantic similarity classes, and Any segment in each semantically similar class is used as the segment to be translated corresponding to each semantically similar class;
  • the segment to be translated determination subunit is configured to determine multiple segments to be translated in the document based on the multiple semantic similarity classes and the segment to be translated corresponding to each semantic similarity class.
  • the result output unit is specifically used for:
  • the translation result of the document is determined.
  • segment division subunit is specifically used for:
  • the document is segmented based on paragraph identifiers and/or punctuation marks in the document, and all segments of the document are determined.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by the present application.
  • the electronic device may include: a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other through the communication bus 340.
  • the processor 310 can invoke the logic commands in the memory 330 to execute the methods provided by the above-mentioned embodiments, and the methods include:
  • the above-mentioned logic commands in the memory 330 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product.
  • the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • the processor in the electronic device provided in the embodiment of the present application can call the logic instructions in the memory to implement the above document translation method.
  • the present application also provides a non-transitory computer-readable storage medium.
  • the non-transitory computer-readable storage medium provided by the present application is described below; the non-transitory computer-readable storage medium described below and the document translation method described above may be referred to in correspondence with each other.
  • embodiments of the present application provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the methods provided by the foregoing embodiments, and the method includes:
  • the device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware.
  • the above technical solutions, in essence, or the parts that contribute over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the various embodiments or in some parts of the embodiments.
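The list above describes two similarity-driven steps: choosing a candidate original text segment among several hash matches, and greedily grouping same-hash segments into semantic similarity classes. The following is a minimal Python sketch of both. It rests on several assumptions that are not in the application: the names `similarity`, `pick_candidate`, and `cluster` are hypothetical; similarity is a character-count cosine (one simple vector space model) scaled to 0-100 so that the example threshold of 60 applies; and the first remaining segment is taken as the reference for each class, since the text reproduced here does not say how the reference is chosen.

```python
import math
from collections import Counter


def similarity(a: str, b: str) -> float:
    """Cosine similarity of character-count vectors, scaled to 0-100 (assumed measure)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return 100.0 * dot / norm if norm else 0.0


def pick_candidate(associated, candidates, threshold=60.0):
    """Pick the translation whose original associated segment is most similar.

    `candidates` is a list of (associated_segment_of_original, translation).
    Returns None when no candidate scores above the threshold.
    """
    scored = [(similarity(associated, ctx), translation) for ctx, translation in candidates]
    if not scored:
        return None
    best_score, best_translation = max(scored, key=lambda item: item[0])
    return best_translation if best_score > threshold else None


def cluster(associated_segments, threshold=60.0):
    """Greedy clustering in the spirit of Steps 2 to 4.

    `associated_segments[i]` is the associated segment of the i-th same-hash
    segment. Returns a list of classes, each a list of indices.
    """
    remaining = list(range(len(associated_segments)))
    classes = []
    while remaining:
        ref = remaining[0]  # assumption: the first remaining segment is the reference
        members = [ref] + [
            i
            for i in remaining[1:]
            if similarity(associated_segments[ref], associated_segments[i]) > threshold
        ]
        classes.append(members)
        remaining = [i for i in remaining if i not in members]
    return classes
```

Any segment of a returned class can then stand in for the whole class, for example `classes[k][0]`. A production system would likely swap the character-count cosine for a stronger semantic measure.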

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

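A document translation method, apparatus, electronic device, and storage medium, wherein the method includes: determining a plurality of segments to be translated in a document (110); performing hash value matching between any segment to be translated and all original text segments in a translation corpus, and determining the translated segment of that segment to be translated (120), the translation corpus including a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment; and determining a translation result of the document based on the translated segment of each segment to be translated (130). The method, apparatus, electronic device, and storage medium make use of existing historical translation data, reduce the translators' workload, automate document translation, improve document translation efficiency, and ensure the consistency of translation results.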

Description

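Document translation method, apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 2020116050044, filed on December 30, 2020 and entitled "Document translation method, apparatus, electronic device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technology, and in particular to a document translation method, apparatus, electronic device, and storage medium.
Background
In document translation projects, especially coherent translation projects with a large amount of formatted content or a large amount of identical content, translation relies mainly on manual work. Translation takes a long time and is inefficient, and documents produced by different translators are inconsistent, so identical content ends up with inconsistent translations.
Summary
(1) Technical problem to be solved
The present application provides a document translation method, apparatus, electronic device, and storage medium to address the long translation time and low translation efficiency of documents in the prior art.
(2) Summary of the invention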
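The present application provides a document translation method, including:
determining a plurality of segments to be translated in a document;
performing hash value matching between any segment to be translated and all original text segments in a translation corpus, and determining the translated segment of that segment to be translated; the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment;
determining a translation result of the document based on the translated segment of each segment to be translated.
According to the document translation method provided by the present application, performing hash value matching between any segment to be translated and all original text segments in the translation corpus and determining the translated segment of that segment to be translated includes:
performing hash value matching between any segment to be translated and all original text segments in the translation corpus, and determining a plurality of original text segments that match that segment to be translated;
determining the translated segment of that segment to be translated based on the segment to be translated, the plurality of original text segments, and the translated segment of each original text segment.
According to the document translation method provided by the present application, determining the translated segment of that segment to be translated based on the segment to be translated, the plurality of original text segments, and the translated segment of each original text segment includes:
determining a candidate original text segment of that segment to be translated based on the semantic similarity between the associated segment of the segment to be translated and the associated segment of each original text segment;
using the translated segment corresponding to the candidate original text segment as the translated segment of that segment to be translated.
According to the document translation method provided by the present application, the associated segment of any segment to be translated is the context segment of that segment to be translated in the document.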
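According to the document translation method provided by the present application, determining a plurality of segments to be translated in the document includes:
dividing the document into segments and determining all segments of the document;
clustering the segments with the same hash value based on the semantic similarity between the associated segments of segments with the same hash value in the document to obtain a plurality of semantic similarity classes, and using any segment in each semantic similarity class as the segment to be translated corresponding to that class;
determining the plurality of segments to be translated in the document based on the plurality of semantic similarity classes and the segment to be translated corresponding to each semantic similarity class.
According to the document translation method provided by the present application, determining the translation result of the document based on the translated segment of each segment to be translated includes:
determining the translated segments of all segments in each semantic similarity class based on the translated segment of any segment in that class; and determining the translation result of the document based on the translated segments of all segments in the document.
According to the document translation method provided by the present application, dividing the document into segments and determining all segments of the document includes:
dividing the document into segments based on the paragraph identifiers and/or punctuation marks in the document, and determining all segments of the document.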
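The present application provides a document translation apparatus, including:
a segment determination unit, configured to determine a plurality of segments to be translated in a document;
a segment translation unit, configured to perform hash value matching between any segment to be translated and all original text segments in a translation corpus and to determine the translated segment of that segment to be translated; the translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment;
a result output unit, configured to determine a translation result of the document based on the translated segment of each segment to be translated.
The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of any one of the document translation methods described above are implemented.
The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the document translation methods described above are implemented.
(3) Beneficial effects
The document translation method, apparatus, electronic device, and storage medium provided by the present application perform hash value matching between any segment to be translated in the document and all original text segments in the translation corpus to determine the translated segment of that segment, and determine the translation result of the document from the translated segment of each segment to be translated. The translation corpus includes a plurality of original text segments and, for each original text segment, the corresponding hash value and translated segment. Existing historical translation data is reused, which reduces the translators' workload, automates document translation, and improves document translation efficiency. It also avoids different translators producing inconsistent translations of the same segment and ensures the consistency of translation results.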
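Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the document translation method provided by the present application;
FIG. 2 is a schematic structural diagram of the document translation apparatus provided by the present application;
FIG. 3 is a schematic structural diagram of the electronic device provided by the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.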
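FIG. 1 is a schematic flowchart of the document translation method provided by the present application. As shown in FIG. 1, the method includes:
Step 110: determine a plurality of segments to be translated in a document.
Specifically, the document is the text that needs to be translated. Its language may be Chinese, or English, Japanese, French, German, Arabic, and so on. A segment is a basic unit of the document and may be a paragraph or a sentence. A document to be translated can be divided into a plurality of segments to be translated.
For example, a document to be translated can be divided into segments to obtain a plurality of segments to be translated, which can be expressed as the set:
$S = \{S_1, S_2, \ldots, S_n\}$
where $S$ is the document to be translated, $S_i$ is the $i$-th segment to be translated, $n$ is the number of segments to be translated, and $1 \le i \le n$.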
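Step 120: perform hash value matching between any segment to be translated and all original segments in the translation corpus, and determine the translation segment of that segment. The translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment.
Specifically, a hash algorithm transforms a segment of arbitrary length into a fixed-length output, and that output is the hash value. A hash value is usually represented as a short string of seemingly random letters and digits. The hash algorithm compresses the data, reducing its size, and fixes the data format. Hash algorithms include MD5 (Message-Digest Algorithm 5), SHA-1 (Secure Hash Algorithm 1), and SHA-256 (Secure Hash Algorithm 256).
The translation corpus is a bilingual parallel corpus built from historical translation data and composed of original segments and their corresponding translation segments. An original segment is text from the historical data that needed translation, and a translation segment is the text obtained by translating an original segment. The translation corpus also includes the hash values corresponding to the original segments and the translation segments. Here, "original" and "translation" are relative: any segment in the corpus may serve as an original segment or as a translation segment, and its counterpart text is accordingly the translation segment or the original segment.
The hash values of the original and translation segments in the corpus are computed with the same method used for the segments to be translated in the document being translated. The translation corpus can be continually updated and extended with historical translation data.
Based on the hash value of any segment to be translated, matching original segments are looked up in the translation corpus; that is, hash value matching is performed between the segment to be translated and all original segments in the corpus.
For example, for the segment set $S = \{S_1, S_2, \ldots, S_n\}$, a hash algorithm is applied to each segment to be translated $S_i$ to obtain its unique numeric hash value $h_i$, and these values form the hash value set $H$, which can be expressed as
$H = \{h_1, h_2, \ldots, h_n\}$
For each element $h_i$ of $H$, a hash value lookup is performed in the translation corpus to obtain all original segments whose hash value equals $h_i$.
When the hash value of the segment to be translated equals the hash value of an original segment in the corpus, the two can be regarded as the same segment, and the translation segment of that original segment can be used as the translation of the segment to be translated.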
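The lookup in step 120 can be sketched as a dictionary keyed by hash value. This is a minimal illustration under stated assumptions, not the application's implementation: SHA-256 is one of the algorithms the text names, while the function names `segment_hash`, `build_corpus_index`, and `lookup` and the sample corpus are hypothetical. Because the same original text can occur in the corpus with different translations, each hash value maps to a list of entries.

```python
import hashlib
from collections import defaultdict


def segment_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a segment (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_corpus_index(pairs):
    """Map each hash value to every (original, translation) pair that produced it."""
    index = defaultdict(list)
    for original, translation in pairs:
        index[segment_hash(original)].append((original, translation))
    return index


def lookup(index, segment):
    """Return all corpus entries whose hash equals the hash of `segment`."""
    return index.get(segment_hash(segment), [])


# Hypothetical corpus: the same original appears with two different translations.
corpus = [
    ("按下电源按钮。", "Press the power button."),
    ("按下电源按钮。", "Push the power key."),
    ("合上盖子。", "Close the lid."),
]
index = build_corpus_index(corpus)
matches = lookup(index, "按下电源按钮。")
print(len(matches), "matching original segments")
```

The corpus and the document must use the same hash function and the same text normalization, otherwise identical segments will not match.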
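Step 130: determine the translation result of the document based on the translation segment of each segment to be translated.
Specifically, the translation result of the document is its parallel translated text. The segments to be translated in the document correspond one-to-one with the translation segments in the translation result.
Arranging the translation segments in the order of the corresponding segments to be translated in the document yields the translation result. For example, if each segment to be translated $S_i$ in document $S$ has the translation segment $Y_i$, then the translation set corresponding to $\{S_1, S_2, \ldots, S_n\}$ is $\{Y_1, Y_2, \ldots, Y_n\}$. Arranging the translation segments $Y_i$ in the order of the segments $S_i$ gives the translation result $Y$ of document $S$.
The document translation method provided by the present application performs hash value matching between any segment to be translated in a document and all original segments in a translation corpus to determine the translation segment of that segment, and determines the translation result of the document from the translation segment of each segment to be translated. The translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment. This makes use of existing historical translation data, reduces the workload of translators, automates document translation, and improves translation efficiency. It also prevents different translators from producing inconsistent translations of the same segment, which ensures consistency of the translation results.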
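Based on the above embodiment, step 120 includes:
performing hash value matching between any segment to be translated and all original segments in the translation corpus, and determining a plurality of original segments that match that segment;
determining the translation segment of that segment based on the segment to be translated, the plurality of original segments, and the translation segment of each original segment.
Specifically, when original segments are looked up in the translation corpus by the hash value of a segment to be translated, multiple original segments may well be found. When multiple original segments are found, they can be filtered.
For example, the segment to be translated can be compared with each original segment for semantic similarity, or word frequency statistics can be computed for the segment to be translated and each original segment, and the original segment corresponding to the segment to be translated is determined from the similarity comparison or the word frequency statistics.
As another example, the context segments of the segment to be translated can be compared with the context segments of each original segment for semantic similarity, or word frequency statistics can be computed for both sets of context segments, and the original segment corresponding to the segment to be translated is determined from the similarity comparison or the word frequency statistics.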
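Based on any of the above embodiments, determining the translation segment of any segment to be translated based on that segment, the plurality of original segments, and the translation segment of each original segment includes:
determining a candidate original segment for the segment to be translated based on the semantic similarity between the associated segment of the segment to be translated and the associated segment of each original segment;
taking the translation segment corresponding to the candidate original segment as the translation segment of the segment to be translated.
Specifically, the associated segments of a segment to be translated are segments that are semantically related to it, and may include the segment itself. They may also include the segment's context segments in the document, its preceding segments, its following segments, or an opening or closing segment of the document that contains it, for example the first or the last paragraph.
For example, for the segment to be translated $S_i$, its context segments $S_{i-1}$ and $S_{i+1}$ in the document are obtained, and $S_i$, the preceding segment $S_{i-1}$, and the following segment $S_{i+1}$ together form the associated segment of $S_i$, denoted $R_i$, that is, $R_i = \{S_{i-1}, S_i, S_{i+1}\}$.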
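Building the associated segment $R_i$ amounts to slicing a window around position $i$. The sketch below is an illustration only: the function name `associated_segments` and the `window` parameter are assumptions, with `window=1` reproducing $R_i = \{S_{i-1}, S_i, S_{i+1}\}$. At the start or end of the document the window is simply truncated.

```python
def associated_segments(segments, i, window=1):
    """Return segment i together with up to `window` neighbours on each side.

    `segments` is the ordered list of document segments; indices are 0-based.
    """
    start = max(0, i - window)
    end = min(len(segments), i + window + 1)
    return segments[start:end]


doc = ["S1", "S2", "S3", "S4"]
print(associated_segments(doc, 1))
```

A larger `window` pulls in more context, which may help disambiguate short, frequently repeated segments at the cost of more text to compare.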
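The semantic similarity between the associated segment of a segment to be translated and the associated segment of each original segment is computed, giving multiple semantic similarity values. A threshold can be set as a baseline for filtering these values. For example, with a threshold of 60, a semantic similarity value below 60 indicates that the associated segment of the original segment has low semantic similarity to the associated segment of the segment to be translated. A value above 60 indicates high semantic similarity, in which case the original segment and the segment to be translated are similar in contextual semantics.
Methods for computing semantic similarity include methods based on the vector space model, methods based on Hamming distance, and methods based on semantic understanding.
The original segment whose semantic similarity value is the highest and exceeds the threshold is taken as the candidate original segment for the segment to be translated. The candidate original segment is an original segment that is similar to the segment to be translated in contextual semantics, and its translation segment can be used as the translation segment of the segment to be translated.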
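The selection rule (highest similarity, and above the threshold) can be sketched as follows. The similarity measure here is a stand-in chosen for the example: a bag-of-words cosine similarity, in the spirit of the vector space model mentioned above, scaled to 0-100 so that the threshold of 60 from the text applies directly. Whitespace tokenization, the scaling, and the names `cosine_similarity` and `pick_candidate` are assumptions; Chinese text would need a proper tokenizer.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity on a 0-100 scale (whitespace tokens)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return 100.0 * dot / norm if norm else 0.0


def pick_candidate(query_context, candidates, threshold=60.0):
    """Return the translation whose context scores highest above `threshold`.

    `candidates` is a list of (associated_text, translation) pairs that all
    matched the query segment by hash value. Returns None if none qualifies.
    """
    best_score, best_translation = threshold, None
    for context, translation in candidates:
        score = cosine_similarity(query_context, context)
        if score > best_score:
            best_score, best_translation = score, translation
    return best_translation


query = "open the valve then press start to begin"
candidates = [
    ("close the door and press start to finish", "T1"),
    ("open the valve then press start to begin the cycle", "T2"),
]
print(pick_candidate(query, candidates))
```

Returning `None` when nothing clears the threshold leaves room for a fallback such as manual translation, which the text does not specify.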
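In the document translation method provided by the present application, the candidate original segment of any segment to be translated is determined from the semantic similarity between the associated segment of that segment and the associated segment of each original segment, and the translation segment corresponding to the candidate original segment is used as the translation segment of the segment to be translated. The original segment found in the translation corpus is therefore highly similar in semantics to the segment to be translated, which improves the accuracy of the translation result.
Based on any of the above embodiments, the associated segment of any segment to be translated consists of the context segments of that segment in the document.
Specifically, the associated segment of a segment to be translated is preferably the segment itself together with its context segments in the document to be translated. Correspondingly, the associated segment of any original segment in the translation corpus is that original segment together with its context segments in the already translated document.
When storing original segments and their translation segments, the translation corpus can keep them in the order of the segments in the translated document and align the translated document with its translation result. Alignment may be sentence-to-sentence or paragraph-to-paragraph.
The context segments of a segment to be translated may be one or more segments preceding it in the document, or one or more segments following it. The number of segments selected can be determined according to the actual situation and is not specifically limited in the embodiments of the present application.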
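Based on any of the above embodiments, step 110 includes:
dividing the document into segments and determining all segments of the document;
clustering the segments of the document that have the same hash value based on the semantic similarity between their associated segments to obtain a plurality of semantic similarity classes, and taking any segment in each semantic similarity class as the segment to be translated corresponding to that class;
determining the plurality of segments to be translated in the document based on the plurality of semantic similarity classes and the segment to be translated corresponding to each class.
Specifically, dividing the document to be translated into segments yields the set $S$ of all its segments. Because some segments have the same hash value, they can be grouped into one segment set $E = \{S_{e1}, S_{e2}, \ldots, S_{em}\}$, where segments $S_{ei}$ and $S_{ej}$ have the same hash value, $i \in [1, m]$, $j \in [1, m]$, $i \neq j$, and $m$ is the number of segments in the set.
The segments in set $E$ can be clustered together with their associated segments, for example with the K-means algorithm. Clustering divides the segments of $E$ into several classes, and the segments within each class can be regarded as the same segment to be translated.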
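An embodiment of the present application provides a clustering method based on the semantic similarity of associated segments, with the following steps:
Step 1: determine the segment set $E = \{S_{e1}, S_{e2}, \ldots, S_{em}\}$ and a given semantic similarity threshold;
Step 2: taking $S_{e1}$ as the reference, compute the semantic similarity between the associated segment of $S_{e1}$ and the associated segments of the other segments in $E$, select all segments whose semantic similarity exceeds the given threshold, and form the first semantic similarity class $ES_1$ from them together with $S_{e1}$;
Step 3: among all segments of $E$ that are not in $ES_1$, apply the method of step 2 to obtain the second semantic similarity class $ES_2$;
Step 4: repeat the method of steps 2 and 3 until every segment in $E$ has been assigned to a semantic similarity class, finally obtaining a plurality of semantic similarity classes.
For example, for $E = \{S_{e1}, S_{e2}, S_{e3}, S_{e4}\}$, clustering gives $ES_1 = \{S_{e1}, S_{e4}\}$ and $ES_2 = \{S_{e2}, S_{e3}\}$.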
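Steps 1 to 4 describe a greedy, seed-based clustering, which can be sketched as below. The function `cluster_by_threshold` follows the steps as written; the Jaccard word-overlap measure, the 0-100 scale, and the sample contexts are assumptions made only so that the example runs and reproduces the $ES_1$, $ES_2$ grouping from the text. The four items stand for segments with identical text (and hence identical hash value) whose associated segments differ.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity on a 0-100 scale (illustrative stand-in)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return 100.0 * len(sa & sb) / len(union) if union else 0.0


def cluster_by_threshold(items, similarity, threshold):
    """Greedy clustering: the first unassigned item seeds a class and absorbs
    every remaining item whose similarity to the seed exceeds `threshold`."""
    remaining = list(items)
    classes = []
    while remaining:
        seed = remaining[0]
        members, rest = [seed], []
        for item in remaining[1:]:
            if similarity(seed, item) > threshold:
                members.append(item)
            else:
                rest.append(item)
        classes.append(members)
        remaining = rest
    return classes


# Hypothetical associated segments (contexts) of four same-hash segments.
contexts = {
    "Se1": "install the pump",
    "Se2": "remove the filter",
    "Se3": "remove the old filter",
    "Se4": "install the new pump",
}
classes = cluster_by_threshold(
    list(contexts),
    lambda a, b: jaccard(contexts[a], contexts[b]),
    threshold=60.0,
)
print(classes)
```

Because each item is compared only with the seed of the current class, the result can depend on the order of the items; this is a property of the procedure as described, not of the similarity measure.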
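If none of the segments in a semantic similarity class has been translated, all segments in that class can be translated as a single segment to be translated. In other words, any segment in the class can serve as its representative, and its translation segment can be used as the translation segment of all segments in the class. Once the plurality of semantic similarity classes and the segment to be translated corresponding to each class are obtained, the plurality of segments to be translated in the document is obtained as well.
In the document translation method provided by the embodiments of the present application, all segments of the document are clustered according to the semantic similarity between the associated segment of each segment and the associated segments of the other segments, giving a plurality of semantic similarity classes from which the plurality of segments to be translated in the document is determined. Because the segments of the document are cluster-analyzed, the translation workload is reduced, translation efficiency is improved, and consistency of the translation results is ensured.
Based on any of the above embodiments, determining the translation result of the document based on the translation segment of each segment to be translated includes:
determining the translation segments of all segments in each semantic similarity class based on the translation segment of any segment in that class;
determining the translation result of the document based on the translation segments of all segments in the document.
Specifically, for each semantic similarity class, if one of its segments to be translated has already obtained a translation segment through the document translation method of the above embodiments, that translation segment can be used as the translation segment of all segments to be translated in the class. In other words, the translation segment of any segment in a semantic similarity class can serve as the translation segment of all segments in that class.
The segments to be translated in the document are clustered into a plurality of semantic similarity classes, and the translation segments of all segments in each class are obtained by the method of the above embodiments, which gives the translation segments of all segments in the document. Combining them in the order of the segments in the document yields the translation result of the document.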
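Propagating one translation per class and then restoring document order can be sketched in a few lines. The function name `assemble_translation` and the data layout (classes as lists of 0-based segment positions, one translation per class) are assumptions for illustration; a segment that shares its hash value with no other segment would simply form a class of size one.

```python
def assemble_translation(num_segments, classes, class_translations):
    """Copy each class's translation to all of its members, in document order.

    `classes` is a list of lists of segment positions; `class_translations`
    holds the translation obtained for one representative of each class.
    """
    result = [None] * num_segments
    for members, translation in zip(classes, class_translations):
        for position in members:
            result[position] = translation
    return result


# Segments 0 and 3 fall in one class, segments 1 and 2 in another.
translated = assemble_translation(4, [[0, 3], [1, 2]], ["Y_a", "Y_b"])
print(translated)
```

Any position left as `None` marks a segment for which no class supplied a translation.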
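Based on any of the above embodiments, dividing the document into segments and determining all segments of the document includes:
dividing the document into segments based on paragraph markers and/or punctuation marks in the document, and determining all segments of the document.
Specifically, the document can be divided by paragraph, by sentence, or by both paragraph and sentence.
When dividing by paragraph, paragraph markers can be used as the basis for division. When dividing by sentence, punctuation marks can be used. The punctuation marks here are those that indicate the end of a complete sentence, such as the full stop, question mark, exclamation mark, and carriage return.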
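A simple splitter along these lines can be written with regular expressions. The sketch assumes that line breaks act as paragraph markers and that the full-width and ASCII full stop, question mark, and exclamation mark end a sentence; the function name `split_segments` and the `by_sentence` flag are hypothetical. It is deliberately naive: an ASCII full stop inside a number or abbreviation (for example "3.14") would also be treated as a sentence end.

```python
import re

# One sentence: a run of non-terminators followed by any trailing terminators.
_SENTENCE = re.compile(r"[^。！？.!?]+[。！？.!?]*")


def split_segments(text, by_sentence=True):
    """Split text into paragraphs (on line breaks) and optionally sentences."""
    paragraphs = [p.strip() for p in re.split(r"[\r\n]+", text) if p.strip()]
    if not by_sentence:
        return paragraphs
    segments = []
    for paragraph in paragraphs:
        for match in _SENTENCE.findall(paragraph):
            sentence = match.strip()
            if sentence:
                segments.append(sentence)
    return segments


sample = "第一句。第二句。\nHello world. How are you?"
print(split_segments(sample))
print(split_segments(sample, by_sentence=False))
```

Whatever segmentation rule is chosen, the same rule should be applied when building the translation corpus, since hash value matching only succeeds on identically delimited segments.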
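The document translation method provided by the embodiments of the present application divides the document into segments according to paragraph markers and/or punctuation marks in the document and determines all segments of the document. This is simple to carry out, reduces the workload of translators, automates document translation, and improves translation efficiency.
The document translation apparatus provided by the present application is described below. The apparatus described below and the method described above may be referred to in correspondence with each other.
Based on any of the above embodiments, FIG. 2 is a schematic structural diagram of the document translation apparatus provided by the present application. As shown in FIG. 2, the apparatus includes:
a segment determination unit 210, configured to determine a plurality of segments to be translated in a document;
a segment translation unit 220, configured to perform hash value matching between any segment to be translated and all original segments in a translation corpus, and determine the translation segment of that segment, where the translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment;
a result output unit 230, configured to determine a translation result of the document based on the translation segment of each segment to be translated.
Specifically, the segment determination unit 210 is configured to determine the plurality of segments to be translated in the document. The segment translation unit 220 is configured to perform hash value matching between any segment to be translated and all original segments in the translation corpus and determine the translation segment of that segment. The result output unit 230 is configured to determine the translation result of the document.
The document translation apparatus provided by the present application performs hash value matching between any segment to be translated in a document and all original segments in a translation corpus to determine the translation segment of that segment, and determines the translation result of the document from the translation segment of each segment to be translated. The translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment. This makes use of existing historical translation data, reduces the workload of translators, automates document translation, and improves translation efficiency. It also prevents different translators from producing inconsistent translations of the same segment, which ensures consistency of the translation results.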
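Based on any of the above embodiments, the segment translation unit 220 includes:
a matching subunit, configured to perform hash value matching between any segment to be translated and all original segments in the translation corpus, and determine a plurality of original segments that match that segment;
a translation subunit, configured to determine the translation segment of that segment based on the segment to be translated, the plurality of original segments, and the translation segment of each original segment.
Based on any of the above embodiments, the translation subunit includes:
a similarity comparison module, configured to determine a candidate original segment for any segment to be translated based on the semantic similarity between the associated segment of that segment and the associated segment of each original segment;
a translation segment determination module, configured to take the translation segment corresponding to the candidate original segment as the translation segment of that segment.
Based on any of the above embodiments, the associated segment of any segment to be translated consists of the context segments of that segment in the document.
Based on any of the above embodiments, the segment determination unit 210 includes:
a segment division subunit, configured to divide the document into segments and determine all segments of the document;
a clustering subunit, configured to cluster the segments of the document that have the same hash value based on the semantic similarity between their associated segments to obtain a plurality of semantic similarity classes, and to take any segment in each semantic similarity class as the segment to be translated corresponding to that class;
a to-be-translated segment determination subunit, configured to determine the plurality of segments to be translated in the document based on the plurality of semantic similarity classes and the segment to be translated corresponding to each class.
Based on any of the above embodiments, the result output unit is specifically configured to:
determine the translation segments of all segments in each semantic similarity class based on the translation segment of any segment in that class;
determine the translation result of the document based on the translation segments of all segments in the document.
Based on any of the above embodiments, the segment division subunit is specifically configured to:
divide the document into segments based on paragraph markers and/or punctuation marks in the document, and determine all segments of the document.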
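Based on any of the above embodiments, FIG. 3 is a schematic structural diagram of the electronic device provided by the present application. As shown in FIG. 3, the electronic device may include a processor 310, a communications interface 320, a memory 330, and a communications bus 340, where the processor 310, the communications interface 320, and the memory 330 communicate with one another through the communications bus 340. The processor 310 can invoke logic instructions in the memory 330 to perform the method provided by the above embodiments, the method including:
determining a plurality of segments to be translated in a document; performing hash value matching between any segment to be translated and all original segments in a translation corpus, and determining the translation segment of that segment, where the translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment; and determining a translation result of the document based on the translation segment of each segment to be translated.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The processor in the electronic device provided by the embodiments of the present application can invoke the logic instructions in the memory to implement the document translation method described above. Its specific implementation is consistent with the method embodiments and achieves the same beneficial effects, so it is not repeated here.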
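The present application further provides a non-transitory computer-readable storage medium, which is described below. The non-transitory computer-readable storage medium described below and the document translation method described above may be referred to in correspondence with each other.
An embodiment of the present application provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, performs the method provided by the above embodiments, the method including:
determining a plurality of segments to be translated in a document; performing hash value matching between any segment to be translated and all original segments in a translation corpus, and determining the translation segment of that segment, where the translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment; and determining a translation result of the document based on the translation segment of each segment to be translated.
When the computer program stored on the non-transitory computer-readable storage medium provided by the embodiments of the present application is executed, it implements the document translation method described above. Its specific implementation is consistent with the method embodiments and achieves the same beneficial effects, so it is not repeated here.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.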

Claims (10)

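  1. A document translation method, characterized by including:
    determining a plurality of segments to be translated in a document;
    performing hash value matching between any segment to be translated and all original segments in a translation corpus, and determining the translation segment of said segment to be translated, where the translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment;
    determining a translation result of the document based on the translation segment of each segment to be translated.
  2. The document translation method according to claim 1, characterized in that performing hash value matching between any segment to be translated and all original segments in the translation corpus and determining the translation segment of said segment to be translated includes:
    performing hash value matching between any segment to be translated and all original segments in the translation corpus, and determining a plurality of original segments that match said segment to be translated;
    determining the translation segment of said segment to be translated based on said segment to be translated, the plurality of original segments, and the translation segment of each original segment.
  3. The document translation method according to claim 2, characterized in that determining the translation segment of said segment to be translated based on said segment to be translated, the plurality of original segments, and the translation segment of each original segment includes:
    determining a candidate original segment for said segment to be translated based on the semantic similarity between the associated segment of said segment to be translated and the associated segment of each original segment;
    taking the translation segment corresponding to the candidate original segment as the translation segment of said segment to be translated.
  4. The document translation method according to claim 3, characterized in that the associated segment of said segment to be translated consists of the context segments of said segment to be translated in the document.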
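  5. The document translation method according to claim 4, characterized in that determining the plurality of segments to be translated in the document includes:
    dividing the document into segments and determining all segments of the document;
    clustering the segments of the document that have the same hash value based on the semantic similarity between their associated segments to obtain a plurality of semantic similarity classes, and taking any segment in each semantic similarity class as the segment to be translated corresponding to that class;
    determining the plurality of segments to be translated in the document based on the plurality of semantic similarity classes and the segment to be translated corresponding to each class.
  6. The document translation method according to claim 5, characterized in that determining the translation result of the document based on the translation segment of each segment to be translated includes:
    determining the translation segments of all segments in each semantic similarity class based on the translation segment of any segment in that class;
    determining the translation result of the document based on the translation segments of all segments in the document.
  7. The document translation method according to claim 5, characterized in that dividing the document into segments and determining all segments of the document includes:
    dividing the document into segments based on paragraph markers and/or punctuation marks in the document, and determining all segments of the document.
  8. A document translation apparatus, characterized by including:
    a segment determination unit, configured to determine a plurality of segments to be translated in a document;
    a segment translation unit, configured to perform hash value matching between any segment to be translated and all original segments in a translation corpus, and determine the translation segment of said segment to be translated, where the translation corpus includes a plurality of original segments and the hash value and translation segment corresponding to each original segment;
    a result output unit, configured to determine a translation result of the document based on the translation segment of each segment to be translated.
  9. An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the document translation method according to any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the document translation method according to any one of claims 1 to 7.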
PCT/CN2021/078816 2020-12-30 2021-03-03 Document translation method, apparatus, electronic device, and storage medium WO2022141788A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011605004.4 2020-12-30
CN202011605004.4A CN112633015A (zh) 2020-12-30 2020-12-30 Document translation method, apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022141788A1 true WO2022141788A1 (zh) 2022-07-07

Family

ID=75286435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078816 WO2022141788A1 (zh) 2020-12-30 2021-03-03 Document translation method, apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112633015A (zh)
WO (1) WO2022141788A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
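CN101034395A (zh) * 2007-03-30 2007-09-12 传神联合(北京)信息技术有限公司 System for processing documents to be translated and document processing method using the system
CN101140570A (zh) * 2006-09-04 2008-03-12 富士施乐株式会社 Translation apparatus, translation method, and computer-readable medium
CN107885737A (zh) * 2017-12-27 2018-04-06 传神语联网网络科技股份有限公司 Human-machine interactive translation method and system
US10452785B2 (en) * 2017-03-09 2019-10-22 Rakuten, Inc. Translation assistance system, translation assistance method and translation assistance program
CN111611813A (zh) * 2020-04-29 2020-09-01 南京南瑞继保电气有限公司 Document translation method, apparatus, electronic device, and storage medium
CN111666776A (zh) * 2020-06-23 2020-09-15 北京字节跳动网络技术有限公司 Document translation method and apparatus, storage medium, and electronic device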

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5386855B2 (ja) * 2008-05-30 2014-01-15 富士ゼロックス株式会社 Translation memory translation device and translation program

Also Published As

Publication number Publication date
CN112633015A (zh) 2021-04-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21912558; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21912558; Country of ref document: EP; Kind code of ref document: A1)