CN109299480B - Context-based term translation method and device

Info

Publication number: CN109299480B
Application number: CN201811025328.3A
Authority: CN
Inventors: 宋安琪
Original assignee: Iol Wuhan Information Technology Co ltd; Shanghai Transn Translation Services Co ltd
Current assignee: Iol Wuhan Information Technology Co ltd; Shanghai Transn Translation Services Co ltd
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2023-11-07
Anticipated expiration: 2038-09-04
Also published as: CN109299480A

Abstract

Embodiments of the invention provide a context-based term translation method and device. The method comprises: splitting a document to be translated into sentences, extracting the terms in the document and the sentences in which they are located by applying a maximum forward matching algorithm against a term library selected in advance by the translator, and obtaining the term definitions corresponding to each term from the term library; extracting from a corpus the entries whose degree of matching with the sentence containing the term is greater than a preset threshold, sorting those entries by matching degree from high to low, and filtering out the entries that do not contain the term; obtaining the corpus definition (corpus paraphrasing) corresponding to the term in each corpus entry by a word alignment method; and screening the term definitions using the corpus definitions and a string edit distance algorithm to obtain the final term definitions. Because the definitions of terms are extracted from the corpus by word alignment, the term prompt function of computer-aided translation is improved and the translator's efficiency can be effectively increased.

Description

Context-based term translation method and device

Technical Field

The embodiment of the invention relates to the technical field of natural language processing, in particular to a method and a device for translating terms based on a context.

Background

Computer-aided translation (CAT) means that, while a translator works, the system continuously and automatically stores the translations the translator enters in order to build a database. When the same or similar phrases appear again later in the translation process, the system automatically retrieves the same or similar content from the database and offers it as a reference translation, so that the translator avoids repeated work and can concentrate on new content, which effectively improves translation efficiency.

In computer-aided translation, the term prompt is an important function. A translator typically connects several term libraries while translating, and a term typically corresponds to several definitions. Existing term prompt functions generally present all definitions of a term and leave the translator to choose among them according to context, so the translator cannot quickly pick the correct definition, which is inefficient. Accordingly, there is a need for an improved term prompt function that offers the translator accurate term definitions.

Disclosure of Invention

Aiming at the problems existing in the prior art, the embodiment of the invention provides a method and a device for translating terms based on a context.

According to a first aspect of an embodiment of the present invention, there is provided a context-based term translation method, comprising:

splitting a document to be translated into sentences, extracting the terms in the document and the sentences in which the terms are located by applying a maximum forward matching algorithm against a term library preselected by a translator, and obtaining one or more term definitions corresponding to the terms from the term library;

extracting, from a corpus, the corpus entries whose degree of matching with the sentence in which the term is located is greater than a preset threshold, sorting the corpus entries by matching degree from high to low, and filtering out the corpus entries that do not contain the term;

acquiring corpus paraphrasing corresponding to the terms in the corpus by adopting a word alignment method;

and screening the term definitions by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions.

According to a second aspect of embodiments of the present invention, there is provided a context-based term translation apparatus comprising:

the term definition acquisition module is used for splitting a document to be translated into sentences, extracting the terms in the document and the sentences in which the terms are located by applying a maximum forward matching algorithm against a term library preselected by a translator, and acquiring one or more term definitions corresponding to the terms from the term library;

the corpus extraction module is used for extracting, from a corpus, the corpus entries whose degree of matching with the sentence in which the term is located is greater than a preset threshold, sorting the corpus entries by matching degree from high to low, and filtering out the corpus entries that do not contain the term;

the word alignment module is used for acquiring corpus paraphrasing corresponding to the term in the corpus by adopting a word alignment method;

and the paraphrasing screening module is used for screening the term paraphrasing by utilizing the corpus paraphrasing and combining a character string editing distance algorithm to obtain a final term paraphrasing.

According to a third aspect of an embodiment of the present invention, there is provided an electronic apparatus including:

at least one processor; and

at least one memory communicatively coupled to the processor, wherein:

the memory stores program instructions executable by the processor to invoke the method of context-based term translation provided by any of the various possible implementations of the first aspect described above.

According to a fourth aspect of embodiments of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions that enable the computer to perform the context-based term translation method provided by any one of the various possible implementations of the first aspect described above.

According to the method and the device for translating the terms based on the context, which are provided by the embodiment of the invention, the definitions of the terms in the corpus are extracted through the word alignment method, so that the optimal definitions are screened out, the term prompt function in computer-aided translation can be improved, the optimal definitions conforming to the context are provided for the translator, and the translation efficiency of the translator can be effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow diagram of a context-based term translation method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a context-based term translation device according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The term prompt is an important function of existing computer-aided translation, but it typically presents the translator with all definitions of a term, so the translator must select a definition based on context and cannot pick the correct one quickly. To overcome this problem in the prior art, the present invention provides a context-based term translation method that uses the term library and the corpus of computer-aided translation to obtain the definition that best fits the context of a term through term definition extraction, corpus matching, word alignment and definition screening. The method is described below with reference to various embodiments.

FIG. 1 is a flow chart of a context-based term translation method according to an embodiment of the present invention. The method is executed by a server and includes:

Step 10: the document to be translated is split into sentences, the terms in the document and the sentences in which they are located are extracted by applying a maximum forward matching algorithm against a term library selected in advance by the translator, and one or more term definitions corresponding to each term are obtained from the term library.

Specifically, an existing general-purpose sentence splitting algorithm can be used to divide the document to be translated into a plurality of sentences, which facilitates the subsequent extraction of the terms and the sentences in which they are located.
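As an illustration only, sentence splitting of this kind might be sketched as follows. The source does not name a particular algorithm; the punctuation set and the function name `split_sentences` are assumptions made for this sketch.

```python
import re
from typing import List

# Assumed sentence-ending punctuation (Chinese and Western); the source does not specify it.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def split_sentences(document: str) -> List[str]:
    """Split a document into sentences, keeping the closing punctuation on each sentence."""
    return [s.strip() for s in _SENTENCE.findall(document) if s.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("超级计算机很快。It is fast! Really?"):
        print(sentence)
```

A rule this simple also splits at decimal points and abbreviations, so a production system would likely rely on a more careful splitter.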

In order to determine the best interpretation corresponding to the term by utilizing the context of the term, the embodiment of the invention firstly needs to extract the term and the sentence where the term is located in the document to be translated. The terms in each sentence of the document to be translated can be extracted by using a maximum forward matching algorithm (for convenience of description, the extracted terms are referred to as target terms). Correspondingly, the sentence in which the term is located can also be determined. The maximum forward matching algorithm is a basic word segmentation algorithm, and is mature in application and not described herein.
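The source treats maximum forward matching as well known and does not describe it. A minimal character-level sketch, assuming the term library is a plain mapping from each term to its list of definitions (the name `extract_terms` and this data layout are hypothetical), could look like this:

```python
from typing import Dict, List, Tuple


def extract_terms(sentence: str, term_library: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Scan left to right, always taking the longest library term that starts at the current position."""
    if not term_library:
        return []
    max_len = max(len(term) for term in term_library)
    found: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(sentence):
        match = None
        # Try the longest window first and shrink it until a library term is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if candidate in term_library:
                match = candidate
                break
        if match is not None:
            found.append((match, term_library[match]))
            i += len(match)  # continue after the matched term
        else:
            i += 1  # no term starts here; move one character forward
    return found


if __name__ == "__main__":
    library = {
        "超级计算机": ["supercomputer", "high-performance computer"],
        "计算机": ["computer"],
    }
    print(extract_terms("这台超级计算机很快", library))
```

Because the longest window is tried first, "超级计算机" is preferred over the shorter "计算机". Matching on characters suits unsegmented text such as Chinese; for space-delimited languages the same loop would more naturally run over word tokens.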

The term definitions corresponding to the target term are then obtained from the term library. A term definition is the definition recorded for the term in the term library, and a term may have one or more definitions there; for example, the term definitions of "Super Computer" include "supercomputer", "high-performance computer", etc. The term definitions are the basis for the best definition offered to the translator, and all subsequent steps operate on them.

Step 11: the corpus entries whose degree of matching with the sentence in which the term is located is greater than a preset threshold are extracted from the corpus and sorted by matching degree from high to low, and the entries that do not contain the term are filtered out.

In order to provide accurate term definitions to a translator, embodiments of the present invention need to use the context of the term, which can be done through corpus matching. The corpus in the embodiment of the invention stores a large number of sentence texts and their corresponding translations. A search system such as Elasticsearch is used to retrieve the corpus entries whose matching degree with the sentence containing the target term is greater than the preset threshold; the threshold can be set as needed, for example to 50%.

The corpus satisfying the extraction conditions can be a plurality of sentences, and the extracted corpus is ordered according to the matching degree. The corpus extracted according to the method may have the condition of not containing the terms in the document to be translated, so that the corpus not containing the terms in the document to be translated also needs to be filtered.

It should be noted that, the embodiment of the invention does not limit the method for calculating the matching degree between sentences.
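Since the matching-degree calculation is left open, the sketch below simply assumes `difflib.SequenceMatcher.ratio()` as a stand-in measure and an in-memory list of (source sentence, translation) pairs instead of a search system. The function name `match_corpus` and the data layout are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def match_corpus(
    sentence: str,
    term: str,
    corpus: List[Tuple[str, str]],
    threshold: float = 0.5,
) -> List[Tuple[float, str, str]]:
    """Return (matching degree, source, translation) entries above the threshold that contain the term."""
    matched = []
    for source, translation in corpus:
        # Assumed matching degree: similarity ratio of the two sentences, in [0, 1].
        degree = SequenceMatcher(None, sentence, source).ratio()
        if degree > threshold and term in source:
            matched.append((degree, source, translation))
    # Highest matching degree first.
    matched.sort(key=lambda entry: entry[0], reverse=True)
    return matched


if __name__ == "__main__":
    corpus = [
        ("the supercomputer is very fast today", "T2"),
        ("the supercomputer is fast", "T1"),
        ("the weather is fast", "T3"),
    ]
    for degree, source, translation in match_corpus("the supercomputer is fast", "supercomputer", corpus):
        print(round(degree, 2), source, translation)
```

Filtering on the term inside the same loop gives the same result as filtering after sorting, which is the order in which the text lists the operations.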

Step 12: the corpus definition corresponding to the term in each corpus entry is obtained by a word alignment method.

Specifically, the translation of the target term in an extracted corpus entry is called the corpus definition (corpus paraphrasing). Each extracted corpus entry consists of a sentence whose matching degree with the sentence containing the target term is greater than the preset threshold, together with its translation; such sentences can be regarded as highly similar to the sentence containing the target term. The translation of the target term in an extracted corpus entry has already been produced, is an existing translation resource, and can be regarded as a relatively accurate definition. In order to provide accurate term definitions for the translator, the translations corresponding to the target term in all extracted corpus entries need to be found and screened. These translations can be obtained by a word alignment method: once the context of the target term in a corpus entry has been aligned, the translation of the target term can be read off directly.

Step 13: the term definitions are screened using the corpus definitions and a string edit distance algorithm to obtain the final term definitions.

Specifically, after the corpus definitions are obtained, they are used to screen the term definitions described above. The string edit distance is the minimum number of edit operations required to convert one string into another, and it can be used to compare the similarity between a corpus definition and a term definition. For example, two renderings of "supercomputer" that differ by a single edit have an edit distance of 1. The smaller the edit distance, the higher the similarity; a similarity of zero corresponds to a large edit distance.

Screening the term definitions with the corpus definitions and the string edit distance algorithm means comparing the similarity of each term definition with the corpus definitions through the string edit distance algorithm, deleting the term definitions whose similarity is zero, and sorting the remaining term definitions by edit distance from small to large to obtain the final term definitions. The final term definitions are the term definitions most similar to the corpus definitions.
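The screening step can be sketched as below. The edit distance is the standard Levenshtein distance; how similarity is derived from it is not stated in the source, so this sketch assumes similarity = 1 - distance / (length of the longer string), measured against the nearest corpus definition. The function names are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def _distance_and_similarity(a: str, b: str) -> Tuple[int, float]:
    # Assumed similarity measure; the source does not define it.
    distance = edit_distance(a, b)
    return distance, 1.0 - distance / max(len(a), len(b), 1)


def screen_definitions(term_definitions: List[str], corpus_definitions: List[str]) -> List[str]:
    """Drop definitions with zero similarity, then sort the rest by edit distance, smallest first."""
    if not corpus_definitions:
        return list(term_definitions)
    ranked = []
    for definition in term_definitions:
        distance, similarity = min(
            (_distance_and_similarity(definition, c) for c in corpus_definitions),
            key=lambda pair: pair[0],
        )
        if similarity > 0:
            ranked.append((distance, definition))
    ranked.sort(key=lambda item: item[0])
    return [definition for _, definition in ranked]


if __name__ == "__main__":
    print(screen_definitions(
        ["high-performance computer", "supercomputer", "xyz"],
        ["supercomputers"],
    ))
```

Under this assumption a term definition is removed only when it shares nothing with any corpus definition, which matches the statement that zero similarity goes with a large edit distance.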

According to the term translation method based on the context, the term definitions in the corpus are extracted through the word alignment method, so that the optimal definition is screened based on the context, a term prompt function in computer-aided translation can be improved, the optimal definition conforming to the context is provided for a translator, the time for selecting the term translation by the translator is saved, the repeated work phenomenon in the translation process is avoided, and the translation efficiency of the translator can be effectively improved.

On the basis of the foregoing embodiment, before the step of extracting the corpus entries whose degree of matching with the sentence in which the term is located is greater than a preset threshold, the method further includes:

classifying the document to be translated by using a pre-established classifier, and assigning the document to one or more industry categories according to the probability;

querying the dictionary definitions corresponding to the terms in the dictionary of the industry category, and pre-screening the term definitions by utilizing the dictionary definitions;

correspondingly, the step of screening the term definitions by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions comprises the following specific steps:

and screening the term definitions subjected to pre-screening by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions.

Specifically, the embodiment of the invention also provides a term translation method based on the context, which is characterized in that before the term paraphrasing is screened by utilizing the corpus paraphrasing, the term paraphrasing is pre-screened by utilizing the dictionary paraphrasing. The detailed flow of the term translation method based on the context provided by the embodiment of the invention can be as follows:

Industry classification of the document to be translated: a naive Bayes text classifier is built for the configured industry categories from a large number of translated manuscripts carrying industry labels. The document to be translated is classified with this classifier and assigned to one or more categories according to the probability. In particular, a document is considered to belong to the general category when its probability of belonging to each category is low.

Acquiring term paraphrasing: and carrying out sentence segmentation on the document to be translated, extracting the terms in the document to be translated and sentences in which the terms are located by combining a term library preselected by a translator by using a maximum forward matching method, and acquiring one or more term definitions corresponding to the terms from the term library.

Pre-screening with dictionary definitions: inquiring the corresponding dictionary definitions of the terms in the industry category dictionary, comparing the similarity of the term definitions and the dictionary definitions through a character string editing distance algorithm, sorting the term definitions according to the editing distance from small to large, and filtering the term definitions with the similarity of zero to obtain the pre-screened term definitions.

Matching corpus: extracting from the corpus the entries whose sentence similarity is greater than a preset threshold, sorting them by similarity from high to low, and filtering out the entries that do not contain the term.

Extracting term definitions from the corpus by word alignment: acquiring the corpus definition of the term in each corpus entry by a word alignment method.

Term paraphrasing screening: and comparing the similarity of the pre-screened term definitions with the corpus definitions through a character string editing distance algorithm, sorting the pre-screened term definitions according to the editing distance from small to large, and deleting the term definitions with the similarity of zero to obtain the final term definitions.

According to the method for translating the terms based on the context, provided by the embodiment of the invention, the dictionary definitions are utilized to conduct pre-screening of the term definitions, and then the word alignment method is utilized to extract the term definitions in the corpus, so that the translation accuracy of the term definitions can be improved, the optimal definition can be screened based on the context, the term prompt function in computer-aided translation can be improved, the optimal definition conforming to the context is provided for the translator, the time for selecting the term translations by the translator is saved, the repeated work phenomenon in the translation process is avoided, and the translation efficiency of the translator can be effectively improved.
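The industry classification step of the flow above can be illustrated with a small multinomial naive Bayes classifier. This is a sketch under stated assumptions, not the patented implementation: whitespace tokenization, Laplace smoothing, the class name `NaiveBayesIndustryClassifier` and the `min_probability` cut-off used to decide when a document falls back to the general category are all choices made for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesIndustryClassifier:
    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()       # documents per industry label
        self.word_counts = defaultdict(Counter)    # word frequencies per industry label
        self.vocabulary = set()

    def fit(self, labelled_docs: List[Tuple[str, str]]) -> None:
        for text, label in labelled_docs:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def predict_proba(self, text: str) -> Dict[str, float]:
        if not self.doc_counts:
            return {}
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            log_p = math.log(n_docs / total_docs)  # class prior
            for word in words:
                # Laplace-smoothed word likelihood.
                log_p += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
            log_scores[label] = log_p
        # Normalize in a numerically stable way to obtain posterior probabilities.
        peak = max(log_scores.values())
        exp_scores = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {label: v / norm for label, v in exp_scores.items()}

    def classify(self, text: str, min_probability: float = 0.4) -> List[str]:
        """Return every category whose probability reaches the cut-off, else the general category."""
        probabilities = self.predict_proba(text)
        ordered = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
        chosen = [label for label, p in ordered if p >= min_probability]
        return chosen or ["general"]


if __name__ == "__main__":
    clf = NaiveBayesIndustryClassifier()
    clf.fit([
        ("patient dose clinical trial drug", "medical"),
        ("drug therapy patient hospital", "medical"),
        ("contract clause liability court", "legal"),
        ("court ruling contract plaintiff", "legal"),
    ])
    print(clf.classify("patient drug dose"))
```

Returning every category above the cut-off reflects the statement that a document may be assigned to one or more categories; the cut-off value itself would need tuning on real data.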

Based on the content of each embodiment, the step of obtaining the corpus paraphrasing corresponding to the term in the corpus by using a word alignment method specifically includes:

performing word alignment scoring on the terms in the corpus by using a preset scoring model, and taking the translated word with the highest word alignment scoring as the corpus paraphrasing of the terms;

wherein, the preset scoring model is:

in the above formula, src represents the source-text word, dst represents the translated word, similarity represents the definition similarity between the source word src and the translated word dst, w_i represents the weight of the i-th factor, score_i represents the score of the i-th factor, q_j represents the weight of the j-th word among the four context words of the source word src, distance_j represents the distance between the aligned word and the translated word dst if the j-th word is aligned, and len represents the number of verbs and nouns contained in the corpus.

Specifically, word alignment scoring is performed on terms in all extracted corpus by using a preset scoring model, wherein the scoring model comprises three aspects of contents:

The first aspect of the scoring model is similarity, which represents the definition similarity between the source word src and the translated word dst: 1 if the definitions are identical, 0.8 if they are 80% similar, 0.5 if they are half similar, and 0 if they are completely different.

The second aspect of the scoring model is the scoring-factor measure, where w_i represents the weight of the i-th factor and score_i represents the score of the i-th factor (1 means fully satisfied, 0.5 means half satisfied). The weights w_i are obtained by training on a plurality of bilingual sentences that include word alignments. score_i covers the following types:

whether the parts of speech of src and dst are the same: if they are the same, score_i is 1; if not, score_i is 0;

whether the two context words before and after src are aligned with the two context words before and after dst: if the context words of src are all aligned with the context words of dst, score_i is 1; if half are aligned, score_i is 0.5; if none are aligned, score_i is 0. For example, if the context of src is ABsrcCD and the context of dst is EFdstGH, and A is aligned to E, B to F, C to G and D to H, then score_i is 1. If the context of src is ABsrcCD and the context of dst is EFdstGH, and A is aligned to G and B is aligned to H, then score_i is 0.5.

The third aspect of the scoring model is the penalty value, where q_j represents the weight of the j-th word among the four context words of src; for example, if the context of src is ABsrcCD, the weights of B and C are 1 and the weights of A and D are 0.5. distance_j represents the distance between the aligned word and dst if the j-th word is aligned, and len represents the number of nouns and verbs contained in the corpus. For example, suppose the corpus contains 10 nouns and verbs in total, the context of the term src is ABsrcCD, and the context of the word dst in the translation is EFdstGH. If A, B and C are not aligned and D is aligned to the 5th word after H, then distance_j = 5 and the third part has a value of 0.25.

When the three-part total score of the word alignment scoring model is greater than 1, the total score is set to 1. And taking the translated word with the highest word alignment score as the corpus paraphrasing of the term.
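Parts of this scoring procedure can be sketched in code. The sketch reads the context-alignment factor as the fraction of the context words of src that are aligned to context words of dst, which reproduces the two examples above (1 and 0.5) but is an interpretation. Because the formula itself is not reproduced in this text, the third part is taken as a precomputed, already signed input rather than derived here. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def context_alignment_score(
    src_context: List[str],
    dst_context: List[str],
    alignment: Dict[str, str],
) -> float:
    """Fraction of src context words that are aligned to one of the dst context words."""
    if not src_context:
        return 0.0
    dst_words = set(dst_context)
    aligned = sum(1 for word in src_context if alignment.get(word) in dst_words)
    return aligned / len(src_context)


def total_score(
    similarity: float,
    weighted_factors: Iterable[Tuple[float, float]],
    third_part: float,
) -> float:
    """Combine the three parts and cap the total at 1, as the text specifies.

    weighted_factors holds (w_i, score_i) pairs; third_part is assumed to be
    computed and signed by the caller, since the formula is not given here.
    """
    total = similarity + sum(w * s for w, s in weighted_factors) + third_part
    return min(total, 1.0)


def pick_corpus_definition(candidate_scores: Dict[str, float]) -> str:
    """Take the translated word with the highest word alignment score."""
    return max(candidate_scores, key=candidate_scores.get)


if __name__ == "__main__":
    full = context_alignment_score(
        list("ABCD"), list("EFGH"), {"A": "E", "B": "F", "C": "G", "D": "H"}
    )
    half = context_alignment_score(list("ABCD"), list("EFGH"), {"A": "G", "B": "H"})
    print(full, half)
    print(pick_corpus_definition({"supercomputer": 0.9, "mainframe": 0.4}))
```

The weights w_i would come from training on aligned bilingual sentences as described; the values used in any call to `total_score` here are placeholders.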

As shown in fig. 2, a schematic structural diagram of a context-based term translation device according to an embodiment of the present invention includes: a term paraphrasing acquisition module 20, a corpus extraction module 21, a word alignment module 22 and a paraphrasing screening module 23, wherein,

the term definition obtaining module 20 is configured to perform sentence processing on a document to be translated, extract terms in the document to be translated and sentences in which the terms are located by using a maximum forward matching algorithm in combination with a term library pre-selected by a translator, and obtain one or more term definitions corresponding to the terms from the term library.

Specifically, the term definition obtaining module 20 may perform sentence segmentation on the document to be translated by using an existing general sentence segmentation algorithm, so as to segment the document to be translated into a plurality of sentences, thereby facilitating subsequent extraction of terms and sentences in which the terms are located.

In order to determine the best interpretation corresponding to a term by using the context of the term, the term interpretation obtaining module 20 needs to extract the term and the sentence in which the term is located in the document to be translated. The terms in each sentence of the document to be translated can be extracted by using a maximum forward matching algorithm (for convenience of description, the extracted terms are referred to as target terms). Correspondingly, the sentence in which the term is located can also be determined. The maximum forward matching algorithm is a basic word segmentation algorithm, and is mature in application and not described herein.

The term definition acquisition module 20 then acquires the term definition corresponding to the target term from the term library, the term definition referring to the definition of the term in the term library, and the definition of a term in the term library may be one or more.

The corpus extraction module 21 is configured to extract a corpus in the corpus, where the degree of matching with the sentence where the term is located is greater than a preset threshold, rank the corpus according to the degree of matching from high to low, and filter the corpus that does not include the term.

In order to provide accurate term definitions to a translator, embodiments of the present invention need to use the context of the term, which can be done through corpus matching. The corpus in the embodiment of the invention stores a large number of sentence texts and their corresponding translations. Specifically, the corpus extraction module 21 uses a search system such as Elasticsearch to retrieve the corpus entries whose matching degree with the sentence containing the target term is greater than the preset threshold; the threshold can be set as needed, for example to 50%.

The corpus satisfying the extraction conditions may be a plurality of sentences, and the corpus extraction module 21 sorts the extracted corpus according to the matching degree. The corpus extracted according to the above method may have a case of not including terms in the document to be translated, and therefore, the corpus extraction module 21 also needs to filter the corpus not including terms in the document to be translated.

The word alignment module 22 is configured to obtain a corpus paraphrasing corresponding to the term in the corpus by using a word alignment method.

Each extracted corpus entry consists of a sentence whose matching degree with the sentence containing the target term is greater than the preset threshold, together with its translation; such sentences can be regarded as highly similar to the sentence containing the target term. The translation of the target term in an extracted corpus entry has already been produced, is an existing translation resource, and can be regarded as a relatively accurate definition. In order to provide accurate term definitions to the translator, the word alignment module 22 needs to find and screen the translations corresponding to the target term in all extracted corpus entries. These translations can be obtained by a word alignment method: once the context of the target term in a corpus entry has been aligned, the translation of the target term can be read off directly.

And the paraphrasing screening module 23 is used for screening the term paraphrasing by utilizing the corpus paraphrasing and the character string editing distance algorithm to obtain a final term paraphrasing.

Specifically, after the corpus paraphrasing is obtained, the paraphrasing screening module 23 screens the term paraphrasing using the corpus paraphrasing. The character string editing distance algorithm refers to the minimum editing frequency required for converting one character string into another character string, and can be used for comparing the similarity between the corpus paraphrasing and the term paraphrasing, the editing distance is smaller, and the description similarity is higher. If the similarity is zero, the editing distance is large.

According to the term translation device based on the context, provided by the embodiment of the invention, the term definitions in the corpus are extracted through the word alignment method, so that the optimal definition is screened based on the context, the term prompt function in computer-aided translation can be improved, the optimal definition conforming to the context is provided for the translator, the time for selecting the term translation by the translator is saved, the repeated work phenomenon in the translation process is avoided, and the translation efficiency of the translator can be effectively improved.

Based on the content of the above embodiment, the apparatus further includes:

the classification module is used for classifying the document to be translated by using a pre-established classifier, and assigning the document to one or more industry categories according to the probability;

a pre-screening module, configured to query a dictionary definition corresponding to the term in the industry category dictionary, and pre-screen the term definition using the dictionary definition;

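correspondingly, the paraphrasing screening module is specifically used for:

screening the pre-screened term definitions by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions.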

Specifically, the classification module builds a naive Bayes text classifier for the configured industry categories from a large number of translated manuscripts carrying industry labels. The document to be translated is classified with this classifier and assigned to one or more categories according to the probability. In particular, a document is considered to belong to the general category when its probability of belonging to each category is low.

And the pre-screening module inquires corresponding dictionary definitions of the terms in the industry category dictionary, compares the similarity of the term definitions and the dictionary definitions through a character string editing distance algorithm, sorts the term definitions according to the editing distance from small to large, and filters the term definitions with the similarity of zero to obtain the pre-screened term definitions.

According to the context-based term translation device provided by the embodiment of the invention, the dictionary definitions are utilized to pre-screen the term definitions, and the word alignment method is utilized to extract the term definitions in the corpus, so that the translation accuracy of the term definitions can be improved.

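Based on the content of the above embodiment, the word alignment module 22 is specifically configured to:

perform word alignment scoring on the terms in the corpus by using a preset scoring model, and take the translated word with the highest word alignment score as the corpus definition of the term;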

wherein, the preset scoring model is:

in the above formula, src represents the source-text word, dst represents the translated word, similarity represents the definition similarity between the source word src and the translated word dst, w_i represents the weight of the i-th factor, score_i represents the score of the i-th factor, q_j represents the weight of the j-th word among the four context words of the source word src, distance_j represents the distance between the aligned word and the translated word dst if the j-th word is aligned, and len represents the number of verbs and nouns contained in the corpus.

Specifically, the word alignment module 22 performs word alignment scoring on the terms in all of the extracted corpora using a preset scoring model, where the scoring model covers three aspects:

whether the two context words before and after src and dst are aligned with each other: if the context words of src are aligned with the context words of dst, score_i is 1; if they are half aligned, score_i is 0.5; if they are not aligned at all, score_i is 0.

The third aspect of the scoring model is a penalty, where q_j denotes the weight of the j-th of the four context words of src, distance_j denotes the distance between the aligned word and dst if the j-th word is aligned, and len denotes the number of nouns and verbs contained in the corpus.
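The context-alignment rule described above (score_i equal to 1, 0.5, or 0) can be sketched in Python as follows. The representation of an alignment as a mapping from source word positions to target word positions, the function name, and the reading of "half aligned" as "some but not all context words aligned" are assumptions of this sketch. It covers only the context-alignment factor, not the penalty or the weighted combination.

```python
def context_alignment_score(src_context, dst_context, alignment):
    """Score one context factor as 1.0, 0.5, or 0.0.

    src_context: positions of the context words around src in the source sentence
    dst_context: positions of the context words around dst in the target sentence
    alignment:   dict mapping a source position to its aligned target position
                 (hypothetical output of a word aligner)
    """
    if not src_context:
        return 0.0
    dst_positions = set(dst_context)
    # Count src context words whose aligned word lies in dst's context.
    aligned = sum(1 for pos in src_context if alignment.get(pos) in dst_positions)
    if aligned == len(src_context):
        return 1.0  # fully aligned
    if aligned == 0:
        return 0.0  # not aligned at all
    return 0.5      # partially aligned, treated here as "half aligned"
```

For instance, with src and dst both at position 2 and context positions 0, 1, 3, and 4 on each side, an alignment that maps all four source context positions into the target context gives 1.0, while one that maps only position 0 gives 0.5.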

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device includes a processor 301, a memory 302, and a bus 303;

wherein the processor 301 and the memory 302 communicate with each other through the bus 303; the processor 301 is configured to invoke the program instructions in the memory 302 to perform the context-based term translation method provided by the above embodiments, including, for example: performing sentence segmentation on a document to be translated, extracting the terms in the document to be translated and the sentences in which the terms are located by using a maximum forward matching algorithm together with a term library preselected by a translator, and obtaining one or more term definitions corresponding to the terms from the term library; extracting, from a corpus base, the corpora whose matching degree with the sentence containing the term is greater than a preset threshold, sorting the corpora by matching degree from high to low, and filtering out the corpora that do not contain the term; obtaining the corpus definitions corresponding to the terms in the corpora by a word alignment method; and screening the term definitions by using the corpus definitions together with a string edit distance algorithm to obtain the final term definitions.
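The first step of the method, sentence segmentation followed by term extraction with maximum forward matching, can be sketched in Python as follows. The punctuation set, the character-level matching (suited to Chinese text), the representation of the term library as a dict from term to list of definitions, and all function names are assumptions of this sketch.

```python
import re


def split_sentences(text):
    """Split after common Chinese and English sentence-ending punctuation."""
    parts = re.findall(r"[^。！？!?.]+[。！？!?.]?", text)
    return [p.strip() for p in parts if p.strip()]


def forward_max_match(sentence, term_library):
    """Scan left to right, always taking the longest term starting at the cursor."""
    max_len = max((len(term) for term in term_library), default=0)
    found, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if candidate in term_library:
                found.append(candidate)
                i += size  # jump past the matched term
                break
        else:
            i += 1  # no term starts here; advance one character
    return found


def extract_terms(text, term_library):
    """Return (term, sentence, definitions) for every term found in the text."""
    results = []
    for sentence in split_sentences(text):
        for term in forward_max_match(sentence, term_library):
            results.append((term, sentence, term_library[term]))
    return results
```

Because the longest candidate is tried first, a library containing both "机器" and "机器翻译" yields "机器翻译" for the sentence "机器翻译需要术语。" rather than the shorter term.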

An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the context-based term translation method provided by the above embodiments, including, for example: performing sentence segmentation on a document to be translated, extracting the terms in the document to be translated and the sentences in which the terms are located by using a maximum forward matching algorithm together with a term library preselected by a translator, and obtaining one or more term definitions corresponding to the terms from the term library; extracting, from a corpus base, the corpora whose matching degree with the sentence containing the term is greater than a preset threshold, sorting the corpora by matching degree from high to low, and filtering out the corpora that do not contain the term; obtaining the corpus definitions corresponding to the terms in the corpora by a word alignment method; and screening the term definitions by using the corpus definitions together with a string edit distance algorithm to obtain the final term definitions.
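The corpus extraction step named above can be sketched in Python as follows. The embodiment does not define how the matching degree is computed, so this sketch substitutes the ratio from the standard library's `difflib.SequenceMatcher`; the threshold value, the corpus base as a list of (source text, target text) pairs, and the function name are also assumptions.

```python
from difflib import SequenceMatcher


def retrieve_corpora(term, sentence, corpus_base, threshold=0.5):
    """Keep corpora above the matching threshold that contain the term.

    Returns (matching_degree, source_text, target_text) tuples sorted
    by matching degree from high to low.
    """
    hits = []
    for source_text, target_text in corpus_base:
        # Stand-in matching degree: character-level similarity ratio in [0, 1].
        degree = SequenceMatcher(None, sentence, source_text).ratio()
        if degree > threshold and term in source_text:
            hits.append((degree, source_text, target_text))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

An entry that closely resembles the sentence but lacks the term is dropped by the containment check, and an entry that contains the term but resembles the sentence too little is dropped by the threshold.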

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the foregoing technical solutions may be embodied, in essence or in part, in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of context-based term translation, comprising:

extracting the corpus whose matching degree with the sentence containing the term is greater than a preset threshold, wherein the corpus comprises a plurality of sentences, sorting the corpus by matching degree from high to low, and filtering out the corpus that does not contain the term;

screening the term definitions by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions;

before the step of extracting the corpus whose matching degree with the sentence containing the term is greater than a preset threshold, the method further comprises the following steps:

screening the term definitions subjected to pre-screening by utilizing the corpus definitions and a character string editing distance algorithm to obtain final term definitions;

the step of acquiring the corpus paraphrasing corresponding to the term in the corpus by adopting a word alignment method comprises the following specific steps:

wherein, the preset scoring model is:

in the above, src denotes the source word, dst denotes the translated word, similarity denotes the similarity between the definitions of the source word src and the translated word dst, w_i denotes the weight of the i-th factor, score_i denotes the score of the i-th factor, q_j denotes the weight of the j-th of the four context words of the source word src, distance_j denotes the distance between the aligned word and the translated word dst if the j-th word is aligned, and len denotes the number of verbs and nouns contained in the corpus.

2. The method according to claim 1, wherein the step of screening the term definitions by using the corpus definitions in combination with a string edit distance algorithm to obtain final term definitions is specifically:

comparing the similarity between the term definitions and the corpus definitions using a string edit distance algorithm;

and if the similarities are not all zero, deleting the term definitions whose similarity is zero, and sorting the term definitions by edit distance from small to large to obtain the final term definitions.

3. The method according to claim 1, wherein the step of pre-screening the term definitions using the dictionary definitions is in particular:

comparing the similarity between the term definitions and the dictionary definitions using a string edit distance algorithm;

and sorting the term definitions according to the editing distance from small to large, and deleting the term definitions with zero similarity to obtain the term definitions subjected to pre-screening.

4. A context-based term translation apparatus comprising:

the corpus extraction module is used for extracting the corpus whose matching degree with the sentence containing the term is greater than a preset threshold, sorting the corpus by matching degree from high to low, and filtering out the corpus that does not contain the term;

the paraphrasing screening module is used for screening the term definitions by using the corpus definitions together with a string edit distance algorithm to obtain the final term definitions;

further comprises:

correspondingly, the paraphrasing screening module is specifically used for:

the word alignment module is specifically configured to:

wherein, the preset scoring model is:

5. An electronic device, comprising:

at least one processor; and

at least one memory communicatively coupled to the processor, wherein:

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-3.

6. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any one of claims 1 to 3.