CN112215018A - Automatic positioning method and device for correction term pair, electronic equipment and storage medium - Google Patents
Automatic positioning method and device for correction term pair, electronic equipment and storage medium
- Publication number
- CN112215018A (application CN202011305060.6A)
- Authority
- CN
- China
- Prior art keywords
- word
- alignment
- result
- sentence
- pair
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/189—Automatic justification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/226—Validation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present disclosure provides an automatic positioning method for correction term pairs, comprising: S1, obtaining a machine translation result of a source language sentence, and obtaining a corrected translation result by correcting the machine translation result; S2, comparing the machine translation result with the corrected translation result to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair, and performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and S3, matching the at least one candidate query word for locating each correction term pair with the word alignment result to obtain the at least one correction term pair in the source language sentence and the corrected translation result. The present disclosure also provides an automatic positioning device for correction term pairs, an electronic device, and a storage medium.
Description
Technical Field
The present disclosure relates to the field of language processing technologies, and in particular to an automatic positioning method and device for correction term pairs, an electronic device, and a storage medium.
Background
A computer-aided translation (CAT) system assists a translator with computerized tools. Building on machine translation and other natural language processing techniques, it automates much of the heavy and complicated manual translation process, which improves translation efficiency and quality. A CAT system first translates the source language into the target language using a machine translation model and a translation memory, and a translator then manually corrects the machine translation result to produce a high-quality translation. For specialized vocabulary and uncommon words in some professional fields, the machine translation result is often wrong or incomplete, and the translator has to correct it word by word and sentence by sentence. To keep the system from repeating a mistranslation that the translator has already corrected, the mistranslated source language word and its corrected target language word are saved as a term pair in the term memory.
At present, when adding a term pair in a computer-aided translation system, the translator has to manually select or enter the source language word and its corrected target language translation at a specific location in the system. This operation is cumbersome and duplicates the work already done when correcting the machine translation result, so the degree of automation of the system is low and the translator's efficiency suffers.
Disclosure of Invention
To solve at least one of the above technical problems, the present disclosure provides an automatic positioning method and device for correction term pairs, an electronic device, and a storage medium. Here, a correction term pair is a term pair arising from the correction of a machine translation result.
According to an aspect of the present disclosure, there is provided an automatic positioning method of a correction term pair, including: S1, obtaining a machine translation result of the source language sentence, and correcting the machine translation result to obtain a corrected translation result; S2, comparing the machine translation result with the corrected translation result to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair, and performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and S3, matching the at least one candidate query word for locating each correction term pair with the word alignment result to obtain the at least one correction term pair in the source language sentence and the corrected translation result.
According to the automatic positioning method of the correction term pair, the correction translation result is a correction translation result corrected by the translator.
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, at least one candidate query term of each correction term pair is obtained by: comparing the machine translation result with the correction translation result to obtain at least one correction character in the correction translation result; and obtaining at least one candidate query term for locating each of the at least one corrected term pair using a sliding window method based on the at least one corrected character.
According to the automatic positioning method of the correction term pair, the correction characters include added characters and/or deleted characters.
According to the automatic positioning method of the corrected term pair of at least one embodiment of the present disclosure, performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result, including: SS1, forming sentence pairs by the source language sentences and the corrected translation results, namely the target language sentences, and performing word segmentation on the source language sentences and the target language sentences respectively; SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results; SS3, carrying out forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned in the forward alignment as a forward alignment result; SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result; and SS5, using the dictionary alignment result, the forward alignment result, and the reverse alignment result as the primary alignment result.
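The cascade in steps SS2 to SS5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the toy lookup tables `FWD` and `REV`, and the example words are all hypothetical, and the sketch follows the variant in which forward alignment is applied only to words the dictionary could not align.

```python
def primary_alignment(src_words, tgt_words, domain_dict, forward_align, reverse_align):
    """Cascade dictionary, forward, and reverse alignment (steps SS2 to SS5)."""
    # SS2: keep pairs that the professional domain dictionary can align.
    dict_pairs = [(s, domain_dict[s]) for s in src_words
                  if domain_dict.get(s) in tgt_words]
    done_src = {s for s, _ in dict_pairs}
    done_tgt = {t for _, t in dict_pairs}

    # SS3 (variant): forward-align only the words the dictionary left over.
    rest_src = [s for s in src_words if s not in done_src]
    rest_tgt = [t for t in tgt_words if t not in done_tgt]
    fwd_pairs = forward_align(rest_src, rest_tgt)
    done_src |= {s for s, _ in fwd_pairs}
    done_tgt |= {t for _, t in fwd_pairs}

    # SS4: reverse-align whatever forward alignment could not handle.
    rest_src = [s for s in src_words if s not in done_src]
    rest_tgt = [t for t in tgt_words if t not in done_tgt]
    rev_pairs = reverse_align(rest_src, rest_tgt)

    # SS5: the three results together form the primary alignment result.
    return dict_pairs + fwd_pairs + rev_pairs


# Toy stand-ins for the real aligners (hypothetical lookup tables).
FWD = {"infection": "感染"}
REV = {"引起": "causes"}

def toy_forward(src, tgt):
    return [(s, FWD[s]) for s in src if FWD.get(s) in tgt]

def toy_reverse(src, tgt):
    return [(REV[t], t) for t in tgt if REV.get(t) in src]

pairs = primary_alignment(
    ["Escherichia coli", "causes", "infection"],
    ["大肠杆菌", "引起", "感染"],
    {"Escherichia coli": "大肠杆菌"},
    toy_forward, toy_reverse,
)
```

Passing the aligners in as functions keeps the cascade order visible while leaving the actual forward and reverse models open.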
According to the automatic positioning method of the corrected term pair of at least one embodiment of the present disclosure, in step SS3, alternatively, words that are not alignable in the professional domain dictionary are forward aligned, and a word pair that is aligned in the forward direction is obtained as a result of the forward alignment.
The automatic positioning method for the correction term pair according to at least one embodiment of the present disclosure performs supplementary alignment on the primary alignment result obtained in step SS5, including: SS61, segmenting the sentence pair into a source language block sequence and a target language block sequence by using source language segmentation words and target language segmentation words; SS62, based on the primary alignment result, matching the source language blocks with the target language blocks one to one to obtain language block pairs; SS63, judging whether the source language word and the target language word of each word pair in the primary alignment result lie in the same language block pair, and, if a word pair does not lie in one language block pair, removing its source language word and target language word from the language block pairs to obtain cleaned language block pairs; and SS64, aligning the unaligned words in the cleaned language block pairs to obtain a supplementary alignment result for the primary alignment result.
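The cleaning step SS63 and the collection of leftover words for SS64 can be sketched as follows. This is a hedged illustration: the data layout (a list of source-block/target-block pairs, each a list of words), the function names, and the placeholder tokens are assumptions, and the sketch stops at listing the unaligned words because the passage does not spell out how they are then aligned.

```python
def clean_block_pairs(block_pairs, primary_pairs):
    """SS63: drop words of primary pairs that straddle two language block pairs."""
    cleaned = [(list(sb), list(tb)) for sb, tb in block_pairs]
    in_block = []  # primary pairs whose two words share one block pair
    for s_word, t_word in primary_pairs:
        s_idx = next((k for k, (sb, _) in enumerate(block_pairs) if s_word in sb), None)
        t_idx = next((k for k, (_, tb) in enumerate(block_pairs) if t_word in tb), None)
        if s_idx is not None and s_idx == t_idx:
            in_block.append((s_word, t_word))
            continue
        # The pair crosses block boundaries: remove both words from their blocks.
        if s_idx is not None and s_word in cleaned[s_idx][0]:
            cleaned[s_idx][0].remove(s_word)
        if t_idx is not None and t_word in cleaned[t_idx][1]:
            cleaned[t_idx][1].remove(t_word)
    return cleaned, in_block


def unaligned_words(cleaned, in_block):
    """Input to SS64: words in each cleaned block pair that no kept pair covers."""
    used_s = {s for s, _ in in_block}
    used_t = {t for _, t in in_block}
    return [([w for w in sb if w not in used_s], [w for w in tb if w not in used_t])
            for sb, tb in cleaned]


blocks = [(["a1", "a2"], ["x1", "x2"]), (["b1", "b2"], ["y1", "y2"])]
primary = [("a1", "x1"), ("a2", "y1")]  # the second pair crosses block pairs
cleaned, in_block = clean_block_pairs(blocks, primary)
leftover = unaligned_words(cleaned, in_block)
```

In this toy input, `a2` and `y1` are removed because they sit in different block pairs, and the remaining words of each block pair become candidates for supplementary alignment.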
In the automatic positioning method of the correction term pair according to at least one embodiment of the present disclosure, in step SS62, the language block pairs are obtained using the following method:
The source language block sequence and the target language block sequence are each represented as a sequence of subscripted words [sequence notation not present in this text],
where words carrying one kind of subscript are word pairs in the primary alignment result, and words carrying the other kind of subscript are unaligned words [subscript notation not present in this text];
based on the primary alignment result, the alignment relation and the alignment probability of the source language words and the target language words are obtained, and language block alignment is performed using the following formula:
[formula image not present in this text]
where i and j denote the serial numbers of the language blocks, and m and n denote the serial numbers of the words within language blocks i and j, respectively;
when language block alignment is performed, for each source language block, the alignment probability ρ between each word in the source language block and each word in a target language block is calculated, where the alignment probability of a word pair belonging to the primary alignment result is its primary alignment probability, and the alignment probability of a word pair not belonging to the primary alignment result is 0;
the alignment probabilities of all word pairs in the source language block are then added to obtain the language block alignment probability of the source language block relative to that target language block, and the target language block with the highest probability is selected as the alignment of the source language block.
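The language block alignment described above (sum the word-level alignment probabilities ρ over a source block and a target block, then pick the best-scoring target block) can be sketched as follows. The function name, the dictionary `primary_prob`, and the placeholder tokens are assumptions made for illustration.

```python
def align_blocks(src_blocks, tgt_blocks, primary_prob):
    """Align each source block to the target block with the highest summed probability.

    primary_prob maps (src_word, tgt_word) to the primary alignment probability;
    word pairs absent from the primary alignment result count as 0.
    """
    result = []
    for i, sb in enumerate(src_blocks):
        # Block-level score: sum of rho over every word pair across the two blocks.
        scores = [sum(primary_prob.get((s, t), 0.0) for s in sb for t in tb)
                  for tb in tgt_blocks]
        best_j = max(range(len(tgt_blocks)), key=scores.__getitem__)
        result.append((i, best_j, scores[best_j]))
    return result


src_blocks = [["a1", "a2"], ["b1", "b2"]]
tgt_blocks = [["x1", "x2"], ["y1"]]
primary_prob = {("a1", "x1"): 0.9, ("a2", "y1"): 0.3, ("b1", "y1"): 0.8}
block_alignment = align_blocks(src_blocks, tgt_blocks, primary_prob)
```

Here source block 0 scores 0.9 against target block 0 and 0.3 against target block 1, so it is aligned to target block 0; source block 1 is aligned to target block 1.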
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, in step SS3, the forward alignment includes the steps of: SS31, obtaining the translation probability of each word in the source language training corpus relative to each word in the target language training corpus, and obtaining a position alignment factor; SS32, calculating, based on the translation probability and the position alignment factor, the position alignment probability of each word in the segmented source language sentence relative to each word in the segmented target language sentence; and SS33, for each word in the segmented source language sentence, taking the word of the target language sentence at which its position alignment probability is maximal, and using the resulting word correspondences as the forward alignment result.
The automatic positioning method of the correction term pair according to at least one embodiment of the present disclosure further includes: SS34, judging whether each maximum value exceeds a preset threshold, and, if a maximum value is below the preset threshold, performing reverse alignment on the word in the source language sentence corresponding to that maximum value.
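Steps SS32 to SS34 can be sketched as follows. Several points are assumptions rather than statements of the disclosed method: the translation probability and the position term are combined by multiplication, the position term is passed in as a function `position_prob(i, j, m, n)`, and the probability table, threshold, and example words are invented for illustration.

```python
def forward_align(src_words, tgt_words, trans_prob, position_prob, threshold):
    """Forward alignment with a threshold check; assumes both sentences are non-empty."""
    aligned, needs_reverse = [], []
    m, n = len(src_words), len(tgt_words)
    for i, s in enumerate(src_words, start=1):
        # SS32: score every target word (assumed: translation prob. times position term).
        scores = [trans_prob.get((s, t), 0.0) * position_prob(i, j, m, n)
                  for j, t in enumerate(tgt_words, start=1)]
        # SS33: keep the target word with the maximum score.
        best = max(range(n), key=scores.__getitem__)
        # SS34: a maximum below the threshold is handed to reverse alignment.
        if scores[best] >= threshold:
            aligned.append((s, tgt_words[best]))
        else:
            needs_reverse.append(s)
    return aligned, needs_reverse


def uniform_position(i, j, m, n):
    """Stand-in position term that treats every target position alike."""
    return 1.0 / n


trans_prob = {
    ("Escherichia coli", "大肠杆菌"): 0.9,
    ("causes", "引起"): 0.1,
    ("infection", "感染"): 0.8,
}
aligned, needs_reverse = forward_align(
    ["Escherichia coli", "causes", "infection"],
    ["大肠杆菌", "引起", "感染"],
    trans_prob, uniform_position, threshold=0.2,
)
```

With these toy numbers, "causes" scores about 0.033 at best, falls below the threshold of 0.2, and is therefore returned for reverse alignment instead of being aligned.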
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, the reverse alignment includes: obtaining the translation probability of each word in the target language training corpus relative to each word in the source language training corpus, and simultaneously increasing the position alignment factor in the forward alignment; calculating the position alignment probability of each word in the target language sentence relative to each word in the source language sentence after word segmentation based on the translation probability and the increased position alignment factor; and taking a corresponding result of the word of the target language sentence corresponding to the maximum value of the position alignment probability of each word in the target language sentence relative to each word in the source language sentence after word segmentation and the word of the source language sentence as a reverse alignment result.
According to the automatic positioning method of the correction term pair, when the position alignment probability of each word in the segmented target language sentence relative to each word in the segmented source language sentence is obtained, the position alignment factor is increased.
According to the automatic positioning method of the corrected term pair of at least one embodiment of the present disclosure, in step SS31, a translation probability of each word in the segmented source language sentence with respect to each word in the segmented target language sentence is obtained using a source language-target language translation probability table.
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, a target language-source language translation probability table is used to obtain a translation probability of each word in the segmented target language sentence with respect to each word in the segmented source language sentence.
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, in step SS31, the position alignment probability is calculated by the following formula:
[formula image not present in this text]
The position alignment probability in the above formula is the probability that word i in e is aligned to word j in f,
where e is the source sentence, m is the source sentence length, f is the target sentence, n is the target sentence length, θ is the position alignment factor, and $a_i = j$ indicates that word i is aligned to word j.
According to the automatic positioning method of the correction term pair of at least one embodiment of the present disclosure, $Z_\theta(i, m, n)$ is calculated using the following formulas:
Let:
[formula image not present in this text]
Then:
[formula image not present in this text]
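The formulas themselves are not legible in this text, so the sketch below is an assumption rather than a reconstruction. The symbols θ, $a_i = j$, and $Z_\theta(i, m, n)$ match the diagonal position prior of the fast_align model, $p(a_i = j \mid i, m, n) = \exp(\theta\, h(i, j, m, n)) / Z_\theta(i, m, n)$ with $h(i, j, m, n) = -\lvert i/m - j/n \rvert$, and that form is used here. The normaliser is computed by direct summation rather than by any closed form the source may give.

```python
import math


def position_alignment_prob(i, j, m, n, theta):
    """Assumed diagonal prior: exp(theta * h) / Z, with h = -|i/m - j/n|.

    i and j are 1-based word positions; m and n are the sentence lengths.
    """
    def h(jj):
        return -abs(i / m - jj / n)

    # Z_theta(i, m, n): brute-force sum over all target positions.
    z = sum(math.exp(theta * h(jj)) for jj in range(1, n + 1))
    return math.exp(theta * h(j)) / z


# Distribution over target positions for source word 2 of 4, with theta = 4.
probs = [position_alignment_prob(2, j, 4, 4, 4.0) for j in range(1, 5)]
# A larger theta concentrates more probability near the diagonal position.
sharper = position_alignment_prob(2, 2, 4, 4, 8.0)
```

Under this assumed form, increasing the position alignment factor, as the reverse alignment does, makes alignments close to the diagonal more likely relative to distant ones.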
According to yet another aspect of the present disclosure, there is provided an automatic positioning device for correction term pairs, including: a translation result acquisition module that acquires a machine translation result of a source language sentence and acquires a corrected translation result obtained by correcting the machine translation result; a text comparison module that performs text comparison on the machine translation result and the corrected translation result obtained by the translation result acquisition module to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair; an alignment module that performs word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and a correction term pair acquisition module that matches the at least one candidate query word for locating each correction term pair with the word alignment result to obtain the at least one correction term pair in the source language sentence and the corrected translation result.
According to the automatic positioning device for correction term pairs of at least one embodiment of the present disclosure, the alignment module includes: a word segmentation module that forms a sentence pair from the source language sentence and the corrected translation result, that is, the target language sentence, and performs word segmentation on the source language sentence and the target language sentence respectively; a dictionary alignment module that aligns the words in the segmented sentence pair using a professional domain dictionary to obtain the word pairs that the professional domain dictionary can align as a dictionary alignment result; a forward alignment module that performs forward alignment on each word in the source language sentence and each word in the target language sentence to obtain the word pairs that can be aligned in the forward direction as a forward alignment result; a reverse alignment module that performs reverse alignment on the words that cannot be aligned in the forward direction to obtain the word pairs that can be aligned in the reverse direction as a reverse alignment result; and a primary alignment result generation module that takes the dictionary alignment result, the forward alignment result, and the reverse alignment result as a primary alignment result.
According to the automatic positioning device for the corrected term pair, the forward alignment module performs forward alignment on words which cannot be aligned by the professional domain dictionary to obtain the word pairs which can be aligned in the forward alignment mode as a forward alignment result.
The automatic positioning device for correcting term pairs according to at least one embodiment of the present disclosure further includes a supplementary alignment module that performs supplementary alignment on the primary alignment result.
According to the automatic positioning device for correction term pairs of at least one embodiment of the present disclosure, the supplementary alignment module includes: a language block segmentation module that segments the sentence pair into a source language block sequence and a target language block sequence using source language segmentation words and target language segmentation words; a language block pair generation module that matches the source language blocks with the target language blocks one to one on the basis of the primary alignment result to obtain language block pairs; a language block pair cleaning module that judges whether the source language word and the target language word of each word pair in the primary alignment result lie in the same language block pair and, if a word pair does not lie in one language block pair, removes its source language word and target language word from the language block pairs to obtain cleaned language block pairs; and a supplementary alignment result generation module that aligns the unaligned words in the cleaned language block pairs to obtain a supplementary alignment result for the primary alignment result.
According to yet another aspect of the present disclosure, there is provided an electronic device including: a memory storing execution instructions; and a processor executing execution instructions stored by the memory to cause the processor to perform any of the methods described above.
According to yet another aspect of the present disclosure, there is provided a readable storage medium having stored therein execution instructions for implementing the method of any one of the above when executed by a processor.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating an automatic location method of a correction term pair according to an embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a method for acquiring a candidate query term in an automatic location method for a corrected term pair according to an embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a word alignment method in an automatic positioning method of a correction term pair according to an embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
Fig. 7 is a flowchart illustrating a supplementary alignment method in a word alignment method in an automatic positioning method of a correction term pair according to an embodiment of the present disclosure.
Fig. 8 is a schematic flow chart of forward alignment in the word alignment method in the automatic positioning method of the correction term pair according to one embodiment of the present disclosure.
Fig. 9 is a schematic block diagram of the structure of an automatic positioning device for correction term pairs according to one embodiment of the present disclosure.
Fig. 10 is a block diagram of the structure of an alignment module of an automatic positioning device for correction term pairs according to an embodiment of the present disclosure.
Fig. 11 is a block diagram of the structure of an alignment module of an automatic positioning device for correction term pairs according to still another embodiment of the present disclosure.
Fig. 12 is a block diagram of a supplementary alignment module of an automatic positioning device for correction term pairs according to still another embodiment of the present disclosure.
Fig. 13 is an exemplary flow diagram of a word alignment method according to one embodiment of the present disclosure.
Fig. 14 is an exemplary diagram of supplemental alignment with language blocks according to one embodiment of the present disclosure.
Fig. 15 is a block diagram of an electronic device according to one embodiment of the present disclosure.
Description of the reference numerals
10 automatic positioning device for correction term pair
11 translation result acquisition module
12 text comparison module
13 alignment module
14 correction term pair acquisition module
131 word segmentation module
132 dictionary alignment module
133 forward alignment module
134 reverse alignment module
135 primary alignment result generation module
136 supplementary alignment module
1361 language block segmentation module
1362 language block pair generation module
1363 language block pair cleaning module
1364 supplementary alignment result generation module
Detailed Description
The present disclosure will be described in further detail with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limitations of the present disclosure. It should be further noted that, for the convenience of description, only the portions relevant to the present disclosure are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. Technical solutions of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Unless otherwise indicated, the illustrated exemplary embodiments/examples are to be understood as providing exemplary features of various details of some ways in which the technical concepts of the present disclosure may be practiced. Accordingly, unless otherwise indicated, features of the various embodiments may be additionally combined, separated, interchanged, and/or rearranged without departing from the technical concept of the present disclosure.
The use of cross-hatching and/or shading in the drawings is generally used to clarify the boundaries between adjacent components. As such, unless otherwise noted, the presence or absence of cross-hatching or shading does not convey or indicate any preference or requirement for a particular material, material property, size, proportion, commonality between the illustrated components, and/or any other characteristic, attribute, or property of a component. Further, in the drawings, the size and relative sizes of components may be exaggerated for clarity and/or descriptive purposes. When an exemplary embodiment can be implemented differently, a specific process sequence may be performed in an order different from the one described. For example, two processes described consecutively may be performed substantially simultaneously or in the reverse of the order described. In addition, like reference numerals denote like parts.
When an element is referred to as being "on," "connected to," or "coupled to" another element, it can be directly on, connected to, or coupled to the other element, or intervening elements may be present. However, when an element is referred to as being "directly on," "directly connected to," or "directly coupled to" another element, there are no intervening elements present. For purposes of this disclosure, the term "connected" may refer to a physical connection, an electrical connection, and the like, with or without intermediate components.
For descriptive purposes, the present disclosure may use spatially relative terms such as "beneath," "below," "under," "lower," "above," "upper," "over," "higher," and "side" (e.g., as in "sidewall") to describe one component's relationship to another component (or other components) as illustrated in the figures. Spatially relative terms are intended to encompass different orientations of the device in use, operation, and/or manufacture in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the exemplary term "below" can encompass both an orientation of "above" and "below". Further, the device may be otherwise positioned (e.g., rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein are to be interpreted accordingly.
The terminology used herein is for the purpose of describing particular embodiments and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, when the terms "comprises" and/or "comprising" and variations thereof are used in this specification, they specify the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It is also noted that, as used herein, the terms "substantially," "about," and other similar terms are used as terms of approximation and not as terms of degree, and as such are used to account for inherent deviations in measured values, calculated values, and/or provided values that would be recognized by one of ordinary skill in the art.
Fig. 1 is a flowchart illustrating an automatic location method of a correction term pair according to an embodiment of the present disclosure.
As shown in Fig. 1, the automatic positioning method of the correction term pair includes: S1, obtaining a machine translation result of the source language sentence, and correcting the machine translation result to obtain a corrected translation result; S2, comparing the machine translation result with the corrected translation result to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair, and performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and S3, matching the at least one candidate query word for locating each correction term pair with the word alignment result to obtain the at least one correction term pair in the source language sentence and the corrected translation result.
Wherein, the corrected translation result may be a corrected translation result corrected by the translator.
In the method for automatically locating a corrected term pair according to this embodiment, a machine translation result is compared with a corrected translation result to obtain at least one candidate query term for each corrected term pair, a source language sentence (e.g., an english sentence) and the corrected translation result (e.g., a chinese sentence) are word-aligned to obtain a word-aligned result (i.e., an aligned result of an english word and a chinese word), and then the candidate query terms are matched with the word-aligned result to obtain a corrected term pair, where the corrected term pair is a term pair formed by a word in the source language sentence and a corrected word in the corrected translation result.
Preferably, as shown in fig. 2, the at least one candidate query word of each correction term pair is obtained by:
comparing the machine translation result with the corrected translation result to obtain at least one corrected character in the corrected translation result; and
obtaining, based on the at least one corrected character and using a sliding window method, at least one candidate query word for positioning each correction term pair of the at least one correction term pair.
The corrected characters include added characters and/or deleted characters.
The automatic positioning method of the present disclosure first compares the machine translation result with the corrected translation result (for example, the translator's correction) to obtain the candidate query words used to position the corrected term.
Illustratively, recording a translator's actions when correcting a machine translation result is relatively simple. In practice, however, a translator rarely deletes a mistranslated word entirely and retypes the correct translation; instead, the translator adds and/or deletes characters on the basis of the machine translation result. For example, in the medical field, the specialized word "Escherichia coli" is rendered by machine translation as "Escherichia coli", and the translator deletes the two characters of "Escherichia" and adds the character "rod". If only the character-level additions or deletions are recorded, the specific words (terms) modified by the translator cannot be accurately located when the deleted or added characters appear multiple times in the target language sentence.
In the automatic positioning method of this embodiment, after the characters edited by the translator are obtained by comparing the translation results before and after correction, a plurality of candidate query words for the corrected word are obtained, for example, by a sliding-window method. In the above example, depending on the size of the sliding window, the candidate query words may be "enteric canal", "bacillus", "enterobacter". These candidate query words can be used to locate the words that the translator actually corrected.
Preferably, the automatic positioning method of this embodiment first identifies the differences between the two texts, i.e., the machine translation result text and the corrected translation result text (e.g., using the Differ class in difflib, a module of Python's standard library).
Preferably, the identification rules are as follows: relative to the machine translation result text, characters newly added in the corrected text are marked with a '+' symbol, deleted characters are marked with a '-' symbol, and unmodified characters carry no mark. Then, each newly added character marked '+' is taken as a central character; if there is no newly added character, the character immediately before or after a '-' marked character is taken as the central character. With the start and end characters of the text as boundaries and the deleted characters marked '-' as a limiting condition, a sliding window of preset size is slid to the left and right, expanding the characters edited by the translator into a plurality of candidate query words.
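The comparison and window expansion described above can be sketched as follows. This is a minimal sketch rather than the implementation of the disclosure: the function names `diff_chars` and `candidate_query_words`, the `max_window` parameter and the Latin-letter sample strings are hypothetical, and the limiting role of the deleted characters is not modelled. Only `difflib.Differ().compare` is an existing standard-library call.

```python
import difflib


def diff_chars(mt_text, corrected_text):
    """Compare two texts character by character.

    Returns (added, deletion_points), both as indices into corrected_text:
    positions of '+' characters, and the positions at which '-' characters
    were removed.
    """
    added, deletion_points = [], []
    pos = 0  # index of the next character of corrected_text
    for line in difflib.Differ().compare(list(mt_text), list(corrected_text)):
        tag = line[0]
        if tag == "+":      # character added by the translator
            added.append(pos)
            pos += 1
        elif tag == " ":    # unmodified character
            pos += 1
        elif tag == "-":    # character deleted by the translator
            deletion_points.append(pos)
        # lines starting with '?' are display hints and are ignored
    return added, deletion_points


def candidate_query_words(mt_text, corrected_text, max_window=3):
    """Expand the edited characters into candidate query words.

    Assumption: every window of size 1..max_window that contains a central
    character and stays inside the text is a candidate.
    """
    added, deletion_points = diff_chars(mt_text, corrected_text)
    n = len(corrected_text)
    if added:
        centers = added
    else:
        # no added character: use the neighbours of each deletion point
        centers = sorted({p for d in deletion_points
                          for p in (d - 1, d) if 0 <= p < n})
    candidates = []
    for c in centers:
        for w in range(1, max_window + 1):
            for start in range(max(0, c - w + 1), min(c, n - w) + 1):
                word = corrected_text[start:start + w]
                if word not in candidates:
                    candidates.append(word)
    return candidates
```

With the hypothetical inputs `"abXYcd"` (machine translation) and `"abZcd"` (correction) and `max_window=2`, the central character is `Z` and the candidates are the windows of size one and two that contain it.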
Subsequently, to locate a term pair, the source language word corresponding to the corrected word must be obtained. A dictionary alone cannot solve this problem. First, it cannot handle unknown words. Second, professional translation requires standardized terminology: for example, "missing" in the medical field should be translated as "blind", and "penalty" in the football field should be translated as "nodding"; a dictionary cannot generalize to match the accurate translation of the same source language word in different professional fields. The present disclosure provides an improved word alignment method that aligns the corrected result with the source language sentence. An accurate word alignment result yields the correspondence between each word of the source language sentence and each word of the corrected translation result. The candidate query words are then matched with the word alignment result to locate the term pairs corrected by the translator in the machine translation result.
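The final matching step can be illustrated with a short sketch. It assumes that the word alignment result is available as a list of (source word, target word) pairs and that a candidate matches when it equals an aligned target word exactly; both the data layout and the hypothetical function name `locate_term_pairs` are assumptions, since the disclosure does not fix a matching rule.

```python
def locate_term_pairs(candidates, alignment):
    """Return the aligned (source_word, target_word) pairs whose target word
    is one of the candidate query words (exact match is an assumption)."""
    candidate_set = set(candidates)
    return [(s, t) for s, t in alignment if t in candidate_set]


# Hypothetical data: only the full word "noir" matches an aligned target word.
pairs = locate_term_pairs(["no", "noir", "oir"],
                          [("cat", "chat"), ("black", "noir")])
```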
Fig. 3 is a flowchart illustrating a word alignment method in an automatic positioning method of a correction term pair according to an embodiment of the present disclosure.
As shown in fig. 3, in the foregoing embodiment, performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result includes:
SS1, forming a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performing word segmentation on the source language sentence and the target language sentence respectively;
SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain the word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results;
SS3, carrying out forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned by forward alignment and is used as a forward alignment result;
SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result; and
SS5, using the dictionary alignment result, the forward alignment result and the reverse alignment result as the primary alignment result.
The dictionary alignment may be performed by dictionary matching. The source language may be English, German or French, and the target language may be Chinese.
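Dictionary matching of a segmented sentence pair might look like the sketch below. The function name `dictionary_align`, the dictionary layout (lower-cased source word mapped to a set of target words) and the sample entries are hypothetical; the disclosure only states that matching is done against a professional domain dictionary.

```python
def dictionary_align(src_words, tgt_words, domain_dict):
    """Return (source_index, target_index) pairs found by dictionary lookup.

    Assumption: each target word is used by at most one source word.
    """
    pairs, used = [], set()
    for i, s in enumerate(src_words):
        translations = domain_dict.get(s.lower(), ())
        for j, t in enumerate(tgt_words):
            if j not in used and t in translations:
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


# Hypothetical medical-domain dictionary and segmented sentence pair.
domain_dict = {"patient": {"患者"}, "sepsis": {"脓毒症"}}
result = dictionary_align(["The", "patient", "has", "sepsis"],
                          ["患者", "患有", "脓毒症"], domain_dict)
```

Words left without a dictionary match (here "The" and "has") are the ones passed on to forward alignment in the embodiments of fig. 4 and fig. 6.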
Fig. 4 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
As shown in fig. 4, in the foregoing embodiment, performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result includes:
SS1, forming a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performing word segmentation on the source language sentence and the target language sentence respectively;
SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain the word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results;
SS3, carrying out forward alignment on words which can not be aligned in the professional domain dictionary, and obtaining word pairs which can be aligned in the forward alignment as a forward alignment result;
SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result; and
SS5, using the dictionary alignment result, the forward alignment result and the reverse alignment result as the primary alignment result.
Fig. 5 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
As shown in fig. 5, in the foregoing embodiment, performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result includes:
SS1, forming a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performing word segmentation on the source language sentence and the target language sentence respectively;
SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain the word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results;
SS3, carrying out forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned by forward alignment and is used as a forward alignment result;
SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result;
SS5, using the dictionary alignment result, the forward alignment result and the reverse alignment result as the primary alignment result; and
SS6, performing supplementary alignment on the primary alignment result.
Fig. 6 is a flowchart illustrating a word alignment method in an automatic location method of a correction term pair according to still another embodiment of the present disclosure.
As shown in fig. 6, in the foregoing embodiment, performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result includes:
SS1, forming a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performing word segmentation on the source language sentence and the target language sentence respectively;
SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain the word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results;
SS3, carrying out forward alignment on words which can not be aligned in the professional domain dictionary, and obtaining word pairs which can be aligned in the forward alignment as a forward alignment result;
SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result;
SS5, using the dictionary alignment result, the forward alignment result and the reverse alignment result as the primary alignment result; and
SS6, performing supplementary alignment on the primary alignment result.
Fig. 8 is a schematic flow chart of forward alignment in the word alignment method in the automatic positioning method of the correction term pair according to one embodiment of the present disclosure.
Preferably, in step SS3, the forward alignment comprises the following steps:
SS31, obtaining the translation probability of each word in the source language training corpus relative to each word in the target language training corpus, and a position alignment factor;
SS32, calculating the position alignment probability of each word in the segmented source language sentence relative to each word in the segmented target language sentence based on the translation probability and the position alignment factor; and
SS33, for each word in the segmented source language sentence, taking the target language word for which the position alignment probability is maximal, and using the resulting correspondences between source language words and target language words as the forward alignment result.
The obtaining of the source language training corpus and the target language training corpus belongs to the prior art and is not described in detail.
Next, a dictionary alignment process and a forward alignment process in the above embodiment will be described by taking english-chinese translation as an example.
First, a dictionary file and a word translation probability table for the English-Chinese language pair are prepared.
For the dictionary file, a basic professional domain dictionary may be prepared according to the domain; the word translation probability table may be obtained by training on a large-scale English-Chinese bilingual training corpus with the EM-based fast_align word alignment tool.
The sentence pair is then segmented: English may be segmented on spaces, and Chinese may be segmented with an open-source segmentation tool such as jieba.
Next, words are matched against the domain dictionary by dictionary matching.
Alignment is then performed using a word alignment method. The basic approach is to calculate, for each word in the source sentence and each word in the target sentence, the product of the translation probability and the position alignment probability as an alignment score. The translation probability is looked up in the trained word translation probability table, and words not covered by the table are uniformly assigned a fixed value, for example $10^{-6}$.
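The table lookup with a fallback value and the score product can be written compactly. In this sketch the table layout (a dict keyed by word pairs), the names `translation_prob` and `alignment_score`, and the sample entry are hypothetical; the position alignment probability is passed in as a number because its formula is defined separately.

```python
UNSEEN_PROB = 1e-6  # value assigned to word pairs missing from the table


def translation_prob(table, src_word, tgt_word):
    """Look up the translation probability, falling back to UNSEEN_PROB."""
    return table.get((src_word, tgt_word), UNSEEN_PROB)


def alignment_score(table, src_word, tgt_word, position_prob):
    """Alignment score = translation probability x position alignment probability."""
    return translation_prob(table, src_word, tgt_word) * position_prob


table = {("cat", "猫"): 0.8}  # hypothetical table entry
seen = alignment_score(table, "cat", "猫", 0.5)
unseen = alignment_score(table, "dog", "猫", 0.5)
```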
Preferably, the word alignment method in the foregoing embodiment further includes:
SS34, determining whether each maximum value exceeds a predetermined threshold, and if the maximum value below the predetermined threshold exists, reversely aligning the words in the source language sentence corresponding to the maximum value below the predetermined threshold.
Preferably, in the word alignment method of the foregoing embodiment, the reverse alignment includes:
obtaining the translation probability of each word in the target language training corpus relative to each word in the source language training corpus, while increasing the position alignment factor used in the forward alignment; calculating the position alignment probability of each word in the segmented target language sentence relative to each word in the segmented source language sentence based on the translation probability and the increased position alignment factor; and, for each word in the segmented target language sentence, taking the source language word for which the position alignment probability is maximal, and using the resulting correspondences between target language words and source language words as the reverse alignment result.
Preferably, in the above embodiment, the position alignment factor is increased when obtaining the position alignment probability of each word in the segmented target language sentence with respect to each word in the segmented source language sentence.
Preferably, in the above embodiment, in step SS31, the translation probability of each word in the segmented source language sentence with respect to each word in the segmented target language sentence is obtained using the source language-target language translation probability table.
Preferably, in the above embodiment, the target language-source language translation probability table is used to obtain the translation probability of each word in the segmented target language sentence with respect to each word in the segmented source language sentence.
Preferably, in the above embodiment, in step SS31, the position alignment probability is calculated by the following formula:
where the position alignment probability in the above formula is the probability that word i in e is aligned to word j in f;
e is the source sentence, m is the source sentence length, f is the target sentence, n is the target sentence length, θ is the position alignment factor, and $a_i = j$ denotes that the word aligned with word i is word j;
the position alignment factor θ can be obtained together with the translation probability table when the corpus is trained with a tool such as fast_align.
Preferably, in the above embodiment, $Z_\theta(i, m, n)$ is calculated using the following formula:
Let:
then:
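The formulas themselves are not legible in this text. The following is only a sketch of what they may look like, assuming that the position alignment probability follows the diagonal-favouring alignment prior of the fast_align tool cited above, with the null-alignment term omitted. The symbols $\delta$ and $h$ are assumptions introduced here; $\theta$, $i$, $j$, $m$, $n$, $a_i$ and $Z_\theta$ are the symbols defined in the description.

```latex
\delta(a_i = j \mid i, m, n) = \frac{\exp\bigl(\theta\, h(i, j, m, n)\bigr)}{Z_\theta(i, m, n)},
\qquad
h(i, j, m, n) = -\left|\frac{i}{m} - \frac{j}{n}\right|,
\qquad
Z_\theta(i, m, n) = \sum_{j'=1}^{n} \exp\bigl(\theta\, h(i, j', m, n)\bigr)
```

Under this assumed form, a larger $\theta$ concentrates the probability mass on target positions whose relative position $j/n$ is close to $i/m$, which is consistent with the statement that increasing the position alignment factor strengthens the influence of position on the alignment result.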
The word alignment method of the present disclosure preferably performs alignment twice, once in the forward direction and once in the reverse direction.
Taking English-Chinese translation as an example, the English-Chinese translation probability table is first used for alignment: an alignment score between each word of the English sentence and each word of the Chinese sentence is calculated, the scores are normalized into probability values, and the pairing with the maximum probability is taken as the English-Chinese forward alignment result.
A forward alignment result whose maximum probability is lower than a set threshold is considered unreliable. For such results, reverse alignment is performed using the Chinese-English translation probability table, with the value of the position alignment factor θ increased, i.e., with the influence of position on the alignment result strengthened, to obtain the reverse alignment result.
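The forward pass, the threshold check and the reverse pass can be sketched together as follows. The position alignment probability used here is an assumed diagonal prior in the style of fast_align, not a formula taken from the disclosure; the function names, the default values of `theta`, `theta_boost` and `threshold`, and the toy tables are all hypothetical. Both sentences are assumed to be non-empty.

```python
import math


def position_probs(i, m, n, theta):
    """Assumed diagonal prior: normalized probability that source position i
    (1-based, sentence length m) aligns to each target position 1..n."""
    weights = [math.exp(-theta * abs(i / m - j / n)) for j in range(1, n + 1)]
    z = sum(weights)
    return [w / z for w in weights]


def align(src_words, tgt_words, table, theta, unseen=1e-6):
    """For each source word, return (best target index, normalized probability)."""
    m, n = len(src_words), len(tgt_words)
    result = []
    for i, s in enumerate(src_words, start=1):
        pos = position_probs(i, m, n, theta)
        scores = [table.get((s, t), unseen) * pos[j]
                  for j, t in enumerate(tgt_words)]
        total = sum(scores)
        probs = [x / total for x in scores]  # normalize scores to probabilities
        best = max(range(n), key=probs.__getitem__)
        result.append((best, probs[best]))
    return result


def forward_then_reverse(src_words, tgt_words, fwd_table, rev_table,
                         theta=4.0, theta_boost=2.0, threshold=0.5):
    """Forward alignment; source words below the threshold are re-aligned
    in the reverse direction with an increased position alignment factor."""
    forward = align(src_words, tgt_words, fwd_table, theta)
    pairs = {i: j for i, (j, p) in enumerate(forward) if p >= threshold}
    weak = [i for i, (_, p) in enumerate(forward) if p < threshold]
    if weak:
        reverse = align(tgt_words, src_words, rev_table, theta * theta_boost)
        for j, (i, _) in enumerate(reverse):
            if i in weak and i not in pairs:
                pairs[i] = j
    return sorted(pairs.items())
```

In this sketch a weak source word receives the target word whose reverse alignment points back at it; other ways of merging the two directions are equally compatible with the description.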
The dictionary alignment result, the forward alignment result and the reverse alignment result are then integrated to obtain the primary alignment result, i.e., the result of combining dictionary matching with word alignment.
The primary alignment result obtained in the above embodiments combines dictionary matching with word alignment. It aligns correctly translated words with good accuracy, but errors often occur for mistranslated words. Moreover, taking English-Chinese translation as an example, English and Chinese usually order the parts of a sentence differently, for example for phrases expressing time and place and for inclusion and dependency relationships. In such cases a mistranslated word cannot be aligned by the word alignment method, nor can it be located by aligning the words on either side of the mistranslation.
A supplementary alignment of the primary alignment result is therefore required.
Fig. 7 is a flowchart illustrating a supplementary alignment method in a word alignment method in an automatic positioning method of a correction term pair according to an embodiment of the present disclosure.
As shown in fig. 7, step SS6, i.e., the supplementary alignment step in the above embodiment, preferably includes:
SS61, dividing the sentence pair into a source language block sequence and a target language block sequence using source language segmentation words and target language segmentation words;
SS62, putting the source language blocks and the target language blocks in one-to-one correspondence based on the primary alignment result to obtain language block pairs;
SS63, judging whether the source language word and the target language word of each word pair in the primary alignment result both lie in one language block pair, and, if a word pair does not, removing the source language word and the target language word of that word pair from the language block pair to obtain cleaned language block pairs; and
SS64, aligning the unaligned words in the cleaned language block pairs to obtain a supplementary alignment result for the primary alignment result.
Preferably, in step SS62, the language block pairs are obtained as follows:
the source language block sequence and the target language block sequence are each represented as a sequence of language blocks;
in this representation, words bearing the subscript a belong to word pairs of the primary alignment result, and words bearing the other subscript are unaligned words;
based on the primary alignment result, the alignment relation and the alignment probability of the source language words and the target language words are obtained, and language block alignment is performed using the following formula:
where i and j denote the serial numbers of the language blocks, and m and n denote the serial numbers of the words within language blocks i and j, respectively;
when language block alignment is performed, for each source language block, the alignment probability ρ between each word in the source language block and each word in a target language block is calculated, where the alignment probability of a word pair belonging to the primary alignment result is its primary alignment probability, and the alignment probability of a word pair not belonging to the primary alignment result is 0; and
the alignment probabilities of all word pairs are summed to give the language block alignment probability of the source language block relative to that target language block, and the target language block with the highest probability is selected as the block aligned with the source language block.
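The block-level scoring described above reduces to a sum and an argmax. In the sketch below, the function name `align_blocks` and the sample data are hypothetical, and word pairs are keyed by their surface strings for brevity (an implementation working on real sentences would more likely key them by word position).

```python
def align_blocks(src_blocks, tgt_blocks, primary):
    """Map each source block index to the index of its best target block.

    src_blocks, tgt_blocks: lists of lists of words.
    primary: dict mapping (source_word, target_word) to the primary alignment
             probability; any pair not in the dict contributes 0.
    """
    mapping = {}
    for i, sb in enumerate(src_blocks):
        scores = [sum(primary.get((s, t), 0.0) for s in sb for t in tb)
                  for tb in tgt_blocks]
        mapping[i] = max(range(len(tgt_blocks)), key=scores.__getitem__)
    return mapping


# Hypothetical English-Chinese example with a reordered place phrase.
src_blocks = [["the", "patient"], ["in", "hospital"]]
tgt_blocks = [["在", "医院"], ["患者"]]
primary = {("patient", "患者"): 0.9, ("hospital", "医院"): 0.8, ("in", "在"): 0.7}
block_map = align_blocks(src_blocks, tgt_blocks, primary)
```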
In the above embodiment, step SS61 divides the sentence pair into language block sequences. Taking English-Chinese translation as an example, preferably, according to the primary alignment result, a start mark is added before each aligned entry in the sentence pair and an end mark is added after it, so that an aligned entry is not split when the language blocks are segmented. For English, a set of segmentation words may be preset, including punctuation marks and words such as "in, at, on, of, with, and, but, or"; in English usage, the material introduced by these words is often placed earlier or later than in Chinese, which changes the sentence order.
The English sentence of the original sentence pair is segmented into a language block sequence according to the preset segmentation words, so that the order of words within each block is consistent with the order of words in the Chinese translation. The Chinese sentence is then segmented in the same way: Chinese punctuation marks and words such as "and" and "or" may be set as segmentation words, and the Chinese sentence is segmented into a language block sequence.
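A segmentation along preset segmentation words might be sketched as follows. The name `split_into_blocks`, the `protected` parameter (standing in for the start and end marks around aligned entries) and the choice to begin a new block at the segmentation word are assumptions; the disclosure does not say whether the segmentation word opens or closes a block.

```python
EN_SPLIT_WORDS = {",", ".", ";", "in", "at", "on", "of", "with", "and", "but", "or"}


def split_into_blocks(words, split_words, protected=frozenset()):
    """Split a word list into language blocks.

    A new block starts at every segmentation word, except at indices listed
    in `protected` (words inside an aligned entry that must not be split).
    """
    blocks, current = [], []
    for idx, w in enumerate(words):
        if w.lower() in split_words and idx not in protected and current:
            blocks.append(current)
            current = []
        current.append(w)
    if current:
        blocks.append(current)
    return blocks


sentence = "the patient was treated in the hospital with antibiotics".split()
blocks = split_into_blocks(sentence, EN_SPLIT_WORDS)
```

The same function can be applied to a segmented Chinese sentence with a Chinese set of segmentation words.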
After the language block sequences of the sentence pair are obtained, step SS62 uses the primary alignment result to put the Chinese language blocks and the English language blocks in one-to-one correspondence (taking English-Chinese translation as an example). Once the blocks correspond, the alignment relations of words left unaligned by the primary alignment, for example because of transliteration, can be further clarified, and supplementary alignment can be performed.
Illustratively, for English-Chinese translation, for each English language block, the alignment probability ρ between each word in the English language block and each word in a Chinese language block is calculated. A word pair belonging to the primary alignment result has its primary alignment probability as the alignment probability; a word pair not belonging to the primary alignment result has an alignment probability of 0. The alignment probabilities of all word pairs in the language block are then summed to give the language block alignment probability of the English language block relative to the Chinese language block, and the Chinese language block with the highest probability is selected as the block aligned with the English language block.
After all language blocks are aligned, a Chinese language block aligned with one English language block may still contain Chinese words that correspond to other English language blocks (taking English-Chinese translation as an example), because neither language blocks nor words are guaranteed to correspond one to one. Step SS63 is therefore used to clean the words inside each language block: for each word pair in the primary alignment result, if the pair does not occur in both the English language block and its aligned Chinese language block, the English word or Chinese word is removed.
After alignment and cleaning, the correspondence between language blocks and words is basically consistent. The words in each English language block for which no aligned Chinese word has been found can then be identified (taking English-Chinese translation as an example) and aligned with the unaligned words in the corresponding Chinese language block, which yields the alignment relations of the previously unaligned words.
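Cleaning one aligned language block pair and then pairing its leftover words can be sketched as below. The function name `clean_and_supplement` and the sample data are hypothetical, and pairing the leftover words in order of appearance is an assumption made for illustration; the disclosure does not state how several unaligned words within one block pair are matched to each other.

```python
def clean_and_supplement(src_block, tgt_block, primary_pairs):
    """Clean one aligned block pair and align its leftover words.

    src_block, tgt_block: word lists of one aligned language block pair.
    primary_pairs: (source_word, target_word) pairs of the primary alignment.
    Returns (kept, supplement): primary pairs lying inside this block pair,
    and the supplementary pairs formed from the remaining unaligned words.
    """
    src, tgt = list(src_block), list(tgt_block)
    kept = []
    for s, t in primary_pairs:
        in_src, in_tgt = s in src, t in tgt
        if in_src and in_tgt:
            kept.append((s, t))
        elif in_src:
            src.remove(s)   # its partner lies in another block: clean it out
        elif in_tgt:
            tgt.remove(t)
    aligned_src = {s for s, _ in kept}
    aligned_tgt = {t for _, t in kept}
    free_src = [s for s in src if s not in aligned_src]
    free_tgt = [t for t in tgt if t not in aligned_tgt]
    supplement = list(zip(free_src, free_tgt))  # assumption: pair in order
    return kept, supplement


# Hypothetical block pair: "医院" belongs to another English block.
kept, supplement = clean_and_supplement(
    ["with", "antibiotics"], ["用", "抗生素", "医院"],
    [("with", "用"), ("hospital", "医院")])
```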
The primary alignment result and the supplementary alignment result are combined to obtain a final alignment result, which can be seen in the example shown in fig. 14. Compared with the prior art, the accuracy of the finally obtained alignment result can be improved by more than 10%.
FIG. 9 is a block diagram schematic of the structure of an automatic positioning device correcting term pairs according to one embodiment of the present disclosure.
As shown in fig. 9, the automatic positioning device 10 for correction term pairs includes:
a translation result acquisition module 11, which acquires a machine translation result of a source language sentence and acquires a corrected translation result obtained by correcting the machine translation result;
a text comparison module 12, which performs text comparison on the machine translation result and the corrected translation result acquired by the translation result acquisition module 11 to obtain at least one candidate query word for positioning each correction term pair of at least one correction term pair;
an alignment module 13, which performs word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and
a correction term pair obtaining module 14, which matches the at least one candidate query word for positioning each correction term pair with the word alignment result to obtain the at least one correction term pair in the source language sentence and the corrected translation result.
Fig. 10 is a block diagram of a structure of an alignment module of an automatic positioning device correcting term pairs according to an embodiment of the present disclosure.
As shown in fig. 10, the alignment module 13 includes:
a word segmentation module 131, which forms a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performs word segmentation on the source language sentence and the target language sentence respectively;
a dictionary alignment module 132, which aligns the words in the segmented sentence pair using a professional domain dictionary to obtain the word pairs that can be aligned by the professional domain dictionary as a dictionary alignment result;
a forward alignment module 133, wherein the forward alignment module 133 performs forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned by forward alignment, and the word pair is used as a forward alignment result;
a reverse alignment module 134, wherein the reverse alignment module 134 performs reverse alignment on the words which are not aligned in the forward direction to obtain word pairs which are aligned in the reverse direction and used as a reverse alignment result; and
a primary alignment result generation module 135, the primary alignment result generation module 135 using the dictionary alignment result, the forward alignment result, and the reverse alignment result as the primary alignment result.
Fig. 11 is a block diagram of a structure of an alignment module of an automatic positioning device correcting term pairs according to still another embodiment of the present disclosure.
As shown in fig. 11, the alignment module 13 includes:
a word segmentation module 131, which forms a sentence pair from the source language sentence and its machine translation result (i.e., the target language sentence), and performs word segmentation on the source language sentence and the target language sentence respectively;
a dictionary alignment module 132, which aligns the words in the segmented sentence pair using a professional domain dictionary to obtain the word pairs that can be aligned by the professional domain dictionary as a dictionary alignment result;
a forward alignment module 133, wherein the forward alignment module 133 performs forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned by forward alignment, and the word pair is used as a forward alignment result;
a reverse alignment module 134, wherein the reverse alignment module 134 performs reverse alignment on the words which are not aligned in the forward direction to obtain word pairs which are aligned in the reverse direction and used as a reverse alignment result;
a primary alignment result generation module 135, the primary alignment result generation module 135 using the dictionary alignment result, the forward alignment result, and the reverse alignment result as a primary alignment result; and
and a supplementary alignment module 136, wherein the supplementary alignment module 136 performs supplementary alignment on the primary alignment result.
Alternatively, in the above embodiment, the forward alignment module 133 performs forward alignment on words that cannot be aligned by the professional domain dictionary, and obtains pairs of words that can be aligned by the forward alignment as a forward alignment result.
Fig. 12 is a block diagram of a supplementary alignment module of an automatic positioning device correcting term pairs according to still another embodiment of the present disclosure.
As shown in fig. 12, the supplemental alignment module 136 includes:
a language block segmentation module 1361, which segments the sentence pair into a source language block sequence and a target language block sequence;
a language block pair generation module 1362, the language block pair generation module 1362 corresponding the source language blocks to the target language blocks one to one based on the primary alignment result, to obtain language block pairs;
a block pair cleaning module 1363, where the block pair cleaning module 1363 determines whether a source language word and a target language word in a word pair in the primary alignment result are simultaneously present in one block pair, and if a word pair is not simultaneously present in one block pair, removes the source language word and the target language word in the word pair from the block pair to obtain a cleaned block pair; and
a supplementary alignment result generating module 1364, where the supplementary alignment result generating module 1364 aligns unaligned words in the cleaned word block pair to obtain a supplementary alignment result of the primary alignment result.
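The block pairing and cleaning performed by modules 1362 and 1363 can be illustrated with a minimal Python sketch. It makes several assumptions that the text does not confirm: blocks are plain lists of words, each word occurs only once per sentence, a source block is paired with the target block that has the largest summed primary alignment probability (without enforcing a strict one-to-one mapping), and a word pair that straddles two block pairs is removed from every block. The names `pair_blocks` and `clean_block_pairs` are hypothetical.

```python
from typing import Dict, List, Tuple

# (source word, target word) -> primary alignment probability (assumed representation)
AlignProb = Dict[Tuple[str, str], float]


def pair_blocks(src_blocks: List[List[str]], tgt_blocks: List[List[str]],
                align_prob: AlignProb) -> List[Tuple[int, int]]:
    """For each source block, pick the target block with the largest summed probability."""
    pairs = []
    for i, src_block in enumerate(src_blocks):
        # Word pairs outside the primary alignment result contribute 0.
        scores = [
            sum(align_prob.get((s, t), 0.0) for s in src_block for t in tgt_block)
            for tgt_block in tgt_blocks
        ]
        best_j = max(range(len(tgt_blocks)), key=scores.__getitem__)
        pairs.append((i, best_j))
    return pairs


def clean_block_pairs(src_blocks: List[List[str]], tgt_blocks: List[List[str]],
                      block_pairs: List[Tuple[int, int]],
                      align_prob: AlignProb) -> List[Tuple[List[str], List[str]]]:
    """Remove the words of any aligned word pair that does not sit inside one block pair."""
    paired = [(src_blocks[i], tgt_blocks[j]) for i, j in block_pairs]
    cleaned = [(list(sb), list(tb)) for sb, tb in paired]
    for s, t in align_prob:
        if any(s in sb and t in tb for sb, tb in paired):
            continue  # both words fall in the same block pair: keep them
        for sb, tb in cleaned:
            if s in sb:
                sb.remove(s)
            if t in tb:
                tb.remove(t)
    return cleaned


src_blocks = [["我", "喜欢"], ["红", "苹果"]]
tgt_blocks = [["I", "like"], ["red", "apples"]]
# Illustrative primary alignment result; ("红", "like") is a spurious cross-block pair.
primary = {("我", "I"): 0.9, ("苹果", "apples"): 0.7, ("红", "like"): 0.2}

block_pairs = pair_blocks(src_blocks, tgt_blocks, primary)
cleaned = clean_block_pairs(src_blocks, tgt_blocks, block_pairs, primary)
print(block_pairs)
print(cleaned)
```

In this toy input the cross-block pair is discarded, and the words left without a partner inside each cleaned block pair are the candidates for the supplementary alignment of module 1364.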
Fig. 13 is an exemplary flow diagram of a word alignment method according to one embodiment of the present disclosure.
As shown in fig. 13, the sentence pair containing the corrected translation result (which may be a translator's correction of the translation of the source language sentence) is first segmented into words. Matching is then performed using a professional dictionary to obtain the word pairs that the dictionary can align, and forward word alignment is performed to obtain the word pairs that forward alignment can align. It is then determined whether the forward alignment probability of each word is greater than a threshold. If all of them are greater than the threshold, the primary alignment result is obtained directly; if any forward alignment probability is not greater than the threshold, reverse word alignment is performed, after which the primary alignment result is obtained.
After the primary alignment result is obtained, bilingual language block segmentation is performed: the sentence pair is segmented into a source language block sequence and a target language block sequence, and the language blocks are aligned on the basis of the primary alignment result. It is then determined whether any word remains unaligned. If not, the final alignment result is obtained; if so, supplementary alignment is performed within the language blocks, after which the final alignment result is obtained.
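The first stage of this flow (dictionary matching, forward alignment with a threshold test, reverse alignment for low-confidence words) can be sketched in Python as follows. This is only an illustration under stated assumptions: the translation probability tables are plain dictionaries, the position term is a diagonal-favouring prior of the form exp(-theta * |i/m - j/n|) normalised over positions, the reverse pass uses a larger factor (`theta + theta_boost`), and any positive reverse score is accepted. The function names, the default values and the example tables are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

ProbTable = Dict[Tuple[str, str], float]  # (from word, to word) -> translation probability


def position_prior(i: int, j: int, m: int, n: int, theta: float) -> float:
    """Assumed diagonal-favouring prior for 1-based positions i (of m) and j (of n)."""
    weights = [math.exp(-theta * abs(i / m - jj / n)) for jj in range(1, n + 1)]
    return weights[j - 1] / sum(weights)


def best_match(idx: int, from_words: List[str], to_words: List[str],
               table: ProbTable, theta: float) -> Tuple[int, float]:
    """Return the position in to_words with the highest score for from_words[idx]."""
    m, n = len(from_words), len(to_words)
    scores = [
        table.get((from_words[idx], to_words[j]), 0.0)
        * position_prior(idx + 1, j + 1, m, n, theta)
        for j in range(n)
    ]
    j = max(range(n), key=scores.__getitem__)
    return j, scores[j]


def primary_alignment(src: List[str], tgt: List[str], dictionary: Dict[str, str],
                      fwd_table: ProbTable, rev_table: ProbTable,
                      threshold: float = 0.05, theta: float = 4.0,
                      theta_boost: float = 2.0) -> Dict[int, Tuple[int, str]]:
    """Map source positions to (target position, origin of the link)."""
    result: Dict[int, Tuple[int, str]] = {}
    # 1. Dictionary alignment.
    for i, s in enumerate(src):
        t = dictionary.get(s)
        if t in tgt:
            result[i] = (tgt.index(t), "dictionary")
    # 2. Forward alignment of the words the dictionary could not align.
    weak = []
    for i in range(len(src)):
        if i in result:
            continue
        j, p = best_match(i, src, tgt, fwd_table, theta)
        if p > threshold:
            result[i] = (j, "forward")
        else:
            weak.append(i)
    # 3. Reverse alignment, run only when some forward score failed the threshold test.
    if weak:
        for j in range(len(tgt)):
            i, p = best_match(j, tgt, src, rev_table, theta + theta_boost)
            if i in weak and p > 0 and i not in result:
                result[i] = (j, "reverse")
    return result


src = ["大肠杆菌", "引起", "感染"]
tgt = ["E.coli", "causes", "infection"]
alignment = primary_alignment(
    src, tgt,
    dictionary={"大肠杆菌": "E.coli"},
    fwd_table={("引起", "causes"): 0.6, ("感染", "infection"): 0.02},
    rev_table={("infection", "感染"): 0.7},
)
print(alignment)
```

Here the first word is linked by the dictionary, the second by forward alignment, and the third only by the reverse pass because its forward score falls below the threshold.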
Compared with the methods and devices of the prior art, the word alignment method and device of the present disclosure first obtain a primary alignment result, then segment the sentence pair into language blocks and perform supplementary alignment using the language block alignment method. This improves the accuracy of word alignment, in particular the alignment between the words of an erroneous translation result and the words of the source language sentence.
The automatic positioning method for correction term pairs of the present disclosure obtains candidate query words for locating the positions of the corrected words and uses the improved word alignment method, so that correction term pairs can be located automatically once the translator has finished correcting the machine translation result. This raises the degree of automation of a computer-aided translation system and improves the translator's efficiency.
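A minimal Python sketch of how candidate query words might be derived and matched against a word alignment result is given below. It is an assumption-laden illustration rather than the disclosed procedure: the comparison uses the standard-library `difflib.SequenceMatcher` at character level, only inserted or replaced characters are treated as corrected characters (deleted characters are ignored here), and the sliding window simply enumerates every substring of at most `max_len` characters that overlaps a corrected span. The function names and the example sentences are hypothetical.

```python
import difflib
from typing import List, Set, Tuple


def corrected_spans(mt: str, corrected: str) -> List[Tuple[int, int]]:
    """Character spans of `corrected` that were inserted or replaced relative to `mt`."""
    matcher = difflib.SequenceMatcher(None, mt, corrected, autojunk=False)
    return [
        (j1, j2)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]


def candidate_queries(corrected: str, spans: List[Tuple[int, int]],
                      max_len: int = 8) -> Set[str]:
    """Every substring of length <= max_len that overlaps a corrected span."""
    candidates: Set[str] = set()
    for start, end in spans:
        for lo in range(max(0, start - max_len + 1), end):
            upper = min(len(corrected), lo + max_len)
            for hi in range(max(lo + 1, start + 1), upper + 1):
                candidates.add(corrected[lo:hi])
    return candidates


def locate_correction_pairs(alignment: List[Tuple[str, str]],
                            candidates: Set[str]) -> List[Tuple[str, str]]:
    """Keep the aligned (source word, target word) pairs whose target word is a candidate."""
    return [(s, t) for s, t in alignment if t in candidates]


mt = "the big cat sleeps"
corrected = "the black cat sleeps"
# Illustrative word alignment between a source sentence and the corrected translation.
word_alignment = [("那只", "the"), ("黑", "black"), ("猫", "cat"), ("睡觉", "sleeps")]

spans = corrected_spans(mt, corrected)
candidates = candidate_queries(corrected, spans)
pairs = locate_correction_pairs(word_alignment, candidates)
print(spans)
print(pairs)
```

Only the aligned pair whose target word overlaps the edited characters is reported, which is the sense in which the candidate query words locate the correction term pair within the alignment result.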
The present disclosure also provides an electronic device. As shown in fig. 15, the device includes a communication interface 1000, a memory 2000, and a processor 3000. The communication interface 1000 communicates with external devices to exchange data. The memory 2000 stores a computer program that is executable on the processor 3000. The processor 3000 implements the method in the above-described embodiments when executing the computer program. There may be one or more memories 2000 and one or more processors 3000.
The memory 2000 may include a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
If the communication interface 1000, the memory 2000 and the processor 3000 are implemented independently, the communication interface 1000, the memory 2000 and the processor 3000 may be connected to each other through a bus to complete communication therebetween. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not represent only one bus or one type of bus.
Optionally, in a specific implementation, if the communication interface 1000, the memory 2000, and the processor 3000 are integrated on a chip, the communication interface 1000, the memory 2000, and the processor 3000 may complete communication with each other through an internal interface.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the implementations of the present disclosure. The processor performs the various methods and processes described above. For example, method embodiments in the present disclosure may be implemented as a software program tangibly embodied in a machine-readable medium, such as a memory. In some embodiments, some or all of the software program may be loaded and/or installed via memory and/or a communication interface. When the software program is loaded into memory and executed by a processor, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the processor may be configured to perform one of the methods described above by any other suitable means (e.g., by means of firmware).
The logic and/or steps represented in the flowcharts or otherwise described herein may be embodied in any readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
For the purposes of this description, a "readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in the memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
In the description herein, reference to the terms "one embodiment/implementation," "some embodiments/implementations," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment/implementation or example is included in at least one embodiment/implementation or example of the present application. In this specification, such uses of the above terms do not necessarily refer to the same embodiment/implementation or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments/implementations or examples. In addition, those skilled in the art may combine the various embodiments/implementations or examples described in this specification, and the features thereof, provided they do not conflict with one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
It will be understood by those skilled in the art that the foregoing embodiments are merely for clarity of illustration of the disclosure and are not intended to limit the scope of the disclosure. Other variations or modifications may occur to those skilled in the art, based on the foregoing disclosure, and are still within the scope of the present disclosure.
Claims (10)
1. An automatic positioning method for correction term pairs, comprising:
S1, obtaining a machine translation result of a source language sentence, and correcting the machine translation result to obtain a corrected translation result;
S2, comparing the machine translation result with the corrected translation result to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair, and performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and
S3, matching the at least one candidate query word for locating each correction term pair of the at least one correction term pair against the word alignment result, to obtain the at least one correction term pair in the source language sentence and the corrected translation result.
2. The automatic positioning method for correction term pairs according to claim 1, wherein the corrected translation result is a translation result corrected by a translator.
3. The automatic positioning method for correction term pairs according to claim 1, wherein the at least one candidate query word for each correction term pair is obtained by:
comparing the machine translation result with the corrected translation result to obtain at least one corrected character in the corrected translation result; and
based on the at least one corrected character, obtaining, using a sliding window method, at least one candidate query word for locating each correction term pair of the at least one correction term pair.
4. The automatic positioning method for correction term pairs according to claim 3, wherein the corrected characters comprise added characters and/or deleted characters.
5. The method of claim 1, wherein performing word alignment on the source language sentence and the corrected translation result to obtain a word alignment result comprises:
SS1, forming a sentence pair from the source language sentence and the corrected translation result, the corrected translation result being the target language sentence, and performing word segmentation on the source language sentence and the target language sentence respectively;
SS2, aligning the words in the sentence pairs after word segmentation by using a professional domain dictionary to obtain word pairs which can be aligned by the professional domain dictionary and serve as dictionary alignment results;
SS3, carrying out forward alignment on each word in the source language sentence and each word in the target language sentence to obtain a word pair which can be aligned in the forward alignment as a forward alignment result;
SS4, carrying out reverse alignment on the words which can not be aligned in the forward direction in the step SS3, and obtaining word pairs which can be aligned in the reverse direction as a reverse alignment result; and
SS5, using the dictionary alignment result, the forward alignment result and the reverse alignment result as the primary alignment result.
6. The automatic positioning method for correction term pairs according to claim 5, wherein, alternatively, in step SS3, the words that cannot be aligned by the professional domain dictionary are forward aligned, and the word pairs that can be aligned by forward alignment are obtained as the forward alignment result,
or,
performing supplementary alignment on the primary alignment result obtained in step SS5, including:
SS61, segmenting the sentence pair into a source language block sequence and a target language block sequence by using source language segmentation words and target language segmentation words;
SS62, based on the primary alignment result, matching the source language blocks to the target language blocks one to one to obtain language block pairs;
SS63, judging whether the source language word and the target language word of each word pair in the primary alignment result appear together in one language block pair and, if a word pair does not appear together in one language block pair, removing the source language word and the target language word of that word pair from the language block pairs to obtain cleaned language block pairs; and
SS64, aligning the unaligned words in the cleaned language block pairs to obtain the supplementary alignment result of the primary alignment result,
or,
in step SS62, the language block pairs are obtained using the following method:
representing the source language block sequence as [formula not reproduced in the source text], and representing the target language block sequence as [formula not reproduced in the source text],
wherein words with the subscript [symbol not reproduced in the source text] are word pairs from the primary alignment result, and words with the subscript [symbol not reproduced in the source text] are unaligned words;
based on the primary alignment result, obtaining the alignment relation and the alignment probability of the source language words and the target language words, and performing language block alignment by using the following formula: [formula not reproduced in the source text]
wherein i, j represent the serial numbers of the language blocks, and m, n represent the serial numbers of the words within language blocks i and j respectively;
when language block alignment is carried out, for each source language block, calculating the alignment probability ρ between each word in the source language block and each word in the target language block, wherein the alignment probability of a word pair belonging to the primary alignment result is its primary alignment probability, and the alignment probability of a word pair not belonging to the primary alignment result is 0;
adding the alignment probabilities of all word pairs in the source language block to obtain the language block alignment probability of the source language block relative to each target language block, and selecting, for the source language block, the target language block with the highest language block alignment probability,
or,
in step SS3, the forward alignment includes the following steps:
SS31, obtaining the translation probability of each word in the source language training corpus relative to each word in the target language training corpus, and obtaining a position alignment factor;
SS32, calculating the position alignment probability of each word in the source language sentence after word segmentation relative to each word in the target language sentence after word segmentation based on the translation probability and the position alignment factor; and
SS33, for each word in the segmented source language sentence, taking the word of the segmented target language sentence at which its position alignment probability is maximal as its counterpart, the resulting correspondences forming the forward alignment result,
or,
further comprising:
SS34, judging whether each maximum value exceeds a preset threshold and, if any maximum value is below the preset threshold, performing reverse alignment on the words in the source language sentence that correspond to the maximum values below the preset threshold,
or,
the reverse alignment includes:
obtaining the translation probability of each word in the target language training corpus relative to each word in the source language training corpus, and increasing the position alignment factor relative to the value used in the forward alignment; calculating, based on the translation probability and the increased position alignment factor, the position alignment probability of each word in the segmented target language sentence relative to each word in the segmented source language sentence; and, for each word in the segmented target language sentence, taking the word of the segmented source language sentence at which its position alignment probability is maximal as its counterpart, the resulting correspondences forming the reverse alignment result,
or,
increasing a position alignment factor when obtaining a position alignment probability of each word in the segmented target language sentence relative to each word in the segmented source language sentence,
or,
in step SS31, a translation probability of each word in the segmented source language sentence with respect to each word in the segmented target language sentence is obtained using the source language-target language translation probability table,
or,
obtaining a translation probability of each word in the segmented target language sentence with respect to each word in the segmented source language sentence using a target language-source language translation probability table,
or,
in step SS31, the position alignment probability is calculated by the following formula:
[formula not reproduced in the source text]
the position alignment probability in the above formula being the probability that each word i in e is aligned to a word j in f;
wherein e is the source sentence, m is the source sentence length, f is the target sentence, n is the target sentence length, $\theta$ is the position alignment factor, and $a_i = j$ indicates that the word aligned with word i is word j;
or,
$Z_\theta(i, m, n)$ is calculated by the following formula:
[formula not reproduced in the source text]
Let:
[formula not reproduced in the source text]
then:
[formula not reproduced in the source text]
7. An automatic positioning device for correction term pairs, comprising:
a translation result acquisition module that acquires a machine translation result of a source language sentence and acquires a corrected translation result obtained by correcting the machine translation result;
a text comparison module that compares the text of the machine translation result with that of the corrected translation result acquired by the translation result acquisition module, to obtain at least one candidate query word for locating each correction term pair of at least one correction term pair;
an alignment module that performs word alignment on the source language sentence and the corrected translation result to obtain a word alignment result; and
a corrected term pair acquisition module that matches the at least one candidate query word for locating each corrected term pair in the at least one corrected term pair with the word alignment result to obtain at least one corrected term pair in the source language sentence and the corrected translation result.
8. The automatic positioning device of correction term pairs according to claim 7, characterized in that said alignment module comprises:
a word segmentation module that forms a sentence pair from a source language sentence and a machine translation result serving as the target language sentence, and performs word segmentation on the source language sentence and the target language sentence respectively;
a dictionary alignment module that aligns the words in the segmented sentence pair using a professional domain dictionary, obtaining the word pairs that the professional domain dictionary can align as the dictionary alignment result;
a forward alignment module that performs forward alignment between each word in the source language sentence and each word in the target language sentence, obtaining the word pairs that forward alignment can align as the forward alignment result;
a reverse alignment module that performs reverse alignment on the words that cannot be aligned by forward alignment, obtaining the word pairs that reverse alignment can align as the reverse alignment result; and
a primary alignment result generation module that takes a dictionary alignment result, a forward alignment result, and a reverse alignment result as primary alignment results,
or,
alternatively, the forward alignment module performs forward alignment on the words that cannot be aligned by the professional domain dictionary, obtaining the word pairs that forward alignment can align as the forward alignment result,
or,
further comprising a supplementary alignment module that supplementary aligns the primary alignment result,
or,
the supplemental alignment module includes:
a language block segmentation module that segments the sentence pair into a source language block sequence and a target language block sequence using source language segmentation words and target language segmentation words;
a language block pair generation module that matches the source language blocks to the target language blocks one to one on the basis of the primary alignment result to obtain language block pairs;
a language block pair cleaning module that judges whether the source language word and the target language word of each word pair in the primary alignment result appear together in one language block pair and, if a word pair does not appear together in one language block pair, removes the source language word and the target language word of that word pair from the language block pairs to obtain cleaned language block pairs; and
a supplementary alignment result generation module that aligns the unaligned words in the cleaned language block pairs to obtain a supplementary alignment result of the primary alignment result.
9. An electronic device, comprising:
a memory storing execution instructions; and
a processor executing execution instructions stored by the memory to cause the processor to perform the method of any of claims 1 to 6.
10. A readable storage medium having stored therein execution instructions, which when executed by a processor, are configured to implement the method of any one of claims 1 to 6.
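The formula images for the position alignment probability referred to in claim 6 are not present in this text. The symbols that do survive ($\theta$, $a_i = j$, $Z_\theta(i, m, n)$) are consistent with a diagonal-favouring position prior of the kind used by the fast_align word aligner. The following is a sketch under that assumption and should not be read as the claimed formula; the distance function $h$ and the omission of a null-alignment term are assumptions.

```latex
% Assumed form, modelled on a diagonal-favouring position prior; not the claimed formula.
\[
  h(i, j, m, n) = -\left| \frac{i}{m} - \frac{j}{n} \right|
\]
\[
  p(a_i = j \mid i, m, n) = \frac{\exp\bigl(\theta\, h(i, j, m, n)\bigr)}{Z_\theta(i, m, n)},
  \qquad
  Z_\theta(i, m, n) = \sum_{j'=1}^{n} \exp\bigl(\theta\, h(i, j', m, n)\bigr)
\]
```

Under this form, $\theta = 0$ gives a uniform distribution over the $n$ target positions, and a larger $\theta$ concentrates probability on positions with $j/n \approx i/m$, which would be one way to read the "increased position alignment factor" used in reverse alignment.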
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2020108831719 | 2020-08-28 | ||
CN202010883171.9A CN111985254A (en) | 2020-08-28 | 2020-08-28 | Automatic positioning method and device for correction term pair, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112215018A (en) | 2021-01-12 |
CN112215018B (en) | 2021-08-13 |
Family
ID=73439679
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010883171.9A Pending CN111985254A (en) | 2020-08-28 | 2020-08-28 | Automatic positioning method and device for correction term pair, electronic equipment and storage medium |
CN202011305060.6A Active CN112215018B (en) | 2020-08-28 | 2020-11-20 | Automatic positioning method and device for correction term pair, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111985254A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118112A (en) * | 2021-12-02 | 2022-03-01 | 江苏省舜禹信息技术有限公司 | Method for merging bilingual merged documents |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101290616A (en) * | 2008-06-11 | 2008-10-22 | 中国科学院计算技术研究所 | Statistical machine translation method and system |
CN102799579A (en) * | 2012-07-18 | 2012-11-28 | 西安理工大学 | Statistical machine translation method with error self-diagnosis and self-correction functions |
CN104375988A (en) * | 2014-11-04 | 2015-02-25 | 北京第二外国语学院 | Word and expression alignment method and device |
CN109545189A (en) * | 2018-12-14 | 2019-03-29 | 东华大学 | A kind of spoken language pronunciation error detection and correcting system based on machine learning |
CN109993160A (en) * | 2019-02-18 | 2019-07-09 | 北京联合大学 | A kind of image flame detection and text and location recognition method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112215018B (en) | 2021-08-13 |
CN111985254A (en) | 2020-11-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Zhou Yu, Deng Biao, Liu Peng, Han Yanchao | Inventor before: Zhou Yu, Deng Biao, Li Xiaoqing, Liu Peng, Han Yanchao |