CN113051936A - Method for enhancing Hanyue neural machine translation based on low-frequency word representation - Google Patents

Info

Publication number
CN113051936A
Authority
CN
China
Prior art keywords
low
word
frequency
words
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110280508.1A
Other languages
Chinese (zh)
Inventor
余正涛
杨福岸
高盛祥
王振晗
朱俊国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-16
Filing date
2021-03-16
Publication date
2021-06-29
2021-03-16 Application filed by Kunming University of Science and Technology
2021-03-16 Priority to CN202110280508.1A
2021-06-29 Publication of CN113051936A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for enhancing Hanyue (Chinese-Vietnamese) neural machine translation based on low-frequency word representation, and belongs to the field of natural language processing. Low-frequency words are a key factor limiting the performance of neural machine translation models. Because they appear rarely in the training data, their representations are not learned accurately during training, and the problem is more pronounced in low-resource translation tasks. The method uses the context information of monolingual data to learn a probability distribution for each low-frequency word, recomputes the word embedding of the low-frequency word from that distribution, and then retrains a Transformer model on the resulting embeddings, effectively mitigating the inaccurate representation of low-frequency words. Experiments on two low-resource translation tasks, Chinese-Vietnamese (Han-Yue) and Vietnamese-Chinese (Yue-Han), show that the proposed method improves on the baseline model by 8.58% and 6.06% respectively.

Description

Method for enhancing Hanyue neural machine translation based on low-frequency word representation
Technical Field
The invention relates to a method for enhancing Hanyue neural machine translation based on low-frequency word representation, and belongs to the technical field of natural language processing.
Background
The core of word representation enhancement is learning more accurate word representations, and the main difficulty lies in representing low-frequency words. Existing word representation enhancement methods fall roughly into two categories: (1) methods based on external knowledge integration, which incorporate prior knowledge so that words carry richer meanings and their representations are enhanced; (2) methods based on internal knowledge enhancement, which re-learn word representations from monolingual data so that they contain richer translation information and become more accurate. Both approaches can enhance word representations to some extent, making the meaning of the enhanced words more consistent with the meaning of the sentence, but neither specifically targets low-frequency words, so the poor translation of low-frequency words remains unsolved.
Disclosure of Invention
The invention provides a method for enhancing Hanyue (Chinese-Vietnamese) neural machine translation based on low-frequency word representation, which addresses the poor representation of low-frequency words in neural machine translation by introducing a language model and a low-frequency word dictionary into a Transformer translation model.
The technical scheme of the invention is as follows: a method for enhancing Hanyue neural machine translation based on low-frequency word representation, comprising:
Step1, collecting a Chinese-Vietnamese bilingual corpus and preprocessing the collected corpus;
Step2, learning the probability distribution of each word through a language model;
Step3, constructing the Chinese and Vietnamese low-frequency word dictionaries;
Step4, using the low-frequency word dictionaries constructed in Step3 to identify low-frequency words in the translation model input, and updating the original representations of those words with the probability distributions from Step2 to obtain a new representation of the translation model input;
Step5, retraining the Transformer translation model on the representation obtained in Step4.
As a further scheme of the present invention, the Step1 specifically comprises the following steps:
Step1.1, linguistic experts translate the English side of the public IWSLT English-Vietnamese bilingual parallel corpus into Chinese, yielding a Chinese-Vietnamese parallel corpus;
Step1.2, the corpus is cleaned and segmented, finally yielding 127,481 Chinese-Vietnamese parallel sentence pairs;
Step1.3, Chinese sentences are segmented with a Chinese word segmentation tool, and Vietnamese sentences are processed with a tokenizer that splits off punctuation.
As a further scheme of the present invention, the Step2 specifically comprises the following steps:
Step2.1, for any word $w$ in the dictionary, the probability distribution is:
$P(w) = \left(P_1(w), P_2(w), P_3(w), \ldots, P_{|V|}(w)\right)$ (1)
which satisfies:
$\sum_{j=1}^{|V|} P_j(w) = 1$ (2)
Step2.2, the language model computes $P(w)$, i.e. the conditional probabilities of all words in the vocabulary $V$; for the t-th word $x_t$ in a sentence:
$P_j(x_t) = \mathrm{LM}(w_j \mid x_{<t})$ (3)
As a further scheme of the present invention, Step3 specifically comprises the following steps:
Step3.1, counting word frequencies separately for Chinese and Vietnamese;
Step3.2, defining low-frequency words according to the word-frequency distribution: word grades are determined by a maximum-value method, i.e. the words are sorted by number of occurrences from high to low, a word's grade is its word-order value k, and a low-frequency word dictionary $d_k$ is constructed for each word-order value k;
Step3.3, constructing the low-frequency word dictionary $D_K$:
$D_K = \bigcup_{k=1}^{K} d_k$ (4)
As a further scheme of the present invention, Step4 specifically comprises the following steps:
Step4.1, using the constructed low-frequency word dictionary $D_K$ to check each word $x_t$ of the input sentence: if $x_t \in D_K$, it is marked as a low-frequency word (Y); otherwise it is marked N;
Step4.2, if Y, the representation of $x_t$ is updated with the distribution $P(x_t)$ learned by the language model; if N, $x_t$ is kept unchanged. This yields a new source sequence $X'$;
Step4.3, multiplying the new source sequence $X'$ by the word embedding matrix $E$ of the dictionary $V$ to obtain the input of the translation model:
$\text{input} = X'E$ (5)
As a further aspect of the present invention, Step5 further includes:
Step5.1, finally obtaining the translation result through the Transformer translation model:
$\text{output} = \mathrm{Transformer}(\text{input}, Y)$ (6)
The beneficial effects of the invention are:
1. By introducing a language model and a low-frequency word dictionary into the Transformer model, the invention effectively addresses the poor representation of low-frequency words in neural machine translation.
2. The method further improves translation performance over both the classical Transformer model and the Transformer + LM model, which does not distinguish word frequency.
3. Experimental results show that the proposed Chinese-Vietnamese neural machine translation method with enhanced low-frequency word representation improves BLEU4 by 8.58% and 6.06% over the baseline model on the two low-resource translation tasks, Chinese-Vietnamese and Vietnamese-Chinese, respectively.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a translation model architecture of the present invention;
FIG. 3 is a diagram illustrating the influence of the class-K low-frequency word dictionary on the Chinese-Vietnamese model according to the present invention;
FIG. 4 is a diagram illustrating the influence of the class-K low-frequency word dictionary on the Vietnamese-Chinese model according to the present invention.
Detailed Description
For the purpose of describing the invention in more detail and facilitating understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings and examples, which are provided for illustration and understanding of the present invention and are not intended to limit the present invention.
Example 1: As shown in FIGS. 1-4, a method for enhancing Hanyue neural machine translation based on low-frequency word representation comprises the following steps:
Step1, collecting a Chinese-Vietnamese bilingual corpus and preprocessing the collected corpus;
Step1.1, linguistic experts translate the English side of the public IWSLT English-Vietnamese bilingual parallel corpus into Chinese, yielding a Chinese-Vietnamese parallel corpus;
Step1.2, the corpus is cleaned and segmented, finally yielding 127,481 Chinese-Vietnamese parallel sentence pairs;
Step1.3, Chinese sentences are segmented with a Chinese word segmentation tool, and Vietnamese sentences are processed with a tokenizer that splits off punctuation.
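The preprocessing in Step1.2 and Step1.3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the inventors' pipeline: the cleaning rules (dropping pairs with an empty side, exact duplicates, and pairs longer than 128 tokens) are hypothetical, the Chinese segmenter is passed in as a function (for example `jieba.lcut`, if the jieba package is used), and `tokenize_vi` is a hypothetical regex tokenizer that only splits punctuation off whitespace-separated Vietnamese syllables.

```python
import re
import unicodedata
from typing import Callable, Iterable, List, Tuple

# A token is either a run of word characters or a single punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize_vi(sentence: str) -> List[str]:
    """Split Vietnamese text on whitespace and cut punctuation off words.

    The text is NFC-normalised first so that tone-marked letters are single
    code points that the word-character class matches.
    """
    return _TOKEN.findall(unicodedata.normalize("NFC", sentence))


def preprocess(
    pairs: Iterable[Tuple[str, str]],
    segment_zh: Callable[[str], List[str]],
    max_len: int = 128,
) -> List[Tuple[List[str], List[str]]]:
    """Clean and tokenize (Chinese, Vietnamese) sentence pairs.

    Hypothetical cleaning rules: drop pairs with an empty side, exact
    duplicates, and pairs where either side exceeds max_len tokens.
    """
    seen = set()
    cleaned = []
    for zh, vi in pairs:
        zh, vi = zh.strip(), vi.strip()
        if not zh or not vi or (zh, vi) in seen:
            continue
        seen.add((zh, vi))
        zh_tokens, vi_tokens = segment_zh(zh), tokenize_vi(vi)
        if len(zh_tokens) <= max_len and len(vi_tokens) <= max_len:
            cleaned.append((zh_tokens, vi_tokens))
    return cleaned
```

Passing `segment_zh=list` gives character-level Chinese tokens, which is convenient for quick tests.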
Step2, learning the probability distribution of each word through a language model;
The language model uses the context information of monolingual data to learn the probability distributions of low-frequency words; that is, for a given source-target sentence pair, the language model yields a probability distribution for each word.
The purpose of the language model is to obtain, for each low-frequency word, a probability distribution over a vocabulary of size $|V|$. For any low-frequency word $w$:
$P(w) = \left(P_1(w), P_2(w), P_3(w), \ldots, P_{|V|}(w)\right)$ (1)
which satisfies:
$\sum_{j=1}^{|V|} P_j(w) = 1$ (2)
The distribution $P(w)$ of a low-frequency word $w$ can be computed in various ways. The invention uses a pre-trained 6-layer Transformer decoder as the language model to compute $P(w)$, i.e. the conditional probabilities of all words in $V$; for the t-th word $x_t$ of a sentence:
$P_j(x_t) = \mathrm{LM}(w_j \mid x_{<t})$ (3)
where $\mathrm{LM}(w_j \mid x_{<t})$ denotes the probability of the j-th dictionary word $w_j$ given the preceding words $x_{<t}$. Because the language model is trained on the same corpus as the translation model, the two share the same vocabulary size, so the distribution computed by the language model can be regarded as a smoothed approximation of the word's one-hot vector.
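Equations (1) to (3) can be illustrated with a toy model. The sketch below is an assumption for illustration only: `BigramLM` is a hypothetical class that conditions on the previous token rather than the full prefix $x_{<t}$ and uses add-alpha smoothing, whereas the invention uses a pre-trained 6-layer Transformer decoder. It still shows the key property: `distribution` returns one probability per vocabulary entry, summing to 1 as Eq. (2) requires, with the mass concentrated on likely words, i.e. a smoothed version of a one-hot vector.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

BOS = "<s>"  # hypothetical sentence-start symbol


class BigramLM:
    """Toy stand-in for LM(w_j | x_<t) in Eq. (3).

    Conditions only on the previous token and applies add-alpha smoothing,
    so every word in the vocabulary receives non-zero probability.
    """

    def __init__(self, corpus: List[List[str]], vocab: Sequence[str], alpha: float = 0.1):
        self.vocab = list(vocab)
        self.alpha = alpha
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence in corpus:
            prev = BOS
            for token in sentence:
                self.counts[prev][token] += 1
                prev = token

    def distribution(self, prefix: List[str]) -> List[float]:
        """Return P(x_t) = (P_1(x_t), ..., P_|V|(x_t)) given the prefix x_<t (Eq. 1)."""
        prev = prefix[-1] if prefix else BOS
        counts = self.counts.get(prev, Counter())
        # Normalise over the vocabulary so the entries sum to 1 (Eq. 2).
        total = sum(counts[w] for w in self.vocab) + self.alpha * len(self.vocab)
        return [(counts[w] + self.alpha) / total for w in self.vocab]
```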
Step3, constructing the Chinese and Vietnamese low-frequency word dictionaries;
As a further scheme of the present invention, Step3 specifically comprises the following steps:
Step3.1, counting word frequencies separately for Chinese and Vietnamese;
Step3.2, defining low-frequency words according to the word-frequency distribution: word grades are determined by a maximum-value method, i.e. the words are sorted by number of occurrences from high to low, a word's grade is its word-order value k, and a low-frequency word dictionary $d_k$ is constructed for each word-order value k;
Step3.3, constructing the low-frequency word dictionary $D_K$:
$D_K = \bigcup_{k=1}^{K} d_k$ (4)
Specifically, separate Chinese and Vietnamese low-frequency word dictionaries are built statistically from the Chinese and Vietnamese training sets. The dictionary of words whose word-order value equals k is called the class-k low-frequency word dictionary $d_k$, and the dictionary of words whose word-order value is less than or equal to K is called the class-K low-frequency word dictionary $D_K$. Both kinds of dictionary are built for k (and K) from 1 to 10. For each low-frequency word dictionary, the dictionary coverage is computed as the ratio of its size to the size of the full dictionary, which is obtained by counting the training set.
The Chinese vocabulary size is 47,356, and the training set contains 2,275,526 words in total. The class-k low-frequency dictionaries for k = 1, ..., 10 contain 18,496, 6,656, 3,787, 2,508, 1,812, 1,397, 1,067, 832, 719 and 593 words respectively. The Vietnamese vocabulary size is 22,732, and the training set contains 3,189,350 words in total. The corresponding class-k low-frequency dictionaries contain 9,428, 3,188, 1,667, 1,006, 718, 514, 393, 340, 188 and 223 words respectively.
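A minimal Python sketch of Step3, assuming the word-order value k is the number of times a word occurs in the training set (the experiments below describe K as an occurrence-frequency threshold). The function names `build_low_freq_dicts` and `coverage` are hypothetical; `d[k]` plays the role of $d_k$ and `D[K]` the role of $D_K$ in Eq. (4).

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def build_low_freq_dicts(
    sentences: Iterable[List[str]], max_k: int = 10
) -> Tuple[Counter, Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Count words and build the class-k dictionaries d_k and class-K dictionaries D_K.

    d[k] holds the words occurring exactly k times; D[K] = d[1] | ... | d[K] (Eq. 4).
    """
    counts = Counter(token for sentence in sentences for token in sentence)
    d: Dict[int, Set[str]] = {k: set() for k in range(1, max_k + 1)}
    for word, freq in counts.items():
        if freq <= max_k:
            d[freq].add(word)
    D: Dict[int, Set[str]] = {}
    running: Set[str] = set()
    for K in range(1, max_k + 1):
        running = running | d[K]  # new set each step, so the D[K] values stay independent
        D[K] = running
    return counts, d, D


def coverage(low_freq: Set[str], counts: Counter) -> float:
    """Dictionary coverage: size of a low-frequency dictionary / size of the full dictionary."""
    return len(low_freq) / len(counts)
```

Because `D[K]` is built as a running union, each class-K dictionary contains every class-k dictionary with k ≤ K, matching the definition above.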
Step4, using the low-frequency word dictionaries constructed in Step3 to identify low-frequency words in the translation model input, and updating the original representations of those words with the probability distributions from Step2 to obtain a new representation of the translation model input;
As a further scheme of the present invention, Step4 specifically comprises the following steps:
Step4.1, using the constructed low-frequency word dictionary $D_K$ to check each word $x_t$ of the input sentence: if $x_t \in D_K$, it is marked as a low-frequency word (Y); otherwise it is marked N;
Step4.2, if Y, the representation of $x_t$ is updated with the distribution $P(x_t)$ learned by the language model; if N, $x_t$ is kept unchanged. This yields a new source sequence $X'$;
Step4.3, multiplying the new source sequence $X'$ by the word embedding matrix $E$ of the dictionary $V$ to obtain the input of the translation model:
$\text{input} = X'E$ (5)
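Step4 can be sketched with NumPy as follows. `build_soft_input` is a hypothetical helper: it assumes the language model is available as a callable returning $P(x_t)$ for a prefix (for instance a Transformer decoder's softmax output) and that $E$ is a |V| × d embedding matrix. Ordinary words keep their one-hot row in $X'$, so $X'E$ reduces to an embedding lookup for them, while low-frequency words receive a probability-weighted mixture of embeddings.

```python
from typing import Callable, Dict, List, Sequence, Set

import numpy as np


def build_soft_input(
    tokens: List[str],
    vocab_index: Dict[str, int],
    low_freq: Set[str],
    lm_dist: Callable[[List[str]], Sequence[float]],
    E: np.ndarray,
) -> np.ndarray:
    """Return input = X'E (Eq. 5) for one source sentence.

    Rows of X' are one-hot vectors for ordinary words and LM distributions
    P(x_t) for words found in the low-frequency dictionary D_K.
    """
    X_prime = np.zeros((len(tokens), E.shape[0]))
    for t, token in enumerate(tokens):
        if token in low_freq:                 # Step4.1: x_t in D_K -> Y
            X_prime[t] = lm_dist(tokens[:t])  # Step4.2: replace with P(x_t)
        else:                                 # N: keep the original one-hot row
            X_prime[t, vocab_index[token]] = 1.0
    return X_prime @ E                        # Step4.3: input = X'E
```

Because each row of $X'$ sums to 1, every input vector is a convex combination of rows of $E$.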
Step5, retraining the Transformer translation model on the representation obtained in Step4. The translation result is finally obtained through the Transformer translation model:
$\text{output} = \mathrm{Transformer}(\text{input}, Y)$ (6)
To train and validate the model effectively, 2,000 sentence pairs are randomly drawn from the Chinese-Vietnamese bilingual parallel data as the test set and another 2,000 as the validation set, and the remaining pairs are used as the training set. The data statistics are given in Table 1:
TABLE 1 data size and data set partitioning
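The random split described above can be sketched as follows; `split_corpus` and its fixed seed are assumptions for illustration. The test and validation sets are drawn without overlap, and every remaining pair goes to training.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(
    pairs: Sequence[T], n_test: int = 2000, n_valid: int = 2000, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Randomly draw disjoint test and validation sets; the rest is training data."""
    if n_test + n_valid > len(pairs):
        raise ValueError("corpus is smaller than the requested test + validation sets")
    order = list(range(len(pairs)))
    random.Random(seed).shuffle(order)  # fixed seed (an assumption) for reproducibility
    test = [pairs[i] for i in order[:n_test]]
    valid = [pairs[i] for i in order[n_test:n_test + n_valid]]
    train = [pairs[i] for i in order[n_test + n_valid:]]
    return train, valid, test
```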
In the Chinese-Vietnamese translation task, a Transformer decoder is used as the Chinese language model. Its training and validation sets are taken from the Chinese side of the translation corpus and contain 127,481 and 2,000 Chinese monolingual sentences respectively. In the Vietnamese-Chinese translation task, the language model has the same architecture as in the Chinese-Vietnamese task, and its training and validation sets are taken from the Vietnamese side of the translation corpus, containing 127,481 and 2,000 Vietnamese monolingual sentences respectively.
Low-frequency words are poorly handled in low-resource Chinese-Vietnamese neural machine translation. To distinguish low-frequency words from other words, so that representation enhancement is applied only to the low-frequency words, low-frequency word dictionaries are constructed. The Chinese and Vietnamese class-k low-frequency word dictionaries are shown in Table 2:
TABLE 2 Chinese and Vietnamese class-k low-frequency words by word-order value k
In the invention, the Chinese vocabulary size is 47,356 and the Vietnamese vocabulary size is 22,732; the maximum batch size (Maxpot) is 2,048, the maximum sentence length is 128, training runs for at most 100 epochs, dropout is 0.1, and both the word embedding and hidden layer dimensions are 512. All models are trained with the Adam optimizer with an initial learning rate of 10^-4.
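For reference, the reported hyperparameters can be collected in a single configuration object. The dataclass below is a hypothetical sketch: the field names are invented, the values are the ones stated in the text, and the unit of the 2,048 batch limit (tokens or sentences) is not specified in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HanyueConfig:
    """Hypothetical container for the hyperparameters listed in the text."""

    zh_vocab_size: int = 47_356
    vi_vocab_size: int = 22_732
    max_batch: int = 2_048        # maximum batch size; unit (tokens or sentences) not stated
    max_sentence_len: int = 128
    max_epochs: int = 100
    dropout: float = 0.1
    embed_dim: int = 512
    hidden_dim: int = 512
    lm_decoder_layers: int = 6    # pre-trained Transformer decoder used as the LM (Step2)
    optimizer: str = "adam"
    learning_rate: float = 1e-4
```

With PyTorch, for example, the optimizer could be created as `torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)` after `cfg = HanyueConfig()`.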
After training of the language model is completed, its best parameters are saved and kept fixed while the translation model is trained. The method is evaluated on both the Chinese-Vietnamese and Vietnamese-Chinese tasks using the Chinese-Vietnamese parallel data. BLEU4 on the test set is the evaluation metric, and statistical significance is tested by bootstrap resampling (1,000 resamples) at a significance level of p < 0.05.
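The significance test can be sketched as paired bootstrap resampling, a common way to compare two MT systems on one test set. In this hypothetical `paired_bootstrap` helper, each test sentence contributes per-sentence statistics (for BLEU4, n-gram match counts and lengths), `corpus_metric` aggregates a resampled list of them into a corpus score, and the return value is the fraction of the 1,000 resamples in which system B fails to beat system A; B is considered significantly better when this value is below 0.05.

```python
import random
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def paired_bootstrap(
    stats_a: Sequence[S],
    stats_b: Sequence[S],
    corpus_metric: Callable[[List[S]], float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate p for 'system B beats system A' by paired bootstrap resampling.

    stats_a / stats_b hold per-sentence statistics for the same test sentences;
    corpus_metric turns a list of them into a corpus score (e.g. BLEU4).
    Returns the fraction of resamples in which B fails to beat A.
    """
    if len(stats_a) != len(stats_b):
        raise ValueError("both systems must be scored on the same test sentences")
    rng = random.Random(seed)
    n = len(stats_a)
    failures = 0
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n sentences with replacement
        score_a = corpus_metric([stats_a[i] for i in sample])
        score_b = corpus_metric([stats_b[i] for i in sample])
        if score_b <= score_a:
            failures += 1
    return failures / n_resamples
```

In practice `corpus_metric` would compute corpus-level BLEU4 from the resampled statistics; the sketch keeps it abstract so that it stays short.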
The following two models are used as baselines. (1) The classical Transformer model (Transformer): the Transformer_base configuration is used in both the Chinese-Vietnamese and Vietnamese-Chinese translation tasks. (2) A Transformer with an added language model (Transformer + LM): the inputs of the translation model are randomly replaced using the output of the trained language model with replacement probability γ = 0.15 (the optimal setting reported in reference [2]); it is likewise evaluated on both the Chinese-Vietnamese and Vietnamese-Chinese tasks.
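For contrast with Step4, the input construction of the Transformer + LM baseline can be sketched as below. `random_soft_replace` is a hypothetical helper: each token, whatever its frequency, is replaced by its language-model distribution with probability γ (0.15 by default, as in the baseline), and otherwise keeps its one-hot row. The language model is assumed to be a callable returning $P(x_t)$ for a prefix.

```python
from typing import Callable, Dict, List, Optional, Sequence

import numpy as np


def random_soft_replace(
    tokens: List[str],
    vocab_index: Dict[str, int],
    lm_dist: Callable[[List[str]], Sequence[float]],
    E: np.ndarray,
    gamma: float = 0.15,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Transformer + LM baseline input: every token, regardless of frequency,
    is replaced by its LM distribution with probability gamma."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.zeros((len(tokens), E.shape[0]))
    for t, token in enumerate(tokens):
        if rng.random() < gamma:      # replace with P(x_t) from the LM
            X[t] = lm_dist(tokens[:t])
        else:                         # keep the one-hot row
            X[t, vocab_index[token]] = 1.0
    return X @ E
```

The difference from the proposed method lies in the selection rule: random with probability γ here, versus membership in $D_K$ in Step4. In this sketch the baseline's selection is also redrawn on every call.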
To verify the effectiveness of the method, it is compared with two baseline models, the classical Transformer model and the Transformer + LM model used in prior work, on the same corpus of 127,481 Chinese-Vietnamese parallel sentence pairs. Experiments are carried out in both translation directions, Chinese-Vietnamese and Vietnamese-Chinese, and the BLEU4 score of each model is reported in Table 3.
TABLE 3 Experimental results for Chinese-Vietnamese and Vietnamese-Chinese translation
As Table 3 shows, on the Chinese-Vietnamese and Vietnamese-Chinese tasks the Transformer + LM model improves BLEU4 over the classical Transformer model by 0.87 and 0.59 respectively, and the proposed method improves BLEU4 over the Transformer + LM model by a further 0.84 and 0.68 respectively. The method therefore outperforms both the Transformer and Transformer + LM baselines in both directions, showing that low-frequency word representation enhancement is effective for Chinese-Vietnamese and Vietnamese-Chinese translation. The Transformer + LM model outperforms the classical Transformer because it uses the language model to inject word context information at random positions, giving the randomly selected words richer information; this confirms the value of the context information it introduces. The proposed method clearly improves on Transformer + LM because it takes word frequency into account and applies contextual probability estimation only to low-frequency words, whereas Transformer + LM does not distinguish low-frequency words from other words, which lowers its performance. The results show that the method alleviates the poor translation of low-frequency words and has clear advantages on both the Chinese-Vietnamese and Vietnamese-Chinese tasks.
To analyze how the occurrence frequency of low-frequency words affects the method, and following the flow shown in FIG. 1, model performance is tested on the Chinese-Vietnamese and Vietnamese-Chinese tasks using low-frequency dictionaries that contain the words whose occurrence frequency is less than or equal to K (K = 1, 2, ..., 10). The results are shown in FIGS. 3 and 4.
As FIGS. 3 and 4 show, in both the Chinese-Vietnamese and Vietnamese-Chinese tasks BLEU4 first rises and then falls as K increases. BLEU4 peaks at K = 5 and K = 6 respectively, i.e. when low-frequency words are defined as words appearing at most 5 and 6 times in the training set (70.25% and 70.66% of the vocabulary respectively). K = 0 corresponds to the classical Transformer model, and for every K = 1, 2, ..., 10 the method outperforms it. On the rising part of the curve, the method surpasses the Transformer + LM model at K = 3; on the falling part, the Transformer + LM model is slightly better than the method at K = 9 and K = 10.
As FIG. 3 shows, at K = 0 (the classical Transformer model) the Transformer + LM model is better, because it introduces context information for randomly chosen words. For K ≤ 5, model performance rises steadily: words in the class-K low-frequency dictionary occur so rarely that the translation model cannot represent them well, and replacing their representations with ones derived from their context enriches them with contextual semantic information. For K > 5, words occurring more than 5 times enter the dictionary; these words are already trained adequately, and their learned representations are better than the enhanced representations provided by the language model, so adding them does not improve translation. As a result, translation quality keeps declining as K grows beyond 5.
TABLE 4 Analysis of Chinese-Vietnamese translation examples
TABLE 5 Analysis of Vietnamese-Chinese translation examples
As Table 4 shows, the method translates low-frequency words better. In the Chinese-Vietnamese example, the low-frequency words include "prerequisite", which appears 5 times in the Chinese training set; the method's Vietnamese translation of it is essentially identical in meaning to the reference translation, showing that the sentence context is well understood. The baseline translation, produced by the Transformer + LM model, renders it with an expression beginning "nhu", which does not convey the meaning of the low-frequency word "prerequisite". As Table 5 shows, in the Vietnamese-Chinese example the word "ng-n", which appears 6 times in the Vietnamese training set, is included in the low-frequency word dictionary. Both the reference translation and the translation of the present invention render it as "depressed", whereas the baseline, the Transformer model, translates it as "urgent", which is far from the reference and not an acceptable translation. The method can therefore represent and translate low-frequency words well and better capture their meaning in the sentence.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. A method for enhancing Chinese-Vietnamese (Hanyue) neural machine translation based on low-frequency word representation, comprising the following steps:
Step1, collecting a Chinese-Vietnamese bilingual corpus and preprocessing the collected corpus;
Step2, learning the probability distribution of each word through a language model;
Step3, constructing the Chinese and Vietnamese low-frequency word dictionaries;
Step4, using the low-frequency word dictionaries constructed in Step3 to identify low-frequency words in the translation model input, and updating the original representations of those words with the probability distributions from Step2 to obtain a new representation of the translation model input;
Step5, retraining the Transformer translation model on the representation obtained in Step4.
2. The method for enhancing Hanyue neural machine translation based on low-frequency word representation according to claim 1, wherein the specific steps of Step1 are as follows:
Step1.1, linguistic experts translate the English side of the public IWSLT English-Vietnamese bilingual parallel corpus into Chinese, yielding a Chinese-Vietnamese parallel corpus;
Step1.2, the corpus is cleaned and segmented, finally yielding 127,481 Chinese-Vietnamese parallel sentence pairs;
Step1.3, Chinese sentences are segmented with a Chinese word segmentation tool, and Vietnamese sentences are processed with a tokenizer that splits off punctuation.
3. The method for enhancing Hanyue neural machine translation based on low-frequency word representation according to claim 1, wherein the specific steps of Step2 are as follows:
Step2.1, for any word $w$ in the dictionary, the probability distribution is:
$P(w) = \left(P_1(w), P_2(w), P_3(w), \ldots, P_{|V|}(w)\right)$ (1)
which satisfies:
$\sum_{j=1}^{|V|} P_j(w) = 1$ (2)
Step2.2, the language model computes $P(w)$, i.e. the conditional probabilities of all words in the vocabulary $V$; for the t-th word $x_t$ in a sentence:
$P_j(x_t) = \mathrm{LM}(w_j \mid x_{<t})$ (3)
4. The method for enhancing Hanyue neural machine translation based on low-frequency word representation according to claim 1, wherein the specific steps of Step3 are as follows:
Step3.1, counting word frequencies separately for Chinese and Vietnamese;
Step3.2, defining low-frequency words according to the word-frequency distribution: word grades are determined by a maximum-value method, i.e. the words are sorted by number of occurrences from high to low, a word's grade is its word-order value k, and a low-frequency word dictionary $d_k$ is constructed for each word-order value k;
Step3.3, constructing the low-frequency word dictionary $D_K$:
$D_K = \bigcup_{k=1}^{K} d_k$ (4)
5. The method for enhancing Hanyue neural machine translation based on low-frequency word representation according to claim 1, wherein the specific steps of Step4 are as follows:
Step4.1, using the constructed low-frequency word dictionary $D_K$ to check each word $x_t$ of the input sentence: if $x_t \in D_K$, it is marked as a low-frequency word (Y); otherwise it is marked N;
Step4.2, if Y, the representation of $x_t$ is updated with the distribution $P(x_t)$ learned by the language model; if N, $x_t$ is kept unchanged. This yields a new source sequence $X'$;
Step4.3, multiplying the new source sequence $X'$ by the word embedding matrix $E$ of the dictionary $V$ to obtain the input of the translation model:
$\text{input} = X'E$ (5)
6. The method for enhancing Hanyue neural machine translation based on low-frequency word representation according to claim 1, wherein Step5 further comprises:
Step5.1, finally obtaining the translation result through the Transformer translation model:
$\text{output} = \mathrm{Transformer}(\text{input}, Y)$ (6)
CN202110280508.1A 2021-03-16 2021-03-16 Method for enhancing Hanyue neural machine translation based on low-frequency word representation Pending CN113051936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110280508.1A CN113051936A (en) 2021-03-16 2021-03-16 Method for enhancing Hanyue neural machine translation based on low-frequency word representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110280508.1A CN113051936A (en) 2021-03-16 2021-03-16 Method for enhancing Hanyue neural machine translation based on low-frequency word representation

Publications (1)

Publication Number Publication Date
CN113051936A true CN113051936A (en) 2021-06-29

Family

ID=76512520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110280508.1A Pending CN113051936A (en) 2021-03-16 2021-03-16 Method for enhancing Hanyue neural machine translation based on low-frequency word representation

Country Status (1)

Country Link
CN (1) CN113051936A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108038725A (en) * | 2017-12-04 | 2018-05-15 | China Jiliang University | Machine-learning-based method for analyzing customer satisfaction with e-commerce products
CN109117480A (en) * | 2018-08-17 | 2019-01-01 | Tencent Technology (Shenzhen) Co., Ltd. | Word prediction method and apparatus, computer device, and storage medium
CN111428518A (en) * | 2019-01-09 | 2020-07-17 | iFLYTEK Co., Ltd. | Low-frequency word translation method and device
CN112215017A (en) * | 2020-10-22 | 2021-01-12 | Inner Mongolia University of Technology | Mongolian-Chinese machine translation method based on pseudo-parallel corpus construction

Similar Documents

Publication Publication Date Title
CN110442760B (en) Synonym mining method and device for question-answer retrieval system
CN108399163B (en) Text similarity measurement method combining word aggregation and word combination semantic features
CN110378409B (en) Chinese-Vietnamese news document summary generation method based on an element-association attention mechanism
CN112464676B (en) Machine translation result scoring method and device
CN109359294A (en) Classical Chinese translation method based on neural machine translation
CN107870901A (en) Method, program, device and system for generating similar text from a translation source text
CN111914532A (en) Chinese composition scoring method
CN110489554B (en) Attribute-level emotion classification method based on location-aware mutual attention network model
Elsherif et al. Perspectives of Arabic machine translation
CN115935959A (en) Sequence labeling method for low-resource agglutinative languages
Yousif Hidden Markov Model tagger for applications based Arabic text: A review
Forsyth Automatic readability prediction for modern standard Arabic
CN111815426B (en) Data processing method and terminal related to financial investment and research
CN116757188A (en) Cross-language information retrieval training method based on alignment query entity pairs
Joshi et al. Word embeddings in low resource Gujarati language
Khaliq et al. Induction of root and pattern lexicon for unsupervised morphological analysis of Arabic
CN113051936A (en) Method for enhancing Hanyue neural machine translation based on low-frequency word representation
CN110674871B (en) Translation-oriented automatic scoring method and automatic scoring system
CN111709245A (en) Chinese-Yuan pseudo parallel sentence pair extraction method based on semantic self-adaptive coding
Kornilov et al. Numerical Assessment of Machine Translation Quality by Method of Near Duplicates Analysis
ZHANG A New Method for Improving the Accuracy of Word Segmentation in Modern Chinese Texts
Li An automated English translation judging system based on feature extraction algorithm
CN113627152B (en) Unsupervised machine reading comprehension training method based on self-supervised learning
CN115034239B (en) Chinese-Vietnamese (Han-Yue) neural machine translation method based on denoising prototype sequences
Deng Using Machine learning Techniques to Improve the Accuracy and Fluency of Japanese Translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629