CN109657244B - English long sentence automatic segmentation method and system - Google Patents


Info

Publication number: CN109657244B
Authority: CN (China)
Prior art keywords: sequence, english, neural network, network model, sentence
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201811549280.6A
Other languages: Chinese (zh)
Other versions: CN109657244A (en)
Inventor: 张睦
Current Assignee: Iol Wuhan Information Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Iol Wuhan Information Technology Co ltd
Priority date: 2018-12-18 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2018-12-18
Publication date: 2023-04-18
Application filed by Iol Wuhan Information Technology Co ltd
Priority to CN201811549280.6A
Publication of CN109657244A
Application granted
Publication of CN109657244B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

An embodiment of the invention provides a method and a system for automatically segmenting long English sentences. The method comprises: obtaining a long English sentence to be segmented; and inputting the long English sentence to be segmented into a trained sequence-to-sequence framework neural network model, which outputs two short English sentences. The method and system use the sequence-to-sequence neural network model for pattern recognition to automatically segment a long English sentence into two short sentences, which greatly saves human resources.

Description

English long sentence automatic segmentation method and system
Technical Field
The embodiment of the invention relates to the technical field of translation, in particular to an automatic English long sentence segmentation method and system.
Background
After an English translator from a native Chinese-speaking country completes a Chinese-to-English translation, a translation company often invites a language expert from a native English-speaking country to review the translation in order to further ensure its quality. Comparing a batch of translators' translations with the versions reviewed by experts shows that, apart from simple corrections of grammar, spelling and editing errors, the most common type of revision made by foreign experts is replacing one long English sentence with two short sentences that have the same meaning and are logically consecutive.
Because of the linguistic differences between Chinese and English, a short Chinese text often requires a longer English text to convey the complete information. The version reviewed by the foreign expert is better organized, and its phrasing gives the reader a better reading experience. However, inviting foreign experts to do the review work incurs very high labor costs and consumes considerable human resources.
Therefore, there is a need for an automatic English long sentence segmentation method to solve the above problems.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide an automatic English long sentence segmentation method and system that overcome the above problems or at least partially solve the above problems.
The first aspect of the present invention provides an automatic English long sentence segmentation method, including:
obtaining English long sentences to be divided;
and inputting the English long sentence to be divided into the trained sequence-to-sequence framework neural network model, and outputting two English short sentences.
In a second aspect, an embodiment of the present invention provides an automatic English long sentence segmentation system, including:
the acquisition module is used for acquiring English long sentences to be divided;
and the automatic segmentation module is used for inputting the English long sentence to be divided into the trained sequence-to-sequence framework neural network model and outputting two English short sentences.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a processor, a memory, a communication interface, and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the memory stores program instructions which can be executed by the processor, and the processor calls the program instructions to execute the English long sentence automatic segmentation method.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, which stores computer instructions, where the computer instructions cause the computer to execute the method for automatically segmenting long English sentences as described above.
The method and system for automatically segmenting long English sentences provided by the embodiments of the invention use the sequence-to-sequence neural network model for pattern recognition to automatically segment a long English sentence into two short sentences, which greatly saves human resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of an automatic English long sentence segmentation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an encoder structure provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first short sentence decoder according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second short sentence decoder according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an automatic English long sentence segmentation system according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, after an English translator from a native Chinese-speaking country completes a Chinese-to-English translation, a translation company usually invites a language expert from a native English-speaking country to review the translation in order to further ensure its quality. Comparing a batch of translators' translations with the versions reviewed by experts shows that, apart from simple corrections of grammar, spelling and editing errors, the most common type of revision made by foreign experts is replacing one long English sentence with two short sentences that are identical in meaning and logically consecutive. The main reason is not the translator's own level of expertise but the linguistic differences between Chinese and English: when a short piece of Chinese text is translated into English, a longer English text is often needed to convey the complete information.
Table 1. Typical translation example provided by an embodiment of the present invention
[Table 1 is reproduced as images in the original publication: Figure BDA0001910224090000031, Figure BDA0001910224090000041]
Table 1 shows a typical translation example provided by an embodiment of the present invention. As shown in Table 1, the two translations differ only slightly in terms of string edit distance. Nevertheless, the version reviewed by the foreign expert is clearly better organized, and its phrasing gives the reader a better reading experience. However, the labor cost of inviting foreign experts to do the review work is very high.
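To make the notion of "string edit distance" concrete, the following is a minimal sketch of the Levenshtein distance in Python. The patent does not say which edit distance variant or normalization it has in mind, so the unit costs and the `normalized_distance` helper are assumptions, and the two example sentences are hypothetical rather than taken from Table 1.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical sentences, not the ones in Table 1.
long_sentence = "The model was trained on reviewed data, and it splits long sentences."
two_sentences = "The model was trained on reviewed data. It splits long sentences."
print(edit_distance(long_sentence, two_sentences))
print(round(normalized_distance(long_sentence, two_sentences), 3))
```

A small normalized distance between a one-sentence and a two-sentence rendering is what the paragraph above describes: the reviewer changes few characters, yet the result reads better.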
To solve the above problem, FIG. 1 is a schematic flow chart of an automatic English long sentence segmentation method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
101. obtaining a long English sentence to be segmented;
102. inputting the long English sentence to be segmented into the trained sequence-to-sequence framework neural network model, and outputting two short English sentences.
It can be understood that, to automatically segment long English sentences, the embodiment of the present invention provides a trained sequence-to-sequence framework neural network model. Segmentation is completed automatically simply by inputting the long English sentence to be segmented into this model; the output is two short English sentences, referred to in the embodiment of the present invention as the first short sentence and the second short sentence.
Specifically, in step 101, one or more long English sentences to be segmented are obtained. It should be noted that the embodiment of the present invention places no limitation on the specific length or type of the long English sentence.
Then, in step 102, the long English sentence to be segmented is input into the trained sequence-to-sequence framework neural network model. This model is trained in advance and can automatically perform the sentence segmentation function. For training, historical original texts, translator translations and reviewed translations are accumulated as the training set and labeled respectively, so that the neural network model learns the revision pattern of the reviewed translations and thereby performs the final segmentation.
The method and system for automatically segmenting long English sentences use the sequence-to-sequence neural network model for pattern recognition to automatically segment a long English sentence into two short sentences, which greatly saves human resources.
On the basis of the above embodiment, before the long English sentence to be segmented is input into the trained sequence-to-sequence framework neural network model and two short English sentences are output, the method further includes:
obtaining a corpus data set, wherein the corpus data set comprises original texts, translator translations and reviewed translations;
and training a preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set to obtain the trained sequence-to-sequence framework neural network model.
As can be seen from the above embodiment, the embodiment of the present invention provides a trained sequence-to-sequence framework neural network model; this model must first be trained on a training sample set before it can perform automatic segmentation.
In the embodiment of the invention, the original texts, translator translations and reviewed translations serve as the corpus data set and are used, in one-to-one correspondence, as the training set for the sequence-to-sequence framework neural network model. It should be noted that the selected corpus data set consists of historically translated data. During training, the latest Wikipedia Chinese and English monolingual corpora are downloaded and word-segmented. Chinese and English word vectors are then trained with the Skip-Gram algorithm; the main training hyper-parameters may preferably be set as follows: word vector dimension 300 and context window 5.
Finally, the sequence-to-sequence framework neural network model is trained on the training sample set together with the previously trained Chinese and English word vectors.
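As an illustration of what the context window means in Skip-Gram training, the sketch below generates the (center word, context word) pairs that a Skip-Gram trainer would learn from. It covers only pair generation; learning the 300-dimensional vectors themselves requires a word2vec implementation, which is not shown. The function name `skipgram_pairs` and the sample sentence are hypothetical.

```python
from typing import Iterator, List, Tuple

VECTOR_DIM = 300    # word-vector dimension stated in the text (used by the trainer, not here)
CONTEXT_WINDOW = 5  # context window stated in the text


def skipgram_pairs(tokens: List[str],
                   window: int = CONTEXT_WINDOW) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for every word within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                yield center, tokens[j]


# Hypothetical, already word-segmented sentence.
tokens = "the expert split the long sentence into two short sentences".split()
pairs = list(skipgram_pairs(tokens, window=2))
print(pairs[:4])
```

A larger window produces more pairs per word and tends to capture broader topical similarity, while a smaller window emphasizes local syntactic context.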
On the basis of the above embodiment, before training the preset sequence-to-sequence framework neural network model with the corpus data set as a training sample set, the method further includes:
performing word segmentation and sentence segmentation preprocessing on the text in the corpus data set.
As can be seen from the above embodiment, the embodiment of the present invention accumulates original texts, translator translations and reviewed translations as the corpus data set, and then performs word segmentation and sentence segmentation preprocessing on the text in the corpus data set.
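The patent does not name the tools used for this preprocessing. As a rough sketch for the English side only, sentence and word segmentation can be approximated with regular expressions as below; the patterns and function names are assumptions, and Chinese text would need a dedicated word segmenter, which is not shown.

```python
import re
from typing import List

# Split after ., ! or ? when followed by whitespace (a simplifying assumption;
# abbreviations such as "e.g." would be split incorrectly).
_SENT_END = re.compile(r"(?<=[.!?])\s+")
# A word (optionally with an apostrophe suffix) or a single punctuation mark.
_TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text: str) -> List[str]:
    """Split a paragraph into sentences."""
    return [s for s in _SENT_END.split(text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


for sentence in split_sentences("It works. Does it? Yes!"):
    print(tokenize(sentence))
```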
Specifically, N triples of the form (one original sentence, one translator-translated sentence, (a first reviewed sentence, a second reviewed sentence)) are screened from the preprocessed corpus for model training and validation testing.
The data set is denoted $D = \{D_1, D_2, D_3, \ldots, D_N\}$, where $D_i = (SRC_i, TRANS_i, (REVIEW_{i1}, REVIEW_{i2}))$. 20% of $D$ is randomly extracted as the validation test set $D_{test}$, and the remaining 80% is used as the training set $D_{train}$.
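The triple structure and the random 80/20 split described above can be sketched as follows. The `Sample` type, the `split_dataset` helper and the fixed random seed are illustrative assumptions; the patent specifies only the proportions.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    """One element D_i = (SRC_i, TRANS_i, (REVIEW_i1, REVIEW_i2))."""
    src: str                 # original (Chinese) sentence
    trans: str               # translator's long English sentence
    review: Tuple[str, str]  # the reviewer's two short English sentences


def split_dataset(data: Sequence[Sample],
                  test_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Randomly hold out `test_ratio` of the data; return (D_train, D_test)."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical placeholder data.
data = [Sample(f"src{i}", f"trans{i}", (f"review{i}a", f"review{i}b"))
        for i in range(10)]
d_train, d_test = split_dataset(data)
print(len(d_train), len(d_test))
```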
On the basis of the above embodiment, the sequence-to-sequence framework neural network model includes:
the system comprises an original text encoder, a translated text encoder, a first short sentence decoder and a second short sentence decoder.
Training the preset sequence-to-sequence framework neural network model with the corpus data set as a training sample set comprises the following steps:
combining the original text vector and the translation vector in the training sample set into a first vector based on the original text encoder and the translation encoder;
generating a first short sentence and a second vector based on the first short sentence decoder and the first vector;
generating a second short sentence based on the second short sentence decoder and the second vector.
The sequence-to-sequence framework neural network model provided by the embodiment of the invention mainly comprises four components, namely an original text encoder, a translation encoder, a first short sentence decoder and a second short sentence decoder.
FIG. 2 is a schematic diagram of the encoder structure provided by an embodiment of the present invention. As shown in FIG. 2, the original text encoder and the translation encoder use an LSTM recurrent neural network to encode the original text and the translator translation into an original text vector $C_{src}$ and a translation vector $C_{trans}$, respectively. The two vectors are concatenated into a new vector $C$, which is the first vector in the embodiment of the present invention.
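The dual-encoder step can be sketched with a minimal LSTM written in plain Python. This is not the patented implementation: the weights are random and untrained, taking the final hidden state as the sentence vector is an assumption, and the names `TinyLSTMEncoder` and `encode_pair` are hypothetical. The sketch only shows that each encoder yields one vector and that the two vectors are concatenated into $C$.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMEncoder:
    """Single-layer LSTM; the final hidden state serves as the sentence vector."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}

    def _gate(self, g: str, xh: Vector, act: Callable[[float], float]) -> Vector:
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def encode(self, word_vectors: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in word_vectors:
            xh = x + h  # concatenate current input with previous hidden state
            i = self._gate("i", xh, _sigmoid)
            f = self._gate("f", xh, _sigmoid)
            o = self._gate("o", xh, _sigmoid)
            g = self._gate("g", xh, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def encode_pair(src_vecs: List[Vector], trans_vecs: List[Vector],
                src_enc: TinyLSTMEncoder, trans_enc: TinyLSTMEncoder) -> Vector:
    c_src = src_enc.encode(src_vecs)        # C_src
    c_trans = trans_enc.encode(trans_vecs)  # C_trans
    return c_src + c_trans                  # list concatenation gives C


# Toy 4-dimensional word vectors instead of the 300-dimensional ones in the text.
src_enc = TinyLSTMEncoder(input_dim=4, hidden_dim=3, seed=1)
trans_enc = TinyLSTMEncoder(input_dim=4, hidden_dim=3, seed=2)
src = [[0.1, 0.2, 0.3, 0.4]] * 5
trans = [[0.4, 0.3, 0.2, 0.1]] * 7
C = encode_pair(src, trans, src_enc, trans_enc)
print(len(C))
```

Because the two sentences may differ in length, each encoder compresses its input to a fixed-size vector before concatenation, so $C$ always has the same dimension.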
For example, the original text "I did not want to preserve the incenses before, but rather wishes to drag the tiger with the old and weak, and create a chance for you to escape." and the translator translation "What I ear before do not mean narrowing the translation using my old and week body so all have a change to escape" are each encoded into a vector by the corresponding encoder, and the two vectors are concatenated into a new vector $C$.
Next, FIG. 3 is a schematic diagram of the first short sentence decoder provided by an embodiment of the present invention. As shown in FIG. 3, word vectors are used as the input of the first decoder and combined with the first vector to generate the first short sentence and to obtain a new vector $C_{review1}$, which is the second vector in the embodiment of the present invention.
For example, the first short sentence decoder can generate the first short sentence "What I ear before do not mean condensing the differences protection".
Finally, FIG. 4 is a schematic diagram of the second short sentence decoder provided by an embodiment of the present invention. As shown in FIG. 4, the vector $C$ and the vector $C_{review1}$ are combined as the input of the second short sentence decoder to generate the second short sentence.
For example, using the vector $C$ produced by the encoders and the vector $C_{review1}$ produced by the decoder that generated the first short sentence, the second decoder generates the second short sentence "Rather, I'm with to use my old and free body to discrete Tiger so you all have a hand to escape".
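The way the two decoders are chained can be sketched as below. Only the wiring follows the description: the first decoder consumes $C$ and returns the first short sentence together with $C_{review1}$, and the second decoder consumes $C$ combined with $C_{review1}$. Treating "combined" as concatenation is an assumption, and `ToyGreedyDecoder` is an untrained stand-in with random embeddings, so its output words are meaningless.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Decoder = Callable[[Vector], Tuple[List[str], Vector]]
EOS = "</s>"  # hypothetical end-of-sentence token


class ToyGreedyDecoder:
    """Untrained stand-in for an LSTM decoder: greedy word choice by dot product."""

    def __init__(self, vocab: List[str], context_dim: int, seed: int,
                 max_len: int = 10) -> None:
        rng = random.Random(seed)
        self.vocab = vocab + [EOS]
        self.max_len = max_len
        self.emb = {w: [rng.uniform(-1.0, 1.0) for _ in range(context_dim)]
                    for w in self.vocab}

    def __call__(self, context: Vector) -> Tuple[List[str], Vector]:
        state = list(context)
        tokens: List[str] = []
        for _ in range(self.max_len):
            word = max(self.vocab,
                       key=lambda w: sum(s * e for s, e in zip(state, self.emb[w])))
            if word == EOS:
                break
            tokens.append(word)
            # Stand-in for the recurrent update: fold the chosen word into the state.
            state = [math.tanh(s - e) for s, e in zip(state, self.emb[word])]
        return tokens, state  # the final state plays the role of C_review1


def split_long_sentence(c: Vector, first_decoder: Decoder,
                        second_decoder: Decoder) -> Tuple[List[str], List[str]]:
    first_sentence, c_review1 = first_decoder(c)
    # Assumed combination: concatenate C and C_review1 for the second decoder.
    second_sentence, _ = second_decoder(c + c_review1)
    return first_sentence, second_sentence


vocab = ["what", "i", "said", "rather", "wish", "escape"]
C = [0.2, -0.1, 0.4, 0.3]  # toy first vector
first = ToyGreedyDecoder(vocab, context_dim=4, seed=1)
second = ToyGreedyDecoder(vocab, context_dim=8, seed=2)
s1, s2 = split_long_sentence(C, first, second)
print(s1, s2)
```

Feeding $C_{review1}$ into the second decoder lets it condition on what the first short sentence already covered, which is presumably why the description chains the decoders instead of running them independently.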
FIG. 5 is a schematic structural diagram of an automatic English long sentence segmentation system according to an embodiment of the present invention. As shown in FIG. 5, the system includes an obtaining module 501 and an automatic segmentation module 502, wherein:
the obtaining module 501 is configured to obtain a long English sentence to be segmented;
the automatic segmentation module 502 is configured to input the long English sentence to be segmented into the trained sequence-to-sequence framework neural network model and output two short English sentences.
Specifically, the obtaining module 501 and the automatic segmentation module 502 may be used to execute the technical solution of the method embodiment shown in FIG. 1; the implementation principles and technical effects are similar and are not described here again.
The method and system for automatically segmenting long English sentences provided by the embodiments of the invention use the sequence-to-sequence neural network model for pattern recognition to automatically segment a long English sentence into two short sentences, which greatly saves human resources.
On the basis of the above embodiment, the system further includes:
a training module, configured to obtain a corpus data set, wherein the corpus data set comprises original texts, translator translations and reviewed translations;
and to train a preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set to obtain the trained sequence-to-sequence framework neural network model.
On the basis of the above embodiment, the system further includes:
a preprocessing module, configured to perform word segmentation and sentence segmentation preprocessing on the text in the corpus data set.
On the basis of the above embodiment, the sequence-to-sequence framework neural network model includes:
an original text encoder, a translation encoder, a first short sentence decoder and a second short sentence decoder.
On the basis of the above embodiment, the training module includes:
an encoding unit, configured to combine the original text vector and the translation vector in the training sample set into a first vector based on the original text encoder and the translation encoder;
a first decoding unit, configured to generate a first short sentence and a second vector based on the first short sentence decoder and the first vector;
a second decoding unit, configured to generate a second short sentence based on the second short sentence decoder and the second vector.
An embodiment of the present invention provides an electronic device, including: at least one processor; and at least one memory communicatively coupled to the processor, wherein:
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention. Referring to FIG. 6, the electronic device includes: a processor 601, a communication interface 602, a memory 603 and a bus 604, wherein the processor 601, the communication interface 602 and the memory 603 communicate with each other through the bus 604. The processor 601 may call logic instructions in the memory 603 to perform the following method: obtaining a long English sentence to be segmented; and inputting the long English sentence to be segmented into the trained sequence-to-sequence framework neural network model, and outputting two short English sentences.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, enable the computer to execute the methods provided by the above method embodiments, for example: obtaining a long English sentence to be segmented; and inputting the long English sentence to be segmented into the trained sequence-to-sequence framework neural network model, and outputting two short English sentences.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions, which cause a computer to execute the method provided by the above method embodiments, for example: obtaining a long English sentence to be segmented; and inputting the long English sentence to be segmented into the trained sequence-to-sequence framework neural network model, and outputting two short English sentences.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in each embodiment or some portions of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (5)

1. An automatic English long sentence segmentation method, characterized by comprising:
obtaining a long English sentence to be segmented;
inputting the long English sentence to be segmented into a trained sequence-to-sequence framework neural network model, and outputting two short English sentences;
wherein before the long English sentence to be segmented is input into the trained sequence-to-sequence framework neural network model and two short English sentences are output, the method further comprises:
obtaining a corpus data set, wherein the corpus data set comprises original texts, translator translations and reviewed translations;
training a preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set to obtain the trained sequence-to-sequence framework neural network model;
wherein the sequence-to-sequence framework neural network model comprises:
an original text encoder, a translation encoder, a first short sentence decoder and a second short sentence decoder;
and wherein training the preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set comprises:
combining the original text vector and the translation vector in the training sample set into a first vector based on the original text encoder and the translation encoder;
generating a first short sentence and a second vector based on the first short sentence decoder and the first vector;
generating a second short sentence based on the second short sentence decoder and the second vector.
2. The method according to claim 1, wherein before training the preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set, the method further comprises:
performing word segmentation and sentence segmentation preprocessing on the text in the corpus data set.
3. An automatic English long sentence segmentation system, characterized by comprising:
an obtaining module, configured to obtain a long English sentence to be segmented;
an automatic segmentation module, configured to input the long English sentence to be segmented into a trained sequence-to-sequence framework neural network model and output two short English sentences;
wherein before the long English sentence to be segmented is input into the trained sequence-to-sequence framework neural network model and two short English sentences are output, the method further comprises:
obtaining a corpus data set, wherein the corpus data set comprises original texts, translator translations and reviewed translations;
training a preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set to obtain the trained sequence-to-sequence framework neural network model;
wherein the sequence-to-sequence framework neural network model comprises:
an original text encoder, a translation encoder, a first short sentence decoder and a second short sentence decoder;
and wherein training the preset sequence-to-sequence framework neural network model by using the corpus data set as a training sample set comprises:
combining the original text vector and the translation vector in the training sample set into a first vector based on the original text encoder and the translation encoder;
generating a first short sentence and a second vector based on the first short sentence decoder and the first vector;
generating a second short sentence based on the second short sentence decoder and the second vector.
4. An electronic device, comprising a memory and a processor, wherein the processor and the memory communicate with each other via a bus; the memory stores program instructions executable by the processor, the processor invoking the program instructions to be capable of performing the method of claim 1 or 2.
5. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of claim 1 or 2.
CN201811549280.6A 2018-12-18 2018-12-18 English long sentence automatic segmentation method and system Active CN109657244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811549280.6A CN109657244B (en) 2018-12-18 2018-12-18 English long sentence automatic segmentation method and system


Publications (2)

Publication Number Publication Date
CN109657244A CN109657244A (en) 2019-04-19
CN109657244B true CN109657244B (en) 2023-04-18

Family

ID=66114558


Country Status (1)

Country Link
CN (1) CN109657244B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant