CN113705168B - Chapter neural machine translation method and system based on cross-level attention mechanism - Google Patents


Info

Publication number
CN113705168B
CN113705168B (application CN202111016267.6A)
Authority
CN
China
Prior art keywords: sentence, chapter, cross, machine translation, attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111016267.6A
Other languages
Chinese (zh)
Other versions
CN113705168A (en)
Inventor
李军辉
陈林卿
贡正仙
周国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-08-31
Filing date: 2021-08-31
Publication date: 2023-04-07
Application filed by Suzhou University
Priority to CN202111016267.6A
Publication of CN113705168A (2021-11-26)
Application granted
Publication of CN113705168B (2023-04-07)
Legal status: Active

Classifications

    • G06F 40/126 — Handling natural language data; text processing; use of codes for handling textual entities; character encoding
    • G06F 40/58 — Handling natural language data; processing or translation of natural language; use of machine translation, e.g. for multi-lingual retrieval, server-side translation for client devices, or real-time translation
    • G06N 3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • Y02D 10/00 — Climate change mitigation in information and communication technologies; energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a discourse neural machine translation method based on a cross-level attention mechanism, which comprises the following steps: generating a training corpus containing document structure information from unprocessed corpora; training a baseline discourse neural machine translation model with the training corpus; using the translation model to acquire a set of source-document sentence vectors carrying sentence boundary information; feeding the sentence vector set into a context capturer based on cross-level attention, which uses a cross-attention mechanism to obtain a dependency weight matrix between words and sentences and derives an individual document-level global context for each word in the current sentence; combining the global context with the translation model to obtain a discourse neural machine translation model based on the cross-attention mechanism; and performing neural network training on the resulting model. By introducing global context information from the whole document during training and translating the document to be translated as a single unit, the invention preserves the document structure information while avoiding the mistranslations and omissions caused by ignored context.

Description

Cross-level attention mechanism-based chapter neural machine translation method and system
Technical Field
The invention relates to the technical field of neural machine translation, and in particular to a discourse neural machine translation method and system based on a cross-level attention mechanism.
Background
Common existing neural machine translation methods operate on parallel sentence pairs, training an encoder-decoder neural network on large-scale bilingual parallel corpora. During training, a word-level attention mechanism computes the dependencies among all words of a sentence, so that each word gathers useful information from the other words in the sentence it belongs to.
However, a word-level attention mechanism can only capture dependencies between words within a single sentence; it ignores the connections and structural relationships between the sentences of a document. That document context benefits translation quality has been supported by the work of many researchers. Moreover, ignoring document-level context lets translation errors accumulate and propagate, causing problems such as inconsistent coreference.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the deficiencies of the prior art and provide a discourse neural machine translation method and system based on a cross-level attention mechanism, in which the model automatically perceives document boundaries and a context capturer obtains, for each word in the current sentence, a context drawn from the whole document, thereby improving translation quality and fluency and resolving problems such as inconsistent coreference.
To solve this technical problem, the invention provides a discourse neural machine translation method based on a cross-level attention mechanism, comprising the following steps:
S1: generating a training corpus containing document structure information from the unprocessed corpus;
S2: training a baseline discourse neural machine translation model using the training corpus;
S3: using the translation model to acquire a set of source-document sentence vectors with sentence boundary information;
S4: feeding the sentence vector set into a context capturer based on cross-level attention, using a cross-attention mechanism to obtain a dependency weight matrix between words and sentences, and deriving an individual document-level global context for each word in the current sentence;
S5: combining the document-level global context obtained in S4 with the discourse neural machine translation model obtained in S2 to obtain a discourse neural machine translation model based on the cross-attention mechanism;
S6: performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism.
In one embodiment of the present invention, the corpus includes a source corpus and a target corpus.
In an embodiment of the present invention, generating the training corpus in step S1 comprises the following steps:
S1.1: processing each document into a single long sentence, marking document and sentence boundaries with special symbols;
S1.2: during processing, segmenting long documents by length into sub-documents of suitable size; for a segmented document, each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document.
In one embodiment of the present invention, the capabilities of the baseline discourse neural machine translation model in step S2 include:
performing attention operations with the document as the unit;
translating documents with the document as the unit;
the capabilities required for training a translation model with the document as the unit;
maintaining sentence alignment throughout the above operations.
In an embodiment of the present invention, acquiring the set of source-document sentence vectors with sentence boundary information in step S3 comprises:
using a neural network to generate word-level hidden states that treat the document as a unit while still preserving sentence independence, and converting the word-level hidden states into a set of sentence vectors, one per sentence, through a sentence embedding layer.
In an embodiment of the present invention, obtaining the dependency weight matrix between words and sentences and the individual document-level global context for each word in step S4 comprises:
obtaining the word-level hidden states of the current sentence from the source material, and the sentence vector set of the whole document as the context source; applying a cross-level attention mechanism to the sentence vector representations and the word vector representations to obtain a dependency weight matrix between each word in the current sentence and all sentences in the document; and deriving a whole-document global context separately for each word of the current sentence.
In an embodiment of the present invention, training the translation model in step S6 comprises:
loading the parameters of the baseline discourse neural machine translation model obtained in S2 into the discourse neural machine translation model based on the cross-attention mechanism;
freezing, within the cross-attention model, the parameters it shares with the baseline discourse neural machine translation model;
training the discourse neural machine translation model based on the cross-attention mechanism.
In addition, the invention also provides a discourse neural machine translation system based on a cross-level attention mechanism, comprising:
a corpus preprocessing module, for generating a training corpus containing document structure information from the unprocessed corpus;
a basic translation model training module, for training a baseline discourse neural machine translation model using the training corpus;
a sentence vector acquisition module, for acquiring a set of source-document sentence vectors with sentence boundary information by using the translation model;
a global context module, for feeding the sentence vector set into a context capturer based on cross-level attention, obtaining a dependency weight matrix between words and sentences with a cross-attention mechanism, and deriving an individual document-level global context for each word in the current sentence;
a neural machine translation model obtaining module, for combining the document-level global context with the discourse neural machine translation model to obtain a discourse neural machine translation model based on the cross-attention mechanism;
and a neural machine translation model training module, for performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism.
In an embodiment of the present invention, the corpus preprocessing module comprises:
a corpus tagging unit, for processing each document into a single long sentence and marking document and sentence boundaries with special symbols;
and a corpus segmentation unit, for segmenting long documents by length into sub-documents of suitable size during processing, wherein for a segmented document each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document.
In one embodiment of the invention, the global context module comprises:
a word-level representation encoding unit, for encoding the context at the word level;
a sentence vector conversion unit, for converting the word-level representation of the context into a sentence-level representation;
a cross-level attention mechanism unit, for acquiring the cross-level word/sentence dependency weight matrix;
and a global context allocation unit, for allocating the acquired context to each word of the current sentence as needed.
Compared with the prior art, the technical solution of the invention has the following advantages:
the invention introduces global context information from the whole document during training and translates the document to be translated as a single unit, preserving the document structure information throughout translation, so that mistranslations and omissions caused by ignored context can be avoided.
Drawings
FIG. 1 is a flow chart of the discourse neural machine translation method based on a cross-level attention mechanism according to the present invention.
FIG. 2 is a schematic diagram of generating a training corpus containing document structure information from unprocessed corpora according to the present invention.
FIG. 3 is a schematic diagram of acquiring the global context with the cross-attention mechanism according to the present invention.
FIG. 4 is a schematic diagram of combining the global context with the discourse neural machine translation model according to the present invention.
FIG. 5 is a schematic diagram of the neural network training of the discourse neural machine translation model based on the cross-attention mechanism according to the present invention.
FIG. 6 is a schematic structural diagram of the discourse neural machine translation system based on a cross-level attention mechanism according to the present invention.
FIG. 7 is a schematic diagram of the corpus preprocessing module according to the present invention.
FIG. 8 is a schematic structural diagram of the global context module according to the present invention.
Reference numbers in the figures: 10. corpus preprocessing module; 101. corpus tagging unit; 102. corpus segmentation unit; 20. basic translation model training module; 30. sentence vector acquisition module; 40. global context module; 401. word-level representation encoding unit; 402. sentence vector conversion unit; 403. cross-level attention mechanism unit; 404. global context allocation unit; 50. neural machine translation model obtaining module; 60. neural machine translation model training module.
Detailed Description
The present invention is further described below in conjunction with the drawings and specific embodiments, so that those skilled in the art can better understand and practice the invention; the embodiments, however, are not to be construed as limiting the invention.
Example one
Referring to FIGS. 1 to 8, an embodiment of the present invention provides a discourse neural machine translation method based on a cross-level attention mechanism, comprising:
S1: generating a training corpus containing document structure information from the unprocessed corpus;
S2: training a baseline discourse neural machine translation model using the training corpus;
S3: using the translation model to acquire a set of source-document sentence vectors with sentence boundary information;
S4: feeding the sentence vector set into a context capturer based on cross-level attention, using a cross-attention mechanism to obtain a dependency weight matrix between words and sentences, and deriving an individual document-level global context for each word in the current sentence;
S5: combining the document-level global context obtained in S4 with the discourse neural machine translation model obtained in S2 to obtain a discourse neural machine translation model based on the cross-attention mechanism;
S6: performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism.
The working principle is as follows: the dependencies between the sentences of the whole document and the words of the current sentence are computed directly, so that each word obtains its own specific global context, which is incorporated into the neural training process; in the final translation model, the document-wide global context therefore participates directly in translation. Global context information from the whole document is introduced during training, and the sentence set of the document is treated as candidate context and assigned to the words of the current sentence, so that the document context can be fully exploited during translation and every word can fully acquire the global context.
Generating the training corpus in step S1 comprises the following steps:
S1.1: processing each document into a single long sentence, marking document and sentence boundaries with special symbols;
S1.2: during processing, segmenting long documents by length into sub-documents of suitable size; for a segmented document, each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document. This greatly reduces the number of sentences in a single document, reducing the amount of computation, shortening training time, and reducing model parameters, making the technique more reliable.
The capabilities of the baseline discourse neural machine translation model in step S2 include: performing attention operations with the document as the unit; translating documents with the document as the unit; the capabilities required for training a translation model with the document as the unit; and maintaining sentence alignment throughout the above operations.
The set of source-document sentence vectors with sentence boundary information in step S3 is obtained by using a neural network to generate word-level hidden states that treat the document as a unit while still preserving sentence independence, and converting the word-level hidden states into a set of sentence vectors, one per sentence, through a sentence embedding layer.
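One way such a sentence embedding layer could pool word-level hidden states into sentence vectors is sketched below in PyTorch, assuming a learned importance-weighted sum; the class and parameter names are illustrative and not taken from the patent:

import torch
import torch.nn as nn

class SentenceEmbedding(nn.Module):
    """Sketch: pools the word-level hidden states of each sentence into one
    sentence vector via a learned, importance-weighted sum."""
    def __init__(self, d_model):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # scores each word's importance

    def forward(self, word_states, word_mask):
        # word_states: (num_sents, max_words, d_model) word-level hidden states,
        # kept independent per sentence as required in step S3.
        # word_mask:   (num_sents, max_words) True for real (non-padding) words.
        scores = self.scorer(word_states).squeeze(-1)          # (num_sents, max_words)
        scores = scores.masked_fill(~word_mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # importance weights
        return (weights * word_states).sum(dim=1)              # (num_sents, d_model)

The importance-weighted sum matches the weighting over words described in the working principle below, although this exact parameterization is an assumption.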
In step S4, obtaining the dependency weight matrix between words and sentences with the cross-attention mechanism and deriving an individual document-level global context for each word in the current sentence comprises: encoding each sentence into word-level vector representations with an encoder; converting the document into a sentence vector set through the sentence vector embedding layer, i.e., each sentence is represented by one vector and each document by the set of its sentence vectors; obtaining, through the cross-level attention mechanism, a dependency weight matrix between each word in the current sentence and all sentences in the document; and allocating context according to the dependency weight matrix, so that each word in the current sentence obtains its own unique global context.
The working principle is as follows: the computed dependency weight matrix expresses how strongly each word in the current sentence depends on each of the other sentences in the full document, i.e., how closely the two are related. Sentence embedding takes the vector representations of all words in a sentence and forms a weighted sum according to the importance of each word, thereby producing the sentence vector. The sentence vectors are then allocated to each word in the current sentence according to the importance expressed by the computed weight matrix, so that each word obtains a unique global context drawn from the full text. This provides a concrete method for obtaining, for the current sentence, a global context that summarizes the whole document. By combining the sentence vectors with the dependency weight matrix, the global context can be selected and acquired accurately and efficiently, while greatly reducing the computation time and cost of the process.
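The cross-level attention step could be sketched as follows, with queries taken from the word-level states of the current sentence and keys/values from the document's sentence vectors; single-head scaled dot-product attention is an illustrative choice, as the patent does not fix the attention variant:

import math
import torch
import torch.nn as nn

class CrossLevelAttention(nn.Module):
    """Sketch of the cross-level context capturer: each word of the current
    sentence attends over the sentence vectors of the whole document."""
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)  # projects word-level queries
        self.k = nn.Linear(d_model, d_model)  # projects sentence-level keys
        self.v = nn.Linear(d_model, d_model)  # projects sentence-level values

    def forward(self, word_states, sent_vectors):
        # word_states:  (cur_len, d_model)   words of the current sentence
        # sent_vectors: (num_sents, d_model) one vector per document sentence
        q = self.q(word_states)
        k = self.k(sent_vectors)
        v = self.v(sent_vectors)
        scores = q @ k.t() / math.sqrt(q.size(-1))   # (cur_len, num_sents)
        dep_weights = torch.softmax(scores, dim=-1)  # word/sentence dependency weight matrix
        global_ctx = dep_weights @ v                 # one global context per word
        return global_ctx, dep_weights

Row i of dep_weights is the dependency of the i-th word of the current sentence on every sentence of the document, and global_ctx[i] is the corresponding weighted sum of sentence vectors, i.e., that word's unique global context.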
As shown in FIG. 4, combining the document-level global context with the discourse neural machine translation model to obtain the discourse neural machine translation model based on the cross-attention mechanism comprises: segmenting the corpus according to the natural document structure; obtaining context for the source-language document; combining the context with the source-language document; and using the context-enriched document as the source-side input of the translation model.
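One plausible way to combine each word's global context with the source-side word states before translation is a learned gate, sketched below; the patent does not specify the fusion operator, so the gating is an assumption:

import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Sketch: fuses the document-level global context with the word-level
    source states to form the context-enriched source input."""
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, word_states, global_ctx):
        # word_states, global_ctx: (cur_len, d_model), aligned per word.
        g = torch.sigmoid(self.gate(torch.cat([word_states, global_ctx], dim=-1)))
        return g * word_states + (1 - g) * global_ctx  # context-enriched source input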
As shown in FIG. 5, the neural network training of the discourse neural machine translation model based on the cross-attention mechanism comprises at least the following: training a translation model with a conventional encoder-decoder structure; loading the parameters that the trained model shares with the model of the invention into the discourse neural machine translation model based on the cross-attention mechanism; freezing the trained, shared parameters; and training the complete discourse neural machine translation model based on the cross-attention mechanism provided by the invention.
The working principle is as follows: a translation model trained with this two-step scheme performs better, and using the parameters of the first-step model to initialize the complete, larger model makes training fast and computation relatively cheap. This greatly reduces training time and improves the performance of the translation system.
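The two-step schedule could be set up as below, using standard PyTorch state_dict mechanics; base_model and full_model are placeholder names, and the optimizer choice and learning rate are illustrative:

import torch

def two_step_training_setup(base_model, full_model):
    """Sketch: initialise the cross-attention model with the trained baseline's
    parameters, freeze the shared ones, and train only the new components."""
    base_state = base_model.state_dict()
    # Load every parameter the two models share; strict=False skips the
    # context-capturer parameters that exist only in the full model.
    full_model.load_state_dict(base_state, strict=False)
    # Freeze the shared (multiplexed) parameters.
    for name, param in full_model.named_parameters():
        if name in base_state:
            param.requires_grad = False
    # Hand only the still-trainable parameters to the optimiser.
    trainable = [p for p in full_model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)  # learning rate is illustrative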
Example two
An embodiment of the present invention provides a discourse neural machine translation system based on a cross-level attention mechanism, comprising:
a corpus preprocessing module 10, for generating a training corpus containing document structure information from the unprocessed corpus;
a basic translation model training module 20, for training a baseline discourse neural machine translation model using the training corpus;
a sentence vector acquisition module 30, for acquiring a set of source-document sentence vectors with sentence boundary information by using the translation model;
a global context module 40, for feeding the sentence vector set into a context capturer based on cross-level attention, obtaining a dependency weight matrix between words and sentences with a cross-attention mechanism, and deriving an individual document-level global context for each word in the current sentence;
a neural machine translation model obtaining module 50, for combining the document-level global context with the discourse neural machine translation model to obtain a discourse neural machine translation model based on the cross-attention mechanism;
and a neural machine translation model training module 60, for performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism.
The corpus preprocessing module 10 comprises:
a corpus tagging unit 101, for processing each document into a single long sentence and marking document and sentence boundaries with special symbols;
and a corpus segmentation unit 102, for segmenting long documents by length into sub-documents of suitable size during processing, wherein for a segmented document each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document.
The global context module 40 comprises:
a word-level representation encoding unit 401, for encoding the context at the word level;
a sentence vector conversion unit 402, for converting the word-level representation of the context into a sentence-level representation;
a cross-level attention mechanism unit 403, for acquiring the cross-level word/sentence dependency weight matrix;
and a global context allocation unit 404, for allocating the acquired context to each word of the current sentence as needed.
The above embodiments are merely preferred embodiments for fully illustrating the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions or modifications made by persons skilled in the art on the basis of the present invention all fall within the protection scope of the present invention, which is defined by the claims.

Claims (5)

1. A discourse neural machine translation method based on a cross-level attention mechanism, characterized by comprising the following steps:
S1: generating a training corpus containing document structure information from the unprocessed corpus;
S2: training a baseline discourse neural machine translation model using the training corpus;
S3: using the translation model to acquire a set of source-document sentence vectors with sentence boundary information;
S4: feeding the sentence vector set into a context capturer based on cross-level attention, using a cross-attention mechanism to obtain a dependency weight matrix between words and sentences, and deriving an individual document-level global context for each word in the current sentence;
S5: combining the document-level global context obtained in S4 with the discourse neural machine translation model obtained in S2 to obtain a discourse neural machine translation model based on the cross-attention mechanism;
S6: performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism;
wherein generating the training corpus in step S1 comprises the following steps:
S1.1: processing each document into a single long sentence, marking document and sentence boundaries with special symbols;
S1.2: during processing, segmenting long documents by length into sub-documents of suitable size; for a segmented document, each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document;
and wherein, in step S4, obtaining the dependency weight matrix between words and sentences with the cross-attention mechanism and deriving an individual document-level global context for each word in the current sentence comprises: encoding each sentence into word-level vector representations with an encoder; converting the document into a sentence vector set through the sentence vector embedding layer, i.e., each sentence is represented by one vector and each document by the set of its sentence vectors; obtaining, through the cross-level attention mechanism, a dependency weight matrix between each word in the current sentence and all sentences in the document; and allocating context according to the dependency weight matrix, so that each word in the current sentence obtains its own unique global context.
2. The discourse neural machine translation method based on the cross-level attention mechanism according to claim 1, wherein the unprocessed corpus comprises a source corpus and a target corpus.
3. The discourse neural machine translation method based on the cross-level attention mechanism according to claim 1, wherein acquiring the set of source-document sentence vectors with sentence boundary information in step S3 comprises:
using a neural network to generate word-level hidden states that treat the document as a unit while still preserving sentence independence, and converting the word-level hidden states into a set of sentence vectors, one per sentence, through a sentence embedding layer.
4. The discourse neural machine translation method based on the cross-level attention mechanism according to claim 1, wherein training the translation model in step S6 comprises:
loading the parameters of the baseline discourse neural machine translation model obtained in S2 into the discourse neural machine translation model based on the cross-attention mechanism;
freezing, within the cross-attention model, the parameters it shares with the baseline discourse neural machine translation model;
training the discourse neural machine translation model based on the cross-attention mechanism.
5. A discourse neural machine translation system based on a cross-level attention mechanism, characterized by comprising:
a corpus preprocessing module, for generating a training corpus containing document structure information from the unprocessed corpus;
a basic translation model training module, for training a baseline discourse neural machine translation model using the training corpus;
a sentence vector acquisition module, for acquiring a set of source-document sentence vectors with sentence boundary information by using the translation model;
a global context module, for feeding the sentence vector set into a context capturer based on cross-level attention, obtaining a dependency weight matrix between words and sentences with a cross-attention mechanism, and deriving an individual document-level global context for each word in the current sentence;
a neural machine translation model obtaining module, for combining the document-level global context with the discourse neural machine translation model to obtain a discourse neural machine translation model based on the cross-attention mechanism;
and a neural machine translation model training module, for performing neural network training on the discourse neural machine translation model based on the cross-attention mechanism;
wherein the corpus preprocessing module comprises:
a corpus tagging unit, for processing each document into a single long sentence and marking document and sentence boundaries with special symbols;
and a corpus segmentation unit, for segmenting long documents by length into sub-documents of suitable size during processing, wherein for a segmented document each middle sub-document retains some sentences of its adjacent preceding and following sub-documents as bridging context, and if the last sub-document of the segmented set contains too few sentences, it is merged into the preceding sub-document;
and wherein the global context module obtains the dependency weight matrix between words and sentences with the cross-attention mechanism and derives an individual document-level global context for each word in the current sentence by: encoding each sentence into word-level vector representations with an encoder; converting the document into a sentence vector set through the sentence vector embedding layer, i.e., each sentence is represented by one vector and each document by the set of its sentence vectors; obtaining, through the cross-level attention mechanism, a dependency weight matrix between each word in the current sentence and all sentences in the document; and allocating context according to the dependency weight matrix, so that each word in the current sentence obtains its own unique global context.
Application CN202111016267.6A, priority date 2021-08-31, filing date 2021-08-31: Chapter neural machine translation method and system based on cross-level attention mechanism — Active — granted as CN113705168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111016267.6A CN113705168B (en) 2021-08-31 2021-08-31 Chapter neural machine translation method and system based on cross-level attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111016267.6A CN113705168B (en) 2021-08-31 2021-08-31 Chapter neural machine translation method and system based on cross-level attention mechanism

Publications (2)

Publication Number Publication Date
CN113705168A CN113705168A (en) 2021-11-26
CN113705168B 2023-04-07

Family

ID=78658346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111016267.6A Active CN113705168B (en) 2021-08-31 2021-08-31 Chapter neural machine translation method and system based on cross-level attention mechanism

Country Status (1)

Country Link
CN (1) CN113705168B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580439B (en) * 2022-02-22 2023-04-18 北京百度网讯科技有限公司 Translation model training method, translation device, translation equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102069692B1 (en) * 2017-10-26 2020-01-23 한국전자통신연구원 Neural machine translation method and apparatus
CN109783809B (en) * 2018-12-22 2022-04-12 昆明理工大学 Method for extracting aligned sentences from Laos-Chinese chapter level aligned corpus

Also Published As

Publication number Publication date
CN113705168A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN109684648B (en) Multi-feature fusion automatic translation method for ancient and modern Chinese
CN109359291A (en) A kind of name entity recognition method
KR20190125863A (en) Multilingual translation device and multilingual translation method
CN110162789B (en) Word representation method and device based on Chinese pinyin
CN115270826B (en) Multilingual translation model construction method, translation method and computer storage medium
US10902218B2 (en) System and method for adaptive quality estimation for machine translation post-editing
JP2019128943A (en) Multi-language typesetting display method, display device, browser, terminal and computer readable storage medium
CN113705168B (en) Chapter neural machine translation method and system based on cross-level attention mechanism
CN112417897B (en) Method, system, device and medium for training word alignment model and processing text
CN112966529A (en) Neural network machine translation training method, system, medium, equipment and application
CN115935914A (en) Admission record missing text supplementing method
CN111090734A (en) Method and system for optimizing machine reading understanding capability based on hierarchical attention mechanism
Sun et al. Alternative input signals ease transfer in multilingual machine translation
Samudravijaya Indian language speech label (ILSL): a de facto national standard
CN117273026A (en) Professional text translation method, device, electronic equipment and storage medium
Blevins et al. Better character language modeling through morphology
CN110362691B (en) Syntax tree bank construction system
KR20210035721A (en) Machine translation method using multi-language corpus and system implementing using the same
Iranzo-Sánchez et al. From simultaneous to streaming machine translation by leveraging streaming history
Che et al. A word segmentation method of ancient Chinese based on word alignment
Minyun Machine Translation Based on Neural Network: Challenge or Chance?
Guo et al. Character-level dependency model for joint word segmentation, POS tagging, and dependency parsing in Chinese
JP2019159743A (en) Correspondence generation program, correspondence generation device, correspondence generation method, and translation program
CN114611487B (en) Unsupervised Thai dependency syntax analysis method based on dynamic word embedding alignment
CN116306607B (en) Multilingual mixed corpus generation method, device, equipment and storage medium

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant