CN111680519A - Text translation method and device based on dimension reduction barrel model - Google Patents

Text translation method and device based on dimension reduction barrel model Download PDF

Info

Publication number
CN111680519A
CN111680519A CN202010349528.5A CN202010349528A CN111680519A CN 111680519 A CN111680519 A CN 111680519A CN 202010349528 A CN202010349528 A CN 202010349528A CN 111680519 A CN111680519 A CN 111680519A
Authority
CN
China
Prior art keywords
text
word
text translation
translated
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010349528.5A
Other languages
Chinese (zh)
Other versions
CN111680519B (en
Inventor
骆加维
吴信朝
周宸
王虎
许康颂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010349528.5A priority Critical patent/CN111680519B/en
Publication of CN111680519A publication Critical patent/CN111680519A/en
Application granted granted Critical
Publication of CN111680519B publication Critical patent/CN111680519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/45Example-based machine translation; Alignment
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text translation method and a text translation device based on a dimension reduction barrel model, relates to the technical field of artificial intelligence, and mainly aims to solve the problem of deep semantic transfer of long texts in NLP (natural language processing) translation tasks by introducing a dimension reduction barrel algorithm, to expand the length of a single input text, and to improve the accuracy of the translation result and the translation efficiency by reducing the computational complexity. The method comprises the following steps: receiving a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence as the text translation result; and responding to the text translation request by using the text translation result. The method is suitable for text translation based on the dimension reduction barrel model.

Description

Text translation method and device based on dimension reduction barrel model
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text translation method and device based on a dimension reduction barrel model.
Background
Machine translation is a branch of natural language processing. It is currently applied mainly to background work and is chiefly responsible for translation between different languages in robot-related fields. Machine translation technology has evolved from the initial seq2seq approach, through end-to-end models relying on RNN-based architectures and on the Transformer, to the Bert model released at the end of 2018, which is also an end-to-end model. Each upgrade of the model has brought performance optimization and accuracy improvement.
Currently, the Bert model is generally used as the machine translation model. However, the Bert model has an inherent defect: under the MASK mechanism, masked words are treated as mutually independent, so deep semantics are lost. In addition, in long-text tasks, unreasonable sentence breaking by the model and similar issues prevent deep semantics from being transferred across the long text. As a result, the accuracy of the translation result is poor and the translation efficiency is low.
Disclosure of Invention
In view of the above, the present invention provides a text translation method and apparatus based on a dimension reduction bucket model, and mainly aims to solve the problem of deep semantic transfer of long texts in NLP translation tasks by introducing a dimension reduction bucket algorithm, so as to expand the length of a single input text and improve the accuracy of translation results and translation efficiency by reducing the computation complexity.
According to one aspect of the invention, a text translation method based on a dimension reduction bucket model is provided, which comprises the following steps:
receiving a text translation request, wherein the request carries text data to be translated;
processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result;
and responding to the text translation request by using the text translation result.
Further, the processing the text data to be translated through the dimension reduction bucket algorithm, and determining the word with the highest output confidence as the text translation result includes:
and processing the text data to be translated by using a pre-trained dimensionality reduction bucket model, and determining the word with the highest output confidence as a text translation result.
Further, the processing the text data to be translated by using the pre-trained dimensionality reduction bucket model, and determining the word with the highest output confidence as the text translation result includes:
carrying out normalization processing on the obtained input vector to obtain attention scores, and dividing a dimension reduction bucket structure by using the attention scores;
supplementing the dimension reduction bucket structure with preset attention scores of different orders of magnitude;
performing dimension reduction processing on the dimension reduction bucket structure by using a dimension reduction algorithm;
and performing a shared weight update on the attention scores obtained by the dimension reduction processing to obtain shared attention scores, and outputting the word with the highest confidence as a text translation result after normalization processing.
Further, before the processing the text data to be translated through the dimensionality reduction bucket attention model, the method further comprises:
and converting words in the text to be translated into word vectors by using a preset word vector algorithm.
Further, before converting the words in the text to be translated into word vectors by using a preset word vector algorithm, the method further includes:
and obtaining a position coding vector according to a preset position coding algorithm and the word vector, and splicing the word vector and the position vector into an input vector.
Further, the method further comprises:
and carrying out factorization processing on the input vector to obtain bidirectional information of the input vector.
Further, before performing factorization processing on the input vector to obtain bidirectional information of the input vector, the method further includes:
and training a dimensionality reduction barrel model according to the Bert model and the bidirectional information of the input vector.
According to a second aspect of the invention, a text translation device based on a dimension reduction bucket model is provided, which comprises:
a receiving unit, configured to receive a text translation request, wherein the request carries text data to be translated;
the processing unit is used for processing the text data to be translated through a dimension reduction barrel algorithm and determining the word with the highest output confidence coefficient as a text translation result;
a response unit, configured to respond to the text translation request with the text translation result.
The processing unit may be specifically configured to process the text data to be translated by using a pre-trained dimension reduction bucket model, and determine a word with the highest output confidence as a text translation result.
Further, the processing unit includes:
the dividing module is used for carrying out normalization processing according to the acquired input vector to obtain an attention score and dividing the dimensionality reduction barrel structure by utilizing the attention score;
the supplement module is used for supplementing the dimension reduction barrel structure by presetting attention scores of different orders of magnitude;
the dimensionality reduction module is used for carrying out dimensionality reduction processing on the dimensionality reduction barrel structure by using a dimensionality reduction algorithm;
and the updating module is used for updating the sharing weight of the attention score obtained by the dimensionality reduction processing to obtain a shared attention score, and outputting a word with the highest confidence as a text translation result after normalization processing.
Further, the apparatus further comprises:
and the conversion unit is used for converting the words in the text to be translated into word vectors by utilizing a preset word vector algorithm.
Further, the apparatus further comprises:
and the splicing unit is used for obtaining a position coding vector according to a preset position coding algorithm and the word vector, and splicing the word vector and the position vector into an input vector.
Further, the apparatus further comprises:
and the decomposition unit is used for carrying out factorization processing on the input vector to obtain the bidirectional information of the input vector.
Further, the apparatus further comprises:
and the training unit is used for training the dimensionality reduction barrel model according to the Bert model and the bidirectional information of the input vector.
According to a third aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform the steps of: receiving a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result.
According to a fourth aspect of the present invention, there is provided a computer device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other via the communication bus, and the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the following steps: receiving a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result.
Compared with the prior art that text data translation is carried out through a self-attention mechanism, the text translation method and device based on the dimension reduction barrel model receive a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result. Therefore, the problem of deep semantic transfer of long texts in an NLP translation task can be solved by introducing a dimensionality reduction bucket algorithm, the length of a single input text is expanded by reducing the calculation complexity, and the accuracy and the translation efficiency of a translation result are improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a text translation method based on a dimension reduction bucket model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a text translation apparatus based on a dimension reduction bucket model according to an embodiment of the present invention;
fig. 3 shows a physical structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As described in the background, a Bert model is currently used as the machine translation model. However, the Bert model has an inherent defect: under the MASK mechanism, masked words are treated as mutually independent, so deep semantics are lost. In addition, in long-text tasks, unreasonable sentence breaking by the model and similar issues prevent deep semantics from being transferred across the long text, so the accuracy of the text data translation result is low.
In order to solve the above problem, an embodiment of the present invention provides a text translation method based on a dimension reduction bucket model, as shown in fig. 1, the method includes:
101. receiving a text translation request, wherein the request carries text data to be translated.
For the embodiment of the invention, the translation processing device can act as the executing body that receives text translation requests sent from different network sources. The text data to be translated may specifically be text-type data such as TXT, for example 32-bit numerical data. Specifically, after a text translation request is received, the request may be parsed to obtain the text data to be translated carried in it.
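As a minimal, purely illustrative sketch (Python) of the request-parsing step just described, the following assumes a JSON request body with a hypothetical "text" field; the patent does not fix any particular transport or field name:

import json

def parse_translation_request(raw_request: bytes) -> str:
    # Hypothetical request format: a JSON body carrying the text to be
    # translated under a "text" field (an assumption, not specified by the patent).
    payload = json.loads(raw_request.decode("utf-8"))
    return payload["text"]

# usage: text_to_translate = parse_translation_request(b'{"text": "..."}')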
102. And processing the text data to be translated through a dimension reduction bucket algorithm, and determining the word with the highest output confidence coefficient as a text translation result.
The dimension reduction bucket algorithm may specifically be a dimension reduction bucket model trained in advance, that is, a Bert model constructed with a dimension reduction bucket attention mechanism. The confidence may be the obtained attention score. The Bert model (Bidirectional Encoder Representations from Transformers) can be trained on a large-scale unlabeled corpus to obtain a semantic representation of the text containing rich semantic information, which is then fine-tuned for a specific NLP task and finally applied to that task. Specifically, when the text data to be translated is obtained, it can be processed through the dimension reduction bucket algorithm, and the word with the highest confidence is obtained and determined as the text translation result.
It should be noted that, in the embodiment of the present invention, the Bert model is constructed by using the dimension reduction bucket attention mechanism, so that the problem of deep semantic transfer of the ultra-long text in the NLP translation task can be solved, and the text length of the single input model is extended by reducing the computation complexity.
103. And responding to the text translation request by using the text translation result.
For the embodiment of the present invention, after the text translation result is obtained, the text translation request may be responded to using the translation result, so that the text translation result is displayed on a display interface of a client, or used for other purposes, and the like.
Further, in order to better explain the process of the text translation method, as a refinement and an extension to the above embodiment, the embodiment of the present invention provides several alternative embodiments, but is not limited thereto, and specifically, the following embodiments are provided:
in an optional embodiment of the present invention, the step 102 may specifically include: and processing the text data to be translated by using a pre-trained dimensionality reduction barrel model, and determining the word with the highest output confidence coefficient as a text translation result.
The dimension reduction barrel model can be a Bert model constructed by using a dimension reduction barrel attention mechanism, and dimension reduction calculation can be performed by dividing barrel structures with different orders of magnitude so as to realize long text input and translation.
For the embodiment of the present invention, processing the text data to be translated with the pre-trained dimension reduction bucket model and determining the word with the highest output confidence as the text translation result may specifically include: normalizing the obtained input vector to obtain attention scores, and dividing a dimension reduction bucket structure by using the attention scores; supplementing the dimension reduction bucket structure with preset attention scores of different orders of magnitude; performing dimension reduction processing on the dimension reduction bucket structure by using a dimension reduction algorithm; and performing a shared weight update on the attention scores obtained by the dimension reduction processing to obtain shared attention scores, and, after normalization, outputting the word with the highest confidence as the text translation result.
The process of extracting features with the dimension reduction bucket attention mechanism specifically comprises the following steps: S1: bucket division; S2: bucket supplement; S3: dimension reduction calculation; S4: bucket normalization; S5: top-level shared weight update.
Specifically, the bucket division process may include the following. Since the ordinary attention mechanism is normalized by softmax, the attention scores obtained may look like: [1.7349562e-1, 4.2673022e-1, 3.8612169e-1, 8.6378390e-3, 1.5820753e-4, 5.8201298e-5, 2.1411061e-5]. Clearly, the attention weights for the current word fall into two parts: the attention values of the first three words, which carry the larger proportion, are all of the order of 10^-1, while the attention scores after the first three are a negligible 10^-3, 10^-4 or even smaller. The scores can therefore be placed into two bucket structures: for example, the four scores 1.7349562e-1, 4.2673022e-1, 3.8612169e-1 and 8.6378390e-3 build a high-score bucket structure, and all the remaining attention scores are put into a low-score bucket, which effectively further strengthens the word vector features related to the current word.
Specifically, the bucket supplement process may include the following. During bucket division, in order to enhance the generalization of the model, let it effectively learn more word features, and at the same time prevent it from attending only to the word most related to the current word, the largest attention values of the 10^-3 order of magnitude are also put into the high-score bucket. The order of the elements in the high-score bucket keeps the word order originally input to the bucket attention layer, and the division yields a bucket highly related to the current word. Generally, because a long text contains many words, the first 32 or 64 attention values can be selected for the words in the high-score bucket, with the smaller-magnitude scores added to the high-score bucket as the supplement.
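A minimal sketch (Python/NumPy) of the bucket division and bucket supplement steps described above; the function name, the use of NumPy, and the top-k cut-off are illustrative assumptions rather than the patent's exact procedure:

import numpy as np

def divide_and_supplement(scores: np.ndarray, k: int = 32):
    # Keep the k largest attention scores (e.g. top-32 or top-64, as discussed
    # above) in the high-score bucket, preserving their original word order;
    # all remaining scores go to the low-score bucket.
    k = min(k, len(scores))
    top_idx = np.argsort(scores)[-k:]
    high_bucket = np.sort(top_idx)                                   # original word order kept
    low_bucket = np.setdiff1d(np.arange(len(scores)), high_bucket)   # everything else
    return high_bucket, low_bucket

# the example scores from the bucket-division paragraph; k=4 reproduces that split
scores = np.array([1.7349562e-1, 4.2673022e-1, 3.8612169e-1, 8.6378390e-3,
                   1.5820753e-4, 5.8201298e-5, 2.1411061e-5])
high, low = divide_and_supplement(scores, k=4)   # high -> indices [0 1 2 3]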
Specifically, the dimension reduction calculation may include the following. The current word is excluded when calculating the attention scores, and normalization is carried out again on the elements in the bucket: the score of the current word is replaced by -inf (negative infinity), so that its score becomes 0 in the normalization calculation, which effectively solves the problem of the current word itself taking too high a proportion in a small-magnitude self-attention mechanism. It should be noted that the dimension reduction and scoring mechanism of the present application effectively reduces the high computational complexity brought by the attention mechanism: the computational complexity of the general attention mechanism is O(L^2), where L is the input text length (typically 512 or 1024); after partitioning into buckets, the computational complexity is at a constant level O(1) (e.g., (64-1)×(64-1) or (32-1)×(32-1)).
Specifically, the bucket normalization process may include the following: the calculated attention scores are normalized through softmax so as to facilitate subsequent calculation. The specific softmax normalization formula is:
softmax(s_i) = exp(s_i) / Σ_j exp(s_j)
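A small illustrative sketch (Python/NumPy) of the dimension reduction and in-bucket normalization steps just described: the current word's raw score is replaced with -inf so that it contributes 0 after the softmax renormalization; variable names are assumptions for illustration only.

import numpy as np

def renormalize_bucket(raw_scores, current_word_pos: int) -> np.ndarray:
    # Exclude the current word by replacing its score with -inf, then
    # softmax-normalize the remaining in-bucket scores (exp(-inf) = 0).
    masked = np.asarray(raw_scores, dtype=float).copy()
    masked[current_word_pos] = -np.inf
    exp = np.exp(masked - masked.max())   # masked.max() stays finite, keeping the softmax stable
    return exp / exp.sum()

# usage: renormalize_bucket([2.1, 0.3, 1.7, 0.9], current_word_pos=2)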
specifically, the specific process of updating the sharing weight may include: when the encoder has 6 layers, the sharing weight update can be performed at the sixth layer of the top layer. After the barrel is scored, only the barrel attention score is calculated in the calculation process of the middle encoder, and when the last layer of input is carried out, the attention mechanism is shared, so that the shared attention score through the dimensionality reduction barrel mechanism is obtained.
For the embodiment of the present invention, an example of the dimension reduction bucket attention score calculation is as follows: suppose a word sequence of 10 words is obtained and the 9th word needs to be predicted, as shown below:
1 2 3 4 5 6 7 8 9 current word 10
First, the sequence is divided into buckets: the attention scores between the other words and the current word are calculated and divided into a high-score bucket structure and a low-score bucket structure, where the high-score bucket structure may be as follows:
[formula image: example high-score bucket structure]
Second, to avoid the model focusing only on the word most related to the current word, top-scoring words are selected and added to the high-score bucket structure; in particular, the top-32 or top-64 words can be selected as required, and the present application does not impose an explicit choice.
[formula image: high-score bucket structure after supplement]
Third, dimension reduction calculation is performed on the high-score bucket structure, followed by normalization.
[formula image: normalized attention scores of the high-score bucket]
Finally, after the attention scores of the multi-layer buckets are calculated, the attention weights are shared and updated, and the words with large weights are output.
1 (update) 2 3 4 (update) 5 6 (update) 7 (update) 8 9 current word 10
It should be noted that, in the residual mechanism of the embodiment of the present invention, the overlapping portion may be obtained by adding the absolute position encoding to the input of the previous layer. Meanwhile, the position encoding is introduced into the residual mechanism: the output of the main attention mechanism is added to the position encoding (output + position encoding) and then normalized (normalization). This avoids gradient explosion and gradient vanishing during back-propagation. The upper-layer input is added before entering the second linear activation layer, which reduces the cross entropy of the abstract representation during the decoder's gradient update and accelerates convergence.
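A hedged sketch (Python/PyTorch) of the residual connection described above; module and variable names are illustrative, and the exact placement of the normalization follows the description here rather than any particular published implementation:

import torch
import torch.nn as nn

d_model = 512
layer_norm = nn.LayerNorm(d_model)

def residual_with_position(attention_output: torch.Tensor,
                           position_encoding: torch.Tensor,
                           prev_layer_input: torch.Tensor) -> torch.Tensor:
    # Add the attention output, the absolute position encoding and the previous
    # layer's input, then normalize (output + position encoding -> normalization).
    return layer_norm(attention_output + position_encoding + prev_layer_input)

# usage: y = residual_with_position(attn_out, pos_enc, x_prev), all of shape (seq_len, 512)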
In addition, feature conversion between the decoder and the encoder can be performed by a cross attention mechanism, a common decoder-encoder feature conversion method in NLP tasks. Since the translation task is a text generation task, the encoder output needs to be weighted and scored every time a new character is output. The decoder principle is substantially the same as the encoder's, except that the encodings are weight-averaged at the decoder input before being fed to the decoder. The high-dimensional tensor at the top of the decoder is linearly transformed into a tensor whose dimension equals the vocabulary size, the probabilities of all words are normalized by a softmax layer, and the word with the highest confidence is found and output as the result. After text in one language is input to the translator, the translated text is output word by word at the decoder side.
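A short sketch (Python/PyTorch) of the decoder output step just described (linear transform to vocabulary length, softmax over all words, pick the highest-confidence word); vocab_size and d_model are illustrative values:

import torch
import torch.nn as nn

d_model, vocab_size = 512, 32000
to_vocab = nn.Linear(d_model, vocab_size)   # linear transform to a tensor of vocabulary length

def highest_confidence_word(decoder_top: torch.Tensor) -> int:
    # Normalize the possibilities of all words through softmax and return
    # the index of the word with the highest confidence.
    probs = torch.softmax(to_vocab(decoder_top), dim=-1)
    return int(torch.argmax(probs))

# usage: word_id = highest_confidence_word(torch.randn(d_model))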
In an alternative embodiment of the invention, the method further comprises: and converting words in the text to be translated into word vectors by using a preset word vector algorithm.
The word vector algorithm may specifically be a pre-trained GloVe word vector training model. GloVe is a word representation tool based on global word-frequency statistics: it learns word vectors from the co-occurrence counts between a word and its neighboring words, and finally obtains a word vector expression carrying semantic information. Specifically, it may be configured to convert each word into a vector of real numbers, pre-train on the obtained words, and map the word vectors of all words (including the MASK word) into the same vector space, with word vector dimension D. The specific process can comprise the following steps: 1) preset a corpus and construct a co-occurrence matrix from it, where each element of the co-occurrence matrix represents the number of times a word co-occurs with a context word within a context window of a specific size; in particular, a decay function is defined to compute the weight according to the distance d between two words in the context window; 2) construct an approximate relation between the word vectors and the co-occurrence matrix, which can be represented by the following formula:
w_i^T · w̃_j + b_i + b̃_j = log(X_ij)
where w_i and w̃_j are the word vectors to be solved, w_i^T is the transpose of w_i, b_i and b̃_j are the bias terms of the two word vectors, i and j index the word vectors, and X_ij is the co-occurrence count; 3) construct the loss function J according to the following formula:
J = Σ_{i,j=1..|V|} f(X_ij) · (w_i^T · w̃_j + b_i + b̃_j − log X_ij)^2
where V represents the whole dictionary; the loss function J uses the mean square error, with a weight function f(x) added. The formula of the weight function f(x) is as follows:
f(x) = (x / x_max)^α, if x < x_max;  f(x) = 1, otherwise
where x_max represents the highest number of co-occurrences of a word with another word in its context and can be set to 120, and α is 0.76. After training with the GloVe word representation tool, a word vector table of the corpus is obtained; the word vector table is:
[word vector table: a |V| × d_v matrix, one d_v-dimensional vector per dictionary word]
where d_v is the dimension of the word vectors and |V| is the size of the entire dictionary constructed above;
after the words in the original sentence data are mapped into vectors by looking them up in the word vector table, the text sentence is expressed as X = (x_1, x_2, ..., x_n), where each x_i is a d_v-dimensional word vector. Similarly, the words in the target sequence are looked up in the word vector table to obtain the vectorized target sequence, each target word likewise being mapped to a d_v-dimensional vector.
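A small sketch (Python/NumPy) of the GloVe weighting function and one term of the loss J defined above, using the stated x_max = 120 and α = 0.76; the word vectors and co-occurrence count in the usage lines are made-up illustrative values:

import numpy as np

X_MAX, ALPHA = 120.0, 0.76

def glove_weight(x: float) -> float:
    # f(x): down-weights rare co-occurrences and caps frequent ones at 1
    return (x / X_MAX) ** ALPHA if x < X_MAX else 1.0

def glove_pair_loss(w_i, w_j_tilde, b_i, b_j_tilde, x_ij: float) -> float:
    # One term of J: f(X_ij) * (w_i^T · w̃_j + b_i + b̃_j - log X_ij)^2
    diff = np.dot(w_i, w_j_tilde) + b_i + b_j_tilde - np.log(x_ij)
    return glove_weight(x_ij) * diff ** 2

rng = np.random.default_rng(0)
w_i, w_j_tilde = rng.normal(size=50), rng.normal(size=50)   # illustrative 50-dimensional vectors
term = glove_pair_loss(w_i, w_j_tilde, 0.0, 0.0, x_ij=35.0)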
in another alternative embodiment of the present invention, the method may further comprise: and obtaining a position coding vector according to a preset position coding algorithm and the word vector, and splicing the word vector and the position vector into an input vector.
It should be noted that the multi-head attention mechanism models the sequence as a bag of words, i.e., as a flat structure: no matter how far apart two words are, the distance seen by the multi-head attention mechanism is 1. Such modeling effectively loses the relative distance relationship between words. For example, sentences containing the same words in different orders (e.g., different orderings of "cattle", "eat" and "grass") would be modeled with identical representations for each word. To alleviate this problem in the Transformer, the embodiment of the present invention maps the position of each word in the sentence into a vector and supplements the embedding layer with it, that is, the word vector context is processed according to a preset position encoding algorithm to obtain a position encoding vector. Considering that long texts are used in the text translation task, the embodiment of the present invention uses relative position encoding in order to maintain long-distance memory of the text.
The position encoding vector can be randomly initialized and trained in the model, or generated by a sine or cosine function. The specific process of random initialization and training in the model may be as follows: the position distance of each word in the context relative to the target sequence is calculated to obtain position distance information (if the target sequence consists of several words and a context word belongs to the target sequence, the position distance between that word and the target sequence is 0), and the position weights of all context words relative to the target sequence are then calculated from this position distance information. For example, first the position of the first character of the target sequence within the whole original sentence is found and its index is recorded as k; then the distance l between each context word and the target sequence is calculated; assuming the total length of the target sequence is m, the distance calculation formula is:
l_i = i − k, if i < k;  l_i = 0, if k ≤ i ≤ k + m − 1;  l_i = i − (k + m − 1), if i > k + m − 1
where l_i denotes the distance of the ith word in the current context from the target sequence.
As shown in the above formula, the distances of context words on the left side of the target sequence are less than 0, the distances of context words on the right side are greater than 0, and the distances of all words inside the target sequence are set to 0. The position weight of each word in the text sentence relative to the target sequence can then be calculated by the following formula:
[formula image: position weight w_i computed from n, m and |l_i|]
where n is the total length of the text, m is the total length of the target sequence, |l_i| is the absolute value of the distance, and w_i denotes the position weight of the ith word in the original sentence relative to the target sequence.
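A sketch (Python) of the position-distance computation as reconstructed above; k, m and n are as defined in the text, the piecewise form (negative on the left, zero inside the target, positive on the right) is an assumption consistent with the description, and the exact position-weight formula is not reproduced because it only survives as an image in the original:

def position_distances(n: int, k: int, m: int) -> list:
    # Distance l_i of each position i in a sentence of length n from a target
    # sequence starting at index k with length m.
    distances = []
    for i in range(n):
        if i < k:                      # context to the left of the target: negative
            distances.append(i - k)
        elif i <= k + m - 1:           # inside the target sequence: zero
            distances.append(0)
        else:                          # context to the right of the target: positive
            distances.append(i - (k + m - 1))
    return distances

# usage: position_distances(10, 4, 2) -> [-4, -3, -2, -1, 0, 0, 1, 2, 3, 4]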
In addition, generating the position vector by a sine function or a cosine function may include:
PE_{2i}(p) = sin(p / 10000^(2i/d_pos)),  PE_{2i+1}(p) = cos(p / 10000^(2i/d_pos))
where, if the word embedding length is d_pos, a position encoding vector PE of length d_pos is constructed; p denotes the position of the word, and PE_i(p) denotes the value of the ith element of the position vector for the pth word. The word vector is then added directly to the position vector. This position encoding not only contains absolute position information; since sin(α+β) = sin α cos β + cos α sin β and cos(α+β) = cos α cos β − sin α sin β, the position vector at position p+k can be expressed as a linear transformation of the position vector at position p, so relative position information is also represented.
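A sketch (Python/NumPy) of the sinusoidal position encoding given above; the base 10000 is the conventional choice, and the dimensions in the usage line are illustrative:

import numpy as np

def sinusoidal_position_encoding(max_len: int, d_pos: int) -> np.ndarray:
    # PE[p, 2i] = sin(p / 10000^(2i/d_pos)), PE[p, 2i+1] = cos(p / 10000^(2i/d_pos))
    pe = np.zeros((max_len, d_pos))
    p = np.arange(max_len)[:, None]
    div = np.power(10000.0, np.arange(0, d_pos, 2) / d_pos)
    pe[:, 0::2] = np.sin(p / div)
    pe[:, 1::2] = np.cos(p / div)
    return pe

# usage: pe = sinusoidal_position_encoding(128, 512); each row is added directly to the
# word vector at the same position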
In yet another alternative embodiment of the present invention, the method may further comprise: and carrying out factorization processing on the input vector to obtain bidirectional information of the input vector.
The specific factorization process may include the following: given a sequence x of length T, there are T! permutations in total, corresponding to T! chain decomposition methods. For example, given the sequence x = x1x2x3, there are 3! = 6 decomposition methods, respectively:
p(x) = p(x1)p(x2|x1)p(x3|x1x2) → 1→2→3
p(x) = p(x1)p(x2|x1x3)p(x3|x1) → 1→3→2
p(x) = p(x1|x2)p(x2)p(x3|x1x2) → 2→1→3
p(x) = p(x1|x2x3)p(x2)p(x3|x2) → 2→3→1
p(x) = p(x1|x3)p(x2|x1x3)p(x3) → 3→1→2
p(x) = p(x1|x2x3)p(x2|x3)p(x3) → 3→2→1
where p(x2|x1x3) denotes the probability that the second word is x2 given that the first word is x1 and the third word is x3; that is, the original word order is preserved. If the translator traverses all T! decomposition methods and the model parameters are shared, the model should be able to learn from all kinds of contexts. An ordinary left-to-right or right-to-left language model can only learn dependencies in one direction, such as "guessing" a first word, then "guessing" a second word based on the first, then "guessing" a third word based on the first two, and so on. A permutation language model instead learns various guessing orders; for example, the order 3 → 1 → 2 above means "guessing" the third word first, then guessing the first word based on the third word, and finally guessing the second word based on the first and third words. It should be noted that, after the factorized sequence is obtained, the mask mechanism is used here not to treat a word as the prediction target but to keep it from participating in the prediction of the sequence. That is, each time one masked word is predicted from its context, and then another randomly masked word is predicted from its context, until all words have been masked and predicted, thereby obtaining a pre-trained model.
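A small sketch (Python) that simply enumerates the T! prediction orders discussed above (here T = 3) and prints the corresponding chain decomposition; it illustrates the factorization only and does not implement the probability model itself:

from itertools import permutations

def chain_decompositions(t: int = 3):
    # For every prediction order, record which already-"guessed" words each word
    # is conditioned on, i.e. the factor p(x_word | x_seen).
    result = []
    for order in permutations(range(1, t + 1)):
        seen, factors = [], {}
        for word in order:
            factors[word] = tuple(seen)
            seen.append(word)
        result.append((order, factors))
    return result

for order, factors in chain_decompositions(3):
    terms = []
    for w in sorted(factors):
        given = "".join("x%d" % c for c in sorted(factors[w]))
        terms.append("p(x%d|%s)" % (w, given) if given else "p(x%d)" % w)
    print("->".join(map(str, order)), ":", " ".join(terms))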
It should be noted that the above factorization step may be performed during model pre-training. Since the dimension reduction bucket model is essentially a Bert model constructed with a dimension reduction bucket self-attention mechanism, the Bert model needs to undergo a Masked LM process during pre-training, that is, the model parameters are adjusted through masked word prediction to obtain the pre-trained dimension reduction bucket model. In the standard Bert model the masked words are treated as independent of each other: after 15% of the words are masked, that 15% is predicted from the remaining 85% of the words, and the semantic relationships within the context are ignored. Therefore, the embodiment of the present invention uses factorization so that all context words are included in the training process, thereby improving the accuracy of the model.
In yet another alternative embodiment of the present invention, the method further comprises: and training a dimensionality reduction barrel model according to the Bert model and the bidirectional information of the input vector.
For the embodiment of the present invention, the traditional autoregressive model performs one-directional inference, whereas the Bert model performs bidirectional inference. Combining the advantages of the two, and considering the drawback that the mask mechanism used by the Bert model in the pre-training task does not appear in downstream fine-tuning, a factorization method can be adopted: with the word order kept unchanged, 85% of the surrounding words within the length limit are randomly selected as context embedding (contextual embedding). In addition, the Bert model constructed with the dimension reduction bucket attention mechanism can solve the problem of deep semantic transfer for ultra-long text in NLP translation tasks, and the text length accepted by a single input to the model is expanded by reducing the computational complexity.
Compared with the prior art that text data translation is carried out through a self-attention mechanism, the text translation method based on the dimension reduction barrel model has the advantages that a text translation request is received, and the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result. Therefore, the problem of deep semantic transfer of long texts in an NLP translation task can be solved by introducing a dimensionality reduction bucket algorithm, the length of a single input text is expanded by reducing the calculation complexity, and the accuracy and the translation efficiency of a translation result are improved.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a text translation apparatus based on a dimension reduction bucket model, and as shown in fig. 2, the apparatus includes: a receiving unit 21, a processing unit 22 and a response unit 23.
The receiving unit 21 may be configured to receive a text translation request, where the request carries text data to be translated;
the processing unit 22 may be configured to process the text data to be translated through a dimension reduction bucket algorithm, and determine a word with the highest output confidence as a text translation result;
the response unit 23 may be configured to respond to the text translation request by using the text translation result.
The processing unit 22 may be specifically configured to process the text data to be translated by using a pre-trained dimension reduction bucket model, and determine a word with the highest output confidence as a text translation result.
Further, the processing unit 22 includes:
the dividing module 221 may be configured to perform normalization processing according to the obtained input vector to obtain an attention score, and divide the reduced-dimension bucket structure by using the attention score;
a supplement module 222, configured to perform a dimension reduction bucket structure supplement by using attention scores of different preset orders of magnitude;
the dimension reduction module 223 may be configured to perform dimension reduction processing on the dimension reduction barrel structure by using a dimension reduction algorithm;
the updating module 224 may be configured to perform shared weight updating on the attention score obtained through the dimension reduction processing to obtain a shared attention score, and output a word with the highest confidence as a text translation result through normalization processing.
Further, the apparatus further comprises:
the converting unit 24 may be configured to convert words in the text to be translated into word vectors by using a preset word vector algorithm.
Further, the apparatus further comprises:
the splicing unit 25 may be configured to obtain a position coding vector according to a preset position coding algorithm and the word vector, and splice the word vector and the position vector into an input vector.
Further, the apparatus further comprises:
the decomposition unit 26 may be configured to perform factorization processing on the input vector to obtain bidirectional information of the input vector.
Further, the apparatus further comprises:
the training unit 27 may train the dimension reduction bucket model according to the Bert model and the bidirectional information of the input vector.
It should be noted that other corresponding descriptions of the functional modules involved in the text translation apparatus based on the dimension reduction bucket model provided in the embodiment of the present invention may refer to the corresponding descriptions of the method shown in fig. 1, and are not described herein again.
Based on the method shown in fig. 1, correspondingly, an embodiment of the present invention further provides a storage medium, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following steps: receiving a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result.
Based on the above-mentioned embodiments of the method shown in fig. 1 and the apparatus shown in fig. 2, the embodiment of the present invention further provides a computer device, as shown in fig. 3, including a processor (processor) 31, a communication interface (Communication Interface) 32, a memory (memory) 33, and a communication bus 34. Wherein: the processor 31, the communication interface 32, and the memory 33 communicate with each other via the communication bus 34. The communication interface 32 is used for communicating with network elements of other devices, such as clients or other servers. The processor 31 is configured to execute a program, and may specifically execute relevant steps in the above embodiment of the text translation method based on the dimension reduction bucket model. In particular, the program may include program code comprising computer operating instructions. The processor 31 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The terminal comprises one or more processors, which can be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs. And a memory 33 for storing a program. The memory 33 may comprise a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), such as at least one disk memory. The program may specifically be adapted to cause the processor 31 to perform the following operations: receiving a text translation request, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result.
According to the technical scheme, a text translation request can be received, wherein the request carries text data to be translated; processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result; and responding to the text translation request by using the text translation result. Therefore, the problem of deep semantic transfer of long texts in an NLP translation task is solved by introducing a dimensionality reduction bucket algorithm, the length of a single input text is expanded by reducing the calculation complexity, and the accuracy and the translation efficiency of a translation result are improved.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A text translation method based on a dimension reduction bucket model is characterized by comprising the following steps:
receiving a text translation request, wherein the request carries text data to be translated;
processing the text data to be translated through a dimension reduction barrel algorithm, and determining the word with the highest output confidence coefficient as a text translation result;
and responding to the text translation request by using the text translation result.
2. The method according to claim 1, wherein the processing the text data to be translated through the dimensionality reduction bucket algorithm, and determining the word with the highest output confidence as the text translation result comprises:
and processing the text data to be translated by using a pre-trained dimensionality reduction barrel model, and determining the word with the highest output confidence coefficient as a text translation result.
3. The method according to claim 1, wherein the processing the text data to be translated by using a pre-trained dimension reduction bucket model, and determining the word with the highest output confidence as the text translation result comprises:
carrying out normalization processing according to the obtained input vector to obtain an attention score, and dividing a dimensionality reduction barrel structure by using the attention score;
performing the structure supplement of the dimensionality reduction barrel by presetting attention scores of different orders of magnitude;
performing dimensionality reduction processing on the dimensionality reduction barrel structure by using a dimensionality reduction algorithm;
and updating the sharing weight of the attention score obtained by the dimensionality reduction treatment to obtain a shared attention score, and outputting a word with the highest confidence as a text translation result after normalization treatment.
4. The method of claim 1, wherein before the processing the text data to be translated by the dimension reduction bucket algorithm, the method further comprises:
and converting words in the text to be translated into word vectors by using a preset word vector algorithm.
5. The method according to claim 4, wherein before converting words in the text to be translated into word vectors by using a preset word vector algorithm, the method further comprises:
and obtaining a position coding vector according to a preset position coding algorithm and the word vector, and splicing the word vector and the position vector into an input vector.
6. The method of claim 5, further comprising:
and carrying out factorization processing on the input vector to obtain bidirectional information of the input vector.
7. The method of claim 6, wherein before factoring the input vector to obtain the bi-directional information of the input vector, the method further comprises:
and training a dimensionality reduction barrel model according to the Bert model and the bidirectional information of the input vector.
8. A text translation device based on a dimension reduction bucket model is characterized by comprising:
a receiving unit, configured to receive a text translation request, wherein the request carries text data to be translated;
the processing unit is used for processing the text data to be translated through a dimension reduction barrel algorithm and determining the word with the highest output confidence coefficient as a text translation result;
a response unit, configured to respond to the text translation request with the text translation result.
9. A storage medium having a computer program stored thereon, the storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the dimension reduction bucket model based text translation method according to any one of claims 1-7.
10. A computer device comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus, and the memory is used for storing at least one executable instruction, which causes the processor to perform an operation corresponding to the dimension reduction bucket model-based text translation method according to any one of claims 1-7.
CN202010349528.5A 2020-04-28 2020-04-28 Text translation method and device based on dimension reduction barrel model Active CN111680519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010349528.5A CN111680519B (en) 2020-04-28 2020-04-28 Text translation method and device based on dimension reduction barrel model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010349528.5A CN111680519B (en) 2020-04-28 2020-04-28 Text translation method and device based on dimension reduction barrel model

Publications (2)

Publication Number Publication Date
CN111680519A true CN111680519A (en) 2020-09-18
CN111680519B CN111680519B (en) 2023-04-07

Family

ID=72452367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010349528.5A Active CN111680519B (en) 2020-04-28 2020-04-28 Text translation method and device based on dimension reduction barrel model

Country Status (1)

Country Link
CN (1) CN111680519B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507705A (en) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment
CN112949637A (en) * 2021-05-14 2021-06-11 中南大学 Bidding text entity identification method based on IDCNN and attention mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183720A (en) * 2015-08-05 2015-12-23 百度在线网络技术(北京)有限公司 Machine translation method and apparatus based on RNN model
US20160188576A1 (en) * 2014-12-30 2016-06-30 Facebook, Inc. Machine translation output reranking
WO2019052293A1 (en) * 2017-09-12 2019-03-21 腾讯科技(深圳)有限公司 Machine translation method and apparatus, computer device and storage medium
CN110377916A (en) * 2018-08-17 2019-10-25 腾讯科技(深圳)有限公司 Word prediction technique, device, computer equipment and storage medium
CN110717539A (en) * 2019-10-08 2020-01-21 腾讯科技(深圳)有限公司 Dimension reduction model training method, retrieval method and device based on artificial intelligence

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188576A1 (en) * 2014-12-30 2016-06-30 Facebook, Inc. Machine translation output reranking
CN105183720A (en) * 2015-08-05 2015-12-23 百度在线网络技术(北京)有限公司 Machine translation method and apparatus based on RNN model
WO2019052293A1 (en) * 2017-09-12 2019-03-21 腾讯科技(深圳)有限公司 Machine translation method and apparatus, computer device and storage medium
CN110377916A (en) * 2018-08-17 2019-10-25 腾讯科技(深圳)有限公司 Word prediction technique, device, computer equipment and storage medium
CN110717539A (en) * 2019-10-08 2020-01-21 腾讯科技(深圳)有限公司 Dimension reduction model training method, retrieval method and device based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG XINLU et al.: "Analysis of a bidirectional re-ranking model for Uyghur-Chinese neural machine translation", Journal of Peking University (Natural Science Edition) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507705A (en) * 2020-12-21 2021-03-16 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment
CN112507705B (en) * 2020-12-21 2023-11-14 北京百度网讯科技有限公司 Position code generation method and device and electronic equipment
CN112949637A (en) * 2021-05-14 2021-06-11 中南大学 Bidding text entity identification method based on IDCNN and attention mechanism

Also Published As

Publication number Publication date
CN111680519B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US11321542B2 (en) Processing text sequences using neural networks
JP7285895B2 (en) Multitask learning as question answering
US11972365B2 (en) Question responding apparatus, question responding method and program
KR102565275B1 (en) Translating method and apparatus based on parallel processing
CN107273503B (en) Method and device for generating parallel text in same language
Ren et al. Conversational query understanding using sequence to sequence modeling
WO2022057776A1 (en) Model compression method and apparatus
CN110737758A (en) Method and apparatus for generating a model
Yazdani et al. A model of zero-shot learning of spoken language understanding
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
JP7417679B2 (en) Information extraction methods, devices, electronic devices and storage media
CN111191002A (en) Neural code searching method and device based on hierarchical embedding
CN108536735B (en) Multi-mode vocabulary representation method and system based on multi-channel self-encoder
CN111680519B (en) Text translation method and device based on dimension reduction barrel model
CN111611452B (en) Method, system, equipment and storage medium for identifying ambiguity of search text
CN114443850B (en) Label generation method, system, device and medium based on semantic similar model
US20230222318A1 (en) Attention neural networks with conditional computation
CN110678882A (en) Selecting answer spans from electronic documents using machine learning
CN114912450B (en) Information generation method and device, training method, electronic device and storage medium
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
US20230367978A1 (en) Cross-lingual apparatus and method
JP2019200756A (en) Artificial intelligence programming server and program for the same
WO2019163752A1 (en) Morpheme analysis learning device, morpheme analysis device, method, and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant