CN113569584A - Text translation method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN113569584A
CN113569584A
Authority
CN
China
Prior art keywords: text, emotion, vector, translated, sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110097438.6A
Other languages
Chinese (zh)
Inventor
梁云龙
孟凡东
徐金安
陈钰枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110097438.6A
Publication of CN113569584A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a text translation method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of language processing. The method generates a sequence to be translated by obtaining a text to be translated and an emotion tag associated with it, encodes the sequence to obtain an emotion prediction vector and a text processing vector, and decodes the two vectors to obtain a target translation. Emotion factors are thus introduced into the translation process, the corresponding emotion is reflected in the translation, translation accuracy is improved, and the method becomes applicable to more complex scenarios.

Description

Text translation method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of language processing technologies, and in particular to a text translation method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It uses the computer as a powerful tool for language research, carries out quantitative study of language information with the computer's support, and provides a language description that humans and computers can use in common.
With the development of research related to natural language processing, methods of translating text by machine have long been explored. Machine translation technology based on artificial neural networks has been gradually maturing: a sentence to be translated is typically split into several clauses and modeled, so that analysis and translation of the sentence are achieved, and during this process an attention network in the model can refer to context information. However, in complex application environments the translation results obtained this way are still not accurate enough.
Disclosure of Invention
The purpose of the present application is to overcome at least one of the technical drawbacks described above, in particular the drawback of poor translation accuracy.
In a first aspect, a text translation method is provided, the method including:
acquiring a text to be translated, and acquiring at least one emotion tag associated with the text to be translated, the at least one emotion tag being used to predict the emotion of the text to be translated;
generating a sequence to be translated based on the text to be translated and the emotion tag;
encoding the sequence to be translated to obtain an emotion prediction vector and a text processing vector;
decoding the emotion prediction vector and the text processing vector to obtain a target translation, the target translation corresponding to the emotion of the text to be translated.
In an optional embodiment of the first aspect, generating the sequence to be translated based on the text to be translated and the emotion tag comprises:
acquiring an associated text of the text to be translated;
splicing the text to be translated, the associated text, and the emotion tag to generate the sequence to be translated.
In an optional embodiment of the first aspect, obtaining the at least one emotion tag associated with the text to be translated comprises:
identifying a text emotion corresponding to the associated text;
generating at least one emotion tag based on the text emotion.
In an optional embodiment of the first aspect, encoding the sequence to be translated to obtain the emotion prediction vector and the text processing vector includes:
dividing the sequence to be translated into at least one unit, and determining a unit vector corresponding to each unit;
acquiring the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vectors.
In an optional embodiment of the first aspect, obtaining the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vectors comprises:
acquiring at least one sequence vector corresponding to the sequence to be translated based on the unit vectors;
determining the emotion prediction vector and at least one intermediate vector corresponding to the text to be translated from the sequence vectors;
encoding the intermediate vector to obtain the text processing vector.
In an optional embodiment of the first aspect, decoding the emotion prediction vector and the text processing vector to obtain the target translation includes:
acquiring an emotion distribution vector containing preset emotion types based on the emotion prediction vector;
converting the emotion distribution vector into an emotion feature vector;
decoding the text processing vector to obtain at least one decoding vector;
splicing the at least one decoding vector with the emotion feature vector to obtain the target translation.
In an optional embodiment of the first aspect, converting the emotion distribution vector into the emotion feature vector comprises:
determining at least one preset emotion and the probability corresponding to each preset emotion based on the emotion distribution vector;
determining an emotion expression vector corresponding to each preset emotion from a preset emotion expression matrix;
acquiring the emotion feature vector based on the probabilities corresponding to the preset emotions and the emotion expression vectors.
In a second aspect, a text translation apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain a text to be translated and at least one emotion tag associated with the text to be translated, the at least one emotion tag being used to predict the emotion of the text to be translated;
a generating module, configured to generate a sequence to be translated based on the text to be translated and the emotion tag;
an encoding module, configured to encode the sequence to be translated to obtain an emotion prediction vector and a text processing vector;
a decoding module, configured to decode the emotion prediction vector and the text processing vector to obtain a target translation, the target translation corresponding to the emotion of the text to be translated.
In an optional embodiment of the second aspect, the generating module, when generating the sequence to be translated based on the text to be translated and the emotion tag, is specifically configured to:
acquire an associated text of the text to be translated;
splice the text to be translated, the associated text, and the emotion tag to generate the sequence to be translated.
In an optional embodiment of the second aspect, the obtaining module, when obtaining the at least one emotion tag associated with the text to be translated, is specifically configured to:
identify a text emotion corresponding to the associated text;
generate at least one emotion tag based on the text emotion.
In an optional embodiment of the second aspect, the encoding module, when encoding the sequence to be translated to obtain the emotion prediction vector and the text processing vector, is specifically configured to:
divide the sequence to be translated into at least one unit, and determine a unit vector corresponding to each unit;
acquire the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vectors.
In an optional embodiment of the second aspect, the encoding module, when obtaining the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vectors, is specifically configured to:
acquire at least one sequence vector corresponding to the sequence to be translated based on the unit vectors;
determine the emotion prediction vector and at least one intermediate vector corresponding to the text to be translated from the sequence vectors;
encode the intermediate vector to obtain the text processing vector.
In an optional embodiment of the second aspect, the decoding module, when decoding the emotion prediction vector and the text processing vector to obtain the target translation, is specifically configured to:
acquire an emotion distribution vector containing preset emotion types based on the emotion prediction vector;
convert the emotion distribution vector into an emotion feature vector;
decode the text processing vector to obtain at least one decoding vector;
splice the at least one decoding vector with the emotion feature vector to obtain the target translation.
In an optional embodiment of the second aspect, the decoding module, when converting the emotion distribution vector into the emotion feature vector, is specifically configured to:
determine at least one preset emotion and the probability corresponding to each preset emotion based on the emotion distribution vector;
determine an emotion expression vector corresponding to each preset emotion from a preset emotion expression matrix;
acquire the emotion feature vector based on the probabilities corresponding to the preset emotions and the emotion expression vectors.
In a third aspect, an electronic device is provided, which includes:
the text translation system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the text translation method of any one of the embodiments.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the text translation method of any of the above embodiments.
According to the text translation method provided above, a sequence to be translated is generated by obtaining a text to be translated and an emotion tag associated with it; the sequence is encoded to obtain an emotion prediction vector and a text processing vector, and the two vectors are then decoded to obtain a target translation. Emotion factors are thereby introduced into the translation process, the corresponding emotion is reflected in the translation, translation accuracy is improved, and the method becomes applicable to more complex scenarios.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a text translation method according to an embodiment of the present application;
fig. 2 is a schematic diagram of acquiring a sequence to be translated in a text translation method according to an embodiment of the present application;
fig. 3 is a schematic flowchart illustrating a sequence to be translated passing through an embedding layer in a text translation method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a sub-structure of an encoder in a text translation method according to an embodiment of the present application;
fig. 5 is a schematic diagram of a sub-structure of an encoder in a text translation method according to an embodiment of the present application;
fig. 6 is a schematic flowchart of obtaining at least one sequence vector through an encoder in a text translation method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of determining an emotion prediction vector and an intermediate vector from a sequence vector in a text translation method according to an embodiment of the present application;
fig. 8 is a schematic flowchart illustrating a process of encoding a unit vector by an encoder in a text translation method according to an embodiment of the present application;
fig. 9 is a schematic diagram of emotion distribution vectors in a text translation method according to an embodiment of the present application;
fig. 10 is a schematic diagram of a decoder substructure in a text translation method according to an embodiment of the present application;
fig. 11 is a schematic diagram of a decoder substructure in a text translation method according to an embodiment of the present application;
fig. 12 is a schematic flowchart illustrating encoding and decoding unit vectors in a text translation method according to an embodiment of the present application;
fig. 13 is a schematic flowchart illustrating encoding and decoding unit vectors in a text translation method according to an embodiment of the present application;
fig. 14 is a schematic flowchart of obtaining an emotion feature vector in a text translation method according to an embodiment of the present application;
fig. 15 is a schematic flowchart of a text translation method according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of a text translation apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an electronic device for text translation according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Natural Language Processing (NLP) is a technology for interacting with machines using the natural language of human communication. It processes natural language so that computers can read and understand it. Research on natural language processing began in close connection with the exploration of machine translation: to machine-translate natural language text, methods have been sought that enable a computer to resolve natural language semantics and to express a given meaning in natural language.
With the development of artificial intelligence technology, neural-network-based machine translation has gradually surpassed rule-based and statistical translation methods and become the current mainstream technical approach. Neural machine translation models variable-length input sentences through an encoder-decoder structure: the encoder realizes the "understanding" of the source language sentence, forming floating-point vectors of a specific dimension, from which the decoder then generates the target-language translation word by word. This approach solves many problems of traditional methods, such as the length limitation of sequence models, and improves fluency; yet in complex application scenarios translation accuracy can still be low. In dialogue translation scenarios, for example, emotion factors are not well taken into account, and the translated text rarely reflects the corresponding emotion.
The text translation method, the text translation device, the electronic device and the computer-readable storage medium provided by the application aim to solve the technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The text translation method provided by the embodiment of the application can be applied to a server and can also be applied to a terminal.
Those skilled in the art will understand that the "terminal" used herein may be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a MID (Mobile Internet Device), etc.; a "server" may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
The embodiment of the present application provides a text translation method, which may be applied to a server or a terminal, and as shown in fig. 1, the method may include:
step S101, acquiring a text to be translated, and acquiring at least one emotion label associated with the text to be translated; at least one emotion tag is used to predict the emotion of the text to be translated.
In the embodiment of the present application, the text to be translated may be a sentence, a section of speech, an article, etc., and may also be a sentence in a conversation process. The method for acquiring the text to be translated can be that a user inputs the text, the text meeting preset conditions can be automatically captured by a computer, and the text can also be a dialog text generated after voice recognition in a real-time dialog.
The obtaining of the at least one emotion tag associated with the text to be translated may be obtaining a context of the text to be translated, determining at least one emotion tag corresponding to the context, using the emotion tags as the emotion tags associated with the text to be translated, and determining emotion tags that may be matched with the text to be translated based on the environment type by determining a language occurrence environment of the text to be translated.
The emotion label associated with the text to be translated can be used for predicting the emotion corresponding to the text to be translated, so that preparation is made for introducing emotion factors in the translation process.
Specifically, the emotion labels may be in text form, such as "happy", "surprised", "angry", and the like; can be a number or a symbol, which is then associated with a different emotion; different emotions may also be represented in the form of vectors.
In an example, when the text to be translated has no context or other information that can be used to obtain the associated emotion tag, an initialized emotion tag to be predicted may be set to store the emotion feature vector generated after the encoding and decoding processes, and the emotion feature vector may be used to assist in obtaining a translation more suitable for the corresponding emotion.
Step S102, generating a sequence to be translated based on the text to be translated and the emotion tag.
In the embodiment of the application, the text to be translated and the emotion tag can be spliced to generate the sequence to be translated, which makes it convenient to encode and decode the sequence as a whole.
In one example, when the text to be translated has no context or other information from which an associated emotion tag could be obtained, an initialized emotion tag to be predicted may be set and spliced at the head end of the text to be translated.
Step S103, encoding the sequence to be translated to obtain an emotion prediction vector and a text processing vector.
In the embodiment of the application, the sequence to be translated may include the text to be translated and the emotion tag associated with it. Encoding the sequence to be translated means inputting it into a preset encoder, converting it into corresponding sequence vectors, and then obtaining the emotion prediction vector and the text processing vector based on those sequence vectors.
The emotion prediction vector can represent the emotion prediction result corresponding to the text to be translated and introduces the emotion factor into the decoding process; the text processing vector may be the vector obtained by encoding the portion of the sequence to be translated that corresponds to the text to be translated.
Step S104, decoding the emotion prediction vector and the text processing vector to obtain a target translation; the target translation corresponds to the emotion of the text to be translated.
In the embodiment of the application, the emotion prediction vector can be converted into an emotion feature vector, and the text processing vector can be decoded to obtain decoding vectors; the emotion feature vector and the decoding vectors can then be feature-fused and passed through a Softmax layer to obtain the target translation, which is fused with the emotion corresponding to the text to be translated.
According to the text translation method provided by the embodiment of the application, a sequence to be translated is generated by obtaining a text to be translated and an emotion tag associated with it; the sequence is encoded to obtain an emotion prediction vector and a text processing vector, and the two vectors are then decoded to obtain a target translation. Emotion factors are thereby introduced into the translation process, the corresponding emotion is reflected in the translation, translation accuracy is improved, and the method becomes applicable to more complex scenarios.
In the embodiment of the present application, generating a sequence to be translated based on a text to be translated and an emotion tag may include the following steps:
(1) acquiring an associated text of the text to be translated;
(2) splicing the text to be translated, the associated text, and the emotion tag to generate the sequence to be translated.
The associated text of the text to be translated may be text information that has an association relationship with the text to be translated; the relationship may be a contextual one, or an association such as belonging to the same article topic or describing the same scenario. For example: when the text to be translated is a sentence or paragraph in an article, the associated text may be the preceding information of that sentence or paragraph; when the text to be translated is an utterance of a character in a dialogue, the associated text may be the historical dialogue information before that utterance, which may come from the same character or from other characters participating in the dialogue; when the text to be translated describes a specific subject, such as a wedding scene, the associated text may be text information describing that wedding scene.
The text to be translated, the associated text and the at least one emotion tag may be spliced into a sequence to be translated. The emotion label associated with the text to be translated may refer to the emotion label corresponding to the associated text.
In the embodiment of the application, the preceding information or historical dialogue information of the text to be translated can serve as its associated text, and the emotion tag associated with the text to be translated may refer to the emotion tag corresponding to the associated text. To explain further, the emotion tags may be divided into known tags and a predicted tag: the known tags may be obtained based on the associated text, while the predicted tag may be initialized to store the prediction result of the emotion corresponding to the text to be translated. Specifically, the content of the predicted tag may be predicted using the feature information contained in the known tags and the associated text.
Specifically, the splicing may place the associated text before the text to be translated, place the predicted tag at the head end of the text, and insert each of the at least one known tags after the associated text to which it corresponds.
In one example, a CLS (classification) tag may be added before the overall text composed of the associated text and the text to be translated; this CLS tag can serve as the predicted tag among the emotion tags, storing the emotion prediction result corresponding to the text to be translated.
Because the CLS tag carries no obvious semantic information of its own, it fuses the semantic information of each word in the text more fairly, so the emotion of the text to be translated can be predicted well from the global feature information.
In one example, as shown in formula (1), a CLS tag is placed at the head end of the text to store the emotion prediction result corresponding to the text to be translated, and SEP tags are placed between sentences to segment the text, where the segmentation may follow punctuation marks. Xn denotes the text to be translated, X0 to Xn-1 denote the associated texts, and e(n-1) denotes the emotion tag corresponding to the (n-1)-th associated text Xn-1. The sequence may thus be spliced as:

[CLS] X0 e0 [SEP] X1 e1 [SEP] ... [SEP] Xn-1 e(n-1) [SEP] Xn    (1)
In one example, as shown in fig. 2, "first meeting" may be the text to be translated and "hello" the associated text; a CLS tag is placed at the head end of the text to store the prediction result for the emotion corresponding to "first meeting", and an SEP mark is placed between "hello" and "first meeting" to separate the clauses. "Happy" may be the emotion tag corresponding to "hello", that is, the known tag among the emotion tags, which corresponds to the associated text.
In the embodiment of the application, when the text to be translated, the associated text, and the at least one emotion tag are spliced, the emotion tags used for splicing may be in word form, such as "happy", "surprised", or "angry"; they may be numbers or symbols each associated with a different emotion; or different emotions may be represented in the form of vectors.
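For illustration only, the splicing described above can be sketched in Python; the helper below, its argument names, and the use of [CLS]/[SEP] string tokens are assumptions drawn from formula (1) and fig. 2, not an implementation taken from the patent:

```python
def build_sequence(associated_texts, known_tags, text_to_translate,
                   pred_tag="[CLS]", sep="[SEP]"):
    """Splice associated texts, their known emotion tags, and the text to be
    translated into one sequence, with the prediction tag at the head end."""
    assert len(associated_texts) == len(known_tags)
    parts = [pred_tag]                    # slot that will store the emotion prediction
    for text, tag in zip(associated_texts, known_tags):
        parts += [text, tag, sep]         # each associated text followed by its known tag
    parts.append(text_to_translate)       # the text to be translated comes last
    return " ".join(parts)

# build_sequence(["hello"], ["happy"], "first meeting")
# -> "[CLS] hello happy [SEP] first meeting"
```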
In an embodiment of the present application, obtaining the at least one emotion tag associated with the text to be translated may include: identifying a text emotion corresponding to the associated text, and generating at least one emotion tag based on the text emotion. The emotion tag corresponding to the associated text may already be known, or it may be generated through recognition by a neural network model for identifying text emotion.
In this embodiment of the present application, encoding a sequence to be translated to obtain an emotion prediction vector and a text processing vector may include the following steps:
(1) dividing the sequence to be translated into at least one unit and determining the unit vector corresponding to each unit.
The sequence to be translated can be divided into at least one unit (token) based on the results of lexical analysis; a unit may be a character or a word. Each unit can be encoded, and the sparse vector obtained after encoding mapped into a low-dimensional character vector or word vector, thereby determining the unit vector corresponding to each unit.
In the embodiment of the application, dividing the sequence to be translated into at least one unit and determining the unit vector corresponding to each unit can be realized through an embedding layer as used in deep learning. A large sparse vector is thus converted into a low-dimensional space that preserves semantic relationships, which reduces resource occupation while retaining the internal semantic relationships between words; these relationships mature gradually during training.
In particular, several embedding layers with different functions may be used together; for example, one or more of the following may be included: a word embedding layer (word embedding), which divides the text into words and represents each with a shorter vector; a position embedding layer (positional embedding), which adds position information to each unit and can be obtained by training or by formula calculation; a turn embedding layer (turn embedding), which divides turns when the text is layered or contains multiple conversation turns; and a role embedding layer (role embedding), which divides the text based on role attribution when multiple roles are present, for example dividing the actions of different characters in a novel, or dividing a dialogue text by the utterances of different roles.
In one example, as shown in FIG. 3, when applied to a dialog translation scenario, the embedded layer may consist of a stack of four layers: word embedding layer (word embedding), position embedding layer (position embedding), turn embedding layer (turn embedding), and role embedding layer (role embedding). The word embedding layer can be used for segmenting words of the associated text 1, the emotion tag 1, the associated text 2, the emotion tag 2 and the text to be translated to obtain at least one unit, and representing the unit by using a vector with a lower dimension; the position embedding layer can be used for adding position information to each unit and sequencing the units; the turn embedding layer can be used for distinguishing each turn, and assuming that in a conversation scene, the "associated text 1" and the "associated text 2" belong to a first turn, namely the first turn of conversation content, and the "text to be translated" belongs to a second turn, a turn identifier can be embedded between the "associated text 2" and the "text to be translated"; the role embedding layer can be used for segmentation based on text role attributes, for example, in a conversation scene, if "associated text 1" and "associated text 2" are utterances of a role a and "to-be-translated text" is an utterance of a role B, a role identifier can be embedded between the "associated text 2" and the "to-be-translated text".
The outputs of these four embedding layers with different functions may then be added together.
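As a minimal sketch of how these four embedding layers could be combined (assuming PyTorch; the vocabulary size, turn count, role count, and dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DialogueEmbedding(nn.Module):
    """Word + position + turn + role embeddings, added element-wise."""
    def __init__(self, vocab=30000, max_len=512, turns=16, roles=4, dim=512):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)   # learned positions; a formula could be used instead
        self.turn = nn.Embedding(turns, dim)    # which conversation turn a unit belongs to
        self.role = nn.Embedding(roles, dim)    # which speaker a unit belongs to

    def forward(self, tok_ids, turn_ids, role_ids):
        pos_ids = torch.arange(tok_ids.size(1), device=tok_ids.device)
        return (self.word(tok_ids) + self.pos(pos_ids)
                + self.turn(turn_ids) + self.role(role_ids))
```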
In the embodiment of the present application, the weights of the embedding layer may be replaced by a pre-training model; if a pre-training model replaces the embedding layer, the weight parameters need not be trained in advance. For example, a BERT (Bidirectional Encoder Representations from Transformers) pre-training model can be used as the embedding layer.
The BERT model is suitable for sentence- and paragraph-level tasks, performs well on high-level semantic information extraction, and has the advantage of providing context-dependent bidirectional feature representations.
(2) acquiring the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vectors.
In the embodiment of the application, at least one unit vector can be encoded, and an emotion prediction vector and a text processing vector corresponding to a text to be translated are determined from an encoding result.
The encoding can be performed by inputting the unit vectors into an encoder. The encoder can include an attention mechanism: while generating each output, it produces an attention range indicating which parts of the input sequence should be focused on when generating the next output, and generates that output according to the focused region.
The emotion prediction vector may be used to represent a prediction result of an emotion corresponding to the text to be translated after encoding, and the text processing vector may refer to a portion corresponding to the text to be translated in a sequence vector obtained by encoding a unit into which the sequence to be translated is divided. The relationship between the emotion prediction vector, the text processing vector, and the sequence vector will be explained in the following.
In this embodiment of the present application, obtaining an emotion prediction vector and a text processing vector corresponding to a text to be translated based on a unit vector may include the following steps:
(1) at least one sequence vector corresponding to the sequence to be translated is obtained based on the unit vectors.
In this embodiment of the present application, the unit vector may be encoded by an encoder to obtain at least one sequence vector corresponding to the sequence to be translated, and the sequence vector and the unit vector may correspond to each other one to one.
As shown in fig. 4, X1, X2, and X3 may represent three unit vectors, which are input into the self-attention layer to obtain the processed Z1, Z2, and Z3; Z1, Z2, and Z3 are then input into the feedforward neural network, completing the single-layer substructure's encoding of the unit vectors.
In the embodiment of the present application, the Self-Attention layer may be a Self-Attention mechanism (Self-Attention), and as each unit of the sequence to be processed is input, the Self-Attention focuses on all units of the whole input sequence, and the understanding of all relevant units is integrated into the unit being processed to assist the encoding process. The self-Attention mechanism may be a single-Head Attention mechanism or a Multi-Head Attention mechanism (Multi-Head Attention).
A feed-forward neural network (FFN) may be a unidirectional, multi-layer structure in which each layer contains a number of neurons; each neuron receives signals from the neurons of the previous layer and generates an output to the next layer. Specifically, the feedforward neural network can be realized with a fully connected layer, which may be formed by a two-layer network that applies a linear transformation, then a ReLU nonlinear transformation, and finally another linear transformation.
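Following that description (linear transformation, ReLU, linear transformation), a minimal PyTorch sketch of such a feedforward sublayer, with illustrative dimensions, might be:

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Two fully connected layers: linear -> ReLU -> linear."""
    def __init__(self, dim=512, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)
```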
Normalization can be performed in various ways, such as Layer Normalization (LN), Batch Normalization (BN), or Weight Normalization (WN).
In one example, a multi-head attention mechanism can increase the model's ability to capture information at different positions and associate words at more positions; the mappings do not share weights and project onto different subspaces, so the finally spliced vector covers broader information. Increasing the number of heads of the multi-head attention mechanism can further improve the model's ability to capture long-distance information.
In one example, the Normalization uses a Layer Normalization method, the input of all dimensions of a Layer is considered comprehensively, the average input value and the input variance of the Layer are calculated, then the input of each dimension is converted by using the same Normalization operation, the average value and the variance do not need to be saved, and the additional storage space can be saved.
In the embodiment of the present application, the fully-connected layer in the feedforward neural network may be replaced with a convolutional layer.
In the embodiment of the present application, residual connections may be added in the encoder substructure: the output of the self-attention layer is summed with its input and normalized, and a residual connection is likewise applied around the feedforward neural network, whose output is summed and normalized before being passed on. A schematic of this substructure is shown in fig. 5. X1 and X2 may be two units input into the self-attention layer; Z1 and Z2 may be the corresponding outputs after self-attention processing. Z1 and Z2 are arranged in matrix form, summed with the residual block, and normalized to obtain Z1' and Z2'; Z1' and Z2' are input into the feedforward neural network and, in the same way, summed with the residual block and normalized, completing the single-layer substructure's encoding of the unit vectors.
In the embodiment of the application, the residual block is added to prevent degradation in deep neural network training, counteracting the vanishing-gradient problem caused by increasing depth; the residual block can be obtained through a Residual Network (ResNet).
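Putting the sublayers together, one encoder substructure of fig. 5 might be sketched as follows; the post-norm ordering and the hyperparameters are assumptions:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Self-attention and feedforward sublayers, each wrapped in a
    residual connection followed by layer normalization (cf. fig. 5)."""
    def __init__(self, dim=512, heads=8, hidden=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, pad_mask=None):
        a, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + a)             # sum with residual, then normalize
        x = self.norm2(x + self.ffn(x))   # same around the feedforward network
        return x
```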
In this embodiment of the present application, as shown in fig. 6, at least one sequence vector corresponding to the sequence to be translated may be obtained as follows: the sequence to be translated is divided into at least one unit through the embedding layer, the unit vectors X1, X2, X3, X4, and X5 corresponding to each unit are obtained, and the unit vectors are input into the encoder. The encoder includes at least one layer of substructure comprising a self-attention layer and a feedforward neural network, and it encodes the unit vectors to obtain the corresponding sequence vectors Z1, Z2, Z3, Z4, and Z5.
(2) An emotion prediction vector and at least one intermediate vector corresponding to the text to be translated are determined from the sequence vector.
In the embodiment of the present application, the sequence vectors may correspond to the unit vectors, and the unit vectors may correspond to the emotion tags and the text to be translated in the sequence to be translated; the emotion tags may include known tags and a predicted tag. When determining the emotion prediction vector from the sequence vectors, the unit vector corresponding to the predicted tag may be determined first, and then the sequence vector corresponding to that unit vector. Similarly, when determining the intermediate vectors from the sequence vectors, at least one unit vector corresponding to the text to be translated may be determined first, and then the sequence vectors corresponding to those unit vectors; the flow may be as shown in fig. 7.
In fig. 7, the twill part of the sequence to be translated may be the predicted tag among the emotion tags, the blank parts the associated text, the cross-grain parts the known tags corresponding to the associated text, and the dark part the text to be translated. The sequence to be translated can be input into an embedding layer (Embedding) to obtain unit vectors X1 through X7, where X1 is the unit vector corresponding to the predicted tag and X6 and X7 are the unit vectors corresponding to the text to be translated. The unit vectors are input into the encoder for encoding, yielding sequence vectors Z1 through Z7. Sequence vector Z1 corresponds to the predicted tag and may be taken as the emotion prediction vector Z1'; sequence vectors Z6 and Z7 correspond to the text to be translated and may be taken as the intermediate vectors Z6' and Z7'.
(3) encoding the intermediate vector to obtain the text processing vector.
In the embodiment of the present application, the intermediate vector may be encoded by an encoder to obtain a text processing vector. The encoder can be divided into a bottom layer sub-model and a top layer sub-model, and respectively comprises at least one layer of sub-structure, and each layer of sub-structure can comprise a self-attention layer, a feedforward neural network and residual error connection.
In the embodiment of the present application, a mask operation may be used so that only the intermediate vectors among the sequence vectors are encoded again. The mask hides certain values so that they have no effect when parameters are updated; specifically, in this application the parts of the sequence vectors other than the intermediate vectors can be masked so that they do not enter the next encoding stage.
In one example, as shown in fig. 8, the encoder may include six layers of substructures; the first layer may serve as the bottom-layer sub-model (Bottom Block) and the second through sixth layers as the top-layer sub-model (Top Block). The unit vectors can be input into the bottom-layer sub-model to obtain the sequence vectors, the emotion prediction vector and the intermediate vectors are determined from the sequence vectors, and the intermediate vectors are then input into the top-layer sub-model for encoding to obtain the text processing vectors.
As shown in fig. 8, the twill part in the sequence to be translated may be set as a prediction tag in the emotion tags for storing emotion prediction results of the text to be translated, the blank part may be set as an associated text, the cross-grain part may be set as a known tag corresponding to the associated text, and the dark part may be set as the text to be translated. The sequence to be translated can be input into an Embedding layer to obtain unit vectors X1, X2, X3, X4, X5, X6 and X7, where X1 is the unit vector corresponding to the prediction tag in the sequence to be translated, and X6 and X7 are the unit vectors corresponding to the text portion to be translated in the sequence to be translated.
A plurality of unit vectors may be input into the bottom-layer sub-model of the encoder for encoding, resulting in sequence vectors Z1 through Z7. Sequence vector Z1 corresponds to the predicted tag in the sequence to be translated and may be taken as the emotion prediction vector Z1'; sequence vectors Z6 and Z7 correspond to the text to be translated and may be taken as the intermediate vectors Z6' and Z7'. The intermediate vectors Z6' and Z7' are input into the top-layer sub-model for encoding, obtaining the corresponding text processing vectors Y6 and Y7.
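A schematic sketch of the fig. 8 flow, reusing the hypothetical EncoderLayer above; the layer counts and the convention that position 0 holds the predicted tag follow this example:

```python
import torch.nn as nn

class TwoStageEncoder(nn.Module):
    """Bottom sub-model -> split off emotion prediction / intermediate
    vectors -> top sub-model (cf. fig. 8)."""
    def __init__(self, dim=512, bottom_layers=1, top_layers=5):
        super().__init__()
        self.bottom = nn.ModuleList(EncoderLayer(dim) for _ in range(bottom_layers))
        self.top = nn.ModuleList(EncoderLayer(dim) for _ in range(top_layers))

    def forward(self, unit_vectors, text_positions):
        z = unit_vectors                      # (batch, seq_len, dim)
        for layer in self.bottom:             # bottom sub-model -> sequence vectors
            z = layer(z)
        emotion_pred = z[:, 0]                # sequence vector of the predicted tag
        intermediate = z[:, text_positions]   # vectors of the text to be translated
        for layer in self.top:                # top sub-model -> text processing vectors
            intermediate = layer(intermediate)
        return emotion_pred, intermediate
```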
In this embodiment of the present application, decoding the emotion prediction vector and the text processing vector to obtain the target translation may include the following steps:
(1) acquiring an emotion distribution vector containing preset emotion types based on the emotion prediction vector.
In the embodiment of the application, the emotion prediction vector can be converted into a fixed-length vector by multiplying it by parameters learned during training, and an emotion distribution vector over the preset categories can then be obtained through a Softmax function. The text to be translated may correspond to multiple emotions, so the emotion distribution vector may likewise include multiple emotions among the preset emotion categories.
The emotion distribution vector can display all preset kinds of emotions with their corresponding probabilities, the element of an emotion being set to 0 when its probability is 0; alternatively, only the emotions with non-zero probability may be represented, with indices usable for subsequent table-lookup operations.
In one example, a schematic diagram of the emotion distribution vector may be as shown in fig. 9. In method one of the figure, all preset kinds of emotions and their probabilities are displayed: the probability of happy is 0.1, of surprised 0.2, of sad 0, of shy 0.3, of excited 0.4, and of regretful 0. In method two of the figure, only the emotions with non-zero probability are represented.
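For illustration, the conversion to an emotion distribution vector can be sketched as a learned linear projection followed by Softmax; the projection, the 512-dimensional input, and the six class names (taken from fig. 9) are assumptions:

```python
import torch
import torch.nn as nn

EMOTIONS = ["happy", "surprised", "sad", "shy", "excited", "regretful"]

proj = nn.Linear(512, len(EMOTIONS))   # parameters learned during training
emotion_pred = torch.randn(1, 512)     # stand-in for the emotion prediction vector
distribution = torch.softmax(proj(emotion_pred), dim=-1)
# one probability per preset emotion, summing to 1 (cf. fig. 9)
```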
(2) converting the emotion distribution vector into an emotion feature vector.
In the embodiment of the application, the emotion distribution vector can be converted into an emotion feature vector that fuses multiple emotion features, so that feature splicing can be performed subsequently to assist in obtaining the target translation.
(3) decoding the text processing vector to obtain at least one decoding vector.
In an embodiment of the present application, the techniques for decoding the text processing vector may include masks, self-attention mechanisms, feed-forward neural networks, and the like.
In this embodiment, the decoder may decode the text processing vector to obtain decoding vectors. The decoder may be composed of multiple layers of decoding substructures; each layer, as shown in fig. 10, may include a masked multi-head attention mechanism (Masked Multi-Head Attention), a multi-head attention mechanism (Multi-Head Attention), and a feed-forward neural network (FFN, Feed-Forward Networks).
In the embodiment of the application, the multi-head attention mechanism can increase the model's ability to capture information at different positions and associate words at more positions; the mappings do not share weights and project onto different subspaces, so the finally spliced vector covers broader information. Increasing the number of heads of the multi-head attention mechanism can further improve the model's ability to capture long-distance information.
Wherein, the multi-head attention mechanism can be replaced by a single-head attention mechanism; the position sequence among the feedforward neural network layer, the multi-head attention mechanism layer containing the mask, and the multi-head attention mechanism layer may be adjusted according to application requirements, and is not limited in the present application, and the position sequence in fig. 10 is only an example.
In an embodiment of the present application, the masks in the masked multi-head attention mechanism (Masked Multi-Head Attention) may include a padding mask and a sequence mask. The padding mask adjusts the attention range: aligning the input sequence can produce meaningless padding, and the padding mask keeps the attention mechanism from placing attention on those parts. Specifically, the values at the meaningless padding positions can be set to negative infinity, so that their probabilities are close to 0 after the Softmax function. The sequence mask keeps future information invisible to the decoder and preserves the model's autoregressive property: at time i, the decoder's output may depend only on the outputs before time i, not on those after it. Specifically, an upper triangular matrix can be generated whose upper-triangle values are all 0, and this matrix is applied at each step.
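A sketch of such a sequence mask as an upper triangular matrix, assuming the common PyTorch convention that True marks a position the decoder may not attend to:

```python
import torch

def sequence_mask(size: int) -> torch.Tensor:
    """Hide the future: position i may only attend to positions <= i."""
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

# sequence_mask(4):
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```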
In this embodiment, residual connection may be added between the feedforward neural network layer, the multi-head attention mechanism layer including the mask, and the multi-head attention mechanism layer in the decoder substructure, and the residual connection is performed to perform summation and normalization, and at the same time, residual connection is also performed before the feedforward neural network is output, and the summation and normalization are performed before the output, and then the output is performed, and a schematic structural diagram may be as shown in fig. 11.
In an embodiment of the present application, the text processing vector output by the encoder may be converted into an attention vector set comprising a vector K (key vector) and a vector V (value vector), which are input into the self-attention layer of the decoder substructure for assisting the decoder in determining the appropriate attention range.
The input of the bottom decoder substructure may be the model's output at the previous time step; that is, while the current text processing vector is being decoded, the bottom decoder substructure receives the historical translation result output at the previous time step. This historical translation result is passed through an embedding layer, serving the same function as the one arranged before the encoder, before being input into the decoder.
(4) splicing the at least one decoding vector with the emotion feature vector to obtain the target translation.
In the embodiment of the application, the decoding vector and the emotion feature vector can be spliced and then passed through a Softmax layer, which turns scores into probabilities; the index corresponding to the highest probability is obtained, and the word corresponding to that index is taken as the output of the current time step. A termination symbol may be set: when the decoder outputs it, the decoding process is complete and the target translation corresponding to the text to be translated, with the emotion factor introduced, has been obtained.
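A minimal sketch of this fusion step, assuming a hypothetical output projection and a toy vocabulary:

```python
import torch
import torch.nn as nn

def next_word(dec_vec, emo_vec, out_proj, vocab):
    """Splice the decoding vector with the emotion feature vector,
    turn scores into probabilities, and pick the most probable word."""
    fused = torch.cat([dec_vec, emo_vec], dim=-1)    # feature fusion
    probs = torch.softmax(out_proj(fused), dim=-1)   # Softmax layer
    return vocab[probs.argmax(dim=-1)]               # highest-probability index

vocab = ["<eos>", "hello", "tom", "jerry"]
out_proj = nn.Linear(512 + 100, len(vocab))  # 512-d decoding vector + 100-d emotion vector
word = next_word(torch.randn(512), torch.randn(100), out_proj, vocab)
```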
In one example, as shown in fig. 12, the encoder may be divided internally into a bottom-layer sub-model and a top-layer sub-model. An emotion prediction vector Y1 may be determined from the output of the bottom-layer sub-model and transformed to obtain the emotion feature vector. The text processing vectors Y2 and Y3 output by the top-layer sub-model may be transformed into an attention vector set comprising the vector K (key vector) and the vector V (value vector), which is input into the self-attention layers of the decoder substructures to help the decoder determine the appropriate attention range. Fig. 12 also sketches the internal composition of the bottom substructure of the decoder; as shown, the attention vector set containing K and V may be input into the multi-head attention layer (Multi-Head Attention) in each layer of the decoder substructure.
When the decoder completes the decoding of the currently processed text processing vector, it outputs a decoding vector; after the decoding vector is spliced with the emotion feature vector, a Softmax layer turns the scores into probabilities, the index of the highest probability is obtained, and the word corresponding to that index is taken as the output of the current time step.
In an example, as shown in fig. 13, the input of the bottom layer decoder substructure may be the output of the model at the previous time, for example, in this example, the model output at the previous time is "you", and then "you" may pass through an Embedding layer and then be used as the input of the bottom layer decoder substructure, obtain the decoding vector output at the current time, then splice the decoding vector with the emotion feature vector, and then pass through a softmax layer to obtain the model output "hello" at the current time.
In one example, when decoding the first text processing vector, the bottom decoder substructure has no input and the model outputs "tom"; "tom" is then input into the bottom decoder substructure, and the model outputs "tom chase"; "tom chase" is input, and the model outputs "tom chase jerry"; finally "tom chase jerry" is input, the termination symbol is recognized, the model outputs "tom chase jerry", and the translation process ends.
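This iterative procedure amounts to a greedy decoding loop with a termination symbol; a schematic sketch, where decode_step stands in for the decoder plus the fusion step described above:

```python
def greedy_translate(decode_step, max_len=50, eos="<eos>"):
    """Feed the model's previous outputs back in until it emits the
    termination symbol (or a length limit is reached)."""
    output = []
    while len(output) < max_len:
        word = decode_step(output)   # e.g. [] -> "tom", ["tom"] -> "chase", ...
        if word == eos:
            break                    # termination symbol ends the translation
        output.append(word)
    return " ".join(output)
```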
In the embodiment of the present application, converting the emotion distribution vector into the emotion feature vector may include the following steps:
(1) determining the emotion expression vector corresponding to each preset emotion from a preset emotion expression matrix;
(2) acquiring the emotion feature vector based on the probabilities corresponding to the preset emotions and the emotion expression vectors.
The emotion expression matrix can be preset or learned during model training. The preset emotions may be the emotions contained in the emotion distribution vector. After the emotion expression vector corresponding to each preset emotion is determined in the emotion expression matrix, the probability of each preset emotion can be multiplied by its corresponding emotion expression vector, and all the products summed to give the emotion feature vector.
In one example, as shown in fig. 14, the emotion distribution vector may list all the preset emotion types and their corresponding probabilities; when the probability of a certain emotion is 0, the element corresponding to that emotion is set to 0. Specifically, the figure shows three preset emotions, namely "happy", "sad" and "surprised", with probabilities "0.6", "0" and "0.4"; on the right is the emotion expression matrix, whose size may be 3×100 and which is obtained through training. When calculating the emotion feature vector, the emotion distribution vector may be multiplied by the emotion expression matrix to obtain the emotion feature vector.
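A worked version of this example in Python with numpy (the probabilities and the 3×100 shape follow the figure; the matrix values below are random placeholders, since the real emotion expression matrix is learned during training):

```python
import numpy as np

# Emotion distribution vector over the three preset emotions
# ("happy", "sad", "surprised"), as in the example above.
p = np.array([0.6, 0.0, 0.4])

# Emotion expression matrix of size 3x100: one 100-dim expression
# vector per preset emotion (random values here for illustration).
E = np.random.randn(3, 100)

# Emotion feature vector: each probability weights its expression
# vector and the weighted vectors are summed -- equivalently p @ E.
emotion_feature = p @ E   # shape (100,)
```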
In the embodiment of the application, the emotion distribution vector contains the probabilities of multiple emotions, and the emotion feature vector obtained from the emotion distribution vector and the emotion expression matrix therefore contains feature parameters of multiple emotions, so that the emotion features better fit the diversity of emotion in real language and the translation accuracy is improved.
In the embodiment of the application, the translation of the text to be translated can be completed by improving the Transformer model and introducing emotional factors into the translation process. The Transformer model, proposed by Google, can be applied to machine translation tasks; it achieves fast parallelism by using a self-attention mechanism and offers good accuracy.
The improved Transformer model may include an embedding layer, an emotion polarity prediction module, an encoder and a decoder. The encoder may include a bottom-level sub-model and a top-level sub-model, each comprising multiple layers of encoder sub-structures, and each layer of the encoder sub-structure may include a self-attention mechanism layer, a residual connection and normalization layer, and a feed-forward neural network layer. The decoder may include multiple layers of decoder sub-structures, and each layer of the decoder sub-structure may include a masked multi-head attention mechanism layer, a multi-head attention mechanism layer, a feed-forward neural network layer, and a residual connection and normalization layer.
The multi-head attention mechanism increases the model's ability to capture information from different positions, allowing it to attend to words at more positions. The weights are not shared across heads during mapping, so each head maps into a different subspace, and the finally spliced vector covers a wider range of information. Increasing the number of heads can further improve the model's ability to capture long-distance information.
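For reference, a hand-written sketch of a standard multi-head attention layer in PyTorch, illustrating the per-head subspace mapping with unshared weights described above (the patent does not prescribe this exact implementation):

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Standard multi-head attention: Q, K, V are projected into
    num_heads separate subspaces with unshared weights, attention is
    computed per head, and the heads are spliced and re-projected, so
    the final vector covers information from more positions."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.h, self.d_k = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v):
        b = q.size(0)
        # Project and split into heads: (batch, heads, seq, d_k)
        q, k, v = [w(x).view(b, -1, self.h, self.d_k).transpose(1, 2)
                   for w, x in ((self.w_q, q), (self.w_k, k), (self.w_v, v))]
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        attn = torch.softmax(scores, dim=-1)
        # Splice the heads back together and map to d_model
        out = (attn @ v).transpose(1, 2).reshape(b, -1, self.h * self.d_k)
        return self.w_o(out)
```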
Specifically, a sequence to be translated may be input into the embedding layer, where it is divided into several units and a unit vector corresponding to each unit is determined; the unit vectors may then be input into the encoder of the model to obtain an emotion prediction vector and a text processing vector.
The emotion prediction vector can be converted into an emotion distribution vector by the emotion polarity prediction module. Specifically, the emotion prediction vector may be mapped to a vector of fixed length and then passed through a Softmax function to obtain the emotion distribution vector.
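A minimal sketch of such an emotion polarity prediction module, under the assumption that it is a single linear projection to the number of preset emotion types followed by Softmax (the disclosure does not fix its internal structure):

```python
import torch.nn as nn

class EmotionPolarityPredictor(nn.Module):
    """Maps the emotion prediction vector to a fixed-length score vector
    (one score per preset emotion type) and normalizes it into an
    emotion distribution vector with Softmax."""
    def __init__(self, d_model=512, num_emotions=3):
        super().__init__()
        self.proj = nn.Linear(d_model, num_emotions)

    def forward(self, emotion_pred_vec):
        return nn.functional.softmax(self.proj(emotion_pred_vec), dim=-1)
```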
The text processing vector can be decoded by the decoder. Specifically, the text processing vector helps the multi-head attention mechanism layer in each layer of the decoder sub-structure determine its attention range, so that multi-head interactive attention processing is performed.
When translating the part of the translation corresponding to the current unit, the model output at the previous time step can be converted into a vector through the embedding layer and input into the decoder from the bottommost decoder sub-structure to obtain the corresponding decoded vector; the decoded vector is spliced with the emotion feature vector and then passed through a Softmax layer to obtain the model output at the current time step.
To explain the text translation method of the present application more clearly, it is further described below with reference to a specific example.
In one example, the present application provides a text translation method, as shown in fig. 15, comprising the following steps (a consolidated code sketch follows the list):
step S1501, acquiring a text to be translated and its associated text, and obtaining at least one emotion label; the emotion label can be used for predicting the emotion of the text to be translated;
step S1502, splicing the text to be translated, the associated text and the emotion label to generate a sequence to be translated;
step S1503, dividing the sequence to be translated into at least one unit, and determining a unit vector corresponding to each unit; specifically, the sequence to be translated can be input into the embedding layer of the model to obtain at least one unit vector;
step S1504, acquiring at least one sequence vector corresponding to the sequence to be translated based on the unit vector; specifically, the unit vector may be input into a bottom-layer sub-model of the encoder to obtain at least one sequence vector;
step S1505, determining an emotion prediction vector and at least one intermediate vector corresponding to the text to be translated from the sequence vector;
step S1506, encoding the intermediate vector to obtain at least one text processing vector; specifically, the intermediate vector may be input to a top-level sub-model in the encoder, and re-encoded to obtain at least one text processing vector;
step S1507, acquiring an emotion distribution vector containing a preset emotion type based on the emotion prediction vector;
step S1508, acquiring an emotion expression matrix, and determining an emotion expression vector corresponding to a preset emotion from the emotion expression matrix;
step S1509, acquiring an emotion feature vector based on the probability corresponding to each preset emotion and its emotion expression vector;
step S1510, decoding the text processing vector to obtain at least one decoding vector;
and step S1511, splicing at least one decoding vector with the emotion characteristic vector respectively to obtain a target translation.
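The consolidated sketch referenced above ties steps S1501-S1511 together; every attribute and function name on `model` below is a hypothetical placeholder, and treating the first sequence vector as the emotion prediction vector is an illustrative assumption rather than something the disclosure fixes:

```python
def translate_with_emotion(text, associated_text, emotion_tags, model):
    """Illustrative end-to-end flow of steps S1501-S1511."""
    # S1501-S1502: splice text, associated text and emotion tags
    sequence = model.splice(text, associated_text, emotion_tags)
    # S1503: split into units and embed each unit
    unit_vectors = model.embedding(model.split_units(sequence))
    # S1504: bottom-level encoder sub-model -> sequence vectors
    seq_vectors = model.bottom_encoder(unit_vectors)
    # S1505: separate emotion prediction vector from intermediate vectors
    emotion_pred, intermediates = seq_vectors[0], seq_vectors[1:]
    # S1506: top-level encoder sub-model -> text processing vectors
    text_vectors = model.top_encoder(intermediates)
    # S1507: emotion distribution vector (one probability per preset emotion)
    emotion_dist = model.emotion_polarity_predictor(emotion_pred)
    # S1508-S1509: weight the learned expression matrix by the distribution
    emotion_feature = emotion_dist @ model.emotion_expression_matrix
    # S1510-S1511: decode, splicing each decoded vector with the
    # emotion feature vector before the Softmax output layer
    return model.decode(text_vectors, emotion_feature)
```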
An embodiment of the present application provides a text translation apparatus. As shown in fig. 16, the text translation apparatus 160 may include: an obtaining module 1601, a generating module 1602, an encoding module 1603, and a decoding module 1604, wherein,
an obtaining module 1601, configured to obtain a text to be translated, and obtain at least one emotion tag associated with the text to be translated; the at least one emotion tag is used for predicting the emotion of the text to be translated;
a generating module 1602, configured to generate the sequence to be translated based on the text to be translated and the emotion tag;
an encoding module 1603, configured to encode the sequence to be translated to obtain an emotion prediction vector and a text processing vector;
a decoding module 1604, configured to decode the emotion prediction vector and the text processing vector to obtain a target translation; the target translation corresponds to the emotion of the text to be translated.
With the above text translation apparatus, a sequence to be translated is generated from the text to be translated and the emotion label associated with it; the sequence to be translated is encoded to obtain an emotion prediction vector and a text processing vector, which are then decoded to obtain the target translation. Emotional factors are thus introduced into the translation process and the corresponding emotion is reflected in the translation, improving translation accuracy and making the translation method applicable to more complex situations.
In this embodiment of the application, when generating the sequence to be translated based on the text to be translated and the emotion tag, the generating module 1602 is specifically configured to:
acquiring a related text of a text to be translated;
and splicing the text to be translated, the associated text and the emotion label to generate a sequence to be translated.
In this embodiment of the application, the obtaining module 1601 is specifically configured to, when obtaining at least one emotion tag associated with a text to be translated:
identifying a text emotion corresponding to the associated text;
at least one sentiment tag is generated based on the text sentiment.
In this embodiment of the application, when encoding a sequence to be translated to obtain an emotion prediction vector and a text processing vector, the encoding module 1603 is specifically configured to:
dividing a sequence to be translated into at least one unit, and determining a unit vector corresponding to each unit;
and acquiring an emotion prediction vector and a text processing vector corresponding to the text to be translated based on the unit vector.
In this embodiment of the application, when obtaining the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vector, the encoding module 1603 is specifically configured to:
acquiring at least one sequence vector corresponding to a sequence to be translated based on the unit vector;
determining an emotion prediction vector and at least one intermediate vector corresponding to the text to be translated from the sequence vector;
and coding the intermediate vector to obtain a text processing vector.
In this embodiment of the application, when the decoding module 1604 decodes the emotion prediction vector and the text processing vector to obtain the target translation, it is specifically configured to:
acquiring an emotion distribution vector containing a preset emotion type based on the emotion prediction vector;
converting the emotion distribution vector into an emotion feature vector;
decoding the text processing vector to obtain at least one decoding vector;
and splicing at least one decoding vector with the emotion characteristic vector to obtain a target translation.
In this embodiment of the application, when the decoding module 1604 converts the emotion distribution vector into an emotion feature vector, it is specifically configured to:
determining at least one preset emotion and a preset emotion corresponding probability based on the emotion distribution vector;
determining emotion expression vectors corresponding to preset emotions from a preset emotion expression matrix;
and acquiring an emotion characteristic vector based on the probability corresponding to the preset emotion and the emotion expression vector.
In an alternative embodiment, an electronic device is provided. As shown in fig. 17, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. Note that in practical applications the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 17, but this does not mean only one bus or one type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disc storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used to store application code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application code stored in the memory 4003 to implement what is shown in the foregoing method embodiments.
The electronic devices include, but are not limited to, mobile terminals such as mobile phones, notebook computers, PADs, etc., and fixed terminals such as digital TVs, desktop computers, etc.
The present application provides a computer-readable storage medium storing a computer program which, when run on a computer, enables the computer to execute the corresponding content of the foregoing method embodiments. Compared with the prior art, emotional factors can be introduced in the translation process and the corresponding emotion reflected in the translation, improving translation accuracy and making the translation method applicable to more complex situations.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and not necessarily in sequence; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that, for those skilled in the art, various improvements and refinements can be made without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of text translation, comprising:
acquiring a text to be translated, and acquiring at least one emotion label associated with the text to be translated; the at least one emotion tag is used for predicting the emotion of the text to be translated;
generating a sequence to be translated based on the text to be translated and the emotion label;
encoding the sequence to be translated to obtain an emotion prediction vector and a text processing vector;
decoding the emotion prediction vector and the text processing vector to obtain a target translation; the target translation corresponds to the emotion of the text to be translated.
2. The text translation method according to claim 1, wherein the generating the sequence to be translated based on the text to be translated and the emotion label comprises:
acquiring a related text of the text to be translated;
and splicing the text to be translated, the associated text and the emotional tag to generate the sequence to be translated.
3. The text translation method according to claim 2, wherein the obtaining of at least one emotion label associated with the text to be translated comprises:
identifying a text emotion corresponding to the associated text;
generating the at least one emotion tag based on the textual emotion.
4. The text translation method according to claim 2, wherein said encoding the sequence to be translated to obtain an emotion prediction vector and a text processing vector comprises:
dividing the sequence to be translated into at least one unit, and determining a unit vector corresponding to each unit;
and acquiring an emotion prediction vector and a text processing vector corresponding to the text to be translated based on the unit vector.
5. The text translation method according to claim 4, wherein the obtaining of the emotion prediction vector and the text processing vector corresponding to the text to be translated based on the unit vector comprises:
acquiring at least one sequence vector corresponding to the sequence to be translated based on the unit vector;
determining the emotion prediction vector and at least one intermediate vector corresponding to the text to be translated from the sequence vector;
and coding the intermediate vector to obtain the text processing vector.
6. The text translation method of claim 5, wherein the decoding the emotion prediction vector and the text processing vector to obtain a target translation comprises:
acquiring an emotion distribution vector containing a preset emotion type based on the emotion prediction vector;
converting the emotion distribution vector into an emotion feature vector;
decoding the text processing vector to obtain at least one decoding vector;
and splicing the at least one decoding vector with the emotion characteristic vector respectively to obtain the target translation.
7. The text translation method of claim 6, wherein said converting the emotion distribution vector into an emotion feature vector comprises:
determining at least one preset emotion and the corresponding probability of the preset emotion based on the emotion distribution vector;
determining an emotion expression vector corresponding to the preset emotion from a preset emotion expression matrix;
and acquiring an emotion feature vector based on the probability corresponding to the preset emotion and the emotion expression vector.
8. An apparatus for text translation, comprising:
an obtaining module, configured to obtain a text to be translated and obtain at least one emotion label associated with the text to be translated; the at least one emotion tag is used for predicting the emotion of the text to be translated;
a generating module, configured to generate a sequence to be translated based on the text to be translated and the emotion label;
an encoding module, configured to encode the sequence to be translated to obtain an emotion prediction vector and a text processing vector;
a decoding module, configured to decode the emotion prediction vector and the text processing vector to obtain a target translation; the target translation corresponds to the emotion of the text to be translated.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the text translation method of any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, implements the text translation method according to any one of claims 1 to 7.
CN202110097438.6A 2021-01-25 2021-01-25 Text translation method and device, electronic equipment and computer readable storage medium Pending CN113569584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110097438.6A CN113569584A (en) 2021-01-25 2021-01-25 Text translation method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113569584A true CN113569584A (en) 2021-10-29

Family

ID=78160961


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method
CN110377733A (en) * 2019-06-28 2019-10-25 平安科技(深圳)有限公司 A kind of text based Emotion identification method, terminal device and medium
CN111223498A (en) * 2020-01-10 2020-06-02 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium
CN111382580A (en) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 Encoder-decoder framework pre-training method for neural machine translation
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN111597830A (en) * 2020-05-20 2020-08-28 腾讯科技(深圳)有限公司 Multi-modal machine learning-based translation method, device, equipment and storage medium
CN111858932A (en) * 2020-07-10 2020-10-30 暨南大学 Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN111930940A (en) * 2020-07-30 2020-11-13 腾讯科技(深圳)有限公司 Text emotion classification method and device, electronic equipment and storage medium
WO2020228376A1 (en) * 2019-05-16 2020-11-19 华为技术有限公司 Text processing method and model training method and apparatus
CN112084331A (en) * 2020-08-27 2020-12-15 清华大学 Text processing method, text processing device, model training method, model training device, computer equipment and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282555A (en) * 2022-03-04 2022-04-05 北京金山数字娱乐科技有限公司 Translation model training method and device, and translation method and device
CN116611459A (en) * 2023-07-19 2023-08-18 腾讯科技(深圳)有限公司 Translation model training method and device, electronic equipment and storage medium
CN116611459B (en) * 2023-07-19 2024-03-15 腾讯科技(深圳)有限公司 Translation model training method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination