CN109284510B - Text processing method and system and text processing device


Info

Publication number
CN109284510B
CN109284510B
Authority
CN
China
Prior art keywords
word
information
source
hidden layer
decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710602815.0A
Other languages
Chinese (zh)
Other versions
CN109284510A (en)
Inventor
程善伯
王宇光
姜里羊
陈伟
王砚峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710602815.0A
Publication of CN109284510A
Application granted
Publication of CN109284510B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Abstract

Embodiments of the invention provide a text processing method, a text processing system and a text processing device. The method comprises the following steps: receiving a source text having a plurality of source words; invoking an encoder to encode the plurality of source words into a plurality of vectors; when the t-th target word is decoded, determining the center point of a local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded; determining the local attention window based on its center point; and invoking a decoder to decode the t-th target word from the vectors according to the source words located in the local attention window. By comprehensively considering these kinds of information, the accuracy of locating the center of attention is improved, and the quality of business processing such as translation is improved.

Description

Text processing method and system and text processing device
Technical Field
The present invention relates to the field of language processing technologies, and in particular, to a text processing method, a text processing system, and a device for text processing.
Background
Machine translation, also known as automatic translation, uses computer programs to automatically convert text in one language into text in another language; the former is called the source language and the latter the target language.
Currently, a local attention model, which is an improvement on the attention model, is commonly used in machine translation.
However, the feedforward neural network in the local attention model refers to relatively little information, so the center of attention is located with low accuracy, which results in poor translation quality.
Disclosure of Invention
In view of the foregoing problem of low accuracy in locating the center of attention, embodiments of the present invention provide a text processing method, a corresponding text processing system, and a device for text processing.
In order to solve the above problem, an embodiment of the present invention discloses a text processing method, including:
receiving a source text, the source text having a plurality of source words;
calling an encoder to encode the plurality of source words into a plurality of vectors;
when the t-th target word is decoded, determining the center point of a local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
Optionally, the step of determining a center point of a local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded includes:
acquiring one or more of a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of the weight matrices used when the target words before the t-th target word were decoded;
and determining the position in the source text on which attention is concentrated by combining the first hidden layer state, the second hidden layer state and the matrix connection, the position being used as the center point of the local attention window.
Optionally, the step of obtaining one or more of the first hidden layer state of the encoder, the second hidden layer state of the decoder when the t-th target word is decoded, and the matrix connection of the weight matrices used when the target words before the t-th target word were decoded includes:
extracting first word information of the j-th source word and of the source words after the j-th source word, recorded when the source text is input in order;
extracting second word information of the j-th source word and of the source words before the j-th source word, recorded when the source text is input in reverse order;
combining the first word information and the second word information, and converting the result into the first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrices used when the target words before the t-th target word were decoded;
mapping the weight matrices into a plurality of weight matrices in a specified format;
and adding the weight matrices in the specified format to obtain the matrix connection.
Optionally, the step of determining the center of attention in the source text by combining the first hidden layer state, the second hidden layer state and the matrix connection, and using it as the center point of the local attention window, includes:
configuring a weight matrix for each of one or more of the first hidden layer state, the second hidden layer state and the matrix connection;
combining the one or more of the first hidden layer state, the second hidden layer state and the matrix connection, each configured with its weight matrix, to obtain characteristic information;
performing nonlinear activation on the characteristic information and configuring a weight matrix for the result to obtain activation information;
performing nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product of the characteristic value and the word length of the source text to obtain the center point of the local attention window.
Optionally, the step of determining a local attention window based on the center point of the local attention window comprises:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
The embodiment of the invention also discloses a text processing system, which comprises:
a source text receiving module, configured to receive a source text, where the source text has a plurality of source words;
the vector encoding module is used for calling an encoder to encode the plurality of source words into a plurality of vectors;
the center point determining module is used for determining, when the t-th target word is decoded, the center point of the local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded;
a local attention window determination module to determine a local attention window based on a center point of the local attention window;
and the vector decoding module is used for calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
Optionally, the central point determining module comprises:
a reference information obtaining sub-module, configured to obtain one or more of a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of the weight matrices used when the target words before the t-th target word were decoded;
and the reference information determining sub-module is used for determining the position in the source text on which attention is concentrated, as the center point of the local attention window, by combining the first hidden layer state, the second hidden layer state and the matrix connection.
Optionally, the reference information obtaining sub-module includes:
a first word information extraction unit, configured to extract first word information of a jth source word and source words located after the jth source word, which are recorded when the source text is sequentially input;
a second word information extraction unit, configured to extract second word information of a jth source word recorded when the source text is input in a reverse order and a source word located before the jth source word;
a word information combining and converting unit, configured to combine the first word information and the second word information, and convert the first word information into a first hidden layer state of the encoder;
and/or,
a weight matrix extracting unit, configured to extract a plurality of weight matrices when decoding other target words before the t-th target word;
the weight matrix mapping unit is used for mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and the weight matrix adding unit is used for adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
Optionally, the reference information determination sub-module includes:
a weight matrix configuration unit, configured to configure a weight matrix for one or more information in the first hidden layer state, the second hidden layer state, and the matrix connection, respectively;
the reference information combination unit is used for combining and configuring one or more information of a first hidden layer state, a second hidden layer state and the matrix connection of the weight matrix to obtain characteristic information;
the nonlinear activation unit is used for carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
the nonlinear transformation unit is used for carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and the rounding-down unit is used for rounding down the product of the characteristic value and the word length of the source text to obtain the center point of the local attention window.
Optionally, the local attention window determination module comprises:
the first endpoint value setting submodule is used for calculating a difference value between the central point and a preset central deviation value to serve as a first endpoint value;
the second endpoint value setting submodule is used for calculating a sum value between the central point and a preset central deviation value to serve as a second endpoint value;
a local attention window setting submodule for setting a distance between the first endpoint value and the second endpoint value as a local attention window.
The embodiment of the invention also discloses a device for text processing, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:
receiving source text, the source text having a plurality of source words;
invoking an encoder to encode the plurality of source words into a plurality of vectors;
when the tth target word is decoded, determining the center point of the local attention window according to one or more information in the encoding state, the decoding state when the tth target word is decoded, and the center point before the tth target word is decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
Optionally, the one or more programs also configured to be executed by the one or more processors include instructions for:
acquiring one or more information in a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the position in the source text on which attention is concentrated by combining the first hidden layer state, the second hidden layer state and the matrix connection, the position being used as the center point of the local attention window.
Optionally, the one or more programs also configured to be executed by the one or more processors include instructions for:
extracting the first word information of the jth source word recorded when the source text is sequentially input and the source words behind the jth source word;
extracting second word information of a jth source word recorded when the source text is input in a reverse order and a source word positioned before the jth source word;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
Optionally, the one or more programs also configured to be executed by the one or more processors include instructions for:
respectively configuring a weight matrix for one or more information in the first hidden layer state, the second hidden layer state and the matrix connection;
combining and configuring one or more information of a first hidden layer state, a second hidden layer state and the matrix connection of a weight matrix to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product of the characteristic value and the word length of the source text to obtain the center point of the local attention window.
Optionally, the one or more programs also configured to be executed by the one or more processors include instructions for:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
The embodiment of the invention has the following advantages:
the embodiment of the invention introduces a local attention model into an encoding-decoding framework, calls an encoder to encode a plurality of source words in a received source text into a plurality of vectors, determines the central point of a local attention window according to one or more information in the central point before the t-th target word when the t-th target word is decoded according to the encoding state, the decoding state when the t-th target word is decoded, determines the local attention window accordingly, calls a decoder to decode the t-th target word according to the vectors positioned in the local attention window, is favorable for searching the position of the central attention suitable for the encoding state and the decoding state when the t-th target word is decoded in the source text, is also favorable for reducing the central point concentrated attention before the t-th target word is decoded in the source text, improves the concentrated attention of the position of the central point before the t-th target word is decoded in the source text, and improves the accuracy of the central positioning of the attention by comprehensively considering a plurality of information, thereby improving the quality of business processing such as translation.
Drawings
FIG. 1 is a flow diagram of the steps of a method of text processing according to one embodiment of the invention;
FIG. 2 is a block diagram of a text processing system according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating an apparatus for text processing in accordance with an exemplary embodiment;
FIG. 4 is a schematic structural diagram of a server in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
Referring to fig. 1, a flowchart illustrating steps of a text processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, receiving a source text.
The source text is a text to be subjected to business processing, and generally, the source text has a plurality of source words.
Correspondingly, a word obtained after the business processing is called a target word.
It should be noted that the terms source word and target word are relative to the business processing; both denote a unit word, where a punctuation mark, a number, a Chinese character, a phrase, or an English word may each be treated as one unit word.
Step 102, invoking an encoder to encode the plurality of source words into a plurality of vectors.
In a specific implementation, the embodiment of the present invention may apply an Encoder-Decoder (encoding-decoding) framework.
In the Encoder-Decoder framework, an Encoder is provided for converting an input sequence into a fixed-length vector, and a Decoder is provided for reconverting the fixed-length vector into an output sequence.
The Encoder-Decoder framework can be applied to business processes such as translation, document extraction, question-answering system and the like, for example, in translation, an input sequence (i.e. a source text) is a text belonging to a first language to be translated, and an output sequence (i.e. a target word) is a translated text belonging to a second language; in a question-answering system, the input sequence is the question posed and the output sequence is the answer.
It should be noted that the models specifically used by the encoder and the decoder may be chosen by those skilled in the art according to the actual situation, for example, CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), BiRNN (Bidirectional Recurrent Neural Network), GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), deep LSTM, etc. Those skilled in the art may also combine these models according to the actual situation, for example, the encoder using a CNN and the decoder an RNN, or both the encoder and the decoder using RNNs, etc.; the embodiments of the present invention are not limited thereto.
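For illustration only, the following Python sketch encodes a sequence of source-word embeddings into one vector per source word, which a decoder can later attend over. The plain RNN cell, the dimensions and all variable names are assumptions made for this toy example and are not prescribed by the embodiment.

```python
import numpy as np

def rnn_encode(embeddings, W_x, W_h, b):
    """Encode source-word embeddings into one vector per source word with a
    simple RNN cell: h_j = tanh(W_x x_j + W_h h_{j-1} + b)."""
    hidden = np.zeros(W_h.shape[0])
    states = []
    for x in embeddings:                      # one step per source word
        hidden = np.tanh(W_x @ x + W_h @ hidden + b)
        states.append(hidden)
    return np.stack(states)                   # shape: (|S|, hidden_dim)

# Toy dimensions: 10 source words, 8-dim embeddings, 16-dim hidden states.
rng = np.random.default_rng(0)
source_embeddings = rng.normal(size=(10, 8))
W_x = rng.normal(size=(16, 8)) * 0.1
W_h = rng.normal(size=(16, 16)) * 0.1
b = np.zeros(16)

encoder_states = rnn_encode(source_embeddings, W_x, W_h, b)
print(encoder_states.shape)                   # (10, 16): one vector per source word
```

Such per-word encoder vectors are what the local attention window in the following steps selects from.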
Step 103, when the t-th target word is decoded, determining the center point of the local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded.
In the embodiment of the invention, a local Attention model is introduced in an Encoder-Decoder framework, and the local Attention model is a variant of an Attention model (Attention model).
The attention model is a soft alignment model: during business processing (e.g., translation), before each target word is generated, an attention distribution is calculated which indicates which source words in the source text are attended to when the current target word is generated (the corresponding entries of the weight matrix have high probability values).
In the attention model, when each target word is generated, although some source words are "noticed", the other source words of the source text still receive some probability, which may leave attention insufficiently focused. The local attention model instead ignores source words outside a window, thereby making attention more concentrated.
It should be noted that the local attention model does not require the encoder to encode all the input information into a single fixed-length vector. Instead, the encoder encodes the input as a sequence of vectors, and at each decoding step a subset of this vector sequence is selectively chosen for further processing. In this way, the information carried by the input sequence can be fully utilized when each output is generated.
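The contrast between global and local attention can be sketched as follows. The dot-product scoring function and the concrete window indices are illustrative assumptions; the passage above only requires that source words outside the window be ignored.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(decoder_state, encoder_states):
    """Every source word receives some probability mass."""
    scores = encoder_states @ decoder_state
    return softmax(scores)

def local_attention(decoder_state, encoder_states, lo, hi):
    """Only source words inside the window [lo, hi) receive attention;
    the rest are ignored, so attention is more concentrated."""
    weights = np.zeros(len(encoder_states))
    scores = encoder_states[lo:hi] @ decoder_state
    weights[lo:hi] = softmax(scores)
    return weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(10, 16))
dec = rng.normal(size=16)
print(global_attention(dec, enc).round(2))
print(local_attention(dec, enc, lo=4, hi=9).round(2))  # zeros outside the window
```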
In a specific implementation, if the decoder is decoding the t-th (t is a positive integer) target word, that is, at time t, the center point of the local attention window may be determined with reference to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded.
Wherein the encoding state facilitates finding a focus location in the source text that is suitable for encoding.
The decoding state at the time of decoding the tth target word facilitates finding a position in the source text suitable for focusing attention at the time of decoding the tth target word.
The center point used before the t-th target word is decoded helps reduce the attention concentrated on that earlier center point and increase the attention paid to positions in the source text other than it.
In one embodiment of the present invention, step 103 may comprise the following sub-steps:
Substep S11, acquiring one or more of the first hidden layer state of the encoder, the second hidden layer state of the decoder when the t-th target word is decoded, and the matrix connection of the weight matrices used when the target words before the t-th target word were decoded.
1. The encoding state may be represented by the first hidden layer state of the encoder:
In a specific implementation, on one hand, the first word information of the j-th (j is a positive integer) source word and of the source words located after the j-th source word, recorded when the source text is received in order, is extracted.
On the other hand, the second word information of the j-th source word and of the source words located before the j-th source word, recorded when the source text is received in reverse order, is extracted.
The first word information and the second word information are combined and converted into the first hidden layer state of the encoder.
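One common realisation of such a first hidden layer state is sketched below, under the assumption of a simple bidirectional RNN whose forward and backward states are concatenated per source word; the cell type and dimensions are not fixed by the embodiment.

```python
import numpy as np

def rnn_pass(embeddings, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    out = []
    for x in embeddings:
        h = np.tanh(W_x @ x + W_h @ h + b)
        out.append(h)
    return out

def bidirectional_states(embeddings, fw, bw):
    """fw/bw are (W_x, W_h, b) parameter triples for the two directions."""
    forward = rnn_pass(embeddings, *fw)               # source text fed in order
    backward = rnn_pass(embeddings[::-1], *bw)[::-1]  # source text fed in reverse order
    # Concatenating both directions gives the first hidden layer state h_s[j]
    # for every source word j.
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)])

rng = np.random.default_rng(2)
emb = rng.normal(size=(10, 8))                        # 10 source words, 8-dim embeddings
make_params = lambda: (rng.normal(size=(16, 8)) * 0.1,
                       rng.normal(size=(16, 16)) * 0.1,
                       np.zeros(16))
h_s = bidirectional_states(emb, make_params(), make_params())
print(h_s.shape)                                      # (10, 32): forward + backward info per word
```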
2. The decoding state when the tth target word is decoded can be represented by a second hidden layer state of the decoder when the tth target word is decoded:
In a specific implementation, the first hidden layer state of the encoder, the (t-1)-th target word and the content vector used when the (t-1)-th target word was decoded may be extracted, and the decoding state when the t-th target word is decoded is obtained from them through a function conversion.
Here, the content vector is obtained as a weighted sum of the encoder's sequence of hidden vectors.
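A minimal sketch of this update follows; it assumes the function conversion is a single tanh layer over the concatenated inputs, which the embodiment leaves open.

```python
import numpy as np

def content_vector(attention_weights, encoder_states):
    """Weighted sum of the encoder's hidden vectors: the content vector."""
    return attention_weights @ encoder_states

def next_decoding_state(prev_state, prev_target_emb, prev_content, W, b):
    """Combine the previous decoding state, the (t-1)-th target word and the
    content vector used at step t-1 into the decoding state for step t.
    W and b are learned parameters of the conversion function."""
    return np.tanh(W @ np.concatenate([prev_state, prev_target_emb, prev_content]) + b)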
3. The center points used before the t-th target word is decoded may be represented by the matrix connection of the weight matrices used when the target words before the t-th target word were decoded:
In a specific implementation, the plurality of weight matrices used when the target words before the t-th target word were decoded may be extracted.
Because the dimensions of the different weight matrices differ, to facilitate their addition the weight matrices may be mapped into a plurality of weight matrices in a specified format.
The weight matrices in the specified format are then added to obtain the matrix connection.
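An illustrative sketch of the matrix connection is given below; it assumes the specified format is a fixed-length vector over the source positions and that zero-padding is the mapping, neither of which is prescribed by the embodiment.

```python
import numpy as np

def matrix_connection(past_attention, source_len):
    """Map each earlier weight matrix (here, one attention distribution per
    previously decoded target word) into a common format and add them up."""
    total = np.zeros(source_len)
    for att in past_attention:
        padded = np.zeros(source_len)
        padded[:len(att)] = att              # map into the specified (fixed-length) format
        total += padded
    return total                             # att_<t: the matrix connection

# Toy usage: three earlier decoding steps whose attention vectors have different lengths.
print(matrix_connection([np.array([0.2, 0.8]),
                         np.array([0.1, 0.6, 0.3]),
                         np.array([0.5, 0.5])], source_len=5))
```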
Substep S12, determining the position in the source text on which attention is concentrated, as the center point of the local attention window, by combining the first hidden layer state, the second hidden layer state and the matrix connection.
In the embodiment of the invention, one or more of the first hidden layer state, the second hidden layer state and the matrix connection are comprehensively considered, and the position in the source text on which attention is concentrated is determined as the center point of the local attention window.
In one example of an embodiment of the present invention, substep S12 may comprise the following substeps:
and a substep S121, configuring a weight matrix for one or more information of the first hidden layer state, the second hidden layer state and the matrix connection, respectively.
And a substep S122, combining and configuring one or more information of the first hidden layer state, the second hidden layer state and the matrix connection of the weight matrix to obtain characteristic information.
And a substep S123 of performing nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information.
And a substep S124 of performing nonlinear transformation on the activation information to obtain a characteristic value.
Substep S125, rounding down the product between the feature value and the word length of the source word to obtain the center point of the local attention window.
Taking the example of using the first hidden state, the second hidden state and the matrix connection at the same time, the center point of the local attention window can be calculated by the following formula:
mid = floor( |S| · sigmoid( W_a · tanh( W_pt · h_t + W_ps · h_s + W_att · att_{<t} ) ) )
where mid denotes the center point of the local attention window, the floor() function rounds down, |S| denotes the word length of the source text, the sigmoid function performs the nonlinear transformation, the tanh function performs the nonlinear activation, W_pt, W_ps, W_att and W_a denote the four weight matrices (configured for the second hidden layer state, the first hidden layer state, the matrix connection and the nonlinearly activated characteristic information, respectively), h_t denotes the second hidden layer state of the decoder at time t (i.e., when the t-th target word is decoded), h_s denotes the first hidden layer state of the encoder, and att_{<t} denotes the matrix connection of the weight matrices at all times before time t.
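The computation of the center point can be sketched as follows. Summarising the encoder state sequence by its mean and the names of the weight matrices (in particular W_att) are assumptions made only so the sketch runs; they are not fixed by the embodiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def window_center(h_t, h_s, att_prev, W_pt, W_ps, W_att, w_a, source_len):
    """mid = floor(|S| * sigmoid(w_a . tanh(W_pt h_t + W_ps h_s + W_att att_<t)))."""
    h_s_summary = h_s.mean(axis=0)               # assumption: summarise the encoder states by their mean
    feature = np.tanh(W_pt @ h_t + W_ps @ h_s_summary + W_att @ att_prev)  # characteristic information
    value = sigmoid(w_a @ feature)               # nonlinear transformation, value in (0, 1)
    return int(np.floor(source_len * value))     # round down the product with the source length

rng = np.random.default_rng(3)
S, d = 10, 16                                    # toy source length and state size
mid = window_center(h_t=rng.normal(size=d),
                    h_s=rng.normal(size=(S, 2 * d)),
                    att_prev=rng.normal(size=S),
                    W_pt=rng.normal(size=(d, d)) * 0.1,
                    W_ps=rng.normal(size=(d, 2 * d)) * 0.1,
                    W_att=rng.normal(size=(d, S)) * 0.1,
                    w_a=rng.normal(size=d) * 0.1,
                    source_len=S)
print(mid)                                       # an integer between 0 and S - 1
```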
Step 104, determining a local attention window based on the center point of the local attention window.
In a specific implementation, if a central point of the local attention window is determined, a region within a certain range of the central point may be used as the local attention window.
In one embodiment of the present invention, step 104 may include the following sub-steps:
Substep S21, calculating the difference between the center point and a preset center deviation value as a first endpoint value.
Substep S22, calculating the sum of the center point and the preset center deviation value as a second endpoint value.
Substep S23, setting the interval between the first endpoint value and the second endpoint value as the local attention window.
In the embodiment of the invention, a center deviation value, i.e., an amount by which the window extends on either side of the center point of the local attention window, can be preset.
It should be noted that the central deviation value may be a default value, or may be calculated according to the situation of the source text, which is not limited in the embodiment of the present invention.
Assuming the center point is mid and the center deviation value is w, the local attention window is:
[mid-w,mid+w]
Further, in the above formula for calculating the center point of the local attention window, since the sigmoid function maps an arbitrary real number into (0, 1), the center point mid is an integer within (1, |S|); any portion of the window that would extend beyond the source text is therefore ignored.
If the difference between the center point and the center deviation value is less than 0, 0 is taken as the first endpoint value.
If the sum of the center point and the center deviation value is greater than the word length |S| of the source text, |S| is taken as the second endpoint value.
At this time, the local attention window is:
[max(0,mid-w),min(|S|,mid+w)]
wherein the min function represents taking a smaller value and the max function represents taking a larger value.
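The window computation itself reduces to a few lines; a minimal sketch in pure Python is given below, where the numbers used in the calls are illustrative.

```python
def local_window(mid, w, source_len):
    """[max(0, mid - w), min(|S|, mid + w)]: the window never extends past the source text."""
    return max(0, mid - w), min(source_len, mid + w)

print(local_window(mid=6, w=3, source_len=10))   # (3, 9)
print(local_window(mid=1, w=3, source_len=10))   # (0, 4): left endpoint clamped to 0
```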
Step 105, invoking a decoder to decode the t-th target word from the vector according to the source word located in the local attention window.
In the local attention model, the attention paid to the t-th target word is calculated only for the source words located in the local attention window, and a decoder is invoked to decode the t-th target word from the vectors according to those attention-weighted source words.
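Putting the pieces together for this step, the following sketch decodes one target word from the windowed source words; the dot-product scores, the output projection and greedy argmax decoding are illustrative assumptions rather than the prescribed decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def decode_target_word(decoder_state, encoder_states, lo, hi, W_out, b_out):
    """Attend only to the source words inside the window, build the content
    vector from them, and score a toy target vocabulary."""
    scores = encoder_states[lo:hi] @ decoder_state        # attention scores, window only
    weights = softmax(scores)
    context = weights @ encoder_states[lo:hi]              # content vector from the window
    logits = W_out @ np.concatenate([decoder_state, context]) + b_out
    return int(np.argmax(logits))                          # index of the t-th target word

rng = np.random.default_rng(4)
S, d, V = 10, 16, 50          # source length, state size, toy target vocabulary size
word = decode_target_word(decoder_state=rng.normal(size=d),
                          encoder_states=rng.normal(size=(S, d)),
                          lo=3, hi=9,
                          W_out=rng.normal(size=(V, 2 * d)) * 0.1,
                          b_out=np.zeros(V))
print(word)                   # index of the decoded target word in the toy vocabulary
```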
The embodiment of the invention introduces a local attention model into the encoding-decoding framework. An encoder is invoked to encode the plurality of source words in a received source text into a plurality of vectors; when the t-th target word is decoded, the center point of a local attention window is determined according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word was decoded; the local attention window is determined accordingly; and a decoder is invoked to decode the t-th target word according to the vectors located in the local attention window. This helps find, in the source text, a position for concentrating attention that suits the encoding state and the decoding state when the t-th target word is decoded; it also helps reduce the attention concentrated on the center points used before the t-th target word was decoded and increase the attention paid to other positions in the source text. By comprehensively considering these kinds of information, the accuracy of locating the center of attention is improved, and thereby the quality of business processing such as translation is improved.
In order to make the embodiments of the present invention better understood by those skilled in the art, the following description is made by way of example of translation.
Suppose the source text is a Chinese sentence whose source words are, in order, "I | is | Chinese | people | and | likes | eat | Chinese | vegetables | .", where "|" is the separator between source words; counting the punctuation, the source text contains 10 words.
If the translation is carried out manually, the English sentence is "I am a Chinese, I like eating Chinese food."
If a traditional local attention model is applied, the following translations are generated:
I am a Chinese, eating food.
At the 6th time step, i.e., when "eating" is generated, the center point of the attention window is calculated as 7 (i.e., "eat"), so "eat" is translated while "likes" is missed.
If the local attention model of the embodiment of the invention is applied, the following translation is generated:
I am a Chinese, like eating Chinese food.
At the 6th time step, i.e., when "like" is generated, the center point of the attention window is calculated as 6 (i.e., "likes"), so "likes" is translated and the translation quality is improved.
It should be noted that for simplicity of description, the method embodiments are shown as a series of combinations of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of a text processing system according to an embodiment of the present invention is shown, which may specifically include the following modules:
a source text receiving module 201, configured to receive a source text, where the source text has a plurality of source words;
a vector encoding module 202, configured to invoke an encoder to encode the plurality of source words into a plurality of vectors;
a central point determining module 203, configured to determine, when a tth target word is decoded, a central point of a local attention window according to one or more information in a coding state, a decoding state when the tth target word is decoded, and a central point before the tth target word is decoded;
a local attention window determination module 204 for determining a local attention window based on a center point of the local attention window;
a vector decoding module 205, configured to invoke a decoder to decode the vector into the t-th target word according to the source word located in the local attention window.
In one embodiment of the present invention, the central point determining module 203 comprises:
a reference information obtaining sub-module, configured to obtain one or more of a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of the weight matrices used when the target words before the t-th target word were decoded;
and the reference information determining sub-module is used for determining the position in the source text on which attention is concentrated, as the center point of the local attention window, by combining the first hidden layer state, the second hidden layer state and the matrix connection.
In an embodiment of the present invention, the reference information obtaining sub-module includes:
a first word information extraction unit, configured to extract first word information of a jth source word and source words located after the jth source word, which are recorded when the source text is sequentially input;
a second word information extraction unit, configured to extract second word information of a jth source word recorded when the source text is input in a reverse order and a source word located before the jth source word;
a word information combining and converting unit, configured to combine the first word information and the second word information, and convert the first word information into a first hidden layer state of the encoder;
and/or,
a weight matrix extracting unit, configured to extract a plurality of weight matrices when decoding other target words before the t-th target word;
the weight matrix mapping unit is used for mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and the weight matrix adding unit is used for adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
In one embodiment of the present invention, the reference information determination sub-module includes:
a weight matrix configuration unit, configured to configure a weight matrix for one or more information in the first hidden layer state, the second hidden layer state, and the matrix connection, respectively;
the reference information combination unit is used for combining and configuring one or more information of a first hidden layer state, a second hidden layer state and the matrix connection of the weight matrix to obtain characteristic information;
the nonlinear activation unit is used for carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
the nonlinear transformation unit is used for carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and the rounding-down unit is used for rounding down the product of the characteristic value and the word length of the source text to obtain the center point of the local attention window.
In one embodiment of the present invention, the local attention window determination module 204 comprises:
the first endpoint value setting submodule is used for calculating a difference value between the central point and a preset central deviation value to serve as a first endpoint value;
the second endpoint value setting submodule is used for calculating the sum value of the central point and a preset central deviation value to serve as a second endpoint value;
a local attention window setting submodule for setting a distance between the first endpoint value and the second endpoint value as a local attention window.
With respect to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
The embodiment of the invention introduces a local attention model into the encoding-decoding framework. An encoder is invoked to encode the plurality of source words in a received source text into a plurality of vectors; when the t-th target word is decoded, the center point of a local attention window is determined according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word was decoded; the local attention window is determined accordingly; and a decoder is invoked to decode the t-th target word according to the vectors located in the local attention window. This helps find, in the source text, a position for concentrating attention that suits the encoding state and the decoding state when the t-th target word is decoded; it also helps reduce the attention concentrated on the center points used before the t-th target word was decoded and increase the attention paid to other positions in the source text. By comprehensively considering these kinds of information, the accuracy of locating the center of attention is improved, and thereby the quality of business processing such as translation is improved.
FIG. 3 is a block diagram illustrating an apparatus 300 for text processing in accordance with an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the apparatus 300 may include one or more of the following components: processing component 302, memory 304, power component 306, multimedia component 308, audio component 310, input/output (I/O) interface 312, sensor component 314, and communication component 316.
The processing component 302 generally controls overall operation of the device 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 302 may include one or more modules that facilitate interaction between processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the device 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 306 provides power to the various components of the device 300. The power components 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 300.
The multimedia component 308 includes a screen that provides an output interface between the device 300 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300. For example, sensor assembly 314 may detect the open/closed status of device 300, the relative positioning of components, such as a display and keypad of apparatus 300, the change in position of apparatus 300 or a component of apparatus 300, the presence or absence of user contact with apparatus 300, the orientation or acceleration/deceleration of apparatus 300, and the change in temperature of apparatus 300. Sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The device 300 may access a wireless network based on a communication standard, such as WiFi,2G or 3G, or a combination thereof. In an exemplary embodiment, the communication section 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the apparatus 300 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a text processing method, the method comprising:
receiving source text, the source text having a plurality of source words;
invoking an encoder to encode the plurality of source words into a plurality of vectors;
when the t-th target word is decoded, determining the center point of a local attention window according to one or more information of an encoding state, a decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
Optionally, the step of determining a center point of a local attention window according to one or more information of an encoding status, a decoding status when the t-th target word is decoded, and a center point before the t-th target word is decoded includes:
acquiring one or more information in a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the position in the source text on which attention is concentrated by combining the first hidden layer state, the second hidden layer state and the matrix connection, the position being used as the center point of the local attention window.
Optionally, the step of obtaining a first hidden layer state of the encoder, when the t-th target word is decoded, a second hidden layer state of the decoder, and when other target words before the t-th target word are decoded, one or more pieces of information in matrix connection of a weight matrix includes:
extracting the first word information of the jth source word recorded when the source text is sequentially input and the source words behind the jth source word;
extracting second word information of a jth source word recorded when the source text is input in a reverse order and a source word positioned before the jth source word;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
Optionally, the step of determining a center of attention in the source text as a center point of a local attention window in combination with the first hidden state, the second hidden state and the matrix connection includes:
respectively configuring a weight matrix for one or more information in the first hidden layer state, the second hidden layer state and the matrix connection;
combining and configuring one or more pieces of information in a first hidden layer state, a second hidden layer state and the matrix connection of a weight matrix to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product of the characteristic value and the word length of the source text to obtain the center point of the local attention window.
Optionally, the step of determining a local attention window based on the center of local attention comprises:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
Fig. 4 is a schematic structural diagram of a server in the embodiment of the present invention. The server 400, which may vary significantly depending on configuration or performance, may include one or more Central Processing Units (CPUs) 422 (e.g., one or more processors) and memory 432, one or more storage media 430 (e.g., one or more mass storage devices) storing applications 442 or data 444. Memory 432 and storage media 430 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 422 may be configured to communicate with the storage medium 430, and execute a series of instruction operations in the storage medium 430 on the server 400.
The server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input-output interfaces 458, one or more keyboards 456, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes can be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.
The embodiment of the invention discloses A1, a text processing method, comprising the following steps:
receiving source text, the source text having a plurality of source words;
invoking an encoder to encode the plurality of source words into a plurality of vectors;
when the tth target word is decoded, determining the center point of the local attention window according to one or more information in the encoding state, the decoding state when the tth target word is decoded, and the center point before the tth target word is decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
A2, the method according to A1, wherein the step of determining the center point of the local attention window according to one or more of the encoding state, the decoding state when the t-th target word is decoded, and the center point used before the t-th target word is decoded includes:
acquiring one or more information in a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the position in the source text on which attention is concentrated by combining the first hidden layer state, the second hidden layer state and the matrix connection, the position being used as the center point of the local attention window.
A3, according to the method described in A2, the step of obtaining one or more pieces of information in a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded includes:
extracting the jth source word recorded when the source text is sequentially input and the first word information of the source word behind the jth source word;
extracting second word information of a jth source word recorded when the source text is input in a reverse order and a source word positioned before the jth source word;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
A4, according to the method described in A2, the step of determining a center of attention in the source text as a center point of a local attention window by combining the first hidden state, the second hidden state and the matrix connection includes:
configuring a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state and the matrix connection;
combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
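A small numeric sketch of A4 follows, for illustration only. It assumes tanh and sigmoid for the unspecified nonlinear activation and nonlinear transformation, and it reads "the word length of the source word" as the number of words in the source text; both readings, as well as all dimensions and weights, are assumptions made for the example.

```python
# Sketch of the center-point computation: weight, combine, activate, squash, scale, floor.
import numpy as np

rng = np.random.default_rng(2)
d = 8
first_hidden = rng.normal(size=d)        # encoder first hidden layer state (assumed size)
second_hidden = rng.normal(size=d)       # decoder second hidden layer state at step t
matrix_connection = rng.normal(size=d)   # summed weight matrices from earlier steps (flattened here)

W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))            # configured weight matrices
feature_info = W1 @ first_hidden + W2 @ second_hidden + W3 @ matrix_connection  # characteristic information

v = rng.normal(size=d)                                   # weight matrix configured after activation
activation_info = v @ np.tanh(feature_info)              # nonlinear activation, then weighting
feature_value = 1.0 / (1.0 + np.exp(-activation_info))   # nonlinear transform into (0, 1)

source_len = 12                          # number of source words (assumed reading of "word length")
center_point = int(np.floor(feature_value * source_len))  # round down the product
print(center_point)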
A5, the method according to A1 or A2 or A3 or A4, wherein the step of determining a local attention window based on the center point of the local attention window comprises (see the sketch following this listing):
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
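The window construction in A5 reduces to simple index arithmetic. The short sketch below assumes word indices and additionally clips the endpoints to valid positions, which is a practical assumption not stated above.

```python
# Sketch of A5: endpoints are center ± preset deviation; the span between them is the window.
def local_attention_window(center_point, deviation, num_source_words):
    first_endpoint = center_point - deviation            # difference with the preset deviation
    second_endpoint = center_point + deviation           # sum with the preset deviation
    first_endpoint = max(0, first_endpoint)               # clipping is an added assumption
    second_endpoint = min(num_source_words - 1, second_endpoint)
    return first_endpoint, second_endpoint

print(local_attention_window(center_point=7, deviation=3, num_source_words=12))  # (4, 10)
```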
The embodiment of the invention also discloses B6, a text processing system, comprising:
a source text receiving module, configured to receive a source text, where the source text has a plurality of source words;
the vector encoding module is used for calling the encoder to encode the plurality of source words into a plurality of vectors;
the central point determining module is used for determining, when the t-th target word is decoded, the center point of the local attention window according to one or more pieces of information among the encoding state, the decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded;
a local attention window determination module to determine a local attention window based on a center point of the local attention window;
and the vector decoding module is used for calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
B7, according to the system of B6, the central point determining module comprises:
a reference information obtaining sub-module, configured to obtain one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and the reference information determining submodule is used for determining the center of attention concentration in the source text as the center point of the local attention window by combining the first hidden layer state, the second hidden layer state and the matrix connection.
B8, according to the system of B7, the reference information obtaining sub-module includes:
a first word information extraction unit, configured to extract first word information of a jth source word and source words located after the jth source word, which are recorded when the source text is sequentially input;
a second word information extraction unit, configured to extract second word information of a jth source word recorded when the source text is input in a reverse order and a source word located before the jth source word;
a word information combining and converting unit, configured to combine the first word information and the second word information, and convert the first word information into a first hidden layer state of the encoder;
and/or,
a weight matrix extracting unit, configured to extract a plurality of weight matrices when decoding other target words before the t-th target word;
the weight matrix mapping unit is used for mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and the weight matrix adding unit is used for adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
B9, according to the system of B7, the reference information determination sub-module includes:
a weight matrix configuration unit, configured to configure a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state, and the matrix connection;
the reference information combination unit is used for combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
the nonlinear activation unit is used for carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
the nonlinear transformation unit is used for carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and the lower rounding unit is used for rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
B10, the system according to B6 or B7 or B8 or B9, the local attention window determining module comprising:
the first endpoint value setting submodule is used for calculating a difference value between the central point and a preset central deviation value to serve as a first endpoint value;
the second endpoint value setting submodule is used for calculating the sum value of the central point and a preset central deviation value to serve as a second endpoint value;
a local attention window setting submodule for setting a distance between the first endpoint value and the second endpoint value as a local attention window.
The embodiment of the invention also discloses C11, a device for text processing, comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs comprise instructions for:
receiving a source text, the source text having a plurality of source words;
calling an encoder to encode the plurality of source words into a plurality of vectors;
when the t-th target word is decoded, determining the center point of a local attention window according to one or more pieces of information among an encoding state, a decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
C12, the device according to C11, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
acquiring one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the center of attention concentration in the source text by combining the first hidden layer state, the second hidden layer state and the matrix connection, wherein the center of attention concentration is used as the center point of a local attention window.
C13, the device according to C12, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
extracting first word information of the jth source word and of the source words located after the jth source word, recorded when the source text is sequentially input;
extracting second word information of the jth source word and of the source words located before the jth source word, recorded when the source text is input in reverse order;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
C14, the device according to C12, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
configuring a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state and the matrix connection;
combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
C15, the device according to C11 or C12 or C13 or C14, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.

Claims (16)

1. A method of text processing, comprising:
receiving a source text, the source text having a plurality of source words;
invoking an encoder to encode the plurality of source words into a plurality of vectors;
when the t-th target word is decoded, determining the center point of the local attention window according to one or more pieces of information among the encoding state, the decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded; wherein the one or more pieces of information include a matrix connection of a weight matrix obtained when other target words before the t-th target word are decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
2. The method of claim 1, wherein the step of determining the center point of the local attention window according to one or more pieces of information among the encoding state, the decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded comprises:
acquiring one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the center of attention concentration in the source text by combining the first hidden layer state, the second hidden layer state and the matrix connection, wherein the center of attention concentration is used as the center point of a local attention window.
3. The method of claim 2, wherein the step of obtaining one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when decoding the t-th target word, and a matrix connection of a weight matrix when decoding other target words before the t-th target word comprises:
extracting first word information of the jth source word and of the source words located after the jth source word, recorded when the source text is sequentially input;
extracting second word information of the jth source word and of the source words located before the jth source word, recorded when the source text is input in reverse order;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
4. The method of claim 2, wherein the step of determining the center of attention concentration in the source text as the center point of the local attention window in combination with the first hidden layer state, the second hidden layer state and the matrix connection comprises:
configuring a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state and the matrix connection;
combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
5. The method of claim 1 or 2 or 3 or 4, wherein the step of determining a local attention window based on the center point of the local attention window comprises:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
6. A text processing system, comprising:
a source text receiving module, configured to receive a source text, where the source text has a plurality of source words;
the vector encoding module is used for calling the encoder to encode the plurality of source words into a plurality of vectors;
the central point determining module is used for determining, when the t-th target word is decoded, the center point of the local attention window according to one or more pieces of information among the encoding state, the decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded; wherein the one or more pieces of information include a matrix connection of a weight matrix obtained when other target words before the t-th target word are decoded;
a local attention window determination module to determine a local attention window based on a center point of the local attention window;
and the vector decoding module is used for calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
7. The system of claim 6, wherein the center point determination module comprises:
a reference information obtaining sub-module, configured to obtain one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and the reference information determining submodule is used for determining the center of attention concentration in the source text as the center point of the local attention window by combining the first hidden layer state, the second hidden layer state and the matrix connection.
8. The system of claim 7, wherein the reference information obtaining sub-module comprises:
a first word information extraction unit, configured to extract first word information of a jth source word and source words located after the jth source word, which are recorded when the source text is sequentially input;
a second word information extraction unit, configured to extract second word information of a jth source word recorded when the source text is input in a reverse order and a source word located before the jth source word;
a word information combining and converting unit, configured to combine the first word information and the second word information, and convert the first word information into a first hidden layer state of the encoder;
and/or,
a weight matrix extracting unit, configured to extract a plurality of weight matrices when decoding other target words before the t-th target word;
the weight matrix mapping unit is used for mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and the weight matrix adding unit is used for adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
9. The system of claim 7, wherein the reference information determination sub-module comprises:
a weight matrix configuration unit, configured to configure a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state, and the matrix connection;
the reference information combination unit is used for combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
the nonlinear activation unit is used for carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
the nonlinear transformation unit is used for carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and the rounding-down unit is used for rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
10. The system of claim 6 or 7 or 8 or 9, wherein the local attention window determination module comprises:
the first endpoint value setting submodule is used for calculating a difference value between the central point and a preset central deviation value to serve as a first endpoint value;
the second endpoint value setting submodule is used for calculating a sum value between the central point and a preset central deviation value to serve as a second endpoint value;
a local attention window setting sub-module for setting a distance between the first endpoint value and the second endpoint value as a local attention window.
11. An apparatus for text processing, comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and comprise instructions for:
receiving source text, the source text having a plurality of source words;
invoking an encoder to encode the plurality of source words into a plurality of vectors;
when the t-th target word is decoded, determining the center point of the local attention window according to one or more pieces of information among the encoding state, the decoding state when the t-th target word is decoded, and the center point before the t-th target word is decoded; wherein the one or more pieces of information include a matrix connection of a weight matrix obtained when other target words before the t-th target word are decoded;
determining a local attention window based on a center point of the local attention window;
and calling a decoder to decode the t-th target word from the vector according to the source word in the local attention window.
12. The apparatus of claim 11, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
acquiring one or more pieces of information among a first hidden layer state of the encoder, a second hidden layer state of the decoder when the t-th target word is decoded, and a matrix connection of a weight matrix when other target words before the t-th target word are decoded;
and determining the center of attention concentration in the source text by combining the first hidden layer state, the second hidden layer state and the matrix connection, wherein the center of attention concentration is used as the center point of a local attention window.
13. The apparatus of claim 12, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
extracting first word information of the jth source word and of the source words located after the jth source word, recorded when the source text is sequentially input;
extracting second word information of the jth source word and of the source words located before the jth source word, recorded when the source text is input in reverse order;
combining the first word information and the second word information, and converting into a first hidden layer state of the encoder;
and/or,
extracting a plurality of weight matrixes when other target words before the t-th target word are decoded;
mapping the weight matrixes into a plurality of weight matrixes with specified formats;
and adding the weight matrixes of the plurality of specified formats to obtain matrix connection.
14. The apparatus of claim 12, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
configuring a weight matrix for each of one or more pieces of information among the first hidden layer state, the second hidden layer state and the matrix connection;
combining the one or more pieces of information, to each of which a weight matrix has been configured, among the first hidden layer state, the second hidden layer state and the matrix connection, to obtain characteristic information;
carrying out nonlinear activation on the characteristic information and configuring a weight matrix to obtain activation information;
carrying out nonlinear transformation on the activation information to obtain a characteristic value;
and rounding down the product between the characteristic value and the word length of the source word to obtain the central point of the local attention window.
15. The apparatus of claim 11 or 12 or 13 or 14, wherein the one or more programs further comprise instructions, configured to be executed by the one or more processors, for:
calculating a difference value between the central point and a preset central deviation value as a first endpoint value;
calculating a sum value between the central point and a preset central deviation value as a second endpoint value;
setting a distance between the first endpoint value and the second endpoint value as a local attention window.
16. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a text processing method according to one or more of method claims 1-5.
CN201710602815.0A 2017-07-21 2017-07-21 Text processing method and system and text processing device Active CN109284510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710602815.0A CN109284510B (en) 2017-07-21 2017-07-21 Text processing method and system and text processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710602815.0A CN109284510B (en) 2017-07-21 2017-07-21 Text processing method and system and text processing device

Publications (2)

Publication Number Publication Date
CN109284510A CN109284510A (en) 2019-01-29
CN109284510B true CN109284510B (en) 2022-10-21

Family

ID=65185298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710602815.0A Active CN109284510B (en) 2017-07-21 2017-07-21 Text processing method and system and text processing device

Country Status (1)

Country Link
CN (1) CN109284510B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347790B (en) * 2019-06-18 2021-08-10 广州杰赛科技股份有限公司 Text duplicate checking method, device and equipment based on attention mechanism and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102047680A (en) * 2008-06-02 2011-05-04 皇家飞利浦电子股份有限公司 Apparatus and method for adjusting the cognitive complexity of an audiovisual content to a viewer attention level
CN102054178A (en) * 2011-01-20 2011-05-11 北京联合大学 Chinese painting image identifying method based on local semantic concept

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015323A1 (en) * 2004-07-13 2006-01-19 Udupa Raghavendra U Method, apparatus, and computer program for statistical translation decoding


Also Published As

Publication number Publication date
CN109284510A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
US20210042474A1 (en) Method for text recognition, electronic device and storage medium
CN107291690B (en) Punctuation adding method and device and punctuation adding device
WO2017114020A1 (en) Speech input method and terminal device
JP6918181B2 (en) Machine translation model training methods, equipment and systems
JP2017535007A (en) Classifier training method, type recognition method and apparatus
CN110633755A (en) Network training method, image processing method and device and electronic equipment
CN107291704B (en) Processing method and device for processing
CN111242303B (en) Network training method and device, and image processing method and device
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
WO2019165832A1 (en) Text information processing method, device and terminal
CN110633470A (en) Named entity recognition method, device and storage medium
CN111160047A (en) Data processing method and device and data processing device
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN111369978A (en) Data processing method and device and data processing device
CN113539233A (en) Voice processing method and device and electronic equipment
US20210089726A1 (en) Data processing method, device and apparatus for data processing
CN109992754B (en) Document processing method and device
CN108733657B (en) Attention parameter correction method and device in neural machine translation and electronic equipment
CN111382748B (en) Image translation method, device and storage medium
CN109284510B (en) Text processing method and system and text processing device
CN109887492B (en) Data processing method and device and electronic equipment
CN111832322A (en) Statement translation method and device, electronic equipment and storage medium
CN109979435B (en) Data processing method and device for data processing
CN109284509B (en) Text processing method and system and text processing device
CN109977424B (en) Training method and device for machine translation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant