CN117556787B - Method and system for generating target text sequence for natural language text sequence - Google Patents

Method and system for generating target text sequence for natural language text sequence Download PDF

Info

Publication number
CN117556787B
CN117556787B CN202410038359.1A
Authority
CN
China
Prior art keywords
node
output
word
text sequence
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410038359.1A
Other languages
Chinese (zh)
Other versions
CN117556787A (en)
Inventor
张岳 (Zhang Yue)
白雪峰 (Bai Xuefeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Westlake University
Original Assignee
Westlake University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Westlake University filed Critical Westlake University
Priority to CN202410038359.1A priority Critical patent/CN117556787B/en
Publication of CN117556787A publication Critical patent/CN117556787A/en
Application granted granted Critical
Publication of CN117556787B publication Critical patent/CN117556787B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • G06F40/157Transformation using dictionaries or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method for generating a target text sequence for a natural language text sequence. A processor receives an input vector corresponding to the natural language text sequence for which a target text sequence is to be generated. The current element of the received input vector is taken as an input word node, each generated word of the target text sequence is taken as a node in the generated word node sequence, and a word node graph is constructed based on the input word node and each generated word node. The word node graph further includes a global node and a local node: the global node is connected to the input word node and to each generated word node, while the local node is connected to the input word node and to the latest w generated word nodes in the generated word node sequence. Based on the word node graph, the current output word of the target text sequence is generated using a trained first learning network. The method better models both long-distance dependencies and local features in the text sequence, and can generate the target text sequence more accurately and more efficiently.

Description

Method and system for generating target text sequence for natural language text sequence
Technical Field
The present application relates to the field of natural language processing technology, and more particularly, to a method and system for generating a target text sequence for a natural language text sequence.
Background
In the field of natural language processing, a target text sequence often needs to be generated from a given segment of natural language text for tasks such as machine translation, information extraction, dialogue systems and text summarization; existing methods mostly use neural network models to model the natural language.
A common neural network model used to model natural language is the sequence model based on Long Short-Term Memory (LSTM). LSTM is a recurrent neural network built on gating mechanisms (gates) that processes sequence data by passing information between time steps. Although LSTM-based language models effectively alleviate the gradient problem through their memory mechanism, their modeling capability for long-distance dependencies remains limited: their memory relies on a linear, front-to-back transmission of information, i.e., the output at the current moment directly depends only on the output at the previous moment, so information transfer between nodes that are far apart in the sequence is inefficient. Another common approach to modeling natural language text is the sequence model based on the Transformer. The Transformer uses a self-attention mechanism to exchange information directly between any two words, which helps model long-distance dependencies. Its disadvantage is that the self-attention mechanism treats information from all positions equally and weakens the natural differences in temporal distance between positions; as a result, the importance of local features for the current task cannot be fully taken into account during prediction, and information from distant positions may receive inappropriately high weight.
It can be seen that existing natural language models are deficient either in modeling long-distance dependencies of a natural language text sequence or in modeling local features, and no method has been available that takes both aspects into account to predict the target text sequence more accurately.
Disclosure of Invention
The present application has been made to solve the above-mentioned problems in the prior art. The aim of the application is to provide a method and a system for generating a target text sequence for a natural language text sequence that can output a target text sequence better matching the user's requirements based on the natural language text sequence input by the user.
According to a first aspect of the present application, there is provided a method of generating a target text sequence for a natural language text sequence, comprising, by a processor: receiving an input vector corresponding to a natural language text sequence of a target text sequence to be generated; taking the current element in the received input vector as an input word node, taking each generated word in the target text sequence as a generated word node sequence, and constructing a word node graph corresponding to the current output word in the target text sequence based on the input word node and each generated word node, wherein the word node graph further comprises a global node and a local node, the global node is connected with the input word node and each generated word node, and the local node is connected with the input word node and the latest w generated word nodes in the generated word node sequence; and generating a current output word in the target text sequence by using the trained first learning network based on the word node graph.
According to a second aspect of the present application, there is provided a system for generating a target text sequence for a natural language text sequence, comprising: an interface configured to receive an input vector corresponding to a natural language text sequence of a target text sequence to be generated; a processor configured to perform a method of generating a target text sequence for a natural language text sequence according to various embodiments of the application.
According to a third aspect of the present application, there is provided a non-transitory computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the steps of a method of generating a target text sequence for a natural language text sequence according to various embodiments of the present application.
According to the method and system for generating a target text sequence for a natural language text sequence provided herein, when the target text sequence is generated based on the natural language text sequence, a word node graph containing a global node and a local node is first constructed from the current element of the input vector corresponding to the natural language text sequence and the words already generated in the target text sequence. The global node allows long-distance dependencies in the text sequence to be modeled better, since it brings any two nodes closer together, while the local node additionally strengthens the local features of the several nodes whose influence matters most. Therefore, when the first learning network generates the current output word of the target text sequence based on a word node graph that contains both a global node and a local node, the generated target text sequence matches the user's requirements with higher accuracy.
The foregoing is merely an overview of the technical solution of the present application. It is provided so that the technical means of the application can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features and advantages of the application become more readily apparent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. The accompanying drawings illustrate various embodiments by way of example in general and not by way of limitation, and together with the description and claims serve to explain the disclosed embodiments. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Such embodiments are illustrative and not intended to be exhaustive or exclusive of the present apparatus or method.
Fig. 1 shows a flow diagram of a method of generating a target text sequence for a natural language text sequence according to an embodiment of the application.
Fig. 2 shows a schematic diagram of a method of generating a target text sequence for a natural language text sequence according to an embodiment of the present application.
Fig. 3 shows a schematic diagram of the partial composition and principle of a first learning network according to an embodiment of the present application.
Fig. 4 shows a partial composition diagram of a system for generating a target text sequence for a natural language text sequence according to an embodiment of the application.
Detailed Description
The present application will be described in detail below with reference to the drawings and detailed description to enable those skilled in the art to better understand the technical aspects of the present application. Embodiments of the present application will be described in further detail below with reference to the drawings and specific examples, but not by way of limitation.
The terms "first," "second," and the like, as used herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises" and the like means that elements preceding the word encompass the elements recited after the word, and not exclude the possibility of also encompassing other elements. The order in which the steps of the methods described in connection with the figures are performed is not intended to be limiting. As long as the logical relationship between the steps is not affected, several steps may be integrated into a single step, the single step may be decomposed into multiple steps, or the execution order of the steps may be exchanged according to specific requirements.
It should also be understood that the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. In the present application, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 shows a flow diagram of a method of generating a target text sequence for a natural language text sequence according to an embodiment of the application. Fig. 2 shows a schematic diagram of a method of generating a target text sequence for a natural language text sequence according to an embodiment of the present application.
As shown in fig. 1, in the case that a target text sequence is to be generated for a natural language text sequence, an input vector corresponding to that natural language text sequence may first be received by the processor in step 101. As shown in fig. 2, the input vector x (whose current element is x_k, where k denotes the current time step) may be a feature vector generated by applying processing such as word embedding (Word Embedding) to the natural language text sequence for which the target text sequence is to be generated. It should be noted that the natural language text sequence may be, for example, hundreds of words long, or thousands of words or longer; the length of the text sequence does not affect the applicability of the method according to the embodiments of the present application. In some embodiments, the processor may also receive the natural language text sequence entered by the user and employ any suitable natural-language word-vector generation method, including word embedding, to generate the corresponding input vector x. In other embodiments, the processor may also receive, together with the natural language text sequence, related requirements or queries (not shown) entered by the user about generating the target text sequence; the specific user requirements and the manner of user queries do not limit the application. In fig. 2, y_1, ..., y_{k-1} is the target text sequence generated so far, which may be, for example, a text sequence generated based on the natural language text sequence entered by the user that matches the user's requirements and/or queries.
Next, in step 102, the current element x_k of the received input vector is taken as an input word node, each generated word of the target text sequence is taken as a node in the generated word node sequence, and a word node graph corresponding to the current output word of the target text sequence is constructed based on the input word node and each generated word node. As shown in fig. 2, the word node graph further includes a global node gl and a local node lo: the global node gl is connected to the input word node x_k and to each generated word node y_1, ..., y_{k-1}, while the local node lo is connected to the input word node x_k and to the latest w generated word nodes y_{k-w}, ..., y_{k-1} in the generated word node sequence. By virtue of the global node gl, the path length between any two nodes in the word node graph is at most 2; even for nodes that are far apart in time, the information transmission distance between them does not exceed 2, so that long-distance dependencies between nodes can be learned more directly and more efficiently. On the other hand, by virtue of the local node lo, the more significant influence of the latest w generated word nodes, which are closest to the current output word, can be additionally emphasized; for example, the most suitable value of w can be selected experimentally according to different target-text-sequence generation requirements, so that the influence of the local node on the current output word node is fully and appropriately reflected.
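By way of example only, the following Python sketch illustrates the connectivity described above; the node labels, the window size w and the helper function are illustrative assumptions rather than a prescribed implementation.

```python
# Minimal sketch of constructing the word node graph for the current step k.
# Node names, the window size w and this helper are illustrative assumptions.
def build_word_node_graph(k: int, w: int):
    """Return the generated word nodes and an edge list over the nodes:
    'x_k' (input word node), 'y_1'..'y_{k-1}' (generated word nodes),
    'gl' (global node) and 'lo' (local node)."""
    generated = [f"y_{j}" for j in range(1, k)]       # generated word node sequence
    edges = []
    # Global node gl connects to the input word node and every generated word node.
    for node in ["x_k"] + generated:
        edges.append(("gl", node))
    # Local node lo connects to the input word node and the latest w generated word nodes.
    for node in ["x_k"] + generated[-w:]:
        edges.append(("lo", node))
    return generated, edges

nodes, edges = build_word_node_graph(k=6, w=3)
# With k=6, w=3: gl touches x_k and y_1..y_5; lo touches x_k and y_3, y_4, y_5,
# so any two word nodes are at most two hops apart via gl.
```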
Then, in step 103, the current output word y_k of the target text sequence is generated based on the word node graph using the trained first learning network.
With the method and system for generating a target text sequence for a natural language text sequence according to the embodiments of the application, when the target text sequence is generated based on the input vector corresponding to the natural language text sequence, a word node graph containing a global node and a local node is first constructed from the current element of the input vector and the words already generated in the target text sequence. Through the construction of the global node and the local node, modeling of long-distance dependencies between nodes is combined with local-feature modeling of the nearer nodes, so that the resulting target-text-sequence generation model is more accurate; the first learning network therefore generates the target text sequence from the word node graph with higher accuracy, and the result better matches the user's requirements.
In some embodiments, the target text sequence includes, but is not limited to, one of, or a combination of, a source-language summary text sequence of the natural language text sequence, a target-language translation text sequence of the natural language text sequence, and a dialogue text sequence matching the natural language text sequence; the application is not limited in this respect. By way of example only, when the target text sequence to be generated is a source-language summary text sequence of the natural language text sequence, the first learning network may be trained on an automatic summarization training dataset in the same language as the source language of the natural language text sequence. For example, when the source language is English, the first learning network may be trained on a monolingual summarization dataset such as the SAMSum corpus (a manually annotated dialogue dataset for abstractive summarization), the CNN/DailyMail dataset (a partially extractive news corpus), the NYT Annotated Corpus (a partially extractive corpus) or Newsroom (an extractive plus abstractive corpus). In other embodiments, AMI (a long meeting summarization dataset), XSum (an extreme abstractive news summarization dataset), DialogSum (a real-scenario dialogue summarization dataset) and the like may also be used; the application is not specifically limited in this respect. For another example, when the target text sequence to be generated is a cross-lingual summary text sequence of the natural language text sequence, a cross-lingual summarization dataset such as Zh2EnSum (a social-media-domain dataset), DialogSumX (a daily conversation dataset) or QMSumX (a meeting conversation dataset) may be selected to train the first learning network.
Fig. 3 shows a schematic diagram of the partial composition and principle of a first learning network according to an embodiment of the present application. As shown in fig. 3, the first learning network 30 is formed by connecting a graph neural network 31 and an output layer 32 in series, and the graph neural network 31 further includes a local feature calculation section 311, a global feature calculation section 312 and a node state updating section 313. In fig. 3, x is the natural language text sequence entered by the user, and y is the generated target text sequence.
In some embodiments, the node state updating section 313 may be configured to: cyclically update the node hidden states for T rounds using the local semantic features output by the local feature calculation section 311, the global semantic features output by the global feature calculation section 312, and the first input feature x_k corresponding to the input word node, where, for example, the node hidden state at round t-1 includes the node output hidden state h^{t-1} and the node memory hidden state c^{t-1}; the node output hidden state h^{T} updated in the T-th round is taken as the output of the graph neural network 31. In some embodiments, the value of T may be set to a fixed value in advance according to experimental data, and may differ for different requirements of the target text sequence to be generated. In addition, during training of the first learning network 30 / graph neural network 31, a suitable value of T can be determined according to how closely the output approximates the ground-truth sequences of the training data, ensuring that the T update rounds bring a clear improvement in prediction accuracy without excessively reducing training and prediction efficiency.
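By way of example only, the T-round refinement may be organized as in the following sketch, in which local_feat, global_feat and update_node_state stand in for the local feature calculation section 311, the global feature calculation section 312 and the node state updating section 313; the function names and tensor shapes are assumptions for illustration.

```python
# Schematic of the T-round state refinement performed by the graph neural network.
# local_feat, global_feat and update_node_state are hypothetical stand-ins for 311/312/313.
def refine_states(h_nodes, c_nodes, x_k, T, w, local_feat, global_feat, update_node_state):
    """h_nodes, c_nodes: per-node output / memory hidden states of the generated word nodes
    (indexable, newest last); x_k: first input feature of the input word node."""
    h_cur, c_cur = h_nodes[-1], c_nodes[-1]                   # state of the node being updated
    for _ in range(T):                                        # T rounds of cyclic updating
        h_lo, c_lo = local_feat(h_nodes[-w:], c_nodes[-w:])   # local semantic features (311)
        h_gl, c_gl = global_feat(h_cur, h_nodes, c_nodes)     # global semantic features (312)
        h_cur, c_cur = update_node_state(h_cur, c_cur, x_k,   # node state update (313)
                                         h_lo, c_lo, h_gl, c_gl)
    return h_cur                                              # node output hidden state after round T
```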
The local feature calculation section 311 may be configured to: calculate the local semantic features of the word node graph, e.g. h_lo^{t-1} and c_lo^{t-1}, from the node hidden states at round t-1 of the w generated word nodes connected to the local node lo, as output by the node state updating section 313.
The global feature calculation section 312 may be configured to: calculate the global semantic features of the word node graph, e.g. h_gl^{t-1} and c_gl^{t-1}, from the node hidden states at round t-1 of each generated word node connected to the global node gl, as output by the node state updating section 313.
The output layer 32 may be configured to generate the current output word y_k of the target text sequence based on the node output hidden state h^{T} after the T-th round of updating, as output by the graph neural network 31.
In some embodiments, the local semantic features include the local context output feature h_lo^{t-1} and the local context memory feature c_lo^{t-1}, and the global semantic features include the global context output feature h_gl^{t-1} and the global context memory feature c_gl^{t-1}.
The local feature calculation section 311 may be further configured to: predict the local context output feature h_lo^{t-1} of the word node graph by applying a pooling function to the node output hidden states at round t-1 of the w generated word nodes connected to the local node, as output by the node state updating section 313; and predict the local context memory feature c_lo^{t-1} of the word node graph by applying a pooling function to the node memory hidden states at round t-1 of those same w generated word nodes. By way of example only, the local context output feature and the local context memory feature may be calculated as in formulas (1-1) and (1-2):

h_lo^{t-1} = avg(h_{k-w}^{t-1}, ..., h_{k-1}^{t-1})        Formula (1-1)

c_lo^{t-1} = avg(c_{k-w}^{t-1}, ..., c_{k-1}^{t-1})        Formula (1-2)

where avg(·) denotes a mean-pooling function and h_j^{t-1}, c_j^{t-1} denote the node output hidden state and the node memory hidden state of generated word node y_j at round t-1.
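By way of example only, the mean pooling of formulas (1-1) and (1-2) may be sketched as follows; the tensor layout (hidden states stacked row by row, newest last) is an assumption for illustration.

```python
import torch

def local_features(h_prev: torch.Tensor, c_prev: torch.Tensor, w: int):
    """h_prev, c_prev: (k-1, d) node output / memory hidden states of the
    generated word nodes at round t-1; returns mean-pooled local features."""
    h_lo = h_prev[-w:].mean(dim=0)   # formula (1-1): average over the latest w nodes
    c_lo = c_prev[-w:].mean(dim=0)   # formula (1-2)
    return h_lo, c_lo
```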
The global feature calculation section 312 may be further configured to: predict the global context output feature h_gl^{t-1} of the word node graph by applying an attention mechanism, as used in Transformer-like neural network models, to the node output hidden states at round t-1 of each generated word node connected to the global node gl, as output by the node state updating section 313; and predict the global context memory feature c_gl^{t-1} of the word node graph by applying the attention mechanism to the node memory hidden states at round t-1 of those generated word nodes. By way of example only, the global context output feature and the global context memory feature may be calculated as in formulas (2-1) to (2-5):

Att(Q, K, V) = softmax(Q K^T / sqrt(d)) V        Formula (2-1)

Q_h = W_Q h^{t-1},  K_h = W_K H^{t-1},  V_h = W_V H^{t-1}        Formula (2-2)

h_gl^{t-1} = Att(Q_h, K_h, V_h)        Formula (2-3)

Q_c = W'_Q h^{t-1},  K_c = W'_K C^{t-1},  V_c = W'_V C^{t-1}        Formula (2-4)

c_gl^{t-1} = Att(Q_c, K_c, V_c)        Formula (2-5)

where d is the dimension of the hidden-layer vectors; Q, K and V denote the Query, Key and Value vectors respectively, whose specific meanings and calculation methods are known to those skilled in the art and are not repeated here; H^{t-1} and C^{t-1} stack the node output hidden states and the node memory hidden states at round t-1 of the generated word nodes connected to the global node; and the W matrices are learnable projection parameters.
The calculation given in formula (2-1) is an exemplary formulation of the attention mechanism; other applicable formulations may also be adopted, and the application is not limited in this respect.
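By way of example only, the scaled dot-product attention of formulas (2-1) to (2-5) may be sketched as follows; using the previous-round node output hidden state as the query, and the projection-matrix and argument names, are assumptions for illustration.

```python
import math
import torch

def global_features(h_query, H_prev, C_prev, Wq, Wk, Wv, Wq2, Wk2, Wv2):
    """Scaled dot-product attention over the generated word nodes (formulas (2-1)-(2-5)).
    h_query: (d,) previous-round node output hidden state used as query (assumption);
    H_prev, C_prev: (k-1, d) stacked node output / memory hidden states at round t-1;
    Wq..Wv2: (d, d) learnable projection matrices."""
    d = h_query.shape[-1]

    def att(q, K, V):                                   # formula (2-1)
        scores = (K @ q) / math.sqrt(d)                 # one score per generated word node
        return torch.softmax(scores, dim=0) @ V         # attention-weighted sum of values

    h_gl = att(Wq @ h_query, H_prev @ Wk.T, H_prev @ Wv.T)    # global context output feature
    c_gl = att(Wq2 @ h_query, C_prev @ Wk2.T, C_prev @ Wv2.T) # global context memory feature
    return h_gl, c_gl
```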
It can be seen that the attention mechanism introduced by formulas (2-1) to (2-5) allows the features in the hidden states of all historical prediction results to be fully utilized, so that global context output features and global context memory features with higher relevance and higher accuracy are generated. The graph neural network of the embodiments of the present application therefore has, at the same time, a sequence-processing capability similar to that of the Transformer model; moreover, compared with the Transformer model, which only attends to global features, the embodiments of the present application more reasonably strengthen, through the local feature calculation section 311, the additional influence of the nodes closer to the current output node. In this way, the first learning network 30 can focus more on nearby words when predicting the target text sequence. This property has significant advantages in text-sequence prediction tasks, because text prediction itself relies more on local features, while the influence of long-distance features is comparatively small.
More specifically, the node state updating section 313 may be constructed based on a structure similar to that of a sequential long short-term memory network. Unlike an existing sequential LSTM, which has only an input gate, a forget gate and an output gate, the node state updating section 313 is provided with an input gate i, a local gate l, a forget gate f, a global gate g and an output gate o, and may be further configured to: gate the input feature vector u corresponding to the second input feature with the input gate vector i corresponding to the input gate, to generate the first component i ⊙ u of the node memory hidden state of the current round; gate the local context memory feature c_lo^{t-1} of the previous round (round t-1) with the local gate vector l corresponding to the local gate, to generate the second component l ⊙ c_lo^{t-1} of the node memory hidden state c^{t} of the current round (round t); gate the node memory hidden state c^{t-1} of the previous round with the forget gate vector f corresponding to the forget gate, to generate the third component f ⊙ c^{t-1} of the node memory hidden state of the current round; and gate the global context memory feature c_gl^{t-1} of the previous round with the global gate vector g corresponding to the global gate, to generate the fourth component g ⊙ c_gl^{t-1} of the node memory hidden state of the current round.
On this basis, the node memory hidden state c^{t} of the current round is generated from the first component, the second component, the third component and the fourth component, and the node output hidden state h^{t} of the current round is generated by gating the node memory hidden state with the output gate vector o corresponding to the output gate.
By way of example only, the node memory hidden state and the node output hidden state of the current round may be calculated according to the following formulas (3-1) to (3-6):

a_i = i ⊙ u        Formula (3-1)

a_l = l ⊙ c_lo^{t-1}        Formula (3-2)

a_f = f ⊙ c^{t-1}        Formula (3-3)

a_g = g ⊙ c_gl^{t-1}        Formula (3-4)

c^{t} = a_i + a_l + a_f + a_g        Formula (3-5)

h^{t} = o ⊙ tanh(c^{t})        Formula (3-6)

where o denotes the output gate vector corresponding to the output gate, which controls the influence of the memory feature on the output feature, ⊙ denotes element-wise (bit-wise) multiplication of vectors, and tanh is a nonlinear activation function. In other embodiments, nonlinear activation functions other than tanh may also be used; the application is not limited in this respect.
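By way of example only, formulas (3-1) to (3-6) may be sketched as follows, assuming that the gate vectors have already been computed (see formulas (4-1) to (4-8) below); the function and argument names are illustrative.

```python
import torch

def update_memory(i, l, f, g, o, u, c_prev, c_lo, c_gl):
    """Node state update of formulas (3-1)-(3-6); all arguments are (d,) tensors."""
    c_t = i * u + l * c_lo + f * c_prev + g * c_gl   # formulas (3-1)-(3-5): sum of the four gated components
    h_t = o * torch.tanh(c_t)                        # formula (3-6): output gate applied to the new memory
    return h_t, c_t
```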
In some embodiments, the input gate vector i, the local gate vector l, the forget gate vector f, the global gate vector g, the output gate vector o and the input feature vector u corresponding to the second input feature may be constructed as follows.
First, a composite vector v is constructed based on the node output hidden state h^{t-1} of the previous round, the local context output feature h_lo^{t-1} of the previous round, the global context output feature h_gl^{t-1} of the previous round and the first input feature x_k.
Then, based on the composite vector v, the input gate vector i corresponding to the input gate, the local gate vector l corresponding to the local gate, the forget gate vector f corresponding to the forget gate, the global gate vector g corresponding to the global gate and the output gate vector o corresponding to the output gate are generated using a linear activation function such as sigmoid and a normalization function such as softmax.
Furthermore, based on the composite vector v, the input feature vector u corresponding to the second input feature is generated using a nonlinear activation function such as tanh.
By way of example only, the input gate vector i, the local gate vector l, the forget gate vector f, the global gate vector g, the output gate vector o and the input feature vector u corresponding to the second input feature may be calculated according to the following formulas (4-1) to (4-8):

v = [h^{t-1}; h_lo^{t-1}; h_gl^{t-1}; x_k]        Formula (4-1)

i' = σ(W_i v + b_i)        Formula (4-2)

l' = σ(W_l v + b_l)        Formula (4-3)

f' = σ(W_f v + b_f)        Formula (4-4)

g' = σ(W_g v + b_g)        Formula (4-5)

o = σ(W_o v + b_o)        Formula (4-6)

i, l, f, g = softmax(i', l', f', g')        Formula (4-7)

u = tanh(W_u v + b_u)        Formula (4-8)

where σ denotes the sigmoid activation function, softmax is the normalization function (applied in formula (4-7) across the four gate vectors so that they are jointly normalized), tanh is the nonlinear activation function, and W_i, W_l, W_f, W_g, W_o, W_u and b_i, b_l, b_f, b_g, b_o, b_u are the two-dimensional parameter matrices and one-dimensional parameter vectors corresponding to the vectors i, l, f, g, o and u respectively; their initial values may be set as needed and are optimized to convergence during training.
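By way of example only, formulas (4-1) to (4-8) may be sketched as the following module, assuming equal dimensions for all hidden states; the class, parameter and variable names are illustrative, and the joint softmax over the four gates reflects the normalization described above.

```python
import torch
import torch.nn as nn

class GateBlock(nn.Module):
    """Sketch of formulas (4-1)-(4-8): gate vectors and candidate input from the composite vector."""
    def __init__(self, d: int):
        super().__init__()
        # one linear layer per gate / candidate; parameter names are illustrative
        self.proj = nn.ModuleDict({name: nn.Linear(4 * d, d) for name in "ilfgou"})

    def forward(self, h_prev, h_lo, h_gl, x_k):
        v = torch.cat([h_prev, h_lo, h_gl, x_k], dim=-1)                    # formula (4-1)
        i, l, f, g, o = (torch.sigmoid(self.proj[n](v)) for n in "ilfgo")   # formulas (4-2)-(4-6)
        gates = torch.softmax(torch.stack([i, l, f, g]), dim=0)             # formula (4-7): joint normalization
        i, l, f, g = gates[0], gates[1], gates[2], gates[3]
        u = torch.tanh(self.proj["u"](v))                                   # formula (4-8)
        return i, l, f, g, o, u
```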
By introducing the input gate vector i, the local gate vector l, the forget gate vector f, the global gate vector g and the output gate vector o, the invention can effectively conduct information interaction between nodes and thereby realize state updating. Compared with the LSTM structure, the local gate vector l and the global gate vector g introduced by the invention allow the model to learn both local and global features, effectively improving the model's ability to model medium- and long-distance dependencies. Compared with the Transformer structure, the invention on the one hand uses a gating mechanism to control the information flow, which offers finer-grained control than attention (which relies on scalar multiplication); on the other hand, besides the global gate vector g, the invention uses the local gate vector l to control the local information flow and thus additionally models local information in the text sequence, making it better suited than a Transformer to learning short-distance dependencies in text.
In addition, compared with the residual connections used by the Transformer, the gated recurrence mechanism used by the application has two advantages: 1) the gated recurrence mechanism can better control the information flow between layers through the input gate and the forget gate, so that the graph neural network of the embodiments of the application can fully learn the latent dependencies across multiple network layers; compared with the existing Transformer model it has a stronger capability for modeling deep dependencies and can therefore support graph neural network models with more layers, that is, it can accurately model larger-scale, more complex problems; 2) the gated recurrence mechanism does not need to allocate independent parameters for each layer but shares parameters, so it has a clear advantage in parameter efficiency and the resulting model occupies less storage space.
In some embodiments, the output layer 32 may be further configured to: generate a probability distribution P of the current output word based on the node output hidden state h^{T} after the T-th round of updating, as output by the graph neural network 31, and an output vector matrix W_out generated based on a predetermined vocabulary, as shown in formula (5):

P_θ(y_k | y_{<k}, x) = softmax(W_out h^{T})        Formula (5)

where P_θ denotes the output probability distribution of the first learning network (with parameter set θ) at the current position k, y_{<k} denotes the context in the generated target text sequence, i.e. y_1, ..., y_{k-1}, softmax is a probability normalization function, and W_out ∈ R^{|V| × d_w} is the output vector matrix, where |V| denotes the size of the predetermined vocabulary and d_w denotes the word vector dimension.
The current output word is then finally determined based on the probability distribution P_θ; for example, the output of the previous step may be used as the input of the next step, and a beam search method may be used to select an optimal output sequence, thereby producing the current output word.
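By way of example only, formula (5) together with a greedy choice of the current output word may be sketched as follows; in practice beam search would keep the top-b partial sequences instead of a single argmax. The tensor and function names are illustrative.

```python
import torch

def predict_next_word(h_T: torch.Tensor, W_out: torch.Tensor):
    """Formula (5): probability distribution over the predetermined vocabulary.
    h_T: (d,) node output hidden state after round T; W_out: (|V|, d) output matrix."""
    probs = torch.softmax(W_out @ h_T, dim=-1)   # P(y_k | y_<k, x)
    return probs, int(probs.argmax())            # greedy choice; beam search would keep several hypotheses
```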
In some embodiments, the first learning network or the graph neural network may be trained with a training data set D carrying ground-truth labels of the target text sequences, using, for example, the negative log-likelihood shown in formula (6) as the loss function of the graph neural network or the first learning network:

L(θ) = -(1/|D|) Σ_{(x, y) ∈ D} log P_θ(y | x)        Formula (6)

where |D| denotes the amount of training data in the training data set. Furthermore, a gradient-descent-based optimizer (e.g., Adam) may be used to optimize the above loss function; the application is not limited in this respect.
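By way of example only, the per-sample negative log-likelihood underlying formula (6) may be sketched as follows; averaging the returned values over the |D| training pairs gives the loss L(θ). The argument names are illustrative.

```python
import torch

def nll_loss(step_probs, target_ids):
    """Negative log-likelihood of the ground-truth words for one training pair.
    step_probs: list of (|V|,) distributions produced at each position;
    target_ids: gold word indices of the target text sequence."""
    log_p = torch.stack([torch.log(p[t]) for p, t in zip(step_probs, target_ids)])
    return -log_p.sum()   # summed over positions; averaging over the |D| pairs gives L(theta)
```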
In other embodiments, when each output word of the target text sequence is generated during training, the ground-truth label of the current output word is used, with a first probability p1, as the input for generating the next output word, and correspondingly, with probability 1-p1, the current output word output by the first learning network during training is used as the input for generating the next output word. That is, during training of the first learning network or the graph neural network, teacher forcing is selected with probability p1 and model autoregression (Autoregressive) is selected with probability 1-p1. In particular, in the early stages of training the value of p1 may be relatively large, and p1 may be gradually decreased as training proceeds. In this way, in the early stages of training, when the prediction capability of the first learning network or graph neural network is still weak, teacher forcing prevents the model from diverging excessively and thereby accelerates convergence; in the later stages of training, a smaller p1 lets the model learn, via autoregression, to recover from its own generation errors as far as possible, so that over-reliance on the ground truth is effectively avoided while the generation of the target text sequence becomes more diverse, and potentially better solutions are not missed.
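By way of example only, the selection between teacher forcing and autoregression, together with a gradually decreasing p1, may be sketched as follows; the linear decay schedule and the function names are assumptions, and other schedules may equally be used.

```python
import random

def choose_next_input(gold_word, predicted_word, p1: float):
    """Scheduled sampling: with probability p1 feed the ground-truth word (teacher forcing),
    otherwise feed the model's own prediction (autoregression)."""
    return gold_word if random.random() < p1 else predicted_word

def p1_schedule(epoch: int, total_epochs: int, start: float = 1.0, end: float = 0.1) -> float:
    """Linearly decay p1 from `start` in the first epoch to `end` in the last epoch."""
    return start + (end - start) * epoch / max(1, total_epochs - 1)
```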
There is also provided, in accordance with an embodiment of the present application, a system for generating a target text sequence for a natural language text sequence. Fig. 4 shows a partial composition diagram of a system for generating a target text sequence for a natural language text sequence according to an embodiment of the application.
As shown in fig. 4, a system 400 according to an embodiment of the present application may be a special-purpose computer or a general-purpose computer including at least an interface 401 and a processor 402. The interface 401 may be configured, for example, to receive a natural language text sequence, entered by a user, for which a target text sequence is to be generated. The processor 402 may be configured to perform the steps of the method of generating a target text sequence for a natural language text sequence according to the various embodiments of the present application and ultimately generate the current output word of the target text sequence.
In other embodiments, the interface 401 may be further configured to receive related requirements or queries about generating the target text sequence, etc. entered with the natural language text sequence, and the application is not limited in particular.
In some embodiments, the interface 401 may include a network adapter, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adapter (such as fiber optic, USB 3.0, a Lightning interface, etc.), a wireless network adapter (such as a WiFi adapter), a telecommunications (3G, 4G/LTE, etc.) adapter, and the like; the application is not limited in this respect. The system 400 may transmit the received natural language text sequence for which the target text sequence is to be generated, entered by the user, and similar data to the processor 402 or other components via the interface 401. In some embodiments, the interface 401 may also receive, for example, a trained first learning network from a first-learning-network training device (not shown), and so on.
In some embodiments, the processor 402 may be a processing device including one or more general-purpose processing devices, such as a microprocessor, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or the like. More specifically, the processor may be a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, a processor running other instruction sets, or a processor running a combination of instruction sets. The processor may also be one or more special-purpose processing devices such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), a System on a Chip (SoC), or the like.
In other embodiments, the system 400 may further include a memory (not shown) for storing data such as the trained first learning network. In some embodiments, the memory may also store computer-executable instructions, such as one or more processing programs, to implement the steps of the method of generating a target text sequence for a natural language text sequence according to various embodiments of the present application.
There is further provided, in accordance with an embodiment of the present application, a non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement a method of generating a target text sequence for a natural language text sequence in accordance with various embodiments of the present application.
In some embodiments, the non-transitory computer readable medium described above may be a medium such as Read Only Memory (ROM), random Access Memory (RAM), phase change random access memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), electrically Erasable Programmable Read Only Memory (EEPROM), other types of Random Access Memory (RAM), flash memory disk or other forms of flash memory, cache, registers, static memory, compact disk read only memory (CD-ROM), digital Versatile Disk (DVD) or other optical storage, magnetic cassettes or other magnetic storage devices, or any other possible non-transitory medium which is used to store information or instructions that can be accessed by a computer device.
The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments may be used by those of ordinary skill in the art in view of the above description. Moreover, in the foregoing detailed description, various features may be grouped together to simplify the present application. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, the inventive subject matter may lie in less than all features of a single disclosed embodiment. Thus, the claims are hereby incorporated into the detailed description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that these embodiments may be combined with each other in various combinations or permutations. The scope of the application should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (9)

1. A method of generating a target text sequence for a natural language text sequence, comprising, by a processor:
Receiving an input vector corresponding to a natural language text sequence of a target text sequence to be generated;
taking the current element in the received input vector as an input word node, taking each generated word in the target text sequence as a generated word node sequence, constructing a word node graph corresponding to the current output word in the target text sequence based on the input word node and each generated word node,
wherein the word node graph further comprises a global node and a local node, the global node is connected with the input word node and each generated word node, and the local node is connected with the input word node and the latest w generated word nodes in the generated word node sequence;
generating a current output word in the target text sequence by using a trained first learning network based on the word node graph, wherein the first learning network is formed by sequentially connecting a graph neural network and an output layer in series, and the graph neural network comprises a local feature calculation section, a global feature calculation section and a node state updating section,
the node state updating section is configured to: cyclically update the node hidden state for T rounds by utilizing the local semantic features output by the local feature calculation section, the global semantic features output by the global feature calculation section, and the first input feature corresponding to the input word node, wherein the node hidden state comprises a node output hidden state and a node memory hidden state; and take the node output hidden state updated in the T-th round as the output of the graph neural network;
the local feature calculation section is configured to: calculate local semantic features of the word node graph by using the node hidden states, output by the node state updating section, of the w generated word nodes connected with the local node;
the global feature calculation section is configured to: calculate global semantic features of the word node graph by using the node hidden states, output by the node state updating section, of each generated word node connected with the global node;
and the output layer is configured to generate a current output word in the target text sequence based on the node output hidden state after the T-th round of updating output by the graph neural network.
2. The method of claim 1, wherein the target text sequence is one of a source language summary text sequence of the natural language text sequence, a target language translation text sequence of the natural language text sequence, a dialog text sequence matching the natural language text sequence, or a combination thereof.
3. The method of claim 1, wherein the local semantic features comprise local context output features and local context memory features, and the global semantic features comprise global context output features and global context memory features,
The local feature calculation section is further configured to: predicting a local context output feature of the word node graph based on node output hidden states of w generated word nodes connected to the local node output by the node state updating section using a pooling function; predicting a local context memory feature of the word node graph based on node memory hidden states of w generated word nodes connected to the local node output by the node state updating section using a pooling function;
The global feature calculation section is further configured to: predicting, with an attention mechanism, global context output features of the word node graph based on node output hidden states of each generated word node connected to the global node output by the node state updating section; the global context memory feature of the word node graph is predicted based on node memory hidden states of each generated word node connected to the global node output by the node state updating section using an attention mechanism.
4. A method according to claim 3, wherein the node state updating section is provided with an input gate, a local gate, a forget gate, a global gate and an output gate, the node state updating section being further configured to:
Gating an input feature vector corresponding to a second input feature by using an input gate vector corresponding to the input gate to generate a first component of a node memory hidden state;
Gating the local context memory feature of the previous round by utilizing the local gate vector corresponding to the local gate to generate a second component of the memory hidden state of the node of the current round;
Gating the node memory hidden state of the previous round by using the forgetting gate vector corresponding to the forgetting gate to generate a third component of the node memory hidden state of the current round;
Gating global context memory features of a previous round by using global gate vectors corresponding to the global gates to generate a fourth component of the memory hidden state of the node of the current round;
Generating a current round node memory hidden state based on the first component, the second component, the third component, and the fourth component;
And gating the node memory hidden state by using the output gate to generate the node output hidden state of the current round.
5. The method of claim 4, further comprising constructing a composite vector based on the node output hidden state of the previous round, the local context output feature of the previous round, the global context output feature of the previous round, and the first input feature;
based on the composite vector, generating an input gate vector corresponding to the input gate, a local gate vector corresponding to the local gate, a forget gate vector corresponding to the forget gate, a global gate vector corresponding to the global gate and an output gate vector corresponding to the output gate by using a linear activation function and a normalization function;
based on the composite vector, a nonlinear activation function is utilized to generate an input feature vector corresponding to the second input feature.
6. The method of any of claims 3-5, wherein the output layer is further configured to:
And generating probability distribution of a current output word based on the T-th round updated node output hidden state output by the graph neural network and an output vector matrix generated based on a preset word list, and determining the current output word based on the probability distribution.
7. The method of claim 1, wherein in the case of training the first learning network or the graph neural network with training data having a true value annotation of a target text sequence, when training to generate each output word in the target text sequence, the input of generating a next output word using the true value annotation of the current output word as training is selected with a first probability p1, the input of generating a next output word using the current output word output by the first learning network in training is selected with a probability of 1-p1, and the first probability p1 is gradually decreased as the training process proceeds.
8. A system for generating a target text sequence for a natural language text sequence, comprising:
an interface configured to receive an input vector corresponding to a natural language text sequence of a target text sequence to be generated;
A processor configured to: a method of generating a target text sequence for a natural language text sequence according to any of claims 1-7 is performed.
9. A non-transitory computer readable storage medium having stored thereon computer executable instructions which, when executed by a processor, implement a method of generating a target text sequence for a natural language text sequence according to any of claims 1-7.
CN202410038359.1A 2024-01-11 2024-01-11 Method and system for generating target text sequence for natural language text sequence Active CN117556787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410038359.1A CN117556787B (en) 2024-01-11 2024-01-11 Method and system for generating target text sequence for natural language text sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410038359.1A CN117556787B (en) 2024-01-11 2024-01-11 Method and system for generating target text sequence for natural language text sequence

Publications (2)

Publication Number Publication Date
CN117556787A CN117556787A (en) 2024-02-13
CN117556787B true CN117556787B (en) 2024-04-26

Family

ID=89816994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410038359.1A Active CN117556787B (en) 2024-01-11 2024-01-11 Method and system for generating target text sequence for natural language text sequence

Country Status (1)

Country Link
CN (1) CN117556787B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902293A (en) * 2019-01-30 2019-06-18 华南理工大学 A kind of file classification method based on part with global mutually attention mechanism
CN111767732A (en) * 2020-06-09 2020-10-13 上海交通大学 Document content understanding method and system based on graph attention model
CN112035661A (en) * 2020-08-24 2020-12-04 北京大学深圳研究生院 Text emotion analysis method and system based on graph convolution network and electronic device
CN112597296A (en) * 2020-12-17 2021-04-02 中山大学 Abstract generation method based on plan mechanism and knowledge graph guidance
CN114048754A (en) * 2021-12-16 2022-02-15 昆明理工大学 Chinese short text classification method integrating context information graph convolution
CN114065771A (en) * 2020-08-01 2022-02-18 新加坡依图有限责任公司(私有) Pre-training language processing method and device
CN115688804A (en) * 2021-07-30 2023-02-03 微软技术许可有限责任公司 Representation generation based on embedding vector sequence abstraction
CN116415034A (en) * 2023-03-08 2023-07-11 华南理工大学 Text-video time sequence positioning method based on feature reconstruction
CN116628186A (en) * 2023-07-17 2023-08-22 乐麦信息技术(杭州)有限公司 Text abstract generation method and system
CN117251562A (en) * 2023-09-28 2023-12-19 电子科技大学 Text abstract generation method based on fact consistency enhancement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230368500A1 (en) * 2022-05-11 2023-11-16 Huaneng Lancang River Hydropower Inc Time-series image description method for dam defects based on local self-attention

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902293A (en) * 2019-01-30 2019-06-18 华南理工大学 A kind of file classification method based on part with global mutually attention mechanism
CN111767732A (en) * 2020-06-09 2020-10-13 上海交通大学 Document content understanding method and system based on graph attention model
CN114065771A (en) * 2020-08-01 2022-02-18 新加坡依图有限责任公司(私有) Pre-training language processing method and device
CN112035661A (en) * 2020-08-24 2020-12-04 北京大学深圳研究生院 Text emotion analysis method and system based on graph convolution network and electronic device
CN112597296A (en) * 2020-12-17 2021-04-02 中山大学 Abstract generation method based on plan mechanism and knowledge graph guidance
CN115688804A (en) * 2021-07-30 2023-02-03 微软技术许可有限责任公司 Representation generation based on embedding vector sequence abstraction
CN114048754A (en) * 2021-12-16 2022-02-15 昆明理工大学 Chinese short text classification method integrating context information graph convolution
CN116415034A (en) * 2023-03-08 2023-07-11 华南理工大学 Text-video time sequence positioning method based on feature reconstruction
CN116628186A (en) * 2023-07-17 2023-08-22 乐麦信息技术(杭州)有限公司 Text abstract generation method and system
CN117251562A (en) * 2023-09-28 2023-12-19 电子科技大学 Text abstract generation method based on fact consistency enhancement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Global Explanations for Multivariate Time Series Models; Vijay Arya; Proceedings of the 6th Joint International Conference on Data Science & Management of Data; 2023-01-31; full text *
Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs; Zhang Yue; Transactions of the Association for Computational Linguistics; 2020-12-31; full text *
A network node importance identification algorithm based on adjacency information entropy; Hu Gang, Xu Xiang, Gao Hao, Guo Xiucheng; Systems Engineering - Theory & Practice; 2020-03-25 (03); full text *
A BI-LSTM-CRF Chinese word segmentation model incorporating the attention mechanism; Huang Dandan, Guo Yucui; Software; 2018-10-15 (10); full text *

Also Published As

Publication number Publication date
CN117556787A (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN109885842B (en) Processing text neural networks
AU2018271417B2 (en) A system for deep abstractive summarization of long and structured documents
US20190370659A1 (en) Optimizing neural network architectures
WO2021091681A1 (en) Adversarial training of machine learning models
US20190164084A1 (en) Method of and system for generating prediction quality parameter for a prediction model executed in a machine learning algorithm
WO2020140073A1 (en) Neural architecture search through a graph search space
Wang et al. TranS^ 3: A transformer-based framework for unifying code summarization and code search
JP2022024102A (en) Method for training search model, method for searching target object and device therefor
CN114144794A (en) Electronic device and method for controlling electronic device
US11797281B2 (en) Multi-language source code search engine
Perera et al. Multi-task learning for parsing the alexa meaning representation language
CN105630763A (en) Method and system for making mention of disambiguation in detection
Nawaz et al. Proof guidance in PVS with sequential pattern mining
CN114330717A (en) Data processing method and device
CN113806489A (en) Method, electronic device and computer program product for dataset creation
CN113297355A (en) Method, device, equipment and medium for enhancing labeled data based on countermeasure interpolation sequence
Seilsepour et al. Self-supervised sentiment classification based on semantic similarity measures and contextual embedding using metaheuristic optimizer
CN117556787B (en) Method and system for generating target text sequence for natural language text sequence
Gomez-Perez et al. Understanding word embeddings and language models
US20200302270A1 (en) Budgeted neural network architecture search system and method
CN113076089B (en) API (application program interface) completion method based on object type
KR20240034804A (en) Evaluating output sequences using an autoregressive language model neural network
CN108460453B (en) Data processing method, device and system for CTC training
Zhu et al. Order-sensitive keywords based response generation in open-domain conversational systems
CN115210714A (en) Large scale model simulation by knowledge distillation based NAS

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant