CN111597792B - Sentence-level convolution LSTM training method, equipment and readable medium - Google Patents


Info

Publication number
CN111597792B
CN111597792B (application number CN202010146406.6A)
Authority
CN
China
Prior art keywords
word
current
sentence
moment
vector input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010146406.6A
Other languages
Chinese (zh)
Other versions
CN111597792A (en)
Inventor
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010146406.6A priority Critical patent/CN111597792B/en
Publication of CN111597792A publication Critical patent/CN111597792A/en
Priority to PCT/CN2020/118341 priority patent/WO2021174824A1/en
Application granted granted Critical
Publication of CN111597792B publication Critical patent/CN111597792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a sentence-level convolution LSTM training method, which comprises the following steps: aggregating the hidden states of the current word and its adjacent words in the sentence at the previous moment in a one-dimensional convolution mode, and inputting the aggregated hidden states as the sentence vector input; inputting the sub-state of the current word of the sentence at the current moment as the word vector input; inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment. The invention also discloses a computer device and a readable storage medium. The invention treats a whole sentence as a single state containing word-level sub-states of sentence length and uses one-dimensional convolution to aggregate the local information around each word, thereby greatly improving parallel computing capability and saving time and capital costs.

Description

Sentence-level convolution LSTM training method, equipment and readable medium
Technical Field
The invention relates to the technical field of deep learning, in particular to a sentence-level convolution LSTM training method, equipment and a readable medium.
Background
The long short-term memory network (LSTM) has a strong capability for extracting sequential information and is the current mainstream text representation tool. To date, it has achieved state-of-the-art results in natural language processing tasks such as language modeling, machine translation, syntactic analysis and question answering.
A conventional recurrent neural network models a sentence as sequence data with a chain structure: at each time step it takes the word state of the current step and the hidden state of the previous step as input, and outputs a hidden state that carries the preceding context at the current time. However, when the input sequence is too long, the recurrent neural network often suffers from vanishing and exploding gradients, so that long-term dependency information in the sequence cannot be learned well.
In the prior art, the SRU simplifies the state computation of each gate in the LSTM using operations such as element-wise products and coupling, thereby improving parallelism; the SRNN splits a sentence into several clauses and runs an independent LSTM on each clause, so that the clauses can be computed in parallel and the parallel computing capability is improved. However, both of the above approaches still fall short in terms of time performance.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a training method, a device and a readable medium for sentence-level convolution LSTM, in which a whole sentence is regarded as a single state containing word-level sub-states with a sentence length, local information around each word is aggregated by using one-dimensional convolution, interaction between the local information and the above information is realized in a stacking manner, information update in a cell state and state output are controlled by using a logic gate, parallel computing capability is greatly improved, and time and capital costs are saved.
Based on the above object, an aspect of the embodiments of the present invention provides a training method for sentence-level convolution LSTM, including the following steps: aggregating the hidden states of the current word and the adjacent word at the previous moment in the sentence in a one-dimensional convolution mode, and inputting the aggregated hidden states as sentence vectors; inputting the sub-state of the current word of the sentence at the current moment as a word vector; inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment.
In some embodiments, aggregating the hidden states of a current word and its neighboring words in a sentence at the previous moment by means of one-dimensional convolution and inputting the result as the sentence vector input comprises: performing one-dimensional convolution on the hidden state of the previous word at the previous moment, the hidden state of the current word at the previous moment and the hidden state of the next word at the previous moment to generate the sentence vector input of the current word at the previous moment.
In some embodiments, entering the sentence vector input, the word vector input, and the state of the cell at a time previous to the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into a forgetting gate to discard part of the information of the cell state at the time immediately preceding the current word.
In some embodiments, entering the sentence vector input, the word vector input, and the cellular state at a time previous to the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into an input gate to add new partial information to the cell state at the time immediately preceding the current word.
In some embodiments, the method further comprises: performing one-dimensional convolution on the hidden state of the current word at the current moment, the hidden state of the previous word at the current moment and the hidden state of the next word at the current moment to generate the sentence vector input of the current word at the next moment, so as to enter the next cycle.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of: aggregating the hidden states of the current word and the adjacent word in the sentence at the previous moment in a one-dimensional convolution mode, and inputting the aggregated hidden states as sentence vectors; inputting the sub-state of the current word of the sentence at the current moment as a word vector; inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment.
In some embodiments, aggregating the hidden states of a current word and its neighboring words in a sentence at the previous moment by means of one-dimensional convolution and inputting the result as the sentence vector input comprises: performing one-dimensional convolution on the hidden state of the previous word at the previous moment, the hidden state of the current word at the previous moment and the hidden state of the next word at the previous moment to generate the sentence vector input of the current word at the previous moment.
In some embodiments, entering the sentence vector input, the word vector input, and the cellular state at a time previous to the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into a forgetting gate to discard part of the information of the cell state at the time immediately preceding the current word.
In some embodiments, entering the sentence vector input, the word vector input, and the cellular state at a time previous to the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into an input gate to add new partial information to the cell state at the time immediately preceding the current word.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that implements the above method steps when executed by a processor.
The invention has the following beneficial technical effects: by regarding an integral sentence as a single state containing word-level sub-states with the length of the sentence, using one-dimensional convolution to aggregate local information around each word, realizing the interaction of the local information and the above information in a stacking mode, and controlling the information updating and the state output in the cell state by using a logic gate, the parallel computing capability is greatly improved, and the time and the capital cost are saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of an embodiment of a sentence-level convolution LSTM training method provided by the present invention;
FIG. 2 is a diagram of the global-scope hidden-state convolution in the sentence-level convolution LSTM training method provided by the present invention;
FIG. 3 is an internal operation diagram of the training method of sentence-level convolution LSTM according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this will not be repeated in the following embodiments.
In view of the above, a first aspect of the embodiments of the present invention proposes an embodiment of a training method for sentence-level convolution LSTM. FIG. 1 is a diagram illustrating an embodiment of a sentence-level convolution LSTM training method provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
S1, aggregating the hidden states of a current word and the words adjacent to the current word in a sentence at a previous moment in a one-dimensional convolution mode, and inputting the aggregated hidden states as the sentence vector input;
S2, inputting the sub-state of the current word of the sentence at the current moment as the word vector input;
S3, inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and
S4, sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment.
In this embodiment, the long short-term memory network (LSTM) has a strong capability for extracting sequence information and is currently a mainstream text representation tool. The LSTM stores long-term dependency information in the sequence by introducing an additional cell state.
In this embodiment, the sentence is used as the basic processing unit of the entire network. For a sentence of length n, the word vectors are X = [x_1, x_2, ..., x_n], x_i ∈ R^d, where d is the dimension of the word vector and x_i is the sub-state of the i-th word. The cell state c_i and hidden state h_i of the i-th sub-state are randomly initialized, so that the cell state of the sentence is C = [c_1, c_2, ..., c_n], c_i ∈ R^d, and the hidden state is H = [h_1, h_2, ..., h_n], h_i ∈ R^d.
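As a purely illustrative sketch (not part of the patent), the notation above maps onto PyTorch tensors as follows; the variable names and example sizes are assumptions made for this example:
    import torch
    n, d = 12, 128            # example sentence length and word-vector dimension (assumed values)
    X = torch.randn(n, d)     # word vectors x_1 ... x_n (random stand-ins for embedded words); row i is the sub-state of the i-th word
    C = torch.randn(n, d)     # randomly initialized cell states c_1 ... c_n
    H = torch.randn(n, d)     # randomly initialized hidden states h_1 ... h_n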
The hidden state of the sentence is taken as the input of the one-dimensional convolution to capture local information. With the convolution kernel size set to k, the sentence vector input of the i-th word used at time t is obtained by convolving the hidden states of the previous moment:
ĥ_i^{t-1} = v(W^t [h_{i-⌊k/2⌋}^{t-1}; ...; h_i^{t-1}; ...; h_{i+⌊k/2⌋}^{t-1}] + b)
where W^t ∈ R^{2d×kd} and b ∈ R^{2d} are the parameter matrix and bias term of the convolution kernel, and [· ; ·] denotes the concatenation of the k previous-moment hidden states in the window around the i-th word.
v is a nonlinear activation function, a linear gating unit, that applies a simple gating mechanism to the output of the convolution. It is formulated as
v([A; B]) = A ⊙ σ(B)
where [A; B] ∈ R^{2d} is the output of the one-dimensional convolution, v([A; B]) ∈ R^d is the output of the nonlinear activation function, and A, B ∈ R^d are its inputs, i.e. the two halves of the convolution output; ⊙ denotes the element-wise product and σ the sigmoid function.
Collecting the outputs for all positions, the one-dimensional convolution output of the entire sentence used at time t is
Ĥ^{t-1} = [ĥ_1^{t-1}, ĥ_2^{t-1}, ..., ĥ_n^{t-1}], ĥ_i^{t-1} ∈ R^d.
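The convolution-plus-gating step above can be sketched roughly as follows in PyTorch. This is a hedged illustration, not the patent's implementation: the class name ConvGLU and the tensor layout are assumptions, while the kernel size k, the 2d output channels and the gating v([A; B]) = A ⊙ σ(B) follow the formulas above.
    import torch
    import torch.nn as nn
    class ConvGLU(nn.Module):
        """One-dimensional convolution over the sentence hidden states followed by a gated linear unit."""
        def __init__(self, d, k):
            super().__init__()
            # W^t in R^{2d x kd} and b in R^{2d}, realized as a Conv1d with 2d output channels;
            # padding = k // 2 keeps the output length equal to n for odd k
            self.conv = nn.Conv1d(d, 2 * d, kernel_size=k, padding=k // 2)
        def forward(self, H_prev):
            # H_prev: [n, d] hidden states of the whole sentence at the previous moment
            out = self.conv(H_prev.t().unsqueeze(0))       # [1, 2d, n]
            A, B = out.squeeze(0).t().chunk(2, dim=-1)     # each [n, d]
            return A * torch.sigmoid(B)                    # v([A; B]) = A ⊙ σ(B), shape [n, d]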
The output of the one-dimensional convolution at the previous moment and the sub-state x_i of the i-th word are taken as inputs to three logic gates, formulated as follows:
forget gate: f_i^t = σ(W_f ĥ_i^{t-1} + U_f x_i + b_f);
input gate: in_i^t = σ(W_in ĥ_i^{t-1} + U_in x_i + b_in);
output gate: o_i^t = σ(W_o ĥ_i^{t-1} + U_o x_i + b_o);
where W, U ∈ R^{d×d} are weight parameters, b ∈ R^d is a bias term, and σ is a nonlinear activation function (the sigmoid).
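A minimal sketch of the three logic gates follows, again illustrative only: each nn.Linear below plays the role of one pair of W and U matrices plus a bias (a linear layer over the concatenation [ĥ_i^{t-1}; x_i] is equivalent to W·ĥ_i^{t-1} + U·x_i + b), and the module and attribute names are assumptions made for this example.
    import torch
    import torch.nn as nn
    class Gates(nn.Module):
        """Forget, input and output gates computed from the sentence vector input and the word vector input."""
        def __init__(self, d):
            super().__init__()
            self.forget = nn.Linear(2 * d, d)
            self.input_ = nn.Linear(2 * d, d)
            self.output = nn.Linear(2 * d, d)
        def forward(self, h_hat_prev, x):
            z = torch.cat([h_hat_prev, x], dim=-1)   # [n, 2d]
            f = torch.sigmoid(self.forget(z))        # forget gate f_i^t
            i = torch.sigmoid(self.input_(z))        # input gate in_i^t
            o = torch.sigmoid(self.output(z))        # output gate o_i^t
            return f, i, o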
Finally, at time t, the cell state is updated through the outputs of the three logic-gate structures and the hidden state of the i-th word is output. The formulas are as follows:
candidate state: c̃_i^t = tanh(W_c ĥ_i^{t-1} + U_c x_i + b_c);
cell state of the i-th word at time t: c_i^t = f_i^t ⊙ c_i^{t-1} + in_i^t ⊙ c̃_i^t;
hidden state of the i-th word at time t: h_i^t = o_i^t ⊙ tanh(c_i^t);
where W_c, U_c ∈ R^{d×d} are weight parameters, b_c ∈ R^d is a bias term, ⊙ denotes the Hadamard (element-wise) product, and tanh is a nonlinear activation function.
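The candidate state, cell-state update and hidden-state output can then be sketched as below; this is a hypothetical illustration building on the gate sketch, in which the candidate layer stands in for W_c, U_c and b_c.
    import torch
    import torch.nn as nn
    class CellUpdate(nn.Module):
        """Updates the cell state and emits the hidden state for each word at time t."""
        def __init__(self, d):
            super().__init__()
            self.candidate = nn.Linear(2 * d, d)   # W_c, U_c and b_c folded into one linear layer
        def forward(self, h_hat_prev, x, c_prev, f, i, o):
            c_tilde = torch.tanh(self.candidate(torch.cat([h_hat_prev, x], dim=-1)))
            c_t = f * c_prev + i * c_tilde          # forget part of the old cell state, add new information
            h_t = o * torch.tanh(c_t)               # hidden state of the word at the current moment
            return c_t, h_t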
In the embodiment, a whole sentence is regarded as a single state containing word-level sub-states with the length of the sentence, the local information around each word is aggregated by using one-dimensional convolution, the interaction between the local information and the above information is realized in a stacking mode, and the information updating and the state output in the cell state are controlled by using logic gates, so that the parallel computing capacity is greatly improved, and the time and the capital cost are saved.
FIG. 2 is a diagram illustrating the global-scope hidden-state convolution of the sentence-level convolution LSTM training method provided by the present invention. As shown in FIG. 2, in some embodiments of the present invention, aggregating the hidden states of the current word and its adjacent words in the sentence at the previous moment by one-dimensional convolution and using the result as the sentence vector input includes: performing a one-dimensional convolution on the hidden state h_{i-1}^{t-1} of the previous word at the previous moment, the hidden state h_i^{t-1} of the current word at the previous moment, and the hidden state h_{i+1}^{t-1} of the next word at the previous moment to generate the sentence vector input ĥ_i^{t-1} of the current word at the previous moment.
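For this kernel-size-3 case, the ConvGLU sketch given earlier reduces to a convolution with padding 1, so each word is aggregated with exactly its previous and next neighbour; a hypothetical usage could look like the following (ConvGLU is the illustrative module defined above, not a patent component):
    import torch
    conv_glu = ConvGLU(d=128, k=3)     # window = previous word, current word, next word
    H_prev = torch.randn(12, 128)      # hidden states of a 12-word sentence at time t-1
    H_hat_prev = conv_glu(H_prev)      # sentence vector inputs ĥ_i^{t-1}, one d-dimensional row per word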
In some embodiments of the invention, the adjacent words may instead be the two preceding and two following words, the three preceding and three following words, or some other window of neighboring preceding and following words.
In some embodiments of the invention, entering the sentence vector input, the word vector input, and the cellular state at a time immediately preceding the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into a forgetting gate to discard part of the information of the cell state at the time immediately preceding the current word.
In some embodiments of the invention, entering the sentence vector input, the word vector input, and the cellular state at a time immediately preceding the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into an input gate to add new partial information to the cell state at the time immediately preceding the current word.
In some embodiments of the invention, the method further comprises: performing one-dimensional convolution on the hidden state of the current word at the current moment, the hidden state of the previous word at the current moment and the hidden state of the next word at the current moment to generate the sentence vector input of the current word at the next moment.
It should be noted that, in the above sentence-level convolution LSTM training method, steps may be interchanged, replaced, added or deleted; such reasonable permutations, combinations and transformations shall therefore also fall within the scope of the present invention, and the scope of the present invention shall not be limited to the described embodiments.
FIG. 3 is a diagram of the internal operation of the sentence-level convolution LSTM training method provided by the present invention. As shown in FIG. 3: the hidden state h_{i-1}^{t-1} of the (i-1)-th word of the sentence at time t-1, the hidden state h_i^{t-1} of the i-th word at time t-1 and the hidden state h_{i+1}^{t-1} of the (i+1)-th word at time t-1 are combined by one-dimensional convolution to generate the sentence vector input ĥ_i^{t-1} of the i-th word at time t-1; this sentence vector input and the sub-state x_i^t of the i-th word at time t are fed into the forget gate, the input gate and the output gate; the cell state c_i^{t-1} of the i-th word at time t-1, after part of its information is discarded through the forget gate and new information is added through the input gate, is updated to the cell state c_i^t of the i-th word at time t; and the hidden state h_i^t of the i-th word at time t is output.
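Putting the illustrative pieces together, one full time step of FIG. 3 could be sketched as below. ConvGLU, Gates and CellUpdate are the hypothetical modules from the earlier sketches, not the patent's reference implementation; every word of the sentence is advanced in parallel, which is where the parallel computing capability described above comes from.
    def forward_one_step(X, H_prev, C_prev, conv_glu, gates, cell_update):
        """Advance every word of the sentence from time t-1 to time t in parallel."""
        H_hat_prev = conv_glu(H_prev)                 # aggregate each word with its neighbours, [n, d]
        f, i, o = gates(H_hat_prev, X)                # forget / input / output gates, each [n, d]
        C_t, H_t = cell_update(H_hat_prev, X, C_prev, f, i, o)
        return C_t, H_t
    # hypothetical usage over T time steps (stacking the steps widens the context each word sees):
    # for t in range(T):
    #     C, H = forward_one_step(X, H, C, conv_glu, gates, cell_update)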
In view of the above object, a second aspect of an embodiment of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of: s1, aggregating the hidden states of the current word and the adjacent word at the previous moment in the sentence in a one-dimensional convolution mode, and inputting the aggregated hidden states as sentence vectors; s2, inputting the sub-state of the current word of the sentence at the current moment as a word vector; s3, inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and S4, sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment.
In some embodiments of the present invention, aggregating the hidden states of a current word and its neighboring words in a sentence at the previous moment by means of one-dimensional convolution and inputting the result as the sentence vector input comprises: performing one-dimensional convolution on the hidden state of the previous word at the previous moment, the hidden state of the current word at the previous moment and the hidden state of the next word at the previous moment to generate the sentence vector input of the current word at the previous moment.
In some embodiments of the invention, entering the sentence vector input, the word vector input, and the state of the cell at a time previous to the current word into the logic gate comprises: the sentence vector input, the word vector input, and the cell state at the time immediately preceding the current word are fed into a forgetting gate to discard part of the information of the cell state at the time immediately preceding the current word.
In some embodiments of the invention, entering the sentence vector input, the word vector input, and the cellular state at a time immediately preceding the current word into the logic gate comprises: the sentence vector input, the word vector input, and the state of the cell at the time immediately preceding the current word are fed into an input gate to add new partial information to the state of the cell at the time immediately preceding the current word.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.
Finally, it should be noted that, as one of ordinary skill in the art can appreciate, all or part of the processes of the above-described method embodiments can be implemented by a computer program instructing relevant hardware. The program of the sentence-level convolution LSTM training method can be stored in a computer-readable storage medium, and when executed, the program can include the processes of the above-described method embodiments. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. Which when executed by a processor performs the above-described functions defined in the methods disclosed in embodiments of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit or scope of the embodiments of the present invention shall be included within the scope of the embodiments of the present invention.

Claims (7)

1. A training method of sentence-level convolution LSTM is characterized by comprising the following steps:
aggregating the hidden states of the current word and the adjacent word in the sentence at the previous moment in a one-dimensional convolution mode, and inputting the aggregated hidden states as sentence vectors;
inputting the sub-state of the current word of the sentence at the current moment as a word vector;
sending the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and
sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment;
the aggregating the hidden states of the current word and the adjacent word in the sentence at the previous moment in a one-dimensional convolution mode and inputting the aggregated hidden states as sentence vectors comprises the following steps:
performing one-dimensional convolution on the hidden state of the previous word at the previous moment, the hidden state of the current word at the previous moment and the hidden state of the next word at the previous moment to generate sentence vector input of the current word at the previous moment;
and performing one-dimensional convolution on the hidden state of the current word at the current moment, the hidden state of the previous word at the current moment and the hidden state of the next word at the current moment to generate sentence vector input of the current word at the next moment so as to enter the next cycle.
2. The training method of claim 1, wherein entering the sentence vector input, the word vector input, and a cellular state at a time immediately preceding the current word into a logic gate comprises:
and sending the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a forgetting gate so as to discard part of information of the cell state of the current word at the previous moment.
3. The training method of claim 1, wherein entering the sentence vector input, the word vector input, and a cellular state at a time immediately preceding the current word into a logic gate comprises:
and inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into an input gate so as to add new partial information to the cell state of the current word at the previous moment.
4. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of:
aggregating the hidden states of the current word and the adjacent word in the sentence at the previous moment in a one-dimensional convolution mode, and inputting the aggregated hidden states as sentence vectors;
inputting the sub-state of the current word of the sentence at the current moment as a word vector;
sending the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a logic gate to obtain the cell state of the current word at the current moment; and
sending the sentence vector input, the word vector input and the cell state of the current word at the current moment into an output gate to obtain and output the hidden state of the current word at the current moment;
performing one-dimensional convolution on the hidden state of the current word at the current moment, the hidden state of the previous word at the current moment and the hidden state of the next word at the current moment to generate sentence vector input of the current word at the next moment so as to enter a next cycle;
the aggregating the hidden states of the current word and the adjacent word in the sentence at the previous moment in a one-dimensional convolution mode and inputting the aggregated hidden states as sentence vectors comprises the following steps:
and performing one-dimensional convolution on the hidden state of the previous word at the previous moment, the hidden state of the current word at the previous moment and the hidden state of the next word at the previous moment to generate sentence vector input of the current word at the previous moment.
5. The apparatus of claim 4, wherein entering the sentence vector input, the word vector input, and a state of a cell at a time previous to the current word into a logic gate comprises:
and sending the sentence vector input, the word vector input and the cell state of the current word at the previous moment into a forgetting gate to discard part of information of the cell state of the current word at the previous moment.
6. The apparatus of claim 4, wherein entering the sentence vector input, the word vector input, and a cellular state at a time previous to the current word into a logic gate comprises:
and inputting the sentence vector input, the word vector input and the cell state of the current word at the previous moment into an input gate so as to add new partial information to the cell state of the current word at the previous moment.
7. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN202010146406.6A 2020-03-05 2020-03-05 Sentence-level convolution LSTM training method, equipment and readable medium Active CN111597792B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010146406.6A CN111597792B (en) 2020-03-05 2020-03-05 Sentence-level convolution LSTM training method, equipment and readable medium
PCT/CN2020/118341 WO2021174824A1 (en) 2020-03-05 2020-09-28 Sentence-level convolution lstm training method, and device and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010146406.6A CN111597792B (en) 2020-03-05 2020-03-05 Sentence-level convolution LSTM training method, equipment and readable medium

Publications (2)

Publication Number Publication Date
CN111597792A CN111597792A (en) 2020-08-28
CN111597792B (en) 2023-01-06

Family

ID=72191097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010146406.6A Active CN111597792B (en) 2020-03-05 2020-03-05 Sentence-level convolution LSTM training method, equipment and readable medium

Country Status (2)

Country Link
CN (1) CN111597792B (en)
WO (1) WO2021174824A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597792B (en) * 2020-03-05 2023-01-06 苏州浪潮智能科技有限公司 Sentence-level convolution LSTM training method, equipment and readable medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN110717330A (en) * 2019-09-23 2020-01-21 哈尔滨工程大学 Word-sentence level short text classification method based on deep learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304911B (en) * 2018-01-09 2020-03-13 中国科学院自动化研究所 Knowledge extraction method, system and equipment based on memory neural network
CN108363753B (en) * 2018-01-30 2020-05-19 南京邮电大学 Comment text emotion classification model training and emotion classification method, device and equipment
AU2018100320A4 (en) * 2018-03-15 2018-04-26 Ji, Jiajian Mr A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators
CN109783817B (en) * 2019-01-15 2022-12-06 浙江大学城市学院 Text semantic similarity calculation model based on deep reinforcement learning
CN111597792B (en) * 2020-03-05 2023-01-06 苏州浪潮智能科技有限公司 Sentence-level convolution LSTM training method, equipment and readable medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN110717330A (en) * 2019-09-23 2020-01-21 哈尔滨工程大学 Word-sentence level short text classification method based on deep learning

Also Published As

Publication number Publication date
CN111597792A (en) 2020-08-28
WO2021174824A1 (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US11615255B2 (en) Multi-turn dialogue response generation with autoregressive transformer models
Krause et al. Multiplicative LSTM for sequence modelling
CN109977428A (en) A kind of method and device that answer obtains
US20240013059A1 (en) Extreme Language Model Compression with Optimal Sub-Words and Shared Projections
CN112836502B (en) Financial field event implicit causal relation extraction method
US20230394245A1 (en) Adversarial Bootstrapping for Multi-Turn Dialogue Model Training
CN109918499A (en) A kind of file classification method, device, computer equipment and storage medium
US11694034B2 (en) Systems and methods for machine-learned prediction of semantic similarity between documents
CN113240115B (en) Training method for generating face change image model and related device
CN111597792B (en) Sentence-level convolution LSTM training method, equipment and readable medium
CN107562729B (en) Party building text representation method based on neural network and theme enhancement
CN113127604B (en) Comment text-based fine-grained item recommendation method and system
Yao et al. Collie: Systematic construction of constrained text generation tasks
CN109753563B (en) Tag extraction method, apparatus and computer readable storage medium based on big data
WO2023107207A1 (en) Automated notebook completion using sequence-to-sequence transformer
CN115512374A (en) Deep learning feature extraction and classification method and device for table text
CN111651607A (en) Information positive and negative emotion analysis method and device, computer equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
US20230376676A1 (en) Semi-Autoregressive Text Editing
CN116502640B (en) Text characterization model training method and device based on context
Klahold et al. Automatic text generation
Siegelmann et al. Nine switch-affine neurons suffice for turing universality
Tang et al. Learning by Interpreting.
CN109614463A (en) Text matches processing method and processing device
CN112668343B (en) Text rewriting method, electronic device and storage device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant