CN111414749A - Social text dependency syntactic analysis system based on deep neural network - Google Patents

Social text dependency syntactic analysis system based on deep neural network

Info

Publication number
CN111414749A
CN111414749A (application CN202010193329.XA)
Authority
CN
China
Prior art keywords
module
training
social
text
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010193329.XA
Other languages
Chinese (zh)
Other versions
CN111414749B (en)
Inventor
刘宇鹏
张晓晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN202010193329.XA
Publication of CN111414749A
Application granted
Publication of CN111414749B
Active legal status
Anticipated expiration legal status

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A social text dependency syntactic analysis system based on a deep neural network relates to the technical field of computer information processing and aims to solve the problem of sparse social text data in the prior art. The system comprises a social text crawling module, a preprocessing module, a base bilinear attention module, a stacked bilinear attention module and a joint decoding and training module. The social text crawling module is used for crawling social texts from social media websites; the preprocessing module is used for filtering the obtained social texts and generating initialization word vectors; the base bilinear attention module is used for pre-training with regular texts; the stacked bilinear attention module is used for predicting social texts; the joint decoding and training module is used for calculating an empirical risk function for the stacked bilinear attention module, adjusting parameters by back-propagating gradients to fit the training objective, and finally using GPU parallel computation to accelerate model training.

Description

Social text dependency syntactic analysis system based on deep neural network
Technical Field
The invention relates to the technical field of computer information processing, and in particular to a social text dependency syntactic analysis system based on a deep neural network.
Background
Dependency analysis is a fundamental and important task in natural language processing, and many applications require dependency analysis of sentences to provide syntactic results to the corresponding downstream task. The dependency syntactic structure of a sentence is identified by means of the powerful computing capability of a computer. Dependency syntax trees are broadly divided into two categories by structure, projective (Projective) and non-projective (Non-projective) dependency structures, and by decoding algorithm into graph-based (Graph-based) and transition-based (Transition-based) dependency algorithms. Deep neural networks partly overcome the gradient vanishing and gradient explosion problems of traditional neural networks, have developed rapidly in recent years, and have made great progress in many application fields of natural language processing. Deep neural methods have the following advantages: 1. the model is independent of the scale of the task, so that as long as the parameters are specified, tasks with data of any scale can be learned; 2. unlike traditional dependency analysis, which requires separate feature extraction, feature extraction and the training of the dependency analyzer are performed together, and this joint-model (Joint) approach overcomes the error-propagation defect of the traditional pipeline (Pipeline) model; 3. compared with traditional methods, higher performance is achieved, and the approach is used for many tasks. Many research institutions and scientific organizations have therefore focused on deep learning models.
Unlike dependency analysis of regular text, dependency analysis of social text faces particular problems: for example, the training corpus is small, and special words and special dependency relationships occur.
Disclosure of Invention
The purpose of the invention is to provide a social text dependency syntactic analysis system based on a deep neural network, aiming at the problem of sparse social text data in the prior art.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A social text dependency syntactic analysis system based on a deep neural network, comprising: a social text crawling module, a preprocessing module, a base bilinear attention module, a stacked bilinear attention module and a joint decoding and training module;
the social text crawling module is used for crawling social texts from social media websites;
the preprocessing module is used for filtering the obtained social texts and generating initialization word vectors;
the base bilinear attention module is used for pre-training with regular texts;
the stacked bilinear attention module is used for predicting social texts;
the joint decoding and training module is used for calculating an empirical risk function for the stacked bilinear attention module, adjusting parameters by back-propagating gradients to fit the training objective, and finally using GPU parallel computation to accelerate model training.
Further, the social text crawling module executes the following steps:
Firstly, a web crawler is written in Python using Scrapy, Scrapy is configured (including setting the crawl time interval and the proxy), and then the relevant text content of the web page is located and extracted.
Further, the specific steps of filtering in the preprocessing module are as follows:
Firstly, a language model is trained with the language model tool KenLM on the English regular-text corpus Gigaword, and then the language model is used to score the downloaded social texts, which are filtered with a threshold.
Further, the specific steps of generating the initialization word vectors in the preprocessing module are as follows:
Firstly, the GloVe tool is trained on the word-segmented regular texts and social texts to generate the sentence word vectors {e_1, e_2, …, e_L} of the regular text and the sentence word vectors {e'_1, e'_2, …, e'_L} of the social text, where L represents the length of the sentence requiring dependency analysis.
Further, the base bilinear attention module performs the following steps:
Firstly, a bidirectional long short-term memory module is used to model the sentence, then a self-attention module is used to generate the dependency influence of the other words on the current word, then a multi-layer perceptron module is used to refine the generated word feature vectors, and finally a bilinear attention module generates an objective function over the dependency relationships among regular-text words for training.
Further, the stacked bilinear attention module performs the following steps:
Firstly, the refined word feature vectors of the base model are output, as part of the input, to a stacked neural network with the same structure as the base model, and the dependency relationships of the social text are then predicted.
Further, the joint decoding and training module performs the steps of:
Firstly, the base bilinear attention module and the stacked bilinear attention module are combined to form the whole deep dependency analysis network; then a beam search algorithm is used for decoding; the model is then trained by back-propagating gradients and iterating continuously until convergence; and finally a GPU is used to accelerate training in parallel.
The invention has the beneficial effects that:
the method uses a stacked neural network structure, uses regular text in a base neural network for pre-training to overcome the problem of sparse social text data, uses a global objective function for training and decoding to better consider global information, adds a self-attention mechanism on the basis of the original bidirectional L STM to better model the relationship among words, and uses the base layer and the stacked head and tail word feature vectors to better balance two layers of learning results when calculating the stacked neural network.
Drawings
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a block diagram of the base bilinear attention module on regular text;
FIG. 3 is a schematic diagram of a stacked bilinear attention module;
FIG. 4 is an exemplary diagram of a social text parsing tree.
Detailed Description
The first embodiment is as follows: this embodiment is described specifically with reference to FIG. 1. The social text dependency parsing system based on a deep neural network according to this embodiment comprises: a social text crawling module, a preprocessing module, a base bilinear attention module, a stacked bilinear attention module and a joint decoding and training module;
the social text crawling module is used for crawling social texts from social media websites;
the preprocessing module is used for filtering the obtained social texts and generating initialization word vectors;
the base bilinear attention module is used for pre-training with regular texts;
the stacked bilinear attention module is used for predicting social texts;
the joint decoding and training module is used for calculating an empirical risk function for the stacked bilinear attention module, adjusting parameters by back-propagating gradients to fit the training objective, and finally using GPU parallel computation to accelerate model training.
A. A social text crawling step: as a further description of the present invention, step A comprises the following steps:
A1, web page obtaining step: a web crawler is written in Python using Scrapy, including configuring the crawler, the main crawling module and data storage;
A2, text extraction step: the relevant content of the web page is extracted using the Python-based Goose library;
B. a preprocessing step: the text is filtered with a filtering algorithm, the filtered text is word-segmented, and initial word vectors are generated with a word vector training tool; as a further description of the present invention, step B comprises the following steps:
B1, text filtering step: the social texts are filtered using a language model tool;
B2, word segmentation and word vector training step: the selected texts are word-segmented and initial word vectors are trained;
C. a base bilinear attention step: the sequence is modeled with a bidirectional long short-term memory (LSTM) module, the influence of the other words on the current word is generated with a self-attention (Self-attention) module, the generated word feature vectors are refined with a multi-layer perceptron module, and finally bilinear attention (Bi-linear attention) generates an objective function over the dependency relationships among regular-text words for training; the base bilinear attention module is shown in FIG. 2. As a further explanation of the present invention, step C comprises the following steps:
C1, bidirectional long short-term memory step: in each word-related unit, the current word or the historical information is memorized or forgotten, so that both long-term and short-term memory can be handled;
C2, self-attention step: the self-attention mechanism models the soft alignment among words, compensating for the fact that the bidirectional LSTM only considers contextual information, and thus describes the relations between words better;
C3, multi-layer perceptron step: head and tail dependency vectors of the current word are generated through multi-layer nonlinear transformations, reflecting the feature description of the current word acting as a head or as a tail;
C4, bilinear attention step: the relationship between two words is computed through a bilinear attention mechanism, reflecting the dependency score between the current word and the other words;
D. a stacked bilinear attention step: the refined word feature vectors of the base model (Base Model) are output, as part of the input, to a stacked neural network with the same structure as the base model (a bidirectional long short-term memory module, a self-attention module, a multi-layer perceptron module, and finally bilinear attention generating an objective function over the dependency relationships among social-text words for training); the stacked bilinear attention is shown in FIG. 3. Step D comprises the following steps:
D1, stacked bidirectional long short-term memory step: a bidirectional long short-term memory layer is built on top of the feature vectors output by the base layer, considering not only the regular-text feature vectors of the base layer but also the word vectors of the current social text; owing to the particularity of social dependency analysis, special word vectors are used to represent ROOT (the root node of the dependency relationship) and EMP (a word with no dependency relationship), so as to capture the special dependency phenomena in social text where the head word is ROOT or there is no head word;
D2, stacked self-attention step: a self-attention layer is added on top of the stacked bidirectional long short-term memory step to describe the relations among social-text words, compensating for the fact that the bidirectional LSTM only considers local contextual information;
D3, stacked multi-layer perceptron step: head and tail word feature vectors are generated for the social-text words;
D4, stacked bilinear attention step: the relationship between two words is computed through a bilinear attention mechanism, reflecting the dependency score between the current word and the other words; besides the current head and tail word feature vectors, the head and tail word feature vectors generated by the base model are also included, so that feature information can be drawn from the base model;
E. a joint decoding and training step: the base bilinear attention module is trained first, a new module with the same structure as the base bilinear attention module is stacked on the trained result (the stacked module is not yet trained), and the stacked neural network is used for joint decoding during decoding. As a further explanation of the present invention, step E comprises the following steps:
E1, joint decoding step: steps A, B, C and D are combined to form the whole deep dependency analysis network, the objective function value is calculated, GPU parallel training is adopted to accelerate obtaining the dependency result of a given social-text sentence, a global beam search algorithm is adopted for decoding, and the previously generated dependency results are taken into account;
E2, back propagation step: the parameters are updated according to the calculated gradients and iterated until convergence.
Fig. 1 shows a block diagram of the system of the present invention, which is set forth in detail below:
step A1: the crawler was written using the individual components of Scapy. Defining data needing to be captured and post-processed by using a project module; configuring the script by using the configuration module file so as to modify a user-agent, set a crawling time interval, set an agent, configure various middleware and the like; the pipeline module is used for storing data needing to be processed in the later period, so that the crawling and the processing of the data are separated; the crawler is customized using a crawler module.
Step A2: messy characters and pictures on the web page are removed, and only the cleanly typeset text portion is retained; the relevant text content of the web page is located and extracted.
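A minimal sketch of this extraction step is given below; it assumes the goose3 fork of the Python Goose library, and the URL is an illustrative placeholder.

from goose3 import Goose

# Extract the main article text from a crawled page, discarding navigation,
# pictures and other clutter; only the cleaned text body is kept.
g = Goose()
article = g.extract(url="https://example-social-site.com/post/123")  # placeholder URL
print(article.title)
print(article.cleaned_text)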
Step B1: a language model is trained with the language model tool KenLM on the English regular-text corpus Gigaword; the language model is then used to calculate a score for each downloaded social text (the score reflects the fluency of the language), and texts with low scores are filtered out with a threshold.
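A minimal sketch of this filtering step is given below, assuming the kenlm Python bindings and a model file trained on Gigaword; the model path, the per-word normalization and the threshold value are illustrative assumptions rather than values stated in the patent.

import kenlm

model = kenlm.Model("gigaword.arpa")  # assumed path to the trained language model
THRESHOLD = -2.5                      # assumed per-word log10-probability cutoff

def keep(sentence):
    """Score a tokenized sentence and keep it only if it is fluent enough."""
    n_words = max(len(sentence.split()), 1)
    # model.score returns the total log10 probability of the sentence.
    per_word = model.score(sentence, bos=True, eos=True) / n_words
    return per_word >= THRESHOLD

# Stand-in for the texts produced by the crawling step.
crawled_social_texts = ["so happy todayyy :)", "asdf jkl qwerty zzz"]
filtered = [s for s in crawled_social_texts if keep(s)]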
Step B2: compared with regular texts, social texts exhibit some special linguistic phenomena, such as @-mentions, emoticons (Emoticon), URLs, #topics (Hashtag), retweets (Retweet) and abbreviations (Abbreviation); these are kept as independent tokens during word segmentation, while punctuation is separated as for regular text. The GloVe tool is then trained on the word-segmented regular texts and social texts to generate the sentence word-vector representation {e_1, e_2, …, e_L} of the regular text and the sentence word-vector representation {e'_1, e'_2, …, e'_L} of the social text, where L denotes the length of the sentence requiring dependency analysis.
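A minimal sketch of such a social-text tokenizer is given below; the regular expressions for the special tokens are illustrative assumptions, not the patent's actual word segmenter.

import re

# @-mentions, hashtags, URLs and simple emoticons are kept as single tokens;
# ordinary words and individual punctuation marks are split as in regular text.
TOKEN_PATTERN = re.compile(
    r"""(?:@\w+)              # @-mention
      | (?:\#\w+)             # hashtag
      | (?:https?://\S+)      # URL
      | (?:[:;=][\-]?[\)\(DPp])   # basic emoticon such as :) or :-(
      | (?:\w+(?:'\w+)?)      # ordinary word or abbreviation
      | (?:[^\w\s])           # punctuation kept as its own token
    """,
    re.VERBOSE,
)

def tokenize(text):
    return TOKEN_PATTERN.findall(text)

print(tokenize("RT @alice: great result :) see https://example.com #nlp"))
# ['RT', '@alice', ':', 'great', 'result', ':)', 'see', 'https://example.com', '#nlp']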
Step C1: a bidirectional LSTM with peepholes is used (the long-term memory is taken into account when calculating the gates). It contains three gates: a forget gate f_t (controlling the long-term memory), an input gate i_t (controlling the short-term memory of the current word) and an output gate o_t (controlling the memory vector after weighted averaging). The basic process is described as follows:
Forget gate: f_t = σ(W_f·[C_{t-1}, h_{t-1}, e_t] + b_f)
Input gate: i_t = σ(W_i·[C_{t-1}, h_{t-1}, e_t] + b_i)
Output gate: o_t = σ(W_o·[C_{t-1}, h_{t-1}, e_t] + b_o)
where σ is the sigmoid function, taking values in [0,1] and acting as a weighting function; W_f, W_i, W_o are parameter matrices; C_{t-1} is the long-term memory passed from the previous time step; h_{t-1} is the hidden state vector of the previous time step; e_t is the pre-trained feature vector of the regular-text word; [,] denotes vector concatenation; b_f, b_i, b_o are bias vectors; the initial vectors h_0 and h_{L+1} are randomly initialized; and L denotes the length of the text.
Short-term memory vector of the current word: C̃_t = tanh(W_C·[h_{t-1}, e_t] + b_C)
Long-term memory vector of the current word: C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
where tanh is the hyperbolic tangent function and ⊙ denotes the element-wise (Hadamard) product; the long-term memory vector C_t at the current time step is the weighted average of the long-term memory vector C_{t-1} of the previous time step and the short-term memory vector C̃_t of the current word, with the forget gate f_t and the input gate i_t as the weight vectors.
Forward hidden state vector of the current word: h_t^→ = o_t ⊙ tanh(C_t)
The backward hidden state vector h_t^← is generated in the same way as the forward hidden state vector h_t^→, except that the gate functions use the hidden state vector h_{t+1} of the next time step instead of the hidden state vector h_{t-1} of the previous time step. The complete hidden state vector h_t = [h_t^→; h_t^←] is obtained by concatenating the forward and backward hidden state vectors.
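A minimal PyTorch sketch of this sentence encoder is given below; the embedding and hidden dimensions are illustrative assumptions, and torch.nn.LSTM does not implement peephole connections, so it only approximates the peephole LSTM described above.

import torch
import torch.nn as nn

EMB_DIM, HIDDEN_DIM, L = 100, 200, 12   # assumed dimensions and sentence length

encoder = nn.LSTM(input_size=EMB_DIM, hidden_size=HIDDEN_DIM,
                  num_layers=1, batch_first=True, bidirectional=True)

e = torch.randn(1, L, EMB_DIM)          # stand-in for pre-trained word vectors e_1..e_L
h, _ = encoder(e)                       # forward and backward states concatenated per word
print(h.shape)                          # torch.Size([1, 12, 400])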
Step C2: the multi-head self-attention mechanism is described as follows:
Query vector: q_t = W_q·h_t
Keyword vector: k_t = W_k·h_t
Value vector: v_t = W_v·h_t
where h_t ∈ R^{d_model} (d_model is the dimension of the model vector) is the output of the previous step, and W_q, W_k, W_v ∈ R^{d_k×d_model} (d_k is the dimension of the query, keyword and value vectors) are parameter matrices; they apply linear transformations to the feature vector to generate the query, keyword and value vectors (different parameter matrices produce different representations of the same feature vector).
Attention weight: α_{tj} = softmax_j(q_t·k_j / √d_k)
where softmax_j denotes the probability normalized over the column index j, and √d_k adjusts the result according to the dimensionality.
Attention generation vector: c_t = Σ_j α_{tj}·v_j
which is a weighted average over the value vectors v_j.
Single-head attention generation matrix: C^h = [c_1; c_2; …; c_L]
where the single-head attention generation matrix C^h ∈ R^{L×d_k} is obtained by stacking the attention vectors c_t.
Multi-head attention generation matrix: C = [C^1, …, C^H]
Self-attention feature matrix: S = C·W_S
where H is the number of heads; each head uses its own parameter matrices and forms its own attention generation matrix C^h; the matrices generated by all heads are concatenated into C ∈ R^{L×(H·d_k)}, which is then linearly transformed with the parameter matrix W_S ∈ R^{(H·d_k)×d_model} to produce the self-attention feature matrix S ∈ R^{L×d_model}. The self-attention feature vector s_t of each word is one row of the matrix S.
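A minimal PyTorch sketch of this multi-head self-attention is given below; the dimensions and the number of heads are illustrative assumptions that follow the formulas above.

import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads, d_k):
        super().__init__()
        self.d_k, self.H = d_k, n_heads
        self.W_q = nn.Linear(d_model, n_heads * d_k, bias=False)  # query projections
        self.W_k = nn.Linear(d_model, n_heads * d_k, bias=False)  # keyword projections
        self.W_v = nn.Linear(d_model, n_heads * d_k, bias=False)  # value projections
        self.W_s = nn.Linear(n_heads * d_k, d_model, bias=False)  # output projection W_S

    def forward(self, h):
        # h: (L, d_model) hidden states from the bidirectional LSTM.
        L = h.size(0)
        q = self.W_q(h).view(L, self.H, self.d_k).transpose(0, 1)   # (H, L, d_k)
        k = self.W_k(h).view(L, self.H, self.d_k).transpose(0, 1)
        v = self.W_v(h).view(L, self.H, self.d_k).transpose(0, 1)
        alpha = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(self.d_k), dim=-1)
        c = alpha @ v                                               # per-head attention vectors
        c = c.transpose(0, 1).reshape(L, self.H * self.d_k)         # concatenate the heads
        return self.W_s(c)                                          # S: (L, d_model)

s = MultiHeadSelfAttention(d_model=400, n_heads=8, d_k=50)(torch.randn(12, 400))
print(s.shape)  # torch.Size([12, 400])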
Step C3: representations of the head and tail feature vectors are generated using a multi-layer perceptron (Multi-layer Perceptron):
Head word feature vector: r_t^(head) = MLP^(head)(s_t)
Tail word feature vector: r_t^(dep) = MLP^(dep)(s_t)
MLP^(head) and MLP^(dep) denote multi-layer nonlinear transformations (using the hyperbolic tangent function tanh); the two functions differ in the parameter matrices they use.
Step C4: the bilinear attention model uses a biaffine (Bi-affine) function to calculate the dependency score between the head and tail word feature vectors.
The dependency score is: score(i, j) = r_i^(head)·U·r_j^(dep) + w_head·r_i^(head) + w_dep·r_j^(dep)
where U is the transformation matrix between the two vectors, and w_head, w_dep are the head and tail parameter vectors.
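A minimal PyTorch sketch of the refinement and biaffine scoring is given below; the dimensions are illustrative assumptions, and the variable names mirror the symbols used above.

import torch
import torch.nn as nn

d_model, d_r, L = 400, 128, 12          # assumed dimensions and sentence length

mlp_head = nn.Sequential(nn.Linear(d_model, d_r), nn.Tanh())   # MLP^(head)
mlp_dep  = nn.Sequential(nn.Linear(d_model, d_r), nn.Tanh())   # MLP^(dep)
U        = nn.Parameter(torch.randn(d_r, d_r))                 # bilinear term
w_head   = nn.Parameter(torch.randn(d_r))                      # linear head term
w_dep    = nn.Parameter(torch.randn(d_r))                      # linear tail term

s = torch.randn(L, d_model)             # self-attention feature vectors s_1..s_L (stand-in)
r_head, r_dep = mlp_head(s), mlp_dep(s)
# score[i, j]: score of word i being the head of word j.
score = r_head @ U @ r_dep.T + (r_head @ w_head)[:, None] + (r_dep @ w_dep)[None, :]
print(score.shape)  # torch.Size([12, 12])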
Step D1: similar to step C1, a bidirectional LSTM with peepholes is used; the basic process is described as follows:
Forget gate: f'_t = σ(W'_f·[C'_{t-1}, h'_{t-1}, s_t, e'_t] + b'_f)
Input gate: i'_t = σ(W'_i·[C'_{t-1}, h'_{t-1}, s_t, e'_t] + b'_i)
Output gate: o'_t = σ(W'_o·[C'_{t-1}, h'_{t-1}, s_t, e'_t] + b'_o)
where s_t is the corresponding inter-word relationship vector generated by the base layer through the self-attention mechanism. The difference from step C1 is that the input considers not only the feature vector s_t of the previous layer but also the current social-text word feature vector e'_t (rather than the regular-text word feature vector e_t), and different parameter matrices W'_f, W'_i, W'_o and bias vectors b'_f, b'_i, b'_o are used.
Short-term memory vector of the current word: C̃'_t = tanh(W'_C·[h'_{t-1}, s_t, e'_t] + b'_C)
Calculating the short-term memory vector of the current word likewise takes the feature vector s_t of the previous layer into account.
Step D2: similar to step C2, except that the input vectors change from the hidden state vectors h_1, …, h_L to the stacked hidden state vectors h'_1, …, h'_L, producing the stacked self-attention feature vectors s'_t.
Step D3: similar to step C3, except that the head and tail vectors used here take into account not only those generated in the current layer but also those generated in the base layer:
Head vector: r'_t^(head) = MLP'^(head)(s'_t)
Tail vector: r'_t^(dep) = MLP'^(dep)(s'_t)
Combined head vector: r̄_t^(head) = r_t^(head) + r'_t^(head)
Combined tail vector: r̄_t^(dep) = r_t^(dep) + r'_t^(dep)
where + denotes addition of the corresponding dimensions.
Step D4: similar to step C4; the combined head and tail vectors are used to calculate the dependency score between two words.
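A minimal sketch of the vector combination in steps D3 and D4 is given below; the base-layer and stacked-layer head/tail vectors are stand-ins computed as in steps C3 and C4, and the element-wise addition mirrors the combination formulas above.

import torch

L, d_r = 12, 128                        # illustrative sentence length and vector size
r_head_base = torch.randn(L, d_r)       # r_t^(head) from the base model
r_dep_base = torch.randn(L, d_r)        # r_t^(dep) from the base model
r_head_stack = torch.randn(L, d_r)      # r'_t^(head) from the stacked layer
r_dep_stack = torch.randn(L, d_r)       # r'_t^(dep) from the stacked layer

# Corresponding dimensions are added to obtain the combined head and tail vectors,
# which then feed the stacked biaffine scorer exactly as in step C4.
r_head_combined = r_head_base + r_head_stack
r_dep_combined = r_dep_base + r_dep_stack
print(r_head_combined.shape, r_dep_combined.shape)  # torch.Size([12, 128]) torch.Size([12, 128])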
Step E1: the objective function adopts a maximum-margin ranking function (a structured hinge loss). The calculation formula is as follows:
J(Θ) = (1/N)·Σ_{i=1}^{N} L(x_i, y_i; Θ) + λ·‖Θ‖₂²
where the training data set D = {(x_i, y_i)}_{i=1}^{N} contains N pairs of input sentences x_i and gold annotated analysis trees y_i; the sentence-level loss L(x_i, y_i; Θ) is the margin violation of the highest-scoring candidate tree, with the margin given by c, the weighted Hamming distance between the candidate tree and the gold tree; ‖Θ‖₂² denotes the squared 2-norm of a parameter matrix or vector; λ is a weighting factor that balances the regularization term ‖Θ‖₂² against the objective L(x_i, y_i; Θ) to prevent overfitting; 1/N averages the loss L(x_i, y_i; Θ) over all sentences; and Θ is the set of parameters containing all parameters of the neural network during training.
The joint decoding process is divided into two parts: decoding of the pre-trained base model, whose basic formula searches for the N-best results, and decoding of the stacked model, which is performed on the basis of the base model.
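A minimal sketch of the sentence-level structured hinge loss is given below; the candidate trees are represented only by their total scores and weighted Hamming costs, and the beam search over trees is omitted, so the numbers are illustrative assumptions.

import torch

def structured_hinge(score_gold, candidate_scores, hamming_costs):
    """Margin violation: max over candidates of (score + cost) minus the gold score."""
    violation = torch.max(candidate_scores + hamming_costs) - score_gold
    return torch.clamp(violation, min=0.0)

# Toy usage: three candidate trees with their total scores and weighted Hamming costs.
loss = structured_hinge(score_gold=torch.tensor(7.2),
                        candidate_scores=torch.tensor([6.9, 7.5, 5.0]),
                        hamming_costs=torch.tensor([2.0, 1.0, 3.0]))
print(loss)  # tensor(1.7000): the first candidate violates the margin the most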
FIG. 4 shows the result of the dependency parsing (ROOT denotes the root word; a word without edges denotes an unselected special word; for two words connected by an edge, the arrow points to the head word and the other end is the tail word).
Step E2: mini-batch updating combines the fast convergence of stochastic updating with the stability of full-batch updating. Adam (Adaptive Moment Estimation) adjusts the learning rate of each parameter using first- and second-moment estimates of the gradient. The advantage of Adam is that, after bias correction, the learning rate of each iteration stays within a definite range, so the parameters remain relatively stable.
g_t = ∇J(W_t)
m_0 = 0, n_0 = 0
m_t = μ·m_{t-1} + (1−μ)·g_t
n_t = ν·n_{t-1} + (1−ν)·g_t²
m̂_t = m_t / (1 − μ^t)
n̂_t = n_t / (1 − ν^t)
W_{t+1} = W_t − η·m̂_t / (√n̂_t + ε)
where g_t denotes the gradient of the objective function J with respect to the parameter W_t at time step t (W_t may be a matrix or a vector, depending on the particular parameter). The algorithm updates an exponential moving average of the gradient (m_t) and an exponential moving average of the squared gradient (n_t), where the hyperparameters μ, ν ∈ [0,1] control the exponential decay rates of these moving averages. The moving averages estimate the first moment (the mean) and the second raw moment (the uncentered variance) of the gradient. However, because the moving averages m_0, n_0 are initialized as zero vectors, the moment estimates are biased toward zero, particularly during the initial time steps and especially when the decay rates are small (i.e. μ, ν approach 1). This initialization bias is easily counteracted, yielding the bias-corrected estimates m̂_t and n̂_t, which estimate the first and second moments of the gradient without bias. The exponential decay rate of the first moment is μ = 0.9, the exponential decay rate of the second moment is ν = 0.999, the smoothing parameter is ε = 1e-08, and the learning rate is η = 0.001; the training parameters are sampled from a uniform distribution over the interval [−0.1, 0.1]; dropout is set to 0.5; and the mini-batch size is set to 10. Multiplication between a vector or matrix and a scalar denotes the product of each element with the scalar; ⊙ denotes the product of corresponding elements between vectors or matrices; division of a vector or matrix by a scalar divides each element by the scalar, and division of a vector or matrix by a vector or matrix denotes division of corresponding elements.
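A minimal NumPy sketch of this Adam update is given below, using the hyperparameters listed above (μ = 0.9, ν = 0.999, ε = 1e-08, η = 0.001); the quadratic toy objective and its gradient are illustrative assumptions.

import numpy as np

mu, nu, eps, eta = 0.9, 0.999, 1e-08, 0.001

def adam_step(W, m, n, g, t):
    """One Adam update of parameter W given gradient g at time step t (t >= 1)."""
    m = mu * m + (1 - mu) * g                 # exponential moving average of the gradient
    n = nu * n + (1 - nu) * g ** 2            # exponential moving average of the squared gradient
    m_hat = m / (1 - mu ** t)                 # bias-corrected first moment
    n_hat = n / (1 - nu ** t)                 # bias-corrected second moment
    W = W - eta * m_hat / (np.sqrt(n_hat) + eps)
    return W, m, n

# Toy usage: minimize J(W) = ||W||^2 / 2, whose gradient is simply W.
W = np.random.uniform(-0.1, 0.1, size=5)      # parameters initialized as in the patent
m, n = np.zeros_like(W), np.zeros_like(W)
for t in range(1, 1001):
    g = W                                     # gradient of the toy objective
    W, m, n = adam_step(W, m, n, g, t)
print(np.round(W, 4))                         # all entries approach 0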
The recurrent part of the deep network in this patent adopts the BPTT (Back-Propagation Through Time) algorithm, which is basically the same as the traditional back-propagation algorithm, except that the internal parameters of each hidden unit and the connection parameters between hidden units are shared across time steps, so the gradients of every time step must be accumulated when updating these parameters.
It should be noted that the detailed description is only intended to illustrate and explain the technical solution of the present invention, and does not thereby limit the scope of protection of the claims. It is intended that all such modifications and variations be included within the scope of the invention as defined by the following claims and the description.

Claims (7)

1. A social text dependency syntactic analysis system based on a deep neural network, characterized by comprising: a social text crawling module, a preprocessing module, a base bilinear attention module, a stacked bilinear attention module and a joint decoding and training module;
the social text crawling module is used for crawling social texts from social media websites;
the preprocessing module is used for filtering the crawled social texts and generating initialization word vectors;
the base bilinear attention module is used for pre-training with regular texts;
the stacked bilinear attention module is used for predicting social texts;
the joint decoding and training module is used for performing joint decoding and training on the base bilinear attention module and the stacked bilinear attention module, adjusting parameters by back-propagating gradients to fit the training objective, and finally using GPU (Graphics Processing Unit) parallel computation to accelerate decoding and training of the model.
2. The deep neural network-based social text dependency parsing system of claim 1, wherein the social text crawling module performs the steps of:
firstly, a web crawler is written in Python using Scrapy, Scrapy is configured (including setting the crawl time interval and the proxy), and then the relevant text content of the web page is located and extracted.
3. The deep neural network-based social text dependency parsing system of claim 2, wherein the specific steps of filtering in the preprocessing module are:
firstly, a language model is trained with the language model tool KenLM on the English regular-text corpus Gigaword, and then the language model is used to score the downloaded social texts, which are filtered with a threshold.
4. The deep neural network-based social text dependency parsing system of claim 2, wherein the specific steps of generating the initialization word vectors in the preprocessing module are:
firstly, the GloVe tool is trained on the word-segmented regular texts and social texts to generate the sentence word vectors {e_1, e_2, …, e_L} of the regular text and the sentence word vectors {e'_1, e'_2, …, e'_L} of the social text, where L represents the length of the sentence requiring dependency analysis.
5. The deep neural network-based social text dependency parsing system of claim 3, wherein the base bilinear attention module performs the steps of:
firstly, a bidirectional long short-term memory module is used to model the sentence, then a self-attention module is used to generate the dependency influence of the other words on the current word, then a multi-layer perceptron module is used to refine the generated word feature vectors, and finally a bilinear attention module generates an objective function over the dependency relationships among regular-text words for training.
6. The deep neural network-based social text dependency parsing system of claim 5, wherein the stacked bilinear attention module performs the steps of:
firstly, the refined word feature vectors of the base model are output, as part of the input, to a stacked neural network with the same structure as the base model, and the dependency relationships of the social text are then predicted.
7. The deep neural network-based social text dependency parsing system of claim 6, wherein the joint decoding and training module performs the steps of:
firstly, the base bilinear attention module and the stacked bilinear attention module are combined to form the whole deep dependency analysis network; then a beam search algorithm is used for decoding; the model is then trained by back-propagating gradients and iterating continuously until convergence; and finally a GPU is used to accelerate training in parallel.
CN202010193329.XA 2020-03-18 2020-03-18 Social text dependency syntactic analysis system based on deep neural network Active CN111414749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010193329.XA CN111414749B (en) 2020-03-18 2020-03-18 Social text dependency syntactic analysis system based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010193329.XA CN111414749B (en) 2020-03-18 2020-03-18 Social text dependency syntactic analysis system based on deep neural network

Publications (2)

Publication Number Publication Date
CN111414749A true CN111414749A (en) 2020-07-14
CN111414749B (en) 2022-06-21

Family

ID=71491131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010193329.XA Active CN111414749B (en) 2020-03-18 2020-03-18 Social text dependency syntactic analysis system based on deep neural network

Country Status (1)

Country Link
CN (1) CN111414749B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984845A (en) * 2020-08-17 2020-11-24 江苏百达智慧网络科技有限公司 Website wrongly-written character recognition method and system
CN112347269A (en) * 2020-11-11 2021-02-09 重庆邮电大学 Method for recognizing argument pairs based on BERT and Att-BilSTM
CN112667940A (en) * 2020-10-15 2021-04-16 广东电子工业研究院有限公司 Webpage text extraction method based on deep learning
WO2021147404A1 (en) * 2020-07-30 2021-07-29 平安科技(深圳)有限公司 Dependency relationship classification method and related device
CN113254636A (en) * 2021-04-27 2021-08-13 上海大学 Remote supervision entity relationship classification method based on example weight dispersion
CN113901847A (en) * 2021-09-16 2022-01-07 昆明理工大学 Neural machine translation method based on source language syntax enhanced decoding
CN116090450A (en) * 2022-11-28 2023-05-09 荣耀终端有限公司 Text processing method and computing device


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129931A1 (en) * 2016-11-04 2018-05-10 Salesforce.Com, Inc. Quasi-recurrent neural network based encoder-decoder model
CN109034368A (en) * 2018-06-22 2018-12-18 北京航空航天大学 A kind of complex device Multiple Fault Diagnosis Method based on DNN
CN110162749A (en) * 2018-10-22 2019-08-23 哈尔滨工业大学(深圳) Information extracting method, device, computer equipment and computer readable storage medium
CN109598387A (en) * 2018-12-14 2019-04-09 华东师范大学 Forecasting of Stock Prices method and system based on two-way cross-module state attention network model
CN109885670A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of interaction attention coding sentiment analysis method towards topic text
CN110276439A (en) * 2019-05-08 2019-09-24 平安科技(深圳)有限公司 Time Series Forecasting Methods, device and storage medium based on attention mechanism
CN110879940A (en) * 2019-11-21 2020-03-13 哈尔滨理工大学 Machine translation method and system based on deep neural network
CN111818329A (en) * 2020-06-24 2020-10-23 天津大学 Video quality evaluation method based on stack type adaptive encoder
CN112084769A (en) * 2020-09-14 2020-12-15 深圳前海微众银行股份有限公司 Dependency syntax model optimization method, device, equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIMOTHY DOZAT: "Deep Biaffine Attention for Neural Dependency Parsing", Computation and Language *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021147404A1 (en) * 2020-07-30 2021-07-29 平安科技(深圳)有限公司 Dependency relationship classification method and related device
CN111984845A (en) * 2020-08-17 2020-11-24 江苏百达智慧网络科技有限公司 Website wrongly-written character recognition method and system
CN111984845B (en) * 2020-08-17 2023-10-31 江苏百达智慧网络科技有限公司 Website wrongly written word recognition method and system
CN112667940A (en) * 2020-10-15 2021-04-16 广东电子工业研究院有限公司 Webpage text extraction method based on deep learning
CN112667940B (en) * 2020-10-15 2022-02-18 广东电子工业研究院有限公司 Webpage text extraction method based on deep learning
CN112347269A (en) * 2020-11-11 2021-02-09 重庆邮电大学 Method for recognizing argument pairs based on BERT and Att-BilSTM
CN113254636A (en) * 2021-04-27 2021-08-13 上海大学 Remote supervision entity relationship classification method based on example weight dispersion
CN113901847A (en) * 2021-09-16 2022-01-07 昆明理工大学 Neural machine translation method based on source language syntax enhanced decoding
CN113901847B (en) * 2021-09-16 2024-05-24 昆明理工大学 Neural machine translation method based on source language syntax enhancement decoding
CN116090450A (en) * 2022-11-28 2023-05-09 荣耀终端有限公司 Text processing method and computing device

Also Published As

Publication number Publication date
CN111414749B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN111414749B (en) Social text dependency syntactic analysis system based on deep neural network
Shen et al. Disan: Directional self-attention network for rnn/cnn-free language understanding
US11620515B2 (en) Multi-task knowledge distillation for language model
Liu et al. Multi-timescale long short-term memory neural network for modelling sentences and documents
US10776581B2 (en) Multitask learning as question answering
Neubig Neural machine translation and sequence-to-sequence models: A tutorial
CN108733742B (en) Global normalized reader system and method
Wu et al. On multiplicative integration with recurrent neural networks
Zhao et al. Attention-Based Convolutional Neural Networks for Sentence Classification.
US10339440B2 (en) Systems and methods for neural language modeling
US11568266B2 (en) Systems and methods for mutual learning for topic discovery and word embedding
CN110879940B (en) Machine translation method and system based on deep neural network
WO2019083812A1 (en) Generating dual sequence inferences using a neural network model
Trask et al. Modeling order in neural word embeddings at scale
Li et al. A method of emotional analysis of movie based on convolution neural network and bi-directional LSTM RNN
Bajaj et al. Metro: Efficient denoising pretraining of large scale autoencoding language models with model generated signals
Chen et al. Deep neural networks for multi-class sentiment classification
Heigold et al. Neural morphological tagging from characters for morphologically rich languages
US20230351149A1 (en) Contrastive captioning neural networks
CN113157919A (en) Sentence text aspect level emotion classification method and system
Zhang et al. Feedforward sequential memory neural networks without recurrent feedback
Aggarwal et al. Recurrent neural networks
Cao et al. Stacked residual recurrent neural network with word weight for text classification
Wu et al. An empirical exploration of skip connections for sequential tagging
Artemov et al. Informational neurobayesian approach to neural networks training. Opportunities and prospects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant