CN112560441B - Method for constructing a constituency parse tree by combining bottom-up rules with a neural network


Info

Publication number
CN112560441B
CN112560441B (application CN202011525926.4A)
Authority
CN
China
Prior art keywords
span, parse tree, classifier, layer, sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011525926.4A
Other languages
Chinese (zh)
Other versions
CN112560441A (en)
Inventor
马连博
王经纬
常凤荣
Original Assignee
东北大学 (Northeastern University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-12-22
Publication date: 2024-02-09
Application filed by 东北大学 (Northeastern University)
Priority: CN202011525926.4A
Publication of CN112560441A: 2021-03-26
Application granted; publication of CN112560441B: 2024-02-09
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/355: Class or cluster creation or modification
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a method for constructing a constituency parse tree that combines a bottom-up rule with a neural network, and belongs to the technical field of constituency parse tree construction. The method comprises: acquiring a PTB dataset and preprocessing it; creating a parse tree construction model comprising a Bi-LSTM neural network, a label classifier and a connection classifier; training the parse tree construction model with the sentences in the PTB dataset and the structural information of their constituency parse trees; and, given a sentence and the POS tags of its words, constructing the constituency parse tree of the sentence with the trained model. During construction, lower-layer information of the parse tree is used to help determine its upper-layer structure; the use of Bi-LSTM overcomes the limitation of manually extracted rules; and the two classifiers predict sentence structure and constituents separately, which improves the accuracy of the parse tree construction model.

Description

Method for constructing a constituency parse tree by combining bottom-up rules with a neural network
Technical Field
The invention belongs to the technical field of constituency parse tree construction, and in particular relates to a method that combines a bottom-up rule with a neural network to construct constituency parse trees for English sentences.
Background
As machine learning and neural network techniques continue to succeed in various fields, their application to natural language processing has become an academic hotspot. Constituency parsing is a basic task in natural language processing and an important upstream task for many high-level tasks such as question answering and machine translation. The goal of constituency parsing is to determine the grammatical structure of a whole sentence and present it as a tree.
Traditional methods for constructing constituency parse trees fall mainly into rule-based parsing algorithms and probabilistic statistical parsing algorithms. Rule-based parsing proceeds bottom-up, top-down, or with a combination of both, building the parse tree according to a manually extracted table of context combination rules; representative methods include the chart parsing algorithm and the CYK (Cocke-Younger-Kasami) algorithm. Among probabilistic statistical parsing algorithms, the most widely applied is the PCFG (probabilistic context-free grammar) algorithm, which computes the probabilities of all possible context combinations from a manually extracted rule table, constructs all possible parse trees of a sentence, and takes the tree with the highest probability as the final result.
Building on these traditional methods, applying neural networks to constituency parsing is a hotspot of current research. With a neural network, parse tree construction no longer depends on manually extracted rule tables: the network replaces manual rule search and extracts features of the latent relations within a sentence, from which the parse tree is generated. Neural methods have found many applications in constituency parse tree construction and achieve good results.
In the prior art, combining a top-down rule with a neural network has been proposed and works well. However, the top-down approach has a limitation: it cannot use information from the lower layers of the parse tree. In the construction of a constituency parse tree, this lower-layer information is important because it helps determine the upper-layer structure of the tree.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for constructing a constituency parse tree that combines a bottom-up rule with a neural network. Its aims are to let the bottom-up rule exploit the low-layer information of the parse tree and, at the same time, to use a neural network to extract information automatically, so as to construct constituency parse trees for English sentences.
To solve the above technical problem, the method for constructing a constituency parse tree by combining a bottom-up rule with a neural network comprises the following steps:
Step 1: acquire a PTB dataset and preprocess it;
Step 2: create a parse tree construction model comprising a Bi-LSTM neural network, a label classifier and a connection classifier;
the Bi-LSTM is used to extract, for each part of a sentence, a feature vector carrying context information;
the label classifier is used to predict the label of each part of the input sequence, completing the label classification task;
the connection classifier is used to predict whether each part of the input sequence joins its following adjacent part in the next layer of the constituency parse tree;
Step 3: train the parse tree construction model built in step 2 with the sentences in the PTB dataset and the structural information of their constituency parse trees;
Step 4: given a sentence and the POS tags of its words, construct the constituency parse tree of the sentence with the model trained in step 3.
Further, in the above method, the preprocessing of the PTB dataset in step 1 comprises:
a) representing every word, POS tag and non-leaf-node label appearing in the dataset by a distinct vector whose values are randomly generated;
b) extracting the structural information of the constituency parse tree of each sentence in the PTB dataset, comprising: the content of each part on each layer of the parse tree, whether each part on each layer belongs to the same part as its following adjacent part in the next layer, and the label of each part on each layer.
Further, in the above method, the label classifier and the connection classifier are both based on a CRF model.
Further, in the above method, step 3 comprises the following steps:
Step 3.1: concatenate the word vector of each word in a sentence with its POS tag vector to obtain the representation vector of the word, and feed the representation vectors of all words of the sentence into the Bi-LSTM neural network at the same time; the Bi-LSTM extracts context features from the sentence, and its output is a vector sequence of the same length as its input;
Step 3.2: denote any contiguous part of the sentence as a span, and obtain the feature vector sp of each span from the output of the Bi-LSTM neural network;
Step 3.3: first, feed the feature vectors sp of the spans of the parse tree layer by layer into the label classifier and the connection classifier; feed the true non-leaf-node labels of the spans and the corresponding label vectors layer by layer into the label classifier; and feed, layer by layer into the connection classifier, the true indication of whether each span belongs to the same part as its following adjacent span in the next layer of the constituency parse tree. The label classifier then outputs its predicted labels for the spans of each layer and the error value between the predicted and true non-leaf-node labels of each layer; the connection classifier outputs the error value between its prediction for each span of each layer and the true result;
Step 3.4: sum, over all layers, the error values produced by the label classifier and by the connection classifier in step 3.3, add the two accumulated prediction errors together, pass the sum back to the parse tree construction model built in step 2, and adjust the parameters of every part of the model with the back-propagation algorithm.
Further, in the above method, the feature vectors sp in step 3.2 are extracted layer by layer over the constituency parse tree of the current sentence.
Further, in the above method, step 4 comprises the following steps:
Step 4.1: obtain, according to a) above, the word vector of each word in the sentence and the POS tag vector of each word;
Step 4.2: concatenate each word vector with the POS tag vector of the word and pass them to the Bi-LSTM neural network, which extracts context features from the words of the sentence and outputs a vector sequence of the same length as its input;
Step 4.3: treat each word of the sentence as a span and obtain the feature vector sp of each span from the output of the Bi-LSTM, forming the sp sequence of the sentence;
Step 4.4: feed the sp sequence into the label classifier and the connection classifier, which give their predictions separately: the label classifier outputs its predicted label for each span of the sequence; the connection classifier predicts the connection status of each part of the sequence;
Step 4.5: according to the connection predictions obtained in step 4.4, join each span whose prediction is 1 with its following adjacent span and leave each span whose prediction is 0 unjoined, forming a new span sequence;
Step 4.6: for each span in the new span sequence obtained in step 4.5, obtain its feature vector sp from the Bi-LSTM output of step 4.3, forming a new sp sequence;
Step 4.7: repeat steps 4.4 to 4.6 until the new span sequence obtained in step 4.5 contains only one span; pass the feature vector sp of that span to the label classifier, which predicts its label.
Further, in the above method, the connection classifier in step 4.4 predicts the connection status of each part of the sequence: it outputs 1 if the current part joins its following adjacent part, and 0 otherwise.
The method for constructing a constituency parse tree by combining a bottom-up rule with a neural network is a brand-new construction method that couples a neural network with the parse tree construction task through a bottom-up rule, and it has the following beneficial effects. The constituency parse tree is built from bottom to top, so low-layer information of the tree helps determine its upper-layer structure: the word vectors and POS tag vectors of the whole sentence are passed to the Bi-LSTM, which effectively extracts the context information of every word, from which the context information of every span is derived; this fully exploits the feature-extraction ability of the neural network, and span connections are decided from the extracted information so that a complete parse tree is built bottom-up. The Bi-LSTM neural network is good at processing sequential input; applying it extracts the information needed for parse tree construction from sentences in a targeted way, overcoming the limitation of relying on manually extracted rules. Sentence structure and constituents are predicted separately by the connection classifier and the label classifier: the span feature vectors are passed to both classifiers to obtain predictions about labels and structure respectively; since both classifiers are based on a CRF model, each layer's input sequence is treated as a whole rather than split into isolated parts, giving a global prediction over the sequence; and because the two kinds of prediction are made by two classifiers, the predictions do not interfere with each other, which improves the accuracy of the parse tree construction model.
Drawings
FIG. 1 is a schematic diagram of the constituency parse tree of an example English sentence from the PTB dataset;
FIG. 2 is a layer-by-layer partition of the constituency parse tree of the example sentence of FIG. 1;
FIG. 3 is a flow chart of the method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to this embodiment;
FIG. 4 is a schematic diagram of the internal structure of the parse tree construction model according to this embodiment;
FIG. 5 is a flow chart of training the model once with one sentence of the PTB dataset and its constituency parse tree information;
FIG. 6 is a flow chart of constructing the constituency parse tree of a given sentence with the trained model;
FIG. 7 is a schematic diagram of the process of constructing the constituency parse tree of a given sentence with the trained model.
Detailed Description
To facilitate understanding, the present application is now described more fully with reference to the relevant figures, in which preferred embodiments are shown. The application may, however, be embodied in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that this disclosure is thorough and complete.
The basic task of constituency parsing is to determine the syntactic structure of a sentence and present that structure as a tree. In this embodiment, the method for constructing a constituency parse tree with a bottom-up rule and a neural network is described using the English sentence "BELL INDUSTRIES Inc. increased its quarterly to 10 cents from seven cents a share." as an example. The constituency parse tree of this sentence is shown in FIG. 1; the leaf nodes of the tree are the words of the sentence (including punctuation) together with their POS (part-of-speech) tags, e.g. the first leaf node in FIG. 1 is "NNP-BELL". The labels on non-leaf nodes, such as NP and VP, denote the sentence constituent formed by all words within the span of the node.
To construct a constituency parse tree with a neural network method, a construction model is first built whose goal is to generate the correct parse tree; the model is then trained on data so that its parameters are continuously optimized and adjusted. After training, the model is evaluated on test data to verify that it performs well, i.e., that it can generate correct parse trees. The model built here is based on the DyNet 2.1 framework, a neural network library developed by Carnegie Mellon University together with other institutions; it runs efficiently on a CPU or GPU and handles networks with dynamic structure well, so it is used in many areas of natural language processing. Because the number of nodes per layer changes while a constituency parse tree is built bottom-up, a network model with dynamic structure is required, so the DyNet framework not only improves efficiency but also greatly simplifies the coding. The training data is the PTB (Penn Treebank) dataset developed at the University of Pennsylvania, whose corpus comes from the 1989 Wall Street Journal; it is the standard dataset for research on constituency parsing methods.
The constituency parse tree of each sentence in the PTB dataset is stored as follows: constituents are delimited by parentheses, the content within one pair of parentheses forms one constituent, and the label of a constituent is written just after its opening parenthesis. For the example sentence, the parse tree in the dataset reads: (TOP (S (NP (NP (NNP BELL) (NNP INDUSTRIES)) (NNP Inc.)) (VP (VBD increased) (NP (PRP$ its) (NN quarterly)) (PP (TO to) (NP (CD 10) (NNS cents))) (PP (IN from) (NP (NP (CD seven) (NNS cents)) (NP (DT a) (NN share))))) (. .))). The same parse tree with its hierarchy drawn out is shown in FIG. 2. The labels comprise POS tags and non-leaf-node labels.
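To make the bracketed format concrete, the following Python sketch reads such a string into a nested (label, children) tree; the function name and tree representation are illustrative and not part of the patent:

```python
def parse_bracketed(s: str):
    """Parse a PTB-style bracketed string into a nested (label, children) tree.
    Minimal sketch: assumes well-formed input; real PTB preprocessing may also
    strip functional tags and trace elements."""
    tokens = s.replace('(', ' ( ').replace(')', ' ) ').split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == '('
        pos += 1
        label = tokens[pos]          # label follows the opening parenthesis
        pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(read_node())
            else:                    # a leaf, i.e. the word itself
                children.append(tokens[pos])
                pos += 1
        pos += 1                     # consume ')'
        return (label, children)

    return read_node()

tree = parse_bracketed("(TOP (S (NP (NNP BELL)) (VP (VBD increased))))")
```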
A POS (part-of-speech) tag describes the part of speech of a word in its sentence context. There are 45 POS tags in total: '#', '$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'CC', 'CD', 'DT', 'EX', 'FW', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB' and '``'.
There are 26 non-leaf-node labels in the constituency parse tree: 'ADJP', 'ADVP', 'CONJP', 'FRAG', 'INTJ', 'LST', 'NAC', 'NP', 'NX', 'PP', 'PRN', 'PRT', 'QP', 'RRC', 'S', 'SBAR', 'SBARQ', 'SINV', 'SQ', 'UCP', 'VP', 'WHADJP', 'WHADVP', 'WHNP', 'WHPP' and 'X'. In addition, a label '0' is introduced to indicate that the current part does not form an independent constituent and therefore has no conventional non-leaf-node label; in the implementation it is treated as a non-leaf-node label as well.
As shown in FIG. 3, the method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to this embodiment comprises the following steps:
Step 1: acquire a PTB dataset and preprocess it;
a) All words, all POS tags and non-leaf node tags appearing in the dataset are represented by different vectors, respectively, and the values of the vectors are randomly generated. Natural language text cannot be recognized directly by a computer and needs to be converted into numerical form. Words appearing in all sentences in the data set, including punctuation, are converted into vectors respectively, the same word is represented by the same vector, different words are represented by different vectors, the vector dimension can be set to be 100, and the numerical value of the vector is randomly generated. Similarly, POS tags and non-leaf node tags also need to be converted into vectors, the vector dimension can be set to 50, and the values of the vectors are randomly generated.
B) Extract the structural information of the constituency parse tree of each sentence in the PTB dataset, comprising: the content of each part on each layer of the parse tree, whether each part on each layer belongs to the same part as its following adjacent part in the next layer, and the label of each part on each layer;
For example, walking the constituency parse tree of the example sentence gives the content of each part on each layer, with the result shown in Table 1:
TABLE 1. Content of each part on each layer of the constituency parse tree of the example sentence
From the constituency parse tree it is also determined whether each part on each layer belongs to the same part as its following adjacent part in the next layer, i.e., the connection status of each part, as shown in Table 2, where 1 means the part joins its following adjacent part in the next layer and 0 means it does not.
TABLE 2. Connection status of the parts on each layer of the constituency parse tree of the example sentence
Layer: whether each part joins its following adjacent part in the next layer
Leaf-node layer: 1 1 0 0 1 0 0 1 0 0 0 1 1 0 0
First layer: 0 0 0 0 0 0 1 0 0
Second layer: 0 0 0 1 0 1 0 0
Third layer: 0 1 1 1 0 0
Fourth layer: 1 1 0
Fifth layer: 0
From the constituency parse tree, the non-leaf-node label of each part on each layer is obtained, as shown in Table 3:
TABLE 3. Non-leaf-node labels of the parts on each layer
Leaf-node layer: 0 0 0 0 0 0 0 0 0 0 0 NP NP NP 0
First layer: NP 0 NP 0 NP 0 NP 0 0
Second layer: 0 0 0 0 0 0 NP 0
Third layer: 0 0 0 PP PP 0
Fourth layer: 0 VP 0
Fifth layer: S
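For reference, the structural information of Tables 2 and 3 can be held as simple per-layer lists, leaf-node layer first; a sketch of such a target encoding:

```python
# Connection targets per layer (Table 2): 1 = joins the following adjacent
# span in the next layer, 0 = does not.
connections = [
    [1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0],      # leaf-node layer
    [0, 0, 0, 0, 0, 0, 1, 0, 0],                        # first layer
    [0, 0, 0, 1, 0, 1, 0, 0],                           # second layer
    [0, 1, 1, 1, 0, 0],                                 # third layer
    [1, 1, 0],                                          # fourth layer
    [0],                                                # fifth layer
]

# Label targets per layer (Table 3); '0' marks "not an independent constituent".
labels = [
    ["0"] * 11 + ["NP", "NP", "NP", "0"],               # leaf-node layer
    ["NP", "0", "NP", "0", "NP", "0", "NP", "0", "0"],  # first layer
    ["0", "0", "0", "0", "0", "0", "NP", "0"],          # second layer
    ["0", "0", "0", "PP", "PP", "0"],                   # third layer
    ["0", "VP", "0"],                                   # fourth layer
    ["S"],                                              # fifth layer
]
```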
Step 2: create a parse tree construction model comprising a Bi-LSTM neural network, a label classifier and a connection classifier, both classifiers being based on the CRF model.
A Bi-LSTM (bidirectional long short-term memory network) combines a forward LSTM with a backward LSTM; a single LSTM cell consists of a forget gate, a memory gate and an output gate. Compared with other network types, an LSTM is better at carrying information passed on from the earlier content of a sequential input, and a Bi-LSTM built from a forward and a backward LSTM extracts information carried from both directions of the sequence. A Bi-LSTM therefore captures the context of every part of a sentence well and is widely used in natural language processing. In this invention, the Bi-LSTM likewise extracts, for every part of a sentence, a feature vector carrying context information.
The label classifier predicts the label of every part of the input sequence, completing the label classification task; the classification targets are the 27 non-leaf-node labels (the 26 conventional non-leaf-node labels plus the label '0' indicating that no complete constituent is formed during parsing). Inside the label classifier is a CRF (conditional random field) model that predicts the label of every part of the input sequence.
The task of the connection classifier is to judge whether each part on a layer belongs to the same part as its following adjacent part in the next layer of the constituency parse tree; the classification result is joined or not joined, output as 1 or 0. Inside the connection classifier is a CRF model that predicts, for every part of the input sequence, whether it joins its following adjacent part in the next layer of the parse tree.
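The patent does not spell out the internals of the two CRF classifiers; as a rough illustration of the kind of sequence-level loss a linear-chain CRF provides, here is a minimal numpy sketch in which the emission and transition scores are assumed to come from the model:

```python
import numpy as np
from scipy.special import logsumexp

def crf_neg_log_likelihood(emissions, transitions, tags):
    """Negative log-likelihood of one layer's gold tag sequence under a
    linear-chain CRF. emissions: (T, K) scores of K classes for each of the
    T spans; transitions: (K, K) scores for moving between classes;
    tags: gold class indices of length T."""
    T, K = emissions.shape
    # Score of the gold path.
    gold = emissions[0, tags[0]]
    for t in range(1, T):
        gold += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log-partition over all paths, computed with the forward algorithm,
    # so the sequence is scored as a whole rather than span by span.
    alpha = emissions[0].copy()
    for t in range(1, T):
        alpha = emissions[t] + logsumexp(alpha[:, None] + transitions, axis=0)
    return logsumexp(alpha) - gold
```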
The internal structure and data flow of the constituency parse tree construction model are shown in FIG. 4: after sentence information is input, the Bi-LSTM neural network extracts context information from the sentence, and the processed Bi-LSTM output is passed to the two classifiers separately.
Step 3: train the parse tree construction model built in step 2 with the sentences in the PTB dataset and the structural information of their constituency parse trees;
Once trained, the model can be used directly and performs the task in the same manner in which it was trained; the trained parse tree construction model can then build a parse tree for a newly given sentence. FIG. 5 is a flow chart of training the model once with one sentence of the PTB dataset and its constituency parse tree; training on part or all of the sentences of the PTB dataset proceeds one sentence at a time by the same steps, which are as follows.
Step 3.1: concatenate the word vector of each word in the sentence with its POS tag vector to obtain the representation vector of the word, and feed the representation vectors of all words of the sentence into the Bi-LSTM neural network at the same time; the Bi-LSTM extracts context features from the sentence, and its output is a vector sequence of the same length as its input;
Concatenating the word vector with the POS tag vector of the word better represents the meaning of the current word in its context. Taking the word "BELL" of the example sentence: its word vector is w = {w1, w2, …, w100}, the vector of its POS tag is p = {p1, p2, …, p50}, and the representation vector of "BELL" is wp = {w1, w2, …, w100, p1, p2, …, p50}. The representation vectors of all words of the sentence form the input of the Bi-LSTM, and the output of the Bi-LSTM is a sequence containing as many vectors as there are input representation vectors.
Step 3.2: denote any contiguous part of the sentence as a span, and obtain the feature vector sp of the span from the output of the Bi-LSTM neural network;
For example, the span from the i-th word to the j-th word of a sentence is denoted span[i, j] and its feature vector sp(i, j); for the example sentence, span[1, 2] represents "BELL INDUSTRIES". From the properties of the Bi-LSTM, the context feature vector sp(i, j) of any span[i, j] of the sentence can be computed as:
sp_f(i, j) = lstm(j) - lstm(i)    (1)
sp_b(i, j) = lstm(i + 1) - lstm(j + 1)    (2)
sp(i, j) = [sp_f(i, j), sp_b(i, j)]    (3)
where lstm(k) denotes the k-th vector of the Bi-LSTM output sequence, sp_f(i, j) is the feature vector extracted from the forward part of the sequence, and sp_b(i, j) is the feature vector extracted from the backward part. sp(i, j), the concatenation of the forward and backward feature vectors, represents the context features of the sentence from the i-th word to the j-th word.
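A direct transcription of equations (1)-(3) as code, assuming a 1-indexed Bi-LSTM output sequence padded with a boundary vector at position n+1 so that lstm(j+1) is defined for the last span (names are illustrative):

```python
import numpy as np

def span_feature(lstm_out, i, j):
    """sp(i, j) per equations (1)-(3). lstm_out[k] is the k-th vector of the
    Bi-LSTM output sequence (1-indexed); position n+1 is assumed to hold a
    padding/boundary vector so the index j+1 stays in range."""
    sp_f = lstm_out[j] - lstm_out[i]            # equation (1)
    sp_b = lstm_out[i + 1] - lstm_out[j + 1]    # equation (2)
    return np.concatenate([sp_f, sp_b])         # equation (3)
```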
In this embodiment, the feature vectors sp are extracted layer by layer over the constituency parse tree of the current sentence, i.e., the feature vectors sp of all spans on each layer are extracted layer by layer, each obtained from the output of the Bi-LSTM;
Assume the words of a sentence are numbered 1 to n. The span sequence of the leaf-node layer is span[1,1], span[2,2], span[3,3] … span[n,n], and the context feature vectors sp(1,1), sp(2,2), sp(3,3) … sp(n,n) of its parts are obtained from equations (1)-(3). For the example sentence, the leaf-node-layer span sequence is "[BELL] [INDUSTRIES] [Inc.] [increased] [its] [quarterly] [to] [10] [cents] [from] [seven] [cents] [a] [share] [.]", and the feature vector sp of each span is obtained; the feature vectors sp of the spans on the other layers of the constituency parse tree of the current sentence are extracted in the same way.
Step 3.3: first, feed the feature vectors sp of the spans of the parse tree layer by layer into the label classifier and the connection classifier; feed the true non-leaf-node labels of the spans and the corresponding label vectors layer by layer into the label classifier; and feed, layer by layer into the connection classifier, the true indication of whether each span belongs to the same part as its following adjacent span in the next layer of the constituency parse tree. The label classifier then outputs its predicted labels for the spans of each layer and the error value between the predicted and true non-leaf-node labels of each layer; the connection classifier outputs the error value between its prediction for each span of each layer and the true result;
Here the sequence of feature vectors sp of the spans on each layer is input. For the example sentence, the leaf-node-layer sp sequence "sp(1,1), sp(2,2), sp(3,3) … sp(15,15)", the true labels of the leaf-node-layer spans "0 0 0 0 0 0 0 0 0 0 0 NP NP NP 0" and the corresponding label vectors are input to the label classifier; the label classifier gives its prediction, and the error value produced by predicting this layer is loss_l01. Then the sp vectors of the next layer's span sequence, "sp(1,3), sp(4,4), sp(5,6), sp(7,7), sp(8,9), sp(10,10), sp(11,12), sp(13,14), sp(15,15)", and the true labels of its spans, "NP 0 NP 0 NP 0 NP 0 0", are input to the label classifier, which gives its prediction; the error value of this layer is loss_l02. The prediction error of every layer is obtained in this way, and summing loss_l01, loss_l02, loss_l03, loss_l04, loss_l05 and loss_l06 gives the first partial error loss_label of the model's prediction for the sentence.
Likewise, the sp vectors of the leaf-node-layer span sequence "sp(1,1), sp(2,2), sp(3,3) … sp(15,15)" and the true indication of whether each span joins its following adjacent span in the next layer, "1 1 0 0 1 0 0 1 0 0 0 1 1 0 0", are input to the connection classifier; the connection classifier gives its prediction, and the error value of this layer is loss_l11. Then the sp vectors of the next layer's span sequence, "sp(1,3), sp(4,4), sp(5,6), sp(7,7), sp(8,9), sp(10,10), sp(11,12), sp(13,14), sp(15,15)", and the true indication for that layer, "0 0 0 0 0 0 1 0 0", are input to the connection classifier, which gives its prediction; the error value of this layer is loss_l12. The prediction error of every layer is obtained in this way, and summing loss_l11, loss_l12, loss_l13, loss_l14, loss_l15 and loss_l16 gives the second partial error loss_connect of the model's prediction for the example sentence.
Step 3.4: sum, over all layers, the error values produced by the label classifier and by the connection classifier in step 3.3, add the two accumulated prediction errors together, pass the sum back to the parse tree construction model built in step 2, and adjust the parameters of every part of the model with the back-propagation algorithm;
The per-layer prediction errors loss_l01 through loss_l06 of the label classifier are summed to obtain the first partial error loss_label of the model's prediction for the sentence.
The per-layer prediction errors loss_l11 through loss_l16 of the connection classifier are summed to obtain the second partial error loss_connect of the model's prediction for the sentence.
The two partial errors loss_label and loss_connect are then added, and the sum is propagated back to the parse tree construction model built in step 2 to adjust the parameters of every part of the model.
In this embodiment the parameter values are adjusted from the error value with the back-propagation algorithm; the aim is that, after the parameters are adjusted, re-running the procedure yields a smaller error value, and a smaller error value indicates more accurate predictions, which is the purpose of training the model.
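Under the DyNet 2.1 framework mentioned above, one training pass over one sentence could look like the following sketch; the model object and its CRF helper methods are illustrative stand-ins for the components described here, not an actual API of the patent or of DyNet:

```python
import dynet as dy

def train_step(model, trainer, words, pos_tags, gold_layers):
    """One update on one sentence; gold_layers holds each layer's spans,
    labels and connection targets (cf. Tables 2 and 3)."""
    dy.renew_cg()                                    # fresh computation graph
    outputs = model.encode(words, pos_tags)          # Bi-LSTM output sequence
    label_losses, connect_losses = [], []
    for layer in gold_layers:                        # bottom-up, layer by layer
        sp_seq = [model.span_feature(outputs, i, j) for (i, j) in layer.spans]
        label_losses.append(model.label_crf.neg_log_loss(sp_seq, layer.labels))
        connect_losses.append(model.connect_crf.neg_log_loss(sp_seq, layer.connections))
    # loss_label + loss_connect, accumulated over all layers as in step 3.4.
    loss = dy.esum(label_losses) + dy.esum(connect_losses)
    value = loss.value()
    loss.backward()                                  # back-propagation
    trainer.update()                                 # adjust all parameters
    return value
```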
Step 4: given a sentence and the POS tags of its words, construct the constituency parse tree of the sentence with the parse tree construction model trained in step 3;
As shown in FIG. 6, this comprises the following steps:
Step 4.1: according to part A) of step 1, obtain the word vector of each word in the sentence and the POS tag vector of each word; the words of the sentence are represented by the word vectors generated in A), and the POS tags by the POS tag vectors generated in A);
For example, for a newly input sentence "This time, the firms were ready." (the "new sentence" below), the POS tags of its words are "DT NN , DT NNS VBD JJ .". The vector of each word and of each tag of the new sentence is looked up among the vectors of part A): the different words of the new sentence are represented by their word vectors, and the POS tags of its words by their POS tag vectors.
Step 4.2: concatenate each word vector of the sentence with the POS tag vector of the word and pass them to the Bi-LSTM neural network, which extracts context features from the words of the sentence and outputs a vector sequence of the same length as its input;
The word vector of each word is concatenated with its tag vector to form the representation vector of the word, and the representation vectors of all words are input into the Bi-LSTM neural network.
Step 4.3: treat each word of the sentence as a span and obtain the feature vector sp of each span from the output of the Bi-LSTM, forming the sp sequence of the sentence;
For the new sentence, taking each word as a span gives the span sequence "[This] [time] [,] [the] [firms] [were] [ready] [.]"; the feature vector sp of each span is obtained, giving the sp sequence sp(1,1), sp(2,2), sp(3,3), sp(4,4), sp(5,5), sp(6,6), sp(7,7), sp(8,8).
Step 4.4: feed the sp sequence into the label classifier and the connection classifier, which give their predictions separately: the label classifier outputs its predicted label for each span of the sequence; the connection classifier predicts the connection status of each part of the sequence, outputting 1 if the current part joins its following adjacent part and 0 if it does not;
For the new sentence, the sequence obtained in step 4.3 is input to the label classifier and the connection classifier; the label classifier gives the prediction "0 0 0 0 0 0 ADJP 0", and the connection classifier gives the prediction "1 0 0 1 0 1 0 0".
Step 4.5: according to the connection predictions obtained in step 4.4, join each span whose prediction is 1 with its following adjacent span and leave each span whose prediction is 0 unjoined, forming a new span sequence;
For the new sentence, the connection prediction from step 4.4 is "1 0 0 1 0 1 0 0" and the current sequence is "[This] [time] [,] [the] [firms] [were] [ready] [.]", so the new span sequence formed is "[This time] [,] [the firms] [were ready] [.]".
Step 4.6: for each span in the new span sequence obtained in step 4.5, obtain its feature vector sp from the Bi-LSTM output of step 4.3, forming a new sp sequence;
For the new sentence, the new span sequence from step 4.5 is "[This time] [,] [the firms] [were ready] [.]", and the corresponding new sp sequence is sp(1,2), sp(3,3), sp(4,5), sp(6,7), sp(8,8).
Step 4.7: repeat steps 4.4 to 4.6 until the new span sequence obtained in step 4.5 contains only one span, which indicates that the parse tree is fully constructed; the process then stops, the feature vector sp of the final span is passed to the label classifier, and the label classifier predicts its label.
Taking the construction of the layer above the leaf nodes of the constituency parse tree of the new sentence "This time, the firms were ready." as an example, the processing of steps 4.4-4.6 is as shown in FIG. 7: the sequence of sp vectors of the single-word spans is input to the label classifier and the connection classifier, which give their predictions separately; the prediction of the label classifier is the non-leaf-node label of each part on the current layer of the parse tree, and the prediction of the connection classifier is whether each span forms one part with its following adjacent span in the next layer of the tree; the spans are joined according to the connection predictions to form a new span sequence, and the feature vector sequence of the new span sequence is passed to the label classifier and the connection classifier to obtain the predictions of the next layer.
The mark of a fully constructed parse tree is that the new span sequence formed in step 4.5 contains only one span, meaning that all parts have been joined, so the loop can stop; the label of that span is predicted by the label classifier. As shown in FIG. 7, when the newly built span sequence contains only one span, its feature vector is passed to the label classifier, which predicts the root node of the constituency parse tree of the sentence.
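Steps 4.3 to 4.7 amount to the following loop, sketched with the same illustrative names as above; a real implementation would also guard against the degenerate case where no span is merged in a round:

```python
def build_parse_tree(model, words, pos_tags):
    """Bottom-up construction of the parse tree layers for one sentence."""
    outputs = model.encode(words, pos_tags)
    spans = [(k, k) for k in range(1, len(words) + 1)]    # one span per word
    layers = []
    while len(spans) > 1:
        sp_seq = [model.span_feature(outputs, i, j) for (i, j) in spans]
        labels = model.label_crf.predict(sp_seq)          # step 4.4
        connect = model.connect_crf.predict(sp_seq)       # 1 = join next span
        layers.append((spans, labels))
        merged, k = [], 0
        while k < len(spans):                             # step 4.5
            start = spans[k][0]
            while k < len(spans) - 1 and connect[k] == 1:
                k += 1                                    # absorb a run of 1s
            merged.append((start, spans[k][1]))
            k += 1
        spans = merged                                    # step 4.6 re-derives sp
    root = model.label_crf.predict(
        [model.span_feature(outputs, *spans[0])])         # step 4.7: root label
    layers.append((spans, root))
    return layers
```

With the connection prediction "1 0 0 1 0 1 0 0" of the new sentence, the inner merge loop produces exactly the span sequence [(1,2), (3,3), (4,5), (6,7), (8,8)] given above.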
Those skilled in the art will appreciate that various modifications and alterations can be made in light of the above teachings without departing from the scope of the invention, and that such modifications still fall within that scope.

Claims (7)

1. A method for constructing a constituency parse tree by combining a bottom-up rule with a neural network, comprising the following steps:
step 1: acquiring a PTB dataset and preprocessing it;
step 2: creating a parse tree construction model comprising a Bi-LSTM neural network, a label classifier and a connection classifier;
the Bi-LSTM being used to extract, for each part of a sentence, a feature vector carrying context information;
the label classifier being used to predict the label of each part of the input sequence, completing the label classification task;
the connection classifier being used to predict whether each part of the input sequence joins its following adjacent part in the next layer of the constituency parse tree;
step 3: training the parse tree construction model built in step 2 with the sentences in the PTB dataset and the structural information of their constituency parse trees;
step 4: given a sentence and the POS tags of its words, constructing the constituency parse tree of the sentence with the model trained in step 3.
2. The method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to claim 1, wherein the preprocessing of the PTB dataset in step 1 comprises:
a) representing every word, POS tag and non-leaf-node label appearing in the dataset by a distinct vector whose values are randomly generated;
b) extracting the structural information of the constituency parse tree of each sentence in the PTB dataset, comprising: the content of each part on each layer of the parse tree, whether each part on each layer belongs to the same part as its following adjacent part in the next layer, and the label of each part on each layer.
3. The method according to claim 1, wherein the label classifier and the connection classifier are both based on a CRF model.
4. The method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to claim 2, wherein step 3 comprises the following steps:
step 3.1: concatenating the word vector of each word in a sentence with its POS tag vector to obtain the representation vector of the word, and feeding the representation vectors of all words of the sentence into the Bi-LSTM neural network at the same time, the Bi-LSTM extracting context features from the sentence and outputting a vector sequence of the same length as its input;
step 3.2: denoting any contiguous part of the sentence as a span, and obtaining the feature vector sp of each span from the output of the Bi-LSTM;
step 3.3: first, feeding the feature vectors sp of the spans in the parse tree layer by layer into the label classifier and the connection classifier, feeding the true non-leaf-node labels of the spans and the corresponding label vectors layer by layer into the label classifier, and feeding, layer by layer into the connection classifier, the true indication of whether each span belongs to the same part as its following adjacent span in the next layer of the constituency parse tree; the label classifier then outputting its predicted labels for the spans of each layer and the error value between the predicted and true non-leaf-node labels of each layer, and the connection classifier outputting the error value between its prediction for each span of each layer and the true result;
step 3.4: summing, over all layers, the error values produced by the label classifier and by the connection classifier in step 3.3, adding the two accumulated prediction errors together, passing the sum back to the parse tree construction model built in step 2, and adjusting the parameters of every part of the model with the back-propagation algorithm.
5. The method according to claim 4, wherein in step 3.2 the feature vectors sp are extracted layer by layer over the constituency parse tree of the current sentence.
6. The method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to claim 2, wherein step 4 comprises the following steps:
step 4.1: obtaining, according to a), the word vector of each word in the sentence and the POS tag vector of each word;
step 4.2: concatenating each word vector with the POS tag vector of the word and passing them to the Bi-LSTM neural network, which extracts context features from the words of the sentence and outputs a vector sequence of the same length as its input;
step 4.3: treating each word of the sentence as a span and obtaining the feature vector sp of each span from the output of the Bi-LSTM, forming the sp sequence of the sentence;
step 4.4: feeding the sp sequence into the label classifier and the connection classifier, which give their predictions separately: the label classifier outputting its predicted label for each span of the sequence, and the connection classifier predicting the connection status of each part of the sequence;
step 4.5: according to the connection predictions obtained in step 4.4, joining each span whose prediction is 1 with its following adjacent span and leaving each span whose prediction is 0 unjoined, thereby forming a new span sequence;
step 4.6: for each span in the new span sequence obtained in step 4.5, obtaining its feature vector sp from the Bi-LSTM output of step 4.3, forming a new sp sequence;
step 4.7: repeating steps 4.4 to 4.6 until the new span sequence obtained in step 4.5 contains only one span, passing the feature vector sp of that span to the label classifier, and the label classifier predicting its label.
7. The method for constructing a constituency parse tree by combining a bottom-up rule with a neural network according to claim 6, wherein the connection classifier in step 4.4 predicts the connection status of each part of the sequence, outputting 1 if the current part joins its following adjacent part and 0 otherwise.
CN202011525926.4A 2020-12-22 2020-12-22 Method for constructing a constituency parse tree by combining bottom-up rules with a neural network Active CN112560441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011525926.4A CN112560441B (en) Method for constructing a constituency parse tree by combining bottom-up rules with a neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011525926.4A CN112560441B (en) Method for constructing a constituency parse tree by combining bottom-up rules with a neural network

Publications (2)

Publication Number Publication Date
CN112560441A (en) 2021-03-26
CN112560441B (en) 2024-02-09

Family

ID=75031352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011525926.4A Active CN112560441B (en) Method for constructing a constituency parse tree by combining bottom-up rules with a neural network

Country Status (1)

Country Link
CN (1) CN112560441B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280064A * 2018-02-28 2018-07-13 北京理工大学 Combined processing method for word segmentation, part-of-speech tagging, entity recognition and syntactic parsing
CN110458181A * 2018-06-07 2019-11-15 中国矿业大学 Syntactic dependency model, training method and analysis method based on a width random forest
CN108875000A * 2018-06-14 2018-11-23 广东工业大学 Semantic relation classification method fusing multiple syntactic structures
JP2020115303A * 2019-01-18 2020-07-30 ハーディス株式会社 Natural language parsing system, parsing method and program
CN111783461A * 2020-06-16 2020-10-16 北京工业大学 Named entity recognition method based on syntactic dependency relations
CN112069328A * 2020-09-08 2020-12-11 中国人民解放军国防科技大学 Method for establishing an entity-relation joint extraction model based on multi-label classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on transition-based neural network syntactic parsing for Kazakh; 白雅雯; China Masters' Theses Full-text Database, Information Science and Technology, No. 12; I138-766 *
A neural network dependency parsing model combining global vector features; 王衡军 et al.; Journal on Communications (通信学报); vol. 39, no. 2; pp. 53-64 *

Also Published As

Publication number Publication date
CN112560441A (en) 2021-03-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant