CN116595982A - Nested named entity identification method based on dynamic graph convolution - Google Patents
- Publication number: CN116595982A (application CN202310566702.5A)
- Authority: CN (China)
- Prior art keywords: word, sequence, graph, speech, named entity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/295—Named entity recognition (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F16/35—Clustering; Classification (information retrieval of unstructured textual data)
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of computer-based language recognition and processing, in particular to nested named entity recognition, and discloses a nested named entity recognition method based on dynamic graph convolution, comprising the following steps: for a natural language text, map and characterize text features using a knowledge representation technique; model a syntactic relation graph with a graph structure according to the part-of-speech dependency information of the text; extract ontology attribute features and semantic-similarity features of the text by dynamic graph convolution; and locate and classify entities with a two-stage recognition strategy. The invention overcomes the insufficient feature extraction and insufficient feature mining of existing sequence-based models, weakens the strict temporal order of information propagation, improves recognition of texts with unusual word order and of low-frequency entities, reduces the missed-detection rate of exact-boundary recognition, and enhances system robustness, making it worthy of popularization and application.
Description
Technical Field
The invention relates to the technical field of computer-based language recognition and processing, and in particular to nested named entity recognition techniques.
Background
Nested named entity recognition is a component of natural language processing tasks such as question answering, information retrieval, and text summarization; it aims to recognize shorter entities nested inside longer ones. About 37% of sentences in a broadcast-news corpus contain nested entities, and roughly 17% of the entities in a biomedical-literature corpus are embedded in another entity, so entity nesting accounts for a non-negligible share of existing corpora. Recognizing nested entities captures finer-grained semantic information and thus better serves downstream natural language applications.
Mainstream named entity recognition research is based on sequence labeling models: a sequence feature model such as a long short-term memory network, combined with a conditional random field, outputs the most probable label for each English token or Chinese character of the input text. Such methods, however, perform poorly on nested entities.
In recent years, dedicated model structures have been proposed for the nested-entity phenomenon. Early rule-based models had domain experts craft entity-structure rules for entity prediction. However, rule-based methods are limited by individual cognitive differences, are strongly domain-dependent, do not scale, and are time- and labor-intensive to build, so their recognition performance is unsatisfactory.
It was later proposed to capture nested entities with dedicated structures such as region graphs and hypergraphs, by treating the entities of a sentence as the best sub-hypergraph of the original complete hypergraph or as spans of a parse tree. The hypergraph structure uses five node types to compactly represent entities of different semantic categories and their boundaries. Moreover, hyperarcs handle nesting very naturally because one hyperarc can connect two or more nodes; together these paths form a unique sub-hypergraph of the original hypergraph expressing all nested entities in the sentence. However, constructing accurate structures for nested entities requires substantial manual effort to avoid spurious structures and structural ambiguity, which is costly and inefficient.
With the development of machine learning and deep learning, deep-learning-based nested entity recognition methods have appeared, such as stacked flat-entity recognition models and span-enumeration approaches. Span enumeration must classify all subsequences, which is computationally expensive, slow at inference, and unsupervised with respect to boundary information. Moreover, existing models score well when trained and tested on standard datasets, but their results on validation and test sets lag behind the training set. In natural-language dialogue scenarios in particular, out-of-vocabulary entities and disordered word order are common; existing models validate poorly on low-frequency and disordered entities: when most entities and nesting patterns in the test set differ from those in the training set, recognition accuracy drops markedly and model robustness is weak.
Nested named entity recognition models based on sequential feature extraction focus only on sequence-context features. They obtain reasonable results through sequence decoding, but they do not exploit interaction information in the syntactic space of the text, such as part of speech and coreference.
In addition, it has been proposed to recast nested entity recognition as a question-answering task: the text is the input, and the positions and categories of its entities are output as answers. Such methods open new directions for nested entity recognition strategies, but leave room for improvement in performance and applicable scenarios.
In summary, there is no nested named entity recognition model capable of fully extracting sequence, text ontology attribute features and semantic features, and the robustness of the existing model is still to be enhanced.
Disclosure of Invention
The invention aims to provide a nested named entity recognition method based on dynamic graph convolution that overcomes the defects of the prior art and improves the accuracy and efficiency of candidate generation and category recognition.
To solve the above technical problems, the invention provides a nested named entity recognition method based on dynamic graph convolution, comprising the following steps:
s1: aiming at natural language texts, mapping and characterizing text features by adopting a knowledge representation technology;
s2: modeling a grammar relation graph by using a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute characteristics and semantic similarity characteristics of the text body by adopting a dynamic graph convolution mode;
s4: the two-stage recognition strategy is used for locating and classifying entities.
The step S1 includes the steps of:
s11: take each given sequence in the data set as a unit, where the data set is text data and a sequence is a complete sentence ending with a period; represent each word in the sequence as a word matrix of character vectors via a convolutional neural network, apply one conventional convolution layer to the word matrix, and obtain a character-level vector by max pooling;
s12: obtaining word-level vectors by adopting a BERT pre-training word vector table; BERT is an abbreviation of Bidirectional Encoder Representation from Transformers, a pre-trained word embedding model;
s13: concatenate the obtained character-level and word-level vectors and extract context features through a bidirectional long short-term memory network to obtain an initialized vector representation;
s14: feed the word sequence in reverse order into a long short-term memory (LSTM) network to obtain the reverse word-vector representation, and concatenate the forward and reverse encoding results to obtain the output of the word context feature encoding.
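The data flow of steps S13 and S14 can be sketched as follows. This is a minimal NumPy illustration of the forward/backward concatenation only: a toy recurrence with random weights stands in for the trained LSTM gates, and the hidden dimension is an assumption, not a value from the patent.

```python
import numpy as np

def bilstm_like_encode(X, h_dim=4, seed=0):
    """Toy stand-in for steps S13-S14: run a recurrence over the word
    vectors in forward and in reverse order, then concatenate both
    hidden states per word. Random weights replace trained gates."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1] + h_dim, h_dim))

    def run(seq):
        h, out = np.zeros(h_dim), []
        for x in seq:
            h = np.tanh(np.concatenate([x, h]) @ W)  # simple recurrent update
            out.append(h)
        return out

    fwd = run(X)              # forward pass over the word sequence
    bwd = run(X[::-1])[::-1]  # reverse-order pass, re-aligned to word positions
    return np.array([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

H = bilstm_like_encode(np.eye(3))  # 3 words with one-hot toy features
print(H.shape)  # (3, 8): forward 4 dims + backward 4 dims per word
```

Each word thus receives a context vector that sees both its left and its right neighbors, which is the property the method relies on.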
The step S2 includes:
s21: take each word in the sentence sequence as a node in the graph, and construct a sequence edge between each pair of adjacent word nodes in the context according to their order, obtaining the sequence-graph adjacency matrix; the edges are undirected, so information can propagate in both the forward and backward directions;
s22: use the part-of-speech parser in the NLTK library (Natural Language Toolkit, a common natural language processing toolkit) to obtain part-of-speech relations, construct edges between word nodes in high-frequency part-of-speech dependency relations, and assign each edge a dependency-strength weight, obtaining the part-of-speech dependency-graph adjacency matrix; a high-frequency part-of-speech dependency relation is a dependency combination between parts of speech whose statistical frequency reaches a certain level;
the step S3 includes:
s31: perform one to k rounds of graph convolution on the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to aggregate first- to k-order neighbor information; the number of rounds is chosen by experimental effect, and k is a natural number within an empirical range;
s32: use a bisecting K-means clustering algorithm to dynamically sample coreferent nodes and add edges between them, defined as coreference edges; coreferent nodes are nodes that share the same category label or are close in semantic space.
The step S4 includes:
s41: input the node feature vectors produced by the feature-extraction module into a classifier for label decoding, dividing the boundary label of each word node into two classes, entity component and non-entity component;
s42: combine the nodes recognized as entity components by adjacency to obtain candidate spans, then input the normalized feature vector of each span into the category-prediction module for prediction;
s43: input the obtained span representations into the span category-prediction module and predict categories over the normalized input with a Softmax(·) function.
The step S41 classifies the boundary label of each word node into two classes, specifically:
a fuzzy boundary-label strategy divides the boundary label of each word node into entity component and non-entity component, with the calculation formula
P_b = Softmax(MLP(x_final))
where x_final denotes the sequence feature representation obtained by the feature-extraction module, MLP(·) is a multi-layer perceptron, and the final boundary-label classifier uses a Softmax(·) function for classification.
The invention has the following beneficial effects:
1. Compared with the insufficient feature mining of prior models, the proposed feature extraction based on a dynamic graph convolution network flexibly exploits part-of-speech dependency information obtained by statistical analysis of the data set and dynamically generates a coreference relation graph; graph-structured information propagation weakens strict temporal order, improving recognition of texts with unusual word order and of low-frequency entities while improving model robustness.
2. The invention uses the simple and efficient information storage of a graph structure, maps text units into a feature space by spatial mapping, fuses different semantic and syntactic information from multiple graph structures via dynamic graph convolution, and adopts a two-stage recognition strategy, overcoming the high cost of enumeration-based approaches and the boundary blurring of hierarchical models.
3. The invention extracts features on a graph structure, transferring and fusing sequential, syntactic, and semantic feature information topologically with continuous iterative updating, so that relations among text units are fully reflected, multi-granularity text features are fully learned, and the accuracy and efficiency of candidate generation and category recognition are improved.
4. The invention adopts a fuzzy boundary recognition strategy to generate candidate entities, reducing the missed-detection rate of exact-boundary recognition and improving recognition recall.
Drawings
The technical scheme of the invention is further specifically described below with reference to the accompanying drawings and the detailed description.
FIG. 1 is a part-of-speech dependency graph of the present invention.
Fig. 2 is a split-map convolution illustration of the present invention.
Detailed Description
The invention provides a nested named entity recognition method based on dynamic graph convolution, comprising the following steps:
s1: aiming at natural language texts, mapping and characterizing text features by adopting a knowledge representation technology;
s2: modeling a grammar relation graph by using a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute characteristics and semantic similarity characteristics of the text body by adopting a dynamic graph convolution mode;
s4: the two-stage recognition strategy is used for locating and classifying entities.
Specifically, the relevant corpus is first preprocessed to obtain a distributed representation of the text; the main steps are shown in fig. 1 and detailed as follows:
s11: take each given sequence in the data set as a unit, where the data set is text data and each given sequence is a complete sentence ending with a period; the sequence is defined as X = [x_1, x_2, ..., x_n], where n is the number of words it contains. In the initialization stage the sequence is given a knowledge representation. Specifically, the character-level encoding of each word is first obtained through a convolutional neural network: the network builds a dictionary for the characters, applies one-hot encoding, sets the feature dimension, represents each word as a word matrix of character vectors, applies one conventional convolution layer to the matrix, and obtains the final character-level vector by max pooling;
s12: next, word-level encodings are initialized with BERT pre-trained word vectors. Specifically, for each word in the text sequence, a word-level vector representation x_word = BERT_emb(x) is obtained by looking up the preloaded BERT pre-trained word-vector table;
S13: next, the obtained character-level and word-level vectors are concatenated, and context features are extracted through a bidirectional long short-term memory network to obtain the initialized vector representation. The calculation formulas are:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h⃗_t = o_t ⊙ tanh(c_t)
where each W and b is a parameter to be trained, x_t denotes the t-th word in the sentence, h_{t-1} denotes the hidden state at the previous step, f_t denotes the output of the forget gate, i_t denotes the output of the memory gate, c̃_t denotes the temporary cell state, c_t denotes the current cell state, o_t denotes the output gate, and h⃗_t denotes the forward vector.
S14: the word sequence is fed into an LSTM network in reverse order to obtain the reverse word-vector representation h⃖_t, and the forward and reverse encoding results are concatenated to obtain the output of the word context feature encoding h_t = [h⃗_t ; h⃖_t].
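The character-level half of this encoding (step S11) can be sketched as follows. This is a minimal NumPy illustration: the alphabet, filter width, and filter count are illustrative assumptions, and random convolution weights stand in for trained ones.

```python
import numpy as np

def char_level_vector(word, alphabet="abcdefghijklmnopqrstuvwxyz",
                      filter_width=3, n_filters=8, seed=0):
    """Step S11 sketch: one-hot character matrix -> one conventional
    convolution layer -> max pooling over character positions."""
    rng = np.random.default_rng(seed)
    idx = [alphabet.index(c) for c in word.lower() if c in alphabet]
    mat = np.zeros((len(idx), len(alphabet)))   # word matrix of char vectors
    mat[np.arange(len(idx)), idx] = 1.0
    W = rng.standard_normal((n_filters, filter_width, len(alphabet)))
    windows = [mat[i:i + filter_width]
               for i in range(len(idx) - filter_width + 1)]
    feats = np.array([(W * win).sum(axis=(1, 2)) for win in windows])
    return feats.max(axis=0)                    # max pooling -> char-level vector

vec = char_level_vector("entity")
print(vec.shape)  # (8,)
```

In the full method this character-level vector would be concatenated with the BERT word-level vector before the bidirectional LSTM.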
S21: next, the graph is constructed. Each word x_i in the sentence sequence becomes a node n_i in the graph. First, sequence edges E_s = [e_{1-2}, e_{2-3}, e_{3-4}, ..., e_{(n-1)-n}] are constructed between adjacent word nodes in the context according to their order; the edges are undirected, so information propagates in both the forward and backward directions, yielding the sequence-graph adjacency matrix A_t ∈ R^{n×n};
S22: then the part-of-speech parser in the NLTK library is used to obtain part-of-speech relations, and edges are constructed between word nodes in high-frequency part-of-speech dependency relations, such as the common adjective-noun modification combinations.
Specifically, the high-frequency part-of-speech dependency relations in the GENIA corpus are shown in table 1. Accordingly, edges are constructed between adjectives and nouns according to the part-of-speech dependency frequencies counted over all entities in the corpus, illustrated schematically in fig. 1: circles represent word nodes, node indices correspond one-to-one with the indices marked above the sentence, edges are part-of-speech dependency edges, values on the edges are edge weights, and isolated points are nodes without any dependency in the part-of-speech structure analysis.
TABLE 1 word dependency statistics in GENIA
Considering differences in dependency strength, edge weights are assigned from the statistical frequencies as part-of-speech dependency correlation scores. Edges E_r = [e_{2-3}, e_{2-4}, e_{3-4}] can thus be added according to the part-of-speech dependencies, with corresponding edge weights 0.5, 0.3, and 0.3, yielding the part-of-speech dependency-graph adjacency matrix A_s ∈ R^{n×n}.
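The two adjacency matrices can be built as in the following sketch; the node indices and edge weights mirror the E_r = [e_{2-3}, e_{2-4}, e_{3-4}] example above (converted to 0-based indexing).

```python
import numpy as np

def build_graphs(n, pos_edges):
    """Sequence-graph adjacency A_t: undirected edges between adjacent
    word nodes. Part-of-speech dependency adjacency A_s: weighted
    undirected edges given as (i, j, weight) triples, 0-based."""
    A_t = np.zeros((n, n))
    for i in range(n - 1):                # sequence edges e_{i-(i+1)}
        A_t[i, i + 1] = A_t[i + 1, i] = 1.0
    A_s = np.zeros((n, n))
    for i, j, w in pos_edges:             # POS dependency edges with weights
        A_s[i, j] = A_s[j, i] = w
    return A_t, A_s

# 1-based edges 2-3, 2-4, 3-4 with weights 0.5, 0.3, 0.3
A_t, A_s = build_graphs(5, [(1, 2, 0.5), (1, 3, 0.3), (2, 3, 0.3)])
print(A_s[1, 2], A_t[0, 1])  # 0.5 1.0
```

Both matrices are symmetric, which encodes the undirected (bidirectional) information flow described above.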
S31: one to k rounds of graph convolution are performed on the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to aggregate first- to k-order neighbor information, where k is a natural number empirically ranging from 3 to 6. The graph convolution formulas are shown below. As shown in fig. 2, the upper dashed box represents the sequence graph built from the sequence context and the lower dashed box the part-of-speech dependency graph built from the part-of-speech dependency relations; each graph undergoes one to three rounds of convolution, and the updated feature maps are concatenated to obtain the output feature map on the right;
G_t = GCN(X_t, A_t)
G_s = GCN(X_s, A_s)
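A hedged sketch of the k-round propagation follows. Random weights stand in for the trained GCN parameters, and the symmetric normalization shown is one standard GCN formulation assumed here for illustration, not necessarily the exact variant of the patent.

```python
import numpy as np

def gcn_rounds(X, A, k):
    """k rounds of H <- relu(D^-1/2 (A + I) D^-1/2 H W); each round
    mixes in one further hop of neighbor information."""
    rng = np.random.default_rng(0)
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    P = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalized adjacency
    H = X
    for _ in range(k):
        W = rng.standard_normal((H.shape[1], H.shape[1]))  # untrained stand-in
        H = np.maximum(P @ H @ W, 0.0)       # one convolution round + ReLU
    return H

X = np.eye(4)  # toy one-hot node features for a 4-word chain graph
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
H = gcn_rounds(X, A, k=3)
print(H.shape)  # (4, 4)
```

After k = 3 rounds each node's representation depends on neighbors up to three hops away, which is the k-order neighbor aggregation described in S31.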
s32: next, a bisecting K-means clustering algorithm dynamically samples coreferent nodes and adds coreference edges. Coreferent nodes are nodes that share the same category label or are close in semantic space. The algorithm automatically clusters coreferent nodes according to the spatial distance of their semantic representations and constructs edges between them.
Specifically, the graph of all nodes is first treated as one cluster, which is then recursively split into two new clusters until the specified number of clusters is reached. First the K-means algorithm is invoked: the data set is divided into two clusters by computing Euclidean distances between feature vectors, and each cluster has its own sum of squared errors, called the parent SSE. Next, each of these clusters is itself 2-means partitioned, and the summed SSE of its two sub-clusters is recorded, called the child SSE. After each trial split, the difference SSE_diff = SSE_parent - SSE_children is recorded; the cluster with the largest SSE difference is split further, while the other clusters stop splitting. The bisecting step repeats until the total number of clusters reaches K. The sampling distance formula and the SSE formula are shown below.
dist(x, y) = sqrt( Σ_{i=1}^{n} (x_i - y_i)^2 )
SSE = Σ_{x ∈ C} dist(x, μ_C)^2
where x and y are feature vectors of node samples, n denotes the dimension of the vectors, dist(·) denotes the Euclidean distance between the two vectors, C is a cluster, and μ_C is its centroid.
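The bisecting procedure described above can be sketched in plain NumPy. The inner 2-means and the SSE_parent - SSE_children splitting rule follow the description; initialization details (random center choice) are assumptions of this sketch.

```python
import numpy as np

def kmeans2(X, seed=0, iters=20):
    """Plain 2-means: returns labels and the per-cluster SSE."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)  # squared distances
        labels = d.argmin(1)
        for c in range(2):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    sse = [((X[labels == c] - centers[c]) ** 2).sum() for c in range(2)]
    return labels, sse

def bisecting_kmeans(X, K, seed=0):
    """Repeatedly split the cluster with the largest SSE drop
    (SSE_parent - SSE_children) until K clusters remain."""
    clusters = [np.arange(len(X))]
    while len(clusters) < K:
        best = None
        for ci, idx in enumerate(clusters):
            if len(idx) < 2:
                continue
            parent_sse = ((X[idx] - X[idx].mean(0)) ** 2).sum()
            labels, child_sse = kmeans2(X[idx], seed)
            diff = parent_sse - sum(child_sse)
            if best is None or diff > best[0]:
                best = (diff, ci, idx[labels == 0], idx[labels == 1])
        _, ci, a, b = best
        clusters[ci:ci + 1] = [a, b]     # replace parent with its two children
    return clusters

X = np.array([[0.0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0], [10.1, 0]])
parts = bisecting_kmeans(X, K=3)
print(sorted(len(p) for p in parts))  # typically [2, 2, 2] for these pairs
```

In the model, X would hold the node feature vectors and each resulting cluster would supply the coreference edges of step s32.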
For the vector representations learned at the word nodes, the model weakens the sequential characteristic and enriches generalizable features, while carrying ontology-attribute, syntactic-structure, and semantic-similarity characteristics. Being less affected by word order, the model can still recognize effectively when word order is scrambled, so its classification results over these features are more robust.
S41: the node feature vectors obtained by the feature-extraction module are then input into a classifier for label decoding.
Specifically, the fuzzy boundary-label strategy divides the boundary label of each word node into two classes, entity component (denoted 1) and non-entity component (denoted 0), with the calculation formula shown below.
P_b = Softmax(MLP(x_final))
where x_final denotes the sequence feature representation obtained by the feature-extraction module, MLP(·) is a multi-layer perceptron, and the final boundary-label classifier uses a Softmax(·) function for classification.
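A toy numerical illustration of P_b = Softmax(MLP(x_final)): a single linear layer stands in for the MLP, and all features and weights are hand-picked hypothetical values, not trained parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(-1, keepdims=True)

def boundary_labels(X_final, W, b):
    """Fuzzy boundary tagging: linear layer (1-layer MLP stand-in) +
    Softmax; label 1 = entity component, 0 = non-entity component."""
    P_b = softmax(X_final @ W + b)            # shape (n_words, 2)
    return P_b.argmax(-1)

# toy features for a 4-word sentence, hypothetical weights
X_final = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 1.5], [2.0, 0.0]])
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
labels = boundary_labels(X_final, W, b=np.zeros(2))
print(labels.tolist())  # [0, 1, 1, 0]
```

Each row of P_b is a probability distribution over {non-entity, entity}, and the argmax gives the binary boundary tag used by the next stage.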
S42: the nodes recognized as entity components are then combined by adjacency into candidate spans, and the normalized feature vector of each span is input into the category-prediction module for prediction.
Specifically, the head and tail vectors of the span are concatenated and passed once through a fully connected layer for normalization, per the formula below.
Span = MLP([x_s : x_{s+n}])
S43: finally, the obtained span representation is input into the span category-prediction module, and category prediction over the normalized input uses Softmax:
P_c = Softmax(Span)
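The span-assembly half of this two-stage decoding, merging adjacent entity-component labels into candidate spans before category prediction, can be sketched as:

```python
def candidate_spans(labels):
    """Merge runs of adjacent words tagged 1 (entity component) into
    candidate spans (start, end), inclusive and 0-based."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i                       # a new run of entity words begins
        elif lab != 1 and start is not None:
            spans.append((start, i - 1))    # the run ended at the previous word
            start = None
    if start is not None:                   # run extends to the last word
        spans.append((start, len(labels) - 1))
    return spans

print(candidate_spans([0, 1, 1, 0, 1]))  # [(1, 2), (4, 4)]
```

Each resulting (start, end) pair is a candidate span whose head/tail vectors are then concatenated, normalized, and classified with Softmax as in S42 and S43.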
The proposed nested named entity recognition technique based on dynamic graph convolution extracts features on a graph structure, transferring and fusing sequential, syntactic, and semantic feature information topologically with continuous iterative updating, so that relations among text units are fully reflected, multi-granularity text features are fully learned, and the accuracy and efficiency of candidate generation and category recognition are improved. It exploits the simple and efficient information storage of the graph structure, maps text units into a feature space by spatial mapping, fuses different semantic and syntactic information from multiple graph structures through dynamic graph convolution, and adopts a two-stage recognition strategy, overcoming the high cost of enumeration-based approaches and the boundary blurring of hierarchical models.
Finally, it should be noted that the above-mentioned embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present invention.
Claims (6)
1. A nested named entity recognition method based on dynamic graph convolution, characterized by comprising the following steps:
s1: for a natural language text, mapping and characterizing text features by adopting a knowledge representation technique;
s2: modeling a grammar relation graph with a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute features and semantic similarity features of the text body in a dynamic graph convolution manner;
s4: locating and classifying entities with a two-stage recognition strategy.
2. The nested named entity recognition method based on dynamic graph convolution according to claim 1, wherein the step S1 comprises the steps of:
s11: taking each given sequence in the data set as a unit, where the data set is text data and a sequence is a complete sentence ending with a period; representing each word in the sequence, through a convolutional neural network, as a word matrix composed of character vectors; then applying a conventional one-layer convolution operation to the word matrix and obtaining a character-level vector by max pooling;
s12: obtaining word-level vectors by adopting a BERT pre-training word vector table;
s13: concatenating the obtained character-level and word-level vectors and extracting context features through a bidirectional long short-term memory network to obtain an initialized vector representation;
s14: inputting the reversed word sequence into a long short-term memory (LSTM) network to obtain a reverse word-vector representation, and concatenating the forward and reverse word encoding results to obtain the output of the word context feature encoding.
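Steps S11 to S13 can be sketched as a minimal NumPy example. The character embeddings, convolution filters, and BERT word vector below are random or zero placeholders rather than trained parameters, and the BiLSTM context encoding of steps S13 to S14 is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
CHAR_DIM, N_FILTERS, KERNEL = 8, 6, 3

def char_level_vector(word: str) -> np.ndarray:
    # Step S11: character matrix -> one convolution layer -> max pooling.
    # Character embeddings and filters are random placeholders here.
    chars = rng.standard_normal((len(word), CHAR_DIM))
    if len(word) < KERNEL:                       # pad short words
        chars = np.vstack([chars, np.zeros((KERNEL - len(word), CHAR_DIM))])
    filters = rng.standard_normal((N_FILTERS, KERNEL, CHAR_DIM))
    windows = np.stack([chars[i:i + KERNEL]
                        for i in range(chars.shape[0] - KERNEL + 1)])
    feature_maps = np.einsum("wkc,fkc->fw", windows, filters)
    return feature_maps.max(axis=1)              # max pooling over positions

def word_representation(word: str, bert_vec: np.ndarray) -> np.ndarray:
    # Steps S12-S13: concatenate the character-level vector with the word
    # vector (a placeholder standing in for the BERT pre-trained lookup).
    return np.concatenate([char_level_vector(word), bert_vec])
```

In the claimed method the concatenated vectors would then pass through the BiLSTM of steps S13 to S14 before reaching the graph modules.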
3. The nested named entity recognition method based on dynamic graph convolution according to claim 2, wherein the step S2 comprises:
s21: taking each word in the sentence sequence as a node in the graph, and constructing a sequence edge between adjacent word nodes in the context according to their sequential relation, thereby obtaining a sequence graph adjacency matrix; the edges are undirected, so information can be propagated in both the forward and backward directions;
s22: obtaining part-of-speech relations with the part-of-speech parser in the NLTK library, building edges between word nodes with high-frequency part-of-speech dependency relations, and assigning the dependency strength to each edge as a weight, thereby obtaining a part-of-speech dependency graph adjacency matrix; a high-frequency part-of-speech dependency relation means that the dependency combination between the parts of speech reaches a certain statistical frequency.
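The two adjacency matrices of steps S21 and S22 can be sketched as follows. The part-of-speech dependency strengths are assumed to be precomputed corpus statistics, and the NLTK tagging step itself is not shown:

```python
import numpy as np

def sequence_graph(n_words: int) -> np.ndarray:
    # Step S21: undirected sequence edges between adjacent word nodes.
    A = np.zeros((n_words, n_words))
    for i in range(n_words - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def pos_dependency_graph(tags, strength, threshold=0.3):
    # Step S22 sketch: connect word pairs whose part-of-speech pair has a
    # high-frequency dependency, using the strength as the edge weight.
    # `strength` maps (tag_i, tag_j) -> empirical dependency strength and
    # is assumed to be precomputed from corpus statistics.
    n = len(tags)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w = strength.get((tags[i], tags[j]), 0.0)
            if i != j and w >= threshold:
                A[i, j] = w
    return A
```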
4. The nested named entity recognition method based on dynamic graph convolution according to claim 3, wherein the step S3 comprises:
s31: performing one to k rounds of graph convolution on the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to obtain first- to k-order neighbor information, where the specific number of propagation rounds is selected according to experimental effect, k is a natural number, and its value range is an empirical value;
s32: dynamically sampling co-referent nodes with a bisecting K-means clustering algorithm and adding edges between them, defined as co-reference edges, where co-referent nodes are nodes that have the same category label or are close in the semantic space.
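The k-round propagation of step S31 can be sketched as a symmetric-normalized graph convolution. Trainable weight matrices are replaced by the identity so only the propagation pattern is shown; the bisecting K-means sampling of step S32 is omitted:

```python
import numpy as np

def graph_convolve(A: np.ndarray, X: np.ndarray, k: int = 2) -> np.ndarray:
    # Step S31: k rounds of normalised propagation, so each node
    # aggregates up to k-order neighbour information.
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))     # D^{-1/2} (A + I) D^{-1/2}
    H = X
    for _ in range(k):
        H = np.maximum(A_norm @ H, 0.0)          # ReLU non-linearity
    return H
```

The same routine would be applied to both the sequence graph and the part-of-speech dependency graph, with the round count k chosen empirically as the claim states.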
5. The nested named entity recognition method based on dynamic graph convolution according to claim 3, wherein the step S4 comprises:
s41: inputting the node feature vectors obtained by the feature extraction module into a classifier for label decoding, the boundary label of each word node being classified into two types, entity constituent and non-entity constituent;
s42: merging the nodes identified as entity constituents according to adjacency to obtain candidate spans, and then performing a normalization operation on the span feature vectors;
s43: the second stage of the two-stage entity recognition model is category prediction: the obtained span feature vectors are input into a span category prediction module, which applies the Softmax(·) function to the normalized input for category prediction.
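The two-stage strategy of steps S41 to S43 can be sketched as follows: binary boundary labels are merged into candidate spans, which are then classified with a Softmax layer. The classifier weights here are placeholders, not the trained parameters of the claimed model:

```python
import numpy as np

def extract_spans(is_entity):
    # Step S42: merge adjacent nodes predicted as entity constituents
    # into candidate spans, given one binary boundary label per word.
    spans, start = [], None
    for i, flag in enumerate(is_entity):
        if flag and start is None:
            start = i
        if not flag and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(is_entity) - 1))
    return spans

def classify_spans(span_vecs: np.ndarray, W: np.ndarray) -> np.ndarray:
    # Step S43: Softmax category prediction over span feature vectors.
    z = span_vecs @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```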
6. The nested named entity recognition method based on dynamic graph convolution according to claim 5, wherein the step S41 classifies the boundary labels of each word node into two classes, specifically:
dividing the boundary label of each word node into entity constituent and non-entity constituent by adopting a fuzzy boundary label strategy, with the calculation formula:
P_b = Softmax(MLP(x_final))
where x_final denotes the sequence feature representation obtained by the feature extraction module, MLP(·) is a multi-layer perceptron, and the final boundary label classifier uses the Softmax(·) function for classification prediction.
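A minimal sketch of the formula P_b = Softmax(MLP(x_final)), using a one-hidden-layer perceptron with placeholder parameters in place of the trained MLP:

```python
import numpy as np

def boundary_probs(x_final, W1, b1, W2, b2):
    # P_b = Softmax(MLP(x_final)): one hidden layer with ReLU, then
    # Softmax over the two boundary labels (entity / non-entity
    # constituent).  All parameters here are placeholders.
    h = np.maximum(x_final @ W1 + b1, 0.0)       # hidden layer
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```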
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310566702.5A CN116595982A (en) | 2023-05-19 | 2023-05-19 | Nested named entity identification method based on dynamic graph convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116595982A true CN116595982A (en) | 2023-08-15 |
Family
ID=87607659
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310566702.5A Pending CN116595982A (en) | 2023-05-19 | 2023-05-19 | Nested named entity identification method based on dynamic graph convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116595982A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116757216A (en) * | 2023-08-15 | 2023-09-15 | 之江实验室 | Small sample entity identification method and device based on cluster description and computer equipment |
CN116757216B (en) * | 2023-08-15 | 2023-11-07 | 之江实验室 | Small sample entity identification method and device based on cluster description and computer equipment |
CN118246453A (en) * | 2024-05-20 | 2024-06-25 | 昆明理工大学 | Nested entity recognition model based on graph convolution, construction method thereof and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112560432B (en) | Text emotion analysis method based on graph attention network | |
CN109992629B (en) | Neural network relation extraction method and system fusing entity type constraints | |
CN1677388B (en) | Method and system for translating Input semantic structure into output semantic structure according to fraction | |
CN117076653B (en) | Knowledge base question-answering method based on thinking chain and visual lifting context learning | |
CN113204952B (en) | Multi-intention and semantic slot joint identification method based on cluster pre-analysis | |
CN111046179B (en) | Text classification method for open network question in specific field | |
CN117151220B (en) | Entity link and relationship based extraction industry knowledge base system and method | |
CN112183094B (en) | Chinese grammar debugging method and system based on multiple text features | |
CN116595982A (en) | Nested named entity identification method based on dynamic graph convolution | |
CN111309918A (en) | Multi-label text classification method based on label relevance | |
CN118093834B (en) | AIGC large model-based language processing question-answering system and method | |
CN114548101A (en) | Event detection method and system based on backtracking sequence generation method | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN114691864A (en) | Text classification model training method and device and text classification method and device | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN111145914A (en) | Method and device for determining lung cancer clinical disease library text entity | |
CN114880307A (en) | Structured modeling method for knowledge in open education field | |
CN115081472A (en) | Pulse signal syntax modeling and feature extraction method for radar behavior analysis | |
CN114626378B (en) | Named entity recognition method, named entity recognition device, electronic equipment and computer readable storage medium | |
CN116955579B (en) | Chat reply generation method and device based on keyword knowledge retrieval | |
CN118227790A (en) | Text classification method, system, equipment and medium based on multi-label association | |
CN111666375A (en) | Matching method of text similarity, electronic equipment and computer readable medium | |
CN116595170A (en) | Medical text classification method based on soft prompt | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium | |
CN118057354A (en) | Event detection method based on meta attribute learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||