CN116595982A - Nested named entity identification method based on dynamic graph convolution - Google Patents

Nested named entity identification method based on dynamic graph convolution

Info

Publication number
CN116595982A
CN116595982A (application CN202310566702.5A)
Authority
CN
China
Prior art keywords
word
sequence
graph
speech
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310566702.5A
Other languages
Chinese (zh)
Inventor
莫益军
孙淑榕
刘辉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310566702.5A priority Critical patent/CN116595982A/en
Publication of CN116595982A publication Critical patent/CN116595982A/en
Pending legal-status Critical Current

Classifications

    • G06F40/295 — Named entity recognition (under G06F40/20 natural language analysis; G06F40/279 recognition of textual entities; G06F40/289 phrasal analysis)
    • G06F16/35 — Clustering; classification (under G06F16/30 information retrieval of unstructured textual data)
    • G06N3/042 — Knowledge-based neural networks; logical representations of neural networks
    • G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the technical field of computer language recognition processing, in particular to nested named entity recognition, and discloses a nested named entity recognition method based on dynamic graph convolution comprising the following steps: for natural language text, mapping and characterizing text features with knowledge representation techniques; modeling a grammatical relation graph with a graph structure according to the part-of-speech dependency information of the text; extracting ontology attribute features and semantic-similarity features of the text by dynamic graph convolution; and locating and classifying entities with a two-stage recognition strategy. The invention overcomes the insufficient feature extraction and feature mining of existing temporal feature extraction models, weakens the temporality of information transfer, improves recognition of out-of-order text and low-frequency entities, reduces the miss rate of exact recognition approaches, and enhances system robustness, making it worth popularizing and applying.

Description

Nested named entity identification method based on dynamic graph convolution
Technical Field
The invention relates to the technical field of computer language identification processing, in particular to a nested named entity identification technology.
Background
The nested named entity recognition task is a core component of natural language processing applications such as question-answering systems, information retrieval and text summarization. It aims to recognize short entities embedded inside longer entities when nesting occurs. In a broadcast-news corpus, 37% of sentences contain nested entities, and in a biomedical literature corpus about 17% of entities are embedded in another entity, so entity nesting accounts for a non-negligible share of existing corpora. Recognizing nested entities captures finer-grained semantic information and thus better serves downstream natural language applications.
The mainstream approach in named entity recognition research is based on sequence labeling models: a sequence feature model such as a long short-term memory network, combined with a conditional random field model, outputs the most probable sequence label for each English token or Chinese character of the input text. However, these methods perform poorly when handling nesting.
In recent years, dedicated model structures have been proposed for the nested entity phenomenon. Early work proposed rule-based models, in which domain experts formulate entity structure rules for entity prediction. However, rule-based methods are limited by individual cognitive differences, are highly domain-dependent, cannot be extended, and are time- and labor-intensive to author, so their recognition effect is unsatisfactory.
It was then proposed to capture nested entities with models based on special-purpose structures, including region graphs and hypergraphs, by treating the entities of a sentence as the best sub-hypergraph of an original complete hypergraph or as a span of the parse tree. The hypergraph structure consists of five node types that compactly represent the entities and boundaries of multiple semantic categories. Moreover, a hyperarc can connect two or more nodes, so it handles nesting very naturally. Together these paths form a unique sub-hypergraph of the original hypergraph that expresses all nested entities in the sentence. However, considerable manual effort is required to design accurate structures for nested entities in order to avoid spurious structures and structural ambiguity, which is costly and insufficiently efficient.
With the development of machine learning and deep learning, deep-learning-based nested entity recognition methods have appeared, such as stacking flat entity recognition models or enumerating spans. Span-enumeration methods must classify all subsequences, which is computationally expensive and slow at inference, and they are not supervised by boundary information. In addition, existing models obtain good results when trained and tested on standard datasets, but their performance on validation and test sets lags behind the training set. In natural-language dialogue scenes in particular, out-of-vocabulary entities and disordered word order in entity mentions are common; existing models perform poorly on low-frequency and out-of-order entities, i.e., when most entities and nesting situations in the test set differ from those in the training set, recognition accuracy drops markedly and model robustness is weak.
Nested named entity recognition models based on temporal feature extraction focus only on sequence context features. They can obtain good results with sequence-based decoding, but they do not exploit interactive information in the grammatical space of the text, such as part-of-speech and coreference information.
In addition, it has been proposed to convert nested entity recognition into a question-answering task: the text is input, and the positions and categories of its entities are output as answers. Such methods opened up new ideas for nested entity recognition strategies, but there is still room for improvement in performance and application scenarios.
In summary, there is no nested named entity recognition model capable of fully extracting sequence, text ontology attribute features and semantic features, and the robustness of the existing model is still to be enhanced.
Disclosure of Invention
The invention aims to provide a nested named entity recognition method based on dynamic graph convolution that overcomes the defects of the prior art and improves the accuracy and efficiency of candidate generation and category recognition.
To solve the above technical problems, the invention provides a nested named entity recognition method based on dynamic graph convolution, comprising the following steps:
s1: aiming at natural language texts, mapping and characterizing text features by adopting a knowledge representation technology;
s2: modeling a grammar relation graph by using a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute characteristics and semantic similarity characteristics of the text body by adopting a dynamic graph convolution mode;
s4: the two-stage recognition strategy is used for locating and classifying entities.
The step S1 includes the steps of:
s11: taking each given sequence in the dataset as a unit, where the dataset is text data and each sequence is a complete sentence ending with a period, representing each word in the sequence as a word matrix of character vectors via a convolutional neural network, then applying a conventional one-layer convolution to the word matrix and obtaining a character-level vector by max pooling;
s12: obtaining word-level vectors by adopting a BERT pre-training word vector table; BERT is an abbreviation of Bidirectional Encoder Representation from Transformers, a pre-trained word embedding model;
s13: splicing the obtained character-level and word-level vectors, and extracting context characteristics through a two-way long-short-time memory network to obtain a vector representation for finishing initialization;
s14: inputting the word sequence in reverse order into a long short-term memory (LSTM) network to obtain the reverse word vector representation, and concatenating the forward and reverse word coding results to obtain the output of the word context feature coding.
The step S2 includes:
s21: each word in the sentence sequence serves as a node in the graph, and sequence edges are constructed between adjacent word nodes in the context according to the sequence relation, yielding the sequence graph adjacency matrix; the edges are undirected, so information can propagate in both directions;
s22: a part-of-speech parser from the NLTK (Natural Language Toolkit, a common natural language processing toolkit) library is used to decode part-of-speech relations; edges are constructed between word nodes in high-frequency part-of-speech dependency relations, and the dependency strength is assigned to each edge as its weight, yielding the part-of-speech dependency graph adjacency matrix, where a high-frequency part-of-speech dependency relation is a dependency combination between parts of speech that satisfies a certain statistical frequency;
the step S3 includes:
s31: one to k rounds of graph convolution are applied to the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to obtain first-order to k-order neighbor information; the exact number of propagation rounds is selected according to experimental effect, k being a natural number within an empirical range;
s32: a bisecting K-means clustering algorithm dynamically samples coreferent nodes, and edges, defined as coreference edges, are added between them, where coreferent nodes are nodes that have the same category label or are close in semantic space.
The step S4 includes:
s41: inputting the node feature vectors obtained by the feature extraction module into a classifier for label decoding, dividing the boundary label of each word node into two classes: entity component and non-entity component;
s42: combining the nodes identified as entity components by adjacency to obtain candidate spans, then inputting the normalized feature vectors of the spans into the category prediction module for prediction;
s43: inputting the obtained span representations into the span category prediction module, which performs category prediction on the normalized input using a Softmax() function.
The step S41 classifies the boundary label of each word node into two classes, specifically:
a fuzzy boundary label strategy divides the boundary label of each word node into entity component and non-entity component, with the calculation formula
P_b = Softmax(MLP(x_final))
where x_final denotes the sequence feature representation obtained by the feature extraction module, MLP(·) is a multi-layer perceptron, and the final boundary label classifier uses a Softmax() function for classification prediction.
The invention has the following beneficial effects:
1. compared with prior-art models that mine features insufficiently, the feature extraction method based on a dynamic graph convolution network statistically analyzes the dataset to exploit part-of-speech dependency information flexibly and dynamically generates the coreference relation graph; information transfer based on the graph structure weakens temporality, improving the model's recognition of out-of-order text and low-frequency entities while increasing its robustness.
2. The invention uses a simple and efficient information storage mode of a graph structure, adopts space mapping to map text units to a feature space, adopts a dynamic graph convolution mode to fuse different semantic and grammar information from various graph structures, adopts a two-stage recognition strategy, and overcomes the defect of high cost based on an enumeration mode and the defect of boundary blurring based on a hierarchical model.
3. The invention performs feature extraction based on the graph structure, can transfer and fuse sequence, grammar and semantic feature information in a topological structure mode, and can continuously perform iterative updating, so that the relation among each text unit can be fully reflected, the multi-granularity feature of the text can be fully learned, and the accuracy and the efficiency of candidate generation and category identification are improved.
4. The invention adopts a fuzzy boundary recognition strategy to generate candidate entities, reducing the miss rate of exact recognition approaches and improving the model's recognition recall.
Drawings
The technical scheme of the invention is further specifically described below with reference to the accompanying drawings and the detailed description.
FIG. 1 is a part-of-speech dependency graph of the present invention.
Fig. 2 is a graph-convolution illustration of the present invention.
Detailed Description
The invention provides a nested named entity recognition method based on dynamic graph convolution, comprising the following steps:
s1: aiming at natural language texts, mapping and characterizing text features by adopting a knowledge representation technology;
s2: modeling a grammar relation graph by using a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute characteristics and semantic similarity characteristics of the text body by adopting a dynamic graph convolution mode;
s4: the two-stage recognition strategy is used for locating and classifying entities.
Specifically, the relevant corpus is first preprocessed to obtain a distributed representation of the text. The main steps are shown in Fig. 1, and the specific steps are as follows:
s11: for each given sequence in the dataset — the dataset is text data, and each given sequence is a complete sentence ending in a period — the sequence is defined as X = [x_1, x_2, ..., x_n], where n is the number of words it contains. The sequence is given a knowledge representation in the initialization stage. Specifically, a character-level code of each word is first obtained through a convolutional neural network: the network builds a dictionary over the characters, applies one-hot coding, and sets the feature dimension, so each word is represented as a word matrix of character vectors; a conventional one-layer convolution is applied to the matrix, and the final character-level vector is obtained by max pooling;
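As a concrete illustration of S11, the character-level encoding can be sketched as follows. This is a minimal NumPy version with a toy lowercase alphabet, hypothetical dimensions (CHAR_DIM, KERNEL_W, OUT_DIM) and a single random convolution kernel — a sketch of the mechanism, not the trained network of the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_DIM, KERNEL_W, OUT_DIM = 16, 3, 32   # hypothetical dimensions

char_emb = rng.normal(size=(len(ALPHABET), CHAR_DIM))      # character vectors
conv_w = rng.normal(size=(OUT_DIM, KERNEL_W * CHAR_DIM))   # one conv layer

def char_vector(word: str) -> np.ndarray:
    """Word matrix of character vectors -> one conv layer -> max pooling."""
    idx = [ALPHABET.index(c) for c in word.lower() if c in ALPHABET]
    mat = char_emb[idx]                                    # (chars, CHAR_DIM)
    if mat.shape[0] < KERNEL_W:                            # pad short words
        mat = np.pad(mat, ((0, KERNEL_W - mat.shape[0]), (0, 0)))
    windows = np.stack([mat[i:i + KERNEL_W].ravel()
                        for i in range(mat.shape[0] - KERNEL_W + 1)])
    feats = np.maximum(windows @ conv_w.T, 0.0)            # conv + ReLU
    return feats.max(axis=0)                               # max pooling

print(char_vector("entity").shape)  # (32,)
```

The max pooling over convolution windows makes the output a fixed-length vector regardless of word length, which is why every word can then be concatenated with its BERT word-level vector in S13.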
s12: next, BERT pre-trained word vectors initialize the word-level coding. Specifically, for each word in the text sequence, a word-level vector representation x_word = BERT_emb(x) is obtained by looking up the preloaded BERT pre-training word vector table;
S13: next, the obtained character-level and word-level vectors are concatenated, and context features are extracted through a bidirectional long short-term memory network to obtain the initialized vector representation. The forward computation is:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t→ = o_t ⊙ tanh(c_t)
where each W and b denotes a trainable parameter, x_t is the t-th word in the sentence, h_{t-1} is the cell state at the previous step, f_t is the output of the forget gate, i_t the output of the memory gate, c̃_t the temporary cell state, c_t the current cell state, o_t the output gate, and h_t→ the forward vector.
S14: the word sequence is also input in reverse order to the LSTM network to obtain the reverse word vector representation h_t←, and the forward and reverse word coding results are concatenated to obtain the output of the word context feature code h_t = [h_t→ ; h_t←].
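The gate recurrence of S13/S14 can be sketched numerically. The following is a minimal NumPy LSTM cell with hypothetical sizes and random weights, run forward and then in reverse order and concatenated as S14 describes; it illustrates the standard LSTM recurrence, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 8, 8                           # hypothetical input / hidden sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one trainable matrix per gate, acting on the concatenation [h_{t-1}; x_t]
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(H, H + D)) for _ in range(4))
bf = bi = bc = bo = np.zeros(H)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)          # forget gate f_t
    i = sigmoid(Wi @ z + bi)          # memory (input) gate i_t
    c_tilde = np.tanh(Wc @ z + bc)    # temporary cell state
    c = f * c_prev + i * c_tilde      # current cell state c_t
    o = sigmoid(Wo @ z + bo)          # output gate o_t
    return o * np.tanh(c), c

def encode(seq):
    h = c = np.zeros(H)
    out = []
    for x in seq:
        h, c = lstm_step(x, h, c)
        out.append(h)
    return np.stack(out)

seq = rng.normal(size=(5, D))         # five word vectors
fwd = encode(seq)                     # forward pass (S13)
bwd = encode(seq[::-1])[::-1]         # reverse-order pass, realigned (S14)
context = np.concatenate([fwd, bwd], axis=1)
print(context.shape)  # (5, 16)
```

Each position's final representation concatenates the forward and reverse hidden states, so it carries context from both directions of the sentence.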
S21: next, graph construction is performed. Each word x_i in the sentence sequence becomes a node n_i in the graph. First, sequence edges E_s = [e_{1-2}, e_{2-3}, e_{3-4}, ..., e_{(n-1)-n}] are constructed between adjacent word nodes according to the sequence relation; these edges are undirected, so information can propagate in both directions. This yields the sequence graph adjacency matrix A_t ∈ R^{n×n}.
S22: then, part-of-speech relations are obtained by decoding with the part-of-speech parser in the NLTK library, and edges are constructed between word nodes that stand in high-frequency part-of-speech dependency relations, i.e., common part-of-speech dependency combinations such as modifier-noun relations.
Specifically, the high-frequency part-of-speech dependency relations in the GENIA corpus are shown in Table 1; accordingly, edges are constructed between adjectives and nouns based on the part-of-speech dependency frequencies computed over all entities in the corpus. This is illustrated in Fig. 1: circles represent word nodes, the node indexes correspond one-to-one to the indexes marked above the sentence, edges are part-of-speech dependency edges, values on the edges are edge weights, and isolated points are nodes with no dependency relation in the part-of-speech structural analysis.
Table 1. Part-of-speech dependency statistics in GENIA
Considering differences in dependency strength, each edge is weighted by its statistical frequency as a part-of-speech dependency correlation score. In the example, edges E_r = [e_{2-3}, e_{2-4}, e_{3-4}] are added according to the part-of-speech dependencies, with corresponding edge weights 0.5, 0.3 and 0.3 in turn, yielding the part-of-speech dependency graph adjacency matrix A_s ∈ R^{n×n}.
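Building the two adjacency matrices A_t and A_s of S21/S22 can be sketched directly. `build_graphs` below is a hypothetical helper; the dependency-edge weights are assumed to be precomputed corpus statistics (here the example values 0.5/0.3/0.3 from the description):

```python
import numpy as np

def build_graphs(n, dep_edges):
    """Build the sequence-graph (A_t) and part-of-speech dependency-graph
    (A_s) adjacency matrices for a sentence of n words.

    dep_edges maps (i, j) word-index pairs that stand in a high-frequency
    part-of-speech dependency relation to their statistical-frequency edge
    weights (assumed precomputed from corpus statistics such as Table 1)."""
    A_t = np.zeros((n, n))
    for i in range(n - 1):               # undirected sequence edges
        A_t[i, i + 1] = A_t[i + 1, i] = 1.0
    A_s = np.zeros((n, n))
    for (i, j), w in dep_edges.items():  # dependency strength as edge weight
        A_s[i, j] = A_s[j, i] = w
    return A_t, A_s

# four-word toy sentence with the example edges e_{2-3}, e_{2-4}, e_{3-4}
# (0-indexed here) and weights 0.5 / 0.3 / 0.3 from the description
A_t, A_s = build_graphs(4, {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.3})
print(A_s[1, 3])  # 0.3
```

Both matrices are symmetric because the edges are undirected, matching the statement that information propagates in both directions.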
S31: one to k rounds of graph convolution are applied to the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to obtain first-order to k-order neighbor information, where k is a natural number with an empirical range of 3 to 6. The graph convolution formulas are shown below. As shown in Fig. 2, the upper dashed box represents the sequence graph built from the sequence context and the lower dashed box the part-of-speech dependency graph built from the dependency relations; one to three rounds of graph convolution are applied to each graph, and the updated feature maps are concatenated to obtain the output feature map on the right;
G_t = GCN(X_t, A_t)
G_s = GCN(X_s, A_s)
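A minimal sketch of the GCN propagation in S31, using the common symmetric-normalization form H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W). The patent does not spell out its exact convolution, so this particular form is an assumption:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution round: ReLU(D^{-1/2}(A+I)D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(2)
n, d = 4, 8                                   # hypothetical sizes
X = rng.normal(size=(n, d))
A = np.zeros((n, n))
for i in range(n - 1):                        # chain-shaped sequence graph
    A[i, i + 1] = A[i + 1, i] = 1.0
W = rng.normal(scale=0.1, size=(d, d))

H = X
for _ in range(3):                            # k = 3 rounds -> 3-hop neighbours
    H = gcn_layer(H, A, W)
print(H.shape)  # (4, 8)
```

Stacking k rounds lets each node aggregate information from its k-hop neighborhood, which is what "first-order to k-order neighbor information" refers to.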
s32: then, a bisecting K-means clustering algorithm dynamically samples coreferent nodes and adds coreference edges. Coreferent nodes are nodes that carry the same class label or are close in semantic space. The algorithm automatically groups coreferent nodes into clusters according to the spatial distance between their semantic representations and constructs edges between them.
Specifically, the graph made up of all nodes is first treated as a single cluster, and then recursively split into two new clusters until the specified number of clusters is reached. The K-means algorithm is first invoked to divide the data set into two clusters by computing Euclidean distances between feature vectors; each cluster has its own sum of squared errors (SSE), called the parent-node SSE. Next, K-means is applied to each of these clusters in turn, and the sum of the SSEs of the two resulting clusters is recorded, called the child-node total SSE. The SSE difference after splitting each cluster is recorded as SSE_diff = SSE_parent − SSE_children; the cluster with the largest SSE difference continues to be split, while the other clusters stop. The bisecting step is repeated until the total number of clusters reaches K. The sampling distance formula and the SSE calculation formula are shown below.
dist(x, y) = sqrt(Σ_{i=1}^{n} (x_i − y_i)²),  SSE = Σ_{x∈C} dist(x, μ_C)²
where x and y are feature vectors of node samples, n is the dimension of the vectors, dist(·) is the Euclidean distance between the two vectors, and μ_C is the centroid of cluster C.
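The bisecting K-means of S32 can be sketched as below — a self-contained NumPy version with a deterministic first/last-point 2-means initialization (an assumption; the patent does not fix one) that repeatedly splits the cluster with the largest SSE difference, as described:

```python
import numpy as np

def kmeans2(X, iters=20):
    """Plain 2-means split with a deterministic first/last-point init."""
    centers = X[[0, -1]].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # squared distances
        labels = d.argmin(1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(0)
    sse = [((X[labels == k] - centers[k]) ** 2).sum() for k in (0, 1)]
    return labels, sse

def bisecting_kmeans(X, K):
    """Recursively split the cluster with the largest
    SSE_diff = SSE_parent - SSE_children, until K clusters exist."""
    clusters = [np.arange(len(X))]
    while len(clusters) < K:
        best = None
        for ci, idx in enumerate(clusters):
            if len(idx) < 2:
                continue
            parent_sse = ((X[idx] - X[idx].mean(0)) ** 2).sum()
            labels, child_sse = kmeans2(X[idx])
            gain = parent_sse - sum(child_sse)
            if best is None or gain > best[0]:
                best = (gain, ci, labels)
        gain, ci, labels = best
        idx = clusters.pop(ci)
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

rng = np.random.default_rng(3)
# two well-separated toy groups of node embeddings standing in for
# coreferent-node clusters in semantic space
X = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(5, 0.1, (5, 4))])
parts = bisecting_kmeans(X, 2)
print(sorted(len(p) for p in parts))  # [5, 5]
```

Each resulting cluster would then be treated as a group of coreferent nodes, with coreference edges added among its members.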
The vector representations the model learns for the word nodes thus weaken the sequential characteristic and enrich the generalization features, while carrying ontology attribute, grammatical structure and semantic-similarity characteristics. Because the model is less affected by word order, it can recognize effectively even when word order is disrupted, making its classification results more robust.
S41: and then inputting the node feature vector obtained by the feature extraction module into a classifier for label decoding.
Specifically, the fuzzy boundary tag strategy classifies the boundary tag of each word node into two categories, entity component (denoted by 1) and non-entity component (denoted by 0), with the following calculation:
P_b = Softmax(MLP(x_final))
where x_final denotes the sequence feature representation obtained by the feature extraction module, MLP(·) is a multi-layer perceptron, and the final boundary tag classifier uses a Softmax() function for classification prediction.
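The boundary classifier P_b = Softmax(MLP(x_final)) can be sketched with a one-hidden-layer MLP; the sizes and random weights here are hypothetical placeholders for the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
D, H = 16, 8                       # hypothetical feature / hidden sizes

W1 = rng.normal(scale=0.1, size=(H, D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(2, H)); b2 = np.zeros(2)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def boundary_probs(x_final):
    """P_b = Softmax(MLP(x_final)): per-word probabilities over the two
    fuzzy boundary classes (0 = non-entity, 1 = entity component)."""
    h = np.maximum(x_final @ W1.T + b1, 0.0)   # one ReLU hidden layer
    return softmax(h @ W2.T + b2)

x_final = rng.normal(size=(6, D))  # six word-node feature vectors
P_b = boundary_probs(x_final)
labels = P_b.argmax(axis=1)        # 1 = entity component, 0 = non-entity
print(P_b.shape)                   # (6, 2)
```

The per-word 0/1 decisions feed the next stage, where adjacent entity-component words are merged into candidate spans.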
S42: the nodes identified as entity components are then combined by adjacency to obtain candidate spans, and the normalized feature vectors of these spans are input to the category prediction module for prediction.
Specifically, the head and tail vectors of the span are concatenated and normalized once through a fully connected layer mapping, with the formula:
Span = MLP([x_s : x_{s+n}])
S43: finally, the obtained span representation is input into the span category prediction module, and category prediction is performed on the normalized input using Softmax:
P_c = Softmax(Span)
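Steps S42/S43 — merging adjacent entity-component nodes into candidate spans and classifying each span from its head/tail representation — can be sketched as follows; the linear classifier and its sizes are hypothetical stand-ins for the trained MLP:

```python
import numpy as np

def candidate_spans(boundary_labels):
    """Merge runs of adjacent words labelled 1 (entity component) into
    (start, end) candidate spans with inclusive indices."""
    spans, start = [], None
    for i, lab in enumerate(boundary_labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(boundary_labels) - 1))
    return spans

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical span classifier: linear layer on the concatenated
# head/tail node vectors, standing in for MLP([x_s : x_{s+n}])
rng = np.random.default_rng(5)
D, C = 8, 3                       # node feature size, number of entity classes
W = rng.normal(scale=0.1, size=(C, 2 * D))

def classify_span(X, span):
    s, e = span
    rep = np.concatenate([X[s], X[e]])   # head-tail span representation
    return softmax(W @ rep)              # P_c = Softmax(Span)

labels = [0, 1, 1, 0, 1, 0]
spans = candidate_spans(labels)
print(spans)                      # [(1, 2), (4, 4)]
X = rng.normal(size=(6, D))
print(classify_span(X, spans[0]))
```

Because the fuzzy boundary stage over-generates candidates, this second classification stage can reject spurious spans, which is how the two-stage design trades a lower miss rate for a cheap filtering step.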
The nested named entity recognition technique based on dynamic graph convolution provided by the invention performs feature extraction on a graph structure: sequence, grammatical and semantic feature information is transferred and fused topologically and updated iteratively, so the relations among text units are fully reflected, the multi-granularity features of the text are fully learned, and the accuracy and efficiency of candidate generation and category recognition are improved. Using the simple and efficient information storage of a graph structure, text units are mapped into feature space by spatial mapping; dynamic graph convolution fuses different semantic and grammatical information from the various graph structures; and the two-stage recognition strategy overcomes the high cost of enumeration-based approaches and the boundary blurring of hierarchical models.
Finally, it should be noted that the above-mentioned embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present invention.

Claims (6)

1. The nested named entity identification method based on dynamic graph convolution is characterized by comprising the following steps of:
s1: aiming at natural language texts, mapping and characterizing text features by adopting a knowledge representation technology;
s2: modeling a grammar relation graph by using a graph structure according to part-of-speech dependency information of the text;
s3: extracting attribute characteristics and semantic similarity characteristics of the text body by adopting a dynamic graph convolution mode;
s4: the two-stage recognition strategy is used for locating and classifying entities.
2. The nested named entity recognition method based on dynamic graph convolution according to claim 1, wherein the step S1 comprises the steps of:
s11: taking each given sequence in the dataset as a unit, where the dataset is text data and each sequence is a complete sentence ending with a period, representing each word in the sequence as a word matrix of character vectors via a convolutional neural network, then applying a conventional one-layer convolution to the word matrix and obtaining a character-level vector by max pooling;
s12: obtaining word-level vectors by adopting a BERT pre-training word vector table;
s13: splicing the obtained character-level and word-level vectors, and extracting context characteristics through a two-way long-short-time memory network to obtain a vector representation for finishing initialization;
s14: inputting the word sequence in reverse order into a long short-term memory (LSTM) network to obtain the reverse word vector representation, and concatenating the forward and reverse word coding results to obtain the output of the word context feature coding.
3. The nested named entity recognition method based on dynamic graph convolution according to claim 2, wherein the step S2 comprises:
s21: each word in the sentence sequence is used as a node in the graph, and a sequence edge is constructed for the front word node and the rear word node in the context according to the sequence relation, so that a sequence graph adjacency matrix is obtained; the edges have no directivity, and the information representing the positive and negative directions can be transmitted;
s22: part-of-speech relations are obtained by decoding with the part-of-speech parser in the NLTK library; edges are constructed between word nodes in high-frequency part-of-speech dependency relations, and the dependency strength is assigned to each edge as its weight, yielding the part-of-speech dependency graph adjacency matrix, where a high-frequency part-of-speech dependency relation is a dependency combination between parts of speech that satisfies a certain statistical frequency.
4. The nested named entity recognition method based on dynamic graph convolution according to claim 3, wherein the step S3 comprises:
s31: performing one to k rounds of graph convolution on the sequence graph and the part-of-speech dependency graph respectively, propagating and updating to obtain first-order to k-order neighbor information, wherein the specific number of propagation rounds is selected according to experimental results, k is a natural number, and the value of k is chosen empirically;
s32: dynamically sampling co-referent nodes with a bisecting K-means clustering algorithm and adding edges, defined as co-reference edges, between them, wherein co-referent nodes are nodes that share the same category label or are close to each other in the semantic space.
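Steps S31 and S32 can be sketched as below. The graph convolution omits trainable weight matrices (identity weights) to stay minimal, and a cosine-similarity threshold stands in for the patent's bisecting K-means sampling of co-referent nodes; both simplifications are assumptions of this sketch.

```python
import numpy as np

def gcn_propagate(A, X, k):
    """Step S31 sketch: k rounds of graph convolution with symmetric
    normalisation, so each node aggregates its up-to-k-hop neighborhood.
    Trainable weights are omitted to keep the sketch minimal."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    P = d_inv_sqrt @ A_hat @ d_inv_sqrt        # normalised propagation matrix
    for _ in range(k):
        X = P @ X
    return X

def add_coref_edges(A, X, threshold=0.95):
    """Step S32 sketch: add a co-reference edge between node pairs that
    are close in the semantic space. A cosine-similarity threshold
    stands in for the patent's bisecting K-means clustering."""
    A = A.copy()
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] == 0 and sim[i, j] >= threshold:
                A[i, j] = A[j, i] = 1.0        # co-reference edge
    return A

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-word chain
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
H = gcn_propagate(A, X, k=2)
A2 = add_coref_edges(A, X)
print(A2[0, 2])  # 1.0 -> nodes 0 and 2 are identical in feature space
```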
5. The nested named entity recognition method based on dynamic graph convolution according to claim 3, wherein the step S4 comprises:
s41: inputting the node feature vectors obtained by the feature extraction module into a classifier for label decoding, and dividing the boundary label of each word node into two classes: entity constituent and non-entity constituent;
s42: merging the nodes identified as entity constituents according to their adjacency to obtain candidate spans, and then normalizing the feature vector of each span;
s43: the second stage of the two-stage entity recognition model is category prediction; the obtained span feature vectors are input into the span category prediction module, which specifically applies a Softmax() function to the normalized input for category prediction.
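The two-stage decoding of steps S41–S43 can be sketched end to end as follows. The boundary logits and the span classifier are caller-supplied stand-ins for the trained classifiers, and span pooling by mean is an assumption of this sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_entities(node_feats, boundary_logits, class_logits_fn):
    """Two-stage decoding sketch (S41-S43):
    (1) classify each word node as entity constituent or not,
    (2) merge adjacent entity constituents into candidate spans and
        normalise the pooled span vector,
    (3) predict each span's category with Softmax."""
    is_entity = softmax(boundary_logits).argmax(axis=1) == 1   # step S41
    spans, start = [], None
    for i, flag in enumerate(list(is_entity) + [False]):       # sentinel False
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i - 1))                       # adjacency merge, S42
            start = None
    results = []
    for s, e in spans:
        v = node_feats[s:e + 1].mean(axis=0)                   # pooled span vector
        v = v / (np.linalg.norm(v) + 1e-8)                     # normalisation, S42
        probs = softmax(class_logits_fn(v))                    # category prediction, S43
        results.append(((s, e), int(probs.argmax())))
    return results

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))                 # 5 words, 4-dim features
logits = np.array([[2., 0.], [0., 2.], [0., 2.], [2., 0.], [2., 0.]])
ents = decode_entities(feats, logits, lambda v: np.array([0.1, 2.0, 0.3]))
print(ents)  # [((1, 2), 1)] -> one span covering words 1-2, category 1
```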
6. The nested named entity recognition method based on dynamic graph convolution according to claim 5, wherein the step S41 classifies the boundary labels of each word node into two classes, specifically:
dividing the boundary label of each word node into entity constituent and non-entity constituent by adopting a fuzzy boundary label strategy, with the calculation formula

P_b = Softmax(MLP(x_final))

where x_final denotes the sequence feature representation obtained by the feature extraction module, MLP(·) is a multi-layer perceptron, and the final boundary label classifier adopts a Softmax() function for classification prediction.
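The boundary formula P_b = Softmax(MLP(x_final)) can be sketched as below. A one-hidden-layer perceptron with random stand-in weights is assumed; the patent does not specify the MLP's depth or trained parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_boundary_probs(x_final, W1, b1, W2, b2):
    """Sketch of P_b = Softmax(MLP(x_final)): a one-hidden-layer
    perceptron followed by Softmax over the two boundary labels
    (entity constituent vs. non-constituent)."""
    h = np.maximum(0.0, x_final @ W1 + b1)   # ReLU hidden layer
    return softmax(h @ W2 + b2)              # (n_words, 2); each row sums to 1

rng = np.random.default_rng(1)
x_final = rng.standard_normal((6, 8))        # 6 words, 8-dim sequence features
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 2)), np.zeros(2)
P_b = mlp_boundary_probs(x_final, W1, b1, W2, b2)
print(P_b.shape)  # (6, 2)
```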
CN202310566702.5A 2023-05-19 2023-05-19 Nested named entity identification method based on dynamic graph convolution Pending CN116595982A (en)

Publications (1)

Publication Number Publication Date
CN116595982A true CN116595982A (en) 2023-08-15

Family

ID=87607659

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757216A (en) * 2023-08-15 2023-09-15 之江实验室 Small sample entity identification method and device based on cluster description and computer equipment
CN116757216B (en) * 2023-08-15 2023-11-07 之江实验室 Small sample entity identification method and device based on cluster description and computer equipment
CN118246453A (en) * 2024-05-20 2024-06-25 昆明理工大学 Nested entity recognition model based on graph convolution, construction method thereof and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination