CN114756679A - Chinese medical text entity relation combined extraction method based on conversation attention mechanism - Google Patents

Chinese medical text entity relation combined extraction method based on conversation attention mechanism

Info

Publication number
CN114756679A
CN114756679A (application CN202210315494.7A)
Authority
CN
China
Prior art keywords: entity, sentence, layer, input, entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210315494.7A
Other languages
Chinese (zh)
Inventor
黄杰
罗之宇
张蕾
万健
史斌彬
张丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lover Health Science and Technology Development Co Ltd
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd filed Critical Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN202210315494.7A priority Critical patent/CN114756679A/en
Publication of CN114756679A publication Critical patent/CN114756679A/en
Pending legal-status Critical Current

Classifications

    • G06F 16/35: Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 16/3347: Query execution using vector based model
    • G06F 16/367: Creation of semantic tools; Ontology
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253: Fusion techniques of extracted features
    • G06F 40/295: Named entity recognition
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06N 5/02: Knowledge representation; Symbolic representation

Abstract

The invention discloses a Chinese medical text entity-relation joint extraction method based on a conversation attention mechanism. By proposing feature fusion between a CLN layer and position information and introducing a Talking-Heads Attention mechanism, the invention lets the relations interact ("talk") with one another. The connection between entity types and relation types is thereby strengthened, and the accuracy of the model is greatly improved.

Description

Chinese medical text entity relation combined extraction method based on conversation attention mechanism
Technical Field
The invention belongs to the technical field of computer application, and relates to a Chinese medical text entity relationship joint extraction method based on a conversation attention mechanism.
Background
The medical knowledge graph is constructed from medical domain knowledge; it aims to systematically organize the knowledge contained in medical texts by establishing association relations among medical entities, which facilitates downstream data searching, mining and analysis. The medical field contains a large amount of textual information, and how to extract the required medical knowledge from medical texts to construct a knowledge graph has become a research focus.
Knowledge graph construction is inseparable from Information Extraction (IE), and the two difficult research tasks within IE are Named Entity Recognition (NER) and Relation Extraction (RE). With the rapid development of the Natural Language Processing (NLP) field, both pipeline approaches and joint extraction approaches have been proposed to address these two problems.
At present the traditional pipeline approach is widely used: entities are extracted first, and then the relations between them are identified. A traditional pipeline model is trained with gold entity labels, but at the relation extraction stage it consumes the output of the entity recognition model, so the distribution gap between the gold labels and the predicted entities degrades the relation extraction model. In fact, implicit connections exist between entity types and relation types, and the pipeline approach cannot exploit them. Furthermore, the pipeline approach extracts a relation for every entity pair, which wastes a great deal of information, and for the entity-relation overlap problem the traditional model offers no good solution. Joint extraction methods have therefore come into view, and they can effectively overcome the difficulties encountered by the traditional approach.
On the basis of joint entity-relation extraction, the method proposes feature fusion between a CLN layer and position information and introduces a Talking-Heads Attention mechanism, so that all relations interact ("talk") with one another. The connection between entity types and relation types is strengthened, and the accuracy of the model is greatly improved.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a joint extraction model that can be effectively applied to the medical field.
In order to achieve this purpose, the invention provides a Chinese medical text entity relation joint extraction method based on a conversation attention mechanism, which comprises the following steps:
step 1, inputting a sentence into the RoBERTa layer, fully extracting sentence features and mining the associations between words:
the sentence is input into the RoBERTa layer to fully extract sentence features and mine the associations between words; head entities and tail entities are extracted in the same step, and the relation types between the entities are predicted; the start and end of every input span are marked by pointers, converting the multi-span problem into N² classifications, where N is the sequence length; the sequence matrix produced by entity extraction is then processed by the CLN layer and the THA layer to complete the extraction of the triples;
step 2, extracting entities of the input sentence: following a cascaded pointer network, two cascaded modules are used to extract the triples, the two modules corresponding to the two contents of entity extraction and corresponding-relation extraction; entity extraction is performed on each input sentence and covers both head entities and tail entities; the extracted entity, namely the head entity, is input into the next module, all relations are traversed, and whether there exists a relation that can match the head entity to a tail entity is calculated;
and step 3, traversing all the different objects, inputting them into the subsequent module, and extracting the triples.
Further, in step 1, the RoBERTa layer performs feature extraction and sentence modeling based on a Transformer-based bidirectional encoding representation algorithm;
the input sentence is segmented and annotated, and a distributed representation of the sentence is produced:
X = {X_1, X_2, …, X_t, …, X_n}    #(1)
X_t = E_T + E_S + E_P    #(2)
each token representation comprises a word vector, a text vector and a position vector; in the formula, E_T denotes the word vector (token embedding), E_S denotes the text vector (E_seg embedding), and E_P denotes the position vector (E_pos embedding).
Further, in step 2 each input sentence is passed through a 12-layer RoBERTa encoder to obtain the encoding vector h, from which all entities in the input sentence, including head entities and tail entities, are extracted; a pointer network is initialized to assign a 0/1 binary tag to every tagging position; the 0/1 binary tags mark the start and end positions of the recognized entities, and the tagged entities are input as objects to the next-level module;
s_start = σ(W_start · h + b_start),   s_end = σ(W_end · h + b_end)
in the formula, s_start and s_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position; if a position probability exceeds the set threshold it is marked 1, otherwise 0; W_start and W_end denote the weights of the fully connected layer, with new weights updated through each input; b_start and b_end denote bias vectors, and σ is the sigmoid function used as the activation function;
the representation of all objects in the input sentence x is optimized by the following likelihood function;
[likelihood of the entity start/end tag sequences over the L positions of sentence x]
where L is the length of the sentence; in the output start and end sequences, the start position of an entity is marked 1, with R_1 = 1 and R_2 = 0; the end position of an entity is marked 1, with R_1 = 0 and R_2 = 1; the parameters are the weights and biases of the start/end taggers.
Further, in step 3, the text-generation scenario conditioned on a fixed-length vector fuses the condition into the β and γ of the normalization layer; the concrete implementation formula is:
CLN(h) = γ · (h - avg(h)) / std(h) + β
where avg is the mean of h and std is the standard deviation of h; β and γ are two dynamic matrices that are continually updated as the objects in the input sentence change;
before entering the THA layer, the output of the CLN layer is concatenated with the E_pos-Embedding used earlier during entity extraction;
H = Concat(CLN output, E_pos)
the newly derived mixed attention formulas are shown below:
[formulas defining J_i and O_i from the per-head attentions using the two parameter matrices λ_L and λ_W]
in the formulas, different Query, Key and Value weight matrices are used, each generated by random initialization; the word embeddings are then projected into different spaces through training; head_i denotes the computation result of the i-th feature (head), and J denotes the concatenation of the results J_i of all heads; J_i denotes each feature (head) associated with all the other features through two talking steps; O_i denotes the output talking-head attention result;
r_start = σ(W_start^r · o + b_start^r),   r_end = σ(W_end^r · o + b_end^r)
where r_start and r_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position, and o is the feature output by the THA layer; W_start^r and W_end^r denote the weights of the fully connected layer, with new weights updated through each input; b_start^r and b_end^r denote bias vectors, and σ is the sigmoid function used as the activation function;
the corresponding-relation representation of all objects in the input sentence x is optimized by the following likelihood function:
[likelihood of the relation-specific start/end tag sequences over the L positions of sentence x]
where L is the length of the sentence; in the output start or end sequence, the start position of the tail entity of the corresponding relation is marked 1, with I_1 = 1 and I_2 = 0; the end position of the tail entity is marked 1, with I_1 = 0 and I_2 = 1; the parameters are the weights and biases of the relation taggers;
for the training set D, the likelihood functions of the entities and relations of each sentence x_i are summed; the Adam optimizer is adopted and the value K is maximized to train the model; the learning rate of the optimizer is initially set to a larger value and then dynamically reduced as the number of steps increases, so that both efficiency and effectiveness are achieved; in the formula, T_i denotes all objects in the input sentence and T_r denotes all relations corresponding to the head entity;
[training objective K: the sum of the entity and relation log-likelihoods over the training set D]
the invention has the following beneficial effects: the invention makes a conversational interaction between the relations by proposing the idea of feature fusion between the CLN layer and the position information and introducing a Talking head association mechanism. The relation between the entity type and the relation type is strengthened, and the accuracy of the model is greatly improved.
Drawings
FIG. 1 is an overall architecture diagram of the C-THA model.
FIG. 2 shows RoBERTa sentence modeling.
FIG. 3 is a schematic diagram of the operation of the Conditional Layer Normalization module.
Detailed Description
The present invention is further described with reference to the following specific examples.
The technology provided by the invention is a Chinese medical text entity relation joint extraction method based on a conversation attention mechanism, which comprises the following steps:
Step 1, inputting a sentence into the RoBERTa layer, fully extracting sentence features and mining the associations between words:
The overall architecture of the model is roughly divided into two parts: the first part is the RoBERTa layer; the second part mainly consists of the CLN layer and the THA layer, which predict the corresponding object (tail) entity for each relation associated with a subject (head) entity. The overall architecture of the model is shown in FIG. 1. The sentence is input into the RoBERTa layer to fully extract sentence features and mine the associations between words. Head-entity and tail-entity extraction are placed in the same step, and the relation types between entities are predicted while the tail entities are extracted. The start and end of every input span are marked by pointer tagging, converting the multi-span problem into N² classifications (N is the sequence length). The sequence matrix produced by entity extraction is then processed by the CLN layer and the THA layer to complete the extraction of the triples.
Relative to BERT, RoBERTa introduces no major changes to the model but improves the details: longer training time, a larger batch size, more training data, and a dynamically adjusted masking mechanism. An additional output layer can be fine-tuned on top of the RoBERTa model, giving it excellent performance in downstream tasks. RoBERTa is, overall, a bidirectional encoding representation algorithm based on the Transformer, used here for feature extraction and sentence modeling. As an auto-encoding model it can reconstruct the original data from noisy data: part of the words in the corpus are masked out and annotated with [MASK] symbols, and a 12-layer Transformer encoder predicts them according to the language model. The training flow is shown in FIG. 2.
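For illustration only, a minimal Python sketch of how a 12-layer Chinese RoBERTa encoder turns a sentence into the token representations h used by the later layers is given below; the HuggingFace transformers library and the checkpoint name "hfl/chinese-roberta-wwm-ext" are assumptions of the sketch and are not specified in the patent (whose reference implementation is Keras-based).

```python
# Illustrative sketch only: the library (HuggingFace transformers) and the checkpoint
# name below are assumptions, not taken from the patent.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
encoder = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")  # 12-layer encoder

sentence = "胰腺癌的临床表现为胰腺肿块"  # "the clinical manifestation of pancreatic cancer is a pancreatic mass"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    h = encoder(**inputs).last_hidden_state  # (1, sequence length, hidden size) token vectors
print(h.shape)
```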
The input sentence is segmented and annotated, and a distributed representation of the sentence is produced:
X = {X_1, X_2, …, X_t, …, X_n}    #(1)
X_t = E_T + E_S + E_P    #(2)
Each token representation contains a word vector, a text vector and a position vector. In the formula, E_T denotes the word vector (token embedding), E_S denotes the text vector (E_seg embedding), and E_P denotes the position vector (E_pos embedding).
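A minimal numpy sketch of formula (2) follows; the vocabulary size, segment count, sequence length and the random lookup tables are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Minimal sketch of formula (2): X_t = E_T + E_S + E_P. All sizes are illustrative.
vocab_size, num_segments, max_len, dim = 1000, 2, 128, 768
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(vocab_size, dim))      # E_T lookup table (word vectors)
segment_emb = rng.normal(size=(num_segments, dim))  # E_S lookup table (text vectors)
position_emb = rng.normal(size=(max_len, dim))      # E_P lookup table (position vectors)

def embed(token_ids, segment_ids):
    """Return X = {X_1, ..., X_n} with X_t = E_T + E_S + E_P (formula (2))."""
    positions = np.arange(len(token_ids))
    return token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]

x = embed(np.array([101, 57, 214, 102]), np.array([0, 0, 0, 0]))
print(x.shape)  # (4, 768)
```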
Step 2, extracting entities of the input sentence:
The method is based on a cascaded pointer network: two cascaded modules are used to extract the triples, the two modules corresponding to entity extraction and corresponding-relation extraction. Entity extraction is performed on each input sentence and covers both head entities and tail entities. The extracted entity, i.e. the head entity, is input into the next module, all relations are traversed, and it is calculated whether there exists a relation that can match the head entity to a tail entity.
Each input sentence is passed through a 12-layer RoBERTa encoder to obtain the encoding vector h, from which all entities in the input sentence, including head entities and tail entities, are extracted. At a deeper level this is essentially a binary classification problem: a pointer network is initialized to assign a 0/1 binary tag to every tagging position. The 0/1 binary tags indicate the start and end positions of the recognized entities, which are input as objects to the next-level module.
s_start = σ(W_start · h + b_start),   s_end = σ(W_end · h + b_end)
In the formula, s_start and s_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position. If a position probability exceeds the set threshold it is marked 1, otherwise 0. W_start and W_end denote the weights of the fully connected layer, with new weights updated through each input. b_start and b_end denote bias vectors, and σ is the sigmoid function used as the activation function.
The representation of all objects in the input sentence x is optimized by the following likelihood function.
[likelihood of the entity start/end tag sequences over the L positions of sentence x]
Where L is the length of the sentence. In the output start and end sequences, the start position of an entity is marked 1, with R_1 = 1 and R_2 = 0; the end position of an entity is marked 1, with R_1 = 0 and R_2 = 1. The parameters are the weights and biases of the start/end taggers.
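The 0/1 pointer tagging described above can be sketched as follows; the weights are random stand-ins for the trained fully connected layer, and the 0.5 threshold is an assumption (the patent only speaks of a set threshold).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch of the 0/1 start/end pointer tagging over the encoder output h.
L, dim = 12, 768                      # sentence length and encoder width (illustrative)
h = np.random.randn(L, dim)           # RoBERTa output for one sentence
W_start, b_start = np.random.randn(dim) * 0.01, 0.0
W_end,   b_end   = np.random.randn(dim) * 0.01, 0.0

s_start = sigmoid(h @ W_start + b_start)   # probability of each position being a start
s_end   = sigmoid(h @ W_end   + b_end)     # probability of each position being an end

threshold = 0.5                            # assumed value
start_tags = (s_start > threshold).astype(int)
end_tags   = (s_end   > threshold).astype(int)

# Pair each start with the nearest following end to recover entity spans.
entities = []
for i in np.where(start_tags == 1)[0]:
    ends = np.where(end_tags[i:] == 1)[0]
    if len(ends):
        entities.append((int(i), int(i + ends[0])))
print(start_tags, end_tags, entities)
```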
Step 3, traversing all the different objects and inputting them into the subsequent module to extract the triples:
The CLN layer stands for Conditional Layer Normalization. The method takes a fixed-length vector as the condition of a text-generation scenario and fuses the condition into the β and γ of the normalization layer (Layer Normalization). Its working principle is shown in FIG. 3, and the concrete implementation formula is:
CLN(h) = γ · (h - avg(h)) / std(h) + β
Where avg is the mean of h and std is the standard deviation of h. β and γ are two dynamic matrices that are continually updated as the objects in the input sentence change. For the model of this method, fixed-length vectors are used as the initial, unconditional β and γ.
That is, the method transforms the input condition object to the same dimensionality as β and γ through two different all-zero-initialized transformation matrices, and then adds the two transformation results to β and γ respectively. In this state the model remains identical to the original pre-trained model.
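A minimal sketch of Conditional Layer Normalization as described above: the condition vector is projected by two zero-initialized matrices and added to the unconditional β and γ, so that the untrained model behaves exactly like plain Layer Normalization. The dimensions and the base β/γ values are assumptions of the sketch.

```python
import numpy as np

def conditional_layer_norm(h, condition, W_gamma, W_beta, eps=1e-6):
    """CLN sketch: gamma/beta of layer normalisation are shifted by projections of
    the fixed-length condition vector (e.g. the extracted head entity)."""
    gamma = 1.0 + condition @ W_gamma      # base gamma assumed to be all ones
    beta = 0.0 + condition @ W_beta        # base beta assumed to be all zeros
    avg = h.mean(axis=-1, keepdims=True)
    std = h.std(axis=-1, keepdims=True)
    return gamma * (h - avg) / (std + eps) + beta

dim = 768
h = np.random.randn(10, dim)               # sentence representation (L x dim)
subject_vec = np.random.randn(dim)          # fixed-length condition vector
W_gamma = np.zeros((dim, dim))              # zero init keeps the model equal to plain LayerNorm at start
W_beta = np.zeros((dim, dim))
out = conditional_layer_norm(h, subject_vec, W_gamma, W_beta)
print(out.shape)  # (10, 768)
```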
Before entering the THA layer, the method concatenates the output of the CLN layer with the E_pos-Embedding used earlier during entity extraction. This idea not only reuses the relevant parameters from entity recognition, but also improves the accuracy of relation extraction.
H = Concat(CLN output, E_pos)
The THA layer stands for Talking-Head Attention. The original multi-head attention focuses only on the expression of each individual feature (head); the operations of the heads are isolated from one another, and stronger attention can be obtained by letting the heads talk to each other. The original formulas are as follows:
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V    #(7)
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)    #(8)
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O    #(9)
On this basis, the Talking-Head Attention mechanism uses two parameter matrices λ_L and λ_W to re-fuse the multi-head attentions into multiple mixed attentions, each newly obtained mixed attention fusing the attentions of the original heads. The formulas are as follows:
[formulas defining J_i and O_i from the per-head attentions using the two parameter matrices λ_L and λ_W]
In the formulas, different Query, Key and Value weight matrices are used, each generated by random initialization; the word embeddings are then projected into different spaces through training. head_i denotes the computation result of the i-th feature (head), and J denotes the concatenation of the results J_i of all heads; J_i denotes each feature (head) associated with all the other features through two talking steps; O_i denotes the output Talking-Head Attention result.
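The sketch below follows the published Talking-Heads Attention formulation (attention logits mixed across heads by λ_L before the softmax, attention weights mixed by λ_W after it), which matches the two "talking" steps described above; the exact formulas of the patent are filed as images, so this is an illustration rather than the patent's own equations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Talking-Heads Attention sketch: every head can "talk" to all the others twice.
num_heads, L, d_k, d_v = 4, 10, 16, 16
rng = np.random.default_rng(1)
Q = rng.normal(size=(num_heads, L, d_k))
K = rng.normal(size=(num_heads, L, d_k))
V = rng.normal(size=(num_heads, L, d_v))
lambda_L = rng.normal(size=(num_heads, num_heads))  # mixes attention logits across heads
lambda_W = rng.normal(size=(num_heads, num_heads))  # mixes attention weights across heads

logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)               # (heads, L, L)
mixed_logits = np.einsum('hij,hg->gij', logits, lambda_L)       # first "talking" step
J = softmax(mixed_logits, axis=-1)                              # per-head attention weights J_i
mixed_weights = np.einsum('hij,hg->gij', J, lambda_W)           # second "talking" step
O = mixed_weights @ V                                           # O_i, one output per head
out = O.transpose(1, 0, 2).reshape(L, num_heads * d_v)          # concatenate the heads
print(out.shape)  # (10, 64)
```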
r_start = σ(W_start^r · o + b_start^r),   r_end = σ(W_end^r · o + b_end^r)
In the formulas, r_start and r_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position, and o is the feature output by the THA layer. If a position probability exceeds the set threshold it is marked 1, otherwise 0. W_start^r and W_end^r denote the weights of the fully connected layer, with new weights updated through each input. b_start^r and b_end^r denote bias vectors, and σ is the sigmoid function used as the activation function.
The corresponding-relation representation of all objects in the input sentence x is optimized by the following likelihood function:
[likelihood of the relation-specific start/end tag sequences over the L positions of sentence x]
Where L is the length of the sentence. In the output start or end sequence, the start position of the tail entity of the corresponding relation is marked 1, with I_1 = 1 and I_2 = 0; the end position of the tail entity is marked 1, with I_1 = 0 and I_2 = 1. The parameters are the weights and biases of the relation taggers.
In FIG. 1 we can see that, for each output sentence, all entities are matched one by one against the schema, and a start/end matrix over all relations is constructed. As shown in the figure, the entity "pancreatic cancer" is compared with the relations "imaging examination", "age of onset", "clinical manifestation", etc., to find the tail entity with the highest probability of forming a triple. Finally, the two triples "pancreatic cancer - imaging examination - ultrasound examination" and "pancreatic cancer - clinical manifestation - pancreatic mass" are found.
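A toy decoding sketch of this matching step is given below; the relation names, the character-level tokens and the probability values simply mirror the pancreatic-cancer example of FIG. 1 and are illustrative assumptions.

```python
import numpy as np

# For one head entity, the relation module outputs a start matrix and an end matrix
# of shape (number of relations, sentence length); thresholding them and pairing
# starts with ends yields (head, relation, tail) triples.
relations = ["影像学检查", "发病年龄", "临床表现"]   # imaging exam, onset age, clinical manifestation
tokens = list("胰腺癌的临床表现为胰腺肿块")
L, R = len(tokens), len(relations)
r_start = np.zeros((R, L)); r_end = np.zeros((R, L))
r_start[2, 9] = 0.9; r_end[2, 12] = 0.8                 # toy probabilities for "临床表现"

def decode(head, r_start, r_end, threshold=0.5):
    triples = []
    for r in range(R):
        for i in np.where(r_start[r] > threshold)[0]:
            ends = np.where(r_end[r, i:] > threshold)[0]
            if len(ends):
                tail = "".join(tokens[i:i + ends[0] + 1])
                triples.append((head, relations[r], tail))
    return triples

print(decode("胰腺癌", r_start, r_end))  # [('胰腺癌', '临床表现', '胰腺肿块')]
```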
For the training set D, the likelihood functions of the entities and relations of each sentence x_i are summed. The model is trained with the Adam optimizer, maximizing the value K. The learning rate of the optimizer is initially set to a larger value and then dynamically reduced as the number of steps increases, so that both efficiency and effectiveness are achieved. In the formula, T_i denotes all objects in the input sentence and T_r denotes all relations corresponding to the head entities.
[training objective K: the sum of the entity and relation log-likelihoods over the training set D]
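Since the start/end tags are 0/1 labels, maximizing the likelihood K is equivalent to minimizing a summed binary cross-entropy; the Keras sketch below illustrates this together with an Adam optimizer whose learning rate starts larger and decays over time. The concrete learning rate and decay schedule are assumptions, not values from the patent.

```python
import tensorflow as tf
from tensorflow import keras

# Sketch of the training objective: entity start/end tags and relation-specific tail
# tags are all 0/1 labels, so the summed likelihood can be trained as a summed
# binary cross-entropy.
bce = keras.losses.BinaryCrossentropy()

def joint_loss(y_true_entity, y_pred_entity, y_true_relation, y_pred_relation):
    # sum of the entity-module loss and the relation-module loss
    return bce(y_true_entity, y_pred_entity) + bce(y_true_relation, y_pred_relation)

lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,   # "set to a larger value" (assumed figure)
    decay_steps=1000,             # dynamically reduced as the step count grows
    decay_rate=0.9)
optimizer = keras.optimizers.Adam(learning_rate=lr_schedule)
```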
The experimental process is as follows:
The experiments use the Baidu2019 and CHIP2020 data sets. Baidu2019 is the largest industry-scale Chinese information extraction data set based on triple schemas, containing 50 predefined schemas, 210,000 Chinese sentences and 430,000 triples. The sentences in the data set come from Baidu Baike and Baidu news-feed texts. The data set is divided into 170,000 training sentences, 20,000 validation sentences and 20,000 test sentences. CHIP-2020 is a schema-based Chinese medical information extraction data set jointly constructed by the Natural Language Processing Laboratory of Zhengzhou University and the Key Laboratory of Computational Linguistics (Ministry of Education) of Peking University. CHIP-2020 is currently a Chinese medical data set with high annotation quality and high full-text coverage, containing nearly 2 million disease statements about 109 common diseases.
The statistics of the two data sets are shown in Table 1. Both Baidu2019 and CHIP-2020 exhibit the entity-overlap problem; in CHIP-2020 in particular, overlapping cases account for over 60% of the data set. In terms of the entities that need to be identified, the Baidu2019 data set is easier than CHIP2020: CHIP2020 uses more medical terminology and has a higher overlap rate. In terms of relation overlap, the sentences of the Baidu2019 data set are all single sentences, whereas most of CHIP2020 consists of spliced sentences that emphasize contextual relations, making relation identification more difficult. The baseline in this work is established with the Casrel model, and the differences from other models are compared in the general domain on the Baidu2019 data set. For the medical text data set CHIP-2020, the effectiveness of each module of the model presented here is compared against the baseline model.
Table 1 data set statistics
In the experiments, precision, recall and the F1 score are used as the comprehensive evaluation indices of the extraction results. The concrete formulas are:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 · Precision · Recall / (Precision + Recall)
Where TP is the number of correctly predicted triples, FP is the number of predicted triples that are not in the gold set, and FN is the number of gold triples that were not predicted.
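A minimal sketch of how these three indices can be computed from predicted and gold triple sets (exact-match comparison of triples is an assumption of the sketch):

```python
def prf1(predicted, gold):
    """Precision, recall and F1 over two sets of (head, relation, tail) triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)          # correctly predicted triples
    fp = len(predicted - gold)          # predicted but not in the gold set
    fn = len(gold - predicted)          # gold triples that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(prf1([("胰腺癌", "临床表现", "胰腺肿块")],
           [("胰腺癌", "临床表现", "胰腺肿块"), ("胰腺癌", "影像学检查", "超声检查")]))
# (1.0, 0.5, 0.666...)
```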
In this experiment, the joint extraction model is based on the Keras framework. The hardware and software environments are shown in Table 2.
Table 2 experimental environment configuration
In the experiments, the Baidu2019 data set is used to verify the joint extraction capability of the model and to compare it with other joint models, proving the effectiveness of the model. In addition, ablation experiments are performed on CHIP2020, focusing on the contribution of the RoBERTa encoder, the CLN layer and the THA layer to the improvement of model performance and evaluating how well the two modules cooperate.
TABLE 3 comparison of different models on the Baidu2019 dataset
Table 4 comparison of different models on CHIP2020 dataset
The results comparing the model presented here with other models on the Baidu2019 data set are shown in Table 3. MultiR combines a sentence-level extraction model with a simple corpus-level module to aggregate single entities. CoType uses a text segmentation algorithm when extracting entities and, for subsequent entity and relation extraction, embeds text features and type labels into two low-dimensional spaces belonging to entities and relations respectively. Multi-head selection proposes a joint neural model that performs entity recognition and relation extraction simultaneously without any manually extracted features or external tools; the entity recognition task and the relation extraction task are modeled as a multi-head selection problem, i.e. each entity may participate in multiple relations. Casrel models relations as functions that map head entities to tail entities, rather than treating them as labels on entity pairs. ETL-Span is a span-based tagging scheme that decomposes the two subtasks of entity recognition and relation extraction into several sequence labeling problems and solves them with a hierarchical boundary tagger and a multi-span decoding algorithm.
From the experimental results on the Baidu2019 data set, MultiR performs poorly, mainly because the algorithm does not handle the overlap problem well. CoType, Multi-head selection, Casrel and ETL-Span improve on it mainly by targeting the entity-relation overlap problem, and with the global semantic information encoded by BERT their overall effect is much better than that of MultiR. The model presented here, however, not only far surpasses the baseline model in precision and recall, but also reaches an F1 value of 0.819, above each of the other models. Compared with the other models, it exploits the connection between entities and relations more fully and is better suited to the overlap problem. Therefore, to compare the effect of the individual modules of the model, validation was performed on the CHIP-2020 data set, as shown in Table 4.
From the perspective of the ablation experiments, the results on CHIP2020 are less favorable because of the sentence-splicing and entity-overlap problems in the data set; directly applying a conventional joint extraction model to this data set does not work well, and Table 4 shows that the F1 value of the baseline model is relatively low. Using RoBERTa embeddings raises the F1 value of the model by 0.04. The CLN-layer and THA-layer modules bring a larger improvement over the baseline model, effectively exploiting the interrelation between entity recognition and relation extraction. The F1 value of our model on CHIP2020 finally rises to 0.64, which is 0.16 higher than the baseline model.

Claims (4)

1. A Chinese medical text entity relation combined extraction method based on a conversation attention mechanism is characterized by comprising the following steps:
step 1, inputting a sentence into the RoBERTa layer, fully extracting sentence features and mining the associations between words:
the sentence is input into the RoBERTa layer to fully extract sentence features and mine the associations between words; head entities and tail entities are extracted in the same step, and the relation types between the entities are predicted; the start and end of every input span are marked by pointers, converting the multi-span problem into N² classifications, where N is the sequence length; the sequence matrix produced by entity extraction is then processed by the CLN layer and the THA layer to complete the extraction of the triples;
step 2, extracting entities of the input sentence: following a cascaded pointer network, two cascaded modules are used to extract the triples, the two modules corresponding to the two contents of entity extraction and corresponding-relation extraction; entity extraction is performed on each input sentence and covers both head entities and tail entities; the extracted entity, namely the head entity, is input into the next module, all relations are traversed, and whether there exists a relation that can match the head entity to a tail entity is calculated;
and step 3, traversing all the different objects, inputting them into the subsequent module, and extracting the triples.
2. The method for jointly extracting Chinese medical text entity relations based on the conversation attention mechanism as claimed in claim 1, wherein: in step 1, the RoBERTa layer performs feature extraction and sentence modeling based on a Transformer-based bidirectional encoding representation algorithm;
the input sentence is segmented and annotated, and a distributed representation of the sentence is produced:
X = {X_1, X_2, …, X_t, …, X_n}    #(1)
X_t = E_T + E_S + E_P    #(2)
each token representation comprises a word vector, a text vector and a position vector; in the formula, E_T denotes the word vector (token embedding), E_S denotes the text vector (E_seg embedding), and E_P denotes the position vector (E_pos embedding).
3. The method for jointly extracting Chinese medical text entity relations based on the conversation attention mechanism as claimed in claim 1 or 2, wherein: in step 2, each input sentence is passed through a 12-layer RoBERTa encoder to obtain the encoding vector h, from which all entities in the input sentence, including head entities and tail entities, are extracted; a pointer network is initialized to assign a 0/1 binary tag to every tagging position; the 0/1 binary tags mark the start and end positions of the recognized entities, and the tagged entities are input as objects to the next-level module;
s_start = σ(W_start · h + b_start),   s_end = σ(W_end · h + b_end)
in the formula, s_start and s_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position; if a position probability exceeds the set threshold it is marked 1, otherwise 0; W_start and W_end denote the weights of the fully connected layer, with new weights updated through each input; b_start and b_end denote bias vectors, and σ is the sigmoid function used as the activation function;
the representation of all objects in the input sentence x is optimized by the following likelihood function;
[likelihood of the entity start/end tag sequences over the L positions of sentence x]
where L is the length of the sentence; in the output start and end sequences, the start position of an entity is marked 1, with R_1 = 1 and R_2 = 0; the end position of an entity is marked 1, with R_1 = 0 and R_2 = 1; the parameters are the weights and biases of the start/end taggers.
4. The method for jointly extracting Chinese medical text entity relations based on the conversation attention mechanism as claimed in claim 1 or 2, wherein in step 3 the text-generation scenario conditioned on a fixed-length vector fuses the condition into the β and γ of the normalization layer; the concrete implementation formula is:
CLN(h) = γ · (h - avg(h)) / std(h) + β
where avg is the mean of h and std is the standard deviation of h; β and γ are two dynamic matrices that are continually updated as the objects in the input sentence change;
before entering the THA layer, the output of the CLN layer is concatenated with the E_pos-Embedding used earlier during entity extraction;
H = Concat(CLN output, E_pos)
the newly derived mixed attention formulas are shown below:
[formulas defining J_i and O_i from the per-head attentions using the two parameter matrices λ_L and λ_W]
in the formulas, different Query, Key and Value weight matrices are used, each generated by random initialization; the word embeddings are then projected into different spaces through training; head_i denotes the computation result of the i-th feature (head), and J denotes the concatenation of the results J_i of all heads; J_i denotes each feature (head) associated with all the other features through two talking steps; O_i denotes the output talking-head attention result;
r_start = σ(W_start^r · o + b_start^r),   r_end = σ(W_end^r · o + b_end^r)
where r_start and r_end denote the outputs, i.e. the sets of probabilities of every position being a start or an end position, and o is the feature output by the THA layer; W_start^r and W_end^r denote the weights of the fully connected layer, with new weights updated through each input; b_start^r and b_end^r denote bias vectors, and σ is the sigmoid function used as the activation function;
the corresponding-relation representation of all objects in the input sentence x is optimized by the following likelihood function:
[likelihood of the relation-specific start/end tag sequences over the L positions of sentence x]
where L is the length of the sentence; in the output start or end sequence, the start position of the tail entity of the corresponding relation is marked 1, with I_1 = 1 and I_2 = 0; the end position of the tail entity is marked 1, with I_1 = 0 and I_2 = 1; the parameters are the weights and biases of the relation taggers;
for the training set D, the likelihood functions of the entities and relations of each sentence x_i are summed; the Adam optimizer is adopted and the value K is maximized to train the model; the learning rate of the optimizer is initially set to a larger value and then dynamically reduced as the number of steps increases, so that both efficiency and effectiveness are achieved; in the formula, T_i denotes all objects in the input sentence and T_r denotes all relations corresponding to the head entity;
[training objective K: the sum of the entity and relation log-likelihoods over the training set D]
CN202210315494.7A 2022-03-28 2022-03-28 Chinese medical text entity relation combined extraction method based on conversation attention mechanism Pending CN114756679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210315494.7A CN114756679A (en) 2022-03-28 2022-03-28 Chinese medical text entity relation combined extraction method based on conversation attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210315494.7A CN114756679A (en) 2022-03-28 2022-03-28 Chinese medical text entity relation combined extraction method based on conversation attention mechanism

Publications (1)

Publication Number Publication Date
CN114756679A true CN114756679A (en) 2022-07-15

Family

ID=82327471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210315494.7A Pending CN114756679A (en) 2022-03-28 2022-03-28 Chinese medical text entity relation combined extraction method based on conversation attention mechanism

Country Status (1)

Country Link
CN (1) CN114756679A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894436A (en) * 2023-09-06 2023-10-17 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition
CN116894436B (en) * 2023-09-06 2023-12-15 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition


Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination