CN117151102A - Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement

Info

Publication number
CN117151102A
Authority
CN
China
Legal status
Pending
Application number
CN202311037068.2A
Other languages
Chinese (zh)
Inventor
黄泽昊
石云成
刘琼
王笳辉
谢文飞
段亮
岳昆
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Application filed by Yunnan University YNU
Priority to CN202311037068.2A
Publication of CN117151102A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

Extracting entity relations from traditional Chinese medicine (TCM) text data, such as electronic medical records, prescriptions, clinical records and literature, is essential for building TCM knowledge graphs efficiently and accurately. Because entity relations in TCM text are complex, the application performs entity alignment that uses TCM auxiliary features to handle the common phenomenon of one TCM entity having several names. To address the long-distance dependency problem in TCM entity relation reasoning, it provides a heuristic-rule-based method for constructing document-level reasoning paths between TCM entities. It then introduces TCM prescription information and enhances the features of the reasoning paths with domain knowledge, so that the paths carry more complete TCM characteristics. The method can effectively improve the accuracy of document-level TCM entity relation extraction, lays a foundation for analyzing and understanding TCM text data, and provides technical support for learning and disseminating TCM knowledge and for promoting TCM clinical diagnosis and treatment.

Description

Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement
Technical Field
The application discloses a traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement. It relates to document-level relation extraction from TCM data using deep neural networks and belongs to the technical field of information extraction.
Background Art
In the TCM field, extracting entity relations from TCM text data such as electronic medical records, prescriptions, clinical records and classical literature lays a foundation for analyzing and understanding that data, and can effectively promote drug research and development, clinical diagnosis and the advancement of TCM research; it therefore has significant practical value. At present, extracting relations between TCM entities generally requires medical domain expertise, and manual annotation cannot keep up with the urgent demand for knowledge extraction as data volumes keep growing. An artificial intelligence model is therefore needed to help TCM researchers and clinicians rapidly extract entity relations from TCM texts such as electronic medical records, prescriptions, clinical records and classical literature. In addition, comparing extracted relations against known ones can reveal new entity relations, promoting research and innovation in TCM.
Entity relation extraction generally refers to extracting entities and the relations between them from semi-structured or unstructured text. Known methods add prior information on top of a pre-trained language model and can effectively extract entity relations within a single sentence. For example, Jing Shenqi et al. (Research on medical entity relation extraction based on medical domain knowledge and distant supervision [J], Data Analysis and Knowledge Discovery, 2022) use a medical BERT, obtained by extending the standard BERT model, to encode annotated low-noise data acquired by distant supervision, and take the textual descriptions of entities in an external knowledge base as prior information, thereby improving the relation extraction model. Wang Haitao (<Method and device for extracting Chinese entities and relations based on span and attention mechanism, patent CN202210816017.9>, 2022) maps the span set obtained by segmenting a natural-language sentence into a set of word vectors, generates feature representations with a pre-trained model, fuses them with an attention mechanism, computes the entity type of each span with a classifier, and inserts the predicted type before and after the span to form a span carrying boundary and type information, which then serves as prior information for relation extraction. However, these methods only extract relations within a single sentence; they cannot reason about document-level relations across multiple sentences, which limits their effectiveness in practice.
Document-level relation extraction performs relation reasoning over contextual information on top of entity relation extraction, so that relations can be extracted from long texts. Known path-reasoning-based document-level methods obtain relation information between entity pairs in long documents. For example, Zhao Tiejun et al. (<Document-level relation extraction method based on a graph neural network and inference paths, patent CN202210617790.2>, 2022) build a heterogeneous graph from sentences, entities and their mentions, and use an attention mechanism to fuse the features of multiple paths between an entity pair, representing the global features of the pair in the graph. HangTing et al. (<Document-level relation extraction method based on selective attention and path reasoning, patent CN202211134776.3>, 2022) obtain mention-pair representations and the corresponding sentence-pair representations with a bidirectional long short-term memory network (BiLSTM) and a multi-layer perceptron, and build an intra-sentence relation graph and an inter-sentence relation graph with a multi-layer recurrent network. On the document graph formed by these two subgraphs, they use selective attention to select the sentences relevant to the entity pair and aggregate them into a document subgraph, which defines the extraction scope of the target relation and its reasoning path.
In TCM text data, the association between an entity pair is usually found in a local neighborhood: the preceding sentence, the current sentence and the following sentence. This local information is therefore important for extracting relations between TCM entity pairs. General models, however, reason over global information, which easily introduces excessive noise and degrades the quality of the extracted relations.
To overcome these shortcomings of the prior art, the application provides a heuristic-rule-based method for document-level TCM entity relation extraction. A TCM entity alignment method resolves the common phenomenon of one entity having several names and improves the quality of entity recognition. Heuristic rules find supporting information between entity pairs and construct reasoning paths, improving the accuracy of the extraction results. TCM prescription knowledge is then introduced to enhance the local reasoning paths and improve their feature expressiveness. Finally, relation reasoning is performed on the enhanced paths, improving the accuracy of relation prediction.
Disclosure of Invention
1. Object of the application
To address the complex semantics, multi-text consistency and difficulty of relation extraction in TCM document data such as electronic medical records, prescriptions, clinical records and classical literature, the application distills three core TCM entity relation paradigms ('phenomenon expression', 'treated' and 'treatment method'), extracts reasoning paths with heuristic rules, applies an entity alignment method, and introduces TCM prescription information to enhance the paths. This effectively improves the accuracy of relation extraction on cross-text TCM data and provides technical support for TCM research and applications.
2. Steps of the application
The application is executed in the following 3 steps:
(1) Construction of the TCM dataset: collect TCM document data, align entity mentions, and annotate the medical entities together with their positions, relations and evidence sentences.
(2) TCM entity path enhancement: construct paths over the dataset annotated in step (1) with heuristic rules, and introduce TCM prescription knowledge to enhance the paths.
(3) TCM relation extraction: feed the paths enhanced in step (2) into a BioBERT model for training, compute the loss function, iteratively update the model weights, and use the trained model to extract TCM relations.
The specific steps are as follows:
1: Construction of the TCM dataset
1.1: TCM entity alignment
TCM concepts such as prescriptions, medicinal materials and symptoms often have several names, i.e. one TCM entity has multiple synonymous or near-synonymous expressions. To handle this, the application constructs a public TCM entity library $M = [m'_1, m'_2, \dots, m'_z]$, where $z$ ($z > 0$) is the total number of entities in $M$, and uses the aliases recorded for an entity in the library as a bias term that enhances its feature representation. The $i$-th entity $m'_i$ in $M$ consists of the entity name, the type name and an alias set $T_i$ of size $\Phi$. The TCM document dataset is denoted $D = [d_1, d_2, \dots, d_n]$, where $n$ ($n > 0$) is the number of TCM documents and $d_i$ is one piece of text data in the dataset.
Word segmentation of $d_i$ gives its word set $W_i$. For any word $h_k \in W_i$ that matches an entity $m'_i$ in $M$, the alias set $T_i$ is traversed, and the feature vector $v_{b_l}$ of each alias $b_l$ ($1 \le l \le \Phi$) is obtained with the open-source biomedical pre-trained model BioBERT. These alias vectors give the alias bias feature $\beta_{h_k}$ of $h_k$.
If $h_k \notin M$, $\beta_{h_k}$ is empty (a zero vector). After supplementing these auxiliary features, the application uses cosine similarity to compute the similarity between a noun $h_k$ and every other word $h_j$ in $W_i$, to decide whether the TCM term $h_k$ has synonyms elsewhere in $d_i$. With $v_{h_k}$ and $v_{h_j}$ the BioBERT feature vectors of $h_k$ and of any other word $h_j$, the cosine similarity is

$$\mathrm{sim}(h_k, h_j) = \frac{(v_{h_k} + \beta_{h_k}) \cdot (v_{h_j} + \beta_{h_j})}{\lVert v_{h_k} + \beta_{h_k} \rVert \, \lVert v_{h_j} + \beta_{h_j} \rVert}$$

where "$\cdot$" is the vector dot product, "$\lVert \cdot \rVert$" the Euclidean norm, and $\beta_{h_k}$ and $\beta_{h_j}$ the bias terms of the TCM entities; the bias term of a word not in $M$ is 0.
For $h_k$ and $h_j$ in $d_i$, given a similarity threshold $\psi$ ($\psi > 0$): if $\mathrm{sim}(h_k, h_j) > \psi$, the word $h_j$ is added as a mention of entity $h_k$, and entity alignment is then performed on all mentions in the whole text:

$$g(h_j) = m'_i$$

where $g(\cdot)$ denotes the entity alignment method and $g(h_j) = m'_i$ means that $h_j$ finally refers to the entity $m'_i$ in the public TCM entity library.
After all entities and their mentions are aligned, each text $d_i$ is updated to obtain the aligned dataset $D' = [d'_1, d'_2, \dots, d'_n]$. Entities are thus identified more consistently, which improves the accuracy of relation extraction.
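The similarity test and threshold rule of step 1.1 can be sketched in Python. This is a minimal illustration, not the application's implementation: it assumes the BioBERT vectors and alias bias terms are already computed (plain lists stand in for them), that the bias term is added element-wise to a word's vector, and that a word with no library entry has a zero bias. `biased_cosine` and `align_mentions` are hypothetical helper names.

```python
import numpy as np


def biased_cosine(v_k, beta_k, v_j, beta_j):
    """Cosine similarity of two word vectors after adding their alias bias terms."""
    a = np.asarray(v_k, dtype=float) + np.asarray(beta_k, dtype=float)
    b = np.asarray(v_j, dtype=float) + np.asarray(beta_j, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def align_mentions(entity, words, vectors, biases, psi=0.8):
    """Map every other word whose biased similarity to `entity` exceeds psi onto `entity`.

    vectors: word -> feature vector (stand-in for BioBERT output)
    biases:  word -> alias bias vector; words missing here get a zero bias
    """
    zero = np.zeros(len(vectors[entity]))
    mapping = {}
    for word in words:
        if word == entity:
            continue
        sim = biased_cosine(vectors[entity], biases.get(entity, zero),
                            vectors[word], biases.get(word, zero))
        if sim > psi:
            mapping[word] = entity
    return mapping


# Toy vectors: "B" points almost the same way as "A", "C" is orthogonal to it.
vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
mapping = align_mentions("A", ["A", "B", "C"], vectors, biases={})
```

With an empty bias dictionary only "B" clears the 0.8 threshold; a non-zero bias can pull two otherwise dissimilar words together, which is the role the alias term plays here.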
1.2: TCM data annotation
The application distills three core TCM entity relation paradigms, 'phenomenon expression', 'treated' and 'treatment method', and annotates data in the format of the general document-level dataset DocRED. The TCM entities in the aligned data $D'$, together with their positions and relations, are annotated to obtain the labeled data $L = [l_1, l_2, \dots, l_n]$. $L$ is split in a fixed proportion into a training set $D_{train}$ and a development set $D_{dev}$, i.e. $L = D_{train} \cup D_{dev}$, where $D_{train}$ is used to train the model and $D_{dev}$ to evaluate its recognition accuracy.
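For orientation, a DocRED-style record and a train/dev split might look like the sketch below. The field names `title`, `sents`, `vertexSet` and `labels` (with `h`, `t`, `r`, `evidence`) follow the public DocRED JSON layout, where `pos` is a half-open token span within the sentence; the example text, entity types, relation label, the 80/20 ratio and the `split_dataset` helper are illustrative assumptions, since the application does not state the ratio.

```python
import random

# A hypothetical annotated document in DocRED-style layout.
doc = {
    "title": "example",
    "sents": [["Pill", "X", "treats", "irregular", "menstruation", "."]],
    "vertexSet": [
        [{"name": "Pill X", "sent_id": 0, "pos": [0, 2], "type": "prescription"}],
        [{"name": "irregular menstruation", "sent_id": 0, "pos": [3, 5],
          "type": "TCM diagnosis"}],
    ],
    "labels": [{"h": 0, "t": 1, "r": "treated", "evidence": [0]}],
}


def split_dataset(records, train_ratio=0.8, seed=0):
    """Shuffle annotated records reproducibly and split them into D_train and D_dev."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


d_train, d_dev = split_dataset([doc] * 10)
```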
2: TCM entity path enhancement
In document-level TCM relation extraction, path reasoning gathers more information supporting the relations between TCM entities, which makes the extraction results more reliable. Enhancing the paths with TCM prescription knowledge adds richer TCM features, which further improves the accuracy of the extraction results.
2.1: TCM entity path construction
The application uses the following 4 heuristic rules to find sentences in the labeled dataset $L$ that relate TCM entities to one another, and connects these sentences into paths of the form $path = [s_1, s_2, \dots, s_\eta]$, where $\eta$ ($\eta > 0$) is the number of sentences connecting the head entity $e_{head}$ and the tail entity $e_{tail}$.
2.1.1: Consecutive path construction
For the training set $D_{train} = [d_1, d_2, \dots, d_k]$ ($k > 0$), the TCM entities in each document $d_i$ form an entity set $E_i$. Because entity relations in TCM documents tend to be causal and usually appear in adjacent text, a minimum neighborhood threshold $\gamma$ ($\gamma > 0$) is set: any two different TCM entities in $E_i$, a head $e_\omega$ and a tail $e_w$, must satisfy $|s_{e_\omega} - s_{e_w}| \le \gamma$, where $s_{e_\omega}$ and $s_{e_w}$ are the sentence indices of $e_\omega$ and $e_w$ in $d_i$. If $e_\omega$ and $e_w$ meet this requirement, a consecutive path is constructed from the sentences in which they appear; when $s_{e_\omega} = s_{e_w}$, the two entities occur in a single sentence. Traversing $D_{train}$ in this way yields the consecutive path set $Path_{con}$.
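A sketch of the consecutive-path rule, assuming each entity is given as the list of sentence indices where it is mentioned. The text does not say exactly which sentences form the path, so this sketch assumes the contiguous run of sentences between the closest pair of head and tail mentions; `consecutive_paths` is a hypothetical name.

```python
from itertools import permutations


def consecutive_paths(entity_sents, gamma=1):
    """Return {(head, tail): [sentence ids]} for pairs whose mentions are within gamma sentences.

    entity_sents maps each entity to the sentence indices of its mentions.
    """
    paths = {}
    for head, tail in permutations(entity_sents, 2):
        # Closest pair of head/tail mentions, measured in sentences.
        best = min(((abs(i - j), i, j)
                    for i in entity_sents[head] for j in entity_sents[tail]),
                   default=None)
        if best is not None and best[0] <= gamma:
            _, i, j = best
            # A single sentence when both mentions share one, else the run between them.
            paths[(head, tail)] = list(range(min(i, j), max(i, j) + 1))
    return paths


paths = consecutive_paths({"A": [0], "B": [1], "C": [4]}, gamma=1)
```

Here "C" is too far from both other entities, so only the A/B pair (in both directions) receives a consecutive path.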
2.1.2: Key path construction
In TCM document data, sentences containing TCM standard keywords such as diagnosis, syndrome and TCM treatment method are key evidence for TCM relations and take priority over other sentences in the TCM entity relation extraction task. The application designs a keyword library of TCM standards $K = [k_1, k_2, \dots, k_r]$. For the training set $D_{train} = [d_1, d_2, \dots, d_k]$, if any keyword $k_o$ ($1 \le o \le r$) and a head entity $e_c$ and tail entity $e_v$ of two different types appear within the same minimum neighborhood of a document $d_s$ ($1 \le s \le k$), a path is constructed from the sentences containing $k_o$, $e_c$ and $e_v$. Because such sentences are already contained in the consecutive path set $Path_{con}$, a marker ['key'] is attached to flag the path as a key path. Traversing $D_{train}$ yields the key path set $Path_{key}$.
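The key-path rule can be sketched the same way. Assumptions: keywords are matched as substrings of the sentence text, "the same minimum neighborhood" is read as all three sentence indices (keyword, head, tail) lying within a window of `gamma` sentences, and only the tightest such window is kept per entity pair. `key_paths` and the example keyword and types are hypothetical.

```python
from itertools import combinations


def key_paths(sentences, entity_sents, entity_type, keywords, gamma=1):
    """Build ['key']-tagged paths for differently typed entity pairs near a TCM keyword."""
    kw_sents = [i for i, s in enumerate(sentences) if any(k in s for k in keywords)]
    paths = []
    for head, tail in combinations(entity_sents, 2):
        if entity_type[head] == entity_type[tail]:
            continue  # the rule requires two entities of different types
        windows = [sorted({k, i, j}) for k in kw_sents
                   for i in entity_sents[head] for j in entity_sents[tail]
                   if max(k, i, j) - min(k, i, j) <= gamma]
        if windows:
            # Keep the tightest window and attach the ['key'] marker.
            best = min(windows, key=lambda w: (w[-1] - w[0], w))
            paths.append({"head": head, "tail": tail, "sents": best, "tag": ["key"]})
    return paths


sentences = ["Diagnosis: irregular menstruation.", "Pill X is prescribed.", "Unrelated."]
result = key_paths(sentences,
                   entity_sents={"IM": [0], "PillX": [1], "Other": [2]},
                   entity_type={"IM": "TCM diagnosis", "PillX": "prescription",
                                "Other": "prescription"},
                   keywords=["Diagnosis"])
```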
2.1.3: Multi-hop path construction
Consider a distant head entity $e_y$ and tail entity $e_u$ in a TCM document, i.e. $|s_{e_y} - s_{e_u}| > \gamma$. If there is a series of bridging entity mentions $e_{b_1}, e_{b_2}, \dots, e_{b_q}$ such that $(e_y, e_{b_1}), (e_{b_1}, e_{b_2}), \dots, (e_{b_q}, e_u)$ form inter-sentence entity pairs with bridging relations, a multi-hop path connecting the head entity $e_y$ and the tail entity $e_u$ can be built from these pairs. Since entities in TCM documents are closely interrelated, entities with a TCM relation rarely need many bridges, so a threshold $\alpha$ ($\alpha > 0$) is set: at most $\alpha$ bridging entities are allowed ($q \le \alpha$). Traversing $D_{train}$ yields the multi-hop path set $Path_{mul}$.
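A breadth-first search is one way to realize the multi-hop rule. This sketch assumes that two entities are bridged when they co-occur in a sentence, records one witnessing sentence per hop, and stops expanding once `alpha` bridge entities have been used; the caller is assumed to apply it only to pairs farther apart than $\gamma$. `multi_hop_path` is a hypothetical helper.

```python
from collections import deque


def multi_hop_path(entity_sents, head, tail, alpha=2):
    """Return the witnessing sentence ids of a head -> bridges -> tail chain, or None."""
    queue = deque([(head, 0, [])])  # (current entity, bridges used, sentences so far)
    seen = {head}
    while queue:
        node, used, sents = queue.popleft()
        for other, other_sents in entity_sents.items():
            if other == node:
                continue
            shared = sorted(set(entity_sents[node]) & set(other_sents))
            if not shared:
                continue  # no sentence links these two entities
            if other == tail:
                return sents + [shared[0]]
            if other not in seen and used < alpha:
                seen.add(other)
                queue.append((other, used + 1, sents + [shared[0]]))
    return None


ents = {"H": [0], "B": [0, 3], "T": [3], "X": [5]}
path = multi_hop_path(ents, "H", "T", alpha=2)
```

Breadth-first order returns a chain with the fewest hops first, which matches the idea that related TCM entities need few bridges.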
2.1.4: Default path construction
When none of the above 3 rules applies, the most relevant sentences are used as a rough estimate of the supporting evidence for the TCM relation. For a head entity $e_h$ and tail entity $e_t$ in a document $d_b$ ($1 \le b \le k$) with no path between them, all sentences containing $e_h$ or $e_t$ are collected as the default path. Specifically, let $S_h$ and $S_t$ be the sets of sentences containing $e_h$ and $e_t$ respectively; if there is no path between $e_h$ and $e_t$, the sentences in $S_h$ and $S_t$ are connected to form a default path. Traversing $D_{train}$ yields the default path set $Path_{def}$, and hence the complete path set $Path_{set} = Path_{con} \cup Path_{key} \cup Path_{mul} \cup Path_{def}$. BioBERT then maps $Path_{set}$ to the feature set $path_{fset} = [path_1, path_2, \dots, path_\tau]$, where $\tau$ is the number of path features.
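The fallback rule and the assembly of the complete path set can be sketched as below. For simplicity the sketch keeps a single path per (head, tail) pair in a dictionary, whereas the application's path set is a union that may hold several paths per pair; `default_path` and `complete_path_set` are hypothetical names.

```python
def default_path(entity_sents, head, tail):
    """Fallback rule: join every sentence that mentions the head or the tail entity."""
    return sorted(set(entity_sents[head]) | set(entity_sents[tail]))


def complete_path_set(pairs, entity_sents, found):
    """found: {(head, tail): [sentence ids]} produced by the first three rules.

    Every pair still without a path receives its default path.
    """
    path_set = dict(found)
    for head, tail in pairs:
        if (head, tail) not in path_set:
            path_set[(head, tail)] = default_path(entity_sents, head, tail)
    return path_set


ents = {"H": [0, 2], "T": [5], "A": [1], "B": [1]}
path_set = complete_path_set([("H", "T"), ("A", "B")], ents, found={("A", "B"): [1]})
```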
2.2: TCM feature extraction
Rapidly growing TCM text data contain rich domain information that can provide additional support for relations between TCM prescriptions and symptoms. Introducing this information into the reasoning paths as TCM prescription knowledge can effectively improve the accuracy of path reasoning. TCM prescription knowledge records, for each prescription, entity description information and component-efficacy information. Exploiting this, the application uses a BioBERT model to obtain from the prescription knowledge the combined data of prescription name (pres), functions and indications (ind), herbal components (herbs) and TCM efficacy (eff):

$$D_{extral} = \{(pres_I, ind_I, herbs_I, eff_I)\}_{I=1}^{\iota}$$

where $\iota$ ($\iota > 0$) is the number of TCM prescription knowledge items.
The generated feature vectors are then concatenated along the feature dimension to obtain the complete feature representation of each prescription entity, $e_I = [pres_I, ind_I, herbs_I, eff_I]$, and the structured TCM prescription knowledge base is expressed as $E = [e_1, e_2, \dots, e_\iota]$.
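A minimal sketch of the concatenation step, assuming the four BioBERT field embeddings of each prescription are already available as NumPy vectors; toy constant vectors stand in for them, and the helper names are hypothetical.

```python
import numpy as np


def prescription_feature(pres, ind, herbs, eff):
    """Concatenate the four field embeddings into e_I = [pres_I, ind_I, herbs_I, eff_I]."""
    return np.concatenate([pres, ind, herbs, eff])


def knowledge_base(records):
    """Stack the prescription vectors into E, one row per prescription."""
    return np.stack([prescription_feature(*r) for r in records])


# Toy 2-dimensional embeddings: pres = 0s, ind = 1s, herbs = 2s, eff = 3s.
record = tuple(np.full(2, float(k)) for k in range(4))
E = knowledge_base([record, record])
```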
2.3: TCM prescription knowledge fusion
The distribution of the prescription knowledge may deviate from that of the training data, and the same entity may be expressed differently in the training data and in the prescription knowledge base, so the prescription knowledge must be aligned with the entities and its distribution constrained. To handle the distribution difference, the distance between the knowledge base and the training data is measured with the Kullback-Leibler (KL) divergence, and knowledge whose distribution differs little from the training data is added to a candidate set $C$. Let $P_a$ be the probability distribution of the $a$-th training sample and $Q$ that of the prescription knowledge base $E$. Taking the training data as the reference, the KL divergence from the prescription knowledge to the training data is defined as

$$D_{KL}(Q \,\|\, P_a) = \sum_{x} Q(x) \log \frac{Q(x)}{P_a(x)}$$

A KL divergence threshold $\epsilon$ ($\epsilon > 0$) is set to select from $E$ the prescription entities whose probability distribution is close to that of the training data: when $D_{KL}(Q \,\|\, P_a) < \epsilon$, the prescription entity is added to the candidate set as the element $\mu_a = [pres_a, ind_a, herbs_a, eff_a]$. The candidate set of length $|C|$ is then $C = [\mu_1, \mu_2, \dots, \mu_{|C|}]$.
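The KL-based filter can be sketched for discrete distributions. The text does not say over which events $Q$ and $P_a$ are defined, so this sketch assumes each prescription entity and each training sample come with a probability vector over a shared vocabulary, and keeps an entity if its divergence to at least one training sample is below the threshold. The floor `eps` guards against division by zero; helper names are hypothetical.

```python
import numpy as np


def kl_divergence(q, p, eps=1e-12):
    """D_KL(Q || P) = sum_x Q(x) log(Q(x) / P(x)) for discrete distributions."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    q, p = q / q.sum(), p / p.sum()
    mask = q > 0  # terms with Q(x) = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / np.maximum(p[mask], eps))))


def select_candidates(knowledge, q_dists, train_dists, epsilon):
    """Keep prescription entities whose distribution is within epsilon of some training sample."""
    return [mu for mu, q in zip(knowledge, q_dists)
            if any(kl_divergence(q, p) < epsilon for p in train_dists)]


C = select_candidates(["a", "b"], [[0.5, 0.5], [0.99, 0.01]], [[0.5, 0.5]], epsilon=0.1)
```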
To prevent excessive noise when knowledge is supplemented from the candidate set $C$, the application processes the path feature set $path_{fset}$ and the candidate set $C$ with a multi-layer perceptron and, based on the attention score matrix $ATTN_s$, selects prescription entities from the candidate set to fuse with the path entities. Specifically, let $path_l$ be the feature of the $l$-th path entity in $path_{fset}$; an attention score is then computed between $path_l$ and $C$.
Here $W_{a1}$ is the weight matrix of the first layer of the multi-layer perceptron and $W_{a2}$ that of the second layer, which together produce the attention score matrix $ATTN_s$.
When the attention score of the path entity $path_l$ with respect to an element $\mu_j$ of $C$ exceeds a feature fusion threshold, the two feature vectors are added:

$$path'_l = path_l + \mu_j$$

where $path'_l$ is the fused path entity feature, giving the enhanced path feature set $path_{fset} = [path'_1, path'_2, \dots, path'_\tau]$.
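The attention-and-fusion step can be sketched in NumPy. The exact scoring formula is not given in the text, so this sketch assumes a tanh hidden layer applied to the concatenation of a path entity and a candidate, a second linear layer producing a scalar score, and a softmax over the candidates; fusion adds every candidate whose score exceeds the threshold. `W1`, `w2` and `threshold` are hypothetical stand-ins for $W_{a1}$, $W_{a2}$ and the feature fusion threshold.

```python
import numpy as np


def attention_scores(paths, cands, W1, w2):
    """ATTN_s[l, j]: perceptron score of path entity l against candidate j, softmaxed over j."""
    n, m = len(paths), len(cands)
    # Row l*m + j holds the concatenation [path_l; mu_j].
    pairs = np.concatenate([np.repeat(paths, m, axis=0), np.tile(cands, (n, 1))], axis=1)
    logits = (np.tanh(pairs @ W1) @ w2).reshape(n, m)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def fuse(paths, cands, attn, threshold):
    """path'_l = path_l + sum of the candidates mu_j whose score exceeds the threshold."""
    return paths + (attn > threshold).astype(float) @ cands


paths = np.array([[1.0, 0.0], [0.0, 1.0]])
cands = np.array([[0.5, 0.5], [2.0, 2.0]])
attn = attention_scores(paths, cands, W1=np.zeros((4, 3)), w2=np.zeros(3))
fused = fuse(paths, cands, attn, threshold=0.4)
```

With zero weights every score is 0.5, so a 0.4 threshold fuses both candidates and a 0.6 threshold fuses none; trained weights would make the selection selective.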
3: TCM relation extraction
3.1: TCM relation extraction model construction
Step 2.3 yields the enhanced path feature set $path_{fset}$, from which the BioBERT model produces the hidden-state representation $H = [h_1, h_2, \dots]$ of the feature sequence. To capture information about the TCM entities related to a path, a word-vector-based representation of TCM entity mentions is defined, in which $s$ and $t$ are the start and end positions of the $k$-th mention related to path $c$, $m^c_k$ is the representation of that mention span in path $c$, and $h_j$ is the BioBERT hidden state, i.e. the hidden-state vector of the $j$-th token of the input TCM text sequence.
Based on this mention representation, a representation of each TCM entity is then given, where $e^c_i$ denotes the representation of TCM entity $i$ in path $c$.
Then, for the current path $c$, a two-layer perceptron computes the probability of each TCM relation $r$. Here $P^c_{ij}(r)$ is the probability that the TCM entity pair $(i, j)$ holds relation $r$, $\sigma$ is the sigmoid activation function, $F(\cdot)$ is a two-layer perceptron, $d_{ij}$ is the positional feature between the TCM entities, and $e^c_i \odot e^c_j$ is the element-wise product of the two TCM entity embeddings.
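Because the mention, entity and probability formulas are only described in words here, the sketch below fills them in with common choices that are assumptions, not the application's definitions: mean pooling over the mention span, log-sum-exp pooling over an entity's mentions, and a ReLU two-layer perceptron applied to the concatenation $[e_i; e_j; e_i \odot e_j; d_{ij}]$ followed by a sigmoid. All weights and helper names are hypothetical.

```python
import numpy as np


def mention_repr(H, s, t):
    """Mean of the hidden states h_s..h_t (inclusive) for one mention span."""
    return H[s:t + 1].mean(axis=0)


def entity_repr(H, spans):
    """Log-sum-exp pooling over the representations of an entity's mentions."""
    M = np.stack([mention_repr(H, s, t) for s, t in spans])
    mx = M.max(axis=0)
    return mx + np.log(np.exp(M - mx).sum(axis=0))


def relation_probs(e_i, e_j, d_ij, W1, b1, W2, b2):
    """sigmoid(F([e_i; e_j; e_i * e_j; d_ij])), one probability per relation type."""
    x = np.concatenate([e_i, e_j, e_i * e_j, d_ij])
    hidden = np.maximum(0.0, x @ W1 + b1)  # first layer with ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))


H = np.arange(12, dtype=float).reshape(4, 3)  # 4 tokens, hidden size 3
e_i = entity_repr(H, [(1, 2)])
probs = relation_probs(e_i, e_i, np.zeros(2), np.zeros((11, 4)), np.zeros(4),
                       np.zeros((4, 5)), np.zeros(5))
```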
3.2: Loss function construction
The BioBERT-based model predicts the relevance of every pair of TCM entities, and the TCM relation types are imbalanced. The loss therefore averages over all paths, superimposes the path scores and takes the maximum, and uses a weighted cross-entropy $L(y, P)$ as the model's loss function.
Here $f$ is the index of the current path, $g$ the index of the target class, and $y_{f,g} \in \{0, 1\}$ indicates whether path $f$ belongs to target class $g$. The loss of each path is weighted so that paths associated with the target class $g$ receive a higher penalty and paths unrelated to it a smaller one. $p_f$ is the predicted score of the current path $f$, and $w_f$ is its weight, computed from how often the path occurs in the training set: $w_f$ is the number of entity pairs of path $f$ as a proportion of all entity pairs in the training set:

$$w_f = \frac{N_f}{\sum_{x \in Path_{set}} N_x}$$

where $N_f$ is the number of TCM entity pairs corresponding to the current path $f$ and $N_x$ the number corresponding to path $x$ in $Path_{set}$. If the path is a key path, i.e. it carries the marker ['key'], it is given an initial weight $\delta_f$ ($\delta_f > 0$) and its weight becomes $w'_f = w_f + \delta_f$. To avoid overfitting, the following $L_2$ regularization term is introduced:

$$L_{reg}(\theta) = \lambda \lVert \theta \rVert_2^2$$

where $\theta$ denotes the model parameters and $\lambda$ ($\lambda > 0$) is the regularization coefficient. Combining it with equation (3-5) gives the final loss function:

$$L(y, P, \theta) = L(y, P) + L_{reg}(\theta) \qquad (3\text{-}7)$$
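A NumPy sketch of the path weighting and the regularized loss. It assumes one shared `delta` for all key paths, a per-class binary cross-entropy (matching the sigmoid outputs of step 3.1) weighted per path and averaged, and $L_{reg} = \lambda \lVert\theta\rVert_2^2$; the superposition of path scores and the max over paths described above are left out for brevity. `path_weights` and `total_loss` are hypothetical names.

```python
import numpy as np


def path_weights(pair_counts, key_paths=(), delta=0.1):
    """w_f = N_f / sum_x N_x, plus delta for paths tagged ['key'] (w'_f = w_f + delta)."""
    total = sum(pair_counts.values())
    return {f: n / total + (delta if f in key_paths else 0.0)
            for f, n in pair_counts.items()}


def total_loss(y, p, w, theta, lam=1e-4, eps=1e-12):
    """Path-weighted binary cross-entropy plus lam * ||theta||^2.

    y, p: arrays of shape (num_paths, num_classes); w: one weight per path.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    data_loss = float(np.mean(np.asarray(w, dtype=float)[:, None] * bce))
    return data_loss + lam * float(np.sum(np.square(theta)))


weights = path_weights({"a": 1, "b": 3}, key_paths={"b"}, delta=0.1)
```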
3.3: TCM relation prediction
During backpropagation in model training, the optimizer iteratively updates the model weights with stochastic gradient descent (SGD):

$$w'_t = w_t - \epsilon \, \nabla_{w_t} L(y, P, \theta)$$

where $w_t$ is the weight parameter currently being updated, $\epsilon$ ($\epsilon > 0$) is the learning rate, $\nabla_{w_t} L(y, P, \theta)$ is the gradient of the overall loss with respect to $w_t$, and $w'_t$ is the updated parameter.
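The update rule can be illustrated on a toy problem. Here a one-feature logistic model with an $L_2$ penalty stands in for the relation extraction model, and each step uses a random mini-batch of two samples; the data, learning rate and helper names are illustrative assumptions.

```python
import numpy as np


def sgd_step(w, grad, lr):
    """One SGD update: w'_t = w_t - lr * dL/dw_t."""
    return w - lr * grad


def toy_grad(w, X, y, lam):
    """Gradient of mean binary cross-entropy of sigmoid(X @ w) plus lam * ||w||^2."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y) + 2.0 * lam * w


rng = np.random.default_rng(0)
X = np.array([[1.0], [-1.0], [2.0], [-2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.zeros(1)
for _ in range(100):
    idx = rng.choice(len(y), size=2, replace=False)  # random mini-batch
    w = sgd_step(w, toy_grad(w, X[idx], y[idx], lam=1e-3), lr=0.5)
```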
The preprocessed TCM data are input into the trained TCM document-level relation extraction model to obtain the relation extraction prediction P_ij(r):
which aggregates the prediction results of the individual paths between the TCM entity pair (formula (3-9)).
The relation with the maximum probability is output as the final prediction of the model:
$\hat{r}_{ij} = \arg\max_r P_{ij}(r)$ (3-10)
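The prediction step can be sketched as follows. Because formula (3-9) is not legible in the source, this sketch assumes the per-path probabilities are aggregated by taking, for each relation, the maximum over paths (mirroring the "take the maximum" wording in the loss description). `predict_relation` and the probabilities are hypothetical.

```python
def predict_relation(path_probs):
    """path_probs: one {relation: probability} dict per path between (i, j).

    Assumption: aggregation (3-9) is the maximum over paths per relation.
    """
    relations = path_probs[0].keys()
    aggregated = {r: max(p[r] for p in path_probs) for r in relations}
    best = max(aggregated, key=aggregated.get)  # formula (3-10): argmax over r
    return aggregated, best


paths = [
    {"phenomenon expression": 0.1, "treated": 0.7, "treatment method": 0.2},
    {"phenomenon expression": 0.3, "treated": 0.4, "treatment method": 0.6},
]
print(predict_relation(paths)[1])  # treated
```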
3. Detailed Description of the Preferred Embodiments
The embodiments of the application are described below in conjunction with the accompanying drawings so that those skilled in the art can better understand the application. Note that detailed descriptions of known functions and designs are omitted below where they might obscure the present application.
Example: relation extraction is performed on a traditional Chinese medicine (TCM) document on the clinical efficacy of Wuji Baifeng pill combined with norethindrone in treating irregular menstruation.
1: TCM dataset construction
According to step 1.1, this embodiment uses a local public entity library M = [m'_1, m'_2, ..., m'_28780], whose entity types include TCM diagnosis, TCM syndrome, TCM treatment method, prescription, Western medicine diagnosis, Western medicine treatment and other treatments. For example, m'_17 = {"Wuji Baifeng pill": ["prescription", ["Baiwan", "water-honeyed pill", "Wuji Baifeng Wan"]]}. The TCM document dataset is entity-aligned according to formulas (1-1), (1-2) and (1-3) to obtain the aligned TCM document dataset, with the empirical value ψ = 0.8. For example, in the text "In the TCM field, Wuji Baifeng pill can treat irregular menstruation (IM); the water-honeyed pill is currently one of the effective treatments for IM, but Baiwan has the following disadvantages…", two entities, "Wuji Baifeng pill" and "irregular menstruation", are recognized. However, "water-honeyed pill" and "Baiwan" both refer to Wuji Baifeng pill and "IM" refers to irregular menstruation, so entity alignment is needed: "water-honeyed pill" and "Baiwan" are replaced with "Wuji Baifeng pill", and "IM" is replaced with "irregular menstruation". The final text reads "Wuji Baifeng pill can treat irregular menstruation (Irregular Menstruation); Wuji Baifeng pill is currently one of the effective treatments for irregular menstruation, but Wuji Baifeng pill has the following disadvantages…".
According to step 1.2, the aligned dataset is annotated and converted into a structured dataset L containing annotation information such as entity names, entity categories, the indexes of the sentences containing each entity, relation types and the indexes of relation-supporting sentences. L is then split 8:2 into a training set D_train and a test set D_dev; the 5,693 positive relation labels in the test set serve as the target values for evaluating the model. Entity types are annotated for the 3 core entity relations targeted for extraction, "phenomenon expression", "treated" and "treatment method"; the entity relations are shown in Table 1.
Table 1 Relations between TCM entities
Fig. 3 is a schematic of the annotation of a fragment of a clinical efficacy document on Wuji Baifeng pill. Taking the TCM document on the clinical efficacy of Wuji Baifeng pill combined with norethindrone in treating irregular menstruation as an example, the annotation provides the data needed for subsequent path construction: the segmented document words (sents), entity names, the index of the sentence containing each entity (sent_id), the intra-sentence position (pos) and the entity type.
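To make the annotation fields concrete, here is a hypothetical, heavily abbreviated record laid out in the DocRED JSON format (sents; vertexSet with name, sent_id, pos and type; labels with h, t, r and evidence). The tokens, positions and label are illustrative only and are not taken from the actual dataset. DocRED uses 0-based sent_id and end-exclusive pos spans, whereas the path examples in this embodiment count sentences from 1.

```python
# Hypothetical record in DocRED layout (illustrative tokens and label only).
doc = {
    "title": "Wuji Baifeng pill combined with norethindrone for irregular menstruation",
    "sents": [
        ["Purpose", ":", "to", "explore", "Wuji", "Baifeng", "pill", "for",
         "irregular", "menstruation", "."],
        ["Patients", "with", "irregular", "menstruation", "were", "enrolled", "."],
    ],
    "vertexSet": [  # one list of mentions per entity
        [{"name": "Wuji Baifeng pill", "sent_id": 0, "pos": [4, 7],
          "type": "prescription"}],
        [{"name": "irregular menstruation", "sent_id": 0, "pos": [8, 10],
          "type": "Western medicine diagnosis"},
         {"name": "irregular menstruation", "sent_id": 1, "pos": [2, 4],
          "type": "Western medicine diagnosis"}],
    ],
    "labels": [{"h": 0, "t": 1, "r": "treated", "evidence": [0]}],
}


def entity_sentences(record):
    """Map each entity name to the sorted sentence indexes of its mentions."""
    return {ms[0]["name"]: sorted({m["sent_id"] for m in ms})
            for ms in record["vertexSet"]}


def check_positions(record):
    """Verify that every mention's [start, end) token span spells its name."""
    return all(
        " ".join(record["sents"][m["sent_id"]][m["pos"][0]:m["pos"][1]]) == m["name"]
        for ms in record["vertexSet"] for m in ms)


print(entity_sentences(doc))
```

The entity-to-sentence map produced here is exactly the input that the path construction rules operate on.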
2: TCM entity path enhancement
If the TCM head and tail entities belong to the same category, for example both to "TCM diagnosis", they do not form a relation, so the entity pair is skipped to avoid unnecessary computation. If they belong to different categories, for example the head entity to "TCM diagnosis" and the tail entity to "TCM syndrome", a "phenomenon expression" relation may hold between them, so a path needs to be extracted.
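The category filter can be sketched as a lookup over allowed category pairs. The `RELATION_RULES` table below is hypothetical and contains only the pairs mentioned in this embodiment (the full mapping is given in Table 1); treating pairs as unordered is also an assumption.

```python
# Hypothetical rule table: category pairs that may form a relation.
# Only pairs mentioned in this embodiment are listed; Table 1 has the full set.
RELATION_RULES = {
    ("TCM diagnosis", "TCM syndrome"): "phenomenon expression",
    ("prescription", "Western medicine diagnosis"): "treated",
    ("Western medicine treatment", "Western medicine diagnosis"): "treated",
}


def candidate_relation(head_type, tail_type):
    """Return the relation a category pair may form, or None to skip the pair."""
    if head_type == tail_type:
        return None  # same category: no relation, skip to save computation
    # Assumption: pairs are treated as unordered.
    return (RELATION_RULES.get((head_type, tail_type))
            or RELATION_RULES.get((tail_type, head_type)))


print(candidate_relation("TCM diagnosis", "TCM diagnosis"))  # None
print(candidate_relation("TCM diagnosis", "TCM syndrome"))   # phenomenon expression
```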
According to step 2.1.1, the minimum neighborhood threshold γ is set to 3, since reasoning requires at most 3 sentences. "Wuji Baifeng pill" appears in the first sentence with category "prescription", and "irregular menstruation" appears in the first sentence with category "Western medicine diagnosis"; the two may form a "treated" relation and lie within the minimum neighborhood, so the first sentence connects them. "Irregular menstruation" also appears in the second sentence and "Wuji Baifeng pill" in the fourth, which likewise lie within the minimum neighborhood, so the second and fourth sentences form another possible connection. The resulting continuous path representation is {"Wuji Baifeng Wan": {"irregular menstruation": [1, 2, 4]}}, where each number is a sentence index pointing to the actual sentence.
According to step 2.1.2, sentences containing the keywords "purpose", "method", "result" and "conclusion" are more likely to be relation evidence sentences than ordinary sentences, and in TCM text data, sentences containing "diagnosis", "TCM treatment method", "analysis" and "syndrome" are more likely to provide evidence for relations between TCM entities; a keyword library is therefore constructed from TCM domain knowledge. The keyword "purpose" occurs in the first sentence, which belongs to the continuous path between "Wuji Baifeng pill" and "irregular menstruation", so the marker ['key'] is set to label this path as a critical path. The resulting critical path representation is {"Wuji Baifeng Wan": {"irregular menstruation": ["Purpose: to explore the clinical efficacy of Wuji Baifeng pill combined with norethindrone in treating irregular menstruation ['key']"]}}.
According to step 2.1.3, the threshold α is set to 3 based on document-level relation extraction statistics, since reasoning requires at most 3 bridging entities. "Irregular menstruation" appears in the second sentence with category "Western medicine diagnosis", and "norethindrone" appears in the seventh sentence with category "Western medicine treatment"; they may form a "treated" relation but do not lie within the minimum neighborhood, so bridging entities are needed. "Irregular menstruation" and "control group" appear in the second sentence; "control group" and "treatment group" in the fourth; "treatment group" and "treatment" in the sixth; and "treatment" and "norethindrone" in the seventh. This chain of bridging entities makes the second, fourth, sixth and seventh sentences one possible connection. The resulting multi-hop path representation is {"Norethindrone": {"irregular menstruation": [2, 4, 6, 7]}}, where each number is a sentence index pointing to the actual sentence.
According to step 2.1.4, for the training set D_train = [d_1, d_2, ..., d_7734], the set S* of remaining entities without a path relation is constructed for each document in turn. For an entity in S* and each other entity in S* of a different category, it is further checked whether the two categories satisfy the TCM relation formation rules; if they do, the sets of sentences containing the two entities form a default path. For the complete path set Path_set, each path is fed as an input sequence into the biomedical pre-trained model BioBERT, which outputs the path feature representation path'_l, completing the construction of the path feature set.
According to step 2.2, a TCM prescription knowledge base is constructed: specific TCM entities are used to build combined data of prescription name, functional indications, herbal components and TCM efficacy. For example: {"Wuji Baifeng pill": {"ind": "irregular menstruation, menstrual abdominal pain, palpitations, shortness of breath, soreness of the waist and legs", "herbs": "black-bone chicken, deer-horn glue, turtle shell, oyster", "eff": "replenishing qi and nourishing blood, regulating menstruation and stopping leukorrhea, promoting hematopoiesis and stopping bleeding"}}. Formula (2-1) is applied to D_extral to generate feature sequences, giving the feature sequence e_ι of each prescription and its descriptive information in the TCM prescription knowledge base, and hence the feature representation E of the knowledge base.
According to step 2.3, formula (2-2) measures the asymmetric distance between the probability distributions of the training data and the TCM prescription knowledge, expressed as the divergence D_KL(Q||P_a). The TCM prescription knowledge candidate set C is then constructed by formula (2-3) with the threshold ε set to 0.1. Attention is then computed between the path feature set obtained in step 2.1.4 and the candidate set C, with δ set to 0.2, and the enhanced path feature set path_fset is obtained by formula (2-5).
3: TCM relation extraction
According to step 3.1, the path_fset obtained in step 2 is input into the BioBERT model; the mention and entity feature representations are obtained by formulas (3-1) and (3-2), and formula (3-3) then gives the probability of each TCM relation under the current path.
According to step 3.2, the weighted average loss of the model during training is computed by formula (3-4), with the path weights given by formula (3-5) and δ_c set to 0.3; finally, combining formula (3-6), the joint loss is computed by formula (3-7), with λ empirically set to 0.5.
According to step 3.3, with the learning rate ε set to 0.005, the model weights are iteratively updated by stochastic gradient descent according to formula (3-8) until the loss converges. The preprocessed data are input into the relation prediction model trained in step 3.1, the probabilities of all paths are aggregated by formula (3-9), and formula (3-10) outputs the TCM relation with the maximum probability as the prediction result.
4. Advantages and positive effects of the application compared with the prior art
(1) The application provides a TCM entity alignment method that, by incorporating a local TCM entity library, better accounts for the feature representation of entity attributes, addresses problems in the TCM field such as missing entity terms and multiple terms for the same concept, and improves the precision and effectiveness of TCM relation extraction.
(2) The application provides improved heuristic rules that reflect the specialized terminology and knowledge structure of the TCM field, enhance the feature representation of TCM relations, and improve TCM relation extraction.
(3) The application introduces TCM prescription knowledge: a TCM prescription knowledge base is designed to inject more prior knowledge into the constructed paths, which enhances the contextual information the paths express and effectively improves the efficiency of relation extraction.
(4) The application reconstructs the loss function of the PATH model by adding loss weights and an L2 regularization term to the binary classification loss, so that the model focuses more on the relation information to be extracted, mitigating the low extraction accuracy caused by the imbalance between positive and negative samples.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1: Flow chart of the steps of the application
Fig. 2: Flow chart of the TCM relation extraction task
Fig. 3: Document annotation schematic
Fig. 4: Path construction schematic

Claims (5)

1. A traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement, characterized by comprising the following steps:
S1: TCM dataset construction: TCM document data are collected, entity mentions are aligned, and the different medical entities, their positions and the relation evidence sentences are annotated.
S2: TCM entity path enhancement: paths are constructed for the dataset annotated in step S1 using heuristic rules, and TCM prescription knowledge is introduced to enhance the paths.
S3: TCM relation extraction: the paths enhanced in step S2 are input into a BioBERT model for training, the loss function is computed, the model weights are iteratively updated, and the trained model is used to extract TCM relations.
2. The method, system, electronic device and medium for extracting a traditional Chinese medicine document-level relationship based on local path enhancement according to claim 1, wherein the step S1 further comprises the following specific steps:
S1.1: TCM entity alignment
The application locally constructs a public TCM entity library M = [m'_1, m'_2, ..., m'_z], where z (z > 0) is the total number of entities in M; the alias representations of each entity are added to the library as a bias term that enhances its feature representation. The i-th entity m'_i in M consists of the entity name, the type name and an alias set of size Φ. The TCM document dataset consists of n (n > 0) TCM documents, each of which is a piece of text data.
Each text is segmented into a set of words. For any word that matches an entity in M, the alias set of that entity is traversed and the open-source biomedical pre-trained model BioBERT is used to obtain the feature vector of each alias, finally yielding the alias bias term feature of the word.
If the word does not match any entity in M, its alias bias term is empty. After the auxiliary features have been supplemented, the application uses cosine similarity to compute the similarity between a TCM term and the other terms in the text, to judge whether a TCM term h_k has synonyms in the text. The feature vector of h_k and that of any other word are obtained through the BioBERT model, and the cosine similarity is computed as follows:
where "·" denotes the vector dot product and "‖·‖" the Euclidean norm of a vector; the two bias terms are the alias bias features of the respective TCM entities, and a bias term is 0 when the corresponding alias set is empty.
For h_k and another word, given a similarity threshold ψ (ψ > 0), if their similarity exceeds ψ, the word is taken to refer to the same entity as h_k, and entity alignment is then performed on all mentions in the whole text:
where the left-hand side denotes the entry in the public TCM entity library to which the entity finally refers, and g(·) denotes the entity alignment method.
After all entities and their mentions are aligned, the text data are updated to obtain the aligned dataset.
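A minimal sketch of the alignment step, assuming toy two-dimensional vectors in place of BioBERT embeddings and assuming that the alias bias feature is simply added to a word's vector. An exact alias hit in the local library takes precedence; otherwise the word is mapped to the most similar library entity whose cosine similarity reaches ψ. All names and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature(term, vectors, biases):
    """Word vector plus alias bias term (assumed additive; zero when absent)."""
    bias = biases.get(term, [0.0] * len(vectors[term]))
    return [a + b for a, b in zip(vectors[term], bias)]


def align(tokens, alias_table, vectors, biases=None, psi=0.8):
    """Replace each token by the library entity it refers to, if any.

    alias_table: canonical name -> set of aliases (stand-in for library M).
    vectors: term -> vector (stand-in for BioBERT features).
    """
    biases = biases or {}
    aligned = []
    for tok in tokens:
        # 1) exact alias hit in the local entity library
        target = next((name for name, aliases in alias_table.items()
                       if tok in aliases), None)
        if target is None and tok in vectors:
            # 2) otherwise the most similar library entity, if similarity >= psi
            scored = [(cosine(feature(tok, vectors, biases),
                              feature(name, vectors, biases)), name)
                      for name in alias_table if name in vectors]
            best_sim, best_name = max(scored, default=(0.0, None))
            if best_sim >= psi:
                target = best_name
        aligned.append(target or tok)
    return aligned


alias_table = {"Wuji Baifeng pill": {"Baiwan", "water-honeyed pill"},
               "irregular menstruation": set()}
vectors = {"Wuji Baifeng pill": [0.0, 1.0], "irregular menstruation": [1.0, 0.0],
           "IM": [1.0, 0.1], "norethindrone": [0.6, -0.8]}
print(align(["water-honeyed pill", "IM", "norethindrone"], alias_table, vectors))
# ['Wuji Baifeng pill', 'irregular menstruation', 'norethindrone']
```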
S1.2: TCM data annotation
The application distills 3 relation paradigms among core TCM entities, namely "phenomenon expression", "treated" and "treatment method". Following the format of the general document-level dataset DocRED, the TCM entities in the aligned data, together with their positions and relations, are annotated to obtain the labeled data L = [l_1, l_2, ..., l_n]. The labeled dataset L is divided in a given ratio into a training set D_train and a test set D_dev, i.e. L = D_train ∪ D_dev, where D_train is used to train the model and D_dev to evaluate its recognition accuracy.
3. The method, system, electronic device and medium for extracting a traditional Chinese medicine document-level relationship based on local path enhancement according to claim 1, wherein the step S2 further comprises the following specific steps:
S2.1: TCM entity path construction
The application uses the following 4 heuristic rules to find sentences that relate the TCM entities in the labeled dataset L and connects them to construct paths; a path consists of the η (η > 0) sentences that connect the head entity e_head and the tail entity e_tail.
S2.1.1: Continuous path construction
For the training set D_train = [d_1, d_2, ..., d_k] (k > 0), the TCM entities in each document are gathered into an entity set. Because entity relations in TCM documents are causal and usually appear in adjacent text, a minimum neighborhood threshold γ (γ > 0) is set, and any two different TCM entities in the set, a head e_ω and a tail e_w, are required to have sentence indexes that fall within the minimum neighborhood γ. If TCM entities e_ω and e_w meet this requirement, a continuous path can be constructed from the sentences in which they are located; when the two sentence indexes are equal, e_ω and e_w occur in the same sentence. Traversing D_train for continuous path construction yields the continuous path set.
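The continuous-path rule can be sketched as follows. The exact neighborhood inequality is not legible in the source; this sketch assumes two mentions qualify when their sentence indexes differ by less than γ (so a window spans at most γ sentences), which reproduces the embodiment's example path [1, 2, 4]. The function name is hypothetical.

```python
def continuous_path(head_sents, tail_sents, gamma=3):
    """Sentence indexes linking two entities inside the minimum neighborhood.

    Assumption: mentions in sentences i and j qualify when |i - j| < gamma.
    Returns [] when no mention pair is close enough (a multi-hop or default
    path is then needed).
    """
    sents = set()
    for i in head_sents:
        for j in tail_sents:
            if abs(i - j) < gamma:
                sents.update((i, j))
    return sorted(sents)


# Embodiment: Wuji Baifeng pill in sentences 1 and 4, irregular menstruation in 1 and 2.
print(continuous_path([1, 4], [1, 2]))  # [1, 2, 4]
```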
S2.1.2: Critical path construction
In TCM document data, sentences containing TCM-specific keywords such as "diagnosis", "syndrome" and "TCM treatment method" are key sentences evidencing TCM relations and take priority over other sentences in the TCM entity relation extraction task. The application designs a TCM keyword library containing r keywords. For the training set D_train = [d_1, d_2, ..., d_k], if any keyword k_o (0 < o < r) and any two TCM entities of different types, a head e_c and a tail e_v, appear in the same minimum neighborhood of document d_s (0 < s < k), a path can be constructed from the sentences containing k_o, e_c and e_v. Since such sentences are already contained in a continuous path, a marker ['key'] is set to mark this path as a critical path. Traversing D_train for critical path construction yields the critical path set.
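A simplified sketch of critical-path marking: a continuous path is tagged ['key'] when any of its sentences contains a keyword. The keyword tuple lists only the keywords named in the embodiment, substring matching on lower-cased English text is an assumption, and the example sentences are hypothetical.

```python
# Keywords named in the embodiment; the full TCM keyword library is larger.
KEYWORDS = ("purpose", "method", "result", "conclusion",
            "diagnosis", "syndrome", "analysis")


def mark_critical(path_sents, sentences, keywords=KEYWORDS):
    """Tag a continuous path as critical (['key']) if one of its sentences
    contains a keyword. Simplification: any sentence on the path counts."""
    is_key = any(kw in sentences[i].lower()
                 for i in path_sents for kw in keywords)
    return {"sents": list(path_sents), "key": is_key}


sentences = {
    1: "Purpose: to explore the clinical efficacy of Wuji Baifeng pill "
       "combined with norethindrone in treating irregular menstruation.",
    2: "Patients were divided into a control group and a treatment group.",
}
print(mark_critical([1, 2], sentences))  # {'sents': [1, 2], 'key': True}
```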
S2.1.3: Multi-hop path construction
For a TCM head entity e_y and tail entity e_u that are far apart in a TCM document, i.e. not within the minimum neighborhood, if there is a series of bridging mention entities such that consecutive entities co-occur in a sentence, forming inter-sentence entity pairs linked by bridging relations, then a multi-hop path connecting the head entity e_y and the tail entity e_u can be constructed from these entity pairs. Because entities in TCM documents are closely interrelated, entities holding a TCM relation do not need many bridging entities; a threshold α (α > 0) is therefore set, i.e. at most α bridging entities are allowed. Traversing D_train for multi-hop path construction yields the multi-hop path set.
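The bridging search can be sketched as a breadth-first search over entity co-occurrence, where two entities are linked when they share a sentence and the chain may contain at most α bridging entities. The mention map below encodes the embodiment's norethindrone example; the function name and the choice of the smallest shared sentence index per hop are assumptions.

```python
from collections import deque


def multi_hop_path(head, tail, mentions, alpha=3):
    """Breadth-first search for a chain of bridging entities from head to tail.

    mentions: entity -> set of sentence indexes. Two entities are linked when
    they share a sentence; at most alpha bridging entities are allowed.
    Returns the sorted sentence indexes along the chain, or None.
    """
    queue = deque([(head, [], 0)])  # (entity, sentences so far, bridges used)
    seen = {head}
    while queue:
        ent, sents, bridges = queue.popleft()
        for nxt, nxt_sents in mentions.items():
            shared = mentions[ent] & nxt_sents
            if nxt in seen or not shared:
                continue
            hop = sents + [min(shared)]
            if nxt == tail:
                return sorted(set(hop))
            if bridges + 1 <= alpha:  # nxt becomes one more bridging entity
                seen.add(nxt)
                queue.append((nxt, hop, bridges + 1))
    return None


mentions = {"irregular menstruation": {2}, "control group": {2, 4},
            "treatment group": {4, 6}, "treatment": {6, 7}, "norethindrone": {7}}
print(multi_hop_path("norethindrone", "irregular menstruation", mentions))  # [2, 4, 6, 7]
```

With α = 2 the same search returns None, since the chain needs three bridging entities.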
S2.1.4: Default path construction
When none of the above 3 rules applies, the most relevant sentences are used to roughly estimate the supporting evidence for the TCM relation: for a TCM head entity e_h and tail entity e_t in d_b (0 < b < k) with no path between them, all sentences containing e_h or e_t are collected as the default path. Specifically, given the set of sentences containing e_h and the set of sentences containing e_t, if there is no path between e_h and e_t, the two sets are connected to construct a default path. Traversing D_train for default path construction yields the default path set, and hence the complete path set Path_set, the union of the continuous, critical, multi-hop and default path sets. BioBERT is used to map Path_set to a feature set of τ (τ > 0) path features.
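A sketch of the default-path fallback, assuming the outputs of rules S2.1.1 to S2.1.3 are collected in a dictionary keyed by (head, tail); that structure, the function names and the entity names are hypothetical.

```python
def default_path(head_sents, tail_sents):
    """Rule S2.1.4: every sentence that mentions either entity."""
    return sorted(set(head_sents) | set(tail_sents))


def choose_path(pair, mentions, rule_paths):
    """Prefer a path from rules S2.1.1-S2.1.3; otherwise use the default path.

    pair: (head, tail); mentions: entity -> sentence indexes;
    rule_paths: (head, tail) -> sentence indexes found by the earlier rules.
    """
    path = rule_paths.get(pair)
    if path:
        return path
    head, tail = pair
    return default_path(mentions[head], mentions[tail])


mentions = {"entity A": [0, 3], "entity B": [5]}
print(choose_path(("entity A", "entity B"), mentions, {}))  # [0, 3, 5]
```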
S2.2: TCM feature extraction
The application integrates TCM prescription knowledge and, based on the BioBERT model, obtains from it the combined data D_extral of prescription name (pres), functional indications (ind), herbal components (herbs) and TCM efficacy (eff), where ι (ι > 0) is the number of TCM prescription knowledge entries:
The generated feature vectors are then concatenated by dimension expansion to obtain the complete feature representation of each prescription entity, and the structured TCM prescription knowledge base is denoted E.
S2.3: TCM prescription knowledge fusion
To handle the distribution difference between the TCM prescription knowledge base and the training data, the KL divergence (Kullback-Leibler divergence) is used to measure the distance between them, and knowledge whose distribution differs little from the training data is added to the candidate set C. Let the probability distribution of the a-th sample in the training data be P_a and that of the TCM prescription knowledge base E be Q. Taking the training data as the reference, the KL divergence from the TCM prescription knowledge to the training data is defined as:
$D_{KL}(Q \| P_a) = \sum_x Q(x) \log \dfrac{Q(x)}{P_a(x)}$ (2-2)
A KL divergence threshold ε (ε > 0) is set to select from E the prescription entities whose probability distribution is similar to that of the training data: when D_KL(Q||P_a) < ε, the prescription entity is added to the candidate set as an element μ_a = [pres_a, ind_a, herbs_a, eff_a], and the candidate set C is expressed as:
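The KL-based selection can be sketched for discrete distributions as follows. How Q is obtained for each prescription is not specified, so this sketch assumes every prescription already has a distribution over a shared vocabulary, and that a prescription enters C when D_KL(Q||P_a) < ε for at least one training sample. The prescription names and numbers are hypothetical.

```python
import math


def kl_divergence(q, p, eps=1e-12):
    """D_KL(Q || P) = sum_x Q(x) * log(Q(x) / P(x)) for discrete distributions."""
    return sum(qx * math.log((qx + eps) / (px + eps))
               for qx, px in zip(q, p) if qx > 0)


def select_candidates(prescriptions, train_dists, epsilon=0.1):
    """Keep prescriptions whose distribution Q is close to the training data.

    prescriptions: list of (record, Q); train_dists: list of P_a.
    Assumption: a prescription enters C if D_KL(Q || P_a) < epsilon for at
    least one training sample a.
    """
    return [rec for rec, q in prescriptions
            if any(kl_divergence(q, p) < epsilon for p in train_dists)]


prescriptions = [({"pres": "Wuji Baifeng pill"}, [0.5, 0.5]),
                 ({"pres": "prescription B"}, [0.99, 0.01])]
print(select_candidates(prescriptions, [[0.5, 0.5]]))  # [{'pres': 'Wuji Baifeng pill'}]
```

Note that the divergence is asymmetric, so swapping Q and P_a would generally give a different value; the sketch follows the D_KL(Q||P_a) direction used in the text.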
To avoid introducing excessive noise when knowledge from the candidate set C is used to supplement paths, the application uses a multi-layer perceptron to compute attention between the path feature set and the candidate set C, and, according to the attention score matrix ATTN_s, selects prescription entities from the candidate set for feature fusion with the path entities. Specifically, given the feature of the l-th path entity in the path feature set, its attention score with respect to C is computed as:
where W_a1 is the weight matrix of the first layer of the multi-layer perceptron, W_a2 is the weight matrix of the second layer, and ATTN_s is the resulting attention score matrix.
When the attention score of an entity in the l-th path with respect to an element of C exceeds the feature fusion threshold, the feature vectors of the two are added, as follows:
where the result denotes the fused path entity feature, yielding the enhanced path feature set path_fset = [path'_1, path'_2, ..., path'_τ].
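A minimal sketch of threshold-gated attention fusion. The scoring formula is not legible in the source, so this sketch assumes a two-layer perceptron (weights W_a1 and W_a2 with a tanh hidden layer) over the concatenated path and candidate features, followed by a softmax across candidates; every candidate whose attention exceeds the fusion threshold is added to the path feature. Names and weights are hypothetical.

```python
import math


def matvec(m, v):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_row(path_vec, candidates, w_a1, w_a2):
    """One row of ATTN_s: MLP score per candidate, softmax-normalized (assumed)."""
    scores = []
    for c in candidates:
        hidden = [math.tanh(h) for h in matvec(w_a1, path_vec + c)]
        scores.append(sum(w * h for w, h in zip(w_a2, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(path_vec, candidates, w_a1, w_a2, threshold=0.2):
    """Add each candidate whose attention exceeds the fusion threshold."""
    attn = attention_row(path_vec, candidates, w_a1, w_a2)
    fused = list(path_vec)
    for a, c in zip(attn, candidates):
        if a > threshold:
            fused = [x + y for x, y in zip(fused, c)]
    return fused


w_a1 = [[0.0] * 4, [0.0] * 4]  # 2 hidden units over a 4-dim concatenation
w_a2 = [0.0, 0.0]
print(fuse([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]], w_a1, w_a2))  # [2.0, 2.0]
```

With zero weights every candidate receives equal attention (0.5 each), so both pass a 0.2 threshold; raising the threshold to 0.6 leaves the path feature unchanged.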
4. The method, system, electronic device and medium for extracting a traditional Chinese medicine document-level relationship based on local path enhancement according to claim 1, wherein the step S3 further comprises the following specific steps:
S3.1: TCM relation extraction model construction
After the enhanced path feature set path_fset is obtained in step S2.3, the hidden-state representation of the feature sequence is obtained with the BioBERT model. To capture information about the TCM entities related to a path, a word-vector-based TCM entity mention representation is defined as:
where s and t denote the start and end positions of the k-th mention related to path c, the left-hand side denotes the representation of the k-th mention's fragment in path c, and the BioBERT hidden-state term denotes the hidden-state vector of the j-th token in the input TCM text sequence.
Based on this definition of TCM entity mentions, the TCM entity representation is given as:
where the result denotes the representation of the TCM entity in path c.
Then, for the current path c, a two-layer perceptron computes the probability of each TCM relation r:
where the left-hand side denotes the probability that the TCM entity pair (i, j) holds relation r, σ is the Sigmoid activation function, F(·) denotes the two-layer perceptron, the position term is the position feature between the TCM entities, and the product term is the element-wise product of the two TCM entity embedding vectors.
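The mention, entity and relation-probability steps can be sketched as follows. The pooling formulas are not legible in the source, so mean pooling over the token span and over the mentions is assumed, and F(·) is assumed to be a ReLU hidden layer followed by a linear output; one such scorer would be used per relation r. The hidden states and weights are toy lists standing in for BioBERT outputs, and all function names are hypothetical.

```python
import math


def mean_rows(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def mention_repr(hidden, s, t):
    """Mention feature: mean of hidden states for tokens s..t (inclusive)."""
    return mean_rows(hidden[s:t + 1])


def entity_repr(hidden, spans):
    """Entity feature in path c: mean of its mention features (assumed)."""
    return mean_rows([mention_repr(hidden, s, t) for s, t in spans])


def relation_prob(e_i, e_j, pos_feat, w1, b1, w2, b2):
    """sigma(F([e_i; e_j; e_i * e_j; pos])) with a two-layer perceptron F."""
    x = e_i + e_j + [a * b for a, b in zip(e_i, e_j)] + pos_feat
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy token states
e_i = entity_repr(hidden, [(0, 1), (2, 2)])
e_j = mention_repr(hidden, 1, 2)
print(relation_prob(e_i, e_j, [1.0], [[0.0] * 7], [0.0], [0.0], 0.0))  # 0.5
```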
S3.2: loss function construction
The BioBERT-based model predicts the relevance between each pair of TCM entities. Because the TCM relation types are imbalanced, the average loss over all paths is computed, the path scores are accumulated and the maximum is taken, and cross-entropy is used as the model's loss function, defined as follows:
where f denotes the index of the current path and g the index of the target class; the label takes the value 0 or 1, indicating whether path f belongs to target class g. The loss of each path is weighted so that paths associated with the target class g receive a higher penalty, while paths not associated with it receive a smaller penalty. p_f denotes the predicted score of the current path f, and w_f denotes the weight of path f, computed from how frequently the path appears in the training set: w_f is the ratio of the number of entity pairs of path f to the number of all entity pairs in the training set, calculated as follows:
$w_f = \dfrac{N_f}{\sum_{x \in Path_{set}} N_x}$ (3-5)
where N_f is the number of TCM entity pairs corresponding to the current path f, and N_x is the number of TCM entity pairs corresponding to path x in the path set Path_set. If a path is a critical path carrying the marker ['key'], it is given an initial weight δ_f (δ_f > 0), which is combined with w_f to obtain the adjusted weight w'_f. To avoid overfitting, the following L2 regularization term is introduced:
where θ denotes the model parameters and λ (λ > 0) is the regularization parameter. Combining this with equation (3-5) gives the final loss function:
$L(y, P, \theta) = L(y, P) + L_{reg}(\theta)$ (3-7)
S3.3: TCM relation prediction
During back-propagation in model training, the optimizer iteratively updates the model's weight parameters with the stochastic gradient descent (SGD) algorithm:
$w'_t = w_t - \epsilon \nabla_{w_t} L$ (3-8)
where w_t is the weight parameter currently being updated, ε (ε > 0) is the learning rate, ∇_{w_t}L is the gradient of the overall loss function with respect to w_t, and w'_t is the updated value of w_t.
The preprocessed TCM data are input into the trained TCM document-level relation extraction model to obtain the relation extraction prediction P_ij(r):
which aggregates the prediction results of the individual paths between the TCM entity pair (formula (3-9)).
The relation with the maximum probability is output as the final prediction of the model:
$\hat{r}_{ij} = \arg\max_r P_{ij}(r)$ (3-10)
5. The method, system, electronic device and medium for extracting traditional Chinese medicine document-level relations based on local path enhancement according to claim 2, 3 or 4, characterized in that:
in step S1.1, the local entity library M contains 28,780 entities in total, the TCM document dataset contains 9,668 documents in total, and the similarity threshold ψ = 0.8 is set;
in step S2.1.1, the minimum neighborhood threshold γ = 3 is set;
in step S2.1.3, the threshold α = 3 is set;
in step S2.3, the KL divergence threshold ε = 0.1 and the feature fusion threshold of 0.2 are set;
in step S3.2, the initial weight δ_c = 0.3 and the regularization parameter λ = 0.5 are set;
in step S3.3, the learning rate ε = 0.005 is set.
CN202311037068.2A 2023-08-17 2023-08-17 Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement Pending CN117151102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311037068.2A CN117151102A (en) 2023-08-17 2023-08-17 Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement


Publications (1)

Publication Number Publication Date
CN117151102A true CN117151102A (en) 2023-12-01

Family

ID=88905332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311037068.2A Pending CN117151102A (en) 2023-08-17 2023-08-17 Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement

Country Status (1)

Country Link
CN (1) CN117151102A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874174A (en) * 2024-03-11 2024-04-12 华南理工大学 Document relation extraction method based on relation priori bias
CN117874174B (en) * 2024-03-11 2024-05-10 华南理工大学 Document relation extraction method based on relation priori bias

Similar Documents

Publication Publication Date Title
CN110222201B (en) Method and device for constructing special disease knowledge graph
CN111274373B (en) Electronic medical record question-answering method and system based on knowledge graph
US9606990B2 (en) Cognitive system with ingestion of natural language documents with embedded code
CN110032648A (en) A kind of case history structuring analytic method based on medical domain entity
CN104516942B (en) The automatic merogenesis mark of Concept-driven test
CN111858940B (en) Multi-head attention-based legal case similarity calculation method and system
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
US10095740B2 (en) Selective fact generation from table data in a cognitive system
CN113505243A (en) Intelligent question-answering method and device based on medical knowledge graph
CN112650840A (en) Intelligent medical question-answering processing method and system based on knowledge graph reasoning
CN106776711A (en) A kind of Chinese medical knowledge mapping construction method based on deep learning
CN109783806B (en) Text matching method utilizing semantic parsing structure
CN110277167A (en) The Chronic Non-Communicable Diseases Risk Forecast System of knowledge based map
US11532387B2 (en) Identifying information in plain text narratives EMRs
CN112420191A (en) Traditional Chinese medicine auxiliary decision making system and method
CN114818717B (en) Chinese named entity recognition method and system integrating vocabulary and syntax information
Liu et al. Augmented LSTM framework to construct medical self-diagnosis android
CN113742493A (en) Method and device for constructing pathological knowledge map
CN117151102A (en) Traditional Chinese medicine document-level relation extraction method, system, electronic equipment and medium based on local path enhancement
CN115293161A (en) Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph
Cao et al. Automatic ICD code assignment based on ICD’s hierarchy structure for Chinese electronic medical records
Bansal Advanced Natural Language Processing with TensorFlow 2: Build effective real-world NLP applications using NER, RNNs, seq2seq models, Transformers, and more
CN113836321B (en) Method and device for generating medical knowledge representation
CN114626463A (en) Language model training method, text matching method and related device
CN113065355B (en) Professional encyclopedia named entity identification method, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination