CN117473104A - Knowledge graph construction method based on chronic disease management - Google Patents
- Publication number
- CN117473104A (application CN202311651846.7A)
- Authority
- CN
- China
- Prior art keywords
- chronic disease
- data
- entity
- causal
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a knowledge graph construction method based on chronic disease management, comprising the following steps: collecting and organizing data related to chronic diseases to obtain chronic disease data; constructing an a priori entity word set and an a priori relation word set related to chronic disease management; performing entity extraction on the chronic disease data to obtain entity matrix data; mining the causal structure of the entity matrix data to obtain a causal graph structure reflecting the causal relationships among the entities; and constructing a knowledge graph using the causal graph structure. Because causal relationships are integrated into the knowledge graph, the causal relationships of chronic diseases are easier to find during chronic disease management and the relationships involved can be described more accurately, which facilitates chronic disease management. This solves the problem that existing knowledge graphs for chronic disease management do not adequately reflect the causal relationships among entities, which limits their usefulness in chronic disease management.
Description
Technical Field
The invention relates to the technical field of medicine and data processing, in particular to a knowledge graph construction method based on chronic disease management.
Background
"Chronic disease" is a general term for non-infectious diseases in which long-term accumulation leads to morphological damage. Common chronic diseases include cardiovascular and cerebrovascular diseases, cancer, diabetes mellitus, chronic respiratory diseases and the like. Chronic diseases mainly damage vital organs such as the brain, heart and kidneys, readily cause disability, impair working capacity and quality of life, incur very high medical costs, and increase the economic burden on society and families.
At present, with the continuous improvement of medical care, chronic diseases are usually managed through a combination of active prevention and passive treatment. In the prevention and management of chronic diseases, the basic ideas and theoretical foundations of traditional Chinese medicine and Western medicine show little divergence and their effects can complement each other, so chronic diseases are usually managed by combining traditional Chinese and Western medicine.
Traditional Chinese medicine and Western medicine in China both have ICD codes, medical institutions such as hospitals have accumulated a certain amount of medical data, and some institutions have even built their own medical information systems. Nevertheless, the knowledge graphs in existing medical information systems do not adequately reflect the causal relationships among entities. Causal relationships are very valuable in chronic disease management, because they allow medical staff to provide patients with more accurate health management plans. Existing knowledge graph architectures therefore cannot fully play their role in the face of growing chronic disease management demands.
Disclosure of Invention
The invention aims to provide a knowledge graph construction method based on chronic disease management, so as to solve the problem that existing knowledge graphs for chronic disease management do not adequately reflect the causal relationships among entities, which limits their usefulness in chronic disease management.
In order to solve the technical problems, the invention provides a knowledge graph construction method based on chronic disease management, which comprises the following steps:
collecting and sorting data related to chronic diseases to obtain chronic disease data;
constructing a priori entity word set and a priori relation word set related to chronic disease management;
extracting the entity of the chronic disease data to obtain entity matrix data;
mining a causal structure of the entity matrix data to obtain a causal graph structure reflecting causal relations among the entities;
and constructing a knowledge graph by utilizing a causal graph structure.
Optionally, in the knowledge graph construction method based on chronic disease management, the method for collecting and sorting data related to chronic disease to obtain chronic disease data includes:
obtaining authorized chronic medical data from a medical facility, the chronic medical data including disease characteristics of the chronic disease, complications, treatment regimens, therapeutic drugs, and records of diagnosis and treatment of the patient;
obtaining chronic disease public data published on the Internet, wherein the chronic disease public data comprises medical inquiry records, medical inquiry response records and medical books;
and performing quality control on the acquired chronic disease medical data and chronic disease public data to obtain chronic disease data.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for quality controlling the obtained chronic disease medical data and the obtained chronic disease public data to obtain the chronic disease data includes:
removing data which are irrelevant to chronic diseases from the acquired chronic disease medical data and chronic disease public data;
eliminating data that clearly contradicts medical common sense from the acquired chronic disease medical data and chronic disease public data;
and, with reference to professional books and professionals, confirming and organizing the remaining chronic disease medical data and chronic disease public data to obtain the chronic disease data.
Optionally, in the knowledge graph construction method based on chronic disease management, the method for constructing a priori entity word set and a priori relationship word set related to chronic disease management includes:
obtaining a medical term standard library;
performing word segmentation and part-of-speech recognition on the text content of the medical term standard library to extract keywords;
screening nouns and verbs from the keywords, and taking the nouns as candidate entity words and the verbs as candidate relationship words;
screening and classifying the candidate entity words and the candidate relation words to obtain a priori entity word set and a priori relation word set.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for extracting entities from chronic disease data to obtain entity matrix data includes:
preprocessing chronic disease data;
extracting a priori entities from the preprocessed chronic disease data by text matching;
manually labeling a portion of high-quality samples and fine-tuning a BiLSTM-CRF model with them to perform model entity extraction;
and combining the prior entity extraction result and the model entity extraction result to obtain entity matrix data.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for preprocessing chronic disease data includes:
and removing garbled URLs, unusual symbols and unusual characters from the chronic disease data to obtain the preprocessed chronic disease data.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for manually labeling a portion of high-quality samples and fine-tuning a BiLSTM-CRF model to perform model entity extraction includes:
fine-tuning a BERT model in an unsupervised manner to obtain a vector for each word of the text;
manually labeling a portion of high-quality entity extraction task samples, wherein the manually labeled samples are drawn from both the chronic disease medical data and the chronic disease public data;
and performing model entity extraction on all text data using the trained BiLSTM-CRF model.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for mining a causal structure of entity matrix data to obtain a causal graph structure reflecting causal relationships between entities includes:
and mining a causal structure of the entity matrix data by using a causal discovery PC algorithm to obtain a causal graph structure reflecting causal relations among the entities.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for constructing a knowledge graph by using a causal graph structure includes:
extracting a causal event set of a chronic disease entity with obvious causal relation from a causal graph structure;
extracting relationships between entities from the entity matrix data;
and constructing a knowledge graph according to the relation between the chronic disease entity causal event set and the entity.
Optionally, in the method for constructing a knowledge graph based on chronic disease management, the method for constructing a knowledge graph according to a relation between a causal event set of a chronic disease entity and the entity includes:
constructing a basic knowledge graph according to the relation between the entities;
and taking the causal event set of the chronic disease entities as a calibration set and fine-tuning a DNN relation classification model, so as to fuse the causal event set into the basic knowledge graph and obtain the final knowledge graph.
The invention provides a knowledge graph construction method based on chronic disease management, comprising the following steps: collecting and organizing data related to chronic diseases to obtain chronic disease data; constructing an a priori entity word set and an a priori relation word set related to chronic disease management; performing entity extraction on the chronic disease data to obtain entity matrix data; mining the causal structure of the entity matrix data to obtain a causal graph structure reflecting the causal relationships among the entities; and constructing a knowledge graph using the causal graph structure. Because causal relationships are integrated into the knowledge graph, the causal relationships of chronic diseases are easier to find during chronic disease management and the relationships involved can be described more accurately, which facilitates chronic disease management. This solves the problem that existing knowledge graphs for chronic disease management do not adequately reflect the causal relationships among entities, which limits their usefulness in chronic disease management.
Drawings
Fig. 1 is a flowchart of the knowledge graph construction method based on chronic disease management provided in this embodiment;
Fig. 2 is an implementation block diagram of the knowledge graph construction method based on chronic disease management provided in this embodiment;
Fig. 3 is a schematic structural diagram of the LSTM model provided in this embodiment;
Fig. 4 is a schematic diagram of the causal discovery PC algorithm provided in this embodiment;
Fig. 5 is a (partial) visualization of the knowledge graph provided in this embodiment.
Detailed Description
The knowledge graph construction method based on chronic disease management provided by the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the drawings are highly simplified and not drawn to precise scale; they serve only to describe the embodiments of the invention conveniently and clearly. Furthermore, the structures shown in the drawings are often only part of the actual structures. In particular, the drawings differ in emphasis, each being intended to illustrate a particular aspect of the embodiments.
It is noted that "first", "second", etc. in the description and claims of the present invention and the accompanying drawings are used to distinguish similar objects so as to describe embodiments of the present invention, and not to describe a specific order or sequence, it should be understood that the structures so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment provides a knowledge graph construction method based on chronic disease management, as shown in fig. 1, including:
S1, collecting and sorting data related to chronic diseases to obtain chronic disease data;
S2, constructing a priori entity word set and a priori relation word set related to chronic disease management;
S3, entity extraction is carried out on the chronic disease data to obtain entity matrix data;
S4, mining a causal structure of the entity matrix data to obtain a causal graph structure reflecting causal relations among the entities;
S5, constructing a knowledge graph by utilizing a causal graph structure.
In the knowledge graph construction method based on chronic disease management provided in this embodiment, causal relationships are integrated into the knowledge graph. The causal relationships of chronic diseases are therefore easier to find during chronic disease management, and the relationships involved can be described more accurately, which facilitates chronic disease management. This solves the problem that existing knowledge graphs for chronic disease management do not adequately reflect the causal relationships among entities, which limits their usefulness in chronic disease management.
It should be noted that, in the knowledge graph construction method provided in the embodiment, the order among the steps may be adjusted according to the actual situation, or other steps may be added between the steps to optimize the effect. Other steps and order of steps may be added without departing from the spirit of the present application.
In a specific embodiment, as shown in fig. 2, the method for constructing a knowledge graph based on chronic disease management provided in this embodiment generally includes: collecting chronic disease related data from clinical guidelines, medical documents, health books, network resources and the like, and obtaining chronic disease data by auditing the collected data; then obtaining text labels and vectors by means of manual knowledge arrangement and manual label determination or means of network health knowledge and keyword extraction; finally, the text labels and vectors are processed by using a program to obtain a final knowledge graph, and the knowledge graph can be stored in a database.
Further, in the present embodiment, step S1, the method for collecting and sorting the data related to chronic diseases to obtain chronic disease data includes:
S11, acquiring authorized chronic disease medical data from medical institutions, wherein the chronic disease medical data includes, but is not limited to, disease characteristics, complications, treatment regimens, therapeutic drugs, and patients' diagnosis and treatment records.
Specifically, in this embodiment, chronic disease medical data may be obtained through an API from medical institutions such as hospitals, outpatient clinics and disease control departments. The data obtained covers both traditional Chinese medicine and Western medicine, so that the two can be combined to provide more effective guidance for chronic disease management.
S12, acquiring chronic disease public data published on the Internet, wherein the chronic disease public data comprises, but is not limited to, medical inquiry records, medical inquiry response records and medical books.
Specifically, in this embodiment, the chronic disease public data published on the Internet may be obtained by a web crawler.
S13, performing quality control on the acquired chronic disease medical data and chronic disease public data to obtain chronic disease data.
Specifically, in this embodiment, the method for quality control of the acquired chronic disease medical data and chronic disease public data includes: removing data that is irrelevant to chronic diseases from the acquired chronic disease medical data and chronic disease public data; eliminating data that clearly contradicts medical common sense from the acquired chronic disease medical data and chronic disease public data; and, with reference to professional books and professionals, confirming and organizing the remaining chronic disease medical data and chronic disease public data to obtain the chronic disease data.
In practical applications, data irrelevant to chronic diseases and data that contradicts medical common sense can be removed manually or with the help of a computer-based intelligent system. Confirmation and organization of the remaining chronic disease medical data and chronic disease public data is generally done manually to improve accuracy; at the same time, minor errors (such as slips of the pen, non-standard expressions and wrongly written characters) can be corrected by hand.
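The automated part of this quality control step could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Record` type, the keyword list `CHRONIC_KEYWORDS` and the phrase list `IMPLAUSIBLE_PHRASES` are hypothetical stand-ins for whatever vocabulary and rules a real system would use, and the records that pass are still meant to go to manual confirmation.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; a real system would use curated medical resources.
CHRONIC_KEYWORDS = ("diabetes", "hypertension", "coronary heart disease", "copd")
IMPLAUSIBLE_PHRASES = ("cured overnight", "guaranteed cure")


@dataclass
class Record:
    source: str  # e.g. "hospital" or "web" (assumed labels)
    text: str


def is_relevant(record: Record) -> bool:
    """Keep only records that mention at least one chronic-disease keyword."""
    text = record.text.lower()
    return any(keyword in text for keyword in CHRONIC_KEYWORDS)


def is_plausible(record: Record) -> bool:
    """Drop records containing phrases that contradict medical common sense."""
    text = record.text.lower()
    return not any(phrase in text for phrase in IMPLAUSIBLE_PHRASES)


def quality_control(records):
    """Return the records that survive both filters, ready for manual review."""
    return [r for r in records if is_relevant(r) and is_plausible(r)]
```

Simple keyword rules like these only approximate the first two filters; the confirmation and organization step described above remains a manual task.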
Further, in this embodiment, step S2, the method for constructing the prior entity word set and the prior relationship word set related to chronic disease management includes:
S21, obtaining a medical term standard library.
Specifically, in this embodiment, the medical term standard library may be drawn from representative domestic and international terminology standards, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), the OMAHA "tangram" medical term set, the Unified Medical Language System (UMLS), the Chinese Unified Medical Language System (CUMLS), and the like.
S22, text word segmentation and part-of-speech recognition are carried out on the text content in the medical term standard library so as to extract keywords.
Specifically, text word segmentation and part-of-speech recognition can be performed manually or by using a natural language processing model. Of course, the text word segmentation and part of speech recognition can be performed by using the natural language processing model, and then the processing result of the natural language processing model is confirmed by using manpower, so that the accuracy of the text word segmentation and part of speech recognition is improved. Methods for text segmentation and part-of-speech recognition using natural language processing models are well known to those skilled in the art and are not described in detail herein.
S23, nouns and verbs are screened from the keywords, with the nouns taken as candidate entity words and the verbs as candidate relation words.
Likewise, this step may be performed manually or using a natural language processing model. The keywords can also be screened with a natural language processing model first and the results then confirmed manually, which improves the accuracy of keyword screening.
S24, screening and classifying the candidate entity words and the candidate relation words to obtain the a priori entity word set and the a priori relation word set.
Preferably, this step relies on manual screening: experts with medical knowledge screen and classify the candidate entity words and candidate relation words, remove unreasonable candidates, decide whether each remaining candidate belongs to the entity word set or the relation word set, and classify the candidates by category, thereby obtaining the a priori entity word set and the a priori relation word set.
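The noun/verb screening in the steps above can be illustrated with a small sketch. It assumes that segmentation and part-of-speech tagging have already been done by some tool, and that the tool's noun tags start with "n" and its verb tags with "v"; that tag convention, the function name `split_candidates` and the `min_count` parameter are assumptions made for illustration, not details from the patent.

```python
from collections import Counter


def split_candidates(tagged_tokens, min_count=1):
    """Split (word, pos_tag) pairs into candidate entity and relation words.

    Nouns become candidate entity words, verbs become candidate relation
    words; all other parts of speech are ignored.
    """
    nouns, verbs = Counter(), Counter()
    for word, pos in tagged_tokens:
        if pos.startswith("n"):      # assumed noun tag prefix
            nouns[word] += 1
        elif pos.startswith("v"):    # assumed verb tag prefix
            verbs[word] += 1
    # Optionally drop rare words before handing the lists to expert review.
    entity_candidates = sorted(w for w, c in nouns.items() if c >= min_count)
    relation_candidates = sorted(w for w, c in verbs.items() if c >= min_count)
    return entity_candidates, relation_candidates
```

The two returned lists correspond to the candidate words that the experts then screen and classify by hand.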
Further, in the embodiment, step S3, the method for extracting the entity from the chronic disease data to obtain the entity matrix data includes:
S31, preprocessing the chronic disease data.
Specifically, in this embodiment, the method for preprocessing the chronic disease data includes: removing garbled URLs, unusual symbols and unusual characters from the chronic disease data to obtain the preprocessed chronic disease data. Preprocessing the chronic disease data further improves the efficiency and accuracy of subsequent data processing.
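A minimal sketch of such a cleaning step, using regular expressions, is given below. The patent does not define which symbols count as unusual, so the character whitelist in `UNUSUAL_RE` (word characters, whitespace and common punctuation) is an assumption chosen for illustration.

```python
import re

# URLs starting with http(s):// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Anything that is not a word character, whitespace or common punctuation.
# The whitelist below is an assumption; adjust it to the data at hand.
UNUSUAL_RE = re.compile(r"[^\w\s.,;:!?%()/<>=+\-，。；：！？（）、]")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Strip URLs and unusual symbols, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = UNUSUAL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()
```

Because `\w` matches Unicode letters and digits in Python 3, Chinese characters are kept while decorative symbols are removed.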
S32, extracting priori entities from the preprocessed chronic disease data in a text matching mode.
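Text matching against the a priori entity word set could, for example, be done with forward maximum matching, as sketched below. The patent does not say which matching strategy is used, so the longest-match-first rule and the function name `match_prior_entities` are assumptions.

```python
def match_prior_entities(text, prior_entities):
    """Find a priori entity words in text using forward maximum matching.

    Returns a list of (start, end, entity) tuples with non-overlapping spans,
    preferring the longest dictionary word at each position.
    """
    vocab = set(prior_entities)
    max_len = max((len(word) for word in vocab), default=0)
    matches = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab:
                matches.append((i, i + length, candidate))
                i += length
                break
        else:
            i += 1  # no dictionary word starts here
    return matches
```

For instance, with the dictionary {"diabetes", "type 2 diabetes"}, the longer entry wins wherever both could match.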
S33, manually labeling a portion of high-quality samples and fine-tuning a BiLSTM-CRF model to perform model entity extraction.
Specifically, in this embodiment, the method for manually labeling a portion of high-quality samples and fine-tuning a BiLSTM-CRF model to perform model entity extraction includes: first, fine-tuning a BERT model in an unsupervised manner to obtain a vector for each word of the text; then, manually labeling a portion of high-quality entity extraction task samples, wherein the manually labeled samples are drawn from both the chronic disease medical data and the chronic disease public data; and finally, performing model entity extraction on all text data using the trained BiLSTM-CRF model.
In this embodiment, the BERT model is fine-tuned for 10 epochs. The manual labeling scheme is "BIO", and 5000 data items are labeled. The BiLSTM-CRF model is a conventional, existing BiLSTM-CRF model. In practical applications, the number of training epochs, the labeling scheme and the amount of labeled data can be set as needed. Model training methods are well known to those skilled in the art and are not described in detail in this application.
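To make the "BIO" labeling scheme concrete, the sketch below converts annotated entity spans into one tag per character: "B-" marks the first character of an entity, "I-" marks the following characters, and "O" marks everything else. The label name `DISEASE` and the function name `to_bio` are hypothetical; the patent does not list its entity categories.

```python
def to_bio(text, spans):
    """Convert (start, end, label) spans into character-level BIO tags.

    `end` is exclusive, and spans are assumed not to overlap.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters
    return tags


# "糖尿病患者" (diabetes patient): the first three characters form the entity.
example_tags = to_bio("糖尿病患者", [(0, 3, "DISEASE")])
```

Sequences of this form are what a BiLSTM-CRF tagger is typically trained to predict, one tag per input character.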
It should be noted that analysis with conventional Cox regression runs into difficulties when the follow-up data is irregular. For example, some patients have 7 follow-up blood glucose measurements while others have only 3, and such ragged (unequal-length) data is difficult to analyze in that way. Therefore, this embodiment uses the BiLSTM-CRF model for analysis, wherein the LSTM model structure is shown in FIG. 3. The main body of the BiLSTM-CRF model consists of a bidirectional long short-term memory network and a conditional random field; the model input is character features, and the output is a predicted label for each character. By adopting the BiLSTM-CRF model, this embodiment can be tested on different populations, can provide a model for predicting individual risk, and can support analysis of the interactions among important predictive factors.
S34, combining the prior entity extraction result and the model entity extraction result to obtain entity matrix data.
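The application does not define the layout of the entity matrix data. The sketch below assumes one plausible layout: one row per record, one column per entity, and a 1 where the entity was found in that record by either extraction route. The function name `build_entity_matrix` and the example entities are hypothetical.

```python
def build_entity_matrix(prior_hits, model_hits):
    """Merge two extraction results into a binary record-by-entity matrix.

    prior_hits and model_hits are lists with one set of entity names per
    record, in the same record order.
    """
    # Union of both extraction routes, record by record.
    merged = [set(p) | set(m) for p, m in zip(prior_hits, model_hits)]
    columns = sorted(set().union(*merged)) if merged else []
    matrix = [[1 if c in row else 0 for c in columns] for row in merged]
    return columns, matrix


columns, matrix = build_entity_matrix(
    [{"diabetes"}, {"hypertension"}],                 # text-matching results
    [{"metformin"}, {"hypertension", "stroke"}],      # model results
)
```

A binary presence matrix of this kind is a common input for constraint-based causal discovery on discrete variables, which is why it is used as the assumed layout here.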
Still further, in this embodiment, step S4, the method for mining a causal structure of entity matrix data to obtain a causal graph structure reflecting causal relationships between entities includes:
Mining a causal structure from the entity matrix data by using the PC causal discovery algorithm, to obtain a causal graph structure reflecting the causal relationships between the entities.
The PC algorithm is a basic causal discovery algorithm: by iterating over the observed data for several rounds, it identifies the variables that are causally related and presents them in the data structure of a directed graph. As shown in FIG. 4, each edge in the causal graph structure, together with the entities at its two ends, forms a candidate causal event triplet <entity 1, causal relationship, entity 2>, where entity 1 is at the tail of the directed edge and entity 2 is at the head (the end indicated by the arrow).
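As a small illustration of the triplet convention described above, the following sketch turns directed edges into candidate causal event triplets. The edge list, the function name `edges_to_triplets` and the relation label `"causes"` are illustrative assumptions; the edges are taken as already produced by a causal discovery step and are not computed here.

```python
def edges_to_triplets(edges, relation="causes"):
    """Turn directed edges (tail, head) into <entity 1, relation, entity 2> triplets."""
    triplets = []
    for tail, head in edges:
        if tail == head:
            continue  # a self-loop carries no causal statement
        triplets.append((tail, relation, head))
    return triplets


# Hypothetical edges; in the method they would come from the PC algorithm.
example_edges = [("obesity", "type 2 diabetes"), ("type 2 diabetes", "nephropathy")]
candidate_triplets = edges_to_triplets(example_edges)
```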
After obtaining the causal graph structure, in this embodiment, step S5, the method for constructing a knowledge graph by using the causal graph structure includes:
S51, extracting, from the causal graph structure, a chronic disease entity causal event set with significant causal relationships.
Specifically, in this embodiment, for each candidate causal event triplet, causal identification and do-calculus causal effect estimation are performed to compute an average causal effect, and the magnitude of the average causal effect is taken as the confidence. Finally, only causal event triplets with a confidence greater than 0.05 are retained, forming the chronic disease entity causal event set with significant causal relationships.
In this embodiment, the causal event triples with a confidence level greater than 0.05 are considered as causal event triples with significant causal relationships. Of course, in other embodiments, different confidence thresholds may also be set to quantify the significance of the causal relationship.
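The application does not state which estimator is used for the average causal effect. As one hedged illustration, the sketch below uses the backdoor adjustment formula for binary variables, ACE = Σ_z [P(Y=1 | X=1, z) − P(Y=1 | X=0, z)] · P(z), and then applies the 0.05 threshold to the magnitude. The adjustment set `z`, the record layout, and both function names are assumptions.

```python
from collections import Counter


def average_causal_effect(rows, x, y, z):
    """Backdoor-adjusted average causal effect of binary x on binary y.

    rows is a list of dicts; z names one discrete adjustment variable.
    Strata lacking either x=0 or x=1 records are skipped, which is a
    simplification and can bias the estimate.
    """
    n = len(rows)
    z_counts = Counter(r[z] for r in rows)
    ace = 0.0
    for z_val, z_n in z_counts.items():
        means = {}
        for x_val in (0, 1):
            ys = [r[y] for r in rows if r[z] == z_val and r[x] == x_val]
            if not ys:
                break                      # stratum lacks one arm; skip it
            means[x_val] = sum(ys) / len(ys)
        else:
            ace += (means[1] - means[0]) * z_n / n   # weight by P(z)
    return ace


def keep_significant(triplets_with_ace, threshold=0.05):
    """Keep triplets whose effect magnitude exceeds the confidence threshold."""
    return [t for t, ace in triplets_with_ace if abs(ace) > threshold]
```

Using the absolute value follows the statement that the magnitude of the effect is the confidence, so strongly protective (negative) effects are retained as well.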
S52, extracting the relation among the entities from the entity matrix data.
Specifically, in this embodiment, the relationships between entities are extracted by training a DNN entity relationship classification model, as follows. First, on the basis of the entity matrix data, high-quality relation extraction classification task samples are manually labeled (5000 samples in this embodiment); the sample selection criteria are the same as for entity extraction, and the same pieces of data may be used. Next, the input of the DNN model is a graph triplet entity relation pair <entity 1, relation, entity 2>, where entity 1 and entity 2 are each represented by a text vector, and the relation is represented either by the text vector of the context in which entity 1 and entity 2 appear or by the one-hot encoding of the relation label R_Label of entity 1 and entity 2; the text vectors are extracted from the BERT model, and the relation labels are those manually labeled in the previous step. Then, to predict entity relationships, candidate entity pairs <entity 1, entity 2> are constructed for the DNN model: every 2 adjacent sentences (sentences are split at the Chinese or English full stop) are taken as one sample, entity pairs are formed among all entities in the sample, and the relationship of each entity pair is predicted by the trained DNN classification model. Finally, based on the predictions of the DNN entity relationship classification model, only high-confidence graph triplets are retained; in this embodiment, these are the entity relationship pairs whose relation label is not "unknown" and whose classification probability is greater than 0.9.
Of course, in practical applications, the high-confidence criterion may be set according to actual requirements; for example, entity relation pairs with a classification probability greater than 0.85 may be accepted as high-confidence graph triplets.
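The candidate pair construction and the high-confidence filter can be sketched as follows. This is a minimal sketch under stated assumptions: the 2-sentence samples are taken as a sliding window, pairs are unordered, entities are located by simple substring search, and both function names are hypothetical. Splitting on "." also splits decimals such as "7.5", which a real implementation would need to handle.

```python
import re
from itertools import combinations


def candidate_pairs(text, entities):
    """Build entity pairs from every window of 2 adjacent sentences."""
    sentences = [s for s in re.split(r"[。.]", text) if s.strip()]
    pairs = set()
    for i in range(max(1, len(sentences) - 1)):
        window = " ".join(sentences[i:i + 2])          # one 2-sentence sample
        present = sorted(e for e in entities if e in window)
        pairs.update(combinations(present, 2))         # all pairs in the sample
    return sorted(pairs)


def keep_high_confidence(predictions, min_prob=0.9):
    """Keep (e1, label, e2) where label is not 'unknown' and prob > min_prob."""
    return [
        (e1, label, e2)
        for e1, label, e2, prob in predictions
        if label != "unknown" and prob > min_prob
    ]


text = ("obesity raises the risk of diabetes. "
        "diabetes can lead to nephropathy. exercise helps.")
pairs = candidate_pairs(text, ["obesity", "diabetes", "nephropathy", "exercise"])
```

Passing `min_prob=0.85` reproduces the alternative threshold mentioned above.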
S53, constructing the knowledge graph from the chronic disease entity causal event set and the relationships between the entities.
Specifically, in this embodiment, the method for constructing the knowledge graph includes: first, constructing a basic knowledge graph from the relationships between the entities; then, taking the chronic disease entity causal event set as a calibration set and fine-tuning the DNN relation classification model, so that the chronic disease entity causal event set is fused into the basic knowledge graph to obtain the final knowledge graph. The detailed steps are as follows: the mined causal event triplet set <entity 1, causal relation, entity 2> is taken as the ground-truth check set group_set; the predicted triplet set <entity 1, predicted relation, entity 2> is taken as the prediction set prediction_set; the relation labels of each causal event in group_set and prediction_set are compared, and if they are consistent the DNN model has predicted the causal event correctly, otherwise it has predicted it incorrectly, which yields a correctly predicted set and an incorrectly predicted set of the causal event set; the DNN relation classification model is fine-tuned on the correctly and incorrectly predicted sets, where in this embodiment the sample weight of the incorrectly predicted set is 2 times that of the correctly predicted set, so that the model pays more attention to the incorrectly predicted samples during fine-tuning; finally, the fine-tuned DNN model re-predicts all triplets of the basic knowledge graph, the causal event triplets are merged into the prediction results, and where the two disagree the causal event result prevails, thereby obtaining the final knowledge graph.
In this embodiment, the DNN model used in this step is the same DNN model as that used in step S52, which improves model utilization and saves hardware resources. When the DNN model is fine-tuned, the weight setting can be adjusted according to actual needs, and the scope of protection of this application is not limited in this respect.
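The bookkeeping around the calibration step (comparing `group_set` with `prediction_set`, weighting the mispredicted samples 2 times, and letting the causal result prevail on conflict) can be sketched without the DNN itself. The dictionary layout keyed by entity pair and the three function names are assumptions for illustration; the fine-tuning call is omitted.

```python
def split_by_agreement(group_set, prediction_set):
    """Split causal events into correctly and incorrectly predicted triplets.

    Both arguments map (entity1, entity2) -> relation label.
    """
    correct, wrong = [], []
    for pair, causal_label in group_set.items():
        bucket = correct if prediction_set.get(pair) == causal_label else wrong
        bucket.append((pair[0], causal_label, pair[1]))
    return correct, wrong


def sample_weights(correct, wrong, wrong_factor=2.0):
    """Weights for fine-tuning: mispredicted samples count wrong_factor times."""
    return [1.0] * len(correct) + [wrong_factor] * len(wrong)


def merge_with_causal(predicted, group_set):
    """Merge causal events into predictions; the causal label wins on conflict."""
    merged = dict(predicted)
    merged.update(group_set)
    return merged


# Hypothetical example data.
group_set = {("smoking", "copd"): "causes", ("obesity", "diabetes"): "causes"}
prediction_set = {
    ("smoking", "copd"): "causes",
    ("obesity", "diabetes"): "associated_with",
    ("metformin", "diabetes"): "treats",
}
correct, wrong = split_by_agreement(group_set, prediction_set)
weights = sample_weights(correct, wrong)
final_graph = merge_with_causal(prediction_set, group_set)
```

In the method itself, `merge_with_causal` would be applied to the predictions of the fine-tuned model rather than to the original `prediction_set`; the original predictions are reused here only to keep the example self-contained.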
Preferably, the Neo4j graph database may be used to visualize the chronic disease management knowledge graph.
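One way to load the triplets into Neo4j for visualization is to generate parameterized Cypher `MERGE` statements. The sketch below only builds the statement and its parameters; the node label `Entity`, the relationship type `RELATION`, and the function name `triplet_to_cypher` are hypothetical and are not taken from the application.

```python
def triplet_to_cypher(entity1, relation, entity2):
    """Build a parameterized Cypher statement for one knowledge graph triplet.

    The relation name is stored as a property because Cypher does not allow
    relationship types to be supplied as query parameters.
    """
    query = (
        "MERGE (a:Entity {name: $e1}) "
        "MERGE (b:Entity {name: $e2}) "
        "MERGE (a)-[r:RELATION {type: $rel}]->(b)"
    )
    params = {"e1": entity1, "e2": entity2, "rel": relation}
    return query, params


query, params = triplet_to_cypher("obesity", "causes", "type 2 diabetes")
```

The returned pair could then be passed to a database session, for example the `run` method of a session in the official Neo4j Python driver; `MERGE` keeps repeated loads from creating duplicate nodes or edges.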
According to the knowledge graph construction method based on chronic disease management, the knowledge graph is fused with the causal event relationship among the entities, so that the knowledge graph is not only beneficial to further finding the causal relationship of the chronic disease, but also beneficial to the prevention and treatment of the chronic disease; the causal event mining adopts causal inference methods of causal discovery and causal effect estimation, so that mining results are more accurate and have better interpretability; therefore, more accurate relation description can be realized in the chronic disease management, the application range of the method is wide, and multi-scene application such as chronic disease management, active health, AI intelligent question and answer and the like can be realized.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The knowledge graph construction method based on chronic disease management provided by this embodiment comprises: collecting and sorting data related to chronic diseases to obtain chronic disease data; constructing an a priori entity word set and an a priori relation word set related to chronic disease management; performing entity extraction on the chronic disease data to obtain entity matrix data; mining a causal structure from the entity matrix data to obtain a causal graph structure reflecting the causal relationships between the entities; and constructing a knowledge graph using the causal graph structure. Because causal relationships are integrated into the knowledge graph, the causal relationships of chronic diseases are easier to find in chronic disease management and the relationships can be described more accurately, which facilitates chronic disease management work. This addresses the problem that existing knowledge graphs for chronic disease management do not adequately reflect the causal relationships between entities, which limits their usefulness in chronic disease management.
The above description is only illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention, and any alterations and modifications made by those skilled in the art based on the above disclosure shall fall within the scope of the appended claims.
Claims (10)
1. The knowledge graph construction method based on chronic disease management is characterized by comprising the following steps of:
collecting and sorting data related to chronic diseases to obtain chronic disease data;
constructing a priori entity word set and a priori relation word set related to chronic disease management;
extracting the entity of the chronic disease data to obtain entity matrix data;
mining a causal structure of the entity matrix data to obtain a causal graph structure reflecting causal relations among the entities;
and constructing a knowledge graph by utilizing a causal graph structure.
2. The knowledge graph construction method based on chronic disease management according to claim 1, wherein the method for collecting and sorting data related to chronic disease to obtain chronic disease data comprises:
obtaining authorized chronic medical data from a medical facility, the chronic medical data including disease characteristics of the chronic disease, complications, treatment regimens, therapeutic drugs, and records of diagnosis and treatment of the patient;
obtaining chronic disease disclosure data disclosed on the Internet, wherein the chronic disease disclosure data comprises medical inquiry records, medical inquiry records and medical books;
and performing quality control on the acquired chronic disease medical data and chronic disease public data to obtain chronic disease data.
3. The knowledge graph construction method based on chronic disease management according to claim 2, wherein the method for quality controlling the acquired chronic disease medical data and chronic disease public data to obtain chronic disease data comprises:
removing data which are irrelevant to chronic diseases from the acquired chronic disease medical data and chronic disease public data;
eliminating data which are obviously out of compliance with medical common sense from the acquired chronic disease medical data and chronic disease public data;
and, with reference to professional books and professionals, confirming and organizing the chronic disease medical data and chronic disease public data that remain after the removal, to obtain the chronic disease data.
4. The knowledge graph construction method based on chronic disease management according to claim 1, wherein the method for constructing prior entity word sets and prior relationship word sets related to chronic disease management comprises:
obtaining a medical term standard library;
text word segmentation and part-of-speech recognition are carried out on text contents in a medical term standard library so as to extract keywords;
screening nouns and verbs from the keywords, and taking the nouns as candidate entity words and the verbs as candidate relationship words;
screening and classifying the candidate entity words and the candidate relation words to obtain a priori entity word set and a priori relation word set.
5. The knowledge graph construction method based on chronic disease management according to claim 1, wherein the method for performing entity extraction on chronic disease data to obtain entity matrix data comprises:
preprocessing chronic disease data;
extracting priori entity from the preprocessed chronic disease data in a text matching mode;
manually labeling a portion of high-quality samples to fine-tune a BiLSTM-CRF model, and performing model entity extraction;
and combining the prior entity extraction result and the model entity extraction result to obtain entity matrix data.
6. The knowledge graph construction method based on chronic disease management according to claim 5, wherein the method for preprocessing chronic disease data comprises:
and removing stray website addresses, unusual symbols and unusual characters from the chronic disease data to obtain the preprocessed chronic disease data.
7. The knowledge graph construction method based on chronic disease management according to claim 5, wherein the method for performing model entity extraction by manually labeling part of high-quality sample fine tuning BiLSTM-CRF model comprises:
fine-tuning the BERT model in an unsupervised manner to obtain a vector for each word of the text;
manually labeling part of high-quality entity extraction task samples, wherein the manually labeled entity extraction task samples comprise chronic disease medical data and chronic disease public data;
and performing model entity extraction on all text data by using the trained BiLSTM-CRF model.
8. The knowledge graph construction method based on chronic disease management according to claim 1, wherein the method for mining a causal structure of entity matrix data to obtain a causal graph structure reflecting causal relationships between entities comprises:
and mining a causal structure of the entity matrix data by using a causal discovery PC algorithm to obtain a causal graph structure reflecting causal relations among the entities.
9. The knowledge graph construction method based on chronic disease management according to claim 1, wherein the method for constructing a knowledge graph using a causal graph structure comprises:
extracting a causal event set of a chronic disease entity with obvious causal relation from a causal graph structure;
extracting relationships between entities from the entity matrix data;
and constructing a knowledge graph according to the chronic disease entity causal event set and the relationships between the entities.
10. The knowledge graph construction method based on chronic disease management according to claim 9, wherein the method for constructing a knowledge graph according to the chronic disease entity causal event set and the relationships between the entities comprises:
constructing a basic knowledge graph according to the relation between the entities;
and taking the chronic disease entity causal event set as a calibration set, and fine-tuning a DNN relation classification model to fuse the chronic disease entity causal event set into the basic knowledge graph so as to obtain a final knowledge graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651846.7A CN117473104A (en) | 2023-12-04 | 2023-12-04 | Knowledge graph construction method based on chronic disease management |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651846.7A CN117473104A (en) | 2023-12-04 | 2023-12-04 | Knowledge graph construction method based on chronic disease management |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117473104A true CN117473104A (en) | 2024-01-30 |
Family
ID=89638016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311651846.7A Pending CN117473104A (en) | 2023-12-04 | 2023-12-04 | Knowledge graph construction method based on chronic disease management |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117473104A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118448049A (en) * | 2024-07-08 | 2024-08-06 | 恺恩泰(南京)科技有限公司 | Grouping distribution management system and method based on different chronic diseases |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111540468B (en) | ICD automatic coding method and system for visualizing diagnostic reasons | |
CN106874643B (en) | Method and system for automatically constructing knowledge base to realize auxiliary diagnosis and treatment based on word vectors | |
CN111274806B (en) | Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record | |
US10929420B2 (en) | Structured report data from a medical text report | |
CN109599185B (en) | Disease data processing method and device, electronic equipment and computer readable medium | |
CN111316281B (en) | Semantic classification method and system for numerical data in natural language context based on machine learning | |
Yu et al. | Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN | |
CN112597774B (en) | Chinese medical named entity recognition method, system, storage medium and equipment | |
CN111949759A (en) | Method and system for retrieving medical record text similarity and computer equipment | |
Carchiolo et al. | Medical prescription classification: a NLP-based approach | |
Bekhuis et al. | Feature engineering and a proposed decision-support system for systematic reviewers of medical evidence | |
CN112541056A (en) | Medical term standardization method, device, electronic equipment and storage medium | |
US20130060793A1 (en) | Extracting information from medical documents | |
CN117473104A (en) | Knowledge graph construction method based on chronic disease management | |
CN113539515A (en) | Clinical demand mining method and device, electronic equipment and storage medium | |
CN111180026A (en) | Special diagnosis and treatment view system and method | |
CN115954072A (en) | Intelligent clinical test scheme generation method and related device | |
CN113851208A (en) | Medical examination recommendation system and method based on explicit topic allocation technology | |
CN113643825B (en) | Medical case knowledge base construction method and system based on clinical key feature information | |
CN116775897A (en) | Knowledge graph construction and query method and device, electronic equipment and storage medium | |
CN113724878B (en) | Medical risk information pushing method and device based on machine learning | |
Funkner et al. | Negation Detection for Clinical Text Mining in Russian. | |
Sedghi et al. | Mining clinical text for stroke prediction | |
CN117194604B (en) | Intelligent medical patient inquiry corpus construction method | |
CN117609635A (en) | Collaborative filtering-based data pushing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||