CN115312186B - Auxiliary screening system for diabetic retinopathy - Google Patents
- Publication number
- CN115312186B (application number CN202210947675.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an auxiliary screening system for diabetic retinopathy and relates to the technical field of data processing. The system comprises: a data processing module, used for sequentially performing text standardization processing and text extraction processing on patient medical record data to obtain a multi-element structured data set; a knowledge graph construction module, used for constructing a diabetic retinopathy rule knowledge graph; a lesion determining module, used for calculating the similarity between the multi-element structured data set and each of a plurality of groups of case information in the diabetic retinopathy rule knowledge graph to determine a maximum data similarity value; and a result determining module, used for marking the multi-element structured data set when the maximum data similarity value is smaller than a set threshold, and for determining the case information corresponding to the maximum data similarity value as the screening result when the maximum data similarity value is greater than or equal to the set threshold. The invention can screen for diabetic retinopathy efficiently and accurately according to medical record information.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an auxiliary screening system for diabetic retinopathy.
Background
With the emergence and rise of a large number of deep learning models represented by CNNs, AI techniques such as image classification, object detection, and semantic segmentation have been widely applied in current auxiliary diagnostic systems, which are mainly based on clinical imaging. This reduces the workload of imaging physicians to a certain extent and improves working efficiency. However, imaging examination results still play only an auxiliary role: in actual clinical diagnosis, an accurate diagnosis cannot be made from imaging results alone, and a doctor needs to integrate information from multiple sources, such as the patient's medical history, chief complaint, and examination results, to make the final judgment. Therefore, at present, imaging AI does not truly reach the core of AI-enabled medicine.
Another common form is a rule-based statistical model built on expert experience. It uses feature engineering to model each patient's indicators in a standardized way, extracting information such as age, sex, region, medical history, symptoms, and test results as feature variables that are convenient for differential diagnosis. Each feature is assigned a weight according to expert experience, the patient's risk index for each disease is calculated comprehensively, and symptom mapping and similarity calculation are then performed between the risk indexes and the diseases in a standard disease library to complete the final diagnosis. However, as the number of disease types grows and the relations among diseases become more complex, this method requires maintaining several huge rule bases, disease bases, symptom bases, and the like at a later stage, and the diagnostic performance often depends heavily on the professional competence and experience of the experts who construct the rule bases. As a result, an AI auxiliary diagnosis system built this way has a high maintenance cost and a long construction period, and its stability and transferability are greatly reduced.
Disclosure of Invention
The invention aims to provide an auxiliary screening system for diabetic retinopathy, which can efficiently and accurately screen the diabetic retinopathy according to medical record information.
In order to achieve the above object, the present invention provides the following solutions:
a diabetic retinopathy auxiliary screening system comprising:
the data processing module is used for sequentially carrying out text standardization processing and text extraction processing on the patient medical record data so as to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities;
the knowledge graph construction module is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, lesion symptom sign, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug and recipe care;
the lesion determining module is respectively connected with the data processing module and the knowledge graph construction module and is used for respectively carrying out similarity calculation on the multi-element structured data set and a plurality of groups of case information in the diabetic retinopathy rule knowledge graph so as to determine the maximum data similarity value;
the result determining module is connected with the lesion determining module and is used for:
judging whether the maximum data similarity value is smaller than a set threshold value or not;
marking the multi-element structured data set when the maximum data similarity value is less than a set threshold;
and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
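The decision made by the result determining module can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `screen`, the case names, and the threshold value 0.8 are all assumptions introduced for the example, since the text does not state a concrete threshold.

```python
from typing import Dict, Optional, Tuple


def screen(similarities: Dict[str, float],
           threshold: float) -> Tuple[Optional[str], bool]:
    """Return (screening_result, marked) for one multi-element structured data set.

    `similarities` maps a case name in the rule knowledge graph to the
    similarity value between that case and the patient's data set.
    """
    if not similarities:
        # No case to compare against, so nothing can match: mark the record.
        return None, True
    best_case = max(similarities, key=similarities.get)
    if similarities[best_case] < threshold:
        # Maximum similarity below the threshold: mark the data set.
        return None, True
    # Maximum similarity at or above the threshold: report the matching case.
    return best_case, False


# Hypothetical similarity values and threshold, for illustration only.
print(screen({"case A": 0.91, "case B": 0.62}, threshold=0.8))
print(screen({"case A": 0.41, "case B": 0.62}, threshold=0.8))
```

A marked record carries no screening result; it is the kind of record that the optional case adding module described later would store back into the knowledge graph.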
Optionally, the data processing module specifically includes:
the standardized unified sub-module is used for carrying out word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set;
the text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition hybrid model for text extraction so as to obtain the multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model with a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is the word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
Optionally, with respect to training the Chinese named entity recognition hybrid model, the text extraction sub-module specifically includes:
the training set acquisition unit is used for acquiring a plurality of sample data;
the model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model; the BiLSTM layer is used for bidirectionally encoding the historical patient medical record data and calculating the probability that text entities, text relations and text attributes in the historical patient medical record data are marked as label information, so as to obtain a prediction tag group corresponding to the historical patient medical record data; the CRF layer is used for adding constraint conditions to the prediction tag group so as to obtain an optimal prediction tag group; the optimal prediction tag group is the multi-element structured data set.
Optionally, in terms of building the training set, the text extraction sub-module further includes:
an initial data acquisition unit for acquiring a plurality of history patient medical record data;
the word segmentation unit is used for performing word segmentation on each piece of historical patient medical record data based on the jieba word segmentation library so as to obtain word segmentation text;
the labeling unit is used for labeling the word segmentation text with the BIO labeling method so as to obtain labeled text; the labeled text includes text entities, text relations and text attributes;
and the word embedding unit is used for converting the labeled text into word vectors with the word2vec method.
Optionally, the lesion determination module specifically includes:
the similarity calculation sub-module is used for calculating a data similarity value according to a formula (the formula itself is not reproduced in this text);
wherein $\mathrm{Sim}(V_1, V_2)$ represents the data similarity value between a first entity $V_1$ and a second entity $V_2$; the first entity $V_1$ is the multi-element structured data set, and the second entity $V_2$ is any group of case information in the diabetic retinopathy rule knowledge graph; $V_{n1}$ represents the text attribute vector of the first entity, and $V_{n2}$ represents the text attribute vector of the second entity; $V_{\gamma 1}$ represents the text relationship vector of the first entity, and $V_{\gamma 2}$ represents the text relationship vector of the second entity; $\gamma$ represents the weight distribution between the text attribute vector and the text relationship vector;
wherein $C_1$ and $C_2$ both represent variables: when $C_1$ is $V_{n1}$, $C_2$ is $V_{n2}$; when $C_1$ is $V_{\gamma 1}$, $C_2$ is $V_{\gamma 2}$; $i$ represents an independent index variable, and $C$ represents the dimension of the vector;
and the maximum value determining sub-module is used for determining the maximum data similarity value according to the plurality of data similarity values.
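Because the formula is not reproduced in this text, the following is only a hedged sketch of one plausible reading of the symbol definitions: a per-pair similarity computed over the vector components, combined with a weight for the attribute term and the complementary weight for the relationship term. The choice of cosine similarity and the `gamma` / `1 - gamma` split are assumptions of this example, not something the text confirms; the function names are likewise hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(c1: Vector, c2: Vector) -> float:
    """Cosine similarity of two vectors of equal dimension (assumed measure)."""
    if len(c1) != len(c2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # Treat an all-zero vector as having no similarity.
    return dot / (norm1 * norm2)


def sim(v_n1: Vector, v_n2: Vector, v_r1: Vector, v_r2: Vector,
        gamma: float) -> float:
    """Weighted sum of attribute-vector and relationship-vector similarity."""
    return gamma * cosine(v_n1, v_n2) + (1.0 - gamma) * cosine(v_r1, v_r2)


def max_similarity(patient: Tuple[Vector, Vector],
                   cases: Dict[str, Tuple[Vector, Vector]],
                   gamma: float) -> Tuple[str, float]:
    """Compare one patient (attr_vec, rel_vec) with every case; return the best."""
    scores = {
        name: sim(patient[0], attr, patient[1], rel, gamma)
        for name, (attr, rel) in cases.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 2-dimensional vectors, for illustration only.
patient = ([1.0, 0.0], [0.6, 0.8])
cases = {
    "case A": ([1.0, 0.0], [0.6, 0.8]),
    "case B": ([0.0, 1.0], [0.8, 0.6]),
}
print(max_similarity(patient, cases, gamma=0.5))
```

The returned maximum is the value that the result determining module would then compare against the set threshold.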
Optionally, the diabetic retinopathy auxiliary screening system further comprises:
the case adding module is used for obtaining the case information corresponding to the marked multi-element structured data set and storing that case information in the diabetic retinopathy rule knowledge graph.
Optionally, the knowledge graph construction module specifically includes:
the data acquisition module is used for acquiring a plurality of groups of case information from the third party database;
and the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
Optionally, the diabetic retinopathy auxiliary screening system further comprises:
and the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses an auxiliary screening system for diabetic retinopathy, which is provided with a data processing module, a knowledge graph construction module, a lesion determining module and a result determining module. The data processing module processes the patient medical record data to obtain a multi-element structured data set containing text entities, text relations and text attributes; the knowledge graph construction module constructs a diabetic retinopathy rule knowledge graph comprising a plurality of groups of case information; the lesion determining module calculates the maximum data similarity value from the multi-element structured data set and the case information in the knowledge graph produced by those two modules; the result determining module outputs one of two results according to the maximum data similarity value: either it marks the multi-element structured data set, indicating that no corresponding case information is stored in the knowledge graph, or it determines the case information in the knowledge graph corresponding to the multi-element structured data set as the screening result. Through the cooperation of these modules, auxiliary screening of diabetic retinopathy can be carried out efficiently and quickly without manual participation. In addition, compared with the prior art, the auxiliary screening system provided by the invention has a simple structure and low maintenance cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an auxiliary screening system for diabetic retinopathy according to the present invention;
FIG. 2 shows the schema-layer structure and a data-layer population example of the diabetic retinopathy rule knowledge graph of the present invention;
FIG. 3 is a schematic diagram of the structure of a deep learning BiLSTM-CRF model in the diabetic retinopathy auxiliary screening system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide an auxiliary screening system for diabetic retinopathy that uses uncertain information (ambiguous, incomplete, or random) such as the symptoms, signs, and test results provided by patients to realize large-scale automatic screening of diabetic retinopathy. The system can greatly improve efficiency, remove human subjectivity, and avoid diagnostic and screening errors caused by differences in personal knowledge and experience, so that rapid screening of diabetic retinopathy patients can be completed more stably, efficiently, accurately, and objectively while saving medical costs.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
A knowledge graph (Knowledge Graph) is a net-like knowledge base formed by linking entities that have attributes through relationships. Using the relationships among entities as a bridge, massive knowledge in a database can conveniently be represented as triples, and the knowledge can be described completely and clearly through a graph formed by nodes and the many relationship edges between them. For example, the knowledge that "one of the common complications of diabetes is diabetic retinopathy" can be expressed in a knowledge graph as the triple (diabetes, complication, diabetic retinopathy).
On the other hand, starting from the existing entity relation data in the knowledge base, new associations among entities can be established through logic-based reasoning, graph-based reasoning and deep-learning-based reasoning, so that knowledge reasoning can be completed rapidly: mining hidden relations among entities, predicting entity types, inferring entity attribute values, and the like. For example, from the known triples (tiger, family, feline) and (feline, order, carnivora), the triple (tiger, order, carnivora) can be deduced; or, when the birthday attribute of an entity is known, its age attribute can be obtained by reasoning. In addition, because knowledge graph data contains entities, attributes, relationships and the like, a common relational database such as MySQL cannot reflect the characteristics of this data well, so knowledge graph data is generally stored in a graph database. Neo4j, the most common graph database, can store billions of nodes and form a huge graph network structure, which facilitates the construction of large-scale knowledge graphs.
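The logic-based reasoning example above (tiger, feline, carnivora) can be sketched with a single hand-written rule over a set of triples. This is only an illustration of the idea; the function name `infer_order` and the rule encoding are assumptions, and a real system would typically express such rules in a reasoner or a graph query language instead.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_order(triples: Set[Triple]) -> Set[Triple]:
    """Apply one rule: (x, family, y) and (y, order, z) imply (x, order, z).

    Returns only the triples that are new, i.e. not already in the input.
    """
    inferred: Set[Triple] = set()
    for subject, predicate, obj in triples:
        if predicate != "family":
            continue
        for subject2, predicate2, obj2 in triples:
            if predicate2 == "order" and subject2 == obj:
                inferred.add((subject, "order", obj2))
    return inferred - triples


known = {
    ("tiger", "family", "feline"),
    ("feline", "order", "carnivora"),
}
print(infer_order(known))
```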
As shown in FIG. 1, the present invention provides an auxiliary screening system for diabetic retinopathy, which comprises a data processing module 100, a knowledge graph construction module 200, a lesion determining module 300 and a result determining module 400.
(I) Data processing module
The data processing module 100 is used for sequentially performing text standardization processing and text extraction processing on patient medical record data to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities.
Specifically, during a patient consultation, a clinician collects relevant information about the patient through inquiry and other means, including general information (gender, age, contact information and the like), symptoms and signs, current medical history, past medical history, personal history, physical examination, laboratory examination results, conventional imaging examination results and the like, and generates the corresponding patient medical record based on this information.
The data processing module 100 specifically includes a standardized unified sub-module and a text extraction sub-module.
The standardized unified sub-module is used for performing word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set. Specifically, aliases, English names, abbreviations and other different names for the same concept in the patient medical record data are mapped to standard terms and disambiguated by coreference, using an offline synonym dictionary, a medical terminology library and the like. This completes the knowledge standardization of the medical record text and ensures that the entities stored in the rule KG are unique.
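The word mapping step can be sketched as a dictionary-driven replacement. This is a minimal illustration under stated assumptions: the synonym table below is invented for the example, the function name `normalize_terms` is hypothetical, and the sketch covers only alias mapping, not word sense disambiguation, which would need context.

```python
import re
from typing import Dict


def normalize_terms(text: str, synonyms: Dict[str, str]) -> str:
    """Replace every known alias in `text` with its standard term.

    All aliases are matched in a single pass, longest alias first, so a
    replacement is never itself rewritten by another alias.
    """
    if not synonyms:
        return text
    aliases = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(alias) for alias in aliases))
    return pattern.sub(lambda match: synonyms[match.group(0)], text)


# Hypothetical synonym table: alias -> standard term.
SYNONYMS = {
    "DR": "diabetic retinopathy",
    "T2DM": "type 2 diabetes mellitus",
}

print(normalize_terms("history of T2DM, suspected DR", SYNONYMS))
```

Plain substring matching like this can also fire inside longer words, so a production version would likely match against segmented tokens instead of raw text.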
The text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition hybrid model for text extraction so as to obtain the multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model with a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is the word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
In the aspect of constructing the training set, the text extraction sub-module further comprises an initial data acquisition unit, a word segmentation unit, a labeling unit and a word embedding unit. The initial data acquisition unit is used for acquiring a plurality of historical patient medical record data.
The word segmentation unit is used for performing word segmentation on each piece of historical patient medical record data based on the jieba word segmentation library so as to obtain word segmentation text. Specifically, the third-party Python library jieba is used for Chinese word segmentation, and its dictionary is supplemented with proper nouns such as medical and pharmaceutical terms; the supplemented dictionary contains a total of 23685 words. The historical patient medical record data is segmented based on this dictionary, and "/s" is used to represent sentence segmentation positions during word segmentation.
The labeling unit is used for labeling the word segmentation text with the BIO labeling method so as to obtain labeled text; the labeled text includes text entities, text relations and text attributes. Specifically, the BIO labeling method (Begin, Intermediate, Other) is used to label the word segmentation text, where "B" marks the first character of a word, "I" marks a non-first character of a word, and "O" marks a word of no interest or a punctuation mark. In the specific labeling process, the labeling scheme shown in Table 1 is used to distinguish the categories.
Table 1 BIO tag table
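The BIO scheme described above can be sketched as follows. The sketch assumes character-level tagging of already segmented words, which is one reading of the "first character / non-first character" description. The category suffix `SYM` and the function name `bio_tags` are hypothetical, since the contents of Table 1 are not available in this text.

```python
from typing import List, Optional, Tuple


def bio_tags(words: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Turn (word, category) pairs into per-character (character, tag) pairs.

    A category of None means the word is of no interest and gets the tag "O".
    """
    tagged: List[Tuple[str, str]] = []
    for word, category in words:
        for index, character in enumerate(word):
            if category is None:
                tagged.append((character, "O"))
            elif index == 0:
                tagged.append((character, "B-" + category))
            else:
                tagged.append((character, "I-" + category))
    return tagged


# "视物模糊" (blurred vision) labeled with a hypothetical symptom category.
for character, tag in bio_tags([("视物模糊", "SYM"), ("，", None)]):
    print(character, tag)
```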
The word embedding unit is used for converting the labeled text into word vectors with the word2vec method. Specifically, the words with their corresponding labels are embedded with the word2vec method to generate a 300-dimensional word vector matrix for model training. Table 2 shows the results for the BiLSTM+CRF hybrid model after word segmentation and sequence labeling of original chief-complaint sentences sampled from medical records.
Table 2 Example of sequence labeling in the BiLSTM+CRF model
With respect to training the Chinese named entity recognition hybrid model, the text extraction sub-module specifically comprises a training set acquisition unit and a model training unit. The training set acquisition unit is used for acquiring a plurality of sample data. The model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model.
As shown in FIG. 3, the BiLSTM layer is configured to bidirectionally encode the historical patient medical record data and calculate the probability that text entities, text relations and text attributes in the historical patient medical record data are marked as label information, so as to obtain a prediction tag group corresponding to the historical patient medical record data. Specifically, in the BiLSTM layer, the embedding sequence of the words of a sentence is first used as the input to each time step of a bidirectional LSTM; the hidden states output by the forward and backward passes are then concatenated to obtain a complete hidden state sequence; finally, an emission score (tag vector) corresponding to each word is output.
The CRF layer is used for adding constraint conditions to the prediction tag group so as to obtain an optimal prediction tag group; the optimal prediction tag group is the multi-element structured data set. Specifically, the predicted labels output on the basis of the emission scores are corrected by the constraints of the CRF layer, and the optimal predicted label for each word is then output.
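How tag constraints can correct labels chosen from emission scores alone can be sketched with Viterbi decoding, a standard way to find the best tag sequence under emission and transition scores. The text does not state which decoding procedure the CRF layer uses, so this is an assumption, and all scores below are toy values chosen for the example.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the emission score of `tag` at position t.
    transitions[prev][tag] is the score of moving from `prev` to `tag`;
    a missing entry is treated as a forbidden transition.
    """
    if not emissions:
        return []
    neg_inf = float("-inf")
    tags = list(emissions[0])
    score = dict(emissions[0])
    backpointers: List[Dict[str, str]] = []

    def step(prev: str, tag: str) -> float:
        return score[prev] + transitions.get(prev, {}).get(tag, neg_inf)

    for emission in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for tag in tags:
            best_prev = max(tags, key=lambda prev: step(prev, tag))
            new_score[tag] = step(best_prev, tag) + emission[tag]
            pointer[tag] = best_prev
        backpointers.append(pointer)
        score = new_score

    # Follow the backpointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: score[tag])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Toy scores: emissions alone would pick O then I, but O -> I is forbidden.
emissions = [
    {"B": 0.1, "I": 0.0, "O": 1.0},
    {"B": 0.2, "I": 1.0, "O": 0.3},
]
transitions = {
    "B": {"B": 0.0, "I": 0.0, "O": 0.0},
    "I": {"B": 0.0, "I": 0.0, "O": 0.0},
    "O": {"B": 0.0, "O": 0.0},
}
print(viterbi(emissions, transitions))
```

In a trained BiLSTM-CRF the transition scores are learned parameters and not a hand-written table; the table here only makes the effect of one constraint visible.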
(II) Knowledge graph construction module
The knowledge graph construction module 200 is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, symptom sign of the lesion, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug, and recipe care.
The knowledge graph construction module 200 specifically includes a data acquisition module and a knowledge graph construction module. The data acquisition module is used for acquiring a plurality of groups of case information from the third party database; the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
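How a set of SPO (subject, predicate, object) triples becomes a queryable graph can be sketched as follows. The triples, relation names, and helper functions here are hypothetical examples, not data or code from the patent; a production system would load the triples into a graph database instead of an in-memory index.

```python
from collections import defaultdict

# Hypothetical SPO triples; entity and relation names are illustrative only.
TRIPLES = [
    ("moderate non-proliferative DR", "has_symptom", "hard exudate"),
    ("moderate non-proliferative DR", "has_symptom", "cotton-wool spot"),
    ("moderate non-proliferative DR", "is_stage_of", "diabetic retinopathy"),
    ("diabetic retinopathy", "has_symptom", "hard exudate"),
]


def build_graph(triples):
    """Index triples as subject -> predicate -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        graph[subject][predicate].add(obj)
    return graph


def objects(graph, subject, predicate):
    """Return the sorted objects linked to `subject` by `predicate`."""
    return sorted(graph.get(subject, {}).get(predicate, set()))


graph = build_graph(TRIPLES)
print(objects(graph, "moderate non-proliferative DR", "has_symptom"))
```

Indexing by subject and then by predicate keeps lookups of the form "all symptoms of disease X" cheap, which matches the way case information is grouped per lesion in the description.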
Further, Fig. 2 shows the schema-layer structure of the diabetic retinopathy rule knowledge graph (abbreviated as "rule KG") constructed in the present invention, together with an example of data-layer population. A medical auxiliary diagnosis system is a typical application deployed in a serious medical scenario, and such applications place stringent requirements on the accuracy and authority of knowledge. Therefore, all the knowledge-point SPO triple data used to construct the rule KG were gathered from authoritative sources such as official "Ophthalmology" textbooks, Baidu Yidian (Baidu's medical encyclopedia), 360 Health, and the Xunyiwenyao medical website; they were collected, checked, and integrated by several staff members, and passed blind review, spot checks, and quality control by several review groups of ophthalmologists from Grade III hospitals. The resulting rule KG contains 11 types and a total of 1930 entities. The entity types and their quantity distribution under the schema layer of the rule KG are detailed in Table 3 (a piece of data may be added to the knowledge graph only if it conforms to the entity objects and entity types predefined by the schema).
Table 3. Entity type and quantity distribution of the rule KG
In addition, the resulting rule KG contains 25 types and a total of 2396 distinct entity-relation-entity SPO triplets. Its content covers medical knowledge on four aspects of diabetic retinopathy and related diseases: diagnosis, treatment, management, and prevention. This provides a solid basis for subsequent rapid screening and accurate diagnosis of the two types and six stages of diabetic retinopathy based on the rule KG. Moreover, based on the rule KG and the patient's constitution, the invention can provide diagnosed diabetic retinopathy patients with personalized treatment plans, medication suggestions, recommended recipes, nursing modes, and the like, helping patients prevent disease and manage and control its course. The corresponding entity relationship types and their quantity distribution are shown in Table 4:
Table 4. Entity relationship type and quantity distribution of the rule KG
(III) Lesion determination module
The lesion determination module 300 is respectively connected with the data processing module and the knowledge graph construction module, and is used for performing similarity calculation between the multi-element structured data set and each of the multiple groups of case information in the diabetic retinopathy rule knowledge graph so as to determine a maximum data similarity value. Specifically, feature selection and symptom reduction are first performed on the input multi-element structured data set. Because knowledge expressions such as chief complaints and medical history in a clinical case library inevitably contain redundancy, redundant features and knowledge expressions are deleted on the condition that the classification and inference capability remains unchanged; this effectively reduces the complexity of the subsequent similarity calculation.
The lesion determination module 300 specifically includes a similarity calculation sub-module and a maximum value determination sub-module. In a medical knowledge graph, an entity is identified chiefly by its name and attributes, while the relationships between entities serve only as a reference. A weight γ is therefore set in the similarity calculation to represent the weight distribution between the entity name/attribute vector and the relationship vector, and the relationship vector includes only relationships of important categories. Finally, the similarity of two entities is obtained by combining the two vector similarities as a root mean square. Based on this, the similarity calculation submodule calculates the data similarity value according to the formula (not reproduced in this text).
Wherein $Sim(V_1, V_2)$ represents the data similarity value of a first entity $V_1$ and a second entity $V_2$; the first entity $V_1$ is the multi-element structured data set, and the second entity $V_2$ is any group of case information in the diabetic retinopathy rule knowledge graph; $V_{n1}$ represents the text attribute vector of the first entity and $V_{n2}$ the text attribute vector of the second entity; $V_{\gamma 1}$ represents the text relationship vector of the first entity and $V_{\gamma 2}$ the text relationship vector of the second entity; $\gamma$ represents the weight distribution between the text attribute vector and the text relationship vector, and can be set individually for different entities, for example with reference to the diagnostic criteria of specialist physicians.
Wherein $C_1$ and $C_2$ both represent variables: when $C_1$ is $V_{n1}$, $C_2$ is $V_{n2}$; when $C_1$ is $V_{\gamma 1}$, $C_2$ is $V_{\gamma 2}$; $i$ is the index over the vector dimensions and $C$ represents the dimension of the vector. When $C_1$ is $V_{n1}$ and $C_2$ is $V_{n2}$, $C_{1i}$ represents the $i$-th dimension of the text attribute vector of the first entity and $C_{2i}$ the $i$-th dimension of the text attribute vector of the second entity. Similarly, when $C_1$ is $V_{\gamma 1}$ and $C_2$ is $V_{\gamma 2}$, $C_{1i}$ represents the $i$-th dimension of the text relationship vector of the first entity and $C_{2i}$ the $i$-th dimension of the text relationship vector of the second entity.
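The formula itself does not survive in this text, so the following is only a sketch under stated assumptions, not the patented formula. It assumes that each pair of vectors is compared by cosine similarity over its dimensions, and that the two results are combined as a γ-weighted root mean square, sqrt(γ·s_attr² + (1 − γ)·s_rel²); the function names and the default γ are hypothetical.

```python
import math


def cosine_similarity(c1, c2):
    """Cosine similarity of two equal-length vectors (assumed per-vector measure)."""
    if len(c1) != len(c2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm1 * norm2)


def entity_similarity(vn1, vn2, vr1, vr2, gamma=0.7):
    """Assumed gamma-weighted root mean square of the two vector similarities.

    vn1, vn2: attribute vectors of the first and second entity.
    vr1, vr2: relationship vectors of the first and second entity.
    gamma: weight of the attribute similarity (hypothetical default).
    """
    s_attr = cosine_similarity(vn1, vn2)
    s_rel = cosine_similarity(vr1, vr2)
    return math.sqrt(gamma * s_attr ** 2 + (1.0 - gamma) * s_rel ** 2)


# Identical attributes but orthogonal relationship vectors.
print(entity_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], gamma=0.75))
```

Under these assumptions a larger γ makes the result depend more on names and attributes than on relationships, in line with the statement that relationships serve only as a reference.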
The maximum value determination submodule is used for determining a maximum data similarity value according to the plurality of data similarity values.
(IV) Result determination module
A result determination module 400, coupled to the lesion determination module, for:
judging whether the maximum data similarity value is smaller than a set threshold value or not; marking the multi-element structured data set when the maximum data similarity value is less than a set threshold; and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
Specifically, the set threshold is 0.8. If the maximum data similarity value is greater than or equal to 0.8, the most similar medical record (i.e., the case information corresponding to the maximum data similarity value) is output, and diagnostic reasoning is performed according to its content; this resembles the analogical reasoning of clinicians. If the maximum data similarity value is lower than 0.8, the medical record is considered a new medical record not present in the case KG; in that case, reasoning is carried out from scratch according to the medical knowledge stored in the rule KG, and a diagnostic conclusion is finally given, a process that resembles the logical reasoning of a clinician.
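The threshold decision can be sketched in a few lines. The 0.8 threshold comes from the description; the function name, the dictionary input, and the returned status labels are assumptions made for this illustration.

```python
THRESHOLD = 0.8  # value stated in the description


def screen(similarities, threshold=THRESHOLD):
    """Decide the screening outcome from {case identifier: similarity value}."""
    if not similarities:
        return {"status": "new_case", "case": None, "similarity": None}
    best_case = max(similarities, key=similarities.get)
    best_value = similarities[best_case]
    if best_value >= threshold:
        # Analogical reasoning: reuse the most similar known case.
        return {"status": "matched", "case": best_case, "similarity": best_value}
    # Below the threshold: mark the record as new; rule-based reasoning follows.
    return {"status": "new_case", "case": None, "similarity": best_value}


print(screen({"case-001": 0.62, "case-002": 0.91}))
print(screen({"case-001": 0.62, "case-003": 0.75}))
```

Note that a value exactly equal to the threshold counts as a match, consistent with the "greater than or equal to" wording above.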
After processing, newly generated medical records can be stored in the case KG to expand it. Clinicians periodically extract rules from the case KG and distill them, so that new medical knowledge is continuously added to the rule KG. This is similar to how clinical staff accumulate clinical experience over time. In the cold-start stage, the invention supports simulating a doctor's diagnostic reasoning process relying only on the rule KG, achieving rapid screening and diagnosis of diabetic retinopathy patients.
Preferably, the diabetic retinopathy auxiliary screening system further comprises:
the case adding module is used for obtaining the case information corresponding to the marked multi-structured data set and storing the case information corresponding to the marked multi-structured data set to the diabetic retinopathy rule knowledge graph.
And the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information. Specifically, the identified character strings representing entities, relationships, attributes, and the like are mapped to the corresponding entities of the rule KG through entity linking, which finally realizes the association mapping between the rule KG and information such as the patient's symptoms, signs, and examination results in the medical record to be diagnosed.
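A dictionary-based form of entity linking can be sketched as follows. The alias table, the canonical entity names, and both helper functions are hypothetical; the patent does not specify the linking algorithm, and a real system might rely on embedding similarity or a curated terminology instead.

```python
# Hypothetical alias table: surface string -> canonical entity name in the rule KG.
ALIASES = {
    "cotton wool spot": "cotton-wool spot",
    "cotton-wool spots": "cotton-wool spot",
    "soft exudate": "cotton-wool spot",
    "hard exudates": "hard exudate",
}
# Hypothetical canonical entity names assumed to exist in the rule KG.
KG_ENTITIES = {"cotton-wool spot", "hard exudate", "dot-and-blot hemorrhage"}


def link_entity(mention):
    """Return the canonical KG entity for a mention, or None if it cannot be linked."""
    key = " ".join(mention.lower().split())  # normalize case and whitespace
    if key in KG_ENTITIES:
        return key
    return ALIASES.get(key)


def link_record(mentions):
    """Split mentions into linked (mention -> entity) and unlinked lists."""
    linked, unlinked = {}, []
    for mention in mentions:
        entity = link_entity(mention)
        if entity is None:
            unlinked.append(mention)
        else:
            linked[mention] = entity
    return linked, unlinked


print(link_record(["Hard exudates", "soft exudate", "floaters"]))
```

Returning the unlinked mentions separately makes it possible to review strings that have no counterpart in the graph instead of silently dropping them.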
In one embodiment, a specific application of the diabetic retinopathy auxiliary screening system of the present invention is as follows:
(1) Table 5 summarizes the filed information of the desensitized medical record to be diagnosed, which mainly records the patient's personal information, symptoms, signs, chief complaints, fundus image examination results, and the like.
Table 5. Summary of the filed medical record to be diagnosed
(2) The data processing module is executed on the filed medical record to be diagnosed in Table 5: in the medical record text knowledge normalization step, medical terms and entity names are normalized to achieve joint disambiguation (for example, of spot bleeding, exudation, hard exudation, and cotton wool spots); the BiLSTM+CRF hybrid model is then invoked in the medical record text knowledge extraction step to extract the disease/symptom entities in the medical record text (such as hypertension and sheet hemorrhage) and the corresponding attribute values (such as diabetes mellitus-LastTime: 3 months). The resulting structured medical record to be diagnosed is shown in Table 6.
Table 6. Summary of the structured medical record to be diagnosed obtained by the data processing module
(3) Using the extracted symptom entities, a preset Cypher statement is executed in the Neo4j graph database that stores the rule KG to perform an association search over entities and relationships.
The query returns the eye diseases that present with both hard exudates and cotton-wool spots, namely diabetic retinopathy and moderate non-proliferative DR. Combining a further symptom that the two diseases do not share, dot-and-blot hemorrhage, the patient to be diagnosed can then be rapidly diagnosed with moderate non-proliferative DR. At this point, the rapid screening of whether the patient in this example suffers from diabetic retinopathy is complete.
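The association search in step (3) can be sketched as follows. The patent does not disclose its Cypher statement, so the query text below is a hypothetical example whose node labels, relationship type, and property name are assumptions; the Python function applies the same "disease has all observed symptoms" logic to toy in-memory data that mirrors the worked example, with English symptom names that are this sketch's own renderings.

```python
# Hypothetical Cypher text; labels, relationship type and property are assumptions.
CYPHER = """
MATCH (d:Disease)-[:HAS_SYMPTOM]->(s:Symptom)
WHERE s.name IN $symptoms
WITH d, count(DISTINCT s) AS hits
WHERE hits = size($symptoms)
RETURN d.name AS disease
"""

# Toy data standing in for the graph database (mirrors the worked example only).
SYMPTOMS_BY_DISEASE = {
    "diabetic retinopathy": {"hard exudate", "cotton-wool spot"},
    "moderate non-proliferative DR": {
        "hard exudate",
        "cotton-wool spot",
        "dot-and-blot hemorrhage",
    },
    "unrelated disease (placeholder)": {"placeholder symptom"},
}


def candidates(observed, kg=SYMPTOMS_BY_DISEASE):
    """Return the diseases whose symptom set contains every observed symptom."""
    observed = set(observed)
    return sorted(d for d, symptoms in kg.items() if observed <= symptoms)


first_pass = candidates(["hard exudate", "cotton-wool spot"])
second_pass = candidates(["hard exudate", "cotton-wool spot", "dot-and-blot hemorrhage"])
print(first_pass)
print(second_pass)
```

The first pass keeps two candidates and the additional, non-shared symptom narrows them to one, which is the two-step narrowing described in the example.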
For complex cases, when the patient's category within the two types and six stages of diabetic retinopathy cannot be determined from information such as symptoms and signs alone, the attribute information of the symptom entities (such as cotton-wool spots (loc: macula, num: few)) can additionally be used for association mapping and rule matching against the disease typing and staging standard (disease Classify Standard) in the rule KG, so as to reach a final diagnosis. In addition, if a mild patient's signs and symptoms are not obvious, basic information such as the patient's region, age, sex, and medical history can further be combined and compared, by similarity calculation, with characteristics in the rule KG such as high-incidence regions and high-risk groups for diabetic retinopathy, to further determine whether the patient suffers from diabetic retinopathy.
In a specific embodiment, the knowledge-graph-based auxiliary screening described for diabetic retinopathy can be applied to any other disease according to service requirements, and the corresponding rule KG can be constructed with reference to the schema-layer structure designed by the invention. Further, the medical record filing method may restrict the patient's hospital, the information entry mode, and the like according to actual needs; the devices and methods for the laboratory and imaging examinations needed during diagnosis, the return of report readings, the characteristics of the tested population, and the like may also be set according to actual needs.
Compared with the prior art, the invention has the following advantages:
(1) The invention builds on the knowledge graph's strengths in eliminating the interference of linguistic ambiguity and of complex relations among related entities, and on its self-evolution and reasoning capabilities in revealing the dynamic development of a knowledge domain. It integrates the rule KG and the case KG as "dual engines" to screen newly entered medical records to be diagnosed, which significantly improves the accuracy of results and the self-evolution capability of the diabetic retinopathy auxiliary diagnosis system.
(2) In the initial cold-start stage, the invention can rapidly screen diabetic retinopathy patients by simulating a doctor's screening reasoning process relying only on the rule KG. Real clinical case data of patients to be diagnosed can subsequently be introduced continuously: the data processing module and the lesion determination module extract the corresponding entities, relations, attributes, and other information of the structured medical record to be diagnosed, after which similarity calculation and multi-layer deep retrieval can be carried out against the diabetic retinopathy case knowledge graph. This enables multi-mode comprehensive reasoning and decision-making, and achieves the final goal of intelligently assisted, rapid, and accurate screening for diabetic retinopathy.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein with reference to specific examples, and the description of these examples is intended only to assist in understanding the method of the present invention and its core ideas; meanwhile, those of ordinary skill in the art may, in light of the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, the content of this description should not be construed as limiting the invention.
Claims (7)
1. A diabetic retinopathy auxiliary screening system, the system comprising:
the data processing module is used for sequentially carrying out text standardization processing and text extraction processing on the patient medical record data so as to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities;
the knowledge graph construction module is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, lesion symptom sign, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug and recipe care;
the lesion determining module is respectively connected with the data processing module and the knowledge graph construction module and is used for respectively carrying out similarity calculation on the multi-element structured data set and a plurality of groups of case information in the diabetic retinopathy rule knowledge graph so as to determine the maximum data similarity value; the lesion determination module specifically comprises: similarity calculation submodule for calculating similarity according to formula
Calculating a data similarity value;
wherein $Sim(V_1, V_2)$ represents the data similarity value of a first entity $V_1$ and a second entity $V_2$; the first entity $V_1$ is the multi-element structured data set, and the second entity $V_2$ is any group of case information in the diabetic retinopathy rule knowledge graph; $V_{n1}$ represents the text attribute vector of the first entity and $V_{n2}$ the text attribute vector of the second entity; $V_{\gamma 1}$ represents the text relationship vector of the first entity and $V_{\gamma 2}$ the text relationship vector of the second entity; $\gamma$ represents the weight distribution between the text attribute vector and the text relationship vector;
wherein $C_1$ and $C_2$ both represent variables: when $C_1$ is $V_{n1}$, $C_2$ is $V_{n2}$; when $C_1$ is $V_{\gamma 1}$, $C_2$ is $V_{\gamma 2}$; $i$ represents the index over the vector dimensions, and $C$ represents the dimension of the vector;
the maximum value determining submodule is used for determining a maximum data similarity value according to the plurality of data similarity values;
the result determining module is connected with the lesion determining module and is used for:
judging whether the maximum data similarity value is smaller than a set threshold value or not;
marking the multi-element structured data set when the maximum data similarity value is less than a set threshold;
and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
2. The diabetic retinopathy auxiliary screening system according to claim 1, wherein the data processing module specifically comprises:
the standardized unified sub-module is used for carrying out word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set;
the text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition mixed model for text extraction so as to obtain a multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model by adopting a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
3. The diabetic retinopathy auxiliary screening system according to claim 2, wherein the text extraction submodule specifically includes, in terms of training of a chinese named entity recognition hybrid model:
the training set acquisition unit is used for acquiring a plurality of sample data;
the model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model; the BiLSTM layer is used for carrying out bidirectional coding on the historical patient medical record data, and calculating the probability that text entities, text relations and text attributes in the historical patient medical record data are marked as label information so as to obtain a prediction tag group corresponding to the historical patient medical record data; the CRF layer is used for adding constraint conditions to the prediction tag group so as to obtain an optimal prediction tag group; the optimal prediction tag group is the multi-element structured data set.
4. The diabetic retinopathy auxiliary screening system according to claim 2, wherein in terms of construction of a training set, the text extraction sub-module further comprises:
an initial data acquisition unit for acquiring a plurality of history patient medical record data;
the word segmentation unit is used for carrying out character division on each history patient medical record data based on the jieba word segmentation library so as to obtain word segmentation texts;
the marking unit is used for marking the word segmentation text by adopting a BIO marking method so as to obtain a marked text; the noted text includes a text entity, a text relationship and a text attribute;
and the word embedding unit is used for converting the marked text into a word vector by adopting a word2vec method.
5. The diabetic retinopathy auxiliary screening system according to claim 1, further comprising:
the case adding module is used for obtaining the case information corresponding to the marked multi-structured data set and storing the case information corresponding to the marked multi-structured data set to the diabetic retinopathy rule knowledge graph.
6. The diabetic retinopathy auxiliary screening system according to claim 1, wherein the knowledge graph construction module specifically comprises:
the data acquisition module is used for acquiring a plurality of groups of case information from the third party database;
and the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
7. The diabetic retinopathy auxiliary screening system according to claim 1, further comprising:
and the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210947675.1A CN115312186B (en) | 2022-08-09 | 2022-08-09 | Auxiliary screening system for diabetic retinopathy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115312186A (en) | 2022-11-08 |
CN115312186B (en) | 2023-06-09 |
Family
ID=83860001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210947675.1A Active CN115312186B (en) | 2022-08-09 | 2022-08-09 | Auxiliary screening system for diabetic retinopathy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115312186B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118016316B (en) * | 2024-04-10 | 2024-06-04 | 健数(长春)科技有限公司 | Disease screening rate improving method and system by combining knowledge graph with blood routine test data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111414393A (en) * | 2020-03-26 | 2020-07-14 | 湖南科创信息技术股份有限公司 | Semantic similar case retrieval method and equipment based on medical knowledge graph |
CN111767410A (en) * | 2020-06-30 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Construction method, device, equipment and storage medium of clinical medical knowledge map |
CN112164460A (en) * | 2020-10-19 | 2021-01-01 | 科技谷(厦门)信息技术有限公司 | Intelligent disease auxiliary diagnosis system based on medical knowledge map |
CN112364174A (en) * | 2020-10-21 | 2021-02-12 | 山东大学 | Patient medical record similarity evaluation method and system based on knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN115312186A (en) | 2022-11-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||