CN115312186B - Auxiliary screening system for diabetic retinopathy - Google Patents

Publication number
CN115312186B
Authority
CN
China
Prior art keywords
text
data
diabetic retinopathy
entity
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210947675.1A
Other languages
Chinese (zh)
Other versions
CN115312186A (en
Inventor
代黎明
张冬冬
杨洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhizhen Internet Technology Co ltd
Original Assignee
Beijing Zhizhen Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhizhen Internet Technology Co ltd
Priority to CN202210947675.1A
Publication of CN115312186A
Application granted
Publication of CN115312186B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses an auxiliary screening system for diabetic retinopathy and relates to the technical field of data processing. The system comprises: a data processing module, used for sequentially performing text standardization and text extraction on patient medical record data to obtain a multi-element structured data set; a knowledge graph construction module, used for constructing a diabetic retinopathy rule knowledge graph; a lesion determination module, used for calculating the similarity between the multi-element structured data set and each of multiple groups of case information in the diabetic retinopathy rule knowledge graph to determine the maximum data similarity value; and a result determination module, used for marking the multi-element structured data set when the maximum data similarity value is smaller than a set threshold, and for determining the case information corresponding to the maximum data similarity value as the screening result when the maximum data similarity value is greater than or equal to the set threshold. The invention can screen for diabetic retinopathy efficiently and accurately from medical record information.

Description

Auxiliary screening system for diabetic retinopathy
Technical Field
The invention relates to the technical field of data processing, in particular to an auxiliary screening system for diabetic retinopathy.
Background
With the emergence and rise of a large number of deep learning models represented by CNNs, AI techniques such as image classification, object detection, and semantic segmentation have been widely applied in current auxiliary diagnosis systems, which are based mainly on clinical imaging. This reduces the workload of imaging doctors to a certain extent and improves their efficiency. However, imaging examination results still play only an auxiliary role: in actual clinical diagnosis, an accurate diagnosis cannot be made from imaging results alone, and a doctor must integrate information from all sources, such as the patient's medical history, chief complaint, and examination results, to reach a final accurate judgment. Imaging AI therefore has not yet reached the core of AI-enabled medicine.
Another common form is a rule-based statistical model built on expert experience. It uses feature engineering to model each patient's indicators in a standardized way, extracting information provided by the patient, such as age, sex, region, medical history, symptoms, and examination results, as feature variables that support differential diagnosis. Each variable is assigned a weight according to expert experience, risk indices for the patient's possible diseases are computed, and these are matched against the diseases in a standard disease library through symptom mapping and similarity calculation to reach the final diagnosis. However, as the number of disease types grows and the relationships among diseases become more complex, this method requires maintaining several large rule bases, disease bases, symptom bases, and so on, and its diagnostic performance often depends heavily on the professional competence and experience of the experts who built the rule bases. As a result, such an AI auxiliary diagnosis system is costly to maintain and slow to build, and its stability and portability are greatly reduced.
Disclosure of Invention
The invention aims to provide an auxiliary screening system for diabetic retinopathy, which can efficiently and accurately screen the diabetic retinopathy according to medical record information.
In order to achieve the above object, the present invention provides the following solutions:
a diabetic retinopathy auxiliary screening system comprising:
the data processing module is used for sequentially carrying out text standardization processing and text extraction processing on the patient medical record data so as to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities;
the knowledge graph construction module is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, lesion symptom sign, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug and recipe care;
the lesion determining module is respectively connected with the data processing module and the knowledge graph construction module and is used for respectively carrying out similarity calculation on the multi-element structured data set and a plurality of groups of case information in the diabetic retinopathy rule knowledge graph so as to determine the maximum data similarity value;
the result determining module is connected with the lesion determining module and is used for:
judging whether the maximum data similarity value is smaller than a set threshold value or not;
marking the multi-element structured data set when the maximum data similarity value is less than a set threshold;
and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
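The decision rule of the result determining module can be sketched as follows. This is a minimal illustration only: the names `ScreeningResult` and `determine_result`, and the threshold value used in the example, are assumptions and do not come from the patent text.

```python
from typing import NamedTuple, Optional, Sequence


class ScreeningResult(NamedTuple):
    flagged: bool              # True when no stored case is similar enough
    case_index: Optional[int]  # index of the matched case, or None if flagged
    max_similarity: float      # the maximum data similarity value


def determine_result(similarities: Sequence[float], threshold: float) -> ScreeningResult:
    """Apply the threshold rule to the similarity values of all stored cases."""
    if not similarities:
        raise ValueError("at least one similarity value is required")
    best_index = max(range(len(similarities)), key=similarities.__getitem__)
    best = similarities[best_index]
    if best < threshold:
        # Below the threshold: mark the data set instead of returning a case.
        return ScreeningResult(True, None, best)
    # Greater than or equal to the threshold: the best case is the result.
    return ScreeningResult(False, best_index, best)


# Hypothetical similarity values for three stored cases and a threshold of 0.8.
result = determine_result([0.2, 0.9, 0.5], threshold=0.8)
```

A marked data set corresponds to a record for which the knowledge graph holds no sufficiently similar case.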
Optionally, the data processing module specifically includes:
the standardized unified sub-module is used for carrying out word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set;
the text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition hybrid model for text extraction so as to obtain a multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model with a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
Optionally, with respect to training the Chinese named entity recognition hybrid model, the text extraction sub-module specifically includes:
the training set acquisition unit is used for acquiring a plurality of sample data;
the model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model; the BiLSTM layer is used for carrying out bidirectional coding on the historical patient medical record data, and calculating the probability that text entities, text relations and text attributes in the historical patient medical record data are marked as label information so as to obtain a prediction tag group corresponding to the historical patient medical record data; the CRF layer is used for adding constraint conditions to the prediction tag group so as to obtain an optimal prediction tag group; the optimal prediction tag group is the multi-element structured data set.
Optionally, in terms of building the training set, the text extraction sub-module further includes:
an initial data acquisition unit for acquiring a plurality of historical patient medical record data;
the word segmentation unit is used for segmenting each historical patient medical record data into words based on the jieba word segmentation library so as to obtain word segmentation texts;
the labeling unit is used for labeling the word segmentation text by adopting a BIO labeling method so as to obtain a labeled text; the labeled text includes a text entity, a text relationship and a text attribute;
and the word embedding unit is used for converting the labeled text into a word vector by adopting a word2vec method.
Optionally, the lesion determination module specifically includes:
a similarity calculation sub-module, used for calculating a data similarity value according to the formula
[Formula provided as an image in the original publication (BDA0003788003440000031).]
wherein Sim(V_1, V_2) represents the data similarity value between a first entity V_1 and a second entity V_2; the first entity V_1 is the multi-element structured data set, and the second entity V_2 is any group of case information in the diabetic retinopathy rule knowledge graph; V_n1 represents the text attribute vector of the first entity, and V_n2 represents the text attribute vector of the second entity; V_γ1 represents the text relationship vector of the first entity, and V_γ2 represents the text relationship vector of the second entity; γ represents the weight distribution between the text attribute vector and the text relationship vector;
[Formula provided as an image in the original publication (BDA0003788003440000032).]
wherein C_1 and C_2 both represent variables: when C_1 is V_n1, C_2 is V_n2; when C_1 is V_γ1, C_2 is V_γ2; i represents the index variable, and C represents the dimension of the vector;
and the maximum value determining sub-module is used for determining the maximum data similarity value according to the plurality of data similarity values.
Optionally, the diabetic retinopathy auxiliary screening system further comprises:
the case adding module is used for obtaining the case information corresponding to the marked multi-element structured data set and storing that case information in the diabetic retinopathy rule knowledge graph.
Optionally, the knowledge graph construction module specifically includes:
the data acquisition module is used for acquiring a plurality of groups of case information from the third party database;
and the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
Optionally, the diabetic retinopathy auxiliary screening system further comprises:
and the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses an auxiliary screening system for diabetic retinopathy, which is provided with a data processing module, a knowledge graph construction module, a lesion determination module and a result determination module. The data processing module processes the patient medical record data to obtain a multi-element structured data set containing text entities, text relations and text attributes; the knowledge graph construction module constructs a diabetic retinopathy rule knowledge graph comprising a plurality of groups of case information; the lesion determination module calculates the maximum data similarity value from the multi-element structured data set and the case information in the knowledge graph produced by those two modules; the result determination module outputs one of two results according to the maximum data similarity value: either it marks the multi-element structured data set, indicating that no corresponding case information is stored in the knowledge graph, or it determines the case information in the knowledge graph that corresponds to the multi-element structured data set as the screening result. Through the cooperation of these modules, auxiliary screening of diabetic retinopathy can be carried out efficiently and quickly without manual participation. In addition, compared with the prior art, the auxiliary screening system provided by the invention has a simple structure and a low maintenance cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an auxiliary screening system for diabetic retinopathy according to the present invention;
FIG. 2 is a schematic diagram of the schema-layer structure and a data-layer population example of the diabetic retinopathy rule knowledge graph of the present invention;
FIG. 3 is a schematic diagram of the structure of a deep learning BiLSTM-CRF model in the diabetic retinopathy auxiliary screening system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide an auxiliary screening system for diabetic retinopathy that can screen for diabetic retinopathy automatically and at scale by using uncertain information (ambiguous, incomplete, or random), such as the symptoms, signs, and examination results provided by patients. The system can greatly improve efficiency, remove human subjectivity, and avoid diagnosis and screening errors caused by differences in personal knowledge and experience, so that rapid screening of diabetic retinopathy patients is completed more stably, efficiently, accurately, and objectively, and medical costs are saved.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
A Knowledge Graph is a net-like knowledge base formed by linking entities that have attributes through relationships. Using the relationships among entities as a bridge, massive knowledge in a database can be conveniently represented as triples, and the knowledge can be described completely and clearly by a graph of nodes and edges carrying numerous relationships. For example, the knowledge that "one of the common complications of diabetes is diabetic retinopathy" can be expressed in a knowledge graph as the triple (diabetes, complication, diabetic retinopathy).
On the other hand, starting from the entity relation data already in the knowledge base, new associations among entities can be established through logic-based, graph-based, and deep-learning-based reasoning, so that knowledge reasoning can be completed rapidly to mine hidden relations among entities, predict entity types, infer entity attribute values, and so on. For example, from (tiger, family, feline) and (feline, order, carnivora), one can infer (tiger, order, carnivora); or, given the birthday attribute of an entity, its age attribute can be inferred. In addition, because knowledge graph data consists of entities, attributes, and relationships, a common relational database such as MySQL cannot reflect the characteristics of this data well. Knowledge graph data is therefore generally stored in a graph database. Neo4j, the most commonly used graph database, can store billions of nodes to form a huge graph network structure, which facilitates the construction of a large-scale knowledge graph.
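The triple representation and the tiger example can be illustrated with a small sketch. It assumes that triples are held as plain Python tuples and applies a single hand-written rule; the function name `infer_order` is hypothetical, and the sketch stands in for neither a graph database nor the reasoning methods actually used by the invention.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_order(triples: Set[Triple]) -> Set[Triple]:
    """Apply one illustrative rule:
    (x, family, y) and (y, order, z)  =>  (x, order, z).
    Returns only the triples that are not already in the input."""
    inferred = set()
    for s1, p1, o1 in triples:
        if p1 != "family":
            continue
        for s2, p2, o2 in triples:
            if p2 == "order" and s2 == o1:
                inferred.add((s1, "order", o2))
    return inferred - triples


kg = {
    ("diabetes", "complication", "diabetic retinopathy"),
    ("tiger", "family", "feline"),
    ("feline", "order", "carnivora"),
}
new_facts = infer_order(kg)  # contains ("tiger", "order", "carnivora")
```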
As shown in fig. 1, the present invention provides an auxiliary screening system for diabetic retinopathy, which comprises a data processing module 100, a knowledge graph construction module 200, a lesion determination module 300 and a result determination module 400.
(I) Data processing module
The data processing module 100 is used for sequentially performing text standardization processing and text extraction processing on patient medical record data to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities.
Specifically, during patient consultation, a clinician collects relevant information of a patient through inquiry and other modes, including general information (gender, age, contact mode and the like), patient symptom signs, current medical history, past history, personal history, physical examination, laboratory examination results, conventional imaging examination results and the like, and generates corresponding patient diagnosis medical records based on the information.
The data processing module 100 specifically includes a standardized unified sub-module and a text extraction sub-module.
The standardized unified sub-module is used for carrying out word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set. Specifically, aliases, English names, abbreviations, and other different names for the same concept in the patient medical record data are mapped to standard terms and disambiguated by coreference resolution, using an offline synonym dictionary, a medical term library, and the like. This completes the knowledge standardization of the medical record text and ensures that each entity stored in the rule KG is unique.
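The word-mapping step can be sketched as a dictionary lookup. The sketch below is an assumption-laden illustration: the `SYNONYMS` table and the function `normalize` are hypothetical, the example uses English strings, and the word sense disambiguation step is not covered.

```python
import re
from typing import Dict

# Hypothetical synonym dictionary: alias or abbreviation -> standard term.
SYNONYMS: Dict[str, str] = {
    "DR": "diabetic retinopathy",
    "T2DM": "type 2 diabetes mellitus",
    "type II diabetes": "type 2 diabetes mellitus",
}


def normalize(text: str, synonyms: Dict[str, str]) -> str:
    """Replace every known alias in the text with its standard term."""
    # Longest aliases first, so a short alias cannot pre-empt a longer one.
    aliases = sorted(synonyms, key=len, reverse=True)
    # Matching is case-sensitive on purpose: "DR" must not match "Dr".
    # The \b anchors suit space-delimited text; for Chinese text they
    # would have to be dropped or replaced by segmentation.
    pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in aliases) + r")\b")
    return pattern.sub(lambda m: synonyms[m.group(0)], text)


standard_text = normalize("History of T2DM; suspected DR.", SYNONYMS)
# "History of type 2 diabetes mellitus; suspected diabetic retinopathy."
```

Mapping all aliases to one standard term is what keeps a single entity from being stored under several names in the rule KG.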
The text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition hybrid model for text extraction so as to obtain a multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model with a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
In terms of constructing the training set, the text extraction sub-module further comprises an initial data acquisition unit, a word segmentation unit, a labeling unit and a word embedding unit. The initial data acquisition unit is used for acquiring a plurality of historical patient medical record data.
The word segmentation unit is used for segmenting each historical patient medical record data into words based on the jieba word segmentation library so as to obtain word segmentation texts. Specifically, word segmentation is performed with jieba, a third-party Chinese word segmentation library for Python, whose dictionary is supplemented with proper nouns such as medical and pharmaceutical terms; the supplemented dictionary contains 23685 entries in total. The historical patient medical record data is segmented on this basis, and "/s" marks sentence boundaries during segmentation.
The labeling unit is used for labeling the word segmentation text by adopting a BIO labeling method so as to obtain a labeled text; the labeled text includes a text entity, a text relationship and a text attribute. Specifically, the word segmentation text is labeled with the BIO labeling method (Begin, Intermediate, Other), in which "B" marks the first character of a word, "I" marks a non-first character of a word, and "O" marks a word that is not of interest or a punctuation mark. In the labeling process, the tags shown in Table 1 are used to distinguish the categories.
Table 1 BIO tag table
[Table 1 is provided as images in the original publication (BDA0003788003440000071, BDA0003788003440000081).]
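The BIO labeling step can be sketched as a conversion from annotated spans to per-unit tags. Because the tag set of Table 1 is not reproduced in this text, the labels `SYM` and `DUR` below are hypothetical, as are the function name `bio_tags` and the English example; in the described system the labeled units are the characters of Chinese words.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # everything not of interest stays "O"
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"       # first unit of the span
        for i in range(start + 1, end):  # remaining units of the span
            tags[i] = f"I-{label}"
    return tags


tokens = ["blurred", "vision", "for", "two", "weeks", "."]
spans = [(0, 2, "SYM"), (3, 5, "DUR")]  # hypothetical symptom and duration spans
tags = bio_tags(tokens, spans)
# ['B-SYM', 'I-SYM', 'O', 'B-DUR', 'I-DUR', 'O']
```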
The word embedding unit is used for converting the labeled text into a word vector by adopting a word2vec method. Specifically, the words with their corresponding labels are embedded using the word2vec method to generate a 300-dimensional word vector matrix for model training. Table 2 shows the result of word segmentation and sequence labeling of an original chief-complaint sentence from a sampled medical record, as used by the BiLSTM+CRF hybrid model.
Table 2 Example of sequence labeling in the BiLSTM+CRF model
[Table 2 is provided as an image in the original publication (BDA0003788003440000082).]
With respect to training the Chinese named entity recognition hybrid model, the text extraction sub-module specifically comprises a training set acquisition unit and a model training unit. The training set acquisition unit is used for acquiring a plurality of sample data. The model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model.
As shown in fig. 3, the BiLSTM layer is configured to perform bidirectional encoding on the historical patient medical record data, and calculate the probability that a text entity, a text relationship, and a text attribute in the historical patient medical record data are marked as tag information, so as to obtain a prediction tag group corresponding to the historical patient medical record data. Specifically, in the BiLSTM layer, the embedding of each word in a sentence is used as the input at the corresponding time step of the bidirectional LSTM; the hidden states output by the forward and backward passes are then concatenated to obtain the complete hidden state sequence, from which the emission score (tag vector) of each word is output.
The CRF layer is used for adding constraint conditions to the prediction tag group so as to obtain an optimal prediction tag group; the optimal prediction tag group is the multi-element structured data set. Specifically, the CRF layer applies its constraints to the emission scores to correct the predicted labels, and then outputs the optimal predicted label for each word.
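How CRF-style constraints can correct labels predicted from emission scores alone may be illustrated with a small Viterbi decoder. Everything in this sketch is an assumption made for illustration: the three-tag set, the hand-set emission and transition scores, and the function name `viterbi`. In a trained BiLSTM-CRF model the emission scores come from the BiLSTM layer and the transition scores are learned.

```python
from typing import List

NEG_INF = float("-inf")
TAGS = ["O", "B-SYM", "I-SYM"]  # hypothetical tag set


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]],
            start: List[float]) -> List[int]:
    """Return the highest-scoring tag index sequence.
    transitions[p][c] is the score of moving from tag p to tag c."""
    n_tags = len(start)
    score = [start[t] + emissions[0][t] for t in range(n_tags)]
    backptr = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for cur in range(n_tags):
            best_prev = max(range(n_tags),
                            key=lambda p: score[p] + transitions[p][cur])
            prev_best.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + emit[cur])
        backptr.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=score.__getitem__)]
    for prev_best in reversed(backptr):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path


# Constraints: a sequence may not start with I-SYM, and O may not precede I-SYM.
start = [0.0, 0.0, NEG_INF]
transitions = [
    [0.0, 0.0, NEG_INF],  # from O
    [0.0, 0.0, 0.0],      # from B-SYM
    [0.0, 0.0, 0.0],      # from I-SYM
]
# Per-token argmax of these scores would give the invalid I-SYM, I-SYM, O.
emissions = [[1.0, 0.5, 2.0], [0.2, 0.1, 1.5], [2.0, 0.1, 0.3]]
decoded = [TAGS[i] for i in viterbi(emissions, transitions, start)]
# ['B-SYM', 'I-SYM', 'O']
```

The decoder trades a lower emission score on the first token for a sequence that satisfies the BIO constraints as a whole.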
(II) knowledge graph construction module
The knowledge graph construction module 200 is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, symptom sign of the lesion, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug, and recipe care.
The knowledge graph construction module 200 specifically includes a data acquisition module and a knowledge graph construction module. The data acquisition module is used for acquiring a plurality of groups of case information from the third party database; the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
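The conversion of one group of case information into SPO (subject, predicate, object) triplets can be sketched as follows. The field names, the example values, and the function name `case_to_spo` are hypothetical and chosen only for illustration; the lesion name is assumed to be the subject of every triplet.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def case_to_spo(lesion_name: str, fields: Dict[str, List[str]]) -> List[Triple]:
    """Emit one triplet per field value, with the lesion name as subject."""
    return [
        (lesion_name, predicate, value)
        for predicate, values in fields.items()
        for value in values
    ]


# Illustrative case record; not taken from the rule KG of the invention.
triples = case_to_spo(
    "proliferative diabetic retinopathy",
    {
        "symptom_sign": ["neovascularization", "vitreous hemorrhage"],
        "imaging_examination": ["fundus photography"],
    },
)
```

A knowledge graph is then the union of the triplets produced for all groups of case information.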
Further, fig. 2 shows the schema-layer structure and a data-layer population example of the diabetic retinopathy rule knowledge graph (abbreviated as "rule KG") constructed in the present invention. A medical auxiliary diagnosis system is a typical application in serious clinical scenarios and therefore often has stricter requirements on the accuracy and authority of its knowledge. For this reason, all SPO triple data used to construct the rule KG were collected from authoritative sources such as the official "Ophthalmology" textbook and Web sites including Baidu Medical Encyclopedia, 360 Health, and Xunyiwenyao. The data were collected, checked, and integrated by several staff members and passed blind review, spot checks, and quality control by several review groups of ophthalmologists from tertiary hospitals. The resulting rule KG contains 1930 entities of 11 types. The entity types and their quantity distribution in the schema layer of the rule KG (a piece of data may be added to the knowledge graph only if it satisfies the entity objects and types predefined by the schema) are detailed in Table 3.
Table 3 Entity type and quantity distribution table of rule KG
[Table 3 is provided as images in the original publication (BDA0003788003440000091, BDA0003788003440000101).]
In addition, the formed rule KG comprises 25 types and a total of 2396 different entity-relation-entity SPO triplets, the content covers medical knowledge of four aspects of diagnosis, treatment, management and prevention (short for diagnosis, treatment, management and prevention) of diabetic retinopathy and related diseases, and powerful guarantee is provided for subsequent rapid screening and accurate diagnosis of sugar net type 2 and 6 based on the rule KG. In addition, the invention can provide personalized treatment schemes, medication suggestions, recommended recipes, nursing modes and the like for the diagnosed sugar net patients based on the rule KG and the physique of the patients, and help the patients to prevent diseases, manage and control the course of the diseases and the like. The corresponding entity relationship types and quantity distribution are shown in Table 4:
Table 4. Entity relation types and quantity distribution of the rule KG
[Table 4 is provided as an image in the original publication.]
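The schema constraint on the rule KG, under which an SPO triple is admitted only when its entities and relation match predefined types, can be illustrated with a minimal Python sketch. This is not the patented implementation: the entity type names, relation names, and sample triples below are hypothetical, because the actual types of Tables 3 and 4 are available only as images.

```python
from dataclasses import dataclass, field

# Hypothetical schema: relation name -> (allowed subject type, allowed object type).
# The real rule KG defines 11 entity types and 25 relation types.
SCHEMA = {
    "has_symptom": ("Disease", "Symptom"),
    "treated_by": ("Disease", "Drug"),
}


@dataclass
class RuleKG:
    entity_types: dict = field(default_factory=dict)  # entity name -> entity type
    triples: set = field(default_factory=set)         # (subject, predicate, object)

    def add_entity(self, name: str, etype: str) -> None:
        self.entity_types[name] = etype

    def add_triple(self, s: str, p: str, o: str) -> bool:
        """Insert an SPO triple only if it conforms to the schema."""
        if p not in SCHEMA:
            return False
        s_type, o_type = SCHEMA[p]
        if self.entity_types.get(s) != s_type or self.entity_types.get(o) != o_type:
            return False
        self.triples.add((s, p, o))
        return True


kg = RuleKG()
kg.add_entity("moderate non-proliferative DR", "Disease")
kg.add_entity("hard exudation", "Symptom")

# Conforms to the schema: Disease -has_symptom-> Symptom.
accepted = kg.add_triple("moderate non-proliferative DR", "has_symptom", "hard exudation")
# Violates the schema: subject and object types are swapped.
rejected = kg.add_triple("hard exudation", "has_symptom", "moderate non-proliferative DR")
```

Rejecting non-conforming triples at insertion time keeps later graph queries from having to re-validate types.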
(III) lesion determination Module
The lesion determination module 300 is connected to the data processing module and to the knowledge graph construction module, and is used to compute the similarity between the multi-element structured data set and each of the multiple groups of case information in the diabetic retinopathy rule knowledge graph, so as to determine a maximum data similarity value. Specifically, feature selection and reduction are first performed on the input multi-element structured data set. Because knowledge expressions such as chief complaints and medical history in a clinical case library inevitably contain redundancy, redundant features and knowledge expressions are deleted while keeping the classification and inference capability unchanged; this effectively reduces the complexity of the subsequent similarity calculation.
The lesion determination module 300 specifically includes a similarity calculation sub-module and a maximum value determination sub-module. In a medical knowledge graph, the most important identifiers of an entity are its name and attributes, while the relations between entities serve only as a reference. A weight γ is therefore set in the similarity calculation to represent the weight distribution between the entity name/attribute vector and the relation vector, and the relation vector includes only relations of important categories. Finally, the similarity of two entities is obtained by combining the two vector similarities as a root mean square. On this basis, the similarity calculation sub-module calculates the data similarity value according to the following formula:
[Formula provided as an image in the original publication]
where Sim(V_1, V_2) denotes the data similarity value between a first entity V_1 and a second entity V_2; the first entity V_1 is the multi-element structured data set, and the second entity V_2 is any group of case information in the diabetic retinopathy rule knowledge graph; V_{n1} denotes the text attribute vector of the first entity and V_{n2} the text attribute vector of the second entity; V_{γ1} denotes the text relation vector of the first entity and V_{γ2} the text relation vector of the second entity; γ denotes the weight distribution between the text attribute vector and the text relation vector, and can be set per entity, for example with reference to the diagnostic criteria of a professional physician.
[Formula provided as an image in the original publication]
where C_1 and C_2 are vector variables: when C_1 is V_{n1}, C_2 is V_{n2}; when C_1 is V_{γ1}, C_2 is V_{γ2}. i is the dimension index, and C is the dimension of the vectors. When C_1 is V_{n1} and C_2 is V_{n2}, C_{1i} is the i-th dimension of the text attribute vector of the first entity and C_{2i} is the i-th dimension of the text attribute vector of the second entity. Similarly, when C_1 is V_{γ1} and C_2 is V_{γ2}, C_{1i} is the i-th dimension of the text relation vector of the first entity and C_{2i} is the i-th dimension of the text relation vector of the second entity.
The maximum value determination submodule is used for determining a maximum data similarity value according to the plurality of data similarity values.
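The two sub-modules can be illustrated with a short Python sketch. Because the formulas are available only as images, the sketch rests on two explicit assumptions that are not confirmed by the text: each vector pair is compared by cosine similarity, and the two results are combined as a γ-weighted root mean square. The function names and the default γ are hypothetical.

```python
import math
from typing import Sequence


def cosine(c1: Sequence[float], c2: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed per-vector measure)."""
    if len(c1) != len(c2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(c1, c2))
    norm = math.sqrt(sum(a * a for a in c1)) * math.sqrt(sum(b * b for b in c2))
    return dot / norm if norm else 0.0


def entity_similarity(vn1, vn2, vr1, vr2, gamma: float) -> float:
    """Assumed combination: gamma-weighted root mean square of the
    attribute-vector similarity and the relation-vector similarity."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    s_attr = cosine(vn1, vn2)
    s_rel = cosine(vr1, vr2)
    return math.sqrt(gamma * s_attr ** 2 + (1.0 - gamma) * s_rel ** 2)


def most_similar(query, cases, gamma: float = 0.7):
    """Return (case_id, similarity) for the case with the maximum similarity.

    query is (attribute_vector, relation_vector); cases maps a case
    identifier to such a pair. The default gamma is an arbitrary example.
    """
    scored = {
        cid: entity_similarity(query[0], attrs, query[1], rels, gamma)
        for cid, (attrs, rels) in cases.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]
```

A larger γ makes the result depend more on names and attributes than on relations, which matches the stated intent that relations serve only as a reference.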
(IV) result determination Module
A result determination module 400, coupled to the lesion determination module, for:
judging whether the maximum data similarity value is smaller than a set threshold value or not; marking the multi-element structured data set when the maximum data similarity value is less than a set threshold; and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
Specifically, the set threshold is 0.8. If the maximum data similarity value is greater than or equal to 0.8, the most similar medical record (i.e., the case information corresponding to the maximum data similarity value) is output, and diagnostic reasoning is performed according to the content of that record; this is similar to the analogical reasoning of clinicians. If the maximum data similarity value is lower than 0.8, the medical record is considered a new record not present in the case KG. In that case, reasoning is carried out from scratch according to the medical knowledge stored in the rule KG, and a diagnostic conclusion is finally given; this process is similar to the logical reasoning of a clinician.
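The threshold decision can be sketched in Python as follows. Only the threshold value 0.8 and the two branches come from the description; the function name, the route labels, and the shape of the returned dictionary are hypothetical.

```python
THRESHOLD = 0.8  # set threshold stated in the description


def screen(similarities: dict, threshold: float = THRESHOLD) -> dict:
    """Choose between case-based and rule-based reasoning.

    similarities maps a case identifier to its data similarity value.
    """
    if not similarities:
        # No stored cases to compare against: fall back to the rule KG.
        return {"route": "rule_kg", "marked": True, "case": None, "score": None}
    best_case = max(similarities, key=similarities.get)
    best = similarities[best_case]
    if best >= threshold:
        # Similar record found: output it as the screening result.
        return {"route": "case_kg", "marked": False, "case": best_case, "score": best}
    # New record: mark it and reason from the rule KG instead.
    return {"route": "rule_kg", "marked": True, "case": None, "score": best}
```

For example, `screen({"c1": 0.85, "c2": 0.4})` selects case `c1`, while `screen({"c1": 0.79})` marks the record as new.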
After processing, newly generated medical records can be stored back into the case KG to expand it. Clinicians periodically extract, analyze, and distill rules from the case KG, so that new medical knowledge can be continuously added to the rule KG. This process is similar to the way clinical staff accumulate clinical experience over time. In the cold-start stage, the invention supports simulating a physician's diagnostic reasoning process relying on the rule KG alone, achieving rapid screening and diagnosis of diabetic retinopathy patients.
Preferably, the diabetic retinopathy auxiliary screening system further comprises:
the case adding module is used for obtaining the case information corresponding to the marked multi-structured data set and storing the case information corresponding to the marked multi-structured data set to the diabetic retinopathy rule knowledge graph.
And the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information. Specifically, the character strings of the identified characterization entities, relationships, attributes and the like are mapped to the corresponding entities of the rule KG through entity links, and finally, the association mapping of the information such as the symptoms, the signs, the inspection results and the like of the patient in the medical record to be diagnosed and the rule KG is realized.
In one embodiment, a specific application of the diabetic retinopathy auxiliary screening system of the present invention is as follows:
(1) Table 5 summarizes the filed information of the medical record to be diagnosed after de-identification; it mainly records the patient's personal information, symptoms, signs, chief complaints, fundus image examination results, and the like.
Table 5. Summary of the filed medical record to be diagnosed
[Table 5 is provided as an image in the original publication.]
(2) The data processing module is executed on the filed medical record to be diagnosed in Table 5. In the medical record text knowledge normalization step, medical terms and entity names are normalized to achieve joint disambiguation (for example, of variant expressions for dot hemorrhage, exudation, hard exudation, and cotton wool spots). The BiLSTM+CRF hybrid model is then invoked in the medical record text knowledge extraction step to extract disease/symptom entities (such as hypertension and blot hemorrhage) from the medical record text, together with the corresponding attribute values (such as diabetes mellitus-LastTime: 3 months). The final structured medical record to be diagnosed is shown in Table 6.
Table 6. Structured medical record to be diagnosed, as obtained by the data processing module
[Table 6 is provided as an image in the original publication.]
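The normalization and extraction steps of the data processing module can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the synonym table is hypothetical, and the BIO tags are written by hand here, whereas the described system obtains them from a trained BiLSTM+CRF model.

```python
# Hypothetical synonym table mapping surface forms to one canonical term.
SYNONYMS = {
    "cotton-wool patch": "cotton wool spot",
    "soft exudate": "cotton wool spot",
    "hard exudate": "hard exudation",
}


def normalize(term: str) -> str:
    """Map a term to its canonical form; unknown terms pass through lowercased."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def decode_bio(tokens, tags):
    """Group tokens into (entity_text, entity_type) spans from BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" tag or inconsistent "I-" tag: close the open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


# Hand-written tags stand in for the output of the sequence-labeling model.
tokens = ["patient", "shows", "hard", "exudate", "and", "hypertension"]
tags = ["O", "O", "B-Symptom", "I-Symptom", "O", "B-Disease"]
structured = [(normalize(text), etype) for text, etype in decode_bio(tokens, tags)]
```

Normalizing after extraction, as done here, is a simplification; the description applies normalization before extraction.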
(3) Using the extracted symptom entities, preset Cypher statements are executed in the Neo4j graph database that stores the rule KG, to perform an associated search over entities and relations.
The eye diseases returned by the search that present both hard exudation and cotton wool spots are diabetic retinopathy and moderate non-proliferative DR. A symptom not shared by the two diseases, namely dot-and-blot hemorrhage, is then taken into account, so that the patient can be rapidly diagnosed with moderate non-proliferative DR. This completes the rapid screening of whether the patient in this example suffers from diabetic retinopathy.
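The associated search of step (3) can be sketched as a parameterized Cypher statement together with an in-memory Python equivalent. The patent does not disclose its actual statements, so the node labels, relationship type, and property name in the query are assumptions, and the toy graph contains only the findings named in this example, rendered in English.

```python
# Hypothetical Cypher text; the labels (Disease, Symptom), relationship type
# (HAS_SYMPTOM) and property (name) are assumptions, not taken from the patent.
CYPHER_DISEASES_WITH_ALL_SYMPTOMS = """
MATCH (d:Disease)-[:HAS_SYMPTOM]->(s:Symptom)
WHERE s.name IN $symptoms
WITH d, count(DISTINCT s) AS hits
WHERE hits = size($symptoms)
RETURN d.name AS disease
"""

# Toy in-memory stand-in for the rule KG (illustrative only).
TOY_KG = {
    "diabetic retinopathy": {"hard exudation", "cotton wool spot"},
    "moderate non-proliferative DR": {
        "hard exudation",
        "cotton wool spot",
        "dot-and-blot hemorrhage",
    },
}


def candidates(kg, symptoms):
    """Diseases whose symptom set contains every queried symptom."""
    wanted = set(symptoms)
    return sorted(d for d, s in kg.items() if wanted <= s)


def refine(kg, shared, distinguishing):
    """Narrow the candidates using symptoms not shared by all of them."""
    return [d for d in candidates(kg, shared) if set(distinguishing) <= kg[d]]


shared = ["hard exudation", "cotton wool spot"]
first_pass = candidates(TOY_KG, shared)
final = refine(TOY_KG, shared, ["dot-and-blot hemorrhage"])
```

In this toy graph the first pass returns both diseases, and adding the distinguishing finding leaves only moderate non-proliferative DR.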
For complex situations in which the patient's category within the "2 types, 6 stages" of diabetic retinopathy cannot be determined from symptoms, signs, and similar information alone, the attribute information of a symptom entity (for example, cotton wool spot (loc: macula, num: few)) can be further retrieved and subjected to association mapping and rule matching against the disease typing and staging standard (disease Classify Standard) in the rule KG, so as to reach a final diagnosis. In addition, if the signs and symptoms of a mild patient are not obvious, basic information such as the patient's region, age, sex, and medical history can be further combined, and similarity calculation can be performed against features in the rule KG such as high-incidence regions and high-risk populations for diabetic retinopathy, to further determine whether the patient suffers from diabetic retinopathy.
In a specific embodiment, the knowledge-graph-based auxiliary screening of diabetic retinopathy can be adapted to any other disease according to business requirements, and the corresponding rule KG can be constructed with reference to the schema-layer structure designed in the invention. Further, the medical record filing method may restrict the patient's hospital, the information entry mode, and the like according to actual needs; the equipment, methods, report return, characteristics of the tested population, and the like for the laboratory and imaging examinations required during diagnosis may likewise be set according to actual needs.
Compared with the prior art, the invention has the following advantages:
(1) Knowledge graphs make it convenient to eliminate language ambiguity and to untangle complex relations among related entities, and they offer self-evolution and reasoning capabilities that reveal the dynamic development of a knowledge domain. Building on these properties, the invention integrates the rule KG and the case KG as "dual engines" to screen newly entered medical records to be diagnosed, significantly improving the accuracy and self-evolution capability of the diabetic retinopathy auxiliary diagnosis system.
(2) In the initial cold-start stage, the invention can rapidly screen diabetic retinopathy patients by simulating a physician's screening reasoning process relying on the rule KG alone. Real clinical case data of patients to be diagnosed can subsequently be introduced on a continuing basis: the data processing module and the lesion determination module extract the entities, relations, attributes, and other information of the structured medical record to be diagnosed, after which similarity calculation and multi-layer deep retrieval can be performed against the diabetic retinopathy case knowledge graph. This achieves multi-modal comprehensive reasoning and decision-making, and realizes the final goal of intelligent, rapid, and accurate auxiliary screening for diabetic retinopathy.
In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced.
Specific examples are used herein to explain the principles and embodiments of the present invention; the description of these examples is intended only to aid understanding of the method of the invention and its core ideas. Modifications made by those of ordinary skill in the art in light of the present teachings also fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (7)

1. A diabetic retinopathy auxiliary screening system, the system comprising:
the data processing module is used for sequentially carrying out text standardization processing and text extraction processing on the patient medical record data so as to obtain a multi-element structured data set; the patient medical record data comprises patient symptom signs, current medical history, past medical history, laboratory examination results and imaging examination results; the multi-element structured data set comprises text entities corresponding to the patient medical record data, text relations among different text entities and text attributes corresponding to the text entities;
the knowledge graph construction module is used for constructing a diabetic retinopathy rule knowledge graph; the diabetic retinopathy rule knowledge graph comprises a plurality of groups of case information; each set of the case information includes lesion name, lesion symptom sign, laboratory examination data of the lesion, imaging examination data of the lesion, therapeutic drug and recipe care;
the lesion determining module is respectively connected with the data processing module and the knowledge graph construction module and is used for respectively carrying out similarity calculation on the multi-element structured data set and a plurality of groups of case information in the diabetic retinopathy rule knowledge graph so as to determine the maximum data similarity value; the lesion determination module specifically comprises: similarity calculation submodule for calculating similarity according to formula
[Formula provided as an image in the original publication]
to calculate a data similarity value;
wherein Sim(V_1, V_2) denotes the data similarity value between a first entity V_1 and a second entity V_2; the first entity V_1 is the multi-element structured data set, and the second entity V_2 is any group of case information in the diabetic retinopathy rule knowledge graph; V_{n1} denotes the text attribute vector of the first entity, and V_{n2} denotes the text attribute vector of the second entity; V_{γ1} denotes the text relation vector of the first entity, and V_{γ2} denotes the text relation vector of the second entity; γ denotes the weight distribution between the text attribute vector and the text relation vector;
[Formula provided as an image in the original publication]
wherein C_1 and C_2 are vector variables: when C_1 is V_{n1}, C_2 is V_{n2}; when C_1 is V_{γ1}, C_2 is V_{γ2}; i denotes the dimension index, and C denotes the dimension of the vectors;
the maximum value determining submodule is used for determining a maximum data similarity value according to the plurality of data similarity values;
the result determining module is connected with the lesion determining module and is used for:
judging whether the maximum data similarity value is smaller than a set threshold value or not;
marking the multi-element structured data set when the maximum data similarity value is less than a set threshold;
and when the maximum data similarity value is greater than or equal to a set threshold value, determining case information corresponding to the maximum data similarity value as a screening result.
2. The diabetic retinopathy auxiliary screening system according to claim 1, wherein the data processing module specifically comprises:
the standardized unified sub-module is used for carrying out word mapping and word sense disambiguation on the patient medical record data according to a preset database so as to obtain a standard text data set;
the text extraction sub-module is used for inputting the standard text data set into a Chinese named entity recognition mixed model for text extraction so as to obtain a multi-element structured data set; the Chinese named entity recognition hybrid model is obtained by training a deep learning BiLSTM-CRF model by adopting a training set; the deep learning BiLSTM-CRF model comprises a BiLSTM layer and a CRF layer which are sequentially connected; the training set includes a plurality of sample data; each sample data includes historical patient medical record data and label information; the label information is word vectors corresponding to the historical patient medical record data; the word vectors include a text entity vector, a text attribute vector, and a text relationship vector.
3. The diabetic retinopathy auxiliary screening system according to claim 2, wherein the text extraction submodule specifically includes, in terms of training of a chinese named entity recognition hybrid model:
the training set acquisition unit is used for acquiring a plurality of sample data;
the model training unit is used for inputting a plurality of sample data into the deep learning BiLSTM-CRF model for training so as to obtain an optimal deep learning BiLSTM-CRF model; the optimal deep learning BiLSTM-CRF model is the Chinese named entity recognition hybrid model; the BiLSTM layer is used for carrying out bidirectional encoding on the historical patient medical record data, and calculating the probability that text entities, text relations and text attributes in the historical patient medical record data are marked as label information so as to obtain a prediction label group corresponding to the historical patient medical record data; the CRF layer is used for adding constraint conditions to the prediction label group so as to obtain an optimal prediction label group; the optimal prediction label group is the multi-element structured data set.
4. The diabetic retinopathy auxiliary screening system according to claim 2, wherein in terms of construction of a training set, the text extraction sub-module further comprises:
an initial data acquisition unit for acquiring a plurality of history patient medical record data;
the word segmentation unit is used for carrying out character division on each history patient medical record data based on the jieba word segmentation library so as to obtain word segmentation texts;
the marking unit is used for marking the word segmentation text by adopting a BIO marking method so as to obtain a marked text; the noted text includes a text entity, a text relationship and a text attribute;
and the word embedding unit is used for converting the marked text into a word vector by adopting a word2vec method.
5. The diabetic retinopathy auxiliary screening system according to claim 1, further comprising:
the case adding module is used for obtaining the case information corresponding to the marked multi-structured data set and storing the case information corresponding to the marked multi-structured data set to the diabetic retinopathy rule knowledge graph.
6. The diabetic retinopathy auxiliary screening system according to claim 1, wherein the knowledge graph construction module specifically comprises:
the data acquisition module is used for acquiring a plurality of groups of case information from the third party database;
and the knowledge graph construction module is used for establishing an SPO triplet according to the case information and constructing a diabetic retinopathy rule knowledge graph according to a plurality of the SPO triplets.
7. The diabetic retinopathy auxiliary screening system according to claim 1, further comprising:
and the text linking module is respectively connected with the data processing module and the knowledge graph construction module and is used for carrying out entity linking on the multi-element structured data set and the case information in the diabetic retinopathy rule knowledge graph so as to realize the association mapping of the multi-element structured data set and any case information.
CN202210947675.1A 2022-08-09 2022-08-09 Auxiliary screening system for diabetic retinopathy Active CN115312186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210947675.1A CN115312186B (en) 2022-08-09 2022-08-09 Auxiliary screening system for diabetic retinopathy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210947675.1A CN115312186B (en) 2022-08-09 2022-08-09 Auxiliary screening system for diabetic retinopathy

Publications (2)

Publication Number Publication Date
CN115312186A CN115312186A (en) 2022-11-08
CN115312186B true CN115312186B (en) 2023-06-09

Family

ID=83860001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210947675.1A Active CN115312186B (en) 2022-08-09 2022-08-09 Auxiliary screening system for diabetic retinopathy

Country Status (1)

Country Link
CN (1) CN115312186B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118016316B (en) * 2024-04-10 2024-06-04 健数(长春)科技有限公司 Disease screening rate improving method and system by combining knowledge graph with blood routine test data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414393A (en) * 2020-03-26 2020-07-14 湖南科创信息技术股份有限公司 Semantic similar case retrieval method and equipment based on medical knowledge graph
CN111767410A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Construction method, device, equipment and storage medium of clinical medical knowledge map
CN112164460A (en) * 2020-10-19 2021-01-01 科技谷(厦门)信息技术有限公司 Intelligent disease auxiliary diagnosis system based on medical knowledge map
CN112364174A (en) * 2020-10-21 2021-02-12 山东大学 Patient medical record similarity evaluation method and system based on knowledge graph

Also Published As

Publication number Publication date
CN115312186A (en) 2022-11-08

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant