CN117690600A - Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium - Google Patents

Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium

Info

Publication number
CN117690600A
Authority
CN
China
Prior art keywords
infectious disease
information
preset
knowledge
related features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410137531.9A
Other languages
Chinese (zh)
Other versions
CN117690600B (en
Inventor
郭鹏
李涛
史浩田
邓小宁
金剑
马杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Health Medical Big Data Technology Co ltd
Original Assignee
North Health Medical Big Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Health Medical Big Data Technology Co ltd filed Critical North Health Medical Big Data Technology Co ltd
Priority to CN202410137531.9A priority Critical patent/CN117690600B/en
Publication of CN117690600A publication Critical patent/CN117690600A/en
Application granted granted Critical
Publication of CN117690600B publication Critical patent/CN117690600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of infectious disease prediction and provides a knowledge-graph-based infectious disease prediction method, system, terminal and storage medium. The method comprises: constructing an infectious disease knowledge graph; extracting target information of a target patient; searching the knowledge graph for the entities and relations that match the target information to obtain all related infectious diseases, which form an infectious disease candidate set; encoding the target information with 0/1 encoding, inputting it into the preset weight calculation model corresponding to each infectious disease in the candidate set, and calculating the weight of each infectious disease; sorting the infectious diseases by weight to generate an ordered list; extracting from the infectious disease knowledge graph the sub-graphs corresponding to the infectious diseases in the ordered list and recording them as the target graph; and outputting the ordered list and the target graph. The prediction result is therefore interpretable, and the knowledge graph of the infectious diseases in the ordered list can be displayed, which improves the visualization of the prediction result.

Description

Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium
Technical Field
The invention belongs to the technical field of infectious disease prediction, and particularly relates to a knowledge-graph-based infectious disease prediction method, a knowledge-graph-based infectious disease prediction system, a knowledge-graph-based infectious disease prediction terminal and a knowledge-graph-based infectious disease prediction storage medium.
Background
Infectious diseases pose a serious threat to human health and to socioeconomic development. Early prediction is very important for their treatment and control. In China, infectious diseases are divided into Class A, Class B and Class C for classified management. Class A infectious diseases, such as plague, seriously endanger public health and must be reported immediately once found. For Class B infectious diseases, such as typhoid fever and influenza, the number of new cases is reported weekly. Class C infectious diseases, such as hand-foot-and-mouth disease, are reported monthly. Different classes of infectious diseases have different reporting requirements and management measures. How to use advanced techniques to make accurate predictions as early as possible is therefore an important problem.
The prior art discloses an infectious disease prediction method that first acquires the electronic medical record information and examination report information of a target patient, and then uses a binary classification prediction model to predict whether the target patient has a suspected infectious disease. The key to that method is building the prediction models: the electronic medical record information and examination report information of all patients are first acquired from a medical database; this information is then used to train a first multi-input densification diagnosis model, yielding the binary classification prediction model, and to train a second multi-input densification diagnosis model, yielding a multi-classification prediction model that predicts the specific type of the suspected infectious disease. However, this method in effect relies on deep learning models to predict whether a patient has an infectious disease, and its prediction results lack interpretability. This is a disadvantage of the prior art.
Disclosure of Invention
In view of the above defects of the prior art, the invention provides a knowledge-graph-based infectious disease prediction method, system, terminal and storage medium, so as to solve the technical problem that prediction results lack interpretability.
In a first aspect, the present invention provides a knowledge-graph-based infectious disease prediction method, including:
constructing an infectious disease knowledge graph;
extracting information corresponding to the preset epidemiological history related features, information corresponding to the preset symptom related features and information corresponding to the preset sign related features from patient information of a target patient, and recording the information as target information;
searching the entity and the relation matched with each piece of information in the target information in the knowledge graph to obtain all infectious diseases related to the target information in the knowledge graph, and forming an infectious disease candidate set;
after encoding the target information with 0/1 encoding, inputting the target information into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the infectious disease candidate set;
sorting the infectious diseases in the infectious disease candidate set by the magnitude of their weights, and generating an ordered list;
extracting, from the infectious disease knowledge graph, the sub-graphs corresponding to all infectious diseases in the ordered list, and recording them as the target graph;
and outputting the ordered list and the target graph.
Further, the predetermined epidemiological history related features include:
contact history information, clustering information, food history information, travel history information, vaccination history information, family history information, allergy history information;
the contact history information includes whether the patient has recently been in contact with a person having similar symptoms or with a source of infection; the clustering information includes whether a cluster of similar cases has occurred; the food history information includes whether certain foods were consumed within two weeks before disease onset; the travel history information includes whether the patient traveled before disease onset; the vaccination history information includes whether the patient has been vaccinated; the family history information includes whether there are hereditary cases in the family; the allergy history information includes whether the patient is allergic to certain foods or drugs.
Further, extracting the information corresponding to the preset epidemiological history related features, the information corresponding to the preset symptom related features and the information corresponding to the preset sign related features from the patient information of the target patient includes: extracting this information from the patient information of the target patient by using a preset general information extraction algorithm.
Further, the infectious disease knowledge graph includes infectious diseases and, for each infectious disease, its symptoms, incubation period, modes of transmission, signs and overview.
Further, the weight calculation model is a linear regression model.
Further, the method further comprises: a weight calculation model is previously assigned to each infectious disease involved in the infectious disease knowledge graph.
Further, the method for obtaining the weight calculation model of each infectious disease comprises the following steps:
collecting historical case data for an infectious disease, the historical case data including confirmed infectious disease information, epidemiological history related information, symptom related information, and sign related information;
preprocessing the historical case data to obtain a preprocessed historical case data set;
selecting, from the preprocessed historical case data set, the confirmed infectious disease information together with the information corresponding to the preset epidemiological history related features, the preset symptom related features and the preset sign related features, and applying 0/1 encoding to it to obtain a training set; in the training set, the code of the confirmed infectious disease serves as the weight of that infectious disease;
taking the preset epidemiological history related features, the preset symptom related features and the preset sign related features as independent variables, taking the weight of infectious diseases as dependent variables, and constructing a multiple linear regression initial model of the infectious diseases, wherein the multiple linear regression initial model is as follows:
$$W = \beta_0 + \sum_{i=1}^{n} a_i E_i + \sum_{j=1}^{m} b_j S_j + \sum_{g=1}^{k} c_g P_g$$
wherein $W$ is the weight of the infectious disease, $\beta_0$ is the constant term, $n$ is the number of preset epidemiological history related features, $E_i$ is the preset i-th epidemiological history related feature, $m$ is the number of preset symptom related features, $S_j$ is the preset j-th symptom related feature, $k$ is the number of preset sign related features, $P_g$ is the preset g-th sign related feature, and $a_i$, $b_j$, $c_g$ are model coefficients, where i = 1, 2, ..., n, j = 1, 2, ..., m, g = 1, 2, ..., k;
and training the multiple linear regression initial model by using the training set to obtain a trained multiple linear regression model, and obtaining a weight calculation model of the corresponding infectious disease.
In a second aspect, the present invention provides a knowledge-graph-based infectious disease prediction system, the system comprising:
the infectious disease knowledge graph construction module is used for constructing an infectious disease knowledge graph;
the extraction module is used for extracting information corresponding to the preset epidemiological history related features, information corresponding to the preset symptom related features and information corresponding to the preset sign related features from patient information of a target patient, and recording the information as target information;
the infectious disease candidate set generation module is used for searching the entity and the relation matched with each piece of information in the target information in the knowledge graph to obtain all infectious diseases related to the target information in the knowledge graph, so as to form an infectious disease candidate set;
the weight calculation module is used for encoding the target information with 0/1 encoding, inputting it into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the infectious disease candidate set;
the infectious disease ranking module is used for sorting the infectious diseases in the infectious disease candidate set by the magnitude of their weights and generating an ordered list;
the target graph generation module is used for extracting, from the infectious disease knowledge graph, the sub-graphs corresponding to all infectious diseases in the ordered list and recording them as the target graph;
and the visual output module is used for outputting the ordered list and the target graph.
In a third aspect, a terminal is provided, including:
a processor, a memory, wherein,
the memory is used for storing a computer program,
the processor is configured to call and run the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, there is provided a computer storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the above aspects.
The invention has the beneficial effects that:
According to the knowledge-graph-based infectious disease prediction method, system, terminal and storage medium provided by the invention, an infectious disease knowledge graph is constructed and matched against the extracted information corresponding to the preset epidemiological history related features, the preset symptom related features and the preset sign related features, yielding an infectious disease candidate set. For each infectious disease in the candidate set, the corresponding preset weight calculation model is obtained and used to calculate the weight of that infectious disease. The candidate infectious diseases are then sorted by weight to generate an ordered list. The sub-graphs corresponding to all infectious diseases in the ordered list are extracted from the infectious disease knowledge graph and recorded as the target graph, and finally the ordered list and the target graph are output. The prediction result is therefore interpretable, and the knowledge graph of the infectious diseases in the ordered list can be displayed, which improves the visualization of the prediction result. In addition, the invention has a reliable design principle, a simple structure and a very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The knowledge-graph-based infectious disease prediction method provided by the embodiment of the invention is executed by a computer device; correspondingly, the knowledge-graph-based infectious disease prediction system runs on the computer device.
FIG. 1 is a schematic flow chart of a method of one embodiment of the invention. The execution subject of FIG. 1 may be a knowledge-graph-based infectious disease prediction system. The order of the steps in the flow chart may be changed, and some steps may be omitted, according to different needs.
As shown in fig. 1, the method includes:
step 110, constructing an infectious disease knowledge graph;
step 120, extracting information corresponding to the preset epidemiological history related features, information corresponding to the preset symptom related features and information corresponding to the preset sign related features from patient information of a target patient, and marking the information as target information;
step 130, searching the knowledge graph for the entities and relations that match each piece of information in the target information, to obtain all infectious diseases related to the target information in the knowledge graph and form an infectious disease candidate set;
step 140, after encoding the target information with 0/1 encoding, inputting the target information into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the infectious disease candidate set;
step 150, sorting the infectious diseases in the infectious disease candidate set by the magnitude of their weights, and generating an ordered list;
step 160, extracting, from the infectious disease knowledge graph, the sub-graphs corresponding to all infectious diseases in the ordered list, and recording them as the target graph;
and step 170, outputting the ordered list and the target graph.
To facilitate understanding, the knowledge-graph-based infectious disease prediction method of the invention is further described below with reference to its execution process in an embodiment.
Specifically, the knowledge-graph-based infectious disease prediction method comprises the following steps:
S1, constructing an infectious disease knowledge graph.
The infectious disease knowledge graph includes infectious diseases and, for each infectious disease, its symptoms, incubation period, modes of transmission, signs and overview.
Through collection, arrangement and classification of the medical knowledge of infectious diseases, a complete infectious disease knowledge graph of various infectious diseases in the medical field is constructed.
The infectious disease knowledge graph is constructed in the following steps. First, source data are obtained. Taking influenza as an example, scientific literature related to seasonal influenza is searched, covering virology, epidemiology, clinical research and the like. Influenza statistics, case reports and the like are collected from medical databases, for example the databases of public health agencies such as the CDC and WHO. Clinical records are also integrated, for example clinical case records from different regions and years, including symptom descriptions and treatment experience. Second, the data are cleaned and preprocessed: duplicate documents and data are deleted to ensure uniqueness, data from different sources are standardized into a unified format to ensure consistency, and missing values are handled to ensure completeness. Third, the ontology structure of the knowledge graph is determined, including entity types (for influenza, these may be viruses, symptoms, vaccines and the like) and attributes (for influenza, these may be virus subtypes, symptom descriptions and the like), and the entities and relations are mapped onto the ontology structure to keep the graph consistent and standardized. Fourth, a suitable graph database, such as Neo4j, is selected for storing and querying the constructed knowledge graph. Finally, the cleaned and processed data are imported into the graph database to obtain the constructed infectious disease knowledge graph.
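The third and fourth steps can be illustrated with a minimal sketch that assumes an in-memory triple store in place of a graph database such as Neo4j. The class name `KnowledgeGraph`, the entity types, the relation names and the sample influenza entries are hypothetical placeholders chosen for illustration, not part of the embodiment.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory triple store standing in for a graph database."""

    def __init__(self):
        self.entity_types = {}             # entity name -> entity type
        self.out_edges = defaultdict(set)  # head -> {(relation, tail)}
        self.in_edges = defaultdict(set)   # tail -> {(relation, head)}

    def add_entity(self, name, entity_type):
        self.entity_types[name] = entity_type

    def add_triple(self, head, relation, tail):
        # Reject triples whose endpoints were never declared, which keeps
        # the stored data consistent with the declared entity types.
        if head not in self.entity_types or tail not in self.entity_types:
            raise KeyError("both entities must be declared before linking")
        self.out_edges[head].add((relation, tail))
        self.in_edges[tail].add((relation, head))


# Hypothetical sample data for illustration only.
kg = KnowledgeGraph()
kg.add_entity("influenza", "disease")
kg.add_entity("cough", "symptom")
kg.add_entity("hyperthermia", "sign")
kg.add_triple("influenza", "has_symptom", "cough")
kg.add_triple("influenza", "has_sign", "hyperthermia")
```

Keeping both outgoing and incoming edges makes it cheap to go from a symptom or sign back to the diseases that mention it, which is the direction needed when a candidate set is built from patient features.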
S2, extracting information corresponding to the preset epidemiological history related features, information corresponding to the preset symptom related features and information corresponding to the preset sign related features from patient information of a target patient, and recording the information as target information.
The target patient is a patient for whom an infection prediction is to be made.
The predetermined epidemiological history related features include: contact history information, clustering information, food history information, travel history information, vaccination history information, family history information, allergy history information.
The contact history information includes whether the patient has recently been in contact with a person having similar symptoms or with a source of infection. The clustering information includes whether a cluster of similar cases has occurred. The food history information includes whether certain foods were consumed within two weeks before disease onset. The travel history information includes whether the patient traveled before disease onset. The vaccination history information includes whether the patient has been vaccinated. The family history information includes whether there are hereditary cases in the family. The allergy history information includes whether the patient is allergic to certain foods or drugs.
The predetermined symptom-related features may include, but are not limited to, headache, dizziness, nausea, debilitation, pain, anxiety, depression, insomnia, memory decline, dyspnea, loss of appetite, loss of taste, frequent urination, blurred vision, dry mouth, redness of the eyes, tremors of the hands, hair loss, numbness of hands and feet, tinnitus, diarrhea, convulsions.
The predetermined sign-related features may include, but are not limited to, hyperthermia, low fever, elevated blood pressure, reduced blood pressure, skin rash, bumps, too fast heartbeat, too slow heartbeat, arrhythmia, weight gain, weight loss, mydriasis, and muscle atrophy.
In particular, the person skilled in the art can set the above-mentioned preset epidemiological history related features, the above-mentioned preset symptom related features and the above-mentioned preset sign related features according to the actual situation.
Extracting the information corresponding to the preset epidemiological history related features, the preset symptom related features and the preset sign related features from the patient information of the target patient includes: extracting this information from the patient information of the target patient by using a preset general information extraction algorithm.
Taking the extraction of contact history information in the epidemiological history as an example: first, the contact fields are defined, that is, the fields containing contact history information are determined, such as whether the patient has been in contact with a person having similar symptoms or with a source of infection. Second, a contact history extraction algorithm is designed to extract the information related to the contact history from the text. Third, the data are cleaned: the extracted information is processed to ensure accuracy and consistency.
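The embodiment does not specify the extraction algorithm. As one possible illustration, the sketch below assumes English free-text notes and a simple rule-based approach; the trigger phrases, the negation words and the function name `extract_contact_history` are all hypothetical.

```python
import re

# Hypothetical trigger phrases; a real system would use its own lexicon.
CONTACT_PATTERN = re.compile(
    r"(contact(ed)? with|exposed to|exposure to)", re.IGNORECASE)
# Hypothetical negation cues checked within the same sentence.
NEGATION_PATTERN = re.compile(
    r"\b(no|not|denies|denied|without)\b", re.IGNORECASE)


def extract_contact_history(text):
    """Return 1 if any sentence asserts a contact, otherwise 0."""
    for sentence in re.split(r"[.;\n]", text):
        mentions_contact = CONTACT_PATTERN.search(sentence)
        is_negated = NEGATION_PATTERN.search(sentence)
        if mentions_contact and not is_negated:
            return 1
    return 0
```

The function already returns the 0 or 1 code used later as model input. Sentence-level negation handling of this kind is crude and would misread more complex phrasing, which is why the text calls for a separate cleaning step.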
S3, searching the knowledge graph for the entities and relations that match each piece of information in the target information, to obtain all infectious diseases related to the target information in the knowledge graph and form an infectious disease candidate set.
Specifically, for each feature in each category (epidemiological history, symptoms, signs) extracted from the information of the target patient, a matching entity is searched for in the knowledge graph. Text matching techniques, such as string matching, word-vector models and fuzzy matching, are used to find the entities in the knowledge graph that match the patient information. For each matched entity, the relations connected to it are retrieved, and the other entities associated with it are found by traversing those relations in the graph. All matched entities and their relations are combined to form the infectious disease candidate set, which contains all infectious diseases in the knowledge graph that are related to the target information, together with their relevant information.
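A minimal sketch of this matching step follows, assuming exact string matching with a fuzzy fallback from Python's standard `difflib` module. The mapping `FEATURE_TO_DISEASES` is a hypothetical, flattened stand-in for the graph traversal, and its entries are illustrative rather than taken from the embodiment.

```python
from difflib import get_close_matches

# Hypothetical graph fragment: feature entity -> diseases linked to it.
FEATURE_TO_DISEASES = {
    "cough": {"influenza", "pertussis"},
    "hyperthermia": {"influenza", "typhoid fever"},
    "diarrhea": {"typhoid fever"},
}


def candidate_set(target_info, cutoff=0.8):
    """Collect every disease linked to an entity matching the target information."""
    candidates = set()
    entities = list(FEATURE_TO_DISEASES)
    for item in target_info:
        if item in FEATURE_TO_DISEASES:
            matches = [item]  # exact string match
        else:
            # Fuzzy fallback: best entity whose similarity exceeds the cutoff.
            matches = get_close_matches(item, entities, n=1, cutoff=cutoff)
        for entity in matches:
            candidates |= FEATURE_TO_DISEASES[entity]
    return candidates
```

For example, a variant spelling such as "diarrhoea" is close enough to "diarrhea" to match under the default cutoff, while an unrelated term contributes nothing to the candidate set.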
S4, after encoding the target information with 0/1 encoding, inputting the target information into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the infectious disease candidate set.
Specifically, 0/1 encoding is applied to the information corresponding to each preset epidemiological history related feature, symptom related feature and sign related feature as follows: 0 or 1 indicates the absence or presence of the feature. Taking contact history information as an example, 0 indicates that the patient has not recently been in contact with a person having similar symptoms or with a source of infection, and 1 indicates that the patient has recently been in such contact.
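The encoding can be sketched as follows, assuming a fixed feature order. The short list `PRESET_FEATURES` is a hypothetical example; the embodiment leaves the actual feature list to the implementer.

```python
# Hypothetical preset feature order, for illustration only.
PRESET_FEATURES = ["contact history", "cough", "hyperthermia"]


def encode_01(present_features, preset=PRESET_FEATURES):
    """Map the features found for a patient to a 0/1 vector in preset order."""
    present = set(present_features)
    return [1 if name in present else 0 for name in preset]
```

A patient with cough and hyperthermia but no contact history would be encoded as `[0, 1, 1]` under this ordering.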
Specifically, the weight calculation model is a linear regression model.
A weight calculation model is previously assigned to each infectious disease involved in the infectious disease knowledge graph. The method for obtaining the weight calculation model of each infectious disease comprises the following steps Q1 to Q4.
Q1: historical case data for infectious diseases is collected.
The historical case data includes confirmed infectious disease information, epidemiological history related information, symptom related information, and sign related information.
It will be appreciated that the epidemiological history related information includes, but is not limited to, information corresponding to the predetermined epidemiological history related features, the symptom related information includes, but is not limited to, information corresponding to the predetermined symptom related features, and the sign related information includes, but is not limited to, information corresponding to the predetermined sign related features.
The historical case data are preprocessed by deleting outliers and filling missing values.
Specifically, the historical case data are imported with a data processing tool (such as the Pandas library in Python) and outliers are counted. A suitable filling method is then selected according to the data type and the pattern of missing values. For example, when the data follow a normal or near-normal distribution, mean or median filling may be used. When the data show a trend or periodicity, interpolation may be used, for example linear or polynomial interpolation, to infer the missing values from the existing data. When the data have a distinct mode, mode filling may be used, with the mode of the whole variable substituted for the missing values.
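Median and mode filling can be sketched as below. To stay self-contained, the example uses Python's standard `statistics` module rather than Pandas, and represents missing values as `None`; the function name `fill_missing` is a hypothetical helper.

```python
from statistics import median, mode


def fill_missing(values, strategy="median"):
    """Replace None entries with the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to infer the fill value from
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```

Median filling suits continuous measurements such as body temperature, while mode filling is the natural choice for the 0/1 coded features used as model inputs.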
Q2: selecting, from the preprocessed historical case data set, the confirmed infectious disease information together with the information corresponding to the preset epidemiological history related features, the preset symptom related features and the preset sign related features, and applying 0/1 encoding to it to obtain a training set.
In the training set, the code of the confirmed infectious disease serves as the weight of that infectious disease.
Specifically, 0/1 encoding is applied to the information corresponding to the preset epidemiological history related features, the preset symptom related features and the preset sign related features of the selected case data in the preprocessed historical case data set as follows: 0 or 1 indicates the absence or presence of the feature in the patient. The codes of the confirmed infectious disease information are all 1, that is, the weight is 1. Taking contact history information among the epidemiological history features as an example, 0 indicates that the patient has not recently been in contact with a person having similar symptoms or with a source of infection, and 1 indicates that the patient has recently been in such contact.
Q3: construct a multiple linear regression initial model of the infectious disease, taking the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features as the independent variables and the weight of the infectious disease as the dependent variable.
The multiple linear regression initial model is:
$$W = \beta_0 + \sum_{i=1}^{n} a_i E_i + \sum_{j=1}^{m} b_j S_j + \sum_{g=1}^{k} c_g T_g$$
where $W$ is the weight of the infectious disease, $\beta_0$ is the constant term, $n$ is the number of preset epidemiological-history-related features, $E_i$ is the preset $i$-th epidemiological-history-related feature, $m$ is the number of preset symptom-related features, $S_j$ is the preset $j$-th symptom-related feature, $k$ is the number of preset sign-related features, $T_g$ is the preset $g$-th sign-related feature, and $a_i$, $b_j$, $c_g$ are the model coefficients, with $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, $g = 1, 2, \ldots, k$.
Q4: train the multiple linear regression initial model with the training set; the trained multiple linear regression model is the weight calculation model of the corresponding infectious disease.
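As a hedged illustration of step Q4, the sketch below fits a multiple linear regression by ordinary least squares with NumPy. The training rows are invented, and labelling records of other diseases with 0 is an assumption made so that the toy fit is non-trivial; the text itself only states that confirmed cases are coded 1.

```python
import numpy as np

# Invented toy training set. Columns: contact history, high fever, cough (0/1-encoded).
X = np.array([
    [1, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
], dtype=float)
# Assumption: 1 = confirmed case of the modelled disease, 0 = any other record.
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)


def fit_weight_model(features, target):
    """Ordinary least squares fit; returns (constant term, coefficient vector)."""
    design = np.column_stack([np.ones(len(features)), features])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[0], solution[1:]


intercept, coefficients = fit_weight_model(X, y)


def predict_weight(encoded_patient):
    """Weight of the disease for one 0/1-encoded patient vector."""
    return float(intercept + np.dot(coefficients, encoded_patient))
```

A regression library could be used in place of `np.linalg.lstsq`; the least-squares solution is the same.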
Specifically, taking influenza as an example, the weight calculation model finally obtained is:
Influenza weight = 0.04488 - 0.1167 × contact history + 0.5392 × high fever + 0.056 × cough
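The influenza model above can be evaluated for a 0/1-encoded patient as in the following sketch. The coefficients are those given in the text; the dictionary layout and the key names are assumptions chosen for the example.

```python
# Coefficients from the influenza example; key names are illustrative.
INFLUENZA_MODEL = {
    "intercept": 0.04488,
    "coefficients": {
        "contact_history": -0.1167,
        "high_fever": 0.5392,
        "cough": 0.056,
    },
}


def disease_weight(model, encoded):
    """Constant term plus the sum of coefficient * 0/1 feature value."""
    return model["intercept"] + sum(
        coef * encoded.get(name, 0) for name, coef in model["coefficients"].items()
    )


# Patient with high fever and cough, no contact history.
patient = {"contact_history": 0, "high_fever": 1, "cough": 1}
weight = disease_weight(INFLUENZA_MODEL, patient)  # 0.04488 + 0.5392 + 0.056
```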
S5: sort the infectious diseases in the infectious disease candidate set by weight and generate a ranked list.
Specifically, the model has already calculated a weight for each infectious disease in the infectious disease candidate set. The calculated weights are collected into a data structure, for example a list in which each entry holds the name of an infectious disease and its weight. The entries are then sorted in descending order of weight.
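The ranking step can be sketched in a few lines. The disease names and weights below are invented for illustration.

```python
def rank_diseases(weights):
    """Return [(disease name, weight), ...] sorted by descending weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Invented weights for three candidate diseases.
candidate_weights = {"influenza": 0.64, "measles": 0.21, "dengue fever": 0.37}
ranked = rank_diseases(candidate_weights)
```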
S6: extract from the infectious disease knowledge graph the sub-graphs corresponding to all infectious diseases in the ranked list, and record the result as the target graph.
Specifically, the sorting in the previous step yields a list of infectious diseases ranked by weight. For each infectious disease in this list, the infectious disease knowledge graph is traversed and the entities and relations related to that disease are retrieved. The information extracted for each target infectious disease includes the information corresponding to the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features. Finally, the resulting target graph is stored.
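One way to picture the extraction of the target graph is shown below, assuming the knowledge graph is held as (head, relation, tail) triples. The triple representation, the relation names and the contents are all assumptions for illustration; the text does not specify a storage format.

```python
# Knowledge graph as (head, relation, tail) triples; contents are invented.
KNOWLEDGE_GRAPH = [
    ("influenza", "has_symptom", "cough"),
    ("influenza", "has_sign", "high fever"),
    ("measles", "has_symptom", "rash"),
    ("cholera", "has_symptom", "diarrhoea"),
]


def extract_target_graph(triples, ranked_diseases):
    """Keep only the triples whose head entity is a disease in the ranked list."""
    targets = set(ranked_diseases)
    return [triple for triple in triples if triple[0] in targets]


target_graph = extract_target_graph(KNOWLEDGE_GRAPH, ["influenza", "measles"])
```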
S7: output the ranked list and the target graph.
Specifically, a GUI may be created with a graphical interface library (e.g. Tkinter or PyQt) so that the user can view the ranked list directly. For example, a Tkinter GUI is created that contains a text box in which the names of the infectious diseases in the ranked list are displayed line by line. The GUI is built around a function display_scaled_list, which accepts the ranked list as a parameter and displays it in the GUI. Visual output of this kind presents the ranked list and the target graph more intuitively.
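A possible shape for such a Tkinter display is sketched below. Only the function name display_scaled_list comes from the text; its body, the helper format_ranked_lines and the window layout are assumptions. Tkinter is imported inside the function so that the formatting helper can still be used on machines without a display.

```python
def format_ranked_lines(ranked):
    """Turn [(name, weight), ...] into numbered display lines, one per disease."""
    return [f"{position}. {name}" for position, (name, _weight) in enumerate(ranked, start=1)]


def display_scaled_list(ranked):
    """Show the ranked disease names line by line in a read-only text box."""
    import tkinter as tk  # lazy import: requires a graphical environment

    root = tk.Tk()
    root.title("Ranked infectious diseases")
    text_box = tk.Text(root, height=10, width=40)
    text_box.pack()
    text_box.insert(tk.END, "\n".join(format_ranked_lines(ranked)))
    text_box.config(state=tk.DISABLED)  # make the box read-only
    root.mainloop()


# Usage on a machine with a display:
# display_scaled_list([("influenza", 0.64), ("measles", 0.21)])
```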
In some embodiments, the knowledge-graph-based infectious disease prediction system 200 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the system 200 may be stored in a memory of a computer device and executed by at least one processor to perform the functions of the knowledge-graph-based infectious disease prediction method (see fig. 1 for details).
In this embodiment, the knowledge-graph-based infectious disease prediction system 200 may be divided into a plurality of functional modules according to the functions it performs, as shown in fig. 2. The functional modules may include: an infectious disease knowledge graph construction module 210, an extraction module 220, an infectious disease candidate set generation module 230, a weight calculation module 240, an infectious disease ranking module 250, a target graph generation module 260, and a visual output module 270. A module in the present invention refers to a series of computer program segments, stored in a memory, that can be executed by at least one processor and perform a fixed function. The functions of the respective modules are described in detail below.
The infectious disease knowledge graph construction module is used for constructing an infectious disease knowledge graph;
the extraction module is used for extracting, from patient information of a target patient, the information corresponding to the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features, and recording it as target information;
the infectious disease candidate set generation module is used for searching the knowledge graph for the entities and relations that match each piece of information in the target information, so as to obtain all infectious diseases in the knowledge graph related to the target information and form an infectious disease candidate set;
the weight calculation module is used for encoding the target information with 0/1 encoding, inputting it into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the candidate set accordingly;
the infectious disease ranking module is used for ranking the infectious diseases in the infectious disease candidate set by weight and generating a ranked list;
the target graph generation module is used for extracting from the infectious disease knowledge graph the sub-graphs corresponding to all infectious diseases in the ranked list and recording them as the target graph;
and the visual output module is used for outputting the ranked list and the target graph.
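How the modules listed above might chain together is sketched below. Everything here is illustrative: the triple-based graph, the relation names, the measles and cholera coefficients and the function names are assumptions; only the influenza coefficients come from the description.

```python
# Toy knowledge graph as (head, relation, tail) triples; contents are invented.
KG = [
    ("influenza", "has_symptom", "cough"),
    ("influenza", "has_sign", "high_fever"),
    ("measles", "has_symptom", "cough"),
    ("measles", "has_sign", "rash"),
    ("cholera", "has_symptom", "diarrhoea"),
]
# Per-disease weight models: (constant term, {feature: coefficient}).
MODELS = {
    "influenza": (0.04488, {"contact_history": -0.1167, "high_fever": 0.5392, "cough": 0.056}),
    "measles": (0.05, {"rash": 0.6, "cough": 0.1}),   # invented coefficients
    "cholera": (0.02, {"diarrhoea": 0.7}),            # invented coefficients
}


def generate_candidates(triples, target_info):
    """Candidate set: diseases linked to at least one piece of target information."""
    wanted = set(target_info)
    return {head for head, _rel, tail in triples if tail in wanted}


def calculate_weights(candidates, target_info, models):
    """0/1-encode the target information implicitly and apply each disease model."""
    present = set(target_info)
    weights = {}
    for disease in candidates:
        intercept, coefs = models[disease]
        weights[disease] = intercept + sum(c for f, c in coefs.items() if f in present)
    return weights


def predict(triples, target_info, models):
    """Candidate generation -> weight calculation -> ranking -> target graph."""
    candidates = generate_candidates(triples, target_info)
    weights = calculate_weights(candidates, target_info, models)
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    target_graph = [t for t in triples if t[0] in candidates]
    return ranked, target_graph


ranked, target_graph = predict(KG, ["high_fever", "cough"], MODELS)
```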
Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the knowledge-graph-based infectious disease prediction method according to the embodiment of the present invention.
The terminal 300 may include a processor 310, a memory 320, and a communication module 330. These components may communicate via one or more buses. Those skilled in the art will appreciate that the terminal configuration shown in the drawing does not limit the invention: it may be a bus structure or a star structure, may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
The memory 320 may be used to store instructions for execution by the processor 310, and may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. When the instructions in the memory 320 are executed by the processor 310, the terminal 300 is enabled to perform some or all of the steps in the method embodiments described above.
The processor 310 is the control center of the terminal. It connects the various parts of the entire terminal through various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing the software programs and/or modules stored in the memory 320 and invoking the data stored in the memory. The processor may consist of an integrated circuit (IC), for example a single packaged IC, or of a plurality of packaged ICs having the same or different functions. For example, the processor 310 may include only a central processing unit (CPU). In the embodiment of the invention, the CPU may have a single computing core or multiple computing cores.
The communication module 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.
The present invention also provides a computer storage medium in which a program may be stored; when executed, the program may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Thus, the present invention constructs an infectious disease knowledge graph and matches it against the extracted information corresponding to the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features to obtain an infectious disease candidate set. For each infectious disease in the candidate set, the corresponding preset weight calculation model is obtained and used to calculate the weight of that disease. The candidate infectious diseases are then ranked by weight to generate a ranked list. The sub-graphs corresponding to all infectious diseases in the ranked list are extracted from the infectious disease knowledge graph and recorded as the target graph, and finally the ranked list and the target graph are output. As a result, the prediction of infectious diseases is interpretable, the knowledge graph of the infectious diseases in the ranked list can be displayed, and the prediction result is easier to visualize. For the technical effects achieved by this embodiment, refer to the description above; they are not repeated here.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution in the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, or a second terminal, a network terminal, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention.
The same or similar parts of the various embodiments in this specification may be cross-referenced. In particular, since the terminal embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant points, refer to the description of the method embodiment.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation, e.g. multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between components may be an indirect coupling or communication connection through some interface, system or module, and may be electrical, mechanical or of another form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Those skilled in the art may make various equivalent modifications and substitutions to the embodiments of the present invention without departing from its spirit and scope, and all such modifications and substitutions are intended to fall within the scope of the present invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A knowledge-graph-based infectious disease prediction method, characterized by comprising the following steps:
constructing an infectious disease knowledge graph;
extracting, from patient information of a target patient, information corresponding to preset epidemiological-history-related features, information corresponding to preset symptom-related features and information corresponding to preset sign-related features, and recording the extracted information as target information;
searching the knowledge graph for the entities and relations that match each piece of information in the target information, to obtain all infectious diseases in the knowledge graph related to the target information and form an infectious disease candidate set;
encoding the target information with 0/1 encoding, inputting it into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the candidate set accordingly;
ranking the infectious diseases in the infectious disease candidate set by weight, and generating a ranked list;
extracting from the infectious disease knowledge graph the sub-graphs corresponding to all infectious diseases in the ranked list, and recording them as a target graph;
and outputting the ranked list and the target graph.
2. The method of claim 1, wherein the preset epidemiological-history-related features include: contact history information, cluster information, food history information, travel history information, vaccination history information, family history information, and allergy history information;
the contact history information includes whether the patient has recently been in contact with a person having similar symptoms or with a source of infection; the cluster information includes whether a cluster of similar cases has occurred; the food history information includes whether certain foods were consumed within the two weeks before onset of the disease; the travel history information includes whether the patient travelled away from home before onset of the disease; the vaccination history information includes whether the patient has been vaccinated; the family history information includes whether there are hereditary cases in the family; the allergy history information includes whether the patient is allergic to certain foods or drugs.
3. The method of claim 1, wherein extracting information corresponding to the preset epidemiological-history-related features, information corresponding to the preset symptom-related features and information corresponding to the preset sign-related features from patient information of the target patient comprises: extracting that information from the patient information of the target patient by using a preset general information extraction algorithm.
4. The method of claim 1, wherein the infectious disease knowledge graph comprises infectious diseases and, for each infectious disease, its symptoms, incubation period, mode of infection, signs, and summary.
5. The method of claim 1, wherein the weight calculation model is a linear regression model.
6. The method according to claim 1, wherein the method further comprises: assigning in advance a weight calculation model to each infectious disease involved in the infectious disease knowledge graph.
7. The method of claim 6, wherein the method of obtaining a weight calculation model for each infectious disease comprises:
collecting historical case data for an infectious disease, the historical case data including confirmed infectious disease information, epidemiological history related information, symptom related information, and sign related information;
preprocessing the historical case data to obtain a preprocessed historical case data set;
selecting, from the preprocessed historical case data set, the confirmed infectious disease information and the information corresponding to the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features, and applying 0/1 encoding to it to obtain a training set, wherein the code of the confirmed infectious disease in the training set is used as the weight of that infectious disease;
taking the preset epidemiological-history-related features, the preset symptom-related features and the preset sign-related features as independent variables and the weight of the infectious disease as the dependent variable, and constructing a multiple linear regression initial model of the infectious disease, wherein the multiple linear regression initial model is as follows:
$$W = \beta_0 + \sum_{i=1}^{n} a_i E_i + \sum_{j=1}^{m} b_j S_j + \sum_{g=1}^{k} c_g T_g$$
wherein $W$ is the weight of the infectious disease, $\beta_0$ is the constant term, $n$ is the number of preset epidemiological-history-related features, $E_i$ is the preset $i$-th epidemiological-history-related feature, $m$ is the number of preset symptom-related features, $S_j$ is the preset $j$-th symptom-related feature, $k$ is the number of preset sign-related features, $T_g$ is the preset $g$-th sign-related feature, and $a_i$, $b_j$, $c_g$ are the model coefficients, with $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, $g = 1, 2, \ldots, k$;
and training the multiple linear regression initial model with the training set to obtain a trained multiple linear regression model, which serves as the weight calculation model of the corresponding infectious disease.
8. A knowledge-graph-based infectious disease prediction system, characterized in that the system comprises:
an infectious disease knowledge graph construction module, used for constructing an infectious disease knowledge graph;
an extraction module, used for extracting, from patient information of a target patient, information corresponding to preset epidemiological-history-related features, information corresponding to preset symptom-related features and information corresponding to preset sign-related features, and recording the extracted information as target information;
an infectious disease candidate set generation module, used for searching the knowledge graph for the entities and relations that match each piece of information in the target information, so as to obtain all infectious diseases in the knowledge graph related to the target information and form an infectious disease candidate set;
a weight calculation module, used for encoding the target information with 0/1 encoding, inputting it into the preset weight calculation model corresponding to each infectious disease in the infectious disease candidate set, and calculating the weight of each infectious disease in the candidate set accordingly;
an infectious disease ranking module, used for ranking the infectious diseases in the infectious disease candidate set by weight and generating a ranked list;
a target graph generation module, used for extracting from the infectious disease knowledge graph the sub-graphs corresponding to all infectious diseases in the ranked list and recording them as a target graph;
and a visual output module, used for outputting the ranked list and the target graph.
9. A terminal, comprising:
a memory for storing a knowledge-graph-based infectious disease prediction program;
a processor for implementing the knowledge-graph-based infectious disease prediction method according to any one of claims 1-7 when executing the knowledge-graph-based infectious disease prediction program.
10. A computer-readable storage medium storing a computer program, characterized in that the readable storage medium stores a knowledge-graph-based infectious disease prediction program which, when executed by a processor, implements the steps of the knowledge-graph-based infectious disease prediction method according to any one of claims 1-7.
CN202410137531.9A 2024-02-01 2024-02-01 Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium Active CN117690600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410137531.9A CN117690600B (en) 2024-02-01 2024-02-01 Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410137531.9A CN117690600B (en) 2024-02-01 2024-02-01 Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN117690600A true CN117690600A (en) 2024-03-12
CN117690600B CN117690600B (en) 2024-04-30

Family

ID=90139305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410137531.9A Active CN117690600B (en) 2024-02-01 2024-02-01 Knowledge-graph-based infectious disease prediction method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN117690600B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021180245A1 (en) * 2020-11-02 2021-09-16 平安科技(深圳)有限公司 Server, data processing method and apparatus, and readable storage medium
CN113889284A (en) * 2021-09-16 2022-01-04 同济大学 Infectious disease contact target tracking method based on public transport knowledge graph
CN115344713A (en) * 2022-08-19 2022-11-15 上海安图生物技术有限公司 Disease prediction method based on disease diagnosis standard knowledge graph
CN115858820A (en) * 2023-02-13 2023-03-28 南京云创大数据科技股份有限公司 Prediction method and device based on medical knowledge graph, electronic equipment and storage medium
CN116013534A (en) * 2022-10-10 2023-04-25 睿愈(南京)数字医疗科技有限公司 Clinical auxiliary decision-making method and system based on medical guideline and data
CN116168825A (en) * 2022-12-27 2023-05-26 中国科学院计算机网络信息中心 Automatic diagnosis device for automatic interpretable diseases based on knowledge graph enhancement
WO2023098288A1 (en) * 2021-12-01 2023-06-08 浙江大学 Aided disease differential diagnosis system based on causality-containing medical knowledge graph
CN116844736A (en) * 2023-06-21 2023-10-03 山东浪潮智慧医疗科技有限公司 Infectious disease early warning method and system based on medical knowledge graph
CN116884636A (en) * 2023-06-30 2023-10-13 平安科技(深圳)有限公司 Infectious disease data analysis method, infectious disease data analysis device, computer equipment and storage medium
CN117174279A (en) * 2022-05-26 2023-12-05 北京百度网讯科技有限公司 Method and apparatus for predicting information

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIU, PH等: "HKDP: A Hybrid Knowledge Graph Based Pediatric Disease Prediction System", 《SMART HEALTH》, 31 January 2017 (2017-01-31) *
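姜茂敏;张富程;高凯: "Visual analysis of research on infectious disease prevention and control mechanisms based on knowledge graphs", 中国医疗管理科学, no. 04, 15 July 2020 (2020-07-15) *
陈德华;殷苏娜;乐嘉锦;王梅;潘乔;朱立峰: "A link prediction model for temporal knowledge graphs in the clinical domain", 计算机研究与发展, no. 12, 15 December 2017 (2017-12-15) *
陈晓慧;刘俊楠;徐立;李佳;张伟;刘海砚: "Construction of a COVID-19 case activity knowledge graph: a case study of Zhengzhou", 武汉大学学报(信息科学版), no. 06, 5 June 2020 (2020-06-05) *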

Also Published As

Publication number Publication date
CN117690600B (en) 2024-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant