CN111986815A - Project combination mining method based on co-occurrence relation and related equipment

Info

Publication number: CN111986815A
Application number: CN202010893345.XA
Authority: CN
Inventors: 盛建为; 周全
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2020-11-24

Abstract

The invention relates to the field of data processing and discloses a project combination mining method based on a co-occurrence relationship, and related equipment, applied to the field of intelligent medical treatment. The method includes: preprocessing medical record data and classifying it by disease type; extracting the treatment items in the medical record data; constructing an initial co-occurrence relationship network with the treatment items as graph nodes and simplifying it to obtain a project co-occurrence relationship graph; performing combination mining with a project combination mining model to obtain a complete subgraph; and outputting a treatment item combination data set based on the complete subgraph. The method improves the diagnosis efficiency of doctors and lays a foundation for subsequent medical intelligentization. The invention also relates to blockchain technology: the treatment items and the co-occurrence relationships can be stored in a blockchain.

Description

Project combination mining method based on co-occurrence relation and related equipment

Technical Field

The application relates to the field of data processing, in particular to a project combination mining method and related equipment based on co-occurrence relationship.

Background

With the popularization of medical electronic medical records, more and more clinical medical records are electronized and digitalized and become data sources which can be directly processed by a computer. With the continuous development of big data technology, people increasingly use computer technology means to analyze a large amount of medical care data from patients and crowds to obtain valuable implicit information for assisting clinical researchers, clinicians, managers, researchers and health policy makers.

At present, the analysis of clinical medical data mainly covers the clinical symptoms of each disease category and the treatment effects and pathological responses of a single treatment item. The association relationships that arise when a plurality of treatment items are applied to the same disease category have received little analysis. Some methods for revealing associations among diseases do analyze such relationships in research on treating comorbid syndromes, but these methods do not match the habits of clinical research, so they are difficult to connect with clinical research, or they fail to reflect the association relationships among treatment items well.

Disclosure of Invention

The invention mainly aims to solve the technical problem in the prior art that correlation analysis is difficult when a plurality of treatment items are used simultaneously on the same disease category, which results in low diagnosis efficiency.

The invention provides a project combination mining method based on a co-occurrence relationship in a first aspect, which comprises the following steps:

acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis lists;

preprocessing the medical record data, clustering diagnosis lists of which the diagnosis results belong to the same disease category, and extracting all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

taking all treatment items as graph nodes, and constructing an initial co-occurrence relationship network among the graph nodes to obtain a project co-occurrence relationship graph;

simplifying an initial co-occurrence relation network in the project co-occurrence relation graph by a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;

and generating a treatment item combination data set corresponding to the disease species based on the network relation between the graph nodes in the complete sub-graph.

Optionally, in a first implementation manner of the first aspect of the present invention, the constructing an initial co-occurrence relationship network between the graph nodes by using all the treatment items as graph nodes to obtain the item co-occurrence relationship graph includes:

randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming an item combination with each treatment item;

calculating the probability of the project combination appearing in the medical record data at the same time;

judging whether the probability meets an initial co-occurrence condition;

if so, adding an edge between the project combinations to form an initial co-occurrence relationship network;

and outputting the project co-occurrence relation graph after all the project combinations are added.

Optionally, in a second implementation manner of the first aspect of the present invention, the item combination includes a first treatment item and a second treatment item, and the calculating a probability that the item combination appears in the medical record data at the same time includes:

counting a first number of times that the first treatment item and the second treatment item appear in the same diagnosis list simultaneously in the medical record data, and a second number of times that the first treatment item appears in the diagnosis list independently and a third number of times that the second treatment item appears in the diagnosis list independently;

calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number and the second number;

calculating a second probability of occurrence of the combination of items relative to the second treatment item based on the first number and the third number.
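The counting and the two directional probabilities described above can be sketched as follows. This is a minimal illustration, not the patented formula: the text does not give the exact expression, so the sketch assumes that each probability is the number of joint occurrences divided by the number of diagnosis lists containing the respective item (joint plus independent occurrences). The function name and the sample data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def cooccurrence_probabilities(
    diagnosis_lists: Iterable[Set[str]], item_a: str, item_b: str
) -> Tuple[float, float]:
    """Return (first probability, second probability) for the pair (item_a, item_b)."""
    both = only_a = only_b = 0
    for items in diagnosis_lists:
        has_a, has_b = item_a in items, item_b in items
        if has_a and has_b:
            both += 1    # first number: both items in the same diagnosis list
        elif has_a:
            only_a += 1  # second number: item_a appears without item_b
        elif has_b:
            only_b += 1  # third number: item_b appears without item_a
    # Assumed formula: share of an item's diagnosis lists that also contain the other item.
    p_first = both / (both + only_a) if both + only_a else 0.0
    p_second = both / (both + only_b) if both + only_b else 0.0
    return p_first, p_second


# Made-up diagnosis lists, each reduced to its set of treatment items.
lists = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
p_first, p_second = cooccurrence_probabilities(lists, "A", "B")
print(p_first, p_second)
```

Keeping the two probabilities separate preserves the asymmetry of the relationship: an item may almost always accompany another without the reverse being true.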

Optionally, in a third implementation manner of the first aspect of the present invention, the judging whether the probability meets the initial co-occurrence condition includes:

comparing the first occurrence probability and the second occurrence probability with an initial co-occurrence condition respectively;

if the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type;

and if at least one of the first occurrence probability and the second occurrence probability does not meet the initial co-occurrence condition, determining that the item combination is an unbound treatment item of the same disease category.
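The binding decision above reduces to a two-sided comparison. The sketch below assumes that the initial co-occurrence condition is a single numeric threshold and that "meets" means "greater than or equal to"; both the threshold value 0.8 and the function name are hypothetical.

```python
def is_bound_pair(p_first: float, p_second: float, threshold: float = 0.8) -> bool:
    """A pair is a bound treatment item combination only if both probabilities meet the condition."""
    return p_first >= threshold and p_second >= threshold


print(is_bound_pair(0.90, 0.85))  # both directions meet the assumed threshold
print(is_bound_pair(0.90, 0.50))  # one direction fails, so the pair is unbound
```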

Optionally, in a fourth implementation manner of the first aspect of the present invention, the simplifying, by using a preset project combination mining model, an initial co-occurrence relationship network in the project co-occurrence relationship diagram, so as to obtain a network relationship structure, includes:

extracting the probabilities between a first graph node and each of the other graph nodes in the project co-occurrence relationship graph, and comparing each probability with a preset weight value, wherein the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical cases in which two treatment items are used simultaneously to the total number of medical cases;

if the probability is lower than the weight value, deleting the corresponding edge from the first graph node;

and after all the graph nodes in the project co-occurrence relation graph are compared, outputting a network relation structure.
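The simplification step above can be illustrated with a short sketch. The text does not spell out how the probability and the weight value are obtained for each edge, so the sketch makes two assumptions: each edge is scored by the share of all medical cases in which both items are used, and one preset weight value serves as the cut-off. The names `pair_weight` and `simplify` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]  # an undirected edge between two treatment items


def pair_weight(records: Iterable[Set[str]], item_a: str, item_b: str) -> float:
    """Share of all medical cases in which both items are used together."""
    records = list(records)
    if not records:
        return 0.0
    both = sum(1 for items in records if item_a in items and item_b in items)
    return both / len(records)


def simplify(edge_probability: Dict[Edge, float], weight_value: float) -> Dict[Edge, float]:
    """Keep only the edges whose probability is not lower than the weight value."""
    return {edge: p for edge, p in edge_probability.items() if p >= weight_value}


# Made-up medical cases, each reduced to its set of treatment items.
records = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
edges = {
    frozenset({"A", "B"}): pair_weight(records, "A", "B"),
    frozenset({"A", "C"}): pair_weight(records, "A", "C"),
    frozenset({"B", "C"}): pair_weight(records, "B", "C"),
}
kept = simplify(edges, weight_value=0.5)
print(sorted(sorted(edge) for edge in kept))
```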

Optionally, in a fifth implementation manner of the first aspect of the present invention, the adjusting the project co-occurrence relationship graph based on the network relationship structure to obtain a complete sub-graph includes:

traversing all graph nodes, screening out zero-degree nodes, and deleting the zero-degree nodes from the initial co-occurrence relationship network, wherein the zero-degree nodes are graph nodes without edges between the zero-degree nodes and any graph nodes;

randomly selecting N graph nodes, and calculating the total edge number of a local relationship graph consisting of the N graph nodes;

judging whether the total number of edges is equal to an edge threshold value, wherein the edge threshold value is equal to N(N-1)/2, and N is greater than or equal to 2;

and if so, determining that the local relationship graph is a complete subgraph.
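The zero-degree filtering and the edge-count test above can be sketched as follows. The adjacency-set representation, the function names, and the sample graph are assumptions made for illustration; the test itself follows the stated rule that N nodes form a complete subgraph when they share exactly N(N-1)/2 edges.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # node -> set of neighbouring nodes (undirected)


def drop_zero_degree(graph: Graph) -> Graph:
    """Remove nodes that have no edge to any other node."""
    return {node: set(neighbours) for node, neighbours in graph.items() if neighbours}


def is_complete_subgraph(graph: Graph, nodes: Iterable[str]) -> bool:
    """True if the selected N nodes (N >= 2) share exactly N(N-1)/2 edges."""
    chosen = list(dict.fromkeys(nodes))  # drop repeated nodes, keep order
    n = len(chosen)
    if n < 2:
        return False
    edge_count = sum(
        1 for u, v in combinations(chosen, 2) if v in graph.get(u, set())
    )
    return edge_count == n * (n - 1) // 2


# Made-up graph: A, B, C are mutually connected, D hangs off C, E is isolated.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
graph = drop_zero_degree(graph)
print(is_complete_subgraph(graph, ["A", "B", "C"]))
print(is_complete_subgraph(graph, ["A", "B", "D"]))
```

Random selection of the N nodes, as described in the text, could be done with `random.sample(list(graph), n)`; the check itself does not depend on how the nodes are chosen.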

Optionally, in a sixth implementation manner of the first aspect of the present invention, after the generating a treatment item combination data set corresponding to the disease category based on the network relationship between the graph nodes in the full sub-graph, the method further includes:

extracting medicine information corresponding to each disease species and the incidence relation among the medicines;

constructing a medicine co-occurrence relation graph corresponding to the disease species according to the medicine information and the corresponding association relation;

and simplifying the drug co-occurrence relation graph according to a preset drug combination mining model, and generating a drug combination data set based on the result after simplification.

The second aspect of the present invention provides a project combination mining apparatus based on co-occurrence relationship, including:

the data acquisition module is used for acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis lists;

the preprocessing module is used for preprocessing the medical record data, clustering diagnosis lists with diagnosis results belonging to the same disease category, and extracting all medical records corresponding to the disease category and treatment items corresponding to each medical record from the clustered diagnosis lists;

the construction module is used for constructing an initial co-occurrence relationship network among all the graph nodes by taking all the treatment projects as the graph nodes to obtain a project co-occurrence relationship graph;

the mining module is used for simplifying an initial co-occurrence relation network in the project co-occurrence relation graph through a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;

and the generation module is used for generating a treatment project combination data set corresponding to the disease category based on the network relation among the graph nodes in the complete sub-graph.

Optionally, in a first implementation manner of the second aspect of the present invention, the building module includes:

the traversing unit is used for randomly selecting one treatment item from all the treatment items as a main node, traversing the rest of the treatment items and forming an item combination with each treatment item;

the first calculation unit is used for calculating the probability of the item combination appearing in the medical record data at the same time;

the first judgment unit is used for judging whether the probability meets the initial co-occurrence condition or not;

a creating unit, configured to add an edge between the item combinations when the probability satisfies an initial co-occurrence condition, so as to form an initial co-occurrence relationship network; and outputting the project co-occurrence relation graph after all the project combinations are added.

Optionally, in a second implementation manner of the second aspect of the present invention, the item combination includes a first treatment item and a second treatment item, and the first computing unit is specifically configured to:

Optionally, in a third implementation manner of the second aspect of the present invention, the first determining unit is specifically configured to:

when the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type; and determining that the combination of the items is an unbound treatment item of the same disease species when at least one of the first occurrence probability and the second occurrence probability does not satisfy the initial co-occurrence condition.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the mining module includes:

a comparing unit, configured to extract the probabilities between a first graph node and each of the other graph nodes in the project co-occurrence relationship graph, and compare each probability with a preset weight value, where the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical cases in which two treatment items are used simultaneously to the total number of medical cases;

a deleting unit, configured to delete the corresponding edge from the first graph node when the probability is lower than the weight value;

and the output unit is used for outputting the network relationship structure after all the graph nodes in the project co-occurrence relationship graph are compared.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the mining module further includes:

the screening unit is used for traversing all the graph nodes, screening zero-degree nodes, and deleting the zero-degree nodes from the initial co-occurrence relationship network, wherein the zero-degree nodes are the graph nodes without edges between the zero-degree nodes and any graph node;

the second calculation unit is used for randomly selecting N graph nodes and calculating the total number of edges of the local relationship graph formed by the N graph nodes;

a second determining unit, configured to determine whether the total number of edges is equal to an edge threshold, where the edge threshold is equal to N(N-1)/2, and N is greater than or equal to 2;

and the determining unit is used for determining that the local relationship graph is a complete subgraph when the total number of edges is equal to the edge threshold.

Optionally, in a sixth implementation manner of the second aspect of the present invention, the project combination mining apparatus based on co-occurrence relationship further includes an optimization module, which is specifically configured to:

The third aspect of the present invention provides a project combination mining apparatus based on co-occurrence relationship, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the project combination mining device based on co-occurrence relationship to perform the project combination mining method based on co-occurrence relationship described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the project combination mining method based on co-occurrence relationship described above.

In the technical scheme provided by the invention, medical record data is preprocessed and classified by disease type, the treatment items in the medical record data are extracted, an initial co-occurrence relationship network is constructed with the treatment items as graph nodes and simplified to obtain a project co-occurrence relationship graph, combination mining is performed with a project combination mining model to obtain a complete subgraph, and a treatment item combination data set is output based on the complete subgraph. After diagnosing a specific disease type and determining one treatment item, a doctor can search the treatment item combination data set directly to obtain recommendations for the associated treatment items, which improves the doctor's diagnosis efficiency and lays a foundation for subsequent medical intelligentization.

Drawings

FIG. 1 is a diagram of a first embodiment of a project combination mining method based on co-occurrence relationship in the embodiment of the present invention;

FIG. 2 is a diagram of a second embodiment of a project combination mining method based on co-occurrence relationship in the embodiment of the present invention;

FIG. 3 is a diagram of a third embodiment of a project combination mining method based on co-occurrence relationship in the embodiment of the present invention;

FIG. 4 is a diagram of a fourth embodiment of a project combination mining method based on co-occurrence relationship according to the embodiment of the present invention;

FIG. 5 is a schematic illustration of a set of treatment items in an embodiment of the invention;

FIG. 6 is a schematic diagram of a complete subgraph in an embodiment of the invention;

FIG. 7 is a diagram of an embodiment of a project combination mining apparatus based on co-occurrence relationship in the embodiment of the present invention;

FIG. 8 is a diagram of another embodiment of a project combination mining apparatus based on co-occurrence relationship in the embodiment of the present invention;

FIG. 9 is a diagram of an embodiment of a project combination mining device based on co-occurrence relationship in the embodiment of the present invention.

Detailed Description

To address the defects of the prior art, the application provides a method for mining the co-occurrence relationships of a plurality of treatment items within the same disease type by means of a graph mining model for diagnosis and treatment item combinations. Specifically, the co-occurrence relationships of different diagnosis and treatment items are mined with a graph mining model based on co-occurrence relationships, so as to determine the treatment items that need to appear together in the treatment of the same disease type. During diagnosis, a doctor can quickly provide a corresponding treatment scheme for a patient based on these combination relationships, which shortens the diagnosis time and improves the doctor's diagnosis efficiency.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, a specific flow of the embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of a project combination mining method based on a co-occurrence relationship in the embodiment of the present invention includes:

101. Acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis lists;

In this step, clinical data may be obtained through an interface of a data analysis platform, for example a secure transmission interface based on a mobile internet communication protocol. A user may log in to the secure transmission interface with an account to directly access the clinical data of each medical institution. Specifically, when the secure transmission interface is invoked, a data acquisition program is also invoked to read the clinical database of the medical institution, thereby capturing the clinical data in the database.

The clinical data comprises a diagnosis list, the diagnosis list comprises patient data information, diagnosis information and treatment information, the diagnosis information comprises diagnosis results, and the treatment information comprises medical treatment items and medication information of each item.

102. Preprocessing medical record data, clustering diagnosis lists of which diagnosis results belong to the same disease category, and extracting all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

In this embodiment, the preprocessing includes several steps such as feature extraction, data screening, and clustering. Specifically, preprocessing the medical record data includes:

screening the medical record data, identifying information of non-diagnosis orders in the medical record data, and deleting the non-diagnosis orders to obtain a diagnosis order set;

extracting disease symptoms of each diagnosis list in the diagnosis list set by using a disease species knowledge graph, and matching specific disease names based on the extracted disease symptoms;

classifying the diagnosis lists with the same disease name in the diagnosis list set into one class by using a clustering algorithm to obtain a disease data group;

furthermore, the medical records of the diagnosis list in each disease data set are classified, and the treatment items of the diagnosis list corresponding to each medical record are extracted to form an item set.

For example, the medical record data is preprocessed and all medical records of a certain disease type are screened out; the number of these medical records is denoted n. The diagnosis and treatment items (medicines or examination items) used by a patient during hospitalization are converted into an item set {a_i1, a_i2, …, a_ik}, where the subscript i is the medical record index and k is the number of diagnosis and treatment items received by the patient.
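The grouping by disease type and the conversion of each medical record into an item set can be sketched as below. The record layout (the keys `record_id`, `disease`, and `items`) and the sample values are hypothetical and stand in for whatever the preprocessing actually produces.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Made-up, already screened records; the field names are assumptions.
records: List[dict] = [
    {"record_id": 1, "disease": "disease_x", "items": ["item_a", "item_b", "item_a"]},
    {"record_id": 2, "disease": "disease_x", "items": ["item_b", "item_c"]},
    {"record_id": 3, "disease": "disease_y", "items": ["item_d"]},
]


def item_sets_by_disease(rows: List[dict]) -> Dict[str, Dict[int, Set[str]]]:
    """Group records by disease type and reduce each record to its set of items."""
    grouped: Dict[str, Dict[int, Set[str]]] = defaultdict(dict)
    for row in rows:
        item_set = grouped[row["disease"]].setdefault(row["record_id"], set())
        item_set.update(row["items"])  # repeated items collapse into one entry
    return {disease: dict(per_record) for disease, per_record in grouped.items()}


item_sets = item_sets_by_disease(records)
n = len(item_sets["disease_x"])    # number of medical records of this disease type
k = len(item_sets["disease_x"][1])  # number of distinct items in record 1
print(n, k)
```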

103. Constructing an initial co-occurrence relationship network among all graph nodes by taking all treatment projects as graph nodes to obtain a project co-occurrence relationship graph;

In this step, the project co-occurrence relationship graph may be constructed in the manner of a complete graph. First, all treatment items extracted from the medical record data are taken as nodes, and every pair of nodes is connected to obtain an initial complete graph. The betweenness of the edges between nodes is then calculated on the initial complete graph, and whether the association between a node and each other node meets a preset condition is judged based on the betweenness. If so, the edge is retained; otherwise, the edge is deleted. The initial co-occurrence relationship network is thereby constructed, giving the project co-occurrence relationship graph.

In practical application, a detection algorithm of a Python network module can be used to detect the shortest path between two nodes in the complete graph and to calculate the betweenness between the two nodes. The specific steps for detecting the shortest path between two nodes are as follows:

Identify the module structures in the complete graph. Optionally, each node in the complete graph is taken in turn as a central node and the nodes connected to it are searched; if the number of nodes found reaches the order of magnitude required for a module, the module is recorded and the betweenness between two nodes in the module is calculated.

Specifically, the edge betweenness of all paths connecting two nodes is calculated, and the path with the smallest betweenness value is selected as the path between the two modules, thereby obtaining the co-occurrence relationship between the two nodes.

104. Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph by a preset project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;

In this embodiment, the project combination mining model is a detection model that, based on natural language technology, learns the treatment items in the medical data of clinically successful treatments of various disease types and can thereby identify the co-occurrence relationships of the treatment items in each diagnosis list.

The model is used for analyzing and mining the relationship between different medical items to obtain diagnosis and treatment items which often appear together in clinic, and the co-occurrence relationship between the diagnosis and treatment items and the corresponding relationship between the diagnosis and treatment items and the disease types under the co-occurrence relationship are established.

In practical application, the model is used to simplify the initial co-occurrence relationship network while the treatment items are classified by disease type. After all treatment items in the medical record data are extracted, the model simplifies the network to obtain a network relationship structure; classification by disease type is then performed on the network relationship structure to obtain a plurality of complete subgraphs. The treatment items corresponding to the nodes of each complete subgraph are then converted into treatment item sets to obtain a classification graph, as shown in FIG. 5.

105. And generating a treatment project combination data set corresponding to the disease category based on the network relation among the graph nodes in the complete sub-graph.

In the step, the entity characteristics of the treatment items in each complete subgraph are extracted by using a natural language algorithm, a corresponding medical knowledge graph is constructed according to the entity characteristics, and the corresponding relation between the medical knowledge graph and the disease species is established, so that a treatment item combination data set is formed.

Through the implementation of the method, the combined relation between different diagnosis and treatment items is mined based on actual diagnosis and treatment data and a graph mining model of the co-occurrence relation. The method adopts a graph method to show the combined use condition of various diagnosis and treatment items in the clinical practical process, is simple and clear, is fit with the clinical practice, and can truly reflect the combined use condition of various items in the clinical treatment process; meanwhile, after a doctor diagnoses a specific disease type, the doctor can directly search the treatment item combination data set to obtain the treatment item recommendation which needs to be associated after determining one treatment item, and the diagnosis efficiency of the doctor is greatly improved.
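The lookup described above, in which a doctor chooses one treatment item and retrieves the associated items, can be sketched as a simple index over the mined complete subgraphs. This is only an illustration of the lookup: it does not implement the knowledge-graph construction of step 105, and the function name and sample item sets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_recommendations(complete_subgraphs: Iterable[Iterable[str]]) -> Dict[str, List[str]]:
    """Map each treatment item to the items that share a complete subgraph with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for nodes in complete_subgraphs:
        node_set = set(nodes)
        for node in node_set:
            index[node].update(node_set - {node})
    return {item: sorted(partners) for item, partners in index.items()}


# Made-up complete subgraphs mined for one disease type.
subgraphs = [["item_a", "item_b", "item_c"], ["item_c", "item_d"]]
recommendations = build_recommendations(subgraphs)
print(recommendations["item_c"])
```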

Referring to FIG. 2, a second embodiment of the project combination mining method based on co-occurrence relationship in the embodiment of the present invention includes:

201. Acquiring clinical data and extracting a plurality of diagnosis lists in the clinical data;

202. Preprocessing medical record data, clustering diagnosis lists of which diagnosis results belong to the same disease category, and extracting all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

203. Randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming an item combination with each treatment item;

In this step, starting from one treatment item as the main node, the algorithm traverses the remaining treatment items and forms an item combination with each of them. When the main node is finished, the next treatment item is selected as the main node and combined in the same way. After all treatment items have been traversed and combined, all combinations are deduplicated and screened to obtain the item combinations.

In practical applications, deduplication checks only whether two combinations contain the same treatment items and ignores the connection direction; that is, A → B and B → A are duplicate combinations. A treatment item combination is not limited to two items and may contain three or more.
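Because direction is ignored, the traversal and deduplication above amount to enumerating unordered combinations. A minimal sketch using the standard library follows; the function name and the sample items are hypothetical, and the `size` parameter covers combinations of three or more items.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def item_combinations(items: Iterable[str], size: int = 2) -> List[Tuple[str, ...]]:
    """Enumerate unordered item combinations, so (A, B) and (B, A) appear only once."""
    unique_items = sorted(set(items))  # drop repeated items, fix a stable order
    return list(combinations(unique_items, size))


pairs = item_combinations(["B", "A", "C", "A"])
triples = item_combinations(["B", "A", "C", "A"], size=3)
print(pairs)
print(triples)
```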

204. Calculating the probability of the simultaneous occurrence of the project combinations in the medical record data;

In practical application, the simultaneous use of multiple treatment items is evaluated per diagnosis list and not across different diagnosis lists; that is, the co-occurrence relationship refers to items used simultaneously on the same disease to treat a patient. Within one disease, multiple symptoms may exist or occur at the same time, so different treatment items need to be selected clinically to treat each symptom, and whether conflicts exist between different treatment items also needs to be judged.

205. Judging whether the probability meets the initial co-occurrence condition;

In this step, the initial co-occurrence condition includes the probability that the treatment items are used simultaneously in clinical trials; in practical application, this probability is adjusted adaptively according to the clinical response of the treatment items themselves.

206. If so, adding an edge between the project combinations to form an initial co-occurrence relationship network;

In this embodiment, the edge added between the treatment items of an item combination connects them by the shortest path between the two, and the shortest path is determined from the betweenness of the multiple paths connecting the treatment items, thereby forming the initial co-occurrence relationship network.

207. After all the project combinations are added, outputting a project co-occurrence relation graph;

In this step, the co-occurrence relationship graph among all treatment items is constructed from the initial co-occurrence relationship networks of the item combinations. This also includes calculating the co-occurrence relationship between one combination and another; the calculation principle is the same as for the co-occurrence relationship between the treatment items within a combination and is not repeated here.

208. Simplifying an initial co-occurrence relation network in the project co-occurrence relation graph by presetting a project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;

In this embodiment, the project combination mining model is obtained by training a deep neural network on the treatment projects in diagnosis lists. Specifically, treatment projects in historical diagnosis lists are extracted and combined to form a sample set; regression processing is performed on the sample set by a regression algorithm, and the result is divided into a training set and a verification set. The deep neural network is trained on the training set to obtain a model prototype, and the model prototype is verified on the verification set. The project combination mining model is formed once the probability that the verification result coincides with the treatment projects appearing together in the diagnosis lists corresponding to the verification set reaches a threshold value.

Edges in the initial co-occurrence relationship graph are simplified based on the project combination mining model, wherein the simplification refers to deletion or modification.

Then, a plurality of graph nodes in the simplified graph are selected and combined by a random combination algorithm to form a relation subgraph; whether the number of connecting edges among the nodes in the relation subgraph satisfies the combination formula of a complete subgraph is calculated, and if so, the relation subgraph is output as the complete subgraph.

209. And generating a treatment project combination data set corresponding to the disease category based on the network relation among the graph nodes in the complete sub-graph.

According to the method, the combination relationships among different diagnosis and treatment projects are mined by a graph mining model based on the co-occurrence relationship. The graph representation shows how the various diagnosis and treatment items are used in combination in clinical practice; it is simple and clear, fits clinical practice, and faithfully reflects the combined use of items during clinical treatment. Meanwhile, after diagnosing a specific disease type and determining one treatment item, a doctor can directly search the treatment item combination data set to obtain recommendations for the treatment items that need to be associated, which greatly improves the doctor's diagnosis efficiency.
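The complete-subgraph test in step 208 can be illustrated with a minimal sketch. It assumes the "combination formula" is the standard edge count of a complete graph on N nodes, N(N-1)/2; the function name `is_complete_subgraph` and the representation of edges as a set of `frozenset` pairs are hypothetical choices made only for this illustration.

```python
from itertools import combinations


def is_complete_subgraph(nodes, edges):
    """Return True if the selected nodes are pairwise connected in `edges`.

    `edges` is a set of frozenset({a, b}) pairs (an assumed representation).
    """
    nodes = list(dict.fromkeys(nodes))  # drop duplicates, keep order
    n = len(nodes)
    if n < 2:
        return False
    # Count the edges whose two endpoints both lie inside the selection.
    edge_count = sum(
        1 for a, b in combinations(nodes, 2) if frozenset((a, b)) in edges
    )
    # A complete graph on n nodes has exactly n * (n - 1) / 2 edges.
    return edge_count == n * (n - 1) // 2


# Hypothetical simplified graph: A, B and C are pairwise connected; D only touches C.
edges = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
abc_is_complete = is_complete_subgraph(["A", "B", "C"], edges)
abd_is_complete = is_complete_subgraph(["A", "B", "D"], edges)
```

Here the selection A, B, C passes the test, while A, B, D does not because D has no edge to A or B.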

Referring to fig. 3, a third embodiment of the project combination mining method based on co-occurrence relationship according to the embodiment of the present invention includes:

301. acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis lists;

302. preprocessing medical record data, clustering diagnosis lists of which diagnosis results belong to the same disease category, and extracting all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

303. randomly selecting one treatment item from all treatment items as a main node, traversing the rest treatment items, and forming an item combination with each treatment item;

In this embodiment, the implementation principles of steps 301-303 are the same as those of steps 201-203 in the above embodiment, and are not described herein again.

304. Counting a first time number of simultaneous occurrence of a first treatment item and a second treatment item in the same diagnosis list, a second time number of independent occurrence of the first treatment item in the diagnosis list and a third time number of independent occurrence of the second treatment item in the diagnosis list in the medical record data;

305. calculating a first probability of occurrence of the combination of items relative to the first treatment item based on the first number and the second number;

306. calculating a second probability of occurrence of the combination of items relative to the second treatment item based on the first number and the third number;

In practical application, the first, second and third numbers can be counted directly from the medical record data: the number of diagnosis lists in which the first treatment item and the second treatment item occur together is taken as the first number, the number of diagnosis lists in which the first treatment item is present is taken as the second number, and the number of diagnosis lists in which the second treatment item is present is taken as the third number;

alternatively, the diagnosis lists containing the first treatment item or the second treatment item can be counted first, the diagnosis lists in which both treatment items appear simultaneously can then be counted from among them, and finally the first occurrence probability and the second occurrence probability are calculated based on the three parameters.

307. Comparing the first occurrence probability and the second occurrence probability with the initial co-occurrence condition respectively;

308. if the first occurrence probability and the second occurrence probability simultaneously meet the initial co-occurrence condition, determining that the item combination is a binding treatment item of the same disease type;

309. if at least one of the first occurrence probability and the second occurrence probability does not meet the initial co-occurrence condition, determining that the item combination is a non-binding treatment item of the same disease species;

In practical application, the constructed project co-occurrence relationship graph is denoted G(v, e, w), where v represents the nodes, i.e. all the diagnosis and treatment projects, e represents the edges between nodes, and w represents the edge weights.

Specifically, the initial co-occurrence condition is the minimum support degree, and the specific implementation of judging that the initial co-occurrence condition is satisfied is as follows:

and adding an edge between the items a and b when the items a and b meet the co-occurrence relation. Wherein satisfying the initial co-occurrence condition is defined as:

wherein minsup is the minimum support degree, an empirical, user-defined parameter; the larger its value, the stricter the co-occurrence relationship. The value is usually determined by experiment and lies between 0.8 and 0.95. Based on the item sets of all patients, the diagnosis and treatment items are checked pairwise for the co-occurrence relationship, and an edge is added between two nodes whenever the relationship is met.
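As a sketch only, assume that the first and second occurrence probabilities are the ratios of the joint count to each item's own count. The symbols $n_{ab}$, $n_a$ and $n_b$ are introduced here purely for illustration and stand for the first, second and third numbers of step 304. Under that assumption, the condition for adding an edge between items a and b could be written as:

```latex
\[
P_1 = \frac{n_{ab}}{n_a} \ge \mathit{minsup}
\quad\text{and}\quad
P_2 = \frac{n_{ab}}{n_b} \ge \mathit{minsup}
\]
```

For example, with $n_{ab} = 90$, $n_a = 100$, $n_b = 95$ and minsup = 0.85, $P_1 = 0.90$ and $P_2 \approx 0.947$, so both probabilities meet the condition and an edge would be added.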

310. If so, adding an edge between the project combinations to form an initial co-occurrence relationship network;

311. after all the project combinations are added, outputting a project co-occurrence relation graph;

312. simplifying an initial co-occurrence relation network in the project co-occurrence relation graph by presetting a project combination mining model to obtain a network relation structure, and adjusting the project co-occurrence relation graph based on the network relation structure to obtain a complete subgraph;

313. and generating a treatment project combination data set corresponding to the disease category based on the network relation among the graph nodes in the complete sub-graph.

The method mines the combination relationships among different diagnosis and treatment projects based on actual diagnosis and treatment data and a graph mining model of the co-occurrence relationship. The graph representation shows how the various diagnosis and treatment items are used in combination in clinical practice; it is simple and clear, fits clinical practice, and faithfully reflects the combined use of items during clinical treatment.

Furthermore, the method takes one treatment item as a base point and mines the treatment items according to disease type and co-occurrence relationship. Based on the mined treatment items, a doctor can determine the specific disease type during diagnosis and then directly search the treatment item combination data set by one of the treatment items to obtain recommendations for the treatment items that need to be associated, which greatly improves the doctor's diagnosis efficiency.
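Steps 304 to 310 of this embodiment can be sketched as follows. The sketch assumes that each occurrence probability is the joint count divided by the item's own count, and that the edge weight is the share of diagnosis lists containing both items; the function name `build_cooccurrence_edges`, the sample item names and the `minsup` value of 0.75 are hypothetical and chosen only to keep the example small.

```python
from itertools import combinations


def build_cooccurrence_edges(diagnosis_lists, minsup):
    """Return {frozenset({a, b}): weight} for item pairs that meet minsup
    relative to both items (assumed reading of the co-occurrence condition)."""
    item_sets = [set(items) for items in diagnosis_lists]
    all_items = sorted(set().union(*item_sets)) if item_sets else []
    # Second/third numbers: diagnosis lists that contain each single item.
    count = {i: sum(1 for s in item_sets if i in s) for i in all_items}
    edges = {}
    for a, b in combinations(all_items, 2):
        # First number: diagnosis lists that contain both items.
        n_ab = sum(1 for s in item_sets if a in s and b in s)
        p1 = n_ab / count[a]  # occurrence probability relative to item a
        p2 = n_ab / count[b]  # occurrence probability relative to item b
        if p1 >= minsup and p2 >= minsup:
            # Assumed weight: share of all diagnosis lists using both items.
            edges[frozenset((a, b))] = n_ab / len(item_sets)
    return edges


# Hypothetical diagnosis lists for one disease category.
lists = [
    ["blood test", "chest X-ray", "antibiotic"],
    ["blood test", "chest X-ray"],
    ["blood test", "chest X-ray", "oxygen therapy"],
    ["blood test", "chest X-ray", "antibiotic"],
    ["blood test", "antibiotic"],
]
edges = build_cooccurrence_edges(lists, minsup=0.75)
```

With this sample data only the pair "blood test" and "chest X-ray" meets the threshold in both directions (0.8 and 1.0), so it is the only edge in the resulting network.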

Referring to fig. 4, a fourth embodiment of the project combination mining method based on co-occurrence relationship according to the embodiment of the present invention includes:

401. acquiring clinical data and extracting medical record data in the clinical data, wherein the medical record data comprises at least two diagnosis lists;

402. preprocessing medical record data, clustering diagnosis lists of which diagnosis results belong to the same disease category, and extracting all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

403. constructing an initial co-occurrence relationship network among all graph nodes by taking all treatment projects as graph nodes to obtain a project co-occurrence relationship graph;

in this embodiment, the implementation principles of steps 401-403 are the same as those of steps 201-203 in the above embodiment, and are not described herein again.

404. Extracting the probabilities between a first graph node and the other graph nodes in the project co-occurrence relation graph, and respectively comparing the probabilities with a preset weight value;

in this step, the first graph node is the graph node currently selected for simplification, the other graph nodes are the graph nodes other than the first graph node, and the weight value is the ratio of the number of medical records in which two treatment items are used at the same time to the total number of medical records.

405. If the probability is lower than the weight value, deleting the corresponding edge from the first graph node;

406. after all graph nodes in the project co-occurrence relation graph are compared, outputting a network relation structure;

In practical application, to reduce the complexity of graph mining, the project co-occurrence relationship graph needs to be simplified to some extent. Generally, a preset probability value can be set as the weight threshold; edges whose weight is below the threshold are deleted, and then the 0-degree nodes are removed, where a 0-degree node is a node that has no edge with any other node.
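The simplification just described can be sketched in a few lines. The function name `simplify_graph`, the dictionary-of-weights representation and the sample values are assumptions made for illustration, not part of the described method.

```python
def simplify_graph(nodes, weighted_edges, threshold):
    """Drop edges whose weight is below `threshold`, then drop 0-degree nodes.

    `weighted_edges` maps frozenset({a, b}) to a weight (assumed representation).
    """
    # Keep only the edges that reach the weight threshold.
    kept_edges = {e: w for e, w in weighted_edges.items() if w >= threshold}
    # Nodes that still touch at least one remaining edge.
    connected = set().union(*kept_edges) if kept_edges else set()
    kept_nodes = [n for n in nodes if n in connected]
    return kept_nodes, kept_edges


# Hypothetical weighted graph with four treatment items.
nodes = ["A", "B", "C", "D"]
weighted_edges = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.4,
    frozenset(("C", "D")): 0.2,
}
kept_nodes, kept_edges = simplify_graph(nodes, weighted_edges, threshold=0.5)
```

In this example only the edge between A and B survives, so C and D become 0-degree nodes and are removed.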

In this embodiment, a complete graph is a graph in which any two nodes are joined by an edge. A complete graph within the project co-occurrence relationship graph means that any two of its projects satisfy the co-occurrence relationship, that is, the projects often appear together. A maximum complete subgraph is a complete subgraph that ceases to be complete when any further node is added.

Complete subgraphs are extracted from the project co-occurrence relationship graph; in particular, NetworkX is used to mine all the maximum complete subgraphs, thereby finding all projects that frequently appear together.

in this embodiment, extracting a complete subgraph specifically includes:

the adjusting the project co-occurrence relationship graph based on the network relationship structure to obtain a complete subgraph comprises:

if so, determining that the local relationship graph is a complete subgraph; the complete subgraph obtained is shown in fig. 6.
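Mining all maximum complete subgraphs corresponds to enumerating the maximal cliques of the graph. NetworkX provides `find_cliques` for this purpose; the sketch below avoids the dependency and instead uses a basic Bron-Kerbosch recursion, which is a substitute chosen for illustration rather than the implementation used in this embodiment. The function name `maximal_cliques` and the adjacency-dictionary input are likewise assumptions.

```python
def maximal_cliques(adjacency):
    """Yield every maximal complete subgraph as a frozenset of nodes.

    `adjacency` maps each node to the set of its neighbours.
    """
    def expand(clique, candidates, excluded):
        # No node can extend the clique any further: it is maximal.
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        for node in list(candidates):
            neighbours = adjacency[node]
            yield from expand(
                clique | {node},
                candidates & neighbours,
                excluded & neighbours,
            )
            # The node has been fully explored; move it to the excluded set.
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(adjacency), set())


# Hypothetical simplified graph: A, B, C form a triangle and D hangs off C.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
cliques = set(maximal_cliques(adjacency))
```

For this graph the result contains the two item combinations {A, B, C} and {C, D}; each could then be written to the treatment item combination data set.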

407. Generating a treatment project combination data set corresponding to the disease category based on the network relation among the graph nodes in the complete sub-graph;

408. extracting medicine information corresponding to each disease species and the incidence relation among the medicines;

409. constructing a medicine co-occurrence relation graph corresponding to the disease species according to the medicine information and the corresponding association relation;

410. and simplifying the drug co-occurrence relation graph according to a preset drug combination mining model, and generating a drug combination data set based on the result after simplification.

In this embodiment, drug entities are identified from the acquired medical record data;

the extracted drug entities are preprocessed, and a co-occurrence matrix of disease species and drug entities is constructed;

the confidence value IMPT of the relationship between each pair of nodes in the co-occurrence matrix is calculated by a naive Bayes model or, alternatively, by a NoisyOR model;

all the confidence values are ranked from large to small, the top n relations, or the relations whose confidence value is greater than a certain threshold, are taken as edges, and all the drug entities are taken as nodes to construct a drug co-occurrence relationship graph;

a drug combination mining model is called on the drug co-occurrence relationship graph to simplify the graph, and a drug combination data set is generated based on the simplified result.
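The edge-selection rule for the drug co-occurrence relationship graph (top n relations, or relations above a confidence threshold) can be sketched as below. The confidence values are taken as given, since the naive Bayes and NoisyOR calculations are not detailed here; the function name `select_edges`, the placeholder drug names and the numbers are all hypothetical.

```python
def select_edges(confidences, top_n=None, threshold=None):
    """Rank drug pairs by confidence and keep the top_n pairs and/or the
    pairs whose confidence is greater than `threshold`."""
    # Sort from the largest confidence value to the smallest.
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    if threshold is not None:
        ranked = [(pair, c) for pair, c in ranked if c > threshold]
    return ranked


# Hypothetical confidence values (IMPT) for three drug pairs.
confidences = {
    ("drug_a", "drug_b"): 0.92,
    ("drug_a", "drug_c"): 0.75,
    ("drug_b", "drug_d"): 0.40,
}
by_rank = select_edges(confidences, top_n=2)
by_threshold = select_edges(confidences, threshold=0.5)
```

Both calls keep the same two pairs in this example; in general the two criteria give different edge sets, and supplying both arguments applies both filters.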

In practical application, the constructed drug co-occurrence relationship graph can essentially be understood as a relationship graph of drug indications. Specifically, the indications of the drugs applied to the same disease category are extracted, and the drug combinations and the relationship graph are constructed based on these indications; one indication is then randomly selected from all the indications in the relationship graph as a main node, the remaining indications are traversed, and an item combination is formed with each indication;

judging whether the probability meets an initial co-occurrence condition;

and outputting the medicine co-occurrence relation graph after all the project combinations are added.

Through the implementation of the scheme, the combination relationships among different diagnosis and treatment projects are mined based on actual diagnosis and treatment data and a graph mining model of the co-occurrence relationship. The graph representation shows how the various diagnosis and treatment items are used in combination in clinical practice; it is simple and clear, fits clinical practice, and faithfully reflects the combined use of items during clinical treatment.

The project combination mining method based on the co-occurrence relationship in the embodiment of the present invention has been described above; the project combination mining device based on the co-occurrence relationship in the embodiment of the present invention is described below. Referring to fig. 7, a first embodiment of the project combination mining device based on the co-occurrence relationship in the embodiment of the present invention includes:

a data obtaining module 701, configured to obtain clinical data and extract medical record data in the clinical data, where the medical record data includes at least two diagnosis lists;

a preprocessing module 702, configured to preprocess the medical record data, cluster diagnosis lists with diagnosis results belonging to the same disease category, and extract all medical records of corresponding disease categories and treatment items corresponding to each medical record from the clustered diagnosis lists;

a constructing module 703, configured to construct an initial co-occurrence relationship network between the graph nodes by using all the treatment projects as graph nodes, so as to obtain a project co-occurrence relationship graph;

the mining module 704 is used for simplifying the initial co-occurrence relationship network in the project co-occurrence relationship diagram by presetting a project combination mining model to obtain a network relationship structure, and adjusting the project co-occurrence relationship diagram based on the network relationship structure to obtain a complete subgraph;

a generating module 705, configured to generate a treatment item combination data set corresponding to the disease category based on the network relationship between the graph nodes in the complete sub-graph.

In this embodiment, the item combination mining device based on the co-occurrence relationship operates the item combination mining method based on the co-occurrence relationship, the method includes preprocessing medical record data, classifying the medical record data according to disease types, extracting treatment items in the medical record data, constructing an initial co-occurrence relationship network by taking the treatment items as graph nodes, simplifying the initial co-occurrence relationship network to obtain an item co-occurrence relationship graph, performing combination mining by using an item combination mining model to obtain a complete subgraph, and outputting a treatment item combination data set based on the complete subgraph; meanwhile, after a doctor diagnoses a specific disease type, the doctor can directly search the treatment item combination data set to obtain the treatment item recommendation which needs to be associated after determining one of the treatment items, so that the diagnosis efficiency of the doctor is greatly improved, and further, the possibility is provided for subsequent medical intellectualization.

Referring to fig. 8, a second embodiment of the project combination mining device based on co-occurrence relationship according to the embodiment of the present invention specifically includes:

Optionally, the building module 703 includes:

a traversal unit 7031, configured to randomly select one treatment item from all treatment items as a master node, traverse the remaining treatment items, and form an item combination with each treatment item;

a first calculating unit 7032, configured to calculate a probability that the item combination appears in the medical record data at the same time;

a first determining unit 7033, configured to determine whether the probability satisfies an initial co-occurrence condition;

a creating unit 7034, configured to add an edge between the item combinations to form an initial co-occurrence relationship network when the probability satisfies an initial co-occurrence condition; and outputting the project co-occurrence relation graph after all the project combinations are added.

Optionally, the item combination includes a first treatment item and a second treatment item, and the first computing unit 7032 is specifically configured to:

Optionally, the first determining unit 7033 is specifically configured to:

Optionally, the mining module 704 includes:

a comparing unit 7041, configured to extract probabilities of a first graph node and other graph nodes in the project co-occurrence relationship graph, and compare the probabilities with preset weight values, where the first graph node is a currently selected simplified graph node, the other nodes are graph nodes except the first graph node, and the weight value is a ratio of a number of cases in which two pieces of medical project data are used at the same time to a total number of cases;

a deleting unit 7042, configured to delete the corresponding edge from the first graph node when the probability is lower than the weight value;

an output unit 7043, configured to output the network relationship structure after all the graph nodes in the project co-occurrence relationship graph are compared.

Optionally, the mining module 704 further includes:

a screening unit 7044, configured to traverse all graph nodes, screen out a zero-degree node, and delete the zero-degree node from the initial co-occurrence relationship network, where the zero-degree node is a graph node that has no edge with any graph node;

a second calculating unit 7045, configured to randomly select N graph nodes, and calculate a total number of edges of the local relationship graph formed by the N graph nodes;

a second determining unit 7046, configured to determine whether the total number of edges is equal to an edge number threshold, where the edge number threshold is equal to N(N-1)/2, and N is greater than or equal to 2;

a determining unit 7047, configured to determine that the local relationship graph is a complete subgraph when the total number of edges is equal to the edge number threshold.

The co-occurrence relationship-based project combination mining apparatus further includes an optimization module 705, which is specifically configured to:

Fig. 7 and fig. 8 above describe the project combination mining device based on the co-occurrence relationship in the embodiment of the present invention in detail from the perspective of modular functional entities; the device is described in detail below from the perspective of hardware processing. The device may also be provided in the form of a plug-in to mine the co-occurrence relationships between treatment projects and to extract the different combinations of treatment projects used for each disease category.

Fig. 9 is a schematic structural diagram of a co-occurrence based project combination mining device according to an embodiment of the present invention. The co-occurrence based project combination mining device 600 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing an application program 633 or data 632. The memory 620 and the storage medium 630 may be transient or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the co-occurrence based project combination mining device 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 and to execute the series of instruction operations in the storage medium 630 on the co-occurrence based project combination mining device 600, so as to implement the steps of the co-occurrence based project combination mining method described above.

The co-occurrence based project combination mining device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device configuration illustrated in fig. 9 does not constitute a limitation of the co-occurrence based project combination mining device provided herein, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and may also be a volatile computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the instructions cause the computer to perform the steps of the co-occurrence relationship based project combination mining method provided in the foregoing embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A project combination mining method based on co-occurrence relationship is characterized in that the project combination mining method based on co-occurrence relationship comprises the following steps:

2. The co-occurrence relationship-based project combination mining method according to claim 1, wherein the constructing an initial co-occurrence relationship network between the graph nodes by using all treatment projects as the graph nodes to obtain a project co-occurrence relationship graph comprises:

judging whether the probability meets an initial co-occurrence condition;

3. The co-occurrence relationship-based project portfolio mining method of claim 2, wherein the project portfolio comprises a first treatment project and a second treatment project, and the calculating the probability that the project portfolio occurs at the same time in the medical record data comprises:

4. The co-occurrence relationship-based project combination mining method according to claim 3, wherein the judging whether the probability satisfies a co-occurrence relationship construction condition comprises:

5. The method of claim 4, wherein the step of simplifying the initial co-occurrence relationship network in the project co-occurrence relationship diagram by presetting a project combination mining model to obtain a network relationship structure comprises:

6. The method for mining a combination of items based on co-occurrence relationship according to claim 5, wherein the adjusting the graph of the co-occurrence relationship of the items based on the network relationship structure to obtain a complete subgraph comprises:

7. The co-occurrence relationship-based project combination mining method according to any one of claims 1-6, further comprising, after the generating of the treatment project combination data set corresponding to the disease category based on the network relationship between graph nodes in the full sub-graph:

8. A co-occurrence relation-based project combination mining device is characterized in that the co-occurrence relation-based project combination mining device comprises:

9. A co-occurrence relationship-based project portfolio mining apparatus, the co-occurrence relationship-based project portfolio mining apparatus comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the co-occurrence based project portfolio mining device to perform the co-occurrence based project portfolio mining method of any of claims 1-7.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the co-occurrence relationship-based project portfolio mining method of any one of claims 1-7.