CN111785372A - Collaborative filtering disease prediction system based on association rule and electronic equipment thereof - Google Patents

Info

Publication number
CN111785372A
Authority
CN
China
Prior art keywords
disease
association rule
unit
data
correlation matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010408220.3A
Other languages
Chinese (zh)
Inventor
王晓梅
李广砥
袁雪
张世武
王雅宁
王晨阳
赵明芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhisheng Technology Group Co ltd
Original Assignee
Zhejiang Zhisheng Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhisheng Technology Group Co ltd
Priority to CN202010408220.3A
Publication of CN111785372A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention provides a collaborative filtering disease prediction system based on association rules, comprising: an acquisition module for acquiring historical case data related to a patient; a disease prediction module for performing prediction using a normalized disease correlation matrix and the historical case data to generate disease prediction results; and a ranking module for ranking the disease prediction results by probability. The system provides patients with improved disease risk prediction.

Description

Collaborative filtering disease prediction system based on association rule and electronic equipment thereof
[ technical field ]
The invention belongs to the field of recommendation systems, and particularly relates to a collaborative filtering disease prediction system based on association rules and electronic equipment thereof.
[ background of the invention ]
Traditionally, physicians use risk calculators to assess the likelihood of disease progression. These calculators use basic information such as demographics, medical conditions and daily routines to calculate the likelihood of developing a disease, and the calculation relies on equation-based mathematical methods and tools. The challenge is that such equation-based approaches have low accuracy and require a very large amount of comprehensive data for careful statistical analysis. With the development in recent years of technologies such as big data, machine learning, data mining and artificial intelligence, disease prediction can become more accurate, more convenient and faster. Medical institutions, insurance groups, doctors and others are working with statisticians and computer scientists to develop better tools for predicting disease, and experts in the field are studying how to select, develop and fine-tune machine learning algorithms and models to provide accurate predictions. Miotto, R. et al. derived the "deep patient" representation, which was highly accurate in specific disease prediction tasks. Convolutional neural network (CNN) models have been built on both structured and unstructured data to predict cerebral infarction. Nagrecha et al. used a "diagnostic map" to predict heart failure in elderly patients; that work aims to mine important disease progression trajectories to help predict heart failure. Qingyu Zhao et al. used a generic regression model based on a variational autoencoder framework and applied it to brain age prediction from structural magnetic resonance images.
At present, most domestic and foreign papers and patents use machine learning or data mining to predict the incidence probability and grade of one specific disease, or analyze and predict specific diseases from external factors such as diet. In terms of the data sampled for prediction research, medical image data or gene-related information is used in most cases. Relatively few studies extract patients' diseases from electronic medical records, track multiple diseases over a long period, and predict the probability of subsequent diseases.
A collaborative filtering algorithm recommends information of interest to a user by using the preferences of a group with shared experience and interests. Collaborative filtering techniques have been successfully applied to recommendation systems in the entertainment and electronic retail industries. These systems explore an entity's current item history to predict associated items that do not yet appear in that history. The method is also suitable for disease prediction: diseases can be treated as items and the subject's current medical history as the item history. Davis et al. first proposed and discussed using collaborative filtering as a disease predictor. They solved the problem with a method based on the similarity of user preference vectors and created a system called CARE, which takes patient history as input and predicts future diagnostic risk from the characteristics of other similar patients. Folino et al. used association rule analysis and Markov models to predict disease risk, combining mining models to extract sequential disease patterns.
Collaborative filtering has become quite popular in entertainment and commerce as a method for recommender systems. Many recommendation systems based on collaborative filtering algorithms have been successfully applied to movie, song and merchandise purchase recommendations, among others. Recommender systems have become a cornerstone of personalization in many industries.
At present there are many studies and applications of disease prediction, but most are directed at the analysis and prediction of one specific disease, for example predicting only whether diabetes will occur, its stage, and its complications. Very few methods use a collaborative filtering algorithm to predict the various diseases that may occur in the future from the correlations among a patient's diseases. Some documents propose disease prediction methods based on association rules, but they predict directly from the generated association rules in the traditional manner; they do not construct a disease correlation matrix from the information generated by the association rules to build a collaborative filtering disease prediction system.
In the invention, the FP-Growth algorithm proposed by Jiawei Han is used to generate association rules, the association rules above a certain confidence threshold are extracted, and a disease correlation matrix is constructed. The risk probability or index of future diseases, derived from other patients similar to the patient, is obtained by taking the inner product of the data to be predicted and the disease correlation matrix, so that higher disease prediction precision is obtained.
[ summary of the invention ]
The invention aims to overcome the defects of the prior art by providing a collaborative filtering disease prediction system based on association rules and electronic equipment thereof, which provide patients with improved disease risk prediction.
In order to achieve the above object, the present invention provides a disease prediction system based on collaborative filtering of association rules, comprising:
an acquisition module for acquiring historical case data related to a patient;
a disease prediction module for performing prediction using a normalized disease correlation matrix and the historical case data to generate disease prediction results; and
a ranking module for ranking the disease prediction results by probability.
In an embodiment of the invention, the disease prediction module includes:
a disease correlation matrix establishing unit for establishing a disease correlation matrix using the association rules that satisfy the confidence requirement and whose antecedent and consequent each contain exactly one item;
a disease pair count matrix establishing unit for establishing, from the same association rules, a disease pair count matrix that counts the number of occurrences of the association rules;
a normalization unit for normalizing the disease correlation matrix by dividing it element-wise by the disease pair count matrix, to obtain a normalized disease correlation matrix; and
a calculating unit for taking the dot product of the historical case data and the normalized disease correlation matrix to obtain the disease prediction result.
In an embodiment of the invention, the disease correlation matrix establishing unit includes:
an acquisition unit for acquiring historical diagnostic data of a plurality of different patients;
a data processing unit for processing the historical diagnostic data of each patient into a disease data set;
an association rule establishing unit for identifying, based on the FP-Growth algorithm, all frequent disease item sets that satisfy the minimum support requirement in the plurality of disease data sets, and outputting the association rules derived from the identified frequent disease item sets that satisfy the confidence requirement; and
an association rule extraction unit for extracting, from all the association rules of the association rule establishing unit, those that are not less than the confidence requirement, and selecting those whose antecedent and consequent each contain exactly one item for training the disease correlation matrix.
In an embodiment of the invention, the data processing unit is further configured to:
preprocess the historical diagnostic data to obtain the diagnosis result field in the historical diagnostic data; and
map the diagnosis result field based on ICD-10 codes to obtain a multidimensional binary vector, the multidimensional binary vector being the mapped data set.
In an embodiment of the present invention, the association rule establishing unit includes:
a preprocessing unit for, based on the disease data sets of a plurality of patients, setting the patient number as an identifier TID, setting the disease data set corresponding to each number as one transaction T, setting the set of all disease data sets as D, and setting each disease diagnosis result in a disease data set as one item;
an FP-tree unit for scanning the set D of all disease data sets a first time, calculating the support count of each diagnosis result field over the transactions T, retaining a diagnosis result field as a frequent item when its support count is greater than or equal to the minimum support threshold, and then arranging the frequent items in descending order of support count;
an FP-tree subunit for scanning the transaction set D a second time, creating a root node marked null, and, as each transaction T is read in, forming a path from the root node through the diagnosis result field nodes of that transaction, so that each transaction is mapped to one path of the FP-tree and the FP-tree is formed once all transactions have been read in; and
a mining unit for extracting the corresponding item sets upward from the tail node of each path of the FP-tree in turn, retaining an item set as a frequent item set L when its support is greater than or equal to the minimum support threshold, and outputting the association rules derived from the identified frequent item sets L that satisfy the confidence requirement.
In an embodiment of the present invention, the disease correlation matrix creating unit further includes a noise reduction unit, and the noise reduction unit is configured to perform noise reduction processing on noise information in the historical diagnostic data.
In an embodiment of the present invention, the confidence threshold in the association rule extraction unit is set to 0.01.
In an embodiment of the present invention, the disease correlation matrix of the disease correlation matrix establishing unit is a two-dimensional matrix of N × N, where N is 3000;
the disease pair count matrix of the disease pair count matrix establishing unit is a two-dimensional matrix of N × N, where N is 3000.
In an embodiment of the present invention, in the disease prediction module, the normalized disease correlation matrix is evaluated by using precision and recall.
To achieve the above object, the present invention also provides an electronic device comprising a memory for storing information including program instructions and a processor for controlling the execution of the program instructions, which are loaded and executed by the processor to implement the association rule based collaborative filtering disease prediction system.
Compared with the prior art, the collaborative filtering disease prediction system based on association rules provided by the invention preprocesses the historical diagnosis data of the user by mapping and encoding it, which improves the running speed and efficiency of the disease prediction system, and the system has good interpretability.
In addition, compared with the prior art, when the collaborative filtering disease prediction system based on association rules performs prediction, the disease risk probabilities and their ranking are obtained simply by taking the inner product of the data to be predicted and the disease correlation matrix and sorting the result.
[ description of the drawings ]
Fig. 1 is a block diagram of a disease prediction system based on association rules according to an embodiment of the present invention.
Fig. 2 is a block diagram of a disease prediction module of the disease prediction system according to the above embodiment of the present invention.
Fig. 3 is a block diagram illustrating a disease correlation matrix establishing unit of the disease prediction system according to the association rule in the above embodiment of the present invention.
Fig. 4 is a block diagram illustrating an association rule establishing unit of the disease prediction system according to the association rule in the above embodiment of the present invention.
Fig. 5 is a block diagram of the electronic device according to the above embodiment of the present invention.
[ detailed description of the embodiments ]
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
In the present invention, the terms "a" and "an" in the claims and the description should be understood as meaning "one or more"; that is, an element may be one in number in one embodiment and more than one in number in another embodiment. The terms "a" and "an" should not be construed as limiting the number unless the present disclosure explicitly recites that the number of such elements is one.
The embodiment of the invention provides a collaborative filtering disease prediction system based on association rules. As shown in fig. 1, the disease prediction system predicts, from the historical diagnosis data of a patient and through a normalized disease correlation matrix, the probability that the patient will suffer from a certain disease in the future. The disease prediction system may include an acquisition module 10, a disease prediction module 20, and a ranking module 30. The acquisition module 10 is configured to acquire historical case data related to a patient. The disease prediction module 20 is configured to perform prediction using the normalized disease correlation matrix and the historical case data to generate a disease prediction result. The ranking module 30 is configured to rank the disease prediction results by probability. Here, the disease prediction result generated by the disease prediction module 20 includes a plurality of diseases or complications associated with the historical diagnostic data and the probability corresponding to each. The ranking module 30 may rank the results generated by the disease prediction module 20 by probability according to different tasks or scenarios, and the number of ranked results selected can be set according to the needs of the patient.
For example, if the historical diagnosis data of a certain patient includes diabetes, the acquisition module 10 acquires that diagnosis, and the disease prediction module 20 obtains a plurality of diseases or complications associated with diabetes, such as retinopathy, coronary heart disease, diabetic nephropathy and diabetic foot, together with their corresponding occurrence probabilities. The ranking module 30 then ranks the associated diseases or complications in the disease prediction result by occurrence probability, for example selecting the 10, 50 or 100 associated diseases or complications with the highest probability.
As shown in fig. 2, in an embodiment of the present invention, the disease prediction module 20 may include a disease correlation matrix establishing unit 21, a disease pair count matrix establishing unit 22, a normalization unit 23, and a calculating unit 24. The disease correlation matrix establishing unit 21 is configured to establish a disease correlation matrix using the association rules that satisfy the confidence requirement and whose antecedent and consequent each contain exactly one item. The disease pair count matrix establishing unit 22 is configured to establish a disease pair count matrix using the same association rules. The normalization unit 23 is configured to normalize the disease correlation matrix by dividing it element-wise by the disease pair count matrix, so as to obtain a normalized disease correlation matrix. The calculating unit 24 is configured to take the dot product of the historical case data and the normalized disease correlation matrix to obtain a disease prediction result. Here, the normalization performed by the normalization unit 23 is a position-by-position division of the two matrices.
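The calculation performed by the calculating unit 24, followed by the ranking of module 30, can be sketched as follows. This is a minimal illustration in Python and not the implementation of the invention: the function names `predict_scores` and `top_k`, the four-disease matrix and all of its values are hypothetical, and treating rows as diseases in the history and columns as predicted diseases is an assumption.

```python
from typing import List, Tuple


def predict_scores(history: List[int], norm_matrix: List[List[float]]) -> List[float]:
    """Dot product of a binary history vector with the normalized matrix."""
    n = len(norm_matrix)
    return [
        sum(history[i] * norm_matrix[i][j] for i in range(n))
        for j in range(n)
    ]


def top_k(scores: List[float], names: List[str], k: int) -> List[Tuple[str, float]]:
    """Return the k diseases with the highest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [(names[j], scores[j]) for j in order[:k]]


# Hypothetical 4-disease example (the text uses N = 3000).
names = ["diabetes", "retinopathy", "coronary heart disease", "diabetic nephropathy"]
norm_matrix = [
    [0.0, 0.5, 0.3, 0.2],  # assumed row: scores contributed by a diabetes history
    [0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
history = [1, 0, 0, 0]  # the patient has only diabetes on record

scores = predict_scores(history, norm_matrix)
ranked = top_k(scores, names, k=2)
```

With several diseases in the history the column sums can exceed 1, so in this sketch the outputs are better read as ranking scores than as calibrated probabilities.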
In an embodiment of the present invention, the disease correlation matrix of the disease correlation matrix establishing unit is a two-dimensional matrix of N × N, where N is 3000;
the disease pair count matrix of the disease pair count matrix establishing unit is a two-dimensional matrix of N × N, where N is 3000.
In the present invention, the disease correlation matrix is constructed based on the association rules and their corresponding confidences. Preferably, the disease correlation matrix is an N by N two-dimensional matrix, where N is the number of common diseases in the system, here 3000, and each disease has a corresponding ICD-10 code under the international standard. All values of the initial N by N disease correlation matrix are 0, since the matrix has not yet been constructed. Similarly, an N by N two-dimensional disease pair count matrix is constructed based on the association rules and their corresponding confidences, with every initial count value set to 1.
The support is the proportion of the whole data set that contains a given frequent item set, expressed as the probability of that item set occurring in the data set. The confidence is the proportion of records containing one item set that also contain another item set, expressed as a conditional probability. For example, assuming that there are 100 records in a data set and 10 of them contain { 'diabetes' }, the support of { 'diabetes' } is 10/100 = 0.1. The confidence is defined for a particular association rule, for example { 'diabetes' } -> { 'retinopathy' }, and is calculated as the support of { 'diabetes', 'retinopathy' } divided by the support of { 'diabetes' }. Assuming that the support of { 'diabetes', 'retinopathy' } is 0.05 and the support of { 'diabetes' } is 0.1, the confidence of { 'diabetes' } -> { 'retinopathy' } is 0.05/0.1 = 0.5, indicating that 50% of all diabetic patients also have retinopathy, an eye disease.
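The support and confidence arithmetic of this example can be reproduced with a short Python sketch. The helper names `support` and `confidence` and the synthetic 100-record data set are assumptions made for illustration; only the resulting figures (0.1, 0.05 and 0.5) come from the text.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the item set."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Synthetic data set: 100 records, 10 with diabetes, 5 of those also with
# retinopathy. The remaining 90 records are hypothetical filler.
transactions = (
    [frozenset({"diabetes", "retinopathy"})] * 5
    + [frozenset({"diabetes"})] * 5
    + [frozenset({"hypertension"})] * 90
)

s_diabetes = support({"diabetes"}, transactions)
s_pair = support({"diabetes", "retinopathy"}, transactions)
c_rule = confidence({"diabetes"}, {"retinopathy"}, transactions)
```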
In the disease correlation matrix training of the present invention, the association rules used are those whose confidence is greater than 0.01 and whose antecedent and consequent each contain exactly one item. The antecedent is the proposition stating the condition of the association rule, and the consequent is the proposition that holds when the condition is satisfied. For example, the antecedent and consequent of the association rule { 'diabetes' } -> { 'retinopathy' } each contain one item, whereas the antecedent of the association rule { 'diabetes', 'arthritis' } -> { 'retinopathy' } contains two items, namely the combination of diabetes and arthritis, and its consequent contains one item.
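The rule selection described here can be illustrated with a small Python sketch. The `Rule` class and the function `select_pairwise_rules` are hypothetical names, and the confidences 0.4 and 0.005 are invented test values. The text says "greater than 0.01" in this passage and "not less than" the confidence requirement elsewhere; the sketch uses `>=`, which is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def select_pairwise_rules(rules: List[Rule], min_confidence: float = 0.01) -> List[Rule]:
    """Keep rules with one antecedent item, one consequent item and enough confidence."""
    return [
        r for r in rules
        if len(r.antecedent) == 1
        and len(r.consequent) == 1
        and r.confidence >= min_confidence  # assumed inclusive threshold
    ]


rules = [
    Rule(frozenset({"diabetes"}), frozenset({"retinopathy"}), 0.5),
    # Two-item antecedent: excluded regardless of confidence (0.4 is made up).
    Rule(frozenset({"diabetes", "arthritis"}), frozenset({"retinopathy"}), 0.4),
    # Below the 0.01 threshold: excluded (0.005 is made up).
    Rule(frozenset({"hypertension"}), frozenset({"heart disease"}), 0.005),
]

selected = select_pairwise_rules(rules)
```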
After all the association rules whose confidence is greater than 0.01 and whose antecedent and consequent each contain exactly one item have been obtained, the disease correlation matrix is constructed from them. For example, given the association rule { 'diabetes' } -> { 'retinopathy' } with a confidence of 0.5, the correlation between diabetes and retinopathy can be established in the disease correlation matrix: the entry at the row and column where diabetes and retinopathy intersect in the N by N disease correlation matrix is updated from its original value of 0 to 0.5. If the association rule { 'retinopathy' } -> { 'diabetes' } has a confidence of 0.3, the entry where retinopathy and diabetes intersect is updated from 0 to 0.3. If there is an association rule { 'hypertension' } -> { 'heart disease' } with a confidence of 0.3, the correlation between hypertension and heart disease can be established in the same way, and the entry where hypertension and heart disease intersect is updated from 0 to 0.3. By analogy, all association rules whose antecedent and consequent each contain one item are traversed, finally yielding the complete disease correlation matrix, in which every established correlation has a confidence greater than 0.01.
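A minimal Python sketch of this construction, using the three rules of the example, might look as follows. The function name `build_correlation_matrix`, the four-disease index and the choice of antecedent as row and consequent as column are assumptions; the text only says that the entry where the two diseases intersect is updated.

```python
from typing import Dict, List, Tuple


def build_correlation_matrix(
    rules: List[Tuple[str, str, float]], index: Dict[str, int]
) -> List[List[float]]:
    """Start from an all-zero N x N matrix and write each rule's confidence."""
    n = len(index)
    matrix = [[0.0] * n for _ in range(n)]
    for antecedent, consequent, conf in rules:
        # Assumed orientation: row = antecedent, column = consequent.
        matrix[index[antecedent]][index[consequent]] = conf
    return matrix


# Hypothetical 4-disease index (the text uses N = 3000 ICD-10 coded diseases).
index = {"diabetes": 0, "retinopathy": 1, "hypertension": 2, "heart disease": 3}
rules = [
    ("diabetes", "retinopathy", 0.5),
    ("retinopathy", "diabetes", 0.3),
    ("hypertension", "heart disease", 0.3),
]

corr = build_correlation_matrix(rules, index)
```

Because the two directions of a disease pair are stored in different cells, the resulting matrix is not symmetric in general.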
Similarly, as all association rules whose antecedent and consequent each contain one item are traversed, the N by N disease pair count matrix is updated synchronously to form a disease pair count matrix that counts the number of occurrences of the association rules. For example, if { 'diabetes', 'retinopathy' } occurs twice, the entry at the corresponding disease row and column of the N by N count matrix is updated from its original value of 1 to 2; if the disease pair occurs again later, the 2 is updated to 3, and so on, until training is complete and the disease pair count matrix has been generated.
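The count matrix and the subsequent element-wise normalization can be sketched in Python as below. This is one reading of the text, not a confirmed implementation: the names `build_count_matrix` and `normalize` are hypothetical, the counting rule (initial value 1, first occurrence leaves 1, second occurrence gives 2) follows the example above, and the correlation values 0.8 and 0.3 are invented. The text does not say how the correlation matrix is updated when a pair occurs more than once.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    pair_occurrences: List[Tuple[str, str]], index: Dict[str, int]
) -> List[List[int]]:
    """All cells start at 1; a pair seen k times ends with the value k."""
    n = len(index)
    counts = [[1] * n for _ in range(n)]
    seen: Dict[Tuple[int, int], int] = {}
    for a, b in pair_occurrences:
        cell = (index[a], index[b])
        seen[cell] = seen.get(cell, 0) + 1
        counts[cell[0]][cell[1]] = seen[cell]  # first occurrence keeps the initial 1
    return counts


def normalize(corr: List[List[float]], counts: List[List[int]]) -> List[List[float]]:
    """Element-wise division of the correlation matrix by the count matrix."""
    return [
        [c / k for c, k in zip(corr_row, count_row)]
        for corr_row, count_row in zip(corr, counts)
    ]


index = {"diabetes": 0, "retinopathy": 1}
occurrences = [("diabetes", "retinopathy"), ("diabetes", "retinopathy")]
counts = build_count_matrix(occurrences, index)

corr = [[0.0, 0.8], [0.3, 0.0]]  # hypothetical correlation values
normalized = normalize(corr, counts)
```

Starting every count at 1 means the division never encounters a zero denominator, which may be why the text initializes the count matrix that way.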
As shown in fig. 3, the disease correlation matrix establishing unit 21 may include an acquiring unit 211, a data processing unit 212, an association rule establishing unit 213, and an association rule extracting unit 214. The acquiring unit 211 is configured to acquire historical diagnosis data of a plurality of different patients. The data processing unit 212 is configured to process the historical diagnostic data of each patient into a disease data set. The association rule establishing unit 213 is configured to identify, based on the FP-Growth algorithm, all frequent disease item sets that satisfy the minimum support requirement in the plurality of disease data sets, and to output the association rules derived from the identified frequent disease item sets that satisfy the confidence requirement. The association rule extracting unit 214 extracts, from all the association rules of the association rule establishing unit 213, those that are not less than the confidence requirement and selects those whose antecedent and consequent each contain exactly one item, so as to train the disease correlation matrix.
The historical diagnosis data collected by one or more hospitals or companies includes diagnosis information of different patients from different years and different hospitals, and may include irrelevant information; the acquiring unit 211 is used for obtaining the historical diagnosis result field.
In an embodiment of the present invention, the data processing unit 212 is further configured to: preprocessing the historical diagnostic data to obtain a diagnostic result field in the historical diagnostic data; and mapping the diagnosis result field based on the ICD-10 code to obtain a multidimensional binary vector, wherein the multidimensional binary vector is a mapped data set.
The information in the historical diagnosis data is grouped and combined, irrelevant information is deleted, and only the diagnosis result fields of patients diagnosed in different historical periods are retained. The diagnosis result fields are then mapped according to the ICD-10 codes, with each disease in the diagnosis result fields corresponding one-to-one to an ICD-10 code, so that the disease data set formed after mapping is a multidimensional binary vector. For example, the present invention selects more than 3000 common diseases and their one-to-one corresponding ICD-10 diagnosis codes, with the code J31.200 representing chronic pharyngitis. A value of 1 indicates that the disease has been diagnosed in the patient's medical history, regardless of how many times it was actually diagnosed, and a value of 0 indicates that the disease has never been diagnosed in the patient's medical history, so that the historical diagnosis result field of each patient is expressed as a 3000-dimensional binary vector.
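The mapping from a patient's diagnosis codes to a binary vector can be sketched as follows. The function name `encode_history` and the three-entry code index are assumptions for illustration; only J31.200 (chronic pharyngitis) is taken from the text, and the other two keys are made-up placeholders rather than real ICD-10 codes.

```python
from typing import Dict, Iterable, List


def encode_history(diagnosis_codes: Iterable[str], code_index: Dict[str, int]) -> List[int]:
    """Return a binary vector: 1 if the code ever appears in the history, else 0."""
    vector = [0] * len(code_index)
    for code in diagnosis_codes:
        position = code_index.get(code)
        if position is not None:  # codes outside the selected disease list are ignored
            vector[position] = 1  # repeated diagnoses still yield a single 1
    return vector


# Hypothetical 3-entry index; the text describes about 3000 entries.
code_index = {"J31.200": 0, "PLACEHOLDER_A": 1, "PLACEHOLDER_B": 2}

# Chronic pharyngitis diagnosed twice, plus one code that is not in the index.
vector = encode_history(["J31.200", "J31.200", "NOT_IN_INDEX"], code_index)
```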
As shown in fig. 3, the disease correlation matrix establishing unit 21 further includes a denoising unit 214. The denoising unit 214 is configured to perform noise reduction on noise information in the historical diagnosis data, so as to eliminate noise introduced in the processes executed by the data processing unit 212 and the acquiring unit 211; for example, various kinds of noise may be introduced by the raw information of the historical electronic medical records and by disease extraction and transformation.
In an embodiment of the present invention, the confidence threshold in the association rule extracting unit is set to 0.01.
As shown in fig. 4, the association rule establishing unit 213 may include a preprocessing unit 2131, an FP-tree unit 2132, an FP-tree subunit 2133, and a mining unit 2134. The preprocessing unit 2131 is configured to, based on the disease data sets of a plurality of patients, set the patient number as an identifier TID, set the disease data set corresponding to each number as one transaction T, set the set of all disease data sets as D, and set each disease diagnosis result in a disease data set as one item. The FP-tree unit 2132 is configured to scan the set D of all disease data sets a first time, calculate the support count of each diagnosis result field over the transactions T, retain a diagnosis result field as a frequent item when its support count is greater than or equal to the minimum support threshold, and then sort the frequent items in descending order of support count. The FP-tree subunit 2133 is configured to scan the transaction set D a second time, create a root node marked null, and, as each transaction T is read in, form a path from the root node through the diagnosis result field nodes of that transaction, so that each transaction is mapped to one path of the FP-tree and the FP-tree is formed once all transactions have been read in. The mining unit 2134 is configured to extract the corresponding item sets upward from the end node of each path of the FP-tree in turn, retain an item set as a frequent item set L when its support is greater than or equal to the minimum support threshold, and output the association rules derived from the identified frequent item sets L that satisfy the confidence requirement.
In an embodiment of the present invention, each path built by the FP-tree subunit consists of the root node null followed by the corresponding diagnosis result field nodes, and the diagnosis result field nodes along a path are ordered by descending support count of the frequent items.
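The two scans described above can be sketched as follows. This is a minimal illustration only: the class `FPNode`, the functions `build_fp_tree` and `paths`, the tie-breaking by item name, and the toy data are assumptions, and the header table and mining step of a full FP-Growth implementation are omitted.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item      # None marks the "null" root node
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support_count):
    # First scan: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support_count}

    root = FPNode(None)
    # Second scan: insert each transaction as a path starting at the root.
    for t in transactions:
        # Descending support count; ties broken by item name (an assumption).
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def paths(node, prefix=()):
    """Yield (path, count) for every node below the root."""
    for child in node.children.values():
        current = prefix + (child.item,)
        yield current, child.count
        yield from paths(child, current)


# Hypothetical toy data: one list of diagnosis codes per patient (transaction).
patients = [["I10", "E11"], ["I10", "E11", "N18"], ["I10"], ["E11", "N18"], ["J45"]]
root, frequent = build_fp_tree(patients, min_support_count=2)
tree_paths = dict(paths(root))
```

Because transactions that share a prefix of frequent items share tree nodes, the tree is usually much smaller than the transaction set, which is what makes the subsequent mining step efficient.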
In an embodiment of the invention, in the disease prediction module, the normalized disease correlation matrix is evaluated by using precision rate and recall rate.
To verify the effectiveness of the system, in an embodiment of the present invention the association rule based disease prediction system is evaluated using precision and recall. To simulate the real-world challenge of predicting unknown diseases, the performance of the disease prediction system is tested by the hold-out method. For each patient, half of the diagnosis results in the patient's historical diagnosis data are held out, and the diagnosis codes of the held-out results are set to 0. The remaining test data for the patient are then passed to the disease prediction system to obtain the predicted probability of each disease diagnosis, and precision and recall are calculated by comparing the predicted probabilities with the held-out diagnoses.
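The hold-out evaluation can be sketched as below. The text states only that half of each patient's diagnoses are withheld, so withholding the second half of the list is an assumption, as are the function names and the hypothetical ranked prediction list.

```python
def hold_out_split(diagnoses):
    """Split one patient's diagnoses into a visible half and a held-out half.
    Withholding the second half is an assumption made for illustration."""
    cut = len(diagnoses) // 2
    return diagnoses[:cut], diagnoses[cut:]


def precision_recall_at_k(ranked, held_out, k):
    """Compare the Top-K predicted diseases with the held-out diagnoses."""
    hits = sum(1 for code in ranked[:k] if code in held_out)
    precision = hits / k
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall


# Hypothetical patient history and hypothetical ranked system output.
visible, held_out = hold_out_split(["I10", "E11", "N18", "J45"])
ranked_predictions = ["N18", "K21", "J45", "M54"]

p_at_2, r_at_2 = precision_recall_at_k(ranked_predictions, held_out, k=2)
p_at_4, r_at_4 = precision_recall_at_k(ranked_predictions, held_out, k=4)
```

For a fixed patient, precision and recall at the same K share the same number of hits, so precision tends to fall and recall tends to rise as K grows.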
In the verification, the predictions of the association rule based collaborative filtering disease prediction system (labelled "the system provided by the invention" in the tables) are compared with those of the CARE system. The Top-K precision and recall of the predicted disease diagnoses are shown in Table 1 and Table 2, respectively.
TABLE 1 Top-K precision of the two methods
Top-K | CARE system | System provided by the invention
1 | 48.3% | 53.8%
5 | 35.3% | 38.3%
10 | 30.2% | 33.9%
50 | 13.7% | 15.1%
100 | 9.1% | 10.7%
TABLE 2 Top-K recall of the two methods
Top-K | CARE system | System provided by the invention
1 | 30.1% | 13.6%
5 | 43.9% | 17.8%
10 | 51.4% | 27.5%
50 | 57.7% | 40.3%
100 | 69.5% | 49.1%
As can be seen from Tables 1 and 2, on this data set the association rule based collaborative filtering algorithm has certain advantages in disease prediction compared with the original CARE system. Moreover, the association rule based prediction method is more efficient at prediction time, because the system provided by the invention obtains the required disease prediction probabilities and their ranking with only a single matrix multiplication followed by a sorting operation.
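The prediction step described above, one matrix multiplication followed by sorting, can be sketched in plain Python as follows. The function name `predict`, the 3 x 3 matrix values, and the choice to rank only diseases the patient does not already have are assumptions made for illustration; the embodiment uses a much larger matrix.

```python
def predict(history, matrix, codes, top_k):
    """history: binary list (1 = disease already diagnosed).
    matrix[i][j]: normalized association strength from disease i to disease j.
    Returns the top_k (code, score) pairs, highest score first."""
    n = len(codes)
    # Vector-matrix product: the score of disease j sums the contributions
    # of every disease i already present in the patient's history.
    scores = [sum(history[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    # Rank only diseases not yet diagnosed (an assumption for this sketch).
    candidates = [j for j in range(n) if history[j] == 0]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return [(codes[j], scores[j]) for j in candidates[:top_k]]


# Hypothetical codes and hypothetical normalized disease correlation matrix.
codes = ["I10", "E11", "N18"]
matrix = [
    [0.0, 0.6, 0.3],
    [0.5, 0.0, 0.4],
    [0.2, 0.7, 0.0],
]
result = predict([1, 0, 0], matrix, codes, top_k=2)
```

No model needs to be retrained or iterated at prediction time; the cost is one pass over the matrix plus a sort of the candidate scores.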
In addition, system data are often mixed with noise; for example, noise may be introduced into the raw electronic medical record information or during disease extraction and conversion. As with other commonly used machine learning prediction systems, the robustness of the system, i.e., its noise immunity, determines its stability and reliability in actual operation. Therefore, in a further system test, noise information is artificially introduced into the original test information: 1 to 5 noise diagnoses are added for each patient to observe the change in system recall. The resulting Top-10 recall is shown in Table 3.
TABLE 3 Effect of introduced noise on Top-10 recall
Number of noise diagnoses added | Top-10 recall
0 | 51.4%
1 | 50.5%
2 | 49.1%
3 | 48.2%
4 | 47.5%
5 | 46.3%
As can be seen from Table 3, the Top-10 recall decreases only gradually, from 51.4% with no added noise to 46.3% with five added noise diagnoses, which indicates that the disease prediction system provided by the invention is robust to noisy input.
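One way such a noise test might be set up is sketched below. The text does not say how the noise diagnoses are chosen, so drawing them uniformly at random from codes the patient does not have is an assumption, as are the function name `add_noise` and the toy code list.

```python
import random


def add_noise(diagnoses, all_codes, n_noise, rng):
    """Return a copy of the patient's diagnoses with n_noise extra codes
    that the patient does not actually have (uniform random choice is an
    assumption; the source does not specify the sampling scheme)."""
    absent = sorted(set(all_codes) - set(diagnoses))
    return list(diagnoses) + rng.sample(absent, n_noise)


# Hypothetical code vocabulary and patient history.
all_codes = ["I10", "E11", "N18", "J45", "K21", "M54", "F32"]
diagnoses = ["I10", "E11"]

rng = random.Random(0)  # fixed seed so the experiment is repeatable
noisy_inputs = {n: add_noise(diagnoses, all_codes, n, rng) for n in range(0, 6)}
```

Each noisy input would then be passed to the prediction system and the Top-10 recall recomputed against the same held-out diagnoses, giving one row of the table per noise level.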
To achieve the above object, the present invention also provides an electronic device comprising a memory for storing information including program instructions and a processor for controlling the execution of the program instructions, which are loaded and executed by the processor to implement the association rule based collaborative filtering disease prediction system described above.
Next, an electronic apparatus according to an embodiment of the present invention is described with reference to the drawings. As shown in fig. 5, electronic device 40 includes one or more processors 41 and memory 42.
The processor 41 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 40 to perform desired functions. In other words, the processor 41 includes one or more physical devices configured to execute instructions. For example, the processor 41 may be configured to execute instructions that are part of: one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, implement a technical effect, or otherwise arrive at a desired result.
The processor 41 may include one or more processors configured to execute software instructions. Additionally or alternatively, the processor 41 may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the processor 41 may be single-core or multi-core, and the instructions executed thereon may be configured for serial, parallel, and/or distributed processing. The various components of the processor 41 may optionally be distributed over two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the processor 41 may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.
The memory 42 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium and executed by the processor 41 to implement some or all of the steps of the above-described exemplary methods of the present invention, and/or other desired functions.
In other words, the memory 42 comprises one or more physical devices configured to hold machine-readable instructions executable by the processor 41 to implement the methods and processes described herein. In implementing these methods and processes, the state of the memory 42 may be transformed (e.g., to hold different data). The memory 42 may comprise a removable and/or built-in device. The memory 42 may include optical memory (e.g., CD, DVD, HD-DVD, blu-ray disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. The memory 42 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
It is understood that the memory 42 comprises one or more physical devices. However, aspects of the instructions described herein may alternatively be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a limited period of time. Aspects of the processor 41 and the memory 42 may be integrated together into one or more hardware logic components. These hardware logic components may include, for example, Field Programmable Gate Arrays (FPGAs), program and application specific integrated circuits (PASIC/ASIC), program and application specific standard products (PSSP/ASSP), system on a chip (SOC), and Complex Programmable Logic Devices (CPLDs).
In one example, as shown in FIG. 5, the electronic device 40 may also include an input device and an output device, which are interconnected via a bus system and/or another form of connection mechanism (not shown). For example, the input device may be a camera module or the like for capturing image data or video data. As another example, the input device may include or interface with one or more user input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input device may include or interface with selected Natural User Input (NUI) components. Such components may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on-board or off-board. Example NUI components may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; electric field sensing components for assessing brain activity and/or body movement; and/or any other suitable sensor.
The output device can output various information including classification results and the like to the outside. The output devices may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto.
Of course, the electronic device 40 may further include a communication apparatus, which may be configured to communicatively couple the electronic device 40 with one or more other computer devices. The communication apparatus may comprise wired and/or wireless communication devices compatible with one or more different communication protocols. As a non-limiting example, the communication apparatus may be configured for communication via a wireless telephone network or a wired or wireless local or wide area network. In some embodiments, the communication apparatus may allow the electronic device 40 to send and/or receive messages to and/or from other devices via a network such as the internet.
It will be appreciated that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Also, the order of the above-described processes may be changed.
Of course, for simplicity, only some of the components of the electronic device 40 relevant to the present invention are shown, and components such as buses, input/output interfaces, and the like are omitted. In addition, electronic device 40 may include any other suitable components, depending on the particular application.
According to another aspect of the present invention, the present invention further provides an electronic device such as a smartphone or a smart robot, wherein the electronic device is configured with the above-mentioned association rule based disease prediction system for performing disease prediction for a user. Illustratively, the electronic device comprises a smartphone and the association rule based disease prediction system 1, wherein the association rule based disease prediction system 1 is deployed on the smartphone to perform disease prediction based on historical diagnosis results input via the smartphone. It is to be understood that the smartphone may be, but is not limited to being, implemented as a camera-enabled smartphone.
In addition to the above-described methods and apparatus, embodiments of the present invention may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the present invention described in the "exemplary methods" section above of this specification.
The computer program product may include program code for carrying out operations of embodiments of the present invention, written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, an embodiment of the present invention may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps of the above-described method of the present specification.
The computer readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and must not be considered as necessarily possessed by every embodiment of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
The block diagrams of devices, apparatuses, and systems involved in the present invention are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" as used herein means, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the apparatus, devices and methods of the present invention, the components or steps may be broken down and/or re-combined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting of the invention. The objects of the invention have been fully and effectively accomplished. The functional and structural principles of the present invention have been shown and described in the examples, and any variations or modifications of the embodiments of the present invention may be made without departing from the principles.

Claims (10)

1. An association rule based collaborative filtering disease prediction system, comprising:
the acquisition module is used for acquiring historical case data related to a patient;
the disease prediction module is used for predicting by utilizing the normalized disease correlation matrix and the historical case data to generate a disease prediction result; and
the sorting module is used for sorting the disease prediction results according to probability.
2. The association rule based collaborative filtering disease prediction system of claim 1, wherein the disease prediction module comprises:
the disease correlation matrix establishing unit is used for establishing a disease correlation matrix by utilizing the association rules which meet the confidence requirement and whose antecedent and consequent each contain exactly one item;
the disease pair counting matrix establishing unit is used for establishing a disease pair counting matrix by utilizing the association rules which meet the confidence requirement and whose antecedent and consequent each contain exactly one item;
the normalization unit is used for performing normalization processing by element-wise division between the disease correlation matrix and the disease pair counting matrix to obtain a normalized disease correlation matrix;
and the calculating unit is used for multiplying the historical case data by the normalized disease correlation matrix to obtain a disease prediction result.
3. The association rule based collaborative filtering disease prediction system according to claim 2, wherein the disease correlation matrix establishing unit includes:
an acquisition unit for acquiring historical diagnostic data of a plurality of different patients;
a data processing unit for processing the historical diagnostic data for each patient into a disease data set;
the association rule establishing unit is used for identifying, based on an FP-Growth algorithm, all disease frequent item sets meeting the minimum support requirement from a plurality of disease data sets, and outputting an association rule when the association rule derived from an identified disease frequent item set meets the confidence requirement;
and the association rule extraction unit is used for extracting, from all the association rules in the association rule establishing unit, the association rules whose confidence is not less than the confidence requirement, and selecting those whose antecedent and consequent each contain exactly one item to construct a disease correlation matrix.
4. The association rule based collaborative filtering disease prediction system according to claim 3, wherein the data processing unit is further configured to:
preprocessing the historical diagnostic data to obtain a diagnostic result field in the historical diagnostic data; and
mapping the diagnosis result field based on the ICD-10 code to obtain a multidimensional binary vector, wherein the multidimensional binary vector is the mapped disease data set.
5. The association rule based collaborative filtering disease prediction system according to claim 4, wherein the association rule establishing unit includes:
the preprocessing unit is used for, based on the disease data sets of a plurality of patients, setting each patient number as an identifier TID, setting the disease data set corresponding to that number as a transaction T, setting the collection of all disease data sets as a transaction set D, and setting each disease diagnosis result of a disease data set as an item;
the FP-tree unit is used for scanning the transaction set D for the first time, calculating the support count of each diagnosis result field across the transactions T, retaining a diagnosis result field as a frequent item when its support count is greater than or equal to the minimum support threshold, and then arranging the frequent items in descending order of support count;
the FP-tree subunit is used for scanning the transaction set D for the second time, wherein a root node marked null is created and, as each transaction T is read in, a path from the root node through the diagnosis result field nodes of that transaction is formed, so that each transaction is mapped to one path of the FP-tree, and the FP-tree is formed after all the transactions are read in;
and the mining unit is used for sequentially extracting the corresponding item sets upwards from the end node of each path of the FP-tree, retaining an item set as a frequent item set L when its support is greater than or equal to the minimum support threshold, and outputting an association rule when the association rule derived from an identified frequent item set L meets the confidence requirement.
6. The association rule based collaborative filtering disease prediction system according to claim 3, wherein the disease correlation matrix building unit further comprises a noise reduction unit for performing noise reduction processing on noise information in historical diagnostic data.
7. The association rule based collaborative filtering disease prediction system of claim 3,
the confidence in the association rule extraction unit is set to 0.01.
8. The association rule based collaborative filtering disease prediction system of claim 3,
the disease correlation matrix of the disease correlation matrix building unit is a two-dimensional matrix of N x N, wherein N is 3000;
the disease pair count matrix of the disease pair count matrix establishing unit is a two-dimensional matrix of N × N, where N is 3000.
9. The association rule based collaborative filtering disease prediction system of claim 1,
in the disease prediction module, the normalized disease correlation matrix is evaluated by using precision and recall.
10. An electronic device comprising a memory for storing information comprising program instructions and a processor for controlling the execution of the program instructions, wherein the program instructions when loaded and executed by the processor implement the association rule based collaborative filtering disease prediction system of any of claims 1 to 9.
CN202010408220.3A 2020-05-14 2020-05-14 Collaborative filtering disease prediction system based on association rule and electronic equipment thereof Pending CN111785372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408220.3A CN111785372A (en) 2020-05-14 2020-05-14 Collaborative filtering disease prediction system based on association rule and electronic equipment thereof

Publications (1)

Publication Number Publication Date
CN111785372A true CN111785372A (en) 2020-10-16

Family

ID=72753632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010408220.3A Pending CN111785372A (en) 2020-05-14 2020-05-14 Collaborative filtering disease prediction system based on association rule and electronic equipment thereof

Country Status (1)

Country Link
CN (1) CN111785372A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0917078A1 (en) * 1996-09-30 1999-05-19 Smithkline Beecham Corporation Disease management method and system
US20040015337A1 (en) * 2002-01-04 2004-01-22 Thomas Austin W. Systems and methods for predicting disease behavior
CN105335804A (en) * 2014-08-06 2016-02-17 北京计算机技术及应用研究所 Community health service system
WO2017205972A1 (en) * 2016-05-31 2017-12-07 Genomedx Biosciences, Inc. Systems methods and compositions for predicting metastasis in bladder cancer
CN106709248A (en) * 2016-12-16 2017-05-24 浙江大学 Disease complication excavating method based on FP-Growth algorithm
CN108334548A (en) * 2017-12-26 2018-07-27 爱品克科技(武汉)股份有限公司 A kind of data mining technology based on correlation rule
CN108597614A (en) * 2018-04-12 2018-09-28 上海熙业信息科技有限公司 A kind of auxiliary diagnosis decision-making technique based on Chinese electronic health record
CN108573758A (en) * 2018-04-26 2018-09-25 贵州大学 A kind of intelligent medical big data service system and application process
CN109584086A (en) * 2018-10-30 2019-04-05 平安医疗健康管理股份有限公司 Be hospitalized rational method and Related product are predicted based on prediction model
CN110164520A (en) * 2019-05-24 2019-08-23 南京邮电大学 The associated chronic diseases management device of space-time big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓丰义: ""基于模式矩阵的FP-growth改进算法"", 《厦门大学学报》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489807A (en) * 2020-11-30 2021-03-12 中国人民解放军南部战区总医院 Accelerated rehabilitation data processing method, device and medium based on historical data
CN113643815A (en) * 2021-08-31 2021-11-12 平安医疗健康管理股份有限公司 Disease complication prediction method and device, computer equipment and storage medium
CN114504298A (en) * 2022-01-21 2022-05-17 南京航空航天大学 Physiological feature distinguishing method and system based on multi-source health perception data fusion
CN114504298B (en) * 2022-01-21 2024-02-13 南京航空航天大学 Physiological characteristic discriminating method and system based on multisource health perception data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201016