CN106709248A - Disease complication excavating method based on FP-Growth algorithm - Google Patents
Disease complication excavating method based on FP-Growth algorithm
- Publication number
- CN106709248A (application number CN201611168316.7A)
- Authority
- CN
- China
- Prior art keywords
- disease
- physical examination
- frequent episode
- list
- support
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Pathology (AREA)
- Fuzzy Systems (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a disease complication mining method based on the FP-Growth algorithm. Drawing on many years of physical examination data from a large hospital, the method extracts patients' diagnosis data and obtains frequent itemsets with the FP-Growth algorithm; rules whose confidence is not less than a threshold are then constructed from the frequent itemsets, and these rules constitute the disease complications. When giving diagnostic advice, a doctor can draw not only on the patient's physical examination data but also on the mined complications to offer scientific, reliable suggestions and preventive measures. The disease associations obtained through association rule mining are comprehensive, real and reliable; the FP-Growth algorithm is faster and more efficient than commonly used association rule algorithms; and in addition to the complications themselves, the method provides a corresponding likelihood and ranks the complications by it, so that the diagnosis results and suggestions given to patients are more precise and patient satisfaction with the physical examination improves.
Description
Technical field
The invention belongs to the technical field of medical data mining, and in particular relates to a disease complication mining method based on the FP-Growth algorithm.
Background
Data mining is a frontier discipline that has emerged in recent years from the convergence of artificial intelligence and database technology. It aims to discover knowledge or rules hidden in data about the nature of things and their development trends, and to support expert decision-making. With the large-scale application of information technology in the medical industry, a large amount of medical data has been accumulated, giving data mining both good application prospects and data support in the medical field. Mining disease complications from a physical examination diagnosis database can enrich expert knowledge and medical theory. Complications are often highly complex and uncertain, so using large amounts of data to study the concurrency relationships between diseases and to provide early warning of complications is of great significance for treatment.
Research on complications is in essence research on the co-occurrence relationships between diseases. Some of these relationships are known and some are unknown; some lie within a single clinical department and some span departments. Because of the sheer volume of data, these implicit relationships are difficult to find manually, and data mining is the most convenient way to solve this problem. Current complication research usually targets only one disease or one class of diseases, for example the common research on complications of diabetes and on complications of certain cancers.
Summary of the invention
In view of the above, the invention provides a disease complication mining method based on the FP-Growth algorithm that mines complications for all common diseases. It is intended to be used during diagnosis to give patients more comprehensive physical examination advice and to remind them to take early preventive action against certain diseases.
A disease complication mining method based on the FP-Growth algorithm comprises the following steps:
(1) preprocessing and analyzing all physical examination reports in a hospital physical examination database to obtain the list of diseases diagnosed in each report;
(2) based on the disease lists of all physical examination reports, identifying by counting and outputting a list of disease frequent items, where each frequent item in the list is a single disease or a combination of two diseases and any frequent item i satisfies the following condition:
$$\frac{\mathrm{support}(i)}{N} \ge \rho$$
where N is the total number of physical examination reports, support(i) is the support of frequent item i, and ρ is the set ratio threshold;
(3) based on the list of disease frequent items, finding association rules by calculation and mining the complications corresponding to each disease that belongs to a frequent item.
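The support condition of step (2) can be illustrated with a minimal sketch. It counts single diseases and disease pairs directly rather than with FP-Growth, purely to make the threshold concrete; the function name `frequent_items` and the sample disease lists are hypothetical and not taken from the patent.

```python
from collections import Counter
from itertools import combinations


def frequent_items(disease_lists, rho):
    """Return {itemset: support} for single diseases and disease pairs
    whose support / N is at least rho (direct counting, not FP-Growth)."""
    n = len(disease_lists)
    counts = Counter()
    for diseases in disease_lists:
        unique = sorted(set(diseases))          # a report counts each disease once
        counts.update((d,) for d in unique)     # length-1 items
        counts.update(combinations(unique, 2))  # length-2 items in canonical order
    return {items: c for items, c in counts.items() if c / n >= rho}


# Hypothetical disease lists, one per physical examination report.
reports = [
    ["hypertension", "fatty liver"],
    ["hypertension", "fatty liver", "gallstones"],
    ["hypertension"],
    ["gallstones"],
]
frequent = frequent_items(reports, rho=0.5)
```

With N = 4 and ρ = 0.5, an item must appear in at least two reports, so the pair ("fatty liver", "hypertension") is kept while the pairs involving gallstones are dropped.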
In step (1), preprocessing and analyzing the physical examination reports specifically comprises deleting missing values, handling outliers, compiling statistics on the types and distribution of disease diagnoses, and generating a chart of prevalent diseases, thereby obtaining the list of diseases diagnosed in each report.
In step (2), all frequent items are identified by counting with the FP-Growth algorithm, and the list of disease frequent items is output.
The support support(i) is the number of physical examination reports whose disease list contains frequent item i.
Step (3) is implemented as follows:
3.1 for any disease a that belongs to a frequent item, collect all associated diseases with which it forms a two-disease frequent item;
3.2 for any associated disease b of disease a, calculate the confidence confidence(a/b) of the pair by the following formula:
$$\mathrm{confidence}(a/b) = \frac{\mathrm{support}(a/b)}{\mathrm{support}(a)}$$
where support(a) is the support of the frequent item consisting only of disease a, i.e. the number of physical examination reports whose disease list contains that frequent item, and support(a/b) is the support of the frequent item formed by combining disease a with associated disease b, i.e. the number of physical examination reports whose disease list contains that frequent item;
3.3 judge whether confidence(a/b) is greater than a preset confidence threshold; if so, associated disease b is judged to be a complication of disease a;
3.4 traverse all associated diseases of disease a according to steps 3.2 to 3.3 to mine all complications of disease a, then display these complications in descending order of confidence(a/b).
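Steps 3.1 to 3.4 can be sketched roughly as follows, assuming the frequent items are held in a dictionary keyed by tuples of disease names. The function name `complications`, the dictionary layout and the support counts are illustrative assumptions, not part of the patent.

```python
def complications(frequent, disease, min_confidence):
    """Return [(associated disease, confidence)] for `disease`, sorted by
    descending confidence, keeping pairs above `min_confidence`."""
    base = frequent[(disease,)]                 # support(a)
    ranked = []
    for items, support in frequent.items():     # step 3.1: pairs containing a
        if len(items) == 2 and disease in items:
            other = items[0] if items[1] == disease else items[1]
            confidence = support / base         # step 3.2: support(a/b) / support(a)
            if confidence > min_confidence:     # step 3.3: threshold test
                ranked.append((other, confidence))
    # step 3.4: show complications in descending order of confidence
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical supports: {itemset: number of reports containing it}.
frequent = {
    ("hypertension",): 200,
    ("fatty liver",): 120,
    ("gallstones",): 40,
    ("fatty liver", "hypertension"): 90,
    ("gallstones", "hypertension"): 30,
}
result = complications(frequent, "hypertension", min_confidence=0.1)
```

Because the denominator is the support of disease a alone, the same pair can pass the threshold for one disease and fail it for the other.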
The disease complication mining method of the invention is based on many years of physical examination data from a large hospital. It extracts patients' diagnosis data, obtains frequent itemsets with the FP-Growth algorithm, and constructs from them the rules whose confidence is not less than a threshold, i.e. the disease complications. When giving diagnostic advice, a doctor can draw not only on the patient's physical examination data but also on the disease complications to offer scientific, reliable suggestions and preventive measures. The invention therefore has the following beneficial technical effects:
(1) The data come from many years of physical examination data at a large hospital, with more than 500,000 diagnosis records, so the disease associations obtained through association rule mining are comprehensive, real and reliable.
(2) Hospital physical examination data grow daily; an update interval can be set so that the disease complication data stay relatively up to date.
(3) The invention uses the FP-Growth algorithm, which is faster and more efficient than the commonly used Apriori association rule algorithm.
(4) In addition to the complications of a disease, the invention gives the corresponding likelihood and ranks the complications by it, making the diagnosis results and advice given to patients more accurate and improving patient satisfaction with the physical examination.
Brief description of the drawings
Fig. 1 is a flow chart of the disease complication mining method of the invention.
Fig. 2 is a flow chart of the data preprocessing and analysis part of the invention.
Fig. 3 is a schematic diagram of common disease diagnoses.
Fig. 4 is a flow chart of frequent item identification in the invention.
Fig. 5 is a display of the complications obtained from disease diagnoses.
Detailed description of the embodiments
To describe the invention more specifically, the technical solution is described in detail below with reference to the accompanying drawings and a specific embodiment.
This embodiment is based on ten years of physical examination data from the Second Affiliated Hospital of Zhejiang University School of Medicine. Patients' diagnosis data are extracted, frequent itemsets are obtained with the FP-Growth algorithm, and rules whose confidence is not less than a threshold, i.e. the disease complications, are constructed from them. When giving diagnostic advice, a doctor can draw not only on the patient's physical examination data but also on the disease complications to offer scientific, reliable suggestions and preventive measures. The whole mining method consists of three main parts: data preprocessing and analysis, frequent itemset identification, and association rule discovery.
The data are physical examination diagnosis data, which are incomplete, redundant and variable in format. The data preprocessing part therefore mainly deletes missing values, handles outliers, compiles statistics on the types and distribution of disease diagnoses, and generates a chart of prevalent diseases.
The frequent itemset identification part uses the FP-Growth algorithm to identify all frequent itemsets of disease diagnoses, requiring the support ratio of each frequent itemset to be not less than the set minimum. This part is the key to finding disease complications and is also the most computationally expensive.
The association rule discovery part constructs, from the frequent itemsets, the rules whose confidence is not less than the minimum set by the user, and displays them intuitively with a data visualization tool.
As shown in Fig. 1, this embodiment first preprocesses and extracts the data in the database to generate a diagnosis data set, then uses the FP-Growth algorithm to generate frequent itemsets whose support exceeds the set minimum support, and finally mines association rules from the frequent itemsets to find the rules whose confidence is higher than the set minimum confidence; these rules are the disease complication data.
Fig. 2 shows the flow of the data preprocessing and analysis module. The data are stored in a physical examination diagnosis information table on an Oracle server; the fields to be retrieved are the physical examination code and the diagnosis information. Each physical examination corresponds to one physical examination code and may have multiple diagnosis entries. Meaningless characters in the diagnosis information, such as ":+-" and "", are filtered out first, and missing and redundant values in the diagnosis information are then deleted. After the data are extracted from the database, the distribution of disease diagnoses is counted and the data are reorganized into the format required for frequent itemset mining. There are about 120,000 types of disease diagnosis; common diagnoses are shown in Fig. 3, where a larger font indicates that the diagnosis occurs more often in the physical examinations.
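The cleaning flow described above could look roughly like the following sketch. It assumes the rows have already been fetched from the database as (physical examination code, diagnosis text) pairs; the function name `build_disease_lists`, the exact set of noise characters and the sample rows are assumptions made for illustration only.

```python
# Characters treated as meaningless; the set ":+-" follows the example in
# the text, and the real set used in the embodiment may be larger.
NOISE_TABLE = str.maketrans("", "", ":+-")


def build_disease_lists(rows):
    """Group cleaned diagnoses by examination code.

    rows: iterable of (exam_code, diagnosis_text) pairs.
    Returns {exam_code: [diagnosis, ...]} with noise characters removed,
    missing values dropped and repeated diagnoses within one exam collapsed.
    """
    lists = {}
    for exam_code, text in rows:
        if not exam_code or text is None:       # missing value: drop the row
            continue
        cleaned = text.translate(NOISE_TABLE).strip()
        if not cleaned:                         # nothing left after filtering
            continue
        diagnoses = lists.setdefault(exam_code, [])
        if cleaned not in diagnoses:            # redundant value: keep one copy
            diagnoses.append(cleaned)
    return lists


# Hypothetical rows as they might come out of the diagnosis table.
rows = [
    ("E001", "hypertension"),
    ("E001", "+fatty liver:"),
    ("E001", "hypertension"),   # redundant entry
    ("E002", ":+-"),            # noise only
    ("E002", None),             # missing diagnosis
    ("E003", "gallstones"),
]
disease_lists = build_disease_lists(rows)
```

The values of the resulting dictionary are the per-report disease lists that the frequent item step takes as its transactions.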
Fig. 4 shows the frequent item generation process, in which the FP-Growth algorithm finds all frequent itemsets that satisfy the minimum support. FP-Growth uses a divide-and-conquer strategy: the transaction database that supplies the frequent itemsets is compressed into a frequent pattern tree (FP-tree) that still retains the itemset association information; the compressed database is then divided into a group of conditional databases, each associated with one frequent item, and each conditional database is mined separately. This embodiment only needs frequent items of length 1 and length 2, so the loop condition is modified slightly.
Algorithm: FP-Growth. Mines frequent patterns by pattern-fragment growth using an FP-tree.
Input: transaction database D; minimum support threshold min_sup.
Output: the complete set of frequent patterns.
Method:
(1) Construct the FP-tree as follows:
(a) Scan the transaction database D once. Collect the set F of frequent items and their supports. Sort F in descending order of support; the result is the frequent item list L.
(b) Create the root node of the FP-tree and label it "null". For each transaction Trans in D, do the following: select the frequent items in Trans and sort them according to the order in L. Let the sorted frequent item list be [p | P], where p is the first element and P is the list of remaining elements. Call insert_tree([p | P], T). This procedure runs as follows: if T has a child N such that N.item-name = p.item-name, increase the count of N by 1; otherwise create a new node N, set its count to 1, link it to its parent node T, and link it through the node-link structure to the nodes with the same item-name. If P is non-empty, call insert_tree(P, N) recursively.
(2) Mine the FP-tree by calling the procedure FP-Growth(FP-tree, null).
// The procedure is implemented as follows:
Procedure FP-Growth(Tree, α)
1) if Tree contains a single path P then
2)   for each combination (denoted β) of the nodes in path P
3)     generate the pattern β ∪ α with support = the minimum support of the nodes in β
4) else for each a_i in the header table of Tree {
5)     generate the pattern β = a_i ∪ α with support = a_i.support
6)     construct the conditional pattern base of β and then the conditional FP-tree Tree_β of β
7)     if Tree_β ≠ ∅ then
8)       call FP-Growth(Tree_β, β) }
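The procedure above can be approximated by the following compact sketch. It is not the patent's code: it omits the single-path shortcut of lines 1) to 3), takes an absolute count `min_count` (which would correspond to ρ·N) instead of a ratio, and caps pattern length with a hypothetical `max_len` parameter to mirror the embodiment's restriction to items of length 1 and 2.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a link to its parent, and a count."""

    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to its
    nodes in the tree, frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_count, suffix=(), max_len=2):
    """Yield (itemset, support) for frequent itemsets of at most max_len items."""
    header, frequent = build_tree(transactions, min_count)
    for item, item_support in frequent.items():
        pattern = (item,) + suffix
        yield tuple(sorted(pattern)), item_support
        if len(pattern) >= max_len:
            continue                      # modified loop condition: stop growing
        # Conditional pattern base: the prefix path of every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, pattern, max_len)


# Hypothetical disease lists, each counted once.
reports = [
    ["hypertension", "fatty liver"],
    ["hypertension", "fatty liver", "gallstones"],
    ["hypertension"],
    ["gallstones"],
]
patterns = dict(fp_growth([(r, 1) for r in reports], min_count=2))
```

Each conditional pattern base is fed back into the same function as a small weighted transaction list, which is one simple way to express the divide-and-conquer step; a production implementation would more likely reuse node links and handle single paths specially.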
After the frequent itemsets are obtained, association rules are derived according to the quantitative index confidence. The confidence of a rule P → H is calculated as follows:
Confidence(P → H) = support(P | H) / support(P)
where P | H denotes the set of all elements that appear in P or in H. Again because of the project requirements, the lengths of P and H are both set to 1. The disease complication data are the set of all rules that meet the set minimum thresholds; the different complications of the same disease are sorted in descending order of confidence. The disease complications finally obtained are shown in Fig. 5, where two diseases connected by a line are complications of each other.
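A small worked example may help show that the confidence of a rule depends on its direction. The counts below are hypothetical and are not taken from the patent's data; P and H each stand for a single disease.

```latex
\[
\mathrm{support}(P) = 200, \qquad
\mathrm{support}(H) = 120, \qquad
\mathrm{support}(P \mid H) = 90
\]
\[
\mathrm{Confidence}(P \to H) = \frac{90}{200} = 0.45, \qquad
\mathrm{Confidence}(H \to P) = \frac{90}{120} = 0.75
\]
```

Under an assumed minimum confidence of 0.5, only the rule H → P would be retained, even though both rules come from the same two-disease frequent item.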
The above description of the embodiment is provided so that those of ordinary skill in the art can understand and apply the invention. Those skilled in the art can obviously make various modifications to the above embodiment with ease and apply the general principles described here to other embodiments without creative effort. The invention is therefore not limited to the above embodiment; improvements and modifications made by those skilled in the art in accordance with the disclosure of the invention shall all fall within its scope of protection.
Claims (5)
1. A disease complication mining method based on the FP-Growth algorithm, comprising the following steps:
(1) preprocessing and analyzing all physical examination reports in a hospital physical examination database to obtain the list of diseases diagnosed in each report;
(2) based on the disease lists of all physical examination reports, identifying by counting and outputting a list of disease frequent items, where each frequent item in the list is a single disease or a combination of two diseases and any frequent item i satisfies the following condition:
$$\frac{\mathrm{support}(i)}{N} \ge \rho$$
where N is the total number of physical examination reports, support(i) is the support of frequent item i, and ρ is the set ratio threshold;
(3) based on the list of disease frequent items, finding association rules by calculation and mining the complications corresponding to each disease that belongs to a frequent item.
2. The disease complication mining method according to claim 1, characterized in that: in step (1), preprocessing and analyzing the physical examination reports specifically comprises deleting missing values, handling outliers, compiling statistics on the types and distribution of disease diagnoses, and generating a chart of prevalent diseases, thereby obtaining the list of diseases diagnosed in each report.
3. The disease complication mining method according to claim 1, characterized in that: in step (2), all frequent items are identified by counting with the FP-Growth algorithm, and the list of disease frequent items is output.
4. The disease complication mining method according to claim 1, characterized in that: the support support(i) is the number of physical examination reports whose disease list contains frequent item i.
5. The disease complication mining method according to claim 1, characterized in that step (3) is implemented as follows:
3.1 for any disease a that belongs to a frequent item, collect all associated diseases with which it forms a two-disease frequent item;
3.2 for any associated disease b of disease a, calculate the confidence confidence(a/b) of the pair by the following formula:
$$\mathrm{confidence}(a/b) = \frac{\mathrm{support}(a/b)}{\mathrm{support}(a)}$$
where support(a) is the support of the frequent item consisting only of disease a, i.e. the number of physical examination reports whose disease list contains that frequent item, and support(a/b) is the support of the frequent item formed by combining disease a with associated disease b, i.e. the number of physical examination reports whose disease list contains that frequent item;
3.3 judge whether confidence(a/b) is greater than a preset confidence threshold; if so, associated disease b is judged to be a complication of disease a;
3.4 traverse all associated diseases of disease a according to steps 3.2 to 3.3 to mine all complications of disease a, then display these complications in descending order of confidence(a/b).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611168316.7A CN106709248A (en) | 2016-12-16 | 2016-12-16 | Disease complication excavating method based on FP-Growth algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611168316.7A CN106709248A (en) | 2016-12-16 | 2016-12-16 | Disease complication excavating method based on FP-Growth algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106709248A true CN106709248A (en) | 2017-05-24 |
Family
ID=58937965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611168316.7A Pending CN106709248A (en) | 2016-12-16 | 2016-12-16 | Disease complication excavating method based on FP-Growth algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106709248A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451416A (en) * | 2017-08-28 | 2017-12-08 | 昆明理工大学 | A kind of sle auxiliary diagnostic equipment and method |
CN109147879A (en) * | 2018-07-02 | 2019-01-04 | 北京众信易保科技有限公司 | The method and system of Visual Report Forms based on medical document |
CN110019188A (en) * | 2017-09-15 | 2019-07-16 | 上海诺悦智能科技有限公司 | A kind of suspicious characteristic discovery method based on trade network node |
CN111785372A (en) * | 2020-05-14 | 2020-10-16 | 浙江知盛科技集团有限公司 | Collaborative filtering disease prediction system based on association rule and electronic equipment thereof |
CN113643815A (en) * | 2021-08-31 | 2021-11-12 | 平安医疗健康管理股份有限公司 | Disease complication prediction method and device, computer equipment and storage medium |
CN113823414A (en) * | 2021-08-23 | 2021-12-21 | 杭州火树科技有限公司 | Main diagnosis and main operation matching detection method and device, computing equipment and storage medium |
-
2016
- 2016-12-16 CN CN201611168316.7A patent/CN106709248A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451416A (en) * | 2017-08-28 | 2017-12-08 | 昆明理工大学 | A kind of sle auxiliary diagnostic equipment and method |
CN110019188A (en) * | 2017-09-15 | 2019-07-16 | 上海诺悦智能科技有限公司 | A kind of suspicious characteristic discovery method based on trade network node |
CN109147879A (en) * | 2018-07-02 | 2019-01-04 | 北京众信易保科技有限公司 | The method and system of Visual Report Forms based on medical document |
CN109147879B (en) * | 2018-07-02 | 2021-07-27 | 北京众信易保科技有限公司 | Method and system for visual report based on medical document |
CN111785372A (en) * | 2020-05-14 | 2020-10-16 | 浙江知盛科技集团有限公司 | Collaborative filtering disease prediction system based on association rule and electronic equipment thereof |
CN113823414A (en) * | 2021-08-23 | 2021-12-21 | 杭州火树科技有限公司 | Main diagnosis and main operation matching detection method and device, computing equipment and storage medium |
CN113823414B (en) * | 2021-08-23 | 2024-04-05 | 杭州火树科技有限公司 | Main diagnosis and main operation matching detection method, device, computing equipment and storage medium |
CN113643815A (en) * | 2021-08-31 | 2021-11-12 | 平安医疗健康管理股份有限公司 | Disease complication prediction method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106709248A (en) | Disease complication excavating method based on FP-Growth algorithm | |
Baim | A method for attribute selection in inductive learning systems | |
CN106874693A (en) | A kind of medical big data analysis process system and method | |
CN109119167A (en) | Pyemia anticipated mortality system based on integrated model | |
CN107066791A (en) | A kind of aided disease diagnosis method based on patient's assay | |
CN111540468A (en) | ICD automatic coding method and system for visualization of diagnosis reason | |
CN110136836A (en) | A kind of disease forecasting method based on physical examination report clustering | |
CN108256452A (en) | A kind of method of the ECG signal classification of feature based fusion | |
CN107358014A (en) | The clinical pre-treating method and system of a kind of physiological data | |
WO2022233121A1 (en) | Unsupervised medical behavior compliance assessment method based on electronic medical record | |
CN111243753B (en) | Multi-factor correlation interactive analysis method for medical data | |
CN110322356A (en) | The medical insurance method for detecting abnormality and system of dynamic multi-mode are excavated based on HIN | |
CN112201330A (en) | Medical quality monitoring and evaluating method combining DRGs tool and Bayesian model | |
CN106919804A (en) | Medicine based on clinical data recommends method, recommendation apparatus and server | |
CN114121295A (en) | Construction method of knowledge graph driven liver cancer diagnosis and treatment scheme recommendation system | |
CN109074871A (en) | The mode discovery visual analysis system of patient group is generated for analyzing clinical data feature | |
CN104023002B (en) | Agreement is to Barebone and is the method and apparatus that the fate map of presentation protocol determines the best fit of event | |
Tseng et al. | Rough set based rule induction in decision making using credible classification and preference from medical application perspective | |
US11961204B2 (en) | State visualization device, state visualization method, and state visualization program | |
CN106951710A (en) | CAP data systems and method based on privilege information Learning support vector machine | |
JP2021523499A (en) | A method for constructing a disease network considering the stratification of cohort data by confounding factors and the time of occurrence between diseases, a method for visualizing the network, and a computer-readable recording medium that records the method. | |
CN109360658A (en) | A kind of the disease pattern method for digging and device of word-based vector model | |
Labib et al. | Data mining for cancer management in Egypt case study: childhood acute lymphoblastic leukemia | |
Rajan et al. | A survey on mining techniques for early lung cancer diagnoses | |
Patil et al. | Predicting burn patient survivability using decision tree in weka environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170524 |