CN106709248A - Disease complication excavating method based on FP-Growth algorithm - Google Patents

Info

Publication number
CN106709248A
Authority
CN
China
Prior art keywords
disease
physical examination
frequent item
list
support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611168316.7A
Other languages
Chinese (zh)
Inventor
吴健
顾盼
周立水
邱奇波
邓水光
李莹
尹建伟
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201611168316.7A
Publication of CN106709248A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Pathology (AREA)
  • Fuzzy Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a disease complication mining method based on the FP-Growth algorithm. Based on many years of physical examination data from a large hospital, the method extracts patients' diagnostic data, obtains frequent itemsets with the FP-Growth algorithm, and constructs the rules whose confidence is not less than a threshold, i.e. the disease complications. When giving diagnostic advice, a doctor can draw on the patient's physical examination data and, using the disease complications, offer the patient scientific and reliable advice and preventive measures. The disease associations obtained by association rule mining are comprehensive, real and reliable; the FP-Growth algorithm used is faster and more efficient than commonly used association rule algorithms; besides the complications themselves, the method also gives the corresponding likelihood and ranks the complications by it, so that the diagnostic results and advice given to patients are more accurate and patient satisfaction with the physical examination is improved.

Description

A disease complication mining method based on the FP-Growth algorithm
Technical field
The invention belongs to the technical field of medical data mining, and in particular relates to a disease complication mining method based on the FP-Growth algorithm.
Background
Data mining is a frontier discipline that has emerged in recent years from the convergence of artificial intelligence and database technology. It aims to discover knowledge or rules, hidden in data, about the nature of things and their development trends, and to support expert decision-making. With the large-scale application of information technology in the medical industry, large amounts of medical data have been collected, so data mining has good application prospects and ample data support in the medical field. Mining disease complications from a physical examination diagnosis database can enrich expert knowledge and medical theory. Complications are often highly complex and uncertain, so using large amounts of data to study the co-occurrence relationships between diseases and to give early warning of complications is of great significance for the treatment of disease.
Studying complications is in effect studying the co-occurrence relationships between diseases. Some of these relationships are known and some are unknown; some lie within a single clinical department and some span departments. Because of the sheer data volume, these implicit co-occurrence relationships are hard to find manually, and data mining is precisely the most tractable way to solve this problem. Current research on disease complications usually addresses only one disease or one class of diseases, for example the complications of diabetes or of certain cancers.
Summary of the invention
In view of the above, the invention provides a disease complication mining method based on the FP-Growth algorithm that mines complications for all common diseases. It is intended to let doctors, at diagnosis time, give patients more comprehensive physical examination advice and remind them to take early precautions against certain diseases.
A disease complication mining method based on the FP-Growth algorithm comprises the following steps:
(1) Preprocess and analyze all physical examination reports in a hospital physical examination database to obtain the list of diseases diagnosed in each report;
(2) Based on the disease lists of all physical examination reports, identify by counting and output a list of frequent disease items, where each frequent item in the list is a single disease or a combination of two diseases, and any frequent item i satisfies the following condition:
$$\frac{\mathrm{support}(i)}{N} \geq \rho$$
where N is the total number of physical examination reports, support(i) is the support of frequent item i, and ρ is the preset ratio threshold;
(3) Based on the list of frequent disease items, find association rules by calculation, and thereby mine the complications corresponding to each disease that belongs to a frequent item.
In step (1), preprocessing and analyzing the physical examination reports specifically comprises deleting missing values, handling outliers, counting the types and distribution of disease diagnoses and generating a chart of prevalent diseases, so as to obtain the list of diseases diagnosed in each report.
In step (2), all frequent items are identified by counting with the FP-Growth algorithm, and the list of frequent disease items is output.
The support support(i) is the number of physical examination reports whose disease list contains frequent item i.
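The condition in step (2) can be illustrated with a minimal Python sketch. It uses a brute-force count in place of FP-Growth, purely to show what support(i) / N ≥ ρ selects; the function name `frequent_items`, the toy reports and the threshold value are assumptions for illustration and do not come from the patent.

```python
from collections import Counter
from itertools import combinations


def frequent_items(disease_lists, rho):
    """Return {itemset: support} for one- and two-disease itemsets with support / N >= rho."""
    n = len(disease_lists)
    counts = Counter()
    for diseases in disease_lists:
        unique = sorted(set(diseases))        # a disease counts once per report
        for disease in unique:
            counts[(disease,)] += 1           # single-disease item
        for pair in combinations(unique, 2):
            counts[pair] += 1                 # two-disease item, names in sorted order
    return {item: c for item, c in counts.items() if c / n >= rho}


# Hypothetical toy data: each inner list is the disease list of one report.
reports = [
    ["hypertension", "fatty liver"],
    ["hypertension", "fatty liver", "gallstones"],
    ["hypertension"],
    ["fatty liver"],
]
freq = frequent_items(reports, rho=0.5)
```

With N = 4 and ρ = 0.5, an item must appear in at least two reports, so the two single diseases and their pair survive while every item involving "gallstones" is dropped.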
Step (3) is implemented as follows:
3.1 For any disease a that belongs to a frequent item, collect all of its associated diseases, i.e. the diseases that form a two-disease frequent item together with a;
3.2 For any associated disease b of disease a, compute the confidence confidence(a/b) of the pair by the following formula:
$$\mathrm{confidence}(a/b) = \frac{\mathrm{support}(a/b)}{\mathrm{support}(a)}$$
where support(a) is the support of the frequent item consisting only of disease a, i.e. the number of physical examination reports whose disease list contains that frequent item; support(a/b) is the support of the frequent item formed by combining disease a with associated disease b, i.e. the number of physical examination reports whose disease list contains that frequent item;
3.3 Determine whether confidence(a/b) is greater than a preset confidence threshold; if so, associated disease b is judged to be a complication of disease a;
3.4 Traverse all associated diseases of disease a according to steps 3.2 to 3.3 to mine all complications of disease a, then sort these complications in descending order of confidence(a/b) and display them.
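Steps 3.1 to 3.4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `complications`, the dictionary layout keyed by `frozenset`, and all counts are hypothetical and are not taken from the patent or its data.

```python
def complications(disease, support, min_conf):
    """Rank diseases associated with `disease` by confidence(a/b) = support(a/b) / support(a)."""
    base = support[frozenset([disease])]            # support(a)
    ranked = []
    for itemset, count in support.items():
        if len(itemset) == 2 and disease in itemset:
            (other,) = itemset - {disease}          # the associated disease b
            conf = count / base                     # step 3.2
            if conf > min_conf:                     # step 3.3: strictly greater than the threshold
                ranked.append((other, conf))
    ranked.sort(key=lambda pair: pair[1], reverse=True)   # step 3.4: descending confidence
    return ranked


# Hypothetical supports (numbers of reports), for illustration only.
support = {
    frozenset(["hypertension"]): 40,
    frozenset(["fatty liver"]): 50,
    frozenset(["gallstones"]): 20,
    frozenset(["hypertension", "fatty liver"]): 30,
    frozenset(["hypertension", "gallstones"]): 10,
}
result = complications("hypertension", support, min_conf=0.2)
```

Here "fatty liver" ranks first with confidence 30/40 = 0.75 and "gallstones" second with 10/40 = 0.25; raising the threshold to 0.5 would leave only the first.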
The disease complication mining method of the invention is based on many years of physical examination data from a large hospital. It extracts patients' diagnostic data, obtains frequent itemsets with the FP-Growth algorithm, and from them constructs the rules whose confidence is not less than a threshold, i.e. the disease complications. When giving diagnostic advice, a doctor can thus draw not only on the patient's physical examination data but also on the disease complications to offer the patient scientific and reliable advice and preventive measures. The invention therefore has the following beneficial technical effects:
(1) The data comes from many years of physical examination data at a large hospital, with more than 500,000 diagnostic records, so the disease associations obtained by association rule mining are comprehensive, real and reliable.
(2) Hospital physical examination data grows daily; the invention can set an update interval so that the disease complication data stays relatively up to date.
(3) The invention uses the FP-Growth algorithm, which is faster and more efficient than the commonly used Apriori association rule algorithm.
(4) Besides providing the complications of a disease, the invention also gives the corresponding likelihood and ranks the complications by likelihood, which makes the diagnostic results and advice given to patients more accurate and improves patient satisfaction with the physical examination.
Brief description of the drawings
Fig. 1 is a flow chart of the disease complication mining method of the invention.
Fig. 2 is a flow chart of the data preprocessing and analysis part of the invention.
Fig. 3 is a schematic diagram of common disease diagnoses.
Fig. 4 is a flow chart of frequent item identification in the invention.
Fig. 5 is a diagram of the complications obtained for disease diagnoses.
Detailed description
To describe the invention more specifically, its technical solution is described in detail below with reference to the drawings and a specific embodiment.
This embodiment is based on ten years of physical examination data from the Second Affiliated Hospital of Zhejiang University School of Medicine. It extracts patients' diagnostic data, obtains frequent itemsets with the FP-Growth algorithm, and from them constructs the rules whose confidence is not less than a threshold, i.e. the disease complications. When giving diagnostic advice, a doctor can draw not only on the patient's physical examination data but also on the disease complications to offer the patient scientific and reliable advice and preventive measures. The whole mining method consists of three parts: data preprocessing and analysis, frequent itemset identification, and association rule discovery.
The data is physical examination diagnosis data, which is incomplete, redundant and variable in format. The data preprocessing part therefore mainly deletes missing values, handles outliers, counts the types and distribution of disease diagnoses and generates a chart of prevalent diseases.
The frequent itemset identification part uses the FP-Growth algorithm to identify all frequent itemsets of disease diagnoses, requiring that the support ratio of each frequent itemset be no less than the preset minimum. This part is the key to finding disease complications and is also the most computationally expensive part.
The association rule discovery part constructs, from the frequent itemsets, the rules whose confidence is not less than a user-set minimum, and presents them intuitively with a data visualization tool.
As shown in Fig. 1, this embodiment first preprocesses and extracts the data in the database to generate a diagnosis dataset, then uses the FP-Growth algorithm to generate frequent itemsets whose support is greater than the preset minimum support, and then mines association rules from the frequent itemsets to find the set of association rules whose confidence is higher than the preset minimum confidence; this set is the disease complication data.
Fig. 2 shows the flow of the data preprocessing and analysis module. The data is stored in the physical examination diagnosis table on an Oracle server. The fields that need to be retrieved are the physical examination code and the diagnosis information; one physical examination corresponds to one examination code and may have multiple diagnosis entries. First, meaningless characters in the diagnosis information such as ":+-" and "" are filtered out, and then missing and redundant values in the diagnosis information are deleted. After the data has been extracted from the database, the distribution of disease diagnoses is counted and the data is reorganized into the format required for frequent itemset mining. There are about 120,000 distinct disease diagnoses; the common ones are shown in Fig. 3, where a larger font means that the diagnosis occurs more often in physical examinations.
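The preprocessing flow described above might look roughly like the following Python sketch. The row layout `(exam_code, diagnosis)`, the function name `build_disease_lists` and the sample rows are assumptions for illustration; the set of noise characters is taken from the ":+-" example in the text, and stripping "-" everywhere is a simplification that could damage hyphenated diagnosis names in real data.

```python
def build_disease_lists(rows, noise_chars=":+-"):
    """Group (exam_code, diagnosis) rows into one de-duplicated disease list per exam."""
    strip_noise = str.maketrans("", "", noise_chars)   # delete every noise character
    lists = {}
    for exam_code, diagnosis in rows:
        if exam_code is None or diagnosis is None:
            continue                                   # missing value: drop the row
        cleaned = diagnosis.translate(strip_noise).strip()
        if not cleaned:
            continue                                   # nothing left after filtering
        diseases = lists.setdefault(exam_code, [])
        if cleaned not in diseases:                    # redundant value: keep one copy
            diseases.append(cleaned)
    return lists


# Hypothetical rows as they might come back from the diagnosis table.
rows = [
    ("E001", "fatty liver+"),
    ("E001", "hypertension"),
    ("E001", "fatty liver"),
    ("E002", None),
    ("E002", ":-"),
    ("E003", "gallstones"),
]
lists = build_disease_lists(rows)
```

Each value of `lists` is then one transaction for frequent itemset mining; exam "E002" disappears because neither of its rows carries a usable diagnosis.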
Fig. 4 shows the frequent item generation process, in which the FP-Growth algorithm finds all frequent itemsets that satisfy the minimum support. FP-Growth uses a divide-and-conquer strategy: the transaction database that supplies the frequent itemsets is compressed into a frequent pattern tree (FP-tree) that still retains the itemset association information; the compressed database is then divided into a set of conditional databases, each associated with one frequent item, and each conditional database is mined separately. This embodiment only needs frequent items of length 1 and length 2, so the loop condition is modified slightly.
Algorithm: FP-Growth. Mine frequent patterns by pattern-fragment growth using an FP-tree.
Input: transaction database D; minimum support threshold min_sup.
Output: the complete set of frequent patterns.
Method:
(1) Construct the FP-tree in the following steps:
(a) Scan the transaction database D once. Collect the set F of frequent items and their supports. Sort F in descending order of support to obtain the list of frequent items L.
(b) Create the root node of the FP-tree and label it "null". For each transaction Trans in D, do the following:
Select the frequent items in Trans and sort them according to the order in L. Let the sorted frequent item list be [p | P], where p is the first element and P is the list of remaining elements. Call insert_tree([p | P], T). This procedure works as follows: if T has a child N such that N.item-name = p.item-name, increment the count of N by 1; otherwise create a new node N, set its count to 1, link it to its parent node T, and link it to the nodes with the same item-name through the node-link structure. If P is non-empty, call insert_tree(P, N) recursively.
(2) Mine the FP-tree by calling the procedure FP-Growth(FP-tree, null).
// The procedure is implemented as follows:
Procedure FP-Growth(Tree, α)
1) if Tree contains a single path P then
2)   for each combination (denoted β) of the nodes in path P
3)     generate pattern β ∪ α with support = the minimum support of the nodes in β
4) else for each a_i in the header of Tree {
5)   generate pattern β = a_i ∪ α with support = a_i.support
6)   construct the conditional pattern base of β and then the conditional FP-tree Tree_β of β
7)   if Tree_β ≠ ∅ then
8)     call FP-Growth(Tree_β, β) }
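The procedure above can be sketched in Python for the restricted case this embodiment needs, frequent items of length 1 and 2. The sketch is an assumption-laden illustration rather than the patented implementation: it inserts transactions iteratively instead of calling insert_tree recursively, it reads two-item patterns directly from each item's conditional pattern base instead of building conditional FP-trees, and the names `Node`, `build_fp_tree` and `frequent_up_to_two` are hypothetical. Here min_sup is an absolute count.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_sup):
    """Steps (a) and (b): count items, then insert each sorted transaction."""
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_sup}
    # Descending support; ties broken by name so the order is deterministic.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)                  # the "null" root
    links = defaultdict(list)                # node-links: item -> nodes carrying it
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:                # no matching child: create and link it
                child = Node(item, node)
                node.children[item] = child
                links[item].append(child)
            child.count += 1
            node = child
    return frequent, links


def frequent_up_to_two(transactions, min_sup):
    """Frequent patterns of length 1 and 2, read from conditional pattern bases."""
    frequent, links = build_fp_tree(transactions, min_sup)
    patterns = {frozenset([i]): c for i, c in frequent.items()}
    for item, nodes in links.items():
        base = Counter()                     # item counts in the conditional pattern base
        for node in nodes:
            ancestor = node.parent
            while ancestor.item is not None: # walk the prefix path up to the root
                base[ancestor.item] += node.count
                ancestor = ancestor.parent
        for other, count in base.items():
            if count >= min_sup:
                patterns[frozenset([item, other])] = count
    return patterns


# Hypothetical transactions; letters stand in for disease names.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
patterns = frequent_up_to_two(transactions, min_sup=2)
```

Because every transaction is inserted in the same global order, the lower-ranked item of any pair always sits below the higher-ranked one in the tree, so each pair is counted exactly once, while walking up from the lower-ranked item's nodes.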
After the frequent itemsets are obtained, association rules are derived using confidence as the quantitative measure. The confidence of a rule P → H is calculated as follows:
$$\mathrm{confidence}(P \rightarrow H) = \frac{\mathrm{support}(P \mid H)}{\mathrm{support}(P)}$$
where P | H denotes the union of sets P and H, i.e. all elements that appear in P or in H. Again because of project requirements, the lengths of P and H are both set to 1. The disease complication data is the set of all rules that meet the minimum support and minimum confidence thresholds; the different complications of the same disease are sorted in descending order of confidence. The resulting disease complications are shown in Fig. 5, where two diseases connected by a line are complications of each other.
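As a worked illustration of the confidence formula, the following uses hypothetical counts that are not taken from the hospital data: 40 reports contain disease a, 50 contain disease b, and 30 contain both. It shows that the confidence of a rule depends on its direction, because the denominator is the support of the left-hand side.

```latex
\begin{align*}
\mathrm{confidence}(a \rightarrow b) &= \frac{\mathrm{support}(a \mid b)}{\mathrm{support}(a)} = \frac{30}{40} = 0.75 \\
\mathrm{confidence}(b \rightarrow a) &= \frac{\mathrm{support}(a \mid b)}{\mathrm{support}(b)} = \frac{30}{50} = 0.60
\end{align*}
```

Under these assumed counts, b would be listed among the complications of a with confidence 0.75, and a among those of b with confidence 0.60, provided both values clear the chosen minimum confidence.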
The above description of the embodiment is intended to help those of ordinary skill in the art understand and apply the invention. Those skilled in the art can obviously make various modifications to the above embodiment and apply the general principles described here to other embodiments without creative effort. The invention is therefore not limited to the above embodiment; improvements and modifications made by those skilled in the art on the basis of this disclosure shall fall within the protection scope of the invention.

Claims (5)

1. A disease complication mining method based on the FP-Growth algorithm, comprising the following steps:
(1) preprocessing and analyzing all physical examination reports in a hospital physical examination database to obtain the list of diseases diagnosed in each report;
(2) based on the disease lists of all physical examination reports, identifying by counting and outputting a list of frequent disease items, wherein each frequent item in the list is a single disease or a combination of two diseases, and any frequent item i satisfies the following condition:
$$\frac{\mathrm{support}(i)}{N} \geq \rho$$
where N is the total number of physical examination reports, support(i) is the support of frequent item i, and ρ is the preset ratio threshold;
(3) based on the list of frequent disease items, finding association rules by calculation, and thereby mining the complications corresponding to each disease that belongs to a frequent item.
2. The disease complication mining method according to claim 1, characterized in that: in step (1), preprocessing and analyzing the physical examination reports specifically comprises deleting missing values, handling outliers, counting the types and distribution of disease diagnoses and generating a chart of prevalent diseases, so as to obtain the list of diseases diagnosed in each report.
3. The disease complication mining method according to claim 1, characterized in that: in step (2), all frequent items are identified by counting with the FP-Growth algorithm, and the list of frequent disease items is output.
4. The disease complication mining method according to claim 1, characterized in that: the support support(i) is the number of physical examination reports whose disease list contains frequent item i.
5. The disease complication mining method according to claim 1, characterized in that step (3) is implemented as follows:
3.1 for any disease a that belongs to a frequent item, collecting all of its associated diseases, i.e. the diseases that form a two-disease frequent item together with a;
3.2 for any associated disease b of disease a, computing the confidence confidence(a/b) of the pair by the following formula:
$$\mathrm{confidence}(a/b) = \frac{\mathrm{support}(a/b)}{\mathrm{support}(a)}$$
where support(a) is the support of the frequent item consisting only of disease a, i.e. the number of physical examination reports whose disease list contains that frequent item; support(a/b) is the support of the frequent item formed by combining disease a with associated disease b, i.e. the number of physical examination reports whose disease list contains that frequent item;
3.3 determining whether confidence(a/b) is greater than a preset confidence threshold, and if so, judging associated disease b to be a complication of disease a;
3.4 traversing all associated diseases of disease a according to steps 3.2 to 3.3 to mine all complications of disease a, then sorting these complications in descending order of confidence(a/b) and displaying them.
CN201611168316.7A 2016-12-16 2016-12-16 Disease complication excavating method based on FP-Growth algorithm Pending CN106709248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611168316.7A CN106709248A (en) 2016-12-16 2016-12-16 Disease complication excavating method based on FP-Growth algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611168316.7A CN106709248A (en) 2016-12-16 2016-12-16 Disease complication excavating method based on FP-Growth algorithm

Publications (1)

Publication Number Publication Date
CN106709248A (en) 2017-05-24

Family

ID=58937965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611168316.7A Pending CN106709248A (en) 2016-12-16 2016-12-16 Disease complication excavating method based on FP-Growth algorithm

Country Status (1)

Country Link
CN (1) CN106709248A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451416A (en) * 2017-08-28 2017-12-08 昆明理工大学 A kind of sle auxiliary diagnostic equipment and method
CN109147879A (en) * 2018-07-02 2019-01-04 北京众信易保科技有限公司 The method and system of Visual Report Forms based on medical document
CN110019188A (en) * 2017-09-15 2019-07-16 上海诺悦智能科技有限公司 A kind of suspicious characteristic discovery method based on trade network node
CN111785372A (en) * 2020-05-14 2020-10-16 浙江知盛科技集团有限公司 Collaborative filtering disease prediction system based on association rule and electronic equipment thereof
CN113643815A (en) * 2021-08-31 2021-11-12 平安医疗健康管理股份有限公司 Disease complication prediction method and device, computer equipment and storage medium
CN113823414A (en) * 2021-08-23 2021-12-21 杭州火树科技有限公司 Main diagnosis and main operation matching detection method and device, computing equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451416A (en) * 2017-08-28 2017-12-08 昆明理工大学 A kind of sle auxiliary diagnostic equipment and method
CN110019188A (en) * 2017-09-15 2019-07-16 上海诺悦智能科技有限公司 A kind of suspicious characteristic discovery method based on trade network node
CN109147879A (en) * 2018-07-02 2019-01-04 北京众信易保科技有限公司 The method and system of Visual Report Forms based on medical document
CN109147879B (en) * 2018-07-02 2021-07-27 北京众信易保科技有限公司 Method and system for visual report based on medical document
CN111785372A (en) * 2020-05-14 2020-10-16 浙江知盛科技集团有限公司 Collaborative filtering disease prediction system based on association rule and electronic equipment thereof
CN113823414A (en) * 2021-08-23 2021-12-21 杭州火树科技有限公司 Main diagnosis and main operation matching detection method and device, computing equipment and storage medium
CN113823414B (en) * 2021-08-23 2024-04-05 杭州火树科技有限公司 Main diagnosis and main operation matching detection method, device, computing equipment and storage medium
CN113643815A (en) * 2021-08-31 2021-11-12 平安医疗健康管理股份有限公司 Disease complication prediction method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106709248A (en) Disease complication excavating method based on FP-Growth algorithm
Baim A method for attribute selection in inductive learning systems
CN106874693A (en) A kind of medical big data analysis process system and method
CN109119167A (en) Pyemia anticipated mortality system based on integrated model
CN107066791A (en) A kind of aided disease diagnosis method based on patient's assay
CN111540468A (en) ICD automatic coding method and system for visualization of diagnosis reason
CN110136836A (en) A kind of disease forecasting method based on physical examination report clustering
CN108256452A (en) A kind of method of the ECG signal classification of feature based fusion
CN107358014A (en) The clinical pre-treating method and system of a kind of physiological data
WO2022233121A1 (en) Unsupervised medical behavior compliance assessment method based on electronic medical record
CN111243753B (en) Multi-factor correlation interactive analysis method for medical data
CN110322356A (en) The medical insurance method for detecting abnormality and system of dynamic multi-mode are excavated based on HIN
CN112201330A (en) Medical quality monitoring and evaluating method combining DRGs tool and Bayesian model
CN106919804A (en) Medicine based on clinical data recommends method, recommendation apparatus and server
CN114121295A (en) Construction method of knowledge graph driven liver cancer diagnosis and treatment scheme recommendation system
CN109074871A (en) The mode discovery visual analysis system of patient group is generated for analyzing clinical data feature
CN104023002B (en) Agreement is to Barebone and is the method and apparatus that the fate map of presentation protocol determines the best fit of event
Tseng et al. Rough set based rule induction in decision making using credible classification and preference from medical application perspective
US11961204B2 (en) State visualization device, state visualization method, and state visualization program
CN106951710A (en) CAP data systems and method based on privilege information Learning support vector machine
JP2021523499A (en) A method for constructing a disease network considering the stratification of cohort data by confounding factors and the time of occurrence between diseases, a method for visualizing the network, and a computer-readable recording medium that records the method.
CN109360658A (en) A kind of the disease pattern method for digging and device of word-based vector model
Labib et al. Data mining for cancer management in Egypt case study: childhood acute lymphoblastic leukemia
Rajan et al. A survey on mining techniques for early lung cancer diagnoses
Patil et al. Predicting burn patient survivability using decision tree in weka environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170524
