CN106202883A

CN106202883A - Method for building a disease cloud map based on big data analysis

Info

Publication number: CN106202883A
Application number: CN201610497249.7A
Authority: CN
Inventors: 温川飙; 程小恩; 贾帅; 杨东
Original assignee: Chengdu University of Traditional Chinese Medicine
Current assignee: Chengdu University of Traditional Chinese Medicine
Priority date: 2016-06-28
Filing date: 2016-06-28
Publication date: 2016-12-07

Abstract

The present invention relates to a method for building a disease cloud map based on big data analysis. Using GIS maps and a big data framework, it shows the geographical distribution of diseases in space and time, and comprises the following steps: Step 1, data collection: obtain electronic medical record data; Step 2, data conversion and cleaning; Step 3, apply data mining to perform association analysis on the data converted and cleaned in Step 2, mine frequent itemsets, derive association rules between symptom attributes from the frequent itemsets, and build a "disease type - time of illness - number of patients - geographical location" result set according to the association rules; Step 4, call GIS map data and display the geographical distribution of diseases in space and time according to the result set. The invention provides basic data for studying the epidemic patterns of diseases and exploring their causes; analysis of disease distribution patterns and their determinants provides a scientific basis for formulating sound disease prevention and control and health care strategies and measures.

Description

Method for building a disease cloud map based on big data analysis

Technical field

The present invention relates to the technical field of disease prediction and analysis, and in particular to a method for building a disease cloud map based on big data analysis.

Background technology

Medical diagnostic information contains a large amount of information about patients' conditions and individual circumstances, including patients' medical histories and particular diseases with their various symptoms. Performing association analysis on these medical data and mining the many association rules hidden in them, such as the relationships between symptom attributes, makes it possible to predict the development trend of a disease and to assist doctors in assessing health status, which is of great significance for the diagnosis, prevention and treatment of disease.

In data mining, if there is some regularity between the values of two or more variables, it is called an association. An association reflects the dependence between one event and other events, or how often they occur together. Associations can be divided into simple associations, temporal associations, causal associations, quantitative associations and so on. Associations between data are complex, and most of them are buried in massive data and cannot be found by direct observation. Association analysis is used to discover valuable relationships hidden in large data sets. The relationships it finds can be expressed as association rules or frequent itemsets. An association rule describes the correlation between different attribute items that occur at the same time. In a broad sense, association analysis is the essence of data mining: since the purpose of data mining is to discover the knowledge hidden behind the data, that knowledge must reflect relationships between different objects. In a narrow sense, association analysis refers to a specific class of data mining techniques whose main purpose is to mine the associations between objects in a database. Association rule mining extracts from large amounts of data valuable knowledge describing the interrelationships between data items.

GIS can store and analyze massive spatial data and can output analysis results quickly and intuitively, so it has become a powerful tool for modern medical research and is gradually being applied to disease prevention and control. Most epidemiological data have spatial attributes: the incidence and infection status of people and animals, the distribution of hosts and vectors, temperature and humidity, rainfall, soil, sanitation facilities and so on are all related to geographical location. GIS can link these data through spatial relationships for interactive display and analysis, and also provides a basis for subsequent statistical analysis.

Summary of the invention

The present invention aims to provide a method for building a disease cloud map based on big data analysis.

To achieve the above object, the present invention adopts the following technical solution:

A method for building a disease cloud map based on big data analysis uses GIS maps and a big data framework to show the distribution of diseases in space and time, and comprises the following steps:

Step 1. Data collection: obtain electronic medical record data;

Step 2. Data cleaning and processing: convert and clean the data obtained in Step 1. Conversion includes converting nonstandard disease names so that they are unified into standard disease names; cleaning rejects data that are unreasonable or may affect the prediction results;

Step 3. Apply data mining to perform association analysis on the data converted and cleaned in Step 2, mine frequent itemsets, derive association rules between symptom attributes from the frequent itemsets, and build a "disease type - time of illness - number of patients - geographical location" result set according to the association rules;

Step 4. Call GIS map data, read the result set obtained in Step 3, and show the distribution of diseases in space and time according to the result set.

Further, in Step 2, attribute values are normalized for data that have attribute values: an attribute whose values include both text and discrete data is quantified and classified according to the text and converted into a categorical attribute; an attribute with only text values is characterized according to its business meaning and converted into a categorical attribute; an attribute with only discrete data is deleted;

Data with missing attribute values are handled in one of two ways: Mode 1, treat the missing value as a value in its own right; Mode 2, ignore the attribute.

Further, in Step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence are taken as strong association rules. If a patient's symptoms satisfy a strong association rule over the symptom attributes of a disease, the patient is considered to have that disease; the patient's time of illness and geographical location are then obtained, and the number of patients is counted.

Further, Step 3 mines frequent itemsets using the following steps:

3.1 Traverse all candidate symptom attributes of a disease and determine the support frequency of each attribute; all candidate symptom attributes form the candidate 1-itemsets H_1;

3.2 Set a minimum support frequency, compare the support frequency of every attribute in H_1 with it, and let the attributes in H_1 whose support frequency exceeds the minimum support frequency form the frequent 1-itemsets F_1;

3.3 Then mine the frequent (k+1)-itemsets F_{k+1} as follows:

(1) Use the join operation (F_k) ⊕ (F_k) to determine the candidate (k+1)-itemsets H_{k+1}, where k = 1, 2, ..., n;

(2) Scan the attributes in H_{k+1}, calculate the support frequency of each, and let all attributes in H_{k+1} whose support frequency is greater than or equal to the minimum support frequency form the frequent (k+1)-itemsets F_{k+1};

The number of itemsets in F_k is |F_k|, and the number of attributes in C_{k+1} is determined by |F_k|; C_{k+1} is the candidate set of the frequent itemsets, i.e. C_{k+1} includes H_1, ..., H_{k+1};

(3) When the support frequency of every attribute element in the itemsets is below the minimum support frequency, the algorithm terminates;

According to the mined frequent itemsets F_{k+1}, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where k = 0, 1, ..., n.
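The level-wise mining loop in steps 3.1 to 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: records are modeled as Python sets of symptom labels, support is expressed as a fraction of all records, a single "greater than or equal to" comparison is used at every level, and the function name `mine_frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations


def mine_frequent_itemsets(records, min_support):
    """Return {itemset (frozenset): support fraction} for all frequent itemsets.

    records: list of sets of symptom labels, one set per medical record.
    min_support: minimum support as a fraction of all records (e.g. 0.5).
    """
    n = len(records)

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(1 for r in records if itemset <= r) / n

    # Steps 3.1-3.2: candidate 1-itemsets, filtered into frequent 1-itemsets.
    items = sorted({item for r in records for item in r})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 1
    while current:
        # Step 3.3 (1): join F_k with itself to get candidate (k+1)-itemsets.
        candidates = {
            a | b for a, b in combinations(current, 2) if len(a | b) == k + 1
        }
        # Step 3.3 (2): scan the records and keep candidates meeting the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    # Step 3.3 (3): the loop ends once no candidate reaches min_support.
    return frequent


# Hypothetical toy records (not taken from the patent's tables).
toy_records = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
result = mine_frequent_itemsets(toy_records, 0.5)
```

With these toy records every single symptom and every pair is frequent, while the triple {A, B, C} appears in only two of five records and is dropped.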

Further, the confidence of the resulting association rules is calculated using formula (1):

$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$

In formula (1), support_count(H ∪ F) is the number of transactions containing the itemset H ∪ F, and support_count(H) is the number of transactions containing the itemset H, where H is a candidate itemset and F is a frequent itemset. If Confidence(H ⇒ F) ≥ min_conf, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.

According to this formula, association rules can be generated as follows: for each frequent itemset L, generate all nonempty subsets of L; for each nonempty subset S of L, if support_count(L) / support_count(S) ≥ min_conf, output the strong association rule S ⇒ (L − S), where min_conf is the configured minimum confidence threshold.

Because the rules are generated directly from frequent itemsets, every itemset involved in an association rule satisfies the minimum support threshold. The frequent itemsets and their support frequencies can be stored in a list so that they can be accessed quickly. The Apriori algorithm is used to mine association rules from the collected medical record information.
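Rule extraction from stored support counts can be sketched as below. This is an illustrative sketch only: it assumes the support counts of every frequent itemset and of all its nonempty subsets are held in a dictionary keyed by `frozenset`, and the name `strong_rules` and the sample counts are hypothetical.

```python
from itertools import combinations


def strong_rules(support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every strong rule.

    support_count: dict mapping frozenset itemsets to support counts; it must
    contain every frequent itemset together with all of its nonempty subsets.
    """
    rules = []
    for itemset, count in support_count.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # Confidence = support_count(whole itemset) / support_count(antecedent).
                confidence = count / support_count[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical counts: nausea in 4 records, diarrhea in 5, both together in 4.
toy_counts = {
    frozenset({"nausea"}): 4,
    frozenset({"diarrhea"}): 5,
    frozenset({"nausea", "diarrhea"}): 4,
}
rules = strong_rules(toy_counts, 0.9)
```

Here only "nausea ⇒ diarrhea" survives (confidence 4/4), while "diarrhea ⇒ nausea" has confidence 4/5 and falls below the 0.9 threshold.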

Further, in Step 3, result data are stored in a result table according to the strong association rules, with disease type, time of illness, number of patients and geographical location as members of the result table's column family.

Preferably, in Step 4, a layer is added on Baidu Map, and the geographical distribution of a disease in space, time and population is displayed as a heat map, where the coverage area of the layer is derived from the geographical location and the layer transparency is derived from the number of patients in that area.
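One way to derive a layer's visual weight from the patient count is a linear scale, sketched below. The text does not specify the mapping, so the linear form, the opacity bounds and the name `layer_opacity` are all assumptions; no map API is called here.

```python
def layer_opacity(patient_count, max_count, min_opacity=0.1, max_opacity=0.9):
    """Map a region's patient count to an opacity in [min_opacity, max_opacity].

    Assumption: more patients means a more opaque (less transparent) layer,
    scaled linearly against the largest count among all regions.
    """
    if max_count <= 0:
        return min_opacity
    ratio = min(max(patient_count / max_count, 0.0), 1.0)
    return min_opacity + ratio * (max_opacity - min_opacity)
```

A region with half the maximum count would then be drawn at the midpoint of the opacity range.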

Further, before Step 1, the method also includes initializing the layer data.

Further, in Step 4, secondary interface development is carried out on Baidu Map to achieve interconnection and application response between the data model and the GIS map.

The present invention has the following advantages:

1. The invention uses GIS maps and a big data framework to show the geographical distribution of diseases in space and time, providing basic data for studying the epidemic patterns of diseases and exploring their causes;

2. It integrates electronic medical record data from different platforms, solving the earlier problems of scattered data, low efficiency, and difficulty in data integration and comprehensive analysis;

3. Describing the distribution of the disease cloud map data reveals the basic characteristics of disease prevalence, which is valuable information for clinical diagnosis;

4. Analysis of disease distribution patterns and their determinants helps provide a scientific basis for formulating sound disease prevention and control and health care strategies and measures.

Brief description of the drawings

Fig. 1 shows the geographical distribution of epigastric pain;

Fig. 2 shows the geographical distribution of cough.

Detailed description of the invention

To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

Embodiment 1

The method for building a disease cloud map based on big data analysis disclosed in this embodiment uses GIS maps and a big data framework to show the geographical distribution of diseases in space and time, and comprises the following steps:

Step 1. Data collection: obtain electronic medical record data. The electronic medical record data come from a big data acquisition platform. The platform uses cloud-based medical data acquisition to collect clinical medical record data from more than 80 hospitals. Data are collected and processed as XML files, and the platform provides a unified, convenient upload interface that supports real-time queries of file processing status, batch upload management and rollback of problem data, while remaining compatible with other data formats and interface modes.
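Reading one uploaded XML record might look like the sketch below. The text only states that data are collected as XML files; the element names, the attribute names and the sample record are hypothetical, and only the Python standard library parser is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the actual schema is not given in the text.
SAMPLE_XML = """
<record>
  <patient_id>1</patient_id>
  <disease_name>cough</disease_name>
  <visit_date>2016-01-05</visit_date>
  <hospital lng="104.06" lat="30.67"/>
</record>
"""


def parse_record(xml_text):
    """Extract the fields used by later steps from one XML medical record."""
    root = ET.fromstring(xml_text.strip())
    hospital = root.find("hospital")
    return {
        "patient_id": root.findtext("patient_id"),
        "disease_name": root.findtext("disease_name"),
        "visit_date": root.findtext("visit_date"),
        "hospital_coords": (
            float(hospital.get("lng")),
            float(hospital.get("lat")),
        ),
    }


parsed = parse_record(SAMPLE_XML)
```

Keeping only the fields needed downstream also limits how much patient information is carried through the pipeline.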

Step 2. Data cleaning and processing: convert and clean the data obtained in Step 1. Conversion includes converting nonstandard disease names, for example converting 'slight pain in the gastric cavity' to 'epigastric pain', so that nonstandard names are unified into standard disease names. Cleaning then rejects data that are unreasonable or may affect the prediction results, for example by deleting records whose disease name is empty.
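The conversion and cleaning step can be sketched as a lookup plus a filter. The mapping table, the field name `disease_name` and the function name `clean_records` are hypothetical; a real mapping would come from a curated terminology.

```python
# Hypothetical mapping from nonstandard to standard disease names.
NAME_MAP = {"slight pain in the gastric cavity": "epigastric pain"}


def clean_records(records, name_map):
    """Normalize disease names and drop records whose disease name is empty."""
    cleaned = []
    for rec in records:
        name = (rec.get("disease_name") or "").strip()
        if not name:
            continue  # cleaning: reject records with an empty disease name
        normalized = dict(rec)  # copy so the input records stay untouched
        normalized["disease_name"] = name_map.get(name, name)
        cleaned.append(normalized)
    return cleaned


raw = [
    {"disease_name": "slight pain in the gastric cavity"},
    {"disease_name": ""},
    {"disease_name": None},
    {"disease_name": "cough"},
]
cleaned = clean_records(raw, NAME_MAP)
```

Names that are absent from the mapping pass through unchanged, so the table can grow incrementally.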

Step 4. Call GIS map data and show the distribution of diseases in space and time according to the data model. This embodiment uses front-end technology to read the result set produced by back-end data mining and then presents the distribution of each disease on a GIS layer. Front-end technology, also called front-end development, here means developing the front-end page in Java, reading the back-end data, and presenting them on the front-end page.

The front-end page can be an existing front-end platform, or a dedicated front-end page can be built for the disease cloud map as follows:

1. Write the front-end HTML page and call the Baidu Map interface from the Java code to load the base map;

2. Write a layer in code, defining input parameters such as time, disease type, number of patients and region, together with the grading of the parameters;

3. Write a database interface function in the Java code that queries the data mining result set for records meeting the conditions and passes them to the parameters;

4. Once the parameters have received the data, display them on the GIS map according to the association rules established in Step 3.

Parameter grading: different colors represent different levels of severity, with minimum = 0 and maximum = total number of patients. Regions are graded by their share of patients, each grade is assigned a color, and the color corresponding to each region's grade is displayed on the GIS map.
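Grading by share of patients can be sketched with equal-width bins. The number of levels, the equal-width binning, the palette beyond red and the name `severity_level` are assumptions; the only detail taken from the text is that level 1 is the most severe and is later described as red.

```python
def severity_level(patient_count, total_patients, levels=4):
    """Return a level from 1 (largest share of patients) to `levels` (smallest)."""
    if total_patients <= 0:
        return levels
    share = min(max(patient_count / total_patients, 0.0), 1.0)
    # Equal-width bins over [0, 1]; the top bin maps to level 1.
    bin_index = min(int(share * levels), levels - 1)
    return levels - bin_index


# Hypothetical palette; only "level 1 = red" follows the embodiment.
LEVEL_COLORS = {1: "red", 2: "orange", 3: "yellow", 4: "green"}

region_color = LEVEL_COLORS[severity_level(90, 100)]
```

Each region's color is then looked up from its level and applied to the corresponding area of the map layer.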

Data with missing attribute values are handled in one of two ways: Mode 1, treat the missing value as a value in its own right; Mode 2, ignore the attribute. Judging from the data, an incomplete disease name whose attribute can still be determined is kept; an attribute too ambiguous to be determined is ignored.

Further, Step 3 mines frequent itemsets using the following steps:

3.1 Traverse all candidate symptom attributes of a disease and determine the support frequency of each attribute; all candidate symptom attributes form the candidate 1-itemsets H_1. The frequency is the number of occurrences, obtained by counting.

3.2 Set the minimum support frequency min_support, compare the support frequency of every attribute in H_1 with min_support, and let the attributes in H_1 whose support frequency exceeds min_support form the frequent 1-itemsets F_1;

(2) Scan the attributes in H_{k+1}, calculate the support frequency of each, and let all attributes in H_{k+1} whose support frequency exceeds min_support form the frequent (k+1)-itemsets F_{k+1};

(3) When the support frequency of every attribute element in the itemsets is below min_support, the algorithm terminates;

$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$

In formula (1), support_count(H ∪ F) is the number of transactions containing the itemset H ∪ F, and support_count(H) is the number of transactions containing the itemset H, where H is a candidate itemset and F is a frequent itemset. If Confidence(H ⇒ F) ≥ min_conf, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.

Because the rules are generated directly from frequent itemsets, every itemset involved in an association rule satisfies the minimum support threshold. The frequent itemsets and their support frequencies can be stored in a list so that they can be accessed quickly.

Further, in Step 3, result data are stored in a result table according to the strong association rules, with disease type, time of illness, number of patients and geographical location as members of the result table's column family.

Preferably, in Step 4, a layer is added on Baidu Map, and the geographical distribution of a disease in space, time and population is displayed as a heat map, where the coverage area of the layer is derived from the geographical location and the layer transparency is derived from the number of patients in that area.

Further, the layer data are initialized before Step 1.

Further, in Step 4, secondary interface development is carried out on Baidu Map to achieve interconnection and application response between the data model and the GIS map.

Embodiment 2

This embodiment describes the method of the invention in detail, taking epigastric pain as an example.

(1) Preprocessing of the original medical record data

The data are the medical records of 1000 patients with epigastric pain in a hospital's big data warehouse. All data come from actual medical records; setting aside human factors introduced by operators during entry, collection and extraction, this medical record information is real and credible medical data.

Because medical data are complex, diverse and redundant, the patients' medical record data were first preprocessed to keep the data mining process from becoming chaotic and to obtain more accurate experimental results. The original medical record data have 36 attributes. Inspection showed that the raw data contain a great deal of noisy and redundant data. For example, the attribute "disease pain duration" has more than 10 distinct and highly irregular values, such as "0", "5～6 minutes", "30 minutes" and "1～10 minutes"; the values of the attributes "paroxysmal nocturnal dyspnea" and "pain radiation site" are also very irregular, including null values, discrete 0/1 data, and text such as "cannot lie flat", "both shoulders and back" and "shoulder, back, neck, lower jaw and left arm down to the left fingertips"; and some attributes are missing in the vast majority of records.

To address this, this embodiment preprocesses the data as follows:

1. Normalizing attribute values: an attribute whose values include both text and discrete data is quantified and classified according to its text descriptions and converted into a categorical attribute. For example, for "disease pain duration", irregular descriptions such as "0", "5～6 minutes" and "30 minutes" are quantified as "0" (no pain), "1" (pain lasting less than 10 minutes), "2" (pain lasting more than 10 minutes and less than 30 minutes) and "3" (pain lasting more than 30 minutes). For "paroxysmal nocturnal dyspnea", the value "cannot lie flat" is assigned a third value "2" in addition to "0" and "1";

2. Missing attribute values can be handled in two ways: treat the missing value as a value in its own right, or ignore the attribute. Which to choose depends on the specific case. For example, the attribute "pathological Q wave" has no value at all in 450 records of the raw data, so this attribute is ignored; the attributes "arrhythmia" and "electrocardiogram" have only a few missing values, and general medical knowledge indicates that both are important in diagnosing epigastric pain, so their missing values are set to "no abnormality" during processing.
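The quantization of the pain-duration attribute can be sketched as below. The text leaves the boundary cases open, so two choices here are assumptions: a range such as "5~6 minutes" is represented by its upper bound, and exactly 10 and exactly 30 minutes fall into categories 2 and 3 respectively. The function name `pain_duration_category` is hypothetical.

```python
import re


def pain_duration_category(raw):
    """Map a free-text pain duration (minutes) to a code 0-3, or None if missing."""
    if raw is None or not str(raw).strip():
        return None  # missing value: left for the missing-value policy to handle
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", str(raw))]
    if not numbers:
        return None
    # Assumption: a range is represented by its upper bound.
    minutes = max(numbers)
    if minutes == 0:
        return 0  # no pain
    if minutes < 10:
        return 1  # less than 10 minutes
    if minutes < 30:
        return 2  # 10 minutes up to, but not including, 30 minutes
    return 3      # 30 minutes or more (boundary placement is an assumption)


codes = [pain_duration_category(v) for v in ["0", "5~6 minutes", "30 minutes", "1~10 minute", ""]]
```

Returning `None` for empty input keeps the two missing-value modes described in item 2 as a separate decision.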

In addition to the above processing, attention must be paid to protecting patient privacy, ensuring that the data used do not expose any private patient information.

(2) Mining of frequent itemsets

Choice of model algorithm: a suitable algorithm must be chosen for a given model. This embodiment takes the Apriori algorithm as a reference and optimizes it appropriately.

Table 1 is the medical record information table for epigastric pain. Epigastric pain has 6 symptom and sign attributes. To mine the associations between the symptom attributes, an improved Apriori algorithm is used to mine association rules from the data.

Table 1: Medical record information of epigastric pain patients

The attributes in Table 1 and the meaning of their values are as follows:

A = hiccups; B = flatulence; C = nausea; D = vomiting; E = diarrhea; F = chest tightness; G = history of stomach disease; H = worse after fatigue or activity; I = remits spontaneously after rest; J = relieved by drugs such as buccal nitrates; K = paroxysmal pain.

An attribute value of "1" means the symptom is present, and "0" means it is absent. For example, the medical record of patient No. 1 shows that this patient has hiccups, nausea and diarrhea.

The improved Apriori algorithm is used to analyze the associations between the symptom attributes in Table 1. The key step is to use the frequent (k-1)-itemsets F_{k-1} to generate the frequent k-itemsets F_k, which proceeds in two stages: first, join any two itemsets in F_{k-1} to obtain the candidate set C_k; then screen each element of C_k, since not every itemset in C_k is necessarily frequent, and let the itemsets that qualify form F_k.
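The join stage, followed by the standard Apriori subset check, can be sketched as follows. The subset-based pruning shown here is the textbook Apriori property and is not confirmed to be the "improved" variant the embodiment refers to; the surviving candidates would still need their support counted against the records. The name `generate_candidates` and the sample itemsets are hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Build candidate k-itemsets C_k from the frequent (k-1)-itemsets F_{k-1}."""
    prev = set(frequent_prev)
    # Join: union every pair of (k-1)-itemsets whose union has exactly k items.
    joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
    # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are frequent.
    return {
        c
        for c in joined
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


# Hypothetical frequent 2-itemsets over the attribute letters A-D.
f2 = [frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")]
c3 = generate_candidates(f2, 3)
```

In this example {A, B, D} and {B, C, D} are produced by the join but discarded, because {A, D} and {C, D} are not frequent; only {A, B, C} remains for support counting.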

The mining of frequent itemsets is described in detail below using this medical record data:

2.1 Traverse all candidate symptom attributes once and determine the support frequency of each; all attributes form the candidate 1-itemsets H_1;

2.2 Set the minimum support frequency min_support = 10% and compare the support frequency of every attribute in H_1 with min_support; all attributes whose support frequency exceeds min_support form the frequent 1-itemsets F_1. For this medical record data, the elements of F_1 and their support frequencies are shown in Table 2:

Table 2: Frequent 1-itemsets

Attribute	Support (%)
Hiccups	40.909
Flatulence	54.541
Nausea	27.273
Vomiting	63.636
Diarrhea	63.636
Chest tightness	54.541
History of stomach disease	63.636
Worse after fatigue or activity	22.727
Remits spontaneously after rest	27.273
Relieved by drugs such as buccal nitrates	22.727
Paroxysmal pain	40.909
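Support percentages like those in Table 2 can be computed from 0/1 attribute rows as sketched below. The rows here are hypothetical; the number of records behind Table 2 is not stated, although 9 positives out of 22 records would reproduce the 40.909 listed for hiccups.

```python
def attribute_support_percent(rows, attributes):
    """Return {attribute: support in percent} for rows of 0/1 attribute values."""
    n = len(rows)
    return {
        a: round(100.0 * sum(row[a] for row in rows) / n, 3)
        for a in attributes
    }


# Hypothetical data: 22 records, 9 of which have hiccups.
toy_rows = [{"hiccups": 1}] * 9 + [{"hiccups": 0}] * 13
support = attribute_support_percent(toy_rows, ["hiccups"])

# Attributes above the 10% threshold go into the frequent 1-itemsets.
frequent_1 = [a for a, pct in support.items() if pct > 10.0]
```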

2.3 Use the join operation (F_1) ⊕ (F_1) to determine the candidate 2-itemsets H_2. The number of itemsets in the frequent 1-itemsets is |F_1|, and the number of attributes in C_2 is determined by |F_1|; C_2 is the candidate set of the frequent 2-itemsets, i.e. C_2 includes H_1 and H_2;

2.4 Scan the attributes in H_2 and calculate the support frequency of each;

2.5 Let all attributes from 2.4 whose support frequency exceeds min_support form the frequent 2-itemsets F_2. In this embodiment, the frequent 2-itemsets are shown in Table 3 (because they are numerous for these records, only a portion is shown as an example):

Table 3: Frequent 2-itemsets

2.6 Iterating the above algorithm generates the frequent itemsets F_3 to F_5 in turn; the process is omitted. The frequent 5-itemsets F_5 are shown in Table 4:

Table 4: Frequent 5-itemsets

2.7 When the algorithm iterates once more, the itemsets contain only one element, whose computed support frequency is 9.091, below min_support; no new itemsets are found, and the algorithm terminates.

(3) Extraction of strong association rules

Once all frequent itemsets have been mined from the database, the corresponding association rules are easy to obtain. That is, the strong association rules satisfying the minimum support and the minimum confidence are generated, and formula (1) can be used to calculate the confidence of the resulting rules. The conditional probability here is calculated from the supports of the itemsets.

$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$

Because association rules are generated directly from frequent itemsets, every itemset involved satisfies the minimum support threshold. Frequent itemsets and their support frequencies can be stored in a list for quick access. This embodiment applies the Apriori algorithm to the collected medical records of 1000 epigastric pain patients, with minimum support min_support = 30% and minimum confidence min_conf = 90%; part of the resulting strong association rules is shown in Table 5:

Table 5: Strong association rules between symptom attributes in the medical records of epigastric pain patients

If a patient's symptoms match any of the rules listed above, the patient is considered to have epigastric pain. For example, if a patient initially presents with nausea, diarrhea and chest tightness together, and these symptoms are relieved after taking drugs such as buccal nitrates, the patient is considered to have epigastric pain. Once a patient is determined to have epigastric pain, the patient's time of illness and geographical location are obtained and the number of patients is counted.

In this way, all patients with epigastric pain in the database are identified and counted, and a "disease type - time of illness - number of patients - geographical location" result table is built (as shown in Table 6):

Table 6: Result table

Because the time of illness and the location of illness are difficult to record accurately, Table 6 of this embodiment uses the coordinates of the hospital visited as the geographical location and the time of the visit as the time of illness.
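Building the result table from matched patients can be sketched as a grouped count. This is a sketch under assumptions: a rule is modeled as the full set of symptoms it involves and a patient "matches" when all of them are present; the field names, the name `build_result_table` and the sample patients are hypothetical.

```python
from collections import Counter


def build_result_table(patients, rules, disease):
    """Count patients matching any rule, grouped by visit date and hospital coordinates."""
    counts = Counter()
    for p in patients:
        symptoms = set(p["symptoms"])
        # A patient matches when every symptom of at least one rule is present.
        if any(set(rule) <= symptoms for rule in rules):
            counts[(disease, p["visit_date"], p["hospital_coords"])] += 1
    return [
        {"disease": d, "time": t, "patients": n, "location": loc}
        for (d, t, loc), n in sorted(counts.items())
    ]


toy_rules = [{"nausea", "diarrhea", "chest tightness"}]
toy_patients = [
    {"symptoms": ["nausea", "diarrhea", "chest tightness"],
     "visit_date": "2016-01-05", "hospital_coords": (104.06, 30.67)},
    {"symptoms": ["nausea", "diarrhea", "chest tightness", "hiccups"],
     "visit_date": "2016-01-05", "hospital_coords": (104.06, 30.67)},
    {"symptoms": ["cough"],
     "visit_date": "2016-01-05", "hospital_coords": (104.06, 30.67)},
]
table = build_result_table(toy_patients, toy_rules, "epigastric pain")
```

Each output row carries the four result-table members: disease type, time, number of patients and location.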

Parameter grading and color coding (as shown in Table 7) are illustrated in this embodiment using epigastric pain as an example:

Table 7: Parameter grading for epigastric pain

In this embodiment red represents level 1, i.e. a red area on the map indicates that the disease situation in that region is severe. GIS map data are called, and the geographical distribution of epigastric pain in space, time and population is displayed according to the data in Table 6; the result is shown in Fig. 1.

In the same way, the geographical distribution of cough in space, time and population can be displayed; the result is shown in Fig. 2.

Further, the front end provides a start time, an end time and a disease type; the start and end times are two time controls. After a time period and a disease type are selected at the front end, the back-end program passes the number of patients and the geographical coordinates from the result table that match the time period and disease type to the layer, which is then displayed on the map. Further, patients' age groups are distinguished in the result table, with age group as a member of the result table's column family, to show the distribution of the disease across age groups.
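The back-end selection for a chosen time period and disease type can be sketched as a simple filter over result-table rows. The row layout, the use of `datetime.date` values and the name `query_layer_points` are assumptions made for illustration.

```python
from datetime import date


def query_layer_points(result_table, disease, start, end):
    """Return location and patient count for rows matching disease and period."""
    return [
        {"location": row["location"], "patients": row["patients"]}
        for row in result_table
        if row["disease"] == disease and start <= row["time"] <= end
    ]


# Hypothetical result-table rows.
sample_table = [
    {"disease": "cough", "time": date(2016, 1, 5), "patients": 12, "location": (104.06, 30.67)},
    {"disease": "cough", "time": date(2016, 3, 1), "patients": 7, "location": (104.10, 30.60)},
    {"disease": "epigastric pain", "time": date(2016, 1, 6), "patients": 4, "location": (104.06, 30.67)},
]
points = query_layer_points(sample_table, "cough", date(2016, 1, 1), date(2016, 1, 31))
```

An age-group filter could be added in the same way if the rows also carried an age-group field.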

Owing to the combined effect of many factors, such as pathogenic factors, population characteristics, and the natural and social environment, the epidemic intensity of a disease differs between populations, regions and times, and its patterns of occurrence are not entirely the same. The study of disease distribution reflects both the biological characteristics of the disease itself and the effects and interactions of the various internal and external environmental factors related to it. The present invention uses GIS maps and a big data framework to show the geographical distribution of diseases in space, time and population. It integrates electronic medical record data from different platforms, solving the earlier problems of scattered data, low efficiency, and difficulty in data integration and comprehensive analysis; it provides basic data for studying the epidemic patterns of diseases and exploring their causes; describing the distribution of the disease cloud map data reveals the basic characteristics of disease prevalence, which is valuable information for clinical diagnosis; and analysis of disease distribution patterns and their determinants helps provide a scientific basis for formulating sound disease prevention and control and health care strategies and measures.

Of course, the present invention may have many other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the invention, and all such changes and modifications shall fall within the protection scope of the appended claims.

Claims

1. A method for building a disease cloud map based on big data analysis, characterized in that it uses GIS maps and a big data framework to show the distribution of diseases in space and time, and comprises the following steps:

Step 1. Data collection: obtain electronic medical record data;

Step 2. Convert and clean the data obtained in Step 1; conversion includes converting nonstandard disease names into standard disease names; cleaning rejects data that are unreasonable or may affect the prediction results;

Step 3. Apply data mining to perform association analysis on the data converted and cleaned in Step 2, mine frequent itemsets, derive association rules between symptom attributes from the frequent itemsets, and build a "disease type - time of illness - number of patients - geographical location" result set according to the association rules;

Step 4. Call GIS map data, read the result set obtained in Step 3, and show the distribution of diseases in space and time.

2. The method of claim 1, characterized in that in Step 2, attribute values are normalized for data that have attribute values: an attribute whose values include both text and discrete data is quantified and classified according to the text and converted into a categorical attribute; an attribute with only text values is characterized according to its business meaning and converted into a categorical attribute; an attribute with only discrete data is deleted.

3. The method of claim 1, characterized in that in Step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence are taken as strong association rules; if a patient's symptoms satisfy a strong association rule over the symptom attributes of a disease, the patient is considered to have that disease, the patient's time of illness and geographical location are then obtained, and the number of patients is counted.

4. The method of claim 3, characterized in that Step 3 mines frequent itemsets using the following steps:

3.1 Traverse all candidate symptom attributes of a disease and determine the support frequency of each attribute; all candidate symptom attributes form the candidate 1-itemsets H_1;

3.2 Set a minimum support frequency, compare the support frequency of every attribute in H_1 with it, and let the attributes in H_1 whose support frequency exceeds the minimum support frequency form the frequent 1-itemsets F_1;

(1) Use the join operation (F_k) ⊕ (F_k) to determine the candidate (k+1)-itemsets H_{k+1}, where k = 1, 2, ..., n;

(2) Scan the attributes in H_{k+1}, calculate the support frequency of each, and let all attributes in H_{k+1} whose support frequency is greater than or equal to the minimum support frequency form the frequent (k+1)-itemsets F_{k+1};

The number of itemsets in F_k is |F_k|, and the number of attributes in C_{k+1} is determined by |F_k|; C_{k+1} is the candidate set of the frequent itemsets, i.e. C_{k+1} includes H_1, ..., H_{k+1};

According to the mined frequent itemsets F_{k+1}, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where k = 0, 1, ..., n.

5. The method of claim 4, characterized in that the confidence of the resulting association rules is calculated using formula (1):

$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$

In formula (1), support_count(H ∪ F) is the number of transactions containing the itemset H ∪ F, and support_count(H) is the number of transactions containing the itemset H, where H is a candidate itemset and F is a frequent itemset.

6. The method of claim 1, 3 or 4, characterized in that in Step 3, result data are stored in a result table according to the strong association rules, with disease type, time of illness, number of patients and geographical location as members of the result table's column family.

7. The method of claim 1, characterized in that in Step 4, a layer is added on Baidu Map, and the geographical distribution of a disease in space, time and population is displayed as a heat map, where the coverage area of the layer is derived from the geographical location and the layer transparency is derived from the number of patients in that area.

8. The method of claim 7, characterized in that it further comprises, before Step 1, initializing the layer data.

9. The method of claim 1 or 7, characterized in that in Step 4, secondary interface development is carried out on Baidu Map to achieve interconnection and application response between the data model and the GIS map.