CN106202883A - A kind of method setting up disease cloud atlas based on big data analysis - Google Patents

Info

Publication number
CN106202883A
Authority
CN
China
Prior art keywords
data
disease
attribute
support
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610497249.7A
Other languages
Chinese (zh)
Inventor
温川飙
程小恩
贾帅
杨东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Traditional Chinese Medicine
Original Assignee
Chengdu University of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Traditional Chinese Medicine
Priority to CN201610497249.7A
Publication of CN106202883A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present invention relates to a method for building a disease cloud map based on big data analysis. Using GIS maps and a big data architecture, it shows the geographical distribution of diseases in space and time, and comprises the following steps. Step 1, data collection: obtain electronic medical record data. Step 2: data conversion and cleaning. Step 3: apply data mining to the data converted and cleaned in step 2 to perform association analysis, mine frequent item sets, derive association rules between symptom attributes from the frequent item sets, and build a "disease type - onset time - number of patients - geographical location" result set according to the association rules. Step 4: call the GIS map data and display the geographical distribution of the disease in space and time according to the result set. The invention provides basic data for studying the epidemic patterns of diseases and exploring their causes; analyzing the patterns and determinants of disease distribution helps provide a scientific basis for formulating sound disease prevention and control and health care strategies and measures.

Description

A method for building a disease cloud map based on big data analysis
Technical field
The present invention relates to the technical field of disease prediction and analysis, and in particular to a method for building a disease cloud map based on big data analysis.
Background
Medical diagnostic information contains a large amount of information about patients' conditions and individual circumstances, including the patient's medical history and the relationships between a given disease and its various symptoms. Performing association analysis on these medical data and mining the many association rules hidden in them, such as relationships between symptom attributes, can help predict the development trend of a disease and assist doctors in assessing health status and making diagnoses, which is of great significance for disease prevention and treatment.
In data mining, if some regularity exists between the values of two or more variables, it is called an association. An association reflects the dependence between one event and other events, or how frequently they occur together. Associations can be divided into simple associations, temporal associations, causal associations, quantitative associations, and so on. Associations between data are complex, and most are hidden in massive data and cannot be learned by direct observation. Association analysis is used to discover valuable relationships hidden in large data sets. The relationships it finds can be expressed as association rules or frequent item sets. An association rule refers to the correlation between different attribute items that occur at the same time. In the broad sense, association analysis is the essence of data mining: since the purpose of data mining is to discover the knowledge hidden behind the data, that knowledge must reflect relationships between different objects. In the narrow sense, association analysis refers to a specific class of data mining techniques whose main purpose is to mine associations between objects in a database. Association rule mining extracts, from large amounts of data, valuable knowledge describing the interrelations between data items.
GIS can store and analyze massive spatial data and can output analysis results quickly and intuitively, so it has become a powerful aid to modern medical research and is gradually being applied to disease prevention and control. Most epidemiological data have spatial attributes: the morbidity and infection status of human and animal populations, the distribution of hosts and vectors, temperature and humidity, rainfall, soil, sanitation facilities, and so on are all related to geographical location. GIS can link these data through spatial relationships and display and analyze them interactively, which also provides a basis for later statistical analysis.
Summary of the invention
The present invention aims to provide a method for building a disease cloud map based on big data analysis.
To achieve this purpose, the technical solution adopted by the present invention is as follows:
A method for building a disease cloud map based on big data analysis uses GIS maps and a big data architecture to show the distribution of diseases in space and time, and comprises the following steps:
Step 1. Data collection: obtain electronic medical record data;
Step 2. Data cleaning and processing: convert and clean the data obtained in step 1. Conversion includes converting nonstandard disease names so that they are unified into standard disease names; cleaning removes unreasonable data that may affect the prediction results;
Step 3. Apply data mining to the data converted and cleaned in step 2 to perform association analysis, mine frequent item sets, derive association rules between symptom attributes from the frequent item sets, and build a "disease type - onset time - number of patients - geographical location" result set according to the association rules;
Step 4. Call the GIS map data, read the result set obtained in step 3, and display the distribution of the disease in space and time according to the result set.
Further, in step 2, attribute values are normalized for data that have attribute values: an attribute whose values include both text and discrete data is quantified and classified according to the text and converted into a categorical attribute; an attribute with only text values is converted into a categorical attribute according to its business meaning; an attribute with only discrete data is deleted;
Data with missing attribute values are handled in one of two ways. Mode 1: treat the missing value as a value in its own right. Mode 2: ignore the attribute.
Further, in step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence are taken as strong association rules. If a patient's symptoms satisfy a strong association rule among the symptom attributes of a certain disease, the patient is considered to have that disease; the patient's onset time and geographical location are then obtained, and the number of patients is counted.
Further, step 3 mines frequent item sets using the following steps:
3.1 Traverse all candidate symptom attributes of a certain disease and determine the support count of each attribute; all candidate symptom attributes form the candidate 1-item set $H_1$;
3.2 Set the minimum support count and compare the support count of every attribute in $H_1$ with it; the attributes in $H_1$ whose support count is greater than the minimum support count form the frequent 1-item set $F_1$;
3.3 Then mine the frequent (k+1)-item set $F_{k+1}$ as follows:
(1) Use the join operation $F_k \oplus F_k$ to determine the candidate (k+1)-item set $H_{k+1}$, where $k = 1, 2, \ldots, n$;
(2) Scan the attributes in $H_{k+1}$, compute the support count of each attribute, and form the frequent (k+1)-item set $F_{k+1}$ from all attributes in $H_{k+1}$ whose support count is greater than or equal to the minimum support count;
The number of item sets in $F_k$ is $|F_k|$, so $C_{k+1}$ contains […] attributes; $C_{k+1}$ is the candidate set of frequent item sets, i.e. $C_{k+1}$ includes $H_1, \ldots, H_{k+1}$;
(3) The algorithm terminates when the support counts of all attribute elements in the item set are less than the minimum support count;
From the mined frequent item sets $F_{k+1}$, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where $k = 0, 1, \ldots, n$.
Further, formula (1) is used to compute the confidence of the resulting association rules:
$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing the item set H ∪ F, and support_count(H) is the number of transactions containing the item set H, where H is a candidate item set and F is a frequent item set. If $\mathrm{Confidence}(H \Rightarrow F) \ge \mathrm{min\_conf}$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
According to this formula, association rules can be generated as follows: for each frequent item set L, generate all nonempty subsets of L; for each nonempty subset S of L, if $\mathrm{support\_count}(L) / \mathrm{support\_count}(S) \ge \mathrm{min\_conf}$, output the strong association rule $S \Rightarrow (L - S)$, where min_conf is the configured minimum confidence threshold.
Because the rules are generated directly from frequent item sets, every item set involved in an association rule satisfies the minimum support threshold. Frequent item sets and their support counts can be stored in a list so that they can be accessed quickly. The Apriori algorithm is used to mine association rules from the collected medical record information.
Further, in step 3, the result data are stored in a result table according to the strong association rules, with disease type, onset time, number of patients, and geographical location as column family members of the result table.
Preferably, in step 4, a layer is added on Baidu Maps, and the geographical distribution of a disease across space, time, and population is shown as a heat map; the coverage area of the layer is derived from the geographical location, and the layer transparency is derived from the number of patients in that area.
Further, the layer data are initialized before step 1.
Further, step 4 performs secondary development on the Baidu Maps interface to connect the data model with the GIS map and to support application responses.
The present invention has the following advantages:
1. The invention uses GIS maps and a big data architecture to show the geographical distribution of diseases in space and time, providing basic data for studying the epidemic patterns of diseases and exploring their causes;
2. It integrates electronic medical record data from different platforms, solving the earlier problems of scattered data, low efficiency, and difficulty in data integration and comprehensive analysis;
3. Describing the distribution of the disease cloud map data reveals the basic characteristics of disease prevalence, which is valuable information for clinical diagnosis;
4. Analyzing the patterns and determinants of disease distribution helps provide a scientific basis for formulating sound disease prevention and control and health care strategies and measures.
Brief description of the drawings
Fig. 1 shows the geographical distribution of epigastric pain;
Fig. 2 shows the geographical distribution of cough.
Detailed description of the invention
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings.
Embodiment 1
This embodiment discloses a method for building a disease cloud map based on big data analysis, which uses GIS maps and a big data architecture to show the geographical distribution of diseases in space and time, and comprises the following steps:
Step 1. Data collection: obtain electronic medical record data. Further, the electronic medical record data come from a big data collection platform. The platform uses cloud-based medical data collection technology to gather clinical medical record data from more than 80 hospitals. Data are collected and processed as XML files, and the platform provides a unified, convenient upload interface that supports real-time querying of file processing status, upload batch management, and rollback of problem data, while remaining compatible with other data formats and interface modes.
Step 2. Data cleaning and processing: convert and clean the data obtained in step 1. Conversion includes converting nonstandard disease names, for example converting 'mild epigastric pain' to 'epigastric pain', so that nonstandard disease names are unified into standard ones. Cleaning then removes unreasonable data that may affect the prediction results, for example by deleting records whose disease name is empty.
Step 3. Apply data mining to the data converted and cleaned in step 2 to perform association analysis, mine frequent item sets, derive association rules between symptom attributes from the frequent item sets, and build a "disease type - onset time - number of patients - geographical location" result set according to the association rules;
Step 4. Call the GIS map data and display the distribution of the disease in space and time according to the data model. This embodiment uses front-end technology to read the result set produced by back-end data mining and then presents the distribution of the disease on a GIS layer. Front-end technology, also called front-end development, here means developing the front-end page in Java, reading the back-end data, and presenting them on the front-end page.
The front-end page may be an existing front-end platform, or a separate front-end page may be built for the disease cloud map as follows:
1. Write the front-end HTML page and call the Baidu Maps interface from the Java code to load the base map;
2. Write a layer in the code and define its input parameters, such as time, disease type, number of patients, and region, together with the binning of the parameters;
3. Write a database interface function in the Java code that queries the data mining result set for records matching the conditions and passes them to the parameters;
4. After the parameters receive the data, display them on the GIS map according to the association rules established in step 3.
Parameter binning: different colors represent different severity levels. With minimum = 0 and maximum = total number of patients, the values are binned by proportion and each bin is assigned a color; each region is then displayed on the GIS map in the color of the bin it falls in.
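The binning described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the thresholds, the number of levels, and the color names in `LEVELS`, as well as the function name `grade_regions`, are assumptions made for the example.

```python
# Hypothetical bins: (minimum share of all patients, level, colour).
# The source does not reproduce its thresholds, so these values are assumptions.
LEVELS = [(0.5, 1, "red"), (0.2, 2, "orange"), (0.05, 3, "yellow"), (0.0, 4, "green")]


def grade_regions(patients_by_region):
    """Assign each region a (level, colour) from its share of all patients."""
    total = sum(patients_by_region.values())  # maximum of the scale = total patients
    graded = {}
    for region, count in patients_by_region.items():
        share = count / total if total else 0.0
        # LEVELS is ordered from the highest threshold down, so the first match wins.
        for threshold, level, colour in LEVELS:
            if share >= threshold:
                graded[region] = (level, colour)
                break
    return graded
```

Binning by share rather than by absolute count keeps the color scale comparable between diseases with very different total case numbers.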
Further, in step 2, attribute values are normalized for data that have attribute values: an attribute whose values include both text and discrete data is quantified and classified according to the text and converted into a categorical attribute; an attribute with only text values is converted into a categorical attribute according to its business meaning; an attribute with only discrete data is deleted;
Data with missing attribute values are handled in one of two ways. Mode 1: treat the missing value as a value in its own right. Mode 2: ignore the attribute. The choice depends on the data: if the disease name is complete and the attribute can be determined, the attribute is kept; if the attribute is ambiguous and cannot be determined, it is ignored.
Further, in step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence are taken as strong association rules. If a patient's symptoms satisfy a strong association rule among the symptom attributes of a certain disease, the patient is considered to have that disease; the patient's onset time and geographical location are then obtained, and the number of patients is counted.
Further, step 3 mines frequent item sets using the following steps:
3.1 Traverse all candidate symptom attributes of a certain disease and determine the support count of each attribute; all candidate symptom attributes form the candidate 1-item set $H_1$. The support count is simply the number of occurrences, obtained by counting.
3.2 Set the minimum support count min_support and compare the support count of every attribute in $H_1$ with min_support; the attributes in $H_1$ whose support count is greater than min_support form the frequent 1-item set $F_1$;
3.3 Then mine the frequent (k+1)-item set $F_{k+1}$ as follows:
(1) Use the join operation $F_k \oplus F_k$ to determine the candidate (k+1)-item set $H_{k+1}$, where $k = 1, 2, \ldots, n$;
(2) Scan the attributes in $H_{k+1}$, compute the support count of each attribute, and form the frequent (k+1)-item set $F_{k+1}$ from all attributes in $H_{k+1}$ whose support count is greater than min_support;
The number of item sets in $F_k$ is $|F_k|$, so $C_{k+1}$ contains […] attributes; $C_{k+1}$ is the candidate set of frequent item sets, i.e. $C_{k+1}$ includes $H_1, \ldots, H_{k+1}$;
(3) The algorithm terminates when the support counts of all attribute elements in the item set are less than min_support;
From the mined frequent item sets $F_{k+1}$, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where $k = 0, 1, \ldots, n$.
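Steps 3.1 to 3.3 follow the general shape of the Apriori algorithm, which might be sketched as below. This is the textbook form under stated assumptions, not the patent's own code: the function name `apriori`, the use of an absolute count `min_count` as the threshold, the "greater than or equal" comparison, and the subset-pruning step are all choices made for this illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Step 3.1: count each single attribute (candidate 1-item sets).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1

    # Step 3.2: keep attributes that reach the minimum support count (F1).
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 1

    # Step 3.3: repeat join, scan, and filter until nothing new is frequent.
    while current:
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Standard Apriori pruning (an addition to the steps in the text):
            # every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k)):
                candidates.add(union)
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                current[cand] = count
        frequent.update(current)
        k += 1
    return frequent
```

Each transaction here would be the set of symptom attributes with value "1" in one patient record. The loop ends as soon as a pass produces no frequent item set, which corresponds to termination condition (3).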
Further, formula (1) is used to compute the confidence of the resulting association rules:
$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing the item set H ∪ F, and support_count(H) is the number of transactions containing the item set H, where H is a candidate item set and F is a frequent item set. If $\mathrm{Confidence}(H \Rightarrow F) \ge \mathrm{min\_conf}$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
According to this formula, association rules can be generated as follows: for each frequent item set L, generate all nonempty subsets of L; for each nonempty subset S of L, if $\mathrm{support\_count}(L) / \mathrm{support\_count}(S) \ge \mathrm{min\_conf}$, output the strong association rule $S \Rightarrow (L - S)$, where min_conf is the configured minimum confidence threshold.
Because the rules are generated directly from frequent item sets, every item set involved in an association rule satisfies the minimum support threshold. Frequent item sets and their support counts can be stored in a list so that they can be accessed quickly.
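The rule-generation procedure just described can be illustrated with a short sketch. It assumes the frequent item sets and their support counts are already available as a dictionary keyed by `frozenset`; the function name `strong_rules` and that data layout are assumptions for the example.

```python
from itertools import combinations


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf.

    `frequent` maps frozenset itemsets to support counts and is assumed to
    contain every subset of each frequent itemset (downward closure).
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                # Formula (1): support_count(L) / support_count(S)
                confidence = count / frequent[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

Storing the support counts in a dictionary gives the quick lookup the text asks for: the confidence of each rule is a single division of two stored counts, with no further pass over the records.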
Further, in step 3, the result data are stored in a result table according to the strong association rules, with disease type, onset time, number of patients, and geographical location as column family members of the result table.
Preferably, in step 4, a layer is added on Baidu Maps, and the geographical distribution of a disease across space, time, and population is shown as a heat map; the coverage area of the layer is derived from the geographical location, and the layer transparency is derived from the number of patients in that area.
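One simple way to derive a layer's transparency from the number of patients is a linear scale, sketched below. The text does not state the mapping, so the linear form, the direction (more patients gives a more opaque layer), the opacity bounds, and the name `layer_opacity` are all assumptions.

```python
def layer_opacity(patients, max_patients, min_opacity=0.1, max_opacity=0.9):
    """Map a region's patient count to an opacity between the two bounds."""
    if max_patients <= 0:
        return min_opacity  # no data anywhere: draw the faintest layer
    # Clamp the ratio so unexpected counts cannot leave the [0, 1] range.
    ratio = min(max(patients / max_patients, 0.0), 1.0)
    return min_opacity + ratio * (max_opacity - min_opacity)
```

Keeping the minimum above zero means regions with very few cases stay faintly visible instead of disappearing from the map.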
Further, the layer data are initialized before step 1.
Further, step 4 performs secondary development on the Baidu Maps interface to connect the data model with the GIS map and to support application responses.
Embodiment 2
This embodiment describes the method of the invention in detail, taking epigastric pain as an example.
(1) Preprocessing of the original medical record data
The data come from the medical records of 1000 epigastric pain patients in a hospital big data warehouse. All data derive from actual medical records, with human factors such as operator influence during inclusion, collection, and extraction excluded, so the medical record information is genuine and reliable medical data.
Because medical data are complex, diverse, and redundant, the patients' medical record data were first preprocessed to keep the data mining process from becoming disordered and to obtain more accurate experimental results. The original medical record data have 36 attributes. Inspection showed that the raw data contain a large amount of noisy and redundant data. For example, the attribute "disease pain duration" takes more than ten highly irregular values such as "0", "5~6 minutes", "30 minutes", and "1~10 minutes"; the values of the attributes "paroxysmal nocturnal dyspnea" and "pain radiation site" are also irregular, including null values, discrete values of the 0/1 type, and text such as "unable to lie flat", "both shoulders and back", and "shoulder, back, neck, and lower jaw, and from the left arm to the tips of the left fingers"; and for some attributes the great majority of values are missing.
To address this, the embodiment preprocesses the data as follows:
1. Normalize attribute values: an attribute whose values include both text and discrete data is quantified and classified according to its text descriptions and converted into a categorical attribute. For "disease pain duration", for example, irregular descriptions such as "0", "5~6 minutes", and "30 minutes" are quantified as "0" (no pain), "1" (pain lasting less than 10 minutes), "2" (pain lasting more than 10 minutes and less than 30 minutes), and "3" (pain lasting more than 30 minutes). For "paroxysmal nocturnal dyspnea", the value "unable to lie flat" is assigned a third value "2" in addition to "0" and "1";
2. Missing attribute values are handled in one of two ways: treating the missing value as a value in its own right, or ignoring the attribute. Which to choose depends on the specific case. For example, the "pathological Q wave" attribute in the raw data has no value at all in 450 records, so the attribute is ignored; the "arrhythmia" and "electrocardiography" attributes have only a few missing values and, according to general medical knowledge, play an important role in diagnosing epigastric pain, so their missing values are set to "normal" during processing.
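The quantification of pain duration in item 1 might look like the following sketch. The text leaves two points open, so the example fixes them by assumption: a range such as "5~6 minutes" is judged by its upper bound, and the boundary values of exactly 10 and 30 minutes fall into the lower category. The function name `pain_duration_category` is also hypothetical.

```python
import re


def pain_duration_category(raw):
    """Map a free-text pain duration to the categorical codes 0 to 3."""
    if raw is None:
        return None
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", str(raw))]
    if not numbers:
        return None  # unparseable: leave it to the missing-value handling
    minutes = max(numbers)  # assumption: a range is judged by its upper bound
    if minutes == 0:
        return 0  # no pain
    if minutes <= 10:
        return 1  # short episode (boundary of 10 included by assumption)
    if minutes <= 30:
        return 2  # medium episode (boundary of 30 included by assumption)
    return 3  # pain lasting more than 30 minutes
```

Returning `None` for values that cannot be parsed keeps normalization separate from the decision in item 2 about whether a missing value is kept as its own category or the attribute is dropped.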
In addition to the above processing, care is taken to protect patient privacy and to ensure that the data used do not expose any private patient information.
(2) Mining of frequent item sets
Choice of model algorithm: a suitable algorithm must be selected for a given model. This embodiment takes the Apriori algorithm as a reference and optimizes it appropriately.
Table 1 is the medical record information data table for epigastric pain. Epigastric pain has 6 symptom and sign attributes; to mine the associations between the symptom attributes, an improved Apriori algorithm is used to mine association rules from the data.
Table 1: Medical record information of epigastric pain patients
The attributes in Table 1 and the meaning of their values are as follows:
A = hiccups; B = flatulence; C = nausea; D = vomiting; E = diarrhea; F = chest tightness; G = history of stomach disease; H = worse after fatigue or activity; I = relieved spontaneously after rest; J = relieved by buccal nitrates or similar drugs; K = paroxysmal pain.
An attribute value of "1" means the symptom is present, and "0" means it is absent. For example, the medical record of patient No. 1 shows that the patient has hiccups together with nausea and diarrhea.
The improved Apriori algorithm is used to analyze the associations between the symptom attributes in Table 1. The key step is generating the frequent k-item set $F_k$ from the frequent (k-1)-item set $F_{k-1}$, which takes two steps: first, join any two item subsets in $F_{k-1}$ to obtain the candidate set $C_k$; then screen each element of $C_k$, because not every item set in $C_k$ is frequent, and form $F_k$ from the item sets that meet the requirement.
The mining of frequent item sets is described in detail below using this medical record data:
2.1 Traverse all candidate symptom attributes once and determine the support count of each; all attributes form the candidate 1-item set $H_1$;
2.2 Set the minimum support: min_support = 10%. Compare the support of every attribute in $H_1$ with min_support; all attributes whose support is greater than min_support form the frequent 1-item set $F_1$. For this medical record data, the members of $F_1$ and their supports are shown in Table 2:
Table 2: Frequent 1-item set
Attribute                                        Support (%)
Hiccups                                          40.909
Flatulence                                       54.541
Nausea                                           27.273
Vomiting                                         63.636
Diarrhea                                         63.636
Chest tightness                                  54.541
History of stomach disease                       63.636
Worse after fatigue or activity                  22.727
Relieved spontaneously after rest                27.273
Relieved by buccal nitrates or similar drugs     22.727
Paroxysmal pain                                  40.909
2.3 Use the join operation $F_1 \oplus F_1$ to determine the candidate 2-item set $H_2$. The number of item sets in the frequent 1-item set is $|F_1|$, so $C_2$ contains […] attributes; $C_2$ is the candidate set of frequent 2-item sets, i.e. $C_2$ includes $H_1$ and $H_2$;
2.4 Scan the attributes in $H_2$ and compute the support count of each attribute;
2.5 Form the frequent 2-item set $F_2$ from all attributes obtained in 2.4 whose support is greater than min_support. In this embodiment the frequent 2-item set is shown in Table 3 (because it is large for this data, only part of it is given as an example):
Table 3: Frequent 2-item set
2.6 Iterating the above algorithm generates the frequent item sets $F_3$ to $F_5$ in turn; the process is omitted. The frequent 5-item set $F_5$ is shown in Table 4:
Table 4: Frequent 5-item set
2.7 When the algorithm iterates again, the item set contains only one element, whose computed support is 9.091, less than min_support; no new item sets are found, and the algorithm terminates.
(3) Extraction of strong association rules
Once all frequent item sets have been mined from the database, the corresponding association rules are easy to obtain; that is, the strong association rules satisfying the minimum support and the minimum confidence are generated. Formula (1) can be used to compute the confidence of the resulting association rules, where the conditional probability is computed from the supports of the item sets.
$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing the item set H ∪ F, and support_count(H) is the number of transactions containing the item set H, where H is a candidate item set and F is a frequent item set. If $\mathrm{Confidence}(H \Rightarrow F) \ge \mathrm{min\_conf}$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
Because association rules are generated directly from frequent item sets, every item set involved in an association rule satisfies the minimum support threshold. Frequent item sets and their support counts can be stored in a list so that they can be accessed quickly. This embodiment uses the Apriori algorithm to mine association rules from the collected medical records of 1000 epigastric pain patients, with the minimum support min_support set to 30% and the minimum confidence min_conf set to 90%. Some of the resulting strong association rules are shown in Table 5:
Table 5: Strong association rules between symptom attributes in the medical records of epigastric pain patients
If the symptom of a patient meets symptom any one of above-mentioned list, then it is assumed that it is the patient of gastric abscess. The such as symptom of this patient's early stage is: have nauseating, diarrhoea and three symptoms uncomfortable in chest, after the medicines such as buccal nitric acid lipid simultaneously Above-mentioned symptom can be alleviated, then it is assumed that it is the patient of gastric abscess.When determining that this patient suffers from gastric abscess, then obtain this trouble The sick time of person and geographical position, add up number of patients simultaneously.
According to said method, patients suffering from gastric abscess all in data base are found out, adds up patient numbers simultaneously. The result table (as shown in table 6) of foundation " kinds of Diseases-sick time-number of patients-geographical position ":
Table 6: result table
Because the time of onset and the location of onset are difficult to record accurately, in Table 6 of this embodiment the coordinates of the hospital visited are used as the geographic location and the time of the visit is used as the time of onset.
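The construction of the result table can be sketched as follows, assuming each visit record carries a symptom set, a visit date and the hospital coordinates. The `Visit` class, the grouping by calendar month and the sample coordinates are hypothetical choices made for illustration; the patent does not specify the time granularity.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Visit:
    symptoms: frozenset
    visit_date: date        # stands in for the time of onset
    hospital_coord: tuple   # (longitude, latitude), stands in for the location


def matches(symptoms, rules):
    """A patient matches if all symptoms of at least one rule are present."""
    return any(rule <= symptoms for rule in rules)


def build_result_table(visits, disease, rules):
    """Count matching patients per (disease, month, hospital coordinate)."""
    counts = Counter()
    for v in visits:
        if matches(v.symptoms, rules):
            month = v.visit_date.strftime("%Y-%m")
            counts[(disease, month, v.hospital_coord)] += 1
    return [
        {"disease": d, "time": m, "patients": n, "location": loc}
        for (d, m, loc), n in sorted(counts.items())
    ]


# Hypothetical rule and visits, for illustration only.
rules = [frozenset({"nausea", "diarrhoea", "chest tightness"})]
visits = [
    Visit(frozenset({"nausea", "diarrhoea", "chest tightness"}),
          date(2016, 3, 2), (104.06, 30.67)),
    Visit(frozenset({"nausea", "diarrhoea", "chest tightness", "fever"}),
          date(2016, 3, 20), (104.06, 30.67)),
    Visit(frozenset({"cough"}), date(2016, 3, 5), (104.06, 30.67)),
]
table = build_result_table(visits, "gastric abscess", rules)
```

Each output row corresponds to one "disease type - onset time - number of patients - geographic location" entry.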
Parameter grading and colour coding (as shown in Table 7) are illustrated in this embodiment using gastric abscess as an example:
Table 7: Parameter grading for gastric abscess
In this embodiment red represents grade 1, i.e. a red area on the map indicates that the disease situation in that area is severe. GIS map data is called, and the geographic distribution of gastric abscess across space, time and population is displayed according to the data in Table 6; the result is shown in Fig. 1.
The same method can be used to show the geographic distribution of cough across space, time and population; the result is shown in Fig. 2.
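A grading of this kind can be sketched as a lookup from patient count to grade and colour. Only "grade 1 is shown in red and means severe" comes from the text; the thresholds, the number of grades and the other colours below are hypothetical, because the contents of Table 7 are not reproduced here.

```python
import bisect

# Hypothetical grading: only "grade 1 = red = most severe" comes from the text.
THRESHOLDS = [10, 50, 100]  # lower bounds of grades 3, 2 and 1
GRADES = [(4, "green"), (3, "yellow"), (2, "orange"), (1, "red")]


def grade_for(patients):
    """Map a patient count to a (grade, colour) pair."""
    return GRADES[bisect.bisect_right(THRESHOLDS, patients)]


sample = [grade_for(n) for n in (5, 10, 50, 250)]
```

Keeping the thresholds in one sorted list makes it easy to define a separate grading per disease, which is what a per-disease table such as Table 7 suggests.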
Further, the front end provides a start time, an end time and a disease type; the start and end times are two time controls. After a time period and a disease type are selected in the front end, the back-end program passes the patient counts and geographic coordinates in the result table that match that time period and disease type to the map layer, which then displays them on the map. Further, patients' age groups are distinguished in the result table, with the age group stored as a column family member of the result table, so that the distribution of the disease across age groups can be shown.
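The back-end selection described above can be sketched as a simple filter over result rows. The row layout, the function name and the sample values are assumptions made for illustration; how the selected points are handed to the map layer is not shown, since the patent gives no interface details.

```python
from datetime import date


def select_for_layer(result_table, disease, start, end):
    """Return (coordinate, patients) pairs for one disease within [start, end]."""
    return [
        (row["location"], row["patients"])
        for row in result_table
        if row["disease"] == disease and start <= row["time"] <= end
    ]


# Hypothetical result rows, for illustration only.
result_table = [
    {"disease": "gastric abscess", "time": date(2016, 3, 1),
     "patients": 12, "location": (104.06, 30.67)},
    {"disease": "gastric abscess", "time": date(2016, 7, 1),
     "patients": 5, "location": (104.10, 30.70)},
    {"disease": "cough", "time": date(2016, 3, 1),
     "patients": 40, "location": (104.06, 30.67)},
]
points = select_for_layer(
    result_table, "gastric abscess", date(2016, 1, 1), date(2016, 6, 30)
)
```

An age-group filter could be added in the same way by carrying an extra field per row and testing it in the same condition.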
Owing to the combined effect of many factors, such as pathogenic factors, population characteristics, and the natural and social environment, the epidemic intensity of a disease differs between populations, regions and times, and its pattern of occurrence is not identical either. Studying the distribution of a disease reflects both the biological characteristics of the disease itself and the combined action and interaction of the internal and external environmental factors related to it. The present invention uses GIS maps and big data architecture technology to show the geographic distribution of diseases across space, time and population. It integrates electronic medical record data from different platforms, solving the earlier problems that data was scattered, processing was inefficient, and data integration and comprehensive analysis were difficult. It provides basic data for studying the epidemic patterns of diseases and exploring their causes; describing the distribution shown in the disease cloud atlas reveals the basic characteristics of disease prevalence, which is valuable information for clinical diagnosis; and analysing the distribution patterns of diseases and their determinants helps provide a scientific basis for formulating sound strategies and measures for disease prevention, control and health care.
Of course, the present invention may have many other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (9)

1. A method for building a disease cloud atlas based on big data analysis, characterized in that GIS maps and big data architecture technology are used to show the distribution of diseases in space and time, the method comprising the following steps:
Step 1. Collect data: obtain electronic medical record data;
Step 2. Convert and clean the data obtained in step 1; conversion includes converting non-standard disease names into standardized disease names; cleaning removes data that is unreasonable or that may affect the prediction results;
Step 3. Use data mining techniques to perform association analysis on the data converted and cleaned in step 2, mine frequent itemsets, obtain the association rules between symptom attributes from the frequent itemsets, and build a result set of "disease type - onset time - number of patients - geographic location" according to the association rules;
Step 4. Call GIS map data, read the result set obtained in step 3, and show the distribution of the disease in space and time.
2. The method of claim 1, characterized in that, in step 2, attribute values are normalized for data that has attribute values: an attribute whose values contain both text and discrete data is classified quantitatively by its text and converted into a categorical attribute; an attribute that contains only text is characterized qualitatively according to the business, and the text attribute is converted into a categorical attribute; an attribute that contains only discrete data is deleted.
3. The method of claim 1, characterized in that, in step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence are taken as strong association rules; if a patient's symptoms satisfy a strong association rule for the symptom attributes of a certain disease, the patient is considered to have that disease, the patient's onset time and geographic location are then obtained, and the number of patients is counted.
4. The method of claim 3, characterized in that frequent itemsets are mined in step 3 by the following steps:
3.1 Traverse all candidate symptom attributes of a certain disease and determine the support count of each attribute; all candidate symptom attributes together form the candidate 1-itemset H_1;
3.2 Set a minimum support count, compare the support count of every attribute in H_1 with the minimum support count, and form the frequent 1-itemset F_1 from the attributes in H_1 whose support count is greater than the minimum support count;
3.3 Then mine the frequent (k+1)-itemset F_{k+1} as follows:
(1) Use the join operation to determine the candidate (k+1)-itemset H_{k+1}, where k = 1, 2, ..., n;
(2) Scan the attributes in H_{k+1} and calculate the support count of each attribute in H_{k+1}; the attributes in H_{k+1} whose support count is greater than or equal to the minimum support count form the frequent (k+1)-itemset F_{k+1};
If F_k contains |F_k| itemsets, the number of attributes in C_{k+1} is determined by |F_k|; C_{k+1} is the candidate set of frequent itemsets, i.e. C_{k+1} includes H_1, ..., H_{k+1};
(3) When the support counts of all attribute elements in the itemset are less than the minimum support count, the algorithm terminates;
From the mined frequent itemsets F_{k+1}, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where k = 0, 1, ..., n.
5. The method of claim 4, characterized in that formula (1) is used to calculate the confidence of the obtained association rules;
$$\mathrm{Confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing the itemset H ∪ F, and support_count(H) is the number of transactions containing the itemset H, where H is a candidate itemset and F is a frequent itemset.
6. The method of claim 1, 3 or 4, characterized in that, in step 3, the result data is stored in a result table according to the strong association rules, wherein disease type, onset time, number of patients and geographic location are the column family members of the result table.
7. The method of claim 1, characterized in that, in step 4, a layer is added on Baidu Maps and a heat map is used to show the geographic distribution of a certain disease across space, time and population, wherein the area covered by the layer is obtained from the geographic location and the layer transparency is obtained from the number of patients in that area.
8. The method of claim 7, characterized in that it further comprises, before step 1, initializing the layer data.
9. The method of claim 1 or 7, characterized in that, in step 4, secondary development is carried out on the Baidu Maps interface to realize the interconnection of the data model with the GIS map and the application response.
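Claim 7 derives the layer transparency from the number of patients in an area. One possible mapping is a linear scale, sketched below; the function name, the linear form and the 0.2 floor are assumptions, and no map API call is shown because the claims do not name one.

```python
def layer_opacity(patients, max_patients, floor=0.2):
    """Scale a patient count linearly into an opacity in [floor, 1.0]."""
    if max_patients <= 0:
        return floor
    ratio = min(patients, max_patients) / max_patients
    return floor + (1.0 - floor) * ratio


# Hypothetical patient counts per area, keyed by (longitude, latitude).
area_counts = {(104.06, 30.67): 100, (104.10, 30.70): 50, (103.98, 30.58): 0}
peak = max(area_counts.values())
opacities = {coord: layer_opacity(n, peak) for coord, n in area_counts.items()}
```

The floor keeps areas with few patients faintly visible; setting it to 0 would make areas without patients fully transparent.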
CN201610497249.7A 2016-06-28 2016-06-28 A kind of method setting up disease cloud atlas based on big data analysis Pending CN106202883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610497249.7A CN106202883A (en) 2016-06-28 2016-06-28 A kind of method setting up disease cloud atlas based on big data analysis

Publications (1)

Publication Number Publication Date
CN106202883A true CN106202883A (en) 2016-12-07

Family

ID=57463275

Country Status (1)

Country Link
CN (1) CN106202883A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161207
