CN106202883A - Method for building a disease cloud map based on big-data analysis - Google Patents
- Publication number
- CN106202883A (application CN201610497249.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- disease
- attribute
- support
- minimum
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The present invention relates to a method for building a disease cloud map based on big-data analysis. The method uses GIS maps and big-data framework technology to display the geographical distribution of diseases in space and time. It comprises the following steps. Step 1, data collection: obtain electronic health record (EHR) data. Step 2: convert and clean the data. Step 3: perform association analysis on the data converted and cleaned in Step 2 using data-mining techniques, which involves the following operations:
- mine frequent itemsets;
- derive association rules among symptom attributes from the frequent itemsets;
- build a "disease type - onset time - number of patients - geographic location" result set from the association rules.

Step 4: call GIS map data and display the spatial and temporal geographical distribution of diseases according to the result set. The invention provides basic data for studying disease epidemic patterns and exploring disease causes. Analyzing disease distribution patterns and their determinants helps provide a scientific basis for formulating reasonable disease prevention and control strategies, health policies, and measures.
Description
Technical field
The present invention relates to the technical field of disease prediction and analysis, and in particular to a method for building a disease cloud map based on big-data analysis.
Background art
Medical diagnostic information contains a large amount of information about patients' conditions and about the patients themselves. This includes medical histories and the relationships between a given disease and its various symptoms. Association analysis of these medical data can uncover the many association rules hidden in them, such as relationships among symptom attributes. Such rules can be used to predict disease trends and to help doctors assess health conditions and make diagnoses, so they are important for disease prevention and treatment.
In data mining, a regularity between the values of two or more variables is called an association. An association reflects how one event depends on other events, or how often they occur together. Associations can be classified as simple, sequential, causal, and quantitative associations, among others. Associations among data are complex and are mostly hidden in large volumes of data, so they cannot be observed directly. Association analysis is used to discover valuable relationships hidden in large data sets. The relationships it finds can be expressed as association rules or frequent itemsets. An association rule describes the correlation between different attribute items that occur at the same time.

In the broad sense, association analysis is the essence of data mining. The purpose of data mining is to discover the knowledge hidden behind the data, and that knowledge must reflect relationships between different objects. In the narrow sense, association analysis refers to a specific class of data-mining techniques whose main purpose is to mine the associations between objects in a database. Association rule mining extracts, from large amounts of data, valuable knowledge describing how data items relate to one another.
GIS can store and analyze massive spatial data and can output analysis results quickly and intuitively. It has therefore become a powerful aid in modern medical research and is increasingly applied to disease prevention and control. Most epidemiological data have a spatial attribute. The following are all related to geographic location:
- the incidence and infection status of human and animal populations;
- the distribution of hosts and vectors;
- temperature, humidity, rainfall, and soil;
- sanitation facilities.

GIS can link these data through their spatial relationships and display and analyze them interactively. This also provides a basis for later statistical analysis.
Summary of the invention
The present invention aims to provide a method for building a disease cloud map based on big-data analysis.
To achieve the above objective, the present invention adopts the following technical solution:
A method for building a disease cloud map based on big-data analysis uses GIS maps and big-data framework technology to display the spatial and temporal distribution of diseases. It comprises the following steps:
Step 1. Data collection: obtain electronic health record (EHR) data.
Step 2. Data cleaning and processing: convert and clean the data obtained in Step 1. Conversion includes normalizing non-standard disease names, which unifies them into standard disease names. Cleaning rejects unreasonable data that may affect the results.
Step 3. Perform association analysis on the data converted and cleaned in Step 2 using data-mining techniques. Mine the frequent itemsets, derive the association rules among symptom attributes from them, and build a "disease type - onset time - number of patients - geographic location" result set according to those rules.
Step 4. Call GIS map data, read the result set obtained in Step 3, and display the spatial and temporal distribution of diseases according to it.
Further, in Step 2, attribute values are normalized for data that have attribute values:
- An attribute whose values include both text and discrete numeric data is quantified and classified by its text, then converted into a categorical attribute.
- An attribute with only text values is characterized according to its business meaning, then converted into a categorical attribute.
- An attribute with only discrete numeric data is deleted.

Data with missing attribute values are handled in one of two ways:
- Mode 1: treat the missing value as a value in its own right.
- Mode 2: ignore the attribute.
Further, in Step 3, a minimum support and a minimum confidence are set. Association rules whose support and confidence are greater than or equal to these minimums are taken as strong association rules. If a patient's symptoms satisfy a strong association rule for a disease's symptom attributes, the patient is considered to have that disease. The patient's onset time and geographic location are then obtained, and the number of patients is counted.
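The following is a minimal sketch of the counting rule above. It assumes that a patient "satisfies" a strong rule when the recorded symptoms contain the rule's antecedent; the source does not define matching more precisely. `Rule`, `Visit`, and `count_patients` are hypothetical names. The output is keyed by (disease, onset time, location), mirroring the Step 3 result set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # symptom attributes on the rule's left-hand side
    disease: str                # disease implied by the strong rule


@dataclass(frozen=True)
class Visit:
    symptoms: FrozenSet[str]    # symptoms present in one medical record
    onset: str                  # onset (visit) time, e.g. "2016-05"
    location: str               # geographic location, e.g. hospital id


def count_patients(visits: Iterable[Visit], rules: List[Rule]) -> Counter:
    """Count patients per (disease, onset, location).

    Assumption: a patient is counted for a disease if their symptoms
    contain the antecedent of at least one strong rule for that disease.
    """
    counts: Counter = Counter()
    for v in visits:
        diseases = {r.disease for r in rules if r.antecedent <= v.symptoms}
        for d in diseases:
            counts[(d, v.onset, v.location)] += 1
    return counts
```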
Further, Step 3 mines frequent itemsets as follows:
3.1 Traverse all candidate symptom attributes of a disease and determine the support count of each attribute. All candidate symptom attributes together form the candidate 1-itemset $H_1$.
3.2 Set a minimum support count and compare the support count of every attribute in $H_1$ with it. The attributes of $H_1$ whose support count exceeds the minimum form the frequent 1-itemset $F_1$.
3.3 Then mine the frequent (k+1)-itemset $F_{k+1}$ as follows:
(1) Determine the candidate (k+1)-itemset $H_{k+1}$ by the join operation $F_k \oplus F_k$, where k = 1, 2, …, n.
(2) Scan the attributes in $H_{k+1}$ and calculate the support count of each. All attributes of $H_{k+1}$ whose support count is greater than or equal to the minimum form the frequent (k+1)-itemset $F_{k+1}$.
If $F_k$ contains $|F_k|$ itemsets, then $C_{k+1}$ contains at most $|F_k|(|F_k|-1)/2$ itemsets. $C_{k+1}$ is the candidate set of frequent itemsets, i.e. $C_{k+1}$ comprises $H_1, \ldots, H_{k+1}$.
(3) When the support counts of all elements in the itemset are below the minimum support count, the algorithm terminates.
From the mined frequent itemsets $F_{k+1}$, the strong association rules that satisfy the minimum support and the minimum confidence are obtained, where k = 0, 1, …, n.
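The frequent-itemset loop of steps 3.1 to 3.3 follows the standard Apriori scheme. The Python sketch below is one possible rendering, not the optimized variant mentioned in Embodiment 2. It treats each record as a set of present symptom attributes and `min_support` as a fraction of the record count. Candidates that contain an infrequent k-subset are pruned before counting. The source alternates between "greater than" and "greater than or equal to" the minimum; this sketch assumes greater than or equal. The function name `apriori` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, int]:
    """Return every frequent itemset with its support count."""
    min_count = min_support * len(transactions)

    # Step 3.1: candidate 1-itemsets H1 and their support counts.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1

    # Step 3.2: frequent 1-itemsets F1.
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        # Step 3.3(1): join F_k with itself into candidate (k+1)-itemsets,
        # keeping only candidates whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Step 3.3(2): count each candidate and keep the frequent ones.
        frequent = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_count:
                frequent[cand] = c
        result.update(frequent)
        k += 1
    # Step 3.3(3): the loop stops once no candidate reaches the minimum.
    return result
```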
Further, formula (1) is used to calculate the confidence of each association rule obtained:
$$\mathrm{confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing itemset H ∪ F. support_count(H) is the number of transactions containing itemset H. Here H is the candidate itemset and F is the frequent itemset. If $\mathrm{confidence}(H \Rightarrow F) \ge min\_conf$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
According to this formula, association rules can be generated as follows. For each frequent itemset L, generate all non-empty subsets of L. For each non-empty subset S of L, if $\mathrm{support\_count}(L) / \mathrm{support\_count}(S) \ge min\_conf$, output the strong association rule $S \Rightarrow (L - S)$, where min_conf is the configured minimum confidence threshold.
Because the rules are generated directly from frequent itemsets, every itemset involved in a rule satisfies the minimum support threshold. Frequent itemsets and their support counts can be stored in lists so that they can be accessed quickly. The Apriori algorithm is used to mine association rules from the aggregated medical-record information.
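Formula (1) and the rule-generation procedure can be sketched as follows. The sketch assumes the support counts come from a frequent-itemset miner that also records every subset of each frequent itemset, which the Apriori property guarantees. `strong_rules` is a hypothetical helper. It emits $S \Rightarrow (L - S)$ only for non-empty proper subsets S, so the consequent is never empty.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent S, consequent L - S, confidence)


def strong_rules(support: Dict[Itemset, int], min_conf: float) -> List[Rule]:
    """Generate every rule S => L - S whose confidence is >= min_conf.

    `support` must map each frequent itemset L, and every non-empty
    subset of L, to its support count.
    """
    rules: List[Rule] = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a single item cannot be split into S and L - S
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                s = frozenset(ante)
                conf = count / support[s]  # formula (1)
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules
```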
Further, in Step 3, the result data are stored in a result table according to the strong association rules. Disease type, onset time, number of patients, and geographic location are the column-family members of the result table.
Preferably, in Step 4, a layer is added to Baidu Map to display the geographical distribution of a disease across space, time, and population as a heat map. The layer's coverage area is derived from the geographic locations, and its transparency is derived from the number of patients in each region.
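The source states only that the layer transparency is obtained from each region's patient count. One simple mapping, shown here purely as an assumption, scales counts linearly so that the region with the most patients is the most opaque. The bounds `lo` and `hi`, the coordinate key, and the function name `layer_opacity` are hypothetical. The sketch stops short of any Baidu Map call.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (longitude, latitude); hypothetical key type


def layer_opacity(counts: Dict[Coord, int],
                  lo: float = 0.2, hi: float = 0.9) -> Dict[Coord, float]:
    """Scale each region's patient count linearly into an opacity in [lo, hi]."""
    if not counts:
        return {}
    peak = max(counts.values())
    if peak == 0:
        return {coord: lo for coord in counts}  # no patients anywhere
    return {coord: lo + (hi - lo) * c / peak for coord, c in counts.items()}
```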
Further, the layer data are initialized before Step 1.
Further, in Step 4, secondary development is performed on the Baidu Map interface. This interconnects the data model with the GIS map and lets the map respond to application requests.
The present invention has the following beneficial effects:
1. The invention uses GIS maps and big-data framework technology to display the geographical distribution of diseases in space and time. This provides basic data for studying disease epidemic patterns and exploring disease causes.
2. It integrates EHR data from different platforms. Previously the data were scattered and inefficient and therefore hard to integrate and analyze comprehensively; the invention solves this problem.
3. Describing the distribution of disease cloud-map data reveals the basic characteristics of disease prevalence, which is valuable information for clinical diagnosis.
4. Analyzing disease distribution patterns and their determinants helps provide a scientific basis for formulating reasonable disease prevention and control strategies, health policies, and measures.
Description of the drawings
Fig. 1 shows the geographical distribution of epigastric pain;
Fig. 2 shows the geographical distribution of cough.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings.
Embodiment 1
This embodiment discloses a method for building a disease cloud map based on big-data analysis. The method uses GIS maps and big-data framework technology to display the geographical distribution of diseases in space and time. It comprises the following steps:
Step 1. gathers data: obtain electronic health record data.Further, electronic health record Data Source is in big data acquisition
Platform.Big data acquisition platform, uses the medical data acquisition technology of cloud computing mode, gathers the clinical case history of 80 multiple hospitals
Data, data acquisition xml document formal layout, it is provided that unify, upload interface easily, support that real-time files disposition is looked into
Ask, upload batch management and problem data rollback, the most compatible other data format analysis processing and interface modes.
Step 2. Data cleaning and processing: the data obtained in Step 1 are converted and cleaned. Conversion includes normalizing non-standard disease names so that they are unified into standard disease names. For example, "slight epigastric pain" is converted to "epigastric pain". Cleaning then rejects unreasonable data that may affect the results, for example by deleting records whose disease name is empty.
Step 3. Perform association analysis on the data converted and cleaned in Step 2 using data-mining techniques. Mine the frequent itemsets, derive the association rules among symptom attributes from them, and build a "disease type - onset time - number of patients - geographic location" result set according to those rules.
Step 4. Call GIS map data and display the spatial and temporal distribution of diseases according to the data model.
This embodiment uses front-end technology to read the result set produced by back-end data mining. It then presents the distribution of each disease on a GIS layer. Here, front-end development means developing the front-end page in Java, reading the back-end data, and presenting it on the front-end page.
The front-end page may be an existing front-end platform. Alternatively, a standalone front-end page may be built for the disease cloud map as follows:
1. Write the front-end HTML page and call the Baidu Map interface from Java code to load the base map.
2. Write a layer in code, define the incoming parameters such as time, disease type, count, and area, and define how the parameters are graded.
3. Write a database interface function in Java that queries matching records from the data-mining result set and passes them to the parameters.
4. Once the parameters receive the data, display them on the GIS map according to the association rules established in Step 3.
Parameter grading: different colors represent different severity levels. The minimum is 0 and the maximum is the total number of patients. This range is divided into grades by proportion, each represented by a different color. The GIS map then shows each area in the color of its grade.
Further, in Step 2, attribute values are normalized for data that have attribute values:
- An attribute whose values include both text and discrete numeric data is quantified and classified by its text, then converted into a categorical attribute.
- An attribute with only text values is characterized according to its business meaning, then converted into a categorical attribute.
- An attribute with only discrete numeric data is deleted.

Data with missing attribute values are handled in one of two ways:
- Mode 1: treat the missing value as a value in its own right.
- Mode 2: ignore the attribute.

In practice, if the disease name is complete and the attribute can be determined, the value is usable. If the attribute is ambiguous and cannot be determined, it is ignored.
Further, in Step 3, a minimum support and a minimum confidence are set. Association rules whose support and confidence are greater than or equal to these minimums are taken as strong association rules. If a patient's symptoms satisfy a strong association rule for a disease's symptom attributes, the patient is considered to have that disease. The patient's onset time and geographic location are then obtained, and the number of patients is counted.
Further, Step 3 mines frequent itemsets as follows:
3.1 Traverse all candidate symptom attributes of a disease and determine the support count of each attribute. All candidate symptom attributes together form the candidate 1-itemset $H_1$. The support count is simply the number of occurrences, obtained by counting.
3.2 Set a minimum support count, min_support, and compare the support count of every attribute in $H_1$ with it. The attributes of $H_1$ whose support count exceeds min_support form the frequent 1-itemset $F_1$.
3.3 Then mine the frequent (k+1)-itemset $F_{k+1}$ as follows:
(1) Determine the candidate (k+1)-itemset $H_{k+1}$ by the join operation $F_k \oplus F_k$, where k = 1, 2, …, n.
(2) Scan the attributes in $H_{k+1}$ and calculate the support count of each. All attributes of $H_{k+1}$ whose support count exceeds min_support form the frequent (k+1)-itemset $F_{k+1}$.
If $F_k$ contains $|F_k|$ itemsets, then $C_{k+1}$ contains at most $|F_k|(|F_k|-1)/2$ itemsets. $C_{k+1}$ is the candidate set of frequent itemsets, i.e. $C_{k+1}$ comprises $H_1, \ldots, H_{k+1}$.
(3) When the support counts of all elements in the itemset are below min_support, the algorithm terminates.
From the mined frequent itemsets $F_{k+1}$, the strong association rules that satisfy the minimum support and the minimum confidence are obtained, where k = 0, 1, …, n.
Further, formula (1) is used to calculate the confidence of each association rule obtained:
$$\mathrm{confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing itemset H ∪ F. support_count(H) is the number of transactions containing itemset H. Here H is the candidate itemset and F is the frequent itemset. If $\mathrm{confidence}(H \Rightarrow F) \ge min\_conf$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
According to this formula, association rules can be generated as follows. For each frequent itemset L, generate all non-empty subsets of L. For each non-empty subset S of L, if $\mathrm{support\_count}(L) / \mathrm{support\_count}(S) \ge min\_conf$, output the strong association rule $S \Rightarrow (L - S)$, where min_conf is the configured minimum confidence threshold.
Because the rules are generated directly from frequent itemsets, every itemset involved in a rule satisfies the minimum support threshold. Frequent itemsets and their support counts can be stored in lists so that they can be accessed quickly.
Further, in Step 3, the result data are stored in a result table according to the strong association rules. Disease type, onset time, number of patients, and geographic location are the column-family members of the result table.
Preferably, in Step 4, a layer is added to Baidu Map to display the geographical distribution of a disease across space, time, and population as a heat map. The layer's coverage area is derived from the geographic locations, and its transparency is derived from the number of patients in each region.
Further, the layer data are initialized before Step 1.
Further, in Step 4, secondary development is performed on the Baidu Map interface. This interconnects the data model with the GIS map and lets the map respond to application requests.
Embodiment 2
This embodiment describes the method of the invention in detail, taking epigastric pain as an example.
(1) Preprocessing of the original medical-record data
The data are the medical records of 1000 epigastric-pain patients in a hospital's big-data warehouse. All data come from actual medical records. Human factors introduced by operators during entry, collection, and extraction have been excluded, so the medical-record information is authentic and reliable.
Medical data are complex, diverse, and redundant. To keep the mining process from becoming chaotic and to obtain more accurate results, the patients' medical-record data were first preprocessed. The original records have 36 attributes. Observation showed that the raw data contain a great deal of noisy and redundant data. For example:
- The attribute "pain duration" has more than 10 highly non-standard values, such as "0", "5~6 minutes", "30 minutes", and "1~10 minutes".
- The values of the attributes "paroxysmal nocturnal dyspnea" and "site of pain radiation" are also non-standard. They include null values, discrete data such as 0 and 1, and text such as "cannot lie flat", "both shoulders and back", and "shoulder, back, neck, lower jaw, and left arm down to the left fingertips".
- Some attributes have missing values for the vast majority of records.
To address this, this embodiment preprocesses the data as follows:
1. Normalizing attribute values: for an attribute whose values include both text and discrete data, the text descriptions are quantified and classified case by case, then converted into a categorical attribute. For example, the non-standard descriptions of "pain duration", such as "0", "5~6 minutes", and "30 minutes", are quantified as follows:
- "0": no pain;
- "1": pain lasting less than 10 minutes;
- "2": pain lasting more than 10 minutes but less than 30 minutes;
- "3": pain lasting more than 30 minutes.

For "paroxysmal nocturnal dyspnea", "cannot lie flat" is set to a third value, "2", in addition to "0" and "1".
2. Missing attribute values are handled in one of two ways: the missing value is treated as a value in its own right, or the attribute is ignored. The choice depends on the specific problem. For example, the attribute "pathologic Q wave" has no value in any of the 450 original records, so it is ignored. In contrast, the attributes "arrhythmia" and "electrocardiogram" have only a few missing values. Medical knowledge also indicates that these two attributes are important in diagnosing epigastric pain. Their missing values are therefore set to "no abnormality".
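The preprocessing rules for "pain duration" and for the missing ECG-related values can be sketched in Python. The source does not state how ranges such as "5~6 minutes" are resolved or which category the exact boundaries at 10 and 30 minutes fall into. This sketch therefore assumes the following:
- a range takes its upper bound;
- 10 to under 30 minutes is category 2;
- 30 minutes or more is category 3.

The function names are hypothetical.

```python
import re
from typing import Optional


def pain_duration_category(raw: Optional[str]) -> Optional[int]:
    """Quantize a free-text pain duration (in minutes) into categories 0-3.

    0 = no pain, 1 = under 10 min, 2 = 10 to under 30 min, 3 = 30 min or more.
    Assumption: a range such as "5~6 minutes" is represented by its upper bound.
    Returns None when no number can be parsed.
    """
    if raw is None:
        return None
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", raw)]
    if not numbers:
        return None
    minutes = max(numbers)
    if minutes == 0:
        return 0
    if minutes < 10:
        return 1
    if minutes < 30:
        return 2
    return 3


def fill_ecg(value: Optional[str]) -> str:
    """Missing arrhythmia/ECG values are set to "no abnormality"."""
    return value if value else "no abnormality"
```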
In addition to the above processing, patient privacy must be protected, so that the data used do not expose any private patient information.
(2) Mining of frequent itemsets
Choice of model algorithm: a suitable algorithm is selected for the model. This embodiment takes the Apriori algorithm as its reference and applies appropriate optimizations.
Table 1 is the medical-record data table for epigastric pain, which has 6 symptom and sign attributes. To mine the associations among the symptom attributes, an improved Apriori algorithm is used for association rule mining.
Table 1: Medical-record information of epigastric-pain patients
The attributes and values in Table 1 are defined as follows:
A = hiccups; B = bloating; C = nausea; D = vomiting; E = diarrhea; F = chest tightness; G = history of gastric disease; H = aggravated by fatigue or activity; I = relieved spontaneously after rest; J = relieved by sublingual nitrates and similar drugs; K = paroxysmal pain.
An attribute value of "1" means the symptom is present, and "0" means it is absent. For example, record number 1 shows that the patient has hiccups, nausea, and diarrhea.
The improved Apriori algorithm is used to analyze the relationships among the symptom attributes in Table 1. The key step is generating the frequent k-itemset $F_k$ from the frequent (k-1)-itemset $F_{k-1}$, which takes two stages:
1. Pairs of subsets in $F_{k-1}$ are joined to obtain the candidate set $C_k$.
2. Each element of $C_k$ is screened, because not every itemset in $C_k$ is frequent. The qualifying itemsets form $F_k$.
The mining of frequent itemsets is described in detail below using these medical-record data:
2.1 Traverse all candidate symptom attributes once and determine the support of each. All attributes together form the candidate 1-itemset $H_1$.
2.2 Set the minimum support min_support = 10% and compare the support of every attribute in $H_1$ with min_support. All attributes whose support exceeds min_support form the frequent 1-itemset $F_1$. Counting item by item over the data gives the members of $F_1$ and their supports shown in Table 2:
Table 2: Frequent 1-itemset
Attribute | Support (%) |
Hiccups | 40.909 |
Bloating | 54.541 |
Nausea | 27.273 |
Vomiting | 63.636 |
Diarrhea | 63.636 |
Chest tightness | 54.541 |
History of gastric disease | 63.636 |
Aggravated by fatigue or activity | 22.727 |
Relieved spontaneously after rest | 27.273 |
Relieved by sublingual nitrates and similar drugs | 22.727 |
Paroxysmal pain | 40.909 |
2.3 Determine the candidate 2-itemset $H_2$ by the join operation $F_1 \oplus F_1$. If the frequent 1-itemset contains $|F_1|$ itemsets, then $C_2$ contains $|F_1|(|F_1|-1)/2$ itemsets. $C_2$ is the candidate set of the frequent 2-itemset, i.e. $C_2$ comprises $H_1$ and $H_2$.
2.4 Scan the attributes in $H_2$ and calculate the support of each.
2.5 All attributes from 2.4 whose support exceeds min_support form the frequent 2-itemset $F_2$. In this embodiment the frequent 2-itemset is shown in Table 3. Because the frequent 2-itemset is large for these records, only part of it is shown as an example:
Table 3: Frequent 2-itemset
2.6 Iterating the algorithm above generates the frequent itemsets $F_3$ to $F_5$ in turn (details omitted). The frequent 5-itemset $F_5$ is shown in Table 4:
Table 4: Frequent 5-itemset
2.7 On the next iteration the itemset contains only one element, with a calculated support of 9.091%. This is below min_support, so no new itemset is found and the algorithm terminates.
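For k = 1, the join in step 2.3 pairs every two distinct frequent 1-itemsets, assuming the join is a plain pairwise union. The candidate count is then $|F_1|(|F_1|-1)/2$. With the 11 attributes of Table 2 (letters A to K, as defined for Table 1), this gives 55 candidate 2-itemsets, as the following short Python check confirms.

```python
from itertools import combinations

# Frequent 1-itemsets of Table 2, using the letters A-K defined for Table 1.
F1 = [frozenset(letter) for letter in "ABCDEFGHIJK"]

# Join F1 with itself: each pair of distinct 1-itemsets yields one 2-itemset.
H2 = {a | b for a, b in combinations(F1, 2)}

n = len(F1)
assert len(H2) == n * (n - 1) // 2  # 11 * 10 / 2
print(len(H2))  # 55
```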
(3) Extraction of strong association rules
Once all frequent itemsets have been mined from the database, the corresponding association rules are easy to obtain. The task is to generate the strong association rules that satisfy the minimum support and minimum confidence. Formula (1) can be used to calculate the confidence of each rule obtained. Here the conditional probability is computed from itemset support counts.
$$\mathrm{confidence}(H \Rightarrow F) = P(F \mid H) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing itemset H ∪ F. support_count(H) is the number of transactions containing itemset H. Here H is the candidate itemset and F is the frequent itemset. If $\mathrm{confidence}(H \Rightarrow F) \ge min\_conf$, the association rule is considered a strong association rule, where min_conf is the configured minimum confidence threshold.
Because association rules are generated directly from frequent itemsets, every itemset involved in a rule satisfies the minimum support threshold. Frequent itemsets and their support counts can be stored in lists so that they can be accessed quickly.
In this embodiment, the Apriori algorithm is used to mine association rules from the aggregated medical-record information of 1000 epigastric-pain patients. The minimum support min_support is 30% and the minimum confidence min_conf is 90%. Some of the resulting strong association rules are shown in Table 5:
Table 5: Strong association rules among symptom attributes in epigastric-pain medical records
If a patient's symptoms satisfy any rule in the table above, the patient is considered to have epigastric pain.
For example, suppose a patient initially presents with three symptoms (nausea, diarrhea, and chest tightness), and these symptoms are relieved by sublingual nitrates and similar drugs. The patient is then considered to have epigastric pain. Once this is determined, the patient's onset time and geographic location are obtained, and the patient count is updated.
In this way, all epigastric-pain patients in the database are identified and counted. A "disease type - onset time - number of patients - geographic location" result table is then built, as shown in Table 6:
Table 6: Result table
Onset time and the location where the disease was contracted are hard to determine accurately. In Table 6 of this embodiment, the coordinates of the hospital visited are therefore used as the geographic location, and the visit time is used as the onset time.
This embodiment illustrates parameter grading and color coding using epigastric pain (see Table 7):
Table 7: Parameter grading for epigastric pain
In this embodiment, red represents grade 1: a red area on the map indicates that the disease is serious in that area. GIS map data are called, and the geographical distribution of epigastric pain across space, time, and population is displayed according to the data in Table 6. The result is shown in Fig. 1.
In the same way, the geographical distribution of cough across space, time, and population can be displayed, as shown in Fig. 2.
Further, the front end provides a start time, an end time and a disease type selector, where the start and end times are two time controls. After a time period and a disease type are selected on the front end, the background program passes the number of patients and the geographical coordinates of the result-table records that match that time period and disease type to the map layer, which then displays them on the map.
Further, the result table distinguishes patients by age group, with age group as a column family member of the result table, so that the distribution of the disease across age groups can be shown.
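The front-end query described above can be sketched as a filter over result rows by disease type and an inclusive date range, grouped by location and age group. This is a hedged sketch: `query`, `age_band`, the row fields and the ten-year band width are illustrative assumptions, since the text does not specify them.

```python
from collections import Counter
from datetime import date


def age_band(age, width=10):
    """Hypothetical age grouping (0-9, 10-19, ...); band widths are not given in the text."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def query(rows, disease, start, end):
    """Sum patients per (location, age_band) for one disease with start <= onset <= end."""
    result = Counter()
    for r in rows:
        if r["disease"] == disease and start <= r["onset_time"] <= end:
            result[(r["location"], r["age_band"])] += r["patients"]
    return result


if __name__ == "__main__":
    rows = [{"disease": "cough", "onset_time": date(2016, 5, 3), "location": (116.4, 39.9),
             "age_band": age_band(34), "patients": 3}]
    print(query(rows, "cough", date(2016, 5, 1), date(2016, 5, 31)))
```

The returned mapping is what would be handed to the map layer: one entry per coordinate and age group.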
Because of the combined effect of many factors, such as pathogenic factors, population characteristics, and the natural and social environment, the epidemic intensity of a disease differs across populations, regions and time periods, and its pattern of occurrence is not entirely the same. Studying the distribution of a disease reflects both the biological characteristics of the disease itself and, comprehensively, the effects and interactions of the various internal and external environmental factors associated with it. The present invention uses GIS maps and big data framework technology to show the distribution of diseases in space, time and population. By integrating electronic health record data from different platforms, it solves the earlier problems of scattered data and low efficiency, which made data integration and comprehensive analysis difficult. It provides basic data for studying the epidemic patterns of diseases and exploring their causes. Describing the distribution in the disease cloud atlas data reveals the basic features of disease prevalence, which is valuable information for clinical diagnosis, and analysing the distribution patterns and determinants of diseases provides a scientific basis for formulating reasonable disease prevention and control strategies, health policies and measures.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the appended claims of the present invention.
Claims (9)
1. A method for establishing a disease cloud atlas based on big data analysis, characterized in that GIS maps and big data framework technology are used to show the distribution of diseases in space and time, the method comprising the following steps:
Step 1. Collect data: obtain electronic health record data;
Step 2. Convert and clean the data obtained in step 1, where conversion includes converting non-standard disease names into standardized disease names, and cleaning removes unreasonable data that may affect the prediction results;
Step 3. Use data mining technology to perform association analysis on the data converted and cleaned in step 2: mine frequent itemsets, obtain the association rules between symptom attributes from the frequent itemsets, and build a "disease type - onset time - number of patients - geographical location" result set according to the association rules;
Step 4. Call GIS map data, read the result set obtained in step 3, and show the distribution of the disease in space and time.
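The conversion and cleaning of step 2 can be sketched as a synonym lookup that maps non-standard disease names to standard ones, followed by dropping records that are unmapped or implausible. The patent does not publish its name dictionary or its cleaning criteria, so `STANDARD_NAMES`, the age bound and the required fields below are hypothetical.

```python
# Hypothetical synonym dictionary: the actual name-mapping table is not published.
STANDARD_NAMES = {
    "stomach abscess": "gastric abscess",
    "gastric abscess": "gastric abscess",
    "coughing": "cough",
    "cough": "cough",
}


def clean(records, max_age=120):
    """Normalize disease names and drop records that are unreasonable or incomplete."""
    cleaned = []
    for rec in records:
        name = STANDARD_NAMES.get(rec.get("disease", "").strip().lower())
        age = rec.get("age")
        # Reject unmapped names, missing or implausible ages, and records with no visit time.
        if name is None or age is None or not 0 <= age <= max_age or not rec.get("visit_time"):
            continue
        cleaned.append({**rec, "disease": name})
    return cleaned
```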
2. The method of claim 1, characterized in that, in step 2, the attribute values of data that have attribute values are standardized: an attribute whose values contain both text and discrete data is quantified by text classification and converted into a categorical attribute; an attribute with text values only is characterized qualitatively according to business rules and converted into a categorical attribute; and an attribute with discrete data only is deleted.
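The three cases of claim 2 can be sketched as follows, reading "discrete data" as numeric values. This is an illustrative assumption, as are the category labels and the classification rules in `example_classify`; the patent gives neither.

```python
# Hypothetical business rules; the patent does not publish its categories.
FEVER_THRESHOLD_C = 37.3
POSITIVE_WORDS = {"yes", "present", "fever", "high"}


def attribute_kind(values):
    """Classify an attribute as 'mixed', 'text' or 'numeric' by the types of its values."""
    has_text = any(isinstance(v, str) for v in values)
    has_num = any(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if has_text:
        return "mixed" if has_num else "text"
    return "numeric"


def standardize(table, classify):
    """table: {attribute: [values]}; classify(kind, value) returns a category label."""
    out = {}
    for attr, values in table.items():
        kind = attribute_kind(values)
        if kind == "numeric":
            continue  # attributes holding only discrete (numeric) data are deleted
        out[attr] = [classify(kind, v) for v in values]
    return out


def example_classify(kind, value):
    if kind == "mixed" and not isinstance(value, str):
        # Numeric reading of a mixed attribute, e.g. body temperature in degrees C.
        return "abnormal" if value >= FEVER_THRESHOLD_C else "normal"
    if kind == "mixed":
        return "abnormal" if value in POSITIVE_WORDS else "normal"
    return "present" if value in POSITIVE_WORDS else "absent"  # text-only attribute
```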
3. The method of claim 1, characterized in that, in step 3, a minimum support and a minimum confidence are set, and association rules whose support and confidence are greater than or equal to the minimum support and the minimum confidence, respectively, are taken as strong association rules; if a patient's symptoms satisfy a strong association rule for the symptom attributes of a certain disease, the patient is considered to have that disease, the patient's onset time and geographical location are then obtained, and the patient count is updated.
4. The method of claim 3, characterized in that frequent itemsets are mined in step 3 by the following steps:
3.1 Traverse all candidate symptom attributes of a certain disease and determine the support frequency of each attribute; all candidate symptom attributes form the candidate 1-itemset H_1;
3.2 Set a minimum support frequency, compare the support frequency of every attribute in H_1 with the minimum support frequency, and form the frequent 1-itemset F_1 from the attributes in H_1 whose support frequency is greater than the minimum support frequency;
3.3 Then mine the frequent (k+1)-itemset F_{k+1} as follows:
(1) Determine the candidate (k+1)-itemset H_{k+1} by a join operation, where k = 1, 2, ..., n;
(2) Scan the attributes in H_{k+1}, calculate the support frequency of each attribute in H_{k+1}, and form the frequent (k+1)-itemset F_{k+1} from all attributes in H_{k+1} whose support frequency is greater than or equal to the minimum support frequency;
Let |F_k| be the number of itemsets in F_k; the number of attributes in C_{k+1} is determined by |F_k|. C_{k+1} is the candidate set of frequent itemsets, i.e. C_{k+1} includes H_1, ..., H_{k+1};
(3) When the support frequencies of all attribute elements in an itemset are below the minimum support frequency, terminate the algorithm.
From the mined frequent itemsets F_{k+1}, obtain the strong association rules that satisfy the minimum support and the minimum confidence, where k = 0, 1, ..., n.
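The level-wise mining of claim 4 closely resembles the standard Apriori algorithm, sketched below. This is a hedged sketch, not the patent's code. The claim's join formula is not legible in the text, so the join here uses the common Apriori form (joining F_k with itself). The prune step is a standard Apriori optimization that does not change the resulting itemsets. A single ">=" threshold is used at every level.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support_count):
    """Standard Apriori: return {frozenset(itemset): support_count} for frequent itemsets.

    transactions: iterable of item sets (e.g. one patient's symptom attributes).
    """
    transactions = [frozenset(t) for t in transactions]
    # H_1 -> F_1: count every single item and keep those meeting the threshold.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support_count}
    all_frequent = dict(frequent)
    k = 1
    while frequent:  # stop once no itemset at the current level is frequent
        # Join: union pairs of frequent k-itemsets that form a (k+1)-itemset.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Scan: count how many transactions contain each candidate.
        support = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {c: n for c, n in support.items() if n >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```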
5. The method of claim 4, characterized in that the confidence of each obtained association rule is calculated by formula (1):
$$\mathrm{confidence}(H \Rightarrow F) = \frac{\mathrm{support\_count}(H \cup F)}{\mathrm{support\_count}(H)} \qquad (1)$$
In formula (1), support_count(H ∪ F) is the number of transactions containing the itemset H ∪ F, support_count(H) is the number of transactions containing the itemset H, H is the candidate itemset, and F is the frequent itemset.
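A short sketch of how strong rules could be derived from mined itemsets, using confidence(H ⇒ F) = support_count(H ∪ F) / support_count(H) as defined in claim 5. It assumes the input maps every frequent itemset to its support count, so minimum support is already satisfied and only the minimum-confidence check of claim 3 remains. `strong_rules` and its tuple output format are hypothetical.

```python
from itertools import combinations


def strong_rules(frequent, min_confidence):
    """Return (H, F, support_count(H ∪ F), confidence) for rules H => F meeting min_confidence.

    frequent: {frozenset(itemset): support_count} holding every frequent itemset;
    by the Apriori property, every subset of a frequent itemset is also present.
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]  # support_count(H ∪ F) / support_count(H)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, conf))
    return rules
```

For example, if {chest tightness} occurs in 2 transactions and {chest tightness, diarrhoea} also occurs in 2, then the rule chest tightness ⇒ diarrhoea has confidence 2 / 2 = 1.0.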
6. The method of claim 1, 3 or 4, characterized in that, in step 3, the result data are stored in a result table according to the strong association rules, with disease type, onset time, number of patients and geographical location as column family members of the result table.
7. The method of claim 1, characterized in that, in step 4, a layer is added to Baidu Map to show the distribution of a certain disease in space, time and population in the form of a heat map, where the coverage area of the layer is obtained from the geographical location and the transparency of the layer is obtained from the number of patients in that area.
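The layer data of claim 7 can be sketched as heat-map points whose opacity grows with the area's patient count, so more heavily affected areas are less transparent. This sketch only prepares the data and does not call any Baidu Map API. The linear scaling, the alpha bounds and the output field names are illustrative assumptions, not values from the patent.

```python
def layer_points(rows, min_alpha=0.2, max_alpha=0.9):
    """Turn result rows into heat-map points whose opacity scales with patient count.

    rows: [{"location": (lng, lat), "patients": int}, ...]. Linear scaling between
    min_alpha and max_alpha is a hypothetical choice.
    """
    if not rows:
        return []
    peak = max(r["patients"] for r in rows)
    points = []
    for r in rows:
        lng, lat = r["location"]
        share = r["patients"] / peak if peak else 0.0
        points.append({"lng": lng, "lat": lat, "count": r["patients"],
                       "opacity": round(min_alpha + (max_alpha - min_alpha) * share, 3)})
    return points
```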
8. The method of claim 7, characterized in that the layer data are initialized before step 1.
9. The method of claim 1 or 7, characterized in that, in step 4, secondary development is performed on Baidu Map to implement the interconnection and application response between the data model and the GIS map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610497249.7A CN106202883A (en) | 2016-06-28 | 2016-06-28 | A kind of method setting up disease cloud atlas based on big data analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106202883A true CN106202883A (en) | 2016-12-07 |
Family
ID=57463275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610497249.7A Pending CN106202883A (en) | 2016-06-28 | 2016-06-28 | A kind of method setting up disease cloud atlas based on big data analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202883A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101149751A (en) * | 2007-10-29 | 2008-03-26 | 浙江大学 | Generalized relating rule digging method for analyzing traditional Chinese medicine recipe drug matching rule |
US20100332430A1 (en) * | 2009-06-30 | 2010-12-30 | Dow Agrosciences Llc | Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules |
CN104715013A (en) * | 2015-01-26 | 2015-06-17 | 南京邮电大学 | Hadoop-based user health data analysis method and system |
CN104866979A (en) * | 2015-06-08 | 2015-08-26 | 苏芮 | Traditional Chinese medicine case data processing method and system of emergent acute infectious disease |
2016-06-28: CN201610497249.7A filed in China, published as CN106202883A (listed status: pending)
Non-Patent Citations (3)
Title |
---|
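Zhou Ying et al. (周英等): "Big Data Technology Series. Big Data Mining: System Methods and Case Analysis", 31 May 2016, China Machine Press (机械工业出版社) * |
Wang Hua et al. (王华等): "Clinical Application of Data Mining Based on Association Rules", Journal of Anhui University (安徽大学学报) * |
Xu Siying (许思莹): "Research on Plotting Data Mining Based on Spatio-Temporal Association Rules: A Case Study of Tourism Plotting Data", China Master's Theses Full-text Database, Information Science and Technology * |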
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451416A (en) * | 2017-08-28 | 2017-12-08 | 昆明理工大学 | A kind of sle auxiliary diagnostic equipment and method |
CN107887026A (en) * | 2017-11-01 | 2018-04-06 | 中国科学院地理科学与资源研究所 | A kind of assembly type cancer intelligence Mapping System and method based on environmental hazard key element |
CN107887026B (en) * | 2017-11-01 | 2022-04-05 | 中国科学院地理科学与资源研究所 | Component type cancer intelligent mapping system and method based on environmental risk factors |
WO2019136807A1 (en) * | 2018-01-12 | 2019-07-18 | 平安科技(深圳)有限公司 | Medical data relationship image acquisition method and apparatus, terminal device and storage medium |
CN108417274A (en) * | 2018-03-06 | 2018-08-17 | 东南大学 | Forecast of epiphytotics method, system and equipment |
CN108806767A (en) * | 2018-06-15 | 2018-11-13 | 中南大学 | Disease symptoms association analysis method based on electronic health record |
CN108806767B (en) * | 2018-06-15 | 2021-10-22 | 中南大学 | Disease symptom correlation analysis method based on electronic medical record |
CN109147879A (en) * | 2018-07-02 | 2019-01-04 | 北京众信易保科技有限公司 | The method and system of Visual Report Forms based on medical document |
CN109147879B (en) * | 2018-07-02 | 2021-07-27 | 北京众信易保科技有限公司 | Method and system for visual report based on medical document |
CN109065158B (en) * | 2018-08-22 | 2020-06-30 | 湖南德善信医药科技有限公司 | Data extraction working method of big data intelligent equipment |
CN109192301A (en) * | 2018-08-22 | 2019-01-11 | 重庆华医康道科技有限公司 | Patient's diagnostic work method is carried out by internet intelligent medical treatment & health equipment |
CN109065158A (en) * | 2018-08-22 | 2018-12-21 | 重庆市智权之路科技有限公司 | Big data smart machine carries out data and extracts working method |
CN109192301B (en) * | 2018-08-22 | 2022-02-18 | 重庆华医康道科技有限公司 | Method for diagnosing patient through Internet intelligent medical health equipment |
CN109411093B (en) * | 2018-10-16 | 2022-03-18 | 国康中健(北京)健康科技有限公司 | Intelligent medical big data analysis processing method based on cloud computing |
CN109411093A (en) * | 2018-10-16 | 2019-03-01 | 烟台翰宁信息科技有限公司 | A kind of intelligent medical treatment big data analysis processing method based on cloud computing |
CN111341454A (en) * | 2018-12-19 | 2020-06-26 | 中国电信股份有限公司 | Data mining method and device |
CN110781216A (en) * | 2019-11-05 | 2020-02-11 | 广东工业大学 | Traditional Chinese medicine symptom association rule mining method and device and storage medium |
CN110703183A (en) * | 2019-11-13 | 2020-01-17 | 江苏方天电力技术有限公司 | Intelligent electric energy meter fault data analysis method and system |
CN111476696A (en) * | 2020-03-27 | 2020-07-31 | 南京慧智灵杰信息技术有限公司 | Community correction group position information monitoring and alarming system based on big data |
CN111540425B (en) * | 2020-04-26 | 2021-01-15 | 和宇健康科技股份有限公司 | Intelligent medical information pushing method based on artificial intelligence and electronic medical record cloud platform |
CN111540425A (en) * | 2020-04-26 | 2020-08-14 | 吴九云 | Intelligent medical information pushing method based on artificial intelligence and electronic medical record cloud platform |
CN112365943A (en) * | 2020-10-22 | 2021-02-12 | 杭州未名信科科技有限公司 | Method and device for predicting length of stay of patient, electronic equipment and storage medium |
WO2022083140A1 (en) * | 2020-10-22 | 2022-04-28 | 杭州未名信科科技有限公司 | Patient length of stay prediction method and apparatus, electronic device, and storage medium |
CN113362960A (en) * | 2021-07-02 | 2021-09-07 | 西南科技大学 | Urban resident public health influence factor visual analysis system and method combining multi-source data |
CN113362960B (en) * | 2021-07-02 | 2022-04-29 | 西南科技大学 | Urban resident public health influence factor visual analysis system and method combining multi-source data |
Similar Documents
Publication | Title |
---|---|
CN106202883A (en) | A kind of method setting up disease cloud atlas based on big data analysis | |
Shickel et al. | DeepSOFA: a continuous acuity score for critically ill patients using clinically interpretable deep learning | |
CN104166667B (en) | Analysis system and public health work support method | |
US7865375B2 (en) | System and method for multidimensional extension of database information using inferred groupings | |
US7991579B2 (en) | Statistical methods for multivariate ordinal data which are used for data base driven decision support | |
Davis et al. | Deconstructing a species-complex: geometric morphometric and molecular analyses define species in the Western Rattlesnake (Crotalus viridis) | |
Albert | Decision theory in medicine: a review and critique | |
CN101911077A (en) | Method and apparatus for refining similar case search | |
CN102405473A (en) | A point-of-care enactive medical system and method | |
JP7404581B1 (en) | Chronic nephropathy subtype mining system based on self-supervised graph clustering | |
CN112201360A (en) | Chronic disease follow-up visit record collection method, device, equipment and storage medium | |
Ahmed et al. | TDTD: Thyroid disease type diagnostics | |
US20210225513A1 (en) | Method to Create Digital Twins and use the Same for Causal Associations | |
Lin et al. | Time-to-event predictive modeling for chronic conditions using electronic health records | |
Moriña et al. | Competing risks simulation with the survsim R package | |
Chou et al. | Extracting drug utilization knowledge using self-organizing map and rough set theory | |
Sushma et al. | Comparative Study of Naive Bayes, Gaussian Naive Bayes Classifier and Decision Tree Algorithms for Prediction of Heart Diseases | |
Hashimoto et al. | The Log-Burr XII regression model for grouped survival data | |
Hadgu et al. | Application of generalized estimating equations to a dental randomized clinical trial | |
Avdic | Microeconometric analyses of individual behavior in public welfare systems | |
Giuliano et al. | Issuing electrocardiographic reports remotely: experience of the telemedicine network of Santa Catarina | |
Sakthidharan et al. | Detection and prediction of breast cancer using CNN-MDRP algorithm in big data and machine learning: study and analysis | |
US20230307139A1 (en) | Computing device for estimating the probability of myocardial infarction | |
Bian et al. | Towards a task taxonomy of visual analysis of electronic health or medical record data | |
Xie et al. | Predicting number of hospitalization days based on health insurance claims data using bagged regression trees |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161207 |