CN109147879A - Method and system for visual report based on medical document - Google Patents

Method and system for visual report based on medical document

Info

Publication number
CN109147879A
Authority
CN
China
Prior art keywords
disease
data
analysis
algorithm
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810709344.8A
Other languages
Chinese (zh)
Other versions
CN109147879B (en)
Inventor
孙字弋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongxin Yi Bao Technology Co Ltd
Original Assignee
Beijing Zhongxin Yi Bao Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongxin Yi Bao Technology Co Ltd
Priority to CN201810709344.8A
Publication of CN109147879A
Application granted
Publication of CN109147879B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method of visual reports based on medical documents. The method comprises the following steps: 1) acquire the data of medical documents; 2) divide the data of the medical documents into disease data and patient data; 3) analyze the disease category data using a clustering algorithm, then present the analysis result as a disease category distribution map; 4) analyze the patient population data using a population attribute label algorithm and an association rule mining algorithm, then present the analysis result as a network relationship graph of the patient population. The disease category data analysis uses a clustering algorithm, and the analysis of the patient population data performs association rule mining with the Apriori algorithm. Addressing the specificity of medical big data, the invention proposes a solution that presents its different dimensions in a unified way that supports analysis for disease prevention and control.

Description

Method and system for visual report based on medical document
Technical field
The invention belongs to the technical field of data and information processing. It relates to the processing of medical big data and, more particularly, to a method and system for visual reports based on medical documents.
Background
In the medical industry, one kind of medical data is the diagnosis and treatment data held by hospitals. Such data is generally highly specialized and is stored mainly within individual hospital departments, so it is not easy to obtain through ordinary channels. Medical document data (invoices, prescriptions, and the like), however, must be handed to the patient and is therefore easy to acquire; for example, insurance companies can obtain it through their claims-settlement channels. As a result, the volume of such medical document data is growing geometrically. The accompanying problem is an extreme shortage of visualization systems for medical document big data.
When faced with massive data, browsing records one by one becomes meaningless, which is why visualization systems are needed. For a visualization system, moreover, the data and data dimensions of different industries lead to very different final report presentations.
With the rise of the big data concept, all industries have begun to pay close attention to acquiring and storing their various kinds of data. Big data analysis already has certain known applications: for example, patent application No. 201610497249 relates to a method for building a disease cloud map based on big data analysis, and patent application No. 201710150587.8 relates to a big data visualization method for smart environmental protection. Medical big data, however, has its own specificity: it includes diseases and disease categories, and patients have attributes such as age and gender. How to present these different dimensions in a unified way that supports analysis for disease prevention and control is a problem that needs to be solved.
Summary of the invention
To meet the above need, the present invention provides a method of visual reports based on medical documents.
The method mainly comprises the following steps:
1) Acquire the data of medical documents.
2) Divide the data of the medical documents into disease data and patient data.
3) Analyze the disease category data using a clustering algorithm, then present the analysis result as a disease category distribution map.
4) Analyze the patient population data using a population attribute label algorithm and an association rule mining algorithm, then present the analysis result as a network relationship graph of the patient population.
The method of analyzing the disease category data is as follows:
Disease data is obtained from the disease names in the prescriptions and diagnosis certificates on the medical documents.
The ICD-10 medical catalogue is used as a tree-structured directory, and specific diseases are then clustered onto this directory tree. The detailed process is:
A) Organize the ICD-10 catalogue as relational data, divided into three levels: DS1, DS2, and DS3.
B) Use similarity search, combined with error correction, to locate the specific DS3 disease record.
The lookup traverses the diseases on the document and calculates the edit distance between each one and the DS3-level diseases.
The algorithm is as follows:
B1) If the length of str1 or str2 is 0, return the length of the other string: if (str1.length == 0) return str2.length; if (str2.length == 0) return str1.length.
B2) Initialize a matrix d of size (n+1)*(m+1), where n and m are the lengths of str1 and str2, and let the values of the first row and first column increase from 0. Scan the two strings (on the order of n*m comparisons): if the i-th character of str1 equals the j-th character of str2, record temp as 0; otherwise record temp as 1. Then assign d[i, j] the minimum of d[i-1, j]+1, d[i, j-1]+1, and d[i-1, j-1]+temp.
B3) After the scan, return the last value of the matrix, d[n][m], which is the distance between the two strings.
B4) Compare the distance against all DS3-level diseases. If the distance is 0 or below a threshold, the match is a hit and the disease on the document is taken to be that DS3 disease.
C) For each DS3 disease, record the number of patients.
D) At the DS2 level, sum all counts of the DS3 level; at the DS1 level, sum all counts of DS2. In this way the number of patients can be obtained at any level.
E) Finally, the number of cases and the number of patients for each disease can be summarized through the tree structure.
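Steps B1) to B4) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the `max_distance` parameter, and the choice to treat a distance equal to the threshold as a hit are assumptions.

```python
def edit_distance(str1: str, str2: str) -> int:
    """Edit (Levenshtein) distance following steps B1-B3."""
    n, m = len(str1), len(str2)
    # B1: if either string is empty, the distance is the other's length.
    if n == 0:
        return m
    if m == 0:
        return n
    # B2: (n+1) x (m+1) matrix; first row and column count up from 0.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Matrix index i corresponds to character str1[i - 1].
            temp = 0 if str1[i - 1] == str2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + temp)  # substitution or match
    # B3: the last cell holds the distance.
    return d[n][m]


def match_ds3(name, ds3_names, max_distance=1):
    """B4: return the closest DS3 name, or None if nothing is close enough.

    max_distance is a hypothetical threshold; the source does not give a value.
    """
    best = min(ds3_names, key=lambda candidate: edit_distance(name, candidate))
    return best if edit_distance(name, best) <= max_distance else None
```

For example, `match_ds3("asthmo", ["asthma", "influenza"])` resolves the misspelled name to `"asthma"`, while a name far from every catalogue entry yields `None`.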
The result of the above method is presented as a visual report based on the disease category distribution map. The invention uses a rectangular treemap to show the number of cases of each disease: the larger a region's area, the more cases it represents. The main purpose of a rectangular treemap is to make the overall situation clear at a glance within a single figure; the size of each region is determined by the magnitude of its element, and regions can be managed in groups.
The map is drawn as follows: first, calculate each third-level disease's share of the total from its number of cases; then use that share to determine the area each third-level disease occupies within the rectangle. Once the rectangular areas of all third-level diseases are determined, the areas of the second-level and first-level diseases are determined as well.
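The area calculation described above can be sketched as follows. This covers only the proportional areas, not the layout of rectangles; the function names, the `total_area` parameter, and the ICD-10 codes in the usage note are illustrative assumptions.

```python
def leaf_areas(ds3_counts, total_area=1.0):
    """Area of each third-level disease, proportional to its case count."""
    total = sum(ds3_counts.values())
    if total == 0:
        return {code: 0.0 for code in ds3_counts}
    return {code: total_area * n / total for code, n in ds3_counts.items()}


def parent_areas(child_areas, parent_of):
    """Area of each parent region is the sum of its children's areas."""
    areas = {}
    for code, area in child_areas.items():
        parent = parent_of[code]
        areas[parent] = areas.get(parent, 0.0) + area
    return areas
```

With counts `{"J45": 30, "J18": 10, "I10": 60}` and a total area of 100, the third-level areas are 30, 10, and 60; applying `parent_areas` once gives the second-level areas and applying it again gives the first-level areas.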
Disease data is divided into three levels according to the ICD-10 catalogue. First-level diseases are presented as regions of different colors, as in the example of Fig. 2. Second-level and third-level diseases are shown as subdivided regions inside the first-level regions. Clicking any first-level region focuses on that level and shows its information specifically; for example, clicking respiratory diseases presents further information on that category.
The method of analyzing the patient population data is as follows:
There are two data sources: first, the tree structure of each disease (obtained with the disease data analysis method above); second, the population attribute labels of the patient data.
The population attribute labels are derived from the patient's age, gender, and medical insurance card number in the medical documents (for example, case records); patients are then divided by age and gender into different user groups.
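A minimal sketch of the population attribute labelling might look like this. The source does not specify the age brackets, the label format, or how the medical insurance card number is used; the 10-year bands, the `F_30-39` style code, and de-duplication by card number are all assumptions.

```python
def population_group(age: int, sex: str, band_width: int = 10) -> str:
    """Return a group code such as 'F_30-39' (hypothetical format)."""
    low = (age // band_width) * band_width
    return f"{sex}_{low}-{low + band_width - 1}"


def group_patients(patients):
    """Group (card_number, age, sex) tuples by population group.

    Each card number is counted once; this use of the card number
    is an assumption, not something the source states.
    """
    groups = {}
    seen = set()
    for card_number, age, sex in patients:
        if card_number in seen:
            continue
        seen.add(card_number)
        groups.setdefault(population_group(age, sex), []).append(card_number)
    return groups
```

Under these assumptions, a 34-year-old female patient falls into group `F_30-39` and a 61-year-old male patient into `M_60-69`.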
Association rule mining is then performed on the data of these two aspects, disease and patient. The mining is done mainly with the Apriori algorithm.
The Apriori algorithm is one of the most influential algorithms for mining frequent itemsets for Boolean association rules. It takes its name from the fact that it uses prior knowledge of the properties of frequent itemsets. Apriori uses an iterative, level-wise search in which k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to find the set of frequent 2-itemsets L2, L2 is used to find L3, and so on, until no frequent k-itemset can be found. Finding each Lk requires one scan of the database.
All transactions are scanned first to obtain the candidate 1-itemsets C1; itemsets that do not meet the support requirement are eliminated, giving the frequent 1-itemsets. The following operation is then applied recursively:
Given the frequent k-itemsets (the frequent 1-itemsets are known), join the itemsets to obtain all possible (k+1)-itemsets and prune them (a (k+1)-itemset is pruned if any of its k-item subsets fails the support condition), which gives the candidate set Ck+1. Then filter out the itemsets in Ck+1 that do not meet the support condition to obtain the frequent (k+1)-itemsets. If the resulting Ck+1 is empty, the algorithm terminates.
The join works as follows: assume the items in every itemset of Lk are arranged in the same order. If the first k-1 items of Lk[i] and Lk[j] are identical and the k-th items differ, then Lk[i] and Lk[j] are joinable. For example, {I1, I2} and {I1, I3} in L2 are joinable, and joining them gives {I1, I2, I3}; {I1, I2} and {I2, I3}, however, are not joinable, since joining them would produce duplicate itemsets.
As an illustration of pruning: when C3 is generated from L2, the enumerated 3-itemsets include {I1, I2, I3}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, and {I2, I4, I5}. Because {I3, I4} and {I4, I5} do not appear in L2, {I2, I3, I4} and {I2, I4, I5} are pruned; {I2, I3, I5} is pruned in the same way if any of its 2-item subsets is missing from L2.
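The join and prune steps can be sketched as below. Itemsets are represented as sorted tuples; the function names are hypothetical, and the L2 used in the usage note is an illustrative set chosen for this sketch, not data from the source.

```python
from itertools import combinations


def join(lk):
    """Join step: merge two k-itemsets that share their first k-1 items."""
    lk = sorted(lk)
    candidates = []
    for a, b in combinations(lk, 2):
        # Because lk is sorted, a[-1] < b[-1] whenever the prefixes match.
        if a[:-1] == b[:-1] and a[-1] != b[-1]:
            candidates.append(a + (b[-1],))
    return candidates


def prune(candidates, lk):
    """Prune step: keep a candidate only if every k-item subset is frequent."""
    frequent = set(lk)
    k = len(lk[0])
    return [c for c in candidates
            if all(subset in frequent for subset in combinations(c, k))]
```

For an assumed L2 of {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, the join produces six candidate 3-itemsets, and pruning keeps only {I1, I2, I3} and {I1, I2, I5}, because every other candidate contains a 2-item subset that is absent from L2.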
With the above method, the result is finally presented as a network relationship graph of the patient population. Association rule mining reveals the inner relationship between disease categories and the attributes of susceptible populations. The specific method is as follows:
First, for each disease case, calculate the first-level code DS1 of its disease category and the group code PG of the patient's population attributes, and construct a one-dimensional array [DS1, PG].
Then, scan all disease records and append each one-dimensional array from the first step to a new array, building a higher-dimensional array.
Next, run association rule mining on the higher-dimensional array, which yields the frequency weight FP for each combination of DS1 and PG. Because the aim is to analyze high-frequency relationships, the 80 most frequent results are taken and written out as GEXF data. GEXF is an XML language for describing complex network relationships; a GEXF file usually declares the nodes first and then establishes the relationships (edges) between them. DS3 and PG are written as GEXF nodes, and their corresponding FP values are written as edges.
Finally, the GEXF data is rendered as a relationship graph, in which red denotes disease categories and dark blue denotes population attributes. Population attributes are grouped by age bracket and gender; disease categories are classified by the first-level catalogue of ICD-10. Once the weight of the relationship between a population group and a disease category has been calculated, the weight FP is displayed on the connecting edge. The higher the weight, the more susceptible that population is to the disease: FP is mined from the frequency relationship between the population attribute PG and the disease code DS1, so a high FP means that, in the data, the relationship between that population group and the disease occurs with high frequency.
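The first three steps can be approximated with a simple co-occurrence count. In this sketch the raw count of each (DS1, PG) pair stands in for the frequency weight FP, which is an assumption: the source derives FP from association rule mining and does not define it precisely. The function name and sample codes are hypothetical.

```python
from collections import Counter


def top_pairs(records, top_n=80):
    """Count (DS1, PG) pairs and return the top_n most frequent.

    records: iterable of [DS1, PG] pairs, one per disease case.
    Returns a list of ((DS1, PG), count), most frequent first.
    """
    counts = Counter((ds1, pg) for ds1, pg in records)
    return counts.most_common(top_n)
```

Each returned pair corresponds to one edge of the relationship graph, with the count as its weight.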
Correspondingly, the present invention provides a system for visual reports based on big data analysis of medical documents, mainly comprising the following modules:
1) Data acquisition and classification module: acquires the data of medical documents and divides it into disease data and patient data.
2) Data analysis module: comprises a disease category data analysis module and a patient population data analysis module.
3) Visual report module: presents the analysis results as the disease category distribution map and the network relationship graph of the patient population, respectively.
The present invention addresses the specificity of medical big data (diseases, disease categories, and patient attributes such as age and gender) and proposes a solution that presents these different dimensions in a unified way that supports analysis for disease prevention and control. It also addresses the shortage of visualization systems for medical document big data caused by the geometric growth of medical document data, and has good application and promotion value.
Brief description of the drawings
Fig. 1: Basic flow chart of the method and system of the present invention.
Fig. 2: Schematic diagram of the number of cases of each disease (example).
Fig. 3: Schematic diagram of the number of cases of respiratory diseases (example).
Fig. 4: Network relationship graph of the inner relationship between disease categories and susceptible population attributes (example).
Detailed description of the embodiments
The present invention is further explained below through the description of specific embodiments, which are not to be construed as limiting the invention.
I. Main flow of the method of the invention
1) Acquire the data of medical documents.
2) Divide the data of the medical documents into disease data and patient data.
3) Analyze the disease category data using a clustering algorithm, then present the analysis result as a disease category distribution map.
4) Analyze the patient population data using a population attribute label algorithm and an association rule mining algorithm, then present the analysis result as a network relationship graph of the patient population.
II. Explanation of the analysis methods
1. Data analysis method for the disease category distribution map
Disease data is obtained from the disease names in the prescriptions and diagnosis certificates on the medical documents.
The ICD-10 medical catalogue is used as a tree-structured directory, and specific diseases are then clustered onto this directory tree. The process is:
A) Organize the ICD-10 catalogue as relational data, divided into three levels: DS1, DS2, and DS3.
B) Use similarity search, combined with error correction, to locate the specific DS3 disease record.
The lookup traverses the diseases on the document and calculates the edit distance between each one and the DS3-level diseases.
The algorithm is as follows:
B1) If the length of str1 or str2 is 0, return the length of the other string: if (str1.length == 0) return str2.length; if (str2.length == 0) return str1.length.
B2) Initialize a matrix d of size (n+1)*(m+1), where n and m are the lengths of str1 and str2, and let the values of the first row and first column increase from 0. Scan the two strings (on the order of n*m comparisons): if the i-th character of str1 equals the j-th character of str2, record temp as 0; otherwise record temp as 1. Then assign d[i, j] the minimum of d[i-1, j]+1, d[i, j-1]+1, and d[i-1, j-1]+temp.
B3) After the scan, return the last value of the matrix, d[n][m], which is the distance between the two strings.
B4) Compare the distance against all DS3-level diseases. If the distance is 0 or below a threshold, the match is a hit and the disease on the document is taken to be that DS3 disease.
C) For each DS3 disease, record the number of patients.
D) At the DS2 level, sum all counts of the DS3 level; at the DS1 level, sum all counts of DS2. In this way the number of patients can be obtained at any level.
E) Finally, the number of cases and the number of patients for each disease can be summarized through the tree structure.
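Steps C) to E) amount to rolling counts up the three-level tree. A minimal sketch, assuming the DS3 to DS2 and DS2 to DS1 parent relations are available as dictionaries (the function name, argument names, and sample codes are hypothetical):

```python
from collections import defaultdict


def roll_up(ds3_counts, ds3_to_ds2, ds2_to_ds1):
    """Sum DS3 patient counts into DS2 totals, then DS2 totals into DS1."""
    ds2_counts = defaultdict(int)
    for code, n in ds3_counts.items():
        ds2_counts[ds3_to_ds2[code]] += n
    ds1_counts = defaultdict(int)
    for code, n in ds2_counts.items():
        ds1_counts[ds2_to_ds1[code]] += n
    return dict(ds2_counts), dict(ds1_counts)
```

After the roll-up, the patient count at any level can be read directly from the corresponding dictionary.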
2. Data analysis method for the network relationship graph of the patient population
Two data sources are needed: first, the tree structure of each disease (obtained from the data analysis for the disease category distribution map above); second, the population attribute labels of the patient data.
The population attribute labels are derived from the patient's age, gender, and medical insurance card number in the medical documents (for example, case records); patients are then divided by age and gender into different user groups.
Association rule mining is then performed on the data of these two aspects, disease and patient.
The mining is done mainly with the Apriori algorithm.
The Apriori algorithm is one of the most influential algorithms for mining frequent itemsets for Boolean association rules. It takes its name from the fact that it uses prior knowledge of the properties of frequent itemsets. Apriori uses an iterative, level-wise search in which k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to find the set of frequent 2-itemsets L2, L2 is used to find L3, and so on, until no frequent k-itemset can be found. Finding each Lk requires one scan of the database.
The idea of the algorithm is briefly described below. In short, if a set I is not a frequent itemset, then no larger set containing I can be a frequent itemset either.
The initial data of the algorithm is as follows:
The basic process of the algorithm is as follows:
All transactions are scanned first to obtain the candidate 1-itemsets C1; itemsets that do not meet the support requirement are eliminated, giving the frequent 1-itemsets.
The following operation is then applied recursively:
Given the frequent k-itemsets (the frequent 1-itemsets are known), join the itemsets to obtain all possible (k+1)-itemsets and prune them (a (k+1)-itemset is pruned if any of its k-item subsets fails the support condition), which gives the candidate set Ck+1. Then filter out the itemsets in Ck+1 that do not meet the support condition to obtain the frequent (k+1)-itemsets. If the resulting Ck+1 is empty, the algorithm terminates.
The join works as follows: assume the items in every itemset of Lk are arranged in the same order. If the first k-1 items of Lk[i] and Lk[j] are identical and the k-th items differ, then Lk[i] and Lk[j] are joinable. For example, {I1, I2} and {I1, I3} in L2 are joinable, and joining them gives {I1, I2, I3}; {I1, I2} and {I2, I3}, however, are not joinable, since joining them would produce duplicate itemsets.
As an illustration of pruning: when C3 is generated from L2, the enumerated 3-itemsets include {I1, I2, I3}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, and {I2, I4, I5}. Because {I3, I4} and {I4, I5} do not appear in L2, {I2, I3, I4} and {I2, I4, I5} are pruned; {I2, I3, I5} is pruned in the same way if any of its 2-item subsets is missing from L2.
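The level-wise loop as a whole can be sketched as follows. This is a compact illustration, not the patented implementation: it generates candidates by taking unions of frequent k-itemsets instead of the ordered prefix join described above (after pruning, both yield the same candidates), and `min_support` is taken as an absolute transaction count, which is an assumption.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset (frozenset): support count}."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the transactions per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: candidate 1-itemsets C1, filtered by support to give L1.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Join: unions of two frequent k-itemsets that have k+1 items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune: every k-item subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent
```

Applied to transactions that each hold one disease code and one population group code, the frequent 2-itemsets are the disease and group combinations that occur at least `min_support` times.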
III. Style and data structure of the visual reports
1. Visual report based on the disease category distribution map
A rectangular treemap is used to show the number of cases of each disease: the larger a region's area, the more cases it represents. The main purpose of a rectangular treemap is to make the overall situation clear at a glance within a single figure; the size of each region is determined by the magnitude of its element, and regions can be managed in groups.
The map is drawn as follows: first, calculate each third-level disease's share of the total from its number of cases; then use that share to determine the area each third-level disease occupies within the rectangle. Once the rectangular areas of all third-level diseases are determined, the areas of the second-level and first-level diseases are determined as well. Fig. 2 shows an example of a disease category distribution map.
Disease data is divided into three levels according to the ICD-10 catalogue. First-level diseases are presented as regions of different colors, as in the example of Fig. 2. Second-level and third-level diseases are shown as subdivided regions inside the first-level regions. Clicking any first-level region focuses on that level and shows its information specifically; for example, clicking respiratory diseases presents further information on that category, as in the example of Fig. 3.
2. Network relationship graph of the patient population
Association rule mining is an important topic in data mining. As the name suggests, it discovers associations or connections that may exist between things from the data behind them. For example, a survey of what customers buy in a shopping mall might find that 30% of customers buy sheets and pillowcases together, and that 80% of the customers who buy sheets also buy pillowcases. This hides an association, sheet -> pillowcase: a considerable share of customers buy sheets and pillowcases together, so the mall can place the two in the same shopping area to make shopping more convenient.
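In standard association rule terms, the 30% figure in this example is the support of the rule sheet -> pillowcase and the 80% figure is its confidence. The sketch below computes both from a list of transactions; the function names and the sample data in the usage note are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing antecedent, the fraction that also
    contain consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    both = sum(1 for t in with_antecedent if consequent <= set(t))
    return both / len(with_antecedent)
```

For instance, with 40 transactions of which 15 contain a sheet and 12 of those also contain a pillowcase, the support of {sheet, pillowcase} is 12/40 = 30% and the confidence of sheet -> pillowcase is 12/15 = 80%.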
In the present invention, association rule mining reveals the inner relationship between disease categories and the attributes of susceptible populations. The specific method is as follows:
First, for each disease case, calculate the first-level code DS1 of its disease category and the group code PG of the patient's population attributes, and construct a one-dimensional array [DS1, PG].
Then, scan all disease records and append each one-dimensional array from the first step to a new array, building a higher-dimensional array.
Next, run association rule mining on the higher-dimensional array, which yields the frequency weight FP for each combination of DS1 and PG. Because the aim is to analyze high-frequency relationships, the 80 most frequent results are taken and written out as GEXF data. GEXF is an XML language for describing complex network relationships; a GEXF file usually declares the nodes first and then establishes the relationships (edges) between them. DS3 and PG are written as GEXF nodes, and their corresponding FP values are written as edges.
Finally, the GEXF data is rendered as a relationship graph, in which red denotes disease categories and dark blue denotes population attributes. Population attributes are grouped by age bracket and gender; disease categories are classified by the first-level catalogue of ICD-10. Once the weight of the relationship between a population group and a disease category has been calculated, the weight FP is displayed on the connecting edge. The higher the weight, the more susceptible that population is to the disease: FP is mined from the frequency relationship between the population attribute PG and the disease code DS1, so a high FP means that, in the data, the relationship between that population group and the disease occurs with high frequency. Fig. 4 shows an example of a network relationship graph of the inner relationship between disease categories and susceptible population attributes.
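Writing the weighted pairs out as GEXF can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch assumes the GEXF 1.2 draft namespace and an undirected static graph; the function name and the input format of (disease, group, FP) triples are hypothetical, and node colors are left out.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"  # assumed GEXF version


def build_gexf(weighted_pairs):
    """Build a GEXF document from (disease, group, fp) triples."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph",
                          {"mode": "static", "defaultedgetype": "undirected"})
    nodes = ET.SubElement(graph, "nodes")   # nodes are declared first
    edges = ET.SubElement(graph, "edges")   # then the edges between them
    ids = {}

    def node_id(label):
        # Create each node once and reuse its id afterwards.
        if label not in ids:
            ids[label] = str(len(ids))
            ET.SubElement(nodes, "node", {"id": ids[label], "label": label})
        return ids[label]

    for n, (disease, group, fp) in enumerate(weighted_pairs):
        ET.SubElement(edges, "edge", {
            "id": str(n),
            "source": node_id(disease),
            "target": node_id(group),
            "weight": str(fp),   # FP becomes the edge weight
        })
    return ET.tostring(gexf, encoding="unicode")
```

The resulting string can be saved to a `.gexf` file and opened in a graph tool that reads the format; coloring disease and population nodes differently would be done in that tool or through additional GEXF attributes.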

Claims (8)

1. A method of visual reports based on medical documents, characterized by comprising the following steps:
1) acquiring the data of medical documents;
2) dividing the data of the medical documents into disease data and patient data;
3) analyzing the disease category data using a clustering algorithm, and then presenting the analysis result as a disease category distribution map;
4) analyzing the patient population data using a population attribute label algorithm and an association rule mining algorithm, and then presenting the analysis result as a network relationship graph of the patient population;
wherein the disease category data analysis uses the ICD-10 medical catalogue as a tree-structured directory and then clusters specific diseases onto the directory tree;
and wherein analyzing the patient population data consists of performing association rule mining on the data of both disease and patient, the association rule mining being performed with the Apriori algorithm.
2. The method according to claim 1, characterized in that the disease category data analysis is specifically as follows: disease data is obtained from the disease names in the prescriptions and diagnosis certificates on the medical documents; the ICD-10 medical catalogue is used as a tree-structured directory, and specific diseases are then clustered onto the directory tree, the clustering process being:
A) organize the ICD-10 catalogue as relational data, divided into three levels: DS1, DS2, and DS3;
B) use similarity search, combined with error correction, to locate the specific DS3 disease record, the lookup traversing the diseases on the document and calculating the edit distance between each one and the DS3-level diseases;
C) for each DS3 disease, record the number of patients;
D) at the DS2 level, sum all counts of the DS3 level; at the DS1 level, sum all counts of DS2, so that the number of patients can be obtained at any level;
E) finally, summarize the number of cases and the number of patients for each disease through the tree structure.
3. The method according to claim 2, characterized in that the specific algorithm in B) is as follows:
B1) if the length of str1 or str2 is 0, return the length of the other string: if (str1.length == 0) return str2.length; if (str2.length == 0) return str1.length;
B2) initialize a matrix d of size (n+1)*(m+1), where n and m are the lengths of str1 and str2, and let the values of the first row and first column increase from 0; scan the two strings (on the order of n*m comparisons): if the i-th character of str1 equals the j-th character of str2, record temp as 0, otherwise record temp as 1; then assign d[i, j] the minimum of d[i-1, j]+1, d[i, j-1]+1, and d[i-1, j-1]+temp;
B3) after the scan, return the last value of the matrix, d[n][m], which is the distance between the two strings;
B4) compare the distance against all DS3-level diseases; if the distance is 0 or below a threshold, the match is a hit and the disease on the document is taken to be that DS3 disease.
4. The method according to claim 1, characterized in that the patient population data is analyzed as follows:
there are two data sources: first, the tree structure of each disease (obtained with the disease data analysis method above); second, the population attribute labels of the patient data, derived from the patient's age, gender, and medical insurance card number in the medical documents (for example, case records), with patients then divided by age and gender into different user groups;
association rule mining is performed with the Apriori algorithm on the data of these two aspects, disease and patient.
5. The method according to claim 1, characterized in that presenting the analysis result as a disease category distribution map means using a rectangular treemap to show the number of cases of each disease, where the larger a region's area, the more cases it represents.
6. The method according to claim 5, characterized in that the disease category distribution map is drawn as follows: first, calculate each third-level disease's share of the total from its number of cases, then use that share to determine the area each third-level disease occupies within the rectangle; more preferably, disease data is divided into three levels according to the ICD-10 catalogue, first-level diseases are presented as regions of different colors, second-level and third-level diseases are shown as subdivided regions inside the first-level regions, and clicking any first-level region focuses on that level and shows its information specifically.
7. The method according to claim 1, characterized in that the network relationship graph of the patient population is made as follows:
First, for each disease case, the level code DS1 of the disease category is calculated, and the population attribute group code PG of the patient is also calculated; a one-dimensional array [DS1, PG] is constructed from them;
Then, all disease records are scanned, and each one-dimensional array from the first step is added to a new array, building a high-dimensional array;
Next, association rule mining is performed on the high-dimensional array, which finally yields the frequency weight value FP of each combination of DS1 and PG; because the analysis targets high-frequency relationships, the 80 most frequent results are taken and filled into GEXF-format data; DS3 and PG are inserted as GEXF nodes, and the corresponding FP value is inserted as the edge;
Finally, the relationship graph is rendered from the GEXF data. Disease categories and population attributes are shown in different colours; population attributes are grouped by age band and gender; disease categories are classified by the first-level catalogue of ICD-10; once the weight of the relationship between a population group and a disease category has been calculated, the weight value FP can be displayed on the link.
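The steps of claim 7 can be sketched with the Python standard library as below. This is a simplified illustration under stated assumptions: plain co-occurrence counts stand in for the frequency weight FP, the node id prefixes `d:` and `g:` and the function names are hypothetical, and the output follows the GEXF 1.2 draft element layout (`nodes`, `edges`, `weight` attribute).

```python
from collections import Counter
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def top_pairs(records, limit=80):
    """Count each [disease code, PG] pair and keep the `limit` most frequent ones."""
    return Counter((disease, group) for disease, group in records).most_common(limit)


def to_gexf(pairs):
    """Serialize ((disease, group), FP) pairs as a GEXF document string."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "undirected"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    seen = set()
    for edge_id, ((disease, group), fp) in enumerate(pairs):
        # Prefixes keep disease ids and group ids from colliding.
        for node_id in (f"d:{disease}", f"g:{group}"):
            if node_id not in seen:
                seen.add(node_id)
                ET.SubElement(nodes, "node", {"id": node_id, "label": node_id[2:]})
        # The FP value is stored as the edge weight.
        ET.SubElement(edges, "edge", {"id": str(edge_id), "source": f"d:{disease}",
                                      "target": f"g:{group}", "weight": str(fp)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical records of (first-level disease code, population group).
records = ([("IX", "M:60-69")] * 3 + [("X", "F:30-39")] * 2 + [("IX", "F:30-39")])
pairs = top_pairs(records, limit=2)
gexf_xml = to_gexf(pairs)
```

The resulting string can be saved to a `.gexf` file and opened in a graph tool that reads GEXF; colouring disease nodes and group nodes differently is left to the renderer.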
8. A system for producing a visual report of big-data analysis based on medical documents, implementing the method according to any one of claims 1 to 7, characterized in that it mainly comprises the following modules:
1) data acquisition and classification module: used to acquire the data of medical documents and to divide that data into disease data and patient data;
2) data analysis module: comprising a disease category data analysis module and a patient-population data analysis module;
3) visual report module: presents the analysis results as a disease category distribution map and a network relationship graph of the patient population, respectively.
CN201810709344.8A 2018-07-02 2018-07-02 Method and system for visual report based on medical document Active CN109147879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810709344.8A CN109147879B (en) 2018-07-02 2018-07-02 Method and system for visual report based on medical document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810709344.8A CN109147879B (en) 2018-07-02 2018-07-02 Method and system for visual report based on medical document

Publications (2)

Publication Number Publication Date
CN109147879A CN109147879A (en) 2019-01-04
CN109147879B CN109147879B (en) 2021-07-27

Family

ID=64802681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810709344.8A Active CN109147879B (en) 2018-07-02 2018-07-02 Method and system for visual report based on medical document

Country Status (1)

Country Link
CN (1) CN109147879B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893766A (en) * 2016-04-06 2016-08-24 成都数联易康科技有限公司 Graded diagnosis and treatment evaluating method based on data mining
CN106202883A (en) * 2016-06-28 2016-12-07 成都中医药大学 A kind of method setting up disease cloud atlas based on big data analysis
US20160378919A1 (en) * 2013-11-27 2016-12-29 The Johns Hopkins University System and method for medical data analysis and sharing
CN106407650A (en) * 2016-08-29 2017-02-15 首都医科大学附属北京中医医院 Traditional Chinese medicine data processing device and method
CN106709248A (en) * 2016-12-16 2017-05-24 浙江大学 Disease complication excavating method based on FP-Growth algorithm
CN106934235A (en) * 2017-03-09 2017-07-07 中国科学院软件研究所 Patient's similarity measurement migratory system between a kind of disease areas based on transfer learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353051A (en) * 2019-12-04 2020-06-30 江苏蓝河智能科技有限公司 K-means and Apriori-based algorithm maritime big data association analysis method
CN111582219A (en) * 2020-05-18 2020-08-25 湖南纳九物联科技有限公司 Intelligent pet management system
CN111582219B (en) * 2020-05-18 2023-12-22 湖南纳九物联科技有限公司 Intelligent pet management system

Also Published As

Publication number Publication date
CN109147879B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
Gutman et al. A Bayesian procedure for file linking to analyze end-of-life medical costs
Clarkson et al. Resultmaps: Visualization for search interfaces
SuryaNarayana et al. A traditional analysis for efficient data mining with integrated association mining into regression techniques
AU2016345990A1 (en) A system and method for processing big data using electronic document and electronic file-based system that operates on RDBMS
US20050015381A1 (en) Database management system
US20100082697A1 (en) Data model enrichment and classification using multi-model approach
Feng et al. Has ceo gender bias really been fixed? adversarial attacking and improving gender fairness in image search
Mishne et al. Data-driven tree transforms and metrics
CN109147879A (en) The method and system of Visual Report Forms based on medical document
Canbek et al. New techniques in profiling big datasets for machine learning with a concise review of android mobile malware datasets
Singh et al. Effectual variance estimation strategy in two-occasion successive sampling in presence of random non response
Mueller-Warrant et al. Detecting and correcting logically inconsistent crop rotations and other land-use sequences
Gordon et al. TSI-GNN: extending graph neural networks to handle missing data in temporal settings
Syam et al. Efficient similarity measure via Genetic algorithm for content based medical image retrieval with extensive features
Li A Bayesian approach for estimating and replacing missing categorical data
CN108572981A (en) A kind of information recommendation method and device
Zhao et al. Entity matching across heterogeneous data sources: An approach based on constrained cascade generalization
Ranbaduge et al. A scalable and efficient subgroup blocking scheme for multidatabase record linkage
Jentner et al. Visual analytics of co-occurrences to discover subspaces in structured data
Xu et al. Automatic semantic modeling for structural data source with the prior knowledge from knowledge base
US20200089691A1 (en) System and method for regularizing data between data source and data destination
CN112163408A (en) Multi-level pull-down question type data processing method in online questionnaire survey system
Kumar et al. An Efficient Algorithm for Mining Frequent Itemsets in Large Databases
Aghdam et al. On enhancing data utility in k-anonymization for data without hierarchical taxonomies
Muelder et al. Multivariate social network visual analytics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant