CN109522331A - Individual-centered, regionalized, multi-dimensional health data processing method and medium - Google Patents
- Publication number: CN109522331A
- Other identifiers listed: CN201811203501.4A, CN109522331B
- Authority: CN (China)
- Prior art keywords: data, health, master, personal, main
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The present invention provides an individual-centered, regionalized, multi-dimensional health data processing method and medium. The method includes: deduplicating health data collected from different data domains to obtain clean health data; standardizing the clean health data to obtain uniformly described health data; integrating the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table; clustering the main identification records in the different data warehouses to obtain main cluster records; calculating a first weight for each record in the main cluster records according to a multi-domain personal data normalization weight formula; normalizing the main cluster records according to the first weight and a first threshold to obtain main personal records; generating a master index number for the main personal records; and storing the main personal records in a global master index table according to the master index number. The invention cross-validates health data from different data domains and can reflect an individual's health problems dynamically and comprehensively.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to methods for integrating regionalized multi-dimensional health data. Specifically, it concerns an individual-centered, regionalized, multi-dimensional health data processing method and medium.
Background
In recent years, with the rapid development of big data and cloud platforms and people's growing attention to their own health, the value of health big data has become increasingly apparent, and the volume of health data is expanding at a geometric rate. However, massive medical data can realize its value only after it has been classified and integrated in an orderly way. In the prior art, data integration methods are built on a single data source (for example, data internal to one hospital) and integrate data mainly along the lines of department or disease type, so they cannot interconnect different data sources and different data types with the individual as the unit. Even when data inside a hospital is associated by individual, the internal data of each medical institution reflects only one or a few medical encounters and cannot reflect an individual's health problems completely, dynamically, and comprehensively.
In addition, current personal data association techniques mainly link records through an identity card number, driver's license number, military officer ID number, passport number, medical insurance card number, and the like, or identify a person by name, gender, and date of birth. This approach has a single, inflexible processing logic and cannot verify a person's basic information or determine its credibility. If a nickname was used as the name, if the source system has de-identified the data, or if individual fields contain errors, such records can never be normalized to a person, which greatly reduces the value of the health data.
Therefore, those skilled in the art need to develop an integrated health data processing method that takes the individual as the unit and can normalize health data even when some of its fields are wrong, thereby improving the application value of health data.
Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide an individual-centered, regionalized, multi-dimensional health data processing method and medium, which solve the problems that the prior art cannot integrate health data of different sources and types and cannot handle health data in which some fields are wrong.
To solve the above technical problem, a specific embodiment of the present invention provides an individual-centered, regionalized, multi-dimensional health data processing method, comprising: deduplicating, according to an integrity rule, health data collected from different data domains to obtain clean health data; standardizing the clean health data to obtain uniformly described health data; integrating the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, wherein the personal information table includes main identification records and non-main identification records; clustering the main identification records in the different data warehouses to obtain main cluster records; calculating a first weight for each record in the main cluster records according to a multi-domain personal data normalization weight formula; normalizing the main cluster records according to the first weight and a first threshold to obtain main personal records; generating a master index number for the main personal records; and storing the main personal records in a global master index table according to the master index number.
A specific embodiment of the present invention also provides a computer storage medium containing computer-executable instructions; when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered, regionalized, multi-dimensional health data processing method.
According to the above specific embodiments, the individual-centered, regionalized, multi-dimensional health data processing method and medium have at least the following advantages. Health data is compared horizontally within each data domain (data source) and vertically across data domains, so that health data is cross-validated and the reliability of the main personal records stored in the global master index table is ensured. The weight of each data domain and the weight of specific health data fields can be configured as needed, which makes the data handling highly flexible. During data matching, exact matching or fuzzy matching can be selected according to the characteristics of the non-main identification records, which raises the success rate of health data normalization while preserving reliability. The optimal non-main identification data of different types and different data domains is compared with the master index table, which effectively solves the problem of normalizing personal health data from different data domains. The multi-domain personal data normalization weight formula solves the normalization matching problem when a nickname is used as the name, when data is de-identified, or even when the source health data contains errors.
It should be understood that the above general description and the following specific embodiments are merely exemplary and explanatory, and do not limit the scope claimed by the present invention.
Brief description of the drawings
The accompanying drawings are part of the specification of the present invention and depict example embodiments of the invention; together with the description, they serve to explain the principles of the invention.
Fig. 1 is a flowchart of Embodiment 1 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention.
Fig. 2 is a flowchart of Embodiment 2 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention.
Fig. 3 is a flowchart of Embodiment 3 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention.
Fig. 4 is a flowchart of Embodiment 4 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention.
Fig. 5 is a schematic diagram, provided by a specific embodiment of the present invention, of periodically collecting health data in batches from different data domains using the Kettle tool.
Fig. 6 is a schematic diagram, provided by a specific embodiment of the present invention, of cross-indexing the personal information table and the global master index table.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the spirit of the disclosed content is illustrated below with the drawings and a detailed description. After understanding the embodiments, any person skilled in the art may make changes and modifications based on the techniques taught here without departing from the spirit and scope of the present disclosure.
The illustrative embodiments of the present invention and their descriptions are used to explain the invention and do not limit it. In addition, elements or components with the same or similar reference numerals in the drawings and embodiments represent the same or similar parts.
The terms "first", "second", and so on used herein do not denote any particular order or rank, nor do they limit the present invention; they only distinguish elements or operations described with the same technical term.
Directional terms used herein, such as up, down, left, right, front, or rear, refer only to directions in the drawings. They are therefore used for illustration and do not limit the invention.
The terms "comprising", "including", "having", "containing", and the like used herein are open-ended terms, meaning including but not limited to.
The term "and/or" used herein includes any one or all combinations of the listed items.
"Multiple" herein means two or more; "multiple groups" herein means two or more groups.
The terms "substantially", "about", and the like used herein modify any quantity or error that may vary slightly, where such slight variation or error does not change its essence. In general, the range of slight variation or error modified by such terms may be 20% in some embodiments, 10% in some embodiments, and 5% or another value in some embodiments. Those skilled in the art will understand that these values may be adjusted according to actual needs and are not limiting.
Fig. 1 is a flowchart of Embodiment 1 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention. As shown in Fig. 1, the collected health data is deduplicated and then standardized to obtain uniformly described health data; the uniformly described health data is integrated into data warehouses to obtain a personal information table; the main identification records in the personal information table are clustered, and the weight of each record in the main cluster records is calculated; the main cluster records are normalized according to the weights to obtain main personal records; finally, a master index number is generated for the main personal records, and the main personal records are stored in the global master index table according to the master index number.
In the specific embodiment shown in this figure, the individual-centered, regionalized, multi-dimensional health data processing method includes:
Step 101: Deduplicate, according to an integrity rule, the health data collected from different data domains to obtain clean health data. In this embodiment, the integrity rule specifically requires that the name must not be missing from a health record; a record without a name is regarded as having no master record and is logically deleted.
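Step 101 can be sketched roughly as follows. This is only an illustration: the field names (`name`, `id_card`, `domain`) are hypothetical, treating two records as duplicates when all of their fields are equal is an assumption, and the sketch filters out nameless records instead of flagging them as logically deleted.

```python
def deduplicate(records):
    """Drop records that violate the integrity rule, then drop exact duplicates."""
    seen = set()
    clean = []
    for rec in records:
        # Integrity rule from the text: a record without a name has no master record.
        if not (rec.get("name") or "").strip():
            continue
        # Assumption: two records are duplicates when every field is equal.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(rec)
    return clean


# Hypothetical sample data: one duplicate and one record without a name.
raw_records = [
    {"name": "Zhang San", "id_card": "ID-0001", "domain": "hospital"},
    {"name": "Zhang San", "id_card": "ID-0001", "domain": "hospital"},
    {"name": "", "id_card": "ID-0001", "domain": "pharmacy"},
]
clean_records = deduplicate(raw_records)
```

A production system would more likely mark rejected rows with a deletion flag so that they remain auditable, which is what "logical deletion" suggests.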
Step 102: Standardize the clean health data to obtain uniformly described health data. In this embodiment, for example, the gender field in the medical insurance domain marks male as "1", while the gender field in the hospital domain marks male as "B"; after standardization, both are output with the uniform description "male". Step 102 specifically includes: analyzing the massive health data of each data domain to obtain the data identification rules of each data domain; and standardizing the specific field data of the clean health data according to the data identification rules.
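The gender example in step 102 can be expressed as a per-domain lookup table. The sketch below is a minimal illustration; the domain keys and field names are hypothetical, and only the two mappings stated in the text are included, so any other code is passed through unchanged.

```python
# Per-domain identification rules. Only the mappings given in the text are listed.
GENDER_RULES = {
    "medical_insurance": {"1": "male"},
    "hospital": {"B": "male"},
}


def standardize_gender(record):
    """Return a copy of the record with its gender code mapped to a uniform description."""
    rules = GENDER_RULES.get(record.get("domain"), {})
    out = dict(record)
    code = record.get("gender")
    # Unknown codes are left as they are rather than guessed.
    out["gender"] = rules.get(code, code)
    return out


insurance_row = standardize_gender({"domain": "medical_insurance", "gender": "1"})
hospital_row = standardize_gender({"domain": "hospital", "gender": "B"})
```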
Step 103: Integrate the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, wherein the personal information table includes main identification records and non-main identification records. In this embodiment, every main identification record contains a unique number, which includes at least one of an identity card number, a military officer ID number, a passport number, and a medical insurance card number; a non-main identification record contains no unique number. For example, a non-main identification record includes at least one of: name, gender, date of birth, mobile phone number, telephone number, country, province, city, home address, and postal code. The data domains include the medical insurance domain, hospital domain, resident health record domain, pharmacy domain, health wearable device domain, personal usage behavior domain, and so on. The data warehouses are modeled and designed in advance, and each data domain has a corresponding data warehouse.
Medical insurance domain data includes: medical insurance personal information, personal medical insurance payment data, personal medical insurance card status data, medical settlement statements, and so on.
Hospital domain data includes: basic personal information, outpatient prescription data, in-hospital patient history data, examination data, laboratory test data, surgery and anesthesia data, ECG data, inpatient order data, inpatient case data, ICU (intensive care unit) data, and so on.
Resident health record domain data includes: basic personal information, health examination data, child health care data, maternal information, chronic disease management data, infectious disease control data, mental disorder management data, and so on.
Pharmacy domain data includes: basic personal information, personal medicine purchase data, and so on.
Health wearable device domain data includes: basic personal information, blood glucose data, blood pressure data, heart rate data, blood oxygen data, body temperature data, respiratory rate data, and so on.
Personal usage behavior domain data includes: basic personal information and, with the person's authorization, usage behavior data such as queries and browsing generated when using internet products.
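The split of the personal information table into main and non-main identification records described in step 103 can be sketched as below. The field names are hypothetical stand-ins for the four unique numbers named in the text.

```python
# Hypothetical field names for the unique numbers listed in step 103.
UNIQUE_ID_FIELDS = ("id_card", "officer_id", "passport", "insurance_card")


def is_main_identification(record):
    """A record is a main identification record if it carries at least one unique number."""
    return any(record.get(field) for field in UNIQUE_ID_FIELDS)


def split_personal_table(records):
    """Partition records into (main, non_main) identification records."""
    main = [r for r in records if is_main_identification(r)]
    non_main = [r for r in records if not is_main_identification(r)]
    return main, non_main


main_rows, non_main_rows = split_personal_table([
    {"name": "Zhang San", "id_card": "ID-0001", "domain": "hospital"},
    {"name": "Zhang San", "mobile": "138-0000", "domain": "wearable"},
])
```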
Step 104: Cluster the main identification records in the different data warehouses to obtain main cluster records. In this embodiment, equality clustering can be performed successively by identity card number, military officer ID number, passport number, and medical insurance card number. Table 1 shows the main cluster records obtained by clustering on the identity card number.
Table 1
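The equality clustering of step 104 amounts to grouping records that share the same value of one unique number. The following sketch groups by a single field; the field name `id_card` and the sample rows are hypothetical and do not reproduce the contents of Table 1.

```python
from collections import defaultdict


def cluster_by_field(records, field):
    """Group records whose value for `field` is identical; records lacking the field are skipped."""
    clusters = defaultdict(list)
    for rec in records:
        value = rec.get(field)
        if value:
            clusters[value].append(rec)
    return dict(clusters)


rows = [
    {"id_card": "ID-0001", "name": "Zhang San", "domain": "medical_insurance"},
    {"id_card": "ID-0001", "name": "Xiao Zhang", "domain": "hospital"},
    {"id_card": "ID-0002", "name": "Li Si", "domain": "pharmacy"},
]
main_clusters = cluster_by_field(rows, "id_card")
```

Running the same function in turn with the other unique-number fields would mirror the "successively by" wording of the step.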
Step 105: Calculate the first weight of each record in the main cluster records according to the multi-domain personal data normalization weight formula. In this embodiment, the multi-domain personal data normalization weight formula is specifically:
[formula not reproduced in the source text]
where Ω_i is the first weight of each record in the main cluster records; w_i is the data quality weight of the data domain; f_{i,j} is the number of times a specific field of one record in the main cluster records successfully matches the same field of the other records in the main cluster records; N is the number of records in the main cluster records; u_j is the weight of the specific field in the data domain; and i is the data domain number, where the data domains include the medical insurance domain, hospital domain, resident health record domain, pharmacy domain, health wearable device domain, and personal usage behavior domain.
According to Table 1 above, the parameters for the medical insurance domain are as follows:
w_1 = 10 (data quality weight of the medical insurance domain);
f_{1,1} = 3 (number of successful matches between the identity card number in the medical insurance domain and the identity card numbers in the other data domains);
f_{1,2} = 2 (number of successful matches between the name in the medical insurance domain and the names in the other data domains);
f_{1,3} = 3 (number of successful matches between the gender in the medical insurance domain and the genders in the other data domains);
f_{1,4} = 3 (number of successful matches between the date of birth in the medical insurance domain and the dates of birth in the other data domains);
f_{1,5} = 3 (number of successful matches between the address in the medical insurance domain and the addresses in the other data domains, using fuzzy matching);
f_{1,6} = 3 (number of successful matches between the mobile phone number in the medical insurance domain and the mobile phone numbers in the other data domains);
N = 4 (total number of data domains; here each contributes one record to the cluster);
u_1 = 10 (weight of the identity card number field);
u_2 = 8 (weight of the name field);
u_3 = 6 (weight of the gender field);
u_4 = 8 (weight of the date of birth field);
u_5 = 8 (weight of the address field);
u_6 = 8 (weight of the mobile phone number field).
The above parameters can be set freely according to the specific application scenario. For example, if the data requires strict normalization, the weights of fields such as name, date of birth, gender, and address should be increased; otherwise, the weights of these fields can be reduced.
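To show how the listed parameters could combine into a single weight, the sketch below plugs the medical insurance domain values into one plausible form. The formula itself is not reproduced in this text, so the form Ω_i = w_i · Σ_j (u_j · f_{i,j} / N) is purely an assumption chosen to be consistent with the symbol definitions; the function name and argument names are likewise hypothetical.

```python
def normalization_weight(domain_weight, match_counts, field_weights, n_records):
    """ASSUMED form: w_i * sum_j(u_j * f_ij / N). Not taken from the patent text."""
    if len(match_counts) != len(field_weights):
        raise ValueError("match_counts and field_weights must have the same length")
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    weighted = sum(u * f / n_records for u, f in zip(field_weights, match_counts))
    return domain_weight * weighted


# Parameter values for the medical insurance domain as listed in the text.
omega_1 = normalization_weight(
    domain_weight=10,                     # w_1
    match_counts=[3, 2, 3, 3, 3, 3],      # f_{1,1} .. f_{1,6}
    field_weights=[10, 8, 6, 8, 8, 8],    # u_1 .. u_6
    n_records=4,                          # N
)
```

Under this assumed form, raising a field weight such as u_2 makes a name mismatch cost more, which matches the remark that stricter normalization calls for higher field weights.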
Step 106: Normalize the main cluster records according to the first weight and the first threshold to obtain main personal records. In this embodiment, the value of Ω_1, that is, the first weight of the medical insurance domain record in the main cluster records, is calculated with the formula; the values of Ω_2, Ω_3, and Ω_4 are calculated in the same way, and the main cluster records are normalized according to the preset first threshold to obtain the main personal records, which are the records belonging to the same person. In this example, the first, second, and third records belong to the same person and the fourth record belongs to a different person; the main personal records are therefore the first, second, and third records.
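The thresholding in step 106 can be sketched as a simple filter. The weights and the threshold below are made-up illustrative numbers, and keeping records whose weight is greater than or equal to the threshold (rather than strictly greater) is an assumption, since the text does not state the comparison.

```python
def select_main_personal_records(cluster, weights, first_threshold):
    """Keep the records whose first weight reaches the threshold (assumed: >=)."""
    if len(cluster) != len(weights):
        raise ValueError("one weight is required per record")
    return [rec for rec, w in zip(cluster, weights) if w >= first_threshold]


# Hypothetical cluster of four records sharing one identity card number.
cluster = [
    {"domain": "medical_insurance"},
    {"domain": "hospital"},
    {"domain": "resident_health_record"},
    {"domain": "pharmacy"},
]
hypothetical_weights = [340.0, 320.0, 330.0, 90.0]  # illustrative values only
main_personal = select_main_personal_records(cluster, hypothetical_weights, 200.0)
```

With these numbers the first three records are kept and the fourth is rejected, mirroring the outcome described in the example.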
Step 107: Generate a master index number for the main personal records. In this embodiment, the master index number is generated by applying SHA256 to "identity card number + name + date of birth + home address + mobile phone number + random number"; the random number is introduced to avoid duplicate master index numbers. SHA256 is one of the secure hash algorithms; computing it over the data yields a hash value of 64 hexadecimal characters (256 bits).
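A minimal sketch of the master index generation in step 107, using Python's standard `hashlib` and `secrets` modules. The function name, the plain string concatenation without separators, UTF-8 encoding, and the range of the random number are all assumptions; the text specifies only the fields and the use of SHA256.

```python
import hashlib
import secrets


def make_master_index(id_card, name, birth_date, address, mobile):
    """Hash the identifying fields plus a random number into a 64-character hex digest."""
    # Random component, added (as the text explains) to avoid repeated index numbers.
    random_part = str(secrets.randbelow(10**8))
    payload = id_card + name + birth_date + address + mobile + random_part
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


master_index = make_master_index(
    "ID-0001", "Zhang San", "1990-01-01", "1 Example Road", "138-0000"
)
```

Because of the random component, the index cannot be recomputed from the fields later, so it has to be stored alongside the records it identifies.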
Step 108: Store the main personal records in the global master index table according to the master index number. In this embodiment, the generated master index number also needs to be assigned to every record in the main personal records, which are located in different data domains; the global master index table is associated with the main personal records in each data domain through the master index number.
Referring to Fig. 1, health data is compared horizontally within each data domain and vertically across data domains, so that health data is cross-validated and the reliability of the main personal records stored in the global master index table is ensured. The weight of each data domain (data source) and the weight of specific health data fields can be configured as needed, which makes the data handling highly flexible. The multi-domain personal data normalization weight formula solves the normalization matching problem when a nickname is used as the name, when data is de-identified, or even when the source health data contains errors.
Fig. 2 is a flowchart of Embodiment 2 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention. As shown in Fig. 2, after the main personal records are stored in the global master index table according to the master index number, the non-main identification records are clustered to obtain non-main cluster records, and the second weight of each record in the non-main cluster records is calculated; the optimal non-main identification data in each data warehouse is then selected according to the second weight; finally, the optimal non-main identification data is stored in the global master index table.
In the specific embodiment shown in this figure, after step 108, the individual-centered, regionalized, multi-dimensional health data processing method further includes:
Step 109: Cluster the non-main identification records in the different data warehouses to obtain non-main cluster records. In this embodiment, non-main identification records can be grouped into non-main cluster records as long as one specific field, such as name, mobile phone number, or home address, matches successfully. Clustering can be performed successively by name, mobile phone number, telephone number, and home address. Table 2 shows the non-main cluster records obtained by clustering on the telephone number. Table 2 contains four non-main cluster records from different data domains: the records from the medical insurance domain, hospital domain, and wearable device domain belong to the same person, while the record from the user usage behavior domain belongs to someone else.
Table 2
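One way to read "grouped as long as one specific field matches" in step 109 is as a union of records that share any value in the listed fields. The sketch below implements that reading with a small union-find structure. Merging transitively (A matches B on name, B matches C on phone, so all three are grouped) is an assumption, as are the field names; the sample rows do not reproduce Table 2.

```python
# Hypothetical field names for the matching fields named in step 109.
MATCH_FIELDS = ("name", "mobile", "phone", "address")


def cluster_non_main(records):
    """Group records that share a value in any of MATCH_FIELDS (transitively)."""
    parent = list(range(len(records)))

    def find(i):
        # Path-halving find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for field in MATCH_FIELDS:
            value = rec.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    clusters = {}
    for idx, rec in enumerate(records):
        clusters.setdefault(find(idx), []).append(rec)
    return list(clusters.values())


non_main_clusters = cluster_non_main([
    {"name": "Zhang San", "phone": "010-1111", "domain": "medical_insurance"},
    {"name": "Xiao Zhang", "phone": "010-1111", "domain": "hospital"},
    {"name": "Li Si", "phone": "010-2222", "domain": "usage_behavior"},
])
```

Grouping on a single shared field deliberately over-collects candidates; the second weight computed in the next step is what separates records of different people within a cluster.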
Step 110: Calculate the second weight of each record in the non-main cluster records according to the multi-domain personal data normalization weight formula. In this embodiment, the multi-domain personal data normalization weight formula is specifically:
[formula not reproduced in the source text]
where Ω_i is the second weight of each record in the non-main cluster records; w_i is the data quality weight of the data domain; f_{i,j} is the number of times a specific field of one record in the non-main cluster records successfully matches the same field of the other records in the non-main cluster records; N is the number of records in the non-main cluster records; u_j is the weight of the specific field in the data domain; and i is the data domain number, where the data domains include the medical insurance domain, hospital domain, resident health record domain, pharmacy domain, health wearable device domain, personal usage behavior domain, and so on.
Step 111: Select the optimal non-main identification data in each data warehouse according to the second weight. In this embodiment, Table 3 shows the optimal non-main identification data selected.
Table 3
Step 112: Store the optimal non-main identification data in the global master index table. In this embodiment, step 112 specifically includes: matching the optimal non-main identification data against the global master index table; if the match succeeds, storing the optimal non-main identification data in the global master index table under the existing master index number; if the match fails, generating a new master index number and storing the optimal non-main identification data in the global master index table under the new master index number. For example, when matching the optimal non-main identification data against the global master index table, the combinations "name + gender + date of birth", "mobile phone number + home address", or "mobile phone number + date of birth" can be matched. If any one combination matches, the person already exists in the global master index table, and the optimal non-main identification data is stored in the global master index table; if all combinations fail, the person does not exist in the global master index table, so a new master index number is generated for the optimal non-main identification data, which is then stored in the global master index table under the new master index number.
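The matching logic of step 112 can be sketched as follows. The global master index table is modeled here as a plain dictionary from master index number to a list of records, which is an assumption made for illustration. Only exact matching is shown, and `uuid.uuid4().hex` stands in for the patent's own index generation when a new number is needed; the field names are hypothetical.

```python
import uuid

# The three field combinations named in the text (hypothetical field names).
MATCH_COMBINATIONS = (
    ("name", "gender", "birth_date"),
    ("mobile", "address"),
    ("mobile", "birth_date"),
)


def find_master_index(record, master_table):
    """Return the index number of the first entry matching any combination, else None."""
    for index_no, persons in master_table.items():
        for person in persons:
            for combo in MATCH_COMBINATIONS:
                if all(record.get(f) and record.get(f) == person.get(f) for f in combo):
                    return index_no
    return None


def store_non_main(record, master_table):
    """Store the record under a matching index number, or under a newly generated one."""
    index_no = find_master_index(record, master_table)
    if index_no is None:
        index_no = uuid.uuid4().hex  # stand-in for new master index generation
        master_table[index_no] = []
    master_table[index_no].append(record)
    return index_no


table = {"idx-1": [{"name": "Zhang San", "gender": "male",
                    "birth_date": "1990-01-01", "mobile": "138-0000"}]}
hit = store_non_main({"name": "Xiao Zhang", "mobile": "138-0000",
                      "birth_date": "1990-01-01"}, table)
miss = store_non_main({"name": "Li Si", "mobile": "139-9999",
                       "birth_date": "1985-05-05"}, table)
```

Here the first record matches on "mobile phone number + date of birth" despite the differing name, which is the nickname case the method is meant to handle. Fuzzy matching, for example on addresses, would replace the equality test inside `find_master_index`.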
Referring to Fig. 2, the optimal non-main identification data is compared with the master index table, which effectively solves the problem of normalizing personal health data of different types and from different data domains. During matching, exact matching or fuzzy matching can be selected according to the characteristics of the optimal non-main identification data to meet the needs of data users. The multi-domain personal data normalization weight formula solves the normalization matching problem when a nickname is used as the name, when data is de-identified, or even when the source health data contains errors.
Fig. 3 is a flowchart of Embodiment 3 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention; Fig. 6 is a schematic diagram, provided by a specific embodiment of the present invention, of cross-indexing the personal information table and the global master index table. As shown in Fig. 3 and Fig. 6, the personal information table and the global master index table are cross-indexed through the master index number and the in-domain numbers, so that the personal information tables of all data domains are associated with one another.
In the specific embodiment shown in these figures, after step 112, the individual-centered, regionalized, multi-dimensional health data processing method further includes:
Step 113: Cross-index the personal information table and the global master index table according to the master index number and the in-domain numbers. In this embodiment, the in-domain numbers include at least one of a personal number, outpatient number, medical order number, sample number, settlement statement number, and resident health record number.
Referring to Fig. 3 and Fig. 6, the master index number and the in-domain numbers link the health data of all the different data domains, so all related health data can be queried through the master index number, which makes data retrieval convenient.
Fig. 4 is a flowchart of Embodiment 4 of an individual-centered, regionalized, multi-dimensional health data processing method provided by a specific embodiment of the present invention; Fig. 5 is a schematic diagram, provided by a specific embodiment of the present invention, of periodically collecting health data in batches from different data domains using the Kettle tool. As shown in Fig. 4 and Fig. 5, to ensure the privacy of the health data, the Kettle tool periodically collects health data in batches from the different data domains over a dedicated private network line.
In the specific embodiment shown in these figures, before step 101, the individual-centered, regionalized, multi-dimensional health data processing method further includes:
Step 100: Periodically collect health data in batches from different data domains using the Kettle tool over a dedicated private network line. In this embodiment, the Kettle tool collects the full set of each day's health data from the different data domains every day. A dedicated optical fiber channel can be laid between the health data collection device and the data sources; the Kettle tool performs the data format conversion and stores the collected health data in a database.
Referring to Fig. 4, transmitting the health data collected by the Kettle tool over a dedicated private network line ensures the privacy of the health data and gives a good user experience; laying a dedicated optical fiber channel between the health data collection device and the data sources makes data collection efficient, and the collection of massive health data does not affect the normal operation of the medical service institutions' networks.
A specific embodiment of the invention provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses separated by data domain to obtain personal information tables, where each personal information table contains main identification data records and non-main identification data records.
Step 104: cluster the main identification data records from the different data warehouses to obtain main cluster data records.
Step 105: calculate the first weight of each record in the main cluster data records according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data records according to the first weight and a first threshold to obtain main personal data records.
Step 107: generate a master index number for the main personal data records.
Step 108: store the main personal data records in the global master index table according to the master index number.
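Steps 101 to 103 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the integrity rule is taken to be "keep the copy with the most non-empty fields", the standardized fields (`name`, `gender`, `id_card`) and the gender code table are hypothetical, and a record is treated as a main identification record when it carries a unique number, as claim 9 describes.

```python
def completeness(record):
    """Count non-empty fields; used here as a stand-in integrity rule."""
    return sum(1 for value in record.values() if value not in (None, ""))


def deduplicate(records):
    """Step 101 (sketch): keep the most complete copy per (domain, local_id)."""
    best = {}
    for rec in records:
        key = (rec["domain"], rec["local_id"])
        if key not in best or completeness(rec) > completeness(best[key]):
            best[key] = rec
    return list(best.values())


# Hypothetical code table mapping domain-specific gender codes to one description.
GENDER_CODES = {"m": "male", "1": "male", "f": "female", "2": "female"}


def standardize(record):
    """Step 102 (sketch): describe selected fields in one uniform way."""
    out = dict(record)
    out["name"] = (record.get("name") or "").strip()
    gender = (record.get("gender") or "").strip().lower()
    out["gender"] = GENDER_CODES.get(gender, gender)
    out["id_card"] = (record.get("id_card") or "").strip().upper()
    return out


def build_personal_info_tables(records):
    """Step 103 (sketch): one warehouse per domain, split into main / non-main."""
    warehouses = {}
    for rec in records:
        table = warehouses.setdefault(rec["domain"], {"main": [], "non_main": []})
        # A record with a unique number counts as a main identification record.
        table["main" if rec["id_card"] else "non_main"].append(rec)
    return warehouses


raw = [
    {"domain": "hospital", "local_id": "H1", "name": " Zhang San ", "gender": "M", "id_card": "id-0001"},
    {"domain": "hospital", "local_id": "H1", "name": "Zhang San", "gender": "", "id_card": ""},
    {"domain": "pharmacy", "local_id": "P7", "name": "Li Si", "gender": "2", "id_card": ""},
]
clean = [standardize(rec) for rec in deduplicate(raw)]
tables = build_personal_info_tables(clean)
```

In this example the two hospital rows collapse into the more complete one, which lands in the hospital warehouse's main list, while the pharmacy row has no unique number and lands in the non-main list.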
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses separated by data domain to obtain personal information tables, where each personal information table contains main identification data records and non-main identification data records.
Step 104: cluster the main identification data records from the different data warehouses to obtain main cluster data records.
Step 105: calculate the first weight of each record in the main cluster data records according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data records according to the first weight and a first threshold to obtain main personal data records.
Step 107: generate a master index number for the main personal data records.
Step 108: store the main personal data records in the global master index table according to the master index number.
Step 109: cluster the non-main identification data records from the different data warehouses to obtain non-main cluster data records.
Step 110: calculate the second weight of each record in the non-main cluster data records according to the multi-domain personal data normalization weight formula.
Step 111: select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: store the optimal non-main identification data in the global master index table.
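Steps 111 and 112 above can be sketched in Python as follows. The second weights are passed in as plain numbers, and the match-or-create behaviour follows claim 3. The table layout, the `same_person` matching rule, and the `MPI000001`-style number format are hypothetical choices made for illustration.

```python
import itertools


def best_per_warehouse(cluster, weights):
    """Step 111 (sketch): pick the highest-weight record in each data warehouse."""
    best = {}
    for rec, weight in zip(cluster, weights):
        domain = rec["domain"]
        if domain not in best or weight > best[domain][1]:
            best[domain] = (rec, weight)
    return {domain: rec for domain, (rec, _) in best.items()}


def store_non_main(record, master_index, same_person, new_number):
    """Step 112 (sketch): attach to a matching entry, else create a new one."""
    for number, entry in master_index.items():
        if same_person(record, entry["person"]):
            entry["non_main"].append(record)
            return number
    number = new_number()  # no match: generate a new master index number
    master_index[number] = {"person": record, "non_main": [record]}
    return number


counter = itertools.count(1)


def new_number():
    """Hypothetical master index number format."""
    return f"MPI{next(counter):06d}"


def same_person(a, b):
    """Hypothetical matching rule: same name and same birth date."""
    return a["name"] == b["name"] and a["birth"] == b["birth"]


master_index = {
    "MPI000000": {"person": {"name": "Zhang San", "birth": "1990-01-01"}, "non_main": []}
}
cluster = [
    {"domain": "pharmacy", "name": "Zhang San", "birth": "1990-01-01", "phone": ""},
    {"domain": "pharmacy", "name": "Zhang San", "birth": "1990-01-01", "phone": "555-0100"},
    {"domain": "wearable", "name": "Wang Wu", "birth": "1985-05-05", "phone": ""},
]
weights = [0.4, 0.9, 0.7]  # second weights, assumed to be computed already

chosen = best_per_warehouse(cluster, weights)
numbers = {
    domain: store_non_main(rec, master_index, same_person, new_number)
    for domain, rec in chosen.items()
}
```

The pharmacy record with the higher weight is attached to the existing entry, while the wearable-device record matches nothing and receives a newly generated master index number.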
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses separated by data domain to obtain personal information tables, where each personal information table contains main identification data records and non-main identification data records.
Step 104: cluster the main identification data records from the different data warehouses to obtain main cluster data records.
Step 105: calculate the first weight of each record in the main cluster data records according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data records according to the first weight and a first threshold to obtain main personal data records.
Step 107: generate a master index number for the main personal data records.
Step 108: store the main personal data records in the global master index table according to the master index number.
Step 109: cluster the non-main identification data records from the different data warehouses to obtain non-main cluster data records.
Step 110: calculate the second weight of each record in the non-main cluster data records according to the multi-domain personal data normalization weight formula.
Step 111: select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: store the optimal non-main identification data in the global master index table.
Step 113: cross-index the personal information tables and the global master index table according to the master index number and the in-domain numbers.
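The cross-index of step 113 can be pictured as a pair of lookups between master index numbers and in-domain numbers. The Python sketch below assumes the links are available as `(master_number, domain, local_number)` triples; that layout and the sample number formats are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def build_cross_index(links):
    """Build both lookup directions from (master_number, domain, local_number) triples."""
    by_master = defaultdict(list)  # master index number -> in-domain numbers
    by_local = {}                  # (domain, in-domain number) -> master index number
    for master_number, domain, local_number in links:
        by_master[master_number].append((domain, local_number))
        by_local[(domain, local_number)] = master_number
    return dict(by_master), by_local


links = [
    ("MPI000001", "hospital", "outpatient-0042"),
    ("MPI000001", "pharmacy", "personal-7"),
    ("MPI000002", "hospital", "outpatient-0043"),
]
by_master, by_local = build_cross_index(links)
```

With both directions available, a query can start from a person's master index number and reach every domain record, or start from a domain record and reach the person.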
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 100: using the Kettle tool, periodically collect health data in batches from different data domains over a dedicated private-network line.
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses separated by data domain to obtain personal information tables, where each personal information table contains main identification data records and non-main identification data records.
Step 104: cluster the main identification data records from the different data warehouses to obtain main cluster data records.
Step 105: calculate the first weight of each record in the main cluster data records according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data records according to the first weight and a first threshold to obtain main personal data records.
Step 107: generate a master index number for the main personal data records.
Step 108: store the main personal data records in the global master index table according to the master index number.
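Steps 104 to 108 above can be sketched in Python as follows. Several points are assumptions made for illustration: records are clustered on a shared unique number (`id_card`), normalization keeps the records whose weight reaches the threshold and fills each field from the highest-weight record downwards, and the weight function is a simple placeholder for the patent's weight formula.

```python
import itertools
from collections import defaultdict


def cluster_main_records(warehouses):
    """Step 104 (sketch): group main records from all warehouses by unique number."""
    clusters = defaultdict(list)
    for table in warehouses.values():
        for rec in table["main"]:
            clusters[rec["id_card"]].append(rec)
    return dict(clusters)


def normalize_cluster(cluster, weights, threshold):
    """Step 106 (sketch): merge the records whose weight reaches the threshold."""
    kept = sorted(
        ((w, rec) for w, rec in zip(weights, cluster) if w >= threshold),
        key=lambda pair: pair[0],
        reverse=True,
    )
    person = {}
    for _, rec in kept:
        for field, value in rec.items():
            # Higher-weight records are visited first, so they win each field.
            if field != "domain" and value and field not in person:
                person[field] = value
    return person


def build_master_index(clusters, weight_of, threshold, new_number):
    """Steps 105 to 108 (sketch): weigh, normalize, number, and store."""
    master_index = {}
    for cluster in clusters.values():
        weights = [weight_of(rec, cluster) for rec in cluster]
        person = normalize_cluster(cluster, weights, threshold)
        if person:
            master_index[new_number()] = person
    return master_index


DOMAIN_QUALITY = {"insurance": 1.0, "hospital": 0.8}  # hypothetical values
counter = itertools.count(1)


def weight_of(rec, cluster):
    """Placeholder weight: the data-quality weight of the record's domain."""
    return DOMAIN_QUALITY[rec["domain"]]


def new_number():
    """Hypothetical master index number format."""
    return f"MPI{next(counter):06d}"


warehouses = {
    "insurance": {"main": [
        {"domain": "insurance", "id_card": "ID-001", "name": "Zhang San", "phone": ""},
    ]},
    "hospital": {"main": [
        {"domain": "hospital", "id_card": "ID-001", "name": "Zhang S.", "phone": "555-0100"},
        {"domain": "hospital", "id_card": "ID-002", "name": "Li Si", "phone": ""},
    ]},
}
clusters = cluster_main_records(warehouses)
master_index = build_master_index(clusters, weight_of, 0.5, new_number)
```

In the example, the insurance record supplies the name because its weight is higher, and the hospital record fills in the phone number that the insurance record lacks.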
The specific embodiments of the invention provide an individual-centered regionalized multi-dimensional health data processing method and medium. Health data is compared horizontally within each data domain (data source) and also vertically across data domains, which cross-validates the health data and ensures the reliability of the main personal data records stored in the global master index table. The weight of each data domain can be configured, and the weights of specific health data fields can also be configured as needed, so the data handling is highly flexible. During data matching, exact matching or fuzzy matching can be chosen according to the characteristics of the non-main identification data records, which raises the success rate of health data normalization while preserving the reliability of the health data. Comparing the optimal non-main identification data of different types and different data domains with the master index table effectively solves the problem of normalizing personal health data from different data domains. The multi-domain personal data normalization weight formula solves the normalization matching problem even when names are recorded as nicknames, data has been anonymized for privacy, or the source health data contains errors.
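The choice between exact and fuzzy matching mentioned above can be illustrated with a small Python sketch. The patent text here does not say which fields use which mode or how similarity is measured, so the field split, the 0.8 threshold, and the use of `difflib.SequenceMatcher` from the Python standard library are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical split: free-text fields tolerate small differences,
# all other fields must agree exactly.
FUZZY_FIELDS = {"name", "address"}


def field_matches(field, a, b, fuzzy_threshold=0.8):
    """Return True when two field values count as a successful match."""
    if not a or not b:
        return False  # an empty value never counts as a match
    if field in FUZZY_FIELDS:
        return SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold
    return a == b


same_birth = field_matches("birth", "1990-01-01", "1990-01-01")
typo_in_name = field_matches("name", "Zhang San", "Zhang Sann")
different_name = field_matches("name", "Zhang San", "Li Si")
```

A single-character typo in a name still matches under the fuzzy rule, while a birth date must agree character for character.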
The above embodiments of the invention can be implemented in various kinds of hardware, software code, or a combination of both. For example, an embodiment of the invention can be program code that executes the above method in a digital signal processor (Digital Signal Processor, DSP). The invention can also involve multiple functions executed by a computer processor, digital signal processor, microprocessor, or field-programmable gate array (Field Programmable Gate Array, FPGA). These processors can be configured according to the invention to perform particular tasks by executing machine-readable software code or firmware code that defines the particular methods disclosed by the invention. The software code or firmware code can be developed in different programming languages and in different formats or forms, and the software code can also be compiled for different target platforms. However, different code styles, types, and languages of software code, and other ways of configuring code to perform tasks according to the invention, do not depart from the spirit and scope of the invention.
The foregoing describes only illustrative specific embodiments of the invention. Any equivalent changes and modifications made by a person skilled in the art without departing from the concept and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. An individual-centered regionalized multi-dimensional health data processing method, characterized in that the method comprises:
deduplicating health data collected from different data domains according to an integrity rule to obtain clean health data;
standardizing the clean health data to obtain uniformly described health data;
integrating the uniformly described health data into data warehouses separated by data domain to obtain personal information tables, wherein each personal information table comprises main identification data records and non-main identification data records;
clustering the main identification data records from the different data warehouses to obtain main cluster data records;
calculating the first weight of each record in the main cluster data records according to a multi-domain personal data normalization weight formula;
normalizing the main cluster data records according to the first weight and a first threshold to obtain main personal data records;
generating a master index number for the main personal data records; and
storing the main personal data records in a global master index table according to the master index number.
2. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that, after the step of storing the main personal data records in the global master index table according to the master index number, the method further comprises:
clustering the non-main identification data records from the different data warehouses to obtain non-main cluster data records;
calculating the second weight of each record in the non-main cluster data records according to the multi-domain personal data normalization weight formula;
selecting the optimal non-main identification data in each data warehouse according to the second weight; and
storing the optimal non-main identification data in the global master index table.
3. The individual-centered regionalized multi-dimensional health data processing method of claim 2, characterized in that the step of storing the optimal non-main identification data in the global master index table specifically comprises:
matching the optimal non-main identification data against the global master index table;
if the match succeeds, storing the optimal non-main identification data in the global master index table according to the master index number; and
if the match fails, generating a new master index number and storing the optimal non-main identification data in the global master index table according to the new master index number.
4. The individual-centered regionalized multi-dimensional health data processing method of claim 2, characterized in that, after the step of storing the optimal non-main identification data in the global master index table, the method further comprises:
cross-indexing the personal information tables and the global master index table according to the master index number and the in-domain numbers.
5. The individual-centered regionalized multi-dimensional health data processing method of claim 4, characterized in that the in-domain number comprises at least one of a personal number, an outpatient number, a doctor's order number, a sample number, a settlement document number, and a resident record number.
6. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that, before the step of deduplicating the health data collected from different data domains according to the integrity rule to obtain clean health data, the method further comprises:
using the Kettle tool, periodically collecting health data in batches from different data domains over a dedicated private-network line.
7. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that the step of standardizing the clean health data to obtain uniformly described health data specifically comprises:
analyzing the massive health data of each data domain to obtain the data identification rules of each data domain; and
standardizing the specific field data of the clean health data according to the data identification rules.
8. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that the multi-domain personal data normalization weight formula is specifically:
wherein Ω_i is the first weight of each record in the main cluster data records; w_i is the weight of the data quality of the different data domains; f_{i,j} is the number of successful matches between a given specific field of one record in the main cluster data records and the same specific field of the other records in the main cluster data records; N is the number of records in the main cluster data records; u_j is the weight of the specific field data in the different data domains; and i is the data domain number, the data domains comprising a medical insurance domain, a hospital domain, a resident record domain, a pharmacy domain, a health wearable device domain, and a personal usage behavior domain.
9. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that each record in the main identification data records contains a unique number, the unique number comprising at least one of an identity card number, a military officer ID number, a passport number, and a medical insurance card number; and each record in the non-main identification data records contains no unique number.
10. A computer storage medium containing computer-executable instructions, wherein, when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method of any one of claims 1 to 9.
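The weight formula of claim 8 is not reproduced in this text; only its symbols are defined. The Python sketch below therefore uses an assumed form that is merely consistent with those definitions, Ω_i = w_i · Σ_j (u_j · f_{i,j}) / (N − 1), and should not be read as the patented formula. The helper names, the exact-equality matching rule, and the sample weights are likewise hypothetical.

```python
def match_counts(cluster, fields):
    """f[i][j]: how many other records agree with record i on field j."""
    return [
        {
            j: sum(
                bool(rec.get(j)) and rec.get(j) == other.get(j)
                for k, other in enumerate(cluster)
                if k != i
            )
            for j in fields
        }
        for i, rec in enumerate(cluster)
    ]


def record_weights(cluster, domain_weight, field_weight):
    """Assumed form: omega_i = w_i * sum_j(u_j * f_ij) / (N - 1)."""
    n = len(cluster)
    if n < 2:
        # Assumption: a lone record has nothing to disagree with, so it is
        # scored as if every field matched.
        return [domain_weight[r["domain"]] * sum(field_weight.values()) for r in cluster]
    f = match_counts(cluster, field_weight)
    return [
        domain_weight[rec["domain"]]
        * sum(u * f[i][j] for j, u in field_weight.items())
        / (n - 1)
        for i, rec in enumerate(cluster)
    ]


cluster = [
    {"domain": "insurance", "name": "Zhang San", "birth": "1990-01-01", "phone": "555-0100"},
    {"domain": "hospital", "name": "Zhang San", "birth": "1990-01-01", "phone": ""},
    {"domain": "pharmacy", "name": "Lao Zhang", "birth": "1990-01-01", "phone": "555-0100"},
]
domain_weight = {"insurance": 1.0, "hospital": 0.8, "pharmacy": 0.5}  # w_i
field_weight = {"name": 0.5, "birth": 0.25, "phone": 0.25}            # u_j

weights = record_weights(cluster, domain_weight, field_weight)
```

Under this assumed form, a record scores higher when it comes from a higher-quality domain and when more of its weighted fields agree with the other records in the cluster, so the pharmacy record with a nickname in the name field receives the lowest weight.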
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811203501.4A CN109522331B (en) | 2018-10-16 | 2018-10-16 | Individual-centered regionalized multi-dimensional health data processing method and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811203501.4A CN109522331B (en) | 2018-10-16 | 2018-10-16 | Individual-centered regionalized multi-dimensional health data processing method and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109522331A (en) | 2019-03-26 |
CN109522331B CN109522331B (en) | 2021-04-16 |
Family ID: 65770882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811203501.4A Active CN109522331B (en) | 2018-10-16 | 2018-10-16 | Individual-centered regionalized multi-dimensional health data processing method and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522331B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101324894A (en) * | 2008-07-24 | 2008-12-17 | 中国网络通信集团公司 | Correlation method and system of medical global mark and medical local mark |
CN102005023A (en) * | 2010-10-26 | 2011-04-06 | 汪海玥 | National health medical file system managed by means of internet website |
CN103870668A (en) * | 2012-12-17 | 2014-06-18 | 上海联影医疗科技有限公司 | Method and device for establishing master patient index oriented to regional medical treatment |
CN104063567A (en) * | 2013-03-20 | 2014-09-24 | 上海联影医疗科技有限公司 | Establishment method of patient identity source cross reference |
CN105574334A (en) * | 2015-12-15 | 2016-05-11 | 深圳安泰创新科技股份有限公司 | Medical information processing method and system |
CN105678100A (en) * | 2016-03-01 | 2016-06-15 | 万达信息股份有限公司 | Health record browsing system |
CN105787010A (en) * | 2016-02-23 | 2016-07-20 | 北京凯行同创科技有限公司 | Acquisition processing and pushing method and system based on personal data |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694993A (en) * | 2020-06-11 | 2020-09-22 | 北京金山云网络技术有限公司 | Method, device, electronic equipment and medium for creating data index |
CN111694993B (en) * | 2020-06-11 | 2023-05-02 | 北京金山云网络技术有限公司 | Method, device, electronic equipment and medium for creating data index |
CN113836141A (en) * | 2021-09-24 | 2021-12-24 | 中国劳动关系学院 | Big data cross indexing method based on distribution model |
Also Published As
Publication number | Publication date |
---|---|
CN109522331B (en) | 2021-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||