CN109522331A - Individual-centered regionalized multi-dimensional health data processing method and medium - Google Patents

Individual-centered regionalized multi-dimensional health data processing method and medium

Info

Publication number
CN109522331A
Authority
CN
China
Prior art keywords
data
health
master
personal
main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811203501.4A
Other languages
Chinese (zh)
Other versions
CN109522331B (en)
Inventor
金以东
李雪莉
周大胜
王语莫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ebaotech Internet Medical Information Technology (beijing) Co Ltd
Original Assignee
Ebaotech Internet Medical Information Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ebaotech Internet Medical Information Technology (beijing) Co Ltd
Priority to CN201811203501.4A
Publication of CN109522331A
Application granted
Publication of CN109522331B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention provides an individual-centered regionalized multi-dimensional health data processing method and medium. The method includes: de-duplicating health data collected from different data domains to obtain clean health data; standardizing the clean health data to obtain uniformly described health data; integrating the uniformly described health data into data warehouses partitioned by data domain to obtain personal information tables; clustering the primary-identifier records in the different data warehouses to obtain primary cluster records; calculating a first weight for each record in the primary cluster records according to a multi-domain personal-data normalization weight formula; normalizing the primary cluster records according to the first weight and a first threshold to obtain primary personal records; generating a master index number for the primary personal records; and storing the primary personal records in a global master index table according to the master index number. The invention cross-validates health data from different data domains and can reflect an individual's health problems dynamically and comprehensively.

Description

Individual-centered regionalized multi-dimensional health data processing method and medium
Technical field
The present invention relates to the field of computer technology, in particular to methods for integrating regionalized multi-dimensional health data, and specifically to an individual-centered regionalized multi-dimensional health data processing method and medium.
Background
In recent years, with the rapid development of big data and cloud platforms and people's growing attention to their own health, the value of health big data has become increasingly apparent, and the volume of health data is expanding at a geometric rate. However, the value of health data can only be realized when massive medical data is classified and integrated in an orderly way. In the prior art, data integration methods are all built on a single data source (for example, data internal to a hospital) and integrate data mainly along the lines of department or disease type, so data from different sources and of different types cannot be interconnected with the person as the unit. Consequently, even when data internal to a hospital is associated per individual, the internal data of each medical institution reflects only one or a few of that person's medical encounters and cannot reflect the individual's health problems completely, dynamically, and comprehensively.
In addition, existing personal-data association techniques mainly associate records by ID card number, driver's license, military officer ID, passport, medical insurance card number, and the like, or identify a person by name, gender, and date of birth. Such processing has a single, inflexible logic and cannot verify a person's basic information or determine its credibility. If a nickname was used as the name, the system has de-identified the data, or an individual record contains errors, such data can never be normalized to a person, which greatly reduces the value of the health data.
Therefore, those skilled in the art need to develop a health data integration method that takes the individual as the unit and can normalize health data even when some fields are wrong, thereby improving the application value of health data.
Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide an individual-centered regionalized multi-dimensional health data processing method and medium, solving the problems that the prior art cannot integrate health data from different sources and of different types and cannot handle health data in which some fields are wrong.
To solve the above technical problem, a specific embodiment of the invention provides an individual-centered regionalized multi-dimensional health data processing method, comprising: de-duplicating, according to an integrity rule, health data collected from different data domains to obtain clean health data; standardizing the clean health data to obtain uniformly described health data; integrating the uniformly described health data into data warehouses partitioned by data domain to obtain personal information tables, wherein each personal information table includes primary-identifier records and non-primary-identifier records; clustering the primary-identifier records in the different data warehouses to obtain primary cluster records; calculating a first weight for each record in the primary cluster records according to a multi-domain personal-data normalization weight formula; normalizing the primary cluster records according to the first weight and a first threshold to obtain primary personal records; generating a master index number for the primary personal records; and storing the primary personal records in a global master index table according to the master index number.
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device performs the individual-centered regionalized multi-dimensional health data processing method.
From the above embodiments of the present invention, the individual-centered regionalized multi-dimensional health data processing method and medium have at least the following advantages: health data is compared not only horizontally within each data domain (data source) but also vertically across data domains, which cross-validates the health data and ensures the reliability of the primary personal records stored in the global master index table; the weights of the data domains and of specific health data fields can be configured as needed, so the data handling is highly flexible; during data matching, exact matching and fuzzy matching can be selected according to the characteristics of the non-primary-identifier records, which raises the success rate of health data normalization while preserving reliability; comparing the optimal non-primary-identifier data of different types and data domains with the master index table effectively solves the problem of normalizing personal health data across data domains; and the multi-domain personal-data normalization weight formula solves the problem of normalizing and matching health data when a nickname is used as the name, the data has been de-identified, or the source health data itself contains errors.
It should be understood that the above general description and the following specific embodiments are merely exemplary and explanatory, and do not limit the scope claimed by the invention.
Brief description of the drawings
The accompanying drawings form part of the specification of the invention and depict example embodiments of the invention; together with the description, they serve to explain the principles of the invention.
Fig. 1 is a flow chart of Embodiment 1 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention.
Fig. 2 is a flow chart of Embodiment 2 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention.
Fig. 3 is a flow chart of Embodiment 3 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention.
Fig. 4 is a flow chart of Embodiment 4 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention.
Fig. 5 is a schematic diagram, provided by a specific embodiment of the invention, of periodically collecting health data in batches from different data domains using the Kettle tool.
Fig. 6 is a schematic diagram, provided by a specific embodiment of the invention, of cross-indexing the personal information tables with the global master index table.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the spirit of the disclosure is explained below with reference to the drawings and a detailed description. After understanding the embodiments, any person skilled in the art may make changes and modifications based on the techniques taught here without departing from the spirit and scope of the disclosure.
The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not limit it. In addition, elements or components with the same or similar reference labels in the drawings and embodiments represent the same or similar parts.
Terms such as "first" and "second" used herein do not denote any particular order or rank and do not limit the invention; they only distinguish elements or operations described with the same technical term.
Direction terms used herein, such as up, down, left, right, front, or back, refer only to directions in the drawings. They are therefore used for illustration and do not limit the invention.
Terms such as "comprising", "including", "having", and "containing" used herein are open-ended terms, meaning including but not limited to.
The term "and/or" used herein includes any one of the listed items and all combinations of them.
"Multiple" herein includes "two" and "more than two"; "multiple groups" herein includes "two groups" and "more than two groups".
Terms such as "substantially" and "about" used herein modify any quantity or error that may vary slightly, where the slight variation or error does not change its essence. In general, the range of slight variation or error modified by such terms may be 20% in some embodiments, 10% in some embodiments, and 5% or another value in some embodiments. Those skilled in the art will understand that these values may be adjusted according to actual needs and are not limiting.
Fig. 1 is a flow chart of Embodiment 1 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention. As shown in Fig. 1, the collected health data is de-duplicated and then standardized to obtain uniformly described health data; the uniformly described health data is then integrated into data warehouses to obtain personal information tables; the primary-identifier records in the personal information tables are clustered, and the weight of each record in the primary cluster records is calculated; the primary cluster records are then normalized according to the weights to obtain primary personal records; finally, a master index number is generated for the primary personal records, and the primary personal records are stored in the global master index table according to the master index number.
In the specific embodiment shown in the drawing, the individual-centered regionalized multi-dimensional health data processing method includes:
Step 101: de-duplicate, according to an integrity rule, the health data collected from different data domains to obtain clean health data. In this embodiment, the integrity rule specifically means that a health record must not lack a name; a record lacking a name is regarded as having no master record and is logically deleted.
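Step 101 can be illustrated with a minimal sketch. The `HealthRecord` type, its field names, and the choice to treat only exact duplicates as repeats are assumptions made for illustration; the text specifies only that records without a name are logically deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthRecord:
    # Hypothetical record layout; the patent does not define a schema.
    domain: str
    name: Optional[str]
    id_card: Optional[str] = None
    phone: Optional[str] = None


def clean(records):
    """Apply the integrity rule, then drop exact duplicates.

    Returns (kept, logically_deleted). Records without a name are not
    discarded physically; they are returned separately so that they can
    be flagged as deleted.
    """
    kept, deleted, seen = [], [], set()
    for rec in records:
        if rec.name is None or not rec.name.strip():
            deleted.append(rec)   # no name: treated as having no master record
            continue
        if rec in seen:           # exact duplicate of an earlier record
            continue
        seen.add(rec)
        kept.append(rec)
    return kept, deleted
```

Returning the logically deleted records instead of dropping them keeps the source data auditable, which is one reasonable reading of "logical deletion".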
Step 102: standardize the clean health data to obtain uniformly described health data. In this embodiment, for example, male is labeled "1" in the gender field of the medical insurance domain and "B" in the gender field of the hospital domain; after this field is standardized, the unified description output is "male". Step 102 specifically includes: analyzing the massive health data of each data domain to obtain the data identification rules of each data domain; and standardizing the specific field data of the clean health data according to the data identification rules.
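The gender example in Step 102 can be sketched as a per-domain lookup table. The domain keys and the function name are hypothetical; only the two "male" codes come from the text, and other codes would be added once each domain's identification rules have been analyzed.

```python
# Hypothetical per-domain identification rules. Only the "male" codes
# ("1" in the medical insurance domain, "B" in the hospital domain) are
# taken from the description.
GENDER_RULES = {
    "medical_insurance": {"1": "male"},
    "hospital": {"B": "male"},
}


def normalize_gender(domain, raw):
    """Map a domain-specific gender code to the unified description.

    Returns None when the domain or the code is unknown, so that
    unmapped values can be reviewed rather than guessed.
    """
    return GENDER_RULES.get(domain, {}).get(str(raw).strip())
```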
Step 103: integrate the uniformly described health data into data warehouses partitioned by data domain to obtain personal information tables, wherein each personal information table includes primary-identifier records and non-primary-identifier records. In this embodiment, every record among the primary-identifier records includes a unique number, which is at least one of an ID card number, a military officer ID number, a passport number, and a medical insurance card number; the records among the non-primary-identifier records include no unique number. For example, a non-primary-identifier record includes at least one of: name, gender, date of birth, mobile number, telephone number, country, province, city, home address, and postal code. The data domains include the medical insurance domain, the hospital domain, the resident health record domain, the pharmacy domain, the health wearable device domain, the personal usage behavior domain, and so on. The data warehouses are modeled and designed in advance, and each data domain has a corresponding data warehouse.
The data in each domain is as follows. Medical insurance domain data includes: medical insurance personal information data, personal medical insurance payment data, personal medical insurance card status data, medical settlement statements, and so on. Hospital domain data includes: personal basic information data, outpatient prescription data, patient medical history data, examination data, laboratory test data, surgery and anesthesia data, ECG data, inpatient order data, inpatient medical record data, ICU (intensive care unit) data, and so on. Resident health record domain data includes: personal basic information data, health examination data, child health care data, maternal information data, chronic disease management data, infectious disease management data, mental disorder management data, and so on. Pharmacy domain data includes: personal basic information data, personal medicine purchase data, and so on. Health wearable device domain data includes: personal basic information data, blood glucose data, blood pressure data, heart rate data, blood oxygen data, body temperature data, respiratory rate data, and so on. Personal usage behavior domain data includes: personal basic information data and usage behavior data, such as queries and browsing, generated when the individual uses internet products with authorization.
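The split between primary-identifier and non-primary-identifier records described in Step 103 can be sketched as follows. The dictionary keys are hypothetical field names; the rule itself (a record is primary if it carries at least one unique number) follows the text.

```python
# Hypothetical field names for the four unique numbers listed in the text.
UNIQUE_ID_FIELDS = ("id_card", "officer_id", "passport", "insurance_card")


def split_records(rows):
    """Split records into (primary, non_primary).

    A record is a primary-identifier record when at least one unique
    number is present and non-empty; otherwise it is non-primary.
    """
    primary, non_primary = [], []
    for row in rows:
        has_unique = any(row.get(field) for field in UNIQUE_ID_FIELDS)
        (primary if has_unique else non_primary).append(row)
    return primary, non_primary
```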
Step 104: cluster the primary-identifier records in the different data warehouses to obtain primary cluster records. In this embodiment, equality-based clustering can be performed successively by ID card number, military officer ID number, passport number, and medical insurance card number. Table 1 below shows the primary cluster records obtained by clustering on ID card number.
Table 1
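Equality-based clustering on a single identifier, as in Step 104, amounts to grouping records that share the same value. The sketch below assumes records are dictionaries with hypothetical keys; it could be called once per identifier (ID card number, then officer ID number, and so on).

```python
from collections import defaultdict


def cluster_by(rows, field):
    """Group records whose `field` values are equal.

    Records with a missing or empty value for `field` are skipped, since
    they cannot be clustered on that identifier.
    """
    clusters = defaultdict(list)
    for row in rows:
        key = row.get(field)
        if key:
            clusters[key].append(row)
    return dict(clusters)
```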
Step 105: calculate the first weight of each record in the primary cluster records according to the multi-domain personal-data normalization weight formula. In this embodiment, the multi-domain personal-data normalization weight formula is specifically:
[Formula not reproduced in the source text]
Here, Ωi is the first weight of each record in the primary cluster records; wi is the data quality weight of the data domain; fi,j is the number of times a specific field j of a record in the primary cluster records successfully matches the same field of the other records in the primary cluster records; N is the number of records in the primary cluster records; uj is the weight of the specific field data in the different data domains; and i is the data domain number, where the data domains include the medical insurance domain, the hospital domain, the resident health record domain, the pharmacy domain, the health wearable device domain, and the personal usage behavior domain.
According to Table 1 above, the parameters for the medical insurance domain are as follows:
w1 = 10 (data quality weight of the medical insurance domain);
f1,1 = 3 (number of successful matches between the ID card number in the medical insurance domain and the ID card numbers in the other data domains);
f1,2 = 2 (number of successful matches between the name in the medical insurance domain and the names in the other data domains);
f1,3 = 3 (number of successful matches between the gender in the medical insurance domain and the genders in the other data domains);
f1,4 = 3 (number of successful matches between the date of birth in the medical insurance domain and the dates of birth in the other data domains);
f1,5 = 3 (number of successful matches, using fuzzy matching, between the address in the medical insurance domain and the addresses in the other data domains);
f1,6 = 3 (number of successful matches between the mobile number in the medical insurance domain and the mobile numbers in the other data domains);
N = 4 (total number of data domains);
u1 = 10 (weight of the ID card number field);
u2 = 8 (weight of the name field);
u3 = 6 (weight of the gender field);
u4 = 8 (weight of the date of birth field);
u5 = 8 (weight of the address field);
u6 = 8 (weight of the mobile number field).
These parameters can be set freely according to the specific application scenario. For example, if the data requires strict normalization, the weights of fields such as name, date of birth, gender, and address should be increased; otherwise, the weights of these fields can be reduced.
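Because the formula itself is not reproduced in this text, the sketch below uses an assumed form purely to show how the listed parameters could combine: the domain quality weight multiplied by the field-weighted match counts, averaged over N. This form is a guess for illustration and should not be read as the patent's actual formula; the function name is also hypothetical.

```python
def normalization_weight(w, matches, field_weights, n):
    """Illustrative weight under an ASSUMED form: w * sum(u_j * f_j) / n.

    w             -- data quality weight of the record's data domain (w_i)
    matches       -- successful match counts per field (f_i,j)
    field_weights -- weight of each field (u_j), same order as `matches`
    n             -- number of records in the cluster (N)
    """
    if len(matches) != len(field_weights):
        raise ValueError("matches and field_weights must have the same length")
    if n <= 0:
        raise ValueError("n must be positive")
    return w * sum(u * f for u, f in zip(field_weights, matches)) / n


# Parameters listed above for the medical insurance domain.
omega_1 = normalization_weight(
    w=10,
    matches=[3, 2, 3, 3, 3, 3],
    field_weights=[10, 8, 6, 8, 8, 8],
    n=4,
)
```

Under this assumed form, raising a field weight such as `u2` increases the influence of name agreement on the result, which is consistent with the tuning advice in the text.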
Step 106: normalize the primary cluster records according to the first weight and the first threshold to obtain primary personal records. In this embodiment, the formula is used to calculate Ω1, the first weight of the medical insurance domain record in the primary cluster records; Ω2, Ω3, and Ω4 are calculated in the same way. The primary cluster records are then normalized according to the preset first threshold to obtain the primary personal records, which are the records belonging to the same person. In this example, the first, second, and third records belong to the same person and the fourth record belongs to a different person, so the primary personal records are the first, second, and third records.
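The threshold step can be sketched as a simple filter. The text does not state whether the comparison is strict or what the threshold value is, so the "greater than or equal" rule and all numbers in the usage lines are assumptions for illustration.

```python
def select_primary_personal(weights, threshold):
    """Keep the records whose first weight reaches the threshold.

    weights   -- mapping of record label to its first weight
    threshold -- the preset first threshold
    ASSUMPTION: a record is kept when weight >= threshold.
    """
    return [label for label, omega in weights.items() if omega >= threshold]


# Hypothetical weights for a four-record cluster such as Table 1.
example_weights = {"record_1": 340, "record_2": 330, "record_3": 310, "record_4": 90}
same_person = select_primary_personal(example_weights, threshold=200)
```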
Step 107: generate a master index number for the primary personal records. In this embodiment, the master index number is generated by applying the SHA256 operation to "ID card number + name + date of birth + home address + mobile number + random digits"; the random digits are included to avoid duplicate master index numbers. SHA256 is one of the secure hash algorithms; applied to the data, it yields a 64-character hash value.
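A minimal sketch of the generation logic in Step 107, using Python's standard `hashlib` and `secrets` modules. The function name, the plain concatenation without separators, the UTF-8 encoding, and the use of eight random digits are assumptions; the text specifies only the fields and the SHA256 operation.

```python
import hashlib
import secrets


def make_master_index(id_card, name, birth_date, address, mobile):
    """Hash the identifying fields plus random digits with SHA-256.

    ASSUMPTIONS: fields are concatenated directly, encoded as UTF-8, and
    followed by eight random decimal digits.
    """
    random_digits = str(secrets.randbelow(10**8)).zfill(8)
    payload = "".join([id_card, name, birth_date, address, mobile, random_digits])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The hexadecimal digest of SHA-256 is always 64 characters long, which matches the 64-character hash value mentioned in the text. Because of the random digits, the number cannot be recomputed from the fields later, so it has to be stored when it is generated.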
Step 108: store the primary personal records in the global master index table according to the master index number. In this embodiment, the generated master index number must be assigned simultaneously to every record in the primary personal records, and these records reside in different data domains; the global master index table is associated with the primary personal records in each data domain through the master index number.
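One way to picture Step 108 is a table keyed by master index number that holds references to the records in each domain. The in-memory dictionary, the `domain` and `domain_id` keys, and the function name are hypothetical; a real system would use database tables.

```python
def store_primary_personal(master_table, master_index, records):
    """Assign one master index number to every record of the same person.

    master_table -- dict: master index number -> list of (domain, domain_id)
    records      -- the primary personal records, one dict per data domain
    Each record is stamped with the number, and a reference to it is
    added to the global table (duplicates are ignored).
    """
    refs = master_table.setdefault(master_index, [])
    for rec in records:
        rec["master_index"] = master_index
        ref = (rec["domain"], rec["domain_id"])
        if ref not in refs:
            refs.append(ref)
    return master_table
```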
Referring to Fig. 1, health data is compared not only horizontally within each data domain but also vertically across data domains, which cross-validates the health data and ensures the reliability of the primary personal records stored in the global master index table; the weights of the data domains (data sources) and of specific health data fields can be configured as needed, so the data handling is highly flexible; and the multi-domain personal-data normalization weight formula solves the problem of normalizing and matching health data when a nickname is used as the name, the data has been de-identified, or the source health data itself contains errors.
Fig. 2 is a flow chart of Embodiment 2 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention. As shown in Fig. 2, after the primary personal records are stored in the global master index table according to the master index number, the non-primary-identifier records are clustered to obtain non-primary cluster records, and the second weight of each record in the non-primary cluster records is calculated; the optimal non-primary-identifier data in each data warehouse is then selected according to the second weight; finally, the optimal non-primary-identifier data is stored in the global master index table.
In the specific embodiment shown in the drawing, after step 108, the individual-centered regionalized multi-dimensional health data processing method further includes:
Step 109: cluster the non-primary-identifier records in the different data warehouses to obtain non-primary cluster records. In this embodiment, non-primary-identifier records can be grouped into non-primary cluster records as long as at least one specific field, such as name, mobile number, or home address, matches. Clustering can be performed successively by name, mobile number, telephone number, and home address. Table 2 below shows the non-primary cluster records obtained by clustering on telephone number. Table 2 contains four non-primary cluster records from different data domains: the records from the medical insurance domain, the hospital domain, and the wearable device domain belong to the same person, while the record from the user usage behavior domain belongs to someone else.
Table 2
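The "at least one field matches" rule of Step 109 can be sketched with a small union-find that merges any two records sharing a value in one of the listed fields. The field names are hypothetical, and merging transitively (A matches B, B matches C, so all three form one cluster) is an assumption; the resulting clusters are only candidates, since the second weight decides which records really belong together.

```python
# Hypothetical keys for the fields named in the text.
MATCH_FIELDS = ("name", "mobile", "phone", "address")


def cluster_non_primary(rows):
    """Group records that share a value in at least one match field."""
    parent = list(range(len(rows)))

    def find(i):
        # Find the cluster representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of the first record with it
    for i, row in enumerate(rows):
        for field in MATCH_FIELDS:
            value = row.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge clusters
            else:
                first_seen[key] = i

    clusters = {}
    for i, row in enumerate(rows):
        clusters.setdefault(find(i), []).append(row)
    return list(clusters.values())
```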
Step 110: calculate the second weight of each record in the non-primary cluster records according to the multi-domain personal-data normalization weight formula. In this embodiment, the multi-domain personal-data normalization weight formula is specifically:
[Formula not reproduced in the source text]
Here, Ωi is the second weight of each record in the non-primary cluster records; wi is the data quality weight of the data domain; fi,j is the number of times a specific field j of a record in the non-primary cluster records successfully matches the same field of the other records in the non-primary cluster records; N is the number of records in the non-primary cluster records; uj is the weight of the specific field data in the different data domains; and i is the data domain number, where the data domains include the medical insurance domain, the hospital domain, the resident health record domain, the pharmacy domain, the health wearable device domain, the personal usage behavior domain, and so on.
Step 111: select the optimal non-primary-identifier data in each data warehouse according to the second weight. In this embodiment, the optimal non-primary-identifier data obtained is shown in Table 3 below.
Table 3
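Selecting the optimal non-primary-identifier data per data warehouse, as in Step 111, can be read as keeping the highest-weighted record in each domain. The tuple layout and the tie-breaking rule (the first record wins on equal weights) are assumptions for illustration.

```python
def pick_best_per_domain(scored):
    """Return the highest-weighted record for each data domain.

    scored -- iterable of (domain, record, second_weight) tuples
    ASSUMPTION: on equal weights the record seen first is kept.
    """
    best = {}
    for domain, record, weight in scored:
        if domain not in best or weight > best[domain][1]:
            best[domain] = (record, weight)
    return {domain: record for domain, (record, _) in best.items()}
```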
Step 112: store the optimal non-primary-identifier data in the global master index table. In this embodiment, step 112 specifically includes: matching the optimal non-primary-identifier data against the global master index table; if the match succeeds, storing the optimal non-primary-identifier data in the global master index table under the existing master index number; if the match fails, generating a new master index number and storing the optimal non-primary-identifier data in the global master index table under the new master index number. For example, when matching the optimal non-primary-identifier data against the global master index table, the combinations "name + gender + date of birth", "mobile number + home address", or "mobile number + date of birth" can be compared. If any one combination matches, the person described by the optimal non-primary-identifier data already exists in the global master index table, and the data is stored under that entry. If every combination fails to match, the person does not yet exist in the global master index table, so a new master index number is generated for the optimal non-primary-identifier data and the data is stored in the global master index table under the new number.
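The match-or-create logic of Step 112 can be sketched as below. The dictionary layout of the master table, the field keys, and the `new_index` callback are hypothetical, and only exact equality is shown; fuzzy matching (for example on addresses) is left out of this sketch.

```python
# The three field combinations named in the text, with hypothetical keys.
MATCH_COMBINATIONS = (
    ("name", "gender", "birth_date"),
    ("mobile", "address"),
    ("mobile", "birth_date"),
)


def find_master_index(record, master_table):
    """Return the master index number of the first matching entry, or None."""
    for master_index, person in master_table.items():
        for combo in MATCH_COMBINATIONS:
            if all(record.get(f) and record.get(f) == person.get(f) for f in combo):
                return master_index
    return None


def store_best_record(record, master_table, new_index):
    """Store a record under an existing entry, or create a new entry.

    new_index -- zero-argument callable returning a fresh master index number
    """
    master_index = find_master_index(record, master_table)
    if master_index is None:
        master_index = new_index()
        master_table[master_index] = dict(record)
    else:
        # Fill in only fields that the existing entry does not have yet.
        for field, value in record.items():
            master_table[master_index].setdefault(field, value)
    return master_index
```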
Referring to Fig. 2, comparing the optimal non-primary-identifier data with the master index table effectively solves the problem of normalizing personal health data of different types and from different data domains; during matching, exact matching and fuzzy matching can be selected according to the characteristics of the optimal non-primary-identifier data to meet the needs of data users; and the multi-domain personal-data normalization weight formula solves the problem of normalizing and matching health data when a nickname is used as the name, the data has been de-identified, or the source health data itself contains errors.
Fig. 3 is a flow chart of Embodiment 3 of an individual-centered regionalized multi-dimensional health data processing method provided by a specific embodiment of the invention; Fig. 6 is a schematic diagram, provided by a specific embodiment of the invention, of cross-indexing the personal information tables with the global master index table. As shown in Fig. 3 and Fig. 6, the personal information tables and the global master index table are cross-indexed through the master index number and the in-domain numbers, so that the personal information tables of all data domains are associated with one another.
In the specific embodiment shown in the drawings, after step 112, the individual-centered regionalized multi-dimensional health data processing method further includes:
Step 113: cross-index the personal information tables and the global master index table according to the master index number and the in-domain numbers. In this embodiment, the in-domain numbers include at least one of a personal number, an outpatient number, a medical order number, a sample number, a settlement statement number, and a resident health record number.
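The cross-index of Step 113 can be pictured as a mapping from each master index number to the in-domain numbers that locate a person's rows in every domain. The two dictionaries and the function name below are hypothetical stand-ins for database tables.

```python
def lookup_person(master_index, cross_index, domain_tables):
    """Collect a person's records from every data domain.

    cross_index   -- dict: master index number -> list of (domain, in_domain_number)
    domain_tables -- dict: domain -> dict of in_domain_number -> record
    Returns a dict of domain -> list of that person's records.
    """
    results = {}
    for domain, in_domain_number in cross_index.get(master_index, []):
        record = domain_tables[domain][in_domain_number]
        results.setdefault(domain, []).append(record)
    return results
```

With such an index, a single query by master index number returns the related health data from all domains, which is the retrieval benefit the description attributes to cross-indexing.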
Referring to Fig. 3, Fig. 6, the health data in all different data domains has been associated with number in domain by master index number Come, all relative health datas can be inquired by master index number, data search is convenient.
Fig. 4 is at a kind of compartmentalization various dimensions health data centered on individual that the specific embodiment of the invention provides The flow chart of the example IV of reason method;Fig. 5 be the specific embodiment of the invention provide it is a kind of using Kettle tool never With the schematic diagram of data field periodicity batch capture health data, as shown in Figure 4, Figure 5, in order to ensure the privacy of health data Property, using private network special line using Kettle tool periodically from different data field batch capture health datas.
In the specific embodiment shown in the drawings, before step 101, the individual-centered regionalized multi-dimensional health data processing method further includes:
Step 100: use the Kettle tool to periodically collect health data in batches from different data domains over a dedicated private-network line. In this embodiment of the invention, the Kettle tool collects, every day, the full set of that day's health data from the different data domains. A dedicated optical channel may also be laid between the health data collection device and the data source; the Kettle tool performs the data format conversion and stores the collected health data in a database.
Referring to Fig. 4, transmitting the health data collected by the Kettle tool over a dedicated private-network line protects the privacy of the health data and gives a good user experience; laying a dedicated optical channel between the health data collection device and the data source makes data collection efficient, and collecting massive volumes of health data does not affect the normal operation of the medical service institution's network.
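Kettle is a graphical ETL tool, so the daily full-load job of step 100 would normally be configured there rather than written by hand. Purely to illustrate the logic (select the current day's records from every data domain and convert them to one common layout), a Python sketch might look as follows; the field names `id` and `created_at` and the in-memory `sources` dictionary are assumptions made for this example.

```python
from datetime import date, datetime


def collect_daily_batch(sources, day):
    """Return every record created on `day`, tagged with its data domain."""
    batch = []
    for domain, records in sources.items():
        for rec in records:
            if rec["created_at"].date() == day:
                # Format conversion: copy into a common layout with a domain tag.
                batch.append({"domain": domain, **rec})
    return batch


# Hypothetical per-domain source data.
sources = {
    "hospital": [{"id": "OP-1", "created_at": datetime(2018, 10, 16, 9, 30)}],
    "pharmacy": [{"id": "S-1", "created_at": datetime(2018, 10, 15, 18, 0)}],
}
todays = collect_daily_batch(sources, date(2018, 10, 16))
```

Only the hospital record falls on the requested day, so the batch contains that single record.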
A specific embodiment of the invention provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, where the personal information table includes main identification data entries and non-main identification data entries.
Step 104: cluster the main identification data entries of the different data warehouses to obtain main cluster data entries.
Step 105: calculate the first weight of each entry in the main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data entries according to the first weight and a first threshold to obtain a main personal data entry.
Step 107: generate a master index number for the main personal data entry.
Step 108: store the main personal data entry in the global master index table according to the master index number.
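Steps 104 to 108 can be sketched as follows. The weight formula itself is not reproduced in this text, so the form used here, w_i · Σ_j(u_j · f_ij) / n, is only one reading of the symbol definitions given in the claims and should be treated as an assumption. The weight values, the field names, the use of `id_card` as the clustering key, and the rule "keep the highest-weighted entry if it reaches the threshold" are likewise hypothetical.

```python
from collections import defaultdict
from itertools import count

# All weights below are illustrative assumptions, not values from the patent.
DOMAIN_WEIGHT = {"medical_insurance": 1.0, "hospital": 0.8}  # w_i per data domain
FIELD_WEIGHT = {"name": 0.6, "birth_date": 0.4}              # u_j per specific field


def cluster_by_unique_number(entries):
    """Step 104: group main identification entries that share a unique number."""
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry["id_card"]].append(entry)
    return clusters


def first_weight(entry, cluster):
    """Step 105 (assumed form): w_i * sum_j(u_j * f_ij) / n."""
    total = 0.0
    for field, u in FIELD_WEIGHT.items():
        # f_ij: how many other entries in the cluster agree on this field.
        f = sum(1 for other in cluster
                if other is not entry and other[field] == entry[field])
        total += u * f
    return DOMAIN_WEIGHT[entry["domain"]] * total / len(cluster)


def build_master_index(entries, threshold):
    """Steps 106-108: keep the best entry per cluster if it clears the threshold."""
    index = {}
    numbers = count(1)
    for cluster in cluster_by_unique_number(entries).values():
        best = max(cluster, key=lambda e: first_weight(e, cluster))
        if first_weight(best, cluster) >= threshold:
            index[f"MI-{next(numbers)}"] = best
    return index


entries = [
    {"domain": "medical_insurance", "id_card": "110101", "name": "Zhang Wei", "birth_date": "1980-01-01"},
    {"domain": "hospital", "id_card": "110101", "name": "Zhang Wei", "birth_date": "1980-01-01"},
    {"domain": "hospital", "id_card": "110101", "name": "Xiao Wei", "birth_date": "1980-01-01"},
]
master_index = build_master_index(entries, threshold=0.3)
```

In this example the medical insurance entry agrees with the most other entries and comes from the highest-weighted domain, so it becomes the main personal data entry stored under `MI-1`. The nickname entry ("Xiao Wei") scores lowest because its name field matches nothing else in the cluster.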
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, where the personal information table includes main identification data entries and non-main identification data entries.
Step 104: cluster the main identification data entries of the different data warehouses to obtain main cluster data entries.
Step 105: calculate the first weight of each entry in the main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data entries according to the first weight and a first threshold to obtain a main personal data entry.
Step 107: generate a master index number for the main personal data entry.
Step 108: store the main personal data entry in the global master index table according to the master index number.
Step 109: cluster the non-main identification data entries of the different data warehouses to obtain non-main cluster data entries.
Step 110: calculate the second weight of each entry in the non-main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 111: select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: store the optimal non-main identification data in the global master index table.
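Steps 109 to 112 handle entries that carry no unique number. A rough sketch is given below, assuming that a non-main cluster has already been formed, that the second weight is a weighted count of field agreements within the cluster, and that an entry matches the master index table when its name and birth date equal those of a stored main entry. None of these choices is specified in this text; they are placeholders for whatever rules a real deployment would configure.

```python
FIELD_WEIGHT = {"name": 0.5, "phone": 0.3, "birth_date": 0.2}  # assumed u_j values


def second_weight(entry, cluster):
    """Assumed scoring: weighted count of field agreements with other entries."""
    others = [o for o in cluster if o is not entry]
    score = sum(u * sum(1 for o in others if o[f] == entry[f])
                for f, u in FIELD_WEIGHT.items())
    return score / len(cluster)


def best_per_warehouse(cluster):
    """Step 111: keep the highest-weighted entry of each data warehouse."""
    best = {}
    for entry in cluster:
        w = second_weight(entry, cluster)
        if entry["domain"] not in best or w > best[entry["domain"]][0]:
            best[entry["domain"]] = (w, entry)
    return {domain: entry for domain, (_, entry) in best.items()}


def store(best_entries, master_index):
    """Step 112: attach to a matching master index number, else create a new one."""
    for entry in best_entries.values():
        for row in master_index.values():
            main = row["main"]
            if (main["name"], main["birth_date"]) == (entry["name"], entry["birth_date"]):
                row["non_main"].append(entry)
                break
        else:
            # No match found: generate a new master index number for this entry.
            master_index[f"MI-{len(master_index) + 1}"] = {"main": entry, "non_main": []}


cluster = [
    {"domain": "pharmacy", "name": "Li Na", "phone": "138", "birth_date": "1990-05-05"},
    {"domain": "pharmacy", "name": "Li Na", "phone": "139", "birth_date": "1990-05-05"},
    {"domain": "wearable", "name": "Li Na", "phone": "138", "birth_date": "1990-05-05"},
]
master_index = {"MI-1": {"main": {"name": "Li Na", "birth_date": "1990-05-05"}, "non_main": []}}
best = best_per_warehouse(cluster)
store(best, master_index)
```

Here the pharmacy warehouse contributes the entry whose phone number is confirmed by the wearable-device entry, and both selected entries are attached to `MI-1`. An entry that matches no stored person would instead receive a new master index number.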
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, where the personal information table includes main identification data entries and non-main identification data entries.
Step 104: cluster the main identification data entries of the different data warehouses to obtain main cluster data entries.
Step 105: calculate the first weight of each entry in the main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data entries according to the first weight and a first threshold to obtain a main personal data entry.
Step 107: generate a master index number for the main personal data entry.
Step 108: store the main personal data entry in the global master index table according to the master index number.
Step 109: cluster the non-main identification data entries of the different data warehouses to obtain non-main cluster data entries.
Step 110: calculate the second weight of each entry in the non-main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 111: select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: store the optimal non-main identification data in the global master index table.
Step 113: cross-index the personal information table and the global master index table according to the master index number and the in-domain number.
A specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions. When the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method includes the following steps:
Step 100: use the Kettle tool to periodically collect health data in batches from different data domains over a dedicated private-network line.
Step 101: deduplicate the health data collected from different data domains according to an integrity rule to obtain clean health data.
Step 102: standardize the clean health data to obtain uniformly described health data.
Step 103: integrate the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, where the personal information table includes main identification data entries and non-main identification data entries.
Step 104: cluster the main identification data entries of the different data warehouses to obtain main cluster data entries.
Step 105: calculate the first weight of each entry in the main cluster data entries according to the multi-domain personal data normalization weight formula.
Step 106: normalize the main cluster data entries according to the first weight and a first threshold to obtain a main personal data entry.
Step 107: generate a master index number for the main personal data entry.
Step 108: store the main personal data entry in the global master index table according to the master index number.
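The early steps 101 and 102 (deduplication under an integrity rule, then standardization) can be illustrated with a small sketch. The integrity rule assumed here is "among records with the same domain and record identifier, keep the one with the most non-empty fields", and the standardization rules (a gender code map and ISO date formatting) are examples only. The field names are hypothetical.

```python
from datetime import datetime

# Assumed mapping from domain-specific gender codes to one uniform description.
GENDER_MAP = {"m": "male", "男": "male", "1": "male",
              "f": "female", "女": "female", "2": "female"}
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def completeness(rec):
    """Count the non-empty fields of a record."""
    return sum(1 for v in rec.values() if v not in (None, ""))


def deduplicate(records):
    """Step 101 (assumed rule): keep the most complete record per (domain, id)."""
    kept = {}
    for rec in records:
        key = (rec["domain"], rec["record_id"])
        if key not in kept or completeness(rec) > completeness(kept[key]):
            kept[key] = rec
    return list(kept.values())


def standardize(rec):
    """Step 102: rewrite specific fields into one uniform description."""
    out = dict(rec)
    out["gender"] = GENDER_MAP.get(str(rec.get("gender", "")).strip().lower(), "unknown")
    for fmt in DATE_FORMATS:
        try:
            out["birth_date"] = datetime.strptime(rec["birth_date"], fmt).date().isoformat()
            break
        except (ValueError, TypeError, KeyError):
            continue  # try the next known format; leave the field as-is if none fits
    return out


raw = [
    {"domain": "hospital", "record_id": "1", "gender": "男", "birth_date": "1980/01/01", "name": ""},
    {"domain": "hospital", "record_id": "1", "gender": "男", "birth_date": "1980/01/01", "name": "Zhang Wei"},
]
clean = [standardize(r) for r in deduplicate(raw)]
```

The two raw rows describe the same hospital record, so only the more complete one survives, with its gender and birth date rewritten in the uniform form.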
The individual-centered regionalized multi-dimensional health data processing method and medium provided by the specific embodiments of the invention not only compare health data horizontally within each data domain (data source) but also compare health data vertically across data domains, achieving cross-validation of the health data and ensuring the reliability of the main personal data entries stored in the global master index table. The weight of each data domain can be configured as needed, as can the weights of specific health data fields, so the data handling is highly flexible. During data matching, exact matching or fuzzy matching can be selected according to the characteristics of the non-main identification data entries, which raises the success rate of health data normalization while preserving the reliability of the health data. Comparing the optimal non-main identification data of different types and different data domains with the master index table effectively solves the problem of normalizing personal health data from different data domains. The multi-domain personal data normalization weight formula solves the problem of normalizing and matching health data even when names are given as nicknames, data have been privacy-masked, or the source health data contain errors.
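The choice between exact and fuzzy matching mentioned above can be sketched with Python's standard `difflib` module. Which fields are matched exactly, and the 0.8 similarity threshold, are assumptions made for illustration; the text does not state them.

```python
from difflib import SequenceMatcher

EXACT_FIELDS = {"birth_date", "phone"}  # assumed: structured fields must be identical


def exact_match(a, b):
    return a == b


def fuzzy_match(a, b, threshold=0.8):
    """Treat two strings as matching when their similarity ratio reaches the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def fields_match(field, a, b):
    """Pick the matching mode from the characteristics of the field."""
    if field in EXACT_FIELDS:
        return exact_match(a, b)
    return fuzzy_match(a, b)
```

Under these assumptions, a name with a transposition typo such as "Zhang Wie" still matches "Zhang Wei", while two birth dates that differ by a single day do not match.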
The above embodiments of the invention can be implemented in hardware, in software code, or in a combination of both. For example, an embodiment of the invention may be program code that executes the above method in a digital signal processor (DSP). The invention may also involve multiple functions performed by a computer processor, digital signal processor, microprocessor, or field-programmable gate array (FPGA). Such processors can be configured according to the invention to perform particular tasks by executing machine-readable software code or firmware code that defines the particular methods disclosed by the invention. The software code or firmware code may be developed in different programming languages and in different formats or forms, and may be compiled for different target platforms. However, different code styles, types, and languages of software code, and other ways of configuring code to perform tasks according to the invention, do not depart from the spirit and scope of the invention.
The foregoing describes only illustrative specific embodiments of the invention. Any equivalent changes and modifications made by a person skilled in the art without departing from the concept and principles of the invention fall within the scope of protection of the invention.

Claims (10)

1. An individual-centered regionalized multi-dimensional health data processing method, characterized in that the method comprises:
deduplicating the health data collected from different data domains according to an integrity rule to obtain clean health data;
standardizing the clean health data to obtain uniformly described health data;
integrating the uniformly described health data into data warehouses partitioned by data domain to obtain a personal information table, wherein the personal information table includes main identification data entries and non-main identification data entries;
clustering the main identification data entries of the different data warehouses to obtain main cluster data entries;
calculating the first weight of each entry in the main cluster data entries according to a multi-domain personal data normalization weight formula;
normalizing the main cluster data entries according to the first weight and a first threshold to obtain a main personal data entry;
generating a master index number for the main personal data entry; and
storing the main personal data entry in a global master index table according to the master index number.
2. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that, after the step of storing the main personal data entry in the global master index table according to the master index number, the method further comprises:
clustering the non-main identification data entries of the different data warehouses to obtain non-main cluster data entries;
calculating the second weight of each entry in the non-main cluster data entries according to the multi-domain personal data normalization weight formula;
selecting the optimal non-main identification data in each data warehouse according to the second weight; and
storing the optimal non-main identification data in the global master index table.
3. The individual-centered regionalized multi-dimensional health data processing method of claim 2, characterized in that the step of storing the optimal non-main identification data in the global master index table specifically comprises:
matching the optimal non-main identification data against the global master index table;
if the match succeeds, storing the optimal non-main identification data in the global master index table according to the master index number;
if the match fails, generating a new master index number and storing the optimal non-main identification data in the global master index table according to the new master index number.
4. The individual-centered regionalized multi-dimensional health data processing method of claim 2, characterized in that, after the step of storing the optimal non-main identification data in the global master index table, the method further comprises:
cross-indexing the personal information table and the global master index table according to the master index number and the in-domain number.
5. The individual-centered regionalized multi-dimensional health data processing method of claim 4, characterized in that the in-domain number includes at least one of a personal number, an outpatient number, a medical order number, a sample number, a settlement slip number, and a resident record number.
6. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that, before the step of deduplicating the health data collected from different data domains according to an integrity rule to obtain clean health data, the method further comprises:
using the Kettle tool to periodically collect health data in batches from different data domains over a dedicated private-network line.
7. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that the step of standardizing the clean health data to obtain uniformly described health data specifically comprises:
analyzing the massive health data of each data domain to obtain the data identification rule of each data domain; and
standardizing the specific field data of the clean health data according to the data identification rule.
8. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that the multi-domain personal data normalization weight formula is specifically:
[formula not reproduced in this text]
where Ω_i is the first weight of each entry in the main cluster data entries; w_i is the data-quality weight of the different data domains; f_{i,j} is the number of successful matches between a specific field of one entry in the main cluster data entries and the same specific field of the other entries in the main cluster data entries; n is the number of entries in the main cluster data entries; u_j is the weight of the specific field data in the different data domains; and i is the data domain number, where the data domains include a medical insurance domain, a hospital domain, a resident record domain, a pharmacy domain, a health wearable device domain, and a personal usage behavior domain.
9. The individual-centered regionalized multi-dimensional health data processing method of claim 1, characterized in that each entry in the main identification data entries includes a unique number, the unique number including at least one of an identity card number, a military officer certificate number, a passport number, and a medical insurance card number; and no entry in the non-main identification data entries includes a unique number.
10. A computer storage medium containing computer-executable instructions, wherein, when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method of any one of claims 1 to 9.
CN201811203501.4A 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium Active CN109522331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811203501.4A CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811203501.4A CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Publications (2)

Publication Number Publication Date
CN109522331A true CN109522331A (en) 2019-03-26
CN109522331B CN109522331B (en) 2021-04-16

Family

ID=65770882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811203501.4A Active CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Country Status (1)

Country Link
CN (1) CN109522331B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324894A (en) * 2008-07-24 2008-12-17 中国网络通信集团公司 Correlation method and system of medical global mark and medical local mark
CN102005023A (en) * 2010-10-26 2011-04-06 汪海玥 National health medical file system managed by means of internet website
CN103870668A (en) * 2012-12-17 2014-06-18 上海联影医疗科技有限公司 Method and device for establishing master patient index oriented to regional medical treatment
CN104063567A (en) * 2013-03-20 2014-09-24 上海联影医疗科技有限公司 Establishment method of patient identity source cross reference
CN105574334A (en) * 2015-12-15 2016-05-11 深圳安泰创新科技股份有限公司 Medical information processing method and system
CN105678100A (en) * 2016-03-01 2016-06-15 万达信息股份有限公司 Health record browsing system
CN105787010A (en) * 2016-02-23 2016-07-20 北京凯行同创科技有限公司 Acquisition processing and pushing method and system based on personal data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694993A (en) * 2020-06-11 2020-09-22 北京金山云网络技术有限公司 Method, device, electronic equipment and medium for creating data index
CN111694993B (en) * 2020-06-11 2023-05-02 北京金山云网络技术有限公司 Method, device, electronic equipment and medium for creating data index
CN113836141A (en) * 2021-09-24 2021-12-24 中国劳动关系学院 Big data cross indexing method based on distribution model

Also Published As

Publication number Publication date
CN109522331B (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant