CN108198595A - Multi-source heterogeneous unstructured medical record data fusion method - Google Patents

Multi-source heterogeneous unstructured medical record data fusion method

Info

Publication number
CN108198595A
Authority
CN
China
Prior art keywords
data
class
medical record
tables
record data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810047069.8A
Other languages
Chinese (zh)
Other versions
CN108198595B (en)
Inventor
史晟辉
李五锁
詹思延
徐梓豪
张洋
杨羽
武姗姗
黄元升
黄定琦
陈晓宇
张永健
赵鑫
杨廷伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Beijing University of Chemical Technology
Original Assignee
Peking University
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Beijing University of Chemical Technology
Priority to CN201810047069.8A
Publication of CN108198595A
Application granted
Publication of CN108198595B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/328 Management therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-source heterogeneous unstructured medical record data fusion method. The method is based on the following design. A table virtual class is established based on the data tables of a medical record data platform; the table virtual class includes an entity SQL statement generation algorithm. Table classes whose attributes correspond one-to-one with the attributes of the data tables in the medical record data platform are established from the table virtual class. A data control virtual class is established that contains, as attributes, instances of the table classes corresponding to the data tables in the medical record data platform; the data control virtual class includes a virtual conversion algorithm that converts data into objects of the table classes. The entity SQL statement generation algorithm is called, the objects of the table classes are traversed by reflection, and the attributes of the data are converted into SQL statements and stored in the data tables of the medical record data platform. The invention achieves medical record data fusion without affecting the structure or stability of the original system, improves operational safety and fusion efficiency, and effectively reduces the error rate.

Description

Multi-source heterogeneous unstructured medical record data fusion method
Technical field
The present invention relates to the technical field of electronic medical records, and in particular to a multi-source heterogeneous unstructured medical record data fusion method.
Background
With the development of medical care in China, the level of medical informatization has improved, but the storage structures of the HIS (Hospital Information System) used by different hospitals are not consistent. For scientific research, text data without any structure is exported from the HIS databases of different hospitals. How to integrate unstructured data from multiple data sources, remove private information, and form structured data is an important topic in Chinese electronic medical record research. To date, no such shared medical data set exists in China, and the integration of unstructured data with different structures from multiple data sources is still in its infancy.
As shown in Fig. 1, for example, the medical record data comes from the medical record text data of several different hospitals. For the study of multi-source heterogeneous unstructured medical record data fusion, the text medical record data has the following characteristics:
(1) The medical records are text, but the text formats are not uniform: there are Word files and txt files, all of which are unstructured data.
(2) The volume of text data is large, nearly 100 MB.
(3) The content and structure of the medical record data files from different hospitals are not similar to one another. The data comes from many different places, the HIS used by each hospital is different, and the manual export operation from the HIS differs each time. As a result, the formats of the medical record data are inconsistent.
(4) The data structures of hospitals in different regions are generally inconsistent, but the formats of files within the same region are basically consistent.
Existing approaches to fusing such unstructured medical record data suffer from problems such as affecting the stability of the original system, complicated operation, low efficiency, a high error rate, and the need for secondary correction.
Summary of the invention
To solve the above problems, the present invention provides a multi-source heterogeneous unstructured medical record data fusion method. It aims to structure new data on the basis of the data platform's original tables, so that the data is added only after it has been given a structure that matches the data platform. This does not affect the operation or stability of the original data platform or system, and avoids or reduces further machine or manual correction.
A multi-source heterogeneous unstructured medical record data fusion method includes the following steps:
(a) establishing a table virtual class based on the data tables of the medical record data platform, the table virtual class including an entity SQL statement generation algorithm;
(b) establishing, from the table virtual class, table classes whose attributes correspond one-to-one with the attributes of the data tables in the medical record data platform;
(c) establishing a data control virtual class that contains, as attributes, instances of the table classes corresponding to the data tables in the medical record data platform, the data control virtual class including a virtual conversion algorithm that converts data into objects of the table classes;
(d) calling the entity SQL statement generation algorithm, traversing the objects of the table classes by reflection, converting the attributes of the data into SQL statements, and storing them in the data tables of the medical record data platform.
Preferably, the data tables of the medical record data platform include patient personal information, patient medical history information, basic case information, and progress note information.
Preferably, the data is Word or text data.
A multi-source heterogeneous unstructured medical record data fusion method includes the following modules:
(a) a table virtual class module, including an entity SQL statement generation algorithm;
(b) a table class module, which obtains from the table virtual class module table classes whose attributes correspond one-to-one with the attributes of the data tables in the medical record data platform;
(c) a data control virtual class module, including instances of the table classes corresponding to the data tables in the medical record data platform and a virtual conversion algorithm that converts data into objects of the table classes;
The table class module calls the entity SQL statement generation algorithm, traverses the objects of the table classes by reflection, converts the attributes of the data into SQL statements, and stores them in the data tables of the medical record data platform.
Preferably, the data tables of the medical record data platform include patient personal information, patient medical history information, basic case information, and progress note information.
Preferably, the data is Word or text data.
The multi-source heterogeneous unstructured medical record data fusion method provided by the present invention structures the input data so that it matches the attributes of the data platform, and only then inputs it into the data platform. Data fusion is thus achieved without affecting the structure or stability of the original system, operational safety and fusion efficiency are improved, and the error rate is effectively reduced.
Brief description of the drawings
Fig. 1 shows the processing flow for multi-source heterogeneous medical record text data.
Fig. 2 is a flow chart of the multi-source heterogeneous unstructured medical record data fusion method provided by the present invention.
Detailed description
To help those skilled in the art better understand the technical solution of the present invention, the multi-source heterogeneous unstructured medical record data fusion method provided by the present invention is described in detail below with reference to the accompanying drawings.
Embodiment one
The formats of all medical record text data files are converted: files in formats such as Word are converted to txt format.
An electronic medical record data platform that covers all of the medical record data content is designed. The system structure of the platform should reflect a reasonable structure for medical record data rather than depend on the data content of any one kind of medical record file, and the dependencies between the relation tables of the platform should be simple and clear. The platform should contain the following: patient personal information, patient medical history information, basic case information, progress note information, and so on.
A table virtual class A is implemented for the data platform. In this embodiment, virtual class A contains a method B by which an entity generates SQL statements, converting the entity's own attributes into SQL statements for the data platform. This embodiment uses CSharp (C#) reflection, so the statements can be generated directly by traversing the attributes of the object.
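The role of virtual class A and method B can be illustrated with a minimal sketch. The embodiment uses C# reflection; the version below substitutes Python's `dataclasses.fields` introspection as a stand-in. The names `TableBase`, `to_insert_sql`, `table_name`, and the `Patient` columns are hypothetical, and emitting a parameterized statement with `?` placeholders is an assumption of this sketch rather than something the source specifies.

```python
from dataclasses import dataclass, fields


class TableBase:
    """Plays the role of virtual class A; subclasses are expected to be dataclasses."""

    table_name = ""  # hypothetical: each subclass names its platform table

    def to_insert_sql(self):
        """Plays the role of method B: walk the object's fields and build an INSERT."""
        names = [f.name for f in fields(self)]
        columns = ", ".join(names)
        placeholders = ", ".join("?" for _ in names)
        sql = f"INSERT INTO {self.table_name} ({columns}) VALUES ({placeholders})"
        # Values travel separately from the statement text.
        params = tuple(getattr(self, name) for name in names)
        return sql, params


@dataclass
class Patient(TableBase):
    """Hypothetical table class for a `patient` table."""

    table_name = "patient"  # unannotated, so it is a class attribute, not a field
    patient_id: int
    name: str


sql, params = Patient(1, "Li Lei").to_insert_sql()
print(sql)
print(params)
```

Because the statement is derived from the fields at run time, adding a column to a table class changes the generated SQL without any change to `to_insert_sql`.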
For each table in the data platform, a table class C is implemented based on virtual class A; the attributes of table class C correspond one-to-one with the attributes of the table in the data platform. In this way, every entity object in the data platform can automatically generate the SQL statement that inputs it into the data platform.
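The one-to-one correspondence between a table class C and its table can be checked mechanically. The sketch below assumes a SQLite database and a hypothetical `case_info` table; `CaseInfo` and `columns_match` are illustrative names that do not come from the source.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class CaseInfo:
    """Hypothetical table class C for a `case_info` table."""

    case_id: int
    patient_id: int
    diagnosis: str


def columns_match(conn, table, table_class):
    """Return True when the table's columns equal the class's fields, in order."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return columns == [f.name for f in fields(table_class)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_info ("
    "case_id INTEGER PRIMARY KEY, patient_id INTEGER, diagnosis TEXT)"
)
print(columns_match(conn, "case_info", CaseInfo))
```

A check of this kind could run once at start-up so that a table class that has drifted from the platform schema is caught before any data is written.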
A data control virtual class D is designed. It contains, as attributes, instances of the table class C corresponding to each table in the data platform. A virtual conversion method E is defined in data control virtual class D; this method converts text data into objects of table class C.
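Data control virtual class D and its virtual method E map naturally onto an abstract base class. The following is a sketch under that assumption; `DataControl`, `convert`, `objects`, and the two table classes are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    """Hypothetical table class C for a patient table."""

    patient_id: int
    name: str


@dataclass
class ProgressNote:
    """Hypothetical table class C for a progress note table."""

    patient_id: int
    text: str


class DataControl(ABC):
    """Plays the role of data control virtual class D."""

    def __init__(self):
        # Table class instances are held as attributes and filled in by convert().
        self.patient: Optional[Patient] = None
        self.notes: List[ProgressNote] = []

    @abstractmethod
    def convert(self, text: str) -> None:
        """Plays the role of virtual conversion method E; each source overrides it."""

    def objects(self):
        """Return the populated table objects, patient row first."""
        found = [self.patient] if self.patient is not None else []
        return found + list(self.notes)


try:
    DataControl()  # abstract: there is no conversion logic at this level
except TypeError as exc:
    print("cannot instantiate:", type(exc).__name__)
```

Keeping `convert` abstract means the base class fixes which table objects exist while leaving the parsing entirely to each data source.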
The control module of the system implements the file reading operation. According to its parameters, it calls the conversion method E of the entity corresponding to data control virtual class D to generate the objects of table class C, and then calls the entity SQL statement generation method B of table class C, thereby storing each piece of data in the data platform.
At this point the basic functionality is complete, and this part needs no further modification.
When the unstructured medical record data of a certain region needs to be integrated, virtual class D is implemented in the form of a case data class F according to the characteristics of the data source; that is, the entity conversion method E that generates each object is implemented. Running the system then merges the current unstructured medical record data into the tables of the data platform.
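A case data class F for one region might look like the sketch below. The text layout it parses (`ID:` and `Name:` lines) is invented for illustration, as are the names `RegionAData`, `DataControl`, and `Patient`; the source does not describe any concrete export format.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical table class C for a patient table."""

    patient_id: int
    name: str


class DataControl(ABC):
    """Role of virtual class D: holds table objects and declares method E."""

    def __init__(self):
        self.patient = None

    @abstractmethod
    def convert(self, text):
        """Role of conversion method E."""


class RegionAData(DataControl):
    """Role of case data class F for one hypothetical regional export layout."""

    # Assumed layout: lines such as "ID: 17" and "Name: Li Lei".
    _id = re.compile(r"^ID:\s*(\d+)\s*$", re.MULTILINE)
    _name = re.compile(r"^Name:\s*(.+?)\s*$", re.MULTILINE)

    def convert(self, text):
        id_match = self._id.search(text)
        name_match = self._name.search(text)
        if id_match is None or name_match is None:
            raise ValueError("text does not follow the assumed Region A layout")
        self.patient = Patient(int(id_match.group(1)), name_match.group(1))


source = RegionAData()
source.convert("ID: 17\nName: Li Lei\nComplaint: cough for three days\n")
print(source.patient)
```

All knowledge of the regional format lives inside `RegionAData.convert`; nothing outside the class needs to know how the text is laid out.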
The system control flow is essentially as follows. The entity conversion method E of case data class F is called with the data as input, and method E generates the objects of table class C for each corresponding table in the data platform. Following the primary key dependencies among the tables in the data platform, the SQL generation method B of each table class C object is called in turn, and the returned SQL statements are executed. Although the control module is defined in terms of data control virtual class D, what actually executes, depending on the input data source and the input parameters, is the entity conversion method E of the newly added case data class F.
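The control flow described above (convert, order by key dependency, generate SQL, execute) can be sketched end to end against an in-memory SQLite database. Every name here is hypothetical, the fixed `INSERT_ORDER` list is one simple way to honor the dependency order, and `DemoSource.convert` returns a list instead of filling attributes purely to keep the example short.

```python
import sqlite3
from dataclasses import dataclass, fields


class TableBase:
    """Role of virtual class A with SQL generation method B."""

    table_name = ""

    def to_insert_sql(self):
        names = [f.name for f in fields(self)]
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            self.table_name, ", ".join(names), ", ".join("?" for _ in names)
        )
        return sql, tuple(getattr(self, n) for n in names)


@dataclass
class Patient(TableBase):
    table_name = "patient"
    patient_id: int
    name: str


@dataclass
class ProgressNote(TableBase):
    table_name = "progress_note"
    note_id: int
    patient_id: int
    text: str


# Parent tables come first so foreign keys resolve; this order is an assumption.
INSERT_ORDER = [Patient, ProgressNote]


class DemoSource:
    """Stands in for a case data class F; convert() plays the role of method E."""

    def convert(self, text):
        # Assumed layout: first line is the name, each later line is one note.
        name, *notes = text.strip().splitlines()
        rows = [ProgressNote(i, 1, note) for i, note in enumerate(notes, start=1)]
        return rows + [Patient(1, name)]  # deliberately out of dependency order


def run(conn, source, text):
    """Control flow: convert, order by dependency, generate SQL, execute."""
    objects = sorted(source.convert(text), key=lambda o: INSERT_ORDER.index(type(o)))
    for obj in objects:
        sql, params = obj.to_insert_sql()
        conn.execute(sql, params)
    conn.commit()
    return len(objects)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE progress_note (note_id INTEGER PRIMARY KEY, "
    "patient_id INTEGER REFERENCES patient(patient_id), text TEXT)"
)
inserted = run(conn, DemoSource(), "Li Lei\nDay 1: fever\nDay 2: improving\n")
print(inserted)
```

With foreign key enforcement on, inserting a note before its patient would fail, which is why the objects are sorted before any statement is executed.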
As shown in Fig. 2:
Table 1, table 2, and so on are the tables designed for the medical record data platform;
Table class 1, table class 2, and so on are the table classes C corresponding to the tables of the data platform; they all inherit from a single base virtual class A;
Object 1, object 2, and so on are objects created from table class 1, table class 2, and so on. By calling its own entity SQL statement generation method B, each object can generate the statement that stores its own attribute data in the corresponding table of the data platform;
Data source 1, data source 2, and so on are unstructured medical record data with different structures from different regions;
The data control virtual class defines the virtual conversion method E, which is the process of generating data objects from the data in a data source. It is this virtual class that is called in the control module;
Data class 1, data class 2, and so on are subclasses of data control virtual class D, added according to the medical record characteristics of data source 1, data source 2, and so on. Each contains the objects of every table class and the method for generating those objects.
The control module is the basic process of the system. It calls the virtual method E of the data control virtual class (in practice, the conversion method E of a case data class F). Text medical record data is input to the control module, which generates object 1, object 2, and so on corresponding to the data of the data source and stores them in the data platform, forming structured data.
With this design, when a new text medical record data source n is obtained, the only work needed is to write, according to the content of the data in source n, a new entity class n that implements the virtual method E of data control virtual class D. The data source file and the new entity class n are then passed to the control module as parameters, which integrates the data of the new source n. This has no effect on the data sources that have already been structured, nor on the normal operation of the existing system, and meets requirements such as efficiency and safety.
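The extension property described here, where a new source needs only a new class and no change to existing code, can be sketched as follows. The names `DataControl`, `SourceOne`, `SourceN`, and `control`, and both line formats, are hypothetical; plain dictionaries replace table class objects to keep the example short.

```python
from abc import ABC, abstractmethod


class DataControl(ABC):
    """Role of data control virtual class D."""

    @abstractmethod
    def convert(self, text):
        """Role of virtual conversion method E; returns a list of row dicts."""


class SourceOne(DataControl):
    """Existing source: assumed 'name, diagnosis' lines."""

    def convert(self, text):
        rows = []
        for line in text.strip().splitlines():
            name, diagnosis = line.split(",", 1)
            rows.append({"name": name.strip(), "diagnosis": diagnosis.strip()})
        return rows


def control(source_class, text):
    """Control module: written once against DataControl, not edited afterwards."""
    if not issubclass(source_class, DataControl):
        raise TypeError("source_class must implement DataControl")
    return source_class().convert(text)


# Adding source n means adding one class; control() and SourceOne stay unchanged.
class SourceN(DataControl):
    """New source: assumed 'diagnosis | name' lines."""

    def convert(self, text):
        rows = []
        for line in text.strip().splitlines():
            diagnosis, name = line.split("|", 1)
            rows.append({"name": name.strip(), "diagnosis": diagnosis.strip()})
        return rows


print(control(SourceOne, "Li Lei, asthma"))
print(control(SourceN, "asthma | Li Lei"))
```

Both calls yield rows of the same shape even though the inputs are laid out differently, which is the point of passing the entity class to the control module as a parameter.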
Data from different regions is mutually independent. Unstructured data with new content, a new source, or a different structure is handled by adding to the system rather than by modifying the original system, so previously integrated content is not affected. Data fusion is thus achieved without affecting the structure or stability of the original system, operational safety and fusion efficiency are improved, and the error rate is effectively reduced.
It should be understood that the above embodiment is merely an exemplary embodiment used to illustrate the principle of the present invention, and the present invention is not limited to it. Those of ordinary skill in the art can make various variations and improvements without departing from the spirit and essence of the present invention, and these variations and improvements are also regarded as falling within the protection scope of the present invention.

Claims (6)

1. A multi-source heterogeneous unstructured medical record data fusion method, characterized by comprising the following steps:
(a) establishing a table virtual class based on the data tables of a medical record data platform, the table virtual class including an entity SQL statement generation algorithm;
(b) establishing, from the table virtual class, table classes whose attributes correspond one-to-one with the attributes of the data tables in the medical record data platform;
(c) establishing a data control virtual class that contains, as attributes, instances of the table classes corresponding to the data tables in the medical record data platform, the data control virtual class including a virtual conversion algorithm that converts data into objects of the table classes;
(d) calling the entity SQL statement generation algorithm, traversing the objects of the table classes by reflection, converting the attributes of the data into SQL statements of the data platform, and storing them in the data tables of the medical record data platform.
2. The multi-source heterogeneous unstructured medical record data fusion method according to claim 1, characterized in that the data tables of the medical record data platform include patient personal information, patient medical history information, basic case information, and progress note information.
3. The multi-source heterogeneous unstructured medical record data fusion method according to claim 1 or 2, characterized in that the data is Word or text data.
4. A multi-source heterogeneous unstructured medical record data fusion method, characterized by comprising the following modules:
(a) a table virtual class module, including an entity SQL statement generation algorithm;
(b) a table class module, which obtains from the table virtual class module table classes whose attributes correspond one-to-one with the attributes of the data tables in the medical record data platform;
(c) a data control virtual class module, including instances of the table classes corresponding to the data tables in the medical record data platform and a virtual conversion algorithm that converts data into objects of the table classes;
wherein the table class module calls the entity SQL statement generation algorithm, traverses the objects of the table classes by reflection, converts the attributes of the data into SQL statements, and stores them in the data tables of the medical record data platform.
5. The multi-source heterogeneous unstructured medical record data fusion method according to claim 4, characterized in that the data tables of the medical record data platform include patient personal information, patient medical history information, basic case information, and progress note information.
6. The multi-source heterogeneous unstructured medical record data fusion method according to claim 4 or 5, characterized in that the data is Word or text data.
CN201810047069.8A 2018-01-18 2018-01-18 Multi-source heterogeneous unstructured medical record data fusion method Active CN108198595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810047069.8A CN108198595B (en) 2018-01-18 2018-01-18 Multi-source heterogeneous unstructured medical record data fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810047069.8A CN108198595B (en) 2018-01-18 2018-01-18 Multi-source heterogeneous unstructured medical record data fusion method

Publications (2)

Publication Number Publication Date
CN108198595A (en) 2018-06-22
CN108198595B (en) 2022-05-03

Family

ID=62590153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810047069.8A Active CN108198595B (en) 2018-01-18 2018-01-18 Multi-source heterogeneous unstructured medical record data fusion method

Country Status (1)

Country Link
CN (1) CN108198595B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226478A (en) * 2013-05-22 2013-07-31 北京金和软件股份有限公司 Method for automatically generating and using code
US20160021181A1 (en) * 2013-07-23 2016-01-21 George Ianakiev Data fusion and exchange hub - architecture, system and method
CN107066499A (en) * 2016-12-30 2017-08-18 江苏瑞中数据股份有限公司 Data query method for a heterogeneous-storage-oriented multi-source data management and visualization system
CN107193858A (en) * 2017-03-28 2017-09-22 福州金瑞迪软件技术有限公司 Intelligent service application platform and method for multi-source heterogeneous data fusion
CN107402976A (en) * 2017-07-03 2017-11-28 国网山东省电力公司经济技术研究院 Power grid multi-source data fusion method and system based on multi-element heterogeneous model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448846A (en) * 2018-09-07 2019-03-08 北京大学 Analysis method for calculating rare disease incidence based on medical insurance big data
CN111177156A (en) * 2019-12-31 2020-05-19 广东科学技术职业学院 Big data storage method and system
CN111177506A (en) * 2019-12-31 2020-05-19 广东科学技术职业学院 Classification storage method and system based on big data
CN111177156B (en) * 2019-12-31 2023-10-03 广东科学技术职业学院 Big data storage method and system

Also Published As

Publication number Publication date
CN108198595B (en) 2022-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant