CN111767332A - Data integration method, system and terminal for heterogeneous data sources - Google Patents

Data integration method, system and terminal for heterogeneous data sources

Info

Publication number
CN111767332A
Authority
CN
China
Prior art keywords
data
heterogeneous
database
metadata
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010566643.8A
Other languages
Chinese (zh)
Other versions
CN111767332B (en)
Inventor
王福
陈良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Synyi Medical Technology Co ltd
Original Assignee
Shanghai Synyi Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Synyi Medical Technology Co ltd
Publication of CN111767332A
Application granted
Publication of CN111767332B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

A data integration method, system and terminal for heterogeneous data sources address problems in the prior art that arise when large volumes of heterogeneous data are integrated, especially structured data together with unstructured data: integration is incomplete and inefficient, the solution is hard to extend, the data lacks governance, the range of application is limited, and extending the integration to a new application requires repeated development at high cost. The invention converts the heterogeneous databases of the subsystems into a unified data format supported by a data lake and applies in-depth governance to the inconsistent data content standards among the heterogeneous data. This achieves data integration and sharing, establishes a data standard, facilitates subsequent data applications, and provides good extensibility.

Description

Data integration method, system and terminal for heterogeneous data sources
Technical Field
The present invention relates to the technical field of data information processing, and in particular, to a data integration method, system and terminal for heterogeneous data sources.
Background
Data is information that a computer can transmit and store. A "database" is a set of related data that is stored, organized, and manipulated according to a particular logical structure. To balance transaction rate, reliability, maintainability, scalability and cost, existing large applications typically access databases through "database management software" (DBMS) to obtain the required data or to perform data maintenance. Database management software such as IBM DB2, Oracle, MySQL and SQL Server dominates large-scale data processing applications.
As informatization progresses, an enterprise that wishes to support its internal operations management through data analysis and Business Intelligence (BI) must establish a unified data warehouse that stores the data of each sub-application system centrally. Doing so ensures data consistency, enables data interconnection, allows data sources to be exchanged and shared efficiently, and reduces the repeated labor and cost of data collection. However, because different application systems use different database software, their data storage structures and data maintenance methods differ, which gives rise to the problem of exchanging heterogeneous data. Heterogeneity arises not only from different types of database software but also from differences in how the data itself is structured, for example between structured data and unstructured data.
Unstructured data is particularly common in medical scenarios: patient medical records, examination reports, images, text, audio recordings, and the like.
To solve this problem, the prior art generally develops an independent data interface between the subsystems to be integrated and integrates data according to a specified content and format. This approach has many limitations and cannot meet the need for massive data exchange among all systems in an enterprise. In addition, the data lacks governance, the range of application is limited, and extending the integration to a new application requires repeated development at high cost.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a data integration method, system and terminal for heterogeneous data sources. They address the following problems that arise in the prior art when large volumes of heterogeneous data are integrated, especially structured data together with unstructured data: integration is incomplete and inefficient, the solution is hard to extend, the data lacks governance, the range of application is limited, and extending the integration to a new application requires repeated development at high cost.
To achieve the above and other related objects, the present invention provides a data integration method for heterogeneous data sources, including: performing abstract mapping on each data source in a plurality of heterogeneous databases to obtain metadata of each meta-model under the mapping relationship, wherein each meta-model corresponds to one data source; copying each heterogeneous database to a replication database, and establishing change capture on the replication database to obtain a change table that records the change data in each heterogeneous database; converting the change data read for each heterogeneous database into a data format unified with the metadata; and performing data governance on the change data and the metadata after the unified data format conversion, and storing them in an integrated data lake.
In an embodiment of the present invention, performing abstract mapping on each data source in the plurality of heterogeneous databases to obtain the metadata of each meta-model under the mapping relationship includes: abstractly mapping the physical model of each data source in the plurality of heterogeneous databases according to the mapping relationship to generate a corresponding meta-model having logical relationships; and obtaining, based on each meta-model, the metadata of the meta-model of each data source under the mapping relationship.
In an embodiment of the present invention, the heterogeneous databases include structured data and/or unstructured data.
In an embodiment of the present invention, the unstructured data includes: patient medical record data, examination report data, image data, text data, and audio recording data.
In an embodiment of the present invention, copying the heterogeneous databases to the replication database and establishing change capture on the replication database to obtain a change table that records the change data in the heterogeneous databases includes: synchronously copying the data in each heterogeneous database to a replication database; and, each time a time threshold elapses, capturing the new change data in the replication database into a change table.
In an embodiment of the present invention, the database types supported by the replication database include one or more of DB2, Oracle, SQL Server and MySQL.
In an embodiment of the present invention, the data governance includes one or more of: removing invalid data, unifying data definitions, handling missing data, and extracting valid variables from unstructured data.
To achieve the above and other related objects, the present invention provides a data integration system for heterogeneous data sources, the system comprising: a metadata management module, configured to perform abstract mapping on each data source in a plurality of heterogeneous databases to obtain metadata of each meta-model under the mapping relationship, wherein each meta-model corresponds to one data source; a replication database module, configured to copy each heterogeneous database to a replication database and to establish change capture on the replication database to obtain a change table that records the change data in each heterogeneous database; a data integration module, connected to the metadata management module and the replication database module, configured to convert the change data read for each heterogeneous database into a data format unified with the metadata; and a data governance module, connected to the data integration module, configured to perform data governance on the change data and the metadata after the unified data format conversion and to store them in an integrated data lake.
In an embodiment of the present invention, the metadata management module performs the abstract mapping as follows: it abstractly maps the physical model of each data source in the plurality of heterogeneous databases according to the mapping relationship to generate a corresponding meta-model having logical relationships, and it obtains, based on each meta-model, the metadata of the meta-model of each data source under the mapping relationship.
To achieve the above and other related objects, the present invention provides a data integration terminal for heterogeneous data sources, comprising: a memory for storing a computer program; and a processor for running the computer program to execute the data integration method for heterogeneous data sources.
As described above, the data integration method, system and terminal for heterogeneous data sources of the present invention have the following beneficial effects: the heterogeneous databases of the subsystems are converted into a unified data format supported by the data lake, and the inconsistent data content standards among the heterogeneous data are governed in depth. This achieves data integration and sharing, establishes a data standard, facilitates subsequent data applications, and provides good extensibility.
Drawings
Fig. 1 is a flowchart illustrating a data integration method for heterogeneous data sources according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a data integration method for heterogeneous data sources according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a data integration method for heterogeneous data sources according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating a data integration method for heterogeneous data sources according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a data integration system of heterogeneous data sources according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a data integration terminal of heterogeneous data sources according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the present invention. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present invention. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present invention is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "over," and the like, may be used herein to facilitate describing one element or feature's relationship to another element or feature as illustrated in the figures.
Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only a case of being "directly connected" but also a case of being "indirectly connected" with another element interposed therebetween. In addition, when a certain part is referred to as "including" a certain component, unless otherwise stated, other components are not excluded, but it means that other components may be included.
The terms first, second, third, etc. are used herein to describe various elements, components, regions, layers and/or sections, but are not limited thereto. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the scope of the present invention.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
Therefore, the embodiment of the present invention provides a data integration method for heterogeneous data sources. It addresses the following problems that arise in the prior art when large volumes of heterogeneous data are integrated, especially structured data together with unstructured data: integration is incomplete and inefficient, the solution is hard to extend, the data lacks governance, the range of application is limited, and extending the integration to a new application requires repeated development at high cost. The heterogeneous databases of the subsystems are converted into a unified data format supported by the data lake, and the inconsistent data content standards among the heterogeneous data are governed in depth. This achieves data integration and sharing, establishes a data standard, facilitates subsequent data applications, and provides good extensibility.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that those skilled in the art can easily implement the embodiments of the present invention. The present invention may be embodied in many different forms and is not limited to the embodiments described herein.
Fig. 1 is a schematic flow chart illustrating a data integration method of heterogeneous data sources according to an embodiment of the present invention.
The method comprises the following steps:
Step S11: performing abstract mapping on each data source in the plurality of heterogeneous databases to obtain the metadata corresponding to each meta-model under the mapping relationship.
Optionally, the heterogeneous databases include databases that are heterogeneous in computer architecture, operating system, data format, data storage location, and so on; this application does not limit the type.
Optionally, the physical model of each data source in the plurality of heterogeneous databases is abstractly mapped according to the mapping relationship to generate a corresponding meta-model having logical relationships, and the metadata of the meta-model of each data source under the mapping relationship is obtained based on each meta-model.
Specifically, the physical models of the data sources in the plurality of heterogeneous databases are each abstractly mapped according to the mapping relationship, yielding a plurality of meta-models having logical relationships; the metadata of the meta-model of each data source under the mapping relationship is then obtained based on the meta-model corresponding to that data source.
That is, for each heterogeneous database, the physical model of its data source is abstractly mapped according to the mapping relationship to obtain a meta-model (a logical model) having logical relationships, and the metadata of that meta-model under the mapping relationship is obtained, as shown in Fig. 2.
Preferably, the mapping relationship is used to convert the data of data sources whose physical models are managed under different semantic types and/or business logic into data with uniform logical relationships. The logical relationship or mapping relationship is thus used to obtain metadata that is managed under the same specified semantic type standard and/or the same business logic standard for the data sources.
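The abstract mapping of step S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source names (`his_oracle`, `lis_mysql`), the column names, the logical field names and the helper `build_meta_model` are all hypothetical, and the mapping relationship is assumed to be a simple column-to-field dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Metadata for one logical field of a meta-model (hypothetical shape)."""
    logical_name: str   # unified name in the logical model
    physical_name: str  # column name in the source's physical model
    logical_type: str   # unified type label


def build_meta_model(source_name, physical_columns, mapping):
    """Map a source's physical columns onto logical fields.

    physical_columns: {column_name: physical_type}
    mapping: {column_name: (logical_name, logical_type)}
    Columns without a mapping entry are left out of the meta-model.
    """
    fields = [
        FieldMeta(logical, column, logical_type)
        for column, (logical, logical_type) in mapping.items()
        if column in physical_columns
    ]
    return {"source": source_name, "fields": fields}


def logical_names(meta_model):
    """List the logical field names exposed by a meta-model."""
    return [f.logical_name for f in meta_model["fields"]]


# Two assumed sources that store the same concepts under different names.
his = build_meta_model(
    "his_oracle",
    {"PAT_ID": "NUMBER", "PAT_NM": "VARCHAR2"},
    {"PAT_ID": ("patient_id", "string"), "PAT_NM": ("patient_name", "string")},
)
lis = build_meta_model(
    "lis_mysql",
    {"patient_no": "int", "name": "varchar"},
    {"patient_no": ("patient_id", "string"), "name": ("patient_name", "string")},
)
```

Although the two physical models differ, both meta-models expose the same logical fields, which is the property the later format-unification step relies on.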
Optionally, the heterogeneous databases include structured data and/or unstructured data.
Optionally, in a medical scenario, the unstructured data includes: patient medical record data, examination report data, image data, text data, and audio recording data. According to the configured mapping relationship, each of these databases yields metadata managed under the same semantic type standard and/or the same business logic standard specified for the data sources.
Optionally, a metadata management tool performs the abstract mapping on each data source in the plurality of heterogeneous databases to obtain the metadata of each meta-model under the mapping relationship, where each meta-model corresponds to one data source.
Optionally, the metadata management tool may also delete, add, extract, store, query, and otherwise manage the obtained meta-models and metadata.
Step S12: copying each heterogeneous database to a replication database, and establishing change capture on the replication database to obtain a change table that records the change data in each heterogeneous database.
Optionally, the data in each heterogeneous database is synchronously copied to a replication database; each time a time threshold elapses, the new change data in the replication database is captured into the change table, as shown in Fig. 3.
Specifically, the data in each heterogeneous database is synchronously copied to generate a plurality of replication databases, each corresponding to one heterogeneous database. Note that whenever the data in a heterogeneous database changes, its replication database changes in step.
Each time the set time threshold elapses, change capture is performed on the current replication database, and a change table containing the captured change data is generated. The time threshold is set according to specific requirements; the shorter the threshold, the more promptly changes are captured.
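The threshold-driven capture described above can be sketched as follows. The source does not say how changes are detected, so this sketch substitutes a plain snapshot comparison for whatever change-capture facility the replication database provides; the class `ReplicaPoller`, the row layout and the 60-second threshold are assumptions made for illustration.

```python
def capture_changes(previous, current):
    """Return change-table rows describing how `current` differs from `previous`.

    Both arguments map a primary key to a row dict.
    """
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append({"op": "insert", "key": key, "row": row})
        elif previous[key] != row:
            changes.append({"op": "update", "key": key, "row": row})
    for key in previous:
        if key not in current:
            changes.append({"op": "delete", "key": key, "row": None})
    return changes


class ReplicaPoller:
    """Tracks one replica table and appends captured changes to a change table."""

    def __init__(self, threshold_seconds):
        self.threshold_seconds = threshold_seconds
        self.last_capture = None  # time of the most recent capture, if any
        self.snapshot = {}        # replica contents at the last capture
        self.change_table = []

    def tick(self, now, replica_rows):
        """Capture changes only if the time threshold has elapsed."""
        if (self.last_capture is not None
                and now - self.last_capture < self.threshold_seconds):
            return False
        self.change_table.extend(capture_changes(self.snapshot, replica_rows))
        self.snapshot = {k: dict(v) for k, v in replica_rows.items()}
        self.last_capture = now
        return True


poller = ReplicaPoller(threshold_seconds=60)
first = poller.tick(0, {1: {"name": "Li"}})
skipped = poller.tick(30, {1: {"name": "Li Wei"}})  # threshold not yet reached
second = poller.tick(60, {1: {"name": "Li Wei"}, 2: {"name": "Zhang"}})
ops = [(c["op"], c["key"]) for c in poller.change_table]
```

With a shorter threshold the second call would already have captured the update, which illustrates why a shorter threshold captures changes more promptly.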
Optionally, the database types supported by the replication database include one or more of DB2, Oracle, SQL Server and MySQL.
Step S13: converting the change data read for each heterogeneous database into a data format unified with the metadata.
Optionally, the change data of each heterogeneous database, read from the change table captured on the corresponding replication database, is brought into the same format as the metadata obtained under the unified mapping relationship. In this way the continuously changing data in each heterogeneous database is kept up to date in a unified format.
Optionally, the unified format is the same as the format of the metadata.
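A minimal sketch of the conversion in step S13, assuming that the metadata supplies, for each source, a map from physical column names to logical field names plus a value converter. The function `to_unified`, the field maps and the sample rows are hypothetical.

```python
def to_unified(change, field_map):
    """Rename a change row's physical columns to logical names and cast values.

    change: {"op": ..., "source": ..., "row": {physical_name: value} or None}
    field_map: {physical_name: (logical_name, caster)}
    """
    row = change["row"] or {}
    unified = {
        logical: cast(row[physical])
        for physical, (logical, cast) in field_map.items()
        if physical in row and row[physical] is not None
    }
    return {"op": change["op"], "source": change["source"], "data": unified}


# Assumed field maps derived from the metadata of two sources.
HIS_FIELDS = {"PAT_ID": ("patient_id", str), "PAT_NM": ("patient_name", str.strip)}
LIS_FIELDS = {"patient_no": ("patient_id", str), "name": ("patient_name", str.strip)}

from_his = to_unified(
    {"op": "update", "source": "his_oracle",
     "row": {"PAT_ID": 1001, "PAT_NM": " Li Wei "}},
    HIS_FIELDS,
)
from_lis = to_unified(
    {"op": "insert", "source": "lis_mysql",
     "row": {"patient_no": "1001", "name": "Li Wei"}},
    LIS_FIELDS,
)
```

After conversion, change rows from both sources carry the same field names and value types, so downstream governance can treat them uniformly.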
Step S14: performing data governance on the change data and the metadata after the unified data format conversion, and storing them in an integrated data lake.
Optionally, data governance is performed on the change data and the metadata after the unified data format conversion, and the governed change data and metadata are output to an integrated data lake for storage. The format of the governed change data and metadata is a unified data format supported by the data lake, as shown in Fig. 4.
Optionally, the data governance includes one or more of: removing invalid data, unifying data definitions, handling missing data, and extracting valid variables from unstructured data, so as to generate standardized and normalized data for storage in the integrated data lake.
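The four governance operations listed above can be sketched on a toy record set. The rules used here are assumptions chosen only for illustration: a record without a `patient_id` counts as invalid, gender codes are unified through a hypothetical lookup table, a missing gender defaults to "unknown", and a blood pressure reading is extracted from free text with a regular expression.

```python
import re

# Hypothetical code table for unifying gender values across sources.
GENDER_CODES = {"m": "male", "male": "male", "1": "male",
                "f": "female", "female": "female", "2": "female"}
# Matches readings such as "BP 120/80" in a free-text note.
BP_PATTERN = re.compile(r"BP\s*(\d{2,3})\s*/\s*(\d{2,3})")


def govern(records):
    """Apply the four governance operations to unified-format records."""
    cleaned = []
    for rec in records:
        if not rec.get("patient_id"):
            continue  # remove invalid data: no usable patient identifier
        out = dict(rec)
        raw_gender = str(rec.get("gender", "")).strip().lower()
        # Unify definitions; anything unrecognized or missing becomes "unknown".
        out["gender"] = GENDER_CODES.get(raw_gender, "unknown")
        # Extract variables from unstructured text, if a reading is present.
        match = BP_PATTERN.search(rec.get("note") or "")
        out["systolic"] = int(match.group(1)) if match else None
        out["diastolic"] = int(match.group(2)) if match else None
        cleaned.append(out)
    return cleaned


governed = govern([
    {"patient_id": "1001", "gender": "M", "note": "Admitted, BP 120/80 mmHg."},
    {"patient_id": "", "gender": "F"},
    {"patient_id": "1002", "gender": None, "note": None},
])
```

In practice each rule would come from the data standard established for the data lake; the point of the sketch is only that governance runs after format unification, on records that already share field names.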
Based on a principle similar to that of the above embodiment, the present invention provides a data integration system for heterogeneous data sources.
Specific embodiments are provided below in conjunction with the accompanying drawings:
Fig. 5 is a schematic structural diagram illustrating a data integration system for heterogeneous data sources according to an embodiment of the present invention.
The system comprises:
a metadata management module 51, configured to perform abstract mapping on each data source in a plurality of heterogeneous databases to obtain metadata of each meta-model under the mapping relationship, where each meta-model corresponds to one data source;
a replication database module 52, configured to copy the heterogeneous databases to a replication database and to establish change capture on the replication database to obtain a change table that records the change data in the heterogeneous databases;
a data integration module 53, connected to the metadata management module 51 and the replication database module 52, configured to convert the change data read for each heterogeneous database into a data format unified with the metadata;
and a data governance module 54, connected to the data integration module 53, configured to perform data governance on the change data and the metadata after the unified data format conversion and to store them in an integrated data lake.
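The connections among modules 51 to 54 can be sketched as a small pipeline. This is only a structural illustration under simplifying assumptions: each module is reduced to a hypothetical callable, the data lake is an in-memory list, and only the governed change data (not the metadata) is written to it.

```python
class IntegrationSystem:
    """Wires four module callables together in the order described above."""

    def __init__(self, metadata_module, replication_module,
                 integration_module, governance_module):
        self.metadata_module = metadata_module
        self.replication_module = replication_module
        self.integration_module = integration_module
        self.governance_module = governance_module
        self.data_lake = []  # stand-in for the integrated data lake

    def run(self, sources):
        """Process every source once and store the governed rows."""
        unified = []
        for source in sources.values():
            mapping = self.metadata_module(source)     # module 51
            changes = self.replication_module(source)  # module 52
            unified.extend(                            # module 53
                self.integration_module(mapping, row) for row in changes
            )
        governed = self.governance_module(unified)     # module 54
        self.data_lake.extend(governed)
        return governed


# Hypothetical stand-ins for the four modules.
def metadata_module(source):
    return source["mapping"]

def replication_module(source):
    return source["pending_changes"]

def integration_module(mapping, row):
    return {mapping[k]: v for k, v in row.items() if k in mapping}

def governance_module(rows):
    return [r for r in rows if r.get("patient_id") is not None]


system = IntegrationSystem(metadata_module, replication_module,
                           integration_module, governance_module)
result = system.run({
    "his": {"mapping": {"PAT_ID": "patient_id"},
            "pending_changes": [{"PAT_ID": 1}, {"PAT_ID": None}]},
    "lis": {"mapping": {"patient_no": "patient_id"},
            "pending_changes": [{"patient_no": 2}]},
})
```

Because the integration module depends on both the metadata and the change table, it sits between modules 51 and 52 on one side and module 54 on the other, matching the connections stated in the module list.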
Optionally, the heterogeneous databases include databases that are heterogeneous in computer architecture, operating system, data format, data storage location, and so on; this application does not limit the type.
Optionally, the metadata management module 51 abstractly maps the physical model of each data source in the plurality of heterogeneous databases according to the mapping relationship to generate a corresponding meta-model having logical relationships, and obtains the metadata of the meta-model of each data source under the mapping relationship based on each meta-model.
Specifically, the metadata management module 51 abstractly maps the physical models of the data sources in the heterogeneous databases according to the mapping relationship, yielding a plurality of meta-models having logical relationships; it then obtains the metadata of the meta-model of each data source under the mapping relationship based on the meta-model corresponding to that data source.
That is, for each heterogeneous database, the metadata management module 51 abstractly maps the physical model of its data source according to the mapping relationship to obtain a meta-model (a logical model) having logical relationships, and obtains the metadata of that meta-model under the mapping relationship.
Preferably, the mapping relationship is used to convert the data of data sources whose physical models are managed under different semantic types and/or business logic into data with uniform logical relationships. The logical relationship or mapping relationship is thus used to obtain metadata that is managed under the same specified semantic type standard and/or the same business logic standard for the data sources.
Optionally, the heterogeneous databases include structured data and/or unstructured data.
Optionally, the unstructured data includes: patient medical record data, examination report data, image data, text data, and audio recording data. According to the configured mapping relationship, each of these databases yields metadata managed under the same semantic type standard and/or the same business logic standard specified for the data sources.
Optionally, the metadata management module 51 uses a metadata management tool to perform the abstract mapping on each data source in the plurality of heterogeneous databases to obtain the metadata of each meta-model under the mapping relationship, where each meta-model corresponds to one data source.
Optionally, the metadata management tool may also delete, add, extract, store, query, and otherwise manage the obtained meta-models and metadata.
Optionally, the metadata management tool includes ODBC, a file adapter, an XML adapter, and the like, together with a storage device.
Optionally, the replication database module 52 synchronously copies the data in each heterogeneous database to the replication database and, each time a time threshold elapses, captures the new change data in the replication database into a change table.
Specifically, the replication database module 52 synchronously copies the data in each heterogeneous database to generate a plurality of replication databases, each corresponding to one heterogeneous database. Note that whenever the data in a heterogeneous database changes, its replication database changes in step.
Each time the set time threshold elapses, the replication database module 52 performs change capture on the current replication database and generates a change table containing the captured change data. The time threshold is set according to specific requirements; the shorter the threshold, the more promptly changes are captured.
Optionally, the database types supported by the replication database include one or more of DB2, Oracle, SQL Server and MySQL.
Optionally, the data integration module 53 brings the change data of each heterogeneous database, read from the change table captured on the corresponding replication database, into the same format as the metadata obtained under the unified mapping relationship. In this way the continuously changing data in each heterogeneous database is kept up to date in a unified format.
Optionally, the unified format is the same as the format of the metadata.
Optionally, the data governance module 54 performs data governance on the change data and the metadata after the unified data format conversion, and outputs the governed change data and metadata to an integrated data lake for storage. The format of the governed change data and metadata is a unified data format supported by the data lake.
Optionally, the data governance includes one or more of: removing invalid data, unifying data definitions, handling missing data, and extracting valid variables from unstructured data, so as to generate standardized and normalized data for storage in the integrated data lake.
Fig. 6 shows a schematic structural diagram of a data integration terminal 60 for heterogeneous data sources according to an embodiment of the present invention.
The data integration terminal 60 for heterogeneous data sources includes a memory 61 and a processor 62. The memory 61 stores a computer program; the processor 62 runs the computer program to implement the data integration method for heterogeneous data sources described with reference to Fig. 1.
Optionally, there may be one or more memories 61 and one or more processors 62; Fig. 6 takes one of each as an example.
Optionally, the processor 62 in the data integration terminal 60 loads one or more instructions corresponding to the processes of an application program into the memory 61 according to the steps shown in Fig. 1, and the processor 62 runs the application program stored in the memory 61, thereby implementing the functions of the data integration method for heterogeneous data sources shown in Fig. 1.
Optionally, the memory 61 may include, but is not limited to, high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
Optionally, the processor 62 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed, implements the data integration method of heterogeneous data sources as shown in fig. 1. The computer-readable storage medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media or machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a product that has not been connected to a computer device, or a component that is used by a computer device to which it is already connected.
In summary, the data integration method, system, terminal and medium of the heterogeneous data source of the present invention address the following problems in the prior art: when a large amount of heterogeneous data is integrated, especially when structured data and unstructured data are integrated together, data integration is incomplete, efficiency is low, expansion is difficult, the data lacks control, the application range is limited, and extending the integration range to a new application requires repeated development at high cost. The invention converts the heterogeneous databases of the subsystems into the unified data format supported by the data lake and governs in depth the inconsistency of data content standards among heterogeneous data, thereby realizing data integration and sharing, establishing a data standard, facilitating subsequent data applications, and providing good expandability. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
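The conversion and governance steps summarized above can be illustrated with a minimal sketch. It is not the implementation of the invention: the `MetaModel` class, the source names `HIS` and `LIS`, the column names, and the two governance rules are all hypothetical and chosen only to show how per-source metadata could drive conversion into one unified record format before storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaModel:
    """Hypothetical meta-model: maps one source's physical columns to unified fields."""
    source: str
    field_map: dict  # physical column name -> unified field name


def to_unified(record: dict, model: MetaModel) -> dict:
    """Convert one source row into the unified format described by its meta-model."""
    unified = {"source": model.source}
    for physical, logical in model.field_map.items():
        unified[logical] = record.get(physical)  # missing columns become None
    return unified


def govern(rows: list) -> list:
    """Toy governance: remove rows without an identifier, fill in missing names."""
    cleaned = []
    for row in rows:
        if row.get("patient_id") is None:
            continue  # invalid data removal (assumed rule)
        fixed = dict(row)
        if fixed.get("name") is None:
            fixed["name"] = "UNKNOWN"  # missing data processing (assumed rule)
        cleaned.append(fixed)
    return cleaned


# Two hypothetical sources whose physical schemas differ.
his = MetaModel("HIS", {"PAT_NO": "patient_id", "PAT_NAME": "name"})
lis = MetaModel("LIS", {"pid": "patient_id", "full_name": "name"})

# A plain list stands in for the integrated data lake.
data_lake = govern([
    to_unified({"PAT_NO": "001", "PAT_NAME": "Zhang"}, his),
    to_unified({"pid": "002"}, lis),
    to_unified({"full_name": "Li"}, lis),
])
```

Because every source is described by its own mapping, adding a new source in this sketch only requires a new `MetaModel` instance, which is one way to read the expandability described above.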
The foregoing embodiments are merely illustrative of the principles of the present invention and its efficacy, and are not to be construed as limiting the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A data integration method for heterogeneous data sources is characterized by comprising the following steps:
performing abstract mapping on each data source in a plurality of heterogeneous databases to obtain metadata of each meta-model under the mapping relation, wherein each meta-model corresponds to one data source;
copying each heterogeneous database to a copy database, and establishing change capture on the copy database to obtain a change table for recording change data in each heterogeneous database;
converting the read change data in each heterogeneous database into a data format unified with the metadata;
and performing data governance on the change data and the metadata after the conversion of the unified data format, and storing the change data and the metadata into an integrated data lake.
2. The data integration method of heterogeneous data sources according to claim 1, wherein the performing abstract mapping on each data source in the plurality of heterogeneous databases to obtain metadata of each meta-model under the mapping relation comprises:
abstract mapping is carried out on physical models in each data source in a plurality of heterogeneous databases according to the mapping relation, and meta models with logical relations are respectively generated;
and obtaining the metadata of the meta-model of each data source under the mapping relation based on each meta-model.
3. The data integration method of heterogeneous data sources of claim 1, wherein the heterogeneous database comprises structured data and/or unstructured data.
4. The data integration method of heterogeneous data sources according to claim 3, wherein the unstructured data comprises: patient medical record data, examination report data, image data, text data, and a record database.
5. The data integration method of heterogeneous data sources according to claim 1, wherein the copying each heterogeneous database to a copy database and establishing change capture on the copy database to obtain a change table for recording change data in each heterogeneous database comprises:
synchronously copying data in each heterogeneous database to the copy database;
capturing new change data in the copy database into the change table each time a time threshold has elapsed.
6. The data integration method of heterogeneous data sources according to claim 1, wherein the data structures supported by the copy database comprise one or more of DB2, Oracle, SQL Server, and MySQL databases.
7. The data integration method of heterogeneous data sources according to claim 2, wherein the data governance comprises one or more of: invalid data removal, unified data definition, missing data processing, and extraction of effective variables from unstructured data.
8. A data integration system for heterogeneous data sources, the system comprising:
the metadata management module is used for carrying out abstract mapping on each data source in the heterogeneous databases to obtain metadata of each meta-model under the mapping relation, wherein each meta-model corresponds to one data source;
the copy database module is used for copying each heterogeneous database to a copy database and establishing change capture on the copy database so as to obtain a change table for recording change data in each heterogeneous database;
the data integration module is connected with the metadata management module and the copy database module and is used for converting the read change data in each heterogeneous database into a data format unified with the metadata;
and the data management module is connected with the data integration module and used for performing data governance on the change data and the metadata after the conversion of the unified data format and storing the change data and the metadata into an integrated data lake.
9. The data integration system of heterogeneous data sources of claim 8, wherein the manner of abstractly mapping each data source in the plurality of heterogeneous databases to obtain the metadata of each meta-model under the mapping relation comprises:
abstract mapping is carried out on physical models in each data source in a plurality of heterogeneous databases according to the mapping relation, and meta models with logical relations are respectively generated;
and obtaining the metadata of the meta-model of each data source under the mapping relation based on each meta-model.
10. A data integration terminal of heterogeneous data sources, comprising:
a memory for storing a computer program;
a processor for performing the data integration method of the heterogeneous data sources of any one of claims 1 to 7.
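As an illustration of the periodic change capture recited in claim 5, the following is a minimal sketch, not the claimed implementation. It assumes an in-memory SQLite database standing in for the copy database, hypothetical table names `patient_copy` and `change_table`, and a hypothetical integer `version` column used as a watermark. The claims do not specify how changes are detected, so the watermark comparison is an assumption.

```python
import sqlite3


def capture_changes(conn, last_version):
    """Copy rows newer than last_version into change_table; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, version FROM patient_copy "
        "WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    conn.executemany(
        "INSERT INTO change_table (id, name, version) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    # Keep the old watermark when nothing changed during this cycle.
    return rows[-1][2] if rows else last_version


conn = sqlite3.connect(":memory:")  # stand-in for the copy database
conn.execute("CREATE TABLE patient_copy (id INTEGER, name TEXT, version INTEGER)")
conn.execute("CREATE TABLE change_table (id INTEGER, name TEXT, version INTEGER)")

# First capture cycle: one row has arrived in the copy database.
conn.execute("INSERT INTO patient_copy VALUES (1, 'Zhang', 1)")
watermark = capture_changes(conn, 0)

# Second capture cycle: only the newly arrived row is captured.
conn.execute("INSERT INTO patient_copy VALUES (2, 'Li', 2)")
watermark = capture_changes(conn, watermark)

captured = conn.execute(
    "SELECT id, version FROM change_table ORDER BY version"
).fetchall()
```

In a deployment, a scheduler would call `capture_changes` once each time the time threshold elapses; the sketch calls it directly so that it stays deterministic.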
Application number: CN202010566643.8A; priority date: 2020-06-12; filing date: 2020-06-19; title: Data integration method, system and terminal for heterogeneous data sources; status: Active; publication: CN111767332B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020105354525 2020-06-12
CN202010535452 2020-06-12

Publications (2)

Publication Number Publication Date
CN111767332A true CN111767332A (en) 2020-10-13
CN111767332B CN111767332B (en) 2021-07-30

Family

ID=72721139


Country Status (1)

Country Link
CN (1) CN111767332B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases
CN101390094A (en) * 2006-02-22 2009-03-18 微软公司 Distributed conflict resolution for replicated databases
US20080222192A1 (en) * 2006-12-21 2008-09-11 Ec-Enabler, Ltd Method and system for transferring information using metabase
CN102004745A (en) * 2009-09-02 2011-04-06 中国银联股份有限公司 Data transfer system and method
CN102073767A (en) * 2011-01-12 2011-05-25 南京南瑞继保电气有限公司 Method for managing metadata of virtual data warehouse of electric power information system group
CN103412917A (en) * 2013-08-08 2013-11-27 广西大学 Extensible database system and management method for coordinated management of data in multi-type field
CN108108949A (en) * 2016-11-25 2018-06-01 杭州王道科技有限公司 Speed passage through customs method and the platform of information synergism
CN107122428A (en) * 2017-04-12 2017-09-01 南京南瑞集团公司 A kind of database isomeric data Format Painter conversion method
CN107729366A (en) * 2017-09-08 2018-02-23 广东省建设信息中心 A kind of pervasive multi-source heterogeneous large-scale data synchronization system
CN108010573A (en) * 2017-11-24 2018-05-08 苏州市环亚数据技术有限公司 A kind of hospital data emerging system, method, electronic equipment and storage medium
CN107895046A (en) * 2017-11-30 2018-04-10 广东奥飞数据科技股份有限公司 A kind of Heterogeneous Database Integration Platform
CN111052106A (en) * 2018-04-27 2020-04-21 甲骨文国际公司 System and method for heterogeneous database replication from a remote server
US20200097651A1 (en) * 2018-09-26 2020-03-26 General Electric Company Systems and methods to achieve robustness and security in medical devices
CN109739867A (en) * 2018-12-29 2019-05-10 北京航天数据股份有限公司 A kind of industry metadata management method and system
CN110739075A (en) * 2019-10-28 2020-01-31 常州工业职业技术学院 COPD disease auxiliary diagnosis monitoring system based on big data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286916A (en) * 2020-10-22 2021-01-29 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN112883091A (en) * 2021-01-12 2021-06-01 平安资产管理有限责任公司 Factor data acquisition method and device, computer equipment and storage medium
CN113190517A (en) * 2021-06-30 2021-07-30 北京德风新征程科技有限公司 Data integration method and device, electronic equipment and computer readable medium
CN113568938A (en) * 2021-08-04 2021-10-29 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium
CN113568938B (en) * 2021-08-04 2023-11-14 北京百度网讯科技有限公司 Data stream processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111767332B (en) 2021-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant