CN103927344A - Data integration method - Google Patents

Publication number
CN103927344A
Authority
CN
China
Prior art keywords
data
source
data source
replica
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410123611.5A
Other languages
Chinese (zh)
Inventor
王勇
曲晓白
吴光州
王立峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-03-31
Filing date
2014-03-31
Publication date
2014-07-16
Application filed by State Grid Shandong Electric Power Co Ltd
Priority to CN201410123611.5A
Publication of CN103927344A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database

Abstract

A data integration method includes: receiving data integration information, searching for the corresponding data source in a source-side database, mapping the data of the data source onto destination data, and transmitting the destination data to a destination-side database. The advantages of the method are that, by integrating the data of all data sources into global-schema data, a user can transparently access the data of the data sources according to the global schema; by replicating the data of other data sources to a target data source, the data consistency of the data sources can be maintained and the efficiency of information sharing and use is improved; and data integration is achieved through data conversion between the source side and the destination side, which solves the problems of data distribution and mapping.

Description

A data integration method
Technical field
The present invention relates to the field of data integration, and specifically to a data integration method.
Background art
Within an enterprise, because of differences in development time or development department, multiple information systems with different mappings, running on different hardware and software platforms, often operate at the same time. The data sources of these systems are independent of one another and mutually closed, which makes it difficult to exchange, share, and merge data between systems. There is therefore a strong need to integrate existing information and share it.
Data integration refers to organically concentrating data of different sources, formats, and characteristics, either logically or physically; that is, data from different source-side databases are concentrated logically or physically to form the data in a destination-side database. Through data integration, comprehensive data sharing can be provided for the enterprise.
Data integration provides a mirror image of the back-end information stored on the enterprise's host computer. When a customer needs to check the status of an order, the query is passed to the data integration software, so the enterprise's host computer does not always need to be accessed. The data integration software is intelligent enough to know when to synchronize with the host computer so that the data stay up to date. Integrating ERP data for e-business applications is accomplished by combining data staging with direct access to ERP data, using a data server and several data caches. The data integration software intelligently mixes direct real-time access with batch data access in order to extract data from an ERP system.
In research on data integration, the following problems still need attention: (1) the mapping between the relational data model and XML-based semi-structured data, which must guarantee the integrity and consistency constraints of the data before and after mapping; (2) the construction method and mapping method of the global schema for semi-structured data, which must likewise guarantee that the integrity and consistency constraints of the data can be carried over between semi-structured data; (3) safe and reliable data transmission during the data integration process.
Driven by technologies such as XML, Web Services, and grid computing, some of the difficult problems in data integration will be solved well, and data integration will be applied more widely.
Summary of the invention
To overcome the deficiencies of the prior art described above, the invention provides a data integration method.
The technical solution adopted by the invention to solve its technical problem is a data integration method, characterized in that it comprises:
receiving data integration information;
searching for the corresponding data source in a source-side database;
mapping the data of the data source onto destination data;
transmitting the destination data to a destination-side database.
Further, the process of mapping the data of the data source onto the destination data is the process of integrating the data of all data sources into global-schema data.
Preferably, the global-schema data comprise the data structure, field information, and data operations of the data sources.
Preferably, the process of integrating the data of the data sources into global-schema data is a process of performing data extraction, verification, cleaning, integration, aggregation, and loading on the data sources.
Further, the process of mapping the data of the data source onto the destination data is to designate one data source as the target data source and to replicate the data of the other data sources to the target data source by a data transfer mode.
Further, the process of replicating the data of the other data sources to the target data source by the data transfer mode is either a process in which a data source actively pushes data to the target data source or a process in which the target data source actively extracts data from a data source.
Further, the process of mapping the data of the data source onto the destination data is to designate one data source as the target data source and to replicate the data of the other data sources to the target data source by a data replication trigger mode.
Further, the process of replicating the data of the other data sources to the target data source by the data replication trigger mode is to start the replication of data from the other data sources to the target data source by a data replication trigger event; the data replication trigger events comprise data-change trigger events, batch trigger events, client-call trigger events, and timed trigger events.
Further, the data mapping in the above method comprises syntax mapping and field mapping;
syntax mapping maps the naming rules and data types of the data source to the naming rules and data types of the destination data;
field mapping maps the content and meaning of the data source to the content and meaning of the destination data, and comprises field splitting, field merging, field data format conversion, and field transfer.
The beneficial effects of the invention are as follows: by integrating the data of all data sources into global-schema data, the invention lets users access the data of the data sources transparently according to the global schema; by replicating the data of the other data sources to the target data source, it maintains the data consistency of the data sources and improves the efficiency of information sharing and use; and by converting data between the source side and the destination side, it achieves data integration and solves the problems of data distribution and mapping.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawings:
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a schematic diagram of syntax mapping in the invention;
Fig. 3 is a schematic diagram of field splitting in the invention;
Fig. 4 is a schematic diagram of field merging in the invention;
Fig. 5 is a schematic diagram of field data format conversion in the invention;
Fig. 6 is a schematic diagram of field transfer in the invention.
Detailed description of the embodiments
As shown in Fig. 1, a data integration method of the invention comprises the following steps:
receiving data integration information;
searching for the corresponding data source in a source-side database;
mapping the data of the data source onto destination data;
transmitting the destination data to a destination-side database.
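The four steps can be sketched in a few lines of Python. This is only an illustrative reading of the flow, not the patent's implementation: the in-memory dictionaries standing in for the source-side and destination-side databases, the request keys "source" and "target", and the function name run_integration are all assumptions introduced here.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

def run_integration(request: Dict[str, str],
                    source_db: Dict[str, List[Record]],
                    mapper: Callable[[Record], Record],
                    target_db: Dict[str, List[Record]]) -> int:
    """Run the four steps once and return the number of records transmitted."""
    # Step 1: receive the data integration information (here, a plain dict).
    source_name = request["source"]
    target_name = request["target"]
    # Step 2: look up the corresponding data source in the source-side database.
    records = source_db[source_name]
    # Step 3: map the data of the data source onto destination data.
    destination_data = [mapper(r) for r in records]
    # Step 4: transmit the destination data to the destination-side database.
    target_db.setdefault(target_name, []).extend(destination_data)
    return len(destination_data)

# Hypothetical sample data; field names are invented for illustration.
source_db = {"hr": [{"emp_no": "1001", "emp_name": "Zhang San"}]}
target_db: Dict[str, List[Record]] = {}
sent = run_integration(
    {"source": "hr", "target": "staff"},
    source_db,
    lambda r: {"id": r["emp_no"], "name": r["emp_name"]},
    target_db,
)
```

Passing the mapping step in as a function keeps the flow independent of which mode (global schema or replication) performs the mapping.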
The invention makes full use of multiple data integration approaches to solve the problem of data fragmentation within an enterprise, so that data-driven operational decisions can be made more quickly and business operations can be carried out more effectively.
I. Data integration mode
A data integration method of the invention that adopts the data integration mode comprises the following steps: receiving data integration information; searching for the corresponding data source in the source-side database; integrating the data of all data sources into global-schema data; and transmitting the global-schema data to the destination-side database.
During data integration, the data views of the individual data sources are integrated into a global schema so that users can access each data source transparently according to the global schema. The global schema describes the data structure, field information, data operations, and so on of the data sources.
Users submit requests directly on the basis of the global schema, and the destination side converts these requests into requests that each data source can execute on the basis of its local data view. The characteristic of the data integration mode is that it directly provides users with a transparent data access method. Because the global schema used by users is a virtual data source, it can integrate not only information from structured data sources but also information from semi-structured or unstructured data sources. The data sources of data integration mainly refer to database management systems (DBMS), and in a broad sense also include structured and semi-structured information such as XML documents, HTML documents, e-mail, and ordinary files.
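A request phrased against the global schema and rewritten per data source can be sketched as follows. The two sources, their local field names, and the function query_global are hypothetical; the sketch assumes each source exposes a flat list of rows and that the global schema is just a per-source field-name table.

```python
from typing import Dict, List

# Hypothetical local views: each source names the same attributes differently.
SOURCES: Dict[str, List[Dict[str, object]]] = {
    "erp": [{"cust_id": 1, "cust_nm": "Li"}],
    "crm": [{"customerNo": 2, "fullName": "Wang"}],
}

# Global-schema field -> local field name, per source (assumed mapping table).
GLOBAL_TO_LOCAL: Dict[str, Dict[str, str]] = {
    "erp": {"id": "cust_id", "name": "cust_nm"},
    "crm": {"id": "customerNo", "name": "fullName"},
}

def query_global(fields: List[str]) -> List[Dict[str, object]]:
    """Answer a request phrased in global-schema fields by rewriting it per source."""
    result: List[Dict[str, object]] = []
    for source, rows in SOURCES.items():
        local = GLOBAL_TO_LOCAL[source]
        for row in rows:
            # Rewrite each global field into the source's local field name.
            result.append({f: row[local[f]] for f in fields})
    return result

rows = query_global(["id", "name"])
```

The caller never sees cust_id or customerNo, which is the transparency the global schema is meant to provide.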
In the data integration process, the structures of the data sources are inconsistent, which makes data integration harder; the mapping differences between data sources can be eliminated through an ETL (extract, transform, load) process. The ETL workflow can be refined into data extraction, verification, cleaning, integration, aggregation, and loading. Data are the prerequisite for an enterprise's business transactions and serve numerous information systems. Faced with a complex data environment of multiple platforms and multiple data structures, with platforms physically far apart, ETL must deliver comprehensive, high-quality data while remaining general and extensible.
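The six ETL stages named above can be illustrated with one small function per stage. The rules used in each stage (requiring an id, trimming names, merging by id, counting by department) are assumptions chosen to keep the sketch short; the patent does not specify them.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

def extract(sources: Iterable[List[Row]]) -> List[Row]:
    """Extraction: pull every row out of every source."""
    return [dict(row) for rows in sources for row in rows]

def verify(rows: List[Row]) -> List[Row]:
    """Verification: drop rows without the key this sketch requires."""
    return [r for r in rows if r.get("id") is not None]

def clean(rows: List[Row]) -> List[Row]:
    """Cleaning: strip surrounding whitespace from the name field."""
    return [{**r, "name": str(r.get("name", "")).strip()} for r in rows]

def integrate(rows: List[Row]) -> List[Row]:
    """Integration: merge rows that share an id; non-empty later values win."""
    merged: Dict[object, Row] = {}
    for r in rows:
        target = merged.setdefault(r["id"], {})
        for key, value in r.items():
            if value not in ("", None) or key not in target:
                target[key] = value
    return list(merged.values())

def aggregate(rows: List[Row]) -> Dict[str, int]:
    """Aggregation: count integrated rows per department."""
    counts: Dict[str, int] = {}
    for r in rows:
        dept = str(r.get("dept", "unknown"))
        counts[dept] = counts.get(dept, 0) + 1
    return counts

def load(rows: List[Row], warehouse: List[Row]) -> None:
    """Loading: append the integrated rows to the destination store."""
    warehouse.extend(rows)

# Hypothetical sample data from two systems describing overlapping people.
system_a = [{"id": 1, "name": " Ann ", "dept": "ops"}, {"id": None, "name": "bad"}]
system_b = [{"id": 1, "name": "", "dept": "ops"}, {"id": 2, "name": "Bo", "dept": "it"}]

warehouse: List[Row] = []
integrated = integrate(clean(verify(extract([system_a, system_b]))))
summary = aggregate(integrated)
load(integrated, warehouse)
```

Keeping each stage a pure function over a list of rows makes it easy to test or replace one stage without touching the others.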
II. Data replication mode
A data integration method of the invention that adopts the data replication mode comprises the following steps: receiving data integration information; searching for the corresponding data source in the source-side database; designating one data source as the target data source and replicating the data of the other data sources to the target data source by the data transfer mode or the data replication trigger mode; and transmitting the target data source to the destination-side database.
The data replication mode replicates the data of each data source to the related target data source, which maintains the overall data consistency of the data sources and improves the efficiency of information sharing and use. Replication may copy an entire data source, or may propagate and copy only the changed data.
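The difference between copying a whole data source and copying only changed data can be sketched as below. The per-row "version" counter used to detect changes is an assumption made for this example; the patent does not say how changes are detected, and the sketch does not handle deletions.

```python
from typing import Dict

Row = Dict[str, object]

def full_copy(source: Dict[str, Row]) -> Dict[str, Row]:
    """Replicate the entire data source."""
    return {key: dict(row) for key, row in source.items()}

def incremental_copy(source: Dict[str, Row],
                     replica: Dict[str, Row],
                     versions_seen: Dict[str, object]) -> int:
    """Replicate only rows whose version differs from the last one copied."""
    changed = 0
    for key, row in source.items():
        if versions_seen.get(key) != row["version"]:
            replica[key] = dict(row)
            versions_seen[key] = row["version"]
            changed += 1
    return changed

# Hypothetical data source keyed by row id.
source = {"a": {"version": 1, "v": "x"}, "b": {"version": 1, "v": "y"}}
replica = full_copy(source)
seen = {key: row["version"] for key, row in replica.items()}

source["b"] = {"version": 2, "v": "z"}   # a change at the source
copied = incremental_copy(source, replica, seen)
```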
The data replication mode reduces users' repeated access to multiple data sources and thereby improves the performance of data integration. The most common replication method is the data warehouse method, which replicates the data of each data source to a single place, the data warehouse; the data warehouse is then accessed directly, just as an ordinary database would be.
The data replication mode can be classified from two aspects: the data transfer mode and the data replication trigger mode. The data transfer mode refers to the form in which data are transferred between the source data source that publishes the data and the destination data source that subscribes to them, and can be divided into data push and data extraction.
Data push means that the source data source actively pushes data to the destination data source. Data extraction is the opposite operation: the destination data source actively sends a data request to the source data source and fetches data from it to the local side. In some cases the data sent from the source side to the destination side are not stored directly in the destination data source but must first undergo local processing on the destination side. A cache is then usually used to coordinate the asynchrony between the source side and the data subscriber. Under data push, the data cache is built on the destination side; under data extraction, it is built on the source side.
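The placement of the cache under push and under extraction can be illustrated with two small classes. The class names, the inbox and outbox queues, and the _localize step standing in for destination-side processing are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

Row = Dict[str, object]

class Destination:
    def __init__(self) -> None:
        self.inbox: Deque[Row] = deque()   # destination-side cache (used with push)
        self.store: List[Row] = []

    def _localize(self, row: Row) -> Row:
        # Stand-in for the destination's local processing step.
        return {**row, "localized": True}

    def drain(self) -> None:
        """Process pushed rows that are waiting in the destination-side cache."""
        while self.inbox:
            self.store.append(self._localize(self.inbox.popleft()))

    def pull(self, source: "Source") -> None:
        """Data extraction: the destination fetches rows buffered at the source."""
        while source.outbox:
            self.store.append(self._localize(source.outbox.popleft()))

class Source:
    def __init__(self) -> None:
        self.outbox: Deque[Row] = deque()  # source-side cache (used with extraction)

    def publish(self, row: Row, push_to: Optional[Destination] = None) -> None:
        if push_to is not None:
            push_to.inbox.append(row)      # data push: the source takes the initiative
        else:
            self.outbox.append(row)        # hold the row until the destination asks

src, dst = Source(), Destination()
src.publish({"id": 1}, push_to=dst)   # pushed; waits in the destination-side cache
dst.drain()
src.publish({"id": 2})                # buffered in the source-side cache
dst.pull(src)
```

In both paths the row reaches the store only after local processing, so the cache decouples when data are produced from when they are consumed.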
The data replication trigger mode refers to the way in which data replication is invoked. Usually some events are defined in advance, which may include: a data change caused by some operation on the data publisher side; the data cache on the data publisher side accumulating to a certain batch size; a user sending an access request to some data source; and time points at fixed intervals. When one of these events is triggered, the corresponding data replication is performed. Accordingly, the data replication trigger mode can be divided, by event definition, into data-change triggering, batch triggering, client-call triggering, timed triggering, and so on.
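The four trigger types can be sketched with a single replicator object. The class name, its parameters, and the convention of passing the current time explicitly (so the example stays deterministic) are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

class Replicator:
    """Collects changed rows and copies them when a trigger event fires."""

    def __init__(self, copy: Callable[[List[Row]], None],
                 batch_size: int, interval: float) -> None:
        self.copy = copy
        self.batch_size = batch_size
        self.interval = interval
        self.pending: List[Row] = []
        self.last_run = 0.0

    def _fire(self, now: float) -> None:
        if self.pending:
            self.copy(list(self.pending))
            self.pending.clear()
        self.last_run = now

    def on_change(self, row: Row, now: float, immediate: bool = False) -> None:
        self.pending.append(row)
        if immediate:                                 # data-change trigger
            self._fire(now)
        elif len(self.pending) >= self.batch_size:    # batch trigger
            self._fire(now)

    def on_client_request(self, now: float) -> None:  # client-call trigger
        self._fire(now)

    def on_tick(self, now: float) -> None:            # timed trigger
        if now - self.last_run >= self.interval:
            self._fire(now)

replica: List[Row] = []
rep = Replicator(replica.extend, batch_size=2, interval=10.0)
sizes: List[int] = []

rep.on_change({"id": 1}, now=1.0)
sizes.append(len(replica))            # below the batch size, nothing copied yet
rep.on_change({"id": 2}, now=2.0)
sizes.append(len(replica))            # batch trigger fired
rep.on_change({"id": 3}, now=3.0)
rep.on_tick(now=5.0)
sizes.append(len(replica))            # interval not yet elapsed
rep.on_tick(now=12.0)
sizes.append(len(replica))            # timed trigger fired
rep.on_change({"id": 4}, now=13.0, immediate=True)
sizes.append(len(replica))            # data-change trigger fired
rep.on_change({"id": 5}, now=14.0)
rep.on_client_request(now=14.5)
sizes.append(len(replica))            # client-call trigger fired
```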
Data replication usually works end to end directly, but some data integration systems use a data platform dedicated to data transfer. During replication, the data publisher first sends the data to this data platform, which processes them and forwards them to the data subscribers. The data platform must handle network load and concurrency control well. The benefit of using a data platform is single-point control and ease of management. However, a data platform increases the complexity of the system and reduces its reliability.
III. Comprehensive integration mode
A data integration method of the invention that adopts the comprehensive integration mode comprises the following steps: receiving data integration information; searching for the corresponding data source in the source-side database; mapping the data of the data source onto destination data; and transmitting the destination data to the destination-side database. Here, the process of mapping the data of the data source onto the destination data is either to integrate the data of all data sources into global-schema data, or to designate one data source as the target data source and replicate the data of the other data sources to the target data source by the data transfer mode or the data replication trigger mode.
The data integration mode provides users with a global data view and a unified access interface, and is highly transparent. However, it does not implement data interaction between data sources, and users often need to access multiple data sources, so it requires good network performance. The data replication mode copies in advance, before a user uses a data source, the data of other data sources that the user may need; the user then needs to access only one or a few data sources, which can greatly improve the efficiency of handling user requests. However, replication usually involves a delay, so real-time consistency of data between data sources is hard to guarantee.
The data integration mode suits cases where the systems to be integrated are large, data are updated frequently, and the requirement for real-time data consistency is high. It is also suitable when users' query needs are hard to predict. The data integration mode usually adopts a middleware method. Because a federated database requires a large number of communication interfaces to be written separately for each data source during integration, the pure federated database method is now rarely used.
The data replication mode suits cases where the data sources are relatively stable and users' query patterns are known or limited. When data are widely distributed, network latency is high, and short processing times are nevertheless required, the data integration method may also be considered. Some applications need to back up data, and then usually adopt the data replication mode; in other cases data may not be copied for confidentiality reasons, and then the data integration method must be used.
To overcome the limitations of the two preceding methods, they are usually combined, which is the so-called comprehensive integration mode. The comprehensive integration mode generally seeks to improve the performance of a middleware-based system: it still presents a virtual data schema view to users while replicating data commonly used between data sources. For simple user access requests, the comprehensive integration mode tries as far as possible to satisfy them through data replication, on a local data source or data mapping; for complex user requests that cannot be satisfied through data replication, it uses data integration.
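The fallback behaviour of the comprehensive integration mode can be sketched as a lookup that tries the replicated data first and only then queries the sources. The function make_hybrid_lookup, the dictionary standing in for remote sources, and the counters are hypothetical and only illustrate the routing decision.

```python
from typing import Callable, Dict, Optional, Tuple

Row = Dict[str, object]

def make_hybrid_lookup(
    replica: Dict[str, Row],
    query_sources: Callable[[str], Optional[Row]],
) -> Tuple[Callable[[str], Optional[Row]], Dict[str, int]]:
    """Return a lookup that prefers replicated data and falls back to integration."""
    stats = {"replica": 0, "integration": 0}

    def lookup(key: str) -> Optional[Row]:
        if key in replica:
            # Simple request: served from data that was replicated in advance.
            stats["replica"] += 1
            return replica[key]
        # Request that replication cannot serve: go through the integration path.
        stats["integration"] += 1
        return query_sources(key)

    return lookup, stats

remote = {"A": {"v": 1}, "B": {"v": 2}}   # stands in for the integrated sources
replica = {"A": {"v": 1}}                 # commonly used data copied in advance
lookup, stats = make_hybrid_lookup(replica, remote.get)

a = lookup("A")
b = lookup("B")
```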
Invoking any of the integration modes above requires solving the data mapping problem. Data mapping has long been a core problem troubling many data integration systems and is a focus of data integration research. The difficulty of data mapping lies mainly in syntax mapping and field mapping.
(1) Syntax mapping
Syntax mapping generally concerns differences in naming rules and data types between source data and destination data. For a database, naming rules refer to table names and field names. Syntax mapping is relatively simple: it only has to map field to field and record to record, resolving the naming conflicts and data type conflicts involved. Such mappings are direct and easy to implement. Syntax mapping therefore need not consider the content or meaning of the data; knowing the data structure information and completing the mapping from the source data structure to the destination data structure is enough. The characteristic of syntax mapping can be seen in Fig. 2: the field data content ("1001", "Zhang San") does not change during mapping.
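A field-to-field rename with a type conversion can be sketched as follows. The source and destination field names are invented, since Fig. 2 is not reproduced here; only the sample values "1001" and "Zhang San" come from the text. Converting "1001" to an integer changes its data type, not its content.

```python
from typing import Callable, Dict, Tuple

# Source field -> (destination field, type converter). All names are hypothetical.
SYNTAX_MAP: Dict[str, Tuple[str, Callable[[object], object]]] = {
    "EMP_NO": ("employee_id", int),
    "EMP_NAME": ("employee_name", str),
}

def map_syntax(row: Dict[str, object]) -> Dict[str, object]:
    """Rename each field and cast its value; the data content itself is untouched."""
    return {dst: cast(row[src]) for src, (dst, cast) in SYNTAX_MAP.items()}

mapped = map_syntax({"EMP_NO": "1001", "EMP_NAME": "Zhang San"})
```

Because the mapping table only needs structural information, it can be written without knowing what the values mean.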
(2) Field mapping
When data integration must consider the content and meaning of the data, it reaches the level of field mapping. Field mapping is much more complex than syntax mapping; it often has to break the atomicity of fields and process the data content directly. As shown in Figs. 3 to 6, common field mappings include field splitting, field merging, field data format conversion, field transfer, and so on.
The difference between syntax mapping and field mapping can be traced back to differences in how the data sources were modeled. When the entity-relationship models of the data sources are the same and only the naming rules differ, only syntax mapping between the data sources results. When the data sources build their entity models with different granularity, different relationships between entities, and different representations of field data, field mapping between the data sources is inevitable and places a heavy burden on data integration. In practice, syntax mapping is ubiquitous in data integration systems. The kinds of syntax mapping mentioned above are fairly regular and can be handled with specific mapping methods. There are also syntax mappings that are uncommon or hard to detect. For example, a data source may imply some constraints when it is built; these constraints are hard to discover during data integration and tend to cause errors. If a data item defines a month, its value is implicitly limited to between 1 and 12; ignoring this constraint during integration is likely to produce absurd results.
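One way to keep such implied constraints from being lost is to write them down explicitly and check them during integration. The sketch below encodes the month example; the constraint table and the function check_constraints are hypothetical.

```python
from typing import Callable, Dict, List

# Implied constraints made explicit: field name -> validity check (assumed table).
CONSTRAINTS: Dict[str, Callable[[object], bool]] = {
    "month": lambda v: isinstance(v, int) and 1 <= v <= 12,
}

def check_constraints(row: Dict[str, object],
                      constraints: Dict[str, Callable[[object], bool]]) -> List[str]:
    """Return the names of the fields in the row that violate a constraint."""
    return [field for field, is_valid in constraints.items()
            if field in row and not is_valid(row[field])]

violations_bad = check_constraints({"month": 13}, CONSTRAINTS)
violations_ok = check_constraints({"month": 3}, CONSTRAINTS)
```

Rows that return a non-empty list can be rejected or routed for review before they reach the destination data.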
Compared with the prior art, the invention has the following advantages:
(1) As enterprises come to treat data management as a business issue, improving work efficiency by minimizing the complexity of using multiple tools, skill sets, and vendors becomes particularly critical.
(2) A data integration platform helps each application system run more efficiently by improving work efficiency. The platform spares application systems from repeating work in every project; instead, they can share methods, techniques, and assets, such as logic and metadata, across all projects.
(3) Standardizing data integration practice on a platform, and at the same time creating an integration competency center, yields large savings in the development time and cost, and the maintenance cost, of integration applications and data interfaces.
(4) Data integration also involves many different roles, from data administrators and business analysts to data architects and developers, each performing their own duties so as to respond to constantly changing business needs more quickly and economically.
(5) A unified data integration platform lets engineering and business departments cooperate more effectively. The platform provides interfaces and tools so that the parts of the tool set can be used together seamlessly across multiple projects. People involved in data integration need to spend less time learning the platform and can devote more time to their work.
The above is a preferred embodiment of the invention. A person skilled in the art may make improvements and refinements without departing from the principle of the invention, and such improvements and refinements are also regarded as falling within the scope of protection of the invention.

Claims (10)

1. A data integration method, characterized in that it comprises:
receiving data integration information;
searching for the corresponding data source in a source-side database;
mapping the data of the data source onto destination data;
transmitting the destination data to a destination-side database.
2. The data integration method according to claim 1, characterized in that the process of mapping the data of the data source onto the destination data is the process of integrating the data of all data sources into global-schema data.
3. The data integration method according to claim 2, characterized in that the global-schema data comprise the data structure, field information, and data operations of the data sources.
4. The data integration method according to claim 2, characterized in that the process of integrating the data of the data sources into global-schema data is a process of performing data extraction, verification, cleaning, integration, aggregation, and loading on the data sources.
5. The data integration method according to claim 1, characterized in that the process of mapping the data of the data source onto the destination data is to designate one data source as the target data source and to replicate the data of the other data sources to the target data source by a data transfer mode.
6. The data integration method according to claim 5, characterized in that the process of replicating the data of the other data sources to the target data source by the data transfer mode is a process in which a data source actively pushes data to the target data source.
7. The data integration method according to claim 5, characterized in that the process of replicating the data of the other data sources to the target data source by the data transfer mode is a process in which the target data source actively extracts data from a data source.
8. The data integration method according to claim 1, characterized in that the process of mapping the data of the data source onto the destination data is to designate one data source as the target data source and to replicate the data of the other data sources to the target data source by a data replication trigger mode.
9. The data integration method according to claim 8, characterized in that the process of replicating the data of the other data sources to the target data source by the data replication trigger mode is to start the replication of data from the other data sources to the target data source by a data replication trigger event, and the data replication trigger events comprise data-change trigger events, batch trigger events, client-call trigger events, and timed trigger events.
10. The data integration method according to any one of claims 1 to 9, characterized in that the data mapping comprises syntax mapping and field mapping;
syntax mapping maps the naming rules and data types of the data source to the naming rules and data types of the destination data;
field mapping maps the content and meaning of the data source to the content and meaning of the destination data, and comprises field splitting, field merging, field data format conversion, and field transfer.
CN201410123611.5A 2014-03-31 2014-03-31 Data integration method Pending CN103927344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410123611.5A CN103927344A (en) 2014-03-31 2014-03-31 Data integration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410123611.5A CN103927344A (en) 2014-03-31 2014-03-31 Data integration method

Publications (1)

Publication Number Publication Date
CN103927344A true CN103927344A (en) 2014-07-16

Family

ID=51145565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410123611.5A Pending CN103927344A (en) 2014-03-31 2014-03-31 Data integration method

Country Status (1)

Country Link
CN (1) CN103927344A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140716
