CN107844581A - Multi-source heterogeneous data fusion platform - Google Patents

Info

Publication number
CN107844581A
Authority
CN
China
Prior art keywords
data
database
modules
fusion platform
foreign key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711113864.4A
Other languages
Chinese (zh)
Inventor
陈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Blue Scene Information Technology Co Ltd
Original Assignee
Chengdu Blue Scene Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Blue Scene Information Technology Co Ltd
Priority to CN201711113864.4A
Publication of CN107844581A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The invention discloses a multi-source heterogeneous data fusion platform, characterized in that the platform comprises a Metadata module, a data reading module, a Transformer conversion module, and a Foreign Key repair module. The framework plays an irreplaceable key role in real projects. Even if a third-party database abandons constraints in order to greatly simplify its data synchronization technology, and therefore provides a low-quality database containing invalid data, the framework can still convert it into a high-quality database with sound constraints, correct relations, and strict logic that the production environment can access directly. It is the underlying support of this framework that makes the development of the complex upper-layer applications feasible.

Description

Multi-source heterogeneous data fusion platform
Technical field
The present invention relates to the field of data fusion, and in particular to a multi-source heterogeneous data fusion platform.
Background
Relational databases support a variety of constraints, including the foreign key constraint, which guarantees referential integrity. Two records can be linked as parent and child: the child record provides a foreign key column that stores the id of the parent record, and the user establishes a foreign key constraint on this column to guarantee that the foreign key of a record is either null, meaning it does not yet point to any parent record, or the legal id of an existing parent record. Under this strong restriction the relationships between records are always complete: a child record can never hold an illegal foreign key that fails to point to an existing parent record and so forms an illegal, meaningless, broken relationship. This is why a relational database is called relational, and it is one of the important differences between a database and the ordinary spreadsheets used by clerical staff.
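The foreign key rule described above can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module purely for illustration; the table names `parent` and `child` and the helper `try_insert_orphan` are hypothetical and do not come from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(id))"  # must be NULL or an existing parent id
)

conn.execute("INSERT INTO parent (id, name) VALUES (1, 'p1')")
conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")     # legal: parent 1 exists
conn.execute("INSERT INTO child (id, parent_id) VALUES (11, NULL)")  # legal: no parent yet


def try_insert_orphan(connection):
    """Try to store a child that points at a parent id that does not exist."""
    try:
        connection.execute("INSERT INTO child (id, parent_id) VALUES (12, 99)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


result = try_insert_orphan(conn)
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

With the constraint in place the database itself refuses the dangling reference, so only the two legal child rows remain.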
In real projects it is impossible to provide and maintain all data oneself; many third-party data services always have to be used. Ideally, a third-party data service provider would supply a database with sound constraints and continuously synchronize data changes into it. In practice, however, some providers supply an unconstrained database, especially purely business-driven providers with very weak technical capability. Once the database lacks constraints, a child record may be synchronized into the customer database before its parent record has been pushed. The foreign key of the child record, which should only ever hold the id of a legal parent record, then holds the id of a parent record that has not yet been synchronized and does not yet exist, which eventually causes a series of errors. This approach ignores the parent-child relationships in the data and treats every record as isolated and unrelated to the others, so data changes can be synchronized mechanically and continuously in any order, which greatly reduces the difficulty of running the data service. For the data consumer, however, a database containing a large number of illegal relationships cannot be used directly. Although later synchronization repairs earlier problems after a long time, it also creates new ones, so the database is permanently in an illegal state.
Summary of the invention
Therefore, to overcome the above deficiency, the present invention provides a multi-source heterogeneous data fusion platform. The framework plays an irreplaceable key role in real projects. Even if a third-party database abandons constraints in order to greatly simplify its data synchronization technology, and therefore provides a low-quality database containing invalid data, the framework can still convert it into a high-quality database with sound constraints, correct relations, and strict logic that the production environment can access directly. It is the underlying support of this framework that makes the development of the complex upper-layer applications feasible.
The present invention is realized as follows: a multi-source heterogeneous data fusion platform is constructed, characterized in that the platform comprises a Metadata module, a data reading module, a Transformer conversion module, and a Foreign Key repair module;
Wherein, the Metadata module analyzes the user's code structure, automatically generates the SQL statements that create tables, constraints, and indexes, and generates the data structure the user expects in the target database;
Wherein, the data reading module reads all data directly from the low-quality source database and, at the same time, continuously reads data changes from the change log of the source database;
Wherein, the Transformer conversion module takes data changes from the event queue, calls user code, and converts the old data into the desired new data according to the specific business needs of the project;
At the same time, the Transformer conversion module puts the processed new data into the high-quality target database;
In addition, the Transformer conversion module notifies the Foreign Key repair module that new data has arrived and that foreign keys may need to be repaired;
Finally, the Foreign Key repair module repairs all foreign keys in the target database that have become legal because of the arrival of the latest data.
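The patent does not show how the Metadata module derives SQL from the user's code structure. As a rough sketch of the idea only, the following Python example derives table, constraint, and index statements from dataclass definitions. The classes `Department` and `Employee`, the metadata keys `references` and `index`, and the function `generate_ddl` are all assumptions made for illustration; the actual framework is described as being written in Java.

```python
from dataclasses import dataclass, field, fields

# Assumed mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Department:
    id: int
    name: str


@dataclass
class Employee:
    id: int
    # Hypothetical metadata keys: which table this column references, and whether to index it.
    department_id: int = field(metadata={"references": "department", "index": True})
    name: str = ""


def generate_ddl(cls):
    """Return CREATE TABLE / CREATE INDEX statements derived from a dataclass."""
    table = cls.__name__.lower()
    columns, constraints, indexes = [], [], []
    for f in fields(cls):
        sql_type = SQL_TYPES[f.type]
        if f.name == "id":
            columns.append(f"{f.name} {sql_type} PRIMARY KEY")
        else:
            columns.append(f"{f.name} {sql_type}")
        target = f.metadata.get("references")
        if target:
            constraints.append(f"FOREIGN KEY ({f.name}) REFERENCES {target}(id)")
        if f.metadata.get("index"):
            indexes.append(f"CREATE INDEX idx_{table}_{f.name} ON {table} ({f.name})")
    body = ", ".join(columns + constraints)
    return [f"CREATE TABLE {table} ({body})"] + indexes


statements = generate_ddl(Department) + generate_ddl(Employee)
```

Keeping the schema description next to the user's code means the target database structure can be regenerated whenever that code changes, which is the convenience the Metadata module is described as providing.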
The multi-source heterogeneous data fusion platform according to the present invention is further characterized in that: the data reading module reads all data directly from the source database, a process that is extremely long and often takes several days. It is therefore performed once, and only once, before the system first goes live. Afterwards no further full update is performed; incremental updates are used instead.
The multi-source heterogeneous data fusion platform according to the present invention is further characterized in that: whichever way the data reading module reads data, the data is put into the downstream event queue. The event queue exists to provide a degree of buffering, because the processing speed of the data reading module and that of the downstream data conversion module may differ.
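The buffering role of the event queue can be sketched with a bounded queue between a fast reader and a slower converter. This is an illustrative Python sketch only: the function `run_pipeline`, the event shape, and the capacity of 4 are assumptions, not details from the patent.

```python
import queue
import threading
import time


def run_pipeline(n_events, capacity=4):
    """Push n_events through a bounded queue from a reader thread to a transformer thread."""
    events = queue.Queue(maxsize=capacity)  # bounded: the reader blocks when the buffer is full
    sentinel = object()                     # marks the end of the stream
    processed = []

    def reader():
        for i in range(n_events):
            events.put({"table": "child", "id": i})  # waits here if the transformer lags behind
        events.put(sentinel)

    def transformer():
        while True:
            event = events.get()
            if event is sentinel:
                break
            time.sleep(0.001)  # simulate conversion work that is slower than reading
            processed.append(event["id"])

    threads = [threading.Thread(target=reader), threading.Thread(target=transformer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_pipeline(20)
```

The queue absorbs short bursts from the reader and applies back-pressure once it is full, so neither side has to match the other's speed exactly.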
The multi-source heterogeneous data fusion platform according to the present invention is further characterized in that: after taking a source database data change event from the event queue, the conversion module does not merely convert it according to the business and save it into the target database. It also checks whether the data currently being pushed contains foreign keys (Foreign Key). If the value of a foreign key cannot yet refer to valid data in the target database, the foreign key of that record is temporarily set to null in the target database, and its possible future value is kept in an additional temporary marker field. This continues until, after some future event has been processed, the data that the temporary marker field is expected to reference has also been pushed into place. Only then is the foreign key of the original record set, so that it references the newly pushed data.
The framework lets developers configure the relational structure of the target database, including specifying foreign key constraints. Suppose a developer specifies a foreign key, denoted fk, for some table. The framework then actually generates two fields in the target table: fk and hidden$fk. fk is the real foreign key: it carries a foreign key constraint and is either null or the id of an existing parent record. hidden$fk is a foreign key in a symbolic sense only: it carries no foreign key constraint and can be set to any value, including illegal ones.
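A minimal sketch of the two-field layout, using Python's `sqlite3` for illustration. The helper names `foreign_key_columns`, `create_schema`, and `insert_child` are hypothetical, and the column is written `"hidden$fk"` in quotes on the assumption that this is the intended spelling of the hidden field.

```python
import sqlite3


def foreign_key_columns(name, parent_table):
    """For one declared foreign key, return the real column and its unconstrained shadow."""
    real = f'"{name}" INTEGER REFERENCES {parent_table}(id)'  # constrained: NULL or existing id
    hidden = f'"hidden${name}" INTEGER'                       # symbolic: any value is accepted
    return [real, hidden]


def create_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    columns = ["id INTEGER PRIMARY KEY"] + foreign_key_columns("fk", "parent")
    conn.execute(f"CREATE TABLE child ({', '.join(columns)})")


def insert_child(conn, child_id, fk, hidden_fk):
    """Insert a child row and report whether the database accepted it."""
    try:
        conn.execute(
            'INSERT INTO child (id, "fk", "hidden$fk") VALUES (?, ?, ?)',
            (child_id, fk, hidden_fk),
        )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Calling `insert_child(conn, 1, None, 42)` stores the not-yet-valid id 42 only in the hidden column, while `insert_child(conn, 2, 42, 42)` is refused because the real foreign key would point at a missing parent.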
The framework provides a mechanism that continuously synchronizes low-quality, unconstrained data changes into the target database from the database that receives the third-party data provider's synchronization (for the third-party data service provider this database is the consumer; for this framework it is the producer; hereinafter, the third-party database). When the framework's data synchronization module synchronizes a child record from the third-party database into the target database, the value of the column that ought to be a foreign key, but that the third party did not declare as one, is first copied into the symbolic foreign key hidden$fk of the target database rather than into the real foreign key fk. That is, hidden$fk serves as a temporary marker that records the value the real foreign key will hold in the future but does not hold now. If the parent record that hidden$fk refers to already exists, hidden$fk is copied into the real foreign key fk and the relationship is complete. Otherwise the real foreign key stays null, in the expectation that a later synchronization will give it the correct value at the appropriate time. In subsequent synchronization, once the parent record is finally synchronized into place, the framework finds every child record whose hidden$fk equals the id of the parent record just synchronized and, for each one found, copies its hidden$fk value into fk.
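The synchronization and repair steps described above can be sketched as follows. This is an illustrative Python sketch using `sqlite3`, not the framework's implementation; the functions `open_target`, `sync_child`, `sync_parent`, and `child_rows` and the table names are assumptions.

```python
import sqlite3


def open_target():
    """Create an in-memory target database with a real and a hidden foreign key column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        'CREATE TABLE child ('
        ' id INTEGER PRIMARY KEY,'
        ' fk INTEGER REFERENCES parent(id),'
        ' "hidden$fk" INTEGER)'
    )
    return conn


def sync_child(conn, child_id, parent_id):
    """Store the child; the would-be foreign key goes to hidden$fk first, never straight to fk."""
    conn.execute(
        'INSERT INTO child (id, fk, "hidden$fk") VALUES (?, NULL, ?)',
        (child_id, parent_id),
    )
    exists = conn.execute("SELECT 1 FROM parent WHERE id = ?", (parent_id,)).fetchone()
    if exists:
        # The parent is already present, so the relationship can be completed immediately.
        conn.execute('UPDATE child SET fk = "hidden$fk" WHERE id = ?', (child_id,))


def sync_parent(conn, parent_id):
    """Store the parent, then repair every child that was waiting for it."""
    conn.execute("INSERT INTO parent (id) VALUES (?)", (parent_id,))
    conn.execute('UPDATE child SET fk = "hidden$fk" WHERE "hidden$fk" = ?', (parent_id,))


def child_rows(conn):
    return conn.execute('SELECT id, fk, "hidden$fk" FROM child ORDER BY id').fetchall()
```

For example, after `sync_child(conn, 1, 100)` the row is `(1, None, 100)`; once `sync_parent(conn, 100)` runs it becomes `(1, 100, 100)`, and at no point does the constrained column hold an id that does not exist.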
All of the above steps are a black box to developers. Apart from knowing that every column in the database whose name begins with hidden is a technical column that is unrelated to the business and only assists data fusion, users do not need to care about any internal detail.
Of course, as a data fusion platform the framework necessarily has many other functions, such as applying business-specific transformations to source data before storing it in the target database. These functions are similar to those of other data integration products on the market and are not specific to this platform, so they are not described further.
The multi-source heterogeneous data fusion platform of the present invention is a framework developed in the Java language. Developers can do simple secondary development on top of it to continuously synchronize the data of multiple different source databases into one target database. In this process the most important sub-function is converting low-quality data with loose or even no constraints into high-quality data with complete constraints, as the developer intends.
The invention has the following advantages. The present invention provides a multi-source heterogeneous data fusion platform that plays an irreplaceable key role in real projects. Even if a third-party database abandons constraints in order to greatly simplify its data synchronization technology, and therefore provides a low-quality database containing invalid data, the framework can still convert it into a high-quality database with sound constraints, correct relations, and strict logic that the production environment can access directly. It is the underlying support of this framework that makes the development of the complex upper-layer applications feasible.
Brief description of the drawings
Fig. 1 is a structural and functional block diagram of the multi-source heterogeneous data fusion platform of the present invention;
Fig. 2 and Fig. 3 are comparison diagrams of the corresponding databases during foreign key constraint recovery according to the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to Figs. 1-2, and the technical solutions in the embodiments of the present invention are described clearly and completely. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Through improvement, the present invention provides a multi-source heterogeneous data fusion platform, which can be implemented as follows;
The multi-source heterogeneous data fusion platform comprises a Metadata module, a data reading module, a Transformer conversion module, and a Foreign Key repair module, as shown in Fig. 1:
Wherein, the Metadata module analyzes the user's code structure, automatically generates the SQL statements that create tables, constraints, and indexes, and generates the data structure the user expects in the target database; see the corresponding arrow in the figure.
Wherein, the data reading module reads all data directly from the source database. This process is extremely long and often takes several days. It is therefore performed once, and only once, before the system first goes live, and never again afterwards. This is the full update; see the arrow labelled 2 in the figure.
Meanwhile, data changes are read continuously from the change log of the source database. This is the incremental update; see the corresponding arrow in the figure.
Whichever way data is read, it is put into the downstream event queue; see the corresponding arrow in the figure. The event queue exists to provide a degree of buffering, because the processing speed of the data reading module and that of the downstream data conversion module may differ.
Wherein, the Transformer conversion module takes data changes from the event queue, calls user code, and converts the old data into the desired new data according to the specific business needs of the project; see the corresponding arrow in the figure.
Meanwhile, the Transformer conversion module puts the processed new data into the target database; see the arrow labelled 6 in the figure.
The Transformer conversion module notifies the Foreign Key repair module that new data has arrived and that foreign keys may need to be repaired; see the corresponding arrow in the figure.
Finally, the Foreign Key repair module repairs all foreign key fields of old data in the target database that have become legal because of the arrival of the latest data; see the corresponding arrow in the figure.
The foreign key constraint is restored as follows:
(1) The third-party service inserts many records into the source database, but the parent record on which they jointly depend has not yet been inserted. After synchronization by this platform, the source database and the target database compare as shown in Fig. 2;
(2) After a long wait, the third-party service finally inserts the parent record to which these records belong into the source database. After synchronization by this platform, the source database and the target database compare as shown in Fig. 3.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

  1. A multi-source heterogeneous data fusion platform, characterized in that: the multi-source heterogeneous data fusion platform comprises a Metadata module, a data reading module, a Transformer conversion module, and a Foreign Key repair module;
    Wherein, the Metadata module analyzes the user's code structure, automatically generates the SQL statements that create tables, constraints, and indexes, and generates the data structure the user expects in the target database;
    Wherein, the data reading module reads all data directly from the low-quality source database and, at the same time, continuously reads data changes from the change log of the source database;
    Wherein, the Transformer conversion module takes data changes from the event queue, calls user code, and converts the old data into the desired new data according to the specific business needs of the project;
    At the same time, the Transformer conversion module puts the processed new data into the high-quality target database;
    In addition, the Transformer conversion module notifies the Foreign Key repair module that new data has arrived;
    Finally, the Foreign Key repair module repairs all foreign keys in the target database that have become legal because of the arrival of the latest data.
  2. The multi-source heterogeneous data fusion platform according to claim 1, characterized in that: the data reading module reads all data directly from the source database, a process that is extremely long and often takes several days; it is therefore performed once, and only once, before the system first goes live; afterwards no further full update is performed, and incremental updates are used instead.
  3. The multi-source heterogeneous data fusion platform according to claim 1, characterized in that: whichever way the data reading module reads data, the data is put into the downstream event queue; the event queue exists to provide a degree of buffering, because the processing speed of the data reading module and that of the downstream data conversion module may differ.
  4. The multi-source heterogeneous data fusion platform according to claim 1, characterized in that: after taking a source database data change event from the event queue, the conversion module does not merely convert it according to the business and save it into the target database; it also checks whether the data currently being pushed contains foreign keys (Foreign Key); if the value of a foreign key cannot yet refer to valid data in the target database, the foreign key of that record is temporarily set to null in the target database, and its possible future value is kept in an additional temporary marker field; this continues until, after some future event has been processed, the data that the temporary marker field is expected to reference has also been pushed into place, and only then is the foreign key of the original record set, so that it references the newly pushed data.
CN201711113864.4A 2017-11-13 2017-11-13 Multi-source heterogeneous data fusion platform Pending CN107844581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711113864.4A CN107844581A (en) 2017-11-13 2017-11-13 Multi-source heterogeneous data fusion platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711113864.4A CN107844581A (en) 2017-11-13 2017-11-13 Multi-source heterogeneous data fusion platform

Publications (1)

Publication Number Publication Date
CN107844581A true CN107844581A (en) 2018-03-27

Family

ID=61681050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711113864.4A Pending CN107844581A (en) 2017-11-13 2017-11-13 Multi-source heterogeneous data fusion platform

Country Status (1)

Country Link
CN (1) CN107844581A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888924A (en) * 2018-09-10 2020-03-17 深圳市从晶科技有限公司 Data acquisition system
CN112817990A (en) * 2021-01-28 2021-05-18 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110060719A1 (en) * 2009-09-05 2011-03-10 Vivek Kapoor Method for Transforming Setup Data in Business Applications
CN102495916A (en) * 2011-11-07 2012-06-13 中国南方电网有限责任公司 Multi-application-system panoramic modeling method based on object matching
CN103441988A (en) * 2013-08-02 2013-12-11 广东电网公司电力科学研究院 Data migration method crossing GIS platforms
CN105808553A (en) * 2014-09-26 2016-07-27 三星Sds株式会社 Database migration method and device thereof
CN106547853A (en) * 2016-10-19 2017-03-29 北京航天泰坦科技股份有限公司 Forestry big data building method based on a figure
CN106933859A (en) * 2015-12-30 2017-07-07 中国移动通信集团公司 The moving method and device of a kind of medical data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110060719A1 (en) * 2009-09-05 2011-03-10 Vivek Kapoor Method for Transforming Setup Data in Business Applications
CN102495916A (en) * 2011-11-07 2012-06-13 中国南方电网有限责任公司 Multi-application-system panoramic modeling method based on object matching
CN103441988A (en) * 2013-08-02 2013-12-11 广东电网公司电力科学研究院 Data migration method crossing GIS platforms
CN105808553A (en) * 2014-09-26 2016-07-27 三星Sds株式会社 Database migration method and device thereof
CN106933859A (en) * 2015-12-30 2017-07-07 中国移动通信集团公司 The moving method and device of a kind of medical data
CN106547853A (en) * 2016-10-19 2017-03-29 北京航天泰坦科技股份有限公司 Forestry big data building method based on a figure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MOZHGAN MEMARI ET AL.: ""SQL Data Profiling of Foreign Keys"", 《INTERNATIONAL CONFERENCE ON CONCEPTUAL MODELING》 *
马伟: ""一种基于XML的异构数据源集成系统的研究"", 《万方数据知识服务平台》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888924A (en) * 2018-09-10 2020-03-17 深圳市从晶科技有限公司 Data acquisition system
CN112817990A (en) * 2021-01-28 2021-05-18 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112817990B (en) * 2021-01-28 2024-03-08 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
Wynar et al. Introduction to cataloging and classification
CN101509783B (en) Data checking method and device applying to navigation electronic map production
US7162688B1 (en) Method for automated generation and assembly of specifications documents in CADD environments
CA2606148A1 (en) Method of building a validation database
CN101313300A (en) Local search
EA200400614A1 (en) METHOD AND SYSTEM FOR CHECKING THE RELIABILITY OF REMOTE DATABASE
CN105320680A (en) Data synchronization method and device
CN102254029A (en) View-based data access system and method
CN107844581A (en) A kind of multi-resources Heterogeneous data fusion platform
CN107203642A (en) A kind of method of data synchronization and device
CN100504878C (en) SQL statement construction method and apparatus for preprocessing special-character
CN104915412A (en) Method and system for connecting dynamic management database
CN110134511A (en) A kind of shared storage optimization method of OpenTSDB
CN101013430A (en) Searching method and apparatus
CN111914028A (en) Method and device for synchronizing data relation of heterogeneous data sources based on graph increment
CN111897837A (en) Data query method, device, equipment and medium
CN115713309A (en) Internal auditing system
CN102646118B (en) Data indexing method and device
CN114116907A (en) Database synchronization method and device, electronic equipment and storage medium
CN105653532A (en) Method for synchronizing heterogeneous database
CN112231285A (en) Knowledge graph generation method and device based on data resources
Cimiano et al. Applying linked data principles to linking multilingual wordnets
CN104881455A (en) Structural difference processing method and system based on MYSQL
Jeong et al. A message conversion system, XML-based metadata semantics description language and metadata repository
CN109446223B (en) Data integration method among multiple systems

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned (effective date of abandoning: 20220614)