CN106250444A - Real-time warehousing system and method for heterogeneous data sources - Google Patents

Info

Publication number: CN106250444A
Authority: CN (China)
Prior art keywords: data, real, data source, heterogeneous, time
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201610600065.9A
Other languages: Chinese (zh)
Inventors: 温宗臣, 张翼, 何良均, 范卫卫, 冯森林, 崔晶晶, 林佳婕
Current Assignee: BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority date: 2016-07-27 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2016-07-27
Publication date: 2016-12-21
Application filed by BEIJING GEO POLYMERIZATION TECHNOLOGY Co Ltd
Priority to CN201610600065.9A
Publication of CN106250444A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed is a real-time warehousing system for heterogeneous data sources. It enables hot plugging of heterogeneous data sources: data source configurations can be added or removed at will, data is warehoused in near real time according to the configuration, and data synchronization between heterogeneous systems is achieved on top of a distributed system. The system includes: a data acquisition module, configured to start multiple subtasks that collect in parallel each AES-encrypted batch from a data source, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable (breakpoint) transfer; a data transmission module, configured as a customized data channel based on a distributed architecture; a data parsing module, configured as a stream computing cluster based on a distributed architecture, which completes the real-time extract, transform, load (ETL) process; and a data storage module, configured to generate data partitions according to the data source name and a time identifier and to select the target storage medium. A corresponding method is also provided.

Description

Real-time warehousing system and method for heterogeneous data sources
Technical field
The invention belongs to the technical field of big data processing, and in particular relates to a real-time warehousing system and method for heterogeneous data sources.
Background
In the prior art, for example, the Chinese patent "An efficient data synchronization method for heterogeneous data sources" (patent application No. 2015108101397) and the Chinese patent "Real-time synchronization system and method for heterogeneous data sources" (patent application No. 2015102411686) both give technical solutions for synchronizing data between two heterogeneous data sources.
However, the prior art only solves the problem of exchanging data between pairs of database systems and does not solve data exchange between heterogeneous systems. In addition, data synchronization in the prior art runs on a single machine rather than on a distributed system, so it cannot support the synchronization of massive data.
Summary of the invention
The technical problem solved by the invention is to overcome the deficiencies of the prior art and provide a real-time warehousing system for heterogeneous data sources. The system enables hot plugging of heterogeneous data sources: data source configurations can be added or removed at will, data is warehoused in near real time according to the configuration, and data synchronization between heterogeneous systems is achieved on top of a distributed system.
The technical solution of the invention is a real-time warehousing system for heterogeneous data sources, the system including:
a data acquisition module, configured to start multiple subtasks that collect in parallel each AES-encrypted batch from a data source, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
a data transmission module, configured as a customized data channel based on a distributed architecture;
a data parsing module, configured as a stream computing cluster based on a distributed architecture, which completes the real-time extract, transform, load (ETL) process;
a data storage module, configured to generate data partitions according to the data source name and a time identifier and to select the target storage medium.
The invention pulls data through the data acquisition module, passes the data through transparently via the data transmission module while recording the data volume, parses the data source in the data parsing module according to the contents of a configuration file, and stores the parsed data in the storage medium via the data storage module. It can therefore hot-plug heterogeneous data sources, add or remove data source configurations at will, warehouse data in near real time according to the configuration, and synchronize data between heterogeneous systems on top of a distributed system.
A real-time warehousing method for heterogeneous data sources is also provided, the method including the following steps:
(1) for each AES-encrypted batch from a data source, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) extracting, transforming, and loading the data in real time through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the target storage medium.
Brief description of the drawings
Fig. 1 shows a structural schematic diagram of the real-time warehousing system for heterogeneous data sources according to the invention.
Detailed description of the invention
As shown in Fig. 1, the real-time warehousing system for heterogeneous data sources includes:
a data acquisition module, configured to start multiple subtasks that collect in parallel each batch from a data source encrypted with AES (Advanced Encryption Standard, also known in cryptography as the Rijndael cipher, a block cipher standard adopted by the U.S. federal government), and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
a data transmission module, configured as a customized data channel based on a distributed architecture;
a data parsing module, configured as a stream computing cluster based on a distributed architecture, which completes the real-time extract, transform, load (ETL) process. ETL is the abbreviation of Extract-Transform-Load and describes the process of moving data from a source to a destination through extraction, transformation, and loading;
the term ETL is most common in data warehousing, but it is not limited to data warehouses;
a data storage module, configured to generate data partitions according to the data source name and a time identifier and to select the target storage medium.
The invention pulls data through the data acquisition module, passes the data through transparently via the data transmission module while recording the data volume, parses the data source in the data parsing module according to the contents of a configuration file, and stores the parsed data in the storage medium via the data storage module. It can therefore hot-plug heterogeneous data sources, add or remove data source configurations at will, warehouse data in near real time according to the configuration, and synchronize data between heterogeneous systems on top of a distributed system.
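As an illustration only, the acquisition behavior described above (parallel subtasks per batch, start/stop, and resumable transfer) might be sketched as follows. The patent does not disclose an implementation; the names `Checkpoints`, `collect`, and `run_acquisition`, the in-memory offset store, and the use of a thread pool are all assumptions made for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Checkpoints:
    """Hypothetical in-memory offset store; a real system would persist it."""

    def __init__(self):
        self._offsets = {}
        self._lock = threading.Lock()

    def get(self, task_id):
        with self._lock:
            return self._offsets.get(task_id, 0)

    def set(self, task_id, offset):
        with self._lock:
            self._offsets[task_id] = offset


def collect(task_id, batch, checkpoints, stop_event, sink):
    """Collect one batch, resuming from the last committed offset."""
    offset = checkpoints.get(task_id)
    while offset < len(batch) and not stop_event.is_set():
        sink.append((task_id, batch[offset]))  # hand the record downstream
        offset += 1
        checkpoints.set(task_id, offset)       # commit progress for resume
    return offset


def run_acquisition(batches, checkpoints, stop_event, sink, workers=4):
    """Start one subtask per batch and run the subtasks in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            task_id: pool.submit(collect, task_id, batch,
                                 checkpoints, stop_event, sink)
            for task_id, batch in batches.items()
        }
        return {task_id: f.result() for task_id, f in futures.items()}
```

In this sketch, setting `stop_event` stops every subtask at its next record, a later call to `run_acquisition` with the same `Checkpoints` continues where each subtask left off, and backtracking could be approximated by resetting a task's offset to an earlier value before restarting.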
In addition, the data transmission module is further configured to decouple the data access layer from the data parsing module and to provide data buffering and data archiving.
In addition, the data buffering time of the data transmission module is one week and archived data is stored permanently; other durations can of course also be set.
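The buffering and archiving behavior could look roughly like the sketch below, which keeps a time-bounded buffer for replay to the parsing side and an append-only archive. `DataChannel`, its method names, and the in-memory containers are hypothetical stand-ins; the patent only states the one-week default and permanent archiving. The sketch also assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from dataclasses import dataclass, field

WEEK_SECONDS = 7 * 24 * 3600  # default buffering time stated in the text


@dataclass
class DataChannel:
    retention_seconds: float = WEEK_SECONDS
    buffer: deque = field(default_factory=deque)  # (timestamp, record), oldest first
    archive: list = field(default_factory=list)   # stand-in for permanent storage

    def publish(self, record, now):
        """Archive the record permanently and buffer it for consumers."""
        self.archive.append(record)
        self.buffer.append((now, record))
        self.expire(now)

    def expire(self, now):
        """Drop buffered records older than the retention window."""
        while self.buffer and now - self.buffer[0][0] > self.retention_seconds:
            self.buffer.popleft()

    def replay(self, since):
        """Return buffered records with a timestamp at or after `since`."""
        return [rec for ts, rec in self.buffer if ts >= since]
```

Because producers only call `publish` and consumers only call `replay`, the access layer and the parsing module never reference each other, which is the decoupling the text describes.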
In addition, the data parsing module first completes AES (Advanced Encryption Standard) decryption of the data; it then runs the parsing algorithm matched to the heterogeneous data source, structures and normalizes the data, and cleans out abnormal data.
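A minimal sketch of the decrypt, structure, normalize, and clean sequence is shown below. It assumes JSON payloads and a per-source field mapping, neither of which is specified in the patent. The decryption step is passed in as a callable so that an AES routine from a cryptography library can be supplied by the caller; no AES implementation is shown here, and `parse_batch` and `field_map` are hypothetical names.

```python
import json


def parse_batch(encrypted_records, decrypt, field_map):
    """Decrypt, structure, normalize, and clean a batch of records.

    decrypt:   callable bytes -> bytes (e.g. AES decryption, supplied by caller)
    field_map: source field name -> normalized field name (assumed config)
    Returns (clean_rows, rejected_raw_records).
    """
    clean, rejected = [], []
    for raw in encrypted_records:
        try:
            row = json.loads(decrypt(raw).decode("utf-8"))  # structuring
            if not isinstance(row, dict):
                raise ValueError("record is not an object")
            # normalization: rename source fields to the common schema
            normalized = {dst: row[src] for src, dst in field_map.items()}
        except (ValueError, KeyError):
            rejected.append(raw)  # abnormal data is cleaned out
            continue
        clean.append(normalized)
    return clean, rejected
```

Keeping one `field_map` per data source is one way to realize "the parsing algorithm matched to the heterogeneous data source": adding a source then means adding a mapping rather than changing code.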
In addition, the data storage module is further configured to determine whether a data partition is complete according to the total data volume and the time delay; after the data is warehoused, online analysis and offline interactive query services are provided externally.
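The text does not give the exact completion rule, so the following is only one plausible reading: a partition is treated as complete when the received count reaches the total recorded upstream, or when no new data has arrived within a configured delay. `PartitionState`, `partition_finished`, and the parameter `max_delay_seconds` are assumed names.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    expected_total: int        # record count reported upstream (assumption)
    received: int = 0
    last_arrival: float = 0.0  # timestamp of the most recent arrival

    def add(self, count, now):
        self.received += count
        self.last_arrival = now


def partition_finished(state, now, max_delay_seconds):
    """Complete if all expected data arrived, or arrivals have timed out."""
    if state.received >= state.expected_total:
        return True
    return now - state.last_arrival > max_delay_seconds
```

The timeout branch keeps a partition from staying open forever when part of a batch is lost, at the cost of possibly closing a partition that later receives late data.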
In addition, the storage medium is the Hadoop Distributed File System (HDFS), the open-source database HBASE, or the transaction database ES.
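Partition generation and storage selection could be sketched as below. The path layout (source name, day, hour) and the per-source `targets` mapping are assumptions chosen for illustration; the patent only says that partitions are derived from the data source name and a time identifier and that one of the three listed media is selected.

```python
from datetime import datetime

STORAGE_MEDIA = {"HDFS", "HBASE", "ES"}  # the media named in the text


def partition_path(source_name, event_time):
    """Build a partition key from the source name and an hourly time identifier."""
    return f"{source_name}/{event_time:%Y%m%d}/{event_time:%H}"


def route(source_name, event_time, targets):
    """Return (storage medium, partition) using a per-source config mapping."""
    medium = targets[source_name]
    if medium not in STORAGE_MEDIA:
        raise ValueError(f"unsupported storage medium: {medium}")
    return medium, partition_path(source_name, event_time)
```

For example, `route("orders", datetime(2016, 7, 27, 9, 30), {"orders": "HDFS"})` returns `("HDFS", "orders/20160727/09")`, where `orders` is a made-up source name.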
A real-time warehousing method for heterogeneous data sources is also provided, the method including the following steps:
(1) for each AES-encrypted batch from a data source, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) extracting, transforming, and loading the data in real time through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the target storage medium.
Before step (1), the data source nodes must also be configured, including each data source's IP address, port, storage structure, and maximum access speed.
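The node configuration listed above, together with the hot-plug idea of adding or removing sources at will, might be modeled as in the sketch below. `DataSourceNode`, `SourceRegistry`, the field names, and the unit of `max_access_rate` are hypothetical; only the four configured properties (IP, port, storage structure, maximum access speed) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceNode:
    name: str
    ip: str
    port: int
    storage_structure: str  # e.g. "mysql-table" or "csv-file" (illustrative)
    max_access_rate: int    # upper bound on access speed; unit is an assumption


class SourceRegistry:
    """Holds the active data sources; entries can change while running."""

    def __init__(self):
        self._nodes = {}

    def add(self, node):
        self._nodes[node.name] = node

    def remove(self, name):
        return self._nodes.pop(name, None)

    def active(self):
        return sorted(self._nodes)
```

An acquisition loop that re-reads `active()` on each cycle would pick up added or removed sources without a restart, which is one way to approximate the hot plugging the patent describes.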
In addition, in step (2), the data access layer is decoupled from the data parsing module, and data buffering and data archiving are provided.
In addition, in step (3), AES (Advanced Encryption Standard) decryption of the data is completed first; the parsing algorithm matched to the heterogeneous data source is then run, the data is structured and normalized, and abnormal data is cleaned out.
In addition, in step (4), whether a data partition is complete is determined according to the total data volume and the time delay; after the data is warehoused, online analysis and offline interactive query services are provided externally.
The beneficial effects of the invention are as follows:
1. Adding and removing data sources is configurable;
2. Data synchronization is distributed and therefore more efficient;
3. Data is synchronized between heterogeneous data systems, which gives the data exchange more general applicability.
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Any simple modification, equivalent change, or variation made to the above embodiment according to the technical essence of the invention still falls within the scope of protection of the technical solution of the invention.

Claims (10)

1. A real-time warehousing system for heterogeneous data sources, characterized in that the system includes:
a data acquisition module, configured to start multiple subtasks that collect in parallel each AES-encrypted batch from a data source, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
a data transmission module, configured as a customized data channel based on a distributed architecture;
a data parsing module, configured as a stream computing cluster based on a distributed architecture, which completes the real-time extract, transform, load (ETL) process;
a data storage module, configured to generate data partitions according to the data source name and a time identifier and to select the target storage medium.
2. The real-time warehousing system for heterogeneous data sources according to claim 1, characterized in that the data transmission module is further configured to decouple the data access layer from the data parsing module and to provide data buffering and data archiving.
3. The real-time warehousing system for heterogeneous data sources according to claim 2, characterized in that the data buffering time of the data transmission module is one week and archived data is stored permanently.
4. The real-time warehousing system for heterogeneous data sources according to claim 2 or 3, characterized in that the data parsing module first completes AES (Advanced Encryption Standard) decryption of the data, then runs the parsing algorithm matched to the heterogeneous data source, structures and normalizes the data, and cleans out abnormal data.
5. The real-time warehousing system for heterogeneous data sources according to claim 4, characterized in that the data storage module is further configured to determine whether a data partition is complete according to the total data volume and the time delay, and online analysis and offline interactive query services are provided externally after the data is warehoused.
6. The real-time warehousing system for heterogeneous data sources according to claim 5, characterized in that the storage medium is the Hadoop Distributed File System (HDFS), the open-source database HBASE, or the transaction database ES.
7. A real-time warehousing method for heterogeneous data sources, characterized in that the method includes the following steps:
(1) for each AES-encrypted batch from a data source, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable (breakpoint) transfer;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) extracting, transforming, and loading the data in real time through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the target storage medium.
8. The real-time warehousing method for heterogeneous data sources according to claim 7, characterized in that in step (2), the data access layer is decoupled from the data parsing module, and data buffering and data archiving are provided.
9. The real-time warehousing method for heterogeneous data sources according to claim 8, characterized in that in step (3), AES (Advanced Encryption Standard) decryption of the data is completed first; the parsing algorithm matched to the heterogeneous data source is then run, the data is structured and normalized, and abnormal data is cleaned out.
10. The real-time warehousing method for heterogeneous data sources according to claim 9, characterized in that in step (4), whether a data partition is complete is determined according to the total data volume and the time delay, and online analysis and offline interactive query services are provided externally after the data is warehoused.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610600065.9A CN106250444A (en) 2016-07-27 2016-07-27 Real-time warehousing system and method for heterogeneous data sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610600065.9A CN106250444A (en) 2016-07-27 2016-07-27 Real-time warehousing system and method for heterogeneous data sources

Publications (1)

Publication Number Publication Date
CN106250444A true CN106250444A (en) 2016-12-21

Family

ID=57604294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610600065.9A Pending CN106250444A (en) 2016-07-27 2016-07-27 Real-time warehousing system and method for heterogeneous data sources

Country Status (1)

Country Link
CN (1) CN106250444A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090063453A1 (en) * 2007-08-29 2009-03-05 International Business Machines Corporation Apparatus, system, and method for executing a distributed spatial data query
CN101957865A (en) * 2010-10-27 2011-01-26 杭州新中大软件股份有限公司 Data exchange and sharing technology among heterogeneous systems
CN102938731A (en) * 2012-11-22 2013-02-20 北京锐易特软件技术有限公司 Exchange and integration device and method based on proxy cache adaptation model
CN104699723A (en) * 2013-12-10 2015-06-10 北京神州泰岳软件股份有限公司 Data exchange adapter and system and method for synchronizing data among heterogeneous systems
CN105243155A (en) * 2015-10-29 2016-01-13 贵州电网有限责任公司电力调度控制中心 Big data extracting and exchanging system
CN105677836A (en) * 2016-01-05 2016-06-15 北京汇商融通信息技术有限公司 Big data processing and solving system simultaneously supporting offline data and real-time online data

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108459919B (en) * 2018-03-29 2022-04-15 中信百信银行股份有限公司 Distributed transaction processing method and device
CN108459919A (en) * 2018-03-29 2018-08-28 中信百信银行股份有限公司 A kind of distributed transaction processing method and device
CN108804533B (en) * 2018-05-04 2021-11-30 佛山科学技术学院 Heterogeneous big data information filtering method and device
CN108804533A (en) * 2018-05-04 2018-11-13 佛山科学技术学院 A kind of filter method and device of isomery big data information
CN109246073A (en) * 2018-07-04 2019-01-18 杭州数云信息技术有限公司 A kind of data flow processing system and its method
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission
CN109271435B (en) * 2018-09-14 2022-03-04 南威软件股份有限公司 Data extraction method and system supporting breakpoint continuous transmission
CN109815292A (en) * 2019-01-03 2019-05-28 广州中软信息技术有限公司 A kind of concerning taxes data collection system based on asynchronous message mechanism
WO2020215532A1 (en) * 2019-04-26 2020-10-29 厦门市美亚柏科信息股份有限公司 System and method for data synchronization between heterogeneous databases, and storage medium
CN110309108A (en) * 2019-05-08 2019-10-08 江苏满运软件科技有限公司 Data acquisition and storage method, device, electronic equipment, storage medium
CN111026535A (en) * 2019-12-12 2020-04-17 成都九洲电子信息系统股份有限公司 Non-standardized hot plug type data batch processing method
CN111026535B (en) * 2019-12-12 2023-03-21 成都九洲电子信息系统股份有限公司 Non-standardized hot plug type data batch processing method
CN113377863A (en) * 2020-03-10 2021-09-10 阿里巴巴集团控股有限公司 Data synchronization method and device, electronic equipment and computer readable storage medium
CN113377863B (en) * 2020-03-10 2022-04-29 阿里巴巴集团控股有限公司 Data synchronization method and device, electronic equipment and computer readable storage medium
CN113688116A (en) * 2020-05-19 2021-11-23 长鑫存储技术有限公司 Data presentation system, method, device and computer readable storage medium
US11983224B2 (en) 2020-05-19 2024-05-14 Changxin Memory Technologies, Inc. Data presentation system, method and device, and computer-readable storage medium
CN113688116B (en) * 2020-05-19 2024-08-23 长鑫存储技术有限公司 Data display system, method, apparatus and computer readable storage medium
CN112015799A (en) * 2020-10-20 2020-12-01 平安国际智慧城市科技股份有限公司 ETL task execution method and device, computer equipment and storage medium
CN113239081A (en) * 2021-05-21 2021-08-10 瀚云科技有限公司 Streaming data calculation method
CN115186020A (en) * 2022-07-15 2022-10-14 深圳安巽科技有限公司 Data access storage processing method, system and storage medium

Similar Documents

Publication Publication Date Title
CN106250444A (en) Real-time warehousing system and method for heterogeneous data sources
US10949447B2 (en) Blockchain-based data synchronizing and data block parsing method and device
JP6716727B2 (en) Streaming data distributed processing method and apparatus
CN110674154B (en) Spark-based method for inserting, updating and deleting data in Hive
US20160035044A1 (en) Account processing method and apparatus
CN104036025A (en) Distribution-base mass log collection system
US10706062B2 (en) Method and system for exchanging data from a big data source to a big data target corresponding to components of the big data source
CN107220310A (en) A kind of database data management system, method and device
CN103984745A (en) Distributed video vertical searching method and system
CN105045856A (en) Hadoop-based data processing system for big-data remote sensing satellite
CN103516802A (en) Method and device for achieving seamless transference of across heterogeneous virtual switch
CN113900810A (en) Distributed graph processing method, system and storage medium
CN111343241B (en) Graph data updating method, device and system
CN105530272A (en) Method and device for application data synchronization
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
Kchaou et al. Towards an offloading framework based on big data analytics in mobile cloud computing environments
CN104899278A (en) Method and apparatus for generating data operation logs of Hbase database
CN111476595A (en) Product pushing method and device, computer equipment and storage medium
US10853367B1 (en) Dynamic prioritization of attributes to determine search space size of each term, then index on those sizes as attributes
CN105681199A (en) Method and device for processing message data in vehicular bus
Chen et al. Big data generation and acquisition
CN117093619A (en) Rule engine processing method and device, electronic equipment and storage medium
CN111538772A (en) Data exchange processing method and device, electronic equipment and storage medium
CN106161056B (en) The distributed caching O&M method and device of preiodic type data
US10860614B2 (en) Partitioning data in a clustered database environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161221