CN106250444A - Real-time ingestion system and method for heterogeneous data sources - Google Patents
- Publication number
- CN106250444A (application CN201610600065.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- real
- data source
- heterogeneous
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A real-time ingestion system for heterogeneous data sources is disclosed. It supports hot-plugging of heterogeneous data sources: data source configurations can be added or removed at will, data is put into storage in real time according to the configuration, and data synchronization between heterogeneous systems is achieved on the basis of a distributed system. The system includes: a data acquisition module, configured to start multiple subtasks that collect in parallel for each AES-encrypted data source batch, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable transfer from a breakpoint; a data transmission module, configured as a customized data channel based on a distributed architecture; a data parsing module, configured as a stream computing cluster based on a distributed architecture, in order to complete the real-time extract, transform, load (ETL) process; and a data storage module, configured to generate data partitions according to the data source name and a time identifier and to select the destination storage medium. A corresponding method is also provided.
Description
Technical field
The invention belongs to the technical field of big data processing, and more particularly relates to a real-time ingestion system and method for heterogeneous data sources.
Background art
In the prior art, for example, the Chinese patent "An efficient data synchronization method for heterogeneous data sources" (patent application No. 2015108101397) and the Chinese patent "Heterogeneous data source real-time synchronization system and method" (patent application No. 2015102411686) both give technical schemes for synchronizing data between two heterogeneous data sources.
However, the prior art only solves the problem of exchanging data between pairs of database systems and does not solve data exchange between heterogeneous systems. In addition, data synchronization there is performed by a stand-alone system rather than a distributed system, so it cannot support the synchronization of massive data.
Summary of the invention
The technical problem solved by the invention is to overcome the deficiencies of the prior art and provide a real-time ingestion system for heterogeneous data sources. The system supports hot-plugging of heterogeneous data sources: data source configurations can be added or removed at will, data is put into storage in real time according to the configuration, and data synchronization between heterogeneous systems is achieved on the basis of a distributed system.
The technical solution of the invention is a real-time ingestion system for heterogeneous data sources, the system including:
Data acquisition module, configured to start multiple subtasks that collect in parallel for each AES-encrypted data source batch, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
Data transmission module, configured as a customized data channel based on a distributed architecture;
Data parsing module, configured as a stream computing cluster based on a distributed architecture, in order to complete the real-time extract, transform, load (ETL) process;
Data storage module, configured to generate data partitions according to the data source name and a time identifier, and to select the destination storage medium.
The invention pulls data through the data acquisition module; passes the data through the data transmission module, which also records the data volume; parses each data source through the data parsing module according to the content of the configuration file; and stores the parsed data in the storage medium through the data storage module. It can therefore hot-plug heterogeneous data sources, add or remove data source configurations at will, put data into storage in real time according to the configuration, and achieve data synchronization between heterogeneous systems on the basis of a distributed system.
A real-time ingestion method for heterogeneous data sources is also provided, the method comprising the following steps:
(1) for each AES-encrypted data source batch, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) performing real-time extraction, transformation, and loading of the data through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the destination storage medium.
Brief description of the drawings
Fig. 1 shows a schematic structural diagram of the real-time ingestion system for heterogeneous data sources according to the invention.
Detailed description of the invention
As shown in Fig. 1, the real-time ingestion system for heterogeneous data sources includes:
Data acquisition module, configured to start multiple subtasks that collect in parallel for each data source batch encrypted with AES (Advanced Encryption Standard, also known in cryptography as the Rijndael cipher, a block encryption standard adopted by the U.S. federal government), and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
Data transmission module, configured as a customized data channel based on a distributed architecture;
Data parsing module, configured as a stream computing cluster based on a distributed architecture, in order to complete the real-time extract, transform, load (ETL) process. ETL is the abbreviation of Extract-Transform-Load and describes the process of extracting data from a source, transforming it, and loading it into a destination. The term is most common in data warehousing, but its scope is not limited to data warehouses;
Data storage module, configured to generate data partitions according to the data source name and a time identifier, and to select the destination storage medium.
The invention pulls data through the data acquisition module; passes the data through the data transmission module, which also records the data volume; parses each data source through the data parsing module according to the content of the configuration file; and stores the parsed data in the storage medium through the data storage module. It can therefore hot-plug heterogeneous data sources, add or remove data source configurations at will, put data into storage in real time according to the configuration, and achieve data synchronization between heterogeneous systems on the basis of a distributed system.
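The parallel, resumable collection performed by the data acquisition module can be illustrated with a minimal sketch. The patent does not disclose an implementation; the function name `collect_batch`, the interleaved slicing, and the in-memory `checkpoint` dictionary are all assumptions made for illustration, and a real system would persist the checkpoint.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_batch(
    records: List[bytes],
    num_subtasks: int,
    checkpoint: Dict[int, int],
    sink: Callable[[bytes], None],
) -> None:
    """Collect one batch with parallel subtasks, resuming from a checkpoint.

    `checkpoint` (hypothetical) maps subtask index -> number of records that
    subtask has already delivered, so a restarted run skips finished work.
    """
    # Split the batch into interleaved slices, one per subtask.
    slices = [records[i::num_subtasks] for i in range(num_subtasks)]

    def run(index: int) -> None:
        start = checkpoint.get(index, 0)  # resume point for this subtask
        for offset in range(start, len(slices[index])):
            sink(slices[index][offset])
            checkpoint[index] = offset + 1  # record progress after delivery

    with ThreadPoolExecutor(max_workers=num_subtasks) as pool:
        # list() forces evaluation so that worker exceptions surface here.
        list(pool.map(run, range(num_subtasks)))
```

Calling `collect_batch` a second time with the same `checkpoint` delivers nothing new, which is the behaviour expected from resumable transfer; the slicing is only stable if the batch and the number of subtasks are unchanged between runs.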
In addition, the data transmission module is further configured to decouple the data access layer from the data parsing module and to provide data buffering and data archiving.
In addition, the data buffering time of the data transmission module is one week, and archived data is stored permanently; other durations may of course also be configured.
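A minimal sketch of a channel with a one-week buffer and a permanent archive follows. It is an in-memory stand-in, not the distributed channel the patent describes; the class `BufferedChannel`, its method names, and the explicit `now` timestamps are assumptions for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

ONE_WEEK_SECONDS = 7 * 24 * 3600  # default buffering time stated in the text


class BufferedChannel:
    """Hypothetical channel: bounded-age buffer plus a permanent archive."""

    def __init__(self, retention_seconds: float = ONE_WEEK_SECONDS) -> None:
        self.retention_seconds = retention_seconds
        self.buffer: Deque[Tuple[float, bytes]] = deque()
        self.archive: List[Tuple[float, bytes]] = []  # never expired

    def _expire(self, now: float) -> None:
        # Drop buffered entries older than the retention window.
        while self.buffer and now - self.buffer[0][0] > self.retention_seconds:
            self.buffer.popleft()

    def publish(self, payload: bytes, now: float) -> None:
        self.archive.append((now, payload))
        self.buffer.append((now, payload))
        self._expire(now)

    def replay(self, since: float, now: float) -> List[bytes]:
        """Return buffered payloads with timestamp >= since (backtracking)."""
        self._expire(now)
        return [payload for ts, payload in self.buffer if ts >= since]
```

Because the parser reads from the buffer rather than from the collectors directly, either side can be stopped or restarted without the other noticing, which is the decoupling the text refers to.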
In addition, the data parsing module first completes AES decryption of the data; it then applies the parsing algorithm that matches the heterogeneous data source, structures and normalizes the data, and cleans out abnormal data.
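The decrypt, parse, normalize, and clean sequence can be sketched as below. AES itself is not implemented here: the `decrypt` callable is injected and is an assumption, as are the JSON record format and the required `id` and `ts` fields, none of which the patent specifies.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Optional


def parse_record(raw: bytes) -> Optional[Dict[str, object]]:
    """Parse one decrypted record; return None if it is abnormal."""
    try:
        obj = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not decodable or not valid JSON: treat as abnormal
    if not isinstance(obj, dict):
        return None
    # Normalize: lower-case the keys and strip whitespace from string values.
    normalized = {
        str(key).strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in obj.items()
    }
    # Assumed schema: every valid record carries an id and a timestamp.
    if "id" not in normalized or "ts" not in normalized:
        return None
    return normalized


def parse_stream(
    encrypted: Iterable[bytes],
    decrypt: Callable[[bytes], bytes],
) -> Iterator[Dict[str, object]]:
    """Decrypt first, then parse and normalize, dropping abnormal records."""
    for blob in encrypted:
        record = parse_record(decrypt(blob))
        if record is not None:
            yield record
```

In practice `decrypt` would wrap an AES routine from a cryptography library, and one `parse_record` variant would exist per data source format.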
In addition, the data storage module is further configured to judge whether a data partition is complete according to the total data volume and the time delay; once the data has been stored, online analysis and offline interactive query services are provided externally.
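One way to read the completion rule is that a partition closes either when the expected volume has arrived or when no data has arrived for too long. The sketch below encodes that reading; the either-or rule, the `PartitionState` class, and the idea that the expected total comes from the volume recorded during transmission are assumptions, not statements from the patent.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical bookkeeping for one data partition."""

    expected_total: int        # assumed: volume recorded during transmission
    received: int = 0
    last_arrival: float = 0.0  # timestamp of the most recent arrival

    def record_arrival(self, count: int, now: float) -> None:
        self.received += count
        self.last_arrival = now

    def is_finished(self, now: float, max_delay_seconds: float) -> bool:
        # Closed by volume: everything that was announced has arrived.
        if self.received >= self.expected_total:
            return True
        # Closed by delay: nothing has arrived for longer than allowed.
        return now - self.last_arrival > max_delay_seconds
```

The delay branch keeps a partition from staying open forever when part of a batch is lost upstream.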
In addition, the storage medium is the Hadoop Distributed File System (HDFS), the open-source database HBase, or the transactional database ES.
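Generating a partition from the data source name and a time identifier, and then choosing the destination medium, might look like the following sketch. The `name/timestamp` key layout, the hourly and daily granularities, and the per-source `routing` table are assumptions; only the three medium names come from the text.

```python
from datetime import datetime, timezone
from typing import Dict

SUPPORTED_MEDIA = {"HDFS", "HBASE", "ES"}  # media named in the description


def partition_key(source_name: str, ts: datetime, granularity: str = "hour") -> str:
    """Build a partition key from the data source name and a time identifier."""
    formats = {"day": "%Y%m%d", "hour": "%Y%m%d%H"}
    if granularity not in formats:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{source_name}/{ts.strftime(formats[granularity])}"


def select_medium(source_name: str, routing: Dict[str, str], default: str = "HDFS") -> str:
    """Pick the storage medium for a source from a (hypothetical) routing table."""
    medium = routing.get(source_name, default)
    if medium not in SUPPORTED_MEDIA:
        raise ValueError(f"unsupported storage medium: {medium}")
    return medium


if __name__ == "__main__":
    when = datetime(2016, 7, 27, 9, 30, tzinfo=timezone.utc)
    print(partition_key("orders", when), select_medium("orders", {"orders": "ES"}))
```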
A real-time ingestion method for heterogeneous data sources is also provided, the method comprising the following steps:
(1) for each AES-encrypted data source batch, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) performing real-time extraction, transformation, and loading of the data through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the destination storage medium.
Of course, before step (1), the data source nodes also need to be configured, including each data source's IP, port, storage structure, and maximum access speed.
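A data source node configuration with the four listed items could be modelled as in the sketch below. The field names, the interpretation of "maximum access speed" as records per second, and the `SourceRegistry` used to add or remove sources at run time are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataSourceNode:
    """Hypothetical configuration entry for one data source node."""

    name: str
    ip: str
    port: int
    storage_structure: str         # e.g. "table" or "file"; free-form here
    max_records_per_second: float  # assumed unit for "maximum access speed"

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.ip)  # raises ValueError if malformed
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
        if self.max_records_per_second <= 0:
            raise ValueError("max_records_per_second must be positive")

    def min_interval_seconds(self) -> float:
        """Shortest pause between reads that respects the speed limit."""
        return 1.0 / self.max_records_per_second


class SourceRegistry:
    """Adds and removes sources at run time, i.e. configuration hot-plugging."""

    def __init__(self) -> None:
        self.nodes: Dict[str, DataSourceNode] = {}

    def add(self, node: DataSourceNode) -> None:
        self.nodes[node.name] = node

    def remove(self, name: str) -> None:
        self.nodes.pop(name, None)
```

Validating the entry when it is created means a malformed source is rejected at configuration time rather than when the first collection subtask tries to connect.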
In addition, in step (2), the data access layer is decoupled from the data parsing module, and data buffering and data archiving are provided.
In addition, in step (3), AES decryption of the data is completed first; the parsing algorithm that matches the heterogeneous data source is then applied, the data is structured and normalized, and abnormal data is cleaned out.
In addition, in step (4), whether a data partition is complete is judged according to the total data volume and the time delay; once the data has been stored, online analysis and offline interactive query services are provided externally.
The beneficial effects of the invention are as follows:
1. Data sources can be added and removed through configuration;
2. Data synchronization is distributed and therefore more efficient;
3. Data is synchronized between heterogeneous data systems, which makes the data exchange more generally applicable.
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Any simple modification, equivalent change, or variation made to the above embodiment according to the technical essence of the invention still falls within the protection scope of the technical solution of the invention.
Claims (10)
1. A real-time ingestion system for heterogeneous data sources, characterized in that the system includes:
Data acquisition module, configured to start multiple subtasks that collect in parallel for each AES-encrypted data source batch, and configured so that the data access layer supports starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
Data transmission module, configured as a customized data channel based on a distributed architecture;
Data parsing module, configured as a stream computing cluster based on a distributed architecture, in order to complete the real-time extract, transform, load (ETL) process;
Data storage module, configured to generate data partitions according to the data source name and a time identifier, and to select the destination storage medium.
2. The real-time ingestion system for heterogeneous data sources according to claim 1, characterized in that the data transmission module is further configured to decouple the data access layer from the data parsing module and to provide data buffering and data archiving.
3. The real-time ingestion system for heterogeneous data sources according to claim 2, characterized in that the data buffering time of the data transmission module is one week and archived data is stored permanently.
4. The real-time ingestion system for heterogeneous data sources according to claim 2 or 3, characterized in that the data parsing module first completes AES decryption of the data, then applies the parsing algorithm that matches the heterogeneous data source, structures and normalizes the data, and cleans out abnormal data.
5. The real-time ingestion system for heterogeneous data sources according to claim 4, characterized in that the data storage module is further configured to judge whether a data partition is complete according to the total data volume and the time delay, and online analysis and offline interactive query services are provided externally once the data has been stored.
6. The real-time ingestion system for heterogeneous data sources according to claim 5, characterized in that the storage medium is the Hadoop Distributed File System (HDFS), the open-source database HBase, or the transactional database ES.
7. A real-time ingestion method for heterogeneous data sources, characterized in that the method comprises the following steps:
(1) for each AES-encrypted data source batch, starting multiple subtasks that collect in parallel, and configuring the data access layer to support starting and stopping tasks, backtracking, and resumable transfer from a breakpoint;
(2) transmitting data through a customized data channel based on a distributed architecture;
(3) performing real-time extraction, transformation, and loading of the data through a stream computing cluster based on a distributed architecture;
(4) generating data partitions according to the data source name and a time identifier, and selecting the destination storage medium.
8. The real-time ingestion method for heterogeneous data sources according to claim 7, characterized in that in step (2), the data access layer is decoupled from the data parsing module, and data buffering and data archiving are provided.
9. The real-time ingestion method for heterogeneous data sources according to claim 8, characterized in that in step (3), AES decryption of the data is completed first; the parsing algorithm that matches the heterogeneous data source is then applied, the data is structured and normalized, and abnormal data is cleaned out.
10. The real-time ingestion method for heterogeneous data sources according to claim 9, characterized in that in step (4), whether a data partition is complete is judged according to the total data volume and the time delay, and online analysis and offline interactive query services are provided externally once the data has been stored.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610600065.9A CN106250444A (en) | 2016-07-27 | 2016-07-27 | Real-time ingestion system and method for heterogeneous data sources |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610600065.9A CN106250444A (en) | 2016-07-27 | 2016-07-27 | Real-time ingestion system and method for heterogeneous data sources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106250444A (en) | 2016-12-21 |
Family ID=57604294
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610600065.9A Pending CN106250444A (en) | Real-time ingestion system and method for heterogeneous data sources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250444A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108459919A (en) * | 2018-03-29 | 2018-08-28 | 中信百信银行股份有限公司 | A kind of distributed transaction processing method and device |
CN108804533A (en) * | 2018-05-04 | 2018-11-13 | 佛山科学技术学院 | A kind of filter method and device of isomery big data information |
CN109246073A (en) * | 2018-07-04 | 2019-01-18 | 杭州数云信息技术有限公司 | A kind of data flow processing system and its method |
CN109271435A (en) * | 2018-09-14 | 2019-01-25 | 南威软件股份有限公司 | A kind of data pick-up method and system for supporting breakpoint transmission |
CN109815292A (en) * | 2019-01-03 | 2019-05-28 | 广州中软信息技术有限公司 | A kind of concerning taxes data collection system based on asynchronous message mechanism |
CN110309108A (en) * | 2019-05-08 | 2019-10-08 | 江苏满运软件科技有限公司 | Data acquisition and storage method, device, electronic equipment, storage medium |
CN111026535A (en) * | 2019-12-12 | 2020-04-17 | 成都九洲电子信息系统股份有限公司 | Non-standardized hot plug type data batch processing method |
WO2020215532A1 (en) * | 2019-04-26 | 2020-10-29 | 厦门市美亚柏科信息股份有限公司 | System and method for data synchronization between heterogeneous databases, and storage medium |
CN112015799A (en) * | 2020-10-20 | 2020-12-01 | 平安国际智慧城市科技股份有限公司 | ETL task execution method and device, computer equipment and storage medium |
CN113239081A (en) * | 2021-05-21 | 2021-08-10 | 瀚云科技有限公司 | Streaming data calculation method |
CN113377863A (en) * | 2020-03-10 | 2021-09-10 | 阿里巴巴集团控股有限公司 | Data synchronization method and device, electronic equipment and computer readable storage medium |
CN113688116A (en) * | 2020-05-19 | 2021-11-23 | 长鑫存储技术有限公司 | Data presentation system, method, device and computer readable storage medium |
CN115186020A (en) * | 2022-07-15 | 2022-10-14 | 深圳安巽科技有限公司 | Data access storage processing method, system and storage medium |
US11983224B2 (en) | 2020-05-19 | 2024-05-14 | Changxin Memory Technologies, Inc. | Data presentation system, method and device, and computer-readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090063453A1 (en) * | 2007-08-29 | 2009-03-05 | International Business Machines Corporation | Apparatus, system, and method for executing a distributed spatial data query |
CN101957865A (en) * | 2010-10-27 | 2011-01-26 | 杭州新中大软件股份有限公司 | Data exchange and sharing technology among heterogeneous systems |
CN102938731A (en) * | 2012-11-22 | 2013-02-20 | 北京锐易特软件技术有限公司 | Exchange and integration device and method based on proxy cache adaptation model |
CN104699723A (en) * | 2013-12-10 | 2015-06-10 | 北京神州泰岳软件股份有限公司 | Data exchange adapter and system and method for synchronizing data among heterogeneous systems |
CN105243155A (en) * | 2015-10-29 | 2016-01-13 | 贵州电网有限责任公司电力调度控制中心 | Big data extracting and exchanging system |
CN105677836A (en) * | 2016-01-05 | 2016-06-15 | 北京汇商融通信息技术有限公司 | Big data processing and solving system simultaneously supporting offline data and real-time online data |
- 2016-07-27: application CN201610600065.9A (CN106250444A), status: Pending
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108459919B (en) * | 2018-03-29 | 2022-04-15 | 中信百信银行股份有限公司 | Distributed transaction processing method and device |
CN108459919A (en) * | 2018-03-29 | 2018-08-28 | 中信百信银行股份有限公司 | A kind of distributed transaction processing method and device |
CN108804533B (en) * | 2018-05-04 | 2021-11-30 | 佛山科学技术学院 | Heterogeneous big data information filtering method and device |
CN108804533A (en) * | 2018-05-04 | 2018-11-13 | 佛山科学技术学院 | A kind of filter method and device of isomery big data information |
CN109246073A (en) * | 2018-07-04 | 2019-01-18 | 杭州数云信息技术有限公司 | A kind of data flow processing system and its method |
CN109271435A (en) * | 2018-09-14 | 2019-01-25 | 南威软件股份有限公司 | A kind of data pick-up method and system for supporting breakpoint transmission |
CN109271435B (en) * | 2018-09-14 | 2022-03-04 | 南威软件股份有限公司 | Data extraction method and system supporting breakpoint continuous transmission |
CN109815292A (en) * | 2019-01-03 | 2019-05-28 | 广州中软信息技术有限公司 | A kind of concerning taxes data collection system based on asynchronous message mechanism |
WO2020215532A1 (en) * | 2019-04-26 | 2020-10-29 | 厦门市美亚柏科信息股份有限公司 | System and method for data synchronization between heterogeneous databases, and storage medium |
CN110309108A (en) * | 2019-05-08 | 2019-10-08 | 江苏满运软件科技有限公司 | Data acquisition and storage method, device, electronic equipment, storage medium |
CN111026535A (en) * | 2019-12-12 | 2020-04-17 | 成都九洲电子信息系统股份有限公司 | Non-standardized hot plug type data batch processing method |
CN111026535B (en) * | 2019-12-12 | 2023-03-21 | 成都九洲电子信息系统股份有限公司 | Non-standardized hot plug type data batch processing method |
CN113377863A (en) * | 2020-03-10 | 2021-09-10 | 阿里巴巴集团控股有限公司 | Data synchronization method and device, electronic equipment and computer readable storage medium |
CN113377863B (en) * | 2020-03-10 | 2022-04-29 | 阿里巴巴集团控股有限公司 | Data synchronization method and device, electronic equipment and computer readable storage medium |
CN113688116A (en) * | 2020-05-19 | 2021-11-23 | 长鑫存储技术有限公司 | Data presentation system, method, device and computer readable storage medium |
US11983224B2 (en) | 2020-05-19 | 2024-05-14 | Changxin Memory Technologies, Inc. | Data presentation system, method and device, and computer-readable storage medium |
CN113688116B (en) * | 2020-05-19 | 2024-08-23 | 长鑫存储技术有限公司 | Data display system, method, apparatus and computer readable storage medium |
CN112015799A (en) * | 2020-10-20 | 2020-12-01 | 平安国际智慧城市科技股份有限公司 | ETL task execution method and device, computer equipment and storage medium |
CN113239081A (en) * | 2021-05-21 | 2021-08-10 | 瀚云科技有限公司 | Streaming data calculation method |
CN115186020A (en) * | 2022-07-15 | 2022-10-14 | 深圳安巽科技有限公司 | Data access storage processing method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106250444A (en) | Real-time ingestion system and method for heterogeneous data sources | |
US10949447B2 (en) | Blockchain-based data synchronizing and data block parsing method and device | |
JP6716727B2 (en) | Streaming data distributed processing method and apparatus | |
CN110674154B (en) | Spark-based method for inserting, updating and deleting data in Hive | |
US20160035044A1 (en) | Account processing method and apparatus | |
CN104036025A (en) | Distribution-base mass log collection system | |
US10706062B2 (en) | Method and system for exchanging data from a big data source to a big data target corresponding to components of the big data source | |
CN107220310A (en) | A kind of database data management system, method and device | |
CN103984745A (en) | Distributed video vertical searching method and system | |
CN105045856A (en) | Hadoop-based data processing system for big-data remote sensing satellite | |
CN103516802A (en) | Method and device for achieving seamless transference of across heterogeneous virtual switch | |
CN113900810A (en) | Distributed graph processing method, system and storage medium | |
CN111343241B (en) | Graph data updating method, device and system | |
CN105530272A (en) | Method and device for application data synchronization | |
CN104572505A (en) | System and method for ensuring eventual consistency of mass data caches | |
Kchaou et al. | Towards an offloading framework based on big data analytics in mobile cloud computing environments | |
CN104899278A (en) | Method and apparatus for generating data operation logs of Hbase database | |
CN111476595A (en) | Product pushing method and device, computer equipment and storage medium | |
US10853367B1 (en) | Dynamic prioritization of attributes to determine search space size of each term, then index on those sizes as attributes | |
CN105681199A (en) | Method and device for processing message data in vehicular bus | |
Chen et al. | Big data generation and acquisition | |
CN117093619A (en) | Rule engine processing method and device, electronic equipment and storage medium | |
CN111538772A (en) | Data exchange processing method and device, electronic equipment and storage medium | |
CN106161056B (en) | The distributed caching O&M method and device of preiodic type data | |
US10860614B2 (en) | Partitioning data in a clustered database environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161221 |