CN108874313B - Data exchange platform for big data increment extraction based on data stream - Google Patents
- Publication number
- CN108874313B (application CN201810548955.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- incremental
- processor
- module
- big data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0662—Virtualisation aspects
- G06F3/0665—Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a data exchange platform for incremental extraction of big data based on data streams, comprising: a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API for the data streams in the big data storage module; an incremental processor module, which obtains the execution environment provided by the network server module and stores its cache queue in the big data storage module; and a big data storage module, which obtains the execution environment provided by the network server module and stores the cache queue of the incremental processor module. The output end of the network server module is connected to the input end of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module. The data exchange platform of the invention makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.
Description
Technical Field
The invention belongs to the technical field of incremental extraction of big data, and in particular relates to a data exchange platform for incremental extraction of big data based on data streams.
Background
With the continuous development of the Internet and the wide use of intelligent terminal devices, more and more terminals can access the Internet, so the volume of data keeps growing. Data of all kinds matters to residents' daily life, enterprise decision-making and government policy-making, which makes data storage an important problem. The earliest approach stored data as files, which wasted resources, was hard to maintain, and made the data difficult to search and analyse; it is not suited to a society in which the Internet is developing rapidly. Database technology arose in response: a database is a repository that organizes, stores and manages data according to a data structure. With the development of information technology and markets, particularly since the 1990s, data management is no longer only about storing and managing data but has turned into the various forms of data management that users require.
The data involved today covers a wide range of fields and is huge in volume, and data scattered across many sources needs to be brought together in one data mart to facilitate analysis, statistics and use. Conventional databases can no longer support this enormous statistical analysis workload, so a technique for centralizing the data is required. If an enterprise runs multiple systems, the primary task is to solve the incremental extraction of data streams between data production systems and data processing systems.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a data exchange platform for incremental extraction of big data based on data streams, which makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.
To achieve the above object, the invention provides a data exchange platform for incremental extraction of big data based on data streams, comprising the following parts:
a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API (application programming interface) for the data streams in the big data storage module;
an incremental processor module, which obtains the execution environment provided by the network server module, stores its cache queue in the big data storage module, and reads the cache queue from the big data storage module;
a big data storage module, which obtains the execution environment provided by the network server module and stores the cache queue of the incremental processor module, the command and control API for the data streams in the big data storage module being hosted by the network server module;
the output end of the network server module is connected to the input end of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module.
Preferably, the incremental processor module comprises an incremental data processor, an incremental extraction processor and an expander; the output ends of the incremental data processor and the expander are connected to the input end of the incremental extraction processor; the input ends of the incremental data processor, the incremental extraction processor and the expander are all connected to the output end of the network server module; the output end of the incremental data processor is connected to the input end of the big data storage module; and the incremental extraction processor is in bidirectional communication with the big data storage module.
Preferably, the big data storage module comprises a data stream source library, a data stream file library and a big data content library; the output end of the data stream file library is connected to the input end of the big data content library; the data stream source library, the data stream file library and the big data content library are all in bidirectional communication with the network server module; the input end of the data stream source library is connected to the output end of the incremental data processor; the output end of the data stream source library is connected to the input end of the incremental extraction processor; and the output end of the incremental extraction processor is connected to the input end of the data stream file library.
Still further preferably, the network server module provides an execution environment for the incremental data processor, the incremental extraction processor, the expander, the data stream source library, the data stream file library and the big data content library, and hosts the HTTP-based command and control API for the data streams in the data stream source library, the data stream file library and the big data content library.
Preferably, the incremental data processor acquires incremental data and stores the cache queue holding the incremental data in the data stream source library. The incremental extraction processor reads the incremental data from the data stream source library, the expander obtains the execution timetable, and the incremental extraction processor extracts the incremental data to obtain incremental extraction data. The cache queue holding the incremental extraction data is stored in the data stream file library and is then further stored in the big data content library.
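The flow of cache queues through the three libraries can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names Library, ingest, extract and publish are hypothetical, the queues are held in memory, and records are moved rather than copied between libraries because the text does not say which applies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Library:
    """A storage library holding a FIFO cache queue of records (hypothetical model)."""
    name: str
    queue: deque = field(default_factory=deque)


def ingest(source, records):
    """Incremental data processor: enqueue newly acquired records in the source library."""
    source.queue.extend(records)


def extract(source, files, transform):
    """Incremental extraction processor: drain the source queue into the file library."""
    moved = 0
    while source.queue:
        files.queue.append(transform(source.queue.popleft()))
        moved += 1
    return moved


def publish(files, content):
    """Forward extracted records from the file library to the content library."""
    moved = 0
    while files.queue:
        content.queue.append(files.queue.popleft())
        moved += 1
    return moved
```

A run would call ingest, then extract, then publish, in that order; the transform argument stands in for whatever extraction logic a deployment applies.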
The invention has the following beneficial effects:
1) The data exchange platform comprises a network server module, an incremental processor module and a big data storage module. The big data storage module specifies storage locations for a plurality of file systems, namely the data stream source library, the data stream file library and the big data content library, so that these can be placed on different physical partitions and contention on any single volume is reduced. The data exchange platform makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.
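One way to picture the separate storage locations is a small layout helper. The sketch below is an assumption-laden illustration: the library keys, the function names and the use of temporary directories in place of real volumes are all hypothetical, and distinct directories alone do not guarantee distinct physical partitions.

```python
import tempfile
from pathlib import Path

# Hypothetical keys for the three libraries named in the text.
LIBRARIES = ("data_stream_source", "data_stream_file", "big_data_content")


def make_storage_layout(roots):
    """Create one directory per library under its configured root."""
    layout = {}
    for library in LIBRARIES:
        path = Path(roots[library]) / library
        path.mkdir(parents=True, exist_ok=True)
        layout[library] = path
    return layout


def on_distinct_roots(roots):
    """Return True if no two libraries share a root directory."""
    resolved = [Path(root).resolve() for root in roots.values()]
    return len(set(resolved)) == len(resolved)


def demo_layout():
    """Build the layout under a temporary directory standing in for three volumes."""
    base = Path(tempfile.mkdtemp())
    roots = {
        "data_stream_source": base / "volume_a",
        "data_stream_file": base / "volume_b",
        "big_data_content": base / "volume_c",
    }
    return roots, make_storage_layout(roots)
```

In a real deployment each root would be a mount point on its own partition, which is what would actually reduce contention on a single volume.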
Drawings
FIG. 1 is a schematic diagram of the data exchange platform of the invention.
FIG. 2 is a flow chart of incremental data extraction in the data exchange platform of the invention.
Reference numerals: 1 - network server module; 2 - incremental processor module; 3 - big data storage module; 21 - incremental data processor; 22 - incremental extraction processor; 23 - expander; 31 - data stream source library; 32 - data stream file library; 33 - big data content library.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the invention.
As shown in FIG. 1, the invention provides a data exchange platform for incremental extraction of big data based on data streams, comprising the following parts:
a network server module 1, which provides an execution environment for the incremental processor module 2 and the big data storage module 3 and hosts the HTTP-based command and control API for the data streams in the big data storage module 3;
an incremental processor module 2, which obtains the execution environment provided by the network server module 1, stores its cache queue in the big data storage module 3, and reads the cache queue from the big data storage module 3;
Specifically, the incremental processor module 2 comprises an incremental data processor 21, an incremental extraction processor 22 and an expander 23; the output ends of the incremental data processor 21 and the expander 23 are both connected to the input end of the incremental extraction processor 22; the input ends of the incremental data processor 21, the incremental extraction processor 22 and the expander 23 are all connected to the output end of the network server module 1; the output end of the incremental data processor 21 is connected to the input end of the big data storage module 3; and the incremental extraction processor 22 is in bidirectional communication with the big data storage module 3.
a big data storage module 3, which obtains the execution environment provided by the network server module 1 and stores the cache queue of the incremental processor module 2, the HTTP-based command and control API for the data streams in the big data storage module 3 being hosted by the network server module 1;
Specifically, the big data storage module 3 comprises a data stream source library 31, a data stream file library 32 and a big data content library 33; the output end of the data stream file library 32 is connected to the input end of the big data content library 33; the data stream source library 31, the data stream file library 32 and the big data content library 33 are all in bidirectional communication with the network server module 1; the input end of the data stream source library 31 is connected to the output end of the incremental data processor 21; the output end of the data stream source library 31 is connected to the input end of the incremental extraction processor 22; and the output end of the incremental extraction processor 22 is connected to the input end of the data stream file library 32.
The output end of the network server module 1 is connected to the input end of the incremental processor module 2, the incremental processor module 2 is in bidirectional communication with the big data storage module 3, and the network server module 1 is in bidirectional communication with the big data storage module 3.
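The data-path connections described above can be written down as a directed graph and checked mechanically. The sketch below is illustrative only: the component names are shorthand chosen for this example, the control links to and from the network server module 1 are deliberately left out, and breadth-first search is simply one convenient way to test reachability.

```python
from collections import deque

# Data-path connections taken from the description; names are shorthand
# for this sketch, not identifiers from the patent.
DATA_EDGES = {
    "incremental_data_processor": [
        "data_stream_source_library",
        "incremental_extraction_processor",
    ],
    "expander": ["incremental_extraction_processor"],
    "data_stream_source_library": ["incremental_extraction_processor"],
    "incremental_extraction_processor": ["data_stream_file_library"],
    "data_stream_file_library": ["big_data_content_library"],
    "big_data_content_library": [],
}


def reachable(edges, start, goal):
    """Breadth-first search: return True if goal can be reached from start."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

With these edges, data entering at the incremental data processor can reach the big data content library, while nothing flows back from the content library to the processors along the data path.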
As shown in FIG. 2, the network server module 1 provides an execution environment for the incremental data processor 21, the incremental extraction processor 22, the expander 23, the data stream source library 31, the data stream file library 32 and the big data content library 33, and hosts the HTTP-based command and control API for the data streams in the data stream source library 31, the data stream file library 32 and the big data content library 33.
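The text does not define the endpoints of the HTTP-based command and control API, so the following is only a guess at its general shape, built on the Python standard library. The routes (GET /streams, POST /streams/<library>/<command>), the stream names, the start and stop commands and the in-memory state are all assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical in-memory state of the data streams in the three libraries.
STREAMS = {"source": "running", "files": "running", "content": "running"}
COMMANDS = {"start": "running", "stop": "stopped"}


class ControlHandler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET /streams returns the state of every stream.
        if self.path == "/streams":
            self._reply(200, STREAMS)
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # POST /streams/<library>/<command> changes the state of one stream.
        parts = self.path.strip("/").split("/")
        valid = (len(parts) == 3 and parts[0] == "streams"
                 and parts[1] in STREAMS and parts[2] in COMMANDS)
        if valid:
            STREAMS[parts[1]] = COMMANDS[parts[2]]
            self._reply(200, {parts[1]: STREAMS[parts[1]]})
        else:
            self._reply(400, {"error": "unknown stream or command"})

    def log_message(self, format, *args):
        pass  # silence per-request logging


def serve_in_background(port=0):
    """Start the control API on a local port (0 lets the OS pick a free one)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ControlHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(server, method, path):
    """Send one request to the control API and decode the JSON reply."""
    url = "http://127.0.0.1:%d%s" % (server.server_address[1], path)
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(Request(url, method=method), timeout=5) as resp:
        return json.loads(resp.read())
```

For example, after server = serve_in_background(), calling call(server, "POST", "/streams/source/stop") marks the source stream as stopped, and server.shutdown() ends the sketch.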
The incremental data processor 21 acquires incremental data and stores the cache queue holding the incremental data in the data stream source library 31. The incremental extraction processor 22 reads the incremental data from the data stream source library 31, the expander 23 obtains the execution timetable, and the incremental extraction processor 22 extracts the incremental data to obtain incremental extraction data. The cache queue holding the incremental extraction data is stored in the data stream file library 32 and is then further stored in the big data content library 33.
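How the execution timetable drives extraction is not spelled out in the text. The sketch below assumes a fixed-interval timetable and identifies incremental data with a timestamp watermark, which is a common technique swapped in here for illustration rather than the method of the invention. The function names and the "ts" record field are hypothetical.

```python
from datetime import datetime, timedelta


def build_timetable(start, interval, count):
    """Expander stand-in: produce a list of scheduled run times."""
    return [start + i * interval for i in range(count)]


def due_runs(timetable, last_run, now):
    """Return the scheduled times that fall after last_run and not after now."""
    return [t for t in timetable if last_run < t <= now]


def extract_increment(records, watermark):
    """Return records strictly newer than the watermark and the advanced watermark."""
    fresh = [r for r in records if r["ts"] > watermark]
    new_watermark = max((r["ts"] for r in fresh), default=watermark)
    return fresh, new_watermark
```

Each due run would call extract_increment with the watermark left by the previous run, so a record is extracted once even when several scheduled runs elapse between checks.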
In summary, the invention provides a data exchange platform for incremental extraction of big data based on data streams that makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.
Claims (1)
1. A data exchange platform for incremental extraction of big data based on data streams, characterized by comprising the following parts:
a network server module (1), which provides an execution environment for the incremental processor module (2) and the big data storage module (3) and hosts an HTTP-based command and control API for the data streams in the big data storage module (3);
an incremental processor module (2), which obtains the execution environment provided by the network server module (1), stores its cache queue in the big data storage module (3), and reads the cache queue from the big data storage module (3);
a big data storage module (3), which obtains the execution environment provided by the network server module (1) and stores the cache queue of the incremental processor module (2), the HTTP-based command and control API for the data streams in the big data storage module (3) being hosted by the network server module (1);
the output end of the network server module (1) is connected to the input end of the incremental processor module (2), the incremental processor module (2) is in bidirectional communication with the big data storage module (3), and the network server module (1) is in bidirectional communication with the big data storage module (3);
the incremental processor module (2) comprises an incremental data processor (21), an incremental extraction processor (22) and an expander (23); the output ends of the incremental data processor (21) and the expander (23) are connected to the input end of the incremental extraction processor (22); the input ends of the incremental data processor (21), the incremental extraction processor (22) and the expander (23) are all connected to the output end of the network server module (1); the output end of the incremental data processor (21) is connected to the input end of the big data storage module (3); and the incremental extraction processor (22) is in bidirectional communication with the big data storage module (3);
the big data storage module (3) comprises a data stream source library (31), a data stream file library (32) and a big data content library (33); the output end of the data stream file library (32) is connected to the input end of the big data content library (33); the data stream source library (31), the data stream file library (32) and the big data content library (33) are all in bidirectional communication with the network server module (1); the input end of the data stream source library (31) is connected to the output end of the incremental data processor (21); the output end of the data stream source library (31) is connected to the input end of the incremental extraction processor (22); and the output end of the incremental extraction processor (22) is connected to the input end of the data stream file library (32);
the network server module (1) provides an execution environment for the incremental data processor (21), the incremental extraction processor (22), the expander (23), the data stream source library (31), the data stream file library (32) and the big data content library (33), and hosts the HTTP-based command and control API for the data streams in the data stream source library (31), the data stream file library (32) and the big data content library (33);
the incremental data processor (21) acquires incremental data and stores the cache queue holding the incremental data in the data stream source library (31); the incremental extraction processor (22) reads the incremental data from the data stream source library (31); the expander (23) obtains the execution timetable; the incremental extraction processor (22) extracts the incremental data to obtain incremental extraction data; and the cache queue holding the incremental extraction data is stored in the data stream file library (32) and is then further stored in the big data content library (33).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810548955.9A CN108874313B (en) | 2018-05-31 | 2018-05-31 | Data exchange platform for big data increment extraction based on data stream |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810548955.9A CN108874313B (en) | 2018-05-31 | 2018-05-31 | Data exchange platform for big data increment extraction based on data stream |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108874313A (en) | 2018-11-23 |
CN108874313B (en) | 2021-11-23 |
Family
ID=64336446
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810548955.9A Active CN108874313B (en) | 2018-05-31 | 2018-05-31 | Data exchange platform for big data increment extraction based on data stream |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108874313B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114839889B (en) * | 2022-05-05 | 2023-06-16 | 融营智能科技(上海)有限公司 | Mode switching method and system based on big data analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577474A (en) * | 2012-08-03 | 2014-02-12 | 阿里巴巴集团控股有限公司 | Method and system for updating database |
CN106709029A (en) * | 2016-12-28 | 2017-05-24 | 上海斐讯数据通信技术有限公司 | File hierarchical processing method and processing system based on Hadoop and MySQL |
CN107562931A (en) * | 2017-09-15 | 2018-01-09 | 新智云数据服务有限公司 | Data pick-up system and data abstracting method |
CN107908683A (en) * | 2017-10-31 | 2018-04-13 | 安徽四创电子股份有限公司 | Wireless city big data off-line processing system and its big data processed offline method |
CN108009207A (en) * | 2017-11-06 | 2018-05-08 | 东软集团股份有限公司 | Incremental data inquiry method and device, storage medium, electronic equipment |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8825601B2 (en) * | 2010-02-01 | 2014-09-02 | Microsoft Corporation | Logical data backup and rollback using incremental capture in a distributed database |
CN102841897B (en) * | 2011-06-23 | 2016-03-02 | 阿里巴巴集团控股有限公司 | A kind of method, Apparatus and system realizing incremental data and extract |
CN103107903B (en) * | 2011-11-15 | 2016-01-27 | 中国移动通信集团广东有限公司 | A kind of resource data shares method and resource data shared device |
CN107395669B (en) * | 2017-06-01 | 2020-04-07 | 华南理工大学 | Data acquisition method and system based on streaming real-time distributed big data |
Also Published As
Publication number | Publication date |
---|---|
CN108874313A (en) | 2018-11-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11093466B2 (en) | Incremental out-of-place updates for index structures | |
US10061834B1 (en) | Incremental out-of-place updates for datasets in data stores | |
US11429630B2 (en) | Tiered storage for data processing | |
US10860562B1 (en) | Dynamic predicate indexing for data stores | |
US10824612B2 (en) | Key ticketing system with lock-free concurrency and versioning | |
CN116383238B (en) | Data virtualization system, method, device, equipment and medium based on graph structure | |
US20210109912A1 (en) | Multi-layered key-value storage | |
CN113407649A (en) | Data warehouse modeling method and device, electronic equipment and storage medium | |
US10423617B2 (en) | Remote query optimization in multi data sources | |
CN114265814B (en) | Data lake file system based on object storage | |
CN108874313B (en) | Data exchange platform for big data increment extraction based on data stream | |
US7979385B2 (en) | Selective exposure to a data consumer | |
US10996855B2 (en) | Memory allocation in a data analytics system | |
US20230281211A1 (en) | Adding a read-only query engine to perform queries to a point-in-time of a write-accessible database | |
CN111198917A (en) | Data processing method, device, equipment and storage medium | |
Adam et al. | Big data management and analysis | |
CN111459980A (en) | Monitoring data storage and query method and device | |
CN113836235B (en) | Data processing method based on data center and related equipment thereof | |
US20230169048A1 (en) | Detecting idle periods at network endpoints for management actions at processing clusters for managed databases | |
US20180232406A1 (en) | Big data database system | |
US11055266B2 (en) | Efficient key data store entry traversal and result generation | |
Hilley et al. | Persistent temporal streams | |
Iwazume et al. | Big data in memory: Benchimarking in memory database using the distributed key-value store for machine to machine communication | |
CN109492004A (en) | A kind of number fishery isomeric data storage method, system and device | |
CN113051244B (en) | Data access method and device, and data acquisition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |