CN108874313B

CN108874313B - Data exchange platform for big data increment extraction based on data stream

Info

Publication number: CN108874313B
Application number: CN201810548955.9A
Authority: CN
Inventors: 胡刚; 范联伟; 余保华; 徐圣吉; 张金国; 展昭; 李鑫; 邓惠元; 张国林; 金文林; 徐剑; 刘春珲; 胡斌; 谢伟; 赵树林; 王梦园; 杨培韬
Original assignee: Sun Create Electronics Co ltd
Current assignee: Sun Create Electronics Co ltd
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2021-11-23
Anticipated expiration: 2038-05-31
Also published as: CN108874313A

Abstract

The invention relates to a data exchange platform for incremental extraction of big data based on data streams, comprising: a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API for the data streams in the big data storage module; an incremental processor module, which obtains the execution environment provided by the network server module and stores its cache queue in the big data storage module; and a big data storage module, which obtains the execution environment provided by the network server module and stores the cache queue of the incremental processor module. The output end of the network server module is connected to the input end of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module. The data exchange platform of the invention makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.

Description

Data exchange platform for big data increment extraction based on data stream

Technical Field

The invention belongs to the technical field of incremental extraction of big data, and in particular relates to a data exchange platform for incremental extraction of big data based on data streams.

Background

With the continuous development of the internet and the widespread use of intelligent terminal devices, more and more devices can access the internet, and the volume of data keeps growing. This data is important to residents' daily lives, to enterprise decision-making and to government policy-making, which makes data storage an increasingly important problem. The earliest data was stored as files, which wasted resources, was difficult to manage, hindered searching and analysis, and is ill-suited to a society in which the internet is developing rapidly. Database technology arose in response: a database is a repository that organizes, stores and manages data according to a data structure. With the development of information technology and markets, particularly since the 1990s, data management is no longer limited to storing and managing data but has evolved into the various forms of data management that users require.

The data now involved is broad in scope and huge in volume, and data scattered across many sources needs to be concentrated in a single data mart to facilitate analysis, statistics and use. Conventional databases can no longer support statistical analysis at this scale, so a technique for centralizing the data is needed. Where an enterprise runs multiple systems, the primary task is to solve the problem of incremental extraction of data streams between the data production systems and the data processing systems.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a data exchange platform for incremental extraction of big data based on data streams, which makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.

To achieve the above object, the invention provides a data exchange platform for incremental extraction of big data based on data streams, which comprises the following parts:

a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API (application programming interface) for the data streams in the big data storage module;

an incremental processor module, which obtains the execution environment provided by the network server module, stores its cache queue in the big data storage module, and reads the cache queue from the big data storage module;

a big data storage module, which obtains the execution environment provided by the network server module and stores the cache queue of the incremental processor module, the HTTP-based command and control API for the data streams in the big data storage module being hosted by the network server module;

wherein the output end of the network server module is connected to the input end of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module.

Preferably, the incremental processor module comprises an incremental data processor, an incremental extraction processor and an expander; the output ends of the incremental data processor and the expander are both connected to the input end of the incremental extraction processor; the input ends of the incremental data processor, the incremental extraction processor and the expander are all connected to the output end of the network server module; the output end of the incremental data processor is connected to the input end of the big data storage module; and the incremental extraction processor is in bidirectional communication with the big data storage module.

Preferably, the big data storage module comprises a data stream source library, a data stream file library and a big data content library, and the output end of the data stream file library is connected with the input end of the big data content library; the data stream source library, the data stream file library and the big data content library are in bidirectional communication connection with the network server module; the input end of the data stream source library is connected with the output end of the incremental data processor, the output end of the data stream source library is connected with the input end of the incremental extraction processor, and the output end of the incremental extraction processor is connected with the input end of the data stream file library.

Still further preferably, the network server module provides an execution environment for the incremental data processor, the incremental extraction processor, the expander, the data stream source library, the data stream file library and the big data content library, and hosts the HTTP-based command and control APIs for the data streams in the data stream source library, the data stream file library and the big data content library.
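As a rough illustration of what an HTTP-based command and control API for the data streams might look like, the following Python sketch uses only the standard library. The route layout (`/streams/<library>/<command>`), the command names `start`, `stop` and `status`, the library identifiers and the JSON response shape are all assumptions made for illustration; the patent does not specify them.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical identifiers for the three libraries of the big data storage module.
LIBRARIES = ("source", "file", "content")

# In-memory run state per data stream; a real platform would persist this.
stream_state = {name: "stopped" for name in LIBRARIES}


def handle_command(method, path, state):
    """Map an HTTP method and path to a (status_code, response_dict) pair."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3 or parts[0] != "streams" or parts[1] not in state:
        return 404, {"error": "unknown data stream"}
    library, command = parts[1], parts[2]
    if method == "GET" and command == "status":
        return 200, {"library": library, "state": state[library]}
    if method == "POST" and command in ("start", "stop"):
        state[library] = "running" if command == "start" else "stopped"
        return 200, {"library": library, "state": state[library]}
    return 405, {"error": "unsupported command"}


class ControlHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that delegates every request to handle_command."""

    def _reply(self):
        status, body = handle_command(self.command, self.path, stream_state)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = _reply
    do_POST = _reply


def serve(port=8080):
    """Run the control API until interrupted (the port is an arbitrary choice)."""
    ThreadingHTTPServer(("127.0.0.1", port), ControlHandler).serve_forever()
```

Keeping the routing logic in the plain function `handle_command` lets it be exercised without opening a socket; calling `serve()` exposes the same logic over HTTP.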

Preferably, the incremental data processor acquires incremental data and stores the cache queue holding that data in the data stream source library; the incremental extraction processor reads the incremental data from the data stream source library; the expander obtains an execution timetable; the incremental extraction processor extracts the incremental data to obtain incrementally extracted data and stores the cache queue holding the extracted data in the data stream file library; and that cache queue is further stored in the big data content library.
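The staged flow described above can be sketched with one in-memory FIFO queue per library. This is a minimal sketch under stated assumptions: the function names `ingest`, `extract` and `publish`, the record shape, and the use of `collections.deque` as the cache queue are illustrative choices, not details taken from the patent.

```python
from collections import deque

# One FIFO cache queue per library of the big data storage module.
source_library = deque()   # data stream source library
file_library = deque()     # data stream file library
content_library = deque()  # big data content library


def ingest(records):
    """Incremental data processor: queue newly acquired records in the source library."""
    source_library.extend(records)


def extract(transform=lambda record: record):
    """Incremental extraction processor: drain the source library into the file library.

    `transform` stands in for whatever extraction logic is applied; the
    identity default is a placeholder assumption.
    """
    moved = 0
    while source_library:
        file_library.append(transform(source_library.popleft()))
        moved += 1
    return moved


def publish():
    """Forward extracted records from the file library to the content library."""
    moved = 0
    while file_library:
        content_library.append(file_library.popleft())
        moved += 1
    return moved


# Hypothetical records passing through all three stages in order.
ingest([{"id": 1, "value": "a"}, {"id": 2, "value": "b"}])
extract()
publish()
print(list(content_library))
```

Each stage only touches the queue before it and the queue after it, which mirrors the one-directional connections between the source library, the extraction processor, the file library and the content library.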

The beneficial effects of the invention are as follows:

The data exchange platform comprises a network server module, an incremental processor module and a big data storage module. The big data storage module specifies storage locations on a plurality of file systems, namely the data stream source library, the data stream file library and the big data content library, so that the libraries can be placed on different physical partitions and contention on any single volume is reduced. The data exchange platform makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.
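One way to check that the three libraries really are placed on different partitions is to compare the device identifier of the file system behind each storage location. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it relies on `os.stat(...).st_dev`, which usually, though not on every platform or storage setup, differs between separately mounted partitions.

```python
import os
from collections import defaultdict


def volumes_in_use(locations):
    """Group library names by the device ID of the file system holding each path.

    `locations` maps a library name to an existing directory. Libraries that
    end up in the same group share a volume and may contend for it.
    """
    groups = defaultdict(list)
    for name, path in locations.items():
        groups[os.stat(path).st_dev].append(name)
    return dict(groups)


def shares_a_volume(locations):
    """Return True if at least two libraries are stored on the same volume."""
    return any(len(names) > 1 for names in volumes_in_use(locations).values())
```

With a mapping such as `{"source": "/mnt/vol1/source", "file": "/mnt/vol2/file", "content": "/mnt/vol3/content"}` (hypothetical mount points), `shares_a_volume` would return `False` only when each directory sits on its own device.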

Drawings

FIG. 1 is a schematic diagram of a data exchange platform according to the present invention.

FIG. 2 is a flow chart of data increment extraction of the data exchange platform of the present invention.

Reference numerals: 1 - network server module; 2 - incremental processor module; 3 - big data storage module; 21 - incremental data processor; 22 - incremental extraction processor; 23 - expander; 31 - data stream source library; 32 - data stream file library; 33 - big data content library.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the invention provides a data exchange platform for incremental extraction of big data based on data streams, which comprises the following parts:

a network server module 1, which provides an execution environment for the incremental processor module 2 and the big data storage module 3 and hosts the HTTP-based command and control API for the data streams in the big data storage module 3;

an incremental processor module 2, which obtains the execution environment provided by the network server module 1, stores its cache queue in the big data storage module 3, and reads the cache queue from the big data storage module 3.

Specifically, the incremental processor module 2 includes an incremental data processor 21, an incremental extraction processor 22 and an expander 23; the output ends of the incremental data processor 21 and the expander 23 are both connected to the input end of the incremental extraction processor 22; the input ends of the incremental data processor 21, the incremental extraction processor 22 and the expander 23 are all connected to the output end of the network server module 1; the output end of the incremental data processor 21 is connected to the input end of the big data storage module 3; and the incremental extraction processor 22 is in bidirectional communication with the big data storage module 3.

a big data storage module 3, which obtains the execution environment provided by the network server module 1 and stores the cache queue of the incremental processor module 2, the HTTP-based command and control API for the data streams in the big data storage module 3 being hosted by the network server module 1.

Specifically, the big data storage module 3 includes a data stream source library 31, a data stream file library 32 and a big data content library 33; the output end of the data stream file library 32 is connected to the input end of the big data content library 33; the data stream source library 31, the data stream file library 32 and the big data content library 33 are all in bidirectional communication with the network server module 1; the input end of the data stream source library 31 is connected to the output end of the incremental data processor 21; the output end of the data stream source library 31 is connected to the input end of the incremental extraction processor 22; and the output end of the incremental extraction processor 22 is connected to the input end of the data stream file library 32.

The output end of the network server module 1 is connected to the input end of the incremental processor module 2, the incremental processor module 2 is in bidirectional communication with the big data storage module 3, and the network server module 1 is in bidirectional communication with the big data storage module 3.
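The connections listed above can be written down as a small directed graph keyed by the reference numerals, which makes it easy to confirm that data entering at the incremental data processor 21 can reach the big data content library 33. The dictionary layout and the helper `downstream` are illustrative assumptions; only the edges themselves are taken from the description.

```python
from collections import deque

# Component names keyed by the reference numerals used in the figures.
NAMES = {
    1: "network server module",
    21: "incremental data processor",
    22: "incremental extraction processor",
    23: "expander",
    31: "data stream source library",
    32: "data stream file library",
    33: "big data content library",
}

# Data-flow connections named in the description (output end -> input end).
DATA_FLOW = {
    21: [22, 31],
    23: [22],
    31: [22],
    22: [32],
    32: [33],
    33: [],
}

# Components attached to the network server module (1): its output feeds the
# three processors, and it communicates bidirectionally with the three libraries.
SERVER_LINKS = {"feeds": [21, 22, 23], "bidirectional": [31, 32, 33]}


def downstream(start):
    """Return every component reachable from `start` along DATA_FLOW edges."""
    seen = set()
    queue = deque(DATA_FLOW.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(DATA_FLOW.get(node, []))
    return seen


for numeral in (21, 23):
    reached = sorted(downstream(numeral))
    print(NAMES[numeral], "->", [NAMES[n] for n in reached])
```

The control links of the network server module are kept separate from the data-flow edges so that reachability reflects only the path the incremental data takes.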

As shown in FIG. 2, the network server module 1 provides an execution environment for the incremental data processor 21, the incremental extraction processor 22, the expander 23, the data stream source library 31, the data stream file library 32 and the big data content library 33, and hosts the HTTP-based command and control APIs for the data streams in the data stream source library 31, the data stream file library 32 and the big data content library 33.

The incremental data processor 21 acquires incremental data and stores the cache queue holding that data in the data stream source library 31. The incremental extraction processor 22 reads the incremental data from the data stream source library 31, and the expander 23 obtains an execution timetable. The incremental extraction processor 22 then extracts the incremental data to obtain incrementally extracted data and stores the cache queue holding the extracted data in the data stream file library 32; that cache queue is further stored in the big data content library 33.
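The interplay between the execution timetable and the extraction step can be sketched as follows. The patent does not say how the timetable is built or how an increment is identified, so this sketch assumes a fixed-interval timetable and a high-water-mark on a monotonically increasing `seq` field; the function names and the record shape are likewise hypothetical.

```python
from datetime import datetime, timedelta


def build_timetable(start, interval, runs):
    """Expander stand-in: return the times at which extraction should run."""
    return [start + i * interval for i in range(runs)]


def extract_increment(records, watermark):
    """Return the records newer than `watermark` plus the advanced watermark.

    Each record is assumed to carry a monotonically increasing `seq` field.
    """
    fresh = [r for r in records if r["seq"] > watermark]
    new_watermark = max((r["seq"] for r in fresh), default=watermark)
    return fresh, new_watermark


def run_schedule(timetable, arrivals):
    """Replay the timetable against timestamped arrivals.

    `arrivals` is a list of (arrival_time, record) pairs standing in for the
    data stream source library. Returns the increment extracted at each run.
    """
    watermark = 0
    increments = []
    for run_time in timetable:
        visible = [rec for arrived, rec in arrivals if arrived <= run_time]
        fresh, watermark = extract_increment(visible, watermark)
        increments.append(fresh)
    return increments
```

Because the watermark only advances, each record is extracted in exactly one run, even though every run sees all records that have arrived so far.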

In summary, the invention provides a data exchange platform for incremental extraction of big data based on data streams that makes incremental data easy to manage, is convenient to use, and offers flexible scaling and expandability.

Claims

1. A data exchange platform for incremental extraction of big data based on data streams, characterized by comprising the following parts:

a network server module (1), which provides an execution environment for the incremental processor module (2) and the big data storage module (3) and hosts an HTTP-based command and control API for the data streams in the big data storage module (3);

the increment processor module (2) is used for acquiring an execution environment provided by the network server module (1), storing the cache queue in the big data storage module (3) and reading the cache queue from the big data storage module (3);

the big data storage module (3) is used for acquiring the execution environment provided by the network server module (1) and storing the cache queue of the incremental processor module (2), and the HTTP-based command and control API for the data streams in the big data storage module (3) is hosted by the network server module (1);

the output end of the network server module (1) is connected with the input end of the increment processor module (2), the increment processor module (2) is in bidirectional communication connection with the big data storage module (3), and the network server module (1) is in bidirectional communication connection with the big data storage module (3);

the increment processor module (2) comprises an increment data processor (21), an increment extraction processor (22) and an expander (23); the output ends of the increment data processor (21) and the expander (23) are connected with the input end of the increment extraction processor (22); the input ends of the incremental data processor (21), the incremental extraction processor (22) and the expander (23) are all connected with the output end of the network server module (1), the output end of the incremental data processor (21) is connected with the input end of the big data storage module (3), and the incremental extraction processor (22) is in bidirectional communication connection with the big data storage module (3);

the big data storage module (3) comprises a data stream source library (31), a data stream file library (32) and a big data content library (33), wherein the output end of the data stream file library (32) is connected with the input end of the big data content library (33); the data stream source library (31), the data stream file library (32) and the big data content library (33) are in bidirectional communication connection with the network server module (1); the input end of the data stream source library (31) is connected with the output end of the incremental data processor (21), the output end of the data stream source library (31) is connected with the input end of the incremental extraction processor (22), and the output end of the incremental extraction processor (22) is connected with the input end of the data stream file library (32);

the network server module (1) provides an execution environment for the incremental data processor (21), the incremental extraction processor (22), the expander (23), the data stream source library (31), the data stream file library (32) and the big data content library (33), and hosts the HTTP-based command and control APIs for the data streams in the data stream source library (31), the data stream file library (32) and the big data content library (33);

the incremental data processor (21) acquires incremental data and stores the cache queue holding that data in the data stream source library (31); the incremental extraction processor (22) reads the incremental data from the data stream source library (31); the expander (23) obtains an execution timetable; the incremental extraction processor (22) extracts the incremental data to obtain incrementally extracted data and stores the cache queue holding the extracted data in the data stream file library (32); and that cache queue is further stored in the big data content library (33).