CN108874313B - Data exchange platform for big data increment extraction based on data stream - Google Patents

Publication number
CN108874313B
Authority
CN
China
Prior art keywords
data
incremental
processor
module
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810548955.9A
Other languages
Chinese (zh)
Other versions
CN108874313A (en)
Inventor
胡刚
范联伟
余保华
徐圣吉
张金国
展昭
李鑫
邓惠元
张国林
金文林
徐剑
刘春珲
胡斌
谢伟
赵树林
王梦园
杨培韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Create Electronics Co ltd
Original Assignee
Sun Create Electronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-31
Filing date
2018-05-31
Publication date
2021-11-23
Application filed by Sun Create Electronics Co ltd
Priority to CN201810548955.9A
Publication of CN108874313A
Application granted
Publication of CN108874313B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604: Improving or facilitating administration, e.g. storage management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638: Organizing or formatting or addressing of data
    • G06F3/0644: Management of space entities, e.g. partitions, extents, pools
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662: Virtualisation aspects
    • G06F3/0665: Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a data exchange platform for incremental extraction of big data based on data streams, comprising: a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API for the data streams in the big data storage module; an incremental processor module, which runs in the execution environment provided by the network server module and stores its cache queue in the big data storage module; and a big data storage module, which runs in the execution environment provided by the network server module and stores the cache queue of the incremental processor module. The output of the network server module is connected to the input of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module. The data exchange platform makes incremental data easy to manage, is convenient to use, and offers flexible scaling and extensibility.

Description

Data exchange platform for big data increment extraction based on data stream
Technical Field
The invention belongs to the technical field of incremental extraction of big data, and in particular relates to a data exchange platform for incremental extraction of big data based on data streams.
Background
With the continuous development of the internet and the wide adoption of intelligent terminal devices, more and more terminals can access the internet, and the volume of data keeps growing. Data of all kinds matters to residents' daily life, enterprise decisions, and government policy making, so the problem of storing it has become especially important. Originally, data was stored as files, which wastes resources, is difficult to manage, hinders searching and analysis, and does not suit a society in which the internet is developing rapidly. Database technology arose in response: a database is a repository that organizes, stores, and manages data according to a data structure. With the development of information technology and markets, particularly since the 1990s, data management has no longer been limited to storing and managing data and has turned toward the varied forms of data management that users require.
The data involved today spans many domains and is very large, and data scattered across different sources needs to be concentrated in one data mart so that it is easier to analyze, aggregate, and use. Conventional databases cannot support statistical analysis at this scale, so a technique for centralizing the data is required. When an enterprise runs multiple systems, the primary task is incremental extraction of the data streams between data production systems and data processing systems.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a data exchange platform for incremental extraction of big data based on data streams. The platform makes incremental data easy to manage, is convenient to use, and offers flexible scaling and extensibility.
To achieve the above object, the invention provides a data exchange platform for incremental extraction of big data based on data streams, comprising the following parts:
a network server module, which provides an execution environment for the incremental processor module and the big data storage module and hosts an HTTP-based command and control API (application programming interface) for the data streams in the big data storage module;
an incremental processor module, which runs in the execution environment provided by the network server module, stores its cache queue in the big data storage module, and reads the cache queue back from the big data storage module;
a big data storage module, which runs in the execution environment provided by the network server module and stores the cache queue of the incremental processor module; the command and control API for the data streams in the big data storage module is hosted by the network server module;
the output of the network server module is connected to the input of the incremental processor module, the incremental processor module is in bidirectional communication with the big data storage module, and the network server module is in bidirectional communication with the big data storage module.
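The connections among the three modules can be pictured as a small directed graph. The following Python sketch is illustrative only: the patent names the modules but gives no code, so every identifier here (`NETWORK_SERVER`, `build_topology`, `is_bidirectional`, and so on) is an assumption introduced for the example.

```python
from collections import defaultdict

# Hypothetical identifiers for the three modules named in the text.
NETWORK_SERVER = "network_server_module"
INCREMENTAL_PROCESSOR = "incremental_processor_module"
BIG_DATA_STORAGE = "big_data_storage_module"


def build_topology():
    """Return the module links described in the text as an adjacency map."""
    links = defaultdict(set)

    def one_way(src, dst):
        links[src].add(dst)

    def two_way(a, b):
        links[a].add(b)
        links[b].add(a)

    # Output of the network server feeds the input of the incremental processor.
    one_way(NETWORK_SERVER, INCREMENTAL_PROCESSOR)
    # The incremental processor and the storage module talk in both directions.
    two_way(INCREMENTAL_PROCESSOR, BIG_DATA_STORAGE)
    # The network server and the storage module talk in both directions.
    two_way(NETWORK_SERVER, BIG_DATA_STORAGE)
    return {module: set(targets) for module, targets in links.items()}


def is_bidirectional(topology, a, b):
    """True when both a -> b and b -> a are present."""
    return b in topology.get(a, set()) and a in topology.get(b, set())


TOPOLOGY = build_topology()
```

Under these assumptions, `is_bidirectional(TOPOLOGY, NETWORK_SERVER, INCREMENTAL_PROCESSOR)` is false, which mirrors the one-way output-to-input connection stated in the text.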
Preferably, the incremental processor module comprises an incremental data processor, an incremental extraction processor, and an expander. The outputs of the incremental data processor and the expander are connected to the input of the incremental extraction processor. The inputs of the incremental data processor, the incremental extraction processor, and the expander are all connected to the output of the network server module. The output of the incremental data processor is connected to the input of the big data storage module, and the incremental extraction processor is in bidirectional communication with the big data storage module.
Preferably, the big data storage module comprises a data stream source library, a data stream file library, and a big data content library. The output of the data stream file library is connected to the input of the big data content library. The data stream source library, the data stream file library, and the big data content library are each in bidirectional communication with the network server module. The input of the data stream source library is connected to the output of the incremental data processor, the output of the data stream source library is connected to the input of the incremental extraction processor, and the output of the incremental extraction processor is connected to the input of the data stream file library.
More preferably, the network server module provides an execution environment for the incremental data processor, the incremental extraction processor, the expander, the data stream source library, the data stream file library, and the big data content library, and hosts the HTTP-based command and control APIs for the data streams in the data stream source library, the data stream file library, and the big data content library.
Preferably, the incremental data processor acquires incremental data and stores the cache queue holding that data in the data stream source library. The incremental extraction processor reads the incremental data from the data stream source library, the expander obtains the execution timetable, and the incremental extraction processor extracts the incremental data to produce incrementally extracted data. The cache queue holding the extracted data is stored in the data stream file library and then in the big data content library.
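The movement of cache queues through the three libraries can be sketched with in-memory queues. This is a minimal illustration, not the patented implementation: the class `Platform`, its method names, and the `timetable_allows` flag standing in for the expander's timetable are all assumptions.

```python
from collections import deque


class Platform:
    """Toy model of the flow: source library -> file library -> content library."""

    def __init__(self):
        self.source_library = deque()  # data stream source library (cache queue)
        self.file_library = deque()    # data stream file library (cache queue)
        self.content_library = []      # big data content library

    def ingest(self, records):
        """Incremental data processor: queue newly acquired incremental data."""
        self.source_library.extend(records)

    def extract(self, timetable_allows=True, transform=lambda record: record):
        """Incremental extraction processor: drain the source queue when scheduled.

        `timetable_allows` is a stand-in for the expander's execution timetable.
        Returns the number of records extracted.
        """
        if not timetable_allows:
            return 0
        count = 0
        while self.source_library:
            record = self.source_library.popleft()
            self.file_library.append(transform(record))
            count += 1
        return count

    def publish(self):
        """Move extracted records from the file library to the content library."""
        while self.file_library:
            self.content_library.append(self.file_library.popleft())
```

A caller would run `ingest`, then `extract` when the timetable permits, then `publish`; records that arrive while extraction is not scheduled simply wait in the source queue.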
The beneficial effects of the invention are as follows:
1) The data exchange platform comprises a network server module, an incremental processor module, and a big data storage module. The big data storage module specifies storage locations on multiple file systems, namely the data stream source library, the data stream file library, and the big data content library, so that the libraries can be placed on different physical partitions and contention on any single volume is reduced. The data exchange platform makes incremental data easy to manage, is convenient to use, and offers flexible scaling and extensibility.
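One way to picture the separate storage locations is a configuration that maps each library to its own root directory and checks that no two libraries share a volume. The mount points and helper names below are hypothetical; the patent does not specify paths or a configuration format.

```python
from pathlib import PurePosixPath

# Hypothetical mount points, one per library, each on its own volume.
STORAGE_ROOTS = {
    "data_stream_source_library": PurePosixPath("/mnt/vol_a/source"),
    "data_stream_file_library": PurePosixPath("/mnt/vol_b/files"),
    "big_data_content_library": PurePosixPath("/mnt/vol_c/content"),
}


def volume_of(path):
    """Assume the volume is the first two directories under /, e.g. /mnt/vol_a."""
    return PurePosixPath(*path.parts[:3])


def shares_volume(roots):
    """True when at least two libraries are configured on the same volume."""
    volumes = [volume_of(path) for path in roots.values()]
    return len(set(volumes)) < len(volumes)
```

With the roots above, `shares_volume(STORAGE_ROOTS)` is false, which is the situation the text describes as reducing contention on a single volume.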
Drawings
FIG. 1 is a schematic diagram of the data exchange platform of the invention.
FIG. 2 is a flow chart of incremental data extraction in the data exchange platform of the invention.
Reference numerals: 1 - network server module; 2 - incremental processor module; 3 - big data storage module; 21 - incremental data processor; 22 - incremental extraction processor; 23 - expander; 31 - data stream source library; 32 - data stream file library; 33 - big data content library.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the invention.
As shown in FIG. 1, the invention provides a data exchange platform for incremental extraction of big data based on data streams, comprising the following parts:
a network server module 1, which provides an execution environment for the incremental processor module 2 and the big data storage module 3 and hosts the HTTP-based command and control API for the data streams in the big data storage module 3;
an incremental processor module 2, which runs in the execution environment provided by the network server module 1, stores its cache queue in the big data storage module 3, and reads the cache queue back from the big data storage module 3.
Specifically, the incremental processor module 2 comprises an incremental data processor 21, an incremental extraction processor 22, and an expander 23. The outputs of the incremental data processor 21 and the expander 23 are both connected to the input of the incremental extraction processor 22. The inputs of the incremental data processor 21, the incremental extraction processor 22, and the expander 23 are all connected to the output of the network server module 1. The output of the incremental data processor 21 is connected to the input of the big data storage module 3, and the incremental extraction processor 22 is in bidirectional communication with the big data storage module 3.
a big data storage module 3, which runs in the execution environment provided by the network server module 1 and stores the cache queue of the incremental processor module 2; the HTTP-based command and control API for the data streams in the big data storage module 3 is hosted by the network server module 1.
Specifically, the big data storage module 3 comprises a data stream source library 31, a data stream file library 32, and a big data content library 33. The output of the data stream file library 32 is connected to the input of the big data content library 33. The data stream source library 31, the data stream file library 32, and the big data content library 33 are each in bidirectional communication with the network server module 1. The input of the data stream source library 31 is connected to the output of the incremental data processor 21, the output of the data stream source library 31 is connected to the input of the incremental extraction processor 22, and the output of the incremental extraction processor 22 is connected to the input of the data stream file library 32.
The output of the network server module 1 is connected to the input of the incremental processor module 2, the incremental processor module 2 is in bidirectional communication with the big data storage module 3, and the network server module 1 is in bidirectional communication with the big data storage module 3.
As shown in FIG. 2, the network server module 1 provides an execution environment for the incremental data processor 21, the incremental extraction processor 22, the expander 23, the data stream source library 31, the data stream file library 32, and the big data content library 33, and hosts the HTTP-based command and control APIs for the data streams in the data stream source library 31, the data stream file library 32, and the big data content library 33.
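An HTTP-based command and control API of this kind could look roughly like the sketch below, built on Python's standard `http.server` module. The URL scheme (`/streams/<library>/start`), the `running` flag, and all function and class names are assumptions made for illustration; the patent does not define endpoints or payloads.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical control state for the data stream of each library.
STREAMS = {
    "source": {"running": False},   # data stream source library
    "file": {"running": False},     # data stream file library
    "content": {"running": False},  # big data content library
}


def handle_command(method, path):
    """Route a request such as POST /streams/source/start.

    Returns a (status code, JSON-serializable body) pair.
    """
    parts = [segment for segment in path.split("/") if segment]
    if len(parts) < 2 or parts[0] != "streams" or parts[1] not in STREAMS:
        return 404, {"error": "unknown stream"}
    stream = STREAMS[parts[1]]
    if method == "GET" and len(parts) == 2:
        return 200, dict(stream)
    if method == "POST" and len(parts) == 3 and parts[2] in ("start", "stop"):
        stream["running"] = parts[2] == "start"
        return 200, dict(stream)
    return 400, {"error": "unsupported command"}


class ControlHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_command."""

    def _reply(self, method):
        status, body = handle_command(method, self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._reply("GET")

    def do_POST(self):
        self._reply("POST")


def make_server(port=8080):
    """Create (but do not start) a control server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), ControlHandler)
```

Calling `make_server().serve_forever()` would start the listener. Keeping the routing in the plain function `handle_command` lets the command logic be exercised without opening a socket.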
The incremental data processor 21 acquires incremental data and stores the cache queue holding that data in the data stream source library 31. The incremental extraction processor 22 reads the incremental data from the data stream source library 31, the expander 23 obtains the execution timetable, and the incremental extraction processor 22 extracts the incremental data to produce incrementally extracted data. The cache queue holding the extracted data is stored in the data stream file library 32 and then in the big data content library 33.
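The text does not say how the increment is identified or what form the execution timetable takes. As one possible reading, the sketch below uses a timestamp watermark to select new records and a set of HH:MM slots as the timetable. Both choices, and the names `due`, `extract_increment`, and the `ts` field, are assumptions and not part of the patent.

```python
from datetime import datetime, time


def due(now, timetable):
    """Expander stand-in: True when `now` falls in a scheduled HH:MM slot."""
    return time(now.hour, now.minute) in timetable


def extract_increment(records, watermark):
    """Select records newer than the watermark.

    Returns the fresh records and the advanced watermark. Each record is a
    dict with a comparable "ts" field (an assumed schema).
    """
    fresh = [record for record in records if record["ts"] > watermark]
    new_watermark = max((record["ts"] for record in fresh), default=watermark)
    return fresh, new_watermark
```

In this reading, each scheduled run calls `extract_increment` with the watermark saved from the previous run, so a record is extracted once and a run with no new data leaves the watermark unchanged.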
In summary, the invention provides a data exchange platform for incremental extraction of big data based on data streams that makes incremental data easy to manage, is convenient to use, and offers flexible scaling and extensibility.

Claims (1)

1. A data exchange platform for incremental extraction of big data based on data streams, characterized by comprising the following parts:
a network server module (1), which provides an execution environment for the incremental processor module (2) and the big data storage module (3) and hosts an HTTP-based command and control API for the data streams in the big data storage module (3);
an incremental processor module (2), which runs in the execution environment provided by the network server module (1), stores its cache queue in the big data storage module (3), and reads the cache queue back from the big data storage module (3);
a big data storage module (3), which runs in the execution environment provided by the network server module (1) and stores the cache queue of the incremental processor module (2), the HTTP-based command and control API for the data streams in the big data storage module (3) being hosted by the network server module (1);
the output of the network server module (1) is connected to the input of the incremental processor module (2), the incremental processor module (2) is in bidirectional communication with the big data storage module (3), and the network server module (1) is in bidirectional communication with the big data storage module (3);
the incremental processor module (2) comprises an incremental data processor (21), an incremental extraction processor (22), and an expander (23); the outputs of the incremental data processor (21) and the expander (23) are connected to the input of the incremental extraction processor (22); the inputs of the incremental data processor (21), the incremental extraction processor (22), and the expander (23) are all connected to the output of the network server module (1), the output of the incremental data processor (21) is connected to the input of the big data storage module (3), and the incremental extraction processor (22) is in bidirectional communication with the big data storage module (3);
the big data storage module (3) comprises a data stream source library (31), a data stream file library (32), and a big data content library (33), the output of the data stream file library (32) being connected to the input of the big data content library (33); the data stream source library (31), the data stream file library (32), and the big data content library (33) are each in bidirectional communication with the network server module (1); the input of the data stream source library (31) is connected to the output of the incremental data processor (21), the output of the data stream source library (31) is connected to the input of the incremental extraction processor (22), and the output of the incremental extraction processor (22) is connected to the input of the data stream file library (32);
the network server module (1) provides an execution environment for the incremental data processor (21), the incremental extraction processor (22), the expander (23), the data stream source library (31), the data stream file library (32), and the big data content library (33), and hosts the HTTP-based command and control APIs for the data streams in the data stream source library (31), the data stream file library (32), and the big data content library (33);
the incremental data processor (21) acquires incremental data and stores the cache queue holding the incremental data in the data stream source library (31); the incremental extraction processor (22) reads the incremental data from the data stream source library (31), the expander (23) obtains the execution timetable, and the incremental extraction processor (22) extracts the incremental data to obtain incrementally extracted data; the cache queue holding the incrementally extracted data is stored in the data stream file library (32) and then in the big data content library (33).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810548955.9A CN108874313B (en) 2018-05-31 2018-05-31 Data exchange platform for big data increment extraction based on data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810548955.9A CN108874313B (en) 2018-05-31 2018-05-31 Data exchange platform for big data increment extraction based on data stream

Publications (2)

Publication Number Publication Date
CN108874313A CN108874313A (en) 2018-11-23
CN108874313B CN108874313B (en) 2021-11-23

Family

ID=64336446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810548955.9A Active CN108874313B (en) 2018-05-31 2018-05-31 Data exchange platform for big data increment extraction based on data stream

Country Status (1)

Country Link
CN (1) CN108874313B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114839889B (en) * 2022-05-05 2023-06-16 融营智能科技(上海)有限公司 Mode switching method and system based on big data analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577474A (en) * 2012-08-03 2014-02-12 阿里巴巴集团控股有限公司 Method and system for updating database
CN106709029A (en) * 2016-12-28 2017-05-24 上海斐讯数据通信技术有限公司 File hierarchical processing method and processing system based on Hadoop and MySQL
CN107562931A (en) * 2017-09-15 2018-01-09 新智云数据服务有限公司 Data pick-up system and data abstracting method
CN107908683A (en) * 2017-10-31 2018-04-13 安徽四创电子股份有限公司 Wireless city big data off-line processing system and its big data processed offline method
CN108009207A (en) * 2017-11-06 2018-05-08 东软集团股份有限公司 Incremental data inquiry method and device, storage medium, electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8825601B2 (en) * 2010-02-01 2014-09-02 Microsoft Corporation Logical data backup and rollback using incremental capture in a distributed database
CN102841897B (en) * 2011-06-23 2016-03-02 阿里巴巴集团控股有限公司 A kind of method, Apparatus and system realizing incremental data and extract
CN103107903B (en) * 2011-11-15 2016-01-27 中国移动通信集团广东有限公司 A kind of resource data shares method and resource data shared device
CN107395669B (en) * 2017-06-01 2020-04-07 华南理工大学 Data acquisition method and system based on streaming real-time distributed big data

Also Published As

Publication number Publication date
CN108874313A (en) 2018-11-23


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant