CN107784093A - A distributed big data processing system - Google Patents
A distributed big data processing system
- Publication number
- CN107784093A (application CN201710954633.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The present invention provides a distributed big data processing system. Data input control modules receive data transmitted to the system and pass the received data to a data management control unit. The data management control unit receives the data sent by each data input control module, processes it, and routes it to a data output module according to the processing result. The data output module receives the data sent by the data management control unit and forwards it to a data processing unit according to the data processing type. The data processing unit processes the data according to the type of data received. The system implements stream processing of big data over the TCP protocol or through a WEB interface. Users can quickly build and start a distributed data processing flow, the processing of every piece of data in the flow is recorded so that users can trace its origin, and the system can connect to a variety of big data components to complete the circulation of data.
Description
Technical field
The present invention relates to the field of big data processing, and in particular to a distributed big data processing system.
Background
Big data computation is divided into batch computing and stream computing. The two modes suit different scenarios. Batch computing must store data first and compute afterwards, so its real-time performance is low. In stream computing, data is processed within a time window, so its real-time performance is stronger.
With the rapid development of emerging technologies such as the Internet of Things, mobile Internet, and social media, data is produced and propagated ever faster, while the value of data also drops sharply over time. How to quickly extract value from continuously generated massive data has become an urgent need.
The relatively mature big data stream processing frameworks currently on the market are Spark, Storm, and Samza. All three real-time computing systems are open-source distributed systems with advantages such as low latency, scalability, and high fault tolerance. They also have shortcomings: they cannot respond promptly to changing requirements, because jobs must be repackaged and re-uploaded; the data processing procedure is not intuitive; and none of them provides a data tracing function.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a distributed big data processing system, comprising: several data input control modules, a data management control unit, a data output module, and a data processing unit;
Each data input control module receives data transmitted to the system and passes the received data to the data management control unit;
The data management control unit receives the data sent by each data input control module, processes the data, and routes the data to the data output module for output according to the processing result;
The data output module receives the data sent by the data management control unit and forwards it to the data processing unit according to the data processing type;
The data processing unit processes the data according to the type of data received;
The data management control unit includes a document management module;
The document management module stores received data files in a hash map in JVM memory and records the metadata of the currently received data by means of a write-ahead log; the metadata includes all attributes of the data, a pointer to the data content, and the state of the data.
Preferably, the write-ahead log of the document management module provides the ability to handle restarts and system exceptions;
The data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
Preferably, the data management control unit further includes a data content management module;
The data content management module manages data files using immutability and a copy-on-write scheme; the content of a data file is stored on disk and is read into JVM memory only when the data file is read.
Preferably, the data management control unit further includes a source data management module;
The source data management module stores the history of data files so that every piece of received data can be traced to its source; whenever an event occurs on a data file, a new source event is created. A source event is a snapshot of the data file at the current moment: it copies the attributes of the data file and the pointer to the data file content, records all states of the data file, and stores them in the source data management module.
Source events include: creation of a data file, copying of a data file, and modification of a data file.
Preferably, the data processing unit includes an HDFS processing module, an HBASE processing module, and a KAFKA processing module.
Preferably, the data input control module receives data transmitted to the system using the TCP protocol, using sockets, or through a WEB interface.
Preferably, data is carried and circulated among the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format.
As can be seen from the above technical solutions, the present invention has the following advantages:
The distributed big data processing system implements stream processing of big data over the TCP protocol, sockets, or a WEB interface. Users can quickly build and start a distributed data processing flow; the processing of every piece of data in the flow is recorded so that users can trace its origin; and the system can also connect to a variety of big data components to complete the circulation of data.
Brief description of the drawings
To illustrate the technical solution of the present invention more clearly, the drawings needed in the description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overall schematic diagram of the distributed big data processing system;
Fig. 2 is a schematic diagram of an embodiment of the distributed big data processing system.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solution protected by the present invention is described clearly and completely below with reference to specific embodiments and the drawings. Obviously, the embodiments disclosed below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this patent without creative effort fall within the scope of protection of this patent.
This embodiment provides a distributed big data processing system which, as shown in Fig. 1 and Fig. 2, comprises: several data input control modules 1, a data management control unit 2, a data output module 3, and a data processing unit 4;
Each data input control module 1 receives data transmitted to the system and passes the received data to the data management control unit 2;
The data management control unit 2 receives the data sent by each data input control module, processes the data, and routes the data to the data output module 3 for output according to the processing result;
The data output module 3 receives the data sent by the data management control unit and forwards it to the data processing unit 4 according to the data processing type; the data processing unit 4 processes the data according to the type of data received. The data processing types include: a data type for HDFS data processing, a data type for HBASE data processing, and a data type for KAFKA data processing.
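The routing step described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the names `ROUTES`, `route`, and `make_sink` are hypothetical, and the three sinks are in-memory stand-ins for the HDFS, HBASE, and KAFKA processing modules rather than real client calls.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory sinks standing in for the HDFS, HBASE and KAFKA
# processing modules. Each sink only records what it receives; no real
# cluster is contacted.
received: Dict[str, List[bytes]] = {"HDFS": [], "HBASE": [], "KAFKA": []}


def make_sink(name: str) -> Callable[[bytes], None]:
    """Build a sink that appends payloads to the list kept under `name`."""
    def sink(payload: bytes) -> None:
        received[name].append(payload)
    return sink


# Routing table: data processing type -> handler (assumed structure).
ROUTES: Dict[str, Callable[[bytes], None]] = {
    name: make_sink(name) for name in received
}


def route(processing_type: str, payload: bytes) -> None:
    """Forward a payload to the sink registered for its processing type."""
    try:
        handler = ROUTES[processing_type]
    except KeyError:
        raise ValueError(f"unknown processing type: {processing_type!r}") from None
    handler(payload)


route("KAFKA", b"sensor-reading")
route("HDFS", b"archive-block")
```

A table lookup keeps the output module independent of the individual processing modules, so a further processing type could be supported by registering another handler.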
The data management control unit 2 includes a document management module 21. The document management module 21 stores received data files in a hash map in JVM memory, so that the data being processed can be retrieved very efficiently, and records the metadata of the currently received data by means of a write-ahead log; the metadata includes all attributes of the data, a pointer to the data content, and the state of the data. The write-ahead log of the document management module provides the ability to handle restarts and system exceptions. The data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
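As a rough illustration of an in-memory map protected by a write-ahead log, the following Python sketch appends each metadata change to a log file before applying it in memory, and replays the log on start-up. The class name `MetadataStore`, the JSON line format, and the field names are assumptions made for this example; the patent describes a hash map in JVM memory and does not specify a log format. Handling of a partially written last line after a crash is left out.

```python
import json
import os
import tempfile


class MetadataStore:
    """In-memory dict of record metadata, backed by an append-only log file."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.records: dict = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Rebuild the in-memory map from the log, e.g. after a restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.records[entry["id"]] = entry["meta"]

    def put(self, record_id: str, attributes: dict,
            content_pointer: str, state: str) -> None:
        meta = {
            "attributes": attributes,
            "content_pointer": content_pointer,
            "state": state,
        }
        # Write-ahead: persist the change before applying it in memory.
        self._log.write(json.dumps({"id": record_id, "meta": meta}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.records[record_id] = meta

    def close(self) -> None:
        self._log.close()


log_path = os.path.join(tempfile.mkdtemp(), "metadata.wal")
store = MetadataStore(log_path)
store.put("f1", {"source": "tcp"}, "content/0001", "QUEUED")
store.close()

# Opening a second store on the same log simulates a restart.
recovered = MetadataStore(log_path)
```

After the simulated restart, `recovered.records` holds the same metadata that was written before the first store was closed.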
In this embodiment, the data management control unit 2 further includes a data content management module 22 and a source data management module 23;
The data content management module 22 manages data files using immutability and a copy-on-write scheme to maximize speed and ensure thread safety. The content of a data file is stored on disk and is read into JVM memory only when the data file is read. In this way only small, relevant pieces of data are handled, and the full content need not be read into the JVM. Operations such as splitting, aggregating, and transferring very large objects are therefore easy to perform without straining memory.
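The immutability and copy-on-write idea can be sketched in Python as follows. This is an assumed, simplified model: `ContentStore` and its methods are hypothetical names, content is kept in plain files, and a modification never touches the existing file but writes a new one and returns a new pointer.

```python
import os
import tempfile
import uuid
from typing import Callable


class ContentStore:
    """Keeps content on disk; stored content is never modified in place."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, data: bytes) -> str:
        """Store data under a fresh pointer and return that pointer."""
        pointer = uuid.uuid4().hex
        # Mode "xb" fails if the file already exists, so existing content
        # can never be overwritten.
        with open(os.path.join(self.root, pointer), "xb") as f:
            f.write(data)
        return pointer

    def read(self, pointer: str) -> bytes:
        """Load content into memory only when it is actually needed."""
        with open(os.path.join(self.root, pointer), "rb") as f:
            return f.read()

    def modify(self, pointer: str, transform: Callable[[bytes], bytes]) -> str:
        """Copy-on-write: write the transformed content as a new object."""
        return self.write(transform(self.read(pointer)))


store = ContentStore(tempfile.mkdtemp())
original = store.write(b"hello")
updated = store.modify(original, bytes.upper)
```

Because the original object is left untouched, readers holding the old pointer are unaffected by the modification, which is what makes the scheme safe to share across threads.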
The source data management module 23 stores the history of data files so that every piece of received data can be traced to its source; whenever an event occurs on a data file, a new source event is created. A source event is a snapshot of the data file at the current moment: it copies the attributes of the data file and the pointer to the data file content, records all states of the data file, and stores them in the source data management module. Source events include: creation of a data file, copying of a data file, and modification of a data file.
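One possible way to model source events as snapshots is shown in the Python sketch below. The names `SourceEvent` and `SourceEventLog`, the event type strings, and the in-memory list are assumptions for illustration only; the patent does not define a storage format.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SourceEvent:
    """Snapshot of a data file at the moment an event occurred."""
    event_type: str
    file_id: str
    attributes: Dict[str, str]
    content_pointer: str
    state: str
    timestamp: float = field(default_factory=time.time)


class SourceEventLog:
    # The three event kinds named in the text: creation, copying, modification.
    ALLOWED = {"CREATE", "COPY", "MODIFY"}

    def __init__(self) -> None:
        self._events: List[SourceEvent] = []

    def record(self, event_type: str, file_id: str, attributes: Dict[str, str],
               content_pointer: str, state: str) -> SourceEvent:
        if event_type not in self.ALLOWED:
            raise ValueError(f"unsupported event type: {event_type!r}")
        # Copy the attributes so later changes do not alter the snapshot.
        event = SourceEvent(event_type, file_id, dict(attributes),
                            content_pointer, state)
        self._events.append(event)
        return event

    def history(self, file_id: str) -> List[SourceEvent]:
        """Return all events of one data file, oldest first."""
        return [e for e in self._events if e.file_id == file_id]


log = SourceEventLog()
attrs = {"source": "tcp"}
log.record("CREATE", "f1", attrs, "content/0001", "QUEUED")
attrs["source"] = "web"
log.record("MODIFY", "f1", attrs, "content/0002", "PROCESSED")
```

Copying the attributes at record time is the point of the sketch: the first event still reports the original attribute values even after the caller changes them.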
The data management control unit 2 stores the metadata of the stream files currently in the data flow; the content repository stores the content of current and historical files; and the data management control unit 2 also stores the history of files.
The distributed big data processing system is based on a workflow-style programming concept; the system is easy to use, reliable, and highly configurable, and it supports data backtracking. The WEB user interface lets users understand and interact with data flows intuitively and iterate more quickly and safely. The data backtracking feature lets users examine how an object flows through the system, replay it, and visualize what happened before and after key steps.
With its big data streaming architecture, the distributed big data processing system can efficiently process massive data generated by the Internet of Things, mobile terminals, and the like. It supports rapid changes to the processing flow to cope with changing requirements, and it includes data backtracking. It can be applied effectively to scenarios such as financial risk control and anti-fraud, and early warning about high-risk persons.
In this embodiment, the data processing unit includes an HDFS processing module, an HBASE processing module, and a KAFKA processing module. Data can be processed in different ways depending on the usage scenario or operating environment.
In this embodiment, the data input control module receives data transmitted to the system using the TCP protocol, using sockets, or through a WEB interface. The system can thus obtain data through multiple channels.
Each data input control module may use a single data receiving mode; combining several data input control modules then provides a variety of different data receiving modes.
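A TCP-based input module could look roughly like the following Python sketch, which uses the standard `socketserver` module. The newline-delimited framing, the `inbox` queue, and the names `LineHandler` and `start_receiver` are assumptions for this example; the patent does not specify a wire format.

```python
import queue
import socket
import socketserver
import threading

# Received records are handed over through this queue (assumed hand-off
# point towards the data management control unit).
inbox: "queue.Queue[bytes]" = queue.Queue()


class LineHandler(socketserver.StreamRequestHandler):
    """Treats every newline-terminated line on a connection as one record."""

    def handle(self) -> None:
        for line in self.rfile:
            inbox.put(line.rstrip(b"\r\n"))


def start_receiver(host: str = "127.0.0.1",
                   port: int = 0) -> socketserver.ThreadingTCPServer:
    """Start a TCP receiver in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_receiver()
with socket.create_connection(server.server_address) as client:
    client.sendall(b"record-1\nrecord-2\n")

first = inbox.get(timeout=5)
second = inbox.get(timeout=5)
server.shutdown()
server.server_close()
```

Several such receivers, each bound to a different port or protocol, would correspond to the combination of input modules described above.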
Data is carried and circulated among the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format, which improves the efficiency of data processing and circulation inside the system.
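To give a feel for what an Avro record exchanged between the modules might contain, the Python sketch below builds a schema as a dictionary and renders it as JSON, the form in which Avro schemas are written. The record name `FlowRecord`, the namespace, and all field names are hypothetical; the patent does not disclose its schema. Encoding actual records would additionally need an Avro library, which is not shown here.

```python
import json

# Hypothetical Avro record schema for data passed between the modules.
FLOW_RECORD_SCHEMA = {
    "type": "record",
    "name": "FlowRecord",
    "namespace": "example.flow",
    "fields": [
        {"name": "id", "type": "string"},
        {
            "name": "processing_type",
            "type": {
                "type": "enum",
                "name": "ProcessingType",
                "symbols": ["HDFS", "HBASE", "KAFKA"],
            },
        },
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
        {"name": "payload", "type": "bytes"},
    ],
}

# Avro schemas are plain JSON documents, so they can be stored or shipped
# alongside the data.
schema_json = json.dumps(FLOW_RECORD_SCHEMA, indent=2)
```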
The system provides a WEB-based user interface for designing, controlling, and monitoring data flows and receiving feedback. The system supports a variety of data controllers for connecting to big data and Internet of Things components, and also supports user-defined controllers. The system ensures data reliability through a persistent write-ahead log (WAL) and the content repository. The system can trace the history of data. The system can be deployed in a distributed manner.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
- 1. A distributed big data processing system, characterized by comprising: several data input control modules, a data management control unit, a data output module, and a data processing unit; each data input control module receives data transmitted to the system and passes the received data to the data management control unit; the data management control unit receives the data sent by each data input control module, processes the data, and routes the data to the data output module for output according to the processing result; the data output module receives the data sent by the data management control unit and forwards it to the data processing unit according to the data processing type; the data processing unit processes the data according to the type of data received; the data management control unit includes a document management module; the document management module stores received data files in a hash map in JVM memory and records the metadata of the currently received data by means of a write-ahead log; the metadata includes all attributes of the data, a pointer to the data content, and the state of the data.
- 2. The distributed big data processing system according to claim 1, characterized in that the write-ahead log of the document management module provides the ability to handle restarts and system exceptions; the data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
- 3. The distributed big data processing system according to claim 1 or 2, characterized in that the data management control unit further includes a data content management module; the data content management module manages data files using immutability and a copy-on-write scheme; the content of a data file is stored on disk and is read into JVM memory only when the data file is read.
- 4. The distributed big data processing system according to claim 1 or 2, characterized in that the data management control unit further includes a source data management module; the source data management module stores the history of data files so that every piece of received data can be traced to its source; whenever an event occurs on a data file, a new source event is created; a source event is a snapshot of the data file at the current moment: it copies the attributes of the data file and the pointer to the data file content, records all states of the data file, and stores them in the source data management module; source events include: creation of a data file, copying of a data file, and modification of a data file.
- 5. The distributed big data processing system according to claim 1 or 2, characterized in that the data processing unit includes an HDFS processing module, an HBASE processing module, and a KAFKA processing module.
- 6. The distributed big data processing system according to claim 1 or 2, characterized in that the data input control module receives data transmitted to the system using the TCP protocol, using sockets, or through a WEB interface.
- 7. The distributed big data processing system according to claim 1 or 2, characterized in that data is carried and circulated among the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710954633.XA CN107784093A (en) | 2017-10-13 | 2017-10-13 | A kind of distributed big data processing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710954633.XA CN107784093A (en) | 2017-10-13 | 2017-10-13 | A kind of distributed big data processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107784093A true CN107784093A (en) | 2018-03-09 |
Family
ID=61433610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710954633.XA Pending CN107784093A (en) | 2017-10-13 | 2017-10-13 | A kind of distributed big data processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107784093A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083626A (en) * | 2019-03-29 | 2019-08-02 | 北京奇安信科技有限公司 | Streaming events sequences match method and device |
CN111337727A (en) * | 2020-03-05 | 2020-06-26 | 山东泰开互感器有限公司 | Current transformer and cloud computing-based current transformer information interaction system |
CN111368501A (en) * | 2018-12-26 | 2020-07-03 | 中国石油天然气集团有限公司 | Seismic auxiliary data flow processing system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7284104B1 (en) * | 2003-06-30 | 2007-10-16 | Veritas Operating Corporation | Volume-based incremental backup and recovery of files |
CN104378423A (en) * | 2014-11-13 | 2015-02-25 | 普华基础软件股份有限公司 | Metadata cluster distribution storage system and storage data reading and writing method |
CN105701203A (en) * | 2016-01-12 | 2016-06-22 | 北京中交兴路车联网科技有限公司 | Information storage and query method and system for big data clusters |
CN105760459A (en) * | 2016-02-04 | 2016-07-13 | 四川嘉宝资产管理集团股份有限公司 | Distributed data processing system and method |
CN105960635A (en) * | 2014-02-07 | 2016-09-21 | 国际商业机器公司 | Creating restore copy from copy of source data in repository having source data at different point-in-times |
CN106709003A (en) * | 2016-12-23 | 2017-05-24 | 长沙理工大学 | Hadoop-based mass log data processing method |
CN106709069A (en) * | 2017-01-25 | 2017-05-24 | 焦点科技股份有限公司 | High-reliability big data logging collection and transmission method |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7284104B1 (en) * | 2003-06-30 | 2007-10-16 | Veritas Operating Corporation | Volume-based incremental backup and recovery of files |
CN105960635A (en) * | 2014-02-07 | 2016-09-21 | 国际商业机器公司 | Creating restore copy from copy of source data in repository having source data at different point-in-times |
CN104378423A (en) * | 2014-11-13 | 2015-02-25 | 普华基础软件股份有限公司 | Metadata cluster distribution storage system and storage data reading and writing method |
CN105701203A (en) * | 2016-01-12 | 2016-06-22 | 北京中交兴路车联网科技有限公司 | Information storage and query method and system for big data clusters |
CN105760459A (en) * | 2016-02-04 | 2016-07-13 | 四川嘉宝资产管理集团股份有限公司 | Distributed data processing system and method |
CN106709003A (en) * | 2016-12-23 | 2017-05-24 | 长沙理工大学 | Hadoop-based mass log data processing method |
CN106709069A (en) * | 2017-01-25 | 2017-05-24 | 焦点科技股份有限公司 | High-reliability big data logging collection and transmission method |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111368501A (en) * | 2018-12-26 | 2020-07-03 | 中国石油天然气集团有限公司 | Seismic auxiliary data flow processing system |
CN111368501B (en) * | 2018-12-26 | 2023-09-26 | 中国石油天然气集团有限公司 | Seismic auxiliary data flow processing system |
CN110083626A (en) * | 2019-03-29 | 2019-08-02 | 北京奇安信科技有限公司 | Streaming events sequences match method and device |
CN111337727A (en) * | 2020-03-05 | 2020-06-26 | 山东泰开互感器有限公司 | Current transformer and cloud computing-based current transformer information interaction system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA3087309C (en) | Blockchain-based data processing method and device | |
CN108829350A (en) | Data migration method and device based on block chain | |
CN108536761A (en) | Report data querying method and server | |
CN107784093A (en) | A kind of distributed big data processing system | |
US20130013597A1 (en) | Processing Repetitive Data | |
CN109284251A (en) | Blog management method, device, computer equipment and storage medium | |
CN104731796B (en) | Data storage computational methods and system | |
CN110334070A (en) | Data processing method, system, equipment and storage medium | |
CN105630847A (en) | Data storage method as well as data query method, apparatus and system | |
CN103581332A (en) | HDFS framework and pressure decomposition method for NameNodes in HDFS framework | |
CN107508869A (en) | Trace back data acquisition method and client | |
CN103294799B (en) | A kind of data parallel batch imports the method and system of read-only inquiry system | |
CN110297810A (en) | A kind of stream data processing method, device and electronic equipment | |
CN109165210A (en) | A kind of method and device of cluster Hbase Data Migration | |
EP2208317B1 (en) | Compressing null columns in rows of the tabular data stream protocol | |
CN105022676A (en) | Recovery method and device of main memory database redo log files | |
CN107463340A (en) | The data-storage system of computer | |
CN103577434B (en) | A kind of management method and device of data file | |
CN115935909A (en) | File generation method and device and electronic equipment | |
WO2019127926A1 (en) | Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product | |
CN106156069B (en) | Log system and log recording method | |
CN202121595U (en) | Server cluster system | |
CN206322076U (en) | A kind of system for retrieving mobile terminal model | |
Tang et al. | Design of a data processing method for the farmland environmental monitoring based on improved Spark components | |
CN111541747A (en) | Data check point setting method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20200522. Address after: Building S01, Inspur Science Park, No. 1036, Inspur Road, high tech Zone, Jinan City, Shandong Province, 250000. Applicant after: Tidal Cloud Information Technology Co.,Ltd. Address before: 450000 Henan province Zheng Dong New District of Zhengzhou City Xinyi Road No. 278 16 floor room 1601. Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd. |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180309 |