CN107784093A - A kind of distributed big data processing system - Google Patents

A kind of distributed big data processing system

Info

Publication number
CN107784093A
Authority
CN
China
Prior art keywords
data
module
file
processing
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710954633.XA
Other languages
Chinese (zh)
Inventor
张炜刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710954633.XA
Publication of CN107784093A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The present invention provides a distributed big data processing system. A data input control module receives data transmitted to the system and passes the received data to a data management control unit. The data management control unit receives the data sent by each data input control module, processes it, and routes it to a data output module according to the processing result. The data output module receives the data sent by the data management control unit and forwards it to a data processing unit according to the data processing type. The data processing unit processes the data according to the type of data received. The distributed big data processing system implements stream processing of big data over TCP or via the WEB. Users can quickly build and start a distributed data processing flow, the processing of every piece of data in the flow is recorded so that users can trace it to its source, and the system can also connect to a variety of big data components to complete the circulation of data.

Description

A distributed big data processing system
Technical field
The present invention relates to the field of big data processing, and in particular to a distributed big data processing system.
Background art
Big data computing falls into two modes: batch computing and stream computing. The two modes suit different scenarios. Batch computing must store the data first and compute afterwards, so its real-time performance is low. Stream computing processes data within a time window, so its real-time performance is much stronger.
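The difference between the two modes can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names are hypothetical, and a count-based sliding window stands in for the time window mentioned above.

```python
from collections import deque


def batch_average(stored_values):
    """Batch mode: every value is stored first, then processed in one pass."""
    return sum(stored_values) / len(stored_values)


def windowed_averages(stream, window_size):
    """Stream mode: emit a result as each value arrives, using only the
    most recent `window_size` values (a count-based stand-in for a time window)."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


readings = [10, 20, 30, 40]
batch_result = batch_average(readings)  # available only after all data is stored
stream_results = list(windowed_averages(iter(readings), window_size=2))
```

The batch result exists only once the whole data set has been stored, whereas the streaming variant produces one result per incoming value.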
With the rapid development of emerging technologies such as the Internet of Things, the mobile Internet, and social media, data is produced and propagated ever faster, while the value of the data drops sharply over time. Quickly extracting value from the mass of continuously generated data has become an urgent need.
The relatively mature big data stream processing frameworks currently on the market are Spark, Storm, and Samza. All three real-time computing systems are open-source distributed systems with advantages such as low latency, scalability, and high fault tolerance. They also have certain shortcomings: they cannot respond promptly to changing requirements, because jobs must be repackaged and uploaded; the data processing flow is not intuitive; and none of them provides a data traceability function.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a distributed big data processing system, comprising: several data input control modules, a data management control unit, a data output module, and a data processing unit;
Each data input control module receives data transmitted to the system and passes the received data to the data management control unit;
The data management control unit receives the data sent by each data input control module, processes the data, and routes the data to the data output module according to the processing result;
The data output module receives the data sent by the data management control unit and forwards it to the data processing unit according to the data processing type;
The data processing unit processes the data according to the type of data received;
The data management control unit includes a document management module;
The document management module stores received data files in a hash map in JVM memory and records the metadata of the currently received data in a write-ahead log; the metadata includes all attributes of the data, a pointer to the data content, and the state of the data.
Preferably, the write-ahead log of the document management module provides the ability to handle restarts and system exceptions;
The data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
Preferably, the data management control unit further includes a data content management module;
The data content management module manages data files using immutability and copy-on-write; the content of a data file is kept on disk and is read into JVM memory only when the data file is read.
Preferably, the data management control unit further includes a source data management module;
The source data management module stores the history of data files so that every piece of received data can be traced to its source. Any event on a data file at any time creates a new source event. A source event is a snapshot of the data file at that moment: it copies the attributes of the data file and the pointer to the data file content, records the full state of the data file, and stores these in the source data management module.
Source events include: creation of a data file, copying of a data file, and modification of a data file.
Preferably, the data processing unit includes: an HDFS processing module, an HBASE processing module, and a KAFKA processing module.
Preferably, the data input control module receives data transmitted to the system over TCP, via a socket, or via the WEB.
Preferably, data is carried between the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format.
It can be seen from the above technical solutions that the present invention has the following advantages:
The distributed big data processing system implements stream processing of big data over TCP, via sockets, or via the WEB. Users can quickly build and start a distributed data processing flow, the processing of every piece of data in the flow is recorded so that users can trace it to its source, and the system can also connect to a variety of big data components to complete the circulation of data.
Brief description of the drawings
To illustrate the technical solution of the present invention more clearly, the drawings used in the description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overall schematic diagram of the distributed big data processing system;
Fig. 2 is a schematic diagram of an embodiment of the distributed big data processing system.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solution protected by the present invention is described clearly and completely below with reference to specific embodiments and the drawings. Obviously, the embodiments disclosed below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this patent without creative effort fall within the scope of protection of this patent.
This embodiment provides a distributed big data processing system which, as shown in Fig. 1 and Fig. 2, comprises: several data input control modules 1, a data management control unit 2, a data output module 3, and a data processing unit 4;
Each data input control module 1 receives data transmitted to the system and passes the received data to the data management control unit 2;
The data management control unit 2 receives the data sent by each data input control module, processes the data, and routes the data to the data output module 3 according to the processing result;
The data output module 3 receives the data sent by the data management control unit 2 and forwards it to the data processing unit 4 according to the data processing type; the data processing unit 4 processes the data according to the type of data received. The data processing types include: data types for HDFS data processing, data types for HBASE data processing, and data types for KAFKA data processing.
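The type-based forwarding performed by the data output module can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, the `processing_type` tag, and the in-memory `received` lists are hypothetical, and the processing modules merely collect payloads instead of writing to HDFS, HBASE, or KAFKA.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    payload: bytes
    processing_type: str  # hypothetical tag: "HDFS", "HBASE", or "KAFKA"


@dataclass
class ProcessingModule:
    name: str
    received: list = field(default_factory=list)

    def process(self, record):
        # A real module would write to the external system; here we only collect.
        self.received.append(record.payload)


class DataOutputModule:
    """Forwards each record to the processing module matching its type."""

    def __init__(self, modules):
        self._modules = {m.name: m for m in modules}

    def forward(self, record):
        module = self._modules.get(record.processing_type)
        if module is None:
            raise ValueError(f"no processing module for type {record.processing_type!r}")
        module.process(record)


hdfs, hbase, kafka = (ProcessingModule(n) for n in ("HDFS", "HBASE", "KAFKA"))
output = DataOutputModule([hdfs, hbase, kafka])
output.forward(Record(b"log line", "HDFS"))
output.forward(Record(b"row", "HBASE"))
output.forward(Record(b"event", "KAFKA"))
output.forward(Record(b"log line 2", "HDFS"))
```

Keeping the mapping in a dictionary means a further processing module can be registered without changing the forwarding logic.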
The data management control unit 2 includes a document management module 21. The document management module 21 stores received data files in a hash map in JVM memory, so that the data to be processed can be retrieved very efficiently, and records the metadata of the currently received data in a write-ahead log. The metadata includes all attributes of the data, a pointer to the data content, and the state of the data. The write-ahead log of the document management module provides the ability to handle restarts and system exceptions. The data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
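The combination of an in-memory hash map with a write-ahead log can be sketched as below. The patent describes a JVM implementation; this Python version is only an assumed illustration, and the class name, the JSON-lines log format, and the field names are all hypothetical.

```python
import json
import os
import tempfile


class DocumentManagementModule:
    """In-memory map of metadata, backed by an append-only write-ahead log."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.records = {}  # in-memory hash map: file id -> metadata
        self._replay()

    def _replay(self):
        # Rebuild the in-memory map from the log after a restart or crash.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write: ignore the incomplete entry
                self.records[entry["id"]] = entry["metadata"]

    def put(self, file_id, attributes, content_pointer, state):
        metadata = {
            "attributes": attributes,
            "content_pointer": content_pointer,
            "state": state,
        }
        # Write-ahead: the log entry is made durable before the map is updated.
        with open(self.wal_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": file_id, "metadata": metadata}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.records[file_id] = metadata


wal_path = os.path.join(tempfile.mkdtemp(), "metadata.wal")
module = DocumentManagementModule(wal_path)
module.put("f1", {"source": "tcp"}, "content/0001", "QUEUED")
module.put("f1", {"source": "tcp"}, "content/0001", "PROCESSED")

# Simulate a restart: a new instance recovers its state from the log.
recovered = DocumentManagementModule(wal_path)
```

Because every change reaches the log before the map, replaying the log in order reproduces the last known state of each data file.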
In this embodiment, the data management control unit 2 further includes a data content management module 22 and a source data management module 23;
The data content management module 22 manages data files using immutability and copy-on-write to ensure maximum speed and thread safety. The content of a data file is kept on disk and is read into JVM memory only when the data file is read. In this way only the small amount of data actually needed is handled, and the full content never has to be loaded into the JVM. Operations such as splitting, aggregating, and transferring large objects are therefore easy to perform without straining memory.
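A minimal sketch of immutable, copy-on-write content storage follows. It assumes a simple one-file-per-content layout; the class name, the pointer naming scheme, and the `modify` helper are hypothetical and not taken from the patent.

```python
import os
import tempfile


class ContentStore:
    """Immutable content on disk; a modification writes a new file."""

    def __init__(self, directory):
        self.directory = directory
        self._next_id = 0

    def write(self, data: bytes) -> str:
        pointer = f"content-{self._next_id:06d}"
        self._next_id += 1
        # Mode "xb" fails if the file exists, so stored content is never overwritten.
        with open(os.path.join(self.directory, pointer), "xb") as f:
            f.write(data)
        return pointer

    def read(self, pointer: str, offset: int = 0, length: int = -1) -> bytes:
        # Content is loaded into memory only when it is read,
        # and only the requested slice is loaded.
        with open(os.path.join(self.directory, pointer), "rb") as f:
            f.seek(offset)
            return f.read(length)

    def modify(self, pointer, transform):
        # Copy-on-write: the original stays untouched; the result gets a new pointer.
        return self.write(transform(self.read(pointer)))


store = ContentStore(tempfile.mkdtemp())
original = store.write(b"alpha,beta,gamma")
upper = store.modify(original, bytes.upper)
first_field = store.read(original, offset=0, length=5)
```

Since stored content never changes, concurrent readers need no locking, and earlier versions remain available through their pointers.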
The source data management module 23 stores the history of data files so that every piece of received data can be traced to its source. Any event on a data file at any time creates a new source event. A source event is a snapshot of the data file at that moment: it copies the attributes of the data file and the pointer to the data file content, records the full state of the data file, and stores these in the source data management module. Source events include: creation of a data file, copying of a data file, and modification of a data file.
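The snapshot behaviour of source events can be sketched as follows. The event type names, field names, and the `history` helper are assumptions made for illustration; the patent specifies only that each event copies the attributes and content pointer and records the state.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEvent:
    event_type: str       # "CREATE", "COPY", or "MODIFY" (hypothetical labels)
    file_id: str
    attributes: dict      # copy of the attributes at the time of the event
    content_pointer: str
    state: str


class SourceDataManagementModule:
    def __init__(self):
        self._events = []

    def record(self, event_type, file_id, attributes, content_pointer, state):
        # Deep-copy so later attribute changes cannot alter the stored snapshot.
        event = SourceEvent(
            event_type, file_id, copy.deepcopy(attributes), content_pointer, state
        )
        self._events.append(event)
        return event

    def history(self, file_id):
        """Return all events of one data file in the order they occurred."""
        return [e for e in self._events if e.file_id == file_id]


source = SourceDataManagementModule()
attrs = {"filename": "a.log"}
source.record("CREATE", "f1", attrs, "content-000000", "QUEUED")
attrs["filename"] = "a-renamed.log"
source.record("MODIFY", "f1", attrs, "content-000001", "PROCESSED")
source.record("COPY", "f2", attrs, "content-000001", "QUEUED")
trace = source.history("f1")
```

Tracing a data file then amounts to reading its events in order; each snapshot still shows the attributes as they were when the event occurred.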
The data management control unit 2 holds the metadata of the stream files currently in the data flow; the content repository stores the content of current and historical files; and the data management control unit 2 stores the history of the files.
The distributed big data processing system is based on a workflow-style programming concept and is easy to use, reliable, and highly configurable. It provides data traceback capability. The WEB-based user interface lets users understand the data flow intuitively and interact with it, so they can iterate faster and more safely. The data traceback feature lets users see how an object flows through the system, replay it, and visualize what happened before and after key steps.
With its big data streaming architecture, the distributed big data processing system can efficiently process the massive data generated by the Internet of Things, mobile terminals, and the like. It supports rapid changes to the processing flow to meet changing requirements, and includes data traceback capability. It can be applied effectively to scenarios such as financial risk control and anti-fraud, and early warning of at-risk persons.
In this embodiment, the data processing unit includes: an HDFS processing module, an HBASE processing module, and a KAFKA processing module. Data can be processed in different ways according to the usage scenario or environment.
In this embodiment, the data input control module receives data transmitted to the system over TCP, via a socket, or via the WEB. The system can thus obtain data through multiple channels.
Each data input control module can use a single data receiving mode; combining several data input control modules provides multiple different data receiving modes.
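A data input control module that receives over TCP might look like the sketch below, built on Python's standard `socketserver` module. The newline-delimited record framing, the handler name, and the `received` queue (standing in for the hand-off to the data management control unit) are assumptions, not details given in the patent.

```python
import queue
import socket
import socketserver
import threading

# Stands in for the hand-off to the data management control unit.
received = queue.Queue()


class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Treat each newline-terminated line as one record (an assumption).
        for line in self.rfile:
            received.put(line.rstrip(b"\n"))


def start_tcp_input(host="127.0.0.1", port=0):
    """Start a TCP input module; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_tcp_input()
with socket.create_connection(server.server_address) as client:
    client.sendall(b"record-1\nrecord-2\n")

records = [received.get(timeout=5), received.get(timeout=5)]
server.shutdown()
server.server_close()
```

Several such modules, each with its own receiving mode, could feed the same queue, which corresponds to the combination of input modules described above.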
Data is carried between the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format, which improves the efficiency of data processing and circulation inside the system.
The system provides a WEB-based user interface for designing, controlling, giving feedback on, and monitoring data flows. The system supports multiple data controllers for connecting to big data and Internet of Things components, and also supports user-defined controllers. The system ensures data reliability through a persistent write-ahead log (WAL) and a content repository. The system can track the history of data. The system can be deployed in a distributed manner.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

  1. A distributed big data processing system, characterized by comprising: several data input control modules, a data management control unit, a data output module, and a data processing unit;
    Each data input control module receives data transmitted to the system and passes the received data to the data management control unit;
    The data management control unit receives the data sent by each data input control module, processes the data, and routes the data to the data output module according to the processing result;
    The data output module receives the data sent by the data management control unit and forwards it to the data processing unit according to the data processing type;
    The data processing unit processes the data according to the type of data received;
    The data management control unit includes a document management module;
    The document management module stores received data files in a hash map in JVM memory and records the metadata of the currently received data in a write-ahead log; the metadata includes all attributes of the data, a pointer to the data content, and the state of the data.
  2. The distributed big data processing system according to claim 1, characterized in that
    The write-ahead log of the document management module provides the ability to handle restarts and system exceptions;
    The data files received by the document management module include: host power-failure data, kernel panic data, system upgrade data, and periodic maintenance data.
  3. The distributed big data processing system according to claim 1 or 2, characterized in that
    The data management control unit further includes a data content management module;
    The data content management module manages data files using immutability and copy-on-write; the content of a data file is kept on disk and is read into JVM memory only when the data file is read.
  4. The distributed big data processing system according to claim 1 or 2, characterized in that
    The data management control unit further includes a source data management module;
    The source data management module stores the history of data files so that every piece of received data can be traced to its source; any event on a data file at any time creates a new source event; a source event is a snapshot of the data file at that moment; the source event copies the attributes of the data file and the pointer to the data file content, records the full state of the data file, and stores these in the source data management module.
    Source events include: creation of a data file, copying of a data file, and modification of a data file.
  5. The distributed big data processing system according to claim 1 or 2, characterized in that
    The data processing unit includes: an HDFS processing module, an HBASE processing module, and a KAFKA processing module.
  6. The distributed big data processing system according to claim 1 or 2, characterized in that
    The data input control module receives data transmitted to the system over TCP, via a socket, or via the WEB.
  7. The distributed big data processing system according to claim 1 or 2, characterized in that
    Data is carried between the data input control module, the data management control unit, the data output module, and the data processing unit in Avro format.
CN201710954633.XA 2017-10-13 2017-10-13 A kind of distributed big data processing system Pending CN107784093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710954633.XA CN107784093A (en) 2017-10-13 2017-10-13 A kind of distributed big data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710954633.XA CN107784093A (en) 2017-10-13 2017-10-13 A kind of distributed big data processing system

Publications (1)

Publication Number Publication Date
CN107784093A true CN107784093A (en) 2018-03-09

Family

ID=61433610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710954633.XA Pending CN107784093A (en) 2017-10-13 2017-10-13 A kind of distributed big data processing system

Country Status (1)

Country Link
CN (1) CN107784093A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083626A (en) * 2019-03-29 2019-08-02 北京奇安信科技有限公司 Streaming events sequences match method and device
CN111337727A (en) * 2020-03-05 2020-06-26 山东泰开互感器有限公司 Current transformer and cloud computing-based current transformer information interaction system
CN111368501A (en) * 2018-12-26 2020-07-03 中国石油天然气集团有限公司 Seismic auxiliary data flow processing system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7284104B1 (en) * 2003-06-30 2007-10-16 Veritas Operating Corporation Volume-based incremental backup and recovery of files
CN104378423A (en) * 2014-11-13 2015-02-25 普华基础软件股份有限公司 Metadata cluster distribution storage system and storage data reading and writing method
CN105701203A (en) * 2016-01-12 2016-06-22 北京中交兴路车联网科技有限公司 Information storage and query method and system for big data clusters
CN105760459A (en) * 2016-02-04 2016-07-13 四川嘉宝资产管理集团股份有限公司 Distributed data processing system and method
CN105960635A (en) * 2014-02-07 2016-09-21 国际商业机器公司 Creating restore copy from copy of source data in repository having source data at different point-in-times
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method
CN106709069A (en) * 2017-01-25 2017-05-24 焦点科技股份有限公司 High-reliability big data logging collection and transmission method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368501A (en) * 2018-12-26 2020-07-03 中国石油天然气集团有限公司 Seismic auxiliary data flow processing system
CN111368501B (en) * 2018-12-26 2023-09-26 中国石油天然气集团有限公司 Seismic auxiliary data flow processing system
CN110083626A (en) * 2019-03-29 2019-08-02 北京奇安信科技有限公司 Streaming events sequences match method and device
CN111337727A (en) * 2020-03-05 2020-06-26 山东泰开互感器有限公司 Current transformer and cloud computing-based current transformer information interaction system

Similar Documents

Publication Publication Date Title
CA3087309C (en) Blockchain-based data processing method and device
CN108829350A (en) Data migration method and device based on block chain
CN108536761A (en) Report data querying method and server
CN107784093A (en) A kind of distributed big data processing system
US20130013597A1 (en) Processing Repetitive Data
CN109284251A (en) Blog management method, device, computer equipment and storage medium
CN104731796B (en) Data storage computational methods and system
CN110334070A (en) Data processing method, system, equipment and storage medium
CN105630847A (en) Data storage method as well as data query method, apparatus and system
CN103581332A (en) HDFS framework and pressure decomposition method for NameNodes in HDFS framework
CN107508869A (en) Trace back data acquisition method and client
CN103294799B (en) A kind of data parallel batch imports the method and system of read-only inquiry system
CN110297810A (en) A kind of stream data processing method, device and electronic equipment
CN109165210A (en) A kind of method and device of cluster Hbase Data Migration
EP2208317B1 (en) Compressing null columns in rows of the tabular data stream protocol
CN105022676A (en) Recovery method and device of main memory database redo log files
CN107463340A (en) The data-storage system of computer
CN103577434B (en) A kind of management method and device of data file
CN115935909A (en) File generation method and device and electronic equipment
WO2019127926A1 (en) Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product
CN106156069B (en) Log system and log recording method
CN202121595U (en) Server cluster system
CN206322076U (en) A kind of system for retrieving mobile terminal model
Tang et al. Design of a data processing method for the farmland environmental monitoring based on improved Spark components
CN111541747A (en) Data check point setting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200522

Address after: Building S01, Inspur Science Park, No. 1036, Inspur Road, High-tech Zone, Jinan City, Shandong Province, 250000

Applicant after: Inspur Cloud Information Technology Co., Ltd.

Address before: Room 1601, 16th Floor, No. 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province, 450000

Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180309