CN109343791A - Big data all-in-one machine - Google Patents

Big data all-in-one machine

Info

Publication number
CN109343791A
CN109343791A
Authority
CN
China
Prior art keywords
data
module
node
distributed
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810936219.0A
Other languages
Chinese (zh)
Other versions
CN109343791B (en)
Inventor
张隆显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Original Assignee
Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Priority to CN201810936219.0A
Publication of CN109343791A
Application granted
Publication of CN109343791B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 - Securing storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention provides a big data all-in-one machine comprising: multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes; a data stream module for acquiring and transmitting data streams; multiple computer nodes, connected one-to-one with the storage nodes so as to obtain the data stored in them; a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams and, according to their classification, delivers them to the corresponding storage nodes for storage; a distributed data management module, connected to the computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node holding the data required by that task; and multiple distributed multi-level fault-tolerant modules, corresponding one-to-one with the memories and connected to the data stream module, the data mining module, the distributed data management module and the storage nodes, such that data streams and commands flow through the corresponding distributed multi-level fault-tolerant module.

Description

Big data all-in-one machine
Technical field
The present invention relates to the field of information technology, and in particular to a big data all-in-one machine.
Background technique
As information technology is applied ever more widely to every area of production and daily life, the volume of data available for analysis is growing explosively. Efficiently and rapidly mining potential value from massive data and turning it into a basis for decision-making has become a major challenge for informatization in every industry.
An all-in-one machine is an integrated product of software and hardware that typically combines data processing, data transmission and data storage in a single system. Because it is integrated, tested and optimized in advance, an all-in-one machine enables rapid deployment, simplifies the IT infrastructure and saves resources. However, the software and hardware that make up an all-in-one machine fail or err from time to time, causing crashes or operating failures that lengthen computation times and may even lose data. There is therefore an urgent need for an all-in-one machine that is fault tolerant and runs at high speed.
Summary of the invention
In view of this, embodiments of the present invention provide a big data all-in-one machine that is fault tolerant, executes quickly and operates efficiently.
An embodiment of the present invention provides a big data all-in-one machine, comprising:
multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes;
a data stream module, having an input/output interface, for acquiring and transmitting data streams;
multiple computer nodes, connected one-to-one with the storage nodes in each memory, so as to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules; the data mining module classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node in the corresponding memory for storage;
a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node that holds the data required by that task;
multiple distributed multi-level fault-tolerant modules, corresponding at least one-to-one with the memories; each distributed multi-level fault-tolerant module is connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, and organizes process groups in a hierarchical manner, wherein one process group is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group; the transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module.
Further, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate; the cold data is delivered by the data mining module to low-speed, inexpensive memories, while the hot data is distributed by the data mining module, according to a storage prefetching algorithm, to high-speed, large-capacity memories.
Further, the distributed data management module includes multiple management nodes; each management node corresponds to one memory and is connected to all storage nodes in the corresponding memory. Each management node includes:
a local database management system, responsible for managing the data in the memory corresponding to that management node; and
a data connection component, connecting the management node to all other management nodes.
Further, the channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes, and the channel switches are connected one-to-one with the computer nodes.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects. The big data all-in-one machine of the present invention uses the distributed data management module to assign, through the channel selector, each computing task to the computer node connected to the storage node that holds the data required by the task, which increases data transmission speed and avoids the severe data transmission bottleneck of traditional distributed computing. The big data all-in-one machine of the present invention also has multiple distributed multi-level fault-tolerant modules and uses a dynamic multi-level fault-tolerance mechanism to cope with the failures and errors that occur in the machine, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
Detailed description of the invention
Fig. 1 is a schematic diagram of the big data all-in-one machine of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are further described below with reference to the accompanying drawing.
Referring to Fig. 1, an embodiment of the present invention provides a big data all-in-one machine comprising a data stream module 5, a data mining module 6, a channel selector, multiple distributed data storage modules 2, multiple computer nodes 3, multiple distributed multi-level fault-tolerant modules 7 and a distributed data management module.
The multiple distributed data storage modules 2 are distributed across a plurality of mutually independent memories 1; each memory 1 contains one distributed data storage module 2, and each distributed data storage module 2 includes a large number of storage nodes 21.
The number of computer nodes 3 equals the number of storage nodes 21, and the computer nodes 3 are connected one-to-one with the storage nodes 21, so that each computer node 3 can directly and quickly obtain the data stored in the storage node 21 to which it is connected.
The data stream module 5 has an input/output interface for acquiring and transmitting data streams. The data mining module 6 is connected to the data stream module 5 and to all storage nodes 21; it classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node 21 in the corresponding memory 1 for storage. Specifically, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate. The cold data is delivered by the data mining module 6 to low-speed, inexpensive memories 1, which greatly reduces the construction cost and energy consumption of the system at the cost of only a small sacrifice in overall performance. The hot data is distributed by the data mining module 6, according to a storage prefetching algorithm, to high-speed, large-capacity memories 1, which improves access performance and computation speed.
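By way of a rough, non-authoritative illustration (this sketch is not part of the patent; the access-rate threshold, the tier names and the classify/route helpers are hypothetical), the hot/cold placement rule described above might be sketched in Python as follows:

    from dataclasses import dataclass

    @dataclass
    class DataItem:
        key: str
        access_rate: float  # accesses per unit time, assumed to be tracked elsewhere

    HOT_THRESHOLD = 100.0   # hypothetical cut-off between hot and cold data

    def classify(item: DataItem) -> str:
        """Label an item 'hot' or 'cold' by its observed access rate."""
        return "hot" if item.access_rate >= HOT_THRESHOLD else "cold"

    def route(item: DataItem) -> str:
        """Return the storage tier an item should be written to."""
        return "fast_large_memory" if classify(item) == "hot" else "slow_cheap_memory"

    items = [DataItem("sensor-A", 450.0), DataItem("archive-2017", 0.3)]
    for it in items:
        print(it.key, "->", route(it))   # sensor-A -> fast_large_memory, archive-2017 -> slow_cheap_memory

In a real system the threshold and the access-rate statistics would come from the clustering, classification or prediction algorithms the patent names; the fixed threshold here only stands in for that decision.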
The distributed data management module includes multiple management nodes 4; each management node 4 corresponds to one memory 1 and is connected to all storage nodes 21 in the corresponding memory 1. Each management node 4 includes a local database management system 41 and a data connection component 42. The local database management system 41 is responsible for managing the data in the memory 1 corresponding to that management node 4; the data connection component 42 connects the management node 4 to all other management nodes 4, so that data sharing is achieved and the overall data can be managed optimally.
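Purely as an illustrative sketch of this structure (not the patent's implementation; the class and method names such as LocalDBMS and ManagementNode.locate are invented for illustration), a management node that owns the metadata for one memory and keeps links to all peer management nodes could look like this:

    class LocalDBMS:
        """Tracks which storage node inside this memory holds which data key."""
        def __init__(self):
            self.location = {}          # key -> storage node id
        def put(self, key, node_id):
            self.location[key] = node_id
        def lookup(self, key):
            return self.location.get(key)

    class ManagementNode:
        def __init__(self, memory_id):
            self.memory_id = memory_id
            self.dbms = LocalDBMS()     # local database management system
            self.peers = []             # data connection component: links to other management nodes

        def connect(self, other):
            if other is not self and other not in self.peers:
                self.peers.append(other)
                other.connect(self)     # connections are symmetric

        def locate(self, key):
            """Look up a key locally first, then ask peer management nodes."""
            node = self.dbms.lookup(key)
            if node is not None:
                return self.memory_id, node
            for peer in self.peers:
                node = peer.dbms.lookup(key)
                if node is not None:
                    return peer.memory_id, node
            return None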
The channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes 3, and the channel switches are connected one-to-one with the computer nodes 3. The distributed data management module is connected to the multiple computer nodes 3 through the channel selector and, using a parallel computing framework, intelligently assigns each computing task to the computer node 3 connected to the storage node 21 that holds the data required by the task. This shortens the transmission path and transmission time of the data and, to a certain extent, prevents data errors during transmission, thereby avoiding the severe data transmission bottleneck of traditional distributed computing.
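The locality rule behind this assignment can be pictured with the following minimal sketch (an assumption-laden illustration, not the patented parallel computing framework; the node identifiers, the two dictionaries and the assign_task function are hypothetical):

    # one-to-one wiring: storage node id -> computer node id (via a channel switch)
    storage_to_computer = {"s1": "c1", "s2": "c2", "s3": "c3"}

    # placement recorded at ingest time: data key -> storage node id
    data_location = {"orders_2018": "s2", "click_log": "s3"}

    def assign_task(task_name: str, input_key: str) -> str:
        """Pick the computer node co-located with the task's input data."""
        storage_node = data_location[input_key]
        computer_node = storage_to_computer[storage_node]
        return computer_node

    print(assign_task("aggregate_orders", "orders_2018"))   # -> c2

Because the chosen computer node is wired directly to the storage node holding its input, the task reads its data over that dedicated link instead of shuffling it across the cluster, which is the bottleneck the patent aims to avoid.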
The multiple distributed multi-level fault-tolerant modules 7 correspond at least one-to-one with the memories 1. Each distributed multi-level fault-tolerant module 7 is connected to the data stream module 5, the data mining module 6, the distributed data management module and all of its corresponding storage nodes 21, and organizes process groups in a hierarchical manner, wherein one of the process groups is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group. The transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module 7. That is, each distributed multi-level fault-tolerant module 7 handles the failures or errors of its corresponding memory 1 and copes with them using a dynamic multi-level fault-tolerance mechanism, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
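As a hedged sketch of this checkpoint-and-election behaviour (not the patented design; the lowest-surviving-id rule is a simplified, bully-style stand-in for whatever standby election algorithm is actually used, and all class names are hypothetical):

    class ProcessGroup:
        def __init__(self, group_id):
            self.group_id = group_id
            self.alive = True
            self.checkpoint = {}        # last known interaction state of the other groups

    class FaultTolerantModule:
        def __init__(self, n_groups):
            self.groups = [ProcessGroup(i) for i in range(n_groups)]
            self.master = self.groups[0]

        def take_checkpoint(self):
            """Master records the interaction state of every other live group."""
            self.master.checkpoint = {
                g.group_id: "state-snapshot" for g in self.groups
                if g is not self.master and g.alive
            }

        def elect_new_master(self):
            """Standby election: promote the surviving group with the smallest id."""
            survivors = [g for g in self.groups if g.alive]
            if not survivors:
                raise RuntimeError("no live process group left")
            self.master = min(survivors, key=lambda g: g.group_id)

        def tick(self):
            """One supervision round: checkpoint if the master is up, otherwise elect."""
            if self.master.alive:
                self.take_checkpoint()
            else:
                self.elect_new_master()

    ft = FaultTolerantModule(4)
    ft.tick()                    # master 0 checkpoints groups 1-3
    ft.groups[0].alive = False   # simulate a master crash
    ft.tick()                    # election promotes a surviving group
    print(ft.master.group_id)    # -> 1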
In summary, the technical solutions provided by the embodiments of the present invention have the following beneficial effects. The big data all-in-one machine of the present invention uses the distributed data management module to assign, through the channel selector, each computing task to the computer node 3 connected to the storage node 21 that holds the data required by the task, which increases data transmission speed and avoids the severe data transmission bottleneck of traditional distributed computing. The big data all-in-one machine of the present invention also has multiple distributed multi-level fault-tolerant modules 7 and uses a dynamic multi-level fault-tolerance mechanism to cope with the failures and errors that occur in the machine, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
Herein, directional terms such as front, rear, top and bottom are defined by the positions of the components and parts in the accompanying drawing and their positions relative to one another, and are used only to express the technical solutions clearly and conveniently. It should be understood that the use of such directional terms shall not limit the scope claimed by this application.
Where no conflict arises, the embodiments described above and the features of those embodiments may be combined with one another.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A big data all-in-one machine, characterized in that it comprises:
multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes;
a data stream module, having an input/output interface, for acquiring and transmitting data streams;
multiple computer nodes, connected one-to-one with the storage nodes in each memory, so as to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules; the data mining module classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node in the corresponding memory for storage;
a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node that holds the data required by that task; and
multiple distributed multi-level fault-tolerant modules, corresponding at least one-to-one with the memories; each distributed multi-level fault-tolerant module is connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, and organizes process groups in a hierarchical manner, wherein one process group is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group; the transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module.
2. The big data all-in-one machine according to claim 1, characterized in that the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate; the cold data is delivered by the data mining module to low-speed, inexpensive memories, and the hot data is distributed by the data mining module, according to a storage prefetching algorithm, to high-speed, large-capacity memories.
3. The big data all-in-one machine according to claim 1, characterized in that the distributed data management module includes multiple management nodes; each management node corresponds to one memory and is connected to all storage nodes in the corresponding memory, and each management node includes:
a local database management system, responsible for managing the data in the memory corresponding to that management node; and
a data connection component, connecting the management node to all other management nodes.
4. The big data all-in-one machine according to claim 1, characterized in that the channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes, and the channel switches are connected one-to-one with the computer nodes.
CN201810936219.0A 2018-08-16 2018-08-16 Big data all-in-one Active CN109343791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Publications (2)

Publication Number Publication Date
CN109343791A 2019-02-15
CN109343791B (en) 2021-11-09

Family

ID=65296937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810936219.0A Active CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Country Status (1)

Country Link
CN (1) CN109343791B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
US20170024251A1 (en) * 2014-04-09 2017-01-26 Tencent Technology (Shenzhen) Company Limited Scheduling method and apparatus for distributed computing system
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
US20170024251A1 (en) * 2014-04-09 2017-01-26 Tencent Technology (Shenzhen) Company Limited Scheduling method and apparatus for distributed computing system
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Also Published As

Publication number Publication date
CN109343791B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
Shao et al. Managing and mining large graphs: systems and implementations
CN102307206B (en) Caching system and caching method for rapidly accessing virtual machine images based on cloud storage
Han et al. Spark: A big data processing platform based on memory computing
CN104011736A (en) Methods and systems for detection in a state machine
CN103235817B (en) A kind of extensive infection control data storage processing method
CN112148578A (en) IT fault defect prediction method based on machine learning
CN107301243A (en) Switchgear fault signature extracting method based on big data platform
CN107895046A (en) A kind of Heterogeneous Database Integration Platform
CN107992354A (en) For reducing the method and device of memory load
Li Modernization of databases in the cloud era: Building databases that run like Legos
CN109343791A (en) A kind of big data all-in-one machine
CN113177088A (en) Multi-scale simulation big data management system for material irradiation damage
Kudinov et al. Derivational modal logics with the difference modality
CN109799728B (en) Fault-tolerant CPS simulation test method based on hierarchical adaptive strategy
CN111241455A (en) Data processing apparatus, computer device, and storage medium
US20120054247A1 (en) Method and Apparatus for Automated Processing of a Data Stream
Li et al. A single-scan algorithm for mining sequential patterns from data streams
Dao Thi et al. Stochastic automata networks with master/slave synchronization: Product form and tensor
Thalheim et al. Analysis-driven data collection, integration and preparation for visualisation
CN105677853A (en) Data storage method and device based on big data technology framework
CN107193686A (en) Method and apparatus for data backup
Daud et al. Scalable link prediction in twitter using self-configured framework
Khan et al. Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model
Kalimoldayev et al. Solving mean-shift clustering using MapReduce Hadoop
Chiu et al. Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant