CN109343791A - Big data all-in-one machine - Google Patents
Big data all-in-one machine
- Publication number
- CN109343791A (application CN201810936219.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- node
- distributed
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/062—Securing storage systems
Abstract
The present invention provides a big data all-in-one machine, comprising: multiple distributed data storage modules, distributed across mutually independent memories and each containing a large number of storage nodes; a data-stream module for collecting and transmitting data streams; multiple computer nodes, connected one-to-one with the storage nodes to read the data stored in them; a data-mining module, connected to the data-stream module and the distributed data storage modules, which classifies incoming data streams and delivers each stream for storage to the corresponding storage node according to its class; a distributed data management module, connected to the computer nodes through a channel selector, which assigns each computing task to the computer node attached to the storage node holding the data that the task requires; and multiple distributed multi-level fault-tolerance modules, in one-to-one correspondence with the memories, connected to the data-stream module, the data-mining module, the distributed data management module, and the storage nodes, so that data streams and command flows pass through the corresponding distributed multi-level fault-tolerance module.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a big data all-in-one machine.
Background technique
As information technology spreads through every aspect of production and daily life, the volume of analyzable data is growing explosively. Efficiently and rapidly extracting latent value from massive data and turning it into a basis for decision-making has become a major challenge for informatization in every industry.

An all-in-one machine is an integrated product that combines software with hardware, generally uniting data processing, data transmission, and data storage in a single system. Because it is pre-integrated, pre-tested, and pre-optimized, an all-in-one machine enables rapid deployment, simplifies IT infrastructure, and saves resources. However, the software and hardware that make up such a machine fail from time to time, causing crashes or run failures that prolong computation or even lose data. An all-in-one machine that is both fault-tolerant and fast is therefore urgently needed.
Summary of the invention
In view of this, embodiments of the present invention provide a big data all-in-one machine that is fault-tolerant, fast, and efficient.
An embodiment of the present invention provides a big data all-in-one machine, comprising:

multiple distributed data storage modules, distributed across mutually independent memories, each comprising a large number of storage nodes;

a data-stream module with input/output interfaces, for collecting and transmitting data streams;

multiple computer nodes, connected one-to-one with the storage nodes of each memory, to read the data stored in the corresponding storage nodes;

a data-mining module, connected to the data-stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to each stream's class, delivers it for storage to the corresponding storage node in the corresponding memory;

a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node attached to the storage node holding the data that the task requires; and

multiple distributed multi-level fault-tolerance modules, in at least one-to-one correspondence with the memories. Each fault-tolerance module connects the data-stream module, the data-mining module, the distributed data management module, and all of its corresponding storage nodes, and organizes its processes into process groups in a hierarchical manner, one of which is the master process group. Checkpoints of the master process group dynamically and periodically record the interaction state of the other process groups; if the master process group crashes, a standby runs an election algorithm to select one of the other process groups as the new master process group. The transmission and storage of data streams, and the commands issued by the distributed data management module, all pass through the corresponding distributed multi-level fault-tolerance module.
Further, the mining algorithm is one or more of a clustering algorithm, a classification algorithm, and a prediction algorithm, and splits the data into frequently accessed hot data and infrequently accessed cold data. The data-mining module delivers the cold data to slow, inexpensive memories, and distributes the hot data to fast, large-capacity memories according to a storage prefetching algorithm.
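The hot/cold split described above can be illustrated with a minimal sketch. The patent does not give the actual mining algorithm; the access-count threshold and data structures below are purely illustrative assumptions.

```python
from collections import Counter

def split_hot_cold(access_log, hot_threshold=10):
    """Classify block IDs into hot and cold sets by access count.

    access_log: iterable of block IDs, one entry per recorded access.
    hot_threshold: illustrative cutoff (an assumption, not from the patent).
    """
    counts = Counter(access_log)
    hot = {blk for blk, n in counts.items() if n >= hot_threshold}
    cold = set(counts) - hot
    return hot, cold

# Block "a", accessed 12 times, lands in the hot set; "b" (3 accesses) is cold.
log = ["a"] * 12 + ["b"] * 3
hot, cold = split_hot_cold(log)
```

In a real system the log would be a sliding window so that blocks can cool down over time; a one-shot count is used here only to keep the sketch self-contained.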
Further, the distributed data management module comprises multiple management nodes. Each management node corresponds to one memory, connects to all storage nodes in that memory, and comprises: a local database management system, responsible for managing the data in the memory corresponding to that management node; and a data connection component, which connects the management node with all other management nodes.
Further, the channel selector comprises multiple channel switches, equal in number to the computer nodes and connected to them one-to-one.
The technical solution provided by the embodiments of the present invention has the following beneficial effects. Through the channel selector, the distributed data management module of the big data all-in-one machine assigns each computing task to the computer node attached to the storage node holding that task's data, which raises data transmission speed and avoids the severe data-transmission bottleneck of traditional distributed computing. The machine's multiple distributed multi-level fault-tolerance modules use a dynamic multi-level fault-tolerance mechanism to cope with failures and errors, ensuring smooth, fast task execution and preventing task interruption and data loss.
Brief description of the drawings
Fig. 1 is a schematic diagram of the big data all-in-one machine of the present invention.
Specific embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are further described below with reference to the accompanying drawing.
Referring to Fig. 1, an embodiment of the present invention provides a big data all-in-one machine comprising a data-stream module 5, a data-mining module 6, a channel selector, multiple distributed data storage modules 2, multiple computer nodes 3, multiple distributed multi-level fault-tolerance modules 7, and a distributed data management module.
The multiple distributed data storage modules 2 are distributed across mutually independent memories 1. Each memory 1 contains one distributed data storage module 2, and each distributed data storage module 2 comprises a large number of storage nodes 21.
The computer nodes 3 are equal in number to the storage nodes 21 and connected to them one-to-one, so that each computer node 3 can directly and quickly read the data stored in its corresponding storage node 21.
The data-stream module 5 has input/output interfaces for collecting and transmitting data streams. The data-mining module 6 connects the data-stream module 5 with all storage nodes 21; it classifies the data streams according to a mining algorithm and, according to each stream's class, delivers it for storage to the corresponding storage node 21 in the corresponding memory 1. Specifically, the mining algorithm is one or more of a clustering algorithm, a classification algorithm, and a prediction algorithm, and splits the data into frequently accessed hot data and infrequently accessed cold data. The data-mining module 6 delivers the cold data to slow, inexpensive memories 1, greatly reducing the system's construction cost and energy consumption at only a small cost in overall performance. It distributes the hot data to fast, large-capacity memories 1 according to a storage prefetching algorithm, improving access performance and computing speed.
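The storage-prefetching step is not specified in the patent; as one plausible reading, hot blocks could be staged into the fast tier with LRU-style eviction. The class below is an illustrative sketch under that assumption only.

```python
from collections import OrderedDict

class FastTierCache:
    """Illustrative staging area for hot blocks in the fast tier (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, ordered by recency

    def prefetch(self, block_id, data):
        # Re-inserting a block refreshes its recency; evict the least
        # recently used block once capacity is exceeded.
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        return None  # miss: the caller falls back to the slow tier

cache = FastTierCache(capacity=2)
cache.prefetch("h1", b"...")
cache.prefetch("h2", b"...")
cache.prefetch("h3", b"...")  # evicts h1, the least recently used block
```

A real prefetcher would also predict which blocks to stage before they are requested; this sketch only models the fast-tier residency policy.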
The distributed data management module comprises multiple management nodes 4. Each management node 4 corresponds to one memory 1, connects to all storage nodes 21 in that memory 1, and comprises a local database management system 41 and a data connection component 42. The local database management system 41 is responsible for managing the data in the memory 1 corresponding to its management node 4; the data connection component 42 connects the management node 4 with all other management nodes 4, enabling data sharing and facilitating optimal management of the overall data.
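A minimal sketch of this arrangement, assuming the local database management system can be stood in for by a dictionary and the data connection components by symmetric peer links; the class, method names, and lookup behavior are illustrative assumptions, not taken from the patent.

```python
class ManagementNode:
    """Illustrative management node: a local catalog plus peer links."""

    def __init__(self, name):
        self.name = name
        self.local = {}    # stand-in for the local database management system
        self.peers = []    # data connection components to other management nodes

    def connect(self, other):
        # A data connection component links two management nodes symmetrically.
        if other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def put(self, key, value):
        self.local[key] = value

    def lookup(self, key):
        # Serve from the local catalog if possible; otherwise query peers,
        # which is what makes the shared, global view of the data possible.
        if key in self.local:
            return self.local[key]
        for peer in self.peers:
            if key in peer.local:
                return peer.local[key]
        return None

m1, m2 = ManagementNode("m1"), ManagementNode("m2")
m1.connect(m2)
m2.put("table_x", "memory-2/node-7")
```

With the link in place, `m1.lookup("table_x")` resolves through its peer even though the entry lives in `m2`'s local catalog.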
The channel selector comprises multiple channel switches, equal in number to the computer nodes 3 and connected to them one-to-one. Through the channel selector, the distributed data management module connects to the multiple computer nodes 3 and, using a parallel computing framework, intelligently assigns each computing task to the computer node 3 attached to the storage node 21 holding that task's data. This shortens the data's transmission path and transmission time, to some extent prevents data errors during transmission, and avoids the severe data-transmission bottleneck of traditional distributed computing.
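The locality-aware dispatch described above reduces to two table lookups: find the storage node holding the task's data, then the computer node wired to it. The sketch below is illustrative; the mapping names and IDs are assumptions.

```python
def assign_task(task_data_id, data_location, node_of_storage):
    """Assign a task to the computer node wired to the storage node
    that holds the task's data (locality-aware dispatch).

    data_location: data ID -> storage node ID (maintained by the
        management nodes in this sketch's reading).
    node_of_storage: storage node ID -> computer node ID, i.e. the
        one-to-one wiring between storage and computer nodes.
    """
    storage_node = data_location[task_data_id]
    return node_of_storage[storage_node]

# Illustrative wiring: storage node "s21" is paired with computer node "c3".
data_location = {"dataset_a": "s21"}
node_of_storage = {"s21": "c3"}
```

Because the wiring is one-to-one, the dispatch is a constant-time lookup rather than a placement optimization, which is where the claimed reduction in transmission path comes from.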
The multiple distributed multi-level fault-tolerance modules 7 correspond at least one-to-one with the memories 1. Each fault-tolerance module 7 connects the data-stream module 5, the data-mining module 6, the distributed data management module, and all of its corresponding storage nodes 21, and organizes its processes into process groups in a hierarchical manner, one of which is the master process group. Checkpoints of the master process group dynamically and periodically record the interaction state of the other process groups; if the master process group crashes, a standby runs an election algorithm to select one of the other process groups as the new master process group. The transmission and storage of data streams, and the commands issued by the distributed data management module, all pass through the corresponding distributed multi-level fault-tolerance module. In other words, each distributed multi-level fault-tolerance module 7 handles the failures and errors of its corresponding memory 1, coping with them through a dynamic multi-level fault-tolerance mechanism that ensures smooth, fast task execution and prevents task interruption and data loss.
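A minimal sketch of the master/standby scheme follows. The patent names no specific election algorithm, so the lowest-numbered-survivor rule below is an illustrative assumption, as are the class and attribute names.

```python
class ProcessGroup:
    def __init__(self, gid):
        self.gid = gid
        self.alive = True
        self.state = {}  # interaction state to be checkpointed

class FaultToleranceModule:
    """Illustrative master/standby scheme: the master periodically
    checkpoints the interaction state of the other process groups; on a
    master crash, a standby elects a surviving group as the new master."""

    def __init__(self, group_ids):
        self.groups = [ProcessGroup(g) for g in group_ids]
        self.master = self.groups[0]
        self.checkpoint = {}

    def take_checkpoint(self):
        # Periodic snapshot of the non-master groups' interaction state,
        # so a new master can resume from known-good state after a crash.
        self.checkpoint = {g.gid: dict(g.state)
                           for g in self.groups if g is not self.master}

    def elect_new_master(self):
        # Assumed election rule: lowest-numbered surviving group wins.
        survivors = [g for g in self.groups if g.alive]
        self.master = min(survivors, key=lambda g: g.gid)
        return self.master

ftm = FaultToleranceModule([1, 2, 3])
ftm.take_checkpoint()
ftm.master.alive = False        # simulate the master process group crashing
new_master = ftm.elect_new_master()
```

Any deterministic election rule all standbys agree on would serve the same purpose; lowest-ID is used here only because it is the simplest to state.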
The technical solution provided by the embodiments of the present invention has the following beneficial effects. Through the channel selector, the distributed data management module of the big data all-in-one machine assigns each computing task to the computer node 3 attached to the storage node 21 holding that task's data, which raises data transmission speed and avoids the severe data-transmission bottleneck of traditional distributed computing. The machine's multiple distributed multi-level fault-tolerance modules 7 use a dynamic multi-level fault-tolerance mechanism to cope with failures and errors, ensuring smooth, fast task execution and preventing task interruption and data loss.
Herein, directional terms such as front, rear, top, and bottom are defined by the relative positions of the components in the drawing and are used only for clarity and convenience in describing the technical solution. It should be understood that their use does not limit the scope of protection of this application.
Where no conflict arises, the embodiments described above and their features may be combined with each other.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (4)
1. A big data all-in-one machine, characterized by comprising:
multiple distributed data storage modules, distributed across mutually independent memories, each comprising a large number of storage nodes;
a data-stream module with input/output interfaces, for collecting and transmitting data streams;
multiple computer nodes, connected one-to-one with the storage nodes of each memory, to read the data stored in the corresponding storage nodes;
a data-mining module, connected to the data-stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to each stream's class, delivers it for storage to the corresponding storage node in the corresponding memory;
a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node attached to the storage node holding the data that the task requires; and
multiple distributed multi-level fault-tolerance modules, in at least one-to-one correspondence with the memories, each connecting the data-stream module, the data-mining module, the distributed data management module, and all of its corresponding storage nodes, and organizing its processes into process groups in a hierarchical manner, one of which is the master process group, whose checkpoints dynamically and periodically record the interaction state of the other process groups; if the master process group crashes, a standby runs an election algorithm to select one of the other process groups as the new master process group; the transmission and storage of data streams and the commands issued by the distributed data management module pass through the corresponding distributed multi-level fault-tolerance module.
2. The big data all-in-one machine of claim 1, characterized in that the mining algorithm is one or more of a clustering algorithm, a classification algorithm, and a prediction algorithm, and splits the data into frequently accessed hot data and infrequently accessed cold data; the data-mining module delivers the cold data to slow, inexpensive memories and distributes the hot data to fast, large-capacity memories according to a storage prefetching algorithm.
3. The big data all-in-one machine of claim 1, characterized in that the distributed data management module comprises multiple management nodes, each corresponding to one memory and connected to all storage nodes in that memory, and each comprising: a local database management system, responsible for managing the data in the memory corresponding to that management node; and a data connection component, connecting the management node with all other management nodes.
4. The big data all-in-one machine of claim 1, characterized in that the channel selector comprises multiple channel switches, equal in number to the computer nodes and connected to them one-to-one.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810936219.0A | 2018-08-16 | 2018-08-16 | Big data all-in-one machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109343791A true CN109343791A (en) | 2019-02-15 |
CN109343791B CN109343791B (en) | 2021-11-09 |
Family
ID=65296937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810936219.0A | Big data all-in-one machine | 2018-08-16 | 2018-08-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109343791B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1494688A (en) * | 2001-02-24 | 2004-05-05 | International Business Machines Corp. | Novel massively parallel super computer |
CN104580503A (en) * | 2015-01-26 | 2015-04-29 | 浪潮电子信息产业股份有限公司 | Efficient dynamic load balancing system and method for processing large-scale data |
US20170024251A1 (en) * | 2014-04-09 | 2017-01-26 | Tencent Technology (Shenzhen) Company Limited | Scheduling method and apparatus for distributed computing system |
CN106575296A (en) * | 2014-06-20 | 2017-04-19 | 亚马逊技术股份有限公司 | Dynamic N-dimensional cubes for hosted analytics |
CN106815338A (en) * | 2016-12-25 | 2017-06-09 | 北京中海投资管理有限公司 | A kind of real-time storage of big data, treatment and inquiry system |
Also Published As
Publication number | Publication date |
---|---|
CN109343791B (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shao et al. | Managing and mining large graphs: systems and implementations | |
CN102307206B (en) | Caching system and caching method for rapidly accessing virtual machine images based on cloud storage | |
Han et al. | Spark: A big data processing platform based on memory computing | |
CN104011736A (en) | Methods and systems for detection in a state machine | |
CN103235817B (en) | A kind of extensive infection control data storage processing method | |
CN112148578A (en) | IT fault defect prediction method based on machine learning | |
CN107301243A (en) | Switchgear fault signature extracting method based on big data platform | |
CN107895046A (en) | A kind of Heterogeneous Database Integration Platform | |
CN107992354A (en) | For reducing the method and device of memory load | |
Li | Modernization of databases in the cloud era: Building databases that run like Legos | |
CN109343791A (en) | A kind of big data all-in-one machine | |
CN113177088A (en) | Multi-scale simulation big data management system for material irradiation damage | |
Kudinov et al. | Derivational modal logics with the difference modality | |
CN109799728B (en) | Fault-tolerant CPS simulation test method based on hierarchical adaptive strategy | |
CN111241455A (en) | Data processing apparatus, computer device, and storage medium | |
US20120054247A1 (en) | Method and Apparatus for Automated Processing of a Data Stream | |
Li et al. | A single-scan algorithm for mining sequential patterns from data streams | |
Dao Thi et al. | Stochastic automata networks with master/slave synchronization: Product form and tensor | |
Thalheim et al. | Analysis-driven data collection, integration and preparation for visualisation | |
CN105677853A (en) | Data storage method and device based on big data technology framework | |
CN107193686A (en) | Method and apparatus for data backup | |
Daud et al. | Scalable link prediction in twitter using self-configured framework | |
Khan et al. | Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model | |
Kalimoldayev et al. | Solving mean-shift clustering using MapReduce Hadoop | |
Chiu et al. | Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||