CN109343791A - Big data all-in-one machine - Google Patents

Big data all-in-one machine

Info

Publication number
CN109343791A
CN109343791A
Authority
CN
China
Prior art keywords
data
module
node
distributed
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810936219.0A
Other languages
Chinese (zh)
Other versions
CN109343791B (en)
Inventor
张隆显
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Original Assignee
Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Yuan Ding Chong Tian Mdt Infotech Ltd
Priority to CN201810936219.0A
Publication of CN109343791A
Application granted
Publication of CN109343791B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 - Securing storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present invention provides a big data all-in-one machine comprising: multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes; a data stream module for acquiring and transmitting data streams; multiple computer nodes, connected one-to-one with the storage nodes so as to obtain the data stored in them; a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams and, according to their classification, delivers them to the corresponding storage nodes for storage; a distributed data management module, connected to the computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node holding the data required by that task; and multiple distributed multi-level fault-tolerant modules, corresponding one-to-one with the memories and connected to the data stream module, the data mining module, the distributed data management module and the storage nodes, such that data streams and commands flow through the corresponding distributed multi-level fault-tolerant module.

Description

Big data all-in-one machine
Technical field
The present invention relates to the field of information technology, and in particular to a big data all-in-one machine.
Background technique
As information technology is applied ever more widely to every area of production and daily life, the volume of data available for analysis is growing explosively. Efficiently and rapidly mining potential value from massive data and turning it into a basis for decision-making has become a major challenge for informatization in every industry.
An all-in-one machine is an integrated product of software and hardware that typically combines data processing, data transmission and data storage in a single system. Because it is integrated, tested and optimized in advance, an all-in-one machine enables rapid deployment, simplifies the IT infrastructure and saves resources. However, the software and hardware that make up an all-in-one machine fail or err from time to time, causing crashes or operating failures that lengthen computation times and may even lose data. There is therefore an urgent need for an all-in-one machine that is fault tolerant and runs at high speed.
Summary of the invention
In view of this, embodiments of the present invention provide a big data all-in-one machine that is fault tolerant, executes quickly and operates efficiently.
An embodiment of the present invention provides a big data all-in-one machine, comprising:
multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes;
a data stream module, having an input/output interface, for acquiring and transmitting data streams;
multiple computer nodes, connected one-to-one with the storage nodes in each memory, so as to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules; the data mining module classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node in the corresponding memory for storage;
a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node that holds the data required by that task;
multiple distributed multi-level fault-tolerant modules, corresponding at least one-to-one with the memories; each distributed multi-level fault-tolerant module is connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, and organizes process groups in a hierarchical manner, wherein one process group is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group; the transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module.
Further, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate; the cold data is delivered by the data mining module to low-speed, inexpensive memories, while the hot data is distributed by the data mining module, according to a storage prefetching algorithm, to high-speed, large-capacity memories.
Further, the distributed data management module includes multiple management nodes; each management node corresponds to one memory and is connected to all storage nodes in the corresponding memory. Each management node includes:
a local database management system, responsible for managing the data in the memory corresponding to that management node; and
a data connection component, connecting the management node to all other management nodes.
Further, the channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes, and the channel switches are connected one-to-one with the computer nodes.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects. The big data all-in-one machine of the present invention uses the distributed data management module to assign, through the channel selector, each computing task to the computer node connected to the storage node that holds the data required by the task, which increases data transmission speed and avoids the severe data transmission bottleneck of traditional distributed computing. The big data all-in-one machine of the present invention also has multiple distributed multi-level fault-tolerant modules and uses a dynamic multi-level fault-tolerance mechanism to cope with the failures and errors that occur in the machine, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
Detailed description of the invention
Fig. 1 is a schematic diagram of the big data all-in-one machine of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are further described below with reference to the accompanying drawing.
Referring to Fig. 1, an embodiment of the present invention provides a big data all-in-one machine comprising a data stream module 5, a data mining module 6, a channel selector, multiple distributed data storage modules 2, multiple computer nodes 3, multiple distributed multi-level fault-tolerant modules 7 and a distributed data management module.
The multiple distributed data storage modules 2 are distributed across a plurality of mutually independent memories 1; each memory 1 contains one distributed data storage module 2, and each distributed data storage module 2 includes a large number of storage nodes 21.
The number of computer nodes 3 equals the number of storage nodes 21, and the computer nodes 3 are connected one-to-one with the storage nodes 21, so that each computer node 3 can directly and quickly obtain the data stored in the storage node 21 to which it is connected.
The data stream module 5 has an input/output interface for acquiring and transmitting data streams. The data mining module 6 is connected to the data stream module 5 and to all storage nodes 21; it classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node 21 in the corresponding memory 1 for storage. Specifically, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate. The cold data is delivered by the data mining module 6 to low-speed, inexpensive memories 1, which greatly reduces the construction cost and energy consumption of the system at the cost of only a small sacrifice in overall performance. The hot data is distributed by the data mining module 6, according to a storage prefetching algorithm, to high-speed, large-capacity memories 1, which improves access performance and computation speed.
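By way of a rough, non-authoritative illustration (this sketch is not part of the patent; the access-rate threshold, the tier names and the classify/route helpers are hypothetical), the hot/cold placement rule described above might be sketched in Python as follows:

    from dataclasses import dataclass

    @dataclass
    class DataItem:
        key: str
        access_rate: float  # accesses per unit time, assumed to be tracked elsewhere

    HOT_THRESHOLD = 100.0   # hypothetical cut-off between hot and cold data

    def classify(item: DataItem) -> str:
        """Label an item 'hot' or 'cold' by its observed access rate."""
        return "hot" if item.access_rate >= HOT_THRESHOLD else "cold"

    def route(item: DataItem) -> str:
        """Return the storage tier an item should be written to."""
        return "fast_large_memory" if classify(item) == "hot" else "slow_cheap_memory"

    items = [DataItem("sensor-A", 450.0), DataItem("archive-2017", 0.3)]
    for it in items:
        print(it.key, "->", route(it))   # sensor-A -> fast_large_memory, archive-2017 -> slow_cheap_memory

In a real system the threshold and the access-rate statistics would come from the clustering, classification or prediction algorithms the patent names; the fixed threshold here only stands in for that decision.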
The distributed data management module includes multiple management nodes 4; each management node 4 corresponds to one memory 1 and is connected to all storage nodes 21 in the corresponding memory 1. Each management node 4 includes a local database management system 41 and a data connection component 42. The local database management system 41 is responsible for managing the data in the memory 1 corresponding to that management node 4; the data connection component 42 connects the management node 4 to all other management nodes 4, so that data sharing is achieved and the overall data can be managed optimally.
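Purely as an illustrative sketch of this structure (not the patent's implementation; the class and method names such as LocalDBMS and ManagementNode.locate are invented for illustration), a management node that owns the metadata for one memory and keeps links to all peer management nodes could look like this:

    class LocalDBMS:
        """Tracks which storage node inside this memory holds which data key."""
        def __init__(self):
            self.location = {}          # key -> storage node id
        def put(self, key, node_id):
            self.location[key] = node_id
        def lookup(self, key):
            return self.location.get(key)

    class ManagementNode:
        def __init__(self, memory_id):
            self.memory_id = memory_id
            self.dbms = LocalDBMS()     # local database management system
            self.peers = []             # data connection component: links to other management nodes

        def connect(self, other):
            if other is not self and other not in self.peers:
                self.peers.append(other)
                other.connect(self)     # connections are symmetric

        def locate(self, key):
            """Look up a key locally first, then ask peer management nodes."""
            node = self.dbms.lookup(key)
            if node is not None:
                return self.memory_id, node
            for peer in self.peers:
                node = peer.dbms.lookup(key)
                if node is not None:
                    return peer.memory_id, node
            return None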
The channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes 3, and the channel switches are connected one-to-one with the computer nodes 3. The distributed data management module is connected to the multiple computer nodes 3 through the channel selector and, using a parallel computing framework, intelligently assigns each computing task to the computer node 3 connected to the storage node 21 that holds the data required by the task. This shortens the transmission path and transmission time of the data and, to a certain extent, prevents data errors during transmission, thereby avoiding the severe data transmission bottleneck of traditional distributed computing.
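The locality rule behind this assignment can be pictured with the following minimal sketch (an assumption-laden illustration, not the patented parallel computing framework; the node identifiers, the two dictionaries and the assign_task function are hypothetical):

    # one-to-one wiring: storage node id -> computer node id (via a channel switch)
    storage_to_computer = {"s1": "c1", "s2": "c2", "s3": "c3"}

    # placement recorded at ingest time: data key -> storage node id
    data_location = {"orders_2018": "s2", "click_log": "s3"}

    def assign_task(task_name: str, input_key: str) -> str:
        """Pick the computer node co-located with the task's input data."""
        storage_node = data_location[input_key]
        computer_node = storage_to_computer[storage_node]
        return computer_node

    print(assign_task("aggregate_orders", "orders_2018"))   # -> c2

Because the chosen computer node is wired directly to the storage node holding its input, the task reads its data over that dedicated link instead of shuffling it across the cluster, which is the bottleneck the patent aims to avoid.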
The multiple distributed multi-level fault-tolerant modules 7 correspond at least one-to-one with the memories 1. Each distributed multi-level fault-tolerant module 7 is connected to the data stream module 5, the data mining module 6, the distributed data management module and all of its corresponding storage nodes 21, and organizes process groups in a hierarchical manner, wherein one of the process groups is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group. The transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module 7. That is, each distributed multi-level fault-tolerant module 7 handles the failures or errors of its corresponding memory 1 and copes with them using a dynamic multi-level fault-tolerance mechanism, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
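As a hedged sketch of this checkpoint-and-election behaviour (not the patented design; the lowest-surviving-id rule is a simplified, bully-style stand-in for whatever standby election algorithm is actually used, and all class names are hypothetical):

    class ProcessGroup:
        def __init__(self, group_id):
            self.group_id = group_id
            self.alive = True
            self.checkpoint = {}        # last known interaction state of the other groups

    class FaultTolerantModule:
        def __init__(self, n_groups):
            self.groups = [ProcessGroup(i) for i in range(n_groups)]
            self.master = self.groups[0]

        def take_checkpoint(self):
            """Master records the interaction state of every other live group."""
            self.master.checkpoint = {
                g.group_id: "state-snapshot" for g in self.groups
                if g is not self.master and g.alive
            }

        def elect_new_master(self):
            """Standby election: promote the surviving group with the smallest id."""
            survivors = [g for g in self.groups if g.alive]
            if not survivors:
                raise RuntimeError("no live process group left")
            self.master = min(survivors, key=lambda g: g.group_id)

        def tick(self):
            """One supervision round: checkpoint if the master is up, otherwise elect."""
            if self.master.alive:
                self.take_checkpoint()
            else:
                self.elect_new_master()

    ft = FaultTolerantModule(4)
    ft.tick()                    # master 0 checkpoints groups 1-3
    ft.groups[0].alive = False   # simulate a master crash
    ft.tick()                    # election promotes a surviving group
    print(ft.master.group_id)    # -> 1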
In summary, the technical solutions provided by the embodiments of the present invention have the following beneficial effects. The big data all-in-one machine of the present invention uses the distributed data management module to assign, through the channel selector, each computing task to the computer node 3 connected to the storage node 21 that holds the data required by the task, which increases data transmission speed and avoids the severe data transmission bottleneck of traditional distributed computing. The big data all-in-one machine of the present invention also has multiple distributed multi-level fault-tolerant modules 7 and uses a dynamic multi-level fault-tolerance mechanism to cope with the failures and errors that occur in the machine, ensuring smooth, high-speed execution of tasks and preventing task interruption and data loss.
Herein, directional terms such as front, rear, top and bottom are defined by the positions of the components and parts in the accompanying drawing and their positions relative to one another, and are used only to express the technical solutions clearly and conveniently. It should be understood that the use of such directional terms shall not limit the scope claimed by this application.
Where no conflict arises, the embodiments described above and the features of those embodiments may be combined with one another.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A big data all-in-one machine, characterized in that it comprises:
multiple distributed data storage modules, distributed across a plurality of mutually independent memories and each including a large number of storage nodes;
a data stream module, having an input/output interface, for acquiring and transmitting data streams;
multiple computer nodes, connected one-to-one with the storage nodes in each memory, so as to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules; the data mining module classifies the data streams according to a mining algorithm and, according to the classification of each data stream, delivers it to the corresponding storage node in the corresponding memory for storage;
a distributed data management module, connected to the multiple computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node that holds the data required by that task; and
multiple distributed multi-level fault-tolerant modules, corresponding at least one-to-one with the memories; each distributed multi-level fault-tolerant module is connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, and organizes process groups in a hierarchical manner, wherein one process group is the master process group and the interaction state of the other process groups is dynamically and periodically updated through checkpoints of the master process group; if the master process group crashes, a standby election algorithm is executed to select one of the other process groups as the new master process group; the transmission and storage of data streams and the commands issued by the distributed data management module flow through the corresponding distributed multi-level fault-tolerant module.
2. The big data all-in-one machine according to claim 1, characterized in that the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and is used to divide the data into hot data with a high access rate and cold data with a low access rate; the cold data is delivered by the data mining module to low-speed, inexpensive memories, and the hot data is distributed by the data mining module, according to a storage prefetching algorithm, to high-speed, large-capacity memories.
3. The big data all-in-one machine according to claim 1, characterized in that the distributed data management module includes multiple management nodes; each management node corresponds to one memory and is connected to all storage nodes in the corresponding memory, and each management node includes:
a local database management system, responsible for managing the data in the memory corresponding to that management node; and
a data connection component, connecting the management node to all other management nodes.
4. The big data all-in-one machine according to claim 1, characterized in that the channel selector includes multiple channel switches; the number of channel switches equals the number of computer nodes, and the channel switches are connected one-to-one with the computer nodes.
CN201810936219.0A 2018-08-16 2018-08-16 Big data all-in-one Active CN109343791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Publications (2)

Publication Number Publication Date
CN109343791A 2019-02-15
CN109343791B (en) 2021-11-09

Family

ID=65296937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810936219.0A Active CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Country Status (1)

Country Link
CN (1) CN109343791B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
US20170024251A1 (en) * 2014-04-09 2017-01-26 Tencent Technology (Shenzhen) Company Limited Scheduling method and apparatus for distributed computing system
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
US20170024251A1 (en) * 2014-04-09 2017-01-26 Tencent Technology (Shenzhen) Company Limited Scheduling method and apparatus for distributed computing system
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Also Published As

Publication number Publication date
CN109343791B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
Shao et al. Managing and mining large graphs: systems and implementations
CN102307206B (en) Caching system and caching method for rapidly accessing virtual machine images based on cloud storage
Han et al. Spark: A big data processing platform based on memory computing
CN104011736A (en) Methods and systems for detection in a state machine
CN103235817B (en) A kind of extensive infection control data storage processing method
CN112148578A (en) IT fault defect prediction method based on machine learning
CN107301243A (en) Switchgear fault signature extracting method based on big data platform
CN107895046A (en) A kind of Heterogeneous Database Integration Platform
CN107992354A (en) For reducing the method and device of memory load
Li Modernization of databases in the cloud era: Building databases that run like Legos
CN109343791A (en) A kind of big data all-in-one machine
CN113177088A (en) Multi-scale simulation big data management system for material irradiation damage
Kudinov et al. Derivational modal logics with the difference modality
CN109799728B (en) Fault-tolerant CPS simulation test method based on hierarchical adaptive strategy
CN111241455A (en) Data processing apparatus, computer device, and storage medium
US20120054247A1 (en) Method and Apparatus for Automated Processing of a Data Stream
Li et al. A single-scan algorithm for mining sequential patterns from data streams
Dao Thi et al. Stochastic automata networks with master/slave synchronization: Product form and tensor
Thalheim et al. Analysis-driven data collection, integration and preparation for visualisation
CN105677853A (en) Data storage method and device based on big data technology framework
CN107193686A (en) Method and apparatus for data backup
Daud et al. Scalable link prediction in twitter using self-configured framework
Khan et al. Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model
Kalimoldayev et al. Solving mean-shift clustering using MapReduce Hadoop
Chiu et al. Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant