CN109343791B - Big data all-in-one - Google Patents

Big data all-in-one

Info

Publication number
CN109343791B
CN109343791B
Authority
CN
China
Prior art keywords
data
distributed
module
storage
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810936219.0A
Other languages
Chinese (zh)
Other versions
CN109343791A (en)
Inventor
张隆显 (Zhang Longxian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Yuanding Chuangtian Information Technology Co ltd
Original Assignee
Wuhan Yuanding Chuangtian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Yuanding Chuangtian Information Technology Co., Ltd.
Priority to CN201810936219.0A
Publication of CN109343791A
Application granted
Publication of CN109343791B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a big data all-in-one machine comprising: a plurality of distributed data storage modules, distributed across mutually independent storage devices, each comprising a large number of storage nodes; a data stream module for acquiring and transmitting data streams; a plurality of computer nodes connected one-to-one with the storage nodes to obtain the data stored in them; a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams and stores each stream in the storage node matching its class; a distributed data management module, connected to the computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node holding the data the task requires; and distributed multi-level fault-tolerant modules, one per storage device, connected to the data stream module, the data mining module, the distributed data management module and the storage nodes, through which all data streams and commands flow.

Description

Big data all-in-one
Technical Field
The invention relates to the field of information technology, and in particular to a big data all-in-one machine.
Background
As information technology penetrates ever more areas of production and daily life, the volume of analyzable data is growing explosively. Efficiently and rapidly mining the potential value of this mass of data and converting it into a basis for decisions has become a major challenge for the informatization of every industry.
An all-in-one machine is an integrated hardware-and-software system product that typically combines data processing, data transmission and data storage. Because it is pre-integrated, tested and optimized, it can be deployed rapidly, simplifies the IT infrastructure and saves resources. However, the software and hardware making up such a machine occasionally fail or produce errors, causing the machine to hang or stop operating, lengthening computation time and even losing data. An all-in-one machine that is both fault-tolerant and fast is therefore urgently needed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a big data all-in-one machine that is fault-tolerant, fast and efficient.
An embodiment of the invention provides a big data all-in-one machine comprising:
a plurality of distributed data storage modules, distributed across a plurality of mutually independent storage devices, each comprising a large number of storage nodes;
a data stream module with input/output interfaces for acquiring and transmitting data streams;
a plurality of computer nodes connected one-to-one with the storage nodes on each storage device, so that each computer node can obtain the data stored in its corresponding storage node;
a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to that classification, stores each stream in the corresponding storage node of the corresponding storage device;
a distributed data management module, connected to the plurality of computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node storing the data required by the task;
and a plurality of distributed multi-level fault-tolerant modules, at least one per storage device, each connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes. Each distributed multi-level fault-tolerant module organizes its processes into hierarchical process groups; one group serves as the primary process group and is dynamic, its checkpoint periodically updates the interaction states of the other process groups, and if the primary group crashes, a backup runs an election algorithm to select a new primary group from the remaining groups. All data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module.
Further, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and divides the data into hot data with a high access rate and cold data with a low access rate. The data mining module transfers cold data to slow, low-cost storage devices and, guided by a storage prefetching algorithm, distributes hot data to fast, high-capacity storage devices.
Further, the distributed data management module comprises a plurality of management nodes, each corresponding to one of the storage devices and connected to all storage nodes in that storage device, and each management node comprises:
a local database management system responsible for managing the data in the storage device corresponding to the management node;
and a data connection component for connecting the management node to all other management nodes.
Further, the channel selector comprises a plurality of channel switches, equal in number to the computer nodes and connected to them one-to-one.
The technical solution provided by the embodiments of the invention has the following beneficial effects. Because the distributed data management module uses the channel selector to assign each computing task to the computer node connected to the storage node holding the data the task needs, data transmission is faster and the severe data-transmission bottleneck of traditional distributed computing is avoided. The big data all-in-one machine is also equipped with a plurality of distributed multi-level fault-tolerant modules whose dynamic multi-level fault-tolerance mechanism handles the machine's faults and errors, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
Drawings
FIG. 1 is a schematic diagram of the big data all-in-one machine of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present invention provides a big data all-in-one machine comprising a data stream module 5, a data mining module 6, a channel selector, a plurality of distributed data storage modules 2, a plurality of computer nodes 3, a plurality of distributed multi-level fault-tolerant modules 7 and a distributed data management module.
The distributed data storage modules 2 are distributed across a plurality of mutually independent storage devices 1: each storage device 1 contains one distributed data storage module 2, and each distributed data storage module 2 comprises a large number of storage nodes 21.
There are as many computer nodes 3 as storage nodes 21, and they are connected one-to-one, so that each computer node 3 can directly and quickly read the data stored in its corresponding storage node 21.
The data stream module 5 has input/output interfaces for acquiring and transmitting data streams. The data mining module 6 is connected to the data stream module 5 and to all storage nodes 21; it classifies the data streams according to a mining algorithm and, according to that classification, forwards each stream to the corresponding storage node 21 in the corresponding storage device 1 for storage. Specifically, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and divides the data into hot data with a high access rate and cold data with a low access rate. The data mining module 6 transfers cold data to slow, inexpensive storage devices 1, which greatly reduces the construction cost and energy consumption of the system at the price of a slight loss in overall performance, and distributes hot data to fast, high-capacity storage devices 1 according to a storage prefetching algorithm, improving access performance and processing speed.
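The specification gives no concrete code for this hot/cold split; the following minimal Python sketch only illustrates the idea, assuming a simple access-rate threshold as the classification rule (the threshold value, names and dictionary-based tiers are illustrative assumptions, not part of the patent):

```python
from dataclasses import dataclass

# Illustrative access-rate threshold (accesses per hour); not specified in the patent.
HOT_THRESHOLD = 100.0

@dataclass
class DataBlock:
    key: str
    access_rate: float  # observed accesses per hour

def classify(block: DataBlock) -> str:
    """Split data into hot and cold classes by access rate, as the mining module does."""
    return "hot" if block.access_rate >= HOT_THRESHOLD else "cold"

def place(block: DataBlock, fast_store: dict, cheap_store: dict) -> None:
    """Route hot data to the fast, high-capacity tier and cold data to the slow, cheap tier."""
    tier = classify(block)
    (fast_store if tier == "hot" else cheap_store)[block.key] = block

# Usage: two dictionaries stand in for the two kinds of storage devices.
fast, cheap = {}, {}
for b in [DataBlock("sensor-1", 250.0), DataBlock("archive-7", 2.5)]:
    place(b, fast, cheap)
print(sorted(fast), sorted(cheap))  # ['sensor-1'] ['archive-7']
```

A real deployment would use an actual clustering, classification or prediction algorithm in place of the fixed threshold; the routing step stays the same.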
The distributed data management module comprises a plurality of management nodes 4. Each management node 4 corresponds to one storage device 1 and is connected to all storage nodes 21 in that storage device 1, and each management node 4 comprises a local database management system 41 and a data connection component 42. The local database management system 41 manages the data in the storage device 1 corresponding to its management node 4; the data connection component 42 connects the management node 4 to all other management nodes 4, enabling data sharing and making it easier to manage the data of the whole machine in an optimized way.
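Purely as an illustration of this layout (the class and method names below are assumptions, not taken from the specification), a management node 4 can be sketched as a local database handle plus a peer-connection component:

```python
from typing import Dict, List

class LocalDBMS:
    """Stands in for the local database management system 41 of one storage device."""
    def __init__(self) -> None:
        self.tables: Dict[str, list] = {}

    def put(self, table: str, row: dict) -> None:
        self.tables.setdefault(table, []).append(row)

class ManagementNode:
    """One management node 4: a local DBMS plus a data connection component to its peers."""
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.dbms = LocalDBMS()                   # local database management system 41
        self.peers: List["ManagementNode"] = []   # data connection component 42

    def connect(self, others: List["ManagementNode"]) -> None:
        """Connect this management node to every other management node."""
        self.peers = [n for n in others if n is not self]

# Usage: build three management nodes and fully interconnect them.
nodes = [ManagementNode(i) for i in range(3)]
for n in nodes:
    n.connect(nodes)
print([len(n.peers) for n in nodes])  # [2, 2, 2]
```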
The channel selector comprises a plurality of channel switches, equal in number to the computer nodes 3 and connected to them one-to-one. The distributed data management module is connected to the computer nodes 3 through the channel selector and, using a parallel computing framework, intelligently assigns each computing task to the computer node 3 connected to the storage node 21 that stores the data the task requires. This shortens data transmission paths and times, helps prevent errors during transmission, and avoids the severe data-transmission bottleneck of traditional distributed computing.
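A minimal sketch of this locality-aware assignment, assuming a simple lookup table that maps data items to storage-node indices and an index-based pairing of computer and storage nodes (the table contents, function name and task format are illustrative assumptions):

```python
from typing import Dict, List

# Illustrative mapping: which storage node holds each data item (assumed, not from the patent).
data_location: Dict[str, int] = {"orders_2018": 0, "logs_q3": 1, "archive": 2}

def assign_task(required_data: str, n_nodes: int) -> int:
    """Return the computer node paired with the storage node that holds the required data.

    Computer node i is wired one-to-one to storage node i through channel switch i,
    so the task is dispatched to the node that can read its input locally.
    """
    storage_node = data_location[required_data]
    if storage_node >= n_nodes:
        raise ValueError("no computer node is paired with that storage node")
    return storage_node  # one-to-one pairing: same index

tasks: List[str] = ["logs_q3", "orders_2018"]
print([assign_task(t, n_nodes=3) for t in tasks])  # [1, 0]
```

In practice the dispatch would pass through the corresponding channel switch; the index pairing above only captures the locality rule the management module relies on.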
The distributed multi-level fault-tolerant modules 7 correspond at least one-to-one to the storage devices 1. Each is connected to the data stream module 5, the data mining module 6, the distributed data management module and all of its corresponding storage nodes 21, and organizes its processes into hierarchical process groups. One group serves as the primary process group and is dynamic: its checkpoint periodically updates the interaction states of the other process groups, and if the primary group crashes, a backup runs an election algorithm to select a new primary group from the remaining groups. All data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module 7. Each module handles the faults and errors of its storage device with this dynamic multi-level fault-tolerance mechanism, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
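The specification does not name the election algorithm; the sketch below assumes a simple highest-rank election among the surviving process groups and dictionary-valued checkpoints, purely to illustrate the primary/backup hand-over described above (all names and the ranking rule are assumptions):

```python
from typing import Dict, List, Optional

class ProcessGroup:
    """One process group inside a distributed multi-level fault-tolerant module 7."""
    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.alive = True
        self.checkpoint: Dict[str, str] = {}  # last interaction state pushed by the primary

class FaultTolerantModule:
    def __init__(self, n_groups: int) -> None:
        self.groups: List[ProcessGroup] = [ProcessGroup(r) for r in range(n_groups)]
        self.primary: ProcessGroup = self.groups[-1]  # initial primary group

    def checkpoint_round(self, state: Dict[str, str]) -> None:
        """Primary periodically pushes the interaction state to every other live group."""
        for g in self.groups:
            if g is not self.primary and g.alive:
                g.checkpoint = dict(state)

    def elect(self) -> Optional[ProcessGroup]:
        """Backup election: promote the highest-ranked surviving group (assumed rule)."""
        survivors = [g for g in self.groups if g.alive]
        self.primary = max(survivors, key=lambda g: g.rank) if survivors else None
        return self.primary

# Usage: checkpoint, crash the primary, elect a replacement that already holds the state.
ftm = FaultTolerantModule(n_groups=3)
ftm.checkpoint_round({"task": "42", "phase": "map"})
ftm.primary.alive = False
new_primary = ftm.elect()
print(new_primary.rank, new_primary.checkpoint)  # 1 {'task': '42', 'phase': 'map'}
```

A real module would replicate the checkpoint over a network rather than in memory; the promotion logic is the part of the mechanism the patent describes.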
The technical solution provided by the embodiments of the invention has the following beneficial effects. Because the distributed data management module uses the channel selector to assign each computing task to the computer node 3 connected to the storage node 21 holding the data the task needs, data transmission is faster and the severe data-transmission bottleneck of traditional distributed computing is avoided. The big data all-in-one machine is also equipped with a plurality of distributed multi-level fault-tolerant modules 7 whose dynamic multi-level fault-tolerance mechanism handles the machine's faults and errors, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
In this document, directional terms such as front, back, upper and lower refer to the positions of components in the drawings and relative to one another, and are used only for clarity and convenience in describing the technical solution; they should not be taken to limit the scope of the claims.
The features of the embodiments described above may be combined with one another provided they do not conflict.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention are intended to fall within its scope of protection.

Claims (4)

1. A big data all-in-one machine, characterized in that it comprises:
a plurality of distributed data storage modules, distributed across a plurality of mutually independent storage devices, each comprising a plurality of storage nodes;
a data stream module with input/output interfaces for acquiring and transmitting data streams;
a plurality of computer nodes connected one-to-one with the storage nodes on each storage device to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to that classification, stores each stream in the corresponding storage node of the corresponding storage device;
a distributed data management module, connected to the plurality of computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node storing the data required by the task;
and a plurality of distributed multi-level fault-tolerant modules corresponding one-to-one to the storage devices, each connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, wherein each distributed multi-level fault-tolerant module organizes its processes into hierarchical process groups, one of which serves as the primary process group and is dynamic, the checkpoint of the primary process group periodically updates the interaction states of the other process groups, and if the primary process group crashes, a backup runs an election algorithm to select a new primary process group from the other process groups; all data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module.
2. The big data all-in-one machine of claim 1, wherein the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm and divides the data into hot data with a high access rate and cold data with a low access rate; the data mining module transfers the cold data to slow, low-cost storage devices and distributes the hot data to fast, high-capacity storage devices according to a storage prefetching algorithm.
3. The big data all-in-one machine of claim 1, wherein the distributed data management module comprises a plurality of management nodes, each corresponding to one storage device and connected to all storage nodes in that storage device, and each management node comprises:
a local database management system responsible for managing the data in the storage device corresponding to the management node;
and a data connection component for connecting the management node to all other management nodes.
4. The big data all-in-one machine of claim 1, wherein the channel selector comprises a plurality of channel switches, equal in number to the computer nodes and connected to them one-to-one.
CN201810936219.0A 2018-08-16 2018-08-16 Big data all-in-one Active CN109343791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Publications (2)

Publication Number Publication Date
CN109343791A (en) 2019-02-15
CN109343791B (en) 2021-11-09

Family

ID=65296937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810936219.0A Active CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Country Status (1)

Country Link
CN (1) CN109343791B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978228B (en) * 2014-04-09 2019-08-30 腾讯科技(深圳)有限公司 A kind of dispatching method and device of distributed computing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Also Published As

Publication number Publication date
CN109343791A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
US11436400B2 (en) Optimization method for graph processing based on heterogeneous FPGA data streams
US4247892A (en) Arrays of machines such as computers
CN109977116B (en) FPGA-DDR-based hash connection operator acceleration method and system
US9195599B2 (en) Multi-level aggregation techniques for memory hierarchies
US10521395B1 (en) Systems and methods for implementing an intelligence processing computing architecture
CN102968339A (en) System and method for realizing complicated event handling based on cloud computing architecture
CN102937964A (en) Intelligent data service method based on distributed system
CN102622323B (en) Data transmission management method based on switch matrix in dynamic configurable serial bus
CN103366021A (en) Variable neighborhood search method and system on cloud computing platform
US20200348871A1 (en) Memory system, operating method thereof and computing system for classifying data according to read and write counts and storing the classified data in a plurality of types of memory devices
CN110442446B (en) Method for real-time processing high-speed digital signal data stream
CN109343791B (en) Big data all-in-one
Kobus et al. Gossip: Efficient communication primitives for multi-gpu systems
Feng et al. Criso: an incremental scalable and cost-effective network architecture for data centers
CN116069480B (en) Processor and computing device
CN105608046A (en) Multi-core processor architecture based on MapReduce programming model
CN1456994A (en) Self-organizing dynamic network computer system structure
CN116302574A (en) Concurrent processing method based on MapReduce
CN112965805B (en) Cross-process asynchronous task processing method and system based on memory mapping file
CN102819218A (en) Discrete event system monitor on basis of event control function and control method thereof
CN112380288A (en) Decentralized distributed data processing system
CN102200961B (en) Expansion method of sub-units in dynamically reconfigurable processor
CN108491167B (en) Industrial process working condition data rapid random distribution storage method
CN112751789A (en) Method and system for realizing asymmetric SDN controller cluster
US8200913B2 (en) Distributed memory type information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant