CN109343791B - Big data all-in-one - Google Patents

Big data all-in-one

Info

Publication number
CN109343791B
CN109343791B
Authority
CN
China
Prior art keywords
data
distributed
module
storage
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810936219.0A
Other languages
Chinese (zh)
Other versions
CN109343791A (en)
Inventor
张隆显 (Zhang Longxian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Yuanding Chuangtian Information Technology Co ltd
Original Assignee
Wuhan Yuanding Chuangtian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Yuanding Chuangtian Information Technology Co., Ltd.
Priority to CN201810936219.0A
Publication of CN109343791A
Application granted
Publication of CN109343791B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a big data all-in-one machine comprising: a plurality of distributed data storage modules, distributed across mutually independent storage devices, each comprising a large number of storage nodes; a data stream module for acquiring and transmitting data streams; a plurality of computer nodes connected one-to-one with the storage nodes to obtain the data stored in them; a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams and stores each stream in the storage node matching its class; a distributed data management module, connected to the computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node holding the data the task requires; and distributed multi-level fault-tolerant modules, one per storage device, connected to the data stream module, the data mining module, the distributed data management module and the storage nodes, through which all data streams and commands flow.

Description

Big data all-in-one
Technical Field
The invention relates to the field of information technology, and in particular to a big data all-in-one machine.
Background
As information technology penetrates ever more areas of production and daily life, the volume of analyzable data is growing explosively. Efficiently and rapidly mining the potential value of this mass of data and converting it into a basis for decisions has become a major challenge for the informatization of every industry.
An all-in-one machine is an integrated hardware-and-software system product that typically combines data processing, data transmission and data storage. Because it is pre-integrated, tested and optimized, it can be deployed rapidly, simplifies the IT infrastructure and saves resources. However, the software and hardware making up such a machine occasionally fail or produce errors, causing the machine to hang or stop operating, lengthening computation time and even losing data. An all-in-one machine that is both fault-tolerant and fast is therefore urgently needed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a big data all-in-one machine that is fault-tolerant, fast and efficient.
An embodiment of the invention provides a big data all-in-one machine comprising:
a plurality of distributed data storage modules, distributed across a plurality of mutually independent storage devices, each comprising a large number of storage nodes;
a data stream module with input/output interfaces for acquiring and transmitting data streams;
a plurality of computer nodes connected one-to-one with the storage nodes on each storage device, so that each computer node can obtain the data stored in its corresponding storage node;
a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to that classification, stores each stream in the corresponding storage node of the corresponding storage device;
a distributed data management module, connected to the plurality of computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node storing the data required by the task;
and a plurality of distributed multi-level fault-tolerant modules, at least one per storage device, each connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes. Each distributed multi-level fault-tolerant module organizes its processes into hierarchical process groups; one group serves as the primary process group and is dynamic, its checkpoint periodically updates the interaction states of the other process groups, and if the primary group crashes, a backup runs an election algorithm to select a new primary group from the remaining groups. All data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module.
Further, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and divides the data into hot data with a high access rate and cold data with a low access rate. The data mining module transfers cold data to slow, low-cost storage devices and, guided by a storage prefetching algorithm, distributes hot data to fast, high-capacity storage devices.
Further, the distributed data management module comprises a plurality of management nodes, each corresponding to one of the storage devices and connected to all storage nodes in that storage device, and each management node comprises:
a local database management system responsible for managing the data in the storage device corresponding to the management node;
and a data connection component for connecting the management node to all other management nodes.
Further, the channel selector comprises a plurality of channel switches, equal in number to the computer nodes and connected to them one-to-one.
The technical solution provided by the embodiments of the invention has the following beneficial effects. Because the distributed data management module uses the channel selector to assign each computing task to the computer node connected to the storage node holding the data the task needs, data transmission is faster and the severe data-transmission bottleneck of traditional distributed computing is avoided. The big data all-in-one machine is also equipped with a plurality of distributed multi-level fault-tolerant modules whose dynamic multi-level fault-tolerance mechanism handles the machine's faults and errors, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
Drawings
FIG. 1 is a schematic diagram of the big data all-in-one machine of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to FIG. 1, an embodiment of the present invention provides a big data all-in-one machine comprising a data stream module 5, a data mining module 6, a channel selector, a plurality of distributed data storage modules 2, a plurality of computer nodes 3, a plurality of distributed multi-level fault-tolerant modules 7 and a distributed data management module.
The distributed data storage modules 2 are distributed across a plurality of mutually independent storage devices 1: each storage device 1 contains one distributed data storage module 2, and each distributed data storage module 2 comprises a large number of storage nodes 21.
There are as many computer nodes 3 as storage nodes 21, and they are connected one-to-one, so that each computer node 3 can directly and quickly read the data stored in its corresponding storage node 21.
The data stream module 5 has input/output interfaces for acquiring and transmitting data streams. The data mining module 6 is connected to the data stream module 5 and to all storage nodes 21; it classifies the data streams according to a mining algorithm and, according to that classification, forwards each stream to the corresponding storage node 21 in the corresponding storage device 1 for storage. Specifically, the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm, and divides the data into hot data with a high access rate and cold data with a low access rate. The data mining module 6 transfers cold data to slow, inexpensive storage devices 1, which greatly reduces the construction cost and energy consumption of the system at the price of a slight loss in overall performance, and distributes hot data to fast, high-capacity storage devices 1 according to a storage prefetching algorithm, improving access performance and processing speed.
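The specification gives no concrete code for this hot/cold split; the following minimal Python sketch only illustrates the idea, assuming a simple access-rate threshold as the classification rule (the threshold value, names and dictionary-based tiers are illustrative assumptions, not part of the patent):

```python
from dataclasses import dataclass

# Illustrative access-rate threshold (accesses per hour); not specified in the patent.
HOT_THRESHOLD = 100.0

@dataclass
class DataBlock:
    key: str
    access_rate: float  # observed accesses per hour

def classify(block: DataBlock) -> str:
    """Split data into hot and cold classes by access rate, as the mining module does."""
    return "hot" if block.access_rate >= HOT_THRESHOLD else "cold"

def place(block: DataBlock, fast_store: dict, cheap_store: dict) -> None:
    """Route hot data to the fast, high-capacity tier and cold data to the slow, cheap tier."""
    tier = classify(block)
    (fast_store if tier == "hot" else cheap_store)[block.key] = block

# Usage: two dictionaries stand in for the two kinds of storage devices.
fast, cheap = {}, {}
for b in [DataBlock("sensor-1", 250.0), DataBlock("archive-7", 2.5)]:
    place(b, fast, cheap)
print(sorted(fast), sorted(cheap))  # ['sensor-1'] ['archive-7']
```

A real deployment would use an actual clustering, classification or prediction algorithm in place of the fixed threshold; the routing step stays the same.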
The distributed data management module comprises a plurality of management nodes 4. Each management node 4 corresponds to one storage device 1 and is connected to all storage nodes 21 in that storage device 1, and each management node 4 comprises a local database management system 41 and a data connection component 42. The local database management system 41 manages the data in the storage device 1 corresponding to its management node 4; the data connection component 42 connects the management node 4 to all other management nodes 4, enabling data sharing and making it easier to manage the data of the whole machine in an optimized way.
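Purely as an illustration of this layout (the class and method names below are assumptions, not taken from the specification), a management node 4 can be sketched as a local database handle plus a peer-connection component:

```python
from typing import Dict, List

class LocalDBMS:
    """Stands in for the local database management system 41 of one storage device."""
    def __init__(self) -> None:
        self.tables: Dict[str, list] = {}

    def put(self, table: str, row: dict) -> None:
        self.tables.setdefault(table, []).append(row)

class ManagementNode:
    """One management node 4: a local DBMS plus a data connection component to its peers."""
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.dbms = LocalDBMS()                   # local database management system 41
        self.peers: List["ManagementNode"] = []   # data connection component 42

    def connect(self, others: List["ManagementNode"]) -> None:
        """Connect this management node to every other management node."""
        self.peers = [n for n in others if n is not self]

# Usage: build three management nodes and fully interconnect them.
nodes = [ManagementNode(i) for i in range(3)]
for n in nodes:
    n.connect(nodes)
print([len(n.peers) for n in nodes])  # [2, 2, 2]
```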
The channel selector comprises a plurality of channel switches, equal in number to the computer nodes 3 and connected to them one-to-one. The distributed data management module is connected to the computer nodes 3 through the channel selector and, using a parallel computing framework, intelligently assigns each computing task to the computer node 3 connected to the storage node 21 that stores the data the task requires. This shortens data transmission paths and times, helps prevent errors during transmission, and avoids the severe data-transmission bottleneck of traditional distributed computing.
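A minimal sketch of this locality-aware assignment, assuming a simple lookup table that maps data items to storage-node indices and an index-based pairing of computer and storage nodes (the table contents, function name and task format are illustrative assumptions):

```python
from typing import Dict, List

# Illustrative mapping: which storage node holds each data item (assumed, not from the patent).
data_location: Dict[str, int] = {"orders_2018": 0, "logs_q3": 1, "archive": 2}

def assign_task(required_data: str, n_nodes: int) -> int:
    """Return the computer node paired with the storage node that holds the required data.

    Computer node i is wired one-to-one to storage node i through channel switch i,
    so the task is dispatched to the node that can read its input locally.
    """
    storage_node = data_location[required_data]
    if storage_node >= n_nodes:
        raise ValueError("no computer node is paired with that storage node")
    return storage_node  # one-to-one pairing: same index

tasks: List[str] = ["logs_q3", "orders_2018"]
print([assign_task(t, n_nodes=3) for t in tasks])  # [1, 0]
```

In practice the dispatch would pass through the corresponding channel switch; the index pairing above only captures the locality rule the management module relies on.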
The distributed multi-level fault-tolerant modules 7 correspond at least one-to-one to the storage devices 1. Each is connected to the data stream module 5, the data mining module 6, the distributed data management module and all of its corresponding storage nodes 21, and organizes its processes into hierarchical process groups. One group serves as the primary process group and is dynamic: its checkpoint periodically updates the interaction states of the other process groups, and if the primary group crashes, a backup runs an election algorithm to select a new primary group from the remaining groups. All data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module 7. Each module handles the faults and errors of its storage device with this dynamic multi-level fault-tolerance mechanism, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
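The specification does not name the election algorithm; the sketch below assumes a simple highest-rank election among the surviving process groups and dictionary-valued checkpoints, purely to illustrate the primary/backup hand-over described above (all names and the ranking rule are assumptions):

```python
from typing import Dict, List, Optional

class ProcessGroup:
    """One process group inside a distributed multi-level fault-tolerant module 7."""
    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.alive = True
        self.checkpoint: Dict[str, str] = {}  # last interaction state pushed by the primary

class FaultTolerantModule:
    def __init__(self, n_groups: int) -> None:
        self.groups: List[ProcessGroup] = [ProcessGroup(r) for r in range(n_groups)]
        self.primary: ProcessGroup = self.groups[-1]  # initial primary group

    def checkpoint_round(self, state: Dict[str, str]) -> None:
        """Primary periodically pushes the interaction state to every other live group."""
        for g in self.groups:
            if g is not self.primary and g.alive:
                g.checkpoint = dict(state)

    def elect(self) -> Optional[ProcessGroup]:
        """Backup election: promote the highest-ranked surviving group (assumed rule)."""
        survivors = [g for g in self.groups if g.alive]
        self.primary = max(survivors, key=lambda g: g.rank) if survivors else None
        return self.primary

# Usage: checkpoint, crash the primary, elect a replacement that already holds the state.
ftm = FaultTolerantModule(n_groups=3)
ftm.checkpoint_round({"task": "42", "phase": "map"})
ftm.primary.alive = False
new_primary = ftm.elect()
print(new_primary.rank, new_primary.checkpoint)  # 1 {'task': '42', 'phase': 'map'}
```

A real module would replicate the checkpoint over a network rather than in memory; the promotion logic is the part of the mechanism the patent describes.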
The technical solution provided by the embodiments of the invention has the following beneficial effects. Because the distributed data management module uses the channel selector to assign each computing task to the computer node 3 connected to the storage node 21 holding the data the task needs, data transmission is faster and the severe data-transmission bottleneck of traditional distributed computing is avoided. The big data all-in-one machine is also equipped with a plurality of distributed multi-level fault-tolerant modules 7 whose dynamic multi-level fault-tolerance mechanism handles the machine's faults and errors, ensuring that tasks run smoothly and at high speed and preventing task interruption and data loss.
In this document, directional terms such as front, back, upper and lower refer to the positions of components in the drawings and relative to one another, and are used only for clarity and convenience in describing the technical solution; they should not be taken to limit the scope of the claims.
The features of the embodiments described above may be combined with one another provided they do not conflict.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention are intended to fall within its scope of protection.

Claims (4)

1. A big data all-in-one machine, characterized in that it comprises:
a plurality of distributed data storage modules, distributed across a plurality of mutually independent storage devices, each comprising a plurality of storage nodes;
a data stream module with input/output interfaces for acquiring and transmitting data streams;
a plurality of computer nodes connected one-to-one with the storage nodes on each storage device to obtain the data stored in the corresponding storage nodes;
a data mining module, connected to the data stream module and the distributed data storage modules, which classifies the data streams according to a mining algorithm and, according to that classification, stores each stream in the corresponding storage node of the corresponding storage device;
a distributed data management module, connected to the plurality of computer nodes through a channel selector, which assigns each computing task to the computer node connected to the storage node storing the data required by the task;
and a plurality of distributed multi-level fault-tolerant modules corresponding one-to-one to the storage devices, each connected to the data stream module, the data mining module, the distributed data management module and all of its corresponding storage nodes, wherein each distributed multi-level fault-tolerant module organizes its processes into hierarchical process groups, one of which serves as the primary process group and is dynamic, the checkpoint of the primary process group periodically updates the interaction states of the other process groups, and if the primary process group crashes, a backup runs an election algorithm to select a new primary process group from the other process groups; all data-stream transmission and storage, and all commands issued by the distributed data management module, flow through the corresponding distributed multi-level fault-tolerant module.
2. The big data all-in-one machine of claim 1, wherein the mining algorithm is one or more of a clustering algorithm, a classification algorithm and a prediction algorithm and divides the data into hot data with a high access rate and cold data with a low access rate; the data mining module transfers the cold data to slow, low-cost storage devices and distributes the hot data to fast, high-capacity storage devices according to a storage prefetching algorithm.
3. The big data all-in-one machine of claim 1, wherein the distributed data management module comprises a plurality of management nodes, each corresponding to one storage device and connected to all storage nodes in that storage device, and each management node comprises:
a local database management system responsible for managing the data in the storage device corresponding to the management node;
and a data connection component for connecting the management node to all other management nodes.
4. The big data all-in-one machine of claim 1, wherein the channel selector comprises a plurality of channel switches, equal in number to the computer nodes and connected to them one-to-one.
CN201810936219.0A 2018-08-16 2018-08-16 Big data all-in-one Active CN109343791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810936219.0A CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Publications (2)

Publication Number Publication Date
CN109343791A (en) 2019-02-15
CN109343791B (en) 2021-11-09

Family

ID=65296937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810936219.0A Active CN109343791B (en) 2018-08-16 2018-08-16 Big data all-in-one

Country Status (1)

Country Link
CN (1) CN109343791B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978228B (en) * 2014-04-09 2019-08-30 腾讯科技(深圳)有限公司 A kind of dispatching method and device of distributed computing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494688A (en) * 2001-02-24 2004-05-05 �Ҵ���˾ Novel massively parallel super computer
CN106575296A (en) * 2014-06-20 2017-04-19 亚马逊技术股份有限公司 Dynamic N-dimensional cubes for hosted analytics
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN106815338A (en) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 A kind of real-time storage of big data, treatment and inquiry system

Also Published As

Publication number Publication date
CN109343791A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
US11436400B2 (en) Optimization method for graph processing based on heterogeneous FPGA data streams
US4247892A (en) Arrays of machines such as computers
CN109977116B (en) FPGA-DDR-based hash connection operator acceleration method and system
US9195599B2 (en) Multi-level aggregation techniques for memory hierarchies
US10521395B1 (en) Systems and methods for implementing an intelligence processing computing architecture
CN102968339A (en) System and method for realizing complicated event handling based on cloud computing architecture
CN102937964A (en) Intelligent data service method based on distributed system
CN102622323B (en) Data transmission management method based on switch matrix in dynamic configurable serial bus
CN103366021A (en) Variable neighborhood search method and system on cloud computing platform
US20200348871A1 (en) Memory system, operating method thereof and computing system for classifying data according to read and write counts and storing the classified data in a plurality of types of memory devices
CN110442446B (en) Method for real-time processing high-speed digital signal data stream
CN109343791B (en) Big data all-in-one
Kobus et al. Gossip: Efficient communication primitives for multi-gpu systems
Feng et al. Criso: an incremental scalable and cost-effective network architecture for data centers
CN116069480B (en) Processor and computing device
CN105608046A (en) Multi-core processor architecture based on MapReduce programming model
CN1456994A (en) Self-organizing dynamic network computer system structure
CN116302574A (en) Concurrent processing method based on MapReduce
CN112965805B (en) Cross-process asynchronous task processing method and system based on memory mapping file
CN102819218A (en) Discrete event system monitor on basis of event control function and control method thereof
CN112380288A (en) Decentralized distributed data processing system
CN102200961B (en) Expansion method of sub-units in dynamically reconfigurable processor
CN108491167B (en) Industrial process working condition data rapid random distribution storage method
CN112751789A (en) Method and system for realizing asymmetric SDN controller cluster
US8200913B2 (en) Distributed memory type information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant