WO2019006639A1

WO2019006639A1 - Big data storage management system

Info

Publication number: WO2019006639A1
Application number: PCT/CN2017/091588
Authority: WO
Inventors: 陈钦鹏
Original assignee: 深圳齐心集团股份有限公司
Priority date: 2017-07-04
Filing date: 2017-07-04
Publication date: 2019-01-10

Abstract

A big data storage management system, which is applicable to the technical field of data management, comprises a cloud data server (100) and at least one smart terminal (200). The cloud data server (100) is in a wireless communication connection with the smart terminal (200). The cloud data server (100) comprises a data collection unit (110), a data classification and numbering unit (120), a data parallel processing unit (130), a data recovery unit (140), a data storage unit (150), and a cloud database (160). The system has a proper structure, stably operates, improves the data processing efficiency and the error detection rate, reduces the complexity of managing related data and reduces the operation load of the system.

Description

Big data storage management system

Technical field

The invention belongs to the field of data management, and in particular relates to a big data storage management system.

Background art

With the rapid development of computer technology, data in all fields has grown rapidly. The data comes from many sources, from sensors that collect weather conditions, digital photos, and online video materials to online shopping transactions, mobile phone GPS signals, and more. As the scale of data expands, the amount of data accumulated in every industry keeps growing, the types of data keep increasing, and data structures become more and more complex, exceeding the capabilities of traditional data management systems and processing modes. Traditional serial database systems struggle to adapt to this rapidly growing application demand, show a clear lack of capacity in production practice, and cannot meet the data storage needs of the big data era.

In the prior art, centralized data storage solutions suffer from low data processing efficiency, poor disaster tolerance, and long system recovery times. Distributed data storage solutions use a DHT (distributed hash table) to access user data and perform lookups in a unicast manner; when a node fails, the lookup request is re-initiated to another node, and operations such as updates work similarly. Their data processing efficiency and disaster tolerance are also low, the hash calculation and route lookup processes are complicated to implement, and data may become inconsistent.

Technical problem

To overcome the problems of the prior art, the embodiments of the present invention provide a big data storage management system that has a reasonable structure, operates stably, improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computing load of the system.

Technical solution

The embodiment of the present invention is implemented as follows: a big data storage management system includes a cloud data server and at least one smart terminal; the cloud data server is wirelessly connected to the smart terminal. The cloud data server includes a data collection unit, a data classification and numbering unit, a data parallel processing unit, a data recovery unit, a data storage unit, and a cloud database.

The data collection unit collects data on the smart terminal, performs preliminary classification on the data, compresses data of the same category, and transmits it to the data classification and numbering unit.

The data classification and numbering unit classifies and compresses the compressed data again, classifies compressed data of the same type but different categories by data location, data time, and data capacity, and generates a data classification number.

The data parallel processing unit adopts parallel data preprocessing technology and is provided with a Map/Reduce processing model: by calling the Map function, each processing task is processed in parallel by multiple Map tasks, which are assigned to the execution nodes allocated to that processing task; then, by calling the Reduce function, the processing results of the Map tasks are merged to complete data preprocessing.

The data storage unit sequentially stores each piece of compressed data in the preprocessed data into the cloud database according to the generated number.

The data recovery unit optimizes the error detection mechanism for hard disk drives to improve the error detection efficiency of the system, thereby ensuring that the system stores big data efficiently.
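The collection, compression, and numbering steps can be illustrated with a minimal Python sketch. The source does not specify record fields, a compression algorithm, or a number format, so the Record fields, the use of zlib, the capacity thresholds, and the number layout below are all assumptions chosen only for illustration.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    category: str   # preliminary category assigned on collection (assumed field)
    location: str   # where the data originated (assumed field)
    timestamp: str  # ISO date such as "2017-07-04" (assumed field)
    payload: bytes


def collect_and_compress(records):
    """Group records by preliminary category and compress each payload."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.category].append(rec)
    return {
        cat: [(rec, zlib.compress(rec.payload)) for rec in recs]
        for cat, recs in groups.items()
    }


def capacity_class(size: int) -> str:
    """Bucket a compressed size into a coarse class (thresholds are arbitrary)."""
    if size < 1024:
        return "S"
    if size < 1024 * 1024:
        return "M"
    return "L"


def classification_number(rec: Record, compressed: bytes, seq: int) -> str:
    """Build a number from category, location, time, and capacity class."""
    cap = capacity_class(len(compressed))
    return f"{rec.category}-{rec.location}-{rec.timestamp}-{cap}-{seq:06d}"


def number_all(compressed_groups):
    """Assign a classification number to every compressed item."""
    numbered = {}
    for cat in sorted(compressed_groups):
        for seq, (rec, blob) in enumerate(compressed_groups[cat], start=1):
            numbered[classification_number(rec, blob, seq)] = blob
    return numbered
```

In this sketch the sequence counter restarts for each category, so a number is unique only together with its category prefix; a real system might prefer a global counter.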

Preferably, the cloud data server further includes:

a data redundancy judging module, connected to the data collection unit and the cloud database, for performing redundancy judgment on the data collected by the data collection unit; if data collected by the data collection unit is the same as data already stored in the cloud database, the duplicate data is discarded.

Preferably, the cloud data server further includes:

a data noise reduction processing unit for performing noise reduction preprocessing on the collected data;

a data mining unit for mining and analyzing data in the cloud database.

Preferably, the data mining unit comprises:

a data parallel mining module for performing multi-way parallel mining on data in the cloud database from different angles;

a mining result fusion module for summarizing the data mining results output by the data parallel mining module;

a fusion information analysis module for analyzing and processing the summarized data.

Preferably, the data parallel processing unit comprises:

a data discretization processing module for discretizing the compressed data to facilitate storage and further analysis.

Beneficial effect

The big data storage management system provided by the embodiment of the invention has reasonable structure and stable operation, improves data processing efficiency and error detection rate, reduces complexity of related data management, and reduces computation load of the system.

Description of drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them without creative effort.

The following drawings are only intended to illustrate and explain the present invention, and do not limit the scope of the invention.

FIG. 1 is a schematic structural diagram of a big data storage management system according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a cloud data server according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of another cloud data server according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data mining unit according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a data parallel processing unit according to an embodiment of the present invention.

Embodiments of the invention

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The specific implementation of the present invention will be described in detail below with reference to specific embodiments.

As shown in FIG. 1, in the embodiment of the present invention, a big data storage management system includes a cloud data server 100 and at least one smart terminal 200; the cloud data server 100 is wirelessly connected to the smart terminal 200. The cloud data server 100 includes a data collection unit 110, a data classification and numbering unit 120, a data parallel processing unit 130, a data recovery unit 140, a data storage unit 150, and a cloud database 160. The data collection unit 110 collects data on the smart terminal 200, performs preliminary classification on the data, compresses data of the same category, and transmits it to the data classification and numbering unit 120. The data classification and numbering unit 120 classifies and compresses the compressed data again, classifies compressed data of the same type but different categories by data location, data time, and data capacity, and generates a data classification number. The data parallel processing unit 130 adopts parallel data preprocessing technology and is provided with a Map/Reduce processing model: by calling the Map function, each processing task is processed in parallel by multiple Map tasks, which are assigned to the execution nodes allocated to that processing task; then, by calling the Reduce function, the processing results of the Map tasks are merged to complete data preprocessing. The data storage unit 150 sequentially stores each piece of compressed data in the preprocessed data into the cloud database 160 according to the generated number. The data recovery unit 140 optimizes the error detection mechanism for hard disk drives to improve the error detection efficiency of the system, thereby ensuring that the system stores big data efficiently.

The system has a reasonable structure and operates stably, which improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computing load of the system.
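The Map/Reduce preprocessing performed by the data parallel processing unit 130 can be illustrated with a minimal Python sketch. The source names no framework and no concrete preprocessing operation, so the per-category record count, the function names, and the use of worker threads in place of real execution nodes are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_task(items, n_parts):
    """Split one processing task into up to n_parts roughly equal chunks."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_task(chunk):
    """Map task: count records per category within one chunk."""
    return Counter(category for category, _payload in chunk)


def reduce_results(left, right):
    """Reduce step: merge the partial counts of two Map tasks."""
    return left + right


def preprocess(items, n_nodes=4):
    """Run the Map tasks in parallel, then merge their results."""
    chunks = split_task(items, n_nodes)
    # Worker threads stand in for the execution nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce(reduce_results, partials, Counter())
```

Calling preprocess on a list of (category, payload) pairs returns one merged Counter. In a distributed deployment the chunks would be shipped to separate machines, but the split, map, and merge structure would stay the same.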

In the embodiment of the present invention, as shown in FIG. 2, the cloud data server 100 further includes a data redundancy judging module 170, which is connected to the data collection unit 110 and the cloud database 160 and performs redundancy judgment on the data collected by the data collection unit 110. If data collected by the data collection unit 110 is the same as data already stored in the cloud database 160, the duplicate data is discarded.
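One possible way to realize this redundancy judgment is sketched below in Python. The source only states that identical data is discarded; comparing SHA-256 digests instead of full contents, the class name RedundancyJudge, and the in-memory set standing in for an index over the cloud database 160 are assumptions of this sketch.

```python
import hashlib


class RedundancyJudge:
    """Discards collected data whose content is already stored."""

    def __init__(self):
        # Stand-in for an index over the cloud database (assumption).
        self._stored_digests = set()

    def accept(self, payload: bytes) -> bool:
        """Return True if the payload is new, False if it is a duplicate."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._stored_digests:
            return False
        self._stored_digests.add(digest)
        return True


def filter_redundant(judge, payloads):
    """Keep only payloads that the judge has not seen before."""
    return [p for p in payloads if judge.accept(p)]
```

Storing digests keeps the comparison cost independent of how much data is already stored, at the price of treating a hash collision as a duplicate.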

In the embodiment of the present invention, as shown in FIG. 3, the cloud data server 100 further includes: a data noise reduction processing unit 180, configured to perform noise reduction preprocessing on the collected data; and a data mining unit 190, configured to mine and analyze data in the cloud database 160.
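The source does not say which noise reduction method the data noise reduction processing unit 180 uses. As one hedged illustration, a sliding median filter over numeric samples (for example, sensor readings) could look like the following Python sketch; the function name and the default window size are assumptions.

```python
from statistics import median


def median_filter(values, window=3):
    """Replace each sample with the median of its neighbourhood."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        # Shrink the window at the edges instead of padding.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out
```

A median filter suppresses isolated spikes while leaving steady readings unchanged, which is why it is used here as the example; other data types would need a different method.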

In the embodiment of the present invention, as shown in FIG. 4, the data mining unit 190 includes: a data parallel mining module 191, configured to perform multi-way parallel mining on data in the cloud database 160 from different angles; a mining result fusion module 192, configured to summarize the data mining results output by the data parallel mining module 191; and a fusion information analysis module 193, configured to analyze and process the summarized data.
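The three modules of the data mining unit 190 can be illustrated with a minimal Python sketch. The two mining angles (record count per category and total size per location), the record layout, and all function names are hypothetical; the source specifies neither the mining algorithms nor the fusion format.

```python
from concurrent.futures import ThreadPoolExecutor


def mine_by_category(records):
    """One mining angle: number of records per category."""
    counts = {}
    for rec in records:
        counts[rec["category"]] = counts.get(rec["category"], 0) + 1
    return counts


def mine_by_location(records):
    """Another mining angle: total payload size per location."""
    sizes = {}
    for rec in records:
        sizes[rec["location"]] = sizes.get(rec["location"], 0) + rec["size"]
    return sizes


def parallel_mine(records, miners):
    """Parallel mining: run every miner over the same records concurrently."""
    with ThreadPoolExecutor(max_workers=len(miners)) as pool:
        futures = {name: pool.submit(fn, records) for name, fn in miners.items()}
        return {name: fut.result() for name, fut in futures.items()}


def fuse(results):
    """Fusion: flatten the per-miner results into one sorted summary table."""
    return [
        (angle, key, value)
        for angle, res in sorted(results.items())
        for key, value in sorted(res.items())
    ]


def analyze(summary):
    """Analysis: pick the largest entry for each mining angle."""
    best = {}
    for angle, key, value in summary:
        if angle not in best or value > best[angle][1]:
            best[angle] = (key, value)
    return best
```

Passing a non-empty dictionary of miners to parallel_mine, then chaining fuse and analyze, mirrors the order of modules 191, 192, and 193.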

In the embodiment of the present invention, as shown in FIG. 5, the data parallel processing unit 130 includes a data discretization processing module 131 for discretizing the compressed data for convenient storage and further analysis.
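The source does not specify a discretization method for the data discretization processing module 131. As a hedged illustration, equal-width binning of numeric values is sketched below in Python; the function name and the choice of equal-width bins are assumptions.

```python
def discretize(values, n_bins):
    """Map each numeric value to an equal-width bin index in [0, n_bins - 1]."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so they share a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Replacing continuous values with a small set of bin indices reduces the number of distinct values to store and gives later analysis a fixed set of categories to work with.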

The big data storage management system provided by the above embodiments of the invention has reasonable structure and stable operation, improves data processing efficiency and error detection rate, reduces complexity of related data management, and reduces computation load of the system.

The above is only the preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A big data storage management system, comprising: a cloud data server and at least one smart terminal; the cloud data server is wirelessly connected to the smart terminal; wherein the cloud data server comprises: a data collection unit, a data classification and numbering unit, a data parallel processing unit, a data recovery unit, a data storage unit, and a cloud database; the data collection unit collects data on the smart terminal, performs preliminary classification on the data, compresses data of the same category, and transmits it to the data classification and numbering unit; the data classification and numbering unit classifies and compresses the compressed data again, classifies compressed data of the same type but different categories by data location, data time, and data capacity, and generates a data classification number; the data parallel processing unit adopts parallel data preprocessing technology and is provided with a Map/Reduce processing model, wherein, by calling the Map function, each processing task is processed in parallel by multiple Map tasks, which are assigned to the execution nodes allocated to that processing task, and then, by calling the Reduce function, the processing results of the Map tasks are merged to complete data preprocessing; the data storage unit sequentially stores each piece of compressed data in the preprocessed data into the cloud database according to the generated number; the data recovery unit optimizes the error detection mechanism for hard disk drives to improve the error detection efficiency of the system, thereby ensuring that the system stores big data efficiently.

2. The big data storage management system according to claim 1, wherein the cloud data server further comprises:

a data redundancy judging module, connected to the data collection unit and the cloud database, for performing redundancy judgment on the data collected by the data collection unit; if data collected by the data collection unit is the same as data already stored in the cloud database, the duplicate data is discarded.

3. The big data storage management system according to claim 1, wherein the cloud data server further comprises:

a data noise reduction processing unit for performing noise reduction preprocessing on the collected data;

a data mining unit for mining and analyzing data in the cloud database.

4. The big data storage management system according to claim 3, wherein the data mining unit comprises:

a data parallel mining module for performing multi-way parallel mining on data in the cloud database from different angles;

a mining result fusion module for summarizing the data mining results output by the data parallel mining module;

a fusion information analysis module for analyzing and processing the summarized data.

5. The big data storage management system according to claim 1, wherein the data parallel processing unit comprises:

a data discretization processing module for discretizing the compressed data to facilitate storage and further analysis.