WO2019006639A1 - Big data storage management system - Google Patents

Big data storage management system

Info

Publication number
WO2019006639A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
cloud
processing
mining
Prior art date
Application number
PCT/CN2017/091588
Other languages
French (fr)
Chinese (zh)
Inventor
陈钦鹏
Original Assignee
深圳齐心集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-07-04
Filing date
2017-07-04
Publication date
2019-01-10
2017-07-04 Application filed by 深圳齐心集团股份有限公司
2017-07-04 Priority to PCT/CN2017/091588
2019-01-10 Publication of WO2019006639A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A big data storage management system, applicable to the technical field of data management, comprises a cloud data server (100) and at least one smart terminal (200). The cloud data server (100) is connected to the smart terminal (200) by wireless communication. The cloud data server (100) comprises a data collection unit (110), a data classification and numbering unit (120), a data parallel processing unit (130), a data recovery unit (140), a data storage unit (150), and a cloud database (160). The system is reasonably structured and operates stably; it improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.

Description

Big data storage management system

Technical field
The present invention belongs to the field of data management and relates in particular to a big data storage management system.
Background art
With the rapid development of computer technology, data in every industry and field is growing at a geometric rate. This data comes from every source: sensors that collect weather conditions, digital photos, online video, online shopping transaction records, mobile phone GPS signals, and more. As the scale of data expands sharply, the amount of data accumulated in each industry keeps growing, data types keep multiplying, and data structures keep becoming more complex, exceeding the capabilities of traditional data management systems and processing models. Traditional serial database systems can hardly keep up with this rapidly growing demand; they show clear capacity shortfalls in production practice and cannot meet the data storage needs of the big data era.
In the prior art, centralized data storage solutions have low data processing efficiency, weak disaster tolerance, and long system recovery times. Distributed data storage solutions access user data through a DHT (distributed hash table) and perform lookups by unicast: only when one node is found to have failed is a lookup request sent to another node, and updates and other operations work in the same way. These solutions likewise suffer from low data processing efficiency and weak disaster tolerance. Moreover, they require cumbersome hash computation and route lookup, which makes them complex to implement, and data inconsistency may also occur.
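To make the prior-art behaviour concrete, the following minimal Python sketch shows a DHT-style lookup in which a key is hashed onto a ring of nodes and queried by unicast, moving on to the next node only after the current one is found to have failed. The node names, the SHA-1 ring, the `SimpleDHT` class and its `failed` set are illustrative assumptions, not details taken from the patent.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string onto the hash ring using SHA-1 (an illustrative choice)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class SimpleDHT:
    """Hypothetical consistent-hashing ring with sequential (unicast) failover."""

    def __init__(self, nodes):
        # Sort nodes by ring position so a key's successor can be found by bisection.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]
        self.failed = set()  # nodes currently considered down

    def is_alive(self, node: str) -> bool:
        return node not in self.failed

    def lookup(self, key: str):
        """Return (responsible node, number of unicast requests sent)."""
        start = bisect.bisect_right(self.positions, ring_position(key)) % len(self.ring)
        attempts = 0
        # Query one node at a time; only after a failure is detected is the next node tried.
        for offset in range(len(self.ring)):
            _, node = self.ring[(start + offset) % len(self.ring)]
            attempts += 1
            if self.is_alive(node):
                return node, attempts
        raise RuntimeError("no live node available")


if __name__ == "__main__":
    dht = SimpleDHT(["node-a", "node-b", "node-c", "node-d"])
    primary, _ = dht.lookup("user:42")
    dht.failed.add(primary)  # simulate failure of the responsible node
    fallback, attempts = dht.lookup("user:42")
    print(primary, "->", fallback, "after", attempts, "requests")
```

The sequential retry loop illustrates the cost the background section points to: every failed node adds one more round trip before the lookup can succeed.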
Technical problem
To overcome the above problems of the prior art, embodiments of the present invention provide a big data storage management system that is reasonably structured and operates stably, improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.
Technical solution
The embodiments of the present invention are implemented as follows. A big data storage management system includes a cloud data server and at least one smart terminal, the cloud data server being connected to the smart terminal by wireless communication. The cloud data server includes a data collection unit, a data classification and numbering unit, a data parallel processing unit, a data recovery unit, a data storage unit, and a cloud database. The data collection unit collects data on the smart terminal, performs a preliminary classification of the data, compresses data of the same category, and transmits it to the data classification and numbering unit. The data classification and numbering unit classifies and compresses the compressed data again, classifies compressed data of the same type in different categories by data location, data time, and data capacity, and generates data classification numbers. The data parallel processing unit uses a parallel data preprocessing technique and is provided with a Map/Reduce processing model: by calling the Map function, each processing task is processed in parallel by multiple Map tasks, which are executed on the execution nodes allocated to that processing task; then, by calling the Reduce function, the processing results of the Map tasks of each processing task are merged, completing the data preprocessing. The data storage unit stores each preprocessed piece of compressed data into the cloud database in sequence according to the generated numbers. The data recovery unit targets the error detection mechanism of the hard disk drive and optimizes that mechanism to improve the error detection efficiency of the system, thereby ensuring that the system stores big data effectively.
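The patent does not define the format of the data classification number. The following Python sketch shows one possible reading of the collection, classification-and-numbering and storage steps: records are grouped by category and compressed with `zlib`, each category is then ordered by data location, data time and data capacity, a number is generated for every compressed block, and the blocks are written to a store in number order. The `Record` fields, the number layout, the size thresholds and the in-memory `dict` standing in for the cloud database are all hypothetical, and the second compression pass mentioned in the text is omitted.

```python
import json
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical collected record; the field names are assumptions."""
    category: str   # preliminary category assigned by the collection step
    location: str   # data location
    day: str        # data time, as an ISO date string
    payload: dict   # the collected content itself


def collect(records):
    """Preliminary classification by category, compressing each payload with zlib."""
    by_category = defaultdict(list)
    for rec in records:
        raw = json.dumps(rec.payload, sort_keys=True).encode("utf-8")
        by_category[rec.category].append((rec, zlib.compress(raw)))
    return by_category


def capacity_class(size):
    """Bucket a compressed block by its size in bytes (thresholds are arbitrary)."""
    if size < 1024:
        return "S"
    if size < 1024 * 1024:
        return "M"
    return "L"


def assign_numbers(by_category):
    """Secondary classification by location, time and capacity, then numbering."""
    numbered = []
    for category in sorted(by_category):
        items = sorted(by_category[category],
                       key=lambda item: (item[0].location, item[0].day, len(item[1])))
        for seq, (rec, blob) in enumerate(items):
            number = (f"{category}-{rec.location}-{rec.day.replace('-', '')}"
                      f"-{capacity_class(len(blob))}-{seq:04d}")
            numbered.append((number, blob))
    return numbered


def store_in_order(numbered, database):
    """Write the compressed blocks into the store in the order their numbers were generated."""
    for number, blob in numbered:
        database[number] = blob
    return database


if __name__ == "__main__":
    records = [
        Record("sensor", "SZ", "2017-07-04", {"t": 21.5}),
        Record("sensor", "BJ", "2017-07-04", {"t": 18.0}),
        Record("photo", "SZ", "2017-07-03", {"id": 7}),
    ]
    db = store_in_order(assign_numbers(collect(records)), {})
    for number, blob in db.items():
        print(number, json.loads(zlib.decompress(blob)))
```

Embedding the classification attributes in the number is one way to let later reads locate data by category, place or date without decompressing it first.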
Preferably, the cloud data server further includes:
a data redundancy determining module, connected to the data collection unit and the cloud database and configured to perform a redundancy check on the data collected by the data collection unit; if data stored in the cloud database is identical to the data collected by the data collection unit, the identical data is discarded.
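The patent only states that collected data identical to data already in the cloud database is discarded. One common way to detect this, assumed here rather than taken from the source, is to compare content digests: the sketch keeps a set of SHA-256 digests of stored items and drops any newly collected item whose digest is already present. `RedundancyFilter` and its list-based store are hypothetical.

```python
import hashlib


class RedundancyFilter:
    """Hypothetical redundancy check: discard items already present in the store."""

    def __init__(self):
        self.store = []        # stands in for the cloud database
        self.digests = set()   # SHA-256 digests of everything stored so far

    def submit(self, item: bytes) -> bool:
        """Store the item and return True, or return False if it is a duplicate."""
        digest = hashlib.sha256(item).hexdigest()
        if digest in self.digests:
            return False       # identical data already stored: discard it
        self.digests.add(digest)
        self.store.append(item)
        return True


if __name__ == "__main__":
    f = RedundancyFilter()
    results = [f.submit(b) for b in (b"reading=21.5", b"reading=18.0", b"reading=21.5")]
    print(results, len(f.store))  # prints [True, True, False] 2
```

Comparing digests avoids re-reading stored payloads; a production system would also have to handle the very unlikely case of a hash collision, for example by comparing the bytes when two digests match.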
Preferably, the cloud data server further includes:
a data noise reduction processing unit, configured to perform noise reduction preprocessing on the collected data; and
a data mining unit, configured to mine and analyze the data in the cloud database.
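The patent does not specify the noise reduction method. As one illustrative possibility, a sliding-window median filter suppresses isolated spikes in numeric sensor readings while leaving the rest of the series largely unchanged; the filter choice, the `median_filter` name and the default window size are assumptions.

```python
from statistics import median


def median_filter(values, window=3):
    """Replace each value by the median of a centred window (window must be odd)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Clip the window at both ends of the series.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


if __name__ == "__main__":
    readings = [20.1, 20.3, 95.0, 20.2, 20.4]  # 95.0 is a spurious spike
    print(median_filter(readings))
```

A median rather than a mean is used so that a single outlier does not drag its neighbours upward.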
Preferably, the data mining unit includes:
a data parallel mining module, configured to mine the data in the cloud database along multiple parallel paths from different angles;
a mining result fusion module, configured to aggregate the data mining results output by the multiple parallel data parallel mining modules; and
a fusion information analysis module, configured to analyze and process the aggregated data.
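The three mining modules can be pictured as a fan-out/fan-in pipeline. The sketch below runs several hypothetical "angle" functions (a count per category, a mean value, and the busiest location) concurrently with `concurrent.futures.ThreadPoolExecutor`, merges their outputs into one dictionary (fusion), and derives a short summary from the merged result (analysis). The mining functions and field names are invented for illustration; the patent does not say which analyses are performed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def by_category(rows):
    return {"category_counts": dict(Counter(r["category"] for r in rows))}


def mean_value(rows):
    values = [r["value"] for r in rows]
    return {"mean_value": sum(values) / len(values) if values else None}


def busiest_location(rows):
    counts = Counter(r["location"] for r in rows)
    return {"busiest_location": counts.most_common(1)[0][0] if counts else None}


MINERS = [by_category, mean_value, busiest_location]  # each "angle" of analysis


def parallel_mine(rows, miners=MINERS):
    """Run every miner concurrently, then fuse the partial results into one dict."""
    with ThreadPoolExecutor(max_workers=len(miners)) as pool:
        partials = list(pool.map(lambda miner: miner(rows), miners))
    fused = {}
    for part in partials:
        fused.update(part)  # each miner uses its own key, so results do not collide
    return fused


def analyse(fused):
    """Toy fusion-information analysis: report the dominant category and location."""
    counts = fused["category_counts"]
    top = max(counts, key=counts.get) if counts else None
    return f"dominant category={top}, busiest location={fused['busiest_location']}"


if __name__ == "__main__":
    rows = [
        {"category": "sensor", "location": "SZ", "value": 21.5},
        {"category": "sensor", "location": "SZ", "value": 20.5},
        {"category": "photo", "location": "BJ", "value": 3.0},
    ]
    fused = parallel_mine(rows)
    print(fused)
    print(analyse(fused))
```

Threads keep the sketch short; CPU-bound mining in CPython would more typically use `ProcessPoolExecutor` or a distributed framework, since the global interpreter lock prevents threads from executing Python bytecode in parallel.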
Preferably, the data parallel processing unit includes:
a data discretization processing module, configured to discretize the compressed data to facilitate storage and further analysis.
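Discretization is not defined further in the patent. A common form is equal-width binning, which maps each continuous value to one of `k` interval indices; the sketch below assumes that technique, and the `equal_width_bins` name and default bin count are hypothetical.

```python
def equal_width_bins(values, k=4):
    """Map each value to a bin index 0..k-1 using k equal-width intervals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / k
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


if __name__ == "__main__":
    temps = [10.0, 12.5, 15.0, 17.5, 20.0]
    print(equal_width_bins(temps, k=4))  # → [0, 1, 2, 3, 3]
```

Storing a small integer bin index in place of, or alongside, the raw value is what makes discretized data cheaper to store and easier to group in later analysis.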
Beneficial effects
The big data storage management system provided by the embodiments of the present invention is reasonably structured and operates stably, improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.
Brief description of the drawings
To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
The following drawings are intended only to schematically illustrate and explain the present invention and do not limit its scope.
FIG. 1 is a schematic structural diagram of a big data storage management system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a cloud data server according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another cloud data server according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data mining unit according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data parallel processing unit according to an embodiment of the present invention.
Embodiments of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The big data storage management system provided by the embodiments of the present invention is reasonably structured and operates stably, improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.
The specific implementation of the present invention is described in detail below with reference to specific embodiments.
As shown in FIG. 1, in an embodiment of the present invention, a big data storage management system includes a cloud data server 100 and at least one smart terminal 200, the cloud data server 100 being connected to the smart terminal 200 by wireless communication. The cloud data server 100 includes a data collection unit 110, a data classification and numbering unit 120, a data parallel processing unit 130, a data recovery unit 140, a data storage unit 150, and a cloud database 160. The data collection unit 110 collects data on the smart terminal, performs a preliminary classification of the data, compresses data of the same category, and transmits it to the data classification and numbering unit. The data classification and numbering unit 120 classifies and compresses the compressed data again, classifies compressed data of the same type in different categories by data location, data time, and data capacity, and generates data classification numbers. The data parallel processing unit 130 uses a parallel data preprocessing technique and is provided with a Map/Reduce processing model: by calling the Map function, each processing task is processed in parallel by multiple Map tasks, which are executed on the execution nodes allocated to that processing task; then, by calling the Reduce function, the processing results of the Map tasks of each processing task are merged, completing the data preprocessing. The data storage unit 150 stores each preprocessed piece of compressed data into the cloud database 160 in sequence according to the generated numbers. The data recovery unit 140 targets the error detection mechanism of the hard disk drive and optimizes that mechanism to improve the error detection efficiency of the system, thereby ensuring that the system stores big data effectively. The system is reasonably structured and operates stably; it improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.
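The Map/Reduce preprocessing performed by the data parallel processing unit 130 can be illustrated in plain Python. In this sketch, which is not the patent's implementation, one processing task's input is split into chunks, each chunk is handled by a separate Map task on a thread pool standing in for the execution nodes, and a Reduce step merges the Map outputs. The preprocessing itself (trimming and lower-casing text records, dropping empty ones, and counting them per key) and all function names are assumptions chosen to keep the example small.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_task(chunk):
    """Map: normalise each record in one chunk and emit per-key counts."""
    counts = Counter()
    for record in chunk:
        key = record.strip().lower()
        if key:  # drop empty records during preprocessing
            counts[key] += 1
    return counts


def reduce_task(left, right):
    """Reduce: merge two partial Map results."""
    return left + right


def split(records, n_chunks):
    """Split one processing task's input into roughly equal chunks, one per Map task."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_processing_task(records, n_map_tasks=4):
    chunks = split(records, n_map_tasks)
    # Worker threads stand in for the execution nodes assigned to this task.
    with ThreadPoolExecutor(max_workers=n_map_tasks) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce(reduce_task, partials, Counter())


if __name__ == "__main__":
    data = ["Shenzhen ", "shenzhen", "Beijing", "", "BEIJING", "Shanghai"]
    print(run_processing_task(data, n_map_tasks=3))
```

In a real deployment the Map tasks would run on separate machines under a framework such as Hadoop MapReduce; the structure of split, parallel Map and merging Reduce is the part the sketch is meant to show.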
In an embodiment of the present invention, as shown in FIG. 2, the cloud data server 100 further includes a data redundancy determining module 170, which is connected to the data collection unit 110 and the cloud database 160 and performs a redundancy check on the data collected by the data collection unit 110: if data stored in the cloud database 160 is identical to the data collected by the data collection unit 110, the identical data is discarded.
In an embodiment of the present invention, as shown in FIG. 3, the cloud data server 100 further includes a data noise reduction processing unit 180, configured to perform noise reduction preprocessing on the collected data, and a data mining unit 190, configured to mine and analyze the data in the cloud database.
In an embodiment of the present invention, as shown in FIG. 4, the data mining unit 190 includes: a data parallel mining module 191, configured to mine the data in the cloud database along multiple parallel paths from different angles; a mining result fusion module 192, configured to aggregate the data mining results output by the multiple parallel data parallel mining modules; and a fusion information analysis module 193, configured to analyze and process the aggregated data.
In an embodiment of the present invention, as shown in FIG. 5, the data parallel processing unit 130 includes a data discretization processing module 131, configured to discretize the compressed data to facilitate storage and further analysis.
The big data storage management system provided by the above embodiments of the present invention is reasonably structured and operates stably, improves data processing efficiency and the error detection rate, reduces the complexity of managing related data, and reduces the computational load of the system.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

  1. A big data storage management system, characterized by comprising a cloud data server and at least one smart terminal, the cloud data server being connected to the smart terminal by wireless communication; wherein the cloud data server comprises a data collection unit, a data classification and numbering unit, a data parallel processing unit, a data recovery unit, a data storage unit, and a cloud database; the data collection unit collects data on the smart terminal, performs a preliminary classification of the data, compresses data of the same category, and transmits the compressed data to the data classification and numbering unit; the data classification and numbering unit classifies and compresses the compressed data again, classifies compressed data of the same type in different categories by data location, data time, and data capacity, and generates data classification numbers; the data parallel processing unit uses a parallel data preprocessing technique and is provided with a Map/Reduce processing model, wherein, by calling the Map function, each processing task is processed in parallel by multiple Map tasks that are executed on the execution nodes allocated to that processing task, and then, by calling the Reduce function, the processing results of the Map tasks of each processing task are merged to complete the data preprocessing; the data storage unit stores each preprocessed piece of compressed data into the cloud database in sequence according to the generated numbers; and the data recovery unit targets the error detection mechanism of the hard disk drive and optimizes that mechanism to improve the error detection efficiency of the system, thereby ensuring that the system stores big data effectively.
  2. The big data storage management system according to claim 1, characterized in that the cloud data server further comprises:
    a data redundancy determining module, connected to the data collection unit and the cloud database and configured to perform a redundancy check on the data collected by the data collection unit, wherein, if data stored in the cloud database is identical to the data collected by the data collection unit, the identical data is discarded.
  3. The big data storage management system according to claim 1, characterized in that the cloud data server further comprises:
    a data noise reduction processing unit, configured to perform noise reduction preprocessing on the collected data; and
    a data mining unit, configured to mine and analyze the data in the cloud database.
  4. The big data storage management system according to claim 3, characterized in that the data mining unit comprises:
    a data parallel mining module, configured to mine the data in the cloud database along multiple parallel paths from different angles;
    a mining result fusion module, configured to aggregate the data mining results output by the multiple parallel data parallel mining modules; and
    a fusion information analysis module, configured to analyze and process the aggregated data.
  5. The big data storage management system according to claim 1, wherein the data parallel processing unit comprises:
    a data discretization processing module for discretizing the compressed data to facilitate storage and further analysis.
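Claim 5 does not specify the discretization method. The sketch below assumes equal-width binning, and assumes the compressed data is a zlib-compressed JSON list of numbers that is decompressed before binning; `discretize` and `discretize_compressed` are hypothetical names.

```python
import json
import zlib


def discretize(values: list[float], bins: int = 4) -> list[int]:
    """Equal-width binning (assumed method): map each value to a bin index in [0, bins - 1]."""
    if bins < 1:
        raise ValueError("bins must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / bins
    # min(...) keeps the maximum value in the last bin instead of index == bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def discretize_compressed(payload: bytes, bins: int = 4) -> list[int]:
    """Hypothetical payload format: zlib-compressed JSON list of numbers."""
    values = json.loads(zlib.decompress(payload))
    return discretize(values, bins)


if __name__ == "__main__":
    payload = zlib.compress(json.dumps([0.0, 2.5, 5.0, 7.5, 10.0]).encode())
    print(discretize_compressed(payload))  # [0, 1, 2, 3, 3]
```

Small integer bin indices compress well and are convenient keys for later counting or mining, which fits the claim's stated purpose of easing storage and further analysis.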

Priority Applications (1)

Application Number: PCT/CN2017/091588 (WO2019006639A1); Priority Date: 2017-07-04; Filing Date: 2017-07-04; Title: Big data storage management system

Applications Claiming Priority (1)

Application Number: PCT/CN2017/091588 (WO2019006639A1); Priority Date: 2017-07-04; Filing Date: 2017-07-04; Title: Big data storage management system

Publications (1)

Publication Number: WO2019006639A1; Publication Date: 2019-01-10

Family

ID=64949549

Family Applications (1)

Application Number: PCT/CN2017/091588 (WO2019006639A1); Title: Big data storage management system; Priority Date: 2017-07-04; Filing Date: 2017-07-04

Country Status (1)

Country: WO (1); Link: WO2019006639A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150220080A1 * 2014-01-31 2015-08-06 Fisher-Rosemount Systems, Inc. Managing Big Data In Process Control Systems
CN106446516A * 2016-08-30 2017-02-22 江苏名通信息科技有限公司 Big-data incremental truth-value discovery algorithm based on Map-Reduce
CN106649765A * 2016-12-27 2017-05-10 国网山东省电力公司济宁供电公司 Smart power grid panoramic data analysis method based on big data technology

Similar Documents

Publication Publication Date Title
CN111586091B (en) Edge computing gateway system for realizing computing power assembly
CN100478944C (en) Automatic task generator method and system
CN107818120A (en) Data processing method and device based on big data
CN111709527A (en) Operation and maintenance knowledge map library establishing method, device, equipment and storage medium
CN105005683A (en) Caching system and method for solving data normalization problem of regional medical system
CN102932195B (en) A kind of business diagnosis method for supervising of protocal analysis Network Based and system
CN103377100B (en) A kind of data back up method, network node and system
CN109241023A (en) Distributed memory system date storage method, device, system and storage medium
CN107977167B (en) Erasure code based degeneration reading optimization method for distributed storage system
CN113593713B (en) Epidemic situation prevention and control method, device, equipment and medium
CN107391258A (en) A kind of portable remote sensing image real time processing system of software and hardware one
CN103235817A (en) Large-scale infection control data storage processing method
CN110399485A (en) The data source tracing method and system of word-based vector sum machine learning
CN110377757A (en) A kind of real time knowledge map construction system
CN109753541A (en) A kind of relational network construction method and device, computer readable storage medium
CN112163127B (en) Relationship graph construction method and device, electronic equipment and storage medium
WO2019006639A1 Big data storage management system
WO2020207252A1 Data storage method and device, storage medium, and electronic apparatus
CN112738159A (en) Digital application information acquisition system and method for power grid company
CN204990305U (en) Big data intelligent analysis system
Bai RETRACTED ARTICLE: Data cleansing method of talent management data in wireless sensor network based on data mining technology
WO2023173733A1 (en) Data tracking method and apparatus, electronic device and storage medium
CN116089431A (en) Data processing method and device of data warehouse, electronic equipment and storage medium
CN111199777B (en) Biological big data-oriented streaming and mutation real-time mining system and method
CN115509693A (en) Data optimization method based on cluster Pod scheduling combined with data lake

Legal Events

Code: NENP; Title: Non-entry into the national phase; Ref country code: DE

Code: 122; Title: EP: PCT application non-entry in European phase; Ref document number: 17917102; Country of ref document: EP; Kind code of ref document: A1