CN107291380A - Efficient big data storage method

Info

Publication number: CN107291380A
Application number: CN201710347064.2A
Authority: CN
Inventors: 梁庆欢; 蒋颖; 王川林; 陈长明
Original assignee: Chengdu Baomihua Information Technology Co Ltd
Current assignee: Chengdu Baomihua Information Technology Co Ltd
Priority date: 2017-05-16
Filing date: 2017-05-16
Publication date: 2017-10-24

Abstract

The present invention provides an efficient big data storage method comprising the following steps: S1: receive object data and identify the attribute information of the object data; S2: store the object data into a first storage subsystem of a storage system according to the attribute information of the object data; S3: store the association relationships and schema of the object data stored in the first storage subsystem into a second storage subsystem of the storage system; S4: select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume; S5: start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume; S6: improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives.

Description

Efficient big data storage method

Technical field

The present invention relates to an efficient big data storage method.

Background

The research firm Gartner defines "big data" as follows: big data is a high-volume, high-growth, and diversified information asset that requires new processing modes in order to deliver stronger decision-making power, insight discovery, and process optimization capability. The McKinsey Global Institute gives this definition: a data set whose scale so far exceeds the capabilities of traditional database software tools in acquisition, storage, management, and analysis, characterized by four features: massive data scale, rapid data flow, diverse data types, and low value density. The strategic significance of big data technology lies not in possessing huge amounts of data but in the specialized processing of the meaningful data it contains. In other words, if big data is compared to an industry, the key to that industry's profitability is improving the "processing capability" applied to data and realizing the data's "added value" through "processing".

Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed by a single computer and requires a distributed architecture. Its hallmark is distributed data mining over massive data sets, which in turn depends on the distributed processing, distributed databases, cloud storage, and virtualization technologies of cloud computing. With the arrival of the cloud era, big data has attracted increasing attention. Analyst teams consider that "big data" is generally used to describe the large volumes of unstructured and semi-structured data that a company creates, data that would cost too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers.

Big data requires special techniques to process large volumes of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. An efficient big data storage method is therefore urgently needed to meet current demands.

Summary of the invention

In view of the above deficiencies in the prior art, the object of the present invention is to provide an efficient big data storage method that addresses the above problems.

To meet the above requirements, the present invention adopts the following technical solution: an efficient big data storage method is provided, comprising the following steps:

S1: receive object data and identify the attribute information of the object data;

S2: store the object data into a first storage subsystem of a storage system according to the attribute information of the object data;

S3: store the association relationships and schema of the object data stored in the first storage subsystem into a second storage subsystem of the storage system;

S4: select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;

S5: start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume;

S6: improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives; and

S7: use RAID 1+0 to substantially improve system performance, and optimize for the I/O load to improve storage performance under specific storage modes.
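Step S7 names RAID 1+0 without describing its layout. The following is a minimal sketch of the usual arrangement (striping across two-way mirror pairs), given only for illustration. The function names, the pairing of disks `2k` and `2k+1`, and the round-robin stripe order are assumptions, not details taken from the application.

```python
from typing import Tuple


def raid10_usable_capacity(disk_count: int, disk_size_gb: int) -> int:
    """Usable capacity of a RAID 1+0 array built from two-way mirrors.

    Every block is written to both disks of a mirror pair, so only half
    of the raw capacity is usable.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
    return (disk_count // 2) * disk_size_gb


def raid10_disks_for_block(block_index: int, disk_count: int) -> Tuple[int, int]:
    """Return the two disks that hold a given logical block.

    Assumed layout: disks (2k, 2k+1) form mirror pair k, and logical
    blocks are striped round-robin across the mirror pairs.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
    pair = block_index % (disk_count // 2)
    return (2 * pair, 2 * pair + 1)
```

In this layout consecutive blocks land on different mirror pairs, so reads and writes can proceed in parallel, while each pair can lose one disk without losing data. That combination is the likely reason the method pairs RAID 1+0 with solid-state drives for I/O-heavy loads.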

The efficient big data storage method has the following advantages:

(1) The method achieves decentralized, efficient big data storage.

(2) The method reduces management cost, helps improve the flexibility and ease of use of data processing, and lowers the learning cost for users.

Brief description of the drawings

The drawings described herein provide a further understanding of the application and form a part of it. The same reference numerals denote the same or similar parts throughout the drawings. The illustrative embodiments of the application and their description serve to explain the application and do not unduly limit it. In the drawings:

Fig. 1 schematically shows a flowchart of the efficient big data storage method according to one embodiment of the application.

Detailed description of the embodiments

To make the purpose, technical solution, and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.

In the following description, references to "one embodiment", "an embodiment", "one example", "an example", and the like indicate that the embodiment or example so described may include a particular feature, structure, characteristic, property, element, or limitation, but not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element, or limitation. In addition, repeated use of the phrase "according to one embodiment of the application" does not necessarily refer to the same embodiment, although it may.

For simplicity, some technical features well known to those skilled in the art are omitted from the following description.

According to one embodiment of the application, an efficient big data storage method is provided, comprising steps S1 to S7 as set out above.

According to one embodiment of the application, the object data of the efficient big data storage method includes at least one of structured data, semi-structured data, or unstructured data. The method further includes, before the object data is received and when the object data is created, setting the correspondence between structured, semi-structured, and unstructured data and the storage units. The first storage subsystem consists of a parallel database unit and a Hadoop platform, and the Hadoop platform includes an HDFS unit, an HBase unit, and a Hive unit. The HDFS unit stores unstructured data, the HBase and Hive units store semi-structured data, and the parallel database unit stores structured data.
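The correspondence between data types and storage units described above can be sketched as a lookup table. This is only an illustration: the `kind` attribute key, the unit names, and the function `select_storage_units` are hypothetical, and the application does not specify how the attribute information is encoded or how a choice is made between HBase and Hive.

```python
from enum import Enum
from typing import Dict, Tuple


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Correspondence set up before any object data is received.
# The unit names are placeholder labels, not real service endpoints.
STORAGE_UNITS: Dict[DataKind, Tuple[str, ...]] = {
    DataKind.STRUCTURED: ("parallel_database",),
    DataKind.SEMI_STRUCTURED: ("hbase", "hive"),
    DataKind.UNSTRUCTURED: ("hdfs",),
}


def select_storage_units(attributes: Dict[str, str]) -> Tuple[str, ...]:
    """Return the candidate storage units for an object (steps S1 and S2).

    Assumes the attribute information carries a 'kind' field whose value
    is one of the DataKind values; an unknown value raises ValueError.
    """
    kind = DataKind(attributes["kind"])
    return STORAGE_UNITS[kind]
```

For example, `select_storage_units({"kind": "unstructured"})` yields `("hdfs",)`, while a semi-structured object yields both `"hbase"` and `"hive"` as candidates.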

First, the physical performance of the storage devices is improved by replacing the mechanical hard disks in the devices with solid-state drives. Next comes the design and optimization of the system structure, which covers the logical structure of the system and its data flow paths; using RAID 1+0 can substantially improve system performance. Finally, the system is optimized for the I/O load to improve storage performance under specific storage modes.

The other Hadoop compute nodes are then started; together with the name node they form a complete Hadoop service system that provides big data processing to external users. When the server that mounts the volume goes down or becomes abnormal, the method may further include mounting the volume on another server and then starting the name node service there; no data is lost and Hadoop continues to provide service normally. If a server other than the one mounting the volume goes down or becomes abnormal, the Hadoop service is not affected.
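The volume setup of steps S4 and S5 and the failover path described above can be sketched as command sequences. The sketch below only builds the command strings and does not run them. Several points are assumptions rather than details from the application: the volume is replicated with one copy per server, `hdfs-site.xml` already points `dfs.namenode.name.dir` at a directory under the mount point, Hadoop 3 command syntax is used, and the server names, paths, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def volume_setup_commands(volume: str, bricks: Sequence[Tuple[str, str]]) -> List[str]:
    """Commands, run on the first server, that build one volume (S4).

    bricks is a sequence of (host, local_directory) pairs.
    """
    hosts = [host for host, _ in bricks]
    if len(set(hosts)) < 2:
        raise ValueError("the method calls for at least two servers")
    commands = [f"gluster peer probe {host}" for host in hosts[1:]]
    brick_args = " ".join(f"{host}:{path}" for host, path in bricks)
    # Assumption: one replica per brick, so every server holds a full copy.
    commands.append(
        f"gluster volume create {volume} replica {len(bricks)} {brick_args}"
    )
    commands.append(f"gluster volume start {volume}")
    return commands


def namenode_start_commands(volume: str, server: str, mount_point: str) -> List[str]:
    """Commands, run on server, that mount the volume and start the name node (S5)."""
    return [
        f"mkdir -p {mount_point}",
        f"mount -t glusterfs {server}:/{volume} {mount_point}",
        # Assumption: Hadoop 3 syntax; the name node metadata directory
        # is configured under mount_point in hdfs-site.xml.
        "hdfs --daemon start namenode",
    ]


def failover_commands(
    volume: str, healthy_servers: Sequence[str], failed_server: str, mount_point: str
) -> Tuple[str, List[str]]:
    """Pick another server and return it with the commands to take over."""
    standby = next((s for s in healthy_servers if s != failed_server), None)
    if standby is None:
        raise RuntimeError("no healthy server left to mount the volume")
    return standby, namenode_start_commands(volume, standby, mount_point)
```

Because the name node metadata lives on the shared volume rather than on one server's local disk, the takeover needs only a mount and a service start, which matches the description's point that no data is lost. A first-time setup would also need the name node metadata directory to be formatted, which is omitted here.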

The embodiments described above represent only several embodiments of the present invention. Their description is specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be defined by the claims.

Claims

1. An efficient big data storage method, characterized in that it comprises the following steps:

S2: store the object data into a first storage subsystem of a storage system according to the attribute information of the object data;

S3: store the association relationships and schema of the object data stored in the first storage subsystem into a second storage subsystem of the storage system;

S4: select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;

S5: start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume;

S7: use RAID 1+0 to substantially improve system performance, and optimize for the I/O load to improve storage performance under specific storage modes.

2. The efficient big data storage method according to claim 1, characterized in that: the object data includes at least one of structured data, semi-structured data, or unstructured data; the method further includes, before the object data is received and when the object data is created, setting the correspondence between structured, semi-structured, and unstructured data and the storage units; the first storage subsystem consists of a parallel database unit and a Hadoop platform; the Hadoop platform includes an HDFS unit, an HBase unit, and a Hive unit; and the HDFS unit stores unstructured data, the HBase and Hive units store semi-structured data, and the parallel database unit stores structured data.