CN107291380A - Efficient big data storage method - Google Patents

Efficient big data storage method

Info

Publication number
CN107291380A
Authority
CN
China
Prior art keywords
storage
data
object data
glusterfs
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710347064.2A
Other languages
Chinese (zh)
Inventor
梁庆欢
蒋颖
王川林
陈长明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Baomihua Information Technology Co Ltd
Original Assignee
Chengdu Baomihua Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Baomihua Information Technology Co Ltd
Priority to CN201710347064.2A
Publication of CN107291380A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0607 Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an efficient big data storage method comprising the following steps. S1: receive object data and identify the attribute information of the object data. S2: store the object data in a first storage subsystem of the storage system according to the attribute information of the object data. S3: store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system. S4: select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume. S5: start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume. S6: improve the physical performance of the storage devices by replacing the mechanical hard disks in the equipment with solid-state drives.

Description

Efficient big data storage method
Technical field
The present invention relates to data storage, and specifically to an efficient big data storage method.
Background
The research firm Gartner defines "big data" as follows: big data is an information asset characterized by massive volume, high growth rate, and diversity, which requires new processing modes in order to deliver stronger decision-making power, insight discovery, and process optimization capability. The McKinsey Global Institute defines it as a data set so large that its acquisition, storage, management, and analysis greatly exceed the capabilities of traditional database software tools, with four main characteristics: massive data scale, rapid data flow, diverse data types, and low value density.
The strategic significance of big data technology lies not in holding a huge amount of data but in processing the meaningful data professionally. In other words, if big data is compared to an industry, the key to making that industry profitable is improving the "processing capability" applied to the data, so that "processing" adds value to the data.
Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed by a single computer and requires a distributed architecture. Its characteristic feature is distributed data mining over massive data, which in turn relies on the distributed processing, distributed databases, cloud storage, and virtualization technology of cloud computing.
With the arrival of the cloud era, big data has attracted more and more attention. Analyst teams hold that the term is usually used to describe the large amounts of unstructured and semi-structured data that a company creates, data that costs too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing because real-time analysis of large data sets needs a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers.
Big data requires special technologies to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. An efficient big data storage method is therefore urgently needed to meet current demand.
Summary of the invention
In view of the above deficiencies in the prior art, the object of the present invention is to provide an efficient big data storage method that addresses the above problems.
To meet the above requirements, the present invention adopts the following technical solution: an efficient big data storage method comprising the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of the storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the equipment with solid-state drives; and
S7: Use RAID 1+0 to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage modes.
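As a rough illustration of what step S7 trades off, the following Python sketch models a RAID 1+0 array as striped two-way mirror pairs and computes its usable capacity and whether a given set of disk failures is survivable. The class name `Raid10Layout`, the pairing of disks 2k and 2k+1, and the two-way mirroring are assumptions made for the example; the patent does not specify the array geometry.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Raid10Layout:
    """Hypothetical model of a RAID 1+0 array built from two-way mirror pairs."""

    disk_count: int        # total member disks; assumed even and at least 4
    disk_capacity_gb: int  # capacity of each member disk in GB

    def __post_init__(self) -> None:
        if self.disk_count < 4 or self.disk_count % 2 != 0:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")

    @property
    def mirror_pairs(self) -> int:
        # Data is striped across the pairs; each pair holds two copies.
        return self.disk_count // 2

    @property
    def usable_capacity_gb(self) -> int:
        # Each mirror pair contributes one disk's worth of usable space.
        return self.mirror_pairs * self.disk_capacity_gb

    def survives(self, failed_disks: Iterable[int]) -> bool:
        # Assumption: disks 2k and 2k+1 form mirror pair k. The array stays
        # readable as long as no pair has lost both of its members.
        failed = set(failed_disks)
        return all(
            not {2 * k, 2 * k + 1} <= failed for k in range(self.mirror_pairs)
        )
```

With four 1000 GB disks, `Raid10Layout(4, 1000).usable_capacity_gb` is 2000, and the array tolerates one failure in each pair but not the loss of both disks of the same pair.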
The efficient big data storage method has the following advantages:
(1) The method achieves decentralized, efficient big data storage.
(2) The method reduces management cost, helps improve the flexibility and ease of use of data processing, and lowers the learning cost for users.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form a part of it. In the drawings, the same reference numerals denote the same or similar parts. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 schematically shows a flowchart of the efficient big data storage method according to one embodiment of the application.
Detailed description
To make the purpose, technical solution, and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
In the following description, references to "one embodiment", "an embodiment", "one example", "an example", and the like indicate that the embodiment or example so described may include a particular feature, structure, characteristic, property, element, or limitation, but not every embodiment or example necessarily includes that feature, structure, characteristic, property, element, or limitation. In addition, repeated use of the phrase "according to one embodiment of the application" does not necessarily refer to the same embodiment, although it may.
For simplicity, some technical features well known to those skilled in the art are omitted from the following description.
According to one embodiment of the application, an efficient big data storage method is provided, comprising the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of the storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the equipment with solid-state drives; and
S7: Use RAID 1+0 to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage modes.
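Steps S4 and S5 can be sketched as a sequence of administrative commands. The Python helper below only builds the command lines and the Hadoop configuration fragment; it does not execute anything. The volume name, host names, brick path, mount point, and the choice of a replicated volume with `replica 2` are illustrative assumptions, since the patent does not state the volume type. The `hdfs --daemon start namenode` form is the Hadoop 3 syntax; older releases start the daemon differently.

```python
from typing import List, Sequence


def gluster_volume_commands(
    volume: str, servers: Sequence[str], brick_path: str, replica: int = 2
) -> List[List[str]]:
    """Build the gluster CLI calls for S4 (run on the first server)."""
    if len(servers) < 2:
        raise ValueError("S4 requires at least two servers")
    if len(servers) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    # Join the remaining servers to the trusted storage pool.
    commands = [["gluster", "peer", "probe", host] for host in servers[1:]]
    # Each server shares one local directory as a basic storage unit (brick).
    bricks = [f"{host}:{brick_path}" for host in servers]
    commands.append(
        ["gluster", "volume", "create", volume, "replica", str(replica), *bricks]
    )
    commands.append(["gluster", "volume", "start", volume])
    return commands


def namenode_commands(volume: str, server: str, mount_point: str) -> List[List[str]]:
    """Build the calls for S5: mount the volume, then start the name node."""
    return [
        ["mount", "-t", "glusterfs", f"{server}:/{volume}", mount_point],
        ["hdfs", "--daemon", "start", "namenode"],  # Hadoop 3 syntax (assumption)
    ]


def namenode_dir_property(mount_point: str) -> str:
    """hdfs-site.xml fragment pointing name node metadata at the mounted volume."""
    return (
        "<property>"
        "<name>dfs.namenode.name.dir</name>"
        f"<value>file://{mount_point}/namenode</value>"
        "</property>"
    )
```

For example, `gluster_volume_commands("nnvol", ["s1", "s2"], "/data/brick1")` yields a peer probe for `s2`, a `volume create` with one brick per server, and a `volume start`. A replicated volume is chosen here because keeping name node metadata on every brick is what would let another server take over later; that reading is an interpretation, not something the patent spells out.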
According to one embodiment of the application, the object data of the efficient big data storage method includes at least one of structured data, semi-structured data, or unstructured data. The method further includes, before the object data is received, setting the correspondence between structured, semi-structured, and unstructured data and the storage units at the time the object data is created. The first storage subsystem consists of a parallel database unit and a Hadoop platform, and the Hadoop platform includes an HDFS unit, an HBase unit, and a Hive unit. The HDFS unit stores unstructured data, the HBase and Hive units store semi-structured data, and the parallel database unit stores structured data.
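The correspondence described above, together with steps S1 to S3, can be sketched in Python as follows. The in-memory dictionaries stand in for the real storage units, and the `identify_kind` heuristic (flat dict is structured, nested dict is semi-structured, anything else is unstructured) and the choice of the first candidate unit for semi-structured data are assumptions for illustration only; the patent does not say how attributes are identified or how HBase and Hive are chosen between.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Sequence


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Correspondence between data kinds and storage units, as stated in the text.
STORAGE_UNITS: Dict[DataKind, List[str]] = {
    DataKind.STRUCTURED: ["parallel_database"],
    DataKind.SEMI_STRUCTURED: ["hbase", "hive"],
    DataKind.UNSTRUCTURED: ["hdfs"],
}


def identify_kind(payload: Any) -> DataKind:
    """S1 (hypothetical heuristic): classify the object by its shape."""
    if isinstance(payload, dict):
        nested = any(isinstance(v, (dict, list)) for v in payload.values())
        return DataKind.SEMI_STRUCTURED if nested else DataKind.STRUCTURED
    return DataKind.UNSTRUCTURED


@dataclass
class StorageSystem:
    # First subsystem: unit name -> {object id: payload}.
    first: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Second subsystem: object id -> schema and association metadata.
    second: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def store(self, object_id: str, payload: Any, related_to: Sequence[str] = ()) -> str:
        kind = identify_kind(payload)
        unit = STORAGE_UNITS[kind][0]  # first candidate unit (assumption)
        self.first.setdefault(unit, {})[object_id] = payload       # S2
        schema = sorted(payload) if isinstance(payload, dict) else None
        self.second[object_id] = {                                  # S3
            "unit": unit,
            "kind": kind.value,
            "schema": schema,
            "related_to": list(related_to),
        }
        return unit
```

Storing `{"id": 1, "name": "a"}` lands in the parallel database unit, a record with a nested list lands in HBase, and raw bytes land in HDFS, while the second subsystem keeps each object's field names and its links to other objects.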
First, the physical performance of the storage devices is improved by replacing the mechanical hard disks in the equipment with solid-state drives. Next, the system structure is designed and optimized, which includes optimizing the system's logical structure and data flow paths; using RAID 1+0 can greatly improve system performance. Finally, the system is optimized for the I/O load to improve storage performance under specific storage modes. The other Hadoop compute nodes are then started; together with the name node they form a complete Hadoop service system that provides big data processing to external users. When the server that mounts the volume goes down or becomes abnormal, the method may further include mounting the volume on another server and then starting the name node service there; no data is lost, and Hadoop continues to provide service normally. If a server other than the one that mounts the volume goes down or becomes abnormal, the Hadoop service is not affected.
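The failover behaviour described above can be sketched as a small planning function. It returns the actions to take rather than performing them; the function name `plan_failover`, the action strings, and the rule of picking the first healthy server in list order are hypothetical. The claim that no data is lost presumably depends on the GlusterFS volume keeping the name node data available on the surviving servers, which this sketch takes for granted.

```python
from typing import Collection, List, Sequence, Tuple


def plan_failover(
    servers: Sequence[str], mounted_on: str, healthy: Collection[str]
) -> List[Tuple[str, str]]:
    """Return (server, action) pairs needed to keep the name node available."""
    if mounted_on in healthy:
        # Only the mounting server matters: failures elsewhere need no action.
        return []
    for candidate in servers:
        if candidate != mounted_on and candidate in healthy:
            # Mount the volume on another server, then start the name node there.
            return [
                (candidate, "mount the GlusterFS volume"),
                (candidate, "start the Hadoop name node service"),
            ]
    raise RuntimeError("no healthy server is available to mount the volume")
```

For instance, with servers `["s1", "s2", "s3"]` and the volume mounted on `s1`, a failure of `s2` produces an empty plan, while a failure of `s1` produces a plan that mounts the volume and starts the name node on `s2`.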
The embodiments described above represent only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of the invention is therefore defined by the claims.

Claims (2)

1. An efficient big data storage method, characterized in that it comprises the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of the storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node service on the server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the equipment with solid-state drives; and
S7: Use RAID 1+0 to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage modes.
2. The efficient big data storage method according to claim 1, characterized in that: the object data includes at least one of structured data, semi-structured data, or unstructured data; the method further includes, before the object data is received, setting the correspondence between structured, semi-structured, and unstructured data and the storage units at the time the object data is created; the first storage subsystem consists of a parallel database unit and a Hadoop platform; the Hadoop platform includes an HDFS unit, an HBase unit, and a Hive unit; and the HDFS unit stores unstructured data, the HBase and Hive units store semi-structured data, and the parallel database unit stores structured data.
CN201710347064.2A 2017-05-16 2017-05-16 Efficient big data storage method Withdrawn CN107291380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710347064.2A CN107291380A (en) 2017-05-16 2017-05-16 Efficient big data storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710347064.2A CN107291380A (en) 2017-05-16 2017-05-16 Efficient big data storage method

Publications (1)

Publication Number Publication Date
CN107291380A true CN107291380A (en) 2017-10-24

Family

ID=60094240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710347064.2A Withdrawn CN107291380A (en) 2017-05-16 2017-05-16 Efficient big data storage method

Country Status (1)

Country Link
CN (1) CN107291380A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442430A (en) * 2019-08-06 2019-11-12 上海浦东发展银行股份有限公司信用卡中心 A kind of dissemination method based on distributed storage container cloud application
CN110442430B (en) * 2019-08-06 2021-11-19 上海浦东发展银行股份有限公司信用卡中心 Publishing method based on distributed storage container cloud application

Similar Documents

Publication Publication Date Title
CN103365929B (en) The management method of a kind of data base connection and system
CN104331435B (en) A kind of efficient mass data abstracting method of low influence based on Hadoop big data platforms
US11588793B2 (en) System and methods for dynamic geospatially-referenced cyber-physical infrastructure inventory and asset management
CN105630847B (en) Date storage method, data query method, apparatus and system
Ngu et al. B+-tree construction on massive data with Hadoop
CN108595473A (en) A kind of big data application platform based on cloud computing
Roth et al. Event data warehousing for complex event processing
Chen et al. An intelligent approval system for city construction based on cloud computing and big data
Blythe et al. Farm: Architecture for distributed agent-based social simulations
CN105824892A (en) Method for synchronizing and processing data by data pool
CN106682071A (en) University library digital resource sharing method based on big data
Baig et al. Big Data Tools: Advantages and Disadvantages.
CN110119422A (en) Small wechat borrows tenant data depot data processing system and equipment
Bhogal et al. A review on big data security and handling
CN107291380A (en) Efficient big data storage method
Lee et al. A big data management system for energy consumption prediction models
Andi et al. Association rule algorithm with FP growth for book search
CN112445776A (en) Presto-based dynamic barrel dividing method, system, equipment and readable storage medium
CN204906437U (en) Big data storage application network framework
Ketu et al. Performance enhancement of distributed K-Means clustering for big Data analytics through in-memory computation
CN107203633A (en) Tables of data pushes away several processing methods, device and electronic equipment
Rao et al. Mapreduce accelerated signature-based intrusion detection mechanism (idm) with pattern matching mechanism
CN105630896A (en) Method for quickly importing mass data
Kaur et al. Enhanced Data Management Framework for Cloud Based System
Monu et al. Simulation of performance analysis of mongodb, pig, hive storage, map reduce, spark and yarn

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20171024