CN107291380A - Efficient big data storage method - Google Patents
Efficient big data storage method
- Publication number
- CN107291380A (application number CN201710347064.2A)
- Authority
- CN
- China
- Prior art keywords
- storage
- data
- object data
- glusterfs
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0607—Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an efficient big data storage method comprising the following steps. S1: receive object data and identify the attribute information of the object data. S2: store the object data in a first storage subsystem of a storage system according to the attribute information of the object data. S3: store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system. S4: select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units (bricks), and combine the basic storage units into one GlusterFS volume. S5: start the Hadoop name node (NameNode) service on a server that mounts the volume, and store the name node's data on the mounted volume. S6: improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives.
Description
Technical field
The present invention relates to an efficient big data storage method.
Background
The research firm Gartner defines "big data" as follows: big data is a massive, fast-growing and diverse information asset that requires new processing models in order to deliver stronger decision-making power, insight discovery and process optimization. The McKinsey Global Institute defines it as a data set so large that its acquisition, storage, management and analysis far exceed the capabilities of traditional database software tools, with four major characteristics: massive data scale, rapid data flow, diverse data types and low value density. The strategic significance of big data technology lies not in possessing huge amounts of data but in the specialized processing of the meaningful data. In other words, if big data is compared to an industry, the key to that industry's profitability is improving the "processing capability" applied to the data, so that "processing" adds "value" to the data. Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed by a single computer; a distributed architecture is required. Its hallmark is distributed data mining over massive data, which in turn must rely on the distributed processing, distributed databases, cloud storage and virtualization technology of cloud computing. With the arrival of the cloud era, big data has attracted increasing attention. Analysts hold that the term is generally used to describe the large volumes of unstructured and semi-structured data that a company creates, which would cost too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets needs a framework such as MapReduce to distribute the work across tens, hundreds or even thousands of computers.
Big data requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies suited to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems. An efficient big data storage method is therefore urgently needed to meet current demand.
Summary of the invention
In view of the above deficiencies in the prior art, the object of the present invention is to provide an efficient big data storage method that addresses the problems described above.
To achieve this, the present invention adopts the following technical solution: an efficient big data storage method comprising the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of a storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units (bricks), and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node (NameNode) service on a server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives; and
S7: Use RAID 1+0 technology to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage patterns.
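Steps S4 and S5 can be sketched as a command plan. The following is a minimal illustration, not the method as filed: it only builds the commands as argument lists and executes nothing. The brick directory, volume name and mount point are hypothetical, the replica count (one copy per server) is an assumption the source does not state, and the NameNode start command assumes the Hadoop 3.x `hdfs --daemon` syntax.

```python
from typing import Dict, List


def build_provisioning_plan(
    servers: List[str],
    brick_path: str = "/data/brick1",    # hypothetical brick directory on each server
    volume: str = "nn_meta",             # hypothetical GlusterFS volume name
    mount_point: str = "/mnt/nn_meta",   # hypothetical mount point for the volume
) -> List[List[str]]:
    """Return the commands for steps S4 and S5 as argument lists (nothing is run)."""
    if len(servers) < 2:
        raise ValueError("step S4 requires at least two servers")

    first, others = servers[0], servers[1:]
    plan: List[List[str]] = []

    # S4: start the GlusterFS daemon (needed on every server), then add the
    # other servers to the trusted pool from the first one.
    plan.append(["systemctl", "start", "glusterd"])
    for host in others:
        plan.append(["gluster", "peer", "probe", host])

    # S4: each server shares one local directory (a brick); the bricks form one volume.
    # Assumption: a replicated volume with one replica per server.
    bricks = [f"{host}:{brick_path}" for host in servers]
    plan.append(["gluster", "volume", "create", volume,
                 "replica", str(len(servers)), *bricks])
    plan.append(["gluster", "volume", "start", volume])

    # S5: mount the volume on the server that will run the name node,
    # then start the NameNode service (Hadoop 3.x syntax, an assumption).
    plan.append(["mount", "-t", "glusterfs", f"{first}:/{volume}", mount_point])
    plan.append(["hdfs", "--daemon", "start", "namenode"])
    return plan


def namenode_dir_property(mount_point: str = "/mnt/nn_meta") -> Dict[str, str]:
    """hdfs-site.xml property that places NameNode metadata on the mounted volume."""
    return {"dfs.namenode.name.dir": f"file://{mount_point}/namenode"}


if __name__ == "__main__":
    for command in build_provisioning_plan(["node1", "node2"]):
        print(" ".join(command))
    print(namenode_dir_property())
```

Keeping the plan as data makes it easy to review before anything touches a real cluster; the host names `node1` and `node2` are placeholders.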
The efficient big data storage method has the following advantages:
(1) It provides decentralized, efficient big data storage.
(2) It reduces management cost, helps improve the flexibility and ease of use of data processing, and lowers the learning cost for users.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present application and form a part of it. In the drawings, the same reference numerals denote the same or similar parts. The exemplary embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 schematically shows a flowchart of the efficient big data storage method according to one embodiment of the application.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
In the following description, references to "one embodiment", "an embodiment", "one example", "an example" and the like indicate that the embodiment or example so described may include a particular feature, structure, characteristic, property, element or limitation, but not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element or limitation. In addition, repeated use of the phrase "according to one embodiment of the application" does not necessarily refer to the same embodiment, although it may.
For simplicity, some technical features well known to those skilled in the art are omitted from the following description.
According to one embodiment of the application, an efficient big data storage method is provided, comprising the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of a storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units (bricks), and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node (NameNode) service on a server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives; and
S7: Use RAID 1+0 technology to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage patterns.
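To make the RAID 1+0 layout named in step S7 concrete, the sketch below models an array of two-way mirror pairs with blocks striped across the pairs. It is a simplified model under stated assumptions (one block per stripe unit, two-way mirrors, disks numbered so that disks 2k and 2k+1 form pair k); the source does not specify the array geometry, and the function names are illustrative.

```python
from typing import List, Tuple


def raid10_usable_capacity(disk_count: int, disk_size_gb: float) -> float:
    """Usable capacity with two-way mirrors: half of the raw capacity."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 1+0 needs an even number of disks, at least four")
    return disk_count // 2 * disk_size_gb


def raid10_locate(block: int, disk_count: int) -> Tuple[int, List[int]]:
    """Map a logical block to (row on disk, indices of the disks holding a copy)."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 1+0 needs an even number of disks, at least four")
    pairs = disk_count // 2
    pair = block % pairs      # RAID 0 part: stripe blocks round-robin over the mirror pairs
    row = block // pairs      # position of the block on each disk of that pair
    return row, [2 * pair, 2 * pair + 1]   # RAID 1 part: both disks of the pair hold it


if __name__ == "__main__":
    print(raid10_usable_capacity(4, 500))
    for logical_block in range(4):
        print(logical_block, raid10_locate(logical_block, 4))
```

In this model consecutive blocks land on different mirror pairs, which is why sequential I/O can be served by several disks at once, while every block survives the loss of either disk in its pair.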
According to one embodiment of the application, the object data of the efficient big data storage method includes at least one of structured data, semi-structured data or unstructured data, and the method further includes, before the object data is received, setting the correspondence between structured, semi-structured and unstructured data and the storage units when the object data is created. The first storage subsystem consists of a parallel database unit and a Hadoop platform. The Hadoop platform includes an HDFS unit, an HBase unit and a Hive unit, where the HDFS unit stores unstructured data, the HBase unit and the Hive unit store semi-structured data, and the parallel database unit stores structured data.
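The correspondence described above, together with steps S1 to S3, can be sketched with in-memory stand-ins for the storage units. This is an illustration under assumptions, not the filed implementation: the attribute keys `kind`, `schema` and `related_to` are hypothetical, and because the source assigns semi-structured data to both HBase and Hive without saying how to choose, the sketch simply takes the first listed unit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DataKind(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Correspondence between data kinds and storage units, set before any data arrives.
ROUTING: Dict[DataKind, List[str]] = {
    DataKind.STRUCTURED: ["parallel_db"],
    DataKind.SEMI_STRUCTURED: ["hbase", "hive"],
    DataKind.UNSTRUCTURED: ["hdfs"],
}


@dataclass
class ObjectData:
    object_id: str
    payload: bytes
    attributes: dict  # hypothetical keys: "kind", "schema", "related_to"


class StorageSystem:
    def __init__(self) -> None:
        # First storage subsystem: one in-memory dict per storage unit.
        self.first: Dict[str, Dict[str, bytes]] = {
            unit: {} for units in ROUTING.values() for unit in units
        }
        # Second storage subsystem: association relationships and schema.
        self.second: Dict[str, dict] = {}

    def store(self, obj: ObjectData) -> str:
        # S1: identify the attribute information (here, the declared data kind).
        kind = DataKind(obj.attributes["kind"])
        # S2: store the object in the unit that the correspondence assigns to it.
        unit = ROUTING[kind][0]
        self.first[unit][obj.object_id] = obj.payload
        # S3: record where the object lives, its schema and its relationships.
        self.second[obj.object_id] = {
            "unit": unit,
            "schema": obj.attributes.get("schema"),
            "related_to": list(obj.attributes.get("related_to", [])),
        }
        return unit


if __name__ == "__main__":
    system = StorageSystem()
    print(system.store(ObjectData("img-1", b"\x89PNG", {"kind": "unstructured"})))
    print(system.store(ObjectData("row-1", b"1,alice", {"kind": "structured"})))
```

Separating the payload (first subsystem) from the relationship and schema records (second subsystem) lets a lookup find an object's storage unit without scanning every unit.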
First, the physical performance of the storage devices is improved by replacing the mechanical hard disks in the devices with solid-state drives. Next, the system structure is designed and optimized, which covers the system's logical structure and its data flow channels; using RAID 1+0 technology here can greatly improve system performance. Finally, the system is optimized for the I/O load to improve storage performance under specific storage patterns.
The other Hadoop compute nodes are then started and, together with the name node, form a complete Hadoop service system that provides big data processing services externally. When the server that mounts the volume goes down or behaves abnormally, the method may further include mounting the volume on another server and then starting the name node service there; no data is lost and Hadoop continues to provide service normally. If a server other than the one mounting the volume goes down or behaves abnormally, the Hadoop service is not affected.
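The failover behaviour described above can be illustrated with a small in-memory simulation. The classes and method names are hypothetical, and the dictionary `volume_data` merely stands in for the name node metadata kept on the shared GlusterFS volume; the sketch does not contact GlusterFS or Hadoop and picks the first healthy server as the replacement, a policy the source does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    healthy: bool = True
    mounts_volume: bool = False
    runs_namenode: bool = False


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.servers: Dict[str, Server] = {name: Server(name) for name in names}
        # Stand-in for name node metadata on the shared volume; it outlives any server.
        self.volume_data: Dict[str, str] = {}

    def start_namenode(self, name: str) -> None:
        server = self.servers[name]
        if not server.healthy:
            raise RuntimeError(f"{name} is down")
        server.mounts_volume = True   # mount the volume first ...
        server.runs_namenode = True   # ... then start the name node service

    def active_namenode(self) -> Optional[str]:
        for server in self.servers.values():
            if server.healthy and server.runs_namenode:
                return server.name
        return None

    def handle_failure(self, name: str) -> Optional[str]:
        """Mark a server as failed and return the server running the name node afterwards."""
        failed = self.servers[name]
        failed.healthy = False
        if not failed.runs_namenode:
            # A server that does not mount the volume failed: the service is unaffected.
            return self.active_namenode()
        failed.mounts_volume = False
        failed.runs_namenode = False
        for candidate in self.servers.values():
            if candidate.healthy:
                self.start_namenode(candidate.name)  # remount and restart elsewhere
                return candidate.name
        return None


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3"])
    cluster.start_namenode("node1")
    cluster.volume_data["fsimage"] = "checkpoint-1"
    print(cluster.handle_failure("node3"))
    print(cluster.handle_failure("node1"))
    print(cluster.volume_data)
```

The metadata survives in this model only because it lives outside every individual server, which mirrors the reason the method stores the name node's data on the mounted volume.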
The embodiments described above represent only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the claims.
Claims (2)
1. An efficient big data storage method, characterized in that it comprises the following steps:
S1: Receive object data and identify the attribute information of the object data;
S2: Store the object data in a first storage subsystem of a storage system according to the attribute information of the object data;
S3: Store the association relationships and schema of the object data held in the first storage subsystem in a second storage subsystem of the storage system;
S4: Select at least two servers and start the GlusterFS service on them, share the local storage resources of the at least two servers as GlusterFS basic storage units, and combine the basic storage units into one GlusterFS volume;
S5: Start the Hadoop name node service on a server that mounts the volume, and store the name node's data on the mounted volume;
S6: Improve the physical performance of the storage devices by replacing the mechanical hard disks in the devices with solid-state drives; and
S7: Use RAID 1+0 technology to greatly improve system performance, and optimize for the I/O load to improve storage performance under specific storage patterns.
2. The efficient big data storage method according to claim 1, characterized in that: the object data includes at least one of structured data, semi-structured data or unstructured data, and the method further includes, before the object data is received, setting the correspondence between structured, semi-structured and unstructured data and the storage units when the object data is created; the first storage subsystem consists of a parallel database unit and a Hadoop platform, and the Hadoop platform includes an HDFS unit, an HBase unit and a Hive unit, wherein the HDFS unit stores unstructured data, the HBase unit and the Hive unit store semi-structured data, and the parallel database unit stores structured data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710347064.2A CN107291380A (en) | 2017-05-16 | 2017-05-16 | Efficient big data storage method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710347064.2A CN107291380A (en) | 2017-05-16 | 2017-05-16 | Efficient big data storage method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107291380A true CN107291380A (en) | 2017-10-24 |
Family
ID=60094240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710347064.2A Withdrawn CN107291380A (en) | 2017-05-16 | 2017-05-16 | Efficient big data storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291380A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110442430A (en) * | 2019-08-06 | 2019-11-12 | 上海浦东发展银行股份有限公司信用卡中心 | A kind of dissemination method based on distributed storage container cloud application |
- 2017-05-16: CN application CN201710347064.2A (CN107291380A), status: Withdrawn
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110442430A (en) * | 2019-08-06 | 2019-11-12 | 上海浦东发展银行股份有限公司信用卡中心 | A kind of dissemination method based on distributed storage container cloud application |
CN110442430B (en) * | 2019-08-06 | 2021-11-19 | 上海浦东发展银行股份有限公司信用卡中心 | Publishing method based on distributed storage container cloud application |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103365929B (en) | The management method of a kind of data base connection and system | |
CN104331435B (en) | A kind of efficient mass data abstracting method of low influence based on Hadoop big data platforms | |
US11588793B2 (en) | System and methods for dynamic geospatially-referenced cyber-physical infrastructure inventory and asset management | |
CN105630847B (en) | Date storage method, data query method, apparatus and system | |
Ngu et al. | B+-tree construction on massive data with Hadoop | |
CN108595473A (en) | A kind of big data application platform based on cloud computing | |
Roth et al. | Event data warehousing for complex event processing | |
Chen et al. | An intelligent approval system for city construction based on cloud computing and big data | |
Blythe et al. | Farm: Architecture for distributed agent-based social simulations | |
CN105824892A (en) | Method for synchronizing and processing data by data pool | |
CN106682071A (en) | University library digital resource sharing method based on big data | |
Baig et al. | Big Data Tools: Advantages and Disadvantages. | |
CN110119422A (en) | Small wechat borrows tenant data depot data processing system and equipment | |
Bhogal et al. | A review on big data security and handling | |
CN107291380A (en) | Efficient big data storage method | |
Lee et al. | A big data management system for energy consumption prediction models | |
Andi et al. | Association rule algorithm with FP growth for book search | |
CN112445776A (en) | Presto-based dynamic barrel dividing method, system, equipment and readable storage medium | |
CN204906437U (en) | Big data storage application network framework | |
Ketu et al. | Performance enhancement of distributed K-Means clustering for big Data analytics through in-memory computation | |
CN107203633A (en) | Tables of data pushes away several processing methods, device and electronic equipment | |
Rao et al. | Mapreduce accelerated signature-based intrusion detection mechanism (idm) with pattern matching mechanism | |
CN105630896A (en) | Method for quickly importing mass data | |
Kaur et al. | Enhanced Data Management Framework for Cloud Based System | |
Monu et al. | Simulation of performance analysis of mongodb, pig, hive storage, map reduce, spark and yarn |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20171024 |