CN111026814B - Low-cost data storage method - Google Patents

Publication number
CN111026814B
Authority
CN
China
Prior art keywords
data
hdfs
storage method
data storage
low cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911103260.0A
Other languages
Chinese (zh)
Other versions
CN111026814A (en)
Inventor
冯报安
杨晶生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Microphone Culture Media Co ltd
Original Assignee
Shanghai Microphone Culture Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-12
Filing date
2019-11-12
Publication date
2024-04-12

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a low-cost data storage method in the technical field of data storage, comprising the following steps: S1, selection of storage hardware; S2, storage of data; S3, backup of data; S4, data query. The invention uses Hive built on HDFS to store all data; Hive allows historical data to be queried through an API and interactive clients; the data are grouped by date, which improves the performance of queries over a given time period; and low-priced, high-capacity HDD disks are used as hardware.

Description

Low-cost data storage method
Technical Field
The invention relates to the technical field of data storage, in particular to a low-cost data storage method.
Background
As business grows over time, the volume of user-related and product-related data keeps expanding, and TB-level or even PB-level data sets are now common. Traditional relational databases suffer greatly reduced performance, or become unusable, when faced with such volumes. Modern distributed column-oriented stores such as HBase can hold data of this size, but HBase is designed to provide high-performance random reads and writes for online services, and storing all data on expensive SSD hard disks for that purpose greatly increases hardware cost. In view of this, the present invention provides a low-cost data storage method to solve the above problems.
Disclosure of Invention
The present invention aims to provide a low-cost data storage method that solves the problems described in the Background above.
To achieve this purpose, the present invention provides the following technical solution: a low-cost data storage method, comprising:
S1: selection of storage hardware
HDD hard disks with stable performance are selected to store all user and product data in a distributed arrangement;
S2: storage of data
User and product data are written and stored by using the distributed file system HDFS and the data warehouse Hive built on HDFS, wherein HDFS is a distributed file system designed to be suitable for running on the HDD hard disks of step S1, provides high-throughput data access, and relaxes some POSIX requirements so that data in the file system can be accessed in a streaming manner; data in HDFS are stored grouped by date; and Hive comprises a data source, data storage and management, a data service and a data application;
S3: backup of data
The HDFS of step S2 is used to back up the stored user and product data, so that good data backup is provided;
S4: data query
Using the historical-data query capability of the Hive system of step S2, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are extracted and queried; Hive allows historical data to be queried through an API and interactive clients.
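The date grouping of step S2 can be sketched as a partitioned Hive table whose partitions map to one HDFS directory per day. This is only an illustration under stated assumptions: the warehouse root, the partition column name `dt`, the table columns, and the helper function names are hypothetical and do not come from the patent. The functions only build path and HiveQL strings; they do not contact a cluster.

```python
from datetime import date

# Hypothetical names: the patent specifies neither a warehouse root,
# a table schema, nor a partition column.
WAREHOUSE_ROOT = "/warehouse/user_product"  # assumed HDFS directory
PARTITION_COLUMN = "dt"                      # assumed partition column


def partition_path(day: date) -> str:
    """Return the directory that holds all records for one calendar day."""
    return f"{WAREHOUSE_ROOT}/{PARTITION_COLUMN}={day.isoformat()}"


def create_table_ddl(table: str) -> str:
    """Build a HiveQL statement for an external table partitioned by date."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  user_id STRING,\n"
        "  product_id STRING,\n"
        "  payload STRING\n"
        ")\n"
        f"PARTITIONED BY ({PARTITION_COLUMN} STRING)\n"
        f"LOCATION '{WAREHOUSE_ROOT}'"
    )


def add_partition_ddl(table: str, day: date) -> str:
    """Build the HiveQL statement that registers one day's directory."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({PARTITION_COLUMN}='{day.isoformat()}') "
        f"LOCATION '{partition_path(day)}'"
    )
```

With this layout, each day's records live under their own directory, so a query limited to a date range only needs the directories for those days.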
Preferably, in step S1, an ST6000NM0034NWCCG Dell 6TB 3.5 inch 12Gb HDD V4 SAS hard disk is adopted.
Preferably, POSIX is the Portable Operating System Interface.
Preferably, the data source is the data source of the data warehouse and comprises external data, existing service systems and document data. Data storage and management covers the storage and management of the data and includes the data warehouse, data marts, data warehouse monitoring, operation and maintenance tools, and metadata management. The data service provides data to front ends and applications: a front-end application can obtain data directly from the data warehouse, or the data can be served to it through an OLAP server. The data application faces the user directly and comprises data query tools, free report tools, data analysis tools, data mining tools and various application systems.
Preferably, OLAP is online analytical processing, which lets users examine information from multiple perspectives quickly, consistently and interactively, so as to gain a deep understanding of the data.
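The idea of examining the same information from several perspectives can be illustrated with a small roll-up over fact rows. This is a minimal sketch, not the OLAP server the text refers to: the row layout (date, product, user, amount) and the function name `roll_up` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical fact rows: (date, product, user, amount).
# The patent defines no schema; this one is assumed for the example.
Row = Tuple[str, str, str, float]


def roll_up(rows: Iterable[Row], dims: Tuple[int, ...]) -> Dict[Tuple[str, ...], float]:
    """Sum the amount column, grouped by the chosen dimension positions."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in dims)
        totals[key] += row[3]
    return dict(totals)


rows = [
    ("2019-11-11", "book", "alice", 10.0),
    ("2019-11-11", "pen", "bob", 2.0),
    ("2019-11-12", "book", "bob", 10.0),
]

by_date = roll_up(rows, dims=(0,))            # view the data by date
by_product = roll_up(rows, dims=(1,))         # view the same data by product
by_date_product = roll_up(rows, dims=(0, 1))  # view by date and product
```

Each call aggregates the same rows along a different combination of dimensions, which is the kind of view switching an OLAP front end exposes interactively.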
Compared with the prior art, the invention has the beneficial effects that:
1) The invention uses Hive built on HDFS to store all data;
2) By using Hive, the invention allows historical data to be queried through an API and interactive clients;
3) The invention groups the data by date, which improves the performance of queries over a given time period;
4) The invention uses low-priced, high-capacity HDD disks as hardware.
Of course, a product practicing the invention need not achieve all of the above advantages at the same time.
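Why date grouping helps time-range queries can be shown with a short sketch of partition pruning: a query over a date range reads only the partitions for those dates instead of the whole data set. The function names and the one-year set of daily partitions below are hypothetical, and treating the fraction of partitions as the fraction of data read assumes roughly equal daily volumes.

```python
from datetime import date, timedelta
from typing import List


def days_in_range(start: date, end: date) -> List[date]:
    """Return every calendar day from start to end, inclusive."""
    if end < start:
        return []
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def partitions_to_scan(all_partitions: List[date], start: date, end: date) -> List[date]:
    """Keep only the date partitions that a query over [start, end] must read."""
    wanted = set(days_in_range(start, end))
    return [p for p in all_partitions if p in wanted]


# One year of hypothetical daily partitions.
stored = days_in_range(date(2019, 1, 1), date(2019, 12, 31))

# A one-week query touches 7 of the 365 partitions.
needed = partitions_to_scan(stored, date(2019, 11, 1), date(2019, 11, 7))
fraction_read = len(needed) / len(stored)
```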
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of protection of the present invention.
Referring to FIG. 1, the present invention provides a technical solution: a low-cost data storage method, comprising:
S1: selection of storage hardware
HDD hard disks with stable performance are selected to store all user and product data in a distributed arrangement;
S2: storage of data
User and product data are written and stored by using the distributed file system HDFS and the data warehouse Hive built on HDFS, wherein HDFS is a distributed file system designed to be suitable for running on the HDD hard disks of step S1, provides high-throughput data access, and relaxes some POSIX requirements so that data in the file system can be accessed in a streaming manner; data in HDFS are stored grouped by date; and Hive comprises a data source, data storage and management, a data service and a data application;
S3: backup of data
The HDFS of step S2 is used to back up the stored user and product data, so that good data backup is provided;
S4: data query
Using the historical-data query capability of the Hive system of step S2, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are extracted and queried; Hive allows historical data to be queried through an API and interactive clients.
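Step S4 can be sketched as building a HiveQL query that is restricted to a date range, which a program could then submit through a HiveServer2 client or a user could type into an interactive client such as Beeline. This is a minimal sketch under assumptions: the table name, the partition column `dt`, and the function name `history_query` are hypothetical, and the code only builds the query string without connecting to Hive.

```python
from datetime import date


def history_query(table: str, start: date, end: date, partition_column: str = "dt") -> str:
    """Build a HiveQL query restricted to the date partitions in [start, end]."""
    if end < start:
        raise ValueError("end date must not precede start date")
    # Reject anything other than letters, digits and underscores in the table name.
    if not table.replace("_", "").isalnum():
        raise ValueError("table name must be alphanumeric or underscore")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {partition_column} >= '{start.isoformat()}' "
        f"AND {partition_column} <= '{end.isoformat()}'"
    )
```

Because the filter is on the date column used for grouping, Hive can limit the scan to the matching date partitions rather than reading all historical data.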
In step S1, an ST6000NM0034NWCCG Dell 6TB 3.5 inch 12Gb HDD V4 SAS hard disk is adopted.
POSIX is the Portable Operating System Interface.
The data source is the data source of the data warehouse and comprises external data, existing service systems and document data. Data storage and management covers the storage and management of the data and includes the data warehouse, data warehouse monitoring, operation and maintenance tools, and metadata management. The data service provides data to front ends and applications: a front-end application can obtain data directly from the data warehouse, or the data can be served to it through an OLAP server. The data application faces the user directly and comprises data query tools, free report tools, data analysis tools, data mining tools and various application systems.
OLAP is online analytical processing, which lets users examine information from multiple perspectives quickly, consistently and interactively, so as to gain a deep understanding of the data.
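The hardware implication of the HDFS backup in step S3 can be estimated with simple arithmetic, on the assumption that the backup refers to HDFS block replication. The default HDFS replication factor is 3, and the 6 TB disk size matches the disk named in step S1; the function names, the 100 TB workload, and the decision to ignore filesystem overhead and free-space headroom are assumptions of this sketch.

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk space consumed when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return logical_tb * replication


def disks_needed(logical_tb: float, disk_tb: float = 6.0, replication: int = 3) -> int:
    """Minimum disk count, ignoring filesystem overhead and headroom."""
    return math.ceil(raw_capacity_tb(logical_tb, replication) / disk_tb)


# Hypothetical workload: 100 TB of user and product data.
raw = raw_capacity_tb(100)   # 300 TB of raw space at replication 3
count = disks_needed(100)    # 50 disks of 6 TB each
```

Since replication multiplies the raw capacity required, the per-terabyte price of the disks has a direct effect on total cost, which is consistent with the choice of HDD hardware in step S1.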
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in explaining the invention. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. A low-cost data storage method, characterized by comprising:
S1: selection of storage hardware
HDD hard disks with stable performance are selected to store all user and product data in a distributed arrangement;
S2: storage of data
User and product data are written and stored by using the distributed file system HDFS and the data warehouse Hive built on HDFS;
S3: backup of data
The HDFS of step S2 is used to back up the stored user and product data, so that good data backup is provided;
S4: data query
Using the historical-data query capability of the Hive system of step S2, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are extracted and queried;
wherein the Hive of steps S2 and S4 comprises a data source, data storage and management, a data service and a data application; the data source is the data source of the data warehouse and comprises external data, existing service systems and document data; the data storage and management includes the data warehouse, data marts, data warehouse monitoring, operation and maintenance tools, and metadata management; the data service provides data to front ends and applications, such that a front-end application can obtain data directly from the data warehouse or the data can be served to it through an OLAP server; and the data application faces the user directly and comprises data query tools, free report tools, data analysis tools, data mining tools and various application systems.
2. A low-cost data storage method according to claim 1, wherein: in step S1, the hard disk adopted is an ST6000NM0034 NWCCDCD DELL 6TB 3.5 inch 12Gb HDD V4 SAS hard disk.
3. A low-cost data storage method according to claim 1, wherein: the HDFS of step S2 is a distributed file system designed to be suitable for running on the HDD hard disks of step S1, provides high-throughput data access, and relaxes some POSIX requirements so that data in the file system can be accessed in a streaming manner.
4. A low-cost data storage method according to claim 3, wherein: POSIX is the Portable Operating System Interface.
5. A low-cost data storage method according to claim 4, wherein: OLAP is online analytical processing, which lets users examine information from multiple perspectives quickly, consistently and interactively, so as to gain a deep understanding of the data.
6. A low-cost data storage method according to claim 1, wherein: in step S2, the data stored in HDFS are grouped by date.
7. A low-cost data storage method according to claim 1, wherein: the Hive of step S4 allows historical data to be queried through an API and interactive clients.
CN201911103260.0A 2019-11-12 2019-11-12 Low-cost data storage method Active CN111026814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911103260.0A CN111026814B (en) 2019-11-12 2019-11-12 Low-cost data storage method

Publications (2)

Publication Number Publication Date
CN111026814A (en) 2020-04-17
CN111026814B (en) 2024-04-12

Family

ID=70205485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911103260.0A Active CN111026814B (en) 2019-11-12 2019-11-12 Low-cost data storage method

Country Status (1)

Country Link
CN (1) CN111026814B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111996A (en) * 2014-07-07 2014-10-22 山大地纬软件股份有限公司 Health insurance outpatient clinic big data extraction system and method based on hadoop platform
CN108268217A (en) * 2018-01-10 2018-07-10 北京航天云路有限公司 A tiered storage method based on hot and cold classification of time series data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101918806B1 (en) * 2015-06-30 2018-11-14 전자부품연구원 Cache Management Method for Optimizing the Read Performance of Distributed File System

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周逸文. A brief analysis of distributed storage technology and its applications. 数码世界. 2017, (12), full text. *

Also Published As

Publication number Publication date
CN111026814A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
EP3028137B1 (en) Generating a multi-column index for relational databases by interleaving data bits for selectivity
US20180210934A1 (en) Systems and methods for interest-driven business intelligence systems including event-oriented data
Chaudhuri et al. An overview of business intelligence technology
US9842152B2 (en) Transparent discovery of semi-structured data schema
US11347740B2 (en) Managed query execution platform, and methods thereof
US10061834B1 (en) Incremental out-of-place updates for datasets in data stores
US10114846B1 (en) Balanced distribution of sort order values for a multi-column sort order of a relational database
TW201530328A (en) Method and device for constructing NoSQL database index for semi-structured data
CN114860780A (en) Data warehouse, data processing system and computer device
Ranawade et al. Online analytical processing on hadoop using apache kylin
Vishwanath et al. An Association Rule Mining for Materialized View Selection and View Maintanance
CN107341198B (en) Electric power mass data storage and query method based on theme instance
US11520763B2 (en) Automated optimization for in-memory data structures of column store databases
CN111026814B (en) Low-cost data storage method
CN111046013B (en) Cold data full-quantity storage and query architecture
CA2701173A1 (en) System and method for distributing queries to a group of databases and expediting data access
Li et al. A comparative study of row and column storage for time series data
Fong et al. Toward a scale-out data-management middleware for low-latency enterprise computing
Golab Querying sliding windows over online data streams
CN104657370B (en) A kind of associated method and apparatus of realization multi-dimension data cube
Liu et al. The Read Amplification Analysis of NoSQL Database on Top of OSDs: A Case Study of HBase
Baboo et al. Next generation data warehouse design with OLTP and OLAP systems sharing same database
CN107609746B (en) Intelligent bidding method based on data OLAP analysis and matched retrieval
Alam Data migration: relational RDBMS to non-relational NoSQL
KR20160127448A (en) Big data analysis relational database management system using high speed semiconductor storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant