CN111026814B

CN111026814B - Low-cost data storage method

Info

Publication number: CN111026814B
Application number: CN201911103260.0A
Authority: CN
Inventors: 冯报安 (Feng Baoan); 杨晶生 (Yang Jingsheng)
Original assignee: Shanghai Microphone Culture Media Co ltd
Current assignee: Shanghai Microphone Culture Media Co ltd
Priority date: 2019-11-12
Filing date: 2019-11-12
Publication date: 2024-04-12
Anticipated expiration: 2039-11-12
Also published as: CN111026814A

Abstract

The invention discloses a low-cost data storage method in the technical field of data storage, comprising the steps S1: selection of storage hardware; S2: storage of data; S3: backup of data; and S4: data query. The invention uses Hive on top of HDFS to store all data; uses Hive to allow historical data to be queried through an API and interactive clients; groups data by date, which improves the performance of queries over a given time period; and uses inexpensive HDDs with high storage capacity per unit cost as the storage hardware.

Description

Low-cost data storage method

Technical Field

The invention relates to the technical field of data storage, in particular to a low-cost data storage method.

Background

With the continuous development of business over time and the continuous growth of user- and product-related data, TB-level and even PB-level data volumes have become quite common. Traditional relational databases suffer severe performance degradation at such volumes and may even become unusable. Modern distributed column-oriented stores such as HBase can hold data at this scale, but because HBase is designed to provide high-performance random reads and writes for online services, storing all data on expensive SSDs would greatly increase hardware costs. In view of this, the present invention provides a low-cost data storage method to solve the above problems.

Disclosure of Invention

The object of the present invention is to provide a low-cost data storage method that solves the problems of the related art described above.

To achieve the above object, the present invention provides the following technical solution: a low-cost data storage method comprising the following steps:

S1: Selection of storage hardware

HDD hard disks with stable performance are selected to store all user and product data in a distributed arrangement;
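
As a rough illustration of sizing the HDD pool in step S1, the sketch below estimates how many disks are needed for a given volume of logical data. It assumes HDFS's default replication factor of 3 (the `dfs.replication` setting), the 6 TB disk named in the preferred embodiment, and a hypothetical 80% usable-capacity target; the function name and the headroom figure are illustrative and are not part of the patent.

```python
import math

# Illustrative sizing inputs; only the 6 TB disk size comes from the patent text.
HDFS_REPLICATION = 3    # HDFS default for dfs.replication
DISK_CAPACITY_TB = 6.0  # one 6 TB HDD, as in the preferred embodiment
USABLE_FRACTION = 0.8   # hypothetical headroom target (keep ~20% of each disk free)


def disks_needed(logical_tb: float,
                 replication: int = HDFS_REPLICATION,
                 disk_tb: float = DISK_CAPACITY_TB,
                 usable_fraction: float = USABLE_FRACTION) -> int:
    """Estimate the number of HDDs needed to hold `logical_tb` of user/product data."""
    if logical_tb < 0:
        raise ValueError("logical_tb must be non-negative")
    raw_tb = logical_tb * replication            # every block is stored `replication` times
    usable_per_disk = disk_tb * usable_fraction  # capacity we allow ourselves to fill
    return math.ceil(raw_tb / usable_per_disk)


if __name__ == "__main__":
    for tb in (10, 100, 500):
        print(f"{tb} TB logical -> {disks_needed(tb)} disks")
```

For example, 100 TB of logical data becomes 300 TB of raw replicated data, which at 4.8 TB usable per disk rounds up to 63 disks. Multiplying the count by a per-disk price supplied by the reader gives the basis for the HDD versus SSD cost comparison that motivates the method.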

S2: Storage of data

User and product data are stored using the distributed file system HDFS and the data warehouse Hive built on top of HDFS. HDFS is a distributed file system designed to run on hardware such as the HDD hard disks selected in step S1; it provides high-throughput data access and relaxes some POSIX requirements so that data in the file system can be accessed as streams. Data stored in HDFS are grouped by date. Hive comprises a data source, data storage and management, a data service, and a data application;
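
One way to realize the date grouping of step S2 is Hive's partition layout, in which each partition is a subdirectory named `column=value` under the table's HDFS directory. The sketch below builds such paths; it assumes Hive's default warehouse directory `/user/hive/warehouse`, a hypothetical table `user_events`, and a hypothetical daily partition column `dt`, none of which are specified in the patent.

```python
from datetime import date, datetime

# Assumed names for illustration; the patent does not specify them.
WAREHOUSE_DIR = "/user/hive/warehouse"  # Hive's default warehouse location on HDFS
TABLE = "user_events"                   # hypothetical table
PARTITION_COLUMN = "dt"                 # hypothetical daily partition column


def partition_path(day: date, table: str = TABLE) -> str:
    """HDFS directory that holds one day's rows, using Hive's col=value naming."""
    return f"{WAREHOUSE_DIR}/{table}/{PARTITION_COLUMN}={day.isoformat()}"


def route_record(event_time: datetime, table: str = TABLE) -> str:
    """Partition directory for a record, keyed on the calendar date of its timestamp.

    No time-zone conversion is done; the timestamp's own date is used as-is.
    """
    return partition_path(event_time.date(), table)


if __name__ == "__main__":
    print(route_record(datetime(2019, 11, 12, 23, 59)))
```

In Hive the same layout would typically be declared with `PARTITIONED BY (dt STRING)` in the table definition, so that each day's data is written under its own directory.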

S3: Backup of data

HDFS from step S2 is used to back up the stored user and product data; because HDFS keeps each data block as multiple replicas on different nodes, it provides reliable data backup;
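
HDFS achieves the redundancy relied on in step S3 by replicating blocks. The sketch below mimics, in simplified form, HDFS's documented default placement for a replication factor of 3: one replica on the writer's node, one on a node in a different rack, and one on another node in that same remote rack. It assumes the writer is itself a datanode and picks nodes deterministically rather than at random; the real placement policy also weighs node load and free space. The function names are hypothetical.

```python
from typing import Dict, List


def place_replicas(writer: str, node_rack: Dict[str, str]) -> List[str]:
    """Pick three datanodes for one block, after HDFS's default rack-aware policy.

    Simplified sketch: assumes `writer` is a datanode, chooses nodes in sorted
    order instead of at random, and ignores node load and free space.
    """
    first = writer                                   # replica 1: the writer's own node
    local_rack = node_rack[first]
    remote = sorted(n for n, rack in node_rack.items() if rack != local_rack)
    if not remote:
        raise ValueError("need at least two racks")
    second = remote[0]                               # replica 2: a node on another rack
    remote_rack = node_rack[second]
    peers = [n for n in remote if node_rack[n] == remote_rack and n != second]
    if not peers:
        raise ValueError("remote rack needs at least two datanodes")
    third = peers[0]                                 # replica 3: another node on that rack
    return [first, second, third]


def survives_rack_loss(replicas: List[str], node_rack: Dict[str, str], lost_rack: str) -> bool:
    """True if at least one replica lives outside the failed rack."""
    return any(node_rack[n] != lost_rack for n in replicas)
```

Because the three replicas span two racks, the loss of any single rack still leaves at least one copy of every block.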

S4: Data query

Using the capability of the Hive system of step S2 to query historical data, the user and product data stored in HDFS in step S2, together with the copies of those data backed up by HDFS in step S3, are extracted by query; Hive allows historical data to be queried through an API and interactive clients.
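
The performance benefit of grouping by date comes from partition pruning: when a query filters on the partition column, for example `WHERE dt BETWEEN '2019-11-01' AND '2019-11-12'`, Hive reads only the matching partitions. The sketch below reproduces that selection over daily partitions; the column name `dt`, the dates, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def daily_partitions(first: date, last: date) -> List[date]:
    """Every daily partition between `first` and `last`, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def prune(partitions: List[date], start: date, end: date) -> List[date]:
    """Partition pruning: keep only partitions whose date lies in [start, end]."""
    return [d for d in partitions if start <= d <= end]


if __name__ == "__main__":
    year = daily_partitions(date(2019, 1, 1), date(2019, 12, 31))
    hit = prune(year, date(2019, 11, 1), date(2019, 11, 12))
    print(f"scan {len(hit)} of {len(year)} partitions")
```

With a year of daily partitions, a twelve-day query touches 12 of 365 partitions, so only those directories need to be read from the HDDs.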

Preferably, in step S1, an ST6000NM0034 NWCCG Dell 6TB 3.5-inch 12Gb HDD V4 SAS hard disk is used.

Preferably, POSIX refers to the Portable Operating System Interface.

Preferably, the data source is the source of data for the data warehouse and includes external data, existing business systems, and document data. Data storage and management covers the storage and management of data, including the data warehouse, data marts, data warehouse monitoring, operation and maintenance tools, and metadata management. The data service provides data to front-end applications: a front-end application may obtain data directly from the data warehouse, or the data service for front-end applications may be provided through an OLAP server. The data application is the data service that directly faces users and includes data query tools, ad hoc reporting tools, data analysis tools, data mining tools, and various application systems.

Preferably, OLAP refers to online analytical processing, which allows information to be observed quickly, consistently, and interactively from multiple perspectives so as to gain a deep understanding of the data.
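
To make the idea of observing data from multiple perspectives concrete, the sketch below performs a roll-up, a basic OLAP operation that aggregates a measure over whichever dimensions the analyst selects. The field names (`dt`, `product`, `plays`) and the sample rows are hypothetical; an OLAP server would run such aggregations over the warehouse rather than over an in-memory list.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def roll_up(rows: Iterable[Dict[str, Any]], dims: Tuple[str, ...],
            measure: str) -> Dict[Tuple[Any, ...], float]:
    """Sum `measure` grouped by the chosen dimensions (an OLAP roll-up)."""
    totals: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)   # empty dims -> grand total under key ()
        totals[key] += float(row[measure])
    return dict(totals)


# Hypothetical fact rows: plays per product per day.
ROWS = [
    {"dt": "2019-11-11", "product": "A", "plays": 3},
    {"dt": "2019-11-11", "product": "B", "plays": 5},
    {"dt": "2019-11-12", "product": "A", "plays": 2},
]

if __name__ == "__main__":
    print(roll_up(ROWS, ("dt",), "plays"))       # by day
    print(roll_up(ROWS, ("product",), "plays"))  # by product
```

Switching `dims` from `("dt",)` to `("product",)` views the same facts from a different angle without changing the stored data.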

Compared with the prior art, the invention has the following beneficial effects:

1) The invention stores all data in Hive built on HDFS;

2) By using Hive, the invention allows historical data to be queried through an API and interactive clients;

3) The invention groups data by date, which improves the performance of queries over a given time period;

4) The invention uses inexpensive HDDs with high storage capacity per unit cost as the storage hardware.

Of course, a product implementing the invention need not achieve all of the above advantages at the same time.

Drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

Referring to FIG. 1, the present invention provides the following technical solution: a low-cost data storage method comprising the following steps:

S1: Selection of storage hardware

S2: Storage of data

User and product data are stored using the distributed file system HDFS and the data warehouse Hive built on top of HDFS. HDFS is a distributed file system designed to run on hardware such as the HDD hard disks selected in step S1; it provides high-throughput data access and relaxes some POSIX requirements so that data in the file system can be accessed as streams. Data stored in HDFS are grouped by date. Hive comprises a data source, data storage and management, a data service, and a data application;

S3: Backup of data

S4: Data query

In step S1, an ST6000NM0034 NWCCG Dell 6TB 3.5-inch 12Gb HDD V4 SAS hard disk is used.

Here, POSIX refers to the Portable Operating System Interface.

The data source is the source of data for the data warehouse and includes external data, existing business systems, and document data. Data storage and management covers the storage and management of data, including the data warehouse, data warehouse monitoring, operation and maintenance tools, and metadata management. The data service provides data to front-end applications: a front-end application may obtain data directly from the data warehouse, or the data service for front-end applications may be provided through an OLAP server. The data application directly faces users and includes data query tools, ad hoc reporting tools, data analysis tools, data mining tools, and various application systems.

OLAP refers to online analytical processing, which allows information to be observed quickly, consistently, and interactively from multiple perspectives so as to gain a deep understanding of the data.

In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not exhaustive and do not limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles and practical applications of the invention, thereby enabling others skilled in the art to best understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A low-cost data storage method, characterized by comprising:

S1: Selection of storage hardware

S2: Storage of data

User and product data are stored using the distributed file system HDFS and the data warehouse Hive built on top of HDFS;

S3: Backup of data

S4: Data query

Using the capability of the Hive system of step S2 to query historical data, the user and product data stored in HDFS in step S2 and the copies of those data backed up by HDFS in step S3 are extracted by query;

the Hive in steps S2 and S4 comprises a data source, data storage and management, a data service, and a data application, wherein the data source is the source of data for the data warehouse and includes external data, existing business systems, and document data; data storage and management includes the data warehouse, data marts, data warehouse monitoring, operation and maintenance tools, and metadata management; the data service provides data to front-end applications, a front-end application may obtain data directly from the data warehouse, and the data service for front-end applications may also be provided through an OLAP server; and the data application directly faces users and includes data query tools, ad hoc reporting tools, data analysis tools, data mining tools, and various application systems.

2. The low-cost data storage method according to claim 1, wherein in step S1 the hard disk used is an ST6000NM0034 NWCCG Dell 6TB 3.5-inch 12Gb HDD V4 SAS hard disk.

3. The low-cost data storage method according to claim 1, wherein the HDFS in step S2 is a distributed file system designed to run on the HDD hard disks of step S1, provides high-throughput data access, and relaxes POSIX requirements so that data in the file system can be accessed as streams.

4. The low-cost data storage method according to claim 3, wherein POSIX refers to the Portable Operating System Interface.

5. The low-cost data storage method according to claim 4, wherein OLAP refers to online analytical processing, which allows information to be observed quickly, consistently, and interactively from multiple perspectives so as to gain a deep understanding of the data.

6. The low-cost data storage method according to claim 1, wherein the data stored in HDFS in step S2 are grouped by date.

7. The low-cost data storage method according to claim 1, wherein the Hive in step S4 allows historical data to be queried through an API and interactive clients.