CN111026814B - Low-cost data storage method - Google Patents
- Publication number
- CN111026814B
- Authority
- CN
- China
- Prior art keywords
- data
- hdfs
- storage method
- data storage
- low cost
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a low-cost data storage method in the technical field of data storage, comprising the following steps: S1, selection of storage hardware; S2, storage of data; S3, backup of data; S4, data query. The invention uses Hive, built on HDFS, to store all data; Hive allows historical data to be queried through an API and interactive clients; data are grouped by date, which improves the performance of queries over a given time period; and inexpensive HDD disks with a high capacity-to-price ratio are used as the hardware.
Description
Technical Field
The invention relates to the technical field of data storage, in particular to a low-cost data storage method.
Background
As businesses grow over time, the volume of data related to users and products keeps expanding, and TB-level or even PB-level data has become quite common. Traditional relational databases suffer severe performance degradation, or become unusable, when faced with such volumes. A modern distributed columnar store such as HBase can hold data of this size, but HBase is designed to provide high-performance random reads and writes for online services; storing all data on expensive SSDs to support it would greatly increase hardware cost. In view of this, the present invention provides a low-cost data storage method to solve the above problems.
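The cost argument above can be made concrete with a small calculation. The following is only a sketch: the disk prices, the SSD capacity, the 1000 TB data volume and the replication factor of 3 are hypothetical placeholders chosen for illustration, not figures from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskOption:
    """A disk model described by capacity and unit price (both hypothetical)."""
    name: str
    capacity_tb: float
    price_per_disk: float


def disks_needed(data_tb: float, replication: int, disk: DiskOption) -> int:
    """Whole disks required to hold `data_tb` of data kept in `replication` copies."""
    raw_tb = data_tb * replication
    # Ceiling division: a partially filled disk still has to be bought.
    return int(-(-raw_tb // disk.capacity_tb))


def hardware_cost(data_tb: float, replication: int, disk: DiskOption) -> float:
    """Total disk purchase cost for the given data volume."""
    return disks_needed(data_tb, replication, disk) * disk.price_per_disk


# Placeholder prices, for illustration only.
hdd = DiskOption("hdd-6tb", capacity_tb=6.0, price_per_disk=200.0)
ssd = DiskOption("ssd-2tb", capacity_tb=2.0, price_per_disk=300.0)

hdd_cost = hardware_cost(1000.0, 3, hdd)
ssd_cost = hardware_cost(1000.0, 3, ssd)
print(hdd_cost, ssd_cost)  # prints "100000.0 450000.0"
```

With real quotations substituted for the placeholder prices, the same two functions give the hardware budget for any data volume and replication factor.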
Disclosure of Invention
The object of the present invention is to provide a low-cost data storage method that solves the problems described in the Background above.
To achieve the above object, the present invention provides the following technical solution: a low-cost data storage method comprising the following steps.
S1: Selection of storage hardware
HDD hard disks with stable performance are selected and arranged in a distributed manner to store all user and product data;
S2: Storage of data
User and product data are written to storage using the distributed file system HDFS and the data warehouse Hive built on top of it. HDFS is a distributed file system designed to run on the HDD hard disks of step S1; it provides high-throughput data access and relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Data stored in HDFS are grouped by date. Hive comprises data sources, data storage and management, data services, and data applications;
S3: Backup of data
The HDFS of step S2 is used to back up the stored user and product data, providing reliable data backup;
S4: Data query
Using the ability of the Hive system of step S2 to query historical data, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are retrieved; Hive allows historical data to be queried through an API and interactive clients.
Preferably, in step S1, an ST6000NM0034 NWCCG Dell 6 TB 3.5-inch 12 Gb HDD V4 SAS hard disk is used.
Preferably, POSIX is the Portable Operating System Interface.
Preferably, the data sources are the sources that feed the data warehouse and include external data, existing business systems and document data. Data storage and management covers the storage and management of the data and includes the data warehouse, data marts, tools for data-warehouse monitoring, operation and maintenance, and metadata management. The data services provide data to front ends and applications: a front-end application can obtain data directly from the data warehouse, or the data can be served through an OLAP server. The data applications face the user directly and include data query tools, free-form report tools, data analysis tools, data mining tools and various application systems.
Preferably, OLAP is online analytical processing, which lets users view information quickly, consistently and interactively from multiple perspectives in order to gain a deep understanding of the data.
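As an illustration of how the date grouping of step S2 might be declared in Hive, the sketch below builds a HiveQL `CREATE TABLE` statement with a date partition column and computes the directory that Hive would use for one day's partition. The table name `user_events`, its columns, the partition column name `dt` and the ORC file format are assumptions made for the example; the patent does not specify them.

```python
from datetime import date


def create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a HiveQL CREATE TABLE statement partitioned by a `dt` string column."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (dt STRING)\n"  # one partition per calendar day (assumed)
        "STORED AS ORC"                 # file format is an assumption
    )


def partition_dir(warehouse: str, table: str, day: date) -> str:
    """Directory for one partition, following Hive's key=value directory naming."""
    return f"{warehouse.rstrip('/')}/{table}/dt={day.isoformat()}"


ddl = create_table_sql("user_events", {"user_id": "BIGINT", "product_id": "BIGINT"})
path = partition_dir("/user/hive/warehouse", "user_events", date(2019, 11, 12))
print(ddl)
print(path)  # prints "/user/hive/warehouse/user_events/dt=2019-11-12"
```

Because each day lands in its own HDFS directory, a query restricted to a date range only has to read the directories for those dates.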
Compared with the prior art, the invention has the beneficial effects that:
1) The invention uses Hive, built on HDFS, to store all data;
2) By using Hive, the invention allows historical data to be queried through an API and interactive clients;
3) The invention groups data by date, which improves the performance of queries over a given time period;
4) The invention uses inexpensive HDD disks with a high capacity-to-price ratio as the hardware.
Of course, a product practicing the invention need not achieve all of the above advantages at the same time.
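The benefit of grouping by date can be illustrated with a toy model of partition selection. This is a simplified sketch, not Hive's actual planner: it assumes one partition per calendar day and simply counts how many partitions a date-range query has to read.

```python
from datetime import date, timedelta


def days_between(start: date, end: date) -> list[date]:
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def partitions_to_scan(available: list[date], start: date, end: date) -> list[date]:
    """Keep only the date partitions that fall inside the queried period."""
    return [d for d in available if start <= d <= end]


# One partition per day for the whole of 2019.
all_partitions = days_between(date(2019, 1, 1), date(2019, 12, 31))
# A query for the first week of November only needs seven of them.
needed = partitions_to_scan(all_partitions, date(2019, 11, 1), date(2019, 11, 7))
print(len(all_partitions), len(needed))  # prints "365 7"
```

Without date grouping, the same query would have to read every stored record to find the matching week.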
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawing needed for the description of the embodiments is briefly introduced below. The drawing shows only some embodiments of the present invention; a person skilled in the art may derive other drawings from it without inventive effort.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, the present invention provides the following technical solution: a low-cost data storage method comprising the following steps.
S1: Selection of storage hardware
HDD hard disks with stable performance are selected and arranged in a distributed manner to store all user and product data;
S2: Storage of data
User and product data are written to storage using the distributed file system HDFS and the data warehouse Hive built on top of it. HDFS is a distributed file system designed to run on the HDD hard disks of step S1; it provides high-throughput data access and relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Data stored in HDFS are grouped by date. Hive comprises data sources, data storage and management, data services, and data applications;
S3: Backup of data
The HDFS of step S2 is used to back up the stored user and product data, providing reliable data backup;
S4: Data query
Using the ability of the Hive system of step S2 to query historical data, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are retrieved; Hive allows historical data to be queried through an API and interactive clients.
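The backup of step S3 relies on HDFS keeping several copies of each data block on different machines. The sketch below is a deliberately simplified model of that idea: it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The node names, the block count and the replication factor of 3 are assumptions for the example.

```python
def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive indices modulo len(nodes) are distinct while replication <= len(nodes).
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> int:
    """Count blocks that still have at least one replica on a healthy node."""
    return sum(1 for replicas in placement.values() if any(n not in failed for n in replicas))


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(8, nodes)
print(readable_blocks(placement, {"node-a"}))            # prints 8
print(readable_blocks(placement, {"node-a", "node-b"}))  # prints 8
```

With three copies per block, every block in this model survives the loss of any two nodes, which is why inexpensive HDDs can be used without a separate backup system.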
In step S1, an ST6000NM0034 NWCCG Dell 6 TB 3.5-inch 12 Gb HDD V4 SAS hard disk is used.
POSIX is the Portable Operating System Interface.
The data sources are the sources that feed the data warehouse and include external data, existing business systems and document data. Data storage and management covers the storage and management of the data and includes the data warehouse, tools for data-warehouse monitoring, operation and maintenance, and metadata management. The data services provide data to front ends and applications: a front-end application can obtain data directly from the data warehouse, or the data can be served through an OLAP server. The data applications face the user directly and include data query tools, free-form report tools, data analysis tools, data mining tools and various application systems.
OLAP is online analytical processing, which lets users view information quickly, consistently and interactively from multiple perspectives in order to gain a deep understanding of the data.
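The query path of step S4 can be sketched as a date-range query issued through a database API. To keep the example runnable anywhere, an in-memory SQLite database stands in for Hive; this is a stand-in only, and the table `user_events`, its columns and the `?` placeholder style are assumptions (a real Hive client library may use a different connection call and parameter style).

```python
import sqlite3


def query_history(conn, start_day: str, end_day: str) -> list[tuple]:
    """Return rows whose partition date `dt` lies in [start_day, end_day]."""
    cur = conn.cursor()
    cur.execute(
        "SELECT dt, user_id, product_id FROM user_events "
        "WHERE dt BETWEEN ? AND ? ORDER BY dt, user_id",
        (start_day, end_day),
    )
    return cur.fetchall()


# In-memory SQLite stands in for a Hive connection (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (dt TEXT, user_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?)",
    [("2019-11-10", 1, 100), ("2019-11-11", 2, 101), ("2019-11-12", 3, 102)],
)
rows = query_history(conn, "2019-11-11", "2019-11-12")
print(rows)  # prints [('2019-11-11', 2, 101), ('2019-11-12', 3, 102)]
```

Filtering on the date column is what lets a date-grouped store answer such a query from a small part of the data; ISO-formatted date strings compare correctly as text, which is why `dt` is kept as a string here.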
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not exhaustive and do not limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (7)
1. A low-cost data storage method, characterized by comprising:
S1: Selection of storage hardware
HDD hard disks with stable performance are selected and arranged in a distributed manner to store all user and product data;
S2: Storage of data
User and product data are written to storage using the distributed file system HDFS and the data warehouse Hive built on the distributed file system HDFS;
S3: Backup of data
The HDFS of step S2 is used to back up the stored user and product data, providing reliable data backup;
S4: Data query
Using the ability of the Hive system of step S2 to query historical data, the user and product data stored in HDFS in step S2 and the backup of that data made by HDFS in step S3 are retrieved;
wherein the Hive of steps S2 and S4 comprises data sources, data storage and management, data services and data applications; the data sources are the sources that feed the data warehouse and include external data, existing business systems and document data; the data storage and management includes the data warehouse, data marts, tools for data-warehouse monitoring, operation and maintenance, and metadata management; the data services provide data to front ends and applications, such that a front-end application can obtain data directly from the data warehouse or the data can be served through an OLAP server; and the data applications face the user directly and include data query tools, free-form report tools, data analysis tools, data mining tools and various application systems.
2. The low-cost data storage method according to claim 1, wherein: in step S1, the hard disk used is an ST6000NM0034 NWCCG Dell 6 TB 3.5-inch 12 Gb HDD V4 SAS hard disk.
3. The low-cost data storage method according to claim 1, wherein: the HDFS of step S2 is a distributed file system designed to run on the HDD hard disks of step S1, provides high-throughput data access, and relaxes some POSIX requirements so that data in the file system can be accessed as a stream.
4. The low-cost data storage method according to claim 3, wherein: POSIX is the Portable Operating System Interface.
5. The low-cost data storage method according to claim 4, wherein: OLAP is online analytical processing, which lets users view information quickly, consistently and interactively from multiple perspectives in order to gain a deep understanding of the data.
6. The low-cost data storage method according to claim 1, wherein: the data stored in HDFS in step S2 are grouped by date.
7. The low-cost data storage method according to claim 1, wherein: the Hive of step S4 allows historical data to be queried through an API and interactive clients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911103260.0A CN111026814B (en) | 2019-11-12 | 2019-11-12 | Low-cost data storage method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911103260.0A CN111026814B (en) | 2019-11-12 | 2019-11-12 | Low-cost data storage method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111026814A (en) | 2020-04-17 |
CN111026814B (en) | 2024-04-12 |
Family
ID=70205485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911103260.0A Active CN111026814B (en) | 2019-11-12 | 2019-11-12 | Low-cost data storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111026814B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104111996A (en) * | 2014-07-07 | 2014-10-22 | 山大地纬软件股份有限公司 | Health insurance outpatient clinic big data extraction system and method based on hadoop platform |
CN108268217A (en) * | 2018-01-10 | 2018-07-10 | 北京航天云路有限公司 | Tiered storage method based on hot and cold classification of time-series data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101918806B1 (en) * | 2015-06-30 | 2018-11-14 | 전자부품연구원 | Cache Management Method for Optimizing the Read Performance of Distributed File System |
Non-Patent Citations (1)
Title |
---|
Zhou Yiwen. A brief analysis of distributed storage technology and its applications. 数码世界 (Digital World). 2017, (12), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN111026814A (en) | 2020-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3028137B1 (en) | Generating a multi-column index for relational databases by interleaving data bits for selectivity | |
US20180210934A1 (en) | Systems and methods for interest-driven business intelligence systems including event-oriented data | |
Chaudhuri et al. | An overview of business intelligence technology | |
US9842152B2 (en) | Transparent discovery of semi-structured data schema | |
US11347740B2 (en) | Managed query execution platform, and methods thereof | |
US10061834B1 (en) | Incremental out-of-place updates for datasets in data stores | |
US10114846B1 (en) | Balanced distribution of sort order values for a multi-column sort order of a relational database | |
TW201530328A (en) | Method and device for constructing NoSQL database index for semi-structured data | |
CN114860780A (en) | Data warehouse, data processing system and computer device | |
Ranawade et al. | Online analytical processing on hadoop using apache kylin | |
Vishwanath et al. | An Association Rule Mining for Materialized View Selection and View Maintanance | |
CN107341198B (en) | Electric power mass data storage and query method based on theme instance | |
US11520763B2 (en) | Automated optimization for in-memory data structures of column store databases | |
CN111026814B (en) | Low-cost data storage method | |
CN111046013B (en) | Cold data full-quantity storage and query architecture | |
CA2701173A1 (en) | System and method for distributing queries to a group of databases and expediting data access | |
Li et al. | A comparative study of row and column storage for time series data | |
Fong et al. | Toward a scale-out data-management middleware for low-latency enterprise computing | |
Golab | Querying sliding windows over online data streams | |
CN104657370B (en) | A kind of associated method and apparatus of realization multi-dimension data cube | |
Liu et al. | The Read Amplification Analysis of NoSQL Database on Top of OSDs: A Case Study of HBase | |
Baboo et al. | Next generation data warehouse design with OLTP and OLAP systems sharing same database | |
CN107609746B (en) | Intelligent bidding method based on data OLAP analysis and matched retrieval | |
Alam | Data migration: relational RDBMS to non-relational NoSQL | |
KR20160127448A (en) | Big data analysis relational database management system using high speed semiconductor storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||