CN111046099A

CN111046099A - Thermal data high-performance storage framework

Info

Publication number: CN111046099A
Application number: CN201911102682.6A
Authority: CN
Inventors: 冯报安; 杨晶生
Original assignee: Shanghai Microphone Culture Media Co ltd
Current assignee: Shanghai Microphone Culture Media Co ltd
Priority date: 2019-11-12
Filing date: 2019-11-12
Publication date: 2020-04-21

Abstract

The invention belongs to the technical field of big data storage and relates in particular to a high-performance storage architecture for thermal (that is, hot, frequently accessed) data. The architecture comprises the open-source processing platform Kafka and the open-source database HBase. The thermal data comprises hot-spot service data from the last 7 days, and the storage process of the thermal data comprises the following steps: listening for real-time log data from the open-source message queue Kafka; storing the log data in HBase in real time and setting an automatic expiration time; providing high-performance random read-write operations externally through HBase; and, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database. The invention screens and extracts data so that only recent, short-lived data is stored, as the service requires, and stores it in a distributed column-oriented database. By giving up the strong consistency and transactions provided by traditional databases, it greatly improves random read-write performance under a very large user base.

Description

Thermal data high-performance storage framework

Technical Field

The invention relates to the technical field of big data storage, in particular to a high-performance thermal data storage framework.

Background

Online services are very sensitive to response latency: any lengthy query or operation can severely degrade the user experience and cause user loss. As the business expands, the data volume keeps growing, and traditional relational databases struggle to meet the increasing demand, so a more modern storage model is required.

In addition, online services in most cases only need to access data from the last few days, so the required storage space can be kept under control even when the user base reaches hundreds of millions. Consequently, when selecting hardware, more expensive SSDs with excellent performance can be chosen to improve performance further.
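A back-of-envelope estimate illustrates why a short retention window keeps storage bounded. Every figure below (user count, records per user per day, record size, replication factor) is a hypothetical assumption chosen for illustration, not a value from the source.

```python
def hot_storage_bytes(users, records_per_user_per_day, bytes_per_record,
                      retention_days=7, replication=3):
    """Estimate the raw bytes needed to hold only the retention window."""
    return (users * records_per_user_per_day * bytes_per_record
            * retention_days * replication)


# Hypothetical workload: 200 million users, 20 records per user per day,
# 200 bytes per record, 3 replicas.
seven_days = hot_storage_bytes(200_000_000, 20, 200)
one_year = hot_storage_bytes(200_000_000, 20, 200, retention_days=365)

print(f"7-day window:   {seven_days / 1e12:.1f} TB")
print(f"365-day window: {one_year / 1e12:.1f} TB")
```

Under these assumed numbers, the retention window is the only factor that changes between the two estimates, so the storage requirement scales linearly with the number of days kept.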

To this end, we propose a thermal data high-performance storage architecture to solve the above problems.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art and provides a thermal data high-performance storage architecture.

To achieve this purpose, the invention adopts the following technical solution:

A high-performance storage architecture for thermal data comprises the open-source processing platform Kafka and the open-source database HBase. The thermal data comprises hot-spot service data from the last 7 days, and the storage process of the thermal data comprises the following steps:

S1, listening for real-time log data from the open-source message queue Kafka;

S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);

S3, providing high-performance random read-write operations externally through HBase;

S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.
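The four steps can be sketched end to end with in-memory stand-ins. This is a minimal illustration under stated assumptions, not the patented implementation: `HotStore`, `ingest`, and `archive_previous_day` are hypothetical names, a plain list of tuples stands in for the Kafka topic, a dictionary stands in for the low-cost archive, and expired rows are simply hidden on read.

```python
from collections import Counter
from datetime import datetime, timedelta

TTL = timedelta(days=7)  # retention window from step S2


class HotStore:
    """In-memory stand-in for the HBase table (hypothetical, illustration only)."""

    def __init__(self):
        self._rows = {}  # row key -> (value, write time)

    def put(self, key, value, now):
        # S2: store the record together with its write time.
        self._rows[key] = (value, now)

    def get(self, key, now):
        # S3: random read; rows older than the TTL are treated as expired.
        entry = self._rows.get(key)
        if entry is None or now - entry[1] >= TTL:
            return None
        return entry[0]

    def rows_written_on(self, day):
        return [key for key, (_, ts) in self._rows.items() if ts.date() == day]


def ingest(messages, store):
    """S1 + S2: consume (key, value, timestamp) log messages into the store."""
    for key, value, ts in messages:
        store.put(key, value, ts)


def archive_previous_day(store, archive, today):
    """S4: aggregate yesterday's rows (here, a count per user) into the archive."""
    yesterday = today - timedelta(days=1)
    per_user = Counter(key.split(":")[0] for key in store.rows_written_on(yesterday))
    archive[yesterday] = dict(per_user)


store, archive = HotStore(), {}
t0 = datetime(2019, 11, 11, 9, 0)
ingest([("u1:a", "play", t0), ("u1:b", "pause", t0), ("u2:a", "play", t0)], store)
archive_previous_day(store, archive, today=t0.date() + timedelta(days=1))
```

The row-key layout `user:event` and the per-user count are assumptions made only so that the aggregation step has something to compute.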

In the above thermal data high-performance storage architecture, Kafka in step S1 is installed as follows:

A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;

A2, edit the file "server.

A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
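The edit in step A3 can also be scripted. The sketch below assumes the configuration file uses the Java properties `key=value` layout; `set_property` is a hypothetical helper and the sample text is illustrative, not a copy of any shipped file.

```python
def set_property(text, key, value):
    """Return properties-style text with `key` set to `value`.

    Replaces the first uncommented `key=...` line, or appends one if absent.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Backslashes are doubled in the value because a single backslash is an
# escape character in Java properties files.
sample = "# A comma separated list of directories\nlog.dirs=/tmp/kafka-logs\n"
updated = set_property(sample, "log.dirs", r"D:\\dev\\kafka_2.12-1.0.1\\tmp")
```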

In the above thermal data high-performance storage architecture, listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:

B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;

B2, rename "zoo_sample.cfg" to "zoo.cfg";

B3, open zoo.cfg in any text editor (e.g. Notepad);

B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;

B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;

B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;

B7, hold Shift and right-click, select the "Open command window here" option, and open a command line;

B8, now enter .\bin\windows\kafka-server-start.
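Steps B2 to B4 (rename the sample file, then point `dataDir` at the chosen directory) can be sketched as a small script. `prepare_zoo_cfg` is a hypothetical helper, and the demonstration runs in a throwaway temporary directory with made-up sample content rather than against a real ZooKeeper installation.

```python
import tempfile
from pathlib import Path


def prepare_zoo_cfg(conf_dir, data_dir):
    """Rename zoo_sample.cfg to zoo.cfg (B2) and set dataDir to data_dir (B4)."""
    conf_dir = Path(conf_dir)
    cfg = conf_dir / "zoo.cfg"
    (conf_dir / "zoo_sample.cfg").rename(cfg)
    lines = [
        f"dataDir={data_dir}" if line.startswith("dataDir=") else line
        for line in cfg.read_text().splitlines()
    ]
    cfg.write_text("\n".join(lines) + "\n")
    return cfg


# Demonstration with illustrative file content in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "zoo_sample.cfg"
    sample.write_text("tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n")
    cfg_path = prepare_zoo_cfg(tmp, r"D:\\dev\\zookeeper-3.4.10\\temp")
    result = cfg_path.read_text()
    sample_still_exists = sample.exists()
```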

In the above thermal data high-performance storage architecture, step S2 uses SSDs as the hardware for storing the service data, which provides good performance.

In the above thermal data high-performance storage architecture, the Get method provided by HBase in step S3 supports fetching data in batches, which is done by assembling a List<Get> gets.
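The batch-read pattern can be illustrated with a small in-memory model. This is a sketch, not the HBase client API: `Get` and `InMemoryTable` are hypothetical stand-ins that only mirror the idea of assembling a list of single-row requests and submitting them in one call (in the HBase Java client, the corresponding call takes a `List<Get>` and returns one result per request).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Get:
    """A single-row read request, identified by its row key."""
    row: str


class InMemoryTable:
    """Hypothetical stand-in for an HBase table; not the HBase client API."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get(self, gets):
        """Serve a list of Get requests in one call, preserving request order.

        Rows that do not exist yield None in the corresponding position.
        """
        return [self._rows.get(g.row) for g in gets]


table = InMemoryTable({"u1": {"cf:last": "play"}, "u2": {"cf:last": "pause"}})

# Assemble the list of requests first, then issue a single batched read.
gets = [Get("u2"), Get("missing"), Get("u1")]
results = table.get(gets)
```

Batching matters mainly because it replaces many round trips with one; the in-memory model cannot show that saving, only the calling shape.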

In the above thermal data high-performance storage architecture, the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud hosts.
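A daily archive job of this kind might look like the sketch below. It is an assumption-laden illustration: Python's built-in `sqlite3` stands in for MySQL so the example is self-contained, the table name `daily_archive`, its columns, and the per-user event count are all hypothetical, and `INSERT OR REPLACE` is SQLite syntax (a MySQL version would use its own upsert form).

```python
import sqlite3
from datetime import date, timedelta


def archive_previous_day(hot_rows, conn, today):
    """Aggregate yesterday's events per user and write them to the archive.

    hot_rows: iterable of (user_id, event_date) tuples read from the hot store.
    Returns the number of users archived.
    """
    yesterday = today - timedelta(days=1)
    counts = {}
    for user_id, event_date in hot_rows:
        if event_date == yesterday:
            counts[user_id] = counts.get(user_id, 0) + 1
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_archive ("
        "day TEXT, user_id TEXT, events INTEGER, PRIMARY KEY (day, user_id))"
    )
    # INSERT OR REPLACE keeps the job idempotent if it is re-run for the same day.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_archive VALUES (?, ?, ?)",
        [(yesterday.isoformat(), u, n) for u, n in sorted(counts.items())],
    )
    conn.commit()
    return len(counts)


conn = sqlite3.connect(":memory:")
rows = [("u1", date(2019, 11, 11)), ("u1", date(2019, 11, 11)),
        ("u2", date(2019, 11, 11)), ("u3", date(2019, 11, 12))]
archived = archive_previous_day(rows, conn, today=date(2019, 11, 12))
stored = conn.execute(
    "SELECT user_id, events FROM daily_archive ORDER BY user_id").fetchall()
```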

Compared with the prior art, the thermal data high-performance storage architecture has the following advantages:

Data is screened and extracted so that only recent, short-lived data is stored, as the service requires. The data is stored in a distributed column-oriented database, and giving up the strong consistency and transactions provided by traditional databases greatly improves random read-write performance under a very large user base.

Because overall storage usage is reduced, expensive SSDs with excellent performance become affordable, so cost stays well controlled while read-write latency is reduced further. For hot-spot data that requires random reads and writes, HBase on SSDs provides an average random-read latency of 20 ms.

Drawings

FIG. 1 is a diagram of the method steps of the thermal data high-performance storage architecture according to the present invention;

FIG. 2 is a data introduction diagram of the thermal data high-performance storage architecture according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.

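Referring to FIGS. 1-2, a high-performance storage architecture for thermal data comprises the open-source processing platform Kafka and the open-source database HBase. The thermal data comprises hot-spot service data from the last 7 days, and the storage process of the thermal data comprises the following steps:

S1, listening for real-time log data from the open-source message queue Kafka;

S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);

S3, providing high-performance random read-write operations externally through HBase;

S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.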

Specifically, Kafka in step S1 is installed as follows:

A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;

A2, edit the file "server.

A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.

More specifically, listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:

B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;

B2, rename "zoo_sample.cfg" to "zoo.cfg";

B3, open zoo.cfg in any text editor (e.g. Notepad);

B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;

B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;

B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;

B7, hold Shift and right-click, select the "Open command window here" option, and open a command line;

B8, now enter .\bin\windows\kafka-server-start.

Either of the following two approaches can be used when listening:

Method 1:

Method 2:

In step S2, SSDs are used as the hardware for storing the service data, which provides good performance; because only a small amount of hot-spot data is kept, hardware cost is reduced.

The Get method provided by HBase in step S3 supports fetching data in batches, which is done by assembling a List<Get> gets.

The low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud hosts.

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

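1. A thermal data high-performance storage architecture, characterized by comprising the open-source processing platform Kafka and the open-source database HBase, wherein the thermal data comprises hot-spot service data from the last 7 days, and the storage process of the thermal data comprises the following steps:

S1, listening for real-time log data from the open-source message queue Kafka;

S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);

S3, providing high-performance random read-write operations externally through HBase;

S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.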

2. The thermal data high-performance storage architecture according to claim 1, wherein Kafka in step S1 is installed as follows:

A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;

A2, edit the file "server.

A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.

3. The thermal data high-performance storage architecture according to claim 2, wherein listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:

B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;

B2, rename "zoo_sample.cfg" to "zoo.cfg";

B3, open zoo.cfg in any text editor (e.g. Notepad);

B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;

B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;

B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;

B7, hold Shift and right-click, select the "Open command window here" option, and open a command line;

B8, now enter .\bin\windows\kafka-server-start.

4. The thermal data high-performance storage architecture according to claim 1, wherein step S2 uses SSDs as the hardware for storing the service data, which provides good performance.

5. The thermal data high-performance storage architecture according to claim 1, wherein the Get method provided by HBase in step S3 supports fetching data in batches, which is done by assembling a List<Get> gets.

6. The thermal data high-performance storage architecture according to claim 1, wherein the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud hosts.