CN111046099A - Thermal data high-performance storage framework - Google Patents
- Publication number
- CN111046099A
- Authority
- CN
- China
- Prior art keywords
- data
- kafka
- performance
- hbase
- thermal data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of big data storage and relates in particular to a high-performance storage architecture for hot data, built on the open-source processing platform Kafka and the open-source database HBase. Hot data here means hot-spot business data from the last 7 days. The storage process comprises the following steps: listening for real-time log data from the open-source message queue Kafka; storing the log data in HBase in real time and setting an automatic expiration time; having HBase provide high-performance random read and write operations to external callers; and, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database. The invention filters and extracts data so that only recent data, over a short window chosen according to business needs, is stored. It keeps that data in a distributed column-oriented database and, by giving up the strong consistency and transactions offered by traditional databases, greatly improves random read/write performance under a very large user base.
Description
Technical Field
The invention relates to the technical field of big data storage, and in particular to a high-performance storage architecture for hot data.
Background
Online services are very sensitive to response latency: any lengthy query or operation severely degrades the user experience and drives users away. As businesses expand, data volumes keep growing, and traditional relational databases struggle to meet the increasing demand, so a more modern storage model is needed.
In addition, online services in most cases only need to access data from the last few days, so the required storage space stays manageable even when the user base reaches hundreds of millions. When selecting hardware, it is therefore feasible to choose SSDs, which are expensive but perform very well, to improve performance further.
To this end, we propose a high-performance storage architecture for hot data to solve the above problems.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a high-performance storage architecture for hot data.
To achieve this purpose, the invention adopts the following technical solution:
A high-performance storage architecture for hot data comprises the open-source processing platform Kafka and the open-source database HBase. The hot data comprises hot-spot business data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1, listening for real-time log data from the open-source message queue Kafka;
S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);
S3, having HBase provide high-performance random read and write operations to external callers;
S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.
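The automatic expiration in step S2 can be pictured with a small sketch. The Python below is not HBase code: `ExpiringStore`, its injectable clock, and the sample row key are hypothetical names introduced only for illustration, assuming a store that hides a row once it is older than a fixed time-to-live (7 days here, as in S2). In an HBase deployment the expiry would normally come from the column family's own time-to-live setting rather than from application code.

```python
import time


class ExpiringStore:
    """In-memory stand-in for a table whose rows expire after a fixed TTL."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so expiry can be exercised
        self._rows = {}              # row key -> (write timestamp, value)

    def put(self, key, value):
        self._rows[key] = (self._clock(), value)

    def get(self, key):
        entry = self._rows.get(key)
        if entry is None:
            return None
        written_at, value = entry
        if self._clock() - written_at >= self.ttl_seconds:
            del self._rows[key]      # lazily drop the expired row
            return None
        return value


SEVEN_DAYS = 7 * 24 * 60 * 60        # TTL in seconds

now = [0.0]                          # fake clock, advanced by hand below
store = ExpiringStore(SEVEN_DAYS, clock=lambda: now[0])
store.put("user42|login", "2019-11-12T08:00:00")
fresh = store.get("user42|login")    # still inside the 7-day window
now[0] = SEVEN_DAYS + 1
stale = store.get("user42|login")    # past the window, treated as gone
```

The fake clock is only there so the 7-day boundary can be demonstrated without waiting; with the default `time.time` the class behaves the same way in real time.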
In the above hot-data high-performance storage architecture, installing Kafka for step S1 comprises the following steps:
A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2, edit the file "server.
A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
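Step A3 amounts to changing one key in a Java-style properties file. The sketch below shows one way to do that in Python; `set_property` is a hypothetical helper, and the three-line sample text is made up for the demonstration rather than taken from the patent. Only the `log.dirs` value comes from step A3.

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value` (appended if absent)."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue                              # skip comments and blanks
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# The backslashes are doubled because a single backslash is an escape
# character in Java .properties files.
LOG_DIR = r"D:\\dev\\kafka_2.12-1.0.1\\tmp"

# Made-up sample content, for illustration only.
sample = (
    "broker.id=0\n"
    "# A comma separated list of directories\n"
    "log.dirs=/tmp/kafka-logs\n"
)
updated = set_property(sample, "log.dirs", LOG_DIR)
```

Comment lines and unrelated keys pass through unchanged, so the helper can be applied to an existing file without disturbing the rest of the configuration.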
In the above hot-data high-performance storage architecture, listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:
B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2, rename "zoo_sample.cfg" to "zoo.cfg";
B3, open zoo.cfg in any text editor (e.g. Notepad);
B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7, hold Shift and right-click, select "Open command window here", and open a command line;
B8, now enter .\bin\windows\kafka-server-start.
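Steps B2 to B4 (rename the sample file, then point `dataDir` at a local directory) can be scripted instead of done by hand. The following is a minimal sketch under stated assumptions: `prepare_zoo_cfg` is a hypothetical helper, and it is exercised here against a throwaway directory holding a made-up three-line sample file rather than a real ZooKeeper installation.

```python
import tempfile
from pathlib import Path


def prepare_zoo_cfg(conf_dir, data_dir):
    """Rename zoo_sample.cfg to zoo.cfg and point dataDir at data_dir."""
    conf_dir = Path(conf_dir)
    cfg = conf_dir / "zoo.cfg"
    (conf_dir / "zoo_sample.cfg").rename(cfg)        # step B2
    lines = []
    for line in cfg.read_text().splitlines():
        if line.strip().startswith("dataDir="):      # step B4
            line = f"dataDir={data_dir}"
        lines.append(line)
    cfg.write_text("\n".join(lines) + "\n")
    return cfg


# Demonstration in a temporary directory; the sample content is invented.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "zoo_sample.cfg").write_text(
        "tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
    )
    cfg_path = prepare_zoo_cfg(tmp, r"D:\\dev\\zookeeper-3.4.10\\temp")
    result = cfg_path.read_text()
    sample_still_exists = (Path(tmp) / "zoo_sample.cfg").exists()
```

Starting ZooKeeper and Kafka (steps B5 to B8) is left to the command line as described in the steps, since it depends on the local installation.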
In the above hot-data high-performance storage architecture, step S2 uses SSDs as the hardware that stores the business data and delivers good performance.
In the above hot-data high-performance storage architecture, the Get method provided by HBase in step S3 also supports fetching data in batches, which is done by assembling a List<Get> gets.
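The idea behind a batched get is that many row keys travel in one request instead of one request per key. The sketch below illustrates that idea only; it is not the HBase client. `InMemoryTable`, `get_many`, and the `round_trips` counter are hypothetical names, and the Python list `gets` merely plays the role that the assembled `List<Get> gets` plays in the Java client.

```python
class InMemoryTable:
    """Toy key-value table; a stand-in, not the HBase client."""

    def __init__(self):
        self._rows = {}
        self.round_trips = 0          # counts simulated requests

    def put(self, row_key, value):
        self._rows[row_key] = value

    def get(self, row_key):
        """Fetch a single row; costs one simulated request."""
        self.round_trips += 1
        return self._rows.get(row_key)

    def get_many(self, row_keys):
        """Fetch several rows in one simulated request, in input order."""
        self.round_trips += 1
        return [self._rows.get(key) for key in row_keys]


table = InMemoryTable()
for i in range(5):
    table.put(f"user{i}", f"profile-{i}")

gets = ["user1", "user3", "missing"]  # plays the role of List<Get> gets
results = table.get_many(gets)        # one request for all three keys
```

Results come back in the same order as the requested keys, with `None` marking a key that has no row, so callers can pair each result with its key by position.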
In the above hot-data high-performance storage architecture, the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud servers.
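The daily job in step S4 (aggregate the previous day's data, then archive it) can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: Python's built-in `sqlite3` stands in for the MySQL archive so the example runs without a server, and the event tuple layout, the `daily_archive` table, and its column names are all hypothetical.

```python
import sqlite3
from collections import Counter
from datetime import date, timedelta


def aggregate_day(events, day):
    """Count events per (user, event name) for a single calendar day."""
    counts = Counter()
    for event_day, user_id, event_name in events:
        if event_day == day:
            counts[(user_id, event_name)] += 1
    return counts


def archive(conn, day, counts):
    """Write one day's aggregates; rerunning the job overwrites that day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_archive ("
        "day TEXT, user_id TEXT, event_name TEXT, n INTEGER, "
        "PRIMARY KEY (day, user_id, event_name))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_archive VALUES (?, ?, ?, ?)",
        [(day.isoformat(), u, e, n) for (u, e), n in counts.items()],
    )
    conn.commit()


today = date(2019, 11, 12)
yesterday = today - timedelta(days=1)
events = [                               # made-up sample events
    (yesterday, "u1", "click"),
    (yesterday, "u1", "click"),
    (yesterday, "u2", "view"),
    (today, "u1", "click"),              # left for the next day's run
]

conn = sqlite3.connect(":memory:")       # stand-in for the archive database
archive(conn, yesterday, aggregate_day(events, yesterday))
rows = conn.execute(
    "SELECT user_id, event_name, n FROM daily_archive ORDER BY user_id"
).fetchall()
```

Keying the archive table on day, user, and event name makes the job safe to rerun for the same day, which is a design choice of this sketch rather than something the source specifies.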
Compared with the prior art, this hot-data high-performance storage architecture has the following advantages:
Data is filtered and extracted so that only recent data, over a short window chosen according to business needs, is stored. The data is kept in a distributed column-oriented database, and giving up the strong consistency and transactions offered by traditional databases greatly improves random read/write performance under a very large user base;
At the same time, the overall storage footprint shrinks, which makes it affordable to use SSDs that perform very well but are expensive. Cost is thus kept under control while read/write latency drops further. For hot-spot data that needs random reads and writes, HBase on SSDs provides random reads that take 20 ms on average.
Drawings
FIG. 1 is a diagram of the method steps of the hot-data high-performance storage architecture according to the present invention;
FIG. 2 is a data introduction diagram of the hot-data high-performance storage architecture according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention.
Referring to FIGS. 1-2, a high-performance storage architecture for hot data comprises the open-source processing platform Kafka and the open-source database HBase. The hot data comprises hot-spot business data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1, listening for real-time log data from the open-source message queue Kafka;
S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);
S3, having HBase provide high-performance random read and write operations to external callers;
S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.
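Steps S1 and S2 together form an ingest loop: take each log message off the queue and write it into a keyed table. The sketch below shows that flow with standard-library stand-ins only. `queue.Queue` plays the Kafka topic and a plain dict plays the HBase table; the "timestamp user action" log format and the `user|timestamp` row key are assumptions made for the example, not details given in the patent.

```python
import queue


def parse_log_line(line):
    """Split a 'timestamp user action' log line into a (row key, value) pair."""
    timestamp, user, action = line.split(" ", 2)
    return f"{user}|{timestamp}", action


def drain(messages, table):
    """Move every queued message into the table; return how many were stored."""
    stored = 0
    while True:
        try:
            line = messages.get_nowait()
        except queue.Empty:
            return stored
        key, value = parse_log_line(line)
        table[key] = value
        stored += 1


messages = queue.Queue()              # stand-in for a Kafka topic
for line in ("1573545600 u1 login", "1573545605 u2 view item-9"):
    messages.put(line)

hot_table = {}                        # stand-in for the HBase table
count = drain(messages, hot_table)
```

A real consumer would block waiting for new messages instead of returning when the queue is empty; the sketch stops early so that it terminates.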
Specifically, installing Kafka for step S1 comprises the following steps:
A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2, edit the file "server.
A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
More specifically, listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:
B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2, rename "zoo_sample.cfg" to "zoo.cfg";
B3, open zoo.cfg in any text editor (e.g. Notepad);
B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7, hold Shift and right-click, select "Open command window here", and open a command line;
B8, now enter .\bin\windows\kafka-server-start.
Either of the following two approaches may be used when listening:
Approach 1:
Approach 2:
In step S2, SSDs serve as the hardware that stores the business data and delivers good performance. Because only a small amount of hot-spot data is kept, the hardware cost is reduced.
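A back-of-the-envelope estimate shows why a short retention window keeps the SSD footprint small. Every input below is a made-up figure chosen for illustration (user count, records per user per day, record size, replication factor); none of them comes from the patent. Only the 7-day retention matches the text.

```python
def hot_storage_bytes(users, records_per_user_per_day, bytes_per_record,
                      retention_days, replication=1):
    """Raw bytes needed to hold `retention_days` of data, with replication."""
    return (users * records_per_user_per_day * bytes_per_record
            * retention_days * replication)


# Hypothetical workload: 100 million users, 20 records per user per day,
# 200 bytes per record, 7 days retained, 3 copies of each record.
estimate = hot_storage_bytes(100_000_000, 20, 200, 7, replication=3)
tib = estimate / 2**40                # convert bytes to tebibytes
```

Under these assumed inputs the total is 8.4 trillion bytes, a little under 8 TiB. The figure scales linearly with the retention window, so keeping a full year instead of 7 days would need roughly 52 times as much space.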
The Get method provided by HBase in step S3 also supports fetching data in batches, which is done by assembling a List<Get> gets.
The low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud servers.
The above describes only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification that a person skilled in the art makes, within the technical scope disclosed by the invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the invention.
Claims (6)
1. A high-performance storage architecture for hot data, characterized by comprising the open-source processing platform Kafka and the open-source database HBase, wherein the hot data comprises hot-spot business data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1, listening for real-time log data from the open-source message queue Kafka;
S2, storing the log data in HBase in real time and setting an automatic expiration time (for example, 7 days);
S3, having HBase provide high-performance random read and write operations to external callers;
S4, on a daily schedule, aggregating the previous day's data and synchronously archiving it to another lower-performance, lower-cost database.
2. The hot-data high-performance storage architecture according to claim 1, wherein installing Kafka for step S1 comprises the following steps:
A1, enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2, edit the file "server.
A3, find and edit log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
3. The hot-data high-performance storage architecture according to claim 2, wherein listening to the open-source processing platform Kafka in step S1 comprises the following operation steps:
B1, install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2, rename "zoo_sample.cfg" to "zoo.cfg";
B3, open zoo.cfg in any text editor (e.g. Notepad);
B4, find and edit dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5, run ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6, enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7, hold Shift and right-click, select "Open command window here", and open a command line;
B8, now enter .\bin\windows\kafka-server-start.
4. The hot-data high-performance storage architecture according to claim 1, wherein step S2 uses SSDs as the hardware that stores the business data and delivers good performance.
5. The hot-data high-performance storage architecture according to claim 1, wherein the Get method provided by HBase in step S3 also supports fetching data in batches, which is done by assembling a List<Get> gets.
6. The hot-data high-performance storage architecture according to claim 1, wherein the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with self-hosting on cloud servers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911102682.6A CN111046099A (en) | 2019-11-12 | 2019-11-12 | Thermal data high-performance storage framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911102682.6A CN111046099A (en) | 2019-11-12 | 2019-11-12 | Thermal data high-performance storage framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111046099A (en) | 2020-04-21 |
Family
ID=70231828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911102682.6A Pending CN111046099A (en) | 2019-11-12 | 2019-11-12 | Thermal data high-performance storage framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111046099A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112051968A (en) * | 2020-08-07 | 2020-12-08 | 东北大学 | Kafka-based distributed data stream hierarchical cache automatic migration algorithm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709003A (en) * | 2016-12-23 | 2017-05-24 | 长沙理工大学 | Hadoop-based mass log data processing method |
CN107169083A (en) * | 2017-05-11 | 2017-09-15 | 聚龙融创科技有限公司 | Public security bayonet socket magnanimity vehicle data storage and retrieval method and device, electronic equipment |
CN109542733A (en) * | 2018-12-05 | 2019-03-29 | 焦点科技股份有限公司 | A kind of highly reliable real-time logs collection and visual modeling technique method |
CN110134723A (en) * | 2019-05-22 | 2019-08-16 | 网易(杭州)网络有限公司 | A kind of method and database of storing data |
CN110147398A (en) * | 2019-04-25 | 2019-08-20 | 北京字节跳动网络技术有限公司 | A kind of data processing method, device, medium and electronic equipment |
-
2019
- 2019-11-12 CN CN201911102682.6A patent/CN111046099A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200421 |