CN111046099A - Thermal data high-performance storage framework - Google Patents

Thermal data high-performance storage framework

Info

Publication number
CN111046099A
Authority
CN
China
Prior art keywords
data
kafka
performance
hbase
thermal data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911102682.6A
Other languages
Chinese (zh)
Inventor
冯报安
杨晶生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Microphone Culture Media Co ltd
Original Assignee
Shanghai Microphone Culture Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Microphone Culture Media Co ltd
Priority to CN201911102682.6A
Publication of CN111046099A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of big data storage and relates in particular to a high-performance storage architecture for hot data (the "thermal data" of the title), comprising the open-source processing platform Kafka and the open-source database HBase. The hot data consists of hotspot service data from the last 7 days, and it is stored as follows: real-time log data is listened for on the open-source message queue Kafka; the log data is written to HBase in real time with an automatic expiration time set; HBase serves high-performance random reads and writes to external callers; and, on a daily schedule, the previous day's data is aggregated and archived to another lower-performance, lower-cost database. By screening and extracting data so that only recent data required by the service is kept, by storing it in a distributed column-oriented database, and by giving up the strong consistency and transactions offered by a traditional database, the invention greatly improves random read/write performance under a very large user base.

Description

Thermal data high-performance storage framework
Technical Field
The invention relates to the technical field of big data storage, and in particular to a high-performance storage architecture for hot data.
Background
Online services are very sensitive to response latency: any lengthy query or operation severely degrades the user experience and drives users away. As the business expands, however, data volumes keep growing and traditional relational databases struggle to meet the increasing demand, so a more modern storage model is needed.
In addition, online services in most cases only need to access data from the last few days, so the required storage space can be kept under control even with hundreds of millions of users. When selecting hardware, expensive but high-performing SSDs can therefore be chosen to improve performance further.
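As a rough illustration of why a short retention window keeps storage manageable, the following sketch estimates the hot-data footprint. All figures in it (200 million users, 2,000 bytes of log data per user per day, a replication factor of 3) are hypothetical assumptions chosen for the arithmetic, not values taken from the invention.

```python
# Back-of-the-envelope estimate of hot-data storage.
# Every number below is an assumption made for illustration only.

def hot_storage_bytes(users: int, bytes_per_user_per_day: int,
                      retention_days: int, replication: int) -> int:
    """Raw bytes needed to keep `retention_days` of data for all users."""
    return users * bytes_per_user_per_day * retention_days * replication


USERS = 200_000_000              # assumed user count
BYTES_PER_USER_PER_DAY = 2_000   # assumed log volume per user per day
REPLICATION = 3                  # assumed number of stored copies

seven_days = hot_storage_bytes(USERS, BYTES_PER_USER_PER_DAY, 7, REPLICATION)
one_year = hot_storage_bytes(USERS, BYTES_PER_USER_PER_DAY, 365, REPLICATION)

print(f"7-day window:   {seven_days / 1e12:.1f} TB")
print(f"365-day window: {one_year / 1e12:.1f} TB")
```

Under these assumptions the footprint scales linearly with the retention window, so keeping 7 days instead of a full year shrinks the SSD capacity required by a factor of about 52.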
To this end, we propose a high-performance storage architecture for hot data to solve the above problems.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a high-performance storage architecture for hot data.
To achieve this aim, the invention adopts the following technical solution:
A high-performance storage architecture for hot data comprises the open-source processing platform Kafka and the open-source database HBase. The hot data consists of hotspot service data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1. listening for real-time log data from the open-source message queue Kafka;
S2. writing the log data to HBase in real time and setting an automatic expiration time (for example, 7 days);
S3. serving high-performance random read/write operations from HBase;
S4. on a daily schedule, aggregating the previous day's data and archiving it to another lower-performance, lower-cost database.
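The flow of steps S1 to S3 can be sketched with in-memory stand-ins. In this minimal sketch a `deque` plays the role of the Kafka topic and the hypothetical `HotStore` class plays the role of an HBase table with an expiration time; neither is the real Kafka or HBase API, and the row keys and values are invented for illustration.

```python
from collections import deque
from typing import Optional

SEVEN_DAYS = 7 * 24 * 3600  # expiration time in seconds, matching the 7-day example


class HotStore:
    """In-memory stand-in for a table whose rows expire after a fixed TTL."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl = ttl_seconds
        self._rows = {}  # row key -> (write time in seconds, value)

    def put(self, key: str, value: str, now: float) -> None:
        self._rows[key] = (now, value)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._rows.get(key)
        if entry is None:
            return None
        written_at, value = entry
        if now - written_at >= self.ttl:
            # Expired: treat as missing and drop it. This lazy cleanup is a
            # simplification of how a real store would enforce expiration.
            del self._rows[key]
            return None
        return value


def ingest(queue: deque, store: HotStore, now: float) -> int:
    """S1 + S2: drain pending log records from the queue into the hot store."""
    count = 0
    while queue:
        key, value = queue.popleft()
        store.put(key, value, now)
        count += 1
    return count


log_queue = deque([("user1#play", "track=42"), ("user2#play", "track=7")])
store = HotStore(SEVEN_DAYS)
t0 = 0.0  # arbitrary reference time in seconds
ingested = ingest(log_queue, store, now=t0)

fresh = store.get("user1#play", now=t0 + 3600)            # S3: read inside the window
stale = store.get("user1#play", now=t0 + SEVEN_DAYS + 1)  # read after expiration
print(ingested, fresh, stale)
```

Passing `now` explicitly keeps the expiration logic deterministic, which makes the 7-day boundary easy to exercise without waiting.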
In the above hot data high-performance storage architecture, Kafka is installed in step S1 as follows:
A1. enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2. edit the file "server.
A3. find the log.dirs entry and set it to log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
In the above hot data high-performance storage architecture, listening to the open-source processing platform Kafka in step S1 involves the following steps:
B1. install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2. rename "zoo_sample.cfg" to "zoo.cfg";
B3. open zoo.cfg in any text editor (e.g. Notepad);
B4. find the dataDir entry and set it to dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5. start ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6. enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7. hold Shift and right-click, choose the "Open command window here" option, and open a command line;
B8. enter .\bin\windows\kafka-server-start.
In the above hot data high-performance storage architecture, step S2 uses SSDs as the storage hardware for the service data in order to provide good performance.
In the above hot data high-performance storage architecture, the Get method provided by HBase in step S3 supports retrieving data in batches, implemented by assembling a List<Get> gets.
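In the HBase Java client, batched reads of this kind go through `Table.get(List<Get>)`, which returns one result per request. The sketch below models the same assemble-then-fetch pattern in Python against an in-memory dictionary; the `Get` and `FakeTable` classes and the row keys are hypothetical stand-ins, not the HBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Get:
    """Hypothetical request object naming one row to read."""
    row: str


class FakeTable:
    """In-memory stand-in for a table that supports batched reads."""

    def __init__(self, rows: dict) -> None:
        self._rows = dict(rows)

    def get(self, gets: list) -> list:
        # One call, one result per request, in request order.
        # Rows that do not exist yield None.
        return [self._rows.get(g.row) for g in gets]


table = FakeTable({"user1": "a", "user2": "b", "user3": "c"})

# Assemble the list of requests first, then issue a single batched call.
gets = [Get(row) for row in ("user1", "user3", "user9")]
results = table.get(gets)
print(results)
```

Batching trades many small round trips for one larger request, which is where the benefit for random reads would come from in a networked store.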
In the above hot data high-performance storage architecture, the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with a self-built deployment on cloud hosts.
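The daily archiving in step S4 can be sketched as follows. This is a minimal illustration under stated assumptions: Python's built-in `sqlite3` stands in for the MySQL archive, and the `daily_counts` table, the row layout `(user, day, event)` and the per-user event count are hypothetical choices, since the source does not specify what is aggregated.

```python
import sqlite3
from collections import Counter
from datetime import date, timedelta


def archive_previous_day(hot_rows, archive, today):
    """Aggregate yesterday's events per user and write the totals to the archive."""
    yesterday = (today - timedelta(days=1)).isoformat()
    counts = Counter(user for user, day, _event in hot_rows if day == yesterday)
    # INSERT OR REPLACE (SQLite syntax) makes a rerun of the job idempotent.
    archive.executemany(
        "INSERT OR REPLACE INTO daily_counts (day, user_id, events) VALUES (?, ?, ?)",
        [(yesterday, user, n) for user, n in sorted(counts.items())],
    )
    archive.commit()
    return len(counts)


archive = sqlite3.connect(":memory:")  # stand-in for the low-cost archive database
archive.execute(
    "CREATE TABLE daily_counts (day TEXT, user_id TEXT, events INTEGER, "
    "PRIMARY KEY (day, user_id))"
)

hot_rows = [
    ("user1", "2019-11-11", "play"),
    ("user1", "2019-11-11", "pause"),
    ("user2", "2019-11-11", "play"),
    ("user1", "2019-11-12", "play"),  # today's data stays in the hot store
]
written = archive_previous_day(hot_rows, archive, today=date(2019, 11, 12))
archived = archive.execute(
    "SELECT day, user_id, events FROM daily_counts ORDER BY user_id"
).fetchall()
print(written, archived)
```

Keying the archive table on `(day, user_id)` means the scheduled job can be retried safely after a failure; against MySQL the same effect would typically be achieved with `INSERT ... ON DUPLICATE KEY UPDATE`.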
Compared with the prior art, the hot data high-performance storage architecture has the following advantages:
Data is screened and extracted so that only recent data required by the service is stored; the data is held in a distributed column-oriented database; and by giving up the strong consistency and transactions offered by a traditional database, random read/write performance under a very large user base is greatly improved.
At the same time, overall storage usage is reduced, which makes it feasible to use SSDs that perform well but are expensive. Cost is thus kept under control while read/write latency is further reduced. For hot data that needs random reads and writes, HBase on SSDs provides an average random-read latency of 20 ms.
Drawings
FIG. 1 is a diagram of the method steps of the hot data high-performance storage architecture according to the present invention;
FIG. 2 is a data import diagram of the hot data high-performance storage architecture according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention.
Referring to FIGS. 1-2, a high-performance storage architecture for hot data comprises the open-source processing platform Kafka and the open-source database HBase. The hot data consists of hotspot service data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1. listening for real-time log data from the open-source message queue Kafka;
S2. writing the log data to HBase in real time and setting an automatic expiration time (for example, 7 days);
S3. serving high-performance random read/write operations from HBase;
S4. on a daily schedule, aggregating the previous day's data and archiving it to another lower-performance, lower-cost database.
Specifically, Kafka is installed in step S1 as follows:
A1. enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2. edit the file "server.
A3. find the log.dirs entry and set it to log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
More specifically, listening to the open-source processing platform Kafka in step S1 involves the following steps:
B1. install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2. rename "zoo_sample.cfg" to "zoo.cfg";
B3. open zoo.cfg in any text editor (e.g. Notepad);
B4. find the dataDir entry and set it to dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5. start ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6. enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7. hold Shift and right-click, choose the "Open command window here" option, and open a command line;
B8. enter .\bin\windows\kafka-server-start.
Either of the following two approaches can be used when listening:
Approach 1: [content published as figures BDA0002270318610000051 to BDA0002270318610000071; not reproduced in the text]
Approach 2: [content published as figures BDA0002270318610000072 to BDA0002270318610000091; not reproduced in the text]
In step S2, SSDs are used as the storage hardware for the service data to provide good performance; because only a small amount of hot data is kept, hardware cost is reduced.
The Get method provided by HBase in step S3 supports retrieving data in batches, implemented by assembling a List<Get> gets.
The low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with a self-built deployment on cloud hosts.
The above describes only preferred embodiments of the present invention, and the scope of protection of the invention is not limited to them. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the invention.

Claims (6)

1. A high-performance storage architecture for hot data, characterized by comprising the open-source processing platform Kafka and the open-source database HBase, wherein the hot data consists of hotspot service data from the last 7 days, and the storage process for the hot data comprises the following steps:
S1. listening for real-time log data from the open-source message queue Kafka;
S2. writing the log data to HBase in real time and setting an automatic expiration time (for example, 7 days);
S3. serving high-performance random read/write operations from HBase;
S4. on a daily schedule, aggregating the previous day's data and archiving it to another lower-performance, lower-cost database.
2. The hot data high-performance storage architecture according to claim 1, wherein Kafka is installed in step S1 as follows:
A1. enter the Kafka configuration directory, D:\dev\kafka_2.12-1.0.1;
A2. edit the file "server.
A3. find the log.dirs entry and set it to log.dirs=D:\\dev\\kafka_2.12-1.0.1\\tmp.
3. The hot data high-performance storage architecture according to claim 2, wherein listening to the open-source processing platform Kafka in step S1 involves the following steps:
B1. install ZooKeeper and enter the ZooKeeper configuration directory, D:\dev\zookeeper-3.4.10\conf;
B2. rename "zoo_sample.cfg" to "zoo.cfg";
B3. open zoo.cfg in any text editor (e.g. Notepad);
B4. find the dataDir entry and set it to dataDir=D:\\dev\\zookeeper-3.4.10\\temp;
B5. start ZooKeeper: D:\dev\zookeeper-3.4.10\bin\zkServer.cmd;
B6. enter the Kafka installation directory, D:\dev\kafka_2.12-1.0.1;
B7. hold Shift and right-click, choose the "Open command window here" option, and open a command line;
B8. enter .\bin\windows\kafka-server-start.
4. The hot data high-performance storage architecture according to claim 1, wherein step S2 uses SSDs as the storage hardware for the service data in order to provide good performance.
5. The hot data high-performance storage architecture according to claim 1, wherein the Get method provided by HBase in step S3 supports retrieving data in batches, implemented by assembling a List<Get> gets.
6. The hot data high-performance storage architecture according to claim 1, wherein the low-cost database in step S4 is a MySQL database with a dual-node architecture and automatic disaster recovery, which can reduce cost by 40% compared with a self-built deployment on cloud hosts.
CN201911102682.6A 2019-11-12 2019-11-12 Thermal data high-performance storage framework Pending CN111046099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911102682.6A CN111046099A (en) 2019-11-12 2019-11-12 Thermal data high-performance storage framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911102682.6A CN111046099A (en) 2019-11-12 2019-11-12 Thermal data high-performance storage framework

Publications (1)

Publication Number Publication Date
CN111046099A true CN111046099A (en) 2020-04-21

Family

ID=70231828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911102682.6A Pending CN111046099A (en) 2019-11-12 2019-11-12 Thermal data high-performance storage framework

Country Status (1)

Country Link
CN (1) CN111046099A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112051968A (en) * 2020-08-07 2020-12-08 东北大学 Kafka-based distributed data stream hierarchical cache automatic migration algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709003A (en) * 2016-12-23 2017-05-24 长沙理工大学 Hadoop-based mass log data processing method
CN107169083A (en) * 2017-05-11 2017-09-15 聚龙融创科技有限公司 Public security bayonet socket magnanimity vehicle data storage and retrieval method and device, electronic equipment
CN109542733A (en) * 2018-12-05 2019-03-29 焦点科技股份有限公司 A kind of highly reliable real-time logs collection and visual modeling technique method
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data
CN110147398A (en) * 2019-04-25 2019-08-20 北京字节跳动网络技术有限公司 A kind of data processing method, device, medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108924370B (en) Call center outbound voice waveform analysis method, system, equipment and storage medium
KR102119258B1 (en) Technique for implementing change data capture in database management system
JP2005523521A (en) System and method for managing native application data
US10824612B2 (en) Key ticketing system with lock-free concurrency and versioning
KR20200056357A (en) Technique for implementing change data capture in database management system
CN110727406A (en) Data storage scheduling method and device
CN110287152A (en) A kind of method and relevant apparatus of data management
CN103514258A (en) Centralized recording, preprocessing and replaying method based on offline cache file operation
CN110647460A (en) Test resource management method and device and test client
CN110647423B (en) Method, device and readable medium for creating storage volume mirror image based on application
CN111046099A (en) Thermal data high-performance storage framework
CN108920095B (en) Data storage optimization method and device based on CRUSH
CN104933077A (en) Rule-based multi-file information analysis method
CN110377757B (en) Real-time knowledge graph construction system
CN112130759A (en) Parameter configuration method, system and related device of storage system
US20190377800A1 (en) Natural language processing system
CN106990917B (en) File reading and writing method and system
CN107656936B (en) Terminal database construction method in field of instant messaging
CN109213639A (en) A kind of storage and disaster tolerance method and device
US11943294B1 (en) Storage medium and compression for object stores
CN100384282C (en) Method for realizing recording cell journal
US20180217825A1 (en) Parallel diagnostic/software installation system
CN114116646A (en) Log data processing method, device, equipment and storage medium
CN113986840A (en) Block chain data multilevel storage and reading method and storage system
CN113468259A (en) Real-time data acquisition and storage method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200421