CN108021650A - An efficient storage and reading system for time series data - Google Patents
An efficient storage and reading system for time series data
- Publication number
- CN108021650A (application CN201711240991.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- difference
- module
- file
- read
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Abstract
An efficient storage and reading system for time series data, belonging to the technical field of real-time database systems. The system comprises one or more networked computers, which form its hardware platform. The system software running on these computers comprises a data writing module, a data compression module and a data reading module. The data writing module receives new data and writes it to both a memory cache and a log file. The data compression module compresses the data in the log files into data files using the compression algorithms and index structure designed in this invention. The data reading module responds to read requests and returns results after combining the query results from the memory cache and the data files. The advantage over a relational database is lower disk usage and faster reads and writes: after compression the data occupies only 35% of the disk space used by a MySQL database; compared with MySQL, write speed improves 3-fold and read speed improves 20-fold.
Description
Technical field
The invention belongs to the technical field of real-time database systems, and in particular relates to an efficient storage and reading system for time series data.
Background
Time series data is data carrying time tags, that is, data that changes in time order (serialized by time). It is mainly produced by real-time monitoring, inspection and analysis equipment of all kinds in industries such as electric power, chemicals and metallurgy. Such industrial data typically has three features: it is produced at high frequency (each monitoring point can produce several records per second); it depends heavily on acquisition time (each record must correspond to a unique time); and it comes from many measuring points in large volume (a conventional real-time monitoring system has thousands of monitoring points, each producing data every second, amounting to tens of GB per day).
At present, time series data is often stored and processed in a relational database, but the inherent disadvantages of relational databases prevent them from storing and querying such data efficiently. An efficient storage and reading system optimized specifically for time series data is therefore urgently needed.
Summary of the invention
The object of the invention is to provide an efficient storage and reading system for time series data that solves the problems of efficiently compressing, writing and reading time series data of all types.
The system of the invention comprises one or more networked computers, which form the hardware platform of the system. The system software running on the computers comprises a data writing module, a data compression module and a data reading module. The data writing module receives new data and writes it to both a memory cache and a log file. The data compression module compresses the data in the log files into data files using the compression algorithms and index structure designed in this invention. The data reading module responds to read requests and returns results after combining the query results from the memory cache and the data files.
The invention devises dedicated compression methods for each type of time series data. There are five data types: integer, floating-point number, Boolean, character string and timestamp. The compression method designed for each of the five types is as follows:
Integers: the first integer is not compressed. From the second integer onward, the difference from the previous value is calculated and ZigZag-encoded (an encoding first proposed by Google in the protocol-buffers protocol), which turns negative differences into positive numbers. The differences are then compressed with the simple8b algorithm (from the paper: Anh and Moffat, "Index compression using 64-bit words", Softw. Pract. Exper. 2010; 40:131-147).
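The delta and ZigZag steps of the integer scheme can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, deltas are assumed to fit in a signed 64-bit integer, and the final simple8b packing step is left out.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64

def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

def encode_integers(values):
    """Keep the first value as is; ZigZag-encode each difference from the previous value."""
    if not values:
        return None, []
    deltas = [zigzag_encode(b - a) for a, b in zip(values, values[1:])]
    return values[0], deltas

def decode_integers(first, deltas):
    """Rebuild the original sequence from the first value and the encoded deltas."""
    if first is None:
        return []
    out = [first]
    for z in deltas:
        out.append(out[-1] + zigzag_decode(z))
    return out
```

Slowly changing measurements give small non-negative numbers after this step, which is what lets a word-packing scheme such as simple8b fit many of them into one 64-bit word.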
Floating-point numbers: the first floating-point number is not compressed. From the second onward, each value is XORed with the previous value to obtain a difference. When two floating-point values are close, the difference is very small. If the difference is 0, only a single 0 bit is stored. If it is not 0, a single 1 bit is stored, followed by 5 bits holding the number of zeros at the left end of the 64 bits and 6 bits holding the number of zeros at the right end; the non-zero bits in between are then extracted and stored.
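A rough sketch of the XOR scheme for floating-point numbers follows, reading the control flags as single bits. Several details are assumptions made for illustration: the output is a string of '0'/'1' characters rather than a packed bit stream, the function names are hypothetical, the input must be non-empty, and the leading-zero count is clamped to 31 so that it fits in 5 bits (the text does not say how larger counts are handled).

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def encode_floats(values):
    """Return (bits of first value, bit string for the remaining values)."""
    first = prev = float_to_bits(values[0])
    chunks = []
    for v in values[1:]:
        cur = float_to_bits(v)
        xor = prev ^ cur
        if xor == 0:
            chunks.append("0")                       # unchanged value: one 0 bit
        else:
            leading = min(64 - xor.bit_length(), 31)  # assumed clamp to 5 bits
            trailing = (xor & -xor).bit_length() - 1  # zeros at the right end
            length = 64 - leading - trailing
            chunks.append("1" + format(leading, "05b") + format(trailing, "06b")
                          + format(xor >> trailing, f"0{length}b"))
        prev = cur
    return first, "".join(chunks)

def decode_floats(first, bitstring, count):
    """Rebuild `count` values from the first value and the bit string."""
    out, prev, pos = [bits_to_float(first)], first, 0
    for _ in range(count - 1):
        flag = bitstring[pos]
        pos += 1
        if flag == "1":
            leading = int(bitstring[pos:pos + 5], 2)
            trailing = int(bitstring[pos + 5:pos + 11], 2)
            pos += 11
            length = 64 - leading - trailing
            prev ^= int(bitstring[pos:pos + length], 2) << trailing
            pos += length
        out.append(bits_to_float(prev))
    return out
```

A repeated reading costs one bit, while a changed reading costs 12 control bits plus only the bits that actually differ.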
Booleans: each Boolean value is stored directly in 1 bit, so each 64-bit unsigned integer can hold 64 Boolean values.
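The Boolean packing can be illustrated with a short sketch. The bit order (first value in the lowest bit) and the function names are assumptions; the text only states that 64 values share one 64-bit unsigned integer.

```python
def pack_booleans(flags):
    """Pack Boolean values into 64-bit words, 64 values per word."""
    words = []
    for start in range(0, len(flags), 64):
        word = 0
        for offset, flag in enumerate(flags[start:start + 64]):
            if flag:
                word |= 1 << offset  # assumed order: first value in bit 0
        words.append(word)
    return words

def unpack_booleans(words, count):
    """Recover the first `count` Boolean values from the packed words."""
    return [bool((words[i // 64] >> (i % 64)) & 1) for i in range(count)]
```

The value count has to be stored alongside the words, because the last word is usually only partly filled.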
Character strings: the strings are appended in order to a byte stream, which is then compressed with the snappy algorithm (an open-source algorithm provided by Google at http://google.github.io/snappy/).
Timestamps: the first timestamp is not compressed. From the second timestamp onward, the difference from the previous value is calculated; the first difference is not compressed. From the third value onward, the difference of the differences is calculated. If the difference of differences is 0 (that is, the data is stored at a constant interval), only 0 and the number of times 0 occurs are stored; otherwise the difference of differences is stored using simple8b.
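The timestamp scheme amounts to delta-of-delta coding with run-length encoding of zeros. The sketch below is an assumption-laden illustration: the (value, run length) token list and the function names are hypothetical, and non-zero differences of differences are kept as plain integers instead of being packed with simple8b.

```python
def encode_timestamps(ts):
    """Return (first timestamp, first delta, tokens).

    Each token is (delta_of_delta, run_length); only runs of 0 are merged.
    """
    if not ts:
        return None, None, []
    if len(ts) == 1:
        return ts[0], None, []
    first_delta = ts[1] - ts[0]
    prev_delta = first_delta
    tokens = []
    for a, b in zip(ts[1:], ts[2:]):
        delta = b - a
        dod = delta - prev_delta
        if dod == 0 and tokens and tokens[-1][0] == 0:
            tokens[-1] = (0, tokens[-1][1] + 1)  # extend the current run of zeros
        else:
            tokens.append((dod, 1))
        prev_delta = delta
    return ts[0], first_delta, tokens

def decode_timestamps(first, first_delta, tokens):
    """Rebuild the timestamp sequence."""
    if first is None:
        return []
    out = [first]
    if first_delta is None:
        return out
    delta = first_delta
    out.append(first + delta)
    for dod, run in tokens:
        for _ in range(run):
            delta += dod
            out.append(out[-1] + delta)
    return out
```

For a point sampled at a perfectly constant interval, the whole series collapses to the first timestamp, the first delta and a single zero run.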
Efficient writing is ensured by the memory cache and the log file. Sequential disk writes are fast, whereas random writes are slow (because of seek time and rotational latency), and time series data is characterized by large volumes of data being collected and written continuously. To improve write efficiency, a batch of data (typically 5,000 to 10,000 points) is encoded into a byte stream in the order point name, number of points, timestamp, value, timestamp, value, ..., and the stream is compressed with the snappy algorithm before being written to the log file. At the same time, the data written to the log file is also stored in the memory cache as point name, timestamp and value structures. The memory cache is kept in sync with the log file and serves efficient reads of recent data in place of the log file. The log file has a fixed size; when it reaches the specified size, a new log file is generated automatically.
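The write path can be sketched as below. This is only an illustration under stated assumptions: zlib from the Python standard library stands in for snappy, values are assumed to be 64-bit floats with integer timestamps, the length-prefixed record framing and the names `encode_batch`, `decode_batch` and `Writer` are hypothetical, and log file rollover at a fixed size is omitted.

```python
import struct
import zlib
from collections import defaultdict

def encode_batch(name, points):
    """Encode as: point name, number of points, timestamp, value, timestamp, value, ..."""
    raw = name.encode("utf-8")
    buf = struct.pack(">H", len(raw)) + raw + struct.pack(">I", len(points))
    for ts, value in points:
        buf += struct.pack(">qd", ts, value)
    return buf

def decode_batch(buf):
    """Inverse of encode_batch; returns (name, [(timestamp, value), ...])."""
    (name_len,) = struct.unpack_from(">H", buf, 0)
    name = buf[2:2 + name_len].decode("utf-8")
    (count,) = struct.unpack_from(">I", buf, 2 + name_len)
    base = 6 + name_len
    points = [struct.unpack_from(">qd", buf, base + 16 * i) for i in range(count)]
    return name, points

class Writer:
    """Appends compressed batches to a log and mirrors them in a memory cache."""

    def __init__(self, log):
        self.log = log                  # binary file-like object, append only
        self.cache = defaultdict(list)  # point name -> [(timestamp, value), ...]

    def write(self, name, points):
        record = zlib.compress(encode_batch(name, points))  # stand-in for snappy
        self.log.write(struct.pack(">I", len(record)) + record)  # sequential append
        self.cache[name].extend(points)  # keep the cache in step with the log
```

Because each batch is appended as one record, the disk only ever sees sequential writes, while recent data stays readable from the cache without touching the log.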
A dedicated data file structure is designed for time series data, together with a multi-level compression mechanism that compresses log files into data files for efficient reading and reduced disk usage. A data file contains multiple data blocks and one index block. A data block is the compressed byte stream obtained by applying the compression algorithm for the point's data type, as described above, to a group of time-ordered data points. The index block consists of the point name, number of data blocks, start time, end time, relative position and byte count. Multi-level compression is performed periodically: first, multiple log files are compressed into a level-1 data file; then multiple level-1 data files are compressed into a level-2 data file; and so on, level by level, until the data file reaches the specified size. The system builds an in-memory index structure for each point, consisting of the point name, time range and data file name, which is used to quickly locate the data file holding the data for a given point and time range. When reading data, the system first determines whether to read from the memory cache of recent data or from a data file, and from which one; when reading from a data file, it uses the file's index structure to quickly locate the position of the data, thereby achieving efficient reads.
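The in-memory index lookup can be illustrated with a small sketch. The class and field names are hypothetical, and a linear scan per point is used for clarity; the text does not specify the lookup structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    point: str      # point name
    start: int      # first timestamp covered by the file
    end: int        # last timestamp covered by the file
    filename: str   # data file holding this time range

class MemoryIndex:
    """Maps each point name to the data files that hold its time ranges."""

    def __init__(self):
        self._entries = {}

    def add(self, entry: IndexEntry) -> None:
        self._entries.setdefault(entry.point, []).append(entry)

    def locate(self, point: str, start: int, end: int):
        """Return the names of files whose range overlaps [start, end]."""
        return [e.filename for e in self._entries.get(point, [])
                if e.start <= end and e.end >= start]
```

A read for a point and time range first consults this index, so only the data files that can contain matching data are opened.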
The advantage of the invention is that, compared with a relational database, it uses less disk space and reads and writes faster. First, the compressed data occupies less disk space: only 35% of the space used by a MySQL database. Second, data is written faster: compared with MySQL, write speed improves 3-fold. Finally, data is read faster: compared with MySQL, read speed improves 20-fold.
Brief description of the drawings
Fig. 1 is a logical structure diagram of the system.
Fig. 2 is data compression flow chart.
Fig. 3 is the structure chart of data file.
Detailed description of the embodiments
As shown in Fig. 1, the system comprises three data storage forms (memory cache, log file and data file) and three data processing operations (write, compress and read). On write, the memory cache and the log file are written synchronously. Log files are periodically compressed into data files, and data files are in turn progressively compressed into larger data files. On read, the system determines whether to read from the memory cache or from a data file; when reading from a data file, it uses the index in the file to locate the data position quickly.
As shown in Fig. 2, time series data comprises point name, timestamp and value information. When a group of time series data is written, each point is traversed and the data type of its values is determined (integer, floating-point number, Boolean or character string). The corresponding compression algorithm is called to compress the values, the timestamp compression algorithm is called to compress the timestamps, and the two results are merged into a byte stream and saved.
As shown in Fig. 3, a data file consists of a file header, a data block area, an index block and a file footer. The file header has a fixed size and stores the version number of the system; the file footer has a fixed size and stores the position of the index block within the file. The data block area can hold multiple data blocks, each produced by the compression flow of Fig. 2. The index block stores, in order, the point name, the data type of the point, the number of data blocks and, for each data block, its start time, end time, starting position in the file and size in bytes.
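The file layout can be sketched as follows for a file holding a single point. All concrete choices here are assumptions made for illustration: the magic bytes `TSDF`, the field widths, the numeric type code, and the function names `write_data_file` and `read_block` do not come from the text.

```python
import io
import struct

HEADER = struct.Struct(">4sI")  # assumed: magic bytes + system version number
FOOTER = struct.Struct(">Q")    # position of the index block in the file
ENTRY = struct.Struct(">qqQI")  # start time, end time, block offset, byte count

def write_data_file(out, point, type_code, blocks, version=1):
    """blocks: list of (start_time, end_time, compressed_bytes)."""
    out.write(HEADER.pack(b"TSDF", version))
    entries = []
    for start, end, payload in blocks:            # data block area
        entries.append((start, end, out.tell(), len(payload)))
        out.write(payload)
    index_offset = out.tell()
    name = point.encode("utf-8")                  # index block
    out.write(struct.pack(">H", len(name)) + name)
    out.write(struct.pack(">BI", type_code, len(entries)))
    for entry in entries:
        out.write(ENTRY.pack(*entry))
    out.write(FOOTER.pack(index_offset))          # fixed-size file footer

def read_block(f, point, ts):
    """Return the compressed block covering timestamp ts, or None."""
    f.seek(-FOOTER.size, io.SEEK_END)
    (index_offset,) = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(index_offset)
    (name_len,) = struct.unpack(">H", f.read(2))
    name = f.read(name_len).decode("utf-8")
    _type_code, count = struct.unpack(">BI", f.read(5))
    if name != point:
        return None
    for _ in range(count):
        start, end, offset, size = ENTRY.unpack(f.read(ENTRY.size))
        if start <= ts <= end:
            f.seek(offset)
            return f.read(size)
    return None
```

Storing the index position in a fixed-size footer lets a reader find the index with one seek from the end of the file, then jump straight to the block it needs.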
With the above system as the core and the necessary calling interfaces provided, it can be used as a time series database and can be widely applied in plant process monitoring and the Internet of Things.
Claims (1)
1. An efficient storage and reading system for time series data, characterized in that it comprises one or more networked computers forming the hardware platform of the system; a data writing module, a data compression module and a data reading module run on the computers; the data writing module is responsible for receiving new data and writing the data to both a memory cache and a log file; the data compression module is responsible for compressing the data of the log files into data files according to the designed compression algorithms and index structure; the data reading module responds to read requests and returns results after combining the query results of the memory cache and the data files;
The time series data types comprise five data types: integer, floating-point number, Boolean, character string and timestamp; the compression methods designed for the five data types are as follows:
The compression method for integers is: the first integer is not compressed; from the second integer onward, the difference from the previous value is calculated and ZigZag-encoded, turning negative differences into positive numbers; the differences are then compressed with the simple8b algorithm;
The compression method for floating-point numbers is: the first floating-point number is not compressed; from the second floating-point number onward, each value is XORed with the previous value to obtain a difference; when two floating-point values are close, the difference is very small; if the difference is 0, only a single 0 bit is stored; if it is not 0, a single 1 bit is stored, followed by 5 bits holding the number of zeros at the left end of the 64 bits and 6 bits holding the number of zeros at the right end, and the non-zero bits are then extracted and stored;
The compression method for Booleans is: each Boolean value is stored directly in 1 bit, and each 64-bit unsigned integer can hold 64 Boolean values;
The compression method for character strings is: the strings are appended in order to a byte stream, which is then compressed with the snappy algorithm;
The compression method for timestamps is: the first timestamp is not compressed; from the second timestamp onward, the difference from the previous value is calculated, and the first difference is not compressed; from the third value onward, the difference of the differences is calculated; when the difference of differences is 0, only 0 and the number of times 0 occurs are stored; otherwise the difference of differences is stored using simple8b;
Efficient writing is ensured by the memory cache and the log file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711240991.0A CN108021650A (en) | 2017-11-30 | 2017-11-30 | A kind of efficient storage of time series data and reading system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711240991.0A CN108021650A (en) | 2017-11-30 | 2017-11-30 | A kind of efficient storage of time series data and reading system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108021650A true CN108021650A (en) | 2018-05-11 |
Family
ID=62077668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711240991.0A Pending CN108021650A (en) | 2017-11-30 | 2017-11-30 | A kind of efficient storage of time series data and reading system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108021650A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109542059A (en) * | 2018-11-19 | 2019-03-29 | 国核自仪系统工程有限公司 | Historical data compression set and method |
CN109582708A (en) * | 2018-11-19 | 2019-04-05 | 冶金自动化研究设计院 | A kind of time series database system |
CN109710614A (en) * | 2018-12-28 | 2019-05-03 | 深圳市同行者科技有限公司 | A kind of method and device of real-time data memory and inquiry |
CN110636368A (en) * | 2018-06-25 | 2019-12-31 | 杭州海康威视数字技术股份有限公司 | Media playing method and device |
CN111078755A (en) * | 2019-12-19 | 2020-04-28 | 远景智能国际私人投资有限公司 | Time sequence data storage query method and device, server and storage medium |
CN111858391A (en) * | 2020-06-16 | 2020-10-30 | 中国人民解放军空军研究院航空兵研究所 | Method for optimizing compressed storage format in data processing process |
CN111953653A (en) * | 2020-07-07 | 2020-11-17 | 上海金仕达软件科技有限公司 | Data transmission method, system and device |
CN112054804A (en) * | 2020-09-11 | 2020-12-08 | 杭州海康威视数字技术股份有限公司 | Method and device for compressing data and method and device for decompressing data |
CN112181973A (en) * | 2019-07-01 | 2021-01-05 | 北京涛思数据科技有限公司 | Time sequence data storage method |
CN112632127A (en) * | 2020-12-29 | 2021-04-09 | 国华卫星数据科技有限公司 | Data processing method for real-time data acquisition and time sequence of equipment operation |
CN110943797B (en) * | 2019-12-18 | 2021-06-22 | 北京邮电大学 | Data compression method in SDH network |
CN113348450A (en) * | 2020-06-24 | 2021-09-03 | 智协慧同(北京)科技有限公司 | Vehicle-mounted data storage method and system |
WO2021176698A1 (en) * | 2020-03-06 | 2021-09-10 | 富士通株式会社 | Machine learning data generation program, machine learning program, machine learning data generation method, and extraction device |
CN113492890A (en) * | 2020-04-07 | 2021-10-12 | 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) | Data acquisition and storage method for central control system and central control system |
CN113609085A (en) * | 2021-08-18 | 2021-11-05 | 希尔塔(苏州)信息技术有限公司 | Automobile data rapid storage method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006244389A (en) * | 2005-03-07 | 2006-09-14 | Nippon Telegr & Teleph Corp <Ntt> | Similar time series data calculating device, and its method and program |
CN101795138A (en) * | 2010-01-19 | 2010-08-04 | 北京四方继保自动化股份有限公司 | Compressing method for high density time sequence data in WAMS (Wide Area Measurement System) of power system |
US20140040276A1 (en) * | 2012-07-31 | 2014-02-06 | International Business Machines Corporation | Method and apparatus for processing time series data |
CN105791228A (en) * | 2014-12-22 | 2016-07-20 | 中兴通讯股份有限公司 | Method and device for compressing timestamp |
CN106156037A (en) * | 2015-03-26 | 2016-11-23 | 深圳市腾讯计算机系统有限公司 | Data processing method, Apparatus and system |
-
2017
- 2017-11-30 CN CN201711240991.0A patent/CN108021650A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006244389A (en) * | 2005-03-07 | 2006-09-14 | Nippon Telegr & Teleph Corp <Ntt> | Similar time series data calculating device, and its method and program |
CN101795138A (en) * | 2010-01-19 | 2010-08-04 | 北京四方继保自动化股份有限公司 | Compressing method for high density time sequence data in WAMS (Wide Area Measurement System) of power system |
US20140040276A1 (en) * | 2012-07-31 | 2014-02-06 | International Business Machines Corporation | Method and apparatus for processing time series data |
CN103577456A (en) * | 2012-07-31 | 2014-02-12 | 国际商业机器公司 | Method and device for processing time series data |
CN105791228A (en) * | 2014-12-22 | 2016-07-20 | 中兴通讯股份有限公司 | Method and device for compressing timestamp |
CN106156037A (en) * | 2015-03-26 | 2016-11-23 | 深圳市腾讯计算机系统有限公司 | Data processing method, Apparatus and system |
Non-Patent Citations (2)
Title |
---|
MIKE_ZHANG: "Influxdb数据压缩", 《博客园》 * |
戴杨: "实时数据库中数据的分类压缩算法", 《计算机与现代化》 * |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110636368A (en) * | 2018-06-25 | 2019-12-31 | 杭州海康威视数字技术股份有限公司 | Media playing method and device |
CN110636368B (en) * | 2018-06-25 | 2021-12-24 | 杭州海康威视数字技术股份有限公司 | Media playing method, system, device and storage medium |
CN109582708A (en) * | 2018-11-19 | 2019-04-05 | 冶金自动化研究设计院 | A kind of time series database system |
CN109542059A (en) * | 2018-11-19 | 2019-03-29 | 国核自仪系统工程有限公司 | Historical data compression set and method |
CN109710614A (en) * | 2018-12-28 | 2019-05-03 | 深圳市同行者科技有限公司 | A kind of method and device of real-time data memory and inquiry |
CN112181973A (en) * | 2019-07-01 | 2021-01-05 | 北京涛思数据科技有限公司 | Time sequence data storage method |
CN112181973B (en) * | 2019-07-01 | 2023-05-30 | 北京涛思数据科技有限公司 | Time sequence data storage method |
CN110943797B (en) * | 2019-12-18 | 2021-06-22 | 北京邮电大学 | Data compression method in SDH network |
CN111078755A (en) * | 2019-12-19 | 2020-04-28 | 远景智能国际私人投资有限公司 | Time sequence data storage query method and device, server and storage medium |
JP7279266B2 (en) | 2019-12-19 | 2023-05-22 | エンヴィジョン デジタル インターナショナル ピーティーイー.エルティーディー. | Methods and apparatus for storing and querying time series data, and their servers and storage media |
WO2021126079A1 (en) * | 2019-12-19 | 2021-06-24 | Envision Digital International Pte. Ltd. | Method and apparatus for storing and querying time series data, and server and storage medium thereof |
KR102511271B1 (en) | 2019-12-19 | 2023-03-17 | 엔비전 디지털 인터내셔널 피티이 리미티드 | Method and device for storing and querying time series data, and server and storage medium therefor |
KR20220108186A (en) * | 2019-12-19 | 2022-08-02 | 엔비전 디지털 인터내셔널 피티이 리미티드 | Method and apparatus for storing and querying time series data, and server and storage medium thereof |
JP2023502543A (en) * | 2019-12-19 | 2023-01-24 | エンヴィジョン デジタル インターナショナル ピーティーイー.エルティーディー. | Methods and apparatus for storing and querying time series data, and their servers and storage media |
WO2021176698A1 (en) * | 2020-03-06 | 2021-09-10 | 富士通株式会社 | Machine learning data generation program, machine learning program, machine learning data generation method, and extraction device |
CN113492890A (en) * | 2020-04-07 | 2021-10-12 | 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) | Data acquisition and storage method for central control system and central control system |
CN111858391A (en) * | 2020-06-16 | 2020-10-30 | 中国人民解放军空军研究院航空兵研究所 | Method for optimizing compressed storage format in data processing process |
CN113348450A (en) * | 2020-06-24 | 2021-09-03 | 智协慧同(北京)科技有限公司 | Vehicle-mounted data storage method and system |
WO2021258360A1 (en) * | 2020-06-24 | 2021-12-30 | 智协慧同(北京)科技有限公司 | On-board data storage method and system |
CN111953653A (en) * | 2020-07-07 | 2020-11-17 | 上海金仕达软件科技有限公司 | Data transmission method, system and device |
CN112054804A (en) * | 2020-09-11 | 2020-12-08 | 杭州海康威视数字技术股份有限公司 | Method and device for compressing data and method and device for decompressing data |
CN112632127B (en) * | 2020-12-29 | 2022-07-15 | 国华卫星数据科技有限公司 | Data processing method for real-time data acquisition and time sequence of equipment operation |
CN112632127A (en) * | 2020-12-29 | 2021-04-09 | 国华卫星数据科技有限公司 | Data processing method for real-time data acquisition and time sequence of equipment operation |
CN113609085A (en) * | 2021-08-18 | 2021-11-05 | 希尔塔(苏州)信息技术有限公司 | Automobile data rapid storage method |
CN113609085B (en) * | 2021-08-18 | 2023-08-15 | 希尔塔(苏州)信息技术有限公司 | Automobile data quick storage method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021650A (en) | A kind of efficient storage of time series data and reading system | |
CN109582708A (en) | A kind of time series database system | |
CN105574212B (en) | A kind of image search method of more index disk hash data structures | |
US20110087669A1 (en) | Composite locality sensitive hash based processing of documents | |
US7849039B2 (en) | Method for using one-dimensional dynamics in assessing the similarity of sets of data using kinetic energy | |
CN111125116B (en) | Method and system for positioning code field in service table and corresponding code table | |
CN104881449A (en) | Image retrieval method based on manifold learning data compression hash | |
CN109597757B (en) | Method for measuring similarity between software networks based on multidimensional time series entropy | |
CN114003791A (en) | Depth map matching-based automatic classification method and system for medical data elements | |
Connor et al. | High-dimensional simplexes for supermetric search | |
CN116821646A (en) | Data processing chain construction method, data reduction method, device, equipment and medium | |
CN106372181A (en) | Big data compression method based on industrial process | |
CN105740428A (en) | B+ tree-based high-dimensional disc indexing structure and image search method | |
Zeng et al. | An empirical evaluation of columnar storage formats | |
CN105302915A (en) | High-performance data processing system based on memory calculation | |
CN109271614A (en) | A kind of data duplicate checking method | |
Martín-Fernández et al. | Indexes to find the optimal number of clusters in a hierarchical clustering | |
CN111767419A (en) | Picture searching method, device, equipment and computer readable storage medium | |
CN115862653A (en) | Audio denoising method and device, computer equipment and storage medium | |
Meng et al. | AAC: An anomaly aware time series compression algorithm towards green computing | |
Xu et al. | An approach to cluster electrical load profiles based on piecewise symbolic aggregation | |
CN113537349A (en) | Method, device, equipment and storage medium for identifying hardware fault of large host | |
Dongjie et al. | A data grouping model based on cache transaction for unstructured data storage systems | |
CN111401783A (en) | Power system operation data integration feature selection method | |
Luo et al. | A comparison of som based document categorization systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180511 |