CN108021650A - An efficient storage and reading system for time series data - Google Patents

An efficient storage and reading system for time series data

Info

Publication number
CN108021650A
Authority
CN
China
Prior art keywords
data
difference
module
file
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711240991.0A
Other languages
Chinese (zh)
Inventor
徐化岩
李勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Automation Research and Design Institute of Metallurgical Industry
Original Assignee
Automation Research and Design Institute of Metallurgical Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Automation Research and Design Institute of Metallurgical Industry filed Critical Automation Research and Design Institute of Metallurgical Industry
Priority to CN201711240991.0A priority Critical patent/CN108021650A/en
Publication of CN108021650A publication Critical patent/CN108021650A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Abstract

An efficient storage and reading system for time series data, belonging to the technical field of real-time database systems. One or more networked computers constitute the hardware platform of the system. The software of the system runs on these computers and comprises a data writing module, a data compression module, and a data reading module. The data writing module receives new data and writes it to both an in-memory cache and a log file. The data compression module compresses the data of the log file into data files, using the compression algorithms and index structure designed in this invention. The data reading module responds to read requests and returns the result after combining the query results from the in-memory cache and the data files. The advantages over a relational database are lower disk usage and faster reads and writes: compared with a MySQL database, the compressed data occupies only 35% of the disk space, write speed improves 3-fold, and read speed improves 20-fold.

Description

An efficient storage and reading system for time series data
Technical field
The invention belongs to the technical field of real-time database systems, and more particularly relates to an efficient storage and reading system for time series data.
Background
Time series data is data that carries time tags and changes in time order (that is, it is serialized in time). It is mainly produced by real-time monitoring, inspection, and analysis equipment of all kinds in industries such as electric power, chemicals, and metallurgy. Such industrial data has three typical features: it is generated at high frequency (each monitoring point can produce several records per second); it depends heavily on the acquisition time (every record must correspond to a unique time); and it comes from many measuring points in large volume (a conventional real-time monitoring system has thousands of monitoring points, each producing data every second, amounting to tens of GB per day).
At present, time series data is usually stored and processed in relational databases, but the inherent disadvantages of relational databases prevent efficient storage and querying of this data. A storage and reading system optimized specifically for time series data is therefore urgently needed.
Summary of the invention
The object of the invention is to provide an efficient storage and reading system for time series data that solves the problems of efficient compression, efficient writing, and efficient reading for all types of time series data.
The system of the invention comprises one or more networked computers, which constitute the hardware platform of the system. The software of the system runs on these computers and comprises a data writing module, a data compression module, and a data reading module. The data writing module receives new data and writes it to both an in-memory cache and a log file. The data compression module compresses the data of the log file into data files, using the compression algorithms and index structure designed in this invention. The data reading module responds to read requests and returns the result after combining the query results from the in-memory cache and the data files.
The invention designs a dedicated compression method for each type of time series data. The data types are integer, floating-point number, Boolean, string, and timestamp. The compression method designed for each of these five types is as follows:
Integers: the first integer is stored uncompressed. From the second integer on, the difference from the previous value is calculated and ZigZag-encoded (an encoding first introduced by Google in the protocol-buffers protocol), which maps negative differences to positive numbers. The differences are then compressed with the simple8b algorithm (from the paper: Anh and Moffat, "Index compression using 64-bit words", Softw. Pract. Exper. 2010; 40:131-147).
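The difference and ZigZag steps for integers can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, values are assumed to fit in signed 64 bits, and the final simple8b packing step is left out.

```python
from typing import List

MASK64 = (1 << 64) - 1


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def delta_zigzag(values: List[int]) -> List[int]:
    """Keep the first value as is; replace every later value by the
    ZigZag-encoded difference from its predecessor."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(zigzag_encode(cur - prev))
    return out


def undo_delta_zigzag(encoded: List[int]) -> List[int]:
    """Rebuild the original integers from the output of delta_zigzag."""
    if not encoded:
        return []
    values = [encoded[0]]
    for z in encoded[1:]:
        values.append(values[-1] + zigzag_decode(z))
    return values
```

Slowly changing readings turn into small non-negative numbers, which is the kind of input a word-packing scheme such as simple8b stores compactly.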
Floating-point numbers: the first value is stored uncompressed. From the second value on, each value is XORed with the previous one to obtain a difference. When two values are numerically close, the difference is very small. If the difference is 0, only a single 0 bit is stored. If it is non-zero, a single 1 bit is stored, followed by 5 bits holding the number of zeros at the left end of the 64 bits, 6 bits holding the number of zeros at the right end, and finally the extracted non-zero bits.
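A sketch of the XOR scheme for floating-point values is given below. It reads the two flags as a single 0 bit and a single 1 bit, and it makes two assumptions that the text does not state: the leading-zero count is capped at 31 so that it fits in 5 bits, and the bit stream is modelled as a string of '0' and '1' characters for readability. All function names are hypothetical.

```python
import struct
from typing import List


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values: List[float]) -> str:
    """Encode doubles as a bit string: first value raw, then XOR differences."""
    if not values:
        return ""
    prev = float_to_bits(values[0])
    out = [format(prev, "064b")]
    for v in values[1:]:
        cur = float_to_bits(v)
        x = prev ^ cur
        if x == 0:
            out.append("0")  # identical value: one flag bit only
        else:
            lead = min(64 - x.bit_length(), 31)  # assumption: cap to fit 5 bits
            trail = (x & -x).bit_length() - 1    # number of trailing zero bits
            length = 64 - lead - trail           # width of the stored middle part
            out.append("1" + format(lead, "05b") + format(trail, "06b")
                       + format(x >> trail, "0{}b".format(length)))
        prev = cur
    return "".join(out)


def xor_decode(bits: str, count: int) -> List[float]:
    """Decode `count` doubles from a bit string produced by xor_encode."""
    if count == 0:
        return []
    prev = int(bits[:64], 2)
    values = [bits_to_float(prev)]
    pos = 64
    while len(values) < count:
        if bits[pos] == "0":
            pos += 1
        else:
            lead = int(bits[pos + 1:pos + 6], 2)
            trail = int(bits[pos + 6:pos + 12], 2)
            length = 64 - lead - trail
            prev ^= int(bits[pos + 12:pos + 12 + length], 2) << trail
            pos += 12 + length
        values.append(bits_to_float(prev))
    return values
```

A run of identical readings costs one bit per value after the first, which is where most of the saving comes from on slowly changing sensors.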
Booleans: each Boolean value is stored directly in 1 bit, so each 64-bit unsigned integer can store 64 Boolean values.
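Packing Boolean values into 64-bit words might look like the sketch below. The bit order (first value in the least significant bit) and the function names are assumptions; the text only states that one word holds 64 values.

```python
from typing import List


def pack_bools(flags: List[bool]) -> List[int]:
    """Pack Boolean values into 64-bit words, 64 values per word."""
    words = []
    for start in range(0, len(flags), 64):
        word = 0
        for offset, flag in enumerate(flags[start:start + 64]):
            if flag:
                word |= 1 << offset  # assumption: first value in the lowest bit
        words.append(word)
    return words


def unpack_bools(words: List[int], count: int) -> List[bool]:
    """Recover the first `count` Boolean values from packed words."""
    return [bool((words[i // 64] >> (i % 64)) & 1) for i in range(count)]
```

The value count has to be stored alongside the words, because the last word is usually only partly filled.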
Strings: the strings are appended in order to a byte stream, which is then compressed with the snappy algorithm (an open-source algorithm provided by Google at http://google.github.io/snappy/).
Timestamps: the first timestamp is stored uncompressed. From the second timestamp on, the difference from the previous value is calculated, and the first difference is stored uncompressed. From the third timestamp on, the difference between consecutive differences is calculated. If this difference of differences is 0 (that is, the data is stored at a constant interval), only a 0 and the number of times the 0 occurs are stored; otherwise the difference of differences is stored using simple8b.
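The difference-of-differences scheme for timestamps can be sketched as below. The token layout is an assumption made for illustration: each token is a pair of a difference of differences and a repeat count, zero values are merged into runs, and non-zero values are kept one by one in place of the simple8b step. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Encoded = Tuple[Optional[int], Optional[int], List[Tuple[int, int]]]


def encode_timestamps(ts: List[int]) -> Encoded:
    """Return (first timestamp, first difference, tokens)."""
    if not ts:
        return (None, None, [])
    if len(ts) == 1:
        return (ts[0], None, [])
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    tokens: List[Tuple[int, int]] = []
    for prev_d, d in zip(deltas, deltas[1:]):
        dod = d - prev_d  # difference of differences
        if dod == 0 and tokens and tokens[-1][0] == 0:
            tokens[-1] = (0, tokens[-1][1] + 1)  # extend the current run of zeros
        else:
            tokens.append((dod, 1))
    return (ts[0], deltas[0], tokens)


def decode_timestamps(first: Optional[int], first_delta: Optional[int],
                      tokens: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the timestamps from the output of encode_timestamps."""
    if first is None:
        return []
    out = [first]
    if first_delta is None:
        return out
    delta = first_delta
    out.append(first + delta)
    for dod, count in tokens:
        for _ in range(count):
            delta += dod
            out.append(out[-1] + delta)
    return out
```

For a point sampled at a perfectly regular interval, the whole series collapses to the first timestamp, the interval, and a single run token.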
An in-memory cache and a log file ensure efficient writing. Sequential disk writes are fast, whereas random writes are slow (because of seek time and rotational latency), and time series data is characterized by large volumes being collected and written continuously. To improve write efficiency, a batch of data (typically 5000 to 10000 points) is encoded into a byte stream in the order point name, point count, timestamp, value, timestamp, value, and so on; the byte stream is compressed with the snappy algorithm and then written to the log file. At the same time, the data written to the log file is stored in the in-memory cache area in a structure of point name, timestamp, and value. The in-memory cache is kept synchronized with the log file; its purpose is to provide, in place of the log file, efficient reads of recent data. The log file size is fixed: when a log file reaches the specified size, a new log file is generated automatically.
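The write path (encode a batch, compress it, append it to the log, mirror it in memory) can be sketched as follows. Several details are assumptions for illustration: the field widths, the length prefix on each log record, values being doubles, and the class and function names. `zlib` from the Python standard library stands in for snappy here so that the sketch has no third-party dependency, and an in-memory buffer stands in for the log file when none is supplied.

```python
import io
import struct
import zlib
from collections import defaultdict


def encode_batch(batch):
    """batch maps point name -> list of (timestamp, value) pairs.
    Layout per point: name, sample count, then timestamp/value pairs."""
    parts = []
    for name, samples in batch.items():
        raw = name.encode("utf-8")
        parts.append(struct.pack(">H", len(raw)) + raw)
        parts.append(struct.pack(">I", len(samples)))
        for ts, value in samples:
            parts.append(struct.pack(">qd", ts, value))
    return b"".join(parts)


def decode_batch(data):
    """Invert encode_batch."""
    batch, pos = {}, 0
    while pos < len(data):
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        name = data[pos:pos + n].decode("utf-8")
        pos += n
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        samples = []
        for _ in range(count):
            samples.append(struct.unpack_from(">qd", data, pos))
            pos += 16
        batch[name] = samples
    return batch


class LogWriter:
    """Appends compressed batches to a log and mirrors them in a cache."""

    def __init__(self, log=None):
        # Any binary file-like object opened for appending will do.
        self.log = log if log is not None else io.BytesIO()
        self.cache = defaultdict(list)

    def write(self, batch):
        payload = zlib.compress(encode_batch(batch))  # stand-in for snappy
        self.log.write(struct.pack(">I", len(payload)) + payload)  # sequential append
        for name, samples in batch.items():
            self.cache[name].extend(samples)  # keep the cache in step with the log
```

Every write is a single append, so the disk only sees sequential I/O, while recent samples stay readable from the cache without touching the log.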
A dedicated data file structure is designed for time series data, together with a multi-level compression mechanism that compresses log files into data files, enabling efficient reads and reducing disk usage. A data file contains multiple data blocks and one index block. A data block is the compressed byte stream obtained by compressing a group of time-ordered data of one point with the compression algorithm for that point's data type, as described above. The index block consists of point name, number of data blocks, start time, end time, relative position, and byte count. Multi-level compression runs periodically: first, multiple log files are compressed into a level-1 data file; next, multiple level-1 data files are compressed into a level-2 data file; and so on, level by level, until the data file reaches the specified size. In memory, the system builds an index structure for each point, consisting of point name, time range, and data file name, which is used to quickly locate the data file holding a point's data for a given time period. When reading data, the system first decides whether to read from the in-memory cache of recent data or from a data file, and from which data file. If it reads from a data file, it uses that file's index structure to quickly locate the position of the data, thereby achieving efficient reads.
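The per-point in-memory index (point name, time range, data file name) could be modelled as in the sketch below. The class and method names are hypothetical, time ranges are treated as inclusive, and a linear scan is used for brevity; a real system would more likely keep the entries sorted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEntry:
    start: int      # first timestamp covered by the file for this point
    end: int        # last timestamp covered by the file for this point
    file_name: str  # data file that holds the blocks


class MemoryIndex:
    """Maps a point name to the data files that cover each time range."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[IndexEntry]] = {}

    def add(self, point: str, start: int, end: int, file_name: str) -> None:
        self.entries.setdefault(point, []).append(IndexEntry(start, end, file_name))

    def files_for(self, point: str, start: int, end: int) -> List[str]:
        """Return the files whose time range overlaps [start, end]."""
        return [e.file_name for e in self.entries.get(point, [])
                if e.start <= end and e.end >= start]
```

A read request for a point and time period first consults this index, then opens only the files it returns and uses each file's own index block to find the data blocks.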
The advantages of the invention over a relational database are lower disk usage and faster reads and writes. First, compressed data takes less disk space: compared with a MySQL database, only 35% of the disk space is used. Second, data is written faster: compared with a MySQL database, write speed improves 3-fold. Finally, data is read faster: compared with a MySQL database, read speed improves 20-fold.
Brief description of the drawings
Fig. 1 is the logical structure diagram of the system.
Fig. 2 is the data compression flow chart.
Fig. 3 is the structure chart of data file.
Detailed description of the embodiments
As shown in Fig. 1, the system comprises three data storage forms (in-memory cache, log file, and data file) and three data processing operations (write, compress, and read). On write, the in-memory cache and the log file are written synchronously; log files are periodically compressed into data files, and data files are progressively compressed into larger data files. On read, the system decides whether to read from the in-memory cache or from a data file; if it reads from a data file, it uses the index in the file to quickly locate the data.
As shown in Fig. 2, time series data comprises point name, timestamp, and value information. When a group of time series data is written, each point is traversed and the data type of its values is determined to be integer, floating-point number, Boolean, or string; the corresponding compression algorithm is then called to compress the values. The timestamps are compressed with the timestamp compression algorithm, and the two results are merged into a byte stream and saved.
As shown in Fig. 3, a data file consists of a file header, a data block area, an index block, and a file trailer. The file header has a fixed size and stores the version number of the system; the file trailer has a fixed size and stores the position of the index block in the file. The data block area can store multiple data blocks, each produced by the compression flow of Fig. 2. The index block stores, in order, the point name, the data type of the point, the number of data blocks, the start time of each data block, the end time of each data block, and each data block's starting position in the file and the number of bytes it occupies.
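The file layout of Fig. 3 (header, data block area, index block, trailer) can be sketched as follows. The field widths, the byte order, and the names are assumptions chosen for illustration, and the index is simplified to one entry per data block rather than a block count per point.

```python
import struct

HEADER = struct.Struct(">I")      # fixed-size header: version number
FOOTER = struct.Struct(">Q")      # fixed-size trailer: offset of the index block
ENTRY = struct.Struct(">HBqqQI")  # name length, type code, start, end, offset, size


def build_file(version, blocks):
    """blocks: list of (point_name, type_code, start, end, payload_bytes)."""
    body = bytearray(HEADER.pack(version))
    index = bytearray()
    for name, type_code, start, end, payload in blocks:
        raw = name.encode("utf-8")
        # Record where this block starts before appending it to the block area.
        index += ENTRY.pack(len(raw), type_code, start, end, len(body), len(payload)) + raw
        body += payload
    index_offset = len(body)
    return bytes(body + index + FOOTER.pack(index_offset))


def read_index(data):
    """Read the trailer, jump to the index block, and list its entries."""
    (index_offset,) = FOOTER.unpack_from(data, len(data) - FOOTER.size)
    pos, stop, entries = index_offset, len(data) - FOOTER.size, []
    while pos < stop:
        n, type_code, start, end, offset, size = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        name = data[pos:pos + n].decode("utf-8")
        pos += n
        entries.append((name, type_code, start, end, offset, size))
    return entries
```

Because the trailer has a fixed size, a reader can find the index with one read at the end of the file and then fetch only the byte ranges of the blocks it needs.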
With the above system as the core and the necessary calling interfaces provided, it can be used widely as a time series database, for example in plant process monitoring and the Internet of Things.

Claims (1)

1. An efficient storage and reading system for time series data, characterized in that it comprises one or more networked computers constituting the hardware platform of the system; a data writing module, a data compression module, and a data reading module run on the computers; the data writing module is responsible for receiving new data and writing the data to both an in-memory cache and a log file; the data compression module is responsible for compressing the data of the log file into data files according to the designed compression algorithms and index structure; the data reading module responds to read requests and returns the result after combining the query results from the in-memory cache and the data files;
the time series data types comprise five data types: integer, floating-point number, Boolean, string, and timestamp; the compression methods designed for these five data types are as follows:
for integers, the first integer is stored uncompressed; from the second integer on, the difference from the previous value is calculated and ZigZag-encoded, mapping negative differences to positive numbers; the differences are then compressed with the simple8b algorithm;
for floating-point numbers, the first value is stored uncompressed; from the second value on, each value is XORed with the previous one to obtain a difference; when two values are numerically close the difference is very small; if the difference is 0, only a single 0 bit is stored; if it is non-zero, a single 1 bit is stored, followed by 5 bits holding the number of zeros at the left end of the 64 bits, 6 bits holding the number of zeros at the right end, and then the extracted non-zero bits;
for Booleans, each Boolean value is stored directly in 1 bit, so each 64-bit unsigned integer can store 64 Boolean values;
for strings, the strings are appended in order to a byte stream, which is then compressed with the snappy algorithm;
for timestamps, the first timestamp is stored uncompressed; from the second timestamp on, the difference from the previous value is calculated, and the first difference is stored uncompressed; from the third timestamp on, the difference between consecutive differences is calculated; when this difference of differences is 0, only a 0 and the number of times the 0 occurs are stored; otherwise the difference of differences is stored using simple8b;
an in-memory cache and a log file are used to ensure efficient writing.
CN201711240991.0A 2017-11-30 2017-11-30 An efficient storage and reading system for time series data Pending CN108021650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711240991.0A CN108021650A (en) 2017-11-30 2017-11-30 An efficient storage and reading system for time series data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711240991.0A CN108021650A (en) 2017-11-30 2017-11-30 An efficient storage and reading system for time series data

Publications (1)

Publication Number Publication Date
CN108021650A true CN108021650A (en) 2018-05-11

Family

ID=62077668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711240991.0A Pending CN108021650A (en) 2017-11-30 2017-11-30 An efficient storage and reading system for time series data

Country Status (1)

Country Link
CN (1) CN108021650A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542059A (en) * 2018-11-19 2019-03-29 国核自仪系统工程有限公司 Historical data compression set and method
CN109582708A (en) * 2018-11-19 2019-04-05 冶金自动化研究设计院 A kind of time series database system
CN109710614A (en) * 2018-12-28 2019-05-03 深圳市同行者科技有限公司 A kind of method and device of real-time data memory and inquiry
CN110636368A (en) * 2018-06-25 2019-12-31 杭州海康威视数字技术股份有限公司 Media playing method and device
CN111078755A (en) * 2019-12-19 2020-04-28 远景智能国际私人投资有限公司 Time sequence data storage query method and device, server and storage medium
CN111858391A (en) * 2020-06-16 2020-10-30 中国人民解放军空军研究院航空兵研究所 Method for optimizing compressed storage format in data processing process
CN111953653A (en) * 2020-07-07 2020-11-17 上海金仕达软件科技有限公司 Data transmission method, system and device
CN112054804A (en) * 2020-09-11 2020-12-08 杭州海康威视数字技术股份有限公司 Method and device for compressing data and method and device for decompressing data
CN112181973A (en) * 2019-07-01 2021-01-05 北京涛思数据科技有限公司 Time sequence data storage method
CN112632127A (en) * 2020-12-29 2021-04-09 国华卫星数据科技有限公司 Data processing method for real-time data acquisition and time sequence of equipment operation
CN110943797B (en) * 2019-12-18 2021-06-22 北京邮电大学 Data compression method in SDH network
CN113348450A (en) * 2020-06-24 2021-09-03 智协慧同(北京)科技有限公司 Vehicle-mounted data storage method and system
WO2021176698A1 (en) * 2020-03-06 2021-09-10 富士通株式会社 Machine learning data generation program, machine learning program, machine learning data generation method, and extraction device
CN113492890A (en) * 2020-04-07 2021-10-12 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) Data acquisition and storage method for central control system and central control system
CN113609085A (en) * 2021-08-18 2021-11-05 希尔塔(苏州)信息技术有限公司 Automobile data rapid storage method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006244389A (en) * 2005-03-07 2006-09-14 Nippon Telegr & Teleph Corp <Ntt> Similar time series data calculating device, and its method and program
CN101795138A (en) * 2010-01-19 2010-08-04 北京四方继保自动化股份有限公司 Compressing method for high density time sequence data in WAMS (Wide Area Measurement System) of power system
US20140040276A1 (en) * 2012-07-31 2014-02-06 International Business Machines Corporation Method and apparatus for processing time series data
CN105791228A (en) * 2014-12-22 2016-07-20 中兴通讯股份有限公司 Method and device for compressing timestamp
CN106156037A (en) * 2015-03-26 2016-11-23 深圳市腾讯计算机系统有限公司 Data processing method, Apparatus and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006244389A (en) * 2005-03-07 2006-09-14 Nippon Telegr & Teleph Corp <Ntt> Similar time series data calculating device, and its method and program
CN101795138A (en) * 2010-01-19 2010-08-04 北京四方继保自动化股份有限公司 Compressing method for high density time sequence data in WAMS (Wide Area Measurement System) of power system
US20140040276A1 (en) * 2012-07-31 2014-02-06 International Business Machines Corporation Method and apparatus for processing time series data
CN103577456A (en) * 2012-07-31 2014-02-12 国际商业机器公司 Method and device for processing time series data
CN105791228A (en) * 2014-12-22 2016-07-20 中兴通讯股份有限公司 Method and device for compressing timestamp
CN106156037A (en) * 2015-03-26 2016-11-23 深圳市腾讯计算机系统有限公司 Data processing method, Apparatus and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIKE_ZHANG: "Influxdb数据压缩", 《博客园》 *
戴杨: "实时数据库中数据的分类压缩算法", 《计算机与现代化》 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110636368A (en) * 2018-06-25 2019-12-31 杭州海康威视数字技术股份有限公司 Media playing method and device
CN110636368B (en) * 2018-06-25 2021-12-24 杭州海康威视数字技术股份有限公司 Media playing method, system, device and storage medium
CN109582708A (en) * 2018-11-19 2019-04-05 冶金自动化研究设计院 A kind of time series database system
CN109542059A (en) * 2018-11-19 2019-03-29 国核自仪系统工程有限公司 Historical data compression set and method
CN109710614A (en) * 2018-12-28 2019-05-03 深圳市同行者科技有限公司 A kind of method and device of real-time data memory and inquiry
CN112181973A (en) * 2019-07-01 2021-01-05 北京涛思数据科技有限公司 Time sequence data storage method
CN112181973B (en) * 2019-07-01 2023-05-30 北京涛思数据科技有限公司 Time sequence data storage method
CN110943797B (en) * 2019-12-18 2021-06-22 北京邮电大学 Data compression method in SDH network
CN111078755A (en) * 2019-12-19 2020-04-28 远景智能国际私人投资有限公司 Time sequence data storage query method and device, server and storage medium
JP7279266B2 (en) 2019-12-19 2023-05-22 エンヴィジョン デジタル インターナショナル ピーティーイー.エルティーディー. Methods and apparatus for storing and querying time series data, and their servers and storage media
WO2021126079A1 (en) * 2019-12-19 2021-06-24 Envision Digital International Pte. Ltd. Method and apparatus for storing and querying time series data, and server and storage medium thereof
KR102511271B1 (en) 2019-12-19 2023-03-17 엔비전 디지털 인터내셔널 피티이 리미티드 Method and device for storing and querying time series data, and server and storage medium therefor
KR20220108186A (en) * 2019-12-19 2022-08-02 엔비전 디지털 인터내셔널 피티이 리미티드 Method and apparatus for storing and querying time series data, and server and storage medium thereof
JP2023502543A (en) * 2019-12-19 2023-01-24 エンヴィジョン デジタル インターナショナル ピーティーイー.エルティーディー. Methods and apparatus for storing and querying time series data, and their servers and storage media
WO2021176698A1 (en) * 2020-03-06 2021-09-10 富士通株式会社 Machine learning data generation program, machine learning program, machine learning data generation method, and extraction device
CN113492890A (en) * 2020-04-07 2021-10-12 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) Data acquisition and storage method for central control system and central control system
CN111858391A (en) * 2020-06-16 2020-10-30 中国人民解放军空军研究院航空兵研究所 Method for optimizing compressed storage format in data processing process
CN113348450A (en) * 2020-06-24 2021-09-03 智协慧同(北京)科技有限公司 Vehicle-mounted data storage method and system
WO2021258360A1 (en) * 2020-06-24 2021-12-30 智协慧同(北京)科技有限公司 On-board data storage method and system
CN111953653A (en) * 2020-07-07 2020-11-17 上海金仕达软件科技有限公司 Data transmission method, system and device
CN112054804A (en) * 2020-09-11 2020-12-08 杭州海康威视数字技术股份有限公司 Method and device for compressing data and method and device for decompressing data
CN112632127B (en) * 2020-12-29 2022-07-15 国华卫星数据科技有限公司 Data processing method for real-time data acquisition and time sequence of equipment operation
CN112632127A (en) * 2020-12-29 2021-04-09 国华卫星数据科技有限公司 Data processing method for real-time data acquisition and time sequence of equipment operation
CN113609085A (en) * 2021-08-18 2021-11-05 希尔塔(苏州)信息技术有限公司 Automobile data rapid storage method
CN113609085B (en) * 2021-08-18 2023-08-15 希尔塔(苏州)信息技术有限公司 Automobile data quick storage method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180511