CN111930751A - Time sequence data storage method and device - Google Patents

Info

Publication number
CN111930751A
Authority
CN
China
Prior art keywords
data
sub
column
time sequence
written
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010891609.8A
Other languages
Chinese (zh)
Inventor
张艳清
杨尧
陈博
胥莉君
鲁燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-31
Filing date
2020-08-31
Publication date
2020-11-13
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN202010891609.8A
Publication of CN111930751A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2315 Optimistic concurrency control
    • G06F16/2322 Optimistic concurrency control using timestamps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Abstract

The invention discloses a time-series data storage method and device, which mainly address a problem in the prior art: a time-series database that directly compresses large volumes of data can easily crash the server. The storage method comprises the following steps: splitting the original data into different data sets and writing them into different sub-tables; and writing the sub-tables to the time-series database. With this scheme, measurement data with tens of thousands of columns is split into different sub-tables for compressed storage; in addition, a storage format for sub-table metadata is provided so that the stored sub-tables can be managed effectively.

Description

Time sequence data storage method and device
Technical Field
The invention relates to the technical field of data storage and query, and in particular to a method and device for storing time-series data.
Background
A time-series database is mainly used to process time-stamped data, i.e. data that changes over time; such time-stamped data is also called time-series data. When an existing time-series database writes data with tens of thousands of columns, it compresses the data directly. Because a single row of raw data can be very large, even a compressed batch remains massive, which puts heavy pressure on the server and can easily crash it.
Disclosure of Invention
The invention aims to provide a time-series data storage method and device that solve the problem of conventional time-series databases, in which directly compressing large volumes of data can easily crash the server.
In order to solve the above problems, the present invention provides the following technical solutions:
A time-series data storage method comprises the following steps: splitting the original data into different data sets and writing them into different sub-tables; storing the metadata information of the sub-tables in the metadata of the original data; and writing the sub-tables to the time-series database.
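The three steps can be sketched end to end as below. Everything here is hypothetical: the `Table` dataclass, the `store` function, the `_subN` naming scheme, the `"subtables"` metadata entry and the plain dict standing in for the time-series database. For brevity the metric columns are cut into contiguous groups of `cols_per_subtable` (100 by default, an assumed value); the hash-based column ordering is a separate refinement described further on.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a time-series table."""
    name: str
    index_cols: List[str]
    time_col: str
    metric_cols: List[str]
    rows: List[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def store(original: Table, tsdb: Dict[str, List[dict]], cols_per_subtable: int = 100) -> None:
    # Step 1: split the metric columns into groups and build one sub-table per group;
    # every sub-table keeps the index columns and the time column.
    groups = [original.metric_cols[i:i + cols_per_subtable]
              for i in range(0, len(original.metric_cols), cols_per_subtable)]
    subtables = []
    for n, cols in enumerate(groups, start=1):
        keep = original.index_cols + [original.time_col] + cols
        subtables.append(Table(f"{original.name}_sub{n}", original.index_cols,
                               original.time_col, cols,
                               rows=[{c: r[c] for c in keep} for r in original.rows]))
    # Step 2: record each sub-table's metadata inside the original table's metadata.
    original.metadata["subtables"] = {s.name: s.metric_cols for s in subtables}
    # Step 3: write every sub-table to the (stand-in) time-series database.
    for s in subtables:
        tsdb[s.name] = s.rows
```

With 250 metric columns and the default group size, this produces three sub-tables holding 100, 100 and 50 metric columns.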
A conventional time-series database generally writes and reads data with tens of thousands of columns as follows:
Writing: a fixed-length time period is extracted according to the timestamp, and the data in that period is compressed and written into the KV engine.
Reading: the column to be read is computed from the specified timestamp, and the corresponding time-series data block is then read from the KV engine.
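The conventional write and read paths can be sketched as follows. This is a minimal in-memory illustration, not the behavior of any particular database: a Python dict stands in for the KV engine, rows are dicts, and the one-hour period `WINDOW_MS`, the key layout `(series, period_start)`, JSON serialization and `zlib` compression are all assumptions. What it shows is that every column of every row in a period ends up in one compressed payload.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

WINDOW_MS = 3_600_000  # assumed fixed period length: one hour in milliseconds


def window_start(ts_ms: int) -> int:
    """Align a timestamp to the start of its fixed-length period."""
    return ts_ms - ts_ms % WINDOW_MS


def write_rows(kv: Dict[Tuple[str, int], bytes], series: str, rows: List[dict]) -> None:
    """Group full-width rows by period, compress each period, store it under (series, start)."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        buckets[window_start(row["ts"])].append(row)
    for start, bucket in buckets.items():
        # Every column of every row in the period is compressed together; with
        # tens of thousands of columns this single payload becomes very large.
        # Note: a second batch for the same period overwrites the first here.
        kv[(series, start)] = zlib.compress(json.dumps(bucket).encode("utf-8"))


def read_row(kv: Dict[Tuple[str, int], bytes], series: str, ts_ms: int) -> Optional[dict]:
    """Derive the period from the timestamp, fetch that block and scan it for the row."""
    block = kv.get((series, window_start(ts_ms)))
    if block is None:
        return None
    for row in json.loads(zlib.decompress(block)):
        if row["ts"] == ts_ms:
            return row
    return None
```

Even a read of a single metric has to decompress the whole period block, which is the read amplification the invention aims to reduce.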
In this method, before data is written, it is split by measurement column and written into different sub-tables, so that each row written to a sub-table holds about one percent of the data in an original row. This reduces the total amount of compressed data, greatly reduces write overhead and improves the overall throughput of the time-series database, which is especially useful for offline analysis of time-series data. The metadata information of the sub-tables is stored in the metadata of the original data, providing a persistent metadata structure through which the sub-tables can be managed from the data table of the original data during reads. Because the sub-table that holds a given column name can be located directly from the sub-table metadata, read amplification is reduced. A storage format for sub-table metadata is thus provided to manage sub-table storage efficiently.
Further, the measurement columns of the original data are sorted by the hash of their column names and then split into measurement-column subsets, each holding a set number of measurement columns, so that measurement data with tens of thousands of columns is split into different sub-tables for compressed storage.
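A sketch of the hash-ordered split, assuming `zlib.crc32` as the hash (the source does not name one) and a hypothetical helper `split_metric_columns`. A stable hash matters here: Python's built-in `hash()` on strings is salted per process, so it would give a different column-to-subset assignment after every restart.

```python
import zlib
from typing import List


def split_metric_columns(metric_cols: List[str], per_subset: int) -> List[List[str]]:
    """Sort metric columns by a stable hash of their names, then cut into fixed-size subsets."""
    if per_subset <= 0:
        raise ValueError("per_subset must be positive")
    # crc32 is deterministic across processes; the column name breaks hash ties.
    ordered = sorted(metric_cols, key=lambda c: (zlib.crc32(c.encode("utf-8")), c))
    return [ordered[i:i + per_subset] for i in range(0, len(ordered), per_subset)]
```

With 10,000 metric columns and 100 columns per subset this yields 100 subsets, so each sub-table row carries 1% of the metric values, consistent with the one-percent figure given earlier.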
Further, the original data comprises an index column set and a time column; the index column set, the time column and each measurement-column subset are combined into new data, which is written into a sub-table.
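Combining the index columns, the time column and one metric subset into new rows can be sketched as follows. The row-as-dict representation, the function name `build_subtables` and numbering sub-tables from 1 are assumptions; so is omitting metrics that are absent from a row, since the source does not say how gaps are handled.

```python
from typing import Dict, List


def build_subtables(rows: List[dict], index_cols: List[str], time_col: str,
                    subsets: List[List[str]]) -> Dict[int, List[dict]]:
    """Combine index columns + time column + one metric subset into each sub-table's rows."""
    tables: Dict[int, List[dict]] = {}
    for n, subset in enumerate(subsets, start=1):
        tables[n] = [
            {
                **{c: row[c] for c in index_cols},          # index column set
                time_col: row[time_col],                     # time column
                **{c: row[c] for c in subset if c in row},  # this metric subset only
            }
            for row in rows
        ]
    return tables
```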
Further, after data is written into a sub-table, a key is extracted according to the set tags, and the data under each extracted key is compressed by timestamp to generate a Compress_Block.
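One possible reading of the key-extraction and compression step, sketched under explicit assumptions: the key is built from the sub-table name plus the values of the set tags in a hypothetical `name|tag=value,...` format, each key's rows are sorted by timestamp and cut into blocks of `POINTS_PER_BLOCK` points (an invented size), and each block is serialized as JSON and compressed with `zlib`. The source fixes none of these choices. Each block is returned together with its last timestamp, which the next step uses as the column name.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

POINTS_PER_BLOCK = 1000  # hypothetical block size; the source does not specify one


def series_key(sub_table: str, row: dict, tags: List[str]) -> str:
    """Build the key from the sub-table name and the set tags' values (assumed format)."""
    return sub_table + "|" + ",".join(f"{t}={row[t]}" for t in tags)


def make_compress_blocks(sub_table: str, rows: List[dict], tags: List[str],
                         time_col: str) -> Dict[str, List[Tuple[int, bytes]]]:
    """Return {key: [(last_timestamp, Compress_Block), ...]} with blocks in time order."""
    per_key: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        per_key[series_key(sub_table, row, tags)].append(row)
    out = {}
    for key, key_rows in per_key.items():
        key_rows.sort(key=lambda r: r[time_col])  # compress in timestamp order
        blocks = []
        for i in range(0, len(key_rows), POINTS_PER_BLOCK):
            chunk = key_rows[i:i + POINTS_PER_BLOCK]
            payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
            blocks.append((chunk[-1][time_col], payload))
        out[key] = blocks
    return out
```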
Further, the key and the corresponding data block are written into the KV engine as a key-value pair, and the last timestamp of each Compress_Block is used as its column name.
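Using the last timestamp as the column name makes time-range reads cheap in a wide-column KV store, where the columns of a row key are kept sorted. The toy `WideColumnKV` class below is a hypothetical stand-in for the KV engine, not the API of any real store. It assumes that the blocks of one key do not overlap in time, so the block named `last_ts` covers the interval from the previous block's last timestamp (exclusive) to `last_ts` (inclusive).

```python
import bisect
from typing import Dict, List


class WideColumnKV:
    """Toy KV engine: row key -> columns sorted by name (the block's last timestamp)."""

    def __init__(self) -> None:
        self._cols: Dict[str, List[int]] = {}
        self._vals: Dict[str, List[bytes]] = {}

    def put(self, key: str, last_ts: int, block: bytes) -> None:
        cols = self._cols.setdefault(key, [])
        vals = self._vals.setdefault(key, [])
        i = bisect.bisect_left(cols, last_ts)
        if i < len(cols) and cols[i] == last_ts:
            vals[i] = block  # same column name: overwrite the block
        else:
            cols.insert(i, last_ts)
            vals.insert(i, block)

    def scan(self, key: str, t0: int, t1: int) -> List[bytes]:
        """Return the blocks that may contain points in [t0, t1]."""
        cols = self._cols.get(key, [])
        vals = self._vals.get(key, [])
        out = []
        # Skip every block whose last timestamp is before t0.
        for i in range(bisect.bisect_left(cols, t0), len(cols)):
            out.append(vals[i])
            if cols[i] >= t1:
                break  # all later blocks start after this last_ts, hence after t1
        return out
```

Under that assumption, a range query needs one binary search to find the first candidate block and can stop at the first block whose name reaches the end of the range.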
A time-series data storage device comprises:
a memory, for storing executable instructions; and
a processor, for executing the executable instructions stored in the memory to implement the above time-series data storage method.
Compared with the prior art, the invention has the following beneficial effects:
(1) Before data is written, the invention splits it by measurement column into different sub-tables, so that each row written to a sub-table holds about one percent of the data in an original row. This reduces the total amount of compressed data, greatly reduces write overhead and improves the overall throughput of the time-series database, which is especially useful for offline analysis of time-series data.
(2) The metadata information of the sub-tables is stored in the metadata of the original data, providing a persistent metadata structure through which the sub-tables can be managed from the data table of the original data during reads. Because the sub-table that holds a given column name can be located directly from the sub-table metadata, read amplification is reduced; the storage format for sub-table metadata thus manages sub-table storage efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts, wherein:
FIG. 1 is a diagram of data being split into sub-tables.
FIG. 2 is a diagram of data being written into a sub-table.
FIG. 3 is a diagram of metadata management.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to FIGS. 1 to 3. The described embodiments should not be construed as limiting the invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the invention.
Example 1
A time-series data storage method comprises the following steps: splitting the original data into different data sets and writing them into different sub-tables; and writing the sub-tables into a time-series database. With this scheme, measurement data with tens of thousands of columns is split into different sub-tables for compressed storage, which reduces the total amount of compressed data, greatly reduces write overhead and improves the overall throughput of the time-series database. The sub-tables are written into the time-series database using the existing time-series write path, which is mature prior art and is therefore not described further here.
Example 2
As shown in FIG. 1, this embodiment builds on Example 1: the metric columns of the original data are sorted by the hash of their column names and then split into metric-column subsets according to a set number of metric columns. The original data comprises an index column set and a time column; the index column set, the time column and each metric-column subset are combined into new data, which is written into a sub-table. Because the data is split and recombined before it is written into the different sub-tables and compressed, each write carries less data and the total amount of data compressed at once is reduced.
Example 3
As shown in FIG. 2, in this embodiment, after data is written into a sub-table, keys are extracted according to the set tags, and the data under each extracted key is compressed by timestamp to generate a Compress_Block. The key and the corresponding data block are written into the KV engine as a key-value pair, and the last timestamp of each Compress_Block is used as its column name. Data is written into the sub-table in this way.
Example 4
On the basis of Example 1, this embodiment further stores the metadata information of the sub-tables in the metadata of the original data, and proposes a persistent metadata structure through which the sub-tables' metadata is managed from the data table of the original data during reads. Because the data table holds this persistent structure, it can manage the sub-tables by managing their metadata; in this way the invention provides a storage format for sub-table metadata that manages the stored sub-tables effectively. For example, sub-table 1 comprises the index column set, the measurement column set (M1-M100) and the time column; sub-table 2 comprises the index column set, the measurement column set (M101-M200) and the time column; sub-table 3 comprises the index column set, the measurement column set (M201-M300) and the time column; and so on. The metadata of sub-tables 1, 2, 3, … is stored in the metadata of the original data, so the corresponding sub-table can be read directly during a read without traversing the metadata of the original data. A read of M232, for instance, is routed directly to sub-table 3, which reduces read amplification and saves memory.
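The sub-table lookup in this example can be sketched as below. The JSON layout of the metadata, the function names `build_metadata` and `locate`, and the 100-columns-per-sub-table layout (taken from the M1-M100 example) are assumptions for illustration. `locate` determines which sub-tables a query must touch from the metadata alone, so a request for M232 goes straight to sub-table 3.

```python
import json
from typing import Dict, List


def build_metadata(table: str, subtables: Dict[int, List[str]]) -> str:
    """Serialize sub-table metadata into the original table's metadata (assumed JSON layout)."""
    return json.dumps({
        "table": table,
        "subtables": {str(n): cols for n, cols in subtables.items()},
    })


def locate(metadata_json: str, columns: List[str]) -> Dict[int, List[str]]:
    """Map each requested column to the sub-table that holds it, without touching data blocks."""
    meta = json.loads(metadata_json)
    # Reverse index column -> sub-table; rebuilt per call for brevity, could be cached.
    owner = {c: int(n) for n, cols in meta["subtables"].items() for c in cols}
    plan: Dict[int, List[str]] = {}
    for c in columns:
        plan.setdefault(owner[c], []).append(c)  # unknown columns raise KeyError
    return plan
```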
Example 5
As shown in FIG. 3, this embodiment builds on Example 1 and provides a time-series data storage device comprising: a memory, for storing executable instructions; and a processor, for executing the executable instructions stored in the memory to implement the time-series data storage method.
Example 6
In this embodiment, based on Example 1, after the time-series data has been stored, data is read as follows: first, the information about the relevant sub-tables is obtained from the metadata of the data table; next, the Compress_Blocks that satisfy the query condition are read from the KV engine according to the sub-table metadata; each Compress_Block is then decompressed to obtain a metric-column subset; finally, all the metric-column subsets are combined with the index column set and the time column to reassemble the original data. The stored data is read back through this process.
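The four read steps can be sketched as a single function. All names and data layouts are hypothetical: `meta` maps each sub-table to its metric columns, the dict `kv` stands in for the KV engine and maps `(sub_table, key)` to a time-ordered list of `(last_ts, Compress_Block)` pairs, and blocks are JSON compressed with `zlib`. Rows coming from different sub-tables are joined on the index-column values plus the timestamp.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layouts: sub-table name -> metric columns, and
# (sub_table, key) -> time-ordered [(last_ts, Compress_Block), ...].
Metadata = Dict[str, List[str]]
KV = Dict[Tuple[str, str], List[Tuple[int, bytes]]]


def read(meta: Metadata, kv: KV, key: str, columns: List[str], t0: int, t1: int,
         index_cols: List[str], time_col: str) -> List[dict]:
    """Read `columns` for `key` in [t0, t1] and reassemble full rows."""
    # Step 1: use the sub-table metadata to find which sub-tables hold the columns.
    plan = {sub: [c for c in cols if c in columns] for sub, cols in meta.items()}
    plan = {sub: cols for sub, cols in plan.items() if cols}
    merged: Dict[tuple, dict] = defaultdict(dict)
    for sub, cols in plan.items():
        # Step 2: fetch the Compress_Blocks that may overlap the query range.
        for last_ts, block in kv.get((sub, key), []):
            if last_ts < t0:
                continue  # this block ends before the range starts
            # Step 3: decompress to recover this sub-table's metric-column subset.
            for row in json.loads(zlib.decompress(block)):
                ts = row[time_col]
                if not t0 <= ts <= t1:
                    continue
                # Step 4: join subsets on (index values, timestamp) into full rows.
                rid = tuple(row[c] for c in index_cols) + (ts,)
                out = merged[rid]
                out.update({c: row[c] for c in index_cols})
                out[time_col] = ts
                out.update({c: row[c] for c in cols if c in row})
            if last_ts >= t1:
                break  # later blocks start after t1
    return sorted(merged.values(), key=lambda r: r[time_col])
```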
In the prior art, time-series data with tens of thousands of columns is compressed strictly by time period and read and written as a whole, which puts heavy pressure on the server and can easily crash it. The invention splits measurement data with tens of thousands of columns into different sub-tables for compressed storage and provides a storage format for sub-table metadata, so that the stored sub-tables are managed effectively, server pressure is reduced and read amplification is lowered. The method can be used for monitoring and analysis after data acquisition; it can be deployed on the company's big-data monitoring platform, or connected to a user's big-data platform as an independent time-series data monitoring and analysis node.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for storing time-series data, characterized by comprising the following steps:
splitting original data into different data sets and writing the data sets into different sub-tables;
storing the metadata information of the sub-tables in the metadata of the original data; and
writing the sub-tables to the time-series database.
2. The method according to claim 1, wherein the metric columns of the original data are sorted by the hash of their column names and then split into different metric-column subsets according to a set number of metric columns.
3. The method of claim 2, wherein the original data comprises an index column set and a time column, and wherein the index column set, the time column, and the metric column subset are combined into new data to be written into the sub-table.
4. The method according to claim 1, wherein after the data is written into the sub-table, a key is extracted according to a set tag, and the data under the extracted key is compressed according to timestamp to generate a Compress_Block.
5. The method according to claim 4, wherein the key and the corresponding data block are written into the KV engine as a key-value pair, and the last timestamp of each Compress_Block is used as its column name.
6. A time-series data storage device, characterized by comprising:
a memory, for storing executable instructions; and
a processor, for executing the executable instructions stored in the memory to implement the time-series data storage method according to any one of claims 1 to 5.
CN202010891609.8A 2020-08-31 2020-08-31 Time sequence data storage method and device Pending CN111930751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010891609.8A CN111930751A (en) 2020-08-31 2020-08-31 Time sequence data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010891609.8A CN111930751A (en) 2020-08-31 2020-08-31 Time sequence data storage method and device

Publications (1)

Publication Number Publication Date
CN111930751A 2020-11-13

Family

ID=73310152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010891609.8A Pending CN111930751A (en) 2020-08-31 2020-08-31 Time sequence data storage method and device

Country Status (1)

Country Link
CN (1) CN111930751A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632073A (en) * 2020-12-24 2021-04-09 第四范式(北京)技术有限公司 Storage method and device of time sequence characteristic data table
CN115098507A (en) * 2022-06-30 2022-09-23 东方合智数据科技(广东)有限责任公司 Storage method based on industrial internet data and related equipment
CN115357629A (en) * 2022-10-20 2022-11-18 成都宽邦科技有限公司 Processing method, system, electronic device and storage medium for financial data stream
CN116719876A (en) * 2023-08-11 2023-09-08 国网信息通信产业集团有限公司 Time sequence data processing method and terminal based on rule engine

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140237343A1 (en) * 2013-02-21 2014-08-21 International Business Machines Corporation Method and system for optimizing rendering of data tables
CN104133661A (en) * 2014-07-30 2014-11-05 西安电子科技大学 Multi-core parallel hash partitioning optimizing method based on column storage
US20150006485A1 (en) * 2013-06-26 2015-01-01 Eric Alan Christiansen High Scalability Data Management Techniques for Representing, Editing, and Accessing Data
CN105912666A (en) * 2016-04-12 2016-08-31 中国科学院软件研究所 Method for high-performance storage and inquiry of hybrid structure data aiming at cloud platform
CN108874873A (en) * 2018-04-26 2018-11-23 北京空间科技信息研究所 Data query method, apparatus, storage medium and processor
CN110494811A (en) * 2017-02-10 2019-11-22 江森自控科技公司 The building management system of declaratively view with time series data
CN111159176A (en) * 2019-11-29 2020-05-15 中国科学院计算技术研究所 Method and system for storing and reading mass stream data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632073A (en) * 2020-12-24 2021-04-09 第四范式(北京)技术有限公司 Storage method and device of time sequence characteristic data table
CN115098507A (en) * 2022-06-30 2022-09-23 东方合智数据科技(广东)有限责任公司 Storage method based on industrial internet data and related equipment
CN115098507B (en) * 2022-06-30 2023-08-18 东方合智数据科技(广东)有限责任公司 Storage method and related equipment based on industrial Internet data
CN115357629A (en) * 2022-10-20 2022-11-18 成都宽邦科技有限公司 Processing method, system, electronic device and storage medium for financial data stream
CN116719876A (en) * 2023-08-11 2023-09-08 国网信息通信产业集团有限公司 Time sequence data processing method and terminal based on rule engine
CN116719876B (en) * 2023-08-11 2023-10-20 国网信息通信产业集团有限公司 Time sequence data processing method and terminal based on rule engine

Similar Documents

Publication Publication Date Title
CN111930751A (en) Time sequence data storage method and device
CN109034993B (en) Account checking method, account checking equipment, account checking system and computer readable storage medium
CN110109923B (en) Time sequence data storage method, time sequence data analysis method and time sequence data analysis device
US6721749B1 (en) Populating a data warehouse using a pipeline approach
CN111291235A (en) Metadata storage method and device based on time sequence database
KR101708261B1 (en) Managing storage of individually accessible data units
CN111339103B (en) Data exchange method and system based on full-quantity fragmentation and incremental log analysis
US20160217158A1 (en) Image search method, image search system, and information recording medium
US20090248725A1 (en) Compressability estimation of non-unique indexes in a database management system
CN111309720A (en) Time sequence data storage method, time sequence data reading method, time sequence data storage device, time sequence data reading device, electronic equipment and storage medium
CN114077609B (en) Data storage and retrieval method, device, computer readable storage medium and electronic equipment
CN114722014B (en) Batch data time sequence transmission method and system based on database log file
CN111291047A (en) Space-time data storage method and device, storage medium and electronic equipment
CN111859070A (en) Mass internet news cleaning system
CN110995273A (en) Data compression method, device, equipment and medium for power database
CN111274454B (en) Spatio-temporal data processing method and device, electronic equipment and storage medium
CN113297208A (en) Data processing method and device
CN107169003B (en) Data association method and device
CN111008183A (en) Storage method and system for business wind control log data
CN107577809A (en) Offline small documents processing method and processing device
CN112306421B (en) Method and system for storing MDF file in analysis and measurement data format
CN115098029A (en) Data processing method and device
CN108153744A (en) A kind of data storage system maintenance method and device
CN106802922A (en) A kind of object-based storage system and method for tracing to the source
CN111753518A (en) Autonomous file consistency checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113