CN110765201A - Method and system for processing time series data under PostgreSQL database construction - Google Patents

Method and system for processing time series data under PostgreSQL database construction

Info

Publication number
CN110765201A
Authority
CN
China
Prior art keywords
data
database
time series
postgresql
series data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910869189.0A
Other languages
Chinese (zh)
Inventor
缪燕
王伟
权超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing Xuji Electric Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910869189.0A priority Critical patent/CN110765201A/en
Publication of CN110765201A publication Critical patent/CN110765201A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 Indexing structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2474 Sequence data queries, e.g. querying versioned data
    • G06F 16/25 Integrating or interfacing systems involving database management systems
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for processing time-series data in a PostgreSQL database, comprising the following steps: creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides; when data is imported into the database, storing the data in chunks; automatically scaling the stored data horizontally; and batching writes efficiently so that data is inserted into the database in small batches. The concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion; indexes for the most recent time interval are kept in memory rather than swapped to disk.

Description

Method and system for processing time series data under PostgreSQL database construction
Technical Field
The invention belongs to the field of information technology, and discloses a method and a system for processing time-series data in a PostgreSQL database.
Background
With the wide application of information technology, time series data appears in more and more scenarios, such as monitoring and operations, hydrological monitoring, equipment monitoring in factories, monitoring of data related to national security, communications monitoring, financial index data, sensor data, application usage data, and so on. Such data is typically large in volume and complex in structure, which means that a time-series store must handle both scale and complex queries. Existing relational databases struggle to provide both properties at the same time.
Disclosure of Invention
To address the inability of prior-art relational databases to handle time-series data efficiently, the invention aims to provide a method and a system for processing time-series data in a PostgreSQL database, thereby extending the capabilities of the relational database.
In order to solve the above problem, an embodiment of the present invention provides a method for processing time series data in a PostgreSQL database, comprising:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
Further, the hypertable is automatically placed on a single database node or replicated to multiple nodes. An illustrative SQL sketch of the time-and-space chunking follows.
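As a minimal SQL sketch of this time-and-space chunking (the sensor_data table and its columns are hypothetical, and the create_hypertable and add_dimension calls are those of the TimescaleDB extension whose installation is described in the embodiment below), a hypertable could be configured roughly as follows:
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE sensor_data (
  time      TIMESTAMPTZ NOT NULL,
  device_id TEXT        NOT NULL,
  value     DOUBLE PRECISION
);
-- chunk the hypertable by a one-day time interval
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => INTERVAL '1 day');
-- add a space dimension, so each chunk covers a (time interval, device_id partition) region
SELECT add_dimension('sensor_data', 'device_id', number_partitions => 4);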
In order to solve the above problem, the present invention further provides a system for processing time series data in a PostgreSQL database, which is configured to synchronize two or more mutually backed-up databases, and which comprises a memory storing at least one instruction and a processor executing the instruction, wherein the instruction is configured for:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
The technical scheme of this embodiment of the invention supports building indexes on PostgreSQL columns of any type, from conventional text or numeric columns to specialized array data types or GIS columns; it provides full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically; and, provided the chunk size limit is not exceeded, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks. An indexing sketch follows.
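As an illustration of indexing beyond plain text or numeric columns, the sketch below (the readings table and its columns are hypothetical, and the geometry column assumes the PostGIS extension is installed) builds an index on an array column and on a GIS column:
CREATE TABLE readings (
  time      TIMESTAMPTZ NOT NULL,
  device_id TEXT        NOT NULL,
  tags      TEXT[],
  location  geometry  -- assumes the PostGIS extension
);
CREATE INDEX readings_tags_idx ON readings USING GIN (tags);          -- index on an array column
CREATE INDEX readings_location_idx ON readings USING GIST (location); -- index on a GIS (PostGIS geometry) column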
Drawings
Fig. 1 is a system structure topology diagram according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a method for processing time-series data in a PostgreSQL database, comprising the following steps: creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides; when data is imported into the database, storing the data in chunks; automatically scaling the stored data horizontally; and batching writes efficiently so that data is inserted into the database in small batches. The concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion; indexes for the most recent time interval (where the bulk of writes occur) are kept in memory rather than swapped to disk. Indexes can be built on PostgreSQL columns of any type, from conventional text or numeric columns to specialized array data types or GIS columns. Full transaction support is provided for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically. Provided the chunk size limit is not exceeded, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks.
Specifically, an embodiment of the present invention provides a method for processing time series data in a PostgreSQL database, comprising:
creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, storing the data in chunks, and scaling the stored data horizontally;
allowing TimescaleDB to batch writes efficiently and insert them into the database in small batches; the concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion;
keeping indexes for the most recent time interval (where the bulk of writes occur) in memory rather than swapping them to disk, and supporting indexes on any type of PostgreSQL column, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically;
and, provided the chunk size limit is not exceeded, allowing a modest amount of late-arriving data to be inserted without creating additional, smaller chunks.
In order to solve the above problem, an embodiment of the present invention provides a system for processing time series data in a PostgreSQL database, constructed for processing time-series data in the relational database PostgreSQL, and comprising:
a time-series data processing plug-in, created in the relational database PostgreSQL in which the time-series data to be processed resides;
a data sharding module, which stores the data in chunks when data is imported into the database;
a horizontal scaling module, which automatically scales the stored data horizontally;
a batch commit module, which batches writes efficiently and inserts them into the database in small batches, exploiting the concentration of time-series write load on the most recent time interval to write in parallel across servers or disks and further support high-speed data ingestion;
an in-memory index module, which keeps indexes for the most recent time interval (where the bulk of writes occur) in memory rather than swapping them to disk, and supports indexes on any type of PostgreSQL column, from conventional text or numeric columns to specialized array data types or GIS columns;
a transaction support module, which provides full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, which may involve multiple independent sensor metrics, are inserted atomically;
and a delayed-data backfill module, configured so that, without exceeding the chunk size limit, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks. A batched, transactional insert is sketched below.
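As an illustration of batch commit with per-device transactional semantics, the sketch below (the measurements hypertable and its columns are hypothetical) atomically inserts several measurements, spanning independent sensor metrics, for a single device in one transaction:
-- hypothetical hypertable: measurements(time TIMESTAMPTZ, device_id TEXT, metric TEXT, value DOUBLE PRECISION)
BEGIN;
INSERT INTO measurements (time, device_id, metric, value) VALUES
  (now(), 'device-42', 'temperature', 21.5),
  (now(), 'device-42', 'humidity',    48.0),
  (now(), 'device-42', 'voltage',      3.3);
COMMIT;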
The invention is further illustrated below by means of a specific example.
Step one: the software implementing the method of this embodiment (hereinafter referred to as TimescaleDB) is built as a PostgreSQL plug-in extension, so the following installation packages need to be installed:
cmake-3.10.2.tar.gz
TimescaleDB-1.0.0.tar.gz
These installation packages then need to be compiled and installed.
Step two: obtain the CMake package from the CMake website, unpack cmake-3.10.2.tar.gz, and build and install it:
./bootstrap
gmake
gmake install
After CMake is installed, decompress TimescaleDB-1.0.0.tar.gz and run the following in the extracted directory:
./bootstrap
cd ./build && make
make install
Step three: configure TimescaleDB for this embodiment.
In the postgresql.conf file in the postgres user's data directory, set the following parameter:
shared_preload_libraries='timescaledb'
Restart the database for the setting to take effect:
pg_ctl restart -m fast
Step four: create the extension in each database that needs to use TimescaleDB.
create extension timescaledb;
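As an optional check not part of the original embodiment, the installed extension can be confirmed from the PostgreSQL system catalog:
select extname, extversion from pg_extension where extname = 'timescaledb';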
Step five: after the above steps are complete, create an example table:
(The example-table SQL appears as an image in the original publication, Figure BDA0002202257220000061.)
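The content of that image is not recoverable from the text. As a hedged illustration only (the conditions table and its columns are hypothetical), an example table could be created and converted into a hypertable partitioned by time and by device as follows:
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);
-- convert the plain table into a hypertable, partitioned by time and into 4 device_id partitions
SELECT create_hypertable('conditions', 'time', 'device_id', 4);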
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A method for processing time series data in a PostgreSQL database, characterized by comprising the following steps:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
2. The method for processing time series data in a PostgreSQL database according to claim 1, wherein the hypertable is automatically placed on a single database node or replicated to a plurality of nodes.
3. The method for processing time series data in a PostgreSQL database according to claim 1, wherein the hypertable is automatically placed on a single database node or replicated to a plurality of nodes.
4. A system for processing time series data in a PostgreSQL database, comprising a memory for storing at least one instruction and a processor for executing the instruction, wherein the instruction is configured for:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
CN201910869189.0A 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction Pending CN110765201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910869189.0A CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910869189.0A CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Publications (1)

Publication Number Publication Date
CN110765201A true CN110765201A (en) 2020-02-07

Family

ID=69329792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910869189.0A Pending CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Country Status (1)

Country Link
CN (1) CN110765201A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302869A (en) * 2015-09-29 2016-02-03 烽火通信科技股份有限公司 HBase secondary index query and storage system and query method
CN106548035A (en) * 2016-11-24 2017-03-29 腾讯科技(深圳)有限公司 A kind of diagnostic method and device of data exception
CN107733696A (en) * 2017-09-26 2018-02-23 南京天数信息科技有限公司 A kind of machine learning and artificial intelligence application all-in-one dispositions method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694860A (en) * 2020-04-28 2020-09-22 国家计算机网络与信息安全管理中心 Safety detection time sequence data real-time abnormity discovery method and electronic device

Similar Documents

Publication Publication Date Title
US11093466B2 (en) Incremental out-of-place updates for index structures
US10482154B2 (en) Distributed processing of data records
Ahn et al. ForestDB: A fast key-value storage system for variable-length string keys
US9418094B2 (en) Method and apparatus for performing multi-stage table updates
CN106844089B (en) Method and equipment for recovering tree data storage
WO2013123831A1 (en) Intelligent data archiving
Kleppmann Designing data-intensive applications
CN110134335A (en) A kind of RDF data management method, device and storage medium based on key-value pair
CN114329096A (en) Method and system for processing native map database
WO2015168988A1 (en) Data index creation method and device, and computer storage medium
JP2015528957A (en) Distributed file system, file access method, and client device
CN113821573A (en) Mass data rapid retrieval service construction method, system, terminal and storage medium
US10776368B1 (en) Deriving cardinality values from approximate quantile summaries
CN104598652B (en) A kind of data base query method and device
US20070174329A1 (en) Presenting a reason why a secondary data structure associated with a database needs rebuilding
WO2017156855A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
CN103970844A (en) Big data write-in method and device, big data read method and device and big data processing system
CN110765201A (en) Method and system for processing time series data under PostgreSQL database construction
CN108256019A (en) Database key generation method, device, equipment and its storage medium
CN110019192B (en) Database retrieval method and device
US10606835B2 (en) Managing data obsolescence in relational databases
CN110110034A (en) A kind of RDF data management method, device and storage medium based on figure
CA3000042C (en) Apparatus and method for managing storage of primary database and replica database
Bradberry et al. Practical Cassandra: a developer's approach
Kaplanis et al. HB+ tree: use hadoop and HBase even your data isn't that big

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination