CN110765201A - Method and system for processing time series data under PostgreSQL database construction - Google Patents

Method and system for processing time series data under PostgreSQL database construction

Info

Publication number
CN110765201A
Authority
CN
China
Prior art keywords
data
database
time series
postgresql
series data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910869189.0A
Other languages
Chinese (zh)
Inventor
缪燕
王伟
权超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing Xuji Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing Xuji Electric Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910869189.0A priority Critical patent/CN110765201A/en
Publication of CN110765201A publication Critical patent/CN110765201A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 Indexing structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2474 Sequence data queries, e.g. querying versioned data
    • G06F 16/25 Integrating or interfacing systems involving database management systems
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for processing time-series data in a PostgreSQL database, comprising the following steps: creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides; when data is imported into the database, storing the data in chunks; automatically scaling the stored data horizontally; and batching writes efficiently so that data is inserted into the database in small batches. The concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion; indexes for the most recent time interval are kept in memory rather than swapped to disk.

Description

Method and system for processing time series data under PostgreSQL database construction
Technical Field
The invention belongs to the field of information technology, and discloses a method and a system for processing time-series data in a PostgreSQL database.
Background
With the wide application of information technology, time series data appears in more and more scenarios, such as monitoring and operations, hydrological monitoring, equipment monitoring in factories, monitoring of data related to national security, communications monitoring, financial index data, sensor data, application usage data, and so on. Such data is typically large in volume and complex in structure, which means that a time-series store must handle both scale and complex queries. Existing relational databases struggle to provide both properties at the same time.
Disclosure of Invention
To address the inability of prior-art relational databases to handle time-series data efficiently, the invention aims to provide a method and a system for processing time-series data in a PostgreSQL database, thereby extending the capabilities of the relational database.
In order to solve the above problem, an embodiment of the present invention provides a method for processing time series data in a PostgreSQL database, comprising:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
Further, the hypertable is automatically placed on a single database node or replicated to multiple nodes. An illustrative SQL sketch of the time-and-space chunking follows.
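As a minimal SQL sketch of this time-and-space chunking (the sensor_data table and its columns are hypothetical, and the create_hypertable and add_dimension calls are those of the TimescaleDB extension whose installation is described in the embodiment below), a hypertable could be configured roughly as follows:
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE sensor_data (
  time      TIMESTAMPTZ NOT NULL,
  device_id TEXT        NOT NULL,
  value     DOUBLE PRECISION
);
-- chunk the hypertable by a one-day time interval
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => INTERVAL '1 day');
-- add a space dimension, so each chunk covers a (time interval, device_id partition) region
SELECT add_dimension('sensor_data', 'device_id', number_partitions => 4);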
In order to solve the above problem, the present invention further provides a system for processing time series data in a PostgreSQL database, which is configured to synchronize two or more mutually backed-up databases, and which comprises a memory storing at least one instruction and a processor executing the instruction, wherein the instruction is configured for:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
The technical scheme of this embodiment of the invention supports building indexes on PostgreSQL columns of any type, from conventional text or numeric columns to specialized array data types or GIS columns; it provides full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically; and, provided the chunk size limit is not exceeded, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks. An indexing sketch follows.
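As an illustration of indexing beyond plain text or numeric columns, the sketch below (the readings table and its columns are hypothetical, and the geometry column assumes the PostGIS extension is installed) builds an index on an array column and on a GIS column:
CREATE TABLE readings (
  time      TIMESTAMPTZ NOT NULL,
  device_id TEXT        NOT NULL,
  tags      TEXT[],
  location  geometry  -- assumes the PostGIS extension
);
CREATE INDEX readings_tags_idx ON readings USING GIN (tags);          -- index on an array column
CREATE INDEX readings_location_idx ON readings USING GIST (location); -- index on a GIS (PostGIS geometry) column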
Drawings
Fig. 1 is a system structure topology diagram according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a method for processing time-series data in a PostgreSQL database, comprising the following steps: creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides; when data is imported into the database, storing the data in chunks; automatically scaling the stored data horizontally; and batching writes efficiently so that data is inserted into the database in small batches. The concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion; indexes for the most recent time interval (where the bulk of writes occur) are kept in memory rather than swapped to disk. Indexes can be built on PostgreSQL columns of any type, from conventional text or numeric columns to specialized array data types or GIS columns. Full transaction support is provided for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically. Provided the chunk size limit is not exceeded, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks.
Specifically, an embodiment of the present invention provides a method for processing time series data in a PostgreSQL database, comprising:
creating a time-series data processing plug-in in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, storing the data in chunks, and scaling the stored data horizontally;
allowing TimescaleDB to batch writes efficiently and insert them into the database in small batches; the concentration of time-series write load on the most recent time interval is exploited to write in parallel across servers or disks, further supporting high-speed data ingestion;
keeping indexes for the most recent time interval (where the bulk of writes occur) in memory rather than swapping them to disk, and supporting indexes on any type of PostgreSQL column, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, possibly involving multiple independent sensor metrics, are inserted atomically;
and, provided the chunk size limit is not exceeded, allowing a modest amount of late-arriving data to be inserted without creating additional, smaller chunks.
In order to solve the above problem, an embodiment of the present invention provides a system for processing time series data in a PostgreSQL database, constructed for processing time-series data in the relational database PostgreSQL, and comprising:
a time-series data processing plug-in, created in the relational database PostgreSQL in which the time-series data to be processed resides;
a data sharding module, which stores the data in chunks when data is imported into the database;
a horizontal scaling module, which automatically scales the stored data horizontally;
a batch commit module, which batches writes efficiently and inserts them into the database in small batches, exploiting the concentration of time-series write load on the most recent time interval to write in parallel across servers or disks and further support high-speed data ingestion;
an in-memory index module, which keeps indexes for the most recent time interval (where the bulk of writes occur) in memory rather than swapping them to disk, and supports indexes on any type of PostgreSQL column, from conventional text or numeric columns to specialized array data types or GIS columns;
a transaction support module, which provides full transaction support for entries sharing the same partition key, ensuring that in a monitoring application each device has transactional semantics and that multiple device measurements, which may involve multiple independent sensor metrics, are inserted atomically;
and a delayed-data backfill module, configured so that, without exceeding the chunk size limit, a modest amount of late-arriving data can be inserted without creating additional, smaller chunks. A batched, transactional insert is sketched below.
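As an illustration of batch commit with per-device transactional semantics, the sketch below (the measurements hypertable and its columns are hypothetical) atomically inserts several measurements, spanning independent sensor metrics, for a single device in one transaction:
-- hypothetical hypertable: measurements(time TIMESTAMPTZ, device_id TEXT, metric TEXT, value DOUBLE PRECISION)
BEGIN;
INSERT INTO measurements (time, device_id, metric, value) VALUES
  (now(), 'device-42', 'temperature', 21.5),
  (now(), 'device-42', 'humidity',    48.0),
  (now(), 'device-42', 'voltage',      3.3);
COMMIT;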
The invention is further illustrated below by means of a specific example.
Step one: the software implementing the method of this embodiment (hereinafter referred to as TimescaleDB) is built as a PostgreSQL plug-in extension, so the following installation packages need to be installed:
cmake-3.10.2.tar.gz
TimescaleDB-1.0.0.tar.gz
These installation packages then need to be compiled and installed.
Step two: obtain the CMake package from the CMake website, unpack cmake-3.10.2.tar.gz, and build and install it:
./bootstrap
gmake
gmake install
After CMake is installed, decompress TimescaleDB-1.0.0.tar.gz and run the following in the extracted directory:
./bootstrap
cd ./build && make
make install
Step three: configure TimescaleDB for this embodiment.
In the postgresql.conf file in the postgres user's data directory, set the following parameter:
shared_preload_libraries='timescaledb'
Restart the database for the setting to take effect:
pg_ctl restart -m fast
Step four: create the extension in each database that needs to use TimescaleDB.
create extension timescaledb;
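As an optional check not part of the original embodiment, the installed extension can be confirmed from the PostgreSQL system catalog:
select extname, extversion from pg_extension where extname = 'timescaledb';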
Step five: after the above steps are complete, create an example table:
(The example-table SQL appears as an image in the original publication, Figure BDA0002202257220000061.)
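The content of that image is not recoverable from the text. As a hedged illustration only (the conditions table and its columns are hypothetical), an example table could be created and converted into a hypertable partitioned by time and by device as follows:
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);
-- convert the plain table into a hypertable, partitioned by time and into 4 device_id partitions
SELECT create_hypertable('conditions', 'time', 'device_id', 4);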
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A method for processing time series data in a PostgreSQL database, characterized by comprising the following steps:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
2. The method for processing time series data in a PostgreSQL database according to claim 1, wherein the hypertable is automatically placed on a single database node or replicated to a plurality of nodes.
3. The method for processing time series data in a PostgreSQL database according to claim 1, wherein the hypertable is automatically placed on a single database node or replicated to a plurality of nodes.
4. A system for processing time series data in a PostgreSQL database, comprising a memory for storing at least one instruction and a processor for executing the instruction, wherein the instruction is configured for:
creating a time-series data extension, via SQL statements, in the relational database PostgreSQL in which the time-series data to be processed resides;
when data is imported into the database, automatically storing the data in chunks by time and space: the hypertable is automatically divided into chunks, where each chunk corresponds to a two-dimensional region defined by a designated time interval and a range of the partition key; a single database deployment comprises one or more hypertables, whose table structures may be the same or different;
automatically partitioning the table in time and space according to the table's size and time range, thereby completing horizontal scaling;
automatically committing data insertions in batches; the extension exploits the fact that time-series write load concentrates on the most recent time interval, and writes in parallel across servers or disks to further support high-speed data ingestion;
keeping the indexes for the most recent time interval in memory rather than swapping them to disk; index creation depends on data types and inherits the data types supported by the PostgreSQL database, from conventional text or numeric columns to specialized array data types or GIS columns;
providing full transaction support for entries sharing the same partition key, so that in a monitoring application each device is guaranteed transactional semantics and multiple device measurements, involving multiple independent sensor metrics, are inserted atomically;
and handling delayed data writes, configured so that backfilled late-arriving data does not exceed the chunk size limit, allowing a predetermined amount of delayed data to be inserted.
CN201910869189.0A 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction Pending CN110765201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910869189.0A CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910869189.0A CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Publications (1)

Publication Number Publication Date
CN110765201A true CN110765201A (en) 2020-02-07

Family

ID=69329792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910869189.0A Pending CN110765201A (en) 2019-09-16 2019-09-16 Method and system for processing time series data under PostgreSQL database construction

Country Status (1)

Country Link
CN (1) CN110765201A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302869A (en) * 2015-09-29 2016-02-03 烽火通信科技股份有限公司 HBase secondary index query and storage system and query method
CN106548035A (en) * 2016-11-24 2017-03-29 腾讯科技(深圳)有限公司 A kind of diagnostic method and device of data exception
CN107733696A (en) * 2017-09-26 2018-02-23 南京天数信息科技有限公司 A kind of machine learning and artificial intelligence application all-in-one dispositions method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694860A (en) * 2020-04-28 2020-09-22 国家计算机网络与信息安全管理中心 Safety detection time sequence data real-time abnormity discovery method and electronic device

Similar Documents

Publication Publication Date Title
US11093466B2 (en) Incremental out-of-place updates for index structures
US10482154B2 (en) Distributed processing of data records
Ahn et al. ForestDB: A fast key-value storage system for variable-length string keys
US9418094B2 (en) Method and apparatus for performing multi-stage table updates
CN106844089B (en) Method and equipment for recovering tree data storage
WO2013123831A1 (en) Intelligent data archiving
Kleppmann Designing data-intensive applications
CN110134335A (en) A kind of RDF data management method, device and storage medium based on key-value pair
CN114329096A (en) Method and system for processing native map database
WO2015168988A1 (en) Data index creation method and device, and computer storage medium
JP2015528957A (en) Distributed file system, file access method, and client device
CN113821573A (en) Mass data rapid retrieval service construction method, system, terminal and storage medium
US10776368B1 (en) Deriving cardinality values from approximate quantile summaries
CN104598652B (en) A kind of data base query method and device
US20070174329A1 (en) Presenting a reason why a secondary data structure associated with a database needs rebuilding
WO2017156855A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
CN103970844A (en) Big data write-in method and device, big data read method and device and big data processing system
CN110765201A (en) Method and system for processing time series data under PostgreSQL database construction
CN108256019A (en) Database key generation method, device, equipment and its storage medium
CN110019192B (en) Database retrieval method and device
US10606835B2 (en) Managing data obsolescence in relational databases
CN110110034A (en) A kind of RDF data management method, device and storage medium based on figure
CA3000042C (en) Apparatus and method for managing storage of primary database and replica database
Bradberry et al. Practical Cassandra: a developer's approach
Kaplanis et al. HB+ tree: use hadoop and HBase even your data isn't that big

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination