CN110765201A - Method and system for processing time series data under PostgreSQL database construction - Google Patents
Method and system for processing time series data under PostgreSQL database construction Download PDFInfo
- Publication number
- CN110765201A CN110765201A CN201910869189.0A CN201910869189A CN110765201A CN 110765201 A CN110765201 A CN 110765201A CN 201910869189 A CN201910869189 A CN 201910869189A CN 110765201 A CN110765201 A CN 110765201A
- Authority
- CN
- China
- Prior art keywords
- data
- database
- time series
- postgresql
- series data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for processing time sequence data under a PostgreSQL database, which comprises the following steps: creating a time series data processing plug-in a relational database PostgreSQL which determines time series data to be processed; when the database has data import operation, carrying out fragment storage on the data; automatically performing horizontal expansion on stored data; allowing efficient authoring and batch insertion into a batch of small quantities of databases. The advantage of time sequence data load in the recent time interval is utilized to carry out parallel writing across the server or the disk so as to further support high-speed data writing; indexes are swapped to disk for the most recent time interval.
Description
Technical Field
The invention belongs to the technical field of information, and discloses a method and a system for processing time sequence data under a PostgreSQL database.
Background
With the wide application of information technology, time series data appears in more and more scenes, such as monitoring and operation and maintenance, hydrology monitoring, equipment monitoring of factories, data monitoring related to national security, communication monitoring, financial industry index data, sensor data, application program use data and the like. Generally, the data has the characteristics of large data volume and high complexity; this means that storing time series data requires both size and complex queries. The existing relational database has difficulty in realizing the two attributes simultaneously.
Disclosure of Invention
Aiming at the problem that a relational database can not process time sequence data in the prior art, the invention aims to provide a method and a system for processing time sequence data under a PostgreSQL database, so as to increase the function of the relational database.
In order to solve the above problem, an embodiment of the present invention provides a method for processing time series data under a PostgreSQL database, including:
creating time series data extension through SQL sentences in a relational database PostgreSQL which determines time series data to be processed;
when the database is subjected to data import operation, the data is automatically stored in a fragmentation mode according to time and space; automatically dividing the hyper table into blocks, wherein one block corresponds to a two-dimensional space determined according to the designated time interval and the area of the partition key; wherein a single database deployment comprises one or more hypertables, and the table structure of each hypertable is the same or different;
according to the size and time of the table, the space-time partition is automatically executed to partition the table, and horizontal expansion is completed;
automatically submitting the data insertion in batches; the expansion design fully utilizes the advantage of time sequence data load at the latest time interval, and the parallel writing is carried out for crossing a server or a disk so as to further support the high-speed data writing;
indexes are exchanged to disk for the most recent time interval; the creation of the index relates to data types, and the index is created by inheriting the data types supported by a PostgreSQL database from traditional text or numerical value columns to professional array data types or GIS columns;
the method comprises the steps that complete transaction support is provided for entries of the same partition key, and transactional semantics are ensured to be provided for each device bottom layer in a monitoring program, so that multiple device measurement values related to multiple independent sensor indexes are inserted atomically;
and a data write delay configured to not exceed a block size limit after the delay backfill such that a predetermined amount of delayed data can be inserted.
Further, the Hypertable table is automatically placed in a database node or copied to multiple nodes.
In order to solve the above problem, the present invention further provides a system for processing time series data under a PostgreSQL database, which is configured to synchronize two or more databases that are backed up from each other, and the system includes: a memory for storing at least one instruction, a processor for executing the instructions, wherein the instructions are for:
creating time series data extension through SQL sentences in a relational database PostgreSQL which determines time series data to be processed;
when the database is subjected to data import operation, the data is automatically stored in a fragmentation mode according to time and space; automatically dividing the hyper table into blocks, wherein one block corresponds to a two-dimensional space determined according to the designated time interval and the area of the partition key; wherein a single database deployment comprises one or more hypertables, and the table structure of each hypertable is the same or different;
according to the size and time of the table, the space-time partition is automatically executed to partition the table, and horizontal expansion is completed;
automatically submitting the data insertion in batches; the expansion design fully utilizes the advantage of time sequence data load at the latest time interval, and the parallel writing is carried out for crossing a server or a disk so as to further support the high-speed data writing;
indexes are exchanged to disk for the most recent time interval; the creation of the index relates to data types, and the index is created by inheriting the data types supported by a PostgreSQL database from traditional text or numerical value columns to professional array data types or GIS columns;
the method comprises the steps that complete transaction support is provided for entries of the same partition key, and transactional semantics are ensured to be provided for each device bottom layer in a monitoring program, so that multiple device measurement values related to multiple independent sensor indexes are inserted atomically;
and a data write delay configured to not exceed a block size limit after the delay backfill such that a predetermined amount of delayed data can be inserted.
The technical scheme of the embodiment of the invention supports the establishment of indexes for Postgres columns of any type, from traditional text or numerical value columns to professional array data types or GIS columns; the method has complete transaction support for the entries of the same partition key, ensures that each device bottom layer has transactional semantics in a monitoring program, and ensures that multiple device measurement values possibly related to multiple independent sensor indexes are inserted atomically; configured to not exceed the size limit of one block, a modest amount of latency data can be inserted without creating additional smaller sized blocks.
Drawings
Fig. 1 is a system structure topology diagram according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a method for processing time sequence data under a PostgreSQL database, which comprises the following steps: creating a time series data processing plug-in a relational database PostgreSQL which determines time series data to be processed; when the database has data import operation, carrying out fragment storage on the data; automatically performing horizontal expansion on stored data; allowing efficient authoring and batch insertion into a batch of small quantities of databases. The advantage of time sequence data load in the recent time interval is utilized to carry out parallel writing across the server or the disk so as to further support high-speed data writing; indexes are swapped into disk for the most recent time interval (where a large number of writes occur). Supporting the establishment of indexes for Postgres columns of any type, from traditional text or numerical value columns to professional array data types or GIS columns; the method has complete transaction support for the entries of the same partition key, ensures that each device bottom layer has transactional semantics in a monitoring program, and ensures that multiple device measurement values possibly related to multiple independent sensor indexes are inserted atomically; configured to not exceed the size limit of one block, a modest amount of latency data can be inserted without creating additional smaller sized blocks.
Specifically, an embodiment of the present invention provides a method for processing time series data in a PostgreSQL database, which is configured to process time series data in the PostgreSQL database, and includes:
creating a time series data processing plug-in a relational database PostgreSQL which determines time series data to be processed;
when the database has data import operation, carrying out fragment storage on the data; and performing horizontal expansion on the stored data;
allowing TimescaleDB to be written efficiently in batches for insertion into a batch of small number of databases; the advantage of time sequence data load in the recent time interval is utilized to carry out parallel writing across the server or the disk so as to further support high-speed data writing;
indexes are swapped into disk for the most recent time interval (where a large number of writes occur). Support indexing for any type of Postgres column, from traditional text or numeric columns, to professional array data types or GIS columns.
The complete transaction support is provided for the entries of the same partition key, and in a monitoring program, transactional semantics are ensured to be provided for the bottom layer of each device, and the multiple device measurement values possibly related to multiple independent sensor indexes are ensured to be inserted atomically.
Configured to not exceed the size limit of one block, a modest amount of latency data can be inserted without creating additional smaller sized blocks.
In order to solve the above problem, an embodiment of the present invention provides a system for processing time series data under a PostgreSQL database, which is constructed for processing time series data in a relational database PostgreSQL, and includes:
and the time series data processing plug-in is created in a relational database PostgreSQL which determines time series data to be processed.
And data fragmentation, namely performing fragmentation storage on the data when the database performs data import operation.
And horizontal expansion, namely automatically performing horizontal expansion on the stored data.
Batch commit, allowing efficient authoring and batch insertion into a batch of small quantities of databases. Taking advantage of the time-sequential data load at the most recent time interval, parallel writes are made across servers or disks to further support high-speed data writes.
Memory index, where indices are swapped to disk for the most recent time interval (where a large number of writes occur). Support indexing for any type of Postgres column, from traditional text or numeric columns, to professional array data types or GIS columns.
Transaction support, with complete transaction support for entries of the same partition key, ensures transactional semantics at the bottom of each device in a monitor, ensuring that multiple device measurements, which may involve multiple independent sensor metrics, are atomically interpolated.
Data is reversed and configured such that a modest amount of delayed data can be inserted without creating additional smaller sized blocks without exceeding the size limit of one block.
The invention is further illustrated below by means of a specific example.
Step one, software (hereinafter referred to as TimescaleDB) for implementing the method of the embodiment of the present invention is based on PostgreSQL database plug-in extensions, so the following listed installation packages need to be installed:
cmake-3.10.2.tar.gz
TimescaleDB-1.0.0.tar.gz
these installation packages then need to be compiled for installation.
Step two, finding the package by the Cmake package needing http:// Cmake.
./bootstrap
gmake
gmake install
After the TimescaleDB is installed and the TimescaleDB-1.0.0.tar.gz is decompressed, the following steps are performed.
./bootstrap
cd./build&&make
make install
Step three, configuring the Timescale DB of the embodiment of the invention
For postgresql. conf files in the user postgres data directory, some parameters are set:
shared_preload_libraries='timescaledb'
restarting a database to validate it
pg_ctl restart-m fast
And step four, creating a plug-in for the database needing to use the TimescaleDB.
create extension timescaledb;
Step five, after the completion, creating an example table:
while the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (4)
1. A method for processing time series data under a PostgreSQL database is constructed, and is characterized by comprising the following steps:
creating time series data extension through SQL sentences in a relational database PostgreSQL which determines time series data to be processed;
when the database is subjected to data import operation, the data is automatically stored in a fragmentation mode according to time and space; automatically dividing the hyper table into blocks, wherein one block corresponds to a two-dimensional space determined according to the designated time interval and the area of the partition key; wherein a single database deployment comprises one or more hypertables, and the table structure of each hypertable is the same or different;
according to the size and time of the table, the space-time partition is automatically executed to partition the table, and horizontal expansion is completed;
automatically submitting the data insertion in batches; the expansion design fully utilizes the advantage of time sequence data load at the latest time interval, and the parallel writing is carried out for crossing a server or a disk so as to further support the high-speed data writing;
indexes are exchanged to disk for the most recent time interval; the creation of the index relates to data types, and the index is created by inheriting the data types supported by a PostgreSQL database from traditional text or numerical value columns to professional array data types or GIS columns;
the method comprises the steps that complete transaction support is provided for entries of the same partition key, and transactional semantics are ensured to be provided for each device bottom layer in a monitoring program, so that multiple device measurement values related to multiple independent sensor indexes are inserted atomically;
and a data write delay configured to not exceed a block size limit after the delay backfill such that a predetermined amount of delayed data can be inserted.
2. The method for processing time series data under the construction of the PostgreSQL database according to claim 1, wherein the hyper table is automatically placed in a certain database node or copied to a plurality of nodes.
3. The method for processing time series data under the construction of the PostgreSQL database according to claim 1, wherein the hyper table is automatically placed in a certain database node or copied to a plurality of nodes.
4. A system for processing time series data under construction of a PostgreSQL database, comprising: a memory for storing at least one instruction, a processor for executing the instructions, wherein the instructions are for:
creating time series data extension through SQL sentences in a relational database PostgreSQL which determines time series data to be processed;
when the database is subjected to data import operation, the data is automatically stored in a fragmentation mode according to time and space; automatically dividing the hyper table into blocks, wherein one block corresponds to a two-dimensional space determined according to the designated time interval and the area of the partition key; wherein a single database deployment comprises one or more hypertables, and the table structure of each hypertable is the same or different;
according to the size and time of the table, the space-time partition is automatically executed to partition the table, and horizontal expansion is completed;
automatically submitting the data insertion in batches; the expansion design fully utilizes the advantage of time sequence data load at the latest time interval, and the parallel writing is carried out for crossing a server or a disk so as to further support the high-speed data writing;
indexes are exchanged to disk for the most recent time interval; the creation of the index relates to data types, and the index is created by inheriting the data types supported by a PostgreSQL database from traditional text or numerical value columns to professional array data types or GIS columns;
the method comprises the steps that complete transaction support is provided for entries of the same partition key, and transactional semantics are ensured to be provided for each device bottom layer in a monitoring program, so that multiple device measurement values related to multiple independent sensor indexes are inserted atomically;
and a data write delay configured to not exceed a block size limit after the delay backfill such that a predetermined amount of delayed data can be inserted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910869189.0A CN110765201A (en) | 2019-09-16 | 2019-09-16 | Method and system for processing time series data under PostgreSQL database construction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910869189.0A CN110765201A (en) | 2019-09-16 | 2019-09-16 | Method and system for processing time series data under PostgreSQL database construction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110765201A true CN110765201A (en) | 2020-02-07 |
Family
ID=69329792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910869189.0A Pending CN110765201A (en) | 2019-09-16 | 2019-09-16 | Method and system for processing time series data under PostgreSQL database construction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765201A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694860A (en) * | 2020-04-28 | 2020-09-22 | 国家计算机网络与信息安全管理中心 | Safety detection time sequence data real-time abnormity discovery method and electronic device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105302869A (en) * | 2015-09-29 | 2016-02-03 | 烽火通信科技股份有限公司 | HBase secondary index query and storage system and query method |
CN106548035A (en) * | 2016-11-24 | 2017-03-29 | 腾讯科技(深圳)有限公司 | A kind of diagnostic method and device of data exception |
CN107733696A (en) * | 2017-09-26 | 2018-02-23 | 南京天数信息科技有限公司 | A kind of machine learning and artificial intelligence application all-in-one dispositions method |
-
2019
- 2019-09-16 CN CN201910869189.0A patent/CN110765201A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105302869A (en) * | 2015-09-29 | 2016-02-03 | 烽火通信科技股份有限公司 | HBase secondary index query and storage system and query method |
CN106548035A (en) * | 2016-11-24 | 2017-03-29 | 腾讯科技(深圳)有限公司 | A kind of diagnostic method and device of data exception |
CN107733696A (en) * | 2017-09-26 | 2018-02-23 | 南京天数信息科技有限公司 | A kind of machine learning and artificial intelligence application all-in-one dispositions method |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111694860A (en) * | 2020-04-28 | 2020-09-22 | 国家计算机网络与信息安全管理中心 | Safety detection time sequence data real-time abnormity discovery method and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11093466B2 (en) | Incremental out-of-place updates for index structures | |
US10482154B2 (en) | Distributed processing of data records | |
Ahn et al. | ForestDB: A fast key-value storage system for variable-length string keys | |
US9418094B2 (en) | Method and apparatus for performing multi-stage table updates | |
CN106844089B (en) | Method and equipment for recovering tree data storage | |
WO2013123831A1 (en) | Intelligent data archiving | |
Kleppmann | Designing data-intensive applications | |
CN110134335A (en) | A kind of RDF data management method, device and storage medium based on key-value pair | |
CN114329096A (en) | Method and system for processing native map database | |
WO2015168988A1 (en) | Data index creation method and device, and computer storage medium | |
JP2015528957A (en) | Distributed file system, file access method, and client device | |
CN113821573A (en) | Mass data rapid retrieval service construction method, system, terminal and storage medium | |
US10776368B1 (en) | Deriving cardinality values from approximate quantile summaries | |
CN104598652B (en) | A kind of data base query method and device | |
US20070174329A1 (en) | Presenting a reason why a secondary data structure associated with a database needs rebuilding | |
WO2017156855A1 (en) | Database systems with re-ordered replicas and methods of accessing and backing up databases | |
CN103970844A (en) | Big data write-in method and device, big data read method and device and big data processing system | |
CN110765201A (en) | Method and system for processing time series data under PostgreSQL database construction | |
CN108256019A (en) | Database key generation method, device, equipment and its storage medium | |
CN110019192B (en) | Database retrieval method and device | |
US10606835B2 (en) | Managing data obsolescence in relational databases | |
CN110110034A (en) | A kind of RDF data management method, device and storage medium based on figure | |
CA3000042C (en) | Apparatus and method for managing storage of primary database and replica database | |
Bradberry et al. | Practical Cassandra: a developer's approach | |
Kaplanis et al. | HB+ tree: use hadoop and HBase even your data isn't that big |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |