CN113515503A - Table-based InfluxDB data migration method - Google Patents

Publication number
CN113515503A
Authority
CN
China
Prior art keywords
data
migration
task
time
migrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110837793.2A
Other languages
Chinese (zh)
Inventor
程海明
刘启铨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Whale Cloud Technology Co Ltd
Original Assignee
Whale Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Whale Cloud Technology Co Ltd
Priority to CN202110837793.2A
Publication of CN113515503A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4812 Task transfer initiation or dispatching by interrupt, e.g. masked

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a table-based InfluxDB data migration method comprising the following steps: reading data from a source InfluxDB table by a specific time interval and specific columns; allowing the user to define each item of migration metadata as needed; supporting a file-based target data source to meet network-isolation and remote-delivery requirements; calculating and instantiating all migration subtasks; scheduling each independent data migration slice task, querying the source InfluxDB, and writing the query results into a shared task queue; reading and writing data at the maximum rate; and monitoring the query response time in real time, calculating the 95th percentile (P95) of query response time within the latest sliding time window, comparing it with the write threshold, and dynamically adjusting the number of concurrent tasks. Beneficial effects: table-level migration gives finer-grained control, more accurate data processing, and higher migration efficiency.

Description

Table-based InfluxDB data migration method
Technical Field
The invention relates to the field of intelligent operation and maintenance, in particular to a table-based InfluxDB data migration method.
Background
With the continuous evolution of cloud, containerization, and distributed microservice technologies and architectures, and of the Internet of Things, the number of application instances and monitored objects grows exponentially. The KPI (Key Performance Indicator) data for the performance and availability of modern application systems, as well as operational indicator data, have two obvious characteristics:
1. System KPI data is time-series data: it has a time axis and is generally never updated.
2. The volume of indicators is massive. Storage and migration for a time-series database must therefore be real-time, online, and efficient.
Time-series databases currently well known in the industry include InfluxDB, Kdb+, Prometheus, Graphite, RRDtool, and OpenTSDB. Since 2016, InfluxDB has consistently held the top position in the rankings, with a share of nearly 30%, and is widely used across industries.
InfluxDB has the following advantages: it is quick to install and has no dependencies; a moderately configured machine (CPU: 4-6 cores; memory: 8-32 GB; IOPS (Input/Output Operations Per Second): 500-1000) supports more than 250,000 data writes per second; it natively supports an HTTP API, which makes integration and extension easier, and provides SDKs (Software Development Kits) for various languages; its SQL-like query language is built around time-centric functions, which lowers the learning cost and makes it easier to learn, master, and deploy; query responses are fast (within 100 ms); and it supports continuous queries and different retention policies, enabling fast downsampling and tiered storage.
In real deployments and production, whether for returning data delivered per project or for precise data migration between centrally managed time-series instances, a real-time online data migration capability with fine-grained, table-level control is urgently needed.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a table-based InfluxDB data migration method to overcome the technical problems in the prior related art.
Therefore, the invention adopts the following specific technical scheme:
A table-based InfluxDB data migration method comprises the following steps:
S1, providing data reading based on the InfluxDB table model, supporting reading of data from a source InfluxDB table by a specific time interval and specific columns;
S2, providing a user-defined configuration interface through which the user can define, as needed, the table to be migrated, the columns to be migrated in that table, the start and end time of the data to be migrated, the source data source, and the target data source;
S3, for the migration target data source, supporting a file-based target to meet business scenarios including network isolation and remote delivery;
S4, calculating and instantiating all migration subtasks according to the configured time slice size of each task and the start and end time of the data of the table to be migrated;
S5, scheduling each independent data migration slice task through a scheduling engine, each slice task querying the source InfluxDB according to its migration query statement and writing the query results into a shared task queue that has a write threshold;
S6, ensuring that data is read from the source InfluxDB and written to the target InfluxDB at the maximum rate;
S7, the scheduling engine monitoring the query response time of the source InfluxDB in real time, calculating the 95th percentile (P95) of query response time within the latest sliding time window using a percentile algorithm, and dynamically adjusting the number of concurrent tasks by comparing the P95 with the write threshold;
wherein the percentile algorithm in S7 is as follows: a set of n observations is sorted from smallest to largest and the corresponding cumulative percentages are calculated; the data value corresponding to a given percentage is called the percentile for that percentage.
Further, in S4, calculating and instantiating all migration subtasks according to the configured time slice size of each task and the start and end time of the data of the table to be migrated comprises the following steps:
S41, the task instantiation engine divides the time range (start to end) of the data to be migrated by the task time slice size to obtain the number of migration tasks;
S42, the task instantiation engine instantiates each migration subtask together with its start and end time;
S43, the relevant metadata is obtained from the information configured by the user in S2, including the table to be migrated and its columns to be migrated, and is injected into each instantiated migration task.
Further, the number of migration tasks is calculated as follows:
(end time - start time) [ms] / task time slice size [ms] = number of migration tasks.
Further, in S5, scheduling each independent data migration slice task through the scheduling engine, each slice task querying the source InfluxDB according to its migration query statement and writing the query results into a shared task queue that has a write threshold, comprises the following steps:
S51, the scheduling engine initiates scheduling according to the number of concurrent tasks configured by the user and the tasks instantiated in S4;
S52, each data migration slice task is scheduled, each slice task being the smallest unit that runs independently and migrates data;
S53, when an independent data migration slice task is scheduled and starts executing, its migration query statement is assembled;
S54, the data migration slice task queries the source InfluxDB according to the migration query statement and writes the query results into the shared task queue that has a write threshold.
Further, in S53 the migration query statement is assembled from the start and end time of the task, the columns to be migrated, and the table name.
Further, the write threshold is 3 seconds or 5000 records: when the number of indicator records in the queue exceeds 5000 during writing, a batch write is initiated; and when the asynchronous flush thread's timer reaches 3 seconds, the asynchronous flush thread initiates a batch write.
Further, in S7, the scheduling engine monitoring the query response time of the source InfluxDB database in real time, calculating the P95 of query response time within the latest sliding time window using the percentile algorithm, and dynamically adjusting the number of concurrent tasks by comparing the P95 with the write threshold, comprises the following steps:
S71, the scheduling engine starts task scheduling with an initial concurrency of (maximum number of tasks) / 2;
S72, task scheduling monitoring is started, and the P95 state is calculated against the maximum query response time configured by the user;
S73, if the P95 query condition is met and the current number of concurrent tasks is less than the maximum task concurrency defined by the user, the number of concurrent tasks is increased.
Further, in S73, if the P95 query condition is met and the number of concurrent tasks is greater than or equal to the maximum number of tasks, the number of concurrent tasks is not adjusted.
Further, in S73, if the P95 query condition is not met, the number of concurrent tasks is reduced.
Further, in S73, if the P95 query condition is not met and the number of tasks is 1, the migration is interrupted and the user is prompted to split the data into smaller time slices; S71 to S73 continue to be performed as the time window slides.
The invention has the beneficial effects that:
(1) The invention provides a fine-grained migration strategy based on InfluxDB tables, with a rich set of user-defined operation interfaces; compared with migration schemes that operate at whole-database granularity, table-level migration gives finer control, more accurate data processing, and higher migration efficiency.
(2) The invention also provides a rich user-defined operation interface: the user can define the columns to migrate, the time interval to migrate, and so on, which gives the user more options and greater operability.
(3) The invention migrates data by splitting the migration period into small tasks and running the tasks concurrently: small tasks keep the impact of each minimal migration unit on InfluxDB low, and task concurrency improves migration efficiency.
(4) The invention adds real-time monitoring of data migration. By monitoring the state of the data source in real time, it can adjust task concurrency dynamically so that the migration rate adapts to the state of the data source, which preserves migration efficiency, minimizes the impact on the data source, and enables online migration. The invention also supports multiple target data source types, such as offline files, and can meet the data migration requirements of scenarios such as network isolation and remote delivery.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for table-based InfluxDB data migration according to an embodiment of the invention;
FIG. 2 is a schematic diagram of batch writing of indicator data to the target InfluxDB according to an embodiment of the invention;
FIG. 3 is a full flow diagram of table-based migration according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an InfluxDB data write-read flow according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of dynamic adjustment of the number of concurrent tasks according to an embodiment of the present invention.
Detailed Description
To further explain the embodiments, the invention provides drawings, which form part of the disclosure; they mainly illustrate the embodiments and, together with the description, explain their principles of operation. With reference to them, those of ordinary skill in the art will understand other possible embodiments and the advantages of the invention. Elements in the figures are not drawn to scale, and like reference numerals generally denote like elements.
According to an embodiment of the invention, a table-based InfluxDB data migration method is provided. InfluxDB has long held the leading position in the time-series field, with a share of nearly 30% that continues to grow, and is widely used across industries. Data migration is one of the most basic database capabilities and is needed by all kinds of users. InfluxDB natively supports backup and restore at the database level; in general, however, database-level backup and export is too coarse-grained: it consumes time and resources, cannot be controlled precisely, and may cause data pollution. The invention provides a table-based data migration method that controls the migration strategy for time-series data more finely, for example which table is migrated, the start and end time of the migrated data, and which columns are migrated. Furthermore, indicator data is typically massive, so the key requirements are to reduce interference with the source and target databases and to migrate online efficiently. In InfluxDB's storage model, time is the most critical index, and data files are split by time period; reducing the number of indicators read each time and limiting the time span of each migration ensures that most operations do not cross stored file blocks, which greatly improves synchronization efficiency. The invention therefore provides a time-slice-based task splitting method in which the number of concurrent migration tasks and the size of the time slice migrated by each task can be customized; replacing one large task with concurrent tasks over small time slices greatly improves migration efficiency and reduces the impact on the data source.
Meanwhile, to reduce the impact on online services during actual migration, the invention also adjusts the number of concurrent migration tasks dynamically and in real time according to the performance indicators of the data source, ultimately achieving efficient, real-time online migration that is imperceptible to users.
Unlike coarse-grained, database-level migration and backup schemes, the method achieves finer control. Because time-series indicator databases hold massive data, efficiency is a core characteristic of the migration scheme; at the same time, the migration must not affect the source or target InfluxDB and must be able to run online. Finer control requires a lower-level unit of migration: in InfluxDB's database-table data model the table is the smallest model unit, so the control unit of the migration scheme must be the table. For efficient migration, the scheme must read from the source InfluxDB and write to the target InfluxDB at the maximum rate. InfluxDB heavily optimizes writes: the WAL (Write-Ahead Log), asynchronous flushing of the operating system cache, and asynchronous sorting and compression greatly increase its write speed, so writing is not the limiting factor in migration. The key to migration is to maximize the read speed while ensuring that reads do not affect the source InfluxDB database.
To achieve this, the read speed of the migration is increased while the source InfluxDB is performing well and reduced when latency rises; that is, the method must be able to adjust the migration rate dynamically. This capability rests on two things. First, each migration task is made as small as needed: each small task migrates only a small time slice of data, so running it has no significant impact on the source InfluxDB. Second, a scheduling engine dynamically adjusts the number of concurrent migration tasks according to the read latency of the migration tasks, so that data is migrated efficiently without affecting the InfluxDB database, which gives the method its online migration capability.
The present invention is further described below with reference to the accompanying drawings and specific embodiments. As shown in FIGS. 1-2, a table-based InfluxDB data migration method according to an embodiment of the present invention includes the following steps:
S1, the table-based InfluxDB data migration method of the invention provides data reading based on the InfluxDB table model and supports reading data from a source InfluxDB table by a specific time interval and specific columns;
S2, to give the user more control over the data migration operation, a user-defined configuration interface is provided through which the user can define, as needed, the table to be migrated, the columns to be migrated in that table, the start and end time of the data to be migrated, the source data source, the target data source, and so on; as shown at 1 in FIG. 3, the user can define each item of migration metadata and thereby refine the migration strategy;
S3, for the migration target data source, in addition to an InfluxDB instance for real-time online migration (the common business scenario), a file-based target is supported to meet business scenarios such as network isolation and remote delivery;
S4, as shown at 2 in FIG. 3, all migration subtasks are calculated and instantiated according to the configured time slice size of each task and the start and end time of the data of the table to be migrated;
In S4, calculating and instantiating all migration subtasks according to the configured time slice size of each task and the start and end time of the data of the table to be migrated comprises the following steps:
S41, the task instantiation engine divides the time range (start to end) of the data to be migrated by the task time slice size to obtain the number of migration tasks;
S42, the task instantiation engine instantiates each migration subtask together with its start and end time;
S43, the relevant metadata is obtained from the information configured by the user in S2, such as the table to be migrated and its columns to be migrated, and is injected into each instantiated migration task.
The number of migration tasks is calculated as follows:
(end time - start time) [ms] / task time slice size [ms] = number of migration tasks.
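Steps S41 to S43 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names MigrationSubtask and slice_tasks are hypothetical, and rounding the task count up (so that a final partial slice is still migrated) is an assumption, since the text gives only the division.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MigrationSubtask:
    """Hypothetical minimal migration unit: one table, some columns, one time slice."""
    table: str
    columns: tuple
    start: datetime
    end: datetime


def slice_tasks(table, columns, start, end, slice_size):
    """Split [start, end) into subtasks of at most slice_size each (S41-S43)."""
    if slice_size <= timedelta(0) or end <= start:
        raise ValueError("need end > start and a positive slice size")
    total_ms = (end - start) / timedelta(milliseconds=1)
    slice_ms = slice_size / timedelta(milliseconds=1)
    # Assumption: round up so the trailing partial slice is not dropped.
    count = math.ceil(total_ms / slice_ms)
    tasks = []
    for i in range(count):
        task_start = start + i * slice_size
        task_end = min(task_start + slice_size, end)
        # S43: inject the user-configured table and column metadata.
        tasks.append(MigrationSubtask(table, tuple(columns), task_start, task_end))
    return tasks


if __name__ == "__main__":
    t0 = datetime(2019, 11, 29, 13, 36, 6, tzinfo=timezone.utc)
    for task in slice_tasks("cpu", ["host", "idle"], t0,
                            t0 + timedelta(seconds=35), timedelta(seconds=10)):
        print(task.start.isoformat(), task.end.isoformat())
```

With a 35-second range and 10-second slices this yields four subtasks, the last one covering only the remaining 5 seconds.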
S5, as shown in FIG. 3, the data migration scheduling engine schedules each independent data migration slice task; each slice task queries the source InfluxDB according to its migration query statement and writes the query results into a shared task queue that has a write threshold;
In S5, scheduling each independent data migration slice task through the scheduling engine, each slice task querying the source InfluxDB according to its migration query statement and writing the query results into a shared task queue that has a write threshold, comprises the following steps:
S51, the scheduling engine initiates scheduling according to the number of concurrent tasks configured by the user and the tasks instantiated in S4;
S52, each data migration slice task is scheduled, each slice task being the smallest unit that runs independently and migrates data;
S53, when an independent data migration slice task is scheduled and starts executing, its migration query statement is assembled: SELECT host, core_id, user, idle, system, wait FROM cpu WHERE time >= '2019-11-29T13:36:06Z' AND time < '2019-11-29T13:36:16Z';
S54, the data migration slice task queries the source InfluxDB according to the migration query statement and writes the query results into the shared task queue that has a write threshold.
In S53, the migration query statement is assembled from the start and end time of the task, the columns to be migrated, and the table name.
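The assembly in S53 can be illustrated with a short sketch. The function name build_query is hypothetical; the half-open time range (inclusive start, exclusive end) and UTC timestamps are assumptions, and table and column names are assumed to come from trusted configuration, so no quoting or escaping is attempted here.

```python
from datetime import datetime, timezone

# RFC 3339 timestamp with a literal "Z"; assumes the datetimes are in UTC.
_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_query(table, columns, start, end):
    """Assemble an InfluxQL-style SELECT for one slice task (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE time >= '{start.strftime(_TIME_FORMAT)}' "
        f"AND time < '{end.strftime(_TIME_FORMAT)}'"
    )


if __name__ == "__main__":
    print(build_query(
        "cpu",
        ["host", "core_id", "user", "idle", "system", "wait"],
        datetime(2019, 11, 29, 13, 36, 6, tzinfo=timezone.utc),
        datetime(2019, 11, 29, 13, 36, 16, tzinfo=timezone.utc),
    ))
```

A half-open range means that adjacent slices share a boundary without reading the same point twice.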
The write threshold is 3 seconds or 5000 records, and a write is triggered when either threshold is exceeded: when the number of indicator records in the queue exceeds 5000 during writing, a batch write is initiated; and when the asynchronous flush thread's timer reaches 3 seconds, the asynchronous flush thread initiates a batch write.
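The two-threshold queue described above might look like the following sketch. The class BatchWriter and the write_batch callback are hypothetical names; the actual write to the target InfluxDB (or to a file) is left to the callback. As a simplification, the sketch flushes as soon as the buffer reaches the count limit rather than strictly exceeding it.

```python
import threading


class BatchWriter:
    """Buffers points and flushes them in batches by count or by age (hypothetical)."""

    def __init__(self, write_batch, max_points=5000, max_age_s=3.0):
        self._write_batch = write_batch      # callback that performs the real write
        self._max_points = max_points
        self._max_age_s = max_age_s
        self._buffer = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        # Asynchronous flush thread: wakes up every max_age_s seconds.
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def put(self, point):
        with self._lock:
            self._buffer.append(point)
            full = len(self._buffer) >= self._max_points
        if full:
            self.flush()                     # count threshold reached

    def flush(self):
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._write_batch(batch)

    def _flush_loop(self):
        # Event.wait returns False on timeout, True once close() sets the event.
        while not self._stop.wait(self._max_age_s):
            self.flush()                     # time threshold reached

    def close(self):
        self._stop.set()
        self._flusher.join()
        self.flush()                         # write whatever is left


if __name__ == "__main__":
    writer = BatchWriter(lambda batch: print("writing", len(batch), "points"))
    for i in range(12000):
        writer.put(i)
    writer.close()
```

Swapping the buffer out under the lock and writing outside it keeps producers from blocking on a slow write.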
S6, for efficient migration, the migration scheme must ensure that data is read from the source InfluxDB and written to the target InfluxDB at the maximum rate;
As shown in FIG. 4, InfluxDB heavily optimizes writes: the WAL (Write-Ahead Log), asynchronous flushing of the operating system cache, and asynchronous sorting and compression greatly increase its write speed, so writing to InfluxDB is not the limiting factor for migration efficiency. The key to migration is to maximize the read speed while ensuring that reads do not affect the source InfluxDB database.
S7, the scheduling engine monitors the query response time of the source InfluxDB in real time, calculates the 95th percentile (P95) of query response time within the latest sliding time window using a percentile algorithm, and dynamically adjusts the number of concurrent tasks by comparing the P95 with the write threshold, so that data is migrated at maximum efficiency without affecting the source. Although the mean (total request time / total number of requests) also reflects query response time, it averages out abnormal data: we may know only that the mean is 300 ms while in fact there are fast queries of 100 ms and abnormally slow queries of more than 1 s. The percentile reflects the real query response behavior of the system;
wherein the percentile algorithm in S7 is as follows: a set of n observations is sorted from smallest to largest and the corresponding cumulative percentages are calculated; the data value corresponding to a given percentage is called the percentile for that percentage. For example, the value at the p% position is called the p-th percentile.
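The percentile definition and the mean-versus-percentile argument can be made concrete with a small sketch. The nearest-rank variant used here is an assumption (the text does not say whether values are interpolated), and the names percentile and SlidingP95 are hypothetical.

```python
import math
from collections import deque


def percentile(values, p):
    """Nearest-rank percentile: sort ascending and take the value at the p% position."""
    if not values:
        raise ValueError("no observations")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


class SlidingP95:
    """Keeps query latencies from the last window_s seconds and reports their P95."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._samples = deque()                # (timestamp_s, latency_ms), oldest first

    def record(self, timestamp_s, latency_ms):
        self._samples.append((timestamp_s, latency_ms))
        # Evict samples that have fallen out of the sliding window.
        while self._samples[0][0] <= timestamp_s - self.window_s:
            self._samples.popleft()

    def p95(self):
        return percentile([latency for _, latency in self._samples], 95)


if __name__ == "__main__":
    # 18 fast queries and 2 very slow ones: the mean hides the slow tail.
    latencies = [100] * 18 + [2100] * 2
    print("mean:", sum(latencies) / len(latencies))
    print("P95: ", percentile(latencies, 95))
```

In this sample the mean is 300 ms even though no query took 300 ms, while the P95 exposes the slow queries directly.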
As shown in FIG. 5, in S7, the scheduling engine monitoring the query response time of the source InfluxDB database in real time, calculating the P95 of query response time within the latest sliding time window using the percentile algorithm, and dynamically adjusting the number of concurrent tasks by comparing the P95 with the write threshold, comprises the following steps:
S71, the scheduling engine starts task scheduling with an initial concurrency of (maximum number of tasks) / 2;
S72, task scheduling monitoring is started, and the P95 state is calculated against the maximum query response time configured by the user;
S73, if the P95 query condition is met and the current number of concurrent tasks is less than the maximum task concurrency defined by the user, the number of concurrent tasks is increased.
In S73, if the P95 query condition is met and the number of concurrent tasks is greater than or equal to the maximum number of tasks, the number of concurrent tasks is not adjusted.
In S73, if the P95 query condition is not met, the number of concurrent tasks is reduced.
In S73, if the P95 query condition is not met and the number of tasks is 1, the migration is interrupted and the user is prompted to split the data into smaller time slices; S71 to S73 continue to be performed as the time window slides. In this way, data can be migrated online, in real time, and efficiently without affecting the source. The entire table migration process is then complete.
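The adjustment rules in S71 to S73 can be summarized as a small decision function. This is a sketch under stated assumptions: the "P95 query condition" is taken to mean that the P95 does not exceed the user-configured maximum query response time, the concurrency changes by one task per step, and the function names are hypothetical.

```python
def initial_concurrency(max_tasks):
    """S71: start at half the user-defined maximum, but never below one task."""
    return max(1, max_tasks // 2)


def next_concurrency(current, max_tasks, p95_ms, max_response_ms):
    """Return (new_concurrency, interrupt) for one monitoring step (S72-S73)."""
    condition_met = p95_ms <= max_response_ms   # assumed meaning of the P95 condition
    if condition_met:
        # Speed up while the source is healthy, capped at the user-defined maximum.
        return min(current + 1, max_tasks), False
    if current <= 1:
        # Already at one task and still too slow: interrupt the migration and
        # ask the user to configure smaller time slices.
        return current, True
    return current - 1, False                   # slow down


if __name__ == "__main__":
    concurrency = initial_concurrency(8)
    for p95 in (80, 90, 150, 160, 95):          # made-up P95 readings in ms
        concurrency, interrupt = next_concurrency(concurrency, 8, p95, 100)
        print(p95, concurrency, interrupt)
```

A step size of one keeps the reaction gentle; a real scheduler might choose a different step or add a cool-down between adjustments.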
The above solution is described in detail below with reference to a specific example.
Data is migrated from the source InfluxDB to the target InfluxDB in real-time online mode.
First, the user configures the table to be migrated, the columns, the start time, and the end time, together with the connection information for the source InfluxDB and the target InfluxDB, and saves the configuration in a file named cpu_sync.txt:
[Figures BDA0003177790930000091 and BDA0003177790930000101: contents of the configuration file; images not reproduced]
Second, the program calculates the number of migration tasks, the columns to be migrated, the start and end time of each migration, and so on from the user-defined configuration, and instantiates the minimal-unit data migration tasks.
Third, the scheduler initiates scheduling, monitors the read time of the tasks in real time, and dynamically adjusts the number of concurrent tasks and the migration rate using the P95-based algorithm, migrating efficiently without affecting the InfluxDB data source.
Fourth, the user issues the execution command: sync-file ./cpu_sync.txt; part of the execution result is as follows:
JobTask[2019-11-29 13:36:06,2019-11-29 13:36:16] syncSuccessCount:1273 syncFailCount:0
JobTask[2019-11-29 13:35:33,2019-11-29 13:35:43] syncSuccessCount:1165 syncFailCount:0
JobTask[2019-11-29 13:36:28,2019-11-29 13:36:38] syncSuccessCount:1189 syncFailCount:0
JobTask[2019-11-29 13:37:23,2019-11-29 13:37:33] syncSuccessCount:1322 syncFailCount:0
JobTask[2019-11-29 13:36:50,2019-11-29 13:37:00] syncSuccessCount:1439 syncFailCount:0
JobTask[2019-11-29 13:36:39,2019-11-29 13:36:49] syncSuccessCount:1261 syncFailCount:0
JobTask[2019-11-29 13:36:17,2019-11-29 13:36:27] syncSuccessCount:1404 syncFailCount:0
JobTask[2019-11-29 13:38:07,2019-11-29 13:38:17] syncSuccessCount:1236 syncFailCount:0
JobTask[2019-11-29 13:37:01,2019-11-29 13:37:11] syncSuccessCount:1213 syncFailCount:0
JobTask[2019-11-29 13:37:45,2019-11-29 13:37:55] syncSuccessCount:1497 syncFailCount:0
JobTask[2019-11-29 13:37:12,2019-11-29 13:37:22] syncSuccessCount:1323 syncFailCount:0
JobTask[2019-11-29 13:37:34,2019-11-29 13:37:44] syncSuccessCount:1231 syncFailCount:0
JobTask[2019-11-29 13:37:56,2019-11-29 13:38:06] syncSuccessCount:1180 syncFailCount:0
Sync complete SyncStats{duration(s):155,jobCount=655,syncSuccessCount=857196,syncFailCount=0}
Thus, the table-based real-time online data migration task is completed.
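The slicing of the migration period and the assembly of one query per slice can be sketched as follows. This is an illustration under stated assumptions, not the program used in the embodiment: the function names, the table name `cpu` and the column name `usage_idle` are hypothetical, a partial final slice is assumed to become its own task, and the windows are half-open, whereas the log above shows closed 10-second windows separated by one second.

```python
from datetime import datetime, timedelta


def slice_tasks(start, end, slice_seconds):
    """Split [start, end) into consecutive windows of at most slice_seconds."""
    step = timedelta(seconds=slice_seconds)
    tasks = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # assumed: last slice may be shorter
        tasks.append((cursor, upper))
        cursor = upper
    return tasks


def build_query(table, columns, window):
    """Assemble an InfluxQL-style SELECT for one window (illustrative only)."""
    lower, upper = window
    cols = ", ".join(f'"{c}"' for c in columns)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamps, assumed to be UTC
    return (
        f'SELECT {cols} FROM "{table}" '
        f"WHERE time >= '{lower.strftime(fmt)}' AND time < '{upper.strftime(fmt)}'"
    )


if __name__ == "__main__":
    start = datetime(2019, 11, 29, 13, 35, 33)
    end = start + timedelta(seconds=25)
    for window in slice_tasks(start, end, 10):
        print(build_query("cpu", ["usage_idle"], window))
```

Half-open windows are chosen here so that no timestamp belongs to two tasks; a closed-window scheme with a gap between tasks, as the log suggests, would need the gap to match the timestamp precision of the data.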
In conclusion, the invention provides a fine-grained migration strategy based on InfluxDB tables. Compared with migration schemes that operate at the granularity of the whole database, the table-level scheme offers finer control, more precise data processing and higher migration efficiency. The invention also provides a rich user-defined operation interface: the user can define information such as the columns to migrate and the time interval to migrate, which gives the user more options and greater operability. The invention migrates data by splitting the migration period into small tasks and running the tasks concurrently: small tasks keep the impact of each minimum migration unit on InfluxDB low, and task concurrency improves migration efficiency. The invention adds real-time monitoring of the migration: by monitoring the state of the data source in real time, it can dynamically adjust task concurrency so that the migration rate adapts to the state of the data source, which preserves migration efficiency while minimizing the impact on the data source and makes online migration possible. The invention supports multiple target data source types, such as offline files, and can meet the data migration requirements of scenarios such as network isolation and remote delivery.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A table-based InfluxDB data migration method, characterized by comprising the following steps:
S1, providing data reading based on the InfluxDB table model, and supporting the reading of data from a source InfluxDB table for a specific time interval and specific columns;
S2, providing a user-defined configuration interface through which the user defines, as needed, the table to be migrated, the columns of the table to be migrated, the start and end times of the data in the table to be migrated, the source data source of the table and the target data source;
S3, for the migration target data source, supporting a file-based target so as to satisfy service scenarios including network isolation and remote delivery;
S4, calculating and instantiating all migration subtasks according to the configured time size of each task slice and the start and end times of the data of the table to be migrated;
S5, scheduling each independent data migration slice task through a scheduling engine, each independent data migration slice task querying the source InfluxDB according to the data query statement to be migrated and uniformly writing the query result into a task queue that has a write threshold;
S6, ensuring that data is read from the source InfluxDB and written to the target InfluxDB at the maximum rate;
S7, the scheduling engine monitoring the query response time metric of the source InfluxDB in real time, calculating the query response time percentile P95 within the latest sliding time window using a percentile algorithm, and dynamically adjusting the number of concurrent tasks by comparing the percentile P95 with the write threshold;
wherein the percentile algorithm in S7 is as follows: a set of data is sorted in ascending order and the corresponding cumulative percentages are calculated; the data value corresponding to a given percentage is called the percentile of that percentage, that is, a set of n observed values is arranged by numerical value.
2. The method according to claim 1, wherein calculating and instantiating all migration subtasks in S4 according to the configured time size of each task slice and the start and end times of the data of the table to be migrated further comprises the following steps:
S41, the task instantiation engine divides the span between the start and end times of the data of the table to be migrated by the task slice time size to calculate the number of migration tasks to be split;
S42, the task instantiation engine instantiates each migration subtask together with its start and end times;
S43, acquiring the related metadata according to the information configured by the user in S2, including the table to be migrated and its columns, and injecting the metadata into each instantiated migration task.
3. The method according to claim 2, wherein the calculation formula of the number of migration tasks to be divided is:
(end time - start time) [ms] / task slice time size [ms] = number of migration tasks to be split.
4. The method according to claim 1, wherein in S5, scheduling each independent data migration slice task through the scheduling engine, querying the source InfluxDB according to the data query statement to be migrated, and uniformly writing the query result into a task queue that has a write threshold further comprises the following steps:
S51, the scheduling engine initiates scheduling according to the number of concurrent tasks configured by the user and the tasks instantiated in S4;
S52, each data migration slice task to be migrated is scheduled, each slice task being a minimum unit that runs independently and migrates data;
S53, when each independent data migration slice task is scheduled and starts executing, the data query statement to be migrated is assembled;
S54, the data migration slice task queries the source InfluxDB according to the data query statement to be migrated, and uniformly writes the query result into a task queue that has a write threshold.
5. The method of claim 4, wherein in S53 the data query statement to be migrated is assembled from the start time of the task, the columns to be migrated and the table name.
6. The method of claim 4, wherein the write threshold is 3 seconds or 5000 records: during writing, a batch write is initiated when the number of metric records in the queue exceeds 5000, and the asynchronous flush thread initiates a batch write when its timer reaches 3 seconds.
7. The table-based InfluxDB data migration method according to claim 1, wherein in S7, the scheduling engine monitoring the query response time metric of the source InfluxDB in real time, calculating the query response time percentile P95 within the latest sliding time window using the percentile algorithm, and dynamically adjusting the number of concurrent tasks by comparing the percentile P95 with the write threshold further comprises the following steps:
S71, the scheduling engine schedules tasks with an initial concurrency equal to half the maximum number of tasks;
S72, task scheduling monitoring is started, and the P95 state is calculated against the maximum query response time configured by the user;
S73, if the P95 query condition is met and the current number of concurrent tasks is less than the user-defined maximum number of concurrent tasks, the number of concurrent tasks is increased.
8. The method of claim 7, wherein in S73, if the P95 query condition is met and the number of concurrent tasks is greater than or equal to the maximum number of tasks, the number of concurrent tasks is not adjusted.
9. The method of claim 8, wherein in S73, if the P95 query condition is not met, the number of concurrent tasks is reduced.
10. The method of claim 9, wherein in S73, if the P95 query condition is not met and the number of tasks is 1, the migration is interrupted, the user is prompted to divide the time slices into smaller time slices, and steps S71 to S73 continue as the time window slides.
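By way of illustration only, the write threshold of claim 6 (a batch write after 5000 records or after 3 seconds) can be sketched as a small buffer. The sketch is not part of the claims: the class name, the `sink` callback and the injectable `clock` are hypothetical; "greater than 5000" is read here as "reaches 5000"; and the asynchronous flush thread is replaced by a `tick` method that a timer or loop would call periodically.

```python
import time


class BatchBuffer:
    """Hypothetical write buffer with a size threshold and a time threshold."""

    def __init__(self, sink, max_points=5000, max_age_seconds=3.0,
                 clock=time.monotonic):
        self.sink = sink                  # callable that receives one batch (a list)
        self.max_points = max_points      # size threshold (5000 in claim 6)
        self.max_age_seconds = max_age_seconds  # time threshold (3 seconds)
        self.clock = clock
        self.points = []
        self.last_flush = clock()

    def add(self, point):
        """Queue one record; flush as soon as the size threshold is reached."""
        self.points.append(point)
        if len(self.points) >= self.max_points:
            self.flush()

    def tick(self):
        """Flush pending records once the time threshold has elapsed."""
        if self.points and self.clock() - self.last_flush >= self.max_age_seconds:
            self.flush()

    def flush(self):
        """Hand the pending records to the sink and restart the timer."""
        if self.points:
            self.sink(self.points)
            self.points = []
        self.last_flush = self.clock()
```

Passing the clock as a parameter keeps the time threshold testable without sleeping; in a real writer, `sink` would wrap the batch write call of whichever InfluxDB client is in use.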
CN202110837793.2A 2021-07-23 2021-07-23 Table-based InfluxDB data migration method Pending CN113515503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837793.2A CN113515503A (en) 2021-07-23 2021-07-23 Table-based InfluxDB data migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110837793.2A CN113515503A (en) 2021-07-23 2021-07-23 Table-based InfluxDB data migration method

Publications (1)

Publication Number Publication Date
CN113515503A true CN113515503A (en) 2021-10-19

Family

ID=78068681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837793.2A Pending CN113515503A (en) 2021-07-23 2021-07-23 Table-based InfluxDB data migration method

Country Status (1)

Country Link
CN (1) CN113515503A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170195994A1 (en) * 2016-01-04 2017-07-06 Bank Of America Corporation Resource optimization allocation system
CN106970831A (en) * 2017-05-15 2017-07-21 金航数码科技有限责任公司 The resources of virtual machine dynamic scheduling system and method for a kind of facing cloud platform
CN111008188A (en) * 2019-10-29 2020-04-14 平安科技(深圳)有限公司 Data migration method and device, computer equipment and storage medium
CN111049914A (en) * 2019-12-18 2020-04-21 珠海格力电器股份有限公司 Load balancing method and device and computer system
CN112015716A (en) * 2020-08-04 2020-12-01 北京人大金仓信息技术股份有限公司 Database data migration method, device, medium and electronic equipment
CN112783859A (en) * 2021-01-08 2021-05-11 河北志晟信息技术股份有限公司 Lightweight concurrent migration method for database
CN113127412A (en) * 2021-04-23 2021-07-16 深圳市酷开网络科技股份有限公司 Data migration method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI ZHANG; XIAOFEI LIAO HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY ; PENG LI THE UNIVERSITY OF AIZU ; HAI JIN; LI LIN HUAZHONG : "Fine-Grained Scheduling in Cloud Gaming on Heterogeneous CPU-GPU Clusters", IEEE NETWORK, vol. 32, no. 1, 28 November 2017 (2017-11-28), pages 172, XP011676314, DOI: 10.1109/MNET.2017.1700047 *
刘健;张军伟;张浩;邵冰清;杨洪章;刘振军;: "蓝鲸元数据服务器集群的细粒度负载迁移", 计算机研究与发展, no. 1, 15 December 2014 (2014-12-15), pages 210 - 222 *
刘晴和;杨云;贺兴亚;徐文春;周媛媛;: "基于业务分割的并行式大数据迁移策略研究", 计算机与现代化, no. 11, 15 November 2014 (2014-11-15), pages 86 - 89 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination