WO2023279801A9 - Partition adjustment method, apparatus, and device for a time series database, and readable storage medium - Google Patents

Partition adjustment method, apparatus, and device for a time series database, and readable storage medium

Info

Publication number
WO2023279801A9
Authority
WO
WIPO (PCT)
Prior art keywords
time
block group
node
user
access
Prior art date
Application number
PCT/CN2022/087412
Other languages
English (en)
French (fr)
Other versions
WO2023279801A1 (zh)
Inventor
毛靖琦
徐然
张宗全
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202111278133.1A external-priority patent/CN115599782A/zh
Application filed by 华为云计算技术有限公司 filed Critical 华为云计算技术有限公司
Priority to EP22836560.7A priority Critical patent/EP4357931A1/en
Publication of WO2023279801A1 publication Critical patent/WO2023279801A1/zh
Publication of WO2023279801A9 publication Critical patent/WO2023279801A9/zh
Priority to US18/405,617 priority patent/US20240143626A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • The present application relates to the technical field of data processing, and in particular to a partition adjustment method, apparatus, and device for a time series database, and a readable storage medium.
  • In a time series database, data is stored in the form of data tables. As the data volume keeps growing, a time series database needs to distribute a data table across multiple nodes for storage. The data table is therefore partitioned (sharded) into multiple blocks (shards), and each block is placed on a different node.
  • In the related art, the data table is first divided by time period into multiple block groups (shard groups), and each block group covers a different time period.
  • Each block group is then further divided according to certain rules into multiple blocks, and each block is placed on a different node.
  • The rules for dividing a block group into multiple blocks are selected when the database or the data table is created.
  • However, users' access habits for the data table may change, so that the rules selected when the database or data table was created no longer match those habits. If the selected rules are still used for partitioning, abnormal conditions such as load imbalance among nodes may occur, which degrades the read and write performance of the nodes.
  • This application provides a partition adjustment method, apparatus, and device for a time series database, and a readable storage medium, to solve the problems in the related art.
  • The technical solution is as follows:
  • A partition adjustment method for a time series database is provided. First, feature information of at least one user's access requests for a data table of the time series database is obtained; the feature information reflects the access habits of the at least one user for the data table.
  • The data table is divided into multiple block groups according to predetermined rules, and each block group is divided into multiple blocks. Each block group covers a different time period, and each block is placed on a different node.
  • Then, the predetermined rules are adjusted according to the feature information, and new block groups and/or new blocks that match the access habits are generated according to the adjusted rules.
  • The present application obtains feature information that reflects users' access habits for the data table and adjusts the predetermined rules based on that information, so that new block groups and/or new blocks matching the access habits are generated according to the adjusted rules.
  • In this way, the partition rules are updated in time, abnormal conditions caused by unsuitable partition rules are avoided, the block groups and/or blocks match the users' access habits, and the read and write performance of the nodes is maintained.
  • Generating a new block group and/or a new block matching the access habits according to the adjusted rules includes: generating a new block group, and generating, in the new block group, new blocks that match the access habits according to the adjusted rules.
  • The method further includes: determining a reference time in a predetermined block group, where the predetermined block group is the block group whose time period contains the acquisition time of the feature information, and the reference time is the maximum (latest) time at which data was written to the data table before the acquisition time of the feature information.
  • The start time of the new block group is determined based on the time interval between the reference time and the end time of the predetermined block group.
  • The reference time is the maximum time at which data has been written to the data table.
  • The reference time is taken as the earliest time at which the predetermined rules can be adjusted, which avoids re-migrating data already written to the data table and thus avoids occupying processing resources for data migration.
  • Accordingly, this implementation determines the start time of the new block group based on the reference time.
  • Determining the start time of the new block group includes: in response to the time interval being not less than a time threshold, determining the reference time as the start time of the new block group.
  • The method further includes: updating the predetermined block group, where the start time of the updated block group is the start time of the predetermined block group, and the end time of the updated block group is the reference time.
  • If the time interval is not less than the time threshold, the interval between the reference time and the time at which the next block group would be generated is relatively long. If the predetermined rules were adjusted only after the predetermined block group ends, they would remain in use for a long time, which may cause abnormal conditions. The predetermined rules therefore need to be updated as soon as possible. For this reason, in this implementation, the earliest time at which the predetermined rules can be updated, that is, the reference time, is determined as the start time of the new block group, so that the adjusted rules can be used in the new block group. This makes the process of determining the start time of the new block group more flexible.
  • The end time of the new block group is the end time of the predetermined block group.
  • The end time of the predetermined block group is used as the end time of the new block group. That is, the predetermined block group is split at the reference time into two block groups: one is the updated block group and the other is the new block group.
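  • The split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BlockGroup` type, the function name `split_at_reference`, and the calendar dates are assumptions introduced only for the example.

```python
from datetime import datetime
from typing import NamedTuple


class BlockGroup(NamedTuple):
    """Hypothetical representation of a block group as a time period."""
    start: datetime
    end: datetime


def split_at_reference(predetermined: BlockGroup, reference_time: datetime):
    """Split the predetermined block group at the reference time.

    Returns (updated_group, new_group): the updated group keeps the old
    start time and ends at the reference time; the new group starts at the
    reference time and keeps the old end time.
    """
    if not (predetermined.start <= reference_time <= predetermined.end):
        raise ValueError("reference time must lie inside the predetermined block group")
    updated = BlockGroup(predetermined.start, reference_time)
    new = BlockGroup(reference_time, predetermined.end)
    return updated, new


# Hypothetical dates: a one-day block group and a reference time of 14:00.
monday = BlockGroup(datetime(2024, 1, 1), datetime(2024, 1, 2))
updated, new = split_at_reference(monday, datetime(2024, 1, 1, 14, 0))
```

  • Data written before the reference time stays in the updated block group under the old rule, so nothing already written has to be migrated.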
  • Determining the start time of the new block group includes: in response to the time interval being less than the time threshold, determining the end time of the predetermined block group as the start time of the new block group.
  • The block groups are generated sequentially: after one block group ends, the next is generated. If the time interval is less than the time threshold, the interval between the reference time and the time at which the next block group is generated is relatively short. This implementation can therefore wait until the predetermined block group ends and adjust the predetermined rules when the next block group is generated, that is, determine the end time of the predetermined block group as the start time of the new block group.
  • This makes the process of determining the start time of the new block group more flexible.
  • The feature information includes a query fan-out, which indicates the number of nodes that need to be accessed to process an access request.
  • Adjusting the predetermined rules according to the feature information includes: in response to the number of nodes indicated by the query fan-out being greater than a number threshold, determining a partition key based on the usage frequency of the access conditions obtained by parsing the access requests, and adjusting the predetermined rules based on the partition key to obtain the adjusted rules.
  • Because the adjusted rules are obtained based on the partition key, the query fan-out during access can be reduced.
  • Obtaining the feature information of at least one user's access requests for the data table of the time series database includes: parsing an access request to obtain access conditions, determining, based on the access conditions, the nodes that need to be accessed to process the access request, and determining the number of nodes that need to be accessed as the query fan-out.
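  • A minimal sketch of the fan-out check and partition-key selection is given below. Several points are assumptions made only for illustration: access conditions are modelled as dictionaries of equality filters, a request that filters on the current partition key is assumed to touch one node while any other request touches every node, and the worst-case fan-out is compared against the threshold. The text above does not prescribe any of these details.

```python
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def query_fan_out(conditions, partition_key):
    """Number of nodes a request must touch under the current rule."""
    if partition_key in conditions:
        return 1           # filter on the partition key: one node holds the data
    return NODE_COUNT      # otherwise any node may hold matching data


def most_used_condition(requests):
    """Field that appears most often in the parsed access conditions."""
    usage = Counter(field for conditions in requests for field in conditions)
    return usage.most_common(1)[0][0]


def adjust_partition_key(requests, partition_key, number_threshold):
    """Return the partition key to use for the adjusted rule."""
    worst = max(query_fan_out(c, partition_key) for c in requests)
    if worst > number_threshold:
        return most_used_condition(requests)
    return partition_key


# Parsed access conditions of three hypothetical requests.
requests = [
    {"device_id": "d1"},
    {"device_id": "d2"},
    {"location_id": 42, "device_id": "d3"},
]
new_key = adjust_partition_key(requests, "location_id", number_threshold=2)
```

  • In this example most requests filter on `device_id`, so partitioning by `location_id` forces them to visit every node; switching the key lets each such request be served by a single node.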
  • The feature information includes a load imbalance degree, which indicates how unbalanced the load is across different nodes.
  • Adjusting the predetermined rules according to the feature information includes: in response to the degree of imbalance indicated by the load imbalance degree being greater than a reference threshold, determining block boundary values based on the loads of different nodes, and adjusting the predetermined rules based on the block boundary values to obtain the adjusted rules.
  • Because the adjusted rules are obtained based on the block boundary values, load imbalance among the nodes can be reduced.
  • Obtaining the feature information of at least one user's access requests for the data table of the time series database includes: determining the loads of different nodes based on at least one of the data volume, the number of timelines, and the timeline access frequency of the different nodes; and determining the load imbalance degree based on the loads of the different nodes.
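  • The following sketch shows one way the load imbalance degree and new block boundary values could be computed. The text does not define a formula, so everything here is an assumption: load is a single number per key (for example data volume), the imbalance degree is the relative gap between the busiest node and the mean load, and boundaries are chosen greedily so that cumulative load is split roughly evenly.

```python
def load_imbalance(loads):
    """Relative gap between the busiest node and the mean load (0.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean if mean else 0.0


def balanced_boundaries(key_loads, num_blocks):
    """Given (key, load) pairs sorted by key, return the keys at which a new
    block starts, so that cumulative load is split roughly evenly."""
    total = sum(load for _, load in key_loads)
    target = total / num_blocks
    boundaries, running = [], 0.0
    for key, load in key_loads:
        if len(boundaries) < num_blocks - 1 and running >= target * (len(boundaries) + 1):
            boundaries.append(key)
        running += load
    return boundaries


# Hypothetical per-key loads; the current rule puts keys < 5 on one node.
key_loads = [(1, 10), (2, 10), (3, 40), (4, 20), (5, 20)]
old_boundary = 5
loads = [
    sum(load for key, load in key_loads if key < old_boundary),
    sum(load for key, load in key_loads if key >= old_boundary),
]
degree = load_imbalance(loads)
reference_threshold = 0.5
boundaries = balanced_boundaries(key_loads, 2) if degree > reference_threshold else [old_boundary]
```

  • With these numbers the old boundary yields loads of 80 and 20, while the recomputed boundary yields 60 and 40.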
  • The feature information includes a correspondence between a user location and a node location, where the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process the access request.
  • Adjusting the predetermined rules according to the feature information includes: in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determining an updated node whose distance to the user location is not greater than the distance threshold, and adjusting the predetermined rules based on the updated node to obtain the adjusted rules. Because the adjusted rules are obtained based on the updated node, the data transmission distance during access can be reduced.
  • Obtaining the feature information of at least one user's access requests for the data table of the time series database includes: parsing an access request to obtain access conditions, determining, based on the access conditions, the node that needs to be accessed to process the access request, and determining the correspondence between the user location and the node location based on the location of the node that needs to be accessed and the location of the at least one user.
  • The feature information includes at least one of the query fan-out, the load imbalance degree, and the correspondence between the user location and the node location.
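  • The distance check can be sketched as below. Locations are modelled as plane coordinates and compared with Euclidean distance, and the nearest eligible node is chosen; these choices, the node names, and the fallback of keeping the current node when no node is close enough are all assumptions for illustration.

```python
import math


def pick_updated_node(user_location, current_node, node_locations, distance_threshold):
    """Return the node that should serve the user's data.

    Keeps the current node if it is close enough; otherwise picks the
    nearest node within the distance threshold, if any exists.
    """
    if math.dist(user_location, node_locations[current_node]) <= distance_threshold:
        return current_node
    candidates = [
        node for node, location in node_locations.items()
        if math.dist(user_location, location) <= distance_threshold
    ]
    if not candidates:
        return current_node  # assumption: no eligible node, so nothing changes
    return min(candidates, key=lambda node: math.dist(user_location, node_locations[node]))


# Hypothetical coordinates in arbitrary units.
node_locations = {"node-a": (0.0, 0.0), "node-b": (100.0, 0.0)}
updated_node = pick_updated_node((90.0, 0.0), "node-a", node_locations, distance_threshold=20.0)
```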
  • A partition adjustment apparatus for a time series database is provided, the apparatus comprising:
  • an acquisition module, configured to obtain feature information of at least one user's access requests for a data table of a time series database, where the data table is divided into multiple block groups according to predetermined rules, each block group is divided into multiple blocks, each block group covers a different time period, each block is placed on a different node, and the feature information reflects the access habits of the at least one user for the data table; and
  • an adjustment module, configured to adjust the predetermined rules according to the feature information, and generate a new block group and/or a new block matching the access habits according to the adjusted rules.
  • The adjustment module is configured to generate a new block group and generate, in the new block group, new blocks matching the access habits according to the adjusted rules.
  • The adjustment module is further configured to determine a reference time in a predetermined block group, where the predetermined block group is the block group whose time period contains the acquisition time of the feature information, and the reference time is the maximum time at which data was written to the data table before the acquisition time of the feature information; and to determine the start time of the new block group based on the time interval between the reference time and the end time of the predetermined block group.
  • The adjustment module is configured to determine the reference time as the start time of the new block group in response to the time interval being not less than the time threshold.
  • The adjustment module is further configured to update the predetermined block group, where the start time of the updated block group is the start time of the predetermined block group, and the end time of the updated block group is the reference time.
  • The end time of the new block group is the end time of the predetermined block group.
  • The adjustment module is configured to determine the end time of the predetermined block group as the start time of the new block group in response to the time interval being less than the time threshold.
  • The feature information includes a query fan-out, which indicates the number of nodes that need to be accessed to process an access request.
  • The adjustment module is configured to, in response to the number of nodes indicated by the query fan-out being greater than the number threshold, determine a partition key based on the usage frequency of the access conditions obtained by parsing the access requests, and adjust the predetermined rules based on the partition key to obtain the adjusted rules.
  • The acquisition module is configured to parse an access request to obtain access conditions, determine, based on the access conditions, the nodes to be accessed to process the access request, and determine the number of nodes to be accessed as the query fan-out.
  • The feature information includes a load imbalance degree, which indicates how unbalanced the load is across different nodes.
  • The adjustment module is configured to, in response to the degree of imbalance indicated by the load imbalance degree being greater than the reference threshold, determine block boundary values based on the loads of different nodes, and adjust the predetermined rules based on the block boundary values to obtain the adjusted rules.
  • The acquisition module is configured to determine the loads of different nodes based on at least one of the data volume, the number of timelines, and the timeline access frequency of the different nodes, and to determine the load imbalance degree based on the loads of the different nodes.
  • The feature information includes a correspondence between a user location and a node location, where the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process the access request.
  • The adjustment module is configured to, in response to the distance between the user location and the node location in the correspondence being greater than the distance threshold, determine an updated node whose distance to the user location is not greater than the distance threshold, and adjust the predetermined rules based on the updated node to obtain the adjusted rules.
  • The acquisition module is configured to parse an access request to obtain access conditions, determine, based on the access conditions, the node to be accessed to process the access request, and determine the correspondence between the user location and the node location based on the location of the node to be accessed and the location of the at least one user.
  • The feature information includes at least one of the query fan-out, the load imbalance degree, and the correspondence between the user location and the node location.
  • A partition adjustment device for a time series database is provided.
  • The device includes a memory and a processor; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor, so that the device implements the methods in the above aspects.
  • There are one or more processors and one or more memories.
  • The memory may be integrated with the processor, or the memory may be separate from the processor.
  • The memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip or arranged on a different chip. This application does not limit the type of the memory or the arrangement of the memory and the processor.
  • A computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the methods in the above aspects.
  • A computer program product is provided. It includes a computer program or instructions which, when executed by a processor, cause a computer to implement the methods in the above aspects.
  • A chip is provided, including a processor configured to call and execute instructions stored in a memory, so that a communication device on which the chip is installed performs the methods in the above aspects.
  • Another chip is provided, including an input interface, an output interface, a processor, and a memory, which are connected through an internal connection path. The processor is configured to execute code in the memory; when the code is executed, the processor performs the methods in the above aspects.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a partition adjustment method for a time series database provided in an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a data table of a time series database provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a partitioning process of a data table of a time series database provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a partitioning process of a data table of a time series database provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a block group provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a time-series database partition adjustment device provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a time series database partition adjustment device provided by an embodiment of the present application.
  • An embodiment of the present application provides a partition adjustment method for a time series database, and the method is applied in the implementation environment shown in FIG. 1.
  • The implementation environment includes a statistics module, a decision module, and a partition (sharding) module, and the three modules are connected to one another in pairs.
  • The statistics module, the decision module, and the partition module may be integrated in the same hardware device or in different hardware devices, so as to realize the functions required of each module.
  • The hardware device includes, but is not limited to, a terminal device, a server, or another network device that requires partitioning, and the embodiment of the present application does not limit the type of the hardware device.
  • The statistics module collects information related to user requests and the load information of each node sent by the partition module, and derives feature information from these statistics. For example, the feature information reflects at least one user's access habits for the data table.
  • The feature information includes but is not limited to: the correspondence between the user location and the node location, the load imbalance degree, and the query fan-out.
  • The statistics module feeds the feature information back to the decision module.
  • The decision module is configured to receive the feature information sent by the statistics module and determine, based on it, whether to update the currently used predetermined rules (sharding rules). If an update is required, the decision module adjusts the predetermined rules according to the feature information to obtain adjusted rules and sends them to the partition module. If it determines that no update is needed, no adjusted rules are sent to the partition module. In either case, after the current block group ends, the decision module instructs the partition module to generate the next block group.
  • The partition module is configured to perform partitioning as directed by the decision module. In response to receiving the adjusted rules sent by the decision module, it generates a new block group and/or new blocks according to the adjusted rules. In addition, as instructed by the decision module, it generates the next block group after the current block group ends.
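  • The interaction between the decision module and the partition module can be sketched as follows. The class names, fields, thresholds, and the idea of representing a rule by the name of its partition key are hypothetical simplifications; the statistics module is reduced to a ready-made `FeatureInfo` value.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureInfo:
    """Hypothetical summary produced by the statistics module."""
    query_fan_out: int       # nodes touched per request
    load_imbalance: float    # 0.0 means perfectly balanced


@dataclass
class PartitionModule:
    rule: str
    block_groups: list = field(default_factory=list)

    def generate_next_group(self, adjusted_rule=None):
        """Generate the next block group, switching rules if one was sent."""
        if adjusted_rule is not None:
            self.rule = adjusted_rule
        self.block_groups.append(self.rule)


@dataclass
class DecisionModule:
    fan_out_threshold: int
    imbalance_threshold: float

    def decide(self, info, candidate_rule):
        """Return the adjusted rule, or None if no update is needed."""
        if (info.query_fan_out > self.fan_out_threshold
                or info.load_imbalance > self.imbalance_threshold):
            return candidate_rule
        return None


partition = PartitionModule(rule="location_id")
decision = DecisionModule(fan_out_threshold=2, imbalance_threshold=0.5)

# First period: habits match the rule, so the next group reuses it.
partition.generate_next_group(decision.decide(FeatureInfo(1, 0.1), "device_id"))
# Second period: fan-out is too high, so the next group uses the adjusted rule.
partition.generate_next_group(decision.decide(FeatureInfo(4, 0.1), "device_id"))
```

  • Note that a block group is generated in both periods; only the rule applied inside it differs.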
  • An embodiment of the present application provides a partition adjustment method for a time series database.
  • The method includes the following steps.
  • Statistics are collected continuously based on at least one user's access requests for the data table of the time series database to obtain the feature information. Because the statistics are continuous, the feature information may change over time. In some implementations, the feature information is acquired once every reference duration; the embodiment of the present application does not limit the reference duration, which can be set based on experience. The feature information is described in detail later. The time series database and its data table are described next.
  • A time series database is a database used to process time series data, that is, data that carries a timestamp.
  • In a time series database, time series data is stored in the form of a data table, in which the data is arranged in chronological order.
  • A piece of time series data includes a timestamp, a tag, and at least one piece of indicator data.
  • The at least one piece of indicator data includes data generated by a data source and/or data indicating attributes of the data source (also called metadata), and the tag uniquely identifies the data source. For example, refer to FIG. 3.
  • In this example, the data source that generates the indicator data is a device, and the tag is a device identifier (device_id).
  • The indicator data includes data generated by the data source and data indicating attributes of the data source.
  • The data generated by the data source includes the per-minute central processing unit (CPU) average (cpu_1m_avg), free memory (free_mem), and temperature (temperature); the data indicating attributes of the data source includes a location identifier (location_id) and a device type (dev_type).
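  • One row of such a data table could be modelled as below. The field names come from the text; the Python types and every value in the sample row are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimeSeriesRecord:
    timestamp: datetime   # when the sample was taken
    device_id: str        # tag: uniquely identifies the data source
    cpu_1m_avg: float     # indicator data generated by the data source
    free_mem: float
    temperature: float
    location_id: int      # indicator data describing the data source (metadata)
    dev_type: str


# Sample row with made-up values.
record = TimeSeriesRecord(
    timestamp=datetime(2017, 1, 1, 1, 2, 0),
    device_id="device-001",
    cpu_1m_avg=35.2,
    free_mem=812.5,
    temperature=41.0,
    location_id=42,
    dev_type="sensor",
)
```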
  • The data table of the time series database is divided into multiple block groups according to predetermined rules, and each block group is divided into multiple blocks.
  • Each block group covers a different time period, and each block is placed on a different node. Block groups and blocks are described below with reference to FIG. 4.
  • Each block group covers a different time period; the length of that period is also called the block group duration (shard group duration).
  • The durations of different block groups may be the same or different, and the embodiment of the present application does not limit the block group duration.
  • For example, if the time period of one block group is 0:00 to 24:00 on Monday and that of another is 0:00 to 24:00 on Tuesday, the two block groups cover different time periods (Monday and Tuesday, respectively) but have the same duration (24 hours each).
  • The block groups are generated sequentially: after one block group ends, the next is generated. For example, a block group whose time period is 0:00 to 24:00 on Monday ends at 24:00 on Monday, and the next block group is generated at that time.
  • Each block group is divided into multiple blocks according to the predetermined rules, the blocks are placed on different nodes of the database, and each block includes at least one piece of time series data from the data table.
  • The predetermined rules include, but are not limited to, partitioning by at least one of the tag and the pieces of indicator data included in the time series data; this embodiment does not limit the predetermined rules.
  • Dividing the data table of the time series database into multiple block groups and dividing each block group into multiple blocks includes: dividing the data table into multiple block groups according to time range, and dividing each block group into multiple blocks according to the predetermined rules.
  • For example, if the time period of the first block group is 2017-01-01 01:02:00 to 2017-01-01 01:02:59, the first block group includes time series data 1, time series data 2, and time series data 3.
  • If the time period of the second block group is 2017-01-01 01:03:00 to 2017-01-01 01:03:59, the second block group includes time series data 4, time series data 5, and time series data 6.
  • The location identifier is used as the predetermined rule, so that the first block group and the second block group are each divided into two blocks.
  • The first block in the first block group includes time series data 1 and time series data 2 (both with location identifier 42), and the second block in the first block group includes time series data 3 (location identifier 77).
  • The first block in the second block group includes time series data 4 and time series data 5 (both with location identifier 42), and the second block in the second block group includes time series data 6 (location identifier 77).
  • The first block in the first block group and the first block in the second block group are both placed on the first node, and the second block in the first block group and the second block in the second block group are both placed on the second node.
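  • The two-level division in this example can be reproduced with a short sketch. The row numbers, minute boundaries, and location identifiers follow the example; the seconds within each minute and the node names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (row number, timestamp, location_id); seconds within each minute are made up.
ROWS = [
    (1, datetime(2017, 1, 1, 1, 2, 0), 42),
    (2, datetime(2017, 1, 1, 1, 2, 10), 42),
    (3, datetime(2017, 1, 1, 1, 2, 20), 77),
    (4, datetime(2017, 1, 1, 1, 3, 0), 42),
    (5, datetime(2017, 1, 1, 1, 3, 10), 42),
    (6, datetime(2017, 1, 1, 1, 3, 20), 77),
]
NODE_OF_LOCATION = {42: "node-1", 77: "node-2"}  # hypothetical placement


def partition(rows):
    """Return {block_group_start: {node: [row numbers]}}."""
    layout = defaultdict(lambda: defaultdict(list))
    for row_id, ts, location_id in rows:
        # Level 1: one-minute block groups, keyed by the start of the minute.
        group_start = ts.replace(second=0, microsecond=0)
        # Level 2: blocks by location_id, each block placed on one node.
        layout[group_start][NODE_OF_LOCATION[location_id]].append(row_id)
    return layout


layout = partition(ROWS)
first_group = layout[datetime(2017, 1, 1, 1, 2)]
second_group = layout[datetime(2017, 1, 1, 1, 3)]
```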
  • Time series data in the time period of a block group is written into the blocks obtained according to the predetermined rules.
  • Time series data that was written to the data table before the block group's time period is not written again.
  • For example, the time period of a block group is 0:00 to 24:00 on Monday.
  • The data table has been divided into block groups and blocks according to the predetermined rules, and the feature information reflects at least one user's access habits for the data table.
  • When the feature information satisfies a condition, the predetermined rules are adjusted according to the feature information. The manner in which the feature information satisfies the condition is described in detail later together with the feature information.
  • Generating a new block group and/or a new block matching the access habits according to the adjusted rules includes: generating a new block group, and generating, in the new block group, new blocks that match the access habits according to the adjusted rules.
  • Because the new block group covers a time period, it has a start time and an end time, both of which need to be determined in the embodiment of the present application.
  • The method further includes: determining the reference time in the predetermined block group.
  • The start time of the new block group is determined based on the time interval between the reference time and the end time of the predetermined block group.
  • The predetermined block group is the block group whose time period contains the acquisition time of the feature information.
  • The reference time is the maximum time at which data was written to the data table before the acquisition time of the feature information.
  • Because data already written to the data table is not rewritten, the maximum time at which data has been written, that is, the reference time, needs to be determined; the predetermined rules are adjusted only after that time.
  • For example, the embodiment of the present application obtains the maximum timestamp in the data table and determines the time indicated by that timestamp as the reference time.
  • This manner of determining the reference time is only an example and does not limit how the reference time is determined in this embodiment.
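  • A minimal sketch of this lookup follows. The function names, the representation of block groups as `(start, end)` tuples, and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime


def find_predetermined_group(groups, acquisition_time):
    """Return the (start, end) block group whose period contains the acquisition time."""
    for start, end in groups:
        if start <= acquisition_time < end:
            return (start, end)
    return None


def find_reference_time(write_timestamps, acquisition_time):
    """Maximum timestamp written to the table no later than the acquisition time."""
    written = [ts for ts in write_timestamps if ts <= acquisition_time]
    return max(written) if written else None


# Hypothetical one-day block groups and write timestamps.
groups = [
    (datetime(2024, 1, 1), datetime(2024, 1, 2)),
    (datetime(2024, 1, 2), datetime(2024, 1, 3)),
]
writes = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 13, 59),
    datetime(2024, 1, 1, 14, 0),
]
acquired_at = datetime(2024, 1, 1, 14, 5)

group = find_predetermined_group(groups, acquired_at)
reference = find_reference_time(writes, acquired_at)
```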
  • the start time of a new block group is determined based on the time interval between the reference moment and the end time of the predetermined block group, including the following two ways A1 and A2.
  • each block group is generated sequentially, and the next block group will be generated after one block group ends.
  • in mode A1, the time interval between the reference time and the end time of the predetermined block group is less than the time threshold, which indicates that the next block group will be generated shortly after the reference time. Therefore, this embodiment does not use the reference time as the start time of the new block group; that is, it neither generates a new block group immediately nor adjusts the predetermined rule immediately, but waits for the predetermined block group to end, then generates the new block group and adjusts the predetermined rule in it.
  • for example, the time threshold is 1 hour, the time period of the predetermined block group is from 0:00 to 24:00 on Monday, and the reference time is 23:30 on Monday. Since the time interval between the reference time and the end time of the predetermined block group (that is, 24:00 on Monday) is 30 minutes, which is less than the time threshold of 1 hour, the end time of the predetermined block group is used as the start time of the new block group.
  • the end time of the new block group may be any time later than the start time of the new block group.
  • the duration of the predetermined block group is consistent with that of the new block group. For example, the time period of the predetermined block group is from 0:00 to 24:00 on Monday, and the time period of the new block group is from 0:00 to 24:00 on Tuesday.
  • in mode A2, the time interval is not less than the time threshold, and the reference time is determined as the start time of the new block group; that is, a new block group is generated immediately so that the predetermined rule can be adjusted in the new block group, which ensures that the predetermined rule is adjusted in a timely manner.
  • for example, the time threshold is 1 hour, the time period of the predetermined block group is from 0:00 to 24:00 on Monday, and the reference time is 14:00 on Monday. Since the time interval between the reference time and the end time of the predetermined block group (that is, 24:00 on Monday) is 10 hours, which is greater than the time threshold of 1 hour, the reference time is determined as the start time of the new block group.
  • the method further includes: updating the predetermined block group, where the start time of the updated block group is the start time of the predetermined block group, and the end time of the updated block group is the reference time. Still taking the time period of the predetermined block group as 0:00 to 24:00 on Monday and the reference time as 14:00 on Monday as an example, the start time of the updated block group is 0:00 on Monday, and the end time of the updated block group is 14:00 on Monday.
  • the end time of the new block group in mode A2 is the end time of the predetermined block group.
  • for example, the time period of the predetermined block group is [T1, T2] and the reference time is T3 between T1 and T2; the start time of the new block group is then T3, consistent with the reference time, and the end time of the new block group is the end time T2 of the predetermined block group.
  • the predetermined block group is thus divided into two different block groups with the reference time as the dividing point: one is the new block group [T3, T2], and the other is the updated block group [T1, T3].
  • this embodiment does not limit the end time of the new block group, and the end time of the new block group may be later than or earlier than the end time of the predetermined block group.
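  • modes A1 and A2 can be illustrated with a minimal sketch. The function name `plan_new_block_group`, its parameters, and the concrete dates are hypothetical and chosen only for illustration; the logic follows the description above (less than the threshold: wait for the predetermined block group to end; otherwise: split at the reference time).

```python
from datetime import datetime, timedelta


def plan_new_block_group(reference_time, group_start, group_end, time_threshold):
    """Return (new_group_start, updated_group) for modes A1 and A2.

    updated_group is the shortened predetermined block group as a
    (start, end) tuple, or None when the predetermined group is kept as is.
    """
    interval = group_end - reference_time
    if interval < time_threshold:
        # Mode A1: the predetermined block group is about to end, so wait
        # and start the new block group at its end time.
        return group_end, None
    # Mode A2: split at the reference time; the predetermined block group
    # is updated to [group_start, reference_time].
    return reference_time, (group_start, reference_time)


# Hypothetical dates: 2024-01-01 is a Monday.
monday = datetime(2024, 1, 1)
tuesday = monday + timedelta(days=1)
threshold = timedelta(hours=1)

# Reference time 23:30 on Monday: 30 minutes remain, so mode A1 applies.
a1 = plan_new_block_group(monday.replace(hour=23, minute=30), monday, tuesday, threshold)
# Reference time 14:00 on Monday: 10 hours remain, so mode A2 applies.
a2 = plan_new_block_group(monday.replace(hour=14), monday, tuesday, threshold)
```

  • in this sketch the end time of the new block group is deliberately left open, since the embodiment does not limit it.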
  • the above-mentioned 201 and 202 can be executed multiple times, so that the partition rule is continuously updated according to actual needs, the rule used for partitioning adapts to the user's access habits, and the read and write performance of each node of the database is ensured.
  • the above-mentioned 202 is aimed at the situation where the feature information satisfies the condition and the predetermined rule needs to be adjusted.
  • if the feature information does not satisfy the condition, partitioning according to the predetermined rule will not lead to abnormal situations, so there is no need to adjust the predetermined rule, and the predetermined rule can still be used.
  • the next block group is generated directly after the end of the predetermined block group, and the next block group is still divided into multiple blocks according to the predetermined rule.
  • the start time of the next block group is the end time of the predetermined block group, and the embodiment of the present application does not limit the end time of the next block group.
  • the feature information of the data table includes at least one of the corresponding relationship between user locations and node locations, load imbalance, and query fanout. It can be understood that the above characteristic information is only an example, and is not used to limit this embodiment. In this embodiment, other information may also be used as characteristic information corresponding to the data table according to actual needs. Cases B1-B3 are used to illustrate the query fan-out degree, load unbalanced degree, and the corresponding relationship between user locations and node locations.
  • the query fanout is used to indicate the number of nodes that need to be visited to process the access request.
  • the manner of obtaining the query fan-out degree includes: analyzing the access request to obtain the access condition, determining the nodes to be visited to process the access request based on the access condition, and determining the number of nodes to be visited as the query fan-out degree.
  • the access request is used to read and write data from the data table, and which node or nodes need to be accessed to process the access request is determined by the access condition in the access request.
  • for example, if the access condition is to access the data of area A within a certain period of time, and the data of area A within this period of time is located in node 1 and node 2, then it can be determined that the nodes to be visited are node 1 and node 2.
  • the number of nodes to be visited can be counted, so as to determine the query fanout based on the number of nodes to be visited.
  • the number of nodes to be visited is directly used as the query fan-out degree. For example, in the previous example, the nodes to be visited are node 1 and node 2, and the number of nodes to be visited is 2, then the query fanout degree is determined to be 2.
  • the ratio of the number of nodes to be accessed to all nodes included in the database is used as the query fan-out degree. Still taking the example that the number of nodes to be accessed is 2, if the database includes 10 nodes, 2/10 is used as the query fan-out degree.
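  • the two ways of computing the query fan-out described above can be sketched as follows. The `placement` mapping (area to the nodes holding its data) and the function names are assumptions made for this illustration; the embodiment does not prescribe a data structure.

```python
def nodes_to_visit(area, placement):
    """Return the set of nodes that hold data for the requested area."""
    return set(placement.get(area, ()))


def query_fanout(area, placement, total_nodes=None):
    """Query fan-out as a node count, or as a ratio if total_nodes is given."""
    count = len(nodes_to_visit(area, placement))
    if total_nodes is None:
        return count  # the number of nodes to be visited is used directly
    return count / total_nodes  # ratio to all nodes included in the database


# Hypothetical layout: the data of area A is located in node 1 and node 2.
placement = {"A": ["node1", "node2"]}

fanout_count = query_fanout("A", placement)
fanout_ratio = query_fanout("A", placement, total_nodes=10)
```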
  • adjusting the predetermined rule according to the characteristic information includes: in response to the number of nodes indicated by the query fan-out being greater than the number threshold, determining the partition key based on the usage frequency of the access conditions obtained by parsing the access requests, and adjusting the predetermined rule based on the partition key to obtain the adjusted rule. It can be seen that if the number of nodes indicated by the query fan-out is greater than the number threshold, the characteristic information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
  • for example, the currently used partition key is time.
  • a block group includes ten sub-periods from A0 to A9.
  • the time series data of one sub-period is located in the same node, while the time series data of one region may be scattered across different nodes.
  • if the access condition is a region, for example when all the time series data of region B0 needs to be read or written, all ten nodes may need to be visited to obtain all the time series data of region B0, so the query fan-out is large and may be greater than the number threshold. Therefore, it is necessary to determine a new partition key based on the access conditions, so as to obtain the adjusted rule.
  • adjusting the predetermined rule based on the partition key to obtain the adjusted rule includes: using the most frequently used access condition as the partition key, and replacing the predetermined rule with the partition key to obtain the adjusted rule. For example, if the most frequently used access condition is region, use region as the partition key. Still taking the above example as an example, after the region is used as the partition key, the time series data of a region are located in the same node instead of being scattered in different nodes. When it is necessary to read and write all the time series data in area B0, only one node corresponding to area B0 needs to be accessed, thereby reducing the query fan-out.
  • the access conditions are arranged in descending order of usage frequency to obtain an access condition sequence, and the first reference number of access conditions in the access condition sequence are used as partition keys.
  • the present embodiment does not limit the reference number, which is a positive integer not less than 2. For example, if the most frequently used access condition is region, the second most frequently used access condition is time, and the reference number is 2, then both region and time are used as the adjusted rule.
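  • selecting partition keys by usage frequency can be sketched as below. The function name `choose_partition_keys` and the sample list of observed access conditions are hypothetical; the sketch only ranks access conditions by how often they occur and takes the first `reference_number` of them.

```python
from collections import Counter


def choose_partition_keys(access_conditions, reference_number=1):
    """Return the most frequently used access conditions as partition keys.

    access_conditions is a list with one entry per parsed access request.
    With reference_number=1 only the most frequent condition is returned.
    """
    counts = Counter(access_conditions)
    # most_common sorts in descending order of usage frequency.
    return [condition for condition, _ in counts.most_common(reference_number)]


# Hypothetical access conditions parsed from six access requests.
observed = ["region", "region", "time", "region", "time", "device"]

single_key = choose_partition_keys(observed)
two_keys = choose_partition_keys(observed, reference_number=2)
```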
  • when the data table is created, the user may set the partition key arbitrarily, or set it according to the actual needs at the time of creation.
  • the user's business may change afterwards, so that the previously set partition key is no longer applicable to the user's current business, and the partitions obtained according to that partition key do not match the user's access habits.
  • the partition key is therefore adjusted according to the query fan-out, so that the partition key matches the user's current business and the partitions obtained according to the partition key match the user's access habits, thereby improving the read and write performance of the nodes.
  • the degree of load imbalance is used to indicate the degree of load imbalance of different nodes.
  • the manner of obtaining the load imbalance includes: determining the loads of different nodes based on at least one of the data volume of different nodes, the number of timelines, and the access frequency of timelines. The degree of load imbalance is determined based on the load of different nodes.
  • timelines correspond one-to-one to data sources, and one timeline includes one or more pieces of time series data of the corresponding data source.
  • for example, the timeline corresponding to the data source abc123 includes time series data 1 and time series data 4.
  • the access frequency of the timeline is the frequency of reading and writing to the timeline.
  • in one case, the above-mentioned different nodes refer to the nodes in which the blocks of the data table are set.
  • for example, if the database includes 10 nodes in total and the data table in a block group includes 5 blocks set in 5 of these nodes, the different nodes refer to the 5 nodes in which blocks are set.
  • in another case, the different nodes refer to all nodes included in the database.
  • for example, if the database includes 10 nodes in total, these 10 nodes are the above-mentioned different nodes.
  • determining the load unbalanced degree based on the loads of different nodes includes: for a node, determining the load value of the node based on the load of the node, and determining the load unbalanced degree based on the load values of different nodes. For example, the variance of the load value of each node is calculated, and the variance is used as the degree of load imbalance.
  • determining the load value of a node based on the load of the node includes: determining the first sub-value based on the data volume of the node, determining the second sub-value based on the number of timelines of the node, determining the third sub-value based on the access frequency of the timelines of the node, and using the weighted sum of the first sub-value, the second sub-value and the third sub-value as the load value of the node.
  • the weights of different subvalues are the same or different, and this embodiment does not limit the weights of different subvalues.
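  • the load value and the load imbalance degree described above can be sketched as follows. As a simplifying assumption, each sub-value is taken to be the raw (already normalized) metric itself, and the variance is the population variance; the names `load_value` and `load_imbalance` and the sample figures are hypothetical.

```python
from statistics import pvariance


def load_value(data_volume, timeline_count, access_frequency, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three sub-values of one node.

    Assumption: each sub-value equals its input metric unchanged.
    """
    w1, w2, w3 = weights
    return w1 * data_volume + w2 * timeline_count + w3 * access_frequency


def load_imbalance(node_loads, weights=(1.0, 1.0, 1.0)):
    """Variance of the per-node load values, used as the load imbalance degree."""
    values = [load_value(*metrics, weights=weights) for metrics in node_loads.values()]
    return pvariance(values)


# Hypothetical metrics per node: (data volume, timeline count, access frequency).
balanced = {"A": (1.0, 2.0, 3.0), "B": (1.0, 2.0, 3.0)}
skewed = {"A": (1.0, 1.0, 1.0), "B": (3.0, 3.0, 3.0)}

balanced_degree = load_imbalance(balanced)  # identical loads give zero variance
skewed_degree = load_imbalance(skewed)      # load values 3 and 9
```

  • a larger variance indicates a greater degree of imbalance, which is then compared with the reference threshold.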
  • adjusting the predetermined rule according to the characteristic information includes: in response to the degree of imbalance indicated by the load imbalance degree being greater than a reference threshold, determining the block boundary value between different nodes based on the loads of the different nodes, and adjusting the predetermined rule based on the block boundary value to obtain the adjusted rule. It can be seen that if the degree of imbalance indicated by the load imbalance degree is greater than the reference threshold, the characteristic information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
  • the loads of different nodes are adjusted by modifying the block boundary value, so as to achieve load balancing among the different nodes.
  • adjusting the predetermined rule based on the block boundary value to obtain the adjusted rule includes: replacing the predetermined rule with the block boundary value to obtain the adjusted rule.
  • the adjustment may be performed in units of a timeline, which is not limited in this embodiment. The following uses the timeline as an example for illustration.
  • for example, node A includes timelines 0-999 and node B includes timelines 1000-1999, so the block boundary values are 999 and 1000. If the load of node A in the predetermined block is relatively large and the load of node B is relatively small, there is a load imbalance between node A and node B. Therefore, it is necessary to reduce the load of node A and increase the load of node B, so that the loads of node A and node B are balanced.
  • for example, in the new block the block boundary values are adjusted to 599 and 600: the time series data of the data sources corresponding to timelines 0-599 is written to node A, and the time series data of the data sources corresponding to timelines 600-1999 is written to node B, so that the loads of node A and node B are balanced within the time period where the new block is located.
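  • the embodiment does not specify how the new boundary is computed. One possible approach, shown purely as an assumption, is to pick the split point that makes the summed per-timeline loads of the two nodes as equal as possible. The function name `rebalance_boundary` and the per-timeline load figures are hypothetical.

```python
def rebalance_boundary(timeline_loads):
    """Return b so that timelines [0, b) go to node A and [b, n) go to node B.

    Assumed heuristic: choose the split that minimizes the absolute
    difference between the two nodes' total loads.
    """
    total = sum(timeline_loads)
    best_boundary, best_diff = 0, total  # b = 0 puts every timeline on node B
    prefix = 0
    for boundary, load in enumerate(timeline_loads, start=1):
        prefix += load
        diff = abs(total - 2 * prefix)  # |load of node A - load of node B|
        if diff < best_diff:
            best_boundary, best_diff = boundary, diff
    return best_boundary


# Hypothetical loads: timelines 0-599 are hot (7 each), 600-1999 are cooler (3 each).
loads = [7] * 600 + [3] * 1400
boundary = rebalance_boundary(loads)  # node A keeps timelines 0..boundary-1
```

  • with these figures the split lands at timeline 600, which corresponds to the boundary values 599 and 600 in the timeline example above.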
  • the user location is the location of at least one user, and the node location is the location of the node that needs to be accessed to process the access request.
  • obtaining the correspondence between the user location and the node location includes: parsing the access request to obtain the access condition, determining the node to be visited to process the access request based on the access condition, and determining the correspondence between the user location and the node location based on the location of the node to be visited and the location of the at least one user.
  • the access request may carry the identifier of the user equipment, and the identifier of the user equipment is used to uniquely indicate the user equipment, so the location of the user can be determined based on the identifier of the user equipment.
  • the process of determining the node to be visited refer to the description above, and details will not be repeated here.
  • if the number of nodes to be visited is one, it is sufficient to store the node location and the user location in correspondence directly.
  • in this case, the correspondence between the user location and the node location is a one-to-one correspondence.
  • if the number of nodes to be visited is more than one, the node location of each node may be stored in correspondence with the user location, and the correspondence between the user location and the node locations is a one-to-many correspondence.
  • when the feature information includes the correspondence between the user location and the node location, adjusting the predetermined rule according to the feature information includes: in response to the distance between the user location and the node location in the correspondence being greater than the distance threshold, determining an updated node whose distance from the user location is not greater than the distance threshold, and adjusting the predetermined rule based on the updated node to obtain the adjusted rule. It can be seen that if the distance between the user location and the node location in the correspondence is greater than the distance threshold, the characteristic information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
  • the time series data that needs to be read and written will be transmitted between the user device and the node device. Therefore, the farther the distance between the user location and the node location is, the longer the time series data needs to be transmitted, which may lead to an increase in the number of routes and reduce the read and write performance of the node. Therefore, the time series data that needs to be read and written is adjusted to be stored in a node whose distance from the user equipment is not greater than the distance threshold, that is, adjusted to be stored in an updated node.
  • the timeline may be used as the unit for adjustment, which is not limited in this embodiment. The following uses the timeline as an example for illustration.
  • for example, the timeline to be visited is located in node A, and the distance between the location of node A and the user location exceeds the distance threshold.
  • in the new block, the time series data of the timeline is written to node B instead of node A, where the distance between node B and the user location is not greater than the distance threshold, and the user subsequently only needs to access node B to access the timeline.
  • the data transmission distance is thereby shortened, and the read and write performance of the nodes is improved.
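  • choosing an updated node can be sketched as follows. Several points are assumptions for illustration only: locations are treated as planar coordinates, distance is Euclidean, the nearest qualifying node is preferred, and the current node is kept when no node is within the threshold. The function name `pick_updated_node` is hypothetical.

```python
import math


def pick_updated_node(user_pos, current_node, node_positions, distance_threshold):
    """Return the node that should hold the timeline for this user."""
    if math.dist(user_pos, node_positions[current_node]) <= distance_threshold:
        return current_node  # condition not satisfied, nothing to adjust
    candidates = [
        (math.dist(user_pos, pos), name)
        for name, pos in node_positions.items()
        if math.dist(user_pos, pos) <= distance_threshold
    ]
    # Assumption: prefer the nearest qualifying node; fall back to the
    # current node when no node lies within the threshold.
    return min(candidates)[1] if candidates else current_node


# Hypothetical coordinates: node A is far from the user, node B is close.
positions = {"A": (100.0, 0.0), "B": (3.0, 4.0)}
updated = pick_updated_node((0.0, 0.0), "A", positions, distance_threshold=10.0)
```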
  • if the characteristic information includes at least two types of information among the load imbalance degree, the query fan-out and the correspondence between the user location and the node location, adjusting the predetermined rule according to the characteristic information includes the following two manners C1 and C2.
  • Manner C1: adjusting the predetermined rule based on the condition whose priority level meets the threshold.
  • priority levels are respectively set for each satisfied condition, and the priority level is used to indicate the priority degree of the condition.
  • adjusting the predetermined rule based on the condition that the priority level meets the threshold includes: adjusting the predetermined rule based on the condition with the highest priority level.
  • for example, the priority levels of the conditions from high to low are as follows. Condition 1: the number of nodes indicated by the query fan-out is greater than the number threshold. Condition 2: the degree of imbalance indicated by the load imbalance degree is greater than the reference threshold. Condition 3: the distance between the user location and the node location exceeds the distance threshold.
  • Manner C2: adjusting the predetermined rule based on all of the conditions that are satisfied.
  • the sub-partition rules are respectively determined for each satisfied condition, and then the sub-partition rules are combined into adjusted rules.
  • for example, the sub-partition rule determined based on condition 1 is the partition key, and the sub-partition rule determined based on condition 2 is the block boundary value. Therefore, both the partition key and the block boundary value are used as the adjusted rule.
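  • manners C1 and C2 can be contrasted with a small sketch. Representing each sub-partition rule as a dictionary, the condition names in `PRIORITY`, and the function name `combine_rules` are assumptions made for illustration; the priority order follows the example ordering given above.

```python
# Conditions from highest to lowest priority (hypothetical identifiers).
PRIORITY = ["fanout", "load_imbalance", "distance"]


def combine_rules(sub_rules, manner):
    """Build the adjusted rule from the sub-rules of the satisfied conditions.

    sub_rules maps each satisfied condition to its sub-partition rule.
    """
    satisfied = [cond for cond in PRIORITY if cond in sub_rules]
    if not satisfied:
        return {}  # no condition satisfied: keep the predetermined rule
    if manner == "C1":
        # Manner C1: use only the satisfied condition with the highest priority.
        return dict(sub_rules[satisfied[0]])
    # Manner C2: merge the sub-partition rules of all satisfied conditions.
    merged = {}
    for cond in satisfied:
        merged.update(sub_rules[cond])
    return merged


# Hypothetical sub-rules: condition 1 yields a partition key,
# condition 2 yields block boundary values.
sub_rules = {
    "load_imbalance": {"boundary_values": (599, 600)},
    "fanout": {"partition_key": "region"},
}

rule_c1 = combine_rules(sub_rules, "C1")
rule_c2 = combine_rules(sub_rules, "C2")
```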
  • the embodiment of the present application adjusts the predetermined rule based on the feature information, and generates a new block group and/or block that matches the user's access habit to the data table according to the adjusted rule.
  • the application can update the partition rules in time, avoid abnormal situations caused by inappropriate partition rules, match the block group and/or block with the user's access habits, and ensure the read and write performance of the nodes of the time series database.
  • FIG. 7 is a schematic structural diagram of an apparatus for adjusting partitions of a time series database provided in an embodiment of the present application, and the apparatus can be applied to the hardware device shown in FIG. 1 .
  • the device for adjusting partitions of the time series database shown in FIG. 7 can execute the operations in the method embodiment shown in FIG. 2 .
  • the device may include more additional modules than those shown or omit some of the modules shown therein, which is not limited in this embodiment of the present application.
  • the device for adjusting partitions of a time series database provided in the embodiment of the present application includes the following modules.
  • the acquiring module 701 is configured to acquire feature information of at least one user's access requests for a data table of the time series database, wherein the data table is divided into multiple block groups according to a predetermined rule, each block group is divided into multiple blocks, the block groups are respectively set in different time periods, the blocks are respectively set in different nodes, and the feature information reflects the at least one user's access habit with respect to the data table.
  • for the steps performed by the acquiring module 701, refer to the description in 201 above; details are not repeated here.
  • the adjustment module 702 is configured to adjust the predetermined rule according to the characteristic information, and generate a new block group and/or a new block matching the access habit according to the adjusted rule. For the steps performed by the adjustment module 702, refer to the description in 202 above, and will not be repeated here.
  • the adjustment module 702 is configured to generate a new block group, and in the new block group, a new block matching the access habit is generated according to the adjusted rule.
  • the adjustment module 702 is further configured to determine the reference time in the predetermined block group, where the predetermined block group is the block group in which the acquisition time of the characteristic information falls, and the reference time is the maximum time at which data was written in the data table before the acquisition time of the characteristic information; and to determine the start time of the new block group based on the time interval between the reference time and the end time of the predetermined block group.
  • the adjustment module 702 is configured to determine the reference moment as the starting moment of the new block group in response to the time interval being not less than the time threshold;
  • the adjustment module 702 is also used to update the predetermined block group, the start time of the updated block group is the start time of the predetermined block group, and the end time of the updated block group is the reference time.
  • the end time of the new block group is the end time of the predetermined block group.
  • the adjustment module 702 is configured to determine the end time of the predetermined block group as the start time of the new block group in response to the time interval being smaller than the time threshold.
  • the feature information includes a query fan-out, and the query fan-out is used to indicate the number of nodes that need to be accessed to process the access request.
  • the adjustment module 702 is configured to, in response to the number of nodes indicated by the query fan-out being greater than the number threshold, determine the partition key based on the usage frequency of the access conditions obtained by parsing the access requests, and adjust the predetermined rule based on the partition key to obtain the adjusted rule.
  • the obtaining module 701 is configured to parse the access request to obtain the access condition, determine the nodes to be visited to process the access request based on the access condition, and determine the number of nodes to be visited as the query fanout.
  • the characteristic information includes a load imbalance degree, which is used to indicate the degree of load imbalance of different nodes.
  • the adjustment module 702 is configured to, in response to the degree of imbalance indicated by the load imbalance degree being greater than a reference threshold, determine the block boundary value based on the loads of different nodes, and adjust the predetermined rule based on the block boundary value to obtain the adjusted rule.
  • the acquisition module 701 is configured to determine the loads of different nodes based on at least one of the data volume of the different nodes, the number of timelines, and the access frequency of the timelines, and to determine the load imbalance degree based on the loads of the different nodes.
  • the feature information includes a correspondence between user locations and node locations, where the user location is the location of at least one user, and the node location is the location of a node that needs to be accessed for processing the access request.
  • the adjustment module 702 is configured to determine an updated node whose distance from the user position is not greater than the distance threshold, and to adjust the predetermined rule based on the updated node to obtain the adjusted rule.
  • the acquisition module 701 is configured to parse the access request to obtain the access condition, determine the node to be accessed for processing the access request based on the access condition, and determine the user based on the location of the node to be accessed and the location of at least one user Correspondence between position and node position.
  • the feature information includes at least one information of query fanout, load imbalance, and correspondence between user locations and node locations.
  • the embodiment of the present application adjusts the predetermined rule based on the feature information, and generates a new block group and/or block that matches the user's access habit to the data table according to the adjusted rule.
  • the application can update the partition rules in time, avoid abnormal situations caused by inappropriate partition rules, match the block group and/or block with the user's access habits, and ensure the read and write performance of the nodes of the time series database.
  • the present application provides a time series database partition adjustment device, which includes a communication interface and a processor; optionally, the device further includes a memory.
  • the communication interface, the memory and the processor communicate with each other through an internal connection path, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory to control the communication interface to receive and send signals; when the processor executes the instructions stored in the memory, the processor is caused to execute any exemplary time series database partition adjustment method provided in the present application.
  • FIG. 8 shows a schematic structural diagram of an exemplary time series database partition adjustment device 800 of the present application.
  • the device 800 shown in FIG. 8 is configured to perform the operations involved in the partition adjustment method of the time series database shown in FIG. 2 above.
  • the device 800 is, for example, a server, a server cluster composed of multiple servers, or a cloud computing service center.
  • a device 800 includes at least one processor 801 , a memory 803 and at least one communication interface 804 .
  • the processor 801 is, for example, a general-purpose central processing unit (CPU), a digital signal processor (DSP), a network processor (NP), a graphics processing unit (GPU), a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, one or more integrated circuits, an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the PLD is, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor can implement or execute the various logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor can also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and so on.
  • the device 800 further includes a bus 802 .
  • Bus 802 is used to transfer information between the various components of device 800 .
  • the bus 802 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus 802 can be divided into address bus, data bus, control bus and so on. For ease of representation, only one line is used in FIG. 8 , but it does not mean that there is only one bus or one type of bus.
  • the memory 803 is, for example, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 803 exists independently, for example, and is connected to the processor 801 through the bus 802 .
  • the memory 803 can also be integrated with the processor 801.
  • the communication interface 804 uses any transceiver-like device to communicate with other devices or a communication network.
  • the communication network can be Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like.
  • the communication interface 804 may include a wired communication interface, and may also include a wireless communication interface.
  • the communication interface 804 can be an Ethernet interface, such as a Fast Ethernet (FE) interface or a Gigabit Ethernet (GE) interface, an Asynchronous Transfer Mode (ATM) interface, a WLAN interface, a cellular network communication interface, or a combination thereof.
  • the Ethernet interface can be an optical interface, an electrical interface or a combination thereof.
  • the communication interface 804 may be used for the device 800 to communicate with other devices.
  • In a specific implementation, the processor 801 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 8. Each of these processors can be a single-core processor or a multi-core processor.
  • A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the device 800 may include multiple processors, such as the processor 801 and the processor 805 shown in FIG. 8 . Each of these processors can be a single-core processor or a multi-core processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data such as computer program instructions.
  • In some implementations, the memory 803 is used to store the program code 810 for implementing the solution of the present application, and the processor 801 can execute the program code 810 stored in the memory 803. That is, the device 800 can implement the partition adjustment method of the time series database provided by the method embodiment through the processor 801 and the program code 810 in the memory 803.
  • One or more software modules may be included in the program code 810 .
  • the processor 801 itself may also store program codes or instructions for executing the solution of the present application.
  • The device 800 of the present application may correspond to the device for performing the above method. The processor 801 in the device 800 reads the instructions in the memory 803, so that the device 800 shown in FIG. 8 can execute all or some of the steps in the method embodiment.
  • The device 800 may also correspond to the apparatus shown in FIG. 7 above, and each functional module in the apparatus shown in FIG. 7 is implemented by software of the device 800.
  • In other words, the functional modules included in the apparatus shown in FIG. 7 are generated after the processor 801 of the device 800 reads the program code 810 stored in the memory 803.
  • Each step of the partition adjustment method of the time series database shown in FIG. 2 is completed by an integrated logic circuit of hardware in the processor of the device 800 or by instructions in the form of software.
  • The steps of the method embodiments disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The software module can be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory. The processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware. To avoid repetition, no detailed description is given here.
  • In an exemplary embodiment, a partition adjustment device for a time series database is provided. The device includes a memory and a processor; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor, so that the device implements the partition adjustment method of the time series database provided by any exemplary embodiment of the present application.
  • In an exemplary embodiment, a computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the partition adjustment method of the time series database provided by any exemplary embodiment of the present application.
  • In an exemplary embodiment, a computer program product is provided. The computer program product includes a computer program or instructions, and the computer program or instructions are executed by a processor, so that a computer implements the partition adjustment method of the time series database provided by any exemplary embodiment of the present application.
  • In an exemplary embodiment, a chip is provided, including a processor configured to call, from a memory, and run instructions stored in the memory, so that a communication device on which the chip is installed executes the partition adjustment method of the time series database provided by any exemplary embodiment of the present application.
  • In an exemplary embodiment, another chip is provided, including an input interface, an output interface, a processor, and a memory.
  • The input interface, the output interface, the processor, and the memory are connected through an internal connection path, and the processor is used to execute the code in the memory.
  • When the code is executed, the processor is configured to execute the partition adjustment method of the time series database provided by any exemplary embodiment of the present application.
  • In the foregoing embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, a Solid State Disk).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

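This application discloses a partition adjustment method, apparatus, and device for a time series database, and a readable storage medium, and belongs to the field of data processing technologies. The method includes: first, obtaining feature information of access requests of at least one user for a data table of the time series database, where the feature information reflects the access habits of the at least one user for the data table. The data table is divided into a plurality of shard groups according to a predetermined rule, and each shard group is further divided into a plurality of shards. The shard groups are located in different time periods, and the shards are located on different nodes. Then, the predetermined rule is adjusted based on the feature information, and a new shard group and/or a new shard matching the access habits is generated according to the adjusted rule. This application can update the sharding rule in a timely manner based on the feature information, which avoids abnormal situations caused by an inappropriate sharding rule, makes the shard groups and/or shards match the users' access habits, and ensures the read/write performance of the nodes.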

Description

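Partition adjustment method, apparatus, and device for a time series database, and readable storage medium
This application claims priority to Chinese Patent Application No. 202110770891.9, filed on July 8, 2021 and entitled "Database processing method and apparatus", and to Chinese Patent Application No. 202111278133.1, filed on October 30, 2021 and entitled "Partition adjustment method, apparatus, and device for a time series database, and readable storage medium", both of which are incorporated herein by reference in their entireties.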
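Technical Field
This application relates to the field of data processing technologies, and in particular to a partition adjustment method, apparatus, and device for a time series database, and a readable storage medium.
Background
In a time series database, data is stored in the form of data tables. As the amount of data keeps growing, the time series database needs to distribute a data table across multiple nodes for storage. The data table is therefore partitioned (sharding) into multiple shards, and the shards are placed on different nodes.
In the related art, the data table is first divided by time period into multiple shard groups, each located in a different time period. Within a shard group, the data is further divided according to a certain rule into multiple shards, each located on a different node. The rule used to obtain the shards is selected when the database or the data table is created.
However, as user services change, users' access habits for the data table may change, so that the rule selected when the database or data table was created no longer matches the access habits. If the selected rule continues to be used for sharding, abnormal situations such as load imbalance between nodes may occur, which degrades the read/write performance of the nodes.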
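Summary
This application provides a partition adjustment method, apparatus, and device for a time series database, and a readable storage medium, to solve the problems in the related art. The technical solutions are as follows:
According to a first aspect, a partition adjustment method for a time series database is provided. First, feature information of access requests of at least one user for a data table of the time series database is obtained, where the feature information reflects the access habits of the at least one user for the data table. The data table is divided into a plurality of shard groups according to a predetermined rule, and each shard group is further divided into a plurality of shards. The shard groups are located in different time periods, and the shards are located on different nodes. Then, the predetermined rule is adjusted based on the feature information, and a new shard group and/or a new shard matching the access habits is generated according to the adjusted rule.
This application obtains feature information that reflects users' access habits for the data table and adjusts the predetermined rule based on that feature information, so that a new shard group and/or a new shard matching the access habits is generated according to the adjusted rule. This application can update the sharding rule in a timely manner, which avoids abnormal situations caused by an inappropriate sharding rule, makes the shard groups and/or shards match the users' access habits, and ensures the read/write performance of the nodes.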
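In a possible implementation, generating a new shard group and/or a new shard matching the access habits according to the adjusted rule includes: generating a new shard group, and generating, in the new shard group, a new shard matching the access habits according to the adjusted rule.
In a possible implementation, the method further includes: determining a reference time in a predetermined shard group, where the predetermined shard group is the shard group in which the time of obtaining the feature information falls, and the reference time is the latest time (largest timestamp), before the feature information is obtained, for which data has been written to the data table; and determining the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group. In this implementation, the reference time is used as the earliest time at which the predetermined rule can be adjusted, so that data already written to the data table does not need to be migrated and the migration does not consume processing resources. Because the reference time is the earliest time at which the predetermined rule can be adjusted, the start time of the new shard group is determined based on it.
In a possible implementation, determining the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group includes: in response to the time interval being not less than a time threshold, determining the reference time as the start time of the new shard group. The method further includes: updating the predetermined shard group, where the start time of the updated shard group is the start time of the predetermined shard group, and the end time of the updated shard group is the reference time. The shard groups are generated one after another: when one shard group ends, the next one is generated. A time interval that is not less than the time threshold means that there is a long time between the reference time and the generation of the next shard group. If the predetermined rule were adjusted only after the predetermined shard group ends, the predetermined rule would remain in use for a long time and might cause abnormal situations, so the rule needs to be updated as soon as possible. For this reason, this implementation determines the earliest time at which the predetermined rule can be updated, namely the reference time, as the start time of the new shard group, so that the adjusted rule is used in the new shard group. This makes the process of determining the start time of the new shard group flexible.
In a possible implementation, the end time of the new shard group is the end time of the predetermined shard group. That is, the predetermined shard group is split at the reference time into two different shard groups: one is the updated shard group, and the other is the new shard group.
In a possible implementation, determining the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group includes: in response to the time interval being less than the time threshold, determining the end time of the predetermined shard group as the start time of the new shard group. The shard groups are generated one after another: when one shard group ends, the next one is generated. A time interval that is less than the time threshold means that there is only a short time between the reference time and the generation of the next shard group, so this implementation can wait until the predetermined shard group ends and adjust the predetermined rule when the next shard group is generated, that is, determine the end time of the predetermined shard group as the start time of the new shard group. This makes the process of determining the start time of the new shard group flexible.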
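In a possible implementation, the feature information includes a query fan-out, which indicates the number of nodes that need to be accessed to process an access request.
In a possible implementation, adjusting the predetermined rule based on the feature information includes: in response to the number of nodes indicated by the query fan-out being greater than a quantity threshold, determining a partition key based on the usage frequency of access conditions obtained by parsing the access requests, and adjusting the predetermined rule based on the partition key to obtain the adjusted rule. Obtaining the adjusted rule based on the partition key can reduce the query fan-out during access.
In a possible implementation, obtaining the feature information of access requests of at least one user for a data table of the time series database includes: parsing an access request to obtain an access condition, determining, based on the access condition, the nodes that need to be accessed to process the access request, and determining the number of nodes that need to be accessed as the query fan-out.
In a possible implementation, the feature information includes a load imbalance degree, which indicates how unbalanced the loads of the different nodes are.
In a possible implementation, adjusting the predetermined rule based on the feature information includes: in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determining a shard boundary value based on the loads of the different nodes, and adjusting the predetermined rule based on the shard boundary value to obtain the adjusted rule. Obtaining the adjusted rule based on the shard boundary value can alleviate load imbalance between the nodes.
In a possible implementation, obtaining the feature information of access requests of at least one user for a data table of the time series database includes: determining the loads of the different nodes based on at least one of the data volume, the number of timelines, and the access frequency of the timelines of the different nodes; and determining the load imbalance degree based on the loads of the different nodes.
In a possible implementation, the feature information includes a correspondence between a user location and a node location, where the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process an access request.
In a possible implementation, adjusting the predetermined rule based on the feature information includes: in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determining an updated node whose distance from the user location is not greater than the distance threshold, and adjusting the predetermined rule based on the updated node to obtain the adjusted rule. Obtaining the adjusted rule based on the updated node can shorten the data transmission distance during access.
In a possible implementation, obtaining the feature information of access requests of at least one user for a data table of the time series database includes: parsing an access request to obtain an access condition, determining, based on the access condition, the nodes that need to be accessed to process the access request, and determining the correspondence between the user location and the node location based on the locations of the nodes that need to be accessed and the location of the at least one user.
In a possible implementation, the feature information includes at least one of the query fan-out, the load imbalance degree, and the correspondence between the user location and the node location.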
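According to a second aspect, a partition adjustment apparatus for a time series database is provided. The apparatus includes:
an obtaining module, configured to obtain feature information of access requests of at least one user for a data table of the time series database, where the data table is divided into a plurality of shard groups according to a predetermined rule, each shard group is divided into a plurality of shards, each shard group is located in a different time period, each shard is located on a different node, and the feature information reflects the access habits of the at least one user for the data table; and
an adjustment module, configured to adjust the predetermined rule based on the feature information, and generate, according to the adjusted rule, a new shard group and/or a new shard matching the access habits.
In a possible implementation, the adjustment module is configured to generate a new shard group, and generate, in the new shard group, a new shard matching the access habits according to the adjusted rule.
In a possible implementation, the adjustment module is further configured to determine a reference time in a predetermined shard group, where the predetermined shard group is the shard group in which the time of obtaining the feature information falls, and the reference time is the latest time (largest timestamp), before the feature information is obtained, for which data has been written to the data table; and determine the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group.
In a possible implementation, the adjustment module is configured to, in response to the time interval being not less than a time threshold, determine the reference time as the start time of the new shard group;
the adjustment module is further configured to update the predetermined shard group, where the start time of the updated shard group is the start time of the predetermined shard group, and the end time of the updated shard group is the reference time.
In a possible implementation, the end time of the new shard group is the end time of the predetermined shard group.
In a possible implementation, the adjustment module is configured to, in response to the time interval being less than the time threshold, determine the end time of the predetermined shard group as the start time of the new shard group.
In a possible implementation, the feature information includes a query fan-out, which indicates the number of nodes that need to be accessed to process an access request.
In a possible implementation, the adjustment module is configured to, in response to the number of nodes indicated by the query fan-out being greater than a quantity threshold, determine a partition key based on the usage frequency of access conditions obtained by parsing the access requests, and adjust the predetermined rule based on the partition key to obtain the adjusted rule.
In a possible implementation, the obtaining module is configured to parse an access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the number of nodes that need to be accessed as the query fan-out.
In a possible implementation, the feature information includes a load imbalance degree, which indicates how unbalanced the loads of the different nodes are.
In a possible implementation, the adjustment module is configured to, in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determine a shard boundary value based on the loads of the different nodes, and adjust the predetermined rule based on the shard boundary value to obtain the adjusted rule.
In a possible implementation, the obtaining module is configured to determine the loads of the different nodes based on at least one of the data volume, the number of timelines, and the access frequency of the timelines of the different nodes; and determine the load imbalance degree based on the loads of the different nodes.
In a possible implementation, the feature information includes a correspondence between a user location and a node location, where the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process an access request.
In a possible implementation, the adjustment module is configured to, in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determine an updated node whose distance from the user location is not greater than the distance threshold, and adjust the predetermined rule based on the updated node to obtain the adjusted rule.
In a possible implementation, the obtaining module is configured to parse an access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the correspondence between the user location and the node location based on the locations of the nodes that need to be accessed and the location of the at least one user.
In a possible implementation, the feature information includes at least one of the query fan-out, the load imbalance degree, and the correspondence between the user location and the node location.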
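According to a third aspect, a partition adjustment device for a time series database is provided. The device includes a memory and a processor; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor, so that the device implements the method in any of the foregoing aspects.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be disposed separately.
In a specific implementation, the memory may be a non-transitory memory, for example, a read-only memory (ROM). It may be integrated on the same chip as the processor, or the two may be disposed on different chips. This application does not limit the type of the memory or the manner in which the memory and the processor are disposed.
According to a fourth aspect, a computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the method in any of the foregoing aspects.
According to a fifth aspect, a computer program product is provided. The computer program product includes a computer program or instructions, and the computer program or instructions are executed by a processor, so that a computer implements the method in any of the foregoing aspects.
According to a sixth aspect, a chip is provided, including a processor configured to call, from a memory, and run instructions stored in the memory, so that a communication device on which the chip is installed executes the method in any of the foregoing aspects.
According to a seventh aspect, another chip is provided, including an input interface, an output interface, a processor, and a memory. The input interface, the output interface, the processor, and the memory are connected through an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor executes the method in any of the foregoing aspects.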
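Brief Description of Drawings
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a partition adjustment method for a time series database according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a data table of a time series database according to an embodiment of this application;
FIG. 4 is a schematic diagram of a sharding process of a data table of a time series database according to an embodiment of this application;
FIG. 5 is a schematic diagram of a sharding process of a data table of a time series database according to an embodiment of this application;
FIG. 6 is a schematic diagram of shard groups according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a partition adjustment apparatus for a time series database according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a partition adjustment device for a time series database according to an embodiment of this application.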
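Detailed Description
The terms used in the embodiments of this application are only intended to explain specific embodiments of this application, and are not intended to limit this application.
An embodiment of this application provides a partition adjustment method for a time series database. The method is applied to the implementation environment shown in FIG. 1, which includes a statistics module, a decision module, and a sharding module that are connected to one another in pairs. The statistics module, the decision module, and the sharding module may be integrated in the same hardware device or in different hardware devices to implement their respective functions. For example, the hardware device includes but is not limited to a terminal device, a server, or another network device that needs sharding; the embodiments of this application do not limit the type of the hardware device. The functions of the modules are described below.
The statistics module is configured to collect feature information. Referring to FIG. 1, the statistics module collects information related to user requests and the load information of each node sent by the sharding module, and derives the feature information from them. For example, the feature information reflects the access habits of at least one user for the data table. The feature information includes but is not limited to: the correspondence between user location and node location, the load imbalance degree, and the query fan-out. The statistics module then feeds the feature information back to the decision module.
The decision module is configured to receive the feature information sent by the statistics module and determine, based on the feature information, whether the predetermined rule (sharding rule) currently in use needs to be updated. If an update is needed, the decision module adjusts the predetermined rule based on the feature information to obtain an adjusted rule, and sends the adjusted rule to the sharding module. If the decision module determines that no update is needed, it does not send an adjusted rule to the sharding module. In addition, regardless of whether the predetermined rule is updated, the decision module instructs the sharding module to generate the next shard group after the current shard group ends.
The sharding module is configured to perform sharding according to the instructions of the decision module. In response to receiving an adjusted rule from the decision module, the sharding module generates a new shard group and/or shard according to the adjusted rule. In addition, following the instructions of the decision module, the sharding module generates the next shard group after the current shard group ends.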
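Based on the implementation environment shown in FIG. 1, an embodiment of this application provides a partition adjustment method for a time series database. Referring to FIG. 2, the method includes the following steps.
201: Obtain feature information of access requests of at least one user for a data table of the time series database, where the feature information reflects the access habits of the at least one user for the data table.
In this embodiment of this application, statistics are continuously collected on the access requests of at least one user for the data table of the time series database to obtain the feature information. Because the collection is continuous, the feature information may change over time. In some implementations, the feature information is obtained once every reference duration. This embodiment does not limit the reference duration, which can be set based on experience. The feature information is described in detail later. The time series database and its data table are described next.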
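A time series database is a database for processing time series data, that is, data associated with a timestamp. In a time series database, time series data is stored in the form of a data table and is arranged in the table in chronological order. A piece of time series data includes a timestamp, a tag, and at least one piece of metric data. The metric data includes data generated by a data source and/or data indicating attributes of the data source (also called metadata), and the tag uniquely identifies the data source. For example, FIG. 3 shows an exemplary data table of a time series database, in which each row is one piece of time series data. In FIG. 3, the data source that generates the metric data is a device, the tag is the device identifier (device_id), and the metric data includes data generated by the data source and data indicating attributes of the data source. The data generated by the data source includes the one-minute central processing unit (CPU) average (cpu_1m_avg), free memory (free_mem), and temperature (temperature). The data indicating attributes of the data source includes the location identifier (location_id) and the device type (dev_type).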
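In this embodiment of this application, the data table of the time series database is divided into a plurality of shard groups according to a predetermined rule, and each shard group is divided into a plurality of shards. Each shard group is located in a different time period, and each shard is located on a different node. Shard groups and shards are described with reference to FIG. 4.
Referring to the vertical axis in FIG. 4, the data table is first divided into a plurality of shard groups by time range. A shard group is also called a time partition. Each shard group is located in a different time period, and the length of that period is called the shard group duration. Different shard groups may have the same or different durations; this embodiment does not limit the shard group duration. For example, one shard group covers Monday 0:00 to 24:00 and another covers Tuesday 0:00 to 24:00. The two shard groups are in different time periods (Monday and Tuesday), but have the same duration (24 hours). The shard groups are generated one after another, that is, the next shard group is generated after the current one ends. For example, if a shard group covers Monday 0:00 to 24:00, it ends at Monday 24:00, and the next shard group is generated at that moment.
Referring to the horizontal axis in FIG. 4, each shard group is divided into a plurality of shards according to the predetermined rule. The shards are located on different nodes of the database, and each shard contains at least one piece of time series data from the data table. For example, the predetermined rule includes but is not limited to at least one of the tag and the metric data included in the time series data; this embodiment does not limit the predetermined rule.
As can be seen from the description of FIG. 4, in some implementations, dividing the data table of the time series database into a plurality of shard groups according to a predetermined rule, with each shard group divided into a plurality of shards, includes: dividing the data table into a plurality of shard groups by time range, and dividing each shard group into a plurality of shards according to the predetermined rule. The case shown in FIG. 5 is used as an example to describe this sharding process.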
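First, the data table is divided into two shard groups, using one minute as a time period. The first shard group covers 2017-01-01 01:02:00 to 2017-01-01 01:02:59 and contains time series data 1, 2, and 3. The second shard group covers 2017-01-01 01:03:00 to 2017-01-01 01:03:59 and contains time series data 4, 5, and 6.
In both shard groups, the location identifier is used as the predetermined rule, so that each shard group is divided into two shards. As shown in FIG. 5, the first shard of the first shard group contains time series data 1 and 2 (both with location identifier 42), and the second shard of the first shard group contains time series data 3 (location identifier 77). The first shard of the second shard group contains time series data 4 and 5 (both with location identifier 42), and the second shard of the second shard group contains time series data 6 (location identifier 77). The first shards of both shard groups are located on the first node, and the second shards of both shard groups are located on the second node.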
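The two-level division in the FIG. 5 example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `shard_rows`, the row layout, and the exact second values of the timestamps are assumptions made here (the text only states which minute each piece of time series data falls in and its location identifier).

```python
from collections import defaultdict
from datetime import datetime

def shard_rows(rows, rule_key, fmt="%Y-%m-%d %H:%M:%S"):
    """Group rows into one-minute shard groups, then into shards by rule_key."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        ts = datetime.strptime(row["time"], fmt)
        # The shard group is identified by the start of its one-minute period.
        group_start = ts.replace(second=0, microsecond=0)
        # The shard inside the group is chosen by the predetermined rule.
        groups[group_start][row[rule_key]].append(row["id"])
    return {start: dict(shards) for start, shards in groups.items()}

# Hypothetical rows; only the minute and location_id follow the FIG. 5 example.
ROWS = [
    {"id": 1, "time": "2017-01-01 01:02:10", "location_id": 42},
    {"id": 2, "time": "2017-01-01 01:02:30", "location_id": 42},
    {"id": 3, "time": "2017-01-01 01:02:50", "location_id": 77},
    {"id": 4, "time": "2017-01-01 01:03:05", "location_id": 42},
    {"id": 5, "time": "2017-01-01 01:03:20", "location_id": 42},
    {"id": 6, "time": "2017-01-01 01:03:45", "location_id": 77},
]

layout = shard_rows(ROWS, "location_id")
```

In this sketch each inner key (42 or 77) stands for one shard, which a real system would then map to a node.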
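It should be noted that after a shard group is divided into shards according to the predetermined rule, only time series data within the time period of that shard group is written to those shards. Time series data that was written to the data table before that time period is not rewritten. For example, suppose a shard group covers Monday 0:00 to 24:00 and has been divided into shards according to the predetermined rule. Time series data that arrives between Monday 0:00 and 24:00 is written to those shards, whereas time series data written to the data table before Monday 0:00 is not rewritten.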
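202: Adjust the predetermined rule based on the feature information, and generate, according to the adjusted rule, a new shard group and/or a new shard matching the access habits.
As described in 201, the data table has been divided into shard groups and shards according to the predetermined rule, and the feature information reflects the access habits of at least one user for the data table. If the feature information satisfies a condition, the shard groups and shards obtained under the predetermined rule do not match the access habits reflected by the feature information, and continuing to use the predetermined rule would cause abnormal situations. Therefore, when the feature information satisfies the condition, the predetermined rule needs to be adjusted based on the feature information, so that a new shard group and/or a new shard matching the access habits is generated according to the adjusted rule. How the feature information satisfies the condition is described later together with the feature information.
In an exemplary embodiment, generating a new shard group and/or a new shard matching the access habits according to the adjusted rule includes: generating a new shard group, and generating, in the new shard group, a new shard matching the access habits according to the adjusted rule. Because the new shard group is located in a time period, it has a start time and an end time, both of which need to be determined.
In an exemplary embodiment, the method further includes: determining a reference time in a predetermined shard group, and determining the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group. The predetermined shard group is the shard group in which the time of obtaining the feature information falls. The reference time is the latest time (largest timestamp), before the feature information is obtained, for which data has been written to the data table.
As described in 201, data already written to the data table is not rewritten. It is therefore necessary to determine the latest time for which data has been written to the data table, namely the reference time; the predetermined rule can be adjusted only from the reference time onward. For example, because the time series data in the data table includes timestamps, this embodiment obtains the largest timestamp in the data table and determines the time it indicates as the reference time. This way of determining the reference time is only an example and does not limit this embodiment.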
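In an exemplary embodiment, determining the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group includes the following two manners, A1 and A2.
Manner A1: In response to the time interval between the reference time and the end time of the predetermined shard group being less than a time threshold, determine the end time of the predetermined shard group as the start time of the new shard group.
As described in 201, the shard groups are generated one after another, and the next shard group is generated when the current one ends. A time interval that is less than the time threshold means that there is only a short time between the reference time and the generation of the next shard group. Therefore, this embodiment does not use the reference time as the start time of the new shard group. In other words, it does not generate a new shard group or adjust the predetermined rule immediately, but waits until the predetermined shard group ends and adjusts the predetermined rule when the next shard group is generated. Because the interval is short, continuing to shard with the predetermined rule between the reference time and the end of the predetermined shard group does not have serious consequences, even though the abnormal situation caused by the predetermined rule is not alleviated before the predetermined shard group ends. For example, the time threshold is 1 hour, the predetermined shard group covers Monday 0:00 to 24:00, and the reference time is Monday 23:30. The interval between the reference time and the end time of the predetermined shard group (Monday 24:00) is 30 minutes, which is less than the 1-hour threshold, so the end time of the predetermined shard group is used as the start time of the new shard group.
In manner A1, the end time of the new shard group can be any time later than its start time. For example, this embodiment gives the new shard group the same shard group duration as the predetermined shard group: if the predetermined shard group covers Monday 0:00 to 24:00, the new shard group covers Tuesday 0:00 to 24:00.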
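Manner A2: In response to the time interval between the reference time and the end time of the predetermined shard group being not less than the time threshold, determine the reference time as the start time of the new shard group.
A time interval that is not less than the time threshold means that there is a long time between the reference time and the generation of the next shard group. If the predetermined rule were adjusted only after the predetermined shard group ends, the abnormal situation caused by the predetermined rule might persist for a long time and have serious consequences. Therefore, in manner A2, the reference time is determined as the start time of the new shard group, that is, the new shard group is generated immediately so that the adjusted rule takes effect in it, which ensures that the predetermined rule is adjusted in time. For example, the time threshold is 1 hour, the predetermined shard group covers Monday 0:00 to 24:00, and the reference time is Monday 14:00. The interval between the reference time and the end time of the predetermined shard group (Monday 24:00) is 10 hours, which is greater than the 1-hour threshold, so the reference time is determined as the start time of the new shard group.
In addition, in manner A2, different shard groups must be located in different time periods, and the part of the predetermined shard group after the reference time now belongs to the new shard group, so the predetermined shard group also needs to be updated. In an exemplary embodiment, the method further includes: updating the predetermined shard group, where the start time of the updated shard group is the start time of the predetermined shard group, and the end time of the updated shard group is the reference time. Continuing the example in which the predetermined shard group covers Monday 0:00 to 24:00 and the reference time is Monday 14:00, the updated shard group starts at Monday 0:00 and ends at Monday 14:00.
In an exemplary embodiment, the end time of the new shard group in manner A2 is the end time of the predetermined shard group. For example, if the predetermined shard group covers [T1, T2] and the reference time is T3, which lies between T1 and T2, the new shard group starts at T3 and ends at T2, the end time of the predetermined shard group. Referring to FIG. 6, in this case the predetermined shard group is split at the reference time into two different shard groups: the new shard group [T3, T2] and the updated shard group [T1, T3]. This embodiment does not limit the end time of the new shard group, which may be later or earlier than the end time of the predetermined shard group.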
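Manners A1 and A2 amount to a single comparison between the remaining time in the predetermined shard group and the time threshold. The following Python sketch illustrates that decision under stated assumptions: the function name `plan_new_group` and the tuple representation of a shard group are hypothetical, the A1 branch reuses the duration of the predetermined shard group (one of the options the text allows), and the A2 branch ends the new shard group at the end of the predetermined shard group.

```python
from datetime import datetime, timedelta

def plan_new_group(group_start, group_end, reference_time, threshold):
    """Return (existing_group, new_group), each as a (start, end) tuple."""
    gap = group_end - reference_time
    if gap < threshold:
        # Manner A1: wait for the predetermined shard group to end, then
        # start the new group there with the same duration (an assumption).
        duration = group_end - group_start
        return (group_start, group_end), (group_end, group_end + duration)
    # Manner A2: split the predetermined shard group at the reference time.
    return (group_start, reference_time), (reference_time, group_end)

MON = datetime(2024, 1, 1)          # a Monday, 0:00
TUE = MON + timedelta(days=1)       # Monday 24:00
ONE_HOUR = timedelta(hours=1)

a1 = plan_new_group(MON, TUE, MON.replace(hour=23, minute=30), ONE_HOUR)
a2 = plan_new_group(MON, TUE, MON.replace(hour=14), ONE_HOUR)
```

With the examples from the text, `a1` keeps the Monday group unchanged and schedules the new group for Tuesday, while `a2` shortens the Monday group to 0:00-14:00 and starts the new group at 14:00.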
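It can be understood that, in practice, steps 201 and 202 can be performed repeatedly, so that the sharding rule is continuously updated as needed, matches the users' access habits, and preserves the read/write performance of the database nodes.
In addition, step 202 addresses the case in which the feature information satisfies the condition and the predetermined rule needs to be adjusted. If the feature information does not satisfy the condition, sharding according to the predetermined rule does not cause abnormal situations, so the predetermined rule does not need to be adjusted and continues to be used. For example, the next shard group is generated directly after the predetermined shard group ends, and it is still divided into shards according to the predetermined rule. The start time of that next shard group is the end time of the predetermined shard group; this embodiment does not limit its end time.
The partition adjustment process of the time series database in this embodiment has been described above. The details deferred in that description are described next.
In an exemplary embodiment, the feature information of the data table includes at least one of the correspondence between user location and node location, the load imbalance degree, and the query fan-out. It can be understood that these are only examples and do not limit this embodiment; other information can also be used as the feature information of the data table as needed. The query fan-out, the load imbalance degree, and the correspondence between user location and node location are described in cases B1 to B3 respectively.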
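Case B1: The query fan-out indicates the number of nodes that need to be accessed to process an access request. For example, the query fan-out is obtained as follows: parse the access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the number of those nodes as the query fan-out. The access request is used to read data from or write data to the data table, and the access condition in the request determines which node or nodes need to be accessed. For example, if the access condition is to access the data of region A within a certain time period, and that data is located on node 1 and node 2, the nodes that need to be accessed are node 1 and node 2. Once the nodes are determined, they can be counted, and the query fan-out is determined from that count. In some implementations, the number of nodes that need to be accessed is used directly as the query fan-out. In the preceding example, the nodes that need to be accessed are node 1 and node 2, so the query fan-out is 2. In other implementations, the ratio of the number of nodes that need to be accessed to the total number of nodes in the database is used as the query fan-out. With 2 nodes to access and a database of 10 nodes, the query fan-out is 2/10.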
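Both variants of the query fan-out described above (absolute count and ratio) can be illustrated with a short Python sketch. The `placement` mapping from a region to the nodes holding its data in the queried time period is a hypothetical stand-in for whatever metadata a real system consults after parsing the access condition.

```python
def nodes_to_access(region, placement):
    """Nodes holding data for the region named in the access condition."""
    return set(placement.get(region, ()))

def query_fan_out(region, placement, total_nodes=None):
    """Node count, or the ratio to all nodes when total_nodes is given."""
    count = len(nodes_to_access(region, placement))
    return count if total_nodes is None else count / total_nodes

# Region A's data for the queried time period sits on node 1 and node 2.
placement = {"A": ["node1", "node2"], "B": ["node3"]}

fan_out_count = query_fan_out("A", placement)                  # count variant
fan_out_ratio = query_fan_out("A", placement, total_nodes=10)  # ratio variant
```

Whichever variant is used, the threshold it is compared against has to be expressed in the same unit (a node count or a ratio).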
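In an exemplary embodiment, adjusting the predetermined rule based on the feature information includes: in response to the number of nodes indicated by the query fan-out being greater than a quantity threshold, determining a partition key based on the usage frequency of the access conditions obtained by parsing the access requests, and adjusting the predetermined rule based on the partition key to obtain the adjusted rule. In other words, when the number of nodes indicated by the query fan-out is greater than the quantity threshold, the feature information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
A number of nodes greater than the quantity threshold means that the partition key currently in use does not match the users' access habits. For example, the current partition key is time. A shard group contains ten sub-periods A0 to A9. The time series data of one sub-period is located on the same node, whereas the time series data of one region may be scattered across different nodes. When the access condition is a region, for example when all time series data of region B0 needs to be read or written, all ten nodes may have to be accessed to obtain that data, so the query fan-out is large and may exceed the quantity threshold. A new partition key therefore needs to be determined based on the access conditions to obtain the adjusted rule.
In some implementations, adjusting the predetermined rule based on the partition key to obtain the adjusted rule includes: using the most frequently used access condition as the partition key, and replacing the predetermined rule with the partition key to obtain the adjusted rule. For example, if the most frequently used access condition is region, region is used as the partition key. Continuing the preceding example, once region is the partition key, all time series data of a region is located on the same node rather than scattered across nodes. Reading or writing all time series data of region B0 then requires accessing only the one node that corresponds to region B0, which reduces the query fan-out. In other implementations, the access conditions are sorted in descending order of usage frequency to obtain an access condition sequence, and the first N access conditions in the sequence are used as the partition key, where N is a reference quantity. This embodiment does not limit the reference quantity, which is a positive integer not less than 2. For example, if the most frequently used access condition is region, the second most frequently used is time, and the reference quantity is 2, region and time together are used as the adjusted rule.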
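Selecting the partition key from the usage frequency of access conditions can be sketched with a frequency counter. The function name `choose_partition_keys` and the sample list of parsed access conditions are assumptions for illustration; the text does not prescribe how the frequencies are stored.

```python
from collections import Counter

def choose_partition_keys(access_conditions, reference_count=1):
    """Return the reference_count most frequently used access conditions."""
    counts = Counter(access_conditions)
    return [cond for cond, _ in counts.most_common(reference_count)]

# Hypothetical access conditions parsed from recent access requests.
parsed = ["region", "time", "region", "device_id", "region", "time"]

single_key = choose_partition_keys(parsed)                        # top condition only
composite_key = choose_partition_keys(parsed, reference_count=2)  # top two conditions
```

Here `single_key` corresponds to replacing the predetermined rule with the most frequent condition, and `composite_key` to the variant that uses the first N conditions of the sorted sequence.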
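When a data table is created, the user may set the partition key arbitrarily or according to the needs at creation time. However, while the data table is in use, the user's services may change, so that the previously set partition key no longer suits the current services and the shards obtained from it do not match the user's access habits. This embodiment adjusts the partition key based on the query fan-out, so that the partition key suits the current state of the user's services and the resulting shards match the user's access habits, which improves node read/write performance.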
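Case B2: The load imbalance degree indicates how unbalanced the loads of the different nodes are. For example, the load imbalance degree is obtained as follows: determine the loads of the different nodes based on at least one of the data volume, the number of timelines, and the access frequency of the timelines of the different nodes, and determine the load imbalance degree based on those loads. Timelines correspond one-to-one to data sources, and a timeline includes one or more pieces of time series data of its data source. For example, in the case shown in FIG. 3, the timeline of data source abc123 includes time series data 1 and time series data 4. The access frequency of a timeline is the frequency at which the timeline is read or written.
In some implementations, the different nodes are the nodes on which shards of the data table are located. For example, if the database has 10 nodes and, in a shard group, the data table has 5 shards located on 5 nodes, the different nodes are those 5 of the 10 nodes. In other implementations, the different nodes are all nodes of the database. For example, if the database has 10 nodes, these 10 nodes are the different nodes.
For example, determining the load imbalance degree based on the loads of the different nodes includes: for each node, determining a load value based on the load of the node, and determining the load imbalance degree based on the load values of the different nodes, for instance by calculating the variance of the load values and using the variance as the load imbalance degree. In some implementations, determining the load value of a node based on its load includes: determining a first sub-value based on the data volume of the node, a second sub-value based on the number of timelines of the node, and a third sub-value based on the access frequency of the timelines of the node, and using the weighted sum of the three sub-values as the load value of the node. The weights of the sub-values may be the same or different; this embodiment does not limit them.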
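The weighted-sum load value and the variance-based load imbalance degree can be written down directly. In the sketch below the raw metrics are used as the three sub-values and the weights default to 1; both choices, as well as the function names and sample numbers, are assumptions, since the text leaves the sub-value mapping and the weights open.

```python
from statistics import pvariance

def load_value(data_volume, timeline_count, access_freq, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three sub-values (raw metrics used as sub-values)."""
    w1, w2, w3 = weights
    return w1 * data_volume + w2 * timeline_count + w3 * access_freq

def load_imbalance(node_metrics, weights=(1.0, 1.0, 1.0)):
    """Population variance of the per-node load values."""
    values = [load_value(*metrics, weights=weights) for metrics in node_metrics]
    return pvariance(values)

# Hypothetical (data_volume, timeline_count, access_freq) per node.
skewed = [(10, 4, 6), (2, 1, 1)]     # load values 20 and 4
even = [(6, 3, 3), (6, 3, 3)]        # load values 12 and 12

skewed_degree = load_imbalance(skewed)
even_degree = load_imbalance(even)
```

In practice the three metrics have different units, so a real system would probably normalize them before weighting; that step is omitted here for brevity.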
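In an exemplary embodiment, adjusting the predetermined rule based on the feature information includes: in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determining the shard boundary values between the different nodes based on their loads, and adjusting the predetermined rule based on the shard boundary values to obtain the adjusted rule. In other words, when the imbalance indicated by the load imbalance degree is greater than the reference threshold, the feature information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
When the loads of different nodes are unbalanced, a heavily loaded node is limited by its processing capacity and therefore has poor read/write performance, while a lightly loaded node cannot make full use of its processing capacity, which is wasted. The imbalance therefore needs to be alleviated. In this embodiment, the loads of the different nodes are adjusted by modifying the shard boundary values, thereby balancing the load across the nodes. For example, adjusting the predetermined rule based on the shard boundary values to obtain the adjusted rule includes: replacing the predetermined rule with the shard boundary values to obtain the adjusted rule. When the loads of the nodes are adjusted, the adjustment can be made in units of timelines; this embodiment does not limit the unit. The following description uses the timeline as the unit.
In the predetermined shard group, node A holds timelines 0-999 and node B holds timelines 1000-1999, so the shard boundary values are 999 and 1000. If node A is heavily loaded and node B is lightly loaded in the predetermined shard group, the two nodes are unbalanced. The load of node A therefore needs to be reduced and the load of node B increased, so that the two nodes become balanced. For example, the shard boundary values are adjusted from 999 and 1000 to 599 and 600. In the new shard group, the time series data of the data sources of timelines 0-599 is written to node A, and the time series data of the data sources of timelines 600-1999 is written to node B, so that the loads of node A and node B are balanced within the time period of the new shard group.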
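One way to derive a boundary such as 599/600 from per-timeline loads is to pick the split point at which the cumulative load is closest to half of the total. The text does not specify how the boundary is computed, so the search below is only an assumed strategy; `balanced_boundary`, `node_for`, and the per-timeline load figures are hypothetical.

```python
from itertools import accumulate

def balanced_boundary(timeline_loads):
    """Index of the last timeline kept on node A so that both halves are closest."""
    total = sum(timeline_loads)
    running = list(accumulate(timeline_loads))
    # Leave at least one timeline for node B, hence len(running) - 1.
    return min(range(len(running) - 1), key=lambda i: abs(2 * running[i] - total))

def node_for(timeline_id, boundary):
    """Route a timeline to node A or node B in the new shard group."""
    return "A" if timeline_id <= boundary else "B"

# Hypothetical loads: timelines 0-599 are hot (7 each), 600-1999 are cold (3 each).
loads = [7] * 600 + [3] * 1400

old_a, old_b = sum(loads[:1000]), sum(loads[1000:])   # boundary 999/1000
boundary = balanced_boundary(loads)
new_a, new_b = sum(loads[:boundary + 1]), sum(loads[boundary + 1:])
```

With these figures the old split gives node A a load of 5400 against 3000 for node B, while the recomputed boundary evens the two out; the routing function only affects writes in the new shard group, consistent with the no-migration behavior described in the text.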
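It should be emphasized that, in this load balancing process, the imbalance occurs within the time period of the predetermined shard group, and the shard boundary values are adjusted within the time period of the new shard group, so that load balance is achieved in the new shard group. No time series data is migrated between nodes. This prevents data migration between nodes from conflicting with normal user access to the database and thus from affecting node read/write performance. Because the shard boundary values are not adjusted within the time period of the predetermined shard group, the imbalance in that period remains. However, a time series database is usually most concerned with the newest shard group, so achieving load balance in the new shard group satisfies the database's requirement for load balance between nodes.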
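Case B3: In the correspondence between user location and node location, the user location is the location of at least one user, and the node location is the location of a node that needs to be accessed to process an access request. For example, the correspondence is obtained as follows: parse the access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the correspondence based on the locations of those nodes and the location of the at least one user. The access request can carry an identifier of the user equipment, which uniquely identifies the user equipment, so the user location can be determined from that identifier. For the process of determining the nodes that need to be accessed, see the description above. For example, when one node needs to be accessed, the node location is stored in association with the user location, and the correspondence is one-to-one. When multiple nodes need to be accessed, the location of each node can be stored in association with the user location, and the correspondence is one-to-many.
In an exemplary embodiment, the feature information includes the correspondence between user location and node location, and adjusting the predetermined rule based on the feature information includes: in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determining an updated node whose distance from the user location is not greater than the distance threshold, and adjusting the predetermined rule based on the updated node to obtain the adjusted rule. In other words, when the distance between the user location and the node location in the correspondence is greater than the distance threshold, the feature information is considered to satisfy the condition, and the predetermined rule needs to be adjusted.
During access, the time series data to be read or written is transmitted between the user equipment and the node. The farther the user location is from the node location, the farther the time series data has to travel, which may increase the number of routing hops and reduce node read/write performance. Therefore, the time series data to be read or written is moved to a node whose distance from the user equipment is not greater than the distance threshold, that is, to the updated node. The adjustment can be made in units of timelines; this embodiment does not limit the unit. The following description uses the timeline as the unit.
In the predetermined shard group, the timeline to be accessed is located on node A, and the distance between node A and the user exceeds the distance threshold. In this case, a node whose distance from the user location does not exceed the distance threshold is selected from all nodes of the database, for example node B, the node closest to the user. In the new shard group, the time series data of the data source of that timeline is written to node B instead of node A, so that the user subsequently needs to access only node B to access the timeline. This shortens the data transmission distance and improves node read/write performance.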
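The node selection in case B3 can be sketched as a nearest-node lookup with a distance threshold. The planar coordinates, the Euclidean distance, and the function name `pick_node` are assumptions for illustration; the text does not say how locations or distances are represented.

```python
import math

def pick_node(user_pos, current_node, node_positions, threshold):
    """Keep the current node if it is close enough, else pick the nearest one within the threshold."""
    def distance(node):
        return math.dist(user_pos, node_positions[node])

    if distance(current_node) <= threshold:
        return current_node  # condition not satisfied, no adjustment needed
    candidates = [n for n in node_positions if distance(n) <= threshold]
    # Fall back to the current node when no node is within the threshold.
    return min(candidates, key=distance) if candidates else current_node

# Hypothetical planar coordinates for the user and three nodes.
user = (0.0, 0.0)
nodes = {"A": (30.0, 40.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

updated = pick_node(user, "A", nodes, threshold=20.0)
unchanged = pick_node(user, "A", nodes, threshold=100.0)
```

The returned node would then be used for writes of the affected timeline in the new shard group only, so data already stored on the old node stays where it is.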
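In an exemplary embodiment, the feature information includes at least two of the load imbalance degree, the query fan-out, and the correspondence between user location and node location. In this case, adjusting the predetermined rule based on the feature information includes the following two manners, C1 and C2.
Manner C1: Adjust the predetermined rule based on the condition whose priority satisfies a threshold.
In this embodiment, a priority is set for each satisfied condition. The priority indicates how important the condition is: the higher the priority, the more important the condition. For example, adjusting the predetermined rule based on the condition whose priority satisfies a threshold includes: adjusting the predetermined rule based on the condition with the highest priority.
For example, the conditions in descending order of priority are: condition 1, the number of nodes indicated by the query fan-out is greater than the quantity threshold; condition 2, the imbalance indicated by the load imbalance degree is greater than the reference threshold; condition 3, the distance between the user location and the node location exceeds the distance threshold. If both condition 1 and condition 2 are satisfied, the predetermined rule is adjusted based on condition 1, which has the highest priority. That is, the predetermined rule is adjusted as described in case B1, so that the partition key is used as the adjusted rule.
Manner C2: Adjust the predetermined rule by combining all satisfied conditions.
In this case, a sub-rule is determined for each satisfied condition, and the sub-rules are combined into the adjusted rule. Continuing the example in manner C1, when both condition 1 and condition 2 are satisfied, the sub-rule determined from condition 1 is the partition key, and the sub-rule determined from condition 2 is the shard boundary value. Therefore, both the partition key and the shard boundary value are used as the adjusted rule.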
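The difference between manners C1 and C2 is only whether the highest-priority satisfied condition or all satisfied conditions contribute a sub-rule. The sketch below illustrates this with hypothetical names (`PRIORITY`, `SUB_RULE`, `adjusted_rule`) and made-up feature values and thresholds; the priority order follows the example in the text.

```python
# Conditions in descending priority (condition 1, 2, 3 in the example).
PRIORITY = ["fan_out", "load_imbalance", "distance"]

# Sub-rule produced when each condition is satisfied.
SUB_RULE = {
    "fan_out": "partition_key",
    "load_imbalance": "shard_boundary",
    "distance": "updated_node",
}

def satisfied_conditions(features, thresholds):
    """Conditions whose feature value exceeds its threshold, by priority."""
    return [name for name in PRIORITY if features[name] > thresholds[name]]

def adjusted_rule(features, thresholds, combine=False):
    """Manner C1 (combine=False) or manner C2 (combine=True)."""
    met = satisfied_conditions(features, thresholds)
    chosen = met if combine else met[:1]
    return [SUB_RULE[name] for name in chosen]

# Hypothetical values: conditions 1 and 2 are satisfied, condition 3 is not.
features = {"fan_out": 8, "load_imbalance": 64.0, "distance": 5.0}
thresholds = {"fan_out": 5, "load_imbalance": 10.0, "distance": 20.0}

rule_c1 = adjusted_rule(features, thresholds)
rule_c2 = adjusted_rule(features, thresholds, combine=True)
```

An empty list from `adjusted_rule` corresponds to the case in which no condition is satisfied and the predetermined rule stays in use.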
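In summary, this embodiment of this application adjusts the predetermined rule based on the feature information, and generates, according to the adjusted rule, a new shard group and/or shard that matches the users' access habits for the data table. This application can update the sharding rule in a timely manner, which avoids abnormal situations caused by an inappropriate sharding rule, makes the shard groups and/or shards match the users' access habits, and ensures the read/write performance of the nodes of the time series database.
The partition adjustment method for a time series database provided in the embodiments of this application has been described above. Corresponding to the method, the embodiments of this application further provide a partition adjustment apparatus for a time series database. FIG. 7 is a schematic structural diagram of a partition adjustment apparatus for a time series database according to an embodiment of this application. The apparatus can be applied to the hardware device shown in FIG. 1. Based on the modules shown in FIG. 7, the apparatus can perform the operations in the method embodiment shown in FIG. 2. It should be understood that the apparatus may include more modules than those shown or omit some of them; the embodiments of this application do not limit this. As shown in FIG. 7, the partition adjustment apparatus for a time series database provided in this embodiment includes the following modules.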
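An obtaining module 701 is configured to obtain feature information of access requests of at least one user for a data table of the time series database, where the data table is divided into a plurality of shard groups according to a predetermined rule, each shard group is divided into a plurality of shards, each shard group is located in a different time period, each shard is located on a different node, and the feature information reflects the access habits of the at least one user for the data table. For the steps performed by the obtaining module 701, see the description of 201 above.
An adjustment module 702 is configured to adjust the predetermined rule based on the feature information, and generate, according to the adjusted rule, a new shard group and/or a new shard matching the access habits. For the steps performed by the adjustment module 702, see the description of 202 above.
In a possible implementation, the adjustment module 702 is configured to generate a new shard group, and generate, in the new shard group, a new shard matching the access habits according to the adjusted rule.
In a possible implementation, the adjustment module 702 is further configured to determine a reference time in a predetermined shard group, where the predetermined shard group is the shard group in which the time of obtaining the feature information falls, and the reference time is the latest time (largest timestamp), before the feature information is obtained, for which data has been written to the data table; and determine the start time of the new shard group based on the time interval between the reference time and the end time of the predetermined shard group.
In a possible implementation, the adjustment module 702 is configured to, in response to the time interval being not less than a time threshold, determine the reference time as the start time of the new shard group;
the adjustment module 702 is further configured to update the predetermined shard group, where the start time of the updated shard group is the start time of the predetermined shard group, and the end time of the updated shard group is the reference time.
In a possible implementation, the end time of the new shard group is the end time of the predetermined shard group.
In a possible implementation, the adjustment module 702 is configured to, in response to the time interval being less than the time threshold, determine the end time of the predetermined shard group as the start time of the new shard group.
In a possible implementation, the feature information includes a query fan-out, which indicates the number of nodes that need to be accessed to process an access request.
In a possible implementation, the adjustment module 702 is configured to, in response to the number of nodes indicated by the query fan-out being greater than a quantity threshold, determine a partition key based on the usage frequency of access conditions obtained by parsing the access requests, and adjust the predetermined rule based on the partition key to obtain the adjusted rule.
In a possible implementation, the obtaining module 701 is configured to parse an access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the number of nodes that need to be accessed as the query fan-out.
In a possible implementation, the feature information includes a load imbalance degree, which indicates how unbalanced the loads of the different nodes are.
In a possible implementation, the adjustment module 702 is configured to, in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determine a shard boundary value based on the loads of the different nodes, and adjust the predetermined rule based on the shard boundary value to obtain the adjusted rule.
In a possible implementation, the obtaining module 701 is configured to determine the loads of the different nodes based on at least one of the data volume, the number of timelines, and the access frequency of the timelines of the different nodes; and determine the load imbalance degree based on the loads of the different nodes.
In a possible implementation, the feature information includes a correspondence between a user location and a node location, where the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process an access request.
In a possible implementation, the adjustment module 702 is configured to, in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determine an updated node whose distance from the user location is not greater than the distance threshold, and adjust the predetermined rule based on the updated node to obtain the adjusted rule.
In a possible implementation, the obtaining module 701 is configured to parse an access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the correspondence between the user location and the node location based on the locations of the nodes that need to be accessed and the location of the at least one user.
In a possible implementation, the feature information includes at least one of the query fan-out, the load imbalance degree, and the correspondence between the user location and the node location.
In summary, this embodiment of this application adjusts the predetermined rule based on the feature information, and generates, according to the adjusted rule, a new shard group and/or shard that matches the users' access habits for the data table. This application can update the sharding rule in a timely manner, which avoids abnormal situations caused by an inappropriate sharding rule, makes the shard groups and/or shards match the users' access habits, and ensures the read/write performance of the nodes of the time series database.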
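It should be understood that the division into the functional modules described above is only an example of how the apparatus in FIG. 7 implements its functions. In practice, the functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus provided in the foregoing embodiment and the method embodiments belong to the same concept; for the specific implementation process, see the method embodiments.
This application provides a partition adjustment device for a time series database. The device includes a communication interface and a processor and, optionally, a memory. The communication interface, the memory, and the processor communicate with each other through an internal connection path. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory to control the communication interface to receive and send signals. When the processor executes the instructions stored in the memory, the processor performs any of the exemplary partition adjustment methods for a time series database provided in this application.
FIG. 8 is a schematic structural diagram of an exemplary partition adjustment device 800 for a time series database according to this application. The device 800 shown in FIG. 8 is configured to perform the operations involved in the partition adjustment method for a time series database shown in FIG. 2. The device 800 is, for example, a server, a server cluster composed of multiple servers, or a cloud computing service center.
As shown in FIG. 8, the device 800 includes at least one processor 801, a memory 803, and at least one communication interface 804.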
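The processor 801 is, for example, a general-purpose CPU, a digital signal processor (DSP), a network processor (NP), a GPU, a neural-network processing unit (NPU), a data processing unit (DPU), a microprocessor, or one or more integrated circuits for implementing the solutions of this application, such as an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The PLD is, for example, a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor can implement or execute the various logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Optionally, the device 800 further includes a bus 802. The bus 802 is used to transfer information between the components of the device 800. The bus 802 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 802 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used in FIG. 8, but this does not mean that there is only one bus or one type of bus.
The memory 803 is, for example, a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 803 exists independently, for example, and is connected to the processor 801 through the bus 802. The memory 803 can also be integrated with the processor 801.
The communication interface 804 uses any transceiver-type apparatus to communicate with other devices or a communication network. The communication network can be Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like. The communication interface 804 may include a wired communication interface and may also include a wireless communication interface. Specifically, the communication interface 804 can be an Ethernet interface, such as a Fast Ethernet (FE) interface or a Gigabit Ethernet (GE) interface, an Asynchronous Transfer Mode (ATM) interface, a WLAN interface, a cellular network communication interface, or a combination thereof. The Ethernet interface can be an optical interface, an electrical interface, or a combination thereof. In some implementations of this application, the communication interface 804 can be used by the device 800 to communicate with other devices.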
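In a specific implementation, the processor 801 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 8. Each of these processors can be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, the device 800 may include multiple processors, such as the processor 801 and the processor 805 shown in FIG. 8. Each of these processors can be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In some implementations, the memory 803 is used to store the program code 810 for implementing the solutions of this application, and the processor 801 can execute the program code 810 stored in the memory 803. That is, the device 800 can implement the partition adjustment method for a time series database provided in the method embodiments through the processor 801 and the program code 810 in the memory 803. The program code 810 may include one or more software modules. Optionally, the processor 801 itself may also store program code or instructions for implementing the solutions of this application.
In a specific implementation, the device 800 of this application may correspond to the device for performing the above method. The processor 801 in the device 800 reads the instructions in the memory 803, so that the device 800 shown in FIG. 8 can perform all or some of the steps in the method embodiments.
The device 800 may also correspond to the apparatus shown in FIG. 7, and each functional module in the apparatus shown in FIG. 7 is implemented by software of the device 800. In other words, the functional modules included in the apparatus shown in FIG. 7 are generated after the processor 801 of the device 800 reads the program code 810 stored in the memory 803.
Each step of the partition adjustment method for a time series database shown in FIG. 2 is completed by an integrated logic circuit of hardware in the processor of the device 800 or by instructions in the form of software. The steps of the method embodiments disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module can be located in a storage medium that is mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory. The processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware. To avoid repetition, no detailed description is given here.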
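In an exemplary embodiment, a partition adjustment device for a time series database is provided. The device includes a memory and a processor; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor, so that the device implements the partition adjustment method for a time series database provided in any exemplary embodiment of this application.
In an exemplary embodiment, a computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the partition adjustment method for a time series database provided in any exemplary embodiment of this application.
In an exemplary embodiment, a computer program product is provided. The computer program product includes a computer program or instructions, and the computer program or instructions are executed by a processor, so that a computer implements the partition adjustment method for a time series database provided in any exemplary embodiment of this application.
In an exemplary embodiment, a chip is provided, including a processor configured to call, from a memory, and run instructions stored in the memory, so that a communication device on which the chip is installed executes the partition adjustment method for a time series database provided in any exemplary embodiment of this application.
In an exemplary embodiment, another chip is provided, including an input interface, an output interface, a processor, and a memory. The input interface, the output interface, the processor, and the memory are connected through an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor executes the partition adjustment method for a time series database provided in any exemplary embodiment of this application.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to this application are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk).
The foregoing descriptions are merely embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.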

Claims (35)

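  1. A partition adjustment method for a time series database, wherein the method comprises:
    obtaining feature information of access requests of at least one user for a data table of the time series database, wherein the data table is divided into a plurality of shard groups according to a predetermined rule, each shard group is divided into a plurality of shards, each shard group is located in a different time period, each shard is located on a different node, and the feature information reflects access habits of the at least one user for the data table; and
    adjusting the predetermined rule based on the feature information, and generating, according to the adjusted rule, a new shard group and/or a new shard matching the access habits.
  2. The method according to claim 1, wherein the generating, according to the adjusted rule, a new shard group and/or a new shard matching the access habits comprises:
    generating a new shard group, and generating, in the new shard group, a new shard matching the access habits according to the adjusted rule.
  3. The method according to claim 2, wherein the method further comprises:
    determining a reference time in a predetermined shard group, wherein the predetermined shard group is the shard group in which the time of obtaining the feature information falls, and the reference time is the latest time, before the time of obtaining the feature information, for which data has been written to the data table; and
    determining a start time of the new shard group based on a time interval between the reference time and an end time of the predetermined shard group.
  4. The method according to claim 3, wherein the determining a start time of the new shard group based on a time interval between the reference time and an end time of the predetermined shard group comprises:
    in response to the time interval being not less than a time threshold, determining the reference time as the start time of the new shard group;
    and the method further comprises:
    updating the predetermined shard group, wherein a start time of the updated shard group is a start time of the predetermined shard group, and an end time of the updated shard group is the reference time.
  5. The method according to claim 4, wherein an end time of the new shard group is the end time of the predetermined shard group.
  6. The method according to claim 3, wherein the determining a start time of the new shard group based on a time interval between the reference time and an end time of the predetermined shard group comprises:
    in response to the time interval being less than a time threshold, determining the end time of the predetermined shard group as the start time of the new shard group.
  7. The method according to any one of claims 1 to 6, wherein the feature information comprises a query fan-out, and the query fan-out indicates the number of nodes that need to be accessed to process the access request.
  8. The method according to claim 7, wherein the adjusting the predetermined rule based on the feature information comprises:
    in response to the number of nodes indicated by the query fan-out being greater than a quantity threshold, determining a partition key based on a usage frequency of access conditions obtained by parsing the access request, and adjusting the predetermined rule based on the partition key to obtain the adjusted rule.
  9. The method according to claim 7 or 8, wherein the obtaining feature information of access requests of at least one user for a data table of the time series database comprises:
    parsing the access request to obtain an access condition, determining, based on the access condition, the nodes that need to be accessed to process the access request, and determining the number of the nodes that need to be accessed as the query fan-out.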
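  10. The method according to any one of claims 1 to 6, wherein the feature information comprises a load imbalance degree, and the load imbalance degree indicates how unbalanced the loads of the different nodes are.
  11. The method according to claim 10, wherein the adjusting the predetermined rule based on the feature information comprises:
    in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determining a shard boundary value based on the loads of the different nodes, and adjusting the predetermined rule based on the shard boundary value to obtain the adjusted rule.
  12. The method according to claim 10 or 11, wherein the obtaining feature information of access requests of at least one user for a data table of the time series database comprises:
    determining the loads of the different nodes based on at least one of a data volume, a number of timelines, and an access frequency of the timelines of the different nodes; and
    determining the load imbalance degree based on the loads of the different nodes.
  13. The method according to any one of claims 1 to 6, wherein the feature information comprises a correspondence between a user location and a node location, the user location is a location of the at least one user, and the node location is a location of a node that needs to be accessed to process the access request.
  14. The method according to claim 13, wherein the adjusting the predetermined rule based on the feature information comprises:
    in response to a distance between the user location and the node location in the correspondence being greater than a distance threshold, determining an updated node whose distance from the user location is not greater than the distance threshold, and adjusting the predetermined rule based on the updated node to obtain the adjusted rule.
  15. The method according to claim 13 or 14, wherein the obtaining feature information of access requests of at least one user for a data table of the time series database comprises:
    parsing the access request to obtain an access condition, determining, based on the access condition, the nodes that need to be accessed to process the access request, and determining the correspondence between the user location and the node location based on locations of the nodes that need to be accessed and the location of the at least one user.
  16. The method according to any one of claims 1 to 15, wherein the feature information comprises at least one of a query fan-out, a load imbalance degree, and a correspondence between a user location and a node location.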
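  17. A partition adjustment apparatus for a time series database, wherein the apparatus comprises:
    an obtaining module, configured to obtain feature information of access requests of at least one user for a data table of the time series database, wherein the data table is divided into a plurality of shard groups according to a predetermined rule, each shard group is divided into a plurality of shards, each shard group is located in a different time period, each shard is located on a different node, and the feature information reflects access habits of the at least one user for the data table; and
    an adjustment module, configured to adjust the predetermined rule based on the feature information, and generate, according to the adjusted rule, a new shard group and/or a new shard matching the access habits.
  18. The apparatus according to claim 17, wherein the adjustment module is configured to generate a new shard group, and generate, in the new shard group, a new shard matching the access habits according to the adjusted rule.
  19. The apparatus according to claim 18, wherein the adjustment module is further configured to determine a reference time in a predetermined shard group, wherein the predetermined shard group is the shard group in which the time of obtaining the feature information falls, and the reference time is the latest time, before the time of obtaining the feature information, for which data has been written to the data table; and determine a start time of the new shard group based on a time interval between the reference time and an end time of the predetermined shard group.
  20. The apparatus according to claim 19, wherein the adjustment module is configured to, in response to the time interval being not less than a time threshold, determine the reference time as the start time of the new shard group;
    the adjustment module is further configured to update the predetermined shard group, wherein a start time of the updated shard group is a start time of the predetermined shard group, and an end time of the updated shard group is the reference time.
  21. The apparatus according to claim 20, wherein an end time of the new shard group is the end time of the predetermined shard group.
  22. The apparatus according to claim 19, wherein the adjustment module is configured to, in response to the time interval being less than a time threshold, determine the end time of the predetermined shard group as the start time of the new shard group.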
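  23. The apparatus according to any one of claims 17 to 22, wherein the feature information comprises a query fan-out degree, and the query fan-out degree indicates the number of nodes that need to be accessed to process the access request.
  24. The apparatus according to claim 23, wherein the adjustment module is configured to: in response to the number of nodes indicated by the query fan-out degree being greater than a quantity threshold, determine a shard key based on the usage frequency of access conditions obtained by parsing the access request, and adjust the predetermined rule based on the shard key to obtain the adjusted rule.
  25. The apparatus according to claim 23 or 24, wherein the obtaining module is configured to: parse the access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the number of the nodes that need to be accessed as the query fan-out degree.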
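  26. The apparatus according to any one of claims 17 to 22, wherein the feature information comprises a load imbalance degree, and the load imbalance degree indicates how unevenly load is distributed across the different nodes.
  27. The apparatus according to claim 26, wherein the adjustment module is configured to: in response to the imbalance indicated by the load imbalance degree being greater than a reference threshold, determine a shard boundary value based on the loads of the different nodes, and adjust the predetermined rule based on the shard boundary value to obtain the adjusted rule.
  28. The apparatus according to claim 26 or 27, wherein the obtaining module is configured to: determine the loads of the different nodes based on at least one of the data volume, the number of timelines, and the timeline access frequency of the different nodes; and determine the load imbalance degree based on the loads of the different nodes.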
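  29. The apparatus according to any one of claims 17 to 22, wherein the feature information comprises a correspondence between a user location and a node location, the user location is the location of the at least one user, and the node location is the location of a node that needs to be accessed to process the access request.
  30. The apparatus according to claim 29, wherein the adjustment module is configured to: in response to the distance between the user location and the node location in the correspondence being greater than a distance threshold, determine an updated node whose distance from the user location is not greater than the distance threshold, and adjust the predetermined rule based on the updated node to obtain the adjusted rule.
  31. The apparatus according to claim 29 or 30, wherein the obtaining module is configured to: parse the access request to obtain an access condition, determine, based on the access condition, the nodes that need to be accessed to process the access request, and determine the correspondence between the user location and the node location based on the locations of the nodes that need to be accessed and the location of the at least one user.
  32. The apparatus according to any one of claims 17 to 31, wherein the feature information comprises at least one of the query fan-out degree, the load imbalance degree, and the correspondence between a user location and a node location.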
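The two measurements handled by the obtaining module in claims 25 and 28, and the shard key choice in claim 24, can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation. All function names are hypothetical. The claims do not define a formula for the load imbalance degree, so the ratio of the maximum node load to the mean node load is used here as one possible choice. Picking the single most frequently used access condition as the shard key is likewise an assumption.

```python
from typing import Dict, Iterable, Set


def query_fan_out(nodes_to_access: Set[str]) -> int:
    """Claim 25: the fan-out is the number of nodes a request must access."""
    return len(nodes_to_access)


def pick_shard_key(conditions_per_request: Iterable[Iterable[str]]) -> str:
    """Claim 24 (assumed policy): use the most frequently used condition.

    Each inner iterable lists the columns named in one request's access
    conditions. At least one condition is assumed to be present.
    """
    counts: Dict[str, int] = {}
    for conditions in conditions_per_request:
        for column in set(conditions):  # count each column once per request
            counts[column] = counts.get(column, 0) + 1
    # Sorting first makes ties resolve to the alphabetically first column.
    return max(sorted(counts), key=lambda column: counts[column])


def load_imbalance(loads: Dict[str, float]) -> float:
    """Claim 28 (assumed metric): maximum node load divided by mean load.

    `loads` maps a node name to a load figure derived from data volume,
    timeline count, or timeline access frequency; it must not be empty.
    """
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean if mean > 0 else 0.0
```

A caller would compare `query_fan_out(...)` with the quantity threshold before calling `pick_shard_key`, and compare `load_imbalance(...)` with the reference threshold before recomputing shard boundary values.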
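  33. A shard adjustment device for a time series database, wherein the device comprises a memory and a processor; the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor, so that the shard adjustment device implements the shard adjustment method for a time series database according to any one of claims 1 to 16.
  34. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the shard adjustment method for a time series database according to any one of claims 1 to 16.
  35. A computer program product, wherein the computer program product comprises a computer program or instructions, and the computer program or instructions are executed by a processor, so that a computer implements the shard adjustment method for a time series database according to any one of claims 1 to 16.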
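The location-based adjustment recited in claims 14 and 30 can be sketched as below. The sketch is illustrative only: `pick_updated_node` is a hypothetical name, locations are modeled as planar coordinates with Euclidean distance, and choosing the nearest qualifying node is an assumption, since the claims only require a node within the distance threshold.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # assumed planar coordinates


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two locations."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def pick_updated_node(
    user: Point, current_node: str, nodes: Dict[str, Point], threshold: float
) -> Optional[str]:
    """Return the node that should serve the user's data.

    The current node is kept when it is within the threshold. Otherwise
    the nearest node within the threshold is returned, or None when no
    node qualifies.
    """
    if distance(user, nodes[current_node]) <= threshold:
        return current_node
    candidates = [
        (distance(user, location), name)
        for name, location in nodes.items()
        if distance(user, location) <= threshold
    ]
    return min(candidates)[1] if candidates else None
```

The returned node name would then feed the rule adjustment, so that new shards holding this user's data are placed on the updated node.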
PCT/CN2022/087412 2021-07-08 2022-04-18 Shard adjustment method and apparatus for time series database, device, and readable storage medium WO2023279801A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22836560.7A EP4357931A1 (en) 2021-07-08 2022-04-18 Shard adjustment method and apparatus for time series database, device, and readable storage medium
US18/405,617 US20240143626A1 (en) 2021-07-08 2024-01-05 Shard Adjustment Method, Apparatus, and Device for Time Series Database, and Readable Storage Medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110770891.9 2021-07-08
CN202110770891 2021-07-08
CN202111278133.1 2021-10-30
CN202111278133.1A CN115599782A (zh) 2021-07-08 2021-10-30 Shard adjustment method and apparatus for time series database, device, and readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/405,617 Continuation US20240143626A1 (en) 2021-07-08 2024-01-05 Shard Adjustment Method, Apparatus, and Device for Time Series Database, and Readable Storage Medium

Publications (2)

Publication Number Publication Date
WO2023279801A1 (zh) 2023-01-12
WO2023279801A9 (zh) 2023-02-09

Family

ID=84801213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087412 WO2023279801A1 (zh) 2021-07-08 2022-04-18 Shard adjustment method and apparatus for time series database, device, and readable storage medium

Country Status (3)

Country Link
US (1) US20240143626A1 (zh)
EP (1) EP4357931A1 (zh)
WO (1) WO2023279801A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116089414B (zh) * 2023-04-10 2023-09-08 Zhejiang Lab Method and apparatus for optimizing write performance of a time series database in massive-data scenarios

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256088A (zh) * 2018-01-23 2018-07-06 Tsinghua University Storage method and system for time series data based on a key-value database
US10909114B1 (en) * 2018-06-19 2021-02-02 Amazon Technologies, Inc. Predicting partitions of a database table for processing a database query
US10884644B2 (en) * 2018-06-28 2021-01-05 Amazon Technologies, Inc. Dynamic distributed data clustering
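CN111090687B (zh) * 2019-12-24 2023-03-10 Tencent Technology (Shenzhen) Co., Ltd. Data processing method, apparatus, and system, and computer-readable storage medium
CN112445795A (zh) * 2020-10-22 2021-03-05 Zhejiang Lanzhuo Industrial Internet Information Technology Co., Ltd. Distributed storage capacity expansion method and data query method for a time series database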

Also Published As

Publication number Publication date
US20240143626A1 (en) 2024-05-02
EP4357931A1 (en) 2024-04-24
WO2023279801A1 (zh) 2023-01-12

Similar Documents

Publication Publication Date Title
US11249969B2 (en) Data storage method and apparatus, and storage medium
CN109947668B (zh) Method and apparatus for storing data
US20170364697A1 (en) Data interworking method and data interworking device
US20220156115A1 (en) Resource Allocation Method And Resource Borrowing Method
US20240143626A1 (en) Shard Adjustment Method, Apparatus, and Device for Time Series Database, and Readable Storage Medium
US20200364080A1 (en) Interrupt processing method and apparatus and server
CN114356921A (zh) Data processing method, apparatus, server, and storage medium
US20160203180A1 (en) Index tree search method and computer
US20230327875A1 (en) Data flow control in distributed computing systems
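CN113051448A (zh) Data processing method and apparatus, electronic device, and storage medium
CN111312352A (zh) Blockchain-based data processing method, apparatus, device, and medium
CN110851474A (zh) Data query method, database middleware, data query device, and storage medium
CN113127477A (zh) Method and apparatus for accessing a database, computer device, and storage medium
CN112732711B (zh) Data storage method, apparatus, and electronic device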
US20230336368A1 (en) Block chain-based data processing method and related apparatus
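CN113760640A (zh) Monitoring log processing method, apparatus, device, and storage medium
CN116151631A (zh) Service decision processing system, and service decision processing method and apparatus
CN114491253B (zh) Observation information processing method and apparatus, electronic device, and storage medium
CN115970295A (zh) Request processing method and apparatus, and electronic device
CN114115696B (zh) Memory deduplication method, apparatus, and storage medium
CN115599782A (zh) Shard adjustment method and apparatus for time series database, device, and readable storage medium
CN116701386A (zh) Key-value pair retrieval method, apparatus, and storage medium
CN111143326B (zh) Method and apparatus for reducing database operations, computer device, and storage medium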
US11947822B2 (en) Maintaining a record data structure using page metadata of a bookkeeping page
CN118132565B (zh) Control method and apparatus for data index storage, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22836560

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022836560

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022836560

Country of ref document: EP

Effective date: 20240116

NENP Non-entry into the national phase

Ref country code: DE