WO2015035789A1 - 数据库管理方法与系统 - Google Patents

数据库管理方法与系统 Download PDF

Info

Publication number
WO2015035789A1
WO2015035789A1 PCT/CN2014/076481 CN2014076481W WO2015035789A1 WO 2015035789 A1 WO2015035789 A1 WO 2015035789A1 CN 2014076481 W CN2014076481 W CN 2014076481W WO 2015035789 A1 WO2015035789 A1 WO 2015035789A1
Authority
WO
WIPO (PCT)
Prior art keywords
partition
data
physical
logical
database system
Prior art date
Application number
PCT/CN2014/076481
Other languages
English (en)
French (fr)
Inventor
刘辉军
许中清
高军楠
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP14843923.5A priority Critical patent/EP2990962A4/en
Publication of WO2015035789A1 publication Critical patent/WO2015035789A1/zh
Priority to US14/956,666 priority patent/US9460186B2/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/248Presentation of query results

Definitions

  • the present invention relates to database technology, and in particular, to a database management method and system. Background technique
  • Partitioning is an important function of the database system.
  • the partition divides and conquers large data tables. It is represented by a partition table.
  • the partition table can also include partition fragments.
  • Reasonable partitioning can improve the performance of the database system and reduce the number of times the system reads and writes data.
  • the small database server handles the size of the data and facilitates the management of the database.
  • the partitioning strategy is a clustering strategy set by the designer for partitioning big data tables. The partitioning strategy specifies which partition fragment a data record of the data table is mapped to.
  • the database system also provides management functions for data partitioning to implement partition management.
  • Prior art database systems support three basic partitioning strategies: range partitioning, list partitioning, hash partitioning, or supporting other extended partitioning strategies and combined partitioning strategies.
  • the partitioning strategy and the number of partitioned segments have a significant impact on the performance of the partitioned table. If the partition fragment is too large, the advantage of the partition cannot be fully exploited; if the partition fragment is too small, the number of partition tables is too large, which increases the query optimization time, and the partition management is more complicated.
  • the prior art management functions for data partitioning include adding partition fragments, deleting partition fragments (for range partitions and list partitions), reducing partition fragments (for hash partitions), merging partition fragments, splitting partition fragments, modifying partition table names, etc. .
  • management operations are implemented, not only can the data of the partition tuple be modified, but data can also be migrated between partitioned tables.
  • an embodiment of the present invention provides a database management method, including: a database system divides a data partition into a physical partition layer and a logical partition layer, where the physical partition layer is configured to store a physical partition of the database system.
  • the physical partition fragment is a set of data records of the partition table;
  • the logical partition layer is configured to store a logical partition fragment of the database system, to provide logical partition information of the partition table;
  • the partition segment and the physical partition segment are mutually mapped by a mapping relationship;
  • the database system receives a data operation instruction of the user, and determines a logical partition segment to which the data record requested by the data operation instruction belongs;
  • the database system according to the The data operation instruction and the mapping relationship determine a physical partition segment of the physical partition layer corresponding to the determined logical partition segment to operate on a data record of the data operation instruction request operation.
  • the method further includes: determining, by the database system, the rationality of the current data partition according to the partition information of the physical partition layer and the logical partition layer, if the current data is determined If the partition is unreasonable, the database system adjusts the unreasonable logical partition segment or the physical partition segment.
  • the data operation instruction is specifically: a query instruction; and correspondingly, the operating instruction according to the data Determining, by the mapping relationship, the physical partition segment corresponding to the determined logical partition segment in the physical partition layer, to perform operations on the data record of the data operation instruction request operation, including: the database system Receiving the query instruction, determining a query condition based on the partition table according to the query instruction, performing partition pruning on the logical partition segment, excluding a logical partition segment not involved in the query, and generating an execution plan; And the database system determines, according to the execution plan, the physical partition segment corresponding to the determined logical partition segment in the physical partition layer according to the query instruction and the mapping relationship, to obtain the query instruction.
  • the database system triggers the logical partition layer to scan the corresponding physical partition segment in the physical partition layer according to the query instruction and the mapping relationship, to perform a corresponding query operation, Specifically, the database system: cyclically obtains a data record from the logical partition segment according to the execution plan and the query condition, and selects, according to the mapping relationship, a location corresponding to the determined logical partition segment.
  • the physical partition segment obtains a data record; the database system performs data filtering on the obtained data record according to the information in the query instruction, and returns the data record obtained by the data filtering to the user as a query result.
  • the data operation instruction is specifically: inserting, deleting, or updating an instruction;
  • the data operation instruction and the mapping relationship determine the physical partition segment corresponding to the determined logical partition segment in the physical partition layer, to perform operation on the data record of the data operation instruction request operation, specifically: Determining, by the database system, a data record to be inserted, deleted, or updated according to the insert, delete, or update instruction; the database system calculating, according to the set of the data record, the logical partition segment to which the data record belongs a physical partition segment corresponding to the logical partition segment; the database system performs an insert, delete or update operation on the calculated data in the physical partition segment.
  • the database system is configured to perform a reasonable current data partition according to the partition information of the physical partition layer and the logical partition layer. Judging, if it is determined that the current data partition is unreasonable, the database system adjusts the unreasonable logical partition segment or the physical partition segment, and the method includes: the database system collects a database system related to the partition adjustment operation Information; the information of the database system running includes: data information of a logical partition and a physical partition, a data partition size of the logical partition fragment and the physical partition fragment, a SQL statement submitted by the peripheral to the database server, and Execution frequency, the nature of the database server executing SQL statements The database system uses the information of the database system to determine whether the current data partition is reasonable; when it is determined that the current data partition is unreasonable, the database system re-plans the partitioning scheme; the database system is based on the re-planned partition The scheme adjusts the logical partition segment and/or the physical partition segment.
  • the database system by using information about the running of the database system, determining whether the current data partition is reasonable, includes: the database system is configured according to The feature value in the information of the running of the database system determines whether the current data partition is reasonable, and the feature value includes an upper limit of the data size of the partition segment and a threshold value of the skew factor, wherein if the feature value exceeds a preset value of the corresponding feature value , then it is judged that the current data partition is unreasonable.
  • the database system is configured to perform the logical partition segment according to the recalculated logical partition layer and the physical partition layer And adjusting the physical partition segment, and the method includes: the database system adjusting the logical partition layer by adding, deleting, merging, splitting the logical partition segment according to the re-planned partitioning scheme, and Or, the physical partition layer is adjusted by splitting the physical partition segments.
  • an embodiment of the present invention provides a database system, including: a physical partition layer, configured to store a physical partition fragment of the database system, where the physical partition fragment is a set of data records of a partition table; a layer, configured to store a logical partition segment of the database system, to provide logical partition information of the partition table; the logical partition segment and the physical partition segment are mapped to each other by a mapping relationship; and a processing module, configured to receive a user data operation An instruction to determine the data operation instruction request a logical partition segment to which the data record of the operation belongs; determining, according to the data operation instruction and the mapping relationship, the physical partition segment corresponding to the determined logical partition segment in the physical partition layer, The data manipulation instruction requests the operation of the data record to operate.
  • the method further includes: a partition management apparatus, configured to perform a rationality judgment on the current data partition according to the collected partition information of the physical partition layer and the logical partition layer; If it is determined that the current data partition is unreasonable, the logical partition segment or the physical partition segment that is unreasonable is adjusted.
  • a partition management apparatus configured to perform a rationality judgment on the current data partition according to the collected partition information of the physical partition layer and the logical partition layer; If it is determined that the current data partition is unreasonable, the logical partition segment or the physical partition segment that is unreasonable is adjusted.
  • the data operation instruction is specifically: a query instruction; and correspondingly, the processing module includes: An optimizer, configured to receive the query instruction, determine a query condition based on the partition table according to the query instruction, perform partition pruning on the logical partition segment, exclude a logical partition segment not involved in the query, and generate An execution plan, configured to: trigger, according to the execution plan generated by the query optimizer, the logical partitioning layer to determine the determined in the physical partition layer according to the query instruction and the mapping relationship And the logical partition segment corresponding to the physical partition segment to obtain a data record of the query instruction request query, and returning the obtained data record as a query result to the user.
  • An optimizer configured to receive the query instruction, determine a query condition based on the partition table according to the query instruction, perform partition pruning on the logical partition segment, exclude a logical partition segment not involved in the query
  • An execution plan configured to: trigger, according to the execution plan generated by the query optimizer, the logical partitioning layer to determine the determined in the physical partition layer according to the query instruction
  • the execution engine is specifically configured to: according to an execution plan generated by the query optimizer and the query condition, Obtaining a data record in the logical partition segment, and obtaining a data record from the corresponding physical partition segment from the determined logical partition segment according to the mapping relationship; according to the information in the query instruction The obtained data record is subjected to data screening, and the data record obtained by the data screening is returned to the user as a query result.
  • the data operation instruction is specifically: inserting, deleting, or updating an instruction; correspondingly, the processing module body For: determining, according to the insert, delete or update instruction, a data record to be inserted, deleted or updated; Calculating, according to the set of the data records, the logical partition segment to which the data record belongs and a physical partition segment corresponding to the logical partition segment; and calculating the physical partition according to the logical partition layer pair
  • the data record within the fragment performs an insert, delete, or update operation.
  • the area management apparatus includes: a partition information collecting unit, configured to collect information about a database system operation related to the partition adjustment;
  • the database system operation information includes: data information of a logical partition and a physical partition, a data partition size of the logical partition fragment and the physical partition fragment, a SQL statement submitted by the peripheral to the database server, and an execution frequency,
  • the database server executes the performance of the SQL statement;
  • the partition decision unit is configured to use the information of the database system to determine whether the current data partition is reasonable; when it is determined that the current data partition is unreasonable, re-plan the partition scheme;
  • the logical partition segment and/or the physical partition segment are adjusted according to the re-planned partitioning scheme.
  • the partitioning determining unit is specifically configured to: according to the information about the running of the database system collected by the partition information collecting unit The feature value determines whether the current data partition is reasonable.
  • the feature value includes an upper limit of the data size of the partition segment and a skew factor threshold. If the feature value exceeds a preset value of the corresponding feature value, it is determined that the current data partition is unreasonable.
  • the zoning adjustment unit is configured to: add, according to the re-planned partitioning scheme, by adding the logical partition segment , deleting, merging, splitting the logical partition layer, and ⁇ or, by adjusting the split of the physical partition segment.
  • the zoning adjustment unit is configured to: add, according to the re-planned partitioning scheme, by adding the logical partition segment , deleting, merging, splitting the logical partition layer, and ⁇ or, by adjusting the split of the physical partition segment.
  • one of the logical partition segments corresponds to one or more of the physical Partition fragment.
  • the database management method and system provided by the embodiments of the present invention divide the data partition into a physical partition layer and a logical partition layer; the logical partition segment is mapped with the corresponding physical partition segment; and the logical partition layer scans the corresponding physical layer in the physical partition layer.
  • a partition fragment capable of responding to operations of the corresponding physical partition segment by the query optimizer, the execution engine, and the partition management device.
  • Figure 1 is a schematic diagram of a database structure of a hierarchical partition
  • Embodiment 1 of a database management method according to the present invention
  • Embodiment 3 is a flowchart of Embodiment 2 of a database management method according to the present invention.
  • FIG. 4 is a schematic diagram of a process of applying a database management method to a query operation according to the present invention
  • FIG. 5 is a flowchart of a third embodiment of a database management method according to the present invention.
  • FIG. 6 is a flowchart of Embodiment 4 of a database management method according to the present invention.
  • FIG. 7 is a schematic diagram of a process for judging the rationality of data partitioning in the embodiment shown in FIG. 6.
  • FIG. 8 is a schematic structural diagram of Embodiment 1 of a database system according to the present invention.
  • Embodiment 9 is a schematic structural diagram of Embodiment 2 of a database system according to the present invention.
  • FIG. 10 is a schematic structural diagram of a partition management apparatus in a database system according to the present invention.
  • the technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention.
  • the embodiments are a part of the embodiments of the invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without departing from the inventive scope are the scope of the present invention.
  • the data partition is reasonable and directly affects the performance of the database system.
  • the performance of the database system may be low; and, along with the database
  • the increase in the amount of data in the system may cause the partitioning scheme to be changed from reasonable to unreasonable. Therefore, the partitioning scheme needs to be re-adjusted.
  • the prior art database management method when adjusting the partition, for example, when splitting or merging partitioned subtables, Due to the large amount of data involved, the cost of data migration is high.
  • the database system can include: a partition management device, a query optimizer, and an execution engine.
  • the partitioning of the data can be completed by the partition management device.
  • Query optimizer is a database system structured query language (Structured Query
  • the statement generates an execution plan.
  • the execution engine can be the execution engine of the database system, and the execution engine receives the execution plan of the query optimizer, and performs operations such as query, insert, delete, or update.
  • FIG. 1 is a schematic diagram of a hierarchical partition database structure, as shown in FIG. 1 , the data in the data table Records are physically clustered by physical partitioning strategies, that is, physical partition fragments, logically clustered by logical partitioning strategies, ie, logical partition fragments.
  • the logical partition layer resides above the physical partition layer, and one logical partition fragment may correspond to one or more physical partition fragments. For example, in FIG. 1, the data bits Li, L 2 , ...
  • i in the logical partition layer are integers greater than 1
  • the data corresponding to 1 ⁇ in the physical partition layer is P u , ... to P ln
  • the Li data being Pu ⁇ to P im , n and m are integers greater than 0.
  • the functions and attributes of the physical partition layer and the logical partition layer are as follows:
  • Physical partitioning layer A method of clustering the data recorded in the data table on the storage medium.
  • the data records of the partition table are clustered on the storage medium in a physical partitioning strategy.
  • the physical partitioning strategy and partition granularity are determined by the partition management device.
  • the functions of the physical partition layer include: storing physical partition segments; adding, deleting, querying, or modifying data in the physical partition according to instructions of the logical partition layer; performing physical partition table according to instructions of the partition management device Management operations such as creating, deleting, splitting, and merging.
  • Logical partitioning layer A logical clustering of tuples that represent data tables.
  • the logical partitioning strategy is closely related to the physical partitioning strategy, and the logical partitioning strategy is limited by the physical partitioning strategy.
  • a logical partition fragment can be mapped to one or more physical partition fragments; conversely, a physical partition fragment available to the upper layer must be mapped to one and can only be mapped to one logical partition fragment.
  • the logical partition layer has fewer logical segments than the physical partition segments in the physical partition layer.
  • the functions of the logical partition layer include: providing the query optimizer with logical partition information of the partition table as the basis for partition pruning; when the database server executes the query statement, the logical partition layer provides services for the execution engine: Sequence scan or index scan; respond to execution engine requests by scanning physical partition fragments mapped on the physical partition layer; when the database server performs operations such as insert, delete, or update, the logical partition layer is executed according to the execution engine
  • the instruction performs an operation of inserting, deleting, or updating the tuple on the logical partition segment; the logical partition layer performs the operations of inserting, deleting, or updating the data record onto the physical partition segment through the data record routing; according to the partition management device Instruction, collect logical partition information, perform logical partition table creation, deletion, merge, split, and other management operations.
  • FIG. 2 is a flowchart of the first embodiment of the database management method of the present invention.
  • the method in this embodiment is executed by the database system. As shown in FIG. 2, the method may include:
  • the database system divides the data partition into a physical partition layer and a logical partition layer, where the physical partition layer is configured to store a physical partition fragment of the database system, where the physical partition fragment is each data record of the partition table.
  • the logical partitioning layer is configured to store logical partition segments of the database system to provide logical partition information of the partition table; the logical partition segment is mapped to the corresponding physical partition segment.
  • S201 can be performed by the partition management device in the database system.
  • tuple represents a data record in the relational data table
  • partition table represents the complete data table in the logical data partition
  • partition key represents the partitioning strategy used in the data partition One or more columns of the data table, the partitioning key is used to perform aggregation of data according to a certain interval value or range value, a specific value list or a hash function value
  • Segment The partition table is divided into partition fragments according to the partition policy, including physical partition fragments and logical partition fragments.
  • the physical partition layer embodies a clustering manner of the tuple of the data table on the storage medium.
  • the tuples of the partition table are clustered on the storage medium in a physical partitioning strategy.
  • the physical partitioning strategy and partition granularity are determined by the partition management device.
  • the partition granularity represents the unit of the partition, for example, by month or year.
  • the logical partition layer embodies the logical clustering of tuples of data tables. Moreover, the logical partitioning strategy is closely related to the physical partitioning strategy, and the logical partitioning strategy is limited by the physical partitioning strategy.
  • a logical partition fragment can be mapped to one or more physical partition fragments; conversely, a physical partition fragment available to the upper layer must be mapped to one and can only be mapped to one logical partition fragment. That is, the number of logical segments of the logical partition layer is smaller than that of the physical partition layer in the physical partition layer, and, in specific implementation, the partition management device can control the size of each logical partition segment to be consistent for management.
  • the database system receives a data operation instruction of the user, and determines a logical partition segment to which the data record requested by the data operation instruction belongs.
  • the logical partition layer may operate on the corresponding physical partition segment according to the query optimizer, the execution engine, and the instruction of the partition management device in the database system.
  • the services provided by the logical partition layer for the query optimizer include: providing the query optimizer with logical partition information of the partition table as a basis for partition pruning; wherein, the partition pruning is a technology closely related to the partition method, for example, The logical partition fragment performs partition pruning, which can exclude logical partition fragments that are not involved in the query. Partition pruning maps the application of queries to partitioned tables to queries for a few partitioned fragments.
  • the logical partitioning layer provides services for the execution engine including: For example, when the database server executes the query statement, the logical partition layer performs a sequence scan or an index scan on the logical partition fragment. The logical partition layer responds to requests from the execution engine by scanning the physical partition fragments mapped to it on the physical partition layer. Further, the logical partition layer can also use physical partition pruning to improve the performance of physical partition segment scanning, thereby further improving query performance.
  • the services provided by the logical partitioning layer for the execution engine include: For example, when the database server performs insert, delete, or update tuple statements, the execution engine performs insert, delete, or update operations on tuples on the logical partition fragments through the logical partition layer, logic The partition layer commits the insertion, deletion, or update operations of the tuple onto the physical partition fragment through tuple routing.
  • the services provided by the logical partition layer to the partition management device include, for example, collection of logical partition information, management operations such as creation, deletion, merging, and splitting of logical partition tables.
  • management operations such as creation, deletion, merging, and splitting of logical partition tables.
  • the management of the logical partition table involves only the modification of the data of the logical partition tuple, so the management cost is very low.
  • the physical partition layer provides the following services:
  • Serving the logical partition layer For example, performing a query/insert/delete/update operation on a tuple of a physical partition;
  • Providing services to the partition management device For example, management operations such as creation, deletion, splitting, and merging of physical partition tables.
  • the physical partitioning layer is invisible to the query optimizer and the execution engine.
  • the physical partitioning layer is connected to the query optimizer and the execution engine through the logical partitioning layer.
  • the database management method provided by the embodiment of the present invention divides the data partition into a physical partition layer and a logical partition layer; the logical partition segment is mapped with the corresponding physical partition segment; and the logical partition layer scans the corresponding physical partition segment in the physical partition layer
  • the operation of the corresponding physical partition segment can be responded to by the query optimizer, the execution engine, and the partition management device.
  • the embodiment of the present invention can reduce the cost of partition adjustment by adopting a layered method of partitioning data into a physical partition layer and a logical partition layer, thereby avoiding large-scale data migration caused by the adjustment partition.
  • the data operation instruction is specifically: a query instruction, and further, the embodiment further includes automatic adjustment of a database partition. method. As shown in FIG. 3, the method in this embodiment may be performed on the basis of the embodiment shown in FIG.
  • the database system receives the query instruction, determines a query condition based on the partition table according to the query instruction, performs partition pruning on the logical partition segment, excludes a logical partition segment not involved in the query, and generates an execution. plan.
  • S301 can be executed by a query optimizer in a database system
  • the query optimizer is A component of the database system SQL engine that generates an execution plan for DML statements.
  • the functions associated with the partitioned table for example, perform a partition pruning on the query statement, or on the portion of the query in the insert, delete update, update tuple statement.
  • the partition pruning is only for logical partitions.
  • the query optimizer does not directly operate on the physical partition layer.
  • the partition pruning here is based on the query condition of the partition table, in the stage of generating the execution plan, it is judged that those logical partition fragments are not involved in the query, and the logical partition fragments not involved in the part of the query are involved. Splitting at the stage of generating the execution plan, partition pruning can greatly improve query performance.
  • the database system determines, according to the execution plan, the physical partition segment corresponding to the determined logical partition segment in the physical partition layer according to the query instruction and the mapping relationship, to obtain the check.
  • the query requests the data record of the query, and returns the obtained data record as a query result to the user.
  • S302 may be executed by an execution engine in a database system, and the execution engine performs a corresponding query operation on the logical partition segment according to an execution plan generated by the query optimizer.
  • the data operation instruction is an operation such as inserting, deleting or updating
  • the above method for querying the instruction can also be used for inserting, deleting or updating. The part of the query in the operation.
  • the database system can also automatically manage data partitions, which can include:
  • the database system performs a rationality judgment on the current data partition according to the partition information of the physical partition layer and the logical partition layer. If it is determined that the current data partition is unreasonable, the database system is unreasonable to the logical partition. The segment or the physical partition segment is adjusted.
  • S303 can be executed by a partition management device in the database system. Where the data partition is reasonable, it can be judged by a set of eigenvalues, which are:
  • the size of the partition fragment can define the number of tuples contained in the fragment of the partition or the size of the storage space occupied by the partition. When a logical partition fragment size is larger than the upper scale limit, this logical partition fragment must be split into smaller logical partition fragments.
  • the upper limit of the logical partition fragment size is specified by the database system or by the partition table definer.
  • Skew factor threshold The skew factor of two partition segments refers to the ratio of the larger scale table to the smaller scale table in the data size. If the skew factor of the two partition segments is greater than the skew factor threshold Value, it is considered that the data partition has a data skew and needs to be adjusted.
  • the results of data partition rationality judgment are classified into reasonable and unreasonable.
  • the unreasonable partition includes: logical partition overflow, physical partition overflow, and logical partition skew.
  • the partition management device provides the following services:
  • the partition management device determines the partition policy and partition granularity according to the data type of the partition key according to the default setting.
  • the database administrator can also specify the partition policy when creating the partition. And partition granularity.
  • the partition management device condition triggers (timing trigger or change of a key indicator of the system, etc.).
  • the partition analysis process determines whether the current partition is reasonable according to the real-time operation information of the system. If unreasonable, the partition management device decides a new partitioning strategy, and adjusts the unreasonable logical partition segment or the physical partition segment according to a new partitioning strategy to implement automatic management of the data partition.
  • the query optimizer and the execution engine in the database system perform operations such as inserting, deleting, or updating the corresponding physical partition segments by the logical partition segment to improve the database.
  • Management efficiency; the partition calendar device in the database system can automatically adjust the partition according to the data changes, which can improve the rationality of the partition and the management performance of the database system.
  • FIG. 4 is a schematic diagram of a process of applying a database management method to a query operation according to the present invention. As shown in FIG. 4, this embodiment is a detailed description of a process (S301 and S302) for implementing a query operation according to the method in the second embodiment.
  • the method of this embodiment may include:
  • S401 can be performed by a query optimizer in the database system.
  • the query optimizer first performs a structured query language (Structured)
  • SQL Query Language, SQL
  • parsing parsing the original SQL statement to generate the parse tree; then generating the plan tree, that is, generating the plan tree from the parse tree, wherein S401 further includes logical partition pruning of the partition table according to the query condition, excluding the query The logical partition fragment involved.
  • S402 can be performed by an execution engine in the database system and a logical partition layer and a physical partition layer.
  • the execution engine is from the logical partition segment according to a preset scan filter condition Retrieving a data record, the logical partition segment obtaining a data record from the corresponding physical partition segment;
  • the specific execution query process mainly describes the scanning operation on the partition table, and specifically includes: initializing the logical partition segment scan, and setting the scan filter condition.
  • the physical partition pruning is performed according to the scanning filter condition.
  • the initialization of the logical partition segment scanning process is implemented by initializing the underlying physical partition segment scanning, so setting the scanning filtering condition is to set the scanning filtering condition for the physical partition segment.
  • the loop takes the tuple from the logical partition fragment.
  • the logical partitioning layer is implemented by taking tuples from the physical partition fragments. For example, taking a tuple from a logical partition fragment is by taking a tuple from the corresponding physical partition fragment.
  • the tuple filtering process is performed as input for subsequent calculations; and a tuple is taken.
  • S403 can be performed by a query optimizer in the database system.
  • the data record is obtained from the logical partition segment and the corresponding physical partition segment according to the preset scan filter condition, and the query process for the data record is implemented.
  • FIG. 5 is a flowchart of Embodiment 3 of the database management method of the present invention.
  • the data operation instruction is specifically: inserting, deleting, or updating an instruction.
  • the method in this embodiment may include, when based on the embodiment shown in FIG. 2, an operation for inserting, deleting, or updating an instruction.
  • the database system determines a data record to be inserted, deleted, or updated according to the insert, delete, or update instruction.
  • the database system may calculate a tuple inserted, deleted, or updated according to the insert, delete, or update instruction: For example, a method for obtaining a tuple is a tuple directly input by a user (for example, inserting a new tuple ), so that after the SQL statement is parsed, you can get the tuple; Another way to get a tuple is to get a tuple (such as a conditional update) from the data table by query. In this case, the target tuple that needs to be obtained is the same as the query process.
  • the database system calculates, according to the set of the data records, the logical partition segment to which the data record belongs and a physical partition segment corresponding to the logical partition segment.
  • the database system may calculate the logical partition segment to which the tuple belongs according to the partition key value of the tuple: determining the physical partition segment corresponding to the logical partition segment of the tuple according to the logical partition policy and the mapping relationship.
  • S501 and S502 can be similar to the query operations of S301 and S302 in the second embodiment in actual operation.
  • the database system performs an insert, delete, or update operation on the calculated data in the physical partition segment.
  • the execution engine sends a tuple insertion, deletion, or update request to the logical partition layer, and the logical partition layer calculates a physical partition fragment of the mapping according to the partition key value of the tuple, and submits the insert, delete, or update request to the physical partition. Layer to complete.
  • Embodiments of the present invention may implement an operation of inserting, deleting, or updating an acquired data record by obtaining a data record from the logical partition segment and the corresponding physical partition segment.
  • FIG. 6 is a flowchart of Embodiment 4 of the database management method of the present invention. As shown in FIG. 6, the embodiment describes a process for implementing automatic partition management according to the method in the foregoing Embodiment 2.
  • the partition management device implements automatic management of data partitioning, and specifically includes:
  • the partition information collecting unit of the partition management device collects information about the operation of the database system related to the partition adjustment.
  • the collected information includes: data information of a logical partition and a physical partition, a data size of the logical partition fragment and the physical partition fragment, a SQL statement submitted by the peripheral to the database server, an execution frequency, and a database server execution.
  • the performance of the SQL statement; this information is the basis for decision making by the partition decision unit.
  • the partition decision unit of the partition management device uses the information collected by the partition information collection unit to determine that the current data partition is unreasonable, and the partition decision unit re-plans the partition. Calculate new logical partitions and physical partitions.
  • the partition decision unit recalculates the reasonable data partition according to the current partition information.
  • the partition adjustment unit of the partition management apparatus adjusts the logical partition segment or the physical partition segment.
  • the partition adjustment unit After the partition adjustment unit performs the adjustment process of the logical data partition or the physical data partition, the data partition adjustment is ended.
  • the database server automatically triggers the data partitioning adjustment process according to a certain logic, for example, periodically triggering according to time, the adjustment of the data partition by the partition adjustment unit includes: determining, according to the collected information, whether the current data partition is reasonable;
  • the partition decision unit recalculates the reasonable data partition based on the current partition information
  • the partition adjustment unit performs adjustment of the logical data partition segment or the physical data partition segment, specifically modifying the logical partition of the logical partition tuple by adding, deleting, merging, and splitting the logical partition segment.
  • the fragment is adjusted, or the physical partition fragment is adjusted by splitting the physical partition.
  • the partition adjustment process is completed by the partition adjustment unit.
  • the adjustment involves only logical partitions. Includes addition, deletion, merging, splitting, etc. of logical partition segments. These adjustments to the logical partition only need to modify the data of the logical partition tuple without the need for data migration between partition fragments.
  • adjusting logical partitions does not require data migration, adjusting physical partitions often involves data migration. Data migration will cause partial partition fragments to be unavailable for a period of time, so the cost of logical partition adjustment is much lower than the cost of physical partition adjustment. .
  • the physical partition granularity in the hierarchical data partition is relatively small, so the amount of data affected by the adjustment is relatively small, thereby improving data availability.
  • FIG. 7 is a schematic diagram of the process of judging the rationality of data partitioning in the embodiment shown in FIG. 6. This embodiment describes in detail the process of judging the rationality of data partitioning in the process of implementing automatic management in the above embodiment.
  • This embodiment describes a data partitioning rationality judging process for demonstrating that the present invention is implementable.
  • the idea of the present invention is not limited thereto, and other embodiments different from the process described in this embodiment do not affect the layering idea of data partitioning and the protection of the data partitioning adjustment process.
  • the zoning decision unit determines whether the current data partition is reasonable according to the feature value in the information collected by the partition information collecting unit, where the feature value includes an upper limit of the data size of the partition segment and a threshold value of the skew factor. If the feature value exceeds the preset value of the corresponding feature value, it is determined that the current data partition is unreasonable.
  • the rationality judgment of the data partition is based on a set of eigenvalues, and the set of eigenvalues is:
  • the size of the partition fragment can define the number of tuples contained in the fragment of the partition or the size of the storage space occupied by the partition. Includes logical partition fragment size and physical partition fragment size. For example, when a logical partition fragment size is larger than the upper limit of the logical partition fragment size, this logical partition fragment can be split into smaller logical partition fragments.
  • the logical partition fragment size cap can be specified by the database system or by the partition table definer.
  • the physical partition fragment size is the same.
  • Skew factor threshold The skew factor of two partition segments refers to the ratio of the larger scale table in the logical partition segment size to the scale value of the smaller scale table. If the skew factor of the two logical partition segments is greater than the skew factor threshold, then the data partition is considered to have data skew and needs to be adjusted.
  • the results of data partition rationality judgment are divided into reasonable and unreasonable.
  • the partitions are unreasonable, including: logical partition overflow, physical partition overflow, logical partition skew, three unreasonable results, which can correspond to the logical partition fragment size exceeding the limit, physical partition The fragment size exceeds the limit, and the skew factor of the two logical partition segments is greater than the skew factor threshold.
  • the data partitioning rationality judgment process specifically includes:
  • the minimum logical partition fragment handle and the minimum logical partition fragment size minPart2 are updated. Otherwise, if the logical partition fragment size is larger than the maximum logical partition fragment size ax Part2, the maximum logical partition handle and the maximum logical partition fragment size m ax Part2 are updated ;
  • the logical partition fragment size is not larger than the maximum logical partition fragment size
  • maxPart2 that is, no, it is judged that the logical partition segment is reasonable, go to S703; otherwise, then, execute S706;
  • This logical partition fragment is too large to record an oversized logical partition fragment.
  • the judgment result is that the logical partition fragment is unreasonable; if yes, for example, at least one physical partition fragment size is greater than or equal to the largest physical partition fragment.
  • the scale maxPartl the judgment result is unreasonable for this physical partition fragment. Record unreasonable logical partition fragments or physical partition fragments;
  • the physical partition fragment or the logical partition fragment is not found to be unreasonable after the above steps, calculate the two partitions having the largest logical partition fragment size and the smallest logical partition fragment size.
  • the skew factor of the segment ie m ax Part2/mi n P a rt2, and determine whether the calculated value of the skew factor is greater than the skew factor threshold. If no, that is, the skew factor is less than the skew factor threshold, then the data partition is considered reasonable and no adjustment is needed.
  • the result of the judgment is that the logical partition is skewed, that is, the partition is unreasonable, and logical partition adjustment is required.
  • FIG. 8 is a schematic structural diagram of Embodiment 1 of a database system according to the present invention. As shown in FIG. 8, the system includes:
  • a physical partition layer 801 configured to store a physical partition fragment of the database system, where the physical partition fragment is a set of data records of the partition table;
  • the logical partition layer 802 is configured to store logical partition segments of the database system to provide logical partition information of the partition table; the logical partition segments and the physical partition segments are mapped to each other by a mapping relationship.
  • the processing module 803 is configured to receive a data operation instruction of the user, determine a logical partition segment to which the data record requested by the data operation instruction belongs, and determine, according to the data operation instruction and the mapping relationship, the physical partition layer Determining the physical partition segment corresponding to the logical partition segment to operate on a data record of the data operation instruction request operation.
  • the data partition is divided into a physical partition layer and a logical partition layer; the logical partition segment is mapped with the corresponding physical partition segment; and the logical partition layer can respond by scanning the corresponding physical partition segment in the physical partition layer.
  • FIG. 9 is a schematic structural diagram of Embodiment 2 of the database system of the present invention.
  • the tuples in the zone layer may be associated with tuples in the physical partition layer, and may be associated by tuple routing (not shown).
  • the system may further include:
  • the partition management device 804 is configured to perform plausibility judgment on the current data partition according to the collected partition information of the physical partition layer and the logical partition layer; if it is determined that the current data partition is unreasonable, the logic is unreasonable The partition fragment or the physical partition fragment is adjusted.
  • the data operation instruction may specifically be: a query instruction, and correspondingly, the processing module 803 may include:
  • a query optimizer 8031 configured to receive the query instruction, determine a query condition based on the partition table according to the query instruction, perform partition pruning on the logical partition segment, and exclude a logical partition segment not involved in the query And generate an execution plan;
  • the execution engine 8032 is configured to trigger, according to the execution plan generated by the query optimizer 8031, the logical partitioning layer to determine the determined logic in the physical partition layer according to the query instruction and the mapping relationship.
  • the physical partition segment corresponding to the partition segment is obtained to obtain a data record of the query request query, and the obtained data record is returned to the user as a query result.
  • the execution engine 8032 is specifically configured to:
  • a data record is obtained from the logical partition segment, and corresponding to the determined logical partition segment according to the mapping relationship
  • the physical partition fragment obtains a data record
  • the data operation instruction may be specifically: inserting, deleting, or updating an instruction; correspondingly, the processing module 803 may be specifically configured to:
  • FIG. 10 is a schematic structural diagram of a partition management apparatus in a database system according to the present invention. As shown in FIG. 9, the partition management apparatus 804 may specifically include:
  • a partition information collecting unit 8041 configured to collect information about a database system operation related to the partition adjustment
  • the database system operation information includes: data information of a logical partition and a physical partition, a data partition size of the logical partition fragment and the physical partition fragment, a SQL statement submitted by the peripheral to the database server, and an execution frequency, The performance of the database server executing the SQL statement;
  • a partitioning decision unit 8042 configured to use the information of the database system to determine whether the current data partition is reasonable; when it is determined that the current data partition is unreasonable, re-plan the partitioning scheme;
  • the partition adjustment unit 8043 is configured to adjust the logical partition segment and/or the physical partition segment according to the re-planned partitioning scheme.
  • the partition decision unit 8042 is specifically configured to:
  • the partition information collecting unit 8041 Determining whether the current data partition is reasonable according to the feature value in the information of the operation of the database system collected by the partition information collecting unit 8041, where the feature value includes an upper limit of the data size of the partition segment, and a skew factor threshold, where If the feature value exceeds the preset value of the corresponding feature value, it is judged that the current data partition is unreasonable.
  • the partition adjustment unit 8043 is specifically configured to:
  • the logical partition layer is adjusted by adding, deleting, merging, splitting the logical partition segments, and/or by splitting the physical partition segments.
  • one of the logical partition segments corresponds to one or more of the physical partition segments.
  • the data partitioning system automatically selects a reasonable partitioning strategy and partition granularity for the partition table, which avoids the partition performance being low because the DBA adopts an unreasonable partition; and the data partition increases with the increase of the data volume.
  • the system automatically adjusts the partition according to the data change, and can find unreasonable partitions in time and adjust them in time without DBA intervention.
  • the data partitioning system utilizes the idea of physical partitioning and logical partition layering, which is avoided in most cases. Large-scale data migration caused by partition adjustment greatly reduces the cost of partition adjustment; automatic partition adjustment of data partition system frees DBA from complex partition management.
  • the basis for the partition adjustment is the real-time information of the database system operation, so it can be guaranteed that the adjusted partition strategy is reasonable. Moreover, it does not require long-term verification; because the hierarchical data partition is introduced, the performance advantages of the storage cluster system can be fully utilized, and the part below the logical partition layer is implemented on the storage cluster, which greatly improves the query efficiency.
  • the database management method and system provided by the embodiments of the present invention divide the data partition into a physical partition layer and a logical partition layer; the logical partition segment is mapped to the corresponding physical partition segment;
  • the partitioning layer is responsive to the operation of the corresponding physical partition segment by the query optimizer, the execution engine, and the partition management device by scanning the corresponding physical partition segment in the physical partition layer.
  • the embodiment of the invention can automatically adjust the partition according to the data change, and can improve the rationality of the partition and the management performance of the database system.
  • the aforementioned program can be stored in a computer readable storage medium.
  • the program when executed, performs the steps including the above-described method embodiments; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

本发明实施例提供一种数据库管理方法与系统。方法包括:数据库系统将数据分区划分为物理分区层和逻辑分区层;其中,物理分区层用于存储所述数据库系统的物理分区片段,物理分区片段为分区表的各条数据记录的集合;逻辑分区层用于存储数据库系统的逻辑分区片段,以提供分区表的逻辑分区信息;逻辑分区片段与所述物理分区片段通过映射关系互相映射;数据库系统接收用户的数据操作指令,确定数据操作指令所请求操作的数据记录所属的逻辑分区片段;根据数据操作指令和映射关系确定物理分区层中与确定出的逻辑分区片段对应的物理分区片段,以对数据操作指令对应请求操作的数据记录进行操作。本发明能够提高数据库系统的性能。

Description

数据库管理方法与系统
技术领域
本发明涉及数据库技术, 尤其涉及一种数据库管理方法与系统。 背景技术
分区是数据库系统的重要功能, 分区将大的数据表分而治之, 用分区 表表示, 分区表还可以包括分区片段, 合理的分区能使数据库系统性能提 高, 可减少系统对数据的读写次数, 减小数据库服务器处理数据的规模, 便于对数据库的管理。 分区策略是设计人员设定的对大数据表分区的聚簇 策略, 分区策略规定数据表的一个数据记录映射到哪个分区片段。 数据库 系统还提供对数据分区的管理功能, 实现对分区的管理。
现有技术的数据库系统支持三种基本分区策略:范围分区、列表分区、 散列分区, 或支持其他扩展分区策略和组合分区策略。 分区策略和分区片 段数量对分区表的性能影响重大。 如果分区片段过大, 导致分区的优势不 能充分发挥; 如果分区片段过小, 使得分区表数量太多, 从而增加査询优 化的时间, 分区管理上也会更复杂。
现有技术对数据分区的管理功能, 包括增加分区片段、 删除分区片段 (针对范围分区和列表分区) , 减少分区片段 (针对散列分区) 、 合并分 区片段、 分裂分区片段, 修改分区表名等。 实施这些管理操作时, 不仅可 能会修改分区元组的数据, 而且还可能导致数据在分区表之间迁移。
现有技术中, 随着数据量的增长, 容易存在分区的不合理而导致分区性能 和査询性能降低, 同时管理操作在拆分、 合并分区片段时, 容易导致数据 迁移代价较高, 并有可能锁住部分分区片段或者整张分区表, 导致系统性 能下降, 不利于数据库系统的管理。 发明内容
本发明提供一种数据库管理方法与系统, 能够根据数据变化自动调整分 区, 可降低分区调整的代价, 并提高分区的合理性以及数据库系统的管理性 第一方面, 本发明实施例提供一种数据库管理方法, 包括: 数据库系统将数据分区划分为物理分区层和逻辑分区层, 其中, 所述物理分区层, 用于存储所述数据库系统的物理分区片段, 所 述物理分区片段为分区表的各条数据记录的集合; 所述逻辑分区层, 用于存 储所述数据库系统的逻辑分区片段, 以提供所述分区表的逻辑分区信息; 所 述逻辑分区片段与所述物理分区片段通过映射关系互相映射; 所述数据库系统接收用户的数据操作指令, 确定所述数据操作指令所请 求操作的数据记录所属的逻辑分区片段; 所述数据库系统根据所述数据操作指令和所述映射关系确定所述物理分 区层中与确定出的所述逻辑分区片段对应的物理分区片段, 以对所述数据操 作指令请求操作的数据记录进行操作。 在第一方面的第一种可能的实现方式中, 还包括: 所述数据库系统根据所述物理分区层和所述逻辑分区层的分区信息, 对 当前数据分区进行合理性判断, 如果判断当前数据分区不合理, 则所述数据 库系统对不合理的所述逻辑分区片段或所述物理分区片段进行调整。 根据第一方面或第一方面的第一种可能的实现方式, 在第二种可能的实 现方式中, 所述数据操作指令具体为: 査询指令; 相应地, 所述根据所述数 据操作指令和所述映射关系确定所述物理分区层中与确定出的所述逻辑分区 片段对应的所述物理分区片段, 以对所述数据操作指令请求操作的数据记录 进行操作, 包括: 所述数据库系统接收所述査询指令, 根据所述査询指令确定基于分区表 的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉及的逻辑分 区片段, 并生成执行计划; 所述数据库系统根据所述执行计划, 根据所述査询指令和所述映射关系 确定所述物理分区层中与确定出的所述逻辑分区片段对应的所述物理分区片 段, 以得到所述査询指令请求査询的数据记录, 并将得到的所述数据记录作 为査询结果返回给用户。 根据第一方面的第二种可能的实现方式, 在第三种可能的实现方式中, 所述数据库系统根据所述执行计划, 触发所述逻辑分区层根据所述査询指令 和所述映射关系扫描所述物理分区层中对应的所述物理分区片段, 以执行相 应的査询操作, 具体包括: 所述数据库系统根据所述执行计划和所述査询条件, 从所述逻辑分区片 段中循环取得数据记录, 并根据所述映射关系从与确定出的所述逻辑分区片 段对应的所述物理分区片段取得数据记录; 所述数据库系统根据所述査询指令中的信息对所述取得的数据记录进行 数据筛选, 将所述数据筛选获得的数据记录作为査询结果返回给用户。 根据第一方面或第一方面的第一种可能的实现方式, 在第四种可能的实 现方式中, 所述数据操作指令具体为: 插入、 删除或更新指令; 相应地, 所 述根据所述数据操作指令和所述映射关系确定所述物理分区层中与确定出的 所述逻辑分区片段对应的所述物理分区片段, 以对所述数据操作指令请求操 作的数据记录进行操作, 具体包括: 所述数据库系统根据所述插入、 删除或更新指令, 确定待插入、 删除或 更新的数据记录; 所述数据库系统根据所述数据记录的所在集合计算所述数据记录所属的 所述逻辑分区片段以及与所述逻辑分区片段对应的物理分区片段; 所述数据库系统对计算得出的所述物理分区片段内的数据执行插入、 删 除或更新操作。 根据第一方面的第一种可能的实现方式, 在第五种可能的实现方式中, 根所述数据库系统根据所述物理分区层和所述逻辑分区层的分区信息, 对当 前数据分区进行合理性判断, 如果判断当前数据分区不合理, 则所述数据库 系统对不合理的所述逻辑分区片段或所述物理分区片段进行调整, 包括: 所述数据库系统收集与分区调整相关的数据库系统运行的信息; 其中, 所述数据库系统运行的信息包括: 逻辑分区和物理分区的元组的 数据信息、 所述逻辑分区片段和所述物理分区片段的数据规模、 外设提交给 数据库服务器的 SQL语句以及执行频率、 数据库服务器执行 SQL语句的性 所述数据库系统利用所述数据库系统运行的信息, 判断当前数据分区是 否合理; 当判断出当前数据分区不合理时, 所述数据库系统重新规划分区方案; 所述数据库系统根据所述重新规划的分区方案, 对所述逻辑分区片段和\ 或所述物理分区片段进行调整。 根据第一方面的第五种可能的实现方式, 在第六种可能的实现方式中, 所述数据库系统利用所述数据库系统运行的信息, 判断当前数据分区是否合 理, 包括: 所述数据库系统根据所述数据库系统运行的信息中的特征值判断当前数 据分区是否合理, 所述特征值包括分区片段的数据规模上限、偏斜因子阈值, 其中, 如果所述特征值超过相应特征值的预设值, 则判断当前数据分区不合 理。 根据第一方面的第五种可能的实现方式, 在第七种可能的实现方式中, 所述数据库系统根据所述重新计算的所述逻辑分区层和所述物理分区层对所 述逻辑分区片段和 \或所述物理分区片段进行调整, 包括: 所述数据库系统根据所述重新规划的分区方案, 通过对所述逻辑分区片 段的增加、 删除、 合并、 拆分调整所述逻辑分区层, 和\或, 通过对所述物理 分区片段的拆分调整所述物理分区层。
根据第一方面、 第一方面的第一种至第七种可能的实现方式中的任意一 种, 在第八种可能的实现方式中, 一个所述逻辑分区片段对应一个或多个所 述物理分区片段。 第二方面, 本发明实施例提供一种数据库系统, 包括: 物理分区层, 用于存储所述数据库系统的物理分区片段, 所述物理分区 片段为分区表的各条数据记录的集合; 逻辑分区层, 用于存储所述数据库系统的逻辑分区片段, 以提供分区表 的逻辑分区信息; 所述逻辑分区片段与所述物理分区片段通过映射关系互相 映射; 处理模块, 用于接收用户的数据操作指令, 确定所述数据操作指令所请 求操作的数据记录所属的逻辑分区片段; 根据所述数据操作指令和所述映射 关系确定所述物理分区层中与确定出的所述逻辑分区片段对应的所述物理分 区片段, 以对所述数据操作指令请求操作的数据记录进行操作。 在第二方面的第一种可能的实现方式中, 还包括: 分区管理装置, 用于根据收集的所述物理分区层和所述逻辑分区层的分 区信息, 对当前数据分区进行合理性判断; 如果判断当前数据分区不合理, 则对不合理的所述逻辑分区片段或所述物理分区片段进行调整。 根据第二方面或第二方面的第一种可能的实现方式, 在第二种可能的实 现方式中, 所述数据操作指令具体为: 査询指令; 相应地, 所述处理模块包 括: 査询优化器, 用于接收所述査询指令, 根据所述査询指令确定基于分区 表的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉及的逻辑 分区片段, 并生成执行计划; 执行引擎, 用于根据所述査询优化器生成的执行计划, 触发所述逻辑分 区层根据所述査询指令和所述映射关系确定所述物理分区层中与确定出的所 述逻辑分区片段对应的所述物理分区片段, 以得到所述査询指令请求査询的 数据记录, 并将得到的所述数据记录作为査询结果返回给用户。 根据第二方面的第二种可能的实现方式, 在第三种可能的实现方式中, 所述执行引擎具体用于: 根据所述査询优化器生成的执行计划和所述査询条件, 从所述逻辑分区 片段中循环取得数据记录, 并根据所述映射关系从与确定出的所述逻辑分区 片段从对应的所述物理分区片段取得数据记录; 根据所述査询指令中的信息对所述取得的数据记录进行数据筛选, 将所 述数据筛选获得的数据记录作为査询结果返回给用户。 根据第二方面或第二方面的第一种可能的实现方式, 在第四种可能的实 现方式中, 所述数据操作指令具体为: 插入、 删除或更新指令; 相应地, 所 述处理模块体用于: 根据所述插入、 删除或更新指令, 确定待插入、 删除或更新的数据记录; 根据所述数据记录的所在集合, 计算所述数据记录所属的所述逻辑分区 片段以及与所述逻辑分区片段对应的物理分区片段; 根据所述逻辑分区层对对计算得出的所述物理分区片段内的数据记录执 行插入、 删除或更新操作。 根据第二方面的第一种可能的实现方式, 在第五种可能的实现方式中, 所述分区管理装置具体包括: 分区信息收集单元,用于收集与分区调整相关的数据库系统运行的信息; 其中, 所述数据库系统运行信息包括: 逻辑分区和物理分区的元组的数 据信息、 所述逻辑分区片段和所述物理分区片段的数据规模、 外设提交给数 据库服务器的 SQL语句以及执行频率、 数据库服务器执行 SQL语句的性能; 分区决策单元, 用于利用所述数据库系统运行的信息, 判断当前数据分 区是否合理; 当判断出当前数据分区不合理时, 重新规划分区方案; 分区调整单元, 用于根据所述重新规划的分区方案, 对所述逻辑分区片 段和 \或所述物理分区片段进行调整。 根据第二方面的第五种可能的实现方式, 在第六种可能的实现方式中, 所述分区决策单元具体用于: 根据所述分区信息收集单元收集的所述数据库系统运行的信息中的特征 值判断当前数据分区是否合理, 所述特征值包括分区片段的数据规模上限、 偏斜因子阈值, 其中, 如果所述特征值超过相应特征值的预设值, 则判断当 前数据分区不合理。 根据第二方面的第五种可能的实现方式, 在第七种可能的实现方式中, 所述分区调整单元具体用于: 根据所述重新规划的分区方案, 通过对所述逻辑分区片段的增加、删除、 合并、 拆分调整所述逻辑分区层, 和\或, 通过对所述物理分区片段的拆分进 行调整。 根据第二方面、 第二方面的第一种至第七种可能的实现方式中的任意一 种, 在第八种可能的实现方式中, 一个所述逻辑分区片段对应一个或多个所 述物理分区片段。 本发明实施例提供的数据库管理方法与系统, 通过将数据分区分为物理 分区层和逻辑分区层; 逻辑分区片段与对应的物理分区片段相映射; 逻辑分 区层通过扫描物理分区层中对应的物理分区片段, 能够响应査询优化器、 执 行引擎以及分区管理装置对对应的物理分区片段的操作。 本发明实施例由于 采用了将数据分区为物理分区层和逻辑分区层的分层方法, 能够避免调整分 区导致的大规模数据迁移, 因而能够降低分区调整的代价。 附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案, 下面将对 实施例或现有技术描述中所需要使用的附图作一简单地介绍, 显而易见 地, 下面描述中的附图是本发明的一些实施例, 对于本领域普通技术人员 来讲, 在不付出创造性劳动性的前提下, 还可以根据这些附图获得其他的 附图。
图 1为分层分区的数据库结构示意图;
图 2为本发明数据库管理方法实施例一的流程图;
图 3为本发明数据库管理方法实施例二的流程图;
图 4为本发明数据库管理方法应用于査询操作的过程示意图; 图 5为本发明数据库管理方法实施例三的流程图;
图 6为本发明数据库管理方法实施例四的流程图;
图 7为图 6所示实施例中数据分区合理性判断过程的示意图; 图 8为本发明数据库系统实施例一的结构示意图;
图 9为本发明数据库系统实施例二的结构示意图;
图 10为本发明数据库系统中分区管理装置的结构示意图。 具体实施方式 为使本发明实施例的目的、 技术方案和优点更加清楚, 下面将结合本 发明实施例中的附图, 对本发明实施例中的技术方案进行清楚、 完整地描 述, 显然,所描述的实施例是本发明一部分实施例, 而不是全部的实施例。 基于本发明中的实施例, 本领域普通技术人员在没有做出创造性劳动前提 下所获得的所有其他实施例, 都属于本发明保护的范围。 在数据库系统中, 数据分区是否合理直接影响数据库系统的性能, 当 数据库管理员 (Database Administrator, 简称: DBA) 采用一个不合理的 分区方案, 可能导致数据库系统的性能很低; 并且, 随着数据库系统中数 据量的的增长, 可能导致分区方案由合理变为不合理, 因此需要重新调整 分区方案, 现有技术的数据库管理方法, 在调整分区时, 例如在拆分或合 并分区子表时, 由于涉及的数据量较大, 导致数据迁移的代价较高。
数据库系统可以包括: 分区管理装置、 査询优化器和执行引擎。
其中, 数据的分区可以由分区管理装置来完成。
査询优化器是数据库系统结构化査询语言 (Structured Query
Language, 简称: SQL) 引擎的功能部件, 为数据操纵语言 (Data
Manipulation Language, 简称: DML) 语句生成执行计划。 执行引擎可以 是数据库系统的执行引擎, 执行引擎接收査询优化器的执行计划, 执行査 询、 插入、 删除或更新等操作。
本发明实施例提出了数据分区的一种数据库管理方法, 将数据分区划 分为物理分区层和逻辑分区层, 图 1为分层分区的数据库结构示意图, 如 图 1所示, 数据表中的数据记录在物理上以物理分区策略聚簇, 即物理分 区片段, 在逻辑上以逻辑分区策略聚簇, 即逻辑分区片段。 逻辑分区层居 于物理分区层之上, 一个逻辑分区片段可以对应于一个或多个物理分区片 段。 例如, 图 1中, 逻辑分区层中的数据位 Li , L2, ……至 Li, i为大于 1的整数, 而物理分区层中的对应于 1^数据为 Pu, ……至 Pln, 对应于 Li数据为 Pu, ··· ···至 Pim, n和 m均为大于 0的整数。
具体地, 物理分区层和逻辑分区层的功能和属性分别如下:
物理分区层: 体现数据表中的数据记录在存储介质上的聚簇方式。 分 区表的数据记录在存储介质上以物理分区策略聚簇。 物理分区策略和分区 粒度由分区管理装置决定。 物理分区层的功能包括: 存储物理分区片段; 根据逻辑分区层的指令, 对物理分区中的数据进行增加、 删除、 査询、 或 修改的操作; 根据分区管理装置的指令, 进行物理分区表的创建、 删除、 拆分、 合并等管理操作。
而数据库系统中的査询优化器和执行引擎不对物理分区层进行直接 的操作。 逻辑分区层: 体现数据表的元组在逻辑上的聚簇方式。 当然, 逻辑分 区策略与物理分区策略息息相关, 逻辑分区策略受限于物理分区策略。 一 个逻辑分区片段可以映射到一个或多个物理分区片段; 反之, 一个对上层 可用的物理分区片段必定映射到一个且只能映射到一个逻辑分区片段上。 换句话说, 逻辑分区层的逻辑片段数量相对于物理分区层中的物理分区片 段较少。
逻辑分区层的功能包括: 为査询优化器提供分区表的逻辑分区信息, 作为分区剪枝的依据; 当数据库服务器执行査询语句时, 逻辑分区层为执 行引擎提供服务: 对逻辑分区片段执行序列 (Sequence ) 扫描或者索引扫 描; 通过扫描物理分区层上的与之映射的物理分区片段来响应执行引擎的 请求; 当数据库服务器执行插入、 删除或更新等操作时, 逻辑分区层根据 执行引擎的指令,对逻辑分区片段上的元组执行插入、删除或更新等操作; 逻辑分区层通过数据记录路由将对数据记录的插入、 删除或更新等操作落 实到物理分区片段上; 根据分区管理装置的指令, 收集逻辑分区信息、 进 行逻辑分区表的创建、 删除、 合并、 拆分等管理操作。
可以看出, 分区管理装置对逻辑分区表的管理操作只涉及到逻辑分区 元数据的修改, 所以管理代价较低。
下面介绍本发明实施例的数据库管理方法, 图 2为本发明数据库管理 方法实施例一的流程图,本实施例的方法由数据库系统执行,如图 2所示, 所述方法可以包括:
S201、数据库系统将数据分区划分为物理分区层和逻辑分区层,其中, 所述物理分区层, 用于存储所述数据库系统的物理分区片段, 所述物理分 区片段为分区表的各条数据记录的集合; 所述逻辑分区层, 用于存储所述 数据库系统的逻辑分区片段, 以提供所述分区表的逻辑分区信息; 所述逻 辑分区片段与对应的所述物理分区片段相映射。
具体地, 可以由数据库系统中的分区管理装置执行 S201。
首先对以下术语的含义做以解释: 元组: 表示关系数据表中的一条数 据记录; 分区表: 表示在逻辑上数据分区中的完整的数据表; 分区键: 表 示数据分区中分区策略所使用的数据表的一个或多个列, 分区键用于根据 某个区间值或范围值、 特定值列表或散列函数值执行数据的聚集; 分区片 段: 分区表根据分区策略划分之后的片段叫做分区片段, 包括物理分区片 段和逻辑分区片段。
具体的, 物理分区层体现数据表的元组在存储介质上的聚簇方式。 分 区表的元组在存储介质上以物理分区策略聚簇。物理分区策略和分区粒度 由分区管理装置决定。 其中分区粒度代表分区的单位, 例如是以月或年来 分区。
逻辑分区层体现数据表的元组在逻辑上的聚簇方式。 并且, 逻辑分区 策略与物理分区策略息息相关, 逻辑分区策略受限于物理分区策略。 一个 逻辑分区片段可以映射到一个或多个物理分区片段; 反之, 一个对上层可 用的物理分区片段必定映射到一个且只能映射到一个逻辑分区片段上。 即 逻辑分区层的逻辑片段数量相对于物理分区层中的物理分区片段较少, 并 且,在具体实现时,分区管理装置可以控制各个逻辑分区片段的大小一致, 以便于管理。
5202、 所述数据库系统接收用户的数据操作指令, 确定所述数据操作 指令所请求操作的数据记录所属的逻辑分区片段。
5203、根据所述数据操作指令和所述映射关系确定所述物理分区层中 与确定出的所述逻辑分区片段对应的物理分区片段, 以对所述数据操作指 令请求操作的数据记录进行操作。
具体的, 逻辑分区层可以根据数据库系统中的査询优化器、 执行引擎 以及分区管理装置的指令, 对对应的所述物理分区片段进行操作。
逻辑分区层为査询优化器提供的服务包括: 为査询优化器提供分区表 的逻辑分区信息, 作为分区剪枝的依据; 其中, 分区剪枝是与分区方法紧 密相关的技术, 例如对所述逻辑分区片段进行分区剪枝, 可以排除査询不 涉及的逻辑分区片段。 分区剪枝技术将应用对分区表的査询映射到对少数 几个分区片段的査询。
逻辑分区层为执行引擎提供的服务包括: 例如, 当数据库服务器执行 査询语句时, 逻辑分区层对逻辑分区片段执行序列 (Sequence ) 扫描或者 索引扫描。逻辑分区层通过扫描物理分区层上的与之映射的物理分区片段 来响应执行引擎的请求。 进一歩的, 逻辑分区层也可以采用物理分区剪枝 等方法来提高物理分区片段扫描的性能, 从而进一歩提高査询性能。 逻辑分区层为执行引擎提供的服务包括: 例如, 当数据库服务器执行 插入、 删除或更新元组语句时, 执行引擎通过逻辑分区层对逻辑分区片段 上的元组执行插入、 删除或更新操作, 逻辑分区层通过元组路由将对元组 的插入、 删除或更新操作落实到物理分区片段上。
逻辑分区层向分区管理装置提供的服务包括: 例如, 对逻辑分区信息 的收集, 对逻辑分区表的创建、 删除、 合并、 拆分等管理操作。 对逻辑分 区表的管理操作只涉及到逻辑分区元组的数据的修改, 所以管理代价很 低。
功能上物理分区层提供以下服务:
存储物理分区片段;
向逻辑分区层提供服务: 例如, 对物理分区的元组执行査询 /插入 /删 除 /更新操作;
向分区管理装置提供服务: 例如, 对物理分区表的创建、删除、拆分、 合并等进行管理操作。
物理分区层对査询优化器和执行引擎不可见, 物理分区层是通过逻辑 分区层与査询优化器和执行引擎建立连接的。
本发明实施例提供的数据库管理方法, 通过将数据分区分为物理分区 层和逻辑分区层; 逻辑分区片段与对应的物理分区片段相映射; 逻辑分区 层通过扫描物理分区层中对应的物理分区片段, 能够响应査询优化器、 执 行引擎以及分区管理装置对对应的物理分区片段的操作。 本发明实施例由 于采用了将数据分区为物理分区层和逻辑分区层的分层方法, 能够避免调 整分区导致的大规模数据迁移, 因而能够降低分区调整的代价。
图 3为本发明数据库管理方法实施例二的流程图, 在本实施例中, 所 述数据操作指令具体为: 査询指令, 进一歩地, 本实施例还包括了对数据 库分区的自动调整的方法。 如图 3所示, 本实施例的方法在图 2所示实施 例的基础上, 针对査询操作时, 可以包括:
S301、 数据库系统接收所述査询指令, 根据所述査询指令确定基于分 区表的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉及的 逻辑分区片段, 并生成执行计划。
具体地, S301可以由数据库系统中的査询优化器执行, 査询优化器为 数据库系统 SQL引擎的功能部件, 为 DML的语句生成执行计划。 与分区 表相关的功能例如, 对査询语句, 或者, 对插入、 删除更新更新元组语句 中的査询部分执行分区剪枝。此处分区剪枝只针对逻辑分区,分区剪枝时, 査询优化器不直接对物理分区层进行操作。 此处的分区剪枝, 就是根据对 分区表的査询条件, 在生成执行计划的阶段判断那些逻辑分区片段是在査 询时所不涉及的, 并把这部分査询不涉及的逻辑分区片段在生成执行计划 阶段就剪除, 分区剪枝能极大提高査询性能。
5302、 数据库系统根据所述执行计划, 根据所述査询指令和所述映射 关系确定所述物理分区层中与确定出的所述逻辑分区片段对应的所述物 理分区片段, 以得到所述査询指令请求査询的数据记录, 并将得到的所述 数据记录作为査询结果返回给用户。
具体地, S302可以由数据库系统中的执行引擎执行,执行引擎根据査 询优化器生成的执行计划, 对所述逻辑分区片段执行相应的査询操作。
可以理解的是, 当数据操作指令为插入、 删除或更新等操作时, 由于 同样需要先査询到带操作的数据记录, 因此, 上述针对査询指令的方法也 可以用于插入、 删除或更新等操作中的査询部分。
进一歩地, 数据库系统还可以对数据分区进行自动化管理, 具体可以 包括:
5303、 数据库系统根据所述物理分区层和所述逻辑分区层的分区信 息, 对当前数据分区进行合理性判断, 如果判断当前数据分区不合理, 则 所述数据库系统对不合理的所述逻辑分区片段或所述物理分区片段进行 调整。
具体的, S303可以由数据库系统中的分区管理装置执行。 其中, 数据 分区是否合理, 可以通过一组特征值进行判断, 这组特征值是:
1) 分区片段规模上限: 分区片段的规模可以定义成分区片段包含的 元组数量或分区片段占有存储空间的大小。 当某一逻辑分区片段规模大于 规模上限时, 此逻辑分区片段必须被拆分成更小的逻辑分区片段。 逻辑分 区片段规模上限由数据库系统指定或者由分区表定义者指定。
2) 偏斜因子阈值: 两个分区片段的偏斜因子指数据规模中较大规模 的表与较小规模表的比值。 如果两个分区片段的偏斜因子大于偏斜因子阈 值, 则认为数据分区发生了数据偏斜, 需要调整。
数据分区合理性判断结果分为合理与不合理, 其中分区不合理包括: 逻辑分区溢出、 物理分区溢出、 逻辑分区偏斜三种不合理结果。
分区管理装置提供以下服务:
创建分区表时, 分区管理装置根据分区键的数据类型, 按照默认设置 为分区表决定分区策略和分区粒度, 可选地, 数据库管理员 (Database Administrator , DBA) 也可以在创建分区时指定分区策略和分区粒度。
在数据库系统运行过程当中, 分区管理装置条件触发 (定时触发或者 系统某一关键指标的变化等) 分区分析过程, 分区管理装置根据系统的实 时运行信息, 判断目前的分区是否合理。 如果不合理, 分区管理装置决策 新的分区策略, 并根据新的分区策略对不合理的所述逻辑分区片段或所述 物理分区片段进行调整, 以实现对数据分区的自动化管理。
本发明实施例提供的数据库管理方法, 数据库系统中的査询优化器、 执行引擎通过所述逻辑分区片段对对应的所述物理分区片段进行插入、 删 除或更新等数据库管理的操作, 能够提高数据库管理的效率; 数据库系统 中的分区挂历装置可以根据数据变化自动调整分区, 能够提高分区的合理 性以及数据库系统的管理性能。
图 4为本发明数据库管理方法应用于査询操作的过程示意图, 如图 4 所示, 本实施例是根据上述实施例二的方法中实现査询操作的过程(S301 和 S302 ) 的详细描述。 本实施例的方法可以包括:
5401、 生成计划树。
具体地, 可以由数据库系统中的査询优化器执行 S401。
例如, 査询开始后査询优化器首先进行结构化査询语言 (Structured
Query Language, SQL) 解析, 解析原始 SQL语句生成解析树; 然后生成 计划树, 即由解析树生成计划树, 其中, S401还包括根据査询条件对分区 表进行逻辑分区剪枝, 排除査询不涉及的逻辑分区片段。
5402、 执行扫描逻辑分区片段。
具体地, 可以由数据库系统中的执行引擎以及逻辑分区层和物理分区 层执行 S402。
所述执行引擎根据预先设置的扫描过滤条件, 从所述逻辑分区片段中 循环取得数据记录, 所述逻辑分区片段从对应的所述物理分区片段取得数 据记录;
具体的执行査询过程主要描述对分区表的扫描操作, 具体包括: 初始 化逻辑分区片段扫描, 设置扫描过滤条件。 其中, 包括根据扫描过滤条件 进行物理分区剪枝; 另外, 初始化逻辑分区片段扫描过程是通过初始化底 层物理分区片段扫描来实现的, 因此设置扫描过滤条件也就是为物理分区 片段设置扫描过滤条件。
循环从逻辑分区片段中取元组。逻辑分区层通过从物理分区片段取得 元组来实现, 例如从逻辑分区片段取一个元组是通过从对应的物理分区片 段取一个元组。
判断元组是否为空, 如果元组为空, 则扫描过程结束, 这时跳出循环 过程, 扫描完毕。
如果元组不为空, 则执行元组筛选过程, 作为后续的计算的输入; 并 取下一条元组。
S403、 获取査询结果; 即经过筛选获取后续计算的数据记录, 作为获 取的査询结果。
具体地, 可以由数据库系统中的査询优化器执行 S403。
例如可为获取范围査询结果并返回査询结果集。
至此, 整个査询过程结束。
本实施例,可以根据预先设置的扫描过滤条件,从所述逻辑分区片段以 及对应的所述物理分区片段取得数据记录, 实现对数据记录的査询过程。
图 5为本发明数据库管理方法实施例三的流程图, 在本实施例中, 所 述数据操作指令具体为: 插入、 删除或更新指令。 如图 5所示, 本实施例 的方法在图 2所示实施例的基础上, 针对插入、 删除或更新指令操作时, 可以包括:
S501、 数据库系统根据所述插入、 删除或更新指令, 确定待插入、 删 除或更新的数据记录。
具体的, 所述数据库系统可以根据所述插入、 删除或更新指令计算插 入、 删除或更新的元组: 例如, 一种获取元组的方法是用户直接输入的元 组 (比如插入一条新元组) , 这样在 SQL语句解析完之后就能得到元组; 另一种获取元组的方法是通过査询从数据表中获取元组(比如条件更新), 这种情况下, 计算需要获取的目标元组与査询过程相同。
5502、所述数据库系统根据所述数据记录的所在集合计算所述数据记 录所属的所述逻辑分区片段以及与所述逻辑分区片段对应的物理分区片 段。
具体的, 所述数据库系统可以根据元组的分区键值计算元组所属的逻 辑分区片段: 根据逻辑分区策略和映射关系确定元组逻辑分区片段对应的 所述物理分区片段。
S501和 S502在实际操作时可以与实施例二中 S301和 S302的査询操 作类似。
5503、所述数据库系统对计算得出的所述物理分区片段内的数据执行 插入、 删除或更新操作。
具体地, 执行引擎向逻辑分区层发送元组插入、 删除或更新请求, 逻 辑分区层根据元组的分区键值计算其映射的物理分区片段, 并将插入、 删 除或更新的请求提交到物理分区层来完成。
本发明实施例可以通过从所述逻辑分区片段以及对应的所述物理分 区片段取得数据记录, 实现对获取的数据记录的插入、 删除或更新操作过 程。
图 6为本发明数据库管理方法实施例四的流程图, 如图 6所示, 本实 施例是根据上述实施例二的方法对实现分区自动化管理的过程加以详细 描述。
分区管理装置实现对数据分区的自动化管理, 具体包括:
5601、分区管理装置的分区信息收集单元收集与分区调整相关的数据 库系统运行的信息。
其中, 收集的信息包括: 逻辑分区和物理分区的元组的数据信息、 所 述逻辑分区片段和所述物理分区片段的数据规模、 外设提交给数据库服务 器的 SQL语句以及执行频率、 数据库服务器执行 SQL语句的性能; 这些 信息是分区决策单元的决策依据。
5602、所述分区管理装置的分区决策单元利用所述分区信息收集单元 收集的信息, 判断当前数据分区不合理时, 分区决策单元重新规划分区, 计算新的逻辑分区和物理分区。
具体的, 如果当前分区合理, 则结束此次数据分区调整流程; 如果当 前分区不合理, 分区决策单元根据当前的分区信息重新计算合理的数据分 区。
S603、所述分区管理装置的分区调整单元对所述逻辑分区片段或所述 物理分区片段进行调整。
分区调整单元执行逻辑数据分区或物理数据分区的调整过程后, 结束 此次数据分区调整。
优选的, 数据库服务器根据一定逻辑自动触发数据分区调整过程, 例 如根据时间定期触发等方式, 所述分区调整单元对数据分区的调整包括: 根据所述收集的信息, 判断当前数据分区是否合理;
如果当前数据分区不合理, 分区决策单元根据当前的分区信息重新计 算合理的数据分区;
所述分区调整单元执行所述逻辑数据分区片段或物理数据分区片段 的调整, 具体通过对所述逻辑分区片段的增加、 删除、 合并、 拆分时修改 逻辑分区元组的数据对所述逻辑分区片段进行调整, 或者, 通过拆分物理 分区对所述物理分区片段进行调整。
具体的, 如果分区决策单元判断数据分区需要调整, 那么分区调整的 过程则由分区调整单元来完成。
分区调整分为两类:
大部分情况下, 调整只涉及到逻辑分区。 包括逻辑分区片段的增加、 删除、 合并、 拆分等。 这些对逻辑分区的调整只需要修改逻辑分区元组的 数据即可, 而不需要进行分区片段之间的数据迁移。
少部分情况下, 由于只调整逻辑分区并不能完全满足性能需求, 比如 随着生产系统长时间在线运营, 导致物理分区片段规模逐渐增大, 以致影 响査询性能。 此时, 需要对物理分区进行调整, 对物理分区的调整主要是 拆分物理分区。
由于调整逻辑分区不需要进行数据迁移, 而调整物理分区往往涉及到 数据迁移, 数据迁移会导致部分分区片段在一段时间内不可用, 所以逻辑 分区调整的代价相比物理分区调整的代价要低很多。 另外, 相比现有不分层的分区技术, 分层的数据分区中的物理分区粒 度比较小, 所以受调整影响的数据量相对较小, 从而提高了数据可用性。
图 7为图 6所示实施例中数据分区合理性判断过程的示意图。 本实施 例是对上述实施例中实现自动化管理的过程中的数据分区合理性判断过 程加以详细描述。
本实施例描述一种数据分区合理性判断过程, 用来证明本发明是可实 施的。 但本发明的思想并不以此为限, 采用与本实施例描述过程不同的其 它实施例并不影响对数据分区的分层思想和数据分区调整过程的保护。
优选的, 所述分区决策单元根据所述分区信息收集单元收集的所述信 息中的特征值判断当前数据分区是否合理, 所述特征值包括分区片段的数 据规模上限、 偏斜因子阈值, 如果所述特征值超过相应特征值的预设值, 则判断当前数据分区不合理。
具体的,数据分区的合理性判断以一组特征值为依据,这组特征值是:
1) 分区片段规模上限: 分区片段的规模可以定义成分区片段包含的 元组数量或分区片段占有存储空间的大小。包括逻辑分区片段规模和物理 分区片段规模。 例如当某一逻辑分区片段规模大于逻辑分区片段规模上限 时, 此逻辑分区片段可以被拆分成更小的逻辑分区片段。 逻辑分区片段规 模上限可以由数据库系统指定或者由分区表定义者指定。物理分区片段规 模同样道理。
2) 偏斜因子阈值: 两个分区片段的偏斜因子指逻辑分区片段规模中 较大规模的表与较小规模表的规模数值的比值。 如果两个逻辑分区片段的 偏斜因子大于偏斜因子阈值,则认为数据分区发生了数据偏斜,需要调整。
数据分区合理性判断结果分为合理与不合理, 其中分区不合理包括: 逻辑分区溢出、 物理分区溢出、 逻辑分区偏斜三种不合理结果, 可以分别 对应逻辑分区片段规模超过限值、 物理分区片段规模超过限值、 两个逻辑 分区片段的偏斜因子大于偏斜因子阈值。
如图 7所示, 数据分区合理性判断过程具体包括:
判断开始后, 首先执行:
S701、 初始化最小分区片段句柄、 最小分区片段规模、 最大分区片段 句柄、 最大分区片段规模。 例如设置最小分区片段规模 minPart=MAX, 最大分区片段规模 maxPart=0, 其中包括对最大或最小物理分区片段句柄及规模 minPartl、 maxPartl的初始化,以及对最大或最小逻辑分区片段句柄及规模 minPart2、 maxPart2的初始化。
S702、 遍历所有逻辑分区片段。
5703、 判断是否还有下一个逻辑分区片段, 如果为空, 即若否, 则没 有下一个逻辑分区片段, 则转去执行 S709; 如果为是, 则执行:
5704、 更新 minPart2或 maxPart2。
例如, 如果逻辑分区片段规模小于最小逻辑分区片段规模 minPart2, 则更新最小逻辑分区片段句柄和最小逻辑分区片段规模 minPart2。 否则, 如果逻辑分区片段规模大于最大逻辑分区片段规模 maxPart2, 则更新最大 逻辑分区句柄和最大逻辑分区片段规模 maxPart2;
5705、判断逻辑分区片段规模是否超过逻辑分区片段规模上下限预设 值。
例如, 如果逻辑分区片段规模不大于最大逻辑分区片段规模
maxPart2, 即为否, 则判断此逻辑分区片段合理, 转到 S703 ; 否则为是, 则执行 S706;
5706、 遍历此逻辑分区片段对应的所有物理分区片段, 判断物理分区 片段规模是否超过物理分区片段规模上下限预设值。
判断结果分为: 如果为是, 则执行 S707 , 如果为否, 则执行 S708。
5707、 此物理分区片段过大, 记录过大的物理分区片段。
5708、 此逻辑分区片段过大, 记录过大的逻辑分区片段。
举例来说, 如果为否, 比如所有物理分区片段规模小于最大物理分区 片段规模 maxPartl , 则判断结果为逻辑分区片段不合理; 如果为是, 比如 至少有一个物理分区片段规模大于等于最大物理分区片段规模 maxPartl , 则判断结果为此物理分区片段不合理。 记录不合理的逻辑分区片段或物理 分区片段;
S709、 判断是否 maxPart2/minPart2大于偏斜因子阈值。
如果经过上述歩骤没有发现物理分区片段或者逻辑分区片段不合理, 则计算拥有最大逻辑分区片段规模和最小逻辑分区片段规模的两个分区 片段的偏斜因子, 即 maxPart2/minPart2, 并判断该偏斜因子的计算值是否 大于偏斜因子阈值。 如果为否, 即偏斜因子小于偏斜因子阈值, 则认为数 据分区合理, 不需要进行调整。
否则为是, 则执行 S710。
S710、 记录发生偏斜的两个逻辑分区片段。
这时判断结果为逻辑分区发生偏斜, 即分区不合理, 需要进行逻辑分 区调整。
最后记录合理性判断结果, 结束数据分区合理性判断过程。
图 8为本发明数据库系统实施例一的结构示意图, 如图 8所示, 所述 系统包括:
物理分区层 801, 用于存储所述数据库系统的物理分区片段, 所述物 理分区片段为分区表的各条数据记录的集合;
逻辑分区层 802, 用于存储所述数据库系统的逻辑分区片段, 以提供 分区表的逻辑分区信息; 所述逻辑分区片段与所述物理分区片段通过映射 关系互相映射。
处理模块 803, 用于接收用户的数据操作指令, 确定所述数据操作指 令所请求操作的数据记录所属的逻辑分区片段; 根据所述数据操作指令和 所述映射关系确定所述物理分区层中与确定出的所述逻辑分区片段对应 的所述物理分区片段, 以对所述数据操作指令请求操作的数据记录进行操 作。
本发明实施例提供的数据库系统, 数据分区分为物理分区层和逻辑分 区层; 逻辑分区片段与对应的物理分区片段相映射; 逻辑分区层通过扫描 物理分区层中对应的物理分区片段, 能够响应査询优化器、 执行引擎以及 分区管理装置对对应的物理分区片段的操作。 本发明实施例由于采用了将 数据分区为物理分区层和逻辑分区层的分层方法, 能够避免调整分区导致 的大规模数据迁移, 因而能够降低分区调整的代价。
图 9为本发明数据库系统实施例二的结构示意图, 如图 9所示其中, 所述物理分区层 801, 例如包括图 1中的 Pu, ……至 Pln, P..., Pa , …… 至 Pim ; 所述逻辑分区层 802, 例如包括图 1中的 Li , L2, ……至 Li ; 所 述逻辑分区片段通过映射关系与对应的所述物理分区片段相映射; 逻辑分 区层中的元组可以与物理分区层中的元组相联系, 具体可以通过元组路由 (图中未示出) 相联系。
优选的, 所述系统还可以包括:
分区管理装置 804, 用于根据收集的所述物理分区层和所述逻辑分区 层的分区信息, 对当前数据分区进行合理性判断; 如果判断当前数据分区 不合理, 则对不合理的所述逻辑分区片段或所述物理分区片段进行调整。
进一歩地, 所述数据操作指令具体可以为: 査询指令, 相应地, 所述 处理模块 803可以包括:
査询优化器 8031, 用于接收所述査询指令, 根据所述査询指令确定基 于分区表的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉 及的逻辑分区片段, 并生成执行计划;
执行引擎 8032, 用于根据所述査询优化器 8031生成的执行计划, 触 发所述逻辑分区层根据所述査询指令和所述映射关系确定所述物理分区 层中与确定出的所述逻辑分区片段对应的所述物理分区片段, 以得到所述 査询指令请求査询的数据记录, 并将得到的所述数据记录作为査询结果返 回给用户。
优选的, 所述执行引擎 8032具体可以用于:
根据所述査询优化器生成的执行计划和所述査询条件, 从所述逻辑分 区片段中循环取得数据记录, 并根据所述映射关系从与确定出的所述逻辑 分区片段从对应的所述物理分区片段取得数据记录;
根据所述査询指令中的信息对所述取得的数据记录进行数据筛选, 将 所述数据筛选获得的数据记录作为査询结果返回给用户。
进一歩地, 所述数据操作指令具体可以为: 插入、 删除或更新指令; 相应地, 所述处理模块 803具体可以用于:
根据所述插入、 删除或更新指令, 确定待插入、 删除或更新的数据记 录;
根据所述数据记录的所在集合, 计算所述数据记录所属的所述逻辑分 区片段以及与所述逻辑分区片段对应的物理分区片段;
根据所述逻辑分区层对计算得出的所述物理分区片段内的数据记录 的执行插入、 删除或更新操作。 图 10为本发明数据库系统中分区管理装置的结构示意图, 如图 9所 示, 优选的, 所述分区管理装置 804具体可以包括:
分区信息收集单元 8041,用于收集与分区调整相关的数据库系统运行 的信息;
其中, 所述数据库系统运行信息包括: 逻辑分区和物理分区的元组的 数据信息、 所述逻辑分区片段和所述物理分区片段的数据规模、 外设提交 给数据库服务器的 SQL语句以及执行频率、 数据库服务器执行 SQL语句 的性能;
分区决策单元 8042, 用于利用所述所述数据库系统运行的信息, 判断 当前数据分区是否合理; 当判断出当前数据分区不合理时, 重新规划分区 方案;
分区调整单元 8043, 用于根据所述重新规划的分区方案, 对所述逻辑 分区片段和 \或所述物理分区片段进行调整。
优选的, 所述分区决策单元 8042具体用于:
根据所述分区信息收集单元 8041收集的所述数据库系统运行的信息 中的特征值判断当前数据分区是否合理, 所述特征值包括分区片段的数据 规模上限、 偏斜因子阈值, 其中, 如果所述特征值超过相应特征值的预设 值, 则判断当前数据分区不合理。
优选的, 所述分区调整单元 8043具体用于:
根据所述重新规划的分区方案, 通过对所述逻辑分区片段的增加、 删 除、 合并、 拆分调整所述逻辑分区层, 和\或, 通过对所述物理分区片段的 拆分进行调整。
优选的, 一个所述逻辑分区片段对应一个或多个所述物理分区片段。 本发明上述各实施例中, 数据分区系统为分区表自动选择合理的分区 策略和分区粒度, 避免了因为 DBA采用一个不合理的分区可能导致分区 性能很低;随着数据量的增长,数据分区系统根据数据变化自动调整分区, 在不需要 DBA干预的前提下能及时发现不合理的分区并进行及时调整; 同时, 数据分区系统利用物理分区和逻辑分区分层思想, 大多数情况下避 免了因为分区调整导致的大规模的数据迁移, 极大降低分区调整的代价; 数据分区系统的自动分区调整将 DBA从复杂的分区管理中解脱出来, 由 于分区调整的依据是数据库系统运行的实时信息, 所以可以保证调整后的 分区策略是合理的。 而且不需要长时间的验证; 因为引入了分层的数据 分区, 所以可以充分利用存储集群系统的性能优势, 将逻辑分区层以下的 部分在存储集群上实现, 极大提高了査询效率。
综上所述, 本发明实施例提供的数据库管理方法与系统, 通过将数据 分区分为物理分区层和逻辑分区层; 所述逻辑分区片段与对应的所述物理 分区片段相映射; 所述逻辑分区层通过扫描所述物理分区层中对应的所述 物理分区片段, 能够响应査询优化器、 执行引擎以及分区管理装置对对应 的所述物理分区片段的操作。 本发明实施例可以根据数据变化自动调整分 区, 能够提高分区的合理性以及数据库系统的管理性能。
本领域普通技术人员可以理解: 实现上述各方法实施例的全部或部分 歩骤可以通过程序指令相关的硬件来完成。 前述的程序可以存储于一计算 机可读取存储介质中。 该程序在执行时, 执行包括上述各方法实施例的歩 骤; 而前述的存储介质包括: ROM、 RAM, 磁碟或者光盘等各种可以存 储程序代码的介质。
最后应说明的是: 以上各实施例仅用以说明本发明实施例的技术方案, 而非对其限制; 尽管参照前述各实施例对本发明实施例进行了详细的说明, 本领域的普通技术人员应当理解: 其依然可以对前述各实施例所记载的技术 方案进行修改, 或者对其中部分或者全部技术特征进行等同替换; 而这些修 改或者替换, 并不使相应技术方案的本质脱离本发明实施例各实施例技术方 案的范围。

Claims

权 利 要 求 书
1、 一种数据库管理方法, 其特征在于, 包括:
数据库系统将数据分区划分为物理分区层和逻辑分区层,
其中, 所述物理分区层, 用于存储所述数据库系统的物理分区片段, 所述物理分区片段为分区表的各条数据记录的集合; 所述逻辑分区层, 用 于存储所述数据库系统的逻辑分区片段, 以提供所述分区表的逻辑分区信 息; 所述逻辑分区片段与所述物理分区片段通过映射关系互相映射;
所述数据库系统接收用户的数据操作指令, 确定所述数据操作指令所 请求操作的数据记录所属的逻辑分区片段;
所述数据库系统根据所述数据操作指令和所述映射关系确定所述物 理分区层中与确定出的所述逻辑分区片段对应的物理分区片段, 以对所述 数据操作指令请求操作的数据记录进行操作。
2、 根据权利要求 1所述的方法, 其特征在于, 还包括:
所述数据库系统根据所述物理分区层和所述逻辑分区层的分区信息, 对当前数据分区进行合理性判断, 如果判断当前数据分区不合理, 则所述 数据库系统对不合理的所述逻辑分区片段或所述物理分区片段进行调整。
3、 根据权利要求 1或 2所述的方法, 其特征在于, 所述数据操作指 令具体为: 査询指令; 相应地, 所述根据所述数据操作指令和所述映射关 系确定所述物理分区层中与确定出的所述逻辑分区片段对应的所述物理 分区片段, 以对所述数据操作指令请求操作的数据记录进行操作, 包括: 所述数据库系统接收所述査询指令, 根据所述査询指令确定基于分区 表的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉及的逻 辑分区片段, 并生成执行计划;
所述数据库系统根据所述执行计划, 根据所述査询指令和所述映射关 系确定所述物理分区层中与确定出的所述逻辑分区片段对应的所述物理 分区片段, 以得到所述査询指令请求査询的数据记录, 并将得到的所述数 据记录作为査询结果返回给用户。
4、 根据权利要求 3所述的方法, 其特征在于, 所述数据库系统根据 所述执行计划, 触发所述逻辑分区层根据所述査询指令和所述映射关系扫 描所述物理分区层中对应的所述物理分区片段, 以执行相应的査询操作, 具体包括:
所述数据库系统根据所述执行计划和所述査询条件, 从所述逻辑分区 片段中循环取得数据记录, 并根据所述映射关系从与确定出的所述逻辑分 区片段对应的所述物理分区片段取得数据记录;
所述数据库系统根据所述査询指令中的信息对所述取得的数据记录 进行数据筛选, 将所述数据筛选获得的数据记录作为査询结果返回给用 户。
5、 根据权利要求 1或 2所述的方法, 其特征在于, 所述数据操作指 令具体为: 插入、 删除或更新指令; 相应地, 所述根据所述数据操作指令 和所述映射关系确定所述物理分区层中与确定出的所述逻辑分区片段对 应的所述物理分区片段, 以对所述数据操作指令请求操作的数据记录进行 操作, 具体包括:
所述数据库系统根据所述插入、 删除或更新指令, 确定待插入、 删除 或更新的数据记录;
所述数据库系统根据所述数据记录的所在集合计算所述数据记录所 属的所述逻辑分区片段以及与所述逻辑分区片段对应的物理分区片段; 所述数据库系统对计算得出的所述物理分区片段内的数据执行插入、 删除或更新操作。
6、 根据权利要求 2所述的方法, 其特征在于, 所述数据库系统根据 所述物理分区层和所述逻辑分区层的分区信息, 对当前数据分区进行合理 性判断, 如果判断当前数据分区不合理, 则所述数据库系统对不合理的所 述逻辑分区片段或所述物理分区片段进行调整, 包括:
所述数据库系统收集与分区调整相关的数据库系统运行的信息; 其中, 所述数据库系统运行的信息包括: 逻辑分区和物理分区的元组 的数据信息、 所述逻辑分区片段和所述物理分区片段的数据规模、 外设提 交给数据库服务器的 SQL语句以及执行频率、 数据库服务器执行 SQL语 句的性能;
所述数据库系统利用所述数据库系统运行的信息, 判断当前数据分区 是否合理;
当判断出当前数据分区不合理时, 所述数据库系统重新规划分区方 案;
所述数据库系统根据所述重新规划的分区方案, 对所述逻辑分区片段 和\或所述物理分区片段进行调整。
7、 根据权利要求 6所述的方法, 其特征在于, 所述数据库系统利用 所述数据库系统运行的信息, 判断当前数据分区是否合理, 包括:
所述数据库系统根据所述数据库系统运行的信息中的特征值判断当 前数据分区是否合理, 所述特征值包括分区片段的数据规模上限、 偏斜因 子阈值, 其中, 如果所述特征值超过相应特征值的预设值, 则判断当前数 据分区不合理。
8、 根据权利要求 6所述的方法, 其特征在于, 所述数据库系统根据 所述重新计算的所述逻辑分区层和所述物理分区层对所述逻辑分区片段 和\或所述物理分区片段进行调整, 包括:
所述数据库系统根据所述重新规划的分区方案, 通过对所述逻辑分区 片段的增加、 删除、 合并、 拆分调整所述逻辑分区层, 和\或, 通过对所述 物理分区片段的拆分调整所述物理分区层。
9、 根据权利要求 1-8任一项所述的方法, 其特征在于, 一个所述逻辑 分区片段对应一个或多个所述物理分区片段。
10、 一种数据库系统, 其特征在于, 包括:
物理分区层, 用于存储所述数据库系统的物理分区片段, 所述物理分 区片段为分区表的各条数据记录的集合;
逻辑分区层, 用于存储所述数据库系统的逻辑分区片段, 以提供分区 表的逻辑分区信息; 所述逻辑分区片段与所述物理分区片段通过映射关系 互相映射;
处理模块, 用于接收用户的数据操作指令, 确定所述数据操作指令所 请求操作的数据记录所属的逻辑分区片段; 根据所述数据操作指令和所述 映射关系确定所述物理分区层中与确定出的所述逻辑分区片段对应的所 述物理分区片段, 以对所述数据操作指令请求操作的数据记录进行操作。
11、 根据权利要求 10所述的系统, 其特征在于, 还包括:
分区管理装置, 用于根据收集的所述物理分区层和所述逻辑分区层的 分区信息, 对当前数据分区进行合理性判断; 如果判断当前数据分区不合 理, 则对不合理的所述逻辑分区片段或所述物理分区片段进行调整。
12、 根据权利要求 10或 11所述的系统, 其特征在于, 所述数据操作 指令具体为: 査询指令; 相应地, 所述处理模块包括:
査询优化器, 用于接收所述査询指令, 根据所述査询指令确定基于分 区表的査询条件, 对所述逻辑分区片段进行分区剪枝, 排除査询不涉及的 逻辑分区片段, 并生成执行计划;
执行引擎, 用于根据所述査询优化器生成的执行计划, 触发所述逻辑 分区层根据所述査询指令和所述映射关系确定所述物理分区层中与确定 出的所述逻辑分区片段对应的所述物理分区片段, 以得到所述査询指令请 求査询的数据记录, 并将得到的所述数据记录作为査询结果返回给用户。
13、 根据权利要求 12所述的系统, 其特征在于, 所述执行引擎具体 用于:
根据所述査询优化器生成的执行计划和所述査询条件, 从所述逻辑分 区片段中循环取得数据记录, 并根据所述映射关系从与确定出的所述逻辑 分区片段从对应的所述物理分区片段取得数据记录;
根据所述査询指令中的信息对所述取得的数据记录进行数据筛选, 将 所述数据筛选获得的数据记录作为査询结果返回给用户。
14、 根据权利要求 10或 11所述的系统, 其特征在于, 所述数据操作 指令具体为: 插入、 删除或更新指令; 相应地, 所述处理模块体用于: 根据所述插入、 删除或更新指令, 确定待插入、 删除或更新的数据记 录;
根据所述数据记录的所在集合, 计算所述数据记录所属的所述逻辑分 区片段以及与所述逻辑分区片段对应的物理分区片段;
根据所述逻辑分区层对对计算得出的所述物理分区片段内的数据记 录执行插入、 删除或更新操作。
15、 根据权利要求 11所述的系统, 其特征在于, 所述分区管理装置 具体包括:
分区信息收集单元, 用于收集与分区调整相关的数据库系统运行的信 息;
其中, 所述数据库系统运行信息包括: 逻辑分区和物理分区的元组的 数据信息、 所述逻辑分区片段和所述物理分区片段的数据规模、 外设提交 给数据库服务器的 SQL语句以及执行频率、 数据库服务器执行 SQL语句 的性能;
分区决策单元, 用于利用所述数据库系统运行的信息, 判断当前数据 分区是否合理; 当判断出当前数据分区不合理时, 重新规划分区方案; 分区调整单元, 用于根据所述重新规划的分区方案, 对所述逻辑分区 片段和 \或所述物理分区片段进行调整。
16、 根据权利要求 15所述的系统, 其特征在于, 所述分区决策单元 具体用于:
根据所述分区信息收集单元收集的所述数据库系统运行的信息中的 特征值判断当前数据分区是否合理, 所述特征值包括分区片段的数据规模 上限、 偏斜因子阈值, 其中, 如果所述特征值超过相应特征值的预设值, 则判断当前数据分区不合理。
17、 根据权利要求 15所述的系统, 其特征在于, 所述分区调整单元 具体用于:
根据所述重新规划的分区方案, 通过对所述逻辑分区片段的增加、 删 除、 合并、 拆分调整所述逻辑分区层, 和\或, 通过对所述物理分区片段的 拆分进行调整。
18、 根据权利要求 10-17任一项所述的系统, 其特征在于, 一个所述逻 辑分区片段对应一个或多个所述物理分区片段。
PCT/CN2014/076481 2013-09-12 2014-04-29 数据库管理方法与系统 WO2015035789A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP14843923.5A EP2990962A4 (en) 2013-09-12 2014-04-29 METHOD AND SYSTEM FOR MANAGING THE DATABASE
US14/956,666 US9460186B2 (en) 2013-09-12 2015-12-02 Database management method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310416229.9 2013-09-12
CN2013104162299A CN103473321A (zh) 2013-09-12 2013-09-12 数据库管理方法与系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/956,666 Continuation US9460186B2 (en) 2013-09-12 2015-12-02 Database management method and system

Publications (1)

Publication Number Publication Date
WO2015035789A1 true WO2015035789A1 (zh) 2015-03-19

Family

ID=49798169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/076481 WO2015035789A1 (zh) 2013-09-12 2014-04-29 数据库管理方法与系统

Country Status (4)

Country Link
US (1) US9460186B2 (zh)
EP (1) EP2990962A4 (zh)
CN (1) CN103473321A (zh)
WO (1) WO2015035789A1 (zh)

Families Citing this family (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473321A (zh) 2013-09-12 2013-12-25 华为技术有限公司 数据库管理方法与系统
US10282437B2 (en) * 2014-04-17 2019-05-07 Oracle International Corporation Partial indexes for partitioned tables
US9965497B2 (en) * 2014-05-29 2018-05-08 Oracle International Corporation Moving data between partitions
WO2015188299A1 (zh) * 2014-06-09 2015-12-17 华为技术有限公司 一种数据处理方法及装置
KR102177190B1 (ko) 2014-06-30 2020-11-10 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 유연한 스키마를 사용한 데이터 관리
US10394818B2 (en) 2014-09-26 2019-08-27 Oracle International Corporation System and method for dynamic database split generation in a massively parallel or distributed database environment
US10089377B2 (en) 2014-09-26 2018-10-02 Oracle International Corporation System and method for data transfer from JDBC to a data warehouse layer in a massively parallel or distributed database environment
US10380114B2 (en) 2014-09-26 2019-08-13 Oracle International Corporation System and method for generating rowid range-based splits in a massively parallel or distributed database environment
US10180973B2 (en) 2014-09-26 2019-01-15 Oracle International Corporation System and method for efficient connection management in a massively parallel or distributed database environment
US10387421B2 (en) 2014-09-26 2019-08-20 Oracle International Corporation System and method for generating size-based splits in a massively parallel or distributed database environment
US10528596B2 (en) 2014-09-26 2020-01-07 Oracle International Corporation System and method for consistent reads between tasks in a massively parallel or distributed database environment
US10089357B2 (en) * 2014-09-26 2018-10-02 Oracle International Corporation System and method for generating partition-based splits in a massively parallel or distributed database environment
US10078684B2 (en) 2014-09-26 2018-09-18 Oracle International Corporation System and method for query processing with table-level predicate pushdown in a massively parallel or distributed database environment
CN104536904B (zh) * 2014-12-29 2018-01-09 杭州华为数字技术有限公司 一种数据管理的方法、设备与系统
CN105447112B (zh) * 2015-11-12 2019-02-01 国云科技股份有限公司 一种实现关系数据库Hash分区高效扩展的方法
CN105718539B (zh) * 2016-01-18 2019-02-19 浪潮通用软件有限公司 一种数据库应用方法及装置
CN107085570B (zh) * 2016-02-14 2020-08-14 华为技术有限公司 数据处理方法、应用服务器和路由器
CN106446153A (zh) * 2016-09-21 2017-02-22 广州特道信息科技有限公司 一种分布式newSQL数据库系统及方法
US11461334B2 (en) * 2016-09-26 2022-10-04 Splunk Inc. Data conditioning for dataset destination
US11550847B1 (en) 2016-09-26 2023-01-10 Splunk Inc. Hashing bucket identifiers to identify search nodes for efficient query execution
US11599541B2 (en) 2016-09-26 2023-03-07 Splunk Inc. Determining records generated by a processing task of a query
US11604795B2 (en) 2016-09-26 2023-03-14 Splunk Inc. Distributing partial results from an external data system between worker nodes
US11126632B2 (en) 2016-09-26 2021-09-21 Splunk Inc. Subquery generation based on search configuration data from an external data system
US20180089324A1 (en) 2016-09-26 2018-03-29 Splunk Inc. Dynamic resource allocation for real-time search
US11232100B2 (en) 2016-09-26 2022-01-25 Splunk Inc. Resource allocation for multiple datasets
US11243963B2 (en) 2016-09-26 2022-02-08 Splunk Inc. Distributing partial results to worker nodes from an external data system
US11620336B1 (en) 2016-09-26 2023-04-04 Splunk Inc. Managing and storing buckets to a remote shared storage system based on a collective bucket size
US11222066B1 (en) 2016-09-26 2022-01-11 Splunk Inc. Processing data using containerized state-free indexing nodes in a containerized scalable environment
US11003714B1 (en) 2016-09-26 2021-05-11 Splunk Inc. Search node and bucket identification using a search node catalog and a data store catalog
US11106734B1 (en) 2016-09-26 2021-08-31 Splunk Inc. Query execution using containerized state-free search nodes in a containerized scalable environment
US11023463B2 (en) 2016-09-26 2021-06-01 Splunk Inc. Converting and modifying a subquery for an external data system
US10776355B1 (en) 2016-09-26 2020-09-15 Splunk Inc. Managing, storing, and caching query results and partial query results for combination with additional query results
US10353965B2 (en) 2016-09-26 2019-07-16 Splunk Inc. Data fabric service system architecture
US11580107B2 (en) 2016-09-26 2023-02-14 Splunk Inc. Bucket data distribution for exporting data to worker nodes
US11593377B2 (en) 2016-09-26 2023-02-28 Splunk Inc. Assigning processing tasks in a data intake and query system
US11281706B2 (en) 2016-09-26 2022-03-22 Splunk Inc. Multi-layer partition allocation for query execution
US11250056B1 (en) 2016-09-26 2022-02-15 Splunk Inc. Updating a location marker of an ingestion buffer based on storing buckets in a shared storage system
US10984044B1 (en) 2016-09-26 2021-04-20 Splunk Inc. Identifying buckets for query execution using a catalog of buckets stored in a remote shared storage system
US11321321B2 (en) 2016-09-26 2022-05-03 Splunk Inc. Record expansion and reduction based on a processing task in a data intake and query system
US11314753B2 (en) 2016-09-26 2022-04-26 Splunk Inc. Execution of a query received from a data intake and query system
US10977260B2 (en) 2016-09-26 2021-04-13 Splunk Inc. Task distribution in an execution node of a distributed execution environment
US11562023B1 (en) 2016-09-26 2023-01-24 Splunk Inc. Merging buckets in a data intake and query system
US11269939B1 (en) 2016-09-26 2022-03-08 Splunk Inc. Iterative message-based data processing including streaming analytics
US11294941B1 (en) 2016-09-26 2022-04-05 Splunk Inc. Message-based data ingestion to a data intake and query system
US11860940B1 (en) 2016-09-26 2024-01-02 Splunk Inc. Identifying buckets for query execution using a catalog of buckets
US11163758B2 (en) 2016-09-26 2021-11-02 Splunk Inc. External dataset capability compensation
US11586627B2 (en) 2016-09-26 2023-02-21 Splunk Inc. Partitioning and reducing records at ingest of a worker node
US11615104B2 (en) 2016-09-26 2023-03-28 Splunk Inc. Subquery generation based on a data ingest estimate of an external data system
US11416528B2 (en) 2016-09-26 2022-08-16 Splunk Inc. Query acceleration data store
US11663227B2 (en) 2016-09-26 2023-05-30 Splunk Inc. Generating a subquery for a distinct data intake and query system
US10956415B2 (en) 2016-09-26 2021-03-23 Splunk Inc. Generating a subquery for an external data system using a configuration file
US11874691B1 (en) 2016-09-26 2024-01-16 Splunk Inc. Managing efficient query execution including mapping of buckets to search nodes
US11567993B1 (en) 2016-09-26 2023-01-31 Splunk Inc. Copying buckets from a remote shared storage system to memory associated with a search node for query execution
US11442935B2 (en) 2016-09-26 2022-09-13 Splunk Inc. Determining a record generation estimate of a processing task
US10860571B2 (en) * 2016-11-04 2020-12-08 Sap Se Storage and pruning for faster access of a document store
CN107301023A (zh) * 2017-06-29 2017-10-27 郑州云海信息技术有限公司 一种固态盘配置信息管理方法和装置
US11921672B2 (en) 2017-07-31 2024-03-05 Splunk Inc. Query execution at a remote heterogeneous data store of a data fabric service
US11151137B2 (en) 2017-09-25 2021-10-19 Splunk Inc. Multi-partition operation in combination operations
US10896182B2 (en) 2017-09-25 2021-01-19 Splunk Inc. Multi-partitioning determination for combination operations
CN108108434A (zh) * 2017-12-19 2018-06-01 福建中金在线信息科技有限公司 一种管理数据库的方法及装置
US11334543B1 (en) 2018-04-30 2022-05-17 Splunk Inc. Scalable bucket merging for a data intake and query system
CN110825794B (zh) 2018-08-14 2022-03-29 华为云计算技术有限公司 分区合并方法和数据库服务器
WO2020220216A1 (en) 2019-04-29 2020-11-05 Splunk Inc. Search time estimate in data intake and query system
US11715051B1 (en) 2019-04-30 2023-08-01 Splunk Inc. Service provider instance recommendations using machine-learned classifications and reconciliation
EP3859551A4 (en) * 2019-07-24 2021-11-24 Huawei Technologies Co., Ltd. METHOD AND APPARATUS FOR ESTABLISHING AN INDEX FOR DATA
US11494380B2 (en) 2019-10-18 2022-11-08 Splunk Inc. Management of distributed computing framework components in a data fabric service system
US11922222B1 (en) 2020-01-30 2024-03-05 Splunk Inc. Generating a modified component for a data intake and query system using an isolated execution environment image
US11704313B1 (en) 2020-10-19 2023-07-18 Splunk Inc. Parallel branch operation using intermediary nodes
US11308066B1 (en) 2021-02-24 2022-04-19 International Business Machines Corporation Optimized database partitioning
CN117668003B (zh) * 2024-02-01 2024-05-03 福建省华大数码科技有限公司 实现数据库中集合数据类型的数据处理方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101876983A (zh) * 2009-04-30 2010-11-03 国际商业机器公司 数据库分区方法与系统
US20120158722A1 (en) * 2010-12-15 2012-06-21 Teradata Us, Inc. Database partition management
CN103473321A (zh) * 2013-09-12 2013-12-25 华为技术有限公司 数据库管理方法与系统

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666524A (en) 1994-08-31 1997-09-09 Price Waterhouse Llp Parallel processing system for traversing a transactional database
US7124136B2 (en) 2001-01-11 2006-10-17 America Online, Incorporated Method and system for processing data in a multi-database system
US6931395B2 (en) 2001-10-25 2005-08-16 International Business Machines Corporation Method and apparatus for optimizing queries in a logically partitioned computer system
US6792429B2 (en) 2001-12-19 2004-09-14 Hewlett-Packard Development Company, L.P. Method for fault tolerant modification of data representation in a large database
US7562090B2 (en) * 2002-12-19 2009-07-14 International Business Machines Corporation System and method for automating data partitioning in a parallel database
CN100504854C (zh) * 2003-01-14 2009-06-24 联想(北京)有限公司 文件管理方法
US8145872B2 (en) 2004-11-08 2012-03-27 International Business Machines Corporation Autonomic self-tuning of database management system in dynamic logical partitioning environment
US20070124274A1 (en) 2005-11-30 2007-05-31 International Business Machines Corporation Apparatus and method for autonomic adjustment of resources in a logical partition to improve partitioned query performance
US7809769B2 (en) * 2006-05-18 2010-10-05 Google Inc. Database partitioning by virtual partitions
US8150870B1 (en) * 2006-12-22 2012-04-03 Amazon Technologies, Inc. Scalable partitioning in a multilayered data service framework
JP2008181243A (ja) * 2007-01-23 2008-08-07 Hitachi Ltd ストレージシステムのキャッシュパーティション領域の設定を制御するデータベース管理システム
US7792822B2 (en) 2007-03-02 2010-09-07 Microsoft Corporation Systems and methods for modeling partitioned tables as logical indexes
CN101315628B (zh) * 2007-06-01 2011-01-05 华为技术有限公司 内存数据库系统及实现内存数据库的方法和装置
CN101203019B (zh) * 2007-11-20 2011-01-05 中兴通讯股份有限公司 通讯设备逻辑地址与物理地址的互换方法及装置
US7949687B1 (en) * 2007-12-31 2011-05-24 Teradata Us, Inc. Relational database system having overlapping partitions
US8291191B2 (en) * 2008-02-13 2012-10-16 Computer Associates Think, Inc. System and method for safely automating the generation of multiple data definition language statements
CN101620600A (zh) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 一种海量数据的处理方法
US9009112B2 (en) * 2010-03-15 2015-04-14 Microsoft Technology Licensing, Llc Reorganization of data under continuous workload
CN102375779B (zh) * 2010-08-16 2015-08-19 深圳市朗科科技股份有限公司 一种数据处理方法以及数据处理模块
US9684702B2 (en) * 2010-12-07 2017-06-20 International Business Machines Corporation Database redistribution utilizing virtual partitions
US8392378B2 (en) 2010-12-09 2013-03-05 International Business Machines Corporation Efficient backup and restore of virtual input/output server (VIOS) cluster
US20120166402A1 (en) 2010-12-28 2012-06-28 Teradata Us, Inc. Techniques for extending horizontal partitioning to column partitioning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101876983A (zh) * 2009-04-30 2010-11-03 国际商业机器公司 数据库分区方法与系统
US20120158722A1 (en) * 2010-12-15 2012-06-21 Teradata Us, Inc. Database partition management
CN103473321A (zh) * 2013-09-12 2013-12-25 华为技术有限公司 数据库管理方法与系统

Also Published As

Publication number Publication date
US9460186B2 (en) 2016-10-04
EP2990962A4 (en) 2016-06-08
EP2990962A1 (en) 2016-03-02
US20160092541A1 (en) 2016-03-31
CN103473321A (zh) 2013-12-25

Similar Documents

Publication Publication Date Title
WO2015035789A1 (zh) 数据库管理方法与系统
US9189506B2 (en) Database index management
US9043372B2 (en) Metadata subsystem for a distributed object store in a network storage system
US10055480B2 (en) Aggregating database entries by hashing
US10296508B2 (en) Systems and methods to manage online analytical and transactional processing for an in-memory columnar database
US20130297570A1 (en) Method and apparatus for deleting duplicate data
US9244838B2 (en) System, method, and computer-readable medium for grouping database level object access counts per processing module in a parallel processing system
CN102169507A (zh) 一种分布式实时搜索引擎
CN106796589B (zh) 空间数据对象的索引方法和系统
JP2003150414A (ja) データベース管理システム情報を考慮したデータ再配置方法およびデータ再配置を行う計算機システム
US11030196B2 (en) Method and apparatus for processing join query
CN103942289A (zh) 一种Hadoop上面向范围查询的内存缓存方法
CN108334596B (zh) 一种面向大数据平台的海量关系数据高效并行迁移方法
EP3101556B1 (en) Indexing dynamic hierarchical data
CN108536808A (zh) 一种基于Spark计算框架的数据获取方法和装置
CN109597829B (zh) 一种实现可搜索加密关系型数据库缓存的中间件方法
WO2022246953A1 (zh) 一种面向lsm树的键值存储方法和存储系统
JP2018169644A (ja) データベース管理システム及びデータベース管理方法
CN110032676B (zh) 一种基于谓词关联的sparql查询优化方法及系统
US10558636B2 (en) Index page with latch-free access
US20220215021A1 (en) Data Query Method and Apparatus, Computing Device, and Storage Medium
JP6371136B2 (ja) データ仮想化サーバ、データ仮想化サーバにおけるクエリ処理方法及びクエリ処理プログラム
CN110162574B (zh) 数据重分布方式的确定方法、装置、服务器及存储介质
CN112632118A (zh) 查询数据的方法、装置、计算设备和存储介质
Huang et al. Partition pruning for range query on distributed log-structured merge-tree

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14843923

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2014843923

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE