WO2016206100A1 - 一种数据表的分区管理方法及装置 - Google Patents

一种数据表的分区管理方法及装置 Download PDF

Info

Publication number
WO2016206100A1
WO2016206100A1 PCT/CN2015/082513 CN2015082513W WO2016206100A1 WO 2016206100 A1 WO2016206100 A1 WO 2016206100A1 CN 2015082513 W CN2015082513 W CN 2015082513W WO 2016206100 A1 WO2016206100 A1 WO 2016206100A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
partition
copy
dimension
data
Prior art date
Application number
PCT/CN2015/082513
Other languages
English (en)
French (fr)
Inventor
刘哲
沈柯
王铁英
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2015/082513 priority Critical patent/WO2016206100A1/zh
Priority to CN201580001180.8A priority patent/CN106716400B/zh
Publication of WO2016206100A1 publication Critical patent/WO2016206100A1/zh
Priority to US15/853,952 priority patent/US11226986B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems

Definitions

  • the present invention relates to the field of database technologies, and in particular, to a partition management method and apparatus for a data table.
  • OLTP Online Transaction Processing
  • OLAP Online Analysis Processing
  • the storage system first backs up the data, and then stores the backed up data in a corresponding plurality of physical disks, wherein the DSM (Decomposition Storage Model) is a column.
  • Storage mode saves data
  • NSM N-ary storage mode
  • requests for OLAP can be routed to column copies for execution
  • requests for OLTP can be routed to row copies for execution.
  • the scheme uses different replicas for different types of queries, especially for multi-table association operations, which consumes computational complexity and data network transmission overhead.
  • the database system is internally divided into two subsystems to store system data, one data is stored in at least one of the two subsystems, one subsystem saves the data as a non-column storage structure, and the other subsystem Save the data as a column storage structure.
  • routing to different subsystems for processing For example: for adding, deleting, and changing data operations, routing to non- The column storage subsystem processes and the column storage subsystem periodically requests data synchronization from the non-column storage subsystem.
  • the two subsystems are two relatively independent, both for hardware configuration and system maintenance.
  • the invention provides a partition management method and device for a data table, which adopts different partition management methods for data tables of different types and different dimensions, thereby efficiently implementing data processing mechanisms of OLTP and OLAP in one system, thereby saving resources. Consumption.
  • a first aspect of the embodiments of the present invention provides a partition management method for a data table, where the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each The dimension table is a dimension table associated with the fact table, and the field associated with the fact table is used as an association key between the fact table and each dimension table, including:
  • the each of the data tables in the table group is associated with each of the associated key pairs in each of the data tables
  • the one-dimensional partitioning of the row copy space of the data table and the one-dimensional or multi-dimensional partitioning of the column copy space include:
  • the one-dimensional dimension table is obtained by performing one-dimensional partitioning on the column copy space and the row copy space of the one-dimensional dimension table for a single association key of the one-dimensional dimension table.
  • One-dimensional column copy partitions and one-dimensional row copy partitions are obtained by performing one-dimensional partitioning on the column copy space and the row copy space of the one-dimensional dimension table for a single association key of the one-dimensional dimension table.
  • the data is used for each of the data types according to the type of the data table in the table group
  • the association key in the table performs one-dimensional partitioning on the row copy space of each of the data tables and one-dimensional or multi-dimensional partitioning of the column copy space including:
  • the data table is a multi-dimensional dimension table, a one-dimensional column copy partition corresponding to each of the plurality of association keys of the multi-dimensional dimension table, and a one-dimensional row copy partition;
  • the The one-dimensional or multi-dimensional partitioning of the row copy space of each of the data tables and the one-dimensional or multi-dimensional partitioning of the column copy space includes:
  • the acquiring the corresponding one of each of the multiple associated keys of the multidimensional dimension table Dimension copy partitions and one-dimensional row copy partitions include:
  • each of the plurality of association keys of the multi-dimensional dimension table respectively has a column copy of the multi-dimensional dimension table
  • the space and the row copy space are subjected to one-dimensional partitioning to obtain a one-dimensional column copy partition corresponding to each of the associated keys and a one-dimensional row copy partition.
  • the each of the data tables in the table group is associated with each of the associated key pairs in each of the data tables
  • the method further includes:
  • each data table in the table group on the master node is synchronized to the slave node.
  • One- or multi-dimensional column copy partitions, creating synchronization queues include:
  • a one-dimensional synchronous queue is created according to the one-dimensional row copy partition or the one-dimensional column copy partition of the one-dimensional dimension table.
  • One- or multi-dimensional column copy partitions, creating synchronization queues include:
  • a multidimensional synchronization queue is created according to the multidimensional column copy partition of the multidimensional dimension table.
  • One- or multi-dimensional column copy partitions, creating synchronization queues include:
  • the data table is a fact table
  • a one-dimensional column copy partition of the one-dimensional dimension table associated with the fact table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table and a one-dimensional row of the fact table Copy the partition to create a multidimensional sync queue.
  • the primary node includes a primary node of a one-dimensional row replica, according to the synchronization Queue, synchronizing each data table in the table group on the primary node to the slave node includes:
  • a second aspect of the embodiments of the present invention provides a partition management apparatus for a data table, where the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each The dimension table is a dimension table associated with the fact table, and each of the dimension tables is associated with the fact table as an association key between the fact table and each of the dimension tables,
  • the device includes:
  • a determining module configured to determine a type of each data table in the table group and an association key, where the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table;
  • a partitioning module configured to perform one-dimensional partitioning and column copy space on a row copy space of each of the data tables according to an association key in each of the data tables according to a type of each of the data tables in the table group Perform one-dimensional or multi-dimensional partitioning.
  • the partitioning module is specifically configured to:
  • the one-dimensional dimension table is obtained by performing one-dimensional partitioning on the column copy space and the row copy space of the one-dimensional dimension table for a single association key of the one-dimensional dimension table.
  • One-dimensional column copy partitions and one-dimensional row copy partitions are obtained by performing one-dimensional partitioning on the column copy space and the row copy space of the one-dimensional dimension table for a single association key of the one-dimensional dimension table.
  • the partitioning module is specifically configured to:
  • the data table is a multi-dimensional dimension table, a one-dimensional column copy partition corresponding to each of the plurality of association keys of the multi-dimensional dimension table, and a one-dimensional row copy partition;
  • a row copy partition is used as a one-dimensional row copy partition of the multidimensional dimension table.
  • the partitioning module is specifically configured to:
  • the partitioning module is specifically configured to:
  • each of the plurality of association keys of the multi-dimensional dimension table respectively has a column copy of the multi-dimensional dimension table
  • the space and the row copy space are subjected to one-dimensional partitioning to obtain a one-dimensional column copy partition corresponding to each of the associated keys and a one-dimensional row copy partition.
  • the device further includes:
  • a creating module configured to create a synchronization queue according to the obtained one-dimensional row copy partition of the data table, or the obtained one-dimensional or multi-dimensional column copy partition of the data table;
  • a synchronization module configured to: according to the synchronization queue, each data in the table group on the primary node The table is synchronized to the slave node.
  • the creating module is specifically configured to:
  • a one-dimensional synchronous queue is created according to the one-dimensional row copy partition or the one-dimensional column copy partition of the one-dimensional dimension table.
  • the creating module is specifically configured to:
  • a multidimensional synchronization queue is created according to the multidimensional column copy partition of the multidimensional dimension table.
  • the creating module is specifically configured to:
  • the data table is a fact table
  • a one-dimensional column copy partition of the one-dimensional dimension table associated with the fact table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table and a one-dimensional row of the fact table Copy the partition to create a multidimensional sync queue.
  • the primary node includes a primary node of a one-dimensional row replica
  • the synchronization module is specifically used to:
  • a third aspect of the embodiments of the present invention provides another partition management apparatus for a data table, where the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each The dimension table is a dimension table associated with the fact table, and a field associated with the fact table of each dimension table is used as an association key between the fact table and each dimension table.
  • the apparatus includes a network interface, a memory, and a processor, wherein the memory stores a set of program code, and the processor is configured to call program code stored in the memory for causing the apparatus to perform the method of the above implementation.
  • the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table; then, according to the type of each data table in the table group, the row copy space of each data table is performed for the associated key in each data table.
  • One-dimensional or multi-dimensional partitioning is performed on one-dimensional partitions and column copy spaces. Therefore, different partition management methods are adopted for data tables of different types and different dimensions, thereby realizing the data processing mechanism of OLTP and OLAP efficiently in one system, thereby saving resource consumption.
  • FIG. 1 is a schematic structural diagram of a database provided by a prior art solution of the present invention.
  • FIG. 2 is a schematic structural diagram of a data storage provided by a prior art solution of the present invention.
  • FIG. 3 is a schematic flow chart of a method for partition management of a data table according to a first embodiment of the present invention
  • FIG. 4A is a schematic structural diagram of a table group according to an embodiment of the present invention.
  • FIG. 4B is a schematic structural diagram of another watch group according to an embodiment of the present invention.
  • FIG. 5A is a schematic diagram of partitioning of a one-dimensional dimension table according to an embodiment of the present invention.
  • FIG. 5B is a schematic diagram of partitioning of a multidimensional dimension table according to an embodiment of the present invention.
  • FIG. 5C is a schematic diagram of partitioning of another multidimensional dimension table according to an embodiment of the present invention.
  • 5D is a schematic diagram of partitioning of a fact table according to an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a method for partition management of a data table according to a second embodiment of the present invention.
  • FIG. 7A is a schematic diagram of a synchronization queue for creating a one-dimensional dimension table according to an embodiment of the present invention.
  • FIG. 7B is a schematic diagram of a synchronization queue for creating a multidimensional dimension table according to an embodiment of the present invention.
  • FIG. 7C is a schematic diagram of a synchronization queue for creating a fact table according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a partition management apparatus for a data table according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a network structure of a database system according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another partition management apparatus for a data table according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for partition management of a data table according to a first embodiment of the present invention.
  • the executor in the embodiment of the present invention may be a management node in a database cluster.
  • the method in the embodiment of the present invention includes:
  • S301 Determine a type of each data table in the table group and an association key, and the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table.
  • the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each of the dimension tables is a dimension table associated with the fact table. a field associated with the fact table for each dimension table as an association key between the fact table and each of the dimension tables.
  • the table saved by the database can be divided into a fact table and a dimension table according to the saved content.
  • a fact table is an intersection of multiple dimension tables that contains data describing a particular event within a business (such as a bank transaction or product sale) for storing at least one fact record, each fact record corresponding to a row in the fact table.
  • a fact table can be associated with multiple dimension tables, and the dimension table is used to analyze the facts.
  • the entries in the dimension table are used to describe the data in the fact table, and the dimensional characteristics of the fact records in the fact table are recorded. For example, to analyze product sales, you can choose to analyze by product category or sales time. Therefore, you can first create a product dimension table and a time dimension table, and then aggregate the product dimension table and the time dimension table into one table to get the fact table.
  • the fact table can include product categories, sales hours, total sales, and so on.
  • the fact table in the table group is associated with each dimension table in the table group. Connect two The keys of the data tables are also often referred to as association keys, and the fact table and the dimension table are associated by the association key.
  • the table includes a dimension table PART, a dimension table SUPPLIER, a dimension table PARTSUPP, and a fact table LINEITEM.
  • the dimension table PART, the dimension table SUPPLIER, and the dimension table PARTSUPP are all dimension tables of the fact table LINEITEM, wherein the dimension table PART is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key PARTKEY.
  • the dimension table SUPPLIER is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key SUPPKEY.
  • the dimension table PART and the dimension table SUPPLIER have only one associated key associated with the fact table.
  • the dimension table PARTSUPP is associated with the fact table LINEITEM through the association key PARTKEY and the association key SUPPKEY, and the dimension table PARTSUPP is called a two-dimensional dimension table.
  • the dimension table in the table group is not limited to include a one-dimensional dimension table or a two-dimensional dimension table, and may also include a three-dimensional dimension table, a four-dimensional dimension table, and the like.
  • the data table is a one-dimensional dimension table
  • the interval range of the one-dimensional row copy partition of one dimension table is consistent with the range range of the one-dimensional column copy partition.
  • the number of nodes between the row copy partitions may be selected according to the size of the data volume, and the partition intervals of each column copy partition are regarded as one logical node.
  • the dimension table PART and the dimension table SUPPLIER are one-dimensional dimension tables
  • the association key of the dimension table PART is PARTKEY
  • the association key of the dimension table SUPPLIER is SUPPKEY
  • the dimension table PART is for the association key.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table PART obtained by PARTKEY for one-dimensional partitioning are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the dimension table SUPPLIER is performed for the associated key SUPPKEY.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table SUPPLIER obtained by the one-dimensional partition are both (S_0, S_1) / (S_1, S_2).
  • the dimension table PARTSUPP is a two-dimensional dimension table
  • the two-dimensional dimension table PARTSUPP is associated with the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER, respectively.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the one-dimensional dimension table PART are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the one-dimensional dimension table SUPPLIER
  • the one-dimensional row copy partition and the one-dimensional column copy partition are both (S_0, S_1)/(S_1, S_2), so each association in the dimension table PARTSUPP can be obtained directly from the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER.
  • the one-dimensional column copy partition corresponding to the key and the one-dimensional row copy partition are Alternatively, as shown in FIG. 4B and FIG.
  • the dimension table PARTSUPP is a two-dimensional dimension table, and the dimension table PARTSUPP is associated with the fact table LINEITEM through the association keys PARTKEY and SUPPKEY, and the dimension table PARTSUPP is not associated with the one-dimensional dimension table. Therefore, one-dimensional partitioning and one-dimensional row copy partitioning (P_0, P_1)/(P_1, P_2)/(P_2, P_3) can be obtained for the one-dimensional partitioning of the association key PARTKEY of the dimension table PARTSUPP, and the related key SUPPKEY is performed.
  • the one-dimensional partition obtains a one-dimensional column copy partition and a one-dimensional line copy partition (S_0, S_1) / (S_1, S_2).
  • the one-dimensional column copy partition (P_0, P_1) / (P_1, P_2) / (P_2, P_3) and the one-dimensional column copy partition (S_0, S_1) / (S_1, S_2) are combined to obtain the multidimensional of the dimension table PARTSUPP
  • Column copy partition (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) / (P_1, P_2) (S_0, S_1) / (P_1, P_2) (S_1, S_2) / (P_2, P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2) finally, you can select the associated key PARTKEY Or the one-dimensional row copy partition corresponding to the SUPPKEY as the one-dimensional row copy partition of the dimension table PARTSUPP.
  • the data table is a fact table
  • obtain a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table associated with the fact table select one from the fact table
  • the other fields except the associated key are used as partition keys, and the one-dimensional partition partition and the one-dimensional row copy partition are obtained by one-dimensional partitioning of the column copy space and the row copy interval of the fact table for the partition key;
  • the one-dimensional column copy partition corresponding to the partition key and the one-dimensional column copy partition of the one-dimensional dimension table and/or the multi-dimensional column copy partition of the multidimensional dimension table associated with the fact table are combined to obtain the fact table.
  • the multi-dimensional column copy partition; one-dimensional partitioning of the one-dimensional row copy partition corresponding to the partition key to obtain a one-dimensional row copy partition of the fact table.
  • the multi-dimensional column copy partition (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) of the dimension table PARTSUPP associated with the fact table LINEITEM may be first acquired.
  • the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table; and then according to each data in the table group
  • the type of the table, one-dimensional partitioning of the row copy space of each data table and one-dimensional or multi-dimensional partitioning of the column copy space for each data table associated key Therefore, different partition management methods are adopted for data tables of different types and different dimensions, thereby implementing efficiently in one system.
  • the data processing mechanism of OLTP and OLAP saves resource consumption.
  • FIG. 6 is a schematic flowchart of a method for partition management of a data table according to a second embodiment of the present invention.
  • the executor in the embodiment of the present invention may be a management node in a database cluster.
  • the method in the embodiment of the present invention includes:
  • S601. Determine a type of each data table in the table group and an association key, where the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table.
  • the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each of the dimension tables is a dimension table associated with the fact table. a field associated with the fact table for each dimension table as an association key between the fact table and each of the dimension tables.
  • the table saved by the database can be divided into a fact table and a dimension table according to the saved content.
  • a fact table is an intersection of multiple dimension tables that contains data describing a particular event within a business (such as a bank transaction or product sale) for storing at least one fact record, each fact record corresponding to a row in the fact table.
  • a fact table can be associated with multiple dimension tables, and the dimension table is used to analyze the facts.
  • the entries in the dimension table are used to describe the data in the fact table, and the dimensional characteristics of the fact records in the fact table are recorded. For example, to analyze product sales, you can choose to analyze by product category or sales time. Therefore, you can first create a product dimension table and a time dimension table, and then aggregate the product dimension table and the time dimension table into one table to get the fact table.
  • the fact table can include product categories, sales hours, total sales, and so on.
  • the fact table in the table group is associated with each dimension table in the table group.
  • the keys that connect the two data tables are also often referred to as association keys, and the fact table is associated with the dimension table through the association key.
  • the table includes a dimension table PART, a dimension table SUPPLIER, a dimension table PARTSUPP, and a fact table LINEITEM.
  • the dimension table PART, the dimension table SUPPLIER, and the dimension table PARTSUPP are all dimension tables of the fact table LINEITEM, wherein the dimension table PART is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key PARTKEY.
  • the dimension table SUPPLIER is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key SUPPKEY.
  • the dimension table PART and the dimension table SUPPLIER have only one associated key associated with the fact table.
  • the dimension table PARTSUPP is associated with the fact table LINEITEM through the association key PARTKEY and the association key SUPPKEY, and the dimension table PARTSUPP is called a two-dimensional dimension table.
  • the dimension table in the table group is not limited to include a one-dimensional dimension table or a two-dimensional dimension table, and may also include a three-dimensional dimension table, a four-dimensional dimension table, and the like.
  • S602. Perform one-dimensional partitioning and row-copy space of each row of the data table for one-dimensional partitioning according to an association key in each of the data tables according to a type of each of the data tables in the table group. Or multidimensional partitioning.
  • the data table is a one-dimensional dimension table
  • the interval range of the one-dimensional row copy partition of one dimension table is consistent with the range range of the one-dimensional column copy partition.
  • the number of nodes between the row copy partitions may be selected according to the size of the data volume, and the partition intervals of each column copy partition are regarded as one logical node.
  • the dimension table PART and the dimension table SUPPLIER are one-dimensional dimension tables
  • the association key of the dimension table PART is PARTKEY
  • the association key of the dimension table SUPPLIER is SUPPKEY
  • the dimension table PART is for the association key.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table PART obtained by PARTKEY for one-dimensional partitioning are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the dimension table SUPPLIER is performed for the associated key SUPPKEY.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table SUPPLIER obtained by the one-dimensional partition are both (S_0, S_1) / (S_1, S_2).
  • each of the plurality of associated keys of the multidimensional dimension table respectively performs one-dimensional partitioning on the column copy space and the row copy space of the multidimensional dimension table to obtain one corresponding to each of the associated keys.
  • Dimension copy partitions and one-dimensional row copy partitions are used.
  • the dimension table PARTSUPP is a two-dimensional dimension table
  • the two-dimensional dimension table PARTSUPP is associated with the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER, respectively.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the one-dimensional dimension table PART are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the one-dimensional dimension table SUPPLIER
  • the one-dimensional row copy partition and the one-dimensional column copy partition are both (S_0, S_1)/(S_1, S_2), so one-dimensional corresponding to each associated key can be obtained directly from the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER.
  • Column copy partitions and one-dimensional row copy partitions are Alternatively, as shown in FIG. 4B and FIG.
  • the dimension table PARTSUPP is a two-dimensional dimension table
  • the dimension table PARTSUPP is associated with the fact table LINEITEM through the association keys PARTKEY and SUPPKEY
  • the dimension table PARTSUPP is not associated with the one-dimensional dimension table, so One-dimensional partitioning can be performed on the associated key PARTKEY of the dimension table PARTSUPP to obtain a one-dimensional column copy partition and a one-dimensional row copy partition (P_0, P_1) / (P_1, P_2) / (P_2, P_3), and one for the associated key SUPPKEY Dimension partitions get a one-dimensional column copy partition and a one-dimensional row copy partition (S_0, S_1) / (S_1, S_2).
  • the one-dimensional column copy partition (P_0, P_1) / (P_1, P_2) / (P_2, P_3) and the one-dimensional column copy partition (S_0, S_1) / (S_1, S_2) are combined to obtain the multidimensional of the dimension table PARTSUPP
  • the one-dimensional row copy partition corresponding to the associated key PARTKEY or the associated key SUPPKEY may be selected as the one-dimensional row copy partition of the dimension table PARTSUPP.
  • the data table is a fact table
  • the other fields except the associated key are used as partition keys, and the one-dimensional partition partition and the one-dimensional row copy partition are obtained by one-dimensional partitioning of the column copy space and the row copy interval of the fact table for the partition key; a one-dimensional column copy partition corresponding to the partition key and a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multidimensional dimension table associated with the fact table, group Obtaining a multi-dimensional column copy partition of the fact table; performing one-dimensional partitioning on the one-dimensional row copy partition corresponding to the partition key to obtain a one-dimensional row copy partition of the fact table.
  • the multi-dimensional column copy partition (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) of the dimension table PARTSUPP associated with the fact table LINEITEM may be first acquired.
  • a one-dimensional synchronous queue is created according to the one-dimensional row copy partition or the one-dimensional column copy partition of the one-dimensional dimension table. Further, each of the one-dimensional row copy partition or the one-dimensional column replica partition of the one-dimensional dimension table may be a member of the synchronization queue.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table PART are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and a synchronization queue (P_0, can be created).
  • a multidimensional synchronization queue is created according to the multidimensional column copy partition of the multidimensional dimension table.
  • each partition in the multi-dimensional column replica partition of the multi-dimensional dimension table may be used as a member in the synchronization queue.
  • the multidimensional column copy partition of the dimension table PARTSUPP is (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) / (P_1, P_2) (S_0, S_1) / (P_1, P_2) (S_1, S_2) / (P_2, P_3) ( S_0, S_1) / (P_2, P_3) (S_1, S_2), a synchronization queue as shown in FIG. 10 can be created.
  • a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table according to the fact table, and the fact table A one-dimensional row copy partition creates a multidimensional sync queue.
  • a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table, and a one-dimensional row copy partition of the fact table may be combined. Each partition is a member of the synchronization queue.
  • the multi-dimensional column copy partition of the two-dimensional dimension table PARTSUPP associated with the fact table is (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) / ( P_1, P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2), one-dimensional line copy of the fact table
  • the partition is (O_0, O_01) / (O_01, O_02) / (O_02, O_1) / (O_1, O_11) / (O_11, O_12) / (O_12, O_2), and the combination results in a synchronization queue as shown in Fig. 7C.
  • the database cluster is composed of one manager node and several cluster nodes.
  • the cluster node is a collective name of the master node (master node) and the slave node (slave node).
  • the manager node is responsible for managing the system table group, and is responsible for updating and synchronizing the changes of the global route.
  • the master node includes a master node of a one-dimensional row replica and a master node of a one-dimensional column replica, and the table group can be sent to the slave node for synchronization by the master node of the one-dimensional row replica.
  • the master node of the one-dimensional row copy initiates a data synchronization request to the slave node, and after returning the response response from the node, the data tables in the table group are respectively synchronized to the row copy space and the column copy space of the slave node.
  • the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table; and then according to each data in the table group
  • the type of the table, one-dimensional partitioning of the row copy space of each data table and one-dimensional or multi-dimensional partitioning of the column copy space for each data table associated key So for different types as well
  • the data tables of different dimensions adopt different partition management methods, thereby effectively implementing the data processing mechanism of OLTP and OLAP in one system, thereby saving resource consumption.
  • FIG. 8 is a schematic structural diagram of a partition management apparatus for a data table according to an embodiment of the present invention.
  • the device in the embodiment of the present invention includes:
  • the determining module 801 is configured to determine a type of each data table in the table group and an associated key, and the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table.
  • the table group includes a fact table and at least one dimension table, where the fact table is not a dimension table of any data table in the table group, and each of the dimension tables is a dimension table associated with the fact table. a field associated with the fact table for each dimension table as an association key between the fact table and each of the dimension tables.
  • the table saved by the database can be divided into a fact table and a dimension table according to the saved content.
  • a fact table is an intersection of multiple dimension tables that contains data describing a particular event within a business (such as a bank transaction or product sale) for storing at least one fact record, each fact record corresponding to a row in the fact table.
  • a fact table can be associated with multiple dimension tables, and the dimension table is used to analyze the facts.
  • the entries in the dimension table are used to describe the data in the fact table, and the dimensional characteristics of the fact records in the fact table are recorded. For example, to analyze product sales, you can choose to analyze by product category or sales time. Therefore, you can first create a product dimension table and a time dimension table, and then aggregate the product dimension table and the time dimension table into one table to get the fact table.
  • the fact table can include product categories, sales hours, total sales, and so on.
  • the fact table in the table group is associated with each dimension table in the table group.
  • the keys that connect the two data tables are also often referred to as association keys, and the fact table is associated with the dimension table through the association key.
  • the table includes a dimension table PART, a dimension table SUPPLIER, a dimension table PARTSUPP, and a fact table LINEITEM.
  • the dimension table PART, the dimension table SUPPLIER, and the dimension table PARTSUPP are all dimension tables of the fact table LINEITEM, wherein the dimension table PART is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key PARTKEY.
  • the dimension table SUPPLIER is associated with the dimension table PARTSUPP and the fact table LINEITEM through the association key SUPPKEY.
  • the dimension table PART and the dimension table SUPPLIER have only one associated key associated with the fact table.
  • the dimension table PARTSUPP is associated with the fact table LINEITEM through the association key PARTKEY and the association key SUPPKEY, and the dimension table PARTSUPP is called a two-dimensional dimension table.
  • the dimension table in the table group is not limited to include a one-dimensional dimension table or a two-dimensional dimension table, and may also include a three-dimensional dimension table, a four-dimensional dimension table, and the like.
  • a partitioning module 802 configured to perform one-dimensional partitioning and column copying on a row copy space of each of the data tables according to an association key in each of the data tables according to a type of each of the data tables in the table group Space for one-dimensional or multi-dimensional partitioning.
  • the data table is a one-dimensional dimension table
  • the interval range of the one-dimensional row copy partition of one dimension table is consistent with the range range of the one-dimensional column copy partition.
  • the number of nodes between the row copy partitions may be selected according to the size of the data volume, and the partition intervals of each column copy partition are regarded as one logical node.
  • the dimension table PART and the dimension table SUPPLIER are one-dimensional dimension tables
  • the association key of the dimension table PART is PARTKEY
  • the association key of the dimension table SUPPLIER is SUPPKEY
  • the dimension table PART is for the association key.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table PART obtained by PARTKEY for one-dimensional partitioning are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the dimension table SUPPLIER is performed for the associated key SUPPKEY.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table SUPPLIER obtained by the one-dimensional partition are both (S_0, S_1) / (S_1, S_2).
  • each of the plurality of associated keys of the multidimensional dimension table respectively performs one-dimensional partitioning on the column copy space and the row copy space of the multidimensional dimension table to obtain one corresponding to each of the associated keys.
  • Dimension copy partitions and one-dimensional row copy partitions are used.
  • the dimension table PARTSUPP is a two-dimensional dimension table
  • the two-dimensional dimension table PARTSUPP is associated with the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER, respectively.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the one-dimensional dimension table PART are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and the one-dimensional dimension table SUPPLIER
  • the one-dimensional row copy partition and the one-dimensional column copy partition are both (S_0, S_1)/(S_1, S_2), so one-dimensional corresponding to each associated key can be obtained directly from the one-dimensional dimension table PART and the one-dimensional dimension table SUPPLIER.
  • Column copy partitions and one-dimensional row copy partitions are Alternatively, as shown in FIG. 4B and FIG.
  • the dimension table PARTSUPP is a two-dimensional dimension table, and the dimension table PARTSUPP is associated with the fact table LINEITEM through the association keys PARTKEY and SUPPKEY, and the dimension table PARTSUPP is not associated with the one-dimensional dimension table. Therefore, one-dimensional partitioning and one-dimensional row copy partitioning (P_0, P_1)/(P_1, P_2) can be obtained by one-dimensional partitioning for the association key PARTKEY of the dimension table PARTSUPP.
  • the one-dimensional column copy partition (P_0, P_1) / (P_1, P_2) / (P_2, P_3) and the one-dimensional column copy partition (S_0, S_1) / (S_1, S_2) are combined to obtain the multidimensional of the dimension table PARTSUPP
  • the one-dimensional row copy partition corresponding to the associated key PARTKEY or the associated key SUPPKEY may be selected as the one-dimensional row copy partition of the dimension table PARTSUPP.
  • the data table is a fact table
  • the other fields except the associated key are used as partition keys, and the one-dimensional partition partition and the one-dimensional row copy partition are obtained by one-dimensional partitioning of the column copy space and the row copy interval of the fact table for the partition key; a one-dimensional column copy partition corresponding to the partition key and a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multidimensional dimension table associated with the fact table, group Obtaining a multi-dimensional column copy partition of the fact table; performing one-dimensional partitioning on the one-dimensional row copy partition corresponding to the partition key to obtain a one-dimensional row copy partition of the fact table.
  • the multi-dimensional column copy partition (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) of the dimension table PARTSUPP associated with the fact table LINEITEM may be first acquired.
  • the apparatus may further include:
  • the creating module 803 is configured to create a synchronization queue according to the obtained one-dimensional row copy partition of the data table or the obtained one-dimensional or multi-dimensional column copy partition of the data table.
  • a one-dimensional synchronous queue is created according to the one-dimensional row copy partition or the one-dimensional column copy partition of the one-dimensional dimension table. Further, each of the one-dimensional row copy partition or the one-dimensional column replica partition of the one-dimensional dimension table may be a member of the synchronization queue.
  • the one-dimensional row copy partition and the one-dimensional column copy partition of the dimension table PART are both (P_0, P_1)/(P_1, P_2)/(P_2, P_3), and a synchronization queue (P_0, can be created).
  • a multidimensional synchronization queue is created according to the multidimensional column copy partition of the multidimensional dimension table.
  • each partition in the multi-dimensional column replica partition of the multi-dimensional dimension table may be used as a member in the synchronization queue.
  • the multi-dimensional column copy partition of the dimension table PARTSUPP is (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) / (P_1, P_2) (S_0, S_1) / (P_1, P_2) (S_1, S_2) / (P_2, P_3) (S_0, S_1) / (P_2, P_3) (S_1, S_2)
  • a synchronization queue as shown in FIG. 10 can be created.
  • a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table according to the fact table, and the fact table A one-dimensional row copy partition creates a multidimensional sync queue.
  • a one-dimensional column copy partition of the one-dimensional dimension table and/or a multi-dimensional column copy partition of the multi-dimensional dimension table, and a one-dimensional row copy partition of the fact table may be combined. Each partition is a member of the synchronization queue.
  • the multi-dimensional column copy partition of the two-dimensional dimension table PARTSUPP associated with the fact table is (P_0, P_1) (S_0, S_1) / (P_0, P_1) (S_1, S_2) / ( P_1, P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2), one-dimensional line copy of the fact table
  • the partition is (O_0, O_01) / (O_01, O_02) / (O_02, O_1) / (O_1, O_11) / (O_11, O_12) / (O_12, O_2), and the combination results in a synchronization queue as shown in Fig. 7C.
  • the synchronization module 804 is configured to synchronize each data table in the table group on the master node to the slave node according to the synchronization queue.
  • the database cluster is composed of one manager node and several cluster nodes.
  • the cluster node is a collective name of the master node (master node) and the slave node (slave node).
  • the manager node is responsible for managing the system table group, and is responsible for updating and synchronizing the changes of the global route.
  • the master node includes a master node of a one-dimensional row replica and a master node of a one-dimensional column replica, and the table group can be sent to the slave node for synchronization by the master node of the one-dimensional row replica.
  • the master node of the one-dimensional row copy initiates a data synchronization request to the slave node, and after returning the response response from the node, the data tables in the table group are respectively synchronized to the row copy space and the column copy space of the slave node.
  • the type of the data table includes a one-dimensional dimension table, a multi-dimensional dimension table, or a fact table; and then according to each data in the table group
  • the type of table, the row copy space for each data table for the associated key in each data table Perform one-dimensional or multi-dimensional partitioning with one-dimensional partitioning and column copy space. Therefore, different partition management methods are adopted for data tables of different types and different dimensions, thereby realizing the data processing mechanism of OLTP and OLAP efficiently in one system, thereby saving resource consumption.
  • FIG. 10 is a schematic structural diagram of a partition management apparatus for a data table according to an embodiment of the present invention.
  • the apparatus can include at least one processor 1001, such as a CPU, at least one receiver 1003, at least one memory 1004, at least one transmitter 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the receiver 1003 and the transmitter 1005 of the device in the embodiment of the present invention may be wired transmission ports, or may be wireless devices, for example, including antenna devices, for performing signaling or data communication with other node devices.
  • the memory 1004 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • the memory 1004 can also optionally be at least one storage device located remotely from the aforementioned processor 1001.
  • a set of program code is stored in the memory 1004, and the processor 1001 is configured to call program code stored in the memory for causing the apparatus to perform the method in the above implementation.
  • the program may be stored in a computer readable storage medium, and the storage medium may include: Flash disk, Read-Only Memory (ROM), Random Access Memory (RAM), disk or optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种数据表的分区管理方法及装置,包括:确定表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表(S301);根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区(S302)。针对不同类型以及不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现OLTP和OLAP的数据处理机制,节省了资源消耗。

Description

一种数据表的分区管理方法及装置 技术领域
本发明涉及数据库技术领域,尤其涉及一种数据表的分区管理方法及装置。
背景技术
随着信息化的发展,数据库不仅要求数据的存储效率高,而且要求数据分析的数据源最新,以便提出的结论有参考价值。因此,如图1所示,结合OLTP(Online Transaction Processing,联机事务处理)和OLAP(Online Analysis Processing,联机分析处理)组成一个新的系统,该系统是一种OLTP和OLAP应用的混合存储数据库,同时支持列存储和行存储功能,因此具有良好的读性能和写性能。其中,OLTP可以及时地处理输入的数据并做出回答,OLAP可以进行复杂的分析操作,侧重决策支持,并且提供直观易懂的查询结果。对于该新的系统,如何保证OLTP操作的实时性以及OLAP的分析效率和实效性至关重要。
现有技术方案一,如图2所示,存储系统首先对数据进行备份,然后将备份的数据存储到对应的多个物理磁盘中,其中,DSM(Decomposition Storage Model,分解存储模型)是以列存储方式保存数据,NSM(N-ary storage mode,多维存储模式)是以行存储的方式保存数据,并且对于OLAP的请求可以路由到列副本来执行,对于OLTP的请求可以路由到行副本来执行。但是,该方案对于不同类型的查询分别使用不同的副本进行处理,尤其对于多表关联的操作,消耗计算量和数据网络传输开销大。
现有技术方案二,数据库系统在内部分为两个子系统来保存系统数据,一份数据在两个子系统中至少各保存一份,一个子系统将数据保存为非列存储结构,另一个子系统将数据保存为列存储结构。对于不同类型的查询服务,路由到不同的子系统中进行处理。例如:对于增删改的数据操作,路由到非 列存储子系统进行处理,列存储子系统定时的从非列存储子系统中请求数据同步。但是,两个子系统是两个相对独立的,无论是对于硬件配置还是系统维护都需要很大开销。
在数据库系统中如何能够既保证OLTP操作的实时性又能够实现OLAP的分析效率和实效性是需要解决的问题。
发明内容
本发明提供了一种数据表的分区管理方法及装置,针对不同类型以及不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现OLTP和OLAP的数据处理机制,节省了资源消耗。
本发明实施例第一方面提供了一种数据表的分区管理方法,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,包括:
确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;
根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
结合第一方面,在第一方面的第一种可能的实现方式中,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。
结合第一方面的第一种可能的实现方式,在第一方面的第二种可能的实现方式中,所述根据所述表组中每个所述数据表的类型,针对每个所述数据 表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;
将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
结合第一方面的第一种以及第二种可能的实现方式,在第一方面的第三种可能的实现方式中,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;
从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;
根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组合得到所述事实表的多维列副本分区;
对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
结合第一方面的第二种可能的实现方式,在第一方面的第四种可能的实现方式中,所述获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区包括:
当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;
当所述表组中不存在与所述多维维度表相关联的一维维度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
结合第一方面,在第一方面的第五种可能的实现方式中,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区之后,还包括:
根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列;
根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上。
结合第一方面的第五种可能的实现方式,在第一方面的第六种可能的实现方式中,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。
结合第一方面的第五种可能的实现方式,在第一方面的第七种可能的实现方式中,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。
结合第一方面的第五种可能的实现方式,在第一方面的第八种可能的实现方式中,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行 副本分区,创建多维同步队列。
结合第一方面的第五种至第八种可能的实现方式,在第一方面的第九种可能的实现方式中,所述主节点包括一维行副本的主节点,所述根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上包括:
通过所述一维行副本的主节点将所述表组中的各数据表发送至所述从节点进行同步。
本发明实施例第二方面提供了一种数据表的分区管理装置,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,所述装置包括:
确定模块,用于确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;
分区模块,用于根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
结合第二方面,在第二方面的第一种可能的实现方式中,所述分区模块具体用于:
当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。
结合第二方面的第一种可能的实现方式,在第二方面的第二种可能的实现方式中,所述分区模块具体用于:
当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;
将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维 行副本分区作为所述多维维度表的一维行副本分区。
结合第二方面的第一种以及第二种可能的实现方式,在第二方面的第三种可能的实现方式中,所述分区模块具体用于:
当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;
从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;
根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组合得到所述事实表的多维列副本分区;
对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
结合第二方面的第二种可能的实现方式,在第二方面的第四种可能的实现方式中,所述分区模块具体用于:
当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;
当所述表组中不存在与所述多维维度表相关联的一维维度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
结合第二方面,在第二方面的第五种可能的实现方式中,所述装置还包括:
创建模块,用于根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列;
同步模块,用于根据所述同步队列,将主节点上的所述表组中的各数据 表同步到从节点上。
结合第二方面的第五种可能的实现方式,在第二方面的第六种可能的实现方式中,所述创建模块具体用于:
当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。
结合第二方面的第五种可能的实现方式,在第二方面的第七种可能的实现方式中,所述创建模块具体用于:
当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。
结合第二方面的第五种可能的实现方式,在第二方面的第八种可能的实现方式中,所述创建模块具体用于:
当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区,创建多维同步队列。
结合第二方面的第五种至第八种可能的实现方式,在第二方面的第九种可能的实现方式中,所述主节点包括一维行副本的主节点,所述同步模块具体用于:
通过所述一维行副本的主节点将所述表组中的各数据表发送至所述从节点进行同步。
本发明实施例第三方面提供了另一种数据表的分区管理装置,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,所述装置包括网络接口、存储器以及处理器,其中,存储器中存储一组程序代码,且处理器用于调用存储器中存储的程序代码,用于使所述装置执行上述实现方式中的方法。
实施本发明实施例,首先确定所述表组中每个数据表的类型以及关联键, 所述数据表的类型包括一维维度表、多维维度表或事实表;然后根据表组中每个数据表的类型,针对每个数据表中的关联键对每个数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。因此针对不同类型以及不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现OLTP和OLAP的数据处理机制,节省了资源消耗。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本发明现有技术方案提供的一种数据库结构示意图;
图2是本发明现有技术方案提供的一种数据存储的结构示意图;
图3是本发明第一实施例提出的一种数据表的分区管理方法的示意流程图;
图4A是本发明实施例提供的一种表组的结构示意图;
图4B是本发明实施例提供的另一种表组的结构示意图;
图5A是本发明实施例提供的一种一维维度表的分区示意图;
图5B是本发明实施例提供的一种多维维度表的分区示意图;
图5C是本发明实施例提供的另一种多维维度表的分区示意图;
图5D是本发明实施例提供的一种事实表的分区示意图;
图6是本发明第二实施例提出的一种数据表的分区管理方法的示意流程图;
图7A是本发明实施例提供的一种创建一维维度表的同步队列的示意图;
图7B是本发明实施例提供的一种创建多维维度表的同步队列的示意图;
图7C是本发明实施例提供的一种创建事实表的同步队列的示意图;
图8是本发明实施例提出的一种数据表的分区管理装置的结构示意图;
图9是本发明实施例提供的一种数据库系统的网络结构示意图;
图10是本发明实施例提出的另一种数据表的分区管理装置的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
请参考图3,图3是本发明第一实施例提出的一种数据表的分区管理方法的示意流程图。本发明实施例中的执行主体可以为数据库集群中的管理节点,如图所示,本发明实施例中的方法包括:
S301,确定表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表。
具体实现中,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键。在分布式数据库多表管理的业务场景中,数据库保存的表根据保存的内容可以分为事实表和维度表。事实表是多个维度表的一个交点,包含描述业务(如银行事务或产品销售)内特定事件的数据,用于存储至少一个事实记录,每个事实记录对应事实表中的一行。一个事实表可以关联多个维度表,通过维度表来达到分析事实的目的,维度表中的条目用来描述事实表中的数据,记录事实表中的事实记录的维度特性。例如:分析产品销售情况,可以选择按产品类别或销售时间来分析,因此,可以首先创建产品维度表以及时间维度表,然后将产品维度表以及时间维度表分别聚合到一个表中得到事实表,事实表可以包括产品类别、销售时间以及销售总量等等。本实施例中,表组中的事实表与表组中的每个维度表相关联。连接两 个数据表的键也常常称为关联键,事实表与维度表通过关联键进行关联。
如图4A所示,表组中包括维度表PART、维度表SUPPLIER、维度表PARTSUPP以及事实表LINEITEM,维度表PART、维度表SUPPLIER、维度表PARTSUPP均为事实表LINEITEM的维度表,其中,维度表PART通过关联键PARTKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表SUPPLIER通过关联键SUPPKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表PART和维度表SUPPLIER均只有一个关联键与事实表相关联,称为一维维度表,维度表PARTSUPP通过关联键PARTKEY以及关联键SUPPKEY与事实表LINEITEM相关联,则维度表PARTSUPP称为二维维度表。需要说明的是,表组中的维度表不局限于包括一维维度表或二维维度表,还可以包括三维维度表、四维维度表等等。
S302,根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
具体实现中,当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。其中,一个维度表的一维行副本分区的区间范围和一维列副本分区的区间范围保持一致。进一步的,可以根据数据量的大小选择行副本分区之间的节点个数,而每个列副本分区的分区区间均作为一个逻辑节点。
例如:如图4A和图5A所示,维度表PART和维度表SUPPLIER为一维维度表,维度表PART的关联键为PARTKEY,维度表SUPPLIER的关联键为SUPPKEY,因此,维度表PART针对关联键PARTKEY进行一维分区得到的维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),维度表SUPPLIER针对关联键SUPPKEY进行一维分区得到的维度表SUPPLIER的一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2)。
当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
进一步的,当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;当所述表组中不存在与所述多维维度表相关联的一维维度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
例如:首先,如图4A和图5B所示,维度表PARTSUPP为二维维度表,二维维度表PARTSUPP分别与一维维度表PART以及一维维度表SUPPLIER相关联。由本实施例的上述方法可知,一维维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),一维维度表SUPPLIER一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2),因此可以直接从一维维度表PART和一维维度表SUPPLIER获取得到维度表PARTSUPP中的每个关联键对应的一维列副本分区以及一维行副本分区。或者,如图4B和图5C所示,维度表PARTSUPP为二维维度表,维度表PARTSUPP通过关联键PARTKEY和SUPPKEY与事实表LINEITEM相关联,维度表PARTSUPP并没有与一维维度表相关联。因此,可以针对维度表PARTSUPP的关联键PARTKEY进行一维分区得到一维列副本分区和一维行副本分区(P_0,P_1)/(P_1,P_2)/(P_2,P_3),针对关联键SUPPKEY进行一维分区得到一维列副本分区和一维行副本分区(S_0,S_1)/(S_1,S_2)。
然后,对一维列副本分区(P_0,P_1)/(P_1,P_2)/(P_2,P_3)和一维列副本分区(S_0,S_1)/(S_1,S_2)进行组合得到维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),最后,可以选择关联键PARTKEY 或关联键SUPPKEY对应的一维行副本分区作为维度表PARTSUPP的一维行副本分区。
当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组合得到所述事实表的多维列副本分区;对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
例如:如图4A和图5D所示,可以首先获取与事实表LINEITEM相关联的维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2);然后将事实表LINEITEM中的ORDERKEY作为分区键,针对分区键ORDERKEY对事实表LINEITEM的列副本空间以及行副本区间进行一维分区得到一维列副本分区(O_0,O_1)/(O_1,O_2)以及一维行副本分区(O_0,O_1)/(O_1,O_2);最后对多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2)与一维列副本分区(O_0,O_1)/(O_1,O_2)进行组合得到如图5C所示的事实表的多维列副本分区,将一维行副本分区(O_0,O_1)/(O_1,O_2)进行再分区得到事实表的一维行副本分区(O_0,O_01)/(O_01,O_02)/(O_02,O_1)/(O_1,O_11)/(O_11,O_12)/(O_12,O_2)。
本发明实施例中,首先确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;然后根据表组中每个数据表的类型,针对每个数据表中的关联键对每个数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。因此针对不同类型以及不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现 OLTP和OLAP的数据处理机制,节省了资源消耗。
请参考图6,图6是本发明第二实施例提出的一种数据表的分区管理方法的示意流程图。本发明实施例中的执行主体可以为数据库集群中的管理节点,如图所示,本发明实施例中的方法包括:
S601,确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表。
具体实现中,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键。在分布式数据库多表管理的业务场景中,数据库保存的表根据保存的内容可以分为事实表和维度表。事实表是多个维度表的一个交点,包含描述业务(如银行事务或产品销售)内特定事件的数据,用于存储至少一个事实记录,每个事实记录对应事实表中的一行。一个事实表可以关联多个维度表,通过维度表来达到分析事实的目的,维度表中的条目用来描述事实表中的数据,记录事实表中的事实记录的维度特性。例如:分析产品销售情况,可以选择按产品类别或销售时间来分析,因此,可以首先创建产品维度表以及时间维度表,然后将产品维度表以及时间维度表分别聚合到一个表中得到事实表,事实表可以包括产品类别、销售时间以及销售总量等等。本实施例中,表组中的事实表与表组中的每个维度表相关联。连接两个数据表的键也常常称为关联键,事实表与维度表通过关联键进行关联。
如图4A所示,表组中包括维度表PART、维度表SUPPLIER、维度表PARTSUPP以及事实表LINEITEM,维度表PART、维度表SUPPLIER、维度表PARTSUPP均为事实表LINEITEM的维度表,其中,维度表PART通过关联键PARTKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表SUPPLIER通过关联键SUPPKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表PART和维度表SUPPLIER均只有一个关联键与事实表相关 联,称为一维维度表,维度表PARTSUPP通过关联键PARTKEY以及关联键SUPPKEY与事实表LINEITEM相关联,则维度表PARTSUPP称为二维维度表。需要说明的是,表组中的维度表不局限于包括一维维度表或二维维度表,还可以包括三维维度表、四维维度表等等。
S602,根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
具体实现中,当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。其中,一个维度表的一维行副本分区的区间范围和一维列副本分区的区间范围保持一致。进一步的,可以根据数据量的大小选择行副本分区之间的节点个数,而每个列副本分区的分区区间均作为一个逻辑节点。
例如:如图4A和图5A所示,维度表PART和维度表SUPPLIER为一维维度表,维度表PART的关联键为PARTKEY,维度表SUPPLIER的关联键为SUPPKEY,因此,维度表PART针对关联键PARTKEY进行一维分区得到的维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),维度表SUPPLIER针对关联键SUPPKEY进行一维分区得到的维度表SUPPLIER的一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2)。
当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
进一步的,当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;当所述表组中不存在与所述多维维度表相关联的一维维 度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
例如:首先,如图4A和图5B所示,维度表PARTSUPP为二维维度表,二维维度表PARTSUPP分别与一维维度表PART以及一维维度表SUPPLIER相关联。由本实施例的上述方法可知,一维维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),一维维度表SUPPLIER一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2),因此可以直接从一维维度表PART和一维维度表SUPPLIER获取得到每个关联键对应的一维列副本分区以及一维行副本分区。或者,如图4B和图5C所示,维度表PARTSUPP为二维维度表,维度表PARTSUPP通过关联键PARTKEY和SUPPKEY与事实表LINEITEM相关联,维度表PARTSUPP并没有与一维维度表相关联,因此,可以针对维度表PARTSUPP的关联键PARTKEY进行一维分区得到一维列副本分区和一维行副本分区(P_0,P_1)/(P_1,P_2)/(P_2,P_3),针对关联键SUPPKEY进行一维分区得到一维列副本分区和一维行副本分区(S_0,S_1)/(S_1,S_2)。
然后,对一维列副本分区(P_0,P_1)/(P_1,P_2)/(P_2,P_3)和一维列副本分区(S_0,S_1)/(S_1,S_2)进行组合得到维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),最后,可以选择关联键PARTKEY或关联键SUPPKEY对应的一维行副本分区作为维度表PARTSUPP的一维行副本分区。
当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组 合得到所述事实表的多维列副本分区;对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
例如:如图4A和图5D所示,可以首先获取与事实表LINEITEM相关联的维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2);然后将事实表LINEITEM中的ORDERKEY作为分区键,针对分区键ORDERKEY对事实表LINEITEM的列副本空间以及行副本区间进行一维分区得到一维列副本分区(O_0,O_1)/(O_1,O_2)以及一维行副本分区(O_0,O_1)/(O_1,O_2);最后对多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2)与一维列副本分区(O_0,O_1)/(O_1,O_2)进行组合得到如图5C所示的事实表的多维列副本分区,将一维行副本分区(O_0,O_1)/(O_1,O_2)进行再分区得到事实表的一维行副本分区(O_0,O_01)/(O_01,O_02)/(O_02,O_1)/(O_1,O_11)/(O_11,O_12)/(O_12,O_2)。
S603,根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列。
具体实现中,当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。进一步的,可以将所述一维维度表的一维行副本分区或一维列副本分区中的每一个分区作为同步队列中的一个成员。
例如:如图7A所示,维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),可以创建同步队列(P_0,P_1)、(P_1,P_2)、(P_2,P_3)。
可选的,当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。进一步的,可以将所述多维维度表的多维列副本分区中的每一个分区作为同步队列中的一个成员。
例如:如图7B所示,维度表PARTSUPP的多维列副本分区为 (P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),可以创建如图10所示的同步队列。
可选的,当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区,创建多维同步队列。进一步的,可以将与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区进行组合得到的每个分区作为同步队列中的一个成员。
例如:如图7C所示,与所述事实表相关联的二维维度表PARTSUPP的多维列副本分区为(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),事实表的一维行副本分区为(O_0,O_01)/(O_01,O_02)/(O_02,O_1)/(O_1,O_11)/(O_11,O_12)/(O_12,O_2),组合得到如图7C所示的同步队列。
S604,根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上。
具体实现中,如图8所示,数据库集群是由一个manager节点和若干个cluster节点构成。其中,cluster节点是master节点(主节点)和slave节点(从节点)的统称,manager节点负责管理系统的表组,对全局路由的变化负责更新和同步。所述主节点包括一维行副本的主节点和一维列副本的主节点,可以通过所述一维行副本的主节点将所述表组发送至所述从节点进行同步。进一步的,通过一维行副本的主节点向从节点发起数据同步请求,在从节点返回应答响应后,将表组中的各数据表分别同步到从节点的行副本空间和列副本空间。
本发明实施例中,首先确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;然后根据表组中每个数据表的类型,针对每个数据表中的关联键对每个数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。因此针对不同类型以及 不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现OLTP和OLAP的数据处理机制,节省了资源消耗。
请参考图8,图8是本发明实施例提出的一种数据表的分区管理装置的结构示意图。如图所示,本发明实施例中的装置包括:
确定模块801,用于确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表。
具体实现中,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键。在分布式数据库多表管理的业务场景中,数据库保存的表根据保存的内容可以分为事实表和维度表。事实表是多个维度表的一个交点,包含描述业务(如银行事务或产品销售)内特定事件的数据,用于存储至少一个事实记录,每个事实记录对应事实表中的一行。一个事实表可以关联多个维度表,通过维度表来达到分析事实的目的,维度表中的条目用来描述事实表中的数据,记录事实表中的事实记录的维度特性。例如:分析产品销售情况,可以选择按产品类别或销售时间来分析,因此,可以首先创建产品维度表以及时间维度表,然后将产品维度表以及时间维度表分别聚合到一个表中得到事实表,事实表可以包括产品类别、销售时间以及销售总量等等。本实施例中,表组中的事实表与表组中的每个维度表相关联。连接两个数据表的键也常常称为关联键,事实表与维度表通过关联键进行关联。
如图4A所示,表组中包括维度表PART、维度表SUPPLIER、维度表PARTSUPP以及事实表LINEITEM,维度表PART、维度表SUPPLIER、维度表PARTSUPP均为事实表LINEITEM的维度表,其中,维度表PART通过关联键PARTKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表SUPPLIER通过关联键SUPPKEY与维度表PARTSUPP、事实表LINEITEM相关联,维度表PART和维度表SUPPLIER均只有一个关联键与事实表相关 联,称为一维维度表,维度表PARTSUPP通过关联键PARTKEY以及关联键SUPPKEY与事实表LINEITEM相关联,则维度表PARTSUPP称为二维维度表。需要说明的是,表组中的维度表不局限于包括一维维度表或二维维度表,还可以包括三维维度表、四维维度表等等。
分区模块802,用于根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
具体实现中,当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。其中,一个维度表的一维行副本分区的区间范围和一维列副本分区的区间范围保持一致。进一步的,可以根据数据量的大小选择行副本分区之间的节点个数,而每个列副本分区的分区区间均作为一个逻辑节点。
例如:如图4A和图5A所示,维度表PART和维度表SUPPLIER为一维维度表,维度表PART的关联键为PARTKEY,维度表SUPPLIER的关联键为SUPPKEY,因此,维度表PART针对关联键PARTKEY进行一维分区得到的维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),维度表SUPPLIER针对关联键SUPPKEY进行一维分区得到的维度表SUPPLIER的一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2)。
当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
进一步的,当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;当所述表组中不存在与所述多维维度表相关联的一维维 度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
例如:首先,如图4A和图5B所示,维度表PARTSUPP为二维维度表,二维维度表PARTSUPP分别与一维维度表PART以及一维维度表SUPPLIER相关联。由本实施例的上述方法可知,一维维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),一维维度表SUPPLIER一维行副本分区和一维列副本分区均为(S_0,S_1)/(S_1,S_2),因此可以直接从一维维度表PART和一维维度表SUPPLIER获取得到每个关联键对应的一维列副本分区以及一维行副本分区。或者,如图4B和图5C所示,维度表PARTSUPP为二维维度表,维度表PARTSUPP通过关联键PARTKEY和SUPPKEY与事实表LINEITEM相关联,维度表PARTSUPP并没有与一维维度表相关联。因此,可以针对维度表PARTSUPP的关联键PARTKEY进行一维分区得到一维列副本分区和一维行副本分区(P_0,P_1)/(P_1,P_2)
/(P_2,P_3),针对关联键SUPPKEY进行一维分区得到一维列副本分区和一维行副本分区(S_0,S_1)/(S_1,S_2)。
然后,对一维列副本分区(P_0,P_1)/(P_1,P_2)/(P_2,P_3)和一维列副本分区(S_0,S_1)/(S_1,S_2)进行组合得到维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),最后,可以选择关联键PARTKEY或关联键SUPPKEY对应的一维行副本分区作为维度表PARTSUPP的一维行副本分区。
当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组 合得到所述事实表的多维列副本分区;对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
例如:如图4A和图5D所示,可以首先获取与事实表LINEITEM相关联的维度表PARTSUPP的多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2);然后将事实表LINEITEM中的ORDERKEY作为分区键,针对分区键ORDERKEY对事实表LINEITEM的列副本空间以及行副本区间进行一维分区得到一维列副本分区(O_0,O_1)/(O_1,O_2)以及一维行副本分区(O_0,O_1)/(O_1,O_2);最后对多维列副本分区(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2)与一维列副本分区(O_0,O_1)/(O_1,O_2)进行组合得到如图5C所示的事实表的多维列副本分区,将一维行副本分区(O_0,O_1)/(O_1,O_2)进行再分区得到事实表的一维行副本分区(O_0,O_01)/(O_01,O_02)/(O_02,O_1)/(O_1,O_11)/(O_11,O_12)/(O_12,O_2)。
可选的,如图8所示,装置还可以进一步包括:
创建模块803,用于根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列。
具体实现中,当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。进一步的,可以将所述一维维度表的一维行副本分区或一维列副本分区中的每一个分区作为同步队列中的一个成员。
例如:如图7A所示,维度表PART的一维行副本分区和一维列副本分区均为(P_0,P_1)/(P_1,P_2)/(P_2,P_3),可以创建同步队列(P_0,P_1)、(P_1,P_2)、(P_2,P_3)。
可选的,当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。进一步的,可以将所述多维维度表的多维列副本分区中的每一个分区作为同步队列中的一个成员。
例如:如图7B所示,维度表PARTSUPP的多维列副本分区为(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),可以创建如图10所示的同步队列。
可选的,当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区,创建多维同步队列。进一步的,可以将与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区进行组合得到的每个分区作为同步队列中的一个成员。
例如:如图7C所示,与所述事实表相关联的二维维度表PARTSUPP的多维列副本分区为(P_0,P_1)(S_0,S_1)/(P_0,P_1)(S_1,S_2)/(P_1,P_2)(S_0,S_1)/(P_1,P_2)(S_1,S_2)/(P_2,P_3)(S_0,S_1)/(P_2,P_3)(S_1,S_2),事实表的一维行副本分区为(O_0,O_01)/(O_01,O_02)/(O_02,O_1)/(O_1,O_11)/(O_11,O_12)/(O_12,O_2),组合得到如图7C所示的同步队列。
同步模块804,用于根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上。
具体实现中,如图9所示,数据库集群是由一个manager节点和若干个cluster节点构成。其中,cluster节点是master节点(主节点)和slave节点(从节点)的统称,manager节点负责管理系统的表组,对全局路由的变化负责更新和同步。所述主节点包括一维行副本的主节点和一维列副本的主节点,可以通过所述一维行副本的主节点将所述表组发送至所述从节点进行同步。进一步的,通过一维行副本的主节点向从节点发起数据同步请求,在从节点返回应答响应后,将表组中的各数据表分别同步到从节点的行副本空间和列副本空间。
本发明实施例中,首先确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;然后根据表组中每个数据表的类型,针对每个数据表中的关联键对每个数据表的行副本空间 进行一维分区以及列副本空间进行一维或多维分区。因此针对不同类型以及不同维度的数据表采用不同的分区管理方法,从而在一个系统里高效的实现OLTP和OLAP的数据处理机制,节省了资源消耗。
图10是本发明实施例提出的一种数据表的分区管理装置的结构示意图。如图所示,该装置可以包括:至少一个处理器1001,例如CPU,至少一个接收器1003,至少一个存储器1004,至少一个发送器1005,至少一个通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。其中,本发明实施例中装置的接收器1003和发送器1005可以是有线发送端口,也可以为无线设备,例如包括天线装置,用于与其他节点设备进行信令或数据的通信。存储器1004可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1004可选的还可以是至少一个位于远离前述处理器1001的存储装置。存储器1004中存储一组程序代码,且处理器1001用于调用存储器中存储的程序代码,用于使所述装置执行上述实现方式中的方法。
需要说明的是,对于前述的各个方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某一些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详细描述的部分,可以参见其他实施例的相关描述。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。
以上对本发明实施例所提供的内容下载方法及相关设备、系统进行了详细介绍,本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其核心思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本发明的限制。

Claims (21)

  1. 一种数据表的分区管理方法,其特征在于,表组中包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,所述方法包括:
    确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;
    根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
  2. 如权利要求1所述的方法,其特征在于,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
    当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。
  3. 如权利要求2所述的方法,其特征在于,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
    当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;
    将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
  4. 如权利要求2或3所述的方法,其特征在于,所述根据所述表组中每 个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区包括:
    当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;
    从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;
    根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组合得到所述事实表的多维列副本分区;
    对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
  5. 如权利要求3所述的方法,其特征在于,所述获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区包括:
    当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;
    当所述表组中不存在与所述多维维度表相关联的一维维度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
  6. 如权利要求1所述的方法,其特征在于,所述根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区之后,还包括:
    根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列;
    根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上。
  7. 如权利要求6所述的方法,其特征在于,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
    当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。
  8. 如权利要求6所述的方法,其特征在于,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
    当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。
  9. 如权利要求6所述的方法,其特征在于,所述根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列包括:
    当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区,创建多维同步队列。
  10. 如权利要求6~9任意一项所述的方法,其特征在于,所述主节点包括一维行副本的主节点,所述根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上包括:
    通过所述一维行副本的主节点将所述表组中的各数据表发送至所述从节点进行同步。
  11. 一种数据表的分区管理装置,其特征在于,表组中包括一个事实表 和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,所述装置包括:
    确定模块,用于确定所述表组中每个数据表的类型以及关联键,所述数据表的类型包括一维维度表、多维维度表或事实表;
    分区模块,用于根据所述表组中每个所述数据表的类型,针对每个所述数据表中的关联键对每个所述数据表的行副本空间进行一维分区以及列副本空间进行一维或多维分区。
  12. 如权利要求11所述的装置,其特征在于,所述分区模块具体用于:
    当所述数据表为一维维度表时,针对所述一维维度表的单个关联键分别对所述一维维度表的列副本空间以及行副本空间进行一维分区得到所述一维维度表的一维列副本分区以及一维行副本分区。
  13. 如权利要求12所述的装置,其特征在于,所述分区模块具体用于:
    当所述数据表为多维维度表时,获取所述多维维度表的多个关联键中的每个关联键分别对应的一维列副本分区以及一维行副本分区;
    将所述多个关联键对应的多个一维列副本分区进行组合得到所述多维维度表的多维列副本分区,并从所述多个关联键中选择一个关联键对应的一维行副本分区作为所述多维维度表的一维行副本分区。
  14. 如权利要求12或13所述的装置,其特征在于,所述分区模块具体用于:
    当所述数据表为事实表时,获取与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区;
    从所述事实表中选择一个除所述关联键之外的其他字段作为分区键,并针对所述分区键对所述事实表的列副本空间以及行副本区间进行一维分区得到一维列副本分区以及一维行副本分区;
    根据所述分区键对应的一维列副本分区以及所述与所述事实表相关联的 一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,组合得到所述事实表的多维列副本分区;
    对所述分区键对应的一维行副本分区进行一维分区得到所述事实表的一维行副本分区。
  15. 如权利要求13所述的装置,其特征在于,所述分区模块具体用于:
    当所述表组中存在与所述多维维度表相关联的一维维度表时,获取每个所述相关联的一维维度表中的单个关联键对应的一维列副本分区以及一维行副本分区;
    当所述表组中不存在与所述多维维度表相关联的一维维度表时,针对所述多维维度表的多个关联键中的每个关联键分别对所述多维维度表的列副本空间以及行副本空间进行一维分区得到所述每个关联键对应的一维列副本分区以及一维行副本分区。
  16. 如权利要求11所述的装置,其特征在于,所述装置还包括:
    创建模块,用于根据得到的所述数据表的一维行副本分区,或得到的所述数据表的一维或多维列副本分区,创建同步队列;
    同步模块,用于根据所述同步队列,将主节点上的所述表组中的各数据表同步到从节点上。
  17. 如权利要求16所述的装置,其特征在于,所述创建模块具体用于:
    当所述数据表为一维维度表时,根据所述一维维度表的一维行副本分区或一维列副本分区,创建一维同步队列。
  18. 如权利要求16所述的装置,其特征在于,所述创建模块具体用于:
    当所述数据表为多维维度表时,根据所述多维维度表的多维列副本分区创建多维同步队列。
  19. 如权利要求16所述的装置,其特征在于,所述创建模块具体用于:
    当所述数据表为事实表时,根据与所述事实表相关联的一维维度表的一维列副本分区和/或多维维度表的多维列副本分区,以及所述事实表的一维行副本分区,创建多维同步队列。
  20. 如权利要求16~19任意一项所述的装置,其特征在于,所述主节点包括一维行副本的主节点,所述同步模块具体用于:
    通过所述一维行副本的主节点将所述表组中的各数据表发送至所述从节点进行同步。
  21. 一种数据表的分区管理装置,其特征在于,表组包括一个事实表和至少一个维度表,所述事实表不是所述表组中任一数据表的维度表,每个所述维度表为所述事实表相关联的维度表,所述每个维度表与所述事实表相关联的字段作为所述事实表与所述每个维度表之间的关联键,所述装置包括网络接口、存储器以及处理器,其中,存储器中存储一组程序代码,且处理器用于调用存储器中存储的程序代码,用于使所述装置执行如权利要求1-10中所述的方法。
PCT/CN2015/082513 2015-06-26 2015-06-26 一种数据表的分区管理方法及装置 WO2016206100A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2015/082513 WO2016206100A1 (zh) 2015-06-26 2015-06-26 一种数据表的分区管理方法及装置
CN201580001180.8A CN106716400B (zh) 2015-06-26 2015-06-26 一种数据表的分区管理方法及装置
US15/853,952 US11226986B2 (en) 2015-06-26 2017-12-25 Data table partitioning management method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/082513 WO2016206100A1 (zh) 2015-06-26 2015-06-26 一种数据表的分区管理方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/853,952 Continuation US11226986B2 (en) 2015-06-26 2017-12-25 Data table partitioning management method and apparatus

Publications (1)

Publication Number Publication Date
WO2016206100A1 true WO2016206100A1 (zh) 2016-12-29

Family

ID=57584634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082513 WO2016206100A1 (zh) 2015-06-26 2015-06-26 一种数据表的分区管理方法及装置

Country Status (3)

Country Link
US (1) US11226986B2 (zh)
CN (1) CN106716400B (zh)
WO (1) WO2016206100A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292360A (zh) * 2022-09-29 2022-11-04 北京迪力科技有限责任公司 一种副本的数据分区方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491294B (zh) * 2018-03-15 2022-11-25 中国银行股份有限公司 一种数据库备份方法、装置及系统
US11016969B1 (en) * 2020-11-25 2021-05-25 Coupang Corp. Systems and methods for managing a highly available distributed hybrid transactional and analytical database
CN112905596B (zh) * 2021-03-05 2024-02-02 北京中经惠众科技有限公司 数据处理的方法、装置、计算机设备以及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050038784A1 (en) * 2001-02-27 2005-02-17 Oracle International Corporation Method and mechanism for database partitioning
CN101120340A (zh) * 2004-02-21 2008-02-06 数据迅捷股份有限公司 超无共享并行数据库
CN103440245A (zh) * 2013-07-15 2013-12-11 西北工业大学 数据库系统的行列混合存储方法
CN103942342A (zh) * 2014-05-12 2014-07-23 中国人民大学 一种内存数据库oltp&olap并发查询优化方法

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035851B1 (en) * 2001-12-07 2006-04-25 Ncr Corp. Reorganization of database records using partition merging
US8782075B2 (en) 2007-05-08 2014-07-15 Paraccel Llc Query handling in databases with replicated data
US20110157194A1 (en) * 2009-12-31 2011-06-30 Omri Eisenbach System, data structure, and method for processing multi-dimensional video data
GB2480599A (en) 2010-05-17 2011-11-30 Tech Universit T Muenchen Hybrid OLTP and OLAP database
WO2012072879A1 (en) * 2010-11-30 2012-06-07 Nokia Corporation Method and apparatus for updating a partitioned index
US8874505B2 (en) 2011-01-11 2014-10-28 Hitachi, Ltd. Data replication and failure recovery method for distributed key-value store
CN103210671B (zh) * 2011-09-06 2016-02-03 华为技术有限公司 一种消息发送方法和装置
US10198465B2 (en) * 2015-05-14 2019-02-05 Deephaven Data Labs Llc Computer data system current row position query language construct and array processing query language constructs

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050038784A1 (en) * 2001-02-27 2005-02-17 Oracle International Corporation Method and mechanism for database partitioning
CN101120340A (zh) * 2004-02-21 2008-02-06 数据迅捷股份有限公司 超无共享并行数据库
CN103440245A (zh) * 2013-07-15 2013-12-11 西北工业大学 数据库系统的行列混合存储方法
CN103942342A (zh) * 2014-05-12 2014-07-23 中国人民大学 一种内存数据库oltp&olap并发查询优化方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292360A (zh) * 2022-09-29 2022-11-04 北京迪力科技有限责任公司 一种副本的数据分区方法及装置

Also Published As

Publication number Publication date
US11226986B2 (en) 2022-01-18
US20180121532A1 (en) 2018-05-03
CN106716400A (zh) 2017-05-24
CN106716400B (zh) 2019-09-27

Similar Documents

Publication Publication Date Title
US11372888B2 (en) Adaptive distribution for hash operations
CN102395962B (zh) 对数据库表的哈希和列表组合划分
US9195701B2 (en) System and method for flexible distributed massively parallel processing (MPP) database
JP6086463B2 (ja) ピアツーピアデータ複製用の方法、デバイス、およびシステム、ならびにマスタノード切替え用の方法、デバイス、およびシステム
JPWO2012121316A1 (ja) 分散ストレージシステムおよび方法
US11226986B2 (en) Data table partitioning management method and apparatus
CN117321583A (zh) 用于混合数据处理的存储引擎
US11789971B1 (en) Adding replicas to a multi-leader replica group for a data set
US11550793B1 (en) Systems and methods for spilling data for hash joins
US20230066540A1 (en) Hybrid data processing system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15895998

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15895998

Country of ref document: EP

Kind code of ref document: A1