WO2021046750A1 - Data redistribution method, apparatus and system - Google Patents

Data redistribution method, apparatus and system

Info

Publication number
WO2021046750A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
node set
migrated
migration
Prior art date
Application number
PCT/CN2019/105357
Other languages
English (en)
French (fr)
Inventor
佟强
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2019/105357 priority Critical patent/WO2021046750A1/zh
Priority to CN201980005457.2A priority patent/CN112789606A/zh
Priority to EP19945328.3A priority patent/EP3885929A4/en
Publication of WO2021046750A1 publication Critical patent/WO2021046750A1/zh
Priority to US17/370,275 priority patent/US11860833B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1451Management of the data involved in backup or backup restore by selection of backup contents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/128Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164File meta data generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24544Join order optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/088Usage controlling of secret information, e.g. techniques for restricting cryptographic keys to pre-authorized uses, different access levels, validity of crypto-period, different key- or password length, or different strong and weak cryptographic algorithms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions

Definitions

  • This application relates to the field of databases, in particular to a data redistribution method, device and system.
  • Online data redistribution refers to the completion of data redistribution without interrupting user services. At present, more and more databases are applying this technology.
  • In a distributed database, the data of one or more data tables is maintained on multiple nodes of the database.
  • In the related art, online data redistribution is usually implemented by creating temporary tables. For example, for a first data table whose data needs to be redistributed, a temporary table is first created for that table. All the data of the first data table deployed on the nodes corresponding to the first data table is then copied to the nodes corresponding to the temporary table. After the copy is complete, the data of the temporary table and the data of the first data table are exchanged (this process is called data switching). After the exchange is complete, the data of the temporary table and the temporary table itself are deleted, and the data redistribution is finished.
  • The embodiments of the present application provide a data redistribution method, apparatus, and system, which can reduce the complexity of online data redistribution.
  • In a first aspect, a data redistribution method is provided. The method includes: determining a first node set and a second node set that are associated with a first data table in a distributed database, where the first node set includes the data nodes that store the data in the first data table before the data in the first data table is redistributed, and the second node set includes the data nodes used to store the data in the first data table after the data in the first data table is redistributed; migrating the data of the first data table from the first node set to the second node set; in the process of migrating the data, when a target service request for the first data table is received, determining, in the first node set and the second node set, a third node set for responding to the target service request; and sending the target service request to the data nodes in the third node set, where the target service request is used by each data node in the third node set to perform service processing based on the target service request.
  • The data redistribution method provided by the embodiments of the present application can respond to the target service request without establishing a temporary table, thereby realizing online data redistribution. In this way, data migration between tables is not needed and only intra-table data migration is performed, which reduces the complexity of online data redistribution.
  • Optionally, determining, in the first node set and the second node set, the third node set used to respond to the target service request includes: when the target service request is a data addition request, determining, in the second node set, the third node set used to respond to the data addition request.
  • Determining, in the second node set, the third node set used to respond to the data addition request includes: calculating a hash value according to the key value of the newly added data carried in the data addition request, and determining the data node corresponding to the hash value in the second node set, where the determined data node belongs to the third node set.
  • Using hash distribution rules for data distribution can achieve load balancing (see the routing sketch below).
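  • As a hedged illustration of this routing step (a sketch, not the implementation prescribed by this description), the following Python code hashes the key value carried in a hypothetical data addition request into a hash-bucket number and maps that bucket to a data node of the second node set; the bucket count, node names, and hash function are assumptions borrowed from the example that appears later in this description.

```python
# Sketch: route a data-addition request to a node of the second node set by
# hashing its key value (bucket count, node names and hash function assumed).
import zlib

BUCKET_COUNT = 17  # assumed number of hash buckets (matches the later example)

# Assumed second mapping (after redistribution): hash bucket -> data node.
NEW_BUCKET_TO_NODE = {
    1: "N7", 2: "N2", 3: "N3", 4: "N4", 5: "N5", 6: "N8",
    7: "N9", 8: "N2", 9: "N3", 10: "N4", 11: "N7", 12: "N8",
    13: "N9", 14: "N2", 15: "N3", 16: "N7", 17: "N5",
}

def hash_bucket(key: str) -> int:
    """Map a key value to a 1-based hash-bucket number (illustrative hash)."""
    return (zlib.crc32(key.encode("utf-8")) % BUCKET_COUNT) + 1

def route_insert(key: str) -> str:
    """Return the data node of the second node set that stores the new row."""
    return NEW_BUCKET_TO_NODE[hash_bucket(key)]

print(route_insert("order-42"))  # prints one of N2..N9, depending on the hash
```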
  • Alternatively, determining the third node set used to respond to the target service request in the first node set and the second node set includes: determining, in the first node set, data nodes used to respond to the target service request, and determining, in the second node set, data nodes used to respond to the target service request, where the third node set is composed of the data nodes determined from the first node set and the data nodes determined from the second node set.
  • Optionally, migrating the data of the first data table from the first node set to the second node set includes: filtering out the data to be migrated from the data of the first data table stored in the first node set, where the data to be migrated is the data of the first data table that was not stored in the second node set before the migration; and migrating the data to be migrated from the first node set to the second node set.
  • Data that is deployed on the same data node before and after the migration and/or data that has already been deleted before the migration (invalid migration data) not only occupies resources if migrated, but also affects the efficiency of the migration. Therefore, the invalid migration data can be eliminated through the filtering operation, and only the data that actually needs to be migrated is treated as the data to be migrated. That is, the data to be migrated includes the data in the first data table except the invalid migration data. In this way, only part of the table data is migrated, which reduces the amount of data to be migrated, reduces the occupation of resources, and improves migration efficiency.
  • Filtering out the data to be migrated from the data in the first data table stored in the first node set includes: obtaining a first mapping relationship between the data in the first data table and the data nodes of the first node set; obtaining a second mapping relationship between the data in the first data table and the data nodes of the second node set; and, for target data in the first data table, when the data node corresponding to the target data determined based on the first mapping relationship is different from the data node corresponding to the target data determined based on the second mapping relationship, determining, on the data node corresponding to the target data determined based on the first mapping relationship, the target data as the data to be migrated.
  • Different data of the first data table may be migrated from the first node set to the second node set through multiple distributed transactions executed in series.
  • Migrating different data of the first data table from the first node set to the second node set through the multiple serially executed distributed transactions includes: selecting, through the currently executed distributed transaction, the data to be migrated that meets a migration condition from the un-migrated data of the first data table in the first node set, and migrating the selected data to be migrated from the first node set to the second node set, where the selected data to be migrated is locked during the migration process.
  • The migration condition includes: the data volume of the data to be migrated through the currently executed distributed transaction is less than or equal to a specified data volume threshold, and/or the migration duration of the currently executed distributed transaction is less than or equal to a specified duration threshold.
  • Selecting, through the currently executed distributed transaction, the data to be migrated that meets the migration condition from the un-migrated data of the first data table in the first node set, and migrating the selected data to be migrated from the first node set to the second node set, includes:
  • generating n distributed plans for n data nodes respectively, where the first node set includes the n data nodes, the n data nodes are in one-to-one correspondence with the n distributed plans, and n is a positive integer; and
  • instructing the n data nodes to execute the n distributed plans, so that the n data nodes select, in parallel, the data to be migrated that meets a sub-migration condition from the un-migrated data of the first data table on the n data nodes and send the selected data to be migrated from the n data nodes to the second node set, where the sub-migration condition is determined according to the migration condition.
  • Optionally, the method further includes: in the process of migrating the data of the first data table, if a rollback trigger event is detected, rolling back the data that has been migrated through the multiple distributed transactions.
  • Alternatively, the method further includes: in the process of migrating the data of the first data table, if a rollback trigger event is detected, rolling back the data that has been migrated through the currently executed distributed transaction.
  • The rollback trigger event may be a failure (such as downtime) of a data node associated with the first data table, a data transmission error, a network error, a received rollback instruction, or the like.
  • The aforementioned distributed transactions ensure the data consistency and durability of the migration process.
  • In the embodiments of the present application, the overall data migration process of the first data table is split into multiple distributed transactions.
  • During the migration process of a distributed transaction, if a rollback trigger event is detected, only the operations of the currently executing distributed transaction need to be rolled back. After the migration condition is satisfied again, a new distributed transaction can be initiated to continue the data migration. Therefore, the granularity and amount of data to be rolled back are reduced, the amount of repeatedly migrated data is reduced, the overall impact of the rollback on the data migration process is reduced, resource waste is avoided, and the fault tolerance of the database is improved.
  • Optionally, the method further includes: setting a deletion flag for the migrated data of the first data table on the first node set.
  • When the migrated data is deleted in this way, it is essentially recorded as a historical version on the corresponding data node.
  • During subsequent scans, the historical version of the data can be skipped (that is, data carrying the deletion flag is skipped). In this way, it can be ensured that, during the data migration process, the user's data query operations are still performed effectively (a minimal sketch of this bookkeeping follows).
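  • A minimal sketch of this bookkeeping, under assumed in-memory structures (this description does not prescribe a storage format): migrated rows on a source node are marked with a deletion flag instead of being physically removed at once, and scans skip flagged rows.

```python
# Sketch: mark migrated rows with a deletion flag and skip them when scanning
# (the dict-based row format and helper names are assumptions).

rows_on_source_node = [
    {"key": "k1", "value": 10, "deleted": False},
    {"key": "k2", "value": 20, "deleted": False},
]

def mark_migrated(rows, migrated_keys):
    """Flag rows that have been migrated away; they remain as historical versions."""
    for row in rows:
        if row["key"] in migrated_keys:
            row["deleted"] = True

def scan(rows):
    """A scan skips historical versions, i.e. rows carrying the deletion flag."""
    return [row for row in rows if not row["deleted"]]

mark_migrated(rows_on_source_node, {"k2"})
print([row["key"] for row in scan(rows_on_source_node)])  # ['k1']
```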
  • In a second aspect, a data redistribution apparatus is provided. The apparatus may include at least one module, and the at least one module may be used to implement the data redistribution method provided by the foregoing first aspect or the various possible implementations of the first aspect.
  • In a third aspect, the present application provides a computing device, which includes a processor and a memory. The memory stores computer instructions, and the processor executes the computer instructions stored in the memory, so that the computing device executes the method provided by the foregoing first aspect or the various possible implementations of the first aspect, and so that the computing device deploys the data redistribution apparatus provided by the foregoing second aspect or the various possible implementations of the second aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium in which computer instructions are stored. The computer instructions instruct the computing device to execute the method provided by the foregoing first aspect or the various possible implementations of the first aspect, or instruct the computing device to deploy the data redistribution apparatus provided by the foregoing second aspect or the various possible implementations of the second aspect.
  • In a fifth aspect, the present application provides a computer program product. The computer program product includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. A processor of a computing device can read the computer instructions from the computer-readable storage medium and execute them, so that the computing device executes the method provided by the foregoing first aspect or the various possible implementations of the first aspect, and deploys the data redistribution apparatus provided by the foregoing second aspect or the various possible implementations of the second aspect.
  • In a sixth aspect, a distributed database system is provided, including a management node and data nodes. The management node includes the data redistribution apparatus provided by the second aspect or the various possible implementations of the second aspect, or the computing device provided by the third aspect.
  • In a seventh aspect, a chip is provided. The chip may include a programmable logic circuit and/or program instructions, and when the chip runs, it is used to implement the data redistribution method according to any one of the first aspect.
  • Figure 1 is a schematic diagram of a data redistribution method in the related art;
  • FIG. 2 is a schematic diagram of an application scenario involved in a data redistribution method provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a data redistribution method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of data nodes involved in a data redistribution method provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for screening data to be migrated according to an embodiment of the present application
  • FIG. 6 is a schematic diagram of a mapping relationship provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a data migration process provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an execution scenario of data migration provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a user business scenario of data migration provided by an embodiment of the present application.
  • FIG. 10 is a block diagram of a data redistribution device provided by an embodiment of the present application.
  • FIG. 11 is a block diagram of a second determining module provided by an embodiment of the present application.
  • FIG. 12 is a block diagram of a migration module provided by an embodiment of the present application.
  • FIG. 13 is a block diagram of another data redistribution device provided by an embodiment of the present application.
  • FIG. 14 is a block diagram of another data redistribution device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • In distributed databases (DDB), data redistribution technology can be applied in scenarios such as system expansion, system reduction, or data migration.
  • Online data redistribution refers to the completion of data redistribution without interrupting user services.
  • Relational databases refer to databases that use a relational model to organize data. They store data in the form of rows and columns, and a row of data, called a record, is usually the smallest unit of data reading and writing.
  • In a relational database, a collection of rows and columns is called a data table, and a data table can be regarded as a two-dimensional table.
  • the relational model can be simply understood as a two-dimensional tabular model.
  • the relational database includes one or more data tables and description information of the relationship between the data tables.
  • Each data table includes table data and table information.
  • the table data is the data deployed in the data node in the data table, that is, the aforementioned data stored in the form of rows and columns.
  • The table information is the information that describes the data table, such as the definition and structure information of the data table.
  • the table information of the data table can be stored in each data node where the table data is deployed, or it can be stored by a separate node.
  • In a relational database, data is stored in a structured manner: the fields of each data table are defined according to preset rules (that is, the structure of the table is predefined), and data is then stored according to the structure of the data table. In this way, since the form and content of the data are defined before the data is stored in the data table, the reliability and stability of the entire data table are relatively high.
  • In a distributed relational database, the data of one or more data tables is deployed on multiple data nodes of the database.
  • In the related art, online data redistribution is usually implemented by creating temporary tables.
  • As shown in Figure 1, all the data of the first data table T1 deployed on its corresponding data nodes (Figure 1 takes 3 data nodes, nodes 1 to 3, as an example), namely data 1 to 9, is copied to the data nodes corresponding to the temporary table (Figure 1 takes 4 data nodes, nodes 1 to 4, as an example). The method of copying all the data in a data table at one time is called full data migration.
  • For example, the relational database may be a Greenplum database (referred to as gpdb) or GaussDB.
  • In gpdb, the table that is undergoing data replication is locked: data addition operations (also known as data insertion operations), data deletion operations, and data modification operations are not allowed on the data in the table, and only data query operations are allowed.
  • In GaussDB, assuming the first data table needs to be redistributed, after the temporary table is established, in order to allow data updates (such as data addition, deletion, and/or modification) during the data redistribution process, GaussDB uses a specified file to record the updated data after receiving a data update request (such as a data addition request or a data deletion request). In this way, after the full data migration is completed, the data updated during the full data migration can be found, and incremental data migration is performed based on the updated data.
  • The incremental data migration process refers to checking whether the specified file contains update records (including records deleted, modified, and inserted during the full data migration). If there are update records, the updated data is copied again based on those records.
  • Since update operations may keep arriving, if the specified file still contains update records after several incremental data migrations, the first data table is locked (for example, with an exclusive lock) one last time and data replication is performed. After the data is copied, the exchange between the first data table and the temporary table is executed, and the lock is finally released.
  • FIG. 2 is a schematic diagram of an application environment of a distributed database system (DDBS) involved in a data redistribution method provided by an embodiment of the present application.
  • the DDBS may be a server or a server cluster composed of multiple servers, and the DDBS includes a distributed database management system (DDBMS) and a DDB.
  • An application can transparently operate the DDB through the DDBS.
  • The data in the DDB is stored in different local databases, managed by one or more DDBMSs, runs on different machines, is supported by different operating systems, and is connected together by different communication networks.
  • As shown in FIG. 2, the DDBS 10 includes a management node 101 (also called a database engine, coordinating data node, or coordinator) and data nodes 102.
  • the DDBMS may be deployed on the management node 101, and the DDB may be deployed on multiple data nodes (datanode) 102.
  • the distributed database can be established based on the share-nothing architecture, that is, all data in the database is distributed on data nodes, and data between data nodes is not shared.
  • the management node 101 is used to manage the corresponding data node 102, and implement the operation of the application program 20 on the data node 102, such as performing data addition operations, data deletion operations, data modification operations, or data query operations.
  • the management node 101 may be a single node, or a designated data node among multiple data nodes 102 or a data node obtained by election, and it may be a server or a server cluster composed of multiple servers.
  • Each data node represents a minimum processing unit of the DDBS.
  • each data node may be an application instance or a database execution process for managing and/or storing data.
  • the DDBS can be deployed on a server or a server cluster composed of multiple servers.
  • a distributed database can have multiple data tables. The data records of each data table are distributed to each data node according to a user-defined distribution rule.
  • The data distribution rule is usually hash distribution, that is, key-value distribution.
  • Hash distribution is a data distribution method based on a hash function. In hash distribution, value = f(key), where key is the key of the data (also known as the key value), value is the calculated result (also known as the hash value), and f is the hash function.
  • the hash bucket algorithm is a special hash algorithm that can resolve hash conflicts.
  • a hash bucket is a container for placing a linked list of different keys (also called a hash table).
  • the hash bucket is also called an f (key) set or a value set.
  • The keys that fall into the same hash bucket have the same value. Referring to the previous example, the number of hash buckets can be set to the value of the modulus, that is, 5, and the multiple values correspond one-to-one to the multiple hash buckets.
  • the value value can be used as the index or number of the hash bucket.
  • Each hash bucket stores keys with the same value.
  • The conflicting keys in the same hash bucket are stored in a singly linked list, which resolves the hash conflict.
  • The hash function may also be a remainder (modulo) function, in which case the number of hash buckets is the value of the modulus, or another function; this is not limited in the embodiments of the present application. A small sketch of hash-bucket distribution follows.
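  • The Python sketch below illustrates the hash-bucket background just described, using an assumed remainder hash function with modulus 5 (hence 5 hash buckets) and assumed node names; keys that collide in the same bucket are simply kept together in that bucket.

```python
# Sketch of hash-bucket distribution: value = f(key), where f is a remainder
# function with modulus 5, so there are 5 hash buckets (all values assumed).

MODULUS = 5                          # number of hash buckets
NODES = ["node1", "node2", "node3"]  # assumed data nodes

def f(key: int) -> int:
    """Hash function: remainder of the key, i.e. the hash value."""
    return key % MODULUS

# Each hash bucket collects all keys with the same f(key); colliding keys are
# stored together (the description mentions a singly linked list; a Python
# list plays that role here).
buckets = {value: [] for value in range(MODULUS)}
for key in [3, 8, 13, 4, 7, 12]:
    buckets[f(key)].append(key)

# Buckets are then placed on data nodes; one node may hold several buckets.
bucket_to_node = {value: NODES[value % len(NODES)] for value in buckets}

print(buckets)         # {0: [], 1: [], 2: [7, 12], 3: [3, 8, 13], 4: [4]}
print(bucket_to_node)  # {0: 'node1', 1: 'node2', 2: 'node3', 3: 'node1', 4: 'node2'}
```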
  • The embodiment of the present application provides a data redistribution method, which can be applied to the distributed database in the application environment shown in FIG. 2 and can reduce the complexity of online data redistribution. All or part of the method can be executed by the aforementioned management node. As shown in FIG. 3, the embodiment of the present application assumes that the first data table is the data table to be migrated, that is, the data table to be redistributed.
  • the method includes:
  • Step 301 The management node determines the first node set and the second node set respectively associated with the first data table in the distributed database.
  • The operation and maintenance personnel of the distributed database adjust the data nodes according to information such as the load of the database.
  • When the distributed database needs to add new data nodes (expansion scenario), delete some data nodes (reduction scenario), adjust the data stored on some data nodes (data migration scenario), or adjust a data table between data node groups (inter-group data table adjustment scenario), the operation and maintenance personnel can input a data redistribution instruction to the management node. The management node receives the data redistribution instruction and controls the data nodes to perform data redistribution based on the instruction.
  • The data redistribution instruction is a structured query language (SQL) instruction for instructing data redistribution, which includes one or more SQL statements.
  • the data nodes in the distributed database are divided into different data node groups, and each data node group contains the same or different numbers of data nodes.
  • When the user wants to migrate a data table created on one data node group to another data node group, the table data needs to be redistributed on the new data node group, which produces this scenario.
  • In different scenarios, the content of the data redistribution instruction is different.
  • In the expansion scenario, the data nodes after redistribution include all the data nodes before redistribution, and the data redistribution instruction is an expansion instruction, which is used to indicate the data table involved in the expansion operation (in this embodiment, the first data table) and the data nodes added by the expansion operation.
  • In the reduction scenario, the data redistribution instruction is a reduction instruction, which is used to indicate the data table involved in the reduction operation (in this embodiment, the first data table) and the data nodes removed by the reduction operation.
  • In the data migration scenario, the data redistribution instruction is a data migration instruction, which is used to indicate the data table involved in the data migration operation (in this embodiment, the first data table) and the target data nodes of the data migration operation.
  • In the inter-group data table adjustment scenario, the data redistribution instruction is used to indicate the data table involved (in this embodiment, the first data table) and the target data node group to which the table is to be migrated.
  • The management node may add a redistribution flag to the first data table, where the redistribution flag is used to identify that the first data table is in the data redistribution process. Subsequently, after receiving a user's service request, the management node can perform the corresponding actions by querying whether the data table involved in the service request carries the redistribution flag.
  • the management node may obtain the first node set and the second node set based on the data redistribution instruction (that is, parse the SQL statement in the data redistribution instruction).
  • The first node set includes the data nodes used to store the data in the first data table before the data in the first data table is redistributed; that is, the first node set is the set of data nodes on which the data of the first data table is currently deployed (when step 301 is executed, before step 302).
  • The second node set includes the data nodes used to store the data in the first data table after the data in the first data table is redistributed; that is, the second node set is the set of data nodes on which the data of the first data table will be deployed after the subsequent data migration (that is, after step 302).
  • both the first node set and the second node set include one or more data nodes.
  • For example, the data nodes on which the data of the first data table is currently deployed can be queried directly to obtain the first node set.
  • The distributed database may maintain, for each data table, the current mapping relationship between the data of the table and the data nodes of the node set on which the data is deployed, and each mapping relationship may be updated in real time based on the deployment location of the data of the corresponding data table, so that the first node set corresponding to the first data table can be obtained by querying the mapping relationship. For example, the mapping relationship between the data in the first data table and the data nodes of the first node set is called the first mapping relationship, and the first node set can be determined by querying the first mapping relationship.
  • the data redistribution instruction may carry an identification of the first node set, and the first node set is acquired based on the identification.
  • the second node set can be obtained directly through the data redistribution instruction; for example, in a capacity expansion scenario, the first node set and the data nodes added by the capacity expansion operation are determined as the data nodes included in the second node set.
  • Figure 2 takes the expansion scenario as an example, in which the first node set has 4 data nodes and the second node set has 6 data nodes. In the reduction scenario, the data nodes in the first node set other than the data nodes removed by the reduction operation are determined as the second node set; in the data migration scenario, the target data nodes of the data migration operation are determined as the second node set; in the inter-group data table adjustment scenario, the target data node group of the migration operation is determined as the second node set.
  • The first node set and the second node set may also be determined in other manners; the embodiment of the present application is merely illustrative and does not limit this.
  • Figure 4 assumes that the first node set includes data nodes N1 to N6, and the second node set includes data nodes N2 to N5, and N7 to N9.
  • the data nodes involved in this data redistribution include data nodes N1 to N9.
  • The data nodes involved in the data redistribution can be numbered in a unified order, and the hash distribution rule can be used to determine the mapping relationship between the data in the first data table and the data nodes in the first node set (the first mapping relationship) and the mapping relationship between the data in the first data table and the data nodes in the second node set (the second mapping relationship).
  • The first mapping relationship and the second mapping relationship can be determined based on the principle of minimum movement (also known as the minimum data movement principle). If the distributed system pre-stores the first mapping relationship between the data of the first data table and the data nodes of the first node set, that mapping relationship can be obtained directly and the hash calculation does not need to be performed again.
  • In this way, the mapping relationships of the table data distribution are organized, which makes it easy to find the direction of data movement in the subsequent data migration process and also prepares for generating distributed plans (also called distributed execution plans) in the process of migrating the data of the first data table.
  • The foregoing process of determining the first node set and the second node set determines which data nodes are involved before and after the data redistribution, while the process of determining the mapping relationships determines the specific data node on which each piece of data is distributed before and after the data redistribution (one possible reading of the minimum-movement principle is sketched below).
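  • This description does not spell out an algorithm for the minimum-movement principle; the sketch below is one hedged reading of it: a hash bucket keeps its old owner whenever that node survives into the new node set and still has spare quota, and only the remaining buckets are reassigned to the nodes with spare quota.

```python
# Sketch (one possible reading of the minimum-movement principle, not the
# algorithm prescribed by this description): derive a candidate second mapping
# from the first mapping while moving as few hash buckets as possible.
from collections import defaultdict

def rebalance(old_map, new_nodes):
    """old_map: bucket -> node (first mapping); new_nodes: nodes after the
    redistribution. Returns a candidate bucket -> node second mapping."""
    buckets = sorted(old_map)
    quota, extra = divmod(len(buckets), len(new_nodes))
    # Per-node target bucket count in the new node set (remainder spread out).
    target = {n: quota + (1 if i < extra else 0) for i, n in enumerate(new_nodes)}

    new_map, load, pending = {}, defaultdict(int), []
    for b in buckets:                # keep a bucket in place if its owner survives
        n = old_map[b]
        if n in target and load[n] < target[n]:
            new_map[b] = n
            load[n] += 1
        else:
            pending.append(b)
    for b in pending:                # move the rest onto nodes with spare quota
        n = max(new_nodes, key=lambda x: target[x] - load[x])
        new_map[b] = n
        load[n] += 1
    return new_map

# Example: expanding from nodes {A, B} to {A, B, C}; only 2 of 6 buckets move.
old = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(rebalance(old, ["A", "B", "C"]))
# {1: 'A', 2: 'A', 3: 'C', 4: 'B', 5: 'B', 6: 'C'}
```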
  • Step 302 The management node migrates the data of the first data table from the first node set to the second node set.
  • The migration action is similar in principle to a cut-and-paste of data: it moves a piece of data from one node to another node.
  • the process of migrating the data of the first data table from the first node set to the second node set is the process of moving the data of the first data table from the first node set to the second node set.
  • the data moved from the first node set is no longer stored in the first node set.
  • The data migration process of the first data table (that is, the data redistribution process) may be implemented in multiple ways. The following optional implementations are used as examples for illustration, without limitation:
  • In a first optional implementation, all the data in the first data table is directly migrated from the first node set to the second node set, that is, all the data in the first data table is used as the data to be migrated. Such a migration process is a full migration.
  • In a second optional implementation, the data to be migrated is filtered out from the data of the first data table stored in the first node set, where the data to be migrated is the data of the first data table that was not stored in the second node set before the migration; the data to be migrated is then migrated from the first node set to the second node set.
  • the data to be migrated includes data in the first data table except for invalid migration data. In this way, part of the table data can be migrated, the amount of data to be migrated can be reduced, the occupation of data resources can be reduced, and the migration efficiency can be improved.
  • In the present application, before filtering out the data to be migrated from the data of the first data table stored in the first node set, it is also possible to detect whether the first node set and the second node set contain the same data node. When the first node set and the second node set contain the same data node, the data to be migrated is filtered out from the data of the first data table stored in the first node set; when they do not contain the same data node, no filtering is performed. Since the screening of the data to be migrated is more computationally expensive than the foregoing detection, unnecessary screening can be avoided, thereby reducing the computational complexity and improving the efficiency of data migration.
  • the process of filtering the data to be migrated from the data in the first data table stored in the first node set may include:
  • Step 3021 the management node obtains the first mapping relationship between the data in the first data table and the data nodes of the first node set.
  • In a distributed database, the distribution of data follows the principle of load balancing. With reference to the foregoing introduction, in order to ensure uniform distribution of data and achieve load balancing, hash distribution rules are usually used to distribute data across the data nodes. Further, in order to avoid hash conflicts, data distribution can also be carried out by introducing a hash bucket algorithm. In a distributed database that introduces hash buckets, data is usually distributed across the data nodes in units of hash buckets in order to achieve load balancing. Generally, one data node can deploy the data corresponding to one or more hash buckets.
  • Therefore, the first mapping relationship may be characterized by the mapping relationship between hash values and the identifiers of the data nodes in the first node set. Further, in a distributed database to which the hash bucket algorithm is applied, since the hash value and the hash bucket identifier correspond one-to-one, the first mapping relationship may also be characterized by the mapping relationship between hash bucket identifiers and the identifiers of the data nodes in the first node set.
  • the identification of the data node may consist of one or more characters (such as numbers) to identify the data node; the identification of the data node may be the data node name (such as N1 or N2) or the data node number.
  • The hash bucket identifier can be composed of one or more characters (such as numbers) that identify the hash bucket; the hash bucket identifier can be the calculated hash value, or the hash bucket number, such as 1 or 2.
  • the first mapping relationship can be calculated in real time. If the distributed database records the first mapping relationship in advance, the pre-recorded first mapping relationship may also be directly acquired.
  • the first mapping relationship may be characterized by a relationship diagram, a relationship table, or a relationship index.
  • the first mapping relationship may be a relationship diagram as shown in FIG. 6.
  • For example, the first mapping relationship may be characterized by the mapping relationship between hash bucket numbers and the names of the data nodes in the first node set: data with hash bucket numbers 1 to 6 corresponds to the data nodes named N1 to N6 respectively, data with hash bucket numbers 7 to 12 corresponds to the data nodes named N1 to N6 respectively, and data with hash bucket numbers 13 to 17 corresponds to the data nodes named N1 to N5 respectively.
  • That is, data node N1 corresponds to the hash buckets numbered 1, 7, and 13; data node N2 corresponds to the hash buckets numbered 2, 8, and 14; data node N3 corresponds to the hash buckets numbered 3, 9, and 15; data node N4 corresponds to the hash buckets numbered 4, 10, and 16; data node N5 corresponds to the hash buckets numbered 5, 11, and 17; and data node N6 corresponds to the hash buckets numbered 6 and 12.
  • The name of a data node and the hash bucket numbers are thus in a one-to-many relationship.
  • Step 3022 the management node obtains a second mapping relationship between the data in the first data table and the data nodes of the second node set.
  • the second mapping relationship can be characterized in a variety of ways and in a variety of forms.
  • the second mapping relationship may be characterized by the mapping relationship between the hash value and the identifier of the data node in the second node set.
  • the second mapping relationship may be characterized by the mapping relationship between the hash bucket identifier and the identifier of the data node in the second node set.
  • the identification of the data node may consist of one or more characters (such as numbers) to identify the data node; the identification of the data node may be the data node name (such as N1 or N2) or the data node number.
  • The hash bucket identifier can be composed of one or more characters (such as numbers) that identify the hash bucket; the hash bucket identifier can be the calculated hash value, or the hash bucket number, such as 1 or 2.
  • the second mapping relationship can be calculated in real time, for example, determined based on the first mapping relationship and the principle of the minimum movement amount. If the second mapping relationship is pre-recorded in the distributed database, the pre-recorded second mapping relationship can also be directly obtained.
  • the second mapping relationship can be characterized in a relationship graph, a relationship table, or a relationship index.
  • the second mapping relationship may be a relationship diagram as shown in FIG. 6.
  • the second mapping relationship may be characterized by a mapping relationship between the hash bucket number and the name of the data node in the second node set
  • Data with hash bucket numbers 1 to 6 corresponds to the data nodes named N7, N2, N3, N4, N5, and N8 respectively; data with hash bucket numbers 7 to 12 corresponds to the data nodes named N9, N2, N3, N4, N7, and N8 respectively; and data with hash bucket numbers 13 to 17 corresponds to the data nodes named N9, N2, N3, N7, and N5 respectively.
  • That is, data node N2 corresponds to the hash buckets numbered 2, 8, and 14; data node N3 corresponds to the hash buckets numbered 3, 9, and 15; data node N4 corresponds to the hash buckets numbered 4 and 10; data node N5 corresponds to the hash buckets numbered 5 and 17; data node N7 corresponds to the hash buckets numbered 1, 11, and 16; data node N8 corresponds to the hash buckets numbered 6 and 12; and data node N9 corresponds to the hash buckets numbered 7 and 13.
  • the name of the data node and the hash bucket number are in a one-to-many relationship.
  • first mapping relationship and the second mapping relationship may be characterized by the same relationship diagram, relationship table, or relationship index, or may be characterized by their respective relationship diagrams, relationship tables, or relationship index.
  • FIG. 6 takes the first mapping relationship and the second mapping relationship to be represented by the same relationship diagram as an example for illustration, but it does not limit this.
  • Step 3023 Based on the first mapping relationship and the second mapping relationship, the management node filters the data to be migrated from the data in the first data table stored in the first node set.
  • The data to be migrated is the data of the first data table whose deployed data node differs before and after the migration (that is, the data redistribution), namely the valid migration data; in other words, the data to be migrated is the data of the first data table that was not stored in the second node set before the migration.
  • Each piece of data in the first data table can be traversed, and by comparing the first mapping relationship and the second mapping relationship, the data to be migrated can be selected from the data of the first data table stored in the first node set.
  • For target data in the first data table, when the data node corresponding to the target data determined based on the first mapping relationship is different from the data node corresponding to the target data determined based on the second mapping relationship, the target data is determined as the data to be migrated.
  • For example, a hash value is calculated for target data X in the first data table. Assuming the calculated hash value is 1, the target data X belongs to hash bucket 1, that is, the target data X is data with hash bucket number 1. Based on the first mapping relationship, the data node corresponding to the target data X is N1; based on the second mapping relationship, the data node corresponding to the target data X is N7. It can be seen that the target data X is located on different data nodes before and after the data migration, so the target data X on data node N1 is determined as data to be migrated.
  • Another optional comparison process includes: for each data node in the first node set, querying the first mapping relationship to obtain the first data set corresponding to the data node, and querying the second mapping relationship to obtain the second data set corresponding to the data node; the data that is in the first data set but not in the second data set is used as the data to be migrated corresponding to the data node.
  • The data to be migrated obtained for each data node in the first node set together constitutes the final data to be migrated.
  • A data node in the first node set may not exist in the second node set; if a data node does not exist in the second node set, the second data set corresponding to that data node is empty.
  • For example, for data node N2 in the first node set, the first mapping relationship is queried, and the first data set corresponding to the node is the data with hash bucket numbers 2, 8, and 14; the second mapping relationship is queried, and the second data set corresponding to the node also includes the data with hash bucket numbers 2, 8, and 14. Therefore, the data to be migrated corresponding to data node N2 is empty.
  • the method for obtaining the data to be migrated for other data nodes is similar, and the details are not described in the embodiment of the present application.
  • Finally, the data to be migrated corresponding to the first node set includes the data with hash bucket numbers 1, 11, and 16 (subsequently migrated from data nodes N1, N5, and N4, respectively, to data node N7), the data with hash bucket numbers 6 and 12 (subsequently migrated from data node N6 to data node N8), and the data with hash bucket numbers 7 and 13 (subsequently migrated from data node N1 to data node N9); a sketch of this screening computation is given below.
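  • The Python sketch below reproduces this screening step with the mapping data of FIG. 6 described above (hash buckets 1 to 17, old nodes N1 to N6, new nodes N2 to N5 and N7 to N9); diffing the two mappings bucket by bucket yields exactly the buckets listed in the previous paragraph. The dictionary encoding of the mappings is an assumption made for the sketch.

```python
# Sketch: screen the data to be migrated by comparing the first mapping
# (before redistribution) with the second mapping (after), per FIG. 6.

FIRST_MAPPING = {   # hash bucket -> data node before redistribution
    1: "N1", 2: "N2", 3: "N3", 4: "N4", 5: "N5", 6: "N6",
    7: "N1", 8: "N2", 9: "N3", 10: "N4", 11: "N5", 12: "N6",
    13: "N1", 14: "N2", 15: "N3", 16: "N4", 17: "N5",
}
SECOND_MAPPING = {  # hash bucket -> data node after redistribution
    1: "N7", 2: "N2", 3: "N3", 4: "N4", 5: "N5", 6: "N8",
    7: "N9", 8: "N2", 9: "N3", 10: "N4", 11: "N7", 12: "N8",
    13: "N9", 14: "N2", 15: "N3", 16: "N7", 17: "N5",
}

def buckets_to_migrate(first, second):
    """Return {source node: [(bucket, target node), ...]} for every bucket
    whose owning data node differs before and after the redistribution."""
    moves = {}
    for bucket, src in first.items():
        dst = second.get(bucket)
        if dst is not None and dst != src:
            moves.setdefault(src, []).append((bucket, dst))
    return moves

for src, moves in sorted(buckets_to_migrate(FIRST_MAPPING, SECOND_MAPPING).items()):
    print(src, "->", moves)
# N1 -> [(1, 'N7'), (7, 'N9'), (13, 'N9')]
# N4 -> [(16, 'N7')]
# N5 -> [(11, 'N7')]
# N6 -> [(6, 'N8'), (12, 'N8')]
```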
  • In the related art, data updates are temporarily prohibited by adding an exclusive lock to the source table.
  • In gpdb, the source table needs to be locked during the entire migration process; if the data to be migrated is large, for example tens of gigabytes (GB) or terabytes (TB) of data, this causes tens of minutes or even hours of user service congestion.
  • In GaussDB, the overall migration process is divided into a full migration and multiple incremental migrations; if the data to be migrated is large, for example tens of GB or tens of TB, it still causes tens of minutes of user service congestion.
  • In contrast, the screening of the data to be migrated in step 3023 avoids migrating a large amount of invalid migration data, thereby reducing the duration of service congestion and improving migration efficiency.
  • the foregoing process of migrating the data of the first data table from the first node set to the second node set may be performed by one or more distributed transactions.
  • the distributed transaction in the embodiment of the present application involves a management node and multiple data nodes.
  • A distributed transaction usually includes three stages: a transaction start stage, a transaction execution stage, and a transaction commit stage. In the transaction start stage, the management node prepares certain statements for the subsequent transaction execution stage; in the transaction execution stage, the management node performs one or more actions involved in the distributed transaction, and multiple actions can be executed in parallel.
  • the action included in the distributed transaction may be a scanning action or a migration action.
  • The migration action can involve one or more SQL statements, and the actions included in a distributed transaction can also be generating distributed plans and sending distributed plans. In the transaction commit stage, a two-phase commit (2PC) protocol or a three-phase commit (3PC) protocol is followed to maintain the consistency of transaction execution across the management node and the multiple data nodes; a minimal 2PC sketch follows.
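  • As a hedged illustration of the commit stage, the following sketch shows standard two-phase commit in a very simplified form (class and method names are assumptions, and a real system would add logging, timeouts, and crash recovery): the coordinator asks every participant to prepare and commits only if all of them vote yes, otherwise it aborts all of them.

```python
# Illustrative two-phase commit (2PC) sketch: the management node acts as
# coordinator, the data nodes as participants (all names are assumptions).

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "init"

    def prepare(self) -> bool:      # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit

    def commit(self):               # phase 2a: make the work durable
        self.state = "committed"

    def abort(self):                # phase 2b: undo the work
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    if all(p.prepare() for p in participants):   # phase 1: prepare everywhere
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                        # any "no" vote aborts all
        p.abort()
    return False

nodes = [Participant("N1"), Participant("N7")]
print(two_phase_commit(nodes), [p.state for p in nodes])
# True ['committed', 'committed']
```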
  • the foregoing process of migrating the data of the first data table from the first node set to the second node set may be implemented by multiple distributed transactions executed in series.
  • the management node may execute the multiple distributed transactions serially to control the data nodes in the first node set and the second node set to implement data migration.
  • the management node selects the data to be migrated that meets the migration condition from the un-migrated data in the first data table in the first node set through the distributed transaction currently executed (The method for determining the data to be migrated can refer to the aforementioned steps 3021 to 3023), and the selected data to be migrated is migrated from the first node set to the second node set.
  • the selected data to be migrated is locked in the process of being migrated, and usually when the distributed transaction for migrating the data to be migrated is successfully submitted, the data to be migrated is unlocked.
  • the migration condition includes: the data volume of the data to be migrated through the currently executed distributed transaction is less than or equal to the specified data volume threshold, and/or the migration time of the currently executed distributed transaction is less than or Equal to the specified duration threshold.
  • the data volume of the data to be migrated can be characterized by the number of records.
  • the data of a record is also a row of data in the data table, which is the smallest unit of data migration.
  • Correspondingly, the specified data volume threshold can be characterized by a specified threshold on the number of records.
  • the aforementioned data volume threshold and the specified duration threshold may be fixedly set values, or may be dynamically changed values, respectively.
  • For example, the specified data volume threshold may be determined based on the data volume of the first data table and/or the current load information of the distributed database; and/or the specified duration threshold may be determined based on the data volume of the first data table and/or the load information of the resources currently used by the distributed database (such as one or more of CPU, memory, or IO resources).
  • the data volume of the first data table is positively correlated with the data volume threshold, and is positively correlated with the specified duration threshold
  • The current load of the distributed database is negatively correlated with the data volume threshold and negatively correlated with the specified duration threshold. That is, the larger the data volume of the first data table, the larger the data volume threshold and the longer the duration threshold; the larger the current load of the distributed database, the smaller the data volume threshold and the shorter the duration threshold. A batched-migration sketch under these conditions is given below.
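  • The loop below sketches this batching policy (the thresholds, the callables, and the in-memory row list are assumptions rather than interfaces defined by this description): each serially executed batch plays the role of one distributed transaction, stops at whichever of the data-volume or duration threshold is hit first, and a failure rolls back only the batch in flight.

```python
# Sketch: migrate un-migrated rows in serial batches, each batch bounded by a
# data-volume threshold and a duration threshold (threshold values assumed).
import time

DATA_VOLUME_THRESHOLD = 1000   # max rows per distributed transaction (assumed)
DURATION_THRESHOLD_S = 2.0     # max seconds per distributed transaction (assumed)

def migrate_in_batches(unmigrated_rows, send_to_target, mark_deleted_on_source):
    """unmigrated_rows: rows still on the first node set. send_to_target and
    mark_deleted_on_source are callables supplied by the caller (assumed)."""
    while unmigrated_rows:
        batch, start = [], time.monotonic()
        # One "distributed transaction": stop at either threshold.
        while (unmigrated_rows
               and len(batch) < DATA_VOLUME_THRESHOLD
               and time.monotonic() - start <= DURATION_THRESHOLD_S):
            batch.append(unmigrated_rows.pop())   # row is locked while in flight
        try:
            send_to_target(batch)                 # copy rows to the second node set
            mark_deleted_on_source(batch)         # then flag them on the source
        except Exception:
            unmigrated_rows.extend(batch)         # roll back only this batch
            raise

migrated, flagged = [], []
migrate_in_batches(list(range(2500)), migrated.extend, flagged.extend)
print(len(migrated), len(flagged))  # 2500 2500
```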
  • After the management node migrates the corresponding data to be migrated through each currently executed distributed transaction, it can delete the migrated data of the first data table stored on the data nodes of the first node set, so that subsequent scans can distinguish which data has been migrated and which has not.
  • The length of time the user's service is blocked is actually the length of time the data is locked. Since the data migrated by each distributed transaction is different, for each piece of migrated data, the length of time it is locked is the duration of the migration process of the corresponding distributed transaction.
  • In the embodiments of the present application, the table data migration is performed in batches through multiple serially executed transactions. By limiting the data volume and/or migration duration of each distributed transaction, excessive resource consumption during the execution of each distributed transaction is avoided, and the lock time corresponding to each distributed transaction is reduced.
  • In comparison, in the related art the length of time each piece of migrated data is locked equals the duration of the entire migration process; in GaussDB, although the overall migration process is divided into a full migration and multiple incremental migrations so that the lock time of each piece of migrated data is relatively shortened, the overall service blocking time is still relatively long.
  • In the embodiments of the present application, the duration for which each piece of migrated data is locked is much less than the lock duration in the traditional data redistribution process.
  • For example, the overall service congestion time can be reduced to about 1 minute, which is usually imperceptible to users. Therefore, compared with the traditional data redistribution method, the method can effectively reduce service congestion time, ensure smooth services, and improve user experience.
  • the lock added to the migrated data is a write lock, which avoids modification and deletion operations on the data during the migration process, but query operations on the data can still be performed.
• the management node may initiate multiple distributed transactions in series based on the determined first node set and second node set, generate one or more distributed plans when executing each distributed transaction, and instruct the data nodes in the first node set and/or the second node set to execute the generated distributed plans, so as to implement the data migration of the aforementioned first data table.
  • each distributed plan corresponds to one or more data nodes.
  • the distributed plan includes one or more SQL statements, which are used to indicate the actions performed by the corresponding data nodes, and the sequence of actions to be performed.
  • the executed action may be a scanning action or a migration action.
  • the distributed plan may carry the foregoing migration conditions, or sub-migration conditions determined based on the migration conditions.
  • the management node may also adjust the content of the distributed plan in combination with the current system resource situation, for example, adjust the migration conditions or sub-migration conditions.
  • the distributed plan can be implemented by executing transactions or tasks in the corresponding data nodes. For example, when a data node receives the distributed plan, it can initiate a transaction (also called a local transaction) or task to execute the actions indicated in the distributed plan in the order indicated in the distributed plan.
• in the first optional manner, the management node generates multiple distributed plans based on the currently executed distributed transaction to instruct multiple data nodes to perform the data migration of the first data table.
  • the first node set includes n data nodes, and n is a positive integer; the second node set includes m data nodes, and m is a positive integer.
  • the migration process includes:
• Step 3024: Based on the currently executed distributed transaction, the management node generates n distributed plans for the n data nodes, the n data nodes corresponding to the n distributed plans one-to-one; the management node instructs the n data nodes to respectively execute the n distributed plans, so as to select, in parallel, the data to be migrated that meets the sub-migration conditions from the un-migrated data of the first data table on the n data nodes, and to send the selected data to be migrated that meets the sub-migration conditions from the n data nodes to the second node set.
  • the management node sends each of the n distributed plans generated based on the distributed transaction to the corresponding data node.
  • the distributed plan is executed by the corresponding data node.
  • the management node executes the next distributed transaction, and then generates n new distributed plans, and sends them to the corresponding data nodes, and so on. If all data of the first data table has been migrated, the management node cancels the table redistribution flag, and prepares for data migration of the next data table.
  • the aforementioned sub-migration conditions are determined based on the migration conditions.
• the distributed plan may also carry the aforementioned sub-migration conditions. For example, when the migration condition is that the data volume of the data to be migrated through the currently executed distributed transaction is less than or equal to the specified data volume threshold, the corresponding sub-migration condition is that the data volume of the data to be migrated by executing the corresponding distributed plan is less than or equal to a sub-data-volume threshold.
• the sub-data-volume threshold is less than the specified data volume threshold.
• the sub-data-volume thresholds corresponding to the n distributed plans can be equal or unequal.
• for example, each sub-data-volume threshold corresponding to the n distributed plans may be equal to one n-th of the specified data volume threshold.
• when the migration condition is that the migration duration of the currently executed distributed transaction is less than or equal to the specified duration threshold, the corresponding sub-migration condition is that the migration duration of executing the corresponding distributed plan is less than or equal to a sub-duration threshold.
• the sub-duration threshold is less than or equal to the specified duration threshold, and the maximum value of the sub-duration thresholds corresponding to the n distributed plans is the aforementioned specified duration threshold.
• the sub-duration thresholds corresponding to the n distributed plans can be equal or unequal.
• for example, the sub-duration thresholds corresponding to the n distributed plans may all be equal to the specified duration threshold, as in the sketch below.
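• The following sketch shows one way the migration condition might be split into per-plan sub-migration conditions; dividing the data volume threshold evenly and reusing the full duration threshold for each plan are assumptions taken from the examples above, not the only possible choice.

```python
def sub_conditions(n: int, data_volume_threshold: int, duration_threshold: float):
    """Split one distributed transaction's migration condition into n
    sub-migration conditions, one per distributed plan (illustrative).

    The data volume threshold is divided evenly across the n plans, while
    each plan keeps the full duration threshold, so the maximum of the
    sub-duration thresholds equals the specified duration threshold.
    """
    per_plan_rows = data_volume_threshold // n
    return [{"max_rows": per_plan_rows, "max_seconds": duration_threshold}
            for _ in range(n)]

# Three data nodes sharing a 9000-record / 5-second migration condition.
print(sub_conditions(3, data_volume_threshold=9000, duration_threshold=5.0))
```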
  • the distributed plan obtained can be implemented by executing transactions or tasks in the data node.
• assuming that the first data node is any one of the n data nodes, the following takes the first data node executing a local transaction to implement its distributed plan as an example.
• the distributed plan generated for the first data node may include one or more SQL statements, which are used to instruct the first data node to perform the scanning action and the migration action, where the scanning action and the migration action are executed in parallel; the target data node of the data migration is the second data node (that is, a data node in the second node set), and the distributed plan carries the sub-migration condition.
• based on the distributed plan, the first data node can scan the un-migrated data of the first data table stored on the first data node through a local transaction scan (also called a table scan) to select the data to be migrated that meets the sub-migration condition, and send the selected data to be migrated that meets the sub-migration condition from the first data node to the second data node in the second node set.
• for example, the first data node can traverse the data of the first data table stored on the first data node through the local transaction, and use the data obtained by the traversal as the data to be migrated.
• optionally, the data to be migrated is obtained by filtering the data of the first data table; in that case the first data node traverses the un-migrated data of the first data table on the first data node through the local transaction.
  • the screening process can refer to the aforementioned step 3023.
• when the currently executed distributed transaction is the first distributed transaction, the un-migrated data scanned by the n data nodes is all the data of the first data table;
• otherwise, the un-migrated data obtained by the scanning of the n data nodes is the data of the first data table that has not been migrated by the previously executed distributed transactions.
• in a first optional scanning mode, in each distributed transaction, the first data node scans all records of the first data table stored on the first data node through the local transaction to obtain the un-migrated data, that is, the first data table stored on the first data node is scanned from the beginning, from top to bottom.
• in this scanning mode, in each distributed transaction executed by the management node, the first data node is instructed to scan all the records of the first data table stored on the first data node, so omission of data to be migrated can be avoided.
• in another optional scanning mode, the first data node can record, through the local transaction, the position at which the scan ends; in the next distributed transaction executed by the management node, the first data node is instructed, based on the corresponding distributed plan, to continue scanning backwards from the most recently recorded position of the first data table stored on the first data node to obtain un-migrated data. In this way, records on the first data node that have already been scanned are prevented from being scanned again.
• in this case, the management node can generate n distributed plans through the last executed distributed transaction, each distributed plan instructing the corresponding data node to scan all remaining data of the first data table stored on that data node at one time, thereby avoiding data omission; alternatively, the n data nodes can be controlled through multiple distributed transactions to scan different data of the first data table at the same time. A sketch of this resumable, batch-limited scan follows.
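• Below is a minimal sketch of such a node-local scan: it skips rows already marked as migrated and returns the position where it stopped so the next distributed transaction can resume there. The row representation and the `migrated` flag are illustrative assumptions.

```python
def scan_batch(rows, start_pos, max_rows):
    """Scan un-migrated rows of the first data table on one data node.

    Rows migrated by earlier distributed transactions carry a 'migrated'
    flag and are skipped; `next_pos` is the position recorded by the local
    transaction so the next scan does not re-read rows it has already seen.
    """
    batch, pos = [], start_pos
    while pos < len(rows) and len(batch) < max_rows:
        row = rows[pos]
        if not row["migrated"]:
            batch.append(row["key"])
        pos += 1
    return batch, pos

# A node-local slice where even-numbered rows were migrated earlier.
table_slice = [{"key": k, "migrated": k % 2 == 0} for k in range(10)]
first_batch, resume_at = scan_batch(table_slice, start_pos=0, max_rows=3)
second_batch, _ = scan_batch(table_slice, start_pos=resume_at, max_rows=3)
print(first_batch, second_batch)   # [1, 3, 5] [7, 9]
```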
• the foregoing steps 3023 and 3024 can be executed in a nested manner, that is, the specific actions of the foregoing step 3023 are performed by the management node instructing the data nodes through the distributed plans.
• Step 3025: Based on the currently executed distributed transaction, the management node generates m distributed plans for the m data nodes, the m data nodes corresponding to the m distributed plans one-to-one; the management node instructs the m data nodes to respectively execute the m distributed plans, so as to receive and store, in parallel, the data of the first data table sent from the first node set.
  • the distributed plan obtained can be implemented by executing transactions or tasks in the data node.
• assuming that the second data node is any one of the m data nodes, the following takes the second data node executing a local transaction to implement its distributed plan as an example.
  • the distributed plan generated for the second data node may include one or more SQL statements, which are used to instruct the second data node to perform the receiving action and the storing action, and the receiving action and the storing action are executed in parallel.
• the source data node of the data is the first data node. Based on the distributed plan, the second data node can receive and store, through a local transaction, the data of the first data table sent from the first node set.
• for each data node in the first node set, the data node executes a local transaction to implement the distributed plan issued by the management node.
  • the local transaction executed by the data node may include two threads, and the two threads are used to perform the aforementioned scanning action and migration action respectively.
  • each local transaction includes a scanning thread and a sending thread.
• the scanning thread is used to scan the un-migrated data of the first data table on the corresponding data node in the first node set (that is, when scanning the data of the first data table, the deleted data is skipped) to obtain the data to be migrated; the process of determining the data to be migrated can refer to the aforementioned step 3023. The sending thread is used to send the data to be migrated to the target data node in the second node set. The two threads can be executed in parallel to improve the efficiency of data redistribution. For each data node in the second node set, the data node executes a local transaction to implement the distributed plan issued by the management node.
  • the local transaction executed by the data node may include a receiving thread for receiving data sent by other data nodes and writing the received data to the local data node. Since data nodes in the first node set may also receive data from other nodes, the local transaction executed by each data node in the first node set may also include a receiving thread. Similarly, since data nodes in the second node set may also send data to other nodes, the local transaction executed by each data node in the second node set may also include a sending thread.
• optionally, the data node can initiate a sending-and-receiving thread by executing a local transaction (that is, the local transaction includes a sending-and-receiving thread) to complete the functions of the aforementioned sending thread and receiving thread, that is, both receiving and sending data.
• a data node in the first node set can send data to the target data node in the second node set to which its data to be migrated is moved.
• after the data to be migrated has been sent, a migration completion notification (also called an end mark) may be sent to the target data node.
• the source data node from which a data node receives data (that is, the source data node corresponding to the data node) can be recorded in the distributed plan. A sketch of this thread structure is given below.
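• The following is a minimal sketch, assuming in-process queues stand in for the network, of a source node's scan thread and send thread running in parallel and a target node's receiving thread consuming the stream until it sees the end mark; the row format and queue-based channel are illustrative assumptions.

```python
import queue
import threading

def run_source_node(local_rows, send_q, max_rows):
    """Local transaction on a first-node-set data node: a scan thread finds
    un-migrated rows while a send thread ships them to the target node."""
    found = queue.Queue()

    def scan():
        sent = 0
        for row in local_rows:
            if sent >= max_rows:
                break
            if not row["migrated"]:
                found.put(row)
                sent += 1
        found.put(None)                      # tell the send thread we are done

    def send():
        while (row := found.get()) is not None:
            send_q.put(row)
        send_q.put({"end_mark": True})       # migration completion notification

    threads = [threading.Thread(target=scan), threading.Thread(target=send)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_target_node(recv_q, storage):
    """Receiving thread on a second-node-set data node: store rows until
    the end mark from the corresponding source node arrives."""
    while not (row := recv_q.get()).get("end_mark"):
        storage.append(row)

channel, target_storage = queue.Queue(), []
rows = [{"key": k, "migrated": False} for k in range(5)]
src = threading.Thread(target=run_source_node, args=(rows, channel, 3))
dst = threading.Thread(target=run_target_node, args=(channel, target_storage))
src.start(); dst.start(); src.join(); dst.join()
print(target_storage)                        # the three rows that were shipped
```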
• multiple data nodes can be instructed to execute multiple distributed plans in parallel to perform data migration in parallel, which can effectively shorten the execution time of each distributed transaction and improve the execution efficiency of the distributed transactions.
  • the first node set includes data nodes N1 to N3, the second node set includes data node N4, and the management node migrates the data to be migrated through two distributed transactions executed in series.
  • the two distributed transactions include the first distributed transaction and the second distributed transaction.
• based on the 3 distributed plans generated by the first distributed transaction, the 3 data nodes in the first node set execute transactions 1a to 1c; based on the 3 distributed plans generated by the second distributed transaction, the 3 data nodes in the first node set execute transactions 2a to 2c.
• each of transactions 1a to 1c, when executed, completes the migration of the data of 1 record after scanning the data of multiple un-migrated records on the corresponding data node.
• each data node executes its corresponding distributed plan, so that each data node scans the data on the local data node through its transaction, finds the data to be migrated, sends the data to be migrated to the target data node (data node N4 in FIG. 8), and at the same time deletes, from the local data node, the data that has been migrated by the transaction.
• the scanning and migration actions are executed in parallel until the aforementioned migration condition is met, or until each data node satisfies its corresponding sub-migration condition.
  • the management node submits the first distributed transaction to complete the migration of this batch of data.
  • the process of finding the data to be migrated can refer to the corresponding process in the foregoing step 302.
• for the execution process of transactions 2a to 2c, refer to the execution process of the aforementioned transactions 1a to 1c, which is not repeated in the embodiment of the present application.
• optionally, a distributed plan corresponding to data node N4 can also be generated, and data node N4 implements that distributed plan by executing a transaction (not shown in FIG. 8), thereby receiving the data sent by data nodes N1 to N3 and storing the received data on data node N4.
• in the second optional manner, the management node generates one distributed plan based on the currently executed distributed transaction, and instructs the data nodes in the first node set and the data nodes in the second node set to execute the distributed plan, so as to select the data to be migrated that meets the migration condition from the un-migrated data of the first data table in the first node set, and migrate the selected data to be migrated from the first node set to the second node set.
  • the distributed plan corresponds to multiple data nodes in the first node set and the second node set, and it can be regarded as an integrated plan of n distributed plans and m distributed plans in the aforementioned first optional manner.
  • the distributed plan includes one or more SQL statements, which are used to indicate the actions performed by each data node in the first node set and the second node set, as well as the order of execution of the actions, etc.
  • the executed action may include a scanning action, a migration action, a receiving action, and/or a storage action.
  • the distributed plan may also carry the aforementioned migration conditions. After each data node receives the distributed plan, it can determine the action it needs to perform, and can also determine the sub-migration condition corresponding to it based on the migration condition.
• the process of determining the sub-migration condition can refer to the first optional manner described above.
  • the distributed plan can be implemented by executing transactions or tasks in data nodes.
  • the distributed database adopts a multi-version concurrency control (Multiversion Concurrency Control, MVCC) mechanism for data storage.
  • the data deleted from a data node is not physically removed from the data node, but is also stored on the data node as a historical version.
  • the management node sets a deletion flag for the migrated data in the first data table on the first node set (or sets the deletion flag through the distributed plan control data node), and the deletion flag indicates that the data has been migrated.
  • the data is transformed into historical version data.
• that is, when the migrated data described in the foregoing steps is deleted, it is essentially recorded as a historical version on the corresponding data node; when subsequently executed distributed transactions perform data scanning, it is sufficient to skip the historical-version data (that is, to skip the data carrying the deletion flag). In this way, it can be ensured that, during the data migration process, the user's data query operations on the historical-version data are effectively executed.
• for example, a data query operation that queries the historical version of the data in the first node set can still be served during the migration.
  • the historical version of the data will no longer be accessed and can be physically deleted.
• based on the periodic data cleaning mechanism of its operation, the distributed database removes the historical-version data from the data of the first data table, that is, physically removes it from the distributed database (this process is the expired-data cleaning process); a sketch of this handling follows.
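• A minimal sketch of this MVCC-style handling is shown below: migrated rows are only marked with a deletion flag and kept as historical versions, migration scans skip them, and a periodic cleanup pass physically removes them. The class and field names are illustrative assumptions.

```python
class NodeSlice:
    """Node-local slice of the first data table under an MVCC-style scheme."""

    def __init__(self, rows):
        self.rows = [dict(row, deleted=False) for row in rows]

    def mark_migrated(self, keys):
        # Deletion flag: the row becomes a historical version; it is not
        # physically removed, so in-flight queries can still read it.
        for row in self.rows:
            if row["key"] in keys:
                row["deleted"] = True

    def scan_unmigrated(self):
        # Migration scans skip rows carrying the deletion flag.
        return [r["key"] for r in self.rows if not r["deleted"]]

    def clean_expired(self):
        # Periodic expired-data cleaning physically removes historical versions.
        self.rows = [r for r in self.rows if not r["deleted"]]

s = NodeSlice([{"key": k} for k in range(4)])
s.mark_migrated({0, 2})
print(s.scan_unmigrated())   # [1, 3]: only un-migrated rows are scanned
s.clean_expired()            # later, historical versions are removed for good
```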
• Step 303: In the process of migrating the data of the first data table, when receiving a target service request for the first data table, the management node determines, in the first node set and the second node set, a third node set for responding to the target service request.
• the corresponding service requests are a data query request, a data addition request (also called a data insertion request), a data deletion request, and a data modification request.
  • a data query request is used to request a data query operation of data
  • a data addition request is used to request a data addition operation
  • a data deletion request is used to request a data deletion operation
  • a data modification request is used to request a data modification operation.
• based on its association with data tables, the data query service is divided into a data query service associated with one data table and a data query service associated with multiple data tables.
• the data query operation indicated by a data query request corresponding to a data query service associated with one data table only needs to query data in one data table, while the data query operation indicated by a data query request corresponding to a data query service associated with multiple data tables needs to query data in multiple data tables.
  • the data query request is: "Query the female employee information in company X".
• the query operation involves only one data table, so the data query request is a data query request corresponding to a data query service associated with one data table. For another example, the data query request is: "Query the information of female employees of the customer companies of company X". Assuming that the customer companies of company X are recorded in a second data table, and the female employee information of different customer companies is recorded in different data tables, the query operation first queries the second data table to obtain the identifiers of company X's customer companies, and then queries the data tables of the corresponding companies based on the obtained identifiers to obtain the information of the female employees of company X's customer companies.
  • the data query request involves multiple data tables, and the data query request is a data query request corresponding to a data query service associated with the multiple data tables.
  • the data redistribution method can be applied to a variety of scenarios, and the target business request can be a data query request, a data addition request (also called an insertion request), a data deletion request, or a data modification request.
• the data targeted by the target service request can be the data of one or more records.
• the service data targeted by the same target service may involve data nodes before the data redistribution and/or data nodes after the data redistribution. For example, when data is distributed by hash bucket, the data of the same hash bucket is moved through multiple serially executed distributed transactions, so during the migration process data belonging to the same hash bucket may exist on the nodes before and after the redistribution at the same time.
• depending on the type of the target service request, the finally determined third node set differs, and the third node set includes one or more data nodes.
  • the embodiments of the present application take the following implementation scenarios as examples to illustrate the process of determining the third node set:
  • a third node set for responding to the data addition request is determined in the second node set.
  • the hash value is calculated according to the key value of the newly added data carried in the data addition request; the third node set for responding to the data addition request is determined in the second node set according to the hash value.
  • the data node in the second node set corresponding to the hash value is determined as the data node in the third node set.
  • the hash bucket corresponding to the hash value may be determined, and the data node corresponding to the hash bucket in the second node set may be determined as the data node in the third node set.
  • the data node obtained by the query can be determined as the data node in the third node set by querying the aforementioned second mapping relationship table.
• for example, the data addition request indicates a data addition operation for newly added data D;
• the third node set for storing the newly added data D is determined by the hash distribution rule to be data node N4, and the newly added data D is stored on data node N4. A sketch of this routing is given below.
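• The following sketch routes an insert by hashing the key of the new row and looking the bucket up only in the post-redistribution (second) node-set mapping; the bucket count, the mapping, and the CRC32-based hash are illustrative assumptions.

```python
import zlib

BUCKET_COUNT = 8

# Hypothetical second mapping relationship: hash bucket -> data node in the
# second node set (the placement that holds once redistribution completes).
SECOND_MAPPING = {b: "N4" if b < 4 else "N5" for b in range(BUCKET_COUNT)}

def route_insert(key: str) -> str:
    """Determine the third node set for a data addition request: new rows go
    directly to the post-redistribution node, so they never need migrating."""
    bucket = zlib.crc32(key.encode()) % BUCKET_COUNT
    return SECOND_MAPPING[bucket]

print(route_insert("employee-1042"))   # e.g. 'N4' or 'N5' depending on the key
```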
• if the table were forcibly locked for the migration, the time for which the table is locked might be relatively long, which would affect the user's services.
• in the embodiment of the present application, newly added data is written directly to the data nodes of the second node set (that is, the third node set). Therefore, during the data migration process there is no need to migrate or to record this newly added data; the storage of the new data can be completed quickly, the amount of data migration is effectively reduced, the data migration process is simplified, the data migration efficiency is improved, and the impact on the user's services is reduced.
• when the target service request is a data deletion request, a data modification request, or a data query request associated with the first data table, the data nodes used to respond to the target service request are determined in the first node set, and the data nodes used to respond to the target service request are determined in the second node set; the third node set is composed of the data nodes determined from the first node set and the data nodes determined from the second node set.
• in one optional manner, the data node used to respond to the data deletion request (that is, the data node where the data requested to be deleted by the data deletion request is located) is queried in the first node set, the data node used to respond to the data deletion request is queried in the second node set, and the data nodes obtained by the respective queries are combined to form the third node set.
• for example, in the first node set, the data node where the data B requested to be deleted by the data deletion request is located is queried, obtaining data node N2; in the second node set (including data node N4), the data node where data B is located is queried, obtaining data node N4; the third node set composed of the queried data nodes includes data nodes N2 and N4.
• in another optional manner, a fourth node set is determined in the first node set based on the aforementioned first mapping relationship table, and a fifth node set is determined in the second node set based on the aforementioned second mapping relationship table.
  • the deleted data may exist in both node sets, so the union of the fourth node set and the fifth node set is determined as the third node set, that is, the third node set includes the fourth node set and the fifth node set.
  • Both the fourth node set and the fifth node set include one or more data nodes.
• in one optional manner, the data node used to respond to the data modification request (that is, the data node where the data requested to be modified by the data modification request is located) is queried in the first node set, the data node used to respond to the data modification request is queried in the second node set, and the data nodes obtained by the respective queries are combined to form the third node set.
• for example, in the first node set, the data node where the data C requested to be modified by the data modification request is located is queried, obtaining data node N3; in the second node set (including data node N4), the data node where data C is located is queried, obtaining data node N4; the third node set composed of the queried data nodes includes data nodes N3 and N4.
• in another optional manner, a sixth node set is determined in the first node set based on the aforementioned first mapping relationship table, and a seventh node set is determined in the second node set based on the aforementioned second mapping relationship table.
  • the modified data may exist in both node sets, so the union of the sixth node set and the seventh node set is determined as the third node set.
  • Both the sixth node set and the seventh node set include one or more data nodes.
• in one optional manner, the data node used to respond to the data query request (that is, the data node where the data requested to be queried by the data query request is located) is queried in the first node set, the data nodes used to respond to the data query request are queried in the second node set, and the data nodes obtained by the respective queries are combined to form the third node set.
• for example, in the first node set, the data node where the data A requested to be queried by the data query request is located is queried, obtaining data node N1; in the second node set (including data node N4), the data node where data A is located is queried, obtaining data node N4; the third node set composed of the queried data nodes includes data nodes N1 and N4.
• in another optional manner, an eighth node set is determined in the first node set based on the foregoing first mapping relationship table, and a ninth node set is determined in the second node set based on the foregoing second mapping relationship table.
  • the data to be queried may exist in both node sets, so the union of the eighth node set and the ninth node set is determined as the third node set.
  • Both the eighth node set and the ninth node set include one or more data nodes.
• it should be noted that the data query request associated with the first data table may be a data query request associated only with the first data table, or a data query request associated with multiple data tables including the first data table. If the query request is a data query request associated with multiple data tables including the first data table, then for each data table associated with the query request, the third node set corresponding to that data table for responding to the query request is obtained in the same way as the third node set corresponding to the first data table is obtained when the data query request is associated only with the first data table, which is not described in detail in the embodiment of the present application.
  • the subsequent data query request needs to be sent to the third node set corresponding to the multiple data tables. For the sending process, refer to the subsequent step 304.
  • the number of data nodes in the third node set can be reduced, the amount of information subsequently interacted with the third node set can be reduced, and communication overhead can be saved.
  • the data targeted by the target business request can be one or more records of data.
• since the data of one record cannot exist on two data nodes at the same time, the data of the same record can only be processed successfully on one of the data nodes.
• if the third node set is not determined based on the key value, the target service request needs to be sent to all the data nodes involved before and after the data redistribution, because during the data migration process any of these data nodes may hold records that meet the conditions of the target service request. It can be seen that, in the foregoing second implementation scenario, the operation of querying data nodes may be skipped, and the union of the first node set and the second node set may be directly determined as the third node set.
  • the target business request is a data query request.
  • the data query request is used to request to query data in a specified data range or a specified time range in the first data table.
  • the specified data range may be a range of data that meets a specified condition.
• the specified time range can be a time range earlier or later than a specified time point. Because, during the data migration process of the first data table, one part of the data targeted by the data query request may be located on data nodes before the data redistribution and another part may be located on data nodes after the data redistribution, the union of the first node set and the second node set can be directly determined as the third node set.
  • directly determining the union of the first node set and the second node set as the third node set can also reduce the time delay of querying data nodes and improve service execution efficiency.
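• The following sketch pulls the three scenarios above together into one routing decision; the node-set contents, request-type labels, and the idea of passing in the nodes located via each mapping are illustrative assumptions rather than the application's exact interface.

```python
FIRST_NODE_SET = {"N1", "N2", "N3"}      # nodes before redistribution
SECOND_NODE_SET = {"N2", "N3", "N4"}     # nodes from redistribution onwards

def third_node_set(request_type, node_in_first=None, node_in_second=None):
    """Choose the data nodes that must see a target service request received
    while the first data table is being migrated (simplified sketch).

    - additions go only to the second (post-redistribution) node set;
    - key-based delete/modify/query requests go to the node found via each
      mapping relationship, since the record may still sit on either side;
    - range or time-range queries that cannot be keyed go to the union.
    """
    if request_type == "add":
        return {node_in_second}
    if request_type in ("delete", "modify", "query") and node_in_first:
        return {node_in_first, node_in_second}
    return FIRST_NODE_SET | SECOND_NODE_SET

print(third_node_set("add", node_in_second="N4"))                      # {'N4'}
print(third_node_set("delete", node_in_first="N2", node_in_second="N4"))
print(third_node_set("query"))                                         # union
```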
  • Step 304 The management node sends the target service request to the data node in the third node set.
  • the target service request is used for each data node in the third node set to perform service processing based on the target service request, and each data node in the third node set performs corresponding service processing after receiving the target service request. For example, assuming that the first data node is any data node in the third node set, the first data node performs the following process:
• when the first data node receives a data query request, it checks whether it stores the data requested by the data query request. If it stores the requested data, it obtains the data and sends a data query response to the management node, the data query response including the queried data; if it does not store the requested data, it stops the action, or sends a data query response to the management node indicating that the requested data was not found.
• when the first data node receives a data addition request, it directly adds the new data to the data node. Optionally, the first data node may send an addition success response to the management node.
• when the first data node receives a data modification request, it checks whether it stores the data to be modified by the data modification request. If it stores that data, it modifies the data according to the data modification request and, optionally, sends a data modification response to the management node, the data modification response including the modified data or indicating that the modification succeeded; if it does not store the data requested to be modified, it stops the action, or sends a data modification response to the management node indicating that the requested data does not exist.
• when the first data node receives a data deletion request, it checks whether it stores the data requested to be deleted by the data deletion request. If it stores that data, it deletes the data according to the data deletion request and, optionally, sends a data deletion response to the management node indicating that the deletion succeeded; if it does not store the data requested to be deleted, it stops the action, or sends a data deletion response to the management node indicating that the requested data does not exist. A sketch of this per-node handling follows.
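• The sketch below shows, under the assumption that each node's local data is a simple key-to-row dictionary, how a data node in the third node set might serve each request type: the node that holds the key answers, and any other node simply reports that the data was not found.

```python
def handle_request(local_store, request):
    """Serve one target service request on one data node (illustrative)."""
    op, key = request["op"], request.get("key")
    if op == "add":
        local_store[key] = request["row"]          # new data lands here directly
        return {"status": "ok"}
    if key not in local_store:
        return {"status": "not_found"}             # this node does not hold the record
    if op == "query":
        return {"status": "ok", "row": local_store[key]}
    if op == "modify":
        local_store[key].update(request["changes"])
        return {"status": "ok", "row": local_store[key]}
    if op == "delete":
        del local_store[key]
        return {"status": "ok"}

node_n2 = {"B": {"key": "B", "dept": "sales"}}
print(handle_request(node_n2, {"op": "delete", "key": "B"}))   # served on N2
print(handle_request({}, {"op": "query", "key": "B"}))         # other node: not found
```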
• in the embodiment of the present application, after being migrated the data is no longer stored on the data node it occupied before the migration, so it is guaranteed that the data of the same record is stored on only one data node in the distributed database rather than on two data nodes; this ensures that there will be no conflicting responses to the aforementioned target service request.
• Step 305: In the process of migrating the data of the first data table, if the management node detects a rollback trigger event, it rolls back the data that has been migrated through the multiple distributed transactions.
• the rollback trigger event can be a failure (such as downtime) of a data node associated with the first data table in the second node set, a data transmission error on a data node associated with the first data table in the second node set, a network error on a data node associated with the first data table in the second node set, a rollback instruction received by a data node associated with the first data table in the second node set, a failure to commit a distributed transaction associated with the first data table, or the like.
  • the distributed database can still perform online services and data redistribution services normally.
• step 305 can be replaced with: in the process of migrating the data of the first data table, if a rollback trigger event is detected, the data that has been migrated by the currently executed distributed transaction is rolled back.
• if the data of a data table is migrated through a single distributed transaction and a rollback trigger event is detected, all the data migrated so far is rolled back, that is, all actions performed by that distributed transaction are cancelled. The amount of data to be rolled back is large and all the migrated data becomes invalid; after the migration conditions are met again, the data needs to be re-migrated, resulting in repeated data migration, wasted resources, and poor fault tolerance of the database.
  • the aforementioned distributed transaction ensures the data consistency and durability of the migration process.
  • the overall data migration process is split into multiple distributed transactions executed in series.
• only the operations corresponding to the currently executed distributed transaction need to be rolled back.
• after the migration conditions are met again, a new distributed transaction can be initiated for data migration. Therefore, the data granularity of the rollback and the amount of data to be rolled back are reduced, the amount of repeatedly migrated data is reduced, the overall impact of the rollback on the data migration process is reduced, waste of resources is avoided, and the fault tolerance of the database is improved. The sketch below illustrates this batch-by-batch structure.
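• The following is a minimal sketch, with hypothetical `scan_batch`, `apply_batch`, and `rollback_batch` callbacks standing in for the distributed plans, of how the serial-transaction loop limits each batch and confines a rollback to the batch in flight.

```python
import time

def migrate_table(scan_batch, apply_batch, rollback_batch,
                  max_rows=1000, max_seconds=5.0):
    """Drive the table migration as a series of small transactions.

    Each loop iteration plays the role of one distributed transaction: it
    takes at most `max_rows` rows, is expected to finish within
    `max_seconds`, and a failure undoes only the current batch, not the
    data that earlier, already-committed batches moved."""
    while True:
        started = time.monotonic()
        batch = scan_batch(max_rows)
        if not batch:
            return                              # nothing left: migration done
        try:
            apply_batch(batch, deadline=started + max_seconds)
        except Exception:
            rollback_batch(batch)               # undo only this transaction
            raise

# Tiny in-memory stand-ins for the node-level work.
pending, moved = list(range(7)), []
def scan(limit):
    return pending[:limit]
def apply(batch, deadline):
    del pending[:len(batch)]
    moved.extend(batch)
def rollback(batch):
    pass                                        # nothing shipped, nothing to undo

migrate_table(scan, apply, rollback, max_rows=3)
print(moved)                                    # [0, 1, 2, 3, 4, 5, 6]
```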
  • a data definition language (DDL) service includes services such as creating table information, modifying table information, and deleting table information.
  • the object of the operation requested by the DDL service is table information, that is, the definition and structure of the table.
• in the embodiment of the present application, the data migration process occurs within the data table, rather than between a source table and a temporary table. Therefore, the aforementioned DDL services are supported during the data migration process; for example, modifying table meta information, modifying the table name, and adding or deleting fields of the data table are allowed.
• the data redistribution method provided by the embodiments of the present application does not need to establish a temporary table to execute the target task and realize online data redistribution. In this way, there is no need to perform data migration between tables, only data migration within tables, thereby reducing the complexity of online data redistribution.
  • the amount of data to be migrated is effectively reduced, thereby reducing resource consumption and reducing the impact on other user operations that are executed at the same time.
• all the data with hash bucket numbers 1 to 17 would need to be migrated from the first node set to the second node set; whereas in the embodiment of the present application, the data with hash bucket number 1 needs to be moved from data node N1 to data node N7, the data with hash bucket number 2 does not need to be moved, the data with hash bucket number 7 needs to be moved from data node N1 to data node N9, and so on.
• that is, only the data with hash bucket numbers 1, 6, 7, 11, 12, 13, and 16 needs to be migrated (in Figure 6, the data nodes N7, N8, and N9 of the second node set that need to receive the migrated data are shaded), effectively reducing the amount of data migration. A sketch of this bucket-level comparison follows.
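• The sketch below computes which hash buckets actually generate migration work by comparing the bucket-to-node mapping before and after redistribution. The placements of buckets 1, 2, and 7 follow the example above; the remaining placements are invented solely to complete the illustration.

```python
# Hypothetical bucket -> data node placement before and after redistribution.
old_map = {1: "N1", 2: "N2", 6: "N2", 7: "N1", 11: "N3", 12: "N1",
           13: "N2", 16: "N3", 17: "N3"}
new_map = {1: "N7", 2: "N2", 6: "N8", 7: "N9", 11: "N7", 12: "N8",
           13: "N9", 16: "N7", 17: "N3"}

# Only buckets whose owner changes need to be migrated (minimal data movement).
moves = {b: (old_map[b], new_map[b])
         for b in old_map if old_map[b] != new_map[b]}

print(sorted(moves))    # [1, 6, 7, 11, 12, 13, 16]: the buckets that move
print(moves[1])         # ('N1', 'N7'): bucket 1 goes from N1 to N7
```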
  • the embodiment of the present application provides a data redistribution device 40, and the data redistribution device 40 may be deployed on a management node. As shown in FIG. 10, the data redistribution device 40 includes:
  • the first determining module 401 is configured to perform the aforementioned step 301;
  • the migration module 402 is configured to execute the aforementioned step 302;
  • the second determining module 403 is configured to perform the aforementioned step 303;
  • the sending module 404 is configured to execute the aforementioned step 304.
• the data redistribution device provided in the embodiments of the present application can execute the target task without creating a temporary table and realize online data redistribution. In this way, there is no need to perform data migration between tables, only data migration within tables, thereby reducing the complexity of online data redistribution.
  • the second determining module 403 includes:
  • the determining submodule 4031 is configured to determine, in the second node set, the third node set used to respond to the data addition request when the target service request is a data addition request.
  • the determining submodule 4031 is configured to:
  • the data node corresponding to the hash value is determined in the second node set, and the determined data node belongs to the third node set.
  • the second determining module 403 is configured to:
• determine, in the first node set, the data nodes used to respond to the target service request, and determine, in the second node set, the data nodes used to respond to the target service request, the third node set being composed of the data nodes determined from the first node set and the data nodes determined from the second node set.
  • the migration module 402 includes:
  • the screening sub-module 4021 is configured to screen the data to be migrated from the data in the first data table stored in the first node set, and the data to be migrated is the data that is not stored in the second node set before the migration.
  • the migration sub-module 4022 is configured to migrate the data to be migrated from the first node set to the second node set.
  • the screening submodule 4021 is configured to:
• for the target data in the first data table, when the data node corresponding to the target data determined based on the first mapping relationship is different from the data node corresponding to the target data determined based on the second mapping relationship, determine, on the data node corresponding to the target data determined based on the first mapping relationship, the target data as the data to be migrated.
• the migration submodule 4022 is configured to: migrate different data of the first data table from the first node set to the second node set respectively through multiple distributed transactions executed in series.
  • the migration submodule 4022 is configured to:
• when serially executing the multiple distributed transactions, select, through the currently executed distributed transaction, the data to be migrated that meets the migration condition from the un-migrated data of the first data table in the first node set, and migrate the selected data to be migrated from the first node set to the second node set, the selected data to be migrated being locked during the migration process;
  • the migration condition includes: the data volume of the data to be migrated that is migrated through the currently executed distributed transaction is less than or equal to a specified data volume threshold, and/or the migration duration of the currently executed distributed transaction migration Less than or equal to the specified duration threshold.
  • the migration submodule 4022 is configured to:
• generate n distributed plans for n data nodes respectively, the first node set including the n data nodes, the n data nodes corresponding to the n distributed plans one-to-one, and n being a positive integer;
• instruct the n data nodes to respectively execute the n distributed plans, so as to select, in parallel, the data to be migrated that meets the sub-migration conditions from the un-migrated data of the first data table on the n data nodes, and to send the selected data to be migrated that meets the sub-migration conditions from the n data nodes to the second node set, the sub-migration conditions being determined according to the migration condition.
  • the apparatus 40 further includes:
• the rollback module 405 is configured to roll back the data that has been migrated through the multiple distributed transactions if a rollback trigger event is detected during the process of migrating the data of the first data table.
  • the rollback module 405 is configured to roll back the data that has been migrated through the currently executed distributed transaction if a rollback trigger event is detected during the process of migrating the data of the first data table.
  • the device 40 further includes:
  • the setting module 406 is configured to set a deletion flag for the migrated data in the first data table on the first node set.
  • FIG. 15 schematically provides a possible basic hardware architecture of the computing device described in this application.
  • the computing device 500 includes a processor 501, a memory 502, a communication interface 503, and a bus 504.
  • the number of processors 501 may be one or more, and FIG. 15 only illustrates one of the processors 501.
  • the processor 501 may be a central processing unit (CPU). If the computing device 500 has multiple processors 501, the types of the multiple processors 501 may be different or may be the same. Optionally, multiple processors 501 of the computing device 500 may also be integrated into a multi-core processor.
  • the memory 502 stores computer instructions and data; the memory 502 can store computer instructions and data required to implement the data redistribution method provided by the present application. For example, the memory 502 stores instructions for implementing the steps of the data redistribution method.
• the memory 502 may be any one or any combination of the following storage media: non-volatile memory (for example, read-only memory (ROM), solid state drive (SSD), hard disk drive (HDD), or optical disc) and volatile memory.
  • the communication interface 503 may be any one or any combination of the following devices: a network interface (for example, an Ethernet interface), a wireless network card, and other devices with a network access function.
  • the communication interface 503 is used for data communication between the computing device 500 and other computing devices or terminals.
  • the bus 504 can connect the processor 501 with the memory 502 and the communication interface 503. In this way, through the bus 504, the processor 501 can access the memory 502, and can also use the communication interface 503 to interact with other computing devices or terminals.
  • the computing device 500 executes the computer instructions in the memory 502, so that the computing device 500 implements the data redistribution method provided in this application, or causes the computing device 500 to deploy a data redistribution apparatus.
• a non-transitory computer-readable storage medium including instructions, such as a memory including instructions, is also provided; the instructions can be executed by a processor of a server to complete the data redistribution method shown in the various embodiments of the present application.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • An embodiment of the present application provides a distributed database system, including: a management node and a data node, and the management node includes any one of the foregoing data redistribution apparatus 40 or the foregoing computing device 500.
• the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
• when implemented by software, they may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions.
  • the computer may be a general-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
• the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or in a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state hard disk).
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Operations Research (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data redistribution method, apparatus, and system, applied to the database field and used for data storage. The method includes: determining a first node set and a second node set in a distributed database that are each associated with a first data table, the first node set including the data nodes used to store the data of the first data table before the data of the first data table is redistributed, and the second node set including the data nodes used to store the data of the first data table from the start of the redistribution of the data of the first data table; migrating the data of the first data table from the first node set to the second node set; in the process of migrating the data of the first data table, when a target service request for the first data table is received, determining, in the first node set and the second node set, a third node set for responding to the target service request; and sending the target service request to the data nodes in the third node set. The complexity of online data redistribution can thereby be reduced.

Description

数据重分布方法、装置及系统 技术领域
本申请涉及数据库领域,特别涉及一种数据重分布方法、装置及系统。
背景技术
在线数据重分布是指在不中断用户业务的情况下完成数据重新分布。目前越来越多的数据库都在应用该技术。
在关系型数据库中,数据库的多个节点中维护有一个或多个数据表的数据。通常采用创建临时表的方式来实现数据的在线重分布。例如,对于需要进行数据重分布的第一数据表,先为该表创建临时表。然后将该第一数据表对应节点上部署的第一数据表的所有数据对应复制至临时表对应节点中,在完成数据复制后,交换临时表的数据和第一数据表的数据(该过程称为数据切换),交换完成后,删除临时表和临时表的数据,至此即完成了数据重分布。
在上述数据重分布过程中,需要保证源表(即第一数据表)和临时表的数据一致性,还需要执行数据切换过程,因此,在线数据重分布的复杂度较高。
发明内容
本申请实施例提供一种数据重分布方法、装置及系统,可以降低在线数据重分布的复杂度。
第一方面,提供一种数据重分布方法,包括:
确定分布式数据库中与第一数据表分别关联的第一节点集和第二节点集,第一节点集包括在第一数据表的数据被数据重分布之前用于存储第一数据表中的数据的数据节点,第二节点集包括从第一数据表的数据被数据重分布开始用于存储第一数据表中的数据的数据节点;
将第一数据表的数据从第一节点集迁移至第二节点集;
在迁移所述第一数据表的数据的过程中,当接收到对第一数据表的目标业务请求时,在第一节点集和第二节点集中确定用于响应目标业务请求的第三节点集;
将目标业务请求发送至第三节点集中的数据节点,目标业务请求用于供第三节点集中每个节点基于目标业务请求进行业务处理。
本申请实施例提供的数据重分布方法,无需建立临时表,即可进行目标任务的执行,实现在线数据重分布,这样无需进行表间数据迁移,仅需进行表内数据迁移,从而降低了在线数据重分布的复杂度。
在一种可能实现中,在第一节点集和第二节点集中确定用于响应目标业务请求的第三节点集,包括:当目标业务请求为数据添加请求时,在第二节点集中确定用于响应所述数据添加请求的第三节点集。
将新增数据直接写入重分布后的节点,可以有效降低重分布的复杂度,提高数据迁移效率
例如,在第二节点集中确定用于响应所述数据添加请求的第三节点集,包括:根据数据添加请求所携带的新增数据的键值计算哈希值;在第二节点集中确定所述哈希值对应的数据节点,确定的数据节点属于第三节点集。
采用哈希分布规则进行数据分布可以实现负载均衡。
在一种可能实现中,在第一节点集和第二节点集中确定用于响应目标业务请求的第三节点集,包括:
当所述目标业务请求为数据删除请求或者数据修改请求或者与第一数据表关联的数据查询请求时,在所述第一节点集中确定用于响应所述目标业务请求的数据节点,并在所述第二节点集中确定用于响应所述目标业务请求的数据节点,由从所述第一节点集中确定的数据节点和从所述第二节点集中确定的数据节点组成所述第三节点集。
在一种可能实现中,将第一数据表的数据从第一节点集迁移至第二节点集,包括:在第一节点集存储的第一数据表的数据中筛选待迁移数据,所述待迁移数据为所述第二节点集在迁移前没有存储的所述第一数据表的数据;将待迁移数据从第一节点集迁移至第二节点集。
由于在一些场景中,例如扩容场景中,一些数据可能无需进行迁移,这些数据可以称之为无效迁移数据。例如,在迁移前和迁移后在数据节点中部署的位置不变的数据和/或在迁移动作前已经删除的数据,对这些数据执行迁移动作,不仅占用数据资源,还会影响迁移的效率。因此可以通过筛选操作剔除无效迁移数据,将实际需要进行迁移的数据作为待迁移数据,进行数据迁移。也即是该待迁移数据包括第一数据表的数据中除无效迁移数据之外的数据。这样可以实现表数据的部分迁移,减少迁移的数据量,减少数据资源占用,提高迁移效率。
示例的,在第一节点集存储的第一数据表的数据中筛选待迁移数据,包括:获取第一数据表中的数据与第一节点集的数据节点的第一映射关系;获取第一数据表中的数据与第二节点集的数据节点的第二映射关系;对于所述第一数据表中的目标数据,在基于所述第一映射关系确定的与所述目标数据对应的数据节点与基于所述第二映射关系确定的与所述目标数据对应的数据节点不同时,在基于所述第一映射关系确定的与所述目标数据对应的数据节点中,将所述目标数据确定为所述待迁移数据。
在一种可能实现中,通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集。
由于采用串行执行的多个分布式事务进行数据迁移,虽然迁移第一数据表的总耗时不一定减短,但每次分布式事务的资源消耗少,单次迁移时间短,由于已经迁移成功的事务的数据,是不需要再重新迁移的,因此,如果一次迁移失败后重新进行数据迁移的代价较低,资源消耗较小,减少了对同时执行的其他用户作业的影响。
在一种可能实现中,所述通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集,包括:
在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据中选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,选择的所述待迁移数据在被迁移过程中被加锁;
其中,所述迁移条件包括:通过当前执行到的分布式事务迁移的所述待迁移数据的数据量小于或等于指定数据量阈值,和/或,通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值。
在一种可能实现中,所述在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,包括:
基于所述当前执行到的分布式事务,为n个数据节点分别生成n个分布式计划,所述第一节点集包括所述n个数据节点,所述n个数据节点与所述n个分布式计划一一对应,n为正整数;
指示所述n个数据节点分别执行所述n个分布式计划来并行从所述n个数据节点中的所述第一数据表的未迁移数据中选择满足子迁移条件的待迁移数据、并将选择的满足所述子迁移条件的所述待迁移数据从所述n个数据节点发送至所述第二节点集,所述子迁移条件是根据所述迁移条件确定的。
在一种可能实现中,该方法还包括:在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过所述多个分布式事务已迁移的数据进行回滚。
在一种可能实现中,该方法还包括:在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过当前执行到的分布式事务已迁移的数据进行回滚。
其中,回滚触发事件可以是第一数据表关联的数据节点故障(如宕机),数据传输错误,网络错误,或接收到回滚指令等。
本申请实施例中,前述分布式事务保证了迁移过程的数据一致性和持久性,当分布式事务有多个时,针对第一数据表的整体的数据迁移过程拆分成通过多个分布式事务的迁移过程,若检测到回滚触发事件,只需将当前工作的一个分布式事务的所有操作进行回滚,在再次满足迁移条件后可以继续发起新的分布式事务进行数据迁移。因此,降低了回滚的数据粒度以及回滚的数据量,减少重复迁移的数据量,减少回滚对该数据迁移过程整体上影响,避免资源浪费,提高数据库的容错性。
在一种可能实现中,所述方法包括:为所述第一节点集上的所述第一数据表中的已迁移数据设置删除标识。
即迁移后的数据被删除后,实质上作为历史版本记录在相应的数据节点上,后续分布式事务进行数据扫描时,跳过该历史版本的数据即可(即跳过设置有删除标记的数据)。这样,可以保证数据迁移过程中,用户针对该历史版本的数据的数据查询操作有效执行。
第二方面,提供一种数据重分布装置,所述装置可以包括至少一个模块,该至少一个模块可以用于实现上述第一方面或者第一方面的各种可能实现提供的所述数据重分布方法。
第三方面,本申请提供一种计算设备,该计算设备包括处理器和存储器。该存储器存储计算机指令;该处理器执行该存储器存储的计算机指令,使得该计算设备执行上述第一方面或者第一方面的各种可能实现提供的方法,使得该计算设备部署上述第二方面或者第二方面的各种可能实现提供的该数据重分布装置。
第四方面,本申请提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机指令,该计算机指令指示该计算设备执行上述第一方面或者第一方面的各种可能实现提供的方法,或者该计算机指令指示该计算设备部署上述第二方面或者第二方面的各种可能实现提供的数据重分布装置。
第五方面,本申请提供一种计算机程序产品,该计算机程序产品包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算设备的处理器可以从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算设备执行上述第一方面或者第一方面的各种可能实现提供的方法,使得该计算设备部署上述第二方面或者第二方面的各种可能实现提供的数据重分布装置。
第六方面,提供一种分布式数据库系统,包括:管理节点和数据节点,所述管理节点包括第二方面或者第二方面的各种可能实现所述的数据重分布装置或第三方面所述的计算设备。
第七方面,提供一种芯片,所述芯片可以包括可编程逻辑电路和/或程序指令,当所述芯片运行时用于实现如第一方面任一所述的数据重分布方法。
附图说明
图1是相关技术提供的一种数据重分布方法的示意图;
图2是本申请实施例提供的一种数据重分布方法涉及的应用场景的示意图;
图3是本申请实施例提供的一种数据重分布方法的流程示意图;
图4是本申请实施例提供的一种数据重分布方法涉及的数据节点的示意图;
图5是本申请实施例提供的一种待迁移数据的筛选方法的流程示意图;
图6是本申请实施例提供的一种映射关系示意图;
图7是本申请实施例提供的一种数据迁移的流程示意图;
图8是本申请实施例提供的一种数据迁移的执行场景示意图;
图9是本申请实施例提供的一种数据迁移的用户业务场景示意图;
图10是本申请实施例提供的一种数据重分布装置的框图;
图11是本申请实施例提供的一种第二确定模块的框图;
图12是本申请实施例提供的一种迁移模块的框图;
图13是本申请实施例提供的另一种数据重分布装置的框图;
图14是本申请实施例提供的又一种数据重分布装置的框图;
图15是本申请实施例提供的一种计算设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
很多分布式数据库(Distributed Database,DDB)都支持数据重分布技术,比如在系统扩容、缩容或数据迁移等场景均可以应用数据重分布技术。在线数据重分布是指在不中断用户业务的情况下完成数据重分布。
分布式数据库可以包括关系型数据库(Relational database),关系型数据库是指采用了关系模型来组织数据的数据库,其以行和列的形式存储数据,通常一行数据是数据读写的最小单位,也称为一条记录。关系型数据库中,一系列的行和列被称为数据表,一个数据表可以视为一个二维表。关系模型可以简单理解为二维表格模型。关系型数据库包括一个或多个数据表以及数据表之间的关系描述信息。每个数据表包括表数据和表信息,表数据是该数据表内部署在数据节点中的数据,即前述以行和列的形式存储的数据,表信息是描述数据表的信息,例如描述数据表的定义和架构的信息,数据表的表信息可以存储在表数据所部署的各个数据节点,也可以由单独的节点保存。
在关系型数据库中,按照结构化的方式存储数据,每个数据表的各个字段均按照预先设置的规则定义好(也即是表的结构是预先定义的),再根据数据表的结构存入数据。这样,由于数据的形式和内容在存入数据表之前就已经定义好了,所以整个数据表的可靠性和稳定性都比较高。
在关系型数据库中,数据库的多个数据节点中部署有一个或多个数据表的数据。通常采用创建临时表的方式来实现数据的在线重分布。例如,请参考图1,对于需要进行数据重分布的第一数据表T1(可以称为源表),先为该表创建临时表T2。然后将该第一数据表T1对应数据节点(图1以3个数据节点:节点1至3为例)上部署的第一数据表的所有数据:数据1至9,复制至临时表对应数据节点(图1以4个数据节点:节点1至4为例)中,将数据表中的全部数据一次性复制的方式称为全量数据迁移。在完成数据复制过程后,交换临时表T2的数据和第一数据表的数据,在交换完成后,删除临时表的数据和第一数据表的数据,至此即完成了数据重分布的完整过程。示例的,关系型数据库可以为greenplum数据库(Database,DB)或者GaussDB。其中,greenplum DB简称gpdb。
数据从源表所在节点复制到临时表所在节点(也称数据重分布)的过程中,如果执行数据的增加、删除和/或修改等数据的更新操作,可能会导致临时表和源表的数据不一致,因此通过对源表加独占锁来暂时禁止数据更新,当数据切换过程完成后再解锁。
示例的,在gpdb中,为了避免数据重分布过程出现数据的增加、删除和/或修改等数据的更新操作,正在进行数据复制的表会被加锁,不允许对该表中的数据进行数据添加操作(也称数据插入操作)、数据删除操作和数据修改操作,只允许对该表中的数据进行数据查询操作。
在GaussDB中,假设需要进行数据重分布的是第一数据表,在建立了临时表后,为了能够在数据重分布过程中允许数据更新,例如数据的增加、删除和/或修改,GaussDB在接收到数据更新请求(例如数据添加请求或数据删除请求)后,采用指定文件记录更新的数据,以便在完成一次全量数据迁移后,可以找到该全量数据迁移过程中更新的数据,并基于该更新的数据执行增量数据迁移。该增量数据迁移过程指的是,检查指定文件是否有更新的记录(包括全量数据迁移过程中的删除的记录,修改的记录和插入的记录等),如果有更新的记录,则基于更新的记录再次执行更新的数据的复制。由于更新操作可能总是会存在,因此执行几次增量数据迁移过程后如果指定文件还有更新的记录,最后一次应该对第一数据表加锁(如 独占锁),并执行数据复制,在数据复制后,执行第一数据表和临时表的交换过程,最后释放锁。
在上述数据重分布过程中,需要保证源表和临时表的数据一致性,还需要执行数据切换过程,因此。在线数据重分布的复杂度较高。并且,数据表全量迁移耗时长,资源消耗大(包括中央处理器(CPU,central processing unit)、内存、输入/输出(input/output,IO)资源等多种资源的消耗均较大),而同时执行的其他用户作业就可能由于资源不足而受到影响。
请参考图2,图2是本申请实施例提供的一种数据重分布方法所涉及的分布式数据库系统(Distributed Database System,DDBS)的应用环境的示意图。该DDBS可以为一个服务器或者由多个服务器组成的服务器集群,该DDBS包括分布式数据库管理系统(Distributed Database Management System,DDBMS)和DDB。在分布式数据库系统中,一个应用程序可以通过DDBS对DDB进行透明操作,DDB中的数据分别在不同的局部数据库中存储、由一个或多个DDBMS进行管理、在不同的机器上运行、由不同的操作系统支持,并被不同的通信网络连接在一起。其中,DDBS10包括:管理节点(也称数据库引擎,协调数据节点,coordinator)101和数据节点102。DDBMS可以部署在管理节点101上,DDB可以部署在多个数据节点(datanode)102上。该分布式数据库可以基于share-nothing架构建立,即数据库的所有数据都分布在数据节点上,数据节点之间的数据不共享。
管理节点101用于管理相应的数据节点102,并实现应用程序20对数据节点102的操作,例如执行数据添加操作、数据删除操作、数据修改操作或数据查询操作等。
本申请实施例中,管理节点101可以为单独一个节点,或者多个数据节点102中指定数据节点或者选举得到的数据节点,其可以为一个服务器或者由多个服务器组成的服务器集群。每个数据节点表征DDBS的一个设定的最小处理单元。示例的,每个数据节点可以为管理和/或存储数据的一个应用实例或一个数据库执行进程。该DDBS可以部署在一个服务器或者由多个服务器组成的服务器集群。分布式数据库可以具有多个数据表,每个数据表的数据记录根据用户定义的分布规则来分布到各个数据节点上,数据分布规则通常为哈希(Hash)分布,即键-值(key-value)分布。
为了便于读者理解,本申请实施例对哈希分布原理进行简单介绍。哈希分布是基于哈希函数的一种数据分布方法,哈希函数也可以称为散列函数。哈希函数是基于数据的键(key,也称键值,在分布式系统中也称分布键值),得到值(value,也称哈希值)的一种函数。即value=f(key),函数f即为哈希函数。以表1为例,假设哈希函数为f(key)=key mod 5,“mod”表示取模,即该哈希函数为取模运算(Module Operation)函数。则假设key分别为1、2、3、4、5、6、7、8和9,则对应的value分别为1、2、3、4、0、1、2、3和4。
表1
key 1 2 3 4 5 6 7 8 9
value 1 2 3 4 0 1 2 3 4
由上可知,key为1和6时,value都为1。因此,采用哈希函数确定value可能存在不同的key对应相同的value的情况,这种情况称为哈希冲突。哈希桶算法是一种特殊的哈希算法,其能够解决哈希冲突。哈希桶为放置不同key链表(也称哈希表)的容器,该哈希桶也称f(key)集合,或value集合。同一哈希桶对应的value相同。参考前述例子,可以设置哈希桶的个数 为模数(也称模)的值,即5。多个value值与多个哈希桶一一对应。示例的,可以采用value值作为哈希桶的索引或编号,每个哈希桶存放具有相同value的key,同一个哈希桶中冲突的key之间用单向链表进行存储,这样就解决了哈希冲突。在查找与key对应的数据时,只需要通过key索引到对应value的哈希桶,然后从哈希桶的首地址对应的节点开始查找,即按照链表顺序查找,对比key的值,直到找到对应key,再基于查找到的key索引到对应的数据。如表1所示,key为1和6时,存储在哈希桶1中,key为2和7时,存储在哈希桶2中;key为3和8时,存储在哈希桶3中;key为4和9时,存储在哈希桶4中;key为5时,存储在哈希桶0中。
需要说明的是,前述实施例仅以哈希函数为取模的函数为例进行说明,实际上该哈希函数还可以为取余的函数(此时,该哈希函数为取余运算(Complementation)函数,哈希桶的个数为模数的值),或者其他函数,本申请实施例对此不做限定。
本申请实施例提供一种数据重分布方法,可以应用于图2所示的应用环境中的分布式数据库,可以简化在线数据重分布的复杂度,该方法的全部或部分可以由前述管理节点执行。如图3所示,本申请实施例假设第一数据表为待迁移数据表,也即是待进行数据重分布的数据表,该方法包括:
步骤301、管理节点确定分布式数据库中与第一数据表分别关联的第一节点集和第二节点集。
分布式数据库的运维人员会根据数据库的负载情况等信息进行数据节点的调节,当分布式数据库增加了新的数据节点(扩容场景)或需要删除部分数据节点(缩容场景)或需要进行一些数据节点的存储数据的调整(数据迁移场景),或需要进行数据节点的组间的数据表调整(组间数据表调整场景),运维人员可以向管理节点输入数据重分布指令,管理节点接收到该数据重分布指令,并基于该数据重分布指令控制数据节点进行数据重分布,该数据重分布指令为用于指示进行数据重分布的结构化查询语言(Structured Query Language,SQL),其包括一条或多条SQL语句。其中,在组间数据表调整场景,分布式数据库中的数据节点划分为不同的数据节点组,每个数据节点组包含相同或不同数量的数据节点,当用户希望将某个数据节点组上创建的表迁移到其他的数据节点组上,需要将表数据在新的数据节点组上重新分布,从而产生该场景。
在不同数据重分布场景中,数据重分布内容不同。例如,在扩容场景中,重分布后的数据节点会包含所有重分布前的数据节点,数据重分布指令为扩容指令,用于指示扩容操作所涉及的数据表(在本实施例中为第一数据表),还用于指示扩容操作所增加的数据节点;在缩容场景中,重分布前的数据节点会包含所有重分布后的数据节点,数据重分布指令为缩容指令,用于指示缩容操作所涉及的数据表(在本实施例中为第一数据表),还用于指示缩容操作所减少的数据节点;在数据迁移场景,重分布后的数据节点和重分布前的数据节点之间可能有数据节点重叠,也可能没有重叠;数据重分布指令为数据迁移指令,用于指示数据迁移操作所涉及的数据表(在本实施例中为第一数据表),还用于指示数据迁移操作所迁移的目标数据节点。在组间数据表调整场景中,通常重分布后的数据节点和重分布前的数据节点之间没有数据节点重叠;数据重分布指令为数据迁移指令,用于指示数据迁移操作所涉及的数据表(在本实施例中为第一数据表),还用于指示数据迁移操作所迁移的目标数据节点组。
值得说明的是,数据重分布还可以有其他场景,本申请实施例只是示意性说明,并不对此进行限定。在数据重分布指令触发数据重分布过程之后,为了能够有效识别第一数据表在当前是否处于数据重分布过程中,管理节点可以为该第一数据表添加重分布标志,该重分布标志用于标识该第一数据表处于数据重分布过程。后续,管理节点在接收到用户的业务请求后,可以通过查询业务请求所涉及的数据表是否添加有重分布标志,来执行相应动作。
管理节点可以基于该数据重分布指令(即解析该数据重分布指令中的SQL语句),获取第一节点集和第二节点集。该第一节点集包括在第一数据表的数据被数据重分布之前用于存储第一数据表中的数据的数据节点,也即是第一节点集为当前(即步骤301执行时,步骤302之前)第一数据表中的数据所部署的数据节点的集合;该第二节点集包括从第一数据表的数据被数据重分布开始用于存储第一数据表中的数据的数据节点,也即是第二节点集为后续经过数据迁移后(即步骤302之后),第一数据表中的数据所部署的数据节点的集合。在本申请实施例中,第一节点集和第二节点集均包括一个或多个数据节点。
第一节点集的获取方式可以有多种。在一种可选方式中,可以直接查询当前第一数据表中的数据所部署的数据节点,得到第一节点集。在另一种可选方式中,分布式数据库中可以维护有当前每个数据表的数据与其部署的节点集的数据节点的映射关系,每个映射关系可以基于对应的数据表的数据的部署位置实时更新,从而可以通过查询该映射关系得到第一数据表所对应的第一节点集;例如,第一数据表中的数据与第一节点集的数据节点的映射关系,称为第一映射关系,通过查询该第一映射关系可以确定第一节点集。在又一种可选方式中,数据重分布指令可以携带该第一节点集的标识,基于该标识获取第一节点集。
第二节点集的获取方式也可以有多种。第二节点集可以直接通过数据重分布指令获取;例如,在扩容场景中,将第一节点集以及扩容操作所增加的数据节点确定为第二节点集包括的数据节点。如图2所示,图2以扩容场景为例,则第一节点集共4个数据节点,第二节点集共6个数据节点;在缩容场景中,将第一节点集中除缩容操作所减少的数据节点之外的数据节点确定为第二节点集;在数据迁移场景,将数据迁移操作所迁移的目标数据节点确定为第二节点集;在组间数据表调整场景中,将数据迁移操作所迁移的目标数据节点组确定为第二节点集。
值得说明的是,第一节点集和第二节点集的确定方式还可以是其他方式,本申请实施例只是示意性说明,并不对此进行限定。
如图4所示,图4假设第一节点集包括数据节点N1至N6,第二节点集包括数据节点N2至N5,以及N7至N9,则此次数据重分布所涉及的数据节点包括数据节点N1至N9。
在步骤302的第一数据表的数据迁移过程之前,可以预先将数据重分布所涉及的数据节点统一编号排序,采用哈希分布规则确定出第一数据表的数据与第一节点集的数据节点的第一映射关系,以及第一数据表的数据和第二节点集的数据节点的第二映射关系,第一映射关系和第二映射关系可以依据最小移动数量的原则(也称数据最小移动原则)来确定。若分布式系统预先存储有第一数据表的数据与第一节点集的数据节点的第一映射关系,可以直接获取该映射关系,不再重新进行哈希计算。通过获取第一映射关系和第二映射关系,可以对表数据分布的映射关系进行组织。这样便于找到数据在后续数据迁移过程中的移动方向。同时也便于在迁移第一数据表的数据的过程中,为生成分布式计划(也称分布式执行计划)做准备。
简言之,前述确定第一节点集和第二节点集的过程是确定数据重分布前后涉及哪些数据节点的过程,确定映射关系的过程是确定各个数据重分布前后数据具体分布在哪个数据节点的过程。
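为便于理解"依据最小移动数量的原则确定第二映射关系"的思路,下面给出一段示意性的Python代码(仅为一种可能做法的假设性示意,derive_second_mapping等名称均为虚构,并非本申请限定的具体算法):

```python
# 示意:依据数据最小移动原则,由第一映射关系推导第二映射关系(假设性示例)
def derive_second_mapping(first_mapping: dict, new_nodes: list) -> dict:
    """first_mapping: 哈希桶编号 -> 第一节点集中的数据节点;new_nodes: 第二节点集的数据节点列表。
    仍留在第二节点集中的节点上的桶尽量不移动,其余桶改派给第二节点集中当前桶数最少的节点。"""
    second = {}
    load = {n: 0 for n in new_nodes}
    # 先保留无需移动的桶
    for bucket, node in first_mapping.items():
        if node in load:
            second[bucket] = node
            load[node] += 1
    # 再把其余桶分配给当前负载最小的节点
    for bucket in first_mapping:
        if bucket not in second:
            target = min(load, key=load.get)
            second[bucket] = target
            load[target] += 1
    return second

# 用法示意:桶1~17按图6的第一映射关系分布在N1~N6,第二节点集为N2~N5、N7~N9
first = {b: f"N{(b - 1) % 6 + 1}" for b in range(1, 18)}
second = derive_second_mapping(first, ["N2", "N3", "N4", "N5", "N7", "N8", "N9"])
moved = [b for b in first if first[b] != second[b]]
print(sorted(moved))  # 仅被改派到其他节点的桶需要迁移
```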
步骤302、管理节点将第一数据表的数据从第一节点集迁移至第二节点集。
在本申请实施例中,迁移动作的原理类似于数据剪切,指的是将一个数据从一个节点移动到另一节点的动作。将第一数据表的数据从第一节点集迁移至第二节点集的过程,即为将第一数据表的数据从第一节点集移动至第二节点集的过程。可选地,从第一节点集移动后的数据不再存储于该第一节点集中。
该第一数据表的数据迁移过程,也即是数据重分布的过程,可以有多种实现方式,本申请实施例中,以以下几种可选的实现方式为例进行说明,但对此并不进行限定:
第一种可选的实现方式,将第一数据表的所有数据直接从第一节点集迁移至第二节点集。也即是将第一数据表的所有数据作为待迁移数据。这样一次迁移过程即为全量迁移过程。
第二种可选的实现方式,在第一节点集存储的第一数据表的数据中筛选待迁移数据,该待迁移数据为第二节点集在迁移前没有存储的第一数据表的数据;将待迁移数据从第一节点集迁移至第二节点集。
由于在一些场景中,例如扩容场景中,一些数据可能无需进行迁移,这些数据可以称之为无效迁移数据。例如,在迁移前和迁移后在数据节点中部署的位置不变的数据和/或在迁移动作前已经删除的数据,对这些数据执行迁移动作,不仅占用数据资源,还会影响迁移的效率。因此可以通过筛选操作剔除无效迁移数据,将实际需要进行迁移的数据作为待迁移数据。也即是该待迁移数据包括第一数据表的数据中除无效迁移数据之外的数据。这样可以实现表数据的部分迁移,减少迁移的数据量,减少数据资源占用,提高迁移效率。
值得说明的是,在第一节点集和第二节点集中存在相同的数据节点(即第一节点集和第二节点集的数据节点存在交集)时,才可能会出现迁移前和迁移后在数据节点中部署的位置不变的数据的情况。如果第一节点集和第二节点集中的数据节点完全不同(在数据迁移场景可能会出现这种情况),通常不会出现迁移前和迁移后在数据节点中部署的位置不变的数据的情况;这种情况下,第一节点集中的数据节点所部署的第一数据表的数据需要全部迁移至第二节点集中的数据节点中,也即是待迁移数据是第一节点集中的数据节点所部署的第一数据表的全部数据。因此,在本申请实施例中,在第一节点集存储的第一数据表的数据中筛选待迁移数据之前,还可以检测第一节点集和第二节点集中是否存在相同的数据节点;当第一节点集和第二节点集中存在相同的数据节点时,再在第一节点集存储的第一数据表的数据中筛选待迁移数据;当第一节点集和第二节点集中不存在相同的数据节点时,不执行筛选动作。由于待迁移数据的筛选过程较前述检测过程的计算量要大,这样可以避免额外的待迁移数据的筛选,从而降低计算复杂度,提高数据迁移的效率。
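承接上文的检测逻辑,下面的Python片段示意"先判断两个节点集是否存在相同数据节点,再决定是否执行筛选"的一种可能写法(plan_migration、is_to_migrate等名称均为假设性示意):

```python
# 示意:仅当第一节点集与第二节点集存在相同数据节点时才执行待迁移数据的筛选(假设性示例)
def plan_migration(first_nodes: set, second_nodes: set, all_rows, is_to_migrate):
    """is_to_migrate(row)为假设的筛选谓词,例如基于第一/第二映射关系比较该行所属桶的目标节点。"""
    if first_nodes & second_nodes:
        return [r for r in all_rows if is_to_migrate(r)]  # 有交集:剔除无效迁移数据
    return list(all_rows)                                  # 无交集:第一数据表的全部数据均为待迁移数据

# 用法示意
rows = [101, 102, 103]
print(plan_migration({"N1", "N2"}, {"N2", "N3"}, rows, lambda r: r % 2 == 1))
print(plan_migration({"N1"}, {"N7"}, rows, lambda r: r % 2 == 1))
```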
示例的,如图5所示,在第一节点集存储的第一数据表的数据中筛选待迁移数据的过程,可以包括:
步骤3021、管理节点获取第一数据表中的数据与第一节点集的数据节点的第一映射关系。
分布式数据库中,数据的分布遵循负载均衡原则。参考前述介绍,为了保证数据的均匀分布,实现负载均衡,通常采用哈希分布规则,来在各个数据节点上分布数据。进一步的,为了避免哈希冲突,还可以通过引入哈希桶算法来进行数据分布,在引入哈希桶算法的分布式数据库中,通常以哈希桶为单位来在各个数据节点上分布数据,以达到负载均衡。通常情况下,一个数据节点可以部署一个或多个哈希桶对应的数据。
在采用哈希分布规则进行数据分布时,第一映射关系可以采用哈希值和第一节点集中数据节点的标识的映射关系表征。进一步的,在应用有哈希桶算法的分布式数据库中,由于在哈希桶算法中,哈希值与哈希桶标识一一对应,第一映射关系也可以采用哈希桶标识和第一节点集中数据节点的标识的映射关系表征。其中,数据节点的标识可以由一个或多个字符(如数字)组成,用于标识数据节点;该数据节点的标识可以为数据节点名称(如N1或N2)或数据节点编号。哈希桶标识可以由一个或多个字符(如数字)组成,用于标识哈希桶;该哈希桶标识可以为计算得到的哈希值的数值,也可以为哈希桶编号,如1或2。
第一映射关系可以实时计算。若分布式数据库预先记录有该第一映射关系,也可以直接获取预先记录的该第一映射关系。该第一映射关系可以以关系图、关系表或者关系索引的方式表征。示例的,第一映射关系可以为如图6所示的关系图,在该关系图中,假设第一映射关系可以采用哈希桶编号和第一节点集中数据节点的名称的映射关系表征,则如图6所示,基于该第一映射关系可知,哈希桶编号为1至6的数据与数据节点名称为N1至N6的数据节点分别对应,哈希桶编号为7至12的数据与数据节点名称为N1至N6的数据节点分别对应,哈希桶编号为13至17的数据与数据节点名称为N1至N5的数据节点分别对应。由此可知,第一映射关系中,数据节点N1与哈希桶编号为1、7和13的哈希桶对应;数据节点N2与哈希桶编号为2、8和14的哈希桶对应;数据节点N3与哈希桶编号为3、9和15的哈希桶对应;数据节点N4与哈希桶编号为4、10和16的哈希桶对应;数据节点N5与哈希桶编号为5、11和17的哈希桶对应;数据节点N6与哈希桶编号为6和12的哈希桶对应。在图6所示的第一映射关系中,数据节点的名称与哈希桶编号为一对多的关系。
步骤3022、管理节点获取第一数据表中的数据与第二节点集的数据节点的第二映射关系。
与第一映射关系同理,第二映射关系可以以多种方式和多种形式表征。在采用哈希分布规则进行数据分布时,第二映射关系可以采用哈希值和第二节点集中数据节点的标识的映射关系表征。进一步的,在应用有哈希桶算法的分布式数据库中,第二映射关系可以采用哈希桶标识和第二节点集中数据节点的标识的映射关系表征。其中,数据节点的标识可以由一个或多个字符(如数字)组成,用于标识数据节点;该数据节点的标识可以为数据节点名称(如N1或N2)或数据节点编号。哈希桶标识可以由一个或多个字符(如数字)组成,用于标识哈希桶;该哈希桶标识可以为计算得到的哈希值的数值,也可以为哈希桶编号,如1或2。第二映射关系可以实时计算,例如基于第一映射关系以及最小移动数量的原则确定。若分布式数据库预先记录有该第二映射关系,也可以直接获取预先记录的该第二映射关系。第二映射关系可以以关系图、关系表或者关系索引的方式表征。
示例的,第二映射关系可以为如图6所示的关系图,在该关系图中,假设第二映射关系可以采用哈希桶编号和第二节点集中数据节点的名称的映射关系表征,则如图6所示,基于该第二映射关系可知,哈希桶编号为1至6的数据与数据节点名称为N7、N2、N3、N4、N5和N8的数据节点分别对应,哈希桶编号为7至12的数据与数据节点名称为N9、N2、N3、N4、N7和N8的数据节点分别对应,哈希桶编号为13至17的数据与数据节点名称为N9、N2、N3、N7和N5的数据节点分别对应。由此可知,第二映射关系中,数据节点N2与哈希桶编号为2、8和14的哈希桶对应;数据节点N3与哈希桶编号为3、9和15的哈希桶对应; 数据节点N4与哈希桶编号为4和10的哈希桶对应;数据节点N5与哈希桶编号为5和17的哈希桶对应;数据节点N7与哈希桶编号为1、11和16的哈希桶对应;数据节点N8与哈希桶编号为6和12的哈希桶对应;数据节点N9与哈希桶编号为7和13的哈希桶对应。在图6所示的第二映射关系中,数据节点的名称与哈希桶编号为一对多的关系。
值得说明的是,第一映射关系和第二映射关系可以采用同一关系图、关系表或者关系索引表征,也可以分别采用各自的关系图、关系表或者关系索引表征。图6以第一映射关系和第二映射关系可以采用同一关系图表征为例进行说明,但并不对此进行限定。
步骤3023、管理节点基于第一映射关系和第二映射关系,在第一节点集存储的第一数据表的数据中筛选待迁移数据。
参考前述内容可知,待迁移数据为迁移(即数据重分布)前和迁移后在数据节点中部署的位置改变的数据,即有效迁移数据,该待迁移数据为第二节点集在迁移前没有存储的第一数据表的数据。
在一种可选示例中,可以遍历第一数据表中每个数据,并通过对比该第一映射关系和第二映射关系,在第一节点集存储的第一数据表的数据中筛选待迁移数据。具体地,对于第一数据表中的目标数据,在基于第一映射关系确定的与目标数据对应的数据节点与基于第二映射关系确定的与目标数据对应的数据节点不同时,在基于第一映射关系确定的与该目标数据对应的数据节点中,将该目标数据确定为待迁移数据。
以图6为例,假设哈希值与哈希桶编号相同,对于第一数据表中的目标数据X,计算其哈希值,假设计算得到的哈希值为1,则该目标数据X的哈希值存储在哈希桶1中,目标数据X即为哈希桶编号为1的数据。基于第一映射关系可知该目标数据X对应的数据节点为N1;基于第二映射关系可知该目标数据X对应的数据节点为N7。可知,目标数据X在数据迁移前后的数据节点不同,则数据节点N1中的目标数据X确定为待迁移数据。
在另一种可选示例中,通过对比该第一映射关系和第二映射关系,将两个映射关系中存储的数据节点不同的数据作为待迁移数据。具体地,该对比过程包括:对于第一节点集中的每个数据节点,查询第一映射关系,获取该数据节点对应的第一数据集;查询第二映射关系,获取该数据节点对应的第二数据集;将第一数据集的数据中,和第二数据集的数据中不同的数据作为与该数据节点对应的待迁移数据。获取的第一节点集中的各个数据节点对应的待迁移数据组成最终的待迁移数据。值得说明的是,对于第一节点集中的某一数据节点,第二节点集中可能不存在该数据节点,若第二节点集中不存在该某一数据节点,该某一数据节点对应的第二数据集为空。
以图6为例,对于第一节点集中的数据节点N1,查询第一映射关系,获取该数据节点对应的第一数据集包括哈希桶编号为1、7和13的数据;查询第二映射关系,获取该数据节点N1对应的第二数据集为空;则数据节点N1对应的待迁移数据为哈希桶编号为1、7和13的数据。对于第一节点集中的数据节点N2,查询第一映射关系,获取该数据节点对应的第一数据集为哈希桶编号为2、8和14的数据;查询第二映射关系,获取该数据节点对应的第二数据集包括哈希桶编号为2、8和14的数据;则数据节点N2对应的待迁移数据为空。其他数据节点的待迁移数据的获取方法类似,本申请实施例不再赘述。最终第一节点集对应的待迁移数据包括哈希桶编号为1、11和16的数据(后续分别从数据节点N1、N5和N4迁移到数据节点N7),哈希桶编号为6和12的数据(后续从数据节点N6迁移到数据节点N8)以及 哈希桶编号为7和13的数据(后续从数据节点N1迁移数据节点N9)。
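为直观说明上述按数据节点对比第一映射关系和第二映射关系得到待迁移数据的过程,下面给出一段示意性的Python代码(buckets_to_migrate为虚构的函数名,示例中的映射关系仅截取了图6的一部分):

```python
# 示意:按数据节点对比两个映射关系,得到各源数据节点上待迁移的哈希桶(假设性示例)
def buckets_to_migrate(first_mapping: dict, second_mapping: dict) -> dict:
    """两个入参均为 哈希桶编号 -> 数据节点 的映射;返回 源数据节点 -> 需迁出的哈希桶编号列表。"""
    result = {}
    for bucket, src in first_mapping.items():
        if second_mapping.get(bucket) != src:          # 迁移前后所在节点不同,属于待迁移数据
            result.setdefault(src, []).append(bucket)
    return result

first = {1: "N1", 7: "N1", 13: "N1", 2: "N2", 8: "N2", 14: "N2"}
second = {1: "N7", 7: "N9", 13: "N9", 2: "N2", 8: "N2", 14: "N2"}
print(buckets_to_migrate(first, second))   # 预期:{'N1': [1, 7, 13]},N2上的数据无需迁移
```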
传统的数据重分布过程中,数据从源表迁移到临时表的过程中,通过对源表加独占锁来暂时禁止数据更新,在gpdb中,由于采用全量数据迁移,整个迁移过程,源表均需要被加锁。若迁移的数据较多,例如几十吉(giga,G)或几十太(tera,T)的数据,则会引起几十分钟甚至几小时的用户业务阻塞。在GaussDB中,将整体迁移过程划分为全量迁移和多次增量迁移,若迁移的数据较多,例如几十G或几十T的数据,则会引起几十分钟的用户业务阻塞。
而本申请实施例中,虽然仍然采用全量数据迁移,但在扩容或缩容等场景中,通过前述步骤3023的待迁移数据的筛选过程,可以减少大量的无效迁移数据的迁移,从而降低业务阻塞时长,提高迁移效率。
在一种可选实施例中,前述将第一数据表的数据从第一节点集迁移至第二节点集的过程可以通过一个或多个分布式事务(Distributed Transaction)执行。
分布式数据库中的事务均可以称为分布式事务。本申请实施例中的分布式事务涉及管理节点和多个数据节点。分布式事务通常包括事务开始阶段、事务执行阶段和事务提交阶段共三个阶段。其中,在执行该分布式事务的过程中,在事务开始阶段,管理节点需要为后续事务执行阶段进行一定的语句准备;在事务执行阶段,管理节点执行分布式事务所涉及的一个或多个动作,该多个动作可以并行执行。在本申请实施例中,分布式事务包括的动作可以是扫描动作,也可以是迁移动作。其中,迁移动作可以涉及一条或多条SQL语句,分布式事务包括的动作还可以是生成分布式计划,以及发送分布式计划;在事务提交阶段遵循2阶段提交(Two-Phase Commit,2PC)协议或3阶段提交(Three-Phase Commit,3PC)协议,以保持事务在管理节点以及该多个数据节点执行的一致性。
在另一种可选实施例中,前述将第一数据表的数据从第一节点集迁移至第二节点集的过程可以通过串行执行的多个分布式事务实现。本申请实施例中,管理节点可以串行执行该多个分布式事务,以控制第一节点集和第二节点集中的数据节点实现数据迁移。
具体地,在串行执行该多个分布式事务时,管理节点通过当前执行到的分布式事务,从该第一节点集中的第一数据表的未迁移数据中选择满足迁移条件的待迁移数据(该待迁移数据的确定方式可以参考前述步骤3021至3023),并将选择的该待迁移数据从该第一节点集迁移至该第二节点集。选择的待迁移数据在被迁移过程中被加锁,通常在用于迁移该待迁移数据的分布式事务提交成功时,该待迁移数据被解锁。
其中,该迁移条件包括:通过当前执行到的分布式事务迁移的该待迁移数据的数据量小于或等于指定数据量阈值,和/或,通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值。
待迁移数据的数据量可以采用记录的条数表征,一条记录的数据也即是数据表的一行数据,是数据迁移的最小单位。相应的,指定数据量阈值可以由指定条数阈值表征。
前述数据量阈值和指定时长阈值,分别可以是固定设置的值,或者分别可以是动态变化的值。示例的,在步骤302之前,可以基于第一数据表的数据量,和/或,分布式数据库当前的负载信息,确定数据量阈值;和/或,基于第一数据表的数据量,和/或,分布式数据库使用的当前资源(如CPU、内存或IO资源中的一种或多种资源)的负载信息确定指定时长阈值。其中,第一数据表的数据量与数据量阈值正相关,与指定时长阈值正相关,分布式数据库当前的负载信息与数据量阈值负相关,与指定时长阈值负相关。也即是,第一数据表的数据量 越大,数据量阈值越大,时长阈值越长;分布式数据库当前的负载越大,数据量阈值越小,时长阈值越小。
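下面的Python片段示意"数据量阈值、时长阈值与第一数据表的数据量正相关、与当前负载负相关"的一种可能计算方式(其中的公式与系数均为便于说明而假设,并非本申请限定):

```python
# 示意:动态确定单个分布式事务的迁移阈值(公式与系数均为假设性示意)
def migration_thresholds(table_rows: int, load_ratio: float):
    """table_rows: 第一数据表的记录条数;load_ratio: 当前资源负载占比,取值0~1。"""
    base_rows = max(1000, table_rows // 100)             # 表越大,单批迁移的条数阈值越大
    row_limit = max(1, int(base_rows * (1.0 - load_ratio)))          # 负载越高,数据量阈值越小
    time_limit_s = max(1.0, 30.0 * (table_rows / 1e7) * (1.0 - load_ratio))  # 时长阈值,单位:秒
    return row_limit, time_limit_s

print(migration_thresholds(table_rows=5_000_000, load_ratio=0.3))
```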
管理节点在通过当前执行到的每个分布式事务迁移其对应的待迁移数据后,可以将第一节点集中数据节点存储的第一数据表的被迁移过的数据删除,以便后续在扫描数据时,区分哪些数据已经被迁移,哪些数据没有被迁移。
值得说明的是,用户业务阻塞的时长实际就是数据被加锁的时长,由于通过每个分布式事务迁移的数据不同,则对于每个被迁移的数据,其被加锁的时长即为对应的分布式事务的迁移过程的时长。本申请实施例中,表数据迁移采用串行的多个事务批量执行,通过限制每个分布式事务的迁移的数据量和/或迁移时长,避免在执行每个分布式事务时的资源消耗过大,减少了每个分布式事务对应的加锁时长。
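结合上文,以下Python伪码示意管理节点按批次串行发起分布式事务、每批受数据量和/或时长限制、失败时仅回滚当前事务的整体控制流程(其中begin_distributed_transaction、scan_unmigrated、move_row等接口均为虚构,仅为帮助理解的假设性示意):

```python
import time

# 示意:管理节点按批次串行发起分布式事务进行表内数据迁移(接口均为假设性示意)
def redistribute(table, row_limit: int, time_limit_s: float, db):
    while True:
        txn = db.begin_distributed_transaction()        # 假设的接口:发起一个分布式事务
        moved, start = 0, time.monotonic()
        try:
            for row in db.scan_unmigrated(table, txn):  # 扫描未迁移数据,跳过已置删除标识的数据
                db.move_row(table, row, txn)            # 对该行加锁并迁移,源节点侧置删除标识
                moved += 1
                if moved >= row_limit or time.monotonic() - start >= time_limit_s:
                    break                               # 达到本批迁移条件,结束本事务的迁移动作
            txn.commit()                                # 按两阶段/三阶段提交协议提交,保证一致性
        except Exception:
            txn.rollback()                              # 仅回滚当前这一个分布式事务已迁移的数据
            continue                                    # 实际实现中还应限制重试次数
        if moved == 0:                                  # 没有剩余未迁移数据,整体迁移完成
            db.clear_redistribution_flag(table)         # 取消第一数据表的重分布标志
            return
```

每批提交一次,被迁移数据的加锁时长仅限于该批次的迁移过程,这也是上文所述加锁时长可控的原因。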
传统的数据重分布过程中,在gpdb中,由于采用全量数据迁移,每个被迁移的数据被加锁的时长等于整个增量迁移过程的迁移时长;在GaussDB中,虽然将整体迁移过程划分为全量迁移和多次增量迁移,每个被迁移的数据被加锁的时长相对缩短,但是整体的业务阻塞时长仍然较长。
而本申请实施例中,通过限制每个分布式事务的迁移的数据量和/或迁移时长,每个被迁移的数据被加锁的时长远远小于传统的数据重分布过程中的加锁时长。整体的业务阻塞时长可以降低到1分钟左右,通常用户无感知,因此相对于传统的数据重分布方法,能够有效降低业务阻塞时长,保证业务顺畅,提高用户体验。并且,对被迁移的数据添加的锁为写锁,避免其迁移过程中,对该数据的更改和删除操作,但是对该数据的查询操作仍然可以执行。
在本申请实施例中,管理节点可以基于确定的第一节点集和第二节点集,依次发起串行的多个分布式事务,在执行每个分布式事务时生成一个或多个分布式计划,并指示第一节点集和/或第二节点集中的数据节点执行生成的分布式计划,从而实现前述第一数据表中的数据迁移。其中,每个分布式计划与一个或多个数据节点对应。该分布式计划包括一个或多个SQL语句,其用于指示对应的数据节点执行的动作,以及执行动作的先后顺序等。例如,该执行的动作可以是扫描动作,也可以是迁移动作。该分布式计划可以携带前述迁移条件,或者基于该迁移条件确定的子迁移条件。可选地,在每次发起分布式事务时,管理节点还可以结合当前系统资源情况调整分布式计划的内容,例如调整迁移条件或子迁移条件。该分布式计划可以通过在对应的数据节点中执行事务或者任务来实现。例如,一个数据节点接收到该分布式计划时,可以发起一个事务(也称本地事务)或者任务以按照分布式计划中指示的先后顺序,执行该分布式计划中所指示的动作。
在第一种可选方式中,管理节点基于当前执行到的分布式事务,生成多个分布式计划以指示多个数据节点进行第一数据表中的数据迁移。假设,第一节点集包括n个数据节点,n为正整数;第二节点集包括m个数据节点,m为正整数。如图7所示,该迁移过程包括:
步骤3024、管理节点基于当前执行到的分布式事务,为n个数据节点分别生成n个分布式计划,n个数据节点与n个分布式计划一一对应;管理节点指示n个数据节点分别执行n个分布式计划来并行从n个数据节点中的第一数据表的未迁移数据中选择满足子迁移条件的待迁移数据、并将选择的满足子迁移条件的待迁移数据从n个数据节点发送至第二节点集。
具体地,对于当前执行到的分布式事务,管理节点将基于该分布式事务生成的n个分布式计划中的每个分布式计划发送至对应的数据节点。由该对应的数据节点执行该分布式计划。当各个数据节点执行完成对应的分布式计划后,管理节点执行下一次分布式事务,再生成新的n个分布式计划,并分别发送至对应的数据节点,以此类推。如果第一数据表的所有数据都已经完成迁移,管理节点取消表重分布标志,并准备下一个数据表的数据迁移。
前述子迁移条件是根据迁移条件确定的。可选地,该分布式计划还可以携带前述子迁移条件。例如,当该迁移条件为通过当前执行到的分布式事务迁移的该待迁移数据的数据量小于或等于指定数据量阈值时,对应的,子迁移条件为通过执行对应的分布式计划迁移的该待迁移数据的数据量小于或等于子数据量阈值。该子数据量阈值小于该指定数据量阈值。n个分布式计划对应的子数据量阈值可以相等也可以不等,例如可以等于指定数据量阈值的n分之一。当该迁移条件为通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值时,对应的,子迁移条件为通过执行对应的分布式计划迁移的迁移时长小于或等于子时长阈值。该子时长阈值小于或等于指定时长阈值,且n个分布式计划对应的子时长阈值的最大值为前述指定时长阈值。n个分布式计划对应的子时长阈值可以相等也可以不等,通常情况下均等于指定时长阈值。
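下面以一段示意性的Python代码说明由迁移条件推导各分布式计划的子迁移条件的一种可能方式(按n均分数据量阈值、子时长阈值取指定时长阈值,split_condition为虚构的函数名):

```python
# 示意:将分布式事务级别的迁移条件拆分为n个分布式计划的子迁移条件(假设性示例)
def split_condition(row_limit: int, time_limit_s: float, n: int):
    sub_row_limit = max(1, row_limit // n)   # 每个数据节点的子数据量阈值,约为指定数据量阈值的n分之一
    sub_time_limit = time_limit_s            # 子时长阈值通常与指定时长阈值相同
    return [(sub_row_limit, sub_time_limit) for _ in range(n)]

print(split_condition(row_limit=30000, time_limit_s=10.0, n=3))
# 预期:[(10000, 10.0), (10000, 10.0), (10000, 10.0)]
```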
对于n个数据节点中的每个数据节点,其获取的分布式计划可以通过在该数据节点中执行事务或者任务来实现。假设第一数据节点为n个数据节点中的任一数据节点,以该第一数据节点执行本地事务来实现分布式计划为例。例如,为该第一数据节点生成的分布式计划可以包括一个或多个SQL语句,其用于指示第一数据节点执行扫描动作、迁移动作,且扫描动作、迁移动作并行执行,数据迁移的目标数据节点为第二数据节点(即第二节点集中的数据节点),且该分布计划携带子迁移条件。则基于该分布式计划,该第一数据节点可以通过本地事务扫描(也称表扫描)第一数据节点中存储的第一数据表的未迁移数据,以选择满足子迁移条件的待迁移数据、并将选择的满足子迁移条件的待迁移数据从第一数据节点发送至第二节点集中的第二数据节点。
例如,当采用前述第一种可选的实现方式,将第一数据表的所有未迁移数据作为待迁移数据时,第一数据节点通过本地事务,可以遍历第一数据节点中存储的第一数据表的未迁移数据,将遍历得到的数据作为待迁移数据。
当采用前述第二种可选的实现方式,待迁移数据是在第一数据表的数据中筛选得到的,第一数据节点通过本地事务,遍历第一数据节点中第一数据表中的未迁移数据,筛选得到满足迁移子条件的待迁移数据。该筛选过程可以参考前述步骤3023。
当某一分布式事务为数据重分布过程中首次发起的分布式事务时,n个数据节点扫描得到的未迁移数据即为该第一数据表的所有数据,当某一分布式事务为数据重分布过程中非首次发起的分布式事务时,n个数据节点扫描得到的未迁移数据为该第一数据表的未通过之前的分布式事务所迁移的数据。
在第一种可选的实现方式中,第一数据节点通过本地事务可以扫描第一数据节点中存储的第一数据表的全部记录来获取未迁移的数据,即从第一数据节点中存储的第一数据表的开头开始从上往下扫描。采用该第一种可选的实现方式提供的扫描方式,在管理节点执行到的每个分布式事务时,指示第一数据节点均扫描第一数据节点中存储的第一数据表的全部记录,可以避免待迁移数据的遗漏。
可选地,如果采用第二种可选的实现方式扫描未迁移的数据,第一数据节点通过本地事务可以将本次扫描结束的位置记录下来,在管理节点执行到的下一个分布式事务时,指示第一数据节点基于对应的分布式计划,从第一数据节点中存储的第一数据表的记录的最近一次扫描结束的位置向后扫描来获取未迁移的数据。这样可以避免第一数据节点中前面已经被扫描的记录再被扫描。
可选地,如果采用第二种可选的实现方式扫描未迁移的数据,为了避免更新的数据存储在管理节点在本次分布式事务之前的分布式事务控制数据节点扫描过的数据记录中,管理节点可以通过最后一次执行的分布式事务生成n个分布式计划,每个分布式计划指示对应的数据节点一次性扫描完该数据节点上存储的第一数据表的数据,从而避免数据遗漏,或者通过多个分布式事务控制n个数据节点同时分别扫描第一数据表的不同数据。
本申请实施例中,在管理节点执行到当前的分布式事务时,前述步骤3023和步骤3024可以嵌套执行,也即是前述步骤3023的具体动作是管理节点通过分布式计划指示数据节点执行的。
步骤3025、管理节点基于当前执行到的分布式事务为m个数据节点分别生成m个分布式计划,m个数据节点与m个分布式计划一一对应;管理节点指示m个数据节点分别执行m个分布式计划来并行接收并存储从第一节点集发送的第一数据表的数据。
对于m个数据节点中的每个数据节点,其获取的分布式计划可以通过在该数据节点中执行事务或者任务来实现。假设第二数据节点为m个数据节点中的任一数据节点,以该第二数据节点执行本地事务来实现分布式计划为例。例如,为该第二数据节点生成的分布式计划可以包括一个或多个SQL语句,其用于指示第二数据节点执行接收动作、存储动作,且接收动作、存储动作并行执行,数据的源数据节点为第一数据节点。则基于该分布式计划,该第二数据节点可以通过本地事务接收并存储从第一节点集发送的第一数据表的数据。
可选地,对于第一节点集中的每个数据节点,该数据节点部署用于执行管理节点下发的分布式计划的本地事务。具体地,该数据节点执行的本地事务可以包括两个线程,两个线程用于分别执行前述扫描动作和迁移动作。示例的,每个本地事务包括扫描线程和发送线程,扫描线程用于扫描第一节点集对应数据节点中的第一数据表的未迁移数据(也即是扫描第一数据表的数据时,跳过已删除数据)得到待迁移数据,确定待迁移数据的过程可以参考前述步骤3023;发送线程用于将待迁移数据发送至第二节点集中的目标数据节点。两个线程可以并行执行,提高数据重分布效率。对于第二节点集中的每个数据节点,该数据节点部署用于执行管理节点下发的分布式计划的本地事务。具体地,该数据节点执行的本地事务可以包括一个接收线程,用于接收其他数据节点发送的数据,并将接收的数据写入本地数据节点。由于第一节点集中的数据节点也可能收到其他节点的数据,因此,第一节点集中的每个数据节点执行的本地事务还也可以包括一个接收线程。同理,由于第二节点集中的数据节点也可能向其他节点发送数据,因此,第二节点集中的每个数据节点执行的本地事务也可以包括一个发送线程。可选地,当一个数据节点需要同时发起发送线程和接收线程时,为了节约对线程的占用,该数据节点可以通过执行本地事务发起一个收发线程(即该本地事务包括一个收发线程),用于完成前述发送线程和接收线程的功能,例如接收和发送数据。
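以下Python片段以线程和队列为例,示意第一节点集中某个数据节点上"扫描线程+发送线程"并行协作的基本方式(run_local_transaction、send_to_target等名称均为虚构,并非GaussDB或gpdb的真实实现):

```python
import queue
import threading

# 示意:扫描线程与发送线程通过队列协作,完成一批待迁移数据的发送(假设性示例)
def run_local_transaction(rows_to_migrate, send_to_target):
    q = queue.Queue(maxsize=1024)
    DONE = object()                                   # 结束标记

    def scan_worker():
        for row in rows_to_migrate:                   # 扫描本节点上第一数据表的未迁移数据
            q.put(row)
        q.put(DONE)

    def send_worker():
        while True:
            row = q.get()
            if row is DONE:
                break
            send_to_target(row)                       # 将待迁移数据发送至第二节点集中的目标数据节点

    t1 = threading.Thread(target=scan_worker)
    t2 = threading.Thread(target=send_worker)
    t1.start(); t2.start()
    t1.join(); t2.join()

run_local_transaction(range(5), send_to_target=print)  # 用法示意:把0~4"发送"到标准输出
```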
值得说明的是,第一节点集中数据节点在完成本地数据节点上存储的第一数据表的待迁移数据的迁移后,可以向该待迁移数据所迁移至的第二节点集中的目标数据节点发送迁移完成通知(也称结束标记);对于第二节点集中的任一数据节点,当该数据节点接收到对应的各个源数据节点(该数据节点对应的源数据节点可以记载在分布式计划中)的迁移完成通知后,确定完成对应的分布式计划的执行,停止执行对应的分布式计划。
通过基于分布式事务生成多个分布式计划,可以指示多个数据节点并行执行多个分布式计划,以并行进行数据迁移,这样可以有效节约每个分布式事务的执行时长,提高执行分布式事务的效率。
如图8所示,假设第一节点集包括数据节点N1至N3,第二节点集包括数据节点N4,管理节点通过串行执行的两个分布式事务进行待迁移数据的迁移。其中,假设该两个分布式事务包括第一分布式事务和第二分布式事务,基于第一分布式事务生成的3个分布式计划,分别由第一节点集中3个数据节点的事务1a至1c实现,基于第二分布式事务生成的3个分布式计划,分别由第一节点集中3个数据节点的事务2a至2c。假设采用记录条数表征迁移数据量,每个分布式计划中对应的指定数据量阈值为1条,则执行事务1a至1c中每个事务,以扫描对应数据节点未迁移的多条记录的数据后,完成1条记录的数据的迁移。以管理节点执行第一分布式事务为例,每个数据节点执行对应分布式计划,使得每个数据节点通过其事务在本地数据节点上进行数据扫描,找到待迁移数据,并将待迁移数据发送到目标数据节点(图8中是数据节点N4),并同时删除本地数据节点由该事务迁移过的已迁移数据。通过执行事务1a至1c并行执行扫描和迁移动作,直到满足前述迁移条件,或者每个数据节点满足对应的子条件。之后管理节点提交第一分布式事务,完成这批数据的迁移。其中,找到待迁移数据的过程可以参考前述步骤302中对应过程。事务2a至2c的执行过程参考前述事务1a至1c的执行过程,本申请实施例对此不做赘述。进一步的,基于第一分布式事务还可以生成一个与数据节点N4对应的分布式计划,数据节点N4通过执行事务(图8中未示出)来实现该分布式计划,从而接收数据节点N1至N3发送的数据,并将接收的数据存储到数据节点N4。
在第二种可选方式中,管理节点基于当前执行到的分布式事务,生成一个分布式计划,并指示第一节点集中的数据节点和第二节点集中的数据节点执行该分布式计划,以从第一节点集中的第一数据表的未迁移数据选择满足迁移条件的待迁移数据,并将选择的待迁移数据从第一节点集迁移至第二节点集。
该分布式计划与第一节点集和第二节点集中的多个数据节点对应,其可以视为前述第一种可选方式中,n个分布式计划和m个分布式计划的整合计划。该分布式计划包括一个或多个SQL语句,其用于指示第一节点集和第二节点集中每个数据节点执行的动作,以及执行动作的先后顺序等。例如,该执行的动作可以包括扫描动作、迁移动作、接收动作和/或存储动作。可选地,该分布式计划还可以携带前述迁移条件。每个数据节点接收到该分布式计划后,可以确定自身所需执行的动作,还可以基于该迁移条件确定与自身对应的子迁移条件,该迁移条件确定过程可以参考前述第一种可选方式。
该分布式计划可以通过在数据节点中执行事务或者任务来实现。第一节点集和第二节点集中每个数据节点执行分布式计划中自身所需执行的动作的过程可以参考前述第一种可选方式中该数据节点执行与对应分布式计划的过程,本申请实施例对此不再赘述。
本申请实施例中,分布式数据库采用多版本并发控制机制(Multiversion concurrency control,MVCC)进行数据存储。在多版本并发机制中,从某一数据节点上删除的数据并没有 从该数据节点物理上移除,只是作为历史版本也同样存储在该数据节点上。例如,在执行前述步骤3025后,管理节点为第一节点集上的第一数据表中的已迁移数据设置删除标识(或者通过分布式计划控制数据节点设置删除标识),该删除标识指示已迁移数据转化为历史版本的数据。则前述步骤3025中所述的迁移后的数据被删除,实质上是将数据作为历史版本记录在相应的数据节点上,后续通过执行分布式事务,以进行数据扫描时,跳过该历史版本的数据即可(即跳过设置有删除标记的数据)。这样,可以保证数据迁移过程中,用户针对该历史版本的数据的数据查询操作有效执行。
值得说明的是,数据迁移过程中,由于正在被迁移的数据被加锁,针对该被迁移的数据只能执行数据查询操作,不能执行数据修改和删除操作。一旦该数据的迁移完成,第一节点集上的该数据会被设置删除标识,该数据就变成了历史版本数据(实际上并未在第一节点集中真正删除),最新版本数据已经迁移到第二节点集中的新节点上。这些历史版本的数据只支持数据查询操作。当用于迁移该数据的分布式事务提交后,新的用户事务也不会再查询这些历史版本的数据了。当针对第一节点集中历史版本的数据的并发事务(例如用于查询历史版本的数据的数据查询操作)全部结束后,这些历史版本的数据就不再会被访问,可以被物理删除。分布式数据库基于其运行的周期的数据清理机制,会将这些历史版本的数据从第一数据表的数据中清除,也即从分布式数据库中物理性移除(该过程为数据的过期清理过程)。
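下面的Python片段示意"已迁移数据仅置删除标识、后续扫描跳过历史版本、并发查询全部结束后再物理清理"的基本思路(其中的数据结构与字段名均为假设性示意):

```python
# 示意:多版本并发控制下的删除标识与过期清理(字段名均为假设)
rows = [
    {"key": 1, "value": "a", "deleted": False},
    {"key": 6, "value": "b", "deleted": False},
]

def mark_migrated(row):
    row["deleted"] = True                 # 迁移完成后只置删除标识,不立即物理删除,历史版本仍可被查询

def scan_unmigrated(all_rows):
    return [r for r in all_rows if not r["deleted"]]    # 后续分布式事务扫描时跳过历史版本的数据

def vacuum(all_rows, has_concurrent_readers: bool):
    if has_concurrent_readers:
        return all_rows                   # 仍有并发查询访问历史版本,暂不清理
    return [r for r in all_rows if not r["deleted"]]    # 周期性清理:物理移除历史版本的数据

mark_migrated(rows[0])
print(scan_unmigrated(rows))              # 只剩key=6的记录
print(vacuum(rows, has_concurrent_readers=False))
```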
步骤303、管理节点在迁移第一数据表的数据的过程中,当接收到对第一数据表的目标业务请求时,在第一节点集和第二节点集中确定用于响应目标业务请求的第三节点集。
在该数据迁移过程中,由于用户的不同需求,可以产生多种类型的用户业务。
用户业务在不同场景中有多种,例如数据查询业务、数据添加业务(也称数据插入业务)、数据删除业务和数据修改业务,对应的业务请求分别为数据查询请求、数据添加请求(也称数据插入请求)、数据删除请求和数据修改请求。其中,数据查询请求用于请求进行数据的数据查询操作,数据添加请求用于请求进行数据添加操作,数据删除请求用于请求进行数据删除操作,数据修改请求用于请求进行数据修改操作。其中,数据查询业务又基于其与数据表的关联性划分为与一个数据表关联的数据查询业务和与多个数据表关联的数据查询业务,与一个数据表关联的数据查询业务对应的数据查询请求所指示的数据查询操作仅需要查询一个数据表中的数据,与多个数据表关联的数据查询业务对应的数据查询请求所指示的数据查询操作需要查询多个数据表中的数据。例如,数据查询请求为:“查询公司X中的女员工信息”,假设公司X的女员工信息记录在第一数据表中,则查询操作只涉及一个数据表,该数据查询请求即为与一个数据表关联的数据查询业务对应的数据查询请求;又例如,数据查询请求为:“查询公司X的客户公司的女员工信息”,假设公司X的客户公司记录在第二数据表中,不同客户公司的女员工信息记录在不同的数据表中,则查询操作指示先查询第二数据表得到公司X的客户公司的标识,再基于获取的标识查询对应的公司的数据表,得到公司X的客户公司的女员工信息。该数据查询请求涉及多个数据表,该数据查询请求即为与多个数据表关联的数据查询业务对应的数据查询请求。
本申请实施例,数据重分布方法可以应用于多种场景,则该目标业务请求可以为数据查询请求、数据添加请求(也称插入请求)、数据删除请求或数据修改请求,该目标业务请求可以针对一条或多条记录的数据。如图9所示,由于数据迁移过程中,同一目标业务所针对的业务数据可能涉及数据重分布前的数据节点和/或数据重分布后的数据节点。例如,当数据 按照哈希桶的方式进行哈希分布,同一哈希桶的数据会通过串行执行的多个分布式事务来移动,因此在迁移过程中会存在同一个哈希桶的数据同时分布在两个数据节点上(已经迁移的数据分布在第二节点集的数据节点上,没有迁移的数据分布在第一节点集的数据节点上,且该哈希桶对应的所有新增数据都直接写入在第二节点集的数据节点上)。因此针对不同的目标业务,最终确定的第三节点集不同,该第三节点集包括一个或多个数据节点。本申请实施例以以下几种实现场景为例,对该第三节点集的确定过程进行说明:
在第一种实现场景中,当目标业务请求为数据添加请求时,在第二节点集中确定用于响应所述数据添加请求的第三节点集。
示例的,根据数据添加请求所携带的新增数据的键值计算哈希值;根据哈希值在第二节点集中确定用于响应所述数据添加请求的第三节点集。例如,将哈希值所对应的第二节点集中的数据节点确定为第三节点集的数据节点。例如,可以确定哈希值所对应哈希桶,将第二节点集中哈希桶所对应的数据节点确定为第三节点集中的数据节点。例如,可以通过查询前述第二映射关系表,将查询得到的数据节点确定为第三节点集中的数据节点。
如图9所示,假设接收到数据添加请求,该请求指示数据添加操作对应新增数据D,则通过哈希分布规则确定新增数据D所存储的第三节点集为数据节点N4,将该新增数据D存储在数据节点N4。
对于传统的数据重分布方法,考虑到源表和临时表的一致性,如果源表的新增数据速率大于数据迁移速率,会导致数据迁移无法结束,如果强行锁表进行迁移,可能会使锁表的时间比较长而影响用户业务。而本申请实施例,由于无需建立临时表,将新增数据直接添加到第二节点集的数据节点(即第三节点集)中,则数据迁移过程中,无需迁移这些新增数据,也无需记录这些新增数据,可以快速实现新增数据的存储,有效减少数据迁移的数量,简化数据迁移过程,提高数据迁移效率,减少对用户业务的影响。
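结合第一种实现场景,下面的Python片段示意新增数据按第二映射关系直接路由到第二节点集中数据节点的过程(哈希桶个数、桶编号从1开始等设定均为示例性假设,route_insert为虚构的函数名):

```python
# 示意:数据添加请求直接路由到第二节点集中的数据节点(假设性示例)
BUCKET_COUNT = 17

def route_insert(key: int, second_mapping: dict) -> str:
    bucket = key % BUCKET_COUNT + 1          # 由键值计算哈希值,进而得到哈希桶编号(此处编号从1开始)
    return second_mapping[bucket]            # 新增数据只写入第二节点集中该桶对应的数据节点

second_mapping = {1: "N7", 2: "N2", 3: "N3"}                 # 第二映射关系的片段
print(route_insert(key=18, second_mapping=second_mapping))   # 18 % 17 + 1 = 2,路由到N2
```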
在第二种实现场景中,当目标业务请求为数据删除请求或者数据修改请求或者与第一数据表关联的数据查询请求时,在该第一节点集中确定用于响应该目标业务请求的数据节点,并在该第二节点集中确定用于响应该目标业务请求的数据节点,由从该第一节点集中确定的数据节点和从该第二节点集中确定的数据节点组成该第三节点集。
在一种可选方式中,当目标业务请求包括数据删除请求时,在第一节点集中查询用于响应数据删除请求的数据节点(即该数据删除请求所请求删除的数据所在数据节点),并在第二节点集中查询用于响应数据删除请求的数据节点,合并分别查询得到的数据节点来组成第三节点集。如图9所示,在第一节点集(包括数据节点N1至N3)中查询数据删除请求所请求删除的数据B所在数据节点,得到数据节点N2;在第二节点集(包括数据节点N4)中查询数据B所在的数据节点,得到数据节点N4;由查询得到的数据节点组成的第三节点集包括数据节点N2和N4。
示例的,对于数据删除请求,如果可以基于键值来进行删除,由键值计算哈希值后,基于前述第一映射关系表在第一节点集中确定第四节点集,基于前述第二映射关系表在第二节点集中确定第五节点集。被删除的数据在两个节点集中都可能存在,因此将第四节点集和第五节点集的并集确定为第三节点集,即第三节点集包括第四节点集和第五节点集。其中第四节点集和第五节点集均包括一个或多个数据节点。
在另一种可选方式中,当目标业务请求包括数据修改请求时,在第一节点集中查询用于响应数据修改请求的数据节点(即该数据修改请求所请求修改的数据所在数据节点),并在第二节点集中查询用于响应数据修改请求的数据节点,合并分别查询得到的数据节点来组成第三节点集。如图9所示,在第一节点集(包括数据节点N1至N3)中查询数据修改请求所请求修改的数据C所在的数据节点,得到数据节点N3;并在第二节点集(包括数据节点N4)中查询数据C所在的数据节点,得到数据节点N4;由查询得到的数据节点组成的第三节点集包括数据节点N3和N4。
示例的,对于数据修改请求,如果可以基于键值来进行修改,由键值计算哈希值后,基于前述第一映射关系表在第一节点集中确定第六节点集,基于前述第二映射关系表在第二节点集中确定第七节点集。被修改的数据在两个节点集中都可能存在,因此将第六节点集和第七节点集的并集确定为第三节点集。其中第六节点集和第七节点集均包括一个或多个数据节点。
在又一种可选方式中,当目标业务请求包括与第一数据表关联的数据查询请求时,在第一节点集中查询用于响应数据查询请求的数据节点(即该数据查询请求所请求查询的数据所在的数据节点),并在第二节点集中查询用于响应数据查询请求的数据节点,合并分别查询得到的数据节点来组成第三节点集。如图9所示,在第一节点集(包括数据节点N1至N3)中查询数据查询请求所请求查询的数据A所在的数据节点,得到数据节点N1,并在第二节点集(包括数据节点N4)中查询数据A所在的数据节点,得到数据节点N4,则由查询得到的数据节点组成的第三节点集包括数据节点N1和N4。
示例的,对于数据查询请求,如果可以基于键值来进行查询,由键值计算哈希值后,基于前述第一映射关系表在第一节点集中确定第八节点集,基于前述第二映射关系表在第二节点集中确定第九节点集。被查询的数据在两个节点集中都可能存在,因此将第八节点集和第九节点集的并集确定为第三节点集。其中第八节点集和第九节点集均包括一个或多个数据节点。
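对于上述第二种实现场景,下面的Python片段示意基于键值分别在两个映射关系中定位数据节点并取并集得到第三节点集的过程(route_by_key为虚构的函数名,映射关系仅为示例片段):

```python
# 示意:删除/修改/查询请求按键值在两个映射关系中定位数据节点,取并集作为第三节点集(假设性示例)
BUCKET_COUNT = 17

def route_by_key(key: int, first_mapping: dict, second_mapping: dict) -> set:
    bucket = key % BUCKET_COUNT + 1
    nodes = set()
    if bucket in first_mapping:
        nodes.add(first_mapping[bucket])     # 未迁移的数据可能仍在第一节点集的数据节点上
    if bucket in second_mapping:
        nodes.add(second_mapping[bucket])    # 已迁移或新增的数据在第二节点集的数据节点上
    return nodes                             # 第三节点集:两者的并集

first = {1: "N1", 2: "N2"}
second = {1: "N7", 2: "N2"}
print(route_by_key(key=17, first_mapping=first, second_mapping=second))   # 17%17+1=1,得到{'N1', 'N7'}
```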
需要说明的是,与第一数据表关联的数据查询请求可以为仅与第一数据表关联的数据查询请求,也可以为与包括第一数据表的多个数据表关联的数据查询请求。若查询请求为与包括第一数据表的多个数据表关联的数据查询请求时,对于该查询请求所关联的每个数据表,用于响应该查询请求的该数据表对应的第三节点集的获取方式参考数据查询请求为仅与第一数据表关联的数据查询请求时,第一数据表对应的第三节点集的获取方式,本申请实施例对此不做赘述。后续该数据查询请求需要发送至该多个数据表对应的第三节点集。发送过程可以参考后续步骤304。
第二种实现场景中,通过执行前述查询数据节点的操作,可以减少第三节点集中数据节点的数量,减少与第三节点集后续交互的信息量,节约通信开销。
如前所述,目标业务请求所针对的数据可以是一条或多条记录的数据,当其针对一条记录的数据时,由于一条记录的数据不可能同时存在于两个数据节点上,因此同一条记录只能在其中一个数据节点被处理成功。如果不是基于键值进行第三节点集的确定,目标业务请求需要发给数据重分布前后涉及的所有数据节点。因为数据迁移过程中,所有数据节点都可能存在满足目标业务请求所请求的条件的记录。由此可知,前述第二种实现场景中,也可以不执行查询数据节点的操作,直接将第一节点集和第二节点集的并集确定为第三节点集。例如,目标业务请求为数据查询请求,该数据查询请求用于请求查询第一数据表中指定数据范围或 指定时间范围的数据,该指定数据范围可以为符合指定条件的数据的范围,该指定时间范围可以为早于或晚于指定时间点的时间范围,则由于该数据查询请求所针对的数据在第一数据表的数据迁移过程中,可能一部分位于数据重分布前的数据节点,另一部分位于数据重分布后的数据节点,通常需要遍历数据重分布前的数据节点和数据重分布后的数据节点,以避免遗漏查询的数据,因此可以直接将第一节点集和第二节点集的并集确定为第三节点集。并且,直接将第一节点集和第二节点集的并集确定为第三节点集还可以减少查询数据节点的时延,提高业务执行效率。
步骤304、管理节点将目标业务请求发送至第三节点集中的数据节点。
该目标业务请求用于供第三节点集中每个数据节点基于目标业务请求进行业务处理,第三节点集中的每个数据节点在接收了目标业务请求后,进行相应的业务处理。例如,假设第一数据节点为第三节点集中的任一个数据节点,则该第一数据节点执行以下过程:
当第一数据节点接收到数据查询请求,检测本数据节点是否存储有数据查询请求所请求查询的数据,如果本数据节点存储有数据查询请求所请求查询的数据,获取该数据的信息,并向管理节点发送数据查询响应,该数据查询响应包括查询到的数据;如果本数据节点未存储有数据查询请求所请求查询的数据,停止动作,或者向管理节点发送数据查询响应,该数据查询响应指示未查询到所请求的数据。
当第一数据节点接收到数据添加请求,直接在本数据节点添加新增数据。可选地,第一数据节点可以向管理节点发送添加成功响应。
当第一数据节点接收到数据修改请求,检测本数据节点是否存储有数据修改请求所请求修改的数据,如果本数据节点存储有数据修改请求所请求修改的数据,根据数据修改请求修改该数据。可选地,向管理节点发送数据修改响应,该数据修改响应包括修改后的数据,或者指示修改成功;如果本数据节点未存储有数据修改请求所请求修改的数据,停止动作,或者向管理节点发送数据修改响应,该数据修改响应指示不存在所请求的数据。
当第一数据节点接收到数据删除请求,检测本数据节点是否存储有数据删除请求所请求删除的数据,如果本数据节点存储有数据删除请求所请求删除的数据,根据该数据删除请求删除该数据。可选地,向管理节点发送数据删除响应,该数据删除响应指示删除成功;如果本数据节点未存储有数据删除请求所请求删除的数据,停止动作,或者向管理节点发送数据删除响应,该数据删除响应指示不存在所请求的数据。
如前所述,由于本申请实施例的前述数据重分布过程中,数据在迁移之后不再存储于其迁移前的数据节点,因此保证同一条记录的数据仅会存储在分布式数据库的一个数据节点上,而不会存储于两个数据节点上,从而可以保证前述目标业务请求不会出现冲突响应的情况。
步骤305、管理节点在迁移第一数据表的数据的过程中,若检测到回滚触发事件,将通过多个分布式事务已迁移的数据进行回滚(rollback)。
回滚触发事件可以是在第二节点集中与第一数据表关联的数据节点故障(如宕机),或者是在第二节点集中与第一数据表关联的数据节点发生数据传输错误,或者是在第二节点集中与第一数据表关联的数据节点发生网络错误,或者是在第二节点集中与第一数据表关联的数据节点接收到回滚指令,或者与第一数据表关联的分布式事务提交失败等。
在一种可能实现方式中,在该分布式数据库中检测到回滚触发事件后,将通过多个分布式事务已迁移的数据进行回滚,可以将分布式数据库恢复到之前的能够正常运行的状态,以便于后续过程中达到回滚触发事件的结束条件后,分布式数据库仍然能够正常执行在线业务以及数据重分布等业务。
在一种可能实现方式中,前述步骤305可以替换为:在迁移第一数据表的数据的过程中,若检测到回滚触发事件,将通过当前执行到的分布式事务已迁移的数据进行回滚。
传统的分布式数据库中,通过一次分布式事务迁移数据表中的数据,若检测到回滚触发事件,将当前所迁移的数据全部进行回滚,即撤销该一次分布式事务所对应的所有已执行动作。回滚的数据量较大,所有已迁移的数据均失效,达到再次迁移条件后,需要重新再迁移,导致数据的重复迁移,造成资源的浪费,数据库的容错性差。
而本申请实施例中,前述分布式事务保证了迁移过程的数据一致性和持久性,当分布式事务有多个时,整体的数据迁移过程拆分成串行执行的多个分布式事务的迁移过程,若检测到回滚触发事件,只需将当前执行到的一个分布式事务对应的所有操作进行回滚,在再次满足迁移条件后可以继续发起新的分布式事务进行数据迁移。因此,降低了回滚的数据粒度以及回滚的数据量,减少重复迁移的数据量,减少回滚对数据迁移过程整体上影响,避免资源浪费,提高数据库的容错性。
值得说明的是,在迁移第一数据表的数据的过程中,除了前述数据查询业务、数据添加业务等数据操作语言(data manipulation language,DML)业务,还可以产生其他类型的用户业务。如数据定义语言(data definition language,DDL)业务,该DDL业务包括创建表信息、修改表信息和删除表信息等业务,DDL业务所请求操作的对象是表信息,即表的定义和架构。
在传统的数据重分布方法中,由于需要保证源表和临时表的数据一致性,因此在数据迁移过程中,不允许执行DDL业务。
而本申请实施例中,由于无需建立临时表,数据迁移过程发生在数据表内,而不是源表和临时表之间,因此,在数据迁移过程中,支持前述DDL业务。例如支持修改表元信息,允许修改表名,增加或删除数据表中的字段等等。
值得说明的是,前述实施例是以一个数据表需要进行数据重分布为例进行说明的,本申请实施例在实际实现时,多个数据表可以同时执行上述数据重分布过程,提高数据重分布的效率,增加并发度。
综上所述,本申请实施例提供的数据重分布方法,无需建立临时表,即可进行目标任务的执行,实现在线数据重分布,这样无需进行表间数据迁移,仅需进行表内数据迁移,从而降低了在线数据重分布的复杂度。
并且,由于采用串行执行的多个分布式事务进行数据迁移,单次迁移耗时较短,资源消耗较小,减少了对同时执行的其他用户作业的影响。
进一步的,由于将新增数据直接写入数据重分布后的数据节点,有效减少了迁移的数据量,从而降低资源消耗,减少了对同时执行的其他用户作业的影响。
示例的,以前述图6为例,若采用传统的数据重分布方法,哈希桶编号为1至17的数据均需要从第一节点集迁移至第二节点集;而本申请实施例中,哈希桶编号为1的数据需要从数据节点N1移动到数据节点N7,哈希桶编号为2的数据不需要移动,哈希桶编号为7的数据需要从数据节点N1移动到数据节点N9等等。总体需要迁移的仅有哈希桶编号为1、6、7、11、12、13和16的数据(图6中第二节点集中需要接收前述迁移的数据的数据节点N7、N8和N9采用了阴影表示),有效减少了数据迁移量。
本申请实施例,在有并发用户作业的场景下,采用了表内数据迁移和分布式多版本并发控制技术来实现数据迁移。不需要考虑并发作业的插入和删除操作导致的数据追增,允许按数据量和执行时间分批多次进行迁移,保证数据重分布对系统资源的消耗是可控的,可以有效的控制迁移的资源消耗和锁冲突影响,对用户作业的影响大大减少。利用本申请实施例来实现分布式数据库的在线扩容,可以避免停机扩容导致的长时间业务阻塞,对在线作业影响很小,即使在数据节点故障和网络故障情况下也可以很容易重新恢复重分布操作,对数据迁移的影响也很小。
本申请实施例提供一种数据重分布装置40,该数据重分布装置40可以部署在管理节点上。如图10所示,数据重分布装置40包括:
第一确定模块401,用于执行前述步骤301;
迁移模块402,用于执行前述步骤302;
第二确定模块403,用于执行前述步骤303;
发送模块404,用于执行前述步骤304。
综上所述,本申请实施例提供的数据重分布装置,无需建立临时表,即可进行目标任务的执行,实现在线数据重分布,这样无需进行表间数据迁移,仅需进行表内数据迁移,从而降低了在线数据重分布的复杂度。
可选地,如图11所示,所述第二确定模块403,包括:
确定子模块4031,用于当所述目标业务请求为数据添加请求时,在所述第二节点集中确定用于响应所述数据添加请求的所述第三节点集。
可选地,所述确定子模块4031,用于:
根据所述数据添加请求所携带的新增数据的键值计算哈希值;
在所述第二节点集中确定所述哈希值对应的数据节点,确定的数据节点属于所述第三节点集。
可选地,所述第二确定模块403,用于:
当所述目标业务请求为数据删除请求或者数据修改请求或者与第一数据表关联的数据查询请求时,在所述第一节点集中确定用于响应所述目标业务请求的数据节点,并在所述第二节点集中确定用于响应所述目标业务请求的数据节点,由从所述第一节点集中确定的数据节点和从所述第二节点集中确定的数据节点组成所述第三节点集。
可选地,如图12所示,所述迁移模块402,包括:
筛选子模块4021,用于在所述第一节点集存储的所述第一数据表的数据中筛选待迁移数据,所述待迁移数据为所述第二节点集在迁移前没有存储的所述第一数据表的数据;
迁移子模块4022,用于将所述待迁移数据从所述第一节点集迁移至所述第二节点集。
可选地,所述筛选子模块4021,用于:
获取所述第一数据表中的数据与所述第一节点集的数据节点的第一映射关系;
获取所述第一数据表中的数据与所述第二节点集的数据节点的第二映射关系;
对于所述第一数据表中的目标数据,在基于所述第一映射关系确定的与所述目标数据对应的数据节点与基于所述第二映射关系确定的与所述目标数据对应的数据节点不同时,在基于所述第一映射关系确定的与所述目标数据对应的数据节点中,将所述目标数据确定为所述待迁移数据。
可选地,所述迁移子模块4022,用于:
通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集。
可选地,所述迁移子模块4022,用于:
在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据中选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,选择的所述待迁移数据在被迁移过程中被加锁;
其中,所述迁移条件包括:通过当前执行到的分布式事务迁移的所述待迁移数据的数据量小于或等于指定数据量阈值,和/或,通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值。
可选地,所述迁移子模块4022,用于:
基于所述当前执行到的分布式事务,为n个数据节点分别生成n个分布式计划,所述第一节点集包括所述n个数据节点,所述n个数据节点与所述n个分布式计划一一对应,n为正整数;
指示所述n个数据节点分别执行所述n个分布式计划来并行从所述n个数据节点中的所述第一数据表的未迁移数据中选择满足子迁移条件的待迁移数据、并将选择的满足所述子迁移条件的所述待迁移数据从所述n个数据节点发送至所述第二节点集,所述子迁移条件是根据所述迁移条件确定的。
可选地,如图13所示,所述装置40还包括:
回滚模块405,用于在迁移所述第一数据表的数据的过程中,若检测到所述分布式数据库达到回滚触发事件,将当前工作的分布式事务所迁移的数据进行回滚。
或者,回滚模块405,用于在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过当前执行到的分布式事务已迁移的数据进行回滚。
可选地,如图14所示,所述装置40还包括:
设置模块406,用于为所述第一节点集上的所述第一数据表中的已迁移数据设置删除标识。
可选地,图15示意性地提供本申请所述计算设备的一种可能的基本硬件架构。
参见图15,计算设备500包括处理器501、存储器502、通信接口503和总线504。
计算设备500中,处理器501的数量可以是一个或多个,图15仅示意了其中一个处理器501。可选地,处理器501,可以是中央处理器(central processing unit,CPU)。如果计算设备500具有多个处理器501,多个处理器501的类型可以不同,或者可以相同。可选地,计算设备500的多个处理器501还可以集成为多核处理器。
存储器502存储计算机指令和数据;存储器502可以存储实现本申请提供的数据重分布方法所需的计算机指令和数据,例如,存储器502存储用于实现数据重分布方法的步骤的指令。存储器502可以是以下存储介质的任一种或任一种组合:非易失性存储器(例如只读存储器(ROM)、固态硬盘(SSD)、硬盘(HDD)、光盘),易失性存储器。
通信接口503可以是以下器件的任一种或任一种组合:网络接口(例如以太网接口)、无线网卡等具有网络接入功能的器件。
通信接口503用于计算设备500与其它计算设备或者终端进行数据通信。
总线504可以将处理器501与存储器502和通信接口503连接。这样,通过总线504,处理器501可以访问存储器502,还可以利用通信接口503与其它计算设备或者终端进行数据交互。
在本申请中,计算设备500执行存储器502中的计算机指令,使得计算设备500实现本申请提供的数据重分布方法,或者使得计算设备500部署数据重分布装置。
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器,上述指令可由服务器的处理器执行以完成本发明各个实施例所示的数据重分布方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本申请实施例提供一种分布式数据库系统,包括:管理节点和数据节点,所述管理节点包括前述任一所述的数据重分布装置40或前述计算设备500。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现,所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机的可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质,或者半导体介质(例如固态硬盘)等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。
在本申请中,术语“第一”和“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性。术语“多个”指两个或两个以上,除非另有明确的限定。A参考B,指的是A与B相同或者A为B的简单变形。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的示例性实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (27)

  1. 一种数据重分布方法,其特征在于,包括:
    确定分布式数据库中与第一数据表分别关联的第一节点集和第二节点集,所述第一节点集包括在所述第一数据表的数据被数据重分布之前用于存储所述第一数据表中的数据的数据节点,所述第二节点集包括从所述第一数据表的数据被数据重分布开始用于存储所述第一数据表中的数据的数据节点;
    将所述第一数据表的数据从所述第一节点集迁移至所述第二节点集;
    在迁移所述第一数据表的数据的过程中,当接收到对所述第一数据表的目标业务请求时,在所述第一节点集和所述第二节点集中确定用于响应所述目标业务请求的第三节点集;
    将所述目标业务请求发送至所述第三节点集中的数据节点。
  2. 根据权利要求1所述的方法,其特征在于,所述在所述第一节点集和所述第二节点集中确定用于响应所述目标业务请求的第三节点集,包括:
    当所述目标业务请求为数据添加请求时,在所述第二节点集中确定用于响应所述数据添加请求的所述第三节点集。
  3. 根据权利要求2所述的方法,其特征在于,所述在所述第二节点集中确定用于响应所述数据添加请求的所述第三节点集,包括:
    根据所述数据添加请求所携带的新增数据的键值计算哈希值;
    在所述第二节点集中确定所述哈希值对应的数据节点,确定的数据节点属于所述第三节点集。
  4. 根据权利要求1所述的方法,其特征在于,所述在所述第一节点集和所述第二节点集中确定用于响应所述目标业务请求的第三节点集,包括:
    当所述目标业务请求为数据删除请求或者数据修改请求或者与第一数据表关联的数据查询请求时,在所述第一节点集中确定用于响应所述目标业务请求的数据节点,并在所述第二节点集中确定用于响应所述目标业务请求的数据节点,由从所述第一节点集中确定的数据节点和从所述第二节点集中确定的数据节点组成所述第三节点集。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述将所述第一数据表的数据从所述第一节点集迁移至所述第二节点集,包括:
    在所述第一节点集存储的所述第一数据表的数据中筛选待迁移数据,所述待迁移数据为所述第二节点集在迁移前没有存储的所述第一数据表的数据;
    将所述待迁移数据从所述第一节点集迁移至所述第二节点集。
  6. 根据权利要求5所述的方法,其特征在于,所述在所述第一节点集存储的所述第一数据表的数据中筛选待迁移数据,包括:
    获取所述第一数据表中的数据与所述第一节点集的数据节点的第一映射关系;
    获取所述第一数据表中的数据与所述第二节点集的数据节点的第二映射关系;
    对于所述第一数据表中的目标数据,在基于所述第一映射关系确定的与所述目标数据对应的数据节点与基于所述第二映射关系确定的与所述目标数据对应的数据节点不同时,在基于所述第一映射关系确定的与所述目标数据对应的数据节点中,将所述目标数据确定为所述待迁移数据。
  7. 根据权利要求1至6任一所述的方法,其特征在于,所述将所述第一数据表的数据从所述第一节点集迁移至所述第二节点集,包括:
    通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集。
  8. 根据权利要求7所述的方法,其特征在于,所述通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集,包括:
    在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据中选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,选择的所述待迁移数据在被迁移过程中被加锁;
    其中,所述迁移条件包括:通过当前执行到的分布式事务迁移的所述待迁移数据的数据量小于或等于指定数据量阈值,和/或,通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值。
  9. 根据权利要求8所述的方法,其特征在于,
    所述在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,包括:
    基于所述当前执行到的分布式事务,为n个数据节点分别生成n个分布式计划,所述第一节点集包括所述n个数据节点,所述n个数据节点与所述n个分布式计划一一对应,n为正整数;
    指示所述n个数据节点分别执行所述n个分布式计划来并行从所述n个数据节点中的所述第一数据表的未迁移数据中选择满足子迁移条件的待迁移数据、并将选择的满足所述子迁移条件的所述待迁移数据从所述n个数据节点发送至所述第二节点集,所述子迁移条件是根据所述迁移条件确定的。
  10. 根据权利要求7至9任一所述的方法,其特征在于,所述方法还包括:
    在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过所述多个分布式事务已迁移的数据进行回滚。
  11. 根据权利要求7至9任一所述的方法,其特征在于,所述方法还包括:
    在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过当前执行到的分布式事务已迁移的数据进行回滚。
  12. 根据权利要求1至11任一所述的方法,其特征在于,所述方法包括:
    为所述第一节点集上的所述第一数据表中的已迁移数据设置删除标识。
  13. 一种数据重分布装置,其特征在于,包括:
    第一确定模块,用于确定分布式数据库中与第一数据表分别关联的第一节点集和第二节点集,所述第一节点集包括在所述第一数据表的数据被数据重分布之前用于存储所述第一数据表中的数据的数据节点,所述第二节点集包括从所述第一数据表的数据被数据重分布开始用于存储所述第一数据表中的数据的数据节点;
    迁移模块,用于将所述第一数据表的数据从所述第一节点集迁移至所述第二节点集;
    第二确定模块,用于在迁移所述第一数据表的数据的过程中,当接收到对所述第一数据表的目标业务请求时,在所述第一节点集和所述第二节点集中确定用于响应所述目标业务请求的第三节点集;
    发送模块,用于将所述目标业务请求发送至所述第三节点集中的数据节点。
  14. 根据权利要求13所述的装置,其特征在于,所述第二确定模块,包括:
    确定子模块,用于当所述目标业务请求为数据添加请求时,在所述第二节点集中确定用于响应所述数据添加请求的所述第三节点集。
  15. 根据权利要求14所述的装置,其特征在于,所述确定子模块,用于:
    根据所述数据添加请求所携带的新增数据的键值计算哈希值;
    在所述第二节点集中确定所述哈希值对应的数据节点,确定的数据节点属于所述第三节点集。
  16. 根据权利要求13所述的装置,其特征在于,所述第二确定模块,用于:
    当所述目标业务请求为数据删除请求或者数据修改请求或者与第一数据表关联的数据查询请求时,在所述第一节点集中确定用于响应所述目标业务请求的数据节点,并在所述第二节点集中确定用于响应所述目标业务请求的数据节点,由从所述第一节点集中确定的数据节点和从所述第二节点集中确定的数据节点组成所述第三节点集。
  17. 根据权利要求13至16任一所述的装置,其特征在于,所述迁移模块,包括:
    筛选子模块,用于在所述第一节点集存储的所述第一数据表的数据中筛选待迁移数据,所述待迁移数据为所述第二节点集在迁移前没有存储的所述第一数据表的数据;
    迁移子模块,用于将所述待迁移数据从所述第一节点集迁移至所述第二节点集。
  18. 根据权利要求17所述的装置,其特征在于,所述筛选子模块,用于:
    获取所述第一数据表中的数据与所述第一节点集的数据节点的第一映射关系;
    获取所述第一数据表中的数据与所述第二节点集的数据节点的第二映射关系;
    对于所述第一数据表中的目标数据,在基于所述第一映射关系确定的与所述目标数据对应的数据节点与基于所述第二映射关系确定的与所述目标数据对应的数据节点不同时,在基于所述第一映射关系确定的与所述目标数据对应的数据节点中,将所述目标数据确定为所述待迁移数据。
  19. 根据权利要求13至18任一所述的装置,其特征在于,所述迁移子模块,用于:
    通过串行执行的多个分布式事务,分别将所述第一数据表的不同数据从所述第一节点集迁移至所述第二节点集。
  20. 根据权利要求19所述的装置,其特征在于,所述迁移子模块,用于:
    在串行执行所述多个分布式事务时,通过当前执行到的分布式事务,从所述第一节点集中的所述第一数据表的未迁移数据中选择满足迁移条件的待迁移数据,并将选择的所述待迁移数据从所述第一节点集迁移至所述第二节点集,选择的所述待迁移数据在被迁移过程中被加锁;
    其中,所述迁移条件包括:通过当前执行到的分布式事务迁移的所述待迁移数据的数据量小于或等于指定数据量阈值,和/或,通过当前执行到的分布式事务迁移的迁移时长小于或等于指定时长阈值。
  21. 根据权利要求20所述的装置,其特征在于,所述迁移子模块,用于:
    基于所述当前执行到的分布式事务,为n个数据节点分别生成n个分布式计划,所述第一节点集包括所述n个数据节点,所述n个数据节点与所述n个分布式计划一一对应,n为正整数;
    指示所述n个数据节点分别执行所述n个分布式计划来并行从所述n个数据节点中的所述第一数据表的未迁移数据中选择满足子迁移条件的待迁移数据、并将选择的满足所述子迁移条件的所述待迁移数据从所述n个数据节点发送至所述第二节点集,所述子迁移条件是根据所述迁移条件确定的。
  22. 根据权利要求19至21任一所述的装置,其特征在于,所述装置还包括:
    回滚模块,用于在迁移所述第一数据表的数据的过程中,若检测到所述分布式数据库达到回滚触发事件,将当前工作的分布式事务所迁移的数据进行回滚。
  23. 根据权利要求19至21任一所述的装置,其特征在于,所述装置还包括:
    回滚模块,用于在迁移所述第一数据表的数据的过程中,若检测到回滚触发事件,将通过当前执行到的分布式事务已迁移的数据进行回滚。
  24. 根据权利要求13至23任一所述的装置,其特征在于,所述装置还包括:
    设置模块,用于为所述第一节点集上的所述第一数据表中的已迁移数据设置删除标识。
  25. 一种计算机设备,其特征在于,包括:
    处理器和存储器;
    所述存储器,用于存储计算机指令;
    所述处理器,用于执行所述存储器存储的计算机指令,使得所述计算设备执行权利要求1至12任一所述的数据重分布方法。
  26. 一种分布式数据库系统,其特征在于,包括:管理节点和数据节点,所述管理节点包括权利要求13至24任一所述的数据重分布装置。
  27. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质包括计算机指令,所述计算机指令指示计算设备执行权利要求1至12任一所述的数据重分布方法。
PCT/CN2019/105357 2019-09-11 2019-09-11 数据重分布方法、装置及系统 WO2021046750A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2019/105357 WO2021046750A1 (zh) 2019-09-11 2019-09-11 数据重分布方法、装置及系统
CN201980005457.2A CN112789606A (zh) 2019-09-11 2019-09-11 数据重分布方法、装置及系统
EP19945328.3A EP3885929A4 (en) 2019-09-11 2019-09-11 METHOD, DEVICE AND SYSTEM FOR DATA REDISTRIBUTION
US17/370,275 US11860833B2 (en) 2019-09-11 2021-07-08 Data redistribution method, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/105357 WO2021046750A1 (zh) 2019-09-11 2019-09-11 数据重分布方法、装置及系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/370,275 Continuation US11860833B2 (en) 2019-09-11 2021-07-08 Data redistribution method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2021046750A1 true WO2021046750A1 (zh) 2021-03-18

Family

ID=74867152

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/105357 WO2021046750A1 (zh) 2019-09-11 2019-09-11 数据重分布方法、装置及系统

Country Status (4)

Country Link
US (1) US11860833B2 (zh)
EP (1) EP3885929A4 (zh)
CN (1) CN112789606A (zh)
WO (1) WO2021046750A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114265875A (zh) * 2022-03-03 2022-04-01 深圳钛铂数据有限公司 一种基于流数据的实时建宽表的方法

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742347B (zh) * 2021-09-07 2023-06-27 中国兵器装备集团自动化研究所有限公司 数据库存储数据方法、电子装置及计算机可读存储介质
CN113515364B (zh) * 2021-09-14 2022-03-01 腾讯科技(深圳)有限公司 一种数据迁移的方法及装置、计算机设备和存储介质
CN113868335A (zh) * 2021-09-15 2021-12-31 威讯柏睿数据科技(北京)有限公司 一种内存数据库分布式集群的扩展方法和设备
CN114661718B (zh) * 2022-03-28 2023-04-25 北京海量数据技术股份有限公司 Opengauss平台下在线创建本地分区索引的方法及系统
CN117435594B (zh) * 2023-12-18 2024-04-16 天津南大通用数据技术股份有限公司 一种分布式数据库分布键的优选方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164167A (zh) * 2011-12-15 2013-06-19 深圳市腾讯计算机系统有限公司 一种数据迁移方法及装置
CN103514212A (zh) * 2012-06-27 2014-01-15 腾讯科技(深圳)有限公司 数据写入方法及系统
US20140173035A1 (en) * 2011-08-02 2014-06-19 Nec Corporation Distributed storage system and method
CN107066570A (zh) * 2017-04-07 2017-08-18 聚好看科技股份有限公司 数据管理方法及装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003021377A2 (en) * 2001-08-31 2003-03-13 Op40, Inc. Enterprise information system
JP4767057B2 (ja) * 2006-03-27 2011-09-07 富士通株式会社 ハッシュ値生成プログラム、ストレージ管理プログラム、判定プログラム及びデータ変更検証装置
ES2387625T3 (es) * 2007-12-17 2012-09-27 Nokia Siemens Networks Oy Encaminamiento de consulta en un sistema de base de datos distribuida
US8078825B2 (en) * 2009-03-11 2011-12-13 Oracle America, Inc. Composite hash and list partitioning of database tables
US8250325B2 (en) * 2010-04-01 2012-08-21 Oracle International Corporation Data deduplication dictionary system
US9613119B1 (en) * 2013-03-14 2017-04-04 Nutanix, Inc. Unique identifiers for data replication, migration, failover operations and failback operations
US9519663B2 (en) * 2013-06-26 2016-12-13 Sap Se Upgrading and migrating a database by a migration tool
US10037340B2 (en) * 2014-01-21 2018-07-31 Red Hat, Inc. Tiered distributed storage policies
US10884869B2 (en) * 2015-04-16 2021-01-05 Nuodb, Inc. Backup and restore in a distributed database utilizing consistent database snapshots
US10528580B2 (en) * 2016-01-27 2020-01-07 Oracle International Corporation Method and mechanism for efficient re-distribution of in-memory columnar units in a clustered RDBMS on topology change
US10733186B2 (en) * 2016-09-15 2020-08-04 Oracle International Corporation N-way hash join
US9805071B1 (en) * 2016-11-10 2017-10-31 Palantir Technologies Inc. System and methods for live data migration
CN108132949B (zh) * 2016-12-01 2021-02-12 腾讯科技(深圳)有限公司 数据库集群中数据迁移的方法及装置
US11042330B2 (en) * 2017-03-01 2021-06-22 Samsung Electronics Co., Ltd. Methods and systems for distributed data storage
US11120006B2 (en) * 2018-06-21 2021-09-14 Amazon Technologies, Inc. Ordering transaction requests in a distributed database according to an independently assigned sequence

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140173035A1 (en) * 2011-08-02 2014-06-19 Nec Corporation Distributed storage system and method
CN103164167A (zh) * 2011-12-15 2013-06-19 深圳市腾讯计算机系统有限公司 一种数据迁移方法及装置
CN103514212A (zh) * 2012-06-27 2014-01-15 腾讯科技(深圳)有限公司 数据写入方法及系统
CN107066570A (zh) * 2017-04-07 2017-08-18 聚好看科技股份有限公司 数据管理方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3885929A4 *


Also Published As

Publication number Publication date
US11860833B2 (en) 2024-01-02
EP3885929A1 (en) 2021-09-29
US20210334252A1 (en) 2021-10-28
EP3885929A4 (en) 2022-02-09
CN112789606A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
WO2021046750A1 (zh) 数据重分布方法、装置及系统
US11321303B2 (en) Conflict resolution for multi-master distributed databases
US9460185B2 (en) Storage device selection for database partition replicas
US9489443B1 (en) Scheduling of splits and moves of database partitions
US10248709B2 (en) Promoted properties in relational structured data
US10599676B2 (en) Replication control among redundant data centers
US11442961B2 (en) Active transaction list synchronization method and apparatus
CN109643310B (zh) 用于数据库中数据重分布的系统和方法
US11226985B2 (en) Replication of structured data records among partitioned data storage spaces
US10235406B2 (en) Reminder processing of structured data records among partitioned data storage spaces
JP7438603B2 (ja) トランザクション処理方法、装置、コンピュータデバイス及びコンピュータプログラム
US20140279840A1 (en) Read Mostly Instances
US8650274B2 (en) Virtual integrated management device for performing information update process for device configuration information management device
WO2019109854A1 (zh) 分布式数据库数据处理方法、装置、存储介质及电子装置
JP2017027326A (ja) ストレージシステムおよびストレージシステム用プログラム
CN111459913B (zh) 分布式数据库的容量扩展方法、装置及电子设备
US11061719B2 (en) High availability cluster management of computing nodes
US11940972B2 (en) Execution of operations on partitioned tables
WO2023124242A1 (zh) 事务执行方法、装置、设备和存储介质
WO2023066222A1 (zh) 数据处理方法、装置、电子设备、存储介质及程序产品
US11372838B2 (en) Parallel processing of changes in a distributed system
US11836125B1 (en) Scalable database dependency monitoring and visualization system
US11740836B2 (en) Parallel reads of data staging table
US11768853B2 (en) System to copy database client data
US20230385305A1 (en) Database transactions across different domains

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945328

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019945328

Country of ref document: EP

Effective date: 20210623

NENP Non-entry into the national phase

Ref country code: DE