WO2024067528A1 - 数据库系统及数据库列变更方法 - Google Patents

数据库系统及数据库列变更方法 Download PDF

Info

Publication number
WO2024067528A1
WO2024067528A1 PCT/CN2023/121268 CN2023121268W WO2024067528A1 WO 2024067528 A1 WO2024067528 A1 WO 2024067528A1 CN 2023121268 W CN2023121268 W CN 2023121268W WO 2024067528 A1 WO2024067528 A1 WO 2024067528A1
Authority
WO
WIPO (PCT)
Prior art keywords
column
logical
gsi
physical
data table
Prior art date
Application number
PCT/CN2023/121268
Other languages
English (en)
French (fr)
Inventor
游翎璟
陈默
黄贵
李飞飞
楼江航
Original Assignee
杭州阿里云飞天信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州阿里云飞天信息技术有限公司 filed Critical 杭州阿里云飞天信息技术有限公司
Publication of WO2024067528A1 publication Critical patent/WO2024067528A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present disclosure relates to a database, and in particular to a solution for performing column changes in a database.
  • a distributed database system based on the idea of distributed database sharding is mainly composed of one or more data nodes (DN, also called “storage nodes”) and one or more computing nodes (CN).
  • DN data nodes
  • CN computing nodes
  • the computing node stores the logical table (also called “logical library table”), which can have multiple partitions (shards).
  • the data of each partition is stored in the sub-library and sub-table on the corresponding data node, that is, the physical table (also called “physical library table”).
  • users can send logical SQL (structured query statement) instructions acting on logical tables to computing nodes, and then computing nodes can send physical SQL instructions acting on physical tables to data nodes.
  • logical SQL structured query statement
  • GSI global secondary index
  • main table or “data master table”
  • data master table The data in the data table and GSI are consistent. Both the data table and GSI can have logical tables on the computing nodes and physical tables on the data nodes respectively.
  • a table consists of one or more columns.
  • a column stores a piece of information in the table.
  • Each column has its own column name and corresponding column type (data type).
  • Each column stores a specific piece of information. For example, in a personnel information table, one column stores the personnel number, another column stores the personnel name, and the address, city, state, and zip code are also stored in their own columns, and each row corresponds to a different person and records their various information.
  • a column change operation it is necessary to perform a change operation on the attributes of a column in a table, that is, a column change operation.
  • a column change operation For example, the column name and/or column type may be changed.
  • Some database solutions use third-party tools or built-in tools to perform column change operations. Their common characteristics are that they need to copy data to a new table, take a long time to execute, and need to rely on MySQL triggers or Binlog.
  • a technical problem to be solved by the present disclosure is to provide a database column change solution, which can conveniently implement column changes without causing failure of user write or read access during the column change process.
  • a method for changing a database column wherein the database has a data table and a global secondary index GSI for the data table, the data table has a logical data table on a computing node and a physical data table on a data node, the GSI has a logical GSI table on a computing node and a physical GSI table on a data node, the method comprising: adding a second column to the physical data table and the physical GSI table, respectively, the value of at least one attribute of the second column in the physical data table and the physical GSI table being a change target value of a corresponding attribute of a first column to be changed in the data table; adding a second column to each of the structural definitions of the logical data table and the logical GSI table, respectively, and hiding the second column from a user, the second column in the logical data table and the logical GSI table being associated with the second column in the physical data table and the physical GSI
  • the column name of the second column in the physical data table and the physical GSI table and the column name of the second column in the logical data table and the logical GSI table are both the change target value of the column name of the first column.
  • the column name of the second column in the physical data table and the physical GSI table may be different from the column name of the second column in the logical data table and the logical GSI table, and the column name of the second column in the logical data table and the logical GSI table is configured to map to the column name of the second column in the physical data table and the physical GSI table.
  • the method may further include: after each change to the structure definition of the logical data table and the logical GSI table, synchronizing the changed structure definitions of the logical data table and the logical GSI table to all computing nodes.
  • the operation of synchronizing the changed structure definitions of the logical data table and the logical GSI table to all computing nodes is an atomic operation.
  • configuration is made in the respective structural definitions of the logical data table and the logical GSI table to use the second column as the first column in the user access logic of the database, and after the changed structural definitions of the logical data table and the logical GSI table are synchronized to all computing nodes, the step of deleting the first column in the physical data table and the physical GSI table is performed.
  • the method may further include: after adding the second column and before deleting the first column, making the second column of the physical data table and the physical GSI table have the same data content as the first column of the physical data table and the physical GSI table.
  • the step of making the second column of the physical data table and the physical GSI table have the same data content as the first column of the physical data table and the physical GSI table comprises: when the first column is displayed to the user and the second column is hidden to the user,
  • the structure definitions of the physical data table and the logical GSI table are configured so that the modifications to the first column are copied to the second column of the physical data table and the physical GSI table respectively; and/or the existing data content in the first column of the physical data table and the physical GSI table is copied to the second column of the physical data table and the physical GSI table respectively; and/or when the first column is hidden from the user and the second column is displayed to the user, the structure definitions of the logical data table and the logical GSI table are configured so that the modifications to the first column are stopped from being copied to the second column of the physical data table and the physical GSI table respectively, and the modifications to the second column are copied to the first column of the physical data table and the physical GSI table respectively.
  • the method further includes: configuring in the structure definition of the logical data table and the logical GSI table respectively to stop replicating modifications to the second column on the first column of the physical data table and the physical GSI table respectively.
  • the attributes of the first column to be changed include the column name and the column type, or only the column name
  • the step of configuring in the respective structural definitions of the logical data table and the logical GSI table to replace the first column with the second column in the user access logic of the database includes: configuring in the respective structural definitions of the logical data table and the logical GSI table so that the first column is hidden from the user and the second column is displayed to the user.
  • the attributes of the first column to be changed include the column type
  • the step of configuring in the respective structural definitions of the logical data table and the logical GSI table to replace the first column with the second column in the user access logic of the database includes: exchanging the column names of the first column and the second column in the physical data table and the physical GSI table, or modifying the column name of the first column and the column name of the second column in the logical data table and the logical GSI table to map to the column names of the second column and the first column in the physical data table and the physical GSI table, respectively; and exchanging the column names of the first column and the second column in the respective structural definitions of the logical data table and the logical GSI table.
  • the database is a distributed database.
  • the database operates on each column based on the column name.
  • a database system comprising one or more computing nodes and one or more data nodes, wherein a column change operation is implemented using the method according to the first aspect.
  • a computing device comprising: a processor; and a memory on which executable codes are stored, and when the executable codes are executed by the processor, the processor executes the method described in the first aspect above.
  • a computer program product comprising an executable code, and when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method as described in the first aspect above.
  • a non-transitory machine-readable storage medium on which executable code is stored.
  • the executable code is executed by a processor of an electronic device, the processor executes the method described in the first aspect above.
  • the database column change is conveniently implemented, and the column change process will not cause the user's write or read access request to fail.
  • FIG1 is a schematic block diagram of a distributed database system.
  • FIG. 2 is a schematic diagram of a write operation in a database having a global secondary index (GSI).
  • GPI global secondary index
  • FIG. 3 is a schematic diagram of a query operation in a database having a global secondary index (GSI).
  • FIG. 4 is a schematic flow chart of a database column modification method according to the present disclosure.
  • FIG5 is a schematic flow chart of a database column changing method according to an improved embodiment of the present disclosure.
  • 6A to 6H schematically illustrate a column changing process in one situation.
  • FIG8 shows a schematic diagram of the structure of a computing device that can be used to implement the above database column modification method according to an embodiment of the present invention.
  • DDL Data Definition Language, which can be used to add columns, change column types and lengths, and add constraints.
  • users modify the column name while modifying the column type.
  • there may be a time point when some physical tables have completed the physical DDL and use the new column name, while other physical tables have not completed the physical DDL and use the old column name.
  • the query instructions issued by the user will report an error regardless of whether they use the new column name or the old column name.
  • the column change has not yet been performed on the global secondary index (GSI), and the column type of the column in the data table and the GSI is inconsistent.
  • GSI global secondary index
  • the database system may have multiple computing nodes.
  • different computing nodes may see different versions of the logical schema before and after the change.
  • online DDL operation refers to DDL operation that allows the user's DML statements to be executed concurrently during the DDL execution process.
  • DML is data manipulation language, such as insert (INSERT), update (UPDATE), delete (DELETE) and other statements.
  • the present disclosure proposes a novel database column change solution, which can conveniently change column attributes by adding new columns, switching the access logic of new and old columns, and deleting old columns. For users, changes to data tables and GSIs will not cause write or read access failures.
  • FIG1 is a schematic block diagram of a distributed database system.
  • a distributed database may include one or more computing nodes and one or more data nodes.
  • FIGS. 2 and 3 are schematic diagrams of write and query operations in a distributed database with GSI, respectively.
  • the distributed database includes logical tables at the logical layer and physical tables at the physical layer (sub-library 1, sub-library 2, etc.).
  • Logical tables can be located on computing nodes, while physical tables are located on data nodes. Different sub-libraries can be located on multiple different data nodes. Logical tables can also be located on multiple computing nodes.
  • the distributed database has a data table (e.g., a master table) and a GSI for providing a global secondary index for the data table, such as the master table.
  • the data table and the GSI respectively have physical tables on the data nodes (master table sub-table 1, master table sub-table 2; index sub-table 1, index sub-table 2; ...) and logical tables on the computing nodes.
  • the logical table of the data table is referred to as the "logical data table”
  • the physical table of the data table is referred to as the “physical data table” (i.e., the main table sub-table in Figures 2 and 3)
  • the logical table of the GSI is referred to as the "logical GSI table”
  • the physical table of the GSI is referred to as the "physical GSI table” (i.e., the index sub-table in Figures 2 and 3).
  • the logical data table, physical data table, logical GSI table, and physical GSI table for the same data table have the same columns, but the columns are presented or stored in different forms.
  • the same columns in the logical data table, physical data table, logical GSI table, and physical GSI table for the same data table are no longer named separately.
  • logical SQL structured query statement
  • computing nodes can send physical SQL instructions to data nodes to act on physical tables.
  • GAI global secondary index
  • this index is called a local index. Different from a local index, if the data row and the corresponding index row are stored on different shards, this index is called a global secondary index, which is mainly used to quickly determine the data shards involved in the query. The two can be used together. After the query is sent to a single shard through GSI, the local index on the shard can improve the query performance within the shard.
  • GSI supports adding split dimensions on demand and provides globally unique indexes. Each GSI corresponds to an index table, and XA multi-write is used to ensure strong data consistency between the main table and the index table.
  • XA transaction means "distributed transaction”.
  • cross-shard queries may occur.
  • the increase in cross-shard queries will lead to performance issues such as slow query and connection pool exhaustion.
  • GSI can reduce cross-shard queries by increasing the split dimension and eliminate performance bottlenecks.
  • the following describes a database column changing method according to the present disclosure with reference to FIG. 4 .
  • Column modification refers to changing the attribute of a column in the database to a change target value.
  • the attribute to be modified may be, for example, the column name and/or the column type.
  • Fig. 4 is a schematic flow chart of a database column modification method according to the present disclosure.
  • the database here may be a distributed database.
  • the corresponding logical data table and the physical data table have associated first columns
  • the corresponding logical GSI table and the physical GSI table have associated first columns
  • the first columns in the logical GSI table and the physical GSI table provide global secondary indexes for the first columns in the logical data table and the physical data table.
  • the present disclosure creatively proposes to implement a new database online column change solution through online DDL operations such as adding columns, renaming columns, and dropping columns on a database system such as MySQL.
  • a second column is added to the physical data table and the physical GSI table, respectively.
  • the value of at least one attribute of the second column in the physical data table and the physical GSI table is the attribute change target value of the corresponding attribute of the first column to be changed in the data table.
  • the column change operation is expected to change the value of the attribute of the first column to the attribute change target value.
  • the properties of a column may include, for example, the column name, column type, and the like.
  • the column name of the second column added here can be different from the column name target value of the first column.
  • the logical table and the physical table are associated with each other through the same column name, that is, the columns in the logical table and the physical table with the same column name are associated and correspond to the same column.
  • the column name of the second column can be set to the column name change target value of the first column. That is, if you want to change the column name of the first column to "N", set the column name of the second column in the physical table to "N".
  • the logical table and the physical table are associated with each other through the mapping relationship of column names.
  • the mapping relationship can be configured to record which column names in the physical table each column name in the logical table corresponds to.
  • the logical table is presented to the user, and the user can find the desired column name in the logical table.
  • the corresponding column name in the physical table can be determined through the mapping relationship, so that the physical table can be queried based on the corresponding physical table column name.
  • the column name of the second column does not need to be set to the column name change target value of the first column, but can be set to any column name that meets the general column name conditions (such as no conflict, no duplication). That is, if you want to change the column name of the first column to "N", the column name of the second column in the physical table does not need to be set to "N", but can be set to, for example, "M”. In this way, the column name of the second column in the logical table can be set to "N" later, and the column name "N” in the logical table can be mapped to the column name "M" in the physical table.
  • column names of the corresponding columns in the data table and GSI are generally the same.
  • the column name set of the GSI is the same as the column name set of the data table.
  • step S420 a second column is added to the structure definition (schema) of the logical data table and the logical GSI table respectively, and the second column is hidden from the user.
  • the second column in the logical data table and the logical GSI table is associated with the second column in the physical data table and the physical GSI table respectively, and the value of at least one attribute of the second column in the logical data table and the logical GSI table is the above-mentioned attribute change target value.
  • the column name of the second column in the physical data table and the physical GSI table set in step S410 is the same as the column name of the second column in the logical data table and the logical GSI table set in step S420, and both are the target values for the column name change. In this way, the association between the newly added second column in the logical table and the physical table can be realized.
  • the column name of the second column of the logical table is configured to have the change target value of the column name, while the column name of the second column of the physical table does not necessarily have to be the change target value of the column name.
  • the association between the two can be realized by configuring the column name of the second column in the logical data table and the logical GSI table to be mapped to the column name of the second column in the physical data table and the physical GSI table.
  • the first column has not changed and is still displayed to the user.
  • step S430 configuration is performed in the structure definitions of the logical data table and the logical GSI table respectively, so as to replace the first column to be changed with the second column in the user access logic of the database.
  • the structure definitions of the logical data table and the logical GSI table can be configured so that the first column to be changed is hidden from the user, while the second column is displayed to the user, and the second column in the logical data table and the logical GSI table is kept associated with the second column in the physical data table and the physical GSI table, respectively.
  • the second column and the first column are exchanged in the user access logic.
  • the column with the original attribute value (the first column) is changed to the column with the attribute change target value (the second column).
  • the user seems to feel that the attribute of only one column has been changed.
  • the user accesses (writes or queries) the second column in the physical table through the second column in the logical table.
  • the second column is displayed to the user, and the first column is no longer displayed to the user, but is hidden from the user.
  • step S430 The configuration operation of step S430 is described in further detail below.
  • the column attribute to be changed in the first column may include a column name and/or a column type. Different configuration schemes may be provided for different change attributes.
  • configuration can be made in the structure definitions of the logical data table and the logical GSI table respectively to hide the first column from the user and display the second column to the user.
  • the second column in the physical table can have the same column name as the second column in the logical table; or, they can have different column names, and it is only necessary to keep mapping the column name of the second column in the logical table to the column name of the second column in the physical table.
  • the user access logic can be changed by first exchanging the association between the column names of the first column and the second column in the physical table and the column names of the first column and the second column in the logical table, and then exchanging the column names of the first column and the second column in the logical table.
  • the column name of the second column can be set to any new column name that meets the general column name conditions (such as no conflict, no duplication).
  • the column names of the first column and the second column can be exchanged in the physical data table and the physical GSI table first, thereby exchanging the association relationship between the column names of the first column and the second column in the physical table and the column names of the first column and the second column in the logical table.
  • the column names of the first column and the second column can be exchanged in the physical data table and the physical GSI table; or, the column names of the first column and the second column in the logical data table and the logical GSI table can be modified to map to the column names of the second column and the first column in the physical data table and the physical GSI table, respectively, so as to exchange the association relationship between the column names of the first column and the second column in the physical table and the column names of the first column and the second column in the logical table.
  • the first and second columns in the physical table have been swapped based on the association relationship between the column names and the first and second columns in the logical table, while the column names in the logical table have not been swapped.
  • the database can operate on each column based on the column name.
  • the first column and the second column of each table can be distinguished by, for example, the different but unique column IDs they each have, when performing access control, or when sending a physical SQL instruction from the logical layer of the computing node to the physical layer of the data node, the column name can be used to determine the column to be accessed.
  • the columns displayed to the user and/or the columns hidden from the user can also be configured by the column name.
  • the write or query request issued by the user also determines the column to be accessed based on the column name.
  • the first column (with the original column name) in the logical data table and the logical GSI table will become associated with the second column (with the original column name or the mapping relationship corresponding to the first column of the logical table after the exchange) in the physical data table and the physical GSI table
  • the second column (with the new column name) in the logical data table and the logical GSI table will become associated with the first column (with the new column name or the mapping relationship corresponding to the second column of the logical table after the exchange) in the physical data table and the physical GSI table.
  • the first column with the original column name in the logical data table and the logical GSI table is still displayed to the user, while the second column with the new column name is hidden from the user.
  • the second column with the original column name after the exchange or the mapping relationship corresponding to the first column of the logical table after the exchange is associated with the first column of the logical data table and the logical GSI table, respectively, while the first column with the new column name or the mapping relationship corresponding to the second column of the logical table after the exchange in the physical data table and the physical GSI table is associated with the second column of the logical data table and the logical GSI table, respectively.
  • the user accesses the first column of the original column type at the logical layer, he actually accesses the second column of the new column type (attribute change target value) at the physical layer.
  • the second column with the original column name in the logical data table and the logical GSI table is displayed to the user, while the first column with the new column name is hidden from the user.
  • the second column in the logical data table and the logical GSI table has the same column name (original column name) as the second column in the physical data table and the physical GSI table, or has a mapping relationship based on the column name, and is thus associated.
  • the first column in the logical data table and the logical GSI table has the same column name (new column name) as the first column in the physical data table and the physical GSI table, or has a mapping relationship based on the column name, and is thus associated.
  • the user in the logical table will see that the column with the original column name and the new column type (the second column) replaces the original column with the original column name and the original column type (the first column), just like the original column type of the first column is changed to the column type of the second column (the new column type).
  • step S440 the first column may be deleted in the physical data table and the physical GSI table.
  • step S450 the first column is deleted from the structure definitions of the logical data table and the logical GSI table.
  • step S440 and step S450 can be reversed or performed simultaneously.
  • the database system may have multiple computing nodes, and each computing node may have a logical data table and a logical GSI table.
  • the changed structure definitions of the logical data table and the logical GSI table may be synchronized to all computing nodes.
  • Synchronization of the logical table structure definition means sending the updated structure definition from the computing node that executes the DDL to all computing nodes. After the operation is completed, all computing nodes will see the updated structure definition (logical Schema).
  • each synchronization operation is atomic, that is, an atomic operation.
  • An atomic operation refers to an operation or a series of operations that cannot be interrupted.
  • a synchronous operation may include a series of operations. Since the synchronous operation is atomic, after the synchronous operation is completed, all computing nodes either see the updated state or the state before the update. There will not be a situation where some nodes succeed while others fail, and there will not be a situation where a node sees part of the series of operations succeed at a certain moment.
  • step S430 After completing the configuration operation of step S430 and synchronizing the changed structure definition of the logical data table and the structure definition of the logical GSI table to all computing nodes, the column deletion operation of the above steps S440 and S350 can be performed. do.
  • the attribute change of the first column from the user's perspective is achieved by adding a second column having the attribute change target value and replacing the first column with the second column.
  • the first column may already have some data content before the change, and during the change process, the user may also perform a modification access operation on the first column or the second column. Therefore, after adding the second column to the structure definition of the logical table in step S420 and before deleting the first column in the physical table in step S440, the second column of the physical data table and the physical GSI table may further have the same data content as the first column of the physical data table and the physical GSI table.
  • FIG5 is a schematic flowchart of a database column changing method according to an improved embodiment of the present disclosure.
  • the method shown in Fig. 5 includes steps S410 to S450 described above with reference to Fig. 4. These steps may be the same as those described above with reference to Fig. 4.
  • the newly added steps S510 to S540 in FIG5 are respectively used to synchronize different data contents or user-initiated data modifications between the first column and the second column at different times, so that the first column and the second column have the same data content. It should be understood that in the database column modification method of the improved embodiment, only any one of the steps S510 to S540 may be included, or a combination of any two or three of them may be included, or all of them may be included.
  • step S510 configuration can be made in the structure definition of the logical data table and the logical GSI table so that the modification to the first column is replicated on the second column of the physical data table and the physical GSI table.
  • a modification replication operation can be called a "column multi-write" operation.
  • column multi-write operation means that when a user writes a value to a column in a logical table through logical DML, a copy of the value is copied to another column in the physical table through physical DML.
  • This operation can be enabled through the configuration item of the logical table structure definition (logical schema) on the computing node.
  • the second column can be subjected to the same modification operation as the first column, so that the modified new data content remains consistent between the two columns.
  • the data content of the second column of the physical data table and the physical GSI table is kept consistent with the data content of the first column of the physical data table and the physical GSI table.
  • step S520 the existing data content in the first column of the physical data table and the physical GSI table may also be copied to the second column of the physical data table and the physical GSI table.
  • Such a content copy operation may be referred to as a "column backfill" operation.
  • column backfilling refers to copying the value of a column in a physical table to another column. This operation is completed in the data node.
  • the second column can have the historical data content that the first column previously had.
  • step S530 the structure definitions of the logical data table and the logical GSI table may be configured so that the table stops at the physical column.
  • the modification to the first column is copied to the second column of the physical data table and the physical GSI table (i.e., the aforementioned column multi-write operation is stopped), and the modification to the second column is copied to the first column of the physical data table and the physical GSI table.
  • the modification copy operation here can also be called a "column multi-write” operation, or a "reverse column multi-write” operation.
  • step S530 the direction of the column multi-write operation is modified.
  • the first column and the second column can continue to have the same data content before the first column is deleted, so as to avoid potential user access failures.
  • step S430 a configuration is made in the structure definition of the logical data table and the logical GSI table on one computing node to replace the first column to be changed with the second column in the user access logic of the database, and the modified structure definition has not yet been synchronized to all computing nodes, then in the structure definition of the logical data table and the logical GSI table on some computing nodes, the user still accesses the first column. In this case, it is necessary to continue to maintain the same data content as the second column, so that the new data content modified by the user in the second column can also be synchronized to the first column.
  • step S430 when the attribute to be changed is the column type and the column name does not need to be changed, after the column names of the first column and the second column are exchanged in the physical table and the logical table, the direction of the column multi-write has been automatically adjusted accordingly. This is because, as mentioned above, the database operates on each column based on the column name. After the column name is exchanged, the direction of the corresponding column multi-write operation is also modified to the reverse direction.
  • step S540 it is possible to configure in the structural definitions of the logical data table and the logical GSI table to stop replicating modifications to the second column on the first column of the physical data table and the physical GSI table, i.e., to stop the reverse column multi-write operation.
  • steps S440 and S450 may be executed to delete the first column in the physical table and the data table respectively without worrying about the loss of data content.
  • table T there is a table T.
  • table T there are logical data tables and physical data tables; logical GSI tables and physical GSI tables.
  • the GSI table provides a global secondary index for the data table, and the data table and the GSI table have logical tables and physical tables respectively.
  • FIGS. 6A to 6H and 7A to 7I Since the same operation is performed on both the data table and the GSI table, in FIGS. 6A to 6H and 7A to 7I , for table T, only one table is shown for both the logical table and the physical table as an illustration.
  • 6A to 6H schematically illustrate the column changing process in case (1).
  • Fig. 6A schematically shows the initial state.
  • the solid arrows represent write operations
  • the dotted arrows represent read operations.
  • the columns pointed to by the user's read and write arrows represent that the columns are visible to the user, and the columns not pointed to represent that they are not visible to the user.
  • the logical table and the physical table have a column (the first column) with a column name of “A” and a column type of “L1”.
  • Column A is visible to the user, that is, the user can perform write/read operations on column A.
  • a solid arrow pointing to column A and a dotted arrow extending from column A represent write/read operations on column A, and indicate that column A is visible to the user.
  • the column to be accessed can be determined by the column name.
  • the column name is used to represent the column.
  • the column change operation described below changes the column name of column A to "B” and the column type to "L2" from the user's perspective.
  • the column name "B” and the column type "L2" are the attribute change target values for column A.
  • a column (the second column) is added to the physical data table and the physical GSI table, the column name is “B”, and the column type is “L2”.
  • column B (the second column) does not exist in table T1, so column B is not visible to the user.
  • column multi-write operations can be enabled for data tables and GSI tables, that is, all modifications made by the user to column A are copied to column B, as indicated by a horizontal arrow pointing from column A to column B in the logic table in FIG. 6C .
  • the structure definitions of the changed logical data table and the logical GSI table may be synchronized with all computing nodes. As described above, this synchronization operation may be an atomic operation.
  • a column backfill operation can be performed on the physical data table and the physical GSI table, that is, all the contents in column A on the physical table are copied to column B.
  • a horizontal arrow pointing from column A to column B in the physical table is used to indicate this.
  • column A is hidden in the structure definition of the logical data table and the logical GSI table
  • column B is displayed, and the user sees the modified result, i.e., column B. In other words, the user can access (read/write) column B but no longer access column A.
  • the structure definitions of the changed logical data table and the logical GSI table may be synchronized with all computing nodes. As described above, this synchronization operation may be an atomic operation.
  • the structure definitions of the logical data table and the logical GSI table may be modified to stop column overwriting.
  • the structure definitions of the changed logical data table and logical GSI table can be synchronized with all computing nodes.
  • column A may be deleted in the physical data table and the physical GSI table.
  • FIG. 6H the structure definitions of the logical data table and the logical GSI table may be modified to delete column A.
  • the structure definitions of the changed logical data table and logical GSI table can be synchronized with all computing nodes.
  • column A is replaced by column B. From the user's perspective, the original column type "L1" and column name "A” are changed to a column type "L2" and column name "B".
  • Fig. 7A schematically shows the initial state.
  • the logical table and the physical table have a column with a column name of "A” and a column type of "L1" (the first column), and column A is visible to the user.
  • the column change operation described below changes the column type of column A to "L2" from the user's perspective, while the column name remains unchanged, "A".
  • the column type "L2" is the target value for the attribute change of column A.
  • a column (the second column) is added to the physical data table and the physical GSI table, the column name is “A′”, and the column type is “L2”.
  • the column A' (the second column) does not exist in the table T1, so the column A' is not visible to the user.
  • the column name of the column added in the logical table can be different from the column name of the column added in the physical table.
  • the column name of the column added in the physical table is set to "D”
  • the column A' in the logical table is mapped to the column D in the physical table.
  • the "column A'" in the physical table mentioned below can be replaced by "column D”.
  • the structure definitions of the changed logical data table and the logical GSI table may be synchronized with all computing nodes. As described above, this synchronization operation may be an atomic operation.
  • a column backfill operation can be performed on the physical data table and the physical GSI table, that is, all contents in column A on the physical data table and the physical GSI table are copied to column A'.
  • a horizontal arrow pointing from column A to column A' in the physical table is used to indicate this.
  • the column type of column A (the second column) is "L2"
  • the column type of column A' (the first column) is "L1”.
  • the case where the logical table and the physical table are associated with each other through the same column name is used as an example to describe that the column names of the first column and the second column in the physical table are exchanged.
  • the mapping relationship between the first column and the second column in the physical table and the first column and the second column in the logical table can also be exchanged to achieve the exchange of the association relationship.
  • column A' (actually replaced by column 1) is still hidden according to the column name.
  • the user sees the modified result that column A (actually replaced by column 2) is of type "L2".
  • the structure definitions of the changed logical data table and logical GSI table can be synchronized with all computing nodes.
  • the structure definition of the logical data table and the logical GSI table may be modified to stop column overwriting.
  • column A′ (the first column) may be deleted on the physical data table and the physical GSI table.
  • the structure definitions of the logical data table and the logical GSI table may be modified to delete column A′ (the first column).
  • the structure definitions of the changed logical data table and logical GSI table can be synchronized with all computing nodes.
  • the column change solution disclosed in the present invention converts the physical column change operation into a physical column addition, column renaming, and column deletion method, so that when a distributed database performs column changes, the change of the structure definition (logical Schema) of the logical table is not affected by the different completion time of the change of the structure definition (physical Schema) of the physical table. It can ensure that the logical Schema can choose a specific time point for change, so that the user can see a unified change time point, and the change is effective for the main table (data table) and GSI at the same time.
  • adding columns, renaming columns, and removing columns are all online DDL.
  • the column change solution disclosed in the present invention can use the online DDL column type change operation natively supported by MySQL, so the physical DDL involved in the solution will not block the execution of the user's concurrent DML, avoiding the problem that the original column change operation needs to block the user's DML, and there is no need to use Binlog or triggers.
  • the column change scheme of the present application allows different computing nodes to have different logical Schemas before and after the update without affecting the correctness of read and write access until all computing nodes are synchronized.
  • the following is an analysis and explanation of the logical Schema operations (structural definition configuration of the logical table) involved in the column change scheme of the present disclosure.
  • the newly added column does not exist in the old version of the logical schema, so the column cannot be read or written.
  • the column has been added to the new version of the logical schema, it is hidden from the user.
  • the computing node will automatically copy the user's write to the new column (column multi-write). The user still reads the old column. Therefore, the possible data loss in the new column will not affect the user's reading. Therefore, normal access can be achieved through different versions of the logical schema before and after the change.
  • step S430 (2) Regarding changing the column name and column type or hiding the old column and displaying the new column in the logical schema when changing the column name, or exchanging the column name in the logical schema when changing the column type (step S430).
  • step S520 After the column backfill operation (step S520), the data content of the new and old columns is consistent.
  • the old version of the logical schema still accesses the old column, but the data content written by the new version of the logical schema is automatically copied to the old column (step S530). Therefore, there will be no situation where the data content written by the new version of the logical schema cannot be read.
  • the data content written by the old version of the logical schema will be automatically copied to the new column, so the new version of the logical schema can also read the data content written by the old version of the logical schema.
  • step S540 (3) Regarding the operation of stopping multiple writes of columns in the logical schema.
  • step S450 Operation of deleting old columns in the logical schema
  • FIG8 shows a schematic diagram of the structure of a computing device that can be used to implement the above database column modification method according to an embodiment of the present invention.
  • computing device 800 includes a memory 810 and a processor 820 .
  • the processor 820 may be a multi-core processor or may include multiple processors.
  • the processor 820 may include a general-purpose main processor and one or more special coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), etc.
  • the processor 820 may be implemented using a customized circuit, such as an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • ASIC application-specific integrated circuit
  • FPGA field programmable gate array
  • the memory 810 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage devices. Among them, ROM can store static data or instructions required by the processor 820 or other modules of the computer.
  • the permanent storage device may be a readable and writable storage device.
  • the permanent storage device may be a non-volatile storage device that does not lose the stored instructions and data even after the computer is powered off.
  • the permanent storage device uses a large-capacity storage device (such as a magnetic or optical disk, flash memory) as a permanent storage device.
  • the permanent storage device may be a removable storage device (such as a floppy disk, optical drive).
  • the system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as a dynamic random access memory.
  • the system memory may store some or all instructions and data required by the processor at run time.
  • the memory 810 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and disks and/or optical disks may also be used.
  • the memory 810 may include a removable storage device that can be read and/or written, such as a laser disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, double-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, etc.
  • a removable storage device that can be read and/or written, such as a laser disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, double-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, etc.
  • the computer-readable storage medium does not include carrier waves and transient electronic signals transmitted wirelessly or wired.
  • the memory 810 stores executable codes.
  • the processor 820 can execute the database column changing method mentioned above.
  • the method according to the present invention may also be implemented as a computer program or a computer program product, which includes computer program code instructions for executing the above steps defined in the above method of the present invention.
  • the present invention may also be implemented as a non-temporary machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or computer program, or computer instruction code) is stored.
  • executable code or computer program, or computer instruction code
  • the processor executes the various steps of the above-mentioned method according to the present invention.
  • each square box in the flow chart or block diagram can represent a part of a module, program segment or code, and the part of the module, program segment or code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two continuous square boxes can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本公开涉及一种数据库系统及数据库列变更方法,将其中第一列的列属性变更为变更目标值。在物理数据表和物理GSI表中分别添加第二列。在逻辑数据表和逻辑GSI表各自的结构定义中分别添加第二列,并使第二列对用户隐藏。逻辑数据表和逻辑GSI表中的第二列分别与物理数据表和物理GSI表中的第二列相关联,并且逻辑表第二列的属性的值为变更目标值。在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以在数据库的用户访问逻辑中用第二列替换第一列。在物理数据表和物理GSI表中删除第一列。在逻辑数据表和逻辑GSI表各自的结构定义中删除第一列。由此,便捷地实现了数据库的列变更,而且在列变更过程中不会导致用户写入或读取访问请求执行失败。

Description

数据库系统及数据库列变更方法
本申请要求于2022年09月29日提交中国专利局、申请号为202211198909.3、申请名称为“数据库系统及数据库列变更方法”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及数据库,特别涉及数据库中进行列变更的方案。
背景技术
随着互联网应用的不断普及和深化,需要处理的数据量也不断增大,对数据库的性能要求也越来越高,分布式数据库受到越来越多的关注。
一种基于分布式数据库分片(Sharding)思想实现的分布式数据库系统主要由一个或多个数据节点(DN,也可以称为“存储节点”)与一个或多个计算节点(CN)组成。
计算节点上存储逻辑表(也可以称为“逻辑库表”),逻辑表可以具有多个分区(shard)。每个分区的数据存储到各自对应的数据节点上的分库分表,即物理表(也可以称为“物理库表”)。
执行数据库操作时,用户可以向计算节点发送作用在逻辑表上的逻辑SQL(结构化查询语句)指令,然后计算节点可以向数据节点发送作用在物理表上的物理SQL指令。
另外,还提出了全局二级索引(GSI)的概念。GSI是数据表(一些情况下,也可以称为“主表”或“数据主表”)之外的索引,其中可能包含了主表的若干列和主键,并采用不同的分片模式。数据表和GSI两者的数据是一致的。数据表和GSI都可以分别具有计算节点上的逻辑表和数据节点上的物理表。
表由一个或多个列组成。列中存储着表中某部分的信息。每个列都有各自的列名称和相应的列类型(数据类型)。每一列存储着一条特定的信息。例如,在人员信息表中,一个列存储着人员编号,另一个列存储着人员姓名,而地址、城市、州以及邮政编码等也都存储在各自的列中,而每一行则分别对应不同的人员,记录其各项信息。
在一些情况下,需要对表中的列的属性进行变更操作,即列变更操作,例如,可以对列名称和/或列类型进行变更。
使用传统方法在分布式数据库上执行修改列类型的操作时,数据表和GSI两者各自的物理表中列的类型不能同时变更,这可能导致在某个时刻在主表和GSI上同时看到修改前后不同类型的列和数据,从而导致用户的写入或读取访问请求执行失败。
一些数据库解决方案使用第三方工具或自带工具来进行列变更操作,其共同特点是需要将数据复制到新表,执行时间长,并且需要额外依赖于MySQL的触发器或者Binlog。
在另一些传统数据库解决方案中,对于列类型变更的操作,数据节点上的数据库系统(例如MySQL)需要在锁表的情况进行处理。而加锁期间若要执行对应表的SQL会报错,即不能做到在线列变更。这会影响业务的正常执行。
因此,仍然需要一种改进的数据库列变更方案。
发明内容
本公开要解决的一个技术问题是提供一种数据库列变更方案,其能够便捷地实现列变更而又不会导致列变更过程中用户写入或读取访问的失败。
根据本公开的第一个方面,提供了一种数据库列变更方法,所述数据库具有数据表和用于所述数据表的全局二级索引GSI,所述数据表具有计算节点上的逻辑数据表和数据节点上的物理数据表,所述GSI具有计算节点上的逻辑GSI表和数据节点上的物理GSI表,该方法包括:在物理数据表和物理GSI表中分别添加第二列,物理数据表和物理GSI表中的第二列的至少一项属性的值为数据表中待变更的第一列的相应属性的变更目标值;在逻辑数据表和逻辑GSI表各自的结构定义中分别添加第二列,并使第二列对用户隐藏,逻辑数据表和逻辑GSI表中的第二列分别与物理数据表和物理GSI表中的第二列相关联,并且逻辑数据表和逻辑GSI表中的第二列的所述至少一项属性的值为所述变更目标值;在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以在数据库的用户访问逻辑中用第二列替换第一列;在物理数据表和物理GSI表中删除第一列;以及在逻辑数据表和逻辑GSI表各自的结构定义中删除第一列。
可选地,物理数据表和物理GSI表中的第二列的列名称与逻辑数据表和逻辑GSI表中的第二列的列名称均为第一列列名称的变更目标值。
或者,可选地,物理数据表和物理GSI表中的第二列的列名称与逻辑数据表和逻辑GSI表中的第二列的列名称可以不同,逻辑数据表和逻辑GSI表中的第二列的列名称被配置为映射到物理数据表和物理GSI表中的第二列的列名称。
可选地,该方法还可以包括:每次对逻辑数据表和逻辑GSI表各自的结构定义进行变更之后,将变更后的逻辑数据表和逻辑GSI表的结构定义同步到所有计算节点。
可选地,所述将变更后的逻辑数据表和逻辑GSI表的结构定义同步到所有计算节点的操作是原子操作。
可选地,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列第一列,并将变更后的逻辑数据表的结构定义和逻辑GSI表的结构定义同步到所有计算节点之后,执行所述在物理数据表和物理GSI表中删除第一列的步骤。
可选地,该方法还可以包括:在添加第二列之后,删除第一列之前,使物理数据表和物理GSI表的第二列具有与物理数据表和物理GSI表的第一列相同的数据内容。
可选地,使物理数据表和物理GSI表的第二列具有与物理数据表和物理GSI表的第一列相同的数据内容的步骤包括:在第一列对用户显示而第二列对用户隐藏的情况下,在逻 辑数据表和逻辑GSI表各自的结构定义中进行配置,使得在物理数据表和物理GSI表各自的第二列上复制对第一列的修改;以及/或者将物理数据表和物理GSI表各自的第一列中已有的数据内容复制到物理数据表和物理GSI表各自的第二列;以及/或者在第一列对用户隐藏而第二列对用户显示的情况下,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,使得停止在物理数据表和物理GSI表各自的第二列上复制对第一列的修改,并在物理数据表和物理GSI表各自的第一列上复制对第二列的修改。
可选地,在物理数据表和物理GSI表中删除第一列的步骤之前,该方法还包括:在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,停止在物理数据表和物理GSI表各自的第一列上复制对第二列的修改。
可选地,第一列待变更的属性包括列名称和列类型,或者仅包括列名称,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列替换第一列的步骤包括:在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以使第一列对用户隐藏,而第二列对用户显示。
可选地,第一列待变更的属性包括列类型,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列替换第一列的步骤包括:在物理数据表和物理GSI表中交换第一列和第二列的列名称,或者将逻辑数据表和逻辑GSI表中的第一列的列名称和第二列的列名称分别修改为映射到物理数据表和物理GSI表中第二列和第一列的列名称;以及在逻辑数据表和逻辑GSI表各自的结构定义中交换第一列和第二列的列名称。
可选地,所述数据库是分布式数据库。
可选地,所述数据库基于列名称对各列进行操作。
根据本公开的第二个方面,提供了一种数据库系统,包括一个或多个计算节点以及一个或多个数据节点,其中,使用根据上述第一方面所述的方法来实现列变更操作。
根据本公开的第三个方面,提供了一种计算设备,包括:处理器;以及存储器,其上存储有可执行代码,当可执行代码被处理器执行时,使处理器执行如上述第一方面所述的方法。
根据本公开的第四个方面,提供了一种计算机程序产品,包括可执行代码,当所述可执行代码被电子设备的处理器执行时,使所述处理器执行如上述第一方面所述的方法。
根据本公开的第五个方面,提供了一种非暂时性机器可读存储介质,其上存储有可执行代码,当可执行代码被电子设备的处理器执行时,使处理器执行如上述第一方面所述的方法。
由此,便捷地实现了数据库的列变更,而且在列变更过程中不会导致用户写入或读取访问请求执行失败。
附图说明
通过结合附图对本公开示例性实施方式进行更详细的描述,本公开的上述以及其它目的、特征和优势将变得更加明显,其中,在本公开示例性实施方式中,相同的参考标号通常代表相同部件。
图1是分布式数据库系统的示意性框图。
图2是具有全局二级索引(GSI)的数据库中进行写入操作的示意图。
图3是具有全局二级索引(GSI)的数据库中进行查询操作的示意图。
图4是根据本公开的数据库列变更方法的示意性流程图。
图5是根据本公开改进实施例的数据库列变更方法的示意性流程图。
图6A至6H示意性地示出了一种情形下的列变更过程。
图7A至7I示意性地示出了另一种情形下的列变更过程。
图8示出了根据本发明一实施例可用于实现上述数据库列变更方法的计算设备的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的优选实施方式。虽然附图中显示了本公开的优选实施方式,然而应该理解,可以以各种形式实现本公开而不应被这里阐述的实施方式所限制。相反,提供这些实施方式是为了使本公开更加透彻和完整,并且能够将本公开的范围完整地传达给本领域的技术人员。
在分布式数据库中,当执行列变更操作时,需要在所有物理表上执行对应的物理DDL指令,并且修改计算节点中存储的逻辑表结构定义(也可以称为“逻辑Schema”),使其与物理DDL之后的物理表结构定义(也可以称为“物理Schema”)保持一致。
DDL即数据定义语言,例如增加列,变更列类型、长度,增加约束等。
然而,在这一过程中会存在一些问题。
在执行物理列变更的过程中,由于分布式的特性,一般不能做到使所有物理DDL在同一时刻完成。因此,无法选择一个确切的时间点来进行逻辑Schema的变更。
例如,一些情况下,用户在修改列类型的同时还修改了列名称。这样,就有可能存在一个时间点,部分物理表已经完成了物理DDL,且使用的是新列名称,而另一部分物理表还未完成物理DDL,且使用旧列名称。此时,用户发出的查询指令无论使用的是新列名称还是旧列名称,都将报错。
或者,还有可能在数据表上执行完物理列变更之后,全局二级索引(GSI)上还未执行列变更,数据表和GSI中该列的列类型不一致。
另一方面,数据库系统可能拥有多个计算节点。这样,在逻辑Schema变更的过程中,不同的计算节点可能看到变更前后不同版本的逻辑Schema。此时,需要确保看到前后不同版本的逻辑Schema的计算节点都能正常工作。
另一方面,由于列变更在MySQL上不是在线DDL操作,所以在物理表在执行DDL 时发送到该物理表的DML语句将会被阻塞,从而影响用户正常的业务执行。
这里,“在线DDL操作”是指在执行DDL的过程中允许用户的DML语句并发执行的DDL操作。DML即数据操纵语言,例如插入(INSERT)、更新(UPDATE)、删除(DELETE)等语句。
本公开提出一种新颖的数据库列变更方案,通过添加新列、切换新旧列访问逻辑、删除旧列,能够便捷地实现列属性的变更。对于用户而言,数据表和GSI的变更不会导致写入或读取访问的失败。
首先,参考图1至图3对可应用本公开列变更方案的分布式数据库系统进行简要描述。
图1是分布式数据库系统的示意性框图。
如图1所示,分布式数据库可以包括一个或多个计算节点和一个或多个数据节点。
图2和图3分别是具有GSI的分布式数据库中进行写入和查询操作的示意图。
如图2和图3所示,分布式数据库包括逻辑层的逻辑表和物理层的物理表(分库1、分库2……)。逻辑表可以位于计算节点上,而物理表则位于数据节点上。不同的分库可以位于多个不同的数据节点上。逻辑表也可以位于多个计算节点上。
分布式数据库具有数据表(例如,主表)和用于对数据表例如主表提供全局二级索引的GSI。数据表和GSI又分别有数据节点上的物理表(主表分表1、主表分表2;索引分表1、索引分表2;……)和计算节点上的逻辑表。
为便于描述,本公开中将数据表的逻辑表称为“逻辑数据表”,将数据表的物理表称为“物理数据表”(即图2和图3中的主表分表),将GSI的逻辑表称为“逻辑GSI表”,将GSI的物理表称为“物理GSI表”(即图2和图3中的索引分表)。
应当理解,在针对同一个数据表的逻辑数据表、物理数据表、逻辑GSI表、物理GSI表中,具有相同的列,只是列的表现形式或存储形式有所区别。为便于描述,在本公开中,对于针对同一个数据表的逻辑数据表、物理数据表、逻辑GSI表、物理GSI表中相同的列不再分别命名。
在本公开中,“第一列”、“第二列”中“第一”、“第二”的表述仅用于区分不同的列,而不表示任何先后、主次等其它含义。
用户可以向计算节点发送作用在逻辑表上的逻辑SQL(结构化查询语句)指令,如写入指令或查询指令。然后,计算节点可以向数据节点发送作用在物理表上的物理SQL指令。
关于全局二级索引(GSI),进一步描述如下。
在分布式数据库中,如果数据行和对应的索引行保存在相同分片上,称这种索引为局部索引。不同于局部索引,如果数据行和对应的索引行保存在不同分片上,则称这种索引为全局二级索引,主要用于快速确定查询涉及的数据分片。两者可以搭配使用,通过GSI将查询下发到单个分片后,该分片上的局部索引能够提升分片内的查询性能。
GSI支持按需增加拆分维度,提供全局唯一索引。每个GSI对应一张索引表,使用XA多写保证主表和索引表之间数据强一致。这里,“XA事务”表示“分布式事务”。
如果查询的维度与逻辑表的拆分维度不同,则可能产生跨分片查询。跨分片查询的增加会导致查询卡慢、连接池耗尽等性能问题。使用GSI则能够通过增加拆分维度来减少跨分片查询,消除性能瓶颈。
下面参考图4描述根据本公开的数据库列变更方法。
列变更是指将数据库中一列的属性变更为变更目标值。要变更的属性例如可以是列名称和/或列类型。
图4是根据本公开的数据库列变更方法的示意性流程图。这里的数据库可以是分布式数据库。
设要对数据库中的第一列的属性进行变更。应当理解,相应的逻辑数据表和物理数据表中具有相关联的第一列,相应的逻辑GSI表和物理GSI表中具有相关联的第一列,逻辑GSI表和物理GSI表中的第一列为逻辑数据表和物理数据表中的第一列提供全局二级索引。
本公开创造性地提出,通过数据库系统如MySQL上的加列、重命名列、减列等在线DDL操作,实现一种新的数据库在线列变更方案。
如图4所示,在步骤S410,在物理数据表和物理GSI表中分别添加第二列。物理数据表和物理GSI表中的第二列的至少一项属性的值为数据表中待变更的第一列的相应属性的属性变更目标值。列变更操作期望将第一列的属性的值变更为该属性变更目标值。
由此,在数据节点的物理层上添加了第二列。
列的属性例如可以包括列名称、列类型等等。
一些情况下,这里添加的第二列的列名称可以与第一列的列名称目标值不同。
在一些方案中,逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联,即逻辑表和物理表中具有相同列名称的列是相关联的,对应于同一个列。
这种情况下,如果第一列期望变更的属性包括列名称,那么第二列的列名可以设定为第一列的列名称变更目标值。即,如果希望将第一列的列名称变更为“N”,则将物理表中第二列的列名称设定为“N”。
而在另一些方案中,逻辑表和物理表之间通过列名称的映射关系来实现逻辑表的列与物理表的列之间的关联。换言之,可以配置映射关系,记录逻辑表中的各个列名称分别对应于物理表中的哪个列名称。逻辑表面向用户,用户可以在逻辑表中查找到期望的列名称。而基于逻辑表中的列名称可以通过映射关系确定其所对应的物理表中的列名称,从而在物理表中基于所对应的物理表列名称来进行查询。
这种情况下,如果第一列期望变更的属性包括列名称,那么第二列的列名也不必设定为第一列的列名称变更目标值,而是可以设定为任意满足一般列名称条件(例如不冲突、不重复)的列名称。即,如果希望将第一列的列名称变更为“N”,物理表中第二列的列名称不必设定为“N”,而是可以设定为例如“M”。这样,后面可以将逻辑表中第二列的列名称设定为“N”,而将逻辑表中的列名称“N”映射到物理表中的列名称“M”。
另外,数据表和GSI中对应列的列名称一般是相同的。GSI的列名称集合是数据表的 列名称集合的一个子集。
在步骤S420,在逻辑数据表和逻辑GSI表各自的结构定义(schema)中分别添加第二列,并使第二列对用户隐藏。
应当理解,逻辑数据表和逻辑GSI表中的第二列分别与物理数据表和物理GSI表中的第二列相关联,并且逻辑数据表和逻辑GSI表中的第二列的至少一项属性的值为上述属性变更目标值。
如上文所述,在逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联的情况下,如果需要变更列名称,则上述步骤S410中设定的物理数据表和物理GSI表中的第二列的列名称与步骤S420中设定的逻辑数据表和逻辑GSI表中的第二列的列名称相同,均为列名称变更的目标值。由此,可以实现逻辑表和物理表中新增的第二列之间的关联。
而另一方面,在逻辑表和物理表之间通过列名称的映射关系来实现逻辑表的列与物理表的列之间的关联的情况下,逻辑表的第二列的列名称被配置为具有列名称的变更目标值,而物理表的第二列的列名称则不需要一定为列名称的变更目标值。将逻辑数据表和逻辑GSI表中的第二列的列名称配置为映射到物理数据表和物理GSI表中的第二列的列名称,即可实现两者之间的关联。
由此,在计算节点的逻辑层上添加了第二列。
而此时,第一列没有变更,仍然对用户显示。
在步骤S430,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以在数据库的用户访问逻辑中用第二列替换要变更的第一列。
具体说来,可以在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以使待变更的第一列对用户隐藏,而第二列对用户显示,并且保持逻辑数据表和逻辑GSI表中的第二列分别与物理数据表和物理GSI表中的第二列相关联。
由此,实现了第二列与第一列在用户访问逻辑中的交换,对于用户而言,具有原属性值的一列(第一列)变更成了具有属性变更目标值的列(第二列)。虽然实际上进行了两个列的替换,而用户感受到的则似乎只是一个列的属性变更。
此后,用户通过逻辑表中的第二列来访问(写入或查询)物理表中的第二列。第二列对用户显示,第一列则已不再对用户显示,而是对用户隐藏。
下面,进一步详细描述步骤S430的配置操作。
这里,第一列要变更的列属性可以包括列名称和/或列类型。对于不同的变更属性,可以有不同的配置方案。
当第一列要变更的列属性包括列名称和列类型,或者仅包括列名称时,可以在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以使第一列对用户隐藏,而第二列对用户显示。
这样,用户将看到逻辑表中具有第二列的列名称(以及列类型)的列替代了原来具有 第一列的列名称(以及列类型)的列,就像原来第一列的列名称(以及列类型)变更成了第二列的列名称(以及列类型)。
如上文所述,物理表中的第二列可以具有与逻辑表中的第二列相同的列名称;或者,也可以具有不同的列名称,只需要保持逻辑表中第二列的列名称映射到物理表中第二列的列名称即可。
另一方面,当不需要变更列名称,而仅期望变更列类型,即第一列要变更的列属性包括列类型时,可以通过先交换物理表中第一列和第二列的列名称与逻辑表中第一列和第二列的列名称之间的关联关系,再在逻辑表中交换第一列和第二列的列名称,来实现用户访问逻辑的变更。在步骤S410添加第二列时,由于期望变更的列属性不包括列名称,第二列的列名称可以设定为任意满足一般列名称条件(例如不冲突、不重复)的新列名称。
例如,在逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联的情况下,可以先在物理数据表和物理GSI表中交换第一列和第二列的列名称,从而交换物理表中第一列和第二列的列名称与逻辑表中第一列和第二列的列名称之间的关联关系。
另一方面,在逻辑表和物理表之间通过列名称的映射关系来实现逻辑表的列与物理表的列之间的关联的情况下,可以在物理数据表和物理GSI表中交换第一列和第二列的列名称;或者,也可以将逻辑数据表和逻辑GSI表中的第一列的列名称和第二列的列名称分别修改为映射到物理数据表和物理GSI表中第二列和第一列的列名称,从而交换物理表中第一列和第二列的列名称与逻辑表中第一列和第二列的列名称之间的关联关系。换言之,这种情况下,可以不需要修改物理表中第一列和第二列的列名称,而只需要修改(交换)逻辑表和物理表中第一列和第二列的映射关系。
此时,物理表中第一列和第二列基于列名称与逻辑表中第一列和第二列的关联关系已交换,而逻辑表中的列名称尚未交换。
应当理解,数据库可以基于列名称对各列进行操作。虽然不同表中各自的第一列和第二列例如可以通过分别具有的各不相同而唯一不变的列ID来加以区分,但是当进行访问控制的时候,或者说,从计算节点的逻辑层向数据节点的物理层发送物理SQL指令的时候,可以通过列名称来确定要访问的列。在逻辑数据表和逻辑GSI表各自的结构定义中,也可以通过列名称来配置对用户的显示的列和/或对用户隐藏的列。事实上,用户发出的写入或查询请求也是基于列名称来确定要访问的列的。
这样,逻辑数据表和逻辑GSI表中的第一列(具有原列名称)将变得与物理数据表和物理GSI表中的第二列(具有原列名称或交换后对应于逻辑表第一列的映射关系)相关联,而逻辑数据表和逻辑GSI表中的第二列(具有新列名称)将变得与物理数据表和物理GSI表中的第一列(具有新列名称或交换后对应于逻辑表第二列的映射关系)相关联。
此时,逻辑数据表和逻辑GSI表中仍然是具有原列名称的第一列对用户显示,而具有新列名称的第二列对用户隐藏。而物理数据表和物理GSI表中列名称交换或列名称映射关 系交换后具有原列名称或交换后对应于逻辑表第一列的映射关系的第二列分别关联到逻辑数据表和逻辑GSI表中的第一列,而物理数据表和物理GSI表中具有新列名称或交换后对应于逻辑表第二列的映射关系的第一列则分别关联到逻辑数据表和逻辑GSI表中的第二列。用户在逻辑层访问原列类型的第一列时,实际访问到物理层中新列类型(属性变更目标值)的第二列。
然后,在逻辑数据表和逻辑GSI表各自的结构定义中交换第一列和第二列的列名称。
此时,物理表和逻辑表中的列名称均已交换。逻辑数据表和逻辑GSI表中具有原列名称的第二列对用户显示,而具有新列名称的第一列对用户隐藏。逻辑数据表和逻辑GSI表中的第二列与物理数据表和物理GSI表中的第二列具有相同的列名称(原列名称)或具有基于列名称的映射关系,并因而相关联。逻辑数据表和逻辑GSI表中的第一列与物理数据表和物理GSI表中的第一列具有相同的列名称(新列名称)或具有基于列名称的映射关系,并因而相关联。
于是,用户(在逻辑表中)将看到具有原列名称以及新列类型的列(第二列)替代了原来具有原列名称以及原列类型的列(第一列),就像原来第一列的原列类型变更成了第二列的列类型(新列类型)。
然后,在步骤S440,可以在物理数据表和物理GSI表中删除第一列。
在步骤S450,在逻辑数据表和逻辑GSI表各自的结构定义中删除第一列。
由于用户已经不再访问第一列,因此步骤S440和步骤S450的顺序可以颠倒,或同步执行。
如上文所述,数据库系统可以具有多个计算节点,每个计算节点上都可以有逻辑数据表和逻辑GSI表。
在每次对逻辑数据表和逻辑GSI表各自的结构定义进行变更之后,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步到所有计算节点。
逻辑表的结构定义(逻辑Schema)的同步是指,将更新后的结构定义从执行DDL的计算节点发送到所有的计算节点,该操作结束之后所有的计算节点都将看到更新后的结构定义(逻辑Schema)。
需要注意的是,每次同步操作都是原子的,即原子操作。
原子操作是指不可中断的一个或者一系列操作。同步操作可能包含一系列操作。由于同步操作是原子的,同步操作结束之后,要么所有的计算节点都看到更新后的状态,要么都看到更新前的状态,不会有部分节点成功,部分节点失败的情况发生,也不会有某个节点在某个时刻看到这一系列操作中部分成功的情况发生。
这样,对逻辑数据表和逻辑GSI表的修改要么同时都成功,要么同时都失败,而不会有其它中间结果。
这样,可以在完成步骤S430的配置操作,并将变更后的逻辑数据表的结构定义和逻辑GSI表的结构定义同步到所有计算节点之后,再执行上述步骤S440、S350的列删除操 作。
上文中已经参考图4描述了通过添加具有属性变更目标值的第二列,并用第二列替换第一列,来实现从用户的视角而言对第一列的属性变更。
在一些情况下,例如在线进行列变更操作时,第一列在变更前可能已经具有一些数据内容,而且在变更过程中,用户也可能对第一列或第二列进行修改访问操作。因此,在步骤S420在逻辑表的结构定义中添加第二列之后,步骤S440在物理表中删除第一列之前,还可以进一步使物理数据表和物理GSI表的第二列具有与物理数据表和物理GSI表的第一列相同的数据内容。
这样,可以在顺利实现列变更的同时,还可以进一步确保列中数据内容的保持和延续。
图5是根据本公开改进实施例的数据库列变更方法的示意性流程图。
图5所示方法中包括上文已参考图4描述的步骤S410至步骤S450。这些步骤可以与上面参考图4描述的相同。
图5中新增加的步骤S510至S540分别用于在不同的时机在第一列和第二列之间对不同的数据内容或用户发起的数据修改进行同步,以使第一列和第二列具有相同的数据内容。应当理解,在改进实施例的数据库列变更方法中,可以仅包括步骤S510至S540任何一项,也可以包括其中任何两项或三项的组合,也可以包括其全部。
如图5所示,在步骤S420之后,在第一列对用户显示而第二列对用户隐藏的情况下,在步骤S510,可以在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,使得在物理数据表和物理GSI表各自的第二列上复制对第一列的修改。这样的修改复制操作可以称为“列多写”操作。
更宽泛地讲,列多写操作是指,当用户通过逻辑DML在逻辑表中对于某个列写入值时,在物理DML中在物理表中复制一份到另一列上。该操作可以通过计算节点上逻辑表的结构定义(逻辑Schema)的配置项开启。
由此,在添加了第二列而尚未实现第一列与第二列之间的替换变更之前,可以使第二列接受到与第一列相同的修改操作,使修改的新数据内容在两列之间保持一致。
换言之,通过列多写操作,使物理数据表和物理GSI表的第二列与物理数据表和物理GSI表的第一列的数据内容保持一致。
在第一列可能已经具有数据内容的情况下,在步骤S520,还可以将物理数据表和物理GSI表各自的第一列中已有的数据内容复制到物理数据表和物理GSI表各自的第二列。这样的内容复制操作可以称为“列回填”操作。
更宽泛地讲,列回填操作是指,将物理表中某一列的值复制到另一列中。该操作在数据节点中完成。
这样,可以使第二列具有第一列先前已经具有的历史数据内容。
另一方面,在步骤S430之后,在第一列对用户隐藏而第二列对用户显示的情况下,在步骤S530可以在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,使得停止在物 理数据表和物理GSI表各自的第二列上复制对第一列的修改(即停止前述列多写操作),并在物理数据表和物理GSI表各自的第一列上复制对第二列的修改。这里的修改复制操作也可以称为“列多写”操作,或者,也可以称为“反向列多写”操作。换言之,在步骤S530,修改了列多写操作的方向。
这样,可以在删除第一列之前继续保持第一列与第二列具有相同的数据内容,以避免潜在的用户访问故障。
例如,在存在多个计算节点的情况下,如果在步骤S430在一个计算节点对逻辑数据表和逻辑GSI表各自的结构定义中进行了配置以在数据库的用户访问逻辑中用第二列替换要变更的第一列,而修改后的结构定义尚未同步到所有计算节点,那么在部分计算节点上的逻辑数据表和逻辑GSI表各自的结构定义中,用户仍然访问第一列。这样,就需要继续保持第一列与第二列具有相同的数据内容,使得用户对第二列修改的新数据内容也能够同步到第一列。
应当理解,在上面关于步骤S430的描述中,在要变更的属性为列类型,而不需要变更列名称的情况下,在经过先后在物理表和逻辑表中交换第一列和第二列的列名称之后,列多写的方向也已经相应自动调整了。这也是因为如上所述,数据库中基于列名称来进行对各列进行操作。列名称交换后,相应的列多写操作方向也就修改反向了。
在例如步骤S430配置的逻辑数据表和逻辑GSI表各自的结构定义已经同步到所有计算节点的情况下,在步骤S540,可以在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,停止在物理数据表和物理GSI表各自的第一列上复制对第二列的修改,即停止反向列多写操作。
此后,便可以执行步骤S440和S450,在物理表和数据表中分别删除第一列,而不必担心数据内容的丢失。
下面参考图6A至6H和图7A至7I,进一步详细描述本公开实施例中,两种情况下进行列变更的方案。
假设存在一个表T。在数据库中,与表T对应的,具有逻辑数据表、物理数据表;逻辑GSI表、物理GSI表。具体说来,GSI表为数据表提供全局二级索引,数据表和GSI表分别具有逻辑表和物理表。
由于对数据表和GSI表都要进行相同的操作,因此在图6A至6H和图7A至7I中,对于表T,逻辑表和物理表均仅示出一个表作为示意。
要对表T中的一列进行列变更操作,要变更的列的列名称为“A”,列类型为“L1”。
根据该列要修改的属性,可以分为两种情况:
(1)将列A的类型从L1修改为L2,并将列名称修改为“B”;
(2)将列A的类型从L1修改为L2,不修改列名称。
下面参考图6A至6H描述第(1)种情形下的列变更过程。
图6A至6H示意性地示出了第(1)种情形下的列变更过程。
图6A示意性地示出了初始状态。图中以实线箭头表示写入操作,虚线箭头表示读取操作。对于这里所涉及的列A和列表,用户的读写箭头指向的列表示该列对于用户是可见的,未指向的列表示对用户不可见。
如图6A所示,逻辑表和物理表中具有列名称为“A”,列类型为“L1”的列(第一列)。列A对用户可见,即用户可以对列A进行写入/读取操作。图6A中以指向列A的实线箭头和从列A引出的虚线箭头表示对列A的写入/读取操作,并表示列A是对用户可见的。
如上文所述,虽然例如可以通过分别具有的唯一不变的列ID来加以区分,但是当进行访问控制的时候,或者说,从计算节点的逻辑层向数据节点的物理层发送物理SQL指令的时候,可以通过列名称来确定要访问的列。下文中,用列名称来表示列。
下面描述的列变更操作,从用户的视角而言,将列A的列名称变更为“B”,列类型变更为“L2”。列名称“B”和列类型“L2”为针对列A的属性变更目标值。
如图6B所示,在物理数据表和物理GSI表中添加一列(第二列),列名称为“B”,列类型为“L2”。
这里,以逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联的情形为例进行描述。应当理解,也可以为物理表中添加的第二列设定任意其它列名称,例如“C”,而在后面在逻辑表中添加第二列之后设定逻辑表中添加的列B映射到物理表中的列C即可。这种情况下,下文中提及的物理表中的“列B”可以替换为“列C”。
此时,逻辑数据表和逻辑GSI表的结构定义(逻辑Schema)中,表T1还不存在列B(第二列),所以列B对用户不可见。
接下来,如图6C所示,修改逻辑数据表和逻辑GSI表的结构定义,添加列B,但对于用户隐藏,所以列B依然对用户不可见。
同时,可以对数据表和GSI表开启列多写操作,即用户对于列A的所有修改,都在列B上复制一份,图6C中,以逻辑表中从列A指向列B的水平箭头指示。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。如上文所述,这个同步操作可以是原子操作。
如图6D所示,可以在物理数据表和物理GSI表上进行列回填操作,即将物理表上列A中的所有内容拷贝到列B中。图6D中,以物理表中从列A指向列B的水平箭头指示。
在图6D所示的列回填操作完成后,如图6E所示,在逻辑数据表和逻辑GSI表的结构定义中隐藏列A,显示列B,用户看到修改后的结果,即列B。换言之,用户能够访问(读取/写入)列B,而不再访问列A。
同时,可以修改列多写的方向,即用户对于列B的所有修改,都在列A上复制一份,图6E中,以逻辑表中从列B指向列A的水平箭头指示。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。如上文所述,这个同步操作可以是原子操作。
然后,如图6F所示,可以修改逻辑数据表和逻辑GSI表的结构定义以停止列多写。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。
于是,如图6G所示,可以在物理数据表和物理GSI表中删除列A。
进一步地,如图6H所示,可以修改逻辑数据表和逻辑GSI表的结构定义以删除列A。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。
由此,通过添加列B、列访问逻辑切换、删除列A,用列B替代了列A,从用户的视角看来,将原来的列类型为“L1”、列名称为“A”的列变更成了列类型为“L2”、列名称为“B”的列。
接下来,参考图7A至7I描述上述第(2)种情形下的列变更过程。
图7A至7I示意性地示出了上述第(2)种情形下的列变更过程。
图7A示意性地示出了初始状态。如图7A所示,逻辑表和物理表中具有列名称为“A”,列类型为“L1”的列(第一列),列A对用户可见。
在下文的描述中涉及到两个列的列名称交换。从用户视角和/或数据库控制逻辑的角度,仍然用列名称来指代各个列,但是应当明白,在列名称交换前后,同样的列名称实际指代的列发生了变化。这可以参考括号中备注的“第一列”、“第二列”(相当于列ID等不变的列唯一标识)来理解。
下面描述的列变更操作,从用户的视角而言,将列A的列类型变更为“L2”,而列名称不变,仍然为“A”。列类型“L2”为针对列A的属性变更目标值。
如图7B所示,在物理数据表和物理GSI表上添加一列(第二列),列名称为“A'”,列类型为“L2”。
此时,逻辑数据表和逻辑GSI表的结构定义(逻辑Schema)中,表T1还不存在列A'(第二列),所以列A'对用户不可见。
接下来,如图7C所示,修改逻辑数据表和逻辑GSI表的结构定义,添加列A',但对于用户隐藏,所以列A'依然对用户不可见。
这里,以逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联的情形为例进行描述。应当理解,逻辑表中添加的列的列名称可以与物理表中添加的列的列名不同,例如将物理表中添加的列的列名称设定为“D”,而将逻辑表中的列A'映射到物理表中的列D即可。这种情况下,下文中提及的物理表中的“列A'”可以替换为“列D”。
同时,开启列多写操作,用户对于列A的所有修改,都在列A'上复制一份,图7C中,以逻辑表中从列A指向列A'的水平箭头指示。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。如上文所述,这个同步操作可以是原子操作。
如图7D所示,可以在物理数据表和物理GSI表上进行列回填操作,即将物理数据表和物理GSI表上列A中的所有内容拷贝到列A'中。图7D中,以物理表中从列A指向列A'的水平箭头指示。
在图7D所示的列回填操作完成后,如图7E所示,在物理数据表和物理GSI表中交换列A和列A'(第一列和第二列)的列名称。
交换完成之后,在物理数据表和物理GSI表中,列A(第二列)的列类型为“L2”,而列A'(第一列)的列类型为“L1”。
这里,以逻辑表和物理表之间通过相同的列名称来实现逻辑表的列与物理表的列之间的关联的情形为例进行描述,交换物理表中第一列和第二列的列名称。如上文所述,也可以交换物理表中第一列和第二列与逻辑表中第一列和第二列之间的映射关系,来实现关联关系的交换。
接下来,如图7F所示,在逻辑数据表和逻辑GSI表的结构定义中交换列A和列A'(第一列和第二列)的列名称。
交换完成之后,按列名称来看,列A'(实际已替换为第一列)依然是隐藏的。用户看到修改后的结果为列A(实际已替换为第二列)的类型为“L2”。
此时,图7E中从列A(第一列)到列A'(第二列)的列反写操作已自然转换为图7F中从列名称交换后的列A(第二列)到列A'(第一列)的列反写操作。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。
然后,如图7G所示,可以修改逻辑数据表和逻辑GSI表的结构定义,停止列多写。
另外,将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。
于是,如图7H所示,可以在物理数据表和物理GSI表上删除列A'(第一列)。
进一步地,如图7I所示,可以修改逻辑数据表和逻辑GSI表的结构定义以删除列A'(第一列)。
另外,可以将变更后的逻辑数据表和逻辑GSI表的结构定义同步所有的计算节点。
由此,通过添加列A'、列名称交换、删除列A,用新添加的列替代了原来的列,从用户的视角看来,将原来的列类型为“L1”、列名称为“A”的列变更成了列类型为“L2”、列名称仍然为“A”的列。
本公开的列变更方案通过将物理列变更操作转变为物理加列、重命名列、减列的方式执行,使得分布式数据库在进行列变更时,逻辑表的结构定义(逻辑Schema)的变更不受物理表的结构定义(物理Schema)变更完成时间不同的影响,可以保证逻辑Schema能够选择一个特定的时间点进行变更,从而用户能够看到一个统一变更的时间点,变更对主表(数据表)和GSI同时生效。
而且,MySQL中,加列、重命名列、减列都是在线DDL。本公开的列变更方案可以使用MySQL原生支持的在线DDL的更改列类型操作,所以该方案涉及的物理DDL均不会阻塞用户并发执行的DML的执行,避免了原来的列变更操作需要阻塞用户DML的问题,也无需额外使用Binlog或者触发器。
通过本公开的列变更方案执行列变更时,用户可以一直通过旧列来访问数据,直到逻辑Schema切换。逻辑Schema切换之后则可以直接通过新列访问。因此,不会发生物理 DDL部分完成时,无论是对新列还是旧列的访问都有可能出错的情况发生。
并且,由于主表(数据表)和GSI表的逻辑切换是一个原子操作,所以用户对于主表(数据表)和GSI表的逻辑Schema的查看一定是一致的。
另外,在存在多个计算节点的情况下,对于所有的逻辑Schema操作,本申请的列变更方案中允许不同计算节点同时拥有更新前和更新后不同的逻辑Schema,而不影响读写访问的正确性,直到所有的计算节点同步完成。下面对本公开的列变更方案中涉及的逻辑Schema操作(逻辑表的结构定义配置)分别进行分析阐述。
(1)关于在逻辑Schema添加新列(步骤S420)、开启列多写(步骤S510)的操作。
旧版本逻辑Schema中不存在新添加的列,读取不到该列,也不会对该列进行写入。
新版本逻辑Schema中虽然已经添加了该列,但是该列对于用户是隐藏的。对旧列进行写入时,会由计算节点自动复制一份用户的写入到新列(列多写)。而用户读取的依然是旧列。因此,新列此时可能的数据缺失不会影响用户的读取。所以,通过变更前后不同版本的逻辑Schema均能实现正常访问。
(2)关于变更列名称和列类型或变更列名称时,逻辑Schema中隐藏旧列、显示新列的操作,或是变更列类型时,逻辑Schema中交换列名称的操作(步骤S430)。
此时,经过列回填操作(步骤S520),新旧两列的数据内容是一致的。旧版本逻辑Schema依然访问旧列,但是通过新版本逻辑Schema写入的数据内容会自动复制一份到旧列(步骤S530)。因此,不会出现读取不到通过新版本逻辑Schema写入的数据内容的情况。
同理,通过旧版本逻辑Schema写入的数据内容会自动复制一份到新列,所以新版本逻辑Schema也能读取到通过旧版本逻辑Schema写入的数据内容。
因此,通过变更前后不同版本的逻辑Schema均能实现正常访问。
(3)关于逻辑Schema中停止列多写的操作(步骤S540)。
此时,新旧两个版本的逻辑Schema都是通过新列来进行读取,所以停止列多写不会对用户读取出的数据造成影响。
(4)关于逻辑Schema中删除旧列的操作(步骤S450)。
此时,新旧两个版本的逻辑Schema都已经不会访问该列,所以不同版本的逻辑Schema均能实现正常访问。
图8示出了根据本发明一实施例可用于实现上述数据库列变更方法的计算设备的结构示意图。
参见图8,计算设备800包括存储器810和处理器820。
处理器820可以是一个多核的处理器,也可以包含多个处理器。在一些实施例中,处理器820可以包含一个通用的主处理器以及一个或多个特殊的协处理器,例如图形处理器(GPU)、数字信号处理器(DSP)等等。在一些实施例中,处理器820可以使用定制的电路实现,例如特定用途集成电路(ASIC,Application Specific Integrated Circuit)或者现场可编程逻辑门阵列(FPGA,Field Programmable GateArrays)。
存储器810可以包括各种类型的存储单元,例如系统内存、只读存储器(ROM),和永久存储装置。其中,ROM可以存储处理器820或者计算机的其他模块需要的静态数据或者指令。永久存储装置可以是可读写的存储装置。永久存储装置可以是即使计算机断电后也不会失去存储的指令和数据的非易失性存储设备。在一些实施方式中,永久性存储装置采用大容量存储装置(例如磁或光盘、闪存)作为永久存储装置。另外一些实施方式中,永久性存储装置可以是可移除的存储设备(例如软盘、光驱)。系统内存可以是可读写存储设备或者易失性可读写存储设备,例如动态随机访问内存。系统内存可以存储一些或者所有处理器在运行时需要的指令和数据。此外,存储器810可以包括任意计算机可读存储媒介的组合,包括各种类型的半导体存储芯片(DRAM,SRAM,SDRAM,闪存,可编程只读存储器),磁盘和/或光盘也可以采用。在一些实施方式中,存储器810可以包括可读和/或写的可移除的存储设备,例如激光唱片(CD)、只读数字多功能光盘(例如DVD-ROM,双层DVD-ROM)、只读蓝光光盘、超密度光盘、闪存卡(例如SD卡、min SD卡、Micro-SD卡等等)、磁性软盘等等。计算机可读存储媒介不包含载波和通过无线或有线传输的瞬间电子信号。
存储器810上存储有可执行代码,当可执行代码被处理器820处理时,可以使处理器820执行上文述及的数据库列变更方法。
上文中已经参考附图详细描述了根据本发明的数据库列变更方法。
此外,根据本发明的方法还可以实现为一种计算机程序或计算机程序产品,该计算机程序或计算机程序产品包括用于执行本发明的上述方法中限定的上述各步骤的计算机程序代码指令。
或者,本发明还可以实施为一种非暂时性机器可读存储介质(或计算机可读存储介质、或机器可读存储介质),其上存储有可执行代码(或计算机程序、或计算机指令代码),当所述可执行代码(或计算机程序、或计算机指令代码)被电子设备(或计算设备、服务器等)的处理器执行时,使所述处理器执行根据本发明的上述方法的各个步骤。
本领域技术人员还将明白的是,结合这里的公开所描述的各种示例性逻辑块、模块、电路和算法步骤可以被实现为电子硬件、计算机软件或两者的组合。
附图中的流程图和框图显示了根据本发明的多个实施例的系统和方法的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标记的功能也可以以不同于附图中所标记的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本发明的各实施例,上述说明是示例性的,并非穷尽性的,并且也不 限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (14)

  1. 一种数据库列变更的方法,所述数据库具有数据表和用于所述数据表的全局二级索引GSI,所述数据表具有计算节点上的逻辑数据表和数据节点上的物理数据表,所述GSI具有计算节点上的逻辑GSI表和数据节点上的物理GSI表,该方法包括:
    在物理数据表和物理GSI表中分别添加第二列,物理数据表和物理GSI表中的第二列的至少一项属性的值为数据表中待变更的第一列的相应属性的变更目标值;
    在逻辑数据表和逻辑GSI表各自的结构定义中分别添加第二列,并使第二列对用户隐藏,逻辑数据表和逻辑GSI表中的第二列分别与物理数据表和物理GSI表中的第二列相关联,并且逻辑数据表和逻辑GSI表中的第二列的所述至少一项属性的值为所述变更目标值;
    在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以在数据库的用户访问逻辑中用第二列替换第一列;
    在物理数据表和物理GSI表中删除第一列;以及
    在逻辑数据表和逻辑GSI表各自的结构定义中删除第一列。
  2. 根据权利要求1所述的方法,其中,
    物理数据表和物理GSI表中的第二列的列名称与逻辑数据表和逻辑GSI表中的第二列的列名称均为第一列列名称的变更目标值;或者
    逻辑数据表和逻辑GSI表中的第二列的列名称被配置为映射到物理数据表和物理GSI表中的第二列的列名称。
  3. 根据权利要求1所述的方法,还包括:
    每次对逻辑数据表和逻辑GSI表各自的结构定义进行变更之后,将变更后的逻辑数据表和逻辑GSI表的结构定义同步到所有计算节点。
  4. 根据权利要求3所述的方法,其中,
    所述将变更后的逻辑数据表和逻辑GSI表的结构定义同步到所有计算节点的操作是原子操作。
  5. 根据权利要求3所述的方法,其中,
    在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列替换第一列,并将变更后的逻辑数据表的结构定义和逻辑GSI表的结构定义同步到所有计算节点之后,执行所述在物理数据表和物理GSI表中删除第一列的步骤。
  6. 根据权利要求1所述的方法,还包括:
    在添加第二列之后,删除第一列之前,使物理数据表和物理GSI表的第二列具有与物理数据表和物理GSI表的第一列相同的数据内容。
  7. 根据权利要求6所述的方法,其中,使物理数据表和物理GSI表的第二列具有与物理数据表和物理GSI表的第一列相同的数据内容的步骤包括:
    在第一列对用户显示而第二列对用户隐藏的情况下,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,使得在物理数据表和物理GSI表各自的第二列上复制对第一列的 修改;以及/或者
    将物理数据表和物理GSI表各自的第一列中已有的数据内容复制到物理数据表和物理GSI表各自的第二列;以及/或者
    在第一列对用户隐藏而第二列对用户显示的情况下,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,使得停止在物理数据表和物理GSI表各自的第二列上复制对第一列的修改,并在物理数据表和物理GSI表各自的第一列上复制对第二列的修改。
  8. 根据权利要求7所述的方法,其中,在物理数据表和物理GSI表中删除第一列的步骤之前,该方法还包括:
    在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,停止在物理数据表和物理GSI表各自的第一列上复制对第二列的修改。
  9. 根据权利要求1所述的方法,其中,
    第一列待变更的属性包括列名称和列类型,或者仅包括列名称,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列替换第一列的步骤包括:
    在逻辑数据表和逻辑GSI表各自的结构定义中进行配置,以使第一列对用户隐藏,而第二列对用户显示,
    并且/或者,
    第一列待变更的属性包括列类型,在逻辑数据表和逻辑GSI表各自的结构定义中进行配置以在数据库的用户访问逻辑中用第二列替换第一列的步骤包括:
    在物理数据表和物理GSI表中交换第一列和第二列的列名称,或者将逻辑数据表和逻辑GSI表中的第一列的列名称和第二列的列名称分别修改为映射到物理数据表和物理GSI表中第二列和第一列的列名称;以及
    在逻辑数据表和逻辑GSI表各自的结构定义中交换第一列和第二列的列名称。
  10. 根据权利要求9所述的方法,其中,
    所述数据库是分布式数据库;并且/或者
    所述数据库基于列名称对各列进行操作。
  11. 一种数据库系统,包括一个或多个计算节点以及一个或多个数据节点,其中,使用根据权利要求1至10中任何一项所述的方法来实现列变更操作。
  12. 一种计算设备,包括:
    处理器;以及
    存储器,其上存储有可执行代码,当所述可执行代码被所述处理器执行时,使所述处理器执行如权利要求1至10中任何一项所述的方法。
  13. 一种计算机程序产品,包括可执行代码,当所述可执行代码被电子设备的处理器执行时,使所述处理器执行如权利要求1至10中任何一项所述的方法。
  14. 一种非暂时性机器可读存储介质,其上存储有可执行代码,当所述可执行代码被电子设备的处理器执行时,使所述处理器执行如权利要求1至10中任何一项所述的方法。
PCT/CN2023/121268 2022-09-29 2023-09-25 数据库系统及数据库列变更方法 WO2024067528A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211198909.3A CN115687343A (zh) 2022-09-29 2022-09-29 数据库系统及数据库列变更方法
CN202211198909.3 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024067528A1 true WO2024067528A1 (zh) 2024-04-04

Family

ID=85064788

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/121268 WO2024067528A1 (zh) 2022-09-29 2023-09-25 数据库系统及数据库列变更方法

Country Status (2)

Country Link
CN (1) CN115687343A (zh)
WO (1) WO2024067528A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115687343A (zh) * 2022-09-29 2023-02-03 阿里云计算有限公司 数据库系统及数据库列变更方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160117965A (ko) * 2015-04-01 2016-10-11 삼성에스디에스 주식회사 NoSQL 모델 생성 방법 및 그 장치
CN106471489A (zh) * 2014-06-30 2017-03-01 微软技术许可有限责任公司 管理具有灵活模式的数据
CN108369587A (zh) * 2015-10-19 2018-08-03 甲骨文国际公司 创建用于交换的表
US20220121632A1 (en) * 2020-10-19 2022-04-21 Oracle International Corporation Dynamic inclusion of custom columns into a logical model
CN115687343A (zh) * 2022-09-29 2023-02-03 阿里云计算有限公司 数据库系统及数据库列变更方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106471489A (zh) * 2014-06-30 2017-03-01 微软技术许可有限责任公司 管理具有灵活模式的数据
KR20160117965A (ko) * 2015-04-01 2016-10-11 삼성에스디에스 주식회사 NoSQL 모델 생성 방법 및 그 장치
CN108369587A (zh) * 2015-10-19 2018-08-03 甲骨文国际公司 创建用于交换的表
US20220121632A1 (en) * 2020-10-19 2022-04-21 Oracle International Corporation Dynamic inclusion of custom columns into a logical model
CN115687343A (zh) * 2022-09-29 2023-02-03 阿里云计算有限公司 数据库系统及数据库列变更方法

Also Published As

Publication number Publication date
CN115687343A (zh) 2023-02-03

Similar Documents

Publication Publication Date Title
JP6059273B2 (ja) データベースクエリを修正する方法、コンピュータ可読記憶媒体及びシステム
CA3121919C (en) System and method for augmenting database applications with blockchain technology
CN109906448B (zh) 用于促进可插拔数据库上的操作的方法、设备和介质
US7383293B2 (en) Database backup system using data and user-defined routines replicators for maintaining a copy of database on a secondary server
US9330147B2 (en) Database and data bus architecture and systems for efficient data distribution
US7213116B2 (en) Method and apparatus for mirroring objects between storage systems
US8255373B2 (en) Atomic multiple modification of data in a distributed storage system
JP5321682B2 (ja) ストレージシステムとストレージアクセス方法とプログラム
WO2024067528A1 (zh) 数据库系统及数据库列变更方法
US10754854B2 (en) Consistent query of local indexes
US20050010592A1 (en) Method and system for taking a data snapshot
US9576038B1 (en) Consistent query of local indexes
US20120005179A1 (en) Extensibility of metaobjects
EP3719665B1 (en) Supporting scalable distributed secondary index using replication engine for high-performance distributed database systems
CN105144162B (zh) 集成数据库框架
CN109871386A (zh) 非易失性存储器中的多版本并发控制(mvcc)
JP2021082257A (ja) ハイブリッドクラウド弾性スケーリングおよび高性能データ仮想化のためのリアルタイムクロスシステムデータベースレプリケーション
US20120284244A1 (en) Transaction processing device, transaction processing method and transaction processing program
WO2012108015A1 (ja) データ同期方法、データ同期プログラム、及びデータ同期制御装置
WO2023202394A1 (zh) 分区表建立、针对分区表的数据写入、读取方法及装置
WO2023197404A1 (zh) 一种基于分布式数据库的对象存储方法及装置
US20100088289A1 (en) Transitioning clone data maps and synchronizing with a data query
WO2024087777A1 (zh) 数据库表的数据重整方法和装置、介质和计算机设备
US11940972B2 (en) Execution of operations on partitioned tables
Kantabutra et al. Intentionally-Linked Entities: A Better Database System for Representing Dynamic Social Networks, Narrative Geographic Information Sytem and General Abstractions of Reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870757

Country of ref document: EP

Kind code of ref document: A1