CN117555893A - Method and device for processing global secondary index

Info

Publication number: CN117555893A
Application number: CN202210938926.XA
Authority: CN
Inventors: 周兆琦; 张树杰; 刘宗昊; 刘宝珠
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2024-02-13
Also published as: WO2024027464A1

Abstract

The application provides a method and a device for processing a global secondary index, which are applied to a first node, wherein the method comprises the following steps: generating a first indication, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; and sending the first indication to a second node for storing the global secondary index, so that the second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication. According to the scheme provided by the application, the efficiency and the performance of data operation on the global secondary index of the base table can be improved.

Description

Method and device for processing global secondary index

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing a global secondary index.

Background

Distributed databases typically include coordination nodes and data nodes. The data nodes store data, and the coordination nodes manage, operate on, and schedule the data stored in the data nodes. A base table is an object in a distributed database that is used to store data. A base table may be a collection of rows and columns of data stored in a distributed manner in at least one data node. Data in the same column belongs to the same attribute, data in the same row belongs to the same tuple, and the data of one row can be stored in the same data node. The base table may have one or more global secondary indexes, each of which may contain a portion of the column data in the base table. Like the base table data, the data of a global secondary index of the base table may also be stored in a distributed manner in at least one data node.

The coordination node may use a data manipulation language (DML) to perform data operations on the global secondary index. These operations may include insert/update/delete (IUD) operations, that is, inserting, modifying, or deleting tuple data in the global secondary index. At present, when a coordination node performs an IUD operation on the global secondary index, the usual method is to traverse the global secondary index data stored in the data nodes to locate the tuple that requires the IUD operation, and then perform the corresponding operation on that tuple. Because the amount of data stored in the data nodes is typically large, locating tuples by traversal is inefficient, and the performance of IUD operations on the global secondary index is correspondingly low.

Disclosure of Invention

The application provides a method and a device for processing a global secondary index, which are used for improving the efficiency and performance of data operation on the global secondary index of a base table.

In a first aspect, the present application provides a method for processing a global secondary index, applied to a first node, the method comprising: generating a first indication, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; and sending the first indication to a second node for storing the global secondary index, so that the second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication.

In this method, a data filtering condition is set on an attribute of the global secondary index that does not contain null values, and the condition identifies the tuple in the global secondary index on which the IUD operation is to be performed. Based on the attribute value set in the filtering condition, the data node that stores the tuple can be determined among the data nodes storing the global secondary index, and the tuple can then be queried from that data node according to the same attribute value. The global secondary index data stored in the data nodes does not need to be traversed. This improves the speed and efficiency of determining the tuple, and thereby the efficiency and performance of the IUD operation on it.
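By way of illustration only, the difference between locating a tuple by traversal and locating it from the value in the filtering condition can be sketched as follows. The routing rule `node_for`, the `IndexTuple` type, and the three-node layout are assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTuple:
    key: int      # value of a non-null indexed attribute (hypothetical)
    payload: str


NUM_NODES = 3  # hypothetical number of data nodes


def node_for(key: int) -> int:
    """Hypothetical routing rule: the key value alone decides the data node."""
    return key % NUM_NODES


# Each data node keeps its share of the index, keyed by attribute value.
nodes: list[dict[int, IndexTuple]] = [dict() for _ in range(NUM_NODES)]
for k in range(10):
    nodes[node_for(k)][k] = IndexTuple(k, f"row-{k}")


def delete_by_scan(key: int) -> int:
    """Baseline: inspect every tuple on every node; return tuples inspected."""
    inspected = 0
    for store in nodes:
        for k in list(store):
            inspected += 1
            if k == key:
                del store[k]
    return inspected


def delete_by_condition(key: int) -> int:
    """Use the value in the condition to pick one node and look up directly."""
    store = nodes[node_for(key)]
    return 1 if store.pop(key, None) is not None else 0
```

In this sketch the scan touches every stored tuple, whereas the condition-based delete touches one node and one entry.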

In one possible design, the global secondary index contains at least one attribute in a base table; the first target attribute comprises a first attribute and/or a second attribute; wherein the first attribute is one or more attributes of the at least one attribute, and the second attribute is an attribute that indicates location information, the location information indicating the location, in the base table, of the tuple to which the data of the at least one attribute belongs.

In this method, the data nodes typically store data in a binary tree data structure. Because the first target attribute does not contain null values, the first attribute does not contain null values either. By specifying the data value corresponding to the first attribute, the data node where the tuple is located can be determined, and the leaf node where the tuple is located in the binary tree storage structure can be determined within that data node. The tuple in the global secondary index can therefore be selected directly from the leaf node, without traversing all the data in the data node, which improves the efficiency of searching for the tuple. In addition, the first attribute is an attribute contained in the base table, so the filtering condition of the tuple can be set based on the actual data in the base table, which makes the method highly practical.
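As a rough sketch of how a known attribute value leads directly to one leaf, the following example models the leaves of a tree-structured store inside a single data node as sorted lists with separator keys. The leaf contents, the separator layout, and the function names are hypothetical; a real storage engine would use its own page format.

```python
import bisect

# Hypothetical leaf pages of one data node, ordered by index-key value.
leaves = [
    [(3, "a"), (8, "b")],
    [(12, "c"), (17, "d")],
    [(21, "e"), (30, "f")],
]

# Separator keys: the smallest key of every leaf except the first.
separators = [leaf[0][0] for leaf in leaves[1:]]


def find_leaf(key: int) -> int:
    """Return the position of the only leaf that can contain `key`."""
    return bisect.bisect_right(separators, key)


def lookup(key: int):
    """Search one leaf only; return the matching entry or None."""
    leaf = leaves[find_leaf(key)]
    keys = [k for k, _ in leaf]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return leaf[i]
    return None
```

Only the selected leaf is searched; the other leaves are never read.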

In one possible design, the second node includes at least one data node;

when the first target attribute only comprises the first attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located; or when the first target attribute only comprises the second attribute, the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the at least one data node; or when the first target attribute includes the first attribute and the second attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located, and the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the first data node.

In the method, at least one data node which possibly stores the first tuple can be determined from at least one data node storing the global secondary index according to the range to which the attribute value of the first attribute belongs, and the first tuple can be queried from the determined at least one data node. The method can avoid traversing the data in all the data nodes to query the first tuple, and can improve the speed and efficiency of positioning the first tuple, thereby improving the efficiency and performance of performing IUD operation on the first tuple. The storage location of the first tuple in the data node may then be further determined based on the attribute value of the second attribute. By combining the attribute values of the first attribute and the second attribute, the first tuple can be queried from the determined specific storage position in the at least one data node, so that the speed and efficiency of positioning to the first tuple can be further improved, and the efficiency and performance of performing IUD operation on the first tuple are further improved.
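The three cases above can be sketched as follows, under the assumption that the index is range-partitioned by the first attribute and that the second attribute is a location string. The partition bounds, the `data_nodes` contents, and the `locate` function are all hypothetical names introduced for illustration.

```python
import bisect
from typing import Optional

# Hypothetical range partitioning of the index by the first attribute:
# node 0 holds keys < 100, node 1 holds 100..199, node 2 holds keys >= 200.
UPPER_BOUNDS = [100, 200]

# Each data node maps (first attribute value, location) -> tuple payload.
data_nodes = [
    {(15, "dn1:7"): "t1", (15, "dn0:2"): "t2"},
    {(120, "dn0:9"): "t3"},
    {(250, "dn1:4"): "t4"},
]


def locate(first: Optional[int] = None, second: Optional[str] = None):
    """Return (node_id, key) pairs for index tuples matching the condition."""
    if first is None and second is None:
        raise ValueError("the condition must fix at least one attribute")
    if first is not None:
        # The first attribute selects a single candidate data node.
        candidates = [bisect.bisect_right(UPPER_BOUNDS, first)]
    else:
        # Only the second attribute is given: every data node is a candidate.
        candidates = list(range(len(data_nodes)))
    hits = []
    for node_id in candidates:
        for key in data_nodes[node_id]:
            if first is not None and key[0] != first:
                continue
            if second is not None and key[1] != second:
                continue
            hits.append((node_id, key))
    return hits
```

With both attributes supplied, the first narrows the search to one data node and the second singles out the tuple within it.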

In one possible design, the location information corresponding to each tuple includes first location information and second location information; the first location information is used for indicating a second data node where the tuple is located, and the second location information is used for indicating a storage location of the tuple in the second data node.

In this method, the node where the data is located can be quickly determined, among the plurality of nodes storing the base table, based on the first location information in the location information corresponding to the data, and a more precise storage location can then be determined within that node based on the second location information. Based on the first location information and the second location information together, the data can therefore be located quickly and efficiently.
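A minimal sketch of two-part location information is given below. The field names `node_id` and `slot`, and the dictionary layout of the base table nodes, are assumptions for illustration; the application does not prescribe a concrete encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node_id: int  # first location information: which data node holds the tuple
    slot: int     # second location information: where it sits inside that node


# Hypothetical base table spread over two data nodes, keyed by slot.
base_table_nodes = {
    0: {0: ("alice", 30), 1: ("bob", 25)},
    1: {0: ("carol", 41)},
}


def fetch(loc: Location):
    """Resolve a location in two steps: pick the node, then the slot."""
    return base_table_nodes[loc.node_id][loc.slot]
```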

In one possible design, the first attribute is an index key of the global secondary index; or, a part of the first attribute is an index key of the global secondary index, and the other attributes except the part of the first attribute are additional keys of the global secondary index.

In the method, the attribute belonging to the base table in the global secondary index can be used as an index key of the global secondary index or an additional key of the global secondary index, so that the flexibility is high.

In one possible design, the second attribute is an additional key of the global secondary index.

In the method, the attribute which does not belong to the base table in the global secondary index can be used as an additional key of the global secondary index, so that the index key of the global secondary index only comprises the attribute which belongs to the base table, the consistency of the index key and the base table data can be ensured, and the processing such as the query, the screening and the filtering of the base table data can be conveniently carried out according to the index key.

In one possible design, the global secondary index further includes version information for indicating a version of a tuple to which the data of the at least one attribute belongs, wherein the version of each tuple is used to indicate whether the data of the corresponding tuple is valid.

In this method, the version information is used to determine the validity of the tuple data, so that the range of valid data can be determined based on the version information. This avoids processing invalid data and helps ensure the correctness of the data being processed.
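The application does not specify how version information encodes validity. Purely as an assumption, the sketch below uses creation and deletion stamps compared against a snapshot number, in the style of multi-version concurrency control; `IndexEntry`, `created_at`, `deleted_at`, and `snapshot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    key: str
    location: tuple
    created_at: int                   # hypothetical version stamp
    deleted_at: Optional[int] = None  # None means the entry was never deleted


def is_valid(entry: IndexEntry, snapshot: int) -> bool:
    """An entry is valid if it was created at or before the snapshot
    and has not been deleted at or before the snapshot."""
    if entry.created_at > snapshot:
        return False
    return entry.deleted_at is None or entry.deleted_at > snapshot


def valid_entries(entries, snapshot: int):
    """Filter out entries whose version marks them as invalid."""
    return [e for e in entries if is_valid(e, snapshot)]
```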

In one possible design, before generating the first indication, the method further comprises: receiving a second indication, wherein the second indication is used for indicating to update or delete first data in the base table, and the first data comprises the first tuple; instructing, according to the second indication, a third node that stores the first data to update or delete the first data; and receiving first target information from the third node, and determining the first indication according to the first target information. When the second indication is used for indicating to update the first data, the first target information comprises the data belonging to the at least one attribute in the first data before the update, the location information corresponding to the first data before the update, the data belonging to the at least one attribute in the first data after the update, and the location information corresponding to the first data after the update. When the second indication is used for indicating to delete the first data, the first target information comprises the data belonging to the at least one attribute in the first data and the location information corresponding to the first data. The location information corresponding to any data is used for indicating the location of that data in the base table.

In this method, the global secondary index is updated together with the base table, so that the data in the global secondary index changes in step with the data in the base table. This maintains data consistency between the global secondary index and the base table and improves the correctness and usability of the data in the global secondary index.
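The update path described above can be sketched in three steps: the node storing the base table applies the update and returns the old and new indexed values with their locations, the coordinating side turns that information into an indication, and the node storing the index applies it. The table contents, the `INDEXED` attribute list, the assumption that an update writes the row to a new slot, and all function names are hypothetical.

```python
from itertools import count

INDEXED = ("city",)  # attributes contained in the index (hypothetical)

# Hypothetical "third node": base table rows keyed by slot.
base_rows = {0: {"id": 1, "city": "Paris", "age": 30}}
_next_slot = count(1)

# Hypothetical "second node": index entries of (city, location).
index = {("Paris", ("dn0", 0))}


def base_update(slot: int, changes: dict) -> dict:
    """Apply the update on the base table and return the target information.
    Assumption: an updated row is written to a new slot."""
    old = base_rows.pop(slot)
    new = {**old, **changes}
    new_slot = next(_next_slot)
    base_rows[new_slot] = new
    return {
        "old": ({a: old[a] for a in INDEXED}, ("dn0", slot)),
        "new": ({a: new[a] for a in INDEXED}, ("dn0", new_slot)),
    }


def make_indication(info: dict) -> dict:
    """Build the indication for the index from the target information."""
    old_vals, old_loc = info["old"]
    new_vals, new_loc = info["new"]
    return {
        "op": "update",
        "condition": (old_vals["city"], old_loc),  # identifies the old tuple
        "new_entry": (new_vals["city"], new_loc),
    }


def index_apply(indication: dict) -> None:
    """Index node: remove the tuple matching the condition, add the new one."""
    index.remove(indication["condition"])
    if indication["op"] == "update":
        index.add(indication["new_entry"])
```

Because the condition carries both the old attribute value and the old location, the index node can find the entry to replace without scanning.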

In one possible design, the method further comprises: when a third indication for indicating to add second data to the base table is received, determining, according to the third indication, a fourth node for storing the second data, and instructing the fourth node to add the second data to the base table; receiving second target information from the fourth node, wherein the second target information comprises the data belonging to the at least one attribute in the second data and the location information corresponding to the second data, and the location information corresponding to the second data is used for indicating the location of the second data in the base table; and updating the global secondary index according to the second target information.
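The insert path can be sketched in the same spirit. The routing rule in `choose_node`, the use of the list position as the slot, and the sorted list standing in for the index are assumptions made for this example only.

```python
import bisect

NUM_BASE_NODES = 2  # hypothetical number of nodes storing the base table

# Each base table node is a list of rows; the list position acts as the slot.
base_nodes = [[] for _ in range(NUM_BASE_NODES)]

# Index entries kept in sorted order: (city, (node_id, slot)).
index_entries = []


def choose_node(row: dict) -> int:
    """Hypothetical routing rule based on the row's id."""
    return row["id"] % NUM_BASE_NODES


def base_insert(row: dict) -> dict:
    """Add the row to the base table and return the target information."""
    node_id = choose_node(row)
    base_nodes[node_id].append(row)
    slot = len(base_nodes[node_id]) - 1
    return {"values": {"city": row["city"]}, "location": (node_id, slot)}


def index_insert(info: dict) -> None:
    """Add an index entry built from the indexed value and the location."""
    bisect.insort(index_entries, (info["values"]["city"], info["location"]))
```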

In a second aspect, the present application provides a method of processing a global secondary index, applied to a second node, the method comprising: receiving a first indication from a first node, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in a global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; the global secondary index is stored in the second node; and updating or deleting the first tuple meeting the first condition in the global secondary index according to the first indication.

In one possible design, the second node includes at least one data node; before updating or deleting the first tuple in the global secondary index that satisfies the first condition according to the first indication, the method further comprises: determining the first tuple; wherein said determining said first tuple comprises: when the first target attribute only comprises the first attribute, determining a first data node where the first tuple is located according to the value of the data belonging to the first attribute in the first condition, and searching the data stored in the first data node to obtain the first tuple; or when the first target attribute only comprises the second attribute, searching the at least one data node for obtaining the first tuple according to the value of the data belonging to the second attribute in the first condition; or when the first target attribute comprises the first attribute and the second attribute, determining the first data node where the first tuple is located according to the value of the data belonging to the first attribute in the first condition, and searching for the first tuple in the first data node according to the value of the data belonging to the second attribute in the first condition.

In a third aspect, the present application provides a data processing apparatus for use in a first node, the apparatus comprising: a processing unit, configured to generate a first indication, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; and a transceiver unit, configured to send the first indication to a second node for storing the global secondary index, so that the second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication.

In one possible design, the second node includes at least one data node; when the first target attribute only comprises the first attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located; or when the first target attribute only comprises the second attribute, the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the at least one data node; or when the first target attribute includes the first attribute and the second attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located, and the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the first data node.

In one possible design, the processing unit is further configured to, before generating the first indication: receive, through the transceiver unit, a second indication, wherein the second indication is used for indicating to update or delete first data in the base table, and the first data comprises the first tuple; instruct, through the transceiver unit and according to the second indication, a third node that stores the first data to update or delete the first data; and receive, through the transceiver unit, first target information from the third node, and determine the first indication according to the first target information. When the second indication is used for indicating to update the first data, the first target information comprises the data belonging to the at least one attribute in the first data before the update, the location information corresponding to the first data before the update, the data belonging to the at least one attribute in the first data after the update, and the location information corresponding to the first data after the update. When the second indication is used for indicating to delete the first data, the first target information comprises the data belonging to the at least one attribute in the first data and the location information corresponding to the first data. The location information corresponding to any data is used for indicating the location of that data in the base table.

In one possible design, the processing unit is further configured to: when a third indication for indicating to add second data to the base table is received through the transceiver unit, determine, according to the third indication, a fourth node for storing the second data, and instruct the fourth node to add the second data to the base table; receive, through the transceiver unit, second target information from the fourth node, wherein the second target information comprises the data belonging to the at least one attribute in the second data and the location information corresponding to the second data, and the location information corresponding to the second data is used for indicating the location of the second data in the base table; and update the global secondary index according to the second target information.

In a fourth aspect, the present application provides a data processing apparatus for use in a second node, the apparatus comprising: a transceiver unit, configured to receive a first indication from a first node, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; the global secondary index is stored in the second node; and a processing unit, configured to update or delete the first tuple meeting the first condition in the global secondary index according to the first indication.

In a fifth aspect, the present application provides a data processing apparatus comprising a memory and at least one processor; the memory is used for storing a computer program; the processor is configured to execute a computer program stored in the memory to implement the method described in the first aspect or any of the possible designs of the first aspect, or to implement the method described in the second aspect or any of the possible designs of the second aspect.

In a sixth aspect, the present application provides a distributed database system comprising the first node, the at least one second node and the at least one third node of the first aspect or any of the possible designs of the first aspect, or comprising the first node, the at least one second node and the at least one third node of the second aspect or any of the possible designs of the second aspect.

In a seventh aspect, the present application provides a computer storage medium having a computer readable program stored therein, which when run on a computer causes the computer to perform the method described in the above first aspect or any of the possible designs of the first aspect, or causes the computer to perform the method described in the above second aspect or any of the possible designs of the second aspect.

In an eighth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method described in the first aspect or any of the possible designs of the first aspect, or causes the computer to perform the method described in the second aspect or any of the possible designs of the second aspect.

In a ninth aspect, embodiments of the present application provide a chip for reading a computer program stored in a memory and performing the method described in the first aspect or any of the possible designs of the first aspect, or performing the method described in the second aspect or any of the possible designs of the second aspect.

In a tenth aspect, embodiments of the present application provide a chip system, where the chip system includes a processing unit, configured to support a computer device to implement the method described in the first aspect or any of the possible designs of the first aspect, or implement the method described in the second aspect or any of the possible designs of the second aspect.

In one possible design, the chip system further includes a memory for storing programs and data necessary for the computer device.

In one possible design, the chip system may be formed from a chip or may include a chip and other discrete devices.

Advantageous effects of the second aspect to the tenth aspect are described with reference to the first aspect, and the detailed description thereof is omitted here.

Drawings

FIG. 1a is a schematic diagram of a distributed database system;

FIG. 1b is a schematic diagram of a flow of IUD manipulation of a global secondary index;

FIG. 2a is a schematic diagram of a distributed database system;

FIG. 2b is a schematic diagram of a flow of IUD manipulation of a global secondary index;

FIG. 3a is a schematic diagram of a distributed database system according to an embodiment of the present disclosure;

FIG. 3b is a schematic diagram of a distributed database system according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a method for processing a global secondary index according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a method of performing IUD manipulation on a base table according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a method for processing a global secondary index according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a method for processing a global secondary index according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of data distribution in a distributed database according to an embodiment of the present application;

FIG. 9 is a schematic diagram of data distribution in a distributed database according to an embodiment of the present application;

FIG. 10 is a schematic diagram of data distribution in a distributed database according to an embodiment of the present application;

FIG. 11 is a schematic diagram of data distribution in a distributed database according to an embodiment of the present application;

FIG. 12 is a flowchart of a method for processing a global secondary index according to an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of a base table, a global secondary index and data distribution thereof according to an embodiment of the present application;

FIG. 14 is a diagram of IUD manipulation of base tables and global secondary indexes and data distribution in a distributed database according to an embodiment of the present application;

FIG. 15 is a diagram of IUD manipulation of base tables and global secondary indexes and data distribution in a distributed database according to an embodiment of the present application;

FIG. 16 is a diagram of IUD manipulation of base tables and global secondary indexes and data distribution in a distributed database according to an embodiment of the present application;

FIG. 17 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;

FIG. 18 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. In the description of the embodiments of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features.

For ease of understanding, a description of concepts related to the present application is given by way of example for reference.

1) A distributed database is a logically unified database formed by connecting physically dispersed database units by a computer network, where each connected database unit may be referred to as a site or node or Data Node (DN), etc. Distributed storage is a data storage technique, which is a technique of storing data in a plurality of individual data units in a decentralized manner.

2) A key-value database, or key-value store, is a data storage paradigm used to store, retrieve, and manage associative arrays, a data structure more commonly referred to as a dictionary or hash table. A dictionary contains a collection of objects or records, each of which has a number of different fields, and each field contains data. The key-value database may store data as a set of key-value pairs, where a key is a unique identifier that may be used to identify the data and may also be referred to as a field or attribute.

3) A base table (i.e., table) is an object in a database for storing data, and is a collection of structured data. A base table may be defined as a collection of at least one attribute (also called a column or key). An attribute is a column of data in the base table, and thus the base table can also be understood as a collection of at least one column of data.

4) A primary key: a base table typically has a column, or a combination of columns, whose values uniquely identify each row in the base table. Such a column or combination of columns is called the primary key of the base table, and it can be used to enforce the entity integrity of the base table. When creating or changing the base table, the primary key can be created by defining a primary key constraint. A base table can have only one primary key constraint, and the columns in the primary key constraint cannot contain null values. Because primary key constraints ensure that data is unique, they are often used to define identification columns.

5) A distribution key is a column, or a combination of columns, in a distributed database that determines the database unit in which a particular data row of the base table is stored.
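As a minimal sketch of how a distribution key can select a database unit, the example below hashes the distribution key columns of a row and takes the result modulo the number of nodes. Hash distribution is only one possible scheme, and `node_for_row`, the `"|"` separator, and the choice of CRC-32 are assumptions made for illustration.

```python
import zlib


def node_for_row(row: dict, distribution_key: tuple, num_nodes: int) -> int:
    """Map a row to a node using only the values of its distribution key.
    CRC-32 is used because it is stable across runs, unlike Python's
    built-in hash() for strings."""
    raw = "|".join(str(row[col]) for col in distribution_key).encode("utf-8")
    return zlib.crc32(raw) % num_nodes
```

Two rows that agree on every distribution key column are always routed to the same node, whatever their other columns contain.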

6) A global secondary index (global secondary index, GSI), also known as a global index, is one type of index, which is a sorted data structure in a database system that is used to facilitate quick querying and updating of data in a base table of the database. The global secondary index is a global index that takes effect for data nodes in the distributed database.

7) An index key (index key) is a combination of one or more columns used to identify an index item.

8) Null value (NULL): indicates that a value is unknown. A null value is a special marker used in structured query language (SQL). It generally indicates that data is unknown, is not applicable, or will be added later, and it marks data attributes that are unknown or missing in the database. The null value meets the requirement that a relational database management system (RDBMS) support "missing information and inapplicable information".
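The following sketch, using Python's built-in `sqlite3` module, illustrates why an equality condition cannot be used to locate a tuple through an attribute whose value is null. The table `t` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "Paris"), (2, None)])

# Comparing NULL with NULL yields NULL (unknown), not true.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# An equality condition therefore never matches the row whose city is NULL.
eq_rows = conn.execute("SELECT id FROM t WHERE city = NULL").fetchall()

# Only the dedicated IS NULL predicate finds that row.
is_null_rows = conn.execute("SELECT id FROM t WHERE city IS NULL").fetchall()
```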

9) A Coordinator Node (CN) is a node that performs data scheduling. The coordination node can manage the data nodes which establish communication connection with the coordination node and schedule data among the data nodes.

A Data Node (DN) is a node for processing and storing data. The data node may receive the data scheduling information of the coordinating node, and transmit and receive data or perform data processing and storage according to the received data scheduling information.

The device forms of the coordination node and the data node provided in the embodiment of the present application may be electronic devices, or may be functional modules or units disposed in the electronic devices, or may be chips, integrated circuits, components, and the like disposed in the electronic devices.

10) B-tree: a self-balancing multi-way tree that keeps data ordered. In general terms, a B-tree is a generalization of the binary search tree in which a node can have more than two child nodes. This data structure allows searching, sequential access, insertion, and deletion to be completed in logarithmic time. B-trees are well suited to external storage and are often used in the implementation of databases and file systems. The data nodes may use a B-tree structure to store base table data and global secondary index data.
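The ordered operations that a B-tree supports can be illustrated with a sorted list and binary search. This is a stand-in rather than a real B-tree: searching is logarithmic, but inserting into or deleting from a Python list shifts elements and is linear in the worst case. The class name `SortedIndex` is hypothetical.

```python
import bisect


class SortedIndex:
    """Ordered key set with binary search (a stand-in for a B-tree)."""

    def __init__(self):
        self._keys = []

    def insert(self, key) -> None:
        bisect.insort(self._keys, key)

    def contains(self, key) -> bool:
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def delete(self, key) -> bool:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            return True
        return False

    def range(self, lo, hi):
        """Return the keys k with lo <= k < hi, in sorted order."""
        start = bisect.bisect_left(self._keys, lo)
        stop = bisect.bisect_left(self._keys, hi)
        return self._keys[start:stop]
```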

By way of example, the electronic device may be a terminal device, which may also be referred to as user equipment (UE), a mobile terminal (MT), and so on. Terminal devices include, but are not limited to, a mobile phone, a tablet computer, a laptop computer, a palmtop computer, a mobile internet device (MID), a wearable device (e.g., a smart watch or a smart bracelet), a vehicle (e.g., a car, an electric car, an airplane, a boat, a train, or a high-speed train), an on-board device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a smart home device (e.g., a refrigerator, a television, an air conditioner, or an electricity meter), a smart robot, workshop equipment, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, or a flying device (e.g., a smart robot, a hot-air balloon, an unmanned aerial vehicle, or an aircraft). Exemplary terminal devices in embodiments of the present application include, but are not limited to, terminal devices running any of a variety of operating systems.

It should be understood that in embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where each of a, b, and c may be single or multiple.

Currently, in a distributed database, performing an IUD operation on a global secondary index of a base table requires traversing the global secondary index data stored in a data node to locate the tuple on which the IUD operation is to be performed, and only then performing the corresponding IUD operation on that tuple. As a result, the efficiency and performance of IUD operations on the global secondary index are low.

In addition, IUD operations on the global secondary index are currently performed mainly to keep the global secondary index data consistent with the base table data (i.e., to keep the data in the global secondary index consistent with its corresponding data in the base table), so IUD operations on the global secondary index are mostly performed in association with IUD operations on the base table. For IUD operations on base tables and global secondary indexes, the schemes commonly employed at present are the asynchronous scheme and the synchronous scheme.

In the asynchronous scheme, when an IUD operation is performed on data in the base table, the IUD operation on the base table is performed immediately, while the IUD operation on the global secondary index (GSI) is delayed. That is, for an IUD statement issued by a user, the database system operates only on the base table data and defers the operation on the index data in the GSI. Although this preserves the performance of the IUD operation, only eventual consistency between the global secondary index data and the base table data can be supported, which is detrimental to the availability of the global secondary index.

In particular, referring to FIG. 1a, a database system architecture employing an asynchronous scheme may include a plurality of database base tables, wherein the database base tables are stored in a plurality of shards. The global secondary index established for the base table is stored in a plurality of shards of the database using a different distribution scheme than the base table.

As shown in fig. 1b, the flow of IUD operation on the global secondary index in this scheme may include the following steps 1 to 5:

Step 1: the database system receives IUD statements issued by the user for IUD operations on the base table.

Step 2: the database system performs IUD operations on the base table.

Step 3: the database system determines whether to perform the IUD operation on the global secondary index of the base table; if yes, step 4 is executed; otherwise, step 5 is executed.

The database system determines to perform the IUD operation on the global secondary index of the base table when the asynchronous update period is reached, and otherwise determines not to perform it.

Step 4: the database system performs IUD operations on the global secondary index of the base table.

Step 5: the database system waits for a new instruction statement.

The asynchronous scheme can effectively realize asynchronous updating of the global secondary index. However, until the global secondary index is asynchronously updated, its data is inconsistent with the data of the base table, so the scheme guarantees only eventual consistency between the global secondary index data and the base table data. The results of operations such as data queries that use the global secondary index are not guaranteed to be correct until the asynchronous update period arrives. In other words, by delaying the IUD operation on the global secondary index, the asynchronous scheme improves the overall efficiency of IUD operations at the cost of a weaker data consistency guarantee: the data in the global secondary index cannot be synchronized in time, which reduces the availability of the global secondary index.
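By way of illustration only, the asynchronous flow of steps 1 to 5 can be sketched as a toy single-process model. The class name `AsyncIndexedTable`, the `update_period` parameter (counted here in statements rather than time), and the dictionary layout are assumptions made for this sketch and are not taken from the scheme itself.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncIndexedTable:
    """Toy model of the asynchronous scheme; all names are hypothetical."""
    update_period: int                           # assumed: flush the index every N statements
    base: dict = field(default_factory=dict)     # primary key -> indexed column value
    index: dict = field(default_factory=dict)    # indexed column value -> set of primary keys
    pending: list = field(default_factory=list)  # deferred index operations

    def execute(self, op, pk, value=None):
        # Step 2: the base table is modified immediately.
        old = self.base.get(pk)
        if op == "delete":
            self.base.pop(pk, None)
        else:  # "insert" or "update"
            self.base[pk] = value
        # The matching index operation is only recorded, not applied.
        self.pending.append((pk, old, None if op == "delete" else value))
        # Step 3: apply deferred operations only when the period is reached.
        if len(self.pending) >= self.update_period:
            self.flush()  # Step 4

    def flush(self):
        for pk, old, new in self.pending:
            if old is not None:
                self.index.get(old, set()).discard(pk)
            if new is not None:
                self.index.setdefault(new, set()).add(pk)
        self.pending.clear()

    def lookup(self, value):
        return sorted(self.index.get(value, set()))


table = AsyncIndexedTable(update_period=2)
table.execute("insert", 1, "red")
stale = table.lookup("red")   # index not yet updated, so the new row is not found
table.execute("insert", 2, "red")
fresh = table.lookup("red")   # period reached, index now matches the base table
```

Between the two statements, a lookup through the index misses a row that already exists in the base table, which is the eventual-consistency window described above.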

In the synchronous scheme, when an IUD operation is performed on data in the base table, the IUD operation on the base table is performed immediately, and the IUD operation on the global secondary index is implemented either by read-before-write or by direct update, depending on whether the base table primary key exists. The synchronous scheme can support strong consistency between the global secondary index data and the base table data. However, for an IUD statement issued by a user, the database system uses an approximate membership query to confirm whether the base table primary key value exists, and then updates the GSI accordingly by read-before-write or by performing the IUD operation directly. Because the existence of the base table primary key value has to be checked, the performance of the IUD operation is affected.

Specifically, FIG. 2a is a schematic diagram of a database system architecture employing the synchronous scheme. As shown in FIG. 2a, the data distribution manner in the database system architecture applying the synchronous scheme is basically identical to that in the asynchronous scheme, but the underlying data organization structure of the database system in this scheme takes the form of a log-structured merge (LSM) tree. When an IUD operation is performed on the data in the base table, the IUD operation on the base table is performed immediately; for the global secondary index, after the existence of the base table primary key value is determined, the IUD operation on the global secondary index is performed synchronously according to how the base table data is actually stored.

As shown in fig. 2b, the flow of IUD operation on the global secondary index in this scheme may include the following steps 1 to 6:

Step 1: after performing the IUD operation on the base table according to the IUD statement issued by the user, the database system extracts the primary key data of the base table involved in the IUD statement.

Step 2: the database system performs an approximate member query on the LSM data structure.

Step 3: the database system determines whether the primary key data of the base table exists in the LSM data structure, if yes, the step 4 is executed, and if not, the step 6 is executed.

Step 4: the database system performs read-before-write.

Step 5: the database system deletes the data corresponding to the primary key data of the base table in the global secondary index. Then step 6 is performed.

Step 6: the database system performs IUD operations on the global secondary index.
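By way of illustration only, steps 1 to 6 can be sketched with a small Bloom filter standing in for the approximate membership query. The names `BloomFilter`, `sync_upsert`, and `stats` are hypothetical, and the sketch makes one simplifying assumption: the old row is read before the base row is overwritten, so that the stale index entry can be found.

```python
import hashlib


class BloomFilter:
    """Minimal approximate membership structure: no false negatives, possible false positives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def sync_upsert(base, index, seen, pk, new_value, stats):
    """Write one row and synchronously maintain the index (hypothetical helper)."""
    # Steps 2 and 3: approximate membership query on the primary key.
    if seen.might_contain(pk):
        # Step 4: read-before-write to fetch the old indexed value.
        stats["reads"] += 1
        old_value = base.get(pk)
        # Step 5: delete the stale index entry if the key really existed.
        if old_value is not None:
            index.get(old_value, set()).discard(pk)
    # Step 6: apply the operation to the index.
    base[pk] = new_value
    seen.add(pk)
    index.setdefault(new_value, set()).add(pk)


base, index, seen, stats = {}, {}, BloomFilter(), {"reads": 0}
sync_upsert(base, index, seen, 1, "red", stats)    # new key: direct write, no read
sync_upsert(base, index, seen, 1, "blue", stats)   # existing key: read-before-write
```

The extra read on the second call is the cost that the membership check and read-before-write add to every IUD statement touching an existing key.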

The above synchronous scheme can solve the problem of synchronously updating the global secondary index, but it targets only database systems whose underlying layer is organized as an LSM tree and cannot be applied generally to database systems built on other data structures. In addition, an approximate membership query on the primary key is required before the IUD operation is performed on the data in the global secondary index, which consumes execution time and affects the performance of the IUD operation. The scheme is a global secondary index DML implementation targeted at distributed non-relational databases whose underlying layer is organized as an LSM tree, and it is not universal: it has to take the characteristics of an LSM-tree-based distributed database into account. For example, if merging an SSTable with a memory table (MemTable) requires sorting, the sorting process also has to be considered for IUD operations on the global secondary index.

In summary, the efficiency and performance of IUD operations on the global secondary index are low in current schemes, and it is difficult to achieve both data consistency and IUD operation performance. In view of this, the embodiments of the present application provide a method and an apparatus for processing a global secondary index, which are used to improve the efficiency and performance of IUD operations on the global secondary index, to improve the consistency between the base table data and the global secondary index data, and to improve the availability of the global secondary index.

The scheme provided by the embodiment of the application can be applied to a scenario in which the global secondary index of a base table in a distributed database is processed. The base table includes at least one column of data, and the at least one column of data can be divided into at least one row of data; that is, the base table includes (at least one row) × (at least one column) of data. In some embodiments of the present application, the data in the same column of the base table belongs to the same attribute; the data in the same row of the base table belongs to the same tuple and can be stored in the same storage location. At least one column of data in the base table is the distribution key of the base table, and the distribution key of the base table is used for determining the storage location of the data in the base table. Optionally, at least one column of data in the base table may also be used as a primary key. The distribution key and the primary key of the base table may be the same or different. The distribution key and the primary key of the base table may be data of preset columns in the base table, or may be data determined according to a user instruction.

In this embodiment of the present application, the global secondary index of the base table includes data of some of the columns of the base table and location information corresponding to that data, where the columns may be selected and set by a user from the columns of the base table, and the location information corresponding to the data is used to indicate the location, in the base table, of the tuple to which the data belongs. Optionally, the global secondary index of the base table may further include version information (also referred to as transaction information) corresponding to the data in the base table, where the version information corresponding to the data is used to determine whether the corresponding data is valid. In some embodiments of the present application, data located in the same column of the global secondary index belongs to the same attribute; data in the same row of the global secondary index belongs to the same tuple and can be stored in the same storage location. At least one column of data in the global secondary index is the distribution key of the global secondary index, and the distribution key of the global secondary index is used for determining the node where the data in the global secondary index is located. In some embodiments of the present application, the global secondary index includes index keys, additional keys (which may also be referred to as additional columns, hidden columns, covering columns, or included columns), and the like. The index key is at least one column of data in the global secondary index and can be used for querying specific data, thereby speeding up queries.
The additional key comprises at least one column of data in the global secondary index that is invisible to the user: when the user instructs the global secondary index to be displayed, the data displayed to the user does not include the additional key data of the global secondary index. The distribution key and the index key of the global secondary index may be the same or different. The distribution key and the index key of the global secondary index may be data of preset columns or may be data determined according to a user instruction.
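By way of illustration only, the roles of the index key, the additional key, and the distribution key can be sketched as follows. The class `GsiSchema`, its field names, and the column names `city`, `base_location`, and `version` are all hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GsiSchema:
    """Hypothetical description of the columns of one global secondary index."""
    index_keys: tuple        # columns copied from the base table, usable for lookups
    additional_keys: tuple   # hidden columns, e.g. location and version information
    distribution_key: tuple  # columns that decide which data node stores a tuple

    def display(self, row):
        """Return the part of a GSI tuple shown to the user; additional keys stay hidden."""
        return {col: row[col] for col in self.index_keys}


schema = GsiSchema(
    index_keys=("city",),
    additional_keys=("base_location", "version"),
    distribution_key=("city",),
)
row = {"city": "Shenzhen", "base_location": ("dn2", 17), "version": (100, None)}
shown = schema.display(row)
```

Here the distribution key happens to equal the index key; as stated above, the two may also differ.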

In the scheme provided by the embodiment of the application, the user can perform IUD operation on the data in the global secondary index, so that the global secondary index is updated. The IUD operation on the global secondary index may be performed after the IUD operation on the base table to ensure strong consistency of the base table data and the global secondary index data.

The scheme provided by the embodiment of the application can be applied to a distributed database system comprising at least one coordination node and at least one data node. The data nodes are used for storing base table data and/or global secondary index data, and the coordination nodes are used for managing, operating and scheduling the base table data.

Taking the system shown in FIG. 3a, which includes one coordination node and a plurality of data nodes, as an example, the data included in the base table and the data included in the global secondary index may be stored in a distributed manner in the plurality of data nodes, for example, the data nodes 1 to N shown in FIG. 3a, where N is a positive integer. Of course, the data contained in the global secondary index may also be stored in a distributed manner in only some of the plurality of data nodes. That is, not all data nodes in the distributed database system necessarily store data of the global secondary index; how the global secondary index data is distributed across the data nodes can be configured according to the actual situation. The base table data 1 to N shown in FIG. 3a constitute the complete base table data, and the global secondary index data 1 to N shown in FIG. 3a constitute the complete global secondary index data.

In the distributed database system, a communication connection is established between each coordination node and at least one data node, and IUD operations can be performed on the base table data and global secondary index data stored in the data nodes according to user instructions. Specifically, after receiving an IUD statement input by a user, the coordination node parses the statement, determines the data node storing the base table data according to the base table distribution key involved in the statement, and instructs that data node to perform the corresponding IUD operation, as shown by process a in FIG. 3a. At the data node, the base table data is updated first, and then the information for updating the global secondary index, such as the base table distribution key, the global secondary index key, and the base table data location information, is sent back to the coordination node, as shown by process b in FIG. 3a. The coordination node sends the global secondary index key, the base table data location information, the base table distribution key information, and the like to the data node storing the global secondary index according to the global secondary index distribution key, and that data node performs the IUD operation on the global secondary index according to this information to update the global secondary index, as shown by process c in FIG. 3a. The distribution manner of the base table data and the global secondary index data in the data nodes may be hash distribution, range distribution, and the like, which is not particularly limited in the embodiment of the present application. For example, for hash-distributed data, a hash function may be applied to the data, and the data node where the data is located may be determined according to the calculation result. For range-distributed data, the data may be stored on different data nodes directly according to the range in which the data falls.
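By way of illustration only, the two distribution manners mentioned above can be sketched as routing functions. The function names, the use of CRC32 as the hash function, and the convention that node i stores keys up to `upper_bounds[i]` are assumptions made for this sketch.

```python
import bisect
import zlib


def hash_route(key, node_count):
    """Hash distribution: a deterministic hash of the key picks the data node."""
    return zlib.crc32(repr(key).encode()) % node_count


def range_route(key, upper_bounds):
    """Range distribution: node i stores keys <= upper_bounds[i]; the last node stores the rest."""
    return bisect.bisect_left(upper_bounds, key)


# With bounds [100, 200] there are three nodes: (..100], (100..200], (200..).
placements = [range_route(k, [100, 200]) for k in (50, 100, 150, 250)]
```

Hash distribution spreads keys evenly but scatters neighboring keys, whereas range distribution keeps neighboring keys on the same node, which is what later allows a data node to be chosen from the range an attribute value falls in.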

In some embodiments of the present application, the distributed storage of the base table and the distributed storage of the global secondary index are independent of each other, and the distribution manner of the base table data and that of the global secondary index data across the data nodes may differ. The coordinating node may store the base table data in at least one data node in the distribution manner (e.g., hash distribution, range distribution, etc.) corresponding to the base table. The coordinating node may store the global secondary index data in at least one data node in the distribution manner (e.g., hash distribution, range distribution, etc.) corresponding to the global secondary index. A given data node may store only base table data, only global secondary index data, or both at the same time.

Illustratively, the coordinating node may store the base table data in a distributed manner at a plurality of data nodes by a distribution algorithm according to the distribution keys of the base table. For the global secondary index of the base table, the coordination node can adopt the same or different distribution mode with the base table, and according to the distribution key of the global secondary index, the global secondary index data is distributed and stored in a plurality of data nodes through a distribution algorithm.
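By way of illustration only, the following sketch routes one row twice, once by the base table distribution key and once by the global secondary index distribution key, following processes a to c of FIG. 3a for an insert. The classes `DataNode` and `Coordinator`, the `route` helper, and the use of a list position as location information are hypothetical simplifications.

```python
import zlib


def route(key, node_count):
    """Assumed hash distribution shared by the base table and the index."""
    return zlib.crc32(repr(key).encode()) % node_count


class DataNode:
    def __init__(self):
        self.base_rows = []   # base table tuples stored on this node
        self.gsi_rows = []    # global secondary index tuples stored on this node

    def insert_base(self, row):
        """Process a: store the row. Process b: return its position for the index."""
        self.base_rows.append(row)
        return len(self.base_rows) - 1

    def insert_gsi(self, index_value, location):
        """Process c: store the index tuple together with the base table location."""
        self.gsi_rows.append((index_value, location))


class Coordinator:
    def __init__(self, nodes):
        self.nodes = nodes

    def insert(self, row, base_key, gsi_key):
        base_node = route(row[base_key], len(self.nodes))
        offset = self.nodes[base_node].insert_base(row)            # a, then b
        gsi_node = route(row[gsi_key], len(self.nodes))
        self.nodes[gsi_node].insert_gsi(row[gsi_key], (base_node, offset))  # c
        return base_node, gsi_node


nodes = [DataNode() for _ in range(3)]
coordinator = Coordinator(nodes)
base_node, gsi_node = coordinator.insert({"id": 7, "city": "Shenzhen"}, "id", "city")
```

Because the two distribution keys differ, the base table tuple and its index tuple may land on different data nodes, and the index tuple carries the location needed to find the base table tuple again.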

In some embodiments of the present application, the coordinating node may also perform data query operations based on the base table. Illustratively, as shown in FIG. 3b, taking a query scenario as an example, a coordination node may include a query parsing module and a query optimizing module, and a data node may include a query executing module. The user may enter, on the coordination node, a query statement indicating the data to be queried and view the returned data on the coordination node. The query parsing module in the coordination node parses the query statement input by the user; the query optimizing module in the coordination node optimizes the parsing result of the query parsing module and sends a query instruction to the query executing module in the data node; and the query executing module in the data node performs the final execution of the data query.

Illustratively, the functions of the query parsing module may be implemented by a parser deployed in a coordination node, the functions of the query optimizing module may be implemented by an optimizer deployed in a coordination node, and the functions of the query executing module may be implemented by an executor deployed in a data node. Of course, the functions of the above modules may also be implemented by other hardware devices, which are not specifically limited in the embodiments of the present application. For example, the above-described execution logic of the coordinating node or the data node may be deployed in the form of program code on the coordinating node and the data node in the distributed database.

In the embodiment of the application, the data nodes do not differ in architecture apart from the data they store, and they support various data distribution modes. The coordination node may be selected by an election algorithm or may adopt an architecture different from that of the data nodes, and there may be multiple coordination nodes depending on the specific deployment of the distributed database system. All modules of the database may be deployed identically on the coordination node and on all data nodes, with the role assumed by each node in the distributed database system set through configuration files.

It should be understood that the system architecture shown in FIG. 3a or FIG. 3b is merely an exemplary illustration of a system architecture to which the present application is applicable and does not limit the system architectures to which the present application is applicable.

The following describes the solution provided in the embodiments of the present application in conjunction with specific embodiments.

Referring to fig. 4, a method for processing a global secondary index according to an embodiment of the present application may include:

S401: the first node generates a first indication, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value.

The first node in the embodiment of the present application may be a coordination node in the distributed database system, used for managing, operating and scheduling the base table data and the global secondary index data, and the second node in the embodiment of the present application may be a data node in the distributed database system, used for storing the base table data and the global secondary index data. In this embodiment, the data structure adopted by the distributed database system may be a B-tree, that is, the second node may store data in the form of a B-tree structure.

In the embodiment of the present application, the global secondary index is a global secondary index of a base table in the distributed database. The base table may contain one or more attributes. Each attribute in the base table is a column of data in the base table, and the data in the base table (i.e., the one or more attributes, or columns of data) may be stored in at least one data node in a distributed manner, distributed by tuple. Data in the same row (i.e., one tuple) of the base table may be stored in the same data node, and each of the at least one data node stores at least one tuple (i.e., at least one row of data) of the base table. The base table may have at least one global secondary index, and the global secondary index of the base table may contain at least one attribute of the base table, the at least one attribute being among the one or more attributes. The second node for storing the global secondary index data may comprise at least one data node, in which the global secondary index data may be stored in a distributed manner by row. Data in the same row (i.e., one tuple) of the global secondary index may be stored in the same data node, and each data node storing the global secondary index stores at least one tuple (i.e., at least one row of data) of the global secondary index.

It should be understood that the global secondary index described in the embodiments of the present application may be any one of the at least one global secondary index of the base table, or the global secondary index described in the embodiments of the present application may be each one of the at least one global secondary index of the base table.

In some embodiments of the present application, the global secondary index of the base table may further include location information, where the location information is used to indicate the location, in the base table, of the base table data that corresponds to the data in the global secondary index. Specifically, the location information is used for indicating the location, in the base table, of the tuple to which the data of at least one attribute belongs, wherein the at least one attribute is an attribute of the base table that is contained in the global secondary index. In some embodiments of the present application, the location information corresponding to each tuple may include first location information and second location information, where the first location information is used to indicate the second data node where the tuple is located, and the second location information is used to indicate the storage location of the tuple in the second data node. The second data node is one of the at least one data node used for storing the base table.

Based on the above method, at least one piece of location information may be included in the global secondary index, wherein each piece of location information may correspond to one tuple in the base table, and each piece of location information is used to indicate a location of the corresponding tuple in the base table. In some embodiments of the present application, the at least one piece of location information in the global secondary index may be an attribute in the global secondary index, or the at least one piece of first location information in the global secondary index may be an attribute in the global secondary index, and the at least one piece of second location information in the global secondary index may be an attribute in the global secondary index.

In some embodiments of the present application, the global secondary index of the base table may further include version information, where the version information is used to indicate the version of the base table data that corresponds to the data in the global secondary index, and the version of each piece of data is used to determine whether the corresponding data is valid. Specifically, the version information is used for indicating the version of the tuple, in the base table, to which the data of the at least one attribute belongs, wherein the version of each tuple is used for indicating whether the data of the corresponding tuple is valid.

Based on the above method, at least one version information may be included in the global secondary index of the base table, wherein each version information corresponds to one tuple in the base table, each version information is used to indicate a version of data of the corresponding one tuple, and the version of data of each tuple is used to determine whether the data of the corresponding tuple is valid. Based on the version information, when the first node queries the base table data, whether the corresponding data in the base table is valid or not can be determined according to the version information contained in the global secondary index of the base table, so that the valid data contained in the base table is determined, and the data to be queried can be obtained from the valid data.

In some embodiments of the present application, the version information corresponding to each tuple may include first version information and second version information. Wherein the first version information may be used to represent an identification of a transaction in which the tuple is inserted in the base table and the second version information may be used to represent an identification of a transaction in which the tuple is deleted in the base table. Whether the tuple is valid can be determined by comparing the first version information and the second version information of the tuple in the base table.
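By way of illustration only, one possible way of comparing the first version information and the second version information is sketched below. The text above does not fix the comparison rule; the snapshot-based rule used here (a tuple is valid for a reader if it was inserted at or before the reader's snapshot and not deleted by then) is an assumption borrowed from common multi-version concurrency control practice, and the names `is_valid` and `snapshot_txn` are hypothetical.

```python
from typing import Optional


def is_valid(insert_txn: int, delete_txn: Optional[int], snapshot_txn: int) -> bool:
    """Assumed rule: inserted at or before the snapshot and not deleted by then.

    insert_txn   -- first version information (transaction that inserted the tuple)
    delete_txn   -- second version information (transaction that deleted it), or None
    snapshot_txn -- hypothetical identifier of the newest transaction the reader may see
    """
    inserted = insert_txn <= snapshot_txn
    deleted = delete_txn is not None and delete_txn <= snapshot_txn
    return inserted and not deleted


checks = [
    is_valid(10, None, 20),  # inserted, never deleted
    is_valid(10, 15, 20),    # deleted before the snapshot
    is_valid(10, 25, 20),    # deleted only after the snapshot
    is_valid(30, None, 20),  # inserted after the snapshot
]
```

Keeping both identifiers in the index tuple lets validity be decided from the index alone, without first fetching the base table tuple.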

In some embodiments of the present application, at least one version information in the global secondary index may be used as an attribute in the global secondary index, or at least one first version information in the global secondary index may be used as an attribute in the global secondary index, and at least one second version information in the global secondary index may be used as an attribute in the global secondary index.

Optionally, the location information and version information contained in the global secondary index may be used as data for the additional keys in the global secondary index.

In some embodiments of the present application, the index key included in the global secondary index has a non-null constraint, i.e., no null value is included in the index key.

In this embodiment of the present application, the first target attribute may be at least one attribute selected by the user from the attributes included in the global secondary index, and the first data value may be the attribute value set by the user for each selected attribute. For example, suppose the attributes included in the global secondary index are attributes 1 to 3. In one example, the first target attribute may be attribute 1, selected by the user from attributes 1 to 3, and the first data value may be data value 1, set by the user for attribute 1; the first tuple meeting the first condition in the global secondary index is then a tuple whose data belonging to attribute 1 takes data value 1. In another example, the first target attribute may be attribute 2 and attribute 3, selected by the user from attributes 1 to 3, and the first data value may include data value 2, set by the user for attribute 2, and data value 3, set by the user for attribute 3; the first tuple meeting the first condition in the global secondary index is then a tuple whose data belonging to attribute 2 takes data value 2 and whose data belonging to attribute 3 takes data value 3.
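By way of illustration only, the two examples above can be expressed as a predicate over tuples. The function name `matches`, the dictionary representation of a tuple, and the placeholder values are assumptions made for this sketch.

```python
def matches(tuple_row, first_condition):
    """True when every first target attribute holds exactly the first data value."""
    return all(tuple_row.get(attr) == value for attr, value in first_condition.items())


gsi = [
    {"attr1": "v1", "attr2": "v2", "attr3": "v3"},
    {"attr1": "v1", "attr2": "x", "attr3": "v3"},
    {"attr1": "y", "attr2": "v2", "attr3": "v3"},
]
# First example: the first target attribute is attribute 1 only.
single = [row for row in gsi if matches(row, {"attr1": "v1"})]
# Second example: the first target attribute is attributes 2 and 3 together.
combined = [row for row in gsi if matches(row, {"attr2": "v2", "attr3": "v3"})]
```

When several attributes are selected, a tuple meets the first condition only if all of them take the specified values.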

In some embodiments of the present application, the first target attribute may include a first attribute and/or a second attribute, where the first attribute is one or more of at least one attribute, and the second attribute is an attribute for indicating location information; wherein the at least one attribute is an attribute belonging to a base table contained in the global secondary index, and the second attribute may include at least one location information as described above. Optionally, the first attribute is a distribution key of a global secondary index.

Because the first target attribute does not contain a null value, the first attribute does not contain a null value either, so the user can specify a data value corresponding to the first attribute and select at least one tuple in the global secondary index by that data value.

In some embodiments of the present application, the second node for storing the global secondary index comprises at least one data node. In one possible case, when the first target attribute includes only the first attribute, the value of the data belonging to the first attribute in the first condition is used to determine the first data node where the first tuple is located, where the first data node is one of the at least one data node. That is, the attribute value of the first attribute is used to determine the data node where the first tuple is located. In a scenario where data nodes store data based on B-tree structures, each data node stores data in the form of a B-tree structure, each B-tree structure includes a plurality of leaf nodes, and the leaf nodes contain the stored data. In this scenario, according to the range to which the attribute value of the first attribute belongs, the data node storing the first tuple can be determined among the at least one data node storing the global secondary index, and the leaf node of the B-tree where the first tuple is located (i.e., the storage area of the first tuple in the data node) can be determined within that data node, so that the first tuple can be obtained by querying that leaf node in the determined data node. In this way, the first tuple can be found without traversing all data in the data node (i.e., the data of the whole B-tree in the data node), which improves the speed and efficiency of locating the first tuple and thus the efficiency and performance of performing the IUD operation on the first tuple.
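By way of illustration only, locating the first tuple by the first attribute alone can be sketched as follows. A real B-tree is replaced here by a list of sorted leaves with separator keys, searched with binary search; the names `GsiDataNode` and `locate`, the range bounds, and the assumption that tuples with equal keys do not span leaves are simplifications made for this sketch.

```python
import bisect


class GsiDataNode:
    """Stand-in for a B-tree: sorted leaves, each with a known largest key."""

    def __init__(self, leaves):
        self.leaves = leaves                              # sorted lists of (key, payload)
        self.leaf_max = [leaf[-1][0] for leaf in leaves]  # separator keys, as in inner nodes

    def find(self, key):
        i = bisect.bisect_left(self.leaf_max, key)        # descend to a single leaf
        if i == len(self.leaves):
            return [], 0
        leaf = self.leaves[i]
        hits = [payload for k, payload in leaf if k == key]
        return hits, len(leaf)                            # tuples found, tuples examined


def locate(nodes, node_upper_bounds, key):
    """Pick the data node by key range, then search only one leaf on that node."""
    node_id = bisect.bisect_left(node_upper_bounds, key)
    hits, examined = nodes[node_id].find(key)
    return node_id, hits, examined


nodes = [
    GsiDataNode([[(1, "a"), (3, "b")], [(5, "c"), (8, "d")]]),
    GsiDataNode([[(12, "e"), (15, "f")], [(20, "g"), (20, "h")]]),
]
# Keys up to 10 live on node 0, larger keys on node 1.
node_id, hits, examined = locate(nodes, [10], 20)
```

In this toy data set only two of the eight stored tuples are examined, which mirrors the point that the whole B-tree need not be traversed.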

In another possible case, when the first target attribute includes the first attribute and the second attribute, the value of the data belonging to the first attribute in the first condition is used to determine a first data node where the first tuple is located, and the value of the data belonging to the second attribute in the first condition is used to determine the first tuple among the leaf nodes in the first data node. That is, the attribute value of the first attribute may be used to determine the data node where the first tuple is located and the rough storage location of the first tuple in that data node, and the attribute value of the second attribute may be used to determine the storage location of the first tuple within the leaf node of that data node, that is, the most accurate storage location of the first tuple. In a scenario in which data nodes store data based on a B-tree structure, the data node storing the first tuple and the rough storage location of the first tuple in that data node (i.e., a storage location that stores the first tuple and possibly some other tuples) may be determined among the at least one data node storing the global secondary index according to the range to which the attribute value of the first attribute belongs. The first tuple may then be located within that data node according to the attribute value of the second attribute, which determines the accurate storage location of the first tuple in the data node (i.e., a storage location that stores only the first tuple). The attribute values associated with the first attribute and the second attribute may then be queried at the determined storage location in the determined data node to obtain the first tuple. Based on this approach, the speed and efficiency of locating the first tuple can be further improved, thereby improving the efficiency and performance of IUD operations on the first tuple.
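
A minimal sketch of this two-stage lookup is given below. It assumes that the first attribute is an index key column (here called `c2`) and that the second attribute is the location information (a symbolic node hash plus a `ctid` pair), with the tuples of one data node kept in sorted order. The function names and sample values are hypothetical.

```python
import bisect

# Index tuples of one data node, sorted by (c2, node_hash, ctid).
# c2 plays the role of the first attribute; (node_hash, ctid) plays the
# role of the second attribute. All values are illustrative.
LEAVES = [
    (2, "h3", (1, 5)),
    (2, "h7", (2, 1)),
    (3, "h2", (2, 4)),
]

def candidates_by_first_attribute(leaves, c2):
    """Rough location: the contiguous run of tuples whose c2 matches."""
    lo = bisect.bisect_left(leaves, (c2,))
    hi = lo
    while hi < len(leaves) and leaves[hi][0] == c2:
        hi += 1
    return leaves[lo:hi]

def locate_exact(leaves, c2, node_hash, ctid):
    """Exact location: filter the run by the location information."""
    for t in candidates_by_first_attribute(leaves, c2):
        if t[1] == node_hash and t[2] == ctid:
            return t
    return None
```

The first attribute narrows the search to a short run of candidate tuples, and the location information then singles out one tuple within that run.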

In yet another possible scenario, when the first target attribute contains only the second attribute, the exact storage location of the first tuple in the data node cannot be determined based on only the second attribute, and then all data of the global secondary index stored in the data node needs to be traversed (i.e., the entire B-tree is traversed) to locate the first tuple. However, the location information of each tuple in the global secondary index is generally unique, so that the second attribute (i.e. the location information) is used as a condition for locating the first tuple, and the first tuple can be quickly and accurately screened in the global secondary index under a simpler screening condition.

In some embodiments of the present application, the first attribute may be an index key of the global secondary index; alternatively, some of the attributes in the first attribute are index keys of the global secondary index, and the remaining attributes in the first attribute are additional keys of the global secondary index, where the attributes serving as index keys are set by a user.

S402: The first node sends the first indication to a second node for storing the global secondary index, so that the second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication.

After the first node generates the first indication, the first indication may be sent to a second node for storing the global secondary index, and after the second node receives the first indication, the second node may locate the first tuple according to a first condition in the first indication, and further perform a corresponding IUD operation on the first tuple.
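
Steps S401 and S402 can be sketched as a small message exchange. The following is an illustration under stated assumptions, not the claimed implementation: the `FirstIndication` class, its field names, and the helper functions are hypothetical, and the index is modeled as a list of dictionaries.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FirstIndication:
    op: str                            # "update" or "delete"
    condition: dict                    # first target attribute -> first data value
    new_values: Optional[dict] = None  # only used for "update"

def make_indication(op, condition, new_values=None):
    """First node side (S401): reject conditions that contain a null value."""
    if any(v is None for v in condition.values()):
        raise ValueError("first target attribute must not contain null")
    return FirstIndication(op, dict(condition), new_values)

def apply_indication(index_rows, ind):
    """Second node side (S402): update or delete every matching tuple."""
    def matches(row):
        return all(row.get(k) == v for k, v in ind.condition.items())
    if ind.op == "delete":
        return [r for r in index_rows if not matches(r)]
    return [{**r, **ind.new_values} if matches(r) else r for r in index_rows]
```

For example, `apply_indication(rows, make_indication("update", {"c2": 2}, {"c3": 9}))` rewrites only the tuples whose `c2` equals 2.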

In some embodiments of the present application, the method described in the foregoing steps S401 to S402 may be performed after performing IUD operation on the base table, so that the effect of synchronously updating the global secondary index along with the update of the base table may be achieved by performing IUD operation on the global secondary index according to the update condition of the base table, so as to maintain strong consistency between the global secondary index data and the base table data, and improve availability of the global secondary index.

Optionally, before the step S401, the first node may further perform steps S501 to S503 as shown in fig. 5, which specifically include:

S501: Receiving a second indication, wherein the second indication is used for indicating to update or delete the first data in the base table; wherein the first data includes the first tuple.

The first data may be a tuple to which the first tuple belongs in the base table.

S502: According to the second indication, instructing a third node that stores the first data to update or delete the first data.

In some embodiments of the present application, when the second indication is used to indicate updating of the first data in the base table, the third node is a node storing the first data in a plurality of data nodes for storing the base table. The first node may calculate data belonging to a base table distribution key in the first data by using a distribution algorithm corresponding to a distribution mode of the base table, and determine a third node for storing the first data according to a calculation result. After determining the third node for storing the first data, the first node may instruct the third node to update the first data. After receiving the indication, the third node determines the first data from the stored base table data, updates the first data according to the indication, and can also generate position information and version information corresponding to the first data. Optionally, the third node may comprise at least one data node. The third node may be the same node as the second node, or the third node may contain partially the same data node as the second node, or the third node may be entirely different from the data node contained by the second node.

In some embodiments of the present application, when the second indication is used to indicate deletion of the first data in the base table, the third node is a node storing the first data in a plurality of data nodes for storing the base table. The first node may calculate data belonging to a base table distribution key in the first data by using a distribution algorithm corresponding to a distribution mode of the base table, and determine a third node for storing the first data according to a calculation result. After determining the third node that stores the first data, the first node may instruct the third node to delete the first data. After receiving the indication, the third node determines the first data from the stored base table data and performs a deleting operation on the first data according to the indication. The deleting operation performed by the third node on the first data may be updating the version information corresponding to the first data, so that the updated version information marks the first data as invalid data. The third node may then physically delete the first data together with other data to be deleted when the time for deleting data is reached.
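
The deferred deletion described above can be sketched as follows. The sketch assumes, for illustration only, that the version information includes an `xmax` field whose value 0 means "not deleted" and that a nonzero value marks the row as invalid; the function names are hypothetical.

```python
def mark_deleted(row, deleting_txn_id):
    """Logical delete: record the deleting transaction in the version info."""
    row["xmax"] = deleting_txn_id

def is_valid(row):
    """A row counts as valid while xmax is 0 (assumed 'not deleted' marker)."""
    return row["xmax"] == 0

def purge(rows):
    """Deferred cleanup: physically drop all rows that were marked invalid."""
    return [r for r in rows if is_valid(r)]
```

Marking a row is cheap and can be done immediately, while `purge` can run later for many rows at once.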

S503: receiving first target information from a third node, and determining the first indication according to the first target information; when the second instruction is used for indicating to update the first data, the first target information comprises data belonging to the at least one attribute in the first data before updating, position information corresponding to the first data before updating, data belonging to the at least one attribute in the first data after updating and position information corresponding to the first data after updating; when the second instruction is used for indicating to delete the first data, the first target information comprises data belonging to the at least one attribute in the first data and position information corresponding to the first data; the position information corresponding to any data is used for indicating the position of the data in the base table.

In some embodiments of the present application, after performing the update or delete operation on the first data according to the instruction of the first node, the third node may return first target information for updating the global secondary index of the base table to the first node, so that the first node controls the node storing the global secondary index to update the global secondary index accordingly.

In some embodiments of the present application, when the second instruction is used to indicate updating of the first data in the base table, the first target information returned by the third node to the first node includes data belonging to the at least one attribute in the first data before updating, location information corresponding to the first data before updating, data belonging to the at least one attribute in the first data after updating, and location information corresponding to the first data after updating. The location information may include the first location information and the second location information described in the above embodiments. After receiving the first target information, the first node may use the data belonging to the at least one attribute in the updated first data and the location information corresponding to the updated first data as the updated first tuple in the global secondary index, and instruct the second node to update the first tuple in the global secondary index to the updated first tuple.

In some embodiments of the present application, when the second instruction is used to instruct to delete the first data in the base table, the first target information returned by the third node to the first node includes data belonging to the at least one attribute in the first data and location information corresponding to the first data. Wherein the location information may include the first location information and the second location information described in the above embodiments. After receiving the first target information, the first node may send the data of the first tuple to the second node, and instruct the second node to delete the data.

Optionally, when the second instruction is used for indicating to update the first data in the base table, the first target information may further include version information corresponding to the first data before and after updating. When the second instruction is used for indicating to delete the first data in the base table, the first target information may further include version information corresponding to the first data.

In some embodiments of the present application, the step S503 may be replaced by the following steps: receiving first target information from the third node, and determining the first condition according to the first target information; when the second indication is used for indicating to update the first data, the first target information comprises data belonging to the at least one attribute in the first data before updating, data belonging to a base table distribution key in the first data before updating, second position information corresponding to the first data before updating, data belonging to the at least one attribute in the first data after updating, data belonging to a base table distribution key in the first data after updating, and second position information corresponding to the first data after updating; when the second instruction is used for indicating to delete the first data, the first target information comprises data belonging to the at least one attribute in the first data, data belonging to a base table distribution key in the first data and second position information corresponding to the first data; any data belongs to the data of the base table distribution key and is used for determining first position information corresponding to the data.

In this step, when the second instruction is used to instruct updating the first data, the first node may determine first location information corresponding to the updated first data according to the data belonging to the base table distribution key in the updated first data, then use the data belonging to the at least one attribute in the updated first data, together with the first location information and the second location information corresponding to the updated first data, as the updated first tuple, and instruct the second node to update the first tuple in the global secondary index to the updated first tuple.

When the second instruction is used for indicating to delete the first data, the first node may determine first location information corresponding to the first data according to data belonging to the base table distribution key in the first data, and then use the data belonging to the at least one attribute, the first location information corresponding to the first data and the second location information in the first data as data of the first tuple, and instruct the second node to delete the data.
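
The derivation of the first condition and of the updated first tuple from the first target information can be sketched as follows. This is a hedged illustration: the symbolic `first_location` function stands in for the unspecified distribution hash, and the dictionary layout of `target` as well as the column names `xc_node_hash` and `ctid` are assumptions borrowed from the worked example later in this description.

```python
def first_location(dist_key_value):
    """Symbolic placeholder for the base table's distribution hash."""
    return f"hash({dist_key_value})"

def build_update(target, index_key, index_attrs):
    """Return (condition, new_tuple) for an update of the index.

    target holds 'old' and 'new' dicts, each with the indexed attribute
    values, the distribution key value ('dist_key'), and the second
    location information ('ctid').
    """
    old, new = target["old"], target["new"]
    # Condition: locate the old tuple by index key plus location information.
    condition = {
        index_key: old[index_key],
        "xc_node_hash": first_location(old["dist_key"]),
        "ctid": old["ctid"],
    }
    # Replacement tuple: new attribute values plus new location information.
    new_tuple = {a: new[a] for a in index_attrs}
    new_tuple["xc_node_hash"] = first_location(new["dist_key"])
    new_tuple["ctid"] = new["ctid"]
    return condition, new_tuple
```

For a deletion, only the `condition` part would be needed, built from the single set of values returned by the third node.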

In some embodiments of the present application, the first node may also control the insertion of data in the base table. Specifically, when receiving a third instruction for instructing to add second data to the base table, the first node may determine a fourth node for storing the second data according to the third instruction, and instruct the fourth node to add the second data to the base table. The first node may calculate, according to a distribution algorithm corresponding to the distribution manner of data in the base table, the data belonging to the distribution key of the base table in the second data, and determine the fourth node for storing the second data according to the calculation result. After the fourth node inserts the second data into the base table according to the indication, the fourth node may further generate location information and version information corresponding to the second data, and return second target information for updating the global secondary index to the first node. After receiving the second target information from the fourth node, the first node may update the global secondary index according to the second target information.

Optionally, the fourth node may comprise at least one data node. The fourth node may be the same node as the second/third node, or the fourth node may contain partially the same data node as the second/third node, or the fourth node may be entirely different from the data node contained by the second/third node.

As an optional implementation manner, the second target information includes data belonging to the at least one attribute in the second data and location information corresponding to the second data, where the location information corresponding to the second data is used to indicate a location of the second data in the base table. The first node may use the second target information as the data to be inserted into the global secondary index, and instruct the node storing the global secondary index to insert the second target information into the global secondary index.

As another optional implementation manner, the second target information includes data belonging to the at least one attribute in the second data, data belonging to a base table distribution key in the second data, and second location information corresponding to the second data, where the data belonging to the base table distribution key in the second data is used to determine first location information corresponding to the second data, the first location information corresponding to the second data is used to indicate a node where the second data is located, and the second location information corresponding to the second data is used to indicate a location of the second data in the node. The first node may determine, according to the data belonging to the base table distribution key in the second data, first location information corresponding to the second data, and then insert the data belonging to the at least one attribute in the second data, the first location information corresponding to the second data, and the second location information into the global secondary index, and instruct a node storing the global secondary index to insert the data into the global secondary index.

Optionally, the second target information may further include version information corresponding to the second data.

In the above method, IUD operations on global secondary index data can be supported by introducing non-null constraints on the data in the B-tree. The data filtering condition is set according to an attribute of the global secondary index that does not contain null values, and indicates the tuple in the global secondary index on which the IUD operation is to be performed. Based on the attribute values set in the filtering condition, the data node that may store that tuple can be determined among the data nodes storing the global secondary index, and the tuple can then be searched for and located within that data node. This avoids traversing the data nodes to find the tuple and improves the speed and efficiency of locating it. Therefore, generating the data filtering condition for IUD operations on the global secondary index according to the non-null constraints of the attribute columns in the global secondary index improves the efficiency and performance of the IUD operations. In addition, the global secondary index of the base table contains the position information corresponding to the data in the base table instead of the primary key, the distribution key, or the partition key of the base table, so the data size of the global secondary index can be reduced and the index expansion problem is alleviated. In particular, when the primary key and the distribution key of the base table involve many attribute columns, the amount of data that needs to be maintained after the global secondary index is created is significantly reduced, which improves the efficiency of using the global secondary index.
In the method, when IUD operation is performed on the data in the base table, the first node can synchronously update the global secondary index of the base table according to the target information returned by the node performing the IUD operation, so that strong consistency of the base table data and the global secondary index data can be ensured, and further accuracy in using the base table and the global secondary index is improved. In summary, the method can give consideration to data consistency and IUD operation performance, and can ensure consistency of global secondary index data and base table data while improving performance of IUD operation on the global secondary index.

Referring to fig. 6, a method for processing a global secondary index according to an embodiment of the present application may include:

S601: The second node receives a first indication from the first node, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value; the global secondary index is stored in the second node.

S602: The second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication.

The second node may be the second node described in the foregoing embodiment, and the first node may be the first node described in the foregoing embodiment. This method may be performed after the method shown in fig. 5 described above, i.e., after step S502 described above. The specific implementation of each step in the method may refer to the related description in the method described in the foregoing embodiments, which is not repeated herein.

The implementation process of the above method will be described with reference to a specific example by taking the first node as a coordination node and the second node/third node/fourth node as data nodes in the above embodiment.

Example 1

Fig. 7 is a flowchart of a method for processing a global secondary index according to an embodiment of the present application. As shown in fig. 7, the flow of the method may include:

S701: The coordination node receives an IUD statement issued by a user, the IUD statement being used to indicate IUD operations to be performed on data in the base table.

In some embodiments of the present application, the IUD statement may be an insert statement or an update statement or a delete statement. Wherein, insert statement is used for indicating to insert (add) data in the base table, update statement is used for indicating to update the specified data in the base table, delete statement is used for indicating to delete the specified data in the base table. After receiving the IUD statement input by the user, the coordination node can determine the user instruction by analyzing the IUD statement, and generate an execution plan, wherein the execution plan is used for indicating the operation required to be executed by the data node.

S702: The coordination node instructs the data node to execute the corresponding IUD operation on the base table data according to the received IUD statement.

After the coordination node parses the IUD statement input by the user, when the IUD statement is determined to be an insert statement, the coordination node may determine the data node that is to execute the corresponding IUD operation according to the distribution mode of the base table data, and instruct that data node to execute the corresponding IUD operation on the base table data. When the coordination node determines that the IUD statement is an update statement or a delete statement, as an alternative embodiment, the coordination node may send an instruction to perform the corresponding IUD operation to all the data nodes; each data node accepts the instruction and performs the IUD operation if it determines that the data related to the IUD operation (i.e., the data indicated by the IUD statement) is stored in the data node itself, and ignores the instruction of the coordination node if that data is not stored in the data node itself. As another alternative embodiment, the coordination node may determine the data node that is to perform the corresponding IUD operation according to the distribution manner of the base table data, and instruct that data node to perform the corresponding IUD operation on the base table data.

S703: the data node performs IUD operation on the base table data according to the instruction of the coordination node.

S704: after performing the IUD operation on the base table data, the data node returns to the coordinator node the information needed to update the global secondary index of the base table.

The information required for updating the global secondary index of the base table at least includes base table data related to IUD operation and position information corresponding to the base table data, and may also include version information corresponding to the base table data, an index key of the global secondary index of the base table, and the like.

S705: The coordination node instructs the data node to perform the corresponding IUD operation on the global secondary index of the base table according to the information returned by the data node.

In this step, the coordination node may perform, on the global secondary index of the base table, the IUD operation that corresponds to the IUD operation currently performed on the base table. Specifically, the coordination node may send an execution plan of the global secondary index IUD to the data node, and the data node performs the IUD operation on the global secondary index according to the execution plan.

S706: the coordination node determines whether all global secondary indexes of the base table are updated, if so, step S707 is executed, and otherwise, step S705 is executed.

The coordination node executes the corresponding IUD operation on every global secondary index that is associated with the base table and needs to be updated for the current IUD operation, until no global secondary index remains to be updated, at which point the update of the global secondary indexes of the base table is complete.

S707: The coordination node determines that the synchronous update of the base table and its global secondary indexes is complete, and waits for the user to input the next instruction statement.

Optionally, after receiving the next IUD statement input by the user, the coordination node may execute step S701 and the subsequent steps again.
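
The control flow of steps S701 to S707 can be sketched from the coordination node's point of view. This is a minimal illustration in which the base table operation and the per-index operations are injected as hypothetical callbacks; the function name `process_iud` and its parameters are not taken from the application.

```python
def process_iud(statement, execute_on_base, indexes):
    """Run one IUD statement and keep every global secondary index in sync.

    statement:       the parsed IUD request (S701).
    execute_on_base: callable that performs the base table operation on the
                     data nodes and returns the information needed for index
                     maintenance (S702 to S704).
    indexes:         mapping from index name to a callable that applies the
                     corresponding operation to that index (S705).
    Returns the names of the indexes that were updated, in order.
    """
    info = execute_on_base(statement)
    updated = []
    pending = list(indexes)              # every index of the base table
    while pending:                       # S706: repeat until none is left
        name = pending.pop(0)
        indexes[name](statement, info)   # S705
        updated.append(name)
    return updated                       # S707: base table and indexes in sync
```

The index operations run only after the base table operation has returned its information, which mirrors the ordering of steps S704 and S705.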

The above-described flow is exemplarily described below in connection with a specific scenario.

Illustratively, taking the scenario where the base table has a global secondary index as an example, the name of the base table may be base table1 (table 1), and the base table may include data as shown in table1 below:

Table 1 base table1 data

c1	c2	c3
1	1	4
2	3	5
4	4	7

The base table1 shown in table1 may include 3 columns×3 rows of data, where 3 columns of data respectively belong to 3 attributes of c1, c2, and c3, and each row of data is a tuple. Wherein, the data in column c1 is the primary key and the distribution key of the base table 1. Wherein, the base table1 is distributed and stored in two data nodes according to a hash distribution mode.

The global secondary index of the base table1 may be named idx_gsi_1, and the global secondary index idx_gsi_1 may include data as shown in table 2 below:

Table 2 global secondary index data

c2	c3	xc_node_hash	ctid	xmin	xmax
1	4	hash(1)	(1,3)	100	0
3	5	hash(2)	(2,4)	102	0
4	7	hash(4)	(2,9)	150	0

Wherein, the c2 column data in the global secondary index idx_gsi_1 is the distribution key and the index key of idx_gsi_1, and has non-null constraint (i.e. the data in the column is required to contain no null value); c3 column data is an additional key of idx_gsi_1, and no non-empty constraint exists; the xc_node_hash column data is a hash value of c1 column, xc_node_hash is used for representing first position information, and ctid is used for representing second position information; xmin is used to represent the first version information and xmax is used to represent the second version information.

When the base table 1 and the global secondary index idx_gsi_1 are both stored in a distributed manner in two data nodes, data node 1 and data node 2, the distributed storage of the base table 1 and the global secondary index idx_gsi_1 in the data nodes is as shown in fig. 8. The first row of data (i.e., the first tuple) (1, 1, 4) in the base table 1 is stored in data node 1, and the second and third rows of data (i.e., the second and third tuples) (2, 3, 5) and (4, 4, 7) are stored in data node 2. The first and second rows of data (i.e., the first and second tuples) (1, 4, hash(1), (1, 3), 100, 0) and (3, 5, hash(2), (2, 4), 102, 0) in the global secondary index idx_gsi_1 are stored in data node 1, and the third row of data (i.e., the third tuple) (4, 7, hash(4), (2, 9), 150, 0) in the global secondary index idx_gsi_1 is stored in data node 2.
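
The placement in this example can be reproduced with a toy distribution function. The application does not specify the hash function, so the rule used below (odd key values go to data node 1, even key values go to data node 2) is purely an assumption chosen because it matches the placement described for this example; `node_for` and `place` are hypothetical names.

```python
def node_for(key):
    """Toy distribution rule (an assumption): odd -> node 1, even -> node 2."""
    return 1 if key % 2 == 1 else 2

def place(rows, key_index):
    """Group rows by the node chosen from the column at key_index."""
    placement = {1: [], 2: []}
    for row in rows:
        placement[node_for(row[key_index])].append(row)
    return placement

BASE_ROWS = [(1, 1, 4), (2, 3, 5), (4, 4, 7)]   # (c1, c2, c3)
GSI_ROWS = [(1, 4), (3, 5), (4, 7)]             # (c2, c3), location columns omitted

base_placement = place(BASE_ROWS, 0)   # base table is distributed by c1
gsi_placement = place(GSI_ROWS, 0)     # index is distributed by c2
```

Under the same rule, a base table tuple (3, 2, 6) would go to data node 1 while its index tuple, keyed by c2 = 2, would go to data node 2, which illustrates why a base table tuple and its index tuple can reside on different nodes.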

1) Data insertion scenario

Based on the above example, when the operation of inserting data is performed on the base table 1 and the global secondary index idx_gsi_1, the inserted data is (3, 2, 6), and the data included in the base table 1 after inserting the data is as shown in the following table 3:

Table 3 base table 1 data

c1	c2	c3
1	1	4
2	3	5
4	4	7
3	2	6

After inserting the above data in the base table 1, the coordinator node and the data node update the idx_gsi_1 by using the method provided in the above embodiment, and the data contained in the idx_gsi_1 is shown in the following table 4:

Table 4 global secondary index data

c2	c3	xc_node_hash	ctid	xmin	xmax
1	4	hash(1)	(1,3)	100	0
3	5	hash(2)	(2,4)	102	0
4	7	hash(4)	(2,9)	150	0
2	6	hash(3)	(1,5)	185	0

Wherein, according to the distribution mode of the base table1, the inserted data is stored in the data node 1; the inserted data will be stored in node 2 according to the distribution of the global secondary index idx_gsi_1. The distributed storage case of the base table1 and the global secondary index idx_gsi_1 in the data node after the above data is inserted is shown in fig. 9, in which the tuples (3, 2, 6) inserted in the base table1 are stored into the data node 1, compared to the distributed storage case shown in fig. 8. The tuples (2, 6, hash (3), (1, 5), 185,0) inserted in the global secondary index idx_gsi_1 are stored into data node 2.

The final execution plan of the insert data flow is as follows:

Plan:
Insert GSI(s) on table1
  Node/s: All datanodes
  Remote Query: INSERT INTO idx_gsi_1 (c2, c3, xc_node_hash, ctid, xmin, xmax) VALUES ($2, $3, $7, $6, $4, $5)
  -> Insert on table1
       Output: table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax
       Node/s: All datanodes
       Remote Query: INSERT INTO table1 (c1, c2, c3) VALUES ($1, $2, $3) RETURNING table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax
       -> Result
            Output: 3, 2, 6

In the statement that inserts into idx_gsi_1, $2, $3, $7, $6, $4, and $5 represent the data values 2, 6, hash(3), (1, 5), 185, and 0, respectively. In the statement that inserts into table1, $1, $2, and $3 represent the data values 3, 2, and 6, respectively.

The execution plan is executed from bottom to top, and its meaning is as follows:

The following statement is sent to all data nodes, namely data node 1 and data node 2: INSERT INTO table1 (c1, c2, c3) VALUES ($1, $2, $3) RETURNING table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax, so that data node 1 updates the data contained in the base table1, that is, inserts the data. Data node 1 inserts the data into base table1 and returns the values of the following attributes of the inserted data to the coordination node: table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax. After receiving the information returned by data node 1, the coordination node performs a hash operation on the base table distribution key c1 to obtain the xc_node_hash column data, and combines it with the other information returned by data node 1 to assemble the data c2, c3, xc_node_hash, ctid, xmin, and xmax to be inserted into idx_gsi_1. The coordination node then generates the following insert statement and sends it to all the data nodes: INSERT INTO idx_gsi_1 (c2, c3, xc_node_hash, ctid, xmin, xmax) VALUES ($2, $3, $7, $6, $4, $5), so that the data node updates the data in idx_gsi_1.
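
The assembly of the index tuple from the values returned by the base table insert can be sketched as follows. The sketch keeps the hash symbolic, as the tables in this example do, because the actual hash function is not specified; the function name `gsi_row_from_returning` and the dictionary representation of the returned row are assumptions.

```python
def gsi_row_from_returning(returned):
    """Build the idx_gsi_1 tuple from the row returned by the base insert.

    returned maps column names (c1, c2, c3, ctid, xmin, xmax) to the values
    reported back by the data node that stored the base table row.
    """
    return (
        returned["c2"],               # index key
        returned["c3"],               # additional key
        f"hash({returned['c1']})",    # xc_node_hash, kept symbolic here
        returned["ctid"],             # second position information
        returned["xmin"],             # first version information
        returned["xmax"],             # second version information
    )

RETURNED = {"c1": 3, "c2": 2, "c3": 6, "ctid": (1, 5), "xmin": 185, "xmax": 0}
GSI_ROW = gsi_row_from_returning(RETURNED)
```

With the values of this example, `GSI_ROW` corresponds to the last row of Table 4.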

2) Data update scenario

Based on the above example, the operation of updating data is performed on the base table 1 and the corresponding global secondary index idx_gsi_1 shown in table 3, where the tuple before the update is (3, 2, 6) and the tuple after the update is (3, 2, 9), and the data contained in the base table 1 after the update is as shown in the following table 5:

Table 5 base table 1 data

c1	c2	c3
1	1	4
2	3	5
4	4	7
3	2	9

After updating the base table data, the coordination node and the data node update the idx_gsi_1 by using the method provided in the foregoing embodiment, and the data included in the idx_gsi_1 is shown in the following table 6:

Table 6 global secondary index data

c2	c3	xc_node_hash	ctid	xmin	xmax
1	4	hash(1)	(1,3)	100	0
3	5	hash(2)	(2,4)	102	0
4	7	hash(4)	(2,9)	150	0
2	9	hash(3)	(1,5)	188	0

Wherein, according to the distribution mode of the base table 1, the updated data is stored in the data node 1; according to the distribution mode of the global secondary index idx_gsi_1, the updated data is stored in the node 2. The distributed storage of the base table 1 and the global secondary index idx_gsi_1 in the data node after updating the data is shown in fig. 10, where the data in the tuples in the base table and the global secondary index have been updated.

The final execution plan of the update data flow is as follows:

Plan:
Insert GSI(s) on table1
  Node/s: All datanodes
  Remote Query: UPDATE idx_gsi_1 SET c2 = $2, c3 = $3, xc_node_hash = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_hash = $14 AND ctid = $13
  -> Update on table1
       Output: c1_new$$, c2_new$$, c3_new$$, ctid_new$$, xmin_new$$, xmax_new$$, table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax
       Node/s: All datanodes
       Node expr: table1.c1
       Remote Query: UPDATE ONLY table1 SET c3 = $3 WHERE table1.ctid = $4 AND table1.xc_node_id = $5 RETURNING table1.c1 AS "c1_new$$", table1.c2 AS "c2_new$$", table1.c3 AS "c3_new$$", table1.xmin AS "xmin_new$$", table1.xmax AS "xmax_new$$", table1.ctid AS "ctid_new$$", table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax
       -> Data Node Scan on table1 "_REMOTE_TABLE_QUERY"
            Output: table1.c1, table1.c2, 9, table1.c2, table1.ctid, table1.xc_node_id, table1*
            Node/s: All datanodes
            Remote Query: SELECT c1, c2, xc_node_id, table1* FROM ONLY table1 WHERE c2 = 2 FOR UPDATE OF table1

The statement UPDATE idx_gsi_1 SET c2 = $2, c3 = $3, xc_node_hash = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_hash = $14 AND ctid = $13 may serve as the statement corresponding to the first indication in the above embodiment; c2 = $2 AND xc_node_hash = $14 AND ctid = $13 is the first condition in the above embodiment; c2 is the first attribute in the above embodiment; xc_node_hash and ctid are the second attribute in the above embodiment; and the data corresponding to $2, $14, and $13 is the first data value in the above embodiment.

In this flow, the original base table data before the update is first obtained by traversal through a remote query (REMOTE QUERY). The update operation is then performed on the base table data at the corresponding data nodes, and content such as the new base table distribution key, the new base table version information, the new base table location information and the new base table values is returned through the RETURNING mechanism for use in the subsequent update of the global secondary index. The coordination node then uses the acquired information to update the global secondary index data on the data node where the global secondary index is located.
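Illustratively, the three stages described above (scan the old tuple, update the base table with returned values, then update the index) can be sketched as follows. This is only a minimal sketch under stated assumptions: plain Python lists stand in for the data nodes, the helper names `node_hash` and `update_with_gsi` are hypothetical, and the hash function of the distribution key is represented by a placeholder string because it is not specified here.

```python
# Illustrative sketch only: plain lists of dicts stand in for the data nodes,
# and every helper name here is hypothetical.

def node_hash(c1):
    """Placeholder for hashing the base-table distribution key (real hash unspecified)."""
    return f"hash({c1})"


def update_with_gsi(base_rows, gsi_rows, c2_value, new_c3, new_ctid, new_xmin):
    """Update c3 on matching base tuples, then refresh the matching index tuples."""
    refreshed = []
    for row in base_rows:
        if row["c2"] != c2_value:
            continue
        # 1) Scan: remember the values that identify the old index tuple.
        old_key = (row["c2"], node_hash(row["c1"]), row["ctid"])
        # 2) Update the base tuple and hand back its new state (the RETURNING step).
        row.update(c3=new_c3, ctid=new_ctid, xmin=new_xmin)
        returned = dict(row)
        # 3) Locate the old index tuple by (c2, xc_node_hash, ctid) and overwrite it.
        for entry in gsi_rows:
            key = (entry["c2"], entry["xc_node_hash"], entry["ctid"])
            if key == old_key:
                entry.update(
                    c2=returned["c2"],
                    c3=returned["c3"],
                    xc_node_hash=node_hash(returned["c1"]),
                    ctid=returned["ctid"],
                    xmin=returned["xmin"],
                    xmax=returned["xmax"],
                )
                refreshed.append(entry)
    return refreshed
```

The old key is captured before the base tuple changes because the index tuple can only be found through its pre-update values, which corresponds to the separate parameters $14 and $13 in the WHERE clause of the plan.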

Alternatively, the statement UPDATE idx_gsi_1 SET c2 = $2, c3 = $3, xc_node_hash = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_hash = $14 AND ctid = $13 in the execution plan may be replaced by the statement UPDATE idx_gsi_1 SET c2 = $2, c3 = $3, xc_node_hash = $7, ctid = $6, xmin = $4, xmax = $5 WHERE xc_node_hash = $14 AND ctid = $13, or by the statement UPDATE idx_gsi_1 SET c2 = $2, c3 = $3, xc_node_hash = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2.
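Illustratively, the three interchangeable forms of the first condition can be tried out against a small stand-in for idx_gsi_1. The sketch below uses Python's built-in sqlite3 module purely as a convenient single-node SQL engine, which is an assumption for demonstration and not the distributed database of this embodiment. The pre-update values of the third row (c3 = 6, ctid = (1,4), xmin = 180) are assumed for illustration, and the three forms select the same tuple here only because c2 is unique in this sample data.

```python
import sqlite3

# Sample index tuples; the third row's pre-update values are assumed.
ROWS = [
    (1, 4, "hash(1)", "(1,3)", 100, 0),
    (3, 5, "hash(2)", "(2,4)", 102, 0),
    (2, 6, "hash(3)", "(1,4)", 180, 0),
]


def apply_update(where_clause, where_params):
    """Run the index UPDATE with the given WHERE clause on a fresh copy of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idx_gsi_1 (c2 INTEGER NOT NULL, c3 INTEGER, "
        "xc_node_hash TEXT, ctid TEXT, xmin INTEGER, xmax INTEGER)"
    )
    conn.executemany("INSERT INTO idx_gsi_1 VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    cur = conn.execute(
        "UPDATE idx_gsi_1 SET c3 = ?, ctid = ?, xmin = ? WHERE " + where_clause,
        (9, "(1,5)", 188) + tuple(where_params),
    )
    changed = cur.rowcount
    rows = conn.execute("SELECT * FROM idx_gsi_1 ORDER BY c2").fetchall()
    conn.close()
    return changed, rows


# First attribute plus second attribute, second attribute only, first attribute only.
full = apply_update("c2 = ? AND xc_node_hash = ? AND ctid = ?", (2, "hash(3)", "(1,4)"))
by_location = apply_update("xc_node_hash = ? AND ctid = ?", ("hash(3)", "(1,4)"))
by_key = apply_update("c2 = ?", (2,))
```

Each variant modifies exactly one tuple in this sample and leaves the index in the same final state.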

3) Deleting data scenes

Based on the above example, the operation of deleting data is performed on the base table 1 and the corresponding global secondary index idx_gsi_1 shown in table 3, wherein the deleted tuple is (4, 7), and the data contained in the base table 1 after deleting the tuple is as shown in the following table 7:

Table 7 base table 1 data

c1	c2	c3
1	1	4
2	3	5
3	2	9

After updating the base table data, the coordination node and the data node update the idx_gsi_1 by using the method provided in the foregoing embodiment, and the data included in the idx_gsi_1 is shown in the following table 8:

Table 8 global secondary index data

c2	c3	xc_node_hash	ctid	xmin	xmax
1	4	hash(1)	(1,3)	100	0
3	5	hash(2)	(2,4)	102	0
2	9	hash(3)	(1,5)	188	0

According to the distribution mode of base table 1, the deleted data was stored in data node 2; according to the distribution mode of the global secondary index idx_gsi_1, the deleted data was stored in data node 2. The distributed storage of base table 1 and the global secondary index idx_gsi_1 in the data nodes after the deletion is shown in fig. 11, in which the corresponding tuples in the base table and the global secondary index have been deleted.

The final execution plan of the deleting data flow is as follows:

Plan:

Delete GSI(s) on table1

Node/s: All datanodes

Remote Query: DELETE FROM idx_gsi_1 WHERE c2 = $2 AND xc_node_hash = $7 AND ctid = $6

-> Delete on table1

Output: table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax

Node/s: All datanodes

Remote Query: DELETE FROM ONLY table1 WHERE table1.ctid = $1 AND table1.xc_node_id = $14 RETURNING table1.c1, table1.c2, table1.c3, table1.ctid, table1.xmin, table1.xmax

-> Data Node Scan on table1 "_REMOTE_TABLE_QUERY"

Output: table1.c2, table1.ctid, table1.xc_node_id

Node/s: All datanodes

Remote Query: SELECT c2, xc_node_id, ctid FROM ONLY table1 WHERE c2 = 4

The statement DELETE FROM idx_gsi_1 WHERE c2 = $2 AND xc_node_hash = $7 AND ctid = $6 may be the statement corresponding to the first indication described in the above embodiment; c2 = $2 AND xc_node_hash = $7 AND ctid = $6 is the first condition described in the above embodiment; c2 is the first attribute described in the above embodiment; xc_node_hash and ctid are the second attribute described in the above embodiment; and the data corresponding to $2, $7 and $6 is the first data value described in the above embodiment.

The base table data and the global secondary index data are deleted in the same transaction. The base table is first traversed through REMOTE QUERY to obtain the original data information in the table, the base table data is then deleted on the data node, and the related information is returned through RETURNING. The data of the global secondary index idx_gsi_1 is then deleted by using the previously acquired base table data. When the delete transaction is committed, the version information corresponding to the deleted base table data and global secondary index data is updated to mark the data as deleted (i.e., updated to version information indicating that the corresponding data is invalid), yielding the base table data shown in table 7 and the global secondary index data shown in table 8.
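Illustratively, marking tuples as deleted through their version information, rather than physically removing them, can be sketched as follows. This is a minimal sketch under assumptions: an xmax of 0 is taken to mean "valid" (as in the tables above), a non-zero xmax is taken to mean "deleted by that transaction", and the function names, the transaction id 200 and the values of the deleted tuple are hypothetical.

```python
def mark_deleted(base_rows, gsi_rows, c2_value, deleting_xid, node_hash):
    """Stamp xmax on matching base tuples and on their index tuples (assumed marking scheme)."""
    removed = 0
    for row in base_rows:
        if row["c2"] != c2_value or row["xmax"] != 0:
            continue
        # Mark the base tuple as deleted by this transaction.
        row["xmax"] = deleting_xid
        # The returned base-table values identify the index tuple to delete.
        key = (row["c2"], node_hash(row["c1"]), row["ctid"])
        for entry in gsi_rows:
            entry_key = (entry["c2"], entry["xc_node_hash"], entry["ctid"])
            if entry["xmax"] == 0 and entry_key == key:
                entry["xmax"] = deleting_xid
        removed += 1
    return removed


def visible(rows):
    """Return only the tuples whose version information still marks them valid."""
    return [r for r in rows if r["xmax"] == 0]
```

With this scheme both the base table and the index keep the old tuple versions on disk, and readers simply filter on the version information.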

Alternatively, the statement DELETE FROM idx_gsi_1 WHERE c2 = $2 AND xc_node_hash = $7 AND ctid = $6 in the execution plan may be replaced by the statement DELETE FROM idx_gsi_1 WHERE xc_node_hash = $7 AND ctid = $6, or by the statement DELETE FROM idx_gsi_1 WHERE c2 = $2.

Example 2

Fig. 12 is a flowchart of a method for processing a global secondary index according to an embodiment of the present application. As shown in fig. 12, the flow of the method may include:

S1201: the coordination node receives an IUD statement issued by a user, the IUD statement being used to indicate IUD operations to be performed on data in the base table.

Illustratively, taking a scenario in which the base table has one global secondary index as an example, the base table may be named base table 2 (table2), and the data it contains may be as shown in schematic diagram (a) in fig. 13. Base table 2 may contain 3 columns × 3 rows of data, where the 3 columns of data belong to the attributes c1, c2 and c3 respectively, and each row of data is a tuple. The data in column c1 is the primary key and the distribution key of base table 2. The data in base table 2 is distributed and stored in two data nodes according to a range distribution mode. For example, as shown in schematic diagram (a) in fig. 13, when the distribution key value is less than or equal to 25, the data is distributed to data node 1, and when the distribution key value is greater than 25, the data is distributed to data node 2.

The global secondary index of the base table2 may be named idx_gsi_2, and the global secondary index idx_gsi_2 may include data therein as shown in the schematic diagram (b) of fig. 13. Wherein, the data in the c2 column in the global secondary index idx_gsi_2 is a distribution key and an index key, and has non-null constraint (i.e. the data in the column is required to not contain null value); c3 column data is an additional key of idx_gsi_2, and no non-null constraint exists; the xc_node_range column data is a range value of c1 column, xc_node_range is used for representing first position information, and ctid is used for representing second position information; xmin is used to represent the first version information and xmax is used to represent the second version information.

When base table 2 and the global secondary index idx_gsi_2 are both distributed and stored across data node 1 and data node 2, their distributed storage in the data nodes is shown in schematic diagram (c) in fig. 13, in which the tuples (10,10,4) and (20,30,5) of base table 2 are stored in data node 1 and the tuple (40,40,7) is stored in data node 2. The tuple (10, 4, range (10), (1, 3), 100, 0) of the global secondary index idx_gsi_2 is stored in data node 1, and the tuples (30, 5, range (20), (2, 4), 102, 0) and (40, 7, range (40), (2, 9), 150, 0) of the global secondary index idx_gsi_2 are stored in data node 2.
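Illustratively, the range distribution described above can be checked with a short sketch. The boundary value 25 and the tuples come from this example; the function names `range_node` and `place` are hypothetical, and it is assumed that the global secondary index is range-distributed on its own distribution key c2 with the same boundary, which is consistent with the placements listed above.

```python
BOUNDARY = 25  # distribution keys <= 25 go to data node 1, larger keys to data node 2


def range_node(key, boundary=BOUNDARY):
    """Map a distribution key value to a data node number by range."""
    return 1 if key <= boundary else 2


def place(rows, key_index):
    """Group tuples by the data node that their distribution key maps to."""
    placement = {1: [], 2: []}
    for row in rows:
        placement[range_node(row[key_index])].append(row)
    return placement


# Base table 2 is distributed on c1 (first column of each tuple).
base_table2 = [(10, 10, 4), (20, 30, 5), (40, 40, 7)]
# idx_gsi_2 is distributed on c2 (first column of each index tuple).
idx_gsi_2 = [
    (10, 4, "range(10)", (1, 3), 100, 0),
    (30, 5, "range(20)", (2, 4), 102, 0),
    (40, 7, "range(40)", (2, 9), 150, 0),
]

base_placement = place(base_table2, 0)
index_placement = place(idx_gsi_2, 0)
```

Note that the base tuple (20,30,5) lands on data node 1 while its index tuple lands on data node 2, which is why the index tuple carries location information pointing back to the base tuple.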

Illustratively, the IUD operation performed on the base table indicated by the IUD statement issued by the user may be any one of examples 1 to 3 below:

Example 1: tuple (30,20,6) is inserted into base table 2.

Example 2: based on the scenario of example 1, the tuple (30,20,6) inserted in base table 2 is updated to a tuple (30,20,9).

Example 3: based on the scenario of example 1, the tuple (40,40,7) in base table 2 is deleted.

S1202: the coordination node instructs the data node to execute the corresponding IUD operation on the base table data according to the received IUD statement.

S1203: the data node performs the IUD operation on the base table data according to the instruction of the coordination node.

S1204: after performing the IUD operation on the base table data, the data node returns to the coordination node the information needed to update the global secondary index of the base table.

S1205: the coordination node instructs the data node to perform the corresponding IUD operation on the global secondary index of the base table according to the information returned by the data node.

S1206: the data node performs the corresponding IUD operation on the global secondary index of the base table according to the instruction of the coordination node.

Illustratively, when the operation of inserting data is performed on base table 2 and the global secondary index idx_gsi_2 based on example 1 described in step S1201 above, the tuple inserted into base table 2 is (30,20,6), and the data contained in base table 2 after the insertion is shown in schematic diagram (a) in fig. 14. After the data is inserted into base table 2, the coordination node and the data node update idx_gsi_2 by using the method provided in the above embodiment, and the data then contained in idx_gsi_2 is shown in schematic diagram (b) in fig. 14. According to the distribution mode of base table 2, the inserted data is stored in data node 2; according to the distribution mode of the global secondary index idx_gsi_2, the inserted data is stored in data node 1. The distributed storage of base table 2 and the global secondary index idx_gsi_2 in the data nodes after the insertion is shown in schematic diagram (c) in fig. 14: compared with the distributed storage shown in schematic diagram (c) in fig. 13, the tuple (30,20,6) inserted into base table 2 is stored in data node 2, and the tuple (20, 6, range (30), (1, 5), 185, 0) inserted into the global secondary index idx_gsi_2 is stored in data node 1.

The final execution plan of the data insertion flow is as follows:

Plan:

Insert GSI(s) on table2

Node/s: All datanodes

Remote Query: INSERT INTO idx_gsi_2 (c2, c3, xc_node_range, ctid, xmin, xmax) VALUES ($2, $3, $7, $6, $4, $5)

-> Insert on table2

Output: table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

Node/s: All datanodes

Remote Query: INSERT INTO table2 (c1, c2, c3) VALUES ($1, $2, $3) RETURNING table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

-> Result

Output: 30, 20, 6

In the execution plan, $2, $3, $7, $6, $4 and $5 in the statement that inserts into idx_gsi_2 represent the data values 20, 6, range (30), (1, 5), 185 and 0, respectively, and $1, $2 and $3 in the statement that inserts into table2 represent the data values 30, 20 and 6, respectively.

The following statement is sent to all data nodes, namely data node 1 and data node 2: INSERT INTO table2 (c1, c2, c3) VALUES ($1, $2, $3) RETURNING table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax, so that data node 2 updates the data contained in base table 2, i.e., inserts the data. Data node 2 inserts the data into base table 2 and returns the values of the following attributes of the inserted data to the coordination node: table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax. After receiving the information returned by data node 2, the coordination node performs a range operation on the base table distribution key c1 to obtain the xc_node_range column data and, combining it with the other information returned by data node 2, assembles the data to be inserted into idx_gsi_2, namely c2, c3, xc_node_range, ctid, xmin and xmax. The following insert statement is then generated and sent to all data nodes: INSERT INTO idx_gsi_2 (c2, c3, xc_node_range, ctid, xmin, xmax) VALUES ($2, $3, $7, $6, $4, $5), so that the data node updates the data in idx_gsi_2.
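Illustratively, the way the coordination node assembles the index tuple from the values returned by the data node can be sketched as follows. The function name `build_gsi_insert` is hypothetical, the range operation is represented by a placeholder string because its actual computation is not specified here, and `?` placeholders are used instead of the numbered parameters of the plan.

```python
def build_gsi_insert(returned):
    """Assemble the idx_gsi_2 insert statement and its values from RETURNING output."""
    # The range operation on the base-table distribution key c1 yields xc_node_range.
    xc_node_range = f"range({returned['c1']})"
    values = (
        returned["c2"],      # index key and distribution key of the index
        returned["c3"],      # additional key
        xc_node_range,       # first location information
        returned["ctid"],    # second location information
        returned["xmin"],    # first version information
        returned["xmax"],    # second version information
    )
    statement = (
        "INSERT INTO idx_gsi_2 (c2, c3, xc_node_range, ctid, xmin, xmax) "
        "VALUES (?, ?, ?, ?, ?, ?)"
    )
    return statement, values


# Values returned for the inserted base tuple (30,20,6) in this example.
statement, values = build_gsi_insert(
    {"c1": 30, "c2": 20, "c3": 6, "ctid": (1, 5), "xmin": 185, "xmax": 0}
)
```

The resulting values correspond to the index tuple (20, 6, range (30), (1, 5), 185, 0) given above; c1 itself is not stored in the index and only contributes through xc_node_range.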

Illustratively, based on example 2 described in step S1201 above, the operation of updating data is performed on base table 2 and the corresponding global secondary index idx_gsi_2 shown in table 3, where the tuple before the update is (30,20,6) and the tuple after the update is (30,20,9); the data contained in base table 2 after the update is shown in schematic diagram (a) in fig. 15. After the base table data is updated, the coordination node and the data node update idx_gsi_2 by using the method provided in the foregoing embodiment, and the data then contained in idx_gsi_2 is shown in schematic diagram (b) in fig. 15.

According to the distribution mode of base table 2, the updated data is stored in data node 2; according to the distribution mode of the global secondary index idx_gsi_2, the updated data is stored in data node 1. The distributed storage of base table 2 and the global secondary index idx_gsi_2 in the data nodes after the update is shown in schematic diagram (c) in fig. 15, in which the data in the corresponding tuples of the base table and the global secondary index has been updated.

The final execution plan of the update data flow is as follows:

Plan:

Insert GSI(s) on table2

Node/s: All datanodes

Remote Query: UPDATE idx_gsi_2 SET c2 = $2, c3 = $3, xc_node_range = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_range = $14 AND ctid = $13

-> Update on table2

Output: c1_new$$, c2_new$$, c3_new$$, ctid_new$$, xmin_new$$, xmax_new$$, table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

Node/s: All datanodes

Node expr: table2.c1

Remote Query: UPDATE ONLY table2 SET c3 = $3 WHERE table2.ctid = $4 AND table2.xc_node_id = $5 RETURNING table2.c1 AS "c1_new$$", table2.c2 AS "c2_new$$", table2.c3 AS "c3_new$$", table2.xmin AS "xmin_new$$", table2.xmax AS "xmax_new$$", table2.ctid AS "ctid_new$$", table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

-> Data Node Scan on table2 "_REMOTE_TABLE_QUERY"

Output: table2.c1, table2.c2, 9, table2.c2, table2.ctid, table2.xc_node_id, table2.*

Node/s: All datanodes

Remote Query: SELECT c1, c2, table2.xc_node_id, table2.* FROM ONLY table2 WHERE c2 = 2 FOR UPDATE OF table2

The statement UPDATE idx_gsi_2 SET c2 = $2, c3 = $3, xc_node_range = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_range = $14 AND ctid = $13 may serve as the statement corresponding to the first indication in the above embodiment; c2 = $2 AND xc_node_range = $14 AND ctid = $13 is the first condition in the above embodiment; c2 is the first attribute in the above embodiment; xc_node_range and ctid are the second attribute in the above embodiment; and the data corresponding to $2, $14 and $13 is the first data value in the above embodiment.

Alternatively, the statement UPDATE idx_gsi_2 SET c2 = $2, c3 = $3, xc_node_range = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2 AND xc_node_range = $14 AND ctid = $13 in the execution plan may be replaced by the statement UPDATE idx_gsi_2 SET c2 = $2, c3 = $3, xc_node_range = $7, ctid = $6, xmin = $4, xmax = $5 WHERE xc_node_range = $14 AND ctid = $13, or by the statement UPDATE idx_gsi_2 SET c2 = $2, c3 = $3, xc_node_range = $7, ctid = $6, xmin = $4, xmax = $5 WHERE c2 = $2.

Illustratively, based on example 3 described in step S1201 above, the operation of deleting data is performed on the base table 2 and the corresponding global secondary index idx_gsi_2 shown in table 3, wherein the deleted tuple is (40,40,7), and the data contained in the base table 2 after deleting the tuple is as shown in the schematic diagram (a) in fig. 16. After updating the base table data, the coordination node and the data node update the idx_gsi_2 by using the method provided in the foregoing embodiment, and the data included in the idx_gsi_2 is shown in the schematic diagram (b) in fig. 16.

According to the distribution mode of base table 2, the deleted data was stored in data node 2; according to the distribution mode of the global secondary index idx_gsi_2, the deleted data was stored in data node 2. The distributed storage of base table 2 and the global secondary index idx_gsi_2 in the data nodes after the deletion is shown in schematic diagram (c) in fig. 16, in which the corresponding tuples in the base table and the global secondary index have been deleted.

The final execution plan of the deleting data flow is as follows:

Plan:

Delete GSI(s) on table2

Node/s: All datanodes

Remote Query: DELETE FROM idx_gsi_2 WHERE c2 = $2 AND xc_node_range = $7 AND ctid = $6

-> Delete on table2

Output: table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

Node/s: All datanodes

Remote Query: DELETE FROM ONLY table2 WHERE table2.ctid = $1 AND table2.xc_node_id = $2 RETURNING table2.c1, table2.c2, table2.c3, table2.ctid, table2.xmin, table2.xmax

-> Data Node Scan on table2 "_REMOTE_TABLE_QUERY"

Output: table2.c2, table2.ctid, table2.xc_node_id

Node/s: All datanodes

Remote Query: SELECT c2, xc_node_id, ctid FROM ONLY table2 WHERE c2 = 4

The statement DELETE FROM idx_gsi_2 WHERE c2 = $2 AND xc_node_range = $7 AND ctid = $6 may be the statement corresponding to the first indication described in the above embodiment; c2 = $2 AND xc_node_range = $7 AND ctid = $6 is the first condition described in the above embodiment; c2 is the first attribute described in the above embodiment; xc_node_range and ctid are the second attribute described in the above embodiment; and the data corresponding to $2, $7 and $6 is the first data value described in the above embodiment.

The base table data and the global secondary index data are deleted in the same transaction. The base table is first traversed through REMOTE QUERY to obtain the original data information in the table, the base table data is then deleted on the data node, and the related information is returned through RETURNING. The data of the global secondary index idx_gsi_2 is then deleted by using the previously acquired base table data. When the delete transaction is committed, the version information corresponding to the deleted base table data and global secondary index data is updated to mark the data as deleted, yielding the base table data shown in schematic diagram (a) in fig. 16 and the global secondary index data shown in schematic diagram (b) in fig. 16.

Alternatively, the statement DELETE FROM idx_gsi_2 WHERE c2 = $2 AND xc_node_range = $7 AND ctid = $6 in the execution plan may be replaced by the statement DELETE FROM idx_gsi_2 WHERE xc_node_range = $7 AND ctid = $6, or by the statement DELETE FROM idx_gsi_2 WHERE c2 = $2.
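Illustratively, the choice among these three forms of the first condition affects how many data nodes need to be consulted. The sketch below is an assumption-laden illustration of that idea rather than the routing actually used by the execution plan above (which lists all data nodes): when the condition contains c2, the distribution key of idx_gsi_2, a single data node can be derived from its value, and when only xc_node_range and ctid are present, every data node is searched. The function name `target_nodes` and the reuse of the boundary 25 for the index are hypothetical.

```python
ALL_NODES = (1, 2)


def target_nodes(condition, boundary=25):
    """Return the data nodes that would need to receive the statement for this condition."""
    if "c2" in condition:
        # c2 is the index's distribution key, so its value pins down one node.
        return [1 if condition["c2"] <= boundary else 2]
    # Only location attributes are present: every node has to be searched.
    return list(ALL_NODES)


# The three forms of the first condition for the index tuple of (40,40,7).
with_both = target_nodes({"c2": 40, "xc_node_range": "range(40)", "ctid": (2, 9)})
location_only = target_nodes({"xc_node_range": "range(40)", "ctid": (2, 9)})
key_only = target_nodes({"c2": 40})
```

Because c2 carries a non-null constraint, a condition on c2 can always be evaluated to a concrete node in this sketch; a nullable column could not be used this way.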

S1207: after the data node completes the IUD operation on the global secondary index of the base table, a notification of the success of the IUD operation is returned to the coordinator node.

S1208: the coordination node notifies the user that the IUD operation on the global secondary index has been completed successfully.
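Illustratively, steps S1201 to S1208 can be traced end to end for the insert of example 1 with the following sketch. It is a simplified single-process model under stated assumptions: the classes `DataNode` and `Coordinator` and their methods are hypothetical, the ctid format is made up, transactions and failures are ignored, and the index is assumed to be range-distributed on c2 with the same boundary as the base table.

```python
class DataNode:
    """Hypothetical in-memory data node holding base-table and index tuples."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.base, self.gsi = [], []

    def insert_base(self, row, xid):
        # S1203: apply the insert; S1204: return what the index update needs.
        ctid = (self.node_id, len(self.base) + 1)  # made-up position format
        stored = dict(row, ctid=ctid, xmin=xid, xmax=0)
        self.base.append(stored)
        return stored

    def insert_gsi(self, entry):
        # S1206: apply the index insert; S1207: acknowledge success.
        self.gsi.append(entry)
        return True


class Coordinator:
    """Hypothetical coordination node that routes by range on the distribution key."""

    def __init__(self, nodes, boundary=25):
        self.nodes, self.boundary = nodes, boundary

    def route(self, key):
        return self.nodes[1 if key <= self.boundary else 2]

    def insert(self, row, xid):
        # S1201/S1202: receive the statement and instruct the base-table node.
        returned = self.route(row["c1"]).insert_base(row, xid)
        # S1205: build the index tuple from the returned information.
        entry = {
            "c2": returned["c2"],
            "c3": returned["c3"],
            "xc_node_range": f"range({returned['c1']})",
            "ctid": returned["ctid"],
            "xmin": returned["xmin"],
            "xmax": returned["xmax"],
        }
        # S1206/S1207 happen on the index node; S1208 reports the outcome.
        return self.route(entry["c2"]).insert_gsi(entry)
```

For the tuple (30,20,6), the base tuple is routed to data node 2 by c1 = 30 and the index tuple to data node 1 by c2 = 20, matching the placement described for fig. 14.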

It should be noted that, the specific implementation procedure provided in the foregoing embodiments is merely an illustration of a procedure applicable to the embodiments of the present application, and specific implementation may refer to the description in the foregoing embodiments. In addition, the execution sequence of each step in each implementation flow may be adjusted according to the actual requirement, and other steps may be added or part of steps may be reduced, which is not limited in the embodiment of the present application.

Based on the above embodiments and the same concept, the embodiments of the present application further provide a data processing apparatus, which is configured to implement the functions of the node provided by the embodiments of the present application. Fig. 17 shows a data processing apparatus 1700 provided in the present application, where the data processing apparatus 1700 may be a node device, or may be a chip or a chip system in the node device.

Specifically, the data processing apparatus 1700 includes a transceiver unit 1701 and a processing unit 1702. The transceiver unit 1701 is configured to receive data transmitted from an external data processing apparatus and to transmit data to the external data processing apparatus.

When the data processing apparatus 1700 is implemented as the first node in the above embodiment, the transceiver unit 1701 cooperates with the processing unit 1702 to perform the method performed by the first node according to the embodiment of the present application.

When the data processing apparatus 1700 is implemented as the second node in the above embodiment, the transceiver unit 1701 cooperates with the processing unit 1702 to perform the method performed by the second node according to the embodiment of the present application.

The division of the modules in the embodiments of the present application is schematically only one logic function division, and there may be another division manner in actual implementation, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, or may exist separately and physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules.

Based on the above embodiments and the same concept, the embodiments of the present application further provide a data processing apparatus, which is configured to implement the functions of the node provided by the embodiments of the present application. Fig. 18 shows a data processing apparatus 1800 according to an embodiment of the present application, where the data processing apparatus 1800 may be a node device, or may be a chip or a chip system in a node device.

In some embodiments of the present application, the data processing apparatus 1800 may also be a network device, an electronic device, or a chip, an integrated circuit, or other structure capable of performing the methods provided in the embodiments of the present application.

Illustratively, the data processing apparatus 1800 includes a transceiver 1801 and at least one processor 1802. The processor 1802 is coupled to the transceiver 1801, where the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be in electrical, mechanical, or other forms for information interaction between the devices, units, or modules.

In particular, the transceiver 1801 may be a circuit, a bus, a communication interface, or any other module that may be used to interact with information, and may be used to receive or transmit information.

Optionally, the data processing device 1800 may also include a memory 1803, the memory 1803 being coupled to the transceiver 1801 and the processor 1802 for storing program instructions.

The processor 1802 is configured to invoke program instructions stored in the memory 1803, so that the data processing apparatus 1800 performs a method performed by the first node or the second node or the third node in the methods provided in the embodiments of the present application.

The transceiver 1801 is used to receive and transmit radio frequency signals, and is coupled to a receiver and transmitter of the data processing device 1800. The transceiver 1801 communicates with communication networks and other devices, such as wireless local area networks (Wireless Local Area Networks, WLAN), bluetooth communication networks, mobile networks, etc., via radio frequency signals.

In particular implementations, the memory 1803 may include high-speed random access memory, and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 1803 may store an operating system (hereinafter referred to as a system), such as ANDROID, IOS, WINDOWS, or an embedded operating system such as LINUX. The memory 1803 may be used to store implementation programs of the embodiments of the present application. The memory 1803 may also store network communication programs that may be used to communicate with one or more additional devices, one or more user devices, and one or more network devices.

The processor 1802 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.

In some embodiments, the data processing apparatus 1800 may also include an output device 1804 and an input device 1805. The output device 1804 communicates with the processor 1802 and may display information in a variety of ways. For example, the output device 1804 may be a liquid crystal display (Liquid Crystal Display, LCD), a light emitting diode (Light Emitting Diode, LED) display device, a Cathode Ray Tube (CRT) display device, or a projector (projector), or the like. The input device 1805 and the processor 1802 are in communication and may receive input from a user in a variety of ways. For example, the input device 1805 may be a mouse, keyboard, touch screen device, or sensing device, among others. To facilitate user access to the output devices 1804 and the input devices 1805, in some embodiments, the memory 1803 may further store a user interface program that may vividly display the content of the application program through a graphical operation interface, and receive control operations of the application program from a user through input controls such as menus, dialog boxes, and buttons.

In one example, when the data processing apparatus 1800 is implemented as the first node in the above embodiments, the transceiver 1801, in cooperation with the processor 1802, may be used to perform the methods provided by the embodiments of the present application that are performed by the first node.

In one example, when the data processing apparatus 1800 is implemented as the second node in the above embodiments, the transceiver 1801, in cooperation with the processor 1802, may be used to perform the methods provided by the embodiments of the present application that are performed by the second node.

In one example, when the data processing apparatus 1800 is implemented as the third node in the above embodiments, the transceiver 1801, in cooperation with the processor 1802, may be used to perform the method performed by the third node provided in the embodiments of the present application.

Based on the above embodiments and the same conception, the present application embodiments also provide a data processing apparatus, the apparatus including a memory and at least one processor; the memory is used for storing a computer program; the processor is configured to execute the computer program stored in the memory, and implement the method provided in the foregoing embodiment.

Based on the above embodiments and the same concepts, the embodiments of the present application further provide a distributed database system, including the first node, at least one second node, and at least one third node described in the above embodiments.

Based on the above embodiments and the same conception, the present embodiments also provide a computer storage medium having a computer-readable program stored therein, which when run on a computer, causes the computer to perform the method provided in the above embodiments.

Based on the above embodiments and the same conception, the present application embodiment also provides a computer program product for causing a computer to execute the method provided in the above embodiments when the computer program product is run on the computer.

Based on the above embodiments and the same conception, the present embodiments also provide a chip for reading a computer program stored in a memory, and performing the method provided in the above embodiments.

Based on the above embodiments and the same conception, the present application also provides a chip system, which includes a processor for supporting a computer device to implement the method provided in the above embodiments.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A method for processing a global secondary index, applied to a first node, the method comprising:

generating a first indication, wherein the first indication is used for indicating to update or delete a first tuple meeting a first condition in the global secondary index; wherein the first condition includes: the value of the data belonging to the first target attribute in the tuple is a first data value; wherein the first target attribute does not contain a null value;

and sending the first indication to a second node for storing the global secondary index, so that the second node updates or deletes the first tuple meeting the first condition in the global secondary index according to the first indication.

2. The method of claim 1, wherein the global secondary index comprises at least one attribute in a base table; the first target attribute comprises a first attribute and/or a second attribute;

wherein the first attribute is one or more attributes of the at least one attribute, and the second attribute is an attribute for indicating location information, the location information being used for indicating a location, in the base table, of a tuple to which the data of the at least one attribute belongs.

3. The method of claim 2, wherein the second node comprises at least one data node;

when the first target attribute only comprises the first attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located; or

when the first target attribute only comprises the second attribute, the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the at least one data node; or

when the first target attribute includes the first attribute and the second attribute, the value of the data belonging to the first attribute in the first condition is used for determining a first data node where the first tuple is located, and the value of the data belonging to the second attribute in the first condition is used for determining the first tuple in the first data node.

4. A method as claimed in claim 2 or 3, wherein the location information for each tuple comprises first location information and second location information; the first location information is used for indicating a second data node where the tuple is located, and the second location information is used for indicating a storage location of the tuple in the second data node.

5. The method according to any one of claims 2 to 4, wherein

the first attribute is an index key of the global secondary index; or

part of the first attributes are index keys of the global secondary index, and the other attributes except the part of the first attributes are additional keys of the global secondary index.

6. The method of any of claims 2-5, wherein the second attribute is an additional key of the global secondary index.

7. The method of any of claims 2-6, wherein the global secondary index further comprises version information indicating a version of a tuple to which the data of the at least one attribute belongs, wherein the version of each tuple is used to indicate whether the data of the corresponding tuple is valid.

8. The method of any of claims 2-7, wherein prior to generating the first indication, the method further comprises:

receiving a second indication, wherein the second indication indicates updating or deleting first data in the base table, and the first data comprises the first tuple;

instructing, according to the second indication, a third node that stores the first data to update or delete the first data; and

receiving first target information from the third node, and determining the first indication according to the first target information;

wherein, when the second indication indicates updating the first data, the first target information comprises: the data belonging to the at least one attribute in the first data before the update, the location information corresponding to the first data before the update, the data belonging to the at least one attribute in the first data after the update, and the location information corresponding to the first data after the update; when the second indication indicates deleting the first data, the first target information comprises the data belonging to the at least one attribute in the first data and the location information corresponding to the first data; and the location information corresponding to any data indicates the location of that data in the base table.
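The step of determining the first indication from the first target information can be sketched as follows. This is a minimal Python illustration under stated assumptions: `RowImage`, `IndexIndication`, `build_indication`, and the `target_attrs` parameter are hypothetical names, and the application does not prescribe a message layout or how the non-null requirement is enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowImage:
    """Indexed-attribute values plus base-table location of one row (hypothetical)."""
    values: dict     # data belonging to the indexed attributes
    location: tuple  # (data node id, storage location) in the base table


@dataclass
class IndexIndication:
    """Hypothetical shape of the first indication sent to the index node."""
    op: str                               # "update" or "delete"
    condition: dict                       # target attribute -> required value
    new_entry: Optional[RowImage] = None  # replacement entry, updates only


def build_indication(before, target_attrs, after=None):
    """Build the indication from the before-image (and after-image, if updating)."""
    # The condition uses the pre-change values of the chosen target attributes.
    condition = {a: before.values[a] for a in target_attrs}
    # Assumed enforcement of "the first target attribute does not contain a null value".
    if any(v is None for v in condition.values()):
        raise ValueError("target attributes in the condition must not be null")
    if after is None:
        return IndexIndication("delete", condition)
    return IndexIndication("update", condition, after)


before = RowImage({"email": "a@x.io", "id": 7}, (1, 12))
after = RowImage({"email": "b@x.io", "id": 7}, (1, 12))
update_ind = build_indication(before, ["email", "id"], after)
delete_ind = build_indication(before, ["id"])
```

Because the condition is an exact match on non-null values, the index node can locate the affected entry directly rather than comparing every indexed column.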

9. The method of any one of claims 1-8, further comprising:

upon receiving a third indication for adding second data to the base table, determining, according to the third indication, a fourth node for storing the second data, and instructing the fourth node to add the second data to the base table;

receiving second target information from the fourth node, wherein the second target information comprises the data belonging to the at least one attribute in the second data and the location information corresponding to the second data, and the location information corresponding to the second data indicates the location of the second data in the base table;

and updating the global secondary index according to the second target information.
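The insert path in this claim can be sketched as follows. The sketch is a simplified, single-process Python illustration: the hash-based placement in `pick_node`, the `DataNode` class, and the flat `index` list are assumptions, whereas in the application the index itself is stored on one or more data nodes.

```python
import zlib


def pick_node(key, node_count):
    """Hypothetical placement rule: hash the distribution key to a data node."""
    return zlib.crc32(key.encode()) % node_count


class DataNode:
    """Stand-in for the fourth node that stores base-table rows."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []

    def add(self, row, indexed_attrs):
        """Store the row and return the second target information."""
        self.rows.append(row)
        location = (self.node_id, len(self.rows) - 1)
        return {a: row[a] for a in indexed_attrs}, location


def insert(row, dist_key, indexed_attrs, nodes, index):
    """Coordinator-side insert: place the row, then add an index entry."""
    node = nodes[pick_node(str(row[dist_key]), len(nodes))]
    values, location = node.add(row, indexed_attrs)
    index.append({**values, "location": location})
    return location


nodes = [DataNode(0), DataNode(1)]
index = []
loc = insert({"id": 1, "email": "a@x.io", "name": "A"}, "id", ["email"], nodes, index)
```

Only the indexed attribute (`email` here) and the row's location reach the index; the remaining columns stay in the base table.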

10. A method for processing a global secondary index, applied to a second node, the method comprising:

receiving a first indication from a first node, wherein the first indication indicates updating or deleting a first tuple that meets a first condition in the global secondary index; the first condition includes: the value of the data belonging to a first target attribute in the tuple is a first data value; the first target attribute does not contain a null value; and the global secondary index is stored in the second node; and

updating or deleting, according to the first indication, the first tuple that meets the first condition in the global secondary index.

11. The method of claim 10, wherein the global secondary index comprises at least one attribute in a base table; the first target attribute comprises a first attribute and/or a second attribute;

12. The method of claim 11, wherein the second node comprises at least one data node; before updating or deleting the first tuple in the global secondary index that satisfies the first condition according to the first indication, the method further comprises:

determining the first tuple;

wherein said determining said first tuple comprises:

when the first target attribute comprises only the first attribute, determining a first data node where the first tuple is located according to the value of the data belonging to the first attribute in the first condition, and searching the data stored in the first data node to obtain the first tuple; or

when the first target attribute comprises only the second attribute, searching for the first tuple in the at least one data node according to the value of the data belonging to the second attribute in the first condition; or

when the first target attribute comprises both the first attribute and the second attribute, determining the first data node where the first tuple is located according to the value of the data belonging to the first attribute in the first condition, and searching the first data node according to the value of the data belonging to the second attribute in the first condition to obtain the first tuple.
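The three lookup cases above can be sketched as follows. This Python illustration assumes that the first attribute decides which data node holds an index tuple through a hash (`route`); that routing rule, the function names, and the dictionary layout of an index tuple are hypothetical and are not taken from the application.

```python
import zlib


def route(value, node_count):
    """Hypothetical routing rule: hash the first-attribute value to a data node."""
    return zlib.crc32(repr(value).encode()) % node_count


def find_tuples(shards, first=None, second=None):
    """Return (node, position) pairs of index tuples that meet the condition.

    shards: one list of index tuples per data node; each tuple is a dict
            with keys "first" and "second".
    first, second: required values, or None if the attribute is not in the condition.
    """
    if first is None and second is None:
        raise ValueError("the condition needs at least one attribute")
    if first is not None:
        candidates = [route(first, len(shards))]  # a single node can hold the tuple
    else:
        candidates = range(len(shards))           # every node must be searched
    hits = []
    for n in candidates:
        for pos, t in enumerate(shards[n]):
            if (first is None or t["first"] == first) and (
                second is None or t["second"] == second
            ):
                hits.append((n, pos))
    return hits


# Populate three index shards using the same routing rule.
shards = [[], [], []]
for f, s in [("k1", 10), ("k1", 11), ("k2", 10)]:
    shards[route(f, 3)].append({"first": f, "second": s})

by_first = find_tuples(shards, first="k1")
by_second = find_tuples(shards, second=10)
by_both = find_tuples(shards, first="k1", second=11)
```

When the condition includes the first attribute, only one node is touched; a condition on the second attribute alone falls back to searching all nodes.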

13. The method of claim 11 or 12, wherein the location information corresponding to each tuple comprises first location information and second location information; the first location information is used for indicating a second data node where the tuple is located, and the second location information is used for indicating a storage location of the tuple in the second data node.

14. The method of any of claims 11-13, wherein,

15. The method of any of claims 11-14, wherein the second attribute is an additional key of the global secondary index.

16. The method of any of claims 11-15, wherein the global secondary index further comprises version information indicating a version of a tuple to which the data of the at least one attribute belongs, wherein the version of each tuple is used to indicate whether the data of the corresponding tuple is valid.

17. A data processing apparatus, the apparatus comprising:

a processing unit, configured to generate a first indication, wherein the first indication indicates updating or deleting a first tuple that meets a first condition in a global secondary index; the first condition includes: the value of the data belonging to a first target attribute in the tuple is a first data value; and the first target attribute does not contain a null value; and

a transceiver unit, configured to send the first indication to a second node that stores the global secondary index, so that the second node updates or deletes, according to the first indication, the first tuple that meets the first condition in the global secondary index.

18. The apparatus of claim 17, wherein the global secondary index comprises at least one attribute in a base table; the first target attribute comprises a first attribute and/or a second attribute;

19. The apparatus of claim 18, wherein the second node comprises at least one data node;

20. The apparatus of claim 18 or 19, wherein the location information for each tuple comprises first location information and second location information; the first location information is used for indicating a second data node where the tuple is located, and the second location information is used for indicating a storage location of the tuple in the second data node.

21. The apparatus of any of claims 18-20, wherein the global secondary index further comprises version information indicating a version of a tuple to which the data of the at least one attribute belongs, wherein the version of each tuple is used to indicate whether the data of the corresponding tuple is valid.

22. The apparatus of any of claims 18-21, wherein the processing unit, prior to generating the first indication, is further configured to:

receive, through the transceiver unit, a second indication, wherein the second indication indicates updating or deleting first data in the base table, and the first data comprises the first tuple;

instruct, through the transceiver unit and according to the second indication, a third node that stores the first data to update or delete the first data;

receive, through the transceiver unit, first target information from the third node, and determine the first indication according to the first target information;

23. The apparatus of any one of claims 17 to 22, wherein the processing unit is further configured to:

upon receiving, through the transceiver unit, a third indication for adding second data to the base table, determine, according to the third indication, a fourth node for storing the second data, and instruct the fourth node to add the second data to the base table;

receive, through the transceiver unit, second target information from the fourth node, wherein the second target information comprises the data belonging to the at least one attribute in the second data and the location information corresponding to the second data, and the location information corresponding to the second data indicates the location of the second data in the base table;

24. A data processing apparatus, the apparatus comprising:

a transceiver unit, configured to receive a first indication from a first node, wherein the first indication indicates updating or deleting a first tuple that meets a first condition in a global secondary index; the first condition includes: the value of the data belonging to a first target attribute in the tuple is a first data value; the first target attribute does not contain a null value; and the global secondary index is stored in the second node; and

a processing unit, configured to update or delete, according to the first indication, the first tuple that meets the first condition in the global secondary index.

25. The apparatus of claim 24, wherein the global secondary index comprises at least one attribute in a base table; the first target attribute comprises a first attribute and/or a second attribute;

26. The apparatus of claim 25, wherein the second node comprises at least one data node; before updating or deleting the first tuple in the global secondary index that satisfies the first condition according to the first indication, the processing unit is further configured to:

determine the first tuple;

wherein said determining said first tuple comprises:

27. The apparatus of claim 25 or 26, wherein the location information for each tuple comprises first location information and second location information; the first location information is used for indicating a second data node where the tuple is located, and the second location information is used for indicating a storage location of the tuple in the second data node.

28. The apparatus of any of claims 25-27, wherein the global secondary index further comprises version information indicating a version of a tuple to which the data of the at least one attribute belongs, wherein the version of each tuple is used to indicate whether the data of the corresponding tuple is valid.

29. A data processing apparatus, comprising a memory and at least one processor, wherein:

the memory is configured to store a computer program; and

the processor is configured to execute the computer program stored in the memory, to implement the method according to any one of claims 1 to 9, or to implement the method according to any one of claims 10 to 16.

30. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer-readable program which, when run on a computer, causes the computer to perform the method according to any one of claims 1-9 or the method according to any one of claims 10-16.