CN108595482B - Data indexing method and device

Info

Publication number: CN108595482B
Application number: CN201810205324.7A
Authority: CN
Inventors: 谢晓芹
Original assignee: Huawei Cloud Computing Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2022-06-10
Anticipated expiration: 2038-03-13
Also published as: CN108595482A; WO2019174558A1

Abstract

The embodiments of the application disclose a data indexing method and device for reducing invalid secondary index operations, and thereby the latency of secondary indexing, when a partition is split during a secondary index traversal. The method comprises: receiving a secondary index request that carries a secondary index condition and a first index position; acquiring a first index result of a first partition according to the secondary index condition, wherein the first index result comprises first data that meets the secondary index condition and a first cursor; and acquiring a second index result according to the first partition identifier and the second index position indicated by the first cursor, wherein the second index result comprises second data that meets the secondary index condition and a second cursor.

Description

Data indexing method and device

Technical Field

The present application relates to the field of communications, and in particular, to a data indexing method and apparatus.

Background

To accommodate the growing number of data entries in a distributed database system, a data table is often partitioned (Partition) so that the system can scale out dynamically; that is, performance increases linearly as data service nodes are added. The values of one column, or the combined values of several columns, of the user data table are partitioned according to a rule, and the column or column combination used for partitioning is called the partition key (Partition Key). There are generally two partitioning approaches. The first is consistent hash (Hash) partitioning, in which a hash is computed over the Partition Key value to determine the partition to which the data belongs. The second is range partitioning (Range Partition), in which data is stored according to a sort order of the Partition Key values and the Partition Key value range is divided into contiguous intervals; a piece of user data belongs to the partition whose interval contains its Partition Key value. For range partitioning in a distributed database, there are mainly three types of query: the first queries by a specified Partition Key value, in which case the Partition Key must be unique; the second queries by a condition on the Partition Key, in which case a prefix or a range of the Partition Key is specified; the third queries by a secondary index condition, in which case a prefix or a range of the secondary index column value is specified.
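The two partitioning approaches can be contrasted with a minimal sketch. It is illustrative only: the boundary keys "b" and "c", the function names, and the partition count are assumptions, and a plain modulo hash stands in for consistent hashing, which a real system would implement differently.

```python
import bisect
import hashlib

# Hypothetical split points of three range partitions over a string key:
# [MIN, "b"), ["b", "c"), ["c", MAX)
RANGE_SPLIT_POINTS = ["b", "c"]


def range_partition(key: str) -> int:
    """Return the 0-based index of the range partition that contains key."""
    # bisect_right counts the split points <= key, which is exactly the
    # partition index when every interval is left-closed and right-open.
    return bisect.bisect_right(RANGE_SPLIT_POINTS, key)


def hash_partition(key: str, num_partitions: int = 3) -> int:
    """Return a partition index derived from a hash of key.

    Simplification: plain modulo hashing, not consistent hashing.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With range partitioning, neighbouring keys land in the same or adjacent partitions, which is what makes prefix and range queries on the Partition Key possible; hash partitioning scatters neighbouring keys.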

In practice, if a partition is split or merged while it is being traversed by a query, the sort order of the secondary index changes, and if the traversal continues, the returned records may be repeated or omitted. The existing scheme gives priority to balancing: if the Partition being traversed is split or merged during a secondary index traversal, the secondary index traversal is executed again from the changed Partition.

Because the prior art gives priority to balancing, it can cause many invalid traversals and increase the latency of the user's traversal operation.

Disclosure of Invention

The embodiment of the application provides a data indexing method and device, which are used for reducing invalid secondary indexing operation and reducing time delay of secondary indexing when partitions are split in the secondary indexing process.

A first aspect of the present application provides a data indexing method, including: receiving a secondary index request, wherein the secondary index request carries a secondary index condition and a first index position, and the first index position is the position reached by the previous indexing operation; obtaining a first index result of a first partition according to the secondary index condition, wherein the first index result comprises first data meeting the secondary index condition and a first cursor, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used for indicating that a first initial partition is split when indexing to the first index position, and the first initial partition comprises the first partition; and acquiring a second index result according to the first partition identifier and the second index position, wherein the second index result comprises second data meeting the secondary index condition and a second cursor. In this embodiment, the multi-level cursor design decouples the traversal under the secondary index condition from the split and merge operations of the partitions so that they proceed independently of each other; as a result, the traversal of the secondary index returns results without omission, invalid secondary index operations are reduced, and the latency of secondary indexing is reduced.

In a possible design, in a first implementation manner of the first aspect of the embodiment of the present application, the obtaining a first index result of the first partition according to the secondary index condition includes: performing secondary indexing from a first starting position in the first partition according to the secondary index condition, wherein the first starting position is the first index position or the next index position closest to the first index position; and acquiring a first index result of the first partition. This embodiment refines the process of performing secondary indexing according to the secondary index condition and specifies the starting position of the secondary indexing.

In a possible design, in a second implementation manner of the first aspect of the embodiment of the present application, the first cursor is further configured to indicate the first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition. In the embodiment, the indication information in the cursor is added, the indication to the initial range is added, the traversal range of the secondary index is determined, and the process of the secondary index is accelerated.

In a possible design, in a third implementation manner of the first aspect of the embodiment of the present application, the obtaining a second index result according to the first partition identifier and the second index position includes: performing secondary indexing from a second starting position in the first sub-partition according to the first partition identifier, wherein the second starting position is the second indexing position or a next indexing position closest to the second indexing position; and acquiring a second index result of the first sub-partition, wherein the second index result comprises the second data and a second cursor, the second cursor indicates the first initial partition identifier, the first sub-partition identifier and a third index position, and the first sub-partition identifier is used for indicating that the first partition is split when indexing to the third index position. In this embodiment, the process of performing the secondary indexing again after the partition is split is defined, the starting position of performing the secondary indexing again is determined, and the cursor information for the next indexing is obtained.

In a possible design, in a fourth implementation manner of the first aspect of the embodiment of the present application, the method further includes: performing secondary indexing on a second sub-partition, the second sub-partition being included in the first partition; and acquiring a third index result of the second sub-partition, wherein the third index result comprises third data and a third cursor which meet the secondary index condition in the second sub-partition, the third cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier and a fourth index position, and the second sub-partition identifier is used for indicating that the first partition is split when indexing to the fourth index position. In this embodiment, after the indexing of the first sub-partition is completed, the secondary index traversal continues on the second sub-partition, which makes the steps of the embodiment of the present application more complete.

In one possible design, in a fifth implementation manner of the first aspect of the embodiment of the present application, the method further includes: if the second sub-partition and the second partition are merged into a merged partition, wherein the second partition is contained in the first initial partition, performing secondary indexing on the second sub-partition in the merged partition; and acquiring a fourth index result of the merged partition, wherein the fourth index result comprises fourth data and a fourth cursor which meet the secondary index condition in the second sub-partition, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier and a fifth index position. In this embodiment, when the second sub-partition and the second partition are merged, the range belonging to the second sub-partition in the merged region is first subjected to secondary index traversal, and the implementation manner of the embodiment of the present application is increased.

In a possible design, in a sixth implementation manner of the first aspect of the embodiment of the present application, the method further includes: if the second sub-partition in the merged partition is subjected to secondary indexing, performing secondary indexing on the second partition in the merged partition; and acquiring a fifth index result of the merged partition, wherein the fifth index result comprises fifth data and a fifth cursor which meet the secondary index condition in the second partition, the fifth cursor indicates the first initial partition identifier, the second partition identifier and a sixth index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position. In this embodiment, when the second sub-partition and the second partition are merged, the range belonging to the second partition in the merged region is first subjected to secondary index traversal, and the implementation manner of the embodiment of the present application is increased.

In one possible design, in a seventh implementation manner of the first aspect of the embodiment of the present application, the method further includes: if the second sub-partition completes the secondary indexing, performing the secondary indexing on the second partition; and acquiring a sixth index result of the second partition, wherein the sixth index result comprises sixth data and a sixth cursor which meet the secondary index condition in the second partition, the sixth cursor comprises the first initial partition identifier, a second partition identifier and a seventh index position, and the second partition identifier indicates that the first initial partition is split at the first index position. In this embodiment, when no partition merge occurs, after the secondary indexing of the second sub-partition is completed, secondary indexing continues on the range of the second partition, which adds a further implementation manner of the embodiment of the present application.

In a possible design, in an eighth implementation manner of the first aspect of the embodiment of the present application, the method further includes: if the first initial partition completes the secondary indexing, performing the secondary indexing on a second initial partition; and acquiring an index result of the second initial partition, wherein the index result of the second initial partition comprises data meeting the secondary index condition in the second initial partition and a seventh cursor, and the seventh cursor indicates a partition key range and an eighth index position of the second initial partition. In this embodiment, after the secondary indexing of the first initial partition is completed, the secondary indexing is continuously performed on the range of the second initial partition, and the implementation manner of the embodiment of the present application is increased.

In a possible design, in a ninth implementation manner of the first aspect of the embodiment of the present application, the partition key ranges all include values of the left boundary and do not include values of the right boundary. In this embodiment, the range of the partition key is limited, so that the present application is more logically strict.

A second aspect of the present application provides a data indexing apparatus, including: a receiving unit, configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position; a first obtaining unit, configured to obtain a first index result of a first partition according to the secondary index condition, where the first index result includes first data and a first cursor that satisfy the secondary index condition, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used to indicate that a first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition; and a second obtaining unit, configured to obtain a second index result according to the first partition identifier and the second index position, where the second index result includes second data meeting the secondary index condition and a second cursor. In this embodiment, the multi-level cursor design decouples the traversal under the secondary index condition from the split and merge operations of the partitions so that they proceed independently of each other; as a result, the traversal of the secondary index returns results without omission, invalid secondary index operations are reduced, and the latency of secondary indexing is reduced.

In a possible design, in a first implementation manner of the second aspect of the embodiment of the present application, the first obtaining unit is specifically configured to: performing secondary indexing from a first initial position in the first partition according to the secondary indexing condition, wherein the first initial position is the first indexing position or a next indexing position closest to the first indexing position; and acquiring a first index result of the first partition. In this embodiment, the starting position of the secondary index is determined by refining the process of performing the secondary index specifically according to the secondary index condition.

In a possible design, in a second implementation manner of the second aspect of the embodiment of the present application, the first cursor is further configured to indicate the first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition. In the embodiment, the indication information in the cursor is added, the indication to the initial range is added, the traversal range of the secondary index is determined, and the process of the secondary index is accelerated.

In a possible design, in a third implementation manner of the second aspect of the embodiment of the present application, the second obtaining unit is specifically configured to: performing secondary indexing from a second starting position in the first sub-partition according to the first partition identifier, wherein the second starting position is the second indexing position or a next indexing position closest to the second indexing position; and acquiring a second index result of the first sub-partition, wherein the second index result comprises the second data and a second cursor, the second cursor indicates the first initial partition identifier, the first sub-partition identifier and a third index position, and the first sub-partition identifier is used for indicating that the first partition is split when indexing to the third index position. In this embodiment, the process of performing the secondary indexing again after the partition is split is defined, the starting position of performing the secondary indexing again is determined, and the cursor information for the next indexing is obtained.

In a possible design, in a fourth implementation manner of the second aspect of the embodiment of the present application, the data indexing apparatus further includes: a first indexing unit, configured to perform secondary indexing on a second sub-partition, where the second sub-partition is included in the first partition; and a third obtaining unit, configured to obtain a third index result of the second sub-partition, where the third index result includes third data and a third cursor that satisfy the secondary index condition in the second sub-partition, the third cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fourth index position, and the second sub-partition identifier is used to indicate that the first partition is split when indexing to the fourth index position. In this embodiment, after the indexing of the first sub-partition is completed, the secondary index traversal continues on the second sub-partition, which makes the steps of the embodiment of the present application more complete.

In a possible design, in a fifth implementation manner of the second aspect of the embodiment of the present application, the data indexing apparatus further includes: a second indexing unit, configured to perform secondary indexing on the second sub-partition in the merged partition if the second sub-partition and the second partition are merged into the merged partition, where the second partition is included in the first initial partition; a fourth obtaining unit, configured to obtain a fourth index result of the merged partition, where the fourth index result includes fourth data and a fourth cursor that satisfy the second-level index condition in the second sub-partition, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fifth index position. In this embodiment, when the second sub-partition and the second partition are merged, the range belonging to the second sub-partition in the merged region is first subjected to secondary index traversal, and the implementation manner of the embodiment of the present application is increased.

In a possible design, in a sixth implementation manner of the second aspect of the embodiment of the present application, the data indexing apparatus further includes: a third indexing unit, configured to perform secondary indexing on the second partition in the merged partition if the secondary indexing is completed on the second sub-partition in the merged partition; a fifth obtaining unit, configured to obtain a fifth index result of the merged partition, where the fifth index result includes fifth data and a fifth cursor that satisfy the secondary index condition in the second partition, where the fifth cursor indicates the first initial partition identifier, the second partition identifier, and a sixth index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position. In this embodiment, when the second sub-partition and the second partition are merged, the range belonging to the second sub-partition in the merged region is first subjected to secondary index traversal, and the implementation manner of the embodiment of the present application is increased.

In a possible design, in a seventh implementation manner of the second aspect of the embodiment of the present application, the data indexing apparatus further includes: a fourth indexing unit, configured to perform secondary indexing on the second partition if the second sub-partition completes the secondary indexing; and a sixth obtaining unit, configured to obtain a sixth index result of the second partition, where the sixth index result includes sixth data and a sixth cursor that satisfy the secondary index condition in the second partition, the sixth cursor includes the first initial partition identifier, a second partition identifier, and a seventh index position, and the second partition identifier indicates that the first initial partition is split at the first index position. In this embodiment, when no partition merge occurs, after the secondary indexing of the second sub-partition is completed, secondary indexing continues on the range of the second partition, which adds a further implementation manner of the embodiment of the present application.

In a possible design, in an eighth implementation manner of the second aspect of the embodiment of the present application, the data indexing apparatus further includes: a fifth indexing unit, configured to perform secondary indexing on a second initial partition if the first initial partition completes the secondary indexing; a seventh obtaining unit, configured to obtain an index result of the second initial partition, where the index result of the second initial partition includes data in the second initial partition that meets the secondary index condition and a seventh cursor, and the seventh cursor indicates a partition key range and an eighth index position of the second initial partition. In this embodiment, after the secondary indexing of the first initial partition is completed, the secondary indexing is continuously performed on the range of the second initial partition, and the implementation manner of the embodiment of the present application is increased.

In a possible design, in a ninth implementation manner of the second aspect of the embodiment of the present application, the partition key ranges all include values of the left boundary and do not include values of the right boundary. In this embodiment, the range of the partition key is limited, so that the present application is more logically strict.

A third aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.

A fourth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above-described aspects.

Drawings

FIG. 1 is a diagram illustrating a network architecture according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an application scenario of the data indexing method in the embodiment of the present application;

FIG. 3 is a schematic diagram of an embodiment of a data indexing method in an embodiment of the present application;

FIG. 4 is a schematic diagram of an application scenario of the data indexing method in the embodiment of the present application;

FIG. 5 is a schematic diagram of an embodiment of a data indexing device in an embodiment of the present application;

FIG. 6 is a schematic diagram of another embodiment of a data indexing device in the embodiment of the present application;

FIG. 7 is a schematic diagram of another embodiment of the data indexing device in the embodiment of the present application.

Detailed Description

To enable those skilled in the art to better understand the solutions of the present application, the embodiments of the present application are described below with reference to the accompanying drawings.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Embodiments of the present application are applicable to distributed databases that employ range partitioning and may be applied in the network framework shown in FIG. 1. In this framework, the distributed database system is divided into four contiguous partitions according to the value range of the partition key (Partition Key): partition 1 (Partition1), partition 2 (Partition2), partition 3 (Partition3), and partition 4 (Partition4). Here x is a value of the partition key, minKey is the minimum value of the partition key, and maxKey is the maximum value of the partition key. The partition key value range of partition 1 is (minKey, -75), that of partition 2 is (-75, 25), that of partition 3 is (25, 175), and that of partition 4 is (175, maxKey); partitions 1 to 4 are arranged in order from left to right. When the main records that meet a condition need to be queried by a prefix or a range of a secondary index, with the guarantee that the traversal result contains no repetition and no omission (newly added records are not guaranteed to be returned), the secondary index of each partition is traversed in range order according to the partition view (Partition Range Map), and the result is returned through a query interface.
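The FIG. 1 layout can be sketched as a small lookup structure. This is a hedged illustration, not the system's actual data model: the `Partition` class and both function names are hypothetical, negative and positive infinity stand in for minKey and maxKey, and each range is treated as left-closed and right-open in line with the convention this description states for partition key ranges.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    low: float   # inclusive lower bound of the partition key range
    high: float  # exclusive upper bound of the partition key range


# Ranges taken from FIG. 1; -inf / +inf are stand-ins for minKey / maxKey.
PARTITIONS = [
    Partition("Partition1", -math.inf, -75),
    Partition("Partition2", -75, 25),
    Partition("Partition3", 25, 175),
    Partition("Partition4", 175, math.inf),
]


def locate(x: float) -> Partition:
    """Return the partition whose range contains partition key value x."""
    for partition in PARTITIONS:
        if partition.low <= x < partition.high:
            return partition
    raise ValueError(f"no partition contains key {x!r}")


def traversal_order() -> list[str]:
    """Return partition names in range order, i.e. left to right."""
    return [p.name for p in sorted(PARTITIONS, key=lambda p: p.low)]
```

Because the ranges are contiguous and non-overlapping, every key value maps to exactly one partition, and a traversal in range order visits each partition once.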

In the network framework shown in FIG. 1, the partition view (Partition Range Map) serves as a range index of the user data table. It records the identifier (Id) of each partition, the range (Range) of the partition, and the address of the server to which the partition belongs, so that when a data record is created, updated, or deleted, the partition to which it belongs can be located according to the Partition Key. For example, as shown in FIG. 2, a user configures the user data table to be partitioned by the name column, that is, name is the Partition Key. The system divides the user data table into three partitions whose ranges are contiguous ranges of name values: partition 1 (Partition1), partition 2 (Partition2), and partition 3 (Partition3). The Partition Key value range of Partition1 is [MIN, "b"), that of Partition2 is ["b", "c"), and that of Partition3 is ["c", MAX), where MIN is the minimum value of the Partition Key and MAX is the maximum value of the Partition Key. The user data table also includes a unique row identifier (rowid), age (age), address (address), and student identification (student id). To support complex query scenarios, indexes are also built on other columns of the user data table: a secondary index table is created by sorting the values of the column on which each secondary index is built, as shown in FIG. 2. The secondary index table shown in FIG. 2 records the unique row identifier (rowid), the partition key (name), and the secondary index column (student id) of each user data row; the corresponding user data is located by the name column and the rowid column, so obtaining the user data takes two accesses. When Partition1 is split during the secondary indexing of Partition1, the distributed database system needs to complete the Partition split and then continue the secondary index operation, so that invalid secondary index operations are reduced and the latency of secondary indexing is reduced.
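The two-access lookup through a secondary index table can be sketched as follows. The sample rows, the dictionary layout, and the function names are assumptions made for illustration; only the column roles (rowid, name as Partition Key, student id as secondary index column) come from the description of FIG. 2.

```python
import bisect

# Hypothetical main records, keyed by (name, rowid) as the description
# says user data is located by the name column and the rowid column.
USER_TABLE = {
    ("amy", 1): {"age": 19, "address": "addr-1", "student": 42},
    ("bob", 2): {"age": 21, "address": "addr-2", "student": 7},
    ("cat", 3): {"age": 20, "address": "addr-3", "student": 15},
    ("amy", 4): {"age": 22, "address": "addr-4", "student": 60},
}


def build_secondary_index(table):
    """Return entries (student, rowid, name) sorted by student, then rowid."""
    return sorted(
        (row["student"], rowid, name) for (name, rowid), row in table.items()
    )


def query_by_student(index, table, low, high):
    """Return main records with low <= student <= high, in index order."""
    # First access: seek into the sorted secondary index.
    start = bisect.bisect_left(index, (low,))
    results = []
    for student, rowid, name in index[start:]:
        if student > high:
            break
        # Second access: fetch the main record by (name, rowid).
        results.append(table[(name, rowid)])
    return results
```

Storing the rowid alongside the indexed value keeps index entries unique even when two rows share the same student id or the same name.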

For convenience of understanding, a specific flow of an embodiment of the present application is described below, and referring to fig. 3, an embodiment of a data indexing method in an embodiment of the present application includes:

301. A secondary index request is received.

A secondary index request is received, where the secondary index request carries a secondary index condition and a first index position; the first index position is the position at which data was last obtained from the secondary index table during the previous secondary index query. The traversal for the secondary index condition is converted into a traversal of the secondary index table of each partition in each namespace.

For example, in the user data table shown in FIG. 2, the "name" column is the partition key and the "student" column is the secondary index column. The user data table has been divided by the value of the "name" column into three partitions, Partition1, Partition2, and Partition3, and the partition view is as follows:

Partition: 1, [MIN, "b"), server1_ip

Partition: 2, ["b", "c"), server2_ip

Partition: 3, ["c", MAX), server3_ip

where the numbers 1, 2, and 3 are the Partition names; [MIN, "b") is the Partition key range of Partition1, ["b", "c") is that of Partition2, and ["c", MAX) is that of Partition3; server1_ip, server2_ip, and server3_ip are the server addresses of Partition1, Partition2, and Partition3, respectively. The secondary index condition is: "student" is greater than or equal to 10 and less than or equal to 50. The data indexing device sends the secondary index request and the secondary index condition to server1_ip, specifying Partition: 1, [MIN, "b"), to query the records whose "student" is greater than or equal to 10 and less than or equal to 50; sends the secondary index request and the secondary index condition to server2_ip, specifying Partition: 2, ["b", "c"), to query the records whose "student" is greater than or equal to 10 and less than or equal to 50; and sends the secondary index request and the secondary index condition to server3_ip, specifying Partition: 3, ["c", MAX), to query the records whose "student" is greater than or equal to 10 and less than or equal to 50. Optionally, the user may also specify the "name" range, for example ["a", "b").
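The fan-out of one secondary index request across the partition view can be sketched as below. The request dictionary layout, the `PartitionEntry` class, and the use of `None` for MIN and MAX are assumptions for illustration, and restricting the fan-out to partitions that overlap a user-specified "name" range is one plausible reading of the optional range, not something the text spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionEntry:
    pid: int
    low: Optional[str]   # inclusive lower bound; None stands for MIN
    high: Optional[str]  # exclusive upper bound; None stands for MAX
    server: str


# Partition view from the example above.
PARTITION_VIEW = [
    PartitionEntry(1, None, "b", "server1_ip"),
    PartitionEntry(2, "b", "c", "server2_ip"),
    PartitionEntry(3, "c", None, "server3_ip"),
]


def overlaps(entry, low, high):
    """True if the entry's range intersects the half-open range [low, high)."""
    entirely_below = (
        entry.high is not None and low is not None and entry.high <= low
    )
    entirely_above = (
        entry.low is not None and high is not None and high <= entry.low
    )
    return not (entirely_below or entirely_above)


def plan_requests(condition, name_low=None, name_high=None):
    """Build one request per partition, in range order, for the condition."""
    return [
        {"server": e.server, "partition": e.pid, "condition": condition}
        for e in PARTITION_VIEW
        if overlaps(e, name_low, name_high)
    ]
```

For instance, `plan_requests((10, 50))` yields one request for each of the three servers, while `plan_requests((10, 50), "a", "b")` yields a request for Partition 1 only.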

It should be noted that, in range partitioning, when a partition is split (that is, one partition is split into two or more partitions), part of the data under the original partition is migrated to a new partition according to the order of the partition key, and the secondary index table is rebuilt; when two partitions with adjacent ranges are merged, the data under the right partition is migrated in order back to the left partition, and the secondary index table is rebuilt. For example, the ranges of Partition: 1, [MIN, "b") and Partition: 2, ["b", "c") are contiguous in the PartitionKey namespace, so Partition1 is called the left partition and Partition2 is called the right partition. To quickly determine which partition a secondary index record belongs to when a partition is split, the Partition Key is generally stored in each secondary index table record, so that when the partition traverses the secondary index table to migrate records, the partition to which a secondary index record belongs can be determined without accessing the main record. The rowid of the main record may also be recorded in the secondary index table, which guarantees the uniqueness of secondary index records and handles the case where the Partition Key is not unique.

It is understood that a partition key range [X, Y) in the description of this embodiment and the following embodiments may also be written in shortened form as (X, Y). In either notation the range follows the left-closed, right-open principle: it includes the value of the left boundary and excludes the value of the right boundary.

302. Acquire a first index result of the first partition according to the secondary index condition.

A first index result of the first partition is acquired according to the secondary index condition. The first index result includes first data meeting the secondary index condition and a first cursor. The first cursor indicates a first partition identifier and a second index position; the first partition identifier indicates that the first initial partition was split when indexing reached the first index position, and the first initial partition includes the first partition. When the secondary index table is indexed in the first partition, the returned first index result includes the first data and the first cursor, and the cursor records the following basic information: the first partition identifier, such as split [MIN-"d", pos0), indicating that the first partition [MIN, "b") was split when indexing reached pos0; and the second index position, i.e., the position currently reached in the secondary index traversal, which may be composed of the value of the secondary index column and the rowid, such as pos1.

It should be noted that the first cursor is also used to indicate a first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition.

For example, for the user data table shown in FIG. 2, the range covered by Partition1, Partition2, and Partition3 taken together is referred to as the first initial partition. During the secondary index traversal of Partition1, Partition1 is split into Partition4 and Partition5, where Partition4 lies to the left of Partition5. The split sub-partitions (i.e., Partition4 and Partition5) are then traversed in order, and the secondary index table of each sub-partition is traversed starting from the secondary index position reached before the split. The cursor therefore needs to carry the KeyRange of the post-split partition and the secondary index position before the split.

303. Acquire a second index result according to the first partition identifier and the second index position.

A second index result is acquired according to the first partition identifier and the second index position, where the second index result includes second data meeting the secondary index condition and a second cursor.

After the first initial partition is split into a first partition and a second partition, the first partition is further split into a first sub-partition and a second sub-partition. Secondary indexing continues on the first sub-partition according to the first cursor, and the required data and a second cursor are obtained. The second cursor includes the first initial partition identifier, the identifier of the first partition, a first sub-partition identifier, and the current index position (a third index position) of the first sub-partition. The first initial partition identifier indicates the partition key range of the first initial partition; the identifier of the first partition indicates that the partition key range of the first initial partition was split when the traversal reached the first index position; and the first sub-partition identifier indicates that the first partition was split when indexing reached the second index position.

Note that the start position of the secondary index differs for each new partition after the split. Specifically, secondary indexing is performed from a first start position in the first partition according to the secondary index condition, where the first start position is the first index position or the next index position closest to the first index position. For example, suppose the secondary index list of the first initial partition is 1, 2, 3, 4, 5, 6, 7, ..., 21, and the first initial partition is split into the first partition and the second partition when the traversal is at position 4. The secondary index list of the first partition is 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, and the secondary index list of the second partition is 2, 4, 6, 8, 10, 12, 14, 16, 18, 20. The start position for secondary indexing in the first partition is 5, and the start position for secondary indexing in the second partition is 4. After the first partition finishes the secondary index traversal, the next adjacent split range is looked up, and the secondary index traversal starts on the second partition.
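The start positions in this example can be reproduced with a short sketch. The function names `start_position` and `remaining` are hypothetical, and the sketch assumes that each sub-partition keeps its secondary index as a sorted list and that the record at the split position itself has already been returned to the caller.

```python
from bisect import bisect_left, bisect_right


def start_position(sub_index, split_pos):
    """Return split_pos if the sub-partition holds it, else the next larger entry."""
    i = bisect_left(sub_index, split_pos)
    return sub_index[i] if i < len(sub_index) else None


def remaining(sub_index, split_pos):
    """Entries still to be returned, assuming split_pos itself was already returned."""
    return sub_index[bisect_right(sub_index, split_pos):]


initial = list(range(1, 22))                 # 1..21, split while at position 4
first = [v for v in initial if v % 2 == 1]   # 1, 3, ..., 21
second = [v for v in initial if v % 2 == 0]  # 2, 4, ..., 20

print(start_position(first, 4), start_position(second, 4))  # 5 4
print(remaining(first, 4)[:3], remaining(second, 4)[:3])     # [5, 7, 9] [6, 8, 10]
```

Because position 4 does not exist in the first partition, its traversal resumes at the closest following entry, 5; the second partition does contain 4, so its start position is 4 and the first entry still to be returned is 6.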

It is understood that if a split occurs again during the traversal of the first sub-partition or the second sub-partition, the processing is similar to that of step 303, and details are not repeated here. If the partition being traversed is merged with another partition during the traversal, the cursor is not affected, and no record of the merged partition's key range needs to be added. This is because, if the PartitionKey value is stored in the records of the secondary index table, it can be determined directly whether the PartitionKey belongs to the partition key range recorded in the cursor; if the PartitionKey value is not stored in the secondary index record, it can be obtained from the main record when the main record is read, and it can then be determined whether it belongs to the range to be traversed. Records that do not fall within the partition key range of the cursor can be filtered out directly.
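The filtering step after a merge can be illustrated with a small sketch. It assumes that each secondary index record stores the partition key next to the indexed value; the function name `scan_with_cursor_range`, the sample names, and the tuple layout are hypothetical and chosen only for illustration.

```python
def scan_with_cursor_range(index_records, key_range, after_pos):
    """Return indexed values past after_pos whose partition key lies in [low, high)."""
    low, high = key_range
    kept = []
    for student_id, partition_key in index_records:
        if student_id <= after_pos:
            continue  # already returned by an earlier call
        if low <= partition_key < high:
            kept.append(student_id)  # belongs to the range recorded in the cursor
        # otherwise the record belongs to the other half of the merged partition
    return kept


# Secondary index of a merged partition covering [B, F): (studentId, name)
merged_index = [(16, "Dana"), (18, "Eve"), (19, "Carl")]

print(scan_with_cursor_range(merged_index, ("B", "D"), 15))  # [19]
print(scan_with_cursor_range(merged_index, ("D", "F"), 15))  # [16, 18]
```

The cursor's key range alone decides which records are returned, so the merge does not require any change to the cursor.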

In this embodiment of the application, when the first partition on which secondary indexing is being performed is split into a first sub-partition and a second sub-partition, a first partition split identifier is added to the returned cursor. This ensures that the secondary index traversal can continue even if partitions are split or merged during the traversal, without re-traversal, so that the index traversal and partition splitting or merging do not affect each other, invalid secondary index operations are reduced, and the latency of secondary indexing is reduced.

For ease of understanding, a specific application scenario is described below as an example. Referring to FIG. 4, another embodiment of the data indexing method in the embodiments of the present application includes:

The distributed database system includes Partition0, Partition1, and Partition2, which are named the zeroth initial partition, the first initial partition, and the second initial partition, respectively. The partition key of the user data table is the "name" column, ordered lexicographically, and the KeyRange is [A-F]; the secondary index is the "studentId" column, sorted by value. For simplicity of description, it is assumed that the studentId values are unique, so that the traversal position of the secondary index can be represented by the studentId value alone. It is understood that when the values of the secondary index column are not unique, studentId may be replaced with StudentId-Rowid.

Specifically, the user issues a secondary index request to obtain all student information with studentId < 100. The data indexing device decomposes the request into secondary index traversal operations on a plurality of partitions according to the current partition view (Partition Range Map), where the Partition Range Map is as follows:

Partition0[MIN，“A”)

Partition1[“A”，“F”)

Partition2[“F”，MAX)

Here, the secondary index traversal of Partition1 is taken as an example, and the specific process is as follows:

(1) While the data indexing device traverses Partition1 according to the secondary index condition, the distributed database system returns the traversal result 1, 2, 3, 4 together with a cursor that carries the current partition key range and the secondary index position, specifically: cursor1 = Range(A-F) pos(4). The corresponding secondary index table is: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21. The secondary index table may contain more data; this embodiment uses only 21 entries as an example;

(2) As the secondary index traversal continues, Partition1 is split into Partition3 (A-D) and Partition4 (D-F). The secondary index table of Partition3 is: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, and the secondary index table of Partition4 is: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20. The traversal continues from Partition3 with a start position of 4 (excluding that record) and obtains the data records with studentId = 5, 7. The system then returns the traversal result: the data records with studentId = 5, 7, and the returned cursor is: cursor2 = Range(A-F) split1(A-D, 4) pos(7), where Range(A-F) represents the initial partition key range of the first initial partition being traversed, split1(A-D, 4) indicates that the previous Range(A-F) was split when the traversal reached secondary index pos = 4, and pos(7) indicates that the partition in which split1(A-D) lies has been traversed to secondary index pos = 7, that is, the current index position;

(3) As the secondary index traversal continues, if Partition3 is not split, then after the secondary index traversal of Partition3 is completed, the next adjacent Partition4 can be traversed according to split1(A-D, 4) in the cursor and the Partition Range Map, obtaining the data records greater than 4 and a cursor. The cursor returned at this time is: cursor3 = Range(A-F) split2(D-F, 4) pos(10).

(4) As the secondary index traversal continues, if Partition3 is split again when the traversal reaches pos = 7, into Partition5 (A-B) and Partition6 (B-D), the traversal continues from Partition5 with a start position of 7 (excluding that record) and obtains the record with studentId = 9. The system then returns the traversal result: the data record with studentId = 9 and a cursor. The cursor returned at this time is: cursor4 = Range(A-F) split1(A-D, 4) split2(A-B, 7) pos(9), where Range(A-F) represents the initial partition key range of the first initial partition being traversed, split1(A-D, 4) indicates that the previous Range(A-F) was split when the traversal reached secondary index pos = 4, split2(A-B, 7) indicates that the previous Range(A-D) was split when the traversal reached secondary index pos = 7, and pos(9) indicates that the partition in which split2(A-B) lies has been traversed to secondary index pos = 9.

(5) After Partition5 has been traversed, the next adjacent Partition6 (B-D) is found according to the Partition Range Map and split2(A-B, 7) recorded in the cursor, and the secondary index traversal continues there. The returned secondary index traversal result is: the data records with studentId = 11, 15 and a cursor. The cursor returned at this time is: cursor5 = Range(A-F) split1(A-D, 4) split2(B-D, 7) pos(15), where Range(A-F) represents the initial partition key range of the first initial partition being traversed, split1(A-D, 4) indicates that the previous Range(A-F) was split when the traversal reached secondary index pos = 4, split2(B-D, 7) indicates that the previous Range(A-D) was split when the traversal reached secondary index pos = 7, and pos(15) indicates that the partition in which split2(B-D) lies has been traversed to secondary index pos = 15.

(6) If Partition6 is neither split nor merged, then after its traversal is finished, the next adjacent Partition4 (D-F) is found according to the Partition Range Map and split2(B-D, 7) recorded in the cursor, and the secondary index traversal continues on the studentId column. The system then returns the traversal result: the data records with studentId = 6, 8, 10 and a cursor. The cursor returned at this time is: cursor3 = Range(A-F) split2(D-F, 4) pos(10). When that traversal is finished, the traversal of the entire Range(A-F) of the first initial partition is finished.

(7) If, during the traversal of Partition6, Partition6 and Partition4 are merged into Partition7 (B-F), the traversal is mapped to Partition7 (B-F) according to split2(B-D, 7) recorded in the cursor. The cursor does not change at this point, and the secondary index table of the studentId column in Partition7 continues to be traversed from pos(15). Because the secondary index sequences of the original Partition6 and Partition4 have been merged in Partition7, each data record encountered during the traversal is checked, and any record whose partition key does not belong to split2(B-D) is skipped. Assuming the records with studentId = 16, 18, 19 are traversed, records 16 and 18 are discarded, the record with studentId = 19 is returned, and the cursor is: cursor6 = Range(A-F) split1(A-D, 4) split2(B-D, 7) pos(19).

(8) When the range of split2(B-D) in Partition7 (B-F) has been traversed, the next adjacent range split(D-F) is obtained according to the Partition Range Map. Since split1(A-D, 4) in the cursor has now also been fully traversed, this information can be removed. The secondary index traversal continues from pos(15) and encounters the data records with studentId = 16, 18, 19, of which record 19 does not belong to the partition key range (D-F) and is therefore discarded. Only the records with studentId = 16, 18 are returned, carrying the cursor: cursor7 = Range(A-F) split1(D-F, 4) pos(19).

(9) After the partition key Range (D-F) index traversal is completed, the initial Range (A-F) has completed the secondary index traversal.
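Steps (3), (5), (6), and (8) all look up the next adjacent range from the Partition Range Map. One possible way to do this is sketched below. The function `next_adjacent`, the dictionary layout, and the string sentinels used for MIN and MAX are assumptions made for illustration; the embodiment does not specify how the lookup is implemented.

```python
MIN, MAX = "", "\uffff"  # hypothetical sentinels standing in for MIN and MAX


def next_adjacent(range_map, current, initial):
    """Return the partition holding the range that starts where `current` ends.

    range_map maps partition name -> (low, high), left-closed and right-open.
    Returns None once `current` reaches the end of the initial range.
    """
    cur_high = current[1]
    if cur_high >= initial[1]:
        return None  # the whole initial range has been traversed
    for name, (low, high) in range_map.items():
        if low <= cur_high < high:
            return name
    return None


after_split = {
    "Partition0": (MIN, "A"), "Partition3": ("A", "D"),
    "Partition4": ("D", "F"), "Partition2": ("F", MAX),
}
after_merge = {
    "Partition0": (MIN, "A"), "Partition5": ("A", "B"),
    "Partition7": ("B", "F"), "Partition2": ("F", MAX),
}

print(next_adjacent(after_split, ("A", "D"), ("A", "F")))  # Partition4
print(next_adjacent(after_merge, ("B", "D"), ("A", "F")))  # Partition7
print(next_adjacent(after_split, ("D", "F"), ("A", "F")))  # None
```

Looking up the partition that contains the right boundary of the current range, rather than one that starts exactly there, lets the same lookup work after a merge, as in step (8) where the range D-F lives inside Partition7 (B-F).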

In this embodiment, the multi-level cursor design decouples the traversal under the secondary index condition from the splitting and merging of partitions, so that the two are independent and do not affect each other. This ensures that the result returned by the secondary index traversal has no omissions, reduces invalid secondary index operations, and reduces the latency of secondary indexing.
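A multi-level cursor of this kind could be modelled as an immutable record that gains one split level each time the range being traversed is split. The sketch below is illustrative only: the `Cursor` and `Split` classes and their method names are hypothetical, and the textual form produced by `render` simply mirrors the notation used for cursor1, cursor2, cursor4, and cursor5 in the scenario above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Split:
    key_range: Tuple[str, str]  # sub-range being traversed, left-closed right-open
    pos: int                    # index position at which the parent range was split


@dataclass(frozen=True)
class Cursor:
    initial_range: Tuple[str, str]
    splits: Tuple[Split, ...] = ()
    pos: int = 0

    def advance(self, pos):
        """Record the current secondary index position."""
        return replace(self, pos=pos)

    def on_split(self, sub_range, split_pos):
        """Add one split level when the range being traversed is split."""
        return replace(self, splits=self.splits + (Split(sub_range, split_pos),))

    def next_sibling(self, sub_range):
        """Switch the innermost split level to the adjacent sub-range."""
        last = self.splits[-1]
        return replace(self, splits=self.splits[:-1] + (Split(sub_range, last.pos),))

    def render(self):
        parts = ["Range(%s-%s)" % self.initial_range]
        for i, s in enumerate(self.splits, 1):
            parts.append("split%d(%s-%s, %d)" % (i, s.key_range[0], s.key_range[1], s.pos))
        parts.append("pos(%d)" % self.pos)
        return " ".join(parts)


cursor1 = Cursor(("A", "F")).advance(4)
cursor2 = cursor1.on_split(("A", "D"), 4).advance(7)
cursor4 = cursor2.on_split(("A", "B"), 7).advance(9)
cursor5 = cursor4.next_sibling(("B", "D")).advance(15)
print(cursor5.render())  # Range(A-F) split1(A-D, 4) split2(B-D, 7) pos(15)
```

Keeping the cursor immutable means every returned cursor is a self-contained snapshot that a client can send back with its next request, regardless of later splits or merges on the server side.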

With reference to fig. 5, the data indexing apparatus in the embodiment of the present application is described, and an embodiment of the data indexing apparatus in the embodiment of the present application includes:

a receiving unit 501, configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position;

a first obtaining unit 502, configured to obtain a first index result of a first partition according to the secondary index condition, where the first index result includes first data and a first cursor that satisfy the secondary index condition, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used to indicate that a first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition;

a second obtaining unit 503, configured to obtain a second index result according to the first partition identifier and the second index position, where the second index result includes second data and a second cursor that satisfy the secondary index condition.

Referring to fig. 6, another embodiment of the data indexing apparatus in the embodiment of the present application includes:

a receiving unit 601, configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position;

a first obtaining unit 602, configured to obtain a first index result of a first partition according to the secondary index condition, where the first index result includes first data and a first cursor that satisfy the secondary index condition, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used to indicate that a first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition;

a second obtaining unit 603, configured to obtain a second index result according to the first partition identifier and the second index position, where the second index result includes second data and a second cursor that satisfy the secondary index condition.

In a possible implementation manner, the first obtaining unit 602 is specifically configured to:

performing secondary indexing from a first initial position in the first partition according to the secondary indexing condition, wherein the first initial position is the first indexing position or a next indexing position closest to the first indexing position;

and acquiring a first index result of the first partition.

In a possible implementation manner, the first cursor is further used for indicating the first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition.

In a possible implementation manner, the second obtaining unit 603 is specifically configured to:

performing secondary indexing from a second starting position in the first sub-partition according to the first partition identifier, wherein the second starting position is the second indexing position or a next indexing position closest to the second indexing position;

and acquiring a second index result of the first sub-partition, wherein the second index result comprises the second data and a second cursor, the second cursor indicates the first initial partition identifier, the first sub-partition identifier and a third index position, and the first sub-partition identifier is used for indicating that the first partition is split when indexing to the third index position.

In a possible implementation manner, the data indexing apparatus further includes:

a first indexing unit 604, configured to perform secondary indexing on a second sub-partition, where the second sub-partition is included in the first partition;

a third obtaining unit 605, configured to obtain a third index result of the second sub-partition, where the third index result includes third data and a third cursor that satisfy the secondary index condition in the second sub-partition, the third cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fourth index position, and the second sub-partition identifier is used to indicate that the first partition is split when indexing to the fourth index position.

In one possible implementation manner, the data indexing apparatus further includes:

a second indexing unit 606, configured to perform secondary indexing on the second sub-partition in the merged partition if the second sub-partition and the second partition are merged into the merged partition, where the second partition is included in the first initial partition;

a fourth obtaining unit 607, configured to obtain a fourth index result of the merged partition, where the fourth index result includes fourth data and a fourth cursor in the second sub-partition, where the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fifth index position, and the fourth data meets the secondary index condition.

a third indexing unit 608, configured to perform secondary indexing on the second partition in the merged partition if the secondary indexing is completed on the second sub-partition in the merged partition;

a fifth obtaining unit 609, configured to obtain a fifth index result of the merged partition, where the fifth index result includes fifth data and a fifth cursor that satisfy the secondary index condition in the second partition, where the fifth cursor indicates the first initial partition identifier, the second partition identifier, and a sixth index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position.

a fourth indexing unit 610, configured to perform secondary indexing on the second partition if the second sub-partition completes the secondary indexing;

a sixth obtaining unit 611, configured to obtain a sixth index result of the second partition, where the sixth index result includes sixth data and a sixth cursor that satisfy the secondary index condition in the second partition, and the sixth cursor includes the first initial partition identifier, a second partition identifier, and the seventh index position, and the second partition identifier indicates that the first initial partition is split at the first index position.

a fifth indexing unit 612, configured to perform secondary indexing on a second initial partition if the first initial partition completes the secondary indexing;

a seventh obtaining unit 613, configured to obtain an index result of the second initial partition, where the index result of the second initial partition includes data in the second initial partition that meets the secondary index condition and a seventh cursor, and the seventh cursor indicates a partition key range and an eighth index position of the second initial partition.

In one possible implementation, the partition key ranges all contain the value of the left boundary and do not contain the value of the right boundary.

FIG. 5 and FIG. 6 describe the data indexing device in the embodiments of the present application in detail from the perspective of modular functional entities; the data indexing device in the embodiments of the present application is described below in detail from the perspective of hardware processing.

FIG. 7 is a schematic structural diagram of a data indexing device 700 according to an embodiment of the present application. The data indexing device 700 may vary considerably depending on configuration or performance, and may include one or more processors (CPUs) 701, a memory 709, and one or more storage media 708 (e.g., one or more mass storage devices) storing applications 707 or data 706. The memory 709 and the storage medium 708 may be transient or persistent storage. The program stored on the storage medium 708 may include one or more modules (not shown), each of which may include a series of instruction operations on the data indexing device. Further, the processor 701 may be configured to communicate with the storage medium 708 and to execute, on the data indexing device 700, the series of instruction operations in the storage medium 708.

The data indexing device 700 may also include one or more power supplies 702, one or more wired or wireless network interfaces 703, one or more input-output interfaces 704, and/or one or more operating systems 705, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the data indexing device structure shown in FIG. 7 does not constitute a limitation of the data indexing device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

The following describes each component of the data indexing device in detail with reference to fig. 7:

The processor 701 is the control center of the data indexing device and may perform processing according to the configured data indexing method. The processor 701 connects the various parts of the entire data indexing device through various interfaces and lines, and performs the functions of the data indexing device and processes data by running or executing the software programs and/or modules stored in the memory 709 and invoking the data stored in the memory 709, thereby implementing secondary index traversal.

The memory 709 may be used to store software programs and modules, and the processor 701 executes various functional applications and data processing of the data indexing device 700 by operating the software programs and modules stored in the memory 709. The memory 709 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as receiving a secondary index request, etc.), and the like; the storage data area may store data created according to the use of the data indexing device (such as a result of obtaining an index, etc.), and the like. Further, the memory 709 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. The program of the data indexing method provided in the embodiment of the present application and the received data stream are stored in a memory, and when they are needed to be used, the processor 701 calls the data from the memory 709.

The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims

1. A method for indexing data, comprising:

receiving a secondary index request, wherein the secondary index request carries a secondary index condition and a first index position;

obtaining a first index result of a first partition according to the secondary index condition, wherein the first index result comprises first data and a first cursor meeting the secondary index condition, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used for indicating that a first initial partition is split when indexing to the first index position, and the first initial partition comprises the first partition;

2. The method of claim 1, wherein obtaining the first index result of the first partition according to the secondary index condition comprises:

and acquiring a first index result of the first partition.

3. The method of claim 1,

the first cursor is further used for indicating the first initial partition identification, and the first initial partition identification indicates a partition key range of the first initial partition.

4. The method of claim 3, wherein obtaining the second index result according to the first partition identifier and the second index position comprises:

performing secondary indexing from a second initial position in a first sub-partition according to the first partition identifier, wherein the second initial position is the second index position or a next index position closest to the second index position;

5. The method of claim 4, further comprising:

performing secondary indexing on a second sub-partition, the second sub-partition being included in the first partition;

and acquiring a third indexing result of the second sub-partition, wherein the third indexing result comprises third data and a third cursor which meet the secondary indexing condition in the second sub-partition, the third cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier and a fourth indexing position, and the second sub-partition identifier is used for indicating that the first partition is split when indexing to the fourth indexing position.

6. The method of claim 5, further comprising:

if the second sub-partition and the second partition are merged into a merged partition, wherein the second partition is contained in the first initial partition, performing secondary indexing on the second sub-partition in the merged partition;

and acquiring a fourth index result of the merged partition, wherein the fourth index result comprises fourth data and a fourth cursor which meet the secondary index condition in the second sub-partition, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier and a fifth index position.

7. The method of claim 6, further comprising:

if the second sub-partition in the merged partition is subjected to secondary indexing, performing secondary indexing on the second partition in the merged partition;

and acquiring a fifth index result of the merged partition, wherein the fifth index result comprises fifth data and a fifth cursor which meet the secondary index condition in the second partition, the fifth cursor indicates the first initial partition identifier, a second partition identifier and a sixth index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position.

8. The method of claim 5, further comprising:

if the second sub-partition completes the secondary indexing, performing the secondary indexing on the second partition;

and acquiring a sixth index result of the second partition, wherein the sixth index result comprises sixth data and a sixth cursor which meet the secondary index condition in the second partition, the sixth cursor comprises the first initial partition identifier, a second partition identifier and a seventh index position, and the second partition identifier indicates that the first initial partition is split at the first index position.

9. The method according to any one of claims 1 to 8, further comprising:

if the first initial partition completes the secondary indexing, performing the secondary indexing on a second initial partition;

and acquiring an index result of the second initial partition, wherein the index result of the second initial partition comprises data meeting the secondary index condition in the second initial partition and a seventh cursor, and the seventh cursor indicates a partition key range and an eighth index position of the second initial partition.

10. The method of claim 3, wherein

each partition key range includes its left boundary value and excludes its right boundary value.
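The left-inclusive, right-exclusive partition key range recited above can be illustrated with a minimal sketch. The names `KeyRange` and `find_partition`, and the use of string keys, are assumptions made for illustration only and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeyRange:
    """Hypothetical partition key range: left boundary included, right excluded."""
    left: str
    right: str

    def contains(self, key: str) -> bool:
        # Half-open interval [left, right)
        return self.left <= key < self.right


def find_partition(ranges: List[KeyRange], key: str) -> Optional[KeyRange]:
    """Return the range that owns `key`, or None if no range covers it."""
    for r in ranges:
        if r.contains(key):
            return r
    return None


# Two adjacent ranges share the boundary "m"; only the right-hand one owns it.
ranges = [KeyRange("a", "m"), KeyRange("m", "z")]
```

Because the right boundary is excluded, adjacent ranges never both claim the same partition key, so a key on a split boundary maps to exactly one partition.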

11. A data indexing apparatus, comprising:

a receiving unit, configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position;

a first obtaining unit, configured to obtain a first index result of a first partition according to the secondary index condition, where the first index result includes first data and a first cursor that satisfy the secondary index condition, the first cursor indicates a first partition identifier and a second index position, the first partition identifier is used to indicate that a first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition;

a second obtaining unit, configured to obtain a second index result according to the first partition identifier and the second index position, where the second index result includes second data and a second cursor that satisfy the secondary index condition.

12. The data indexing device of claim 11, wherein the first obtaining unit is specifically configured to:

and acquiring a first index result of the first partition.

13. The data indexing device of claim 11,

14. The data indexing device of claim 13, wherein the second obtaining unit is specifically configured to:

15. The data indexing device of claim 14, further comprising:

a first indexing unit, configured to perform secondary indexing on a second sub-partition, where the second sub-partition is included in the first partition;

a third obtaining unit, configured to obtain a third index result of the second sub-partition, where the third index result includes third data and a third cursor that satisfy the secondary index condition in the second sub-partition, the third cursor indicates the first initial partition identifier, the first partition identifier, a second sub-partition identifier, and a fourth index position, and the second sub-partition identifier is used to indicate that the first partition is split when indexing to the fourth index position.

16. The data indexing device of claim 15, further comprising:

a second indexing unit, configured to perform secondary indexing on the second sub-partition in the merged partition if the second sub-partition and the second partition are merged into the merged partition, where the second partition is included in the first initial partition;

a fourth obtaining unit, configured to obtain a fourth index result of the merged partition, where the fourth index result includes fourth data and a fourth cursor that satisfy the secondary index condition in the second sub-partition, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fifth index position.

17. The data indexing device of claim 16, further comprising:

a third indexing unit, configured to perform secondary indexing on the second partition in the merged partition if the secondary indexing is completed on the second sub-partition in the merged partition;

a fifth obtaining unit, configured to obtain a fifth index result of the merged partition, where the fifth index result includes fifth data and a fifth cursor that satisfy the secondary index condition in the second partition, the fifth cursor indicates the first initial partition identifier, a second partition identifier, and a sixth index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position.

18. The data indexing device of claim 15, further comprising:

a fourth indexing unit, configured to perform secondary indexing on the second partition if the second sub-partition completes the secondary indexing;

a sixth obtaining unit, configured to obtain a sixth index result of the second partition, where the sixth index result includes sixth data and a sixth cursor that satisfy the secondary index condition in the second partition, and the sixth cursor includes the first initial partition identifier, a second partition identifier, and a seventh index position, and the second partition identifier indicates that the first initial partition is split at the first index position.

19. The data indexing device of any one of claims 11 to 18, further comprising:

a fifth indexing unit, configured to perform secondary indexing on a second initial partition if the first initial partition completes the secondary indexing;

a seventh obtaining unit, configured to obtain an index result of the second initial partition, where the index result of the second initial partition includes data in the second initial partition that meets the secondary index condition and a seventh cursor, and the seventh cursor indicates a partition key range and an eighth index position of the second initial partition.

20. The data indexing device of claim 13,

21. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1-10.
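As a non-limiting illustration of the cursor recited in the above claims, the following sketch shows one way a cursor might carry the initial partition's key range, the identifiers of partitions produced by splits, and an index position, and how indexing might resume from that position or from the next index position closest to it. The names `Cursor`, `resume_position` and `scan_page`, the string-valued index positions, and the representation of a partition's secondary index as a sorted list are all assumptions made for illustration; the claims do not prescribe this layout.

```python
import bisect
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cursor:
    """Hypothetical cursor layout (not prescribed by the claims)."""
    initial_range: Tuple[str, str]  # key range of the initial partition, [left, right)
    split_path: Tuple[str, ...]     # identifiers of partitions created by splits, outermost first
    position: str                   # index position at which the next scan should start


def resume_position(index_keys: List[str], position: str) -> Optional[int]:
    """Offset of `position` in the sorted index, or of the next closest
    position after it; None when nothing remains to scan."""
    i = bisect.bisect_left(index_keys, position)
    return i if i < len(index_keys) else None


def scan_page(index_keys: List[str], cursor: Cursor,
              limit: int) -> Tuple[List[str], Optional[Cursor]]:
    """Return up to `limit` index entries and a cursor for the next request."""
    start = resume_position(index_keys, cursor.position)
    if start is None:
        return [], None
    page = index_keys[start:start + limit]
    nxt = start + limit
    if nxt >= len(index_keys):
        return page, None  # this (sub-)partition is fully indexed
    # Carry the partition identifiers forward unchanged; only the position moves.
    return page, replace(cursor, position=index_keys[nxt])


# Resuming at "k15", which no longer exists in the sub-partition after a split,
# starts from the next closest position "k20".
keys = ["k10", "k20", "k30"]
cursor = Cursor(("a", "m"), ("P1", "P1-2"), "k15")
page, next_cursor = scan_page(keys, cursor, 1)
```

Keeping the split identifiers in the cursor lets the next request go directly to the sub-partition that still holds unscanned entries, rather than rescanning partitions that have already been indexed.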