WO2019174558A1 - Data indexing method and device - Google Patents

Data indexing method and device

Info

Publication number
WO2019174558A1
Authority
WO
WIPO (PCT)
Prior art keywords
partition
index
sub
identifier
cursor
Prior art date
Application number
PCT/CN2019/077744
Other languages
English (en)
Chinese (zh)
Inventor
谢晓芹
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019174558A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of communications, and in particular, to a data indexing method and apparatus.
  • Partition Key: a column, or a combination of columns, used to divide a table into partitions.
  • There are usually two partitioning methods. The first is consistent hash (Hash) partitioning, in which the Partition Key value is hashed to determine which partition the data belongs to. The second is range partitioning (Range Partition).
  • In some scenarios the user data table must be indexed by other columns in addition to its Partition Key.
  • Such an index is called a secondary index.
  • The first type of query is by a specified Partition Key value, and the Partition Key must be unique.
  • The second type of query is by a Partition Key condition, in which case a Partition Key prefix or range is specified.
  • The third type of query is by a secondary index condition, in which case a value prefix or range of the secondary index column is specified.
  • When a partition is split or merged, the secondary index sort order changes. If the traversal continues, the returned records may be duplicated or omitted. The existing solution gives priority to balancing: if a split or merge is detected in a traversed partition during a secondary index traversal, the traversal is re-executed from the changed partition.
  • This balancing-priority approach can cause many invalid traversals and increase the latency of the user's traversal operation.
  • the embodiment of the present invention provides a data indexing method and device, which are used to reduce invalid secondary index operations and reduce the delay of the secondary index when the partition is split during the secondary indexing process.
  • A first aspect of the present application provides a data indexing method, including: receiving a secondary index request, where the secondary index request carries a secondary index condition and a first index position, and the first index position is the position in the secondary index table at which data was last obtained during the previous secondary index query;
  • obtaining a first index result of a first partition according to the secondary index condition, where the first index result includes first data satisfying the secondary index condition and a first cursor, and the first cursor indicates a first partition identifier and a second index position;
  • where the first partition identifier indicates that a first initial partition split when indexing reached the first index position, and the first initial partition includes the first partition; and
  • obtaining a second index result according to the first partition identifier and the second index position, where the second index result includes second data satisfying the secondary index condition and a second cursor.
  • The traversal of the secondary index is decoupled from partition split and merge operations; the two are independent and do not affect each other. This ensures that the results returned by the secondary index traversal have no omissions, and it reduces invalid secondary index operations and the latency of the secondary index.
  • Obtaining the first index result of the first partition according to the secondary index condition includes: performing, according to the secondary index condition, a secondary index starting from a first starting position in the first partition, where the first starting position is the first index position or the next index position closest to the first index position; and obtaining the first index result of the first partition.
  • the process of performing the secondary index according to the secondary index condition is refined, and the starting position of the secondary index is determined.
  • The first cursor is further used to indicate a first initial partition identifier, where the first initial partition identifier indicates the partition key range of the first initial partition.
  • the indication information in the cursor is added, the indication of the initial range is added, the traversal range of the secondary index is clarified, and the secondary indexing process is accelerated.
  • Obtaining the second index result according to the first partition identifier and the second index position includes: performing, according to the first partition identifier, a secondary index starting from a second starting position in a first sub-partition, where the second starting position is the second index position or the next index position closest to the second index position; and obtaining a second index result of the first sub-partition, where the second index result includes the second data and a second cursor, the second cursor indicates the first initial partition identifier, the first partition identifier, a first sub-partition identifier, and a third index position, and the first sub-partition identifier indicates that the first partition split when indexing reached the third index position.
  • the process of performing the secondary index again after the partition is split is determined, the starting position of the secondary index is determined again, and the cursor information for the next index is obtained.
  • The method further includes: performing a secondary index on a second sub-partition, where the second sub-partition is included in the first partition; and obtaining a third index result of the second sub-partition, where the third index result includes third data satisfying the secondary index condition in the second sub-partition and a third cursor, the third cursor indicates the first initial partition identifier, the first partition identifier, a second sub-partition identifier, and a fourth index position, and the second sub-partition identifier indicates that the first partition split when indexing reached the fourth index position.
  • The secondary index traversal continues into the second sub-partition, which makes the steps of the embodiment of the present application more complete.
  • The method further includes: if the second sub-partition and a second partition are merged into a merged partition, where the second partition is included in the first initial partition, performing a secondary index on the second sub-partition within the merged partition; and obtaining a fourth index result of the merged partition, where the fourth index result includes fourth data satisfying the secondary index condition in the second sub-partition and a fourth cursor, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fifth index position.
  • The secondary index traversal is performed on the range of the second sub-partition within the merged partition, which adds an implementation manner of the embodiment of the present application.
  • The method further includes: if the second sub-partition in the merged partition has completed the secondary index, performing a secondary index on the second partition within the merged partition; and obtaining a fifth index result of the merged partition, where the fifth index result includes fifth data satisfying the secondary index condition in the second partition and a fifth cursor, the fifth cursor indicates the first initial partition identifier, a second partition identifier, and a sixth index position, and the second partition identifier indicates that the first initial partition split when indexing reached the first index position.
  • The secondary index traversal is performed on the range of the second partition within the merged partition, which adds an implementation manner of the embodiment of the present application.
  • The method further includes: if the second sub-partition has completed the secondary index, performing a secondary index on the second partition; and obtaining a sixth index result of the second partition, where the sixth index result includes sixth data satisfying the secondary index condition in the second partition and a sixth cursor.
  • The sixth cursor includes the first initial partition identifier, the second partition identifier, and a seventh index position, and the second partition identifier indicates that the first initial partition split at the first index position.
  • When no partition merge occurs, the secondary index continues over the range of the second partition after the secondary index of the second sub-partition is completed, which adds an implementation manner of the embodiment of the present application.
  • The method further includes: if the first initial partition has completed the secondary index, performing a secondary index on a second initial partition; and obtaining an index result of the second initial partition, where the index result includes data satisfying the secondary index condition in the second initial partition and a seventh cursor, and the seventh cursor indicates the partition key range of the second initial partition and an eighth index position.
  • After the secondary index of the first initial partition is completed, the secondary index continues over the range of the second initial partition, which adds an implementation manner of the embodiment of the present application.
  • the partition key range includes a value of a left boundary and does not include a value of a right boundary.
  • The partition key range is precisely bounded, which makes the application logically more rigorous.
  • A second aspect of the present application provides a data indexing apparatus, including: a receiving unit, configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position; a first acquiring unit, configured to acquire a first index result of a first partition according to the secondary index condition, where the first index result includes first data satisfying the secondary index condition and a first cursor, the first cursor indicates a first partition identifier and a second index position, the first partition identifier indicates that a first initial partition split when indexing reached the first index position, and the first initial partition includes the first partition; and a second acquiring unit, configured to acquire a second index result according to the first partition identifier and the second index position, where the second index result includes second data satisfying the secondary index condition and a second cursor.
  • The traversal of the secondary index is decoupled from partition split and merge operations; the two are independent and do not affect each other. This ensures that the results returned by the secondary index traversal have no omissions, and it reduces invalid secondary index operations and the latency of the secondary index.
  • The first acquiring unit is specifically configured to: perform, according to the secondary index condition, a secondary index starting from a first starting position in the first partition, where the first starting position is the first index position or the next index position closest to the first index position; and acquire the first index result of the first partition.
  • the process of performing the secondary index according to the secondary index condition is refined, and the starting position of the secondary index is determined.
  • The first cursor is further used to indicate a first initial partition identifier, where the first initial partition identifier indicates the partition key range of the first initial partition.
  • the indication information in the cursor is added, the indication of the initial range is added, the traversal range of the secondary index is clarified, and the secondary indexing process is accelerated.
  • The second acquiring unit is specifically configured to: perform, according to the first partition identifier, a secondary index starting from a second starting position in a first sub-partition, where the second starting position is the second index position or the next index position closest to the second index position; and acquire a second index result of the first sub-partition.
  • The second index result includes the second data and the second cursor.
  • The second cursor indicates the first initial partition identifier, the first partition identifier, a first sub-partition identifier, and a third index position.
  • The first sub-partition identifier indicates that the first partition split when indexing reached the third index position.
  • the process of performing the secondary index again after the partition is split is determined, the starting position of the secondary index is determined again, and the cursor information for the next index is obtained.
  • The data indexing apparatus further includes: a first index unit, configured to perform a secondary index on a second sub-partition, where the second sub-partition is included in the first partition; and a third acquiring unit, configured to acquire a third index result of the second sub-partition, where the third index result includes third data satisfying the secondary index condition in the second sub-partition and a third cursor, the third cursor indicates the first initial partition identifier, the first partition identifier, a second sub-partition identifier, and a fourth index position, and the second sub-partition identifier indicates that the first partition split when indexing reached the fourth index position.
  • The secondary index traversal continues into the second sub-partition, which makes the steps of the embodiment of the present application more complete.
  • The data indexing apparatus further includes: a second index unit, configured to perform a secondary index on the second sub-partition within a merged partition if the second sub-partition and a second partition are merged into the merged partition, where the second partition is included in the first initial partition; and a fourth acquiring unit, configured to acquire a fourth index result of the merged partition, where the fourth index result includes fourth data satisfying the secondary index condition in the second sub-partition and a fourth cursor, and the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and a fifth index position.
  • The secondary index traversal is performed on the range of the second sub-partition within the merged partition, which adds an implementation manner of the embodiment of the present application.
  • The data indexing apparatus further includes: a third index unit, configured to perform a secondary index on the second partition within the merged partition if the second sub-partition in the merged partition has completed the secondary index; and a fifth acquiring unit, configured to acquire a fifth index result of the merged partition, where the fifth index result includes fifth data satisfying the secondary index condition in the second partition and a fifth cursor, the fifth cursor indicates the first initial partition identifier, a second partition identifier, and a sixth index position, and the second partition identifier indicates that the first initial partition split when indexing reached the first index position.
  • The secondary index traversal is performed on the range of the second partition within the merged partition, which adds an implementation manner of the embodiment of the present application.
  • The data indexing apparatus further includes: a fourth index unit, configured to perform a secondary index on the second partition if the second sub-partition has completed the secondary index; and a sixth acquiring unit, configured to acquire a sixth index result of the second partition, where the sixth index result includes sixth data satisfying the secondary index condition in the second partition and a sixth cursor, the sixth cursor includes the first initial partition identifier, the second partition identifier, and a seventh index position, and the second partition identifier indicates that the first initial partition split at the first index position.
  • When no partition merge occurs, the secondary index continues over the range of the second partition after the secondary index of the second sub-partition is completed, which adds an implementation manner of the embodiment of the present application.
  • The data indexing apparatus further includes: a fifth index unit, configured to perform a secondary index on a second initial partition if the first initial partition has completed the secondary index; and an acquiring unit, configured to acquire an index result of the second initial partition, where the index result includes data satisfying the secondary index condition in the second initial partition and a seventh cursor, and the seventh cursor indicates the partition key range of the second initial partition and an eighth index position.
  • After the secondary index of the first initial partition is completed, the secondary index continues over the range of the second initial partition, which adds an implementation manner of the embodiment of the present application.
  • the partition key range includes a value of a left boundary and does not include a value of a right boundary.
  • The partition key range is precisely bounded, which makes the application logically more rigorous.
  • a third aspect of the present application provides a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • a fourth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
  • FIG. 1 is a schematic diagram of a network architecture applied to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of a data indexing method in an embodiment of the present application
  • FIG. 3 is a schematic diagram of an embodiment of a data indexing method in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an application scenario of a data indexing method in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a data indexing apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another embodiment of a data indexing apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another embodiment of a data indexing apparatus according to an embodiment of the present application.
  • the embodiment of the present invention provides a data indexing method and device, which are used to reduce invalid secondary index operations and reduce the delay of the secondary index when the partition is split during the secondary indexing process.
  • the embodiment of the present application can be applied to a distributed database using a range partitioning technology, and can be applied to a network framework as shown in FIG. 1.
  • The distributed database system is divided into four contiguous partitions according to the value range of the partition key (Partition Key), namely Partition1, Partition2, Partition3, and Partition4.
  • minKey is the minimum value of the partition key.
  • maxKey is the maximum value of the partition key.
  • The partition key range of partition 1 is (minKey, -75).
  • The partition key range of partition 2 is (-75, 25).
  • The partition key range of partition 3 is (25, 175).
  • The partition key range of partition 4 is (175, maxKey).
  • Partition 1 to partition 4 are arranged contiguously in order from left to right.
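
A range-partitioned lookup over these four partitions can be sketched as a binary search over the split points. This is only an illustrative sketch: the names `BOUNDARIES`, `PARTITIONS`, and `locate_partition` are hypothetical, and the ranges are treated as left-closed and right-open, as the description states elsewhere.

```python
from bisect import bisect_right

# Split points between the four partitions, taken from the example ranges.
# The variable and function names are hypothetical, not from the application.
BOUNDARIES = [-75, 25, 175]
PARTITIONS = ["Partition1", "Partition2", "Partition3", "Partition4"]


def locate_partition(key):
    """Return the partition whose left-closed, right-open range holds key."""
    # bisect_right counts the split points that are <= key, which is exactly
    # the index of the partition when left boundaries are inclusive.
    return PARTITIONS[bisect_right(BOUNDARIES, key)]
```

A key equal to a split point, such as -75, lands in the partition to the right of that point because the left boundary is inclusive.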
  • Based on the partition view (Partition Range Map), a query request traverses the secondary index of each partition in range order and returns the results through the query interface. If a partition merge occurs during the traversal, the secondary index sort order changes, and if the traversal continues, the returned records may contain duplicate index records.
  • The Partition Range Map is used as the range index of the user data table.
  • The partition view records the identifier (Id), the partition range (Range), and the home server address of each partition.
  • When a data record is created, updated, or deleted, the partition view is used to locate the partition to which the record belongs according to its Partition Key.
  • The user configures the user data table to be partitioned by the name column, that is, name is the Partition Key.
  • The system divides the user data table into three partitions whose ranges cover contiguous values of name.
  • The partitions are partition 1 (Partition1), partition 2 (Partition2), and partition 3 (Partition3). The partition key range of Partition1 is [MIN, "b"), that of Partition2 is ["b", "c"), and that of Partition3 is ["c", MAX), where MIN is the minimum value of the partition key and MAX is the maximum value of the partition key.
  • the user data table also includes a unique row ID, age, address, and StudentId. To meet the complex query scenario, you need to index the other columns of the user data table, create a secondary index table, and then sort the values of the columns in each secondary index to generate a secondary index table, as shown in Figure 2.
  • When Partition1 is split while a secondary index operation on Partition1 is in progress, the partitioned database system needs to complete the partition split and then continue the secondary index operation, which reduces invalid secondary index operations and the latency of the secondary index.
  • an embodiment of the data indexing method in the embodiment of the present application includes:
  • The secondary index request carries a secondary index condition and a first index position, where the first index position is the position in the secondary index table at which data was last obtained during the previous secondary index query.
  • the traversal of the secondary index condition is converted into a traversal of the secondary index table within each partition within each namespace.
  • The "name" column is the partition key and the "student" column is the secondary index column.
  • The user data table has been divided into multiple partitions according to the value of the "name" column: Partition1, Partition2, and Partition3.
  • the specific partition view relationship is as follows:
  • The numbers 1, 2, and 3 are the partition names.
  • [MIN, "b") is the partition key range of Partition1, ["b", "c") is the partition key range of Partition2, and ["c", MAX) is the partition key range of Partition3.
  • Server1_ip is the server address of Partition1, Server2_ip is the server address of Partition2, and Server3_ip is the server address of Partition3.
  • the secondary index condition is: "student" is greater than or equal to 10 and less than or equal to 50.
  • The data indexing device sends the secondary index request and the secondary index condition to Server1_ip, specifying Partition: 1, [MIN, "b"), to query the records whose "student" value is greater than or equal to 10 and less than or equal to 50.
  • It sends the secondary index request and the secondary index condition to Server2_ip, specifying Partition: 2, ["b", "c"), to query the records whose "student" value is greater than or equal to 10 and less than or equal to 50, and sends them to Server3_ip, specifying Partition: 3, ["c", MAX), to query the records that satisfy the same condition.
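
The fan-out of one secondary index condition to every partition in the partition view can be sketched as below. This is a simplified illustration, not the application's implementation: the records in `SECONDARY_INDEX` are made up, `query_partition` is a hypothetical stand-in for the per-server request, and `None` stands for MIN or MAX.

```python
# Partition view from the example: (partition id, key range, server address).
PARTITION_VIEW = [
    (1, (None, "b"), "Server1_ip"),
    (2, ("b", "c"), "Server2_ip"),
    (3, ("c", None), "Server3_ip"),
]

# Made-up secondary index tables, one per server: (student, name) pairs
# sorted by the secondary index column.
SECONDARY_INDEX = {
    "Server1_ip": [(5, "amy"), (12, "al")],
    "Server2_ip": [(30, "bob"), (70, "bea")],
    "Server3_ip": [(10, "cy"), (50, "dee")],
}


def query_partition(server, low, high):
    """Hypothetical per-partition query: keep records with low <= student <= high."""
    return [rec for rec in SECONDARY_INDEX[server] if low <= rec[0] <= high]


def fan_out(low, high):
    """Send the same condition to each partition in key-range order."""
    results = []
    for _pid, _key_range, server in PARTITION_VIEW:
        results.extend(query_partition(server, low, high))
    return results
```

Calling `fan_out(10, 50)` visits the partitions from left to right, so the combined result is ordered by partition first and by the secondary index column within each partition.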
  • The user can specify the "name" range as ["a", "b").
  • When a Partition is split (that is, one partition is split into two or more), the data under the original Partition is moved to a new Partition in PartitionKey order, and the secondary index table is rebuilt. When two partitions with adjacent ranges are merged, the data in the right Partition is moved sequentially into the left Partition, and the secondary index table is rebuilt.
  • For example, for Partition: 1, [MIN, "b") and Partition: 2, ["b", "c"),
  • Partition1 is called the left partition, and
  • Partition2 is called the right partition.
  • The Partition Key is generally saved in the secondary index record, so that when the secondary index table is traversed to migrate records, the partition to which a secondary index record belongs can be determined without reading the primary record. The rowid of the primary record is also stored in the secondary index table to guarantee the uniqueness of secondary index records when the Partition Key is not unique.
  • The partition key range [X, Y) in this and subsequent embodiments of the present application can also be written simply as (X, Y). It follows the left-closed, right-open principle: it includes the value of the left boundary and does not include the value of the right boundary.
  • the first partition identifier is used to indicate that the first initial partition splits when indexing to the first index position, and the first initial partition includes the first partition.
  • The returned first index result includes the queried first data and the first cursor. The cursor records the following basic information: the first partition identifier, such as split[MIN~"d", pos0), which indicates that the first partition [MIN, "b") split when indexing reached pos0; and the second index position, that is, the position of the secondary index currently traversed, which may be the value of the secondary index column together with the rowid, such as pos1.
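
One possible in-memory shape for such a cursor is sketched below. The class and field names (`SplitMarker`, `Cursor`, `next_request`) are assumptions made for illustration; the application only states which pieces of information the cursor carries, not how they are encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SplitMarker:
    """Hypothetical record of one split: the key range that split and where."""
    key_range: Tuple[Optional[str], Optional[str]]  # None stands for MIN or MAX
    split_pos: str  # index position reached when the split happened


@dataclass(frozen=True)
class Cursor:
    """Hypothetical cursor returned with each batch of index results."""
    split_markers: Tuple[SplitMarker, ...]  # outermost split first
    index_pos: str  # current secondary index position


def next_request(condition, cursor):
    """Build the follow-up request from the cursor of the previous result."""
    return {
        "condition": condition,
        "start_pos": cursor.index_pos,
        "splits": cursor.split_markers,
    }


# Mirrors the example values split[MIN~"d", pos0) and pos1 from the text.
first_cursor = Cursor((SplitMarker((None, "d"), "pos0"),), "pos1")
request = next_request("10 <= student <= 50", first_cursor)
```

Because the client echoes the cursor back, the server side needs no per-traversal state to know which split the traversal has already accounted for.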
  • the first cursor is further used to indicate a first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition.
  • Partition1 is taken as the first partition, and the extents in which Partition1, Partition2, and Partition3 are merged together are referred to as the first initial partition.
  • Partition1 splits into Partition4 and Partition5, where Partition4 is on the left side of Partition5.
  • Each sub-partition produced by the split (that is, Partition4 and Partition5) is traversed in turn, and the traversal of each sub-partition's secondary index table starts from the secondary index position reached before the split. The cursor must therefore also record the KeyRange of the split partition and the secondary index position before the split.
  • the first partition continues to be split into the first sub-partition and the second sub-partition, and the first sub-partition is further indexed according to the first cursor, and the required content is obtained.
  • The second cursor includes the first initial partition identifier, the identifier of the first partition, the first sub-partition identifier, and the current index position of the first sub-partition (the third index position).
  • The first initial partition identifier indicates the partition key range of the first initial partition, the identifier of the first partition indicates that the first initial partition split when the traversal reached the first index position, and the first sub-partition identifier indicates that the first partition split when indexing reached the second index position.
  • the starting position of the secondary index is different.
  • the secondary index is started from the first starting position in the first partition according to the secondary index condition, and the first starting position is the first index position or the next index position closest to the first index position.
  • For example, suppose the secondary index list of the first initial partition is the set of numbers 1, 2, 3, 4, 5, 6, 7, ..., 20, and the first initial partition is split into the first partition and the second partition when indexing reaches position 4.
  • The secondary index list of the first partition is 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and the secondary index list of the second partition is 2, 4, 6, 8, 10, 12, 14, 16, 18, 20.
  • The starting position of the secondary index in the first partition is 5 (the next index position closest to 4), and the starting position of the secondary index in the second partition is 4.
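
The rule "start at the recorded index position, or at the next closest position if it is absent" can be sketched with a binary search over each sub-list. The function name `start_position` is hypothetical. In this example, position 4 is absent from the odd-numbered list, so that list resumes at 5, while the even-numbered list contains 4 and resumes there.

```python
from bisect import bisect_left


def start_position(index_list, last_pos):
    """Return last_pos if present, else the next larger entry, else None."""
    i = bisect_left(index_list, last_pos)
    return index_list[i] if i < len(index_list) else None


# Secondary index lists of the two partitions produced by the split.
first_partition = list(range(1, 20, 2))    # 1, 3, 5, ..., 19
second_partition = list(range(2, 21, 2))   # 2, 4, 6, ..., 20

first_start = start_position(first_partition, 4)
second_start = start_position(second_partition, 4)
```

Entries before the recorded position were already returned before the split, so neither sub-list is scanned from its beginning.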
  • When the first partition is split into the first sub-partition and the second sub-partition, the split identifier of the first partition is added to the returned cursor.
  • The secondary index traversal can therefore continue without being restarted, which avoids interaction between the index traversal and partition splitting or merging, reduces invalid secondary index operations, and reduces the latency of the secondary index.
  • Referring to FIG. 4, another embodiment of the data indexing method in the embodiment of the present application includes:
  • The PartitionKey of the user data table is the "name" column, arranged in lexicographic order, and the KeyRange is [A~F).
  • The secondary index column is the "StudentId" column, sorted by value.
  • The value of StudentId is unique, so the StudentId value alone can represent the traversal position of the secondary index. It can be understood that when the value of the secondary index column is not unique, StudentId is replaced with StudentId-Rowid.
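
Why a non-unique index column needs the StudentId-Rowid pair as its position can be sketched with tuple comparison. The entries in `INDEX` are made-up data and `resume_from` is a hypothetical helper; the sketch only shows that the composite position picks out a single entry where the StudentId alone would not.

```python
from bisect import bisect_left

# Made-up secondary index entries (StudentId, rowid), sorted as tuples.
INDEX = [(7, 101), (7, 205), (7, 310), (9, 150)]


def resume_from(index, last_pos):
    """Entries at or after the recorded (StudentId, rowid) position."""
    return index[bisect_left(index, last_pos):]


# With three entries sharing StudentId 7, the position (7, 205) is exact.
remaining = resume_from(INDEX, (7, 205))
```

A position of just `7` could refer to any of the three entries, so a traversal resumed from it would either repeat or skip records.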
  • The data indexing device decomposes the request into secondary index traversal operations on multiple partitions according to the current Partition Range Map, and the Partition Range Map is:
  • Partition1 is split into Partition3 (A~D) and Partition4 (D~F).
  • Partition6 and Partition4 are merged into Partition7 (B~F), and the split2 (B~D, 7) recorded in the cursor is mapped to Partition7 (B~F).
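
Mapping a key range recorded in the cursor onto whichever current partition now covers it can be sketched as a containment check against the partition view. The contents of `PARTITION_RANGE_MAP` are partly assumed: Partition7 (B~F) comes from the text, while Partition5 (A~B) is a hypothetical neighbour added so the map covers A~F.

```python
# Assumed current partition view after the merge: name -> [low, high).
PARTITION_RANGE_MAP = {
    "Partition5": ("A", "B"),  # hypothetical, not stated in the text
    "Partition7": ("B", "F"),  # merged partition from the example
}


def covering_partition(recorded_range, range_map):
    """Return the partition whose range fully contains recorded_range."""
    low, high = recorded_range
    for name, (p_low, p_high) in range_map.items():
        if p_low <= low and high <= p_high:
            return name
    return None


# split2 (B~D, 7): the recorded range B~D now lives inside Partition7.
target = covering_partition(("B", "D"), PARTITION_RANGE_MAP)
```

The traversal would then continue inside the returned partition, restricted to the recorded range B~D and starting from the recorded index position 7.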
  • The traversal of the secondary index is decoupled from partition split and merge operations; the two are independent and do not affect each other. This ensures that the results returned by the secondary index traversal have no omissions, and it reduces invalid secondary index operations and the latency of the secondary index.
  • an embodiment of the data indexing device in the embodiment of the present application includes:
  • the receiving unit 501 is configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position;
  • the first obtaining unit 502 is configured to obtain a first index result of the first partition according to the secondary index condition, where the first index result includes first data that meets the secondary index condition and a first cursor,
  • the first cursor indicates a first partition identifier and a second index position, where the first partition identifier is used to indicate that the first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition;
  • a second obtaining unit 503, configured to acquire a second index result according to the first partition identifier and the second index position, where the second index result includes second data that meets the secondary index condition and a second cursor.
  • Referring to FIG. 6, another embodiment of the data indexing apparatus in the embodiments of the present application includes:
  • the receiving unit 601 is configured to receive a secondary index request, where the secondary index request carries a secondary index condition and a first index position;
  • the first obtaining unit 602 is configured to obtain a first index result of the first partition according to the secondary index condition, where the first index result includes first data that meets the secondary index condition and a first cursor,
  • the first cursor indicates a first partition identifier and a second index position, where the first partition identifier is used to indicate that the first initial partition is split when indexing to the first index position, and the first initial partition includes the first partition;
  • the second obtaining unit 603 is configured to obtain a second index result according to the first partition identifier and the second index position, where the second index result includes second data that meets the secondary index condition and a second cursor.
  • the first obtaining unit 602 is specifically configured to:
  • the first cursor is further used to indicate the first initial partition identifier, and the first initial partition identifier indicates a partition key range of the first initial partition.
  • the second obtaining unit 603 is specifically configured to:
  • the second index result includes the second data and a second cursor, where the second cursor indicates the first initial partition identifier, the first partition identifier, a first sub-partition identifier, and a third index position, where the first sub-partition identifier is used to indicate that the first partition is split when indexing to the third index position.
  • the data indexing device further includes:
  • a first index unit 604, configured to perform secondary indexing on the second sub-partition, where the second sub-partition is included in the first partition;
  • a third obtaining unit 605, configured to acquire a third index result of the second sub-partition, where the third index result includes third data that meets the secondary index condition in the second sub-partition and a third cursor,
  • the third cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and the fourth index position, where the second sub-partition identifier is used to indicate that the first partition is split when indexing to the fourth index position.
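The layered cursor described above can be sketched as a small data structure. This is one possible representation under stated assumptions, not the encoding used by the application: each identifier is modelled as a key range, the cursor is serialized as JSON, and the names `Cursor`, `split_path`, `descend`, and `current_range` are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Cursor:
    # Key range [left, right) of the initial partition (assumed representation).
    initial_range: List[str]
    # One key range per recorded split, outermost first: partition, then sub-partition.
    split_path: List[List[str]] = field(default_factory=list)
    # Index position reached so far, for example the last StudentId returned.
    position: Optional[int] = None

    def descend(self, sub_range, position):
        """Return a new cursor that also records a split into `sub_range`."""
        return Cursor(self.initial_range, self.split_path + [list(sub_range)], position)

    def current_range(self):
        """Innermost range the traversal is currently working through."""
        return self.split_path[-1] if self.split_path else self.initial_range

    def encode(self) -> str:
        """Serialize the cursor so it can be returned to the caller as a token."""
        return json.dumps(asdict(self))

    @staticmethod
    def decode(token: str) -> "Cursor":
        """Rebuild a cursor from a token produced by encode()."""
        return Cursor(**json.loads(token))

# Initial partition [A, F) splits into [A, D) and [D, F); [A, D) later splits again.
first = Cursor(["A", "F"], position=1002)
second = first.descend(("A", "D"), 1002)
third = second.descend(("B", "D"), 1005)
restored = Cursor.decode(third.encode())
```

Keeping the whole split path in the cursor lets a later request tell which part of the initial partition has already been traversed, even if the partition layout has changed between requests.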
  • the data indexing device further includes:
  • a second index unit 606, configured to: if the second sub-partition and a second partition are merged into a merged partition, where the second partition is included in the first initial partition, perform secondary indexing on the second sub-partition in the merged partition;
  • the fourth obtaining unit 607 is configured to obtain a fourth index result of the merged partition, where the fourth index result includes fourth data that meets the secondary index condition in the second sub-partition and a fourth cursor,
  • the fourth cursor indicates the first initial partition identifier, the first partition identifier, the second sub-partition identifier, and the fifth index position.
  • the data indexing device further includes:
  • the third index unit 608 is configured to perform secondary indexing on the second partition in the merged partition if the second sub-partition in the merged partition has completed secondary indexing;
  • a fifth obtaining unit 609, configured to acquire a fifth index result of the merged partition, where the fifth index result includes fifth data that meets the secondary index condition in the second partition and a fifth cursor,
  • the fifth cursor indicates the first initial partition identifier, the second partition identifier, and the sixth index position, and
  • the second partition identifier indicates that the first initial partition is split when indexing to the first index position.
  • the data indexing device further includes:
  • the fourth index unit 610 is configured to perform secondary indexing on the second partition if the second sub-partition has completed secondary indexing;
  • a sixth obtaining unit 611, configured to acquire a sixth index result of the second partition, where the sixth index result includes sixth data that meets the secondary index condition in the second partition and a sixth cursor, where the sixth cursor indicates the first initial partition identifier, the second partition identifier, and the seventh index position, and the second partition identifier indicates that the first initial partition is split when indexing to the first index position.
  • the data indexing device further includes:
  • the fifth index unit 612 is configured to perform secondary indexing on the second initial partition if the first initial partition has completed secondary indexing;
  • the seventh obtaining unit 613 is configured to obtain an index result of the second initial partition, where the index result of the second initial partition includes data that meets the secondary index condition in the second initial partition and a seventh cursor,
  • the seventh cursor indicates a partition key range and an eighth index position of the second initial partition.
  • the partition key range contains the value of the left border and does not contain the value of the right border.
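The left-closed, right-open convention can be illustrated with a short lookup sketch. It assumes string partition keys compared lexicographically and a list of non-overlapping ranges sorted by left border; the function names `contains` and `find_partition` are hypothetical.

```python
from bisect import bisect_right

def contains(key_range, key):
    """Half-open test: the left border is included, the right border is not."""
    left, right = key_range
    return left <= key < right

def find_partition(partitions, key):
    """Find the partition whose [left, right) range holds `key`.

    partitions: list of (left, right, name) tuples sorted by left border.
    Returns the partition name, or None if no range holds the key.
    """
    lefts = [left for left, _, _ in partitions]
    # Index of the last partition whose left border is <= key.
    i = bisect_right(lefts, key) - 1
    if i >= 0 and contains(partitions[i][:2], key):
        return partitions[i][2]
    return None

partitions = [("A", "D", "Partition3"), ("D", "F", "Partition4")]
owner_of_c = find_partition(partitions, "C")
owner_of_d = find_partition(partitions, "D")
owner_of_f = find_partition(partitions, "F")
```

With half-open ranges, a border key such as D belongs to exactly one partition, so adjacent ranges produced by a split never overlap and never leave a gap.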
  • the foregoing describes the data indexing device in the embodiments of the present application in detail from the perspective of modular functional entities.
  • the data indexing device in the embodiment of the present application is described in detail below.
  • FIG. 7 is a schematic structural diagram of a data indexing apparatus according to an embodiment of the present application.
  • the data indexing apparatus 700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 701 (for example, one or more processors), a memory 709, and one or more storage media 708 (for example, one or more mass storage devices) that store an application 707 or data 706.
  • the memory 709 and the storage medium 708 may be short-term storage or persistent storage.
  • Programs stored on storage medium 708 may include one or more modules (not shown), each of which may include a series of instruction operations in a data indexing device.
  • the processor 701 can be configured to communicate with the storage medium 708 to perform a series of instruction operations in the storage medium 708 on the data indexing device 700.
  • Data indexing device 700 may also include one or more power sources 702, one or more wired or wireless network interfaces 703, one or more input and output interfaces 704, and/or one or more operating systems 705, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. It will be understood by those skilled in the art that the data indexing device structure shown in FIG. 7 does not constitute a limitation on the data indexing device, which may include more or fewer components than those illustrated, combine some components, or use a different arrangement of components.
  • the processor 701 is the control center of the data indexing device and can perform processing according to the configured data indexing method.
  • the processor 701 connects the various parts of the entire data indexing device using various interfaces and lines, and performs the various functions of the data indexing device and processes data by running or executing software programs and/or modules stored in the memory 709 and invoking data stored in the memory 709, thereby implementing secondary index traversal.
  • the memory 709 can be used to store software programs and modules, and the processor 701 executes various functional applications and data processing of the data indexing device 700 by running software programs and modules stored in the memory 709.
  • the memory 709 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required for at least one function (such as receiving a secondary index request), and the like; the data storage area may store data created through use of the data indexing device (such as obtained index results).
  • memory 709 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the program of the data indexing method and the received data stream provided in the embodiments of the present application are stored in the memory 709, and the processor 701 invokes them from the memory 709 when needed.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave).
  • the computer readable storage medium can be any usable medium that a computer can access, or a data storage device, such as a server or data center, that includes one or more usable media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (such as a solid state disk (SSD)) or the like.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the computer readable storage medium includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention relate to a data indexing method and device for reducing invalid secondary index operations when a partition splits during a secondary indexing process, thereby reducing secondary index latency. The method of the embodiments of the present invention comprises: receiving a secondary index request, the secondary index request carrying a secondary index condition and a first index position; acquiring a first index result of a first partition according to the secondary index condition, the first index result comprising first data that satisfies the secondary index condition and a first cursor; and acquiring a second index result according to the first partition identifier and the second index position, the second index result comprising second data that satisfies the secondary index condition and a second cursor.
PCT/CN2019/077744 2018-03-13 2019-03-12 Procédé et dispositif d'indexation de données WO2019174558A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810205324.7A CN108595482B (zh) 2018-03-13 2018-03-13 一种数据索引方法及装置
CN201810205324.7 2018-03-13

Publications (1)

Publication Number Publication Date
WO2019174558A1 true WO2019174558A1 (fr) 2019-09-19

Family

ID=63626149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077744 WO2019174558A1 (fr) 2018-03-13 2019-03-12 Procédé et dispositif d'indexation de données

Country Status (2)

Country Link
CN (1) CN108595482B (fr)
WO (1) WO2019174558A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595482B (zh) * 2018-03-13 2022-06-10 华为云计算技术有限公司 一种数据索引方法及装置
CN112231318A (zh) * 2020-10-14 2021-01-15 北京人大金仓信息技术股份有限公司 创建全局索引的方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110213775A1 (en) * 2010-03-01 2011-09-01 International Business Machines Corporation Database Table Look-up
CN105512200A (zh) * 2015-11-26 2016-04-20 华为技术有限公司 一种分布式数据库处理的方法和设备
CN106777343A (zh) * 2017-01-16 2017-05-31 百融(北京)金融信息服务股份有限公司 增量分布式索引系统和方法
CN107688438A (zh) * 2017-08-03 2018-02-13 中国石油集团川庆钻探工程有限公司地球物理勘探公司 适用于大规模地震数据存储、快速定位的方法及装置
CN108595482A (zh) * 2018-03-13 2018-09-28 华为技术有限公司 一种数据索引方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10831736B2 (en) * 2015-03-27 2020-11-10 International Business Machines Corporation Fast multi-tier indexing supporting dynamic update

Also Published As

Publication number Publication date
CN108595482B (zh) 2022-06-10
CN108595482A (zh) 2018-09-28

Similar Documents

Publication Publication Date Title
US10754878B2 (en) Distributed consistent database implementation within an object store
Vora Hadoop-HBase for large-scale data
US10534547B2 (en) Consistent transition from asynchronous to synchronous replication in hash-based storage systems
EP2863310B1 (fr) Procédé et appareil de traitement de données, ainsi que dispositif de stockage partagé
US10140351B2 (en) Method and apparatus for processing database data in distributed database system
WO2021003935A1 (fr) Procédé et appareil de stockage de groupes de données et dispositif informatique
JP2017512338A (ja) 第一クラスデータベース要素としての半構造データの実装
Bernstein et al. Optimizing optimistic concurrency control for tree-structured, log-structured databases
CN106484820B (zh) 一种重命名方法、访问方法及装置
US11030196B2 (en) Method and apparatus for processing join query
WO2010048789A1 (fr) Construction d'index, procédé d'interrogation, dispositif et système pour une base de données distribuée de mémoires de colonnes
US20080270352A1 (en) Modifying entry names in directory server
WO2023179787A1 (fr) Procédé et appareil de gestion de métadonnées pour un système de fichiers distribués
WO2019174558A1 (fr) Procédé et dispositif d'indexation de données
US11221777B2 (en) Storage system indexed using persistent metadata structures
JP7408626B2 (ja) テナント識別子の置換
CN108628969B (zh) 一种空间关键字索引方法及平台、存储介质
Qi Digital forensics and NoSQL databases
US20170270149A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
Agrawal et al. Survey on Mongodb: an open-source document database
KR20220011184A (ko) 증분 데이터 비교 구현 시스템 및 방법
Suganya et al. Efficient fragmentation and allocation in distributed databases
WO2023066222A1 (fr) Procédé et appareil de traitement de données, dispositif électronique, support de stockage et produit-programme
US20220365905A1 (en) Metadata processing method and apparatus, and a computer-readable storage medium
Klein et al. Dxram: A persistent in-memory storage for billions of small objects

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19767744; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19767744; Country of ref document: EP; Kind code of ref document: A1)