CN111221857B - Method and apparatus for reading data records from a distributed system - Google Patents


Info

Publication number
CN111221857B
Authority
CN
China
Prior art keywords
partition
parent
server
data record
range
Prior art date
Legal status
Active
Application number
CN201811323197.7A
Other languages
Chinese (zh)
Other versions
CN111221857A (en)
Inventor
向宇
黄飞腾
徐然
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN201811323197.7A
Publication of CN111221857A
Application granted
Publication of CN111221857B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and an apparatus for reading data records from a distributed system. The distributed system includes a partition server and a client; the partition server is configured to manage a parent partition in a data table and stores a first partition range checker whose partition range is the partition range of the parent partition. The method includes: the partition server receives, during the splitting of the parent partition, a first read request sent by the client, the first read request requesting that a first data record be read from the parent partition; the partition server reads the first data record from the parent partition according to the first read request, where the reading of the first data record completes later than the splitting of the parent partition; and the partition server checks, using the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs. This helps improve the accuracy of reading data records.

Description

Method and apparatus for reading data records from a distributed system
Technical Field
The present application relates to the field of information technology, and more particularly, to a method and apparatus for reading data records from a distributed system.
Background
In distributed systems, data records are typically stored in the form of data tables, such as HBase tables in a Hadoop distributed system. To improve query efficiency, a data table may be split into a plurality of partitions (regions), and the partitions distributed among a plurality of partition servers, each of which manages its assigned partitions. Among these partition servers, some process access requests (read requests or write requests) frequently while others process them rarely, which can lead to a load imbalance among the partition servers.
The industry typically balances the load on partition servers by splitting partitions. For example, suppose a heavily loaded partition server in the distributed system is the target partition server, and a target partition among the partitions it manages is accessed frequently. The target partition can then be treated as the partition to be split (i.e., the parent partition) and split into a plurality of child partitions, some of which are reassigned to lightly loaded partition servers in the distributed system, thereby reducing the load on the target partition server.
During partition splitting, the partition metadata of the parent partition, as well as the partition range checker of the parent partition, need to be modified to apply to the new child partitions. However, suppose that during the splitting of the parent partition the partition server receives a first read request requesting that a first data record be read from the parent partition, and the reading of the first data record completes later than the splitting of the parent partition. At that point the partition range checker of the parent partition has already been modified, and if the modified checker is used to check the partition range of the first data record, the record is no longer read from the parent partition but from a newly generated child partition, which reduces the accuracy of reading the first data record.
Disclosure of Invention
The application provides a method and a device for reading data records from a distributed system, so as to improve the accuracy of reading the data records.
In a first aspect, a method for reading data records from a distributed system is provided. The distributed system includes a partition server and a client; the partition server is configured to manage a parent partition in a data table and stores a first partition range checker, the partition range of the first partition range checker being the partition range of the parent partition. The method includes: the partition server receives a first read request sent by the client during the splitting of the parent partition, the first read request requesting that a first data record be read from the parent partition; the partition server reads the first data record from the parent partition according to the first read request, where the reading of the first data record completes later than the splitting of the parent partition; and the partition server checks, using the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs.
Checking, with the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs can be understood as follows: if the first read request requests that the first data record be read from the portion of the parent partition stored in a first file, the partition server uses the first partition range checker to check the row key values of the data records read from the first file, thereby obtaining the data records of the parent partition in the first file, and then reads the first data record from those records.
In this embodiment of the application, the first data record requested by the first read request is checked with the first partition range checker. This solves the prior-art problem that, after the parent partition is successfully split, the first partition range checker would have been modified into the partition range checker of a child partition, so that the data records read would no longer fall within the partition range in which the requested data record originally resided. The accuracy of reading data records is thereby improved.
In a possible implementation manner, the partition server stores a second partition range checker, where a partition range of the second partition range checker is a partition range of a first target child partition, and the first target child partition is a child partition obtained by splitting the parent partition, and the method further includes: after the splitting of the parent partition is completed, the partition server receives a second read request, wherein the second read request is used for requesting to read a second data record from the first target child partition; the partition server reads the second data record from the first target child partition; the partition server checks the partition range of the partition to which the row key value of the second data record belongs using the second partition range checker.
In the embodiment of the application, the data record requested to be read by the second read request is checked by using the second partition range checker, which is beneficial to improving the accuracy of reading the data record.
In a possible implementation manner, in the process of splitting the parent partition, a value of a timestamp corresponding to the parent partition is a maximum value, and the method further includes: the partition server acquires a timestamp carried by the first read request; and if the timestamp carried by the first read request is smaller than the timestamp corresponding to the parent partition, the partition server selects the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
In the embodiment of the application, the timestamp corresponding to the parent partition is compared with the timestamp carried by the read request to select different partition range inspectors, so that the accuracy of selecting the partition range inspectors is improved.
In a possible implementation manner, after the splitting of the parent partition is completed, a value of a timestamp corresponding to the parent partition is a splitting completion time of the parent partition, the partition server stores a third partition range checker, a partition range of the third partition range checker is a partition range of a second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition, where the method further includes: the partition server receives a third read request, where the third read request is used to request reading of a third data record from the second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition; the partition server acquires a timestamp carried by the third read request; and if the timestamp carried by the third read request is greater than the timestamp corresponding to the parent partition, the partition server uses the third partition range checker to check the partition range of the partition to which the row key value of the third data record belongs.
In the embodiment of the application, the timestamp corresponding to the parent partition is compared with the timestamp carried by the read request to select different partition range inspectors, so that the accuracy of selecting the partition range inspectors is improved.
In one possible implementation, the partition server stores a plurality of copies of the partition metadata record of the parent partition, and the method further includes: after splitting the parent partition into a plurality of child partitions, the partition server modifies the copies into partition metadata records of the child partitions.
In the embodiment of the application, the partition metadata of the child partition is obtained by modifying the copy of the partition metadata of the parent partition, which is beneficial to simplifying the generation process of the partition metadata of the child partition.
In a second aspect, a partitioned server for reading data records from a distributed system is provided, the partitioned server comprising means for performing the steps of the first aspect or any one of the possible implementations of the first aspect.
The functions of these modules may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In a third aspect, a partitioned server cluster is provided that includes at least one partitioned server, each partitioned server including a processor and a memory. The memory is adapted to store a computer program, and the processor is adapted to invoke and run the computer program from the memory, such that the cluster of partitioned servers performs the method of the first aspect.
In a fourth aspect, there is provided a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of the above-mentioned aspects.
It should be noted that all or part of the computer program code may be stored in a first storage medium, and the first storage medium may be packaged together with the processor or packaged separately from the processor; this is not specifically limited in this embodiment of the present application.
In a fifth aspect, a computer-readable medium is provided, which stores program code, which, when run on a computer, causes the computer to perform the method of the above-mentioned aspects.
Drawings
Fig. 1 is a schematic diagram of an architecture of a distributed system 100 to which an embodiment of the present application is applicable.
Fig. 2 is a schematic diagram of a data table 200 in a distributed system.
Fig. 3 is a flowchart of a partition splitting method according to an embodiment of the present application.
FIG. 4 is a flow diagram of a method of reading data records from a distributed system according to another embodiment of the present application.
FIG. 5 is a flow diagram of a method of reading data records from a distributed system according to another embodiment of the present application.
Fig. 6 is a schematic structural diagram of a partition server that reads data records according to an embodiment of the present application.
FIG. 7 is a schematic block diagram of a partitioned server cluster according to another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
For ease of understanding, a system architecture of a distributed system to which the embodiment of the present application is applicable is first described with reference to fig. 1. The distributed storage system 100 shown in FIG. 1 includes a client 110, a control server 120, a partition server 130, and a storage server 140.
The storage server 140 is configured to provide shared storage space for data records and is responsible for their final persistence; all partition servers may share the data records in the distributed storage system.
The client 110 can be understood as providing an interface for accessing the distributed system. The client may speed up access to data records by caching information about the distributed system, such as the partition location information of the data table.
The control server 120 is configured to manage the partition servers, monitor the state of each partition server in real time, and balance the load among them. The control server also typically stores the metadata record of the data table, the partition metadata of each partition in the data table, and a routing table.
The partition metadata of the partition includes a partition range of the partition described by the partition metadata, and a partition server that manages the partition.
The metadata of the data table is used to describe the row key value range of the row key contained in the data table and the identification of the partition contained in the data table.
The partition server (region server) 130 is configured to maintain its allocated partitions and to process read requests that request data records to be read from those partitions.
In a distributed system, data records may be stored in the form of files. A data record typically consists of one or more key-value pairs (see Table 1); each data record contains a row key denoted by "_id", keys denoted by "key", and values denoted by "value". Logically, these data records can be presented in the form of a data table. To improve access performance, the data table can be split horizontally into a plurality of partitions. For example, in fig. 2, data table 200 may be divided into partition 1, ..., partition m, and partition m+1, where each partition contains a portion of consecutive row keys in the data table, and the values of the consecutive row keys contained in each partition represent the partition range of that partition.
TABLE 1
{"_id":1,"key1":"value1","key2":"value2","key3":"value3"}
{"_id":2,"key1":"value1","key2":"value2","key3":"value3"}
{"_id":3,"key1":"value1","key2":"value2","key3":"value3"}
{"_id":4,"key1":"value1","key2":"value2","key3":"value3"}
{"_id":5,"key1":"value1","key2":"value2","key3":"value3"}
{"_id":6,"key1":"value1","key2":"value2","key3":"value3","key4":"value4"}
In a distributed system, each partition has a respective partition scope. The data records in each partition are organized and stored by the distributed system storage engine, and the data records in the same partition can be dispersed in different data files, and accordingly, the data records in different partitions in the data table can be contained in the same data file. After the partition is split, the partition range corresponding to the partition may also change, and in order to ensure that the data record requested to be read by the read request is data in the partition range, a partition range checker (checker) may be set in the partition server to check the partition range to which the row key value of the read data record belongs.
For example, suppose the partition server manages partition A and partition B in the data table, the partition range of partition A is [1, 100], the partition range of partition B is [101, 200], and the data records of both partitions are scattered across the data files F1, F2, and F3. If a read request asks for the data records of partition A, the storage engine reads records sequentially from files F1, F2, and F3; but because records belonging to partition B are stored in the same files, the engine will read partition B's records at the same time. The partition range checker of partition A can then be used to exclude the records belonging to partition B.
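The filtering described in this example can be sketched as follows. This is a minimal illustration only: the `PartitionRangeChecker` class and the `read_partition` helper are hypothetical names, not taken from the patent.

```python
# Minimal sketch of a partition range checker and of reading one partition
# out of shared data files. All names here are illustrative assumptions.

class PartitionRangeChecker:
    """Holds a partition's inclusive row-key range [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def contains(self, row_key):
        return self.low <= row_key <= self.high


def read_partition(files, checker):
    """Scan the shared files and keep only records in the checker's range."""
    return [rec for f in files for rec in f if checker.contains(rec["_id"])]


# Partition A covers [1, 100]; partition B covers [101, 200]; records of
# both partitions are mixed in the same data files F1, F2, F3.
f1 = [{"_id": 5}, {"_id": 150}]
f2 = [{"_id": 60}, {"_id": 120}]
f3 = [{"_id": 99}, {"_id": 101}]

checker_a = PartitionRangeChecker(1, 100)
records_a = read_partition([f1, f2, f3], checker_a)
# records_a keeps only the row keys 5, 60 and 99; partition B's records
# (150, 120, 101) are excluded by partition A's checker.
```

The same checker object can be handed to every file scan, which is why retaining or replacing it changes which records a read request ultimately returns.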
By comprehensively considering the load and performance of the distributed system, the partition may need to be further split on the basis of the existing partition to generate more partitions. For example, concurrent performance of reads and writes may be improved by partition splitting. For another example, the load of different partition services in the distributed system may be balanced through a partition splitting mechanism, that is, a parent partition in a target partition server with an overloaded load is split into multiple child partitions, and the child partitions are handed over to other partition servers for management, so as to reduce the load of the target partition server.
The partition splitting method is described below with reference to fig. 3 in conjunction with the architecture of the distributed system shown in fig. 1. Fig. 3 is a flowchart of a partition splitting method according to an embodiment of the present application. Fig. 3 includes steps 310 through 380.
And 310, selecting the partition needing to be split as a parent partition by the control server according to the load condition of each partition server and the access frequency of the data records in each partition managed by the partition server.
320, the control server applies for storage space for storing the sub-partitions from the distributed system.
330, the control server sends a partition split (partition split) message to the target partition server where the parent partition is located. The partition splitting message carries the Identification (ID) of the parent partition, the storage space applied for the child partition, the partition metadata of the parent partition and other information.
340, the target partition server closes the partition service of the parent partition, i.e., marks the parent partition as offline. If the target partition server then receives a read request for data records in the parent partition, it feeds back an exception to the client to indicate that the parent partition is no longer in service; the client may subsequently perform a compensating retry.
350, the target partition server divides the parent partition into two child partitions according to the partition metadata of the parent partition and a preset partition dividing strategy, and stores the partition metadata of a first child partition of the two child partitions into the storage space, wherein the partition metadata of a second child partition can occupy the storage space of the partition metadata of the parent partition.
360, the target partition server configures a partition range checker for each of the two child partitions.
Specifically, after the target partition server splits the parent partition, it modifies the partition range of the parent partition's range checker into the partition range of the first child partition, so that it serves as the first child partition's checker. Correspondingly, a new partition range checker is generated to serve as the checker of the second child partition.
370, the target partition server sends a partition split success message to the control server.
380, the control server modifies the routing table, the partition metadata, the metadata of the data table, and other information, so as to record the partition information of the child partitions.
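The checker reconfiguration of step 360 can be sketched as follows. This is a minimal illustration under the assumptions of integer row keys, a single split point, and dict-based checkers; the helper name `split_checkers` is hypothetical, not from the patent.

```python
# Sketch of step 360: after the split, the parent's checker is narrowed to
# the first child's range, and a fresh checker is generated for the second
# child. The dict representation and helper name are illustrative only.

def split_checkers(parent_checker, split_point):
    old_high = parent_checker["high"]
    # Reuse the parent's checker for the first child by narrowing its range.
    parent_checker["high"] = split_point
    first_child_checker = parent_checker
    # Regenerate a brand-new checker for the second child.
    second_child_checker = {"low": split_point + 1, "high": old_high}
    return first_child_checker, second_child_checker


parent = {"low": 1, "high": 100}
first, second = split_checkers(parent, 50)
# first covers [1, 50] (the reused parent checker); second covers [51, 100].
```

Note that in this sketch the first child's checker is the parent's checker object mutated in place, which mirrors the problem the later sections address: an in-flight read of the parent no longer has a checker covering the full parent range.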
As can be seen from the partition splitting process shown in fig. 3, before splitting the parent partition, the partition server will first take the parent partition off line, that is, the partition server does not process the read request requesting to read the data record from the parent partition, which may affect the overall performance of the distributed system and reduce the success rate of reading data by the client.
In order to improve the overall performance of the distributed system and improve the success rate of reading data by the client, the present application also provides a method for reading data records, which is described in detail below with reference to fig. 4 based on the distributed system shown in fig. 1.
FIG. 4 is a flow diagram of a method of reading data records from a distributed system according to another embodiment of the present application. The method shown in fig. 4 includes step 410 and step 420.
410, a target partition server receives a first read request sent by a client in a process of splitting a parent partition, where the first read request is used to request to read a first data record from the parent partition.
That is, the parent partition remains online, whether the target partition server is preparing to split it or is actually splitting it.
Here, "the process of splitting the parent partition" covers both the preparation for the split and the split itself, i.e., the period from when the target partition server receives the split request for the parent partition until the split of the parent partition is complete.
And 420, the target partition server reads the first data record from the parent partition according to the first read request.
In the embodiment of the application, the partition server can also read the data record read by the first read request from the parent partition in the process of splitting the parent partition, so that the problem that the parent partition needs to be offline before splitting in the traditional partition splitting process and the first read request aiming at the data record in the parent partition cannot be processed is avoided, the overall performance of a distributed system is improved, and the success rate of reading data by a client is improved.
There is also a case in which the target partition server completes the reading of the first data record later than the splitting of the parent partition. For example, the first data record requested by the first read request may be large, and by the time the target partition server finishes reading all of it, the parent partition has already been split into a plurality of child partitions. Because the parent partition's range checker has by then been modified into a child partition's range checker, using the child's checker to check the partitions to which the row keys of the first data record belong means that the portion of the first data record outside the child checker's partition range cannot be read out, which reduces the accuracy of reading data records from the distributed system.
For example, suppose the partition range of the parent partition (and of its checker) is [1,100], the partition range of the first child partition (and of its checker) is [1,50], the partition range of the second child partition (and of its checker) is [51,100], and the row key range of the first data record requested by the first read request is [2,90]. If, while the target partition server is reading the first data record, the parent partition's checker is modified into the first child partition's checker, then only the first child's checker can be used to check the first data record. The records whose row key values fall outside the partition range [1,50] are then treated as erroneous records and are not returned to the client, which reduces the accuracy of querying data records.
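The loss of records in this example can be reproduced numerically. The `check_range` helper below is a hypothetical stand-in for the partition range checker, not a name from the patent.

```python
# Numeric illustration of the accuracy problem described above.
# The first read request covers row keys 2..90 of the parent partition.

def check_range(row_keys, low, high):
    """Keep only the row keys inside the inclusive range [low, high]."""
    return [k for k in row_keys if low <= k <= high]


requested = list(range(2, 91))          # row keys 2..90, 89 keys in total

# With the parent's checker [1, 100], nothing is lost:
with_parent_checker = check_range(requested, 1, 100)   # all 89 keys pass

# With the first child's checker [1, 50], keys 51..90 are wrongly discarded:
with_child_checker = check_range(requested, 1, 50)     # only 49 keys remain
```

Retaining the parent's checker for this in-flight request, as the next paragraphs propose, is what keeps all 89 keys readable.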
To avoid this, a partition range checker of the parent partition may be reserved in the target partition server for checking whether the row key value of the first data record belongs to the parent partition.
That is, when the time at which the partition server processes the first read request is later than the time at which the splitting of the parent partition is completed, the target partition server stores a first partition range checker whose partition range is the partition range of the parent partition, and the method further includes:
430, after the parent partition splitting is completed, the target partition server uses the first partition scope checker to check the partition scope of the partition to which the row key value of the first data record belongs.
Checking, with the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs can be understood as follows: if the first read request requests that the first data record be read from the portion of the parent partition stored in a first file, the partition server uses the first partition range checker to check the row key values of the data records read from the first file, thereby obtaining the data records of the parent partition in the first file, and then reads the first data record from those records.
It should be noted that the first partition scope checker (i.e., the partition scope checker of the parent partition) may be stored in the target partition server at all times. However, to conserve partition server resources, the first partition scope checker may be deleted after the target partition server has processed the first read request.
That is, in this embodiment of the application, a read request may be associated with a partition range checker. Accordingly, any read request of the same type as the first read request, i.e., one received during the splitting of the parent partition whose processing completes later than the split, may be associated with the first partition range checker.
Continuing the example above, the partition range of the parent partition (and of its checker) is [1,100], the first child partition's is [1,50], the second child partition's is [51,100], and the row key range of the first data record is [2,90]. Because the first partition range checker remains stored in the target partition server until the target partition server finishes reading the first data record, the entire first data record can be checked with the first partition range checker.
Accordingly, for a read request received by a target partition server after the parent partition split is complete, a partition scope checker for the child partition may be associated.
That is, the target partition server stores a second partition range checker whose partition range is the partition range of a first target child partition, the first target child partition being a child partition obtained by splitting the parent partition, and the method further includes: after the splitting of the parent partition is completed, the target partition server receives a second read request requesting that a second data record be read from the first target child partition; the target partition server reads the second data record from the first target child partition; and the target partition server checks, using the second partition range checker, the partition range of the partition to which the row key value of the second data record belongs.
The first target sub-partition may be any one of a plurality of sub-partitions split based on the parent partition. The first target sub-partition may also be a sub-partition of the plurality of sub-partitions that needs to be managed by the target partition server, and accordingly, other sub-partitions of the plurality of sub-partitions may be allocated to other partition servers for management for load balancing reasons.
To simplify the generation of the partition range checker of a child partition (the second partition range checker), a snapshot may be taken of the first partition range checker to obtain a copy of it, and the copy may then be modified to obtain the second partition range checker. Of course, the second partition range checker may also be obtained by modifying the first partition range checker itself.
It should be noted that, after the parent partition is split, when the target partition server receives a read request requesting to read a data record from the parent partition, a partition version error message may be directly returned to the client to notify the client to update the partition version.
The target partition server may determine the relationship between the reception time of a read request and the split-completion time of the parent partition in multiple ways. For example, the determination may be made using a preset time period: the split start time of the parent partition is taken as the start of the preset time period, a read request received within the preset time period is associated with the first partition range checker, and a read request received after the preset time period may be associated with the second partition range checker. For another example, the determination may be made using timestamps, as described below.
During the splitting of the parent partition, the value of the timestamp that the target partition server maintains for the parent partition is a maximum value, and the method further includes: the target partition server determines, from the timestamp carried by the first read request and the timestamp corresponding to the parent partition, that the timestamp carried by the first read request is smaller than the timestamp corresponding to the parent partition; the target partition server checks, using the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs.
It should be noted that the target partition server may maintain a timestamp for the parent partition (i.e., the timestamp corresponding to the parent partition), and the timestamp carried by each read request that reads data records from the parent partition needs to be compared with the timestamp of the parent partition to determine the partition range checker associated with that read request.
After the parent partition is split, the value of the timestamp corresponding to the parent partition is the split-completion time of the parent partition, the target partition server stores a third partition range checker, the partition range of the third partition range checker is the partition range of a second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition. The method further includes: the target partition server receives a third read request, where the third read request is used to request reading of a third data record from the second target child partition; the target partition server determines that the timestamp carried by the third read request is greater than the timestamp corresponding to the parent partition; the target partition server checks, using the third partition range checker, the partition range of the partition to which the row key value of the third data record belongs.
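The timestamp-based determination above can be sketched as follows (the function and variable names are hypothetical, and using infinity as the "maximum value" is an assumption for illustration):

```python
import math

MAX_TS = math.inf  # "maximum value" held by the parent's timestamp during the split

def select_checker(request_ts, parent_ts, parent_checker, child_checker):
    # During the split, parent_ts == MAX_TS, so every request received in that
    # window satisfies request_ts < parent_ts and uses the parent's checker.
    # After the split, parent_ts is the split-completion time, so a later
    # request satisfies request_ts > parent_ts and uses the child's checker.
    if request_ts < parent_ts:
        return parent_checker
    return child_checker

# A request sent while the split is in progress:
assert select_checker(1000, MAX_TS, "first checker", "third checker") == "first checker"
# A request sent at t=2000, after the split completed at t=1500:
assert select_checker(2000, 1500, "first checker", "third checker") == "third checker"
```

The design choice here is that no per-request bookkeeping is needed: a single timestamp on the parent partition cleanly separates in-flight reads from post-split reads.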
During partition splitting, the target partition server needs to generate partition metadata for the child partitions. To simplify this generation, a snapshot may be taken of the partition metadata of the parent partition to produce multiple copies, so that the partition metadata of the child partitions can be generated by modifying these copies. That is, the target partition server stores multiple copies of the partition metadata record of the parent partition, and the method further includes: after splitting the parent partition into the plurality of child partitions, the target partition server modifies the plurality of copies into partition metadata records of the plurality of child partitions.
The plurality of copies includes the original partition metadata of the parent partition, and the number of copies may be equal to the number of child partitions, so that partition metadata can be generated for each child partition.
For example, if the parent partition is split into a first child partition and a second child partition, the partition range of the first child partition runs from the start row key of the parent partition to the row key corresponding to the split point, and the partition range of the second child partition runs from the row key corresponding to the split point to the end row key of the parent partition. A copy of the partition metadata of the parent partition can then be generated by means of a snapshot. In the original partition metadata of the parent partition, the metadata describing the range from the row key corresponding to the split point to the end row key is deleted, yielding the partition metadata of the first child partition; in the copy, the metadata describing the range from the start row key to the row key corresponding to the split point is deleted, yielding the partition metadata of the second child partition.
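The snapshot-then-modify derivation of child metadata described above can be sketched as follows (the dict layout and function name are assumptions for illustration, not the patent's actual metadata format):

```python
import copy

def split_metadata(parent_meta, split_key):
    # Take a snapshot (copy) of the parent's partition metadata record.
    snapshot = copy.deepcopy(parent_meta)

    # Modify the original into the first child's metadata: it keeps the
    # parent's start row key and ends at the row key of the split point.
    first_child = parent_meta
    first_child["end_row_key"] = split_key

    # Modify the copy into the second child's metadata: it starts at the
    # split point and keeps the parent's end row key.
    second_child = snapshot
    second_child["start_row_key"] = split_key
    return first_child, second_child

first, second = split_metadata({"start_row_key": 1, "end_row_key": 100}, 51)
assert first == {"start_row_key": 1, "end_row_key": 51}
assert second == {"start_row_key": 51, "end_row_key": 100}
```

Because one child reuses the original record and the other reuses the copy, no metadata is built from scratch, which is the simplification the text aims for.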
It should be noted that the snapshot of the partition metadata of the parent partition and the snapshot of the partition range checker of the parent partition described above may be performed at the same time or at different times; this is not limited in the embodiments of the present application.
To facilitate an understanding of the present application, methods of embodiments of the present application are described below with reference to specific examples. It should be understood that the following examples are merely illustrative of the methods of the embodiments of the present application and are not intended to limit the scope of the embodiments of the present application.
FIG. 5 is a flow diagram of a method of reading data records from a distributed system according to another embodiment of the present application. The method shown in FIG. 5 includes steps 510 to 590. It should be noted that preparation for partition splitting may refer to the description in FIG. 3; the following description focuses on the partition splitting process performed by the partition server and on how the partition server processes read requests during the split.
Assume that the parent partition to be split is partition A and the partition server managing partition A is the target partition server.
510, the target partition server performs preparation before the partition split, that is, it adjusts the timestamp corresponding to the parent partition to a maximum value, and takes a snapshot of the partition metadata of partition A and of the partition range checker of partition A to obtain a copy of each.
520, the target partition server receives a first read request requesting that a data record be read from partition A.
530, the target partition server determines the size relationship between the timestamp carried by the first read request and the timestamp corresponding to the parent partition.
Since the value of the timestamp corresponding to the parent partition is the maximum value during the split, the timestamp carried by the first read request received during the split is smaller than the timestamp corresponding to the parent partition.
540, the target partition server associates the partition range checker of partition A with the first read request. That is, the partition range checker of partition A is used to check the partition to which the row key of the data record requested by the first read request belongs.
550, the target partition server splits partition A into child partition A' and child partition B.
Specifically, the target partition server modifies the partition metadata of partition A into the partition metadata of child partition A', and modifies the copy of the partition metadata of partition A into the partition metadata of child partition B. Likewise, the target partition server modifies the partition range checker of partition A into the partition range checker of child partition A', and modifies the copy of the partition range checker of partition A into the partition range checker of child partition B.
In addition, if the target partition server detects that the read requested by the first read request has not yet completed, the target partition server also retains the partition range checker of partition A to check the partition to which the row keys of the data records requested by the first read request belong.
560, the target partition server adjusts the timestamp corresponding to partition A to the time at which the split of partition A was completed.
570, the target partition server receives the second read request.
It should be noted that if the partition version number carried in the second read request differs from the partition version number maintained by the target partition server for partition A, the partition version stored by the client that sent the second read request has expired; in this case, the target partition server returns a partition version error to the client to notify the client to update the partition version. If the partition version number carried in the second read request is the same as the partition version number maintained by the target partition server for partition A, and the second read request requests reading a data record from child partition A', step 580 is performed.
580, the target partition server determines the size relationship between the timestamp carried by the second read request and the timestamp corresponding to partition A.
If the timestamp carried by the second read request is greater than the timestamp corresponding to partition A, this indicates that the second read request was sent after partition A was split; in this case, if the data record requested by the second read request belongs to child partition A', step 590 is executed.
590, the target partition server processes the second read request.
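The flow of steps 510 to 590 can be condensed into the following sketch (the class, method names, and return values are hypothetical; incrementing a partition version number at split completion is a simplifying assumption consistent with the version check described at step 570):

```python
import math

class TargetPartitionServer:
    def __init__(self):
        self.parent_ts = 0   # timestamp maintained for parent partition A
        self.version = 1     # partition version number known to clients

    def prepare_split(self):
        # Step 510: raise the parent's timestamp to the maximum value.
        self.parent_ts = math.inf

    def finish_split(self, completion_ts):
        # Steps 550-560: child partitions now exist; record the completion
        # time and advance the partition version so stale clients are rejected.
        self.version += 1
        self.parent_ts = completion_ts

    def handle_read(self, request_ts, request_version):
        # Step 570: reject requests carrying an outdated partition version.
        if request_version != self.version:
            return "partition version error"
        # Steps 530/580: compare timestamps to pick the range checker.
        if request_ts < self.parent_ts:
            return "check with partition A's checker"   # step 540
        return "check with child partition's checker"   # step 590

server = TargetPartitionServer()
server.prepare_split()
assert server.handle_read(1000, 1) == "check with partition A's checker"
server.finish_split(1500)
assert server.handle_read(2000, 1) == "partition version error"
assert server.handle_read(2000, 2) == "check with child partition's checker"
```

This mirrors the text's ordering: in-flight reads keep partition A's checker, the version error covers clients that have not refreshed their routing after the split, and up-to-date post-split reads are checked against a child partition's checker.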
The method of the embodiment of the present application is described in detail above with reference to fig. 1 to 5, and the apparatus for reading data records from a distributed system of the embodiment of the present application is described in detail below with reference to fig. 6 to 7. It should be noted that the apparatuses shown in fig. 6 to fig. 7 can implement each step in the above method, and are not described herein again for brevity.
FIG. 6 is a schematic structural diagram of a partition server for reading data records according to an embodiment of the present application. The partition server 600 shown in FIG. 6 includes a receiving module 610 and a processing module 620. The partition server is used to manage a parent partition in a data table, and the partition server stores a first partition range checker, the partition range of the first partition range checker being the partition range of the parent partition,
the partition server includes:
a receiving module 610, configured to receive a first read request sent by the client during splitting the parent partition, where the first read request is used to request to read a first data record from the parent partition;
the processing module 620 is configured to read the first data record from the parent partition according to the first read request, where a completion time of reading the first data record is later than a splitting completion time of the parent partition;
the processing module 620 is further configured to check, by using the first partition range checker, a partition range of a partition to which a row key value of the first data record belongs.
Optionally, as an embodiment, the partition server stores a second partition range checker, where a partition range of the second partition range checker is a partition range of a first target child partition, and the first target child partition is a child partition obtained by splitting the parent partition, and the receiving module 610 is further configured to receive, after the splitting of the parent partition is completed, a second read request, where the second read request is used to request to read a second data record from the first target child partition; the processing module 620 is further configured to read the second data record from the first target sub-partition; the processing module 620 is further configured to check, by using the second partition range checker, a partition range of a partition to which the row key value of the second data record belongs.
Optionally, as an embodiment, in the process of splitting the parent partition, the value of the timestamp corresponding to the parent partition is a maximum value, and the processing module 620 is further configured to: acquire the timestamp carried by the first read request; and if the timestamp carried by the first read request is smaller than the timestamp corresponding to the parent partition, check, using the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs.
Optionally, as an embodiment, after the parent partition is split, the value of the timestamp corresponding to the parent partition is the split-completion time of the parent partition, the partition server stores a third partition range checker, the partition range of the third partition range checker is the partition range of a second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition. The receiving module 610 is further configured to receive a third read request, where the third read request is used to request reading of a third data record from the second target child partition; the processing module 620 is further configured to obtain the timestamp carried by the third read request; and the processing module 620 is further configured to check, using the third partition range checker, the partition range of the partition to which the row key value of the third data record belongs, if the timestamp carried by the third read request is greater than the timestamp corresponding to the parent partition.
Optionally, as an embodiment, the partition server stores multiple copies of the partition metadata record of the parent partition, and the processing module 620 is further configured to: after splitting the parent partition into the plurality of child partitions, modifying the plurality of replicas into partition metadata records for the plurality of child partitions.
In an alternative embodiment, the receiving module 610 may be an input/output interface 730, the processing module 620 may be a processor 720, and the partitioned server may further include a memory 710, as shown in fig. 7.
FIG. 7 is a schematic block diagram of a partition server cluster according to another embodiment of the present application. The partition server cluster 700 shown in FIG. 7 may include at least one partition server, and each partition server includes: a memory 710, a processor 720, and an input/output interface 730. The memory 710, the processor 720, and the input/output interface 730 are connected through an internal connection path; the memory 710 is used to store instructions, and the processor 720 is used to execute the instructions stored in the memory 710, so as to control the input/output interface 730 to receive input data and information and to output data such as operation results.
It should be noted that the partition server cluster may include one partition server or a plurality of partition servers. When the cluster includes a plurality of partition servers, the partition servers cooperate with each other to implement the functions implemented by the partition server in the methods shown in FIG. 1 to FIG. 5; in this case, the cluster may include a plurality of memories, a plurality of processors, and a plurality of input/output interfaces, with each partition server structured as shown in FIG. 7. When the cluster includes a single partition server, the structure of that partition server is as shown in FIG. 7, that is, the cluster includes one memory, one processor, and one input/output interface.
In implementation, the steps of the above methods may be performed by integrated logic circuits of hardware in the processor 720 or by instructions in the form of software. The methods disclosed in the embodiments of the present application may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well established in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 710, and the processor 720 reads the information in the memory 710 and performs the steps of the above methods in combination with its hardware. To avoid repetition, details are not described here again.
It should be understood that, in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will also be appreciated that the memory may include both read-only memory and random-access memory and provide instructions and data to the processor in embodiments of the present application. A portion of the processor may also include non-volatile random access memory. For example, the processor may also store information of the device type.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
It should be understood that the term "and/or" herein is only one kind of association relationship describing the association object, and means that there may be three kinds of relationships, for example, a and/or B, and may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be read by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method for reading data records from a distributed system, the distributed system comprising a partition server and a client, the partition server being used for managing a parent partition in a data table, the partition server storing a first partition scope checker, the partition scope of the first partition scope checker being the partition scope of the parent partition,
the method comprises the following steps:
the partition server receives a first read request sent by the client in the process of splitting the parent partition, wherein the first read request is used for requesting to read a first data record from the parent partition;
the partition server reads the first data record from the parent partition according to the first read request, and the completion time of reading the first data record is later than the splitting completion time of the parent partition;
the partition server checks a partition range of a partition to which a row key value of the first data record belongs using the first partition range checker.
2. The method of claim 1, wherein the partition server stores a second partition scope checker, the partition scope of the second partition scope checker being a partition scope of a first target child partition, the first target child partition being a child partition that is split from the parent partition,
the method further comprises the following steps:
after the parent partition is split, the partition server receives a second read request, wherein the second read request is used for requesting to read a second data record from the first target child partition;
the partition server reads the second data record from the first target child partition;
the partition server checks the partition range of the partition to which the row key value of the second data record belongs using the second partition range checker.
3. The method of claim 1, wherein during the splitting of the parent partition, the timestamp value corresponding to the parent partition is a maximum value, and after the splitting of the parent partition is completed, the timestamp value corresponding to the parent partition is a splitting completion time of the parent partition.
4. The method of claim 3, wherein the method further comprises:
the partition server acquires a timestamp carried by the first read request;
and if the timestamp carried by the first read request is smaller than the timestamp corresponding to the parent partition, the partition server selects the first partition range checker to check the partition range of the partition to which the row key value of the first data record belongs.
5. The method of claim 3 or 4, wherein the partition server stores a third partition scope checker, a partition scope of the third partition scope checker is a partition scope of a second target child partition, the second target child partition is a child partition obtained by splitting the parent partition,
the method further comprises the following steps:
the partition server receives a third read request, where the third read request is used to request reading of a third data record from the second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition;
the partition server acquires a timestamp carried by the third read request;
and if the timestamp carried by the third read request is greater than the timestamp corresponding to the parent partition, the partition server uses the third partition range checker to check the partition range of the partition to which the row key value of the third data record belongs.
6. The method of any of claims 1-5, wherein the partition server stores multiple copies of the partition metadata record for the parent partition, the method further comprising:
after splitting the parent partition into the plurality of child partitions, the partition server modifies the plurality of replicas into partition metadata records for the plurality of child partitions.
7. A partition server for reading data records from a distributed system, the distributed system comprising the partition server and a client, the partition server for managing a parent partition in a data table, the partition server storing a first partition scope checker, the partition scope of the first partition scope checker being the partition scope of the parent partition,
the partition server includes:
a receiving module, configured to receive a first read request sent by the client in a process of splitting the parent partition, where the first read request is used to request to read a first data record from the parent partition;
the processing module is used for reading the first data record from the parent partition according to the first reading request, and the completion time of reading the first data record is later than the splitting completion time of the parent partition;
the processing module is further configured to check, using the first partition range checker, a partition range of a partition to which a row key value of the first data record belongs.
8. The partition server of claim 7, wherein the partition server stores a second partition scope checker, the partition scope of the second partition scope checker being a partition scope of a first target child partition, the first target child partition being a child partition obtained by splitting the parent partition,
the receiving module is further configured to receive, by the partition server, a second read request after the parent partition is split, where the second read request is used to request to read a second data record from the first target child partition;
the processing module is further configured to read the second data record from the first target sub-partition;
the processing module is further configured to check, by using the second partition range checker, a partition range of a partition to which the row key value of the second data record belongs.
9. The partition server according to claim 7, wherein during the splitting of the parent partition, the timestamp value corresponding to the parent partition is a maximum value, and after the splitting of the parent partition is completed, the timestamp value corresponding to the parent partition is a splitting completion time of the parent partition.
10. The partition server of claim 9, wherein the processing module is further to:
acquiring a timestamp carried by the first read request;
and if the timestamp carried by the first read request is smaller than the timestamp corresponding to the parent partition, checking, using the first partition range checker, the partition range of the partition to which the row key value of the first data record belongs.
11. The partition server of claim 9 or 10, wherein the partition server stores a third partition range checker, a partition range of the third partition range checker being a partition range of a second target child partition, the second target child partition being a child partition obtained by splitting the parent partition,
the receiving module is further configured to receive a third read request, where the third read request is used to request to read a third data record from the second target child partition, and the second target child partition is a child partition obtained by splitting the parent partition;
the processing module is further configured to obtain a timestamp carried by the third read request;
the processing module is further configured to check, using the third partition range checker, a partition range of a partition to which a row key value of the third data record belongs, if a timestamp carried by the third read request is greater than a timestamp corresponding to the parent partition.
12. The partition server of any of claims 7-11, wherein the partition server stores multiple copies of the partition metadata record for the parent partition, the processing module further to:
after splitting the parent partition into the plurality of child partitions, modifying the plurality of replicas into partition metadata records for the plurality of child partitions.
13. A partitioned server cluster for reading data records from a distributed system, the distributed system comprising the partitioned server cluster and a client, the partitioned server cluster comprising at least one partitioned server, each partitioned server comprising a processor and a memory, the memory for storing a computer program, the processor for invoking and executing the computer program from the memory, such that the partitioned server cluster performs the method of any of claims 1-6.
CN201811323197.7A 2018-11-08 2018-11-08 Method and apparatus for reading data records from a distributed system Active CN111221857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811323197.7A CN111221857B (en) 2018-11-08 2018-11-08 Method and apparatus for reading data records from a distributed system

Publications (2)

Publication Number Publication Date
CN111221857A (en) 2020-06-02
CN111221857B (en) 2023-04-18

Family

ID=70830168

Country Status (1)

Country Link
CN (1) CN111221857B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115113B (en) * 2020-09-25 2022-03-25 北京百度网讯科技有限公司 Data storage system, method, device, equipment and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US7734615B2 (en) * 2005-05-26 2010-06-08 International Business Machines Corporation Performance data for query optimization of database partitions
US9996572B2 (en) * 2008-10-24 2018-06-12 Microsoft Technology Licensing, Llc Partition management in a partitioned, scalable, and available structured storage
GB2521197A (en) * 2013-12-13 2015-06-17 Ibm Incremental and collocated redistribution for expansion of an online shared nothing database
CN106326241A (en) * 2015-06-15 2017-01-11 阿里巴巴集团控股有限公司 Method and apparatus for reading/writing data table in data table splitting process
CN105353988A (en) * 2015-11-13 2016-02-24 曙光信息产业(北京)有限公司 Metadata reading and writing method and device
US10353895B2 (en) * 2015-11-24 2019-07-16 Sap Se Atomic visibility switch for transactional cache invalidation
US10726009B2 (en) * 2016-09-26 2020-07-28 Splunk Inc. Query processing using query-resource usage and node utilization data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220208

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant