CN111290714A

CN111290714A - Data reading method and device

Info

Publication number: CN111290714A
Application number: CN202010081830.7A
Authority: CN
Inventors: 周力
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2020-06-16
Anticipated expiration: 2040-02-06
Also published as: CN111290714B

Abstract

The embodiments of the application provide a data reading method and device, relating to the technical field of distributed storage. The specific implementation is as follows: a slave computing node determines that the identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the slave computing node, and sends a data page read request including the first identifier to a storage node; the storage node determines that a first data page identified by the first identifier in the storage node corresponds to a page splitting process, and sends a target data page to the slave computing node according to the occurrence timing of the page splitting process, where the target data page is the first data page or a second data page. With this method and device, the parent data page and the child data page in the slave computing node match each other, which ensures that the slave computing node reads correct data from the storage node.

Description

Data reading method and device

Technical Field

The embodiment of the application relates to computer technology, in particular to distributed storage technology.

Background

With the support of cloud computing technology and services, business scale has expanded rapidly, placing higher demands on the database service, a core infrastructure service built in the cloud. As a result, a new generation of cloud-native database architecture has emerged, which greatly improves the service capability of cloud databases.

The cloud-native database architecture comprises a master computing node, a slave computing node and a storage node. The master computing node is responsible for reading and writing data, whereas the slave computing node can only read data. Data is no longer written directly to the storage node; instead, the master computing node generates a redo log and transmits it to the storage node, and the storage node replays the redo log into data. The redo log is transmitted to the slave computing node at the same time, and the slave computing node replays it into data when needed, so that the data in the cache of the slave computing node stays up to date.

Because the storage node and the slave computing node replay the redo log independently, their update progress for the same data page may be inconsistent during a certain period of time. For example, when the slave computing node needs to read a child data page of a certain parent data page, the inconsistent update progress can lead to the following situation: a page split related to the child data page has occurred only in the storage node or only in the slave computing node. The child data page that the slave computing node reads from the storage node may then not match the parent data page in the slave computing node, that is, the slave computing node may read erroneous data.
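The inconsistency described above can be illustrated with a minimal sketch. The names `RedoRecord` and `Replica`, and the representation of a page as an `(lsn, value)` pair, are hypothetical and chosen only for illustration; the sketch assumes the redo log is ordered by LSN.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RedoRecord:
    lsn: int        # log sequence number of this record
    page_id: str    # page the record modifies
    value: str      # new page content (stand-in for a real page image)


@dataclass
class Replica:
    """A node (storage or slave) that rebuilds pages by replaying the redo log."""
    pages: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    applied_lsn: int = 0

    def replay(self, log: List[RedoRecord], up_to_lsn: int) -> None:
        # Apply only records not yet applied, up to the requested LSN.
        for rec in log:
            if self.applied_lsn < rec.lsn <= up_to_lsn:
                self.pages[rec.page_id] = (rec.lsn, rec.value)
                self.applied_lsn = rec.lsn


log = [RedoRecord(1, "P1", "a"), RedoRecord(2, "P2", "b"), RedoRecord(3, "P1", "c")]
storage, slave = Replica(), Replica()
storage.replay(log, up_to_lsn=3)  # storage node has replayed everything
slave.replay(log, up_to_lsn=1)    # slave node lags behind

print(storage.pages["P1"], slave.pages["P1"])
```

Both nodes consume the same log, yet at this moment they hold different versions of page `P1`, which is the window in which a parent page and a child page can disagree.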

Disclosure of Invention

The embodiment of the application provides a data reading method and device, which can enable a parent data page in a slave computing node to be matched with a child data page, namely, ensure that the slave computing node reads correct data from a storage node.

In a first aspect, an embodiment of the present application provides a data reading method applied to a storage node, the method including: receiving a data page read request from a compute node, the data page read request including a first identifier; determining that a first data page identified by the first identifier in the storage node corresponds to a page splitting process; and sending a target data page to the compute node according to the occurrence timing of the page splitting process, wherein the target data page is the first data page or a second data page. Optionally, the first identifier is the identifier of a data page to be read, as determined by the compute node, and the data page to be read is a child data page of a first parent data page in the compute node.

In this aspect, the compute node is a slave compute node. With this scheme, when the slave compute node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined from the splitting information. If it does, the correct data page to read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the slave compute node match each other, that is, the slave compute node reads correct data.

In one possible embodiment, the storage node stores partial splitting information. The partial splitting information includes the identifiers of the split data pages corresponding to some of the page splitting processes and the log sequence number (LSN) of the log that triggers each of those page splitting processes, where the split data pages corresponding to those page splitting processes are stored in the storage node. Determining that a first data page identified by the first identifier in the storage node corresponds to a page splitting process includes: determining that the first identifier exists among the identifiers of the split data pages included in the partial splitting information.

This scheme gives a specific implementation of determining whether a data page corresponds to a page splitting process, namely through the splitting information, which makes the determination simple and easy to implement.
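As a minimal sketch of the lookup described above, the partial splitting information can be modelled as a mapping from split page identifier to the LSN of the log that triggered the split. The dictionary layout, the sample values and the function name `split_lsn_for` are assumptions made for illustration, not a layout stated in the application.

```python
from typing import Dict, Optional

# Hypothetical layout: split page identifier -> LSN of the log that triggered the split.
partial_split_info: Dict[int, int] = {42: 1200, 57: 1310}


def split_lsn_for(page_id: int, split_info: Dict[int, int]) -> Optional[int]:
    """Return the triggering LSN if the page corresponds to a page splitting
    process recorded in split_info, otherwise None."""
    return split_info.get(page_id)


print(split_lsn_for(42, partial_split_info))  # page 42 is recorded as split
print(split_lsn_for(7, partial_split_info))   # page 7 has no recorded split
```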

In one possible implementation, the data page read request further includes a third LSN of the first parent data page; before sending the target data page to the compute node according to the occurrence timing of the page splitting process, the method further includes: determining the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node, and the third LSN.

This scheme gives a specific implementation of determining the occurrence timing of the page splitting process, namely through the LSNs of the corresponding logs. Because data is obtained by replaying logs, judging the occurrence timing from the LSNs of the corresponding logs is more accurate.

In a possible implementation, determining the occurrence timing of the page splitting process includes: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the storage node is slower than the data playback speed of the compute node and that the page splitting process is to occur in the storage node.
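The three LSN comparisons above can be written out as a short sketch. The enum and function names are hypothetical. The text does not say what happens when none of the three conditions holds, so this sketch returns `None` in that case, which is an assumption.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "split occurs after the first parent data page is obtained"
    BEFORE_PARENT_OBTAINED = "split occurs before the first parent data page is obtained"
    PENDING_ON_STORAGE = "storage node replays more slowly; split is still to occur there"


def classify_split_timing(first_lsn: int, second_lsn: int, third_lsn: int) -> Optional[SplitTiming]:
    """first_lsn:  LSN of the log that triggers the page splitting process
    second_lsn: LSN of the first data page in the storage node
    third_lsn:  LSN of the first parent data page in the compute node"""
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return None  # not covered by the three stated conditions (assumption)


print(classify_split_timing(first_lsn=100, second_lsn=150, third_lsn=80))
print(classify_split_timing(first_lsn=100, second_lsn=150, third_lsn=120))
print(classify_split_timing(first_lsn=100, second_lsn=90, third_lsn=120))
```

The conditions are checked in the order in which the text states them; as written they are mutually exclusive, so the order does not affect the result.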

In one possible implementation, the page splitting process occurs after the storage node obtains the first parent data page; before sending the target data page to the computing node, the method further includes: sending first information to the computing node, wherein the first information is used for the computing node to determine a target data page; receiving a read request for the target page of data from the compute node.

This embodiment gives a specific implementation of how the compute node is triggered to determine the correct data page to read when the page splitting process corresponding to the first data page occurs after the storage node obtains the first parent data page.

In one possible embodiment, the first information includes: the first LSN of the log that triggers the page splitting process, and indication information indicating that the page splitting process occurs after the storage node obtains the first parent data page.

In a possible implementation, the page splitting process occurs before the storage node obtains the first parent data page, and the target data page is the first data page in the storage node.

In one possible implementation, the data playback speed of the storage node is slower than that of the compute node and the page splitting process is to occur in the storage node, the target data page being the first data page in the storage node; before sending the first data page identified by the first identifier to the compute node, the method further includes: determining that the storage node has finished replaying the log whose LSN is the first LSN.
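Putting the storage-node embodiments of the first aspect together, the handling of a read request might be sketched as follows. All names here are hypothetical: pages are modelled as `(page LSN, payload)` pairs, `wait_for_replay` stands in for whatever mechanism lets the storage node confirm that replay has reached a given LSN, and the tuple return values stand in for real network replies. Returning the page directly when no split is recorded, and returning `"undetermined"` when no stated condition matches, are assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

Page = Tuple[int, str]  # (page LSN, payload); a stand-in for a real page image


def handle_read_request(
    page_id: int,
    parent_lsn: int,                      # third LSN, carried in the read request
    pages: Dict[int, Page],
    split_info: Dict[int, int],           # page id -> LSN of the triggering log
    wait_for_replay: Callable[[int], None],
) -> Tuple[str, Optional[object]]:
    page_lsn, payload = pages[page_id]    # second LSN
    split_lsn = split_info.get(page_id)   # first LSN, if a split is recorded
    if split_lsn is None:
        return ("page", payload)          # no recorded split: ordinary read (assumption)
    if parent_lsn < split_lsn <= page_lsn:
        # Split occurred after the parent was obtained: send the first information
        # so the compute node can determine the target data page itself.
        return ("first_info", {"split_lsn": split_lsn, "split_after_parent": True})
    if split_lsn < page_lsn and split_lsn < parent_lsn:
        return ("page", payload)          # split already reflected on both sides
    if page_lsn < split_lsn <= parent_lsn:
        wait_for_replay(split_lsn)        # storage lags: finish replaying that log first
        return ("page", pages[page_id][1])
    return ("undetermined", None)         # not covered by the stated conditions


split_info = {7: 100}
pages = {7: (150, "left half after split")}
print(handle_read_request(7, 80, pages, split_info, lambda lsn: None))
print(handle_read_request(7, 120, pages, split_info, lambda lsn: None))
```

In the first call the compute node receives the first information rather than a page, and is then expected to issue a second read request for the target data page it determines.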

In a second aspect, an embodiment of the present application provides a data reading method, which is applied to a computing node, and the method includes: determining that an identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node; sending a data page reading request to a storage node, wherein the data page reading request comprises a first identifier, and the first identifier is used for the storage node to determine the occurrence time of a page splitting process corresponding to a first data page identified as the first identifier and send a target data page to the computing node according to the occurrence time; receiving a target data page from the storage node, the target data page being the first data page or a second data page of the storage node.

In this aspect, the compute node is a slave compute node. With this scheme, when the slave compute node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined from the splitting information. If it does, the correct data page to read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the slave compute node match each other and the slave compute node reads correct data.

In one possible implementation, before receiving the target data page from the storage node, the method further includes: receiving first information from the storage node, the first information comprising the first log sequence number (LSN) of the log that triggers the page splitting process and indication information indicating that the page splitting process occurs after the storage node obtains the first parent data page; determining a target data page according to the first information; and sending a request for reading the target data page to the storage node.

This scheme provides a way of triggering the computing node to determine the target data page.

In a possible implementation, determining a target data page according to the first information includes: determining, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from the storage node, wherein the second parent data page is the first parent data page as updated after the page splitting process occurs; and determining the target data page according to the second parent data page.

With this scheme, the slave computing node can obtain the second parent data page, that is, the data page obtained by updating the first parent data page as a result of the page splitting process corresponding to the first data page. The correct data page to read, namely the target data page, can then be determined from the second parent data page, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node reads correct data.

In a possible implementation, determining a target data page according to the first information includes: determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; reading, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between the root data page and the second parent data page, or the target parent data page is the root data page; the second parent data page is a data page resulting from the split of the first parent data page and is a parent data page of the first data page; and determining the target data page according to the data pages from the second parent data page to the target parent data page. It will be appreciated that the identifier of the second parent data page is also the second identifier.

With this scheme, the slave computing node can obtain each data page, from the second parent data page to the target parent data page, that was updated as a result of the page splitting process corresponding to the first data page. The correct data page to read, namely the target data page, can then be determined from those data pages, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node reads correct data.
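One possible reading of the upward walk described above is sketched below. It assumes the compute node already knows the chain of page identifiers from the second parent data page up to the root, and knows which of those pages correspond to a page splitting process; how either is obtained is not specified in this passage. The function and variable names are hypothetical.

```python
from typing import List, Set


def pages_to_reread(path_to_root: List[str], split_pages: Set[str]) -> List[str]:
    """path_to_root lists page ids starting at the second parent data page and
    ending at the root data page. The walk moves upward and stops at the first
    ancestor that does not correspond to a page splitting process (the target
    parent data page), or at the root if every ancestor was split."""
    if not path_to_root:
        return []
    selected = [path_to_root[0]]      # the second parent data page is always re-read
    for page_id in path_to_root[1:]:
        selected.append(page_id)
        if page_id not in split_pages:
            break                     # reached the target parent data page
    return selected


path = ["P_new", "A1", "A2", "ROOT"]
print(pages_to_reread(path, split_pages={"A1"}))
print(pages_to_reread(path, split_pages={"A1", "A2", "ROOT"}))
```

The returned list is the set of pages the compute node would read from the storage node before choosing the target data page.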

In a possible implementation manner, the computing node stores full splitting information, where the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log that triggers each page splitting process; determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: according to the indication information, determining that the second identifier exists among the identifiers of the split data pages included in the full splitting information and that the LSN triggering the corresponding page splitting process is the first LSN, where the second identifier is the identifier of the first parent data page.

This scheme gives a specific implementation of determining that the first parent data page corresponds to a page splitting process in the storage node, and it incurs less network overhead.
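A minimal sketch of this local check follows, assuming the full splitting information is held as a set of `(page identifier, triggering LSN)` pairs. That layout, the sample values and the function name are illustrative assumptions.

```python
from typing import Set, Tuple

# Hypothetical layout: one (split page identifier, triggering LSN) pair per page splitting process.
full_split_info: Set[Tuple[int, int]] = {(42, 1200), (57, 1310)}


def parent_split_matches(second_id: int, first_lsn: int, info: Set[Tuple[int, int]]) -> bool:
    """True if the first parent data page (identified by second_id) was split by
    the same log, with LSN first_lsn, that split the first data page."""
    return (second_id, first_lsn) in info


print(parent_split_matches(42, 1200, full_split_info))
print(parent_split_matches(42, 1310, full_split_info))
```

Because the check runs entirely on data the compute node already holds, no extra round trip is needed, which is consistent with the lower network overhead noted above.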

In a possible implementation manner, the computing node does not store the full splitting information, and a management node stores the full splitting information; the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process; determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: sending a query request to the management node according to the indication information, wherein the query request includes the first LSN and the second identifier, and the query request instructs the management node to determine, according to the full splitting information, whether the first parent data page corresponds to a page splitting process in the storage node; and receiving a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process in the storage node.

This scheme gives another specific implementation of determining that the first parent data page corresponds to a page splitting process in the storage node; it saves storage space on the slave computing node.
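The management-node variant can be sketched in the same way, with an in-process object standing in for the remote management node. `SplitQuery`, `ManagementNode`, `handle_query` and `parent_has_split` are hypothetical names, and a real deployment would replace the direct method call with a network request.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SplitQuery:
    first_lsn: int   # LSN of the log that triggered the split of the first data page
    second_id: int   # identifier of the first parent data page


class ManagementNode:
    """In-process stand-in for the node that holds the full splitting information."""

    def __init__(self, full_split_info: Iterable[Tuple[int, int]]):
        self._info = set(full_split_info)  # (page identifier, triggering LSN) pairs

    def handle_query(self, query: SplitQuery) -> bool:
        return (query.second_id, query.first_lsn) in self._info


def parent_has_split(mgmt: ManagementNode, first_lsn: int, second_id: int) -> bool:
    # The compute node keeps no splitting information; it only sends the query.
    return mgmt.handle_query(SplitQuery(first_lsn=first_lsn, second_id=second_id))


mgmt = ManagementNode([(42, 1200), (57, 1310)])
print(parent_has_split(mgmt, first_lsn=1200, second_id=42))
```

The trade-off relative to the local lookup is one extra request per check in exchange for not holding the full splitting information on the compute node.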

In a third aspect, an embodiment of the present application provides a data reading method applied to a computing node, the method including: determining that the identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node; determining that a first data page identified by the first identifier in a storage node corresponds to a page splitting process; determining a target data page according to the occurrence timing of the page splitting process; and reading the target data page from the storage node.

In a possible implementation manner, the computing node stores therein full splitting information, where the full splitting information includes an identifier of a split data page corresponding to each page splitting process and an LSN of a log that triggers each page splitting process; determining that a first data page identified as the first identifier in the storage node corresponds to a page splitting process, including: and determining that the first identifier exists in the identifiers of the split data pages included in the full split information.

In this scheme, whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process is determined through the splitting information, which is simple to implement and incurs low network overhead.

In a possible implementation manner, the computing node does not store full splitting information, the management node stores full splitting information, and the full splitting information includes an identifier of a split data page corresponding to each page splitting process and an LSN of a log that triggers each page splitting process;

In this scheme, whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process is determined through the splitting information, which is simple and easy to implement.

In a possible implementation manner, before determining the target data page according to the occurrence timing of the page splitting process, the method further includes: determining the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node, and the third LSN of the first parent data page in the computing node.

In a possible implementation, the determining the occurrence timing of the page splitting process includes: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node.

In a possible implementation manner, if the page splitting process occurs after the storage node obtains the first parent data page, determining the target data page includes: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from the storage node, wherein the second parent data page is the first parent data page as updated after the page splitting occurs; and determining the target data page according to the second parent data page.

With this scheme, the slave computing node can obtain the second parent data page, that is, the data page updated as a result of the page splitting process corresponding to the first data page. The correct data page to read, namely the target data page, can then be determined from the second parent data page, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node reads correct data.

In a possible implementation manner, if the page splitting process occurs after the storage node obtains the first parent data page, determining the target data page according to the occurrence timing of the page splitting process includes: determining that the first parent data page corresponds to a page splitting process in the storage node; reading, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between the root data page and the second parent data page, or the target parent data page is the root data page; the second parent data page is a data page resulting from the split of the first parent data page and is a parent data page of the first data page; and determining the target data page according to the data pages from the second parent data page to the target parent data page.

In a possible implementation manner, if the page splitting process occurs before the storage node obtains the first parent data page, the determining the target data page includes: determining the first data page in the storage node as a target data page.

In one possible implementation, if the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node, determining the target data page includes: determining that the storage node has finished replaying the log whose LSN is the first LSN; and determining the first data page in the storage node as the target data page.

In a fourth aspect, an embodiment of the present application provides a data reading apparatus, including: a transceiver module, configured to receive a data page read request from a compute node, the data page read request including a first identifier; and a processing module, configured to determine that a first data page identified by the first identifier in the data reading apparatus corresponds to a page splitting process; the transceiver module is further configured to send a target data page to the compute node according to the occurrence timing of the page splitting process, where the target data page is the first data page or a second data page.

In a possible embodiment, the data reading device stores partial splitting information. The partial splitting information includes the identifiers of the split data pages corresponding to some of the page splitting processes and the log sequence number (LSN) of the log that triggers each of those page splitting processes, where the split data pages corresponding to those page splitting processes are stored in the data reading device; the processing module is specifically configured to determine that the first identifier exists among the identifiers of the split data pages included in the partial splitting information.

In a possible implementation manner, the first identifier is an identifier of a data page to be read, which is determined by the compute node, and the data page to be read is a child data page of a first parent data page in the compute node.

In one possible implementation, the data page read request includes the first identifier and a third LSN of the first parent data page in the compute node; before the transceiver module sends the target data page to the compute node according to the occurrence timing of the page splitting process, the processing module is further configured to: determine the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the data reading device, and the third LSN.

In a possible implementation, the processing module is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determine that the page splitting process occurs after the data reading device obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determine that the page splitting process occurs before the data reading device obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determine that the data playback speed of the data reading device is slower than the data playback speed of the compute node and that the page splitting process is to occur in the data reading device.

In one possible embodiment, the page splitting process occurs after the data reading device obtains the first parent data page; before the transceiver module sends the target data page to the computing node, the transceiver module is further configured to: sending first information to the computing node, wherein the first information is used for the computing node to determine the target data page to be read; receiving a read request for the target page of data from the compute node.

In one possible embodiment, the first information includes: the first LSN of the log that triggers the page splitting process, and indication information indicating that the page splitting process occurs after the data reading device obtains the first parent data page.

In a possible implementation, the page splitting process occurs before the data reading device obtains the first parent data page, and the target data page is the first data page in the data reading device.

In a possible embodiment, the data playback speed of the data reading device is slower than the data playback speed of the computing node and the page splitting process is to occur in the data reading device, the target data page being the first data page in the data reading device. Before the transceiver module sends the first data page to the computing node, the processing module is further configured to: determine that the data reading device has finished replaying the log whose LSN is the first LSN.

In a fifth aspect, an embodiment of the present application provides a data reading apparatus, including: a processing module, configured to determine that the identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the data reading apparatus; and a transceiver module, configured to send a data page read request to a storage node, where the data page read request includes the first identifier, and the first identifier is used by the storage node to determine the occurrence timing of a page splitting process corresponding to a first data page identified by the first identifier and to send a target data page to the data reading apparatus according to the occurrence timing; the transceiver module is further configured to receive the target data page from the storage node, where the target data page is the first data page or a second data page.

In one possible embodiment, before the transceiver module receives the target data page from the storage node: the transceiver module is further configured to receive first information from the storage node, where the first information includes the first log sequence number (LSN) of the log that triggers the page splitting process and indication information indicating that the page splitting process occurs after the storage node obtains the first parent data page; the processing module is further configured to determine the target data page according to the first information; and the transceiver module is further configured to send a request for reading the target data page to the storage node.

In a possible implementation, the processing module is specifically configured to: determine, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node; read a second parent data page from the storage node, wherein the second parent data page is the first parent data page as updated after the page splitting process occurs; and determine the target data page according to the second parent data page.

In a possible implementation, the processing module is specifically configured to: determine, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; read, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between the root data page and the second parent data page, or the target parent data page is the root data page; the second parent data page is a data page resulting from the split of the first parent data page and is a parent data page of the first data page; and determine the target data page according to the data pages from the second parent data page to the target parent data page.

In a possible implementation manner, the data reading device stores full splitting information, where the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log that triggers each page splitting process; the processing module is specifically configured to: according to the first information, determine that the second identifier exists among the identifiers of the split data pages included in the full splitting information and that the LSN triggering the corresponding page splitting process is the first LSN, where the second identifier is the identifier of the first parent data page.

In a possible implementation, the data reading device does not store the full splitting information, and a management node stores the full splitting information; the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process; the processing module is specifically configured to: according to the indication information, control the transceiver module to send a query request to the management node, wherein the query request includes the first LSN and the second identifier, and the query request instructs the management node to determine, according to the full splitting information, whether the first parent data page corresponds to a page splitting process in the storage node; and determine that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module from the management node, wherein the query result indicates that the first parent data page corresponds to a page splitting process in the storage node.

In a sixth aspect, an embodiment of the present application provides a data reading apparatus, including a processing module configured to: determine that the identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the data reading apparatus; determine that a first data page identified by the first identifier in a storage node corresponds to a page splitting process; determine a target data page according to the occurrence timing of the page splitting process; and read the target data page from the storage node.

In a possible implementation manner, the data reading apparatus stores therein full splitting information, where the full splitting information includes an identifier of a split data page corresponding to each page splitting process and an LSN of a log that triggers each page splitting process; the processing module is specifically configured to: and determining that the first identifier exists in the identifiers of the split data pages included in the full split information.

In a possible implementation, the system further comprises a transceiver module; the data reading device does not store full splitting information, the management node stores the full splitting information, and the full splitting information comprises split data page identifiers corresponding to all page splitting processes and LSNs of logs triggering all page splitting processes; the processing module is specifically configured to control the transceiver module to send an inquiry request to the management node, where the inquiry request includes the first identifier, and the inquiry request indicates that the management node determines, according to the full splitting information, that the first data page corresponds to a page splitting process in the storage node; and determining that the first data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module from the management node, wherein the query result indicates that the first data page corresponds to the page splitting process in the storage node.

In a possible implementation manner, before the processing module determines the target data page according to the occurrence timing of the page splitting process, the processing module is further configured to: determine the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node, and the third LSN of the first parent data page in the data reading apparatus.

In a possible implementation, the processing module is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; and if the second LSN is smaller than the first LSN and the first LSN is smaller than or equal to the third LSN, determining that the data playback speed of the storage node is slower than that of the data reading device and the page splitting process is to occur in the storage node.
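The three comparisons above can be sketched as a small classification function. This is a minimal sketch: the enum `SplitTiming` and the function name are hypothetical, the conditions are checked in the order in which the text lists them, and returning `None` for combinations the text does not cover is an assumption.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = 1   # third LSN < first LSN <= second LSN
    BEFORE_PARENT_OBTAINED = 2  # first LSN < second LSN and first LSN < third LSN
    PENDING_IN_STORAGE = 3      # second LSN < first LSN <= third LSN


def classify_split_timing(first_lsn: int, second_lsn: int,
                          third_lsn: int) -> Optional[SplitTiming]:
    """first_lsn: LSN of the log triggering the split;
    second_lsn: LSN of the first data page in the storage node;
    third_lsn: LSN of the first parent data page in the reading device."""
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_IN_STORAGE
    return None  # combination not covered by the three listed cases
```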

In a possible implementation manner, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module is specifically configured to: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second father data page from a storage node, wherein the second father data page is a first father data page updated after the page splitting occurs; and determining a target data page according to the second parent data page.

In a possible implementation manner, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module is specifically configured to: determining that the first parent data page corresponds to a page splitting process in the storage node; reading each data page from a second father data page to a target father data page from a storage node, wherein the target father data page does not correspond to a page splitting process and is positioned between a root data page and the second father data page or the target father data page is the root data page, the second father data page is a data page after the first father data page is split, and the second father data page is a father data page of the first data page; and determining the target data page according to the data pages from the second father data page to the target father data page.
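The upward walk from the second parent data page to the target parent data page can be sketched as follows. The function name `pages_to_reread`, the list `path_to_root` (page identifiers ordered from the second parent data page up to the root data page), and the set `split_page_ids` are all hypothetical; how these are obtained is left open here.

```python
from typing import List, Set


def pages_to_reread(path_to_root: List[str],
                    split_page_ids: Set[str]) -> List[str]:
    """Collect pages from the second parent data page upward, stopping at
    the first page that does not correspond to a page splitting process.
    If every page on the path was split, the walk ends at the root."""
    result = []
    for page_id in path_to_root:
        result.append(page_id)
        if page_id not in split_page_ids:
            break  # this page is the target parent data page
    return result
```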

In a possible implementation manner, if the page splitting process occurs before the storage node obtains the first parent data page, the processing module is specifically configured to: determining the first data page in the storage node as a target data page.

In a possible implementation manner, if the data playback speed of the storage node is slower than the data playback speed of the data reading device and the page splitting process is to occur in the storage node, the processing module is specifically configured to: determining that the storage node completes the log playback of the first LSN; determining the first data page in the storage node as a target data page.
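One way to determine that the storage node has completed the log playback of the first LSN is to poll its playback progress. The sketch below assumes a hypothetical callable `get_replayed_lsn` that returns the highest LSN the storage node has replayed; the polling interval and timeout are illustrative choices, not taken from the application.

```python
import time
from typing import Callable


def wait_until_replayed(get_replayed_lsn: Callable[[], int], first_lsn: int,
                        poll_interval: float = 0.01,
                        timeout: float = 1.0) -> bool:
    """Return True once the storage node has replayed up to first_lsn,
    or False if that does not happen before the timeout expires."""
    deadline = time.monotonic() + timeout
    while get_replayed_lsn() < first_lsn:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Once the function returns `True`, the first data page in the storage node can be taken as the target data page.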

In a seventh aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of any one of the possible designs of the first aspect and the first aspect or to perform the method of any one of the possible designs of the second aspect and the second aspect or to perform the method of any one of the possible designs of the third aspect and the third aspect.

In an eighth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of the possible designs of the first aspect and the first aspect, or to perform the method of any one of the possible designs of the second aspect and the second aspect, or to perform the method of any one of the possible designs of the third aspect and the third aspect.

One embodiment in the above application has the following advantages or benefits: the parent data page and the child data page in the slave computing node can be made to match, which ensures that the slave computing node reads correct data from the storage node. It is determined whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process (the first identifier is the identifier of the data page to be read as initially determined by the slave computing node); if so, the correct data page to be read is determined according to the occurrence timing of the page splitting process, and the correct data page is read from the storage node. This overcomes the technical problem in the prior art that, because the slave computing node and the storage node play back logs into data at inconsistent speeds, the parent data page in the slave computing node may not match the child data page when the slave computing node reads data from the storage node, and achieves the technical effect that the parent data page in the slave computing node matches the child data page, i.e., the slave computing node reads correct data from the storage node.

Other effects of the above-described alternative will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a schematic structural diagram of a second-order B-tree according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a multi-level B-tree according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a page splitting process of a B-tree according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a B + tree according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a page splitting process of a B + tree according to an embodiment of the present application;

FIG. 6 is a system architecture diagram provided in accordance with an embodiment of the present application;

FIG. 7 is a schematic view of a data reading scenario provided in an embodiment of the present application;

FIG. 8 is a first flowchart of a data acquisition method according to an embodiment of the present application;

FIG. 9 is a second flowchart of a data acquisition method according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present application;

FIG. 11 is a block diagram of an electronic device for implementing the data acquisition method according to the embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of singular or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c may be single or multiple. The terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

First, elements related to the present application will be described.

In some data storage scenarios, such as a distributed database, data is stored in pages, each of which may be 4k bytes (16k or any other size). Data is added to the data pages in sequence until the data is full. In a database where data is organized in a B-tree or a variant of a B-tree (e.g., B + tree, B-tree), page splitting may occur if a data page is filled, i.e., one data page is split into two data pages, with new data inserted into the respective data pages in order.

A B-tree or a variant of a B-tree includes non-leaf nodes and leaf nodes, and every non-leaf node has children. In this embodiment, the "children" of a non-leaf node are its child nodes, and the non-leaf node is the parent node of its child nodes.

The following briefly describes the B-tree and the B + tree, respectively.

1. Second-order B-trees, i.e., binary search trees: (1) every non-leaf node has at most two child nodes (Left and Right); (2) each node stores one keyword; (3) the left pointer of a non-leaf node points to a child node whose keyword is smaller than its keyword, and the right pointer points to a child node whose keyword is larger than its keyword. Fig. 1 is a schematic structural diagram of a B-tree according to an embodiment of the present disclosure. Referring to fig. 1, each node corresponds to one data page, and each data page is framed by a square frame; the lowest nodes are leaf nodes, the other nodes are non-leaf nodes, and the uppermost of the non-leaf nodes is the root node. For example, node 101 is the parent node of node 102 and node 103, and nodes 102 and 103 are child nodes of node 101. The data page corresponding to node 101 is the parent data page of the data page corresponding to node 102 and of the data page corresponding to node 103, and the data page corresponding to node 102 and the data page corresponding to node 103 are child data pages of the data page corresponding to node 101.

A search of the second-order B-tree starts from the root node. If the queried keyword is equal to the keyword of the current node, the search hits. Otherwise, if the queried keyword is smaller than the keyword of the node, the search enters the left child node; if the queried keyword is larger than the keyword of the node, the search enters the right child node. If the pointer to the left or right child node is null, the search reports that the corresponding keyword cannot be found.
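The search described above can be sketched as follows. The class `Node` and function `search` are illustrative names; each `Node` stands for one data page holding a single keyword.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the node holding key, or None if the key is not found."""
    node = root
    while node is not None:
        if key == node.key:
            return node  # hit
        # Smaller keywords are in the left child, larger ones in the right.
        node = node.left if key < node.key else node.right
    return None  # reached a null pointer: keyword not found
```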

2. A multi-order B-tree is a multi-way search tree (not binary). For an M-order B-tree: (1) any non-leaf node has at most M child nodes, and M > 2; (2) the number of child nodes of the root node is in [2, M]; (3) the number of child nodes of a non-leaf node other than the root node is in [M/2, M]; (4) each node stores at least M/2-1 (rounded to an integer) and at most M-1 keywords (at least 2 keywords); (5) the number of keywords of a non-leaf node is equal to the number of pointers to child nodes minus 1; (6) the keywords of a non-leaf node are K_1, K_2, …, K_{M-1}, with K_i < K_{i+1}; that is, each node in the M-order B-tree has at most M-1 keywords; (7) the pointers of a non-leaf node are P[1], P[2], …, P[M], where P[1] points to the subtree whose keywords are less than K_1, P[M] points to the subtree whose keywords are greater than K_{M-1}, and every other P[i] points to the subtree whose keywords belong to (K_{i-1}, K_i); (8) all leaf nodes are located at the same level. When M is 3, a structural diagram of a multi-order B-tree is shown in fig. 2. Referring to fig. 2, each node corresponds to one data page, and each data page is framed by a square frame; the lowest nodes are leaf nodes, the other nodes are non-leaf nodes, and the uppermost of the non-leaf nodes is the root node. For example, node 201 is the parent node of node 202, node 203 and node 204, and node 202, node 203 and node 204 are child nodes of node 201. The data page corresponding to node 201 is the parent data page of the data pages corresponding to node 202, node 203 and node 204, and the data pages corresponding to node 202, node 203 and node 204 are child data pages of the data page corresponding to node 201.

Searching the multi-order B tree, starting from a root node, performing binary search on a keyword (ordered) sequence in the node, finishing the search if the keyword (ordered) sequence is hit, and otherwise entering a subnode of a range to which the query keyword belongs; and repeating until the corresponding child node pointer is null or is already a leaf node.
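A minimal sketch of this search, assuming a hypothetical in-memory `BTreeNode` with a sorted `keys` list and a `children` list that is empty for a leaf node. The binary search within a node uses Python's standard `bisect_left`.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int]                                            # sorted keywords
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(root: BTreeNode, key: int) -> bool:
    """Return True if key is stored somewhere in the tree."""
    node = root
    while True:
        i = bisect_left(node.keys, key)  # binary search in the ordered keywords
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # hit in this node
        if not node.children:
            return False                 # already a leaf node: not found
        node = node.children[i]          # child covering the range of key
```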

The following describes a page splitting process according to an embodiment of the present application, taking splitting of a 5-th-order B-tree as an example.

Fig. 3 is a schematic diagram of a page splitting process of a B-tree according to an embodiment of the present disclosure.

As shown in diagram A of fig. 3, 39 is inserted into the empty tree (that is, the data record with key 39 is inserted; in this embodiment of the present application, inserting "Y" means inserting the data record with key Y). At this time, the root node includes one key, and the root node is also a leaf node. Referring to diagram B of fig. 3, 22, 97 and 41 are then inserted, and the root node now includes 4 keys (i.e., 4 data records, one key for each data record). Referring to diagram C of fig. 3, 53 is then inserted; after 53 is inserted, the number of keys in the node exceeds 4, the maximum number allowed, so page splitting is performed centered at 41, as shown in diagram D of fig. 3, where the identifier of the data page including the data records with keys 22 and 39 is the same as the identifier of the data page including the data records 39, 22, 41 and 97 before 53 is inserted.
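The split centered at 41 can be sketched for a single node as follows. The helper `insert_and_split` is a hypothetical name, and it models only the keys of one node (not page identifiers or parent updates); in the text, the left part keeps the identifier of the original data page.

```python
from bisect import insort
from typing import List, Optional, Tuple

MAX_KEYS = 4  # a node of a 5-order B-tree holds at most 4 keys


def insert_and_split(
    keys: List[int], new_key: int
) -> Tuple[List[int], Optional[int], List[int]]:
    """Insert new_key into a sorted node. If the node overflows, split it
    around the middle key and return (left keys, middle key, right keys);
    otherwise return (keys, None, [])."""
    keys = list(keys)     # work on a copy
    insort(keys, new_key)
    if len(keys) <= MAX_KEYS:
        return keys, None, []
    mid = len(keys) // 2  # the middle key moves up to the parent node
    return keys[:mid], keys[mid], keys[mid + 1:]
```

For the node holding 22, 39, 41 and 97, inserting 53 yields the left keys 22 and 39, the middle key 41, and the right keys 53 and 97.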

Sequentially inserting 13, 21 and 40 also results in splitting, as shown in diagram E of fig. 3, followed by sequentially inserting 30, 27, 33, 36, 35, 34, 24 and 29, as shown in diagram F of fig. 3. As shown in diagram G of fig. 3, 26 is then inserted; the node into which 26 is inserted now includes more than 4 keywords, so page splitting needs to be performed centered at 27, and 27 is carried to the parent node. The split result is shown in diagram H of fig. 3, where the identifier of the data page including the data records with keywords 24 and 26 is the same as the identifier of the data page including the data records 24, 26, 29 and 30 before 26 is inserted. The carry causes the current root node to also require page splitting, the result of which is shown in diagram I of fig. 3, where the identifier of the data page including the data records with keys 22 and 27 is the same as the identifier of the data page including the data records 22, 33, 36 and 41 before 27 is inserted.

3. B+ tree: the B+ tree is a variant of the B-tree and is also a multi-way search tree. Its definition is essentially the same as that of the B-tree, except for the following differences: (1) the number of child node pointers of a non-leaf node is the same as the number of its keywords; (2) the child node pointer P[i] of a non-leaf node points to the child node whose keywords belong to [K_i, K_{i+1}]; (3) a chain pointer is added to all leaf nodes; (4) all keywords appear at leaf nodes. Furthermore, the B+ tree has the following characteristics: (1) all keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list are in order; (2) a search does not hit at a non-leaf node, because the non-leaf nodes act as an index (sparse index) to the leaf nodes, and the leaf nodes act as the data layer that stores the data.

For convenience of description in this embodiment, although the non-leaf nodes in the B + tree do not store data, since all the keywords may appear in the leaf nodes, it is also considered that each non-leaf node in the B + tree corresponds to one data page, that is, each node corresponds to one data page in this embodiment.

Fig. 4 is a schematic structural diagram of a B + tree according to an embodiment of the present disclosure, as shown in fig. 4. Referring to fig. 4, the lowest node is a leaf node, the rest of nodes are non-leaf nodes, and the highest node of the non-leaf nodes is a root node. Such as: node 401 is a parent node of node 402, node 403, and node 404, and node 402, node 403, and node 404 are child nodes of node 401. The data page corresponding to node 401 is the data page corresponding to node 402, the data page corresponding to node 403, and the parent data page of the data page corresponding to node 404, and the data page corresponding to node 402, the data page corresponding to node 403, and the data page corresponding to node 404 are the child data pages of the data page corresponding to node 401.

The number of keys and the number of sub-nodes in the B + tree in fig. 4 are the same, and the B + tree is a 4-step B + tree. In another mode, the number of keys in the B + tree is 1 less than the number of children.

The following describes a page splitting process according to an embodiment of the present application, taking splitting of a 5 th-order B + tree as an example.

Fig. 5 is a schematic diagram of a page splitting process of a B + tree according to an embodiment of the present application.

As shown in diagram A of fig. 5, 5 is inserted into the empty tree; the root node now includes one key, and the root node is also a leaf node. Referring to diagram B of fig. 5, 8, 10 and 15 are then inserted, and the root node now includes 4 keys. Referring to diagram C of fig. 5, 16 is then inserted; after 16 is inserted, the number of keys in the node exceeds 4, so page splitting is performed. When a leaf node is split, 2 data records go to the left node and 3 data records go to the right node, and the middle key becomes a key in the index node. The split result is shown in diagram D of fig. 5, where the identifier of the data page including the data records with keys 5 and 8 is the same as the identifier of the data page including the data records 5, 8, 10 and 15 before 16 is inserted.

As shown in diagram E of fig. 5, 17 is then inserted. As shown in diagram F of fig. 5, 18 is then inserted; the node into which 18 is inserted now includes more than 4 keys, so page splitting is required. The split left node includes 2 data records, the right node includes 3 data records, and the key 16 is carried to the parent node (index type). The split result is shown in diagram G of fig. 5, in which the data page including the data records with keys 10 and 15 has the same identifier as the data page including the data records with keys 10, 15, 16 and 17 before 18 is inserted.

After several more data records are inserted, the B+ tree shown in diagram H of fig. 5 is obtained. Then, as shown in diagrams I and J of fig. 5, 7 is inserted; the node into which 7 is inserted now includes more than 4 keys, so page splitting is required. The split left node includes 2 data records, the right node includes 3 data records, and the key 7 is carried to the parent node, where the identifier of the data page including the data records with keys 5 and 6 is the same as the identifier of the data page including the data records with keys 5, 6, 8 and 9 before 7 is inserted. The carry causes the current root node to also require splitting, the result of which is shown in diagram K of fig. 5, where the identifier of the data page including the data records with keys 7 and 10 is the same as the identifier of the data page including the data records 10, 16, 18 and 20 before 7 is inserted.
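The three leaf splits in the walk-through above follow the same pattern: 2 records stay in the left node, 3 go to the right node, and the smallest key of the right node is carried to the parent index node. A minimal sketch, assuming a hypothetical helper `split_leaf` that models only the keys of one leaf node:

```python
from typing import List, Optional, Tuple

MAX_KEYS = 4  # a node of the 5-order B+ tree holds at most 4 keys


def split_leaf(
    keys: List[int], new_key: int
) -> Tuple[List[int], Optional[int], List[int]]:
    """Insert new_key into a leaf. On overflow return
    (left keys, key carried to the parent, right keys);
    otherwise return (keys, None, [])."""
    keys = sorted([*keys, new_key])
    if len(keys) <= MAX_KEYS:
        return keys, None, []
    mid = len(keys) // 2  # 2 records go left, 3 go right
    left, right = keys[:mid], keys[mid:]
    # Unlike the B-tree split, the carried key also stays in the right leaf,
    # because all keys of a B+ tree appear at leaf nodes.
    return left, right[0], right
```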

The following describes a system architecture according to the present application, after a B tree and a B + tree are described.

Fig. 6 is a system architecture diagram provided in an embodiment of the present application. Referring to fig. 6, the system architecture of the present embodiment includes a computing layer and a distributed storage layer, and may further include a management layer. The computing layer may include a plurality of nodes, including a master computing node, which reads and writes data, and slave computing nodes, which may read data from the distributed storage layer. The distributed storage layer includes a plurality of storage nodes for storing data. The management layer includes one or more management nodes.

In the system architecture, data are not directly written into the main computing node for local storage, and are not directly written into the storage node, but the main computing node generates a redo log and transmits the redo log to the storage node, and the storage node plays back the redo log into data. And simultaneously transmitting the redo log to the slave computing node, and if necessary, playing back the redo log into data by the slave computing node so as to keep the data in the cache of the slave computing node up to date. The redo log may be identified by a unique value, which is called a Log Sequence Number (LSN), where the LSN of the log generated first is smaller than the LSN of the log generated later.

That is, the data in the data pages organized in a B-tree or a variant of a B-tree, in both the slave computing node and the storage node, is obtained by playing back redo logs; if data 1 is obtained by playing back redo log 1, data 1 corresponds to redo log 1. Among the redo logs corresponding to the data included in a data page, if the LSN of a first redo log is the largest, the LSN of the first redo log is the LSN of the data page. Correspondingly, among the redo logs corresponding to the data records that trigger a page splitting process, if the LSN of a second redo log is the largest, the second redo log is the redo log triggering the page splitting process.
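The rule that the LSN of a data page is the largest LSN among the redo logs played back into it can be sketched as follows. The classes `RedoLog` and `DataPage` and their fields are hypothetical simplifications; a real redo log carries more than a single key and value.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RedoLog:
    lsn: int    # log sequence number; later logs have larger LSNs
    key: int
    value: str


@dataclass
class DataPage:
    page_id: str
    records: Dict[int, str] = field(default_factory=dict)
    lsn: int = 0  # largest LSN among the logs replayed into this page

    def replay(self, log: RedoLog) -> None:
        """Play back one redo log into data and update the page LSN."""
        self.records[log.key] = log.value
        self.lsn = max(self.lsn, log.lsn)
```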

As described above, the storage node and the slave computing node each independently play back the redo log to generate data, so the update progress of the same data page may differ between them at a given time. For example, when the slave computing node needs to read a child data page of a certain parent data page, the following situation may arise because the update progress of the storage node and that of the slave computing node are inconsistent: a page split related to the child data page has occurred only in the storage node or only in the slave computing node, so that the child data page read by the slave computing node from the storage node may not match the parent data page in the slave computing node, i.e., the slave computing node may read erroneous data. For example, the slave computing node and the storage node both organize data pages in the form of a 5-order B-tree. As shown in (a) of fig. 7, the slave computing node stores the data page 601 but does not store the child data pages of the data page 601; the child data pages are stored in the storage node, and the slave computing node needs to read a child data page of the data page 601. If the B-tree in the storage node is as shown in (b) of fig. 7, the child data pages of the data page 601 stored in the storage node match the data page 601 stored in the slave computing node. However, if the playback speed of the storage node is faster than that of the slave computing node, as shown in (c) to (e) of fig. 7, page splitting occurs in the storage node due to the insertion of 26, and the B-tree in the storage node finally becomes as shown in (e) of fig. 7. At this time, the slave computing node cannot read from the storage node a child data page matching the parent data page 601 in the slave computing node, because the data page shown as 601 no longer exists in the B-tree in the storage node.

In order to solve the above technical problem, a data acquisition method in the present embodiment is proposed. The following describes the data acquisition method of the present application with specific examples.

First, a data reading method corresponding to the storage node storing the splitting information and the computing node and/or the management node storing the full splitting information will be described with a specific embodiment.

Fig. 8 is a first flowchart of a data acquisition method according to an embodiment of the present application. Referring to fig. 8, the method of the present embodiment includes:

Step S801, the slave computing node determines that the identifier of the data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the slave computing node.

The slave computing node stores the first parent data page; when a child data page of the first parent data page needs to be read, the slave computing node determines that the identifier of the child data page is the first identifier.

Step S802, the slave computing node sends a data page read request to the storage node, where the data page read request includes the first identifier.

The slave computing node determines whether its cache includes the child data page identified by the first identifier; if so, it obtains the child data page from the cache and uses it. Using the child data page may be, for example, sending it to a terminal device so that the terminal device displays the data corresponding to the child data page.

If the slave computing node determines that the cache does not include the child data page, it sends the data page read request to the storage node, where the data page read request includes the first identifier. In one approach, the data page read request further includes the LSN of the first parent data page in the slave computing node (for convenience of subsequent description, the LSN of the first parent data page is subsequently referred to as the third LSN).
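The cache check and the read request of step S802 can be sketched as follows. The function `read_child_page`, the request field names `page_id` and `parent_lsn`, and the `send_request` callable are hypothetical; the application does not specify a request format.

```python
from typing import Callable, Dict, Optional


def read_child_page(cache: Dict[str, bytes], first_id: str,
                    third_lsn: Optional[int],
                    send_request: Callable[[dict], bytes]) -> bytes:
    """Return the child data page from the cache if present; otherwise
    request it from the storage node."""
    cached = cache.get(first_id)
    if cached is not None:
        return cached                      # cache hit: no request is sent
    request = {"page_id": first_id}        # the first identifier
    if third_lsn is not None:
        request["parent_lsn"] = third_lsn  # optional LSN of the first parent page
    return send_request(request)
```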

Step S803, the storage node determines that the first data page identified as the first identifier in the storage node corresponds to a page splitting process.

The storage node receives a data page reading request from the slave computing node, determines whether a first data page identified as a first identifier in the storage node corresponds to a page splitting process, and if not, the storage node sends the stored first data page identified as the first identifier to the slave computing node; if yes, go to step S804.

In one mode, the storage node determines, according to the splitting information, whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process. The splitting information includes the identifier of the split data page corresponding to the page splitting process and the LSN of the redo log triggering the page splitting process; it may also include the identifier of the parent data page of the split data page.

Table 1 shows at least part of the contents included in the splitting information.

TABLE 1

Referring to table 1, in the page splitting process, the original data page C is split into a new data page C and a new data page D, the new data page C has the same identifier as the original data page C, the parent data page of the new data page C and the new data page D is P, and the LSN of the log triggering this page splitting process is 24. For example, as shown in diagrams D and E of fig. 3, after 13, 21 and 40 are inserted (assuming 40 is the last inserted of the 3 keys), the data page including the data records with keys 22, 39, 13 and 21 is split into the data page including the data records with keys 13 and 21 and the data page including the data records with keys 39 and 40, and the data page including the data records with keys 22 and 41 is the parent data page of the two split data pages.

That is, for a page splitting process, the splitting information may include an identifier of a split data page corresponding to the page splitting process and an LSN of a log that triggers the page splitting process, where the identifier of the split data page corresponding to the page splitting process may include identifiers of two data pages (e.g., C and D above) involved in the page splitting process. In addition, the split information may also include an identification of the parent of the two data pages.

If a certain data page is split again after the split occurs, the split information further includes the identifier of the split data page corresponding to the re-occurring page splitting process and the LSN of the log triggering the re-occurring page splitting process, and accordingly at least part of the content in the split information may be as shown in table 2:

TABLE 2

Referring to table 2, in the first page splitting process, the original data page C is split into a new data page C and a new data page D, the original data page C and the new data page C have the same identifier, the parent data page of the new data page C and the new data page D is P, and the LSN of the log triggering this page splitting process is 24. The new data page C is then split again into an updated data page C and a new data page D₂; the updated data page C has the same identifier as the new data page C, the parent data page of the updated data page C and the new data page D₂ is P, and the LSN of the log that triggered this page splitting process is 24.

If the parent data page is also split as a result of a page splitting process, the splitting information further includes the identifier of the split data page corresponding to the page splitting process of the parent data page and the LSN of the log triggering the page splitting process of the parent data page. If each page split also causes the parent data page to split, the splitting propagates up to the root data page, and at least part of the corresponding splitting information may be as shown in Table 3:

TABLE 3

Identification of split data pages	Identification of split data pages	Identification of parent data page	LSN of log triggering page splitting
C	D	P1	24
P1	P2	PP1	24
PP1	PP2	PP3	24
……	……	……	24
PPPPP1	PPPPP2	Root	24

Therefore, if the storage node determines that the first identifier exists in the identifiers of the split data pages included in the split information, it is determined that the first data page corresponds to the page splitting process.
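The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the `SplitRecord` type and the `find_split` helper are hypothetical names that do not appear in the application, and the two sample rows are the first two rows of Table 3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SplitRecord:
    """One entry of the split information (hypothetical layout)."""
    split_page_ids: Tuple[str, str]  # identifiers of the two pages involved in the split
    parent_page_id: str              # identifier of their parent data page
    lsn: int                         # LSN of the log that triggered the split


# First two rows of Table 3.
SPLIT_INFO: List[SplitRecord] = [
    SplitRecord(("C", "D"), "P1", 24),
    SplitRecord(("P1", "P2"), "PP1", 24),
]


def find_split(split_info: List[SplitRecord], page_id: str) -> Optional[SplitRecord]:
    """Return the first record whose split pages include page_id, else None."""
    for record in split_info:
        if page_id in record.split_page_ids:
            return record
    return None
```

A page whose identifier appears only in the parent column (for example `PP1` in the sample data) is not reported as split, which matches the rule that only the identifiers of split data pages are checked.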

Further, the split information may be divided into partial split information and full split information.

The full split information includes: the identifiers of the split data pages corresponding to all page splitting processes and the LSN of the redo log that triggered each page splitting process; it may also include the identifier of the parent data page of each split data page. The full split information may be stored in the slave computing node and/or the management node, and may also be stored in the storage node.

At least part of the content of the full split information may be as shown in table 4:

TABLE 4

Identification of split data pages	Identification of split data pages	Identification of parent data page	LSN of log triggering page splitting
100	101	12	24
110	111	15	39
201	202	18	89
210	211	19	99

The partial splitting information may include an identifier of a split data page corresponding to the partial page splitting process and an LSN of a redo log that triggers the partial page splitting process; an identification of a parent data page of the split data page may also be included. The partial splitting information may be stored in a storage node, and accordingly, the split data page corresponding to the partial page splitting process included in the partial splitting information stored in the storage node is stored in the storage node, that is, the data page indicated by the identifier of the split data page included in the partial splitting information is stored in the storage node; in other words, the partial page splitting process included in the partial splitting information stored in the storage node is a page splitting process of at least a partial data page of the data pages stored in the storage node.

The storage node can store the full split information and can also store partial split information. When partial splitting information is stored in the storage node, the storage space of the storage node can be saved, and the efficiency of determining whether the first data page corresponds to the page splitting process can be improved.
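As a rough illustration of how partial split information relates to full split information, the sketch below filters the rows of Table 4 by the pages held on one storage node. The function name `partial_split_info` and the page placement are invented for this example; the application does not specify how pages are distributed, and the sketch assumes both pages of a split pair live on the same node.

```python
# Rows of Table 4: (split page id, split page id, parent page id, LSN).
FULL_SPLIT_INFO = [
    (100, 101, 12, 24),
    (110, 111, 15, 39),
    (201, 202, 18, 89),
    (210, 211, 19, 99),
]


def partial_split_info(full_info, stored_page_ids):
    """Keep only the rows whose split pages are stored on this storage node."""
    return [
        row for row in full_info
        if row[0] in stored_page_ids or row[1] in stored_page_ids
    ]


# Hypothetical placement: this storage node holds pages 100, 101, 110 and 111.
node_pages = {100, 101, 110, 111}
node_partial_info = partial_split_info(FULL_SPLIT_INFO, node_pages)
```

With this placement the node keeps two of the four rows, so both its storage footprint and the number of rows scanned per lookup are halved.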

When at least part of the content of the full split information is shown in table 4, the partial split information may be shown in tables 5 and 6:

TABLE 5

TABLE 6

Step S804, the storage node sends a target data page to the slave computing node according to the occurrence timing of the page splitting process corresponding to the first data page, where the target data page is the first data page or the second data page in the storage node.

Before the storage node sends the target data page to the slave computing node according to the occurrence timing of the page splitting process corresponding to the first data page, the method further includes: the storage node determines the occurrence timing of the page splitting process corresponding to the first data page.

In one scheme, the storage node determining the occurrence timing of the page splitting process corresponding to the first data page includes: determining the occurrence timing according to the first LSN of the log that triggered the page splitting process, the second LSN of the data page identified by the first identifier in the storage node, and the third LSN of the first parent data page in the slave computing node.

If the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, it is determined that the page splitting process corresponding to the first data page occurred after the slave computing node obtained the first parent data page from the storage node. If the first LSN is less than the second LSN and the first LSN is less than the third LSN, it is determined that the page splitting process corresponding to the first data page occurred before the slave computing node obtained the first parent data page from the storage node. If the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, it is determined that the log replay of the storage node lags behind that of the slave computing node, and that the page splitting process corresponding to the first data page is yet to occur in the storage node.
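The three LSN comparisons above can be written as a small classification function. This is a sketch only: the enum `SplitTiming` and the function `classify_split_timing` are hypothetical names, and combinations of LSNs not covered by the three stated conditions are returned as `None` rather than guessed.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_FETCH = "split occurred after the parent page was obtained"
    BEFORE_PARENT_FETCH = "split occurred before the parent page was obtained"
    PENDING_ON_STORAGE = "storage node replay lags; split is yet to occur there"


def classify_split_timing(first_lsn: int, second_lsn: int, third_lsn: int) -> Optional[SplitTiming]:
    """first_lsn: LSN of the log that triggered the split.
    second_lsn: LSN of the page with the first identifier in the storage node.
    third_lsn: LSN of the first parent data page in the slave computing node.
    """
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_FETCH
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_FETCH
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return None  # combination not covered by the three stated conditions
```

For instance, with a split triggered at LSN 24, a storage-side page at LSN 30 and a parent page cached at LSN 20, the first condition holds and the split is classified as having occurred after the parent page was obtained.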

It can be understood that, if the page splitting process corresponding to the first data page occurred before the slave computing node obtained the first parent data page from the storage node, then no page splitting process has occurred for the first data page in the storage node since the first parent data page was obtained; that is, the data page identified by the first identifier in the storage node is the child data page that matches the first parent data page. The storage node therefore sends the data page identified by the first identifier directly to the slave computing node, and the slave computing node obtains correct data. In this case, the target data page is the data page identified by the first identifier in the storage node.

If the occurrence timing of the page splitting process corresponding to the first data page is that the log replay of the storage node lags behind that of the slave computing node and the page splitting process corresponding to the first data page is yet to occur in the storage node, the storage node sends the data page identified by the first identifier to the slave computing node after determining that it has finished replaying the log with the first LSN. The target data page is the data page identified by the first identifier in the storage node.

The log of the first LSN is a log which triggers a page splitting process corresponding to the first data page, so that after the storage node finishes replaying the log of the first LSN, the page splitting process corresponding to the first data page is also finished, at this time, the data page identified as the first identifier is sent to the slave computing node, and the slave computing node can obtain a child data page matched with the first parent data page, that is, correct data can be obtained.
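The waiting case can be illustrated with a toy storage node that replays its pending redo logs in LSN order before answering. All names here (`StorageNode`, `pending_logs`, `read_after_replay`) are hypothetical, and the sketch assumes replay is driven synchronously by the read path; a real storage node would more likely block until a background replay thread passes the first LSN.

```python
class StorageNode:
    """Toy storage node: pages plus a queue of redo logs not yet replayed."""

    def __init__(self, pages, pending_logs):
        self.pages = dict(pages)  # page id -> page content
        # Each log is (lsn, page_id, new_content); replay is in ascending LSN order.
        self.pending_logs = sorted(pending_logs, key=lambda log: log[0])
        self.replayed_lsn = 0

    def replay_next(self):
        """Apply the pending log with the smallest LSN."""
        lsn, page_id, content = self.pending_logs.pop(0)
        self.pages[page_id] = content
        self.replayed_lsn = lsn

    def read_after_replay(self, page_id, first_lsn):
        """Return the page only once the log with first_lsn has been replayed."""
        while self.replayed_lsn < first_lsn and self.pending_logs:
            self.replay_next()
        if self.replayed_lsn < first_lsn:
            raise RuntimeError("log with the first LSN is not available yet")
        return self.pages[page_id]


node = StorageNode(
    pages={"C": [10, 15, 16, 17]},
    pending_logs=[(24, "C", [10, 15]), (20, "C", [10, 15, 16, 17])],
)
page = node.read_after_replay("C", first_lsn=24)
```

After the call, the page returned for identifier `C` is the post-split version, because the log with LSN 24 was replayed before the page was served.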

If the page splitting process corresponding to the first data page occurred after the slave computing node obtained the first parent data page from the storage node, then before the storage node sends the target data page to the slave computing node according to the occurrence timing of the page splitting process corresponding to the first data page, the method further includes steps S805 to S808:

Step S805, the storage node sends the first information to the slave computing node.

The first information may include: the first LSN of the log that triggered the page splitting process corresponding to the first data page, and indication information indicating that the page splitting process corresponding to the first data page occurred after the slave computing node obtained the first parent data page from the storage node.

Step S806, the slave computing node determines the target data page according to the first information.

In one scheme, the slave computing node determining the target data page according to the first information includes a1 to a3 as follows:

a1, according to the first information, the slave computing node determines that the first father data page in the storage node does not correspond to the page splitting process.

If the slave computing node stores the full split information, the slave computing node determining, according to the first information, that the first parent data page in the storage node does not correspond to a page splitting process includes: according to the indication information, judging whether the identifiers of the split data pages included in the full split information contain a second identifier whose corresponding page splitting process was triggered by the log with the first LSN, where the second identifier is the identifier of the first parent data page; the judgment result is negative.

If the slave computing node does not store the full split information and the management node stores the full split information, the slave computing node determining, according to the first information, that the first parent data page in the storage node does not correspond to a page splitting process includes: the slave computing node sends a query request to the management node according to the indication information, where the query request includes the second identifier of the first parent data page and the first LSN, and instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first parent data page does not correspond to a page splitting process. After receiving the query request, the management node judges whether the identifiers of the split data pages included in the full split information contain the second identifier with a corresponding page splitting process triggered by the log with the first LSN, and the judgment result is negative.

a2, the slave computing node reads a second parent data page from the storage node. The second parent data page is the first parent data page as updated after the page splitting process corresponding to the first data page occurred; that is, the second parent data page is the data page identified by the second identifier in the storage node, where the second identifier is the identifier of the first parent data page. It will be appreciated that the identifier of the second parent data page is also the second identifier.

Although the first parent data page in the storage node does not correspond to a page splitting process, the first data page in the storage node underwent a page splitting process after the slave computing node obtained the first parent data page from the storage node. Before that page splitting process occurred in the storage node, the parent data page of the data page identified by the first identifier was the first parent data page; after it occurred, the first parent data page was updated to the second parent data page. For example, as shown in diagrams F to G in fig. 5, after 18 is inserted, the data page containing the data records with keywords 10, 15, 16, 17 is split into the data page containing the data records with keywords 10 and 15 and the data page containing the data records with keywords 10, 16, 17, and the parent data page is updated from the data page containing the data record with keyword 10 to the data page containing the data records with keywords 10 and 16.

The slave computing node reading the second parent data page from the storage node includes: the slave computing node sends a data read request to the storage node, the data read request including the identifier of the first parent data page (i.e., the second identifier described above), and the slave computing node receives the second parent data page from the storage node.

a3, the slave computing node determines the target data page based on the second parent data page.

The target data page determined by the slave computing node from the second parent data page may be the first data page identified by the first identifier in the storage node, or a second data page different from the first data page.

The target data page may be determined according to the current search method of the B-tree or the B-tree variant, based on the second parent data page, which is not described herein again.
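For readers unfamiliar with the B-tree lookup referred to in a3, the sketch below picks a child page from a parent page by key. It is not taken from the application: it assumes a B+-tree-style convention in which each key in the parent is the smallest key reachable through the corresponding child, and the names `choose_child`, `first_page_id` and `second_page_id` are hypothetical.

```python
from bisect import bisect_right


def choose_child(parent_keys, child_ids, search_key):
    """Pick the child whose key range contains search_key.

    parent_keys is sorted; child_ids[i] covers keys from parent_keys[i]
    up to (but not including) parent_keys[i + 1].
    """
    index = max(bisect_right(parent_keys, search_key) - 1, 0)
    return child_ids[index]


# Second parent data page after the split: keywords 10 and 16, two children.
second_parent_keys = [10, 16]
second_parent_children = ["first_page_id", "second_page_id"]
```

Searching the updated parent for key 15 leads to the page that keeps the first identifier, while searching for key 17 leads to the newly created page, which is why the target data page may be either the first or the second data page.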

In another scheme, the slave computing node determining the target data page according to the first information includes b1 to b3 as follows:

b1, the slave computing node determines, according to the first information, that the first parent data page in the storage node corresponds to a page splitting process.

A schematic diagram of the page splitting process corresponding to the first data page and the page splitting process corresponding to the first parent data page may be as shown in G diagram to I diagram in fig. 3, or I diagram to K diagram in fig. 5.

If the slave computing node stores the full split information, the slave computing node determining, according to the first information, that the first parent data page in the storage node corresponds to a page splitting process includes: according to the indication information, judging whether the identifiers of the split data pages included in the full split information contain a second identifier whose corresponding page splitting process was triggered by the log with the first LSN, where the second identifier is the identifier of the first parent data page; the judgment result is affirmative.

If the slave computing node does not store the full split information and the management node stores the full split information, the slave computing node determining, according to the first information, that the first parent data page in the storage node corresponds to a page splitting process includes: the slave computing node sends a query request to the management node according to the indication information, where the query request includes the second identifier of the first parent data page and the first LSN, and instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process. After receiving the query request, the management node judges whether the identifiers of the split data pages included in the full split information contain the second identifier with a corresponding page splitting process triggered by the log with the first LSN, and the judgment result is affirmative.

It can be understood that, when the first parent data page corresponds to the page splitting process, the first parent data page is split into a second parent data page and a third parent data page, the identifier of the second parent data page is the same as the identifier of the first parent data page, and the second parent data page is the parent data page of the data page after the first data page is split. If the first father data page is a root data page before the first father data page is split, a new root father data page is generated after the first father data page is split, and the new root father data page is a father data page of the second father data page and a father data page of the third father data page. If the first parent data page is not the root data page before the first parent data page is split, the parent data pages of the second parent data page and the third parent data page are at least updated, and a page splitting process may also occur.

b2, the slave computing node reads, from the storage node, each data page from the second parent data page to the target parent data page, where the target parent data page does not correspond to a page splitting process and is located between the root data page and the second parent data page, or the target parent data page is the root data page; the second parent data page is a data page obtained by splitting the first parent data page, and the second parent data page is the parent data page of the first data page.

Specifically, after the slave computing node determines that the first parent data page corresponds to a page splitting process in the storage node, it goes on to determine whether the parent data page of the second parent data page (hereinafter referred to as the fourth parent data page) corresponds to a page splitting process, using the same method as for the first parent data page. If the fourth parent data page does not correspond to a page splitting process, the slave computing node acquires the second parent data page and the fourth parent data page (in this case the fourth parent data page is the target parent data page). If the fourth parent data page corresponds to a page splitting process, the slave computing node goes on to determine whether the parent data page of the fourth parent data page (hereinafter referred to as the fifth parent data page) corresponds to a page splitting process, and so on, until a target parent data page that does not correspond to a page splitting process is found. It will be appreciated that in some cases the target parent data page is the root data page corresponding to the root node.

b3, the slave computing node determines the target data page according to the data pages from the second parent data page to the target parent data page.

Similarly, the target data page determined by the slave computing node may be the first data page identified by the first identifier in the storage node, or a second data page different from the first data page.

The target data page may be determined according to the current search method of the B-tree or the B-tree variant, and according to each data page from the second parent data page to the target parent data page, which is not described herein again.
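The upward walk in b2 can be sketched as a loop over ancestors. This is an illustration under assumptions: the `parent_of` mapping, the `split_at_lsn` set of (page identifier, LSN) pairs and the function name are all hypothetical, and the page names merely echo the naming style of Table 3.

```python
def pages_up_to_target_parent(second_parent_id, parent_of, split_at_lsn, first_lsn):
    """Return page ids from the second parent up to the target parent.

    parent_of maps a page id to its parent's id (the root has no entry).
    split_at_lsn is a set of (page_id, lsn) pairs taken from the full split
    information. The walk stops at the first ancestor that did not split at
    first_lsn, or at the root.
    """
    path = [second_parent_id]
    current = second_parent_id
    while True:
        parent = parent_of.get(current)
        if parent is None:  # current is the root data page
            return path
        path.append(parent)
        if (parent, first_lsn) not in split_at_lsn:
            return path  # parent is the target parent data page
        current = parent


# Hypothetical tree: P1 -> PP1 -> PPP1 -> Root, with P1 and PP1 split at LSN 24.
parent_of = {"P1": "PP1", "PP1": "PPP1", "PPP1": "Root"}
split_at_lsn = {("P1", 24), ("PP1", 24)}
path = pages_up_to_target_parent("P1", parent_of, split_at_lsn, first_lsn=24)
```

Here the walk stops at `PPP1`, the first ancestor without a split at LSN 24, so the slave computing node would re-read `P1`, `PP1` and `PPP1` before searching downward again.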

Step S807, the slave computing node sends a read request for the target data page to the storage node.

Wherein the read request for the target page of data may include an identification of the target page of data.

Step S808, the storage node acquires the target data page according to the request for reading the target data page.

And the storage node acquires the target data page according to the identifier of the target data page.

After acquiring the target data page, the storage node sends it to the slave computing node, and the slave computing node receives the target data page. The target data page matches the second parent data page acquired by the slave computing node, or matches the data pages from the second parent data page to the target parent data page, so the slave computing node obtains correct data.

In this embodiment, when the slave computing node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined according to the split information. When the child data page corresponds to a page splitting process, the data page that should actually be read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the slave computing node match each other; that is, the slave computing node reads correct data.

Next, a data reading method corresponding to the case where the storage node does not store the splitting information and the computing node and/or the management node stores the full splitting information will be described with reference to a specific embodiment.

Fig. 9 is a second flowchart of a data reading method according to an embodiment of the present application, and referring to fig. 9, the method according to the embodiment includes:

Step S901, the slave computing node determines that the identifier of the data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the slave computing node.

The slave computing node stores a first parent data page, and when a child data page of the first parent data page needs to be read, the slave computing node determines that the identifier of the child data page is the first identifier. The slave computing node determines whether its cache contains the child data page identified by the first identifier; if so, it acquires the child data page from the cache and applies it. Applying the child data page may consist of, for example, sending it to a terminal device so that the terminal device displays the corresponding data. If the slave computing node determines that its cache does not contain the child data page identified by the first identifier, it determines to read the child data page from the storage node.

Step S902, the slave computing node determines that the first data page identified by the first identifier in the storage node corresponds to a page splitting process.

In one mode, the slave computing node stores the full split information. Accordingly, the slave computing node determining that the first data page in the storage node corresponds to a page splitting process includes: determining that the first identifier exists among the identifiers of the split data pages included in the full split information.

The method can improve the data reading efficiency and reduce the network overhead.

In another mode, the slave computing node does not store the full split information, and the management node stores the full split information. Accordingly, the slave computing node determining that the first data page corresponds to a page splitting process includes: the slave computing node sends a query request to the management node, where the query request includes the first identifier and instructs the management node to determine, according to the full split information, whether the first data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first data page corresponds to a page splitting process. After receiving the query request, the management node judges whether the first identifier exists among the identifiers of the split data pages included in the full split information, and the judgment result is affirmative.

This way storage space of the slave computing node can be saved.
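The trade-off between the two modes can be made concrete with a small simulation. The classes `ManagementNode` and `SlaveComputeNode` and the `queries_served` counter are hypothetical; the counter simply stands in for network round trips to the management node, and the rows are taken from Table 4.

```python
class ManagementNode:
    """Holds the full split information and answers remote queries."""

    def __init__(self, full_split_info):
        self.full_split_info = full_split_info
        self.queries_served = 0  # stands in for network round trips

    def has_split(self, page_id):
        self.queries_served += 1
        return any(page_id in row[:2] for row in self.full_split_info)


class SlaveComputeNode:
    def __init__(self, management_node, full_split_info=None):
        self.management_node = management_node
        self.full_split_info = full_split_info  # None: not stored locally

    def page_has_split(self, page_id):
        if self.full_split_info is not None:
            # First mode: answer from the local copy, no network traffic.
            return any(page_id in row[:2] for row in self.full_split_info)
        # Second mode: ask the management node.
        return self.management_node.has_split(page_id)


FULL = [(100, 101, 12, 24), (110, 111, 15, 39)]  # first two rows of Table 4
mgmt = ManagementNode(FULL)
local_node = SlaveComputeNode(mgmt, full_split_info=FULL)
remote_node = SlaveComputeNode(mgmt)
```

Calling `local_node.page_has_split(101)` leaves `mgmt.queries_served` at zero, whereas each call on `remote_node` increments it, at the benefit of not keeping a copy of the split information on the slave computing node.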

Step S903, the slave computing node determines a target data page according to the occurrence time of the page splitting process corresponding to the first data page.

For the specific implementation of the slave computing node determining the occurrence timing of the page splitting process corresponding to the first data page, refer to the specific implementation of the storage node determining that occurrence timing in the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurred after the slave computing node obtained the first parent data page from the storage node, in one mode, the slave computing node determining the target data page includes: the slave computing node determines that the first parent data page does not correspond to a page splitting process in the storage node; reads a second parent data page from the storage node, where the second parent data page is the first parent data page as updated after the page splitting process corresponding to the first data page occurred; and determines the target data page from the second parent data page. For the specific implementation of the steps in this mode, refer to the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurred after the slave computing node obtained the first parent data page from the storage node, and the first data page is a child data page of the first parent data page, in another mode, the slave computing node determining the target data page includes: the slave computing node determines that the first parent data page corresponds to a page splitting process in the storage node; reads, from the storage node, each data page from a second parent data page to a target parent data page, where the target parent data page does not correspond to a page splitting process and is located between the root data page and the second parent data page, or the target parent data page is the root data page, the second parent data page is a data page obtained by splitting the first parent data page, and the second parent data page is the parent data page of the first data page; and determines the target data page according to the data pages from the second parent data page to the target parent data page. For the specific implementation of the steps in this mode, refer to the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurred before the slave computing node obtained the first parent data page from the storage node, determining the target data page includes: determining the first data page in the storage node as the target data page.

If the occurrence timing of the page splitting process corresponding to the first data page is that the log replay of the storage node lags behind that of the slave computing node and the page splitting process corresponding to the first data page is yet to occur in the storage node, determining the target data page includes: determining that the storage node has finished replaying the log with the first LSN; and determining the first data page in the storage node as the target data page.

Step S904, the slave computing node reads the target data page from the storage node.

In summary, in the embodiment shown in fig. 9, if the full split information is stored in the management node and is stored neither in the slave computing node nor in the storage node, then every time the slave computing node needs to read data from the storage node it has to ask the management node whether the data page to be read corresponds to a page splitting process, so the network overhead is very large. One improvement is the case in the embodiment shown in fig. 9 in which the full split information is stored in the slave computing node: the slave computing node no longer needs to ask the management node whether the data page to be read corresponds to a page splitting process, which reduces the network overhead. Another improvement is the embodiment shown in fig. 8, in which partial split information is stored in the storage node: when the storage node receives a data page read request sent by the slave computing node, it determines from the stored partial split information whether the data page to be read corresponds to a page splitting process, and the full split information is consulted only when the data page to be read corresponds to a page splitting process and that page splitting process occurred at a particular timing. In this case, regardless of whether the full split information is stored in the slave computing node or in the management node, the network overhead is lower than in the scheme in which the full split information is stored only in the management node and the storage node stores neither full nor partial split information.

Next, the creation and cleaning process of split information will be described with specific embodiments.

Since all write operations are performed at the master computing node, when a data page is split, all the related data pages should be in the cache of the master computing node, so the master computing node holds the information of all related data pages. The split information may therefore be created by the master computing node. In an initial process, the master computing node sends newly created split information to other nodes, for example, sends full split information to a slave computing node or a management node, and sends partial split information to a storage node. In the subsequent process, the master computing node sends newly added split information to each node that stores split information, so as to update the split information stored in that node.

As data increases, page splitting happens continuously, so that the amount of information in split information increases, and therefore, the split information needs to be cleaned up. The cleaning process may be controlled by the management node.

In the slave computing node and the storage node, redo logs are replayed in ascending order of LSN, so a log whose LSN is smaller than the LSN currently being replayed will not be encountered again. Based on this fact, the split information can be cleaned according to the LSNs of the logs currently replayed by the slave computing node and the storage node. For example, in one scheme, the management node determines the smaller of the LSNs of the logs most recently replayed by the slave computing node and the storage node, and sends that smaller LSN to each node storing split information, so that each such node deletes the entries in the split information whose LSN is less than that smaller LSN. For example, if the slave computing node has replayed up to the log with LSN 100 and the storage node has replayed up to the log with LSN 120, the entries whose log LSN is less than 100 can be removed from the split information.
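The cleaning rule can be sketched as a filter on the split information. The function name `clean_split_info` is hypothetical, and the last two sample rows (LSN 100 and 110) are invented so that something survives the cleanup; only the first two rows come from Table 4.

```python
def clean_split_info(split_info, compute_replayed_lsn, storage_replayed_lsn):
    """Drop entries whose LSN is below the smaller of the two replayed LSNs.

    Each entry is (split page id, split page id, parent page id, LSN).
    """
    watermark = min(compute_replayed_lsn, storage_replayed_lsn)
    return [row for row in split_info if row[3] >= watermark]


split_info = [
    (100, 101, 12, 24),   # from Table 4
    (210, 211, 19, 99),   # from Table 4
    (300, 301, 25, 100),  # invented row for illustration
    (310, 311, 27, 110),  # invented row for illustration
]

# The slave computing node has replayed up to LSN 100, the storage node up to LSN 120.
remaining = clean_split_info(split_info, 100, 120)
```

The watermark is the minimum of the two positions because an entry may only be discarded once neither node can still need it; with positions 100 and 120, only entries with an LSN below 100 are removed.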

The embodiment gives a specific implementation of generation and deletion of split information for the data reading process described above.

The data reading method in the present application is explained above, and the apparatus according to the embodiment of the present application is explained below.

Fig. 10 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present application. As shown in fig. 10, the data reading device 1000 may be a slave computing node as described above, or may be a component (e.g., an integrated circuit, a chip, etc.) of a slave computing node as described above. The data reading device 1000 may also be a storage node as above, and may also be a component (e.g., an integrated circuit, a chip, etc.) of a storage node as above. The data reading apparatus 1000 may include: a processing module 1002 (processing unit). Optionally, a transceiver module 1001 (transceiver unit) and a storage module 1003 (storage unit) may also be included.

In one possible design, one or more of the modules in FIG. 10 may be implemented by one or more processors or by one or more processors and memory; or by one or more processors and transceivers; or by one or more processors, memories, and transceivers, which are not limited in this application. The processor, the memory and the transceiver can be arranged independently or integrated.

The data reading apparatus has the function of implementing the slave computing node described in the embodiments of the present application. For example, the data reading apparatus includes modules, units or means corresponding to the steps performed by the slave computing node described in the embodiments of the present application, and these functions, units or means may be implemented by software, by hardware, by hardware executing corresponding software, or by a combination of software and hardware. For details, reference may be made to the description of the corresponding method embodiments above.

Or the data reading apparatus has a function of implementing the storage node described in the embodiment of the present application, for example, the data reading apparatus includes a module or a unit or means (means) corresponding to the step of executing the storage node described in the embodiment of the present application by the storage node, and the function or the unit or the means (means) may be implemented by software or hardware, or may be implemented by hardware executing corresponding software, or may be implemented by a combination of software and hardware. Reference may be made in detail to the respective description of the corresponding method embodiments hereinbefore.

Optionally, each module in the data reading apparatus 1000 in this embodiment of the application may be configured to execute the method described in this embodiment of the application.

In a first possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A transceiving module 1001, configured to receive a data page read request from a compute node, where the data page read request includes a first identifier; a processing module 1002, configured to determine that a first data page identified as the first identifier in the data reading apparatus corresponds to a page splitting process; the transceiving module 1001 is further configured to send a target data page to the computing node according to the occurrence opportunity of the page splitting process, where the target data page is the first data page or the second data page.

Optionally, the data reading device stores partial splitting information; the partial splitting information includes identifiers of the split data pages corresponding to some of the page splitting processes and the log sequence number (LSN) of the log triggering each of those page splitting processes, and the split data pages corresponding to those page splitting processes are stored in the data reading device; the processing module 1002 is specifically configured to determine that the first identifier exists among the identifiers of the split data pages included in the partial splitting information.
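
As an illustration only, the lookup described above can be sketched as a mapping from split-page identifier to the LSN of the triggering log. The dictionary layout and the names `PartialSplitInfo`, `find_split_lsn`, and `partial_split_info` are assumptions made for this sketch; the application does not prescribe a data structure.

```python
from typing import Dict, Optional

# Hypothetical layout: identifier of a split data page -> LSN of the log
# that triggered the corresponding page splitting process.
PartialSplitInfo = Dict[int, int]


def find_split_lsn(split_info: PartialSplitInfo, page_id: int) -> Optional[int]:
    """Return the triggering LSN if page_id is recorded as a split data page,
    otherwise None (the page does not correspond to a known split)."""
    return split_info.get(page_id)


# Example data (made up for illustration).
partial_split_info: PartialSplitInfo = {17: 120, 42: 305}
```

A result other than `None` corresponds to "the first identifier exists among the identifiers of the split data pages", and the returned value plays the role of the first LSN.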

Optionally, the first identifier is an identifier of a data page to be read, which is determined by the compute node, and the data page to be read is a child data page of a first parent data page in the compute node.

Optionally, the data page read request includes the first identifier and a third LSN of the first parent data page in the compute node; before the transceiver module 1001 sends the target data page to the computing node according to the occurrence timing of the page splitting process, the processing module 1002 is further configured to: determine the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the data reading device, and the third LSN.

Optionally, the processing module 1002 is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determine that the page splitting process occurs after the data reading device obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determine that the page splitting process occurs before the data reading device obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determine that the data playback speed of the data reading device is slower than that of the compute node and that the page splitting process has yet to occur in the data reading device.
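
The three LSN comparisons above can be written out as a small function. This is a minimal sketch; the enum `SplitTiming` and the function `classify_split_timing` are hypothetical names, and returning `None` for combinations the text does not cover is a choice made here, not something stated in the application.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "after"    # split occurred after the first parent page was obtained
    BEFORE_PARENT_OBTAINED = "before"  # split occurred before the first parent page was obtained
    PENDING_ON_STORAGE = "pending"     # storage-side replay has not yet reached the split


def classify_split_timing(first_lsn: int, second_lsn: int,
                          third_lsn: int) -> Optional[SplitTiming]:
    """first_lsn: LSN of the log that triggered the split.
    second_lsn: LSN of the first data page on the storage side.
    third_lsn: LSN of the first parent data page on the compute side."""
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return None  # combination not covered by the three stated conditions
```

The three conditions are mutually exclusive, so the order of the checks does not affect the result.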

Optionally, the page splitting process occurs after the data reading device obtains the first parent data page; before the transceiving module 1001 sends the target data page to the computing node, the transceiving module 1001 is further configured to: sending first information to the computing node, wherein the first information is used for the computing node to determine the target data page to be read; receiving a read request for the target page of data from the compute node.

Optionally, the first information includes: triggering a first LSN of a log of the page splitting process and indicating information indicating that the page splitting process occurs after the data reading device obtains a first parent data page.

Optionally, the page splitting process occurs before the data reading device obtains the first parent data page, and the target data page is the first data page in the data reading device.

Optionally, the data playback speed of the data reading device is slower than that of the computing node and the page splitting process has yet to occur in the data reading device, and the target data page is the first data page in the data reading device. Before the transceiver module 1001 sends the first data page to the computing node, the processing module 1002 is further configured to: determine that the data reading device has finished replaying the log whose LSN is the first LSN.
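
One way to picture the "replay has finished" check is a polling loop that holds the reply until the storage side has replayed up to the first LSN. The sketch below assumes that replay progress is exposed as a monotonically increasing LSN and that "finished replaying the log whose LSN is the first LSN" means the replayed LSN is at least the first LSN; `wait_for_replay`, `get_replayed_lsn`, and the timeout handling are all hypothetical.

```python
import time
from typing import Callable


def wait_for_replay(get_replayed_lsn: Callable[[], int], first_lsn: int,
                    poll_interval: float = 0.01, timeout: float = 1.0) -> bool:
    """Poll the (hypothetical) replay progress until it reaches first_lsn.

    Returns True once the log with LSN first_lsn has been replayed,
    or False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while get_replayed_lsn() < first_lsn:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

Only after this returns `True` would the first data page be sent, since before that point the page on the storage side does not yet reflect the split.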

The data reading apparatus in the first possible design may be a storage node or a part of a storage node in the implementation of the method shown in fig. 8.

In a second possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A processing module 1002, configured to determine that an identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the computing node; a transceiver module 1001, configured to send a data page reading request to a storage node, where the data page reading request includes the first identifier, and the first identifier is used for the storage node to determine an occurrence time of a page splitting process corresponding to a first data page identified as a first identifier, and send a target data page to the data reading apparatus according to the occurrence time; the transceiving module 1001 is further configured to receive a target data page from the storage node, where the target data page is a second data page or the first data page.

Optionally, before the transceiving module 1001 receives a target data page from the storage node: the transceiver module 1001 is further configured to receive first information from the storage node, where the first information includes: triggering a first Log Sequence Number (LSN) of a log of the page splitting process and indicating information indicating that the page splitting process occurs after the storage node obtains the first parent data page; the processing module 1002 is further configured to determine a target data page according to the first information; the transceiving module 1001 is further configured to send a request for reading the target data page to the storage node.

Optionally, the processing module 1002 is specifically configured to: determine, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node; read a second parent data page from the storage node, wherein the second parent data page is the first parent data page as updated after the page splitting process occurs; and determine the target data page according to the second parent data page.

Optionally, the processing module 1002 is specifically configured to: determine, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; read, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page either does not correspond to a page splitting process and is located between the root data page and the second parent data page, or is the root data page; the second parent data page is a data page obtained after the first parent data page is split, and the second parent data page is the parent data page of the first data page; and determine the target data page according to the data pages from the second parent data page to the target parent data page.
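
The range of pages "from the second parent data page to the target parent data page" can be pictured as a walk up the tree that stops at the first ancestor not involved in a split, or at the root. The sketch below is only one possible reading: the `parent_of` mapping, the `split_page_ids` set, and the function name are hypothetical, and this passage does not say how the compute node locates each ancestor.

```python
from typing import Dict, List, Optional, Set


def read_chain_to_target_parent(second_parent_id: int,
                                parent_of: Dict[int, Optional[int]],
                                split_page_ids: Set[int]) -> List[int]:
    """Return page identifiers from the second parent page up to the target
    parent page (the first ancestor without a split, or else the root)."""
    chain = [second_parent_id]
    current = parent_of.get(second_parent_id)
    while current is not None:
        chain.append(current)
        if current not in split_page_ids:
            break  # target parent found: it does not correspond to a split
        current = parent_of.get(current)  # keep climbing toward the root
    return chain
```

If every ancestor was split, the loop ends when the root (whose parent is `None`) has been appended, which matches the case where the target parent data page is the root data page.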

Optionally, the data reading device stores full splitting information, where the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log that triggers each page splitting process; the processing module 1002 is specifically configured to: determine, according to the first information, that a second identifier exists among the identifiers of the split data pages included in the full splitting information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, where the second identifier is the identifier of the first parent data page.

Optionally, the data reading device does not store full splitting information, and a management node stores the full splitting information; the full splitting information includes the identifier of the split data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process; the processing module 1002 is specifically configured to: according to the indication information, control the transceiver module 1001 to send a query request to the management node, where the query request includes the first LSN and the second identifier and instructs the management node to determine, according to the full splitting information, whether the first parent data page corresponds to a page splitting process in the storage node; and determine that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module 1001 from the management node, where the query result indicates that the first parent data page corresponds to a page splitting process in the storage node.
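
The query exchange with the management node can be sketched as follows. The classes `SplitQuery` and `ManagementNode`, the in-process method call standing in for a network request, and the representation of the full splitting information as a set of (page identifier, LSN) pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class SplitQuery:
    first_lsn: int       # LSN of the log that triggered the split
    parent_page_id: int  # the "second identifier": id of the first parent data page


class ManagementNode:
    """Hypothetical holder of the full splitting information."""

    def __init__(self, full_split_info: Set[Tuple[int, int]]):
        # Each entry is (split page identifier, LSN of the triggering log).
        self._full_split_info = full_split_info

    def query(self, request: SplitQuery) -> bool:
        """True if the parent page is recorded as split by the log with first_lsn."""
        return (request.parent_page_id, request.first_lsn) in self._full_split_info


manager = ManagementNode({(20, 305), (30, 305), (17, 120)})
```

A `True` result corresponds to the query result indicating that the first parent data page corresponds to a page splitting process in the storage node; a `False` result would lead to the other branch, in which the updated first parent data page is read directly.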

The data reading device in the second possible design may be a slave computing node or a part of a slave computing node in the implementation of the method.

In a third possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A processing module 1002, configured to determine that an identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of the first parent data page in the data reading apparatus; determining that a first data page identified as the first identifier in the storage node corresponds to a page splitting process; determining a target data page according to the occurrence opportunity of the page splitting process; reading the target page of data from the storage node.

Optionally, the data reading device stores therein full splitting information, where the full splitting information includes an identifier of a split data page corresponding to each page splitting process and an LSN of a log that triggers each page splitting process; the processing module 1002 is specifically configured to: and determining that the first identifier exists in the identifiers of the split data pages included in the full split information.

Optionally, a transceiver module 1001 is further included; the data reading device does not store full splitting information, the management node stores the full splitting information, and the full splitting information comprises split data page identifiers corresponding to all page splitting processes and LSNs of logs triggering all page splitting processes; the processing module 1002 is specifically configured to control the transceiver module 1001 to send an inquiry request to the management node, where the inquiry request includes the first identifier, and the inquiry request indicates that the management node determines, according to the full splitting information, that the first data page corresponds to a page splitting process in the storage node; and determining that the first data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module 1001 from the management node, where the query result indicates that the first data page corresponds to a page splitting process in the storage node.

Optionally, before the processing module 1002 determines the target data page according to the occurrence timing of the page splitting process, the processing module 1002 is further configured to: and determining the occurrence opportunity of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node and the third LSN of the first parent data page in the data reading device.

Optionally, the processing module 1002 is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; and if the second LSN is smaller than the first LSN and the first LSN is smaller than or equal to the third LSN, determining that the data playback speed of the storage node is slower than that of the data reading device and the page splitting process is to occur in the storage node.

Optionally, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second father data page from a storage node, wherein the second father data page is a first father data page updated after the page splitting occurs; and determining a target data page according to the second parent data page.

Optionally, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining that the first parent data page corresponds to a page splitting process in the storage node; reading each data page from a second father data page to a target father data page from a storage node, wherein the target father data page does not correspond to a page splitting process and is positioned between a root data page and the second father data page or the target father data page is the root data page, the second father data page is a data page after the first father data page is split, and the second father data page is a father data page of the first data page; and determining the target data page according to the data pages from the second father data page to the target father data page.

Optionally, if the page splitting process occurs before the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining the first data page in the storage node as a target data page.

Optionally, if the data playback speed of the storage node is slower than the data playback speed of the data reading apparatus and the page splitting process in the storage node is to occur, the processing module 1002 is specifically configured to: determining that the storage node completes the log playback of the first LSN; determining the first data page in the storage node as a target data page.

The data reading device in the third possible design may be a slave computing node or a part of a slave computing node in the implementation of the method shown in fig. 9.

The apparatus of this embodiment may be configured to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 11 is a block diagram of an electronic device implementing the data reading method according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 11, the electronic device includes: one or more processors 1101, a memory 1102, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple types of memory, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 11, a processor 1101 is taken as an example.

The memory 1102 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the data reading method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the data reading method provided by the present application.

The memory 1102, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the processing module 1002 and the transceiver module 1001 shown in fig. 10) corresponding to the data reading method in the embodiment of the present application. The processor 1101 executes various functional applications of the server and data processing, i.e., implements the data reading method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 1102.

The memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of an electronic device implementing the data reading method, and the like. Further, the memory 1102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected via a network to an electronic device implementing the data reading method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device implementing the data reading method may further include: an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103 and the output device 1104 may be connected by a bus or other means, and are exemplified by being connected by a bus in fig. 11.

The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of an electronic apparatus implementing the data reading method, such as an input device like a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, etc. The output devices 1104 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the embodiments of the present application, when a child data page of a certain data page needs to be read from the storage node, whether the child data page corresponds to a page splitting process can be determined according to the splitting information; if it does, the data page to be read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the computing node match each other, that is, the computing node reads correct data from the storage node.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A data reading method applied to a storage node, the method comprising:

receiving a data page read request from a compute node, the data page read request including a first identification;

determining that a first data page identified as the first identifier in the storage node corresponds to a page splitting process;

and sending a target data page to the computing node according to the occurrence opportunity of the page splitting process, wherein the target data page is the first data page or the second data page.

2. The method of claim 1, wherein the storage node stores partial split information; the partial split information comprises identifiers of the split data pages corresponding to some of the page splitting processes and the log sequence number (LSN) of the log triggering each of those page splitting processes, and the split data pages corresponding to those page splitting processes are stored in the storage node;

determining that a first data page identified as the first identification in the storage node corresponds to a page splitting process, comprising: determining that the first identifier exists in the identifiers of the split data pages included in the partial split information.

3. The method according to claim 1 or 2, wherein the first identifier is an identifier of a data page to be read determined by the compute node, and the data page to be read is a child data page of a first parent data page in the compute node.

4. The method of claim 3, wherein the data page read request further includes a third LSN of the first parent data page;

before the sending the target data page to the computing node according to the occurrence opportunity of the page splitting process, the method further includes: and determining the occurrence opportunity of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node and the third LSN.

5. The method of claim 4, wherein the determining the occurrence of the page splitting process comprises:

if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page;

if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page;

if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, then it is determined that the data playback speed of the storage node is slower than the data playback speed of the compute node and the page splitting process is pending to occur in the storage node.

6. The method according to any one of claims 3 to 5, wherein the page splitting process occurs after the storage node obtains the first parent data page, and before the sending of the target data page to the compute node, the method further comprises:

sending first information to the computing node, wherein the first information is used for the computing node to determine a target data page;

receiving a read request for the target page of data from the compute node.

7. The method according to any one of claims 3 to 5, wherein the first information comprises: triggering a first LSN of a log of the page splitting process and indicating information indicating that the page splitting process occurs after the storage node obtains the first parent data page.

8. The method according to any one of claims 3 to 5, wherein the page splitting process occurs before the storage node obtains the first parent data page, and the target data page is the first data page in the storage node.

9. The method according to any one of claims 3 to 5, wherein the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node, the target data page being the first data page in the storage node; prior to sending the first data page identified as the first identification to the compute node, further comprising:

determining that the storage node has finished replaying the log whose LSN is the first LSN.

10. A data reading method applied to a compute node, the method comprising:

determining that an identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node;

sending a data page reading request to a storage node, wherein the data page reading request comprises the first identifier, and the first identifier is used for the storage node to determine the occurrence time of a page splitting process corresponding to the first data page identified as the first identifier and send a target data page to the computing node according to the occurrence time;

receiving a target data page from the storage node, the target data page being a second data page or the first data page.

11. The method of claim 10, further comprising, prior to receiving a target page of data from the storage node:

receiving first information from the storage node, the first information comprising: triggering a first Log Sequence Number (LSN) of a log of the page splitting process and indicating information indicating that the page splitting process occurs after the storage node obtains the first parent data page;

determining a target data page according to the first information;

and sending a request for reading the target data page to the storage node.

12. The method of claim 11, wherein determining a target data page based on the first information comprises:

determining, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node;

reading a second parent data page from the storage node, wherein the second parent data page is a first parent data page updated after the page splitting process occurs;

and determining the target data page according to the second parent data page.

13. The method of claim 11, wherein determining a target data page based on the first information comprises:

determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node;

reading, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page either does not correspond to a page splitting process and is located between the root data page and the second parent data page, or is the root data page; the second parent data page is a data page obtained after the first parent data page is split, and the second parent data page is the parent data page of the first data page;

and determining the target data page according to the data pages from the second parent data page to the target parent data page.

14. The method of claim 13, wherein the compute node has stored therein full split information, the full split information including an identification of a split data page corresponding to each page split process and an LSN of a log that triggered each page split process;

determining that the first parent data page corresponds to a page splitting process in the storage node according to the first information, including: according to the indication information, determining that a second identifier exists among the identifiers of the split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.
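
The lookup recited in claim 14 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the full split information is kept as an in-memory mapping from a split data page identifier to the LSNs of the logs that triggered splits of that page; the names `FullSplitInfo`, `record`, and `parent_was_split` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FullSplitInfo:
    """Hypothetical container for the full split information.

    Maps the identifier of each split data page to the LSNs of the
    logs that triggered a page splitting process on that page.
    """
    splits: Dict[int, Set[int]] = field(default_factory=dict)

    def record(self, page_id: int, lsn: int) -> None:
        # Register one page splitting process (assumed bookkeeping step).
        self.splits.setdefault(page_id, set()).add(lsn)

    def parent_was_split(self, second_id: int, first_lsn: int) -> bool:
        # True only if the second identifier is among the split pages
        # AND the split was triggered by the log whose LSN is first_lsn.
        return first_lsn in self.splits.get(second_id, set())


info = FullSplitInfo()
info.record(page_id=42, lsn=1000)
print(info.parent_was_split(second_id=42, first_lsn=1000))
print(info.parent_was_split(second_id=42, first_lsn=999))
```

Both conditions are checked together, so a page that was split by a different log does not count as corresponding to the page splitting process in question.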

15. The method of claim 13, wherein the compute node has no full split information stored therein, and a management node has the full split information stored therein; the full splitting information comprises the identifier of a splitting data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process;

determining that the first parent data page corresponds to a page splitting process in the storage node according to the first information, including:

sending a query request to the management node according to the indication information, wherein the query request comprises the first LSN and a second identifier, the second identifier is the identifier of the first parent data page, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node;

receiving a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process in the storage node.

16. A data reading method applied to a compute node, the method comprising:

determining a target data page according to the occurrence time of the page splitting process;

reading the target data page from the storage node.

17. The method of claim 16, wherein the compute node stores therein full split information, the full split information including an identifier of a split data page corresponding to each page split process and an LSN of a log that triggers each page split process;

determining that a first data page identified as the first identifier in the storage node corresponds to a page splitting process, including: and determining that the first identifier exists in the identifiers of the split data pages included in the full split information.

18. The method according to claim 16, wherein the compute node does not store therein full split information, and the management node stores therein full split information, wherein the full split information includes an identifier of a split data page corresponding to each page split process and an LSN of a log that triggers each page split process;

determining that a first data page identified as the first identifier in the storage node corresponds to a page splitting process, including:

sending a query request to the management node, wherein the query request comprises the first identifier, and the query request instructs the management node to determine, according to the full split information, whether the first data page corresponds to a page splitting process in the storage node;

and receiving a query result from the management node, wherein the query result indicates that the first data page corresponds to a page splitting process in the storage node.

19. A data reading apparatus, comprising:

a transceiver module for receiving a data page read request from a compute node, the data page read request including a first identifier;

a processing module for determining that a first data page identified as the first identifier in the data reading apparatus corresponds to a page splitting process;

and the transceiver module is further configured to send a target data page to the compute node according to the occurrence time of the page splitting process, where the target data page is the first data page or a second data page.

20. The apparatus of claim 19, wherein the data reading apparatus has stored therein partial split information; the partial split information comprises an identifier of a split data page corresponding to each of a part of the page splitting processes and a log sequence number (LSN) of the log triggering that page splitting process, and the split data pages corresponding to that part of the page splitting processes are stored in the data reading apparatus;

the processing module is specifically configured to determine that the first identifier exists in identifiers of split data pages included in the partial split information.

21. The apparatus according to claim 19 or 20, wherein the first identifier is an identifier of a data page to be read determined by the compute node, and the data page to be read is a child data page of a first parent data page in the compute node.

22. The apparatus of claim 21, wherein the data page read request further comprises a third LSN of the first parent data page in the compute node;

before the transceiver module sends the target data page to the compute node according to the occurrence time of the page splitting process, the processing module is further configured to:

determine the occurrence time of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the data reading apparatus, and the third LSN.

23. The apparatus of claim 22, wherein the processing module is specifically configured to:

if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determine that the page splitting process occurs after the data reading apparatus obtains the first parent data page;

if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determine that the page splitting process occurs before the data reading apparatus obtains the first parent data page;

if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determine that the data playback speed of the data reading apparatus is slower than the data playback speed of the compute node and that the page splitting process has yet to occur in the data reading apparatus.
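
The three LSN comparisons of claim 23 can be sketched as a small classification function. This is an illustrative sketch only: the enum `SplitTiming`, the function `classify_split_timing`, and the fallback value `UNDETERMINED` (for LSN combinations that the claim does not address, such as all three LSNs being equal) are assumptions and are not part of the application.

```python
from enum import Enum


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "after"    # split occurs after the first parent data page is obtained
    BEFORE_PARENT_OBTAINED = "before"  # split occurs before the first parent data page is obtained
    PENDING_ON_STORAGE = "pending"     # storage-side playback lags; split has yet to occur there
    UNDETERMINED = "undetermined"      # assumed fallback, not recited in the claim


def classify_split_timing(first_lsn: int, second_lsn: int, third_lsn: int) -> SplitTiming:
    """Classify the occurrence time of a page splitting process.

    first_lsn:  LSN of the log that triggers the page splitting process
    second_lsn: LSN of the first data page in the storage-side apparatus
    third_lsn:  LSN of the first parent data page in the compute node
    """
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return SplitTiming.UNDETERMINED


print(classify_split_timing(first_lsn=20, second_lsn=30, third_lsn=10))
```

The three recited conditions are mutually exclusive, so the order in which they are tested does not affect the result.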

24. The apparatus according to any one of claims 21 to 23, wherein the page splitting process occurs after the data reading apparatus obtains the first parent data page; before the transceiver module sends the target data page to the compute node, the transceiver module is further configured to:

send first information to the compute node, wherein the first information is used for the compute node to determine the target data page;

receive a read request for the target data page from the compute node.

25. The apparatus of claim 24, wherein the first information comprises: a first LSN of the log that triggers the page splitting process, and indication information indicating that the page splitting process occurs after the data reading apparatus obtains the first parent data page.

26. The apparatus of any of claims 21 to 23, wherein the page splitting process occurs before the data reading apparatus obtains the first parent data page, and the target data page is the first data page in the data reading apparatus.

27. The apparatus according to any one of claims 19 to 22, wherein the data playback speed of the data reading apparatus is slower than the data playback speed of the compute node and the page splitting process has yet to occur in the data reading apparatus, the target data page being the first data page in the data reading apparatus; before the transceiver module sends the first data page to the compute node, the processing module is further configured to: determine that the data reading apparatus has finished playing back the log whose LSN is the first LSN.
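
One way to realize the check in claim 27, namely that the first data page is sent only after the log whose LSN is the first LSN has been played back, is to block the read on a playback watermark. The sketch below assumes a single monotonically increasing "applied LSN" maintained by the playback thread; the class `ReplayProgress` and its methods are hypothetical names, and the application does not prescribe this mechanism.

```python
import threading
from typing import Optional


class ReplayProgress:
    """Hypothetical tracker of the highest LSN already played back."""

    def __init__(self) -> None:
        self._applied_lsn = 0
        self._cond = threading.Condition()

    def advance(self, lsn: int) -> None:
        # Called by the (assumed) log playback thread after applying a log.
        with self._cond:
            self._applied_lsn = max(self._applied_lsn, lsn)
            self._cond.notify_all()

    def wait_until_replayed(self, lsn: int, timeout: Optional[float] = None) -> bool:
        # Block until the log with the given LSN has been played back.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._applied_lsn >= lsn, timeout)


progress = ReplayProgress()
worker = threading.Timer(0.05, progress.advance, args=(1000,))
worker.start()
if progress.wait_until_replayed(1000, timeout=2.0):
    print("playback of the first LSN finished; the first data page can be sent")
worker.join()
```

A condition variable avoids busy-waiting; a timeout lets the caller fall back to an error path if playback stalls.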

28. A data reading apparatus, comprising:

a processing module for determining that the identifier of the data page to be read is a first identifier, the data page to be read being a child data page of a first parent data page in the compute node;

a transceiver module configured to send a data page reading request to a storage node, where the data page reading request includes the first identifier, and the first identifier is used for the storage node to determine the occurrence time of a page splitting process corresponding to a first data page identified as the first identifier and to send a target data page to the data reading apparatus according to the occurrence time;

the transceiver module is further configured to receive a target data page from the storage node, where the target data page is a second data page or the first data page.

29. The apparatus of claim 28, wherein prior to the transceiver module receiving a target data page from the storage node:

the transceiver module is further configured to receive first information from the storage node, where the first information includes: a first log sequence number (LSN) of the log that triggers the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page;

the processing module is further used for determining a target data page according to the first information;

the transceiver module is further configured to send a request for reading the target data page to the storage node.

30. The apparatus of claim 29, wherein the processing module is specifically configured to:

and determining the target data page according to the second parent data page.

31. The apparatus of claim 29, wherein the processing module is specifically configured to:

32. The apparatus according to claim 31, wherein the data reading apparatus stores therein full splitting information, the full splitting information including an identifier of a split data page corresponding to each page splitting process and an LSN of a log that triggers each page splitting process;

the processing module is specifically configured to: according to the first information, determine that a second identifier exists among the identifiers of the split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.

33. The apparatus of claim 31, wherein the data reading apparatus does not store therein full split information, and a management node stores therein the full split information; the full splitting information comprises the identifier of a splitting data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process;

the processing module is specifically configured to:

according to the indication information, control the transceiver module to send a query request to the management node, wherein the query request comprises the first LSN and a second identifier, the second identifier is the identifier of the first parent data page, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node;

determining that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module from the management node, wherein the query result indicates that the first parent data page corresponds to the page splitting process in the storage node.

34. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9 or the method of any one of claims 10-15 or the method of any one of claims 16-18.

35. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9 or the method of any one of claims 10-15 or the method of any one of claims 16-18.