CN111290714B - Data reading method and device

Info

Publication number: CN111290714B
Application number: CN202010081830.7A
Authority: CN
Inventors: 周力
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2023-09-05
Anticipated expiration: 2040-02-06
Also published as: CN111290714A

Abstract

The embodiments of the application provide a data reading method and device, and relate to the technical field of distributed storage. The specific implementation scheme is as follows: a slave computing node determines that the identifier of a data page to be read is a first identifier, the data page to be read being a child data page of a first parent data page in the slave computing node, and sends a data page read request to a storage node, the data page read request including the first identifier; the storage node determines that a first data page whose identifier in the storage node is the first identifier corresponds to a page splitting process, and sends a target data page to the slave computing node according to the occurrence time of the page splitting process, wherein the target data page is the first data page or a second data page. The embodiments of the application enable the parent data page and the child data page in the slave computing node to match, thereby ensuring that the slave computing node reads correct data from the storage node.

Description

Data reading method and device

Technical Field

Embodiments of the present application relate to computer technology, and in particular, to a distributed storage technology.

Background

With the support of cloud computing technologies and services, business scale is expanding rapidly, which places higher requirements on the database service, a core infrastructure service built in the cloud. As a result, a new generation of cloud-native database architecture has emerged, which greatly improves the service capability of cloud databases.

The cloud-native database architecture comprises a master computing node, a slave computing node and a storage node, wherein the master computing node is responsible for reading and writing data, and the slave computing node can only read data. Data is no longer written directly to the storage node; instead, the master computing node generates a redo log and transmits it to the storage node, which replays the redo log to generate the data. The redo log is also transmitted to the slave computing node, which replays it as needed to keep the data in its cache up to date.

Since the storage node and the slave computing node each replay the redo log independently to generate data, the update progress of the same data page may be inconsistent between them during a certain period. For example, when the slave computing node needs to read a child data page of a certain parent data page, the inconsistent update progress may lead to the following situation: a page split related to the child data page has occurred only in the storage node or only in the slave computing node, so that the child data page read by the slave computing node from the storage node may not match the parent data page in the slave computing node, that is, the slave computing node may read erroneous data.
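The divergence described above can be illustrated with a minimal sketch. It assumes a hypothetical `RedoRecord` that simply overwrites a page's content; the record layout, the `Page` class and the `replay` helper are illustrative assumptions, not the log format of any real database.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RedoRecord:
    lsn: int        # log sequence number of this record
    page_id: int    # identifier of the data page the record modifies
    payload: str    # hypothetical stand-in for the logged change


@dataclass
class Page:
    lsn: int = 0    # LSN of the last record applied to this page
    content: str = ""


def replay(pages: Dict[int, Page], log: List[RedoRecord]) -> None:
    """Apply redo records in LSN order, skipping records already applied."""
    for rec in sorted(log, key=lambda r: r.lsn):
        page = pages.setdefault(rec.page_id, Page())
        if rec.lsn > page.lsn:
            page.content = rec.payload
            page.lsn = rec.lsn


log = [RedoRecord(10, 1, "a"), RedoRecord(20, 1, "b"), RedoRecord(30, 2, "c")]

storage_pages: Dict[int, Page] = {}
slave_pages: Dict[int, Page] = {}
replay(storage_pages, log)       # the storage node has replayed the whole log
replay(slave_pages, log[:1])     # the slave node has replayed only the first record

# The same page now carries different LSNs on the two nodes.
print(storage_pages[1].lsn, slave_pages[1].lsn)
```

Because each node tracks a per-page LSN, comparing LSNs is enough to tell which node is ahead for a given page, which is the property the later aspects rely on.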

Disclosure of Invention

The embodiment of the application provides a data reading method and a data reading device, which can enable a parent data page and a child data page in a slave computing node to be matched, namely, ensure that the slave computing node reads correct data from a storage node.

In a first aspect, an embodiment of the present application provides a data reading method, applied to a storage node, where the method includes: receiving a data page read request from a computing node, the data page read request including a first identifier; determining that a first data page whose identifier in the storage node is the first identifier corresponds to a page splitting process; and sending a target data page to the computing node according to the occurrence time of the page splitting process, wherein the target data page is the first data page or a second data page. Optionally, the first identifier is an identifier of a data page to be read determined by the computing node, and the data page to be read is a child data page of a first parent data page in the computing node.

In this aspect, the computing node is a slave computing node. In this scheme, when the slave computing node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined according to the splitting information. If it does, the correct data page to read is determined according to the occurrence time of the page splitting process, so that the parent data page and the child data page in the slave computing node match each other, that is, the slave computing node can read correct data.

In one possible implementation, the storage node stores partial split information; the partial splitting information comprises an identifier of a splitting data page corresponding to a partial page splitting process and a log sequence number LSN of a log triggering the page splitting process, and the splitting data page corresponding to the partial page splitting process is stored in the storage node; determining that a first page of data identified in the storage node as the first identification corresponds to a page splitting process, comprising: determining that the first identifier exists in identifiers of split data pages included in the partial split information.
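As a rough illustration of the lookup described above, the partial split information can be modelled as a mapping from the identifier of a split data page to the LSN of the log that triggered the split. The class and method names are hypothetical, and the sketch keeps only one split per page for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SplitInfo:
    """Hypothetical container: split data page identifier -> LSN of the triggering log."""
    trigger_lsn_by_page: Dict[int, int] = field(default_factory=dict)

    def record_split(self, page_id: int, trigger_lsn: int) -> None:
        # Assumption: only the most recent split of a page is kept.
        self.trigger_lsn_by_page[page_id] = trigger_lsn

    def split_lsn(self, page_id: int) -> Optional[int]:
        """Return the LSN of the log that triggered the split, or None if no split is recorded."""
        return self.trigger_lsn_by_page.get(page_id)


partial_info = SplitInfo()
partial_info.record_split(page_id=7, trigger_lsn=120)

first_identifier = 7
corresponds_to_split = partial_info.split_lsn(first_identifier) is not None
```

A storage node would hold entries only for the split data pages it stores, which is why the text calls this information partial.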

This scheme gives a specific implementation of determining whether a data page corresponds to a page splitting process, namely through the splitting information, which makes the determination simple and easy to implement.

In one possible implementation, the data page read request further includes a third LSN of the first parent data page; before sending the target data page to the computing node according to the occurrence time of the page splitting process, the method further comprises: determining the occurrence time of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node, and the third LSN.

This scheme provides a specific implementation for determining the occurrence time of the page splitting process, namely through the LSNs of the corresponding logs. Because the data are obtained through log replay, the occurrence time of the page splitting process can be judged accurately by comparing the LSNs of the corresponding logs.

In a possible implementation manner, the determining the occurrence time of the page splitting process includes: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the storage node is slower than the data playback speed of the compute node and that the page splitting process is to occur in the storage node.
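The three comparisons above translate directly into code. The following is a minimal sketch; the enum and function names are hypothetical, and returning `None` for LSN combinations that the text does not enumerate is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "split occurred after the first parent data page was obtained"
    BEFORE_PARENT_OBTAINED = "split occurred before the first parent data page was obtained"
    STORAGE_REPLAY_BEHIND = "storage node replays more slowly; split is yet to occur there"


def classify_split_timing(first_lsn: int, second_lsn: int, third_lsn: int) -> Optional[SplitTiming]:
    """first_lsn: LSN of the log triggering the split.
    second_lsn: LSN of the first data page in the storage node.
    third_lsn: LSN of the first parent data page in the computing node."""
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.STORAGE_REPLAY_BEHIND
    return None  # combination not covered by the three stated conditions
```

For example, with a split triggered at LSN 100, a storage-side page at LSN 150 and a parent page at LSN 80, the first branch applies; with the storage-side page at LSN 90 and the parent page at LSN 100, the third branch applies.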

In one possible implementation, the page splitting process occurs after the storage node obtains the first parent data page; before sending the target data page to the computing node, the method further comprises: sending first information to the computing node, wherein the first information is used for determining a target data page by the computing node; a read request of the target data page from the compute node is received.

This scheme gives a specific implementation of how the computing node is triggered to determine the correct data page to read when the page splitting process corresponding to the first data page occurs after the storage node obtains the first parent data page.

In one possible implementation, the first information includes: a first LSN triggering a log of the page splitting process and indication information indicating that the page splitting process occurred after the storage node obtained the first parent data page.

In one possible implementation, the page splitting process occurs before the storage node obtains the first parent data page, the target data page being the first data page in the storage node.

In one possible implementation, the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node, the target data page being the first data page in the storage node; before sending the first data page identified as the first identification to the computing node, further comprising: and determining that the log with the LSN as the first LSN is played back by the storage node.
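One way to realise the check that the log with the first LSN has been replayed is to track replay progress and block the read until it reaches that LSN. This is a sketch under that assumption; the text only requires determining that the log has been replayed, and the `ReplayProgress` class is hypothetical.

```python
import threading
from typing import Optional


class ReplayProgress:
    """Hypothetical tracker of the highest LSN the storage node has replayed."""

    def __init__(self) -> None:
        self._applied_lsn = 0
        self._cond = threading.Condition()

    def advance(self, lsn: int) -> None:
        """Called by the replay thread after applying the log with this LSN."""
        with self._cond:
            if lsn > self._applied_lsn:
                self._applied_lsn = lsn
                self._cond.notify_all()

    def wait_until_applied(self, lsn: int, timeout: Optional[float] = None) -> bool:
        """Block until the log with this LSN has been replayed; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._applied_lsn >= lsn, timeout=timeout)


progress = ReplayProgress()
first_lsn = 120

# Simulate the replay thread reaching the first LSN shortly after the read arrives.
threading.Timer(0.05, progress.advance, args=(first_lsn,)).start()
ready = progress.wait_until_applied(first_lsn, timeout=2.0)
```

Once `ready` is true, the storage node can return the first data page, which by then reflects the split.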

In a second aspect, an embodiment of the present application provides a data reading method, applied to a computing node, where the method includes: determining that the identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node; sending a data page read request to a storage node, wherein the data page read request comprises the first identifier, and the first identifier is used by the storage node to determine the occurrence time of a page splitting process corresponding to a first data page whose identifier is the first identifier and to send a target data page to the computing node according to the occurrence time; and receiving the target data page from the storage node, the target data page being the first data page or a second data page of the storage node.

In this aspect, the computing node is a slave computing node. In this scheme, when the slave computing node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined according to the splitting information. If it does, the correct data page to read is determined according to the occurrence time of the page splitting process, so that the parent data page and the child data page in the slave computing node match each other, that is, the slave computing node can read correct data.

In one possible implementation, before receiving the target data page from the storage node, the method further includes: receiving first information from the storage node, the first information comprising a first log sequence number (LSN) of the log triggering the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page; determining a target data page according to the first information; and sending a request for reading the target data page to the storage node.

This scheme gives an implementation of how to trigger the computing node to determine the target data page.

In a possible implementation manner, the determining the target data page according to the first information includes: determining, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from the storage node, the second parent data page being the first parent data page as updated after the page splitting process occurs; and determining the target data page according to the second parent data page.

According to this scheme, the slave computing node obtains the second parent data page, that is, the first parent data page as updated by the page splitting process corresponding to the first data page. The correct data page to be read, namely the target data page, can then be determined according to the second parent data page, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node can read correct data.

In a possible implementation manner, the determining the target data page according to the first information includes: determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; reading from the storage node each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page, or the target parent data page is the root data page, the second parent data page is a data page obtained after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determining the target data page according to each data page from the second parent data page to the target parent data page. It will be appreciated that the identifier of the second parent data page is also the second identifier.

According to this scheme, the slave computing node obtains the updated data pages from the second parent data page to the target parent data page that result from the page splitting process corresponding to the first data page. The correct data page to be read, namely the target data page, can then be determined according to these data pages, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node can read correct data.
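The upward walk from the second parent data page to the target parent data page might look like the sketch below. It assumes the computing node already knows the ancestor chain as a bottom-up list of page identifiers and has a predicate telling whether a page corresponds to a page splitting process; both inputs and the function name are hypothetical.

```python
from typing import Callable, List


def pages_to_reread(ancestors_bottom_up: List[int],
                    has_split: Callable[[int], bool]) -> List[int]:
    """Collect the pages to re-read from the storage node.

    ancestors_bottom_up: page identifiers from the second parent data page
    up to the root data page. The walk stops at the first page that does not
    correspond to a page splitting process (the target parent data page), or
    at the root if every page on the path was split.
    """
    reread: List[int] = []
    for page_id in ancestors_bottom_up:
        reread.append(page_id)
        if not has_split(page_id):
            break  # target parent data page reached
    return reread


split_pages = {12, 5}        # hypothetical: pages 12 and 5 were split
path = [12, 5, 2, 1]         # second parent page 12, ..., root page 1
result = pages_to_reread(path, lambda pid: pid in split_pages)
```

Here `result` is `[12, 5, 2]`: page 2 is the target parent data page because it is the first ancestor that was not split. After re-reading these pages, the computing node can descend again to locate the target data page.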

In one possible implementation manner, the computing node stores full split information, wherein the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process; the determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: determining, according to the indication information, that a second identifier exists among the identifiers of split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.

This scheme provides a specific implementation of determining that the first parent data page corresponds to a page splitting process in the storage node, with low network overhead.
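The check against the full split information needs both the page identifier and the triggering LSN to match. A minimal sketch, assuming the full split information is held as a set of (page identifier, LSN) pairs; this representation and the function name are assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: one (split page identifier, triggering LSN) pair per split.
FullSplitInfo = Set[Tuple[int, int]]


def parent_split_matches(full_info: FullSplitInfo,
                         second_identifier: int,
                         first_lsn: int) -> bool:
    """True if the first parent data page (second_identifier) was split by the
    same log (first_lsn) that triggered the split of the first data page."""
    return (second_identifier, first_lsn) in full_info


full_info: FullSplitInfo = {(7, 120), (3, 120), (9, 300)}

# Page 3 was split by the log with LSN 120, so the split propagated to the parent.
propagated = parent_split_matches(full_info, second_identifier=3, first_lsn=120)
```

Because the lookup is local to the computing node, no extra network round trip is needed, which matches the low network overhead noted in the text.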

In one possible implementation, the computing node does not store the full split information, and a management node stores the full split information; the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process; the determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: sending a query request to the management node according to the indication information, wherein the query request comprises the first LSN and the second identifier, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node; and receiving a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process in the storage node.

This scheme provides another specific implementation of determining that the first parent data page corresponds to a page splitting process in the storage node, which saves storage space in the slave computing node.

In a third aspect, an embodiment of the present application provides a data reading method, applied to a computing node, where the method includes: determining that the identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node; determining that a first data page whose identifier in a storage node is the first identifier corresponds to a page splitting process; determining a target data page according to the occurrence time of the page splitting process; and reading the target data page from the storage node.

In a possible implementation manner, the computing node stores full-quantity splitting information, wherein the full-quantity splitting information comprises an identifier of a splitting data page corresponding to each page splitting process and an LSN of a log triggering each page splitting process; determining that a first page of data in a storage node identified as the first identification corresponds to a page splitting process, comprising: determining that the first identifier exists in identifiers of split data pages included in the full amount of split information.

In this scheme, the splitting information is used to determine that the first data page whose identifier in the storage node is the first identifier corresponds to a page splitting process, which is simple and easy to implement and has low network overhead.

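In a possible implementation manner, the computing node does not store full split information, and a management node stores the full split information, wherein the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process; determining that a first data page whose identifier in the storage node is the first identifier corresponds to a page splitting process comprises: sending a query request to the management node, wherein the query request comprises the first identifier and instructs the management node to determine, according to the full split information, whether the first data page corresponds to a page splitting process in the storage node; and receiving a query result from the management node, the query result indicating that the first data page corresponds to a page splitting process in the storage node.

In this scheme, the splitting information is used to determine that the first data page whose identifier in the storage node is the first identifier corresponds to a page splitting process, which is simple and easy to implement.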

In one possible implementation manner, before the determining the target data page according to the occurrence time of the page splitting process, the method further includes: determining the occurrence time of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node, and the third LSN of the first parent data page in the computing node.

In a possible implementation manner, the determining the occurrence time of the page splitting process includes: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN, which is less than or equal to the third LSN, then determining that the data playback speed of the storage node is slower than the data playback speed of the compute node and that the page splitting process is to occur in the storage node.

In one possible implementation, if the page splitting process occurs after the storage node obtains the first parent data page, the determining the target data page includes: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from the storage node, the second parent data page being the first parent data page as updated after the page splitting process occurs; and determining the target data page according to the second parent data page.

According to this scheme, the slave computing node obtains the second parent data page, that is, the updated data page resulting from the page splitting process corresponding to the first data page. The correct data page to be read, namely the target data page, can then be determined according to the second parent data page, so that the parent data page and the child data page in the slave computing node match each other and the slave computing node can read correct data.

In one possible implementation, if the page splitting process occurs after the storage node obtains the first parent data page, determining the target data page according to an occurrence timing of the page splitting process includes: determining that the first parent data page corresponds to a page splitting process in the storage node; reading each data page from a storage node from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page or the target parent data page is the root data page, the second parent data page is a data page after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determining the target data page according to each data page from the second parent data page to the target parent data page.

In one possible implementation, if the page splitting process occurs before the storage node obtains the first parent data page, the determining the target data page includes: determining the first data page in the storage node as a target data page.

In one possible implementation, if the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node, the determining the target data page includes: determining that the log of the first LSN is replayed by the storage node; determining the first data page in the storage node as a target data page.

In a fourth aspect, an embodiment of the present application provides a data reading apparatus, including: a transceiver module configured to receive a data page read request from a computing node, the data page read request including a first identifier; and a processing module configured to determine that a first data page whose identifier in the data reading device is the first identifier corresponds to a page splitting process; the transceiver module being further configured to send a target data page to the computing node according to the occurrence time of the page splitting process, wherein the target data page is the first data page or a second data page.

In one possible embodiment, the data reading device has stored therein partial split information; the partial splitting information comprises an identification of a splitting data page corresponding to a partial page splitting process and a log sequence number LSN of a log triggering the page splitting process, and the splitting data page corresponding to the partial page splitting process is stored in the data reading device; the processing module is specifically configured to determine that the first identifier exists in identifiers of split data pages included in the partial split information.

In one possible implementation manner, the first identifier is an identifier of a data page to be read determined by the computing node, and the data page to be read is a child data page of a first parent data page in the computing node.

In one possible implementation, the data page read request includes the first identifier and a third LSN of the first parent data page in the computing node; before the transceiver module sends the target data page to the computing node according to the occurrence time of the page splitting process, the processing module is further configured to: determine the occurrence time of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the data reading device, and the third LSN.

In a possible implementation manner, the processing module is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the data reading device obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the data reading device obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the data reading device is slower than the data playback speed of the computing node and that the page splitting process is to occur in the data reading device.

In one possible implementation, the page splitting process occurs after the data reading device obtains the first parent data page; before the transceiver module sends the target data page to the computing node, the transceiver module is further configured to: sending first information to the computing node, wherein the first information is used for determining the target data page to be read by the computing node; a read request of the target data page from the compute node is received.

In one possible implementation, the first information includes: a first LSN of the log triggering the page splitting process, and indication information indicating that the page splitting process occurs after the data reading device obtains the first parent data page.

In one possible implementation, the page splitting process occurs before the first parent data page is obtained by the data reading device, the target data page being the first data page in the data reading device.

In one possible implementation, the data playback speed of the data reading device is slower than the data playback speed of the computing node and the page splitting process is to occur in the data reading device, the target data page being the first data page in the data reading device. Before the transceiver module sends the first data page to the computing node, the processing module is further configured to: determine that the data reading device has finished replaying the log whose LSN is the first LSN.

In a fifth aspect, an embodiment of the present application provides a data reading apparatus, including: a processing module configured to determine that the identifier of a data page to be read is a first identifier, wherein the data page to be read is a child data page of a first parent data page in the data reading device; and a transceiver module configured to send a data page read request to a storage node, wherein the data page read request comprises the first identifier, and the first identifier is used by the storage node to determine the occurrence time of a page splitting process corresponding to a first data page whose identifier is the first identifier and to send a target data page to the data reading device according to the occurrence time; the transceiver module being further configured to receive the target data page from the storage node, where the target data page is the first data page or a second data page.

In one possible implementation, before the transceiver module receives the target data page from the storage node: the transceiver module is further configured to receive first information from the storage node, where the first information includes a first log sequence number (LSN) of the log triggering the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page; the processing module is further configured to determine the target data page according to the first information; and the transceiver module is further configured to send a request for reading the target data page to the storage node.

In a possible implementation manner, the processing module is specifically configured to: determine, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node; read a second parent data page from the storage node, the second parent data page being the first parent data page as updated after the page splitting process occurs; and determine the target data page according to the second parent data page.

In a possible implementation manner, the processing module is specifically configured to: determine, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; read from the storage node each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page, or the target parent data page is the root data page, the second parent data page is a data page obtained after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determine the target data page according to each data page from the second parent data page to the target parent data page.

In one possible implementation manner, the data reading device stores full split information, wherein the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process; the processing module is specifically configured to: determine, according to the first information, that a second identifier exists among the identifiers of split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.

In a possible implementation manner, the data reading device does not store full split information, and a management node stores the full split information; the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process; the processing module is specifically configured to: control, according to the indication information, the transceiver module to send a query request to the management node, wherein the query request comprises the first LSN and the second identifier, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node; and determine that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module from the management node, wherein the query result indicates that the first parent data page corresponds to a page splitting process in the storage node.

In a sixth aspect, an embodiment of the present application provides a data reading apparatus, including: the processing module is used for determining the identification of the data page to be read as a first identification, wherein the data page to be read is a child data page of a first parent data page in the data reading device; determining that a first data page identified as the first identification in a storage node corresponds to a page splitting process; determining a target data page according to the occurrence time of the page splitting process; the target data page is read from the storage node.

In one possible implementation, the data reading device stores full split information, which comprises the identifier of the split data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process; the processing module is specifically configured to: determine that the first identifier exists among the identifiers of the split data pages included in the full split information.

In one possible implementation, the device further comprises a transceiver module; the data reading device does not store full split information, and the management node stores the full split information, which comprises the identifier of the split data page corresponding to each page splitting process and the LSN of the log triggering each page splitting process; the processing module is specifically configured to: control the transceiver module to send a query request to the management node, wherein the query request comprises the first identifier, and the query request instructs the management node to determine, according to the full split information, whether the first data page corresponds to a page splitting process in the storage node; and determine, according to a query result received by the transceiver module from the management node, that the first data page corresponds to a page splitting process in the storage node, wherein the query result indicates that the first data page corresponds to a page splitting process in the storage node.

In one possible implementation, before the processing module determines the target data page according to the occurrence timing of the page splitting process, the processing module is further configured to: determining the occurrence time of the page splitting process according to a first LSN of a log triggering the page splitting process, a second LSN of the first data page in a storage node and a third LSN of a first parent data page in the data reading device.

In one possible implementation, the processing module is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determine that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determine that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determine that the data playback speed of the storage node is slower than the data playback speed of the data reading device and that the page splitting process is yet to occur in the storage node.
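The three LSN comparisons above can be sketched as a small classifier. This is only an illustrative sketch: the names `SplitTiming` and `classify_split_timing` are hypothetical, and the `UNDETERMINED` fallback for LSN combinations not covered by the three stated conditions is an assumption, not something the text specifies.

```python
from enum import Enum


class SplitTiming(Enum):
    """Possible outcomes of the LSN comparison (names are illustrative)."""
    AFTER_PARENT_OBTAINED = "after"    # split occurred after the first parent page was obtained
    BEFORE_PARENT_OBTAINED = "before"  # split occurred before the first parent page was obtained
    PENDING_ON_STORAGE = "pending"     # storage node replays slower; split not yet replayed there
    UNDETERMINED = "undetermined"      # assumption: combination not covered by the three conditions


def classify_split_timing(first_lsn: int, second_lsn: int, third_lsn: int) -> SplitTiming:
    """Classify the occurrence time of a page splitting process.

    first_lsn:  LSN of the log that triggered the page splitting process
    second_lsn: LSN of the first data page in the storage node
    third_lsn:  LSN of the first parent data page in the data reading device
    """
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return SplitTiming.UNDETERMINED
```

For instance, with a triggering LSN of 24, a storage-side page LSN of 30 and a parent page LSN of 20, the first condition holds and the split is classified as having occurred after the parent page was obtained.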

In one possible implementation, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module is specifically configured to: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from a storage node, the second parent data page being the first parent data page updated after the page splitting occurs; and determining a target data page according to the second father data page.

In one possible implementation, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module is specifically configured to: determining that the first parent data page corresponds to a page splitting process in the storage node; reading each data page from a storage node from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page or the target parent data page is the root data page, the second parent data page is a data page after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determining the target data page according to each data page from the second parent data page to the target parent data page.

In one possible implementation, if the page splitting process occurs before the storage node obtains the first parent data page, the processing module is specifically configured to: determining the first data page in the storage node as a target data page.

In one possible implementation, if the data playback speed of the storage node is slower than the data playback speed of the data reading device and the page splitting process is to occur in the storage node, the processing module is specifically configured to: determining that the log of the first LSN is replayed by the storage node; determining the first data page in the storage node as a target data page.

In a seventh aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect and any one of the possible designs of the first aspect or to perform the method of the second aspect and any one of the possible designs of the second aspect or to perform the method of the third aspect and any one of the possible designs of the third aspect.

In an eighth aspect, the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect and any one of the possible designs of the first aspect or to perform the method of the second aspect and any one of the possible designs of the second aspect or to perform the method of the third aspect and any one of the possible designs of the third aspect.

One embodiment of the above application has the following advantages or benefits: the parent data page and the child data page in the slave computing node can be matched, that is, the slave computing node is ensured to read correct data from the storage node. This is because it is determined whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process (the first identifier being the identifier of the data page to be read as initially determined by the slave computing node); if so, the correct data page to be read is determined according to the occurrence time of the page splitting process and is read from the storage node. This solves the technical problem in the prior art that, because the slave computing node and the storage node replay logs into data at inconsistent speeds, the parent data page and the child data page in the slave computing node do not match when the slave computing node reads data from the storage node, and achieves the technical effect of ensuring that the parent data page and the child data page in the slave computing node match, that is, that the slave computing node reads correct data from the storage node.

Other effects of the above alternative will be described below in connection with specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:

FIG. 1 is a schematic diagram of a second order B-tree according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a multi-level B-tree according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a page splitting process of a B-tree according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a B+ tree structure according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a page splitting process of a B+ tree according to an embodiment of the present application;

FIG. 6 is a system architecture diagram provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a data reading scenario according to an embodiment of the present application;

FIG. 8 is a flowchart illustrating a data acquisition method according to an embodiment of the present application;

FIG. 9 is a second flowchart of a data acquisition method according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data acquisition device according to an embodiment of the present application;

FIG. 11 is a block diagram of an electronic device for implementing a data acquisition method of an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of" and similar expressions mean any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural. The terms "first," "second," and the like herein are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order.

First, the concepts involved in the present application are described.

In some data storage scenarios, such as a distributed database, data is stored in pages, each of which may be 4 KB (or 16 KB, or any other size). Data is added to a data page in turn until the page is full. In databases where the data is organized in B-trees or variants of B-trees (e.g., B+ trees, B* trees), page splitting may occur when a data page is full; that is, one data page is split into two data pages, and new data is inserted into the corresponding data page in order.

A B-tree or a variant of a B-tree includes non-leaf nodes and leaf nodes. Every non-leaf node has children; in this embodiment the children of a non-leaf node are referred to as its child nodes, and the non-leaf node is the parent node of its child nodes.

The B-tree and the B+ tree are briefly described below.

1. Second-order B-tree, i.e., binary search tree: (1) every non-leaf node has at most two child nodes (Left and Right); (2) each node stores one keyword; (3) the left pointer of a non-leaf node points to the child node whose keyword is smaller than its own, and the right pointer points to the child node whose keyword is larger than its own. FIG. 1 is a schematic diagram of a B-tree structure according to an embodiment of the present application. Referring to FIG. 1, each node corresponds to a data page, and each data page is drawn as a box; the nodes at the bottom are leaf nodes, the remaining nodes are non-leaf nodes, and the topmost non-leaf node is the root node. For example, node 101 is the parent node of node 102 and node 103, and node 102 and node 103 are child nodes of node 101. The data page corresponding to node 101 is the parent data page of the data pages corresponding to node 102 and node 103, and the data pages corresponding to node 102 and node 103 are child data pages of the data page corresponding to node 101.

A search of a second-order B-tree starts from the root node. If the queried keyword equals the keyword of the current node, the search hits; otherwise, if the queried keyword is smaller than the keyword of the node, the search enters the left child node, and if it is larger, the search enters the right child node. If the corresponding left or right child pointer is empty, the search reports that the keyword was not found.
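The search just described can be sketched as follows. This is a minimal illustration only; the `Node` class and the `search` function are hypothetical names introduced here, and each node is assumed to hold a single integer keyword.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node (data page) of a second-order B-tree, holding a single keyword."""
    key: int
    left: Optional["Node"] = None   # child whose keyword is smaller
    right: Optional["Node"] = None  # child whose keyword is larger


def search(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the node holding `key`, or None if a null child pointer is reached."""
    node = root
    while node is not None:
        if key == node.key:
            return node  # hit
        # Descend left for smaller keywords, right for larger ones.
        node = node.left if key < node.key else node.right
    return None  # not found
```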

2. Multi-level B-tree: a multi-way search tree (not binary). For an M-level B-tree: (1) any non-leaf node has at most M child nodes, where M > 2; (2) the number of child nodes of the root node is in [2, M]; (3) the number of child nodes of a non-leaf node other than the root node is in [M/2, M]; (4) each node stores at least M/2-1 (rounded up) and at most M-1 keywords (at least 2 keywords); (5) the number of keywords of a non-leaf node equals the number of its pointers to child nodes minus 1; (6) the keywords of a non-leaf node are K_1, K_2, …, K_{M-1}, with K_i < K_{i+1}, that is, each node in an M-level B-tree has at most M-1 keywords; (7) the pointers of a non-leaf node are P[1], P[2], …, P[M], where P[1] points to the subtree whose keywords are less than K_1, P[M] points to the subtree whose keywords are greater than K_{M-1}, and every other P[i] points to the subtree whose keywords belong to (K_{i-1}, K_i); (8) all leaf nodes are located at the same layer. FIG. 2 is a schematic structural diagram of a multi-level B-tree with M = 3. Referring to FIG. 2, each node corresponds to a data page, and each data page is drawn as a box; the nodes at the bottom are leaf nodes, the remaining nodes are non-leaf nodes, and the topmost non-leaf node is the root node. For example, node 201 is the parent node of node 202, node 203, and node 204, and node 202, node 203, and node 204 are child nodes of node 201. The data page corresponding to node 201 is the parent data page of the data pages corresponding to node 202, node 203, and node 204, and the data pages corresponding to node 202, node 203, and node 204 are child data pages of the data page corresponding to node 201.

A search of a multi-level B-tree starts from the root node and performs a binary search on the (ordered) keyword sequence within the node. If the keyword is hit, the search ends; otherwise the search enters the child node covering the range to which the queried keyword belongs. This is repeated until the corresponding child pointer is null or the node is a leaf node.
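A minimal sketch of this multi-way search is given below, using the standard-library `bisect` module for the binary search inside each node. The `BTreeNode` class and the `btree_search` function are hypothetical names, and a leaf is assumed to be represented by an empty `children` list.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """One node (data page) of a multi-level B-tree."""
    keys: List[int]                                            # sorted keywords
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(root: BTreeNode, key: int) -> bool:
    """Return True if `key` is stored somewhere in the tree rooted at `root`."""
    node = root
    while True:
        # Binary search within the ordered keyword sequence of this node.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True   # hit in this node
        if not node.children:
            return False  # reached a leaf without a hit
        # children[i] covers the keyword range between keys[i-1] and keys[i].
        node = node.children[i]
```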

The page splitting process according to the embodiment of the present application will be described below by taking splitting of a 5-level B tree as an example.

Fig. 3 is a schematic diagram of a page splitting process of a B tree according to an embodiment of the present application.

As shown in diagram A of FIG. 3, 39 is inserted into the empty tree (that is, the data record whose key is 39 is inserted; in the embodiments of the present application, "inserting Y" means inserting the data record whose key is Y). At this point the root node includes one key and is also a leaf node. Referring to diagram B of FIG. 3, 22, 97 and 41 are then inserted, and the root node now includes 4 keys (i.e., 4 data records, one key per data record). Referring to diagram C of FIG. 3, 53 is then inserted; the number of keys now exceeds 4, the maximum number a node may hold, so page splitting is performed centered on 41. The result of the split is shown in diagram D of FIG. 3, where the identifier of the data page containing the data records with keys 22 and 39 is the same as the identifier of the data page that, before 53 was inserted, contained the data records with keys 39, 22, 41 and 97.

Inserting 13, 21 and 40 in sequence also causes a split, as shown in diagram E of FIG. 3. Then 30, 27, 33, 36, 35, 34, 24 and 29 are inserted in sequence, as shown in diagram F of FIG. 3. When 26 is then inserted, as shown in diagram G of FIG. 3, the node in which it is placed includes more than 4 keys, so page splitting is performed centered on 27 and 27 is carried up to the parent node. The result of the split is shown in diagram H of FIG. 3, where the identifier of the data page containing keys 24 and 26 is the same as the identifier of the data page that contained the data records with keys 24, 26, 29 and 30 prior to the insertion of 26. The carry causes the current root node to require page splitting as well; the result is shown in diagram I of FIG. 3, where the identifier of the data page containing the data records with keys 22 and 27 is the same as the identifier of the data page that contained the data records with keys 22, 33, 36 and 41 prior to the insertion of 27.
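The split rule used in these examples (split around the middle key, carry it to the parent, and let the lower half keep the identifier of the original data page) can be sketched as below. The `Page` class, the `split_page` function, and the page identifiers 7 and 8 are hypothetical; how a real system allocates the identifier of the new page is not described in the text and is simply passed in here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """A B-tree data page: an identifier plus the keys of its data records."""
    page_id: int
    keys: List[int]


def split_page(page: Page, new_page_id: int) -> Tuple[Page, int, Page]:
    """Split an over-full page around its middle key.

    Returns (left, middle_key, right). The left page reuses the identifier of
    the original page, the right page receives `new_page_id`, and the middle
    key is the one carried up to the parent node.
    """
    keys = sorted(page.keys)
    mid = len(keys) // 2
    left = Page(page.page_id, keys[:mid])   # keeps the original identifier
    right = Page(new_page_id, keys[mid + 1:])
    return left, keys[mid], right
```

Applied to the page of diagram C of FIG. 3 (keys 39, 22, 97, 41, 53), this yields a left page with keys 22 and 39 under the original identifier, the carried key 41, and a right page with keys 53 and 97.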

3. B+ tree: the B+ tree is a variant of the B-tree and is also a multi-way search tree. Its definition is essentially the same as that of the B-tree, with the following differences: (1) the number of child-node pointers of a non-leaf node is the same as the number of its keys; (2) the child-node pointer P[i] of a non-leaf node points to the child node whose keys belong to [K_i, K_{i+1}]; (3) a chain pointer is added to every leaf node; (4) all keys appear in leaf nodes. In addition, the B+ tree has the following characteristics: (1) all keys appear in the linked list of leaf nodes (a dense index), and the keys in the linked list are in sorted order; (2) a search can never end with a hit at a non-leaf node, because the non-leaf nodes act as an index (a sparse index) over the leaf nodes, and the leaf nodes act as the data layer that stores the data.

In this embodiment, although the non-leaf nodes of a B+ tree do not store data (all keys appear in the leaf nodes), for convenience of description each non-leaf node of the B+ tree is also considered to correspond to one data page; that is, every node corresponds to one data page.

FIG. 4 is a schematic structural diagram of a B+ tree according to an embodiment of the present application. Referring to FIG. 4, the nodes at the bottom are leaf nodes, the remaining nodes are non-leaf nodes, and the topmost non-leaf node is the root node. For example, node 401 is the parent node of node 402, node 403, and node 404, and node 402, node 403, and node 404 are child nodes of node 401. The data page corresponding to node 401 is the parent data page of the data pages corresponding to node 402, node 403, and node 404, and the data pages corresponding to node 402, node 403, and node 404 are child data pages of the data page corresponding to node 401.

In the B+ tree of FIG. 4, the number of keys of a node is the same as its number of child nodes, and the tree is a 4th-order B+ tree. In another approach, the number of keys of a B+ tree node is 1 less than its number of child nodes.

The page splitting process according to the embodiment of the present application will be described below by taking splitting of a 5-level B+ tree as an example.

Fig. 5 is a schematic diagram of a page splitting process of a b+ tree according to an embodiment of the present application.

As shown in diagram A of FIG. 5, 5 is inserted into the empty tree; the root node now includes one key and is also a leaf node. Referring to diagram B of FIG. 5, 8, 10 and 15 are then inserted, and the root node includes 4 keys. Referring to diagram C of FIG. 5, 16 is then inserted; the number of keys now exceeds 4, the maximum number a node may hold, so page splitting is performed. When a leaf node splits, the left node receives 2 data records and the right node receives 3 data records, and the middle key becomes a key in the index node. The result of the split is shown in diagram D of FIG. 5, where the identifier of the data page containing the data records with keys 5 and 8 is the same as the identifier of the data page that contained the data records with keys 5, 8, 10 and 15 before 16 was inserted.

As shown in diagram E of FIG. 5, 17 is then inserted. As shown in diagram F of FIG. 5, 18 is then inserted; the node in which 18 is placed includes more than 4 keys, so page splitting is required. After the split, the left node holds 2 data records and the right node holds 3 data records, and key 16 is carried up into the parent node (an index node). The result of the split is shown in diagram G of FIG. 5, where the identifier of the data page containing the data records with keys 10 and 15 is the same as the identifier of the data page that contained the data records with keys 10, 15, 16 and 17 before 18 was inserted.

After several more data records are inserted, the B+ tree shown in diagram H of FIG. 5 is obtained. Next, as shown in diagrams I and J of FIG. 5, 7 is inserted; the node in which 7 is placed includes more than 4 keys, so page splitting is required. After the split, the left node holds 2 data records and the right node holds 3 data records, and key 7 is carried up to the parent node; the identifier of the data page containing the data records with keys 5 and 6 is the same as the identifier of the data page that contained the data records with keys 5, 6, 8 and 9 before 7 was inserted. The carry causes the current root node to require splitting as well; the result is shown in diagram K of FIG. 5, where the identifier of the data page containing the data records with keys 7 and 10 is the same as the identifier of the data page that contained the data records with keys 10, 16, 18 and 20 before 7 was inserted.
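The leaf split used in these B+ tree examples differs from the B-tree case in that the carried key also stays in the right leaf. A minimal sketch, assuming a leaf that has grown to 5 keys, is shown below; `LeafPage`, `split_leaf`, and the page identifiers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LeafPage:
    """A B+ tree leaf data page: an identifier plus the keys of its data records."""
    page_id: int
    keys: List[int]


def split_leaf(leaf: LeafPage, new_page_id: int) -> Tuple[LeafPage, int, LeafPage]:
    """Split an over-full leaf into a left page and a right page.

    With 5 keys the left page receives 2 records and the right page 3. The
    left page reuses the original identifier, and the smallest key of the
    right page is the key placed in the parent (index) node.
    """
    keys = sorted(leaf.keys)
    mid = len(keys) // 2
    left = LeafPage(leaf.page_id, keys[:mid])  # keeps the original identifier
    right = LeafPage(new_page_id, keys[mid:])  # the carried key stays in the leaf
    return left, right.keys[0], right
```

With the keys 5, 8, 10, 15 and 16 of diagram C of FIG. 5, this gives a left page holding 5 and 8, a right page holding 10, 15 and 16, and 10 as the key in the index node; with 10, 15, 16, 17 and 18 it carries 16, matching diagram G.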

Having described the B-tree and the B+ tree, the system architecture according to the present application is described next.

FIG. 6 is a system architecture diagram provided in an embodiment of the present application. Referring to FIG. 6, the system architecture of this embodiment includes a computing layer and a distributed storage layer, and may further include a management layer. The computing layer may include a plurality of nodes: a master computing node, which reads and writes data, and slave computing nodes, which may read data from the distributed storage layer. The distributed storage layer includes a plurality of storage nodes for storing data. The management layer includes one or more management nodes.

In this system architecture, data is no longer written directly to the local storage of the master computing node, nor written directly to the storage node. Instead, the master computing node generates a redo log and transmits it to the storage node, and the storage node replays the redo log into data. At the same time, the redo log is transmitted to the slave computing node, which replays it into data when necessary so as to keep the data in its cache up to date. Each redo log can be identified by a unique value called the log sequence number (LSN), and the LSN of a log generated earlier is smaller than the LSN of a log generated later.

That is, the data in the data pages, organized as a B-tree or a variant of the B-tree, in both the slave computing node and the storage node is obtained by replaying redo logs; if data 1 is obtained by replaying redo log 1, data 1 corresponds to redo log 1. Among the redo logs corresponding to the data included in a data page, if a first redo log has the largest LSN, the LSN of the first redo log is the LSN of the data page. Correspondingly, among the redo logs corresponding to the data records that trigger a page splitting process, if a second redo log has the largest LSN, the second redo log is the redo log that triggers the page splitting process.
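The relationship between log replay and the LSN of a data page can be illustrated with a small sketch. Everything here is a simplifying assumption made for illustration: a redo record is reduced to "insert this key into this page", and the names `RedoRecord`, `DataPage`, and `replay` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RedoRecord:
    """A simplified redo log record: insert `key` into page `page_id`."""
    lsn: int
    page_id: str
    key: int


@dataclass
class DataPage:
    page_id: str
    keys: List[int] = field(default_factory=list)
    lsn: int = 0  # largest LSN among the redo records replayed into this page


def replay(pages: Dict[str, DataPage], log: List[RedoRecord], up_to_lsn: int) -> None:
    """Replay records with LSN <= up_to_lsn into `pages`, in LSN order."""
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn > up_to_lsn:
            break
        page = pages.setdefault(rec.page_id, DataPage(rec.page_id))
        if rec.lsn <= page.lsn:
            continue  # this record was already replayed into the page
        page.keys.append(rec.key)
        page.lsn = rec.lsn  # the page LSN tracks the newest replayed record
```

Two nodes that replay the same log to different LSNs end up holding different versions of the same page, which is the situation the following paragraphs address.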

As described above, the storage node and the slave computing node each replay the redo log independently to generate data, so the update progress of the same data page may be inconsistent between them during some period. For example, when the slave computing node needs to read a child data page of a certain parent data page, a page split related to that child data page may have occurred in only one of the storage node and the slave computing node, so that the child data page read from the storage node may not match the parent data page in the slave computing node; that is, the slave computing node may read erroneous data. For example, suppose the slave computing node and the storage node both organize data as a 5-level B-tree. As shown in (a) of FIG. 7, the slave computing node stores data page 601 but not the child data pages of data page 601; the child data pages are stored in the storage node, and the slave computing node needs to read a child data page of data page 601. If the B-tree in the storage node is as shown in (b) of FIG. 7, the child data pages of data page 601 stored in the storage node match data page 601 stored in the slave computing node. However, if the storage node replays faster than the slave computing node, then, as shown in (c) to (e) of FIG. 7, a page split occurs in the storage node due to the insertion of 26, and the B-tree in the storage node ends up as shown in (e) of FIG. 7. At this point the slave computing node cannot read from the storage node a child data page that matches parent data page 601 in the slave computing node, because the B-tree in the storage node no longer contains a data page like 601.

In order to solve the above technical problems, a data acquisition method in the present embodiment is proposed. The data acquisition method of the present application will be described below using specific examples.

First, a specific embodiment is used to describe the data reading method for the case in which split information is stored in the storage node and full split information is stored in the slave computing node and/or the management node.

Fig. 8 is a flowchart of a data acquisition method according to an embodiment of the present application. Referring to fig. 8, the method of the present embodiment includes:

Step S801, the slave computing node determines the identifier of the data page to be read as a first identifier, where the data page to be read is a child data page of a first parent data page in the slave computing node.

The first parent data page is stored in the slave computing node; when a child data page of the first parent data page needs to be read, the slave computing node determines the identifier of that child data page as the first identifier.

Step S802, the slave computing node sends a data page read request to the storage node, the data page read request including the first identifier.

The slave computing node determines whether its cache includes the child data page identified by the first identifier; if so, it obtains the child data page from the cache and applies it. Applying the child data page may be, for example, sending it to a terminal device to display the corresponding data.

If the slave computing node determines that the child data page is not in the cache, it sends a data page read request to the storage node, the request including the first identifier. In one approach, the data page read request also includes the LSN of the first parent data page in the slave computing node (for convenience of the following description, the LSN of the first parent data page is referred to as the third LSN).
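The cache check and the construction of the read request can be sketched as follows. This is an illustrative sketch only: `PageReadRequest`, `read_child_page`, and the `send_to_storage` callback are hypothetical names, a data page is reduced to a list of keys, and the transport to the storage node is abstracted as a plain function call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PageReadRequest:
    """Data page read request sent by the slave computing node."""
    first_identifier: str            # identifier of the child data page to read
    third_lsn: Optional[int] = None  # LSN of the first parent data page, if included


def read_child_page(
    cache: Dict[str, List[int]],
    first_identifier: str,
    parent_lsn: Optional[int],
    send_to_storage: Callable[[PageReadRequest], List[int]],
) -> List[int]:
    """Serve the child data page from the cache, else request it from storage."""
    if first_identifier in cache:
        return cache[first_identifier]  # cache hit: no request is sent
    request = PageReadRequest(first_identifier, parent_lsn)
    return send_to_storage(request)
```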

Step S803, the storage node determines that the first data page identified as the first identification in the storage node corresponds to a page splitting process.

The storage node receives a data page reading request from the slave computing node, determines whether a first data page identified as a first identification in the storage node corresponds to a page splitting process, and if not, the storage node sends the stored first data page identified as the first identification to the slave computing node; if yes, step S804 is performed.

In one approach, the storage node determining whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process comprises: the storage node determines, based on split information, whether the first data page identified by the first identifier in the storage node corresponds to a page splitting process. The split information comprises the identifier of the split data page corresponding to the page splitting process and the LSN of the redo log triggering the page splitting process; it may also include the identifier of the parent data page of the split data page.

Table 1 is at least part of the content included in the split information.

TABLE 1

Referring to Table 1, the page splitting process splits an original data page C into a new data page C and a new data page D; the original data page C and the new data page C have the same identifier, the parent data page of the new data page C and the new data page D is P, and the LSN of the log triggering the page splitting process is 24. For example, as shown in diagrams D and E of FIG. 3, after 13, 21 and 40 are inserted (assuming that 40 is the last of the 3 keys to be inserted), the data page containing the data records with keys 22, 39, 13 and 21 is split into the data page containing the data records with keys 13 and 21 and the data page containing the data records with keys 39 and 40, and the data page containing the data records with keys 22 and 41 is the parent data page of the two data pages after the split.

That is, for a one-time page splitting process, the splitting information may include an identification of the split data page to which the page splitting process corresponds and an LSN of the log that triggered the page splitting process, where the identification of the split data page to which the page splitting process corresponds may include identifications of two data pages (e.g., C and D above) that are involved in the page splitting process. In addition, the splitting information may also include an identification of a parent data page of the two data pages.

If a page of data is split and then a page splitting process occurs again, the splitting information further includes the identification of the split data page corresponding to the re-occurring page splitting process and the LSN of the log triggering the re-occurring page splitting process, and accordingly at least part of the content of the splitting information may be as shown in table 2:

TABLE 2

Referring to Table 2, the first page splitting process splits an original data page C into a new data page C and a new data page D; the original data page C and the new data page C have the same identifier, the parent data page of the new data page C and the new data page D is P, and the LSN of the log triggering this page splitting process is 24. The new data page C is in turn split into an updated data page C and a new data page D2; the updated data page C has the same identifier as the new data page C, the parent data page of the updated data page C and the new data page D2 is P, and the LSN of the log triggering this page splitting process is 24.

If a page splitting process causes the parent data page to be split, the splitting information also comprises the identification of the splitting data page corresponding to the page splitting process of the parent data page and the LSN of the log triggering the page splitting process of the parent data page. If each page split results in a split of the parent page at the same time, then the same loop is made to the root page, and at least some of the corresponding split information may be as shown in Table 3:

TABLE 3

Identification of split data pages	Identification of split data pages	Identification of parent data pages	LSN of log triggering page splitting
C	D	P1	24
P1	P2	PP1	24
PP1	PP2	PP3	24
……	……	……	24
PPPPP1	PPPPP2	Root	24

Thus, if the storage node determines that the first identification is present among the identifications of the split data pages included in the split information, it determines that the first data page corresponds to a page splitting process.

Further, the split information may be divided into partial split information and full split information.

The full split information includes: the identifications of the split data pages corresponding to all page splitting processes and the LSN of the redo log triggering each page splitting process; it may also include the identification of the parent data page of each split data page. The full split information may be stored in the slave computing node and/or the management node, and may also be stored in the storage node.

At least part of the content of the full split information may be as shown in table 4:

TABLE 4

Identification of split data pages	Identification of split data pages	Identification of parent data pages	LSN of log triggering page splitting
100	101	12	24
110	111	15	39
201	202	18	89
210	211	19	99

The partial split information may include the identifications of the split data pages corresponding to some of the page splitting processes and the LSN of the redo log triggering each of those page splitting processes; it may also include the identification of the parent data page of each split data page. The partial split information may be stored in a storage node. Accordingly, the split data pages corresponding to the page splitting processes included in the partial split information stored in the storage node are stored in that storage node, that is, the data pages indicated by the identifications of the split data pages included in the partial split information are stored in the storage node. In other words, the page splitting processes included in the partial split information stored in the storage node are page splitting processes of at least some of the data pages stored in the storage node.

The storage node may store the full split information and may also store the partial split information. When the partial splitting information is stored in the storage node, the storage space of the storage node can be saved, and the efficiency of determining whether the first data page corresponds to the page splitting process can be improved.
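As a rough illustration of how partial split information relates to the full split information, the sketch below filters the rows of table 4 by the pages held on a given storage node. The function name `partial_split_info` and the assignment of pages to storage nodes are assumptions made only for this example, as is the choice to keep a row when either of its split pages is held locally.

```python
from typing import List, Set, Tuple

# (split page id, split page id, parent page id, LSN), as in table 4.
Row = Tuple[int, int, int, int]

FULL_SPLIT_INFO: List[Row] = [
    (100, 101, 12, 24),
    (110, 111, 15, 39),
    (201, 202, 18, 89),
    (210, 211, 19, 99),
]


def partial_split_info(full: List[Row], stored_pages: Set[int]) -> List[Row]:
    """Keep only rows whose split pages are held by this storage node."""
    return [row for row in full
            if row[0] in stored_pages or row[1] in stored_pages]


# Hypothetical placement of pages on two storage nodes (an assumption).
node_a_pages = {100, 101, 110, 111}
node_b_pages = {201, 202, 210, 211}

print(partial_split_info(FULL_SPLIT_INFO, node_a_pages))
print(partial_split_info(FULL_SPLIT_INFO, node_b_pages))
```

Each storage node then searches a shorter list, which is the source of the storage and lookup savings mentioned above.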

When at least part of the content of the full amount of split information is shown in table 4, the partial split information may be shown in tables 5 and 6:

TABLE 5

TABLE 6

In step S804, the storage node sends a target data page to the slave computing node according to the occurrence time of the page splitting process corresponding to the first data page, where the target data page is the first data page or the second data page in the storage node.

Before the storage node sends the target data page to the slave computing node according to the occurrence time of the page splitting process corresponding to the first data page, the method further includes: the storage node determines the occurrence time of the page splitting process corresponding to the first data page.

In one scheme, the storage node determining the occurrence time of the page splitting process corresponding to the first data page includes: determining the occurrence time of the page splitting process corresponding to the first data page according to the first LSN of the log triggering the page splitting process, the second LSN of the data page identified as the first identification in the storage node, and the third LSN of the first parent data page in the slave computing node.

If the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, it is determined that the page splitting process corresponding to the first data page occurs after the slave computing node obtains the first parent data page from the storage node. If the first LSN is less than the second LSN and the first LSN is less than the third LSN, it is determined that the page splitting process corresponding to the first data page occurs before the slave computing node obtains the first parent data page from the storage node. If the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, it is determined that the data playback speed of the storage node is slower than the data playback speed of the slave computing node, and that the page splitting process corresponding to the first data page in the storage node has not yet occurred (i.e., is still to occur).

It can be understood that if the page splitting process corresponding to the first data page occurs before the slave computing node obtains the first parent data page from the storage node, then no page splitting process has occurred on the first data page in the storage node since the first parent data page was obtained. That is, the first data page identified as the first identifier in the storage node is a child data page that matches the first parent data page, and the storage node can directly send the data page identified as the first identifier to the slave computing node, so that the slave computing node obtains correct data. In this case, the target data page is the data page identified as the first identification in the storage node.

If the occurrence time of the page splitting process corresponding to the first data page is that the data playback speed of the storage node is slower than the data playback speed of the slave computing node and the page splitting process corresponding to the first data page in the storage node has not yet occurred, the storage node sends the data page identified as the first identification to the slave computing node after determining that it has completed replaying the log of the first LSN. In this case, the target data page is the data page identified as the first identification in the storage node.
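The three comparisons above can be sketched as a small classification function. The names `SplitTiming` and `classify_split_timing` are hypothetical; the function returns `None` for LSN combinations that the description does not address, rather than guessing at them.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_READ = "split occurred after the parent page was obtained"
    BEFORE_PARENT_READ = "split occurred before the parent page was obtained"
    PENDING_ON_STORAGE = "split not yet replayed on the storage node"


def classify_split_timing(first_lsn: int, second_lsn: int,
                          third_lsn: int) -> Optional[SplitTiming]:
    """Classify when the split happened.

    first_lsn:  LSN of the log that triggered the page split
    second_lsn: LSN of the page with the first identification on the storage node
    third_lsn:  LSN of the first parent page held by the slave computing node
    """
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_READ
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_READ
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return None  # combination not covered by the three conditions


print(classify_split_timing(24, 30, 20))  # SplitTiming.AFTER_PARENT_READ
print(classify_split_timing(24, 30, 40))  # SplitTiming.BEFORE_PARENT_READ
print(classify_split_timing(24, 20, 40))  # SplitTiming.PENDING_ON_STORAGE
```

The three conditions are mutually exclusive, so the order in which they are tested does not affect the result.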

The log of the first LSN is the log triggering the page splitting process corresponding to the first data page. Therefore, once the storage node has replayed the log of the first LSN, the page splitting process corresponding to the first data page is complete. If the data page identified as the first identifier is sent to the slave computing node at this point, the slave computing node obtains a child data page that matches the first parent data page, that is, it obtains correct data.

If the page splitting process corresponding to the first data page occurs after the slave computing node obtains the first parent data page from the storage node, then before the storage node sends the target data page to the slave computing node according to the occurrence time of the page splitting process corresponding to the first data page, the method further includes the following steps S805 to S808:
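The case in which the storage node lags behind can be illustrated with a toy storage node that answers the read only after the log of the first LSN has been replayed. Everything here is an assumption for illustration: the class `StorageNode`, its methods, the representation of logs as `(lsn, apply_fn)` pairs, and the key values. In particular, the sketch replays pending logs inline, whereas a real storage node would more likely wait for its replay process to reach the first LSN.

```python
from typing import Callable, Dict, List, Tuple

Pages = Dict[str, List[int]]
LogEntry = Tuple[int, Callable[[Pages], None]]


class StorageNode:
    """Toy storage node: pages hold key lists, logs are (lsn, apply_fn)."""

    def __init__(self, pages: Pages, pending_logs: List[LogEntry]) -> None:
        self.pages = pages
        self.pending = sorted(pending_logs, key=lambda entry: entry[0])
        self.replayed_lsn = 0

    def replay_until(self, lsn: int) -> None:
        """Replay pending logs in LSN order up to and including lsn."""
        while self.pending and self.pending[0][0] <= lsn:
            log_lsn, apply_fn = self.pending.pop(0)
            apply_fn(self.pages)
            self.replayed_lsn = log_lsn

    def read_after_split(self, page_id: str, first_lsn: int) -> List[int]:
        """Return the page only once the log of first_lsn has been replayed."""
        if self.replayed_lsn < first_lsn:
            self.replay_until(first_lsn)
        return self.pages[page_id]


def split_page_c(pages: Pages) -> None:
    # Hypothetical split: page C keeps the lower keys, page D gets the rest.
    keys = pages["C"]
    pages["C"], pages["D"] = keys[:2], keys[2:]


node = StorageNode({"C": [13, 21, 39, 40]}, [(24, split_page_c)])
page = node.read_after_split("C", first_lsn=24)
print(page)  # [13, 21]
```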

Step S805, the storage node transmits the first information to the slave computing node.

The first information may include: the first LSN of the log triggering the page splitting process corresponding to the first data page, and indication information, where the indication information indicates that the page splitting process corresponding to the first data page occurs after the slave computing node obtains the first parent data page from the storage node.

Step S806, the slave computing node determines the target data page according to the first information.

In one scheme, the slave computing node determining the target data page according to the first information includes the following a1 to a3:

a1, the slave computing node determines, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node.

If the full split information is stored in the slave computing node, the slave computing node determining, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node includes: judging, as indicated by the indication information, whether a second identifier exists among the identifiers of the split data pages included in the full split information such that the LSN of the log triggering the corresponding page splitting process is the first LSN, where the second identifier is the identifier of the first parent data page; the judgment result obtained is no.

If the full split information is not stored in the slave computing node but is stored in the management node, the slave computing node determining, according to the first information, that the first parent data page does not correspond to a page splitting process in the storage node includes: the slave computing node sends a query request to the management node as indicated by the indication information, where the query request includes the second identifier of the first parent data page and the first LSN, and instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first parent data page does not correspond to a page splitting process. After receiving the query request, the management node determines whether the second identifier exists among the identifiers of the split data pages included in the full split information such that the LSN of the log triggering the corresponding page splitting process is the first LSN; the determination result obtained is no.

a2, the slave computing node reads the second parent data page from the storage node. The second parent data page is the first parent data page as updated after the page splitting process corresponding to the first data page occurs; that is, the second parent data page is the data page identified as the second identifier in the storage node, where the second identifier is the identifier of the first parent data page.

Although the first parent data page does not correspond to a page splitting process in the storage node, the first data page in the storage node underwent a page splitting process after the slave computing node obtained the first parent data page from the storage node. Before that page splitting process occurred, the parent data page of the data page identified as the first identifier in the storage node was the first parent data page; after it occurred, the first parent data page was updated to the second parent data page. The slave computing node therefore needs to read the second parent data page from the storage node, and the identification of the second parent data page is the same as the identification of the first parent data page. For example, as shown in F to G of fig. 5, after 18 is inserted, the data page of the data record including the keywords 10, 15, 16, 17 is split into the data page of the data record including the keywords 10 and 15 and the data page of the data record including the keywords 10, 16, 17, and the parent data page is updated from the data page of the data record including the keyword 10 to the data page of the data record including the keywords 10 and 16.

The slave computing node reading the second parent data page from the storage node includes: the slave computing node sends a data read request to the storage node, the data read request including the identification of the first parent data page (i.e., the second identification described above), and the slave computing node receives the second parent data page from the storage node.

a3, the slave computing node determines the target data page according to the second parent data page.

The target data page determined by the slave computing node according to the second parent data page may be the first data page identified as the first identification in the storage node, or may be a second data page in the storage node that is different from the first data page.

The target data page may be determined from the second parent data page using an existing search method of the B-tree or of a variant of the B-tree, which is not described in detail here.
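A minimal sketch of step a3 is given below, assuming one common convention for B-tree variants in which each key in a parent page is the lowest key of the corresponding child page. The function `child_for_key` and the page identifiers `C` and `D` are hypothetical; the parent keys 10 and 16 follow the fig. 5 example.

```python
from bisect import bisect_right
from typing import Sequence


def child_for_key(parent_keys: Sequence[int], child_ids: Sequence[str],
                  key: int) -> str:
    """Return the child whose parent key is the largest one <= key.

    parent_keys must be sorted and parallel to child_ids.
    """
    if len(parent_keys) != len(child_ids):
        raise ValueError("parent_keys and child_ids must have equal length")
    idx = bisect_right(parent_keys, key) - 1
    return child_ids[max(idx, 0)]  # keys below the first separator go left


# Stale parent (before the split) versus the re-read second parent page.
stale_target = child_for_key([10], ["C"], 17)
fresh_target = child_for_key([10, 16], ["C", "D"], 17)
same_target = child_for_key([10, 16], ["C", "D"], 15)

print(stale_target)  # C
print(fresh_target)  # D
print(same_target)   # C
```

The last two calls show both outcomes named above: depending on the key being searched, the target data page is either the page that kept the first identification or a different, second data page.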

In another scheme, the slave computing node determining the target data page according to the first information includes the following b1 to b3:

b1, the slave computing node determines, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node.

A schematic diagram of the case in which the first data page corresponds to a page splitting process and the first parent data page also corresponds to a page splitting process may be as shown in G to I of fig. 3, or I to K of fig. 5.

If the full split information is stored in the slave computing node, the slave computing node determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: judging, as indicated by the indication information, whether a second identifier exists among the identifiers of the split data pages included in the full split information such that the LSN of the log triggering the corresponding page splitting process is the first LSN, where the second identifier is the identifier of the first parent data page; the judgment result obtained is yes.

If the full split information is not stored in the slave computing node but is stored in the management node, the slave computing node determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes: the slave computing node sends a query request to the management node as indicated by the indication information, where the query request includes the second identifier of the first parent data page and the first LSN, and instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process. After receiving the query request, the management node determines whether the second identifier exists among the identifiers of the split data pages included in the full split information such that the LSN of the log triggering the corresponding page splitting process is the first LSN; the determination result obtained is yes.

It will be appreciated that when the first parent data page corresponds to a page splitting process, the first parent data page is split into a second parent data page and a third parent data page, the identification of the second parent data page is the same as the identification of the first parent data page, and the second parent data page is the parent data page of the first data page after the first data page has been split. If the first parent data page is the root data page before it is split, a new root data page is generated after the split, and the new root data page is the parent data page of the second parent data page and the third parent data page. If the first parent data page is not the root data page before it is split, the parent data page of the second parent data page and the third parent data page is at least updated, and may itself undergo a page splitting process.

b2, the slave computing node reads, from the storage node, each data page from the second parent data page to the target parent data page, where the target parent data page does not correspond to a page splitting process and either is located between the root data page and the second parent data page or is the root data page; the second parent data page is the data page obtained after the first parent data page is split, and is the parent data page of the first data page.

Specifically, after the slave computing node determines that the first parent data page corresponds to a page splitting process in the storage node, it continues to determine whether the parent data page (hereinafter referred to as the fourth parent data page) of the second parent data page, into which the first parent data page was split, has been split (refer to the method of determining whether the first parent data page corresponds to a page splitting process). If the fourth parent data page does not correspond to a page splitting process, the second parent data page and the fourth parent data page are acquired (in this case the fourth parent data page is the target parent data page). If the fourth parent data page corresponds to a page splitting process, the slave computing node continues to determine whether the parent data page (hereinafter referred to as the fifth parent data page) of the fourth parent data page has been split, and the process is repeated until a target parent data page that does not correspond to a page splitting process is determined. It will be appreciated that the target parent data page may be the root data page corresponding to the root node.

b3, the slave computing node determines the target data page according to each data page from the second parent data page to the target parent data page.

Likewise, the target data page determined by the slave computing node may be the first data page identified as the first identification in the storage node, or may be a second data page in the storage node that is different from the first data page.

The target data page may be determined from the data pages from the second parent data page to the target parent data page using an existing search method of the B-tree or of a variant of the B-tree, which is not described in detail here.

Step S807, the slave computing node sends a read request for the target data page to the storage node.
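The upward walk in step b2 can be sketched as a loop that climbs from the second parent data page until it reaches a page that was not split by the log of the first LSN, or reaches the root. The function `pages_to_reread`, the `parent_of` map, and the shortened chain of page identifiers are assumptions for illustration only (table 3 describes a longer chain).

```python
from typing import Dict, List, Set, Tuple


def pages_to_reread(second_parent_id: str,
                    parent_of: Dict[str, str],
                    split_pages: Set[Tuple[str, int]],
                    first_lsn: int) -> List[str]:
    """List the page ids from the second parent page up to the target parent.

    split_pages holds (page id, LSN) pairs from the split information;
    parent_of maps a page id to the id of its parent (the root has no entry).
    """
    path = [second_parent_id]
    current = second_parent_id
    # Climb while the current page was split by the same log and is not the root.
    while (current, first_lsn) in split_pages and current in parent_of:
        current = parent_of[current]
        path.append(current)
    return path


# Hypothetical, shortened chain: P1 and PP1 were split by the log with LSN 24,
# while PP3 was not, so PP3 is the target parent data page.
parent_of = {"P1": "PP1", "PP1": "PP3", "PP3": "Root"}
split_pages = {("P1", 24), ("PP1", 24)}

print(pages_to_reread("P1", parent_of, split_pages, 24))
# ['P1', 'PP1', 'PP3']
```

The returned list is the set of pages the slave computing node would re-read before searching downward for the target data page.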

Wherein the read request for the target data page may include an identification of the target data page.

Step S808, the storage node acquires the target data page according to the request for reading the target data page.

The storage node acquires the target data page according to the identification of the target data page.

After the storage node acquires the target data page, the storage node sends the target data page to the slave computing node, and the slave computing node receives the target data page. The target data page matches the second parent data page acquired by the slave computing node, or matches the data pages from the second parent data page to the target parent data page, so that the slave computing node obtains correct data.

In this embodiment, when the slave computing node needs to read a child data page of a certain data page, whether the child data page corresponds to a page splitting process can be determined according to the split information. When the child data page corresponds to a page splitting process, the correct data page to read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the slave computing node match each other, that is, the slave computing node can read correct data.

Next, a specific embodiment is used to describe the data reading method for the case where the split information is not stored in the storage node and the full split information is stored in the slave computing node and/or the management node.

Fig. 9 is a second flowchart of a data reading method according to an embodiment of the present application, referring to fig. 9, the method of the present embodiment includes:

Step S901, the slave computing node determines that the identifier of a data page to be read is a first identifier, where the data page to be read is a child data page of a first parent data page in the slave computing node.

The first parent data page is stored in the slave computing node. When a child data page of the first parent data page needs to be read, the slave computing node determines the identification of the child data page as the first identification. The slave computing node determines whether its cache includes the child data page identified as the first identification; if so, it obtains the child data page from the cache and applies it. Applying the child data page may be, for example, sending it to the terminal device so that the data corresponding to the child data page is displayed. If the slave computing node determines that its cache does not include the child data page identified as the first identification, it determines to read the child data page from the storage node.

Step S902, the slave computing node determines that the first data page identified as the first identification in the storage node corresponds to a page splitting process.

In one approach, the full split information is stored in the slave computing node. Accordingly, the slave computing node determining that the first data page corresponds to a page splitting process in the storage node includes: determining that the first identification is present among the identifications of the split data pages included in the full split information.

This approach can improve data reading efficiency and reduce network overhead.

In another approach, the full split information is not stored in the slave computing node but is stored in the management node. Accordingly, the slave computing node determining that the first data page corresponds to a page splitting process includes: the slave computing node sends a query request to the management node, where the query request includes the first identifier and instructs the management node to determine, according to the full split information, whether the first data page corresponds to a page splitting process; the slave computing node receives a query result from the management node, the query result indicating that the first data page corresponds to a page splitting process. After receiving the query request, the management node determines whether the first identifier exists among the identifiers of the split data pages included in the full split information; the determination result obtained is yes.

This approach can save storage space in the slave computing node.

Step S903, determining, by the slave computing node, a target data page according to the occurrence timing of the page splitting process corresponding to the first data page.

For the specific implementation of the slave computing node determining the occurrence time of the page splitting process corresponding to the first data page, refer to the specific implementation of the storage node determining that occurrence time in the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurs after the slave computing node obtains the first parent data page from the storage node, then in one manner, the slave computing node determining the target data page includes: the slave computing node determines that the first parent data page does not correspond to a page splitting process in the storage node; reads a second parent data page from the storage node, where the second parent data page is the first parent data page as updated after the page splitting process corresponding to the first data page occurs; and determines the target data page according to the second parent data page. For the specific implementation of the steps in this manner, refer to the description in the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurs after the slave computing node obtains the first parent data page from the storage node, where the first data page is a child data page of the first parent data page, then in another manner, the slave computing node determining the target data page includes: the slave computing node determines that the first parent data page corresponds to a page splitting process in the storage node; reads, from the storage node, each data page from a second parent data page to a target parent data page, where the target parent data page does not correspond to a page splitting process and either is located between the root data page and the second parent data page or is the root data page, and the second parent data page is the data page obtained after the first parent data page is split and is the parent data page of the first data page; and determines the target data page according to each data page from the second parent data page to the target parent data page. For the specific implementation of the steps in this manner, refer to the description in the embodiment shown in fig. 8.

If the page splitting process corresponding to the first data page occurs before the slave computing node obtains the first parent data page from the storage node, determining the target data page includes: determining that the first data page in the storage node is the target data page.

If the occurrence time of the page splitting process corresponding to the first data page is that the data playback speed of the storage node is slower than the data playback speed of the slave computing node and the page splitting process corresponding to the first data page in the storage node has not yet occurred, determining the target data page includes: determining that the storage node has replayed the log of the first LSN; and determining that the first data page in the storage node is the target data page.

Step S904, the target data page is read from the storage node by the slave computing node.

To sum up, in the embodiment shown in fig. 9, if the full split information is stored in the management node but not in the slave computing node, and neither the full split information nor the partial split information is stored in the storage node, the slave computing node needs to ask the management node whether the data page to be read corresponds to a page splitting process every time it needs to read data from the storage node, and the network overhead is high. One improvement is the case in the embodiment shown in fig. 9 where the full split information is stored in the slave computing node: the slave computing node does not need to ask the management node whether the data page to be read corresponds to a page splitting process, which reduces the network overhead. Another improvement is the embodiment shown in fig. 8, in which partial split information is stored in the storage node: when the storage node receives a data page read request sent by the slave computing node, it determines, according to the stored partial split information, whether the data page to be read corresponds to a page splitting process, and the full split information stored in the slave computing node or the management node is used only when the data page to be read corresponds to a page splitting process with a particular occurrence time. The network overhead is therefore lower than in the scheme in which the full split information is stored in the management node and neither the slave computing node nor the storage node stores the full split information or the partial split information.

Next, the process of creating and cleaning the split information is described using specific embodiments.

Since all write operations are performed at the master computing node, when a data page is split, all related data pages should be in the cache of the master computing node. The master computing node therefore holds the information of all related data pages, and the split information can be created by the master computing node. In the subsequent process, the master computing node sends newly added split information to each node that stores the split information, so as to update the split information stored in that node.

As the amount of data increases, page splitting occurs continuously, so the split information keeps growing and needs to be cleaned. The cleaning process may be controlled by the management node.

In the slave computing node and the storage node, redo logs are replayed in ascending order of LSN, so a log whose LSN is smaller than the LSN currently being replayed will not be encountered again. Based on this fact, cleaning can be performed according to the LSNs of the logs currently replayed by the slave computing node and the storage node. For example, in one scheme, the management node determines the smaller of the LSN of the log most recently replayed by the slave computing node and the LSN of the log most recently replayed by the storage node, and sends the smaller LSN to each node storing the split information, so that each such node deletes from the split information the information corresponding to LSNs less than the smaller LSN. For example, if the slave computing node has applied the log with LSN 100 and the storage node has applied the log with LSN 120, the information of page splitting processes whose triggering log has an LSN less than 100 can be cleaned from the split information.
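The cleaning rule can be sketched as follows, using the rows of table 4 as the stored split information. The function names `cleanup_threshold` and `clean_split_info` are hypothetical, and the replay positions 40 and 95 in the second example are made up for illustration.

```python
from typing import List, Tuple

# (split page id, split page id, parent page id, LSN), as in table 4.
Row = Tuple[int, int, int, int]


def cleanup_threshold(slave_replayed_lsn: int, storage_replayed_lsn: int) -> int:
    """The management node picks the smaller of the two replayed LSNs."""
    return min(slave_replayed_lsn, storage_replayed_lsn)


def clean_split_info(rows: List[Row], threshold: int) -> List[Row]:
    """Drop rows whose triggering log has an LSN less than the threshold."""
    return [row for row in rows if row[3] >= threshold]


split_info: List[Row] = [
    (100, 101, 12, 24),
    (110, 111, 15, 39),
    (201, 202, 18, 89),
    (210, 211, 19, 99),
]

# Example from the text: slave at LSN 100, storage node at LSN 120.
print(clean_split_info(split_info, cleanup_threshold(100, 120)))  # []

# Hypothetical earlier state: slave at LSN 40, storage node at LSN 95.
print(clean_split_info(split_info, cleanup_threshold(40, 95)))
# [(201, 202, 18, 89), (210, 211, 19, 99)]
```

Taking the minimum is the conservative choice: a row is removed only once both nodes have replayed past the log that triggered the split.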

The present embodiment gives a specific implementation of generation and deletion of split information for the data reading process described above.

The data reading method in the embodiment of the present application is described above, and the apparatus according to the embodiment of the present application is described below.

Fig. 10 is a schematic structural diagram of a data reading device according to an embodiment of the present application. As shown in fig. 10, the data reading apparatus 1000 may be a slave computing node as described above, or may be a component (e.g., an integrated circuit, a chip, etc.) of the slave computing node as described above. The data reading apparatus 1000 may be a storage node as above, or may be a component (e.g., an integrated circuit, a chip, etc.) of a storage node as above. The data reading apparatus 1000 may include: processing module 1002 (processing unit). Optionally, a transceiver module 1001 (transceiver unit) and a storage module 1003 (storage unit) may be further included.

In one possible design, one or more modules as in FIG. 10 may be implemented by one or more processors or by one or more processors and memory; or by one or more processors and transceivers; or by one or more processors, memory, and transceivers, to which embodiments of the application are not limited. The processor, the memory and the transceiver can be arranged separately or integrated.

The data reading device has the function of implementing the slave computing node described in the embodiments of the present application. For example, the data reading device includes a module, unit, or means corresponding to the steps performed by the slave computing node described in the embodiments of the present application, where the function, unit, or means may be implemented by software, by hardware, by hardware executing corresponding software, or by a combination of software and hardware. For details, refer to the corresponding description in the foregoing method embodiments.

Alternatively, the data reading device has the function of implementing the storage node described in the embodiments of the present application. For example, the data reading device includes a module, unit, or means corresponding to the steps performed by the storage node described in the embodiments of the present application, where the function, unit, or means may be implemented by software, by hardware, by hardware executing corresponding software, or by a combination of software and hardware. For details, refer to the corresponding description in the foregoing method embodiments.

Alternatively, each module in the data reading apparatus 1000 in the embodiment of the present application may be used to perform the method described in the embodiment of the method of the present application.

In a first possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A transceiver module 1001 for receiving a data page read request from a computing node, the data page read request including a first identification; a processing module 1002 configured to determine that a first data page identified as the first identification in the data reading device corresponds to a page splitting process; the transceiver module 1001 is further configured to send a target data page to the computing node according to an occurrence timing of the page splitting process, where the target data page is the first data page or the second data page.

Optionally, the data reading device stores partial splitting information therein; the partial splitting information comprises an identification of a splitting data page corresponding to a partial page splitting process and a log sequence number LSN of a log triggering the page splitting process, and the splitting data page corresponding to the partial page splitting process is stored in the data reading device; the processing module 1002 is specifically configured to determine that the first identifier exists in the identifiers of the split data pages included in the partial split information.
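The lookup against the partial splitting information described above can be illustrated with a minimal Python sketch. It assumes the splitting information is held as an in-memory map from page identifier to the LSN of the log that triggered the split; the names `SplitRecord`, `PartialSplitInfo`, `record_split`, and `lookup` are hypothetical and do not come from the embodiments.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SplitRecord:
    """One entry of splitting information (hypothetical layout)."""
    page_id: int    # identifier of the split data page
    split_lsn: int  # LSN of the log that triggered the page splitting process


class PartialSplitInfo:
    """Split records only for pages stored on this storage node (assumed)."""

    def __init__(self) -> None:
        self._by_page: Dict[int, SplitRecord] = {}

    def record_split(self, page_id: int, split_lsn: int) -> None:
        self._by_page[page_id] = SplitRecord(page_id, split_lsn)

    def lookup(self, page_id: int) -> Optional[SplitRecord]:
        # None means the first identifier is not among the split pages,
        # so the requested page does not correspond to a page splitting process.
        return self._by_page.get(page_id)


info = PartialSplitInfo()
info.record_split(page_id=7, split_lsn=120)
hit = info.lookup(7)    # a SplitRecord for page 7
miss = info.lookup(8)   # no record for page 8
```

A map keyed by page identifier keeps the check to a single lookup per read request; a real storage node would also need to persist and prune these records, which this sketch leaves out.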

Optionally, the first identifier is an identifier of a data page to be read determined by the computing node, and the data page to be read is a child data page of a first parent data page in the computing node.

Optionally, the data page read request includes the first identifier and a third LSN of the first parent data page in the computing node; before the transceiver module 1001 sends the target data page to the computing node according to the occurrence timing of the page splitting process, the processing module 1002 is further configured to: determine the occurrence timing of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the data reading device, and the third LSN.

Optionally, the processing module 1002 is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the data reading device obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the data reading device obtains the first parent data page; if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the data reading device is slower than the data playback speed of the computing node and that the page splitting process is to occur in the data reading device.
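The three LSN comparisons above can be written as a small classification function. The following Python sketch is only an illustration: the enum `SplitTiming` and the function `classify_split_timing` are hypothetical names, LSNs are assumed to be plain integers, and the function returns `None` for any combination that the three stated conditions do not cover.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "after"    # third < first <= second
    BEFORE_PARENT_OBTAINED = "before"  # first < second and first < third
    PENDING_ON_STORAGE = "pending"     # second < first <= third


def classify_split_timing(
    first_lsn: int, second_lsn: int, third_lsn: int
) -> Optional[SplitTiming]:
    """Classify the occurrence timing of a page splitting process.

    first_lsn:  LSN of the log that triggered the page splitting process
    second_lsn: LSN of the first data page held by the storage node
    third_lsn:  LSN of the first parent data page held by the computing node
    """
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        # The storage node replays logs more slowly than the computing node,
        # so the split has not yet been applied on the storage node.
        return SplitTiming.PENDING_ON_STORAGE
    return None  # not covered by the three stated conditions
```

For example, with a triggering LSN of 100, a storage-side page LSN of 150, and a parent LSN of 80, the function reports that the split occurred after the parent page was obtained.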

Optionally, the page splitting process occurs after the data reading device obtains the first parent data page; before the transceiver module 1001 sends the target data page to the computing node, the transceiver module 1001 is further configured to: sending first information to the computing node, wherein the first information is used for determining the target data page to be read by the computing node; a read request of the target data page from the compute node is received.

Optionally, the first information includes: the first LSN of the log triggering the page splitting process, and indication information indicating that the page splitting process occurs after the data reading device obtains the first parent data page.

Optionally, the page splitting process occurs before the first parent data page is obtained by the data reading device, and the target data page is the first data page in the data reading device.

Optionally, the data playback speed of the data reading device is slower than the data playback speed of the computing node and the page splitting process is to occur in the data reading device, the target data page being the first data page in the data reading device. The processing module 1002 is further configured to, before the transceiver module 1001 sends the first data page to the computing node: determine that the data reading device has finished replaying the log whose LSN is the first LSN.
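How a storage node in the first possible design might choose its reply for each of the three timings can be sketched as follows. This is a simplified, single-threaded Python illustration under several assumptions: pages are byte strings in a dictionary, pending log records are `(lsn, page_id, new_content)` tuples sorted by LSN, replay is performed synchronously, and the reply is a tagged tuple in which the tag `"first_info"` stands in for the indication information. The class `StorageNode` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical reply shapes: ("first_info", first_lsn) or ("page", content).
Reply = Tuple[str, Union[int, bytes]]


@dataclass
class StorageNode:
    pages: Dict[int, bytes]
    applied_lsn: int = 0
    # Unreplayed log records as (lsn, page_id, new_content), sorted by lsn.
    pending_log: List[Tuple[int, int, bytes]] = field(default_factory=list)

    def replay_until(self, lsn: int) -> None:
        """Apply pending log records whose LSN does not exceed `lsn`."""
        while self.pending_log and self.pending_log[0][0] <= lsn:
            rec_lsn, page_id, content = self.pending_log.pop(0)
            self.pages[page_id] = content
            self.applied_lsn = rec_lsn

    def respond(self, page_id: int, timing: str, first_lsn: int) -> Reply:
        if timing == "after":
            # Split occurred after the parent page was obtained: return the
            # first information so the computing node can determine the target.
            return ("first_info", first_lsn)
        if timing == "pending":
            # Storage node lags behind: finish replaying the split log first.
            self.replay_until(first_lsn)
            if self.applied_lsn < first_lsn:
                raise RuntimeError("split log record has not arrived yet")
            return ("page", self.pages[page_id])
        if timing == "before":
            # Split is already reflected in the stored page: send it directly.
            return ("page", self.pages[page_id])
        raise ValueError(f"unknown timing: {timing!r}")


node = StorageNode(pages={7: b"pre-split"}, pending_log=[(120, 7, b"post-split")])
info_reply = node.respond(7, "after", 120)
page_reply = node.respond(7, "pending", 120)
```

In the lagging case the page is sent only after the log with the first LSN has been replayed, which mirrors the check described above; a production node would wait for the record asynchronously rather than raise.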

The data reading means in the first possible design may be a storage node or a part of a storage node in the implementation of the method shown in fig. 8.

In a second possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A processing module 1002, configured to determine an identifier of a data page to be read as a first identifier, where the data page to be read is a child data page of a first parent data page in the computing node; a transceiver module 1001, configured to send a data page read request to a storage node, where the data page read request includes the first identifier, where the first identifier is used for the storage node to determine an occurrence timing of a page splitting process corresponding to a first data page identified as the first identifier, and send a target data page to the data reading device according to the occurrence timing; the transceiver module 1001 is further configured to receive a target data page from the storage node, where the target data page is a second data page or the first data page.

Optionally, before the transceiver module 1001 receives a target data page from the storage node: the transceiver module 1001 is further configured to receive first information from the storage node, where the first information includes: a first log sequence number (LSN) of the log triggering the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page; the processing module 1002 is further configured to determine a target data page according to the first information; the transceiver module 1001 is further configured to send a request to the storage node to read the target data page.

Optionally, the processing module 1002 is specifically configured to: determining that the first parent data page does not correspond to a page splitting process in the storage node according to the first information; reading a second parent data page from the storage node, the second parent data page being the first parent data page updated after the page splitting process occurs; and determining the target data page according to the second parent data page.

Optionally, the processing module 1002 is specifically configured to: determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node; reading, from the storage node, each data page from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page, or the target parent data page is the root data page, the second parent data page is a data page obtained after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determining the target data page according to each data page from the second parent data page to the target parent data page.
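The upward search for the target parent data page can be illustrated with a short Python sketch. It assumes the computing node remembers the list of page identifiers on its path from the root to the first parent data page, and that a predicate reports whether a given page corresponds to a page splitting process. The function `pages_to_reread` and its parameters are hypothetical, and the sketch returns only the cached ancestors that need re-reading; the actual second parent data page would be found by descending again from the first page returned.

```python
from typing import Callable, List


def pages_to_reread(
    path_from_root: List[int], was_split: Callable[[int], bool]
) -> List[int]:
    """Return the ancestors to re-read, from the stable ancestor downward.

    path_from_root: page ids from the root (index 0) to the first parent page.
    was_split:      True if the page corresponds to a page splitting process.
    """
    if not path_from_root:
        raise ValueError("path must contain at least the root page")
    idx = len(path_from_root) - 1
    # Walk upward while the current ancestor was itself split; stop at the root.
    while idx > 0 and was_split(path_from_root[idx]):
        idx -= 1
    return path_from_root[idx:]


split_pages = {4, 9}
reread = pages_to_reread([1, 4, 9], lambda page_id: page_id in split_pages)
```

When the first parent data page itself was not split, the result contains only that page, which corresponds to re-reading the updated parent alone.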

Optionally, the data reading device stores full splitting information, wherein the full splitting information comprises an identifier of a split data page corresponding to each page splitting process and an LSN of a log triggering each page splitting process; the processing module 1002 is specifically configured to: according to the first information, determining that the second identifier exists in the identifiers of the split data pages included in the full splitting information and that the LSN triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.

Optionally, the data reading device does not store the full splitting information, and a management node stores the full splitting information; the full splitting information comprises an identifier of a split data page corresponding to each page splitting process and an LSN of a log triggering each page splitting process; the processing module 1002 is specifically configured to: according to the indication information, control the transceiver module 1001 to send a query request to the management node, wherein the query request comprises the first LSN and the second identifier, and the query request instructs the management node to determine, according to the full splitting information, whether the first parent data page corresponds to a page splitting process in the storage node; and determine that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module 1001 from the management node, where the query result indicates that the first parent data page corresponds to a page splitting process in the storage node.
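The two alternatives above, checking locally held full splitting information or querying a management node, can be sketched in Python as follows. The sketch assumes the full splitting information is a set of `(page identifier, triggering LSN)` pairs and that the management node is reachable through a callable; `parent_was_split`, `SplitKey`, and `query_management_node` are hypothetical names, and no real RPC interface is implied.

```python
from typing import Callable, Optional, Set, Tuple

SplitKey = Tuple[int, int]  # (page identifier, LSN of the triggering log)


def parent_was_split(
    parent_id: int,
    first_lsn: int,
    local_full_info: Optional[Set[SplitKey]],
    query_management_node: Callable[[int, int], bool],
) -> bool:
    """Decide whether the first parent data page corresponds to a split.

    If the computing node holds the full splitting information, both the
    second identifier and the first LSN must match a stored entry.
    Otherwise the question is forwarded to the management node.
    """
    if local_full_info is not None:
        return (parent_id, first_lsn) in local_full_info
    return query_management_node(parent_id, first_lsn)


manager_info = {(9, 120)}


def fake_manager(page_id: int, lsn: int) -> bool:
    # Stand-in for the management node's lookup in its full splitting information.
    return (page_id, lsn) in manager_info


local_answer = parent_was_split(9, 120, {(9, 120), (4, 80)}, fake_manager)
remote_answer = parent_was_split(9, 120, None, fake_manager)
```

Keeping the full splitting information locally avoids a network round trip, while delegating to the management node reduces the state each computing node has to maintain.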

The data reading means in the second possible design may be the slave computing node or a part of the slave computing node in the implementation of the method described above.

In a third possible design, a data reading device 1000 may include a transceiver module 1001 and a processing module 1002.

A processing module 1002, configured to determine an identifier of a data page to be read as a first identifier, where the data page to be read is a child data page of the first parent data page in the data reading device; determining that a first data page identified as the first identification in a storage node corresponds to a page splitting process; determining a target data page according to the occurrence time of the page splitting process; the target data page is read from the storage node.

Optionally, the data reading device stores full-quantity splitting information, and the full-quantity splitting information includes an identifier of a splitting data page corresponding to each page splitting process and an LSN of a log triggering each page splitting process; the processing module 1002 is specifically configured to: determining that the first identifier exists in identifiers of split data pages included in the full amount of split information.

Optionally, a transceiver module 1001 is also included; the data reading device does not store full-quantity splitting information, the management node stores full-quantity splitting information, and the full-quantity splitting information comprises identifiers of splitting data pages corresponding to each page splitting process and LSNs of logs triggering each page splitting process; the processing module 1002 is specifically configured to control the transceiver module 1001 to send a query request to the management node, where the query request includes the first identifier, and the query request instructs the management node to determine, according to the full-size splitting information, a corresponding page splitting process of the first data page in the storage node; and determining that the first data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module 1001 from the management node, where the query result indicates that the first data page corresponds to a page splitting process in the storage node.

Optionally, before the processing module 1002 determines the target data page according to the occurrence timing of the page splitting process, the processing module 1002 is further configured to: determining the occurrence time of the page splitting process according to a first LSN of a log triggering the page splitting process, a second LSN of the first data page in a storage node and a third LSN of a first parent data page in the data reading device.

Optionally, the processing module 1002 is specifically configured to: if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page; if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page; if the second LSN is less than the first LSN, which is less than or equal to the third LSN, then determining that the data playback speed of the storage node is slower than the data playback speed of the data reading device and that the page splitting process is to occur in the storage node.

Optionally, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining that the first parent data page does not correspond to a page splitting process in the storage node; reading a second parent data page from the storage node, the second parent data page being the first parent data page updated after the page splitting process occurs; and determining a target data page according to the second parent data page.

Optionally, if the page splitting process occurs after the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining that the first parent data page corresponds to a page splitting process in the storage node; reading each data page from a storage node from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page or the target parent data page is the root data page, the second parent data page is a data page after the first parent data page is split, and the second parent data page is a parent data page of the first data page; and determining the target data page according to each data page from the second parent data page to the target parent data page.

Optionally, if the page splitting process occurs before the storage node obtains the first parent data page, the processing module 1002 is specifically configured to: determining the first data page in the storage node as a target data page.

Optionally, if the data playback speed of the storage node is slower than the data playback speed of the data reading device and the page splitting process is to occur in the storage node, the processing module 1002 is specifically configured to: determining that the log of the first LSN is replayed by the storage node; determining the first data page in the storage node as a target data page.

The data reading means in the third possible design may be the slave computing node or a part of the slave computing node in the implementation of the method shown in fig. 9.

The device of the present embodiment may be used to execute the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and are not described herein again.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

Fig. 11 is a block diagram of an electronic device implementing the data reading method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 11, the electronic device includes: one or more processors 1101, memory 1102, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 11, a single processor 1101 is taken as an example.

Memory 1102 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data reading method provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the data reading method provided by the present application.

The memory 1102 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the processing module 1002 and the transceiver module 1001 shown in fig. 10) corresponding to the data reading method according to the embodiment of the present application. The processor 1101 executes various functional applications of the server and data processing, i.e., implements the data reading method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 1102.

Memory 1102 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by use of an electronic device implementing the data reading method, and the like. In addition, memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected to the electronic device implementing the data reading method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device implementing the data reading method may further include: an input device 1103 and an output device 1104. The processor 1101, memory 1102, input device 1103 and output device 1104 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 11.

The input device 1103 may receive input numeric or character information, and generate key signal inputs related to user settings and function control of an electronic device implementing the data reading method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output device 1104 may include a display device, auxiliary lighting (e.g., LEDs), and haptic feedback (e.g., a vibration motor), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

When a child data page of a certain data page needs to be read from a storage node, whether the child data page corresponds to a page splitting process can be determined according to the splitting information. If it does, the data page to be read is determined according to the occurrence timing of the page splitting process, so that the parent data page and the child data page in the slave computing node match each other; that is, the slave computing node reads correct data from the storage node.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A data reading method, applied to a storage node, the method comprising:

receiving a data page read request from a computing node, the data page read request including a first identification;

determining that a first data page identified in the storage node as the first identification corresponds to a page splitting process;

sending a target data page to the computing node according to the occurrence time of the page splitting process, wherein the target data page is the first data page or the second data page;

the storage node stores partial splitting information; the partial splitting information comprises an identifier of a split data page corresponding to a partial page splitting process and a log sequence number (LSN) of a log triggering the page splitting process, and the split data page corresponding to the partial page splitting process is stored in the storage node;

the determining that a first data page identified in the storage node as the first identification corresponds to a page splitting process comprises: determining that the first identifier exists in the identifiers of split data pages included in the partial splitting information;

the first identifier is an identifier of a data page to be read, which is determined by the computing node, and the data page to be read is a child data page of a first parent data page in the computing node;

the data page read request further includes a third LSN of the first parent data page;

before the sending the target data page to the computing node according to the occurrence time of the page splitting process, the method further comprises: determining the occurrence time of the page splitting process according to the first LSN of the log triggering the page splitting process, the second LSN of the first data page in the storage node and the third LSN.

2. The method of claim 1, wherein the determining the occurrence time of the page splitting process comprises:

if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determining that the page splitting process occurs after the storage node obtains the first parent data page;

if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determining that the page splitting process occurs before the storage node obtains the first parent data page;

if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determining that the data playback speed of the storage node is slower than the data playback speed of the compute node and that the page splitting process is to occur in the storage node.

3. The method of claim 1 or 2, wherein the page splitting process occurs after the storage node obtains the first parent data page, before sending a target data page to the computing node, further comprising:

sending first information to the computing node, wherein the first information is used for determining a target data page by the computing node;

receiving a read request for the target data page from the computing node.

4. The method of claim 3, wherein the first information comprises: a first LSN of the log triggering the page splitting process, and indication information indicating that the page splitting process occurred after the storage node obtained the first parent data page.

5. The method of claim 1 or 2, wherein the page splitting process occurs before the storage node obtains the first parent data page, the target data page being the first data page in the storage node.

6. The method according to claim 1 or 2, wherein the data playback speed of the storage node is slower than the data playback speed of the computing node and the page splitting process is to occur in the storage node, the target data page being the first data page in the storage node; before sending the first data page identified as the first identification to the computing node, further comprising:

determining that the storage node has finished playing back the log whose LSN is the first LSN.

7. A data reading method, applied to a computing node, the method comprising:

determining an identifier of a data page to be read as a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node;

sending a data page reading request to a storage node, wherein the data page reading request comprises the first identifier, and the first identifier is used for determining the occurrence time of a page splitting process corresponding to a first data page identified as the first identifier by the storage node and sending a target data page to the computing node according to the occurrence time;

receiving a target data page from the storage node, the target data page being either a second data page or the first data page;

before receiving the target data page from the storage node, further comprising:

receiving first information from the storage node, the first information comprising: a first log sequence number (LSN) of the log triggering the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page;

determining a target data page according to the first information;

and sending a request for reading the target data page to the storage node.

8. The method of claim 7, wherein determining a target data page based on the first information comprises:

determining that the first parent data page does not correspond to a page splitting process in the storage node according to the first information;

reading a second parent data page from the storage node, the second parent data page being the first parent data page updated after the page splitting process occurs;

and determining the target data page according to the second parent data page.

9. The method of claim 7, wherein determining a target data page based on the first information comprises:

determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node;

reading each data page from a storage node from a second parent data page to a target parent data page, wherein the target parent data page does not correspond to a page splitting process and is located between a root data page and the second parent data page or the target parent data page is the root data page, the second parent data page is a data page after the first parent data page is split, and the second parent data page is a parent data page of the first data page;

and determining the target data page according to each data page from the second parent data page to the target parent data page.
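
As a rough sketch of the upward walk recited in this claim, the following collects the pages to re-read, starting at the second parent data page and stopping at the first ancestor that has not split or at the root. The function name `pages_to_reread`, the `parent_of` map and the `corresponds_to_split` predicate are assumptions made for the example; the claims do not specify how ancestry or split status is represented.

```python
from typing import Callable, Dict, List, Optional


def pages_to_reread(
    second_parent_id: int,
    parent_of: Dict[int, Optional[int]],
    corresponds_to_split: Callable[[int], bool],
) -> List[int]:
    """Return page ids from the second parent page up to the target parent page.

    parent_of maps a page id to its parent id (None for the root page).
    The walk stops at the first page that has not split, or at the root.
    """
    path = [second_parent_id]
    current = second_parent_id
    while corresponds_to_split(current) and parent_of.get(current) is not None:
        current = parent_of[current]
        path.append(current)
    return path
```

For a tree where page 9 has parent 5, page 5 has parent 2 and page 2 has root 1, with pages 9 and 5 split, the walk yields pages 9, 5 and 2; page 2 is then the target parent data page.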

10. The method of claim 9, wherein the computing node stores full split information, the full split information including an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node comprises: determining, according to the indication information, that a second identifier exists among the identifiers of split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.
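
One possible way to picture this lookup, assuming the full split information is held as a set of (page identifier, LSN) pairs; that representation and the name `parent_corresponds_to_split` are illustrative assumptions, not taken from the claims.

```python
from typing import Set, Tuple

# Hypothetical representation: one (split page identifier, triggering LSN)
# pair per page splitting process.
FullSplitInfo = Set[Tuple[int, int]]


def parent_corresponds_to_split(
    full_split_info: FullSplitInfo,
    second_identifier: int,
    first_lsn: int,
) -> bool:
    """True if the parent page was split by the log whose LSN is first_lsn."""
    return (second_identifier, first_lsn) in full_split_info
```

Keeping pairs rather than a single LSN per page allows the same page to appear in several page splitting processes.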

11. The method of claim 9, wherein the computing node does not store full split information, and a management node stores the full split information; the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the determining, according to the first information, that the first parent data page corresponds to a page splitting process in the storage node includes:

sending a query request to the management node according to the indication information, wherein the query request comprises the first LSN and a second identifier, the second identifier being the identifier of the first parent data page, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node;

receiving a query result from the management node, the query result indicating that the first parent data page corresponds to a page splitting process in the storage node.

12. A data reading method, applied to a computing node, the method comprising:

determining that a first data page identified by a first identifier in a storage node corresponds to a page splitting process;

determining a target data page according to the occurrence time of the page splitting process;

reading the target data page from the storage node;

wherein the computing node stores full split information, the full split information comprising an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the determining that the first data page identified by the first identifier in the storage node corresponds to a page splitting process comprises: determining that the first identifier exists among the identifiers of split data pages included in the full split information;

or, the computing node does not store full split information and a management node stores the full split information, the full split information comprising an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the determining that the first data page identified by the first identifier in the storage node corresponds to a page splitting process comprises:

sending a query request to the management node, wherein the query request comprises the first identifier, and the query request instructs the management node to determine, according to the full split information, that the first data page corresponds to a page splitting process in the storage node;

receiving a query result from the management node, the query result indicating that the first data page corresponds to a page splitting process in the storage node.
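
The two alternatives in this claim (a local lookup when the computing node holds the full split information, otherwise a query to the management node) could be sketched as below. The function `page_has_split` and the callback `query_management_node` are hypothetical; the actual query protocol is not described in the claims.

```python
from typing import Callable, Optional, Set


def page_has_split(
    first_identifier: int,
    local_split_page_ids: Optional[Set[int]],
    query_management_node: Callable[[int], bool],
) -> bool:
    """Decide whether the page identified by first_identifier has split.

    local_split_page_ids is the set of split page identifiers taken from the
    full split information, or None if this computing node does not store it.
    """
    if local_split_page_ids is not None:
        # The computing node stores the full split information itself.
        return first_identifier in local_split_page_ids
    # Otherwise delegate the check to the management node.
    return query_management_node(first_identifier)
```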

13. A data reading apparatus, comprising:

a transceiver module for receiving a data page read request from a computing node, the data page read request including a first identification;

a processing module for determining that a first data page identified by the first identifier in the data reading apparatus corresponds to a page splitting process;

the transceiver module is further configured to send a target data page to the computing node according to an occurrence time of the page splitting process, wherein the target data page is the first data page or a second data page;

the data reading apparatus stores partial split information; the partial split information comprises an identifier of the split data page corresponding to each of a subset of the page splitting processes and a log sequence number (LSN) of the log triggering each such page splitting process, wherein the split data pages corresponding to that subset are stored in the data reading apparatus;

the processing module is specifically configured to determine that the first identifier exists among the identifiers of split data pages included in the partial split information;

the data page read request further includes a third LSN of a first parent data page in the computing node;

before the transceiver module sends the target data page to the computing node according to the occurrence time of the page splitting process, the processing module is further configured to:

determine the occurrence time of the page splitting process according to a first LSN of the log triggering the page splitting process, a second LSN of the first data page in the data reading apparatus, and the third LSN.

14. The apparatus of claim 13, wherein the processing module is specifically configured to:

if the third LSN is less than the first LSN and the first LSN is less than or equal to the second LSN, determine that the page splitting process occurs after the data reading apparatus obtains the first parent data page;

if the first LSN is less than the second LSN and the first LSN is less than the third LSN, determine that the page splitting process occurs before the data reading apparatus obtains the first parent data page;

if the second LSN is less than the first LSN and the first LSN is less than or equal to the third LSN, determine that the data playback speed of the data reading apparatus is slower than the data playback speed of the computing node and that the page splitting process is yet to occur in the data reading apparatus.
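
The three LSN comparisons in this claim can be written out directly. The sketch below mirrors the inequalities exactly as recited; the enum `SplitTiming` and the function `classify_split_timing` are illustrative names, and returning `None` for combinations the claim does not cover is an assumption of this example.

```python
from enum import Enum
from typing import Optional


class SplitTiming(Enum):
    AFTER_PARENT_OBTAINED = "after"    # split occurred after the parent page was obtained
    BEFORE_PARENT_OBTAINED = "before"  # split occurred before the parent page was obtained
    PENDING_ON_STORAGE = "pending"     # storage replay lags; split not yet applied there


def classify_split_timing(
    first_lsn: int,   # LSN of the log that triggered the page split
    second_lsn: int,  # LSN of the first data page held by the storage side
    third_lsn: int,   # LSN of the first parent data page in the computing node
) -> Optional[SplitTiming]:
    """Classify when the split occurred, using the claim's three conditions."""
    if third_lsn < first_lsn <= second_lsn:
        return SplitTiming.AFTER_PARENT_OBTAINED
    if first_lsn < second_lsn and first_lsn < third_lsn:
        return SplitTiming.BEFORE_PARENT_OBTAINED
    if second_lsn < first_lsn <= third_lsn:
        return SplitTiming.PENDING_ON_STORAGE
    return None  # combination not covered by the recited conditions
```

The three conditions are mutually exclusive, so the order of the checks does not affect the result.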

15. The apparatus of claim 13 or 14, wherein the page splitting process occurs after the data reading apparatus obtains the first parent data page; before the transceiver module sends the target data page to the computing node, the transceiver module is further configured to:

send first information to the computing node, wherein the first information is used by the computing node to determine the target data page;

receive a read request for the target data page from the computing node.

16. The apparatus of claim 15, wherein the first information comprises: a first LSN of a log triggering the page splitting process, and indication information indicating that the page splitting process occurred after the data reading apparatus obtained the first parent data page.

17. The apparatus of claim 13 or 14, wherein the page splitting process occurs before the first parent data page is obtained by the data reading apparatus, the target data page being the first data page in the data reading apparatus.

18. The apparatus of claim 13, wherein a data playback speed of the data reading apparatus is slower than a data playback speed of the computing node and the page splitting process is yet to occur in the data reading apparatus, the target data page being the first data page in the data reading apparatus; the processing module is further configured to, before the transceiver module sends the first data page to the computing node: determine that the data reading apparatus has finished playing back the log whose LSN is the first LSN.
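
A minimal sketch of how a storage-side component might wait until log playback has reached the first LSN before sending the page. The class `ReplayProgress` and its methods are hypothetical; the claims do not state how playback progress is tracked, and a condition variable is only one option.

```python
import threading
from typing import Optional


class ReplayProgress:
    """Tracks the highest LSN whose log has been played back (assumed model)."""

    def __init__(self, applied_lsn: int = 0) -> None:
        self._applied_lsn = applied_lsn
        self._cond = threading.Condition()

    def advance(self, lsn: int) -> None:
        """Record that playback has completed up to and including lsn."""
        with self._cond:
            if lsn > self._applied_lsn:
                self._applied_lsn = lsn
                self._cond.notify_all()

    def wait_until_applied(self, lsn: int, timeout: Optional[float] = None) -> bool:
        """Block until playback reaches lsn; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._applied_lsn >= lsn, timeout)
```

A request handler would call `wait_until_applied(first_lsn)` and send the first data page only once it returns `True`.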

19. A data reading apparatus, comprising:

a processing module configured to determine an identifier of a data page to be read as a first identifier, wherein the data page to be read is a child data page of a first parent data page in the computing node;

a transceiver module configured to send a data page read request to a storage node, wherein the data page read request comprises the first identifier, and the first identifier is used by the storage node to determine the occurrence time of a page splitting process corresponding to a first data page identified by the first identifier and to send a target data page to the data reading apparatus according to the occurrence time;

the transceiver module is further configured to receive the target data page from the storage node, wherein the target data page is a second data page or the first data page;

before the transceiver module receives the target data page from the storage node:

the transceiver module is further configured to receive first information from the storage node, wherein the first information comprises: a first log sequence number (LSN) of a log triggering the page splitting process, and indication information, wherein the indication information indicates that the page splitting process occurs after the storage node obtains the first parent data page;

the processing module is further configured to determine the target data page according to the first information;

the transceiver module is further configured to send a request for reading the target data page to the storage node.

20. The apparatus of claim 19, wherein the processing module is specifically configured to:

determine the target data page according to the second parent data page.

21. The apparatus of claim 19, wherein the processing module is specifically configured to:

22. The apparatus of claim 21, wherein the data reading apparatus stores full split information, the full split information including an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the processing module is specifically configured to: determine, according to the first information, that a second identifier exists among the identifiers of split data pages included in the full split information and that the LSN of the log triggering the corresponding page splitting process is the first LSN, wherein the second identifier is the identifier of the first parent data page.

23. The apparatus of claim 21, wherein the data reading apparatus does not store full split information, and a management node stores the full split information; the full split information comprises an identifier of the split data page corresponding to each page splitting process and an LSN of the log triggering each page splitting process;

the processing module is specifically configured to:

control, according to the indication information, the transceiver module to send a query request to the management node, wherein the query request comprises the first LSN and a second identifier, the second identifier being the identifier of the first parent data page, and the query request instructs the management node to determine, according to the full split information, whether the first parent data page corresponds to a page splitting process in the storage node;

determine that the first parent data page corresponds to a page splitting process in the storage node according to a query result received by the transceiver module from the management node, wherein the query result indicates that the first parent data page corresponds to a page splitting process in the storage node.

24. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the method of any one of claims 1-6, any one of claims 7-11, or claim 12.

25. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6, any one of claims 7-11, or claim 12.