CN118349173A - Data processing method, device, computing equipment and storage medium - Google Patents


Info

  • Publication number: CN118349173A
  • Application number: CN202410445368.2A
  • Authority: CN (China)
  • Prior art keywords: request, storage node, level, disk, state
  • Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
  • Other languages: Chinese (zh)
  • Inventor: 刘易 (Liu Yi)
  • Current Assignee: Shanghai Bilibili Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
  • Original Assignee: Shanghai Bilibili Technology Co Ltd
  • Application filed by Shanghai Bilibili Technology Co Ltd
  • Priority to CN202410445368.2A (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
  • Publication of CN118349173A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a data processing method, a data processing apparatus, a computing device, and a storage medium. The method comprises the following steps: receiving a client request and determining the current disk state of a storage node; if the disk state is a first state, acquiring a first threshold matched with the first state, wherein the disk state of the storage node is the first state when a disk in the storage node is executing a merging operation; acquiring the current request load of the storage node; and if the request load exceeds the first threshold, refusing to process the client request. With this scheme, when the storage node executes a merging operation, the processing effect of the client requests the storage node is currently handling is guaranteed, performance bottlenecks at the storage node are avoided, and the frequency of request timeouts and request failures is reduced.

Description

Data processing method, device, computing equipment and storage medium
Technical Field
The present application relates to the field of data storage technology, and in particular, to a data processing method, apparatus, computing device, computer storage medium, and computer program product.
Background
With the continuous development of technology and society, conventional disk-based relational storage systems can no longer meet high-performance, low-latency storage requirements, and disk-based LSM-Tree non-relational storage systems have therefore emerged. An LSM-Tree based non-relational storage system stores data in the form of key-value pairs and is simple to store into, efficient, and easy to scale, so it is widely applied in large-scale data storage scenarios.
An LSM-Tree based non-relational disk storage system performs Compaction operations on SSTables. A Compaction occupies substantial CPU and disk I/O resources, causing system performance jitter, degrading the processing of the client requests currently being handled, and leading to more request timeouts, request failures, and the like.
Disclosure of Invention
The present application has been developed in response to the above-discussed problems, and in order to provide a data processing method, apparatus, computing device, computer storage medium, and computer program product that overcome, or at least partially solve, the above-discussed problems.
According to a first aspect of the present application there is provided a data processing method performed in a storage node employing a log structured merge tree architecture, the method comprising:
Receiving a client request, and determining the current disk state of the storage node;
If the disk state is a first state, acquiring a first threshold matched with the first state; when a disk in the storage node is executing merging operation, the disk state of the storage node is a first state;
Acquiring the current request load of the storage node;
and if the request load exceeds the first threshold, refusing to process the client request.
In an alternative embodiment, the method further comprises:
If the disk state is the second state, acquiring a second threshold matched with the second state; when the disk in the storage node does not execute the merging operation, the disk state of the storage node is in a second state, and the second threshold is larger than the first threshold;
and if the request load exceeds the second threshold, refusing to process the client request.
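The dual-threshold admission control of the two aspects above can be sketched as follows; the class name, state labels, and threshold values are illustrative assumptions, not part of the claimed method:

```python
# Illustrative sketch of the dual-threshold admission control described
# above. All names and numeric values are assumptions for illustration.

COMPACTING = "first_state"   # disk is executing a merging operation
IDLE = "second_state"        # disk is not executing a merging operation

class AdmissionController:
    def __init__(self, first_threshold, second_threshold):
        # The second threshold must be larger than the first: a node that
        # is not compacting can absorb more concurrent requests.
        assert second_threshold > first_threshold
        self.thresholds = {COMPACTING: first_threshold, IDLE: second_threshold}

    def admit(self, disk_state, request_load):
        """Return True if a newly received client request may be processed;
        the request is refused only when the load exceeds the threshold."""
        return request_load <= self.thresholds[disk_state]

ctrl = AdmissionController(first_threshold=800, second_threshold=1500)
```

A load exactly at the threshold is still admitted here, since the text refuses requests only when the load exceeds the matched threshold.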
In an alternative embodiment, the method further comprises:
performing a stress test on the storage node;
determining a first request amount at which the storage node reaches a performance bottleneck in the first state during the stress test, and determining the first threshold according to the first request amount;
and/or determining a second request amount at which the storage node reaches a performance bottleneck in the second state during the stress test, and determining the second threshold according to the second request amount.
In an alternative embodiment, the method further comprises:
acquiring historical request data of the storage node;
determining a request peak time of the storage node according to the historical request data;
and generating a merging operation instruction in an idle time before the request peak time, so as to execute a merging operation on a disk in the storage node according to the merging operation instruction; wherein the request load of the storage node during the idle time is less than the first threshold.
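The peak-time detection and idle-slot selection described in this embodiment might look like the following sketch, assuming hour-bucketed historical request counts; the bucketing, helper names, and numbers are illustrative assumptions:

```python
# Sketch of locating the request-peak window from historical per-hour
# request counts and choosing an idle slot before it (assumptions).

def find_peak_hour(hourly_requests):
    """hourly_requests: list of 24 counts, index = hour of day."""
    return max(range(len(hourly_requests)), key=lambda h: hourly_requests[h])

def pick_idle_hour(hourly_requests, peak_hour, first_threshold):
    """Latest hour before the peak whose load stays under the first threshold."""
    for h in range(peak_hour - 1, -1, -1):
        if hourly_requests[h] < first_threshold:
            return h
    return None  # no qualifying idle slot before the peak

load = [120, 90, 60, 50, 70, 150, 400, 900, 1400, 1300, 1100, 900,
        800, 700, 650, 600, 700, 900, 1200, 1500, 1600, 1000, 500, 200]
peak = find_peak_hour(load)                     # busiest historical hour
idle = pick_idle_hour(load, peak, first_threshold=800)
```

The merging operation instruction would then be scheduled for the returned idle hour, so the Compaction completes before the peak arrives.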
In an alternative embodiment, the method further comprises:
calculating the data writing speed of a disk in the storage node;
acquiring the total capacity and the remaining capacity of each level of the disk;
predicting the trigger time and/or the merge level of the next merging operation according to the data writing speed and the total capacity and remaining capacity of each level;
and if the merge level is greater than a preset level and/or the trigger time falls within the request peak time, executing the step of generating a merging operation instruction in the idle time before the request peak time, so as to execute the merging operation on the disk in the storage node according to the merging operation instruction.
In an optional implementation, predicting the trigger time and/or the merge level of the next merging operation according to the data writing speed and the total capacity and remaining capacity of each level comprises:
determining the trigger time according to the ratio of the remaining capacity of the first level to the data writing speed, and determining that the first level triggers a merging operation;
judging whether the remaining capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level, where the initial value of i is 1;
if yes, determining that the (i+1)-th level does not trigger a merging operation, and taking the levels less than or equal to i as the merge levels;
if not, determining that the (i+1)-th level triggers a merging operation, increasing i by 1, and executing again the step of judging whether the remaining capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level.
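Under the stated rules, the recursive trigger-time and merge-level prediction can be sketched as below; the function name, units, and capacity figures are illustrative assumptions:

```python
# Sketch of the recursive trigger-time / merge-level prediction above.
# Capacities and write speed share units (here MB and MB per minute);
# all numbers are illustrative assumptions.

def predict_compaction(write_speed, total, remaining):
    """total / remaining: per-level capacities, index 0 = level 1.

    Returns (trigger_time, merge_level): when the next merging operation
    starts, and the deepest level that triggers a merge."""
    # Level 1 fills at the incoming write speed, so it triggers first.
    trigger_time = remaining[0] / write_speed
    i = 1  # deepest level currently known to trigger a merge
    # A merge of level i spills up to total[i-1] into level i+1; the
    # cascade stops at the first level able to absorb that spill.
    while i < len(total) and remaining[i] < total[i - 1]:
        i += 1
    return trigger_time, i

t, level = predict_compaction(
    write_speed=10.0,             # MB written per minute
    total=[100, 1000, 10000],     # total capacity of levels 1..3
    remaining=[50, 80, 5000])     # remaining capacity of levels 1..3
```

Here level 2 cannot absorb level 1's spill (80 < 100), so the cascade reaches level 2, while level 3 can absorb level 2's spill (5000 >= 1000) and stops it.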
In an alternative embodiment, the generating, at an idle time before the request peak time, a merge operation instruction for performing a merge operation according to the merge operation instruction on a disk in the storage node includes:
Generating a merging operation instruction aiming at a preset level in idle time before the request peak time so as to execute merging operation at the preset level according to the merging operation instruction;
Wherein the preset hierarchy is determined by:
determining the disk write amount of the storage node during the request peak time according to the historical request data;
and determining the preset level according to the disk write amount and the total capacity of each level, wherein the total capacity of the preset level is higher than the disk write amount.
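A minimal sketch of choosing the preset level from the peak-time disk write amount and the per-level capacities; names and numbers are illustrative assumptions:

```python
# Sketch: pick the lowest level whose total capacity exceeds the disk
# write amount expected during the request peak (assumed figures).

def choose_preset_level(peak_write_amount, level_totals):
    """Return the 1-based level whose capacity covers the peak writes."""
    for idx, capacity in enumerate(level_totals, start=1):
        if capacity > peak_write_amount:
            return idx
    return len(level_totals)  # fall back to the deepest level

preset = choose_preset_level(peak_write_amount=600,          # MB at peak
                             level_totals=[100, 1000, 10000])
```

Pre-compacting this level in the idle time leaves it enough free space that peak-time writes do not trigger merges at or above it.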
According to a second aspect of the present application there is provided a data processing apparatus at a storage node employing a log structured merge tree architecture, the apparatus comprising:
the receiving module is used for receiving the client request;
The state determining module is used for determining the current disk state of the storage node;
the threshold determining module is used for acquiring a first threshold matched with the first state if the disk state is the first state; when a disk in the storage node is executing merging operation, the disk state of the storage node is a first state;
the load acquisition module is used for acquiring the current request load of the storage node;
and the decision module is used for refusing to process the client request if the request load exceeds the first threshold value.
In an alternative embodiment, the threshold determination module is configured to: if the disk state is the second state, acquiring a second threshold matched with the second state; when the disk in the storage node does not execute the merging operation, the disk state of the storage node is in a second state, and the second threshold is larger than the first threshold;
The decision module is used for: and if the request load exceeds the second threshold, refusing to process the client request.
In an alternative embodiment, the threshold determination module is configured to: perform a stress test on the storage node; determine a first request amount at which the storage node reaches a performance bottleneck in the first state during the stress test; and determine the first threshold according to the first request amount;
and/or determine a second request amount at which the storage node reaches a performance bottleneck in the second state during the stress test, and determine the second threshold according to the second request amount.
In an alternative embodiment, the apparatus further comprises: the instruction generation module is used for acquiring history request data of the storage node; determining a request peak time of the storage node according to the historical request data;
generating a merging operation instruction in an idle time before the request peak time, so as to execute a merging operation on a disk in the storage node according to the merging operation instruction; wherein the request load of the storage node during the idle time is less than the first threshold.
In an alternative embodiment, the instruction generation module is configured to: calculating the data writing speed of a disk in the storage node;
acquiring the total capacity and the remaining capacity of each level of the disk;
predicting the trigger time and/or the merge level of the next merging operation according to the data writing speed and the total capacity and remaining capacity of each level;
and if the merge level is greater than a preset level and/or the trigger time falls within the request peak time, executing the step of generating a merging operation instruction in the idle time before the request peak time, so as to execute the merging operation on the disk in the storage node according to the merging operation instruction.
In an alternative embodiment, the instruction generation module is configured to: determining the triggering time according to the ratio of the residual capacity of the first level to the data writing speed, and determining the triggering merging operation of the first level;
Judging whether the residual capacity of the (i+1) th level is larger than or equal to the total capacity of the (i) th level; the initial value of i is 1;
if yes, determining that the (i+1) th level does not trigger merging operation, and taking the level smaller than or equal to i as a merging level;
If not, determining that the (i+1) -th level triggers merging operation; and after the i is increased by 1, executing the step of judging whether the residual capacity of the (i+1) th level is larger than the total capacity of the (i) th level.
In an alternative embodiment, the instruction generation module is configured to: generate a merging operation instruction for a preset level in an idle time before the request peak time, so as to execute a merging operation at the preset level according to the merging operation instruction; the preset level is determined by: determining the disk write amount of the storage node during the request peak time according to the historical request data;
and determining the preset level according to the disk write amount and the total capacity of each level, wherein the total capacity of the preset level is higher than the disk write amount.
According to a third aspect of the present application there is provided a computing device comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the data processing method.
According to a fourth aspect of the present application, there is provided a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the above-described data processing method.
According to a fifth aspect of the present application, there is provided a computer program product comprising at least one executable instruction for causing a processor to perform operations corresponding to the above-described data processing method.
When a disk of the storage node is executing a merging operation, if the current request load of the storage node exceeds the first threshold matched with the first state, processing of the newly received client request is refused. Thus, while the storage node executes the merging operation, the processing effect of the client requests it is currently handling is guaranteed, a performance bottleneck at the storage node is avoided, and the frequency of request timeouts and request failures is reduced.
The application adopts different control thresholds (a first threshold and a second threshold) depending on whether the disk is executing a merging operation, and uses these different control thresholds to regulate requests. This realizes differentiated flow control and improves the flow-control effect: resource waste at the storage node caused by a control threshold that is too small is avoided, performance bottlenecks caused by a control threshold that is too large are avoided, and more request timeouts and failures are prevented.
In the present application, the first threshold and/or the second threshold are determined according to stress test results, so that the settings of the first and second thresholds are adapted to the performance of the actual storage node, improving the precision of request control.
The application generates the merging operation instruction in an idle time, and the merging operation instruction triggers the merging operation of the disk. Because the idle time lies before the peak time of the storage node, the merging operation can be finished before the request peak arrives; executing the merging operation during the request peak time is avoided, the probability of a performance bottleneck at the storage node is reduced, and the processing stability of client requests is ensured. Moreover, the request load of the storage node during the idle time is less than the first threshold, which further keeps the storage node away from its performance bottleneck.
The application predicts the trigger time and/or the merge level of the next merging operation according to the data writing speed and the total capacity and remaining capacity of each level. If the merge level is greater than the preset level and/or the trigger time falls within the request peak time, a merging operation instruction is generated in the idle time before the request peak time, so that the merging operation is executed on the disk in the storage node in advance; the merging operation, which consumes considerable system resources, is thereby completed before the peak time arrives, and a performance bottleneck at the storage node is avoided.
The application judges, in a recursive manner and level by level, whether the merging operation of an upper level triggers a merging operation of the level below it, thereby improving the prediction precision of the trigger time and the merge level.
The application generates the merging operation instruction for the preset level in an idle time during the request valley period before the request peak period, so that the merging operation can be executed in advance during the request valley period, and a performance bottleneck of the system is avoided.
The application executes the merging operation at the preset level according to the merging operation instruction, so that the preset level has more free space to accommodate data, and merging operations at the preset level and higher levels are prevented from being triggered during the peak period.
The application determines the preset level according to the disk write amount during the request peak time and the total capacity of each level, so that the preset level can accommodate the disk writes of the peak time, and merging operations at the preset level and higher levels are prevented from being triggered during the peak time.
The foregoing is only an overview of the technical solution of the present application. In order that the technical means of the present application may be more clearly understood and implemented in accordance with the contents of the specification, and that the above and other objects, features, and advantages of the present application may be more readily apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a storage node according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a data processing method according to a first embodiment of the present application;
fig. 4 is a schematic flow chart of a data processing method according to a second embodiment of the present application;
fig. 5 is a schematic flow chart of a data processing method according to a third embodiment of the present application;
fig. 6 is a schematic flow chart of a data processing method according to a fourth embodiment of the present application;
fig. 7 is a flow chart illustrating a trigger time and merge level prediction method according to a fourth embodiment of the present application;
fig. 8 is a schematic flow chart of a data processing method according to a fifth embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to a sixth embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computing device according to a seventh embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
First, terms related to one or more embodiments of the present application will be explained.
LSM-Tree (Log-Structured-Merge-Tree) is a hierarchical, ordered disk-oriented data structure. The LSM-Tree structure stores the operation records such as data insertion, modification, deletion and the like in the memory, and writes the operation records into the disk in sequence in batches after the operation records reach a certain data volume.
SSTable (Sorted String Table) is the data structure of the LSM-Tree on disk. An SSTable stores, in order, the operation records written from memory.
Compaction (merge operation): the LSM-Tree adopts a multi-level topology on disk, with each level containing one or more SSTables. When the total capacity of the SSTables at a level exceeds a certain threshold, a Compaction is triggered, merging the SSTables at that level into the next level; this process of moving data from one level of the LSM-Tree to the next is Compaction.
Cascading merge: when the merge operation performed at one level writes into the next level and thereby causes the next level to perform a merge operation in turn, the process is called a cascading merge.
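The Compaction and cascading-merge behaviour defined in the two terms above can be modelled in a few lines; the capacity figures are illustrative assumptions (leveled-compaction engines such as RocksDB use a comparable size-based trigger):

```python
# Toy model of size-triggered Compaction in a leveled LSM-Tree: when the
# SSTable data at one level exceeds that level's capacity, it is merged
# into the next level, which may then overflow in turn (cascading merge).
# All sizes and capacities are illustrative assumptions.

def compact_if_needed(level_sizes, capacities):
    """level_sizes[i]: bytes of SSTable data at level i. Mutates in place."""
    for i in range(len(level_sizes) - 1):
        if level_sizes[i] > capacities[i]:
            # Merge this level's SSTables into the next level down.
            level_sizes[i + 1] += level_sizes[i]
            level_sizes[i] = 0
    return level_sizes

# Level 0 overflows into level 1, which then overflows into level 2:
sizes = compact_if_needed([120, 950, 3000], capacities=[100, 1000, 10000])
```

The second merge here happens only because the first one pushed level 1 over its capacity, which is exactly the cascading-merge case.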
The following examples illustrate the implementation of the inventive arrangements in detail.
The scheme provided by the application can be applied to a storage system comprising one or more storage nodes, and the storage nodes adopt a log structure merging tree structure. For example, the storage system shown in fig. 1 may be employed, where the storage system shown in fig. 1 is a distributed system, and includes a plurality of storage nodes, and the storage nodes include a master node and a slave node. Each storage node comprises a service layer (Server) and a disk storage, wherein the disk storage is a disk-oriented database system adopting a log-structured merge tree architecture, and for example, the disk storage can be RocksDB (an embeddable persistent key-value storage based on a log-structured merge tree) and the like. In addition, the client may send data requests to the storage system, including write requests and read requests, with the corresponding requests being processed by respective storage nodes in the storage system.
The storage node in the present application adopts a log-structured merge tree architecture. As shown in fig. 2, the disk store comprises a cache portion and a disk portion. The cache contains a MemTable, a data structure used to store the most recently updated data, organized in order by key; the cache also contains an ImmuTable (Immutable MemTable): when the MemTable reaches a certain size, it is converted into an Immutable MemTable, an intermediate state between MemTable and SSTable. The disk portion includes the WAL and SSTables at multiple levels (Level 0-n); after the ImmuTable reaches a certain size, its data is flushed to disk. When the size or number of SSTables at a certain level reaches the corresponding threshold, a Compaction of that level is triggered on the disk, merging the SSTable data at that level into the next level. The service layer in the storage node forwards write requests and read requests initiated by the client to the disk store. After receiving a write request, the disk store writes it into the MemTable in order and records it in the WAL on disk to facilitate fault recovery. On receiving a read request, the disk store first queries the MemTable in the cache and returns the result if found; otherwise it queries the ImmuTable and returns the result if found; otherwise it queries the disk.
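The read path just described (MemTable, then ImmuTable, then disk) can be sketched with plain dictionaries standing in for the real structures; the dict stand-ins are an illustrative assumption:

```python
# Sketch of the LSM-Tree read path described above: query the MemTable
# first, then the immutable MemTable, then the on-disk SSTables.
# Dicts stand in for the real structures (illustrative assumption).

def read(key, memtable, immutable, disk):
    for store in (memtable, immutable, disk):  # newest data consulted first
        if key in store:
            return store[key]
    return None  # key not present anywhere

# Three versions of key "a": the MemTable holds the newest one.
memtable = {"a": 3}
immutable = {"a": 2, "b": 2}
disk = {"a": 1, "b": 1, "c": 1}
```

Because the stores are consulted newest-first, a key updated in the MemTable shadows the stale copies in the ImmuTable and on disk.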
Example 1
Fig. 3 is a schematic flow chart of a data processing method according to a first embodiment of the present application. The method provided in this embodiment is performed in a storage node that adopts a log-structured merge tree architecture, and may be performed by a service layer in the storage node, for example.
Specifically, as shown in fig. 3, the method includes the steps of:
Step S301, a client request is received, and the current disk state of the storage node is determined.
The client request is a request sent by the client to the storage node, and may include a write request and/or a read request. In the prior art, the service layer of a storage node typically forwards a client request to the disk store for processing as soon as it is received. Unlike the prior art, the service layer of the storage node in the present application monitors the disk state of the storage node and, after receiving a client request, regulates it according to the current disk state.
Specifically, the application divides the disk state into a first state and a second state according to whether the disk in the storage node is executing a merging operation (Compaction): the disk state is the first state when a merging operation is being executed on the disk, and the second state when no merging operation is being executed. Thus, after a client request is received, the monitored current disk state is obtained.
In an alternative embodiment, the current disk state of the storage node may be determined as follows: when the disk store in the storage node starts to execute a merging operation, it notifies the service layer of the storage node, for example via a callback function, and the service layer marks the disk state of the storage node as the first state; after the merging operation ends, the disk store again notifies the service layer via the callback mechanism, and the service layer marks the disk state of the storage node as the second state. In this manner, the disk state of the storage node can be determined accurately.
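The callback-based state tracking of this alternative embodiment might be sketched as follows; the class and method names are illustrative assumptions:

```python
# Sketch of callback-based disk-state tracking: the disk store invokes
# these callbacks when a Compaction begins and ends, and the service
# layer marks the node's state accordingly. Names are assumptions.

class ServiceLayer:
    def __init__(self):
        self.disk_state = "second_state"  # no merging operation initially

    def on_compaction_begin(self):        # callback from the disk store
        self.disk_state = "first_state"

    def on_compaction_end(self):          # callback from the disk store
        self.disk_state = "second_state"

svc = ServiceLayer()
svc.on_compaction_begin()   # disk store reports a merging operation started
```

A real embedded store would register such hooks through its event-listener interface; the admission logic then only needs to read `disk_state` when a request arrives.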
Step S302, if the current disk state is a first state, a first threshold matched with the first state is obtained; when the disk in the storage node is executing the merging operation, the disk state of the storage node is a first state.
The first threshold is a request amount threshold matched with the first state, i.e., the request amount threshold employed to regulate client requests while the node is in the first state.
In an alternative embodiment, before this step is performed, the first threshold may be determined in advance as follows: perform a stress test on the storage node, determine the first request amount at which the storage node reaches a performance bottleneck in the first state during the stress test, and determine the first threshold according to the first request amount. Specifically, the request load on the storage node is increased continuously; when a performance index of the storage node (such as CPU utilization or memory utilization) reaches a preset index threshold, the storage node has reached its performance bottleneck, and if the storage node is in the first state at that moment, the number of client requests it is then processing is the first request amount. Alternatively, a merging operation instruction may be generated so that the disk in the storage node executes a merging operation and the node is held in the first state; the storage node is then stress-tested rapidly so that it quickly reaches its performance bottleneck, yielding the first request amount at the performance bottleneck in the first state. The first threshold is then derived from the first request amount; for example, the first threshold may be equal to or near the first request amount. Generating the first threshold from stress test results in this way adapts the first threshold to the performance of the actual storage node and improves the precision of request control.
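The stress-test derivation of the first threshold amounts to ramping the offered load until a performance metric crosses its limit; in the sketch below the synthetic CPU curve and all numbers are illustrative assumptions:

```python
# Sketch of deriving the first threshold from a stress test: increase the
# offered request load step by step and record the last load whose
# performance metric stays within the preset limit (the measured
# "first request amount"). All numbers are illustrative assumptions.

def find_bottleneck_load(metric_at_load, metric_limit, step=50, max_load=10000):
    load = 0
    while load + step <= max_load and metric_at_load(load + step) < metric_limit:
        load += step
    return load

def cpu(load):
    # Synthetic CPU-utilisation curve while a Compaction runs (assumption).
    return 20 + load * 0.08

# The first threshold is taken at (or near) the measured bottleneck load.
first_threshold = find_bottleneck_load(cpu, metric_limit=90)
```

The same ramp with the node held in the second state (no Compaction running) would yield the larger second threshold.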
Step S303, obtaining the current request load of the storage node.
The current request load of the storage node is the number of client requests currently carried by the storage node, i.e., the total number of client requests the storage node is currently processing.
In step S304, if the request load exceeds the first threshold, the processing of the client request is refused.
If the request load of the storage node exceeds the first threshold and the storage node nevertheless continues to accept client requests, the processing of the client requests currently being handled is easily degraded and a performance bottleneck at the storage node easily arises. In view of this, when the current request load of the storage node exceeds the first threshold, processing of the client request is refused, that is, the client request is not forwarded to the disk store, thereby safeguarding the processing of the client requests currently being handled and avoiding excessive request timeouts or failures.
In the data processing method provided by this embodiment of the application, therefore, when a merging operation is executing on a disk of the storage node (the disk state is the first state), if the current request load of the storage node exceeds the first threshold matched with the first state, processing of the newly received client request is refused. Thus, while the storage node executes the merging operation, the processing effect of the client requests it is currently handling is guaranteed, a performance bottleneck at the storage node is avoided, and the frequency of request timeouts and request failures is reduced.
Example two
Fig. 4 is a flow chart of a data processing method according to a second embodiment of the present application. The method provided in this embodiment is performed in a storage node that adopts a log-structured merge tree architecture, and may be performed by a service layer in the storage node, for example.
Specifically, as shown in fig. 4, the method includes the steps of:
in step S401, a client request is received, where the client request carries the retry number.
After a client sends a client request, if it does not receive a result for that request, the client retries by initiating the client request again. In this embodiment of the application, each client request carries the retry count corresponding to the current request, i.e., the number of retries already performed for it.
In an alternative embodiment, the client request may also carry a retry threshold, which is the configured maximum total number of retries for the client request.
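A client-side retry loop that stamps each attempt with its retry count and the retry threshold might look like this sketch; the field and function names are illustrative assumptions:

```python
# Sketch of a client retrying a request, stamping each attempt with its
# retry count and the configured retry threshold (names are assumptions).

def send_with_retries(do_request, retry_threshold=3):
    for retry_count in range(retry_threshold + 1):  # initial try + retries
        request = {"retry_count": retry_count,
                   "retry_threshold": retry_threshold}
        result = do_request(request)
        if result is not None:      # a request result was received
            return result, retry_count
    return None, retry_threshold    # gave up after the threshold

# Simulated storage node that only answers from the third attempt onward.
answer_from = 2
result, retries = send_with_retries(
    lambda req: "ok" if req["retry_count"] >= answer_from else None)
```

Carrying the retry count lets the storage node see how many times a request has already failed when deciding, under load, whether to refuse it again.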
Step S402, determining the current disk state of the storage node; if the disk state is the first state, executing step S403; if the disk state is the second state, executing step S405.
When a disk in a storage node is executing merging operation, the disk state of the storage node is a first state; and when the disk in the storage node does not execute the merging operation, the disk state of the storage node is a second state.
Step S403, a first threshold value matching the first state is acquired.
Step S404, judging whether the current request load of the storage node exceeds a first threshold value; if yes, go to step S407; if not, step S411 is performed.
When a disk in the storage node is executing a merging operation, client requests are regulated using the first threshold. Specifically, if the current request load of the storage node exceeds the first threshold, continuing to accept client requests could easily cause a performance bottleneck, so step S407 is executed to further determine whether the client request needs to be refused; if the current request load does not exceed the first threshold, step S411 is executed so that the storage node processes the client request, sending it to the disk store for data reading and writing.
Step S405, a second threshold matching the second state is acquired.
The second threshold is the request amount threshold matched with the second state, that is, the threshold used to regulate client requests while in the second state. The second threshold is greater than the first threshold.
In an alternative embodiment, the second threshold may be determined in advance as follows: a pressure test is performed on the storage node, a second request amount is determined at the moment the storage node reaches a performance bottleneck while in the second state, and the second threshold is determined according to the second request amount. Specifically, the request volume applied to the storage node is increased continuously; when a performance index of the storage node (such as CPU utilization or memory utilization) reaches a preset index threshold, the storage node has reached its performance bottleneck, and if the storage node is in the second state, the number of client requests it is processing at that moment is the second request amount. The second threshold is then derived from the second request amount; for example, the second threshold may be equal to or close to the second request amount. Generating the second threshold from a pressure test result matches the threshold to the actual performance of the storage node and improves the precision of request regulation.
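The stress-test derivation above can be sketched as follows. The index limits and the safety margin below are illustrative assumptions; the application only requires that the threshold equal or approach the measured bottleneck load.

```python
# Hedged sketch: derive a request-amount threshold from stress-test samples.
def find_bottleneck_load(samples, cpu_limit=0.9, mem_limit=0.9):
    """samples: list of (request_count, cpu_util, mem_util) tuples recorded
    while the applied request volume is increased step by step. Returns the
    request count at which a performance index first reaches its limit."""
    for count, cpu, mem in samples:
        if cpu >= cpu_limit or mem >= mem_limit:
            return count
    return samples[-1][0]  # no bottleneck observed within the test range

def derive_threshold(bottleneck_load, margin=0.8):
    # The threshold may equal or approach the measured load; keeping a
    # margin below it is one possible (assumed) choice.
    return int(bottleneck_load * margin)
```

Running the same test once with the disk compacting and once without yields the first and second thresholds respectively.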
Step S406, judging whether the current request load of the storage node exceeds a second threshold value; if yes, go to step S407; if not, step S411 is performed.
When no disk in the storage node is executing a merging operation, client requests are regulated using the second threshold. Specifically, if the current request load of the storage node exceeds the second threshold, continuing to accept client requests could easily cause a performance bottleneck, so step S407 is executed to further determine whether the client request needs to be refused; if the current request load does not exceed the second threshold, step S411 is executed so that the storage node processes the client request, sending it to the disk store for data reading and writing.
Step S407, judging whether the retry number of the client request exceeds a retry threshold; if yes, go to step S411; if not, step S408 is performed.
A retry threshold is preset and may, for example, be carried in the client request. If the retry number of the current client request exceeds the retry threshold, the request has already been retried many times, so step S411 is executed to process it, thereby meeting the user's needs and reducing request failures; if the retry number does not exceed the retry threshold, the request has been retried only a few times, so step S408 is executed to refuse the client request, which is not sent to the disk store of the storage node.
Step S408, identifying the request type of the client request; if the write type is the write type, executing step S409; if the read type is the case, step S410 is performed.
If it is determined that the current client request needs to be refused, different implementations are further adopted according to the request type. In particular, received client requests may be divided into a write type and a read type. A write-type client request involves writing data and is therefore received by the master node in a storage system of master-slave structure; a read-type client request involves only querying data and can therefore be received by either the master node or a slave node in a storage system of master-slave structure.
Step S409, discard the client request.
If the current client request is of the write type, the client request is discarded (Drop); because the request has not reached the retry threshold, the client will perform a request retry. The retry interval increases with the number of retries, which avoids wasting resources.
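A retry interval that grows with the retry count, as mentioned above, is commonly realized as exponential backoff. The following is a hedged sketch of one such policy; the base interval and cap are assumptions, not values from the application.

```python
# Hedged sketch: retry interval that doubles with each retry, up to a cap.
def retry_interval(retry_number: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Seconds to wait before retry number `retry_number` (1-based)."""
    return min(cap, base * (2 ** (retry_number - 1)))
```

The cap prevents the wait from growing without bound once a request has been retried many times.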
In an alternative embodiment, if it is monitored that the index value such as the current system load or the request failure rate of the storage node exceeds the preset index threshold, this step may be directly performed to discard the client request.
Step S410, forwarding the client request to other storage nodes.
If the current client request is of the read type, the client request is forwarded to other storage nodes in the storage system of the master-slave structure, which process the client request.
Step S411, the client request is processed.
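The decision flow of steps S401 through S411 can be summarized in one function. This is a non-authoritative sketch: the request representation and threshold values are illustrative assumptions.

```python
# Hedged sketch of the full flow of this embodiment: choose the threshold by
# disk state (S402-S406), check the retry number (S407), then branch on the
# request type (S408-S410) or process the request (S411).
def handle(req, disk_compacting, load, first_thr, second_thr, retry_thr):
    """req: dict with 'retries' and 'type' ('read' or 'write').
    Returns one of 'process', 'drop', 'forward'."""
    threshold = first_thr if disk_compacting else second_thr
    if load <= threshold:
        return "process"            # load acceptable: S411
    if req["retries"] > retry_thr:
        return "process"            # many retries already: serve it anyway
    if req["type"] == "write":
        return "drop"               # S409: the client will retry later
    return "forward"                # S410: read goes to another storage node
```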
Therefore, in the data processing method provided by this embodiment, different regulation thresholds are adopted depending on whether the disk is executing a merging operation, and requests are regulated with these thresholds. This achieves differentiated flow control and improves its effect: a threshold set too small wastes storage node resources, while a threshold set too large causes performance bottlenecks of the storage node and more request timeouts or failures, and both are avoided.
Embodiment Three
Fig. 5 shows a flow chart of a data processing method according to a third embodiment of the present application. The method provided in this embodiment is performed in a storage node that adopts a log-structured merge tree architecture, and may be performed by a service layer in the storage node, for example.
Specifically, as shown in fig. 5, the method includes the steps of:
Step S501, history request data of a storage node is acquired.
Specifically, the history request data of the storage node includes data such as the number of client requests received by the storage node at each sampling time within a history time window. For example, the number of requests received by the storage node at each sampling time over the past week may be obtained.
Step S502, determining the request peak time of the storage node according to the history request data.
The number of client requests received by a storage node is often periodic; for example, the daily variation of the number of client requests tends to be consistent, fluctuating over the course of the day. The peak period of the number of client requests received by the storage node can therefore be counted from the historical request data; this peak period is the request peak period. For example, a curve of the daily request count may be plotted and its peak determined; the time interval corresponding to the peak is the request peak period.
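One possible way to obtain the request peak period from historical samples is to average the counts per hour of day over the window and take the hours near the maximum. The 90% cut-off below is an assumption for illustration.

```python
# Hedged sketch: find the daily peak hours from (hour_of_day, count) samples.
from collections import defaultdict

def peak_hours(samples, ratio=0.9):
    """samples: list of (hour_of_day, request_count) over the history window.
    Returns the sorted hours whose average load is within `ratio` of the peak."""
    totals, counts = defaultdict(float), defaultdict(int)
    for hour, n in samples:
        totals[hour] += n
        counts[hour] += 1
    avg = {h: totals[h] / counts[h] for h in totals}
    top = max(avg.values())
    return sorted(h for h, v in avg.items() if v >= ratio * top)
```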
Step S503, a merging operation instruction is generated at an idle time before the request peak period, so that a disk in the storage node executes the merging operation according to the instruction; the request load of the storage node at the idle time is less than the first threshold.
The idle time in the present application is a time earlier than the request peak period at which the request load of the storage node is less than the first threshold. After the merging operation instruction is generated at the idle time, the disk store executes the instruction to trigger the merging operation on the disk, so that the merging operation completes before the request peak period. The embodiment of the application does not limit the specific way the merging operation instruction is generated; for example, compaction may be triggered through an interface such as CompactRange or CompactFiles.
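Issuing the merging operation instruction at the chosen idle time can be sketched as a simple scheduler. Here `trigger_compaction` stands in for whatever interface the disk store exposes (such as a CompactRange call); the callback name and the timer-based scheduling are assumptions for illustration.

```python
# Hedged sketch: run the (hypothetical) compaction trigger at the idle time.
import threading

def schedule_compaction(seconds_until_idle, trigger_compaction):
    """Arrange for trigger_compaction() to run after the given delay."""
    timer = threading.Timer(seconds_until_idle, trigger_compaction)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```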
Therefore, in the data processing method provided by this embodiment, a merging operation instruction is generated at an idle time, triggering the merging operation of the disk. Because the idle time precedes the peak period of the storage node, the merging operation finishes before the request peak; performing the merging operation during the request peak period is thus avoided, the probability of a performance bottleneck is reduced, and the stability of client request processing is ensured. In addition, because the request load of the storage node at the idle time is less than the first threshold, a performance bottleneck of the storage node is further avoided.
Embodiment Four
Fig. 6 is a schematic flow chart of a data processing method according to a fourth embodiment of the present application. The method provided in this embodiment is performed in a storage node that adopts a log-structured merge tree architecture, and may be performed by a service layer in the storage node, for example.
Specifically, as shown in fig. 6, the method includes the steps of:
step S601, obtaining history request data of a storage node, and determining a request peak time of the storage node according to the history request data.
Step S602, calculating the data writing speed of the disk in the storage node.
Within a historical time section (such as the past week), the amount of data written to the storage node's disk is counted over at least one sampling period (for example, each hour as one sampling period), and the data writing speed of a sampling period is the ratio of that amount of data to the duration of the period. An average data writing speed may be derived from the speeds of the sampling periods, or the speed corresponding to each sampling period may be recorded separately.
When calculating the data writing speed of the disk, the current sampling period may be determined and its corresponding data writing speed used as the disk's data writing speed in this step; alternatively, the average data writing speed may be used as the disk's data writing speed in this step.
Step S603, obtaining the total capacity and the residual capacity of each level of the disk.
In a storage node employing a log-structured merge tree architecture, a disk contains multiple levels, each containing one or more SSTables. Each level has a corresponding total capacity; for example, the total capacity of each level is typically 10 times that of the previous level. The total capacity of a level is the maximum amount of data the level can accommodate; when the occupied capacity of a level (the amount of data written into it) exceeds its total capacity, the merging operation of that level is automatically triggered, merging the SSTables in that level and writing them into the next level. The remaining capacity of a level = total capacity - occupied capacity.
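The capacity model above can be captured in a few lines. The 10x growth factor is a common leveled-compaction default, used here as an assumption.

```python
# Hedged sketch of the per-level capacity model: each level's total capacity
# is 10x the previous one, and remaining = total - occupied.
def level_capacities(n_levels, l1_total_mb=100):
    """Total capacity (MB) of each level, index 0 = first level."""
    return [l1_total_mb * (10 ** i) for i in range(n_levels)]

def remaining(totals, occupied):
    """Remaining capacity per level given occupied amounts."""
    return [t - o for t, o in zip(totals, occupied)]
```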
Step S604, the trigger time and/or merging level of the next merging operation is predicted according to the data writing speed and the total capacity and remaining capacity of each level.
The data writing speed can reflect the speed of writing the disk data, the total capacity and the residual capacity of each level can reflect the capacity characteristics of each level of the disk, and the triggering time and/or the merging level of the next automatic triggering merging operation can be estimated according to the data writing speed, the total capacity and the residual capacity of each level. The triggering time is the predicted time for automatically triggering the disk merging operation after the data is continuously written into the disk, and the merging level is the level where the merging operation occurs in the disk.
In an alternative embodiment, the steps shown in fig. 7 may be specifically used to predict the trigger time and/or the merge level of the next merge operation:
S6041, determining the trigger time according to the ratio of the remaining capacity of the first level to the data writing speed, and determining that the first level triggers the merging operation.
The ratio R1 of the remaining capacity of the first level (the L1 level) to the data writing speed indicates that after a further duration R1, the occupied capacity of the first level will reach its total capacity, at which point the merging operation is triggered at the first level.
S6042, judging whether the residual capacity of the (i+1) th level is larger than or equal to the total capacity of the (i) th level; if yes, go to step S6043; if not, go to step S6044.
Here i has an initial value of 1. If the i-th level triggers the merging operation, the SSTables in the i-th level are merged and written into the (i+1)-th level. This step determines whether the merging operation of the (i+1)-th level is triggered after the i-th level's data is merged into it, by judging whether the remaining capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level.
S6043, determining that the (i+1) th level does not trigger merging operation, and taking a level less than or equal to i as a merging level.
If the remaining capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level, it is determined that the (i+1)-th level does not trigger the merging operation; since the (i+1)-th level does not trigger merging, no cascade into subsequent levels occurs. Therefore the levels less than or equal to i are taken as merging levels, the trigger time corresponding to each merging level is the trigger time determined in step S6041, and the method ends.
S6044, determining that the (i+1) th level triggers merging operation, taking a level smaller than or equal to i+1 as a merging level, and increasing i by 1.
If the remaining capacity of the (i+1)-th level is smaller than the total capacity of the i-th level, it is determined that the (i+1)-th level triggers the merging operation, and the levels less than or equal to i+1 are taken as merging levels; the trigger time corresponding to each merging level is the trigger time determined in step S6041. Then i is incremented by 1 and the next judgment is made.
S6045, judging whether i is smaller than N; if yes, go to step S6042; if not, the method ends.
Where N is the highest level of the disk. If i = N, i is already the highest level, all levels have been judged, and the method ends; if i < N, the merging judgment continues with the subsequent levels.
For example, the trigger time of the next merging operation is determined to be T according to the ratio of the remaining capacity of the first level to the data writing speed (corresponding to step S6041), and the first level triggers the merging operation. Step S6042 is then executed (at this point i = 1) to judge whether the remaining capacity of the second level is greater than or equal to the total capacity of the first level; if yes, step S6043 is executed to determine that the second level does not trigger the merging operation, and only the first level is taken as a merging level; if not, step S6044 is executed to determine that the second level triggers the merging operation, the first and second levels are taken as merging levels, and i is incremented to 2.
Step S6045 is then executed (N = 4, i = 2); since i < N, step S6042 is executed to judge whether the remaining capacity of the third level is greater than or equal to the total capacity of the second level; if yes, step S6043 is executed to determine that the third level does not trigger the merging operation, and the first and second levels are taken as merging levels; if not, it is determined that the third level triggers the merging operation, the first, second, and third levels are taken as merging levels, and i is incremented to 3.
Step S6045 is executed again (N = 4, i = 3); since i < N, step S6042 is executed to judge whether the remaining capacity of the fourth level is greater than or equal to the total capacity of the third level; if yes, step S6043 is executed to determine that the fourth level does not trigger the merging operation, and the first, second, and third levels are taken as merging levels; if not, it is determined that the fourth level triggers the merging operation, the first, second, third, and fourth levels are taken as merging levels, and i is incremented to 4.
Step S6045 is executed once more (N = 4, i = 4); since i = N, the method ends.
By adopting the prediction mode, the triggering time of the next merging operation and each merging level can be accurately predicted.
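The prediction steps S6041 through S6045 can be sketched as a single function: the trigger time is the first level's remaining capacity divided by the write speed, and the cascade of merging levels stops at the first level whose remaining capacity can absorb the total capacity of the level below it. The function signature and units are assumptions for illustration.

```python
# Hedged sketch of steps S6041-S6045.
def predict_next_merge(remaining, totals, write_speed):
    """remaining/totals: per-level lists (index 0 = first level).
    Returns (trigger_time, merge_levels), levels numbered from 1."""
    trigger_time = remaining[0] / write_speed      # S6041
    merge_levels = [1]                             # first level always merges
    i = 1
    n = len(totals)
    while i < n:
        # S6042: can level i+1 absorb the whole of level i?
        if remaining[i] >= totals[i - 1]:
            break                                  # S6043: cascade stops here
        merge_levels.append(i + 1)                 # S6044: level i+1 merges too
        i += 1                                     # S6045: continue upward
    return trigger_time, merge_levels
```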
In yet another alternative embodiment, since in practice the total capacity of each level is typically 10 times that of the previous level, a rough prediction may ignore the effect of upper-level data: the trigger time of the merging operation of each level = the remaining capacity of that level / the data writing speed. This improves prediction efficiency.
Step S605, if the merging level is greater than a preset level and/or the trigger time falls in the request peak period, a merging operation instruction is generated at an idle time before the request peak period, so that the merging operation is executed according to the instruction.
Specifically, the larger the merging level, the longer the merging operation takes and the more system resources it consumes, so its impact on request processing is greater. The present application may set a preset level, such as the third level. When the merging level is greater than the preset level, the merging operation will consume more system resources and take longer. If, in addition, the trigger time falls in the request peak period, a merging operation instruction is generated at an idle time before the request peak period, so that a disk in the storage node executes the merging operation according to the instruction and the resource-intensive merging operation is performed in advance, before the peak period arrives.
In an alternative embodiment, if the trigger time predicted by the storage node is the same as the trigger times predicted by other storage nodes, the storage nodes generate their merging operation instructions at different times, preventing the disk merging operations in multiple storage nodes from being triggered simultaneously and avoiding a performance bottleneck of the storage system.
Therefore, in the data processing method provided by this embodiment, the trigger time and/or merging level of the next merging operation are predicted from the data writing speed and the total and remaining capacities of each level. If the merging level is greater than the preset level and/or the trigger time falls in the request peak period, a merging operation instruction is generated at an idle time before the request peak period so that a disk in the storage node executes the merging operation according to the instruction. The resource-intensive merging operation is thus performed in advance, before the peak period arrives, and a performance bottleneck of the storage node is avoided.
Embodiment Five
Fig. 8 is a schematic flow chart of a data processing method according to a fifth embodiment of the present application. The method provided in this embodiment is performed in a storage node that adopts a log-structured merge tree architecture, and may be performed by a service layer in the storage node, for example.
Specifically, as shown in fig. 8, the method includes the steps of:
Step S801, obtain history request data of the storage node, and determine a request peak period and a request valley period of the storage node according to the history request data.
The number of client requests fluctuates over the course of a day, so the peak periods of the number of client requests received by the storage node can be counted from the historical request data; these are the request peak periods. Correspondingly, the trough periods of the number of client requests received by the storage node can also be counted from the historical request data; these are the request valley periods.
Step S802, determining an idle time from a request valley period preceding a request peak period.
A request valley period preceding the request peak period is determined, and an idle time is chosen from that valley period; the request load of the storage node at the idle time is less than the first threshold.
Step S803, a merging operation instruction for the preset level is generated at the idle time, so that the merging operation is executed at the preset level according to the instruction.
Since a merging operation at a higher level involves a cascade of merges across multiple levels, and the higher the level the larger the amount of merged data, a higher-level merging operation consumes more system resources. In view of this, a merging operation instruction for a preset level is generated at an idle time before the request peak period, so that the merging operation is executed at the preset level according to the instruction. For example, the preset level may be the third level and/or the fourth level. After the preset level performs the merging operation, it can accommodate more data.
In an alternative embodiment, the preset level may be determined as follows: the disk write amount of the storage node during the request peak period is determined from the historical request data, and the preset level is determined from the disk write amount and the total capacity of each level, where the total capacity of the preset level is higher than the disk write amount. In this way, the preset level can accommodate the disk writes of the peak period, preventing merging operations of the preset level and higher levels from being triggered during the peak period. For example, if the disk write amount during the daily peak period is 8 GB and the total capacities of the levels (L0, L1, ..., Ln) of the RocksDB LSM-Tree are 10 MB, 100 MB, 1 GB, 10 GB, ..., 10^(n+1) MB, then L3 or L4 may be used as the preset level.
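The preset-level choice can be sketched as picking the lowest level whose total capacity exceeds the expected peak-period write volume. The capacities below follow the 10x-per-level RocksDB example given above; the function itself is an illustrative assumption.

```python
# Hedged sketch: choose the preset level from the peak-period write amount.
def choose_preset_level(peak_write_mb, level_totals_mb):
    """level_totals_mb: total capacity per level, index 0 = L0.
    Returns the 0-based index of the first level that can hold the writes."""
    for idx, cap in enumerate(level_totals_mb):
        if cap > peak_write_mb:
            return idx
    return len(level_totals_mb) - 1  # fall back to the highest level
```

With the capacities from the example (L0 = 10 MB, each level 10x larger), an 8 GB peak write amount selects L3, matching the L3-or-L4 conclusion above.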
Therefore, in the data processing method provided by this embodiment, a merging operation instruction for the preset level is generated at an idle time within the request valley period preceding the request peak period, and the merging operation is executed at the preset level according to the instruction. The preset level thus gains more free space to accommodate data, preventing merging operations of the preset level and higher levels from being triggered during the peak period.
Embodiment Six
Fig. 9 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present application. The apparatus is located at a storage node employing a log structured merge tree architecture. As shown in fig. 9, the apparatus 900 includes: a receiving module 901, a state determining module 902, a threshold determining module 903, a load obtaining module 904, and a decision module 905.
A receiving module 901, configured to receive a client request;
a state determining module 902, configured to determine a current disk state of the storage node;
The threshold determining module 903 is configured to obtain a first threshold that is matched with the first state if the disk state is the first state; when a disk in the storage node is executing merging operation, the disk state of the storage node is a first state;
a load obtaining module 904, configured to obtain a current request load of the storage node;
a decision module 905, configured to refuse to process the client request if the request load exceeds the first threshold.
In an alternative embodiment, the threshold determination module is configured to: if the disk state is the second state, acquiring a second threshold matched with the second state; when the disk in the storage node does not execute the merging operation, the disk state of the storage node is in a second state, and the second threshold is larger than the first threshold;
The decision module is used for: and if the request load exceeds the second threshold, refusing to process the client request.
In an alternative embodiment, the threshold determination module is configured to: perform a pressure test on the storage node; determine a first request amount at which the storage node reaches a performance bottleneck while in the first state during the pressure test, and determine the first threshold according to the first request amount; and/or determine a second request amount at which the storage node reaches a performance bottleneck while in the second state during the pressure test, and determine the second threshold according to the second request amount.
In an alternative embodiment, the apparatus further comprises: the instruction generation module is used for acquiring history request data of the storage node; determining a request peak time of the storage node according to the historical request data;
generating a merging operation instruction in idle time before the request peak time so as to execute merging operation on a disk in the storage node according to the merging operation instruction; wherein a request load of the storage node for the idle time is less than a first threshold.
In an alternative embodiment, the instruction generation module is configured to: calculating the data writing speed of a disk in the storage node;
acquiring the total capacity and the residual capacity of each level of the magnetic disk;
Predicting the triggering time and/or merging level of the next merging operation according to the data writing speed, the total capacity and the residual capacity of each level;
and if the merging level is greater than a preset level and/or the trigger time falls in the request peak period, generating a merging operation instruction at an idle time before the request peak period, so that a disk in the storage node executes the merging operation according to the instruction.
In an alternative embodiment, the instruction generation module is configured to: determining the triggering time according to the ratio of the residual capacity of the first level to the data writing speed, and determining the triggering merging operation of the first level;
Judging whether the residual capacity of the (i+1) th level is larger than or equal to the total capacity of the (i) th level; the initial value of i is 1;
if yes, determining that the (i+1) th level does not trigger merging operation, and taking the level smaller than or equal to i as a merging level;
If not, determining that the (i+1)-th level triggers the merging operation; and after incrementing i by 1, executing the step of judging whether the remaining capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level.
In an alternative embodiment, the instruction generation module is configured to: determining a request valley period of the storage node according to the historical request data;
the idle time is determined from a request valley period preceding the request peak period.
In an alternative embodiment, the instruction generation module is configured to: generating a merging operation instruction aiming at a preset level in idle time before the request peak time so as to execute merging operation at the preset level according to the merging operation instruction; the preset level is determined by the following steps: determining the disk writing quantity of the storage node in the request peak time according to the historical request data;
determining a preset level according to the writing quantity of the magnetic disk and the total capacity of each level; wherein the total capacity of the preset level is higher than the writing capacity of the magnetic disk.
Therefore, in the data processing apparatus provided by this embodiment of the application, when a disk in the storage node is executing a merging operation (the disk state is the first state) and the current request load of the storage node exceeds the first threshold matched with the first state, processing of the newly received client request is refused. In this way, while the storage node executes the merging operation, the processing of the client requests currently being handled is protected, performance bottlenecks of the storage node are avoided, and the frequency of request timeouts or request failures is reduced.
Embodiment Seven
Fig. 10 is a schematic structural diagram of a computing device according to a seventh embodiment of the present application; the specific implementation of the computing device is not limited by this embodiment.
As shown in fig. 10, the computing device may include: a processor 1002, a communication interface 1004, a memory 1006, and a communication bus 1008.
Wherein: the processor 1002, communication interface 1004, and memory 1006 communicate with each other via a communication bus 1008. Communication interface 1004 is used for communicating with network elements of other devices, such as clients or other servers. The processor 1002 is configured to execute the program 1010, and may specifically perform relevant steps in the above-described data processing method embodiment.
In particular, program 1010 may include program code including computer operating instructions.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 1006 is configured to store the program 1010. The memory 1006 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory. The program 1010 may specifically be used to cause the processor 1002 to perform the steps of the data processing method described above.
Example eight
An eighth embodiment of the present application provides a non-volatile computer storage medium in which at least one executable instruction or a computer program is stored; the executable instruction or the computer program may cause a processor to perform operations corresponding to the data processing method in any of the foregoing method embodiments.
Example nine
Embodiments of the present application provide a computer program product comprising at least one executable instruction or a computer program for causing a processor to perform operations corresponding to the data processing method in any of the above-described method embodiments.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. In addition, the embodiments of the present application are not directed to any particular programming language. It will be appreciated that the teachings of the present application described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided to disclose the enablement and best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present application can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (11)

1. A data processing method, the method performed in a storage node employing a log structured merge tree architecture, the method comprising:
Receiving a client request, and determining the current disk state of the storage node;
If the disk state is a first state, acquiring a first threshold matched with the first state; when a disk in the storage node is executing merging operation, the disk state of the storage node is a first state;
Acquiring the current request load of the storage node;
and if the request load exceeds the first threshold, refusing to process the client request.
2. The method according to claim 1, wherein the method further comprises:
If the disk state is a second state, acquiring a second threshold matched with the second state; when the disk in the storage node is not executing the merging operation, the disk state of the storage node is the second state, and the second threshold is greater than the first threshold;
and if the request load exceeds the second threshold, refusing to process the client request.
3. The method according to claim 2, wherein the method further comprises:
performing a pressure test on the storage node;
determining a first request amount when the storage node is at a performance bottleneck in the first state during the pressure test, and determining the first threshold according to the first request amount;
and/or determining a second request amount when the storage node is at a performance bottleneck in the second state during the pressure test, and determining the second threshold according to the second request amount.
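The threshold derivation of claim 3 can be sketched as follows. The claim only states that each threshold is determined according to the request amount measured at the performance bottleneck; the 0.8 safety factor and function name below are hypothetical illustration choices.

```python
def derive_threshold(bottleneck_request_amount: int, safety_factor: float = 0.8) -> int:
    """Set an admission threshold some margin below the load at which the
    storage node hit its performance bottleneck during the pressure test."""
    return int(bottleneck_request_amount * safety_factor)

# first threshold from the bottleneck measured while a merge was running,
# second threshold from the bottleneck measured with no merge running
first_threshold = derive_threshold(500)
second_threshold = derive_threshold(1200)
```

Since a node under compaction bottlenecks at a lower load, this naturally yields a second threshold larger than the first, as claim 2 requires.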
4. A method according to any one of claims 1-3, characterized in that the method further comprises:
acquiring history request data of the storage node;
determining a request peak time of the storage node according to the historical request data;
generating a merging operation instruction at an idle time before the request peak time, so as to execute the merging operation on the disk in the storage node according to the merging operation instruction; wherein a request load of the storage node in the idle time is less than the first threshold.
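One possible reading of claim 4's scheduling can be sketched as follows. The hourly bucketing of historical request data and the helper names are assumptions for illustration; the claim does not prescribe a time granularity.

```python
def busiest_hour(hourly_load: dict) -> int:
    """Request peak period derived from historical request data
    (hour-of-day -> average request load)."""
    return max(hourly_load, key=hourly_load.get)

def idle_hour_before_peak(hourly_load: dict, first_threshold: int):
    """Latest hour before the peak whose load stays below the first threshold;
    the merging operation instruction would be generated in this hour."""
    peak = busiest_hour(hourly_load)
    for hour in range(peak - 1, -1, -1):
        if hourly_load.get(hour, 0) < first_threshold:
            return hour
    return None  # no sufficiently idle hour found before the peak
```

Searching backwards from the peak picks the idle slot closest to it, so the merge finishes as late as possible before load rises.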
5. The method according to claim 4, wherein the method further comprises:
calculating the data writing speed of a disk in the storage node;
acquiring the total capacity and the residual capacity of each level of the magnetic disk;
Predicting the triggering time and/or merging level of the next merging operation according to the data writing speed, the total capacity and the residual capacity of each level;
and if the merging level is greater than a preset level and/or the trigger time falls within the request peak time, executing the step of generating a merging operation instruction at an idle time before the request peak time, so as to execute the merging operation on the disk in the storage node according to the merging operation instruction.
6. The method of claim 5, wherein predicting a trigger time and/or a merge level of a next merge operation based on the data writing speed and a remaining capacity of the respective levels comprises:
determining the trigger time according to a ratio of the residual capacity of the first level to the data writing speed, and determining that the first level triggers the merging operation;
judging whether the residual capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level, wherein an initial value of i is 1;
if yes, determining that the (i+1) th level does not trigger merging operation, and taking the level smaller than or equal to i as a merging level;
if not, determining that the (i+1)-th level triggers the merging operation; and after increasing i by 1, executing the step of judging whether the residual capacity of the (i+1)-th level is greater than or equal to the total capacity of the i-th level.
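The level-by-level iteration of claim 6 can be sketched as follows. This is a sketch, not the claimed implementation: 0-indexed Python lists stand in for levels 1..n, and the function name is hypothetical.

```python
def predict_merge(write_speed: float, total_cap: list, remaining_cap: list):
    """Predict the trigger time of the next merging operation and the deepest
    level that will participate in it.

    total_cap[k] / remaining_cap[k] describe level k+1 (index 0 is level 1).
    Returns (trigger_time, merge_level).
    """
    # Trigger time: how long until level 1 fills at the current write speed.
    trigger_time = remaining_cap[0] / write_speed
    i = 1
    # Level i+1 also triggers a merge while its remaining capacity cannot
    # absorb the total capacity of level i; stop at the first level that can.
    while i < len(total_cap) and remaining_cap[i] < total_cap[i - 1]:
        i += 1
    return trigger_time, i  # levels 1..i form the merging level
```

For example, with nearly full upper levels the cascade runs deep, which is exactly the case where claim 5 would schedule the merge ahead of the peak.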
7. The method of any of claims 4-6, wherein the generating a merging operation instruction at an idle time before the request peak time, so as to execute the merging operation on the disk in the storage node according to the merging operation instruction, comprises:
generating a merging operation instruction for a preset level at an idle time before the request peak time, so as to execute the merging operation at the preset level according to the merging operation instruction;
wherein the preset level is determined by: determining a disk writing amount of the storage node in the request peak time according to the historical request data; and determining the preset level according to the disk writing amount and the total capacity of each level, wherein the total capacity of the preset level is higher than the disk writing amount.
8. A data processing apparatus, the apparatus being located at a storage node employing a log structured merge tree architecture, the apparatus comprising:
the receiving module is used for receiving the client request;
The state determining module is used for determining the current disk state of the storage node;
the threshold determining module is used for acquiring a first threshold matched with the first state if the disk state is the first state; when a disk in the storage node is executing merging operation, the disk state of the storage node is a first state;
the load acquisition module is used for acquiring the current request load of the storage node;
and the decision module is used for refusing to process the client request if the request load exceeds the first threshold value.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the data processing method according to any one of claims 1 to 7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the data processing method of any one of claims 1-7.
11. A computer program product comprising at least one executable instruction for causing a processor to perform operations corresponding to the data processing method according to any one of claims 1 to 7.
CN202410445368.2A 2024-04-12 2024-04-12 Data processing method, device, computing equipment and storage medium Pending CN118349173A (en)


Publications (1)

Publication Number Publication Date
CN118349173A true CN118349173A (en) 2024-07-16

Family

ID=91820426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410445368.2A Pending CN118349173A (en) 2024-04-12 2024-04-12 Data processing method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118349173A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination