CN117336248A - Flow limiting method, device and medium for distributed object storage - Google Patents

Flow limiting method, device and medium for distributed object storage

Info

Publication number
CN117336248A
Authority
CN
China
Prior art keywords
flow
metadata
bucket
data
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311305149.6A
Other languages
Chinese (zh)
Inventor
高志远
王陈幸
聂城星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Macrosan Technologies Co Ltd
Original Assignee
Macrosan Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Macrosan Technologies Co Ltd filed Critical Macrosan Technologies Co Ltd
Priority to CN202311305149.6A priority Critical patent/CN117336248A/en
Publication of CN117336248A publication Critical patent/CN117336248A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/21Flow control; Congestion control using leaky-bucket
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/50Queue scheduling
    • H04L47/56Queue scheduling implementing delay-aware scheduling

Abstract

The embodiments of the specification provide a flow limiting method, device, medium and electronic equipment for distributed object storage, wherein the method comprises the following steps: the storage node receives flow data, wherein the flow data comprises a flow type identifier and flow entity data, and the flow entity data comprises object actual data and object metadata; in the case that the flow type of the flow data is identified as a recovery type, judging whether the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is greater than a preset metadata flow threshold of the leaky bucket; if yes, generating a leaky bucket delay task corresponding to the flow data, and placing the leaky bucket delay task into the leaky bucket cache queue for delay processing; if not, updating the accumulated metadata flow weight of the leaky bucket to be the current accumulated metadata flow weight plus the metadata flow weight, and issuing the flow.

Description

Flow limiting method, device and medium for distributed object storage
Technical Field
The embodiments of the specification relate to the technical field of object storage, and in particular to a flow limiting method, device, medium and electronic equipment for distributed object storage.
Background
Data recovery is an integral part of object storage. When abnormal conditions such as disk failure or node power loss occur in the hardware equipment, the storage system reconstructs the data while normal service processing, timed lifecycle execution and the like continue to run. The different service flows compete for the order in which the hardware processes them, so the quality of service of a single service type cannot be guaranteed.
In existing technical schemes, priority scheduling in a QoS (Quality of Service) scheduling algorithm is generally adopted to guarantee the quality of service of a single service type: a service type with a higher priority is processed directly, while a service type with a lower priority is subjected to flow limiting through a leaky bucket.
In such a scheme, the capacity threshold of the leaky bucket is fixed, and traffic exceeding the capacity threshold of the leaky bucket is discarded.
Disclosure of Invention
In view of this, one or more embodiments of the present specification provide a method, apparatus, medium, and electronic device for flow restriction of distributed object storage.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present disclosure, a flow restriction method for a distributed object store is provided, where the object store includes at least one storage node, the storage node includes at least one disk, each disk includes a leaky bucket, and the leaky bucket includes a leaky bucket buffer queue;
the method comprises the following steps:
the storage node receives traffic data, wherein the traffic data comprises a traffic type identifier and traffic entity data, and the traffic entity data comprises object actual data and object metadata;
judging, in the case that the flow type of the flow data is identified as a recovery type, whether the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is greater than a preset metadata flow threshold of the leaky bucket; the metadata flow weight is determined from the flow type identifier of the flow data based on the mapping relation between flow type identifiers and metadata flow weights;
if yes, generating a leaky bucket delay task corresponding to the flow data, and placing the leaky bucket delay task into the leaky bucket cache queue for delay processing;
if not, updating the accumulated metadata flow weight of the leaky bucket to be the current accumulated metadata flow weight plus the metadata flow weight, and issuing the flow.
According to a second aspect of one or more embodiments of the present specification, there is provided a flow restriction device for a distributed object store, the object store comprising at least one storage node, the storage node comprising at least one disk, each disk comprising a leaky bucket, the leaky bucket comprising a leaky bucket cache queue;
the device comprises:
the flow receiving module is used for receiving flow data by the storage node, wherein the flow data comprises object actual data and object metadata;
the judging module is used for judging, in the case that the flow type of the flow data is identified as a recovery type, whether the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is greater than a preset metadata flow threshold of the leaky bucket; the metadata flow weight is determined from the flow type identifier of the flow data based on the mapping relation between flow type identifiers and metadata flow weights;
and the flow processing module is used for generating a leaky bucket delay task corresponding to the flow data and placing the leaky bucket delay task into the leaky bucket cache queue for delay processing if the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is greater than the preset metadata flow threshold of the leaky bucket; and for updating the accumulated metadata flow weight of the leaky bucket to be the current accumulated metadata flow weight plus the metadata flow weight and issuing the flow if the sum is not greater than the preset metadata flow threshold of the leaky bucket.
According to a third aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any of the embodiments of the present specification by executing the executable instructions.
According to a fourth aspect of one or more embodiments of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of any of the embodiments of the present description.
According to the flow limiting method, device, medium and electronic equipment for distributed object storage disclosed by the embodiments of the specification, a cache queue is set for the leaky bucket, and flow data exceeding the metadata flow threshold is placed into the cache queue, so that the flow data is not discarded; the processing of the flow data is delayed through the flow limiting of the leaky bucket, so that the aim of limiting IOPS can be achieved.
Drawings
In order to more clearly illustrate the technical solutions of one or more embodiments of the present disclosure or related technologies, the following description will briefly describe the drawings that are required to be used in the embodiments or related technology descriptions, and it is apparent that the drawings in the following description are only some embodiments described in one or more embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a flow chart illustrating a method of traffic limiting for distributed object storage according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a method of traffic restriction for distributed object storage according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a method of traffic limiting for distributed object storage according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a method of traffic restriction for distributed object storage according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flow chart illustrating a method of traffic restriction for distributed object storage according to an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a flow restriction device for distributed object storage according to an exemplary embodiment of the present disclosure;
fig. 7 is a hardware schematic of an apparatus shown in an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of the embodiments of the present description as detailed in the accompanying claims.
The terminology used in the embodiments of the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the description presented herein. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the embodiments of the present description. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
For ease of understanding, some concepts related to the embodiments of the present description are explained:
Object: the basic entity stored in an object storage system, an aggregate of a file's data and its related attribute information. An object consists of the object actual data, the object metadata and an object key.
Object actual data: the data portion of an object, typically unstructured data.
Object metadata: the attribute information of an object, a set of key-value pairs; colloquially, the attributes of the object, such as its modification time, storage type, size, owner and the like.
Distributed storage: data is stored in a scattered manner across a plurality of storage nodes, and the scattered storage resources form a virtual storage system.
Payload: the load of an RSocket message on the back-end network. RSocket is the communication mode the storage system uses in the back-end network for data exchange among nodes, and each message to be transmitted is carried and sent by a Payload.
IOPS (Input/Output Operations Per Second): the number of read or write operation requests that can be handled per unit time; one of the main indicators for measuring the performance of a disk.
Bandwidth: measures the speed at which objects are written to the disk per unit time, in MB/s.
Callback function: unlike a programmer directly calling a function (call) through a command to complete a call from a higher layer to the bottom layer, a callback function (callback) requires passing a preset function from the higher layer to the bottom layer, and the bottom layer calls back the higher layer's preset function when execution reaches an expected step. The object storage system implements non-blocking operations on the data stream through asynchronous callbacks.
Channel: producers and consumers each need to establish a Connection with RabbitMQ, which is a TCP connection. Once the TCP connection is established, the client can create AMQP channels (Channel), each of which is assigned a unique ID. A channel is a virtual connection established over the Connection, and every AMQP instruction processed by RabbitMQ is completed through a channel.
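To make the Connection/Channel relationship concrete, the following is a minimal sketch using the standard com.rabbitmq.client Java API; the host, queue name and prefetch value here are illustrative assumptions, not values fixed by this embodiment.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ChannelDemo {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                     // one TCP Connection per client
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel(); // virtual connection with its own ID
            channel.queueDeclare("EC_ERROR", true, false, false, null); // assumed queue name
            channel.basicQos(5);                          // prefetch: at most 5 unacked messages
        }
    }
}
```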
When abnormal conditions such as disk damage or node power failure occur in the hardware equipment of the distributed object storage, the distributed object storage system reconstructs the data through redundancy strategies such as EC erasure coding and multiple copies. The system sends the portions of the data stored on the remaining nodes to the affected node through the back-end network, and that node stores the reconstructed data onto its disks; the data stream formed by this data is the recovery flow data. Likewise, tasks such as data migration also generate data recovery traffic. Different types of traffic data exist in the system; for example, the data stream produced when an upper-layer service is processed is called upper-layer service traffic, and the data stream of the periodically executed lifecycle batch-deletion task is called lifecycle batch-deletion traffic, etc.
For example, suppose an object A has 6 data blocks stored evenly across three nodes. Data block 1 and data block 2 of object A are stored on node 1, whose disk is damaged and goes offline. Node 2 and node 3 send data blocks 3, 4, 5 and 6 of object A to node 1 so that the two missing data blocks can be calculated. The data streams of the blocks sent by node 2 and node 3 to node 1, together with the two data blocks recalculated by node 1, are called recovery traffic data.
In the background art, when guaranteeing the quality of service of a single service type, the capacity threshold of the leaky bucket is fixed, so that when the leaky bucket limits a service type with lower priority, traffic exceeding the leaky bucket threshold is discarded. To solve this problem, the embodiments of the specification provide a flow limiting method for distributed object storage.
FIG. 1 is a flow chart illustrating a method of traffic limiting for distributed object storage in accordance with an exemplary embodiment.
As shown in fig. 1, the method may include the following processes. It should be noted that, in the embodiment of the present disclosure, the execution sequence of each step is not limited, and the sequence numbers of the steps in the following description are not limiting the sequence of the steps:
In step 101, a storage node receives traffic data.
In this step, the object store includes at least one storage node, where the storage node includes at least one disk, each disk includes a leaky bucket and a traffic statistics unit, and the leaky bucket includes a leaky bucket buffer queue.
The traffic data includes a traffic type identification and traffic entity data including object actual data and object metadata.
It should be noted that, the traffic data is an abstract concept, and in this embodiment of the present disclosure, the traffic data represents traffic data of a backend network RSocket, which includes two parts, one part is a traffic type identifier PayloadType for identifying traffic, and the other part is a traffic entity PayloadData, i.e. a byte stream.
In step 102, the storage node determines whether the traffic type is recovery traffic.
The storage node receives the traffic data and judges the type of the traffic data through the traffic type identifier carried in the traffic data.
After receiving the traffic data sent by the back-end network, the storage node parses the traffic type identifier in the traffic data and judges whether the traffic type identifier is contained in a preset recovery set; if the traffic type identifier can be found in the set, the traffic type of the traffic is determined to be the recovery type.
When it is judged that the traffic type is not recovery traffic, the traffic data is given a preset metadata traffic weight, which may be 1, for example. The storage node issues the traffic data and performs classified statistics on the traffic data through the traffic statistics unit.
In this step, the traffic statistics unit performs classified statistics on the various kinds of traffic data, so that the system can adjust at least one of the metadata flow threshold, the preset-period token generation number and the preset concurrency according to the current flow condition of the disk.
For example, after the storage node determines that the traffic type is upper-layer service traffic, it assigns the traffic data a metadata traffic weight of 1 and issues the traffic data. The accumulated upper-layer service traffic weight counted by the traffic statistics unit is increased by 1.
In step 103, after the storage node determines that the traffic type is the recovery type, it is determined whether the sum of the metadata traffic weight and the leaky bucket accumulated metadata traffic weight is greater than a preset metadata traffic threshold.
In this step, a preset metadata traffic weight is assigned to each object metadata; for example, the metadata traffic weight may be 500. The metadata traffic weight may be adjusted according to requirements and the flow limiting policy, and the effect of the weight is analogous to that of the byte length in a general leaky bucket.
For example, the preset metadata traffic threshold is 2000, the metadata traffic weight of each recovery traffic is 500, and the leaky bucket's accumulated metadata traffic weight is 1000 at this time; so when a piece of traffic data is determined to be recovery traffic data, the sum of its metadata traffic weight 500 and the accumulated metadata traffic weight 1000 is 1500, which is smaller than the preset metadata traffic threshold 2000.
In step 104, when the storage node determines that the sum of the metadata traffic weight and the leaky bucket's accumulated metadata traffic weight is not greater than the preset metadata traffic threshold, it updates the leaky bucket's accumulated metadata traffic weight to be the current accumulated metadata traffic weight plus the metadata traffic weight, and issues the traffic.
For example, the preset metadata traffic threshold is 2000, the leaky bucket's accumulated metadata traffic weight is 1000, and the metadata traffic weight of the traffic data is 500; the sum, 1500, is smaller than the preset metadata traffic threshold 2000, so the traffic data is put into the leaky bucket and the leaky bucket's accumulated metadata traffic weight is updated to 1500.
In step 105, when the storage node determines that the sum of the metadata traffic weight and the leaky bucket's accumulated metadata traffic weight is greater than the preset metadata traffic threshold, the traffic data is put into the cache queue for delay processing.
When the storage node judges that the sum of the metadata traffic weight and the leaky bucket's accumulated metadata traffic weight is greater than the preset metadata traffic threshold, it generates a leaky bucket delay task corresponding to the traffic data. The delay task comprises a delay end time and the metadata traffic weight. The delay end time is the current time plus a preset extension time, which may be, for example, 10 seconds.
For example, the preset metadata traffic threshold is 2000, the metadata traffic weight of each recovery traffic is 500, and the leaky bucket's accumulated metadata traffic weight is 2000 at this time; so when a piece of traffic data is judged to be recovery traffic data, the sum of the metadata traffic weight 500 and the accumulated metadata traffic weight 2000 is 2500, which is greater than the preset metadata traffic threshold 2000. A leaky bucket delay task corresponding to the traffic data is then generated, and the delay end time calculated from the preset extension time is recorded in the delay task: if the current time is 15:00 and the preset extension time is 1 min, the delay end time is 15:01.
According to the above scheme, a leaky bucket is arranged on each disk and a cache queue is arranged for the leaky bucket; traffic data exceeding the metadata traffic threshold is put into the cache queue for delay processing, so that data discarding is avoided, and the processing of the traffic data is delayed through the leaky bucket flow limiting, achieving the aim of limiting IOPS per disk.
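As an illustration of steps 103 to 105, the following is a minimal Java sketch of the admission check; the class names LeakyBucket and DelayTask, the field layout and the synchronization scheme are assumptions for exposition, not the embodiment's fixed implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DelayTask {
    final long endTimeMillis;  // delay end time = current time + preset extension time
    final int weight;          // metadata traffic weight of the traffic data
    DelayTask(long endTimeMillis, int weight) {
        this.endTimeMillis = endTimeMillis;
        this.weight = weight;
    }
}

class LeakyBucket {
    final int threshold;              // preset metadata traffic threshold, e.g. 2000
    final long extensionMillis;       // preset extension time
    int accumulatedWeight;            // current accumulated metadata traffic weight
    final Deque<DelayTask> cacheQueue = new ArrayDeque<>();

    LeakyBucket(int threshold, long extensionMillis) {
        this.threshold = threshold;
        this.extensionMillis = extensionMillis;
    }

    /** Returns true if the traffic may be issued immediately, false if it was queued. */
    synchronized boolean admit(int metadataWeight) {
        if (accumulatedWeight + metadataWeight > threshold) {
            // over the threshold: queue a delay task instead of dropping the traffic
            cacheQueue.addLast(new DelayTask(
                    System.currentTimeMillis() + extensionMillis, metadataWeight));
            return false;
        }
        accumulatedWeight += metadataWeight;  // under the threshold: issue the traffic
        return true;
    }
}
```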
How the traffic data is put into the cache queue for delay processing will be exemplarily described below through FIG. 2. FIG. 2 is a flowchart of a method for traffic limiting of distributed object storage according to an exemplary embodiment; as shown in FIG. 2, the flow may include the following processing:
in step 201, the storage node sequentially fetches the delay tasks from the leaky bucket cache queue in the next processing cycle of the leaky bucket.
In this step, the storage node processes the delay task at intervals, for example, 10 ms or 20 ms, which is not limited in this embodiment of the present disclosure.
In step 202, the storage node determines whether the delay task is overtime based on the current time and the end time of the delay task.
Illustratively, the current time is 15:00; the storage node takes a first delay task out of the leaky bucket cache queue, where the delay end time of the delay task is 15:01 and the metadata traffic weight corresponding to the delay task is 500.
In step 203, when the storage node determines that the delay task is overtime, the storage node directly issues traffic data.
Illustratively, the current time is 15:02 and the delay end time of the delay task is 15:01; the delay task is judged to have timed out, and the traffic data is issued directly without flow limiting.
In step 204, when the storage node determines that the delay task is not overtime, it determines whether the sum of the metadata traffic weight and the leaky bucket accumulated metadata traffic weight is greater than a preset metadata traffic threshold.
Illustratively, the current time is 15:00 and the delay end time of the delay task is 15:01, i.e. the delay task has not timed out. Based on the metadata traffic weight 500 of the current delay task and the leaky bucket's current accumulated metadata traffic weight 1000, the storage node judges that their sum, 1500, is not greater than the preset metadata traffic threshold 2000.
In step 205, when the storage node determines that the sum of the metadata traffic weight and the leaky bucket's accumulated metadata traffic weight is not greater than the preset metadata traffic threshold, the accumulated metadata traffic weight of the leaky bucket is updated to be the current accumulated metadata traffic weight plus the metadata traffic weight, and the traffic is issued.
In step 206, when the storage node determines that the sum of the metadata traffic weight and the leaky bucket's accumulated metadata traffic weight is greater than the preset metadata traffic threshold, the delay end time is updated, a leaky bucket delay task corresponding to the traffic data is generated, and the leaky bucket delay task is placed into the leaky bucket cache queue for delay processing.
In this step, the updated delay ending time is obtained according to the current delay ending time of the delay task and a preset extension time.
Illustratively, the current time is 15:00 and the delay end time of the delay task is 15:01; assuming that the preset extension time is 1 min, the updated delay end time is 15:02.
As described above, the flow of FIG. 2 illustrates placing the traffic data into the cache queue for delay processing (see the sketch after this paragraph). According to the flow, the leaky bucket sequentially takes the delay tasks out of the leaky bucket cache queue in the next processing period and judges, based on the current time, whether each delay task has exceeded its delay end time. In the case of timeout, the traffic data corresponding to the delay task is issued directly. In the case that the delay task has not timed out, whether the leaky bucket has reached its processing upper limit is judged based on the metadata traffic weight of the delay task and the accumulated metadata traffic weight of the leaky bucket; if the upper limit is reached, the delay end time is updated and the regenerated delay task is put at the tail of the cache queue; if the upper limit is not reached, the traffic is issued directly. This scheme can reduce the waiting time of the traffic data, avoid discarding of the traffic data, and limit the IOPS of a single disk.
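Continuing the hypothetical LeakyBucket sketch above, the per-cycle drain of steps 201 to 206 might look as follows; issue() is an assumed stand-in for handing the traffic to the next stage.

```java
// Per-cycle drain of the leaky bucket cache queue (steps 201-206), as a method
// of the hypothetical LeakyBucket class sketched earlier.
synchronized void processCycle(long nowMillis) {
    int n = cacheQueue.size();                      // only tasks queued before this cycle
    for (int i = 0; i < n; i++) {
        DelayTask task = cacheQueue.pollFirst();
        if (nowMillis >= task.endTimeMillis) {
            issue(task);                            // timed out: issue directly, no limiting
        } else if (accumulatedWeight + task.weight > threshold) {
            // still over the processing upper limit: extend the end time, re-queue at the tail
            cacheQueue.addLast(new DelayTask(
                    task.endTimeMillis + extensionMillis, task.weight));
        } else {
            accumulatedWeight += task.weight;       // capacity available: issue the traffic
            issue(task);
        }
    }
}

private void issue(DelayTask task) {
    // hand the corresponding traffic data to the downstream stage (e.g. the token bucket)
}
```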
After the IOPS of the recovery traffic data has been limited by the steps of FIG. 1, the bandwidth of the traffic data may further be limited by the token bucket.
FIG. 3 is a flow chart illustrating a method of traffic limiting for distributed object storage in accordance with an exemplary embodiment.
As shown in fig. 3, the method may include the following processes. It should be noted that, in the embodiment of the present disclosure, the execution sequence of each step is not limited, and the sequence numbers of the steps in the following description are not limiting the sequence of the steps:
in step 301, a token bucket receives traffic data issued by a leaky bucket process.
It should be noted that each disk further includes a token bucket, where the token bucket includes a token bucket buffer queue, and tokens of a token number generated by the token bucket in a preset period are generated by the token bucket in each generation period and put into the token bucket. The generation period may be, for example, 10 milliseconds, and the preset period generation token number may be, for example, 26214. The token bucket capacity may have an upper limit, for example 26214, and after the token in the token bucket reaches the upper limit, no more tokens are placed in the token bucket. The number of tokens corresponds to the data size of the data written to the disk. The number of tokens generated in the preset period can be changed according to a current limiting strategy or actual requirements.
In step 302, the storage node determines whether the traffic type is recovery traffic.
In this step, since the leaky bucket flow limiting of FIG. 1 and the token bucket flow limiting mentioned in the embodiments of the specification are handled by two different parts of the system, the traffic type needs to be determined again.
Illustratively, after the token bucket receives the traffic data issued by the leaky bucket processing, the traffic identification information in the traffic data is analyzed to determine the traffic type of the traffic data.
In step 303, the storage node checks the number of tokens cached in the token bucket if the traffic type is identified as a recovery type.
In step 304, if the number of tokens is greater than the number of tokens required by the flow data, the recovery data is calculated from the flow data and written into the disk, and the number of tokens in the token bucket is updated. For example, the number of tokens cached in the token bucket is 1000 and the number of tokens required for the recovery data calculated from the flow data is 100; because the number of cached tokens can meet the requirement, the recovery data is written into the disk and the number of tokens cached in the token bucket is updated.
In step 305, if the number of tokens is smaller than the number of tokens required by the flow data, a token bucket delay task corresponding to the flow data is generated and placed in a token bucket buffer queue for delay processing.
According to the above scheme, a leaky bucket and a token bucket arranged in sequence are provided on each disk, so that objects of different sizes can be limited at the same time. For example, objects greater than 1 MB are called large objects, and objects less than 1 MB are called small objects. For large objects, whose actual data is large, the IOPS may be limited by the leaky bucket and the bandwidth in turn limited by the token bucket, whereas for small objects the IOPS is limited primarily by the leaky bucket. The scheme thus limits the flow of objects of different sizes simultaneously, and since each disk has its own leaky bucket and token bucket, flow limiting at disk granularity is achieved.
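A minimal sketch of the token-bucket stage of steps 301 to 305 follows; the token-to-byte mapping and the example values (capacity 26214, refill per 10 ms period) follow the examples in the text, but the class itself and its names are illustrative assumptions.

```java
class TokenBucket {
    private final long capacity;        // upper limit of cached tokens, e.g. 26214
    private final long tokensPerPeriod; // preset-period token generation, e.g. 26214 per 10 ms
    private long tokens;

    TokenBucket(long capacity, long tokensPerPeriod) {
        this.capacity = capacity;
        this.tokensPerPeriod = tokensPerPeriod;
    }

    /** Invoked once per generation period; never exceeds the capacity upper limit. */
    synchronized void refill() {
        tokens = Math.min(capacity, tokens + tokensPerPeriod);
    }

    /** Consume tokens for `required` bytes; false means the caller queues a token bucket delay task. */
    synchronized boolean tryConsume(long required) {
        if (tokens >= required) {
            tokens -= required;         // enough tokens: calculate recovery data and write to disk
            return true;
        }
        return false;                   // not enough tokens: delay processing
    }
}
```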
How to put token bucket delay tasks into token bucket cache queues for delay processing will be described by way of example with reference to fig. 4, fig. 4 is a flowchart illustrating a method for flow restriction for distributed object storage according to an exemplary embodiment, and as shown in fig. 4, the following processes may be included in the flow:
In step 401, the storage node sequentially fetches the delay tasks from the token bucket cache queue in the next processing cycle of the token bucket.
In this step, the storage node processes the delay task at intervals, for example, 10 ms or 20 ms, which is not limited in this embodiment of the present disclosure.
In step 402, once the number of tokens is greater than the number of tokens required by the flow data, the storage node performs the operation of calculating the recovery data and writing it to the disk, and updates the number of tokens to be the current number of tokens minus the number of tokens required by the flow data.
The operation of calculating the recovery data and writing it into the disk means that, when the distributed object storage adopts redundancy strategies such as EC erasure coding to store data and a certain disk of a storage node fails, the data on that disk can be calculated from the data of the remaining storage nodes by the corresponding EC failure recovery algorithm.
In each processing period, the storage node can continuously judge whether the number of cached tokens in the token bucket can meet the requirement of recovering data or not until the number of tokens in the token bucket meets the requirement.
Illustratively, at each time t_i (i = 1, 2, ...), the sum N_sum of the byte counts of all cached flows in the cache queue is counted within the corresponding processing period. If the ratio of N_sum to the memory N_cache exceeds a set threshold T_p, the rate of throttling through the leaky bucket per request is slowed. Let N_j denote the cached flow corresponding to the delay task with sequence number j; the ratio is calculated as:

N_sum / N_cache = (Σ_j N_j) / N_cache > T_p
It should be noted that the token bucket judges whether the token bucket cache queue has been unable to acquire enough tokens for a long time by calculating, in each processing period, whether the ratio of the sum of bytes corresponding to all flow data in the token bucket cache queue to the memory exceeds a preset token bucket flow limiting threshold. The preset token bucket flow limiting threshold may be, for example, 20%.
When such a situation exists, the storage node increases the metadata traffic weight corresponding to recovery-type traffic; for example, the weight may be adjusted to 1000.
For example, after 3 processing periods have elapsed, the sum of bytes corresponding to all the flow data in the token bucket cache queue is 30 and the memory value is 100; the ratio of the byte sum to the memory is then 30%, exceeding the preset token bucket flow limiting threshold of 20%. The metadata weight corresponding to recovery-type traffic is therefore adjusted from 500 to 1000, so the recovery traffic put into the leaky bucket is greatly reduced, which reduces the flow data sent to the token bucket. The number of tokens generated per preset period is unchanged while the consumption speed of tokens in the token bucket drops greatly, so the number of tokens cached in the token bucket can meet the requirements of the token bucket cache queue.
According to the above scheme, whether the data in the token bucket cache queue has been unable to acquire enough tokens for a long time is judged by continuously calculating the ratio of the sum of bytes corresponding to all flow data in the token bucket cache queue to the memory; when the ratio is greater than the preset token bucket flow limiting threshold, the metadata weight of recovery traffic is adaptively increased, so that the flow data issued into the token bucket is reduced and the delay tasks in the token bucket cache queue can acquire enough tokens (a sketch follows).
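The following sketch illustrates that per-period check; the method and parameter names are assumptions, with limitRatio playing the role of T_p and the raised weight corresponding to the 500-to-1000 adjustment in the example above.

```java
class RecoveryWeightAdjuster {
    /**
     * cachedFlowBytes: byte counts N_j of the flows queued in the token bucket cache queue;
     * memoryBytes: the memory value N_cache; limitRatio: the threshold T_p, e.g. 0.20.
     */
    static int adjust(long[] cachedFlowBytes, long memoryBytes,
                      double limitRatio, int normalWeight, int raisedWeight) {
        long sum = 0;
        for (long bytes : cachedFlowBytes) sum += bytes;  // N_sum = sum of N_j
        double ratio = (double) sum / memoryBytes;        // compared against T_p
        return ratio > limitRatio ? raisedWeight : normalWeight;
    }
}
```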
When data recovery services such as reconstruction, migration and repair are performed in the distributed object storage system, the storage system sends the relevant information of the data to be recovered to RabbitMQ for persistence, so as to avoid data loss; the relevant information comprises the type of recovery task, the name and storage position of the object, the virtual node number of the disk, and the like. After receiving the information pushed by RabbitMQ, the other nodes storing redundant data send their own stored data according to the recovery task type in the information, to complete the data recovery.
However, when the quantity of data to be recovered is large, more message pushing is generated in RabbitMQ, and too much message pushing causes the server to incur extremely high resource occupation, affecting the stable operation of upper-layer services. In the prior art, the concurrency is changed by cancelling the subscription relation between consumer and channel to reduce message pushing, but frequent subscription operations cause problems such as secondary consumption of messages, memory leakage and performance degradation, can cause message loss in serious cases, and reduce the stability and reliability of the data storage system. Based on these defects of the prior art and the consideration of controlling the data volume in the leaky bucket cache queue, the embodiments of the specification also provide a flow limiting method.
FIG. 5 is a flow chart illustrating a method of traffic restriction for distributed object storage in accordance with an exemplary embodiment.
As shown in fig. 5, the method may include the following processes. It should be noted that, in the embodiment of the present disclosure, the execution sequence of each step is not limited, and the sequence numbers of the steps in the following description are not limiting the sequence of the steps:
in step 501, a consumer of a RabbitMQ queue is established with a message distributor of the RabbitMQ queue.
In this step, the storage node comprises at least one RabbitMQ queue; each RabbitMQ queue has a message distributor and a consumer, and the message distributor comprises a message cache queue.
The message cache queue is an unbounded queue, but because RabbitMQ pushes new messages only after earlier messages are confirmed, the number of messages in the message cache queue is maintained at a certain value without overflowing the memory. The unbounded queue receives the data in the corresponding RabbitMQ queue, encapsulates it into messages, and then delivers them to the consumer of the corresponding RabbitMQ queue for processing.
Illustratively, each storage node has an EC_ERROR queue for repair information delivery and a rebuild queue for disk reconstruction, each with its own message distributor. The message distributor may be implemented by the backpressure flow control mechanism in asynchronous reactive programming. When messages exist in the cache queue, it judges whether the number of messages is greater than the preset concurrency; if so, it issues the preset concurrency's worth of messages to the consumer, otherwise it issues all the messages in the cache queue to the consumer.
In step 502, after the RabbitMQ queue receives the traffic data sent by the back-end network, the storage node parses the traffic data, encapsulates it into a message, and sends the message into the message cache queue.
For example, each time data recovery is required, the disk detection program of the node to be recovered sends recovery traffic data to the EC_ERROR and rebuild queues, and the storage node encapsulates the recovery traffic data into a message comprising data such as routing information, priority and expiration time, and stores the message into the message cache queue in the message distributor. The encapsulation realizes fine control and management of the message and ensures that the message can be transmitted and consumed accurately and efficiently.
In step 503, the message distributor calculates the difference between the preset concurrency and the total amount of currently processed messages, where the total amount of currently processed messages is the number of messages that have been issued but whose processing the consumer has not yet confirmed.
In this step, after the consumer finishes processing a message, it sends a confirmation instruction to the message distributor; the message distributor subtracts 1 from the total amount of currently processed messages, recalculates the difference between the preset concurrency of the message distributor and the total amount of currently processed messages, and triggers the message callback (onNext) so that the RabbitMQ server sends a new message to the message distributor. The flow data in the message cache queue is thus kept dynamically balanced to avoid memory overflow.
In addition, the error callback (onError) and the completion callback (onComplete) can define the execution logic for data stream errors and data stream completion, ensuring the reliability and safety of message consumption.
Denote the total amount of currently processed messages as S_runner and the preset concurrency as S_qos; the difference is S_req, i.e. S_req = S_qos − S_runner. The difference S_req is the number of messages that need to be delivered. For example, in practical application, the difference S_req is passed to the subscription's request method (request(S_req)) to adjust the number of delivered messages.
In step 504, in the case that the difference is greater than 0, a number of messages equal to the difference is issued to the consumer, and the total amount of currently processed messages is updated to the sum of the previous total and the difference.
It should be noted that atomicity needs to be guaranteed in a multi-threaded environment when updating the currently processed message total and the difference, to avoid counting errors.
For example, the preset concurrency is set to 10, the total amount of the current processing messages of the consumers in the RabbitMQ queue is 8, the difference value is calculated to be 2, 2 messages are issued to the consumers, and the total amount of the current processing messages is updated to 10.
According to the above scheme, message concurrency control based on the backpressure flow control mechanism avoids problems of the RabbitMQ system such as secondary message consumption, memory leakage, network congestion, performance degradation and message loss caused by frequent acquisition and subscription operations of consumers and channels, improving the stability and reliability of the distributed object storage system. A sketch follows.
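The following is a minimal sketch of the credit computation in steps 503 and 504, using the S_qos/S_runner/S_req names from the text; the AtomicInteger bookkeeping is an assumption, and in a reactive client the returned credit would be passed to the subscription's request(n) method.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MessageDistributor {
    private final int qos;                                     // preset concurrency S_qos
    private final AtomicInteger running = new AtomicInteger(); // S_runner: issued, not yet acked

    MessageDistributor(int qos) { this.qos = qos; }

    /** Returns S_req = S_qos - S_runner and atomically reserves it; 0 if no credit. */
    int credit() {
        while (true) {
            int current = running.get();
            int req = qos - current;
            if (req <= 0) return 0;
            // compare-and-set keeps the update atomic in a multi-threaded environment
            if (running.compareAndSet(current, current + req)) return req;
        }
    }

    /** Consumer confirmed one message: free one slot so a new credit can be computed. */
    void onAck() { running.decrementAndGet(); }
}
```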
Furthermore, the flow restriction method of the distributed object store mentioned in the present specification can be adjusted according to three different flow restriction strategies.
For example, if the data recovery priority policy is selected, the metadata flow threshold is adjusted to 0, the preset-period token generation number is adjusted to its maximum value, and the preset concurrency is adjusted to the first preset concurrency; when the metadata flow threshold is 0, the storage node issues the flow data directly. The first preset concurrency is, specifically, a prefetch value of 100 for the EC_ERROR queue (the general repair message queue, which receives all types of repair messages) and 10 for the rebuild queue (for disk reconstruction repair). This policy does not limit any recovery flow data; when it is adopted, recovery flow data existing in the distributed object storage system is processed preferentially.
If the service priority policy is selected, the metadata flow threshold is adjusted to the first metadata flow threshold, the preset-period token generation number is adjusted to the first preset-period token generation number, and the preset concurrency is adjusted to the second preset concurrency. The first metadata flow threshold may be, for example, 1, and the first preset-period token generation number may be, for example, 26214. The second preset concurrency is, specifically, a prefetch value of 5 for the EC_ERROR queue and 2 for the rebuild queue. This policy reasonably limits the data recovery flow to ensure normal service handling.
If the adaptive policy is selected, the average service flow counted by the traffic statistics unit over the N statistical periods before the current time is calculated and multiplied by a preset recovery flow limit threshold to obtain an adaptive threshold; the preset recovery flow limit threshold can be freely adjusted.
For example, the recovery flow may be set according to the real-time change of the service flow counted by the traffic statistics unit in one statistical period and a user-defined preset recovery flow limit threshold a, where a is the ratio of the recovery flow to the service flow. Let the upper-layer service flow at time t be N_t^pre; the threshold T_adapt is then calculated as:

T_adapt = a × (1/n) Σ_{t=0}^{n} N_t^pre

where n is the statistical period duration, i.e. (1/n) Σ_{t=0}^{n} N_t^pre is the average of the service flow from time 0 to the end of a statistical period.
The metadata flow threshold and the preset-period token generation number are adjusted to the adaptive threshold;
in the case that the service flow at the current time is greater than 0, the preset concurrency is adjusted to the second preset concurrency, i.e. a prefetch value of 5 for the EC_ERROR queue (the general repair message queue, which receives all types of repair messages) and 2 for the rebuild queue (for disk reconstruction repair);
in the case that the service flow at the current time is equal to 0, the preset concurrency is adjusted to the first preset concurrency, i.e. a prefetch value of 100 for the EC_ERROR queue and 10 for the rebuild queue.
When this policy is adopted, the current situation of the upper-layer service flow can be perceived in real time through the traffic statistics unit, and the threshold limiting the internal recovery flow is dynamically adjusted according to it. When upper-layer service flow data exists in the storage node, the recovery flow data keeps a fixed ratio to the upper-layer service flow data, namely the preset recovery flow limit threshold, so that the processing of the upper-layer service can be guaranteed; when no upper-layer service exists in the storage node, the recovery flow data is no longer limited. The flow limit is thus adjusted adaptively; a short sketch of the threshold calculation follows.
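This sketch mirrors the T_adapt formula above; the sampling window and names are assumptions, with ratio playing the role of the preset recovery flow limit threshold a.

```java
class AdaptivePolicy {
    /** upperFlowSamples: N_t^pre samples over one statistical period; ratio: the threshold a. */
    static double adaptiveThreshold(double[] upperFlowSamples, double ratio) {
        double sum = 0;
        for (double sample : upperFlowSamples) sum += sample;  // sum of N_t^pre
        double mean = sum / upperFlowSamples.length;           // average service flow
        return ratio * mean;                                   // T_adapt = a * mean
    }
}
```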
It should be noted that, when the preset concurrency is adjusted, the number of issued messages is adjusted specifically through the preset concurrency S_qos, thereby achieving the aim of controlling concurrency.
According to the above scheme, freely selecting the flow limiting strategy can meet the flow limiting requirements of different service scenarios; when the strategy is switched, the number of issued messages can be changed by modifying the value of the preset concurrency, achieving the aim of controlling concurrency.
FIG. 6 shows a flow limiting device for distributed object storage, which may be used to implement the method of any of the embodiments of the present disclosure. As shown in FIG. 6, the device may include: a flow receiving module 601, a judging module 602 and a flow processing module 603.
A traffic receiving module 601, configured to receive traffic data by the storage node, where the traffic data includes object actual data and object metadata;
the judging module 602 is configured to judge, when the flow type of the flow data is identified as a recovery type, whether a sum of a metadata flow weight and a current accumulated metadata flow weight of the leaky bucket is greater than a preset metadata flow threshold of the leaky bucket; the metadata flow weight is obtained by determining the flow type identifier of the flow data based on the mapping relation between the flow type identifier and the metadata flow weight;
and the flow processing module 603 is configured to generate a leaky bucket delay task corresponding to the flow data and put the leaky bucket delay task into the leaky bucket cache queue for delay processing if the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is greater than the preset metadata flow threshold of the leaky bucket; and to update the accumulated metadata flow weight of the leaky bucket to be the current accumulated metadata flow weight plus the metadata flow weight and issue the flow if the sum is not greater than the preset metadata flow threshold of the leaky bucket.
Embodiments of the present disclosure also provide an electronic device. Referring to FIG. 7, the device includes a memory for storing computer instructions executable on a processor, and the processor is configured to perform flow limiting on distributed object storage based on any of the above methods when executing the computer instructions.
The present description also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CDROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. On the other hand, the various features described in the individual embodiments may also be implemented separately in the various embodiments or in any suitable subcombination. Furthermore, although features may be acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing description merely sets forth preferred embodiments of the present specification and is not intended to limit them; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the embodiments are intended to be included within the scope of protection.

Claims (15)

1. A method for limiting the flow of a distributed object store, wherein the object store comprises at least one storage node, the storage node comprises at least one disk, each disk comprises a leaky bucket, and the leaky bucket comprises a leaky bucket cache queue;
the method comprises the following steps:
the storage node receiving flow data, wherein the flow data comprises a flow type identifier and flow entity data, and the flow entity data comprises object actual data and object metadata;
judging, under the condition that the flow type identifier of the flow data indicates a recovery type, whether the sum of a metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is larger than a preset metadata flow threshold of the leaky bucket, wherein the metadata flow weight is determined from the flow type identifier of the flow data based on a mapping relation between flow type identifiers and metadata flow weights;
if yes, generating a leaky bucket delay task corresponding to the flow data, and placing the leaky bucket delay task into the leaky bucket cache queue for delay processing;
if not, updating the accumulated metadata flow weight of the leaky bucket to the current accumulated metadata flow weight plus the metadata flow weight, and issuing the flow data.
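By way of a non-limiting illustration only, the admission check recited in claim 1 can be sketched in Python roughly as follows. All identifiers (FlowData, DelayTask, LeakyBucket, WEIGHT_MAP, issue) and the concrete weight, threshold, and delay values are assumptions introduced for this sketch, not taken from the patent.

```python
# Hedged sketch of the claim-1 admission check; every name here is a
# hypothetical stand-in, not the patent's actual implementation.
import time
from collections import deque
from dataclasses import dataclass

# Assumed mapping from flow type identifier to metadata flow weight.
WEIGHT_MAP = {"recovery": 2}

@dataclass
class FlowData:
    type_id: str           # flow type identifier
    object_data: bytes     # object actual data
    object_metadata: dict  # object metadata

@dataclass
class DelayTask:
    flow: FlowData
    weight: int            # metadata flow weight of the parked flow
    delay_end: float       # delay end time

def issue(flow: FlowData) -> None:
    """Stand-in for releasing the flow downstream."""
    print(f"issuing {flow.type_id} flow")

class LeakyBucket:
    def __init__(self, threshold: int, delay: float = 0.1):
        self.threshold = threshold    # preset metadata flow threshold
        self.accumulated = 0          # current accumulated metadata flow weight
        self.delay = delay            # assumed delay interval per parked task
        self.cache_queue = deque()    # leaky bucket cache queue

    def admit(self, flow: FlowData) -> bool:
        weight = WEIGHT_MAP[flow.type_id]  # weight looked up from the mapping
        if self.accumulated + weight > self.threshold:
            # Over the threshold: park the flow as a leaky bucket delay task.
            self.cache_queue.append(DelayTask(flow, weight, time.time() + self.delay))
            return False
        self.accumulated += weight    # account for the admitted metadata
        issue(flow)
        return True
```

The notable design point is that only the metadata portion of a recovery flow is weighed against the threshold, so metadata-heavy recovery traffic is throttled at the leaky bucket before it competes with business I/O.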
2. The method of claim 1, wherein each disk further comprises a token bucket, the token bucket comprising a token bucket cache queue, the token bucket generating a predetermined number of tokens per generation cycle and placing the tokens into the token bucket;
the method comprises the following steps:
receiving flow data issued after leaky bucket processing and, under the condition that the flow type identifier indicates a recovery type, checking the number of tokens cached in the token bucket;
under the condition that the number of tokens is larger than the number of tokens required by the flow data, executing the operation of calculating recovery data and writing it to the disk, and simultaneously updating the number of tokens to the current number of tokens minus the number of tokens required by the flow data;
and under the condition that the number of tokens is smaller than the number of tokens required by the flow data, generating a token bucket delay task corresponding to the flow data, and placing the token bucket delay task into the token bucket cache queue for delay processing.
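The token bucket stage of claim 2 can be sketched in the same spirit; refill() would be driven by a timer once per generation cycle, and capacity, like the other names, is an assumption of this sketch rather than a detail from the claims.

```python
# Hedged sketch of the claim-2 token bucket stage; hypothetical names only.
from collections import deque

def write_recovery_data(flow) -> None:
    """Stand-in for calculating recovery data and writing it to disk."""
    print("recovery data written")

class TokenBucket:
    def __init__(self, tokens_per_cycle: int, capacity: int):
        self.tokens = 0
        self.tokens_per_cycle = tokens_per_cycle  # preset number generated per cycle
        self.capacity = capacity                  # assumed cap on cached tokens
        self.cache_queue = deque()                # token bucket cache queue

    def refill(self) -> None:
        # Called once per generation cycle.
        self.tokens = min(self.capacity, self.tokens + self.tokens_per_cycle)

    def handle(self, flow, tokens_needed: int) -> bool:
        if self.tokens >= tokens_needed:
            self.tokens -= tokens_needed   # deduct the tokens this flow requires
            write_recovery_data(flow)
            return True
        # Not enough tokens: park as a token bucket delay task.
        self.cache_queue.append((flow, tokens_needed))
        return False
```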
3. The method of claim 1, wherein the leaky bucket delay task comprises a delay end time and a metadata flow weight;
after the leaky bucket delay task corresponding to the flow data is generated and placed into the leaky bucket cache queue for delay processing, the method further comprises:
sequentially taking out delay tasks from the leaky bucket cache queue in the next processing period of the leaky bucket, and judging whether each delay task has timed out based on the current time and the delay end time of the delay task;
under the condition that the delay task has timed out, issuing the flow data, wherein the issued flow data comprises the flow type identifier and the object actual data;
under the condition that the delay task has not timed out, judging whether the sum of the metadata flow weight of the delay task and the current accumulated metadata flow weight of the leaky bucket is larger than the preset metadata flow threshold;
if yes, updating the delay end time, generating a leaky bucket delay task corresponding to the flow data, and placing the leaky bucket delay task into the leaky bucket cache queue for delay processing;
if not, updating the accumulated metadata flow weight of the leaky bucket to the current accumulated metadata flow weight plus the metadata flow weight, and issuing the flow data.
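A minimal sketch of the per-period drain described in claim 3, reusing the LeakyBucket, DelayTask, FlowData, and issue stand-ins from the claim-1 sketch above. Note that a timed-out task is issued with only the type identifier and the object actual data, so its metadata no longer counts against the threshold.

```python
# Hedged sketch of the claim-3 drain; builds on the claim-1 sketch above.
import time

def process_leaky_bucket_cycle(bucket: LeakyBucket) -> None:
    for _ in range(len(bucket.cache_queue)):   # drain once per processing period
        task = bucket.cache_queue.popleft()
        if time.time() >= task.delay_end:
            # Timed out: issue with only the type identifier and actual data.
            issue(FlowData(task.flow.type_id, task.flow.object_data, {}))
        elif bucket.accumulated + task.weight > bucket.threshold:
            task.delay_end = time.time() + bucket.delay  # refresh delay end time
            bucket.cache_queue.append(task)              # re-queue for next period
        else:
            bucket.accumulated += task.weight
            issue(task.flow)
```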
4. The method of claim 2, wherein after the token bucket delay task corresponding to the flow data is generated and placed into the token bucket cache queue for delay processing, the method further comprises:
sequentially taking out delay tasks from the token bucket cache queue in the next processing period of the token bucket, and judging whether the number of tokens cached in the token bucket is larger than the number of tokens required by the flow data;
and waiting until the number of tokens is larger than the number of tokens required by the flow data, then executing the operation of calculating recovery data and writing it to the disk, and simultaneously updating the number of tokens to the current number of tokens minus the number of tokens required by the flow data.
5. The method of claim 4, wherein the method further comprises:
counting, in each processing period, the sum of the byte counts of all flow data in the token bucket cache queue;
judging whether the ratio of the byte sum to the memory capacity exceeds a preset token bucket current limiting threshold;
and under the condition that the ratio exceeds the preset token bucket current limiting threshold, increasing the metadata flow weight corresponding to recovery type flow.
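The back-pressure rule of claim 5 could look like the following, building on the TokenBucket and WEIGHT_MAP sketches above. MEMORY_BYTES and LIMIT_RATIO are invented placeholders for the node's memory capacity and the preset token bucket current limiting threshold.

```python
# Hedged sketch of the claim-5 back-pressure rule; placeholder constants.
MEMORY_BYTES = 8 * 2**30   # assumed node memory capacity
LIMIT_RATIO = 0.05         # assumed preset token bucket current limiting threshold

def adjust_recovery_weight(token_bucket: TokenBucket) -> None:
    # Sum the bytes of all flow data parked in the token bucket cache queue.
    queued_bytes = sum(len(flow.object_data) for flow, _ in token_bucket.cache_queue)
    if queued_bytes / MEMORY_BYTES > LIMIT_RATIO:
        # Raise the recovery metadata weight so the leaky bucket throttles harder.
        WEIGHT_MAP["recovery"] += 1
```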
6. The method of claim 2, wherein the storage node comprises at least one RabbitMQ queue, each RabbitMQ queue comprising a message distributor and a consumer, and the message distributor comprising a message cache queue;
the storage node receiving traffic data includes:
establishing a subscription relationship between a consumer of the RabbitMQ queue and a message distributor of the RabbitMQ queue;
after the RabbitMQ queue receives the flow data sent over the server network, parsing the flow data and packaging it into a message that is placed into the message cache queue;
the message distributor calculating the difference between a preset concurrency amount and the total number of messages currently being processed, wherein the total number of messages currently being processed is the number of messages that have been issued but not yet confirmed as processed by the consumer;
and under the condition that the difference is greater than 0, sending that number of messages to the consumer, and updating the total number of messages currently being processed to the previous total plus the difference.
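A sketch of the claim-6 credit window: the distributor never has more than the preset concurrency amount of messages outstanding (issued but unacknowledged). Conceptually this resembles RabbitMQ's consumer prefetch; here it is shown standalone with hypothetical names, and consumer is assumed to be any object exposing a handle(message) method.

```python
# Hedged sketch of the claim-6 message distributor; hypothetical names.
from collections import deque

class MessageDistributor:
    def __init__(self, concurrency: int):
        self.concurrency = concurrency  # preset concurrency amount
        self.in_flight = 0              # issued but not yet confirmed messages
        self.message_queue = deque()    # message cache queue

    def dispatch(self, consumer) -> None:
        window = self.concurrency - self.in_flight  # the claim-6 difference
        if window <= 0:
            return
        batch = min(window, len(self.message_queue))
        for _ in range(batch):
            consumer.handle(self.message_queue.popleft())
        self.in_flight += batch         # update the total being processed
```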
7. The method of claim 6, wherein establishing the subscription relationship between the consumer of the RabbitMQ queue and the message distributor of the RabbitMQ queue comprises:
the message distributor receiving the request of the consumer and judging whether any message exists in the message cache queue;
under the condition that messages exist, judging whether the number of messages in the message cache queue is larger than the preset concurrency amount;
under the condition that the number of messages in the message cache queue is larger than the preset concurrency amount, sending a number of messages equal to the preset concurrency amount to the consumer;
otherwise, issuing all messages in the message cache queue to the consumer.
8. The method of claim 6, wherein the method further comprises:
after the consumer finishes processing a target message, the consumer sending a confirmation completion instruction to the message distributor;
the message distributor, upon receiving the confirmation completion instruction, decrementing the total number of messages currently being processed by one, and recalculating the difference between the preset concurrency amount of the message distributor and the total number of messages currently being processed by the consumer;
and under the condition that the difference is greater than 0, sending that number of messages to the consumer, and updating the total number of messages currently being processed to the previous total plus the difference.
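Claim 8's acknowledgement path then reduces to decrementing the in-flight count and re-running the same window calculation, continuing the distributor sketch above:

```python
# Hedged continuation of the distributor sketch for claim 8.
class AckingDistributor(MessageDistributor):
    def on_ack(self, consumer) -> None:
        self.in_flight -= 1      # confirmation completion instruction received
        self.dispatch(consumer)  # recompute the difference and refill the window
```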
9. The method of claim 6, wherein the method further comprises:
adjusting at least one of the metadata flow threshold, the preset number of tokens generated per cycle, and the preset concurrency amount based on different current limiting strategies.
10. The method of claim 9, wherein the adjusting at least one of the metadata flow threshold, the preset number of tokens generated per cycle, and the preset concurrency amount based on different current limiting strategies comprises:
under the condition that the current limiting strategy is a data recovery priority strategy, adjusting the metadata flow threshold to 0, adjusting the preset number of tokens generated per cycle to a maximum value, and adjusting the preset concurrency amount to a first preset concurrency amount, wherein the storage node directly issues flow data when the metadata flow threshold is 0.
11. The method of claim 9, wherein the adjusting at least one of the metadata flow threshold, the preset number of tokens generated per cycle, and the preset concurrency amount based on different current limiting strategies comprises:
under the condition that the current limiting strategy is a service priority strategy, adjusting the metadata flow threshold to a first metadata flow threshold, adjusting the preset number of tokens generated per cycle to a first preset value, and adjusting the preset concurrency amount to a second preset concurrency amount.
12. The method of claim 9, wherein each disk further comprises a traffic statistics unit, the traffic statistics unit being configured to classify and count the issued traffic;
the adjusting at least one of the metadata flow threshold, the preset number of tokens generated per cycle, and the preset concurrency amount based on different current limiting strategies comprises:
under the condition that the current limiting strategy is an adaptive strategy, calculating the average service traffic counted by the traffic statistics unit over the N statistics periods preceding the current time, and multiplying the average by a preset recovery flow limiting threshold to obtain an adaptive threshold;
adjusting the metadata flow threshold and the preset number of tokens generated per cycle to the adaptive threshold;
under the condition that the service traffic at the current time is greater than 0, adjusting the preset concurrency amount to the second preset concurrency amount;
and under the condition that the service traffic at the current time is equal to 0, adjusting the preset concurrency amount to the first preset concurrency amount.
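One hedged reading of the claim-12 adaptive strategy, reusing the earlier sketches; N_PERIODS and RECOVERY_RATIO stand in for the unspecified N and the preset recovery flow limiting threshold, and the returned value is the concurrency amount to apply to the distributor.

```python
# Hedged sketch of the claim-12 adaptive policy; placeholder constants.
N_PERIODS = 10        # assumed number of statistics periods to average over
RECOVERY_RATIO = 0.5  # assumed preset recovery flow limiting threshold

def apply_adaptive_policy(traffic_history: list, current_traffic: float,
                          leaky_bucket: LeakyBucket, token_bucket: TokenBucket,
                          first_concurrency: int, second_concurrency: int) -> int:
    recent = traffic_history[-N_PERIODS:]                  # last N periods
    adaptive = sum(recent) / len(recent) * RECOVERY_RATIO  # adaptive threshold
    leaky_bucket.threshold = adaptive                      # metadata flow threshold
    token_bucket.tokens_per_cycle = adaptive               # tokens per cycle
    # Business traffic present -> second concurrency amount; idle -> first.
    return second_concurrency if current_traffic > 0 else first_concurrency
```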
13. A flow limiting device for distributed object storage, wherein the object storage comprises at least one storage node, the storage node comprises at least one disk, each disk comprises a leaky bucket, and the leaky bucket comprises a leaky bucket cache queue;
the device comprises:
a flow receiving module, configured for the storage node to receive flow data, wherein the flow data comprises object actual data and object metadata;
a judging module, configured to judge, under the condition that the flow type identifier of the flow data indicates a recovery type, whether the sum of a metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is larger than a preset metadata flow threshold of the leaky bucket, wherein the metadata flow weight is determined from the flow type identifier of the flow data based on a mapping relation between flow type identifiers and metadata flow weights;
and a flow processing module, configured to: if the sum of the metadata flow weight and the current accumulated metadata flow weight of the leaky bucket is larger than the preset metadata flow threshold of the leaky bucket, generate a leaky bucket delay task corresponding to the flow data and place it into the leaky bucket cache queue for delay processing; and if not, update the accumulated metadata flow weight of the leaky bucket to the current accumulated metadata flow weight plus the metadata flow weight and issue the flow data.
14. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1-12 by executing the executable instructions.
15. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311305149.6A CN117336248A (en) 2023-10-09 2023-10-09 Flow limiting method, device and medium for distributed object storage

Publications (1)

Publication Number Publication Date
CN117336248A 2024-01-02

Family ID: 89276841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311305149.6A Flow limiting method, device and medium for distributed object storage 2023-10-09 2023-10-09 Pending

Country Status (1)

Country: CN, Publication: CN117336248A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination