WO2017028696A1

WO2017028696A1 - Method and device for monitoring load of distributed storage system

Info

Publication number: WO2017028696A1
Application number: PCT/CN2016/093893
Authority: WO
Inventors: 张潇雨
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2015-08-17
Filing date: 2016-08-08
Publication date: 2017-02-23
Also published as: CN106469018B; CN106469018A

Abstract

A method and device for monitoring the load of a distributed storage system. The method comprises: determining all thread pools whose load pressure exceeds the standard on each server in a distributed storage system (S1); and giving an alarm or performing load balancing for each thread pool whose load pressure exceeds the standard (S2). The method and the device can give an alarm or automatically balance load among servers according to the thread pool state of each server, that is, according to whether the load of a single server exceeds its service capability. They do not depend on the user request mode and can correctly process requests arriving from different users at the same time. They also do not depend on the service capabilities of the servers, so alarms and load balancing are performed correctly even when the service capabilities of the servers in a cluster of the distributed storage system are inconsistent, thereby preventing hotspots and improving the service quality of the distributed storage system.

Description

Load monitoring method and device for distributed storage system

The present application claims priority to Chinese Patent Application No. 20151050465, filed on Aug. 17, 2015.

Technical field

The present application relates to the field of computers, and in particular, to a load monitoring method and device for a distributed storage system.

Background technique

A distributed storage system is a distributed system that uses a cluster to provide storage services. The user uses a key (Key) as an index to read and write the corresponding value (Value). For a given key, the user can perform different operations, such as writing a value, reading the corresponding value, or deleting the corresponding value. Each such operation is called a request. A thread pool in a distributed storage system is a service unit that has a certain number of threads. A request first joins the queue of the thread pool, and an idle thread in the thread pool takes the request from the queue for processing. A partition (Partition) is the basic unit of scheduling in a distributed storage system. Each partition is delimited by a begin key (BeginKey) and an end key (EndKey), so that a key uniquely determines the partition to which it belongs, and partitions do not overlap. A server in a distributed storage system is the basic unit for providing services. Each server hosts a plurality of partitions, and requests for different keys are handled by different servers according to the partitions to which the keys belong. Internally, the server uses thread pools as the actual processing units to handle different requests.

The user's keys are divided into partitions and then stored in order in the distributed file system. Since a single partition can belong to only one server (Server), an increase in the number of user requests within a single partition increases the load on that server and the latency (Latency) experienced by users, and also affects the other partitions on the same server. Therefore, in order to make full use of the service capabilities of all servers in the cluster, a load monitoring scheme is needed to disperse hotspots and improve service quality. The current solutions to request hotspots are the splitting and migration of partitions. Splitting divides a partition (Partition) into multiple partitions according to different key ranges, and the resulting partitions are randomly distributed to other servers; migration moves a partition (Partition) from one server to another.
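The mapping from a key to its partition can be illustrated with a minimal sketch. The table layout, the half-open range convention [BeginKey, EndKey), and the names `find_partition` and `table` are assumptions made for illustration only; the source does not specify how the lookup is implemented.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical partition table entry: (begin_key, end_key, server).
# Assumption: ranges are half-open [begin_key, end_key), sorted by begin_key,
# and do not overlap.
Partition = Tuple[str, str, str]


def find_partition(partitions: List[Partition], key: str) -> Optional[Partition]:
    """Return the partition whose key range contains `key`, or None."""
    begin_keys = [p[0] for p in partitions]
    # Index of the last partition whose begin_key is <= key.
    idx = bisect_right(begin_keys, key) - 1
    if idx < 0:
        return None
    begin, end, _server = partitions[idx]
    return partitions[idx] if begin <= key < end else None


table = [("a", "h", "server-1"), ("h", "p", "server-2"), ("p", "z", "server-1")]
```

With this table, a request for the key "kiwi" would be routed to the partition held by "server-2", while a key outside every range yields no partition.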

There are three general methods for resolving request hotspots:

1. After the size of a single partition exceeds a certain limit, the partition (Partition) is split evenly into several partitions. However, splitting according to the size of a single partition does not accurately reflect the processing load of the partition. User request modes are not uniform, and the partition size affects them differently. Even when a partition is small, a hotspot may still occur because user requests are concentrated in a narrow key range.

2. When the queries per second (QPS) of a single partition (Partition) exceed a certain threshold, the partition is split according to the key range requested by users. However, using QPS as the threshold requires determining the processing power of each server, so different values need to be configured on different servers; moreover, a server that is also running other programs sometimes cannot reach its theoretical processing power.

3. Some parameters of request execution, such as IO operation time and the number of Cache hits, are obtained, and certain rules are configured so that splitting is performed when the preset conditions are met. Although configuring splitting rules according to such parameters is very flexible, for that very reason the rules differ from scenario to scenario and need to be updated according to the user's request mode, so the scheme is not sufficiently automated.

Summary of the invention

An object of the present application is to provide a load monitoring method and device for a distributed storage system, which can solve the problem of hot spots in a distributed storage system.

According to an aspect of the present application, a load monitoring method for a distributed storage system is provided, the method comprising:

Determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system;

Giving an alarm or performing load balancing for each thread pool whose load pressure exceeds the standard.

Further, in the foregoing method, determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system includes:

Obtaining the ratio of the waiting time to the stay time of a request in the queue of each thread pool on each server, the stay time being the sum of the waiting time of a request in the queue of each thread pool and the actual processing time;

When the ratio of the waiting time to the stay time exceeds a preset over-standard threshold, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard.

Further, in the foregoing method, the preset over-standard threshold is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capability of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset over-standard threshold exceeds the ratio of the waiting time to the stay time at the point where the sharp rise starts.

Further, in the above method, load balancing is performed on each thread pool whose load pressure exceeds the standard, including:

Counting, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, to obtain the number of requests belonging to each partition in the thread pool, and arranging the partitions in descending order of the number of requests;

Determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool;

If yes, performing a split operation on the partition with the largest number of requests.

Further, in the foregoing method, the splitting operation is performed on the partition with the largest number of requests, including:

The partition is divided into several sub-partitions, and the sub-partitions are distributed to other servers, wherein each sub-partition corresponds to a sub-range of the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal.

Further, in the foregoing method, after determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all the partitions of the thread pool, the method further includes:

If not, selecting one or more partitions in order, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool;

Migrate the selected partition.

Further, in the above method, the selected partition is migrated, including:

Migrate each selected partition to a server that has no thread pool with excessive load pressure.

Further, in the above method, each selected partition is migrated to a server that has no thread pool with excessive load pressure, including:

Searching for a qualifying server that has no thread pool whose load pressure exceeds the standard and, if one is found, migrating the selected partition to the found server.

Further, in the above method, the qualifying server that has no thread pool whose load pressure exceeds the standard includes:

If, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool whose load pressure exceeds the standard, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage rate threshold, then that server is a qualifying server that has no thread pool whose load pressure exceeds the standard.

Further, in the above method, the average thread usage rate of each thread pool is obtained by the following formula: (λ₁ + λ) * B / n, wherein

λ₁ represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool;

λ represents the rate at which requests in the corresponding thread pool of the target server arrive at the queue of that thread pool before the migration;

B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;

n represents the number of threads in the corresponding thread pool of the target server.
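The usage formula can be evaluated with a short sketch. The function names, the example rates, and the threshold of 0.8 are hypothetical; the only part taken from the text is the expression (λ₁ + λ) * B / n. Rates are assumed to be in requests per second and B in seconds so that the result is dimensionless.

```python
def average_thread_usage(lambda_1: float, lambda_: float, b: float, n: int) -> float:
    """Estimated average thread usage of a target thread pool after migration.

    lambda_1: arrival rate of the requests being moved in (requests/second)
    lambda_:  arrival rate already seen by the target pool (requests/second)
    b:        processing time of one request on one thread (seconds)
    n:        number of threads in the target pool
    """
    if n <= 0:
        raise ValueError("a thread pool needs at least one thread")
    return (lambda_1 + lambda_) * b / n


def within_usage_threshold(lambda_1: float, lambda_: float, b: float, n: int,
                           usage_threshold: float) -> bool:
    """True if the estimated usage does not exceed the preset threshold."""
    return average_thread_usage(lambda_1, lambda_, b, n) <= usage_threshold


# Illustrative numbers only: 200 + 400 requests/s, 10 ms per request, 10 threads.
usage = average_thread_usage(200.0, 400.0, 0.01, 10)
```

In this illustration the estimated usage is about 0.6, so a target pool with a hypothetical threshold of 0.8 would qualify, while one with a threshold of 0.5 would not.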

According to another aspect of the present application, a load balancing device of a distributed storage system is provided, the device comprising:

a load monitoring device, configured to determine all thread pools whose load pressure exceeds the standard on each server in the distributed storage system;

an alarm or load balancing device, configured to give an alarm or perform load balancing for each thread pool whose load pressure exceeds the standard.

Further, in the above device, the load monitoring device is configured to obtain the ratio of the waiting time to the stay time of a request in the queue of each thread pool on each server, the stay time being the sum of the waiting time of a request in the queue of each thread pool and the actual processing time; and, when the ratio of the waiting time to the stay time exceeds a preset over-standard threshold, to determine that the load pressure of the thread pool on the server where the request is located exceeds the standard.

Further, in the foregoing device, the preset over-standard threshold is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capability of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset over-standard threshold exceeds the ratio of the waiting time to the stay time at the point where the sharp rise starts.

Further, in the above device, the alarm or load balancing device is configured to count, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, to obtain the number of requests belonging to each partition in the thread pool, and to arrange the partitions in descending order of the number of requests; to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool; and, if so, to perform a split operation on the partition with the largest number of requests.

Further, in the above device, the alarm or load balancing device is configured to divide the partition into several sub-partitions and distribute the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-range of the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal.

Further, in the above device, the alarm or load balancing device is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool and, if not, to select one or more partitions in order, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool; and to perform a migration operation on the selected partitions.

Further, in the above device, the alarm or load balancing device is configured to migrate each selected partition to a server that has no thread pool whose load pressure exceeds the standard.

Further, in the above device, the alarm or load balancing device is configured to search for a qualifying server that has no thread pool whose load pressure exceeds the standard and, if one is found, to migrate the selected partition to the found server.

Further, in the above device, the qualifying server that has no thread pool whose load pressure exceeds the standard includes:

Further, in the above device, the average thread usage rate of each thread pool is obtained by the following formula: (λ₁ + λ) * B / n, wherein

Compared with the prior art, the present application determines all thread pools whose load pressure exceeds the standard on each server in the distributed storage system, and gives an alarm or performs load balancing for each such thread pool. It can thus give an alarm or automatically balance load among servers according to the thread pool state of each server, that is, according to whether the load of a single server exceeds its service capability. It does not depend on the user request mode and can correctly handle requests arriving from different users at the same time. Nor does it depend on the service capabilities of the servers, so alarms or load balancing are performed correctly even when the service capabilities of the servers within the cluster of the distributed storage system are inconsistent, thereby preventing hotspots and improving the service quality of the distributed storage system.

Further, by comparing the ratio of the waiting time W_q to the stay time W with the preset over-standard threshold th, the present application can accurately identify all thread pools whose load pressure exceeds the standard on each server.

Further, the present application determines an accurate preset exceeding threshold according to a preset threshold of the thread pool's request arrival rate, so that all thread pools with excessive load pressure on each server can be obtained more accurately.

Further, the present application counts, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, to obtain the number of requests belonging to each partition in the thread pool, and arranges the partitions in descending order of the number of requests. When the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, that partition is split. The partition that needs to be split can thus be found accurately, thereby effectively implementing load balancing.

Further, in the present application, when the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected in order, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated. The partitions that need to be migrated in a thread pool can thus be found accurately, thereby better achieving load balancing.

Further, the present application migrates the selected partition to a found server only on the premise that a qualifying server that has no thread pool whose load pressure exceeds the standard has been found, thereby better achieving load balancing.

Further, in the present application, if, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool whose load pressure exceeds the standard, the average thread usage rate of each corresponding thread pool of the target server does not exceed the preset usage rate threshold, then that server is a qualifying server that has no thread pool whose load pressure exceeds the standard. A qualifying server can thus be found accurately, thereby better achieving load balancing.

DRAWINGS

Other features, objects, and advantages of the present application will become more apparent from the detailed description of the accompanying drawings.

FIG. 1 shows a flow chart of a load monitoring method of a distributed storage system in accordance with an aspect of the present application;

FIG. 2 shows a flow chart of a preferred embodiment of the load monitoring method of the distributed storage system of the present application;

FIG. 3 shows a schematic diagram of the determination of the preset over-standard threshold according to an embodiment of the present application;

FIG. 4 shows a flow chart of another preferred embodiment of the load monitoring method of the distributed storage system of the present application;

FIG. 5 shows a flow chart of still another preferred embodiment of the load monitoring method of the distributed storage system of the present application;

FIG. 6 shows a flow chart of a specific application embodiment of the load monitoring method of the distributed storage system of the present application;

FIG. 7 shows a block diagram of a load monitoring device of a distributed storage system in accordance with another aspect of the present application.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description

The invention is further described in detail below with reference to the accompanying drawings.

In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.

Computer readable media include both permanent and non-permanent, removable and non-removable media. Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.

As shown in FIG. 1 , the present application provides a load monitoring method for a distributed storage system, where the method includes:

Step S1, determining a thread pool whose load pressure exceeds the standard on each server in the distributed storage system;

Step S2, giving an alarm or performing load balancing for each thread pool whose load pressure exceeds the standard. In this embodiment, an alarm is given or load is automatically balanced among servers according to the thread pool state of each server (Server), that is, according to whether the load of a single server exceeds its service capability. The scheme does not depend on the user request mode and can correctly handle requests arriving from different users at the same time. Nor does it depend on the service capabilities of the servers: alarms or load balancing are performed correctly even when the service capabilities of the servers within the cluster of the distributed storage system are inconsistent, which prevents hotspots and improves the service quality of the distributed storage system. Specifically, according to the basic information of the thread pools of a server, all thread pools whose load pressure exceeds the standard on each server in the distributed storage system can be determined, and an alarm can be given or load balancing performed for each of them. A single server (Server) generally has several thread pools. A typical thread pool can be described by a single-queue model, and the basic information of the thread pool can be expressed as queue parameters. The queue parameters may include the following:

a) W_q: indicates the waiting time of a request in the queue of a thread pool.

b) B: indicates the actual processing time of a request in the queue of a thread pool.

c) W: indicates the stay time of a request in the queue of a thread pool, that is, the waiting time W_q plus the actual processing time B.

d) λ: indicates the rate at which requests arrive at the queue of a thread pool.

e) μ: indicates the service capability of a thread pool per unit time.

As shown in FIG. 2, in a preferred embodiment of the load monitoring method of the distributed storage system of the present application, determining, in step S1, all thread pools whose load pressure exceeds the standard on each server in the distributed storage system includes:

Step S11, obtaining the ratio of the waiting time W_q to the stay time W of a request in the queue of each thread pool on each server, the stay time W being the sum of the waiting time W_q of a request in the queue of each thread pool and the actual processing time B;

Step S12, when the ratio of the waiting time W_q to the stay time W exceeds the preset over-standard threshold th, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard. The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard. Whether the load pressure of a thread pool exceeds the standard can be judged with the following formula:

W_q / W > th

The formula means that when the ratio of the waiting time W_q to the stay time W exceeds the preset over-standard threshold th, the load pressure of the thread pool on the server where the request is located is determined to exceed the standard. In this embodiment, by comparing the ratio of the waiting time W_q to the stay time W with the preset over-standard threshold th, all thread pools whose load pressure exceeds the standard on each server can be identified accurately.
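The check W_q / W > th can be sketched as follows. The `PoolStats` record, its field names, the sample measurements, and the threshold value 0.6 are all hypothetical; only the comparison itself and the relation W = W_q + B come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PoolStats:
    server: str
    pool: str
    wait_time: float     # W_q: average time a request waits in the queue
    service_time: float  # B: average actual processing time of a request

    @property
    def stay_time(self) -> float:
        # W = W_q + B
        return self.wait_time + self.service_time


def overloaded_pools(stats: Iterable[PoolStats], th: float) -> List[PoolStats]:
    """Return the pools for which W_q / W exceeds the threshold th."""
    result = []
    for s in stats:
        # Skip pools with no measured traffic to avoid dividing by zero.
        if s.stay_time > 0 and s.wait_time / s.stay_time > th:
            result.append(s)
    return result


samples = [
    PoolStats("server-1", "read", wait_time=30.0, service_time=10.0),  # ratio 0.75
    PoolStats("server-1", "write", wait_time=2.0, service_time=10.0),  # ratio about 0.17
]
flagged = overloaded_pools(samples, th=0.6)
```

Here only the read pool is flagged, because its requests spend three quarters of their stay time waiting in the queue.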

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, the preset over-standard threshold in step S12 is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate λ at which requests arrive at the queue of the thread pool to the service capability μ of the thread pool per unit time. When the request arrival rate λ/μ of the thread pool exceeds the preset threshold of the request arrival rate, the corresponding ratio W_q/W of the waiting time to the stay time starts to rise sharply, and the preset over-standard threshold exceeds the value of W_q/W at the point where the sharp rise starts. Specifically, the preset threshold of the request arrival rate of the thread pool can be determined using FIG. 3. In FIG. 3, each curve represents a thread pool, and n represents the number of threads in the corresponding thread pool: the first thread pool has 1 thread, the second has 2 threads, the third has 3 threads, the fourth has 10 threads, and the fifth has 24 threads. In FIG. 3, the abscissa indicates the request arrival rate λ/μ, and the ordinate indicates the ratio W_q/W of the waiting time W_q to the stay time W. It can be seen from FIG. 3 that W_q/W starts to rise sharply after the request arrival rate λ/μ exceeds a certain value (the preset threshold of the request arrival rate of the thread pool). The value of W_q/W at the point where the sharp rise starts is the inflection point, so the preset over-standard threshold of W_q/W only needs to exceed the inflection point. For example, in practice the value of the inflection point can be taken as 0.5, in which case the preset over-standard threshold only needs to be greater than 0.5.
In this embodiment, an accurate preset over-standard threshold is determined according to the preset threshold of the request arrival rate of the thread pool, so that all thread pools whose load pressure exceeds the standard on each server can be identified more accurately.
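The shape of such curves can be reproduced with a small sketch under a strong assumption that the source does not state: that each thread pool behaves as an M/M/n queue (Poisson arrivals, exponentially distributed processing times, n identical threads). Under that assumption, W_q/W follows from the standard Erlang C formula. The function name `wait_to_stay_ratio` is hypothetical, and μ is taken as the capacity of the whole pool, so λ/μ is the pool utilisation.

```python
from math import factorial


def wait_to_stay_ratio(rho: float, n: int) -> float:
    """W_q / W for an M/M/n queue at utilisation rho = lambda / mu (assumed model)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1 for a stable queue")
    a = rho * n  # offered load measured in units of one thread's capacity
    tail = a ** n / factorial(n) / (1 - rho)
    head = sum(a ** k / factorial(k) for k in range(n))
    p_wait = tail / (head + tail)          # Erlang C: probability a request must wait
    wq_over_b = p_wait / (n * (1 - rho))   # W_q expressed in units of B
    return wq_over_b / (wq_over_b + 1)     # W_q / (W_q + B)


# Tabulate the ratio for the pool sizes named in the text.
curves = {n: [wait_to_stay_ratio(r / 10, n) for r in range(1, 10)]
          for n in (1, 2, 3, 10, 24)}
```

Under this model a single-thread pool gives W_q/W equal to λ/μ, while larger pools stay near zero for longer and then rise steeply as λ/μ approaches 1, which is consistent with choosing a threshold just above the point where the rise begins.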

As shown in FIG. 4, in a preferred embodiment of the load monitoring method of the distributed storage system of the present application, load balancing is performed on each thread pool whose load pressure exceeds the standard in step S2, including:

Step S21, counting, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, to obtain the number of requests belonging to each partition in the thread pool, and arranging the partitions in descending order of the number of requests;

Step S22, determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if yes, going to step S23.

Step S23, performing a split operation on the partition with the largest number of requests. Specifically, suppose for example that a thread pool whose load pressure exceeds the standard serves three partitions, partition A, partition B, and partition C, where the number of requests belonging to partition A is 100, the number belonging to partition B is 20, and the number belonging to partition C is 10. The number of requests of partition A, which has the largest number of requests, then exceeds half of the total number of requests of all partitions of the thread pool, 65 = (100+20+10)/2. In this embodiment, request information is extracted from each thread pool whose load pressure exceeds the standard and analyzed. Thread pools and partitions are not in one-to-one correspondence: a partition (Partition) is only a logical unit, and the requests belonging to one partition may be processed by multiple thread pools. The requests passing through the thread pool are therefore counted by partition (Partition) to obtain the number of requests belonging to each partition, and the partitions are then arranged in descending order of the number of requests. If the number of requests of one partition in the thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and the process jumps to step S23. In step S23, because the requests belonging to the selected partition (Partition) account for more than half of the total number of requests of all partitions of the thread pool, the selected partition needs to be split, and processing ends after the split.
In this embodiment, the partition whose number of requests is the largest and exceeds half of the total number of requests of all partitions of the thread pool can be regarded as the partition that contributes most to the excessive load pressure of the thread pool. Selecting and splitting it therefore achieves load balancing effectively.
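Steps S21 to S23 can be sketched as follows, using the request counts from the example (A: 100, B: 20, C: 10). The function names and the representation of a request as the name of the partition it belongs to are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def rank_partitions(request_partitions: Iterable[str]) -> List[Tuple[str, int]]:
    """S21: count requests per partition and sort in descending order of count."""
    return Counter(request_partitions).most_common()


def partition_to_split(ranked: List[Tuple[str, int]]) -> Optional[str]:
    """S22/S23: return the top partition if it has more than half of all requests."""
    if not ranked:
        return None
    total = sum(count for _, count in ranked)
    top_name, top_count = ranked[0]
    return top_name if top_count > total / 2 else None


# Each element names the partition that one request in the overloaded pool belongs to.
requests = ["A"] * 100 + ["B"] * 20 + ["C"] * 10
ranked = rank_partitions(requests)
hot = partition_to_split(ranked)
```

With these counts the total is 130, half of it is 65, and partition A with 100 requests is selected for splitting. When no partition holds more than half of the requests, the function returns `None` and the migration branch applies instead.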

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, performing, in step S23, the split operation on the partition with the largest number of requests includes:

The partition is divided into several sub-partitions, and the sub-partitions are distributed to other servers, wherein each sub-partition corresponds to a sub-range of the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal. Specifically, the split points are chosen so that the requests that fall within the key range of the partition and are placed on the thread pool are divided evenly. For example, suppose the key range of a partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 requests in the range 0.2 to 0.3, and 200 requests in the range 0.3 to 0.4. The partition can then be divided into three sub-partitions whose key sub-ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4 respectively, thereby better achieving load balancing.
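One way to pick split points that equalise the request counts is to take quantiles of the observed request keys. This is only a sketch: the function `split_points` is hypothetical, and the keys are scaled by 1000 to integers (100 to 399 instead of 0.1 to 0.4) purely to avoid floating-point noise in the illustration.

```python
from typing import List, Sequence


def split_points(request_keys: Sequence[int], parts: int) -> List[int]:
    """Return parts - 1 keys that divide the observed requests into equal shares.

    A sub-partition i then covers keys from split point i - 1 (inclusive)
    up to split point i (exclusive). Heavily repeated keys can produce
    duplicate split points; this sketch does not handle that case.
    """
    if parts < 2:
        raise ValueError("need at least two sub-partitions")
    keys = sorted(request_keys)
    if len(keys) < parts:
        raise ValueError("not enough requests to choose split points")
    return [keys[len(keys) * i // parts] for i in range(1, parts)]


# 600 requests spread evenly over keys 100..399: 200 per hundred-key range.
observed = [key for key in range(100, 400) for _ in range(2)]
points = split_points(observed, 3)
```

For this evenly spread sample the split points fall at 200 and 300, which corresponds to the sub-ranges 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4 in the example. With a skewed distribution the same procedure would place the split points closer together where requests are dense.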

As shown in FIG. 5, in a preferred embodiment of the load monitoring method of the distributed storage system of the present application, after determining, in step S22, whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, the method further includes:

If no, going to step S24;

Step S24, selecting one or more partitions in order, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool;

Step S25, performing a migration operation on the selected partitions. Here, if the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, partitions (Partition) are selected in order, starting from the first partition in the descending order of the thread pool, until the total number of requests belonging to the remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and the process jumps to step S25. In step S25, because the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, no single partition dominates the requests of the thread pool, so the partitions selected in order from the first partition are migrated one by one. For example, suppose a thread pool whose load pressure exceeds the standard serves five partitions, partition D, partition E, partition F, partition G, and partition H, and the number of requests belonging to each of them is 100. The first three partitions D, E, and F then need to be selected and migrated one by one, so that the total number of requests belonging to the remaining partitions G and H, 200 = 100+100, is less than half of the total number of requests of all partitions of the thread pool, 250 = (100+100+100+100+100)/2. This embodiment can accurately find the partitions in a thread pool that need to be migrated, thereby better achieving load balancing.
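The selection in step S24 can be sketched with the five-partition example (D to H, 100 requests each). The function name and the dictionary input are assumptions made for illustration.

```python
from typing import Dict, List


def partitions_to_migrate(request_counts: Dict[str, int]) -> List[str]:
    """Select partitions in descending order of request count until the
    requests of the unselected partitions total less than half of all requests."""
    total = sum(request_counts.values())
    if total == 0:
        return []
    # sorted() is stable, so partitions with equal counts keep their input order.
    ranked = sorted(request_counts.items(), key=lambda item: item[1], reverse=True)
    remaining = total
    selected = []
    for name, count in ranked:
        if remaining < total / 2:
            break
        selected.append(name)
        remaining -= count
    return selected


counts = {"D": 100, "E": 100, "F": 100, "G": 100, "H": 100}
to_move = partitions_to_migrate(counts)
```

After D and E are selected, 300 requests remain, which is not yet below 250, so F is selected as well. That leaves 200 requests on G and H, and the selection stops with D, E, and F.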

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, the selected partition is migrated, including:

Load balancing is achieved by migrating each selected partition to a server that does not have a thread pool with excessive load stress.

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, each selected partition is migrated to a server having no thread pool with excessive load pressure, including:

A server that meets the condition of having no thread pool with excessive load pressure is searched for; if one is found, the selected partition is migrated to the found server. In this embodiment, the selected partition is migrated only once a server meeting this condition has been found, thereby better achieving load balancing.

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, the server that meets the condition that there is no thread pool with excessive load pressure includes:

If, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage threshold, the server is a server that meets the condition of having no thread pool with excessive load pressure. Specifically, a partition is taken from the set to be migrated, that is, from all selected partitions, and the number of its requests on each thread pool of the current server is obtained. For example, a selected partition on server M uses two thread pools to process the requests belonging to it, namely a read request thread pool Q1 and a write request thread pool Q2. One server is randomly selected from the set of servers that have no thread pool with excessive load pressure, for example server N. It is then calculated whether, after the requests of the selected partition in the read request thread pool Q1 are migrated to the corresponding read request thread pool Q1+ on server N and its requests in the write request thread pool Q2 are migrated to the corresponding write request thread pool Q2+ on server N, the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ on server N do not exceed the preset usage threshold. The preset usage threshold may be an empirical value. That is, if the usage rates of all corresponding thread pools after the migration do not exceed the preset usage threshold, the migration is allowed; otherwise, another server that has no thread pool with excessive load pressure is selected and the process is repeated. If all servers that have no thread pool with excessive load pressure have been checked and none meets the condition, the migration of the partition (Partition) is abandoned. This embodiment can accurately find a server that has no thread pool with excessive load pressure, thereby better achieving load balancing.

In a preferred embodiment of the load monitoring method of the distributed storage system of the present application, the average thread usage rate of each thread pool is obtained by the following formula: (λ₁ + λ) * B / n, wherein

λ₁ represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool;

λ represents the rate at which requests arrive at the queue of the corresponding thread pool of the target server before the migration;

B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;

n represents the number of threads in the corresponding thread pool of the target server. Specifically, taking the server M before the migration and the target server N as an example, it is determined whether the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ exceed the preset usage threshold, wherein the average thread usage rate of the read request thread pool Q1+ is (λ_Q1 + λ_Q1+) * B_Q1+ / n_Q1+, and the average thread usage rate of the write request thread pool Q2+ is (λ_Q2 + λ_Q2+) * B_Q2+ / n_Q2+. This embodiment can accurately calculate the average thread usage rate of each thread pool, thereby better achieving load balancing.
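The usage formula and the admission check can be sketched in Python as follows. The function names, the tuple layout, and all numeric figures are hypothetical and serve only to illustrate the formula (λ₁ + λ) * B / n; the threshold of 0.8 stands in for the empirical value mentioned above.

```python
def projected_usage(lam_moved, lam_target, service_time, threads):
    """Average thread usage of one target pool after the migration:
    (lam_moved + lam_target) * B / n."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return (lam_moved + lam_target) * service_time / threads


def migration_allowed(pool_pairs, usage_threshold):
    """True if every corresponding target pool stays at or below the threshold.

    pool_pairs: iterable of (lam_moved, lam_target, B, n) tuples, one per
    thread pool the partition uses (an illustrative layout).
    """
    return all(
        projected_usage(lam1, lam, b, n) <= usage_threshold
        for lam1, lam, b, n in pool_pairs
    )


# Hypothetical figures for a partition moving from server M to server N.
pairs = [
    (200.0, 300.0, 0.01, 8),  # read pool:  Q1 -> Q1+
    (50.0, 100.0, 0.02, 4),   # write pool: Q2 -> Q2+
]
read_usage = projected_usage(*pairs[0])
write_usage = projected_usage(*pairs[1])
allowed = migration_allowed(pairs, usage_threshold=0.8)
print(read_usage, write_usage, allowed)
```

The check requires every corresponding pool to pass, which mirrors the requirement that the usage rates of all corresponding thread pools after the migration stay within the threshold.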

As shown in FIG. 6, in a specific application example of the present application, a load monitoring method of a distributed storage system includes the following steps:

Step S61, acquiring an unprocessed thread pool in the distributed storage system and determining whether one is acquired; if not, going to step S62, and if yes, going to step S63;

Step S62, ending;

Step S63, acquiring the ratio of the waiting time W_q to the stay time W of a request in the queue of the thread pool, where the stay time W is the sum of the waiting time W_q of a request in the queue of the thread pool and the actual processing time B;

Step S64, determining whether the ratio of the waiting time W_q to the stay time W exceeds a preset exceeding threshold th; if not, going to step S61, and if yes, going to step S65;

Step S65, counting the requests passing through the thread pool by partition to obtain the number of requests belonging to each partition in the thread pool, arranging the partitions in descending order of the number of requests, and determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool; if yes, going to step S66, and if no, going to step S67;

Step S66, performing a split operation on the partition with the largest number of requests;

Step S67, selecting one or more partitions from the partitions arranged in descending order, starting from the first partition, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;

Step S68, determining whether there is an unprocessed partition among the selected partitions; if yes, going to step S69, and if not, going to step S61;

Step S69, taking out one unprocessed partition;

Step S70, searching for a server that meets the condition of having no thread pool with excessive load pressure and determining whether one is found; if not, going to step S68 to take out the next unprocessed partition from the selected partitions and continue the subsequent processing until all selected partitions have been processed; if found, going to step S71;

Step S71, migrating the selected partition to the found server, and then going to step S68.
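The flow of steps S63 to S71 for a single thread pool can be sketched in Python as follows. The function `process_pool`, the dictionary layout of `pool`, and the `fits` callback (which stands in for the usage-threshold test on a candidate server) are assumptions made for illustration; the sketch only records the intended actions and does not perform any split or migration.

```python
def process_pool(pool, candidate_servers, th, fits):
    """Return a list of (action, partition, server) tuples for one pool.

    pool: dict with "wq" (waiting time), "b" (processing time) and
          "requests" (partition name -> request count); hypothetical layout.
    fits(partition, server): hypothetical check that the server can take
          the partition without exceeding the usage threshold.
    """
    wq, b = pool["wq"], pool["b"]
    stay = wq + b                                  # S63: W = W_q + B
    if stay <= 0 or wq / stay <= th:               # S64: not overloaded
        return []
    ordered = sorted(pool["requests"].items(),
                     key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ordered)
    if total == 0:
        return []
    if ordered[0][1] > total / 2:                  # S65 -> S66: split
        return [("split", ordered[0][0], None)]
    actions, remaining = [], total
    for name, count in ordered:                    # S67: select in order
        if remaining < total / 2:
            break
        remaining -= count
        # S68 to S70: look for a server that can take this partition.
        target = next((s for s in candidate_servers if fits(name, s)), None)
        if target is not None:                     # S71: migrate
            actions.append(("migrate", name, target))
    return actions


def fits(partition, server):
    # Pretend partition E fits on no server, to show an abandoned migration.
    return partition != "E"


hot = {"wq": 0.03, "b": 0.01, "requests": {"A": 100, "B": 20, "C": 10}}
flat = {"wq": 0.03, "b": 0.01, "requests": {p: 100 for p in "DEFGH"}}
calm = {"wq": 0.001, "b": 0.01, "requests": {"X": 50}}
split_plan = process_pool(hot, ["N"], 0.5, fits)
move_plan = process_pool(flat, ["N"], 0.5, fits)
calm_plan = process_pool(calm, ["N"], 0.5, fits)
print(split_plan, move_plan, calm_plan)
```

In this sketch a partition for which no server is found is simply skipped, which corresponds to returning to step S68 without migrating it.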

As shown in FIG. 7, according to another aspect of the present application, a load monitoring device of a distributed storage system is provided, and the device 100 includes:

The load monitoring device 1 is configured to determine a thread pool whose load pressure exceeds the standard on each server in the distributed storage system;

The alarm or load balancing device 2 is configured to give an alarm or perform load balancing for each thread pool whose load pressure exceeds the standard. In this embodiment, an alarm is given, or the load is automatically balanced among the servers, according to the state of the thread pools of each server (Server), that is, according to whether the load of a single server exceeds its service capability. The scheme does not depend on the user request mode and can correctly handle requests arriving from different users at the same time. It also does not depend on the service capability of the servers: when the service capabilities of the servers inside the cluster of the distributed storage system are inconsistent, alarms or load balancing can still be performed correctly, thereby preventing hot spots and improving the service quality of the distributed storage system. Specifically, according to the basic information of the thread pools of a server, all thread pools with excessive load pressure on each server in the distributed storage system may be determined, and an alarm or load balancing may be performed for each such thread pool. A single server (Server) generally has several thread pools. A typical thread pool can be described by a single-queue model, and the basic information of the thread pool can be represented as queue parameters. The specific queue parameters may include the following:

e) μ: indicates the service capability of a thread pool per unit time.

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the load monitoring device 1 is configured to acquire the ratio of the waiting time W_q to the stay time W of a request in the queue of each thread pool on each server, where the stay time W is the sum of the waiting time W_q of a request in the queue of the thread pool and the actual processing time B; and, when the ratio of the waiting time W_q to the stay time W exceeds a preset exceeding threshold th, to determine that the load pressure of the thread pool in which the request is located exceeds the standard. The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard. Whether the load pressure of a thread pool exceeds the standard can be judged with the following formula:

W_q / W > th
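The test W_q / W > th can be sketched in Python as follows. How W_q and B are measured is not specified here; the sketch assumes, purely for illustration, that each thread pool reports per-request (waiting time, processing time) samples over some window and that the mean values are used. The names `measured_wait_ratio` and `overloaded_pools` are hypothetical.

```python
from statistics import fmean


def measured_wait_ratio(samples):
    """W_q / W from (wait_seconds, processing_seconds) samples,
    with W = W_q + B and both terms taken as sample means."""
    samples = list(samples)
    if not samples:
        return 0.0
    wq = fmean(wait for wait, _ in samples)
    b = fmean(processing for _, processing in samples)
    stay = wq + b
    return wq / stay if stay > 0 else 0.0


def overloaded_pools(samples_by_pool, th):
    """Names of the thread pools whose ratio W_q / W exceeds th."""
    return [name for name, samples in samples_by_pool.items()
            if measured_wait_ratio(samples) > th]


# Hypothetical measurements for two pools on one server.
samples_by_pool = {
    "read": [(0.03, 0.01), (0.05, 0.01)],
    "write": [(0.001, 0.01)],
}
overloaded = overloaded_pools(samples_by_pool, th=0.5)
print(overloaded)
```

Because the ratio is formed per thread pool from that pool's own waiting and processing times, the test does not require the servers to have equal service capabilities.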

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio λ/μ of the rate λ at which requests arrive at the queue of the thread pool to the service capability μ of the thread pool per unit time. When the request arrival rate λ/μ of the thread pool exceeds its preset threshold, the corresponding ratio W_q/W of the waiting time to the stay time begins to rise sharply, and the preset exceeding threshold is set above the value of W_q/W at the point where the sharp rise begins. In FIG. 3, each curve represents one thread pool, and n represents the number of threads in the corresponding thread pool: there is 1 thread in the first thread pool, 2 threads in the second, 5 threads in the third, 10 threads in the fourth, and 24 threads in the fifth. The abscissa indicates the request arrival rate λ/μ, and the ordinate indicates the ratio W_q/W of the waiting time W_q to the stay time W. It can be seen from FIG. 3 that W_q/W starts to rise sharply once the request arrival rate λ/μ exceeds a certain value (the preset threshold of the request arrival rate of the thread pool). The point at which W_q/W starts to rise sharply is the inflection point, so the preset exceeding threshold of W_q/W only needs to exceed the value at the inflection point. For example, in practice the value at the inflection point can be taken as 0.5, in which case the preset exceeding threshold only needs to be greater than 0.5. In this embodiment, an accurate preset exceeding threshold is determined according to the preset threshold of the request arrival rate of the thread pool, so that all thread pools whose load pressure exceeds the standard on each server can be identified more accurately.
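The shape of such curves can be explored with a short Python sketch. The application only states that a thread pool can be described by a single-queue model; the sketch below additionally assumes an M/M/n queue (Poisson arrivals, exponential processing times, n threads) and uses the standard Erlang C formula, so it is an illustration under that assumption and not a reproduction of FIG. 3. The function names are hypothetical.

```python
from math import factorial


def erlang_c(n, a):
    """Probability that an arriving request has to queue in an M/M/n
    system with offered load a = lambda * B (requires a < n)."""
    top = a ** n / factorial(n) * n / (n - a)
    bottom = sum(a ** k / factorial(k) for k in range(n)) + top
    return top / bottom


def mmn_wait_ratio(n, rho):
    """W_q / W for an M/M/n pool at request arrival rate rho = lambda / mu,
    where mu = n / B is the service capability of the whole pool."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    a = n * rho
    # With B as the time unit: W_q = C(n, a) / (n - a) and W = W_q + 1.
    wq = erlang_c(n, a) / (n - a)
    return wq / (wq + 1.0)


# Thread counts taken from the description of FIG. 3.
for n in (1, 2, 5, 10, 24):
    row = [round(mmn_wait_ratio(n, rho), 3) for rho in (0.3, 0.5, 0.7, 0.9)]
    print(n, row)
```

Under this assumption the ratio for a single-thread pool equals λ/μ itself, and pools with more threads keep the ratio low until λ/μ is closer to 1, after which it rises steeply.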

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to count, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, to obtain the number of requests belonging to each partition in the thread pool, and to sort the partitions in descending order of the number of requests; and to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool and, if so, to perform a split operation on the partition with the largest number of requests. Specifically, for example, a thread pool whose load pressure exceeds the standard has three partitions, partition A, partition B, and partition C, where the number of requests belonging to partition A is 100, the number belonging to partition B is 20, and the number belonging to partition C is 10. The number of requests of partition A, which has the largest number of requests, exceeds half of the total number of requests of all partitions of the thread pool, 65 = (100 + 20 + 10) / 2. In this embodiment, request information is extracted from each thread pool whose load pressure exceeds the standard and analyzed. Thread pools and partitions are not in one-to-one correspondence: a partition is only a logical unit, and the requests belonging to one partition (Partition) may be processed by multiple thread pools. The requests passing through the thread pool are therefore counted by partition (Partition) to obtain the number of requests belonging to each partition, and the partitions are then arranged in descending order of the number of requests. If the number of requests of one partition in a thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and the process jumps to step S23. In step S23, because the requests belonging to the selected partition account for more than half of the total number of requests of all partitions of the thread pool, the selected partition needs to be split, and the processing ends after the split. In this embodiment, a partition whose number of requests is the largest and exceeds half of the total number of requests of all partitions of the thread pool can be regarded as the partition with a significant influence on the excessive load pressure of the thread pool; it is therefore selected and split, which effectively achieves load balancing.

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to divide the partition into a plurality of sub-partitions and distribute the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal. Specifically, the split points are chosen so that the requests that fall within the key range of the partition on the thread pool are divided evenly. For example, the key range of a certain partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 requests in the range 0.2 to 0.3, and 200 requests in the range 0.3 to 0.4. The partition can then be divided into three sub-partitions whose sub-key ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4, respectively, thereby better achieving load balancing.
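One possible way to choose split points that divide the requests evenly is sketched below in Python. The function `split_points` and the use of integer keys (thousandths of the key values in the example, so that 100 stands for 0.1 and 399 for 0.399) are assumptions for illustration; the application does not specify how the split points are computed.

```python
def split_points(request_keys, parts):
    """Return parts - 1 boundary keys so that each sub-key range
    [boundary_i, boundary_i+1) receives roughly the same number of
    the observed requests."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    keys = sorted(request_keys)
    if len(keys) < parts:
        raise ValueError("not enough requests to split")
    return [keys[len(keys) * i // parts] for i in range(1, parts)]


# 600 hypothetical requests spread evenly over integer keys 100..399,
# two requests per key, standing in for the range 0.1 to 0.4.
keys = [k for k in range(100, 400) for _ in range(2)]
boundaries = split_points(keys, 3)
print(boundaries)
```

When a few keys receive most of the requests, neighbouring boundaries computed this way may coincide, in which case fewer sub-partitions can actually be formed.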

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool; if not, to select one or more partitions from the partitions arranged in descending order, starting from the first partition, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool; and to migrate the selected partitions. Here, if the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, partitions are selected one after another, starting from the first partition in the descending order of the thread pool, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are then migrated. Because the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, no single partition dominates the requests of the thread pool, so the partitions selected in order from the first partition are migrated one by one. For example, a thread pool whose load pressure exceeds the standard has five partitions: partition D, partition E, partition F, partition G, and partition H, each with 100 requests. The first three partitions D, E, and F need to be selected and migrated one by one, so that the total number of requests belonging to the remaining partitions G and H, 200 = 100 + 100, is less than half of the total number of requests of all partitions of the thread pool, 250 = (100 + 100 + 100 + 100 + 100) / 2. This embodiment can accurately find the partitions in a thread pool that need to be migrated, thereby better achieving load balancing.

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to migrate each selected partition to a server that has no thread pool with excessive load pressure, thereby achieving load balancing.

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to search for a server that meets the condition of having no thread pool with excessive load pressure and, if one is found, to migrate the selected partition to the found server. In this embodiment, the selected partition is migrated only once a server meeting this condition has been found, thereby better achieving load balancing.

In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the server that meets the condition that there is no thread pool with excessive load pressure includes:

If, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage threshold, the server is a server that meets the condition of having no thread pool with excessive load pressure. Specifically, a partition is taken from the set to be migrated, that is, from all selected partitions, and the number of its requests on each thread pool of the current server is obtained. For example, a selected partition on server M uses two thread pools to process the requests belonging to it, namely a read request thread pool Q1 and a write request thread pool Q2. One server is randomly selected from the set of servers that have no thread pool with excessive load pressure, for example server N. It is then calculated whether, after the requests of the selected partition in the read request thread pool Q1 are migrated to the corresponding read request thread pool Q1+ on server N and its requests in the write request thread pool Q2 are migrated to the corresponding write request thread pool Q2+ on server N, the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ on server N do not exceed the preset usage threshold. The preset usage threshold may be an empirical value. That is, if the usage rates of all corresponding thread pools after the migration do not exceed the preset usage threshold, the migration is allowed; otherwise, another server that has no thread pool with excessive load pressure is selected and the process is repeated. If all servers that have no thread pool with excessive load pressure have been checked and none meets the condition, the migration of the partition (Partition) is abandoned. This embodiment can accurately find a server that has no thread pool with excessive load pressure, thereby better achieving load balancing.

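In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the average thread usage rate of each thread pool is obtained by the following formula: (λ₁ + λ) * B / n, wherein

λ₁ represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool;

λ represents the rate at which requests arrive at the queue of the corresponding thread pool of the target server before the migration;

B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;

n represents the number of threads in the corresponding thread pool of the target server.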

In summary, the present application determines all thread pools with excessive load pressure on each server in a distributed storage system, and gives an alarm or performs load balancing for each thread pool whose load pressure exceeds the standard. It can give an alarm or automatically balance the load among the servers according to the thread pool state of the servers, that is, according to whether the load of a single server exceeds its service capability. It does not depend on the user request mode and can correctly handle requests arriving from different users at the same time. It also does not depend on the service capability of the servers: when the service capabilities of the servers inside the cluster of the distributed storage system are inconsistent, alarms or load balancing can still be performed correctly, thereby preventing hot spots and improving the service quality of the distributed storage system.

It will be apparent to those skilled in the art that various modifications and changes can be made to the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations fall within the scope of the claims of the present application and their equivalents, the present application is also intended to cover such modifications and variations.

It should be noted that the present invention can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Likewise, the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like. Additionally, some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.

Additionally, a portion of the invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or technical solution in accordance with the present invention. The program instructions for invoking the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted by a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Herein, an embodiment in accordance with the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to operate based on the aforementioned methods and/or technical solutions in accordance with various embodiments of the present invention.

It is apparent to those skilled in the art that the present invention is not limited to the details of the above-described exemplary embodiments, and that the present invention can be embodied in other specific forms without departing from the spirit or essential characteristics of the invention. Therefore, the present embodiments are to be considered as illustrative and not restrictive, and the scope of the invention is defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. Any reference signs in the claims should not be construed as limiting the claim concerned. In addition, it is to be understood that the word "comprising" does not exclude other elements or steps. A plurality of units or devices recited in the device claims may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims

A load monitoring method for a distributed storage system, wherein the method comprises:

Determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system;

Giving an alarm or performing load balancing for each thread pool whose load pressure exceeds the standard.
The method of claim 1, wherein determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system comprises:

Obtaining a ratio of a wait time to a stay time of a request in a queue of each thread pool on each server, the stay time being the sum of a wait time of one request in a queue of each thread pool and an actual processing time;

When the ratio of the waiting time to the staying time exceeds a preset exceeding threshold, it is determined that the load pressure of the thread pool on the server where the request is located exceeds the standard.
The method of claim 2, wherein the preset exceeding threshold is determined according to a preset threshold of a request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capability of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate of the thread pool, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset exceeding threshold exceeds the ratio of the waiting time to the stay time at the point where the sharp rise starts.
The method according to any one of claims 1 to 3, wherein performing load balancing for each thread pool whose load pressure exceeds the standard comprises:

Counting, by partition, the requests passing through each thread pool whose load pressure exceeds the standard to obtain the number of requests belonging to each partition in the thread pool, and arranging the partitions in descending order of the number of requests;

Determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool;

If yes, performing a split operation on the partition with the largest number of requests.
The method of claim 4, wherein performing the split operation on the partition with the largest number of requests comprises:

Dividing the partition into several sub-partitions and distributing the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal.
The method of claim 4, wherein after determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, the method further comprises:

If not, selecting one or more partitions from the partitions arranged in descending order, starting from the first partition, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool;

Performing a migration operation on the selected partitions.
The method of claim 6 wherein the migrating operation of the selected partition comprises:

Migrate each selected partition to a server that has no thread pool with excessive load pressure.
The method of claim 7, wherein migrating each selected partition to a server that has no thread pool with excessive load pressure comprises:

Searching for a server that meets the condition of having no thread pool with excessive load pressure and, if one is found, migrating the selected partition to the found server.
The method of claim 8, wherein the server that meets the condition of having no thread pool with excessive load pressure comprises:

If, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage threshold, the server is a server that meets the condition of having no thread pool with excessive load pressure.
The method of claim 9, wherein the average thread usage rate of each thread pool is obtained by the following formula: (λ₁ + λ) * B / n, wherein

λ₁ represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool;

λ represents the rate at which requests arrive at the queue of the corresponding thread pool of the target server before the migration;

B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;

n represents the number of threads in the corresponding thread pool of the target server.
A load monitoring device for a distributed storage system, wherein the device includes:

a load monitoring device, configured to determine a thread pool in which all load pressures on each server in the distributed storage system exceed the standard;

An alarm or load balancing device that is used to alarm or load balance each thread pool whose load pressure exceeds the standard.
The device according to claim 11, wherein the load monitoring device is configured to obtain a ratio of a waiting time to a stay time of a request in a queue of each thread pool on each server, the stay time being the sum of the waiting time of a request in the queue of each thread pool and the actual processing time; and, when the ratio of the waiting time to the stay time exceeds a preset exceeding threshold, to determine that the load pressure of the thread pool on the server where the request is located exceeds the standard.
The device of claim 12, wherein the preset exceeding threshold is determined according to a preset threshold of a request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capability of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate of the thread pool, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset exceeding threshold exceeds the ratio of the waiting time to the stay time at the point where the sharp rise starts.
The device according to any one of claims 11 to 13, wherein the alarm or load balancing device is configured to count the request of the thread pool that passes each load pressure exceeding the standard according to the partition, and count the membership in the thread pool. The number of requests in different partitions, and the partitions are sorted in descending order of the number of requests; determine whether the number of requests for the partition with the largest number of requests exceeds half of the total number of requests for all partitions of the thread pool, and if so, the request The partition with the largest number of partitions performs the split operation.
The device according to claim 14, wherein the alarm or load balancing device is configured to divide the partition into a plurality of sub-partitions and to distribute the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the numbers of requests of the sub-partitions are substantially equal.
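One way such a split might be computed is sketched below in Python. It assumes that per-key request counts are available in key order, which the claim does not state; the function name and input shape are hypothetical. Each returned group of consecutive keys stands for one sub-key range.

```python
def split_key_range(key_counts, parts):
    """Split keys (sorted by key) into `parts` consecutive groups with
    roughly equal request totals.

    key_counts: list of (key, request_count) tuples, sorted by key.
    Returns a list of lists of keys; each inner list is one sub-key range.
    """
    total = sum(count for _, count in key_counts)
    target = total / parts  # ideal number of requests per sub-partition
    ranges, current, accumulated = [], [], 0
    for key, count in key_counts:
        current.append(key)
        accumulated += count
        # Close the current sub-range once the running total reaches the
        # next multiple of the target; the last sub-range takes the rest.
        if accumulated >= target * (len(ranges) + 1) and len(ranges) < parts - 1:
            ranges.append(current)
            current = []
    if current:
        ranges.append(current)
    return ranges


counts = [("a", 10), ("b", 10), ("c", 10), ("d", 10)]
print(split_key_range(counts, 2))
```

With skewed counts the groups are only approximately balanced, which matches the "substantially equal" wording of the claim.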
The device according to claim 14, wherein the alarm or load balancing device is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and, if not, to select one or more partitions in order, starting from the first partition in the descending ranking, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool; and to migrate the selected partitions.
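The choice between splitting and migrating described in claims 14 and 16 can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name, the dictionary input, and the returned action labels are hypothetical and not taken from the application.

```python
def plan_rebalance(requests_by_partition):
    """Decide whether to split the hottest partition or migrate several.

    requests_by_partition: dict mapping partition name to request count
    for one thread pool whose load pressure exceeds the standard.
    Returns (action, partitions), where action is "split", "migrate" or "none".
    """
    total = sum(requests_by_partition.values())
    if total == 0:
        return ("none", [])
    # Sort partitions in descending order of request count.
    ranked = sorted(requests_by_partition.items(), key=lambda kv: kv[1], reverse=True)
    half = total / 2
    # A single partition carrying more than half of the requests is split.
    if ranked[0][1] > half:
        return ("split", [ranked[0][0]])
    # Otherwise select from the top of the ranking until the requests left
    # on the unselected partitions fall below half of the total.
    selected, remaining = [], total
    for name, count in ranked:
        if remaining < half:
            break
        selected.append(name)
        remaining -= count
    return ("migrate", selected)


print(plan_rebalance({"p1": 60, "p2": 30, "p3": 10}))
print(plan_rebalance({"p1": 40, "p2": 35, "p3": 25}))
```

In the second call no partition exceeds half of the 100 requests, so the two busiest partitions are selected, leaving 25 requests on the overloaded thread pool.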
The device according to claim 16, wherein the alarm or load balancing device is configured to migrate each selected partition to a server that has no thread pool whose load pressure exceeds the standard.
The device according to claim 17, wherein the alarm or load balancing device is configured to search for a qualified server that has no thread pool whose load pressure exceeds the standard, and, if one is found, to migrate the selected partition to the found server.
The device according to claim 18, wherein a qualified server that has no thread pool whose load pressure exceeds the standard is determined as follows:

if, after a selected partition is migrated to the corresponding thread pool of a server that has no thread pool whose load pressure exceeds the standard, the average thread usage rate of each corresponding thread pool of the migration target server does not exceed a preset usage rate threshold, then the server is a qualified server that has no thread pool whose load pressure exceeds the standard.
The device according to claim 19, wherein the average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) × B / n, wherein

λ1 represents the rate at which requests arrive at the queue of the thread pool on the server before the migration;

λ represents the rate at which requests arrive at the queue of the corresponding thread pool on the migration target server before the migration;

B represents the actual processing time of one request by each thread in the corresponding thread pool of the migration target server;

n represents the number of threads in the corresponding thread pool of the migration target server.
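The formula of claim 20 and the eligibility test of claim 19 can be combined in a short Python sketch. The function and parameter names and the example threshold of 0.7 are hypothetical. The sketch also assumes that λ1 is the request rate the migrated partition brings from the source server, which is one reading of the definition above.

```python
def projected_thread_usage(lambda_1, lambda_target, service_time, thread_count):
    """Average thread usage of the target thread pool after migration:
    (lambda_1 + lambda_target) * B / n.

    lambda_1:      arrival rate moved from the source server (requests/s)
    lambda_target: arrival rate already seen by the target thread pool (requests/s)
    service_time:  B, processing time of one request per thread (s)
    thread_count:  n, number of threads in the target thread pool
    """
    return (lambda_1 + lambda_target) * service_time / thread_count


def is_qualified_target(lambda_1, lambda_target, service_time, thread_count,
                        usage_threshold):
    """True when the projected usage does not exceed the preset threshold."""
    usage = projected_thread_usage(lambda_1, lambda_target, service_time, thread_count)
    return usage <= usage_threshold


# Example: 100 req/s moved onto a pool already serving 300 req/s,
# with 10 ms per request and 8 threads, gives a projected usage of about 0.5.
print(projected_thread_usage(100.0, 300.0, 0.01, 8))
print(is_qualified_target(100.0, 300.0, 0.01, 8, usage_threshold=0.7))
```

The product of arrival rate and processing time gives the average number of busy threads, so dividing by n yields a usage fraction that can be compared directly with the preset usage rate threshold.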