CN106469018B - Load monitoring method and device for distributed storage system


Info

Publication number: CN106469018B (granted); also published as CN106469018A
Application number: CN201510504654.2A
Authority: CN (China)
Legal status: Active
Inventor: 张潇雨
Original and current assignee: Alibaba Group Holding Ltd
Related application: PCT/CN2016/093893 (WO2017028696A1)
Original language: Chinese (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Abstract

The present application provides a load monitoring method and device for a distributed storage system. The method determines all thread pools whose load pressure exceeds a preset standard on each server in the distributed storage system, and then raises an alarm or performs load balancing for each such thread pool. Because the decision is driven by the thread pool state of each server, i.e. by whether the load of a single server exceeds its service capacity, alarms can be raised or load can be automatically and evenly distributed among the servers.

Description

Load monitoring method and device for distributed storage system
Technical Field
The present application relates to the field of computers, and in particular, to a load monitoring method and device for a distributed storage system.
Background
A distributed storage system is a distributed system that provides storage services using a cluster. A user reads and writes a value (Value) by using a key (Key) as the index. For a given key, a user may perform different types of operations, such as writing a value, reading the corresponding value, or deleting the corresponding value; each such operation is referred to as a request. A thread pool in the distributed storage system is a service unit with a certain number of threads: requests are first added to the queue of the thread pool to wait, and idle threads in the pool take requests from the queue in order and process them. A partition (Partition) is the basic scheduling unit of the distributed storage system; the partition to which a key belongs is uniquely determined by the partition's start key (BeginKey) and end key (EndKey), and different partitions do not overlap. A server (Server) in the distributed storage system is the basic unit that provides services. Each server hosts several partitions, requests for different keys are processed by different servers according to the partitions they fall into, and within a server a thread pool is the actual processing unit that handles the requests.
A user's keys are divided into partitions and then stored in order in the distributed file system. Because a single partition can only belong to one server, when the user requests falling within a single partition increase, the load of that server rises, the latency (Latency) experienced by the user grows, and the other partitions on that server are affected as well. Therefore, to ensure that the service capacity of every server in the cluster is fully utilized, a load monitoring scheme is needed to disperse hot spots and improve service quality. The current remedies for request hot spots are the splitting and migration of partitions. Splitting divides one partition into several partitions with different key ranges, and the resulting partitions can be randomly dispersed to other servers; migration moves a partition from one server to another.
Existing means of addressing request hot spots roughly fall into the following three types:
1. When the size of a single partition exceeds a certain limit, the partition is split evenly into several partitions. However, splitting by partition size alone cannot accurately reflect the processing load on a partition: users' request patterns differ, so the influence of a partition's size also differs, and a hot spot can still occur when user requests are concentrated in a small range even though the partition itself is small.
2. When the query rate per second (QPS) of requests to a single partition exceeds a certain threshold, the partition is split according to the range of the user requests. However, because a QPS threshold effectively measures server processing capacity, different servers need to be assigned different threshold values, and the theoretical processing capacity may not be reached when other programs are running on a server.
3. Some parameters of request execution are collected, such as IO operation time and cache hit counts, a rule is configured over them, and splitting is executed when a preset condition is met. Although configuring rules over such parameters is flexible, the rules differ from scenario to scenario and must be updated as the user's request pattern changes, so this approach is not automatic enough.
Disclosure of Invention
An object of the present application is to provide a load monitoring method and device for a distributed storage system, which can solve the problem of hot spots occurring in the distributed storage system.
According to an aspect of the present application, there is provided a load monitoring method of a distributed storage system, the method including:
determining all thread pools with excessive load pressure on each server in the distributed storage system;
and alarming or load balancing each thread pool with the excessive load pressure.
Further, in the above method, determining all thread pools whose load pressures on each server in the distributed storage system exceed a standard includes:
acquiring, for each thread pool on each server, the ratio of the waiting time of a request in the queue of the thread pool to its sojourn time, wherein the sojourn time is the sum of the waiting time of a request in the queue of the thread pool and its actual processing time;
and when the ratio of the waiting time to the sojourn time exceeds a preset exceeding threshold, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard.
Further, in the above method, the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time. When the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of waiting time to sojourn time starts to rise sharply, and the preset exceeding threshold is set above the ratio of waiting time to sojourn time at the point where this sharp rise begins.
Further, in the above method, load balancing is performed on each thread pool whose load pressure exceeds the standard, including:
counting, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, obtaining the number of requests belonging to the different partitions in the thread pool, and arranging the partitions in descending order of request count;
determining whether the request number of the partition with the largest request number exceeds half of the total number of the requests of all the partitions of the thread pool,
and if so, splitting the partition with the largest number of requests.
Further, in the method, the splitting the partition with the largest number of requests includes:
dividing the partition into a plurality of sub-partitions and dispersing the sub-partitions to other servers, wherein each sub-partition corresponds to one sub-key range within the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
Further, after determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, the method further includes:
if not, selecting one or more partitions in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
and performing migration operation on the selected partition.
Further, in the foregoing method, migrating the selected partitions includes:
migrating each selected partition to a server that has no thread pool whose load pressure exceeds the standard.
Further, in the foregoing method, migrating each selected partition to a server without an over-pressured thread pool includes:
searching for a server that meets the conditions and has no thread pool with excessive load pressure, and if such a server is found, migrating the selected partition to the found server.
Further, in the above method, a server that meets the conditions and has no over-pressured thread pool is determined as follows:
after a selected partition is migrated to the corresponding thread pools of a server that has no over-pressured thread pool, if the average thread usage of each corresponding thread pool on that target server does not exceed a preset usage threshold, the server qualifies.
Further, in the above method, the average thread usage of each corresponding thread pool is determined by the formula (λ1 + λ) * B / n, where:
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of that thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to.
According to another aspect of the present application, there is also provided a load balancing apparatus of a distributed storage system, the apparatus including:
the load monitoring device is used for determining all thread pools with overproof load pressure on each server in the distributed storage system;
and the alarm or load balancing device is used for alarming or load balancing each thread pool with the exceeding load pressure.
Further, in the above device, the load monitoring apparatus is configured to acquire, for each thread pool on each server, the ratio of the waiting time of a request in the queue of the thread pool to its sojourn time, where the sojourn time is the sum of the waiting time of a request in the queue of the thread pool and its actual processing time; and, when the ratio of the waiting time to the sojourn time exceeds a preset exceeding threshold, to determine that the load pressure of the thread pool on the server where the request is located exceeds the standard.
Further, in the above device, the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time. When the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of waiting time to sojourn time starts to rise sharply, and the preset exceeding threshold is set above the ratio of waiting time to sojourn time at the point where this sharp rise begins.
Further, in the above device, the alarm or load balancing means is configured to count, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, obtain the number of requests belonging to the different partitions in the thread pool, and arrange the partitions in descending order of request count; and to judge whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, and if so, to split that partition.
Further, in the above device, the alarm or load balancing apparatus is configured to divide the partition into a plurality of sub-partitions, and distribute the sub-partitions to other servers, where each sub-partition corresponds to a sub-key range in the key range of the partition, and the number of requests to which each sub-partition belongs is substantially equal.
Further, in the above apparatus, the alarm or load balancing device is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if not, sequentially select one or more partitions from a first partition among the partitions arranged in a descending order until the total number of requests to which the unselected remaining partitions belong is less than half of the total number of requests of all partitions of the thread pool; and performing migration operation on the selected partition.
Further, in the above apparatus, the alarm or load balancing means is configured to migrate each selected partition to a server that has no thread pool whose load pressure exceeds the standard.
Further, in the above apparatus, the alarm or load balancing device is configured to search for a server that meets the conditions and has no over-pressured thread pool, and if such a server is found, to migrate the selected partition to the found server.
Further, in the foregoing apparatus, a server that meets the conditions and has no over-pressured thread pool is determined as follows:
after a selected partition is migrated to the corresponding thread pools of a server that has no over-pressured thread pool, if the average thread usage of each corresponding thread pool on that target server does not exceed a preset usage threshold, the server qualifies.
Further, in the above apparatus, the average thread usage of each corresponding thread pool is determined by the formula (λ1 + λ) * B / n, where:
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of that thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to.
In addition, the present application also provides a load monitoring device for a distributed storage system, including:
a processor;
and a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining all thread pools with excessive load pressure on each server in the distributed storage system;
and alarming or load balancing each thread pool with the excessive load pressure.
Compared with the prior art, the present application raises an alarm or performs load balancing according to the thread pool state of each server, i.e. according to whether the load of a single server exceeds its service capacity. Alarms can therefore be raised, or load can be automatically and evenly distributed among the servers, based on the servers' thread pool states. The approach does not depend on the users' request patterns, so simultaneously arriving requests from different users are handled correctly; it also does not depend on the service capacity of any particular server, so alarms and load balancing are still executed correctly when the service capacities of the servers in the cluster of the distributed storage system are inconsistent. Hot spots are thereby prevented and the service quality of the distributed storage system is improved.
Further, the present application compares the ratio of the waiting time Wq to the sojourn time W with a preset exceeding threshold th, so that all thread pools with excessive load pressure on each server can be obtained accurately.
Furthermore, the accurate preset exceeding threshold value is determined according to the preset threshold value of the request arrival rate of the thread pool, so that all the thread pools with the exceeding load pressure on each server can be obtained more accurately.
Furthermore, the present application counts, by partition, the requests of each thread pool whose load pressure exceeds the standard, obtains the number of requests belonging to the different partitions in the thread pool, arranges the partitions in descending order of request count, and splits the partition with the most requests when its request count exceeds half of the total number of requests of all partitions of the thread pool, so that the partition that needs the splitting operation can be found accurately and load balancing is achieved effectively.
Further, when the request count of the partition with the most requests does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated, so that the partitions in the thread pool that need to be migrated can be found accurately and load balancing is achieved better.
Furthermore, a selected partition is migrated only after a server that meets the conditions and has no over-pressured thread pool has been found, and it is migrated to that found server, which achieves load balancing better.
Furthermore, in the present application, a server qualifies as a server that meets the conditions and has no over-pressured thread pool if, after a selected partition is migrated to its corresponding thread pools, the average thread usage of each corresponding thread pool on that target server does not exceed a preset usage threshold; such servers can thus be found accurately, achieving better load balancing.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a method of load monitoring of a distributed storage system in accordance with an aspect of the subject application;
FIG. 2 is a flow chart illustrating a preferred embodiment of a load monitoring method for a distributed storage system according to the present application;
FIG. 3 illustrates a schematic diagram of determining the preset exceeding threshold according to one embodiment of the present application;
FIG. 4 illustrates a flow diagram of another preferred embodiment of a method for load monitoring of a distributed storage system according to the present application;
FIG. 5 illustrates a flow diagram of yet another preferred embodiment of a method for load monitoring of a distributed storage system according to the present application;
FIG. 6 is a flowchart of an embodiment of a load monitoring method for a distributed storage system according to the present application;
FIG. 7 illustrates a block diagram of a load monitoring device of a distributed storage system in accordance with another aspect of the subject application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), among computer readable media. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory media such as modulated data signals and carrier waves.
As shown in fig. 1, the present application provides a load monitoring method for a distributed storage system, including:
step S1, determining all thread pools with excessive load pressure on each server in the distributed storage system;
and step S2, performing alarming or load balancing on each thread pool whose load pressure exceeds the standard. In this embodiment, an alarm is raised, or the load is automatically and evenly distributed among servers, according to the thread pool state of the servers, i.e. according to whether the load of a single server exceeds its service capacity. The method can correctly handle simultaneously arriving requests from different users without depending on the users' request patterns, does not depend on the service capacity of any particular server, and performs alarming or load balancing correctly even when the service capacities of the servers in the cluster of the distributed storage system are inconsistent, thereby preventing hot spots and improving the service quality of the distributed storage system. Specifically, all thread pools with excessive load pressure on each server in the distributed storage system can be determined from the basic thread pool information of the server, and alarming or load balancing can then be performed on each such thread pool. A single server generally has several thread pools; a typical thread pool can be described by a single-queue model, so the basic thread pool information can be expressed as queue parameters, which may include the following:
a) Wq: the waiting time (latency) of a request in the queue of a thread pool.
b) B: the actual processing time of a request in a thread pool.
c) W: the sojourn time of a request in a thread pool, i.e. the waiting time Wq plus the actual processing time B.
d) λ: the rate at which requests for a thread pool arrive at the queue of the thread pool.
e) μ: the service capacity of a thread pool per unit time.
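For concreteness, these queue parameters can be gathered into a small per-pool record. The sketch below is illustrative only; the class and field names (ThreadPoolStats, wq, b, and so on) are assumptions of this write-up, not identifiers from the patent.

```python
from dataclasses import dataclass


@dataclass
class ThreadPoolStats:
    """Per-pool queue statistics sampled on one server (field names are illustrative)."""
    wq: float            # W_q: waiting time of a request in the pool's queue
    b: float             # B: actual processing time of a request
    arrival_rate: float  # lambda: rate at which requests arrive at the pool's queue
    service_rate: float  # mu: service capacity of the pool per unit time
    n_threads: int       # number of threads in the pool

    @property
    def w(self) -> float:
        """W: sojourn time, i.e. waiting time W_q plus actual processing time B."""
        return self.wq + self.b
```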
As shown in fig. 2, in a preferred embodiment of the load monitoring method for a distributed storage system according to the present application, in step S1, the determining a thread pool in which all load pressures on each server in the distributed storage system exceed a standard includes:
step S11, obtaining the ratio of the waiting time Wq of a request in the queue of each thread pool on each server to the sojourn time W, where the sojourn time W is the sum of the waiting time Wq of a request in the queue of the thread pool and the actual processing time B;
step S12, when the ratio of the waiting time Wq to the sojourn time W exceeds a preset exceeding threshold th, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard. The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard, and whether a thread pool is overloaded is judged using the following formula:
Wq/W>th
The meaning of this formula is that, when the ratio of the waiting time Wq to the sojourn time W exceeds the preset exceeding threshold th, the load pressure of that thread pool on the server where the request is located is determined to exceed the standard. By comparing the ratio of the waiting time Wq to the sojourn time W with the preset exceeding threshold th, this embodiment can accurately obtain all thread pools with excessive load pressure on each server.
In a preferred embodiment of the load monitoring method for the distributed storage system of the present application, in step S12, the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate λ at which requests arrive at the queue of the thread pool to the service capacity μ of the thread pool per unit time. When the request arrival rate λ/μ of the thread pool exceeds its preset threshold, the corresponding ratio Wq/W of waiting time to sojourn time starts to rise sharply, and the preset exceeding threshold is set above the value of Wq/W at the point where the sharp rise begins. Specifically, the preset threshold of the request arrival rate of the thread pool can be determined with reference to fig. 3. In fig. 3, each curve corresponds to one thread pool and n denotes the number of threads in the corresponding pool: the first pool has 1 thread, the second 2 threads, the third 5 threads, the fourth 10 threads, and the fifth 24 threads. The abscissa is the request arrival rate λ/μ and the ordinate is the ratio Wq/W of the waiting time Wq to the sojourn time W. It can be seen from fig. 3 that Wq/W starts to rise sharply once the request arrival rate λ/μ exceeds a certain value (the preset threshold of the request arrival rate of the thread pool); the value of Wq/W at the point where the sharp rise begins is the inflection point, so the preset exceeding threshold of Wq/W only needs to exceed this inflection point. For example, the inflection point can be taken as 0.5 in practice, in which case the preset exceeding threshold only needs to be greater than 0.5. By determining an accurate preset exceeding threshold from the preset threshold of the thread pool's request arrival rate, all thread pools with excessive load pressure on each server can be obtained more accurately.
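As a hedged illustration of the check in step S12, reusing the ThreadPoolStats record sketched above: the threshold value 0.6 below is an assumed choice that lies above the roughly 0.5 inflection point discussed for fig. 3, not a value prescribed by the patent.

```python
PRESET_THRESHOLD = 0.6  # assumed value, chosen above the ~0.5 inflection point of W_q / W


def is_overloaded(stats: ThreadPoolStats) -> bool:
    """A pool's load pressure is treated as exceeding the standard when W_q / W > th."""
    if stats.w == 0:
        return False  # no observed requests, nothing to flag
    return stats.wq / stats.w > PRESET_THRESHOLD


def overloaded_pools(server_stats: dict[str, ThreadPoolStats]) -> list[str]:
    """Return the names of all thread pools on one server whose load pressure exceeds the standard."""
    return [name for name, s in server_stats.items() if is_overloaded(s)]
```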
As shown in fig. 4, in a preferred embodiment of the load monitoring method for a distributed storage system according to the present application, the load balancing for each thread pool whose load pressure exceeds the standard in step S2 includes:
step S21, counting the requests passing through each thread pool with the overproof load pressure according to partitions, counting the number of the requests belonging to different partitions in the thread pool, and arranging the partitions in a descending order according to the number of the requests;
step S22, determining whether the number of requests for the partition with the largest number of requests exceeds half of the total number of requests for all partitions in the thread pool, if yes, proceeding to step S23,
in step S23, the partition with the largest number of requests is split. Specifically, suppose a thread pool with excessive load pressure serves three partitions, partition A, partition B, and partition C, where 100 requests belong to partition A, 20 to partition B, and 10 to partition C. Then the request count of partition A, the partition with the most requests, exceeds half of the total number of requests of all partitions of the thread pool, i.e. 100 > (100+20+10)/2 = 65. In this embodiment, request information is extracted from each over-pressured thread pool for analysis, because thread pools do not correspond one-to-one with partitions: a partition is only a logical unit, and requests belonging to one partition may be processed by several thread pools. The requests passing through the thread pool are counted by partition, the number of requests belonging to each partition is obtained, and the partitions are then arranged in descending order of request count. If the request count of the partition with the most requests in a thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and the process moves to step S23; in step S23, because the requests belonging to the selected partition account for half or more of all the requests of the thread pool, the selected partition is split, and the process ends after the split. By finding a partition whose request count exceeds half of the total requests of all partitions of the thread pool, this embodiment identifies the partition that has a dominant influence on the excess load pressure, so that partition is selected and split, effectively achieving load balancing.
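A minimal sketch of this counting and decision step, assuming each observed request is tagged with the identifier of the partition it belongs to; the function name analyse_pool_requests is illustrative.

```python
from collections import Counter


def analyse_pool_requests(partition_of_request: list[str]) -> tuple[list[tuple[str, int]], bool]:
    """Count requests per partition for one over-pressured pool, sort descending,
    and report whether the busiest partition owns more than half of all requests."""
    counts = Counter(partition_of_request)
    ordered = counts.most_common()          # partitions in descending order of request count
    total = sum(counts.values())
    busiest = ordered[0][1] if ordered else 0
    return ordered, busiest > total / 2


# Example from the text: partition A has 100 requests, B has 20, C has 10.
ordered, needs_split = analyse_pool_requests(["A"] * 100 + ["B"] * 20 + ["C"] * 10)
assert needs_split   # 100 > (100 + 20 + 10) / 2 = 65
```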
In a preferred embodiment of the load monitoring method for the distributed storage system of the present application, in step S23, the splitting operation is performed on the partition with the largest number of requests, including:
dividing the partition into a plurality of sub-partitions and dispersing the sub-partitions to other servers, where each sub-partition corresponds to one sub-key range within the key range of the partition and the number of requests belonging to each sub-partition is substantially equal. Specifically, the requests from this partition that fall on the thread pool are sorted, and the split points are chosen so that each resulting range holds an equal share of the requests. For example, suppose the key range of a partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 requests in the range 0.2 to 0.3, and 200 requests in the range 0.3 to 0.4; the partition can then be divided into three sub-partitions whose sub-key ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4 respectively, so that load balancing is achieved better.
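The split-point selection can be sketched as follows, assuming the requests' keys can be sorted; choose_split_keys and the float keys are illustrative assumptions, not the patent's own interface.

```python
def choose_split_keys(request_keys: list[float], n_sub_partitions: int) -> list[float]:
    """Pick split keys so that each sub-key range holds a roughly equal share of the
    observed requests; float keys are used purely for illustration."""
    ordered = sorted(request_keys)
    step = len(ordered) // n_sub_partitions
    return [ordered[i * step] for i in range(1, n_sub_partitions)]


# Example from the text: 600 requests spread evenly over the key range 0.1 to 0.4.
keys = [0.1 + 0.3 * i / 600 for i in range(600)]
print(choose_split_keys(keys, 3))   # split keys near 0.2 and 0.3
```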
As shown in fig. 5, in a preferred embodiment of the load monitoring method of the distributed storage system according to the present application, after determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool in step S22, the method further includes:
if not, go to step S24,
step S24, one or more partitions are selected in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
step S25, a migration operation is performed on the selected partitions. Here, if the partition with the most requests does not account for more than half of the total requests of all partitions of the thread pool, partitions are selected in sequence starting from the first partition in the descending order until the total number of requests belonging to the remaining partitions is less than half of the total requests of all partitions of the thread pool, and the process proceeds to step S25. In step S25, because no single partition accounts for more than half of the thread pool's requests, there is no partition with a dominant influence among the requests of the thread pool, so the partitions selected from the front of the order are migrated one by one. For example, suppose a thread pool with excessive load pressure serves five partitions, partition D, partition E, partition F, partition G, and partition H, each with 100 requests. Then the first three partitions D, E, and F are selected for migration one by one, so that the total number of requests of the remaining partitions G and H, 100+100 = 200, is less than half of the total number of requests of all partitions of the thread pool, (100+100+100+100+100)/2 = 250. This embodiment can accurately find the partitions that need to be migrated out of a thread pool, thereby achieving load balancing better.
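A sketch of this selection rule, taking the partitions already arranged in descending order of request count; the worked example reproduces the D through H case above.

```python
def select_partitions_to_migrate(ordered: list[tuple[str, int]]) -> list[str]:
    """Select partitions from the front of the descending list until the requests
    belonging to the unselected remainder drop below half of the pool's total."""
    total = sum(count for _, count in ordered)
    selected, remaining = [], total
    for name, count in ordered:
        if remaining < total / 2:
            break
        selected.append(name)
        remaining -= count
    return selected


# Example from the text: five partitions D..H with 100 requests each.
print(select_partitions_to_migrate([("D", 100), ("E", 100), ("F", 100), ("G", 100), ("H", 100)]))
# -> ['D', 'E', 'F'] because the remaining G and H hold 200 requests, which is below 250
```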
In a preferred embodiment of the load monitoring method for a distributed storage system of the present application, the migrating a selected partition includes:
and migrating each selected partition to a server that has no thread pool whose load pressure exceeds the standard, thereby achieving load balancing.
In a preferred embodiment of the load monitoring method for a distributed storage system of the present application, migrating each selected partition to a server without an over-pressured thread pool includes:
searching for a server that meets the conditions and has no thread pool with excessive load pressure, and if such a server is found, migrating the selected partition to the found server. In this embodiment, a selected partition is migrated to the found server only after a server that meets the conditions and has no over-pressured thread pool has been found, thereby achieving load balancing better.
In a preferred embodiment of the load monitoring method for a distributed storage system, the server of the eligible thread pool with no overproof load pressure includes:
and if the average thread usage of each corresponding thread pool of the target server does not exceed a preset usage threshold after the selected partition is migrated to the corresponding thread pools of a server that has no over-pressured thread pool, the server is a server that meets the conditions and has no over-pressured thread pool. Specifically, one partition is taken from the set to be migrated, i.e. from all the selected partitions, and the number of requests of this partition on all thread pools of the current server is obtained. For example, a selected partition on server M uses two thread pools to process the requests belonging to it, a read-request thread pool Q1 and a write-request thread pool Q2. One server is chosen at random from the set of servers without over-pressured thread pools, say server N. If, after the requests of the selected partition are moved from the read-request thread pool Q1 and the write-request thread pool Q2 to the corresponding read-request thread pool Q1+ and write-request thread pool Q2+ on server N, the average thread usage of Q1+ and Q2+ on server N does not exceed the preset usage threshold (which may be an empirical value), the migration is allowed. Otherwise, another server without an over-pressured thread pool is chosen and the process is repeated; if every server without an over-pressured thread pool has been examined and none qualifies, the partition is given up. This embodiment can accurately find a server that meets the conditions and has no over-pressured thread pool, thereby achieving load balancing better.
In a preferred embodiment of the load monitoring method for a distributed storage system of the present application, the average thread usage of each corresponding thread pool is determined by the formula (λ1 + λ) * B / n, where:
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of that thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to. Specifically, for the server M before migration and the candidate target server N, it is checked whether the average thread usage of the read-request thread pool Q1+ and of the write-request thread pool Q2+ stays within the preset usage threshold: the average thread usage of the read-request thread pool Q1+ is calculated as (λQ1 + λQ1+) * BQ1+ / nQ1+, and that of the write-request thread pool Q2+ as (λQ2 + λQ2+) * BQ2+ / nQ2+. This embodiment can accurately calculate the average thread usage of each thread pool, thereby achieving load balancing better.
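A worked sketch of the eligibility check for a candidate target server, following the formula above; the numbers and the 0.8 usage threshold are illustrative assumptions, not values from the patent.

```python
def post_migration_usage(lambda_source: float, lambda_target: float,
                         b_target: float, n_target: int) -> float:
    """Average thread usage of one corresponding pool on the target server after
    migration, following the formula (lambda_1 + lambda) * B / n from the text."""
    return (lambda_source + lambda_target) * b_target / n_target


def target_is_eligible(pools: list[tuple[float, float, float, int]],
                       usage_threshold: float = 0.8) -> bool:
    """A candidate target qualifies only if every corresponding pool (e.g. the read
    pool Q1+ and the write pool Q2+) stays under the threshold after migration.
    The 0.8 threshold is an assumed empirical value, not taken from the patent."""
    return all(post_migration_usage(l1, l, b, n) <= usage_threshold
               for l1, l, b, n in pools)


# Illustrative numbers for the read pool Q1+ and write pool Q2+ of a candidate server N.
print(target_is_eligible([(50.0, 100.0, 0.004, 8),     # (150 * 0.004) / 8  = 0.075
                          (20.0, 200.0, 0.010, 16)]))  # (220 * 0.010) / 16 = 0.1375 -> True
```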
As shown in fig. 6, in a specific application example of the present application, the load monitoring method for a distributed storage system includes the following steps:
step S61, obtaining an unprocessed thread pool in the distributed storage system, and judging whether the thread pool is obtained, if not, going to step S62, if so, going to step S63,
step S62, end;
step S63, obtaining the ratio of the waiting time Wq of a request in the queue of the thread pool to the sojourn time W, where the sojourn time W is the sum of the waiting time Wq of a request in the queue of the thread pool and the actual processing time B;
step S64, judging whether the ratio of the waiting time Wq to the sojourn time W exceeds the preset exceeding threshold th; if not, go to step S61, if yes, go to step S65,
step S65, counting the requests passing through the thread pool according to partitions, counting the number of the requests belonging to different partitions in the thread pool, arranging the partitions according to the number of the requests in a descending order, judging whether the number of the requests of the partition with the largest number of the requests exceeds half of the total number of the requests of all the partitions in the thread pool, if yes, going to step S66, if no, going to step S67,
step S66, splitting the partition with the largest number of requests;
step S67, one or more partitions are selected in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
step S68, judging whether any of the selected partitions remains unprocessed; if yes, go to step S69, if no, go to step S61,
step S69, taking the next unprocessed partition;
step S70, searching for a server that meets the conditions and has no thread pool whose load pressure exceeds the standard, and judging whether such a server is found; if not, go to step S68 to take the next unprocessed partition from the selected partitions and continue, until all the selected partitions have been processed; if yes, go to step S71,
step S71, after the selected partition is migrated to the found server, the process goes to step S68.
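The whole flow of fig. 6 can be tied together in a short orchestration sketch that reuses the helpers sketched earlier; the split and migration callables are injected here because their implementations are system-specific, and all names remain illustrative.

```python
from typing import Callable, Optional


def monitor_server(pool_stats: dict[str, ThreadPoolStats],
                   pool_requests: dict[str, list[str]],
                   split_partition: Callable[[str], None],
                   find_eligible_server: Callable[[str], Optional[str]],
                   migrate_partition: Callable[[str, str], None]) -> None:
    """One pass over a single server, following fig. 6: detect over-pressured pools
    (S61-S64), split the dominant partition (S65-S66), or migrate the leading
    partitions to eligible servers (S67-S71)."""
    for pool_name in overloaded_pools(pool_stats):
        ordered, needs_split = analyse_pool_requests(pool_requests[pool_name])
        if needs_split:
            split_partition(ordered[0][0])
        else:
            for partition in select_partitions_to_migrate(ordered):
                target = find_eligible_server(partition)
                if target is not None:
                    migrate_partition(partition, target)
                # if no eligible server exists, the partition is skipped, as in the text
```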
As shown in fig. 7, according to another aspect of the present application, there is provided a load monitoring apparatus of a distributed storage system, the apparatus 100 including:
the load monitoring device 1 is used for determining all thread pools with overproof load pressure on each server in the distributed storage system;
and the alarm or load balancing device 2 is used for performing alarming or load balancing on each thread pool whose load pressure exceeds the standard. In this embodiment, an alarm is raised, or the load is automatically and evenly distributed among servers, according to the thread pool state of the servers, i.e. according to whether the load of a single server exceeds its service capacity. The apparatus can correctly handle simultaneously arriving requests from different users without depending on the users' request patterns, does not depend on the service capacity of any particular server, and performs alarming or load balancing correctly even when the service capacities of the servers in the cluster of the distributed storage system are inconsistent, thereby preventing hot spots and improving the service quality of the distributed storage system. Specifically, all thread pools with excessive load pressure on each server in the distributed storage system can be determined from the basic thread pool information of the server, and alarming or load balancing can then be performed on each such thread pool. A single server generally has several thread pools; a typical thread pool can be described by a single-queue model, so the basic thread pool information can be expressed as queue parameters, which may include the following:
a) Wq: the waiting time (latency) of a request in the queue of a thread pool.
b) B: the actual processing time of a request in a thread pool.
c) W: the sojourn time of a request in a thread pool, i.e. the waiting time Wq plus the actual processing time B.
d) λ: the rate at which requests for a thread pool arrive at the queue of the thread pool.
e) μ: the service capacity of a thread pool per unit time.
In a preferred embodiment of the load monitoring apparatus of the distributed storage system of the present application, the load monitoring device 1 is configured to obtain the ratio of the waiting time Wq of a request in the queue of each thread pool on each server to the sojourn time W, where the sojourn time W is the sum of the waiting time Wq of a request in the queue of the thread pool and the actual processing time B, and, when the ratio of the waiting time Wq to the sojourn time W exceeds a preset exceeding threshold th, to determine that the load pressure of the thread pool on the server where the request is located exceeds the standard. The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard, and whether a thread pool is overloaded is judged using the following formula:
Wq/W > th
The meaning of this formula is that, when the ratio of the waiting time Wq to the sojourn time W exceeds the preset exceeding threshold th, the load pressure of that thread pool on the server where the request is located is determined to exceed the standard. By comparing the ratio of the waiting time Wq to the sojourn time W with the preset exceeding threshold th, this embodiment can accurately obtain all thread pools with excessive load pressure on each server.
In a preferred embodiment of the load monitoring apparatus of the distributed storage system, the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate λ at which requests arrive at the queue of the thread pool to the service capacity μ of the thread pool per unit time. When the request arrival rate λ/μ of the thread pool exceeds its preset threshold, the corresponding ratio Wq/W of waiting time to sojourn time starts to rise sharply, and the preset exceeding threshold is set above the value of Wq/W at the point where the sharp rise begins. As shown in fig. 3, each curve corresponds to one thread pool and n denotes the number of threads in the corresponding pool: the first pool has 1 thread, the second 2 threads, the third 5 threads, the fourth 10 threads, and the fifth 24 threads; the abscissa is the request arrival rate λ/μ and the ordinate is the ratio Wq/W. It can be seen from fig. 3 that Wq/W starts to rise sharply once the request arrival rate λ/μ exceeds a certain value (the preset threshold of the request arrival rate of the thread pool); the value of Wq/W at the point where the sharp rise begins is the inflection point, so the preset exceeding threshold of Wq/W only needs to exceed this inflection point. For example, the inflection point can be taken as 0.5 in practice, in which case the preset exceeding threshold only needs to be greater than 0.5. By determining an accurate preset exceeding threshold from the preset threshold of the thread pool's request arrival rate, all thread pools with excessive load pressure on each server can be obtained more accurately.
In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to count, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, obtain the number of requests belonging to the different partitions in the thread pool, and arrange the partitions in descending order of request count; and to judge whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, and if so, to split that partition. Specifically, suppose a thread pool with excessive load pressure serves three partitions, partition A, partition B, and partition C, where 100 requests belong to partition A, 20 to partition B, and 10 to partition C. Then the request count of partition A, the partition with the most requests, exceeds half of the total number of requests of all partitions of the thread pool, i.e. 100 > (100+20+10)/2 = 65. In this embodiment, request information is extracted from each over-pressured thread pool for analysis, because thread pools do not correspond one-to-one with partitions: a partition is only a logical unit, and requests belonging to one partition may be processed by several thread pools. The requests passing through the thread pool are counted by partition, the number of requests belonging to each partition is obtained, and the partitions are arranged in descending order of request count. If the request count of the partition with the most requests in a thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected; because the requests belonging to the selected partition account for half or more of all the requests of the thread pool, the selected partition is split. By finding a partition whose request count exceeds half of the total requests of all partitions of the thread pool, this embodiment identifies the partition that has a dominant influence on the excess load pressure, so that partition is selected and split, effectively achieving load balancing.
In a preferred embodiment of the load monitoring apparatus of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to divide the partition into a plurality of sub-partitions and disperse the sub-partitions to other servers, where each sub-partition corresponds to one sub-key range within the key range of the partition and the number of requests belonging to each sub-partition is substantially equal. Specifically, the requests from this partition that fall on the thread pool are sorted, and the split points are chosen so that each resulting range holds an equal share of the requests. For example, suppose the key range of a partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 requests in the range 0.2 to 0.3, and 200 requests in the range 0.3 to 0.4; the partition can then be divided into three sub-partitions whose sub-key ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4 respectively, so that load balancing is achieved better.
In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to judge whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, and if not, to select one or more partitions in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and to perform a migration operation on the selected partitions. Here, if the partition with the most requests does not account for more than half of the total requests of all partitions of the thread pool, partitions are selected in sequence starting from the first partition in the descending order until the total number of requests belonging to the remaining partitions is less than half of the total requests of all partitions of the thread pool, and the selected partitions are migrated. For example, suppose a thread pool with excessive load pressure serves five partitions, partition D, partition E, partition F, partition G, and partition H, each with 100 requests. Then the first three partitions D, E, and F are selected for migration one by one, so that the total number of requests of the remaining partitions G and H, 100+100 = 200, is less than half of the total number of requests of all partitions of the thread pool, (100+100+100+100+100)/2 = 250. This embodiment can accurately find the partitions that need to be migrated out of a thread pool, thereby achieving load balancing better.
In a preferred embodiment of the load monitoring apparatus of the distributed storage system of the present application, the alarm or load balancing device 2 is configured to migrate each selected partition to a server that has no thread pool whose load pressure exceeds the standard, thereby achieving load balancing.
In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to search for a server that meets the conditions and has no over-pressured thread pool, and if such a server is found, to migrate the selected partition to the found server. In this embodiment, a selected partition is migrated to the found server only after a server that meets the conditions and has no over-pressured thread pool has been found, thereby achieving load balancing better.
In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the qualified server without a thread pool with excessive load pressure is determined as follows:
if, after the selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of every corresponding thread pool on that target server does not exceed a preset usage threshold, the server is a qualified server without a thread pool with excessive load pressure. Specifically, one partition is taken from the set to be migrated, i.e. from all selected partitions, and the number of requests of this partition on each thread pool of the current server is obtained. For example, a selected partition on server M uses two thread pools to process its requests, a read-request thread pool Q1 and a write-request thread pool Q2. A server is then taken at random from the set of servers without an overloaded thread pool, say server N, and the requests of the partition that were handled by Q1 and Q2 are assumed to move to the corresponding read-request thread pool Q1+ and write-request thread pool Q2+ on server N. If the average thread usage rates of Q1+ and Q2+ on server N after the migration do not exceed a preset usage threshold (which may be an empirical value), the migration is allowed; otherwise another server without an overloaded thread pool is selected and the check is repeated, until all such servers have been examined. If no qualified server is found, migration of this partition is abandoned. This embodiment can accurately find a qualified server without a thread pool with excessive load pressure, thereby better realizing load balancing.
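A rough sketch of this candidate-server check follows. The data layout (per-pool arrival rates attributed to the partition on the source server, and per-pool rate, processing time and thread count on the target server), the pool names and the function names are assumptions made for the example; the usage estimate is the (λ1 + λ)*B/n formula described in the next paragraphs.

    def can_accept_partition(partition_pool_rates, target_pools, usage_threshold):
        """Check whether a candidate target server could take a selected partition.

        partition_pool_rates: {pool_name: lambda1} -- arrival rate attributed to the
            partition on each thread pool of the source server (e.g. Q1 and Q2).
        target_pools: {pool_name: (lam, B, n)} -- for the corresponding pool on the
            target server (e.g. Q1+ and Q2+): its own pre-migration arrival rate lam,
            per-request processing time B and thread count n.
        usage_threshold: the preset (empirical) average-usage threshold.
        """
        for pool_name, lambda1 in partition_pool_rates.items():
            lam, B, n = target_pools[pool_name]
            if (lambda1 + lam) * B / n > usage_threshold:  # (λ1 + λ)*B/n
                return False
        return True

    def find_target_server(partition_pool_rates, candidate_servers, usage_threshold):
        """Examine servers that have no overloaded thread pool and return the first
        one whose corresponding pools would all stay within the threshold;
        None means migration of this partition is abandoned."""
        for server, target_pools in candidate_servers.items():
            if can_accept_partition(partition_pool_rates, target_pools, usage_threshold):
                return server
        return None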
In a preferred embodiment of the load monitoring device of the distributed storage system of the present application, the average thread usage rate of each thread pool is determined by the formula (λ1 + λ)*B/n, wherein,
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of the thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to. Specifically, for the server M before migration and the target server N to be migrated to, it must be checked for both the read-request thread pool Q1+ and the write-request thread pool Q2+ whether the average thread usage rate stays within the preset usage threshold: the average thread usage rate of the read-request thread pool Q1+ is (λ1_Q1 + λ_Q1+)*B_Q1+/n_Q1+, and that of the write-request thread pool Q2+ is (λ1_Q2 + λ_Q2+)*B_Q2+/n_Q2+. This embodiment can accurately calculate the average thread usage rate of each thread pool, thereby better realizing load balancing.
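As a purely numeric illustration of the formula (all figures below are invented for the example and do not come from the application):

    # Hypothetical figures for the read-request thread pool Q1+ on target server N:
    lambda1_Q1 = 50.0       # req/s the selected partition contributed to Q1 on server M
    lambda_Q1_plus = 120.0  # req/s already arriving at Q1+ on server N before migration
    B_Q1_plus = 0.02        # seconds of actual processing per request in Q1+
    n_Q1_plus = 8           # number of threads in Q1+

    avg_usage = (lambda1_Q1 + lambda_Q1_plus) * B_Q1_plus / n_Q1_plus
    print(avg_usage)        # 0.425 -> allowed if the preset usage threshold is, say, 0.7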
In addition, the present application also provides a load monitoring device of a distributed storage system, including:
a processor;
and a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining all thread pools with excessive load pressure on each server in the distributed storage system;
and alarming or load balancing each thread pool with the excessive load pressure.
In summary, the load monitoring method and device of the present application determine all thread pools with excessive load pressure on each server in the distributed storage system and perform alarming or load balancing on each such thread pool, so that an alarm can be raised, or the load can be automatically balanced among servers, according to the thread pool states of the servers, i.e. when the load of a single server exceeds its service capacity. The method and device are independent of the user request pattern and can correctly handle requests arriving simultaneously from different users; they are also independent of the service capacity of the servers and can correctly perform alarming or load balancing even when the service capacities of the servers in the cluster of the distributed storage system are not identical, thereby preventing hot spots and improving the quality of service of the distributed storage system.
Further, the present application compares the ratio of the waiting time Wq to the sojourn time W against a preset overproof threshold th, so that all thread pools with excessive load pressure on each server can be accurately identified.
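A minimal sketch of this detection step, assuming per-pool averages of the waiting time Wq and the actual processing time B are already collected; the data layout and function name are illustrative only.

    def find_overloaded_pools(servers, th):
        """Return (server, pool) pairs whose ratio of waiting time Wq to
        sojourn time W = Wq + B exceeds the preset overproof threshold th.

        servers: {server_name: {pool_name: (Wq, B)}}, where Wq is the average
        time requests wait in the pool's queue and B their average actual
        processing time."""
        overloaded = []
        for server, pools in servers.items():
            for pool, (Wq, B) in pools.items():
                W = Wq + B  # sojourn time
                if W > 0 and Wq / W > th:
                    overloaded.append((server, pool))
        return overloaded

    # Example: the write pool on server A waits far longer than it processes.
    stats = {"A": {"read": (0.002, 0.010), "write": (0.080, 0.010)},
             "B": {"read": (0.001, 0.010), "write": (0.003, 0.010)}}
    print(find_overloaded_pools(stats, th=0.8))  # -> [('A', 'write')]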
Furthermore, an accurate preset overproof threshold is determined according to a preset threshold of the request arrival rate of the thread pool, so that all thread pools with excessive load pressure on each server can be identified even more accurately.
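The application does not prescribe a particular queueing model, but as an illustration of why fixing an arrival-rate threshold also fixes the ratio threshold, the sketch below evaluates Wq/W for a thread pool modelled as an M/M/n queue (Erlang C); the model choice and all numbers are assumptions for illustration only.

    from math import factorial

    def wait_over_sojourn(rho, n):
        """Wq/W for an M/M/n queue, used only to illustrate how the ratio climbs
        steeply as the request arrival rate approaches the pool's capacity.
        rho: arrival rate divided by total service capacity (what the application
        calls the request arrival rate of the thread pool); n: thread count;
        the mean processing time B is taken as the time unit."""
        a = rho * n  # offered load
        tail = a ** n / (factorial(n) * (1.0 - rho))
        erlang_c = tail / (sum(a ** k / factorial(k) for k in range(n)) + tail)
        Wq = erlang_c / (n * (1.0 - rho))  # mean waiting time in units of B
        return Wq / (Wq + 1.0)             # W = Wq + B, with B = 1

    # The ratio stays modest and then rises sharply near saturation, so the preset
    # overproof threshold th can be read off at the chosen arrival-rate threshold.
    for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(rho, round(wait_over_sojourn(rho, 8), 3))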
Furthermore, the present application counts the requests of each thread pool with excessive load pressure by partition, counts the number of requests belonging to different partitions in the thread pool, and arranges the partitions in descending order by number of requests; when the number of requests of the partition with the largest number exceeds half of the total number of requests of all partitions of the thread pool, that partition is split. The partition that needs a splitting operation can thus be accurately found, and load balancing is effectively realized.
Further, when the number of requests of the partition with the largest number does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated. The partitions to be migrated within a thread pool can thus be accurately found, and load balancing is better achieved.
Furthermore, a selected partition is migrated only after a server that meets the conditions and has no thread pool with excessive load pressure has been found, so that load balancing is better realized.
Furthermore, in the present application, if the average thread usage rate of every corresponding thread pool of the target server does not exceed the preset usage threshold after a selected partition is migrated to the corresponding thread pools of a server without a thread pool with excessive load pressure, that server is a qualified server without a thread pool with excessive load pressure; such a server can be accurately found, so that load balancing is better realized.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, part of the present invention may be implemented as a computer program product, such as computer program instructions, which, when executed by a computer, invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. Program instructions which invoke the methods of the present invention may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the invention herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the invention as described above.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (18)

1. A method for load monitoring of a distributed storage system, wherein the method comprises:
determining all thread pools with excessive load pressure on each server in the distributed storage system;
alarming or load balancing each thread pool with the excessive load pressure, comprising the following steps: counting the requests of each thread pool with the exceeding load pressure according to partitions, counting the number of the requests belonging to different partitions in the thread pool, and arranging the partitions in a descending order according to the number of the requests; and judging whether the request number of the partition with the largest request number exceeds half of the total number of the requests of all the partitions of the thread pool, and if so, splitting the partition with the largest request number.
2. The method of claim 1, wherein determining a thread pool in which all load pressures on each server in the distributed storage system are out of compliance comprises:
acquiring the ratio of the waiting time of a request in the queue of each thread pool on each server to the sojourn time, wherein the sojourn time is the sum of the waiting time of a request in the queue of each thread pool and the actual processing time;
and when the ratio of the waiting time to the sojourn time exceeds a preset overproof threshold, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard.
3. The method as claimed in claim 2, wherein the preset overproof threshold is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate, the corresponding ratio of the waiting time to the sojourn time starts to rise sharply, and the preset overproof threshold exceeds the value of the ratio of the waiting time to the sojourn time at the point where it starts to rise sharply.
4. The method of claim 1, wherein the splitting operation for the partition with the largest number of requests comprises:
dividing the partition into a plurality of sub-partitions, and dispersing the sub-partitions to other servers, wherein each sub-partition corresponds to one sub-key range within the key range of the partition, and the number of requests of each sub-partition is substantially equal.
5. The method of claim 1, wherein determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool further comprises:
if not, selecting one or more partitions in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
and performing migration operation on the selected partition.
6. The method of claim 5, wherein performing the migration operation on the selected partition comprises:
and migrating each selected partition to a server that has no thread pool with excessive load pressure.
7. The method of claim 6, wherein migrating each selected partition to a server that has no thread pool with excessive load pressure comprises:
searching for a server that meets the conditions and has no thread pool with excessive load pressure, and if such a server is found, migrating the selected partition to the found server.
8. The method of claim 7, wherein the server that meets the conditions and has no thread pool with excessive load pressure comprises:
and if, after the selected partition is migrated to the corresponding thread pools of a server without a thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage threshold, that server is the qualified server without a thread pool with excessive load pressure.
9. The method of claim 8, wherein the average thread usage rate of each thread pool is determined by the formula (λ1 + λ)*B/n, wherein,
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of the thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to.
10. A load monitoring device of a distributed storage system, wherein the device comprises:
the load monitoring device is used for determining all thread pools with overproof load pressure on each server in the distributed storage system;
the alarm or load balancing device is used for performing alarming or load balancing on each thread pool with excessive load pressure, and is configured to: count the requests of each thread pool with excessive load pressure by partition, count the number of requests belonging to different partitions in the thread pool, and arrange the partitions in descending order according to the number of requests; and judge whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if so, perform a splitting operation on the partition with the largest number of requests.
11. The apparatus according to claim 10, wherein the load monitoring device is configured to obtain, for each server, the ratio of the waiting time of a request in the queue of each thread pool to the sojourn time, where the sojourn time is the sum of the waiting time and the actual processing time of the request in the queue of the thread pool; and when the ratio of the waiting time to the sojourn time exceeds a preset overproof threshold, determine that the load pressure of the thread pool on the server where the request is located exceeds the standard.
12. The apparatus according to claim 11, wherein the preset overproof threshold is determined according to a preset threshold of the request arrival rate of the thread pool, wherein the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time; when the request arrival rate of the thread pool exceeds the preset threshold of the request arrival rate, the corresponding ratio of the waiting time to the sojourn time starts to rise sharply, and the preset overproof threshold exceeds the value of the ratio of the waiting time to the sojourn time at the point where it starts to rise sharply.
13. The apparatus according to claim 10, wherein the alarm or load balancing means is configured to divide the partition into a plurality of sub-partitions, and distribute the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the number of requests to which each sub-partition belongs is substantially equal.
14. The apparatus according to claim 10, wherein the alarm or load balancing means is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if not, sequentially select one or more partitions from a first partition among the partitions in the descending order until the total number of requests to which the unselected remaining partitions belong is less than half of the total number of requests of all partitions of the thread pool; and performing migration operation on the selected partition.
15. The apparatus of claim 14, wherein said alarm or load balancing means is adapted to migrate each selected partition to a server that has no thread pool with excessive load pressure.
16. The apparatus according to claim 15, wherein the alarm or load balancing means is configured to search for a server that meets the conditions and has no thread pool with excessive load pressure, and if such a server is found, migrate the selected partition to the found server.
17. The apparatus of claim 16, wherein the server that meets the conditions and has no thread pool with excessive load pressure comprises:
and if, after the selected partition is migrated to the corresponding thread pools of a server without a thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of the target server does not exceed a preset usage threshold, that server is the qualified server without a thread pool with excessive load pressure.
18. The apparatus of claim 17, wherein the average thread usage rate of each thread pool is determined by the formula (λ1 + λ)*B/n, wherein,
λ1 represents the rate at which requests in the thread pool on the server arrive at the queue of the thread pool before migration;
λ represents the rate at which requests in the corresponding thread pool on the target server to be migrated to arrive at the queue of that thread pool before migration;
B represents the actual processing time of a request by each thread in the corresponding thread pool of the target server to be migrated to;
n represents the number of threads in the corresponding thread pool of the target server to be migrated to.
CN201510504654.2A 2015-08-17 2015-08-17 Load monitoring method and device for distributed storage system Active CN106469018B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510504654.2A CN106469018B (en) 2015-08-17 2015-08-17 Load monitoring method and device for distributed storage system
PCT/CN2016/093893 WO2017028696A1 (en) 2015-08-17 2016-08-08 Method and device for monitoring load of distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510504654.2A CN106469018B (en) 2015-08-17 2015-08-17 Load monitoring method and device for distributed storage system

Publications (2)

Publication Number Publication Date
CN106469018A CN106469018A (en) 2017-03-01
CN106469018B true CN106469018B (en) 2019-12-27

Family

ID=58050746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510504654.2A Active CN106469018B (en) 2015-08-17 2015-08-17 Load monitoring method and device for distributed storage system

Country Status (2)

Country Link
CN (1) CN106469018B (en)
WO (1) WO2017028696A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109298917B (en) * 2017-07-25 2020-10-30 沈阳高精数控智能技术股份有限公司 Self-adaptive scheduling method suitable for real-time system mixed task
CN109542629A (en) * 2018-12-26 2019-03-29 苏州乐麟无线信息科技有限公司 A kind of processing method and processing device of the data based on distributed system
CN110928661B (en) * 2019-11-22 2023-06-16 北京浪潮数据技术有限公司 Thread migration method, device, equipment and readable storage medium
CN111949482B (en) * 2020-08-13 2022-05-20 广东佳米科技有限公司 Software performance bottleneck indication method and system based on thread load
CN112685196B (en) * 2020-12-24 2023-12-08 湖北华中电力科技开发有限责任公司 Thread pool management method, device, equipment and medium suitable for distributed technology
CN112749013B (en) * 2021-01-19 2024-04-19 广州虎牙科技有限公司 Thread load detection method and device, electronic equipment and storage medium
CN114785796A (en) * 2022-04-22 2022-07-22 中国农业银行股份有限公司 Data equalization method and device
CN115033390B (en) * 2022-08-09 2022-11-25 阿里巴巴(中国)有限公司 Load balancing method and device
CN115934372A (en) * 2023-03-09 2023-04-07 浪潮电子信息产业股份有限公司 Data processing method, system, equipment and computer readable storage medium
CN116107760B (en) * 2023-04-07 2023-07-14 浪潮电子信息产业股份有限公司 Load balancing method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1297198A (en) * 1999-11-18 2001-05-30 国际商业机器公司 Method, system and program product for managing computing environment thread pool to prevent dead-locking
CN101452406A (en) * 2008-12-23 2009-06-10 北京航空航天大学 Cluster load balance method transparent for operating system
CN102207891A (en) * 2011-06-10 2011-10-05 浙江大学 Method for achieving dynamic partitioning and load balancing of data-partitioning distributed environment
CN102567089A (en) * 2011-10-25 2012-07-11 曙光信息产业(北京)有限公司 Design method for thread pool of metadata server in distributed file system
CN102594861A (en) * 2011-12-15 2012-07-18 杭州电子科技大学 Cloud storage system with balanced multi-server load

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8713576B2 (en) * 2006-02-21 2014-04-29 Silicon Graphics International Corp. Load balancing for parallel tasks
US7882285B2 (en) * 2007-12-18 2011-02-01 International Business Machines Corporation Buffer cache management to prevent deadlocks
US8631415B1 (en) * 2009-08-25 2014-01-14 Netapp, Inc. Adjustment of threads for execution based on over-utilization of a domain in a multi-processor system by sub-dividing parallizable group of threads to sub-domains

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1297198A (en) * 1999-11-18 2001-05-30 国际商业机器公司 Method, system and program product for managing computing environment thread pool to prevent dead-locking
CN100383765C (en) * 1999-11-18 2008-04-23 国际商业机器公司 Method, system and program product for managing computing environment thread pool to prevent dead-locking
CN101452406A (en) * 2008-12-23 2009-06-10 北京航空航天大学 Cluster load balance method transparent for operating system
CN102207891A (en) * 2011-06-10 2011-10-05 浙江大学 Method for achieving dynamic partitioning and load balancing of data-partitioning distributed environment
CN102567089A (en) * 2011-10-25 2012-07-11 曙光信息产业(北京)有限公司 Design method for thread pool of metadata server in distributed file system
CN102594861A (en) * 2011-12-15 2012-07-18 杭州电子科技大学 Cloud storage system with balanced multi-server load

Also Published As

Publication number Publication date
CN106469018A (en) 2017-03-01
WO2017028696A1 (en) 2017-02-23

Similar Documents

Publication Publication Date Title
CN106469018B (en) Load monitoring method and device for distributed storage system
US9256448B2 (en) Process grouping for improved cache and memory affinity
US20150295970A1 (en) Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
US20140331235A1 (en) Resource allocation apparatus and method
US20130139172A1 (en) Controlling the use of computing resources in a database as a service
US20180167461A1 (en) Method and apparatus for load balancing
US11275721B2 (en) Adaptive table placement in NUMA architectures
US9483393B1 (en) Discovering optimized experience configurations for a software application
US8572621B2 (en) Selection of server for relocation of application program based on largest number of algorithms with identical output using selected server resource criteria
WO2017016423A1 (en) Real-time new data update method and device
CN112988066B (en) Data processing method and device
CN106534308A (en) Method and device for solving data block access hotspot problem in distributed storage system
US20160203180A1 (en) Index tree search method and computer
US8996450B1 (en) System and method for allocating resources in a mixed SSD and HDD storage environment
WO2014067298A1 (en) Real-time information retrieval acquisition method and device and server
JP6168635B2 (en) Database management system, computer, database management method
CN110737717A (en) database migration method and device
CN110688360A (en) Distributed file system storage management method, device, equipment and storage medium
CN110750498B (en) Object access method, device and storage medium
WO2019226317A1 (en) Tune resource setting levels for query execution
CN107193857B (en) Method and equipment for database traversal
CN106571935B (en) Resource scheduling method and equipment
CN112947851A (en) NUMA system and page migration method in NUMA system
CN111858656A (en) Static data query method and device based on distributed architecture
CN110928649A (en) Resource scheduling method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant