WO2017028696A1 - Method and device for monitoring load of distributed storage system


Info

Publication number
WO2017028696A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread pool
partition
server
requests
partitions
Prior art date
Application number
PCT/CN2016/093893
Other languages
French (fr)
Chinese (zh)
Inventor
张潇雨
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017028696A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of computers, and in particular, to a load monitoring method and device for a distributed storage system.
  • a distributed storage system is a distributed system that uses a cluster to provide storage services.
  • The user uses a key (Key) as an index to read and write the corresponding value (Value).
  • For a given key, the user can perform different operations, such as writing a value, reading the corresponding value, or deleting the corresponding value.
  • Each operation is called a request.
  • A thread pool in a distributed storage system is a service unit that has a certain number of threads. A request first joins the queue of the thread pool, and an idle thread in the thread pool takes the request from the queue for processing.
  • A partition (Partition) is the basic scheduling unit of a distributed storage system. A partition is bounded by a begin key (BeginKey) and an end key (EndKey), so each key (Key) uniquely determines the partition to which it belongs. Partitions do not overlap.
  • A server in a distributed storage system is the basic unit for providing services. Each server hosts a plurality of partitions, and requests for different keys are handled by different servers according to the partitions to which the keys belong. Internally, the server uses thread pools as the actual processing units to handle different requests.
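  • As an illustration of how a key determines its partition and therefore its server, the following is a minimal sketch and not the method of the application. The `Partition` class, the `find_partition` function, the string keys, and the convention that BeginKey is inclusive and EndKey is exclusive are all assumptions made for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    begin_key: str  # BeginKey, assumed inclusive
    end_key: str    # EndKey, assumed exclusive
    server: str     # hypothetical name of the server that owns the partition


def find_partition(partitions, key):
    """Return the partition whose [begin_key, end_key) range contains key.

    `partitions` must be sorted by begin_key and must not overlap.
    """
    begins = [p.begin_key for p in partitions]
    i = bisect_right(begins, key) - 1  # last partition starting at or before key
    if i >= 0 and key < partitions[i].end_key:
        return partitions[i]
    raise KeyError(key)


# Hypothetical layout: three non-overlapping partitions on two servers.
partitions = [
    Partition("a", "h", "server-1"),
    Partition("h", "p", "server-2"),
    Partition("p", "z", "server-1"),
]
```

  • Because the partitions do not overlap, a binary search over the begin keys is enough to route a request; the server field then tells which server handles it.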
  • The user's keys are divided into partitions and then stored in the distributed file system in order. Since a single partition can belong to only one server (Server), an increase in the number of user requests within a single partition increases the load on that server, increases the latency (Latency) seen by users, and also affects the other partitions on the server. Therefore, to make full use of the service capability of all servers in the cluster, a load monitoring scheme is needed to spread hot spots and improve service quality.
  • The current solution to request hot spots is the splitting and migration of partitions.
  • Splitting divides a partition (Partition) into multiple partitions according to different key ranges, and the resulting partitions are randomly distributed to other servers; migration moves a partition (Partition) from one server to another.
  • In the existing scheme, when the queries per second (QPS) of a single partition (Partition) exceed a certain threshold, the partition is split according to the range requested by the user. However, using QPS as the threshold requires the processing capability of each server to be determined, so different values must be configured on different servers; moreover, a server that is also running other programs sometimes cannot reach its theoretical processing capability.
  • An object of the present application is to provide a load monitoring method and device for a distributed storage system, which can solve the problem of hot spots in a distributed storage system.
  • a load monitoring method for a distributed storage system comprising:
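  • determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system includes: obtaining the ratio of the waiting time to the stay time of a request in the queue of each thread pool on each server, where the stay time is the sum of the waiting time and the actual processing time of a request in the queue of the thread pool; and, when the ratio of the waiting time to the stay time exceeds a preset exceeding threshold, determining that the load pressure of that thread pool exceeds the standard.
  • the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time. When the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset exceeding threshold is greater than the ratio of the waiting time to the stay time at the point where the sharp rise starts.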
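  • load balancing is performed on each thread pool whose load pressure exceeds the standard, including:
  • the requests passing through each thread pool whose load pressure exceeds the standard are counted by partition, giving the number of requests belonging to each partition in the thread pool, and the partitions are arranged in descending order of the number of requests;
  • it is determined whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if so, a split operation is performed on that partition.
  • the split operation on the partition with the largest number of requests includes:
  • the partition is divided into several sub-partitions, and the sub-partitions are distributed to other servers, where each sub-partition corresponds to a sub-range of the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
  • if the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, the method further includes: selecting one or more partitions, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool; and performing a migration operation on the selected partitions.
  • the migration operation on the selected partitions includes: migrating each selected partition to a server that has no thread pool whose load pressure exceeds the standard, which in turn includes searching for a server that meets this condition and, if one is found, migrating the selected partition to it.
  • a server meets the condition of having no thread pool with excessive load pressure if, after a selected partition is migrated to the corresponding thread pools of that server, the average thread usage rate of each corresponding thread pool does not exceed a preset usage threshold.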
  • the average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) * B / n, where
  • λ1 represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool
  • λ represents the rate at which requests in the corresponding thread pool of the target server arrive at the queue of that thread pool before the migration
  • B represents the actual processing time of one request for each thread in the corresponding thread pool of the target server
  • n represents the number of threads in the corresponding thread pool of the target server.
  • a load balancing device of a distributed storage system comprising:
  • a load monitoring device configured to determine all thread pools whose load pressure exceeds the standard on each server in the distributed storage system;
  • an alarm or load balancing device configured to raise an alarm for, or perform load balancing on, each thread pool whose load pressure exceeds the standard.
  • the load monitoring device is configured to obtain the ratio of the waiting time to the stay time of a request in the queue of each thread pool on each server, where the stay time is the sum of the waiting time and the actual processing time of a request in the queue of the thread pool; when the ratio of the waiting time to the stay time exceeds a preset exceeding threshold, it is determined that the load pressure of that thread pool on the server exceeds the standard.
  • the preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time. When the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of the waiting time to the stay time starts to rise sharply, and the preset exceeding threshold is greater than the ratio of the waiting time to the stay time at the point where the sharp rise starts.
  • the alarm or load balancing device is configured to count, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, obtaining the number of requests belonging to each partition in the thread pool; sort the partitions in descending order of the number of requests; determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool; and if so, split the partition with the largest number of requests.
  • the alarm or load balancing device is configured to divide the partition into a plurality of sub-partitions and distribute the sub-partitions to other servers, where each sub-partition corresponds to a sub-range of the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
  • the alarm or load balancing device is configured to determine whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool; if not, select one or more partitions, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool; and perform the migration operation on the selected partitions.
  • the alarm or load balancing device is configured to migrate each selected partition to a server that has no thread pool whose load pressure exceeds the standard.
  • the alarm or load balancing device is configured to search for a server that meets the condition of having no thread pool with excessive load pressure and, if one is found, migrate the selected partition to it.
  • a server meets the condition of having no thread pool with excessive load pressure if, after a selected partition is migrated to the corresponding thread pools of that server, the average thread usage rate of each corresponding thread pool does not exceed a preset usage threshold.
  • the average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) * B / n, where
  • λ1 represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool
  • λ represents the rate at which requests in the corresponding thread pool of the target server arrive at the queue of that thread pool before the migration
  • B represents the actual processing time of one request for each thread in the corresponding thread pool of the target server
  • n represents the number of threads in the corresponding thread pool of the target server.
  • By determining the thread pools whose load pressure exceeds the standard on each server in the distributed storage system and performing an alarm or load balancing on each such thread pool, the present application can, based on the thread pool state of a server, raise an alarm or automatically distribute the load among servers when the load of a single server exceeds its service capability. The scheme does not depend on the user request pattern and correctly handles requests arriving simultaneously from different users; nor does it depend on the service capability of the servers, so alarms or load balancing are performed correctly even when the servers inside the distributed storage system have inconsistent service capabilities. This prevents hot spots and improves the service quality of the distributed storage system.
  • Further, by comparing the ratio of the waiting time W_q to the stay time W with the preset exceeding threshold th, the present application can accurately identify all thread pools whose load pressure exceeds the standard on each server.
  • Further, the present application determines an accurate preset exceeding threshold according to a preset threshold of the thread pool's request arrival rate, so that all thread pools with excessive load pressure on each server can be identified more accurately.
  • Further, the present application counts, by partition, the requests passing through each thread pool whose load pressure exceeds the standard, obtains the number of requests belonging to each partition in the thread pool, and arranges the partitions in descending order of the number of requests. If the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, that partition is split. The partition that needs to be split is thus found accurately, which effectively implements load balancing.
  • Further, when the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated. The partitions to be migrated in a thread pool are thus found accurately, which achieves better load balancing.
  • Further, the present application migrates a selected partition only after finding a server that has no thread pool with excessive load pressure, and migrates the partition to that server, thereby better implementing load balancing.
  • Further, if, after a selected partition is migrated to the corresponding thread pools of a target server, the average thread usage rate of each corresponding thread pool does not exceed the preset usage threshold, that server is a qualifying server with no thread pool whose load pressure exceeds the standard. Such servers can thus be found accurately, which achieves better load balancing.
  • FIG. 1 shows a flow chart of a load monitoring method of a distributed storage system in accordance with an aspect of the present application
  • FIG. 2 is a flow chart showing a preferred embodiment of a load monitoring method of the distributed storage system of the present application
  • FIG. 3 illustrates a schematic diagram of a preset over-standard threshold determination according to an embodiment of the present application
  • FIG. 4 is a flow chart showing another preferred embodiment of a load monitoring method of a distributed storage system according to the present application
  • FIG. 5 is a flow chart showing still another preferred embodiment of the load monitoring method of the distributed storage system according to the present application.
  • FIG. 6 is a flow chart showing a specific application embodiment of a load monitoring method of a distributed storage system according to the present application
  • FIG. 7 is a block diagram showing a load monitoring device of a distributed storage system in accordance with another aspect of the present application.
  • the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), and digital versatile disk (DVD) or other optical storage.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • the present application provides a load monitoring method for a distributed storage system, where the method includes:
  • Step S1: determining the thread pools whose load pressure exceeds the standard on each server in the distributed storage system;
  • Step S2: performing an alarm or load balancing on each thread pool whose load pressure exceeds the standard.
  • In this embodiment, an alarm is raised or the load is automatically distributed among the servers according to the thread pool state of a server (Server), that is, when the load of a single server exceeds its service capability. The scheme does not depend on the user request pattern and correctly handles requests arriving simultaneously from different users; nor does it depend on the service capability of the servers, so alarms or load balancing are performed correctly even when the servers inside the distributed storage system have inconsistent service capabilities. This prevents hot spots and improves the service quality of the distributed storage system.
  • Specifically, all thread pools with excessive load pressure on each server in the distributed storage system may be determined, and an alarm or load balancing may then be performed on each thread pool whose load pressure exceeds the standard.
  • a single server generally has several thread pools.
  • a typical thread pool can be described by a single queue model.
  • the basic information of the thread pool can be represented as a queue parameter.
  • the specific queue parameters can include the following contents:
  • a) W_q indicates the waiting time of a request in the queue of a thread pool.
  • b) B indicates the actual processing time of a request in the queue of a thread pool.
  • c) W indicates the stay time of a request in the queue of a thread pool, that is, the waiting time W_q plus the actual processing time B.
  • d) λ indicates the rate at which requests in a thread pool arrive at the queue of the thread pool.
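  • One way these queue parameters could be measured is from per-request timestamps. The following is a minimal sketch under that assumption; the `RequestTiming` class, its field names, and the `arrival_rate` helper are hypothetical and are not named in the application.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    enqueued_at: float  # time the request joined the thread pool queue (seconds)
    started_at: float   # time a thread took the request from the queue
    finished_at: float  # time the thread finished processing the request

    @property
    def wait_time(self):        # W_q: waiting time in the queue
        return self.started_at - self.enqueued_at

    @property
    def processing_time(self):  # B: actual processing time
        return self.finished_at - self.started_at

    @property
    def stay_time(self):        # W = W_q + B
        return self.finished_at - self.enqueued_at


def arrival_rate(timings, window_seconds):
    """Estimate the arrival rate as requests observed per second in a window."""
    return len(timings) / window_seconds
```

  • With timestamps recorded at enqueue, start, and finish, the identity W = W_q + B holds by construction for every request.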
  • In step S1, determining all thread pools whose load pressure exceeds the standard on each server in the distributed storage system includes:
  • Step S11: obtaining the ratio of the waiting time W_q to the stay time W of a request in the queue of each thread pool on each server, where the stay time W is the sum of the waiting time W_q and the actual processing time B of a request in the queue of the thread pool;
  • Step S12: when the ratio of the waiting time W_q to the stay time W exceeds the preset exceeding threshold th, determining that the load pressure of that thread pool on the server exceeds the standard. The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard. Whether a thread pool is overloaded can be judged with the following formula:
  • W_q / W > th
  • The formula means that when the ratio of the waiting time W_q to the stay time W exceeds the preset exceeding threshold th, the load pressure of that thread pool on the server is determined to exceed the standard.
  • By comparing the ratio of the waiting time W_q to the stay time W with the preset exceeding threshold th, all thread pools whose load pressure exceeds the standard on each server can be identified accurately.
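  • The overload test of step S12 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the default threshold of 0.6 is an assumed value chosen for the example, since the application states only that the threshold should exceed the value at the inflection point.

```python
def wait_ratio(wait_time, processing_time):
    """Return W_q / W, where W = W_q + B; returns 0.0 for an idle pool."""
    stay_time = wait_time + processing_time
    return wait_time / stay_time if stay_time > 0 else 0.0


def is_overloaded(wait_time, processing_time, th=0.6):
    """Step S12: the thread pool is overloaded when W_q / W > th.

    th=0.6 is an assumed example value, not one given in the application.
    """
    return wait_ratio(wait_time, processing_time) > th
```

  • Because the test uses a ratio of two times measured on the same thread pool, it needs no per-server capacity setting, which is the property the application relies on.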
  • The preset exceeding threshold in step S12 is determined according to a preset threshold of the request arrival rate of the thread pool, where the request arrival rate of the thread pool is the ratio of the rate λ at which requests arrive at the queue of the thread pool to the service capacity μ of the thread pool per unit time. When the request arrival rate λ/μ exceeds its preset threshold, the corresponding ratio W_q/W of the waiting time to the stay time starts to rise sharply, and the preset exceeding threshold is greater than the value of W_q/W at the point where the sharp rise starts.
  • The preset threshold of the request arrival rate of the thread pool can be determined using FIG. 3.
  • In FIG. 3, each curve represents one thread pool, and n is the number of threads in the corresponding thread pool: the first thread pool has 1 thread, the second has 2 threads, the third has 3 threads, the fourth has 10 threads, and the fifth has 24 threads.
  • The abscissa indicates the request arrival rate λ/μ, and the ordinate indicates the ratio W_q/W of the waiting time W_q to the stay time W.
  • As can be seen from FIG. 3, W_q/W starts to rise sharply once the request arrival rate λ/μ exceeds a certain value (the preset threshold of the thread pool's request arrival rate). The point at which W_q/W starts to rise sharply is the inflection point, so the preset exceeding threshold of W_q/W only needs to exceed the value at the inflection point. For example, the value at the inflection point can be set to 0.5, in which case the preset exceeding threshold only needs to be greater than 0.5.
  • In this embodiment, an accurate preset exceeding threshold is determined according to a preset threshold of the request arrival rate of the thread pool, so that all thread pools whose load pressure exceeds the standard on each server can be identified more accurately.
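  • To get a feel for the kind of curve FIG. 3 describes, the ratio W_q/W can be computed for a textbook M/M/n queue using the Erlang C formula. This is a sketch under a strong assumption: the application only says that a thread pool can be described by a single queue model and does not state that arrivals are Poisson or that service times are exponential. In the code, `mu` is the service rate of one thread, so the service capacity of the whole pool is `n * mu`; the function name is hypothetical.

```python
from math import factorial


def wait_ratio_mmn(n, lam, mu):
    """W_q / W for an M/M/n queue (assumed model, not from the application).

    n: number of threads, lam: arrival rate, mu: service rate of one thread.
    """
    if lam >= n * mu:
        raise ValueError("queue is unstable: lam must be below n * mu")
    a = lam / mu  # offered load
    tail = (a ** n / factorial(n)) * (n / (n - a))
    # Erlang C: probability that an arriving request has to wait.
    p_wait = tail / (sum(a ** k / factorial(k) for k in range(n)) + tail)
    w_q = p_wait / (n * mu - lam)  # mean waiting time
    return w_q / (w_q + 1.0 / mu)  # W = W_q + B with mean B = 1 / mu


# Print W_q / W at several utilization levels for the pool sizes named in FIG. 3.
for n in (1, 2, 3, 10, 24):
    row = [round(wait_ratio_mmn(n, rho * n, 1.0), 3) for rho in (0.5, 0.7, 0.9, 0.95)]
    print(n, row)
```

  • Under this model the ratio stays low for larger pools until the pool is close to saturation and then climbs quickly, which matches the qualitative behavior described for FIG. 3.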
  • In step S2, performing load balancing on each thread pool whose load pressure exceeds the standard includes:
  • Step S21: the requests passing through each thread pool whose load pressure exceeds the standard are counted by partition, giving the number of requests belonging to each partition in the thread pool, and the partitions are arranged in descending order of the number of requests;
  • Step S22: determining whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, and if so, going to step S23.
  • Step S23: performing a split operation on the partition with the largest number of requests.
  • For example, a thread pool whose load pressure exceeds the standard serves three partitions, partition A, partition B, and partition C, where the number of requests belonging to partition A is 100 and the number of requests belonging to partition B is 20.
  • The request information of each thread pool whose load pressure exceeds the standard is extracted and analyzed. Thread pools and partitions do not correspond one to one: a partition is only a logical unit, and requests belonging to one partition (Partition) may be processed by multiple thread pools.
  • The requests passing through the thread pool are counted by partition (Partition) to obtain the number of requests belonging to each partition, and the partitions are then arranged in descending order of the number of requests. If the number of requests of one partition exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and the flow goes to step S23. In step S23, because the requests belonging to the selected partition account for more than half of the total number of requests of all partitions of the thread pool, the selected partition needs to be split, and processing ends after the split.
  • Such a partition can be regarded as the partition in the thread pool that contributes most significantly to the excessive load pressure, so selecting and splitting it effectively achieves load balancing.
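  • Steps S21 and S22 can be sketched as follows. This is an illustration, not the application's implementation: the function names are hypothetical, and the count of 30 requests for partition C is an assumed value, since the example above gives counts only for partitions A and B.

```python
from collections import Counter


def rank_partitions(request_partitions):
    """Step S21: count requests per partition, sorted in descending order."""
    return Counter(request_partitions).most_common()


def should_split(ranked):
    """Step S22: True when the top partition holds more than half of all requests."""
    total = sum(count for _, count in ranked)
    return bool(ranked) and ranked[0][1] > total / 2


# One entry per request, labelled with the partition it belongs to.
# The count for partition C (30) is an assumption made for this example.
requests = ["A"] * 100 + ["B"] * 20 + ["C"] * 30
ranked = rank_partitions(requests)
```

  • Here partition A holds 100 of 150 requests, which is more than half, so the flow would go to the split operation of step S23.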
  • In step S23, performing the split operation on the partition with the largest number of requests includes:
  • dividing the partition into several sub-partitions and distributing the sub-partitions to other servers, where each sub-partition corresponds to a sub-range of the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
  • The split points are chosen so that the requests observed on the thread pool within the partition's key range are divided evenly. For example, suppose the key range of a partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 requests in the range 0.2 to 0.3, and 200 requests in the range 0.3 to 0.4. The partition can then be divided into three sub-partitions whose sub-key ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4, respectively, which achieves better load balancing.
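  • One simple way to choose split points that give each sub-partition a substantially equal number of requests is to cut at quantiles of the observed request keys. The sketch below makes that assumption; the `split_ranges` function is hypothetical, and integer keys 100 to 399 stand in for the 0.1 to 0.4 key range of the example to avoid floating-point comparisons.

```python
def split_ranges(begin, end, request_keys, parts):
    """Split [begin, end) into up to `parts` sub-ranges with roughly equal request counts.

    The cut points are quantiles of the observed request keys (an assumed strategy).
    """
    keys = sorted(k for k in request_keys if begin <= k < end)
    if parts < 2 or len(keys) < parts:
        return [(begin, end)]  # too little data to split meaningfully
    cuts = [keys[len(keys) * i // parts] for i in range(1, parts)]
    bounds = [begin] + cuts + [end]
    # Drop empty ranges, which can appear when many requests share one key.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


# 600 requests spread evenly over keys 100..399 (two requests per key).
request_keys = list(range(100, 400)) * 2
sub_ranges = split_ranges(100, 400, request_keys, 3)
```

  • With evenly spread requests the result is three equal sub-ranges; with a skewed distribution the cut points move toward the hot keys, so the busy region ends up in narrower sub-partitions.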
  • After step S22 determines whether the number of requests of the partition with the largest number of requests exceeds half of the total number of requests of all partitions of the thread pool, if the result is no, the method further includes:
  • Step S24: selecting one or more partitions, starting from the first partition in the descending order, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
  • Step S25: performing a migration operation on the selected partitions.
  • Here, partitions (Partition) are selected starting from the first partition in the descending order of the thread pool. Because the number of requests of the partition with the largest number of requests does not exceed half of the total number of requests of all partitions of the thread pool, no single partition dominates the requests of the thread pool, so the partitions selected in order from the first partition are migrated one by one.
  • For example, a thread pool whose load pressure exceeds the standard serves five partitions, partition D, partition E, partition F, partition G, and partition H, and the number of requests belonging to each of them is 100. The first three partitions D, E, and F then need to be selected and migrated one by one, so that the requests belonging to the remaining partitions G and H total 200, which is less than half (250) of the total of 500 requests.
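  • The selection rule of step S24 can be sketched as a single pass over the ranked partitions. The function name is hypothetical; the input is the descending list of (partition, request count) pairs produced by the counting step.

```python
def select_for_migration(ranked):
    """Step S24: pick partitions from the top of the descending list until the
    requests left on the unselected partitions fall below half of the total."""
    total = sum(count for _, count in ranked)
    remaining = total
    selected = []
    for name, count in ranked:
        if remaining < total / 2:
            break
        selected.append(name)
        remaining -= count
    return selected


# The five-partition example: 100 requests each, 500 in total.
ranked = [("D", 100), ("E", 100), ("F", 100), ("G", 100), ("H", 100)]
selected = select_for_migration(ranked)
```

  • For the five-partition example, the remaining totals after each selection are 400, 300, and 200, so the loop stops after partition F.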
  • Performing the migration operation on the selected partitions includes:
  • migrating each selected partition to a server that has no thread pool with excessive load pressure, thereby achieving load balancing.
  • Migrating each selected partition to a server that has no thread pool with excessive load pressure includes searching for a server that meets this condition and, if one is found, migrating the selected partition to it.
  • A server meets the condition as follows: if, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of each corresponding thread pool of that target server does not exceed the preset usage threshold, then the server is a qualifying server with no thread pool whose load pressure exceeds the standard.
  • Specifically, a partition is taken from the set to be migrated, that is, from all selected partitions, and the number of its requests on each thread pool of the current server is obtained. For example, a selected partition on server M uses two thread pools to process the requests belonging to it: a read request thread pool Q1 and a write request thread pool Q2. One server, for example server N, is randomly selected from the set of servers that have no overloaded thread pool. It is then calculated whether, after the read and write requests of the selected partition (Partition) are added, the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ on server N stay within the preset usage threshold, which may be an empirical value. That is, if the usage rates of all corresponding thread pools after the migration do not exceed the preset usage threshold, the migration is allowed; otherwise another server with no overloaded thread pool is selected and the process is repeated. If all such servers have been checked and none meets the condition, migration of the partition (Partition) is abandoned.
  • This embodiment can accurately find a server that has no thread pool with excessive load pressure, thereby achieving better load balancing.
  • The average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) * B / n, where
  • λ1 represents the rate at which requests in a thread pool on the server before the migration arrive at the queue of that thread pool;
  • λ represents the rate at which requests in the corresponding thread pool of the target server arrive at the queue of that thread pool before the migration;
  • B represents the actual processing time of one request for each thread in the corresponding thread pool of the target server;
  • n represents the number of threads in the corresponding thread pool of the target server.
  • Specifically, taking the server M before the migration and the target server N as an example, whether the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ exceed the preset usage threshold is determined as follows:
  • the average thread usage rate of the read request thread pool Q1+ is (λ_Q1 + λ_Q1+) * B_Q1+ / n_Q1+;
  • the average thread usage rate of the write request thread pool Q2+ is (λ_Q2 + λ_Q2+) * B_Q2+ / n_Q2+.
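  • The usage check can be sketched as follows. Several points are assumptions made for this example: the source-side rate is taken to be the arrival rate of the partition's own requests on each source thread pool, candidate servers are tried in dictionary order rather than at random, the threshold of 0.8 is an arbitrary empirical value, and all names and numbers are hypothetical.

```python
def average_thread_usage(lam_source, lam_target, b, n):
    """(lam_source + lam_target) * B / n for one thread pool of the target server."""
    return (lam_source + lam_target) * b / n


def find_target_server(moving, candidates, usage_threshold=0.8):
    """Return the first candidate server whose corresponding thread pools all
    stay within the usage threshold after the migration, or None.

    moving: {pool name: arrival rate brought by the partition}
    candidates: {server: {pool name: (arrival rate, processing time B, threads n)}}
    """
    for server, pools in candidates.items():
        if all(
            pool in pools
            and average_thread_usage(lam, *pools[pool]) <= usage_threshold
            for pool, lam in moving.items()
        ):
            return server
    return None  # no qualifying server: migration of this partition is abandoned


# Hypothetical partition using a read pool and a write pool on its source server.
moving = {"read": 30.0, "write": 10.0}
candidates = {
    "N1": {"read": (60.0, 0.05, 4), "write": (5.0, 0.1, 2)},
    "N2": {"read": (20.0, 0.05, 4), "write": (5.0, 0.1, 2)},
}
```

  • In this example N1 is rejected because its read pool would reach a usage of (30 + 60) * 0.05 / 4 = 1.125, while N2 is accepted with usages of 0.625 and 0.75.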
  • a load monitoring method of a distributed storage system includes the following steps:
  • Step S61 Obtain a thread pool that is not processed in the distributed storage system, and determine whether it is acquired. If not, go to step S62, and if yes, go to step S63.
  • Step S63 the acquisition waiting time ratio W Q W stay with the thread pool is in a request queue, the stay latency time W W Q for the thread pool queue a request to the actual processing time and B is ;
  • Step S64 determining whether the ratio of the waiting time Wq to the staying time W exceeds a preset exceeding threshold th, if not, going to step S61, and if yes, going to step S65,
  • Step S65 the request through the thread pool is counted according to the partition, and the number of requests belonging to different partitions in the thread pool is counted, and the partitions are arranged in descending order according to the number of requests, and the number of requests is determined. Whether the number of requests of the partition exceeds half of the total number of requests of all the partitions of the thread pool, and if yes, go to step S66, if no, go to step S67.
  • Step S66: perform a split operation on the partition with the largest number of requests;
  • Step S67: select one or more partitions in order from the first partition in the descending list, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
  • Step S68: determine whether any of the selected partitions is still unprocessed. If yes, go to step S69; if not, go to step S61.
  • Step S69: take one unprocessed partition from the selected partitions;
  • Step S70: search for a qualifying server that has no thread pool with excessive load pressure, and determine whether one was found. If not, go to step S68 to take the next unprocessed partition from the selected partitions and continue, until all selected partitions have been processed. If one was found, go to step S71.
  • Step S71: migrate the selected partition to the found server, then go to step S68.
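The control flow of steps S61 to S71 can be sketched as a single pass over the thread pools. This is only an illustrative outline, not the applicant's implementation: the dictionary layout of a pool, the function name `monitor_once`, and the two callables `decide` (standing in for the split-or-select decision of steps S65 to S67) and `find_server` (standing in for the search in step S70) are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str, Optional[str]]  # (kind, partition, target server or None)


def monitor_once(
    pools: List[dict],
    threshold: float,
    decide: Callable[[Dict[str, int]], Tuple[str, List[str]]],
    find_server: Callable[[dict, str], Optional[str]],
) -> List[Action]:
    """Run one pass of steps S61-S71 and return the actions that would be taken.

    Each pool is a dict with hypothetical keys:
      "wq"       - waiting time Wq of a request in the pool's queue
      "b"        - actual processing time B of a request
      "requests" - mapping partition name -> request count
    """
    actions: List[Action] = []
    for pool in pools:                                  # S61: next unprocessed pool
        stay = pool["wq"] + pool["b"]                   # S63: W = Wq + B
        if stay <= 0 or pool["wq"] / stay <= threshold:
            continue                                    # S64: not overloaded
        kind, partitions = decide(pool["requests"])     # S65-S67, supplied by the caller
        if kind == "split":
            actions.append(("split", partitions[0], None))      # S66
            continue
        for partition in partitions:                    # S68/S69: take each selected partition
            target = find_server(pool, partition)       # S70: look for a qualifying server
            if target is not None:
                actions.append(("migrate", partition, target))  # S71
    return actions
```

Returning a list of actions instead of performing them keeps the sketch free of side effects; a real system would trigger the split or migration where the tuple is appended.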
  • a load monitoring device of a distributed storage system is provided, and the device 100 includes:
  • the load monitoring device 1 is configured to determine a thread pool whose load pressure exceeds the standard on each server in the distributed storage system;
  • the alarm or load balancing device 2 is configured to perform alarm or load balancing on each thread pool whose load pressure exceeds the standard.
  • In this embodiment, an alarm is raised or load is automatically balanced among servers according to the thread pool state of each server (Server), that is, according to whether the load on a single server exceeds its service capability. The approach does not depend on the user request pattern and correctly handles requests that arrive from different users at the same time. It also does not depend on the service capability of the servers: alarms or load balancing are performed correctly even when the servers inside the distributed storage system have inconsistent service capabilities, which prevents hot spots and improves the service quality of the distributed storage system.
  • all thread pools with excessive load pressure on each server in the distributed storage system are determined first, and an alarm is then raised or load balancing performed for each thread pool whose load pressure exceeds the standard.
  • a single server generally has several thread pools.
  • a typical thread pool can be described by a single queue model.
  • the basic information of the thread pool can be represented as a queue parameter.
  • the specific queue parameters can include the following contents:
  • a) Wq indicates the waiting time of a request in the queue of a thread pool.
  • b) B indicates the actual processing time of a request in the queue of a thread pool.
  • c) W indicates the stay time of a request in the queue of a thread pool, that is, the waiting time Wq plus the actual processing time B.
  • d) λ indicates the rate at which requests of a thread pool arrive at the queue of the thread pool.
  • the load monitoring device 1 is configured to obtain the ratio of the waiting time Wq to the stay time W of a request in the queue of each thread pool on each server, where the stay time W is the sum of the waiting time Wq and the actual processing time B of a request in the queue of each thread pool; when the ratio of the waiting time Wq to the stay time W exceeds the preset overload threshold th, the load pressure of the thread pool on the server where the request is located is determined to exceed the standard.
  • The parameters of each thread pool on each server are analyzed to find the thread pools whose load pressure exceeds the standard. Whether a thread pool is overloaded can be judged with the following formula: Wq / W > th
  • the meaning of the formula is that when the ratio of the waiting time Wq to the stay time W exceeds the preset overload threshold th, the load pressure of the thread pool on the server where the request is located is determined to exceed the standard.
  • By comparing the ratio of the waiting time Wq to the stay time W with the preset overload threshold th, all thread pools with excessive load pressure on each server can be obtained accurately.
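A minimal sketch of the overload test described above follows. The text speaks of the waiting and processing time of "a request"; averaging over a batch of sampled requests, as done here, is an assumption made only for illustration, and the names `wait_ratio` and `load_pressure_exceeded` are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def wait_ratio(samples: Iterable[Tuple[float, float]]) -> float:
    """Return Wq / W for sampled requests given as (waiting_time, processing_time).

    W is the stay time, W = Wq + B. Averaging the samples is an assumption of
    this sketch, not something the text prescribes.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    wq = mean(w for w, _ in samples)   # average waiting time Wq
    b = mean(p for _, p in samples)    # average actual processing time B
    stay = wq + b                      # stay time W
    return wq / stay if stay > 0 else 0.0


def load_pressure_exceeded(samples: Iterable[Tuple[float, float]], th: float) -> bool:
    """True when Wq / W exceeds the preset overload threshold th."""
    return wait_ratio(samples) > th
```

With a threshold of 0.5, a pool whose requests wait four time units for one unit of processing is flagged, while a pool whose requests wait a tenth of their processing time is not.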
  • the preset overload threshold is determined according to a preset threshold of the thread pool's request arrival rate, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the thread pool's queue to the service capacity of the thread pool per unit time.
  • In the figure, each curve corresponds to one thread pool, and n denotes the number of threads in that thread pool: the first thread pool has 1 thread, the second has 2 threads, the third has 5 threads, the fourth has 10 threads, and the fifth has 24 threads.
  • the abscissa is the request arrival rate λ/μ, and the ordinate is the ratio Wq/W of the waiting time Wq to the stay time W.
  • As the figure shows, Wq/W starts to rise sharply once the request arrival rate λ/μ exceeds a certain value (the preset threshold of the thread pool's request arrival rate). The value of Wq/W at the point where the sharp rise begins is the inflection point, so the preset overload threshold for Wq/W only needs to exceed the inflection point.
  • For example, the value at the inflection point can be taken as 0.5, in which case the preset overload threshold only needs to be greater than 0.5.
  • In this embodiment, an accurate preset overload threshold is determined according to the preset threshold of the thread pool's request arrival rate, so that all thread pools with excessive load pressure on each server can be obtained more accurately.
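To get a feel for how Wq/W behaves as the request arrival rate grows, the following sketch evaluates the ratio under an M/M/n queueing model using the Erlang C formula. The text only says that a thread pool can be described by a single queue model; the choice of M/M/n (Poisson arrivals, exponential service times, n identical threads) is an assumption of this sketch, as are the function name and the interpretation of the abscissa as utilization, that is, λ divided by the total service capacity of the pool.

```python
from math import factorial


def wait_to_stay_ratio(n_threads: int, utilization: float, service_time: float = 1.0) -> float:
    """Wq / W for an M/M/n queue (assumed model) at the given utilization.

    utilization = lambda * B / n, with B the mean processing time per request.
    """
    if n_threads < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need n_threads >= 1 and 0 <= utilization < 1")
    if utilization == 0.0:
        return 0.0
    offered = n_threads * utilization            # offered load a = lambda * B
    arrival_rate = offered / service_time        # lambda
    capacity = n_threads / service_time          # total service rate of the pool
    tail = offered ** n_threads / factorial(n_threads)
    head = sum(offered ** k / factorial(k) for k in range(n_threads))
    p_wait = tail / ((1.0 - utilization) * head + tail)   # Erlang C: P(request must wait)
    wq = p_wait / (capacity - arrival_rate)               # mean waiting time Wq
    return wq / (wq + service_time)                       # W = Wq + B


if __name__ == "__main__":
    # Thread counts taken from the figure description in the text.
    for n in (1, 2, 5, 10, 24):
        row = [round(wait_to_stay_ratio(n, u / 10), 3) for u in range(1, 10)]
        print(n, row)
```

Under this assumed model a single-thread pool has Wq/W equal to the utilization itself, while pools with more threads stay near zero for longer and then climb steeply, which matches the qualitative shape the text attributes to the figure.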
  • the alarm or load balancing device 2 is configured to count the requests passing through each thread pool with excessive load pressure by partition, obtain the number of requests belonging to each partition in the thread pool, and sort the partitions in descending order of request count; determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; and if so, perform a split operation on the partition with the most requests.
  • For example, suppose a thread pool whose load pressure exceeds the standard serves three partitions, partition A, partition B, and partition C, where the number of requests belonging to partition A is 100 and the number of requests belonging to partition B is 20.
  • The request information is extracted from each thread pool with excessive load pressure and analyzed. Thread pools and partitions are not in one-to-one correspondence: a partition is only a logical unit, and requests belonging to one partition (Partition) may be processed by several thread pools.
  • The requests passing through the thread pool are counted by partition (Partition) to obtain the number of requests belonging to each partition, and the partitions are then sorted in descending order of request count. If the request count of the partition with the most requests in a thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and the flow jumps to step S23. In step S23, because the requests belonging to the selected partition make up half or more of the total number of requests of all partitions of the thread pool, the selected partition needs to be split, and processing ends after the split.
  • Such a partition can be regarded as having a significant influence on the excessive load pressure of the thread pool, so selecting and splitting it achieves load balancing effectively.
  • the alarm or load balancing device 2 is configured to divide the partition into a plurality of sub-partitions and distribute the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
  • The split points are chosen so that the requests of the partition that arrive at the thread pool are divided evenly across the key range of the partition. For example, if the key range of a partition is 0.1 to 0.4, the partition can be divided into three sub-partitions whose sub-key ranges are 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4, respectively, to better achieve load balancing.
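One way to pick split points so that each sub-partition receives roughly the same number of requests is to take quantiles of the observed request keys. The sketch below is an assumption about how this could be done, not a procedure stated in the text; the function name `split_points` and the idea of sampling request keys are hypothetical.

```python
from typing import List, Sequence, TypeVar

K = TypeVar("K")


def split_points(request_keys: Sequence[K], parts: int) -> List[K]:
    """Return up to parts-1 keys that cut the sorted request keys into
    groups of nearly equal size.

    Each returned key is the begin key of the next sub-partition. Repeated
    candidate keys (very hot single keys) are dropped, so fewer split points
    may be returned than requested.
    """
    keys = sorted(request_keys)
    if parts < 2 or len(keys) < parts:
        return []
    points: List[K] = []
    for i in range(1, parts):
        candidate = keys[(i * len(keys)) // parts]   # i-th quantile of the observed keys
        if not points or candidate > points[-1]:
            points.append(candidate)
    return points
```

For keys spread evenly over a range, the split points land at the one-third and two-thirds positions, mirroring the 0.1 to 0.4 example in the text; for skewed keys the points move toward the hot region so that request counts, not key widths, are balanced.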
  • the alarm or load balancing device 2 is configured to determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; if not, select one or more partitions in order from the first partition in the descending list, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool; and migrate the selected partitions.
  • Partitions (Partition) are selected in order, starting from the first partition in the descending list of the thread pool, until the total number of requests belonging to the remaining partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated. Because the requests of the partition with the most requests do not exceed half of the total number of requests of all partitions of the thread pool, no single partition dominates the requests of the thread pool, so the partitions selected in order from the first partition are migrated one by one.
  • For example, suppose a thread pool serves five partitions, partition D, partition E, partition F, partition G, and partition H, and the number of requests belonging to each of them is 100. The first three partitions D, E, and F then need to be selected and migrated one by one.
  • This embodiment can accurately find the partitions in a thread pool that need to be migrated, thereby better achieving load balancing.
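The split-or-migrate decision described above can be sketched as a small pure function. The function name `plan_rebalance` and the returned tuple format are assumptions introduced for illustration; the thresholds follow the text (split when the top partition holds more than half of the requests, otherwise select partitions until the remainder is below half).

```python
from typing import Dict, List, Tuple


def plan_rebalance(requests_per_partition: Dict[str, int]) -> Tuple[str, List[str]]:
    """Decide between splitting one partition and migrating several.

    Returns ("split", [partition]), ("migrate", [partitions...]) or ("none", []).
    """
    # Sort partitions by request count, largest first (ties keep input order).
    ordered = sorted(requests_per_partition.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ordered)
    if total == 0:
        return ("none", [])

    top_name, top_count = ordered[0]
    if top_count * 2 > total:            # top partition exceeds half of all requests
        return ("split", [top_name])

    selected: List[str] = []
    remaining = total
    for name, count in ordered:
        if remaining * 2 < total:        # unselected remainder is below half: stop
            break
        selected.append(name)
        remaining -= count
    return ("migrate", selected)
```

For the five-partition example in the text, where D, E, F, G, and H each have 100 requests, the function selects D, E, and F: after three selections the remaining 200 requests are below half of the 500 total.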
  • the alarm or load balancing device 2 is configured to migrate each selected partition to a server that has no thread pool with excessive load pressure, thereby achieving load balancing.
  • the alarm or load balancing device 2 is configured to search for a qualifying server that has no thread pool with excessive load pressure and, if one is found, migrate the selected partition to the found server.
  • Migrating the selected partition only to a server found in this way achieves load balancing better.
  • the qualifying server that has no thread pool with excessive load pressure is defined as follows: if, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of every corresponding thread pool on that target server does not exceed the preset usage threshold, the server is a qualifying server that has no thread pool with excessive load pressure.
  • A partition is taken from the set to be migrated, that is, from all selected partitions, and the number of requests of that partition on all thread pools of the current server is obtained.
  • For example, a selected partition on server M uses two thread pools to process the requests belonging to it, a read request thread pool Q1 and a write request thread pool Q2. A server is chosen at random from the set of servers that have no overloaded thread pool, for example server N. It is then calculated whether, after the requests of the selected partition (Partition) on the read request thread pool Q1 and the write request thread pool Q2 are migrated to the corresponding read request thread pool Q1+ and write request thread pool Q2+ on server N (Q1 to Q1+ and Q2 to Q2+), the average thread usage rates of Q1+ and Q2+ on server N stay within the preset usage threshold. The preset usage threshold may be an empirical value.
  • If neither average thread usage rate exceeds the preset usage threshold, server N qualifies as the target for the partition. This embodiment can accurately find a server that has no thread pool with excessive load pressure, thereby better achieving load balancing.
  • the average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) * B / n, wherein
  • λ1 represents the rate at which requests in a thread pool on the pre-migration server arrive at the queue of that thread pool;
  • λ represents the rate at which requests in the corresponding thread pool on the target server arrive at that thread pool's queue before the migration;
  • B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;
  • n represents the number of threads in the corresponding thread pool of the target server. Specifically, taking the pre-migration server M and the target server N as an example, the check is whether the average thread usage rates of the read request thread pool Q1+ and the write request thread pool Q2+ exceed the preset usage threshold.
  • the average thread usage rate of the read request thread pool Q1+ is (λ_Q1 + λ_Q1+) * B_Q1+ / n_Q1+
  • the average thread usage rate of the write request thread pool Q2+ is (λ_Q2 + λ_Q2+) * B_Q2+ / n_Q2+
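The qualification check for a migration target can be sketched directly from the formula (λ1 + λ) * B / n. The function names and the tuple layout below are hypothetical, and the sample numbers in the usage note are invented for illustration only.

```python
from typing import Iterable, Tuple

# (lambda_source, lambda_target, B, n) for one corresponding thread pool
PoolLoad = Tuple[float, float, float, int]


def average_thread_usage(lambda_source: float, lambda_target: float,
                         processing_time: float, threads: int) -> float:
    """(lambda1 + lambda) * B / n for one thread pool on the target server."""
    return (lambda_source + lambda_target) * processing_time / threads


def server_qualifies(pools: Iterable[PoolLoad], usage_threshold: float) -> bool:
    """True when no corresponding thread pool on the target server would
    exceed the preset usage threshold after the migration."""
    return all(average_thread_usage(*pool) <= usage_threshold for pool in pools)
```

As a usage example with made-up values: if Q1 brings 30 requests per second to a pool Q1+ that already receives 50 per second, with B = 0.01 s and n = 4 threads, the average usage is (30 + 50) * 0.01 / 4 = 0.2, which passes a threshold of 0.7.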
  • the present application determines the thread pools with excessive load pressure on each server in a distributed storage system and raises an alarm or performs load balancing for each such thread pool. It can therefore raise an alarm or automatically balance load among servers according to the thread pool state of each server, that is, according to whether the load on a single server exceeds its service capability.
  • The approach does not depend on the user request pattern and correctly handles requests that arrive from different users at the same time. It also does not depend on the service capability of the servers: alarms or load balancing are performed correctly even when the servers inside a cluster of a distributed storage system have inconsistent service capabilities, thereby preventing hot spots and improving the service quality of the distributed storage system.
  • the present application can accurately obtain all thread pools with excessive load pressure on each server by comparing the ratio of the waiting time Wq to the stay time W with the preset overload threshold th.
  • the present application determines an accurate preset overload threshold according to a preset threshold of the thread pool's request arrival rate, so that all thread pools with excessive load pressure on each server can be obtained more accurately.
  • the present application counts the requests passing through each thread pool with excessive load pressure by partition, obtains the number of requests belonging to each partition in the thread pool, and sorts the partitions in descending order of request count. When the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, that partition is split, so the partition that needs to be split is found accurately and load balancing is achieved effectively.
  • When the request count of the partition with the most requests does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected in order from the first partition in the descending list, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool, and the selected partitions are migrated. The partitions that need to be migrated in a thread pool are thus found accurately, and load balancing is achieved better.
  • the present application migrates a selected partition to a server only after finding a qualifying server that has no thread pool with excessive load pressure, thereby better achieving load balancing.
  • If, after a selected partition is migrated, the average thread usage rate of every corresponding thread pool of the target server does not exceed the preset usage threshold, the server is a qualifying server that has no thread pool with excessive load pressure. Such servers can therefore be found accurately, thereby better achieving load balancing.
  • the present invention can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
  • the software program of the present invention may be executed by a processor to implement the steps or functions described above.
  • the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
  • some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
  • a portion of the invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or solution in accordance with the present invention.
  • the program instructions for invoking the method of the present invention may be stored in a fixed or removable recording medium and/or transmitted by a data stream in a broadcast or other signal bearing medium, and/or stored in a The working memory of the computer device in which the program instructions are run.
  • an embodiment in accordance with the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to operate based on the aforementioned methods and/or technical solutions in accordance with various embodiments of the present invention.

Abstract

A method and device for monitoring the load of a distributed storage system. The method comprises: determining all thread pools with load pressure exceeding the standard on each server in a distributed storage system (S1); and giving an alarm or performing load balancing for each thread pool with load pressure exceeding the standard (S2). The method and the device can give an alarm or automatically balance load between servers according to the states of the servers' thread pools, that is, according to whether the load on a single server exceeds its service capability. They do not depend on the user request pattern and correctly process requests that arrive from different users at the same time. They also do not depend on the service capabilities of the servers and can correctly give an alarm or perform load balancing even when the service capabilities of the servers in a cluster of a distributed storage system are inconsistent, thereby preventing hot spots and improving the service quality of the distributed storage system.

Description

Load monitoring method and device for distributed storage system
The present application claims priority to Chinese Patent Application No. 201510504654.2, filed on August 17, 2015 and entitled "Load monitoring method and device for distributed storage system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular, to a load monitoring method and device for a distributed storage system.
Background
A distributed storage system is a distributed system that uses a cluster to provide storage services. Users use a key (Key) as an index to read, write, and otherwise operate on the corresponding value (Value). For a given key, a user can perform different types of operations, such as writing a value, reading the corresponding value, or deleting the corresponding value; each operation is called a request. A thread pool in a distributed storage system is a service unit with a certain number of threads. A request first joins the queue of the thread pool and waits, and the threads in the pool take requests from the queue in order whenever they are idle and process them. A partition (Partition) is the basic unit of scheduling in a distributed storage system. A key is uniquely assigned to a partition by the begin key (BeginKey) and end key (EndKey) of the partition, and different partitions do not overlap. A server (Server) is the basic unit that provides service in a distributed storage system. Each server hosts several partitions, requests for different keys are handled by different servers according to the partition they belong to, and each server internally uses thread pools as the actual processing units for different requests.
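As an illustration of how a key can be mapped to its partition through BeginKey and EndKey, the following sketch keeps the partitions sorted by begin key and uses binary search. The class name `PartitionMap`, the tuple layout, and the half-open interval convention [BeginKey, EndKey) are assumptions of this sketch; the text only states that partitions do not overlap.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (begin_key, end_key, server); the range is treated as [begin_key, end_key)
Partition = Tuple[str, str, str]


class PartitionMap:
    """Lookup table from key to the server hosting the owning partition."""

    def __init__(self, partitions: List[Partition]) -> None:
        self._parts = sorted(partitions)                  # order by begin key
        self._begins = [p[0] for p in self._parts]

    def locate(self, key: str) -> Optional[str]:
        """Return the server whose partition contains key, or None."""
        i = bisect_right(self._begins, key) - 1           # last partition starting at or before key
        if i < 0:
            return None
        begin, end, server = self._parts[i]
        return server if begin <= key < end else None
```

Because partitions never overlap, at most one partition can start at or before the key and still contain it, so a single binary search is enough.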
Users' keys are divided into partitions and stored in order in the distributed file system. Because a single partition can belong to only one server (Server), an increase in user requests within the range of a single partition increases the load on that server and the latency (Latency) seen by users, and also affects the other partitions on that server. Therefore, to make full use of the service capability of all servers in the cluster, a load monitoring scheme is needed to disperse hot spots and improve service quality. At present, request hot spots are resolved by splitting and migrating partitions. Splitting divides a partition into multiple partitions according to different key ranges (Keys), and the resulting partitions are randomly distributed to other servers; migration moves a partition from one server to another.
Existing means of resolving request hot spots fall roughly into the following three types:
1. When the size of a single partition (Partition) exceeds a certain limit, the partition is split evenly into several partitions. However, splitting by partition size does not accurately reflect the processing capacity of a partition. User request patterns differ, so partition size affects them differently; sometimes a hot spot appears even for a very small partition because user requests are concentrated in a very small range.
2. When the queries per second (QPS) of the requests to a single partition (Partition) exceed a certain threshold, the partition is split according to the range of user requests. However, using QPS as the threshold requires measuring the processing capacity of each server, so different values must be configured on different servers; moreover, when other programs are also running on a server, it sometimes cannot reach its theoretical processing capacity.
3. Parameters of request execution are obtained, such as I/O operation time and cache hit count, rules are configured on them, and a split is performed when the preset conditions are met. Splitting by parameter-based rules is very flexible, but for that very reason the rules differ between scenarios and must be updated according to the user request pattern, so the approach is not sufficiently automated.
Summary of the Invention
An object of the present application is to provide a load monitoring method and device for a distributed storage system that can solve the problem of hot spots in a distributed storage system.
According to one aspect of the present application, a load monitoring method for a distributed storage system is provided, the method comprising:
determining all thread pools with excessive load pressure on each server in the distributed storage system;
raising an alarm or performing load balancing for each thread pool with excessive load pressure.
Further, in the above method, determining all thread pools with excessive load pressure on each server in the distributed storage system comprises:
obtaining the ratio of the waiting time to the stay time of a request in the queue of each thread pool on each server, the stay time being the sum of the waiting time and the actual processing time of a request in the queue of each thread pool;
when the ratio of the waiting time to the stay time exceeds a preset overload threshold, determining that the load pressure of the thread pool on the server where the request is located exceeds the standard.
Further, in the above method, the preset overload threshold is determined according to a preset threshold of the thread pool's request arrival rate, where the request arrival rate of the thread pool is the ratio of the rate at which requests arrive at the thread pool's queue to the service capacity of the thread pool per unit time. When the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of waiting time to stay time begins to rise sharply, and the preset overload threshold exceeds the value of that ratio at the point where the sharp rise begins.
Further, in the above method, performing load balancing for each thread pool with excessive load pressure comprises:
counting the requests passing through each thread pool with excessive load pressure by partition to obtain the number of requests belonging to each partition in the thread pool, and sorting the partitions in descending order of request count;
determining whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool;
if so, performing a split operation on the partition with the most requests.
Further, in the above method, performing a split operation on the partition with the most requests comprises:
dividing the partition into several sub-partitions and distributing the sub-partitions to other servers, wherein each sub-partition corresponds to a sub-key range within the key range of the partition, and the number of requests belonging to each sub-partition is substantially equal.
Further, in the above method, after determining whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, the method further comprises:
if not, selecting one or more partitions in order from the first partition in the descending list, until the total number of requests belonging to the unselected remaining partitions is less than half of the total number of requests of all partitions of the thread pool;
performing a migration operation on the selected partitions.
Further, in the above method, performing a migration operation on the selected partitions comprises:
migrating each selected partition to a server that has no thread pool with excessive load pressure.
Further, in the above method, migrating each selected partition to a server that has no thread pool with excessive load pressure comprises:
searching for a qualifying server that has no thread pool with excessive load pressure and, if one is found, migrating the selected partition to the found server.
Further, in the above method, the qualifying server that has no thread pool with excessive load pressure is defined as follows:
if, after a selected partition is migrated to the corresponding thread pools of a server that has no thread pool with excessive load pressure, the average thread usage rate of every corresponding thread pool of that target server does not exceed a preset usage threshold, the server is a qualifying server that has no thread pool with excessive load pressure.
Further, in the above method, the average thread usage rate of each thread pool is obtained by the formula (λ1 + λ) * B / n, wherein
λ1 represents the rate at which requests in a thread pool on the pre-migration server arrive at the queue of that thread pool;
λ represents the rate at which requests in the corresponding thread pool on the target server arrive at that thread pool's queue before the migration;
B represents the actual processing time of one request by each thread in the corresponding thread pool of the target server;
n represents the number of threads in the corresponding thread pool of the target server.
According to another aspect of the present application, a load balancing device for a distributed storage system is also provided, the device including:
a load monitoring apparatus, configured to determine all overloaded thread pools on each server in the distributed storage system;
an alarm or load balancing apparatus, configured to raise an alarm for, or perform load balancing on, each overloaded thread pool.
Further, in the above device, the load monitoring apparatus is configured to obtain, for each thread pool on each server, the ratio of the waiting time of a request in the pool's queue to its sojourn time, where the sojourn time is the sum of the waiting time and the actual processing time of a request in the queue; and, when the ratio of the waiting time to the sojourn time exceeds a preset overload threshold, to determine that the thread pool on the server where the request resides is overloaded.
Further, in the above device, the preset overload threshold is determined from a preset threshold on the request arrival rate of the thread pool, where the request arrival rate of a thread pool is the ratio of the rate at which requests arrive at the pool's queue to the pool's service capacity per unit time. Once the request arrival rate exceeds its preset threshold, the corresponding ratio of waiting time to sojourn time begins to rise sharply, and the preset overload threshold is set above the value of that ratio at the point where the sharp rise begins.
Further, in the above device, the alarm or load balancing apparatus is configured to group the requests passing through each overloaded thread pool by partition, count the requests in the thread pool that belong to each partition, and sort the partitions in descending order of request count; to determine whether the request count of the partition with the most requests exceeds half of the total request count of all partitions in the thread pool; and, if so, to split that partition.
Further, in the above device, the alarm or load balancing apparatus is configured to divide the partition into several sub-partitions and distribute the sub-partitions to other servers, where each sub-partition corresponds to a sub-range of the partition's key range and the sub-partitions receive approximately equal numbers of requests.
Further, in the above device, the alarm or load balancing apparatus is configured to determine whether the request count of the partition with the most requests exceeds half of the total request count of all partitions in the thread pool; if not, to select one or more partitions in order, starting from the first partition in the descending order, until the total request count of the remaining unselected partitions is less than half of the total request count of all partitions in the thread pool; and to migrate the selected partitions.
Further, in the above device, the alarm or load balancing apparatus is configured to migrate each selected partition to a server that has no overloaded thread pool.
Further, in the above device, the alarm or load balancing apparatus is configured to search for an eligible server that has no overloaded thread pool and, if one is found, to migrate the selected partition to the found server.
Further, in the above device, the eligible server that has no overloaded thread pool includes:
If, after a selected partition is migrated to the corresponding thread pools of a server that has no overloaded thread pool, the average thread usage rate of each corresponding thread pool of that target server does not exceed a preset usage rate threshold, the server is an eligible server that has no overloaded thread pool.
Further, in the above device, the average thread usage rate of each thread pool is obtained by the formula (λ_1 + λ) * B / n, where:
λ_1 is the rate at which requests of a thread pool on the source server arrived at that pool's queue before the migration;
λ is the rate at which requests of the corresponding thread pool on the target server arrived at that pool's queue before the migration;
B is the actual processing time that each thread in the corresponding thread pool of the target server spends on one request;
n is the number of threads in the corresponding thread pool of the target server.
Compared with the prior art, the present application determines all overloaded thread pools on each server in the distributed storage system and raises an alarm for, or performs load balancing on, each of them. Alarms and automatic load balancing between servers are thus driven by the state of each server's thread pools, that is, by whether the load on a single server exceeds its service capacity. The approach does not depend on the user request pattern, so requests that arrive simultaneously from different users are handled correctly. Nor does it depend on the servers' service capacity, so alarms and load balancing remain correct even when the servers inside the cluster have unequal service capacities. Hot spots are thereby prevented and the service quality of the distributed storage system is improved.
Further, by comparing the ratio of the waiting time W_q to the sojourn time W against the preset overload threshold th, the present application can precisely identify all overloaded thread pools on each server.
Further, the present application derives the preset overload threshold from a preset threshold on the request arrival rate of the thread pool, so that the overloaded thread pools on each server are identified more precisely.
Further, the present application groups the requests passing through each overloaded thread pool by partition, counts the requests in the thread pool that belong to each partition, and sorts the partitions in descending order of request count. When the request count of the partition with the most requests exceeds half of the total request count of all partitions in the thread pool, that partition is split. The partition that needs splitting is thus located precisely, and load balancing is achieved effectively.
Further, in the present application, when the request count of the partition with the most requests does not exceed half of the total request count of all partitions in the thread pool, one or more partitions are selected in order, starting from the first partition in the descending order, until the total request count of the remaining unselected partitions is less than half of the total request count of all partitions in the thread pool, and the selected partitions are migrated. The partitions of a thread pool that need migrating are thus located precisely, and load balancing is improved.
Further, the present application migrates a selected partition only after an eligible server that has no overloaded thread pool has been found, which improves load balancing.
Further, in the present application, if, after a selected partition is migrated to the corresponding thread pools of a server that has no overloaded thread pool, the average thread usage rate of each corresponding thread pool of that target server does not exceed the preset usage rate threshold, the server is an eligible server that has no overloaded thread pool. Eligible servers are thus located precisely, and load balancing is improved.
Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a flowchart of a load monitoring method for a distributed storage system according to one aspect of the present application;
FIG. 2 is a flowchart of a preferred embodiment of the load monitoring method for a distributed storage system of the present application;
FIG. 3 is a schematic diagram of how the preset overload threshold is determined according to an embodiment of the present application;
FIG. 4 is a flowchart of another preferred embodiment of the load monitoring method for a distributed storage system of the present application;
FIG. 5 is a flowchart of yet another preferred embodiment of the load monitoring method for a distributed storage system of the present application;
FIG. 6 is a flowchart of a specific application embodiment of the load monitoring method for a distributed storage system of the present application;
FIG. 7 is a structural diagram of a load monitoring device for a distributed storage system according to another aspect of the present application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the devices of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may take the form of volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As shown in FIG. 1, the present application provides a load monitoring method for a distributed storage system, the method including:
Step S1: determine all overloaded thread pools on each server in the distributed storage system;
Step S2: raise an alarm for, or perform load balancing on, each overloaded thread pool.
In this embodiment, alarms and automatic load balancing between servers are driven by the state of each server's thread pools, that is, by whether the load on a single server exceeds its service capacity. The approach does not depend on the user request pattern, so requests that arrive simultaneously from different users are handled correctly. Nor does it depend on the servers' service capacity, so alarms and load balancing remain correct even when the servers inside the cluster have unequal service capacities. Hot spots are thereby prevented and the service quality of the distributed storage system is improved. Specifically, the basic information of a server's thread pools can be used to determine all overloaded thread pools on each server in the distributed storage system, or to raise an alarm for or perform load balancing on each overloaded thread pool. A single server generally has several thread pools. A typical thread pool can be described by a single-queue model, and its basic information can be expressed as queue parameters, which may include the following:
a) W_q: the waiting time of a request in the queue of a thread pool.
b) B: the actual processing time of a request in the queue of a thread pool.
c) W: the sojourn time of a request in the queue of a thread pool, that is, the waiting time W_q plus the actual processing time B.
d) λ: the rate at which requests arrive at the queue of a thread pool.
e) μ: the service capacity of a thread pool per unit time.
As shown in FIG. 2, in a preferred embodiment of the load monitoring method of the present application, step S1, determining all overloaded thread pools on each server in the distributed storage system, includes:
Step S11: for each thread pool on each server, obtain the ratio of the waiting time W_q of a request in the pool's queue to its sojourn time W, where the sojourn time W is the sum of the waiting time W_q and the actual processing time B of a request in the queue;
Step S12: when the ratio of the waiting time W_q to the sojourn time W exceeds a preset overload threshold th, determine that the thread pool on the server where the request resides is overloaded. The parameters of every thread pool on every server are analyzed to find the overloaded thread pools. Whether a thread pool is overloaded can be judged with the following formula:
W_q / W > th
The formula means that when the ratio of the waiting time W_q to the sojourn time W exceeds the preset overload threshold th, the thread pool on the server where the request resides is determined to be overloaded. By comparing W_q / W against th, this embodiment can precisely identify all overloaded thread pools on each server.
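A minimal sketch of the check in steps S11 and S12, assuming the waiting time and processing time of a pool have already been sampled. The names `PoolStats` and `is_overloaded` and the threshold value 0.6 are illustrative assumptions, not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Queue parameters sampled for one thread pool (hypothetical container)."""
    wait_time: float     # W_q: time a request waits in the queue
    service_time: float  # B: actual processing time of a request

    @property
    def sojourn_time(self) -> float:
        # W = W_q + B
        return self.wait_time + self.service_time


def is_overloaded(stats: PoolStats, th: float = 0.6) -> bool:
    """Return True when W_q / W exceeds the preset overload threshold th.

    th = 0.6 is an assumed example value; the description only requires it
    to lie above the inflection point of the W_q / W curve.
    """
    w = stats.sojourn_time
    if w <= 0:
        return False  # no measurable traffic: treat the pool as not overloaded
    return stats.wait_time / w > th
```

For instance, a pool whose requests wait 3 ms and are processed in 1 ms has W_q / W = 0.75 and would be flagged under this assumed threshold.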
In a preferred embodiment of the load monitoring method of the present application, the preset overload threshold in step S12 is determined from a preset threshold on the request arrival rate of the thread pool. The request arrival rate of a thread pool is the ratio λ/μ of the rate λ at which requests arrive at the pool's queue to the pool's service capacity μ per unit time. Once λ/μ exceeds its preset threshold, the corresponding ratio W_q/W of waiting time to sojourn time begins to rise sharply, and the preset overload threshold is set above the value of W_q/W at the point where the sharp rise begins. Specifically, the preset threshold on the request arrival rate can be read from FIG. 3. In FIG. 3, each curve represents one thread pool and n is the number of threads in that pool: the first pool has 1 thread, the second 2, the third 3, the fourth 10, and the fifth 24. The abscissa is the request arrival rate λ/μ and the ordinate is the ratio W_q/W of the waiting time W_q to the sojourn time W. FIG. 3 shows that W_q/W begins to rise sharply once λ/μ exceeds a certain value (the preset threshold on the request arrival rate); the value of W_q/W at which this sharp rise begins is the inflection point. The preset overload threshold for W_q/W therefore only needs to exceed the inflection point. For example, if the inflection point is taken to be 0.5 in practice, any preset overload threshold greater than 0.5 will do. By deriving the overload threshold from the preset threshold on the request arrival rate, this embodiment identifies the overloaded thread pools on each server more precisely.
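The application does not state which queueing model produces the curves of FIG. 3. As a hedged illustration only, the sketch below assumes a textbook M/M/n queue (Poisson arrivals, exponential service, n identical threads) and assumes that μ is the total capacity of the pool, so that λ/μ is the utilisation. Under those assumptions W_q/W can be computed with the Erlang C formula; the function name `wait_ratio` is hypothetical.

```python
def wait_ratio(n: int, rho: float) -> float:
    """W_q / W for an assumed M/M/n queue at utilisation rho = lambda / mu.

    The per-thread service time B is normalised to 1, so W = W_q + 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    a = n * rho  # offered load, in units of one thread's capacity
    b = 1.0      # Erlang B blocking probability, built up recursively
    for k in range(1, n + 1):
        b = a * b / (k + a * b)
    c = b / (1 - rho * (1 - b))  # Erlang C: probability that a request must wait
    wq = c / (n - a)             # mean waiting time with B = 1
    return wq / (wq + 1.0)


if __name__ == "__main__":
    # Tabulate the ratio for the thread counts shown in FIG. 3.
    for n in (1, 2, 3, 10, 24):
        print(n, [round(wait_ratio(n, r / 10), 3) for r in range(1, 10)])
```

Under this assumed model, pools with more threads keep W_q/W close to zero for longer and then rise more abruptly as λ/μ approaches 1, which is consistent with choosing the threshold just above the point where the rise begins.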
As shown in FIG. 4, in a preferred embodiment of the load monitoring method of the present application, performing load balancing on each overloaded thread pool in step S2 includes:
Step S21: group the requests passing through each overloaded thread pool by partition, count the requests in the thread pool that belong to each partition, and sort the partitions in descending order of request count;
Step S22: determine whether the request count of the partition with the most requests exceeds half of the total request count of all partitions in the thread pool, and if so, go to step S23;
Step S23: split the partition with the most requests. For example, suppose an overloaded thread pool serves three partitions A, B, and C, with 100 requests belonging to partition A, 20 to partition B, and 10 to partition C. The 100 requests of partition A, the partition with the most requests, exceed half of the pool's total request count, 65 = (100 + 20 + 10) / 2. In this embodiment, request information is extracted from each overloaded thread pool and analyzed, because thread pools and partitions do not correspond one to one: a partition is only a logical unit, and the requests belonging to it may be handled by several thread pools. The requests passing through the thread pool are grouped by partition, the number of requests belonging to each partition is counted, and the partitions are sorted in descending order of request count. If the partition with the most requests in a thread pool accounts for more than half of the total requests of all partitions in that pool, the partition is selected and processing jumps to step S23. In step S23, because the selected partition accounts for more than half of the pool's requests, the partition is split, and processing ends after the split. A partition that accounts for more than half of the pool's requests can be regarded as the one that contributes significantly to the overload, so it is singled out and split, which achieves load balancing effectively.
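Steps S21 and S22 can be sketched as follows, assuming each request observed in the overloaded pool is tagged with the name of the partition it belongs to. The function names `rank_partitions` and `partition_to_split` are illustrative assumptions.

```python
from collections import Counter


def rank_partitions(request_partitions):
    """Step S21: count requests per partition, busiest partition first."""
    return Counter(request_partitions).most_common()


def partition_to_split(request_partitions):
    """Step S22: return the busiest partition if it holds more than half of
    the pool's requests, otherwise None (the migration branch applies)."""
    ranked = rank_partitions(request_partitions)
    if not ranked:
        return None
    total = sum(count for _, count in ranked)
    top, top_count = ranked[0]
    return top if top_count > total / 2 else None


if __name__ == "__main__":
    # The example from the description: A=100, B=20, C=10 requests.
    requests = ["A"] * 100 + ["B"] * 20 + ["C"] * 10
    print(rank_partitions(requests))
    print(partition_to_split(requests))
```

With the counts from the description, partition A holds 100 of 130 requests, which is more than 65, so it is the one returned for splitting.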
In a preferred embodiment of the load monitoring method of the present application, step S23, splitting the partition with the most requests, includes:
Dividing the partition into several sub-partitions and distributing the sub-partitions to other servers, where each sub-partition corresponds to a sub-range of the partition's key range and the sub-partitions receive approximately equal numbers of requests. Specifically, the split points are chosen by sorting the partition's requests that landed on the thread pool and dividing them evenly by request count within the partition's key range. For example, suppose the key range of a partition is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 in the range 0.2 to 0.3, and 200 in the range 0.3 to 0.4. The partition can then be divided into three sub-partitions with sub-key ranges 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4 respectively, which improves load balancing.
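One way to choose such split points is to sort the observed request keys and cut at equal-count positions. The sketch below assumes the keys of the partition's requests are available as a list of comparable values; the function name `split_points` is hypothetical, and the application does not prescribe this exact procedure.

```python
def split_points(keys, parts):
    """Return parts - 1 boundary keys so that each sub-range holds roughly the
    same number of observed requests.

    Heavily repeated keys can yield duplicate boundaries; a real system would
    have to handle that case separately.
    """
    if parts < 2 or not keys:
        return []
    ordered = sorted(keys)
    return [ordered[len(ordered) * i // parts] for i in range(1, parts)]


if __name__ == "__main__":
    # Keys scaled to integers: 100..399 stands for the key range 0.100..0.399,
    # with requests spread evenly over the range.
    keys = list(range(100, 400))
    print(split_points(keys, 3))
```

With evenly spread keys the boundaries fall at 200 and 300, matching the three sub-ranges of the example above.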
As shown in FIG. 5, in a preferred embodiment of the load monitoring method of the present application, after step S22, determining whether the request count of the partition with the most requests exceeds half of the total request count of all partitions in the thread pool, the method further includes:
If not, go to step S24;
Step S24: select one or more partitions in order, starting from the first partition in the descending order, until the total request count of the remaining unselected partitions is less than half of the total request count of all partitions in the thread pool;
Step S25: migrate the selected partitions. Here, if the partition with the most requests does not exceed half of the total request count of all partitions in the thread pool, partitions are selected one after another, starting from the first partition in the pool's descending order, until the total request count of the remaining partitions is less than half of the pool's total, and processing jumps to step S25. In step S25, because the partition with the most requests does not exceed half of the pool's total, no single partition has a dominant effect on the pool's requests, so the partitions selected in order from the first partition are migrated one by one. For example, suppose an overloaded thread pool serves five partitions D, E, F, G, and H, each with 100 requests. The first three partitions D, E, and F must then be selected and migrated one by one, so that the total request count of the remaining partitions G and H, 200 = 100 + 100, is less than half of the pool's total request count, 250 = (100 + 100 + 100 + 100 + 100) / 2. This embodiment precisely locates the partitions of a thread pool that need migrating, which improves load balancing.
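The selection in step S24 amounts to a greedy walk over the sorted partitions. The sketch below assumes the per-partition request counts are already sorted in descending order; the function name `select_for_migration` is an illustrative assumption.

```python
def select_for_migration(ranked):
    """Step S24: pick partitions from the front of the descending list until
    the requests left on the unselected partitions fall below half the total.

    ranked: list of (partition, request_count) pairs, busiest first.
    """
    total = sum(count for _, count in ranked)
    remaining = total
    selected = []
    for partition, count in ranked:
        if remaining < total / 2:
            break  # the unselected partitions now carry less than half
        selected.append(partition)
        remaining -= count
    return selected


if __name__ == "__main__":
    # The example from the description: five partitions with 100 requests each.
    ranked = [("D", 100), ("E", 100), ("F", 100), ("G", 100), ("H", 100)]
    print(select_for_migration(ranked))
```

With five partitions of 100 requests each, the walk stops after D, E, and F, leaving 200 requests on G and H, which is below 250.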
In a preferred embodiment of the load monitoring method of the present application, migrating the selected partitions includes:
Migrating each selected partition to a server that has no overloaded thread pool, thereby achieving load balancing.
In a preferred embodiment of the load monitoring method of the present application, migrating each selected partition to a server that has no overloaded thread pool includes:
Searching for an eligible server that has no overloaded thread pool and, if one is found, migrating the selected partition to the found server. In this embodiment a selected partition is migrated only after an eligible server that has no overloaded thread pool has been found, which improves load balancing.
In a preferred embodiment of the load monitoring method of the present application, the eligible server that has no overloaded thread pool includes:
If, after a selected partition is migrated to the corresponding thread pools of a server that has no overloaded thread pool, the average thread usage rate of each corresponding thread pool of that target server does not exceed a preset usage rate threshold, the server is an eligible server that has no overloaded thread pool. Specifically, one partition is taken from the set to be migrated, that is, from all selected partitions, and its request counts on all thread pools of its current server are obtained. For example, a selected partition on server M uses two thread pools to process its requests: a read-request thread pool Q1 and a write-request thread pool Q2. A server is chosen at random from the set of servers that have no overloaded thread pool, say server N. It is then computed whether, after the partition's load on Q1 and Q2 is migrated to the corresponding read-request thread pool Q1+ and write-request thread pool Q2+ on server N (Q1 to Q1+ and Q2 to Q2+), the average thread usage rates of Q1+ and Q2+ both stay within the preset usage rate threshold. The preset usage rate threshold may be an empirical value. If, after the migration, the usage rate of every corresponding thread pool does not exceed the preset usage rate threshold, the migration is allowed; otherwise another server that has no overloaded thread pool is selected and the process is repeated. If all such servers have been checked and none is eligible, migration of the partition is abandoned. This embodiment precisely locates eligible servers that have no overloaded thread pool, which improves load balancing.
In a preferred embodiment of the load monitoring method for a distributed storage system of the present application, the average thread usage of each thread pool is obtained by the formula $(\lambda_1 + \lambda) \cdot B / n$, where:
λ1 is the rate at which requests arrive at the queue of a given thread pool on the source server before the migration;
λ is the rate at which requests arrive at the queue of the corresponding thread pool on the target server before the migration;
B is the actual processing time that each thread in the corresponding thread pool of the target server spends on one request;
n is the number of threads in the corresponding thread pool of the target server. Specifically, taking the source server M and the target server N above as an example, it is necessary to calculate separately whether the average thread usage of the read-request thread pool Q1+ and of the write-request thread pool Q2+ each stays within the preset usage threshold. The average thread usage of Q1+ is $(\lambda_{Q1} + \lambda_{Q1+}) \cdot B_{Q1+} / n_{Q1+}$, and the average thread usage of Q2+ is $(\lambda_{Q2} + \lambda_{Q2+}) \cdot B_{Q2+} / n_{Q2+}$. This embodiment can accurately calculate the average thread usage of each thread pool, thereby achieving better load balancing.
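The usage formula and the server search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PoolStats`, `projected_usage`, `can_accept`, and `find_target` are hypothetical and do not come from the application, pools are matched between servers by a string name such as "read" or "write", and candidate servers are tried in dictionary order, whereas the text selects them at random.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PoolStats:
    """Queue parameters of one thread pool on a candidate target server."""
    arrival_rate: float  # lambda: requests per second reaching the pool's queue
    service_time: float  # B: seconds one thread spends on one request
    threads: int         # n: number of threads in the pool


def projected_usage(lam_source: float, target: PoolStats) -> float:
    """Average thread usage after migration: (lambda_1 + lambda) * B / n."""
    return (lam_source + target.arrival_rate) * target.service_time / target.threads


def can_accept(source_rates: Dict[str, float],
               target_pools: Dict[str, PoolStats],
               usage_threshold: float) -> bool:
    """True if every corresponding pool on the target stays within the threshold."""
    return all(
        projected_usage(rate, target_pools[name]) <= usage_threshold
        for name, rate in source_rates.items()
    )


def find_target(source_rates: Dict[str, float],
                candidates: Dict[str, Dict[str, PoolStats]],
                usage_threshold: float) -> Optional[str]:
    """Return the first candidate server that can accept the load, else None.

    Assumption: candidates are checked in dict order; shuffle the keys
    beforehand to mimic the random choice described in the text.
    """
    for server, pools in candidates.items():
        if can_accept(source_rates, pools, usage_threshold):
            return server
    return None  # no qualifying server: the migration is abandoned
```

Returning `None` corresponds to the case in which every server with no overloaded thread pool has been checked without success.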
As shown in FIG. 6, in a specific application example of the present application, the load monitoring method for a distributed storage system includes the following steps:
Step S61: obtain a thread pool in the distributed storage system that has not yet been processed, and determine whether one was obtained; if not, go to step S62; if so, go to step S63.
Step S62: end.
Step S63: obtain the ratio of the waiting time Wq to the sojourn time W of a request in the queue of the thread pool, where the sojourn time W is the sum of the waiting time Wq of a request in the queue of the thread pool and the actual processing time B.
Step S64: determine whether the ratio of the waiting time Wq to the sojourn time W exceeds the preset overload threshold th; if not, go to step S61; if so, go to step S65.
Step S65: count the requests passing through the thread pool by partition to obtain the number of requests belonging to each partition in the thread pool, sort the partitions in descending order of request count, and determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; if so, go to step S66; if not, go to step S67.
Step S66: perform a split operation on the partition with the most requests.
Step S67: starting from the first partition in the descending order, select one or more partitions in sequence until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool.
Step S68: determine whether any of the selected partitions is still unprocessed; if so, go to step S69; if not, go to step S61.
Step S69: take the next unprocessed partition.
Step S70: search for a qualifying server with no overloaded thread pool and determine whether one was found; if not, go to step S68 to take the next unprocessed partition from the selected partitions and continue processing, until all selected partitions have been processed; if so, go to step S71.
Step S71: migrate the selected partition to the server that was found, then go to step S68.
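The control flow of steps S61 to S71 can be sketched as a single planning function. This is a hedged sketch, not the application's implementation: `Pool`, `plan_actions`, and the `find_server` callback are hypothetical names, the function only records the intended split and migrate actions instead of performing them, and it assumes that processing moves on to the next thread pool after a split, which the step list does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (kind, partition, target server or None)
Action = Tuple[str, str, Optional[str]]


@dataclass
class Pool:
    wq: float                 # Wq: waiting time of a request in the queue
    b: float                  # B: actual processing time of a request
    requests: Dict[str, int]  # request count per partition in this pool


def plan_actions(pools: List[Pool], th: float,
                 find_server: Callable[[str], Optional[str]]) -> List[Action]:
    actions: List[Action] = []
    for pool in pools:                       # S61; loop exhaustion is S62
        w = pool.wq + pool.b                 # S63: sojourn time W = Wq + B
        if w <= 0 or pool.wq / w <= th:      # S64: not overloaded
            continue
        ranked = sorted(pool.requests.items(),
                        key=lambda kv: kv[1], reverse=True)  # S65
        total = sum(count for _, count in ranked)
        if total == 0:
            continue
        if ranked[0][1] > total / 2:
            actions.append(("split", ranked[0][0], None))    # S66
            continue                         # assumption: go on to the next pool
        remaining = total
        for name, count in ranked:           # S67 merged with S68 and S69
            if remaining < total / 2:
                break
            remaining -= count
            server = find_server(name)       # S70
            if server is not None:
                actions.append(("migrate", name, server))    # S71
    return actions
```

Selection (S67) and the per-partition loop (S68 to S71) are merged here because whether a server is found for one partition does not change which partitions are selected.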
As shown in FIG. 7, according to another aspect of the present application, a load monitoring device for a distributed storage system is further provided. The device 100 includes:
a load monitoring apparatus 1, configured to determine all overloaded thread pools on each server in the distributed storage system;
an alarm or load balancing apparatus 2, configured to raise an alarm or perform load balancing for each overloaded thread pool. In this embodiment, an alarm is raised, or load is automatically balanced across servers, based on the state of a server's thread pools, that is, on whether the load of a single server exceeds its service capacity. This does not depend on the user request pattern, so requests arriving simultaneously from different users are handled correctly. Nor does it depend on the service capacity of the servers, so alarming or load balancing is performed correctly even when the servers within the cluster of the distributed storage system have unequal service capacities. Hotspots are thereby prevented and the quality of service of the distributed storage system is improved. Specifically, based on the basic thread pool information of each server, all overloaded thread pools on each server in the distributed storage system can be determined, and an alarm raised or load balancing performed for each overloaded thread pool. A single server generally has several thread pools. A typical thread pool can be described by a single-queue model, and its basic information can be expressed as queue parameters, which may include the following:
a) Wq: the waiting time of a request in the queue of a thread pool.
b) B: the actual processing time of a request in the queue of a thread pool.
c) W: the sojourn time of a request in the queue of a thread pool, that is, the waiting time Wq plus the actual processing time B.
d) λ: the rate at which requests arrive at the queue of a thread pool.
e) μ: the service capacity of a thread pool per unit time.
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the load monitoring apparatus 1 is configured to obtain, for each thread pool on each server, the ratio of the waiting time Wq to the sojourn time W of a request in the queue, where the sojourn time W is the sum of the waiting time Wq and the actual processing time B; when the ratio of Wq to W exceeds the preset overload threshold th, the thread pool on the server where the request is located is determined to be overloaded. The parameters of each thread pool on each server are analyzed to find the overloaded thread pools. Whether a thread pool is overloaded can be determined with the following formula:
$W_q / W > th$
That is, when the ratio of the waiting time Wq to the sojourn time W exceeds the preset overload threshold th, the thread pool on the server where the request is located is determined to be overloaded. By comparing the ratio of Wq to W with the preset overload threshold th, this embodiment can accurately identify all overloaded thread pools on each server.
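As a minimal sketch of this test, the following Python fragment scans per-server queue statistics and returns the pools for which Wq/W exceeds th. The function names and the nested-dictionary layout of `stats` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def is_overloaded(wq: float, b: float, th: float) -> bool:
    """Apply Wq / W > th, with the sojourn time W = Wq + B."""
    w = wq + b
    return w > 0 and wq / w > th


def overloaded_pools(stats: Dict[str, Dict[str, Tuple[float, float]]],
                     th: float) -> List[Tuple[str, str]]:
    """stats maps server -> pool name -> (Wq, B); returns (server, pool) pairs."""
    return [
        (server, pool)
        for server, pools in stats.items()
        for pool, (wq, b) in pools.items()
        if is_overloaded(wq, b, th)
    ]
```

With Wq = 30 ms and B = 10 ms, for instance, the ratio is 0.75, so a threshold of 0.5 flags the pool.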
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the preset overload threshold is determined from a preset threshold of the thread pool's request arrival rate, where the request arrival rate of a thread pool is the ratio λ/μ of the rate λ at which requests arrive at the thread pool's queue to the thread pool's service capacity μ per unit time. When the request arrival rate λ/μ exceeds its preset threshold, the corresponding ratio Wq/W of waiting time to sojourn time begins to rise sharply, and the preset overload threshold is set above the value of Wq/W at the point where this sharp rise begins. In FIG. 3, each curve represents one thread pool, and n denotes the number of threads in the corresponding pool: the first thread pool has 1 thread, the second has 2, the third has 5, the fourth has 10, and the fifth has 24. The horizontal axis of FIG. 3 is the request arrival rate λ/μ, and the vertical axis is the ratio Wq/W of the waiting time Wq to the sojourn time W. FIG. 3 shows that Wq/W begins to rise sharply once λ/μ exceeds a certain value (the preset threshold of the thread pool's request arrival rate). The value of Wq/W at the point where the sharp rise begins is the inflection point, so the preset overload threshold for Wq/W only needs to exceed the inflection point. For example, in practice the inflection point value can be set to 0.5, in which case the preset overload threshold only needs to be greater than 0.5. By determining the preset overload threshold from the preset threshold of the thread pool's request arrival rate, this embodiment can identify all overloaded thread pools on each server more accurately.
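The text does not say which queueing model produced the curves in FIG. 3. Purely as an assumption, the sketch below uses the textbook M/M/n queue (Poisson arrivals, exponential service, n identical threads, one shared queue) to show how Wq/W can be computed from the request arrival rate ρ = λ/μ and the thread count n. The function names `erlang_c` and `wait_ratio` are hypothetical, and the result is valid only for 0 ≤ ρ < 1.

```python
from math import factorial


def erlang_c(n: int, rho: float) -> float:
    """Probability that an arriving request must wait in an M/M/n queue.

    rho = lambda / mu is the pool-level utilisation (0 <= rho < 1) and
    a = n * rho is the offered load in units of one thread.
    """
    a = n * rho
    tail = a ** n / factorial(n) / (1.0 - rho)
    head = sum(a ** k / factorial(k) for k in range(n))
    return tail / (head + tail)


def wait_ratio(n: int, rho: float) -> float:
    """Wq / W under the M/M/n assumption.

    With pool capacity mu, Wq = C / (mu * (1 - rho)) and B = n / mu,
    so Wq / (Wq + B) simplifies to C / (C + n * (1 - rho)).
    """
    c = erlang_c(n, rho)
    return c / (c + n * (1.0 - rho))
```

Under this model a single-thread pool has Wq/W equal to ρ, while pools with more threads stay near zero for longer and then climb steeply as ρ approaches 1, which is qualitatively the behavior the text attributes to FIG. 3.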
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to count the requests passing through each overloaded thread pool by partition, obtaining the number of requests that belong to each partition in the thread pool, and to sort the partitions in descending order of request count; and to determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool and, if so, to perform a split operation on that partition. For example, suppose an overloaded thread pool serves three partitions A, B, and C, with 100, 20, and 10 requests respectively. The request count of the busiest partition A, 100, exceeds half of the total number of requests of all partitions of the thread pool, 65 = (100+20+10)/2. In this embodiment, request information is extracted from each overloaded thread pool and analyzed, because thread pools and partitions do not correspond one to one: a partition is only a logical unit, and the requests belonging to it may be processed by several thread pools. The requests passing through the thread pool are counted by partition to obtain the number of requests belonging to each partition, and the partitions are then sorted in descending order of request count. If the request count of the busiest partition in a thread pool exceeds half of the total number of requests of all partitions of the thread pool, that partition is selected and processing jumps to step S23. In step S23, because the requests belonging to the selected partition account for more than half of the total number of requests of all partitions of the thread pool, the selected partition needs to be split, and processing ends after the split. By finding the busiest partition whose request count exceeds half of the total, this embodiment identifies the partition that contributes significantly to the overload of the thread pool, so that it can be selected and split, thereby achieving load balancing effectively.
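The per-partition counting and the majority test can be illustrated with a short sketch. It assumes that each request passing through the pool is tagged with the name of its partition; the names `rank_partitions` and `partition_to_split` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def rank_partitions(request_partitions: Iterable[str]) -> List[Tuple[str, int]]:
    """Count requests per partition and sort in descending order of count."""
    return Counter(request_partitions).most_common()


def partition_to_split(ranked: List[Tuple[str, int]]) -> Optional[str]:
    """Return the busiest partition if it holds more than half of all requests."""
    total = sum(count for _, count in ranked)
    if ranked and ranked[0][1] > total / 2:
        return ranked[0][0]
    return None
```

For the example in the text, partition A has 100 of 130 requests, which is more than 65, so it is the partition returned for splitting.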
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to divide the partition into several sub-partitions and distribute the sub-partitions to other servers, where each sub-partition corresponds to a sub-range of the partition's key range and the sub-partitions have roughly equal numbers of requests. Specifically, the split points are chosen by sorting the partition's requests that fall on the thread pool and dividing them evenly by request count across the partition's key range. For example, suppose a partition's key range is 0.1 to 0.4, with 200 requests in the range 0.1 to 0.2, 200 in the range 0.2 to 0.3, and 200 in the range 0.3 to 0.4. The partition can then be divided into three sub-partitions with key sub-ranges 0.1 to 0.2, 0.2 to 0.3, and 0.3 to 0.4, thereby achieving better load balancing.
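One simple way to pick such split points is to sort the observed request keys and take the keys at equal-count boundaries. The sketch below is an assumption-laden illustration: `split_points` is a hypothetical name, the number of sub-partitions `parts` is supplied by the caller because the text does not say how it is chosen, and keys are treated as plain sortable numbers.

```python
from typing import List, Sequence


def split_points(request_keys: Sequence[float], parts: int) -> List[float]:
    """Return parts - 1 keys that cut the sorted requests into equal-count runs.

    Sub-partition i covers the keys from split point i - 1 (inclusive) up to
    split point i (exclusive); the first and last runs are bounded by the
    partition's own key range.
    """
    keys = sorted(request_keys)
    if parts < 2 or len(keys) < parts:
        return []  # nothing sensible to split
    return [keys[(i * len(keys)) // parts] for i in range(1, parts)]
```

If many requests share the same key, adjacent split points can coincide, so a caller would need to deduplicate them before creating sub-partitions.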
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool and, if not, to select one or more partitions in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool; and to perform a migration operation on the selected partitions. Here, if the busiest partition does not exceed half of the total, no single partition has a significant impact on the thread pool's load. Partitions are therefore selected one after another, starting from the first partition in the descending order, until the requests of the remaining partitions total less than half of all requests of the thread pool, and the selected partitions are then migrated one by one. For example, suppose an overloaded thread pool serves five partitions D, E, F, G, and H, each with 100 requests. The first three partitions D, E, and F are selected for migration one by one, so that the requests of the remaining partitions G and H total 200 = 100+100, which is less than half of the total number of requests of all partitions of the thread pool, 250 = (100+100+100+100+100)/2. This embodiment can accurately find the partitions in a thread pool that need to be migrated, thereby achieving better load balancing.
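The selection rule can be written as a short loop over the ranked partitions. This is a minimal sketch: the name `select_for_migration` is hypothetical, and the input is assumed to be already sorted in descending order of request count.

```python
from typing import List, Tuple


def select_for_migration(ranked: List[Tuple[str, int]]) -> List[str]:
    """Pick partitions from the top until the rest hold less than half the requests."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return []
    remaining = total
    selected: List[str] = []
    for name, count in ranked:
        if remaining < total / 2:
            break  # the unselected partitions now total less than half
        selected.append(name)
        remaining -= count
    return selected
```

For five partitions of 100 requests each, the loop stops after D, E, and F, when the remaining 200 requests fall below the half-way mark of 250.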
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to migrate each selected partition to a server with no overloaded thread pool, thereby achieving load balancing.
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the alarm or load balancing apparatus 2 is configured to search for a qualifying server with no overloaded thread pool and, if one is found, to migrate the selected partition to that server. In this embodiment, a selected partition is migrated only after a qualifying server with no overloaded thread pool has been found, thereby achieving better load balancing.
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, a qualifying server with no overloaded thread pool is defined as follows:
If, after a selected partition is migrated to the corresponding thread pools of a server that has no overloaded thread pool, the average thread usage of every corresponding thread pool on that target server does not exceed a preset usage threshold, then the server is a qualifying server with no overloaded thread pool. Specifically, one partition is taken from the set to be migrated, that is, from all the selected partitions, and the number of its requests on every thread pool of its current server is obtained. For example, a selected partition on server M uses two thread pools to process its requests: a read-request thread pool Q1 and a write-request thread pool Q2. One server is chosen at random from the set of servers that have no overloaded thread pool, for example server N. It is then calculated whether, after the partition's load on Q1 and Q2 is migrated to the corresponding read-request thread pool Q1+ and write-request thread pool Q2+ on server N (Q1 to Q1+, and Q2 to Q2+), the average thread usage of both Q1+ and Q2+ stays within the preset usage threshold. The preset usage threshold may be an empirical value. If the usage of all corresponding thread pools after the migration does not exceed the preset usage threshold, the migration is allowed. Otherwise, another server with no overloaded thread pool is selected and the process is repeated. If all such servers have been checked and no qualifying server has been found, migration of the partition is abandoned. This embodiment can accurately find a qualifying server with no overloaded thread pool, thereby achieving better load balancing.
In a preferred embodiment of the load monitoring device for a distributed storage system of the present application, the average thread usage of each thread pool is obtained by the formula $(\lambda_1 + \lambda) \cdot B / n$, where:
λ1 is the rate at which requests arrive at the queue of a given thread pool on the source server before the migration;
λ is the rate at which requests arrive at the queue of the corresponding thread pool on the target server before the migration;
B is the actual processing time that each thread in the corresponding thread pool of the target server spends on one request;
n is the number of threads in the corresponding thread pool of the target server. Specifically, taking the source server M and the target server N above as an example, it is necessary to calculate separately whether the average thread usage of the read-request thread pool Q1+ and of the write-request thread pool Q2+ each stays within the preset usage threshold. The average thread usage of Q1+ is $(\lambda_{Q1} + \lambda_{Q1+}) \cdot B_{Q1+} / n_{Q1+}$, and the average thread usage of Q2+ is $(\lambda_{Q2} + \lambda_{Q2+}) \cdot B_{Q2+} / n_{Q2+}$. This embodiment can accurately calculate the average thread usage of each thread pool, thereby achieving better load balancing.
In summary, the present application determines all overloaded thread pools on each server in the distributed storage system and raises an alarm or performs load balancing for each of them. Alarming or automatic load balancing across servers is thus driven by the state of each server's thread pools, that is, by whether the load of a single server exceeds its service capacity. This does not depend on the user request pattern, so requests arriving simultaneously from different users are handled correctly. Nor does it depend on the service capacity of the servers, so alarming or load balancing is performed correctly even when the servers within the cluster of the distributed storage system have unequal service capacities. Hotspots are thereby prevented and the quality of service of the distributed storage system is improved.
Further, by comparing the ratio of the waiting time Wq to the sojourn time W with the preset overload threshold th, the present application can accurately identify all overloaded thread pools on each server.
Further, the present application determines the preset overload threshold from a preset threshold of the thread pool's request arrival rate, so that all overloaded thread pools on each server can be identified more accurately.
Further, the present application counts the requests passing through each overloaded thread pool by partition, obtains the number of requests belonging to each partition in the thread pool, and sorts the partitions in descending order of request count. When the request count of the busiest partition exceeds half of the total number of requests of all partitions of the thread pool, that partition is split. The partition that needs to be split can thus be found accurately, thereby achieving load balancing effectively.
Further, in the present application, when the request count of the busiest partition does not exceed half of the total number of requests of all partitions of the thread pool, one or more partitions are selected in sequence, starting from the first partition in the descending order, until the total number of requests belonging to the remaining unselected partitions is less than half of the total, and the selected partitions are migrated. The partitions in a thread pool that need to be migrated can thus be found accurately, thereby achieving better load balancing.
Further, the present application migrates a selected partition only after a qualifying server with no overloaded thread pool has been found, thereby achieving better load balancing.
Further, in the present application, if, after a selected partition is migrated to the corresponding thread pools of a server that has no overloaded thread pool, the average thread usage of every corresponding thread pool on that target server does not exceed the preset usage threshold, then the server is a qualifying server with no overloaded thread pool. A qualifying server can thus be found accurately, thereby achieving better load balancing.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技 术的范围之内,则本申请也意图包含这些改动和变型在内。It will be apparent to those skilled in the art that various modifications and changes can be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the application are within the scope of the claims and their equivalents This application is also intended to cover such modifications and variations.
需要注意的是,本发明可在软件和/或软件与硬件的组合体中被实施,例如,可采用专用集成电路(ASIC)、通用目的计算机或任何其他类似硬件设备来实现。在一个实施例中,本发明的软件程序可以通过处理器执行以实现上文所述步骤或功能。同样地,本发明的软件程序(包括相关的数据结构)可以被存储到计算机可读记录介质中,例如,RAM存储器,磁或光驱动器或软磁盘及类似设备。另外,本发明的一些步骤或功能可采用硬件来实现,例如,作为与处理器配合从而执行各个步骤或功能的电路。It should be noted that the present invention can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Likewise, the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like. Additionally, some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
In addition, part of the present invention may be applied as a computer program product, such as computer program instructions which, when executed by a computer, invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to those program instructions. Here, one embodiment of the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions according to the foregoing embodiments of the present invention.
It is apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the invention. No reference sign in the claims should be construed as limiting the claim concerned. Moreover, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

  1. A load monitoring method for a distributed storage system, wherein the method comprises:
    determining all overloaded thread pools (thread pools whose load pressure exceeds the limit) on each server in the distributed storage system; and
    raising an alarm for, or performing load balancing on, each overloaded thread pool.
  2. The method of claim 1, wherein determining all overloaded thread pools on each server in the distributed storage system comprises:
    obtaining, for a request in the queue of each thread pool on each server, the ratio of the request's waiting time to its sojourn (stay) time, the sojourn time being the sum of the waiting time and the actual processing time of a request in the queue of the thread pool; and
    when the ratio of the waiting time to the sojourn time exceeds a preset overload threshold, determining that the thread pool, on the server where the request resides, is overloaded.
  3. The method of claim 2, wherein the preset overload threshold is determined from a preset threshold on the request arrival rate of the thread pool, the request arrival rate of a thread pool being the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time; when the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of waiting time to sojourn time begins to rise sharply, and the preset overload threshold exceeds the value of that ratio at the point where the sharp rise begins.
  4. The method of any one of claims 1 to 3, wherein performing load balancing on each overloaded thread pool comprises:
    counting, by partition, the requests passing through each overloaded thread pool to obtain the number of requests belonging to each partition in that thread pool, and sorting the partitions in descending order of request count;
    determining whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; and
    if so, performing a split operation on the partition with the most requests.
  5. The method of claim 4, wherein performing a split operation on the partition with the most requests comprises:
    dividing the partition into several sub-partitions and distributing the sub-partitions to other servers, wherein each sub-partition corresponds to one sub-range of keys within the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal.
  6. The method of claim 4, wherein, after determining whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool, the method further comprises:
    if not, selecting one or more partitions in sequence from the descending-ordered partitions, starting from the first partition, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool; and
    performing a migration operation on the selected partitions.
  7. The method of claim 6, wherein performing a migration operation on the selected partitions comprises:
    migrating each selected partition to a server that has no overloaded thread pool.
  8. The method of claim 7, wherein migrating each selected partition to a server that has no overloaded thread pool comprises:
    searching for an eligible server that has no overloaded thread pool and, if one is found, migrating the selected partition to the server found.
  9. The method of claim 8, wherein the eligible server that has no overloaded thread pool is defined as follows:
    if, after a selected partition is migrated to the corresponding thread pool of a server that has no overloaded thread pool, the average thread utilization of every corresponding thread pool of the target server does not exceed a preset utilization threshold, the server is an eligible server that has no overloaded thread pool.
  10. The method of claim 9, wherein the average thread utilization of each thread pool is obtained by the formula (λ1 + λ) * B / n, wherein
    λ1 denotes the rate at which requests in a thread pool on the server arrive at the queue of that thread pool before the migration;
    λ denotes the rate at which requests in a corresponding thread pool on the target server arrive at the queue of that thread pool before the migration;
    B denotes the actual processing time that each thread in a corresponding thread pool of the target server spends on one request; and
    n denotes the number of threads in a corresponding thread pool of the target server.
  11. A load monitoring device for a distributed storage system, wherein the device comprises:
    a load monitoring apparatus, configured to determine all overloaded thread pools on each server in the distributed storage system; and
    an alarm or load balancing apparatus, configured to raise an alarm for, or perform load balancing on, each overloaded thread pool.
  12. The device of claim 11, wherein the load monitoring apparatus is configured to obtain, for a request in the queue of each thread pool on each server, the ratio of the request's waiting time to its sojourn time, the sojourn time being the sum of the waiting time and the actual processing time of a request in the queue of the thread pool; and, when the ratio of the waiting time to the sojourn time exceeds a preset overload threshold, to determine that the thread pool, on the server where the request resides, is overloaded.
  13. The device of claim 12, wherein the preset overload threshold is determined from a preset threshold on the request arrival rate of the thread pool, the request arrival rate of a thread pool being the ratio of the rate at which requests arrive at the queue of the thread pool to the service capacity of the thread pool per unit time; when the request arrival rate of the thread pool exceeds its preset threshold, the corresponding ratio of waiting time to sojourn time begins to rise sharply, and the preset overload threshold exceeds the value of that ratio at the point where the sharp rise begins.
  14. The device of any one of claims 11 to 13, wherein the alarm or load balancing apparatus is configured to count, by partition, the requests passing through each overloaded thread pool to obtain the number of requests belonging to each partition in that thread pool, and to sort the partitions in descending order of request count; to determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; and, if so, to perform a split operation on the partition with the most requests.
  15. The device of claim 14, wherein the alarm or load balancing apparatus is configured to divide the partition into several sub-partitions and to distribute the sub-partitions to other servers, wherein each sub-partition corresponds to one sub-range of keys within the key range of the partition, and the numbers of requests belonging to the sub-partitions are substantially equal.
  16. The device of claim 14, wherein the alarm or load balancing apparatus is configured to determine whether the request count of the partition with the most requests exceeds half of the total number of requests of all partitions of the thread pool; if not, to select one or more partitions in sequence from the descending-ordered partitions, starting from the first partition, until the total number of requests belonging to the remaining unselected partitions is less than half of the total number of requests of all partitions of the thread pool; and to perform a migration operation on the selected partitions.
  17. The device of claim 16, wherein the alarm or load balancing apparatus is configured to migrate each selected partition to a server that has no overloaded thread pool.
  18. The device of claim 17, wherein the alarm or load balancing apparatus is configured to search for an eligible server that has no overloaded thread pool and, if one is found, to migrate the selected partition to the server found.
  19. The device of claim 18, wherein the eligible server that has no overloaded thread pool is defined as follows:
    if, after a selected partition is migrated to the corresponding thread pool of a server that has no overloaded thread pool, the average thread utilization of every corresponding thread pool of the target server does not exceed a preset utilization threshold, the server is an eligible server that has no overloaded thread pool.
  20. The device of claim 19, wherein the average thread utilization of each thread pool is obtained by the formula (λ1 + λ) * B / n, wherein
    λ1 denotes the rate at which requests in a thread pool on the server arrive at the queue of that thread pool before the migration;
    λ denotes the rate at which requests in a corresponding thread pool on the target server arrive at the queue of that thread pool before the migration;
    B denotes the actual processing time that each thread in a corresponding thread pool of the target server spends on one request; and
    n denotes the number of threads in a corresponding thread pool of the target server.
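
The detection rule of claims 2 and 12 and the split-or-migrate decision of claims 4 and 6 can be illustrated with a short Python sketch. It is only a reading of the claim language under stated assumptions: the function names, the `("split" | "migrate" | "none", partitions)` return shape, and the handling of an empty pool are hypothetical and do not come from the application.

```python
from typing import Dict, List, Tuple


def wait_ratio(wait_time: float, processing_time: float) -> float:
    """Ratio of waiting time to sojourn time (waiting + actual processing)."""
    sojourn = wait_time + processing_time
    return wait_time / sojourn if sojourn > 0 else 0.0


def is_overloaded(wait_time: float, processing_time: float,
                  threshold: float) -> bool:
    """A pool counts as overloaded when the ratio exceeds the preset threshold."""
    return wait_ratio(wait_time, processing_time) > threshold


def plan_rebalance(requests_per_partition: Dict[str, int]) -> Tuple[str, List[str]]:
    """Choose between splitting one hot partition and migrating several.

    requests_per_partition maps a partition name to the number of requests
    it contributed to one overloaded thread pool.
    """
    total = sum(requests_per_partition.values())
    if total == 0:
        return ("none", [])  # assumption: nothing to rebalance in an idle pool

    # Sort partitions by request count, largest first.
    ranked = sorted(requests_per_partition.items(),
                    key=lambda item: item[1], reverse=True)

    hottest, hottest_count = ranked[0]
    if hottest_count > total / 2:
        # One partition carries more than half the load: split it.
        return ("split", [hottest])

    # Otherwise select partitions from the top until the unselected
    # remainder holds fewer than half of all requests, then migrate them.
    selected: List[str] = []
    remaining = total
    for name, count in ranked:
        selected.append(name)
        remaining -= count
        if remaining < total / 2:
            break
    return ("migrate", selected)
```

With request counts of 40, 35 and 25 for partitions `a`, `b` and `c`, no single partition exceeds half of the 100 requests, so the sketch selects `a` and `b` for migration and leaves 25 requests behind. With counts of 70 and 30, partition `a` alone exceeds half and is chosen for splitting.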
PCT/CN2016/093893 2015-08-17 2016-08-08 Method and device for monitoring load of distributed storage system WO2017028696A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510504654.2A CN106469018B (en) 2015-08-17 2015-08-17 Load monitoring method and device for distributed storage system
CN201510504654.2 2015-08-17

Publications (1)

Publication Number Publication Date
WO2017028696A1 true WO2017028696A1 (en) 2017-02-23

Family

ID=58050746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/093893 WO2017028696A1 (en) 2015-08-17 2016-08-08 Method and device for monitoring load of distributed storage system

Country Status (2)

Country Link
CN (1) CN106469018B (en)
WO (1) WO2017028696A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928661B (en) * 2019-11-22 2023-06-16 北京浪潮数据技术有限公司 Thread migration method, device, equipment and readable storage medium
CN111949482B (en) * 2020-08-13 2022-05-20 广东佳米科技有限公司 Software performance bottleneck indication method and system based on thread load
CN114785796A (en) * 2022-04-22 2022-07-22 中国农业银行股份有限公司 Data equalization method and device
CN115033390B (en) * 2022-08-09 2022-11-25 阿里巴巴(中国)有限公司 Load balancing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090157969A1 (en) * 2007-12-18 2009-06-18 Harding Matthew J Buffer cache management to prevent deadlocks
CN102207891A (en) * 2011-06-10 2011-10-05 浙江大学 Method for achieving dynamic partitioning and load balancing of data-partitioning distributed environment
CN102594861A (en) * 2011-12-15 2012-07-18 杭州电子科技大学 Cloud storage system with balanced multi-server load
US20140143789A1 (en) * 2009-08-25 2014-05-22 Netapp, Inc. Adjustment of threads for execution based on over-utilization of a domain in a multi-processor system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6898617B2 (en) * 1999-11-18 2005-05-24 International Business Machines Corporation Method, system and program products for managing thread pools of a computing environment to avoid deadlock situations by dynamically altering eligible thread pools
US8713576B2 (en) * 2006-02-21 2014-04-29 Silicon Graphics International Corp. Load balancing for parallel tasks
CN101452406B (en) * 2008-12-23 2011-05-18 北京航空航天大学 Cluster load balance method transparent for operating system
CN102567089B (en) * 2011-10-25 2014-02-19 曙光信息产业(北京)有限公司 Design method for thread pool of metadata server in distributed file system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109298917A (en) * 2017-07-25 2019-02-01 沈阳高精数控智能技术股份有限公司 A kind of self-adapting dispatching method suitable for real-time system hybrid task
CN109298917B (en) * 2017-07-25 2020-10-30 沈阳高精数控智能技术股份有限公司 Self-adaptive scheduling method suitable for real-time system mixed task
CN109542629A (en) * 2018-12-26 2019-03-29 苏州乐麟无线信息科技有限公司 A kind of processing method and processing device of the data based on distributed system
CN112685196A (en) * 2020-12-24 2021-04-20 平安普惠企业管理有限公司 Thread pool management method, device, equipment and medium suitable for distributed technology
CN112685196B (en) * 2020-12-24 2023-12-08 湖北华中电力科技开发有限责任公司 Thread pool management method, device, equipment and medium suitable for distributed technology
CN112749013A (en) * 2021-01-19 2021-05-04 广州虎牙科技有限公司 Thread load detection method and device, electronic equipment and storage medium
CN112749013B (en) * 2021-01-19 2024-04-19 广州虎牙科技有限公司 Thread load detection method and device, electronic equipment and storage medium
CN115934372A (en) * 2023-03-09 2023-04-07 浪潮电子信息产业股份有限公司 Data processing method, system, equipment and computer readable storage medium
CN116107760A (en) * 2023-04-07 2023-05-12 浪潮电子信息产业股份有限公司 Load balancing method, device, equipment and medium
CN116107760B (en) * 2023-04-07 2023-07-14 浪潮电子信息产业股份有限公司 Load balancing method, device, equipment and medium

Also Published As

Publication number Publication date
CN106469018B (en) 2019-12-27
CN106469018A (en) 2017-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16836560

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16836560

Country of ref document: EP

Kind code of ref document: A1