WO2017028697A1 - Method and device for expanding and shrinking a computer cluster (计算机集群的扩容和缩容方法及设备) - Google Patents

Method and device for expanding and shrinking a computer cluster

Info

Publication number
WO2017028697A1
Authority
WO
WIPO (PCT)
Prior art keywords
servers
computer cluster
server
current service
service requirements
Prior art date
Application number
PCT/CN2016/093894
Other languages
English (en)
French (fr)
Inventor
程霖
卢毅军
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017028697A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177: Initialisation or configuration control

Definitions

  • The present application relates to the field of computers, and in particular to a method and a device for expanding and shrinking a computer cluster.
  • When the access pressure on a cluster increases, the computer cluster must be expanded, that is, the number of servers in the computer cluster must be increased.
  • When the access pressure decreases, the computer cluster must be shrunk, that is, the number of servers in the computer cluster that are providing service must be reduced.
  • The expansion and shrinking of distributed computer clusters are generally performed manually. The operation is cumbersome, and it is difficult to expand or shrink a distributed computer cluster quickly and in real time.
  • An object of the present application is to provide a method and a device for expanding and shrinking a computer cluster that solve the problem that the existing expansion and shrinking process for distributed computer clusters is not real-time, is cumbersome, and is inefficient.
  • a method for expanding and shrinking a computer cluster comprising:
  • the number of servers in the computer cluster that respond to all current service requirements is increased or decreased based on all current service requirements of the computer cluster and real-time performance parameters of all servers in the computer cluster.
  • the number of servers in the computer cluster that respond to all current service requirements is increased or decreased, including:
  • the computer cluster includes a front end machine and/or a back end machine in a distributed lock service.
  • Before and after the increase or decrease, the number of servers in the backend machine that respond to all current service requirements is always an odd number, and the number remaining after a reduction is greater than half of the original number before the reduction.
  • the real-time performance parameter of each server includes at least one of the following:
  • obtaining real-time performance parameters of each server in the computer cluster including:
  • A background monitoring process is set in the to-be-monitored indicator items of the user process of each server in the computer cluster, and the real-time performance parameters of those indicator items are collected by the background monitoring process.
  • the method further includes:
  • the background monitoring process is closed on the reduced server.
  • a device for expanding and shrinking a computer cluster comprising:
  • a parameter obtaining device configured to acquire real-time performance parameters of each server in the computer cluster
  • a capacity expansion and reduction device for increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements, according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  • The capacity expansion and reduction device is configured to increase the number of servers in the computer cluster that respond to all current service requirements when, according to the real-time performance parameters of all servers in the computer cluster, that number is determined to be insufficient.
  • The capacity expansion and reduction device is configured to determine, according to a preset indicator threshold and the real-time performance parameter corresponding to each server in the computer cluster, whether the number of servers responding to all current service requirements is insufficient or redundant.
  • the computer cluster includes a front end machine and/or a back end machine in a distributed lock service.
  • The capacity expansion and reduction device is configured so that, before and after increasing or decreasing the number of servers in the backend machine that respond to all current service requirements, that number is always an odd number, and the number remaining after a reduction is greater than half of the original number before the reduction.
  • the parameter obtaining device acquires real-time performance parameters of each server including at least one of the following:
  • The parameter obtaining device is configured to set a background monitoring process in the to-be-monitored indicator items of the user process of each server in the computer cluster, and to collect the real-time performance parameters of those indicator items through the background monitoring process.
  • the device further includes a starting device, configured to start the background monitoring process on the added server after increasing the number of servers in the computer cluster that respond to all current service requirements;
  • the apparatus also includes a shutdown device for shutting down the background monitoring process on the reduced server after reducing the number of servers in the computer cluster that are responsive to all current service requirements.
  • By acquiring the real-time performance parameters of each server in the computer cluster, and increasing or decreasing the number of servers that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all its servers, the present application can obtain the performance change of each server in real time, learn the running status of the computer cluster from those changes, and automatically increase or decrease the number of servers that respond to all current service requirements accordingly, so that the computer cluster is expanded and shrunk automatically and efficiently. This embodiment is especially applicable to a computer cluster with a large amount of access.
  • Further, when the present application determines, according to the real-time performance parameters of all servers in the computer cluster, that the number of servers responding to all current service requirements is insufficient, it increases that number; when it determines that the number is redundant, it reduces that number. The real-time performance parameters of all servers are thus used to monitor the load of each server in real time and to judge whether the number of servers is insufficient or redundant, achieving automatic and efficient expansion and shrinking of the computer cluster.
  • Further, the present application determines whether the number of servers responding to all current service requirements is insufficient or redundant according to a preset indicator threshold and the real-time performance parameter corresponding to each server in the computer cluster, so that the judgment, and the subsequent expansion or shrinking of the cluster servers based on it, is more efficient and accurate.
  • Further, before and after increasing or decreasing the number of servers in the backend machine that respond to all current service requirements, that number is always an odd number, and the number remaining after a reduction is greater than half of the original number before the reduction, thereby achieving non-perceived, automatic, and efficient expansion and shrinking.
  • Further, the present application embeds monitoring code, that is, a background monitoring process, in the to-be-monitored indicator items of the user process of each server in the computer cluster, and collects the real-time performance parameters of those indicator items through the monitoring code. This allows performance parameters to be acquired in real time without writing an additional monitoring program that is independent of the user process, reducing the programmer's workload.
  • FIG. 1 shows a flow chart of a method for expanding and shrinking a computer cluster according to an aspect of the present application
  • FIG. 2 is a structural diagram of a capacity expansion and reduction device of a computer cluster according to another aspect of the present application.
  • FIG. 3 is a structural diagram of a capacity expansion and reduction device of a computer cluster according to a preferred embodiment of the present application
  • FIG. 4 shows a schematic diagram of a particular application embodiment in accordance with the present application.
  • the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage.
  • Computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • The present application provides a method for expanding and shrinking a computer cluster, where the method includes:
  • Step S1: Obtain real-time performance parameters of each server in the computer cluster. The parameters can be chosen according to which aspects of server running performance actually need to be monitored, and may include at least one of the following: the number of connections of each server, the number of read and write requests of each server, the CPU utilization of each server, and the disk utilization of each server.
  • Step S2: Increase or decrease the number of servers in the computer cluster that respond to all current service requirements, according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  • The solution of step S2 can be implemented by an intelligent unified operation and maintenance deployment platform. By acquiring the real-time performance parameters of each server in the computer cluster, this embodiment can obtain the performance change of each server in real time, learn the running status of the computer cluster from those changes, and automatically increase or decrease the number of servers that respond to all current service requirements accordingly, so that the computer cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly applicable to a computer cluster with a large amount of access.
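The flow of steps S1 and S2 can be sketched roughly as follows. This is only an illustration, not the patent's implementation: the `ServerMetrics` fields, the `decide` function, the use of cluster-wide averages, and the `high`/`low` thresholds of 0.8 and 0.3 are all assumptions introduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ServerMetrics:
    """One step-S1 sample for a single server (field names are illustrative)."""
    connections: int
    rw_requests: int
    cpu_util: float   # fraction of CPU in use, 0.0 to 1.0
    disk_util: float  # fraction of disk in use, 0.0 to 1.0


def decide(samples: list[ServerMetrics], high: float = 0.8, low: float = 0.3) -> str:
    """Step-S2 sketch: return 'expand', 'shrink', or 'hold' from cluster-wide averages.

    The thresholds are hypothetical; the source only says that the decision is
    based on real-time performance parameters.
    """
    cpu = mean(s.cpu_util for s in samples)
    disk = mean(s.disk_util for s in samples)
    if cpu > high or disk > high:
        return "expand"   # servers look insufficient for current demand
    if cpu < low and disk < low:
        return "shrink"   # servers look redundant
    return "hold"
```

A deployment platform could call `decide` on each fresh batch of samples and then add or remove servers accordingly.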
  • In step S2, increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements, according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster, includes:
  • Step S21: When the number of servers responding to all current service requirements is determined to be insufficient according to the real-time performance parameters of all servers in the computer cluster, increase the number of servers in the computer cluster that respond to all current service requirements. Here, if one or any combination of the real-time performance parameters (such as the number of connections, the number of read and write requests, the CPU utilization, and the disk utilization of each server) shows that the servers responding to service requirements cannot satisfy all of them, the number of servers is increased. For example, if the current service demand is 11,000 read requests and the current servers can satisfy only 10,000 read requests, servers must be added to satisfy the remaining 1,000 read requests.
  • Step S22: When the number of servers responding to all current service requirements is determined to be redundant according to the real-time performance parameters of all servers in the computer cluster, reduce the number of servers in the computer cluster that respond to all current service requirements. Here, if one or any combination of the same real-time performance parameters shows that servers are left over after all current service requirements are satisfied, the number of servers is reduced. For example, if the current service demand is 8,000 read requests, there are five servers, and three of them are enough to satisfy the 8,000 read requests, two servers should be removed to save server resources.
  • In this embodiment, the real-time performance parameters of all servers in the computer cluster are used to monitor the load of each server in real time and to judge whether the number of servers is insufficient or redundant, so that the computer cluster is expanded and shrunk automatically and efficiently.
  • In step S21 or step S22, whether the number of servers responding to all current service requirements is insufficient or redundant is determined according to a preset indicator threshold and the real-time performance parameter corresponding to each server in the computer cluster, so that the judgment, and the subsequent expansion or shrinking of the cluster servers based on it, is more efficient and accurate.
  • Expansion example: if one or any combination of the real-time performance parameters (such as the number of connections, the number of read and write requests, the CPU utilization, and the disk utilization of each server) shows that the servers cannot satisfy all service requirements, the number of servers is increased. Suppose the current service demand is 11,000 read requests and there are five servers, A, B, C, D, and E, whose preset indicator thresholds allow at most 5,000, 2,000, 1,000, 1,000, and 1,000 read requests respectively. The five servers together can satisfy only 10,000 read requests, so servers must be added to satisfy the remaining 1,000 read requests.
  • Shrinking example: if the same parameters show that servers are left over after the service demand is satisfied, the number of servers is reduced. Suppose the current service demand is 8,000 read requests and the same five servers A, B, C, D, and E are in use, with thresholds of 5,000, 2,000, 1,000, 1,000, and 1,000 read requests. Servers A, B, and C alone satisfy the 8,000 read requests (5,000 + 2,000 + 1,000), so the two servers D and E should be removed to save server resources.
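The two threshold examples above can be reproduced with a short calculation. The following is a sketch only: the function name `plan_scaling`, the returned dictionary layout, and the greedy "keep the largest-capacity servers first" choice are assumptions made for illustration; the source does not say how the servers to remove are selected.

```python
def plan_scaling(demand: int, thresholds: dict[str, int]) -> dict:
    """Compare read-request demand with per-server preset thresholds.

    thresholds maps a server name to the maximum number of read requests its
    preset indicator threshold allows.
    """
    total = sum(thresholds.values())
    if demand > total:
        # Step S21: capacity is insufficient, report how much is missing.
        return {"action": "expand", "shortfall": demand - total, "remove": []}

    # Step S22: keep servers (largest capacity first, an assumed policy)
    # until the demand is covered; whatever is left over is redundant.
    kept, covered = [], 0
    for name, cap in sorted(thresholds.items(), key=lambda kv: -kv[1]):
        if covered >= demand:
            break
        kept.append(name)
        covered += cap
    remove = sorted(set(thresholds) - set(kept))
    return {"action": "shrink" if remove else "hold", "shortfall": 0, "remove": remove}


# Thresholds from the examples in the text.
servers = {"A": 5000, "B": 2000, "C": 1000, "D": 1000, "E": 1000}
print(plan_scaling(11000, servers))  # expansion case: 1,000 requests uncovered
print(plan_scaling(8000, servers))   # shrinking case: D and E are redundant
```

Note that this sketch fills every kept server exactly to its threshold; a real deployment would probably leave some headroom.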
  • the computer cluster includes a front end machine and/or a back end machine in a distributed lock service.
  • In a distributed lock service, a stateless intermediate layer, the front-end machine (proxy), is added between the client and the back-end machine (Quorum).
  • The front-end machine (proxy) in this intermediate layer is stateless: each of its servers has no storage medium and does not need to store data. It usually forwards requests from the client to the back-end machine (Quorum), which reduces the data processing pressure on the back-end machine.
  • The back-end machine is a group of machines in a distributed consistency system. It is stateful: each of its servers has a storage medium for storing data, and the data stored on each server in the back-end machine is always consistent. The back-end machine receives the client requests forwarded by the front-end machine and processes them.
  • The solution of the present application can be applied to the front-end machine and/or the back-end machine, so that the front-end machine and/or the back-end machine is expanded and shrunk automatically and efficiently.
  • For the back-end machine, before and after step S2 increases or decreases the number of servers that respond to all current service requirements, that number is always an odd number, and the number remaining after a reduction is greater than half of the original number before the reduction.
  • When the computer cluster is the front-end machine, the number of servers that respond to all current service requirements is increased or decreased according to the real-time performance parameters of all servers in the computer cluster, so that the front-end machine is expanded and shrunk automatically and efficiently.
  • When the computer cluster is the back-end machine, two further conditions apply in addition to the judgment based on real-time performance parameters. First, the number of servers in the back-end machine that respond to all current service requirements must always be odd, that is, each increase or decrease must be by an even number. Second, after a reduction, the current number of servers must be greater than half of the original number before the reduction. Together these conditions achieve non-perceived, automatic, and efficient expansion and shrinking of the back-end machine.
  • For example, if the original number of servers in the back-end machine that respond to all current service requirements is 5, each increase must be by 2, 4, 6, and so on. When the number needs to be reduced, only 2 servers can be removed, so that the remaining number is 3, which is greater than half of the original number of 5.
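The two back-end constraints (odd size before and after, and more than half of the original size remaining after a reduction) can be checked mechanically. The sketch below uses hypothetical helper names, `valid_backend_resize` and `allowed_shrink_targets`, which do not come from the source.

```python
def valid_backend_resize(original: int, new: int) -> bool:
    """Return True if changing the back-end size from original to new is allowed."""
    if original < 1 or new < 1:
        return False
    # The size must be odd both before and after the change,
    # which implies every change is by an even number of servers.
    if original % 2 == 0 or new % 2 == 0:
        return False
    # When shrinking, the remaining servers must be more than half of the
    # original count (integer form of new > original / 2).
    if new < original:
        return 2 * new > original
    return True


def allowed_shrink_targets(original: int) -> list[int]:
    """All sizes the back-end may shrink to in one step from the original size."""
    return [n for n in range(1, original) if valid_backend_resize(original, n)]


print(allowed_shrink_targets(5))  # the text's example: from 5, only 3 is allowed
```

Requiring a strict majority of the original members to remain is consistent with how quorum-based systems generally stay available during membership changes, although the source itself only states the rule and its effect.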
  • In step S1, obtaining the real-time performance parameters of each server in the computer cluster includes:
  • Embedding monitoring code, that is, a background monitoring process, in the to-be-monitored indicator items of the user process of each server in the computer cluster, and collecting the real-time performance parameters of those indicator items through the monitoring code. This allows performance parameters to be acquired in real time without writing an additional monitoring program that is independent of the user process, reducing the programmer's workload.
  • A performance counter may be used in the monitoring code to record continuous time-series data, such as the number of connections of each server and the number of read and write requests of each server.
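One way to picture monitoring code embedded in the user process is a counter that the request-handling code increments, plus a background sampler that records its value over time. This is a minimal sketch under assumptions: the class names `PerformanceCounter` and `BackgroundMonitor` are invented here, and a daemon thread inside the user process stands in for what the source calls a background monitoring process.

```python
import threading
import time
from collections import deque


class PerformanceCounter:
    """Thread-safe counter that keeps a bounded time series of samples."""

    def __init__(self, name: str, maxlen: int = 1000):
        self.name = name
        self._value = 0
        self._lock = threading.Lock()
        self.samples = deque(maxlen=maxlen)  # (timestamp, value) pairs

    def increment(self, n: int = 1) -> None:
        # Called from the user process, e.g. once per read request.
        with self._lock:
            self._value += n

    def sample(self) -> int:
        # Record the current value with a timestamp and return it.
        with self._lock:
            value = self._value
        self.samples.append((time.time(), value))
        return value


class BackgroundMonitor:
    """Samples a set of counters at a fixed interval on a daemon thread."""

    def __init__(self, counters, interval: float = 1.0):
        self._counters = list(counters)
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            for counter in self._counters:
                counter.sample()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

The user process would call `increment` at each monitored event, while the collected `samples` could be shipped to whatever component makes the scaling decision.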
  • After the number of servers in the computer cluster that respond to all current service requirements is increased, the method further includes:
  • Starting the background monitoring process on the added server, so that the real-time performance parameters of the newly added server are monitored and the computer cluster can continue to be expanded and shrunk in real time. Specifically, before the background monitoring process is started on the added server, it may be checked whether the software package that starts the background monitoring process exists on that server. If the package is missing, it is first pushed to the server and the background monitoring process is then started; if the package is present, the background monitoring process is started directly.
  • After the number of servers in the computer cluster that respond to all current service requirements is reduced, the method further includes:
  • Closing the background monitoring process on the removed server, so that the real-time performance parameters of a server that no longer responds to current service requirements are no longer monitored.
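The start and stop logic described above, including the package check, might look like the following. Everything here is hypothetical scaffolding: the `Server` record, its flags, and the `push_package` callback are placeholders for whatever deployment tooling is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    name: str
    has_monitor_package: bool = False  # is the monitoring software package installed?
    monitoring: bool = False           # is the background monitoring process running?


def start_monitoring(server: Server, push_package: Callable[[Server], None]) -> None:
    """Start monitoring on a newly added server, pushing the package first if needed."""
    if not server.has_monitor_package:
        push_package(server)               # hypothetical deployment step
        server.has_monitor_package = True
    server.monitoring = True


def stop_monitoring(server: Server) -> None:
    """Close the background monitoring process on a removed server."""
    server.monitoring = False
```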
  • A device for expanding and shrinking a computer cluster is further provided, where the device 100 includes:
  • The parameter obtaining device 1, configured to obtain real-time performance parameters of each server in the computer cluster. The parameters can be chosen according to which aspects of server running performance actually need to be monitored, and may include at least one of the following: the number of connections of each server, the number of read and write requests of each server, the CPU utilization of each server, and the disk utilization of each server.
  • The capacity expansion and reduction device 2, configured to increase or decrease the number of servers in the computer cluster that respond to all current service requirements, according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  • The solution of the capacity expansion and reduction device 2 can be implemented by an intelligent unified operation and maintenance deployment platform.
  • By acquiring the real-time performance parameters of each server in the computer cluster, this embodiment can obtain the performance change of each server in real time, learn the running status of the computer cluster from those changes, and automatically increase or decrease the number of servers that respond to all current service requirements accordingly, so that the computer cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly applicable to a computer cluster with a large amount of access.
  • The capacity expansion and reduction device 2 is configured to increase the number of servers in the computer cluster that respond to all current service requirements when that number is determined to be insufficient according to the real-time performance parameters of all servers in the computer cluster. Here, if one or any combination of the real-time performance parameters (such as the number of connections, the number of read and write requests, the CPU utilization, and the disk utilization of each server) shows that the servers responding to service requirements cannot satisfy all of them, the number of servers is increased. For example, if the current service demand is 11,000 read requests and the current servers can satisfy only 10,000 read requests, servers must be added to satisfy the remaining 1,000 read requests.
  • The capacity expansion and reduction device 2 is also configured to reduce the number of servers in the computer cluster that respond to all current service requirements when that number is determined to be redundant. Here, if one or any combination of the same real-time performance parameters shows that servers are left over after all current service requirements are satisfied, the number of servers is reduced. For example, if the current service demand is 8,000 read requests, there are five servers, and three of them are enough to satisfy the 8,000 read requests, two servers should be removed to save server resources.
  • In this embodiment, the real-time performance parameters of all servers in the computer cluster are used to monitor the load of each server in real time and to judge whether the number of servers is insufficient or redundant, so that the computer cluster is expanded and shrunk automatically and efficiently.
  • The capacity expansion and reduction device 2 is configured to determine, according to a preset indicator threshold and the real-time performance parameter corresponding to each server in the computer cluster, whether the number of servers responding to all current service requirements is insufficient or redundant, so that the judgment, and the subsequent expansion or shrinking of the cluster servers based on it, is more efficient and accurate.
  • Expansion example: if one or any combination of the real-time performance parameters (such as the number of connections, the number of read and write requests, the CPU utilization, and the disk utilization of each server) shows that the servers cannot satisfy all service requirements, the number of servers is increased. Suppose the current service demand is 11,000 read requests and there are five servers, A, B, C, D, and E, whose preset indicator thresholds allow at most 5,000, 2,000, 1,000, 1,000, and 1,000 read requests respectively. The five servers together can satisfy only 10,000 read requests, so servers must be added to satisfy the remaining 1,000 read requests.
  • Shrinking example: if the same parameters show that servers are left over after the service demand is satisfied, the number of servers is reduced. Suppose the current service demand is 8,000 read requests and the same five servers A, B, C, D, and E are in use, with thresholds of 5,000, 2,000, 1,000, 1,000, and 1,000 read requests. Servers A, B, and C alone satisfy the 8,000 read requests, so the two servers D and E should be removed to save server resources.
  • the computer cluster includes a front end machine and/or a back end machine in a distributed lock service.
  • In a distributed lock service, a stateless intermediate layer, the front-end machine (proxy), is added between the client and the back-end machine (Quorum).
  • The front-end machine (proxy) in this intermediate layer is stateless: each of its servers has no storage medium and does not need to store data. It usually forwards requests from the client to the back-end machine (Quorum), which reduces the data processing pressure on the back-end machine.
  • The back-end machine is a group of machines in a distributed consistency system. It is stateful: each of its servers has a storage medium for storing data, and the data stored on each server in the back-end machine is always consistent. The back-end machine receives the client requests forwarded by the front-end machine and processes them.
  • The solution of the present application can be applied to the front-end machine and/or the back-end machine, so that the front-end machine and/or the back-end machine is expanded and shrunk automatically and efficiently.
  • the expansion and shrinking apparatus 2 is configured so that, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease.
  • when the computer cluster is the front-end machines, whether to increase or decrease the number of servers in the computer cluster that respond to all current service requirements is determined according to the real-time performance parameters of all servers in the computer cluster, thereby expanding and shrinking the front-end machines imperceptibly, automatically and efficiently.
  • the number of servers in the back-end machines that respond to all current service requirements is always odd, that is, each increase or decrease is by an even number of servers; in addition, after the step of decreasing that number, the current number of servers in the back-end machines that respond to all current service requirements must be greater than half of the original number before the decrease, thereby expanding and shrinking the back-end machines imperceptibly, automatically and efficiently.
  • for example, if the original number of servers in the back-end machines that respond to all current service requirements is 5, each increase must be by an even number such as 2, 4 or 6; each decrease must also be by an even number, and the number remaining after the decrease must be greater than half of the original number. With an original number of 5, only 2 servers can be removed, which leaves 3 servers, more than half of the original 5.
  • the parameter obtaining apparatus 1 is configured to embed a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster.
  • the real-time performance parameters of the indicator items to be monitored are collected by the background monitoring process, so that the parameters are collected in real time and no additional monitoring program independent of the user process has to be written, which reduces the programmer's workload.
  • a performance counter may be used in the monitoring code to record continuous time-series data, such as the number of connections of each server and the number of read and write requests of each server.
  • the device 100 further includes an activation apparatus 3, configured to start the background monitoring process on an added server after the number of servers in the computer cluster that respond to all current service requirements has been increased, so that the real-time performance parameters of the newly added server are monitored, which facilitates subsequent real-time expansion and shrinking of the computer cluster.
  • specifically, before the background monitoring process is started on the added server, it may first be checked whether the added server has the software package for starting the background monitoring process; if there is no software package, the package is first pushed to the added server and the background monitoring process is then started on that server; if the software package is present, the background monitoring process can be started on the server directly.
  • the device further includes a shutdown apparatus 4, configured to stop the background monitoring process on a removed server after the number of servers in the computer cluster that respond to all current service requirements has been decreased, thereby ending the monitoring of the real-time performance parameters of the removed server.
  • monitoring code, that is, a background monitoring process 12, is embedded in the indicator items to be monitored of the user process 11 of each server 10 in the computer cluster, and the real-time performance parameters of those indicator items are collected by the monitoring code.
  • the collected real-time performance parameters are then sent by the collection apparatus (Logtail) 13 of each server to a unified deployment and operations system 14, which increases or decreases the number of servers 10 in the computer cluster 15 that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  • the unified deployment and operations system 14 may also start the background monitoring process on an added server, stop the background monitoring process on a removed server, and obtain further cluster information, such as information on idle servers in the computer cluster.
  • the performance changes of each server in the computer cluster can thus be obtained in real time, the operating condition of the computer cluster can be derived from those changes, and the number of servers in the computer cluster that respond to all current service requirements can be increased or decreased automatically according to that operating condition, so that the computer cluster is expanded and shrunk automatically and efficiently.
  • the present application obtains real-time performance parameters of each server in the computer cluster, and increases or decreases the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  • the performance changes of each server in the computer cluster can be obtained in real time, the operating condition of the computer cluster can be derived from those changes, and the number of servers in the computer cluster that respond to all current service requirements can be increased or decreased automatically according to that operating condition.
  • the computer cluster is thereby expanded and shrunk automatically and efficiently, and the embodiment is particularly suitable for computer clusters with very heavy access traffic.
  • when the present application determines from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, it increases that number; when it determines that there are surplus servers responding to all current service requirements, it decreases that number.
  • the real-time performance parameters of all servers in the computer cluster are used to monitor the load of each server in real time, so that the computer cluster is expanded or shrunk automatically and efficiently when servers are insufficient or in surplus.
  • the present application determines whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster, so that the determination, and the expansion and shrinking of the cluster servers subsequently performed on the basis of it, are more efficient and accurate.
  • before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease, so that the back-end machines are expanded and shrunk imperceptibly, automatically and efficiently.
  • the present application embeds monitoring code, that is, a background monitoring process, in the indicator items to be monitored of the user process of each server in the computer cluster, and collects the real-time performance parameters of those indicator items through the monitoring code. The parameters are thus collected in real time without writing an additional monitoring program independent of the user process, which reduces the programmer's workload.
  • the present application can be implemented in software and/or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device.
  • the software program of the present application can be executed by a processor to implement the steps or functions described above.
  • the software program (including related data structures) of the present application can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or a similar device.
  • some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform the steps or functions.
  • a portion of the present application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or technical solution in accordance with the present application through the operation of that computer.
  • the program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions.
  • an embodiment in accordance with the present application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to run the methods and/or technical solutions in accordance with the various embodiments of the present application described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

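The present application provides a method and a device for expanding and shrinking a computer cluster. Real-time performance parameters of each server in the computer cluster are obtained, and the number of servers in the cluster that respond to all current service requirements is increased or decreased according to all current service requirements of the cluster and the real-time performance parameters of all of its servers. The performance changes of each server can thus be obtained in real time, the operating condition of the cluster can be derived from those changes, and the number of servers that respond to all current service requirements can be increased or decreased automatically according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly suitable for computer clusters with very heavy access traffic.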

Description

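Method and device for expanding and shrinking a computer cluster
This application claims priority to Chinese patent application No. 201510504622.2, filed on August 17, 2015 and entitled "Method and device for expanding and shrinking a computer cluster", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computers, and in particular to a method and a device for expanding and shrinking a computer cluster.
Background
In a distributed computer cluster service, as the access pressure on the cluster grows, the cluster needs to be expanded, that is, the number of servers in the cluster that provide service needs to be increased; when the access pressure falls, the cluster needs to be shrunk, that is, the number of servers that provide service needs to be reduced. At present, expansion and shrinking of a distributed computer cluster is generally carried out manually, which is not only cumbersome but also makes real-time, rapid expansion and shrinking difficult to achieve.
Summary of the Invention
An object of the present application is to provide a method and a device for expanding and shrinking a computer cluster that solve the problems that the existing expansion and shrinking of distributed computer clusters is not performed in real time, is cumbersome to operate and is inefficient.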
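According to one aspect of the present application, a method for expanding and shrinking a computer cluster is provided, the method comprising:
obtaining real-time performance parameters of each server in the computer cluster;
increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
Further, in the above method, increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster comprises:
when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, increasing the number of servers in the computer cluster that respond to all current service requirements;
when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements, decreasing the number of servers in the computer cluster that respond to all current service requirements.
Further, in the above method, in determining from the real-time performance parameters of all servers in the computer cluster whether the number of servers responding to all current service requirements is insufficient or in surplus, the determination is made according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster.
Further, in the above method, the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service.
Further, in the above method, when the computer cluster is the back-end machines, in increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements,
before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease.
Further, in the above method, the real-time performance parameters of each server comprise at least one of the following:
the number of connections of each server;
the number of read and write requests of each server;
the CPU utilization of each server;
the disk utilization of each server.
Further, in the above method, obtaining real-time performance parameters of each server in the computer cluster comprises:
embedding a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster, and collecting the real-time performance parameters of the indicator items to be monitored through the background monitoring process.
Further, in the above method, after increasing the number of servers in the computer cluster that respond to all current service requirements, the method further comprises:
starting the background monitoring process on the added server;
after decreasing the number of servers in the computer cluster that respond to all current service requirements, the method further comprises:
stopping the background monitoring process on the removed server.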
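According to another aspect of the present application, a device for expanding and shrinking a computer cluster is also provided, the device comprising:
a parameter obtaining apparatus, configured to obtain real-time performance parameters of each server in the computer cluster;
an expansion and shrinking apparatus, configured to increase or decrease the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
Further, in the above device, the expansion and shrinking apparatus is configured to: when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, increase the number of servers in the computer cluster that respond to all current service requirements;
and when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements, decrease the number of servers in the computer cluster that respond to all current service requirements.
Further, in the above device, the expansion and shrinking apparatus is configured to determine whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster.
Further, in the above device, the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service.
Further, in the above device, the expansion and shrinking apparatus is configured so that, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease.
Further, in the above device, the real-time performance parameters of each server obtained by the parameter obtaining apparatus comprise at least one of the following:
the number of connections of each server;
the number of read and write requests of each server;
the CPU utilization of each server;
the disk utilization of each server.
Further, in the above device, the parameter obtaining apparatus is configured to embed a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster, and to collect the real-time performance parameters of the indicator items to be monitored through the background monitoring process.
Further, in the above device, the device further comprises an activation apparatus, configured to start the background monitoring process on the added server after the number of servers in the computer cluster that respond to all current service requirements has been increased;
the device further comprises a shutdown apparatus, configured to stop the background monitoring process on the removed server after the number of servers in the computer cluster that respond to all current service requirements has been decreased.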
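Compared with the prior art, the present application obtains real-time performance parameters of each server in the computer cluster, and increases or decreases the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster. The performance changes of each server can thus be obtained in real time, the operating condition of the cluster can be derived from those changes, and the number of servers that respond to all current service requirements can be increased or decreased automatically according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly suitable for computer clusters with very heavy access traffic.
Further, when the present application determines from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, it increases the number of servers in the computer cluster that respond to all current service requirements; when it determines that there are surplus servers, it decreases that number. The real-time performance parameters of all servers in the computer cluster are thus used to monitor the load of each server in real time, and the cluster is expanded or shrunk automatically and efficiently when servers are insufficient or in surplus.
Further, the present application determines whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster, so that the determination, and the expansion and shrinking of the cluster servers subsequently performed on the basis of it, are more efficient and accurate.
Further, when the computer cluster is the back-end machines, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease, so that the back-end machines are expanded and shrunk imperceptibly, automatically and efficiently.
Further, the present application embeds monitoring code, that is, a background monitoring process, in the indicator items to be monitored of the user process of each server in the computer cluster, and collects the real-time performance parameters of those indicator items through the monitoring code. The parameters are thus collected in real time without writing an additional monitoring program independent of the user process, which reduces the programmer's workload.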
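Brief Description of the Drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a flowchart of a method for expanding and shrinking a computer cluster according to one aspect of the present application;
FIG. 2 is a structural diagram of a device for expanding and shrinking a computer cluster according to another aspect of the present application;
FIG. 3 is a structural diagram of a device for expanding and shrinking a computer cluster according to a preferred embodiment of the present application;
FIG. 4 is a schematic diagram of a specific application embodiment of the present application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the device of the service network and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.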
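As shown in FIG. 1, the present application provides a method for expanding and shrinking a computer cluster, the method comprising:
Step S1: obtain real-time performance parameters of each server in the computer cluster. Here, the real-time performance parameters may be any parameters of server operating performance selected according to actual monitoring needs, and may include at least one of the following: the number of connections of each server, the number of read and write requests of each server, the CPU utilization of each server and the disk utilization of each server;
Step S2: increase or decrease the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster. Here, step S2 may be implemented by an intelligent, unified operations and deployment platform. By obtaining real-time performance parameters of each server in the computer cluster, this embodiment can obtain the performance changes of each server in real time, derive the operating condition of the cluster from those changes, and automatically increase or decrease the number of servers that respond to all current service requirements according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly suitable for computer clusters with very heavy access traffic.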
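Steps S1 and S2 can be sketched as a snapshot of the four listed parameters followed by a decision. The sketch below is only an illustration: the `ServerMetrics` type, the `decide_scaling` function and the `high`/`low` watermarks are hypothetical names and values that do not come from the application, and the averaging policy is an assumed stand-in for whatever rule the operations platform actually applies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ServerMetrics:
    """Step S1 snapshot of one server (field names are illustrative)."""
    connections: int
    read_write_requests: int
    cpu_utilization: float   # fraction in [0.0, 1.0]
    disk_utilization: float  # fraction in [0.0, 1.0]


def decide_scaling(metrics, high=0.8, low=0.3):
    """Step S2 under an assumed policy: compare the mean load with watermarks.

    `metrics` maps a server name to its ServerMetrics. The per-server load is
    taken as the larger of CPU and disk utilization, which is an assumption.
    """
    load = mean(max(m.cpu_utilization, m.disk_utilization) for m in metrics.values())
    if load > high:
        return "scale_out"
    if load < low and len(metrics) > 1:
        return "scale_in"
    return "hold"


busy = {"a": ServerMetrics(900, 4000, 0.90, 0.20),
        "b": ServerMetrics(950, 4200, 0.95, 0.30)}
print(decide_scaling(busy))
```

Any real deployment would choose its own combination of the four parameters; the application leaves that choice open ("one or any combination").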
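In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, step S2 of increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster comprises:
Step S21: when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, increase the number of servers in the computer cluster that respond to all current service requirements. Here, the determination may be based on one or any combination of the real-time performance parameters of all servers in the computer cluster, such as the number of connections, the number of read and write requests, the CPU utilization and the disk utilization of each server; when the servers currently responding to service requirements cannot satisfy all current service requirements, the number of servers in the computer cluster that respond to all current service requirements is increased. For example, if the current service demand is 11000 read requests and the current servers can satisfy only 10000 read requests, a corresponding number of servers needs to be added to satisfy the remaining 1000 read requests;
Step S22: when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements, decrease the number of servers in the computer cluster that respond to all current service requirements. Here, the determination may be based on one or any combination of the real-time performance parameters of all servers in the computer cluster, such as the number of connections, the number of read and write requests, the CPU utilization and the disk utilization of each server; when the servers currently responding to service requirements can satisfy all current service requirements with capacity to spare, the number of servers in the computer cluster that respond to all current service requirements is decreased. For example, if the current service demand is 8000 read requests, there are currently 5 servers and 3 of them are enough to satisfy the 8000 read requests, 2 servers need to be removed to save server resources. This embodiment uses the real-time performance parameters of all servers in the computer cluster to monitor the load of each server in real time, so that the cluster is expanded or shrunk automatically and efficiently when servers are insufficient or in surplus.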
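In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, in step S21 or step S22, the determination of whether the number of servers responding to all current service requirements is insufficient or in surplus is made according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster, so that the determination, and the expansion and shrinking of the cluster servers subsequently performed on the basis of it, are more efficient and accurate. Here, when one or any combination of the real-time performance parameters of all servers in the computer cluster (the number of connections, the number of read and write requests, the CPU utilization and the disk utilization of each server) shows that the servers currently responding to service requirements cannot satisfy all current service requirements, the number of servers in the computer cluster that respond to all current service requirements is increased. For example, the current service demand is 11000 read requests and there are currently 5 servers, namely servers A, B, C, D and E, where the preset metric threshold of server A satisfies at most 5000 read requests, that of server B at most 2000 read requests, and those of servers C, D and E at most 1000 read requests each. The 5 servers together can satisfy at most 10000 read requests, so a corresponding number of servers needs to be added to satisfy the remaining 1000 read requests. Conversely, when the same parameters show that the servers currently responding to service requirements satisfy all current service requirements with capacity to spare, the number of servers in the computer cluster that respond to all current service requirements is decreased. For example, the current service demand is 8000 read requests and there are currently 5 servers A, B, C, D and E with the same preset metric thresholds as above (5000, 2000, 1000, 1000 and 1000 read requests respectively). Three of them, servers A, B and C, are enough to satisfy the 8000 read requests, so the 2 servers D and E need to be removed to save server resources.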
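The arithmetic of the two examples above can be reproduced with a short sketch. The function name `plan_capacity` and the greedy "keep the largest servers first" selection are assumptions made for illustration; the application only states the outcome (add capacity for 1000 requests, or remove servers D and E) and does not prescribe how the surplus servers are chosen.

```python
def plan_capacity(demand, thresholds):
    """Compare demand with the sum of per-server preset thresholds.

    `thresholds` maps a server name to the maximum number of read requests
    it can satisfy. Returns ("scale_out", shortfall), ("scale_in", names of
    removable servers) or ("hold", None).
    """
    total = sum(thresholds.values())
    if demand > total:
        return ("scale_out", demand - total)

    # Assumed selection rule: keep the largest servers until demand is covered.
    kept, covered = [], 0
    for name, capacity in sorted(thresholds.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= demand:
            break
        kept.append(name)
        covered += capacity

    removable = sorted(set(thresholds) - set(kept))
    if removable:
        return ("scale_in", removable)
    return ("hold", None)


thresholds = {"A": 5000, "B": 2000, "C": 1000, "D": 1000, "E": 1000}
print(plan_capacity(11000, thresholds))  # shortfall of 1000 read requests
print(plan_capacity(8000, thresholds))   # servers D and E are surplus
```

With the thresholds from the example, 5000 + 2000 + 1000 + 1000 + 1000 = 10000, so a demand of 11000 leaves a shortfall of 1000, while a demand of 8000 is covered by A, B and C alone.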
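In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service. Here, in a distributed lock service, in order to reduce the pressure on the back-end machines (Quorum) and to allow horizontal scaling, a stateless intermediate layer of front-end machines (proxy) is added between the client and the back-end machines (Quorum). The front-end machines (proxy) in the intermediate service are stateless, that is, the servers in the front-end machines have no storage medium and do not need to store data; they usually forward requests from the client to the back-end machines (Quorum), thereby reducing the data processing pressure on the back-end machines. The back-end machines are a machine group in a distributed consistency system and are stateful, that is, each server in the back-end machines has a storage medium for storing data, the data stored on those storage media is always kept consistent, and the back-end machines receive the client requests forwarded by the front-end machines and process them. Applying the solution of the present application to the front-end machines and/or the back-end machines allows them to be expanded and shrunk automatically and efficiently.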
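In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, when the computer cluster is the back-end machines, in step S2 of increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements,
before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease. Specifically, when the computer cluster is the front-end machines, whether to increase or decrease the number of servers in the computer cluster that respond to all current service requirements is determined according to the real-time performance parameters of all servers in the computer cluster, so that the front-end machines are expanded and shrunk imperceptibly, automatically and efficiently. When the computer cluster is the back-end machines, each server has a storage medium (is stateful), so in addition to the real-time performance parameters of all servers in the computer cluster, the redundancy of the back-end machines (Quorum) must also be taken into account. This embodiment therefore requires that, before and after the step of increasing or decreasing the number of servers in the back-end machines that respond to all current service requirements, that number is always odd, that is, each increase or decrease is by an even number of servers. It further requires that, after the step of decreasing that number, the current number of servers in the back-end machines that respond to all current service requirements is greater than half of the original number before the decrease, so that the back-end machines are expanded and shrunk imperceptibly, automatically and efficiently. For example, if the original number of servers in the back-end machines that respond to all current service requirements is 5 before the increase or decrease, each increase must be by an even number such as 2, 4 or 6; each decrease must likewise be by an even number such as 2, 4 or 6, and after the decrease the current number must be greater than half of the original number. With an original number of 5, only 2 servers can be removed, which leaves 3 servers, more than half of the original 5.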
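The two back-end constraints (the count stays odd, and a reduced count stays above half of the original) can be checked with a small sketch. The function names `is_valid_backend_resize` and `allowed_sizes_after_removal` are hypothetical, and treating an unchanged size as valid is an assumption of this sketch.

```python
def is_valid_backend_resize(original, new):
    """Check the two constraints stated for the back-end machines (Quorum).

    1. The number of servers is odd before and after the change.
    2. After a decrease, the new number is greater than half of the original.
    """
    if original < 1 or new < 1:
        return False
    if original % 2 == 0 or new % 2 == 0:
        return False
    if new < original:
        return new > original / 2
    return True


def allowed_sizes_after_removal(original):
    """List every smaller cluster size that a single decrease may reach."""
    return [n for n in range(original - 2, 0, -2)
            if is_valid_backend_resize(original, n)]


print(allowed_sizes_after_removal(5))  # only a decrease from 5 to 3 is allowed
print(is_valid_backend_resize(5, 9))   # adding 4 servers keeps the count odd
```

For an original size of 5 the only permitted decrease is to 3 servers, which matches the worked example: 3 is greater than 5 / 2 = 2.5, while 1 is not.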
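In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, step S1 of obtaining real-time performance parameters of each server in the computer cluster comprises:
embedding monitoring code, that is, a background monitoring process, in the indicator items to be monitored of the user process of each server in the computer cluster, and collecting the real-time performance parameters of those indicator items through the monitoring code. The parameters are thus collected in real time without writing an additional monitoring program independent of the user process, which reduces the programmer's workload. Here, a performance counter may be used in the monitoring code to record continuous time-series data, such as the number of connections of each server and the number of read and write requests of each server.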
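A performance counter of the kind mentioned above could look like the following sketch. The class name `PerformanceCounter`, its methods and the injectable `clock` parameter are all hypothetical; the application does not describe the counter's interface, only that it records continuous time-series data inside the user process.

```python
import threading
import time


class PerformanceCounter:
    """A counter embedded in the user process that records (time, value) samples."""

    def __init__(self, name, clock=time.time):
        self.name = name
        self._clock = clock          # injectable so the sketch can be tested
        self._value = 0
        self._samples = []
        self._lock = threading.Lock()

    def increment(self, amount=1):
        """Called by the user process at the monitored point, e.g. per read request."""
        with self._lock:
            self._value += amount

    def sample(self):
        """Called periodically by the background monitor to extend the time series."""
        with self._lock:
            point = (self._clock(), self._value)
            self._samples.append(point)
            return point

    def series(self):
        """Return a copy of the recorded time series."""
        with self._lock:
            return list(self._samples)


reads = PerformanceCounter("read_requests")
reads.increment()
reads.increment(2)
timestamp, value = reads.sample()
print(reads.name, value)
```

Because the counter lives inside the user process, the process only needs to call `increment` at the monitored points; no separate monitoring program has to observe it from outside.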
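In a preferred embodiment of the method for expanding and shrinking a computer cluster of the present application, after increasing the number of servers in the computer cluster that respond to all current service requirements in step S2, the method further comprises:
starting the background monitoring process on the added server, so that the real-time performance parameters of the newly added server that responds to current service requirements are monitored, which facilitates subsequent real-time expansion and shrinking of the computer cluster. Specifically, before the background monitoring process is started on the added server, it may first be checked whether the added server has the software package for starting the background monitoring process; if there is no software package, the package is first pushed to the added server and the background monitoring process is then started on that server; if the software package is present, the background monitoring process is simply started on the server directly;
Correspondingly, after decreasing the number of servers in the computer cluster that respond to all current service requirements in step S2, the method further comprises:
stopping the background monitoring process on the removed server, thereby ending the monitoring of the real-time performance parameters of the removed server that responded to current service requirements.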
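The check, push and start sequence for an added server, and the matching stop for a removed server, can be sketched with an in-memory model. The `Server` class, the `push_package` callable and the function names are hypothetical; a real system would replace them with its own deployment tooling, which the application does not name.

```python
class Server:
    """In-memory stand-in for a cluster server (illustrative only)."""

    def __init__(self, name, has_package=False):
        self.name = name
        self.has_package = has_package
        self.monitor_running = False


def default_push_package(server):
    """Assumed deployment step: mark the monitoring package as installed."""
    server.has_package = True


def start_monitoring(server, push_package=default_push_package):
    """Start the background monitor on an added server.

    Returns True if the package had to be pushed first, False otherwise.
    """
    pushed = False
    if not server.has_package:
        push_package(server)
        pushed = True
    server.monitor_running = True
    return pushed


def stop_monitoring(server):
    """Stop the background monitor on a removed server."""
    server.monitor_running = False


added = Server("new-1")
print(start_monitoring(added), added.monitor_running)
```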
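As shown in FIG. 2, according to another aspect of the present application, a device for expanding and shrinking a computer cluster is also provided, wherein the device 100 comprises:
a parameter obtaining apparatus 1, configured to obtain real-time performance parameters of each server in the computer cluster. Here, the real-time performance parameters may be any parameters of server operating performance selected according to actual monitoring needs, and may include at least one of the following: the number of connections of each server, the number of read and write requests of each server, the CPU utilization of each server and the disk utilization of each server;
an expansion and shrinking apparatus 2, configured to increase or decrease the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster. Here, the expansion and shrinking apparatus 2 may be implemented by an intelligent, unified operations and deployment platform. By obtaining real-time performance parameters of each server in the computer cluster, this embodiment can obtain the performance changes of each server in real time, derive the operating condition of the cluster from those changes, and automatically increase or decrease the number of servers that respond to all current service requirements according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly suitable for computer clusters with very heavy access traffic.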
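In a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the expansion and shrinking apparatus 2 is configured to increase the number of servers in the computer cluster that respond to all current service requirements when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient. Here, the determination may be based on one or any combination of the real-time performance parameters of all servers in the computer cluster, such as the number of connections, the number of read and write requests, the CPU utilization and the disk utilization of each server; when the servers currently responding to service requirements cannot satisfy all current service requirements, the number of servers in the computer cluster that respond to all current service requirements is increased. For example, if the current service demand is 11000 read requests and the current servers can satisfy only 10000 read requests, a corresponding number of servers needs to be added to satisfy the remaining 1000 read requests;
and to decrease the number of servers in the computer cluster that respond to all current service requirements when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements. Here, the determination may be based on one or any combination of the same real-time performance parameters; when the servers currently responding to service requirements can satisfy all current service requirements with capacity to spare, the number of servers in the computer cluster that respond to all current service requirements is decreased. For example, if the current service demand is 8000 read requests, there are currently 5 servers and 3 of them are enough to satisfy the 8000 read requests, 2 servers need to be removed to save server resources. This embodiment uses the real-time performance parameters of all servers in the computer cluster to monitor the load of each server in real time, so that the cluster is expanded or shrunk automatically and efficiently when servers are insufficient or in surplus.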
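In a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the expansion and shrinking apparatus 2 is configured to determine whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster, so that the determination, and the expansion and shrinking of the cluster servers subsequently performed on the basis of it, are more efficient and accurate. Here, when one or any combination of the real-time performance parameters of all servers in the computer cluster (the number of connections, the number of read and write requests, the CPU utilization and the disk utilization of each server) shows that the servers currently responding to service requirements cannot satisfy all current service requirements, the number of servers in the computer cluster that respond to all current service requirements is increased. For example, the current service demand is 11000 read requests and there are currently 5 servers, namely servers A, B, C, D and E, where the preset metric threshold of server A satisfies at most 5000 read requests, that of server B at most 2000 read requests, and those of servers C, D and E at most 1000 read requests each. The 5 servers together can satisfy at most 10000 read requests, so a corresponding number of servers needs to be added to satisfy the remaining 1000 read requests. Conversely, when the same parameters show that the servers currently responding to service requirements satisfy all current service requirements with capacity to spare, the number of servers in the computer cluster that respond to all current service requirements is decreased. For example, the current service demand is 8000 read requests and there are currently 5 servers A, B, C, D and E with the same preset metric thresholds as above (5000, 2000, 1000, 1000 and 1000 read requests respectively). Three of them, servers A, B and C, are enough to satisfy the 8000 read requests, so the 2 servers D and E need to be removed to save server resources.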
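In a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service. Here, in a distributed lock service, in order to reduce the pressure on the back-end machines (Quorum) and to allow horizontal scaling, a stateless intermediate layer of front-end machines (proxy) is added between the client and the back-end machines (Quorum). The front-end machines (proxy) in the intermediate service are stateless, that is, the servers in the front-end machines have no storage medium and do not need to store data; they usually forward requests from the client to the back-end machines (Quorum), thereby reducing the data processing pressure on the back-end machines. The back-end machines are a machine group in a distributed consistency system and are stateful, that is, each server in the back-end machines has a storage medium for storing data, the data stored on those storage media is always kept consistent, and the back-end machines receive the client requests forwarded by the front-end machines and process them. Applying the solution of the present application to the front-end machines and/or the back-end machines allows them to be expanded and shrunk automatically and efficiently.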
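In a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the expansion and shrinking apparatus 2 is configured so that, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease. Specifically, when the computer cluster is the front-end machines, whether to increase or decrease the number of servers in the computer cluster that respond to all current service requirements is determined according to the real-time performance parameters of all servers in the computer cluster, so that the front-end machines are expanded and shrunk imperceptibly, automatically and efficiently. When the computer cluster is the back-end machines, each server has a storage medium (is stateful), so in addition to the real-time performance parameters of all servers in the computer cluster, the redundancy of the back-end machines (Quorum) must also be taken into account. This embodiment therefore requires that, before and after the step of increasing or decreasing the number of servers in the back-end machines that respond to all current service requirements, that number is always odd, that is, each increase or decrease is by an even number of servers. It further requires that, after the step of decreasing that number, the current number of servers in the back-end machines that respond to all current service requirements is greater than half of the original number before the decrease, so that the back-end machines are expanded and shrunk imperceptibly, automatically and efficiently. For example, if the original number of servers in the back-end machines that respond to all current service requirements is 5 before the increase or decrease, each increase must be by an even number such as 2, 4 or 6; each decrease must likewise be by an even number such as 2, 4 or 6, and after the decrease the current number must be greater than half of the original number. With an original number of 5, only 2 servers can be removed, which leaves 3 servers, more than half of the original 5.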
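In a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the parameter obtaining apparatus 1 is configured to embed a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster, and to collect the real-time performance parameters of those indicator items through the background monitoring process. The parameters are thus collected in real time without writing an additional monitoring program independent of the user process, which reduces the programmer's workload. Here, a performance counter may be used in the monitoring code to record continuous time-series data, such as the number of connections of each server and the number of read and write requests of each server.
As shown in FIG. 3, in a preferred embodiment of the device for expanding and shrinking a computer cluster of the present application, the device 100 further comprises an activation apparatus 3, configured to start the background monitoring process on the added server after the number of servers in the computer cluster that respond to all current service requirements has been increased, so that the real-time performance parameters of the newly added server that responds to current service requirements are monitored, which facilitates subsequent real-time expansion and shrinking of the computer cluster. Specifically, before the background monitoring process is started on the added server, it may first be checked whether the added server has the software package for starting the background monitoring process; if there is no software package, the package is first pushed to the added server and the background monitoring process is then started on that server; if the software package is present, the background monitoring process is simply started on the server directly;
the device further comprises a shutdown apparatus 4, configured to stop the background monitoring process on the removed server after the number of servers in the computer cluster that respond to all current service requirements has been decreased, thereby ending the monitoring of the real-time performance parameters of the removed server that responded to current service requirements.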
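As shown in FIG. 4, in a specific application embodiment of the present application, monitoring code, that is, a background monitoring process 12, is embedded in the indicator items to be monitored of the user process 11 of each server 10 in the computer cluster, and the real-time performance parameters of those indicator items are collected by the monitoring code. The collection apparatus (Logtail) 13 of each server then sends the collected real-time performance parameters to a unified deployment and operations system 14, which increases or decreases the number of servers 10 in the computer cluster 15 that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster. The unified deployment and operations system 14 may also start the background monitoring process on an added server, stop the background monitoring process on a removed server, and obtain further cluster information, such as information on idle servers in the computer cluster. The performance changes of each server in the computer cluster can thus be obtained in real time, the operating condition of the cluster can be derived from those changes, and the number of servers that respond to all current service requirements can be increased or decreased automatically according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. In addition, starting the background monitoring process on an added server allows the real-time performance parameters of the newly added server to be monitored, which facilitates subsequent real-time expansion and shrinking of the computer cluster, and stopping the background monitoring process on a removed server ends the monitoring of that server's real-time performance parameters.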
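The data flow of FIG. 4 (monitoring code, per-server collector, unified deployment and operations system) can be modelled in a few lines. The `Collector` and `OpsSystem` classes and their methods are hypothetical stand-ins written for this sketch; they do not reflect the actual Logtail interface, which the application does not describe.

```python
from collections import defaultdict


class OpsSystem:
    """Stand-in for the unified deployment and operations system 14."""

    def __init__(self):
        # Latest reported value per server and per metric name.
        self.latest = defaultdict(dict)

    def receive(self, server, metric, value):
        self.latest[server][metric] = value

    def cluster_total(self, metric):
        """Sum the latest value of one metric over all reporting servers."""
        return sum(values.get(metric, 0) for values in self.latest.values())


class Collector:
    """Stand-in for the per-server collection apparatus 13."""

    def __init__(self, server, sink):
        self.server = server
        self.sink = sink

    def forward(self, samples):
        """Send every collected metric sample to the operations system."""
        for metric, value in samples.items():
            self.sink.receive(self.server, metric, value)


ops = OpsSystem()
Collector("server-1", ops).forward({"read_requests": 4000, "connections": 120})
Collector("server-2", ops).forward({"read_requests": 3000, "connections": 80})
print(ops.cluster_total("read_requests"))
```

The cluster-wide total produced here is the kind of figure that a scaling decision would compare against the current service demand.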
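In summary, the present application obtains real-time performance parameters of each server in the computer cluster, and increases or decreases the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster. The performance changes of each server can thus be obtained in real time, the operating condition of the cluster can be derived from those changes, and the number of servers that respond to all current service requirements can be increased or decreased automatically according to that operating condition, so that the cluster is expanded and shrunk automatically and efficiently. This embodiment is particularly suitable for computer clusters with very heavy access traffic.
Further, when the present application determines from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, it increases the number of servers in the computer cluster that respond to all current service requirements; when it determines that there are surplus servers, it decreases that number. The real-time performance parameters of all servers in the computer cluster are thus used to monitor the load of each server in real time, and the cluster is expanded or shrunk automatically and efficiently when servers are insufficient or in surplus.
Further, the present application determines whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster, so that the determination, and the expansion and shrinking of the cluster servers subsequently performed on the basis of it, are more efficient and accurate.
Further, when the computer cluster is the back-end machines, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease, so that the back-end machines are expanded and shrunk imperceptibly, automatically and efficiently.
Further, the present application embeds monitoring code, that is, a background monitoring process, in the indicator items to be monitored of the user process of each server in the computer cluster, and collects the real-time performance parameters of those indicator items through the monitoring code. The parameters are thus collected in real time without writing an additional monitoring program independent of the user process, which reduces the programmer's workload.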
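Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.
It should be noted that the present application can be implemented in software and/or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present application can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk or a similar device. In addition, some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform the steps or functions.
In addition, a portion of the present application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. The program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, an embodiment according to the present application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to run the methods and/or technical solutions according to the various embodiments of the present application described above.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting; the scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present application. No reference numeral in the claims should be construed as limiting the claim concerned. Moreover, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in an apparatus claim may also be implemented by a single unit or apparatus in software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.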

Claims (16)

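  1. A method for expanding and shrinking a computer cluster, wherein the method comprises:
    obtaining real-time performance parameters of each server in the computer cluster;
    increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  2. The method according to claim 1, wherein increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster comprises:
    when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, increasing the number of servers in the computer cluster that respond to all current service requirements;
    when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements, decreasing the number of servers in the computer cluster that respond to all current service requirements.
  3. The method according to claim 2, wherein, in determining from the real-time performance parameters of all servers in the computer cluster whether the number of servers responding to all current service requirements is insufficient or in surplus, the determination is made according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster.
  4. The method according to any one of claims 1 to 3, wherein the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service.
  5. The method according to claim 4, wherein, when the computer cluster is the back-end machines, in increasing or decreasing the number of servers in the computer cluster that respond to all current service requirements,
    before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease.
  6. The method according to any one of claims 1 to 5, wherein the real-time performance parameters of each server comprise at least one of the following:
    the number of connections of each server;
    the number of read and write requests of each server;
    the CPU utilization of each server;
    the disk utilization of each server.
  7. The method according to any one of claims 1 to 6, wherein obtaining real-time performance parameters of each server in the computer cluster comprises:
    embedding a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster, and collecting the real-time performance parameters of the indicator items to be monitored through the background monitoring process.
  8. The method according to claim 7, wherein, after increasing the number of servers in the computer cluster that respond to all current service requirements, the method further comprises:
    starting the background monitoring process on the added server;
    and after decreasing the number of servers in the computer cluster that respond to all current service requirements, the method further comprises:
    stopping the background monitoring process on the removed server.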
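  9. A device for expanding and shrinking a computer cluster, wherein the device comprises:
    a parameter obtaining apparatus, configured to obtain real-time performance parameters of each server in the computer cluster;
    an expansion and shrinking apparatus, configured to increase or decrease the number of servers in the computer cluster that respond to all current service requirements according to all current service requirements of the computer cluster and the real-time performance parameters of all servers in the computer cluster.
  10. The device according to claim 9, wherein the expansion and shrinking apparatus is configured to: when it is determined from the real-time performance parameters of all servers in the computer cluster that the number of servers responding to all current service requirements is insufficient, increase the number of servers in the computer cluster that respond to all current service requirements;
    and when it is determined from the real-time performance parameters of all servers in the computer cluster that there are surplus servers responding to all current service requirements, decrease the number of servers in the computer cluster that respond to all current service requirements.
  11. The device according to claim 10, wherein the expansion and shrinking apparatus is configured to determine whether the number of servers responding to all current service requirements is insufficient or in surplus according to the preset metric threshold and the real-time performance parameters corresponding to each server in the computer cluster.
  12. The device according to any one of claims 9 to 11, wherein the computer cluster comprises front-end machines and/or back-end machines in a distributed lock service.
  13. The device according to claim 12, wherein the expansion and shrinking apparatus is configured so that, before and after the number of servers in the back-end machines that respond to all current service requirements is increased or decreased, that number is always odd, and the number after a decrease is greater than half of the original number before the decrease.
  14. The device according to any one of claims 9 to 13, wherein the real-time performance parameters of each server obtained by the parameter obtaining apparatus comprise at least one of the following:
    the number of connections of each server;
    the number of read and write requests of each server;
    the CPU utilization of each server;
    the disk utilization of each server.
  15. The device according to any one of claims 9 to 14, wherein the parameter obtaining apparatus is configured to embed a background monitoring process in the indicator items to be monitored of the user process of each server in the computer cluster, and to collect the real-time performance parameters of the indicator items to be monitored through the background monitoring process.
  16. The device according to claim 15, wherein the device further comprises an activation apparatus, configured to start the background monitoring process on the added server after the number of servers in the computer cluster that respond to all current service requirements has been increased;
    the device further comprises a shutdown apparatus, configured to stop the background monitoring process on the removed server after the number of servers in the computer cluster that respond to all current service requirements has been decreased.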
PCT/CN2016/093894 2015-08-17 2016-08-08 计算机集群的扩容和缩容方法及设备 WO2017028697A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510504622.2A CN106470219A (zh) 2015-08-17 2015-08-17 计算机集群的扩容和缩容方法及设备
CN201510504622.2 2015-08-17

Publications (1)

Publication Number Publication Date
WO2017028697A1 true WO2017028697A1 (zh) 2017-02-23

Family

ID=58050730

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/093894 WO2017028697A1 (zh) 2015-08-17 2016-08-08 计算机集群的扩容和缩容方法及设备

Country Status (2)

Country Link
CN (1) CN106470219A (zh)
WO (1) WO2017028697A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111225004A (zh) * 2018-11-23 2020-06-02 中移(杭州)信息技术有限公司 一种服务器集群的扩容方法、装置和可读介质
CN111464616A (zh) * 2020-03-30 2020-07-28 招商局金融科技有限公司 自动调节应用负载服务数量的方法、服务器及存储介质
CN111464355A (zh) * 2020-03-31 2020-07-28 北京金山云网络技术有限公司 Kubernetes容器集群的伸缩容控制方法、装置和网络设备
CN112149975A (zh) * 2020-09-11 2020-12-29 杭州东方通信软件技术有限公司 一种基于人工智能的apm监控系统及监控方法
CN112698949A (zh) * 2020-12-31 2021-04-23 珠海派诺科技股份有限公司 一种多源异构柔性均衡采集方法、电子设备、存储介质
CN112732528A (zh) * 2021-01-08 2021-04-30 卓望数码技术(深圳)有限公司 基于it运维监控的指标采集方法、系统、设备及存储介质
CN112738189A (zh) * 2020-12-24 2021-04-30 航天信息股份有限公司 集群资源管理方法、装置、存储介质及电子设备
CN113037528A (zh) * 2019-12-25 2021-06-25 中国移动通信集团山东有限公司 一种告警服务节点的弹性扩缩容方法、装置
CN114153518A (zh) * 2021-10-25 2022-03-08 国网江苏省电力有限公司信息通信分公司 一种云原生MySQL集群自主扩容缩容的方法
CN115499299A (zh) * 2022-09-13 2022-12-20 航天信息股份有限公司 一种集群设备监控方法及装置

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145393A (zh) * 2017-04-27 2017-09-08 努比亚技术有限公司 一种负载调整方法、设备及计算机可读存储介质
CN107046581A (zh) * 2017-05-19 2017-08-15 北京奇艺世纪科技有限公司 一种服务运行状态的监测方法、装置及服务器
CN107528795A (zh) * 2017-09-14 2017-12-29 深圳市盛路物联通讯技术有限公司 一种物联网数据传输方法及系统
CN107911419A (zh) * 2017-10-26 2018-04-13 广州市雷军游乐设备有限公司 服务器组内扩容的方法、装置、存储介质和系统
CN107948305B (zh) * 2017-12-11 2019-04-02 北京百度网讯科技有限公司 漏洞扫描方法、装置、设备及计算机可读介质
CN108769100A (zh) * 2018-04-03 2018-11-06 郑州云海信息技术有限公司 一种基于kubernetes容器数量弹性伸缩的实现方法及其装置
CN108667654B (zh) * 2018-04-19 2021-04-20 北京奇艺世纪科技有限公司 服务器集群自动扩容方法及相关设备
CN111008026B (zh) 2018-10-08 2024-03-26 阿里巴巴集团控股有限公司 集群管理方法、装置及系统
CN109660421A (zh) * 2018-10-26 2019-04-19 平安科技(深圳)有限公司 弹性调度资源的方法、装置、服务器及存储介质
CN109617738B (zh) * 2018-12-28 2022-05-31 优刻得科技股份有限公司 用户服务扩缩容的方法、系统和非易失性存储介质
CN109771939B (zh) * 2019-01-15 2022-07-12 网易(杭州)网络有限公司 游戏服务器调整方法与装置、存储介质、电子设备
CN109976917B (zh) * 2019-04-08 2020-09-11 科大讯飞股份有限公司 一种负载调度方法、装置、负载调度器、存储介质及系统
CN110289994B (zh) * 2019-06-06 2022-04-08 厦门网宿有限公司 一种集群容量调整方法及装置
CN110737593B (zh) * 2019-09-19 2022-03-29 平安科技(深圳)有限公司 智能容量管理方法、装置及存储介质
CN110933097B (zh) * 2019-12-05 2022-06-28 美味不用等(上海)信息科技股份有限公司 面向多服务网关的限流与自动扩缩容方法
CN113407297B (zh) * 2020-03-17 2023-12-26 中国移动通信集团浙江有限公司 容器管理方法、装置及计算设备
CN111431748B (zh) * 2020-03-20 2022-09-30 支付宝(杭州)信息技术有限公司 一种对集群进行自动运维的方法、系统及装置
CN112199251B (zh) * 2020-09-25 2022-04-29 同程网络科技股份有限公司 通过定时任务实现服务器动态增减的方法、系统及装置
CN112422329B (zh) * 2020-11-05 2022-08-05 杭州米络星科技(集团)有限公司 流媒体服务器集群的管理方法、装置和电子设备
CN112559459B (zh) * 2020-12-15 2024-02-13 跬云(上海)信息科技有限公司 一种基于云计算的自适应存储分层系统及方法
CN112671570A (zh) * 2020-12-16 2021-04-16 微梦创科网络科技(中国)有限公司 自动扩缩容的方法及系统
CN112887169A (zh) * 2021-01-26 2021-06-01 广州欢网科技有限责任公司 服务器自动扩容方法、装置及服务器集群

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110208844A1 (en) * 2008-10-29 2011-08-25 Huawei Technologies Co., Ltd. Cluster system, method and device for expanding cluster system
CN103023969A (zh) * 2012-11-15 2013-04-03 北京搜狐新媒体信息技术有限公司 Cloud platform scheduling method and system
CN103475566A (zh) * 2013-07-10 2013-12-25 北京发发时代信息技术有限公司 Real-time message exchange platform and method for building a distributed cluster

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062567B2 (en) * 2000-11-06 2006-06-13 Endeavors Technology, Inc. Intelligent network streaming and execution system for conventionally coded applications
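CN102035737A (zh) * 2010-12-08 2011-04-27 北京交通大学 Adaptive load balancing method and apparatus based on cognitive network
CN102646062B (zh) * 2012-03-20 2014-04-09 广东电子工业研究院有限公司 Method for elastic capacity expansion of application clusters on a cloud computing platform
CN102833355B (zh) * 2012-09-22 2015-12-09 广东电子工业研究院有限公司 Cloud-computing-oriented load balancing system and load balancing method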

Also Published As

Publication number Publication date
CN106470219A (zh) 2017-03-01

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16836561; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 16836561; Country of ref document: EP; Kind code of ref document: A1)