CN111930502A - Server management method, device, equipment and storage medium

Info

Publication number: CN111930502A
Application number: CN202010760328.9A
Authority: CN
Inventors: 戴超群; 周佳佳
Original assignee: Suzhou Jiaochi Artificial Intelligence Research Institute Co ltd
Current assignee: Suzhou Jiaochi Artificial Intelligence Research Institute Co ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2020-11-13

Abstract

The embodiment of the invention discloses a server management method, a server management apparatus, a device and a storage medium. The method comprises the following steps: acquiring the number of queued tasks and the number of idle servers of a computer cluster; judging whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number; if so, acquiring a target available server from a bootable server list of the computer cluster, and controlling the target available server to execute a boot operation; the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process. The embodiment of the invention can automatically and dynamically power on servers according to how strained the resources requested by the current computing tasks of the computer cluster are, ensuring that computing tasks are processed in time, keeping the power consumption of the whole cluster at a level suitable for the computing tasks, and avoiding resource waste.

Description

Server management method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a server management method, a server management device, server management equipment and a storage medium.

Background

In a computer cluster, there are typically multiple servers making up a computing resource. Computer clusters improve computing power by distributing computing tasks to different servers of the cluster.

In the related art, the servers in a computer cluster are generally powered on and off together. Once powered on, all servers remain on. If a server receives an assigned computing task, it executes the corresponding computing operation. If a server receives no computing task, it stays powered on and waits for a task to be assigned.

The usage of a computer cluster changes dynamically during actual operation. Utilization may be low at some times, while at other times a surge of tasks may strain resources. In the related art, all servers stay powered on even when utilization is low, which wastes resources.

Disclosure of Invention

Embodiments of the present invention provide a server management method, an apparatus, a device, and a storage medium, which can dynamically start a suitable number of servers according to an actual operating condition of a computer cluster, so as to maintain power consumption of the entire computer cluster at a level suitable for a computing task, and avoid resource waste.

In a first aspect, an embodiment of the present invention provides a server management method, including:

acquiring the number of queuing tasks and the number of idle servers of a computer cluster;

judging whether the number of the queued tasks is larger than a preset task number threshold value or not and whether the number of the idle servers is smaller than the preset number of the servers or not;

if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, acquiring a target available server from a bootable server list of the computer cluster, and controlling the target available server to execute a boot operation;

the available servers in the bootable server list are servers which are successfully powered off in the automatic power-off process.

In a second aspect, an embodiment of the present invention further provides a server management apparatus, including:

the quantity acquisition module is used for acquiring the quantity of queuing tasks and the quantity of idle servers of the computer cluster;

the quantity judging module is used for judging whether the number of the queued tasks is greater than a preset task quantity threshold value or not and whether the number of the idle servers is less than the preset number of the servers or not;

a server starting module, configured to, if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, obtain a target available server from a bootable server list of the computer cluster, and control the target available server to perform a boot operation.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the server management method according to the embodiment of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the server management method according to the embodiment of the present invention.

According to the technical scheme of the embodiment of the invention, the number of queued tasks and the number of idle servers of a computer cluster are acquired, and it is judged whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number. When both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to execute a boot operation. In this way, how strained the resources requested by the current computing tasks are is determined from the number of queued tasks and the number of idle servers, and a suitable number of servers is powered on dynamically and automatically when resources are strained. This ensures that computing tasks are processed in time, keeps the power consumption of the whole cluster at a level suitable for the computing tasks, and avoids resource waste.

Drawings

Fig. 1 is a flowchart of a server management method according to an embodiment of the present invention.

Fig. 2 is a flowchart of a server management method according to a second embodiment of the present invention.

Fig. 3 is a flowchart of a server management method according to a third embodiment of the present invention.

Fig. 4 is a flowchart of a server management method according to a fourth embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a server management apparatus according to a fifth embodiment of the present invention.

Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.

It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Example one

Fig. 1 is a flowchart of a server management method according to an embodiment of the present invention. This embodiment is applicable to managing servers in a computer cluster. The method may be executed by the server management apparatus provided in the embodiments of the present invention; the apparatus may be implemented in software and/or hardware, and may generally be integrated in a computer device, such as a management server in a computer cluster. The management server is a server for managing all servers in the computer cluster. As shown in fig. 1, the method of this embodiment specifically includes:

step 101, acquiring the number of queued tasks and the number of idle servers of the computer cluster.

In this embodiment, the number of queued tasks of the computer cluster is the number of queued tasks of all users in the computer cluster. The number of idle servers is the number of all idle servers within the computer cluster. An idle server is a server in which all computing resources within the server are idle. Illustratively, the computing resource may be a Graphics Processing Unit (GPU) card.

Optionally, acquiring the number of queued tasks and the number of idle servers of the computer cluster may include: acquiring the number of queued tasks and the number of idle servers of the computer cluster periodically, according to a preset power-on time interval.

The preset power-on time interval can be set according to service requirements. For example, the preset power-on time interval may be 15 minutes. In that case, every 15 minutes the number of queued tasks and the number of idle servers of the computer cluster are acquired; it is judged whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset server number; and if both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to execute a boot operation. The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process. The automatic power-on process is thus executed once per preset power-on time interval.
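The periodic check described above can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: `run_periodically`, `check_once`, `max_rounds` and the injectable `sleep` are hypothetical names introduced for illustration, and the 15-minute default is the example value given in the text.

```python
import time
from typing import Callable, Optional

# Example value from the text: one automatic power-on check every 15 minutes.
POWER_ON_INTERVAL_SECONDS = 15 * 60


def run_periodically(
    check_once: Callable[[], None],
    interval_seconds: float = POWER_ON_INTERVAL_SECONDS,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check_once, wait interval_seconds, and repeat.

    max_rounds (hypothetical) bounds the loop so it can be tested;
    None means the loop runs until the process is stopped.
    Returns the number of rounds executed.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        check_once()
        rounds += 1
        if max_rounds is not None and rounds == max_rounds:
            break  # do not sleep after the final round
        sleep(interval_seconds)
    return rounds
```

A scheduler outside the program, such as a cron entry, could trigger the same check instead of an in-process loop.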

Optionally, the number of queued tasks and the number of idle servers of the computer cluster may be obtained through a preset script command for obtaining the number of queued tasks and the number of idle servers.
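The text does not name the script command. As one possible illustration, assuming the cluster is scheduled by SLURM (which the document mentions only in connection with initialization), the two counts could be derived from `squeue` and `sinfo` output. The function name `get_cluster_counts` and the injectable `run` parameter are hypothetical.

```python
import subprocess
from typing import Callable, List, Tuple


def _run(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def get_cluster_counts(run: Callable[[List[str]], str] = _run) -> Tuple[int, int]:
    """Return (queued_tasks, idle_servers), assuming a SLURM cluster."""
    # One line per pending job: no header, pending state only, job id column.
    pending = run(["squeue", "-h", "-t", "PENDING", "-o", "%i"])
    # One line per idle node (per partition): no header, node-oriented output.
    idle = run(["sinfo", "-h", "-N", "-t", "idle", "-o", "%N"])

    queued_tasks = sum(1 for line in pending.splitlines() if line.strip())
    # A node that belongs to several partitions is listed once per partition,
    # so node names are de-duplicated before counting.
    idle_nodes = {line.strip() for line in idle.splitlines() if line.strip()}
    return queued_tasks, len(idle_nodes)
```

Passing the command runner as a parameter keeps the parsing testable on a machine without SLURM installed.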

Step 102, judging whether the number of the queued tasks is greater than a preset task number threshold value and whether the number of the idle servers is less than a preset server number: if yes, go to step 103; if not, go to step 104.

Optionally, the value of the preset task number threshold may be determined according to the frequency of tasks. Illustratively, the preset task number threshold is 10. In practice, a user submitting an unreasonable computing resource request can easily cause tasks to queue. Empirically, 10 or fewer queued tasks can be considered normal.

It is judged whether the number of queued tasks is greater than the preset task number threshold. If it is, the queue is longer than normal, the resources requested by the current computing tasks are strained, and a server needs to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the preset task number threshold, the queue length is normal, resources are not strained, and no server needs to be powered on automatically for the time being.

Optionally, the value of the preset server number may be determined according to the number of available machines. Illustratively, the preset server number is 2. In practice, a server is often configured with 8 GPU cards; when only 1 server is idle and a user submits a computing task requiring 16 GPU cards, that is, two full servers, the task has to queue. Setting the preset server number to 2 therefore allows a server to be powered on in time to provide resources for the computing task.

It is then judged whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset server number. If both conditions hold, the queue is longer than normal and fewer idle servers are available than normal, so the resources requested by the current computing tasks are strained and a server needs to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the preset task number threshold, or the number of idle servers is greater than or equal to the preset server number, resources are not strained and no server needs to be powered on automatically for the time being.

Determining how strained resources are from both the number of queued tasks and the number of idle servers allows a more reasonable judgment of whether a server needs to be powered on automatically.
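The judgment in step 102 reduces to a conjunction of two comparisons. A minimal sketch, using the illustrative values from the text (threshold of 10 queued tasks, preset server number of 2); the function and parameter names are hypothetical:

```python
def should_power_on(
    queued_tasks: int,
    idle_servers: int,
    task_threshold: int = 10,   # illustrative value from the text
    server_threshold: int = 2,  # illustrative value from the text
) -> bool:
    """Return True only when both conditions of step 102 hold."""
    return queued_tasks > task_threshold and idle_servers < server_threshold
```

Both comparisons are strict: exactly 10 queued tasks or exactly 2 idle servers does not trigger a power-on.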

Optionally, the method may further include: and judging whether available servers exist in the bootable server list of the computer cluster.

In this embodiment, before determining whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number, it may be determined whether an available server exists in a list of bootable servers of the computer cluster. Optionally, after determining whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number, determining whether an available server exists in a list of bootable servers of the computer cluster.

The bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a bootable server. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.

If it is determined that no available server exists in the bootable server list of the computer cluster, a server could not be powered on even if the subsequent steps found that one was needed, so the process may end in this case. If it is determined that an available server exists in the bootable server list, a server can be powered on successfully if the subsequent steps require it, so the subsequent steps continue to be performed.

Step 103, obtaining a target available server from the bootable server list of the computer cluster, and controlling the target available server to execute a boot operation.

Optionally, the automatic shutdown process may be: acquiring the number of idle servers of the computer cluster; judging, according to the number of idle servers, whether the computer cluster meets the idle server closing condition; and if the computer cluster meets the idle server closing condition, acquiring a target idle server from the idle servers of the computer cluster, controlling the target idle server to execute a shutdown operation, and adding the target idle server that was successfully shut down to the bootable server list of the computer cluster.

If the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, a target available server is acquired from the bootable server list of the computer cluster and controlled to execute a boot operation.

Optionally, obtaining a target available server from the bootable server list of the computer cluster and controlling the target available server to execute a boot operation may include: selecting an available server from the bootable server list of the computer cluster as the target available server; and controlling the target available server to execute the boot operation through an Intelligent Platform Management Interface (IPMI) instruction.
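The selection and IPMI power-on could be sketched as follows. The text only says "IPMI instruction"; using the `ipmitool` command-line client is an assumption, as are the function names, the choice of the first list entry as the target, and the BMC address and user name in the usage note. The `-E` option makes `ipmitool` read the password from the `IPMI_PASSWORD` environment variable.

```python
import subprocess
from typing import List, Optional


def pick_target(bootable_servers: List[str]) -> Optional[str]:
    """Select a target available server; here simply the first entry."""
    return bootable_servers[0] if bootable_servers else None


def build_power_on_command(bmc_host: str, user: str) -> List[str]:
    """Build an ipmitool command that powers on the chassis over the LAN."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host,   # address of the server's management controller
        "-U", user,
        "-E",             # password is taken from IPMI_PASSWORD
        "chassis", "power", "on",
    ]


def power_on(bmc_host: str, user: str) -> bool:
    """Send the power-on request; True means ipmitool exited successfully."""
    result = subprocess.run(build_power_on_command(bmc_host, user))
    return result.returncode == 0
```

For example, `power_on("10.0.0.5", "admin")` would request power-on of the machine whose management controller is at that hypothetical address. A successful exit only means the request was accepted, which is why a separate boot check follows.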

Optionally, after controlling the target available server to perform the boot operation, the method may further include: after waiting for a preset boot time period, judging whether the target available server has booted successfully; if the target available server has booted successfully, initializing the target available server; after waiting for a preset initialization time period, judging whether the target available server has been initialized successfully; and if the target available server has been initialized successfully, writing normal online information of the target available server into a log file.

The preset boot time period may be determined according to the time required for the boot operation of the server. For example, if the boot operation of the server usually takes 5 minutes, the preset boot time period is 5 minutes. The preset initialization time period may be determined according to the time required for the initialization operation of the server. The normal online information records that the target available server successfully completed the boot operation and the initialization operation in the current automatic power-on process and is online normally.

Optionally, the target available server is tested with a PING (Packet Internet Groper) instruction, a network diagnostic tool, to determine whether the target available server has booted successfully.
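A PING-based boot check could look like the sketch below. It assumes the Linux `ping` utility, where `-c` sets the number of echo requests and `-W` the reply timeout in seconds; the function name and the injectable `run` parameter are hypothetical.

```python
import subprocess
from typing import Callable


def is_reachable(
    host: str,
    timeout_seconds: int = 2,
    run: Callable = subprocess.run,
) -> bool:
    """Send one echo request; exit status 0 means a reply was received."""
    result = run(
        ["ping", "-c", "1", "-W", str(timeout_seconds), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A reply shows only that the network stack is up, which is consistent with the text treating initialization as a separate, later check.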

Optionally, if the target available server is not successfully started, the information of the unsuccessful start-up of the target available server is written into the log file, so that the operation and maintenance personnel can manually intervene and maintain the target available server according to the information of the unsuccessful start-up of the target available server when checking the log file regularly. The startup unsuccessful information is used for recording the information that the target available server has not been successfully started in the current automatic startup process.

Optionally, the initialization operation may include checking the memory swap partition (SWAP), synchronizing the configuration files of the SLURM resource management system, initializing the graphics cards, checking the storage mounts, and checking whether the scheduling system service is running normally.

Optionally, if the target available server is not initialized successfully, the information of the unsuccessful initialization of the target available server is written into the log file, so that the operation and maintenance personnel can manually intervene and maintain the target available server according to the information of the unsuccessful initialization of the target available server when checking the log file regularly. The initialization unsuccessful information is used for recording that the target available server is not initialized successfully in the current automatic startup process.
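The wait, check, initialize, check, log sequence that follows the power-on can be sketched as one function. All names are hypothetical; the 5-minute boot wait is the example from the text, while the 60-second initialization wait is a placeholder, since the text gives no value. The checks and the initialization itself are passed in as callables because the text leaves their details open.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("auto_power_on")


def bring_online(
    server: str,
    is_booted: Callable[[str], bool],
    initialize: Callable[[str], None],
    is_initialized: Callable[[str], bool],
    boot_wait_seconds: float = 5 * 60,  # example value from the text
    init_wait_seconds: float = 60,      # placeholder, not from the text
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'online', 'boot_failed' or 'init_failed' and log the outcome."""
    sleep(boot_wait_seconds)
    if not is_booted(server):
        log.error("%s: boot unsuccessful, manual intervention needed", server)
        return "boot_failed"

    initialize(server)
    sleep(init_wait_seconds)
    if not is_initialized(server):
        log.error("%s: initialization unsuccessful, manual intervention needed", server)
        return "init_failed"

    log.info("%s: booted and initialized, online normally", server)
    return "online"
```

Each outcome produces exactly one log record, so a periodic review of the log file shows which servers need attention.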

Step 104, writing the current resource condition information of the computer cluster into a log file.

In this embodiment, if the number of queued tasks is less than or equal to the preset task number threshold, or the number of idle servers is greater than or equal to the preset server number, the current resource condition information of the computer cluster is written into the log file, so that an operation and maintenance worker can determine the resource condition of the computer cluster in the current automatic startup process according to the current resource condition information of the computer cluster when checking the log file regularly.

Optionally, the current resource condition information of the computer cluster includes the number of queued tasks and the number of idle servers of the computer cluster.
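As a small illustration of step 104, the two counts can be written as one log record per check. The record layout and the names below are assumptions; the text only requires that the number of queued tasks and the number of idle servers be recorded.

```python
import logging

log = logging.getLogger("auto_power_on")


def format_resource_status(queued_tasks: int, idle_servers: int) -> str:
    """Build the log message for a check that did not trigger a power-on."""
    return (
        "no power-on needed: "
        f"queued_tasks={queued_tasks} idle_servers={idle_servers}"
    )


def log_resource_status(queued_tasks: int, idle_servers: int) -> None:
    """Write the current resource condition of the cluster to the log."""
    log.info(format_resource_status(queued_tasks, idle_servers))
```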

The embodiment of the invention provides a server management method. The number of queued tasks and the number of idle servers of a computer cluster are acquired, and it is judged whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number. When both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to execute a boot operation. In this way, how strained the resources requested by the current computing tasks are is determined from the number of queued tasks and the number of idle servers, and a suitable number of servers is powered on dynamically and automatically when resources are strained. This ensures that computing tasks are processed in time, keeps the power consumption of the whole cluster at a level suitable for the computing tasks, and avoids resource waste.

Example two

Fig. 2 is a flowchart of a server management method according to a second embodiment of the present invention. This embodiment may be combined with the optional solutions in one or more of the above embodiments. In this embodiment, the server management method may further include: judging whether an available server exists in the bootable server list of the computer cluster. After controlling the target available server to execute the boot operation, the method may further include: after waiting for a preset boot time period, judging whether the target available server has booted successfully; if the target available server has booted successfully, initializing the target available server; after waiting for a preset initialization time period, judging whether the target available server has been initialized successfully; and if the target available server has been initialized successfully, writing normal online information of the target available server into a log file.

As shown in fig. 2, the method of the embodiment of the present invention specifically includes:

step 201, acquiring the number of queued tasks and the number of idle servers of the computer cluster.

For details not fully described in this embodiment, refer to the foregoing embodiments.

Step 202, judging whether an available server exists in the list of the bootable servers of the computer cluster: if yes, go to step 203; if not, the flow is ended.

In this embodiment, the bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a bootable server. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.

Step 203, determining whether the number of queued tasks is greater than a preset task number threshold, and whether the number of idle servers is less than a preset server number: if yes, go to step 204; if not, go to step 211.

In this embodiment, it is determined whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset server number. If both conditions hold, the queue is longer than normal and fewer idle servers are available than normal, so the resources requested by the current computing tasks are strained and a server needs to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the preset task number threshold, or the number of idle servers is greater than or equal to the preset server number, resources are not strained and no server needs to be powered on automatically for the time being.

Step 204, acquiring a target available server from the bootable server list of the computer cluster, and controlling the target available server to execute a boot operation.

Optionally, obtaining a target available server from the bootable server list of the computer cluster and controlling the target available server to execute a boot operation may include: selecting an available server from the bootable server list of the computer cluster as the target available server; and controlling the target available server to execute the boot operation through the IPMI instruction.

Step 205, after waiting for a preset boot time period, determining whether the target available server is booted successfully: if yes, go to step 206; if not, go to step 210.

In this embodiment, the preset boot time period may be determined according to the time required for the boot operation of the server. For example, the booting operation of the server usually requires 5 minutes, and the preset booting time period is 5 minutes.

Optionally, the PING instruction is used to test the target available server, and whether the target available server is successfully started is determined.

Step 206, initializing the target available server.

Step 207, after waiting for a preset initialization time period, determining whether the target available server is initialized successfully: if yes, go to step 208; if not, go to step 209.

In this embodiment, the preset initialization time period may be determined according to the time required for the initialization operation of the server.

Step 208, writing the normal online information of the target available server into a log file.

In this embodiment, the normal online information records that the target available server successfully completed the boot operation and the initialization operation in the current automatic power-on process and is online normally.

Step 209, writing the information of the unsuccessful initialization of the target available server into a log file.

In this embodiment, if the target available server is not initialized successfully, the information of the unsuccessful initialization of the target available server is written into the log file, so that the operation and maintenance personnel can manually intervene and maintain the target available server according to the information of the unsuccessful initialization of the target available server when checking the log file regularly. The initialization unsuccessful information is used for recording that the target available server is not initialized successfully in the current automatic startup process.

Step 210, writing the information of the unsuccessful start-up of the target available server into a log file.

In this embodiment, if the target available server is not successfully started, the information of the unsuccessful start-up of the target available server is written into the log file, so that the operation and maintenance personnel can manually intervene and maintain the target available server according to the information of the unsuccessful start-up of the target available server when checking the log file regularly. The startup unsuccessful information is used for recording the information that the target available server has not been successfully started in the current automatic startup process.

Step 211, writing the current resource condition information of the computer cluster into a log file.

The embodiment of the invention provides a server management method. The number of queued tasks and the number of idle servers of a computer cluster are acquired. When it is determined that an available server exists in the bootable server list of the computer cluster, it is judged whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number. When both conditions hold, a target available server is acquired from the bootable server list and controlled to execute a boot operation, and corresponding information is written into a log file according to the boot result and the initialization result of the target available server. In this way, how strained the resources requested by the current computing tasks are is determined from the number of queued tasks and the number of idle servers, and a suitable number of servers is powered on dynamically and automatically when resources are strained, which ensures that computing tasks are processed in time, keeps the power consumption of the whole cluster at a level suitable for the computing tasks, and avoids resource waste. Because the boot result and the initialization result are written into the log file, operation and maintenance personnel can manually intervene and maintain the target available server according to that information when they check the log file regularly.

Example three

Fig. 3 is a flowchart of a server management method according to a third embodiment of the present invention. This embodiment may be combined with any of the optional solutions in one or more of the above embodiments. In this embodiment, the server management method may further include: acquiring the number of idle servers of the computer cluster; judging whether the computer cluster meets the idle server closing condition according to the number of idle servers; and if the computer cluster meets the idle server closing condition, acquiring a target idle server from the idle servers of the computer cluster, controlling the target idle server to execute a shutdown operation, and adding each target idle server that is successfully shut down to the bootable server list of the computer cluster.

As shown in fig. 3, the method of the embodiment of the present invention specifically includes:

Step 301, obtaining the number of idle servers of the computer cluster.

In this embodiment, the number of idle servers is the number of all idle servers in the computer cluster. An idle server is a server in which all computing resources within the server are idle. Illustratively, the computing resource may be a GPU card.

Optionally, acquiring the number of idle servers of the computer cluster may include: periodically acquiring the number of idle servers of the computer cluster according to a preset shutdown time interval.

The preset shutdown time interval may be set according to service requirements. For example, the preset shutdown time interval may be one day. In that case, once a day, the number of idle servers of the computer cluster is acquired; whether the computer cluster meets the idle server closing condition is judged according to the number of idle servers; and if the condition is met, a target idle server is obtained from the idle servers of the computer cluster, the target idle server is controlled to execute a shutdown operation, and each target idle server that is successfully shut down is added to the bootable server list of the computer cluster. The automatic shutdown process is thus executed once per preset shutdown time interval.

Optionally, because the shutdown process should not be executed frequently, the preset shutdown time interval is longer than the preset startup time interval.

Empirically, users usually submit the fewest computing tasks in the period around the early hours of each day, so the utilization of the computer cluster is at its lowest and that period is usually the most appropriate time to trigger the automatic shutdown process. The number of idle servers of the computer cluster may therefore be obtained in the early hours of each day according to the preset shutdown time interval. Specifically, whether the current system time has crossed into a new day can be used as the trigger condition for the automatic shutdown process, so that the process is executed once in the early hours of each day according to the preset shutdown time interval.
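The day-crossing trigger described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class name `DailyShutdownTrigger`, its `last_run_day` state, and the choice not to fire on the very first check are hypothetical and do not come from the embodiment.

```python
from datetime import date, datetime
from typing import Optional


class DailyShutdownTrigger:
    """Fires at most once per calendar day, on the first check after midnight."""

    def __init__(self, last_run_day: Optional[date] = None) -> None:
        # Hypothetical state: the calendar day seen by the previous check.
        self.last_run_day = last_run_day

    def should_run(self, now: datetime) -> bool:
        today = now.date()
        if self.last_run_day is None:
            # Assumption: the first check only records the day and waits
            # for the next rollover before triggering a shutdown pass.
            self.last_run_day = today
            return False
        if today > self.last_run_day:
            # The system time has crossed into a new day: trigger once.
            self.last_run_day = today
            return True
        return False
```

A scheduler that polls at the (shorter) startup interval could call `should_run(datetime.now())` on every pass and start the automatic shutdown process only when it returns `True`.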

Optionally, the number of idle servers of the computer cluster may be obtained through a preset script command for obtaining the number of idle servers.

Step 302, according to the number of the idle servers, judging whether the computer cluster meets the idle server closing condition: if yes, go to step 303; if not, go to step 304.

Optionally, the determining, according to the number of idle servers, whether the computer cluster meets an idle server shutdown condition may include: judging whether the number of the idle servers is larger than a preset idle server number threshold value or not; if yes, determining that the computer cluster meets an idle server closing condition; if not, determining that the computer cluster does not meet the idle server closing condition.

The value of the preset idle server number threshold can be set according to the service requirement. Illustratively, the preset idle server number threshold value is 5.

Whether the number of idle servers is greater than the preset idle server number threshold is judged. If the number of idle servers is greater than the threshold, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus needs to be shut down to avoid resource waste, so the computer cluster is determined to meet the idle server closing condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers in the computer cluster is at or below the normal value: the cluster is not keeping too many idle servers powered on, no idle server needs to be shut down for the time being, and the computer cluster is determined not to meet the idle server closing condition.

Step 303, obtaining a target idle server from the idle servers of the computer cluster, controlling the target idle server to execute a shutdown operation, and adding each target idle server that is successfully shut down to the bootable server list of the computer cluster.

Optionally, obtaining a target idle server from the idle servers of the computer cluster and controlling the target idle server to execute a shutdown operation may include: calculating the difference between the number of idle servers and the preset idle server number threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and removing the target idle servers from a resource pool; and executing a shutdown operation on the removed target idle servers.

The resource pool includes servers in the computer cluster that are maintained in an on state. In this embodiment, the target idle server that needs to execute the shutdown operation is removed from the resource pool in time.

Optionally, the shutdown operation on a removed target idle server is completed by executing a shutdown function on that server.

In a specific example, the preset idle server number threshold is 5 and the number of idle servers is 7. Because the number of idle servers is greater than 5, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus needs to be shut down to avoid resource waste, so the computer cluster is determined to meet the idle server closing condition. The difference between the number of idle servers and the preset idle server number threshold is 2. Two idle servers are therefore obtained from the idle servers of the computer cluster as target idle servers, the target idle servers are removed from the resource pool, and the shutdown operation is executed on the removed target idle servers.
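The numeric example above can be reproduced with a short sketch. The function name `select_shutdown_targets`, the server names, and the rule of taking the first surplus entries are assumptions made for illustration; the embodiment does not state which idle servers are chosen.

```python
from typing import List, Set


def select_shutdown_targets(
    idle_servers: List[str],
    idle_threshold: int,
    resource_pool: Set[str],
) -> List[str]:
    """Pick the surplus idle servers and remove them from the resource pool."""
    # Closing condition: strictly more idle servers than the preset threshold.
    surplus = len(idle_servers) - idle_threshold
    if surplus <= 0:
        return []
    # Assumption: the source does not say which idle servers to pick,
    # so this sketch simply takes the first `surplus` entries.
    targets = idle_servers[:surplus]
    for server in targets:
        resource_pool.discard(server)
    return targets


idle = [f"node{i}" for i in range(1, 8)]   # 7 idle servers
pool = set(idle) | {"busy1"}               # servers currently kept powered on
targets = select_shutdown_targets(idle, idle_threshold=5, resource_pool=pool)
print(targets)  # ['node1', 'node2']
```

With 7 idle servers and a threshold of 5, two targets are selected and removed from the pool before any shutdown command is issued.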

The bootable server list of the computer cluster maintains the available servers in the computer cluster that can be booted; an available server is a bootable server. A target idle server that has been successfully shut down is an available server that can be booted in the computer cluster. Therefore, each target idle server that is successfully shut down in the automatic shutdown process is added to the bootable server list of the computer cluster.
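A minimal sketch of this bookkeeping is given below. The `shutdown` callable stands in for whatever shutdown function the cluster actually uses and is hypothetical, as is the decision to return the servers whose shutdown failed; this section does not specify how failures are handled.

```python
from typing import Callable, List


def shut_down_and_register(
    targets: List[str],
    shutdown: Callable[[str], bool],
    bootable_servers: List[str],
) -> List[str]:
    """Shut down each target and record the successes in the bootable list.

    `shutdown` is a hypothetical callable that returns True when the server
    was powered off successfully. Servers that fail are returned to the
    caller (an assumption of this sketch).
    """
    failed: List[str] = []
    for server in targets:
        if shutdown(server):
            # Only successfully shut-down servers become bootable candidates.
            if server not in bootable_servers:
                bootable_servers.append(server)
        else:
            failed.append(server)
    return failed
```

Keeping the list restricted to servers that were powered off by this process means the boot flow never tries to start a machine that is already running or was taken offline for another reason.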

Step 304, writing the current resource condition information of the computer cluster into a log file.

In this embodiment, if the computer cluster does not satisfy the idle server shutdown condition, the current resource condition information of the computer cluster is written into the log file, so that an operation and maintenance worker can determine the resource condition of the computer cluster in the current automatic shutdown process according to the current resource condition information of the computer cluster when checking the log file regularly.

Optionally, the current resource condition information of the computer cluster includes the number of idle servers of the computer cluster.
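Step 304 could be realized with the standard Python `logging` module, as in the sketch below. The function name `log_resource_condition` and the message format are assumptions; the embodiment only requires that the number of idle servers be recorded in a log file.

```python
import logging


def log_resource_condition(
    logger: logging.Logger, idle_count: int, idle_threshold: int
) -> str:
    """Record why the automatic shutdown pass did nothing (step 304)."""
    # Hypothetical message format; the idle server count is the only field
    # the embodiment names explicitly.
    message = (
        f"auto-shutdown skipped: idle_servers={idle_count} "
        f"threshold={idle_threshold}"
    )
    logger.info(message)
    return message
```

In a deployment, the logger would typically be configured with a `logging.FileHandler` so that operation and maintenance personnel can review the entries later.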

The embodiment of the invention provides a server management method. The method obtains the number of idle servers of the computer cluster and judges, according to that number, whether the computer cluster meets the idle server closing condition. When the condition is met, the method obtains a target idle server from the idle servers of the computer cluster, controls the target idle server to execute a shutdown operation, and adds each target idle server that is successfully shut down to the bootable server list of the computer cluster. Surplus idle servers are thus shut down dynamically and automatically according to how idle the servers in the computer cluster are, which saves power, keeps the power consumption of the whole computer cluster at a level matched to the computing tasks, and avoids resource waste.

Example four

Fig. 4 is a flowchart of a server management method according to a fourth embodiment of the present invention. In this embodiment of the present invention, the determining whether the computer cluster meets the idle server shutdown condition according to the number of idle servers may include: judging whether the number of the idle servers is larger than a preset idle server number threshold value or not; if yes, determining that the computer cluster meets an idle server closing condition; if not, determining that the computer cluster does not meet the idle server closing condition.

Obtaining a target idle server from the idle servers of the computer cluster and controlling the target idle server to execute a shutdown operation may include: calculating the difference between the number of idle servers and the preset idle server number threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and removing the target idle servers from a resource pool; and executing a shutdown operation on the removed target idle servers.

As shown in fig. 4, the method of the embodiment of the present invention specifically includes:

Step 401, obtaining the number of idle servers of the computer cluster.

Step 402, judging whether the number of the idle servers is larger than a preset idle server number threshold value: if yes, go to step 403; if not, go to step 406.

In this embodiment, the value of the preset idle server number threshold may be set according to the service requirement. Illustratively, the preset idle server number threshold value is 5.

If the number of idle servers is greater than the preset idle server number threshold, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus needs to be shut down to avoid resource waste, so the computer cluster is determined to meet the idle server closing condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers in the computer cluster is at or below the normal value: the cluster is not keeping too many idle servers powered on, no idle server needs to be shut down for the time being, and the computer cluster is determined not to meet the idle server closing condition.

Step 403, calculating the difference between the number of idle servers and the preset idle server number threshold.

Step 404, obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and removing the target idle servers from a resource pool.

Step 405, executing a shutdown operation on the removed target idle servers, and adding each target idle server that is successfully shut down to the bootable server list of the computer cluster.

Step 406, writing the current resource condition information of the computer cluster into a log file.

The embodiment of the invention provides a server management method. The method determines whether the computer cluster meets the idle server closing condition by judging whether the number of idle servers is greater than a preset idle server number threshold, and determines that the condition is met when the number of idle servers is greater than the threshold. It then calculates the difference between the number of idle servers and the threshold, obtains from the idle servers of the computer cluster a number of idle servers equal to the difference as target idle servers, removes the target idle servers from the resource pool, and executes a shutdown operation on the removed target idle servers. Surplus idle servers are thus shut down dynamically and automatically according to the number of idle servers and the preset idle server number threshold, which saves power, keeps the power consumption of the whole computer cluster at a level matched to the computing tasks, and avoids resource waste.

Example five

Fig. 5 is a schematic structural diagram of a server management apparatus according to a fifth embodiment of the present invention. As shown in fig. 5, the apparatus includes: a number obtaining module 501, a number judging module 502 and a server booting module 503.

The number obtaining module 501 is configured to obtain the number of queued tasks and the number of idle servers of the computer cluster. The number judging module 502 is configured to judge whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number. The server booting module 503 is configured to, if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, obtain a target available server from the bootable server list of the computer cluster and control the target available server to perform a boot operation. The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.
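The decision made by modules 502 and 503 can be sketched as two small functions. All names here are hypothetical, and two points are assumptions rather than statements of the embodiment: a single target is picked per pass, and the target is removed from the bootable list at selection time.

```python
from typing import List, Optional


def should_boot(
    queued_tasks: int,
    idle_servers: int,
    task_threshold: int,
    server_threshold: int,
) -> bool:
    """Judgment of module 502: both conditions must hold."""
    return queued_tasks > task_threshold and idle_servers < server_threshold


def pick_boot_target(bootable_servers: List[str]) -> Optional[str]:
    """Selection step of module 503.

    Assumption: take the first entry and remove it from the list so that
    the same server is not selected twice.
    """
    if not bootable_servers:
        return None
    return bootable_servers.pop(0)
```

Both comparisons are strict, matching "greater than" and "less than" in the text, so a queue exactly at the threshold does not trigger a boot.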

The embodiment of the invention provides a server management device. The device obtains the number of queued tasks and the number of idle servers of the computer cluster, and judges whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number. When the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, the device obtains a target available server from the bootable server list of the computer cluster and controls the target available server to execute a boot operation. In this way, how short the resources requested by the current computing tasks of the computer cluster are is determined from the number of queued tasks and the number of idle servers, and an appropriate number of servers is booted dynamically and automatically when resources are short. Computing tasks are therefore processed in time, the power consumption of the whole cluster is kept at a level matched to the computing tasks, and resource waste is avoided.

In an optional implementation of the embodiment of the present invention, the server management apparatus may further include: a server judging module, configured to judge whether an available server exists in the bootable server list of the computer cluster.

In an optional implementation of the embodiment of the present invention, the number obtaining module 501 may include: a number timing acquisition unit, configured to periodically acquire the number of queued tasks and the number of idle servers of the computer cluster according to a preset startup time interval.

In an optional implementation of the embodiment of the present invention, the server management apparatus may further include: a boot judging module, configured to judge, after waiting for a preset boot time period, whether the target available server has booted successfully; a server initialization module, configured to initialize the target available server if it has booted successfully; an initialization judging module, configured to judge, after waiting for a preset initialization time period, whether the target available server has been initialized successfully; and an information writing module, configured to write normal online information of the target available server into a log file if the target available server has been initialized successfully.
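The boot, wait, verify, initialize, wait, verify, and log sequence handled by these modules can be sketched as one function. The callables `is_booted`, `initialize`, and `is_initialized` are hypothetical placeholders for cluster-specific checks, and the failure log messages are assumptions of this sketch.

```python
import logging
import time
from typing import Callable


def bring_online(
    server: str,
    is_booted: Callable[[str], bool],
    initialize: Callable[[str], None],
    is_initialized: Callable[[str], bool],
    logger: logging.Logger,
    boot_wait_s: float,
    init_wait_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Verify boot, initialize, verify initialization, then log the result."""
    sleep(boot_wait_s)               # wait for the preset boot time period
    if not is_booted(server):
        logger.error("boot failed: %s", server)
        return False
    initialize(server)
    sleep(init_wait_s)               # wait for the preset initialization period
    if not is_initialized(server):
        logger.error("initialization failed: %s", server)
        return False
    logger.info("server online: %s", server)
    return True
```

Passing `sleep` as a parameter is a design choice of the sketch: it lets the waiting periods be skipped in tests while defaulting to `time.sleep` in normal use.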

In an optional implementation of the embodiment of the present invention, the server management apparatus may further include: an idle number obtaining module, configured to obtain the number of idle servers of the computer cluster; a closing condition judging module, configured to judge whether the computer cluster meets the idle server closing condition according to the number of idle servers; and a server shutdown module, configured to, if the computer cluster meets the idle server closing condition, obtain a target idle server from the idle servers of the computer cluster, control the target idle server to execute a shutdown operation, and add each target idle server that is successfully shut down to the bootable server list of the computer cluster.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

The server management device can execute the server management method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the server management method.

Example six

Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.

As shown in FIG. 6, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processor 16 executes various functional applications and data processing by running the programs stored in the memory 28, thereby implementing the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers of a computer cluster; judging whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number; if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, acquiring a target available server from the bootable server list of the computer cluster and controlling the target available server to execute a boot operation; wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

Example seven

The seventh embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers of a computer cluster; judging whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset server number; if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset server number, acquiring a target available server from the bootable server list of the computer cluster and controlling the target available server to execute a boot operation; wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A server management method, comprising:

2. The method of claim 1, further comprising:

judging whether an available server exists in the bootable server list of the computer cluster.

3. The method of claim 1, wherein obtaining the number of queued tasks and the number of idle servers of the computer cluster comprises:

periodically acquiring the number of queued tasks and the number of idle servers of the computer cluster according to a preset startup time interval.

4. The method of claim 1, further comprising, after controlling the target idle server to perform a boot operation:

after waiting for a preset startup time period, judging whether the target available server is successfully started;

if the target available server is successfully started, initializing the target available server;

after waiting for a preset initialization time period, judging whether the target available server is initialized successfully;

and if the initialization of the target available server is successful, writing the normal online information of the target available server into a log file.

5. The method of claim 1, further comprising:

acquiring the number of idle servers of the computer cluster;

judging whether the computer cluster meets the idle server closing condition or not according to the number of the idle servers;

and if the computer cluster meets the idle server closing condition, acquiring a target idle server from the idle servers of the computer cluster, controlling the target idle server to execute a shutdown operation, and adding each target idle server that is successfully shut down to the bootable server list of the computer cluster.

6. A server management apparatus, comprising:

7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the server management method according to any of claims 1-5 when executing the computer program.

8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the server management method according to any one of claims 1 to 5.