WO2017023237A1 - Server resource management - Google Patents

Server resource management

Info

Publication number
WO2017023237A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
allocation
usage data
disk
cpu
Prior art date
Application number
PCT/US2015/043054
Other languages
French (fr)
Inventor
Wade J. SATTERFIELD
Tyler Easterling
Pieter C. KRUITHOF
Zachary D. SCHRAG
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development LP
Priority to PCT/US2015/043054
Publication of WO2017023237A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533: Hypervisors; Virtual machine monitors
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G06F9/5077: Logical partitioning of resources; Management or configuration of virtualized resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533: Hypervisors; Virtual machine monitors
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G06F2009/45583: Memory management, e.g. access or allocation


Abstract

Examples for managing resources on a server or servers include receiving data from a physical server or servers hosting a hypervisor, container, virtual machine, and/or virtualized operating system. In an example, CPU usage data, memory usage data, and disk usage data for a hypervisor and/or container are received, and a CPU allocation, memory allocation, and disk allocation are calculated based on a capacity or capacities of the CPU, memory, and disk multiplied by an oversubscription value or ratio. In an example, the CPU usage data, memory usage data, and disk usage data are plotted or otherwise output to a first axis, and the CPU allocation, memory allocation, and disk allocation are plotted or otherwise output to a second axis of, for example, a scatter graph.

Description

SERVER RESOURCE MANAGEMENT
BACKGROUND
[0001] Computing systems, devices, and electronic components such as servers may run or execute in a networked environment and may run or support virtual machines or virtualized software, e.g., an operating system or a shared operating system. A virtual machine may be used to run applications, services, and computer programs, and/or to provide other functionality to a host or remote computer. In a data center, networked, or "cloud" environment, a plurality of virtual machines may be resident on or across a plurality of physical servers, such that the computing load or demand of the virtual machines may be distributed across a physical hardware deployment.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The following detailed description references the drawings, wherein:
[0003] FIG. 1 is a block diagram of a system to manage server resources, according to an example;
[0004] FIG. 2 is a flowchart of outputting usage data and allocation data for server resource management, according to an example;
[0005] FIG. 3 is a flowchart of determining a memory block size at a given memory swap rate threshold, according to an example;
[0006] FIG. 4 is a block diagram of a system to output usage data and allocation data for server resource management, according to an example; and
[0007] FIG. 5 is a visual output of usage data and allocation data for server resource management, according to an example.
DETAILED DESCRIPTION
[0008] Various examples described below provide for managing resources on servers in a cloud service or in a data center. In an example, data from a physical server or servers hosting a hypervisor, container, virtual machine, and/or virtualized operating system is received. More specifically, in an example, CPU usage data, memory usage data, and disk usage data for a hypervisor and/or container (or server hosting a container) are received, and a CPU allocation, memory allocation, and disk allocation are calculated based on a capacity or capacities of the CPU, memory, and disk multiplied by an oversubscription value or ratio. In an example, the CPU usage data, memory usage data, and disk usage data are plotted or otherwise output to a first axis, and the CPU allocation, memory allocation, and disk allocation are plotted or otherwise output to a second axis of, for example, a scatter graph.
[0009] Companies, organizations, and information technology departments continuously seek ways to improve computing performance and reduce computing budgets and expenditures. One way to meet this goal may be to employ a strategy utilizing virtual machines, which may allow for server consolidation resulting in lower purchase and maintenance costs for computing hardware.
[0010] A virtual machine may be, for example, software and/or hardware-based emulation of a physical machine, e.g., a computer, or more specifically of an operating system. A virtual machine can be hosted by a host system that may include a physical server and/or a physical machine running a hypervisor, container, and/or other virtual machine software.
[0011] A hypervisor or container may be, for example, software that provides a virtualized environment including virtual machines allowing other software, including operating systems, to run on the host machine. A hypervisor or container may allow a virtual machine to access underlying system hardware and/or the system BIOS. In some examples, a physical server may run one hypervisor, which itself may run a plurality of virtual machines.
[0012] In a cloud environment or data center, a service may employ a plurality of physical servers, with each of the physical servers running a hypervisor or container and a plurality of virtual machines. As the load or demand on a virtual machine at any given time may vary, servers may be "oversubscribed" such that the number of hypervisors or containers assigned or subscribed to a physical server would exceed the physical capacity of the server if all of the subscribed virtual machines were to have a full computational demand at once. Oversubscription may be expressed as an oversubscription ratio, e.g., the ratio of virtual hardware resources to physical hardware resources.
[0013] In some examples, each resource may have its own oversubscription ratio, as the oversubscription ratio for disk may be very different from the oversubscription ratio for CPU. A physical hardware resource value multiplied by the oversubscription ratio for the resource may determine a limit for allocation of that resource. In an example, when a new allocation would bring the total for a server above this limit, the new load may not be placed on that server. If no server can accept the load, then the cloud is considered full and the virtual machine or container is not created.
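As an illustration of the limit check just described, a minimal sketch follows; all names (Server, can_place, place) and the dictionary-based resource model are illustrative assumptions, since the examples herein do not prescribe an implementation:

```python
from dataclasses import dataclass

@dataclass
class Server:
    capacity: dict   # physical capacity per resource, e.g. {"cpu": 16, "memory": 256, "disk": 10000}
    ratio: dict      # oversubscription ratio per resource, e.g. {"cpu": 8, "memory": 2, "disk": 1.5}
    allocated: dict  # virtual units already allocated per resource

def can_place(server, request):
    """A new load fits only if every resource stays at or under capacity * ratio."""
    return all(
        server.allocated[res] + amount <= server.capacity[res] * server.ratio[res]
        for res, amount in request.items()
    )

def place(servers, request):
    """Place the load on the first server that can accept it; None means the cloud is full."""
    for server in servers:
        if can_place(server, request):
            for res, amount in request.items():
                server.allocated[res] += amount
            return server
    return None  # no server can accept the load: the VM or container is not created
```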
[0014] While an arrangement including the ability to oversubscribe servers may lower the cost of operating a cloud service or data center, oversubscription may present challenges to server administrators. Setting the ratios too high may risk performance problems, while setting the ratios too low makes for a more expensive cloud. Monitoring or visualization of CPU, memory, and disk usage data measured against allocation data, e.g., based on an oversubscription ratio and/or various calculations, may assist a cloud administrator in balancing resource demand within a cloud service or data center to avoid server issues such as performance degradation or crashes, and to avoid wasting resources with undersubscription.
[0015] Such monitoring or visualization may also assist a tuning tool, such as a scheduler, that assigns or re-assigns new or existing virtual machines to hypervisors, containers, or physical hardware. Similarly, such monitoring or visualization may provide a state-of-the-cloud view so an administrator can ensure oversubscription ratios are properly set, and can balance and size resources. In such a view, CPU, memory, and disk resources may be monitored independently, to minimize the impact on a physical server when one resource runs out of capacity before other resources on the same physical hardware.
[0016] FIG. 1 is a block diagram of a system to manage server resources, according to an example.
[0017] A cloud service or data center (hereinafter "cloud service") may refer to a collection of servers and other computing devices which may be on-site, off-site, private, public, co-located, or located across a geographic area or areas. A cloud service may be used, for example, to host, execute, store, or otherwise process applications, websites, files, media, and other digital files, software, or machine-readable instructions.
[0018] A cloud service may comprise or communicate with computing devices such as servers, blade enclosures, workstations, desktop computers, laptops or notebook computers, point of sale devices, tablet computers, mobile phones, smart devices, or any other processing device or equipment including a processing resource. In examples described herein, a processing resource may include, for example, one processor or multiple processors included in a single computing device or distributed across multiple computing devices.
[0019] In the example of FIG. 1, servers or compute nodes 116, 124, and 132 may be computing devices that include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, as discussed below in more detail with respect to FIGS. 2-4. In some examples, the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0020] Servers 116, 124, and 132 may comprise a hypervisor, container, or other virtualization technology to store, execute, serve, and/or run a virtual machine or virtual machines, e.g., virtual machines 118-122, 126-130, and 134-138. As discussed above, a hypervisor or container may allow a virtual machine to access underlying system hardware and/or the system BIOS, and may report usage data of the physical servers 116, 124, and 132. Usage data may include data related to the performance, responsiveness, throughput, or other metrics of the CPU, memory, disk, or other hardware or software components of a physical server hosting a hypervisor or container. In some examples, usage data may also comprise event data, such as user interface responsiveness data, network error data, and/or application crash data.
[0021] Usage of hardware components on servers 116, 124, and 132 may also be collected or fetched by a monitoring tool, which may include or communicate with a software development kit ("SDK") or application programming interface ("API") or other tool resident on the servers. In other examples, servers 116, 124, and 132 may be monitored by another device, server, or process, such as an external monitoring tool. In such examples, the external monitoring tool may query the servers directly, such as through a "pull" configuration, or may monitor data exported or transmitted from the devices, such as in a "push" configuration.
[0022] Data monitored on servers 116, 124, and 132 may be received, processed, or analyzed on, for example, subscription server or engine 112, which may be communicatively coupled to a database such as subscription database 114. As with servers 116, 124, and 132, subscription server 112 may include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, and the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0023] Data received, processed, or analyzed on, for example, subscription server or engine 112 may be used on the subscription server, or another server, to determine, calculate, or plot data related to usage and allocation of servers such as servers 116, 124, and 132. As discussed below in more detail, the data plotted may be based on usage and allocation data from CPUs, memory, disk, and/or other components of servers 116, 124, and 132.
[0024] In the example of FIG. 1, laptop 104 and smartphone 108 (hereinafter "user devices") may be computing devices that may include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, as discussed below in more detail with respect to FIGS. 2-4. In some examples, the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0025] Laptop 104 and/or smartphone 108 may run applications such as word processing tools, spreadsheet tools, presentation tools, programming tools, communications tools, utilities, games, or other applications. In the example of FIG. 1, laptop 104 and/or smartphone 108 may display, visually or non-visually, the usage and allocation data output by subscription server 112. In one example, laptop 104 and/or smartphone 108 may display such data as a scatter graph or other graphs, e.g., graphs 106 and 110.
[0026] In the example of FIG. 1, servers 116, 124, and 132, user devices 104 and 108, and subscription server 112 may communicate over network 102. Network 102 may be a local network, private network, public network, or other wired or wireless communications network accessible by a user or users. As used herein, a network or computer network may include, for example, a local area network (LAN), a wireless local area network (WLAN), a virtual private network (VPN), the Internet, a cellular network, or a combination thereof.
[0027] In examples described herein, devices that communicate on network 102 may include a network interface device that may be a hardware device to communicate over at least one computer network. In some examples, a network interface may be a network interface card (NIC) or the like installed on or coupled to, for example, servers 116, 124, and 132, user devices 104 and 108, and subscription server 112.
[0028] FIG. 2 is a flowchart of outputting usage data and allocation data for server resource management, according to an example.
[0029] In block 202, CPU usage data from a hypervisor may be fetched. The CPU usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a percentile. For example, CPU usage data may comprise CPU usage data at the 99th percentile over a given time period, such as the last hour, last day, or last week. In an example, the CPU usage data may be expressed as a percentage, such as a percentage of maximum CPU capacity on a hypervisor.
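For example, the percentile selection of block 202 might be computed over a window of samples roughly as follows; this is a sketch, and the one-minute sampling interval and the numpy dependency are assumptions, not part of the examples above:

```python
import numpy as np

def cpu_usage_percentile(samples, percentile=99.0):
    """Reduce a window of CPU readings (each a % of hypervisor capacity)
    to a single value at the given percentile."""
    return float(np.percentile(samples, percentile))

# e.g., one sample per minute over the last hour
last_hour = [12.0, 15.5, 11.2, 88.0, 14.1, 13.3] * 10
usage = cpu_usage_percentile(last_hour)  # 99th-percentile CPU usage for the window
```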
[0030] In block 204, memory usage data from a hypervisor is fetched. The memory usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a minimum or maximum memory size used to run the hypervisor, e.g., the 95th percentile of the memory use on a hypervisor when swapping at a rate of at least 10 pages per minute. More specifically, in some examples, the memory usage data may reflect a memory size available to the hypervisor at a given memory swap rate threshold, as discussed below in more detail with respect to FIG. 3. In an example, the memory usage data may be expressed as a percentage, such as a percentage of maximum memory capacity on a hypervisor.
[0031] In block 206, disk usage data, or storage device usage data, from a hypervisor is fetched. The disk usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a current usage or demand on the disk. In an example, the disk usage data may be expressed as a percentage, such as a percentage of maximum disk capacity on a hypervisor.
[0032] In block 208, allocation data for the CPU is calculated or otherwise determined. In an example, the CPU allocation is determined by fetching a CPU capacity, such as the number of cores, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 16 cores on a CPU. In such an example, if an oversubscription value is set to 8, the CPU allocation would be calculated as 128 virtual CPUs. In an example, the CPU allocation may be expressed as a percentage, such as a percentage of maximum CPU allocation on a hypervisor.
[0033] In block 210, allocation data for memory is calculated or otherwise determined. In an example, the memory allocation is determined by fetching a memory capacity, such as the number of gigabytes of memory capacity, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 256 gigabytes of memory. In such an example, if an oversubscription value is set to 2, the memory allocation would be calculated as 512 virtual gigabytes of memory. In an example, the memory allocation may be expressed as a percentage, such as a percentage of maximum memory allocation on a hypervisor.
[0034] In block 212, allocation data for a disk or other storage device is calculated or otherwise determined. In an example, the disk allocation is determined by fetching a disk capacity, such as the number of gigabytes or terabytes of storage capacity, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 10 terabytes of storage. In such an example, if an oversubscription value is set to 1.5, the disk allocation would be calculated as 15 virtual terabytes of disk storage. In an example, the disk allocation may be expressed as a percentage, such as a percentage of maximum disk allocation on a hypervisor.
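Blocks 208 through 212 share the same arithmetic: a resource's capacity multiplied by its oversubscription ratio. A combined sketch using the figures from the examples above:

```python
def allocation(capacity, oversubscription):
    """Allocation limit for one resource: physical capacity * oversubscription ratio."""
    return capacity * oversubscription

cpu_alloc = allocation(16, 8)     # 128 virtual CPUs
mem_alloc = allocation(256, 2)    # 512 virtual gigabytes of memory
disk_alloc = allocation(10, 1.5)  # 15 virtual terabytes of disk storage
```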
[0035] In block 214, the usage data for the CPU, memory, and disk may be plotted to a first axis of a graph, such as a scatter graph, along with the corresponding allocation data for the CPU, memory, and disk to a second axis, as shown in the example of FIG. 5. Resources plotted above or below certain degree lines, e.g., the degree lines of FIG. 5, may indicate that their oversubscription ratios are too high or too low.
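One way to produce the scatter graph of block 214 is sketched below with matplotlib; the axis orientation, percentage units, and the 45-degree reference line are assumptions based on the description of FIG. 5, which is not reproduced here:

```python
import matplotlib.pyplot as plt

def plot_usage_vs_allocation(points):
    """points maps a resource name to (usage %, allocation %) pairs, one per server."""
    fig, ax = plt.subplots()
    for resource, pairs in points.items():
        usage, alloc = zip(*pairs)
        ax.scatter(usage, alloc, label=resource)
    # 45-degree reference line: points far above it suggest allocation outpacing
    # usage (waste); points far below it suggest trending toward performance issues.
    ax.plot([0, 100], [0, 100], linestyle="--", color="gray")
    ax.set_xlabel("Usage (% of capacity)")
    ax.set_ylabel("Allocation (% of limit)")
    ax.legend()
    return fig

fig = plot_usage_vs_allocation({
    "cpu":    [(40, 55), (62, 70)],
    "memory": [(35, 30), (50, 48)],
    "disk":   [(80, 60), (75, 58)],
})
fig.savefig("usage_vs_allocation.png")
```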
[0036] In some examples, the graph plotted or generated in block 214 may indicate when a server or compute node is trending toward performance issues, e.g., when the usage of the server is higher than the allocation. In such cases, the oversubscription value or ratio of the server or individual resource may be lowered to prevent a virtual machine scheduler or assignor from subscribing another virtual machine to the server trending toward performance issues. In some examples, the scheduler may automatically adjust based on the graph plotted in block 214, or a user may be alerted.
[0037] In some examples, the graph plotted or generated in block 214 may indicate when a server or compute node is considered full before its resources are used, e.g., when the usage of the server is lower than the allocation. In such cases, the oversubscription value or ratio of the server or resource may be raised to prevent resources from being wasted. As above, in some examples, the scheduler may automatically adjust based on the graph plotted in block 214, or a user may be alerted.
[0038] In some examples, the graph plotted or generated in block 214 may indicate that, in an example of an oversubscribed single resource, disk usage is higher than disk allocation on a particular hypervisor, but that CPU and memory usage match allocation. In such an example, oversubscription ratios may be altered to balance demand across servers and/or hardware components, or a particular resource may be increased such as by adding additional disk capacity to a cloud service or data center, as opposed to adding an entire physical server. In some examples, the graph may indicate or assist in spreading demand across an environment of diverse physical resources of various ages and capacities.
[0039] In some examples, the graph plotted or generated in block 214 may indicate that no servers or computing nodes are trending toward performance issues, and that no servers or computing nodes are wasting resources. In such examples, the graph may present tight clustering of resource types, demonstrating a good match of CPU, memory, and disk resources relative to demand, i.e., servers are filling uniformly or being assigned virtual machines by a scheduler uniformly.
[0040] In some examples, the data plotted to a graph in block 214 may be plotted with "error ellipses" to simplify the graph, such that the graph does not display hundreds or thousands of data points, but instead ellipses. In such examples, one ellipse may be used for each resource, e.g., for each of CPU, memory, and disk, with each ellipse centered on the average usage and allocation point for all servers. In examples, the ellipse may be sized and rotated to have a minimum area and include a majority of data points for a resource. Statistical techniques may be employed to determine an error ellipse.
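One common statistical technique for such an error ellipse is a confidence ellipse derived from the sample covariance of a resource's (usage, allocation) points; the sketch below uses that approach, noting that the examples above do not prescribe a particular method:

```python
import numpy as np

def error_ellipse(points, n_std=2.0):
    """Fit one ellipse per resource from the sample covariance of its points.

    points: iterable of (usage, allocation) pairs for one resource.
    Returns (center, width, height, angle_degrees), which map directly
    onto matplotlib.patches.Ellipse for drawing."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)                   # average usage/allocation point
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # principal axes of the point cloud
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    width, height = 2.0 * n_std * np.sqrt(eigvals[::-1])  # major, minor diameters
    return center, width, height, angle
```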
[0041] FIG. 3 is a flowchart of determining a memory block size at a given memory swap rate threshold.
[0042] In block 302, memory allocation is locked within a server. Locking memory allocation may comprise locking additional memory from being allocated to a hypervisor or container. For example, if a hypervisor is currently assigned 16 gigabytes of memory, in block 302, the hypervisor may be locked from requesting additional memory.
[0043] In block 304, a current memory swap rate is measured. A swap rate may be the number of memory blocks or pages swapped from memory to disk over a time period, such as per second.
[0044] In block 306, a memory swap rate threshold is fetched. The threshold may be a desired or pre-configured maximum number of pages swapped per second. For example, the threshold may limit the number of pages swapped in per second to 20 to minimize impact on system performance.
[0045] In block 308, the memory block size available to the hypervisor is decreased. For example, if the memory block size assigned to the hypervisor is 16 gigabytes, the block size may be decreased by one quarter of a gigabyte, or 256 megabytes, or another increment.
[0046] In block 310, the memory block size available to the hypervisor is decreased in a loop and the current memory swap rate is measured until the memory swap rate threshold is met. In an example, a hypervisor may have 10 pages swapped per second at 16 gigabytes of memory, but swap 15 pages per second when the memory block allocation is decreased by 256 megabytes. When the memory block allocation is decreased by another 256 megabytes, the memory swap rate may increase to 20 pages per second, meeting the pre-determined threshold. In such an event, the flow of blocks 308 to 310 may proceed to block 312.
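The loop of blocks 308 through 310 can be sketched as follows; measure_swap_rate and resize_memory are hypothetical callbacks standing in for the platform-specific operations the examples above leave unspecified:

```python
def find_memory_block_size(measure_swap_rate, resize_memory,
                           start_gb=16.0, step_gb=0.25, threshold_pps=20):
    """Shrink the memory block available to a hypervisor until the swap
    rate (pages per second) meets the threshold, then report the size."""
    size_gb = start_gb
    while measure_swap_rate() < threshold_pps and size_gb > step_gb:
        size_gb -= step_gb          # e.g. 16.0 -> 15.75 -> 15.5 ...
        resize_memory(size_gb)
    return size_gb                  # 15.5 GB in the worked example above
```

With the worked example's numbers (10, then 15, then 20 pages per second), the loop stops after two decrements and returns 15.5 gigabytes, the value block 312 passes on to the calculation of block 204.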
[0047] In block 312, the memory block size available to the hypervisor after blocks 308 and 310 is output. For example, the memory block size may be output as an input to a calculation, such as the calculation or determination described in block 204 above. In the example above, the memory block size would be output as 15.5 gigabytes.
[0048] FIG. 4 is a block diagram of a system to output usage data and allocation data for server resource management, according to an example.
[0049] The computing system 400 of FIG. 4 may comprise a processing resource or processor 402. As used herein, a processing resource may be at least one of a central processing unit (CPU), a semiconductor-based microprocessor, a graphics processing unit (GPU), a field-programmable gate array (FPGA) configured to retrieve and execute instructions, other electronic circuitry suitable for the retrieval and execution of instructions stored on a machine-readable storage medium, or a combination thereof. Processing resource 402 may fetch, decode, and execute instructions, e.g., instructions 410, stored on memory or storage medium 404 to perform the functionalities described herein. In examples, the functionalities of any of the instructions of storage medium 404 may be implemented in the form of electronic circuitry, in the form of executable instructions encoded on a machine-readable storage medium, or a combination thereof.
[0050] As used herein, a "machine-readable storage medium" may be any electronic, magnetic, optical, or other physical storage apparatus to contain or store information such as executable instructions, data, and the like. For example, any machine-readable storage medium described herein may be any of Random Access Memory (RAM), volatile memory, non-volatile memory, flash memory, a hard drive, a solid state drive, any type of storage disc or optical disc, and the like, or a combination thereof. Further, any machine-readable storage medium described herein may be non-transitory.
[0051] System 400 may also include persistent storage and/or memory. In some examples, persistent storage may be implemented by at least one non-volatile machine-readable storage medium, as described herein, and may be memory utilized by system 400 for managing server resources, as described herein. Memory may be implemented by at least one machine-readable storage medium, as described herein, and may be volatile storage utilized by system 400 for performing the processes as described herein, for example. Storage 404 may be separate from a memory. In some examples, a memory may temporarily store data portions while performing processing operations on them, such as calculating an early adopter probability.
[0052] In examples described herein, a machine-readable storage medium or media is part of an article or article of manufacture. An article or article of manufacture may refer to any manufactured single component or multiple components. The storage medium may be located either in the computing device executing the machine-readable instructions, or remote from but accessible to the computing device (e.g., via a computer network) for execution.
[0053] In some examples, instructions 410 may be part of an installation package that, when installed, may be executed by processing resource 402 to implement the functionalities described herein in relation to instructions 410. In such examples, storage medium 404 may be a portable medium or flash drive, or a memory maintained by a server from which the installation package can be downloaded and installed. In other examples, instructions 410 may be part of an application, applications, or component(s) already installed on a computing device 104, 108, 112, 116, 124, or 132 including a processing resource.
[0054] System 400 may also include a network interface device 408, as described above, which may receive data such as data 412-418, e.g., via a network.
[0055] The instructions in or on the memory or machine-readable storage of system 400 may comprise a subscription engine 112 and/or subscription database 114. In block 410, the instructions may fetch CPU, memory, and disk usage from a container and calculate allocation data for the CPU, memory, and disk based on container capacities multiplied by an oversubscription value. In an example, the usage data and the container data may be plotted, e.g., onto a first axis and a second axis of a scatter graph, or otherwise output for visual or non-visual display.
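Tying instructions 410 together, an end-to-end sketch might pair each container's fetched usage with its calculated allocation before plotting; fetch_usage is a hypothetical stand-in for whatever monitoring API or SDK the deployment exposes:

```python
def container_report(containers, fetch_usage, ratios):
    """Build (usage, allocation) pairs per resource for every container.

    fetch_usage(container) is assumed to return a dict such as
    {"cpu": ..., "memory": ..., "disk": ...}; ratios holds the
    per-resource oversubscription values."""
    rows = []
    for c in containers:
        usage = fetch_usage(c)
        alloc = {res: c.capacity[res] * ratios[res]
                 for res in ("cpu", "memory", "disk")}
        rows.append((usage, alloc))
    return rows  # one (usage, allocation) pair per container, ready to plot
```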
[0056] Although the instructions of FIGS. 2-4 show a specific order of performance of certain functionalities, the instructions of FIG. 4 are not limited to that order. For example, the functionalities shown in succession may be performed in a different order, may be executed concurrently or with partial concurrence, or a combination thereof.
[0057] All of the features disclosed in this specification, including any accompanying claims, abstract and drawings, and/or all of the elements of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or elements are mutually exclusive.

Claims

CLAIMS

What is claimed is:
1. A system for server resource management, comprising:
a cloud subscription database to receive, from a hypervisor, central processing unit (CPU) usage data, memory usage data, and disk usage data; and
a cloud subscription engine to:
calculate a CPU allocation based on a hypervisor CPU capacity multiplied by a CPU oversubscription value,
calculate a memory allocation based on a hypervisor memory capacity multiplied by a memory oversubscription value,
calculate a disk allocation based on a hypervisor disk capacity multiplied by a disk oversubscription value, and
plot the CPU usage data, memory usage data, and disk usage data to a first axis, and plot the CPU allocation, memory allocation, and disk allocation to a second axis.
2. The system of claim 1, wherein the CPU usage data is based on a predetermined percentile.
3. The system of claim 1, wherein the memory usage data is based on a memory block size available to the hypervisor at a given memory swap rate threshold.
4. The system of claim 1, wherein the disk usage data is based on a current demand.
5. The system of claim 1, wherein the hypervisor is to run a virtual machine.
6. The system of claim 1, wherein the engine is to output a scatter graph comprising the first axis and the second axis.
7. The system of claim 1, wherein the engine is to assign a new virtual machine to a hypervisor based on the CPU usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
8. The system of claim 1, wherein the engine is to alert a user of an oversubscription based on the CPU usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
9. A method for server resource management, comprising:
locking memory allocation within a server;
measuring a current memory swap rate;
fetching a memory swap rate threshold; and
decreasing a memory block size available to a hypervisor and measuring the current memory swap rate until the memory swap rate threshold is met.
10. The method of claim 9, wherein the memory swap rate threshold is based on a pre-determined memory performance metric.
11. An article comprising at least one non-transitory machine-readable storage medium comprising instructions executable by a processing resource of a server resource management system to:
fetch central processing unit (CPU) usage data, memory usage data, and disk usage data for a container;
calculate a CPU allocation based on a container CPU capacity and an oversubscription value;
calculate a memory allocation based on a container memory capacity and an oversubscription value;
calculate a disk allocation based on a container disk capacity and an oversubscription value; and
plot the CPU usage data, memory usage data, and disk usage data to a first axis, and plot the CPU allocation, memory allocation, and disk allocation to a second axis,
wherein the container memory capacity is based on a memory block size available to the container at a given memory swap rate threshold.
12. The article of claim 11, wherein the container is based on a shared operating system.
13. The article of claim 11, wherein the instructions are further to plot the first axis and the second axis to a scatter graph.
14. The article of claim 11, wherein the instructions are further to assign a new container based on the usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
15. The article of claim 11, wherein the instructions are further to alert a user of an oversubscription based on the usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
PCT/US2015/043054 2015-07-31 2015-07-31 Server resource management WO2017023237A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Publications (1)

Publication Number Publication Date
WO2017023237A1 (en)

Family

ID=57943334

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Country Status (1)

Country Link
WO (1) WO2017023237A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004012038A2 (en) * 2002-07-25 2004-02-05 Surgient Networks Near on-line servers
US20090094355A1 (en) * 2007-10-09 2009-04-09 International Business Machines Corporation Integrated capacity and architecture design tool
US20120072910A1 (en) * 2010-09-03 2012-03-22 Time Warner Cable, Inc. Methods and systems for managing a virtual data center with embedded roles based access control
WO2013149343A1 (en) * 2012-04-03 2013-10-10 Gridcentric Inc. Method and system for memory oversubscription for virtual machines
WO2014094472A1 (en) * 2012-12-17 2014-06-26 华为技术有限公司 Global memory sharing method and device and communication system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11438244B2 (en) * 2019-10-17 2022-09-06 Dell Products L.P. System and method to monitor usage of information handling system using baseboard management controller
CN113821336A (en) * 2021-03-08 2021-12-21 北京京东乾石科技有限公司 Resource allocation method and device, storage medium and electronic equipment
CN113821336B (en) * 2021-03-08 2024-04-05 北京京东乾石科技有限公司 Resource allocation method and device, storage medium and electronic equipment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 15900506
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 15900506
Country of ref document: EP
Kind code of ref document: A1