WO2017023237A1 - Server resource management - Google Patents

Server resource management

Info

Publication number
WO2017023237A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
allocation
usage data
disk
cpu
Prior art date
Application number
PCT/US2015/043054
Other languages
French (fr)
Inventor
Wade J. SATTERFIELD
Tyler Easterling
Pieter C. KRUITHOF
Zachary D. SCHRAG
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development LP
Priority to PCT/US2015/043054
Publication of WO2017023237A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533: Hypervisors; Virtual machine monitors
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources
    • G06F9/5077: Logical partitioning of resources; Management or configuration of virtualized resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533: Hypervisors; Virtual machine monitors
    • G06F9/45558: Hypervisor-specific management and integration aspects
    • G06F2009/45583: Memory management, e.g. access or allocation


Abstract

Examples for managing resources on a server or servers include receiving data from a physical server or servers hosting a hypervisor, container, virtual machine, and/or virtualized operating system. In an example, CPU usage data, memory usage data, and disk usage data for a hypervisor and/or container are received, and a CPU allocation, memory allocation, and disk allocation are calculated based on a capacity or capacities of the CPU, memory, and disk multiplied by an oversubscription value or ratio. In an example, the CPU usage data, memory usage data, and disk usage data are plotted or otherwise output to a first axis, and the CPU allocation, memory allocation, and disk allocation are plotted or otherwise output to a second axis of, for example, a scatter graph.

Description

SERVER RESOURCE MANAGEMENT
BACKGROUND
[0001] Computing systems, devices, and electronic components such as servers may run or execute in a networked environment and may run or support virtual machines or virtualized software, e.g., an operating system or a shared operating system. A virtual machine may be used to run applications, services, and computer programs, and/or to provide other functionality to a host or remote computer. In a data center, networked, or "cloud" environment, a plurality of virtual machines may be resident on or across a plurality of physical servers, such that the computing load or demand of the virtual machines may be distributed across a physical hardware deployment.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The following detailed description references the drawings, wherein:
[0003] FIG. 1 is a block diagram of a system to manage server resources, according to an example;
[0004] FIG. 2 is a flowchart of outputting usage data and allocation data for server resource management, according to an example;
[0005] FIG. 3 is a flowchart of determining a memory block size at a given memory swap rate threshold, according to an example;
[0006] FIG. 4 is a block diagram of a system to output usage data and allocation data for server resource management, according to an example; and
[0007] FIG. 5 is a visual output of usage data and allocation data for server resource management, according to an example.
DETAILED DESCRIPTION
[0008] Various examples described below provide for managing resources on servers in a cloud service or in a data center. In an example, data from a physical server or servers hosting a hypervisor, container, virtual machine, and/or virtualized operating system is received. More specifically, in an example, CPU usage data, memory usage data, and disk usage data for a hypervisor and/or container (or server hosting a container) are received, and a CPU allocation, memory allocation, and disk allocation are calculated based on a capacity or capacities of the CPU, memory, and disk multiplied by an oversubscription value or ratio. In an example, the CPU usage data, memory usage data, and disk usage data are plotted or otherwise output to a first axis, and the CPU allocation, memory allocation, and disk allocation are plotted or otherwise output to a second axis of, for example, a scatter graph.
[0009] Companies, organizations, and information technology departments continuously seek ways to improve computing performance and reduce computing budgets and expenditures. One way to meet this goal may be to employ a strategy utilizing virtual machines, which may allow for server consolidation resulting in lower purchase and maintenance costs for computing hardware.
[0010] A virtual machine may be, for example, software and/or hardware-based emulation of a physical machine, e.g., a computer, or more specifically of an operating system. A virtual machine can be hosted by a host system that may include a physical server and/or a physical machine running a hypervisor, container, and/or other virtual machine software.
[0011] A hypervisor or container may be, for example, software that provides a virtualized environment including virtual machines allowing other software, including operating systems, to run on the host machine. A hypervisor or container may allow a virtual machine to access underlying system hardware and/or the system BIOS. In some examples, a physical server may run one hypervisor, which itself may run a plurality of virtual machines.
[0012] In a cloud environment or data center, a service may employ a plurality of physical servers, with each of the physical servers running a hypervisor or container and a plurality of virtual machines. As the load or demand on a virtual machine at any given time may vary, servers may be "oversubscribed" such that the number of hypervisors or containers assigned or subscribed to a physical server would exceed the physical capacity of the server if all of the subscribed virtual machines were to have a full computational demand at once. Oversubscription may be expressed as an oversubscription ratio, e.g., the ratio of virtual hardware resources to physical hardware resources.
[0013] In some examples, each resource may have its own oversubscription ratio, as the oversubscription ratio for disk may be very different from the oversubscription ratio for CPU. A physical hardware resource value multiplied by the oversubscription ratio for the resource may determine a limit for allocation of that resource. In an example, when a new allocation would bring the total for a server above this limit, the new load may not be placed on that server. If no server can accept the load, then the cloud is considered full and the virtual machine or container is not created.
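As an illustration of the limit check just described, a minimal sketch follows; all names (Server, can_place, place) and the dictionary-based resource model are illustrative assumptions, since the examples herein do not prescribe an implementation:

```python
from dataclasses import dataclass

@dataclass
class Server:
    capacity: dict   # physical capacity per resource, e.g. {"cpu": 16, "memory": 256, "disk": 10000}
    ratio: dict      # oversubscription ratio per resource, e.g. {"cpu": 8, "memory": 2, "disk": 1.5}
    allocated: dict  # virtual units already allocated per resource

def can_place(server, request):
    """A new load fits only if every resource stays at or under capacity * ratio."""
    return all(
        server.allocated[res] + amount <= server.capacity[res] * server.ratio[res]
        for res, amount in request.items()
    )

def place(servers, request):
    """Place the load on the first server that can accept it; None means the cloud is full."""
    for server in servers:
        if can_place(server, request):
            for res, amount in request.items():
                server.allocated[res] += amount
            return server
    return None  # no server can accept the load: the VM or container is not created
```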
[0014] While an arrangement including the ability to oversubscribe servers may lower the cost of operating a cloud service or data center, oversubscription may present challenges to server administrators. Setting the ratios too high may risk performance problems, while setting the ratios too low makes for a more expensive cloud. Monitoring or visualization of CPU, memory, and disk usage data measured against allocation data, e.g., based on an oversubscription ratio and/or various calculations, may assist a cloud administrator in balancing resource demand within a cloud service or data center to avoid server issues such as performance degradation or crashes, and to avoid wasting resources with undersubscription.
[0015] Such monitoring or visualization may also assist a tuning tool, such as a scheduler, that assigns or re-assigns new or existing virtual machines to hypervisors, containers, or physical hardware. Similarly, such monitoring or visualization may provide a state-of-the-cloud view so an administrator can ensure oversubscription ratios are properly set, and can balance and size resources. In such a view, CPU, memory, and disk resources may be monitored independently, to minimize the impact on a physical server when one resource runs out of capacity before other resources on the same physical hardware.
[0016] FIG. 1 is a block diagram of a system to manage server resources, according to an example.
[0017] A cloud service or data center (hereinafter "cloud service") may refer to a collection of servers and other computing devices which may be on-site, off-site, private, public, co-located, or located across a geographic area or areas. A cloud service may be used, for example, to host, execute, store, or otherwise process applications, websites, files, media, and other digital files, software, or machine-readable instructions.
[0018] A cloud service may comprise or communicate with computing devices such as servers, blade enclosures, workstations, desktop computers, laptops or notebook computers, point of sale devices, tablet computers, mobile phones, smart devices, or any other processing device or equipment including a processing resource. In examples described herein, a processing resource may include, for example, one processor or multiple processors included in a single computing device or distributed across multiple computing devices.
[0019] In the example of FIG. 1, servers or compute nodes 116, 124, and 132 may be computing devices that include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, as discussed below in more detail with respect to FIGS. 2-4. In some examples, the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0020] Servers 116, 124, and 132 may comprise a hypervisor, container, or other virtualization technology to store, execute, serve, and/or run a virtual machine or virtual machines, e.g., virtual machines 118-122, 126-130, and 134-138. As discussed above, a hypervisor or container may allow a virtual machine to access underlying system hardware and/or the system BIOS, and may report usage data of the physical servers 116, 124, and 132. Usage data may include data related to the performance, responsiveness, throughput, or other metrics of the CPU, memory, disk, or other hardware or software components of a physical server hosting a hypervisor or container. In some examples, usage data may also comprise event data, such as user interface responsiveness data, network error data, and/or application crash data.
[0021] Usage of hardware components on servers 116, 124, and 132 may also be collected or fetched by a monitoring tool, which may include or communicate with a software development kit ("SDK") or application programming interface ("API") or other tool resident on the servers. In other examples, servers 116, 124, and 132 may be monitored by another device, server, or process, such as an external monitoring tool. In such examples, the external monitoring tool may query the servers directly, such as through a "pull" configuration, or may monitor data exported or transmitted from the devices, such as in a "push" configuration.
[0022] Data monitored on servers 116, 124, and 132 may be received, processed, or analyzed on, for example, subscription server or engine 112, which may be communicatively coupled to a database such as subscription database 114. As with servers 116, 124, and 132, subscription server 112 may include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, and the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0023] Data received, processed, or analyzed on, for example, subscription server or engine 112 may be used on the subscription server, or another server, to determine, calculate, or plot data related to usage and allocation of servers such as servers 116, 124, and 132. As discussed below in more detail, the data plotted may be based on usage and allocation data from CPUs, memory, disk, and/or other components of servers 116, 124, and 132.
[0024] In the example of FIG. 1, laptop 104 and smartphone 108 (hereinafter "user devices") may be computing devices that may include a processing resource and a machine-readable storage medium comprising or encoded with instructions executable by the processing resource, as discussed below in more detail with respect to FIGS. 2-4. In some examples, the instructions may be implemented as engines or circuitry comprising any combination of hardware and programming to implement the functionalities of the engines or circuitry, as described below.
[0025] Laptop 104 and/or smartphone 108 may run applications such as word processing tools, spreadsheet tools, presentation tools, programming tools, communications tools, utilities, games, or other applications. In the example of FIG. 1, laptop 104 and/or smartphone 108 may display, visually or non-visually, the usage and allocation data output by subscription server 112. In one example, laptop 104 and/or smartphone 108 may display such data as a scatter graph or other graphs, e.g., graphs 106 and 110.
[0026] In the example of FIG. 1, servers 116, 124, and 132, user devices 104 and 108, and subscription server 112 may communicate over network 102. Network 102 may be a local network, private network, public network, or other wired or wireless communications network accessible by a user or users. As used herein, a network or computer network may include, for example, a local area network (LAN), a wireless local area network (WLAN), a virtual private network (VPN), the Internet, a cellular network, or a combination thereof.
[0027] In examples described herein, devices that communicate on network 102 may include a network interface device that may be a hardware device to communicate over at least one computer network. In some examples, a network interface may be a network interface card (NIC) or the like installed on or coupled to, for example, servers 116, 124, and 132, user devices 104 and 108, and subscription server 112.
[0028] FIG. 2 is a flowchart of outputting usage data and allocation data for server resource management, according to an example.
[0029] In block 202, CPU usage data from a hypervisor may be fetched. The CPU usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a percentile. For example, CPU usage data may comprise CPU usage data at the 99th percentile over a given time period, such as the last hour, last day, or last week. In an example, the CPU usage data may be expressed as a percentage, such as a percentage of maximum CPU capacity on a hypervisor.
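For example, the percentile selection of block 202 might be computed over a window of samples roughly as follows; this is a sketch, and the one-minute sampling interval and the numpy dependency are assumptions, not part of the examples above:

```python
import numpy as np

def cpu_usage_percentile(samples, percentile=99.0):
    """Reduce a window of CPU readings (each a % of hypervisor capacity)
    to a single value at the given percentile."""
    return float(np.percentile(samples, percentile))

# e.g., one sample per minute over the last hour
last_hour = [12.0, 15.5, 11.2, 88.0, 14.1, 13.3] * 10
usage = cpu_usage_percentile(last_hour)  # 99th-percentile CPU usage for the window
```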
[0030] In block 204, memory usage data from a hypervisor is fetched. The memory usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a minimum or maximum memory size used to run the hypervisor, e.g., the 95th percentile of the memory use on a hypervisor when swapping at a rate of at least 10 pages per minute. More specifically, in some examples, the memory usage data may reflect a memory size available to the hypervisor at a given memory swap rate threshold, as discussed below in more detail with respect to FIG. 3. In an example, the memory usage data may be expressed as a percentage, such as a percentage of maximum memory capacity on a hypervisor.
[0031] In block 206, disk usage data, or storage device usage data, from a hypervisor is fetched. The disk usage data may be fetched, measured, grouped, organized, or otherwise selected based on, for example, a current usage or demand on the disk. In an example, the disk usage data may be expressed as a percentage, such as a percentage of maximum disk capacity on a hypervisor.
[0032] In block 208, allocation data for the CPU is calculated or otherwise determined. In an example, the CPU allocation is determined by fetching a CPU capacity, such as the number of cores, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 16 cores on a CPU. In such an example, if an oversubscription value is set to 8, the CPU allocation would be calculated as 128 virtual CPUs. In an example, the CPU allocation may be expressed as a percentage, such as a percentage of maximum CPU allocation on a hypervisor.
[0033] In block 210, allocation data for memory is calculated or otherwise determined. In an example, the memory allocation is determined by fetching a memory capacity, such as the number of gigabytes of memory capacity, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 256 gigabytes of memory. In such an example, if an oversubscription value is set to 2, the memory allocation would be calculated as 512 virtual gigabytes of memory. In an example, the memory allocation may be expressed as a percentage, such as a percentage of maximum memory allocation on a hypervisor.
[0034] In block 212, allocation data for a disk or other storage device is calculated or otherwise determined. In an example, the disk allocation is determined by fetching a disk capacity, such as the number of gigabytes or terabytes of storage capacity, and multiplying the capacity by an oversubscription value or ratio. In an example, a physical server may have a capacity of 10 terabytes of storage. In such an example, if an oversubscription value is set to 1.5, the disk allocation would be calculated as 15 virtual terabytes of disk storage. In an example, the disk allocation may be expressed as a percentage, such as a percentage of maximum disk allocation on a hypervisor.
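Blocks 208 through 212 share the same arithmetic: a resource's capacity multiplied by its oversubscription ratio. A combined sketch using the figures from the examples above:

```python
def allocation(capacity, oversubscription):
    """Allocation limit for one resource: physical capacity * oversubscription ratio."""
    return capacity * oversubscription

cpu_alloc = allocation(16, 8)     # 128 virtual CPUs
mem_alloc = allocation(256, 2)    # 512 virtual gigabytes of memory
disk_alloc = allocation(10, 1.5)  # 15 virtual terabytes of disk storage
```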
[0035] In block 214, the usage data for the CPU, memory, and disk may be plotted to a first axis of a graph, such as a scatter graph, along with the corresponding allocation data for the CPU, memory, and disk to a second axis, as shown in the example of FIG. 5. Resources plotted above or below certain degree lines, e.g., the degree lines of FIG. 5, may indicate that their oversubscription ratios are too high or too low.
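One way to produce the scatter graph of block 214 is sketched below with matplotlib; the axis orientation, percentage units, and the 45-degree reference line are assumptions based on the description of FIG. 5, which is not reproduced here:

```python
import matplotlib.pyplot as plt

def plot_usage_vs_allocation(points):
    """points maps a resource name to (usage %, allocation %) pairs, one per server."""
    fig, ax = plt.subplots()
    for resource, pairs in points.items():
        usage, alloc = zip(*pairs)
        ax.scatter(usage, alloc, label=resource)
    # 45-degree reference line: points far above it suggest allocation outpacing
    # usage (waste); points far below it suggest trending toward performance issues.
    ax.plot([0, 100], [0, 100], linestyle="--", color="gray")
    ax.set_xlabel("Usage (% of capacity)")
    ax.set_ylabel("Allocation (% of limit)")
    ax.legend()
    return fig

fig = plot_usage_vs_allocation({
    "cpu":    [(40, 55), (62, 70)],
    "memory": [(35, 30), (50, 48)],
    "disk":   [(80, 60), (75, 58)],
})
fig.savefig("usage_vs_allocation.png")
```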
[0036] In some examples, the graph plotted or generated in block 214 may indicate when a server or compute node is trending toward performance issues, e.g., when the usage of the server is higher than the allocation. In such cases, the oversubscription value or ratio of the server or individual resource may be lowered to prevent a virtual machine scheduler or assignor from subscribing another virtual machine to the server trending toward performance issues. In some examples, the scheduler may automatically adjust based on the graph plotted in block 214, or a user may be alerted.
[0037] In some examples, the graph plotted or generated in block 214 may indicate when a server or compute node is considered full before its resources are used, e.g., when the usage of the server is lower than the allocation. In such cases, the oversubscription value or ratio of the server or resource may be raised to prevent resources from being wasted. As above, in some examples, the scheduler may automatically adjust based on the graph plotted in block 214, or a user may be alerted.
[0038] In some examples, the graph plotted or generated in block 214 may indicate that, in an example of an oversubscribed single resource, disk usage is higher than disk allocation on a particular hypervisor, but that CPU and memory usage match allocation. In such an example, oversubscription ratios may be altered to balance demand across servers and/or hardware components, or a particular resource may be increased such as by adding additional disk capacity to a cloud service or data center, as opposed to adding an entire physical server. In some examples, the graph may indicate or assist in spreading demand across an environment of diverse physical resources of various ages and capacities.
[0039] In some examples, the graph plotted or generated in block 214 may indicate that no servers or computing nodes are trending toward performance issues, and that no servers or computing nodes are wasting resources. In such examples, the graph may present tight clustering of resource types, demonstrating a good match of CPU, memory, and disk resources relative to demand, i.e., servers are filling uniformly or being assigned virtual machines by a scheduler uniformly.
[0040] In some examples, the data plotted to a graph in block 214 may be plotted with "error ellipses" to simplify the graph, such that the graph does not display hundreds or thousands of data points, but instead ellipses. In such examples, one ellipse may be used for each resource, e.g., for each of CPU, memory, and disk, with each ellipse centered on the average usage and allocation point for all servers. In examples, the ellipse may be sized and rotated to have a minimum area and include a majority of data points for a resource. Statistical techniques may be employed to determine an error ellipse.
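One common statistical technique for such an error ellipse is a confidence ellipse derived from the sample covariance of a resource's (usage, allocation) points; the sketch below uses that approach, noting that the examples above do not prescribe a particular method:

```python
import numpy as np

def error_ellipse(points, n_std=2.0):
    """Fit one ellipse per resource from the sample covariance of its points.

    points: iterable of (usage, allocation) pairs for one resource.
    Returns (center, width, height, angle_degrees), which map directly
    onto matplotlib.patches.Ellipse for drawing."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)                   # average usage/allocation point
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # principal axes of the point cloud
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    width, height = 2.0 * n_std * np.sqrt(eigvals[::-1])  # major, minor diameters
    return center, width, height, angle
```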
[0041] FIG. 3 is a flowchart of determining a memory block size at a given memory swap rate threshold.
[0042] In block 302, memory allocation is locked within a server. Locking memory allocation may comprise locking additional memory from being allocated to a hypervisor or container. For example, if a hypervisor is currently assigned 16 gigabytes of memory, in block 302, the hypervisor may be locked from requesting additional memory.
[0043] In block 304, a current memory swap rate is measured. A swap rate may be the number of memory blocks or pages swapped from memory to disk over a time period, such as per second.
[0044] In block 306, a memory swap rate threshold is fetched. The threshold may be a desired or pre-configured maximum number of pages swapped per second. For example, the threshold may limit the number of pages swapped in per second to 20 to minimize impact on system performance.
[0045] In block 308, the memory block size available to the hypervisor is decreased. For example, if the memory block size assigned to the hypervisor is 16 gigabytes, the block size may be decreased by one quarter of a gigabyte, or 256 megabytes, or another increment.
[0046] In block 310, the memory block size available to the hypervisor is decreased in a loop and the current memory swap rate is measured until the memory swap rate threshold is met. In an example, a hypervisor may have 10 pages swapped per second at 16 gigabytes of memory, but swap 15 pages per second when the memory block allocation is decreased by 256 megabytes. When the memory block allocation is decreased by another 256 megabytes, the memory swap rate may increase to 20 pages per second, meeting the pre-determined threshold. In such an event, the flow of blocks 308 to 310 may proceed to block 312.
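The loop of blocks 308 through 310 can be sketched as follows; measure_swap_rate and resize_memory are hypothetical callbacks standing in for the platform-specific operations the examples above leave unspecified:

```python
def find_memory_block_size(measure_swap_rate, resize_memory,
                           start_gb=16.0, step_gb=0.25, threshold_pps=20):
    """Shrink the memory block available to a hypervisor until the swap
    rate (pages per second) meets the threshold, then report the size."""
    size_gb = start_gb
    while measure_swap_rate() < threshold_pps and size_gb > step_gb:
        size_gb -= step_gb          # e.g. 16.0 -> 15.75 -> 15.5 ...
        resize_memory(size_gb)
    return size_gb                  # 15.5 GB in the worked example above
```

With the worked example's numbers (10, then 15, then 20 pages per second), the loop stops after two decrements and returns 15.5 gigabytes, the value block 312 passes on to the calculation of block 204.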
[0047] In block 312, the memory block size available to the hypervisor after blocks 308 and 310 is output. For example, the memory block size may be output as an input to a calculation, such as the calculation or determination described in block 204 above. In the example above, the memory block size would be output as 15.5 gigabytes.
[0048] FIG. 4 is a block diagram of a system to output usage data and allocation data for server resource management, according to an example.
[0049] The computing system 400 of FIG. 4 may comprise a processing resource or processor 402. As used herein, a processing resource may be at least one of a central processing unit (CPU), a semiconductor-based microprocessor, a graphics processing unit (GPU), a field-programmable gate array (FPGA) configured to retrieve and execute instructions, other electronic circuitry suitable for the retrieval and execution of instructions stored on a machine-readable storage medium, or a combination thereof. Processing resource 402 may fetch, decode, and execute instructions, e.g., instructions 410, stored on memory or storage medium 404 to perform the functionalities described herein. In examples, the functionalities of any of the instructions of storage medium 404 may be implemented in the form of electronic circuitry, in the form of executable instructions encoded on a machine-readable storage medium, or a combination thereof.
[0050] As used herein, a "machine-readable storage medium" may be any electronic, magnetic, optical, or other physical storage apparatus to contain or store information such as executable instructions, data, and the like. For example, any machine-readable storage medium described herein may be any of Random Access Memory (RAM), volatile memory, non-volatile memory, flash memory, a hard drive, a solid state drive, any type of storage disc or optical disc, and the like, or a combination thereof. Further, any machine-readable storage medium described herein may be non-transitory.
[0051] System 400 may also include persistent storage and/or memory. In some examples, persistent storage may be implemented by at least one non-volatile machine-readable storage medium, as described herein, and may be memory utilized by system 400 for managing server resources, as described herein. Memory may be implemented by at least one machine-readable storage medium, as described herein, and may be volatile storage utilized by system 400 for performing the processes as described herein, for example. Storage 404 may be separate from a memory. In some examples, a memory may temporarily store data portions while performing processing operations on them, such as calculating an early adopter probability.
[0052] In examples described herein, a machine-readable storage medium or media is part of an article or article of manufacture. An article or article of manufacture may refer to any manufactured single component or multiple components. The storage medium may be located either in the computing device executing the machine-readable instructions, or remote from but accessible to the computing device (e.g., via a computer network) for execution.
[0053] In some examples, instructions 410 may be part of an installation package that, when installed, may be executed by processing resource 402 to implement the functionalities described herein in relation to instructions 410. In such examples, storage medium 404 may be a portable medium or flash drive, or a memory maintained by a server from which the installation package can be downloaded and installed. In other examples, instructions 410 may be part of an application, applications, or component(s) already installed on a computing device 104, 108, 112, 116, 124, or 132 including a processing resource.
[0054] System 400 may also include a network interface device 408, as described above, which may receive data such as data 412-418, e.g., via a network.
[0055] The instructions in or on the memory or machine-readable storage of system 400 may comprise a subscription engine 112 and/or subscription database 114. In block 410, the instructions may fetch CPU, memory, and disk usage from a container and calculate allocation data for the CPU, memory, and disk based on container capacities multiplied by an oversubscription value. In an example, the usage data and the container data may be plotted, e.g., onto a first axis and a second axis of a scatter graph, or otherwise output for visual or non-visual display.
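Tying instructions 410 together, an end-to-end sketch might pair each container's fetched usage with its calculated allocation before plotting; fetch_usage is a hypothetical stand-in for whatever monitoring API or SDK the deployment exposes:

```python
def container_report(containers, fetch_usage, ratios):
    """Build (usage, allocation) pairs per resource for every container.

    fetch_usage(container) is assumed to return a dict such as
    {"cpu": ..., "memory": ..., "disk": ...}; ratios holds the
    per-resource oversubscription values."""
    rows = []
    for c in containers:
        usage = fetch_usage(c)
        alloc = {res: c.capacity[res] * ratios[res]
                 for res in ("cpu", "memory", "disk")}
        rows.append((usage, alloc))
    return rows  # one (usage, allocation) pair per container, ready to plot
```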
[0056] Although the instructions of FIGS. 2-4 show a specific order of performance of certain functionalities, the instructions of FIG. 4 are not limited to that order. For example, the functionalities shown in succession may be performed in a different order, may be executed concurrently or with partial concurrence, or a combination thereof.
[0057] All of the features disclosed in this specification, including any accompanying claims, abstract and drawings, and/or all of the elements of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or elements are mutually exclusive.

Claims

CLAIMS

What is claimed is:
1. A system for server resource management, comprising:
a cloud subscription database to receive, from a hypervisor, central processing unit (CPU) usage data, memory usage data, and disk usage data; and
a cloud subscription engine to:
calculate a CPU allocation based on a hypervisor CPU capacity multiplied by a CPU oversubscription value,
calculate a memory allocation based on a hypervisor memory capacity multiplied by a memory oversubscription value,
calculate a disk allocation based on a hypervisor disk capacity multiplied by a disk oversubscription value, and
plot the CPU usage data, memory usage data, and disk usage data to a first axis, and plot the CPU allocation, memory allocation, and disk allocation to a second axis.
2. The system of claim 1, wherein the CPU usage data is based on a predetermined percentile.
3. The system of claim 1, wherein the memory usage data is based on a memory block size available to the hypervisor at a given memory swap rate threshold.
4. The system of claim 1, wherein the disk usage data is based on a current demand.
5. The system of claim 1, wherein the hypervisor is to run a virtual machine.
6. The system of claim 1, wherein the engine is to output a scatter graph comprising the first axis and the second axis.
7. The system of claim 1, wherein the engine is to assign a new virtual machine to a hypervisor based on the CPU usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
8. The system of claim 1, wherein the engine is to alert a user of an oversubscription based on the CPU usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
9. A method for server resource management, comprising:
locking memory allocation within a server;
measuring a current memory swap rate;
fetching a memory swap rate threshold; and
decreasing a memory block size available to a hypervisor and measuring the current memory swap rate until the memory swap rate threshold is met.
10. The method of claim 9, wherein the memory swap rate threshold is based on a pre-determined memory performance metric.
11. An article comprising at least one non-transitory machine-readable storage medium comprising instructions executable by a processing resource of a server resource management system to:
fetch central processing unit (CPU) usage data, memory usage data, and disk usage data for a container;
calculate a CPU allocation based on a container CPU capacity and an oversubscription value;
calculate a memory allocation based on a container memory capacity and an oversubscription value;
calculate a disk allocation based on a container disk capacity and an oversubscription value; and
plot the CPU usage data, memory usage data, and disk usage data to a first axis, and plot the CPU allocation, memory allocation, and disk allocation to a second axis,
wherein the container memory capacity is based on a memory block size available to the container at a given memory swap rate threshold.
12. The article of claim 11, wherein the container is based on a shared operating system.
13. The article of claim 11, wherein the instructions are further to plot the first axis and the second axis to a scatter graph.
14. The article of claim 11, wherein the instructions are further to assign a new container based on the usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
15. The article of claim 11, wherein the instructions are further to alert a user of an oversubscription based on the usage data, the memory usage data, the disk usage data, the CPU allocation, the memory allocation, and the disk allocation.
PCT/US2015/043054 2015-07-31 2015-07-31 Server resource management WO2017023237A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Publications (1)

Publication Number Publication Date
WO2017023237A1 (en)

Family

ID=57943334

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/043054 WO2017023237A1 (en) 2015-07-31 2015-07-31 Server resource management

Country Status (1)

Country Link
WO (1) WO2017023237A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004012038A2 (en) * 2002-07-25 2004-02-05 Surgient Networks Near on-line servers
US20090094355A1 (en) * 2007-10-09 2009-04-09 International Business Machines Corporation Integrated capacity and architecture design tool
US20120072910A1 (en) * 2010-09-03 2012-03-22 Time Warner Cable, Inc. Methods and systems for managing a virtual data center with embedded roles based access control
WO2013149343A1 (en) * 2012-04-03 2013-10-10 Gridcentric Inc. Method and system for memory oversubscription for virtual machines
WO2014094472A1 (en) * 2012-12-17 2014-06-26 华为技术有限公司 Global memory sharing method and device and communication system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11438244B2 (en) * 2019-10-17 2022-09-06 Dell Products L.P. System and method to monitor usage of information handling system using baseboard management controller
CN113821336A (en) * 2021-03-08 2021-12-21 北京京东乾石科技有限公司 Resource allocation method and device, storage medium and electronic equipment
CN113821336B (en) * 2021-03-08 2024-04-05 北京京东乾石科技有限公司 Resource allocation method and device, storage medium and electronic equipment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 15900506
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 15900506
Country of ref document: EP
Kind code of ref document: A1