CN117806768A - Resource management method and device - Google Patents

Resource management method and device

Info

Publication number
CN117806768A
Authority
CN
China
Prior art keywords
instance
resource
occupancy
value
virtual machine
Legal status
Pending
Application number
CN202211177480.XA
Other languages
Chinese (zh)
Inventor
李国玺
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202211177480.XA
Publication of CN117806768A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G06F2209/504 Resource capping
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A resource management method and apparatus. The method can be applied to a computing device. In the method, the computing device obtains index data (monitoring metrics) of a first instance running in the computing device, the index data being data generated by monitoring the instance; predicts a resource demand of the first instance over a future time period based on the index data of the first instance; and adjusts the occupancy limit of the first instance on the resource according to that resource demand. This design can set an instance's occupancy limit on a resource flexibly and automatically and allocate resources to the instance dynamically, thereby reducing the amount of reserved resources and improving overall resource utilization.

Description

Resource management method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for resource management.
Background
A physical server hosts multiple virtual machines, each of which may have multiple virtual central processing units (vCPUs). Multiple vCPUs may share a resource on the physical server, such as a core of a physical CPU, and the vCPUs of the virtual machines are scheduled on that core by time slicing. For example, with two vCPUs, vCPU1 and vCPU2, vCPU1 runs for 100 ms and is then switched out, vCPU2 runs for 100 ms, the core then switches back to vCPU1, and the cycle repeats.
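As a rough illustration of the time-slicing example above, the sketch below rotates vCPUs on one physical core in fixed 100 ms slices. It assumes strict round-robin with no priorities, idle periods, or preemption, which real hypervisor schedulers do not guarantee; the function name is hypothetical.

```python
from itertools import cycle


def time_slice_schedule(vcpus, slice_ms, total_ms):
    """Return (start_ms, end_ms, vcpu) tuples for a strict round-robin rotation
    of vCPUs on one physical core, each vCPU getting slice_ms per turn."""
    if slice_ms <= 0:
        raise ValueError("slice_ms must be positive")
    timeline = []
    start = 0
    for vcpu in cycle(vcpus):  # cycle() over an empty list ends immediately
        if start >= total_ms:
            break
        end = min(start + slice_ms, total_ms)  # the last slice may be cut short
        timeline.append((start, end, vcpu))
        start = end
    return timeline


for entry in time_slice_schedule(["vCPU1", "vCPU2"], 100, 400):
    print(entry)
```

With two vCPUs and 400 ms of wall time, vCPU1 and vCPU2 alternate every 100 ms, matching the switching pattern described in the paragraph.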
In cloud systems, large virtual machines are generally expensive and are allocated relatively large amounts of resources, yet their CPU utilization may be low most of the time, so overall CPU utilization is low.
Disclosure of Invention
The present application provides a resource management method and device for improving the utilization of CPU resources.
In a first aspect, the present application provides a resource management method that may be executed by a server. The server obtains index data of a first instance in a computing device, where the first instance is any instance running in the computing device and the index data is data generated by monitoring the first instance, for example, the first instance's usage of computing resources over a historical time period. The server predicts the resource demand of the first instance over a future time period based on this index data, and adjusts the occupancy limit of the first instance on the resource according to that demand. The occupancy limit, also called the resource limit, may refer to the upper limit on the resources available to the first instance in the future time period.
In this method, the server obtains the index data of an instance, predicts the instance's resource demand over a future time period from that data, and adjusts the instance's occupancy limit on the resource accordingly. The occupancy limit can thus be set flexibly and automatically and resources allocated to the instance dynamically, which reduces the amount of reserved resources and improves overall resource utilization.
In one possible implementation, the index data of the first instance includes one or more of the following:
resource usage information of the first instance, resource quality information of the first instance;
the resource use information of the first instance is used for indicating the use condition of the first instance on one or more resources; the resource quality information of the first instance is used to indicate a resource quality of one or more resources used by the first instance.
For example, the resource usage information of the first instance indicates the occupancy of processor resources by the first instance, and the resource quality information of the first instance includes one or more of the following: scheduling delay, number of burst throttling events, burst throttling duration, and the number of queued vCPUs in the first instance.
In this way, the occupancy limit of an instance on a resource is set based on both the amount and the quality of the resources the instance uses, which can improve scheduling fairness across multiple dimensions.
In one possible implementation, the method further includes: determining an integral value (a credit balance) of the first instance according to the integral information and the resource usage information of the first instance, where the integral value is used when the first instance's occupancy of the resource exceeds its occupancy limit.
In this way, an instance can spend its integral when it has a burst resource demand, which compensates instances that have yielded resources over the long term and helps provide burst performance for the instance.
In one possible implementation, the integral information of the first instance includes one or more of the following:
the integral value of the first instance, and an integral reference value, where the integral reference value is the dividing value between accumulating integral and consuming integral.
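The text does not specify how the integral value is computed. A minimal sketch, assuming the integral reference value acts as a CPU occupancy threshold: in each sampling period, an instance running below the reference accrues integral in proportion to the unused share, and one running above it consumes integral. The accrual formula, the cap `max_credit`, the CPU-second unit, and the function names are all assumptions made for illustration.

```python
def update_credit(credit, usage, reference, period_s, max_credit):
    """Hypothetical integral (credit) update for one sampling period.

    usage and reference are CPU occupancy fractions in [0, 1]; credit is in
    CPU-seconds (an assumed unit). Below the reference the instance accrues
    credit, above it the instance consumes credit; the balance is kept within
    [0, max_credit].
    """
    credit += (reference - usage) * period_s
    return max(0.0, min(credit, max_credit))


def credit_history(usages, reference, period_s, max_credit, credit=0.0):
    """Apply update_credit over a sequence of per-period usage samples."""
    history = []
    for usage in usages:
        credit = update_credit(credit, usage, reference, period_s, max_credit)
        history.append(credit)
    return history


# Two quiet periods (10% usage) followed by a busy one (80%), reference 30%.
print(credit_history([0.1, 0.1, 0.8], reference=0.3, period_s=10, max_credit=100))
```

The two quiet periods build up roughly 4 CPU-seconds of integral, which the busy period then exhausts; the clamp keeps the balance from going negative.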
In one possible implementation, the method further includes: determining a priority order of multiple instances in the computing device that request the same resource, and scheduling some or all of these instances to run according to that order. The priority order is determined according to the integral value and/or the resource quality information of each of the instances, where the resource quality information of an instance indicates the quality of the resources it uses: the higher an instance's integral value, the higher its priority, and the lower its resource quality, the higher its priority. The multiple instances include instances whose occupancy of the resource has not reached the occupancy limit, and/or instances whose occupancy exceeds the occupancy limit but that still hold integral.
In this way, when multiple instances request the same resource, they are scheduled according to their priority order: instances with more integral and/or lower resource quality obtain scheduling opportunities first, which compensates instances that have yielded resources over the long term and achieves fair scheduling.
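A sketch of the priority ordering, assuming the two criteria are combined lexicographically (integral value first, then resource quality, with a longer recent scheduling delay taken to mean lower quality). The text allows either criterion alone or both and does not fix how they are combined; the `Candidate` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    occupancy: float       # current occupancy of the resource (fraction of the period)
    limit: float           # occupancy limit (quota) in the same units
    credit: float          # integral value (credit balance)
    sched_delay_ms: float  # recent scheduling delay; larger means worse resource quality


def is_eligible(c: Candidate) -> bool:
    # Under the occupancy limit, or over it but still holding integral.
    return c.occupancy < c.limit or c.credit > 0


def prioritize(candidates):
    """Return the eligible candidates, highest priority first."""
    pool = [c for c in candidates if is_eligible(c)]
    # More integral first; among equal integral, worse quality (longer delay) first.
    return sorted(pool, key=lambda c: (-c.credit, -c.sched_delay_ms))


vms = [
    Candidate("a", occupancy=0.2, limit=0.5, credit=5, sched_delay_ms=1),
    Candidate("b", occupancy=0.3, limit=0.5, credit=5, sched_delay_ms=3),
    Candidate("c", occupancy=0.6, limit=0.5, credit=0, sched_delay_ms=10),
    Candidate("d", occupancy=0.7, limit=0.5, credit=2, sched_delay_ms=0),
]
print([c.name for c in prioritize(vms)])  # ['b', 'a', 'd']
```

Instance `c` is over its limit with no integral, so it is left out; `b` outranks `a` because, with equal integral, it has suffered the longer scheduling delay.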
In one possible implementation, if the some or all instances that are scheduled include the first instance, and the first instance's occupancy of the resource has reached its occupancy limit, the method further includes: determining a burst amount of the first instance according to the integral value of the first instance and the amount of remaining resources, where the burst amount is the portion of the first instance's occupancy of the resource that exceeds its occupancy limit.
In this way, when an instance's occupancy of a resource reaches its occupancy limit, the instance can still be scheduled to run by spending its integral. This compensates instances that have run at low CPU utilization or yielded resources over the long term, and improves their burst performance.
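The text says the burst amount depends on the integral value and the remaining resources but gives no formula. One plausible reading, used in this sketch purely as an assumption, is that the extra run time granted in a scheduling period is capped by both quantities and that the granted amount is then deducted from the integral. The millisecond unit and the function names are hypothetical.

```python
def burst_amount(credit_ms, remaining_ms):
    """Extra run time beyond the quota, bounded by the credit balance and by idle capacity."""
    return max(0.0, min(credit_ms, remaining_ms))


def grant_burst(credit_ms, remaining_ms):
    """Return (granted_ms, credit_left_ms) after charging the burst against the credit."""
    granted = burst_amount(credit_ms, remaining_ms)
    return granted, credit_ms - granted


print(grant_burst(30.0, 20.0))  # (20.0, 10.0)
print(grant_burst(5.0, 20.0))   # (5.0, 0.0)
```

In the first call idle capacity is the binding limit; in the second the integral runs out first.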
In one possible implementation, at least a first instance and a second instance run in the computing device; the occupancy quota of the first instance falls within a first threshold range, the occupancy quota of the second instance falls within a second threshold range, and the first threshold range is different from the second threshold range.
Deploying instances whose quotas fall in different threshold ranges on the same computing device in this way reduces resource contention among the instances.
In a second aspect, the present application further provides a computing device having the function of implementing the actions in the method examples of the first aspect; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. The functions may be implemented by hardware, or by hardware executing corresponding software, and the hardware or software includes one or more modules corresponding to these functions. In one possible design, the device includes an acquisition module, a prediction module, and an adjustment module, which may perform the corresponding functions in the method examples of the first aspect; for details, refer to the detailed description of the method examples, which is not repeated here.
In a third aspect, the present application further provides a computing device cluster including at least one computing device, where the at least one computing device has the function of implementing the behavior in the method examples of the first aspect; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. Each computing device includes a processor and a memory. The processor is configured to support the computing device in performing some or all of the functions of the method of the first aspect. The memory is coupled to the processor and stores the program instructions and data necessary for the computing device. Each computing device also includes a communication interface for communicating with other devices.
In a fourth aspect, the present application also provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect and each of the possible designs of the first aspect described above.
In a fifth aspect, the present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect and each of the possible designs of the first aspect described above.
In a sixth aspect, the present application further provides a computer chip, the chip being connected to a memory, the chip being configured to read and execute a software program stored in the memory, and to perform the method according to the first aspect and each possible implementation manner of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a possible network architecture according to an embodiment of the present application;
FIG. 2 is a diagram of the relationship between physical CPUs and vCPUs;
FIG. 3 is a schematic structural diagram of a virtual machine according to an embodiment of the present application;
FIG. 4 is a flowchart of a resource management method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a scenario for collecting instance index data according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an instance scheduling manner according to an embodiment of the present application;
FIG. 7 is a flowchart of an instance integral method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an integral curve according to an embodiment of the present application;
FIG. 9 is a flowchart of another resource management method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a computing device cluster according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another computing device cluster according to an embodiment of the present application.
Detailed Description
For ease of understanding, some of the terms referred to in the embodiments of the present application will be explained first.
(1) Instance: the basic unit of service sold by a cloud service provider. Each service may be deployed as a group of instances, and a service may be implemented by software or an application. A group of instances can also be understood as an application that implements a service function running in N different runtime environments, in which case the service includes N instances, where N is an integer greater than 0. The runtime environment may be a virtual machine or a container; that is, an instance is typically a virtual machine or container running on a physical machine, which provides a complete infrastructure abstraction through software and hardware isolation techniques.
(2) Virtual machine: a complete computer system that is emulated in software, has full hardware-system functionality, and runs in a completely isolated environment. Work that can be done on a physical machine can be done in a virtual machine. When a virtual machine is created on a computer, part of the physical machine's disk and memory capacity is used as the virtual machine's disk and memory capacity. Each virtual machine has an independent operating system and can be operated as if it were a physical machine. In practice, one physical server can host multiple virtual machines through virtualization technology.
(3) Container: a form of operating-system-level virtualization that allows a user to run an application and its dependencies in a resource-isolated process.
(4) Bare metal server: a physical server, that is, a host, that can provide a tenant with dedicated cloud resources.
(5) Resource quota: the allowance of CPU run time, typically expressed as the time that may be spent running within a given period.
(6) Clock tick: a passive scheduling mechanism of the operating system kernel's CPU scheduler, in which the kernel responds to a clock interrupt at every fixed period to perform scheduling activities.
(7) Scheduling delay: the time from when a task is queued in the run queue until it is first dispatched to run on a CPU.
FIG. 1 is a schematic architecture diagram of a server according to an embodiment of the present application. As shown in FIG. 1, the server includes a hardware layer and a software layer. The hardware layer contains the server's conventional components (for example, CPU and memory) and may also include PCI devices, such as network cards, GPUs, and offload cards, that plug into the server's PCI/PCIe slots. The software layer includes an operating system installed and running on the server (called the host operating system, to distinguish it from the operating systems of the virtual machines), and a virtual machine manager (VMM, also called a hypervisor) is provided on the host. The role of the VMM includes implementing computing virtualization for the virtual machines, that is, providing part of the server's computing resources (for example, CPUs) to the virtual machines. In brief, CPU virtualization means virtualizing one physical CPU core into multiple vCPUs for use by virtual machines.
Referring to FIG. 2, a server running virtual machines is generally equipped with multiple physical CPUs, and one physical CPU may be virtualized into one or more vCPUs. A physical CPU is a processor in the hardware layer shown in FIG. 1, and a vCPU is a virtual processor in the software layer shown in FIG. 1 (for brevity, FIG. 1 shows only one vCPU per virtual machine, but the number of vCPUs in a virtual machine is not limited in the present application). Specifically, in the architecture shown in FIG. 2, the server has 4 physical CPUs, denoted CPU0, CPU1, CPU2, and CPU3; each physical CPU includes 4 CPU cores, and each CPU core can provide 3 vCPUs.
The total number of vCPUs that the virtual machine manager can provide to virtual machines = number of physical CPUs (also called the number of sockets) × number of CPU cores per physical CPU × number of vCPUs supported per CPU core = 4 × 4 × 3 = 48. As shown in FIG. 3, one hyper-thread may be provided to a virtual machine as 1 vCPU, in which case one physical CPU core can stably provide three vCPUs.
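The vCPU count above is a simple product. A sketch with the FIG. 2 figures plugged in; the function name is illustrative only.

```python
def total_vcpus(sockets: int, cores_per_socket: int, vcpus_per_core: int) -> int:
    """Total vCPUs = physical CPUs (sockets) x CPU cores per physical CPU x vCPUs per core."""
    return sockets * cores_per_socket * vcpus_per_core


print(total_vcpus(4, 4, 3))  # 48
```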
The following describes, with reference to FIG. 4, a resource management method provided in an embodiment of the present application. For ease of description, the method is described using the server shown in FIG. 1 as an example. As shown in FIG. 4, the method may include the following steps:
Step 401: The server collects index data of the first instance.
The first instance is any instance running on the server. The server may collect the index data of each running instance, instance by instance, either periodically or in real time; the collection window is denoted as the first time period. As a further example, the server may collect, at CPU-core granularity, the index data of each instance running on each CPU core. Instances include one or more of virtual machines, containers, bare metal servers, and the like. The index data of an instance includes, but is not limited to, the index data of any vCPU in the instance. The index data of a vCPU is data generated by monitoring one or more performance indexes of that vCPU, and comprises the vCPU's resource usage information and resource quality information.
Wherein the performance metrics include, but are not limited to, one or more of the following:
(1) Performance indexes indicating vCPU resource usage, such as vCPU utilization and vCPU occupancy;
(2) Performance indexes indicating the quality of the resources used by the vCPU, such as vCPU scheduling delay, the number of vCPU burst throttling events, and vCPU burst throttling duration.
For example, take virtual machine 1 in FIG. 1 and refer to FIG. 5. Assume that virtual machine 1 includes vCPU1 and that the server collects the index data of virtual machine 1 in the first time period; this index data includes the index data of vCPU1.
For example, the index data of vCPU1 includes:
1) One or more vCPU utilization values of vCPU1 within the first time period. For example, the server collects the vCPU utilization of vCPU1 at multiple moments (sampling moments) in the first time period and arranges the collected values in order of sampling time to obtain a sampling sequence of vCPU utilization (denoted as the first sequence).
Accordingly, the index data of vCPU1 for the first time period may include one or more of the following: the maximum, minimum, average, and standard deviation of the vCPU utilization values in the first sequence, and the values at one or more specified positions, such as the median. A specified position may be denoted Pn, where n is a positive integer, with positions taken over the values of the first sequence ranked in ascending order: P50 is the value at the 1/2 position, that is, the median; P25 is the value at the 1/4 position; P80 is the value at the 4/5 position; and so on.
2) Scheduling delay data of vCPU1 in the first time period. For example, the server monitors the scheduling delay of vCPU1 in the first time period, where the scheduling delay is the time from when vCPU1 requests CPU resources to when it obtains them.
The number of times vCPU1 requests CPU resources in the first time period varies. The server monitors the scheduling delay of one or more of these requests and arranges the collected delays in order of sampling time to obtain a sampling sequence of scheduling delay (denoted as the second sequence).
Accordingly, the scheduling delay data of vCPU1 for the first time period includes one or more of the following: the minimum, maximum, average, and standard deviation of the scheduling delays in the second sequence, the values at one or more specified positions in the second sequence (such as P25, P50, P80, P90, and P95), and the number of scheduling timeouts. A scheduling timeout means that a scheduling delay exceeds a set time threshold, so the number of scheduling timeouts is the number of scheduling delays in the second sequence that exceed that threshold.
3) The number of burst throttling events of vCPU1 in the first time period. A burst occurs when the CPU resources a vCPU requires within one scheduling period exceed its limit, and the number of burst throttling events is the number of times, counted per scheduling period, that the vCPU requests burst resources beyond its resource allowance (quota). For example, assume a scheduling period is 100 ms and the quota of vCPU1 is 50 ms. If vCPU1 continues to request CPU resources after running for 50 ms in the scheduling period, this is a burst; because vCPU1 has used up its 50 ms, the scheduler stops it from running for the rest of that scheduling period, which is recorded as one burst throttling event.
4) The burst throttling duration of vCPU1 in the first time period.
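Items 3) and 4) can be illustrated with a sketch that counts burst throttling (burst current limiting) events and totals the throttled time over a series of scheduling periods. It rests on simplifying assumptions not stated in the text: in each period the vCPU runs from the start of the period until its demand is met or its quota is used up, and once throttled it stays off the CPU until the period ends. The function name is hypothetical.

```python
def burst_throttling(demands_ms, quota_ms, period_ms):
    """Return (throttle_count, throttled_ms) for a vCPU over several scheduling periods.

    demands_ms: CPU time the vCPU wanted in each period.
    Assumption: the vCPU runs from the start of each period, so when throttled
    it waits from quota_ms until the end of the period.
    """
    if not 0 < quota_ms <= period_ms:
        raise ValueError("quota must be positive and no longer than the period")
    count = 0
    throttled_ms = 0.0
    for demand in demands_ms:
        if demand > quota_ms:  # burst: wants more than its quota this period
            count += 1
            throttled_ms += period_ms - quota_ms
    return count, throttled_ms


# 100 ms period, 50 ms quota: the 80 ms and 100 ms periods are throttled.
print(burst_throttling([30, 80, 50, 100], quota_ms=50, period_ms=100))  # (2, 100.0)
```

A period whose demand exactly equals the quota is not counted, since the vCPU finishes just as the quota runs out.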
In addition to the index data of its vCPUs, the index data of an instance may further include distribution data of the number of queued vCPUs in the instance. The vCPU queuing number is the number of vCPUs in virtual machine 1 that have a CPU resource demand at the same moment. For example, the server collects the number of queued vCPUs in virtual machine 1 at multiple sampling moments in the first time period (at each moment the value lies in [0, n], where n is the number of vCPUs in virtual machine 1) to obtain a sampling sequence of the vCPU queuing number (denoted as the third sequence).
Accordingly, the vCPU queuing number distribution data of virtual machine 1 may include one or more of the following: the minimum, maximum, average, and standard deviation of the vCPU queuing numbers in the third sequence, and the values at one or more specified positions in the third sequence (such as P25, P50, P80, and P90).
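The distribution statistics described for the first, second, and third sequences (minimum, maximum, average, standard deviation, Pn values, and for scheduling delay the timeout count) can be sketched as follows. The Pn values use the nearest-rank percentile method on the values sorted in ascending order, which is one of several common conventions; for an even number of samples it returns the lower middle value as P50 rather than averaging the two middle values. The function names and the timeout threshold are illustrative assumptions.

```python
import statistics


def pn(values, n):
    """Nearest-rank Pn: the ceil(n/100 * k)-th smallest of k values (at least the 1st)."""
    ordered = sorted(values)
    rank = max(1, (n * len(ordered) + 99) // 100)  # integer ceiling of n*k/100
    return ordered[rank - 1]


def summarize(values, percentiles=(25, 50, 80, 90, 95), timeout=None):
    """Distribution summary of one non-empty sampling sequence."""
    summary = {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }
    for n in percentiles:
        summary[f"P{n}"] = pn(values, n)
    if timeout is not None:
        # Scheduling timeouts: samples strictly above the threshold.
        summary["timeouts"] = sum(1 for v in values if v > timeout)
    return summary


delays_ms = [4, 1, 9, 2, 7, 3, 10, 5, 8, 6]  # a second sequence, in sampling order
print(summarize(delays_ms, timeout=7))
```

The same `summarize` call applies to utilization or queuing-number sequences; only the scheduling delay sequence needs the `timeout` argument.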
The index data of other instances on the server (such as virtual machine 2) is obtained in the same way and is not described again here. Note that if an instance includes multiple vCPUs (for example, virtual machine 2 includes vCPU2 and vCPU3), the vCPUs run on the CPU core in turn; in this case, the vCPU whose index data is collected for virtual machine 2 may be any one of its vCPUs, that is, vCPU2 or vCPU3.
Step 402: The server predicts the resource demand value of the first instance based on the index data of the first instance.
For example, the server predicts the resource demand value of the first instance over a future time period (denoted as the second time period) based on the index data of the first instance in the first time period.
In one embodiment, the server uses the index data of the first instance as the input of a prediction model and obtains the result output by the prediction model (the output result). The prediction model predicts the resource demand value of an object based on the object's historical index data, where objects include servers, instances, vCPUs, and the like. The output result may indicate the resource demand value, or a range of resource demand values, of the first instance in the second time period.
The resource demand value refers to the demand for resources in the second time period. For example, the demand for CPU resources may range from 0 to 100%: 100% means CPU resources are required for the entire second time period, 50% means CPU resources are required for 50% of the second time period, and so on.
In another embodiment, the server may first preprocess the index data of the first instance and then input the processed data into the prediction model. The preprocessing may include data filtering, data completion, and the like. (1) Data filtering screens the index data of the first instance against set conditions and keeps only the data that meets them. The set conditions may include a numerical range condition and a numerical logic condition. A numerical range condition requires the value of a performance index to lie within a certain range, and values outside it are invalid; for example, vCPU utilization cannot be negative. A numerical logic condition requires, for example, that the minimum vCPU utilization not exceed the maximum vCPU utilization. (2) Data completion checks whether the index data of the first instance has missing values and, if so, fills them in, for example with adjacent values or with 0.
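A minimal sketch of the filtering and completion steps for a single utilization series, assuming utilization in percent, `None` marking a missing sample, out-of-range samples treated as missing, and gaps filled with the previous valid value (or the first valid value for leading gaps, or 0 if no sample is valid). The text allows either adjacent-value or zero filling; this particular choice and the function names are assumptions.

```python
def filter_and_fill(values, low=0.0, high=100.0):
    """Range-filter a utilization series, then complete the gaps."""
    # Range condition: samples outside [low, high] (e.g. negative utilization) are invalid.
    cleaned = [v if v is not None and low <= v <= high else None for v in values]
    valid = [v for v in cleaned if v is not None]
    if not valid:
        return [0.0] * len(values)  # nothing to borrow from: complete with 0
    # Completion: carry the last valid value forward; leading gaps take the first valid value.
    last = valid[0]
    out = []
    for v in cleaned:
        if v is None:
            out.append(last)
        else:
            out.append(v)
            last = v
    return out


def passes_logic_check(record):
    """Logic condition: a summary's minimum must not exceed its maximum."""
    return record["min"] <= record["max"]


print(filter_and_fill([None, 20, -5, 40, None, 130]))  # [20, 20, 20, 40, 40, 40]
```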
The first time period may be the period immediately preceding the second time period, or the same period in history as the second time period. The first time period may be the same length as the second time period or a different length; this is not specifically limited.
In this embodiment, the prediction model may be a machine learning model, a linear regression model, an XGBoost classifier, or the like; this is not specifically limited. The model may be trained online or offline; the training process is not described in detail here. Training may be performed by the server or by another device (referred to as a training device), which then sends the trained prediction model to the server for use.
The server predicts the resource demand values of the other instances in the same way; details are not repeated here.
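As a toy stand-in for the prediction model, the sketch below fits a least-squares line to per-interval demand values from the first time period and extrapolates one interval ahead. It is not the model used in this application; the function name `predict_demand` and the representation of demand as a fraction in [0, 1] are assumptions for illustration.

```python
from typing import Sequence


def predict_demand(history: Sequence[float]) -> float:
    """Toy predictor: fit a least-squares line to per-interval demand values
    (fractions in [0, 1]) from the first time period and extrapolate one step."""
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx if sxx else 0.0  # a single sample gives a flat line
    intercept = mean_y - slope * mean_x
    predicted = intercept + slope * n  # value at the next interval index
    # A demand value is a fraction of the period, so clamp it to [0, 1].
    return min(1.0, max(0.0, predicted))


print(round(predict_demand([0.2, 0.3, 0.4, 0.5]), 3))  # 0.6
```

A production model would typically use richer features (the index data listed later, seasonality from the same period in history) and be trained offline or online as described above.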
In step 403, the server adjusts the occupancy limit of the first instance on the resource according to the resource requirement value (denoted as n) of the first instance.
The occupancy limit of an instance on a resource can be understood as the resource quota that the instance may use within a period of time, where the quota is the maximum amount of the resource the instance may use. Hereinafter, an instance's quota and its occupancy limit on the resource have the same meaning. Illustratively, the quota of an instance is $\text{quota} = N \times n \times T$, where N is the number of vCPUs included in the instance, n is the resource demand value of the instance, and T is the length of the scheduling period. In one embodiment, the length of the second time period is equal to T.
Continuing the above example, the server determines the CPU resource quota of virtual machine 1 (denoted $\text{quota}_1$) from the CPU resource demand value of virtual machine 1 in the second time period (denoted $n_1$), and determines the CPU resource quota of virtual machine 2 (denoted $\text{quota}_2$) from the CPU resource demand value of virtual machine 2 in the second time period (denoted $n_2$), where $\text{quota}_1 = N_1 \times n_1 \times T$ and $\text{quota}_2 = N_2 \times n_2 \times T$, with $N_1$ and $N_2$ the numbers of vCPUs of virtual machine 1 and virtual machine 2.
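The quota formula quota = N × n × T can be evaluated directly. In this sketch the function name `quota_ms` is hypothetical, and the inputs (one vCPU each, demand values 0.5 and 0.2, T = 100 ms) are assumed values chosen so that the result matches the 50 ms and 20 ms quotas used in the fig. 6 example.

```python
def quota_ms(num_vcpus: int, demand: float, period_ms: float) -> float:
    # quota = N * n * T: N vCPUs, demand n as a fraction of the period, period T.
    if num_vcpus < 0 or not 0.0 <= demand <= 1.0 or period_ms <= 0:
        raise ValueError("invalid quota inputs")
    return num_vcpus * demand * period_ms


# Hypothetical inputs chosen to reproduce the fig. 6 numbers (one vCPU each, T = 100 ms).
print(quota_ms(1, 0.5, 100.0), quota_ms(1, 0.2, 100.0))  # 50.0 20.0
```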
For example, referring to fig. 6, process A and process B share the resources of one CPU core and run in turn within one scheduling period. Assume that process A is virtual machine 1 in fig. 3 and process B is virtual machine 2 in fig. 3.
Assume that one scheduling period is 100 ms long, the quota of virtual machine 1 is 50 ms, and the quota of virtual machine 2 is 20 ms; that is, within this scheduling period virtual machine 1 may run for 50 ms and virtual machine 2 for 20 ms. As shown in fig. 6, one scheduling period includes 10 clock ticks of 10 ms each. Illustratively, the scheduling period includes three stages:
Stage one: in the first four clock ticks, virtual machine 1 and virtual machine 2 split the CPU time equally, each running for 20 ms. Note that by the end of stage one the quota of virtual machine 2 is exhausted, that is, the occupancy of the resource by virtual machine 2 has reached its resource quota.
The lengths of time (time slices) for which virtual machine 1 and virtual machine 2 run within each clock tick may be the same (for example, 5 ms each in each clock tick in fig. 6). This may depend on the nice values of virtual machine 1 and virtual machine 2: the lower the nice value, the longer the time slice, and the higher the nice value, the shorter the time slice. For example, if virtual machine 1 and virtual machine 2 have the same nice value, the two VMs may split each clock tick equally, that is, their time slices within one clock tick are of equal length. If their nice values differ, their time slices within one clock tick also differ; for example, with a nice value of 0 for virtual machine 1 and 1 for virtual machine 2, virtual machine 1 might run for 7 ms and virtual machine 2 for 3 ms within one clock tick. The nice value of each VM may be set by the tenant using the VM; typically, tenants set it to the minimum value to make the VM more competitive.
The order in which virtual machine 1 and virtual machine 2 are scheduled may be determined by their priorities; for example, if virtual machine 1 has a higher priority than virtual machine 2, virtual machine 1 is scheduled first. If the two have the same priority, scheduling may follow a first-come, first-served rule, for example in the order in which virtual machine 1 and virtual machine 2 joined the queue. The priority of an instance may be determined in any existing manner or in the manner provided in this application (see the description of fig. 7 below); details are not repeated here.
Stage two: after the fourth clock tick, virtual machine 1 continues to run for 30 ms.
Stage three: after the seventh clock tick, the quota of virtual machine 1 is exhausted, that is, the occupancy of the resource by virtual machine 1 has reached its resource quota.
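The 7 ms / 3 ms split mentioned under stage one is only illustrative. For a concrete point of comparison, the sketch below uses weights modeled on the Linux CFS scheduler, where nice 0 maps to weight 1024 and each nice step changes the weight by a factor of about 1.25. This is an approximation of a real scheduler's table, not the mechanism of this application; `cfs_like_weight` and `split_tick` are hypothetical names.

```python
from typing import List


def cfs_like_weight(nice: int) -> float:
    # Approximates the Linux CFS weight table: nice 0 -> 1024, and each nice
    # step changes the weight by a factor of about 1.25.
    return 1024 / (1.25 ** nice)


def split_tick(tick_ms: float, nices: List[int]) -> List[float]:
    # Each runnable VM receives a time slice proportional to its weight.
    weights = [cfs_like_weight(n) for n in nices]
    total = sum(weights)
    return [tick_ms * w / total for w in weights]


print(split_tick(10.0, [0, 0]))                         # [5.0, 5.0]
print([round(t, 2) for t in split_tick(10.0, [0, 1])])  # [5.56, 4.44]
```

Under these weights, a one-step nice difference yields roughly a 5.6 ms / 4.4 ms split of a 10 ms tick, so a 7 ms / 3 ms split would correspond to a larger weight gap.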
In one possible scenario, virtual machine 2 still has a CPU resource demand in stage two or stage three (for example, the process corresponding to virtual machine 2 is still waiting in the task queue to be scheduled), but because its quota is exhausted, virtual machine 2 is throttled. Optionally, the burst throttling count of virtual machine 2 is updated by incrementing it by 1. Similarly, if VM3 has a CPU resource demand in stage three, it is throttled because its quota is exhausted, and optionally its burst throttling count is incremented by 1.
Of course, virtual machine 2 may have no CPU resource demand in stage two or stage three, that is, it is not throttled in this scheduling period, so its burst throttling count is not updated. Similarly, VM3 may have no CPU resource demand in stage three, that is, it is not throttled in this scheduling period, so its burst throttling count is not updated.
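The three stages of fig. 6 and the throttling rule can be replayed with a small simulation. It assumes that every VM with quota left is always runnable, that runnable VMs split each 10 ms tick equally (equal nice values), and that the burst throttling count is incremented at most once per scheduling period; `simulate_period` and `demand_after_quota` are hypothetical names.

```python
from typing import Dict, Set, Tuple


def simulate_period(quotas: Dict[str, float], demand_after_quota: Set[str],
                    tick_ms: float = 10.0, ticks: int = 10) -> Tuple[Dict[str, float], Set[str]]:
    """Replays one scheduling period on a single CPU core.

    Assumptions: every VM with quota left is always runnable, runnable VMs
    split each tick equally (equal nice values), and a VM in
    demand_after_quota keeps requesting CPU after its quota is exhausted.
    """
    remaining = dict(quotas)
    run = {vm: 0.0 for vm in quotas}
    throttled: Set[str] = set()
    for _ in range(ticks):
        # A VM that still wants CPU but has no quota left is throttled.
        for vm in quotas:
            if remaining[vm] <= 0 and vm in demand_after_quota:
                throttled.add(vm)
        runnable = [vm for vm in quotas if remaining[vm] > 0]
        if not runnable:
            continue  # the core idles for the rest of this tick
        share = tick_ms / len(runnable)
        for vm in runnable:
            used = min(share, remaining[vm])  # any unused share is left idle
            remaining[vm] -= used
            run[vm] += used
    return run, throttled


burst_counts = {"vm1": 0, "vm2": 0}
run, throttled = simulate_period({"vm1": 50.0, "vm2": 20.0}, {"vm2"})
for vm in throttled:
    burst_counts[vm] += 1  # increment the burst throttling count once per period
print(run, burst_counts)  # {'vm1': 50.0, 'vm2': 20.0} {'vm1': 0, 'vm2': 1}
```

With the fig. 6 quotas, virtual machine 2 exhausts its 20 ms after four ticks, virtual machine 1 runs alone for ticks five to seven, and the core is idle for the last three ticks.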
Notably, the quota of an instance is an upper limit on resource usage, not the actual resource usage; for example, virtual machine 1 may actually run for less than 50 ms in the scheduling period. If the prediction model is accurate, the resource usage of virtual machine 1 should equal its resource demand as predicted by the model, which in turn equals its quota.
With this design, the server obtains the index data of an instance, predicts the instance's resource demand over a future period from that data, and adjusts the instance's occupancy limit on the resource accordingly. The occupancy limit is thus adjusted dynamically and flexibly and resources are allocated to instances fairly, which can reduce the amount of resources that must be reserved and improve overall resource utilization.
An embodiment of this application provides another resource management method, which improves the burst performance of instances. The method may be performed by the server in fig. 3; as shown in fig. 7, it may include the following steps:
In step 701, the server obtains the integral information and the resource information of the first instance.
The first instance is any instance running on the server. The integral information of an instance includes one or more of the instance's integral value, its integral reference value, and the like. The resource information of an instance includes its resource quota (quota) over a period of time.
For example, the server obtains the integral value of the first instance at a first moment and the first instance's resource quota for a period after the first moment. In connection with fig. 5 and the method embodiment of fig. 4, at the start of the second time period (for example, the first moment T1 in fig. 5), the server obtains the quota of virtual machine 1 for the second time period (which may be determined as in the method embodiment of fig. 4) and the integral value of virtual machine 1 at the first moment. If no integral has yet been accumulated, the initial integral value of the first instance is 0, that is, the integral value of virtual machine 1 at the first moment is 0. Alternatively, this embodiment supports tenants purchasing integrals for an instance, in which case the initial integral value may be the integral value purchased by the user.
The integral reference value of an instance is a static, fixed value that indicates the instance's baseline performance and serves as the dividing line between accumulating and consuming integrals: integrals are accumulated when the instance's CPU utilization is below the integral reference value and consumed when it is above the integral reference value.
The integral reference value of an instance may be determined from one or more parameters such as the instance specification, the oversubscription ratio of the instance, and the VIP coefficient of the tenant. For example, integral reference value = N × oversubscription ratio; as a further example, integral reference value = N × oversubscription ratio × VIP coefficient, where the VIP coefficient may be determined by the tenant's class and/or the class of service running in the instance. For ease of description, integral reference value = N × oversubscription ratio is used below.
Here N is the number of vCPUs included in the instance, and the oversubscription ratio may equal the share of a CPU core that one vCPU receives; for example, if 3 vCPUs run on one CPU core, each vCPU accounts for 1/3 of the core.
Further, assuming that virtual machine 1 includes one vCPU (denoted vCPU1) and the oversubscription ratio is 1/3, the integral reference value of virtual machine 1 is 1 × 1/3 = 1/3 ≈ 33.3%.
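The integral reference value (Burst) formula above, N × oversubscription (excess) ratio, optionally multiplied by a VIP coefficient, can be computed exactly with rational numbers. The function name `integral_reference` is hypothetical, and the 3/2 VIP coefficient in the tests is an arbitrary illustrative value.

```python
from fractions import Fraction


def integral_reference(num_vcpus: int, oversub_ratio: Fraction,
                       vip_coefficient: Fraction = Fraction(1)) -> Fraction:
    # Burst = N * oversubscription ratio (* optional VIP coefficient).
    return num_vcpus * oversub_ratio * vip_coefficient


burst = integral_reference(1, Fraction(1, 3))  # virtual machine 1: one vCPU, 3 vCPUs per core
print(burst, f"{float(burst):.1%}")  # 1/3 33.3%
```

Using `Fraction` keeps 1/3 exact, so later integral arithmetic based on this value does not accumulate rounding error.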
In step 702, the server determines an integral value of the first instance based on the integral information and the resource information of the first instance.
For example, referring to fig. 8, the server determines a second integral value of the first instance from the integral value of the first instance at the first moment (the end of the previous scheduling period; denoted the first integral value), the integral reference value (Burst), and the resource quota (quota) of the first instance in the second time period, and uses the second integral value as the starting integral value of the first instance in the second time period. That is, virtual machine 1 has two integral values at T1: the first integral value, which is its integral value at the end of the previous scheduling period, and the second integral value, which is its starting integral value for the next scheduling period.
For example, fig. 8 shows the integral reference value and the quota of an instance (for example, virtual machine 1) over different time periods. Illustratively, for virtual machine 1, $\text{second integral value} = \text{first integral value} + (\text{Burst} - \text{quota}) \times T$, where T is the length of the second time period and quota is the resource quota of the instance in the second time period.
That is, when the quota is greater than the integral reference value, integrals are consumed, which can be understood as temporarily deducting part of the integral; when the quota is less than the integral reference value, integrals are accumulated, which can be understood as temporarily granting part of the integral.
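The start-of-period update, second integral value = first integral value + (Burst − quota) × T, is a one-line computation. This sketch assumes that Burst and quota are both expressed as a fraction of one vCPU (the quota line of fig. 8) and that T is in seconds, so the integral is measured in utilization × seconds; the function name `start_of_period_integral` is hypothetical.

```python
from fractions import Fraction


def start_of_period_integral(first_integral: Fraction, burst: Fraction,
                             quota: Fraction, period: Fraction) -> Fraction:
    # second = first + (Burst - quota) * T
    # quota < Burst: part of the integral is granted; quota > Burst: part is deducted.
    return first_integral + (burst - quota) * period


burst = Fraction(1, 3)
print(start_of_period_integral(Fraction(0), burst, Fraction(1, 2), Fraction(60)))  # -10
print(start_of_period_integral(Fraction(0), burst, Fraction(1, 5), Fraction(60)))  # 8
```

A quota above Burst deducts part of the integral (here 10 utilization-seconds), while a quota below Burst grants part of it.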
In step 703, the server obtains the resource usage information of the first instance.
In the above example, the server monitors the first instance to obtain the resource usage information of the first instance in the second period of time. Illustratively, the resource usage information is used to indicate usage of the processor resource by the first instance, e.g., the resource usage information includes CPU utilization of the first instance per unit time during the second period of time.
With continued reference to fig. 8, fig. 8 also shows the CPU utilization curve and the integration curve of the virtual machine 1.
The CPU utilization curve of virtual machine 1 gives the CPU utilization of virtual machine 1 in each unit time of the second time period, where the CPU utilization in a unit time may be determined as the ratio of the vCPU's running time within that unit time to the length of the unit time. For example, if the unit time is 1 ms and vCPU1 of virtual machine 1 runs for 0.5 ms within a given 1 ms, the CPU utilization of virtual machine 1 in that 1 ms is 50%. Note that 1 ms is only an example; this application does not limit the length of the unit time.
The process of obtaining the resource usage information of the virtual machine 1 in the second period of time by the server includes: the server monitors the CPU utilization rate of the vCPU1 of the virtual machine 1 in the second time period, and the obtained data is the resource usage information of the virtual machine 1 in the second time period, such as the CPU utilization rate curve of the virtual machine 1 shown in fig. 8.
Step 704, the server determines a third integral value for the first instance based on the second integral value for the first instance and the resource usage information.
The server calculates a third integral value from the second integral value of the first instance and the quota and CPU utilization of the instance in the second time period. The third integral value is the integral value of virtual machine 1 at the end of the second time period. The integral value of an instance is used when the instance's occupancy of the resource has exceeded its quota but it still has a resource demand. Specifically, integrals are accumulated when the CPU utilization is below the quota line and consumed when the CPU utilization is above the quota line.
As shown in fig. 8, in stage 1 the CPU utilization is below the quota line, so integrals are accumulated.
In stage 2, the CPU utilization is above the quota line, so integrals are consumed.
In stage 3, the CPU utilization is below the quota line, so integrals are accumulated.
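Third integral value = second integral value + Δintegral value, where

$$\Delta\text{integral value} = \sum_{i=1}^{k} \left(\text{quota} - \text{CPU utilization}[i]\right) \times \Delta t$$

Here CPU utilization[i] is the CPU utilization of the first instance in the i-th unit time of the second time period, k is the number of unit times included in the second time period, quota is expressed as the quota line (a utilization, as in fig. 8), and Δt is the length of one unit time, so that k × Δt = T.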
It is understood that the second integral value is an integral of the virtual machine 1 at a start time of the second period, and the third integral value is an integral of the virtual machine 1 at an end time of the second period.
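The end-of-period update can be sketched as below, following the rule that each unit time below the quota line adds to the integral and each unit time above it consumes integral. It assumes the quota is given as a utilization (the quota line) and that per-unit utilization is computed as vCPU run time divided by the unit time length; `utilization`, `end_of_period_integral`, and the sample run times are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def utilization(run_ms: Fraction, unit_ms: Fraction) -> Fraction:
    # CPU utilization in one unit time = vCPU run time / unit time length.
    return run_ms / unit_ms


def end_of_period_integral(start_integral: Fraction, quota: Fraction,
                           utils: Sequence[Fraction], unit_time: Fraction) -> Fraction:
    # third = second + sum_i (quota - util[i]) * unit_time
    # Below the quota line the integral grows; above it the integral is consumed.
    delta = sum(((quota - u) * unit_time for u in utils), Fraction(0))
    return start_integral + delta


unit = Fraction(1)  # 1 ms unit time
runs = [Fraction(3, 10), Fraction(3, 10), Fraction(8, 10), Fraction(4, 10)]  # hypothetical vCPU run times (ms)
utils = [utilization(r, unit) for r in runs]
print(end_of_period_integral(Fraction(0), Fraction(1, 2), utils, unit))  # 1/5
```

The third unit time (80% utilization) consumes 0.3 units of integral, but the other three unit times accumulate 0.5 in total, leaving a net gain of 1/5.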
There are various units of measure for the integral value, for example % × t, where % is CPU utilization and t is time: 100% × 1 min means that the vCPU may run at 100% utilization for 1 minute, or at 50% utilization for 2 minutes, and so on. As another example, the unit may be a custom one, such as a number of integrals, for example 1 integral = 100% × 1 min.
Optionally, the integral value of an instance may also have configuration parameters such as a validity period, an upper integral limit, and an upper operation ratio limit, which may be set by an administrator. The validity period is how long the integral value can be retained. The upper integral limit is the maximum accumulated integral. The upper operation ratio limit is the maximum utilization of the vCPU while it uses integrals (for example, when the vCPU's resource usage breaks through its quota, as described later); for example, if the upper operation ratio limit is 80%, then when the quota is broken through (see, for example, the second stage in fig. 2), the vCPU runs at no more than 80% utilization.
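The three optional configuration parameters above can be applied as simple caps. In this sketch, the integral is modeled as a list of timestamped grants so that the validity period can drop expired parts, and the upper integral limit is applied to the live total; this data model and the names `IntegralConfig`, `Grant`, `usable_integral`, and `burst_utilization` are assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntegralConfig:
    validity_s: float      # validity period: how long an earned integral is kept
    upper_limit: float     # upper integral limit: maximum accumulated integral
    max_burst_util: float  # upper operation ratio limit while using integrals


@dataclass
class Grant:
    earned_at_s: float  # when this part of the integral was accumulated
    amount: float


def usable_integral(grants: List[Grant], now_s: float, cfg: IntegralConfig) -> float:
    # Drop expired grants, then cap the total at the upper integral limit.
    live = sum(g.amount for g in grants if now_s - g.earned_at_s <= cfg.validity_s)
    return min(live, cfg.upper_limit)


def burst_utilization(requested: float, cfg: IntegralConfig) -> float:
    # While breaking through its quota, a vCPU runs at most at max_burst_util.
    return min(requested, cfg.max_burst_util)


cfg = IntegralConfig(validity_s=60.0, upper_limit=6.0, max_burst_util=0.8)
grants = [Grant(0.0, 5.0), Grant(50.0, 7.0)]
print(usable_integral(grants, 100.0, cfg), burst_utilization(1.0, cfg))  # 6.0 0.8
```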
In this embodiment, the methods shown in fig. 4 and fig. 7 may be performed iteratively. Continuing with fig. 8, a subsequent iteration may include:
(1) Based on the method shown in fig. 4, at a second moment T2 (for example, the start of the third time period or a moment before it), the server may determine the quota of virtual machine 1 for the third time period. Referring to fig. 8, assuming that the server predicts that the resource demand value of virtual machine 1 in the third time period decreases, the quota of virtual machine 1 in the third time period is reduced accordingly.
(2) The server determines a fourth integral value based on the quota of virtual machine 1 in the third time period, the integral reference value, and the third integral value. Referring to fig. 8, in the third time period the quota is below Burst, so part of the integral is granted; the fourth integral value is determined from the granted integral and the third integral value and is used as the starting integral value for the third time period. For details, refer to the description of determining the second integral value above; they are not repeated here.
(3) The server determines a fifth integral value of virtual machine 1 based on the fourth integral value and the CPU utilization of virtual machine 1 in the third time period. The fourth integral value is the starting integral value of virtual machine 1 in the third time period, and the fifth integral value is its integral value at the end of the third time period. The fifth integral value may be determined in the same way as the third integral value; details are not repeated here.
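The iteration in steps (1) to (3) can be sketched as a loop over consecutive periods: each period first applies the start-of-period update (Burst − quota) × T and then the per-unit-time update against the quota line. This assumes each period's quota has already been produced by the fig. 4 method and is given as a utilization; the name `run_periods` and the sample values are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Period = Tuple[Fraction, Sequence[Fraction]]  # (quota line, per-unit CPU utilization)


def run_periods(initial: Fraction, burst: Fraction, periods: List[Period],
                unit_time: Fraction) -> List[Tuple[Fraction, Fraction]]:
    """Returns (start integral, end integral) for each consecutive period."""
    results = []
    end = initial
    for quota, utils in periods:
        period_len = len(utils) * unit_time  # k * unit time = T
        start = end + (burst - quota) * period_len  # e.g. second / fourth integral value
        end = start + sum(((quota - u) * unit_time for u in utils), Fraction(0))  # e.g. third / fifth
        results.append((start, end))
    return results


periods = [
    (Fraction(1, 2), [Fraction(1, 4), Fraction(1, 4), Fraction(3, 4), Fraction(1, 4)]),  # second period
    (Fraction(1, 6), [Fraction(0)] * 4),  # third period: lower predicted demand, lower quota
]
for start, end in run_periods(Fraction(0), Fraction(1, 3), periods, Fraction(1)):
    print(start, end)
# -2/3 -1/6
# 1/2 7/6
```

In the third period the quota drops below Burst, so the start-of-period step grants integral, matching step (2) above.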
Note that, as described above, the method embodiment of fig. 7 is performed on the basis of the method embodiment of fig. 4. In practice, this application also supports performing the method embodiment of fig. 7 independently of fig. 4; for example, in the method embodiment of fig. 7, the quota of the first instance running on the server may be a fixed value.
In this way, an instance can use integrals when it has a burst resource demand. Instances that have yielded resources over a long period, in particular large virtual machines for which the tenant pays a high price but which have historically run at low CPU utilization, can thereby obtain better burst performance, and scheduling is made as fair as possible.
Note that integrals may be used when an instance's occupancy of the resource has reached its quota and it still has a resource demand. As in fig. 8, after the CPU utilization of virtual machine 1 reaches the quota, integrals may be consumed so that its CPU utilization exceeds the quota. However, whether to continue scheduling an instance after its occupancy of the resource has reached the quota is determined by the third resource management method provided in this embodiment.
In that resource management method, the server schedules vCPUs according to the instances' resource quotas, and when an instance reaches its resource quota, the server may decide whether to let it continue running based on one or more pieces of information such as the priority of the instance, the number of instances requesting the resource, the integral value of the first instance requesting the resource, and the amount of remaining resources. Continuing to schedule the instance means it breaks through its quota; not continuing to schedule it means it is burst-throttled. Integrals can therefore effectively improve the burst performance of instances, meet their resource demands, and reduce the response latency of the services running in them.
Another resource management method provided in the embodiment of the present application is described below with reference to fig. 9. The resource management method may be performed by the server in fig. 3. As shown in fig. 9, the method includes:
In step 901, the server detects that an instance has a burst demand.
A burst demand means that the resources used by the instance in the scheduling period have reached its quota but the instance still needs resources, for example the process corresponding to its vCPU is queued in the task queue waiting to be scheduled.
In step 902, the server determines a priority ordering of multiple instances requesting the same resource.
The multiple instances include instances whose occupancy of the resource has not exceeded the occupancy limit and/or instances whose occupancy of the resource has reached the occupancy limit and that still have integrals.
The server may determine the priority ordering of the instances based on one or more of the integral information, resource quality information, service information, and the like of each of the instances requesting the resource. The integral information includes the instance's integral value; the resource quality information includes the instance's burst throttling count, scheduling delay distribution data, vCPU burst throttling count, vCPU burst throttling duration, vCPU queue length distribution data, and the like; the service information includes the user level of the tenant to which the instance belongs, the type of service the instance runs, and the like.
For example, in fig. 8, virtual machine 1 has a burst demand at moment T3. At T3 the server may obtain one or more kinds of index data for each VM requesting the resource (for example, virtual machine 1 and virtual machine 2), such as its integral value and its burst throttling count, scheduling delay distribution data, and vCPU queue length distribution data over a period before T3, and compute the priority ordering of virtual machine 1 and virtual machine 2.
The server may determine the priority ordering of the instances from this information in various ways. For example, instances may be compared and sorted on a single index: the higher the integral value, the higher the priority; the higher the burst throttling count, the higher the priority; the more vCPUs queued, the higher the priority; and so on. As another example, a priority score may be calculated for each instance as a weighted combination of the indexes, and the ordering determined from the calculated scores.
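Both ordering strategies above, sorting on a single index and sorting on a weighted score, can be sketched as follows. The `InstanceStats` fields cover only three of the indexes listed above, and the weights are arbitrary illustrative values; all names here are hypothetical, and a real implementation would normalize indexes to comparable scales before weighting.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceStats:
    name: str
    integral: float       # current integral value
    throttle_count: int   # burst throttling count over the look-back window
    queued_vcpus: int     # number of queued vCPUs


def rank_by_index(instances: List[InstanceStats], index: str) -> List[str]:
    # Single-index ordering: a larger value of the chosen index means higher priority.
    return [i.name for i in sorted(instances, key=lambda i: getattr(i, index), reverse=True)]


def rank_by_weighted_score(instances: List[InstanceStats], weights: Dict[str, float]) -> List[str]:
    # Weighted ordering: score = sum(weight * index value); higher score first.
    # In practice the indexes would need normalizing to comparable scales first.
    def score(i: InstanceStats) -> float:
        return sum(w * getattr(i, k) for k, w in weights.items())
    return [i.name for i in sorted(instances, key=score, reverse=True)]


vms = [InstanceStats("vm1", integral=10.0, throttle_count=1, queued_vcpus=2),
       InstanceStats("vm2", integral=2.0, throttle_count=5, queued_vcpus=1)]
print(rank_by_index(vms, "integral"), rank_by_index(vms, "throttle_count"))  # ['vm1', 'vm2'] ['vm2', 'vm1']
print(rank_by_weighted_score(vms, {"integral": 1.0, "throttle_count": 1.0}))  # ['vm1', 'vm2']
```

The two single-index orderings disagree here, which is one reason to combine indexes into a weighted score.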
In step 903, the server performs scheduling according to the prioritization of multiple instances requesting the same resource.
Based on the priority ordering, the server may determine the scheduling order, which instance or instances may burst, the burst size of each instance, and so on. For example, the server may schedule only the highest-priority instance to burst, several higher-priority instances to burst, or all instances to burst.
Wherein the burst size of an instance may be determined based on one or more of the integration of the instance, the number of resources remaining, the number of burst instances, instances that have not reached a quota value, and the like.
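One simple way to combine an instance's integral with the amount of remaining resources is a greedy split in priority order, sketched below. It assumes each instance's integral has already been converted into extra run time in milliseconds; the greedy policy and the name `burst_sizes` are illustrative assumptions, not the allocation rule of this application.

```python
from typing import Dict, List, Tuple


def burst_sizes(by_priority: List[Tuple[str, float]], remaining_ms: float) -> Dict[str, float]:
    """Greedy split of leftover CPU time among bursting instances.

    by_priority: (instance, integral budget expressed as extra run time in ms),
    highest priority first. Each instance may exceed its quota by at most its
    integral budget, and the grants together never exceed remaining_ms.
    """
    sizes: Dict[str, float] = {}
    left = remaining_ms
    for name, budget in by_priority:
        grant = max(0.0, min(budget, left))
        sizes[name] = grant
        left -= grant
    return sizes


# Hypothetical numbers: 30 ms left in the period (ticks 8 to 10 of fig. 6).
print(burst_sizes([("vm1", 25.0), ("vm2", 10.0)], 30.0))  # {'vm1': 25.0, 'vm2': 5.0}
```

The higher-priority instance is served up to its integral budget first, and the lower-priority one receives whatever time remains.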
For example, referring to fig. 6 and fig. 8, assume that the second time period is 100 ms long, the quota of virtual machine 1 in the second time period is 50 ms, and the quota of virtual machine 2 is 20 ms; as shown in fig. 6, the second time period includes 10 clock ticks of 10 ms each.
Illustratively, the scheduling period includes three phases, respectively:
(1) In the first four clock ticks, virtual machine 1 and virtual machine 2 split the CPU time equally, each running for 20 ms. By the end of stage one, the quota of virtual machine 2 is exhausted while that of virtual machine 1 is not. As seen in fig. 8, in stage 1 the CPU utilization of virtual machine 1 is below the quota. Note that the CPU utilization curve in fig. 8 is not calculated from the running time of virtual machine 1 in fig. 6; it only illustrates the relationship between CPU utilization and quota.
(2) After the fourth clock tick, virtual machine 1 continues to run for 30 ms; after the seventh clock tick, its quota is exhausted. As seen at moment T3 in fig. 8, the CPU utilization of virtual machine 1 reaches the quota.
(3) When the eighth clock tick arrives, in one possible scenario virtual machine 1 is still requesting resources (that is, it has a burst demand) while virtual machine 2 has none. The server may then schedule virtual machine 1 to run beyond its quota based on its integral value at T3, with the total amount by which it breaks through the quota (for example, the integral consumed in stage 2 of fig. 8) not exceeding its integral value at T3.
In another possible scenario, both virtual machine 1 and virtual machine 2 have burst demands. The server determines the priority ordering of virtual machine 1 and virtual machine 2 as described above. If virtual machine 1 has the higher priority, then at the eighth clock tick the server either schedules only virtual machine 1, or schedules virtual machine 1 first and then virtual machine 2, the two sharing the 10 ms of the eighth clock tick.
In this way, instances whose normal CPU load is low and that have yielded CPU resources for a long time can be given higher priority and run first, making scheduling as fair as possible.
Based on the above examples, this application provides a co-location (mixed deployment) scenario in which priorities can be set according to the resource quotas of instances, and instances of different priorities, such as low-priority, medium-priority, and high-priority instances, are deployed on one server. The resource quotas of these three kinds of instances may be adjusted dynamically in the manner shown in fig. 4, but each adjusted quota stays within the threshold range corresponding to the instance's priority.
For example, a low-priority instance has no guaranteed minimum quota; the quota of a low-priority instance may be 0.
The resource quota of a medium-priority instance is guaranteed a minimum of 20%, that is, its quota is at least 20%;
the resource quota of a high-priority instance is guaranteed a minimum of 30% and may reach a maximum of 100%.
This co-location scenario can reduce resource contention among instances. Note that the priority levels above are only examples; this application supports more or fewer priority levels, and the threshold ranges for the different priorities are likewise only examples and are not limited in this application.
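Keeping a dynamically adjusted quota inside its priority's threshold range amounts to a clamp. The minimums below follow the example in the text (none, 20%, and 30%); the 100% maximum for the low and medium levels is an assumption, since the text states a maximum only for the high-priority level. `QUOTA_RANGES` and `clamp_quota` are hypothetical names.

```python
from typing import Dict, Tuple

# (minimum, maximum) quota as a fraction of the period. Minimums follow the
# text (low: none, medium: 20%, high: 30%); the 100% maximum for the low and
# medium levels is an assumption, since the text gives a maximum only for high.
QUOTA_RANGES: Dict[str, Tuple[float, float]] = {
    "low": (0.0, 1.0),
    "medium": (0.2, 1.0),
    "high": (0.3, 1.0),
}


def clamp_quota(level: str, predicted: float) -> float:
    # Keep the dynamically adjusted quota inside the range for the priority level.
    low, high = QUOTA_RANGES[level]
    return min(high, max(low, predicted))


print(clamp_quota("low", 0.0), clamp_quota("medium", 0.1), clamp_quota("high", 0.1))  # 0.0 0.2 0.3
```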
Based on the same inventive concept as the method embodiments, this application further provides a computing device for performing the method performed by the server in the method embodiment of fig. 4, fig. 7, or fig. 9. As shown in fig. 10, the computing device 1000 includes an acquisition module 1001, a prediction module 1002, and an adjustment module 1003, and may optionally further include an integration module 1004, a scheduling module 1005, and a determination module 1006. In the computing device 1000, the modules are connected to one another through communication paths.
An acquisition module 1001, configured to obtain index data of a first instance in the computing device, where the index data is generated by monitoring the first instance and the first instance is any instance in the computing device; for details, see step 401, which is not repeated here.
A prediction module 1002, configured to predict a resource demand of the first instance in a future time period according to the index data of the first instance; see step 402 for details, which are not described in detail herein.
An adjustment module 1003, configured to adjust an occupancy limit of the first instance on the resource according to the resource demand of the first instance. See step 403 for details, which are not described here.
In one possible implementation, the index data of the first instance includes one or more of the following:
resource usage information of the first instance, resource quality information of the first instance;
the resource usage information of the first instance indicates how the first instance uses one or more resources; the resource quality information of the first instance indicates the quality of one or more resources used by the first instance.
In one possible implementation, the resource usage information of the first instance includes information indicating an occupancy of processor resources by the first instance;
The resource quality information of the first instance includes one or more of:
scheduling delay, burst throttling count, burst throttling duration, and the number of queued vCPUs in the first instance.
In one possible implementation, the integration module 1004 is configured to determine an integral value of the first instance according to the integral information and the resource usage information of the first instance, where the integral value is used when the first instance's occupancy of the resource exceeds the occupancy limit and the first instance still has a resource usage demand.
In one possible implementation, the integral information of the first instance includes one or more of the following: the integral value of the first instance and the integral reference value, where the integral reference value is the dividing line between accumulating and consuming integrals.
In one possible implementation, the determination module 1006 is configured to determine a priority ordering of multiple instances in the computing device that request the same resource. The priority ordering is determined according to the integral value and/or the resource quality information of each of the instances, where the resource quality information of an instance indicates the quality of the resources it uses; the higher an instance's integral value, the higher its priority, and the lower its resource quality, the higher its priority. The multiple instances include instances whose occupancy of the resource has not reached the occupancy limit and/or instances whose occupancy of the resource has reached the occupancy limit and that have integrals;
The scheduling module 1005 is configured to schedule some or all of the multiple instances to run according to their priority order.
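The following is a minimal sketch of the ordering and scheduling steps. It rests on three assumptions:

- The two criteria are combined lexicographically, integral value first and resource quality second. The source says only "and/or", so this is one choice among several.
- Resource quality is a hypothetical scalar where lower means worse.
- Scheduling admits instances greedily, in priority order, while capacity remains.

`Candidate`, `eligible`, `prioritize`, and `schedule` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    occupancy: float  # current occupancy of the contended resource
    limit: float      # this instance's occupancy limit
    integral: float   # remaining integral value
    quality: float    # hypothetical score: lower means worse resource quality
    demand: float     # amount of the resource requested in this round


def eligible(c: Candidate) -> bool:
    """Below its limit, or at/over its limit but still holding integral."""
    return c.occupancy < c.limit or c.integral > 0


def prioritize(cands: list[Candidate]) -> list[Candidate]:
    """Higher integral first; ties broken by worse (lower) quality first."""
    return sorted((c for c in cands if eligible(c)),
                  key=lambda c: (-c.integral, c.quality))


def schedule(cands: list[Candidate], capacity: float) -> list[str]:
    """Admit instances in priority order while their demand still fits."""
    chosen = []
    for c in prioritize(cands):
        if c.demand <= capacity:
            chosen.append(c.name)
            capacity -= c.demand
    return chosen


pool = [
    Candidate("a", 0.3, 0.5, integral=10, quality=0.9, demand=1.0),
    Candidate("b", 0.5, 0.5, integral=0, quality=0.2, demand=1.0),  # at limit, no integral
    Candidate("c", 0.6, 0.5, integral=20, quality=0.5, demand=1.0),
    Candidate("d", 0.1, 0.5, integral=10, quality=0.4, demand=1.0),
]
print([c.name for c in prioritize(pool)])  # ['c', 'd', 'a']
print(schedule(pool, capacity=2.0))        # ['c', 'd']
```

A weighted score over both criteria would be an equally valid reading. The lexicographic key is used here only because it keeps the two rules in the text easy to see.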
In one possible implementation, suppose the instances scheduled to run include the first instance, and the occupancy of the resource by the first instance has reached its occupancy limit. In that case, the determining module 1006 is further configured to determine a burst amount of the first instance according to the integral value of the first instance and the amount of remaining resources. The burst amount is the portion of the first instance's occupancy of the resource that exceeds its occupancy limit.
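The text states only that the burst amount depends on the integral value and on the remaining resources. The sketch below assumes one unit of integral pays for one unit of the resource for one second; this conversion is hypothetical. Under that assumption, the integral sustains `integral / period_s` units over the next period, capped by what the host has left. The function `burst_amount` is illustrative.

```python
def burst_amount(integral: float, remaining: float, period_s: float) -> float:
    """Amount of resource the instance may use above its occupancy limit.

    Hypothetical rule: one unit of integral pays for one unit of the
    resource for one second, so the integral can sustain integral / period_s
    units over the next period_s seconds, capped by what the host has left.
    """
    if integral <= 0 or remaining <= 0 or period_s <= 0:
        return 0.0
    return min(remaining, integral / period_s)


print(burst_amount(30.0, remaining=2.0, period_s=60))   # 0.5  (integral-bound)
print(burst_amount(300.0, remaining=2.0, period_s=60))  # 2.0  (capacity-bound)
```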
In one possible implementation, at least a first instance and a second instance run in the computing device. The occupancy quota of the first instance lies within a first threshold range, and the occupancy quota of the second instance lies within a second threshold range. The first threshold range is different from the second threshold range.
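The method predicts each instance's resource demand from its index (metric) data and then adjusts its occupancy limit. This section does not name a predictor. The sketch below swaps in an exponentially weighted moving average as a stand-in. It also treats the occupancy quota as the adjusted occupancy limit, which is an assumption. The `headroom` factor and the function names are hypothetical. The result is clamped to each instance's own threshold range.

```python
def ewma_forecast(samples: list[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average of past demand samples."""
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate


def adjust_limit(samples: list[float], threshold_range: tuple[float, float],
                 headroom: float = 1.2, alpha: float = 0.5) -> float:
    """New occupancy limit: forecast demand plus headroom, clamped to the
    instance's own threshold range."""
    low, high = threshold_range
    target = ewma_forecast(samples, alpha) * headroom
    return max(low, min(high, target))


# Two instances with different threshold ranges.
print(adjust_limit([0.25, 0.75], (0.1, 0.5)))  # 0.5  (clamped to the upper bound)
print(adjust_limit([0.05], (0.2, 0.4)))        # 0.2  (clamped to the lower bound)
```

Giving each instance its own range means the same forecast can yield different limits for different instances.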
The implementation of the adjustment module 1003 in the computing device 1000 is described below as an example. The acquisition module 1001, the prediction module 1002, the integration module 1004, the scheduling module 1005, and the determination module 1006 may be implemented in the same way.
When implemented in software, the adjustment module 1003 may be an application or a code block running on a computer device. The computer device may be at least one of a physical host, a virtual machine, a container, and the like, and there may be one or more such computer devices. For example, the adjustment module 1003 may be an application running on multiple hosts, virtual machines, or containers. These hosts, virtual machines, or containers may be distributed in the same availability zone (AZ) or in different AZs. They may also be distributed in the same region or in different regions. A region typically includes multiple AZs.
Likewise, the hosts, virtual machines, or containers running the application may be distributed in the same virtual private cloud (VPC) or across multiple VPCs. A region typically includes multiple VPCs, and a VPC may include multiple AZs.
When implemented in hardware, the adjustment module 1003 may include at least one computing device, such as a server. Alternatively, the adjustment module 1003 may be a device implemented using an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the adjustment module 1003 may be distributed in the same AZ or in different AZs, and in the same region or in different regions. They may also be distributed in the same VPC or across multiple VPCs. The multiple computing devices may be any combination of servers, ASICs, PLDs, CPLDs, FPGAs, GALs, and the like.
It should be noted that the division into modules in the embodiments of this application is schematic and is merely a logical functional division; other divisions are possible in an actual implementation. The functional modules may be integrated into one module, may each exist physically on their own, or two or more of them may be integrated into one module. For example, the first receiving module and the second receiving module may be integrated into one module or may be the same module. Similarly, the first determination module and the second determination module may be integrated into one module or may be the same module. An integrated module may be implemented in the form of hardware or in the form of a software functional unit.
The present application further provides a computing device 1100. As shown in fig. 11, the computing device 1100 includes a bus 1102, a processor 1104, a memory 1106, and a communication interface 1108. The processor 1104, the memory 1106, and the communication interface 1108 communicate with one another over the bus 1102. The computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1100.
The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in fig. 11, but this does not mean that there is only one bus or only one type of bus. The bus 1102 may include a path for transferring information between the components of the computing device 1100 (for example, the memory 1106, the processor 1104, and the communication interface 1108).
The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1106 may include volatile memory, such as random access memory (RAM). The memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1106 stores executable program code. The processor 1104 executes this code to implement the functions of the acquisition module 1001, the prediction module 1002, and the adjustment module 1003, and optionally the functions of the integration module 1004, the scheduling module 1005, and the determination module 1006. This is how the resource management method is implemented. In other words, the memory 1106 stores instructions used by the computing device 1000 to perform the resource management method provided in this application.
Communication interface 1108 enables communication between computing device 1100 and other devices or communication networks using transceiver modules such as, but not limited to, network interface cards, transceivers, and the like.
An embodiment of this application further provides a computing device cluster. The computing device cluster includes at least one computing device, which may be a server. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop, or a smartphone.
As shown in fig. 12, the computing device cluster includes at least one computing device 1100. The memory 1106 of one or more computing devices 1100 in the cluster may store the same instructions for performing the resource management method.
In some possible implementations, the memory 1106 of each of one or more computing devices 1100 in the cluster may instead store part of the instructions for performing the resource management method. In other words, a combination of one or more computing devices 1100 may jointly execute the instructions for performing the resource management method.
It should be noted that the memories 1106 of different computing devices 1100 in the cluster may store different instructions, each for performing part of the functions of the computing device. That is, the instructions stored in the memories 1106 of different computing devices 1100 may respectively implement the functions of the acquisition module 1001, the prediction module 1002, and the adjustment module 1003. Optionally, they may also implement the functions of the integration module 1004, the scheduling module 1005, and the determination module 1006.
In some possible implementations, one or more computing devices in the cluster may be connected through a network, such as a wide area network or a local area network. Fig. 13 shows one possible implementation. In it, two computing devices 1100A and 1100B are connected to the network through their respective communication interfaces. The memory 1106 of the computing device 1100A stores instructions for performing the functions of the acquisition module 1001 and the adjustment module 1003. The memory 1106 of the computing device 1100B stores instructions for performing the functions of the prediction module 1002.
It should be understood that the functions of the computing device 1100A shown in fig. 13 may also be performed by multiple computing devices 1100. Likewise, the functions of the computing device 1100B may also be performed by multiple computing devices 1100.
An embodiment of this application further provides another computing device cluster. The connections between the computing devices in this cluster may be similar to those of the computing device clusters described with reference to fig. 12 and fig. 13. The difference is that the memory 1106 of one or more computing devices 1100 in this cluster may store the same instructions for performing the resource management method.
In some possible implementations, portions of the instructions for performing the resource management method may also be stored separately in the memory 1106 of one or more computing devices 1100 in the cluster of computing devices. In other words, a combination of one or more computing devices 1100 may collectively execute instructions for performing the resource management method.
An embodiment of this application further provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any usable medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to perform the resource management method.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may be any usable medium that a computing device can access. It may also be a data storage device, such as a data center, that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive), or the like. The computer-readable storage medium includes instructions that instruct a computing device to perform the resource management method.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code; this is not specifically limited in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. Transmission may be wired (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or wireless (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state drive (SSD)), or the like.
The various illustrative logical blocks and circuits described in the embodiments of the present application may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments of the present application may be embodied directly in hardware, in a software element executed by a processor, or in a combination of the two. The software elements may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the present application. It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to include such modifications and variations as well.

Claims (19)

1. A method of resource management, comprising:
acquiring index data of a first instance in a computing device, wherein the index data is data generated by monitoring the first instance, and the first instance is any instance in the computing device;
predicting a resource demand of the first instance in a future time period according to the index data of the first instance; and
adjusting the occupancy limit of the first instance on the resource according to the resource demand of the first instance.
2. The method according to claim 1, wherein the index data of the first instance comprises one or more of:
resource usage information of the first instance;
resource quality information of the first instance;
wherein the resource usage information of the first instance indicates the usage of one or more resources by the first instance, and the resource quality information of the first instance indicates the quality of one or more resources used by the first instance.
3. The method according to claim 2, wherein the resource usage information of the first instance comprises information indicating the occupancy of resources by the first instance;
the resource quality information of the first instance comprises one or more of:
scheduling delay, the number of burst throttling events, the burst throttling duration, and the number of queued vCPUs in the first instance.
4. The method according to any one of claims 1 to 3, further comprising:
determining an integral value of the first instance according to integral information of the first instance and the resource usage information of the first instance, wherein the integral value is used when the occupancy of the resource by the first instance exceeds the occupancy limit.
5. The method according to claim 4, wherein the integral information of the first instance comprises one or more of:
the integral value of the first instance and an integral reference value;
wherein the integral reference value is the ratio of the accumulated integral to the consumed integral.
6. The method according to claim 4 or 5, further comprising:
determining a priority order of multiple instances in the computing device that request the same resource, wherein the priority order is determined according to the integral value and/or the resource quality information of each of the multiple instances, the resource quality information of an instance indicates the quality of the resource used by the instance, a higher integral value gives an instance a higher priority, and a lower resource quality gives an instance a higher priority; and the multiple instances comprise an instance whose occupancy of the resource has not reached its occupancy limit and/or an instance whose occupancy of the resource has reached its occupancy limit and that still has integral remaining;
scheduling some or all of the multiple instances to run according to the priority order of the multiple instances.
7. The method according to claim 6, wherein, if the some or all of the instances comprise the first instance and the occupancy of the resource by the first instance has reached the occupancy limit of the first instance on the resource,
the method further comprises:
determining a burst amount of the first instance according to the integral value of the first instance and the amount of remaining resources, wherein the burst amount is the portion of the first instance's occupancy of the resource that exceeds the occupancy limit of the first instance on the resource.
8. The method according to any one of claims 1 to 7, wherein at least a first instance and a second instance run in the computing device;
the occupancy quota corresponding to the first instance is in a first threshold range;
the occupancy quota corresponding to the second instance is in a second threshold range;
the first threshold range is different from the second threshold range.
9. A computing device, comprising:
an acquisition module, configured to acquire index data of a first instance in a computing device, wherein the index data is data generated by monitoring the first instance, and the first instance is any instance in the computing device;
a prediction module, configured to predict a resource demand of the first instance in a future time period according to the index data of the first instance; and
an adjustment module, configured to adjust the occupancy limit of the first instance on the resource according to the resource demand of the first instance.
10. The apparatus according to claim 9, wherein the index data of the first instance comprises one or more of:
resource usage information of the first instance;
resource quality information of the first instance;
wherein the resource usage information of the first instance indicates the usage of one or more resources by the first instance, and the resource quality information of the first instance indicates the quality of one or more resources used by the first instance.
11. The apparatus according to claim 10, wherein the resource usage information of the first instance comprises information indicating the occupancy of processor resources by the first instance;
the resource quality information of the first instance comprises one or more of:
scheduling delay, the number of burst throttling events, the burst throttling duration, and the number of queued vCPUs in the first instance.
12. The apparatus according to any one of claims 9 to 11, further comprising an integration module;
the integration module is configured to determine an integral value of the first instance according to integral information of the first instance and the resource usage information of the first instance, wherein the integral value is used when the occupancy of the resource by the first instance exceeds the occupancy limit.
13. The apparatus according to claim 12, wherein the integral information of the first instance comprises one or more of:
the integral value of the first instance and an integral reference value;
wherein the integral reference value is the ratio of the accumulated integral to the consumed integral.
14. The apparatus according to claim 12 or 13, further comprising a determining module and a scheduling module;
the determining module is configured to determine a priority order of multiple instances in the computing device that request the same resource, wherein the priority order is determined according to the integral value and/or the resource quality information of each of the multiple instances, the resource quality information of an instance indicates the quality of the resource used by the instance, a higher integral value gives an instance a higher priority, and a lower resource quality gives an instance a higher priority; and the multiple instances comprise an instance whose occupancy of the resource has not reached its occupancy limit and/or an instance whose occupancy of the resource has reached its occupancy limit and that still has integral remaining;
the scheduling module is configured to schedule some or all of the multiple instances to run according to the priority order of the multiple instances.
15. The apparatus according to claim 14, wherein, if the some or all of the instances comprise the first instance and the occupancy of the resource by the first instance has reached the occupancy limit of the first instance on the resource,
the determining module is further configured to determine a burst amount of the first instance according to the integral value of the first instance and the amount of remaining resources, wherein the burst amount is the portion of the first instance's occupancy of the resource that exceeds the occupancy limit of the first instance on the resource.
16. The apparatus according to any one of claims 9 to 15, wherein at least a first instance and a second instance run in the computing device;
the occupancy quota corresponding to the first instance is in a first threshold range;
the occupancy quota corresponding to the second instance is in a second threshold range;
the first threshold range is different from the second threshold range.
17. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1 to 8.
18. A computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method of any of claims 1 to 8.
19. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1 to 8.
CN202211177480.XA 2022-09-26 2022-09-26 Resource management method and device Pending CN117806768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211177480.XA CN117806768A (en) 2022-09-26 2022-09-26 Resource management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211177480.XA CN117806768A (en) 2022-09-26 2022-09-26 Resource management method and device

Publications (1)

Publication Number Publication Date
CN117806768A true CN117806768A (en) 2024-04-02

Family

ID=90433978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211177480.XA Pending CN117806768A (en) 2022-09-26 2022-09-26 Resource management method and device

Country Status (1)

Country Link
CN (1) CN117806768A (en)

Legal Events

Date Code Title Description
PB01 Publication