WO2022222975A1 - Load processing method, computing node, computing node cluster, and related device - Google Patents

Load processing method, computing node, computing node cluster, and related device

Info

Publication number
WO2022222975A1
Authority
WO
WIPO (PCT)
Prior art keywords
load
computing node
physical resources
computing
physical
Prior art date
Application number
PCT/CN2022/088019
Other languages
English (en)
French (fr)
Inventor
郭雷
比加利大卫
胡昊然
柯晓棣
彭骞
杨晔
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Publication of WO2022222975A1 publication Critical patent/WO2022222975A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Definitions

  • the present application relates to the technical field of cloud computing, and in particular, to a load processing method, a computing node, a computing node cluster and related equipment.
  • a computing node cluster, such as a public cloud system or a private cloud system, usually includes multiple physical computing nodes and a resource management/scheduling node. Each computing node can run loads belonging to one or more tenants, and a load can run in the form of a virtual machine (VM), container, or process; the resource management/scheduling node manages the resources on each computing node and schedules those resources to run the loads on that node.
  • in practical application scenarios, the data processing capability of the resource management/scheduling node in a computing node cluster is usually limited. Once the number of computing nodes reaches a certain level, it becomes difficult for the scheduling node to sustain its node-management and resource-scheduling performance, which limits the scale of the computing node cluster.
  • the present application provides a load processing method for increasing the scale of a computing node cluster.
  • the present application also provides a computing node, a computing node cluster, a computer-readable storage medium, and a computer program product.
  • the present application provides a load processing method applied to a first computing node, where the first computing node is connected to other computing nodes and all of them belong to the same computing node cluster. When executing the method, the first computing node determines the physical resources available on itself and on the other computing nodes, and receives a load running request that asks for physical resources to run a target load. The first computing node then judges whether the physical resources available on it satisfy the physical resources requested by the load running request: if they do, the requested physical resources are allocated to the target load from the physical resources available on the first computing node; if they do not, a second computing node whose available physical resources satisfy the request is selected from the other computing nodes, and the load running request is forwarded to the second computing node.
  • the first computing node performs fine-grained physical-resource scheduling for the target load, and when the physical resources available on it do not meet those required to create the target load, it can forward the load running request to another computing node with sufficient physical resources, so that the other computing node can create the target load with its own physical resources.
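  • the decision flow summarized above can be sketched as follows. All function and field names are illustrative assumptions; the patent does not prescribe any concrete data structures or APIs:

```python
def handle_load_request(first_node, other_nodes, requested):
    """Sketch of the first computing node's handling of a load running request.

    `first_node` and each entry of `other_nodes` are dicts with a "name" and an
    "available" resource dict; `requested` holds the resources the load running
    request asks for. These structures are purely illustrative.
    """
    def satisfies(available, needed):
        return all(available.get(k, 0) >= v for k, v in needed.items())

    if satisfies(first_node["available"], requested):
        # Allocate locally: deduct the requested resources on this node.
        for k, v in requested.items():
            first_node["available"][k] -= v
        return first_node["name"]

    # Otherwise select a second computing node with sufficient resources
    # and (conceptually) forward the load running request to it.
    for node in other_nodes:
        if satisfies(node["available"], requested):
            for k, v in requested.items():
                node["available"][k] -= v
            return node["name"]
    return None  # no node in the cluster can host the load
```

If no node in the cluster (neither the first computing node nor its peers) has sufficient resources, the request cannot be satisfied, which the sketch signals by returning `None`.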
  • as a result, the scheduling system only needs to instruct the first computing node to perform physical-resource scheduling for the target load. It does not need to perform a complex computing process to determine the specific physical resources allocated to the load, nor does it need to determine whether the first computing node that receives the load running request has enough physical resources to create the load (the first computing node computes and determines this automatically). This reduces the amount of computation and the difficulty of scheduling for the scheduling system, so that the scheduling system can manage and schedule a larger number of computing nodes, that is, the scale of the computing node cluster can be increased.
  • the first computing node is connected to the scheduling system, and when the first computing node receives the load running request, it may specifically receive the load running request sent by the scheduling system. In this way, the first computing node creates and runs the target load under the scheduling of the scheduling system.
  • the scheduling system can first obtain the specifications of the virtual instance to be created specified by the tenant.
  • the scheduling system can provide a configuration interface, allowing the tenant to log in to the configuration interface remotely, and enter the type and specification of the virtual instance to be created in the configuration interface.
  • at this time, the scheduling system does not need to compute which computing node in the cluster has idle resources suitable for the specification. Instead, it directly takes the first computing node as the default computing node and sends it a load running request indicating the specification, where the load running request asks for physical resources matching the specification of the target load (that is, the virtual instance). The first computing node is then responsible for finding the computing node with suitable idle resources in the cluster (which may be the first computing node itself). The scheduling work is thus done by the first computing node, and the scheduling system avoids performing the scheduling operation directly, which reduces the amount of computation and the difficulty of scheduling for the scheduling system.
  • the virtual instance is, for example, a virtual machine or a container.
  • the scheduling system is set in the data center of the public cloud, and the cluster of computing nodes where the first computing node is located is set in the edge cloud data center that is remotely connected to the data center of the public cloud.
  • through the automatic scheduling performed by computing nodes in the edge cloud, the scale of the computing node cluster deployed in the edge cloud can be increased, and the scheduling system of the public cloud does not need to expend much scheduling computing power on the edge cloud.
  • alternatively, both the scheduling system and the computing node cluster where the first computing node is located are set in the data center of the public cloud. In this way, through the automatic scheduling performed by the first computing node, the scale of the computing node cluster deployed in the public cloud can be increased.
  • the scheduling system includes a virtual machine scheduling system and a container scheduling system.
  • the virtual machine scheduling system can generate a load running request for the virtual machine and send it to the first computing node
  • the container scheduling system may generate a load running request for the container and send it to the first computing node.
  • after allocating the physical resources requested by the load running request to the target load from the physical resources available on the first computing node, the first computing node may send response information to the scheduling system, the response information being used to inform the scheduling system that the physical resources requested by the load running request have been deducted on the first computing node. The scheduling system can then update its record of the physical resources available on the first computing node, so that when it requests the creation of the next load, it can determine, based on the physical resources required by that load and the updated record, whether the first computing node can schedule enough physical resources to support creating and running it.
  • before receiving the load running request, the first computing node may send the total amount of available physical resources to the scheduling system, where the total amount is the sum of the amount of physical resources available on the first computing node and the amounts of physical resources available on the other computing nodes. In this way, the scheduling system can determine, from the total reported by the first computing node, whether there are enough physical resources to create the target load.
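  • as a rough illustration of this reporting step, the total that the first computing node could send to the scheduling system is the per-node difference between total and used resources, summed over itself and its peers (all field names are hypothetical):

```python
def total_available(nodes):
    """Aggregate the total the first computing node could report to the
    scheduling system: per node, available = total - used, then summed
    across the node itself and the other computing nodes.
    """
    summed = {}
    for node in nodes:
        for kind, total in node["total"].items():
            free = total - node["used"].get(kind, 0)
            summed[kind] = summed.get(kind, 0) + free
    return summed
```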
  • when receiving the load running request, the first computing node may specifically receive a load running request forwarded by a third computing node in the computing node cluster, where the physical resources available on the third computing node do not satisfy the physical resources requested by the load running request. In this way, the first computing node can not only forward load running requests to other computing nodes, but can also receive a load running request forwarded by another computing node whose available physical resources are insufficient, and create the corresponding load with its own available physical resources.
  • the scheduling system can first obtain the specifications of the virtual instance to be created specified by the tenant.
  • the scheduling system can provide a configuration interface, allowing the tenant to log in to the configuration interface remotely and enter the type and specification of the virtual instance to be created. At this time, the scheduling system does not need to compute which computing node in the cluster has idle resources suitable for the specification; it directly takes the third computing node as the default computing node and sends it a load running request indicating the specification, where the load running request asks for physical resources matching the specification of the target load (that is, the virtual instance). The third computing node is then responsible for finding the computing node with suitable idle resources in the cluster (which may be the third computing node itself), so the scheduling work is completed by the third computing node, and the scheduling system avoids direct scheduling operations, reducing the amount of computation and the difficulty of scheduling for the scheduling system.
  • in this case, the computing node with suitable idle resources found by the third computing node is the first computing node, and the third computing node therefore forwards the load running request to the first computing node.
  • the load running request received by the first computing node may carry the type of the target load, where the type includes virtual machine and container, and the first computing node can determine whether the type of the target load to be created is a virtual machine or a container. If the physical resources available on the first computing node satisfy the physical resources requested by the load running request, then when the type of the target load is a virtual machine, a virtual machine is created on the first computing node from the allocated physical resources, and when the type of the target load is a container, a container is created on the first computing node from the allocated physical resources. In this way, multiple loads of different types can be created on the first computing node.
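  • the type dispatch can be pictured as below; `create_vm` and `create_container` are stand-ins for whatever hypervisor or container runtime the node actually uses, and are not named in the application:

```python
def create_vm(resources):
    # Placeholder for handing the allocated resources to a hypervisor.
    return {"kind": "vm", "resources": resources}

def create_container(resources):
    # Placeholder for handing the allocated resources to a container runtime.
    return {"kind": "container", "resources": resources}

def create_load(load_type, resources):
    """Dispatch on the load type carried in the load running request."""
    if load_type == "virtual_machine":
        return create_vm(resources)
    if load_type == "container":
        return create_container(resources)
    raise ValueError(f"unsupported load type: {load_type}")
```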
  • when determining the physical resources available on the other computing nodes, the first computing node may first collect the total amount of physical resources of the other computing nodes and the physical resources they have already used, and then determine the physical resources available on the other computing nodes from those two quantities.
  • specifically, the used physical resources can be deducted from the total physical resources of the other computing nodes; the remaining physical resources are the physical resources available on the other computing nodes.
  • in some implementations, the physical resource allocated to the target load from the physical resources available on the first computing node is a first physical resource; then, when the first computing node satisfies a rescheduling condition, it releases the first physical resource and re-allocates a second physical resource to the target load from the physical resources available on it.
  • the second physical resource is different from the first physical resource.
  • for example, the specification of the second physical resource may differ from that of the first physical resource, the resource types included in the second physical resource may differ from those included in the first physical resource, or the performance of the second physical resource may differ from that of the first physical resource.
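  • a minimal sketch of this rescheduling step, under the assumption that resources are tracked as simple identifiers in a per-node free list (the application does not define such a structure):

```python
def reschedule(node, load_id):
    """Release the first physical resource held by `load_id` and allocate a
    second, different one from the node's free pool, as in the rescheduling
    step described above. The data model is purely illustrative.
    """
    first = node["allocations"].pop(load_id)  # release the first resource
    node["free"].append(first)
    # Prefer any free resource different from the one just released.
    for i, candidate in enumerate(node["free"]):
        if candidate != first:
            node["allocations"][load_id] = node["free"].pop(i)
            return node["allocations"][load_id]
    # Fall back to the original resource if nothing different is free.
    node["allocations"][load_id] = node["free"].pop(node["free"].index(first))
    return first
```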
  • the present application provides a load processing apparatus, the load processing apparatus being applied to a first computing node, where the first computing node is connected to other computing nodes and the first computing node and the other computing nodes are set in the same computing node cluster.
  • the load processing apparatus includes: a resource management module, configured to determine the physical resources available on the first computing node and the physical resources available on the other computing nodes; a communication module, configured to receive a load running request, where the load running request is used to request physical resources for running a target load; and a scheduling module, configured to judge whether the physical resources available on the first computing node satisfy the physical resources requested by the load running request, to allocate the requested physical resources to the target load from the physical resources available on the first computing node when they do, and, when they do not, to select from the other computing nodes a second computing node whose available physical resources satisfy the request and forward the load running request to the second computing node.
  • the first computing node is connected to a scheduling system, and the communication module is specifically configured to receive the load running request sent by the scheduling system.
  • the scheduling system is set in a data center of a public cloud
  • the computing node cluster is set in an edge cloud data center that is remotely connected to the data center of the public cloud.
  • both the scheduling system and the computing node cluster are set in the data center of the public cloud.
  • the scheduling system includes a virtual machine scheduling system and a container scheduling system.
  • the communication module is further configured to send response information to the scheduling system after the physical resources requested by the load running request have been allocated to the target load from the physical resources available on the first computing node, where the response information is used to notify the scheduling system that the physical resources requested by the load running request have been deducted on the first computing node.
  • the communication module is further configured to send the total amount of available physical resources to the scheduling system before receiving the load running request, where the total amount of available physical resources includes the The resource amount of available physical resources on the first computing node and the sum of the resource amounts of available physical resources on the other computing nodes.
  • the communication module is specifically configured to receive the load running request forwarded by a third computing node in the computing node cluster, wherein the physical resources available on the third computing node The physical resource requested by the load running request is not satisfied.
  • the load running request carries the type of the target load, wherein the type of the target load includes a virtual machine and a container
  • the load processing apparatus further includes a control module, specifically configured to: judge whether the type of the target load is a virtual machine or a container; and, when the physical resources available on the first computing node satisfy the physical resources requested by the load running request, create a virtual machine on the first computing node from the allocated physical resources when the type of the target load is a virtual machine, and create a container on the first computing node from the allocated physical resources when the type of the target load is a container.
  • the resource management module is specifically configured to: collect the total amount of physical resources of the other computing nodes and the physical resources used by the other computing nodes; and determine the physical resources available on the other computing nodes according to that total amount and those used physical resources.
  • the scheduling module is further configured to: release the first physical resource when the first computing node satisfies the rescheduling condition, and re-allocate a second physical resource to the target load from the physical resources available on the first computing node, where the second physical resource is different from the first physical resource.
  • the present application provides a computing node, which includes a processor and a memory; the memory is used to store instructions, and when the computing node runs, the processor executes the instructions stored in the memory, so that the computing node performs the load processing method of the first aspect or any possible implementation of the first aspect.
  • the memory may be integrated in the processor, or may be independent of the processor.
  • the computing node may also include a bus, through which the processor is connected to the memory.
  • the memory may include read-only memory and random access memory.
  • the present application provides a computing node cluster, wherein the computing node cluster includes multiple computing nodes, and one or more computing nodes in the multiple computing nodes execute the first aspect or the first A load processing method in any possible implementation of the aspect.
  • the present application provides a computer-readable storage medium storing instructions which, when executed on a computer device, cause the computer device to perform the load processing method of the first aspect or any implementation of the first aspect.
  • the present application provides a computer program product comprising instructions, which, when run on a computer device, cause the computer device to perform the method described in the first aspect or any one of the implementations of the first aspect.
  • the implementations of the present application described above may be further combined to provide more implementations.
  • FIG. 1 is a schematic diagram of the architecture of a computing node cluster according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a computing node 200 according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of rescheduling physical resources on the computing node 200
  • FIG. 4 is a schematic structural diagram of another computing node 200 according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a load processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another load processing method provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a load processing apparatus 700 according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computing node 800 according to an embodiment of the present application.
  • the computing node cluster includes multiple computing nodes, such as computing node 200, computing node 300, and computing node 400, and the multiple computing nodes can have their resources managed and scheduled by the scheduling system 100.
  • the number of computing nodes included in the computing node cluster is not limited. In practical applications, each computing node can be implemented by a computing device such as a server.
  • the scheduling system 100 may run on one or more devices for managing the physical resources of one or more clusters of computing nodes.
  • the computing node 200 , the computing node 300 and the computing node 400 may periodically report their resource usage to the scheduling system 100 .
  • the computing node 200 can report to the scheduling system 100 the various physical resources within the computing node 200 (such as the number of processors it includes, the number of processor cores in each processor, and the size of its memory), as well as the physical resources allocated to the loads currently running on the computing node 200 (such as the processors and memory allocated to each load).
  • the computing node is, for example, a physical server, and it is worth noting that the computing node can also be any computer with a certain computing capability.
  • when the scheduling system 100 allocates physical resources for a load on the computing node 200, the computing node 300, or the computing node 400, it specifically allocates the physical resources on a particular computing node, such as the computing node 200 or the computing node 300.
  • the scheduling system 100 needs to specify which processor or processors (such as CPU, GPU, etc.) the load runs on in the computing node 200, and specifically specify which processor cores (cores) on the processor are responsible for running the load.
  • when the scheduling system 100 selects a target computing node among the plurality of computing nodes and schedules physical resources for the load on that target computing node, the required amount of computation becomes greater and scheduling the physical resources becomes more difficult as the number of computing nodes grows, which affects the scheduling speed. Because of this concern, the scale of the computing node cluster is limited to a certain extent.
  • the load is, for example, various virtual instances on the public cloud, such as virtual machines or containers.
  • the scheduling system may provide a configuration interface or application programming interface for obtaining the type of load (eg virtual machine or container) and specifications (eg number of CPU cores and memory size) entered by the tenant.
  • an embodiment of the present application provides a load processing method, so as to improve the speed of scheduling physical resources in a computing node cluster and increase the scale of the computing node cluster.
  • for example, when the scheduling system 100 schedules, for a load applied for by a tenant, physical resources matching the load specification specified by the tenant, it may rely on the total amount of available physical resources recorded by the computing node 200.
  • the computing node 200 has the scheduling capability of physical resources, and can pre-determine the physical resources available on the computing node 200 and the physical resources available on other computing nodes in the computing node cluster.
  • the sum of the amounts of physical resources available on the computing node 200 and on the computing node 300, the computing node 400, the computing node 500, and the computing node 600 is the total amount of available physical resources recorded by the computing node 200 mentioned above.
  • according to the received load running request, the computing node 200 can judge whether the physical resources available on it satisfy the physical resources requested by the request; if they do, the requested physical resources are allocated to the load from the available physical resources of the computing node 200, so that the computing node 200 can then run the load with the allocated physical resources.
  • the computing node 200 performs fine-grained physical-resource scheduling for the load, that is, the computing node 200 autonomously decides which processor and which processor cores run the load, what memory space is allocated to the load, and so on. Also, when the physical resources available on the computing node 200 do not meet those required to create a new load, the computing node 200 may forward the load running request to another computing node 300 with sufficient physical resources, so that the computing node 300 creates the load with its own physical resources.
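  • fine-grained scheduling of this kind might look as follows, assuming the node tracks its free cores as (processor, core) pairs and its free memory as a counter; both structures are illustrative, not taken from the application:

```python
def allocate_fine_grained(node, num_cores, mem_mb):
    """Sketch of the fine-grained scheduling described above: the node itself
    picks which processor cores and how much memory a load receives.

    `node` maps "free_cores" to a list of (processor_id, core_id) pairs and
    "free_mem_mb" to the remaining memory in MB.
    """
    if len(node["free_cores"]) < num_cores or node["free_mem_mb"] < mem_mb:
        return None  # not enough local resources; the request would be forwarded
    cores = [node["free_cores"].pop(0) for _ in range(num_cores)]
    node["free_mem_mb"] -= mem_mb
    return {"cores": cores, "mem_mb": mem_mb}
```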
  • the scheduling system 100 only needs to instruct the computing node 200 to perform physical-resource scheduling for the load. It does not need to perform a complex calculation process to determine the specific physical resources allocated to the load, nor does it need to determine whether the computing node 200 that receives the load running request has enough physical resources to create the load (the computing node 200 computes and determines this automatically). This reduces the amount of computation and the scheduling difficulty of the scheduling system 100, so that the scheduling system 100 can manage and schedule a larger number of computing nodes, that is, the scale of the computing node cluster is increased.
  • the computing node cluster shown in Figure 1 can be deployed as a public cloud or an edge cloud.
  • the deployment forms may include a public cloud with a public cloud data center as its core, an edge cloud with a base-station data center as its core, an enterprise cloud with a large enterprise data center as its core, a small edge cloud with a lightweight edge-site data center as its core, and so on.
  • the specific deployment form of the computing node cluster shown in FIG. 1 in an actual application scenario is not limited.
  • the scheduling system 100 and the computing node cluster may be deployed in the same cloud environment, for example, the scheduling system 100 and the computing node cluster may all be set in a data center of a public cloud.
  • the scheduling system 100 and the computing node cluster can be deployed separately.
  • for example, the scheduling system 100 can be set up in the data center of the public cloud, and the computing node cluster can be set up in an edge cloud data center that is remotely connected to the data center of the public cloud.
  • the number of computing node clusters may be multiple, and the scheduling system 100 may be respectively connected to multiple computing node clusters.
  • the scheduling system 100 sends a load running request to a computing node, and that computing node performs the scheduling operation within its computing node cluster, thereby selecting a computing node suitable for running the virtual instance. Since the scheduling system 100 does not need to implement complex scheduling operations, the number of computing node clusters that the scheduling system 100 can support is higher.
  • the scheduling system 100 may provide a configuration interface or an application programming interface for tenants to select a specific computing node cluster, such as edge cloud 1, edge cloud 2, etc.
  • edge cloud 1 represents the computing node cluster set in edge cloud data center 1
  • edge cloud 2 represents the computing node cluster set in edge cloud data center 2.
  • FIG. 2 is a schematic structural diagram of a computing node 200 in the computing node cluster provided in an embodiment of the present application.
  • the computing node 200 may include a resource management module 201 , a communication module 202 , a scheduling module 203 , and a control module 204 .
  • the computing node 200 may be connected with other computing nodes to enable data communication with other computing nodes based on the connection.
  • other computing nodes may refer to one or more computing nodes other than computing node 200 in the computing node cluster, such as computing node 300 and computing node 400 in FIG. 2 .
  • the scheduling system 100 may be located outside the computing node cluster, as shown in FIG. 2 , and in other possible implementations, the scheduling system 100 may also be located inside the computing node cluster. This embodiment does not limit this.
  • the resource management module 201 is used to manage the physical resources available on the computing node 200, and the physical resources may include, for example, computing resources, storage resources, and network resources.
  • the computing resources include processors and memory, where the processors may be, for example, central processing units (CPUs) or graphics processing units (GPUs), and each processor may include one or more processor cores.
  • the storage resources may be, for example, cloud disks.
  • the network resources may be, for example, the link bandwidth, network ports, elastic public IP addresses, elastic network interface cards, etc. used for data communication.
  • the physical resources available on the computing node 200 may refer to the remaining unallocated physical resources on the computing node 200 .
  • the resource management module 201 can collect the total amount of physical resources inside the computing node 200 and record the physical resources that have been allocated to the loads running on the computing node 200, so that the resource management module 201 can determine the physical resources on the computing node 200 that are not currently allocated to any load, hereinafter referred to as available physical resources.
  • the available physical resources determined by the resource management module 201 may also include physical resources that have been allocated to the load but are not used by the load during operation. In this way, the physical resources that are not used by the load temporarily can be scheduled to other loads for multiplexing within a certain period of time. In this way, resource utilization on the computing node 200 can be improved.
  • the resource management module 201 may also collect physical resources available on other computing nodes based on the communication connection between the computing node 200 and other computing nodes. Similar to the implementation of determining the physical resources available on the computing node 200, the resource management module 201 can collect the total amount information of the physical resources in the computing node 300 and the computing node 400 in other computing nodes and the information of the allocated physical resources, Therefore, the resource management module 201 can respectively determine the available physical resources in the computing node 300 and the available physical resources in the computing node 400 according to the total amount of physical resources in each computing node and the information about the allocated physical resources.
  • the computing node 300 and the computing node 400 may determine the available physical resources in advance according to the total amount of their own physical resources and the information about the physical resources that have been allocated, so that the resource management module 201 Through its communication connection with the computing node 300 and the computing node 400 , the information of the available physical resources determined on the computing node 300 and the information of the available physical resources determined on the computing node 400 are acquired.
  • the resource management module 201 can calculate the total amount of available physical resources according to the acquired physical resources available on the computing node 200 and the available physical resources on other computing nodes, and report the total available physical resources to the The scheduling system 100, for example, reports the total amount of processor cores currently remaining in the computing node 200, the total amount of remaining memory, the total amount of cloud disks, the total amount of available bandwidth, and the like.
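  • the accounting and reporting described above can be sketched as follows; the `ResourceManager` class, its field names, and the example quantities are illustrative assumptions and not part of the embodiment:

```python
# Sketch of per-node resource accounting and cluster-wide aggregation.
# Available = total - allocated, per resource type; the aggregate over
# this node and its peers is what gets reported to the scheduling system.

class ResourceManager:
    """Tracks total and allocated physical resources on one node."""

    def __init__(self, total):
        self.total = dict(total)              # e.g. {"cores": 32, "memory_gb": 128}
        self.allocated = {k: 0 for k in total}

    def available(self):
        # Remaining unallocated resources, per resource type.
        return {k: self.total[k] - self.allocated[k] for k in self.total}

def cluster_available(nodes):
    # Aggregate the available resources of this node and its peers,
    # producing the totals reported to the scheduling system.
    report = {}
    for node in nodes:
        for k, v in node.available().items():
            report[k] = report.get(k, 0) + v
    return report

node_200 = ResourceManager({"cores": 32, "memory_gb": 128})
node_300 = ResourceManager({"cores": 16, "memory_gb": 64})
node_200.allocated["cores"] = 12              # cores already given to loads
print(cluster_available([node_200, node_300]))  # {'cores': 36, 'memory_gb': 192}
```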
  • when the scheduling system 100 needs to create a new load (hereinafter referred to as a target load) for a tenant, it can determine the amount of physical resources required to create the target load according to the specification specified by the tenant, and when that amount is less than the total amount of available physical resources reported by the computing node 200, the scheduling system 100 may generate a load running request for the target load and send it to the computing node 200 to request the computing node 200 to create and run the target load using the corresponding physical resources.
  • the communication module 202 is configured to receive a load running request, where the load running request is used to request the computing node 200 to run the target load.
  • the load running request may include the type and priority of the target load, the type and quantity of physical resources required to run the target load, and the time period during which the target load runs.
  • the type of the target load may be, for example, a virtual machine, a container, and a process.
  • the priority of the target load can be used to indicate the priority of the computing node 200 to create the target load.
  • if the scheduling system 100 sends a load running request for the target load and load running requests for other loads to the computing node 200 at the same time, and the priority of the target load is higher than that of the other loads, the computing node 200 may preferentially allocate physical resources to the target load.
  • the load running request may also include any one or more of the above information.
  • the load running request may also include other information, such as alternative physical resources for running the load; for example, when the remaining GPU resources on the computing node 200 are insufficient to run the load, the computing node 200 can use CPU resources to support running the load instead. The communication module 202 then provides the received load running request to the scheduling module 203.
  • the scheduling module 203 can parse out information such as the type, priority, type and quantity of physical resources, and running time period of the target load, and query the resource management module 201 for the available physical resources of the current computing node 200. resource. Then, the scheduling module 203 further determines whether the queried available physical resources of the computing node 200 can meet the physical resources required for running the target load according to the information obtained by the analysis. If it can be satisfied, the scheduling module 203 may select the corresponding first physical resource from the physical resources available on the computing node 200 .
  • the scheduling module 203 may determine the order of allocating the first physical resource to the target load according to the priority level; when the load operation request indicates the type and quantity of the physical resources , the scheduling module 203 can select a corresponding category and a corresponding number of physical resources as the first physical resource; when the load running request indicates a running time period, the scheduling module 203 can select a physical resource that is not used by other loads during the running time period The resource acts as the first physical resource. Then, the scheduling module 203 allocates the selected first physical resource to the target load, for example, establishing an association relationship between the target load and the first physical resource.
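  • selecting a first physical resource that honors the requested quantity and running time period, as described above, can be sketched as follows; the core/request data shapes and the interval check are hypothetical illustrations, since the embodiment does not fix a data model:

```python
# Sketch: pick cores that are free for the load's entire running period.

def overlaps(a, b):
    """True if half-open time intervals a=(start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def select_cores(cores, request):
    """cores: list of dicts with an id and the intervals it is busy.
    request: needed core count plus the (start, end) running period.
    Returns the selected core ids, or None if the node cannot satisfy it."""
    period = request["period"]
    free = [c for c in cores
            if not any(overlaps(period, busy) for busy in c["busy"])]
    if len(free) < request["count"]:
        return None                       # insufficient available resources
    return [c["id"] for c in free[:request["count"]]]

cores = [
    {"id": "C1", "busy": [(0, 10)]},      # occupied by another load
    {"id": "C2", "busy": []},
    {"id": "C3", "busy": [(20, 30)]},     # busy later, free during (0, 10)
]
# Request 2 cores for the period (0, 10): C1 is busy then, C2 and C3 are free.
print(select_cores(cores, {"count": 2, "period": (0, 10)}))  # ['C2', 'C3']
```

A real scheduler would also weigh priority and resource type, as the request indicates; this sketch isolates only the quantity and time-period checks.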
  • the resource management module 201 may deduct the first physical resource that has been allocated to the target load from the determined available physical resources. Further, the resource management module 201 may also send response information to the scheduling system 100 through the communication module 202, where the response information is used to notify the scheduling system 100 that the physical resources requested by the load running request have been deducted on the computing node 200, thereby The scheduling system 100 may correspondingly deduct the total amount of available physical resources reported by the computing node 200 according to the resource amount of the physical resources requested by the load running request.
  • the scheduling module 203 may be pre-configured with a corresponding resource scheduling strategy, so that when scheduling physical resources for the target load, the scheduling module 203 selects an appropriate resource scheduling strategy according to the load operation request, so as to meet the requirements of the load operation request.
  • the first physical resource is scheduled for the target load from the available physical resources based on the resource scheduling policy.
  • the resource scheduling policy may be, for example, a balanced scheduling policy, that is, when scheduling the first physical resource for the target load, the physical resources that have been allocated to each load are distributed evenly on the computing nodes 200 .
  • for example, after the scheduling module 203 allocates some of the processor cores in processor 1 to load 1, even if there are still unallocated processor cores remaining in processor 1, the scheduling module 203 can allocate some of the processor cores in processor 2 to load 2 according to the balanced policy. In this way, part of the processor cores on both processor 1 and processor 2 are allocated to the loads on the computing node 200.
  • the resource scheduling policy may be, for example, a sequential scheduling policy, that is, when scheduling the first physical resource for the target load, the physical resources on the computing node 200 may be sequentially scheduled to the target load.
  • for example, after the scheduling module 203 allocates some of the processor cores in processor 1 to load 1, since there are still unallocated processor cores remaining in processor 1, the scheduling module 203 can preferentially allocate those remaining cores to load 2. If they are sufficient, the computing resource scheduling for load 2 ends; at this time, the allocation of processor cores between processor 1 and processor 2 is not balanced. If the remaining cores in processor 1 are insufficient, the scheduling module 203 may continue to allocate the processor cores in processor 2 to load 2.
  • the resource scheduling policy may also adopt other possible implementation manners, such as randomly selecting physical resources for scheduling, etc., which is not limited in this embodiment.
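  • the contrast between the balanced and sequential policies above can be sketched as follows; the free-core-count representation and function names are illustrative assumptions:

```python
# Sketch: allocate `need` cores from a node under two scheduling policies.

def allocate(processors, need, policy):
    """processors: mutable list of free-core counts, one entry per processor.
    Returns the number of cores taken from each processor."""
    taken = [0] * len(processors)
    for _ in range(need):
        if policy == "balanced":
            # Take the next core from the processor with the most free cores,
            # spreading the allocation evenly across processors.
            i = max(range(len(processors)), key=lambda j: processors[j])
        else:  # "sequential": drain processor 1 first, then processor 2, ...
            i = next(j for j in range(len(processors)) if processors[j] > 0)
        processors[i] -= 1
        taken[i] += 1
    return taken

# Two processors with 8 free cores each, a load needing 6 cores:
print(allocate([8, 8], 6, "balanced"))    # [3, 3] - spread evenly
print(allocate([8, 8], 6, "sequential"))  # [6, 0] - processor 1 drained first
```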
  • the computing node 200 may further include a control module 204, and the scheduling module 203 may send the information of the first physical resource to the control module 204 after allocating the first physical resource to the target load based on the load running request.
  • the control module 204 can use the first physical resource allocated by the scheduling module 203 to start running the target load, and the type of the target load is the type indicated by the load running request.
  • the scheduling system 100 may specifically be a virtual machine scheduling system, and the virtual machine scheduling system may send a load running request for creating a virtual machine to the computing node 200 .
  • the control module 204 can determine that the type of the target load is a virtual machine, so that when the physical resources available on the computing node 200 can satisfy the physical resources requested by the load running request, a virtual machine is created on the computing node 200 according to the allocated first physical resource (that is, the physical resources requested by the load running request).
  • the scheduling system 100 may specifically be a container scheduling system, and the container scheduling system may send a load running request for creating a container to the computing node 200 .
  • the control module 204 can determine that the type of the target load is a container, so that when the physical resources available on the computing node 200 can satisfy the physical resources requested by the load running request, a container is created on the computing node 200 according to the allocated first physical resource (that is, the physical resources requested by the load running request).
  • the scheduling system may simultaneously integrate the functions of the virtual machine scheduling system and the container scheduling system, which is not limited in this embodiment.
  • the control module 204 can also stop running the target load and release the first physical resources allocated to the target load.
  • the available physical resources determined by the resource management module 201 may include the first physical resource again.
  • the computing node 200 may further include a monitoring module 205, which is configured to monitor the physical resources currently used on the computing node 200 and collect resource usage data, where the resource usage data indicates the physical resources used by the loads running on the computing node 200 during a historical time period (such as within the past 24 hours).
  • the monitoring module 205 can predict, according to the resource usage data, the physical resources that have been allocated to loads running on the computing node 200 but that those loads may not use during a future time period (e.g., within the next 24 hours), so that the resource management module 201 can include these predicted-unused physical resources in the available physical resources, allowing that part of the physical resources to be subsequently allocated to the target load and used within a specified time period. In this way, the limited physical resources on the computing node 200 can support running a larger number of loads.
  • alternatively, the resource management module 201 may itself predict, according to the resource usage data collected by the monitoring module 205, the physical resources that will not be used by the loads for a period of time in the future, which is not limited in this embodiment.
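  • one minimal way to realize the prediction described above is a peak-usage heuristic: allocated resources that went unused throughout the history window are treated as lendable for the coming period. The heuristic itself is an assumption for illustration; the embodiment does not prescribe a particular prediction algorithm:

```python
# Sketch: predict idle allocated cores from a load's historical usage samples.

def predicted_idle(allocated, usage_samples):
    """allocated: cores given to the load.
    usage_samples: cores the load actually used, sampled over e.g. 24 hours.
    Returns the cores expected to stay idle in the next period."""
    peak = max(usage_samples) if usage_samples else 0
    return max(allocated - peak, 0)

# A load holding 8 cores that never used more than 5 in the past day is
# predicted to leave 3 cores idle; the resource management module can count
# those 3 cores as available for other loads during that period.
print(predicted_idle(8, [2, 4, 5, 3]))  # 3
```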
  • the scheduling module 203 can query the resource management module 201 for the available physical resources on other computing nodes, so that the scheduling module 203 can, according to the physical resources available on the computing node 300 and on the computing node 400 fed back by the resource management module 201, determine a computing node that can satisfy the physical resources requested by the load running request. Assuming that the physical resources available on the computing node 300 can satisfy the physical resources requested by the load running request, the scheduling module 203 may forward the load running request to the computing node 300 through the communication module 202.
  • the computing node 300 can allocate a corresponding first physical resource for the load running request from the physical resources available to itself, and create and run a target load by using the first physical resource.
  • the specific implementation manner in which the computing node 300 allocates the first physical resource and runs the target load according to the received load operation request is similar to the specific implementation manner in which the computing node 200 allocates the first physical resource and runs the target load, and can refer to the foregoing implementation. The relevant parts of the example are described, which will not be repeated here.
  • the computing node 200 may not only receive a load running request from the scheduling system 100, but may also receive a load running request from other computing nodes.
  • the scheduling system 100 may first send a load running request to the computing node 400 to request the computing node 400 to create a target load based on the load running request.
  • the computing node 400 may forward the load running request to the computing node 200 that can satisfy the physical resources requested by the load running request, so that the computing node 200 uses its own physical resources to create and run the target load based on the load running request.
  • the scheduling module 203 may reschedule the physical resources allocated to each load on the computing node 200 , so as to reduce the resource fragmentation rate on the computing node 200 .
  • the scheduling module 203 can improve the running performance of some loads by rescheduling the physical resources allocated to the multiple loads.
  • when the scheduling module 203 receives the load running request, it can allocate the above-mentioned first physical resource to the target load according to the request; when the computing node 200 satisfies a rescheduling condition, the scheduling module 203 can release the first physical resource and reallocate a second physical resource to the target load from the physical resources available on the computing node 200, where the second physical resource is different from the first physical resource.
  • for example, the specification of the second physical resource differs from that of the first physical resource, or the resource types included in the second physical resource differ from those included in the first physical resource, or the performance of the second physical resource differs from that of the first physical resource, etc.
  • for example, the computing node 200 includes two CPUs as shown in FIG. 3, each CPU includes two non-uniform memory access (NUMA) nodes, and each NUMA node includes eight processor cores.
  • the scheduling module 203 may allocate the processor cores C1 to C6 in the NUMA0 node to load 1, C9 to C16 in the NUMA1 node and C17 to C20 in the NUMA2 node to load 2, and C25 to C30 in the NUMA3 node to load 3.
  • the processor cores C7 to C8 on the NUMA0 node of the computing node 200, C21 to C24 on the NUMA2 node, and C31 to C32 on the NUMA3 node are the computing resource fragments in the computing node 200, as shown in FIG. 3 .
  • the scheduling module 203 can reschedule the physical resources allocated to load 2 and load 3. Specifically, the processor cores C17 to C24 on the NUMA2 node and C25 to C28 on the NUMA3 node can be allocated to load 3, and the processor cores C7 to C8 on the NUMA0 node and C29 to C32 on the NUMA3 node can be allocated to load 2. In this way, after the computing resources are rescheduled by the scheduling module 203, a complete free NUMA1 node remains on the computing node 200, and there are no computing resource fragments on the remaining NUMA nodes.
  • moreover, the new computing resources allocated to load 3 are all located on the same CPU, which eliminates cross-CPU communication between the processor cores allocated to load 3, thereby improving the runtime performance of load 3.
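  • the FIG. 3 example above can be sketched with per-NUMA-node free-core maps before and after rescheduling; the data layout and the helper below are illustrative assumptions that follow the description in the text:

```python
# Sketch: list computing resource fragments, i.e. free cores on NUMA nodes
# that are only partially allocated. A NUMA node that is entirely free is a
# whole reusable node, not a fragment.

def fragments(numa_nodes, cores_per_node=8):
    """numa_nodes: {name: set of free core ids}."""
    frags = {}
    for name, free in numa_nodes.items():
        if free and len(free) < cores_per_node:
            frags[name] = sorted(free)
    return frags

before = {
    "NUMA0": {"C7", "C8"},                 # C1-C6 -> load 1
    "NUMA1": set(),                        # C9-C16 -> load 2
    "NUMA2": {"C21", "C22", "C23", "C24"}, # C17-C20 -> load 2
    "NUMA3": {"C31", "C32"},               # C25-C30 -> load 3
}
after = {
    "NUMA0": set(),                        # C7-C8 -> load 2
    "NUMA1": {f"C{i}" for i in range(9, 17)},  # whole node freed
    "NUMA2": set(),                        # C17-C24 -> load 3
    "NUMA3": set(),                        # C25-C28 -> load 3, C29-C32 -> load 2
}
print(fragments(before))  # fragments on NUMA0, NUMA2 and NUMA3
print(fragments(after))   # {} - only the whole free NUMA1 node remains
```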
  • the scheduling module 203 may periodically reschedule physical resources to reduce physical resource fragments on the computing node 200; accordingly, the rescheduling condition satisfied by the computing node 200 may specifically be that the time interval since the last rescheduling reaches the rescheduling period length, etc.
  • the scheduling system 100 may also issue an instruction for rescheduling physical resources to the computing node 200, so that the scheduling module 203 executes the rescheduling process according to the instruction; correspondingly, the rescheduling condition satisfied by the computing node 200 may specifically be receiving the rescheduling instruction.
  • the scheduling module 203 may also monitor the resource amount of the physical resources available on the computing node 200 as determined by the resource management module 201, and actively perform the rescheduling process when that resource amount falls below a preset threshold;
  • the rescheduling condition satisfied by the computing node 200 may specifically be that the resource amount of the physical resources available on the computing node 200 is lower than the preset threshold.
  • the scheduling module 203 can calculate the fragmentation rate of the physical resources according to the physical resources available on the computing node 200 determined by the resource management module 201, so that when the fragmentation rate is higher than a preset fragmentation rate threshold, the scheduling module 203 actively performs the rescheduling process for physical resources.
  • the rescheduling condition satisfied by the computing node 200 may specifically be that the fragmentation rate of the computing node 200 is higher than the fragmentation rate threshold.
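  • the embodiment does not prescribe a formula for the fragmentation rate, so the metric below is one possible assumption: the share of free cores that sit on partially used NUMA nodes:

```python
# Sketch: a fragmentation-rate trigger for rescheduling.

def fragmentation_rate(numa_free, cores_per_node=8):
    """numa_free: list of free-core counts, one entry per NUMA node."""
    free = sum(numa_free)
    if free == 0:
        return 0.0
    # Free cores on partially used nodes count as fragments; a fully free
    # node is a whole reusable node, not a fragment.
    fragmented = sum(n for n in numa_free if 0 < n < cores_per_node)
    return fragmented / free

def should_reschedule(numa_free, threshold=0.5):
    return fragmentation_rate(numa_free) > threshold

# FIG. 3 before rescheduling: free cores per NUMA node are 2, 0, 4, 2 and
# every free core is a fragment, so the rate is 1.0 and rescheduling fires.
print(fragmentation_rate([2, 0, 4, 2]))  # 1.0
print(should_reschedule([2, 0, 4, 2]))   # True
print(should_reschedule([0, 8, 0, 0]))   # False - one whole free node, no fragments
```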
  • the scheduling module 203 can improve the performance of the target load by rescheduling physical resources, thereby improving the service quality of the service corresponding to the target load;
  • the rescheduling condition satisfied by the computing node 200 may specifically be that the service quality of the target load needs to be improved when the target load is running.
  • the specific implementation manner of how to trigger the scheduling module 203 to reschedule physical resources is not limited.
  • the scheduling module 203 can allocate an appropriate amount of physical resources to each load according to the usage of physical resources by each load in a future period of time predicted by the monitoring module 205 .
  • the monitoring module 205 can predict the usage of physical resources by each load in a future period of time according to the resource usage data obtained by monitoring. For example, when the monitoring module 205 predicts that the physical resources a load will use in a future period are smaller than the physical resources initially allocated to it, the scheduling module 203 can reduce the allocated resource amount when re-allocating physical resources to that load. In this way, the limited physical resources on the computing node 200 can support running a larger number of loads.
  • load 1 is used to provide tenants with office services such as text editing
  • load 2 is used to provide tenants with model training type services
  • load 3 is used to provide tenants with data storage type services.
  • load 1 usually has lower requirements for computing resources and storage resources during runtime
  • load 2 usually has higher requirements for computing resources and lower requirements for storage resources during runtime
  • load 3 usually has lower requirements for computing resources and higher requirements for storage resources during runtime.
  • the monitoring module 205 can classify, according to the resource usage data, each load running on the computing node 200 by service type, so that the scheduling module 203 can determine the amount of physical resources to be re-allocated to each load according to the service type corresponding to that load. For example, for load 1, since the amounts of computing resources and storage resources used in actual operation are small, the scheduling module 203 can reduce the allocated amounts of computing resources and storage resources when re-allocating physical resources to load 1. Similarly, when the scheduling module 203 reallocates physical resources to load 2, it can reduce the allocated amount of storage resources; when it reallocates physical resources to load 3, it can reduce the allocated amount of computing resources.
  • the monitoring module 205 can also perform aggregation processing on the multiple loads running on the computing node 200 according to the resource usage data, so that when the scheduling module 203 reschedules physical resources for the aggregated loads, it can allocate the corresponding physical resources based on the aggregation characteristics of those loads.
  • the monitoring module 205 can also simultaneously perform the above-mentioned classification, aggregation, and prediction processing according to the resource usage data obtained by monitoring, so that the scheduling module 203 can reschedule physical resources for the multiple loads according to the processing results obtained by the monitoring module 205 for each load (including classification information, aggregation information, prediction information, etc.).
  • the computing node 200 shown in FIG. 2 above can support one type of load running on the computing node 200; in other possible embodiments, the computing node 200 can also support multiple types of loads running at the same time.
  • FIG. 4 is a schematic structural diagram of another computing node 200.
  • the computing node 200 may still include a resource management module 201 , a communication module 202 , a scheduling module 203 , a control module 204 and a monitoring module 205 .
  • the resource management module 201 can be used to manage the available physical resources on the computing node 200 (and other computing nodes).
  • the communication module 202 is configured to receive a load running request sent by the scheduling system 100 or other computing nodes, and provide the load running request to the scheduling module 203 .
  • the scheduling module 203 may schedule corresponding physical resources on the computing node 200 according to the load running request and allocate them to the target load.
  • the control module 204 may include multiple control units, such as the control unit 1 and the control unit 2 shown in FIG. 4 .
  • different control units are used to control the startup and operation of different types of loads, for example, the control unit 2041 can control the operation of virtual machine type loads, and the control unit 2042 can control the operation of container type loads.
  • different control units may support load control interfaces corresponding to different types of loads, and the load control interfaces corresponding to the different types of loads may be defined by the operating system on the computing node 200 .
  • the control module 204 can determine the control unit corresponding to the load type to which the target load indicated by the load running request belongs, and further use that control unit to start and run the target load with the allocated first physical resource.
  • the usage of physical resources by the target load (and other loads running on the computing node 200 ) during running can be monitored by the monitoring module 205 .
  • in this way, the computing node 200 composed of the resource management module 201 to the monitoring module 205 can support the running of various types of loads and the corresponding resource allocation without configuring multiple scheduling systems on the computing node 200, which can effectively reduce the overhead that the computing node 200 incurs for scheduling systems and reduce the overall system complexity.
  • the resource management module 201, the communication module 202, the scheduling module 203, the control module 204, and the monitoring module 205 may be implemented by software, such as computer programs, components, or plug-ins running on the computing node 200.
  • the control module 204 when the control module 204 is implemented by components, the control unit 2041 and the control unit 2042 in the control module 204 may be plug-ins or the like registered in the control module 204 in advance.
  • for example, the control unit 2041 may be a QEMU-based plug-in used to support running virtual machine type loads, and the control unit 2042 may be a KATA-based plug-in used to support running container type loads.
  • the resource management module 201, the communication module 202, the scheduling module 203, the control module 204, and the monitoring module 205 may also be implemented by hardware; for example, the resource management module 201, the scheduling module 203, the control module 204, and the monitoring module 205 may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), etc.
  • the communication module 202 may be implemented using a network card or the like.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the architecture included in the computing node 200 shown in FIG. 2 and FIG. 4 is only used as an exemplary description, and is not used to limit the specific implementation of the computing node 200 .
  • for example, the computing node 200 may include more functional modules to support more functions; or, the computing node 200 may simultaneously support three or more types of loads running on the computing node 200, and so on.
  • FIG. 5 is a schematic flowchart of a load processing method provided by an embodiment of the present application.
  • the load processing method shown in FIG. 5 may be applied to the computing node 200 shown in FIG. 2 or FIG. 4 , or applied to other applicable computing nodes 200 .
  • the application to the computing node 200 shown in FIG. 2 is taken as an example for illustrative description.
  • the load processing method shown in FIG. 5 may specifically include:
  • the communication module 202 receives a load running request sent by the scheduling system 100 or other computing nodes.
  • The scheduling system 100 may generate a load running request and send it to the computing node 200 to request that the computing node 200 schedule physical resources for the target load and start running it.
  • For example, each computing node in the computing node cluster can periodically report the total amount of its remaining physical resources, so that the scheduling system 100 can determine a computing node 200 whose remaining physical resources meet the requirements of the target load, generate the load running request, and send it to the computing node 200.
  • Alternatively, the scheduling system 100 may send the load running request to a predetermined computing node in the computing node cluster.
  • After receiving the load running request, that computing node can determine whether it has enough available physical resources to support running the target load. If so, it can schedule physical resources for the target load and run it. If not, based on the available physical resources of other computing nodes that it records (such as the computing node 300 and the computing node 400 in FIG. 2), it can determine a computing node 200 whose available physical resources are sufficient for the target load.
  • That computing node may then send a load running request to the computing node 200 so that the target load is created and run on the computing node 200.
  • In this way, the computing node cluster can quickly schedule physical resources for the target load among a plurality of local computing nodes, which improves the real-time performance of resource scheduling performed by the cluster.
  • the communication module 202 sends the received load running request to the scheduling module 203.
  • The scheduling module 203 determines, according to the available physical resources of the computing node 200 as determined by the resource management module 201, whether there are physical resources matching the load running request. If not, go to step S504; if so, go to step S505.
  • the available physical resources recorded by the computing node 200 may include the available physical resources of the computing node 200 itself and the available physical resources of one or more other computing nodes connected to the computing node 200 .
  • Different computing nodes can periodically exchange and record each other's available physical resources. In practical applications, different computing nodes can interact through their communication modules.
  • the physical resources available on the computing node 200 may be, for example, physical resources on the computing node 200 that are not currently allocated to any load. Further, in addition to physical resources that are not allocated to any load, the available physical resources may also include physical resources that have been allocated to the load but are not used by the load during operation.
  • the physical resources may include computing resources, storage resources, and bandwidth resources.
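The notion of "available physical resources" described above (resources not allocated to any load, optionally plus resources allocated to a load but idle at runtime) can be sketched as follows. This is an illustrative model only; the class and field names are assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class NodeResources:
    """Tracks one node's physical resources (a single integer, e.g. CPU cores,
    stands in for multi-dimensional compute/storage/bandwidth resources)."""
    total: int            # total physical resources on the node
    allocated: int = 0    # resources handed out to loads
    allocated_idle: int = 0  # allocated to a load but unused at runtime

    def available(self, reuse_idle: bool = False) -> int:
        # Base case: resources not currently allocated to any load.
        free = self.total - self.allocated
        # Optionally also count allocated-but-idle resources for reuse,
        # which is how the text says resource utilization can be improved.
        return free + self.allocated_idle if reuse_idle else free

node = NodeResources(total=32, allocated=20, allocated_idle=4)
print(node.available())                 # 12
print(node.available(reuse_idle=True))  # 16
```

Counting idle-but-allocated resources mirrors the reuse window the text describes: those resources can be lent to other loads for a period of time.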
  • S504 The scheduling module 203 rejects the load running request, and ends the process.
  • the scheduling module 203 can also feed back a notification of resource scheduling failure to the scheduling system 100 through the communication module 202, so that the scheduling system 100 requests other computing nodes to perform physical resource scheduling.
  • The scheduling module 203 determines whether the physical resources matching the load running request are local physical resources of the computing node 200. If so, it allocates a first physical resource to the target load and proceeds to step S508; if not, it proceeds to step S506.
  • The scheduling module 203 forwards the load running request to another computing node having the physical resources through the communication module 202.
  • Step S507: The communication module 202 determines whether the forwarding is successful. If the forwarding succeeds, proceed to step S508; if it fails, proceed to step S504.
  • the control module 204 uses the allocated first physical resource to start the target load.
  • the resource management module 201 deducts the first physical resource allocated for the target load from the recorded available physical resources.
  • When starting the target load, the control module 204 may first determine the load type to which the target load belongs, and select the control unit corresponding to that load type to start the target load.
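The request-handling steps above reduce to: try a local match and allocate (S503/S505/S508), otherwise forward to a peer that has matching resources (S506), otherwise reject (S504). A minimal sketch under that reading; the node/peer structures are assumptions, not the patent's API.

```python
def handle_request(node, peers, requested: int):
    """Return a (decision, target) pair for a load running request.

    node is a (name, free) pair; peers maps a peer node name to its free
    resource amount (one integer stands in for multi-dimensional resources).
    """
    name, free = node
    if free >= requested:                       # local resources match
        return ("allocate_local", name)         # then start load and deduct
    for peer_name, peer_free in peers.items():  # otherwise try to forward
        if peer_free >= requested:
            return ("forward", peer_name)
    return ("reject", None)                     # no matching resources anywhere

print(handle_request(("node200", 8), {"node300": 4, "node400": 16}, 12))
```

A real node would compare per-type resources (cores, memory, bandwidth) and honor request priority; the single-integer comparison just shows the control flow.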
  • The above describes the process from the perspective of the computing node 200 scheduling physical resources for the target load.
  • the computing node 200 may also reschedule the physical resources that have been allocated to each load.
  • a specific implementation process of the computing node 200 for rescheduling physical resources will be introduced with reference to the accompanying drawings.
  • FIG. 6 is a schematic flowchart of rescheduling the physical resources allocated to each load on the computing node 200.
  • the method can be applied to the computing node 200 shown in FIG. 2 or FIG. 4. Specifically, the method may include:
  • the scheduling module 203 determines that the rescheduling condition is satisfied.
  • the computing node 200 may determine to reschedule the physical resources allocated to the load running on the computing node 200 when a preset rescheduling condition is satisfied.
  • The rescheduling condition may be, for example, receiving a rescheduling instruction sent by the scheduling system 100, so that the computing node 200 may, as instructed, determine to reschedule the physical resources that have been allocated to the loads running on the computing node 200. For example, when a certain load running on the computing node 200 needs to improve the service quality of its service, the scheduling system 100 can instruct the computing node 200 to perform rescheduling for the load, so as to improve the runtime performance of the load and thereby improve the service quality of the service corresponding to the load.
  • the rescheduling condition may be, for example, that the resource amount of available physical resources determined by the resource management module 201 is lower than a preset threshold, so that the computing node 200 can improve the rationality of allocating physical resources by rescheduling the physical resources.
  • Alternatively, the rescheduling condition may be, for example, that the fragmentation rate of the physical resources on the computing node 200 exceeds a preset fragmentation rate; in this way, the computing node 200 can reduce the fragmentation rate of its physical resources by rescheduling them.
  • Alternatively, the computing node 200 may periodically reschedule the physical resources allocated to each load, and the rescheduling condition may be, for example, that a preset duration (that is, a rescheduling cycle) has elapsed since the last physical resource rescheduling.
  • the specific implementation manner of the rescheduling condition is not limited.
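The rescheduling triggers listed above (an explicit instruction from the scheduling system, available resources below a threshold, excessive fragmentation, or an elapsed rescheduling cycle) could be combined as a simple predicate. The thresholds and parameter names below are illustrative assumptions.

```python
def should_reschedule(instructed: bool,
                      available: float, min_available: float,
                      fragmentation: float, max_fragmentation: float,
                      since_last_s: float, period_s: float) -> bool:
    """True if any preset rescheduling condition is met."""
    return (instructed                            # rescheduling instruction received
            or available < min_available          # available resources below threshold
            or fragmentation > max_fragmentation  # fragmentation rate too high
            or since_last_s >= period_s)          # rescheduling cycle elapsed

print(should_reschedule(False, 10.0, 4.0, 0.1, 0.3, 120.0, 600.0))  # False
print(should_reschedule(False, 2.0, 4.0, 0.1, 0.3, 120.0, 600.0))   # True
```

Since the text says the specific form of the condition is not limited, the predicate is deliberately an OR of independent checks that a deployment could extend.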
  • The scheduling module 203 obtains resource management information and monitoring information, where the resource management information includes the total amount of physical resources on the computing node 200 recorded by the resource management module and the information on the physical resources allocated to each load, and the monitoring information includes the classification information, aggregation information, and prediction information generated by the monitoring module 205 for each load running on the computing node 200.
  • The scheduling module 203 determines whether to reschedule the physical resources that have been allocated to the load. If so, go to step S604; if not, go to step S607.
  • The scheduling module 203 can perform physical resource rescheduling for all loads running on the computing node 200, or only for some of the loads.
  • the scheduling module 203 releases the first physical resource and reschedules the second physical resource for the load.
  • Before rescheduling, the scheduling module 203 may also check whether there is a scheduling scheme that is more optimal than the current resource scheduling policy; the scheduling scheme may be preset or generated in real time by the computing node 200, and can optimize the load in terms of running performance and resource consumption. If such a scheme exists, the scheduling module 203 can complete the physical resource rescheduling of the load based on it: specifically, the first physical resource allocated to the load is released first, and after the release is complete, a second physical resource is reallocated to the load from the physical resources currently available on the computing node 200. If no such scheme exists, the scheduling module 203 may continue to schedule physical resources for the load based on the current resource scheduling policy.
  • the resource management module 201 corrects the available physical resources.
  • After the scheduling module 203 reschedules physical resources for the loads, the information on the physical resources allocated to the respective loads on the computing node 200 changes; therefore, the resource management module 201 corrects the recorded resource allocation information.
  • the control module 204 performs a load migration operation with respect to the load whose physical resources have changed.
  • control module 204 uses the reassigned physical resources to rerun the load, so that the load runs based on the newly allocated physical resources and completes the load migration. At the same time, the control module 204 may release the physical resources previously allocated to the load.
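The release-then-reallocate migration described above can be sketched as below. A dict of load-to-resource-id bindings and a free pool stand in for the resource management module's records; all names are illustrative.

```python
def migrate_load(records: dict, free: set, load: str, second: frozenset):
    """Release the load's first physical resource, then bind the second one.

    records maps a load name to its set of resource ids; free is the pool
    of unallocated resource ids. Returns the updated (records, free) pair.
    """
    first = records[load]
    free = free | first        # release the first physical resource
    assert second <= free      # the second resource must be available
    records = {**records, load: second}  # rerun the load on the new resources
    free = free - second       # deduct the newly allocated resource
    return records, free

new_records, new_free = migrate_load(
    {"vm1": frozenset({"cpu0", "cpu1"})},   # load bound to its first resource
    {"cpu2", "cpu3"},                       # currently free resources
    "vm1", frozenset({"cpu2", "cpu3"}))     # second resource to migrate onto
print(new_records["vm1"], sorted(new_free))
```

Releasing before reallocating matches the order in the text: the previously allocated resources return to the pool as the load completes its migration.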
  • an embodiment of the present application further provides a load processing apparatus.
  • FIG. 7 is a schematic structural diagram of a load processing apparatus provided by an embodiment of the present application.
  • the load processing apparatus 700 may be applied to a first computing node, the first computing node is connected to other computing nodes, and the first computing node and other computing nodes are arranged in the same computing node cluster, the load processing apparatus 700 includes:
  • a resource management module 701, configured to determine physical resources available on the first computing node and physical resources available on the other computing nodes;
  • a communication module 702 configured to receive a load running request, where the load running request is used to request physical resources for running the target load;
  • a scheduling module 703, configured to determine whether the physical resources available on the first computing node satisfy the physical resources requested by the load running request; when the physical resources available on the first computing node satisfy the requested physical resources, allocate the requested physical resources to the target load from the physical resources available on the first computing node; and when the physical resources available on the first computing node do not satisfy the requested physical resources, select, from the other computing nodes, a second computing node whose available physical resources satisfy the requested physical resources, and forward the load running request to the second computing node.
  • the first computing node is connected to a scheduling system, and the communication module 702 is specifically configured to receive the load running request sent by the scheduling system.
  • The scheduling system is set in a data center of a public cloud, and the computing node cluster is set in an edge cloud data center that is remotely connected to the data center of the public cloud.
  • both the scheduling system and the computing node cluster are set in the data center of the public cloud.
  • the scheduling system includes a virtual machine scheduling system and a container scheduling system.
  • The communication module 702 is further configured to, after the physical resources requested by the load running request are allocated to the target load from the physical resources available on the first computing node, send response information to the scheduling system, where the response information is used to notify the scheduling system that the physical resources requested by the load running request have been deducted on the first computing node.
  • the communication module 702 is further configured to send the total amount of available physical resources to the scheduling system before receiving the load operation request, where the total amount of available physical resources includes all The sum of the resource amount of available physical resources on the first computing node and the resource amount of available physical resources on the other computing nodes.
  • The communication module 702 is specifically configured to receive the load running request forwarded by a third computing node in the computing node cluster, where the physical resources available on the third computing node do not satisfy the physical resources requested by the load running request.
  • the load running request carries the type of the target load, wherein the type of the target load includes a virtual machine and a container, and the load processing apparatus further includes a control module 704;
  • The control module 704 is specifically configured to: determine whether the type of the target load is a virtual machine or a container; when the type of the target load is a virtual machine, create a virtual machine on the first computing node based on the allocated physical resources requested by the load running request; and when the type of the target load is a container, create a container on the first computing node based on the allocated physical resources requested by the load running request.
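The control module's type dispatch described above can be sketched as a mapping from load type to a creation routine. The routines here are placeholders that just return a label, not the actual QEMU/Kata plug-ins.

```python
def create_load(load_type: str, resources: dict) -> str:
    """Dispatch creation by load type, mirroring the control module logic."""
    creators = {
        "virtual_machine": lambda r: f"vm({r['cores']}c/{r['mem_gb']}g)",
        "container": lambda r: f"container({r['cores']}c/{r['mem_gb']}g)",
    }
    if load_type not in creators:
        raise ValueError(f"unsupported load type: {load_type}")
    # Create the load using the physical resources allocated to it.
    return creators[load_type](resources)

print(create_load("virtual_machine", {"cores": 4, "mem_gb": 8}))
```

A table-driven dispatch like this also matches the plug-in design mentioned earlier: adding a third load type means registering one more entry rather than changing the control flow.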
  • the resource management module 701 is specifically configured to:
  • collect the total amount of physical resources of the other computing nodes and the physical resources already used by the other computing nodes, and determine the physical resources available on the other computing nodes according to the total amount of physical resources of the other computing nodes and the physical resources used by the other computing nodes.
  • In a case where the physical resources available on the first computing node satisfy the physical resources requested by the load running request, the physical resource allocated to the target load from the physical resources available on the first computing node is a first physical resource;
  • the scheduling module 703 is also used for:
  • when the first computing node satisfies a rescheduling condition, release the first physical resource, and reallocate a second physical resource to the target load from the physical resources available on the first computing node, the second physical resource being different from the first physical resource.
  • The load processing apparatus 700 may further include more functional modules, such as a monitoring module 705, which is used to monitor the allocated and unallocated physical resources, so that the resource management module 701 can determine the available physical resources according to the monitoring results.
  • The load processing apparatus 700 provided in this embodiment corresponds to the load processing method in the foregoing embodiments. Therefore, for the specific implementation of each module provided in this embodiment and its technical effects, reference may be made to the relevant descriptions in the foregoing embodiments, which are not repeated here. Specifically, for the specific implementation of the resource management module 701 in the load processing apparatus 700 and its technical effects, reference may be made to the resource management module 201 in the foregoing embodiments; for the communication module 702, reference may be made to the communication module 202; and for the scheduling module 703, reference may be made to the scheduling module 203.
  • An embodiment of the present application further provides a computing node, where the computing node may be a device for implementing the computing node 200 described above.
  • FIG. 8 is a schematic diagram of the hardware structure of the computing node.
  • the computing node 800 includes a bus 801 , a processor 802 , a communication interface 803 and a memory 804 .
  • the processor 802 , the memory 804 and the communication interface 803 communicate through the bus 801 .
  • the bus 801 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is shown in FIG. 8, but this does not mean that there is only one bus or one type of bus.
  • the communication interface 803 is used for external communication, such as receiving a data acquisition request sent by the terminal.
  • the processor 802 may be a central processing unit (central processing unit, CPU).
  • Memory 804 may include volatile memory, such as random access memory (RAM).
  • Memory 804 may also include non-volatile memory, such as read-only memory (ROM), flash memory, HDD, or SSD.
  • Executable code is stored in the memory 804 , and the processor 802 executes the executable code to execute the aforementioned method executed by the computing node 200 .
  • Specifically, the software or program code required to implement the functions of the load processing apparatus 700 in FIG. 7 is stored in the memory 804; the interaction between the load processing apparatus 700 and other devices (such as other computing nodes) is realized through the communication interface 803; and the processor executes the instructions in the memory 804 to implement the functions of the load processing apparatus 700 or to execute the method performed by the above computing node 200.
  • An embodiment of the present application further provides a computing node cluster, such as the computing node cluster shown in FIG. 2 and FIG. 4, where the computing node cluster includes multiple computing nodes, and one or more of the multiple computing nodes perform the method performed by the computing node 200 in the above embodiments.
  • An embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions run on a computer device, the computer device is caused to execute the method performed by the computing node 200 in the foregoing embodiments.
  • An embodiment of the present application further provides a computer program product; when the computer program product is executed by a computer, the computer executes any one of the foregoing load processing methods.
  • The computer program product can be a software installation package, which can be downloaded and executed on a computer when any one of the foregoing load processing methods needs to be used.
  • The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a training device or a data center that integrates one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.


Abstract

The present application provides a load processing method applied to a first computing node, where the first computing node is connected to other computing nodes and is arranged in the same computing node cluster as the other computing nodes. Specifically, the physical resources available on the first computing node and the physical resources available on the other computing nodes are determined, and a load running request is received, where the load running request is used to request physical resources for running a target load. The first computing node then determines whether the physical resources available on the first computing node can satisfy the physical resources requested by the load running request; when they can, physical resources are allocated to the target load from the physical resources available on the first computing node. When they cannot, the load running request is forwarded to a second computing node. In this way, the amount of computation and the scheduling difficulty required for the scheduling system to schedule physical resources can be reduced, and the scale of the computing node cluster can thus be increased.

Description

Load processing method, computing node, computing node cluster, and related devices

Technical Field
The present application relates to the field of cloud computing technologies, and in particular, to a load processing method, a computing node, a computing node cluster, and related devices.
Background
A computing node cluster such as a public cloud system or a private cloud system usually includes multiple physical computing nodes and a resource management and scheduling node. Each computing node can run loads belonging to one or more tenants, and a load can run as a virtual machine (VM), a container, a process, or the like; the resource management and scheduling node manages the resources on each computing node and schedules those resources to run the loads on that node.
In practical application scenarios, the data processing capability of the resource management and scheduling node in a computing node cluster is usually limited. As a result, when the number of computing nodes reaches a certain level, it is difficult for the resource management and scheduling node to manage the computing nodes and schedule resources with high performance, which limits the scale of the computing node cluster.
Summary of the Invention
The present application provides a load processing method for increasing the scale of a computing node cluster. In addition, the present application further provides a computing node, a computing node cluster, a computer-readable storage medium, and a computer program product.
In a first aspect, the present application provides a load processing method applied to a first computing node, where the first computing node is connected to other computing nodes and the first computing node and the other computing nodes are arranged in the same computing node cluster. When executing the load processing method, the first computing node determines the physical resources available on the first computing node and the physical resources available on the other computing nodes, and receives a load running request, where the load running request is used to request physical resources for running a target load. The first computing node then determines whether the physical resources available on the first computing node satisfy the physical resources requested by the load running request. If the physical resources available on the first computing node satisfy the requested physical resources, the requested physical resources are allocated to the target load from the physical resources available on the first computing node. If the physical resources available on the first computing node do not satisfy the requested physical resources, a second computing node whose available physical resources satisfy the requested physical resources is selected from the other computing nodes, and the load running request is forwarded to the second computing node.
Since the first computing node performs fine-grained physical resource scheduling for the target load, and since, when the physical resources available on the first computing node do not satisfy the physical resources required to create the target load, the first computing node can forward the load running request to another computing node with sufficient physical resources so that the other computing node creates the target load using its own physical resources, the scheduling system only needs to instruct the first computing node to schedule physical resources for the target load. The scheduling system does not need to execute a complex computation process to determine the specific physical resources allocated to the load, nor does it need to determine whether the first computing node receiving the load running request has sufficient physical resources to create the load (this is automatically computed and determined by the first computing node). This reduces the amount of computation and the scheduling difficulty required for the scheduling system to schedule physical resources, so the scheduling system can manage and schedule a larger number of computing nodes, that is, the scale of the computing node cluster is increased.
In a possible implementation, the first computing node is connected to a scheduling system, and when receiving the load running request, the first computing node may specifically receive the load running request sent by the scheduling system. In this way, the first computing node creates and runs the target load under the scheduling of the scheduling system.
The scheduling system may first obtain the specification of the virtual instance to be created as specified by the tenant. For example, the scheduling system may provide a configuration interface to which the tenant logs in remotely and on which the tenant enters the type and specification of the virtual instance to be created. In this case, the scheduling system does not need to compute, based on the specification, a computing node in the computing node cluster with suitable idle resources; instead, it directly takes the first computing node as the default computing node and sends the first computing node a load running request indicating the specification, where the load running request is used to request physical resources matching the specification for running the target load (that is, the virtual instance). The first computing node is then responsible for computing, within the computing node cluster, the computing node with suitable idle resources (including the first computing node itself). The scheduling computation is therefore delegated to the first computing node, and the scheduling system avoids performing direct scheduling computation, which reduces the amount of computation and the scheduling difficulty required for the scheduling system to schedule physical resources.
The virtual instance is, for example, a virtual machine or a container.
In a possible implementation, the scheduling system is deployed in a data center of a public cloud, while the computing node cluster where the first computing node resides is deployed in an edge cloud data center remotely connected to the data center of the public cloud. In this way, through the automatic scheduling of the first computing node in the edge cloud, the scale of the computing node cluster deployed in the edge cloud can be increased without the public cloud's scheduling system expending excessive scheduling computing power on the edge cloud.
In a possible implementation, both the scheduling system and the computing node cluster where the first computing node resides are deployed in the data center of the public cloud. In this way, through the automatic scheduling of the first computing node, the scale of the computing node cluster deployed in the public cloud can be increased.
In a possible implementation, the scheduling system includes a virtual machine scheduling system and a container scheduling system. In this way, when a virtual machine needs to be created for a tenant on the first computing node or on another computing node in the cluster, the virtual machine scheduling system can generate a load running request for the virtual machine and send it to the first computing node; and when a container needs to be created for a tenant on the first computing node or on another computing node in the cluster, the container scheduling system can generate a load running request for the container and send it to the first computing node.
In a possible implementation, after the physical resources requested by the load running request are allocated to the target load from the physical resources available on the first computing node, the first computing node may send response information to the scheduling system, where the response information is used to notify the scheduling system that the physical resources requested by the load running request have been deducted on the first computing node. The scheduling system can then update its record of the physical resources available on the first computing node, so that when requesting the creation of a next load, the scheduling system can determine, based on the physical resources required to create the next load and the updated physical resources available on the first computing node, whether the first computing node can schedule sufficient physical resources to support the creation and running of the next load.
In a possible implementation, before receiving the load running request, the first computing node may send the total amount of available physical resources to the scheduling system, where the total amount of available physical resources includes the sum of the amount of physical resources available on the first computing node and the amounts of physical resources available on the other computing nodes. In this way, the scheduling system can determine, based on the total amount of available physical resources reported by the first computing node, whether there are sufficient physical resources to create the target load.
In a possible implementation, when receiving the load running request, the first computing node may specifically receive a load running request forwarded by a third computing node in the computing node cluster where the first computing node resides, where the physical resources available on the third computing node do not satisfy the physical resources requested by the load running request. In this way, the first computing node can not only forward load running requests to other computing nodes, but can also, when the physical resources available on another computing node are insufficient, receive a load running request sent by that computing node and create the corresponding load using the physical resources available on the first computing node.
As described above, the scheduling system may first obtain the specification of the virtual instance to be created as specified by the tenant. For example, the scheduling system may provide a configuration interface to which the tenant logs in remotely and on which the tenant enters the type and specification of the virtual instance to be created. In this case, the scheduling system does not need to compute, based on the specification, a computing node in the computing node cluster with suitable idle resources; instead, it directly takes the third computing node as the default computing node and sends the third computing node a load running request indicating the specification, where the load running request is used to request physical resources matching the specification for running the target load (that is, the virtual instance). The third computing node is then responsible for computing, within the computing node cluster, the computing node with suitable idle resources (including the third computing node itself). The scheduling computation is therefore delegated to the third computing node, and the scheduling system avoids performing direct scheduling computation, which reduces the amount of computation and the scheduling difficulty required for the scheduling system to schedule physical resources.
In this implementation, the third computing node determines that the computing node in the cluster with suitable idle resources is the first computing node, and therefore sends the load running request to the first computing node.
In a possible implementation, the load running request received by the first computing node carries the type of the target load, where the type of the target load includes a virtual machine and a container. The first computing node may then determine whether the type of the target load to be created is a virtual machine or a container. When the physical resources available on the first computing node satisfy the physical resources requested by the load running request, if the type of the target load is a virtual machine, a virtual machine is created on the first computing node based on the allocated physical resources requested by the load running request; and if the type of the target load is a container, a container is created on the first computing node based on the allocated physical resources requested by the load running request. In this way, multiple different types of loads can be created on the first computing node.
In a possible implementation, when determining the physical resources available on the other computing nodes, the first computing node may specifically first collect the total amount of physical resources of the other computing nodes and the physical resources already used by the other computing nodes, and determine the physical resources available on the other computing nodes accordingly; specifically, the used physical resources may be deducted from the total amount of physical resources of the other computing nodes, so that the remaining physical resources are the physical resources available on the other computing nodes.
In a possible implementation, when the physical resources available on the first computing node satisfy the physical resources requested by the load running request, the physical resource allocated to the target load from the physical resources available on the first computing node is a first physical resource. Then, when the first computing node satisfies a rescheduling condition, the first physical resource is released, and a second physical resource, different from the first physical resource, is reallocated to the target load from the physical resources available on the first computing node. In this way, the first computing node can reallocate physical resources to loads according to actual application requirements, which can improve the rationality of physical resource allocation and the resource utilization of the first computing node.
Exemplarily, the specification of the second physical resource differs from that of the first physical resource, or the resource types included in the second physical resource differ from those included in the first physical resource, or the performance of the second physical resource differs from that of the first physical resource, and so on.
In a second aspect, the present application provides a load processing apparatus applied to a first computing node, where the first computing node is connected to other computing nodes and the first computing node and the other computing nodes are arranged in the same computing node cluster. The load processing apparatus includes: a resource management module, configured to determine the physical resources available on the first computing node and the physical resources available on the other computing nodes; a communication module, configured to receive a load running request, where the load running request is used to request physical resources for running a target load; and a scheduling module, configured to determine whether the physical resources available on the first computing node satisfy the physical resources requested by the load running request, to allocate, when the physical resources available on the first computing node satisfy the requested physical resources, the requested physical resources to the target load from the physical resources available on the first computing node, and to select, when the physical resources available on the first computing node do not satisfy the requested physical resources, a second computing node whose available physical resources satisfy the requested physical resources from the other computing nodes and forward the load running request to the second computing node.
In a possible implementation, the first computing node is connected to a scheduling system, and the communication module is specifically configured to receive the load running request sent by the scheduling system.

In a possible implementation, the scheduling system is deployed in a data center of a public cloud, and the computing node cluster is deployed in an edge cloud data center remotely connected to the data center of the public cloud.

In a possible implementation, both the scheduling system and the computing node cluster are deployed in the data center of the public cloud.

In a possible implementation, the scheduling system includes a virtual machine scheduling system and a container scheduling system.

In a possible implementation, the communication module is further configured to send response information to the scheduling system after the physical resources requested by the load running request are allocated to the target load from the physical resources available on the first computing node, where the response information is used to notify the scheduling system that the physical resources requested by the load running request have been deducted on the first computing node.

In a possible implementation, the communication module is further configured to send the total amount of available physical resources to the scheduling system before receiving the load running request, where the total amount of available physical resources includes the sum of the amount of physical resources available on the first computing node and the amounts of physical resources available on the other computing nodes.

In a possible implementation, the communication module is specifically configured to receive the load running request forwarded by a third computing node in the computing node cluster, where the physical resources available on the third computing node do not satisfy the physical resources requested by the load running request.
In a possible implementation, the load running request carries the type of the target load, where the type of the target load includes a virtual machine and a container, and the load processing apparatus further includes a control module. The control module is specifically configured to: determine whether the type of the target load is a virtual machine or a container; and, when the physical resources available on the first computing node satisfy the physical resources requested by the load running request, create a virtual machine on the first computing node based on the allocated physical resources requested by the load running request if the type of the target load is a virtual machine, or create a container on the first computing node based on the allocated physical resources requested by the load running request if the type of the target load is a container.

In a possible implementation, the resource management module is specifically configured to: collect the total amount of physical resources of the other computing nodes and the physical resources already used by the other computing nodes; and determine the physical resources available on the other computing nodes based on the total amount of physical resources of the other computing nodes and the physical resources already used by the other computing nodes.

In a possible implementation, when the physical resources available on the first computing node satisfy the physical resources requested by the load running request, the physical resource allocated to the target load from the physical resources available on the first computing node is a first physical resource. The scheduling module is further configured to: release the first physical resource when the first computing node satisfies a rescheduling condition; and reallocate a second physical resource, different from the first physical resource, to the target load from the physical resources available on the first computing node.
In a third aspect, the present application provides a computing node including a processor and a memory. The memory is configured to store instructions; when the computing node runs, the processor executes the instructions stored in the memory, so that the computing node executes the load processing method in the first aspect or any possible implementation of the first aspect. It should be noted that the memory may be integrated in the processor, or may be independent of the processor. The computing node may further include a bus, through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.

In a fourth aspect, the present application provides a computing node cluster, where the computing node cluster includes multiple computing nodes, and one or more of the multiple computing nodes execute the load processing method in the first aspect or any possible implementation of the first aspect.

In a fifth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computer device, cause the computer device to execute the method in the first aspect or any implementation of the first aspect.

In a sixth aspect, the present application provides a computer program product containing instructions that, when run on a computer device, cause the computer device to execute the method in the first aspect or any implementation of the first aspect.

On the basis of the implementations provided in the above aspects, the present application may be further combined to provide more implementations.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required in the description of the embodiments. Obviously, the accompanying drawings described below are merely some embodiments recorded in the present application, and a person of ordinary skill in the art may still derive other drawings from them.
FIG. 1 is a schematic architectural diagram of a computing node cluster according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a computing node 200 according to an embodiment of the present application;

FIG. 3 is a schematic diagram of rescheduling physical resources on the computing node 200;

FIG. 4 is a schematic structural diagram of another computing node 200 according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a load processing method according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of another load processing method according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a load processing apparatus 700 according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a computing node 800 according to an embodiment of the present application.
Detailed Description

The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attribute when describing the embodiments of the present application.
Referring to FIG. 1, an exemplary architecture of a computing node cluster is shown. As shown in FIG. 1, the computing node cluster includes multiple computing nodes, including the computing node 200, the computing node 300, the computing node 400, and so on. The multiple computing nodes can be managed and scheduled by the scheduling system 100. In this embodiment, the number of computing nodes included in the computing node cluster is not limited. In practical applications, each computing node may be implemented by a computing device such as a server. The scheduling system 100 may run on one or more devices and is used to manage the physical resources of one or more computing node clusters.

The computing node 200, the computing node 300, and the computing node 400 may periodically report their own resource usage to the scheduling system 100. Taking the computing node 200 as an example, it may report to the scheduling system 100 the various physical resources inside the computing node 200 (such as the number of processors included in the computing node 200, the number of processor cores in each processor, and the size of the memory), as well as the physical resources allocated to the loads currently running on the computing node 200 (such as the processors and memory allocated to each load).

A computing node is, for example, a physical server. It is worth noting that a computing node may also be any computer with a certain computing capability.
When a new load needs to be created for a tenant, suppose the scheduling system 100 allocates physical resources to the load on the computing node 200, the computing node 300, or the computing node 400, specifically performing fine-grained scheduling of the physical resources on the computing node 200 or the computing node 300. For example, the scheduling system 100 would need to specify which processor or processors (such as a CPU or GPU) of the computing node 200 the load runs on, which processor cores on that processor are responsible for running the load, and which memory space in the memory of the computing node 200 is allocated to the load. Then, the more computing nodes the computing node cluster includes, the greater the amount of computation the scheduling system 100 requires to select a target computing node from the multiple computing nodes and schedule physical resources for the load on that target node, and the more difficult the scheduling becomes, which affects the scheduling speed. Based on this concern, the scale of the computing node cluster is limited to a certain extent.

A load is, for example, any of various virtual instances on a public cloud, such as a virtual machine or a container.

Further, the scheduling system may provide a configuration interface or an application programming interface, which is used to obtain the type (for example, virtual machine or container) and specification (for example, the number of CPU cores and the memory size) of the load entered by the tenant.
On this basis, an embodiment of the present application provides a load processing method for increasing the speed of scheduling physical resources in a computing node cluster and enlarging the scale of the computing node cluster. Specifically, when scheduling, for a load requested by a tenant, physical resources matching the specification specified by the tenant, the scheduling system 100 may determine, based on the total amount of available physical resources recorded by the computing node 200 (the computing node 200 is taken as an example here), whether that total amount can satisfy the amount of physical resources required by the new load requested by the user, and when it can, directly send the computing node 200 a load running request for the required physical resources. The computing node 200 has physical resource scheduling capability and can determine in advance the physical resources available on the computing node 200 and the physical resources available on the other computing nodes in the cluster. The sum of the amounts of physical resources available on the computing node 200 and on the computing node 300, the computing node 400, the computing node 500, and the computing node 600 is the total amount of available physical resources recorded by the computing node 200. Then, based on the received load running request, the computing node 200 can determine whether the physical resources available on the computing node 200 can satisfy the physical resources requested by the load running request. If they can, the computing node 200 allocates the requested physical resources to the load from its available physical resources, so that the computing node 200 can further run the load using the allocated physical resources. If the physical resources available on the computing node 200 do not satisfy the requested physical resources, a computing node 300 (the computing node 300 is taken as an example here) whose available physical resources satisfy the requested physical resources is selected from the computing node cluster, and the load running request is forwarded to the computing node 300, so that the computing node 300 allocates physical resources to the load based on the request.

In the process of scheduling physical resources for the new load, the computing node 200 performs fine-grained physical resource scheduling for the load; that is, the computing node 200 autonomously decides which processor the load runs on, which processor cores run the load, which memory space is allocated to the load, and so on. Moreover, when the physical resources available on the computing node 200 do not satisfy the physical resources required to create the new load, the computing node 200 can forward the load running request to another computing node 300 with sufficient physical resources, so that the computing node 300 creates the load using its own physical resources. In this way, the scheduling system 100 only needs to instruct the computing node 200 to schedule physical resources for the load, without executing a complex computation process to determine the specific physical resources allocated to the load, and without determining whether the computing node 200 receiving the load running request has sufficient physical resources to create the load (this is automatically computed and determined by the computing node 200). This reduces the amount of computation and the scheduling difficulty required for the scheduling system 100 to schedule physical resources, so the scheduling system 100 can manage and schedule a larger number of computing nodes, that is, the scale of the computing node cluster is increased.
In practical application scenarios, the computing node cluster shown in FIG. 1 may be deployed as a public cloud or an edge cloud. When a public cloud is deployed based on the computing node cluster, its form may include a public cloud centered on a public cloud data center, while the forms of the edge cloud include an edge cloud centered on a base station data center, an enterprise cloud centered on a large enterprise data center, and a small edge cloud centered on a lightweight edge site data center. In this embodiment, the specific deployment form of the computing node cluster shown in FIG. 1 in practical application scenarios is not limited. As an example, the scheduling system 100 and the computing node cluster may be deployed in the same cloud environment; for instance, both may be deployed entirely in the data center of a public cloud. In another example, the scheduling system 100 and the computing node cluster may be deployed separately; for example, the scheduling system 100 may be deployed in the data center of a public cloud, while the computing node cluster may be deployed in an edge cloud data center remotely connected to the data center of the public cloud.

It is worth noting that there may be multiple computing node clusters, and the scheduling system 100 may be connected to the multiple computing node clusters respectively. The scheduling system 100 may select a particular computing node cluster and send a load running request to a predetermined computing node in that cluster, and that computing node performs the scheduling computation within its own cluster to select a computing node suitable for running the virtual instance. Since the scheduling system 100 does not need to implement complex scheduling computation, the scheduling system 100 can support a larger number of computing node clusters.

Optionally, the scheduling system 100 may provide a configuration interface or an application programming interface for the tenant to select a specific computing node cluster, such as edge cloud 1 or edge cloud 2, where edge cloud 1 denotes the computing node cluster deployed in edge cloud data center 1, and edge cloud 2 denotes the computing node cluster deployed in edge cloud data center 2. Based on the computing node cluster shown in FIG. 1, and referring to FIG. 2, an embodiment of the present application provides a schematic structural diagram of a computing node 200 in the cluster. As shown in FIG. 2, the computing node 200 may include a resource management module 201, a communication module 202, a scheduling module 203, and a control module 204. The computing node 200 may be connected to other computing nodes so as to implement data communication with them over the connection. In this embodiment, the other computing nodes may refer to one or more computing nodes in the cluster other than the computing node 200, such as the computing node 300 and the computing node 400 in FIG. 2. It is worth noting that the scheduling system 100 may be located outside the computing node cluster, as shown in FIG. 2, while in other possible implementations the scheduling system 100 may also be located inside the computing node cluster. This embodiment does not limit this.
The resource management module 201 is configured to manage the physical resources available on the computing node 200. The physical resources may include, for example, computing resources, storage resources, and network resources. The computing resources include processors and memory, where a processor may be, for example, a central processing unit (CPU) or a graphics processing unit (GPU), and each processor may include one or more processor cores. A storage resource may be, for example, a cloud disk. A network resource may be, for example, the link bandwidth and network ports used in data communication, elastic public IP addresses, elastic network interface cards, and the like. The physical resources available on the computing node 200 may refer to the remaining unallocated physical resources on the computing node 200.

In practical applications, the resource management module 201 can collect information on the total amount of physical resources inside the computing node 200 and record the physical resources already allocated to the loads running on the computing node 200, so that the resource management module 201 can determine the physical resources on the computing node 200 that are currently not allocated to any load, hereinafter referred to as the available physical resources. Further, the available physical resources determined by the resource management module 201 may also include physical resources that have been allocated to a load but are not used by the load at runtime. In this way, the physical resources temporarily unused by a load can be scheduled to other loads for reuse within a certain period of time, which can improve the resource utilization of the computing node 200.

Meanwhile, the resource management module 201 can also collect the physical resources available on other computing nodes based on the communication connections between the computing node 200 and the other computing nodes. Similar to the way of determining the physical resources available on the computing node 200, the resource management module 201 can collect the information on the total amount of physical resources and on the allocated physical resources of the computing node 300 and the computing node 400 among the other computing nodes, so that the resource management module 201 can determine, based on each computing node's total physical resources and allocated physical resources, the physical resources available on the computing node 300 and on the computing node 400 respectively. Of course, in other possible implementations, the computing node 300 and the computing node 400 may determine their respective available physical resources in advance based on their own total physical resources and allocated physical resources, so that the resource management module 201 obtains, through its communication connections with the computing node 300 and the computing node 400, the information on the available physical resources already determined on the computing node 300 and on the computing node 400.
进一步的,资源管理模块201可以根据获取的计算节点200上可用的物理资源以及其他计算节点上可用的物理资源,计算出可用的物理资源的总量,并将该可用的物理资源总量上报给调度系统100,如上报计算节点200当前剩余的处理器核的总量、剩余内存的总量、云磁盘的总量、可用带宽总量等。这样,当调度系统100需要为租户创建新的负载(以下称之为目标负载)时,可以根据租户所指定的规格,确定创建目标负载所需的物理资源的资源量,并且当该物理资源的资源量小于计算节点200上报的可用的物理资源的总量时,调度系统100可以为该目标负载生成负载运行请求,并将该负载运行请求发送给计算节点200,以请求计算节点200利用相应的物理资源创建并运行目标负载。
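资源管理模块汇总本节点与其他节点可用资源并上报总量的计算,可以用如下Python草图示意(函数名total_available以及以字典表示各类资源的做法,均为说明用途的假设):

```python
# 可用物理资源总量的汇总草图:
# 将本节点与其他节点上各类可用资源分别求和后,上报给调度系统。
def total_available(local, peers):
    """local 为本节点可用资源字典,peers 为其他节点可用资源字典的列表。"""
    total = dict(local)
    for peer in peers:
        for kind, amount in peer.items():
            total[kind] = total.get(kind, 0) + amount
    return total

local = {"cpu_cores": 8, "memory_gb": 32, "bandwidth_gbps": 10}
peers = [{"cpu_cores": 16, "memory_gb": 64},
         {"cpu_cores": 4, "memory_gb": 16, "bandwidth_gbps": 5}]
print(total_available(local, peers))
# {'cpu_cores': 28, 'memory_gb': 112, 'bandwidth_gbps': 15}
```

调度系统只需将新负载的规格与该总量比较,无需逐一掌握每个节点的细粒度资源分布。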
通信模块202,用于接收负载运行请求,该负载运行请求用于请求计算节点200运行目标负载。示例性地,负载运行请求,可以包括目标负载的类型、优先级、运行目标负载所需的物理资源的种类和数量、以及目标负载运行的时间段等。其中,目标负载的类型,例如可以是虚拟机、容器和进程等类型。目标负载的优先级,可以用于指示计算节点200创建目标负载的优先程度。比如,当调度系统100同时向计算节点200发送针对目标负载的负载运行请求以及针对其他负载的负载运行请求时,若目标负载的优先级高于其他负载的优先级,则计算节点200可以优先为目标负载分配物理资源。实际应用时,该负载运行请求中也可以仅包括上述信息中的任意一种或多种。或者,负载运行请求还可以包括其他信息,如包括运行该负载的可替换的物理资源。比如,当计算节点200上剩余的GPU资源不足以运行该负载时,计算节点200可以利用CPU资源来支持该负载的运行。然后,通信模块202可以将接收到的负载运行请求提供给调度模块203。
调度模块203在接收到负载运行请求后,可以解析出目标负载的类型、优先级、物理资源的种类和数量以及运行时间段等信息,并向资源管理模块201查询当前计算节点200的可用的物理资源。然后,调度模块203根据解析得到的信息,进一步判断查询到的计算节点200的可用物理资源是否能够满足运行目标负载所需的物理资源。若能够满足,则调度模块203可以从计算节点200上可用的物理资源中选取相应的第一物理资源。例如,当负载运行请求指示了目标负载的优先级时,调度模块203可以根据该优先级的高低确定为目标负载分配第一物理资源的顺序;当负载运行请求指示了物理资源的种类以及数量时,调度模块203可以选取对应类别以及对应数量的物理资源作为第一物理资源;当负载运行请求指示了运行时间段时,调度模块203可以选取在该运行时间段内不被其他负载所使用的物理资源作为第一物理资源。然后,调度模块203将选取的第一物理资源分配给目标负载,例如可以是建立目标负载与第一物理资源的关联关系等。此时,资源管理模块201可以在确定的可用物理资源中扣除已经分配给目标负载的第一物理资源。进一步的,资源管理模块201还可以通过通信模块202向调度系统100发送响应信息,该响应信息用于通知调度系统100该负载运行请求所请求的物理资源已经在计算节点200上完成扣减,从而调度系统100可以根据负载运行请求所请求的物理资源的资源量,对计算节点200上报的可用的物理资源的总量进行相应的扣减。
实际应用时,调度模块203可以预先配置有相应的资源调度策略,从而在为目标负载调度物理资源时,调度模块203根据负载运行请求选择合适的资源调度策略,并在满足负载运行请求所请求的物理资源的情况下,基于该资源调度策略从可用物理资源中为目标负载调度第一物理资源。
作为一些示例,资源调度策略例如可以是均衡调度策略,即在为目标负载调度第一物理资源时,使已经分配给各个负载的物理资源在计算节点200上均衡分布。以计算资源为例,假设计算节点200上包括处理器1以及处理器2,并且每个处理器中可以包括多个处理器核,则调度模块203在将处理器1中的部分处理器核分配给负载1后,即使处理器1中还存在未被分配的剩余处理器核,调度模块203也可以根据该均衡调度策略将处理器2中的部分处理器核分配给负载2。如此,处理器1以及处理器2上均存在部分处理器核被分配给计算节点200上的负载。
或者,资源调度策略例如可以是顺序调度策略,即在为目标负载调度第一物理资源时,可以将计算节点200上的物理资源顺序调度给该目标负载。仍以为负载分配处理器核为例,调度模块203在将处理器1中的部分处理器核分配给负载1后,由于处理器1中还存在未被分配的剩余处理器核,则调度模块203可以优先将处理器1中未被分配的剩余处理器核分配给负载2。并且,若分配给负载2的处理器核的数量满足负载2的运行所需,则针对负载2的计算资源调度结束,此时,处理器1以及处理器2中针对处理器核的资源分配情况并不均衡。而若分配给负载的处理器核的数量仍然不满足负载2的运行所需,则调度模块203可以继续将处理器2中的处理器核分配给负载2。当然,实际应用时,资源调度策略也可以是采用其他可能的实施方式,如随机选择物理资源进行调度等,本实施例对此并不进行限定。
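上述均衡调度策略与顺序调度策略在选取处理器核时的差别,可以用如下Python草图对比说明(函数名与核编号均为说明用途的假设,仅示意两种策略的选取顺序):

```python
# 两种资源调度策略草图:均衡调度在各处理器之间轮流选取空闲核,
# 顺序调度则优先用尽前一个处理器上剩余的空闲核。
def balanced_pick(free_cores_per_cpu, n):
    """均衡调度:在各 CPU 的空闲核列表之间轮流选取 n 个核。"""
    picked, i = [], 0
    queues = [list(q) for q in free_cores_per_cpu]
    while len(picked) < n and any(queues):
        if queues[i % len(queues)]:
            picked.append(queues[i % len(queues)].pop(0))
        i += 1
    return picked

def sequential_pick(free_cores_per_cpu, n):
    """顺序调度:按处理器顺序依次选取空闲核,用尽一个处理器再用下一个。"""
    picked = []
    for queue in free_cores_per_cpu:
        for core in queue:
            if len(picked) == n:
                return picked
            picked.append(core)
    return picked

free = [["C1", "C2", "C3"], ["C9", "C10", "C11"]]   # 两个处理器各自的空闲核
print(balanced_pick(free, 4))    # ['C1', 'C9', 'C2', 'C10']
print(sequential_pick(free, 4))  # ['C1', 'C2', 'C3', 'C9']
```

两种策略得到的核数量相同,但分布不同:前者使分配在处理器间均衡,后者倾向于保留整块空闲的处理器。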
实际应用场景中,计算节点200还可以包括控制模块204,并且,调度模块203在基于负载运行请求为目标负载分配第一物理资源后,可以将第一物理资源的信息发送给控制模块204。这样,控制模块204可以利用调度模块203所分配的第一物理资源启动运行目标负载,目标负载的类型即为负载运行请求所指示的类型。作为一种实现示例,调度系统100具体可以是虚拟机调度系统,并且,该虚拟机调度系统可以向计算节点200发送创建虚拟机的负载运行请求。这样,控制模块204可以判断目标负载的类型为虚拟机,从而在计算节点200上可用的物理资源能够满足负载运行请求所请求的物理资源的情况下,计算节点200可以根据分配的第一物理资源(也即负载运行请求所请求的物理资源)在计算节点200创建虚拟机。或者,调度系统100具体可以是容器调度系统,并且,该容器调度系统可以向计算节点200发送创建容器的负载运行请求,这样,控制模块204可以判断目标负载的类型为容器,从而在计算节点200上可用的物理资源能够满足负载运行请求所请求的物理资源的情况下,根据分配的第一物理资源(也即负载运行请求所请求的物理资源)在计算节点200创建容器。当然,在其它示例中,调度系统可以同时集成虚拟机调度系统以及容器调度系统的功能,本实施例对此并不进行限定。
另外,当该目标负载结束运行时,如该目标负载的运行时长达到调度系统100所请求的运行时长等,控制模块204还可以停止运行目标负载,并释放分配给目标负载的第一物理资源。此时,资源管理模块201所确定的可用物理资源中可以重新包括该第一物理资源。
进一步地,计算节点200还可以包括监控模块205,该监控模块205用于对计算节点200上当前被使用的物理资源进行监控,采集得到资源使用数据,该资源使用数据用于指示运行在计算节点200上的负载在历史时间段(如过去24小时内)使用物理资源的情况。这样,监控模块205可以根据该资源使用数据,预测已经分配给计算节点200上运行的负载但是该负载在未来时间段(如未来24小时内)可能不被使用的物理资源,从而资源管理模块201可以将监控模块205预测的这些不被负载使用的物理资源纳入可用物理资源,以便后续可以将这部分物理资源分配给目标负载并在指定的时间段内进行运行。如此,计算节点200上有限的物理资源可以支持更多数量的负载运行。在另一种示例中,也可以是由资源管理模块201根据监控模块205采集到的资源使用数据预测出未来一段时间内不被负载所使用的物理资源等,本实施例对此并不进行限定。
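监控模块根据历史资源使用数据预测可复用资源量的思路,可以用如下Python草图示意(以历史峰值乘以安全系数估计未来占用,该估计方法以及参数margin均为便于说明的假设,并非本申请限定的预测方式):

```python
# 可复用资源预测草图:根据历史时间段的使用量采样,
# 估计已分配但预计在未来时间段不会被负载使用、可纳入可用资源的部分。
def predict_reclaimable(allocated, usage_samples, margin=1.2):
    """allocated 为分配给负载的资源量;usage_samples 为历史使用量采样(非空序列);
    以历史峰值乘以安全系数 margin 作为预计占用,其余部分视为可复用。"""
    peak = max(usage_samples)
    expected = min(allocated, peak * margin)
    return max(0.0, allocated - expected)

# 负载被分配 16 个单位资源,但过去 24 小时的使用峰值仅为 10 个单位
print(predict_reclaimable(16, [6, 8, 10, 7], margin=1.2))   # 4.0
```

预测出的可复用部分可在指定时间段内调度给其他负载,从而让有限的物理资源支持更多负载。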
值得注意的是,计算节点200上可用的物理资源也可能不满足负载运行请求所请求的物理资源。此时,调度模块203可以向资源管理模块201查询其他计算节点上可用的物理资源,从而调度模块203可以根据资源管理模块201反馈的计算节点300上可用的物理资源以及计算节点400上可用的物理资源,确定能够满足负载运行请求所请求的物理资源的计算节点。假设计算节点300上可用的物理资源能够满足负载运行请求所请求的物理资源,则调度模块203可以通过通信模块202将负载运行请求转发至计算节点300。这样,计算节点300在接收到该负载运行请求后,可以从自身可用的物理资源中为该负载运行请求分配相应的第一物理资源,并利用该第一物理资源创建并运行目标负载。其中,计算节点300根据接收到的负载运行请求分配第一物理资源并运行目标负载的具体实现方式,与上述计算节点200分配第一物理资源并运行目标负载的具体实现方式类似,可参见前述实施例中的相关描述,在此不做赘述。
值得注意的是,本实施例中,计算节点200不仅可以从调度系统100中接收到负载运行请求,也可以是从其他计算节点处接收到负载运行请求。比如,在部分应用场景中,调度系统100可以先向计算节点400发送负载运行请求,以请求计算节点400基于该负载运行请求创建目标负载。当计算节点400上可用的物理资源不能满足负载运行请求所请求的物理资源时,计算节点400可以将该负载运行请求转发给能够满足该负载运行请求所请求的物理资源的计算节点200,以便计算节点200利用自身的物理资源基于该负载运行请求创建并运行目标负载。
实际应用时,调度模块203在为计算节点200上的各个负载分配物理资源后,可能会使得计算节点200上存在物理资源碎片。此时,调度模块203可以对计算节点200上各个负载所分配到的物理资源进行重新调度,以此减少计算节点200上的资源碎片率。同时,调度模块203可以通过重新调度多个负载所分配到的物理资源来提升部分负载在运行时的性能。具体实现时,调度模块203在接收到负载运行请求时,可以根据负载运行请求为目标负载分配上述第一物理资源,而当计算节点200满足重调度条件时,调度模块203可以释放该第一物理资源,并从计算节点200上可用的物理资源中重新为该目标负载分配第二物理资源,该第二物理资源与第一物理资源不同。比如,第二物理资源的规格与第一物理资源的规格存在差异,或者,第二物理资源包括的资源类型与第一物理资源包括的资源类型存在差异,或者,第二物理资源的性能与第一物理资源的性能存在差异等。
以计算资源为例,假设计算节点200上包括如图3所示的两个CPU,并且每个CPU上包括两个非统一内存访问架构(non-uniform memory access,NUMA)节点,每个NUMA节点中集成有8个处理器核,如图3所示的C1至C32。在进行物理资源的初始调度过程中,对于运行在计算节点200上的负载1、负载2以及负载3,调度模块203可以将NUMA0节点中的处理器核C1至C6分配给负载1、将NUMA1节点中的C9至C16以及NUMA2节点中的C17至C20分配给负载2、将NUMA3节点中的C25至C30分配给负载3。这样,计算节点200的NUMA0节点上的处理器核C7至C8、NUMA2节点上的C21至C24、以及NUMA3节点上的C31至C32即为计算节点200中的计算资源碎片,如图3所示。为此,调度模块203可以对分配给负载2以及负载3的物理资源进行重新调度,具体可以将NUMA2节点上的处理器核C17至C24以及NUMA3节点上的C25至C28分配给负载2、将NUMA0节点上的处理器核C7至C8以及NUMA3节点上的C29至C32分配给负载3。这样,经过调度模块203对计算资源的重新调度后,计算节点200上可以剩余完整的NUMA1节点,而在其余NUMA节点上可以不存在计算资源碎片。同时,在对物理资源进行重新调度后,负载2所分配到的新的计算资源全部位于同一CPU,这使得负载2所分配到的处理器核之间可以无需跨CPU进行通信,从而可以提高负载2在运行时所具有的性能。
在一些示例中,调度模块203可以周期性地进行物理资源的重调度,以减少计算节点200上的物理资源碎片;相应的,计算节点200所满足的重调度条件,具体可以是距离上一次执行重调度的时间间隔达到重调度的周期时长等。或者,调度系统100也可以向计算节点200下发重新调度物理资源的指令,从而调度模块203根据该指令执行重新调度物理资源的过程;相应的,计算节点200所满足的重调度条件,具体可以是接收到重调度的指令。又或者,调度模块203可以根据资源管理模块201所确定的计算节点200上可用的物理资源的资源量,当该资源量低于预设阈值时,调度模块203可以主动执行物理资源的重调度过程;相应的,计算节点200所满足的重调度条件,具体可以是计算节点200上可用的物理资源的资源量低于预设阈值。又或者,调度模块203可以根据资源管理模块201所确定的计算节点200上可用的物理资源计算物理资源的碎片率,从而当该碎片率高于预设碎片率时,调度模块203可以主动执行物理资源的重调度过程等;相应的,计算节点200所满足的重调度条件,具体可以是计算节点200的碎片率高于碎片率阈值。又或者,当计算节点200接收到需要提升目标负载在运行时的业务服务质量的指示时,调度模块203可以通过重新调度物理资源的方式提高目标负载的性能,从而提升目标负载对应的业务服务质量;相应的,计算节点200所满足的重调度条件,具体可以是目标负载在运行时的业务服务质量需要被提升。本实施例中,对于如何触发调度模块203重新调度物理资源的具体实现方式并不进行限定。
其中,调度模块203在为各个负载重新调度物理资源时,可以根据监控模块205预测出的各个负载在未来一段时间内对于物理资源的使用情况,为各个负载分配适量的物理资源。其中,监控模块205可以根据监控得到的资源使用数据,预测各个负载在未来一段时间内对于物理资源的使用情况。比如,当监控模块205预测部分负载在未来一段时间内所使用的物理资源小于初始分配给该负载的物理资源时,调度模块203在重新为该负载分配物理资源时,可以减少分配的物理资源的资源量。如此,计算节点200上有限的物理资源可以支持更多数量的负载运行。
或者,实际应用场景中,运行在计算节点200上的不同负载为租户提供不同类型的服务,此时,不同负载在运行时所使用的物理资源可能存在较大差异。例如,假设计算节点200上运行有负载1、负载2以及负载3。其中,负载1用于为租户提供文本编辑等办公类服务,负载2用于为租户提供模型训练类型的服务,负载3用于为租户提供数据存储类型的服务。此时,负载1在运行时对于计算资源以及存储资源的需求通常较低;负载2在运行时对于计算资源的需求通常较高,而对于存储资源的需求较低;负载3在运行时对于计算资源的需求较低,而对于存储资源的需求较高。基于此,监控模块205可以根据资源使用数据,对计算节点200上运行的各个负载按照服务类型进行分类,从而调度模块203可以根据各个负载所对应的服务类型,确定重新为各个负载分配的物理资源的资源量。比如,对于负载1,由于该负载实际应用时所使用的计算资源以及存储资源的资源量较小,从而调度模块203在为负载1重新分配物理资源时,可以减少分配的计算资源以及存储资源的资源量。类似的,调度模块203在为负载2重新分配物理资源时,可以减少分配的存储资源的资源量;调度模块203在为负载3重新分配物理资源时,可以减少分配的计算资源的资源量。
进一步的,计算节点200上运行的不同负载之间具有一定的相关性,此时,若其中一个负载对于物理资源的使用情况发生变化,其余负载对于物理资源的使用情况也会发生相应变化。比如,假设计算节点200运行有负载1以及负载2,并且负载2作为负载1的备份,以提高负载1和负载2提供服务的可靠性。此时,若负载1在运行时生成并存储的业务数据的数据量较大时,则从负载1备份至负载2上的业务数据的数据量也较大,也即,负载2对于存储资源的使用需求可以随着负载1对于存储资源的使用需求发生变化而变化。为此,监控模块205根据资源使用数据,对计算节点200上运行的多个负载进行聚合处理,从而调度模块203在为聚合后的多个负载重新调度物理资源时,可以根据该负载的聚合特征,为其分配相应的物理资源。
实际应用时,监控模块205也可以根据监控得到的资源使用数据同时进行上述分类、聚合以及预测处理,从而调度模块203可以根据监控模块205所获得针对各个负载的处理结果(包括分类信息、聚合信息以及预测信息等),重新为多个负载调度物理资源。
上述图2所示的计算节点200,可以支持一种类型的负载在该计算节点200上运行,而在其他可能的实施例中,计算节点200也可以是同时支持多种类型的负载在其上运行。参阅图4,为另一种计算节点200的结构示意图。如图4所示,该计算节点200仍然可以包括资源管理模块201、通信模块202、调度模块203、控制模块204以及监控模块205。其中,资源管理模块201可以用于管理计算节点200(以及其他计算节点)上的可用物理资源。通信模块202用于接收调度系统100或者其他计算节点发送的负载运行请求,并将该负载运行请求提供给调度模块203。调度模块203可以根据该负载运行请求在计算节点200上调度相应的物理资源并将其分配给目标负载。
与图2所示的计算节点200不同的是,图4所示的计算节点200中,控制模块204可以包括多个控制单元,如图4所示的控制单元2041以及控制单元2042。其中,不同控制单元用于控制启动以及运行不同类型的负载,如控制单元2041可以控制运行虚拟机类型的负载,控制单元2042可以控制运行容器类型的负载。并且,不同控制单元可以支持不同类型负载所对应的负载控制接口,该不同类型负载所对应的负载控制接口可以由计算节点200上的操作系统进行定义。
因此,在调度模块203为目标负载分配第一物理资源后,控制模块204可以根据负载运行请求所指示的目标负载所属的负载类型,确定与该负载类型对应的控制单元,并进一步通过该控制单元利用分配的第一物理资源启动并运行目标负载。该目标负载(以及运行在计算节点200上的其他负载)在运行时对于物理资源的使用情况,可以由监控模块205进行监控。在图4所示的计算节点200中,由资源管理模块201至监控模块205所构成的计算节点200可以支持多种类型的负载运行以及资源分配,而无需为了支持多种类型的负载运行而在计算节点200上配置多种调度系统,这可以有效降低计算节点200针对调度系统的开销,并可以降低整体系统的复杂性。
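控制模块按负载类型将启动请求分发给对应控制单元(如虚拟机控制单元、容器控制单元)的机制,可以用如下Python草图示意(类名ControlModule、以函数充当控制单元等均为说明用途的假设,并非本申请的实际接口):

```python
# 控制模块分发草图:不同类型的控制单元以插件形式注册到控制模块,
# 启动负载时按负载运行请求所指示的类型分发到对应控制单元。
class ControlModule:
    def __init__(self):
        self.units = {}                       # 负载类型 -> 控制单元(此处以函数示意)

    def register(self, load_type, unit):
        """将某一负载类型的控制单元注册到控制模块。"""
        self.units[load_type] = unit

    def start_load(self, load_type, resources):
        """按负载类型分发启动请求,利用已分配的物理资源启动负载。"""
        if load_type not in self.units:
            raise ValueError(f"不支持的负载类型: {load_type}")
        return self.units[load_type](resources)

ctrl = ControlModule()
ctrl.register("vm", lambda res: f"以 {res} 启动虚拟机")        # 类似基于 QEMU 的控制单元
ctrl.register("container", lambda res: f"以 {res} 启动容器")   # 类似基于 Kata 的控制单元
print(ctrl.start_load("vm", "4核/8GB"))
```

新增一种负载类型时,只需注册新的控制单元,而无需在节点上部署另一套调度系统。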
本实施例中,图2以及图4所示的计算节点200中,资源管理模块201、通信模块202、调度模块203、控制模块204以及监控模块205,可以是通过软件实现,例如可以是运行在计算节点200上的计算机程序,如组件、插件等。其中,当控制模块204通过组件实现时,该控制模块204中的控制单元2041以及控制单元2042可以为预先注册至该控制模块204中的插件等。如控制单元2041可以是基于QEMU的插件,用于支持虚拟机类型的负载运行;控制单元2042可以是基于KATA的插件,用于支持容器类型的负载运行。或者,资源管理模块201、通信模块202、调度模块203、控制模块204以及监控模块205也可以是由硬件实现,如资源管理模块201、调度模块203、控制模块204以及监控模块205可以是利用专用集成电路(application-specific integrated circuit,ASIC)或可编程逻辑器件(programmable logic device,PLD)实现的设备等。通信模块202可以利用网卡实现等。其中,上述PLD可以是复杂可编程逻辑器件(complex programmable logic device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合实现。
需要说明的是,图2以及图4所示的计算节点200所包括的架构,仅作为一种示例性说明,并不用于限定计算节点200的具体实现。例如,在其他可能的实施例中,计算节点200可以包括更多的功能模块以支持计算节点200具有更多其他的功能;或者,计算节点200可以同时支持三种或三种以上类型的负载在该计算节点200上运行等。
为便于理解,下面结合附图,对本申请的实施例进行描述。
参见图5,图5为本申请实施例提供的一种负载处理方法的流程示意图。其中,图5所示的负载处理方法可以应用于图2或图4所示的计算节点200,或者应用于其他可适用的计算节点200中。为便于说明,本实施例中以应用于图2所示的计算节点200为例进行示例性说明。
基于图2所示的计算节点200,图5所示的负载处理方法具体可以包括:
S501:通信模块202接收调度系统100或者其他计算节点发送的负载运行请求。
本实施例中,当计算节点集群需要为租户创建目标负载时,调度系统100可以生成负载运行请求,并将该负载运行请求发送给计算节点200,以请求在计算节点200上为该目标负载调度物理资源并启动运行。实际应用时,计算节点集群中各个计算节点可以周期性地上报自身剩余的物理资源总量信息,从而调度系统100可以据此确定剩余物理资源能够满足该目标负载所需的计算节点200,并将生成的负载运行请求发送给该计算节点200。
而在另一种可能的实施方式中,调度系统100在生成负载运行请求后,可以将该负载运行请求发送给计算节点集群中的预定计算节点,从而该计算节点在接收到该负载运行请求后,可以由该计算节点判断是否具有足够的可用物理资源来支持目标负载的运行。若有,则该计算节点可以为目标负载进行物理资源调度并运行该目标负载。而若没有,则该计算节点可以根据其记录的与其连接的其他计算节点(如图2中的计算节点200、计算节点300以及计算节点400等)上可用的物理资源,确定计算节点200上可用的物理资源能够满足目标负载所需。此时,该计算节点可以将负载运行请求发送给计算节点200,以便在该计算节点200创建并运行该目标负载。在该实施方式中,计算节点集群可以在局部的多个计算节点中快速实现为目标负载进行物理资源调度,从而可以提高计算节点集群进行资源调度的实时性。
S502:通信模块202将接收到的负载运行请求发送给调度模块203。
S503:调度模块203根据资源管理模块201确定的计算节点200所记录的可用的物理资源,确定是否存在与负载运行请求相匹配的物理资源。若不存在,则执行步骤S504;而若存在,则执行步骤S505。
其中,计算节点200所记录的可用的物理资源可以包括计算节点200自身所具有的可用的物理资源,以及与该计算节点200连接的其他一个或者多个计算节点上所具有的可用的物理资源。不同计算节点之间,可以周期性地针对自身所具有的可用的物理资源进行交互并记录。实际应用时,不同计算节点之间可以通过通信模块进行交互。
示例性地,计算节点200上可用的物理资源,例如可以是计算节点200上当前未被分配给任意负载的物理资源。进一步的,可用的物理资源,除了包括未被分配给任意负载的物理资源之外,还可以包括已经分配给负载但是该负载在运行时并未使用的物理资源。其中,物理资源,可以包括计算资源、存储资源以及带宽资源等。
S504:调度模块203拒绝负载运行请求,并结束流程。
实际应用时,在结束流程之前,调度模块203还可以通过通信模块202向调度系统100反馈资源调度失败的通知,以便于调度系统100请求其他计算节点进行物理资源调度。
S505:调度模块203判断与负载运行请求相匹配的物理资源是否为计算节点200本地的物理资源。若是,则为目标负载分配第一物理资源,并继续执行步骤S508;若否,则继续执行步骤S506。
S506:调度模块203通过通信模块202将负载运行请求转发给具有该物理资源的其他计算节点。
S507:通信模块202确定是否转发成功。若转发成功,则继续执行步骤S508;若转发失败,则继续执行步骤S504。
S508:控制模块204利用分配的第一物理资源,启动目标负载。
S509:资源管理模块201从记录的可用的物理资源中,扣除为目标负载分配的第一物理资源。
进一步的,当图5所示的负载处理方法应用于图4所示的计算节点200时,控制模块204在启动目标负载的过程中,可以先确定该目标负载所属的负载类型,并根据该负载类型选择与该负载类型相对应的控制单元来启动目标负载。
上述实施例中,是从计算节点200为目标负载调度物理资源的角度进行介绍。实际应用场景中,计算节点200还可以对已经分配给各个负载的物理资源进行重新调度。下面,结合附图对计算节点200重新调度物理资源的具体实现流程进行介绍。
参见图6,为一种重新调度计算节点200上各个负载所分配到的物理资源的流程示意图,该方法可以应用于图2或者图4所示的计算节点200,该方法具体可以包括:
S601:调度模块203确定满足重调度条件。
本实施例中,计算节点200可以在满足预设的重调度条件时,确定重新调度已经分配给计算节点200上运行的负载的物理资源。
作为一些示例,重调度条件,例如可以是接收到调度系统100发送的重调度指令,从而计算节点200可以在该重调度指令的指示下,确定重新调度已经分配给计算节点200上运行的负载的物理资源。比如,当运行在计算节点200上的某个负载需要提升业务服务质量时,调度系统100可以指示计算节点200为该负载进行物理资源重调度,以此提高该负载在运行时的性能,进而提升该负载对应的业务服务质量。
或者,重调度条件,例如可以是资源管理模块201所确定的可用的物理资源的资源量低于预设阈值,以便计算节点200通过重新调度物理资源来提高分配物理资源的合理性。
又或者,重调度条件,例如可以是计算节点200上物理资源的碎片率高于预设碎片率等,这样,计算节点200通过重新调度物理资源可以减小计算节点200上的物理资源的碎片率。
又或者,计算节点200可以周期性对分配给各个负载的物理资源进行重新调度,则重调度条件,例如可以是计算节点200距离上一次重新调度物理资源的时长达到预设时长(也即重调度周期)。本实施例中,对于重调度条件的具体实现方式并不进行限定。
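上述几种重调度条件的判断逻辑,可以用如下Python草图汇总示意(字段名与各阈值取值均为说明用途的假设,满足任一条件即触发重调度):

```python
# 重调度触发条件草图:收到重调度指令、可用资源占比低于阈值、
# 碎片率高于阈值、或距上次重调度达到周期时长,任一满足即触发。
def should_reschedule(state, free_threshold=0.1, frag_threshold=0.3, period=3600):
    """state 为节点状态字典;返回是否应触发物理资源重调度。"""
    if state.get("reschedule_command"):              # 收到调度系统下发的重调度指令
        return True
    if state["free_ratio"] < free_threshold:         # 可用物理资源占比过低
        return True
    if state["frag_ratio"] > frag_threshold:         # 物理资源碎片率过高
        return True
    if state["since_last_reschedule"] >= period:     # 达到周期性重调度的周期
        return True
    return False

print(should_reschedule({"free_ratio": 0.5, "frag_ratio": 0.4,
                         "since_last_reschedule": 100}))   # True(碎片率超过阈值)
```

实际实现中各条件可以单独启用或组合使用,本草图仅示意"任一条件满足即触发"的判断结构。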
S602:调度模块203获取资源管理信息以及监控信息,该资源管理信息包括资源管理模块201记录的计算节点200上的物理资源总量以及分配给各个负载的物理资源信息,该监控信息包括监控模块205针对运行在计算节点200上的各个负载的分类信息、聚合信息以及预测信息。
其中,监控模块205对计算节点200上的各个负载所进行的分类、聚合以及预测处理的具体实现方式,可以参见前述实施例中的相关之处描述,在此不做赘述。
S603:调度模块203判断是否对已经分配给各个负载的物理资源进行重新调度。若是,则继续执行步骤S604;若否,则继续执行步骤S607。
实际应用时,调度模块203可以对计算节点200上运行的所有负载进行物理资源重调度,也可以针对部分负载进行物理资源重调度,比如,调度模块203可以为调度系统100所指定的负载重新调度物理资源等。
S604:调度模块203释放第一物理资源,并为负载重新调度第二物理资源。
在进一步可能的实施方式中,调度模块203在为负载重新调度物理资源之前,还可以查找是否存在相对于当前资源调度策略更加优化的调度方案,该调度方案可以预先设定或者由计算节点200实时生成。其中,该调度方案可以使得负载在运行性能、资源消耗等方面得到优化。若存在,则调度模块203可以基于该调度方案完成对该负载的物理资源重调度,具体可以是先释放已经为负载分配的第一物理资源,并在完成第一物理资源的释放后,从计算节点200当前的可用物理资源中为该负载重新分配第二物理资源。而若不存在,则调度模块203可以基于当前的资源调度策略为负载调度物理资源。
S605:资源管理模块201修正可用的物理资源。
由于调度模块203在为负载重新调度物理资源后,计算节点200上分配给各个负载的物理资源的信息发生变化,因此,资源管理模块201可以根据各个负载重新分配到的物理资源,对重调度之前所记录的资源分配信息进行修正。
S606:控制模块204针对物理资源发生变化的负载,执行负载迁移操作。
具体实现时,控制模块204利用重新分配的物理资源重新运行该负载,以使得该负载基于新分配的物理资源进行运行,完成负载迁移。同时,控制模块204可以释放之前分配给负载的物理资源。
S607:计算节点200上的物理资源重调度过程结束。
基于上述负载处理方法,本申请实施例还提供一种负载处理装置。参阅图7,示出了本申请实施例提供的一种负载处理装置的结构示意图,负载处理装置700可以应用于第一计算节点,该第一计算节点与其他计算节点连接,并且第一计算节点和其他计算节点设置在同一计算节点集群,所述负载处理装置700包括:
资源管理模块701,用于确定所述第一计算节点上可用的物理资源和所述其他计算节点上可用的物理资源;
通信模块702,用于接收负载运行请求,所述负载运行请求用于请求运行目标负载的物理资源;
调度模块703,用于判断在所述第一计算节点上可用的物理资源是否满足所述负载运行请求所请求的所述物理资源,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源,在所述第一计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源的情况下,从所述其他计算节点中选择可用的物理资源满足所述负载运行请求所请求的所述物理资源的第二计算节点,并将所述负载运行请求转发至所述第二计算节点。
在一种可能的实施方式中,所述第一计算节点与调度系统连接,所述通信模块702,具体用于接收所述调度系统发送的所述负载运行请求。
在一种可能的实施方式中,所述调度系统设置在公有云的数据中心,所述计算节点集群设置在与所述公有云的数据中心远程连接的边缘云数据中心。
在一种可能的实施方式中,所述调度系统以及所述计算节点集群均设置在所述公有云的数据中心。
在一种可能的实施方式中,所述调度系统包括虚拟机调度系统和容器调度系统。
在一种可能的实施方式中,所述通信模块702,还用于从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源之后,向所述调度系统发送响应信息,其中,所述响应信息用于通知所述调度系统所述负载运行请求所请求的所述物理资源已在所述第一计算节点上扣减。
在一种可能的实施方式中,所述通信模块702,还用于在接收所述负载运行请求之前,向所述调度系统发送可用的物理资源总量,所述可用的物理资源总量包括所述第一计算节点上的可用的物理资源的资源量以及所述其他计算节点上可用的物理资源的资源量之和。
在一种可能的实施方式中,所述通信模块702,具体用于接收所述计算节点集群中的第三计算节点转发的所述负载运行请求,其中,所述第三计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源。
在一种可能的实施方式中,所述负载运行请求携带有所述目标负载的类型,其中所述目标负载的类型包括虚拟机和容器,所述负载处理装置还包括控制模块704;
所述控制模块704,具体用于:
判断所述目标负载的类型为虚拟机或容器;
在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,在所述目标负载的类型为虚拟机时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建虚拟机,在所述目标负载的类型为容器时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建容器。
在一种可能的实施方式中,所述资源管理模块701,具体用于:
采集所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源;
根据所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源,确定所述其他计算节点上可用的物理资源。
在一种可能的实施方式中,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配的物理资源为第一物理资源;
所述调度模块703,还用于:
当所述第一计算节点满足重调度条件时,释放所述第一物理资源;
从所述第一计算节点上可用的物理资源中重新为所述目标负载分配第二物理资源,所述第二物理资源与所述第一物理资源不同。
进一步的,负载处理装置700还可以包括更多的功能模块,如还可以包括监控模块705等,该监控模块705用于对已分配以及未分配的物理资源进行监控,以便资源管理模块701可以根据监控结果确定可用的物理资源等。
本实施例提供的负载处理装置700对应于前述实施例中的负载处理方法,因此,本实施例中所提供的各个模块的具体实现方式及其所具有的技术效果,可以参见前述实施例中的相关之处描述,在此不做赘述。具体地,负载处理装置700中的资源管理模块701的具体实现方式及其所具有的技术效果,可以参见前述实施例中的资源管理模块201;负载处理装置700中的通信模块702的具体实现方式及其所具有的技术效果,可以参见前述实施例中的通信模块202;负载处理装置700中的调度模块703的具体实现方式及其所具有的技术效果,可以参见前述实施例中的调度模块203;负载处理装置700中的控制模块704的具体实现方式及其所具有的技术效果,可以参见前述实施例中的控制模块204;负载处理装置700中的监控模块705的具体实现方式及其所具有的技术效果,可以参见前述实施例中的监控模块205等,本实施例在此不做赘述。
另外,本申请实施例还提供了一种计算节点,该计算节点可以是用于实现上述计算节点200的设备。参见图8,示出了该计算节点的硬件结构示意图。
如图8所示,计算节点800包括总线801、处理器802、通信接口803和存储器804。处理器802、存储器804和通信接口803之间通过总线801通信。总线801可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图8中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。通信接口803用于与外部通信,例如接收终端发送的数据获取请求等。
其中,处理器802可以为中央处理器(central processing unit,CPU)。存储器804可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器804还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,HDD或SSD。
存储器804中存储有可执行代码,处理器802执行该可执行代码以执行前述计算节点200所执行的方法。
具体地,在实现图7所示实施例的情况下,执行图7中的负载处理装置700的功能所需的软件或程序代码存储在存储器804中,负载处理装置700与其他设备(如其他计算节点)的交互通过通信接口803实现,处理器802用于执行存储器804中的指令,实现负载处理装置700的功能,或者执行上述计算节点200所执行的方法。
此外,本申请实施例还提供了一种计算节点集群,如图2以及图4中所示的计算节点集群,所述计算节点集群包括多个计算节点,所述多个计算节点中的一个或者多个计算节点执行上述实施例计算节点200所执行的方法。
此外,本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当其在计算机设备上运行时,使得计算机设备执行上述实施例计算节点200所执行的方法。
此外,本申请实施例还提供了一种计算机程序产品,所述计算机程序产品被计算机执行时,所述计算机执行前述负载处理方法中的任一方法。该计算机程序产品可以为一个软件安装包,在需要使用前述负载处理方法中的任一方法的情况下,可以下载该计算机程序产品并在计算机上执行该计算机程序产品。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (26)

  1. 一种负载处理方法,其特征在于,所述方法应用于第一计算节点,所述第一计算节点与其他计算节点连接,所述第一计算节点和所述其他计算节点设置在同一计算节点集群,所述方法包括:
    确定所述第一计算节点上可用的物理资源和所述其他计算节点上可用的物理资源;
    接收负载运行请求,所述负载运行请求用于请求运行目标负载的物理资源;
    判断在所述第一计算节点上可用的物理资源是否满足所述负载运行请求所请求的所述物理资源,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源,在所述第一计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源的情况下,从所述其他计算节点中选择可用的物理资源满足所述负载运行请求所请求的所述物理资源的第二计算节点,并将所述负载运行请求转发至所述第二计算节点。
  2. 根据权利要求1所述的方法,其特征在于,所述第一计算节点与调度系统连接,所述接收负载运行请求,包括:
    接收所述调度系统发送的所述负载运行请求。
  3. 根据权利要求2所述的方法,其特征在于,所述调度系统设置在公有云的数据中心,所述计算节点集群设置在与所述公有云的数据中心远程连接的边缘云数据中心。
  4. 根据权利要求2所述的方法,其特征在于,所述调度系统以及所述计算节点集群均设置在所述公有云的数据中心。
  5. 根据权利要求2至4任一项所述的方法,其特征在于,所述调度系统包括虚拟机调度系统和容器调度系统。
  6. 根据权利要求2至5任一项所述的方法,其特征在于,从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源之后,所述方法还包括:
    向所述调度系统发送响应信息,其中,所述响应信息用于通知所述调度系统所述负载运行请求所请求的所述物理资源已在所述第一计算节点上扣减。
  7. 根据权利要求2至6任一项所述的方法,其特征在于,在接收所述负载运行请求之前,所述方法还包括:
    向所述调度系统发送可用的物理资源总量,所述可用的物理资源总量包括所述第一计算节点上的可用的物理资源的资源量以及所述其他计算节点上可用的物理资源的资源量之和。
  8. 根据权利要求1所述的方法,其特征在于,所述接收负载运行请求,包括:
    接收所述计算节点集群中的第三计算节点转发的所述负载运行请求,其中,所述第三计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述负载运行请求携带有所述目标负载的类型,其中所述目标负载的类型包括虚拟机和容器,所述方法还包括:
    判断所述目标负载的类型为虚拟机或容器;
    在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,在所述目标负载的类型为虚拟机时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建虚拟机,在所述目标负载的类型为容器时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建容器。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述确定其他计算节点上可用的物理资源,包括:
    采集所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源;
    根据所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源,确定所述其他计算节点上可用的物理资源。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配的物理资源为第一物理资源;
    所述方法还包括:
    当所述第一计算节点满足重调度条件时,释放所述第一物理资源,并从所述第一计算节点上可用的物理资源中重新为所述目标负载分配第二物理资源,所述第二物理资源与所述第一物理资源不同。
  12. 一种负载处理装置,其特征在于,所述负载处理装置应用于第一计算节点,所述第一计算节点与其他计算节点连接,所述第一计算节点和所述其他计算节点设置在同一计算节点集群,所述负载处理装置包括:
    资源管理模块,用于确定所述第一计算节点上可用的物理资源和所述其他计算节点上可用的物理资源;
    通信模块,用于接收负载运行请求,所述负载运行请求用于请求运行目标负载的物理资源;
    调度模块,用于判断在所述第一计算节点上可用的物理资源是否满足所述负载运行请求所请求的所述物理资源,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源,在所述第一计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源的情况下,从所述其他计算节点中选择可用的物理资源满足所述负载运行请求所请求的所述物理资源的第二计算节点,并将所述负载运行请求转发至所述第二计算节点。
  13. 根据权利要求12所述的负载处理装置,其特征在于,所述第一计算节点与调度系统连接,所述通信模块,具体用于接收所述调度系统发送的所述负载运行请求。
  14. 根据权利要求13所述的负载处理装置,其特征在于,所述调度系统设置在公有云的数据中心,所述计算节点集群设置在与所述公有云的数据中心远程连接的边缘云数据中心。
  15. 根据权利要求13所述的负载处理装置,其特征在于,所述调度系统以及所述计算节点集群均设置在所述公有云的数据中心。
  16. 根据权利要求13至15任一项所述的负载处理装置,其特征在于,所述调度系统包括虚拟机调度系统和容器调度系统。
  17. 根据权利要求13至16任一项所述的负载处理装置,其特征在于,所述通信模块,还用于从所述第一计算节点上可用的物理资源中为所述目标负载分配所述负载运行请求所请求的物理资源之后,向所述调度系统发送响应信息,其中,所述响应信息用于通知所述调度系统所述负载运行请求所请求的所述物理资源已在所述第一计算节点上扣减。
  18. 根据权利要求13至17任一项所述的装置,其特征在于,所述通信模块,还用于在接收所述负载运行请求之前,向所述调度系统发送可用的物理资源总量,所述可用的物理资源总量包括所述第一计算节点上的可用的物理资源的资源量以及所述其他计算节点上可用的物理资源的资源量之和。
  19. 根据权利要求12所述的负载处理装置,其特征在于,所述通信模块,具体用于接收所述计算节点集群中的第三计算节点转发的所述负载运行请求,其中,所述第三计算节点上可用的物理资源不满足所述负载运行请求所请求的所述物理资源。
  20. 根据权利要求12至19任一项所述的负载处理装置,其特征在于,所述负载运行请求携带有所述目标负载的类型,其中所述目标负载的类型包括虚拟机和容器,所述负载处理装置还包括控制模块;
    所述控制模块,具体用于:
    判断所述目标负载的类型为虚拟机或容器;
    在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,在所述目标负载的类型为虚拟机时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建虚拟机,在所述目标负载的类型为容器时,根据分配的所述负载运行请求所请求的物理资源在所述第一计算节点创建容器。
  21. 根据权利要求12至20任一项所述的负载处理装置,其特征在于,所述资源管理模块,具体用于:
    采集所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源;
    根据所述其他计算节点的物理资源总量以及所述其他计算节点已使用的物理资源,确定所述其他计算节点上可用的物理资源。
  22. 根据权利要求12至21任一项所述的负载处理装置,其特征在于,在所述第一计算节点上可用的物理资源满足所述负载运行请求所请求的所述物理资源的情况下,从所述第一计算节点上可用的物理资源中为所述目标负载分配的物理资源为第一物理资源;
    所述调度模块,还用于:
    当所述第一计算节点满足重调度条件时,释放所述第一物理资源;
    从所述第一计算节点上可用的物理资源中重新为所述目标负载分配第二物理资源,所述第二物理资源与所述第一物理资源不同。
  23. 一种计算节点,其特征在于,所述计算节点包括处理器和存储器;
    所述处理器用于执行所述存储器中存储的指令,以使得所述计算节点执行权利要求1至11中任一项所述的方法。
  24. 一种计算节点集群,其特征在于,所述计算节点集群包括多个计算节点,所述多个计算节点中的一个或者多个计算节点执行如权利要求1至11任一项所述的方法。
  25. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当其在计算设备上运行时,使得所述计算设备执行如权利要求1至11任一项所述的方法。
  26. 一种包含指令的计算机程序产品,当其在计算设备上运行时,使得所述计算设备执行如权利要求1至11中任一项所述的方法。
PCT/CN2022/088019 2021-04-20 2022-04-20 负载处理方法、计算节点、计算节点集群及相关设备 WO2022222975A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110424951.1 2021-04-20
CN202110424951 2021-04-20
CN202110932332.3 2021-08-13
CN202110932332.3A CN115220862A (zh) 2021-04-20 2021-08-13 负载处理方法、计算节点、计算节点集群及相关设备

Publications (1)

Publication Number Publication Date
WO2022222975A1 true WO2022222975A1 (zh) 2022-10-27

Family

ID=83606646

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088019 WO2022222975A1 (zh) 2021-04-20 2022-04-20 负载处理方法、计算节点、计算节点集群及相关设备

Country Status (2)

Country Link
CN (1) CN115220862A (zh)
WO (1) WO2022222975A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090320023A1 (en) * 2008-06-24 2009-12-24 Barsness Eric L Process Migration Based on Service Availability in a Multi-Node Environment
CN107479947A (zh) * 2017-08-18 2017-12-15 郑州云海信息技术有限公司 一种虚拟机能耗优化方法和系统
CN110955487A (zh) * 2018-09-27 2020-04-03 株式会社日立制作所 Hci环境下的vm/容器和卷配置决定方法及存储系统
CN111880939A (zh) * 2020-08-07 2020-11-03 曙光信息产业(北京)有限公司 容器动态迁移方法、装置及电子设备


Also Published As

Publication number Publication date
CN115220862A (zh) 2022-10-21

Similar Documents

Publication Publication Date Title
US9916183B2 (en) Scheduling mapreduce jobs in a cluster of dynamically available servers
US11836535B1 (en) System and method of providing cloud bursting capabilities in a compute environment
EP2360587A1 (en) Automatic workload transfer to an on-demand center
US20120323988A1 (en) Task allocation in a computer network
US11467874B2 (en) System and method for resource management
US8959367B2 (en) Energy based resource allocation across virtualized machines and data centers
CN109257399B (zh) 云平台应用程序管理方法及管理平台、存储介质
CN109766172B (zh) 一种异步任务调度方法以及装置
US10606650B2 (en) Methods and nodes for scheduling data processing
US20120324111A1 (en) Task allocation in a computer network
CN110914805A (zh) 用于分层任务调度的计算系统
US20210406053A1 (en) Rightsizing virtual machine deployments in a cloud computing environment
WO2014136302A1 (ja) タスク管理装置及びタスク管理方法
CN114721824A (zh) 一种资源分配方法、介质以及电子设备
CN114116173A (zh) 动态调整任务分配的方法、装置和系统
US9928092B1 (en) Resource management in a virtual machine cluster
JP6279816B2 (ja) ストレージ監視システムおよびその監視方法
WO2022222975A1 (zh) 负载处理方法、计算节点、计算节点集群及相关设备
US20230155958A1 (en) Method for optimal resource selection based on available gpu resource analysis in large-scale container platform
WO2022111466A1 (zh) 任务调度方法、控制方法、电子设备、计算机可读介质
US20220100573A1 (en) Cloud bursting technologies
CN114090201A (zh) 资源调度方法、装置、设备及存储介质
KR102014246B1 (ko) 리소스 통합관리를 위한 메소스 처리 장치 및 방법
WO2024087663A1 (zh) 作业调度方法、装置和芯片
US20240160487A1 (en) Flexible gpu resource scheduling method in large-scale container operation environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22791081

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22791081

Country of ref document: EP

Kind code of ref document: A1