WO2024046017A1 - Container-based process scheduling method, apparatus, device, and storage medium

Container-based process scheduling method, apparatus, device, and storage medium

Info

Publication number: WO2024046017A1
Authority: WIPO (PCT)
Prior art keywords: cpu, container, home, running, away
Application number: PCT/CN2023/110686
Other languages: English (en), French (fr)
Inventor: 彭志光 (Peng Zhiguang)
Original Assignee: 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Publication of WO2024046017A1


Classifications

    • G: PHYSICS; G06: COMPUTING, CALCULATING OR COUNTING; G06F: ELECTRIC DIGITAL DATA PROCESSING; G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/4856: Task life-cycle (stopping, restarting, resuming execution), resumption being on a different machine, e.g. task migration, virtual machine migration
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/505: Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F2209/484: Indexing scheme relating to G06F9/48; Precedence

Definitions

  • the present application relates to the field of cloud computing technology, and in particular to a container-based process scheduling method, device, equipment and storage medium.
  • the container cloud is a container-based cloud platform: by creating containers on a device, the container cloud can provide containers for businesses to use.
  • hybrid deployment is often used to deploy services on container clouds, that is, the business processes of multiple services are deployed on the same device.
  • in this case, multiple containers are created on the same device for use by different businesses.
  • the business processes of different businesses are isolated in different containers.
  • the running of business processes depends on CPU (Central Processing Unit) resources.
  • when migrating business processes, it is also necessary to consider whether CPU resource utilization is reasonable and whether business performance meets the required standards. For example, there should not be a situation where a business process in a container cannot run even though some CPUs are idle, because this leads to low resource utilization. For another example, there should be no conflict between different businesses.
  • otherwise, the business processes of delay-sensitive services may compete for CPU resources, resulting in larger scheduling delays for those processes and thereby affecting business performance.
  • Embodiments of the present application provide a container-based process scheduling method, device, equipment and storage medium.
  • the technical solutions provided by the embodiments of this application include the following contents.
  • a container-based process scheduling method includes:
  • for any container, periodically obtain the running status data of the container's home CPU, where the home CPU refers to a CPU on the device that has established a binding relationship with the container, the number of CPUs bound to the container is less than a target number, and the target number refers to the number of CPUs required to meet the business operation requirements of the container; in response to the running status data of the home CPU meeting the load balancing conditions, perform business process migration between the home CPU and an away CPU, where the away CPU refers to a CPU on the device that has not established a binding relationship with the container; in response to a first business process in the container being migrated, determine the running priority of the first business process on the moved-in CPU; and run the first business process on the moved-in CPU according to the running priority of the first business process.
  • a container-based process scheduling device includes:
  • the acquisition module is configured to, for any container, periodically obtain the running status data of the home CPU of the container, where the home CPU refers to a CPU on the device that has established a binding relationship with the container; the number of CPUs bound to the container is less than a target number, and the target number refers to the number of CPUs required to meet the business operation requirements of the container;
  • the scheduling module is configured to perform business process migration between the home CPU and an away CPU in response to the running status data of the home CPU meeting the load balancing conditions, where the away CPU refers to a CPU on the device that has not established a binding relationship with the container;
  • the determining module is configured to, in response to a first business process in the container being migrated, determine the running priority of the first business process on the moved-in CPU;
  • the running module is configured to run the first business process on the moved-in CPU according to the running priority of the first business process.
  • a computer device includes a processor and a memory.
  • a computer program is stored in the memory.
  • the computer program is loaded and executed by the processor to implement the above container-based process scheduling method.
  • a computer-readable storage medium in which a computer program is stored, and the computer program is loaded and executed by a processor to implement the above container-based process scheduling method.
  • the computer program product includes a computer program.
  • the computer program is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device performs the above container-based process scheduling method.
  • the scheduling scheme provided by the embodiments of this application first proposes the concepts of home CPU and away CPU based on the binding relationship between containers and the CPUs on the device: for any container, a CPU that has established a binding relationship with the container is called a home CPU of the container, and a CPU that has not established a binding relationship with the container is called an away CPU of the container.
  • the number of CPUs bound to each container is less than the target number, where the target number refers to the number of CPUs required to meet the business operation requirements of each container. Since each container is bound to fewer CPUs, more containers can be deployed on the same device for use by different businesses, achieving a higher CPU resource oversell ratio and improving resource utilization. Here, oversold means that the actual CPU resources owned are less than the allocated CPU resources.
  • the embodiments of the present application also support the scheduling of business processes among multiple CPUs. Specifically, for any home CPU of a container, in response to the running status data of that home CPU meeting the load balancing conditions, business processes are migrated between the home CPU and the away CPUs of the container; this scheduling method avoids the situation where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
  • the embodiments of this application also propose the concept of running priority: if the first business process in the container is migrated, the running priority of the first business process on the moved-in CPU is determined, and the first business process is then run on the moved-in CPU according to that running priority. This priority control can avoid conflicts between different services: because the first business process runs on the moved-in CPU at the determined priority, it does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
  • the scheduling solution provided by the embodiments of the present application can take into account both service performance and resource utilization.
  • Figure 1 is a schematic structural diagram of a computer device according to an exemplary embodiment
  • Figure 2 is a system architecture diagram of a container cloud according to an exemplary embodiment
  • Figure 3 is a system architecture diagram of another container cloud according to an exemplary embodiment
  • Figure 4 is a schematic diagram of binding between a container and a CPU according to an exemplary embodiment
  • Figure 5 is a flow chart of a container-based process scheduling method according to an exemplary embodiment
  • Figure 6 is a schematic diagram of load detection according to an exemplary embodiment
  • Figure 7 is a schematic diagram of priority control according to an exemplary embodiment
  • Figure 8 is a flow chart of another container-based process scheduling method according to an exemplary embodiment
  • Figure 9 is a schematic diagram of a capacity expansion logic according to an exemplary embodiment
  • Figure 10 is a flow chart of yet another container-based process scheduling method according to an exemplary embodiment
  • Figure 11 is a schematic diagram of a shrinking logic according to an exemplary embodiment
  • Figure 12 is a flow chart of yet another container-based process scheduling method according to an exemplary embodiment
  • Figure 13 is a schematic structural diagram of a container-based process scheduling device according to an exemplary embodiment.
  • The terms first, second, etc. are used to distinguish identical or similar items having substantially the same function. It should be understood that there is no logical or temporal dependency among "first", "second", and "nth", and that no limitation is placed on quantity or execution order. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms.
  • first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of various examples.
  • Both the first element and the second element may be elements, and in some cases, may be separate and distinct elements.
  • At least one refers to one or more than one.
  • at least one element can be one element, two elements, three elements, or any integer number greater than or equal to one.
  • Multiple means two or more.
  • multiple elements can be two elements, three elements, or any integer number greater than or equal to two.
  • The information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or local area network to realize data calculation, storage, processing, and sharing.
  • Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology, and the like based on the cloud computing business model. It can form a resource pool to be used on demand, which is flexible and convenient, and cloud computing technology will become its important support.
  • The background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. Each item may have its own identification mark, which needs to be transmitted to a backend system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong backing system support, which can only be achieved through cloud computing.
  • Cloud computing in a narrow sense refers to the delivery and usage model of IT infrastructure, i.e., obtaining required resources through the network in an on-demand, easily scalable manner; cloud computing in a broad sense refers to the delivery and usage model of services, i.e., obtaining required services through the network in an on-demand, easily scalable manner. Such services can be IT-related, software-related, Internet-related, or other services.
  • Cloud computing is a product of the development and integration of traditional computer and network technologies such as Grid Computing, Distributed Computing, Parallel Computing, Utility Computing, Network Storage Technologies, Virtualization, and Load Balancing.
  • Cloud computing has developed rapidly with the development of the Internet, real-time data streams, diversification of connected devices, and the demand for search services, social networks, mobile commerce, and open collaboration. Different from the previous parallel distributed computing, the emergence of cloud computing will conceptually promote revolutionary changes in the entire Internet model and enterprise management model.
  • Container: in Linux, container technology is a process isolation technology. In terms of computing form, container technology is a lightweight operating-system-level virtualization technology that shares the host kernel. Containers can isolate processes into an independent environment.
  • Container cloud: an emerging product form in cloud computing technology.
  • Container cloud is a container management platform composed of containers, which provides users with great convenience in using containers. Among them, by creating containers on physical machines or virtual machines, container clouds can provide services for businesses by providing containers. To put it another way, container cloud uses containers as the basic unit of resource allocation and scheduling, encapsulates the environment in which software runs, and provides developers and system administrators with a platform for building, publishing, and running distributed applications.
  • Hybrid deployment refers to deploying the processes of multiple services on the same device.
  • the businesses mentioned here include but are not limited to: game business, search business, information flow business, e-commerce transaction business, big data business, machine learning business, storage business, etc.
  • Process scheduling refers to dynamically allocating the CPU to a process in the run queue according to certain rules so that it can be executed; in other words, process scheduling refers to selecting a process from the run queue according to certain rules so that it can obtain the CPU. In the embodiments of this application, process scheduling also covers scheduling processes between different CPUs for execution.
  • cpuset mechanism: in Linux, the basic function of cpuset is to restrict certain processes to run only on certain CPUs of a device. For example, assuming there are 4 processes and 4 CPUs on a device, cpuset can be used to make the first and second processes run only on the first and second CPUs. In other words, cpuset limits the range of CPUs that a process can run on.
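  • As a hedged illustration of the cpuset mechanism just described (not part of the patent itself), the sketch below pins the current process to CPU0-CPU1 through the cgroup v1 cpuset filesystem interface; the group name "container1" and the minimal error handling are illustrative assumptions.
```c
/* Minimal sketch, assuming a mounted cgroup v1 cpuset hierarchy.
 * The group name "container1" is hypothetical. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_file(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fputs(value, f);
    return fclose(f);
}

int main(void)
{
    /* Create a cpuset group for the container. */
    mkdir("/sys/fs/cgroup/cpuset/container1", 0755);

    /* Restrict the group to CPU0-CPU1 and memory node 0. */
    write_file("/sys/fs/cgroup/cpuset/container1/cpuset.cpus", "0-1");
    write_file("/sys/fs/cgroup/cpuset/container1/cpuset.mems", "0");

    /* Move the current process into the group; it will now run
     * only on the CPUs listed in cpuset.cpus. */
    char pid[32];
    snprintf(pid, sizeof(pid), "%d", (int)getpid());
    return write_file("/sys/fs/cgroup/cpuset/container1/tasks", pid);
}
```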
  • Overselling: in the embodiments of this application, overselling refers to deploying more containers on a device of fixed specifications, where fixed specifications means a fixed number of CPUs. In other words, oversold means that the number of CPUs required by the actually deployed containers is greater than the number of CPUs available on the device. For example, to guarantee the business service quality of each container, each container requires 4 CPUs, and only 8 CPUs are deployed on the device, yet more than 2 containers are deployed to share these 8 CPUs in order to improve resource utilization.
  • the container-based process scheduling method provided by the embodiments of the present application is applied to the computer device shown in Figure 1 or the virtual machine created on the above computer device. That is to say, the container-based process scheduling method provided by the embodiment of the present application can be executed by the computer device shown in Figure 1, or by a virtual machine created on the above-mentioned computer device.
  • the above computer equipment is also called a physical machine or a physical server in the embodiment of this application.
  • FIG. 1 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the computer device 100 may vary greatly due to different configurations or performance, and may include one or more central processing units (also called processors) 101 and one or more memories 102, where at least one program code is stored in the memory 102, and the at least one program code is loaded and executed by the central processing unit 101 to implement the container-based process scheduling method provided by the embodiments of the present application.
  • the above-mentioned at least one program code may be called a computer program.
  • the computer device 100 may also have components such as wired or wireless network interfaces, keyboards, and input and output interfaces for input and output.
  • the computer device 100 may also include other components for realizing device functions, which will not be described again here.
  • the container-based process scheduling method provided by the embodiments of the present application is applied to the kernel layer in the system architecture of the container cloud shown in Figure 2.
  • Figure 2 is a system architecture diagram of a container cloud according to an exemplary embodiment.
  • the system architecture includes a device layer 201, a kernel layer 202, a platform layer 203 and a business layer 204 from bottom to top.
  • the device layer 201 includes physical machines and virtual machines.
  • the kernel layer 202 is used to provide resource management and process scheduling functions.
  • the kernel layer 202 also provides a cpuset mechanism and a quota mechanism.
  • the platform layer 203 includes a container orchestrator (full name: Kubernetes, abbreviated: k8s).
  • the platform layer 203 uses the cpuset mechanism and quota mechanism provided by the kernel layer 202 to generate core-bound containers or non-core-bound containers, and then provides them to the business layer 204 for different business uses.
  • the service layer 204 includes various services deployed in a hybrid manner. It should be noted that Figure 2 shows only four services: service A, service B, service C, and service D; in fact, the service layer 204 may include more or fewer services, which is not limited in this application.
  • the cpuset mechanism will restrict the business processes in some containers to run only on the fixed CPU of the device, which also limits the range of CPU resources that the business processes can use.
  • the cpuset mechanism allocates CPU resources to the business processes in the container by binding the CPU.
  • the above-mentioned core binding refers to setting affinity between a process and CPU cores; after it is set, the process will only run on the bound CPUs.
  • Under the quota mechanism, there is no binding between the container and the CPUs, and the business processes in the container can run on any CPU; however, based on the quota mechanism, the CPU resources that each container can use within a fixed time period are limited.
  • each container requires 4 CPUs. These four CPUs are the number of CPUs required to meet the business operation requirements of the container. In other words, they are the amount of CPU resources required to ensure the business service quality of the container.
  • each container is independently bound to 4 CPUs, so these 8 CPUs can be allocated to two containers.
  • the quota mechanism can limit the share of CPU resources used by each container within a fixed time period to 400%.
  • the above fixed time period is usually 100ms (milliseconds), and 400% means that a maximum of 400ms of CPU time is used in every 100ms time period, that is, 4 CPUs.
  • the business process in a container can run on any 4 CPUs among the 8 CPUs.
  • a certain degree of CPU oversold can be achieved.
  • these 8 CPUs can be allocated to 3 or more containers.
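  • As a hedged sketch of the quota mechanism above (not the patent's own code), the snippet below caps a container at "400%" CPU using the CFS bandwidth controller of the cgroup v1 cpu subsystem: a quota of 400000 microseconds per 100000-microsecond period corresponds to at most 4 CPUs' worth of time. The group path "container1" is an illustrative assumption.
```c
/* Minimal sketch, assuming a mounted cgroup v1 cpu hierarchy. */
#include <stdio.h>

static int write_file(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fputs(value, f);
    return fclose(f);
}

int main(void)
{
    /* 100ms accounting period; 400ms of CPU time per period = "400%",
     * i.e. at most 4 CPUs' worth of time for this container. */
    write_file("/sys/fs/cgroup/cpu/container1/cpu.cfs_period_us", "100000");
    write_file("/sys/fs/cgroup/cpu/container1/cpu.cfs_quota_us", "400000");
    return 0;
}
```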
  • embodiments of this application propose a container-based process scheduling solution, which can deploy more containers on the same device for use by different businesses, with each container binding fewer CPUs, thereby achieving a higher CPU oversell ratio without affecting business performance and significantly improving resource utilization.
  • Figure 3 is a system architecture diagram of another container cloud according to an exemplary embodiment.
  • a container type is added, namely a dynamic expansion container (also called an elastic container).
  • the core binding logic of the embodiment of the present application is first introduced below.
  • a binding relationship will be established between the container and some CPUs on the device, thereby forming the home CPU of the container.
  • the number of CPUs bound to the container is less than the target number.
  • the target number refers to the number of CPUs required to meet the business operation requirements of each container. That is, this embodiment of the application will bind fewer CPUs to each container.
  • the CPU that has not established a binding relationship with the container is called the away CPU of the container in this embodiment.
  • As shown in FIG. 4, it is assumed that a total of 8 CPUs are deployed on the device, namely CPU0-CPU7.
  • Each container requires 4 CPUs, but the embodiment of this application only binds 2 CPUs to each container.
  • Taking container 1 as an example, container 1 has a binding relationship only with CPU0-CPU1, and CPU2-CPU7 are the away CPUs of container 1.
  • Correspondingly, the business processes in container 1 are called the home processes of CPU0-CPU1 in the embodiments of the present application, and the business processes in containers 2-4 are called the away processes of CPU0-CPU1.
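  • The home/away split of Figure 4 can be pictured as CPU bitmasks; the tiny sketch below (types and names are illustrative, not from the patent) derives container 1's away mask from its home mask.
```c
/* Illustrative sketch: 8 CPUs, container 1 bound to CPU0-CPU1.
 * Any CPU not in the home mask is an away CPU for that container. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t all_cpus  = 0xFF;                   /* CPU0-CPU7 */
    uint8_t home_mask = (1u << 0) | (1u << 1);  /* container 1: CPU0-CPU1 */
    uint8_t away_mask = all_cpus & ~home_mask;  /* CPU2-CPU7 */

    printf("home=0x%02x away=0x%02x\n", home_mask, away_mask);
    return 0;
}
```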
  • the load detection unit 2021 is used to obtain the running status data of each CPU on the device. Taking running status data that includes the load and the scheduling delay of processes in the run queue as an example, the load detection unit 2021 is used to detect the load change of each CPU on the device and the scheduling delay of processes in the run queue. Knowing the load changes of each CPU and the scheduling delay of the processes in the run queue helps determine whether the business processes in each container can only run on the home CPUs or can be expanded to run on the away CPUs.
  • the embodiments of this application propose the concepts of home CPU and away CPU. Since an away CPU of container A may be a home CPU of container B, when the business processes in container A need to be expanded to run on an away CPU, it must also be ensured that this does not affect the running of the home processes on that away CPU. To this end, the embodiments of the present application propose priority control logic. In some possible implementations, the priority control module 2022 is used to determine the running priority of a migrated business process on the CPU it moves to. The priority of a business process running on its home CPU differs from that on an away CPU; for example, the running priority of a business process on its home CPU is higher than its running priority on an away CPU.
  • capacity expansion and contraction include capacity expansion and capacity reduction.
  • expansion means that the business process in the container is expanded from the home CPU of the container to run on the away CPU of the container.
  • Shrinking refers to bringing a migrated business process back to a home CPU to run. That is to say, the range of CPUs on which the business processes in a container can run may change and is not limited to the bound home CPUs; therefore, in the embodiments of this application, the container is also called an elastic container or a dynamic expansion container.
  • capacity expansion and contraction can be performed dynamically based on the detection results of the load detection unit, it is called dynamic expansion and contraction.
  • the dynamic expansion and contraction unit 2023 is used to control, according to the detection results of the load detection unit 2021, whether the business processes in each container should run on the home CPUs or be expanded to run on the away CPUs; or to control whether a business process needs to be moved from an away CPU of its container back to a home CPU of the container.
  • the container-based process scheduling solution provided by the embodiments of this application can be applied not only to container cloud scenarios, but also to online & online co-location scenarios, offline & online co-location scenarios, and cost optimization scenarios.
  • container cloud technology may be involved for online & online co-location scenarios, off-line co-location scenarios and cost optimization scenarios, and this application is not limited here.
  • online refers to online business
  • offline refers to offline business.
  • Online services usually take a long time to run, have obvious ups and downs in resource utilization, and are delay-sensitive, such as information flow services, e-commerce transaction services, etc.
  • Offline services usually have higher resource utilization during operation, but are generally not sensitive to latency, such as machine learning services.
  • For offline & online co-location scenarios, co-location means mixing online services and offline services on the same physical resources, making full use of resources through resource isolation, scheduling, and other control methods while ensuring service stability. To put it another way, since the resource utilization of online businesses has obvious peaks and troughs, the main goal of co-location is to use the resources left idle by online businesses in various periods by filling in offline businesses, thereby reducing costs. Correspondingly, for online & online co-location scenarios, co-location means mixing different online services onto the same physical resources.
  • the scheduling solution provided by the embodiments of this application can achieve a higher resource oversell ratio while ensuring business performance.
  • It also provides an additional feasible solution for allocating resources to containers in container cloud scenarios.
  • Figure 5 is a flowchart of a container-based process scheduling method according to an exemplary embodiment.
  • the execution subject of this method is a computer device or a virtual machine created on the computer device, where the computer device is also called a physical machine or a physical server.
  • the execution subject of this method is the kernel layer in the system architecture of the container cloud. Taking a physical server as an example, see Figure 5.
  • the method flow includes the following steps.
  • the physical server periodically obtains the running status data of the container's home CPU, where the home CPU refers to a CPU on the device that has a binding relationship with the container; the number of CPUs bound to the container is less than a target number, and the target number refers to the number of CPUs required to meet the business operation requirements of each container.
  • This step is performed by the load detection unit provided by the kernel deployed on the physical server.
  • the home CPU mentioned in steps 501-503 refers to any home CPU bound to the container.
  • the load detection unit periodically obtains the running status data of each CPU on the device.
  • the running status data of the CPU is used to reflect the busyness of the CPU.
  • the above running status data includes at least one of load and scheduling delay of processes in the CPU's run queue. Taking any home CPU of any container as an example, periodically obtain the running status data of the container's home CPU, including at least one of the following:
  • Scheduling delay is also called scheduling latency; essentially, it is the time interval within which every runnable process is guaranteed to run at least once.
  • scheduling delay refers to the time between when a process is ready to run (enters the CPU's run queue) and when it is actually executed (obtains the execution rights of the CPU).
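  • The sketch below illustrates one plausible way to measure this scheduling delay: stamp the time when a task enters the run queue and compute the elapsed time when it first gets the CPU. The structure and hook names are assumptions for illustration, not the patent's implementation.
```c
/* Minimal sketch of scheduling-delay measurement; names are illustrative. */
#include <time.h>

struct task {
    struct timespec enqueued_at;  /* set when the task becomes runnable */
    long long sched_delay_ns;     /* filled in when it first runs */
};

static long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

/* Called when the task enters the CPU's run queue. */
void on_enqueue(struct task *t)
{
    clock_gettime(CLOCK_MONOTONIC, &t->enqueued_at);
}

/* Called when the task actually obtains the CPU. */
void on_first_run(struct task *t)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    t->sched_delay_ns = elapsed_ns(t->enqueued_at, now);
}
```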
  • this embodiment of the application uses periodic ticks on each CPU to update the load situation, and determines, based on the updated load situation, whether a business process should keep running on the home CPU or needs to be expanded to an away CPU to run.
  • Figure 6 shows that, according to the updated load situation, a business process in container 1 may need to be expanded from the original CPU0 to run on another CPU; here, the other home CPU (i.e., CPU1) and the away CPUs CPU2-CPU7 can all serve as expansion targets. This application does not limit this.
  • tick is the relative time unit of the operating system, also known as the time base of the operating system. It is derived from the periodic interrupt (output pulse) of the timer.
  • An interrupt represents a tick, also known as a "clock tick".
  • the corresponding relationship between a tick and time can be set when initializing the timer, that is, the length of time corresponding to the tick can be adjusted.
  • the kernel provides corresponding adjustment mechanisms that can change the corresponding time length of the tick according to specific circumstances. For example, you can make the operating system generate a tick in 5ms, or you can make the operating system generate a tick in 10ms. Among them, the size of the tick determines the time granularity of the operating system.
  • the scheduling cycle refers to the time period during which all runnable processes are executed on the CPU.
  • the size of the scheduling period is 24ms, which is not limited in this application.
  • the load of the home CPU is periodically obtained, including but not limited to the following methods:
  • Obtain the average load of the home CPU within a fixed period of time, where the average load is used to indicate the average number of processes in the run queue of the home CPU; obtain the average load of the home CPU in the previous scheduling cycle; and, according to the average load of the home CPU within the fixed period and the average load of the home CPU in the previous scheduling cycle, obtain the average load of the home CPU in the current scheduling cycle.
  • the average load within the above fixed period refers to rq.load_avg calculated by kernel statistics, where rq refers to the run queue.
  • the fixed duration is 1 minute, 5 minutes or 15 minutes, which is not limited by this application.
  • The average load of the home CPU in the current scheduling cycle is calculated as load_avg = d × 0.8 + rq.load_avg × 0.2, where d refers to the average load of the home CPU in the previous scheduling cycle, rq.load_avg refers to the average load of the home CPU within the fixed period, and load_avg refers to the average load of the home CPU in the current scheduling cycle.
  • the load of each CPU in any scheduling cycle can be calculated through the above calculation formula.
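  • In code, the per-cycle update above is a simple weighted average; the helper below (names are illustrative) follows the formula load_avg = d × 0.8 + rq.load_avg × 0.2.
```c
/* Sketch of the per-cycle load update: d is the average load in the
 * previous scheduling cycle, rq_load_avg is the kernel's average load
 * over the fixed window (rq.load_avg). Names are illustrative. */
static double update_load_avg(double d, double rq_load_avg)
{
    return d * 0.8 + rq_load_avg * 0.2;
}

/* Example: previous-cycle load 1.5, windowed average 0.5
 * -> current-cycle load 1.5 * 0.8 + 0.5 * 0.2 = 1.3. */
```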
  • the above describes how to periodically calculate the load of each CPU.
  • the following describes how to save the load of each CPU.
  • At each tick, it is judged whether to update the saved load status of the corresponding home CPUs and away CPUs; if the time since the last update is greater than a specific duration (one update cycle), reading and saving the loads of the corresponding home CPUs and away CPUs is triggered.
  • the size of the update period is consistent with the size of the scheduling period, which is 24 ms, and is not limited in this application.
  • the running status data of the home CPU that is bound to the container is periodically obtained, in ways including but not limited to the following:
  • each CPU maintains a home process list and an away process list.
  • this embodiment of the application will periodically perform scheduling delay statistics.
  • the time interval for periodic statistics may be consistent with the above-mentioned scheduling period or the above-mentioned tick, which is not limited in this application.
  • the number of CPUs that can be bound to each container and the number of expandable CPUs can be dynamically adjusted, and different values are assigned according to the actual situation, which is not limited in this application.
  • the ratio of expansion (number of expanded CPUs) and contraction (number of bound CPUs) can be controlled through sysctl control parameters, which is not limited in this application.
  • In response to the running status data of the home CPU meeting the load balancing conditions, the physical server performs business process migration between the home CPU and an away CPU on the device, where the away CPU refers to a CPU on the device that has no binding relationship with the container.
  • This step is performed by the dynamic expansion and contraction unit provided by the kernel deployed on the physical server.
  • business process migration includes expansion logic and shrinkage logic. That is, for any container, when the running status data of the container's home CPU and away CPU meet the load balancing conditions, expansion and shrinkage will be triggered. At this time, it is necessary to be able to quickly expand the business process to the away CPU. Run or pull it back to the home CPU to run.
  • load balancing-based capacity expansion logic and capacity reduction logic include but are not limited to the following situations.
  • Scenario 1: the running status data of the container's home CPU meets the load balancing conditions; for example, the load increases or the scheduling delay of business processes becomes too large, so some business processes need to be expanded to away CPUs to run (expansion logic).
  • Scenario 2: when business processes run on an away CPU, there are two shrink situations. One is that the running status data of the away CPU meets the load balancing conditions, for example the load increases or the scheduling delay of the migrated business process becomes too large; the other is that the load of the home CPU has been relatively low for a period of time, so the business process should be moved back to the home CPU to run.
  • In response to the first business process in the container being migrated, the physical server determines the running priority of the first business process on the moved-in CPU, and runs the first business process on the moved-in CPU according to that running priority.
  • This step is performed by the priority control unit provided by the kernel deployed on the physical server.
  • the running priority of a business process on its corresponding home CPU is higher than the running priority on its corresponding away CPU.
  • determining the running priority of the first business process on the moved-in CPU includes but is not limited to the following: in response to the moved-in CPU being a home CPU, set the running priority of the first business process on the moved-in CPU to the first running priority; in response to the moved-in CPU being an away CPU that has not established a binding relationship with the container, set the running priority of the first business process on the moved-in CPU to the second running priority; where the first running priority is higher than the second running priority.
  • Figure 7 shows the running priority control situation in which a certain business process in container 1 is extended from the home CPU (CPU0-CPU1) to run on the away CPU (such as CPU4).
  • the home CPUs of container 1 are CPU0-CPU1
  • the away CPUs are CPU2-CPU7.
  • When the business processes in container 1 run on CPU0-CPU1, they are set to the high running priority. Assuming that the running status data of CPU1 meets the load balancing conditions and the expansion conditions, a business process in container 1 is expanded to run on another CPU, that is, added to the run queue of that CPU.
  • Figure 7 shows that a certain business process in container 1 is extended to run on CPU4 (step 1 in Figure 7).
  • The business process is set to the low running priority on CPU4. When the home process of CPU4 is awakened and needs CPU resources, the home process obtains CPU resources with its high running priority (step 2 in Figure 7). When the load on CPU4 increases or the scheduling delay of the low-priority business process becomes too large, the shrink conditions are met, and the previously migrated business process is moved back to CPU1 (step 3 in Figure 7); at this time, the running priority of the business process is set back to the high running priority.
  • the setting strategy of the running priority can be: when a business process is enqueued, determine whether the CPU it is enqueued on is one of its corresponding home CPUs; if so, set the running priority of the business process to the high running priority; or, when a business process is enqueued, determine whether the CPU it is enqueued on is one of its corresponding away CPUs; if so, set the running priority of the business process to the low running priority.
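  • A minimal sketch of this enqueue-time rule follows; the structures and the bitmask representation of home CPUs are assumptions for illustration, not the patent's implementation.
```c
/* Sketch of the enqueue-time priority rule: high priority on a home
 * CPU, low priority on an away CPU. Names are hypothetical. */
enum run_prio { PRIO_HIGH, PRIO_LOW };

struct proc {
    unsigned long home_mask;   /* home CPUs of the container it belongs to */
    enum run_prio prio;
};

void on_enqueue(struct proc *p, int cpu)
{
    if (p->home_mask & (1ul << cpu))
        p->prio = PRIO_HIGH;   /* enqueued on a home CPU */
    else
        p->prio = PRIO_LOW;    /* enqueued on an away CPU */
}
```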
  • the scheduling scheme provided by the embodiments of this application is applied to the situation of deploying different services on the same device.
  • the scheduling scheme first proposes the concepts of home CPU and away CPU based on the binding relationship between containers and the CPUs on the device: for any container, a CPU that has established a binding relationship with the container is called a home CPU of the container, and a CPU that has not established a binding relationship with the container is called an away CPU of the container.
  • the number of CPUs bound to each container is less than the target number, where the target number refers to the number of CPUs required to meet the business operation requirements of each container. Since each container is bound to fewer CPUs, more containers can be deployed on the same device for use by different businesses, achieving a higher CPU resource oversell ratio and improving resource utilization. Here, oversold means that the actual CPU resources owned are less than the allocated CPU resources.
  • the embodiments of the present application also support the scheduling of business processes among multiple CPUs, enabling efficient business process migration. Specifically, for any home CPU of a container, in response to the running status data of that home CPU meeting the load balancing conditions, business processes are migrated between the home CPU and the away CPUs of the container; this scheduling method avoids the situation where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
  • the embodiments of this application also propose the concept of running priority: assuming that the first business process in the container is migrated, the running priority of the first business process on the moved-in CPU is determined, and the first business process is then run on the moved-in CPU according to that running priority. This priority control can avoid conflicts between different services: because the first business process runs on the moved-in CPU at the determined priority, it does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
  • the scheduling solution provided by the embodiments of the present application can take into account both service performance and resource utilization.
  • embodiments of this application propose a new way to allocate CPU resources to containers, taking into account the performance of the container and the oversold rate of the CPU.
  • business processes run on the home CPU and away CPU with different priorities, thus ensuring business performance.
  • Figure 8 is a flowchart of another container-based process scheduling method according to an exemplary embodiment.
  • the execution subject of this method is a computer device or a virtual machine created on the computer device, where the computer device is also called a physical machine or a physical server.
  • the execution subject of this method is the kernel layer in the system architecture of the container cloud. Taking a physical server as an example, see Figure 8.
  • the method flow includes the following steps.
  • the physical server binds home CPUs to each created container; the home CPUs bound to different containers are different, and the number of home CPUs bound to each container is less than the target number, where the target number refers to the number of CPUs required to meet the business operation requirements of each container.
  • the CPUs bound to container 1 are CPU0-CPU1, that is, the home CPUs of container 1 are CPU0-CPU1; the CPUs bound to container 2 are CPU2-CPU3, that is, the home CPUs of container 2 are CPU2-CPU3;
  • the CPUs bound to container 3 are CPU4-CPU5, that is, the home CPUs of container 3 are CPU4-CPU5;
  • the CPUs bound to container 4 are CPU6-CPU7, that is, the home CPUs of container 4 are CPU6-CPU7.
  • the physical server periodically obtains the running status data of the container's home CPU.
  • the load detection unit periodically obtains the running status data of each CPU on the device.
  • This is just an example using any home CPU of any container.
  • the above running status data includes at least one of load and scheduling delay of processes in the CPU's run queue.
  • Taking running status data that includes the load as an example, in response to the load of the home CPU not meeting the load balancing conditions, no business process scheduling is performed. Not meeting the load balancing conditions here may mean, for example, that the load of the home CPU is lower than a certain load threshold, e.g., 0.6; this is not limited in this application.
  • In response to the running status data of the home CPU meeting the load balancing conditions, the physical server performs business process migration between the home CPU and an away CPU on the device, where the away CPU refers to a CPU on the device that has no binding relationship with the container.
  • The terms first business process, second business process, third business process, fourth business process, first CPU, second CPU, third CPU, fourth CPU, first load threshold, second load threshold, third load threshold, first time threshold, and second time threshold are used only to distinguish different business processes, CPUs, load thresholds, and time thresholds, and do not constitute any other limitation.
  • FIG. 9 is a schematic diagram of a capacity expansion logic according to an exemplary embodiment.
  • CPU1 is a home CPU of container 1, and its running status data meets the load balancing conditions; therefore, some business processes on it need to be expanded to run on CPU6, where CPU6 is an away CPU of the container.
  • step 8031 corresponds to the capacity expansion logic
  • step 8032 corresponds to the capacity reduction logic
  • In the current scheduling cycle, in response to the load of the home CPU being higher than the first load threshold, the first CPU with the lowest load is determined among the away CPUs that have not established a binding relationship with the container, and the first business process running on the home CPU is migrated to the first CPU; or, in the current scheduling cycle, in response to the scheduling delay of the processes in the run queue of the home CPU being greater than the first time threshold, the second CPU with the smallest process scheduling delay is determined among the away CPUs that have not established a binding relationship with the container, and the first business process running on the home CPU is moved to the second CPU.
  • The first CPU and the second CPU may be the same CPU or different CPUs, which is not limited in this application.
  • The scheduling delay of the processes in the run queue of the home CPU may be that of a home process of the home CPU or that of an away process of the home CPU; this application likewise imposes no restrictions here.
  • the above-mentioned first load threshold may be 0.8
  • the above-mentioned first time threshold may be 24 ms, which is not limited in this application.
  • Taking the first load threshold of 0.8 and the first time threshold of 24ms as an example, when the above conditions are met, the dynamic expansion and contraction unit selects the idlest CPU among the away CPUs of the container and sends it an Inter-Processor Interrupt (IPI) that forces load balancing; after receiving the forced-load-balancing IPI, the away CPU performs forced load balancing and directly pulls a business process from the run queue of the home CPU, without waiting for the load balancing cycle to arrive.
  • the load balancing cycle is used to limit the frequency of performing load balancing to avoid performing load balancing too frequently.
  • the first business process to be migrated is the home process located at the end of the run queue of the home CPU.
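  • The expansion trigger of step 8031 might look like the sketch below, assuming the example thresholds above (load 0.8, delay 24ms). send_force_balance_ipi() is a hypothetical stand-in for sending the forced-load-balancing IPI, and for brevity the sketch picks the idlest away CPU in both branches, whereas the second branch described above picks the away CPU with the smallest process scheduling delay.
```c
/* Hedged sketch of the expansion trigger; all names are hypothetical. */
#include <stdio.h>

#define LOAD_HIGH     0.8   /* first load threshold (example value) */
#define DELAY_HIGH_MS 24.0  /* first time threshold (example value) */

struct cpu_stat { double load; double max_delay_ms; };

static void send_force_balance_ipi(int dst_cpu, int src_cpu)
{
    /* Stub: a real kernel would send an IPI telling dst_cpu to pull
     * the tail home process from src_cpu's run queue immediately. */
    printf("cpu%d: force-pull from cpu%d\n", dst_cpu, src_cpu);
}

static int pick_idlest(const struct cpu_stat *s, const int *away, int n)
{
    int best = away[0];
    for (int i = 1; i < n; i++)
        if (s[away[i]].load < s[best].load)
            best = away[i];
    return best;
}

void maybe_expand(int home_cpu, const struct cpu_stat *s,
                  const int *away, int n_away)
{
    if (s[home_cpu].load > LOAD_HIGH ||
        s[home_cpu].max_delay_ms > DELAY_HIGH_MS)
        send_force_balance_ipi(pick_idlest(s, away, n_away), home_cpu);
}
```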
  • In response to the first business process being migrated, the physical server determines the running priority of the first business process on the moved-in CPU, and runs the first business process on the moved-in CPU according to that running priority.
  • If the moved-in CPU is an away CPU, the running priority of the first business process on the moved-in CPU is set to the second running priority; that is, the first business process runs at a lower priority on the moved-in CPU, to avoid affecting the execution of the home processes of that CPU.
  • When the first business process is later moved back, its running priority on the home CPU is set to the first running priority; that is, the first business process runs with a higher priority after returning to its home CPU.
  • the embodiment of the present application also includes another capacity expansion logic, that is, referring to Figure 10, the above steps 803-804 can also be replaced by the following steps 805-806.
  • In the current scheduling cycle, the physical server determines the first CPU with the lowest load among the away CPUs that have not established a binding relationship with the container, determines the third CPU with the highest load among all CPUs on the device, and moves the second business process running on the third CPU to the first CPU.
  • The second business process running on the third CPU is moved to the first CPU in ways including but not limited to the following: after a target duration, the second business process in the run queue of the third CPU is added to the run queue of the first CPU, where the target duration is set according to the load balancing cycle.
  • the target threshold range is 0.6-0.8, which is not limited in this application.
  • When the conditions are met, the dynamic expansion and contraction unit selects the idlest (lowest-load) CPU among the away CPUs of the container and sends that away CPU an IPI interrupt for periodic load balancing. After receiving the periodic-load-balancing IPI, the away CPU shortens the load balancing cycle, for example performing periodic load balancing when half of the load balancing cycle is reached; periodic load balancing finds the busiest CPU among all CPUs and then pulls a business process from that busiest CPU.
  • In this way, a periodic load balancing strategy is adopted to pull business processes from the busiest CPU among all CPUs and migrate them to the idlest CPU, thereby improving resource utilization and the load balance among the CPUs of the entire system.
  • Optionally, the second business process to be migrated is the home process located at the end of the run queue of the third CPU.
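  • For this periodic-balancing variant, the selection of the globally busiest CPU could be as simple as the sketch below (structure and names are illustrative); the selected CPU's tail home process is then migrated to the idlest away CPU.
```c
/* Sketch of the global busiest-CPU selection; names are illustrative. */
struct cpu_stat { double load; double max_delay_ms; };

int pick_busiest(const struct cpu_stat *stats, int n_cpus)
{
    int busiest = 0;
    for (int i = 1; i < n_cpus; i++)
        if (stats[i].load > stats[busiest].load)
            busiest = i;
    return busiest;
}
```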
  • the physical server determines the running priority of the second service process on the first CPU; and runs the second service process on the first CPU according to the running priority of the second service process.
  • This step is the same as the above-mentioned step 804 and will not be described again here.
  • the embodiments of the present application also include shrinking logic.
  • As an example of the shrinking logic, when a business process of a home CPU is running on an away CPU and the running status data of the away CPU meets the load balancing conditions, such as the load increasing or the scheduling delay of the migrated business process becoming too large, the shrink operation is triggered.
  • FIG. 11 is a schematic diagram of a shrinking logic according to an exemplary embodiment.
  • CPU6 is an away CPU of container 1, and its running status data meets the load balancing conditions; for example, the load of CPU6 increases because its home processes are running, so the migrated business process needs to be moved back to its home CPU, CPU1, to run.
  • step 807 is also included.
  • Taking running status data that includes the scheduling delay of processes as an example, in the case that the moved-out CPU is the home CPU and the moved-in CPU is an away CPU that has not established a binding relationship with the container, in the next scheduling cycle, in response to the scheduling delay of the first business process on the moved-in CPU being greater than the third time threshold, the physical server moves the first business process from the moved-in CPU back to the home CPU.
  • the above third time threshold may be 24 ms, which is not limited by this application.
  • When this happens, the dynamic expansion and contraction unit sends an IPI interrupt for forced load balancing to the home CPU; after receiving the forced-load-balancing IPI, the home CPU performs forced load balancing and directly pulls the first business process from the run queue of the away CPU.
  • the first business process will run with a higher priority after returning to the home CPU.
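  • A hedged sketch of this shrink path follows, reusing the forced-balancing idea from the expansion sketch: when the migrated process's delay on the away CPU grows too large, the home CPU is asked to pull it back. The threshold (24ms, matching the third time threshold example above) and all names are illustrative assumptions.
```c
/* Hedged sketch of the shrink trigger; all names are hypothetical. */
#include <stdio.h>

#define DELAY_BACK_MS 24.0  /* third time threshold (example value) */

struct cpu_stat { double load; double max_delay_ms; };

static void send_force_balance_ipi(int dst_cpu, int src_cpu)
{
    /* Stub: a real kernel would send a forced-load-balancing IPI. */
    printf("cpu%d: force-pull from cpu%d\n", dst_cpu, src_cpu);
}

/* Called while the first business process runs on away_cpu after an
 * expansion; home_cpu is a home CPU of its container. */
void maybe_shrink(int away_cpu, int home_cpu, const struct cpu_stat *s)
{
    if (s[away_cpu].max_delay_ms > DELAY_BACK_MS)
        send_force_balance_ipi(home_cpu, away_cpu);
}
```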
  • the embodiment of the present application also includes another type of shrinkage logic.
  • the reduction logic shown in the embodiment of this application also includes:
  • In the next scheduling cycle, in response to the load of the moved-in CPU being higher than the third load threshold, the third CPU with the highest load is determined among all CPUs on the device, the third business process running on the third CPU is migrated to the home CPU, the running priority of the third business process on the home CPU is determined, and the third business process is run on the home CPU according to that running priority; or, in the next scheduling cycle, in response to the scheduling delay of processes being greater than the second time threshold, the fourth CPU with the largest process scheduling delay is determined among all CPUs on the device, the fourth business process on the fourth CPU is moved to the home CPU, the running priority of the fourth business process on the home CPU is determined, and the fourth business process is run on the home CPU according to that running priority.
  • The third CPU and the fourth CPU may be the same CPU or different CPUs, and the third business process and the fourth business process may be the same business process or different business processes; this is not limited in this application.
  • the third business process and the fourth business process to be migrated may be the home process located at the end of the running queue of the corresponding CPU. This application also imposes no restrictions here.
  • the above-mentioned third load threshold may be 0.7
  • the above-mentioned second time threshold may be 18 ms, which is not limited in this application.
  • When the conditions are met, the dynamic expansion and contraction unit sends a periodic-load-balancing IPI interrupt to the home CPU; after receiving it, the home CPU ignores the time control of the load balancing cycle, directly performs periodic load balancing, finds the busiest CPU among all CPUs, and pulls business processes directly from that CPU's run queue.
  • periodic load balancing is performed and the business processes on the busy CPU are moved to the relatively idle CPU, fully ensuring global CPU load balancing.
  • the scheduling scheme provided by the embodiments of this application is applied to the situation of deploying different services on the same device.
  • the scheduling scheme first proposes the concepts of home CPU and away CPU based on the binding relationship between containers and the CPUs on the device: for any container, a CPU that has established a binding relationship with the container is called a home CPU of the container, and a CPU that has not established a binding relationship with the container is called an away CPU of the container.
  • the number of CPUs bound to each container is less than the target number, where the target number refers to the number of CPUs required to meet the business operation requirements of each container. Since each container is bound to fewer CPUs, more containers can be deployed on the same device for use by different businesses, achieving a higher CPU resource oversell ratio and improving resource utilization. Here, oversold means that the actual CPU resources owned are less than the allocated CPU resources.
  • the embodiments of the present application also support the scheduling of business processes among multiple CPUs, enabling efficient business process migration. This scheduling method can avoid the situation where business processes cannot run even if some CPUs are idle, ensuring resource utilization.
  • the embodiment of this application also proposes the concept of running priority; for example, assuming that the first business process in the container is migrated, then the running priority of the first business process on the migrated CPU will be determined, and then According to the running priority of the first service process, the first service process is run on the moved CPU.
  • This priority control method can avoid conflicts between different services. Among them, the first business process runs according to the determined priority on the moved CPU, which will not affect the running of the business process in the container bound to the CPU, ensuring business performance.
  • the scheduling solution provided by the embodiments of the present application can take into account both service performance and resource utilization.
  • the embodiment of this application proposes a new way to allocate CPU resources to the container, taking into account the performance and performance of the container. CPU oversold rate.
  • business processes run on the home CPU and away CPU with different priorities, thus ensuring business performance.
  • regarding step 804, if the CPU to which the first business process migrates is not currently running any of its own home processes, the away processes running on this CPU can temporarily be set to high priority according to certain rules and switched back to low priority once a home process runs.
  • in the case where the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the method provided by the embodiment of the present application also includes: in response to no home process of the migrated-to CPU currently running, and the business to which the first business process belongs having the highest latency sensitivity among the away processes currently running on the migrated-to CPU, temporarily adjusting the running priority of the first business process to the first running priority; and in response to a home process of the migrated-to CPU being in the ready state, adjusting the running priority of the first business process back to the second running priority.
  • latency sensitivity characterizes how sensitive a business is to latency; for example, gaming businesses are highly latency-sensitive.
  • business process migration between the home CPU and the away CPU may also include: within the current scheduling period, in response to the running status data of the home CPU meeting the load-balancing condition and the migrated-from CPU being the home CPU, determining multiple candidate migration targets among the away CPUs that have not established a binding relationship with the container according to the running status data within the current scheduling period; predicting the running status data of the multiple candidate migration targets in subsequent scheduling periods; determining the migrated-to CPU among the multiple candidates according to their running status data within the current scheduling period and the predicted running status data; and performing business process migration between the home CPU and the migrated-to CPU.
  • the prediction can be based on the number of processes in the run queue of each candidate migration target, or on the type of business handled by the container bound to each candidate migration target; this application imposes no restriction here.
  • Figure 13 is a schematic structural diagram of a container-based process scheduling device according to an exemplary embodiment. Referring to Figure 13, the device includes the following modules.
  • the acquisition module 1301 is configured to, for any container, periodically acquire the running status data of the container's home CPU, where the home CPU is a CPU on the device that has established a binding relationship with the container; the number of CPUs bound to the container is less than the target number, which is the number of CPUs required to meet the business operation requirements of the container.
  • the scheduling module 1302 is configured to perform business process migration between the home CPU and an away CPU in response to the running status data of the home CPU meeting the load-balancing condition, where the away CPU is a CPU on the device that has not established a binding relationship with the container.
  • the determination module 1303 is configured to, in response to the migration of the first business process in the container, determine the running priority of the first business process on the migrated-to CPU.
  • the running module 1304 is configured to run the first business process on the migrated-to CPU according to the running priority of the first business process.
  • the scheduling scheme provided by the embodiment of this application first proposes the concepts of home CPU and away CPU based on the binding relationship between containers and CPUs on the device; for any container, a CPU that has established a binding relationship with the container is called the home CPU of the container, and a CPU that has not established a binding relationship with the container is called the away CPU of the container.
  • the number of CPUs bound to each container is less than the target number, where the target number is the number of CPUs required to meet the business operation requirements of each container. Since each container is bound to fewer CPUs, more containers can be deployed on the same device for different businesses, achieving higher CPU resource overselling and improving resource utilization. Overselling here means that the CPU resources actually owned are less than the CPU resources allocated.
  • the embodiment of the present application also supports scheduling business processes across multiple CPUs. Specifically, for any home CPU of the container, in response to the running status data of the home CPU meeting the load-balancing condition, business processes are migrated between the home CPU and the away CPUs of the container; this scheduling approach avoids situations where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
  • the embodiment of this application also proposes the concept of running priority; assuming the first business process in the container is migrated, the running priority of the first business process on the migrated-to CPU is determined, and the first business process is then run on the migrated-to CPU according to that priority.
  • this priority control avoids conflicts between different businesses. Running at the determined priority on the migrated-to CPU, the first business process does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
  • the scheduling solution provided by the embodiments of the present application can balance business performance and resource utilization.
  • the running status data includes the scheduling delay of processes in the run queue of the home CPU; the acquisition module is configured to periodically acquire the scheduling delay of home processes in the home process list of the home CPU, where a home process is a business process in the container, and to periodically acquire the scheduling delay of away processes in the away process list of the home CPU, where an away process is a business process in a container that has not established a binding relationship with the home CPU.
  • the determination module is configured to, in response to the migrated-to CPU being the home CPU, set the running priority of the first business process on the migrated-to CPU to the first running priority; and in response to the migrated-to CPU being an away CPU that has not established a binding relationship with the container, set the running priority of the first business process on the migrated-to CPU to the second running priority, where the first running priority is higher than the second running priority.
  • the running status data includes the load of the home CPU; the scheduling module is configured to, within the current scheduling period, in response to the load of the home CPU being higher than the first load threshold, determine the first CPU with the lowest load among the away CPUs that have not established a binding relationship with the container, and migrate the first business process running on the home CPU to the first CPU.
  • the running status data includes the scheduling delay of processes in the run queue of the home CPU; the scheduling module is configured to, within the current scheduling period, in response to the scheduling delay of processes in the run queue of the home CPU being greater than the first time threshold, determine the second CPU with the smallest process scheduling delay among the away CPUs that have not established a binding relationship with the container, and migrate the first business process running on the home CPU to the second CPU.
  • the running status data includes the load of the home CPU; the scheduling module is configured to, within the current scheduling period, in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, in response to the load of the home CPU being lower than the second load threshold, migrate the first business process from the migrated-to CPU back to the home CPU.
  • the running status data includes the load of the home CPU; the scheduling module is also configured to, within the current scheduling period, in response to the load of the home CPU lying within the target threshold interval, determine the first CPU with the lowest load among the away CPUs that have not established a binding relationship with the container; determine the third CPU with the highest load among all CPUs on the device; and migrate the second business process running on the third CPU to the first CPU;
  • the determination module is also configured to determine the running priority of the second business process on the first CPU;
  • the running module is further configured to run the second business process on the first CPU according to the running priority of the second business process.
  • the running module is configured to, after a target duration, add the second business process located in the run queue of the third CPU to the run queue of the first CPU, where the target duration is set according to the load-balancing period.
  • the running status data includes the load of the home CPU; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the scheduling module is also configured to, within the next scheduling period, in response to the load of the migrated-to CPU being higher than the third load threshold, determine the third CPU with the highest load among all CPUs on the device, and migrate the third business process running on the third CPU to the home CPU;
  • the determination module is also configured to determine the running priority of the third business process on the home CPU;
  • the running module is further configured to run the third business process on the home CPU according to the running priority of the third business process.
  • the running status data includes the scheduling delay of processes in the run queue of the home CPU; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the scheduling module is also configured to, within the next scheduling period, in response to the scheduling delay of processes in the run queue of the migrated-to CPU being greater than the second time threshold, determine the fourth CPU with the largest process scheduling delay among all CPUs on the device, and migrate the fourth business process running on the fourth CPU to the home CPU;
  • the determination module is also configured to determine the running priority of the fourth business process on the home CPU;
  • the running module is further configured to run the fourth business process on the home CPU according to the running priority of the fourth business process.
  • the running status data includes the scheduling delay of processes in the run queue of the home CPU; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the scheduling module is also configured to, within the next scheduling period, in response to the scheduling delay of processes in the run queue of the migrated-to CPU being greater than the third time threshold, migrate the first business process from the migrated-to CPU back to the home CPU.
  • in the case where the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the determination module is further configured to, in response to no home process of the migrated-to CPU currently running, and the business to which the first business process belongs having the highest latency sensitivity among the away processes currently running on the migrated-to CPU, temporarily adjust the running priority of the first business process to the first running priority; and in response to a home process of the migrated-to CPU being in the ready state, adjust the running priority of the first business process back to the second running priority.
  • the scheduling module is further configured to, within the current scheduling period, in response to the running status data of the home CPU meeting the load-balancing condition and the migrated-from CPU being the home CPU, determine multiple candidate migration targets among the away CPUs that have not established a binding relationship with the container according to the running status data within the current scheduling period; predict the running status data of the multiple candidate migration targets in subsequent scheduling periods; determine the migrated-to CPU among the multiple candidates according to their running status data within the current scheduling period and the predicted running status data; and perform business process migration between the home CPU and the migrated-to CPU.
  • when the container-based process scheduling device provided in the above embodiments performs process scheduling, the division into the above functional modules is only an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the container-based process scheduling device provided by the above embodiments and the container-based process scheduling method embodiments belong to the same concept; please refer to the method embodiments for the specific implementation process, which is not described again here.
  • a computer-readable storage medium is also provided, such as a memory including a computer program.
  • the computer program can be executed by a processor in a computer device to complete the container-based process scheduling method in the above embodiments.
  • the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a computer program product includes a computer program stored in a computer-readable storage medium; the processor of a computer device reads the computer program from the computer-readable storage medium and executes it, causing the computer device to perform the above container-based process scheduling method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

This application discloses a container-based process scheduling method, apparatus, device, and storage medium, belonging to the field of cloud computing. The method includes: for any container, periodically acquiring the running status data of the container's home CPU, where the home CPU is a CPU that has established a binding relationship with the container, the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the container's business operation requirements; in response to the running status data of the home CPU meeting a load-balancing condition, migrating business processes between the home CPU and an away CPU, where an away CPU is a CPU that has not established a binding relationship with the container; and in response to a first business process in the container being migrated, determining the running priority of the first business process on the migrated-to CPU and running the first business process according to the determined running priority. This application can balance business performance and resource utilization.

Description

Container-based process scheduling method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 202211068759.4, entitled "Container-based process scheduling method, apparatus, device, and storage medium" and filed on September 2, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud computing, and in particular to a container-based process scheduling method, apparatus, device, and storage medium.
Background
With the evolution of IT (Internet Technology), infrastructure keeps being reinvented, and as virtualization technology matures, cloud computing has entered a new stage. Cloud computing has changed not only enterprises' business architecture but also their operating model, and deploying business on cloud platforms is a development trend for future business operations. For example, a container cloud is a container-based cloud platform: by creating containers on devices, a container cloud can serve businesses by providing containers.
At present, business is often deployed on a container cloud in a mixed (co-located) manner, that is, business processes of multiple businesses are deployed on the same device. To this end, multiple containers are created on the same device for different businesses to use; for example, business processes of different businesses are isolated in different containers. The running of business processes depends on CPU (Central Processing Unit) resources, and when migrating business processes one must also consider whether CPU resources are used reasonably and whether business performance is up to standard. For example, it must not happen that business processes in a container cannot run even though some CPUs are idle, because that leads to low resource utilization. Nor should different businesses conflict with one another: for example, business processes of latency-sensitive businesses (such as gaming) and of latency-insensitive businesses (such as machine-learning tasks) may compete for CPU resources, causing large scheduling delays for the latency-sensitive business processes and thus degrading business performance.
From the above, the scheduling of business processes must balance resource utilization and business performance. How to improve resource utilization without affecting business performance has therefore become a key concern in process scheduling in this field.
Summary
Embodiments of this application provide a container-based process scheduling method, apparatus, device, and storage medium. The technical solutions provided by the embodiments of this application include the following.
In one aspect, a container-based process scheduling method is provided, the method comprising:
for any container, periodically acquiring running status data of the container's home CPU, wherein the home CPU is a CPU on a device that has established a binding relationship with the container, the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the business operation requirements of the container;
in response to the running status data of the home CPU meeting a load-balancing condition, performing business process migration between the home CPU and an away CPU, wherein the away CPU is a CPU on the device that has not established a binding relationship with the container; and
in response to a first business process in the container being migrated, determining a running priority of the first business process on the migrated-to CPU, and running the first business process on the migrated-to CPU according to the running priority of the first business process.
In another aspect, a container-based process scheduling apparatus is provided, the apparatus comprising:
an acquisition module configured to, for any container, periodically acquire running status data of the container's home CPU, wherein the home CPU is a CPU on a device that has established a binding relationship with the container, the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the business operation requirements of the container;
a scheduling module configured to, in response to the running status data of the home CPU meeting a load-balancing condition, perform business process migration between the home CPU and an away CPU, wherein the away CPU is a CPU on the device that has not established a binding relationship with the container;
a determination module configured to, in response to a first business process in the container being migrated, determine a running priority of the first business process on the migrated-to CPU; and
a running module configured to run the first business process on the migrated-to CPU according to the running priority of the first business process.
In another aspect, a computer device is provided, comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the container-based process scheduling method described above.
In another aspect, a computer-readable storage medium is provided, storing a computer program that is loaded and executed by a processor to implement the container-based process scheduling method described above.
In another aspect, a computer program product is provided, comprising a computer program stored in a computer-readable storage medium; a processor of a computer device reads the computer program from the computer-readable storage medium and executes it, causing the computer device to perform the container-based process scheduling method described above.
The scheduling scheme provided by the embodiments of this application first proposes, based on the binding relationship between containers and CPUs on a device, the concepts of home CPU and away CPU: for any container, a CPU that has established a binding relationship with the container is called the container's home CPU, and a CPU that has not is called the container's away CPU. Moreover, the number of CPUs bound to each container is less than the target number, the number of CPUs required to meet each container's business operation requirements. Because each container is bound to fewer CPUs, more containers can be deployed on the same device for different businesses, achieving a higher degree of CPU overselling and improving resource utilization. Overselling here means that the CPU resources actually owned are less than the CPU resources allocated.
In addition, the embodiments of this application support scheduling business processes across multiple CPUs. In detail, for any home CPU of the container, in response to the running status data of that home CPU meeting the load-balancing condition, business processes are migrated between that home CPU and the container's away CPUs. This scheduling approach avoids situations where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
In addition, the embodiments of this application propose the concept of running priority. Suppose the first business process in the container is migrated: the running priority of the first business process on the migrated-to CPU is determined, and the first business process is then run on the migrated-to CPU according to that priority. This priority control avoids conflicts between different businesses. Running at the determined priority on the migrated-to CPU, the first business process does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
In summary, the scheduling scheme provided by the embodiments of this application balances business performance and resource utilization.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a computer device according to an exemplary embodiment;
Figure 2 is a system architecture diagram of a container cloud according to an exemplary embodiment;
Figure 3 is a system architecture diagram of another container cloud according to an exemplary embodiment;
Figure 4 is a schematic diagram of binding between containers and CPUs according to an exemplary embodiment;
Figure 5 is a flowchart of a container-based process scheduling method according to an exemplary embodiment;
Figure 6 is a schematic diagram of load detection according to an exemplary embodiment;
Figure 7 is a schematic diagram of priority control according to an exemplary embodiment;
Figure 8 is a flowchart of another container-based process scheduling method according to an exemplary embodiment;
Figure 9 is a schematic diagram of scale-up logic according to an exemplary embodiment;
Figure 10 is a flowchart of yet another container-based process scheduling method according to an exemplary embodiment;
Figure 11 is a schematic diagram of scale-down logic according to an exemplary embodiment;
Figure 12 is a flowchart of still another container-based process scheduling method according to an exemplary embodiment;
Figure 13 is a schematic structural diagram of a container-based process scheduling apparatus according to an exemplary embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
In this application, the terms "first", "second", and so on are used to distinguish identical or similar items with essentially the same role and function. It should be understood that "first", "second", through "nth" carry no logical or temporal dependency and limit neither quantity nor execution order. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms.
These terms are only used to distinguish one element from another. For example, without departing from the scope of the various examples, a first element could be called a second element, and similarly a second element could be called a first element. The first element and the second element may both be elements and, in some cases, separate and distinct elements.
"At least one" means one or more; for example, at least one element may be any integer number of elements greater than or equal to one, such as one, two, or three. "Multiple" means two or more; for example, multiple elements may be any integer number of elements greater than or equal to two, such as two or three.
It should be noted that the information (including but not limited to user device information and personal user information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The container-based process scheduling scheme provided by the embodiments of this application involves cloud technology. Cloud technology is a hosting technology that unifies a range of resources such as hardware, software, and networks within a wide-area or local-area network to realize the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and so on applied under the cloud computing business model; it can form a resource pool to be used on demand, flexibly and conveniently. Cloud computing technology will become an important backbone. Background services of technical network systems, such as video websites, image websites, and portal sites, require large amounts of computing and storage resources. With the rapid development and application of the Internet industry, every item may eventually have its own identification mark that must be transmitted to a backend system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong backing systems, which can only be realized through cloud computing.
Cloud computing refers to the delivery and usage model of IT infrastructure: obtaining the required resources over the network in an on-demand, easily scalable way. In the broad sense, cloud computing refers to the delivery and usage model of services: obtaining the required services over the network in an on-demand, easily scalable way. Such services may be IT-, software-, or Internet-related, or other services. Cloud computing is the product of the development and fusion of traditional computing and networking technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
Driven by the development of the Internet, real-time data streams, and the diversification of connected devices, and by demands such as search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel and distributed computing, the emergence of cloud computing will conceptually drive revolutionary change in the entire Internet model and in enterprise management models.
Some key terms and abbreviations involved in the embodiments of this application are introduced below.
Container: in Linux, container technology is a process isolation technology; in computational form, it is a lightweight, kernel-level operating-system virtualization technology. A container can isolate processes in an independent environment.
Container cloud: an emerging product form in cloud computing. A container cloud is a container management platform composed of containers, providing great convenience for users of containers. By creating containers on physical or virtual machines, a container cloud can serve businesses by providing containers. Put differently, a container cloud takes the container as the basic unit of resource allocation and scheduling, encapsulates the software runtime environment, and provides developers and system administrators with a platform for building, publishing, and running distributed applications.
Mixed deployment: deploying the processes of multiple businesses on the same device. In some possible implementations, the businesses mentioned here include but are not limited to: gaming, search, information feeds, e-commerce transactions, big data, machine learning, storage, and so on.
Process scheduling: in the usual sense, taking one CPU as an example, process scheduling means dynamically allocating the CPU, according to certain rules, to one of the processes in the run queue so that it executes; put differently, it means selecting a process from the run queue according to certain rules so that it obtains the CPU.
In the embodiments of this application, process scheduling means scheduling processes across different CPUs so that they execute.
cpuset mechanism: in Linux, the basic function of cpuset is to restrict certain processes to run only on certain CPUs of a device. For example, with 4 processes and 4 CPUs on a device, cpuset can make the first and second processes run only on the first and second CPUs. Put differently, cpuset limits the range of CPUs on which a process may run.
Overselling: in the embodiments of this application, overselling means deploying more containers on a device of fixed specification.
In some possible implementations, fixed specification means a fixed number of CPUs. In that case, overselling means that the number of CPUs required by the deployed containers exceeds the number of CPUs the device has. For example, to guarantee each container's business service quality, each container needs 4 CPUs and the device has only 8 CPUs, yet more than two containers are deployed to use these 8 CPUs in order to improve resource utilization.
The implementation environment of the container-based process scheduling scheme provided by the embodiments of this application is introduced below.
In some possible implementations, at the device level, the container-based process scheduling method provided by the embodiments of this application is applied to the computer device shown in Figure 1 or to a virtual machine created on that computer device. That is, the method may be executed by the computer device shown in Figure 1 or by a virtual machine created on it. The computer device is also called a physical machine or physical server in the embodiments of this application.
Figure 1 is a schematic structural diagram of a computer device according to an exemplary embodiment.
Referring to Figure 1, the computer device 100 may vary considerably with configuration or performance and may include one or more central processing units (also called processors) 101 and one or more memories 102, where the memory 102 stores at least one piece of program code that is loaded and executed by the central processing unit 101 to implement the container-based process scheduling method provided by the embodiments of this application. The at least one piece of program code may be called a computer program. Of course, the computer device 100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may include other components for realizing device functions, which are not described here.
In other possible implementations, at the system architecture level, the container-based process scheduling method provided by the embodiments of this application is applied to the kernel layer in the container cloud system architecture shown in Figure 2.
Figure 2 is a system architecture diagram of a container cloud according to an exemplary embodiment.
Referring to Figure 2, the architecture comprises, from bottom to top, a device layer 201, a kernel layer 202, a platform layer 203, and a business layer 204. The device layer 201 includes physical and virtual machines. The kernel layer 202 provides resource management and process scheduling functions; for example, it also provides the cpuset mechanism and the quota mechanism. The container orchestrator of the platform layer 203 (Kubernetes, k8s for short) uses the cpuset and quota mechanisms provided by the kernel layer 202 to generate core-bound or non-core-bound containers, which are then provided to the different businesses of the business layer 204. The business layer 204 comprises the various co-deployed businesses. Note that Figure 2 shows only four businesses, A through D; in practice the business layer 204 may include more or fewer businesses, which this application does not limit.
The cpuset mechanism restricts the business processes in certain containers to run only on fixed CPUs of the device, that is, it limits the range of CPU resources a business process may use. Put differently, the cpuset mechanism allocates CPU resources to the business processes in a container by binding CPUs. Correspondingly, core binding means setting the affinity between a process and CPU cores; once set, the process runs only on the bound CPUs. The quota mechanism, by contrast, does not bind containers to CPUs: the business processes in a container may run on any CPU, but the quota mechanism caps the CPU resources each container may use within a fixed time period.
For example, suppose 8 CPUs are deployed on a device and each container needs 4 CPUs. These 4 CPUs are the number of CPUs required to meet the container's business operation requirements, in other words the amount of CPU resources required to guarantee the container's business service quality. Under the cpuset mechanism, to avoid conflicts between businesses, each container independently binds 4 CPUs, so the 8 CPUs can be allocated to two containers. In the unbound case, the quota mechanism can cap each container's CPU share within a fixed time period at 400%, where the fixed time period is typically 100 ms (milliseconds) and 400% means at most 400 ms of CPU time per 100 ms period, i.e., 4 CPUs. That is, the business processes of a container may run on any 4 of the 8 CPUs. In this case a certain degree of CPU overselling is possible: although the device has only 8 CPUs, they can be allocated to 3 or more containers.
In summary, under the cpuset mechanism there is no conflict between businesses, but the CPU oversell rate is low, and business processes of unbound containers cannot run even when some CPUs are idle, so resource utilization is low. Under the quota mechanism a certain degree of CPU overselling is possible, but because containers are not bound to CPUs, businesses conflict with one another: for example, business processes of latency-sensitive businesses and of latency-insensitive businesses compete for CPU resources, which may cause large running delays for the latency-sensitive business processes and thus degrade business performance.
In the situation where different businesses are deployed on the same device, to balance resource utilization and business performance, the embodiments of this application propose a container-based process scheduling scheme that can deploy more containers on the same device for different businesses to use while binding fewer CPUs to each container, thereby achieving higher CPU overselling without affecting business performance and significantly improving resource utilization.
Figure 3 is a system architecture diagram of another container cloud according to an exemplary embodiment.
Referring to Figure 3, three units are added to the kernel layer 202: a load detection unit 2021, a priority control unit 2022, and a dynamic scaling unit 2023. Correspondingly, at the platform layer 203, a container type is added besides core-bound and non-core-bound containers: the dynamically scalable container (also called an elastic container).
To make the dynamic scaling mentioned above easier to understand, the core-binding logic of the embodiments of this application is introduced first. When multiple businesses are co-deployed on the same device, for any container on the device, a binding relationship is established between the container and some of the device's CPUs, forming the container's home CPUs. The number of CPUs bound to the container is less than the target number, the number of CPUs required to meet each container's business operation requirements; that is, the embodiments of this application bind fewer CPUs to each container. CPUs that have not established a binding relationship with the container are called the container's away CPUs in the embodiments of this application. For example, as shown in Figure 4, suppose 8 CPUs, CPU0-CPU7, are deployed on the device and each container needs 4 CPUs, but the embodiments of this application bind only 2 CPUs to each container. Taking container 1 as an example, container 1 is bound only to CPU0-CPU1, while CPU2-CPU7 are container 1's away CPUs. In addition, the business processes in container 1 are called the home processes of CPU0-CPU1 in the embodiments of this application, and the business processes in containers 2-4 are called the away processes of CPU0-CPU1.
In summary, the load detection unit 2021 acquires the running status data of each CPU on the device. Taking running status data that includes the load and the scheduling delay of processes in the run queue as an example, the load detection unit 2021 detects the load variation of each CPU on the device and the scheduling delay of processes in its run queue. Knowing each CPU's load variation and the scheduling delay of processes in its run queue helps decide whether the business processes in each container can run only on their home CPUs or can expand to away CPUs.
A very important aspect of process scheduling is the control logic for running priority. To achieve a higher oversell rate, the embodiments of this application propose the concepts of home CPU and away CPU. Since an away CPU of container A may be a home CPU of container B, when a business process in container A needs to expand to some away CPU, it must be ensured that the running of that away CPU's home processes is not affected. To this end, the embodiments of this application propose priority control logic. In some possible implementations, the priority control unit 2022 determines the running priority of a migrated business process on the migrated-to CPU, where a business process runs at different priorities on home and away CPUs; for example, its running priority on a home CPU is higher than on an away CPU.
In other possible implementations, scaling includes scale-up and scale-down. Scale-up means that business processes in a container expand from the container's home CPUs to run on the container's away CPUs; scale-down means pulling migrated business processes back to run on the home CPUs. That is, the range of CPUs on which a container's business processes may run can change and is not limited to the bound home CPUs, which is why a container is also called an elastic or dynamically scalable container in the embodiments of this application. Moreover, because scale-up and scale-down are performed dynamically according to the detection results of the load detection unit, the mechanism is called dynamic scaling. Correspondingly, the dynamic scaling unit 2023 controls, based on the detection results of the load detection unit 2021, whether the business processes in each container should run on the home CPUs or expand to run on the away CPUs, or whether a business process should be migrated back from an away CPU of its container to run on a home CPU.
The application scenarios of the container-based process scheduling scheme provided by the embodiments of this application are introduced below.
In some possible implementations, besides the container cloud scenario, the scheme can also be applied to online-online co-deployment, offline-online co-deployment, and cost-optimization scenarios. These scenarios may themselves involve container cloud technology, which this application does not limit.
Here, online refers to online business and offline to offline business. Online services typically run for long periods, have clearly fluctuating resource utilization, and are latency-sensitive, such as information feeds and e-commerce transactions. Offline businesses typically have high resource utilization while running but are generally insensitive to latency, such as machine learning.
For offline-online co-deployment, co-deployment means mixing online and offline businesses on the same physical resources, making full use of the resources through control measures such as resource isolation and scheduling while keeping services stable. Put differently, since the resource utilization of online businesses fluctuates noticeably, the main scenario of co-deployment is to fill the idle resources of online businesses across time periods with offline businesses to reduce costs. Correspondingly, for online-online co-deployment, co-deployment means mixing different online businesses on the same physical resources.
For the container cloud scenario, the scheduling scheme provided by the embodiments of this application can achieve higher resource overselling while guaranteeing business performance. It also provides an additional trustworthy option for allocating resources to each container in container cloud scenarios.
For the online-online co-deployment scenario, because the scheduling scheme provided by the embodiments of this application allocates fewer fixed resources to each container (for example, binds fewer CPUs), more online businesses can be co-deployed on machines of the same capability.
Figure 5 is a flowchart of a container-based process scheduling method according to an exemplary embodiment. As mentioned above, at the device level the method is executed by a computer device or a virtual machine created on it, where the computer device is also called a physical machine or physical server; at the system architecture level, it is executed by the kernel layer of the container cloud architecture. Taking a physical server as an example, referring to Figure 5, the method flow includes the following steps.
501. For any container, the physical server periodically acquires running status data of the container's home CPU, where the home CPU is a CPU on the device that has established a binding relationship with the container; the number of CPUs bound to the container is less than the target number, the number of CPUs required to meet each container's business operation requirements.
This step is executed by the load detection unit provided by the kernel of a CPU deployed on the physical server. The home CPU mentioned in steps 501-503 refers to any one of the container's bound home CPUs. In the embodiments of this application, the load detection unit periodically acquires the running status data of every CPU on the device; an arbitrary home CPU of an arbitrary container is used here only as an example.
A CPU's running status data reflects how busy the CPU is. In some possible implementations, the running status data includes at least one of the load and the scheduling delay of processes in the CPU's run queue. Taking any home CPU of any container as an example, periodically acquiring the running status data of the container's home CPU includes at least one of the following:
periodically acquiring the load of the home CPU;
periodically acquiring the scheduling delay of processes in the run queue of the home CPU.
Scheduling delay, also called scheduling latency, is essentially the time interval that guarantees every runnable process runs at least once; put differently, it is the time from when a process becomes runnable (enters a CPU's run queue) to when it actually executes (obtains the right to execute on the CPU).
Taking running status data that includes load as an example, as shown in Figure 6, the embodiments of this application use the periodic tick on each CPU to update load information and decide, based on the updated load, whether a business process should run on the home CPU or expand to run on an away CPU. Figure 6 shows that business processes in container 1 may need to expand from the original CPU0 to other CPUs according to the updated load. Note that Figure 6 also treats the other home CPU (CPU1) as an expansion target; in practice only the away CPUs CPU2-CPU7 could be used as expansion targets, which this application does not limit.
A tick is the operating system's relative time unit, also called the OS time base, derived from the timer's periodic interrupt (output pulse); one interrupt is one tick, also called a "clock tick". The correspondence between a tick and time can be set when the timer is initialized, that is, the time length of a tick is adjustable. Kernels generally provide mechanisms to change the tick length for particular situations; for example, the OS can produce a tick every 5 ms or every 10 ms. The tick size determines the time granularity of the operating system.
Note that, to perceive each CPU's load changes keenly while accommodating short-term load fluctuations, load statistics are not computed on every tick but possibly once every several ticks. In some possible implementations, the load of each CPU is computed with the formula only after a scheduling period has elapsed, where a scheduling period is the time period in which all runnable processes execute once on the CPU. For example, the scheduling period is 24 ms, which this application does not limit.
Based on the above, in the embodiments of this application, periodically acquiring the load of the home CPU includes but is not limited to the following:
within the current scheduling period, acquiring the home CPU's average load over a fixed duration, which indicates the average number of processes in the home CPU's run queue; acquiring the home CPU's average load in the previous scheduling period; and obtaining the home CPU's average load in the current scheduling period from the average load over the fixed duration and the average load in the previous scheduling period.
The average load over a fixed duration is rq.loadavg, computed statistically by the kernel, where rq denotes the run queue. For example, the fixed duration is 1 minute, 5 minutes, or 15 minutes, which this application does not limit.
In other possible implementations, the embodiments of this application obtain the home CPU's average load in the current scheduling period using the following formula:
loadavg = α*(d − rq.loadavg) + β*rq.loadavg
Here, d denotes the home CPU's average load in the previous scheduling period, rq.loadavg denotes the home CPU's average load over the fixed duration, α = 0.8, β = 0.2, and loadavg denotes the home CPU's average load in the current scheduling period.
In the embodiments of this application, the above formula can be used to compute each CPU's load in any scheduling period.
The above describes how each CPU's load is computed periodically; how it is saved is described next. In some possible implementations, while each business process runs, it is checked whether the loads of its corresponding home and away CPUs should be updated; if the time since the last update exceeds a specific duration (the interval of one update period), reading and saving the loads of its corresponding home and away CPUs is triggered. For example, the update period equals the scheduling period, both 24 ms, which this application does not limit.
In other possible implementations, periodically acquiring the running status data of a home CPU that has established a binding relationship with the container includes but is not limited to the following:
periodically acquiring the scheduling delay of home processes in the home CPU's home process list, where a home process here is a business process in the container;
periodically acquiring the scheduling delay of away processes in the home CPU's away process list, where an away process here is a business process in a container that has not established a binding relationship with the home CPU.
Put differently, each CPU holds a home process list and an away process list, and the embodiments of this application periodically compute scheduling-delay statistics for both lists. For example, the statistics interval may coincide with the scheduling period or with the tick, which this application does not limit. In this way, changes in each process's scheduling delay can be perceived keenly.
In other possible implementations, the number of CPUs each container can bind and the number of CPUs it can expand to are dynamically adjustable and can take different values according to the actual situation, which this application does not limit. For example, the ratio of expansion (number of expandable CPUs) to contraction (number of bound CPUs) can be controlled through sysctl parameters, which this application does not limit.
502. In response to the running status data of the home CPU meeting the load-balancing condition, the physical server migrates business processes between the home CPU and the device's away CPUs, where an away CPU is a CPU on the device that has not established a binding relationship with the container.
This step is executed by the dynamic scaling unit provided by the kernel of a CPU deployed on the physical server.
Business process migration includes scale-up logic and scale-down logic. That is, for any container, in response to the running status data of the container's home and away CPUs meeting the load-balancing condition, scale-up and scale-down are triggered, and business processes must then be able to expand quickly to run on away CPUs or be pulled back to run on home CPUs. For example, load-balancing-based scale-up and scale-down logic includes but is not limited to the following cases.
Case 1: the running status data of the container's home CPU meets the load-balancing condition, for example the load rises or the scheduling delay of business processes becomes too large, and some business processes need to expand to run on away CPUs: the scale-up logic.
Case 2: when a business process runs on an away CPU, scale-down occurs in two situations. One is that the away CPU's running status data meets the load-balancing condition, for example the load rises or the migrated business process's scheduling delay becomes too large; the other may be that the load on the home CPU has stayed relatively low for some time, so the business process should be migrated back to run on the home CPU.
For a more detailed description of the scale-up and scale-down logic, see the embodiments below.
503. In response to the first business process in the container being migrated, the physical server determines the running priority of the first business process on the migrated-to CPU and runs the first business process on the migrated-to CPU according to that priority.
This step is executed by the priority control unit provided by the kernel of a CPU deployed on the physical server.
In some possible implementations, a business process's running priority on its home CPU is higher than on its away CPUs. Correspondingly, determining the running priority of the first business process on the migrated-to CPU includes but is not limited to the following: in response to the migrated-to CPU being the home CPU, setting the first business process's running priority on the migrated-to CPU to the first running priority; in response to the migrated-to CPU being an away CPU that has not established a binding relationship with the container, setting the first business process's running priority on the migrated-to CPU to the second running priority, where the first running priority is higher than the second running priority.
For example, Figure 7 shows the running-priority control for a business process of container 1 expanding from a home CPU (CPU0-CPU1) to an away CPU (for example CPU4). Referring to Figure 7, 8 CPUs are deployed on the device; container 1's home CPUs are CPU0-CPU1 and its away CPUs are CPU2-CPU7. Business processes of container 1 running on CPU0-CPU1 are set to high running priority. Suppose CPU1's running status data meets the load-balancing condition and the scale-up condition holds; business processes of container 1 are then expanded to run on other CPUs, that is, added to those CPUs' run queues. Figure 7 shows a business process of container 1 expanded onto CPU4 (step 1 in Figure 7), where the business process is set to low running priority. When a home process of CPU4 wakes up and needs CPU resources, that home process obtains them at high running priority (step 2 in Figure 7). When CPU4's load rises or the scheduling delay of the low-priority business process becomes too large, meeting the scale-down condition, the previously migrated business process is migrated back to CPU1 (step 3 in Figure 7), and its running priority is set back to high.
In summary, the priority-setting strategy may be: when a business process is enqueued, check whether the enqueuing CPU is its home CPU; if so, set the business process's running priority to high. Or, when a business process is enqueued, check whether the enqueuing CPU is its away CPU; if so, set the business process's running priority to low. In this way, any business process of a container runs at a higher priority on its home CPUs than on its away CPUs, so the business performance of the migrated-to CPU's own business is not affected by the incoming business process, balancing business performance and resource utilization.
The scheduling scheme provided by the embodiments of this application applies to deploying different businesses on the same device. It first proposes, based on the binding relationship between containers and CPUs on the device, the concepts of home CPU and away CPU: for any container, a CPU that has established a binding relationship with the container is called the container's home CPU, and a CPU that has not is called the container's away CPU. Moreover, the number of CPUs bound to each container is less than the target number, the number of CPUs required to meet each container's business operation requirements. Because each container is bound to fewer CPUs, more containers can be deployed on the same device for different businesses, achieving higher CPU overselling and improving resource utilization. Overselling here means that the CPU resources actually owned are less than the CPU resources allocated.
In addition, the embodiments of this application support scheduling business processes across multiple CPUs, enabling efficient business process migration. In detail, for any home CPU of the container, in response to that home CPU's running status data meeting the load-balancing condition, business processes are migrated between it and the container's away CPUs; this scheduling approach avoids situations where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
In addition, the embodiments of this application propose the concept of running priority: if the first business process in the container is migrated, its running priority on the migrated-to CPU is determined and it is run there at that priority, a priority control that avoids conflicts between different businesses. Running at the determined priority on the migrated-to CPU, the first business process does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
In summary, the scheduling scheme provided by the embodiments of this application balances business performance and resource utilization. For example, for the container cloud scenario, the embodiments of this application propose a new way to allocate CPU resources to containers that balances container performance against the CPU oversell rate. In addition, business processes run at different priorities on home and away CPUs, guaranteeing business performance.
Figure 8 is a flowchart of another container-based process scheduling method according to an exemplary embodiment. As before, at the device level the method is executed by a computer device or a virtual machine created on it, where the computer device is also called a physical machine or physical server; at the system architecture level, it is executed by the kernel layer of the container cloud architecture. Taking a physical server as an example, referring to Figure 8, the method flow includes the following steps.
801. The physical server binds home CPUs for each created container, where different containers bind different home CPUs and the number of home CPUs bound to each container is less than the target number, the number of CPUs required to meet each container's business operation requirements.
Different containers binding different home CPUs means each container binds different CPUs on the device. As shown in Figure 4, container 1 binds CPU0-CPU1, i.e., container 1's home CPUs are CPU0-CPU1; container 2 binds CPU2-CPU3, i.e., container 2's home CPUs are CPU2-CPU3; container 3 binds CPU4-CPU5, i.e., container 3's home CPUs are CPU4-CPU5; and container 4 binds CPU6-CPU7, i.e., container 4's home CPUs are CPU6-CPU7.
802. For any container, the physical server periodically acquires the running status data of the container's home CPU.
In the embodiments of this application, the load detection unit periodically acquires the running status data of every CPU on the device; an arbitrary home CPU of an arbitrary container is used here only as an example.
In some possible implementations, the running status data includes at least one of the load and the scheduling delay of processes in the CPU's run queue. Taking running status data that includes load as an example, in response to the home CPU's load not meeting the load-balancing condition, no business process scheduling is performed. For example, not meeting the load-balancing condition may mean that the home CPU's load is below some load threshold, for example 0.6, which this application does not limit.
803. In response to the home CPU's running status data meeting the load-balancing condition, the physical server migrates business processes between the home CPU and the device's away CPUs, where an away CPU is a CPU on the device that has not established a binding relationship with the container.
Note that the terms first, second, third, and fourth business process, first, second, third, and fourth CPU, first, second, and third load threshold, and first and second time threshold appearing below serve only to distinguish different business processes, CPUs, load thresholds, and time thresholds, and impose no other limitation.
Taking the scale-up logic as an example, in response to the home CPU's running status data meeting the load-balancing condition, for example the load rising or the scheduling delay of business processes becoming too large, some business processes need to expand to run on away CPUs. Figure 9 is a schematic diagram of scale-up logic according to an exemplary embodiment. In Figure 9, CPU1, a home CPU of container 1, has running status data meeting the load-balancing condition, so some of its business processes need to expand to run on CPU6, an away CPU of the container.
In the embodiments of this application, in response to the home CPU's running status data meeting the load-balancing condition, migrating business processes between the home CPU and the device's away CPUs includes but is not limited to the following cases, where step 8031 corresponds to the scale-up logic and step 8032 to the scale-down logic.
8031. Within the current scheduling period, in response to the home CPU's load being above the first load threshold, determine the first CPU with the lowest load among the away CPUs not bound to the container and migrate the first business process running on the home CPU to the first CPU; or, within the current scheduling period, in response to the scheduling delay of processes in the home CPU's run queue exceeding the first time threshold, determine the second CPU with the smallest process scheduling delay among the away CPUs not bound to the container and migrate the first business process running on the home CPU to the second CPU.
Note that the first CPU and the second CPU may be the same CPU or different CPUs, which this application does not limit. In addition, the scheduling delay of processes in the home CPU's run queue may refer to the home CPU's home processes or to its away processes, which this application likewise does not limit.
For example, the first load threshold may be 0.8 and the first time threshold 24 ms, which this application does not limit. Taking a first load threshold of 0.8 and a first time threshold of 24 ms, suppose the home CPU's load exceeds 0.8 or its process scheduling delay exceeds 24 ms; the dynamic scaling unit then selects the idlest CPU among the container's away CPUs and sends that away CPU a forced load-balancing inter-processor interrupt (IPI). On receiving the forced load-balancing IPI, the away CPU performs forced load balancing and pulls business processes directly from the home CPU's run queue without waiting for the load-balancing period to arrive; the load-balancing period limits how frequently load balancing executes, so as to avoid balancing too often. In this way, when a home CPU's load is too high or the scheduling delay of processes in its run queue is too large, forced load balancing migrates business processes running on that home CPU to the away CPU with the lowest load or the smallest scheduling delay, fully ensuring that business processes on a busy home CPU execute in time.
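The trigger side of this forced balancing can be sketched as follows. This is our illustration, with the example thresholds 0.8 and 24 ms taken from the text; the IPI is stood in for by a flag on the target CPU.

    /* Sketch of the forced-balancing trigger: when a home CPU's load or
     * run-queue delay crosses its threshold, pick the idlest away CPU and
     * signal it to pull a task immediately (flag in place of a real IPI). */
    #include <stdbool.h>
    #include <stdint.h>

    #define NCPUS       8
    #define LOAD_HI     0.8
    #define DELAY_HI_NS (24ULL * 1000 * 1000)   /* 24 ms */

    struct cpu_state {
        double   load;
        uint64_t max_delay_ns;
        bool     force_pull;    /* stands in for the forced-balancing IPI */
    };

    static int idlest_away(const struct cpu_state cpu[], uint64_t away_mask)
    {
        int best = -1;
        for (int i = 0; i < NCPUS; i++) {
            if (!((away_mask >> i) & 1))
                continue;
            if (best < 0 || cpu[i].load < cpu[best].load)
                best = i;
        }
        return best;
    }

    /* Evaluated once per scheduling period for a container's home CPU. */
    static void maybe_expand(struct cpu_state cpu[], int home,
                             uint64_t away_mask)
    {
        if (cpu[home].load > LOAD_HI || cpu[home].max_delay_ns > DELAY_HI_NS) {
            int target = idlest_away(cpu, away_mask);
            if (target >= 0)
                cpu[target].force_pull = true; /* target pulls from 'home' */
        }
    }

    int main(void)
    {
        struct cpu_state cpu[NCPUS] = {
            [1] = { .load = 0.9 },              /* overloaded home CPU */
            [2] = { .load = 0.5 }, [3] = { .load = 0.6 },
            [4] = { .load = 0.7 }, [5] = { .load = 0.8 },
            [6] = { .load = 0.1 }, [7] = { .load = 0.9 },
        };
        maybe_expand(cpu, 1, 0xFC); /* home CPU1; away CPUs CPU2-CPU7 */
        return cpu[6].force_pull ? 0 : 1;
    }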
In some possible implementations, the migrated first business process is the home process located at the tail of the home CPU's run queue.
8032. Within the current scheduling period, in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, in response to the home CPU's load being below the second load threshold, migrate the first business process back from the migrated-to CPU to the home CPU.
In this way, when the home CPU's load is low, the first business process is migrated back to the home CPU, so it can again run at higher priority on its home CPU, fully guaranteeing business performance.
804. In response to the first business process in the container being migrated, the physical server determines the running priority of the first business process on the migrated-to CPU and runs the first business process on the migrated-to CPU according to that priority.
In the embodiments of this application, in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, the first business process's running priority on the migrated-to CPU is set to the second running priority, that is, the first business process runs at lower priority on the migrated-to CPU so as not to affect the execution of that CPU's home processes.
In the case where the migrated-to CPU is the home CPU and the migrated-from CPU is an away CPU not bound to the container, the first business process's running priority on the migrated-to CPU is set to the first running priority, that is, after returning to its home CPU, the first business process runs at higher priority.
In other possible implementations, the embodiments of this application include another scale-up logic; referring to Figure 10, steps 803-804 above may also be replaced by steps 805-806 below.
805. Where the running status data includes load: within the current scheduling period, in response to the home CPU's load lying within the target threshold interval, the physical server determines the first CPU with the lowest load among the away CPUs not bound to the container, determines the third CPU with the highest load among all CPUs on the device, and migrates the second business process running on the third CPU to the first CPU.
In some possible implementations, migrating the second business process running on the third CPU to the first CPU includes but is not limited to: after a target duration, adding the second business process located in the third CPU's run queue to the first CPU's run queue, where the target duration is set according to the load-balancing period.
For example, the target threshold interval is 0.6-0.8, which this application does not limit.
Taking a target threshold interval of 0.6-0.8 as an example, suppose the home CPU's load exceeds 0.6 but is below 0.8; the dynamic scaling unit then selects the idlest, that is lowest-load, CPU among the container's away CPUs and sends that away CPU a periodic load-balancing IPI. On receiving the periodic load-balancing IPI, the away CPU shortens its load-balancing period, for example performing periodic balancing once half the load-balancing period has elapsed. For example, periodic load balancing searches all CPUs for the busiest one and then pulls business processes from that busiest CPU. In this way, with the periodic load-balancing strategy, business processes are pulled from the globally busiest CPU and migrated to the idlest CPU, improving resource utilization and improving the load balance across all CPUs of the system.
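Compared with forced balancing, the difference is only in urgency: the pull happens at a shortened balancing interval and the source is the globally busiest CPU. The following is a sketch of this variant, our illustration, with the halving of the period taken from the example above.

    /* Sketch of periodic balancing in the 0.6-0.8 load band: the chosen
     * idle away CPU brings its next balancing point forward to half the
     * normal period and then pulls from the busiest CPU in the system. */
    #include <stdint.h>

    #define NCPUS 8

    struct cpu_state {
        double   load;
        uint64_t next_balance_ns;  /* when this CPU next runs balancing */
    };

    static int busiest_cpu(const struct cpu_state cpu[])
    {
        int best = 0;
        for (int i = 1; i < NCPUS; i++)
            if (cpu[i].load > cpu[best].load)
                best = i;
        return best;
    }

    /* Called on receipt of the periodic load-balancing IPI. */
    static void shorten_balance_period(struct cpu_state *c, uint64_t now_ns,
                                       uint64_t period_ns)
    {
        c->next_balance_ns = now_ns + period_ns / 2;
    }

    int main(void)
    {
        struct cpu_state cpu[NCPUS] = { { 0.2 }, { 0.9 }, { 0.4 } };
        shorten_balance_period(&cpu[0], 0, 24ULL * 1000 * 1000);
        return busiest_cpu(cpu) == 1 ? 0 : 1;  /* CPU1 is pulled from */
    }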
In other possible implementations, the migrated second business process is the home process located at the tail of that CPU's run queue.
806. The physical server determines the running priority of the second business process on the first CPU and runs the second business process on the first CPU according to that priority.
This step is analogous to step 804 above and is not repeated here.
In other possible implementations, the embodiments of this application include scale-down logic. For the scale-down logic, for example, when a business process of the home CPU runs on an away CPU, a scale-down operation is triggered in response to the away CPU's running status data meeting the load-balancing condition, for example the load rising or the migrated business process's scheduling delay becoming too large. Figure 11 is a schematic diagram of scale-down logic according to an exemplary embodiment. In Figure 11, CPU6, an away CPU of container 1, has running status data meeting the load-balancing condition, for example its load rises because it is running its own home processes, so the business process that was migrated over must be migrated back to run on its home CPU, CPU1.
Referring to Figure 12, step 804 is followed by step 807 below.
807. Where the running status data includes process scheduling delay: in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, within the next scheduling period, in response to the scheduling delay of processes in the migrated-to CPU's run queue exceeding the third time threshold, the physical server migrates the first business process back from the migrated-to CPU to the home CPU.
For example, the third time threshold may be 24 ms, which this application does not limit.
Taking a third time threshold of 24 ms, suppose the first business process's scheduling delay on the away CPU exceeds 24 ms; the dynamic scaling unit then sends a forced load-balancing IPI to the home CPU. On receiving the forced load-balancing IPI, the home CPU performs forced load balancing and pulls the first business process directly from the away CPU's run queue; after returning to the home CPU, the first business process runs at higher priority. In this way, when the scheduling delay of processes in the away CPU's run queue becomes large, the first business process is migrated back to the home CPU, ensuring its timely execution.
In other possible implementations, the embodiments of this application include another scale-down logic.
In the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, the scale-down logic shown in the embodiments of this application also includes:
within the next scheduling period, in response to the migrated-to CPU's load being above the third load threshold, determining the third CPU with the highest load among all CPUs on the device; migrating the third business process running on the third CPU to the home CPU; determining the third business process's running priority on the home CPU; and running the third business process on the home CPU according to that priority; or,
within the next scheduling period, in response to the scheduling delay of processes in the migrated-to CPU's run queue exceeding the second time threshold, determining the fourth CPU with the largest process scheduling delay among all CPUs on the device; migrating the fourth business process running on the fourth CPU to the home CPU; determining the fourth business process's running priority on the home CPU; and running the fourth business process on the home CPU according to that priority.
In other possible implementations, the third CPU and the fourth CPU may be the same or different CPUs; correspondingly, the third and fourth business processes may be the same or different business processes, which this application does not limit. In addition, the migrated third and fourth business processes may be the home processes located at the tails of the corresponding CPUs' run queues, which this application likewise does not limit.
For example, the third load threshold may be 0.7 and the second time threshold 18 ms, which this application does not limit. Taking a third load threshold of 0.7 and a second time threshold of 18 ms, suppose the away CPU's load exceeds 0.7 or its process scheduling delay exceeds 18 ms; the dynamic scaling unit then sends a periodic load-balancing IPI to the home CPU. On receiving the periodic load-balancing IPI, the home CPU ignores the timing control of the load-balancing period and performs periodic load balancing immediately, searching all CPUs for the busiest one and pulling business processes directly from that CPU's run queue. In this way, when an away CPU is too busy, periodic load balancing migrates business processes from busy CPUs to relatively idle ones, fully ensuring global CPU load balance.
The scheduling scheme provided by the embodiments of this application applies to deploying different businesses on the same device. It first proposes, based on the binding relationship between containers and CPUs on the device, the concepts of home CPU and away CPU: for any container, a CPU that has established a binding relationship with the container is called the container's home CPU, and a CPU that has not is called the container's away CPU. Moreover, the number of CPUs bound to each container is less than the target number, the number of CPUs required to meet each container's business operation requirements. Because each container is bound to fewer CPUs, more containers can be deployed on the same device for different businesses, achieving higher CPU overselling and improving resource utilization. Overselling here means that the CPU resources actually owned are less than the CPU resources allocated.
In addition, the embodiments of this application support scheduling business processes across multiple CPUs, enabling efficient business process migration. This scheduling approach avoids situations where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
In addition, the embodiments of this application propose the concept of running priority; for example, if the first business process in the container is migrated, its running priority on the migrated-to CPU is determined and it is run there at that priority, a priority control that avoids conflicts between different businesses. Running at the determined priority on the migrated-to CPU, the first business process does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
In summary, the scheduling scheme provided by the embodiments of this application balances business performance and resource utilization. For example, for the container cloud scenario, the embodiments of this application propose a new way to allocate CPU resources to containers that balances container performance against the CPU oversell rate. In addition, business processes run at different priorities on home and away CPUs, guaranteeing business performance.
In other possible implementations, regarding step 804 above, if the CPU to which the first business process migrates is not currently running any of its own home processes, the away processes running on that CPU can temporarily be set to high priority according to certain rules and switched back to low priority once a home process runs. In detail, in the case where the migrated-to CPU is an away CPU not bound to the container, the method provided by the embodiments of this application further includes: in response to no home process of the migrated-to CPU currently running, and the business to which the first business process belongs having the highest latency sensitivity among the away processes currently running on the migrated-to CPU, temporarily adjusting the first business process's running priority to the first running priority; and in response to a home process of the migrated-to CPU being in the ready state, adjusting the first business process's running priority back to the second running priority. Latency sensitivity characterizes how sensitive a business is to latency; gaming businesses, for example, are highly latency-sensitive. In this way, if the migrated-to CPU is not currently running its own home processes, away processes on it can temporarily be set to high priority according to certain rules and switched back to low priority once a home process runs, making full use of the CPU's processing resources and improving the execution efficiency of business processes as much as possible.
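A sketch of this temporary boost follows; it is our illustration, and the latency-sensitivity field together with its comparison is an assumption about how "most sensitive" would be tracked.

    /* Sketch: on an away CPU with no home process running, the most
     * latency-sensitive away task is temporarily boosted to high priority
     * and demoted again once a home process becomes ready. */
    #include <stdbool.h>
    #include <stddef.h>

    enum run_prio { PRIO_HIGH, PRIO_LOW };

    struct task {
        enum run_prio prio;
        int           latency_sensitivity;   /* higher = more sensitive */
        struct task  *next;
    };

    static struct task *most_sensitive(struct task *away_list)
    {
        struct task *best = NULL;
        for (struct task *t = away_list; t; t = t->next)
            if (!best || t->latency_sensitivity > best->latency_sensitivity)
                best = t;
        return best;
    }

    static void boost_if_home_idle(struct task *away_list, bool home_running)
    {
        struct task *t = most_sensitive(away_list);
        if (!home_running && t)
            t->prio = PRIO_HIGH;              /* temporary boost */
    }

    static void demote_on_home_ready(struct task *boosted)
    {
        boosted->prio = PRIO_LOW;             /* back to away priority */
    }

    int main(void)
    {
        struct task t = { PRIO_LOW, 9, NULL };
        boost_if_home_idle(&t, false);        /* no home process running */
        demote_on_home_ready(&t);             /* home process became ready */
        return t.prio == PRIO_LOW ? 0 : 1;
    }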
In other possible implementations, migrating business processes between the home CPU and away CPUs in response to the home CPU's running status data meeting the load-balancing condition may also include: within the current scheduling period, in response to the home CPU's running status data meeting the load-balancing condition and the migrated-from CPU being the home CPU, determining multiple candidate migration targets among the away CPUs not bound to the container according to the running status data within the current scheduling period; predicting the running status data of the multiple candidates over several subsequent scheduling periods; determining the migrated-to CPU among the multiple candidates according to their running status data within the current scheduling period and the predicted running status data; and migrating business processes between the home CPU and the migrated-to CPU. For example, the prediction may be based on the number of processes in each candidate's run queue, or on the type of business handled by the container bound to each candidate, which this application does not limit. In this way, with multiple CPUs treated as candidate targets and their running status over the next few scheduling periods estimated, load balancing is completed accordingly, avoiding overly frequent load-balancing operations and reducing processing overhead.
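The prediction method is left open here; run-queue length and the business type of the bound container are both named as possible bases. The sketch below, our illustration, scores each candidate by its current load plus a predicted increment derived from run-queue length, a heuristic we assume for the example, and picks the minimum.

    /* Sketch: choose the migrated-to CPU among candidate away CPUs by
     * current load plus a predicted term; the linear run-queue-length
     * heuristic is an assumption, as the text leaves the method open. */
    #define NCAND 4

    struct candidate {
        int    cpu;
        double load;        /* load in the current scheduling period */
        int    nr_running;  /* run-queue length, used for look-ahead */
    };

    /* Predicted extra load over the next periods, assumed proportional
     * to the number of queued tasks. */
    static double predicted_load(const struct candidate *c)
    {
        return 0.05 * c->nr_running;
    }

    static int pick_target(const struct candidate cand[], int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++) {
            double s_i = cand[i].load    + predicted_load(&cand[i]);
            double s_b = cand[best].load + predicted_load(&cand[best]);
            if (s_i < s_b)
                best = i;
        }
        return cand[best].cpu;
    }

    int main(void)
    {
        struct candidate cand[NCAND] = {
            { 2, 0.50, 1 }, { 3, 0.30, 6 }, { 5, 0.35, 1 }, { 7, 0.45, 0 },
        };
        /* CPU5 wins: 0.35 + 0.05 beats CPU3's 0.30 + 0.30; the chosen
         * CPU id is returned as the exit code for illustration. */
        return pick_target(cand, NCAND);
    }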
Figure 13 is a schematic structural diagram of a container-based process scheduling apparatus according to an exemplary embodiment. Referring to Figure 13, the apparatus includes the following modules.
The acquisition module 1301 is configured to, for any container, periodically acquire running status data of the container's home CPU, where the home CPU is a CPU on the device that has established a binding relationship with the container; the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the business operation requirements of the container.
The scheduling module 1302 is configured to, in response to the running status data of the home CPU meeting the load-balancing condition, perform business process migration between the home CPU and an away CPU, where the away CPU is a CPU on the device that has not established a binding relationship with the container.
The determination module 1303 is configured to, in response to the first business process in the container being migrated, determine the running priority of the first business process on the migrated-to CPU.
The running module 1304 is configured to run the first business process on the migrated-to CPU according to the running priority of the first business process.
The scheduling scheme provided by the embodiments of this application first proposes, based on the binding relationship between containers and CPUs on the device, the concepts of home CPU and away CPU: for any container, a CPU that has established a binding relationship with the container is called the container's home CPU, and a CPU that has not is called the container's away CPU. Moreover, the number of CPUs bound to each container is less than the target number, the number of CPUs required to meet each container's business operation requirements. Because each container is bound to fewer CPUs, more containers can be deployed on the same device for different businesses, achieving higher CPU overselling and improving resource utilization. Overselling here means that the CPU resources actually owned are less than the CPU resources allocated.
In addition, the embodiments of this application support scheduling business processes across multiple CPUs. In detail, for any home CPU of the container, in response to that home CPU's running status data meeting the load-balancing condition, business processes are migrated between it and the container's away CPUs; this scheduling approach avoids situations where business processes cannot run even though some CPUs are idle, ensuring resource utilization.
In addition, the embodiments of this application propose the concept of running priority: if the first business process in the container is migrated, its running priority on the migrated-to CPU is determined and it is run there at that priority, a priority control that avoids conflicts between different businesses. Running at the determined priority on the migrated-to CPU, the first business process does not affect the running of the business processes of the container bound to that CPU, ensuring business performance.
In summary, the scheduling scheme provided by the embodiments of this application balances business performance and resource utilization.
In some possible implementations, the running status data includes the scheduling delay of processes in the home CPU's run queue; the acquisition module is configured to periodically acquire the scheduling delay of home processes in the home CPU's home process list, where a home process is a business process in the container, and to periodically acquire the scheduling delay of away processes in the home CPU's away process list, where an away process is a business process in a container that has not established a binding relationship with the home CPU.
In some possible implementations, the determination module is configured to: in response to the migrated-to CPU being the home CPU, set the first business process's running priority on the migrated-to CPU to the first running priority; and in response to the migrated-to CPU being an away CPU that has not established a binding relationship with the container, set the first business process's running priority on the migrated-to CPU to the second running priority, where the first running priority is higher than the second running priority.
In some possible implementations, the running status data includes the home CPU's load; the scheduling module is configured to, within the current scheduling period, in response to the home CPU's load being above the first load threshold, determine the first CPU with the lowest load among the away CPUs not bound to the container and migrate the first business process running on the home CPU to the first CPU.
In some possible implementations, the running status data includes the scheduling delay of processes in the home CPU's run queue; the scheduling module is configured to, within the current scheduling period, in response to that scheduling delay exceeding the first time threshold, determine the second CPU with the smallest process scheduling delay among the away CPUs not bound to the container and migrate the first business process running on the home CPU to the second CPU.
In some possible implementations, the running status data includes the home CPU's load; the scheduling module is configured to, within the current scheduling period, in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, in response to the home CPU's load being below the second load threshold, migrate the first business process back from the migrated-to CPU to the home CPU.
In some possible implementations, the running status data includes the home CPU's load; the scheduling module is further configured to, within the current scheduling period, in response to the home CPU's load lying within the target threshold interval, determine the first CPU with the lowest load among the away CPUs not bound to the container, determine the third CPU with the highest load among all CPUs on the device, and migrate the second business process running on the third CPU to the first CPU;
the determination module is further configured to determine the second business process's running priority on the first CPU;
the running module is further configured to run the second business process on the first CPU according to that priority.
In some possible implementations, the running module is configured to, after a target duration, add the second business process located in the third CPU's run queue to the first CPU's run queue, where the target duration is set according to the load-balancing period.
In some possible implementations, the running status data includes the home CPU's load; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, the scheduling module is further configured to, within the next scheduling period, in response to the migrated-to CPU's load being above the third load threshold, determine the third CPU with the highest load among all CPUs on the device and migrate the third business process running on the third CPU to the home CPU;
the determination module is further configured to determine the third business process's running priority on the home CPU;
the running module is further configured to run the third business process on the home CPU according to that priority.
In some possible implementations, the running status data includes the scheduling delay of processes in the home CPU's run queue; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, the scheduling module is further configured to, within the next scheduling period, in response to the scheduling delay of processes in the migrated-to CPU's run queue exceeding the second time threshold, determine the fourth CPU with the largest process scheduling delay among all CPUs on the device and migrate the fourth business process running on the fourth CPU to the home CPU;
the determination module is further configured to determine the fourth business process's running priority on the home CPU;
the running module is further configured to run the fourth business process on the home CPU according to that priority.
In some possible implementations, the running status data includes the scheduling delay of processes in the home CPU's run queue; in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU not bound to the container, the scheduling module is further configured to, within the next scheduling period, in response to the scheduling delay of processes in the migrated-to CPU's run queue exceeding the third time threshold, migrate the first business process back from the migrated-to CPU to the home CPU.
In some possible implementations, in the case where the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the determination module is further configured to: in response to no home process of the migrated-to CPU currently running, and the business to which the first business process belongs having the highest latency sensitivity among the away processes currently running on the migrated-to CPU, temporarily adjust the first business process's running priority to the first running priority; and in response to a home process of the migrated-to CPU being in the ready state, adjust the first business process's running priority back to the second running priority.
In some possible implementations, the scheduling module is further configured to, within the current scheduling period, in response to the home CPU's running status data meeting the load-balancing condition and the migrated-from CPU being the home CPU, determine multiple candidate migration targets among the away CPUs not bound to the container according to the running status data within the current scheduling period; predict the running status data of the multiple candidates over several subsequent scheduling periods; determine the migrated-to CPU among the multiple candidates according to their running status data within the current scheduling period and the predicted running status data; and perform business process migration between the home CPU and the migrated-to CPU.
All the optional technical solutions above may be combined arbitrarily to form optional embodiments of this disclosure, and they are not described one by one here.
Note that when the container-based process scheduling apparatus provided by the above embodiments performs process scheduling, the division into the above functional modules is only an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the container-based process scheduling apparatus provided by the above embodiments and the container-based process scheduling method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including a computer program, which can be executed by a processor in a computer device to complete the container-based process scheduling method of the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, including a computer program stored in a computer-readable storage medium; a processor of a computer device reads the computer program from the computer-readable storage medium and executes it, causing the computer device to perform the container-based process scheduling method above.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be completed by hardware or by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only optional embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included in its scope of protection.

Claims (15)

  1. A container-based process scheduling method, the method being executed by a computer device and comprising:
    for any container, periodically acquiring running status data of a home central processing unit (CPU) of the container, wherein the home CPU is a CPU on a device that has established a binding relationship with the container; the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the business operation requirements of the container;
    in response to the running status data of the home CPU meeting a load-balancing condition, performing business process migration between the home CPU and an away CPU, wherein the away CPU is a CPU on the device that has not established a binding relationship with the container; and
    in response to a first business process in the container being migrated, determining a running priority of the first business process on the migrated-to CPU, and running the first business process on the migrated-to CPU according to the running priority of the first business process.
  2. The method according to claim 1, wherein the running status data comprises the scheduling delay of processes in the run queue of the home CPU;
    the periodically acquiring running status data of the home central processing unit (CPU) of the container comprises:
    periodically acquiring the scheduling delay of home processes in a home process list of the home CPU, wherein a home process is a business process in the container; and
    periodically acquiring the scheduling delay of away processes in an away process list of the home CPU, wherein an away process is a business process in a container that has not established a binding relationship with the home CPU.
  3. The method according to claim 1 or 2, wherein the determining a running priority of the first business process on the migrated-to CPU comprises:
    in response to the migrated-to CPU being the home CPU, setting the running priority of the first business process on the migrated-to CPU to a first running priority; and
    in response to the migrated-to CPU being an away CPU that has not established a binding relationship with the container, setting the running priority of the first business process on the migrated-to CPU to a second running priority;
    wherein the first running priority is higher than the second running priority.
  4. The method according to any one of claims 1 to 3, wherein the running status data comprises the load of the home CPU and the scheduling delay of processes in the run queue of the home CPU;
    the performing business process migration between the home CPU and an away CPU in response to the running status data of the home CPU meeting a load-balancing condition comprises:
    within the current scheduling period, in response to the load of the home CPU being higher than a first load threshold, determining a first CPU with the lowest load among the away CPUs that have not established a binding relationship with the container, and migrating the first business process running on the home CPU to the first CPU; or,
    within the current scheduling period, in response to the scheduling delay of processes in the run queue of the home CPU being greater than a first time threshold, determining a second CPU with the smallest process scheduling delay among the away CPUs that have not established a binding relationship with the container, and migrating the first business process running on the home CPU to the second CPU.
  5. The method according to any one of claims 1 to 4, wherein the running status data comprises the load of the home CPU;
    the performing business process migration between the home CPU and an away CPU in response to the running status data of the home CPU meeting a load-balancing condition comprises:
    within the current scheduling period, in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, in response to the load of the home CPU being lower than a second load threshold, migrating the first business process from the migrated-to CPU back to the home CPU.
  6. The method according to any one of claims 1 to 5, wherein the running status data comprises the load of the home CPU; the method further comprises:
    within the current scheduling period, in response to the load of the home CPU lying within a target threshold interval, determining a first CPU with the lowest load among the away CPUs that have not established a binding relationship with the container;
    determining a third CPU with the highest load among all CPUs on the device;
    migrating a second business process running on the third CPU to the first CPU;
    determining a running priority of the second business process on the first CPU; and
    running the second business process on the first CPU according to the running priority of the second business process.
  7. The method according to claim 6, wherein the migrating a second business process running on the third CPU to the first CPU comprises:
    after a target duration, adding the second business process located in the run queue of the third CPU to the run queue of the first CPU;
    wherein the target duration is set according to the load-balancing period.
  8. The method according to any one of claims 1 to 7, wherein the running status data comprises the load of the home CPU and the scheduling delay of processes in the run queue of the home CPU;
    in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the method further comprises:
    within the next scheduling period, in response to the load of the migrated-to CPU being higher than a third load threshold, determining a third CPU with the highest load among all CPUs on the device; migrating a third business process running on the third CPU to the home CPU; determining a running priority of the third business process on the home CPU; and running the third business process on the home CPU according to the running priority of the third business process; or,
    within the next scheduling period, in response to the scheduling delay of processes in the run queue of the migrated-to CPU being greater than a second time threshold, determining a fourth CPU with the largest process scheduling delay among all CPUs on the device; migrating a fourth business process running on the fourth CPU to the home CPU; determining a running priority of the fourth business process on the home CPU; and running the fourth business process on the home CPU according to the running priority of the fourth business process.
  9. The method according to any one of claims 1 to 8, wherein the running status data comprises the scheduling delay of processes in the run queue of the home CPU;
    in the case where the migrated-from CPU is the home CPU and the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the method further comprises:
    within the next scheduling period, in response to the scheduling delay of processes in the run queue of the migrated-to CPU being greater than a third time threshold, migrating the first business process from the migrated-to CPU back to the home CPU.
  10. The method according to claim 3, wherein, in the case where the migrated-to CPU is an away CPU that has not established a binding relationship with the container, the method further comprises:
    in response to no home process of the migrated-to CPU currently running, and the business to which the first business process belongs having the highest latency sensitivity among the away processes currently running on the migrated-to CPU, temporarily adjusting the running priority of the first business process to the first running priority; and
    in response to a home process of the migrated-to CPU being in the ready state, adjusting the running priority of the first business process back to the second running priority.
  11. The method according to any one of claims 1 to 10, wherein the performing business process migration between the home CPU and an away CPU in response to the running status data of the home CPU meeting a load-balancing condition comprises:
    within the current scheduling period, in response to the running status data of the home CPU meeting the load-balancing condition and the migrated-from CPU being the home CPU, determining multiple candidate migration targets among the away CPUs that have not established a binding relationship with the container according to the running status data within the current scheduling period;
    predicting running status data of the multiple candidate migration targets in multiple subsequent scheduling periods;
    determining the migrated-to CPU among the multiple candidate migration targets according to the running status data of the multiple candidate migration targets within the current scheduling period and the predicted running status data; and
    performing business process migration between the home CPU and the migrated-to CPU.
  12. A container-based process scheduling apparatus, the apparatus comprising:
    an acquisition module configured to, for any container, periodically acquire running status data of a home central processing unit (CPU) of the container, wherein the home CPU is a CPU on a device that has established a binding relationship with the container; the number of CPUs bound to the container is less than a target number, and the target number is the number of CPUs required to meet the business operation requirements of the container;
    a scheduling module configured to, in response to the running status data of the home CPU meeting a load-balancing condition, perform business process migration between the home CPU and an away CPU, wherein the away CPU is a CPU on the device that has not established a binding relationship with the container;
    a determination module configured to, in response to a first business process in the container being migrated, determine a running priority of the first business process on the migrated-to CPU; and
    a running module configured to run the first business process on the migrated-to CPU according to the running priority of the first business process.
  13. A computer device, the device comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the container-based process scheduling method according to any one of claims 1 to 11.
  14. A computer-readable storage medium, the storage medium storing a computer program that is loaded and executed by a processor to implement the container-based process scheduling method according to any one of claims 1 to 11.
  15. A computer program product, the computer program product comprising a computer program stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer program from the computer-readable storage medium and executes it, causing the computer device to perform the container-based process scheduling method according to any one of claims 1 to 11.
PCT/CN2023/110686 2022-09-02 2023-08-02 Container-based process scheduling method, apparatus, device, and storage medium WO2024046017A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211068759.4A CN115129458B (zh) 2022-09-02 2022-09-02 Container-based process scheduling method, apparatus, device, and storage medium
CN202211068759.4 2022-09-02

Publications (1)

Publication Number Publication Date
WO2024046017A1 true WO2024046017A1 (zh) 2024-03-07

Family

ID=83387095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110686 WO2024046017A1 (zh) 2022-09-02 2023-08-02 Container-based process scheduling method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115129458B (zh)
WO (1) WO2024046017A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129458B (zh) 2022-09-02 2022-11-25 腾讯科技(深圳)有限公司 Container-based process scheduling method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5506987A (en) * 1991-02-01 1996-04-09 Digital Equipment Corporation Affinity scheduling of processes on symmetric multiprocessing systems
CN109522101A * 2017-09-20 2019-03-26 三星电子株式会社 Method, system and/or apparatus for scheduling tasks of multiple operating systems
CN110928661A * 2019-11-22 2020-03-27 北京浪潮数据技术有限公司 Thread migration method, apparatus, device, and readable storage medium
CN112199194A * 2020-10-14 2021-01-08 广州虎牙科技有限公司 Container-cluster-based resource scheduling method, apparatus, device, and storage medium
CN113590313A * 2021-07-08 2021-11-02 杭州朗和科技有限公司 Load balancing method, apparatus, storage medium, and computing device
CN115129458A * 2022-09-02 2022-09-30 腾讯科技(深圳)有限公司 Container-based process scheduling method, apparatus, device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488210B * 2020-04-02 2023-04-07 腾讯科技(深圳)有限公司 Cloud-computing-based task scheduling method, apparatus, and computer device
CN111694658A * 2020-04-30 2020-09-22 北京城市网邻信息技术有限公司 CPU resource allocation method, apparatus, electronic device, and storage medium
US11593143B2 (en) * 2020-07-30 2023-02-28 Vmware, Inc. System and method for distributed orchestration management in network function virtualization
CN112130963A * 2020-09-30 2020-12-25 腾讯科技(深圳)有限公司 Virtual machine task scheduling method, apparatus, computer device, and storage medium
CN112559176A * 2020-12-11 2021-03-26 广州橙行智动汽车科技有限公司 Instruction processing method and apparatus
CN113806075A * 2021-08-13 2021-12-17 苏州浪潮智能科技有限公司 Method, apparatus, device, and readable medium for hot-updating CPU cores of containers in a Kubernetes cluster
CN113992688A * 2021-09-26 2022-01-28 阿里巴巴达摩院(杭州)科技有限公司 Distributed-unit cloud deployment method, device, storage medium, and system
CN114579271A * 2022-02-28 2022-06-03 阿里巴巴(中国)有限公司 Task scheduling method, distributed system, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5506987A (en) * 1991-02-01 1996-04-09 Digital Equipment Corporation Affinity scheduling of processes on symmetric multiprocessing systems
CN109522101A * 2017-09-20 2019-03-26 三星电子株式会社 Method, system and/or apparatus for scheduling tasks of multiple operating systems
CN110928661A * 2019-11-22 2020-03-27 北京浪潮数据技术有限公司 Thread migration method, apparatus, device, and readable storage medium
CN112199194A * 2020-10-14 2021-01-08 广州虎牙科技有限公司 Container-cluster-based resource scheduling method, apparatus, device, and storage medium
CN113590313A * 2021-07-08 2021-11-02 杭州朗和科技有限公司 Load balancing method, apparatus, storage medium, and computing device
CN115129458A * 2022-09-02 2022-09-30 腾讯科技(深圳)有限公司 Container-based process scheduling method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN115129458A (zh) 2022-09-30
CN115129458B (zh) 2022-11-25

Similar Documents

Publication Publication Date Title
KR100628492B1 (ko) Method and system for performing real-time operation
US9396012B2 (en) Systems and methods of using a hypervisor with guest operating systems and virtual processors
US7296267B2 (en) System and method for binding virtual machines to hardware contexts
WO2017070900A1 (zh) Method and apparatus for processing tasks in a multi-core digital signal processing system
US20080270199A1 (en) Methods and apparatus for management of heterogeneous workloads
Hashem et al. MapReduce scheduling algorithms: a review
US9411649B2 (en) Resource allocation method
KR20050016170A (ko) Method and system for performing real-time operation
JPWO2002069174A1 (ja) Parallel process execution method and multiprocessor computer
WO2024046017A1 (zh) Container-based process scheduling method, apparatus, device, and storage medium
CN111459622B (zh) Method and apparatus for scheduling virtual CPUs, computer device, and storage medium
WO2021046777A1 (zh) Resource scheduling, application, and pricing method, device, system, and storage medium
Zhao et al. Efficient sharing and fine-grained scheduling of virtualized GPU resources
Pastorelli et al. Practical size-based scheduling for MapReduce workloads
WO2023160359A1 (zh) Resource scheduling method and apparatus
Majumder et al. Energy-aware real-time tasks processing for fpga-based heterogeneous cloud
Jin et al. Preemption-aware kernel scheduling for gpus
Shih et al. Fairness scheduler for virtual machines on heterogonous multi-core platforms
Xu et al. Optimal construction of virtual networks for cloud-based MapReduce workflows
EP3430510B1 (en) Operating system support for game mode
Tesfatsion et al. Power and performance optimization in FPGA‐accelerated clouds
US20220129327A1 (en) Latency sensitive workload balancing
Nzanywayingoma et al. Task scheduling and virtual resource optimising in Hadoop YARN-based cloud computing environment
CN114968500A (zh) Task scheduling method, apparatus, device, and storage medium
Pang et al. Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859052

Country of ref document: EP

Kind code of ref document: A1