WO2023230909A1 - Scheduling method and related apparatus (调度方法及相关装置) - Google Patents

调度方法及相关装置 (Scheduling method and related apparatus)

Info

Publication number
WO2023230909A1
WO2023230909A1 · PCT/CN2022/096455 · CN2022096455W
Authority
WO
WIPO (PCT)
Prior art keywords
execution
time
task
processor
virtual machine
Prior art date
Application number
PCT/CN2022/096455
Other languages
English (en)
French (fr)
Inventor
朱湘毅
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2022/096455 priority Critical patent/WO2023230909A1/zh
Publication of WO2023230909A1 publication Critical patent/WO2023230909A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of information technology, and in particular, to a scheduling method and related devices.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain judgment results.
  • Artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new class of intelligent machines that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
  • This application provides a scheduling method and related devices, which meet the computing power requirements of each virtual machine and ensure the rationality of time slice allocation in each virtual machine.
  • this application provides a scheduling method applied to a virtual machine system.
  • the virtual machine system includes multiple virtual machines, and the multiple virtual machines share the computing power of a processor.
  • the method includes: the processor allocates a time slice to each virtual machine in the plurality of virtual machines according to first configuration information, where the first configuration information is used to indicate a computing power proportion of each virtual machine in the plurality of virtual machines; when the time slice exists in a first virtual machine among the plurality of virtual machines and a second task arrives while a first task is being executed, the processor stops executing the first task and schedules execution of the second task, where the priority of the execution sequence to which the second task belongs is higher than the priority of the execution sequence to which the first task belongs.
  • the processor allocates time slices to each virtual machine in the virtual machine system based on the computing power ratio of each virtual machine among multiple virtual machines.
  • the virtual machine with a larger computing power ratio is allocated more time slices.
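The first-level allocation described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; the function name and the use of integer slice counts are assumptions:

```python
def allocate_vm_slices(total_slices: int, power_ratios: dict) -> dict:
    """Split a pool of time slices among virtual machines in proportion
    to each VM's configured computing-power ratio (the first
    configuration information)."""
    total_ratio = sum(power_ratios.values())
    return {vm: total_slices * ratio // total_ratio
            for vm, ratio in power_ratios.items()}

# A VM with a larger computing-power ratio receives more time slices:
slices = allocate_vm_slices(100, {"vm0": 50, "vm1": 30, "vm2": 20})
```

Integer division keeps the allocation simple; a real scheduler would also have to decide where any rounding remainder goes.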
  • the first virtual machine among the plurality of virtual machines includes n execution sequences, where n is an integer greater than or equal to 0; the method further includes: the processor allocates the time slices to s execution sequences among the n execution sequences according to the first configuration information and second configuration information, where the second configuration information is used to indicate the priorities of the n execution sequences in the first virtual machine, and s is an integer less than or equal to n.
  • the first virtual machine is any virtual machine among multiple virtual machines in the virtual machine system.
  • the first virtual machine includes n execution sequences, and the processor allocates time slices to s execution sequences among the n execution sequences according to the priorities of the n execution sequences in the first virtual machine. Among the s execution sequences, an execution sequence with a higher priority is allocated more time slices. This realizes a secondary allocation of time slices, satisfies the computing power requirements of each execution sequence in the first virtual machine, and ensures the rationality of allocating time slices to each execution sequence.
  • the priority of the execution sequence in the first virtual machine includes a common type priority and a real-time type priority, and the real-time type priority is higher than the common type priority; the priorities of the s execution sequences are common type priorities.
  • the priority of the execution sequence in the first virtual machine includes the common type priority and the real-time type priority. Based on the priorities of the n execution sequences in the first virtual machine, the processor allocates time slices only to the execution sequences whose priority is the common type priority, while the execution sequences whose priority is the real-time type priority are scheduled according to actual needs. As long as the first virtual machine has a time slice, an execution sequence with the real-time type priority can be scheduled, which ensures the real-time nature of real-time services.
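The secondary allocation can be sketched as follows; this is a hedged illustration, with the tuple layout and per-sequence weights as assumptions (the patent only says higher-priority common-type sequences get more slices, and real-time sequences run on demand):

```python
def allocate_sequence_slices(vm_slices: int, sequences: list) -> dict:
    """sequences: list of (name, priority_class, weight) tuples.
    Only 'normal' (common-type) sequences receive dedicated time
    slices; 'realtime' sequences are scheduled on demand whenever
    the VM still holds a slice, so they get no fixed allocation."""
    normal = [(name, w) for name, cls, w in sequences if cls == "normal"]
    total_w = sum(w for _, w in normal)
    return {name: vm_slices * w // total_w for name, w in normal}

seqs = [("seq0", "normal", 3), ("seq1", "normal", 1), ("seq2", "realtime", 0)]
alloc = allocate_sequence_slices(40, seqs)  # seq2 gets no fixed slice
```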
  • the method further includes: the processor allocates the time slice to each virtual machine according to the first configuration information and third configuration information, where the third configuration information is used to indicate a cycle length with which the processor allocates time slices to the multiple virtual machines.
  • the processor allocates time slices to the multiple virtual machines in the virtual machine system according to the period indicated by the third configuration information, which meets the computing power requirements of each virtual machine and ensures that all services in the virtual machines can be completed on time and as needed.
  • the cycle length of the time slice allocated by the processor to the multiple virtual machines is greater than or equal to the cycle length of the execution cycle of an execution sequence with a real-time priority.
  • the cycle length of the time slice allocated by the processor to multiple virtual machines in the virtual machine system is greater than or equal to the cycle length of the execution cycle of the execution sequence with the real-time type priority, so that when the real-time tasks in the virtual machines are completed, time slices are re-allocated to the multiple virtual machines, ensuring the real-time nature of real-time tasks.
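The constraint on the allocation period can be expressed as a simple check. A minimal sketch, with the function name assumed:

```python
def valid_allocation_period(period: float, realtime_exec_periods: list) -> bool:
    """The slice-allocation period must be greater than or equal to the
    execution period of every real-time-priority execution sequence, so
    real-time tasks can finish before slices are re-allocated."""
    return all(period >= p for p in realtime_exec_periods)
```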
  • the method further includes: the processor receiving the first configuration information.
  • the method further includes: when there is a time slice in the first virtual machine among the plurality of virtual machines and there are x first execution sequences and q second execution sequences to be executed, the processor preferentially schedules the tasks in the q second execution sequences, where the priority of the first execution sequences is the common type priority, the priority of the second execution sequences is the real-time type priority, and x and q are integers greater than or equal to 0.
  • the processor first schedules the execution sequences whose tasks have the real-time type priority, which ensures the real-time nature of real-time tasks.
  • the processor preferentially schedules tasks in the q second execution sequences, including: the real-time type priorities in the q second execution sequences further include a first real-time priority and a second real-time priority, where the first real-time priority is higher than the second real-time priority; the processor preferentially schedules the first-real-time-priority tasks in the q second execution sequences.
  • when there are multiple execution sequences with real-time type priorities to be executed in the first virtual machine, the processor prioritizes the tasks in the execution sequences with higher real-time priorities. Because executing the tasks in an execution sequence with a lower real-time priority may require the execution results of the tasks in an execution sequence with a higher real-time priority, executing the higher-priority tasks first helps ensure the real-time performance of real-time tasks and improves their execution efficiency.
  • the first task belongs to any one of the x first execution sequences
  • the second task belongs to any one of the q second execution sequences.
  • the processor stops scheduling the common-type-priority task and starts scheduling the real-time-type-priority task, ensuring the real-time nature of real-time tasks.
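The preemption rule can be sketched as below. This is an illustrative model, not the patent's implementation: the `Task` class, the numeric priority ranks, and `save_state` are assumptions standing in for the hardware behavior (the patent only requires that real-time priority outranks common priority and that the preempted task's execution information is backed up):

```python
from dataclasses import dataclass

# Illustrative ranks; only the ordering matters.
PRIORITY = {"normal": 0, "realtime": 1}

@dataclass
class Task:
    name: str
    priority: str
    backed_up: bool = False

    def save_state(self) -> None:
        # Stand-in for storing the task's execution information
        # (registers, caches, buffers) in a backup memory unit.
        self.backed_up = True

def on_task_arrival(running: Task, arriving: Task) -> Task:
    """Return the task that should occupy the processor next."""
    if PRIORITY[arriving.priority] > PRIORITY[running.priority]:
        running.save_state()  # preempt the lower-priority first task
        return arriving       # schedule the higher-priority second task
    return running

first = Task("first", "normal")
second = Task("second", "realtime")
nxt = on_task_arrival(first, second)
```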
  • the processor allocates the remaining time slices to the x first execution sequences according to the original time slice allocation ratio, where the remaining time slices are the difference between the time slices allocated by the processor to the x first execution sequences and the time slices used by the tasks in the q second execution sequences. In this way, the time slices of the common-priority execution sequences still satisfy the time slice allocation ratio.
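A hedged sketch of this redistribution, with the function name and dict layout as assumptions:

```python
def redistribute_remaining(allocated: dict, used_by_realtime: int) -> dict:
    """After real-time tasks consume `used_by_realtime` slices, hand the
    remaining slices back to the common-priority sequences in their
    original allocation ratio."""
    total = sum(allocated.values())
    remaining = total - used_by_realtime
    return {seq: remaining * share // total
            for seq, share in allocated.items()}

# Real-time tasks used 50 of the 100 slices; the 3:2 ratio is preserved:
new_alloc = redistribute_remaining({"seqA": 60, "seqB": 40}, 50)
```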
  • the first task and the second task belong to different execution sequences among the q second execution sequences, and the execution sequence to which the second task belongs has a higher priority than the execution sequence to which the first task belongs. The processor stops scheduling the task with the lower real-time priority and starts scheduling the task with the higher real-time priority, ensuring that the real-time-priority tasks in the first virtual machine are executed in priority order from high to low, which ensures the real-time performance of real-time tasks and improves their execution efficiency.
  • the method further includes: the processor obtains the total execution time and the current execution time of the first task; the processor calculates the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is less than a preset threshold, the processor continues to execute the first task. That is, when the first task is executing and a second task arrives, if the remaining execution time calculated from the total execution time and the current execution time is less than the preset threshold, the processor still executes the first task to completion, which saves system resources and improves business execution efficiency.
  • the method further includes: the processor obtains the total execution time and the current execution time of the first task; the processor calculates the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is less than the product of the preset switching backup time of the first task and a preset multiple, the processor continues to execute the first task. That is, when the first task is executing and a second task arrives, if the remaining execution time is shorter than the cost of switching and backing up the task (times the preset multiple), the processor still executes the first task to completion, which saves system resources and improves business execution efficiency.
  • the method further includes: the processor obtains the total execution time and the current execution time of the first task; the processor calculates the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is greater than or equal to a preset threshold, the processor stops executing the first task. That is, when the first task is executing and a second task arrives, if the remaining execution time calculated from the total execution time and the current execution time is greater than or equal to the preset threshold, the processor stops executing the first task and starts executing the second task, which ensures the real-time performance of the higher-priority second task and improves business execution efficiency.
  • the method further includes: storing execution information of the first task in a backup memory unit.
  • after the processor stops scheduling the first task, it stores the execution information of the first task in the backup memory unit. When the processor schedules the first task again, it retrieves the execution information of the first task from the backup memory unit and continues to execute the first task, improving the execution efficiency of the first task.
  • the execution information of the first task includes one or more of the following: the data in the general registers, special registers, internal cache, and buffer of the logical operation unit that executes the first task.
  • when the processor stops scheduling the first task, it stores the data in the general registers, special registers, internal cache, and buffer of the logical operation unit executing the first task into the backup memory unit, so that sufficient data is retained for re-executing the first task, improving its execution efficiency.
  • the number of backup memory units in the processor is equal to the product of the number of logical operation units in the processor and the number of real-time priority levels. Compared with setting a backup memory unit for each execution sequence in the virtual machine system, this greatly saves memory space.
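The sizing rule above is a straightforward product; the function name and the example figures below are illustrative:

```python
def backup_unit_count(num_logic_units: int, num_realtime_levels: int) -> int:
    """One backup memory unit per (logical operation unit, real-time
    priority level) pair, rather than one per execution sequence."""
    return num_logic_units * num_realtime_levels

# e.g. 8 logical operation units and 2 real-time priority levels need
# only 16 backup units, regardless of how many execution sequences the
# virtual machines contain.
units = backup_unit_count(8, 2)
```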
  • the method further includes: when the execution of the second task is completed, the processor executes the unfinished first task.
  • when a real-time task or a higher-priority task is completed, the preempted task is processed first instead of a new task, ensuring that the preempted task is executed first.
  • the method further includes: when there is no time slice in the first virtual machine and there is an execution sequence to be executed, the processor schedules other virtual machines that have time slices, other than the first virtual machine, to execute the tasks of the execution sequence in the first virtual machine.
  • when the first virtual machine has used up its time slices, the processor schedules other idle virtual machines to execute its tasks, which improves the business processing efficiency of the virtual machine system and maximizes the utilization of the processor's computing power.
  • the method further includes: the time slice allocated by the processor to each virtual machine according to the first configuration information satisfies the following relationship: Y = t × m × p, where Y represents the time slice allocated to each virtual machine; t represents the scheduling period (in each scheduling period, the processor allocates the time slice to each virtual machine); m represents the number of logical operation units in the processor; and p represents the computing power proportion of each virtual machine.
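The formula itself is not reproduced in this text; assuming the relationship is the product of the three quantities defined above (which is consistent with the variable definitions), a worked example:

```python
def vm_time_slice(t: float, m: int, p: float) -> float:
    """Assumed form Y = t * m * p: scheduling period t, number of
    logical operation units m, computing-power proportion p."""
    return t * m * p

# e.g. a 10 ms scheduling period, 4 logical operation units, and a 25%
# compute share give the VM 10 ms of logical-unit time per period:
y = vm_time_slice(10.0, 4, 0.25)
```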
  • this application provides a scheduling method, applied to a virtual machine system.
  • the virtual machine system includes a host.
  • the method includes: the host obtains virtual machine configuration information, where the virtual machine configuration information is used to indicate the creation of multiple virtual machines and includes the computing power ratio of each virtual machine in the multiple virtual machines; the host creates the multiple virtual machines according to the virtual machine configuration information, and the multiple virtual machines share the computing power of the processor; the host sends first configuration information to the processor, where the first configuration information is used to indicate the computing power proportion of each virtual machine in the plurality of virtual machines.
  • the virtual machine system includes a host.
  • the host creates multiple virtual machines based on the acquired virtual machine configuration information. The created virtual machines jointly complete the tasks in the virtual machine system, and the host sends the first configuration information to the processor to indicate the computing power ratio of each virtual machine. Creating the multiple virtual machines according to their computing power ratios increases the rationality of virtual machine creation, saves the resources of the virtual machine system, and improves its execution efficiency.
  • the method further includes: the host obtains multiple models; the host creates n execution sequences for a first virtual machine among the multiple virtual machines according to a first model among the multiple models, where n is an integer greater than or equal to 0; the host configures a priority for each of the n execution sequences; and the host sends second configuration information to the processor, where the second configuration information is used to indicate the priorities of the n execution sequences in the first virtual machine.
  • the host creates n execution sequences for the first virtual machine based on the first model among the multiple acquired models, configures a priority for each of the n execution sequences, and sends the second configuration information to the processor, which indicates the priorities of the n execution sequences in the first virtual machine. The n execution sequences can be processed in parallel, and the tasks in the n execution sequences can be executed in priority order from high to low, which improves the execution efficiency of the first model and ensures its real-time requirements.
  • the priority includes a common type priority and a real-time type priority, and the real-time type priority is higher than the common type priority.
  • the virtual machine configuration information also includes a period length for allocating time slices to the multiple virtual machines; the method further includes: the host sends third configuration information to the processor, where the third configuration information is used to indicate the length of the period for allocating time slices to the multiple virtual machines.
  • the host sends the third configuration information to the processor to indicate the period of allocating time slices to the multiple virtual machines in the virtual machine system, which meets the computing power requirements of each virtual machine and ensures that all business can be completed on time and as needed.
  • the period of allocating time slices to the plurality of virtual machines is greater than or equal to the execution period of an execution sequence with a real-time priority.
  • the period of allocating time slices to the multiple virtual machines in the virtual machine system is greater than or equal to the execution period of the execution sequences with a real-time priority, so that when the real-time tasks in the virtual machines are completed, time slices are re-allocated to the multiple virtual machines, ensuring the real-time performance of real-time tasks.
  • this application provides a scheduling device applied to a virtual machine system.
  • the virtual machine system includes multiple virtual machines, and the multiple virtual machines share the computing power of the processor.
  • the device includes: an allocation module, configured to allocate a time slice to each virtual machine in the plurality of virtual machines according to first configuration information, where the first configuration information is used to indicate a computing power ratio of each virtual machine in the plurality of virtual machines; and an execution module, configured to cause the processor to stop executing a first task and schedule execution of a second task when the time slice exists in a first virtual machine among the plurality of virtual machines and the second task arrives while the first task is being executed, where the priority of the execution sequence to which the second task belongs is higher than the priority of the execution sequence to which the first task belongs.
  • the first virtual machine among the plurality of virtual machines includes n execution sequences, where n is an integer greater than or equal to 0; the allocation module is further configured to allocate the time slices to s execution sequences among the n execution sequences according to the first configuration information and second configuration information, where the second configuration information is used to indicate the priorities of the n execution sequences in the first virtual machine, and s is an integer less than or equal to n.
  • the priority of the execution sequence in the first virtual machine includes a common type priority and a real-time type priority, and the real-time type priority is higher than the common type priority; the priorities of the s execution sequences are common type priorities.
  • the allocation module is further configured to allocate the time slice to each virtual machine according to the first configuration information and third configuration information, where the third configuration information is used to indicate the period length with which the processor allocates time slices to the plurality of virtual machines.
  • the cycle length of the time slice allocated by the processor to the multiple virtual machines is greater than or equal to the cycle length of the execution cycle of an execution sequence with a real-time priority.
  • the device further includes: a receiving module, configured for the processor to receive the first configuration information.
  • the device further includes: a scheduling module, configured to preferentially schedule the tasks in the q second execution sequences when there is a time slice in a first virtual machine among the plurality of virtual machines and there are x first execution sequences and q second execution sequences to be executed, where the priority of the first execution sequences is a common type priority, the priority of the second execution sequences is a real-time type priority, and x and q are integers greater than or equal to 0.
  • the scheduling module is specifically configured to preferentially schedule the first-real-time-priority tasks in the q second execution sequences, where the real-time type priorities in the q second execution sequences further include a first real-time priority and a second real-time priority, and the first real-time priority is higher than the second real-time priority.
  • the first task belongs to any one of the x first execution sequences
  • the second task belongs to any one of the q second execution sequences.
  • the allocation module is further configured to allocate the remaining time slices to the x first execution sequences according to the original time slice allocation ratio, where the remaining time slices are the difference between the time slices allocated by the processor to the x first execution sequences and the time slices used for task execution in the q second execution sequences.
  • the first task and the second task belong to different execution sequences among the q second execution sequences, and the execution sequence to which the second task belongs has a higher priority than the execution sequence to which the first task belongs.
  • the device further includes: an acquisition module, configured to acquire the total execution time and the current execution time of the first task; and a calculation module, configured to calculate the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is less than a preset threshold, the execution module is further configured to execute the first task.
  • the acquisition module is further configured to acquire the total execution time and the current execution time of the first task; the calculation module is further configured to calculate the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is less than the product of the preset switching backup time of the first task and a preset multiple, the execution module is further configured to execute the first task.
  • the acquisition module is further configured to acquire the total execution time and the current execution time of the first task; the calculation module is further configured to calculate the remaining execution time of the first task according to the total execution time and the current execution time; when the remaining execution time of the first task is greater than or equal to a preset threshold, the execution module is further configured to stop executing the first task.
  • the device further includes: a storage module configured to store execution information of the first task into a backup memory unit.
  • the execution information of the first task includes one or more of the following information: general registers, special registers, internal cache memory, buffer memory of the logical operation unit that executes the first task. data in the area.
  • the execution module is also configured to cause the processor to execute the unfinished first task when execution of the second task is completed.
  • the scheduling module is further configured to, when there is no time slice in the first virtual machine and there is an execution sequence to be executed, schedule other virtual machines that have time slices, other than the first virtual machine, to execute the tasks of the execution sequence in the first virtual machine.
  • the time slice allocated by the processor to each virtual machine according to the first configuration information satisfies the following relationship: Y = t × m × p, where Y represents the time slice allocated to each virtual machine; t represents the scheduling period (in each scheduling period, the processor allocates the time slice to each virtual machine); m represents the number of logical operation units in the processor; and p represents the computing power proportion of each virtual machine.
  • beneficial effects of the third aspect and various possible implementations of the third aspect may be referred to the beneficial effects of the first aspect and various possible implementations of the first aspect, and will not be described again here.
  • this application provides a scheduling device applied to a virtual machine system.
  • the virtual machine system includes a host.
  • the device includes: an acquisition module, configured to acquire virtual machine configuration information, where the virtual machine configuration information is used to indicate the creation of multiple virtual machines and includes the computing power ratio of each virtual machine in the multiple virtual machines; a creation module, configured to create the multiple virtual machines according to the virtual machine configuration information, where the multiple virtual machines share the computing power of the processor; and a sending module, configured to send first configuration information to the processor, where the first configuration information is used to indicate the computing power proportion of each virtual machine in the multiple virtual machines.
  • the acquisition module is further configured to acquire multiple models; the creation module is further configured to create n execution sequences for a first virtual machine among the multiple virtual machines according to a first model among the multiple models, where n is an integer greater than or equal to 0; a configuration module is configured to configure a priority for each of the n execution sequences; and the sending module is configured to send second configuration information to the processor, where the second configuration information is used to indicate the priorities of the n execution sequences in the first virtual machine.
  • the priority includes a common type priority and a real-time type priority, and the real-time type priority is higher than the common type priority.
  • the virtual machine configuration information also includes a period length for allocating time slices to the multiple virtual machines; the sending module is further configured to send third configuration information to the processor, where the third configuration information is used to indicate the length of the period for allocating time slices to the multiple virtual machines.
  • the period of allocating time slices to the plurality of virtual machines is greater than or equal to the execution period of an execution sequence with a real-time priority.
  • this application provides a scheduling device.
  • the apparatus may include a processor coupled to a memory.
  • the memory is used to store program codes
  • the processor is used to execute the program codes in the memory to implement the method in the first aspect or the second aspect or any one of the implementation manners.
  • the device may also include the memory.
  • the present application provides a chip, including at least one processor and a communication interface.
  • the communication interface and the at least one processor are interconnected through lines.
  • the at least one processor is used to run computer programs or instructions to execute the method described in the first aspect or the second aspect or any possible implementation manner thereof.
  • the present application provides a chip system, which includes a plurality of chips as in the sixth aspect.
  • the present application provides a computer-readable medium that stores program code for device execution.
• the program code includes instructions for executing the method described in the first aspect or the second aspect or any possible implementation manner thereof.
  • the present application provides a computer program product containing instructions.
• when the computer program product is run on a computer, it causes the computer to execute the method described in the first aspect or the second aspect or any possible implementation manner thereof.
• the present application provides a computing device, including at least one processor and a communication interface, where the communication interface and the at least one processor are interconnected through lines, the communication interface communicates with the target system, and the at least one processor is used to run computer programs or instructions to perform the method described in the first aspect or the second aspect or any possible implementation manner thereof.
  • the present application provides a computing system, including at least one processor and a communication interface, the communication interface and the at least one processor are interconnected through lines, the communication interface communicates with the target system, and the at least one The processor is used to run computer programs or instructions to perform the method described in the first aspect or the second aspect or any possible implementation manner therein.
• the present application provides a vehicle, which includes the chip described in the sixth aspect or the scheduling device described in the fifth aspect.
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic flowchart of a scheduling method provided by an embodiment of the present application.
  • Figure 3 is a configuration flow chart of the scheduling cycle provided by an embodiment of the present application.
  • Figure 4 is a schematic flowchart of a scheduling method provided by an embodiment of the present application.
• Figure 5 is a schematic flowchart of a host creating multiple virtual machines according to an embodiment of the present application.
  • Figure 6 is a schematic flowchart of creating an execution sequence provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of priority preemption provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a backup memory unit provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a scheduling device according to an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a scheduling device according to another embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a scheduling device according to another embodiment of the present application.
  • the Central Car Computer (CCC) architecture can virtualize multiple partitions of different security levels on a powerful underlying hardware platform for use by different businesses, and these partitions share hardware with computing processing capabilities.
  • partitions can also be called containers or virtual machines (VMs)
  • computer processing capabilities can be referred to as computing power
• hardware with computing power can include processors such as neural network processors (neural-network processing units, NPUs) or graphics processors (graphics processing units, GPUs).
  • the field of autonomous driving can include services such as planning control, predictive planning, perception fusion, 360-degree management view, driver monitor system (DMS), entertainment system or cockpit-related functions. These businesses have different requirements for computing power and real-time performance.
• planning control, predictive planning, perception fusion and other services have high computing power requirements and high real-time requirements;
• DMS has high computing power requirements but no strict real-time requirements;
• the computing power requirements of cockpit-related services depend on the specific functions involved, and these services do not have high real-time requirements.
  • Figure 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the virtual machine system 100 includes a host 110 and a processor 120 .
  • the host 110 may include multiple virtual machines (VM1, VM2,..., VMn) and a virtual machine manager (hypervisor) 111.
  • Each of the multiple virtual machines includes an application (application, APP), memory management (runtime) and a virtual processor driver.
  • the virtual machine manager 111 may include a processor driver and a central processing unit (CPU).
  • the processor driver can provide the virtual machine manager 111 with the driving function of the processor 120.
• the processor driver can provide the virtual machine manager 111 with interfaces for setting the virtual machine computing power ratio and setting the virtual machine resource scheduling cycle; runtime can be deployed in the APP and can provide user-mode driver functions of the processor 120 (such as an application programming interface (API), etc.).
• the APP loads the AI model to the processor 120 by calling the API provided by the runtime, drives the processor 120 to execute the AI model, and obtains the execution result of the AI model.
  • the processor 120 may be a dedicated neural network processor (AI chip), such as an NPU or a GPU.
  • the processor 120 may include a controller 121 and a plurality of logical operation units.
  • the controller 121 is configured to receive the AI model sent by the host 110 , schedule the execution of the AI model, obtain the execution results of the AI model, and report the execution results of the AI model to the host 110 .
  • the logical operation unit is used to execute the tasks in the AI model issued by the controller 121 (execution units in the execution sequence), and return the execution results of the tasks to the controller 121.
  • the AI model can be a computational graph structure.
• before sending the AI model to the processor 120, the APP converts the AI model's computational graph structure into the execution sequence structure of the processor 120.
• one AI model corresponds to one or more execution sequences (multiple execution sequences can increase the degree of parallelism), and each execution sequence contains multiple execution units (that is, AI tasks).
• an execution unit (AI task) can also be divided into multiple blocks, and the number of blocks is generally equal to the number of cores of the logical operation units in the processor 120.
• each logical operation unit executes one block at a time, so for a single AI task, the controller 121 can schedule its blocks to be executed on multiple logical operation units at the same time.
• the APP can load the AI model to the processor 120 once and have it executed multiple times, or it can send the AI tasks of the AI model to the processor 120 for execution multiple times; no matter which execution mode is used, what the processor 120 sees is AI tasks in execution sequences that need to be executed, so the processor 120 schedules AI models concurrently by scheduling their execution sequences concurrently.
  • the controller 121 can only issue one block of an execution unit to the logical operation unit at a time. After the execution of the logical operation unit is completed, the controller 121 then issues the next block to the logical operation unit.
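The block-dispatch behaviour described above (one block per logical operation unit at a time, the next block issued only after the previous one completes) can be sketched as follows; the function name and the round-robin issue order are illustrative assumptions, not code from this application:

```python
from collections import deque

def dispatch_blocks(num_units, blocks):
    """Issue one block at a time to each logical operation unit; the
    controller sends a unit its next block only after the previous one
    completes. Returns the (unit, block) issue order."""
    pending = deque(blocks)
    issued = []
    i = 0
    while pending:
        # Round-robin: each unit holds at most one in-flight block.
        issued.append((i % num_units, pending.popleft()))
        i += 1
    return issued

# An AI task split into 4 blocks is spread across 4 logical units.
order = dispatch_blocks(4, ["b0", "b1", "b2", "b3"])
```

With more blocks than units, later blocks wrap back onto units that have finished, which is the "issue the next block after completion" behaviour in the text.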
  • the system architecture shown in Figure 1 is only an example of the virtual machine system provided by this application.
• the virtual machine system 100 may include more or fewer parts than shown in the figure, or combine some parts, or split some parts, or arrange the parts differently.
  • the components shown in the figures can be implemented in hardware, software, or a combination of software and hardware, and are not limited in this application.
• AI model execution is a typical parallel computing workload.
  • AI tasks can be divided into multiple blocks, so they can be executed on multiple logical operation units at the same time to improve computing parallelism, improve execution efficiency, and shorten execution time.
• the perception, prediction, and planning services execute their AI models in time order.
  • the results of perception are given to prediction and planning, and the results of prediction are given to planning. Therefore, if all logical operation units of the processor are used for real-time services when sensing, predicting, and planning services are executing AI models, the time for executing the AI models for real-time services can be minimized, thereby best ensuring the real-time nature of the services.
  • All virtual machines have configured computing power specifications.
  • each virtual machine needs to be allocated a time slice using the processor's logical operation unit.
  • the controller in the processor controls the business of each virtual machine.
• the proportion of computing power allocated to non-real-time services must also be guaranteed; therefore, the time slice that the controller in the processor allocates to each virtual machine cannot be too small, that is, the scheduling cycle of the controller in the processor (each virtual machine is assigned a time slice in each scheduling cycle) cannot be too small.
• Figure 2 is a schematic flowchart of a scheduling method according to an embodiment of the present application. As shown in Figure 2, the method includes at least S201 to S205. The method shown in FIG. 2 can be applied to the virtual machine system 100 shown in FIG. 1 .
• S201 The host obtains virtual machine configuration information.
  • the virtual machine configuration information is used to instruct the creation of multiple virtual machines.
  • the virtual machine configuration information includes the computing power ratio of each virtual machine in the multiple virtual machines.
  • the host in this embodiment may be the host 110 in Figure 1 .
  • the host obtains the virtual machine configuration information through a virtual machine manager.
  • the processor driver provides an interface for the virtual machine manager to configure virtual machine configuration information.
  • the virtual machine manager in this embodiment can be the virtual machine manager 111 in Figure 1
  • the processor driver in this embodiment can be the processor driver in the virtual machine manager 111 in Figure 1 program.
  • the virtual machine configuration information may include information such as the number of virtual machines in the virtual machine system, the computing power ratio of each virtual machine among the multiple virtual machines, the resource scheduling period for the multiple virtual machines, and other information.
  • S202 The host creates multiple virtual machines according to the virtual machine configuration information, and the multiple virtual machines share the computing power of the processor.
• the host creates multiple virtual machines according to the number of virtual machines indicated by the virtual machine configuration information, and the multiple virtual machines share the computing power of the processor.
• the host obtains multiple models, and the host creates n execution sequences for the first virtual machine among the multiple virtual machines based on the first model among the multiple models, where n is an integer greater than or equal to 0, and configures a priority for each of the n execution sequences.
  • the priority can include the normal type priority and the real-time type priority. The real-time type priority is higher than the normal type priority.
  • the first model is any one of the multiple models obtained by the host, and the first virtual machine is any one of the multiple virtual machines of the virtual machine system.
• S203 The host sends first configuration information to the processor.
  • the first configuration information is used to indicate the computing power ratio of each virtual machine in the plurality of virtual machines.
• the host sends first configuration information to the processor, where the first configuration information indicates the computing power ratio of each virtual machine among the multiple virtual machines, so that the processor can allocate a time slice to each virtual machine according to the computing power ratios indicated in the first configuration information.
  • the computing power ratio of each virtual machine in multiple virtual machines is the available computing power ratio of each virtual machine set by the NPU.
  • the computing power ratio of each virtual machine can be set separately, and the total computing power ratio is less than or equal to 100%.
  • the host sends second configuration information to the processor, and the second configuration information is used to indicate the priorities of n execution sequences in the first virtual machine, so that the processor indicates according to the second configuration information The priority of the n execution sequences in the first virtual machine, and allocate time slices to the n execution sequences.
  • n execution sequences are execution sequences to be executed in the first virtual machine.
• the virtual machine configuration information obtained by the host also includes the period length for allocating time slices to multiple virtual machines; the host can also send third configuration information to the processor, where the third configuration information indicates the period length for allocating time slices to the multiple virtual machines in the virtual machine system, so that the processor allocates time slices to the multiple virtual machines according to the period indicated by the third configuration information.
  • the cycle length of time slices allocated to multiple virtual machines is greater than or equal to the execution cycle of an execution sequence whose priority is real-time type priority.
  • the period length for allocating time slices to multiple virtual machines is also called a scheduling period.
  • the scheduling period should be no less than the period of real-time priority services.
• within each cycle, real-time priority services are scheduled first, and then normal priority services; if the cycle is too small, real-time priority services can easily use up their allocated time slice.
• normal priority services also hold time slices and can also execute AI models, so a real-time priority service that has exhausted its slice can only wait until the next cycle for a new time slice, making the real-time priority service untimely.
  • the configuration flow chart of the scheduling cycle is shown in Figure 3.
  • the virtual machine manager sends the scheduling cycle to the NPU driver, and the NPU driver sends the scheduling cycle to the NPU controller.
• the NPU controller saves the scheduling cycle for subsequent use.
  • S204 The processor allocates a time slice to each virtual machine according to the first configuration information, and the first configuration information is used to indicate the computing power proportion of each virtual machine in the plurality of virtual machines.
• the processor allocates a time slice to each virtual machine in the virtual machine system according to the computing power ratio indicated by the first configuration information; a virtual machine with a larger computing power ratio is allocated more time slices.
• the first virtual machine in the virtual machine system includes n execution sequences, where n is an integer greater than or equal to 0; the processor receives the second configuration information and, according to the received first configuration information and second configuration information, allocates time slices to s execution sequences among the n execution sequences, where execution sequences with higher priorities among the s execution sequences are allocated more time slices, and s is an integer less than or equal to n.
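The two-level proportional allocation described above can be sketched as follows; the function names, the WRR0..WRR4 labels with 10:8:4:2:1 weights, and the microsecond figures are illustrative assumptions rather than values from this application:

```python
def allocate_vm_slices(period_us, power_ratios):
    """Split one scheduling period into per-VM time slices in proportion
    to each VM's configured computing-power ratio (ratios sum to <= 1)."""
    return {vm: round(period_us * ratio) for vm, ratio in power_ratios.items()}

def allocate_sequence_slices(vm_slice_us, weights):
    """Split a VM's time slice among its normal-priority execution
    sequences by priority weight; real-time sequences receive no slice
    here, since they preempt instead of consuming a reserved slice."""
    total = sum(weights.values())
    return {seq: vm_slice_us * w // total for seq, w in weights.items()}

# Three VMs share one period; VM1's slice is then split over five
# hypothetical normal-priority levels weighted 10:8:4:2:1.
vm_slices = allocate_vm_slices(30_000, {"VM1": 0.5, "VM2": 0.3, "VM3": 0.2})
wrr_slices = allocate_sequence_slices(vm_slices["VM1"],
                                      {"WRR0": 10, "WRR1": 8, "WRR2": 4,
                                       "WRR3": 2, "WRR4": 1})
```

A higher computing-power ratio yields a larger VM slice, and within a VM a higher-weighted sequence gets more of that slice, matching the two statements above.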
• the priorities of the s execution sequences assigned time slices are all common type priorities.
• the processor allocates time slices only to execution sequences whose priority is the common type priority; execution sequences with the real-time type priority are not allocated time slices.
• the processor can directly schedule the execution of tasks in execution sequences with the real-time type priority.
  • the processor receives the third configuration information, and allocates a time slice to each virtual machine in the virtual machine system according to a period length for allocating time slices to multiple virtual machines indicated by the third configuration information.
  • the processor allocates a time slice to each virtual machine in the virtual machine system based on the first configuration information according to the cycle length indicated by the third configuration information.
  • time slice allocated by the processor to the first virtual machine in the virtual machine system satisfies the following relationship:
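The relationship itself does not survive in this text (in the original filing it appears as an image). Given that the third configuration information supplies the scheduling period and the first configuration information supplies the computing power ratio, a hedged reconstruction is:

```latex
T_{\text{VM1}} = P \times r_{\text{VM1}}
```

where \(P\) is the period length for allocating time slices indicated by the third configuration information and \(r_{\text{VM1}}\) is the computing power ratio of the first virtual machine indicated by the first configuration information. This is an inference from the surrounding description, not the formula from the original filing.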
  • the processor allocates a time slice to an execution sequence with a common type priority in the first virtual machine based on the first configuration information and the second configuration information according to the period indicated by the third configuration information.
  • the common type priority is divided into multiple levels.
• the processor allocates time slices to execution sequences with different levels of common type priority in the first virtual machine based on the first configuration information and the second configuration information, according to the period indicated by the third configuration information.
• the execution sequences of common type priority in the first virtual machine can be divided into five levels.
• the ratio of time slices allocated to these five levels, from high to low, is 10:8:4:2:1; an execution sequence whose priority level has a larger ratio is allocated more time slices.
• S205 After the processor allocates a time slice to the first virtual machine, the processor schedules tasks in the first virtual machine.
• the processor schedules tasks in x first execution sequences and q second execution sequences, where the priority of the first execution sequences is the normal type priority, the priority of the second execution sequences is the real-time type priority, and x and q are integers greater than or equal to 0.
  • the priorities of the real-time type in the q second execution sequences also include a first real-time priority and a second real-time priority.
• the first real-time priority is higher than the second real-time priority, and the processor preferentially schedules tasks in execution sequences with the first real-time priority.
• when the processor is executing a first task and a second task arrives, where the first task belongs to any one of the x first execution sequences and the second task belongs to any one of the q second execution sequences, the processor stops executing the first task and schedules the execution of the second task.
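The preemption rule described above (a real-time sequence preempts a normal one; a higher real-time level preempts a lower one) can be sketched as follows; the SP0/SP1/SP2 labels follow the later description of priority levels, while the function itself is an illustrative assumption:

```python
# Smaller number = higher priority within the real-time class (SP0 > SP1 > SP2).
REALTIME = {"SP0": 0, "SP1": 1, "SP2": 2}

def should_preempt(running, arriving):
    """Return True if a task from the `arriving` sequence may preempt a
    task from the `running` sequence: real-time preempts normal, and a
    higher real-time level preempts a lower real-time level."""
    if arriving not in REALTIME:
        return False            # normal-priority (WRR*) tasks never preempt
    if running not in REALTIME:
        return True             # any real-time sequence preempts a normal one
    return REALTIME[arriving] < REALTIME[running]
```

Equal real-time levels do not preempt each other under this sketch, which matches the requirement that the arriving sequence's priority be strictly higher.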
• the x first execution sequences have a time slice allocation ratio, such as 10:8:4:2:1.
• the processor allocates the remaining time slices to the x first execution sequences according to this time slice allocation ratio.
• the remaining time slices are the difference between the sum of the time slices allocated by the processor to the x first execution sequences and the time slices used by the execution of tasks in the q second execution sequences.
• the processor allocates the remaining time slices to the ordinary type priority execution sequences according to the original time slice allocation ratio, so that the time slices of the ordinary priority execution sequences still meet the time slice allocation ratio.
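The deduct-and-redistribute behaviour above can be sketched as follows; the function name and the concrete microsecond values are illustrative assumptions:

```python
def redistribute(allocated_us, realtime_used_us):
    """After real-time sequences have consumed `realtime_used_us` of the
    VM's budget, hand the remainder back to the normal-priority sequences
    in their original allocation ratio."""
    total = sum(allocated_us.values())
    remaining = total - realtime_used_us
    return {seq: remaining * us // total for seq, us in allocated_us.items()}

# Sequences originally hold 1000:800:400:200:100 us (ratio 10:8:4:2:1);
# real-time work consumed 500 us, so each share shrinks proportionally.
after = redistribute({"WRR0": 1000, "WRR1": 800, "WRR2": 400,
                      "WRR3": 200, "WRR4": 100}, 500)
```

The shrunken shares still stand in the 10:8:4:2:1 ratio, which is the property the paragraph above requires.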
• if, while the processor is executing a first task, a second task arrives, where the first task and the second task belong to different execution sequences among the q second execution sequences and the priority of the execution sequence to which the second task belongs is higher than that of the execution sequence to which the first task belongs, then the processor stops executing the first task and schedules the execution of the second task.
• before stopping the scheduling of the first task, the processor obtains the total execution time and the current execution time of the first task, calculates the remaining execution time of the first task, and, when the remaining execution time of the first task is less than a preset threshold, continues to execute the first task.
  • the preset threshold includes switching backup time.
• before triggering task switching, the processor first calculates the remaining execution time of the first task based on its total execution time and current execution time; when the remaining execution time of the first task is less than the preset threshold, the processor still executes the first task to completion, saving system resources and improving service execution efficiency.
• the processor obtains the total execution time and the current execution time of the first task and calculates the remaining execution time of the first task; when the remaining execution time of the first task is less than the product of the preset switching backup time of the first task and a preset multiple, the processor executes the first task.
  • the switching backup time can be configured by the NPU driver to the NPU controller.
• the switching backup time of the logical operation unit differs with different core counts and operating frequencies, and is configured by the NPU driver to the NPU controller; the preset multiple is configured in advance.
• before triggering task switching, the processor first calculates the remaining execution time of the first task based on its total execution time and current execution time; when the remaining execution time of the first task is less than the product of the preset switching backup time and the preset multiple, the processor still executes the first task to completion, saving system resources and improving service execution efficiency.
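The switch-or-finish decision above can be sketched as below; `should_switch`, its parameters, and the sample values are illustrative assumptions rather than values from this application:

```python
def should_switch(total_exec_us, elapsed_us, backup_us, multiple):
    """Preempt only when the running task's remaining execution time is at
    least `backup_us * multiple`; otherwise finishing the task outright is
    cheaper than saving and later restoring its execution state."""
    remaining = total_exec_us - elapsed_us
    return remaining >= backup_us * multiple

# A task with 900 us left is worth switching away from; one with only
# 100 us left (less than 50 us backup time x 3) is simply run to the end.
switch_long = should_switch(1000, 100, 50, 3)
switch_short = should_switch(1000, 900, 50, 3)
```

Comparing the remaining time against the backup cost times a margin reflects the rationale in the text: context backup only pays off when enough work remains.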
• the processor obtains the total execution time and the current execution time of the first task and calculates the remaining execution time; when the remaining execution time of the first task is greater than or equal to the preset threshold, the processor stops executing the first task.
• before triggering task switching, the processor first calculates the remaining execution time of the first task based on its total execution time and current execution time; when the remaining execution time is greater than or equal to the preset threshold, the processor stops executing the first task and starts executing the second task, ensuring the real-time performance of the high-priority second task and improving service execution efficiency.
  • the preset threshold includes switching backup time.
• when the processor stops executing the first task, the execution information of the first task is stored in a backup memory unit.
  • the execution information of the first task may include general registers, special registers, internal cache memory, data in the buffer, etc. of the logical operation unit that executes the first task.
  • the number of backup memory units in a processor satisfies the following relationship:
  • L represents the number of backup memory units in the processor
  • e represents the number of logical operation units in the processor
  • g represents the number of real-time priority levels.
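The formula itself does not survive in this text (it is an image in the original filing). Assuming each logical operation unit needs one backup slot per real-time priority level that can preempt work on it, a hedged reconstruction is:

```latex
L = e \times g
```

where, as defined above, \(L\) is the number of backup memory units in the processor, \(e\) is the number of logical operation units, and \(g\) is the number of real-time priority levels. This is an inference from the surrounding definitions, not the original formula.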
• when the first virtual machine has no time slice left but still has an execution sequence to be executed, the processor can use the time slices of other virtual machines that still have time slices to execute the tasks in the execution sequences of the first virtual machine.
• the processor can use the available or remaining time slices of other virtual machines to execute these tasks, which improves the service processing efficiency of the virtual machine system and maximizes the utilization of the processor's computing power.
• when the real-time task or a higher-priority task is completed, the processor first resumes the unfinished, preempted first task rather than executing a new task, ensuring that preempted tasks are executed first.
• the execution of the preempted task is resumed first because, if a new task were executed without first continuing the previously interrupted task, then when preemption occurs again the corresponding memory space would already hold backed-up context information, leaving no memory space to back up the context of the new task.
• the processor allocates a time slice to each virtual machine in the virtual machine system according to the computing power ratio of each virtual machine among the multiple virtual machines; a virtual machine with a larger computing power ratio is allocated more time slices.
• in this way, multiple virtual machines can share the processor's computing power, which meets the computing power needs of each virtual machine and ensures the rationality of time slice allocation in each virtual machine.
• when high-priority tasks arrive, they are executed first, which ensures the real-time execution of high-priority tasks.
  • the following takes the processor in the virtual machine system as an NPU as an example to introduce the scheduling method provided by this application.
  • Figure 4 is a schematic flowchart of a scheduling method according to an embodiment of the present application. As shown in Figure 4, the method at least includes S401 to S407. The method shown in FIG. 4 can be applied to the virtual machine system 100 shown in FIG. 1 .
• S401 The host obtains the virtual machine configuration information.
  • S401 can refer to S201 and will not be described again here.
  • S402 The host creates multiple virtual machines based on the virtual machine configuration information.
  • the virtual machine configuration information obtained by the host includes information such as the number of created virtual machines and the computing power ratio of each virtual machine among the multiple created virtual machines.
  • the flow chart for a host to create multiple virtual machines is shown in Figure 5.
  • the virtual machine manager (hypervisor) in the host creates multiple virtual machines based on the virtual machine configuration information.
• the hypervisor calls the interface of the NPU driver to inform the NPU driver of the virtual machine identifier (identity document, ID) and computing power ratio of each newly added virtual machine; the NPU driver informs the virtual NPU driver (vNPU driver) in the virtual machine of the virtual machine ID, and the vNPU driver saves the virtual machine ID information.
  • the NPU driver configures the virtual machine ID and computing power ratio to the NPU controller.
  • the NPU controller saves information such as the virtual machine ID and computing power ratio.
• S403 The host obtains the target model.
  • the APP in any virtual machine in the virtual machine system obtains the target model and loads the target model.
  • the target model may be an AI model.
  • S404 The host creates multiple execution sequences according to the target model.
• to execute the target model, execution sequences need to be created.
  • FIG. 6 is a flow chart for creating an execution sequence provided by an embodiment of the present application.
  • the virtual machine manager calls the interface of memory management (runtime) to create an execution sequence and configures the priority for the created execution sequence.
  • the memory management calls the interface of the virtual NPU driver in the virtual machine to apply for the execution sequence ID.
• the virtual NPU driver applies for an execution sequence ID from the NPU driver in the host.
  • the virtual NPU driver also sends the virtual machine ID to the NPU driver.
  • the NPU driver assigns an execution sequence ID to the virtual machine corresponding to the virtual machine ID and returns it to the virtual NPU driver.
  • the NPU driver configures the virtual machine ID and execution sequence ID to the NPU controller at the same time.
  • the NPU controller saves the virtual machine ID and execution sequence ID information, so the NPU controller has information about the execution sequence ID of each virtual machine.
  • the APP does not know whether it is running on a virtual machine or a host.
  • the memory management interface provided to the APP is the same, and the priority definition of the execution sequence is also the same.
  • the priority of the execution sequence can be divided into real-time type priority and common type priority.
  • the real-time type priority and common type priority can be further subdivided into multiple levels.
  • real-time priority can be divided into three levels: SP0, SP1 and SP2.
  • the priority of SP0 is higher than the priority of SP1
  • the priority of SP1 is higher than the priority of SP2
• the common type priority can be divided into five levels: WRR0, WRR1, WRR2, WRR3 and WRR4.
• WRR0 has a higher priority than WRR1, WRR1 has a higher priority than WRR2, WRR2 has a higher priority than WRR3, and WRR3 has a higher priority than WRR4;
  • the execution sequence of real-time type priority can preempt the execution sequence of common type priority, and the execution sequence of high-level real-time type priority can preempt the execution sequence of low-level real-time type priority;
• the schematic diagram of priority preemption is shown in Figure 7: the execution sequence with priority SP0 can preempt the execution sequences with priorities SP1 and SP2.
  • the execution sequence with priority SP1 can preempt the execution sequence with priority SP2.
• the level of a common type priority is reflected in the size of its time slice.
  • S405 The host sends resource configuration information to the NPU.
  • Multiple virtual machines in the virtual machine system share the resources of the NPU, and the resource configuration information sent by the host to the NPU is used to instruct the NPU to allocate resources to the multiple virtual machines.
  • the resource configuration information sent by the host to the NPU may include the computing power ratio of each virtual machine among the multiple virtual machines, the priority of the execution sequence in each virtual machine, and the cycle of resource allocation for each virtual machine. Length and other information.
  • S406 The NPU allocates resources to multiple virtual machines according to the resource configuration information.
  • the NPU controller allocates time slices to each virtual machine.
• the unit of the time slice is microseconds (us), and the resource allocation period indicated by the resource configuration information can correspond to 30 hertz (Hz).
  • the NPU controller allocates time slices to the execution sequence of the common type priority according to the priority of the execution sequence in the first virtual machine.
  • the ratio of time slices allocated in the first virtual machine to execution sequences of common type priorities WRR0 to WRR4 may be 10:8:4:2:1. Since the execution sequence of the real-time type priority can preempt the time slice of the execution sequence of the normal type priority, the execution sequence of the real-time type priority is not allocated a time slice.
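As a concrete illustration of the 10:8:4:2:1 ratio above, the following is a minimal sketch, not taken from the patent, of splitting one virtual machine's time-slice budget across normal-priority execution sequences by WRR weight. All names and the 25,000 us budget are assumptions made for illustration.

```python
# Hypothetical sketch: distribute a VM's time-slice budget across the
# normal-priority levels WRR0..WRR4 in proportion to the example weights.

def allocate_wrr_slices(vm_budget_us, weights=None):
    """Split vm_budget_us (microseconds) proportionally to the WRR weights."""
    if weights is None:
        weights = {"WRR0": 10, "WRR1": 8, "WRR2": 4, "WRR3": 2, "WRR4": 1}
    total = sum(weights.values())
    return {level: vm_budget_us * w // total for level, w in weights.items()}

slices = allocate_wrr_slices(25_000)  # a hypothetical 25,000 us VM budget
# WRR0 receives 10/25 of the budget, WRR4 receives 1/25.
```

Real-time-priority sequences are deliberately absent from the table: as the text states, they preempt rather than hold a budget of their own.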
  • after tasks of real-time-type priority execute, the NPU controller deducts time slices from the execution sequences of normal-type priority according to the normal-type-priority time slice ratio; the total number of time slices deducted equals the time slices used by the execution sequences of real-time-type priority, so that the time slices of the execution sequences of normal-type priority still satisfy the preset ratio.
  • the NPU controller reallocates time slices to each virtual machine.
  • if any virtual machine does not use up its time slices in the previous resource allocation cycle, the remaining time slices are not carried into the next resource allocation cycle. If the time slices used by any virtual machine exceed the allocated amount, the overrun is caused by the execution time of the last task in an execution sequence; since a task's execution time is much smaller than a virtual machine's time slice, the excess is not deducted in the next resource allocation cycle. Within a virtual machine, the unused time slices of execution sequences of normal-type priority are not carried into the next resource allocation cycle, but time slices over-used in the previous resource allocation cycle are deducted from the next time slice allocation.
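The deduction and carry-over rules above can be sketched as simple per-cycle accounting. This is a hypothetical illustration, not the patent's implementation; all names are assumptions.

```python
# Sketch: real-time usage is deducted from normal-priority sequences in
# proportion to their slices, unused remainder is dropped at cycle end,
# and only over-use carries into the next cycle as a deduction.

def settle_cycle(normal_slices, rt_used_us):
    """normal_slices: {sequence: remaining_us}; rt_used_us: real-time usage."""
    total = sum(normal_slices.values())
    return {seq: us - (rt_used_us * us // total if total else 0)
            for seq, us in normal_slices.items()}

def next_cycle_allocation(allocated, settled):
    # min(balance, 0): a positive leftover is NOT carried over, but a
    # negative balance (over-use) is deducted from the next allocation.
    return {seq: allocated[seq] + min(settled[seq], 0) for seq in allocated}
```

For example, 3,000 us of real-time usage against sequences holding 10,000 us and 5,000 us is deducted as 2,000 us and 1,000 us, preserving their 2:1 ratio.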
  • S407 The NPU schedules tasks in multiple virtual machines.
  • the NPU controller in the NPU schedules the execution sequence of tasks in the virtual machine system.
  • the NPU controller first schedules the real-time-priority execution sequences in virtual machines that have time slices and the normal-priority execution sequences that have time slices; if these execution sequences have no tasks to schedule, it then schedules other execution sequences, for example, execution sequences without time slices.
  • each type of execution sequence is scheduled in the order of high-level real-time priority, low-level real-time priority, and then normal priority; that is, after all execution sequences with high-level real-time-type priorities have been scheduled, the execution sequences with low-level real-time-type priorities are scheduled, and finally the normal-type execution sequences are scheduled.
  • when the NPU controller schedules other execution sequences, it waits until the currently running task is completed before switching to tasks in other execution sequences, and does not switch the executing task/block midway.
  • when each block finishes executing, the NPU controller can obtain the time slices used to run the block, and deducts them from the time slices of the corresponding execution sequence and virtual machine.
  • in a virtual machine with time slices, when a task with a normal-type priority is being executed and a task of a real-time-priority execution sequence arrives, the scheduling of the normal-type-priority task is stopped and the scheduling of the real-time-type-priority task is started. After the task scheduling of the real-time-priority execution sequence in the virtual machine is completed, the time slices used by the real-time-priority tasks are deducted from the time slices of the virtual machine's normal-priority execution sequences.
  • to do this, the NPU controller needs to interrupt the executing block and instead execute the block of the real-time (higher) priority task; this process can be called preemption.
  • before preemption, the NPU controller needs to back up the current running on-site information of the block.
  • the on-site information can include the general register information, special register information, and the internal cache and buffer information of the logical operation unit that executes the block, etc.
  • the buffer and cache space in the logical operation unit are relatively large, so an equally large memory space is required to back up this information.
  • the memory space that backs up all the on-site information of one logical operation unit is called a backup memory unit.
  • the number of backup memory units in the NPU satisfies L = e * g, where L represents the number of backup memory units in the NPU, e represents the number of logical operation units in the NPU, and g represents the number of real-time-type priority levels.
  • for example, the real-time priority has three levels, namely SP0, SP1 and SP2, so the value of g is 3.
  • in other words, the number of backup memory units for each logical operation unit equals the number of real-time-type priority levels; each logical operation unit in the NPU therefore has 3 backup memory units.
  • each backup memory unit is used to store the general registers, special registers, internal cache, and buffer data of a logical operation unit, that is, the on-site information of the block running on the logical operation unit.
  • Figure 8 is a schematic diagram of a backup memory unit provided by an embodiment of the present application.
  • the NPU includes multiple logical operation units (logical operation unit 0 to logical operation unit 3), which are controlled by the controller.
  • each logical operation unit includes multiple internal caches (buffers) and multiple buffers (cache), etc.
  • backup memory unit 0 to backup memory unit 3 can each back up all the on-site information of one logical operation unit (all general registers, special registers, buffers, and caches).
  • the on-site information of the preempted block in the logical operation unit is backed up to the corresponding space.
  • if an execution sequence with a normal-type priority is preempted by an execution sequence with real-time-type priority SP0, SP1, or SP2, its on-site information is backed up in backup memory unit 2; if an execution sequence with real-time-type priority SP2 is preempted by an execution sequence with a higher priority (SP0 or SP1), its on-site information is backed up in backup memory unit 1; when an execution sequence with real-time-type priority SP1 is preempted by an execution sequence with real-time-type priority SP0, its on-site information is backed up in backup memory unit 0.
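The sizing rule L = e * g and the per-level backup-unit mapping described above can be sketched as follows. The function names and the string keys are illustrative assumptions; the patent only fixes the relation and the SP0/SP1/SP2 mapping.

```python
# Sketch of backup-memory-unit sizing (L = e * g) and selection per
# preemption level, following the mapping in the text above.

REALTIME_LEVELS = ("SP0", "SP1", "SP2")   # g = 3 levels

def backup_unit_count(num_logical_units):
    # L = e * g: e logical operation units, g real-time priority levels.
    return num_logical_units * len(REALTIME_LEVELS)

def backup_unit_for(preempted_priority):
    # Normal (WRR) work preempted by any SP level -> unit 2;
    # SP2 preempted by SP0/SP1 -> unit 1; SP1 preempted by SP0 -> unit 0.
    return {"WRR": 2, "SP2": 1, "SP1": 0}[preempted_priority]
```

With the four logical operation units of Figure 8, this gives 12 backup memory units in total, far fewer than one per execution sequence.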
  • the NPU driver allocates the total space of the backup memory unit according to the number of logical operation units in the NPU, and the NPU driver configures the backup memory unit to the NPU controller when the NPU starts.
  • when the NPU controller detects that a task has arrived in an execution sequence with a real-time-type priority or with a higher-level real-time-type priority, the NPU controller configures the control register of the logical operation unit to stop it from executing the current block. The logical operation unit stops each of its internal execution pipelines and returns a signal to the NPU controller, informing the NPU controller that it has stopped executing the current block. The NPU controller then configures the control register of the logical operation unit to enable the backup function of the logical operation unit.
  • the logical operation unit starts the memory transfer engine (MTE) and backs up the data in its registers, caches, and buffers to the designated backup memory unit.
  • the logic operation unit returns a signal to the NPU controller to inform the NPU controller that the backup is completed.
  • the NPU controller records the information of the stopped execution sequence, task, and block.
  • the NPU controller then configures the block of the real-time (higher) priority task to the logical operation unit, and the logical operation unit executes the currently configured block of the real-time-type priority task.
  • when the real-time (higher) priority task completes, the block of the preempted task is restored first.
  • the preempted task is resumed first because, if a new task were executed instead of continuing the previously interrupted task, then when preemption occurs again the corresponding memory space would already hold backed-up on-site information, and there would be no memory space left to back up the on-site information of the new task.
  • the preemption recovery procedure may include: the NPU controller configures the registers of the logical operation unit so that the logical operation unit enables the on-site recovery function; the logical operation unit starts the MTE and restores data from the designated backup memory unit to the registers.
  • the logical operation unit returns a signal to the NPU controller to inform the NPU controller that the recovery is complete.
  • the NPU controller configures the logical operation unit so that the logical operation unit starts executing the preempted block, that is, continues executing the interrupted block from the restored on-site environment.
  • at the same time, the NPU controller clears the stopped execution sequence, task, and block information recorded for the backup memory unit.
  • preemption is expensive because it requires backing up and restoring the on-site environment. If the remaining execution time of the current block is less than the backup time, letting the block complete is better for task scheduling and execution efficiency than preempting it. To decide this, the NPU controller needs to know the total execution time and the elapsed execution time of the currently executing task or block, so that it knows how long the current block still needs before it ends and whether preemption can be avoided. Therefore, execution time statistics for tasks or blocks need to be added.
  • a performance statistics counter t_cnt is added to each logical operation unit in the NPU to count the number of execution cycles of a block, and a field is added to each task to store the number of execution cycles of the task/block. This execution cycle count is based on the actual execution time of the block, so the number of execution cycles is 0 when the model is loaded.
  • when a logical operation unit starts executing a block, it sets the performance statistics counter t_cnt to 0. During execution, the logical operation unit increments t_cnt by 1 each cycle. While the block is executing, the NPU controller can read the number of cycles the block has executed; when the block execution ends, the NPU controller can obtain the total number of cycles executed by the block.
  • the NPU controller controls the scheduling of task/block execution.
  • after a block completes, the NPU controller reads the t_cnt value of the logical operation unit, that is, the execution time of the block, and accumulates it into the cycle-count field of the corresponding task in the execution queue, thereby obtaining the total number of execution cycles of the task. Dividing the total number of execution cycles by the number of blocks gives the average execution time of a block.
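The t_cnt bookkeeping just described can be sketched as follows. The class and field names are assumptions for illustration; only t_cnt itself comes from the text.

```python
# Sketch: each block's cycle count (t_cnt) is summed into its task's
# total, and the average block time is total cycles / number of blocks.

class TaskStats:
    def __init__(self):
        self.total_cycles = 0   # 0 when the model is loaded, per the text
        self.blocks_done = 0

    def record_block(self, t_cnt):
        # t_cnt as read from the logical operation unit when a block ends.
        self.total_cycles += t_cnt
        self.blocks_done += 1

    def avg_block_cycles(self):
        return self.total_cycles // self.blocks_done if self.blocks_done else 0

stats = TaskStats()
for t_cnt in (1_200, 1_400, 1_000):   # hypothetical per-block readings
    stats.record_block(t_cnt)
# stats.avg_block_cycles() -> 1200
```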
  • the NPU controller needs to re-count the number of task execution cycles.
  • the NPU controller only re-counts and updates the total cycle number of tasks when the AI model is run for the first time, or when the frequency and number of cores of the logical operation unit in the NPU are adjusted.
  • the NPU controller configures a preset threshold. If the remaining time of the currently executed block is less than the preset threshold, no preemption will be performed. After the current block is executed, the block of the newly arrived task will be executed.
  • the configured preset threshold may be a multiple of the switching backup time. If the remaining execution time of the currently executed block is less than the preset multiple of the switching backup time, no preemption will be performed.
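The threshold test above amounts to a single comparison. Here is a hedged sketch of that decision rule; the parameter names and the default multiple of 2 are assumptions, not values from the patent.

```python
# Sketch of preemption avoidance: preempt only when the block's remaining
# time is at least a configured multiple of the switching backup time.

def should_preempt(total_block_cycles, executed_cycles, backup_cycles,
                   multiple=2):
    remaining = total_block_cycles - executed_cycles
    return remaining >= backup_cycles * multiple

# A nearly finished block is allowed to run to completion, avoiding the
# backup/restore cost; a block with plenty of work left is preempted.
```

For instance, with 500 cycles remaining and a 400-cycle backup time, preemption is skipped; with 8,000 cycles remaining, preemption proceeds.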
  • the switching backup time can be configured by the NPU driver to the NPU controller.
  • the switching backup time of the logical operation unit is different when it has different core numbers and different operating frequencies, and is configured by the NPU driver to the NPU controller.
  • when a model is run for the first time, the task information in the model queue does not yet include task execution statistics, so the controller does not consider preemption avoidance and preempts directly.
  • after the first run, the task execution statistics are available, and the NPU controller can then perform preemption avoidance operations.
  • when neither the real-time-type-priority execution sequences in virtual machines with time slices nor the other execution sequences with time slices have tasks to schedule, the NPU controller performs idle processing.
  • in idle processing, the NPU controller schedules all execution sequences in virtual machines without time slices, as well as the execution sequences without time slices in virtual machines that have time slices; during this scheduling, the priorities of the execution sequences are not differentiated.
  • Each execution sequence is executed for a preset time before other execution sequences are scheduled.
  • the execution time of the tasks in the execution sequence does not occupy the time slice allocated by the virtual machine.
  • if a virtual machine with time slices is executing a task from an execution sequence of another virtual machine and a task arrives in one of its own real-time-type-priority execution sequences, the scheduling of the current task is stopped and the task in the newly arrived real-time-type-priority execution sequence is started; if a task arrives in one of its execution sequences with a normal-type priority, the virtual machine is scheduled to execute the task in that execution sequence after the current task completes.
  • if a virtual machine has no time slices but its real-time-priority execution sequences have tasks to be scheduled, the resource allocation of the virtual machine is abnormal.
  • one such situation is that all execution sequences in the virtual machine have real-time priorities, or the number of real-time-priority execution sequences is much greater than the number of normal-type-priority execution sequences, causing the computing power allocated to the virtual machine to be too small; the solution is to increase the virtual machine's computing power ratio.
  • another situation is that the execution sequences of normal-type priority have used up the time slices in the virtual machine, leaving the virtual machine with no time slices; the solution can be to move the normal-type-priority execution sequences to another virtual machine or to increase the virtual machine's computing power.
  • the technical solution provided by this application enables each virtual machine in the virtual machine system to share NPU computing power according to the configured computing power ratio. Real-time-type tasks are executed first and can use all NPU logical operation units, which guarantees the real-time performance of real-time tasks. When the remaining execution time of the currently executing task's block is less than the preset threshold, the NPU controller does not perform a preemption switch, which improves NPU usage efficiency and the real-time performance of real-time tasks.
  • Figure 9 is a schematic structural diagram of a scheduling device according to an embodiment of the present application.
  • the device 900 may include a distribution module 901, an execution module 902, a receiving module 903, a scheduling module 904, an acquisition module 905, a calculation module 906 and a storage module 907.
  • Apparatus 900 may be used to implement the operations implemented by the processor in FIGS. 2 and 4 .
  • Any module in the embodiments of this application may be implemented in whole or in part by software and/or hardware.
  • the device 900 can be used to implement the method shown in Figure 2 above.
  • the allocation module 901 is used to implement S204
  • the execution module 902 is used to implement S205.
  • the device 900 may also include a scheduling module, and the device 900 in this implementation may be used to implement the method shown in Figure 4 above.
  • the allocation module 901 is used to implement S406, and the scheduling module is used to implement S407.
  • Figure 10 is a schematic structural diagram of a scheduling device according to another embodiment of the present application.
  • the device 1000 may include an acquisition module 1001, a creation module 1002, a sending module 1003 and a configuration module 1004.
  • the apparatus 1000 may be used to implement the operations implemented by the host in FIGS. 2 and 4 .
  • Any module in the embodiments of this application may be implemented in whole or in part by software and/or hardware.
  • the device 1000 can be used to implement the method shown in Figure 2 above.
  • the acquisition module 1001 is used to implement S201
  • the creation module 1002 is used to implement S202
  • the sending module 1003 is used to implement S203.
  • the device 1000 can be used to implement the method shown in Figure 4 above.
  • the acquisition module 1001 is used to implement S401 and S403
  • the creation module 1002 is used to implement S402 and S404
  • the sending module 1003 is used to implement S405.
  • Figure 11 is a schematic structural diagram of a scheduling device provided by an embodiment of the present application.
  • the device 1100 shown in Figure 11 can be used to perform the method described in any of the aforementioned embodiments.
  • the device 1100 in this embodiment includes: a memory 1101, a processor 1102, a communication interface 1103 and a bus 1104.
  • the memory 1101, the processor 1102, and the communication interface 1103 implement communication connections between each other through the bus 1104.
  • the memory 1101 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM).
  • the memory 1101 may store programs, and when the program stored in the memory 1101 is executed by the processor 1102, the processor 1102 may be used to perform various steps of the methods shown in FIGS. 2 to 4 .
  • the processor 1102 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs, so as to implement the scheduling method of the method embodiments of the present application.
  • the processor 1102 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, various steps of the methods of various embodiments of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1102 .
  • the above-mentioned processor 1102 can also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (ASIC), an off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, Discrete gate or transistor logic devices, discrete hardware components.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1101.
  • the processor 1102 reads the information in the memory 1101 and, in combination with its hardware, completes the functions required to be performed by the methods in the embodiments of the present application, for example, the steps/functions of the embodiments shown in Figures 2 to 4.
  • the communication interface 1103 may use, but is not limited to, a transceiver device such as a transceiver to implement communication between the device 1100 and other devices or communication networks.
  • Bus 1104 may include a path that carries information between various components of device 1100 (eg, memory 1101, processor 1102, communication interface 1103).
  • the device 1100 shown in the embodiment of the present application may be an electronic device, or may also be a chip configured in the electronic device.
  • the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • volatile memory can be random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "at least one" refers to one or more, and "plurality" refers to two or more.
  • "at least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • the size of the sequence numbers of the above-mentioned processes does not imply their order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical functional division; in actual implementation, there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory, random access memory, a magnetic disk, an optical disc, and other media that can store program code.

Abstract

Embodiments of the present application provide a scheduling method and related apparatus. A virtual machine system includes a host. The host creates multiple virtual machines according to acquired virtual machine configuration information, and the multiple virtual machines share the computing power of a processor. The processor allocates a time slice to each of the multiple virtual machines according to first configuration information received from the host, where the first configuration information is used to indicate the computing power ratio of each of the multiple virtual machines; a virtual machine with a larger computing power ratio is allocated more time slices. This can satisfy the computing power requirements of each virtual machine and ensure reasonable time slice allocation within each virtual machine. The embodiments provided by this application can be used in intelligent computing devices such as smart vehicles or new energy vehicles.

Description

Scheduling method and related apparatus
Technical field
The present application relates to the field of information technology, and in particular to a scheduling method and related apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain judgment results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
However, the computing resources of smart vehicles and other intelligent terminals are limited, which poses challenges for the practical application of artificial intelligence.
Summary
The present application provides a scheduling method and related apparatus, which satisfy the computing power requirements of each virtual machine and ensure reasonable time slice allocation within each virtual machine.
In a first aspect, the present application provides a scheduling method applied to a virtual machine system. The virtual machine system includes multiple virtual machines, and the multiple virtual machines share the computing power of a processor. The method includes: the processor allocates a time slice to each of the multiple virtual machines according to first configuration information, the first configuration information being used to indicate the computing power ratio of each of the multiple virtual machines; when a first virtual machine among the multiple virtual machines has the time slice and a first task is being executed and a second task arrives, the processor stops executing the first task and schedules the second task for execution, the priority of the execution sequence to which the second task belongs being higher than the priority of the execution sequence to which the first task belongs.
In this method, the processor allocates time slices to each virtual machine in the virtual machine system according to the computing power ratio of each of the multiple virtual machines; a virtual machine with a larger computing power ratio is allocated more time slices. This enables multiple virtual machines to share the processor's computing power and satisfies the computing power requirements of each virtual machine. Moreover, when a higher-priority task arrives, it is executed first, which guarantees the real-time performance of high-priority task execution.
In a possible implementation, the first virtual machine among the multiple virtual machines includes n execution sequences, n being an integer greater than or equal to 0. The method further includes: the processor allocates the time slices to s of the n execution sequences according to the first configuration information and second configuration information, the second configuration information being used to indicate the priorities of the n execution sequences in the first virtual machine, s being an integer less than or equal to n.
In this implementation, the first virtual machine is any one of the multiple virtual machines in the virtual machine system and includes n execution sequences. The processor allocates time slices to s of the n execution sequences according to the priorities of the n execution sequences in the first virtual machine; among the s execution sequences, an execution sequence with a higher priority is allocated more time slices. This realizes a two-level allocation of time slices, satisfies the computing power requirements of each execution sequence in the first virtual machine, and ensures that the time slices allocated to each execution sequence are reasonable.
In a possible implementation, the priorities of the execution sequences in the first virtual machine include normal-type priorities and real-time-type priorities, a real-time-type priority being higher than a normal-type priority; the priorities of the s execution sequences are normal-type priorities.
In this implementation, the priorities of the execution sequences in the first virtual machine include normal-type priorities and real-time-type priorities. According to the priorities of the n execution sequences in the first virtual machine, the processor allocates time slices only to those execution sequences whose priority is a normal-type priority, while execution sequences with real-time-type priorities are scheduled as actually needed: as long as the first virtual machine has time slices, execution sequences with real-time-type priorities can be scheduled, which guarantees the real-time performance of real-time services.
In a possible implementation, the method further includes: the processor allocates the time slice to each virtual machine according to the first configuration information and third configuration information, the third configuration information being used to indicate the period length with which the processor allocates time slices to the multiple virtual machines.
In this implementation, the processor allocates time slices to the multiple virtual machines in the virtual machine system according to the period indicated by the third configuration information, which satisfies the computing power requirements of each virtual machine while ensuring that the services in each virtual machine can be completed on time and as needed.
In a possible implementation, the period length with which the processor allocates time slices to the multiple virtual machines is greater than or equal to the period length of the execution period of an execution sequence whose priority is a real-time-type priority.
In this implementation, the period length with which the processor allocates time slices to the multiple virtual machines is greater than or equal to the period length of the execution period of an execution sequence with a real-time-type priority, so that time slices are reallocated to the multiple virtual machines only after the real-time tasks in a virtual machine have completed, which guarantees the real-time performance of real-time tasks.
In a possible implementation, the method further includes: the processor receives the first configuration information.
In a possible implementation, the method further includes: when a first virtual machine among the multiple virtual machines has time slices and there are x first execution sequences and q second execution sequences to be executed, the processor preferentially schedules the tasks in the q second execution sequences, the priority of the first execution sequences being a normal-type priority, the priority of the second execution sequences being a real-time-type priority, x and q being integers greater than or equal to 0. In this implementation, when the first virtual machine includes both pending execution sequences with normal-type priorities and pending execution sequences with real-time-type priorities, the processor preferentially schedules the tasks in the pending execution sequences with real-time-type priorities, which guarantees the real-time performance of real-time tasks.
In a possible implementation, the processor preferentially scheduling the tasks in the q second execution sequences includes: the real-time-type priorities of the q second execution sequences further include a first real-time priority and a second real-time priority, the first real-time priority being higher than the second real-time priority; the processor preferentially schedules the tasks of the first real-time priority in the q second execution sequences.
In this implementation, when the first virtual machine has multiple pending execution sequences with real-time-type priorities, the processor preferentially schedules the tasks in the execution sequences with higher-level real-time-type priorities. In many cases, executing the tasks in an execution sequence with a lower-level real-time-type priority requires the execution results of the tasks in an execution sequence with a higher-level real-time-type priority, so executing the tasks in the higher-level sequences first helps guarantee the real-time performance of real-time tasks and improves their execution efficiency.
In a possible implementation, the first task belongs to any one of the x first execution sequences, and the second task belongs to any one of the q second execution sequences.
In this implementation, in the first virtual machine, when a task with a normal-type priority is being executed and a task with a real-time-type priority arrives, the processor stops scheduling the normal-type-priority task and starts scheduling the real-time-type-priority task, which guarantees the real-time performance of real-time tasks.
In a possible implementation, if a time slice allocation ratio exists among the x first execution sequences, then after the tasks in the q second execution sequences have completed, the processor allocates the remaining time slices to the x first execution sequences according to the time slice allocation ratio, the remaining time slices being the difference between the sum of the time slices allocated by the processor to the x first execution sequences and the time slices used by the execution of the tasks in the q second execution sequences.
In this implementation, after the tasks with real-time-type priorities have completed, the processor allocates the remaining time slices to the execution sequences with normal-type priorities according to the original time slice allocation ratio, so that the time slices of the normal-priority execution sequences still satisfy the time slice allocation ratio.
In a possible implementation, the first task and the second task belong to different execution sequences among the q second execution sequences, and the priority of the execution sequence to which the second task belongs is higher than the priority of the execution sequence to which the first task belongs.
In this implementation, in the first virtual machine, when a task with a lower-level real-time-type priority is being executed and a task with a higher-level real-time-type priority arrives, the processor stops scheduling the lower-level real-time-priority task and starts scheduling the higher-level real-time-priority task, which ensures that the real-time-type-priority tasks in the first virtual machine are executed in order of priority from high to low, guarantees the real-time performance of real-time tasks, and improves their execution efficiency.
In a possible implementation, the method further includes: the processor obtains the total execution time of the first task and its current execution time; the processor calculates the remaining execution time of the first task according to the total execution time of the first task and the current execution time; when the remaining execution time of the first task is less than a preset threshold, the processor executes the first task.
In this implementation, in the first virtual machine, when the first task is being executed and a second task arrives, before triggering a task switch the processor first calculates the remaining execution time of the first task from its total execution time and current execution time. When the remaining execution time of the first task is less than a preset threshold, the processor still executes the first task, ensuring that the current first task completes, so as to save system resources and improve service execution efficiency.
In a possible implementation, the method further includes: the processor obtains the total execution time of the first task and its current execution time; the processor calculates the remaining execution time of the first task according to the total execution time of the first task and the current execution time; when the remaining execution time of the first task is less than the product of a preset switching backup time of the first task and a preset multiple, the processor executes the first task.
In this implementation, in the first virtual machine, when the first task is being executed and a second task arrives, before triggering a task switch the processor first calculates the remaining execution time of the first task from its total execution time and current execution time. When the remaining execution time of the first task is less than the product of the first task's preset switching backup time and a preset multiple, the processor still executes the first task, ensuring that the current first task completes, so as to save system resources and improve service execution efficiency.
In a possible implementation, the method further includes: the processor obtains the total execution time of the first task and its current execution time; the processor calculates the remaining execution time of the first task according to the total execution time of the first task and the current execution time; when the remaining execution time of the first task is greater than or equal to a preset threshold, the processor stops executing the first task.
In this implementation, in the first virtual machine, when the first task is being executed and a second task arrives, before triggering a task switch the processor first calculates the remaining execution time of the first task from its total execution time and current execution time. When the remaining execution time of the first task is greater than or equal to the preset threshold, the processor stops executing the first task and starts executing the second task, so as to guarantee the real-time performance of the higher-priority second task and improve service execution efficiency.
In a possible implementation, after the processor stops scheduling the first task, the method further includes: storing the execution information of the first task in a backup memory unit.
In this implementation, after the processor stops scheduling the first task, it stores the execution information of the first task in a backup memory unit. When the processor schedules the first task again, it retrieves the execution information of the first task from the backup memory unit and continues executing the first task, which improves the execution efficiency of the first task.
In a possible implementation, the execution information of the first task includes one or more of the following: the data in the general registers, special registers, internal cache, and buffers of the logical operation unit executing the first task.
In this implementation, when the processor stops scheduling the first task, it stores the data in the general registers, special registers, internal cache, and buffers of the logical operation unit executing the first task in a backup memory unit, which preserves sufficient data for re-executing the first task and improves the execution efficiency of the first task.
In a possible implementation, the number of backup memory units in the processor satisfies the relation L = e * g, where L represents the number of backup memory units in the processor, e represents the number of logical operation units in the processor, and g represents the number of real-time-type priority levels.
In this implementation, the number of backup memory units in the processor equals the product of the number of logical operation units in the processor and the number of real-time-type priority levels, which greatly saves memory space compared with configuring one backup memory unit for each execution sequence in the virtual machine system.
In a possible implementation, the method further includes: when the second task has completed execution, the processor executes the unfinished first task.
In this implementation, when a real-time task or a higher-priority task finishes executing, the preempted task is processed first rather than a new task, ensuring that the preempted task is executed first.
In a possible implementation, the method further includes: when the first virtual machine has no time slices and there are execution sequences waiting to be executed, the processor schedules virtual machines other than the first virtual machine that have time slices to execute the tasks of the execution sequences in the first virtual machine.
In this implementation, when the first virtual machine has no time slices but has tasks to be executed, the processor schedules other idle virtual machines to execute the tasks, which improves the service processing efficiency of the virtual machine system and maximizes the utilization of the processor's computing power.
在一种可能的实现方式中,所述方法还包括:所述处理器根据第一配置信息为所述每个虚拟机分配的时间片满足如下关系式:
Y=1000*1000/t*m*p
其中,Y表示为所述每个虚拟机分配的时间片;t表示调度周期,在每个所述调度周期,所述处理器为所述每个虚拟机分配所述时间片;m表示所述处理器中逻辑运算单元的数量;p表示所述每个虚拟机的算力比例。
第二方面,本申请提供一种调度方法,应用于虚拟机系统,所述虚拟机系统包含主机,所述方法包括:所述主机获取虚拟机配置信息,所述虚拟机配置信息用于指示创建多个虚拟机,所述虚拟机配置信息包括所述多个虚拟机中每个虚拟机的算力比例;所述主机根据所述虚拟机配置信息,创建所述多个虚拟机,所述多个虚拟机共享处理器的算力;所述主机向所述处理器发送第一配置信息,所述第一配置信息用于指示所述多个虚拟机中每个虚拟机的算力比例。
本方法中,虚拟机系统中包含主机,主机根据获取的虚拟机配置信息创建多个虚拟机,由创建的多个虚拟机共同完成虚拟机系统中的任务,并向处理器发送第一配置信息,用于指示处理器根据多个虚拟机中每个虚拟机的算力比例,提高了创建多个虚拟机的合理性,节约了虚拟机系统的资源,提高了虚拟机系统的执行效率。
在一种可能的实现方式中,所述方法还包括:所述主机获取多个模型;所述主机根据所述多个模型中的第一模型为所述多个虚拟机中的第一虚拟机创建n个执行序列,n为大于或等于0的整数;所述主机为所述n个执行序列中的每个执行序列配置优先级;向所述处理器发送第二配置信息,所述第二配置信息用于指示所述第一虚拟机中的n个执行序列的优先级。
该实现方式中,主机根据获取的多个模型中的第一模型为第一虚拟机创建n个执行序列,为n个执行序列中的每个执行序列配置优先级,并向处理器发送第二配置信息,用于指示第一虚拟机中n个执行序列的优先级,该n个执行序列能够并行处理,n个执行序列中的每个执行序列能够按照优先级等级顺序从高到低执行,提高了第一模型的执行效率,保证了第一模型的实时性要求。
在一种可能的实现方式中,所述优先级包含普通类型的优先级和实时类型的优先级,所述实时类型的优先级高于所述普通类型的优先级。
在一种可能的实现方式中,所述虚拟机配置信息还包括为所述多个虚拟机分配时间片的周期长度;所述方法还包括:所述主机向所述处理器发送第三配置信息,所述第三配置信息用于指示为所述多个虚拟机分配时间片的周期长度。
该实现方式中,主机向处理器发送第三配置信息,用于指示为虚拟机系统中的多个虚拟机分配时间片的周期,满足了各个虚拟机的算力需求,同时保证了虚拟机中各项业务能够按时按需完成。
在一种可能的实现方式中,所述为所述多个虚拟机分配时间片的周期大于或等于优先级为实时类型的优先级的执行序列的执行周期。
该实现方式中,为虚拟机系统中的多个虚拟机分配时间片的周期大于或等于优先级为实时类型的优先级的执行序列的执行周期,使得在虚拟机中的实时任务完成的情况下,再重新为多个虚拟机分配时间片,保证了实时任务的实时性。
第三方面,本申请提供一种调度装置,应用于虚拟机系统,所述虚拟机系统包含多个虚拟机,所述多个虚拟机共享处理器的算力,所述装置包括:分配模块,用于根据第一配置信息为所述多个虚拟机中的每个虚拟机分配时间片,所述第一配置信息用于指示所述多个虚拟机中每个虚拟机的算力比例;执行模块,用于在所述多个虚拟机中的第一虚拟机中存在所述时间片,且第一任务正在执行时,有第二任务到达,所述处理器停止执行所述第一任务,且调度执行所述第二任务,所述第二任务所属的执行序列的优先级高于所述第一任务所属的执行序列的优先级。
在一种可能的实现方式中,所述多个虚拟机中的第一虚拟机包括n个执行序列,n为大于或等于0的整数;所述装置还包括:所述分配模块还用于根据所述第一配置信息和第二配置信息为所述n个执行序列中的s个执行序列分配所述时间片,所述第二配置信息用于指示所述第一虚拟机中的所述n个执行序列的优先级,s为小于或等于n的整数。
在一种可能的实现方式中,所述第一虚拟机中的执行序列的优先级包含普通类型的优先级和实时类型的优先级,所述实时类型的优先级高于所述普通类型的优先级;其中,所述s个执行序列的优先级为普通类型的优先级。
在一种可能的实现方式中,所述分配模块还用于根据所述第一配置信息和第三配置信息为所述每个虚拟机分配所述时间片,所述第三配置信息用于指示所述处理器为所述多个虚拟机分配时间片的周期长度。
在一种可能的实现方式中,所述处理器为所述多个虚拟机分配时间片的周期长度大于或等于优先级为实时类型的优先级的执行序列的执行周期的周期长度。
在一种可能的实现方式中,所述装置还包括:接收模块,用于所述处理器接收所述第一配置信息。
在一种可能的实现方式中,所述装置还包括:调度模块,用于在所述多个虚拟机中的第一虚拟机中存在时间片和待执行的x个第一执行序列和q个第二执行序列时,优先调度所述q个第二执行序列中的任务,所述第一执行序列的优先级为普通类型的优先级,所述第二执行序列的优先级为实时类型的优先级,x和q为大于或等于0的整数。
在一种可能的实现方式中,所述调度模块具体用于:所述q个第二执行序列中实时类型的优先级还包括第一实时优先级和第二实时优先级,所述第一实时优先级的优先级高于所述第二实时优先级;所述调度模块优先调度所述q个第二执行序列中第一实时优先级的任务。
在一种可能的实现方式中,所述第一任务属于所述x个第一执行序列中的任意一个执行序列,所述第二任务属于所述q个第二执行序列中的任意一个执行序列。
在一种可能的实现方式中,若所述x个第一执行序列中存在时间片分配比例,则在所述q个第二执行序列中的任务执行完成后,所述分配模块还用于根据所述时间片分配比例将剩余的时间片分配给所述x个第一执行序列,所述剩余的时间片为所述处理器为所述x个第一执行序列分配的时间片的总和与所述q个第二执行序列中的任务执行所使用的时间片的差值。
在一种可能的实现方式中,所述第一任务和所述第二任务属于所述q个第二执行序列中不同的执行序列,且所述第二任务所属执行序列的优先级高于所述第一任务所属执行序列的优先级。
在一种可能的实现方式中,所述装置还包括:获取模块,用于获取所述第一任务的总执行时间和当前执行的时间;计算模块用于根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;在所述第一任务的剩余执行时间小于预设阈值时,所述执行模块还用于执行所述第一任务。
在一种可能的实现方式中,所述获取模块还用于获取所述第一任务的总执行时间和当前执行的时间;所述计算模块还用于根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;在所述第一任务的剩余执行时间小于所述第一任务的预设切换备份时间与预设倍数的乘积时,所述执行模块还用于执行所述第一任务。
在一种可能的实现方式中,所述获取模块还用于获取所述第一任务的总执行时间和当前执行的时间;所述计算模块还用于根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;在所述第一任务的剩余执行时间大于或等于预设阈值时,所述执行模块还用于停止执行所述第一任务。
在一种可能的实现方式中,所述处理器停止调度所述第一任务之后,所述装置还包括:存储模块,用于将所述第一任务的执行信息存储至备份内存单元中。
在一种可能的实现方式中,所述第一任务的执行信息包括以下信息的一种或多种:执行所述第一任务的逻辑运算单元的通用寄存器、专用寄存器、内部高速缓冲存储器、缓冲区中的数据。
在一种可能的实现方式中,所述处理器中的备份内存单元的数量满足以下关系式:L=e*g,其中,L表示所述处理器中的备份内存单元的数量,e表示所述处理器中的逻辑运算单元的数量,g表示实时类型的优先级的等级个数。
在一种可能的实现方式中,所述执行模块还用于当所述第二任务执行完成时,所述处理器执行未完成的所述第一任务。
在一种可能的实现方式中,所述调度模块还用于在所述第一虚拟机中没有时间片,且存在有待执行的执行序列时,调度除所述第一虚拟机外的其他存在时间片的虚拟机执行所述第一虚拟机中的执行序列的任务。
在一种可能的实现方式中,所述处理器根据第一配置信息为所述每个虚拟机分配的时间片满足如下关系式:
Y=1000*1000/t*m*p
其中,Y表示为所述每个虚拟机分配的时间片;t表示调度周期,在每个所述调度周期,所述处理器为所述每个虚拟机分配所述时间片;m表示所述处理器中逻辑运算单元的数量;p表示所述每个虚拟机的算力比例。
第三方面及第三方面的各种可能的实现方式中的有益效果可参见第一方面及第一方面的各种可能的实现方式中的有益效果,此处不再赘述。
第四方面,本申请提供一种调度装置,应用于虚拟机系统,所述虚拟机系统包含主机,所述装置包括:获取模块,用于获取虚拟机配置信息,所述虚拟机配置信息用于指示创建多个虚拟机,所述虚拟机配置信息包括所述多个虚拟机中每个虚拟机的算力比例;创建模块,用于根据所述虚拟机配置信息,创建所述多个虚拟机,所述多个虚拟机共享处理器的算力;发送模块,用于向所述处理器发送第一配置信息,所述第一配置信息用于指示所述多个虚拟机中每个虚拟机的算力比例。
在一种可能的实现方式中,所述获取模块还用于获取多个模型;所述创建模块还用于根据所述多个模型中的第一模型为所述多个虚拟机中的第一虚拟机创建n个执行序列,n为大于或等于0的整数;配置模块,用于为所述n个执行序列中的每个执行序列配置优先级;发送模块,用于向所述处理器发送第二配置信息,所述第二配置信息用于指示所述第一虚拟机中的n个执行序列的优先级。
在一种可能的实现方式中,所述优先级包含普通类型的优先级和实时类型的优先级,所述实时类型的优先级高于所述普通类型的优先级。
在一种可能的实现方式中,所述虚拟机配置信息还包括为所述多个虚拟机分配时间片的周期长度;所述发送模块还用于向所述处理器发送第三配置信息,所述第三配置信息用于指示为所述多个虚拟机分配时间片的周期长度。
在一种可能的实现方式中,所述为所述多个虚拟机分配时间片的周期大于或等于优先级为实时类型的优先级的执行序列的执行周期。
第四方面及第四方面的各种可能的实现方式中的有益效果可参见第二方面及第二方面的各种可能的实现方式中的有益效果,此处不再赘述。
第五方面,本申请提供一种调度装置。该装置可以包括与存储器耦合的处理器。其中,该存储器用于存储程序代码,该处理器用于执行该存储器中的程序代码,以实现第一方面或第二方面或其中任意一种实现方式中的方法。
可选地,该装置还可以包括该存储器。
第六方面,本申请提供一种芯片,包括至少一个处理器和通信接口,所述通信接口和所述至少一个处理器通过线路互联,所述至少一个处理器用于运行计算机程序或指令,以执行如第一方面或第二方面或其中任意一种可能的实现方式所述的方法。
第七方面,本申请提供一种芯片系统,该芯片系统包括多个如第六方面中的芯片。
第八方面,本申请提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行如第一方面或第二方面或其中任意一种可能的实现方式所述的方法。
第九方面,本申请提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行如第一方面或第二方面或其中任意一种可能的实现方式所述的方法。
第十方面,本申请提供一种计算设备,包括至少一个处理器和通信接口,所述通信接口和所述至少一个处理器通过线路互联,所述通信接口与目标系统通信,所述至少一个处理器用于运行计算机程序或指令,以执行如第一方面或第二方面或其中任意一种可能的实现方式所述的方法。
第十一方面,本申请提供一种计算系统,包括至少一个处理器和通信接口,所述通信接口和所述至少一个处理器通过线路互联,所述通信接口与目标系统通信,所述至少一个处理器用于运行计算机程序或指令,以执行如第一方面或第二方面或其中任意一种可能的实现方式所述的方法。
第十二方面,本申请提供一种车辆,所述车辆包括如第六方面所述的芯片或第五方面所述的调度装置。
附图说明
图1为本申请的实施例提供的一种系统架构的示意图;
图2为本申请的实施例提供的一种调度方法的流程示意图;
图3为本申请一个实施例提供的调度周期的配置流程图;
图4为本申请一个实施例提供的一种调度方法的流程示意图;
图5为本申请一个实施例提供的主机创建多个虚拟机的流程示意图;
图6为本申请一个实施例提供的创建执行序列的流程示意图;
图7为本申请一个实施例提供的优先级抢占的示意图;
图8为本申请一个实施例提供的备份内存单元的示意图;
图9为本申请一个实施例的调度装置的示意性结构图;
图10为本申请另一个实施例的调度装置的示意性结构图;
图11为本申请又一个实施例的调度装置的结构示意图。
具体实施方式
下面将结合本申请的实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
很多人工智能应用场景中会包含多种不同的业务。中央汽车计算机(Central Car Computer,CCC)架构能够在性能强大的底层硬件平台上,虚拟出多个不同安全等级的分区,供不同的业务使用,且这些分区共享具有计算处理能力的硬件。其中,分区也可以称为容器或虚拟机(virtual machine,VM),计算处理能力可以简称为算力,具有算力的硬件可以包括神经网络处理器(network process unit,NPU)或图形处理器(graphics processing unit,GPU)等处理器。
以自动驾驶领域为例,自动驾驶领域可以包含规划控制、预测规划、感知融合、管理面360度环视、驾驶员监控(driver monitor system,DMS)、娱乐系统或与座舱相关的功能等业务。这些业务对算力的要求和实时性的要求各不相同。
例如,规划控制、预测规划、感知融合等业务对算力要求和实时性要求均较高;DMS对算力要求较高,对实时性要求不高;与座舱相关业务对算力的要求与实际功能有关,对实时性要求不高。
图1为本申请的实施例提供的一种系统架构的示意图。如图1所示,虚拟机系统100包括主机110和处理器120。
主机110可以包括多个虚拟机(VM1,VM2,…,VMn)和虚拟机管理器(hypervisor)111,多个虚拟机中每个虚拟机包括应用程序(application,APP)、内存管理(runtime)和虚拟处理器驱动程序,虚拟机管理器111可以包括处理器驱动程序和中央处理器(central processing unit,CPU)。其中,处理器驱动程序可以为虚拟机管理器111提供处理器120的驱动功能,例如,处理器驱动程序能够为虚拟机管理器111提供设置虚拟机算力比例和设置虚拟机资源调度周期等的接口;runtime可以部署在APP中,可以提供处理器120的用户态驱动功能(例如应用程序接口(application programming interface,API)等),APP通过调用runtime提供的API将AI模型加载至处理器120,并驱动处理器120执行AI模型,获取AI模型的执行结果。
处理器120可以为专用的神经网络处理器(AI芯片),例如NPU或GPU等。处理器120可以包括控制器121和多个逻辑运算单元。控制器121用于接收主机110发送的AI模型,调度AI模型执行,得到AI模型的执行结果,并将AI模型的执行结果上报给主机110。逻辑运算单元用于执行控制器121下发的AI模型中的任务(执行序列中的执行单元),给控制器121返回任务的执行结果。
示例性的,AI模型可以是计算图结构,APP将AI模型下发给处理器120之前,会将计算图结构的AI模型进行转换,转换成处理器120的执行序列结构,一个AI模型对应一个或多个执行序列(多个执行序列可以提高并行度),每个执行序列有多个执行单元(也就是AI任务)。一个执行单元(AI任务)还可以切分为多个块(block),block的个数一般和处理器120中逻辑运算单元的核数相等,每个逻辑运算单元一次执行一个block,因此对于一个AI任务,控制器121可以将其同时调度到多个逻辑运算单元上执行,AI模型的所有执行序列的所有AI任务都执行完成后,APP才能得到完整的计算结果。APP可以将AI模型加载到处理器120,一次加载多次执行,也可以分多次将AI模型的AI任务下发到处理器120执行。不管哪种执行方式,处理器120看到的都是执行序列中的AI任务需要执行,因此处理器120对AI模型并发调度,也就是对执行序列并发调度。
示例性的,控制器121可以一次只给逻辑运算单元下发一个执行单元的一个block,逻辑运算单元执行完成后,控制器121再下发下一个block到逻辑运算单元。
可以理解的是,图1所示的系统架构仅是本申请提供的虚拟机系统的一种示例,在本申请另一些实施例中,虚拟机系统100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件、软件或软件和硬件的组合实现,本申请不作限定。
以自动驾驶领域为例,自动驾驶业务中的感知、预测、规划是实时业务,要求对应的AI模型执行的时间越短越好。AI模型执行是典型的并行计算,AI任务(算子)可以切分成多个block,因此可以同时在多个逻辑运算单元上执行以提高计算并行度,提高执行效率,缩短执行时间。
同时,感知、预测、规划业务执行AI模型,是有时间先后顺序的,感知的结果给预测和规划,预测的结果给规划。因此如果感知、预测、规划业务执行AI模型时,处理器所有的逻辑运算单元都给实时业务使用,则执行实时业务的AI模型的时间可以最短,进而更好地保证了业务的实时性。
因此,不为每个虚拟机隔离出一定数量的逻辑运算单元,而是让虚拟机系统中的所有虚拟机共享处理器所有的逻辑运算单元,这样能使实时业务的运行时间最短。
所有虚拟机都有配置的算力规格,为保证分区算力使用的精确性,需要为每个虚拟机分配使用处理器逻辑运算单元的时间片,处理器中的控制器控制各虚拟机的业务使用时间片,由时间片的使用来体现各虚拟机的算力配置规格。
由于非实时业务也有算力分配,也有AI模型需要执行,非实时业务的分配算力比例也要得到保证。因此,处理器中的控制器给各虚拟机分配使用处理器的时间片不能过小,也就是处理器中控制器的调度周期(每个调度周期给各虚拟机分配一次时间片)不能过小。
因此,如何为这些分区分配算力或者说为这些分区调度算力,才能满足各个业务的算力要求和实时性要求,成为了亟待解决的问题。
基于上述描述,本申请提出了一种调度方法。图2为本申请的实施例的一种调度方法的流程示意图。如图2所示,该方法至少包括S201至S205。图2所示的方法可以应用于图1所示的虚拟机系统100中。
S201,主机获取虚拟机配置信息,虚拟机配置信息用于指示创建多个虚拟机,虚拟机配置信息包括多个虚拟机中每个虚拟机的算力比例。
作为一种示例,本实施例中的主机可以为图1中的主机110。
在一种可能的实现方式中,主机通过虚拟机管理器获取虚拟机配置信息。处理器驱动程序为虚拟机管理器提供配置虚拟机配置信息的接口。
作为一种示例,本实施例中的虚拟机管理器可以为图1中的虚拟机管理器111,本实施例中的处理器驱动程序可以为图1中虚拟机管理器111中的处理器驱动程序。
示例性的,虚拟机配置信息可以包括虚拟机系统中虚拟机的数量、多个虚拟机中每个虚拟机的算力比例、为多个虚拟机进行资源调度的周期等信息。
S202,主机根据虚拟机配置信息,创建多个虚拟机,多个虚拟机共享处理器的算力。
在一种可能的实现方式中,主机根据虚拟机配置信息指示的虚拟机中虚拟机的数量,创建多个虚拟机,该多个虚拟机共享处理器的算力。
在另一种可能的实现方式中,主机获取多个模型,主机根据多个模型中的第一模型为多个虚拟机中的第一虚拟机创建n个执行序列,n为大于或等于0的整数,并为n个执行序列中的每个执行序列配置优先级。其中,优先级可以包含普通类型的优先级和实时类型的优先级,实时类型的优先级高于普通类型的优先级。
需要说明的是,第一模型为主机获取的多个模型中的任意一个模型,第一虚拟机为虚拟机系统的多个虚拟机中的任意一个虚拟机。
S203,主机向处理器发送第一配置信息,第一配置信息用于指示多个虚拟机中每个虚拟机的算力比例。
在一种可能的实现方式中,主机向处理器发送第一配置信息,第一配置信息用于指示多个虚拟机中每个虚拟机的算力比例,以便处理器根据第一配置信息中指示的多个虚拟机中每个虚拟机的算力比例,为每个虚拟机分配时间片。
可选地,多个虚拟机中每个虚拟机的算力比例为NPU中每个虚拟机可用的算力比例,例如,NPU中一共有四个虚拟机:VM1、VM2、VM3和VM4,分别使用NPU总算力的25%、50%、10%和15%,每个虚拟机的算力比例可以分别设置,总算力比例小于等于100%。
在另一种可能的实现方式中,主机向处理器发送第二配置信息,第二配置信息用于指示第一虚拟机中的n个执行序列的优先级,以便处理器根据第二配置信息指示的第一虚拟机中的n个执行序列的优先级,为n个执行序列分配时间片。
其中,n个执行序列为第一虚拟机中的待执行的执行序列。
在又一种可能的实现方式中,主机获取的虚拟机配置信息中还包括为多个虚拟机分配时间片的周期长度,则主机还可以向处理器发送第三配置信息,第三配置信息用于指示为虚拟机系统中的多个虚拟机分配时间片的周期长度,以便处理器根据第三配置信息指示的周期为虚拟机系统中的多个虚拟机分配时间片。
作为一种示例,为多个虚拟机分配时间片的周期长度大于或等于优先级为实时类型优先级的执行序列的执行周期。
可选地,为多个虚拟机分配时间片的周期长度也称作调度周期。
也就是说,调度周期应不小于实时类型优先级的业务的周期,在一个算力时间片分配周期内,实时类型优先级的业务可以优先调度,然后再调度普通类型优先级的业务。如果周期过小,则实时类型优先级的业务很容易用完分配的时间片,而普通类型优先级的业务也有时间片,也可以执行AI模型,实时类型优先级的业务只能到下个周期才有时间片,导致实时类型优先级的业务执行不及时。
示例性地,调度周期的配置流程图如图3所示,虚拟机管理器将调度周期发送至NPU驱动器,NPU驱动器将该调度周期再发送至NPU控制器,NPU控制器保存该调度周期,供后续使用。
S204,处理器根据第一配置信息为每个虚拟机分配时间片,第一配置信息用于指示多个虚拟机中每个虚拟机的算力比例。
在一种可能的实现方式中,处理器根据第一配置信息指示的虚拟机的算力比例为虚拟机系统中的每个虚拟机分配时间片,算力比例越大的虚拟机分配到的时间片越多。
在另一种可能的实现方式中,虚拟机系统中的第一虚拟机包括n个执行序列,n为大于或等于0的整数;处理器接收第二配置信息,并根据接收的第一配置信息和第二配置信息为n个执行序列中的s个执行序列分配时间片,其中,s个执行序列中优先级越高的执行序列分配到的时间片越多,s为小于或等于n的整数。
作为一种示例,被分配到时间片的s个执行序列的优先级均为普通类型的优先级,处理器根据第一配置信息和第二配置信息,仅为优先级为普通类型的优先级的执行序列分配时间片,n个执行序列中优先级为实时类型优先级的执行序列不分配时间片,但在第一虚拟机有时间片的情况下,处理器就能调度优先级为实时类型优先级的执行序列中的任务。
在一种可能的实现方式中,处理器接收第三配置信息,并根据第三配置信息指示的为多个虚拟机分配时间片的周期长度为虚拟机系统中的每个虚拟机分配时间片。
作为一种示例,处理器按照第三配置信息指示的周期长度,基于第一配置信息为虚拟机系统中每个虚拟机分配时间片。
示例性的,处理器为虚拟机系统中的第一虚拟机分配的时间片满足如下关系式:
Y=1000*1000/t*m*p
其中,Y表示为每个虚拟机分配的时间片;t表示调度周期,在每个调度周期,处理器为每个虚拟机分配时间片;m表示处理器中逻辑运算单元的数量;p表示每个虚拟机的算力比例。
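上述关系式可以用如下Python代码示意(仅为按本文公式的最小示意,函数名、参数取值均为假设;其中t按本文"30赫兹"的例子以频率值代入):

```python
def vm_time_slice(t, m, p):
    """按关系式 Y = 1000*1000/t*m*p 计算每个调度周期内
    为某虚拟机分配的时间片(单位:微秒,us)。
    t: 调度周期(按本文示例以赫兹计的频率值代入),
    m: 处理器中逻辑运算单元的数量,
    p: 该虚拟机的算力比例(0到1之间)。"""
    return 1000 * 1000 / t * m * p


# 示例:30HZ的调度周期、4个逻辑运算单元、算力比例25%时,
# 每个调度周期约分得33333微秒的时间片
```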
作为另一种示例,处理器按照第三配置信息指示的周期,基于第一配置信息和第二配置信息为第一虚拟机中优先级为普通类型的优先级的执行序列分配时间片。
示例性的,普通类型的优先级分为多个等级,处理器根据第三配置信息指示的周期,基于第一配置信息和第二配置信息为第一虚拟机中不同等级的普通类型优先级的执行序列分配时间片。例如,第一虚拟机中的普通类型优先级的执行序列可以分为五个等级,这五个等级由高到低分配时间片的比例为10:8:4:2:1,所属优先级等级比例越大的执行序列分配的时间片越多。
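按10:8:4:2:1这类比例在各等级间切分时间片的过程,可用如下示意代码表示(比例与总量均为示例值,并非本申请限定的实现):

```python
def split_by_ratio(total, ratios):
    """将时间片总量total按ratios给出的比例切分,
    返回各优先级等级分得的时间片列表(最小示意)。"""
    s = sum(ratios)
    return [total * r / s for r in ratios]


# 示例:将2500微秒按 10:8:4:2:1 分给五个普通优先级等级,
# 得到 [1000, 800, 400, 200, 100]
```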
在处理器为第一虚拟机分配时间片后,处理器对第一虚拟机中的任务进行调度。
S205,在多个虚拟机中的第一虚拟机中存在时间片,且第一任务正在执行时,有第二任务到达,停止执行第一任务,且调度执行第二任务,第二任务所属的执行序列的优先级高于第一任务所属的执行序列的优先级。
在一种可能的实现方式中,在多个虚拟机中的第一虚拟机中有时间片,且存在待执行的x个第一执行序列和q个第二执行序列时,处理器优先调度q个第二执行序列中的任务,第一执行序列的优先级为普通类型的优先级,第二执行序列的优先级为实时类型的优先级,x和q为大于或等于0的整数。
作为一种示例,q个第二执行序列中实时类型的优先级还包括第一实时优先级和第二实时优先级,第一实时优先级的优先级高于第二实时优先级,处理器优先调度q个第二执行序列中第一实时优先级的任务。
在一种可能的实现方式中,在第一虚拟机中,第一任务正在执行时,有第二任务到达,第一任务属于x个第一执行序列中的任意一个执行序列,第二任务属于q个第二执行序列中的任意一个执行序列,则处理器停止执行第一任务,调度执行第二任务。
在一种可能的实现方式中,若x个第一执行序列中存在时间片分配比例,如10:8:4:2:1,则在q个第二执行序列中的任务执行完成后,处理器根据时间片分配比例将剩余的时间片分配给x个第一执行序列,剩余的时间片为处理器为x个第一执行序列分配的时间片的总和与q个第二执行序列中的任务执行所使用的时间片的差值。
也就是说,当实时类型的优先级的任务执行完成后,处理器将剩下的时间片按照原来的时间片分配比例分配给普通类型的优先级的执行序列,从而使得普通优先级执行序列的时间片仍然满足时间片分配比例。
在另一种可能的实现方式中,在第一虚拟机中,第一任务正在执行时,有第二任务到达,第一任务和第二任务属于q个第二执行序列中不同的执行序列,且第二任务所属执行序列的优先级高于第一任务所属执行序列的优先级,则处理器停止执行第一任务,调度执行第二任务。
作为一种示例,针对上述两种可能的实现方式,在处理器停止调度第一任务之前,处理器获取第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间小于预设阈值时,处理器执行第一任务。
可选地,预设阈值包括切换备份时间。
也就是说,在第一虚拟机中,第一任务正在执行,有第二任务到达,处理器触发任务切换之前,先根据第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间小于预设阈值时,处理器仍执行第一任务,保证当前第一任务执行完成,以节约系统资源,提高业务的执行效率。
作为另一种示例,在处理器停止调度第一任务之前,处理器获取第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间小于第一任务的预设切换备份时间与预设倍数的乘积时,处理器执行所述第一任务。
其中,切换备份时间可以由NPU驱动器配置给NPU控制器,逻辑运算单元在不同核数、不同运行频率时的切换备份时间是不一样的,都由NPU驱动器配置给NPU控制器;预设倍数为提前配置好的。
也就是说,在第一虚拟机中,第一任务正在执行,有第二任务到达,处理器触发任务切换之前,先根据第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间小于第一任务的预设切换备份时间与预设倍数的乘积时,处理器仍执行第一任务,保证当前第一任务执行完成,以节约系统资源,提高业务的执行效率。
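该抢占避免的判断逻辑可以用如下示意代码表示(阈值取"预设切换备份时间与预设倍数的乘积",函数名与参数均为示意性假设):

```python
def should_preempt(total_time, elapsed_time, backup_time, k):
    """第二任务到达时,判断是否抢占正在执行的第一任务:
    先计算剩余执行时间 = 总执行时间 - 当前已执行时间;
    若剩余时间小于预设切换备份时间backup_time与预设倍数k的乘积,
    则返回False,即不抢占,让第一任务执行完成。"""
    remaining = total_time - elapsed_time
    return remaining >= backup_time * k


# 剩余5us、备份需3us、倍数为2时:5 < 6,不抢占
# 剩余50us时:50 >= 6,触发抢占
```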
作为又一种示例,在处理器停止调度第一任务之前,处理器获取第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间大于或等于预设阈值时,处理器停止执行所述第一任务。
也就是说,在第一虚拟机中,第一任务正在执行,有第二任务到达,处理器触发任务切换之前,先根据第一任务的总执行时间和当前执行的时间,计算第一任务的剩余执行时间,在第一任务的剩余执行时间大于或等于预设阈值时,处理器停止执行第一任务,启动执行第二任务,以保证优先级高的第二任务的实时性,提高了业务的执行效率。
可选地,预设阈值包括切换备份时间。
作为一种示例,在处理器停止调度第一任务之后,将第一任务的执行信息存储至备份内存单元中。其中,第一任务的执行信息可以包括执行第一任务的逻辑运算单元的通用寄存器、专用寄存器、内部高速缓冲存储器、缓冲区中的数据等。
作为一种示例,处理器中的备份内存单元的数量满足以下关系式:
L=e*g
其中,L表示处理器中的备份内存单元的数量,e表示处理器中的逻辑运算单元的数量,g表示实时类型优先级的等级个数。
在又一种可能的实现方式中,在第一虚拟机中没有时间片,且存在有待执行的执行序列时,处理器调度除第一虚拟机外的其他存在时间片的虚拟机执行第一虚拟机中的执行序列的任务。
也就是说,在第一虚拟机没有时间片但有待执行任务时,处理器可以调度其他有可用时间片或者有剩余时间片的虚拟机来执行任务,提高了虚拟机系统的业务处理效率,实现了处理器算力利用的最大化。
可选地,当所述第二任务执行完成时,处理器执行未完成的所述第一任务,即当实时任务或者更高优先级的任务执行完时,优先处理被抢占的任务,而不是执行新的任务,保证被抢占的任务先执行。
例如,某个普通类型优先级的任务被抢占,所有实时执行序列的任务都调度完成时,优先恢复被抢占的任务的执行,因为如果执行新的任务,而不是继续执行之前被打断的任务,则再次发生抢占时,对应的内存空间已经备份了现场信息,没有内存空间来备份这个任务的现场信息。
本申请提供的技术方案中,处理器根据多个虚拟机中每个虚拟机的算力比例,为虚拟机系统中的每个虚拟机分配时间片,算力比例越大的虚拟机分配到的时间片越多,实现了多个虚拟机对处理器算力的共享,满足了各个虚拟机的算力需求,同时保证了每个虚拟机中时间片分配的合理性,且当有优先级高的任务到达时,优先执行,可以保证高优先级任务执行的实时性。
下面以虚拟机系统中的处理器为NPU为例,对本申请提供的调度方法进行介绍。
图4为本申请一个实施例的一种调度方法的流程示意图。如图4所示,该方法至少包括S401至S407。图4所示的方法可以应用于图1所示的虚拟机系统100中。
S401,主机获取虚拟机配置信息。
需要说明的是,S401可以参考S201,此处不再进行赘述。
S402,主机根据虚拟机配置信息创建多个虚拟机。
在一种可能的实现方式中,主机获取的虚拟机配置信息中包括创建虚拟机的数量和创建的多个虚拟机中每个虚拟机的算力比例等信息。
作为一种示例,主机创建多个虚拟机的流程图如图5所示,主机中的虚拟机管理器(hypervisor)根据虚拟机配置信息,创建出多个虚拟机,hypervisor调用NPU驱动器(NPU driver)的接口,将各新增的虚拟机的虚拟机标识符(identity document,ID)和算力比例告知NPU driver,NPU driver将虚拟机ID告知虚拟机中的虚拟NPU驱动器(vNPU driver),vNPU driver保存虚拟机ID信息,NPU driver将虚拟机ID和算力比例配置给NPU控制器,NPU控制器保存虚拟机ID和算力比例等信息。
S403,主机获取目标模型。
在一种可能的实现方式中,虚拟机系统中任意一个虚拟机中的APP获取目标模型,并对目标模型进行加载。
作为一种示例,目标模型可以为AI模型。
S404,主机根据目标模型创建多个执行序列。
在一种可能的实现方式中,虚拟机系统中任意一个虚拟机中的APP加载目标模型时,需要创建多个执行序列。
作为一种示例,图6为本申请一个实施例提供的创建执行序列的流程图。如图6所示,虚拟机管理器调用内存管理(runtime)的接口创建执行序列,并为创建的执行序列配置优先级,内存管理调用虚拟机中的虚拟NPU驱动器的接口申请执行序列ID,虚拟NPU驱动器向主机中的NPU驱动器申请执行序列ID,虚拟NPU驱动器将虚拟机ID也发给NPU驱动器,NPU驱动器为该虚拟机ID对应的虚拟机分配一个执行序列ID,并返回给虚拟NPU驱动器,NPU驱动器同时将虚拟机ID和执行序列ID配置给NPU控制器,NPU控制器保存虚拟机ID和执行序列ID信息,因此NPU控制器有各个虚拟机下执行序列ID的信息。
另外,APP不感知是运行在虚拟机上还是主机上,内存管理给APP提供的接口一样,执行序列的优先级定义也一样。其中,执行序列的优先级可以分为实时类型的优先级和普通类型的优先级,实时类型的优先级和普通类型的优先级又可以进一步细分为多个等级。
例如,实时优先级可以分为SP0、SP1和SP2三个等级,SP0的优先级高于SP1的优先级,SP1的优先级高于SP2的优先级;普通类型的优先级可以分为WRR0、WRR1、WRR2、WRR3和WRR4五个等级,WRR0的优先级高于WRR1的优先级,WRR1的优先级高于WRR2的优先级,WRR2的优先级高于WRR3的优先级,WRR3的优先级高于WRR4的优先级;实时类型优先级的执行序列可以抢占普通类型优先级的执行序列,等级高的实时类型优先级的执行序列可以抢占等级低的实时类型优先级的执行序列;其优先级抢占的示意图如图7所示,优先级为SP0的执行序列可以抢占优先级为SP1和SP2的执行序列,优先级为SP1的执行序列可以抢占优先级为SP2的执行序列,普通类型优先级的等级高低可以体现于时间片的大小。
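上述优先级等级与抢占关系可建模为如下示意代码(等级名称取自正文,判断逻辑为按正文规则的最小示意,并非本申请限定的实现):

```python
# 实时类型优先级,SP0最高;普通类型优先级,等级差异体现在时间片比例上
RT = ["SP0", "SP1", "SP2"]
NORMAL = ["WRR0", "WRR1", "WRR2", "WRR3", "WRR4"]


def can_preempt(new, cur):
    """判断新到达任务(new)能否抢占正在执行的任务(cur):
    实时类型优先级可抢占普通类型优先级;
    等级高的实时优先级可抢占等级低的实时优先级;
    普通类型优先级之间不发生抢占。"""
    if new in RT and cur in NORMAL:
        return True
    if new in RT and cur in RT:
        return RT.index(new) < RT.index(cur)  # 下标越小等级越高
    return False
```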
S405,主机向NPU发送资源配置信息。
虚拟机系统中的多个虚拟机共享NPU的资源,主机向NPU发送的资源配置信息用于指示NPU对多个虚拟机进行资源分配。
作为一种示例,主机向NPU发送的资源配置信息可以包括多个虚拟机中每个虚拟机的算力比例,每个虚拟机中执行序列的优先级和为每个虚拟机进行资源分配的周期长度等信息。
S406,NPU根据资源配置信息对多个虚拟机进行资源分配。
在一种可能的实现方式中,在资源配置信息指示的每个资源分配周期中,NPU控制器给各个虚拟机分配时间片。
作为一种示例,NPU控制器为第一虚拟机分配的时间片可以满足公式Y=1000*1000/t*m*p,其中,Y表示NPU为第一虚拟机分配的时间片,t表示调度周期,在每个调度周期,处理器为第一虚拟机分配时间片,m表示NPU中逻辑运算单元的数量,p表示第一虚拟机的算力比例。时间片的单位为微秒(us),资源配置信息指示的资源分配周期对应的频率可以为30赫兹(hertz,Hz)。
在一种可能的实现方式中,NPU控制器按照第一虚拟机中执行序列的优先级为普通类型优先级的执行序列分配时间片。
作为一种示例,第一虚拟机中为普通类型优先级WRR0至WRR4的执行序列分配时间片的比例可以为10:8:4:2:1。由于实时类型优先级的执行序列可以抢占普通类型优先级的执行序列的时间片,因此实时类型优先级的执行序列不分配时间片。
作为一种示例,在多个虚拟机的任意一个虚拟机中,实时类型优先级的执行序列运行时,NPU控制器按照普通类型优先级的时间片比例,扣除普通类型优先级的执行序列的时间片,扣除时间片的总数就是实时类型优先级的执行序列使用的时间片,从而使得普通类型优先级的执行序列的时间片仍然满足预设比例。
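这种"按普通优先级时间片比例扣除实时任务用量"的做法,可用如下示意代码表示(假设各执行序列的时间片与实时任务用量均为示例值):

```python
def deduct_rt_usage(normal_slices, rt_used):
    """normal_slices: 各普通类型优先级执行序列的时间片列表;
    rt_used: 实时类型优先级执行序列实际使用的时间片总数。
    按各执行序列时间片的比例分摊扣除rt_used,
    使扣除后普通优先级执行序列的时间片仍满足原比例。"""
    total = sum(normal_slices)
    return [s - rt_used * s / total for s in normal_slices]


# 示例:普通优先级各等级时间片为 [1000, 800, 400, 200, 100],
# 实时任务用掉250微秒,按比例扣除后为 [900, 720, 360, 180, 90]
```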
在一种可能的实现方式中,在一个资源分配周期结束后,NPU控制器给各个虚拟机重新分配时间片。
作为一种示例,如果任意虚拟机在上个资源分配周期没有使用完时间片,剩余的时间片不带入下一个资源分配周期;如果任意虚拟机使用的时间片超出分配的规格,则应该是最后一个执行序列的任务(task)执行时间导致的,由于task的执行时间远小于虚拟机的时间片,因此超出部分也不在下一个资源分配周期中扣除。在虚拟机内部,普通类型优先级的执行序列没有使用完的时间片也不带入下一个资源分配周期,但是上一个资源分配周期中多用的时间片会在下一次分配时间片时扣除。
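上述跨周期的时间片结转规则可以示意如下(函数名为假设,规则按正文:剩余不结转,执行序列超用在下一周期扣除,虚拟机超用不扣除):

```python
def next_period_stream_slice(quota, used):
    """普通类型优先级执行序列:没用完的时间片不带入下一周期,
    上一周期多用的部分在下一次分配时间片时扣除。"""
    overuse = max(0, used - quota)
    return quota - overuse


def next_period_vm_slice(quota, used):
    """虚拟机级别:剩余不带入下一周期,超出部分也不在下一周期扣除,
    因此每个周期仍按配额重新分配。"""
    return quota


# 执行序列配额1000、实际用1200 → 下一周期分到800
# 虚拟机配额1000、实际用1100 → 下一周期仍分到1000
```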
S407,NPU对多个虚拟机中的任务进行调度。
在一种可能的实现方式中,NPU中的NPU控制器对虚拟机系统中执行序列的任务进行调度。
作为一种示例,NPU控制器先调度有时间片的虚拟机中的实时类型优先级的执行序列和有时间片的普通类型优先级的执行序列,如果这些执行序列都没有任务调度,再调度其它执行序列,例如,没有时间片的执行序列。
作为另一种示例,在有实时类型优先级和普通类型优先级的执行序列一起调度时,按照实时高优先级、实时低优先级、普通优先级的顺序依次调度每种类型的执行序列。即等级高的实时类型优先级的执行序列都调度完成后,再调度等级低的实时类型优先级的执行序列,最后调度普通类型的执行序列。
作为又一种示例,在有多个同等优先级的执行序列一起调度时,任意一个执行序列的运行时间超过预设时间,或该执行序列的时间片用完,则NPU控制器调度其它执行序列。这种情况下,NPU控制器待当前运行的任务(task)执行完成后再切换其它执行序列的task,不中途切换正在执行的task/block。
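综合上述几个示例,NPU控制器选择下一个待调度任务的顺序可以示意如下(队列组织方式为假设,仅体现"实时高、实时低、普通"的调度顺序):

```python
def pick_next(queues, has_slice):
    """queues: 按调度顺序排列的(优先级, 任务列表)二元组列表,
    依次为实时高优先级、实时低优先级、普通类型优先级;
    has_slice(prio): 该优先级对应的执行序列/虚拟机是否还有时间片。
    返回下一个应调度的任务,无任务可调度时返回None。"""
    for prio, tasks in queues:
        if tasks and has_slice(prio):
            return tasks[0]
    return None
```

例如,SP1队列有任务时先于WRR0队列被调度;SP1对应的虚拟机没有可用时间片时,则调度WRR0队列中的任务。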
在一种可能的实现方式中,NPU控制器在每个block执行完成时,可以得到block运行所用的时间片,并扣除对应的执行序列和虚拟机的时间片。
作为一种示例,在有时间片的虚拟机中普通类型优先级的任务正在执行时,有实时类型优先级执行序列的任务到达,则停止调度普通类型优先级的任务,启动调度实时类型优先级的任务。虚拟机中实时类型优先级执行序列的任务调度完成后,实时类型优先级任务所使用的时间片需要从该虚拟机的普通优先级执行序列的时间片中扣除。
示例性的,在实时类型优先级的执行序列有任务到达时,如果逻辑运算单元正在执行普通类型优先级执行序列的task的block,或者在等级低的实时类型优先级执行序列的task执行时,有等级高的实时类型优先级执行序列的任务到达,NPU控制器需要中断正在执行的block,转而执行实时(高)优先级task的block,这个过程可以称为抢占。
执行序列的task/block的执行被打断后,NPU控制器需要备份该block当前运行的现场信息,现场信息可以包括执行该block的逻辑运算单元中的通用寄存器信息、专用寄存器信息、内部高速缓冲存储器(buffer)中的信息和缓冲区(cache)中的信息等。逻辑运算单元中的buffer和cache空间比较大,因此需要同样大的内存空间来备份这些信息。其中,备份一个逻辑运算单元所有现场信息(所有的通用寄存器、专用寄存器、buffer、cache)的内存空间称为一个备份内存单元。
作为一种示例,NPU中的备份内存单元的数量可以满足关系式L=e*g,其中,L表示NPU中备份内存单元的数量,e表示NPU中逻辑运算单元的数量,g表示实时类型的优先级的等级个数。
示例性的,实时类型的优先级有3个等级,分别为SP0、SP1和SP2,则g的取值为3。
示例性的,每个逻辑运算单元中备份内存单元的数量等于实时类型的优先级的等级个数。实时类型的优先级有3个等级时,NPU中每个逻辑运算单元中备份内存单元的数量为3个。每个备份内存单元用于存放逻辑运算单元的通用寄存器、专用寄存器、内部cache、buffer的数据,即逻辑运算单元运行block的现场信息。
可选地,图8为本申请一个实施例提供的备份内存单元示意图,如图8所示,NPU中包括多个逻辑运算单元(逻辑运算单元0至逻辑运算单元3),由控制器进行控制,每个逻辑运算单元包括多个内部高速缓冲存储器(buffer)和多个缓冲区(cache)等,备份内存单元0至备份内存单元3中可以备份一个逻辑运算单元所有现场信息(所有的通用寄存器、专用寄存器、buffer、cache)。
示例性的,抢占发生时,被抢占的block在逻辑运算单元中的现场信息,备份到对应的空间中。优先级为普通类型的执行序列被优先级为实时类型SP0/1/2的执行序列抢占时,现场信息备份在备份内存单元2中;如果优先级为实时类型优先级SP2的执行序列被优先级为实时类型优先级SP0/1的执行序列抢占时,现场信息备份在备份内存单元1中;优先级为实时类型优先级SP1的执行序列被优先级为实时类型优先级SP0的执行序列抢占时,现场信息备份在备份内存单元0中。
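按上述规则为被抢占现场选择备份内存单元的映射,可用如下示意代码表示(编号方式为假设,g取正文中实时优先级的3个等级):

```python
def backup_unit_index(lu_id, preempted_prio, g=3):
    """为被抢占的block选择备份内存单元(示意):
    每个逻辑运算单元对应g个备份内存单元,
    普通优先级被SP0/1/2抢占时用本单元组的单元2,
    SP2被SP0/1抢占时用单元1,SP1被SP0抢占时用单元0。
    lu_id为逻辑运算单元编号,返回全局备份内存单元下标。"""
    slot = {"SP1": 0, "SP2": 1, "NORMAL": 2}[preempted_prio]
    return lu_id * g + slot


# 处理器共 L = e*g 个备份内存单元,
# 例如 e=4 个逻辑运算单元、g=3 个实时等级时共12个
```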
示例性的,NPU驱动器根据NPU中逻辑运算单元的数量,分配备份内存单元的总空间,NPU驱动器在NPU启动时,将备份内存单元配置给NPU控制器。
作为一种示例,逻辑运算单元正在执行普通类型优先级的task/block或等级低的实时类型优先级的task/block时,NPU控制器检测到实时类型的执行序列或等级高的实时类型优先级的执行序列中有任务到达,则NPU控制器配置逻辑运算单元的控制寄存器,使逻辑运算单元停止执行当前的block。逻辑运算单元停止内部各流水线的执行,并给NPU控制器回信号,告知NPU控制器已经停止执行当前block。NPU控制器配置逻辑运算单元的控制寄存器,使逻辑运算单元开启备份功能,逻辑运算单元启动内存标签扩展(memory tagging extension,MTE),将寄存器、cache、buffer中的数据备份到指定的备份内存单元中,逻辑运算单元给NPU控制器回信号,告知NPU控制器备份完成。NPU控制器记录被停止的执行序列、task、block的信息,将实时(高)优先级task的block配置给逻辑运算单元,逻辑运算单元执行当前配置的实时类型优先级task的block。
作为一种示例,被抢占的task和执行序列队列中待调度的task都有调度机会时,优先恢复被抢占的task的block。
示例性的,某个普通类型优先级的task被抢占后,当所有实时类型优先级执行序列的task都调度完成时,优先恢复执行被抢占的task。因为如果执行新的task,而不是继续执行之前被打断的task,则再次发生抢占时,对应的内存空间已经备份了现场信息,没有内存空间来备份这个新的task的现场信息。
示例性的,抢占恢复流程可以包括:NPU控制器配置逻辑运算单元的寄存器,使逻辑运算单元开启现场恢复功能,逻辑运算单元启动MTE,从指定的备份内存单元中把数据恢复到寄存器、cache和buffer中,逻辑运算单元给NPU控制器回信号,告知NPU控制器恢复完成,NPU控制器配置逻辑运算单元,使逻辑运算单元开始执行被抢占的block,即从当前恢复的环境继续执行之前被中断的block,同时NPU控制器清除备份内存单元中记录的被停止的执行序列、task、block信息。
需要说明的是,在逻辑运算单元A被抢占的block,可以在逻辑运算单元B上恢复运行。
作为一种示例,由于抢占需要备份和恢复环境,因此开销较大。如果当前block的执行剩余时间小于备份时间,则让该block执行完成比发生抢占更有利于任务调度,提高任务执行效率。那么,NPU控制器需要知道当前执行的task或block的总执行时间和已经执行的时间,才能得到当前的block还需要执行多长时间才能结束,从而判断是否可以避免抢占。因此需要增加task或block的执行时间统计。
示例性的,为NPU中每个逻辑运算单元内部增加一个性能统计计数器t_cnt,用于统计block的执行周期(cycle)数,为每个task增加一个空间存放task/block的执行cycle数,这个执行cycle数是根据block实际执行时间得来的,因此模型加载时执行cycle数为0。
示例性的,逻辑运算单元开始执行block时,将性能统计计数器t_cnt设置为0,执行过程中,逻辑运算单元每个cycle将t_cnt加1。在逻辑运算单元执行block过程中,NPU控制器可以读取block已经执行的cycle数,block执行结束时,NPU控制器可以获取block的执行总cycle数。
示例性的,NPU控制器控制调度task/block执行,block执行结束时,NPU控制器读取逻辑运算单元的t_cnt值,即block的执行时间,汇总到执行队列中对应的task的cycle数的位置,因此可以得到task的总执行cycle数,总执行cycle数除以block数,可以得到block的平均执行时间。
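上述统计过程可以示意如下(函数名为假设,cycle数均为示例值):

```python
def accumulate_task_cycles(task_cycles, t_cnt):
    """每个block执行结束时,NPU控制器读取逻辑运算单元的
    性能统计计数器t_cnt,汇总到对应task的执行cycle数上。"""
    return task_cycles + t_cnt


def avg_block_cycles(total_task_cycles, num_blocks):
    """task的总执行cycle数除以block数,得到block的平均执行cycle数。"""
    return total_task_cycles / num_blocks


# 示例:4个block各执行250个cycle,task总计1000个cycle,
# block平均执行cycle数为250
```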
如果NPU中逻辑运算单元的频率和核数发生调整,task的执行cycle数也会变化,因此NPU控制器需要重新统计task的执行cycle数。NPU控制器只在第一次运行AI模型,或NPU中逻辑运算单元的频率和核数发生调整时,才重新统计和更新task的总cycle数。
作为一种示例,NPU控制器配置预设阈值,如果当前执行的block的剩余时间小于预设阈值,则不进行抢占,执行完当前的block后,再执行新到达的task的block。
示例性的,配置的预设阈值可以为切换备份时间的倍数,如果当前执行的block的剩余执行时间小于预设的切换备份时间的倍数,则不进行抢占。切换备份时间可以由NPU驱动器配置给NPU控制器,逻辑运算单元在不同核数、不同运行频率时的切换备份时间是不一样的,都由NPU驱动器配置给NPU控制器。
需要说明的是,模型初次运行、逻辑运算单元发生核数调整、运行频率调整时,模型队列中的task信息里没有task的执行统计时间,因此控制器不考虑抢占避免,直接抢占。模型执行一次后就有了task执行统计时间,之后NPU控制器就可以执行抢占避免操作。
在一种可能的实现方式中,当有时间片的虚拟机中的实时类型优先级的执行序列或有时间片的执行序列都没有任务调度时,NPU控制器进行空闲处理。
作为一种示例,NPU控制器调度有时间片的虚拟机,执行没有时间片的虚拟机中的所有执行序列,以及所属虚拟机有时间片但自身无时间片的执行序列,并且在调度过程中,不区分执行序列的优先级,每个执行序列均执行预设时间后,再调度其它执行序列。
作为一种示例,在空闲处理状态下,执行序列的task执行的时间,不占用该虚拟机分配的时间片。
作为一种示例,在有时间片的虚拟机执行其它虚拟机中执行序列的task时,该虚拟机中实时类型优先级的执行序列有任务到达,则停止调度当前任务,启动调度到达的实时类型优先级执行序列的任务;如果是普通类型优先级的执行序列有任务到达,则等待当前任务执行完成后,调度该虚拟机执行到达的普通类型优先级执行序列的任务。
在一种可能的实现方式中,若虚拟机没有时间片,但该虚拟机中实时类型优先级的执行序列有task待调度,则表示该虚拟机资源分配异常。
作为一种示例,出现虚拟机资源分配异常的一种情况可以为该虚拟机中都是实时类型优先级的执行序列或该虚拟机中实时类型优先级执行序列的数量远远大于普通类型优先级执行序列的数量,导致该虚拟机分配的算力过小,解决方法是增加该虚拟机的算力比例。
作为另一种示例,NPU控制器调度该虚拟机执行实时类型优先级执行序列的task前,普通类型优先级的执行序列已用完该虚拟机中的时间片,导致该虚拟机中无时间片执行实时类型优先的执行序列,解决方法可以是将普通类型优先级的执行序列移到另外的虚拟机中执行或增加该虚拟机的算力。
本申请提供的技术方案,使虚拟机系统中各虚拟机按照配置的NPU算力比例,共享NPU算力;实时类型的任务优先执行,实时类型任务可以使用所有的NPU逻辑运算单元,从而保证了实时任务的实时性;当当前执行任务的block的剩余执行时间小于预设阈值时,NPU控制器不做抢占切换,提高了NPU的使用效率,同时提高了实时任务的实时性。
图9为本申请一个实施例的调度装置的示意性结构图。如图9所示,装置900可以包括分配模块901、执行模块902、接收模块903、调度模块904、获取模块905、计算模块906和存储模块907。装置900可以用于实现图2和图4中由处理器实现的操作。
本申请实施例中的任意模块可以全部或部分通过软件和/或硬件方式实现。
在一种实现方式中,装置900可以用于实现上述图2所示的方法。例如,分配模块901用于实现S204,执行模块902用于实现S205。
在另一种实现方式中,装置900还可以包括调度模块,该实现方式中的装置900可以用于实现上述图4所示的方法。例如,分配模块901用于实现S406,调度模块用于实现S407。
图10为本申请另一个实施例的调度装置的示意性结构图。如图10所示,装置1000可以包括获取模块1001、创建模块1002、发送模块1003和配置模块1004。装置1000可以用于实现图2和图4中由主机实现的操作。
本申请实施例中的任意模块可以全部或部分通过软件和/或硬件方式实现。
在一种实现方式中,装置1000可以用于实现上述图2所示的方法。例如,获取模块1001用于实现S201,创建模块1002用于实现S202,发送模块1003用于实现S203。
在另一种实现方式中,装置1000可以用于实现上述图4所示的方法。例如,获取模块1001用于实现S401和S403,创建模块1002用于实现S402和S404,发送模块1003用于实现S405。
图11为本申请一个实施例提供的调度装置的结构示意图。图11所示的装置1100可以用于执行前述任意一个实施例所述的方法。
如图11所示,本实施例的装置1100包括:存储器1101、处理器1102、通信接口1103以及总线1104。其中,存储器1101、处理器1102、通信接口1103通过总线1104实现彼此之间的通信连接。
存储器1101可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器1101可以存储程序,当存储器1101中存储的程序被处理器1102执行时,处理器1102可以用于执行图2至图4所示的方法的各个步骤。
处理器1102可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的调度方法。
处理器1102还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请各个实施例的方法的各个步骤可以通过处理器1102中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器1102还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1101,处理器1102读取存储器1101中的信息,结合其硬件完成本申请实施例中各个方法所需执行的功能,例如,可以执行图2至图4所示实施例的各个步骤/功能。
通信接口1103可以使用但不限于收发器一类的收发装置,来实现装置1100与其他设备或通信网络之间的通信。
总线1104可以包括在装置1100各个部件(例如,存储器1101、处理器1102、通信接口1103)之间传送信息的通路。
应理解,本申请实施例所示的装置1100可以是电子设备,或者,也可以是配置于电子设备中的芯片。
应理解,本申请实施例中的处理器可以为中央处理单元(central processing unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器 (read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质。半导体介质可以是固态硬盘。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系,但也可能表示的是一种“和/或”的关系,具体可参考前后文进行理解。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (30)

  1. 一种调度方法,其特征在于,应用于虚拟机系统,所述虚拟机系统包含多个虚拟机,所述多个虚拟机共享处理器的算力,所述方法包括:
    所述处理器根据第一配置信息为所述多个虚拟机中的每个虚拟机分配时间片,所述第一配置信息用于指示所述多个虚拟机中每个虚拟机的算力比例;
    在所述多个虚拟机中的第一虚拟机中存在所述时间片,且第一任务正在执行时,有第二任务到达,所述处理器停止执行所述第一任务,且调度执行所述第二任务,所述第二任务所属的执行序列的优先级高于所述第一任务所属的执行序列的优先级。
  2. 根据权利要求1所述的方法,其特征在于,所述多个虚拟机中的第一虚拟机包括n个执行序列,n为大于或等于0的整数;
    所述方法还包括:
    所述处理器根据所述第一配置信息和第二配置信息为所述n个执行序列中的s个执行序列分配所述时间片,所述第二配置信息用于指示所述第一虚拟机中的所述n个执行序列的优先级,s为小于或等于n的整数。
  3. 根据权利要求2所述的方法,其特征在于,所述第一虚拟机中的执行序列的优先级包含普通类型的优先级和实时类型的优先级,所述实时类型的优先级高于所述普通类型的优先级;
    其中,所述s个执行序列的优先级为普通类型的优先级。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述方法还包括:
    所述处理器根据所述第一配置信息和第三配置信息为所述每个虚拟机分配所述时间片,所述第三配置信息用于指示所述处理器为所述多个虚拟机分配时间片的周期长度。
  5. 根据权利要求4所述的方法,其特征在于,所述处理器为所述多个虚拟机分配时间片的周期长度大于或等于优先级为实时类型的优先级的执行序列的执行周期的周期长度。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法还包括:
    所述处理器接收所述第一配置信息。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    在所述多个虚拟机中的第一虚拟机中存在时间片和待执行的x个第一执行序列和q个第二执行序列时,所述处理器优先调度所述q个第二执行序列中的任务,所述第一执行序列的优先级为普通类型的优先级,所述第二执行序列的优先级为实时类型的优先级,x和q为大于或等于0的整数。
  8. 根据权利要求7所述的方法,其特征在于,所述处理器优先调度所述q个第二执行序列中的任务,包括:
    所述q个第二执行序列中实时类型的优先级还包括第一实时优先级和第二实时优先级,所述第一实时优先级的优先级高于所述第二实时优先级;
    所述处理器优先调度所述q个第二执行序列中第一实时优先级的任务。
  9. 根据权利要求7或8所述的方法,其特征在于,所述第一任务属于所述x个第一执行序列中的任意一个执行序列,所述第二任务属于所述q个第二执行序列中的任意一个执行序列。
  10. 根据权利要求9所述的方法,其特征在于,若所述x个第一执行序列中存在时间片分配比例,则在所述q个第二执行序列中的任务执行完成后,所述处理器根据所述时间片分配比例将剩余的时间片分配给所述x个第一执行序列,所述剩余的时间片为所述处理器为所述x个第一执行序列分配的时间片的总和与所述q个第二执行序列中的任务执行所使用的时间片的差值。
  11. 根据权利要求7或8所述的方法,其特征在于,所述第一任务和所述第二任务属于所述q个第二执行序列中不同的执行序列,且所述第二任务所属执行序列的优先级高于所述第一任务所属执行序列的优先级。
  12. 根据权利要求9至11中任一项所述的方法,其特征在于,所述方法还包括:
    所述处理器获取所述第一任务的总执行时间和当前执行的时间;
    所述处理器根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;
    在所述第一任务的剩余执行时间小于预设阈值时,所述处理器执行所述第一任务。
  13. 根据权利要求9至11中任一项所述的方法,其特征在于,所述方法还包括:
    所述处理器获取所述第一任务的总执行时间和当前执行的时间;
    所述处理器根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;
    在所述第一任务的剩余执行时间小于所述第一任务的预设切换备份时间与预设倍数的乘积时,所述处理器执行所述第一任务。
  14. 根据权利要求9至11中任一项所述的方法,其特征在于,所述方法还包括:
    所述处理器获取所述第一任务的总执行时间和当前执行的时间;
    所述处理器根据所述第一任务的总执行时间和所述当前执行的时间,计算所述第一任务的剩余执行时间;
    在所述第一任务的剩余执行时间大于或等于预设阈值时,所述处理器停止执行所述第一任务。
  15. 根据权利要求9至13中任一项所述的方法,其特征在于,所述方法还包括:
    将所述第一任务的执行信息存储至备份内存单元中。
  16. 根据权利要求15所述的方法,其特征在于,所述第一任务的执行信息包括以下信息的一种或多种:执行所述第一任务的逻辑运算单元的通用寄存器、专用寄存器、内部高速缓冲存储器、缓冲区中的数据。
  17. 根据权利要求15或16所述的方法,其特征在于,所述处理器中的备份内存单元的数量满足以下关系式:
    L=e*g
    其中,L表示所述处理器中的备份内存单元的数量,e表示所述处理器中的逻辑运算单元的数量,g表示实时类型的优先级的等级个数。
  18. 根据权利要求1至17中任一项所述的方法,其特征在于,所述方法还包括:
    当所述第二任务执行完成时,所述处理器执行未完成的所述第一任务。
  19. 根据权利要求7至18中任一项所述的方法,其特征在于,所述方法还包括:
    在所述第一虚拟机中没有时间片,且存在有待执行的执行序列时,所述处理器调度除所述第一虚拟机外的其他存在时间片的虚拟机执行所述第一虚拟机中的执行序列的任务。
  20. 根据权利要求1至19中任一项所述的方法,其特征在于,所述处理器根据第一配置信息为所述每个虚拟机分配的时间片满足如下关系式:
    Y=1000*1000/t*m*p
    其中,Y表示为所述每个虚拟机分配的时间片;t表示调度周期,在每个所述调度周期,所述处理器为所述每个虚拟机分配所述时间片;m表示所述处理器中逻辑运算单元的数量;p表示所述每个虚拟机的算力比例。
  21. 一种调度方法,其特征在于,应用于虚拟机系统,所述虚拟机系统包含主机,所述方法包括:
    所述主机获取虚拟机配置信息,所述虚拟机配置信息用于指示创建多个虚拟机,所述虚拟机配置信息包括所述多个虚拟机中每个虚拟机的算力比例;
    所述主机根据所述虚拟机配置信息,创建所述多个虚拟机,所述多个虚拟机共享处理器的算力;
    所述主机向所述处理器发送第一配置信息,所述第一配置信息用于指示所述多个虚拟机中每个虚拟机的算力比例。
  22. 根据权利要求21所述的方法,其特征在于,所述方法还包括:
    所述主机获取多个模型;
    所述主机根据所述多个模型中的第一模型为所述多个虚拟机中的第一虚拟机创建n个执行序列,n为大于或等于0的整数;
    所述主机为所述n个执行序列中的每个执行序列配置优先级;
    向所述处理器发送第二配置信息,所述第二配置信息用于指示所述第一虚拟机中的n个执行序列的优先级。
  23. 根据权利要求22所述的方法,其特征在于,所述优先级包含普通类型的优先级和实时类型的优先级,所述实时类型的优先级高于所述普通类型的优先级。
  24. 根据权利要求21至23中任一项所述的方法,其特征在于,所述虚拟机配置信息还包括为所述多个虚拟机分配时间片的周期长度;
    所述方法还包括:
    所述主机向所述处理器发送第三配置信息,所述第三配置信息用于指示为所述多个虚拟机分配时间片的周期长度。
  25. 根据权利要求24所述的方法,其特征在于,所述为所述多个虚拟机分配时间片的周期大于或等于优先级为实时类型的优先级的执行序列的执行周期。
  26. 一种调度装置,其特征在于,包括:存储器和处理器;
    所述存储器用于存储程序指令;
    所述处理器用于调用所述存储器中的程序指令执行如权利要求1至20中任一项所述的方法或权利要求21至25中任一项所述的方法。
  27. 一种芯片,其特征在于,包括至少一个处理器和通信接口,所述通信接口和所述至少一个处理器通过线路互联,所述至少一个处理器用于运行计算机程序或指令, 以执行如权利要求1至20中任一项所述的方法或权利要求21至25中任一项所述的方法。
  28. 一种计算机可读介质,其特征在于,所述计算机可读介质存储用于计算机执行的程序代码,该程序代码包括用于执行如权利要求1至20中任一项所述的方法或权利要求21至25中任一项所述的方法的指令。
  29. 一种计算机程序产品,其特征在于,所述计算机程序产品包括指令,当所述指令被执行时,使得计算机执行权利要求1至20中任一项所述的方法或权利要求21至25中任一项所述的方法。
  30. 一种车辆,其特征在于,所述车辆包括如权利要求26所述的调度装置或如权利要求27所述的芯片。
PCT/CN2022/096455 2022-05-31 2022-05-31 调度方法及相关装置 WO2023230909A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/096455 WO2023230909A1 (zh) 2022-05-31 2022-05-31 调度方法及相关装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/096455 WO2023230909A1 (zh) 2022-05-31 2022-05-31 调度方法及相关装置

Publications (1)

Publication Number Publication Date
WO2023230909A1 true WO2023230909A1 (zh) 2023-12-07

Family

ID=89026707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096455 WO2023230909A1 (zh) 2022-05-31 2022-05-31 调度方法及相关装置

Country Status (1)

Country Link
WO (1) WO2023230909A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356817B1 (en) * 2000-03-31 2008-04-08 Intel Corporation Real-time scheduling of virtual machines
US20120216193A1 (en) * 2011-02-21 2012-08-23 Samsung Electronics Co., Ltd. Apparatus and method for controlling virtual machine schedule time
CN109857542A (zh) * 2018-12-14 2019-06-07 贵州华芯通半导体技术有限公司 算力资源调节方法、系统及装置
US20200326980A1 (en) * 2017-10-10 2020-10-15 Opensynergy Gmbh Control Unit, Method for Operating A Control Unit, Method for Configuring A Virtualization System of A Control Unit
CN114237818A (zh) * 2021-12-01 2022-03-25 科东(广州)软件科技有限公司 虚拟机间共享资源的方法、系统、计算设备及存储介质
CN114327843A (zh) * 2020-09-29 2022-04-12 华为技术有限公司 任务调度方法及装置


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944245

Country of ref document: EP

Kind code of ref document: A1