WO2013131340A1 - Method and apparatus for scheduling multiple processors of a system-on-chip (SOC) - Google Patents

Method and apparatus for scheduling multiple processors of a system-on-chip (SOC)

Info

Publication number
WO2013131340A1
WO2013131340A1 PCT/CN2012/077537
Authority
WO
WIPO (PCT)
Prior art keywords
task
cpu
cpus
slave
soc
Prior art date
Application number
PCT/CN2012/077537
Other languages
English (en)
French (fr)
Inventor
王翔宇
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Priority to EP12870664.5A priority Critical patent/EP2824569A4/en
Priority to US14/383,203 priority patent/US20150121391A1/en
Publication of WO2013131340A1 publication Critical patent/WO2013131340A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/504Resource capping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of communications, and in particular to a multi-processor scheduling method and apparatus for a system-on-a-chip.
  • BACKGROUND At present, multiprocessor systems are widely applied, but the related art provides no clear way to combine groups of homogeneous/heterogeneous processors (Central Processing Units, CPUs) into a whole to complete batch tasks.
  • The most commonly used parallel processing approach is the Symmetric Multi-Processing (SMP) system (as shown in FIG. 1), which targets multiple homogeneous processors and, having solved parallel technologies such as cache coherency and memory coherency, shares all peripheral devices.
  • SMP Symmetrical Multi-Processing
  • Peripheral devices, such as memory, external interrupts, and external devices, are shared among the processors.
  • The software system can use an SMP-capable operating system, such as Linux or Windows, to load and execute tasks.
  • The operating system divides a task into multiple subtasks and dynamically dispatches them to suitable target processors for execution.
  • Another widely used parallel processing mode is the computer cluster (as shown in FIG. 2), in which each individual computer acts as a single node in the overall system.
  • A separate computer, or one computer in the network, automatically distributes tasks over the network to the other computers; after execution completes, each computer feeds information back to the distributing computer, ending the task execution.
  • FIG. 3 is a schematic diagram of a SOC multi-core scheduling architecture according to the related art.
  • SOC system on chip
  • CLUSTER CPU communication speed in the cluster
  • If the CPUs within a cluster communicate quickly, multiple homogeneous CPUs can be grouped as one cluster (homogeneous CPUs are recommended to form a cluster, although heterogeneous ones are also supported in special cases), which can coexist with CPU clusters of other architectures while sharing all external memory and peripherals.
  • FIG. 4 is a schematic diagram of a SOC parallel computing architecture according to the related art. As shown in FIG. 4, the SOC system can obtain a task stream from the outside, and the task stream can include multiple binary execution codes compiled for different processor architectures.
  • The code can execute dynamically according to the number of processors allocated, can communicate with any computer in the allocated processor group, and supports error reporting and final result feedback.
  • Code writing rules can conform to industry multiprocessor programming standards, such as the Message Passing Interface (MPI) standard.
  • MPI Message Passing Interface
  • A processing scheme is provided in the related art in which the master operating system can monitor some behaviors of the slave operating system and can send commands to adjust its current actions, but cannot implement task scheduling.
  • Another scheme in the related art mainly concerns transaction-level/thread-level detailed scheduling policy processing together with an MPI multiprocessor scheduling method. It can thus be seen that, in the SOC-related art, task scheduling within a homogeneous/heterogeneous processor group cannot be implemented with the processor as the basic scheduling unit.
  • a method for scheduling a multi-processor of a system-on-chip SOC including: after receiving a task to be executed, a main CPU of the SOC acquires dynamic execution parameters of the task; Determining, by the CPU, one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameter; the primary CPU schedules one or more slave CPUs to perform the task according to the task assignment scheme.
  • the dynamic execution parameter includes: a CPU type that performs the task; and the main CPU determines, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameter, The method includes: assigning the task to one or more slave CPUs of the one or more slave CPUs that are currently available to the SOC and corresponding to the CPU type.
  • Preferably, the dynamic execution parameters further include a maximum number of CPUs executed in parallel; determining, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters then includes assigning the task to one or more currently available slave CPUs corresponding to the CPU type, where the number of assigned slave CPUs is not greater than the maximum number of CPUs.
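The allocation rule in the bullets above can be sketched as a minimal illustration. The names `SlaveCpu` and `allocate_for_task`, and the ARM/DSP labels, are assumptions for illustration, not identifiers from the patent:

```python
# Hedged sketch of assigning a task to idle slave CPUs that match the task's
# CPU type, capped at the task's maximum number of parallel CPUs.
from dataclasses import dataclass

@dataclass
class SlaveCpu:
    cpu_id: int
    cpu_type: str      # e.g. "ARM" or "DSP" (illustrative type labels)
    busy: bool = False

def allocate_for_task(available_slaves, cpu_type, max_parallel=None):
    """Pick idle slave CPUs of the required type, capped at max_parallel."""
    matching = [c for c in available_slaves
                if not c.busy and c.cpu_type == cpu_type]
    if max_parallel is not None:
        matching = matching[:max_parallel]
    return matching

slaves = ([SlaveCpu(i, "ARM") for i in range(4)] +
          [SlaveCpu(i + 4, "DSP") for i in range(4)])
chosen = allocate_for_task(slaves, "DSP", max_parallel=3)
print([c.cpu_id for c in chosen])  # [4, 5, 6]
```

A real main CPU would also mark the chosen CPUs busy and record the mapping for later release; this sketch only shows the selection step.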
  • the master CPU schedules one or more slave CPUs to execute the task according to the task allocation scheme, including: the master CPU selects one slave CPU as a virtual master CPU from a plurality of the slave CPUs, and The task is distributed to the selected virtual main CPU; the selected virtual main CPU schedules a plurality of CPUs in the slave CPU to perform the task.
  • Preferably, the selected virtual master CPU scheduling the plurality of slave CPUs to execute the task includes: the selected virtual master CPU receiving the result of executing the task fed back by each slave CPU; and the selected virtual master CPU summarizing the results fed back by the slave CPUs and feeding the summary back to the main CPU.
  • the dynamic execution parameter further includes: a maximum execution time of the task; the method further includes: in a case that the result summary is not received after the maximum execution time is exceeded, the main CPU notifies The slave CPU executing the task stops executing the task and releases the CPU resources occupied by the task.
  • the plurality of slave CPUs comprise: slave CPUs belonging to the same CPU cluster.
  • In the case where the dynamic execution parameters include the CPU type that executes the task, the determining module is further configured to allocate the task to one or more of the slave CPUs currently available in the SOC that correspond to the CPU type.
  • In the case where the dynamic execution parameters include the maximum number of CPUs executed in parallel, the determining module is further configured to allocate the task to one or more currently available slave CPUs corresponding to the CPU type, wherein the number of the one or more slave CPUs is not greater than the maximum number of CPUs.
  • the plurality of slave CPUs determined by the determining module comprise: slave CPUs belonging to the same CPU cluster.
  • Scheduling one or more slave CPUs to execute the task according to the task allocation scheme implements multiprocessor scheduling with the processor as the basic scheduling unit.
  • BRIEF DESCRIPTION OF THE DRAWINGS The accompanying drawings described herein are provided for a further understanding of the present invention and constitute a part of this application; the exemplary embodiments of the present invention and their description serve to explain the present invention and do not unduly limit it. In the drawings: FIG. 1 is a schematic diagram of an SMP multiprocessor architecture according to the related art; FIG. 2 is a schematic diagram of a computer cluster architecture according to the related art; FIG. 3 is a schematic diagram of a SOC multi-core scheduling architecture according to the related art; FIG. 4 is a schematic diagram of a SOC parallel computing architecture according to the related art;
  • FIG. 5 is a flowchart of a multiprocessor scheduling method of a system-on-chip SOC according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a modification of an executable task according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of the slave CPU summary mode according to an embodiment of the present invention
  • FIG. 8 is a schematic diagram of the MAIN CPU interacting with other CLUSTER CPUs according to an embodiment of the present invention
  • FIG. 9 is a multiprocessor scheduling apparatus for a system-on-chip SOC according to an embodiment of the present invention
  • a scheduling method for a multi-processor of a system-on-chip SOC is provided, which can implement scheduling of a multi-processor of an SOC.
  • FIG. 5 is a flowchart of a method for scheduling multiple processors of a system-on-chip (SOC) according to an embodiment of the present invention. As shown in FIG. 5, the method may include the following steps (step S502-step S506): Step S502, after the main processor (CPU) of the system-on-chip (SOC) receives a task to be executed, the dynamic execution parameters of the task are acquired. Step S504, the main CPU determines, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters.
  • Step S506 the main CPU schedules one or more slave CPUs to perform the above tasks according to the task assignment scheme described above.
  • Through the embodiment of the present invention, after receiving the task to be executed, the main CPU of the SOC acquires the dynamic execution parameters of the task, determines a task allocation scheme that satisfies the dynamic execution parameters according to one or more slave CPUs currently available in the SOC, and schedules one or more slave CPUs to execute the task according to the determined scheme, thereby implementing multiprocessor scheduling with the processor as the basic scheduling unit.
  • A heterogeneous SOC system includes different types of processors, and different tasks correspond to different CPU types.
  • Therefore, in a preferred embodiment, the dynamic execution parameters may include the CPU type that executes the task; in this case, when the main CPU determines, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters, the task may be assigned to one or more of the currently available slave CPUs corresponding to the CPU type of the task.
  • the main CPU in the SOC can assign the tasks to be executed to the currently available slave CPUs in the SOC.
  • In practical applications, the number of CPUs that can be allocated to each task may be a fixed number, a dynamically variable number, or unlimited. Therefore, in another preferred embodiment, the dynamic execution parameters may further include a maximum number of CPUs executed in parallel; in this case, when the main CPU determines the task allocation scheme according to the slave CPUs currently available in the SOC, the task may be assigned to one or more currently available slave CPUs corresponding to the CPU type, the number of which is not greater than the maximum number of CPUs executed in parallel.
  • efficient scheduling of multiple processors in a heterogeneous system is achieved.
  • In the SOC system, it is generally recommended to combine multiple homogeneous processors into a cluster (CLUSTER).
  • The communication speed between CPUs belonging to the same cluster is faster than between CPUs of different clusters, so CPUs in the same cluster also process tasks faster.
  • Therefore, in yet another preferred embodiment, when the main CPU determines a task allocation scheme that satisfies the dynamic execution parameters according to one or more slave CPUs currently available in the SOC, the task may be assigned to multiple slave CPUs belonging to the same cluster. For example, in a SOC system in which every four consecutive CPUs form a cluster, after a task to be executed is received, the maximum number of CPUs for executing the task in parallel is determined according to the acquired dynamic execution parameters; for efficiency, the task can then be distributed to four slave CPUs belonging to the same cluster.
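The cluster-affinity preference above can be sketched as follows, assuming the four-CPUs-per-cluster layout of the example; `pick_cluster` and the id-based cluster mapping are illustrative assumptions, not the patent's implementation:

```python
# Hedged sketch: place a task inside a single cluster whenever one cluster
# has enough idle CPUs, falling back to spanning clusters otherwise.
CLUSTER_SIZE = 4

def pick_cluster(idle_cpus, needed, cluster_size=CLUSTER_SIZE):
    """Return `needed` idle CPU ids, drawn from one cluster if possible."""
    by_cluster = {}
    for cpu in idle_cpus:
        by_cluster.setdefault(cpu // cluster_size, []).append(cpu)
    for members in by_cluster.values():
        if len(members) >= needed:
            return sorted(members)[:needed]
    # no single cluster suffices: span clusters
    return sorted(idle_cpus)[:needed]

# CPUs 0-3 form cluster 0, 4-7 cluster 1, 8-11 cluster 2; four CPUs are busy.
idle = [2, 3, 4, 5, 6, 7, 10, 11]
print(pick_cluster(idle, 4))  # [4, 5, 6, 7] - cluster 1 hosts the whole task
```

The intra-cluster preference matters because, as the text notes, CPUs in one cluster communicate faster than CPUs in different clusters.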
  • After determining the task allocation scheme, the main CPU may schedule one or more slave CPUs to execute the task according to the determined scheme.
  • In an embodiment of the present invention, the slave CPUs execute the task in the slave CPU summary mode: the master CPU selects one slave CPU (referred to as the virtual master CPU) from the plurality of slave CPUs and distributes the task to it, and the selected virtual master CPU schedules the plurality of slave CPUs to execute the task.
  • The virtual master CPU is preferably the slave CPU that communicates fastest with the other determined slave CPUs, so that task execution by the slave CPUs is more efficient.
  • the selected slave CPU schedules a plurality of slave CPUs to execute tasks, and each slave CPU executes the distributed tasks in parallel, and returns the result of executing the tasks to the selected slave CPU.
  • the selected slave CPU receives the results of the execution tasks fed back from the CPU, and summarizes the results fed back from the CPU and feeds back to the main CPU.
  • the main CPU receives the result summary of the selected slave CPU and outputs the execution result of the task.
  • the maximum execution time of the task may be set.
  • the dynamic execution parameter may also include the maximum execution time of the task.
  • Embodiment 2: Taking the SOC multiprocessor architecture shown in FIG. 3 and the parallel computing architecture for multiple task streams of the SOC system shown in FIG. 4 as an example, the scheduling method and processing flow of the multi-core parallel computing system are described.
  • a separate processor primary CPU acts as a scheduling processor that receives the task stream and feeds back the task results.
  • The method provided by the embodiment of the present invention can be applied to a SOC system, and can also be applied to a multi-computer cluster environment composed of multiple homogeneous and heterogeneous computer groups.
  • The main CPU (MAIN CPU) receives a task and assigns it to the corresponding computer group; the computer group processes the assigned task in parallel and feeds the execution result back to the MAIN CPU, which outputs the task execution result. All scheduling work is likewise done by the MAIN CPU.
  • The processor acts as the basic unit of scheduling: the MAIN CPU gets a task and assigns it to different slave CPUs.
  • A virtual processor group is allocated for each task, and there is a correspondence between the virtual processor group and the actual processor group.
  • A built-in SOC multi-core system contains one main CPU; all other CPUs are called slave CPUs. Both the main CPU and the slave CPUs can access memory in the same address space, which makes it convenient to hand tasks to the slave CPUs.
  • All tasks that need to be loaded are stored in binary form, and may include the priority of the task (whether it is preferentially scheduled), the maximum number of processors that can execute it in parallel (a fixed number or unlimited), the maximum execution time (the task may be deprived after this time), the target processor type (the target cluster to which it is loaded), and a dynamic data area (dynamic information such as the number of processors actually allocated).
  • All tasks that need to be loaded are written in accordance with a multiprocessor programming specification (such as MPI) and are transformed to support parallel scheduling operations.
  • The transformation of executable tasks is shown in FIG. 6: for example, adding inter-CPU communication functions, adding a function for obtaining the current CPU ID, and so on.
  • When compiling, the program needs to be linked with the related multi-core library.
  • The library may, for example, be named "libmcore.a".
  • The program is linked with this library to finally generate the object file.
  • All tasks that need to be loaded also store dynamic execution parameters at a fixed location, for example, how many CPU cores to run on, or other parameters. These parameters need to be placed at a specified location, such as DS:0x100 with a length of 512 bytes, by command line or other means, so that the dynamic parameters are written into the task's execution space when the task is actually loaded.
  • All the processor groups visible to the executing slave CPUs are virtual CPU groups, which have a certain correspondence with the actual physical CPUs.
  • The main CPU dynamically allocates the corresponding physical CPUs according to the nature of the task, and the virtual processors must correspond to the physical ones.
  • a processor programming specification (such as MPI) is used for inter-task communication, which actually involves communication between multiple virtual processors.
  • MPI processor programming specification
  • the virtual processor is associated with the actual physical processor.
  • each consecutive 4 CPUs are in one cluster, 4 CPUs are in use, 8 CPU resources are idle, Task 0 can only run on one CPU, and Task 1 can run on 3 CPUs at most.
  • Task 2 is not limited to the number of CPUs executed.
  • The main CPU allocates the task to the appropriate processors according to the priority of the task and the processor type to which the task belongs.
  • the allocation of the physical processor can refer to the above allocation method.
  • What the application actually sees is the virtual CPU group, which hides the details of the physical CPUs.
  • the main CPU allocates a task to the virtual CPU group, it can use a slave CPU in the assigned virtual slave CPU group as the virtual master CPU.
  • The virtual master CPU is not necessarily the first CPU in the group; preferably, it is the one positioned for the fastest communication with the other processors of the group.
  • Virtual slave CPU0 is generally considered to be the master CPU in the virtual slave CPU group (unlike the main CPU of the entire architecture, it can be called the virtual master CPU).
  • The task scheduling and execution mode of the virtual slave CPUs is the slave CPU summary mode; the virtual master CPU is hereinafter referred to as logical CPU0.
  • the flow from the CPU summary mode is described in detail below.
  • The slave CPU summary method mainly selects one slave CPU in the slave CPU group to act as the master and perform the summary work. That is, one of the multiple slave CPUs is viewed as a master CPU relative to the other slave CPUs, and it assists with the task assignment and data statistics functions.
  • When the program is executed, a synchronization mechanism needs to be added to the written code; therefore, the execution efficiency of the slave CPUs cannot be maximized, since at least one CPU must wait for the other CPUs' tasks to complete before finally feeding the result back to the main CPU.
  • The master CPU within the slave CPU group is assumed to be logical CPU0. Although this method lacks the high efficiency of the main-CPU scheduling mode, it lightens the burden on the main CPU, and the work of unifying the results is also performed within the slave CPU group. From the logical implementation point of view, the slave CPU summary method is more practicable than the main-CPU scheduling method. For example, to calculate 1 + 2 + ... + 100, the task can be decomposed into the four different programs below, running on four different CPUs.
  • FIG. 7 is a schematic diagram of the slave CPU summary mode according to an embodiment of the present invention. As shown in FIG. 7, logical CPU0 executes "1 + 2 + ... + 25"; logical CPU1 executes "26 + 27 + ... + 50" and reports the result 950 to logical CPU0; logical CPU2 executes "51 + 52 + ... + 75" and reports the result 1575 to logical CPU0; logical CPU3 executes "76 + 77 + ... + 100" and reports the result 2200 to logical CPU0. After logical CPU0 obtains all the results, it summarizes them and reports the final result to the main CPU. The main CPU directly outputs the final result "5050", completing the execution of this task.
  • FIG. 8 is a schematic diagram of the interaction between the MAIN CPU and the other CLUSTER CPUs according to an embodiment of the present invention. As shown in FIG. 8, any CPU can periodically feed back information to the main CPU. Depending on the execution status of the task flow, when a task exceeds its maximum runtime, the main CPU can deprive the task and release the processor resources it occupies.
  • After the task is executed, the main CPU outputs the running result and releases the resources occupied by the task. In practical applications, as long as there are waiting task flows and available processor resources, the main CPU cycles through all the scheduling work. In the embodiment of the present invention, CPU mapping and priority processing are relatively easy to implement.
  • The embodiment of the present invention provides the idea and method of dynamically linking a multi-core communication library into a task, embedding dynamic parameters, and scheduling according to the slave CPU summary manner; however, it is not limited to the above implementation examples and should also cover other similar dynamic processor scheduling use cases.
  • FIG. 9 is a structural block diagram of a multiprocessor scheduling apparatus for a system-on-chip SOC according to an embodiment of the present invention. As shown in FIG. 9, the apparatus may include: an obtaining module 10, a determining module 20, and a scheduling module 30.
  • The obtaining module 10 is configured to acquire the dynamic execution parameters of a task after the main processor (CPU) of the system-on-chip (SOC) receives the task to be executed; the determining module 20 is coupled to the obtaining module 10 and configured to determine, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters; the scheduling module 30 is coupled to the determining module 20 and arranged to schedule one or more slave CPUs to execute the task according to the task allocation scheme.
  • CPU main processor
  • SOC system on chip
  • the main CPU of the SOC after receiving the task to be executed, acquires the dynamic execution parameter of the task, and determines a task allocation scheme that satisfies the dynamic execution parameter according to one or more slave CPUs currently available in the SOC, and Dispatching one or more slave CPUs to perform the above tasks according to the determined task allocation scheme implements multiprocessor scheduling with the processor as the basic scheduling unit.
  • different types of processors are included, and different tasks correspond to different CPU types. For example, for certain tasks, only ARM can be executed, some tasks can only be executed by the DSP, and some tasks can be executed by both ARM and DSP.
  • In the case where the dynamic execution parameters include the CPU type that executes the task, the determining module 20 is further configured to assign the task to one or more of the slave CPUs currently available in the SOC that correspond to the CPU type.
  • the main CPU in the SOC can assign the tasks to be executed to the currently available slave CPUs in the SOC.
  • In practical applications, the number of CPUs that can be allocated to each task may be a fixed number, a dynamically variable number, or unlimited. Therefore, in another preferred embodiment, in the case where the dynamic execution parameters include the maximum number of CPUs executed in parallel, the determining module 20 is further configured to assign the task to one or more currently available slave CPUs corresponding to the CPU type, the number of which is not greater than the maximum number of CPUs executed in parallel.
  • the processor is scheduled according to the maximum number of CPUs that the task executes in parallel.
  • multiple isomorphic processors can be merged together to form a cluster.
  • the communication speed between CPUs belonging to the same cluster is faster than that between different clusters of CPUs.
  • CPUs in the same cluster also process tasks faster. Therefore, in still another preferred embodiment, when determining a task allocation scheme that satisfies the dynamic execution parameters according to one or more slave CPUs currently available in the SOC, the determining module 20 may assign the task to multiple slave CPUs of the same cluster. For example, in a SOC system, every four consecutive CPUs form a cluster.
  • After the task allocation scheme is determined, the scheduling module 30 may schedule one or more slave CPUs to execute the task according to the determined scheme.
  • In an embodiment of the present invention, the slave CPUs execute the task in the slave CPU summary mode: the scheduling module 30 selects one slave CPU from the plurality of slave CPUs and distributes the task to it, and the selected slave CPU schedules the plurality of slave CPUs to execute the task.
  • the selected slave CPU schedules a plurality of slave CPUs to execute tasks, and each slave CPU executes the distributed tasks in parallel, and returns the result of executing the tasks to the selected slave CPU.
  • the selected slave CPU receives the results of the execution tasks fed back from the CPU, and summarizes the results fed back from the CPU and feeds back to the main CPU.
  • the main CPU receives the result summary of the selected slave CPU and outputs the execution result of the task.
  • the maximum execution time of the task may be set.
  • the dynamic execution parameter may also include the maximum execution time of the task. At this time, if the result summary is not received after the maximum execution time of the task is exceeded, the main CPU notifies the slave CPU that executes the task to stop executing the task, and releases the CPU occupied by the task. Resources.
  • Through the above embodiments, the present invention achieves the following technical effects: after receiving the task to be executed, the main CPU of the SOC acquires the dynamic execution parameters of the task, determines a task allocation scheme that satisfies the dynamic execution parameters according to one or more slave CPUs currently available in the SOC, and schedules one or more slave CPUs to execute the task according to the determined scheme, implementing multiprocessor scheduling with the processor as the basic scheduling unit. By assigning a task to one or more currently available slave CPUs corresponding to the task's CPU type, scheduling of multiple processors in a heterogeneous SOC system is realized, and CPUs of the required type can be scheduled for the task to be executed.

Abstract

A method and apparatus for scheduling multiple processors of a system-on-chip (SOC) are disclosed. The method includes: after a main processor (CPU) of the system-on-chip (SOC) receives a task to be executed, acquiring dynamic execution parameters of the task (S502); the main CPU determining, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters (S504); and the main CPU scheduling one or more slave CPUs to execute the task according to the task allocation scheme (S506). This scheme implements multiprocessor scheduling of the SOC.

Description

片上系统 SOC的多处理器的调度方法及装置 技术领域 本发明涉及通信领域, 具体而言, 涉及一种片上系统 soc的多处理器的调度方法 及装置。 背景技术 目前, 多处理器系统已经得到了充分的应用, 但是怎样将多个同构 /异构的处理器 (Central Processing Unit, 简称为 CPU)群合并在一起作为一个整体来完成批量任务, 相关技术中并没有明确的处理方式。 目前使用最多的并行处理方式是采用并行多处理器 ( Symmetrical Multi-Processing, 简称为 SMP) 系统 (如图 1所示), 也就是针对多个同构的处理器, 其在解决了缓存 (Cache) —致性、 内存 (Memory) —致性等并行技术的前提下, 共 享所有的外围设备, 例如内存 /外部中断 /外部设备等。 这样软件系统可以选用 Linux/Windows等支持 SMP的操作系统来加载任务并执行。 由操作系统将任务划分成 多个子任务, 并动态调度到合适的目标处理器上加载执行。 另外一种使用较多的并行处理模式为计算机集群方式(如图 2所示), 也就是每个 独立的计算机都作为整个系统中的单一节点。 由另外的计算机或者网络中的某台计算 机通过网络自动分发任务给其他的计算机, 所有的计算机执行完成后都反馈信息给这 台分发计算机, 并结束任务执行。 图 3是根据相关技术的 SOC多核调度架构的示意图, 在如图 3所示的片上系统 ( System on Chip, 简称为 SOC)中, 假如簇(CLUSTER)内的 CPU通讯速度比较快, 那么就可以将多个同构的 CPU作为一个簇 (建议是同构的 CPU组成一个簇, 特别情 况下异构也支持)存在, 其可以和其他架构的 CPU簇并存, 共享全部的外部内存和外 设。 图 4是根据相关技术的 SOC并行计算构架的示意图, 如图 4所示, SOC系统可 以从外部得到任务流(Task Stream), 该任务流可以包含多个按照不同架构处理器类型 进行编译而产生的 2进制执行代码。 该代码可以自动按照所分配的处理器数量来动态 执行, 并可以和所分配的处理器组内任何计算机进行通讯, 并具备错误报告和最终反 馈结果的功能。 代码编写规则可以符合业内多处理器编程标准, 例如, 消息传递接口 (Message Passing Interface, 简称为 MPI) 标准。 相关技术中提供了一种处理方案, 在该处理方案中, 主操作系统可以对从操作系 统的一些行为进行监控, 并可以给其发送命令使其调整当前的动作, 而无法实现任务 调度。 又例如在相关技术中提供的另一种处理方案中, 其主要是事务级 /线程级的细节 调度策略处理, 与采用 MPI多处理器调度方式方法。 由此可知, 在 SOC相关技术中, 无法以处理器为基本调度单元实现同构 /异构的 处理器群中的任务调度。 发明内容 针对相关技术,无法在 SOC系统中实现以处理器为基本调度单元执行任务调度的 问题, 本发明提供了一种片上系统 SOC的多处理器的调度方法及装置, 以至少解决上 述问题。 根据本发明的一个方面,提供了一种片上系统 SOC的多处理器的调度方法,包括: SOC的主 CPU接收到需要执行的任务后,获取所述任务的动态执行参数;所述主 CPU 根据所述 SOC中当前可用的一个或多个从 CPU, 确定满足所述动态执行参数的任务 分配方案;所述主 CPU按照所述任务分配方案调度一个或多个从 CPU执行所述任务。 优选地, 所述动态执行参数, 包括: 执行所述任务的 CPU类型; 所述主 CPU根 据所述 SOC中当前可用的一个或多个从 CPU, 确定满足所述动态执行参数的任务分 配方案, 包括: 将所述任务分配给所述 SOC当前可用的一个或多个从 CPU中与所述 CPU类型对应的一个或多个从 CPU。 优选地, 所述动态执行参数, 还包括: 并行执行的最大 CPU个数; 所述主 CPU 根据所述 SOC中当前可用的一个或多个从 CPU, 确定满足所述动态执行参数的任务 分配方案, 包括: 将所述任务分配给所述 SOC当前可用的一个或多个从 CPU中与所 述 CPU类型对应的一个或多个从 CPU, 其中, 所述一个或多个从 CPU的数量不大于 所述最大 CPU个数。 优选地, 所述主 CPU按照所述任务分配方案调度一个或多个从 CPU执行所述任 务, 包括: 所述主 CPU从多个所述从 CPU中选择一个从 CPU作为虚拟主 CPU, 并将 所述任务分发给选择的虚拟主 CPU; 所述选择的虚拟主 CPU调度多个所述从 CPU内 的 CPU执行所述任务。 优选地, 所述选择的虚拟主 CPU调度多个所述从 CPU内的 CPU执行所述任务, 包括: 
the selected virtual master CPU receiving, from each slave CPU, the result of executing the task; and the selected virtual master CPU aggregating the results fed back by the slave CPUs and feeding the aggregate back to the master CPU.

Preferably, the dynamic execution parameters further include the maximum execution time of the task, and the method further includes: when the aggregated result has not been received after the maximum execution time has elapsed, the master CPU notifying the slave CPUs executing the task to stop executing it, and releasing the CPU resources occupied by the task.

Preferably, the multiple slave CPUs include slave CPUs belonging to the same CPU cluster.

According to another aspect of the present invention, a device for scheduling the multiple processors of an SOC is provided, including: an acquisition module, configured to acquire the dynamic execution parameters of a task after the master CPU of the SOC receives the task to be executed; a determination module, configured to determine, according to the one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters; and a scheduling module, configured to schedule one or more slave CPUs to execute the task according to the task allocation scheme.

Preferably, when the dynamic execution parameters include the CPU type for executing the task, the determination module is further configured to assign the task to the one or more currently available slave CPUs corresponding to the CPU type.

Preferably, when the dynamic execution parameters include the maximum number of CPUs for parallel execution, the determination module is further configured to assign the task to the one or more currently available slave CPUs corresponding to the CPU type, where the number of assigned slave CPUs is not greater than the maximum number of CPUs.

Preferably, the multiple slave CPUs determined by the determination module include slave CPUs belonging to the same CPU cluster.

Through the present invention, after the master CPU of the SOC receives a task to be executed, it acquires the task's dynamic execution parameters, determines a task allocation scheme satisfying those parameters according to the one or more slave CPUs currently available in the SOC, and schedules one or more slave CPUs to execute the task according to the determined scheme, thereby achieving multiprocessor scheduling with the processor as the basic scheduling unit.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the present invention and form part of this application; the schematic embodiments of the invention and their description explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a schematic diagram of an SMP multiprocessor architecture according to the related art;
Fig. 2 is a schematic diagram of a computer cluster architecture according to the related art;
Fig. 3 is a schematic diagram of an SOC multi-core scheduling architecture according to the related art;
Fig. 4 is a schematic diagram of an SOC parallel computing architecture according to the related art;
Fig. 5 is a flowchart of a method for scheduling the multiple processors of an SOC according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the adaptation of an executable task according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the slave-CPU aggregation scheme according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the interaction between the MAIN CPU and the other CLUSTER CPUs according to an embodiment of the present invention;
Fig. 9 is a structural block diagram of a device for scheduling the multiple processors of an SOC according to an embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in combination with the embodiments. Note that, where no conflict arises, the embodiments of this application and the features within them may be combined with one another.

Embodiment 1

According to an embodiment of the present invention, a method for scheduling the multiple processors of an SOC is provided, which can realize scheduling of the SOC's multiple processors. Fig. 5 is a flowchart of this method; as shown in Fig. 5, the method may include the following steps (S502 to S506):

Step S502: after the master processor (CPU) of the system on chip (SOC) receives a task to be executed, it acquires the task's dynamic execution parameters.

Step S504: the master CPU determines, according to the one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters.

Step S506: the master CPU schedules one or more slave CPUs to execute the task according to the task allocation scheme.

Through this embodiment of the present invention, after the master CPU of the SOC receives a task to be executed, it acquires the task's dynamic execution parameters, determines a task allocation scheme satisfying those parameters according to the currently available slave CPUs, and schedules one or more slave CPUs accordingly, achieving multiprocessor scheduling with the processor as the basic scheduling unit.

A heterogeneous SOC system contains processors of different types, and different tasks correspond to different CPU types. For example, some tasks can only be executed by an ARM core, some only by a DSP, and some by either. Therefore, in a preferred implementation of this embodiment, the dynamic execution parameters may include the CPU type for executing the task; when determining the task allocation scheme according to the currently available slave CPUs, the master CPU may assign the task to the one or more currently available slave CPUs corresponding to the task's CPU type. This preferred implementation achieves multiprocessor scheduling in a heterogeneous SOC system and can schedule CPUs of the required type for each task to be executed.
After the master CPU in the SOC receives a task to be executed, it can assign the task to the slave CPUs currently available in the SOC. Each task may be assignable to a different number of CPUs: a fixed number, a dynamically variable number, or an unlimited number. Therefore, in another preferred implementation of this embodiment, the dynamic execution parameters may further include the maximum number of CPUs for parallel execution. In that case, when determining the task allocation scheme, the master CPU may assign the task to one or more currently available slave CPUs that correspond to the CPU type, with the number of assigned CPUs not exceeding the maximum number for parallel execution. This preferred implementation achieves effective multiprocessor scheduling in a heterogeneous system.

In an SOC system it is generally recommended to group several homogeneous processors into a cluster (CLUSTER). By hardware design, CPUs within the same cluster communicate faster with one another than CPUs in different clusters do, so CPUs within one cluster also process tasks faster. Therefore, in yet another preferred implementation, when determining the task allocation scheme, the master CPU may assign the task to multiple CPUs belonging to the same cluster. For example, in an SOC system where every four consecutive CPUs form a cluster, if the dynamic execution parameters of an incoming task specify a maximum of four parallel CPUs, the task can be dispatched to four slave CPUs in the same cluster, keeping the slave CPUs of one task in one cluster as far as possible to improve efficiency.

After the master CPU determines the task allocation scheme according to the currently available slave CPUs, it can schedule one or more slave CPUs accordingly. In a further preferred implementation, the slave-CPU aggregation scheme is used to schedule the slave CPUs: the master CPU selects one slave CPU among the multiple slave CPUs (called the virtual master CPU) and dispatches the task to it, and the selected slave CPU schedules the slave CPUs within the group to execute the task. In practice, the slave CPU with the fastest communication to the other determined slave CPUs can be selected, making task execution by the slave CPUs more efficient.

Further, the selected slave CPU schedules the slave CPUs in the group to execute the task; each slave CPU executes its dispatched portion in parallel and returns its result to the selected slave CPU. The selected slave CPU receives the results fed back by the slave CPUs, aggregates them, and feeds the aggregate back to the master CPU, which then outputs the task's execution result.

In another preferred implementation, to prevent an executing task from occupying system resources for too long, a maximum execution time can be set for the task, and the dynamic execution parameters may include it. If the aggregated result has not been received after the maximum execution time has elapsed, the master CPU notifies the slave CPUs executing the task to stop, and releases the CPU resources the task occupies.

Embodiment 2

According to this embodiment of the present invention, taking the SOC multiprocessor architecture shown in Fig. 3 and the multi-task-stream parallel computer architecture of the SOC system shown in Fig. 4 as an example, the scheduling scheme and processing flow of the multi-core parallel computer system are described. In a homogeneous/heterogeneous multi-core computer system suited to an SOC implementation (a single-chip environment), an independent processor (the master CPU) acts as the scheduling processor: it receives the task stream and feeds back task results. The method provided by this embodiment can be applied in an SOC system, and also in a multi-computer cluster environment composed of several homogeneous and heterogeneous computer groups.

In this embodiment, the master CPU (MAIN CPU) receives tasks and assigns them to the corresponding computer groups; the groups process the assigned tasks in parallel and feed the execution results back to the MAIN CPU, which obtains the task results. All scheduling work is likewise done by the MAIN CPU. In the SOC system, the processor is the basic unit of scheduling: the MAIN CPU obtains a task and distributes it to different slave CPUs. In practice, a virtual processor group is allocated for each task, with a defined mapping between the virtual processor group and the actual processor group.

An SOC multi-core system is built with homogeneous processors placed in one cluster. The system contains one master CPU; all other CPUs are called slave CPUs. Both the master CPU and the slave CPUs can access memory in the same address space, which makes it convenient to hand tasks down to the slave CPUs.

In this embodiment, all tasks to be loaded are stored in binary form and may carry: the task's priority (whether it is scheduled preferentially), the maximum number of processors that may execute it in parallel (a fixed number or unlimited), the maximum execution time (the task may be revoked once this time is reached), the target processor type (the target cluster it is loaded onto), and a dynamic data area (dynamic information such as the number of runnable processors actually allocated).

In addition, all tasks to be loaded are written to a multiprocessor programming specification (e.g. MPI) and are adapted to suit parallel scheduled execution; the adaptation of an executable task is shown in Fig. 6. For example, inter-CPU communication functions and functions for obtaining the current CPU ID are added. The program therefore has to be linked against the related multi-core library at compile time; the library might be named "libmcore.a", and the program is linked with it to produce the final target file.

Further, every task to be loaded also stores its dynamic execution parameters at a fixed location, for example how many CPU cores to run on, among other parameters. These parameters need to be placed, via the command line or otherwise, at a designated spot, e.g. an address range of 512 bytes at DS:0x100; when the task is actually loaded, these dynamic parameters are written into the task's execution space.

The processor group on which the slave CPUs execute is a virtual CPU group, which must have a defined correspondence to the actual physical CPUs; the master CPU dynamically allocates the corresponding physical CPUs according to the nature of the task. Moreover, the processors must communicate between tasks according to the multiprocessor programming specification (e.g. MPI). This in fact involves communication among multiple virtual processors; when the master CPU actually assigns the task, it maps the virtual processors onto the actual physical processors.

Given the above, assume there are 12 CPU resources in total, all homogeneous, and all tasks are target images for homogeneous processors. Assume further that every four consecutive CPUs are in one cluster, 4 CPUs are in use and 8 CPU resources are idle, Task 0 can run on only one CPU, Task 1 on at most 3 CPUs, and Task 2 on an unlimited number of CPUs. Assume the physical CPU numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11; the occupied CPUs are 2, 8, 9 and 11; the available CPUs are therefore 0, 1, 3, 4, 5, 6, 7 and 10. To keep the slave CPUs of one task in the same cluster as far as possible and so improve efficiency, the allocation works as follows: Task 1 occupies 3 CPUs, but Task 2 occupies 4 CPUs, exactly one cluster's capacity, so the entirely idle cluster 1 should be given to Task 2 first; cluster 0 has 3 idle CPUs, which exactly fits Task 1; the remaining idle CPU goes to Task 0. The optimized CPU resource allocation is shown in the following table:
  Cluster     Physical CPU   Status     Task name   Task logical CPU
  CLUSTER0    0              idle       Task 1      0
              1              idle       Task 1      1
              2              occupied   -           -
              3              idle       Task 1      2
  CLUSTER1    4              idle       Task 2      0
              5              idle       Task 2      1
              6              idle       Task 2      2
              7              idle       Task 2      3
  CLUSTER2    8              occupied   -           -
              9              occupied   -           -
              10             idle       Task 0      0
              11             occupied   -           -

According to the task's priority, the processor type the task belongs to, and the distribution of currently idle processors, the master CPU assigns the task to suitable processors; physical processors can be allocated as in the method above. For any task, its actual application program faces a virtual CPU group, which hides the detailed information of the physical CPUs.

When the master CPU assigns a task to a virtual CPU group, it may designate one slave CPU within the allocated virtual slave CPU group as the virtual master CPU. This virtual master CPU is not necessarily the first CPU of the group; preferably it is placed at the position with the fastest communication to the other processors of the group. Virtual slave CPU0 is generally regarded as the master CPU within the virtual slave CPU group (distinct from the master CPU of the whole architecture; it may be called the virtual master CPU). Task scheduling and execution on the virtual slave CPUs follow the slave-CPU aggregation scheme; the virtual master CPU is referred to below as logical CPU0.

The flow of the slave-CPU aggregation scheme is described in detail below. In this scheme, one slave CPU is chosen from the virtual slave CPU group to serve as the master CPU of the slave group, and that chosen slave CPU performs the aggregation work for the task. In other words, one of the multiple slave CPUs is treated as a master CPU relative to the other slave CPUs, assisting with task distribution and result statistics. The program must add a synchronization mechanism to the code when it executes, so the slave CPUs cannot reach maximal efficiency: at least one CPU must wait for the other CPUs' tasks to finish and finally feed the result back to the master CPU. For clarity of description, the master CPU of the slave group is taken to be logical CPU0. Although this scheme lacks the high efficiency of master-CPU scheduling, it lightens the master CPU's load, and the work of unifying the results is also done inside the slave CPU group. In terms of implementation logic, the slave-CPU aggregation scheme is more practical than master-CPU scheduling.

For example, to compute 1 + 2 + ... + 100, the task can be decomposed into the following four different programs and run on four different CPUs. Fig. 7 is a schematic diagram of the slave-CPU aggregation scheme according to an embodiment of the present invention. As shown in Fig. 7, logical CPU0 computes "1 + 2 + ... + 25" and waits for the other CPUs to finish; logical CPU1 computes "26 + 27 + ... + 50" and reports the result 950 to logical CPU0; logical CPU2 computes "51 + 52 + ... + 75" and reports the result 1575 to logical CPU0; logical CPU3 computes "76 + 77 + ... + 100" and reports the result 2200 to logical CPU0. Once logical CPU0 has all the results, it sums them and reports the final result to the master CPU, which directly outputs "5050", completing the execution of this task.

The benefit of slave-CPU aggregation is that it lowers the difficulty of task distribution and also lightens the master CPU's load; the price paid is more complex program code, since a synchronization mechanism among the multiple slave CPUs must exist, which affects execution efficiency to some extent.

Every slave CPU runs the same code below, but executes different code segments according to its CPU ID. The corresponding pseudocode of the slave-CPU aggregation function is as follows:

    int func_sum(int start_data, int end_data)
    {
        int i;
        int sum = 0;
        for (i = start_data; i <= end_data; i++)
            sum += i;
        return sum;
    }

    int main()
    {
        int result;
        int data;
        int id;

        id = get_cpuid();               /* which logical CPU is running */
        data = id * 25 + 1;             /* each CPU sums 25 consecutive numbers */
        result = func_sum(data, data + 24);
        if (id == 0) {
            /* logical CPU0 waits for the other slave CPUs, then reports */
            wait_all_cpu_data();
            send_result_to_main_cpu(result + cpu1_result + cpu2_result + cpu3_result);
        } else {
            send_result_to_cpu0(result);
        }
        return 0;
    }
Logical CPU0 has to perform the accumulation of the data fed back by all slave CPUs and finally feed the task's result back to the master CPU. Synchronization and communication between CPUs are completed mainly inside the slave CPUs, relieving pressure on the master CPU. After logical CPU0 finishes executing, it feeds the result back to the master CPU. Fig. 8 is a schematic diagram of the interaction between the MAIN CPU and the other CLUSTER CPUs according to an embodiment of the present invention; as shown in Fig. 8, in this embodiment any CPU may periodically feed information back to the master CPU.

Based on the execution status of the task stream, when a task exceeds its maximum running time the master CPU can revoke it and release the processor resources it occupies. After a task finishes executing, the master CPU outputs the run result and releases the resources the task occupied. In practice, as long as there are waiting task streams and available processor resources, the master CPU keeps looping to complete all the scheduling work.

In this embodiment, CPU mapping and priority handling are relatively easy to achieve. This embodiment provides the idea and method of dynamically linking tasks against a multi-core communication library, embedding dynamic parameters, and scheduling according to the slave-CPU aggregation scheme; it is not limited to the example above and also covers other similar dynamic processor scheduling use cases. This embodiment likewise gives a multi-task processing and scheduling approach for a parallel computer suited to SOC implementation, which can also be applied in practice to task scheduling and processing on multi-core architectures of non-SMP systems.

Embodiment 3

According to this embodiment of the present invention, a device for scheduling the multiple processors of an SOC is further provided, which can implement the method provided by the embodiments of the present invention. Fig. 9 is a structural block diagram of the device. As shown in Fig. 9, the device may include an acquisition module 10, a determination module 20 and a scheduling module 30. The acquisition module 10 is configured to acquire the dynamic execution parameters of a task after the master processor (CPU) of the system on chip (SOC) receives the task to be executed. The determination module 20, coupled to the acquisition module 10, is configured to determine, according to the one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters. The scheduling module 30, coupled to the determination module 20, is configured to schedule one or more slave CPUs to execute the task according to the task allocation scheme.

Through this embodiment of the present invention, after the master CPU of the SOC receives a task to be executed, it acquires the task's dynamic execution parameters, determines a task allocation scheme satisfying those parameters according to the currently available slave CPUs, and schedules one or more slave CPUs accordingly, achieving multiprocessor scheduling with the processor as the basic scheduling unit.

A heterogeneous SOC system contains processors of different types, and different tasks correspond to different CPU types. For example, some tasks can only be executed by an ARM core, some only by a DSP, and some by either. Therefore, in a preferred implementation of this embodiment, when the dynamic execution parameters include the CPU type for executing the task, the determination module 20 is further configured to assign the task to the one or more currently available slave CPUs corresponding to that CPU type. This preferred implementation achieves multiprocessor scheduling in a heterogeneous SOC system and can schedule CPUs of the required type for each task to be executed.
After the master CPU in the SOC receives a task to be executed, it can assign the task to the currently available slave CPUs in the SOC. Each task may be assignable to a different number of CPUs: a fixed number, a dynamically variable number, or an unlimited number. Therefore, in another preferred implementation of this embodiment, when the dynamic execution parameters include the maximum number of CPUs for parallel execution, the determination module 20 is further configured to assign the task to one or more currently available slave CPUs corresponding to the CPU type, with the number of assigned slave CPUs not exceeding that maximum. This preferred implementation achieves processor scheduling according to the maximum number of CPUs on which a task may execute in parallel.

In an SOC system, several homogeneous processors can be grouped into a cluster; CPUs within the same cluster communicate faster with one another than CPUs in different clusters do, and CPUs within one cluster also process tasks faster. Therefore, in yet another preferred implementation, when determining the task allocation scheme according to the currently available slave CPUs, the determination module 20 may assign the task to multiple CPUs belonging to the same cluster. For example, in an SOC system where every four consecutive CPUs form a cluster, if an incoming task's dynamic execution parameters specify a maximum of four parallel CPUs, the task can be dispatched to four slave CPUs in the same cluster, keeping the slave CPUs of one task in one cluster as far as possible to improve efficiency.

After the determination module 20 determines the task allocation scheme, the scheduling module 30 can schedule one or more slave CPUs accordingly. In a further preferred implementation, the slave-CPU aggregation scheme is used: the scheduling module 30 selects one slave CPU among the multiple slave CPUs and dispatches the task to it, and the selected slave CPU schedules the slave CPUs within the group to execute the task. In practice, the slave CPU with the fastest communication to the other determined slave CPUs can be selected, making task execution by the slave CPUs more efficient.

Further, the selected slave CPU schedules the slave CPUs in the group to execute the task; each slave CPU executes its dispatched portion in parallel and returns its result to the selected slave CPU, which aggregates the results fed back by the slave CPUs and feeds the aggregate back to the master CPU. The master CPU receives the aggregate from the selected slave CPU and outputs the task's execution result.

In another preferred implementation, to prevent an executing task from occupying system resources for too long, a maximum execution time can be set for the task and carried in the dynamic execution parameters. If the aggregated result has not been received after the maximum execution time has elapsed, the master CPU notifies the slave CPUs executing the task to stop, and releases the CPU resources the task occupies.

From the above description it can be seen that the present invention achieves the following technical effects. After the master CPU of the SOC receives a task to be executed, it acquires the task's dynamic execution parameters, determines a task allocation scheme satisfying them according to the currently available slave CPUs, and schedules one or more slave CPUs accordingly, achieving multiprocessor scheduling with the processor as the basic scheduling unit. Assigning the task to the currently available slave CPUs corresponding to the task's CPU type achieves multiprocessor scheduling in a heterogeneous SOC system and schedules CPUs of the required type for each task. Assigning the task to multiple CPUs belonging to the same cluster makes communication among those CPUs faster and improves task processing efficiency. Meanwhile, adopting the slave-CPU aggregation scheme lightens the master CPU's load and improves the system's reliability.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by computing devices, so that they can be stored in storage devices and executed by the computing devices; in some cases the steps shown or described may be performed in an order different from that here, or they may be made into individual integrated-circuit modules, or multiple modules or steps among them may be made into a single integrated-circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims

Claims
1. A method for scheduling the multiple processors of a system on chip (SOC), comprising:

after the master processor (CPU) of the system on chip (SOC) receives a task to be executed, acquiring the dynamic execution parameters of the task;

the master CPU determining, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters; and

the master CPU scheduling one or more slave CPUs to execute the task according to the task allocation scheme.

2. The method according to claim 1, wherein

the dynamic execution parameters comprise: the CPU type for executing the task; and

the master CPU determining, according to the one or more slave CPUs currently available in the SOC, the task allocation scheme satisfying the dynamic execution parameters comprises: assigning the task to one or more slave CPUs, among the one or more slave CPUs currently available in the SOC, corresponding to the CPU type.

3. The method according to claim 2, wherein

the dynamic execution parameters further comprise: a maximum number of CPUs for parallel execution; and

the master CPU determining, according to the one or more slave CPUs currently available in the SOC, the task allocation scheme satisfying the dynamic execution parameters comprises: assigning the task to one or more currently available slave CPUs corresponding to the CPU type, wherein the number of the one or more slave CPUs is not greater than the maximum number of CPUs.

4. The method according to claim 3, wherein the master CPU scheduling one or more slave CPUs to execute the task according to the task allocation scheme comprises:

the master CPU selecting one slave CPU from the multiple slave CPUs as a virtual master CPU, and dispatching the task to the selected virtual master CPU; and

the selected virtual master CPU scheduling the CPUs within the multiple slave CPUs to execute the task.

5. The method according to claim 4, wherein the selected virtual master CPU scheduling the CPUs within the multiple slave CPUs to execute the task comprises:

the selected virtual master CPU receiving, from each of the slave CPUs, the result of executing the task; and the selected virtual master CPU aggregating the results fed back by the slave CPUs and feeding the aggregate back to the master CPU.

6. The method according to claim 5, wherein the dynamic execution parameters further comprise: a maximum execution time of the task; and

the method further comprises: when the aggregated result has not been received after the maximum execution time has elapsed, the master CPU notifying the slave CPUs executing the task to stop executing the task, and releasing the CPU resources occupied by the task.

7. The method according to any one of claims 1 to 6, wherein the multiple slave CPUs comprise: slave CPUs belonging to the same CPU cluster.
8. A device for scheduling the multiple processors of a system on chip (SOC), comprising:

an acquisition module, configured to acquire the dynamic execution parameters of a task after the master processor (CPU) of the system on chip (SOC) receives the task to be executed;

a determination module, configured to determine, according to one or more slave CPUs currently available in the SOC, a task allocation scheme that satisfies the dynamic execution parameters; and

a scheduling module, configured to schedule one or more slave CPUs to execute the task according to the task allocation scheme.

9. The device according to claim 8, wherein, when the dynamic execution parameters comprise the CPU type for executing the task:

the determination module is further configured to assign the task to one or more slave CPUs, among the one or more slave CPUs currently available in the SOC, corresponding to the CPU type.

10. The device according to claim 9, wherein, when the dynamic execution parameters comprise a maximum number of CPUs for parallel execution:

the determination module is further configured to assign the task to one or more currently available slave CPUs corresponding to the CPU type, wherein the number of the one or more slave CPUs is not greater than the maximum number of CPUs.

11. The device according to any one of claims 8 to 10, wherein the multiple slave CPUs determined by the determination module comprise: slave CPUs belonging to the same CPU cluster.
PCT/CN2012/077537 2012-03-05 2012-06-26 片上系统soc的多处理器的调度方法及装置 WO2013131340A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP12870664.5A EP2824569A4 (en) 2012-03-05 2012-06-26 METHOD AND DEVICE FOR PLANNING A MULTIPROCESSOR OF A SYSTEM ON A CHIP (SOC)
US14/383,203 US20150121391A1 (en) 2012-03-05 2012-06-26 Method and device for scheduling multiprocessor of system on chip (soc)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210054957.5 2012-03-05
CN2012100549575A CN103294554A (zh) 2012-03-05 2012-03-05 片上系统soc的多处理器的调度方法及装置

Publications (1)

Publication Number Publication Date
WO2013131340A1 true WO2013131340A1 (zh) 2013-09-12

Family

ID=49095484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077537 WO2013131340A1 (zh) 2012-03-05 2012-06-26 片上系统soc的多处理器的调度方法及装置

Country Status (4)

Country Link
US (1) US20150121391A1 (zh)
EP (1) EP2824569A4 (zh)
CN (1) CN103294554A (zh)
WO (1) WO2013131340A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107678853A (zh) * 2016-08-02 2018-02-09 中国电信股份有限公司 图形处理任务的调度方法以及装置

Families Citing this family (50)

Publication number Priority date Publication date Assignee Title
CN101819540B (zh) * 2009-02-27 2013-03-20 国际商业机器公司 在集群中调度任务的方法和系统
RU2538920C2 (ru) * 2013-05-06 2015-01-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Способ распределения задач сервером вычислительной системы, машиночитаемый носитель информации и система для реализации способа
US9547497B2 (en) * 2013-09-27 2017-01-17 Intel Corporation Sharing embedded hardware resources
CN103645954B (zh) * 2013-11-21 2018-12-14 华为技术有限公司 一种基于异构多核体系的cpu调度方法、装置和系统
WO2015139164A1 (zh) 2014-03-17 2015-09-24 华为技术有限公司 一种任务调度的方法、装置及设备
CN105224856A (zh) * 2014-07-02 2016-01-06 腾讯科技(深圳)有限公司 计算机系统检测方法及装置
US9600312B2 (en) 2014-09-30 2017-03-21 Amazon Technologies, Inc. Threading as a service
US9830193B1 (en) 2014-09-30 2017-11-28 Amazon Technologies, Inc. Automatic management of low latency computational capacity
US9146764B1 (en) 2014-09-30 2015-09-29 Amazon Technologies, Inc. Processing event messages for user requests to execute program code
US9678773B1 (en) 2014-09-30 2017-06-13 Amazon Technologies, Inc. Low latency computational capacity provisioning
US9537788B2 (en) 2014-12-05 2017-01-03 Amazon Technologies, Inc. Automatic determination of resource sizing
US9400685B1 (en) * 2015-01-30 2016-07-26 Huawei Technologies Co., Ltd. Dividing, scheduling, and parallel processing compiled sub-tasks on an asynchronous multi-core processor
US9588790B1 (en) 2015-02-04 2017-03-07 Amazon Technologies, Inc. Stateful virtual compute system
US9733967B2 (en) 2015-02-04 2017-08-15 Amazon Technologies, Inc. Security protocols for low latency execution of program code
US10754701B1 (en) * 2015-12-16 2020-08-25 Amazon Technologies, Inc. Executing user-defined code in response to determining that resources expected to be utilized comply with resource restrictions
US10067801B1 (en) 2015-12-21 2018-09-04 Amazon Technologies, Inc. Acquisition and maintenance of compute capacity
US9910713B2 (en) 2015-12-21 2018-03-06 Amazon Technologies, Inc. Code execution request routing
US11132213B1 (en) 2016-03-30 2021-09-28 Amazon Technologies, Inc. Dependency-based process of pre-existing data sets at an on demand code execution environment
CN105955807B (zh) * 2016-04-20 2023-10-31 上海瀚银信息技术有限公司 一种任务处理系统及方法
CN107451090B (zh) * 2016-06-01 2020-09-11 华为技术有限公司 数据处理系统和数据处理方法
US10102040B2 (en) 2016-06-29 2018-10-16 Amazon Technologies, Inc Adjusting variable limit on concurrent code executions
CN106776039B (zh) * 2016-12-30 2020-04-03 Oppo广东移动通信有限公司 一种数据处理方法及装置
CN106802828A (zh) * 2016-12-30 2017-06-06 广东欧珀移动通信有限公司 一种应用数据处理方法及设备
US10394717B1 (en) * 2018-02-16 2019-08-27 Microsoft Technology Licensing, Llc Central processing unit cache friendly multithreaded allocation
US10853115B2 (en) 2018-06-25 2020-12-01 Amazon Technologies, Inc. Execution of auxiliary functions in an on-demand network code execution system
US11146569B1 (en) 2018-06-28 2021-10-12 Amazon Technologies, Inc. Escalation-resistant secure network services using request-scoped authentication information
US10949237B2 (en) 2018-06-29 2021-03-16 Amazon Technologies, Inc. Operating system customization in an on-demand network code execution system
US11099870B1 (en) 2018-07-25 2021-08-24 Amazon Technologies, Inc. Reducing execution times in an on-demand network code execution system using saved machine states
CN109165433B (zh) * 2018-08-13 2023-05-26 国网重庆市电力公司电力科学研究院 一种复杂场景的工频电场计算方法及系统
US11099917B2 (en) 2018-09-27 2021-08-24 Amazon Technologies, Inc. Efficient state maintenance for execution environments in an on-demand code execution system
US11243953B2 (en) 2018-09-27 2022-02-08 Amazon Technologies, Inc. Mapreduce implementation in an on-demand network code execution system and stream data processing system
US11943093B1 (en) 2018-11-20 2024-03-26 Amazon Technologies, Inc. Network connection recovery after virtual machine transition in an on-demand network code execution system
US10884812B2 (en) 2018-12-13 2021-01-05 Amazon Technologies, Inc. Performance-based hardware emulation in an on-demand network code execution system
US11010188B1 (en) 2019-02-05 2021-05-18 Amazon Technologies, Inc. Simulated data object storage using on-demand computation of data objects
US11861386B1 (en) 2019-03-22 2024-01-02 Amazon Technologies, Inc. Application gateways in an on-demand network code execution system
CN111857061A (zh) * 2019-04-28 2020-10-30 北京国电智深控制技术有限公司 一种计算任务实现方法、装置及系统、存储介质
US11119809B1 (en) 2019-06-20 2021-09-14 Amazon Technologies, Inc. Virtualization-based transaction handling in an on-demand network code execution system
US11115404B2 (en) 2019-06-28 2021-09-07 Amazon Technologies, Inc. Facilitating service connections in serverless code executions
US11159528B2 (en) 2019-06-28 2021-10-26 Amazon Technologies, Inc. Authentication to network-services using hosted authentication information
US11190609B2 (en) 2019-06-28 2021-11-30 Amazon Technologies, Inc. Connection pooling for scalable network services
US11119826B2 (en) 2019-11-27 2021-09-14 Amazon Technologies, Inc. Serverless call distribution to implement spillover while avoiding cold starts
CN110928668B (zh) * 2019-12-09 2022-06-07 北京思特奇信息技术股份有限公司 一种基于ZooKeeper实现云化任务编排调度的方法和系统
US11714682B1 (en) 2020-03-03 2023-08-01 Amazon Technologies, Inc. Reclaiming computing resources in an on-demand code execution system
US11188391B1 (en) 2020-03-11 2021-11-30 Amazon Technologies, Inc. Allocating resources to on-demand code executions under scarcity conditions
US11550713B1 (en) 2020-11-25 2023-01-10 Amazon Technologies, Inc. Garbage collection in distributed systems using life cycled storage roots
US11593270B1 (en) 2020-11-25 2023-02-28 Amazon Technologies, Inc. Fast distributed caching using erasure coded object parts
TWI756974B (zh) 2020-12-09 2022-03-01 財團法人工業技術研究院 機器學習系統及其資源配置方法
KR102570905B1 (ko) * 2021-05-17 2023-08-29 주식회사 엘지유플러스 클라우드 환경에서의 컨테이너 기반 자원의 최적화 시스템
US11388210B1 (en) 2021-06-30 2022-07-12 Amazon Technologies, Inc. Streaming analytics using a serverless compute system
CN113535719A (zh) * 2021-07-07 2021-10-22 锐掣(杭州)科技有限公司 数据过滤方法、数据过滤装置、存储介质及产品

Citations (5)

Publication number Priority date Publication date Assignee Title
US20050022173A1 (en) * 2003-05-30 2005-01-27 Codito Technologies Private Limited Method and system for allocation of special purpose computing resources in a multiprocessor system
US20080244588A1 (en) * 2007-03-28 2008-10-02 Massachusetts Institute Of Technology Computing the processor desires of jobs in an adaptively parallel scheduling environment
CN101387952A (zh) * 2008-09-24 2009-03-18 上海大学 单芯片多处理器任务调度管理方法
CN101566957A (zh) * 2008-04-25 2009-10-28 恩益禧电子股份有限公司 信息处理系统及任务执行控制方法
CN101706743A (zh) * 2009-12-07 2010-05-12 北京航空航天大学 一种多核环境下的虚拟机调度方法

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP3878431B2 (ja) * 2000-06-16 2007-02-07 株式会社ルネサステクノロジ 半導体集積回路装置
US6959372B1 (en) * 2002-02-19 2005-10-25 Cogent Chipware Inc. Processor cluster architecture and associated parallel processing methods
JP2007519103A (ja) * 2004-01-08 2007-07-12 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ マルチプロセッサシステムにおけるリソース管理
US20080163183A1 (en) * 2006-12-29 2008-07-03 Zhiyuan Li Methods and apparatus to provide parameterized offloading on multiprocessor architectures
US8156495B2 (en) * 2008-01-17 2012-04-10 Oracle America, Inc. Scheduling threads on processors
US8225325B2 (en) * 2008-06-06 2012-07-17 Apple Inc. Multi-dimensional thread grouping for multiple processors
US8479214B2 (en) * 2008-09-30 2013-07-02 Microsoft Corporation Hardware throughput saturation detection
US20100242014A1 (en) * 2009-03-17 2010-09-23 Xiaohan Zhu Symmetric multi-processor operating system for asymmetric multi-processor architecture
US9081621B2 (en) * 2009-11-25 2015-07-14 Microsoft Technology Licensing, Llc Efficient input/output-aware multi-processor virtual machine scheduling

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN107678853A (zh) * 2016-08-02 2018-02-09 中国电信股份有限公司 图形处理任务的调度方法以及装置
CN107678853B (zh) * 2016-08-02 2020-08-25 中国电信股份有限公司 图形处理任务的调度方法以及装置

Also Published As

Publication number Publication date
EP2824569A1 (en) 2015-01-14
EP2824569A4 (en) 2016-06-01
CN103294554A (zh) 2013-09-11
US20150121391A1 (en) 2015-04-30

Similar Documents

Publication Publication Date Title
WO2013131340A1 (zh) 片上系统soc的多处理器的调度方法及装置
Phillips et al. Adapting a message-driven parallel application to GPU-accelerated clusters
US9779042B2 (en) Resource management in a multicore architecture
JP6018021B2 (ja) マルチコアアーキテクチャにおけるリソース管理
KR102600852B1 (ko) 이종 cpu/gpu 시스템에서 데이터 흐름 신호 처리 애플리케이션 가속화
Becchi et al. A virtual memory based runtime to support multi-tenancy in clusters with GPUs
JP2009519513A (ja) 専用スレッド管理を用いたマルチコアの演算処理方法及び装置
Peter et al. Design principles for end-to-end multicore schedulers
WO2016159765A1 (en) Many-core processor architecture and many-core operating system
JP2010079622A (ja) マルチコアプロセッサシステム、および、そのタスク制御方法
Sajjapongse et al. A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with gpus
KR20130080722A (ko) 병렬 컴퓨팅 프레임워크 기반의 클러스터 시스템, 호스트 노드, 계산 노드 및 어플리케이션 실행 방법
Cadambi et al. COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors
del Cuvillo et al. Toward a software infrastructure for the cyclops-64 cellular architecture
CN111459622B (zh) 调度虚拟cpu的方法、装置、计算机设备和存储介质
Arnold et al. Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications
Jatala et al. Improving GPU performance through resource sharing
CN104714843A (zh) 多内核操作系统实例支持多处理器的方法及装置
Elliott et al. Gpusync: Architecture-aware management of gpus for predictable multi-gpu real-time systems
US20160267621A1 (en) Graphic processing system and method thereof
CN112783651B (zh) 一种云平台vGPU负载均衡调度方法、介质及装置
US20120137300A1 (en) Information Processor and Information Processing Method
Cai et al. ABSS: An Adaptive Batch-Stream Scheduling Module for Dynamic Task Parallelism on Chiplet-based Multi-Chip Systems
Labarta et al. Hybrid Parallel Programming with MPI/StarSs
Maia et al. Combining rtsj with fork/join: a priority-based model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12870664

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2012870664

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 14383203

Country of ref document: US