WO2022174442A1 - Multi-core processor, processing method for multi-core processor, and related device - Google Patents

Multi-core processor, processing method for multi-core processor, and related device

Info

Publication number
WO2022174442A1
WO2022174442A1 PCT/CN2021/077230 CN2021077230W WO2022174442A1 WO 2022174442 A1 WO2022174442 A1 WO 2022174442A1 CN 2021077230 W CN2021077230 W CN 2021077230W WO 2022174442 A1 WO2022174442 A1 WO 2022174442A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
chain
core
chains
scheduler
Prior art date
Application number
PCT/CN2021/077230
Other languages
English (en)
French (fr)
Inventor
张雷
肖潇
於正强
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2021/077230 priority Critical patent/WO2022174442A1/zh
Priority to CN202180093759.7A priority patent/CN116868169A/zh
Priority to EP21926159.1A priority patent/EP4287024A4/en
Publication of WO2022174442A1 publication Critical patent/WO2022174442A1/zh
Priority to US18/452,046 priority patent/US20230393889A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/522Barrier synchronisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition

Definitions

  • the present application relates to the technical field of processors, and in particular, to a multi-core processor, a processing method for a multi-core processor, and related devices.
  • the task scheduler (Job Manager, JM) is used to implement multi-core GPU task scheduling (Kick-Off, KO).
  • the device development kit and driver (DDK) parses the upper-layer applications' (APPs) calls to the graphics/computing application programming interface (API) and encapsulates them into tasks that the GPU can recognize and execute.
  • Abbreviations: APPs: upper-layer applications; API: Application Programming Interface; JC: task chain (Job Chain); command stream: the stream of commands issued by the software.
  • the multiple cores of the GPU concurrently execute the tasks they receive.
  • the task scheduler is responsible for multi-core scheduling and is responsible for, or participates in, multi-process management, which affects multi-core utilization efficiency.
  • the prior art solution does not solve the no-load problem of GPU multi-core scheduling.
  • Embodiments of the present application provide a multi-core processor, a processing method for a multi-core processor, and related equipment, so as to solve the multi-core no-load problem and improve the multi-core scheduling performance.
  • An embodiment of the present application provides a multi-core processor, including a task scheduler and multiple processing cores coupled to the task scheduler. The task scheduler is configured to store multiple task chains and the dependency relationships between the multiple task chains, where a dependency relationship is either a dependency or a non-dependency. The task scheduler is further configured to: determine, according to the dependency relationships between the multiple task chains, a first task chain and a second task chain from the multiple task chains, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; schedule some or all of the multiple processing cores to execute the one or more first tasks; and, when at least one first processing core among the multiple processing cores is in an idle state, schedule at least one second task in the second task chain to be executed on the at least one first processing core.
  • the multi-core processor may be a multi-core coprocessor such as a GPU or a Neural Network Processing Unit (NPU), which includes a task scheduler and multiple processing cores coupled to the task scheduler;
  • The task scheduler can maintain the dependencies between task chains, that is, store the dependency relationships between multiple task chains; it also stores the task chains themselves, so that it can determine from them a first task chain and a second task chain that have no dependency on each other. The first task chain includes one or more first tasks, and the second task chain includes one or more second tasks.
  • The task scheduler schedules some or all of the processing cores to execute the one or more first tasks in the first task chain. Because the first task chain and the second task chain are independent, the two chains, and hence the first tasks and the second tasks, can be executed in parallel. When at least one of the processing cores is in an idle state, the task scheduler schedules at least one second task in the second task chain to be executed on that idle processing core. Here, "idle state" means the processing core is not executing a task: an idle core may be a core that was never scheduled to execute a first task of the first task chain, or a core that has finished its first task and is now idling. Thus, in this embodiment of the present application, as soon as a processing core becomes idle, the task scheduler immediately schedules it to execute a task, thereby improving multi-core scheduling performance.
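The scheduling behaviour described above can be sketched in software as follows. This is a minimal, hypothetical Python model made for illustration only; the patent describes a hardware task scheduler, and all names below are invented.

```python
# Toy model of the described scheduling idea: chains with no unresolved
# dependency are runnable, and an idle core immediately receives the next
# task, even if that task comes from a different (independent) chain.
from collections import deque


class Scheduler:
    def __init__(self, num_cores):
        self.idle_cores = set(range(num_cores))
        self.chains = {}   # chain_id -> deque of pending tasks
        self.deps = {}     # chain_id -> set of chain_ids it depends on

    def add_chain(self, chain_id, tasks, depends_on=()):
        self.chains[chain_id] = deque(tasks)
        self.deps[chain_id] = set(depends_on)

    def runnable_chains(self):
        # A chain is runnable when it has no unresolved dependency
        # and still has pending tasks.
        return [c for c, d in self.deps.items() if not d and self.chains[c]]

    def dispatch(self):
        # Greedily hand tasks from runnable chains to idle cores.
        assignments = []
        for chain in self.runnable_chains():
            while self.idle_cores and self.chains[chain]:
                core = self.idle_cores.pop()
                assignments.append((core, chain, self.chains[chain].popleft()))
        return assignments
```

In this model, marking a core idle (e.g. when its task finishes) and calling `dispatch()` again immediately hands it a task from the next independent chain, which is the behaviour the embodiment aims for.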
  • The task scheduler includes a dependency management unit and a task queue unit. The dependency management unit is configured to store the dependency relationships between the multiple task chains and, after determining that the dependency relationship between the first task chain and the second task chain is a non-dependency, to send a first instruction to the task queue unit, where the first instruction indicates that the dependency relationship between the first task chain and the second task chain is a non-dependency.
  • The task scheduler includes a dependency management unit and a task queue unit, and dependency management between task chains is implemented in hardware: the dependency management unit stores the dependency relationships between task chains, so the software (i.e., the DDK) does not need to participate in dependency control between task chains, which saves software-hardware interaction time and software-side calls. Once the dependency between task chains is removed, that is, once the task chains have no dependency (or never had one), the hardware responds quickly and can immediately dispatch the non-dependent task chain to the processing cores, which is faster than software-side management. For example, after the dependency management unit determines that the dependency between the first task chain and the second task chain is a non-dependency, it immediately sends the first instruction to the task queue unit, and the task queue unit immediately sends the first task chain and the second task chain to the processing cores for execution.
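The dependency-removal step can be pictured with the sketch below. The function names and data shapes are invented for illustration; the patent implements this step in hardware.

```python
# Hypothetical model of the dependency management unit's behaviour:
# when a task chain completes, every chain still waiting on it is
# updated, and a newly independent chain is released at once (the
# counterpart of sending the "first instruction" to the task queue unit).
def release_on_completion(deps, completed, notify):
    """deps maps chain_id -> set of chain_ids it still waits on."""
    for chain, waiting_on in deps.items():
        if completed in waiting_on:
            waiting_on.discard(completed)
            if not waiting_on:
                notify(chain)  # release the now-independent chain
```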
  • The task scheduler further includes a task splitting unit and a multi-core management unit. The task queue unit is configured to store the multiple task chains and, after receiving the first instruction sent by the dependency management unit, to send the first task chain and the second task chain to the task splitting unit and send a second instruction to the multi-core management unit, where the second instruction instructs the multi-core management unit to preempt processing cores for the first task chain and the second task chain.
  • the task scheduler further includes a task splitting unit and a multi-core management unit, and the task queue unit can store multiple task chains.
  • After receiving the first instruction sent by the dependency management unit, the task queue unit knows that the first task chain and the second task chain have no dependency, and sends the first task chain and the second task chain to the task splitting unit; it also sends the second instruction to the multi-core management unit, instructing it to preempt processing cores for the first task chain and the second task chain. Because the task splitting unit can split the first task chain into one or more first tasks and the second task chain into one or more second tasks, and the multi-core management unit can preempt processing cores for both chains, the execution of the two task chains is facilitated.
  • The task splitting unit is configured to split the first task chain into the one or more first tasks. The multi-core management unit is configured to preempt, according to the second instruction, one or more second processing cores from the multiple processing cores, and to send the result of preempting the one or more second processing cores to the task splitting unit. The task splitting unit is further configured to schedule the one or more second processing cores to execute the one or more first tasks.
  • The task splitting unit may split the first task chain into one or more first tasks. The second instruction may include the number of processing cores required to execute the first task chain, or the identifiers of the specific processing cores to be used for it, and so on; after receiving the second instruction from the task queue unit, the multi-core management unit can accordingly preempt one or more second processing cores from the multiple processing cores. After the task splitting unit has split the first task chain and received the result of the multi-core management unit preempting one or more second processing cores for it, the task splitting unit schedules the one or more second processing cores to execute the one or more first tasks of the first task chain. This helps preempt computing resources for the execution of the first task chain.
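A minimal sketch of the split-then-preempt flow follows. The round-robin pairing of tasks to preempted cores is an assumption made for the example, not something the patent specifies.

```python
# Pair each task of a split task chain with one of the cores the
# multi-core management unit preempted for it (round-robin assumed).
def assign_split_tasks(chain_tasks, preempted_cores):
    return [(preempted_cores[i % len(preempted_cores)], task)
            for i, task in enumerate(chain_tasks)]
```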
  • the task splitting unit is further configured to split the second task chain into the one or more second tasks;
  • The multi-core management unit is further configured to: when at least one first processing core among the multiple processing cores is in an idle state, preempt the at least one first processing core according to the second instruction, and send the result of preempting the at least one first processing core to the task splitting unit; the task splitting unit is further configured to schedule at least one second task among the one or more second tasks to be executed on the at least one first processing core.
  • The task splitting unit may split the second task chain into one or more second tasks. After the task splitting unit schedules the last first task of the first task chain onto one of the one or more second processing cores, the multi-core management unit may preempt processing cores for the execution of the second tasks in the second task chain; the second instruction may include the number of processing cores required to execute the second task chain, or the identifiers of the specific processing cores to be used for it, and so on. Thereafter, as long as at least one of the multiple processing cores is in an idle state, the multi-core management unit preempts the at least one first processing core according to the second instruction and sends the preemption result to the task splitting unit; the task splitting unit then schedules at least one second task among the one or more second tasks to be executed on the at least one first processing core. In this way, the hardware (the multi-core management unit) implements the release and re-application of processing cores; this management method greatly reduces or even eliminates the no-load problem of some processing cores, and improves the utilization efficiency of the processing cores.
  • The task scheduler further includes a task assembling unit. The task assembling unit is configured to: obtain the command stream and the dependency relationships between some or all of the multiple task chains; generate some or all of the multiple task chains according to the command stream; send some or all of the multiple task chains to the task queue unit; and send the dependency relationships between some or all of the multiple task chains to the dependency management unit.
  • The software (DDK) may issue tasks to the multi-core processor in the form of a command stream. The task assembling unit in the multi-core processor can receive the command stream, generate some or all of the multiple task chains according to it, send those task chains to the task queue unit, and send the dependency relationships between some or all of the multiple task chains to the dependency management unit. In this way, multi-core scheduling can also be realized when the software (DDK) issues tasks as a command stream.
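The assembling step can be pictured with the small parser below. The command-stream encoding (tuples tagged `"CHAIN"` and `"DEP"`) is invented for this sketch; the patent does not specify a stream format.

```python
# Toy "task assembling unit": walk a command stream, build task chains,
# and collect the dependencies between them. The tuple encoding of the
# stream is an assumption for illustration only.
def assemble(command_stream):
    chains, deps = {}, {}
    for cmd in command_stream:
        if cmd[0] == "CHAIN":              # ("CHAIN", chain_id, [task, ...])
            chains[cmd[1]] = list(cmd[2])
            deps.setdefault(cmd[1], set())
        elif cmd[0] == "DEP":              # ("DEP", chain_id, prerequisite_id)
            deps.setdefault(cmd[1], set()).add(cmd[2])
    # chains would go to the task queue unit, deps to the dependency unit
    return chains, deps
```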
  • an embodiment of the present application provides a method for processing a multi-core processor, which is applied to a multi-core processor, where the multi-core processor includes a task scheduler and multiple processing cores coupled to the task scheduler;
  • The method includes: storing, by the task scheduler, multiple task chains and the dependency relationships between the multiple task chains, where a dependency relationship is either a dependency or a non-dependency; determining, by the task scheduler according to the dependency relationships between the multiple task chains, a first task chain and a second task chain from the multiple task chains, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; scheduling, by the task scheduler, some or all of the multiple processing cores to execute the one or more first tasks; and, when at least one first processing core among the multiple processing cores is in an idle state, scheduling at least one second task in the second task chain to be executed on the at least one first processing core.
  • The task scheduler includes a dependency management unit and a task queue unit. Storing the dependency relationships between the multiple task chains by the task scheduler includes: storing, by the dependency management unit in the task scheduler, the dependency relationships between the multiple task chains. Determining the first task chain and the second task chain from the multiple task chains includes: after the dependency management unit in the task scheduler determines that the dependency relationship between the first task chain and the second task chain is a non-dependency, sending, by the dependency management unit, a first instruction to the task queue unit, where the first instruction indicates that the dependency relationship between the first task chain and the second task chain is a non-dependency.
  • The task scheduler further includes a task splitting unit and a multi-core management unit. Storing multiple task chains by the task scheduler includes: storing, by the task queue unit in the task scheduler, the multiple task chains. Determining the first task chain and the second task chain from the multiple task chains according to their dependency relationships further includes: after the task queue unit in the task scheduler receives the first instruction sent by the dependency management unit, sending, by the task queue unit, the first task chain and the second task chain to the task splitting unit, and sending a second instruction to the multi-core management unit, where the second instruction instructs the multi-core management unit to preempt processing cores for the first task chain and the second task chain.
  • Scheduling, by the task scheduler, some or all of the multiple processing cores to execute the one or more first tasks includes: splitting, by the task splitting unit in the task scheduler, the first task chain into the one or more first tasks; preempting, by the multi-core management unit in the task scheduler according to the second instruction, one or more second processing cores from the multiple processing cores; sending, by the multi-core management unit, the result of preempting the one or more second processing cores to the task splitting unit; and scheduling, by the task splitting unit in the task scheduler, the one or more second processing cores to execute the one or more first tasks.
  • Scheduling, by the task scheduler, at least one second task in the second task chain to be executed on the at least one first processing core includes: splitting, by the task splitting unit in the task scheduler, the second task chain into the one or more second tasks; when at least one first processing core among the multiple processing cores is in an idle state, preempting, by the multi-core management unit in the task scheduler according to the second instruction, the at least one first processing core; sending, by the multi-core management unit, the result of preempting the at least one first processing core to the task splitting unit; and scheduling, by the task splitting unit in the task scheduler, at least one second task among the one or more second tasks to be executed on the at least one first processing core.
  • The task scheduler further includes a task assembling unit. The method further includes: acquiring, by the task assembling unit in the task scheduler, the command stream and the dependency relationships between some or all of the multiple task chains, and generating some or all of the multiple task chains according to the command stream; sending, by the task assembling unit in the task scheduler, some or all of the multiple task chains to the task queue unit, and sending the dependency relationships between some or all of the multiple task chains to the dependency management unit.
  • the present application provides a semiconductor chip, which may include the multi-core processor provided by any one of the implementation manners of the foregoing first aspect.
  • the present application provides a semiconductor chip, which may include: a multi-core processor provided by any one of the implementation manners of the first aspect, an internal memory coupled to the multi-core processor, and an external memory.
  • the present application provides a system-on-chip SoC chip, where the SoC chip includes the multi-core processor provided by any one of the implementation manners of the first aspect, an internal memory coupled to the multi-core processor, and an external memory.
  • the SoC chip may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a chip system, where the chip system includes the multi-core processor provided by any one of the implementation manners of the first aspect above.
  • the chip system further includes a memory, and the memory is used for saving necessary or related program instructions and data during the operation of the multi-core processor.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a processing device, the processing device having the function of implementing any one of the processing methods for a multi-core processor in the second aspect above.
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a terminal, where the terminal includes a multi-core processor, and the multi-core processor is the multi-core processor provided by any one of the implementation manners of the first aspect above.
  • the terminal may also include a memory for coupling with the multi-core processor, which holds program instructions and data necessary for the terminal.
  • the terminal may also include a communication interface for the terminal to communicate with other devices or a communication network.
  • The present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a multi-core processor, the processing method flow for the multi-core processor described in any one of the implementations of the above second aspect is implemented.
  • An embodiment of the present application provides a computer program, where the computer program includes instructions, and when the computer program is executed by a multi-core processor, the multi-core processor can execute the processing method flow described in any one of the implementations of the above second aspect.
  • FIG. 1 is a schematic diagram of the architecture of a multi-core scheduling system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a scheduling execution process of a task chain provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the architecture of another multi-core scheduling system provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another scheduling execution process of a task chain provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a multi-core scheduling provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a processing method for a multi-core processor provided by an embodiment of the present application.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device may be components.
  • One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may communicate by way of local and/or remote processes, for example, according to a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or interacting with other systems across a network such as the Internet by way of the signal).
  • FIG. 1 is a schematic structural diagram of a multi-core scheduling system provided by an embodiment of the present application.
  • the task scheduler implements task and process scheduling management in units of task chains, wherein the task chain is a singly linked list structure, which is a collection of a series of tasks.
  • the task chain can be assembled by the device development kit and driver, and then sent to the task scheduler; or the task chain can be assembled in the task scheduler.
  • the task scheduler does not perceive the dependencies between the task chains.
  • the task scheduler executes the task chains issued by the software in a fixed order.
  • The DDK sits in the layer below the upper-layer applications (APPs).
  • Dependency can be understood as the execution of a task chain needs to be based on the execution or completion of other task chains.
  • The tasks specified by upper-layer applications (APPs) are divided into different types of task chains, such as binning, render, compute, raytracing, and transfer. Task chains of different types can be scheduled and executed in parallel on the hardware. Task chains that belong to the same type under this division are called task chains of the same type. For task chains of the same type, the task scheduler ensures that the next task chain is scheduled for execution only after the previous task chain ends. Therefore, the scheduled execution of task chains of the same type has the following shortcomings:
  • When the load of a task chain is too small, some processing cores become idle during the execution of the task chain but cannot be used in advance for the execution of the next task chain.
  • Specifically, suppose that, by execution order, the task chains are divided into an earlier-executed task chain and a later-executed task chain. The execution time of a task chain is determined by the task in the chain with the longest execution time. Because the execution times of the tasks in a chain differ, the processing cores executing the tasks of the earlier chain finish at different times: some cores finish quickly while others take longer. A core that finishes its task of the earlier chain early must wait for the slower cores to finish before the earlier chain as a whole is complete. During this wait the core is idle, yet it cannot be used to execute the later task chain. As a result, some processing cores are idle (IDLE) for a long time before execution of the later task chain starts, and hardware performance is wasted.
  • The existing technical solutions do not solve the no-load problem of multi-core scheduling; in particular, for lightly loaded task chains, some processing cores idle for a long time and the performance loss is severe.
  • FIG. 2 is a schematic diagram of a scheduling execution process of a task chain provided by an embodiment of the present application.
  • Figure 2 is briefly described as follows:
  • Task chain 0 (Job chain0) and task chain 1 (Job chain1) each contain four tasks, task 0 to task 3. Task chain 0 and task chain 1 are task chains of the same type, and there is no dependency between them.
  • the multi-core processor has a 4-core structure, that is, the multi-core processor includes processing cores 0 to 3.
  • the task scheduler first issues four tasks in task chain 0 to processing cores 0 to 3 for execution; for example, task 0 in task chain 0 is issued to processing core 0 for execution, and task 1 in task chain 0 is executed It is issued to processing core 1 for execution, task 2 in task chain 0 is issued to processing core 2 for execution, and task 3 in task chain 0 is issued to processing core 3 for execution.
  • The task scheduler then issues the four tasks in task chain 1 to processing cores 0 to 3 for execution; for example, task 0 in task chain 1 is issued to processing core 0, task 1 to processing core 1, task 2 to processing core 2, and task 3 to processing core 3.
  • Processing core 1 executes task 1 of task chain 0 and of task chain 1; processing core 2 executes task 2 of task chain 0 and of task chain 1; processing core 3 executes task 3 of task chain 0 and of task chain 1. During this process, a no-load (idle) situation arises on some cores.
• idle time on a processing core is wasted hardware capability, resulting in a performance loss for the processing cores.
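• the cost of the serial scheme in FIG. 2 can be illustrated with a small sketch (the durations and the helper name `serial_idle_time` are assumptions for illustration, not part of the disclosure): because task chain 1 may not start until every task of task chain 0 has finished, cores that finish early sit idle until the slowest core reaches the barrier.

```python
# Hypothetical model of the serial scheme in FIG. 2: task chain 1 starts only
# after all tasks of task chain 0 have finished, so cores that finish early
# accumulate idle time. All durations below are illustrative assumptions.

def serial_idle_time(durations_chain0):
    """Return per-core idle time when chain 1 must wait for chain 0 to finish."""
    barrier = max(durations_chain0)          # chain 1 may start only at this time
    return [barrier - d for d in durations_chain0]

# Four cores execute tasks 0..3 of chain 0 with uneven durations.
idle = serial_idle_time([4, 7, 5, 10])
# Cores 0, 1 and 2 idle for 6, 3 and 5 time units while core 3 finishes.
```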
• the present application therefore needs to solve the no-load problem among multiple cores during task chain scheduling and improve the performance of multi-core scheduling.
  • FIG. 3 is a schematic structural diagram of a multi-core scheduling system 30 provided by an embodiment of the present application.
  • the multi-core scheduling system 30 includes a multi-core processor 31 and a device development kit and driver (DDK) 32 .
  • the multi-core processor 31 may be a multi-core coprocessor such as a GPU, a Neural Network Processing Unit (NPU), etc.
• the multi-core processor 31 may specifically include a task scheduler 311 and multiple processing cores 312 coupled to the task scheduler 311; the task scheduler 311 is configured to store multiple task chains and the dependencies between the multiple task chains, where a dependency relationship is either a dependency or no dependency; the task scheduler 311 is further configured to: determine a first task chain and a second task chain from the multiple task chains according to the dependencies between the multiple task chains, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; schedule some or all of the multiple processing cores 312 to execute the one or more first tasks; and, when at least one first processing core among the multiple processing cores 312 is in an idle state, schedule at least one second task in the second task chain to the at least one first processing core for execution.
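• as a minimal illustrative model only (an assumed Python data model, not the disclosed hardware), the task scheduler's bookkeeping described above might look like:

```python
# Sketch of a scheduler that stores task chains plus their dependencies and
# selects pairs of chains with no dependency between them. Class and method
# names are assumptions of this sketch.

class TaskScheduler:
    def __init__(self):
        self.chains = {}          # chain id -> list of tasks
        self.depends_on = {}      # chain id -> set of chain ids it waits for

    def add_chain(self, cid, tasks, deps=()):
        self.chains[cid] = list(tasks)
        self.depends_on[cid] = set(deps)

    def independent_pairs(self):
        """Yield (a, b) chain pairs with no dependency in either direction."""
        ids = sorted(self.chains)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if b not in self.depends_on[a] and a not in self.depends_on[b]:
                    yield (a, b)

s = TaskScheduler()
s.add_chain(0, ["t0", "t1", "t2", "t3"])
s.add_chain(1, ["t0", "t1", "t2", "t3"])          # no dependency on chain 0
s.add_chain(2, ["t0"], deps=[1])                   # must wait for chain 1
# (0, 1) qualifies as a first/second task chain pair; (1, 2) does not.
```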
  • the task scheduler 311 is applied to the task distribution of the multiple processing cores 312 of the multi-core processor 31 and the scheduling management of the multiple processing cores 312 , and is a management unit of the multi-core processor 31 .
  • the device development kit and driver 32 include a user mode driver (User Mode Driver, UMD) and a kernel mode driver (Kernel Mode Driver, KMD).
• the multiple task chains stored in the task scheduler 311 are obtained by the device development kit and driver 32 parsing the API calls of upper-layer applications (APPs) and transmitting the tasks to the task scheduler 311 on the multi-core processor 31.
  • the device development kit and the driver 32 can directly complete the task assembly and deliver it to the task scheduler 311 in the form of a task chain.
• the device development kit and driver 32 can also hand the task assembly work over to the task scheduler 311, delivering the tasks to the task scheduler 311 in the form of a command stream, and the task scheduler 311 assembles the task chains according to the command stream.
  • the device development kit and driver 32 will also deliver the dependencies between the task chains to the task scheduler 311 , and the dependencies between the task chains include dependencies and no dependencies.
• in the existing solutions, the dependencies between the task chains are maintained in software (the device development kit and driver 32), and the multi-core processor 31 cannot know the dependencies between the task chains.
• the task scheduler can only ensure that the later task chain is scheduled for execution after the earlier task chain has finished executing, so some processing cores in the multi-core processor experience an idle period.
  • the present application proposes a novel multi-core scheduling scheme in consideration of the deficiencies of the existing multi-core scheduling schemes.
  • the technical solution provided by the present application maintains the dependency relationship between task chains on hardware, that is, maintains the dependency relationship between task chains on the multi-core processor 31 , specifically on the task scheduler 311 .
• since the task scheduler 311 can know the dependencies between the task chains, the task scheduler 311 can deliver tasks in the task chains without dependencies to the processing cores 312 for execution in advance, so as to prevent the occurrence of no-load of the processing cores.
• when the two task chains are delivered to the task scheduler 311 or assembled in the task scheduler 311, they may already be independent, that is, have no dependency from the beginning, and can be scheduled for execution directly; alternatively, the two task chains may have a dependency at that point which is released later, that is, they initially have a dependency that later becomes no dependency, and once the dependency is released they can be scheduled for execution.
• when the task scheduler 311 schedules the execution of task chains and there is no dependency between them, then after the tasks of the currently executing task chain have been issued to the processing cores, the tasks of the later task chain can be dispatched to the processing cores immediately, without waiting for the earlier task chain to finish executing; idle processing cores are thus scheduled for use by the later task chain.
  • the first task chain starts executing before the second task chain.
• the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; once the one or more first tasks have all been issued to some or all of the multiple processing cores 312 for execution, as soon as at least one first processing core among the multiple processing cores 312 returns to the idle state, at least one second task among the one or more second tasks is delivered to that at least one first processing core for execution.
• the idle state (or no-load state) means that the processing core 312 is not executing any task.
  • the processing cores 312 in the idle state may be processing cores that are not scheduled to execute the first task in the first task chain.
• when the processing cores used to execute the first task chain are only some of the multiple processing cores 312, a processing core 312 that is not executing a first task of the first task chain and is in an idle state can be used to execute a second task of the second task chain.
  • the processing core 312 in the idle state may also be a processing core that is idle after executing the first task in the first task chain.
• once a processing core 312 that executes a first task in the first task chain finishes that task and enters the idle state, it can immediately be used to execute a second task in the second task chain, without waiting for the entire first task chain to complete before executing the second task in the second task chain.
  • the execution of the first task chain is completed means that all the first tasks in the first task chain are executed and completed, and one processing core 312 can execute at least one first task or at least one second task. It should be understood that the multi-core scheduling process of the present application is a dynamic process.
• the execution time of the third task chain is after the second task chain, and the third task chain includes one or more third tasks; when the one or more second tasks have all been issued to the processing cores 312 for execution, as long as there is a processing core 312 in the idle state among the multiple processing cores 312, at least one of the one or more third tasks is issued to the idle processing core 312 for execution; the processing core 312 used to execute a third task may be: a processing core 312 not used to execute the first tasks or the second tasks, a processing core 312 that becomes idle after executing a first task, or a processing core 312 that becomes idle after executing a first task and a second task.
• as long as any processing core 312 among the plurality of processing cores 312 is in an idle state, it is immediately scheduled to execute a task of the next task chain; in this way the present application can effectively solve the processing-core no-load problem and improve multi-core scheduling performance.
• the above first task chain and second task chain may be task chains of the same type, provided they have no dependency when delivered to the processing cores for execution.
  • the above-mentioned first task chain and second task chain may also be different types of task chains, which can be regarded as independent, because different types of task chains can be executed in parallel.
  • the device development kit and the driver 32 actively issue tasks to the multi-core processor 31 .
• after the multi-core processor 31 completes a task, it informs the device development kit and driver 32 by an interrupt or by having the driver query a register; generally an interrupt is used, as it is efficient and friendly to the device development kit and driver 32.
• the multi-core processor 31 includes a task scheduler 311 and multiple processing cores 312 coupled to the task scheduler 311; the task scheduler 311 can maintain the dependencies between task chains, that is, store the dependencies between the multiple task chains, and the task scheduler 311 also stores the multiple task chains themselves, so that the task scheduler 311 can determine a first task chain and a second task chain with no dependency from among the multiple task chains; the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks.
  • the task scheduler 311 may schedule some or all of the multiple processing cores 312 to execute the first task chain.
• the task scheduler 311 schedules at least one second task in the second task chain to the at least one first processing core for execution; thus, in this embodiment of the present application, once a processing core becomes idle, it is immediately scheduled by the task scheduler 311 to execute tasks, thereby improving multi-core scheduling performance.
• the task scheduler 311 includes a dependency management unit 3111 and a task queue unit 3112; the dependency management unit 3111 is used to store the dependencies between the multiple task chains; if it determines that the dependency between the first task chain and the second task chain is no dependency, it sends a first instruction to the task queue unit 3112, where the first instruction indicates that there is no dependency between the first task chain and the second task chain.
  • the task scheduler 311 includes a dependency management unit 3111 and a task queue unit 3112 .
• the device development kit and driver 32, or the task scheduler 311, deliver the task chains to the task queue unit 3112, and simultaneously deliver the dependencies between the task chains to the dependency management unit 3111.
• the device development kit and driver 32 deliver the dependencies between the task chains to the task scheduler 311, that is, they deliver the dependencies between the task chains to the dependency management unit 3111 of the task scheduler 311.
  • the dependency management unit 3111 can store the dependencies between task chains; the device development kit and driver 32 deliver the task chain to the task scheduler 311, that is, the device development kit and driver 32 deliver the task chain to In the task queue unit 3112 of the task scheduler 311 , the task queue unit 3112 can be used to store the task chain; in addition, the task chain assembled by the task scheduler 311 is also stored in the task queue unit 3112 .
• the task chains delivered by the device development kit and driver 32 to the task queue unit 3112, or the task chains assembled by the task scheduler 311 and stored in the task queue unit 3112, may or may not have dependencies; when task chains delivered to the task queue unit 3112 have a dependency, the dependency can be released as the task chains execute.
  • the dependency management unit 3111 can maintain the dependency between task chains, and specifically records the change of the dependency between the task chains.
  • the task chains that are delivered to the task queue unit 3112 and have no dependencies at the beginning can be executed immediately, that is, the dependency management unit 3111 can inform the task queue unit 3112 that these task chains that have no dependencies at the beginning can be executed.
• the dependency management unit 3111 records the dependencies of the task chains that initially have dependencies. After the dependencies among such task chains are released, the dependency management unit 3111 informs the task queue unit 3112 that these task chains can be executed. For example, after the dependency management unit 3111 determines that there is no longer a dependency between the first task chain and the second task chain, it informs the task queue unit 3112 of this through the first instruction. It should be understood that dependency release is the change of a relationship from having a dependency to having no dependency. Each first instruction is directed at one independent task chain and tells the task queue unit 3112 whether that task chain can start executing.
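• the bookkeeping just described can be sketched as follows (class and field names are assumptions of this sketch): a chain whose last pending dependency is released is announced to the task queue with a "first instruction", modeled here as appending its id to a list.

```python
# Hedged sketch of the dependency management unit: chains with no pending
# dependencies are announced as runnable; releasing a finished chain may make
# further chains runnable. Names are assumptions, not the patent's interfaces.

class DependencyManager:
    def __init__(self):
        self.pending = {}                 # chain id -> set of unmet dependencies
        self.first_instructions = []      # chains announced as runnable, in order

    def add_chain(self, cid, deps):
        self.pending[cid] = set(deps)
        if not deps:
            self.first_instructions.append(cid)   # independent from the start

    def release(self, finished_cid):
        """A chain finished: drop it from every dependency set it appears in."""
        for cid, deps in self.pending.items():
            if finished_cid in deps:
                deps.discard(finished_cid)
                if not deps:
                    self.first_instructions.append(cid)

dm = DependencyManager()
dm.add_chain(0, [])        # runnable immediately
dm.add_chain(1, [0])       # waits for chain 0
dm.release(0)              # chain 0 finished: chain 1 becomes runnable
```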
  • a task chain may depend on the completion of execution of one or more other task chains.
  • a task chain may depend on the end of the processing of an event in the DDK.
• for example, when task chain 0 finishes executing, a characteristic value (signal semaphore) can be written into a semaphore buffer.
  • the dependency management unit 3111 can poll the semaphore, and poll the expected value at a certain time point, that is, poll the signal triggered by the end of task chain 0.
• the dependency management unit 3111 then confirms that task chain 1 can be executed, and informs the task queue unit 3112 that task chain 1 can be issued for execution.
  • the above task chain 0 may be the first task chain, and the above task chain 1 may be the second task chain.
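• the wait/polling semaphore described above can be modeled as follows (buffer layout and values are assumptions of this sketch): the end of task chain 0 writes a characteristic value, and the dependency management unit polls until that value appears.

```python
# Illustrative model of the semaphore mechanism: signal/write stores any value
# into a semaphore buffer; wait/polling checks for the expected value. The
# dict-based buffer and the key names are assumptions of this sketch.

semaphore_buffer = {}

def signal(sem_id, value):
    """'signal/write': any value may be written, per the maintenance rules."""
    semaphore_buffer[sem_id] = value

def poll(sem_id, expected):
    """'wait/polling': true once the expected value has been written."""
    return semaphore_buffer.get(sem_id) == expected

signal("chain0_done", 42)              # triggered when task chain 0 finishes
chain1_ready = poll("chain0_done", 42) # dependency unit confirms chain 1 may run
```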
• the dependency management unit 3111, after judging that the dependencies between task chains have been released, notifies the task queue unit 3112 to issue the task chains whose dependencies were released.
• after a task chain finishes executing, the task queue unit 3112 notifies the dependency management unit 3111 through a certain semaphore.
• dependencies can be expressed through mechanisms such as barrier, fence, semaphore, event, etc.
• for a semaphore there are two types of events: polling (wait/polling) and setting (signal/write).
• in either case, the dependency management unit 3111 needs to be notified.
• although "signal" can be translated as "set", this does not mean setting between the two values 0 and 1; any value can be written.
  • the action of the signal is to write the buffer, and the written value can be any value according to the maintenance rules.
• the task scheduler 311 includes a dependency management unit 3111 and a task queue unit 3112, and the hardware implements the dependency management between task chains; that is, the dependency management unit 3111 can acquire and store the dependencies between the task chains without the software (i.e. the DDK) participating in the dependency management and control between task chains, which saves software/hardware interaction time and software-side calls. Moreover, once the dependencies between task chains are removed, that is, once there is no dependency or a dependency has changed into no dependency, the hardware responds quickly and can immediately schedule a dependency-free task chain to the processing cores, which is better than software-side management; for example, after the dependency management unit 3111 determines that there is no dependency between the first task chain and the second task chain, it immediately sends the first instruction to the task queue unit 3112, and the task queue unit 3112 immediately sends the first task chain and the second task chain to the processing cores for execution.
• the task scheduler 311 further includes a task splitting unit 3113 and a multi-core management unit 3114; the task queue unit 3112 is used to store the multiple task chains; after receiving the first instruction sent by the dependency management unit 3111, the task queue unit 3112 sends the first task chain and the second task chain to the task splitting unit 3113 and sends a second instruction to the multi-core management unit 3114, where the second instruction instructs the multi-core management unit 3114 to preempt processing cores for the first task chain and the second task chain.
  • the task scheduler 311 further includes a task splitting unit 3113 and a multi-core management unit 3114 .
• the task queue unit 3112 stores the multiple task chains, that is, the task queue unit 3112 manages multiple task chains of multiple processes; for example, the task queue unit 3112 can send the first task chain and the second task chain, which have no dependency, for execution.
• the task queue unit 3112 can, according to a certain strategy, send task chains that have no dependency or whose dependency has been released to the task splitting unit 3113 for execution; at the same time, it informs the multi-core management unit 3114 to apply for corresponding processing cores to execute those task chains.
• the dependency management unit 3111 informs the task queue unit 3112 through the first instruction that there is no dependency between the first task chain and the second task chain; after receiving the first instruction, the task queue unit 3112 sends the first task chain and the second task chain to the task splitting unit 3113, and notifies the multi-core management unit 3114 through the second instruction to preempt processing cores 312 for the first task chain and the second task chain, so as to execute them.
  • the task queue unit 3112 needs to inform the multi-core management unit 3114 which processing cores to preempt for the first task chain and the second task chain through the second instruction, but it does not need to explain how to preempt, because the multi-core management unit 3114 implements preemption using a fixed strategy.
• the second instruction for preempting processing cores for the first task chain and the second task chain may be sent in two parts: the first informs the multi-core management unit 3114 to preempt processing cores for the first task chain, and the second informs the multi-core management unit 3114 to preempt processing cores for the second task chain.
  • the above certain strategies include but are not limited to:
• when the software enables the time-slice rotation function (the software can choose whether to enable it), the task queue unit 3112 dispatches the task chain of the corresponding process only within its corresponding time slice.
  • the task queue unit manages the distribution of the binning/compute task chain through a predetermined strategy, such as interleaving.
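• one possible reading of the interleaving strategy, sketched under the assumption that binning and compute chains are held in two queues and dispatched alternately (the queue contents and round-robin order are assumptions, not part of the disclosure):

```python
# Sketch of interleaved dispatch: alternate between the binning queue and the
# compute queue when picking the next chain to issue.

from collections import deque

def interleave(binning, compute):
    """Return a dispatch order that alternates between the two queues."""
    binning, compute = deque(binning), deque(compute)
    order = []
    while binning or compute:
        if binning:
            order.append(binning.popleft())
        if compute:
            order.append(compute.popleft())
    return order

order = interleave(["bin0", "bin1"], ["comp0", "comp1", "comp2"])
```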
• the multi-core management unit 3114 can realize dynamic preemption (or dynamic occupation) and dynamic release of the multiple processing cores 312: as soon as a processing core completes its task in the earlier task chain, the multi-core management unit 3114 immediately releases it and re-applies to preempt it for executing tasks in the later task chain; for example, after a processing core finishes executing a first task in the first task chain, the multi-core management unit 3114 can release it from the first task chain and re-preempt it for executing a second task in the second task chain.
• it should be understood that "dynamic" preemption also implies that a preempted core may end up unused: a processing core 312 preempted by the multi-core management unit 3114 for a task chain may not actually be used to execute tasks of that task chain; in that case the multi-core management unit 3114 releases the processing core 312 directly, and such a release is very fast.
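• dynamic preemption and release, as described above, can be sketched as follows (the API is an assumption of this sketch): a core is released the moment its task completes and can immediately be re-preempted for the next chain, instead of waiting for the whole previous chain to finish.

```python
# Sketch of the multi-core management unit's preempt/release cycle. The class
# and its methods are illustrative assumptions, not the patent's hardware.

class MultiCoreManager:
    def __init__(self, n_cores):
        self.owner = [None] * n_cores      # which task chain holds each core

    def preempt(self, chain_id):
        """Grab the first idle core for chain_id; return its index or None."""
        for core, owner in enumerate(self.owner):
            if owner is None:
                self.owner[core] = chain_id
                return core
        return None

    def release(self, core):
        self.owner[core] = None

m = MultiCoreManager(2)
c0 = m.preempt("chain0")     # core 0 -> chain0
c1 = m.preempt("chain0")     # core 1 -> chain0
m.release(c1)                # core 1 finished its chain0 task
c2 = m.preempt("chain1")     # core 1 immediately re-preempted for chain1
```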
  • the task queue unit 3112 issues the task chain to the task splitting unit 3113 .
  • the task scheduler 311 further includes a task splitting unit 3113 and a multi-core management unit 3114.
  • the task queue unit 3112 can store multiple task chains.
• after the task queue unit 3112 receives the first instruction sent by the dependency management unit 3111, it sends the first task chain and the second task chain to the task splitting unit 3113 and sends the second instruction to the multi-core management unit 3114.
  • the multi-core management unit 3114 preempts processing cores for the first task chain and the second task chain; since the task splitting unit 3113 can split the first task chain into one or more first tasks and the second task chain into one or multiple second tasks, the multi-core management unit 3114 can preempt processing cores for the first task chain and the second task chain, which is beneficial to the execution of the first task chain and the second task chain.
• the task splitting unit 3113 is configured to split the first task chain into the one or more first tasks; the multi-core management unit 3114 is configured to: preempt, according to the second instruction, one or more second processing cores from the plurality of processing cores 312, and send the result of preempting the one or more second processing cores to the task splitting unit 3113; the task splitting unit 3113 is further configured to schedule the one or more second processing cores to execute the one or more first tasks.
  • the task splitting unit 3113 splits the tasks in the task chain.
• the task splitting unit 3113 splits the first task chain into one or more first tasks; the rules for splitting a task chain can be raster order (Raster order), Z order (Z order), U order (U order), 3D cube (3D cube), and so on.
• the task splitting unit 3113 sends the split tasks to the processing cores 312 that the multi-core management unit 3114 has preempted for the task chain, and those processing cores 312 carry out the computation and execution of the tasks; the multi-core management unit 3114 preempts one or more second processing cores from the multiple processing cores 312 to execute the first task chain, and the one or more second processing cores may be some or all of the multiple processing cores 312.
• the task splitting unit 3113 delivers the one or more first tasks obtained by splitting the first task chain to the one or more second processing cores. It should be understood that there is no fixed binding between the tasks split from a task chain and the processing cores 312: a task split from a task chain can be sent to any of the processing cores 312 that the device development kit and driver 32 specified for executing that task chain. For example, the one or more first tasks obtained by splitting the first task chain may be distributed randomly among the above one or more second processing cores.
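• the splitting rules named above are tile traversal orders; as one concrete illustration (the 16-bit coordinate width is an assumption of this sketch), the Z order can be computed by interleaving the bits of a tile's (x, y) coordinates:

```python
# Sketch of Z-order (Morton) indexing, one of the splitting rules named above:
# the index of a tile is formed by interleaving the bits of x and y.

def z_order_index(x, y):
    idx = 0
    for bit in range(16):                       # assumed 16-bit coordinates
        idx |= ((x >> bit) & 1) << (2 * bit)    # x bits go to even positions
        idx |= ((y >> bit) & 1) << (2 * bit + 1)  # y bits to odd positions
    return idx

# The four tiles of a 2x2 grid visited in Z order:
tiles = sorted([(x, y) for y in range(2) for x in range(2)],
               key=lambda t: z_order_index(*t))
```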
  • the rules for the multi-core management unit 3114 to preempt processing cores for the task chain are as follows:
  • the device development kit and the driver 32 will issue the designation to the task queue unit 3112;
• generally, the device development kit and driver 32 specify that a task chain can be executed on all processing cores 312; but in special scenarios, when some task chains can be executed slowly in an asynchronous (async) manner, the device development kit and driver 32 can specify that such a task chain is only allowed to execute on certain processing cores 312.
• that is, the device development kit and driver 32 specify in advance whether the first task chain can be executed on all or some of the plurality of processing cores.
• taking a GPU as the multi-core processor 31, two example scenarios are as follows:
• the GPU can do device virtualization, so that the DDK can "see" multiple GPU instances (although the hardware is still essentially one GPU).
  • each GPU instance can see different GPU cores. For example, GPU0 instance can only see GPU cores 0 to 1; GPU1 instance can only see GPU cores 2 to 5.
• when the DDK schedules a task chain on the GPU0 instance, it needs to specify that the task chain can only execute on GPU cores 0 to 1; when scheduling a task chain on the GPU1 instance, it needs to specify GPU cores 2 to 5.
  • APPs can specify that certain tasks are asynchronous computing scenarios (async compute). These computations do not require high real-time performance.
• the DDK estimates the computational load of the async compute task chain through certain indicators, and thereby allocates a corresponding number of GPU cores so that the chain does not execute at full speed.
• the multi-core management unit 3114 and the task splitting unit 3113 can share the preemption status of the processing cores 312 in real time, that is, the multi-core management unit 3114 sends the preemption status of the processing cores 312 to the task splitting unit 3113 in real time. After any processing core 312 completes its task execution, it notifies the multi-core management unit 3114, and the multi-core management unit 3114 actively decides on the release and preemption of processing cores according to the scoreboard it maintains and the task completion status.
• the scoreboard is located in the multi-core management unit 3114; the dependency management unit 3111 needs to know the end event of each task chain in order to handle the dependencies between the task chains, and it obtains this information indirectly through the scoreboard.
• the task splitting unit 3113 is responsible for dispatching tasks to the processing cores 312, but needs to query the scoreboard in the multi-core management unit 3114 to learn which processing cores have been preempted by the multi-core management unit 3114, whether those processing cores can currently still receive or execute tasks, and whether all the processing cores used to execute a certain task chain have been released (which marks the end of that task chain's execution).
  • the scoreboard in the multi-core management unit 3114 needs to be written to record the assignment of tasks on the processing cores preempted by the multi-core management unit 3114 .
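• a hypothetical model of the scoreboard just described (field names are assumptions of this sketch): it records which task of which chain runs on each preempted core, and the end of a chain is marked by all of its cores having been released.

```python
# Sketch of the scoreboard: per-core entries record (chain, task) assignments;
# a chain is finished when no core is still running one of its tasks.

class Scoreboard:
    def __init__(self):
        self.entries = {}                      # core -> (chain, task) or None

    def assign(self, core, chain, task):
        self.entries[core] = (chain, task)

    def release(self, core):
        self.entries[core] = None

    def chain_finished(self, chain):
        """True when no core is still running a task of this chain."""
        return all(e is None or e[0] != chain for e in self.entries.values())

sb = Scoreboard()
sb.assign(0, "chain0", "task0")
sb.assign(1, "chain0", "task1")
sb.release(0)
done_early = sb.chain_finished("chain0")   # core 1 still busy
sb.release(1)
done = sb.chain_finished("chain0")         # all cores released
```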
• the task splitting unit 3113 may split the first task chain into one or more first tasks; the second instruction may include the number of processing cores required to execute the first task chain or the identifiers of the processing cores specifically used to execute it.
• after receiving the second instruction from the task queue unit 3112, the multi-core management unit 3114 preempts one or more second processing cores from the multiple processing cores 312 and sends the result of preempting the one or more second processing cores to the task splitting unit 3113; after the task splitting unit 3113 has split the first task chain into one or more first tasks and has received the result that the multi-core management unit 3114 preempted one or more second processing cores for the first task chain, it schedules the one or more second processing cores to execute the one or more first tasks of the first task chain; this is conducive to preempting computing resources for the execution of the first task chain.
  • the task splitting unit 3113 is further configured to split the second task chain into the one or more second tasks;
• the multi-core management unit 3114 is further configured to: when at least one first processing core among the plurality of processing cores 312 is in an idle state, preempt the at least one first processing core according to the second instruction, and send the result of preempting the at least one first processing core to the task splitting unit 3113;
  • the task splitting unit 3113 is further configured to schedule at least one second task of the one or more second tasks to the at least one first processing core for execution.
• the task splitting unit 3113 may further split the second task chain into one or more second tasks. After the task splitting unit 3113 has scheduled the one or more second processing cores to execute the one or more first tasks obtained by splitting the first task chain, the multi-core management unit 3114 can immediately preempt processing cores for the execution of the second task chain: as long as a processing core 312 is in an idle state, it can be preempted for executing the second task chain, and a processing core used to execute the second task chain is a first processing core. It should be understood that whether the second task chain may execute on all or some of the multiple processing cores is also specified in advance by the device development kit and driver 32.
  • the processing cores 312 in the idle state may be processing cores that are not scheduled to execute the first task in the first task chain.
• when the processing cores used to execute the first task chain are only some of the multiple processing cores 312, a processing core 312 that is not executing a first task of the first task chain and is in an idle state can be preempted by the multi-core management unit 3114 for executing a second task of the second task chain.
  • the processing core 312 in the idle state may also be a processing core that is idle after executing the first task in the first task chain.
• once a processing core 312 that executes a first task in the first task chain finishes that task and enters the idle state, it can immediately be preempted by the multi-core management unit 3114 for executing a second task in the second task chain, without waiting for the whole first task chain to finish executing before being preempted for the second task chain.
  • FIG. 4 is a schematic diagram of another task chain scheduling and execution process provided by an embodiment of the present application.
  • a brief description of FIG. 4 is as follows:
  • task chain 0 and task chain 1 can be divided into tasks 0 to 3 respectively, with a total of 4 tasks; among them, task chain 0 and task chain 1 are the same type of task chain, and task chain 0 and task chain 1 There is no dependency between them.
  • the multi-core processor has a 4-core structure, that is, the multi-core processor includes processing cores 0 to 3.
• the task scheduler first issues the four tasks of task chain 0 to processing cores 0 to 3 for execution; for example, task 0 in task chain 0 is issued to processing core 0 for execution, task 1 in task chain 0 is issued to processing core 1 for execution, task 2 in task chain 0 is issued to processing core 2 for execution, and task 3 in task chain 0 is issued to processing core 3 for execution.
• without waiting for all of task chain 0 to complete, as soon as any one of processing cores 0 to 3 finishes its task in task chain 0, the task scheduler immediately issues a task in task chain 1 to that processing core for execution; for example, after processing core 3 finishes executing task 3 in task chain 0, task 0 in task chain 1 is immediately sent to processing core 3 for execution; after processing core 2 finishes executing task 2 in task chain 0, task 1 in task chain 1 is immediately sent to processing core 2 for execution; after processing core 1 finishes executing task 1 in task chain 0, task 2 in task chain 1 is immediately sent to processing core 1 for execution; after processing core 0 finishes executing task 0 in task chain 0, task 3 in task chain 1 is immediately sent to processing core 0 for execution.
  • the above task chain 0 may be the first task chain
  • the above task chain 1 may be the second task chain.
  • The scheduling behavior in Figure 4 enables concurrent execution of independent task chains, schedules the processing cores promptly, and makes full use of their computing power, reducing the performance loss caused by idle (no-load) cores.
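The Figure 4 behavior can be sketched as a small simulation (a hypothetical model for illustration, not the patent's actual hardware): every core that finishes a task of chain 0 is immediately handed the next pending task of chain 1 instead of waiting for chain 0 to finish as a whole.

```python
import heapq

def schedule(chains, num_cores, durations):
    """Greedily assign tasks: chain 0's tasks first, then chain 1's tasks to
    whichever core frees up earliest, modeling Figure 4's fine-grained reuse."""
    pending = [(c, t) for c, chain in enumerate(chains) for t in chain]
    cores = [(0, core) for core in range(num_cores)]  # (time_free, core_id)
    heapq.heapify(cores)
    assignment = []
    for chain_id, task_id in pending:
        free_at, core = heapq.heappop(cores)  # earliest-free core
        finish = free_at + durations[(chain_id, task_id)]
        assignment.append((chain_id, task_id, core, free_at))
        heapq.heappush(cores, (finish, core))
    return assignment, max(t for t, _ in cores)

chains = [[0, 1, 2, 3], [0, 1, 2, 3]]          # two independent 4-task chains
durations = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (0, 3): 4,
             (1, 0): 4, (1, 1): 3, (1, 2): 2, (1, 3): 1}
_, makespan = schedule(chains, 4, durations)    # → 5
# With a barrier between the chains, the makespan would instead be
# max(chain 0 task) + max(chain 1 task) = 4 + 4 = 8.
```

With these (invented) task durations, fine-grained preemption finishes at time 5, versus 8 when the second chain must wait for the first to complete entirely.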
  • A processing core 312 released from executing the first task chain is preempted in time for executing the second task chain.
  • The tasks in each task chain are issued using a balanced strategy, ensuring that the number of unfinished tasks on each processing core remains roughly equal.
  • The multi-core management unit 3114 will preempt all computing resources used to execute low-priority task chains; a given processing core 312 can see only tasks of high-priority task chains or only tasks of low-priority task chains, never tasks of both at the same time.
  • The remaining processing cores can be dynamically scheduled to execute other types of task chains, for example binning-type task chains.
  • The task splitting unit 3113 can split the second task chain into one or more second tasks. After the last first task of the first task chain has been dispatched to one of the one or more second processing cores for execution, the multi-core management unit 3114 can preempt processing cores for the execution of the second tasks of the second task chain; the second instruction may include the number of processing cores required to execute the second task chain or the identifiers of the processing cores specifically used to execute it. Thereafter, as long as at least one first processing core among the multiple processing cores 312 is in an idle state, the multi-core management unit 3114 preempts that at least one first processing core according to the second instruction and sends the preemption result to the task splitting unit 3113, which then schedules at least one of the second tasks to the at least one first processing core for execution. In this way, the hardware (the multi-core management unit 3114) releases and applies for the multiple processing cores 312 at per-core granularity, with each processing core managed independently.
  • Compared with uniformly releasing and applying for multiple cores at task-chain boundaries, this management method greatly reduces or even eliminates the no-load problem of some processing cores and improves their utilization efficiency.
  • The task scheduler 311 further includes a task assembling unit 3115. The task assembling unit 3115 is configured to obtain the command stream and the dependencies between some or all of the multiple task chains, generate some or all of the multiple task chains according to the command stream, send the generated task chains to the task queue unit 3112, and send the dependencies between them to the dependency management unit 3111.
  • The DDK inserts both the dependencies specified through the API and the dependencies not specified by the API but inferred by the DDK itself into the command stream, in instruction order.
  • The hardware executes the command stream, assembles the commands in the command stream into a job, matches the instruction-form dependencies to the corresponding task chains, and sends the result to the next-level module upon completion.
  • The device development kit and driver 32 can complete the task assembly directly and deliver the result to the task scheduler 311 in the form of task chains.
  • Alternatively, the device development kit and driver 32 can hand the task assembly work over to the task assembling unit 3115 in the task scheduler 311, issuing the tasks to it in the form of a command stream; the task assembling unit 3115 then assembles the task chains according to the command stream. In addition, the device development kit and driver 32 also issues the dependencies between the task chains to the task assembling unit 3115. After assembling a task chain, the task assembling unit 3115 sends it to the task queue unit 3112 and sends its dependencies to the dependency management unit 3111.
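The assembly path just described can be sketched as grouping commands from the stream into chains while peeling off dependency records. The command format below is an invented stand-in, not the patent's actual encoding:

```python
def assemble(command_stream):
    """Split a flat command stream into task chains and dependency records.
    Each command is a tuple: ('task', chain_id, payload) or
    ('dep', chain_id, depends_on_chain_id)."""
    chains, deps = {}, {}
    for kind, chain_id, arg in command_stream:
        if kind == "task":
            chains.setdefault(chain_id, []).append(arg)   # goes to task queue unit
        elif kind == "dep":
            deps.setdefault(chain_id, set()).add(arg)     # goes to dependency unit
    return chains, deps

stream = [("task", 0, "t0"), ("task", 0, "t1"),
          ("task", 1, "t0"), ("dep", 1, 0)]
chains, deps = assemble(stream)
# chains → {0: ['t0', 't1'], 1: ['t0']}; deps → {1: {0}}
```

The two return values mirror the two outputs of the task assembling unit 3115: assembled chains for the task queue unit 3112 and dependencies for the dependency management unit 3111.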
  • Depending on the division of work between the device development kit and driver 32 and the multi-core processor 31, the task assembling unit 3115 is optional.
  • The software (DDK) may issue tasks to the multi-core processor 31 in the form of a command stream. The task assembling unit 3115 in the multi-core processor 31 can receive the command stream and the dependencies between some or all of the multiple task chains, generate some or all of the multiple task chains according to the command stream, send the generated task chains to the task queue unit 3112, and send their dependencies to the dependency management unit 3111. In this way, multi-core scheduling is also achieved when the software (DDK) issues tasks in the form of command streams.
  • FIG. 5 is a schematic flowchart of a multi-core scheduling provided by an embodiment of the present application, which can be applied to the multi-core scheduling system 30 shown in FIG. 3, including but not limited to the following steps:
  • Step 501 Device development kit and driver (DDK) task parsing.
  • The DDK parses the application's API calls into the tasks that need to be executed by the multi-core processor and sets the dependencies between the tasks. After a segment of tasks has been parsed, step 502 is entered.
  • the DDK task parsing process can be specifically executed by the device development kit and the driver 32 .
  • Step 502 Task assembly.
  • Tasks are assembled into task chains identifiable by the multi-core processor, the corresponding data structures (desc, i.e., descriptors) are constructed, and the dependencies are recorded.
  • Descriptors are data structures stored in double data rate (DDR) synchronous dynamic random access memory that characterize each task chain in various respects: which input data to use, which program segment to execute, how to process the data, where to output the results, in what format, and so on.
  • the task assembly process may be executed by the device development kit, the driver 32 or the task assembly unit 3115 .
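The descriptor of step 502 can be pictured as a plain record. The field names below are illustrative assumptions drawn from the description, not the patent's actual in-memory layout:

```python
from dataclasses import dataclass, field

@dataclass
class TaskChainDescriptor:
    # Fields mirroring what step 502 says a descriptor characterizes:
    chain_id: int
    input_buffers: list      # which input data the chain consumes
    program_segment: int     # which program segment executes the chain
    processing_mode: str     # how the tasks are processed
    output_address: int      # where results are written (e.g., a DDR offset)
    output_format: str       # in what format results are written
    depends_on: list = field(default_factory=list)  # ids of chains this one waits for

desc = TaskChainDescriptor(0, ["vertices"], 2, "binning", 0x1000, "rgba8")
```

In the described design such records would live in DDR and be read by the task scheduler when a chain is issued; the dataclass here only captures the logical contents.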
  • Step 503 Dependency management.
  • The dependency management process participates in maintaining the dependencies between task chains according to the record information of the scoreboard. When all the remaining task chains on which a waiting task chain depends are recorded in the scoreboard as completed, the dependency relationship of the waiting task chain is released.
  • the dependency management process can be specifically executed by the dependency management unit 3111 , and the scoreboard is located in the multi-core management unit 3114 .
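The release rule of step 503 — a waiting chain becomes runnable once every chain it depends on is recorded as completed on the scoreboard — can be sketched as follows (the function and data shapes are illustrative assumptions):

```python
def released_chains(waiting, completed):
    """waiting: dict mapping chain_id -> set of chain_ids it depends on.
    completed: set of chain_ids the scoreboard has recorded as done.
    Returns the chains whose dependencies are all satisfied."""
    return [c for c, deps in waiting.items() if deps <= completed]

# chain 2 waits on chains 0 and 1; chain 3 waits on chain 2
waiting = {2: {0, 1}, 3: {2}}
print(released_chains(waiting, {0, 1}))  # → [2]
```

Once chain 2 is itself recorded as completed, a later call with `completed = {0, 1, 2}` would release chain 3 as well.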
  • Step 504 Task queue.
  • the task queuing process can be specifically executed by the task queuing unit 3112 .
  • Step 505 Multi-core management.
  • The number of tasks obtained by splitting a task chain may be the same as or different from the number of processing cores. When the number of tasks exceeds the number of processing cores, at least one processing core needs to execute two or more tasks of the task chain; such a core is released only after it finishes the last of its tasks of that chain. For a processing core that executes only one task of the task chain, that task is also its last task of the chain.
  • the multi-core management process may be specifically executed by the multi-core management unit 3114 .
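The case just described, where a chain splits into more tasks than there are preempted cores, can be sketched with a simple round-robin plan (purely illustrative; the patent does not specify the assignment policy):

```python
def assign_tasks(num_tasks, cores):
    """Round-robin a chain's tasks over the preempted cores.
    Returns core -> ordered list of task ids; a core is released only
    after the last task in its list completes."""
    plan = {core: [] for core in cores}
    for task in range(num_tasks):
        plan[cores[task % len(cores)]].append(task)
    return plan

plan = assign_tasks(6, [0, 1, 2, 3])
# plan → {0: [0, 4], 1: [1, 5], 2: [2], 3: [3]}
# Cores 0 and 1 each execute two tasks; their release points are tasks 4 and 5.
```

Cores 2 and 3 execute a single task each, so that one task is also their last task of the chain and they can be released immediately after it.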
  • Step 506 Task splitting.
  • The task chain waiting to be executed is split into one or more tasks, which are sent to the processing cores preempted for it in step 505 to perform the computation. After the split tasks are issued, step 507 and step 508 are entered simultaneously.
  • the task splitting process can be specifically executed by the task splitting unit 3113 .
  • Step 507 Scoreboard.
  • The scoreboard records each task issued to each processing core and the task chain to which the task belongs, and uses the processing cores' return information to confirm whether all tasks of a task chain on a processing core have completed; if so, step 505 is entered to perform dynamic release and dynamic preemption of processing cores.
  • the scoreboard is located in the multi-core management unit 3114 , and the scoreboard process can be specifically executed by the multi-core management unit 3114 .
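A minimal scoreboard in the spirit of step 507 (the class and method names are assumptions, not the patent's hardware interface): it records issued tasks per core and, on each response, reports whether the core has finished all its tasks of that chain and can be released.

```python
class Scoreboard:
    def __init__(self):
        self.outstanding = {}  # core -> set of (chain_id, task_id) in flight

    def record_issue(self, core, chain_id, task_id):
        self.outstanding.setdefault(core, set()).add((chain_id, task_id))

    def record_response(self, core, chain_id, task_id):
        """Called when a core reports a finished task; returns True when the
        core has no remaining tasks of that chain, i.e., it can be released."""
        self.outstanding[core].discard((chain_id, task_id))
        return all(c != chain_id for c, _ in self.outstanding[core])

sb = Scoreboard()
sb.record_issue(0, 0, 0)   # core 0 runs tasks 0 and 4 of chain 0
sb.record_issue(0, 0, 4)
```

After the first response the core still holds a task of chain 0, so it is not released; after the second it is, which is the trigger for step 505's dynamic release and preemption.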
  • Step 508 Multi-core execution.
  • each processing core executes independently, and each processing core returns a response to the scoreboard after completing each task.
  • the multi-core execution process may be executed by the processing core 312 .
  • The task scheduler manages the dependencies between task chains of the same type; the dependencies between task chains are managed on the hardware side rather than on the software (DDK) side. That is, the hardware implements task chain dependency management without DDK involvement, which saves software-hardware interaction time and software-side calls, and the hardware responds quickly: once a dependency relationship is released, a new task chain can be dispatched immediately, which is better than software-side management.
  • The task scheduler implements fine-grained dynamic release and dynamic preemption of processing cores. As soon as a processing core completes the last task of a task chain, it is immediately released and preempted to execute the next task chain waiting to be executed, so fine-grained management mitigates or eliminates processing core idling. That is, the hardware implements fine-grained release and preemption of the multi-core processor's cores, with each processing core managed independently: when a processing core completes the tasks belonging to it within a task chain, it is immediately released and re-applied as a computing resource for the remaining task chains. Compared with releasing and applying for multiple cores uniformly at task-chain boundaries, this management method greatly reduces or even eliminates the no-load problem of some processing cores and improves their utilization efficiency.
  • The task scheduler implements dynamic scheduling of processing cores across task chains and processes to prevent cores from running empty. After the tasks of a task chain have been issued, if the next task chain has no dependency, it can be executed immediately without waiting for the current task chain to finish. That is, the hardware implements dynamic scheduling across task chains and processes, which effectively reduces processing core no-load both within a process and between different processes, and is better than software-side management.
  • FIG. 6 shows a processing method for a multi-core processor provided by an embodiment of the present application, applied to a multi-core processor that includes a task scheduler and a plurality of processing cores. The processing method is applicable to any of the multi-core processors in FIG. 3 to FIG. 5 above and to devices (such as mobile phones, computers, and servers) that include such multi-core processors.
  • The method may include, but is not limited to, steps 601 to 604:
  • Step 601 Store, by the task scheduler, multiple task chains and the dependencies between the multiple task chains, where a dependency relationship is either dependent or independent;
  • Step 602 Determine, by the task scheduler, a first task chain and a second task chain from the plurality of task chains according to the dependencies between the plurality of task chains, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks;
  • Step 603 Schedule some or all of the plurality of processing cores to execute the one or more first tasks through the task scheduler;
  • Step 604 When at least one first processing core among the plurality of processing cores is in an idle state, schedule, by the task scheduler, at least one second task in the second task chain to the at least one first processing core for execution.
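The selection in steps 601 and 602 — picking two chains with no dependency in either direction — can be sketched as a toy routine over assumed data shapes (the chain names and dictionaries below are invented for illustration):

```python
def pick_independent_pair(chains, deps):
    """Find two chains with no dependency in either direction (steps 601-602)."""
    ids = list(chains)
    for a in ids:
        for b in ids:
            if a != b and b not in deps.get(a, set()) and a not in deps.get(b, set()):
                return a, b
    return None

chains = {"render0": ["t0", "t1"], "render1": ["t0", "t1"], "post": ["t0"]}
deps = {"post": {"render0", "render1"}}   # "post" depends on both render chains
print(pick_independent_pair(chains, deps))  # → ('render0', 'render1')
```

The two returned chains play the roles of the first and second task chains in steps 603 and 604: their tasks can be dispatched to the cores, and an idle core immediately picks up work from the second chain.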
  • The task scheduler includes a dependency management unit and a task queue unit. Storing the dependencies between the multiple task chains by the task scheduler includes: storing the dependencies between the multiple task chains by the dependency management unit in the task scheduler. Determining the first task chain and the second task chain from the multiple task chains by the task scheduler according to the dependencies between them includes: after the dependency management unit in the task scheduler determines that there is no dependency between the first task chain and the second task chain, sending, by the dependency management unit, a first instruction to the task queue unit, where the first instruction indicates that the dependency relationship between the first task chain and the second task chain is no dependency.
  • The task scheduler further includes a task splitting unit and a multi-core management unit. Storing the multiple task chains by the task scheduler includes: storing the multiple task chains by the task queue unit in the task scheduler. Determining the first task chain and the second task chain from the multiple task chains according to the dependencies between them further includes: after the task queue unit in the task scheduler receives the first instruction sent by the dependency management unit, sending, by the task queue unit, the first task chain and the second task chain to the task splitting unit, and sending a second instruction to the multi-core management unit, where the second instruction instructs the multi-core management unit to preempt processing cores for the first task chain and the second task chain.
  • Scheduling some or all of the multiple processing cores to execute the one or more first tasks by the task scheduler includes: splitting, by the task splitting unit in the task scheduler, the first task chain into the one or more first tasks; preempting, by the multi-core management unit in the task scheduler according to the second instruction, one or more second processing cores from the multiple processing cores; sending, by the multi-core management unit, the result of preempting the one or more second processing cores to the task splitting unit; and scheduling, by the task splitting unit, the one or more second processing cores to execute the one or more first tasks.
  • Scheduling, by the task scheduler, at least one second task in the second task chain to the at least one first processing core for execution includes: splitting, by the task splitting unit in the task scheduler, the second task chain into the one or more second tasks; when at least one first processing core among the multiple processing cores is in an idle state, preempting, by the multi-core management unit in the task scheduler according to the second instruction, the at least one first processing core; sending, by the multi-core management unit, the result of preempting the at least one first processing core to the task splitting unit; and scheduling, by the task splitting unit, at least one of the one or more second tasks to the at least one first processing core for execution.
  • The task scheduler further includes a task assembling unit, and the method further includes: obtaining, by the task assembling unit in the task scheduler, the command stream and the dependencies between some or all of the multiple task chains, and generating some or all of the multiple task chains according to the command stream; and sending, by the task assembling unit, the generated task chains to the task queue unit and their dependencies to the dependency management unit.
  • In summary, the multi-core processor includes a task scheduler and multiple processing cores coupled to the task scheduler. The task scheduler maintains the dependencies between task chains, i.e., it stores the dependencies between the multiple task chains, and it also stores the task chains themselves, so it can determine from them a first task chain and a second task chain that have no dependency. The first task chain includes one or more first tasks, the second task chain includes one or more second tasks, and the task scheduler can schedule some or all of the multiple processing cores to execute the one or more first tasks. Because the two chains are independent, their tasks can be executed in parallel, and when at least one first processing core among the multiple processing cores is in an idle state, the task scheduler can schedule at least one second task of the second task chain to that at least one first processing core.
  • Thus, as soon as a processing core becomes idle, it is immediately scheduled by the task scheduler to execute tasks, thereby improving multi-core scheduling performance.
  • the present application further provides a semiconductor chip, which may include the multi-core processor provided by any one of the implementation manners of the foregoing embodiments.
  • the present application further provides a semiconductor chip, which may include the multi-core processor provided by any one of the above embodiments, an internal memory coupled to the multi-core processor, and an external memory.
  • the present application further provides a system-on-a-chip SoC chip, where the SoC chip includes the multi-core processor provided by any one of the foregoing embodiments, an internal memory coupled to the multi-core processor, and an external memory.
  • the SoC chip may be composed of chips, or may include chips and other discrete devices.
  • the present application further provides a chip system, where the chip system includes the multi-core processor provided by any one of the implementation manners of the foregoing embodiments.
  • the chip system further includes a memory, and the memory is used for saving necessary or related program instructions and data during the operation of the multi-core processor.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application further provides a processing apparatus, which has the function of implementing any one of the processing methods for a multi-core processor in the foregoing method embodiments.
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application further provides a terminal, where the terminal includes a multi-core processor, and the multi-core processor is the multi-core processor provided by any one of the implementation manners of the foregoing embodiments.
  • the terminal may also include memory for coupling with the multi-core processor, which holds program instructions and data necessary for the terminal.
  • the terminal may also include a communication interface for the terminal to communicate with other devices or a communication network.
  • Embodiments of the present application further provide a computer-readable storage medium that may store a program; when the program is executed by a multi-core processor, some or all of the steps of any one of the foregoing method embodiments are performed.
  • Embodiments of the present application further provide a computer program including instructions; when the computer program is executed by a multi-core processor, the multi-core processor can perform some or all of the steps of the processing method for a multi-core processor described in any of the above method embodiments.
  • the disclosed apparatus may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • The division of the above-mentioned units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces, or through indirect coupling or communication connection of devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • When the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, and specifically a processor in the computer device) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • The aforementioned storage medium, i.e., a medium that can store program code, may include a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM), among others.


Abstract

Embodiments of this application provide a multi-core processor, a processing method for a multi-core processor, and related devices. The multi-core processor includes a task scheduler and multiple processing cores coupled to the task scheduler. The task scheduler stores multiple task chains and the dependencies between the multiple task chains. The task scheduler is further configured to: determine, according to the dependencies between the multiple task chains, a first task chain and a second task chain from the multiple task chains, where there is no dependency between the first task chain and the second task chain; schedule some or all of the multiple processing cores to execute one or more first tasks; and, when at least one first processing core among the multiple processing cores is in an idle state, schedule at least one second task in the second task chain to the at least one first processing core for execution. The embodiments of this application can improve multi-core scheduling performance.

Description

Multi-core Processor, Processing Method for a Multi-core Processor, and Related Devices — Technical Field
This application relates to the field of processor technology, and in particular to a multi-core processor, a processing method for a multi-core processor, and related devices.
Background
In graphics processing unit (GPU) design, the task scheduler (Job Manager, JM) implements task scheduling and kick-off (KO) for the GPU's multiple cores. The device development kit and driver (Device Development Kit, DDK) parses the calls made by upper-layer applications (APPs) to the graphics/compute application programming interfaces (APIs), packages them into tasks that the GPU can identify and execute, and issues them to the task scheduler on the GPU in the form of jobs/job chains (Job Chain, JC) or command streams. The task scheduler identifies and splits the tasks packaged by the DDK and issues them to the GPU's cores, which execute their received tasks concurrently. The task scheduler is responsible for multi-core scheduling and is responsible for, or participates in, multi-process management, and thus affects multi-core utilization efficiency. However, existing technical solutions do not solve the no-load problem of GPU multi-core scheduling.
Summary
Embodiments of this application provide a multi-core processor, a processing method for a multi-core processor, and related devices, to solve the multi-core no-load problem and improve multi-core scheduling performance.
In a first aspect, an embodiment of this application provides a multi-core processor, including a task scheduler and multiple processing cores coupled to the task scheduler. The task scheduler is configured to store multiple task chains and the dependencies between the multiple task chains, where a dependency relationship is either dependent or independent. The task scheduler is further configured to: determine a first task chain and a second task chain from the multiple task chains according to the dependencies between them, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; schedule some or all of the multiple processing cores to execute the one or more first tasks; and, when at least one first processing core among the multiple processing cores is in an idle state, schedule at least one second task in the second task chain to the at least one first processing core for execution. In the embodiments of this application, the multi-core processor may be a multi-core coprocessor such as a GPU or a neural network processing unit (NPU), and includes a task scheduler and multiple processing cores coupled to the task scheduler. The task scheduler can maintain the dependencies between task chains, i.e., store the dependencies between the multiple task chains, and it also stores these task chains themselves, so it can determine from them the mutually independent first and second task chains. Because the first task chain and the second task chain have no dependency, the two chains, or the first tasks of the one and the second tasks of the other, can execute in parallel: when at least one first processing core is idle, the task scheduler schedules at least one second task of the second task chain to that at least one first processing core for execution. The idle state, also called the no-load state, means the processing core is not executing a task; an idle core may be one that was not scheduled to execute a first task of the first task chain, or one that became idle after finishing its first task of the first task chain. Thus, in the embodiments of this application, as soon as any processing core becomes idle, the task scheduler immediately schedules it to execute tasks, which can improve multi-core scheduling performance.
In one possible implementation, the task scheduler includes a dependency management unit and a task queue unit. The dependency management unit is configured to store the dependencies between the multiple task chains and, upon determining that there is no dependency between the first task chain and the second task chain, send a first instruction to the task queue unit indicating that the dependency relationship between the two chains is no dependency. In this implementation, the hardware manages inter-chain dependencies: the dependency management unit stores the dependencies between task chains, so the software (i.e., the DDK) does not need to participate in dependency management control, saving software-hardware interaction time and software-side calls. Once a dependency between task chains is released, i.e., the relationship is independent or changes from dependent to independent, the hardware responds quickly and can immediately dispatch the independent task chains to the processing cores, which is better than software-side management. For example, upon determining that the first and second task chains are independent, the dependency management unit immediately sends the first instruction to the task queue unit, which immediately issues the first and second task chains to the processing cores for execution.
In one possible implementation, the task scheduler further includes a task splitting unit and a multi-core management unit. The task queue unit is configured to store the multiple task chains and, after receiving the first instruction from the dependency management unit, send the first task chain and the second task chain to the task splitting unit and send a second instruction to the multi-core management unit, where the second instruction instructs the multi-core management unit to preempt processing cores for the first task chain and the second task chain. Upon receiving the first instruction, the task queue unit knows the first and second task chains are independent. Because the task splitting unit can split the first task chain into one or more first tasks and the second task chain into one or more second tasks, and the multi-core management unit can preempt processing cores for both chains, the execution of both task chains is facilitated.
In one possible implementation, the task splitting unit is configured to split the first task chain into the one or more first tasks; the multi-core management unit is configured to preempt one or more second processing cores from the multiple processing cores according to the second instruction and send the result of preempting the one or more second processing cores to the task splitting unit; and the task splitting unit is further configured to schedule the one or more second processing cores to execute the one or more first tasks. The second instruction may include the number of processing cores required to execute the first task chain or the identifiers of the processing cores specifically used to execute it. After splitting the first task chain into one or more first tasks and receiving the preemption result from the multi-core management unit, the task splitting unit schedules the one or more second processing cores to execute the one or more first tasks, which facilitates preempting computing resources for the execution of the first task chain.
In one possible implementation, the task splitting unit is further configured to split the second task chain into the one or more second tasks; the multi-core management unit is further configured to, when at least one first processing core among the multiple processing cores is in an idle state, preempt the at least one first processing core according to the second instruction and send the result of preempting the at least one first processing core to the task splitting unit; and the task splitting unit is further configured to schedule at least one of the one or more second tasks to the at least one first processing core for execution. Once the task splitting unit has dispatched the last first task of the first task chain to one of the one or more second processing cores for execution, the multi-core management unit can preempt processing cores for the execution of the second tasks of the second task chain; the second instruction may include the number of processing cores required to execute the second task chain or the identifiers of the processing cores specifically used to execute it. Thereafter, whenever at least one first processing core is idle, the multi-core management unit preempts it according to the second instruction and sends the preemption result to the task splitting unit, which then schedules at least one of the second tasks to the at least one first processing core for execution. In this way, the hardware (the multi-core management unit) releases and applies for processing cores at per-core granularity, with each processing core managed independently: as soon as a core finishes its own tasks within a task chain, it is released and re-applied as a computing resource for other task chains. Compared with releasing and applying for multiple cores uniformly at task-chain boundaries, this management method greatly reduces or even eliminates the no-load problem of some processing cores and improves core utilization efficiency.
In one possible implementation, the task scheduler further includes a task assembling unit configured to obtain a command stream and the dependencies between some or all of the multiple task chains, generate some or all of the multiple task chains according to the command stream, send the generated task chains to the task queue unit, and send the dependencies between them to the dependency management unit. The software (DDK) may issue tasks to the multi-core processor in the form of a command stream; the task assembling unit in the multi-core processor can receive the command stream and the dependencies between some or all of the multiple task chains, generate some or all of the multiple task chains according to the command stream, send the generated task chains to the task queue unit, and send their dependencies to the dependency management unit. In this way, multi-core scheduling is also achieved when the software (DDK) issues tasks in the form of command streams.
In a second aspect, an embodiment of this application provides a processing method for a multi-core processor, applied to a multi-core processor including a task scheduler and multiple processing cores coupled to the task scheduler. The method includes: storing, by the task scheduler, multiple task chains and the dependencies between the multiple task chains, where a dependency relationship is either dependent or independent; determining, by the task scheduler according to the dependencies between the multiple task chains, a first task chain and a second task chain from the multiple task chains, where there is no dependency between the first task chain and the second task chain, the first task chain includes one or more first tasks, and the second task chain includes one or more second tasks; scheduling, by the task scheduler, some or all of the multiple processing cores to execute the one or more first tasks; and, when at least one first processing core among the multiple processing cores is in an idle state, scheduling, by the task scheduler, at least one second task in the second task chain to the at least one first processing core for execution.
In one possible implementation, the task scheduler includes a dependency management unit and a task queue unit. Storing the dependencies between the multiple task chains by the task scheduler includes: storing the dependencies between the multiple task chains by the dependency management unit in the task scheduler. Determining the first task chain and the second task chain from the multiple task chains by the task scheduler according to the dependencies between them includes: after the dependency management unit in the task scheduler determines that there is no dependency between the first task chain and the second task chain, sending, by the dependency management unit, a first instruction to the task queue unit indicating that the dependency relationship between the first task chain and the second task chain is no dependency.
In one possible implementation, the task scheduler further includes a task splitting unit and a multi-core management unit. Storing the multiple task chains by the task scheduler includes: storing the multiple task chains by the task queue unit in the task scheduler. Determining the first task chain and the second task chain from the multiple task chains according to the dependencies between them further includes: after the task queue unit in the task scheduler receives the first instruction sent by the dependency management unit, sending, by the task queue unit, the first task chain and the second task chain to the task splitting unit, and sending a second instruction to the multi-core management unit, where the second instruction instructs the multi-core management unit to preempt processing cores for the first task chain and the second task chain.
In one possible implementation, scheduling some or all of the multiple processing cores to execute the one or more first tasks by the task scheduler includes: splitting, by the task splitting unit in the task scheduler, the first task chain into the one or more first tasks; preempting, by the multi-core management unit in the task scheduler according to the second instruction, one or more second processing cores from the multiple processing cores; sending, by the multi-core management unit, the result of preempting the one or more second processing cores to the task splitting unit; and scheduling, by the task splitting unit, the one or more second processing cores to execute the one or more first tasks.
In one possible implementation, scheduling, by the task scheduler, at least one second task in the second task chain to the at least one first processing core for execution when at least one first processing core among the multiple processing cores is in an idle state includes: splitting, by the task splitting unit in the task scheduler, the second task chain into the one or more second tasks; when at least one first processing core among the multiple processing cores is in an idle state, preempting, by the multi-core management unit in the task scheduler according to the second instruction, the at least one first processing core; sending, by the multi-core management unit, the result of preempting the at least one first processing core to the task splitting unit; and scheduling, by the task splitting unit, at least one of the one or more second tasks to the at least one first processing core for execution.
In one possible implementation, the task scheduler further includes a task assembling unit, and the method further includes: obtaining, by the task assembling unit in the task scheduler, a command stream and the dependencies between some or all of the multiple task chains, and generating some or all of the multiple task chains according to the command stream; and sending, by the task assembling unit, the generated task chains to the task queue unit and their dependencies to the dependency management unit.
In a third aspect, this application provides a semiconductor chip that may include the multi-core processor provided by any implementation of the first aspect.
In a fourth aspect, this application provides a semiconductor chip that may include the multi-core processor provided by any implementation of the first aspect, an internal memory coupled to the multi-core processor, and an external memory.
In a fifth aspect, this application provides a system-on-a-chip (SoC) that includes the multi-core processor provided by any implementation of the first aspect, an internal memory coupled to the multi-core processor, and an external memory. The SoC may consist of a chip, or may include a chip and other discrete devices.
In a sixth aspect, this application provides a chip system that includes the multi-core processor provided by any implementation of the first aspect. In one possible design, the chip system further includes a memory for saving program instructions and data that are necessary for or related to the operation of the multi-core processor. The chip system may consist of a chip, or may include a chip and other discrete devices.
In a seventh aspect, this application provides a processing apparatus that has the function of implementing any of the processing methods for a multi-core processor of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In an eighth aspect, this application provides a terminal that includes a multi-core processor as provided by any implementation of the first aspect. The terminal may further include a memory coupled to the multi-core processor that holds program instructions and data necessary for the terminal. The terminal may further include a communication interface for communicating with other devices or a communication network.
In a ninth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a multi-core processor, implements the flow of the processing method for a multi-core processor of any one of the second aspect.
In a tenth aspect, an embodiment of this application provides a computer program including instructions that, when the computer program is executed by a multi-core processor, enable the multi-core processor to perform the flow of the processing method for a multi-core processor of any one of the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic architecture diagram of a multi-core scheduling system provided by an embodiment of this application.
FIG. 2 is a schematic diagram of a task chain scheduling and execution process provided by an embodiment of this application.
FIG. 3 is a schematic architecture diagram of another multi-core scheduling system provided by an embodiment of this application.
FIG. 4 is a schematic diagram of another task chain scheduling and execution process provided by an embodiment of this application.
FIG. 5 is a schematic flowchart of multi-core scheduling provided by an embodiment of this application.
FIG. 6 is a schematic flowchart of a processing method for a multi-core processor provided by an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings. The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes unlisted steps or units, or optionally further includes other steps or units inherent to the process, method, product, or device. Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", and "system" used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside in a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. Furthermore, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate through local and/or remote processes, for example according to a signal having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, interacting with other systems by way of the signal).
首先,为了便于理解本申请实施例,进一步分析并提出本申请所具体要解决的技术问题。
请参阅图1,图1是本申请实施例提供的一种多核调度系统的架构示意图。任务调度器以任务链为单位实现任务和进程调度管理,其中,任务链为单向链表结构,为一系列任务的集合。任务链可由设备开发包、驱动组装得到,然后下发给任务调度器;或任务链在任务调度器中完成组装。软件(也即DDK)维护任务链之间的依赖关系,任务调度器不感知任务链之间的依赖关系,任务调度器固定保序执行软件下发的任务链。也即上层应用(APPs)来指定任务链之间的依赖关系,DDK负责解析并维护任务链之间的依赖关系;在逻辑关系上,DDK在APPs的下层。依赖关系可以理解为,某个任务链的执行,需要以其他任务链的执行或执行完成为基础。DDK和硬件交互上,将上层应用(APPs)指定的任务划分为不同类型的任务链,如binning、render、compute、raytracing、transfer等,这几种不同类型的任务在硬件上可以并行调度和执行。如果若干个任务链都属于以上划分的同一个类型,则称为同一类型的任务链。对于同一类型的任务链,任务调度器保证前一个任务链结束后,才调度执行下一个任务链。因此,对于同一类型的任务链的调度执行,存在以下不足:
(1)任务链负载过小时,部分处理核在任务链执行过程中空载,但无法提前用于下一个任务链的执行。具体表现为:按照执行顺序,假设分为在前执行的任务链和在后执行的任务链,任务链的执行时间取决于该任务链中执行时间最久的任务;由于任务链中的每个任务被执行的时间不同,用于执行在前执行的任务链中的任务的各个处理核完成任务的时间不一致,有的处理核执行时间短,有的处理核执行时间长。执行时间短的处理核在执行完在前执行的任务链中的任务后,需要等待执行时间长的处理核执行完在前执行的任务链中的任务;在在前执行的任务链执行完成之前,执行时间短的处理核一直处于空载状态,但其又无法用于执行在后执行的任务链,从而会导致在后一个任务链开始执行前,有部分处理核空闲(IDLE)较长时间,硬件性能存在浪费。
(2)从实测数据看,部分基准测试序列(benchmark)/关键帧性能损失较大。有很多组织/机构会为评测GPU性能提供精心编写的测试序列,常见的如GFX benchmark/3DMARK。GPU实际性能表现会参考以上基准测试的结果。
总结为,现有技术方案未解决多核调度的空载问题,尤其对负载较轻的任务链,因部分处理核空载时间较长,性能损失严重。
请参阅图2,图2是本申请实施例提供的一种任务链的调度执行过程示意图。图2简要描述如下:
(1)假设任务链0(Job chain0)和任务链1(Job chain1)分别可以拆分成任务(task)0至3,总计4个任务;其中,任务链0和任务链1为同一类型的任务链,且任务链0和任务链1之间无依赖关系。
(2)为简单起见,假定多核处理器为4核结构,也即多核处理器包括处理核0至3。
(3)任务调度器首先将任务链0中的4个任务下发给处理核0至3执行;例如,任务链0中的任务0下发给处理核0执行,任务链0中的任务1下发给处理核1执行,任务链0中的任务2下发给处理核2执行,任务链0中的任务3下发给处理核3执行。
(4)等待处理核0至3全部执行完任务链0中的4个任务后,任务链0执行完成;任务调度器下发任务链1中的4个任务给处理核0至3执行;例如,任务链1中的任务0下发给处理核0执行,任务链1中的任务1下发给处理核1执行,任务链1中的任务2下发给处理核2执行,任务链1中的任务3下发给处理核3执行。
(5)对于任意一个任务链的执行来说,因任务之间的负载并不一致,多个处理核之间的执行该任务链的任务的时间并不相同,故存在部分处理核出现空载的情况。例如,处理核1在执行任务链0和任务链1中的任务1、处理核2在执行任务链0和任务链1中的任务2、处理核3在执行任务链0和任务链1中的任务3,均存在空载的情况。处理核空载时间为硬件性能的损失量(drop),从而导致处理核的性能损失。
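上述空载损失可用如下Python片段作粗略模拟(仅为示意性草图,函数名与任务耗时均为假设值,并非专利方案的实现):

```python
# 示意:同一类型任务链保序执行时,各处理核的空载时间估算。
# 假设每条任务链拆成4个任务,任务i固定下发给处理核i执行。
def barrier_makespan(chains):
    """chains: 每条任务链为各处理核上任务耗时的列表;
    前一条链的所有任务结束后才开始下一条链(保序执行)。"""
    total = 0
    idle = [0] * len(chains[0])
    for chain in chains:
        finish = max(chain)              # 链的结束时间由最慢的任务决定
        for core, cost in enumerate(chain):
            idle[core] += finish - cost  # 先完成的处理核需等待到链结束
        total += finish
    return total, idle

chain0 = [4, 1, 2, 3]   # 处理核0~3上任务0~3的耗时(假设值)
chain1 = [1, 4, 3, 2]
total, idle = barrier_makespan([chain0, chain1])
# total为两条链保序执行的总时间,idle为各处理核的累计空载时间
```

按上述假设的耗时,两条任务链保序执行共需8个时间单位,且每个处理核各空载3个时间单位,即图2中所述的性能损失量(drop)。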
因此,本申请需要解决在任务链调度过程中的多核之间的空载问题,提升多核调度的性能。
基于上述,本申请提供一种多核调度系统。请参阅图3,图3是本申请实施例提供的一种多核调度系统30的架构示意图,该多核调度系统30包括多核处理器31和设备开发包、驱动(DDK)32。多核处理器31可以为GPU、神经网络处理器(Neural Network Processing Unit,NPU)等多核协处理器,多核处理器31具体可以包括任务调度器311以及耦合于所述任务调度器311的多个处理核312;其中,所述任务调度器311,用于存储多个任务链和所述多个任务链之间的依赖关系,所述依赖关系包括有依赖关系和无依赖关系;所述任务调度器311,还用于:根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链;所述第一任务链与所述第二任务链之间无依赖关系,所述第一任务链包括一个或多个第一任务,所述第二任务链包括一个或多个第二任务;调度所述多个处理核312中的部分或全部执行所述一个或多个第一任务;当所述多个处理核312中有至少一个第一处理核处于空闲状态时,将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行。
其中,任务调度器311应用于多核处理器31的多个处理核312的任务下发与多个处理核312的调度管理,为多核处理器31的管理单元。
其中,设备开发包、驱动32包括用户态驱动程序(User Mode Driver,UMD)和核心态驱动程序(Kernel Mode Driver,KMD)。
其中,任务调度器311中存储的多个任务链是由设备开发包、驱动32解析上层应用(APPs)的API调用,将任务传送到多核处理器31上的任务调度器311上的。设备开发包、驱动32可直接完成任务组装,以任务链的形式下发到任务调度器311中。设备开发包、驱动32也可以将任务组装工作移交给任务调度器311,将任务以命令流的形式下发给任务调度器311,任务调度器311根据命令流组装得到任务链。此外,设备开发包、驱动32还会将任务链之间的依赖关系下发给任务调度器311,任务链之间的依赖关系包括有依赖关系和无依赖关系。
现有技术中,在软件(设备开发包、驱动32)中维护任务链之间的依赖关系,多核处理器31无法知晓任务链之间的依赖关系;对于同一类型的任务链,任务调度器保证在前执行的任务链执行结束后,才调度执行在后执行的任务链,从而多核处理器中会有部分处理核存在空载的时段。
本申请考虑了现有多核调度方案的不足,提出了一种新型的多核调度方案。相比于现有技术,本申请提供的技术方案在硬件上维护任务链之间的依赖关系,也即在多核处理器31上维护任务链之间的依赖关系,具体在任务调度器311上维护任务链之间的依赖关系。由于任务调度器311可以知晓任务链之间的依赖关系,任务调度器311可以提前下发无依赖关系的任务链中的任务给处理核312执行,防止处理核空载的发生。对于任意两个任务链来说,这两个任务链在下发到任务调度器311时,或在任务调度器311中组装得到时,这两个任务链可能是无依赖关系的,也即这两个任务链一开始就无依赖关系,其可以直接被调度执行;这两个任务链在下发到任务调度器311时,或在任务调度器311中组装得到时,这两个任务链也可能是有依赖关系的,后来这两个任务链之间的依赖关系被解除了,也即这两个任务链一开始有依赖关系,后来变成无依赖关系了,在这两个任务链之间的依赖关系被解除后,其可以被调度执行。具体地,任务调度器311调度任务链的执行过程中,如果任务链之间没有依赖,也即任务链之间的依赖关系为无依赖关系,可以当在前执行的任务链中的任务下发给处理核结束后,不等在前执行的任务链执行结束,立即调度下发在后执行的任务链的任务给处理核,将空载的处理核调度给在后执行的任务链使用。
举例来说,第一任务链与第二任务链之间无依赖关系,第一任务链比第二任务链先开始执行,第一任务链包括一个或多个第一任务,第二任务链包括一个或多个第二任务;当该一个或多个第一任务全部下发给多个处理核312中的部分或全部执行后,只要这多个处理核312中有至少一个第一处理核还处于空闲状态,就将一个或多个第二任务中的至少一个第二任务下发给处于空闲状态的至少一个第一处理核执行。其中,空闲状态或称为空载状态,也即处理核312没有在执行任务。空闲状态的处理核312可以为未调度用于执行第一任务链中的第一任务的处理核,例如,用于执行第一任务链的处理核只是多个处理核312中的部分,那么未用于执行第一任务链中的第一任务的处理核312若处于空闲状态,则可以用于执行第二任务链中的第二任务。空闲状态的处理核312也可以是执行完第一任务链中的第一任务后处于空载的处理核,例如,用于执行第一任务链中的第一任务的处理核312执行完该第一任务后,开始处于空闲状态,则立刻可以用于执行第二任务链中的第二任务,而无需等到第一任务链执行完成才用于执行第二任务链中的第二任务。第一任务链执行完成是指第一任务链中的所有第一任务都执行完成,一个处理核312可以执行至少一个第一任务或至少一个第二任务。应理解,本申请的多核调度过程是一个动态过程。进一步地,若还有第三任务链,第三任务链与第二任务链之间无依赖关系,第三任务链开始执行时间在第二任务链后,第三任务链包括一个或多个第三任务;当该一个或多个第二任务全部下发给处理核312执行后,只要这多个处理核312中还有空闲状态的处理核312,就将一个或多个第三任务中的至少一个第三任务下发给处于空闲状态的处理核312执行;其中,用于执行第三任务的处理核312可以为:未用于执行第一任务和第二任务的处理核312,也可以是执行完第一任务后处于空载的处理核312,还可以是执行完第一任务以及第二任务后处于空载的处理核312。如此,多个处理核312中的每个处理核312只要处于空闲状态就立刻被调度用于执行下一个任务链的任务,从而本申请可有效解决处理核空载问题,提升多核调度性能。
其中,上述第一任务链与第二任务链可以是同一类型的任务链,但第一任务链与第二任务链下发到处理核中执行时的依赖关系为无依赖关系。上述第一任务链与第二任务链也可以是不同类型的任务链,其可以看成是无依赖关系,因为不同类型的任务链是可以并行执行的。
应理解,设备开发包、驱动32与多核处理器31之间存在双向通信的:
(1)设备开发包、驱动32主动向多核处理器31下发任务。
(2)多核处理器31完成任务后通过中断或查询寄存器告知设备开发包、驱动32;一般是中断,对设备开发包、驱动32效率友好。
本申请实施例中,多核处理器31包括任务调度器311、以及耦合于该任务调度器311的多个处理核312;任务调度器311可以维护任务链之间的依赖关系,也即存储多个任务链之间的依赖关系,并且任务调度器311还存储这多个任务链,如此任务调度器311可以从这多个任务链中确定出无依赖关系的第一任务链与第二任务链;而第一任务链包括一个或多个第一任务,第二任务链包括一个或多个第二任务,任务调度器311可以调度这多个处理核312中的部分或全部执行第一任务链中的一个或多个第一任务;由于第一任务链与第二任务链是无依赖关系的,故第一任务链与第二任务链可以并行执行,或者第一任务链中的第一任务与第二任务链中的第二任务可以并行执行,当这多个处理核312中有至少一个第一处理核处于空闲状态时,任务调度器311将第二任务链中的至少一个第二任务调度至这至少一个第一处理核中执行;如此,本申请实施例中,一旦有处理核出现空载情况,该空载的处理核会立刻被任务调度器311调度用于执行任务,从而可以提升多核调度性能。
在一种可能的实现方式中,所述任务调度器311包括依赖管理单元3111、任务队列单元3112;其中,所述依赖管理单元3111,用于存储所述多个任务链之间的依赖关系;若判断到所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系后,向所述任务队列单元3112发送第一指令,所述第一指令用于指示所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系。
其中,任务调度器311包括依赖管理单元3111、任务队列单元3112。设备开发包、驱动32或任务调度器311将任务链下发到任务队列3112,同时将任务链之间的依赖关系下发到依赖管理单元3111。具体地,设备开发包、驱动32将任务链之间的依赖关系下发到任务调度器311,也即设备开发包、驱动32将任务链之间的依赖关系下发到任务调度器311中的依赖管理单元3111,依赖管理单元3111可以存储任务链之间的依赖关系;设备开发包、驱动32将任务链下发到任务调度器311,也即设备开发包、驱动32将任务链下发到任务调度器311的任务队列单元3112中,任务队列单元3112可以用于存储任务链;此外,任务调度器311组装得到的任务链也会存储在任务队列单元3112中。
其中,设备开发包、驱动32下发到任务队列单元3112中的任务链之间,或任务调度器311组装后存储在任务队列单元3112中的任务链之间,可能是有依赖关系的,也有可能是无依赖关系的;且在下发到任务队列单元3112中的任务链之间有依赖关系时,这个依赖关系可以随着任务链执行被解除。其中,依赖管理单元3111可以维护任务链之间的依赖关系,具体会记录任务链之间的依赖关系的变化情况。对于下发到任务队列单元3112的、一开始就无依赖关系的任务链,可以立即执行,也即依赖管理单元3111可以告知任务队列单元3112可以执行这些一开始就无依赖关系的任务链。对于下发到任务队列单元3112的、一开始有依赖关系的任务链,需等待依赖解除后执行,也即依赖管理单元3111记录这些一开始有依赖关系的任务链的依赖情况,当确认到一开始有依赖关系的任务链之间的依赖解除后,依赖管理单元3111告知任务队列单元3112可以执行这些任务链了。例如,依赖管理单元3111判断到第一任务链与第二任务链之间的依赖关系为无依赖关系后,通过第一指令告知任务队列单元3112第一任务链与第二任务链之间的依赖关系为无依赖关系。应理解,依赖解除也即依赖关系从有依赖关系变成无依赖关系。其中,每条第一指令针对一个独立的任务链,用于告知任务队列单元3112该任务链是否可以开始执行。
其中,当任务链之间有依赖时,大体上可能依赖于两种事件:
(1)一个任务链可能依赖于其他一个或几个任务链的执行结束。
(2)一个任务链可能依赖于DDK某个事件的处理结束。
举例如下,假设任务链1的执行,依赖任务链0的结束,则:
(1)任务链0结束后,可以往一个信号量缓存(buffer)中写入特性值(signal semaphore)。
(2)依赖管理单元3111可以轮询(polling)该信号量,某个时间点轮询到预期值,也即轮询到任务链0结束触发的信号。
(3)此时,依赖管理单元3111确认任务链1可以开始执行,告知任务队列单元3112可以下发任务链1的执行。其中,上述任务链0可以为第一任务链,上述任务链1可以为第二任务链。
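上述通过信号量缓存表达任务链之间依赖的过程,可用如下示意性Python草图说明(SemaphoreBuffer、EXPECTED等名称均为假设,并非专利原文定义的接口):

```python
class SemaphoreBuffer:
    """示意:信号量缓存(buffer),任务链结束后向其中写入特征值,
    依赖管理单元通过轮询该缓存判断依赖是否解除。"""
    def __init__(self):
        self.value = None
    def signal(self, value):      # 任务链0结束后写入特征值(signal semaphore)
        self.value = value
    def poll(self, expected):     # 依赖管理单元轮询(polling)该信号量
        return self.value == expected

EXPECTED = 0xA5                       # 假设的预期值;signal可写入任意值
sem = SemaphoreBuffer()
chain1_ready = sem.poll(EXPECTED)     # 任务链0尚未结束,任务链1不可下发
sem.signal(EXPECTED)                  # 任务链0结束,写入信号量
chain1_ready = sem.poll(EXPECTED)     # 轮询到预期值,可下发任务链1
```

轮询到预期值后,依赖管理单元即可通知任务队列单元下发任务链1。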
应理解,依赖管理单元3111与任务队列单元3112之间存在双向通信:
(1)依赖管理单元3111在判断到任务链之间的依赖解除后,通知任务队列单元3112下发执行这些依赖解除的任务链。
(2)任务队列单元3112完成一个任务链的执行后,通知依赖管理单元3111某个信号量。其中,依赖分为多种,如栅栏(barrier)、栅栏(fence)、旗语(semaphore)、事件(event)等,对于semaphore,存在轮询(wait/polling)和置位(signal/write)两类事件。一个任务链执行结束后,可能跟随如置位旗语(semaphore signal)操作,因此需告知依赖管理单元3111。其中,signal虽然可以翻译成置位,但并不是说从0和1两个值进行置位,可以写入任意值,signal的动作就是写buffer,写入值根据维护规则可以是任意值。
本申请实施例中,任务调度器311包括依赖管理单元3111、任务队列单元3112,硬件实现任务链之间的依赖管理,也即依赖管理单元3111可以获取并存储任务链之间的依赖关系,无需软件(也即DDK)参与任务链之间的依赖管理控制,从而节省了软硬件的交互时间和软件侧调用;且任务链之间的依赖关系解除后,也即任务链之间的依赖关系为无依赖关系或者从有依赖关系转变成无依赖关系后,硬件响应迅速,能立即调度无依赖关系的任务链给处理核,优于软件侧管理;例如,依赖管理单元3111若判断到第一任务链与第二任务链之间的依赖关系为无依赖关系后,立即向任务队列单元3112发送第一指令,任务队列单元3112立即将第一任务链与第二任务链下发给处理核执行。
在一种可能的实现方式中,所述任务调度器311还包括任务拆分单元3113、多核管理单元3114;其中,所述任务队列单元3112,用于存储所述多个任务链;在接收到所述依赖管理单元3111发送的第一指令后,向所述任务拆分单元3113发送所述第一任务链和所述第二任务链,以及向所述多核管理单元3114发送第二指令,所述第二指令用于指示所述多核管理单元3114为所述第一任务链和所述第二任务链抢占处理核。
其中,任务调度器311还包括任务拆分单元3113、多核管理单元3114。任务队列单元3112存储所述多个任务链,也即任务队列单元3112管理多个进程的多个任务链;例如,任务队列单元3112可以将无依赖关系的第一任务链和第二任务链下发执行。
具体地,任务队列单元3112可以依据一定策略将无依赖关系或依赖解除的任务链下发给任务拆分单元3113执行;同时告知多核管理单元3114申请相应的处理核用于执行无依赖关系或依赖解除的任务链。例如,依赖管理单元3111通过第一指令告知任务队列单元3112第一任务链与第二任务链之间的依赖关系为无依赖关系;任务队列单元3112接收到第一指令后,将第一任务链与第二任务链下发给任务拆分单元3113,以及通过第二指令告知多核管理单元3114为第一任务链和第二任务链抢占处理核312,以用于执行第一任务链和第二任务链。其中,任务队列单元3112需要通过第二指令告知多核管理单元3114为第一任务链和第二任务链分别抢占哪些处理核,但不用说明如何抢占,因为多核管理单元3114使用固定策略实施抢占。用于为第一任务链和第二任务链抢占处理核的第二指令,分两次发送,第一次发送告知多核管理单元3114为第一任务链抢占处理核,第二次发送告知多核管理单元3114为第二任务链抢占处理核。
其中,上述一定的策略包括但不限于:
(1)可能有多个进程(APPs)的多个任务链都已解除依赖获取执行权限,任务队列单元3112可在软件使能时间片轮转功能时(该功能软件可选择是否使能),在对应时间片才调度下发对应进程的任务链。
(2)可能有多个进程(APPs)的多个任务链都已解除依赖获取执行权限,且多个进程指定的任务链优先级(priority)不同,任务队列单元3112可在软件不使能时间片轮转功能时(该功能软件可选择是否使能),给予高优先级任务链更高的调度优先级,阻塞低优先级任务链下发。
(3)由于硬件设计的限制,某些厂家的处理核不能很好的支持某些任务链并发(如binning/compute任务链并发时,处理核调度策略问题使得执行不均衡),该场景下需任务队列单元通过预定策略管理binning/compute任务链的下发,比如交织形式下发等。
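上述几种下发策略中的时间片轮转与优先级阻塞,可用如下示意性草图表达(pick_next的函数名与元组结构均为假设,仅示意选择逻辑):

```python
def pick_next(ready_chains, use_time_slice=False, current_pid=None):
    """ready_chains: [(进程id, 优先级, 任务链名)]形式的、已解除依赖的任务链。
    时间片轮转使能时,只下发当前时间片对应进程的任务链;
    否则按优先级选择,高优先级任务链阻塞低优先级任务链下发。"""
    if use_time_slice:
        candidates = [c for c in ready_chains if c[0] == current_pid]
    else:
        candidates = ready_chains
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])  # 数值越大优先级越高

ready = [(0, 1, "binning_A"), (1, 3, "compute_B"), (0, 2, "render_C")]
```

例如,不使能时间片轮转时选中优先级最高的compute_B;使能时间片轮转且当前时间片属于进程0时,只在进程0的任务链中选择。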
其中,多核管理单元3114可以实现多个处理核312的动态抢占(或称动态占用)与动态释放,如果某个处理核执行完在前执行的任务链中的任务后,多核管理单元3114则立即释放并重新申请抢占该处理核用于执行在后执行的任务链中的任务;例如,某个处理核执行完第一任务链中的第一任务后,多核管理单元3114可以立即从用于执行第一任务链中释放出来,并重新申请抢占该处理核用于执行第二任务链中的第二任务。应理解,动态抢占的解释为占下不一定能用上,例如,在某些情况下,多核管理单元3114为任务链抢占的处理核312并不会用于执行该任务链中的任务,多核管理单元3114会直接释放该处理核312,此种情况下的释放速度很快。
应理解,任务队列单元3112与任务拆分单元3113存在双向通信:
(1)任务队列单元3112下发任务链给任务拆分单元3113。
(2)任务拆分单元3113完成该任务链的任务拆分以及下发执行后,根据多核管理单元3114是否已全部释放用于执行该任务链的处理核,来判定该任务链是否执行结束;任意任务链执行结束时,均需告知任务队列单元3112。
本申请实施例中,任务调度器311还包括任务拆分单元3113、多核管理单元3114,任务队列单元3112可以存储多个任务链,任务调度器311在接收到依赖管理单元3111发送的第一指令后,知晓第一任务链和第二任务链无依赖关系,将第一任务链和第二任务链发送给任务拆分单元3113;以及向多核管理单元3114发送第二指令,通过第二指令指示多核管理单元3114为第一任务链和第二任务链抢占处理核;由于任务拆分单元3113可以将第一任务链拆分成一个或多个第一任务以及将第二任务链拆分成一个或多个第二任务,多核管理单元3114可以为第一任务链和第二任务链抢占处理核,如此有利于第一任务链和第二任务链的执行。
在一种可能的实现方式中,所述任务拆分单元3113,用于将所述第一任务链拆分成所述一个或多个第一任务;所述多核管理单元3114,用于根据所述第二指令,从所述多个处理核312中抢占一个或多个第二处理核;向所述任务拆分单元3113发送抢占所述一个或多个第二处理核的结果;所述任务拆分单元3113,还用于调度所述一个或多个第二处理核执行所述一个或多个第一任务。
其中,任务拆分单元3113对任务链中的任务做拆分,例如,任务拆分单元3113将第一任务链拆分成一个或多个第一任务;而对任务链做拆分的规则可以为光栅顺序(Raster order)、Z顺序(Z order)、U顺序(U order)、3D立方体(3D cube)等。任务拆分单元3113将拆分得到的任务下发到多核管理单元3114中已为该任务链抢占的处理核312上,处理核312实现任务的计算执行;例如,多核管理单元3114从多个处理核312中抢占一个或多个第二处理核用于执行第一任务链,这一个或多个第二处理核可以为多个处理核312中的部分或全部,任务拆分单元3113将第一任务链拆分得到的一个或多个第一任务下发到这一个或多个第二处理核上。应理解,任务链拆分出来的任务,与处理核312之间没有特定关系,任务链拆分出来的任务可以下发到设备开发包、驱动32为该任务链指定的、用于执行该任务链的任意一个处理核312上。例如,第一任务链拆分得到的一个或多个第一任务是随机下发到上述一个或多个第二处理核上的。
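以Z顺序(Z order,即Morton序)为例,任务拆分后的遍历顺序可作如下示意(morton_order为假设的函数名,网格规模为假设值):

```python
def morton_order(n):
    """示意:按Z顺序遍历 n×n 的任务网格,得到任务的下发顺序;
    相邻任务在空间上聚集,有利于缓存位置(cache locality)。"""
    def interleave(x, y):
        z = 0
        for i in range(16):       # 假设坐标不超过16位
            z |= ((x >> i) & 1) << (2 * i)
            z |= ((y >> i) & 1) << (2 * i + 1)
        return z
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: interleave(c[0], c[1]))

# 2×2 网格的Z顺序依次为 (0,0)、(1,0)、(0,1)、(1,1)
```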
其中,多核管理单元3114为任务链抢占处理核的规则如下:
(1)每一个任务链最多可以在多少个或哪几个处理核312上执行需要设备开发包、驱动32提前指定,设备开发包、驱动32将该指定下发到任务队列单元3112中;一般情况下设备开发包、驱动32会指定任务链可以在所有处理核312上执行,但在特殊场景下,某些任务链可以以异步(async)的方式慢慢执行时,设备开发包、驱动32可以指定该任务链只允许在某几个处理核312上执行。例如,设备开发包、驱动32提前指定第一任务链可以在多个处理核中的全部或部分上执行。
例如,以多核处理器31为GPU为例,举例两种场景:
第一种场景,GPU可以做设备虚拟化,从而使得对DDK而言,其可以“看到”多个GPU实例(虽然硬件上本质还是只有一个GPU)。多个GPU实例上,每个GPU实例看到的GPU核可以不同,比如GPU0实例只能看到GPU核0~1;GPU1实例只能看到GPU核2~5等。这时对于DDK来说,往GPU0实例上调度任务链时需要指定任务链只能在GPU核0~1上执行;往GPU1实例上调度任务链时需指定GPU核2~5。
第二种场景,用户(APPs)可以指定某些任务是异步计算场景(async compute),这些计算对实时性要求不高,一种可能的实现是,DDK通过一定指标估算该async compute任务链的计算负载,从而分配对应数量的GPU核,使其不全速执行。
(2)每个任务链在被调度时,都需要告诉多核管理单元3114申请哪些处理核312(设备开发包、驱动32指定)用于执行该任务链,但是否能申请上,取决于在该任务链之前执行的任务链是否已经释放了这些处理核312。
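上述“为任务链提前指定可执行的处理核集合”的规则,可作如下示意(allowed_cores、core_masks等名称均为假设):

```python
def allowed_cores(chain, core_masks, all_cores=(0, 1, 2, 3)):
    """示意:设备开发包、驱动为任务链提前指定可执行的处理核集合;
    未指定时,默认允许在全部处理核上执行。"""
    return set(core_masks.get(chain, all_cores))

# 假设的GPU虚拟化场景:GPU0实例只见核0~1,GPU1实例只见核2~5
masks = {"gpu0_chain": {0, 1}, "gpu1_chain": {2, 3, 4, 5}}
```

任务链被调度时,多核管理单元只会在该集合内申请抢占处理核;能否申请上,仍取决于在前任务链是否已释放这些处理核。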
其中,多核管理单元3114与任务拆分单元3113可以实时共享处理核312的抢占情况,也即多核管理单元3114会实时把处理核312的抢占情况发给任务拆分单元3113。任意一个处理核312完成任务执行后,会告诉多核管理单元3114,多核管理单元3114依据自身维护的计分板(scoreboard)和任务完成情况,主动决定处理核的释放和抢占。计分板位于多核管理单元3114内,依赖管理单元3111为处理任务链之间的依赖,需要知晓每个任务链的结束事件,间接通过计分板获取该信息。
应理解,任务拆分单元3113与多核管理单元3114存在双向通信:
(1)任务拆分单元3113负责下发任务给处理核312,但需查询多核管理单元3114中的计分板,查询多核管理单元已抢占了哪些处理核,这些处理核目前是否还可接收任务或是否还可以执行任务,以及用于执行某个任务链的处理核是否都释放完毕(此为该任务链执行结束的标记)。
(2)任务拆分单元3113下发任务后,需写多核管理单元3114中的计分板,记录任务在多核管理单元3114抢占的处理核上的分配情况。
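计分板的记录与释放逻辑可作如下示意性草图(Scoreboard类与方法名均为假设,并非硬件实现):

```python
class Scoreboard:
    """示意:记录任务在各处理核上的分配;某处理核完成其全部任务后被释放,
    某任务链的所有处理核释放完毕,即标记该任务链执行结束。"""
    def __init__(self):
        self.pending = {}                  # 任务链 -> {处理核: 未完成任务数}
    def assign(self, chain, core):         # 任务拆分单元下发任务后写计分板
        cores = self.pending.setdefault(chain, {})
        cores[core] = cores.get(core, 0) + 1
    def complete(self, chain, core):       # 处理核返回任务完成响应
        self.pending[chain][core] -= 1
        if self.pending[chain][core] == 0:
            del self.pending[chain][core]  # 释放该处理核,可被重新抢占
            return core
        return None
    def chain_done(self, chain):           # 所有处理核均已释放,链执行结束
        return not self.pending.get(chain)

sb = Scoreboard()
sb.assign("chain0", 0)
sb.assign("chain0", 0)                     # 处理核0需执行chain0的两个任务
sb.assign("chain0", 1)
```

例如,处理核0需执行完归属于chain0的两个任务后才被释放;处理核0和处理核1都释放后,chain0才算执行结束。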
本申请实施例中,任务拆分单元3113在接收到第一任务链后,可以将第一任务链拆分成一个或多个第一任务;其中,第二指令可以包括执行第一任务链所需要的处理核的数量或具体用于执行第一任务链的处理核标识等,多核管理单元3114在接收到任务队列单元3112发来的第二指令后,可以根据第二指令从多个处理核312中抢占一个或多个第二处理核,并将抢占一个或多个第二处理核的结果发送给任务拆分单元3113;任务拆分单元3113在将第一任务链拆分成一个或多个第一任务,且接收到多核管理单元3114为第一任务链抢占一个或多个第二处理核的结果后,调度这一个或多个第二处理核执行第一任务链的一个或多个第一任务;如此有利于为第一任务链的执行抢占计算资源。
在一种可能的实现方式中,所述任务拆分单元3113,还用于将所述第二任务链拆分成所述一个或多个第二任务;所述多核管理单元3114,还用于当所述多个处理核312中有至少一个第一处理核处于空闲状态时,根据所述第二指令,抢占所述至少一个第一处理核;向所述任务拆分单元3113发送抢占所述至少一个第一处理核的结果;所述任务拆分单元3113,还用于将所述一个或多个第二任务中的至少一个第二任务调度至所述至少一个第一处理核中执行。
其中,任务拆分单元3113还可以将第二任务链拆分成一个或多个第二任务。在任务拆分单元3113调度一个或多个第二处理核执行第一任务链拆分得到的一个或多个第一任务后,多核管理单元3114即可立即为第二任务链的执行抢占处理核;且多核管理单元3114在为第二任务链的执行抢占处理核时,只要有空闲状态的处理核312,就可以抢占过来用于执行第二任务链,用于执行第二任务链的处理核也即第一处理核。应理解,第二任务链可以在多个处理核中的全部或部分上执行也是开发包、驱动32提前指定的。空闲状态的处理核312可以为未调度用于执行第一任务链中的第一任务的处理核,例如,用于执行第一任务链的处理核只是多个处理核312中的部分,那么未用于执行第一任务链中的第一任务的处理核312若处于空闲状态,则可以被多核管理单元3114抢占用于执行第二任务链中的第二任务。空闲状态的处理核312也可以是执行完第一任务链中的第一任务后处于空载的处理核,例如,用于执行第一任务链中的第一任务的处理核312执行完该第一任务后,开始处于空闲状态,则立刻可以被多核管理单元3114抢占用于执行第二任务链中的第二任务,而无需等到第一任务链执行完成才被多核管理单元3114抢占用于执行第二任务链中的第二任务。
请参阅图4,图4是本申请实施例提供的另一种任务链的调度执行过程示意图,图4的简要描述如下:
(1)假设任务链0和任务链1分别可以拆分成任务0至3,总计4个任务;其中,任务链0和任务链1为同一类型的任务链,且任务链0和任务链1之间无依赖关系。
(2)为简单起见,假定多核处理器为4核结构,也即多核处理器包括处理核0至3。
(3)任务调度器首先将任务链0的4个任务下发给处理核0至3执行;例如,任务链0中的任务0下发给处理核0执行,任务链0中的任务1下发给处理核1执行,任务链0中的任务2下发给处理核2执行,任务链0中的任务3下发给处理核3执行。
(4)等待处理核0至3中的任意一个执行完任务链0中的任务后,任务调度器立即下发任务链1中的任务给该处理核执行;例如,处理核3执行完任务链0中的任务3,立即将任务链1中的任务0下发给处理核3执行;处理核2执行完任务链0中的任务2,立即将任务链1中的任务1下发给处理核2执行;处理核1执行完任务链0中的任务1,立即将任务链1中的任务2下发给处理核1执行;处理核0执行完任务链0中的任务0,立即将任务链1中的任务3下发给处理核0执行。
应理解,上述任务链0可以为第一任务链,上述任务链1可以为第二任务链。图4中的调度特性使得无依赖的任务链可以并发执行,及时调度且充分利用了处理核的计算能力,减少了空载现象导致的性能下降。
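与图2的保序调度对比,图4中“处理核一旦空闲即领取下一个无依赖任务链的任务”的效果可用如下示意性草图估算(任务耗时为假设值,并非专利方案的具体实现):

```python
import heapq

def greedy_makespan(chains, num_cores):
    """示意:任务链之间无依赖时,按任务顺序把每个任务
    下发给最早空闲的处理核,估算总执行时间。"""
    cores = [0] * num_cores               # 各处理核的空闲时刻
    heapq.heapify(cores)
    for cost in (c for chain in chains for c in chain):
        free_at = heapq.heappop(cores)    # 最早空闲的处理核立即领取任务
        heapq.heappush(cores, free_at + cost)
    return max(cores)

chain0, chain1 = [4, 1, 2, 3], [1, 4, 3, 2]   # 假设的任务耗时
# 保序执行需 max(chain0) + max(chain1) = 8;空闲即下发时总时间缩短为6
```

按上述假设的耗时,总执行时间从保序执行的8个时间单位缩短到6个时间单位,直观体现了减少空载带来的性能收益。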
需要说明的是,在拆分第二任务链的过程中,无需考虑第一任务链中的第一任务的空载损失是否需要完全被第二任务链中的第二任务弥补掉。在业务执行过程中,宏观上就已经达到该效果,原因为:
(1)用于执行第一任务链的释放的处理核312及时被抢占用于执行第二任务链了;
(2)对于第一任务链和第二任务链,每个任务链中的任务都以均衡策略下发,保证每个处理核上未执行完的任务的个数基本相等。
(3)任务链中的任务的拆分策略主要考虑的是缓存位置(cache locality)。
应理解,只有同一类型的任务链的执行才存在抢占处理核的情况,对于不同类型的任务链的执行不存在抢占处理核的情况。原因在于,对于同一类型的两个任务链,例如compute类型的两个任务链,这两个任务链会分高低优先级两路,也即分高优先级的任务链和低优先级的任务链;假设开发包、驱动32指定为高优先级的任务链抢占全部处理核,开发包、驱动32指定为低优先级的任务链抢占部分处理核,高优先级的任务链在执行时,多核管理单元3114会抢走用于执行低优先级的任务链的全部计算资源;对于处理核312,其只能看到高优先级的任务链中的任务或只能看到低优先级的任务链中的任务,不可能同时看到高优先级的任务链中的任务和低优先级的任务链中的任务。但是,当同一类型只有一个任务链时,例如compute类型只有一个低优先级的任务链,并且开发包、驱动32指定为该低优先级的任务链抢占部分处理核时,其余的部分处理核可以动态调度用于执行其他类型的任务链,例如动态调度用于执行binning类型的任务链。
本申请实施例中,任务拆分单元3113在接收到第二任务链后,可以将第二任务链拆分成一个或多个第二任务;任务拆分单元3113在调度完第一任务链的最后一个第一任务给一个或多个第二处理核中的一个第二处理核执行之后,多核管理单元3114即可为第二任务链中的第二任务的执行抢占处理核;其中,第二指令可以包括执行第二任务链所需要的处理核的数量或具体用于执行第二任务链的处理核标识等;此后,只要多个处理核312中有至少一个第一处理核处于空闲状态,多核管理单元3114就会根据第二指令抢占该至少一个第一处理核,并将抢占该至少一个第一处理核的结果发送给任务拆分单元3113;任务拆分单元3113即可将这一个或多个第二任务中的至少一个第二任务调度至该至少一个第一处理核中执行;如此,硬件(多核管理单元3114)实现以处理核312为粒度,进行处理核的释放和申请,每个处理核独立管理,当一个处理核完成一个任务链中归属于自己的任务后,该处理核立即被释放,并重新申请该处理核为其他任务链的计算资源。该管理方式相比于以任务链为边界对多个处理核312的统一释放和申请操作,极大地减少甚至消除了部分处理核的空载问题,提升了处理核的利用效率。
在一种可能的实现方式中,所述任务调度器311还包括任务组装单元3115;所述任务组装单元3115,用于获取命令流以及所述多个任务链中的部分或全部任务链之间的依赖关系,并根据所述命令流生成所述多个任务链中的部分或全部任务链;向所述任务队列单元3112发送所述多个任务链中的部分或全部任务链,以及向所述依赖管理单元3111发送所述多个任务链中的部分或全部任务链之间的依赖关系。
这种场景下,DDK将API中指定的依赖,以及API虽未指定,但DDK自行推测的依赖,以指令形式顺序插入命令流。硬件执行该命令流,将命令流中的命令组装成任务(job),并将其中指令形态的依赖匹配到对应任务链,完成后下发给后级模块。
其中,设备开发包、驱动32可直接完成任务组装,以任务链的形式下发到任务调度器311中。设备开发包、驱动32也可以将任务组装工作移交给任务调度器311中的任务组装单元3115,将任务以命令流的形式下发给任务组装单元3115,任务组装单元3115根据命令流组装得到任务链;此外,设备开发包、驱动32还会将任务链之间的依赖关系下发给任务组装单元3115;任务组装单元3115组装得到任务链后,将组装得到的任务链发给任务队列单元3112,以及将组装得到的任务链的依赖情况发送给依赖管理单元3111。可以理解,根据设备开发包、驱动32与多核处理器31的工作分工,该任务组装单元3115可能是可选存在的。
本申请实施例中,软件(DDK)可能将任务以命令流的形式下发给多核处理器31,多核处理器31中的任务组装单元3115可以接收命令流,以及接收多个任务链中的部分或全部任务链之间的依赖关系;并根据该命令流生成该多个任务链中的部分或全部任务链;以及向任务队列单元3112发送该多个任务链中的部分或全部任务链,以及向依赖管理单元3111发送该多个任务链中的部分或全部任务链之间的依赖关系;如此,在软件(DDK)以命令流的形式下发任务时,也能实现多核调度。
请参阅图5,图5是本申请实施例提供的一种多核调度的流程示意图,其可以应用于图3所示的多核调度系统30,包括但不限于以下步骤:
步骤501:设备开发包、驱动(DDK)任务解析。
在DDK任务解析流程中,DDK通过分析API调用解析需多核处理器执行的任务,并设置任务之间的依赖关系。在一段任务解析完毕后,进入步骤502。
其中,DDK任务解析流程具体可以由设备开发包、驱动32执行。
步骤502:任务组装。
在任务组装流程中,将任务组装成多核处理器可识别的任务链,并构造对应的数据描述符(desc或descriptors),并记录依赖。其中,descriptors是存储在双倍速率同步动态随机存储器(Double Data Rate,DDR)中的数据结构,用于表征每个任务链的各方面信息,如输入数据都有哪些、使用哪个程序段执行、以何种方式处理、输出到哪里、以何种格式输出等。任务链组装完毕后,将任务链之间的依赖关系与任务链下发,同时进入步骤503和步骤504。
其中,任务组装流程具体可以由设备开发包、驱动32或任务组装单元3115执行。
步骤503:依赖管理。
在依赖管理流程中,依据计分板的记录信息参与维护任务链之间的依赖关系。当等待执行的任务链所依赖的其余任务链在计分板中均记录执行完毕时,解除该等待执行的任务链的依赖关系。
其中,依赖管理流程具体可以由依赖管理单元3111执行,计分板位于多核管理单元3114中。
步骤504:任务队列。
在任务队列流程中,当等待执行的任务链的依赖关系解除后,下发该等待执行的任务链,同时进入步骤505和步骤506。
其中,任务队列流程具体可以由任务队列单元3112执行。
步骤505:多核管理。
在多核管理流程中,执行多核处理器的多个处理核的动态抢用和动态释放操作。当计分板上记录某个处理核完成一个任务链的所有任务后,立即释放该处理核,并重新申请该处理核用于执行该等待执行的任务链,且进入步骤506。具体地,一个任务链切分得到的任务的数量与处理核的数量可能相同,也可能不同;存在任务链切分得到的任务的数量多于处理核的数量的情况,此时存在至少一个处理核需要执行该任务链的两个及以上数量的任务;对于需要执行该任务链的两个及以上数量的任务的处理核来说,其在执行完该任务链的最后一个任务后才被释放;而对于只执行该任务链的一个任务的处理核来说,其执行的该任务链的这个任务也即该任务链的最后一个任务。
其中,多核管理流程具体可以由多核管理单元3114执行。
步骤506:任务拆分。
在任务拆分流程中,将该等待执行的任务链拆分成一个或多个任务,并下发给步骤505中为该等待执行的任务链申请抢用的处理核上,实现任务计算。拆分得到的一个或多个任务下发后,同时进入步骤507和步骤508。
其中,任务拆分流程具体可以由任务拆分单元3113执行。
步骤507:计分板。
计分板记录下发给每个处理核的任务及该任务归属的任务链,并依据处理核的返回信息确认该处理核上的一个任务链中的任务是否完全结束,如结束则进入步骤505执行处理核的动态释放和动态抢占。
其中,计分板位于多核管理单元3114中,计分板流程具体可以由多核管理单元3114执行。
步骤508:多核执行。
在多核执行流程中,实现任务的计算执行,每个处理核独立执行,每个处理核在完成每个任务后均返回响应到计分板。
其中,多核执行流程具体可以由处理核312执行。
本申请实施例中,任务调度器管理同类任务链之间的依赖,任务链之间的依赖需在硬件上管理,不在软件(DDK)侧管理。也即,硬件实现任务链的依赖管理,无需DDK参与控制,节省了软硬件的交互时间和软件侧调用,且硬件响应迅速,在依赖关系解除后能立即调度新的任务链下发,优于软件侧管理。
本申请实施例中,任务调度器实现处理核的细粒度动态释放和动态抢占操作,当一个处理核完成某个任务链的最后一个任务后,立即释放并重新被抢占用于执行待执行的任务链,通过细粒度管理减轻或消除处理核空载现象。也即,硬件实现多核处理器的多核的细粒度释放和抢占,每个处理核独立管理,当一个处理核完成一个任务链中归属于自己执行的任务后,立即被释放并重新申请为其余任务链的计算资源。该管理方式相比于以任务链为边界或粒度对多核的统一释放和申请操作,极大地减少甚至消除了部分处理核的空载问题,提升了处理核的利用效率。
本申请实施例中,任务调度器实现处理核跨任务链、跨进程动态调度,防止处理核空载。在下发完任务链的任务后,如果和下一个任务链之间无依赖,无需等任务链结束,可立即执行下一个任务链。也即,硬件实现跨任务链,跨进程的动态调度,可实现在同一个进程内、不同进程间均有效减少处理核空载问题,优于软件侧管理。
请参阅图6,图6是本申请实施例提供的一种多核处理器的处理方法,应用于多核处理器,所述多核处理器包括任务调度器、以及耦合于所述任务调度器的多个处理核;且该多核处理器的处理方法适用于上述图3-图5中的任意一种多核处理器以及包含所述多核处理器的设备(如手机、电脑、服务器等)。该方法可以包括但不限于步骤601-步骤604,其中,
步骤601:通过所述任务调度器存储多个任务链和所述多个任务链之间的依赖关系,所述依赖关系包括有依赖关系和无依赖关系;
步骤602:通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链;所述第一任务链与所述第二任务链之间无依赖关系,所述第一任务链包括一个或多个第一任务,所述第二任务链包括一个或多个第二任务;
步骤603:通过所述任务调度器调度所述多个处理核中的部分或全部执行所述一个或多个第一任务;
步骤604:当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行。
在一种可能的实现方式中,所述任务调度器包括依赖管理单元、任务队列单元;其中,所述通过所述任务调度器存储所述多个任务链之间的依赖关系,包括:通过所述任务调度器中的所述依赖管理单元存储所述多个任务链之间的依赖关系;所述通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链,包括:若通过所述任务调度器中的所述依赖管理单元判断到所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系后,通过所述任务调度器中的所述依赖管理单元向所述任务队列单元发送第一指令,所述第一指令用于指示所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系。
在一种可能的实现方式中,所述任务调度器还包括任务拆分单元、多核管理单元;其中,所述通过所述任务调度器存储多个任务链,包括:通过所述任务调度器中的所述任务队列单元存储所述多个任务链;所述通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链,还包括:在通过所述任务调度器中的所述任务队列单元接收到通过所述任务调度器中的所述依赖管理单元发送的第一指令后,通过所述任务调度器中的所述任务队列单元向所述任务拆分单元发送所述第一任务链和所述第二任务链,以及向所述多核管理单元发送第二指令,所述第二指令用于指示所述多核管理单元为所述第一任务链和所述第二任务链抢占处理核。
在一种可能的实现方式中,所述通过所述任务调度器调度所述多个处理核中的部分或全部执行所述一个或多个第一任务,包括:通过所述任务调度器中的所述任务拆分单元将所述第一任务链拆分成所述一个或多个第一任务;通过所述任务调度器中的所述多核管理单元根据所述第二指令,从所述多个处理核中抢占一个或多个第二处理核;通过所述任务调度器中的所述多核管理单元向所述任务拆分单元发送抢占所述一个或多个第二处理核的结果;通过所述任务调度器中的所述任务拆分单元调度所述一个或多个第二处理核执行所述一个或多个第一任务。
在一种可能的实现方式中,所述当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行,包括:通过所述任务调度器中的所述任务拆分单元将所述第二任务链拆分成所述一个或多个第二任务;当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器中的所述多核管理单元根据所述第二指令,抢占所述至少一个第一处理核;通过所述任务调度器中的所述多核管理单元向所述任务拆分单元发送抢占所述至少一个第一处理核的结果;通过所述任务调度器中的所述任务拆分单元将所述一个或多个第二任务中的至少一个第二任务调度至所述至少一个第一处理核中执行。
在一种可能的实现方式中,所述任务调度器还包括任务组装单元;所述方法还包括:通过所述任务调度器中的所述任务组装单元获取命令流以及所述多个任务链中的部分或全部任务链之间的依赖关系,并根据所述命令流生成所述多个任务链中的部分或全部任务链;通过所述任务调度器中的所述任务组装单元向所述任务队列单元发送所述多个任务链中的部分或全部任务链,以及向所述依赖管理单元发送所述多个任务链中的部分或全部任务链之间的依赖关系。
需要说明的是,图6所描述的多核处理器的处理方法的具体流程,可参见上述图3-图5中所述的本申请实施例中的相关描述,此处不再赘述。
本申请实施例中,多核处理器包括任务调度器、以及耦合于该任务调度器的多个处理核;可以通过任务调度器维护任务链之间的依赖关系,也即存储多个任务链之间的依赖关系,并且还通过任务调度器存储这多个任务链,如此可以通过任务调度器从这多个任务链中确定出无依赖关系的第一任务链与第二任务链;而第一任务链包括一个或多个第一任务,第二任务链包括一个或多个第二任务,可以通过任务调度器调度这多个处理核中的部分或全部执行第一任务链中的一个或多个第一任务;由于第一任务链与第二任务链是无依赖关系的,故第一任务链与第二任务链可以并行执行,或者第一任务链中的第一任务与第二任务链中的第二任务可以并行执行,当这多个处理核中有至少一个第一处理核处于空闲状态时,可以通过任务调度器将第二任务链中的至少一个第二任务调度至这至少一个第一处理核中执行;如此,本申请实施例中,一旦有处理核出现空载情况,该空载的处理核会立刻被任务调度器调度用于执行任务,从而可以提升多核调度性能。
本申请还提供一种半导体芯片,可包括上述实施例中的任意一种实现方式所提供的多核处理器。
本申请还提供一种半导体芯片,可包括上述实施例中的任意一种实现方式所提供的多核处理器、耦合于所述多核处理器的内部存储器以及外部存储器。
本申请还提供一种片上系统SoC芯片,该SoC芯片包括上述实施例中的任意一种实现方式所提供的多核处理器、耦合于所述多核处理器的内部存储器和外部存储器。该SoC芯片,可以由芯片构成,也可以包含芯片和其他分立器件。
本申请还提供一种芯片系统,该芯片系统包括上述实施例中的任意一种实现方式所提供的多核处理器。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存所述多核处理器在运行过程中所必要或相关的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其它分立器件。
本申请还提供一种处理装置,该处理装置具有实现上述方法实施例中的任意一种多核处理器的处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
本申请还提供一种终端,该终端包括多核处理器,该多核处理器为上述实施例中的任意一种实现方式所提供的多核处理器。该终端还可以包括存储器,存储器用于与多核处理器耦合,其保存终端必要的程序指令和数据。该终端还可以包括通信接口,用于该终端与其它设备或通信网络通信。
本申请实施例还提供一种计算机可读存储介质,其中,该计算机可读存储介质可存储有程序,该程序被多核处理器执行时包括上述方法实施例中记载的任意一种的部分或全部步骤。
本申请实施例还提供一种计算机程序,该计算机程序包括指令,当该计算机程序被多核处理器执行时,使得所述多核处理器可以执行上述方法实施例中记载的任意一种多核处理器的处理方法的部分或全部步骤。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可能可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等,具体可以是计算机设备中的处理器)执行本申请各个实施例上述方法的全部或部分步骤。其中,前述的存储介质可包括:U盘、移动硬盘、磁碟、光盘、只读存储器(Read-Only Memory,缩写:ROM)或者随机存取存储器(Random Access Memory,缩写:RAM)等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (14)

  1. 一种多核处理器,其特征在于,包括任务调度器、以及耦合于所述任务调度器的多个处理核;其中,
    所述任务调度器,用于存储多个任务链和所述多个任务链之间的依赖关系,所述依赖关系包括有依赖关系和无依赖关系;
    所述任务调度器,还用于:
    根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链;所述第一任务链与所述第二任务链之间无依赖关系,所述第一任务链包括一个或多个第一任务,所述第二任务链包括一个或多个第二任务;
    调度所述多个处理核中的部分或全部执行所述一个或多个第一任务;
    当所述多个处理核中有至少一个第一处理核处于空闲状态时,将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行。
  2. 根据权利要求1所述的多核处理器,其特征在于,所述任务调度器包括依赖管理单元、任务队列单元;其中,
    所述依赖管理单元,用于存储所述多个任务链之间的依赖关系;若判断到所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系后,向所述任务队列单元发送第一指令,所述第一指令用于指示所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系。
  3. 根据权利要求2所述的多核处理器,其特征在于,所述任务调度器还包括任务拆分单元、多核管理单元;其中,
    所述任务队列单元,用于存储所述多个任务链;在接收到所述依赖管理单元发送的第一指令后,向所述任务拆分单元发送所述第一任务链和所述第二任务链,以及向所述多核管理单元发送第二指令,所述第二指令用于指示所述多核管理单元为所述第一任务链和所述第二任务链抢占处理核。
  4. 根据权利要求3所述的多核处理器,其特征在于,
    所述任务拆分单元,用于将所述第一任务链拆分成所述一个或多个第一任务;
    所述多核管理单元,用于根据所述第二指令,从所述多个处理核中抢占一个或多个第二处理核;向所述任务拆分单元发送抢占所述一个或多个第二处理核的结果;
    所述任务拆分单元,还用于调度所述一个或多个第二处理核执行所述一个或多个第一任务。
  5. 根据权利要求4所述的多核处理器,其特征在于,
    所述任务拆分单元,还用于将所述第二任务链拆分成所述一个或多个第二任务;
    所述多核管理单元,还用于当所述多个处理核中有至少一个第一处理核处于空闲状态时,根据所述第二指令,抢占所述至少一个第一处理核;向所述任务拆分单元发送抢占所述至少一个第一处理核的结果;
    所述任务拆分单元,还用于将所述一个或多个第二任务中的至少一个第二任务调度至所述至少一个第一处理核中执行。
  6. 根据权利要求2-5中任一项所述的多核处理器,其特征在于,所述任务调度器还包括任务组装单元;
    所述任务组装单元,用于获取命令流以及所述多个任务链中的部分或全部任务链之间的依赖关系,并根据所述命令流生成所述多个任务链中的部分或全部任务链;向所述任务队列单元发送所述多个任务链中的部分或全部任务链,以及向所述依赖管理单元发送所述多个任务链中的部分或全部任务链之间的依赖关系。
  7. 一种多核处理器的处理方法,其特征在于,应用于多核处理器,所述多核处理器包括任务调度器、以及耦合于所述任务调度器的多个处理核;所述方法包括:
    通过所述任务调度器存储多个任务链和所述多个任务链之间的依赖关系,所述依赖关系包括有依赖关系和无依赖关系;
    通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链;所述第一任务链与所述第二任务链之间无依赖关系,所述第一任务链包括一个或多个第一任务,所述第二任务链包括一个或多个第二任务;
    通过所述任务调度器调度所述多个处理核中的部分或全部执行所述一个或多个第一任务;
    当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行。
  8. 根据权利要求7所述的方法,其特征在于,所述任务调度器包括依赖管理单元、任务队列单元;其中,
    所述通过所述任务调度器存储所述多个任务链之间的依赖关系,包括:
    通过所述任务调度器中的所述依赖管理单元存储所述多个任务链之间的依赖关系;
    所述通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链,包括:
    若通过所述任务调度器中的所述依赖管理单元判断到所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系后,通过所述任务调度器中的所述依赖管理单元向所述任务队列单元发送第一指令,所述第一指令用于指示所述第一任务链与所述第二任务链之间的依赖关系为无依赖关系。
  9. 根据权利要求8所述的方法,其特征在于,所述任务调度器还包括任务拆分单元、多核管理单元;其中,
    所述通过所述任务调度器存储多个任务链,包括:
    通过所述任务调度器中的所述任务队列单元存储所述多个任务链;
    所述通过所述任务调度器根据所述多个任务链之间的依赖关系,从所述多个任务链中确定第一任务链和第二任务链,还包括:
    在通过所述任务调度器中的所述任务队列单元接收到通过所述任务调度器中的所述依赖管理单元发送的第一指令后,通过所述任务调度器中的所述任务队列单元向所述任务拆分单元发送所述第一任务链和所述第二任务链,以及向所述多核管理单元发送第二指令,所述第二指令用于指示所述多核管理单元为所述第一任务链和所述第二任务链抢占处理核。
  10. 根据权利要求9所述的方法,其特征在于,所述通过所述任务调度器调度所述多个处理核中的部分或全部执行所述一个或多个第一任务,包括:
    通过所述任务调度器中的所述任务拆分单元将所述第一任务链拆分成所述一个或多个第一任务;
    通过所述任务调度器中的所述多核管理单元根据所述第二指令,从所述多个处理核中抢占一个或多个第二处理核;
    通过所述任务调度器中的所述多核管理单元向所述任务拆分单元发送抢占所述一个或多个第二处理核的结果;
    通过所述任务调度器中的所述任务拆分单元调度所述一个或多个第二处理核执行所述一个或多个第一任务。
  11. 根据权利要求10所述的方法,其特征在于,所述当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器将所述第二任务链中的至少一个第二任务调度至所述至少一个第一处理核中执行,包括:
    通过所述任务调度器中的所述任务拆分单元将所述第二任务链拆分成所述一个或多个第二任务;
    当所述多个处理核中有至少一个第一处理核处于空闲状态时,通过所述任务调度器中的所述多核管理单元根据所述第二指令,抢占所述至少一个第一处理核;
    通过所述任务调度器中的所述多核管理单元向所述任务拆分单元发送抢占所述至少一个第一处理核的结果;
    通过所述任务调度器中的所述任务拆分单元将所述一个或多个第二任务中的至少一个第二任务调度至所述至少一个第一处理核中执行。
  12. 根据权利要求8-11中任一项所述的方法,其特征在于,所述任务调度器还包括任务组装单元;所述方法还包括:
    通过所述任务调度器中的所述任务组装单元获取命令流以及所述多个任务链中的部分或全部任务链之间的依赖关系,并根据所述命令流生成所述多个任务链中的部分或全部任务链;
    通过所述任务调度器中的所述任务组装单元向所述任务队列单元发送所述多个任务链中的部分或全部任务链,以及向所述依赖管理单元发送所述多个任务链中的部分或全部任务链之间的依赖关系。
  13. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,该计算机程序被多核处理器执行时实现上述权利要求7-12中任意一项所述的方法。
  14. 一种计算机程序,其特征在于,所述计算机程序包括指令,当所述计算机程序被多核处理器执行时,使得所述多核处理器执行如上述权利要求7-12中任意一项所述的方法。
PCT/CN2021/077230 2021-02-22 2021-02-22 多核处理器、多核处理器的处理方法及相关设备 WO2022174442A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2021/077230 WO2022174442A1 (zh) 2021-02-22 2021-02-22 多核处理器、多核处理器的处理方法及相关设备
CN202180093759.7A CN116868169A (zh) 2021-02-22 2021-02-22 多核处理器、多核处理器的处理方法及相关设备
EP21926159.1A EP4287024A4 (en) 2021-02-22 2021-02-22 MULTI-CORE PROCESSOR, MULTI-CORE PROCESSOR PROCESSING METHOD AND RELATED DEVICE
US18/452,046 US20230393889A1 (en) 2021-02-22 2023-08-18 Multi-core processor, multi-core processor processing method, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/077230 WO2022174442A1 (zh) 2021-02-22 2021-02-22 多核处理器、多核处理器的处理方法及相关设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/452,046 Continuation US20230393889A1 (en) 2021-02-22 2023-08-18 Multi-core processor, multi-core processor processing method, and related device

Publications (1)

Publication Number Publication Date
WO2022174442A1 true WO2022174442A1 (zh) 2022-08-25

Family

ID=82931921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/077230 WO2022174442A1 (zh) 2021-02-22 2021-02-22 多核处理器、多核处理器的处理方法及相关设备

Country Status (4)

Country Link
US (1) US20230393889A1 (zh)
EP (1) EP4287024A4 (zh)
CN (1) CN116868169A (zh)
WO (1) WO2022174442A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098503A (zh) * 2009-12-14 2011-06-15 中兴通讯股份有限公司 一种多核处理器并行解码图像的方法和装置
CN103235742A (zh) * 2013-04-07 2013-08-07 山东大学 多核集群服务器上基于依赖度的并行任务分组调度方法
CN103885826A (zh) * 2014-03-11 2014-06-25 武汉科技大学 一种多核嵌入式系统实时任务调度实现方法
US20170060640A1 (en) * 2015-08-31 2017-03-02 Mstar Semiconductor, Inc. Routine task allocating method and multicore computer using the same
US20180032376A1 (en) * 2016-07-27 2018-02-01 Samsung Electronics Co .. Ltd. Apparatus and method for group-based scheduling in multi-core processor system
CN109697122A (zh) * 2017-10-20 2019-04-30 华为技术有限公司 任务处理方法、设备及计算机存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PENG, MANMAN ET AL.: "Task Allocation and Load Balance on Multi-core Processors", MICROELECTRONICS & COMPUTER, vol. 28, no. 11, 30 November 2011 (2011-11-30), pages 35 - 39, XP055960537 *
See also references of EP4287024A4 *

Also Published As

Publication number Publication date
CN116868169A (zh) 2023-10-10
EP4287024A4 (en) 2024-02-28
EP4287024A1 (en) 2023-12-06
US20230393889A1 (en) 2023-12-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21926159

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180093759.7

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2021926159

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2021926159

Country of ref document: EP

Effective date: 20230830

NENP Non-entry into the national phase

Ref country code: DE