CN115686873A - Core scheduling method and device for multi-core system


Info

Publication number
CN115686873A
Authority
CN
China
Prior art keywords
core, task, schedulable, region, sub
Prior art date
Legal status
Granted
Application number
CN202211713989.1A
Other languages
Chinese (zh)
Other versions
CN115686873B (en)
Inventor
Name withheld upon request
Current Assignee
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd
Priority to CN202211713989.1A
Publication of CN115686873A
Application granted
Publication of CN115686873B
Current legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A core scheduling method and apparatus for a multi-core system are disclosed. The method comprises the following steps: receiving a task execution request from a target application, the task execution request comprising a task load level of a task to be executed; acquiring a hierarchical task allocation pattern for the multi-core system, the pattern comprising a correspondence between each core in a schedulable core area of the multi-core system and a plurality of task load levels expected to be allocated; determining a scheduling priority of each core in the schedulable core area according to the positional relationship between each core in the schedulable core area and a predetermined core; acquiring a task allocation state of each core in the schedulable core area; and determining, from the schedulable core area, a target core for processing the task to be executed according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core area.

Description

Core scheduling method and device for multi-core system
Technical Field
The present application relates to the field of computer technologies, and in particular, to a core scheduling method and apparatus for a multi-core system, a computing device, a computer-readable storage medium, and a computer program product.
Background
With the development of computer technology, it has become widely recognized that merely increasing the chip speed of a single-core processor generates excessive heat: heat dissipation becomes a bottleneck, and no matching performance improvement can be obtained. Moreover, even setting the heat dissipation problem aside, the performance gain from further increasing single-core speed is not proportional to the enormous cost it consumes; that is, the cost-effectiveness is unacceptable. Under these circumstances, multi-core processors, or multi-core systems, emerged. Unlike raising the chip speed of a single-core processor, a multi-core processor improves overall processor performance mainly by increasing the number of processor cores in the chip. Multi-core processor technology not only brings stronger computing performance to computers in practice, but also meets the requirements of multi-task parallel processing and multi-task computing environments.
However, when a multi-core system is used to process tasks, improper scheduling of its cores may cause too many cores in a certain region of the system's core array to be scheduled (for example, within the same time period) to execute relatively complex or computation-intensive tasks (i.e., tasks with excessive load). In other words, high-load tasks become overly aggregated in that core region, forcing multiple adjacently arranged cores in the region to operate at high speed or remain in a high-load state simultaneously. This greatly increases the power density of the core region and hence its heat generation; at the same time, because the task load of each core in the physical vicinity is too heavy, core task latency increases (even exponentially), significantly degrading the overall performance of the system.
Disclosure of Invention
A core scheduling method and apparatus, a computing device, a computer-readable storage medium, and a computer program product for a multi-core system are provided, which desirably mitigate, alleviate, or even eliminate some or all of the above-mentioned problems and other possible problems.
According to an aspect of the present application, a core scheduling method for a multi-core system is provided, including: receiving a task execution request from a target application, the task execution request comprising a task load level of a task to be executed; acquiring a hierarchical task allocation pattern for the multi-core system, the pattern comprising a correspondence between each core in a schedulable core area of the multi-core system and a plurality of task load levels expected to be allocated, the plurality of task load levels expected to be allocated being related to the target application; determining a scheduling priority of each core in the schedulable core area according to the positional relationship between each core in the schedulable core area and a predetermined core; acquiring a task allocation state of each core in the schedulable core area; and determining, from the schedulable core area, a target core for processing the task to be executed according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core area.
In a core scheduling method according to some embodiments of the present application, determining a scheduling priority of each core in the schedulable core area according to a positional relationship between each core in the schedulable core area and a predetermined core includes: dividing the schedulable core area into a plurality of schedulable core sub-areas, the plurality of schedulable core sub-areas including a reference schedulable core sub-area where the predetermined core is located; determining a first scheduling order for each schedulable core sub-region based on a distance between each schedulable core sub-region and a reference schedulable core sub-region; and determining the scheduling priority of each core in each schedulable core sub-area at least according to the first scheduling sequence of each schedulable core sub-area.
In a core scheduling method according to some embodiments of the present application, cores in the same schedulable core sub-region share the same power domain.
In some embodiments of the core scheduling method according to the present application, each of the schedulable core sub-regions is a square core array region and includes the same number of cores.
In a core scheduling method according to some embodiments of the present application, determining a first scheduling order for each schedulable core sub-region based on a distance between each schedulable core sub-region and a reference schedulable core sub-region includes: calculating a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region; determining a first scheduling order for each schedulable core sub-region based on at least one of a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region.
In a core scheduling method according to some embodiments of the present application, determining a first scheduling order for each schedulable core sub-region based on at least one of a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region includes: for each schedulable core sub-region, calculating a sum of squares of a lateral distance and a longitudinal distance between the schedulable core sub-region and the reference schedulable core sub-region; determining a first scheduling sequence of each schedulable core sub-region based on a sum of squares of the transverse distance and the longitudinal distance corresponding to each schedulable core sub-region; in response to there being at least two schedulable core sub-regions of the plurality of schedulable core sub-regions with a first scheduling order that is the same, updating the first scheduling order for the at least two schedulable core sub-regions based on a longitudinal distance between each of the at least two schedulable core sub-regions and the reference schedulable core sub-region.
In some embodiments of the core scheduling method according to the present application, determining the scheduling priority of each core in each schedulable core sub-region according to at least the first scheduling order of each schedulable core sub-region includes: determining a second scheduling order of each core in each schedulable core sub-region according to the lateral and longitudinal arrangement order of the cores within the schedulable core sub-region; and determining the scheduling priority of each core in each schedulable core sub-region according to the first scheduling order of the schedulable core sub-region and the second scheduling order of each core within it.
In the core scheduling method according to some embodiments of the present application, determining a first scheduling order of each schedulable core sub-region based on a distance between each schedulable core sub-region and the reference schedulable core sub-region includes: sorting the schedulable core sub-regions in ascending order of their distance to the reference schedulable core sub-region; and determining the first scheduling order of each schedulable core sub-region according to the sorting result, the first scheduling order being consistent with the sorting result.
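The following minimal Python sketch illustrates one plausible implementation of the priority computation described in the embodiments above; the function names and data shapes are assumptions by the editor, not part of the disclosure. Sub-regions are ranked by the sum of squares of their lateral and longitudinal distances to the reference sub-region, with ties broken by longitudinal distance, and cores inside a sub-region are ranked by their row/column position.

```python
def first_scheduling_order(sub_regions, reference):
    """Rank sub-regions by squared distance to the reference sub-region.

    sub_regions: list of (col, row) grid coordinates of each sub-region.
    reference:   (col, row) of the sub-region containing the predetermined core.
    """
    ref_x, ref_y = reference

    def sort_key(region):
        x, y = region
        dx, dy = x - ref_x, y - ref_y
        # Primary key: sum of squares of lateral and longitudinal distances;
        # secondary key: longitudinal distance, used to break ties.
        return (dx * dx + dy * dy, abs(dy))

    return sorted(sub_regions, key=sort_key)

def core_priority(order_index, col, row, cores_per_side):
    """Combine a sub-region's first scheduling order with a core's
    row-major (lateral, then longitudinal) position inside it;
    a smaller value means the core is scheduled earlier."""
    second_order = row * cores_per_side + col
    return order_index * cores_per_side * cores_per_side + second_order
```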
In the core scheduling method according to some embodiments of the present application, in the hierarchical task allocation pattern, the schedulable core area includes multiple classes of core areas in one-to-one correspondence with the plurality of task load levels expected to be allocated, each class of core area includes multiple non-adjacent sub-areas, and each sub-area includes one core or at least two adjacent cores.
In some embodiments of the core scheduling method according to the present application, the hierarchical task allocation pattern is obtained according to the plurality of task load levels expected to be allocated.
In the core scheduling method according to some embodiments of the present application, in a case where the plurality of task load levels expected to be allocated include a first task load level and a second task load level, the hierarchical task allocation pattern is a first hierarchical task allocation pattern in which the schedulable core area includes a first class core area corresponding to the first task load level and a second class core area corresponding to the second task load level, and each sub-area in the first class core area and in the second class core area includes one core.
In the core scheduling method according to some embodiments of the present application, in a case where the plurality of task load levels expected to be allocated include a first task load level, a second task load level, and a third task load level, the hierarchical task allocation pattern is a second hierarchical task allocation pattern in which the schedulable core areas include a third class core area corresponding to the first task load level, a fourth class core area corresponding to the second task load level, and a fifth class core area corresponding to the third task load level, and each sub-area in the fourth class core area is not adjacent to each sub-area in the fifth class core area.
In some embodiments of the core scheduling method according to the present application, the task complexity level corresponding to each of the second task load level and the third task load level is greater than the task complexity level corresponding to the first task load level.
In a core scheduling method according to some embodiments of the present application, the schedulable core area is an array area, and in a case where the plurality of task load levels expected to be allocated include a first task load level, a second task load level, and a third task load level and a range is smaller than a first range threshold, the hierarchical task allocation pattern is a first sub-pattern of the second hierarchical task allocation pattern, the range indicating the difference between the task complexity levels corresponding to the highest and lowest of the plurality of task load levels expected to be allocated. In the first sub-pattern, in each row and each column of the schedulable core area, each core in the fourth class core area is separated by at least one core in the third class core area and at least one core in the fifth class core area, and each core in the fifth class core area is separated by at least one core in the third class core area and at least one core in the fourth class core area.
In a core scheduling method according to some embodiments of the present application, in a case where the plurality of task load levels expected to be allocated include a first task load level, a second task load level, and a third task load level and the range is greater than or equal to a second range threshold, the hierarchical task allocation pattern is a second sub-pattern of the second hierarchical task allocation pattern, wherein the first range threshold is less than or equal to the second range threshold. In the second sub-pattern, in odd rows and odd columns of the schedulable core area, each core in the fourth class core area is separated by one or more cores in the third class core area; and in even rows and even columns of the schedulable core area, each core in the fourth class core area is separated by at least one core in the third class core area and at least one core in the fifth class core area, and each core in the fifth class core area is separated by at least one core in the third class core area and at least one core in the fourth class core area.
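For illustration only, the sketch below shows one simple arrangement, constructed by the editor and not taken from the patent figures, that satisfies the first sub-pattern constraint above: along every row and column, each level-2 (fourth-class) core is separated by a level-1 (third-class) and a level-3 (fifth-class) core, and vice versa.

```python
def first_sub_pattern(n):
    """Return an n x n grid of task load levels 1..3 laid out diagonally-cyclically."""
    return [[(row + col) % 3 + 1 for col in range(n)] for row in range(n)]

for row in first_sub_pattern(6):
    print(row)  # e.g. [1, 2, 3, 1, 2, 3], shifted by one position each row
```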
In some embodiments of the core scheduling method according to the present application, a task complexity corresponding to the third task load level is greater than a task complexity corresponding to the second task load level.
In some embodiments of the core scheduling method according to the present application, the method further includes: acquiring a region scheduling parameter corresponding to the target application, the region scheduling parameter being determined based on the number of cores required for the target application to run; and determining the schedulable core area from a core array of the multi-core system according to the region scheduling parameter.
In the core scheduling method according to some embodiments of the present application, determining a target core for processing the task to be executed from the schedulable core area according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core area includes: determining, from the schedulable core area according to the hierarchical task allocation pattern, a first candidate core area matching the task load level of the task to be executed; determining a second candidate core area from the first candidate core area based on the task allocation states of the cores in the schedulable core area; and selecting the target core from the second candidate core area according to the scheduling priority of each core in the schedulable core area.
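A minimal sketch of this three-stage selection follows; the `Core` structure and field names are the editor's assumptions. It filters by load-level match, then by unallocated state, then picks the candidate with the best (here: smallest) priority value.

```python
from collections import namedtuple

Core = namedtuple("Core", "core_id expected_level allocated priority")

def select_target_core(cores, task_level):
    # First candidate area: cores whose expected load level matches the task's.
    first = [c for c in cores if c.expected_level == task_level]
    # Second candidate area: matching cores not currently assigned a task.
    second = [c for c in first if not c.allocated]
    if not second:
        return None  # no suitable idle core in the schedulable area
    # Target core: the candidate with the highest scheduling priority
    # (represented here by the smallest priority value).
    return min(second, key=lambda c: c.priority)

cores = [Core(0, 1, True, 0), Core(1, 1, False, 1), Core(2, 2, False, 2)]
print(select_target_core(cores, task_level=1).core_id)  # -> 1
```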
In a core scheduling method according to some embodiments of the present application, a power domain of a target core is a power domain that the target core shares with at least one other core in the multi-core system, and the method further includes: detecting a task allocation state of the at least one other core in response to the task to be executed being processed by the target core; controlling switching of a power domain of the target core based on a task allocation status of the at least one other core.
In a core scheduling method according to some embodiments of the present application, controlling switching of the power domain of the target core based on the task allocation state of the at least one other core includes: in response to the task allocation state of the at least one other core being an unallocated-task state, turning off the power domain of the target core.
In a core scheduling method according to some embodiments of the present application, controlling switching of the power domain of the target core based on the task allocation state of the at least one other core includes: starting power domain idle state timing in response to the task allocation state of the at least one other core being an unallocated-task state; and turning off the power domain of the target core in response to the power domain idle state timing reaching a predetermined duration.
In a core scheduling method according to some embodiments of the present application, controlling switching of the power domain of the target core based on the task allocation state of the at least one other core further includes: acquiring the task allocation state of each of the at least one other core and the target core in real time during the power domain idle state timing; and in response to the task allocation state of at least one of the at least one other core and the target core becoming an allocated-task state during the power domain idle state timing, terminating the power domain idle state timing and keeping the power domain of the target core on.
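The idle-timing behavior above can be summarized in a small sketch, assuming a polling loop, a `power_off` callback, and an `IDLE_TIMEOUT_S` constant standing in for the predetermined duration (all editor's assumptions):

```python
import time

IDLE_TIMEOUT_S = 0.010  # assumed predetermined duration

def manage_power_domain(domain_cores, power_off):
    """domain_cores: cores sharing one power domain, each with an .allocated flag."""
    idle_since = None
    while True:
        if any(c.allocated for c in domain_cores):
            idle_since = None                 # terminate idle timing, keep domain on
        elif idle_since is None:
            idle_since = time.monotonic()     # all cores idle: start idle-state timing
        elif time.monotonic() - idle_since >= IDLE_TIMEOUT_S:
            power_off()                       # timer expired: turn the power domain off
            return
        time.sleep(0.001)                     # polling granularity (illustrative only)
```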
In some embodiments of the core scheduling method according to the present application, the method further includes: updating the task allocation state of the target core to an unallocated-task state in response to the target core completing the task to be executed.
In some embodiments of the core scheduling method according to the present application, the method further includes: detecting a power supply state of the power domain of the target core in response to the target core being determined; and controlling the processing of the task to be executed according to the power supply state of the power domain of the target core.
In a core scheduling method according to some embodiments of the present application, controlling the processing of the task to be executed according to the power supply state of the power domain of the target core includes: in response to the power supply state of the power domain of the target core being a fully-off state, turning on the power domain of the target core and detecting in real time whether it has entered a fully-on state; and in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
In a core scheduling method according to some embodiments of the present application, controlling the processing of the task to be executed according to the power supply state of the power domain of the target core includes: in response to the power supply state of the power domain of the target core being a powering-up state, detecting in real time whether the power domain has entered a fully-on state; and in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
In a core scheduling method according to some embodiments of the present application, controlling the processing of the task to be executed according to the power supply state of the power domain of the target core includes: in response to the power supply state of the power domain of the target core being a powering-down state, detecting in real time whether the power domain has entered a fully-off state; in response to the power domain of the target core entering the fully-off state, turning on the power domain and detecting in real time whether it has entered a fully-on state; and in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
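These three cases collapse into one small state machine, sketched below under stated assumptions: the state names mirror the description, while the `domain`/`target_core` methods are placeholders invented for illustration, not an API from the disclosure.

```python
FULLY_OFF, POWERING_UP, FULLY_ON, POWERING_DOWN = range(4)

def run_task_when_powered(domain, task, target_core):
    state = domain.power_state()
    if state == POWERING_DOWN:
        domain.wait_until(FULLY_OFF)    # let the power-down finish first
        state = FULLY_OFF
    if state == FULLY_OFF:
        domain.power_on()               # turn the power domain on
        domain.wait_until(FULLY_ON)
    elif state == POWERING_UP:
        domain.wait_until(FULLY_ON)     # a power-on is already in progress
    target_core.execute(task)           # dispatch only once the domain is fully on
```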
According to another aspect of the present application, there is provided a core scheduling apparatus for a multi-core system, including: a receiving module configured to receive a task execution request from a target application, the task execution request including a task load level of a task to be executed; a first obtaining module configured to obtain a hierarchical task allocation pattern for the multi-core system, the pattern including a correspondence between each core in a schedulable core area of the multi-core system and a plurality of task load levels expected to be allocated, the plurality of task load levels expected to be allocated being related to the target application; a first determining module configured to determine a scheduling priority of each core in the schedulable core area according to a positional relationship between each core in the schedulable core area and a predetermined core; a second obtaining module configured to obtain a task allocation state of each core in the schedulable core area; and a second determining module configured to determine, from the schedulable core area, a target core for processing the task to be executed according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core area.
According to another aspect of the application, a computing device is presented, comprising: a memory and a processor, wherein the memory has stored therein a computer program that, when executed by the processor, causes the processor to perform a core scheduling method for a multi-core system according to some embodiments of the present application.
According to another aspect of the present application, a computer-readable storage medium is presented having computer-readable instructions stored thereon which, when executed, implement a method according to some embodiments of the present application.
According to another aspect of the present application, a computer program product is presented, comprising a computer program which, when executed by a processor, performs the steps of the method according to some embodiments of the present application.
In the core scheduling method and apparatus for a multi-core system according to some embodiments of the present application, first, a task-grading approach is adopted for tasks issued by software (the target application); that is, task complexity is represented by task load levels, which facilitates quantitative analysis of task complexity and simplifies core scheduling based on graded tasks. Second, through the hierarchical task allocation pattern, tasks of different load levels can be allocated or arranged evenly across the core array, so that the power density of the core array of the multi-core system is effectively controlled and remains relatively balanced during task execution; this avoids the high-load states, excessive power density, and high heat generation of a core array region caused by excessive aggregation of high-load tasks, and effectively improves the overall performance and task execution efficiency of the multi-core system. In addition, while the hierarchical task allocation pattern achieves a balanced arrangement of different task load levels, the core allocation order of the current task to be executed is defined by a scheduling priority (for example, a nearest-to-zero principle) determined based on the positional relationship between each core and a predetermined core (for example, the initial core to be scheduled, with ID = 0, at the upper left corner of the core array). In this way, tasks of each level are preferentially and intensively allocated to the area near the predetermined core (for example, the supply area of the same power domain), which facilitates centralized management of the scheduled cores in the multi-core system, especially regional power management, and significantly improves core management efficiency. At the same time, unified regional power management (for example, putting a power domain to sleep when every core in a scheduled core area is idle) reduces energy consumption, power supply life loss, and efficiency loss (for example, loss caused by frequently switching power supplies).
Drawings
Various aspects, features and advantages of the present application will become more readily apparent from the following detailed description and the accompanying drawings, in which:
FIG. 1 schematically illustrates an example implementation environment for a core scheduling method for a multi-core system according to some embodiments of the present application;
FIG. 2 schematically illustrates a flow diagram for core scheduling for a multi-core system according to some embodiments of the present application;
FIGS. 3A and 3B schematically show entity architecture diagrams, respectively, corresponding to a core scheduling method for a multi-core system according to some embodiments of the present application;
FIG. 4 schematically illustrates a core scheduling priority diagram of a multi-core system according to some embodiments of the present application;
FIG. 5 schematically illustrates a flow diagram for core scheduling for a multi-core system according to some embodiments of the present application;
FIGS. 6A-6E each schematically illustrate a hierarchical task allocation pattern in accordance with some embodiments of the present application;
FIG. 7 schematically illustrates a flow diagram for core scheduling for a multi-core system according to some embodiments of the present application;
FIG. 8 schematically illustrates a flow diagram for core scheduling for a multi-core system according to some embodiments of the present application;
FIG. 9A schematically illustrates a state change schematic for a power domain of a multi-core system according to some embodiments of the present application;
FIGS. 9B and 9C illustrate waveforms of key signals, respectively, during an implementation of a method for core scheduling for a multi-core system according to some embodiments of the present application;
FIG. 10 schematically illustrates an example block diagram of a core scheduling apparatus for a multi-core system in accordance with some embodiments of the present application; and
FIG. 11 schematically illustrates an example block diagram of a computing device in accordance with some embodiments of the present application.
It is to be noted that the figures are diagrammatic and explanatory only and are not necessarily drawn to scale.
Detailed Description
Several embodiments of the present application will be described in more detail below with reference to the accompanying drawings in order to enable those skilled in the art to practice the application. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The examples do not limit the present application.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components and/or sections, these elements, components and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component or section from another element, component or section. Thus, a first element, component, or section discussed below could be termed a second element, component, or section without departing from the teachings of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Before describing embodiments of the present application in detail, some relevant concepts are explained first for clarity.
1. Multi-core system: i.e., a multi-core processor, which refers to the integration of two or more complete compute engines in one processor, such as a Chip Multiprocessor (CMP) architecture. Each such compute engine, used for information processing and task execution, is referred to herein as a "core".
2. Power density of a multi-core system: herein, the density of core power consumption in a region of the multi-core array; for example, it may be equal to the ratio of the total power consumption of all cores in the region to the area of the region (see the sketch following this list).
3. Core scheduling: refers to the management and control of individual cores or cores in a multi-core system, including, for example, task allocation, power regulation, and the like.
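A worked example of the power-density definition in item 2 above; the per-core powers and region area are hypothetical values chosen for illustration:

```python
core_powers_mw = [120, 95, 130, 110]   # hypothetical per-core power (mW)
region_area_mm2 = 4.0                  # hypothetical region area (mm^2)

# Power density = total power of all cores in the region / region area.
power_density = sum(core_powers_mw) / region_area_mm2
print(power_density, "mW/mm^2")        # 113.75 mW/mm^2
```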
In order to solve the problem of excessively high local or overall power density of the core array caused by improper distribution of tasks among the cores of a multi-core system, the present application provides a core scheduling method for a multi-core system. The method first divides tasks to be executed into a plurality of load levels according to their computational complexity or computational load; it then (pre-)determines a task allocation policy or pattern based on the different load levels (i.e., the arrangement of the positions of the various load levels in the core array) and determines core scheduling priorities according to the positional relationship between each core and a predetermined core; finally, it distributes the task to be executed to a corresponding core based on the determined task allocation policy, the scheduling priorities, and so on, i.e., it schedules the corresponding target core to process the task to be executed.
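A compressed, self-contained sketch of this flow is given below; the data shapes and the squared-distance priority are the editor's assumptions standing in for the steps detailed in later sections:

```python
def schedule(task_level, expected_levels, allocated, predetermined=(0, 0)):
    """expected_levels/allocated: dicts keyed by (col, row) core position."""
    # Scheduling priority: squared distance to the predetermined core
    # (smaller value = scheduled earlier).
    def priority(pos):
        dx, dy = pos[0] - predetermined[0], pos[1] - predetermined[1]
        return dx * dx + dy * dy

    # Keep cores whose expected load level matches and that are idle,
    # then pick the one closest to the predetermined core.
    candidates = [p for p in expected_levels
                  if expected_levels[p] == task_level and not allocated[p]]
    return min(candidates, key=priority, default=None)

# 2 x 2 toy array: load levels laid out per some hierarchical pattern.
levels = {(0, 0): 1, (1, 0): 2, (0, 1): 2, (1, 1): 1}
busy = {p: False for p in levels}
print(schedule(2, levels, busy))  # -> (1, 0), the closest idle level-2 core
```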
FIG. 1 schematically illustrates an example environment 100 for implementing a core scheduling method for a multi-core system according to some embodiments of the present application. As shown in FIG. 1, the implementation environment 100 may include a target application 110, a multi-core system 120, and a core scheduling platform 130. The target application 110 may be various types of software or applications that are running on a computing device (e.g., server, terminal device, embedded computing device, etc.) for distributing tasks to be performed for processing by a processor. The multi-core system 120 may be a multi-core processor located in a computing device for processing various tasks issued by the target application 110. The core scheduling platform 130 may be a software module (e.g., a program module) and/or a hardware module (e.g., a circuit) in a computing device that manages or schedules the operations of the cores in the multi-core system 120 (e.g., may include task allocation or power management for the cores, etc.). The core scheduling method for a multi-core system according to some embodiments of the application may be implemented by using the above-described core scheduling platform 130.
In some embodiments, the target application 110 may comprise a terminal application (program) running in user mode on a terminal device that may interact with a user and have a visual user interface. From a functional perspective, the terminal applications may include cloud games, social applications, payment applications, shopping applications, multimedia applications (such as audio and video applications), educational applications, and the like; from an access style perspective, terminal applications may include locally installed applications, applets accessed via other applications, web programs accessed via a browser, and the like. The terminal applications may include, but are not limited to, cell phone APPs, computer software, etc. In some embodiments, the target application 110 may comprise a program or software running in a server, i.e., a server-side application. Alternatively, the target application 110 may also include a system application running on a terminal device or a server.
In some embodiments, the multi-core system 120 may be a multi-core Central Processing Unit (CPU) in a computing device, and optionally may also be a multi-core Graphics Processing Unit (GPU) or another type of processor or chipset. Multi-core technology is a relatively common way to improve processor performance, especially on servers; typically, a server-grade multi-core processor has 16 cores, 40 or 80 cores are common, and core counts can reach hundreds or even thousands. The cores in the multi-core system 120 may be interconnected via a Network On Chip (NOC). A NOC is an interconnection structure between the cores of a multi-core system, intended to realize interconnection and intercommunication among them; examples include the mesh structure and the fully-interconnected structure (by comparison, the fully-interconnected structure has higher performance and lower latency but a complex structure, while the mesh structure is simpler overall and convenient and flexible to wire). The core scheduling method of the present application can realize multi-core scheduling and task allocation both in a multi-core system with a mesh NOC structure and in one with a fully-interconnected NOC structure.
In some embodiments, core scheduling platform 130 may include separate hardware, software, firmware, or a combination thereof for implementing the corresponding functions, such as a processor with data transceiving and processing capabilities, a single-chip microcomputer, a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), etc., or a combination of any two or more of the above; or it may also include, but is not limited to, processes running on a processor, objects, executables, threads of execution, programs, and the like.
As shown in fig. 1, first, core scheduling platform 130 may be configured to: a task execution request is received from a target application, the task execution request including a task load level for a task to be executed, the task load level may indicate a task complexity level. Second, core scheduling platform 130 may be configured to: the method comprises the steps of obtaining a hierarchical task allocation mode aiming at the multi-core system, wherein the hierarchical task allocation mode comprises the corresponding relation between each core in a schedulable core area of the multi-core system and a plurality of task load levels expected to be allocated, and the plurality of task load levels expected to be allocated are related to target applications. Further, core scheduling platform 130 may be configured to: and determining the scheduling priority of each core in the schedulable core area according to the position relation between each core in the schedulable core area and the preset core. Further, core scheduling platform 130 may be configured to: and acquiring the task distribution state of each core in the schedulable core area. Finally, core scheduling platform 130 may be configured to: and determining a target core for processing the task to be executed from the schedulable core area according to the task load level of the task to be executed, the hierarchical task allocation mode and the scheduling priority and the task allocation state of each core in the schedulable core area.
As shown in fig. 1, optionally, after determining the target core for processing the task to be executed, the core scheduling platform 130 may be configured to send an identifier of the target core (e.g., a target core ID) to the target application 110; subsequently, after obtaining the target core identifier, the target application 110 may directly send the identifier and the task to be executed to the multi-core system 120 for processing by the target core corresponding to the identifier; finally, the multi-core system 120 assigns the target core corresponding to the identifier to perform task processing after receiving the identifier and the task to be executed.
The computing devices or computers involved in the implementation environment 100 in fig. 1 (e.g., the computing device on which the multi-core system 120 resides) may include terminal devices and/or servers. The terminal device may be any type of mobile computing device, including a mobile computer (e.g., a personal digital assistant (PDA), laptop computer, notebook computer, tablet computer, netbook, etc.), a mobile phone (e.g., cellular phone, smartphone, etc.), a wearable computing device (e.g., smart watch, or head-mounted device, including smart glasses, etc.), or another type of mobile device. In some embodiments, the terminal device may also be a stationary computing device, such as a desktop computer, a gaming console, or a smart television. The server may be a single server or a cluster of servers, or may be a cloud server or a cluster of cloud servers capable of providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communications, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. It should be understood that the servers referred to herein are typically server computers having large amounts of memory and processor resources, but other embodiments are possible.
The implementation environment 100 of the core scheduling method for a multi-core system according to some embodiments of the present application shown in fig. 1 is only schematic. The core scheduling method for a multi-core system according to the present application is not limited to the illustrated example implementation environment. It should be understood that, in general, the target application 110, the core scheduling platform 130, and the multi-core system 120 shown in fig. 1 may be located in the same computing device, but they may also be located in different computing devices, respectively. For example, the target application 110 may be an application program or software running on a terminal device, the multi-core system 120 may be a multi-core processor of a server that processes or executes tasks issued by the target application on the terminal device and transmitted over a network, and the core scheduling platform may be located or run on the server side and/or the terminal device side to implement core scheduling and task allocation in the multi-core system 120.
Fig. 2 schematically illustrates a flow diagram of a core scheduling method for a multi-core system according to some embodiments of the present application. As shown in fig. 2, a core scheduling method according to some embodiments of the present application may include:
s210, a task execution request receiving step;
s220, a step of obtaining a grading task distribution mode;
s230, determining a scheduling priority;
s240, acquiring a task allocation state; and
and S250, determining a target core.
Fig. 3A schematically illustrates a corresponding entity architecture diagram for a core scheduling method for a multi-core system according to some embodiments of the present application.
As shown in fig. 3A, a core scheduling method for a multi-core system according to some embodiments of the present application involves information interaction between a target application 310, a multi-core array 320 (i.e., a core array formed by each core of the multi-core system), and a core scheduling platform 330. In FIG. 3A, a multi-core array 320 may indicate the physical arrangement of the cores in the multi-core system 120 shown in FIG. 1. As shown in fig. 3A, the multi-core array 320 may be a square array of n × n cores, where n is an integer greater than or equal to 2. It is noted that the multi-core array 320 may alternatively take on other physical arrangements of shapes, such as rectangles, diamonds, and the like. As shown in fig. 3A, a core scheduling platform 330 for implementing the core scheduling method according to the present application may include: an application interaction interface 331 for interacting with the target application 310, such as receiving task execution requests therefrom and sending identifiers of assigned cores thereto; the core scheduling component 332 is configured to implement core scheduling according to the task execution request received by the application interaction interface 331 and according to the hierarchical task allocation mode, the task load level of the task to be executed, and the core task allocation state. Optionally, as shown in fig. 3A, the core scheduling platform 330 may further include: a core state recording table 333, configured to record a working state of each core in the multi-core array 320, for example, a core identifier, a task allocation state, a power state, a task load level to be allocated, and the like; and a power management component 334, configured to control power on and/or power off of each core according to the state of the core or the core state record table 333, so as to reduce overall power consumption of the multi-core system or the multi-core array 320 to the maximum extent while ensuring normal operation of the multi-core system or the multi-core array.
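For illustration, one row of the core state record table 333 described above might take the following shape; the field names and types are the editor's assumptions, not the patent's:

```python
from dataclasses import dataclass

@dataclass
class CoreStateRecord:
    core_id: int            # core identifier within the multi-core array
    allocated: bool         # task allocation state (assigned / unassigned)
    power_on: bool          # power state of the core's power domain
    expected_level: int     # task load level this core is expected to receive
```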
Fig. 3B shows the internal structure and interaction diagram of the application interaction interface 331 and the core scheduling component 332 in fig. 3A. The application interaction interface 331 may also be referred to as a software and hardware interaction interface (i.e., an interaction interface of a target application (software) with the multi-core system 320 or the core scheduling platform (hardware) 330).
As shown in fig. 3B, the application interaction interface 331 may include a plurality of registers: the task request register TASK_REQ, the task load level register LOAD_LEVEL, and the task ranking parameter register MODE, respectively used for receiving and storing, from the target application 310, the task execution request, the task load level of the task to be executed, and the task ranking parameter corresponding to the target application. Optionally, as shown in fig. 3B, the application interaction interface 331 may further include a region scheduling parameter register MC_SIZE for receiving and storing the region scheduling parameter from the target application 310. Optionally, as shown in fig. 3B, the application interaction interface 331 may further include a target core identifier register CORE_SEL for receiving and storing the identifier (ID) of the scheduled or allocated target core from the core scheduling component 332 and feeding it back to the target application 310. As shown in fig. 3B, the application interaction interface 331 may optionally include a power domain parameter register PWR_SIZE and a region division parameter register CZ_SIZE for storing the multi-core system's power domain parameter and region division parameter, respectively. In some embodiments, the power domain parameter describes the number of cores and the shape of the core region covered by each power domain in the multi-core system, and may be represented as m × n, where m represents the number of horizontal cores and n the number of vertical cores of the core array corresponding to the power domain; the region division parameter describes the number and shape of the cores of each schedulable sub-region after the multi-core system, or the schedulable region therein, is divided, and its representation is consistent with that of the power domain parameter. In general, the power domain parameter and the region division parameter may be predetermined based on software and hardware conditions.
It should be noted that although the various registers in the application interaction interface 331 are shown as separate structures in fig. 3B, in some embodiments, two or more of them may be different components of the same register.
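A hedged Python mock-up of the interaction-interface registers named above follows; field widths, types, and reset values are not specified in the text and are assumed here:

```python
from dataclasses import dataclass

@dataclass
class AppInteractionInterface:
    TASK_REQ: int = 0    # task execution request from the target application
    LOAD_LEVEL: int = 0  # task load level of the task to be executed
    MODE: int = 0        # task ranking parameter of the target application
    MC_SIZE: int = 0     # region scheduling parameter (schedulable-region size)
    CORE_SEL: int = 0    # ID of the scheduled target core, fed back to software
    PWR_SIZE: str = ""   # power domain parameter, e.g. "2x2" (m x n cores)
    CZ_SIZE: str = ""    # region division parameter, same m x n convention
```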
Next, steps S210 to S250 shown in fig. 2 will be described in detail with reference to fig. 3A and 3B. As shown in the implementation environment of fig. 1 and the entity architecture diagram of fig. 3A, the above-described steps S210-S250 of the core scheduling method according to some embodiments of the present disclosure may be implemented in the core scheduling platform 130, 330.
In step S210 (task execution request receiving step), a task execution request including a task load level of a task to be executed is received from a target application. The task load level may indicate a task complexity level.
Generally, core scheduling in a multi-core system may include work scheduling of the plurality of cores, where work scheduling refers to task allocation for each core in the multi-core system, that is, deciding which core to schedule for tasks of different task load levels. Therefore, before performing core scheduling, a task execution request needs to be received from an application currently running on the computing device where the multi-core system is located, and the core scheduling process is then started in response to the task execution request.
According to the concept of the present application, in order to overcome the power density imbalance caused by excessive aggregation of high task loads in the core array of a multi-core system, core scheduling with controllable, balanced power density may be implemented through a hierarchical task allocation policy or pattern for the multi-core system. Thus, the task load level of the task to be executed, which indicates the complexity of the task, needs to be included in the task execution request. Optionally, the task execution request may further include a task ranking parameter for obtaining the hierarchical task allocation pattern in step S220. Further optionally, the task execution request may also include a region scheduling parameter for determining the schedulable core area.
Step S210 may be completed through the application interaction interface 331 shown in fig. 3B. As shown in the entity architecture diagrams of fig. 3A and 3B, the core scheduling platform 330 may receive and store the task execution request from the target application 310 through the task request register TASK_REQ of the application interaction interface 331, and the task load level of the task to be executed may be separately stored in the task load level register LOAD_LEVEL; the register LOAD_LEVEL then sends the task load level of the task to be executed to the target core determination module 332a of the core scheduling component 332.
In some embodiments, a task load level is a parameter that measures task complexity or computational load, and it may be used to characterize the computational or processing burden a processor core is expected to bear when executing or processing the corresponding task. By dividing tasks into task load levels, the various tasks of different complexities related to the target application can be classified into a plurality of task load levels, thereby simplifying the core scheduling process. The specific number of task load levels for a target application's tasks, and the computational load range corresponding to each level, may be determined according to the target application and the specific application scenario of each task it issues. As shown in fig. 3A and 3B, the task load level of the task to be executed may be predetermined by the target application 310 before being sent to the core scheduling platform 330, to simplify the operation of the core scheduling platform 330. Alternatively, the task load level of a task to be executed by the target application may also be determined by the core scheduling platform 330 by detecting the complexity of the task when receiving the task execution request.
In step S220 (hierarchical task allocation pattern obtaining step), a hierarchical task allocation pattern for the multi-core system is obtained, and the hierarchical task allocation pattern may include a correspondence between each core in a schedulable core area of the multi-core system and a plurality of task load levels expected to be allocated. The plurality of task load levels for the expected allocation are associated with the target application. Wherein the schedulable core area may be at least a portion of a core array of the multi-core system.
According to the core scheduling method of the present application, after a task execution request is received, a core may be scheduled using a hierarchical task allocation pattern based on the request, and thus the hierarchical task allocation pattern needs to be acquired before scheduling.
In some embodiments, the hierarchical task allocation pattern may be described as including a correspondence between a location of each core in the schedulable region in the multi-core array and a plurality of task load levels expected to be allocated, where the schedulable region is a core region screened from the core array of the multi-core system that includes the plurality of cores. The plurality of task load levels expected to be allocated are associated with the target application and may be determined, for example, based on task ranking parameters published by the target application. The task ranking parameter may include a plurality of task load levels for an expected allocation associated with the target application.
In some embodiments, the hierarchical task allocation pattern may be obtained based on the plurality of task load levels expected to be allocated that are associated with the target application. For example, the plurality of task load levels expected to be allocated can be obtained from the task ranking parameter, and the hierarchical allocation pattern corresponding to those levels can then be obtained. As shown in fig. 3B, step S220 may be completed in the task allocation pattern obtaining module 332b of the core scheduling component 332, i.e., based on the task ranking parameter received from the task ranking parameter register MODE, the corresponding hierarchical task allocation pattern is obtained and sent to the target core determination module 332a.
In some embodiments, the plurality of task load levels expected to be allocated are related to the task ranking parameter corresponding to the target application. Since the tasks related to the target application are the objects processed by the multi-core system, all the task load levels corresponding to the target application may be regarded as the task load levels to be allocated or expected to be allocated. In this way, the plurality of task load levels expected to be allocated may be defined directly as the task load levels indicated by the task ranking parameter of the target application (i.e., all the task load levels into which the respective tasks of the target application are divided), in one-to-one correspondence. For example, when the task ranking parameter indicates that the target application includes m task load levels, which can be defined as level 0, level 1, …, level m-1 in order of corresponding task complexity from low to high, the plurality of task load levels expected to be allocated can also be these m levels.
In some embodiments, the task ranking parameter may be defined as the total number of task load levels involved in or encompassed by the target application (i.e., the total number of task load levels expected to be allocated); it may optionally also include the task complexity range corresponding to each task load level, or the range of the plurality of task load levels (i.e., the difference between the task complexities of the highest and lowest levels). In this way, after the task ranking parameter corresponding to the target application is obtained, the hierarchical task allocation pattern of the multi-core system (for the tasks issued by the target application) can be determined according to the task ranking parameter.
In some embodiments, the hierarchical task allocation pattern may be derived from the plurality of task load levels expected to be allocated. On the one hand, the pattern may be selected from a predetermined (fixed) plurality of candidate hierarchical task allocation patterns according to the total number of task load levels expected to be allocated. For example, a plurality of candidate hierarchical task allocation patterns may be predetermined and stored in a database before core scheduling is performed; when core scheduling is required, the candidate patterns are first extracted from the database, and a suitable candidate pattern (corresponding to the task ranking parameter of the target application) is then selected from them as the basis of the current core scheduling. For example, when the task ranking parameter indicates that the target application includes m task load levels, the number of task load levels expected to be allocated is m, and the matching hierarchical pattern covering m expected task load levels can therefore be selected from the candidates. On the other hand, in addition to the total number of task load levels expected to be allocated, the hierarchical task allocation pattern may be obtained according to the range of those levels (i.e., the relative and/or absolute complexity of the tasks corresponding to each task load level involved in the target application); see the second hierarchical task allocation pattern shown in FIGS. 6C and 6D for details.
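A sketch of this selection logic follows, matching on the number of expected load levels and, for three levels, picking a sub-pattern by comparing the range against the thresholds; the threshold values, tags, and `candidates` structure are placeholders assumed by the editor:

```python
FIRST_RANGE_THRESHOLD = 2   # assumed; must be <= SECOND_RANGE_THRESHOLD
SECOND_RANGE_THRESHOLD = 2

def choose_pattern(num_levels, level_range, candidates):
    """candidates: dict mapping (num_levels, sub_pattern_tag) -> pattern."""
    if num_levels == 2:
        return candidates[(2, None)]                 # first hierarchical pattern
    if num_levels == 3:
        if level_range < FIRST_RANGE_THRESHOLD:
            return candidates[(3, "first_sub")]      # first sub-pattern
        if level_range >= SECOND_RANGE_THRESHOLD:
            return candidates[(3, "second_sub")]     # second sub-pattern
    raise KeyError("no candidate pattern for this level configuration")
```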
As shown in fig. 3A and 3B, the core scheduling platform 330 may receive and store the task ranking parameter corresponding to the target application from the target application 310 in advance (for example, before the target application issues the task execution request) through the task ranking parameter register MODE of the application interaction interface 331. Then, when core scheduling is required, the core scheduling component 332 of the core scheduling platform 330 may extract the task ranking parameter from the register MODE for obtaining the hierarchical task allocation mode.
In some embodiments, the task ranking parameter corresponding to the target application may be determined according to the application scenario of the target application and/or of its respective tasks. An application scenario here refers to the information processing manner, with its associated computational load, corresponding to each task included in the target application, and may include, for example, simple data reading and writing, high-speed information interaction, and complex data computation. Since the application scenario of each task issued by the target application determines the complexity of processing or executing that task, the task load level corresponding to each task in the target application can be determined based on the application scenario.
For example, when the application scenario of a task is complex data computation, the task may be classified into a higher task load level because it consumes more time and imposes a larger computational load, while a task involving only simple data read-write operations may be classified into a lower task load level. For example, if the tasks of the target application cover application scenarios involving m information processing manners with different computational loads, all tasks involved in the target application may be divided into m task load levels; the number of task load levels expected to be allocated is then m, and optionally the levels can be defined as level 1, level 2, …, and level m in order of corresponding task complexity from low to high. Thus, the task ranking parameter may be defined as m.
In some embodiments, the schedulable core region may be the entire multi-core array region or a portion screened out of it. The purpose of screening the schedulable core region is to reduce the core scheduling range of the multi-core array for the target application, thereby simplifying the task allocation process, improving efficiency, and reducing energy consumption. For example, after a part of the multi-core array is selected as the schedulable core region, the power of all cores outside the schedulable core region can be turned off directly, so as to save energy and reduce power consumption. The schedulable core region may be determined in advance, before core scheduling, according to the number of cores the target application requires to run or process tasks, or based on a region scheduling parameter that the target application derives from the number of required cores.
For example, in the case where the number of processor cores required for the target application to run cannot be predicted, the entire multi-core array region may be determined as the schedulable core region. On the other hand, the selection of the schedulable core region may also take into account the current operating state of each core in the multi-core array 320. For example, some cores in the multi-core array 320 may be executing tasks issued by applications other than the target application (non-idle cores) and cannot process other tasks at the same time; such cores need to be excluded from the schedulable core region.
In some embodiments, the correspondence between each core in the schedulable core region and the plurality of task load levels expected to be allocated, included in the hierarchical task allocation pattern, may be characterized by the arrangement of the expected task load levels over the core positions in the schedulable core region of the multi-core array; see figs. 6A-6D and their corresponding descriptions. The position of a core here refers to the physical position of the core in the core array, including, for example, an absolute position or a relative position.
A hierarchical task allocation mode expressed as the correspondence between the cores to be scheduled and the task load levels to be allocated (or as the arrangement of the task load levels over the core positions to be scheduled) can intuitively indicate how different task load levels are deployed in the multi-core array. The expected power density of the multi-core array can therefore be controlled through the positional deployment of different load levels; in particular, tasks of the same load level (especially a high load level) can be deployed dispersedly to avoid locally unbalanced or excessive power density in the multi-core array.
Since the task ranking parameter may indicate how the task load levels related to the target application are divided, all task load levels involved by the target application, i.e., the plurality of task load levels expected to be allocated, may be obtained based on the task ranking parameter. Assuming, for example, that the task ranking parameter is defined as the total number m of different task load levels involved by the target application, a hierarchical task allocation pattern containing m expected task load levels may be selected from the fixed candidate patterns shown in figs. 6A-6E for task allocation or core scheduling of the target application.
In some embodiments, the hierarchical task allocation pattern obtaining step S220 may be completed before the task execution request arrives. In other words, the hierarchical task allocation pattern may first be obtained or determined for the target application in advance, and the task allocation and core scheduling process related to the target application may then be started.
In step S230 (scheduling priority determining step), the scheduling priority of each core in the schedulable core area is determined according to the positional relationship between each core in the schedulable core area and the predetermined core.
In some embodiments, a predetermined core refers to a predetermined starting core to be scheduled, e.g., the core with Core_ID = 0. Alternatively, the predetermined core is not limited to the starting core or Core_ID = 0, and may be any other core in the multi-core array. The scheduling priority of each core in the schedulable region is the order in which the cores are scheduled to execute the tasks issued by the target application, that is, the order in which the cores are assigned tasks. In some embodiments, the scheduling priority of each core may be determined directly by the order of the cores' physical positions (e.g., from left to right and top to bottom, as indicated by the arrows in figs. 6A-6E), which may correspond one-to-one with Core_ID (i.e., the smaller the Core_ID, the higher the scheduling priority). Figs. 6A-6C and 6E also show schedulable sub-regions in different grayscales, similar to fig. 4; the grayscale represents the first scheduling order of each schedulable sub-region, from which the scheduling priority of the cores in that sub-region can be derived. In other words, the grayscale in the figures reflects the core scheduling priority order to some extent.
Based on the concept of the present application, in order to implement unified centralized management of mutually adjacent cores, the scheduling priorities of the cores may be defined so as to schedule cores relatively compactly while still following the hierarchical task allocation pattern (i.e., the arrangement of the expected task load levels in the core array): the current task is preferentially allocated to a core close to the predetermined core (core No. 0 in the upper left corner, as shown in fig. 4), instead of simply defining the scheduling priorities in position order or in ascending Core_ID order (left to right, top to bottom, as shown in fig. 4). In other words, the scheduling priority of each core in the multi-core system (i.e., the order in which cores are assigned tasks) may be determined based on its positional relationship with the predetermined core: for example, the closer to the predetermined core, the higher the scheduling priority. When the predetermined core is the core with Core_ID = 0, this way of determining scheduling priority (i.e., task allocation order) may be referred to as the "near zero principle".
The objective of the near-zero principle is to perform core scheduling within a core area that is as compact as possible, thereby enabling unified centralized management of the scheduled cores. In some embodiments, the near-zero principle (i.e., determining the scheduling priority of each core in the schedulable core region according to its positional relationship with the predetermined core, as described in S230) may be applied directly based on distance from the predetermined core: for example, the predetermined core itself has the highest scheduling priority, and the scheduling priorities of the other cores decrease as their distance from the predetermined core increases.
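As a purely illustrative sketch of this distance-based prioritization, assuming a grid of core positions and squared Euclidean distance (the function name scheduling_priorities is hypothetical):

    def scheduling_priorities(rows, cols, origin=(0, 0)):
        """Order core positions from highest to lowest scheduling priority:
        the predetermined core (origin) comes first, and priority decreases
        as the distance to it grows."""
        def dist2(pos):
            return (pos[0] - origin[0]) ** 2 + (pos[1] - origin[1]) ** 2
        cores = [(r, c) for r in range(rows) for c in range(cols)]
        # sorted() is stable, so equidistant cores keep row-major
        # (left-to-right, top-to-bottom) position order as a tie-break.
        return sorted(cores, key=dist2)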
In some embodiments, in order to better implement regionalized management of cores, the near-zero principle may also be extended to operate on schedulable core sub-regions as a unit, based on the positional relationship or distance between each sub-region and the reference sub-region where the predetermined core is located; see fig. 4 and fig. 5 for details. In this case, each schedulable core sub-region may correspond to one power domain of the multi-core array, i.e., the range of each sub-region coincides exactly with the core range of one power domain, which facilitates regionalized power management. Optionally, a schedulable core sub-region may also differ from the power domain range, e.g., cover multiple power domains or only part of one power domain.
As shown in fig. 3B, step S230 may be completed in the scheduling priority determining module 332d of the core scheduling component 332. The scheduling priority determining module 332d may first obtain the schedulable core region from the schedulable core region determining module 332c, and at the same time obtain the region division parameter (i.e., the size and shape of the divided sub-regions) from, for example, the region division parameter register CZ_SIZE of the application interaction interface 331; optionally, the region division parameter may be the same as the power domain parameter stored in the power domain parameter register PWR_SIZE. Next, the schedulable core region is divided based on the region division parameter to obtain a plurality of schedulable core sub-regions. Finally, the scheduling order of each sub-region is determined based on its positional relationship with the reference schedulable core sub-region, the scheduling priority of the cores in each sub-region is determined accordingly, and the result is sent to the target core determination module 332a.
Alternatively, the scheduling priority of each core in the multi-core system may be predetermined before task allocation or core scheduling, and then may be stored in a core state record table as shown in fig. 3A. Therefore, in the processes of receiving, distributing and executing the tasks, the scheduling priority of the target core can be directly obtained from the core state record table, and the whole core scheduling process of the multi-core system is optimized.
In step S240 (task allocation state acquisition step), task allocation states of a plurality of cores in the schedulable core area are acquired.
According to the concept of the present application, to implement core scheduling with balanced power density, it is necessary to obtain a task allocation state of each core to be scheduled in a schedulable region before allocating a task to the core. The task allocation status of a core may indicate whether the core has allocated a task. For example, when a core in the schedulable core area is in the assigned task state, it indicates that the core has been scheduled to process or is ready to process the corresponding task, i.e., is in a working state, and thus cannot accept a new task assignment; and if the core is in the unallocated task state, indicating that the core is currently in an idle state and can accept new task allocation. Therefore, before core scheduling or task allocation is performed, it is necessary to know the current task allocation state of each core to be scheduled, so as to avoid core scheduling confusion and task conflict.
In some embodiments, as shown in fig. 3B, the core scheduling component 332 may obtain the required task allocation status of each core directly from the core status record table 333, which records various real-time status information of the cores, including their task allocation status. Correspondingly, after core scheduling or task allocation is completed, and again after task processing is completed, the core scheduling component 332 may send the task allocation status of the corresponding core to the core status record table 333 to update the related information.
The core status record table 333 shown in fig. 3B may be created and managed by the core scheduling platform 330, and is used to record information such as the task allocation status, the assigned task load level, and the power switch status of each core of the current multi-core system. Table 1 schematically illustrates a core state record table of a multi-core system according to some embodiments of the present application. As shown in table 1, the core state record table may include the following six parameters: Core_ID, Assigned, Load_level, PWRID, PWR_off, and PWR_off_done, which represent the core identifier, the task allocation status, the assigned task load level, the power domain identifier, the power domain switch signal, and the power domain switch completion signal, respectively.
Table 1. Core state record table

[Table 1 appears as an image in the original publication. Its columns are Core_ID, Assigned, Load_level, PWRID, PWR_off, and PWR_off_done; the first three columns have one row per core, and the last three have one row per power domain.]
As shown in table 1, for a multi-core system that is a 16 × 16 array of 256 cores, the table provides one row per core. The three left columns (Core_ID, Assigned, Load_level) carry task-related information and therefore have 256 rows, since this information is independent for each core; the three right columns (PWRID, PWR_off, PWR_off_done) carry power-domain state information and have only 16 rows, since each power domain comprises 16 cores, i.e., 16 cores share one power domain. The depth and size of table 1 may be configured according to system requirements.
The Core_ID in table 1 is the identifier of each core, which is set once and does not change. As shown in table 1, Core_ID assigns a unique ID to each core of the multi-core system in sequence. Taking a 16 × 16 multi-core array based on a mesh NOC structure as an example, the Core_IDs may be assigned in the following order: starting with the first core at the top left of the array (i.e., the core in the first row and first column), the cores of the first row are numbered 0-15 from left to right, the cores of the second row 16-31, and so on, so that the Core_IDs of the 16 × 16 multi-core system finally range from 0 to 255. Alternatively, the IDs may be allocated according to the power domains to which the cores belong: a block of IDs of the corresponding size is allocated to each power domain in turn, following the position order of the power domains and the size of each domain (i.e., its number of cores), and specific IDs are then assigned within each power domain following the position order of its cores. For example, taking a 16 × 16 core array with a power domain parameter of 4 × 4, the 16 power domains may first be assigned 16 blocks of Core_IDs from left to right and top to bottom: the first power domain at the top left is assigned 0-15, the second power domain in the first row 16-31, and so on. Then, within the first power domain, the 16 cores are each assigned a Core_ID, still in left-to-right, top-to-bottom order: the cores in its first row receive 0-3, those in its second row 4-7, and so on. Likewise, in the second power domain, the cores in the first row receive 16-19 from left to right, those in the second row 20-23, and so on, until the ID allocation of all 256 cores is completed.
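For illustration only, the power-domain-ordered ID allocation described above can be sketched as follows (core_id is a hypothetical helper; the 16 × 16 array and 4 × 4 domains match the example):

    def core_id(row, col, array_cols=16, dom_rows=4, dom_cols=4):
        """Map a core's (row, col) position to its Core_ID when IDs are
        allocated domain by domain, then position by position inside a
        domain, both in left-to-right, top-to-bottom order."""
        doms_per_row = array_cols // dom_cols
        dom_index = (row // dom_rows) * doms_per_row + (col // dom_cols)
        offset = (row % dom_rows) * dom_cols + (col % dom_cols)
        return dom_index * (dom_rows * dom_cols) + offset

    assert core_id(0, 0) == 0    # first core of power domain 0
    assert core_id(1, 0) == 4    # second row of power domain 0
    assert core_id(0, 4) == 16   # first core of power domain 1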
Assigned in table 1 represents the task assignment status, i.e., whether the core has been assigned a task. Assigned defaults to 0, meaning no task is assigned; when a core is selected by the core scheduling platform 330 to execute a task to be allocated, its Assigned is set to 1. When the core completes the assigned task and returns to the idle state, Assigned is cleared again. There is a time difference between the moment the core scheduling platform 330 sets Assigned to 1 and the moment the target application actually transmits the task and processing begins, so the core may not yet have started processing when Assigned becomes 1; similarly, Assigned is cleared only after the core completes task processing, so the core is guaranteed to be fully released whenever Assigned is 0. From assignment through completion of execution, Assigned remains 1, and no other task is allocated to the core during this period, so no task conflict arises.
Load_level in table 1 records the task load level of the software task assigned to the core. It is updated synchronously with Assigned: when a task is allocated, the task load level of the task to be executed is recorded in the row of the allocated core, and it remains unchanged during task execution. Load_level and Assigned are thus updated synchronously and fully correlated. When Assigned is 0, Load_level is conceptually Null, since no assigned task load level can exist while the core has no task. In a concrete implementation, when a core has no allocated task (i.e., Assigned is 0), Load_level may be set to a default initial value of 0, where this "0" has no practical meaning (it does not represent an actually assigned task load level) and is only an initial value chosen to satisfy the storage format. Since Assigned = 0 already implies that the assigned task load level is empty, the default initial value 0 of Load_level will not be mistaken for "assigned task load level 0". Only when Assigned is 1 does the value of Load_level have practical meaning, namely the specific value of the assigned task load level.
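A minimal sketch of this coupling between Assigned and Load_level, with an illustrative record layout (the dictionary fields mirror the columns of table 1; the helper names are assumptions):

    # One per-core record; Load_level 0 is only a storage placeholder here.
    record = {"Core_ID": 0, "Assigned": 0, "Load_level": 0}

    def assign_task(rec, load_level):
        rec["Assigned"] = 1
        rec["Load_level"] = load_level   # meaningful only while Assigned == 1

    def release_core(rec):
        rec["Assigned"] = 0
        rec["Load_level"] = 0            # placeholder again, not "level 0"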
PWRID in table 1 is the identifier of each power domain in the multi-core system, which likewise does not change once set. The number of power domains can be derived from the region scheduling parameter of the multi-core system and the power domain parameter of each domain. As shown in table 1, with a region scheduling parameter of 16 × 16 and a power domain parameter of 4 × 4, the multi-core system has 16 × 16 cores in total and each power domain comprises a 4 × 4 core array, so 16 power domains are needed and PWRID ranges from 0 to 15. As shown in table 1, the power domain PWRID = 0 powers the 16 cores with Core_ID 0-15, …, and the power domain PWRID = 15 powers the cores with Core_ID 240-255.
PWR_off in table 1 is the switch signal, indicating whether the power domain is commanded off. It defaults to 1, meaning the power domain is turned off (e.g., covering the state from the instant the power-off action is initiated until a power-on action is initiated); 0 means the power domain is turned on (e.g., covering the state from the instant the power-on action is initiated until a power-off action is initiated). PWR_off_done is the switch completion signal, indicating whether the power domain is actually fully off: 1 means the domain is fully off, and 0 means it is not fully off. Only by combining PWR_off_done with PWR_off can the exact power state of the power domain be derived.
In some embodiments, if the switch signal PWR_off is 1 and the switch completion signal PWR_off_done is 1, the power domain is in the completely-off state, as shown for the power domain with PWRID 15 in table 1; if PWR_off is 1 and PWR_off_done is 0, the power domain is in the powering-down state; if PWR_off is 0 and PWR_off_done is 0, the power domain is in the completely-on state, as shown for the power domain with PWRID 0 in table 1; and if PWR_off is 0 and PWR_off_done is 1, the power domain is in the powering-up state.
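These four combinations can be summarized, purely for illustration, as the following lookup (power_state is a hypothetical helper):

    def power_state(pwr_off, pwr_off_done):
        """Decode the power state of a domain from (PWR_off, PWR_off_done)."""
        states = {
            (1, 1): "completely off",
            (1, 0): "powering down",  # off commanded, not yet complete
            (0, 0): "completely on",
            (0, 1): "powering up",    # on commanded, domain still off
        }
        return states[(pwr_off, pwr_off_done)]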
In step S250 (target core determining step), a target core for processing the task to be executed is determined from the schedulable core area according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation status of each core in the schedulable core area.
As shown in fig. 3B, step S250 may be performed in the target core determining module 332a of the core scheduling component 332: a target core suitable for processing the task to be executed is selected from the schedulable region according to the task load level of the task to be executed obtained from the task load level register LOAD_LEVEL, the task allocation status of the cores in the multi-core array (especially in the schedulable region) obtained from the core status record table 333, and the hierarchical task allocation pattern obtained from the task allocation pattern obtaining module 332b.
In some embodiments, the core scheduling scheme based on the hierarchical task allocation mode may screen the schedulable region layer by layer, based on different factors (the task load level of the task to be executed, the hierarchical task allocation mode, and the task allocation state of the core array), to obtain a target core that meets expectations. Specifically, the core scheduling component 332 may first screen out the cores in the idle state (i.e., whose task allocation state is unallocated) from the schedulable core region, according to the task allocation state of each core, to form a first candidate core region. Then, according to the correspondence between the cores in the schedulable core region and the expected task load levels included in the hierarchical allocation mode (i.e., the arrangement of the task load levels over the core positions of the schedulable core region), it may screen out from the first candidate core region the cores whose expected task load level matches the task load level of the task to be executed, forming a second candidate core region. Finally, the target core is selected from the second candidate core region according to a predetermined rule. The predetermined rule may include, for example, choosing in position order (e.g., laterally left to right, longitudinally top to bottom), or choosing randomly.
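By way of illustration only, the following sketch shows this layer-by-layer screening; all identifiers (pick_target_core, pattern, assigned, priority) are illustrative assumptions and the inputs are assumed shapes:

    def pick_target_core(load_level, pattern, assigned, priority):
        """pattern[pos]: expected task load level at core position pos;
        assigned[pos]: task allocation state (True = already assigned);
        priority: schedulable positions, highest scheduling priority first."""
        # 1st screen: idle cores only (first candidate core region)
        idle = [pos for pos in priority if not assigned[pos]]
        # 2nd screen: expected level matches the task to be executed
        # (second candidate core region)
        matching = [pos for pos in idle if pattern[pos] == load_level]
        # Final pick by a predetermined rule; here, highest priority first.
        return matching[0] if matching else None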
In the core scheduling method for a multi-core system according to some embodiments of the present application, first, tasks are processed hierarchically: the tasks to be executed are divided into task load levels according to task complexity, which simplifies the representation of the computational or execution complexity of each task, simplifies the subsequent task allocation or core scheduling process, and improves efficiency. Second, by using a hierarchical task allocation mode (i.e., scheduling or allocating cores according to task load levels), that is, a correspondence between the cores of the multi-core array and the expected task load levels, tasks of different load levels can be allocated or arranged evenly over the core positions of the schedulable region (for example, cores corresponding to higher-load and lower-load tasks can be interleaved), so that the power density across the whole or any local part of the multi-core array remains relatively balanced. This avoids the local high-load states, excessive power density, and regional heating of the core array caused by clustering high-load tasks on adjacent cores, and effectively improves the overall performance of the multi-core system.
In addition, while the hierarchical task allocation mode realizes a balanced arrangement of different task load levels, the core allocation order for the current task to be executed is defined by the scheduling priority determined from the positional relationship (e.g., positional closeness) between each core and a predetermined core (e.g., the starting core with ID = 0 at the upper left corner of the core array), for example according to the near-zero principle. Each task is thus preferentially allocated within the area near the predetermined core (e.g., the supply area of the same power domain), which facilitates centralized, unified management of the scheduled cores, especially regionalized power management, and significantly improves the core management efficiency of the multi-core system. At the same time, unified regionalized power management (e.g., putting a scheduled core area into a power sleep mode when all of its cores are idle) reduces energy consumption, power supply life loss, and efficiency loss (e.g., losses caused by frequently switching power supplies).
Fig. 4 illustrates a core scheduling priority diagram according to some embodiments of the present application. Fig. 5 illustrates an example process of core scheduling prioritization in accordance with some embodiments of the present application.
As shown in fig. 5, step S230 (scheduling priority determining step) shown in fig. 2 may include: S231, a schedulable core region dividing step; S232, a first scheduling order determining step; and S233, a core scheduling priority determining step. These steps are explained in detail below with reference to fig. 4.
In S231, the schedulable core region is divided into a plurality of schedulable core sub-regions, where the plurality of schedulable core sub-regions includes a reference schedulable core sub-region where the predetermined core is located.
In some embodiments, dividing the multi-core array, or the schedulable core region within it, is a precondition for implementing the near-zero principle and regionalized core management, because after the division the scheduling order of each schedulable sub-region, and hence the scheduling priority of the cores within it, can be determined from the positional relationship between the sub-region and the predetermined core. For convenience of operation and management, the divided schedulable sub-regions may have the same shape and the same number of cores. As shown in fig. 4, a 16 × 16 multi-core array, or its schedulable core region, may be divided into 16 schedulable core sub-regions numbered 0-15, each being a 4 × 4 square core array; the grayscale in fig. 4 represents the scheduling order of the different schedulable core sub-regions 0-15, i.e., the scheduling priority of their cores.
In some embodiments, this division may follow the distribution and size of the power domains in the multi-core system to facilitate regionalized power management. For example, the schedulable core region is divided into schedulable core sub-regions that correspond one-to-one to the core ranges of the power domains of the multi-core system, i.e., the cores in each sub-region lie in the same power domain and share its supply. As shown in fig. 4, assuming each power domain powers a 4 × 4 square array of 16 cores, the schedulable core sub-regions may likewise be set to 4 × 4 square core arrays. A near-zero principle implemented over sub-regions that coincide with power domains thus facilitates centralized, unified regional power management of the scheduled cores in the multi-core system and improves power management efficiency.
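For illustration, dividing a schedulable core region into power-domain-aligned sub-regions can be sketched as follows (split_into_subregions is a hypothetical helper; the sizes follow the 16 × 16 array with 4 × 4 domains of fig. 4):

    def split_into_subregions(rows=16, cols=16, sub=4):
        """Return a dict mapping each sub-region index (row-major order)
        to the list of core positions it contains."""
        subregions = {}
        for r in range(rows):
            for c in range(cols):
                idx = (r // sub) * (cols // sub) + (c // sub)
                subregions.setdefault(idx, []).append((r, c))
        return subregions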
Among the divided schedulable core sub-regions, as described in S231, there is one particular schedulable core sub-region, i.e., the reference schedulable core sub-region where the predetermined core is located. The reference schedulable core sub-region serves the same function as the predetermined core in S230, i.e. is used as a reference for calculating the scheduling order for each schedulable core sub-region (e.g. the closer to the reference schedulable core sub-region, the earlier the scheduling order).
In S232, a first scheduling order for each schedulable core sub-region is determined based on a distance between each schedulable core sub-region and a reference schedulable core sub-region.
As shown in fig. 4, assuming the core with Core_ID 0 in the top left corner is the predetermined core, the 4 × 4 square core array containing it in the top left corner is the reference schedulable core sub-region. In the above step, the first scheduling order indicates the scheduling priority, or ordering, of the schedulable core sub-regions. As shown in fig. 4, the distance between a 4 × 4 schedulable core sub-region and the top-left reference sub-region may be defined as the distance between the cores at corresponding positions in the two regions: for example, the distance between the top-left core of the sub-region immediately to the right of the reference sub-region (i.e., Core_ID = 5) and the predetermined core (Core_ID = 0) at the top left of the reference sub-region is taken as the distance between these two regions. In some embodiments, S232 may include: sorting the schedulable core sub-regions by their distance to the reference sub-region from small to large, and determining the first scheduling order of each sub-region according to the sorting result, so that the first scheduling order is consistent with that result.
The grayscale in fig. 4 represents the first scheduling order of each schedulable core sub-region: the deeper the grayscale, the earlier the first scheduling order. As shown in fig. 4, the reference schedulable sub-region at the top left contains the predetermined core (the zero-numbered core at the top left of fig. 4), so its grayscale is the deepest, meaning its first scheduling order comes first; it may be assigned the value 1, i.e., ranked first. As fig. 4 also shows, as the distance (e.g., the lateral and longitudinal distance) from the reference sub-region grows, the grayscale of each 4 × 4 schedulable sub-region becomes progressively lighter, meaning that the value (i.e., rank) of a sub-region's first scheduling order is positively correlated with its distance from the reference sub-region: the greater the distance, the larger the value, and the later the scheduling order.
In some embodiments, the distance between two schedulable sub-regions may be determined from the lateral and longitudinal distances between them, since these fully determine the actual distance and are easy to compute for array-shaped sub-regions. As shown in fig. 5, step S232 may include: S232a, calculating the lateral distance and the longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region; and S232b, determining the first scheduling order of each schedulable core sub-region based on these lateral and longitudinal distances.
In some embodiments, the distance between two schedulable sub-regions may be an absolute distance or a non-absolute distance. For example, the distance between sub-regions may be defined as the geometric distance between their center points (i.e., the two-dimensional Euclidean distance), namely the square root of the sum of the squares of the lateral and longitudinal distances between the center points. Furthermore, as shown in fig. 4, two or more schedulable core sub-regions with the same grayscale (i.e., the same distance from the reference sub-region) may first be given the same first scheduling order (relative to sub-regions of other grayscales), and their individual first scheduling orders may then be updated in position order (e.g., left to right, top to bottom) or, alternatively, at random.
In some embodiments, the determining a first scheduling order for each schedulable core sub-region based on a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region may include: for each schedulable core sub-region, calculating a sum of squares of a lateral distance and a longitudinal distance between the schedulable core sub-region and the reference schedulable core sub-region; determining a first scheduling sequence of each schedulable core sub-region based on a sum of squares of the lateral distance and the longitudinal distance corresponding to each schedulable core sub-region; in response to there being at least two schedulable core sub-regions of the plurality of schedulable core sub-regions having a same first scheduling order, updating the first scheduling order for the at least two schedulable core sub-regions based on a longitudinal distance between each of the at least two schedulable core sub-regions and the reference schedulable core sub-region.
For example, as shown in fig. 4, schedulable core sub-region No. 1 and schedulable core sub-region No. 4 are at the same distance from the reference schedulable core sub-region No. 0 (i.e., the sums of the squares of their lateral and longitudinal distances to No. 0 are equal), so their first scheduling orders are initially the same, e.g., both assigned 1. To distinguish their scheduling orders, the first scheduling order may then be updated by longitudinal distance from small to large, following the position arrangement order (horizontal first, then vertical, i.e., left to right and top to bottom): the first scheduling order of No. 1 is updated to 1+0 (No. 1 is to the right of No. 0, at the same longitudinal position, so its longitudinal distance is 0), and that of No. 4 is updated to 1+1 (No. 4 is below No. 0, with a longitudinal distance greater than zero). After this update, the first scheduling orders of sub-regions No. 1 and No. 4 are distinguished (No. 1 precedes No. 4 per the position arrangement order), resolving the scheduling order conflict.
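By way of illustration, the following sketch reproduces this ordering rule, using sub-region (row, column) indices with the reference sub-region assumed at (0, 0); first_scheduling_order is a hypothetical helper:

    def first_scheduling_order(subregion_coords, ref=(0, 0)):
        """Order sub-regions by the sum of squares of their lateral and
        longitudinal distances to the reference sub-region; ties are broken
        by longitudinal distance (position order, horizontal first)."""
        def key(rc):
            dy, dx = rc[0] - ref[0], rc[1] - ref[1]
            return (dx * dx + dy * dy, abs(dy))
        return sorted(subregion_coords, key=key)

    # Sub-regions No. 1 at (0, 1) and No. 4 at (1, 0) are equidistant from
    # No. 0 at (0, 0); the longitudinal tie-break schedules (0, 1) first.
    assert first_scheduling_order([(1, 0), (0, 1)]) == [(0, 1), (1, 0)]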
In S233, the scheduling priority of each core in each schedulable core sub-region is determined at least according to the first scheduling order of each schedulable core sub-region.
After the first scheduling order of each schedulable core sub-region is determined, the scheduling priority of each core within each sub-region is finally determined based on it. In some embodiments, a second scheduling order may first be defined to indicate the internal scheduling order of the cores within the same schedulable core sub-region; the scheduling priority of each core is then obtained from the first scheduling order of its sub-region (the outer order) and its second scheduling order within the sub-region (the inner order). The second scheduling order may be generated randomly, or created according to the positions of the cores (for example, left to right, then top to bottom).
In some embodiments, as shown in fig. 5, step S233 may include: S233a, determining the second scheduling order of the cores in each schedulable core sub-region according to their position arrangement order; and S233b, determining the scheduling priority of each core in the schedulable core region from the first scheduling order of its sub-region and its second scheduling order within the sub-region. Specifically, the scheduling priority of a core may be written as "first scheduling order + second scheduling order": cores are sorted first by the first scheduling order, and cores with the same first scheduling order are then sorted by the second scheduling order, so the scheduling order of every core in the schedulable region is uniquely determined. As shown in fig. 4, for example, if the first scheduling order of schedulable sub-region No. 1 is assigned 1, and the second scheduling order of the core at its upper-left corner is determined to be 0 by position arrangement order (e.g., horizontal first, then vertical, or vertical first, then horizontal), the scheduling priority of that core may be written as 1+0, where the 1 before the plus sign is the first scheduling order (the region order) and the 0 is the second scheduling order (the inner order).
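A minimal sketch of this "first scheduling order + second scheduling order" priority, under the assumption that region_order maps each sub-region's (row, column) index to its first scheduling order (all names are illustrative):

    def core_priority(core_pos, region_order, sub=4):
        """Scheduling priority of a core as the pair (first scheduling
        order of its sub-region, second scheduling order inside the
        sub-region), compared lexicographically when sorting."""
        r, c = core_pos
        first = region_order[(r // sub, c // sub)]   # outer (region) order
        second = (r % sub) * sub + (c % sub)         # inner: left-right, top-down
        return (first, second)

    # Cores are then scheduled in ascending priority value:
    # sorted(cores, key=lambda pos: core_priority(pos, region_order))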
Figs. 6A-6E schematically illustrate hierarchical task allocation patterns according to some embodiments of the present application. As shown in figs. 6A-6E, the core array of the multi-core system is drawn as a 16 × 16 square lattice, with the rows numbered 0-15 from top to bottom and the columns 0-15 from left to right. Each square represents a core position, and the number in each square is the expected task load level for that position, so the numbered lattice in figs. 6A-6E represents the correspondence between the cores of the schedulable core region and the expected task load levels, i.e., the hierarchical task allocation pattern. Additionally, as shown in figs. 6A-6C, the grayscales represent the first scheduling order of the schedulable sub-regions, and the arrows in the lattice represent the second scheduling order of the cores, i.e., cores are allocated to the tasks to be executed from left to right horizontally and from top to bottom vertically. As illustrated, the scheduling priority of the cores can be determined from the first and second scheduling orders.
In some embodiments, in the hierarchical task allocation pattern obtained in S220, the schedulable core region may include a plurality of classes of core regions corresponding one-to-one to the plurality of expected task load levels, each class including a plurality of non-adjacent sub-regions, and each sub-region including one core or at least two adjacent cores. Two cores are "adjacent" here if they are physically immediately next to each other, i.e., no other core lies between them; three or more cores are "adjacent" if each of them is immediately next to at least one other core of the group. In a multi-core array, "adjacent" chiefly means laterally or longitudinally adjacent.
As shown in fig. 6A, the schedulable core region includes four classes of core regions, i.e., the level 0, level 1, level 2, and level 3 regions shown by dotted lines. Each class includes a plurality of non-adjacent sub-regions, i.e., sub-regions of the same class are never adjacent, and each sub-region includes only one core. In other words, any two adjacent sub-regions belong to different classes. As shown in fig. 6A, in the first row of the schedulable region, the level 0 region includes 4 sub-regions, the level 1 region 2 sub-regions, the level 2 region 1 sub-region, and the level 3 region 1 sub-region. As shown in figs. 6A-6D, the sub-regions of a same-class region are non-adjacent in both the lateral and longitudinal directions of the schedulable region's array, with one or more sub-regions of other classes between them; for example, in fig. 6A, a level 1, level 2, or level 3 sub-region lies between adjacent sub-regions of the level 0 region.
As shown in fig. 6D, the schedulable core region includes three classes of core regions, i.e., the level 0, level 1, and level 2 regions shown by dotted lines, where same-class regions are not adjacent. The sub-regions differ in the number of cores they contain: each sub-region of the level 1 and level 2 regions includes only one core (a core corresponding to load level 1 or 2, respectively), whereas the level 0 region includes four sub-regions which, as shown by the dotted lines, contain 1, 3, 4, and 5 cores respectively. Within each multi-core sub-region the cores are arranged adjacently (i.e., pairwise adjacent in sequence), so these sub-regions can be seen as connected regions.
As shown in fig. 6A-6D, such non-adjacent arrangement of the same-class sub-areas (i.e., the core areas formed by cores corresponding to the same task load level) can distribute tasks of the same load level to a plurality of cores relatively dispersed in physical location, thereby avoiding power density imbalance caused by excessively concentrated distribution of a large number of tasks of the same load level (especially, high load level) to cores adjacent to each other.
In some embodiments, the hierarchical task allocation patterns shown in figs. 6A-6E may be predetermined (fixed) candidate hierarchical task allocation patterns. Upon receiving a task execution request, the core scheduling platform 330 may select an appropriate one of these candidates according to the task ranking parameter to implement core scheduling.
Fig. 7 schematically illustrates a flow chart of a core scheduling method for a multi-core system according to some embodiments of the present application. As shown in fig. 7, in addition to steps S210 to S250, the core scheduling method shown in fig. 2 may further include, before step S220 (hierarchical task allocation pattern obtaining step):
S260, a region scheduling parameter obtaining step: obtaining the region scheduling parameter corresponding to the target application, where the region scheduling parameter is determined based on the number of cores required for the target application to run;

S270, a schedulable core region determining step: determining the schedulable core region from the core array of the multi-core system according to the region scheduling parameter.
According to the core scheduling method, a schedulable core area screening process is added before the hierarchical task allocation mode is determined, so that the task allocation process is simplified, the working efficiency is improved, and the energy consumption is reduced. For example, the schedulable core regions of the multi-core system may be determined based on the number of cores required by the target application to run, i.e., one core region may be defined or screened out in the multi-core array 320 of the multi-core system as the schedulable core region for processing various tasks issued by the target application 310.
The region scheduling parameter corresponding to the target application may correspond to the number of cores required for the target application to run, and can thus be used to define the number of cores, the area, etc. of the schedulable core region, thereby determining that region. In particular, the region scheduling parameter may be defined directly as greater than or equal to the total number of cores required for the target application to run. In some embodiments, the target application 310 may predict the total number of processor cores needed to process the tasks it issues based on its specific application scenarios.
As shown in fig. 3B, before core scheduling begins, the region scheduling parameter register MC_SIZE in the application interaction interface 331 may receive and store the region scheduling parameter from the target application 310 and send it to the schedulable core region determination module 332c, which determines the size and shape of the schedulable core region of the multi-core system based on it. Optionally, the task execution request may itself include the region scheduling parameter, in which case step S260 (the region scheduling parameter obtaining step) simplifies to: obtaining the region scheduling parameter from the task execution request.
For example, the region scheduling parameter may be defined by the number of cores required for the target application to run: a region scheduling parameter of 64 corresponds to a schedulable core region containing 64 cores (which may be contiguous or non-contiguous); concretely, 64 or more cores may be selected at random in the multi-core array 320 to form the schedulable core region. Alternatively, the region scheduling parameter may include X-size and Y-size (in units of length or numbers of cores) defining the lateral and longitudinal size, or the lateral and longitudinal core counts, of the schedulable core region or array, so that a rectangular schedulable core region can be determined.
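For illustration, deriving a rectangular schedulable core region from X-size and Y-size might look as follows (schedulable_region is a hypothetical helper; anchoring at the top-left corner follows fig. 6A):

    def schedulable_region(x_size, y_size):
        """Return the set of (row, col) core positions in a rectangular
        schedulable core region anchored at the array's top-left corner."""
        return {(r, c) for r in range(y_size) for c in range(x_size)}

    region = schedulable_region(8, 8)   # rows 0-7, columns 0-7: 64 cores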
In the multi-core arrays of figs. 6A-6E, the numbered squares form the schedulable core region of the multi-core array. Fig. 6A illustrates a hierarchical task allocation pattern whose schedulable core region is only part of the multi-core array. As shown in fig. 6A, in the 16 × 16 core array, when the region scheduling parameters X-size and Y-size each equal 8 cores, the corresponding schedulable core region may be the 8 × 8 core region at the upper left of the array (i.e., the connected region of 64 cores covering rows 0-7 and columns 0-7). After this 8 × 8 schedulable core region (shown in white) is selected, all cores of the 16 × 16 multi-core array outside it (shown in non-white) may be powered off directly to save energy and substantially reduce power consumption.
It is apparent that the schedulable core regions in the hierarchical task allocation pattern shown in fig. 6B-6E are all the entire multi-core array of the multi-core system, i.e., the 16 x 16 array. As shown in fig. 6B-6E, in a 16 × 16 core array, when the region scheduling parameter is equal to 256 cores or X-size and Y-size are respectively equal to 16 cores, the entire multi-core array core region is selected as a schedulable core region.
In some embodiments, the acquisition of the hierarchical task allocation pattern may be based on the plurality of expected task load levels associated with the target application. For example, where the plurality of expected task load levels comprises a first task load level and a second task load level, the hierarchical task allocation pattern may be a first hierarchical task allocation pattern in which the schedulable core region includes a first class of core regions corresponding to the first task load level and a second class of core regions corresponding to the second task load level, and each sub-region of the first class and of the second class includes one core.
Fig. 6B illustrates a first hierarchical task allocation pattern in accordance with some embodiments of the present application. As shown in the lattice array of fig. 6B, the schedulable core area of the multi-core system is the entire core array area, and the schedulable core area is divided into two types of core areas: a first type of core area, i.e., a level 0 area corresponding to a level 0 task load level (i.e., a first task load level); and a second class of core regions, i.e., level 1 regions corresponding to a level 1 task load level (i.e., a second task load level). In other words, the core location labeled 0 in the lattice array corresponds to a level 0 task and the core location labeled 1 corresponds to a level 1 task. As shown by the dashed lines in fig. 6B, the sub-regions of the first type core region (level 0 region) include only one core (i.e., level 0 core), and the sub-regions of the second type core region (level 1 region) also include one core (i.e., level 1 core). In some embodiments, the second task load level (i.e., level 1) is greater than the first task load level (i.e., level 0), i.e., the former has a higher task complexity level than the latter.
As shown in fig. 6B, the sub-regions of the level 0 region (the first class core region) and those of the level 1 region (the second class core region) are interspersed and staggered throughout the core array in equal numbers, each class occupying 50% of the multi-core array. The first hierarchical task allocation pattern of fig. 6B can thus support core scheduling for two expected task load levels (e.g., 0 and 1): level 0 tasks are allocated to cores in the level 0 region, and level 1 tasks to cores in the level 1 region. Accordingly, when the plurality of expected task load levels determined from the task ranking parameter comprises two levels (the first and second task load levels), the first hierarchical task allocation pattern of fig. 6B may be selected.
As shown in fig. 6B, by interleaving tasks of the two load levels (level 0 and level 1) in the core array, the first hierarchical task allocation mode separates tasks of the higher task load level (level 1), i.e., of higher computational complexity, from tasks of the lower level (level 0), thereby preventing high-load tasks from clustering in the core array and effectively improving the power density balance of the multi-core system.
In some embodiments, where the plurality of task load levels expected to be allocated include a first task load level, a second task load level, and a third task load level, the hierarchical task allocation pattern may be a second hierarchical task allocation pattern in which the schedulable core regions include a third class of core regions corresponding to the first task load level, a fourth class of core regions corresponding to the second task load level, and a fifth class of core regions corresponding to the third task load level, each sub-region of the fourth class of core regions being non-adjacent to each sub-region of the fifth class of core regions. Optionally, the task complexity level corresponding to each of the second task load level and the third task load level is greater than the task complexity level corresponding to the first task load level. Optionally, the number of cores of the fourth type core area and the number of cores of the fifth type core area are each smaller than the number of cores of the third type core area.
Figs. 6C and 6D illustrate second hierarchical task allocation patterns according to some embodiments of the present application. As shown in figs. 6C and 6D, the schedulable core region is the entire core array region, and the core array (the lattice in the figures) is divided into three classes of core regions: the third class, i.e., the level 0 region corresponding to the level 0 task load level (first task load level); the fourth class, i.e., the level 1 region corresponding to the level 1 task load level (second task load level); and the fifth class, i.e., the level 2 region corresponding to the level 2 task load level (third task load level). In other words, a square (core position) labeled 0 in the lattice corresponds to a level 0 task, one labeled 1 to a level 1 task, and one labeled 2 to a level 2 task. As shown in figs. 6C and 6D, the sub-regions of the fourth class (level 1) and fifth class (level 2) regions each include only one core, the two classes are never adjacent, and their sub-regions are separated from one another by one or more sub-regions of the level 0 region. The second hierarchical task allocation pattern of figs. 6C and 6D can thus support core scheduling for three expected task load levels (e.g., 0, 1, and 2): level 0 tasks are allocated to cores in the level 0 region, level 1 tasks to cores in the level 1 region, and level 2 tasks to cores in the level 2 region. Optionally, the third and second task load levels are both greater than the first task load level.
Therefore, when the plurality of expected task load levels determined based on the task ranking parameter includes three levels (the first, second, and third task load levels), a second hierarchical task allocation pattern as shown in fig. 6C or 6D may be selected. As shown in fig. 6C and 6D, in the second hierarchical allocation pattern the level 0, level 1, and level 2 regions are interleaved in the multi-core array; in particular, the fourth class core region (level 1 region) and the fifth class core region (level 2 region), which correspond to the higher load levels, are not adjacent, so that the sub-regions of the level 1 region and the sub-regions of the level 2 region are spaced apart from each other. For example, in fig. 6C, adjacent 2s are separated by 0s and a 1, adjacent 1s are separated by 0s and a 2, and any 1 and 2 are separated by a 0. With this arrangement, higher-load tasks are distributed across the cores of the multi-core array more sparsely than lower-load tasks, avoiding the power density imbalance that results from excessive concentration of high-load tasks.
In some embodiments, the second hierarchical task allocation pattern comprises a first sub-pattern in which each sub-region of the third class core region, each sub-region of the fourth class core region, and each sub-region of the fifth class core region comprises a single core. Optionally, in the first sub-pattern, in each row and each column of the schedulable core area, the cores of the fourth class region are separated by at least one core of the third class region and at least one core of the fifth class region, and the cores of the fifth class region are separated by at least one core of the third class region and at least one core of the fourth class region.
Fig. 6C shows the first sub-mode of the second hierarchical task allocation pattern. As shown in fig. 6C, each sub-region of the level 0, level 1, and level 2 core regions includes only one core. In the first sub-mode, the three task load levels (level 0, level 1, level 2) are distributed in the multi-core array in the order 01020102 transversely and 02010201 longitudinally; viewed along the diagonal from the upper left to the lower right, the arrangement repeats as a diagonal of level 0 tasks, a diagonal of level 1 tasks, another diagonal of level 0 tasks, and a diagonal of level 2 tasks. In other words, in fig. 6C, in each row and column, each core of the level 1 region (fourth class core region) is separated from the next by two cores of the level 0 region (third class core region) and one core of the level 2 region (fifth class core region), and each core of the level 2 region is separated from the next by two cores of the level 0 region and one core of the level 1 region.
The purpose of the task load level arrangement shown in fig. 6C is to distribute tasks of different load levels across adjacently located cores and to separate the higher load levels (levels 1 and 2) from the lower load level (level 0), so as to avoid excessive clustering of high-load tasks. In the first sub-mode of the second hierarchical task allocation pattern shown in fig. 6C, the ratio of level 0, level 1, and level 2 tasks is 2:1:1. This both ensures a dispersed arrangement of the high-load-level tasks within the core array and keeps their absolute number low, thereby maintaining a relatively balanced power density in the schedulable area of the multi-core array.
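One closed-form rule consistent with the first sub-mode described above (rows reading 0102..., columns reading 0201..., diagonal stripes of 0/1/0/2, and the 2:1:1 ratio) is sketched below; the rule is inferred from the description and is not taken verbatim from the patent.

# Level along successive top-left to bottom-right diagonals (inferred).
_DIAG_STRIPE = (0, 1, 0, 2)

def first_sub_mode_level(r: int, c: int) -> int:
    """Expected task load level of the core at row r, column c (fig. 6C sketch)."""
    return _DIAG_STRIPE[(c - r) % 4]

if __name__ == "__main__":
    grid = [[first_sub_mode_level(r, c) for c in range(8)] for r in range(8)]
    for row in grid:
        print(" ".join(map(str, row)))
    flat = [v for row in grid for v in row]
    # 32 zeros, 16 ones, 16 twos on an 8x8 array, i.e. the 2:1:1 ratio.
    print("counts:", flat.count(0), flat.count(1), flat.count(2))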
In some embodiments, the second hierarchical task allocation pattern comprises a second sub-mode. In the second sub-mode, in the odd rows and odd columns of the schedulable core area, the cores of the fourth class region are separated by one or more cores of the third class region; in the even rows and even columns of the schedulable area, the cores of the fourth class region are separated by at least one core of the third class region and at least one core of the fifth class region, and, optionally, the cores of the fifth class region are separated by at least one core of the third class region and at least one core of the fourth class region.
Fig. 6D shows the second sub-mode of the second hierarchical task allocation pattern. As indicated by the dashed boxes in fig. 6D, a sub-region of the third class core region (level 0 region) may comprise 1, 3, 4, or 5 cores, while each sub-region of the fourth and fifth class core regions (level 1 region and level 2 region) contains only one core. In the second sub-mode, the three task load levels (level 0, level 1, level 2) are distributed in the multi-core array, both transversely and longitudinally, in the order 00010001 or 02010201. As shown in fig. 6D, viewed along the diagonal from the upper left to the lower right, the arrangement repeats as a diagonal of level 0 tasks, a diagonal of level 1 tasks, another diagonal of level 0 tasks, and a diagonal of level 2 tasks. In other words, in fig. 6D, in the odd rows and odd columns of the multi-core array, each core of the level 1 region (fourth class core region) is separated from the next by three cores of the level 0 region (third class core region); in the even rows and even columns, each core of the level 1 region is separated from the next by two cores of the level 0 region and one core of the level 2 region (fifth class core region), and each core of the level 2 region is separated from the next by two cores of the level 0 region and one core of the level 1 region.
Similar to fig. 6C, the purpose of the second sub-mode shown in fig. 6D is also to distribute tasks of different load levels across adjacently located cores and to separate level 1 and level 2 tasks by level 0 tasks, avoiding aggregation of high-load tasks. In the second sub-mode of the second hierarchical task allocation pattern shown in fig. 6D, the ratio of level 0, level 1, and level 2 tasks is 5:2:1. Compared with the first sub-mode of fig. 6C, the proportion of higher-load-level tasks in the second sub-mode is lower, making it suitable for scenarios with higher computational complexity, tasks that generate higher power, or stricter requirements on power density balance. For example, when the difference in task complexity between the lower load level (level 0) and the higher load levels (levels 1 and 2) is relatively large, a larger numerical disparity between low-level and high-level tasks is needed to compensate for the larger complexity gap between a single high-level task and a single low-level task.
In some embodiments, the task ranking parameter may include the total number of task load levels involved in or encompassed by the target application, and optionally the task complexity range corresponding to each task load level. Thus, based on the task ranking parameters, not only the plurality of expected task load levels (i.e., the number of levels) can be determined, but also the range between these task load levels, which indicates the difference between the task complexity levels corresponding to the highest and lowest of the plurality of task load levels.
Therefore, when determining the hierarchical task allocation pattern, not only the number of expected task load levels corresponding to the target application may be considered, but also the relative difference (i.e., the range) between the actual task complexity levels corresponding to the respective levels, so that cores in the multi-core system are scheduled more reasonably to execute the tasks of the target application. For example, a plurality of pre-selected allocation patterns corresponding to the number of levels (e.g., the first and second sub-modes of the second hierarchical task allocation pattern shown in figs. 6C and 6D) may first be determined according to the expected task load levels, and then an allocation pattern suited to the range may be selected from the pre-selected patterns according to the range between the load levels.
In some embodiments, where the plurality of task load levels of the expected allocation include a first task load level, a second task load level, and a third task load level and the range is less than a first range threshold, the hierarchical task allocation pattern is a first sub-pattern of a second hierarchical task allocation pattern, the range indicating a difference between task complexities corresponding to a highest task load level and a lowest task load level of the plurality of task load levels of the expected allocation. In some embodiments, the hierarchical task allocation pattern is a second sub-pattern of the second hierarchical task allocation pattern where the plurality of task load levels of the expected allocation includes the first task load level, the second task load level, and the third task load level and the range is greater than or equal to a second range threshold. Alternatively, the first range threshold may be smaller than or equal to the second range threshold, and the first range threshold and the second range threshold may be determined according to specific situations (e.g., an application scenario of a target application).
As shown in figs. 6C and 6D, the first and second sub-modes both belong to the second hierarchical allocation pattern, suitable for target applications with three task load levels. In the former, low-level (level 0) tasks and high-level (level 1 and level 2) tasks are present in equal proportion, which suits target applications with a relatively small range between task load levels; in the latter, low-level tasks account for more than half and are widely distributed while high-level tasks are few, which suits target applications with a relatively large range between task load levels.
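The selection logic of the preceding paragraphs can be sketched as follows. The enum names and the single threshold parameter are illustrative assumptions; the patent specifies only the decision criteria (number of expected levels first, then the range).

from enum import Enum, auto

class AllocationPattern(Enum):
    DEFAULT = auto()       # fig. 6E: all tasks treated as level 0
    FIRST = auto()         # fig. 6B: two levels, 50/50 interleaved
    SECOND_SUB1 = auto()   # fig. 6C: three levels, ratio 2:1:1
    SECOND_SUB2 = auto()   # fig. 6D: three levels, ratio 5:2:1

def select_pattern(num_levels: int, level_range: float,
                   range_threshold: float) -> AllocationPattern:
    """Pick an allocation pattern from the number of levels and the range.

    Assumes the first and second range thresholds are equal; the patent
    allows them to differ.
    """
    if num_levels == 2:
        return AllocationPattern.FIRST
    if num_levels == 3:
        if level_range < range_threshold:
            return AllocationPattern.SECOND_SUB1  # small range: first sub-mode
        return AllocationPattern.SECOND_SUB2      # large range: second sub-mode
    return AllocationPattern.DEFAULT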
FIG. 6E illustrates a default hierarchical task allocation pattern according to some embodiments of the present disclosure. As shown in fig. 6E, in the default mode the correspondence between core positions in the schedulable area and expected task load levels includes only the correspondence of core positions to the level 0 load level. In other words, in the default mode, regardless of how many task load levels the target application defines or what its task ranking parameters are, every task expected to be allocated is uniformly treated as level 0 (i.e., the first task load level); the mode does not actually distinguish the load levels of the tasks to be allocated, and the corresponding grid array as a whole contains only the level 0 core region. Optionally, the core scheduling order of the default mode may follow the positional order of the cores. As shown in fig. 6E, according to the request order of the tasks to be executed, cores for processing the respective tasks may be scheduled sequentially in position order, from left to right horizontally and from top to bottom vertically, so that any task (regardless of its load level) can be scheduled by simple sequential polling in this mode.
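The sequential polling implied by the default mode reduces to a row-major scan for the first free core, as in the sketch below (the boolean-matrix representation of the task allocation states is an assumption).

def next_default_core(assigned: list[list[bool]]) -> "tuple[int, int] | None":
    """Return (row, col) of the next free core in position order, or None."""
    for r, row in enumerate(assigned):
        for c, busy in enumerate(row):
            if not busy:
                return (r, c)  # leftmost free core of the topmost row with one
    return None                # every core is occupied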
As shown in fig. 7, step S250 shown in fig. 2 may include:
S251, according to the hierarchical task allocation pattern, determining a first candidate region matching the task load level of the task to be executed from the schedulable region;
S252, determining a second candidate region from the first candidate region based on the task allocation state of each core in the schedulable region;
S253, selecting a target core from the second candidate region according to the scheduling priority of each core in the schedulable core region.
Assuming the load level of the task to be executed is 1, the cores corresponding to level 1 tasks may first be selected from the schedulable core areas shown in figs. 6B-6D to form the first candidate region (i.e., the level 1 region). Then, according to the task allocation states of the cores in the first candidate region, the cores in the "no task allocated" state are screened out to form the second candidate region. Finally, the core with the highest scheduling priority is selected from the second candidate region (i.e., from the cores with no assigned task) as the target core for processing the task to be executed, where the scheduling priority reflects both a first scheduling order (e.g., the gray-scale shading shown in figs. 6B-6D) and a second scheduling order determined by the position order within a region (horizontally left to right, vertically top to bottom, as indicated by the arrows in the figures).
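A minimal sketch of steps S251 through S253, assuming a flat list of per-core records (the field names are illustrative; the patent specifies only the three-stage filtering):

from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    expected_level: int   # from the hierarchical task allocation pattern
    assigned: bool        # task allocation state (True = occupied)
    priority: int         # scheduling priority (lower value = scheduled first)

def pick_target_core(schedulable: list[Core], task_level: int) -> "Core | None":
    # S251: first candidate region, cores whose expected level matches the task
    first = [c for c in schedulable if c.expected_level == task_level]
    # S252: second candidate region, cores with no task currently assigned
    second = [c for c in first if not c.assigned]
    # S253: the highest-priority core of the second region becomes the target
    return min(second, key=lambda c: c.priority, default=None)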
FIG. 8 schematically illustrates a flow diagram of a core scheduling method for a multi-core system according to some embodiments of the present application. In some embodiments, after the target core determining step S250 shown in fig. 2, the target core's processing of the task to be executed may be controlled based on the current power state of its power domain. As shown in fig. 8, the core scheduling method for a multi-core system according to some embodiments of the present application may further include:
S810, a power domain state detection step; and
S820, a task processing procedure control step.
In step S810 (power domain state detecting step), in response to the target core being determined, a power state of a power domain corresponding to the target core is detected, the power domain being used to supply power to the target core and at least one other core in the multi-core system.
Based on the concept of the present application, after the task to be executed is allocated to the target core, the target core needs to process it, and it must therefore be ensured that the power domain of the target core is normally (i.e., fully) turned on before task processing begins. Hence, for subsequent task processing to proceed normally, the current power state of the target core's shared power domain must be known, and corresponding power domain control performed according to that state, in preparation for the target core to process the task. In some embodiments, since turning a (e.g., larger scale) power domain on or off takes time, the power states of a core may include: a fully on state (i.e., a normal power supply state), a fully off state, a powering-on state (i.e., the interval from the moment the power domain is turned on until it is fully on), and a powering-down state (i.e., the interval from the moment the power is turned off until it is fully off).
In some embodiments, as shown in fig. 3B, the power management component 334 may directly obtain the switch states of the power domains of each core from the core state record table 333 (i.e., table 1) and thereby derive the power state of the relevant power domain (e.g., the shared power domain that the target core shares with at least one other core), since the core state record table 333 records in real time various state information including the power domain switch state and switch completion state, the task allocation state of each core, and so on. Accordingly, after power domain control is performed and/or task processing is completed, the power management component 334 may also send the power domain state of the corresponding core to the core state record table 333 to update the relevant information. As shown in fig. 3B, the power management component 334 may optionally also obtain the power domain parameters of the multi-core system from the power domain parameter register PWR_SIZE of the application interaction interface 331, so as to understand the overall layout of the target core's power domain.
In step S820 (task processing procedure control step), the processing procedure of the task to be executed is controlled according to the power supply state of the power supply domain.
As shown in fig. 3B, after obtaining the target core ID from the core scheduling component 332, the first control module 334a in the power management component 334 may obtain the power state of the target core's power domain directly from the core state record table 333, so as to control the power domain and/or the task processing procedure based on the current power state. For example, after the target core is determined, the corresponding power domain needs to be turned on or kept on so that the target core can process the task to be executed. If the power domain is in the normal power supply state, i.e., fully on, the target core may be directly instructed to execute the task.
In some embodiments, when the power domain is in its default fully-off state, it may be turned on directly, and the target core instructed to process the task to be executed once the power domain is fully on. For example, S820 may include: in response to the power state of the target core's power domain being the fully off state, turning on the power domain and detecting in real time whether it has entered the fully on state; and in response to the power domain entering the fully on state, instructing the target core to process the task to be executed.
If the power domain is in the powering-on or powering-down state, power domain control and task processing may proceed according to circumstances. In some embodiments, S820 may include: in response to the power state of the target core's power domain being the powering-on state, detecting in real time whether the power domain has entered the fully on state; and in response to the power domain entering the fully on state, instructing the target core to process the task to be executed. In some embodiments, S820 may further include: in response to the power state of the target core's power domain being the powering-down state, detecting in real time whether the power domain has entered the fully off state; in response to the power domain entering the fully off state, turning the power domain on and detecting in real time whether it has entered the fully on state; and in response to the power domain entering the fully on state, instructing the target core to process the task to be executed.
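The S820 variants above amount to one control flow over the four power states, sketched below; PowerState and the domain/core interfaces are assumptions standing in for the real power-management hardware.

from enum import Enum, auto
import time

class PowerState(Enum):
    FULLY_ON = auto()
    FULLY_OFF = auto()
    POWERING_ON = auto()
    POWERING_DOWN = auto()

def control_task_processing(domain, target_core, task) -> None:
    """Turn the domain on as needed, then instruct the target core (sketch)."""
    state = domain.power_state()
    if state is PowerState.FULLY_OFF:
        domain.turn_on()                 # fully off: start powering up
    elif state is PowerState.POWERING_DOWN:
        while domain.power_state() is not PowerState.FULLY_OFF:
            time.sleep(0.001)            # wait for power-down to complete
        domain.turn_on()                 # then power up again
    # POWERING_ON needs no trigger; FULLY_ON falls through immediately.
    while domain.power_state() is not PowerState.FULLY_ON:
        time.sleep(0.001)                # poll until the domain is fully on
    target_core.process(task)            # instruct the target core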
In some embodiments, according to the concept of the present application and in combination with the near-zero principle, implementing core scheduling with regionalized power management requires a multi-core shared power domain in the multi-core system (as opposed to an independent power domain per core), with task processing based on control of the shared power domain. A multi-core shared power domain is a power supply shared by a plurality of physically adjacent cores, i.e., one shared power domain supplies power to at least two cores. In some embodiments, as shown in fig. 3B, the number of shared power domains and the power domain parameters (i.e., the shape of the core region and the number of cores covered by each power domain) in the multi-core system may be predetermined based on actual conditions (e.g., application scenario, technology process, etc.) and stored in the power domain parameter register PWR_SIZE. In some embodiments, the cores covered by a shared power domain may form a square core array in the multi-core system, i.e., the power domain parameter may be, for example, 2×2, 4×4, or 8×8, which facilitates the arrangement (e.g., placement of power devices and routing to each core) and management of the power domain and its cores. Alternatively, the plurality of cores covered by a shared power domain may form a rectangular or otherwise shaped core array.
As shown in fig. 8, when the power domain of the target core is a power domain shared by the target core and at least one other core in the multi-core system, the core scheduling method for a multi-core system according to some embodiments of the present application may further include:
S830, a task allocation state acquisition step; and
S840, a power domain switch control step.
In step S830 (task allocation state acquisition step), in response to the task to be executed being completed by the target core processing, a task allocation state of at least one other core is acquired. Wherein at least one other core refers to all cores sharing a power domain with the target core.
Generally, after the task to be executed is completed, the corresponding power domain may be turned off promptly to avoid wasted energy. However, since the target core shares its power domain with at least one other core, the task allocation and execution status of the other cores in the shared power domain must be known after task processing completes. For example, if all other cores of the shared power domain have no assigned tasks or have finished their tasks, i.e., they are idle, the power domain may be turned off at a suitable time to save resources. In some embodiments, the task allocation status of a core may be obtained directly from the parameter Assigned (task allocation state) in the core state record table (i.e., table 1) shown in fig. 3B: Assigned = 1 indicates the allocated-task state, i.e., the core is occupied; Assigned = 0 indicates the unallocated-task state, i.e., the core is unoccupied, idle, and available for task allocation.
In step S840 (power domain switching control step), switching of the power domain of the target core is controlled based on the task allocation state of at least one other core.
After obtaining the task allocation states of the other cores covered by the target core's power domain, the power domain may be controlled accordingly. For example, when all cores are idle, the power domain may be turned off directly; optionally, the system may wait for a preset time (i.e., sleep briefly) before turning off, to absorb task flows arriving at short intervals and to avoid the wear on the power domain and the loss of overall work efficiency caused by frequent switching and repeated power cycling. As shown in fig. 3B, after receiving the task-completion message from the first control module 334a, the second control module 334b in the power management component 334 can determine, based on the task allocation states of all cores acquired from the core state record table 333, whether all cores covered by the target core's power domain (i.e., all cores sharing the same power domain as the target core) are idle; if so, the power domain is turned off directly or after a preset sleep time, and otherwise it is kept on.
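Steps S830 and S840 can be sketched as two callbacks, one on task completion and one on sleep-timer expiry; the Assigned convention follows table 1 (1 = occupied, 0 = idle), while the timer and domain interfaces are assumptions.

def on_task_complete(domain) -> None:
    # S830: read the task allocation state of every core sharing this domain
    if all(core.assigned == 0 for core in domain.cores):
        domain.start_sleep_timer()  # S840: defer turn-off to absorb bursty tasks
    # otherwise keep the domain on for the still-busy cores

def on_sleep_timer_expired(domain) -> None:
    # Re-check before turning off: a task may have arrived during the sleep
    if all(core.assigned == 0 for core in domain.cores):
        domain.turn_off()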
Fig. 9A illustrates a core power domain state change diagram involved in a core scheduling method for a multi-core system according to some embodiments of the present application. Fig. 9B and 9C illustrate example waveform diagrams of key signals in implementing a core scheduling method for a multi-core system according to some embodiments of the present application. The waveform diagrams of fig. 9B and 9C include four key signals: Assigned, Assigned_done, PWR_off, and PWR_off_done, which respectively represent the task allocation state signal, the task processing ready signal, the switch signal of the power domain corresponding to the target core, and the switch completion signal of that power domain. As shown in fig. 9B, t1, t2, t3, t4, and t5 respectively represent the power-on start time, power-on completion time, sleep timing start time, power-down start time, and power-down completion time of the power domain corresponding to the target core; T_power_on denotes the duration of the power-up process, T_power_down the duration of the power-down process, and T_sleep a preset sleep threshold time. As shown in fig. 9C, t6 and t7 respectively represent the time at which the target core is assigned a task and the time at which the task is completed while the power domain is in the working state or the idle state.
The process of power domain state change and localized power management according to some embodiments of the present application is described below in conjunction with fig. 9A-9C.
As shown in fig. 9A, the states (or power states) of the power domain include a (fully) off state, a powering-on state, a working state, an idle state, and a powering-down state, where the working state and the idle state may collectively be referred to as the fully on state. As shown in fig. 9A, the transitions between the power domain states are as follows:
1. the power domain is initially in the off state by default;
2. in response to a core within the power domain (e.g., the target core) being assigned a task, the power domain goes from the (fully) off state to the powering-on state;
3. after power-up completes, the power domain enters the working state from the powering-on state, and the corresponding core is ready to execute its task;
4. when the corresponding core's task is completed and all other cores in the power domain have no assigned tasks or have finished theirs (i.e., are idle), the power domain switches from the working state to the idle state;
5. if the corresponding core's task is completed but other cores in the power domain are still working, the power domain remains in the working state;
6. if no core is assigned a new task during the idle state and the idle period expires, i.e., the sleep timer reaches the preset T_sleep, the power domain enters the powering-down state from the idle state;
7. if a new task is assigned to a core during the idle state, the power domain returns from the idle state to the working state;
8. after the powering-down state ends, the power domain returns to the original off state and waits for a new task allocation to restart the process.
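The eight transitions above form a small state machine. The sketch below encodes it with one method per event; the event-driven encoding is an illustrative assumption, while the states follow fig. 9A.

from enum import Enum, auto

class DomainState(Enum):
    OFF = auto()
    POWERING_ON = auto()
    WORKING = auto()
    IDLE = auto()
    POWERING_DOWN = auto()

class PowerDomainFSM:
    def __init__(self) -> None:
        self.state = DomainState.OFF               # (1) off by default

    def task_assigned(self) -> None:
        if self.state is DomainState.OFF:
            self.state = DomainState.POWERING_ON   # (2) first task turns domain on
        elif self.state is DomainState.IDLE:
            self.state = DomainState.WORKING       # (7) new task ends idling

    def power_on_done(self) -> None:
        if self.state is DomainState.POWERING_ON:
            self.state = DomainState.WORKING       # (3) ready to execute

    def all_cores_idle(self) -> None:
        if self.state is DomainState.WORKING:
            self.state = DomainState.IDLE          # (4); otherwise stays WORKING (5)

    def sleep_timer_expired(self) -> None:
        if self.state is DomainState.IDLE:
            self.state = DomainState.POWERING_DOWN # (6) T_sleep reached

    def power_down_done(self) -> None:
        if self.state is DomainState.POWERING_DOWN:
            self.state = DomainState.OFF           # (8) back to fully off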
Specifically, as shown in fig. 9A, the shared power domain corresponding to the target core (which powers the target core and at least one other core) is completely turned off in the default state to save power consumption (as is true for each power domain in the multi-core system). As shown in fig. 9B, for example, before time t1 and after time t5 the power domain of the target core is off, with PWR_off at 1 and PWR_off_done at 1.
As shown in fig. 9A, assuming the target core is the first core in the corresponding power domain to be allocated a task, the power domain is turned on when the target core is allocated the task to be executed, thereby entering the powering-on state. As shown in fig. 9B, at time t1, when the target core is assigned the task, the Assigned signal changes from 0 to 1, indicating that the target core has entered the task-assigned state; the Assigned_done signal is still 0, indicating that the target core is not yet ready; the PWR_off signal of the corresponding power domain goes from 1 to 0, indicating that the power domain is being turned on, i.e., it enters the powering-on state, while PWR_off_done remains 1, indicating that the power domain is not yet fully on, i.e., power-up is not complete.
As shown in figs. 9A and 9B, at time t2, when power-up of the power domain completes, the power domain is fully on and enters the working state: PWR_off_done changes from 1 to 0 while PWR_off remains 0. At time t2 the Assigned signal remains 1 and the Assigned_done signal changes from 0 to 1, indicating that the target core is ready to start processing the task to be executed.
As shown in figs. 9A and 9B, at time t3, when the target core's task is completed, the Assigned and Assigned_done signals change from 1 to 0, indicating that the target core's task allocation state is updated to unallocated and it enters the idle state (awaiting the next task). At this point, if all other cores covered by the corresponding power domain are idle (i.e., their task allocation states are unallocated), the power domain enters the idle state from the working state; as shown in fig. 9B, PWR_off and PWR_off_done remain 0 and the sleep timer is started.
As shown in figs. 9A and 9B, if at the task completion time t3 some other core covered by the power domain is not idle (i.e., its task allocation state is allocated), the power domain remains in the working state, with PWR_off and PWR_off_done remaining 0; only when all cores of the power domain are idle does the power domain enter the idle state from the working state, with PWR_off and PWR_off_done still 0. Although not shown in fig. 9B, if the target core is assigned a new task during the power domain's idle state between times t3 and t4, Assigned and Assigned_done change from 0 to 1, the power domain re-enters the working state, and PWR_off and PWR_off_done remain 0.
As shown in figs. 9A and 9B, at time t4, when the sleep timer reaches the preset time T_sleep (i.e., the sleep period ends), the power domain enters the powering-down state from the idle state. At this point, if the target core has not been assigned a new task, the Assigned and Assigned_done signals remain 0, PWR_off changes from 0 to 1, meaning the power domain is being turned off, and PWR_off_done remains 0, meaning it is not yet fully off. Although not shown in fig. 9B, if a new task is allocated to the target core during the power-down process between times t4 and t5, the Assigned signal changes from 0 to 1 while the Assigned_done signal remains 0; the power domain completes the power-down process and is then powered up again at time t5, after which the task processing and power domain management processes repeat.
As shown in figs. 9A and 9B, at time t5, when the power-down of the power domain completes, the power domain returns to the original fully off state; at this point PWR_off_done changes from 0 to 1, indicating that power-down is complete and the power domain is fully off.
Fig. 9C illustrates an example waveform diagram of the key signals when the power domain of the target core is in the working state or the idle state. As shown in fig. 9C, at time t6, when the target core is assigned a task while the corresponding power domain is in the working state, Assigned and Assigned_done change from 0 to 1, indicating that task assignment is complete and the target core is ready immediately (no power-up process is needed), while PWR_off and PWR_off_done stay at 0; at time t7, when the target core completes the task, Assigned and Assigned_done change from 1 to 0. In addition, as shown in fig. 9C, if at time t6 the corresponding power domain is in the idle state, the power domain immediately leaves the idle state and resumes the working state, with PWR_off and PWR_off_done staying at 0; since these signals are 0 in both the idle and working states, the corresponding waveforms do not change.
According to the target core task allocation state and the corresponding power domain state change process shown in fig. 9A and 9B, after a task to be executed is allocated to a target core, the processing of the power domain and the task to be executed can be controlled by the following steps:
1. If the power domain corresponding to the target core is not yet powered on, i.e., both PWR_off and PWR_off_done are 1, the power domain is turned on to power up the region; after power-up completes, PWR_off_done of the power domain is set to 0, Assigned_done of the core is set to 1, and task execution begins;
2. if the power domain corresponding to the target core is powered on and supplying power normally, i.e., PWR_off is 0 and PWR_off_done is 0, Assigned_done of the core is directly set to 1 and task execution begins; PWR_off and PWR_off_done of the power domain remain 0;
3. if the power domain corresponding to the target core has started powering down, i.e., PWR_off is 1 and PWR_off_done is 0, the system waits for power-down to complete (PWR_off is 1 and PWR_off_done is 1), then sets PWR_off to 0 and powers up again; after power-up completes, PWR_off_done is set to 0, Assigned_done of the core is set to 1, and task execution begins.
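At the signal level, the three cases above reduce to the sketch below, using the PWR_off / PWR_off_done / Assigned_done conventions of figs. 9B and 9C; modeling the registers as plain attributes and the wait helpers are assumptions for illustration.

def prepare_and_run(domain, core, task) -> None:
    if domain.PWR_off == 1 and domain.PWR_off_done == 1:
        # Case 1: domain fully off, power it on first
        domain.PWR_off = 0
        domain.wait_power_up()       # blocks until power-up finishes
        domain.PWR_off_done = 0
    elif domain.PWR_off == 1 and domain.PWR_off_done == 0:
        # Case 3: domain is powering down, wait, then power up again
        domain.wait_power_down()     # blocks until PWR_off_done == 1
        domain.PWR_off = 0
        domain.wait_power_up()
        domain.PWR_off_done = 0
    # Case 2: PWR_off == 0 and PWR_off_done == 0, already powered, fall through
    core.Assigned_done = 1           # target core ready
    core.execute(task)               # begin task execution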
In some embodiments, in response to the task to be executed being completed by the target core, S840 (the power domain switch control step) may include: in response to the task allocation state of the at least one other core being the unallocated-task state, turning off the power domain of the target core. In some embodiments, S840 may include: starting the power domain idle-state timer in response to the task allocation state of the at least one other core being the unallocated-task state; and turning off the power domain of the target core in response to the idle-state timer reaching a preset time. In some embodiments, S840 may further include: acquiring, in real time during the idle-state timing, the task allocation state of each of the at least one other core and the target core; and terminating the idle-state timing in response to the task allocation state of at least one of those cores becoming the allocated-task state during the timing.
Specifically, according to the target core task allocation state and the corresponding power domain state changes shown in figs. 9A and 9B, after the task allocated to the target core is completed, the target core enters the idle state and its Assigned and Assigned_done are set to 0. If every core of the power domain is idle, the power domain enters the idle state, PWR_off = 0 and PWR_off_done = 0 are maintained, and the sleep timer is started. Then, when the idle state reaches the preset sleep time threshold T_sleep, the power domain begins powering down, with PWR_off = 1 and PWR_off_done = 0; after power-down completes, PWR_off = 1 and PWR_off_done = 1, and the power domain returns to the default state, i.e., the fully off state. If any core covered by the power domain is assigned a task during the sleep timing, the sleep timer is cleared, the power domain resumes the normal working state, and the core processes the task normally.
In the key-signal waveform diagram of the core's task allocation state updates and power domain control process shown in fig. 9B, the Assigned, Assigned_done, PWR_off, and PWR_off_done signals are the information of interest in the core state record table. Assigned_done is an intermediate signal used to avoid invoking the target core, and thereby causing errors, while the power domain is powering down or not yet fully powered up. For example, Assigned_done is updated to 1 only at time t2 when power-up completes, so that processing of the corresponding task starts only when the target core is actually ready (i.e., when the power domain is fully powered up), avoiding task processing failures caused by an unprepared target core or incomplete power-up.
Fig. 10 schematically illustrates an example block diagram of a core scheduling apparatus 1000 for a multi-core system according to some embodiments of the present application. The core scheduling apparatus 1000 shown in fig. 10 may correspond to the core scheduling platform 130 of fig. 1.
As shown in fig. 10, the core scheduling apparatus 1000 for a multi-core system includes a receiving module 1010, a first obtaining module 1020, a first determining module 1030, a second obtaining module 1040, and a second determining module 1050. The receiving module 1010 may be configured to receive a task execution request from a target application, the task execution request including a task load level of a task to be executed. The first obtaining module 1020 may be configured to obtain a hierarchical task allocation pattern for the multi-core system, the hierarchical task allocation pattern including a correspondence between individual cores in a schedulable core region of the multi-core system and a plurality of task load levels expected to be allocated. The first determining module 1030 may be configured to determine the scheduling priority of each core in the schedulable core area according to a positional relationship between each core in the schedulable core area and a predetermined core. The second obtaining module 1040 may be configured to obtain task allocation statuses of the cores in the schedulable core area. The second determining module 1050 may be configured to determine a target core for processing the task to be executed from the schedulable core region according to a task load level of the task to be executed, the hierarchical task allocation pattern, and a scheduling priority and a task allocation status of each core in the schedulable core region.
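Structurally, the apparatus can be read as five cooperating responsibilities, as in the skeleton below (method signatures are illustrative assumptions; only the module split is taken from the description):

class CoreSchedulingApparatus:
    """Skeleton mirroring modules 1010-1050 of fig. 10 (sketch only)."""

    def receive_request(self, request):           # receiving module 1010
        """Accept a task execution request carrying a task load level."""

    def get_allocation_pattern(self):             # first obtaining module 1020
        """Fetch the hierarchical task allocation pattern for the system."""

    def determine_priorities(self, schedulable):  # first determining module 1030
        """Rank cores by positional relationship to the predetermined core."""

    def get_allocation_states(self, schedulable): # second obtaining module 1040
        """Read the task allocation state of each schedulable core."""

    def determine_target_core(self, task_level, pattern, priorities, states):
        """Second determining module 1050: pick the target core."""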
It should be noted that the various modules described above may be implemented in software or hardware or a combination of both. Several different modules may be implemented in the same software or hardware configuration, or one module may be implemented by several different software or hardware configurations.
According to the core scheduling apparatus for a multi-core system of some embodiments of the present application, tasks issued by software (the target application) are processed hierarchically: task load levels are used to represent task complexity, which facilitates quantitative analysis of task computational complexity and simplifies the core scheduling process based on hierarchical tasks. Second, the hierarchical task allocation pattern enables balanced allocation or arrangement of tasks of different load levels within the core array, so that the power density of the core array is effectively controlled and remains relatively balanced during task execution; this avoids the high-load conditions, excessive power density, and overheating of core array regions caused by excessive aggregation of high-load tasks, and effectively improves the overall performance and task execution efficiency of the multi-core system. In addition, while the hierarchical task allocation pattern achieves a balanced arrangement of the different task load levels, the scheduling priority determined based on the positional relationship between each core and a predetermined core (e.g., the initial core to be scheduled with ID = 0 at the upper left corner of the core array), for example according to the near-zero principle, defines the core allocation order for the current tasks to be executed. Tasks of each level are thus preferentially allocated in a concentrated manner to the area near the predetermined core (e.g., the supply area of the same power domain), which benefits centralized management of the scheduled cores in the multi-core system, especially regionalized power management. This markedly improves the core management efficiency of the multi-core system while reducing energy consumption, power supply lifetime loss, and work efficiency loss (e.g., from frequent power switching) through unified regional power management (e.g., adopting a power sleep mode when all cores in a scheduled core region are idle).
FIG. 11 schematically illustrates an example block diagram of a computing device 1100 in accordance with some embodiments of the present application. Computing device 1100 may represent a device to implement the various apparatus or modules described herein and/or perform the various methods described herein. Computing device 1100 can be, for example, a server, a desktop computer, a laptop computer, a tablet, a smartphone, a smartwatch, a wearable device, or any other suitable computing device or computing system, which can include various levels of devices from full resource devices with substantial storage and processing resources to low-resource devices with limited storage and/or processing resources. In some embodiments, the core scheduling apparatus 1000 for a multi-core system described above with respect to fig. 10 may be implemented in one or more computing devices 1100, respectively.
As shown in fig. 11, the example computing device 1100 includes a processing system 1101, one or more computer-readable media 1102, and one or more I/O interfaces 1103 communicatively coupled to each other. Although not shown, the computing device 1100 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Alternatively, control and data lines, for example, may be included.
The processing system 1101 represents functionality to perform one or more operations using hardware. Accordingly, the processing system 1101 is illustrated as including hardware elements 1104 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. Hardware element 1104 is not limited by the material from which it is formed or the processing mechanisms employed therein. For example, a processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.
The computer-readable medium 1102 is illustrated as including a memory/storage 1105. Memory/storage 1105 represents memory/storage associated with one or more computer-readable media. Memory/storage 1105 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 1105 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 1102 may be configured in various other ways, which are further described below.
One or more I/O (input/output) interfaces 1103 represent functionality that allows a user to enter commands and information to computing device 1100, and that also allows information to be displayed to the user and/or sent to other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), a camera (e.g., motion that does not involve touch may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), a network card, a receiver, and so forth. Examples of output devices include a display device, speakers, printer, haptic response device, network card, transmitter, and so forth.
Computing device 1100 also includes core scheduling policy 1106. Core scheduling policy 1106 may be stored as computer program instructions in memory/storage 1105, or may be hardware or firmware. The core scheduling policy 1106, together with the processing system 1101 and the like, may implement all functions of the respective modules of the core scheduling apparatus 1000 for a multi-core system described with respect to fig. 10.
Various techniques may be described herein in the general context of software, hardware, elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and the like, as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that can be accessed by computing device 1100. By way of example, and not limitation, computer-readable media may comprise "computer-readable storage media" and "computer-readable signal media".
"computer-readable storage medium" refers to a medium and/or device, and/or a tangible storage apparatus, capable of persistently storing information, as opposed to mere signal transmission, carrier wave, or signal per se. Accordingly, computer-readable storage media refer to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of computer readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage, tangible media, or an article of manufacture suitable for storing the desired information and which may be accessed by a computer.
"computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to the hardware of the computing device 1100, such as via a network. Signal media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism. Signal media also includes any information delivery media. By way of example, and not limitation, signal media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As previously described, hardware element 1104 and computer-readable medium 1102 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form that, in some embodiments, may be used to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or systems-on-chip, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or components of other hardware devices. In this context, a hardware element may serve as a processing device to perform program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device to store instructions for execution, such as the computer-readable storage medium described previously.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 1104. The computing device 1100 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, implementing modules at least partially in hardware as modules executable by the computing device 1100 as software may be accomplished, for example, through the use of computer-readable storage media of a processing system and/or hardware elements 1104. The instructions and/or functions may be executed/operable by, for example, one or more computing devices 1100 and/or processing system 1101 to implement the techniques, modules, and examples described herein.
The techniques described herein may be supported by these various configurations of the computing device 1100 and are not limited to specific examples of the techniques described herein.
In particular, according to an embodiment of the present application, the processes described above with reference to the flowcharts may be implemented as a computer program. For example, embodiments of the present application provide a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing at least one step of the method embodiments of the present application.
In some embodiments of the present application, one or more computer-readable storage media are provided having computer-readable instructions stored thereon that, when executed, implement a core scheduling method for a multi-core system according to some embodiments of the present application. The steps of the core scheduling method for a multi-core system according to some embodiments of the present application may be converted into computer readable instructions by programming and stored in a computer readable storage medium. When such a computer-readable storage medium is read or accessed by a computing device or computer, the computer-readable instructions therein are executed by a processor on the computing device or computer to implement the methods according to some embodiments of the present application.
In the description herein, the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, any one or a combination of the following techniques, which are well known in the art, may be used: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application specific integrated circuits having appropriate combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the method of the above embodiments may be performed by hardware associated with program instructions, and that the program may be stored in a computer readable storage medium, which when executed, includes performing one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.

Claims (31)

1. A core scheduling method for a multi-core system, the method comprising:
receiving a task execution request from a target application, wherein the task execution request comprises a task load level of a task to be executed;
acquiring a hierarchical task allocation pattern for the multi-core system, wherein the hierarchical task allocation pattern comprises a correspondence between each core in a schedulable core region of the multi-core system and a plurality of expected task load levels, the plurality of expected task load levels being related to the target application;
determining a scheduling priority of each core in the schedulable core region according to a positional relationship between each core in the schedulable core region and a predetermined core;
acquiring a task allocation state of each core in the schedulable core region; and
determining a target core for processing the task to be executed from the schedulable core region according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core region.
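
By way of illustration only, and not as the claimed implementation, the Python sketch below strings the steps of claim 1 together; the Core dataclass, find_target_core, and the priority/allocated fields are hypothetical names standing in for the claimed states.

    # Hypothetical sketch of the claim-1 pipeline; all names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Core:
        row: int
        col: int
        expected_level: int      # task load level this core is mapped to
        priority: int = 0        # scheduling priority (lower runs first)
        allocated: bool = False  # task allocation state

    def find_target_core(cores, task_level):
        """Pick the unallocated core mapped to task_level with the best priority."""
        candidates = [c for c in cores
                      if c.expected_level == task_level and not c.allocated]
        if not candidates:
            return None
        target = min(candidates, key=lambda c: c.priority)
        target.allocated = True
        return target

    cores = [Core(0, 0, expected_level=1), Core(0, 1, expected_level=2, priority=1)]
    print(find_target_core(cores, task_level=2))  # -> the (0, 1) core
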
2. The method of claim 1, wherein determining the scheduling priority of each core in the schedulable core region according to a positional relationship between each core in the schedulable core region and a predetermined core comprises:
dividing the schedulable core region into a plurality of schedulable core sub-regions, the plurality of schedulable core sub-regions including a reference schedulable core sub-region in which the predetermined core is located;
determining a first scheduling order for each schedulable core sub-region based on a distance between each schedulable core sub-region and the reference schedulable core sub-region; and
determining the scheduling priority of each core in each schedulable core sub-region according to at least the first scheduling order of each schedulable core sub-region.
3. The method of claim 2, wherein cores in the same schedulable core sub-region share the same power domain.
4. The method of claim 2, wherein each of the plurality of schedulable core sub-regions is a square core array region and includes the same number of cores.
5. The method of claim 2, wherein determining the first scheduling order for each schedulable core sub-region based on a distance between each schedulable core sub-region and a reference schedulable core sub-region comprises:
calculating a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region; and
determining the first scheduling order for each schedulable core sub-region based on at least one of the lateral distance and the longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region.
6. The method of claim 5, wherein determining the first scheduling order for each schedulable core sub-region based on at least one of a lateral distance and a longitudinal distance between each schedulable core sub-region and the reference schedulable core sub-region comprises:
for each schedulable core sub-region, calculating a sum of squares of the lateral distance and the longitudinal distance between the schedulable core sub-region and the reference schedulable core sub-region;
determining the first scheduling order of each schedulable core sub-region based on the sum of squares of the lateral distance and the longitudinal distance corresponding to the schedulable core sub-region; and
in response to at least two schedulable core sub-regions of the plurality of schedulable core sub-regions having the same first scheduling order, updating the first scheduling order of the at least two schedulable core sub-regions based on the longitudinal distance between each of the at least two schedulable core sub-regions and the reference schedulable core sub-region.
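
As a non-authoritative illustration of claims 5, 6, and 8, the sketch below ranks sub-regions by the sum of squares of their lateral and longitudinal distances to the reference sub-region, breaking ties on the longitudinal distance; the (row, column) coordinate convention is an assumption.

    # Illustrative first-scheduling-order computation (claims 5-6, 8).
    def first_scheduling_order(subregions, reference):
        """subregions: list of (row, col); reference: (row, col).
        Returns the sub-regions sorted into their first scheduling order."""
        ref_r, ref_c = reference
        def key(subregion):
            r, c = subregion
            lateral, longitudinal = abs(c - ref_c), abs(r - ref_r)
            # Primary key: sum of squares of the distances (claim 6);
            # secondary key: longitudinal distance, used only for ties.
            return (lateral ** 2 + longitudinal ** 2, longitudinal)
        return sorted(subregions, key=key)

    print(first_scheduling_order([(0, 1), (1, 0), (1, 1), (0, 0)], (0, 0)))
    # -> [(0, 0), (0, 1), (1, 0), (1, 1)]; (0, 1) beats (1, 0) by the tie-break
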
7. The method of claim 2, wherein determining the scheduling priority of each core in each schedulable core sub-region according to at least the first scheduling order for each schedulable core sub-region comprises:
determining a second scheduling order of each core in each schedulable core sub-region according to a lateral arrangement order and a longitudinal arrangement order of the cores in the schedulable core sub-region; and
determining the scheduling priority of each core in each schedulable core sub-region according to the first scheduling order of the schedulable core sub-region and the second scheduling order of the core within the schedulable core sub-region.
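
One possible reading of claim 7, sketched under the assumption that the second scheduling order is row-major within each sub-region; cores_of and the tuple representation of cores are hypothetical.

    # Sketch of claim 7: fold the per-sub-region first scheduling order and a
    # within-sub-region second scheduling order into one per-core priority.
    def core_priorities(ordered_subregions, cores_of):
        """ordered_subregions: sub-regions already in first scheduling order.
        cores_of(sr): iterable of (row, col) cores inside sub-region sr.
        Returns {core: priority}; 0 is the highest scheduling priority."""
        priorities, next_rank = {}, 0
        for subregion in ordered_subregions:
            # Second scheduling order: sort cores by row, then by column.
            for core in sorted(cores_of(subregion)):
                priorities[core] = next_rank
                next_rank += 1
        return priorities
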
8. The method according to claim 2, wherein the predetermined core is a preset starting core to be scheduled in the schedulable core region, and determining the first scheduling order for each schedulable core sub-region based on a distance between each schedulable core sub-region and the reference schedulable core sub-region comprises:
sorting the schedulable core sub-regions in ascending order of the distance between each schedulable core sub-region and the reference schedulable core sub-region; and
determining the first scheduling order of each schedulable core sub-region according to the sorting result, wherein the first scheduling order of each schedulable core sub-region is consistent with the sorting result.
9. The method of claim 1, wherein, in the hierarchical task allocation pattern, the schedulable core region comprises multiple classes of core regions in one-to-one correspondence with the plurality of expected task load levels, each class of core regions comprises multiple non-adjacent sub-regions, and each sub-region comprises one core or at least two adjacent cores.
10. The method of claim 9, wherein the hierarchical task allocation pattern is derived from the plurality of expected task load levels.
11. The method of claim 10, wherein, in a case where the plurality of expected task load levels include a first task load level and a second task load level, the hierarchical task allocation pattern is a first hierarchical task allocation pattern,
and in the first hierarchical task allocation pattern, the schedulable core region includes a first class of core regions corresponding to the first task load level and a second class of core regions corresponding to the second task load level, and each sub-region of the first class of core regions and each sub-region of the second class of core regions includes one core.
12. The method of claim 10, wherein, in a case where the plurality of expected task load levels include a first task load level, a second task load level, and a third task load level, the hierarchical task allocation pattern is a second hierarchical task allocation pattern,
and in the second hierarchical task allocation pattern, the schedulable core region includes a third class of core regions corresponding to the first task load level, a fourth class of core regions corresponding to the second task load level, and a fifth class of core regions corresponding to the third task load level, and each sub-region in the fourth class of core regions is not adjacent to any sub-region in the fifth class of core regions.
13. The method of claim 12, wherein the task complexity level corresponding to each of the second task load level and the third task load level is greater than the task complexity level corresponding to the first task load level.
14. The method of claim 13, wherein the schedulable core region is an array region, and wherein, in a case where the plurality of expected task load levels include a first task load level, a second task load level, and a third task load level and a range is less than a first range threshold, the hierarchical task allocation pattern is a first sub-pattern of the second hierarchical task allocation pattern, the range indicating a difference between the task complexity levels corresponding to a highest task load level and a lowest task load level of the plurality of expected task load levels,
and in the first sub-pattern, in each row and each column of the schedulable core region, cores in the fourth class of core regions are separated by at least one core in the third class of core regions and at least one core in the fifth class of core regions, and cores in the fifth class of core regions are separated by at least one core in the third class of core regions and at least one core in the fourth class of core regions.
15. The method of claim 14, wherein, in a case where the plurality of expected task load levels include a first task load level, a second task load level, and a third task load level and the range is greater than or equal to a second range threshold, the hierarchical task allocation pattern is a second sub-pattern of the second hierarchical task allocation pattern, wherein the first range threshold is less than or equal to the second range threshold,
and in the second sub-pattern, in odd rows and odd columns of the schedulable core region, cores in the fourth class of core regions are separated by one or more cores in the third class of core regions, and in even rows and even columns of the schedulable core region, cores in the fourth class of core regions are separated by at least one core in the third class of core regions and at least one core in the fifth class of core regions, and cores in the fifth class of core regions are separated by at least one core in the third class of core regions and at least one core in the fourth class of core regions.
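
The grid below is an illustrative example, not taken from the specification, of one layout satisfying the first sub-pattern of claim 14: period-4 diagonal stripes place at least one level-1 core and one level-3 core between any two level-2 cores in every row and column, and symmetrically for level-3 cores.

    # Editorial example of a first-sub-pattern layout (claim 14).
    def first_sub_pattern(rows, cols):
        """Return a rows x cols grid of task load levels: 1 = low (third
        class), 2 = mid (fourth class), 3 = high (fifth class)."""
        def level(i, j):
            phase = (i + j) % 4
            return 2 if phase == 1 else 3 if phase == 3 else 1
        return [[level(i, j) for j in range(cols)] for i in range(rows)]

    for row in first_sub_pattern(2, 8):
        print(row)
    # [1, 2, 1, 3, 1, 2, 1, 3]
    # [2, 1, 3, 1, 2, 1, 3, 1]
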
16. The method of any one of claims 12-15, wherein the task complexity level corresponding to the third task load level is greater than the task complexity level corresponding to the second task load level.
17. The method of claim 1, further comprising:
acquiring region scheduling parameters corresponding to the target application, wherein the region scheduling parameters are determined based on the number of cores required for running the target application; and
determining the schedulable core region from a core array of the multi-core system according to the region scheduling parameters.
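
A minimal sketch of claim 17, assuming the region scheduling parameters reduce to a square side length derived from the required core count and that the region is anchored at the array origin; both assumptions go beyond what the claim states.

    # Sketch of claim 17: size a square schedulable region from the number
    # of cores the target application needs (assumed policy).
    import math

    def schedulable_region(required_cores, array_rows, array_cols):
        side = math.isqrt(required_cores)
        if side * side < required_cores:
            side += 1                          # round up to a full square
        side = min(side, array_rows, array_cols)
        return [(r, c) for r in range(side) for c in range(side)]

    print(len(schedulable_region(10, 8, 8)))   # -> 16 (a 4 x 4 region)
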
18. The method of claim 1, wherein determining a target core for processing the task to be executed from the schedulable core region according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core region comprises:
determining, from the schedulable core region according to the hierarchical task allocation pattern, a first candidate core region matching the task load level of the task to be executed;
determining a second candidate core region from the first candidate core region based on the task allocation state of each core in the schedulable core region; and
selecting the target core from the second candidate core region according to the scheduling priority of each core in the schedulable core region.
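
The three-step narrowing of claim 18 can be sketched as follows, with each core modelled as a (priority, expected_level, allocated) tuple; this data shape is an assumption, not the specification's representation.

    # Sketch of claim 18's candidate filtering.
    def select_target_core(cores, task_level):
        # Step 1: first candidate region -- cores whose class in the
        # hierarchical task allocation pattern matches the task load level.
        first = [c for c in cores if c[1] == task_level]
        # Step 2: second candidate region -- keep only unallocated cores.
        second = [c for c in first if not c[2]]
        # Step 3: pick the best (lowest-numbered) scheduling priority.
        return min(second, default=None)
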
19. The method of claim 1, wherein the power domain of the target core is shared by the target core and at least one other core in the multi-core system, the method further comprising:
detecting a task allocation state of the at least one other core in response to the task to be executed being completely processed by the target core; and
controlling switching of the power domain of the target core based on the task allocation state of the at least one other core.
20. The method of claim 19, wherein controlling switching of the power domain of the target core based on the task allocation state of the at least one other core comprises:
in response to the task allocation state of the at least one other core being an unallocated-task state, turning off the power domain of the target core.
21. The method of claim 19, wherein controlling switching of the power domain of the target core based on the task allocation state of the at least one other core comprises:
starting power domain idle state timing in response to the task allocation state of the at least one other core being an unallocated-task state; and
in response to the power domain idle state timing reaching a preset duration, turning off the power domain of the target core.
22. The method of claim 21, wherein controlling switching of the power domain of the target core based on the task allocation state of the at least one other core further comprises:
acquiring, in real time during the power domain idle state timing, the task allocation state of each of the at least one other core and the target core; and
in response to the task allocation state of at least one of the at least one other core and the target core becoming an allocated-task state during the power domain idle state timing, terminating the power domain idle state timing and keeping the power domain of the target core turned on.
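
A hedged sketch of the power-domain gating of claims 20-22 follows; the polling loop and the time.monotonic()-based timing are simplifications of the claimed power domain idle state timing.

    # Turn a shared power domain off only after every core in it has stayed
    # unallocated for a grace period (claims 21-22 behaviour, simplified).
    import time

    def manage_power_domain(domain_cores, turn_off, idle_timeout=0.5, poll=0.05):
        """domain_cores: objects with a boolean .allocated attribute.
        Calls turn_off() once the domain has been idle for idle_timeout."""
        idle_since = None
        while True:
            if any(c.allocated for c in domain_cores):
                idle_since = None                # claim 22: cancel the timing
            elif idle_since is None:
                idle_since = time.monotonic()    # claim 21: start the timing
            elif time.monotonic() - idle_since >= idle_timeout:
                turn_off()                       # claim 21: timeout reached
                return
            time.sleep(poll)
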
23. The method of claim 1, further comprising:
in response to the task to be executed being completely processed by the target core, updating the task allocation state of the target core to an unallocated-task state.
24. The method of claim 1, further comprising:
in response to the target core being determined, detecting a power supply state of the power domain of the target core; and
controlling the processing of the task to be executed according to the power supply state of the power domain of the target core.
25. The method according to claim 24, wherein controlling the processing of the task to be executed according to the power supply state of the power domain of the target core comprises:
in response to the power supply state of the power domain of the target core being a fully-off state, turning on the power domain of the target core and detecting in real time whether the power domain of the target core enters a fully-on state; and
in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
26. The method according to claim 24, wherein controlling the processing of the task to be executed according to the power supply state of the power domain of the target core comprises:
in response to the power supply state of the power domain of the target core being a powering-on state, detecting in real time whether the power domain of the target core enters a fully-on state; and
in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
27. The method according to claim 24, wherein controlling the processing of the task to be executed according to the power supply state of the power domain of the target core comprises:
in response to the power supply state of the power domain of the target core being a powering-down state, detecting in real time whether the power domain of the target core enters a fully-off state;
in response to the power domain of the target core entering the fully-off state, turning on the power domain of the target core and detecting in real time whether the power domain of the target core enters a fully-on state; and
in response to the power domain of the target core entering the fully-on state, instructing the target core to process the task to be executed.
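
Claims 25-27 read as a small state machine over the power supply state of the target core's power domain; the sketch below assumes hypothetical state names and a wait_for() helper, neither of which comes from the specification.

    # State-machine sketch of claims 25-27: bring the domain to fully-on
    # before dispatching the task to the target core.
    FULLY_ON, FULLY_OFF, POWERING_ON, POWERING_DOWN = range(4)

    def dispatch_when_powered(domain, run_task):
        """domain: object with .state, .turn_on(), and .wait_for(state)."""
        if domain.state == FULLY_OFF:            # claim 25
            domain.turn_on()
            domain.wait_for(FULLY_ON)
        elif domain.state == POWERING_ON:        # claim 26
            domain.wait_for(FULLY_ON)
        elif domain.state == POWERING_DOWN:      # claim 27
            domain.wait_for(FULLY_OFF)
            domain.turn_on()
            domain.wait_for(FULLY_ON)
        run_task()                               # domain is now fully on
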
28. A core scheduling apparatus for a multi-core system, the apparatus comprising:
a receiving module configured to receive a task execution request from a target application, the task execution request including a task load level of a task to be executed;
a first obtaining module configured to obtain a hierarchical task allocation pattern for the multi-core system, the hierarchical task allocation pattern comprising a correspondence between each core in a schedulable core region of the multi-core system and a plurality of expected task load levels, the plurality of expected task load levels being related to the target application;
a first determining module configured to determine a scheduling priority of each core in the schedulable core region according to a positional relationship between each core in the schedulable core region and a predetermined core;
a second obtaining module configured to acquire a task allocation state of each core in the schedulable core region; and
a second determining module configured to determine, from the schedulable core region, a target core for processing the task to be executed according to the task load level of the task to be executed, the hierarchical task allocation pattern, and the scheduling priority and task allocation state of each core in the schedulable core region.
29. A computing device, comprising:
a memory and a processor,
wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-27.
30. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed, implement the method of any one of claims 1-27.
31. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-27.
CN202211713989.1A 2022-12-30 2022-12-30 Core scheduling method and device for multi-core system Active CN115686873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211713989.1A CN115686873B (en) 2022-12-30 2022-12-30 Core scheduling method and device for multi-core system

Publications (2)

Publication Number Publication Date
CN115686873A true CN115686873A (en) 2023-02-03
CN115686873B CN115686873B (en) 2023-04-07

Family

ID=85056513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211713989.1A Active CN115686873B (en) 2022-12-30 2022-12-30 Core scheduling method and device for multi-core system

Country Status (1)

Country Link
CN (1) CN115686873B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090288092A1 (en) * 2008-05-15 2009-11-19 Hiroaki Yamaoka Systems and Methods for Improving the Reliability of a Multi-Core Processor
CN106991071A (en) * 2017-03-31 2017-07-28 联想(北京)有限公司 kernel dispatching method and system
CN108170526A (en) * 2017-12-06 2018-06-15 北京像素软件科技股份有限公司 Load capacity optimization method, device, server and readable storage medium storing program for executing
CN113934530A (en) * 2020-12-31 2022-01-14 技象科技(浙江)有限公司 Multi-core multi-queue task cross processing method, device, system and storage medium
CN112817428A (en) * 2021-01-25 2021-05-18 广州虎牙科技有限公司 Task running method and device, mobile terminal and storage medium
CN115033352A (en) * 2021-02-23 2022-09-09 阿里云计算有限公司 Task scheduling method, device and equipment for multi-core processor and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109110A (en) * 2023-04-11 2023-05-12 华能信息技术有限公司 Task scheduling method for service center
CN116109110B (en) * 2023-04-11 2023-06-23 华能信息技术有限公司 Task scheduling method for service center

Also Published As

Publication number Publication date
CN115686873B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111176828B (en) System on chip comprising multi-core processor and task scheduling method thereof
CN102521154B (en) For creating the dynamic memory allocation of low power section and reorientating
EP1318453A1 (en) Scheduling system, method and apparatus for a cluster
CN109906421B (en) Processor core partitioning based on thread importance
Hashem et al. MapReduce scheduling algorithms: a review
Yan et al. Energy-aware systems for real-time job scheduling in cloud data centers: A deep reinforcement learning approach
CN115686873B (en) Core scheduling method and device for multi-core system
CN109388486B (en) Data placement and migration method for heterogeneous memory and multi-type application mixed deployment scene
Gandomi et al. HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework
CN110196851A (en) A kind of date storage method, device, equipment and storage medium
US10768684B2 (en) Reducing power by vacating subsets of CPUs and memory
Deng et al. A data and task co-scheduling algorithm for scientific cloud workflows
CN115686800B (en) Dynamic core scheduling method and device for multi-core system
Abedi et al. Dynamic resource allocation using improved firefly optimization algorithm in cloud environment
Li et al. Energy-aware scheduling on multiprocessor platforms
Hu et al. Adaptive energy-minimized scheduling of real-time applications in vehicular edge computing
CN115686871B (en) Core scheduling method and device for multi-core system
Kuo et al. Task assignment with energy efficiency considerations for non-DVS heterogeneous multiprocessor systems
US20230289223A1 (en) Task scheduling method, game engine, device and storage medium
CN115981819B (en) Core scheduling method and device for multi-core system
Ki et al. Co-optimizing CPU voltage, memory placement, and task offloading for energy-efficient mobile systems
Liu et al. Energy‐aware virtual machine consolidation based on evolutionary game theory
Ghose et al. Scheduling real time tasks in an energy-efficient way using VMs with discrete compute capacities
Majumder et al. Energy-aware real-time tasks processing for fpga-based heterogeneous cloud
Yang et al. 0–1 ILP-based run-time hierarchical energy optimization for heterogeneous cluster-based multi/many-core systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant