WO2024114728A1 - A heterogeneous processor and related scheduling method
- Publication number
- WO2024114728A1 (PCT/CN2023/135404)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- performance
- target
- processing core
- information
- scenario
Classifications

- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F18/24—Classification techniques
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of multi-core processors, and in particular to a heterogeneous processor and a related scheduling method.
- a multi-core heterogeneous processor refers to a hardware platform that contains multiple different types of processor cores.
- a multi-core heterogeneous processor may include performance cores, energy efficiency cores, and low-power cores.
- the performance core can provide high performance, high throughput, and low latency, and can be used for scenarios in which the processor system requires high performance;
- the energy efficiency core can provide the best energy-efficiency ratio, and can be used for scenarios in which the processor system requires good performance at lower energy consumption;
- the low-power core can provide the lowest power consumption, and can be used for scenarios in which the processor system requires the lowest power consumption.
- the large and small core scheduling of multi-core heterogeneous processors perceives in real time the thread loads running on different processor cores, determines a scheduling strategy under performance and energy-consumption constraints, and assigns different types of threads to processor cores with different energy efficiencies for execution, so as to obtain optimal energy efficiency or optimal performance for the processor system.
- however, because the performance and energy efficiency of different types of thread loads on different types of processor cores are highly nonlinear and time-varying across scenarios, it is difficult to accurately predict the performance of threads on different types of processor cores.
- the technical problem to be solved by the embodiments of the present application is how to provide a heterogeneous processor and a related scheduling method to determine the performance of threads on different types of processor cores, and then determine the scheduling strategy of the threads based on the performance, thereby improving the performance of the heterogeneous processor or improving the energy efficiency of the heterogeneous processor.
- an embodiment of the present application provides a heterogeneous processor, characterized in that the heterogeneous processor includes multiple processing cores of different sizes, and each of the multiple processing cores includes a scenario classifier and a performance predictor, wherein the scenario classifier in the first processing core is used to: obtain operation information of the first processing core, the operation information including one or more of instruction stream characteristics and memory operation characteristics of the target business run by the first processing core, the first processing core being any one of the multiple processing cores; and determine a target scenario of the target business from multiple preset scenarios based on the operation information; the performance predictor in the first processing core is used to: predict target performance information based on the target scenario and the operation information, the target performance information being performance prediction information corresponding to the target business.
- the operation information of the thread of the target business currently being run by the processing core can be obtained through the scenario classifier in the processing core (which may include instruction stream characteristics, data access characteristics, memory operation characteristics, etc. of the target business being run by the processing core), so as to determine the business scenario of the target business based on the operation information.
- the performance predictor in the processing core can predict the performance of migrating the thread of the current processing core to other processor cores of different types based on the operation information of the current processing core and the business scenario of the target business, so that the scheduling strategy of the thread can be determined based on the prediction result, thereby improving the performance of heterogeneous processors or improving the energy efficiency of heterogeneous processors.
- in contrast, coarse-grained performance prediction that is based only on information such as the processing core utilization rate of the application (i.e., the business) collected by the processor subsystem cannot meet real-time requirements, and thus leads to scheduling losses in performance and energy efficiency.
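As an illustration of the per-core pipeline described above, the following minimal Python sketch shows one sampling period on one core: sample operation information, classify the scenario, predict migration performance, and hand the result to the scheduler. All names, feature choices, and thresholds here are hypothetical assumptions, not taken from the disclosure.

```python
# A minimal, runnable sketch of the classify-then-predict pipeline.
# The toy features, scenarios and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class OperationInfo:
    instructions_per_cycle: float   # instruction-stream feature
    cache_miss_rate: float          # memory-operation feature

def classify(info: OperationInfo) -> str:
    # Stand-in scenario classifier; a trained model in practice.
    return "game" if info.cache_miss_rate < 0.1 else "video"

def predict(scenario: str, info: OperationInfo) -> dict[str, str]:
    # Stand-in performance predictor; a per-scenario algorithm in practice.
    gain = ("strong improvement" if info.instructions_per_cycle > 1.5
            else "basically unchanged")
    return {"performance_core": gain, "low_power_core": "slight decrease"}

info = OperationInfo(instructions_per_cycle=1.8, cache_miss_rate=0.05)
print(predict(classify(info), info))   # OS would consume this result
```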
- the performance predictor in each processing core includes multiple performance sub-predictors and a target selector, and each of the multiple performance sub-predictors corresponds to a performance prediction algorithm for the preset scenario.
- the performance predictor in the first processing core is specifically used to: determine the corresponding performance information through each performance sub-predictor based on the performance prediction algorithm corresponding to the preset scenario and the operating information of the first processing core; obtain multiple performance information corresponding to the multiple performance sub-predictors respectively through the target selector, and determine the target performance information from the multiple performance information based on the target scenario.
- since each performance sub-predictor corresponds to a performance prediction algorithm for a preset scenario, each performance sub-predictor can predict the performance of migrating the thread of the current processing core to other processor cores of different types based on its own performance prediction algorithm and the operation information (it should be emphasized that each type of performance prediction algorithm independently selects its own processor events and storage subsystem events to determine this performance). Furthermore, the target selector can select one of the multiple prediction results as the target prediction result based on the business scenario output by the scenario classifier, so as to more accurately predict the performance of migrating the thread of the current processing core to other processor cores of different types.
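A minimal sketch of this "run all sub-predictors, then select" arrangement follows; the event names and weights are hypothetical assumptions. Each sub-predictor operates over its own independently selected events, and the target selector keeps the result matching the classified scenario.

```python
# Sketch of multiple sub-predictors plus a target selector.
import numpy as np

class SubPredictor:
    def __init__(self, event_names, weights):
        self.event_names = event_names      # events this algorithm selects
        self.weights = np.asarray(weights)

    def predict(self, events: dict) -> float:
        x = np.array([events[n] for n in self.event_names])
        return float(self.weights @ x)      # predicted gain on target core

sub_predictors = {
    "game":  SubPredictor(["ipc", "branch_miss"], [0.9, -0.4]),
    "video": SubPredictor(["ipc", "mem_bandwidth"], [0.5, 0.3]),
}

def target_selector(scenario: str, events: dict) -> float:
    # Every sub-predictor produces a result; keep the scenario's one.
    results = {name: p.predict(events) for name, p in sub_predictors.items()}
    return results[scenario]

events = {"ipc": 1.6, "branch_miss": 0.02, "mem_bandwidth": 0.7}
print(target_selector("game", events))
```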
- the performance predictor in the first processing core is specifically used to: determine a target performance prediction algorithm corresponding to the target scenario from a plurality of performance prediction algorithms, wherein the plurality of performance prediction algorithms correspond one-to-one to the plurality of preset scenarios; and determine the target performance information based on the operating information of the first processing core and the target performance prediction algorithm.
- the performance predictor since the performance predictor includes performance prediction algorithms corresponding to multiple preset scenarios, it is possible to first select a performance prediction algorithm corresponding to the business scenario from multiple performance prediction algorithms based on the business scenario output by the scenario classifier, and then make predictions based on the performance prediction algorithm and operating information, thereby being able to more accurately predict the performance of migrating the threads of the current processing core to other processor cores of different types.
- the performance predictor in the first processing core is further used to: determine the multiple preset scenarios and the multiple performance prediction algorithms corresponding to the multiple preset scenarios based on sample data, and the sample data includes instruction stream characteristics and memory operation characteristics corresponding to the multiple preset scenarios.
- the sample data can be understood as instruction stream features and memory operation features corresponding to multiple scenarios respectively, and the sample data can be trained according to the training algorithm to obtain multiple preset scenarios and a performance prediction algorithm corresponding to each preset scenario.
- an operating system is run on the heterogeneous processor, and the performance predictor in the first processing core is further used to: send the target performance information to the operating system; the operating system is used to: determine a second processing core from the multiple processing cores based on the target performance information, and schedule the target service to the second processing core for processing.
- the processing core may send the prediction result to the operating system, and the operating system may determine the scheduling strategy of the thread based on the prediction result, thereby improving the performance of the heterogeneous processor.
- the first processing core also includes a first sampling unit and a second sampling unit
- the first sampling unit is used to: obtain the instruction stream characteristics of the first processing core running the target business, the instruction stream characteristics including one or more of instruction type, number of instructions, and processor core dynamic events;
- the second sampling unit is used to: obtain the memory operation characteristics of the first processing core running the target business, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy rate.
- the first sampling unit can be understood as a processor event sampling unit, which can be used to collect the instruction flow characteristics of the target business currently running on the first processing core, that is, the type of instructions and the number of instructions executed by the first processing core in the process of running the target business (which can be understood as information inside the processing core);
- the second sampling unit can be understood as a storage subsystem event sampling unit, which can be used to collect the memory operation characteristics of the target business currently running on the first processing core (which can be understood as information outside the processing core), that is, the access bandwidth, access delay, miss rate, request queue occupancy rate and other information when the first processing core accesses the storage subsystem.
- the scenario classifier in the first processing core can determine the business scenario of the target business based on the information such as the type of instructions and the number of instructions executed by the first processing core when running the target business and the access bandwidth, access delay, miss rate, request queue occupancy rate and other information when the first processing core accesses the storage subsystem, and then can predict the performance of scheduling the target business to other processing cores based on the business scenario of the target business.
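For illustration, the two feature groups collected by the sampling units can be modelled as follows. The field names mirror the characteristics listed above, while the container types themselves are assumptions.

```python
# Modelling the outputs of the two sampling units as two feature groups.
from dataclasses import dataclass

@dataclass
class InstructionStreamFeatures:      # first sampling unit (inside the core)
    instruction_types: dict           # counts per instruction type
    instruction_count: int
    core_dynamic_events: dict         # e.g. branch-predictor misses

@dataclass
class MemoryOperationFeatures:        # second sampling unit (storage subsystem)
    access_bandwidth: float           # bytes/s to the storage subsystem
    access_latency: float             # average access latency
    miss_rate: float                  # cache miss rate
    request_queue_occupancy: float    # fraction of queue entries in use

sample = (InstructionStreamFeatures({"load": 120, "alu": 300}, 420,
                                    {"branch_miss": 7}),
          MemoryOperationFeatures(3.2e9, 85.0, 0.04, 0.3))
print(sample)
```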
- the present application provides a scheduling method, characterized in that it is applied to a heterogeneous processor, the heterogeneous processor includes multiple processing cores of different sizes, and each of the multiple processing cores includes a scenario classifier and a performance predictor, the method includes: obtaining operation information of the first processing core through the scenario classifier in the first processing core, the operation information including one or more of instruction stream characteristics and memory operation characteristics of the target business running by the first processing core, and the first processing core is any one of the multiple processing cores; determining a target scenario of the target business from multiple preset scenarios based on the operation information; predicting target performance information based on the target scenario and the operation information through the performance predictor in the first processing core, the target performance information being performance prediction information corresponding to the target business.
- the performance predictor in each processing core includes multiple performance sub-predictors and a target selector, and each of the multiple performance sub-predictors corresponds to a performance prediction algorithm for the preset scenario, and predicting the target performance information based on the target scenario and the operating information includes: determining the corresponding performance information through each performance sub-predictor based on the performance prediction algorithm corresponding to the preset scenario and the operating information of the first processing core; obtaining multiple performance information corresponding to the multiple performance sub-predictors respectively through the target selector, and determining the target performance information from the multiple performance information based on the target scenario.
- predicting the target performance information based on the target scenario and the operating information includes: determining a target performance prediction algorithm corresponding to the target scenario from a plurality of performance prediction algorithms, the plurality of performance prediction algorithms corresponding one-to-one to the plurality of preset scenarios; and determining the target performance information based on the operating information of the first processing core and the target performance prediction algorithm.
- the method further includes: determining, through the performance predictor in the first processing core, the multiple preset scenarios and the multiple performance prediction algorithms corresponding to the multiple preset scenarios based on sample data, wherein the sample data includes instruction stream characteristics and memory operation characteristics corresponding to the multiple preset scenarios.
- an operating system is run on the heterogeneous processor, and the method further includes: sending the target performance information to the operating system through the performance predictor in the first processing core; determining, through the operating system, a second processing core from the multiple processing cores based on the target performance information, and scheduling the target service to the second processing core for processing.
- the first processing core also includes a first sampling unit and a second sampling unit
- the method further includes: obtaining, through the first sampling unit, the instruction stream characteristics of the first processing core running the target business, the instruction stream characteristics including one or more of instruction type, number of instructions, and processor core dynamic events; and obtaining, through the second sampling unit, the memory operation characteristics of the first processing core running the target business, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy rate.
- the present application provides a computer storage medium, characterized in that the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method described in any one of the implementations of the second aspect above is implemented.
- the present application provides a chip system, which includes a processor for supporting an electronic device to implement the functions involved in the second aspect, for example, generating or processing the information involved in the scheduling method.
- the chip system also includes a memory, which is used to store program instructions and data necessary for the electronic device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- the present application provides a computer program product, characterized in that the computer program product includes instructions which, when executed by a computer, cause the computer to execute any one of the methods described in the second aspect above.
- FIG. 1 is a schematic diagram of the structure of a multi-core heterogeneous processor provided by an embodiment of the present invention.
- FIG. 2 is a schematic diagram of the structure of a heterogeneous processor provided by an embodiment of the present invention.
- FIG. 3 is a schematic diagram of the internal structure of a first processing core provided by an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a heterogeneous processor system provided by an embodiment of the present invention.
- FIG. 5 is a schematic diagram of the internal structure of a processing core provided by an embodiment of the present invention.
- FIG. 6 is a schematic diagram of the internal structure of another processing core provided by an embodiment of the present invention.
- FIG. 7 is a training flowchart of a performance prediction algorithm provided by an embodiment of the present invention.
- FIG. 8 is a schematic diagram of big and small core scheduling provided by an embodiment of the present invention.
- FIG. 9 is a flowchart of a scheduling method provided by an embodiment of the present invention.
- the embodiment of the present application provides a multi-core heterogeneous processor.
- Figure 1 is a schematic diagram of the structure of a multi-core heterogeneous processor provided by an embodiment of the present invention.
- the multi-core heterogeneous processor 101 refers to a chip that integrates multiple processor cores (also simply called cores). These processor cores have different functions and structures and are integrated into the same chip in an effective way, and application programs are assigned to different processor cores in an effective partitioning manner for parallel processing, thereby improving the performance of the processor system.
- the multi-core heterogeneous processor 101 can be located in any electronic device, such as a computer, a mobile phone, a tablet, a personal digital assistant, a smart wearable device, a smart car or a smart home appliance.
- the multi-core heterogeneous processor 101 can specifically be a chip or a chipset or a circuit board equipped with a chip or a chipset.
- the chip, chipset, or circuit board equipped with the chip or chipset can work under the necessary software driver.
- a processor core (core for short, also known as a kernel) is the most important component of a central processing unit (CPU). It is made of monocrystalline silicon using a certain production process, and all of the CPU's calculations, command reception, command storage, and data processing are performed by the processor cores.
- An operating system, a file system (such as a flash file system F2FS) or an application program can be run on multiple processor cores to control multiple hardware or software components connected to the processor, and various data can be processed and operations can be performed.
- Each processor core among the multiple processor cores can load instructions or data stored in a storage device (which can be understood as an external memory, such as a disk) into the internal memory 102, and fetch the instructions or data to be computed into the processor core for calculation. After the calculation is completed, the processor core temporarily stores the result in the internal memory 102, and stores the instructions or data that need to be kept long-term in the storage device (i.e., the external memory) through the controller 103.
- the memory in the multi-core heterogeneous processor 101 may be a cache memory (Cache).
- the Cache may include one or more of a first-level cache (L1 Cache), a second-level cache (L2 Cache), a third-level cache (L3 Cache), etc.
- the Cache may store instructions or data that have just been used or are used cyclically by the multi-core heterogeneous processor 101. If the multi-core heterogeneous processor 101 needs to use the instructions or data again, they can be called directly from the Cache, which avoids repeated accesses, reduces the waiting time of the processor cores, and improves the efficiency of the processor system. It is understandable that the processor cores 1011 through 101F may communicate via a bus or other coupling methods, which is not specifically limited here.
- each processor core is heterogeneous, that is, the structures of different processor cores (1011, 1012...101F) are different, and the processor cores can be divided into large cores, medium cores, small cores, etc. according to performance.
- the types of processor cores may include performance cores (also called large cores), energy efficiency cores (also called medium cores), and low power cores (also called small cores), etc., wherein the performance core can provide high performance, high throughput and low latency, and can be used to ensure the high performance scenario of the processor system; the energy efficiency core can provide the best energy efficiency ratio, and can be used to ensure the better performance and lower energy consumption scenario of the processor system; the low power core can provide the lowest power consumption, and can be used to ensure the lowest power consumption scenario of the processor system.
- the large and small core scheduling of the multi-core heterogeneous processor 101 is to perceive the thread loads running on different processor cores in real time, determine the thread scheduling strategy under the performance constraint and energy consumption constraint, and realize the allocation of different types of threads to processor cores with different energy efficiencies for execution, so as to obtain the best energy efficiency or the best performance of the processor system.
- the core problem of big and small core scheduling of the multi-core heterogeneous processor 101 is to collect the performance data of a thread (or process) running on the current processor core A, predict the performance if that thread (or process) were migrated to a processor core B of a different type, and then determine the scheduling strategy of the thread (or process) based on the prediction result, thereby improving the performance of the multi-core heterogeneous processor 101. How to predict the performance of threads (or processes) running on different processor cores will be described in detail later, so it is not repeated here.
- the internal memory 102 can be located outside the multi-core heterogeneous processor 101. It is usually volatile memory whose contents are lost when power is removed, and it may also be called the main memory.
- the internal memory 102 in the present application includes a readable and writable running memory, which is used to temporarily store the calculation data in multiple processor cores, and to interact with storage devices or other external memories. It can be used as a storage medium for temporary data of operating systems or other running programs. In the present application, the task scenario running on the current processor core can be predicted based on the data access characteristics of the processor core accessing the internal memory 102.
- the internal memory 102 may include one or more of a dynamic random access memory (DRAM), a static random access memory (SRAM), a synchronous dynamic random access memory (SDRAM), etc.
- DRAM includes double data rate synchronous dynamic random access memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM), referred to as DDR, second-generation double data rate synchronous dynamic random access memory (DDR2), third-generation double data rate synchronous dynamic random access memory (DDR3), fourth-generation low-power double data rate synchronous dynamic random access memory (Low Power Double Data Rate 4, LPDDR4) and fifth-generation low-power double data rate synchronous dynamic random access memory (Low Power Double Data Rate 5, LPDDR5), etc.
- the controller 103 is commonly used to manage and control the communication between the multi-core heterogeneous processor 101 and an external storage device (such as a disk, etc.), and provides a standardized interface (such as the universal flash storage UFS standard) for the communication between the multi-core heterogeneous processor 101 and the external storage device.
- the external storage device is not shown in FIG1 , but the multi-core heterogeneous processor 101 can be connected not only to the internal memory 102, but also to the external storage device.
- the controller 103 can transmit commands (such as write, read, and erase commands) and data to the external storage device.
- when sending, the controller 103 can convert the commands or data into data packets that support a certain protocol by encapsulation; when receiving, the controller 103 performs the reverse operations on the data received by the heterogeneous processor 101.
- the structure of the multi-core heterogeneous processor 101 in FIG. 1 is only an exemplary implementation provided in the embodiments of the present invention.
- the structure of the multi-core heterogeneous processor 101 in the embodiments of the present invention includes but is not limited to the above implementations.
- FIG 2 is a schematic diagram of the structure of a heterogeneous processor provided in an embodiment of the present invention.
- the heterogeneous processor in the embodiment of the present invention will be described in detail in conjunction with Figure 2.
- the heterogeneous processor 200 can be used to predict the performance of threads on different types of processor cores, and can determine the scheduling strategy of the thread based on the prediction results, thereby improving the performance of the heterogeneous processor, or improving the energy efficiency of the heterogeneous processor.
- the heterogeneous processor 200 provided in an embodiment of the present invention may include part or all of the structure and functions of the multi-core heterogeneous processor 101 in Figure 1 above.
- the heterogeneous processor 200 may include but is not limited to multiple processing cores of different sizes, and each of the multiple processing cores includes a scene classifier and a performance predictor, wherein,
- the scene classifier 2011 in the first processing core 201 is used to obtain the operation information of the first processing core 201 .
- the operation information includes one or more of instruction flow characteristics, data access characteristics, and memory operation characteristics of the target service executed by the first processing core 201, and the first processing core 201 is any one of the multiple processing cores.
- the target service can be understood as any application.
- the performance of the multiple processing cores included in the heterogeneous processor 200 may be different, and they may be divided into large cores, medium cores, small cores, etc. according to the performance.
- the processing cores in the heterogeneous processor 200 may be performance cores (may be called large cores), energy efficiency cores (may be called medium cores), low power consumption cores (may be called small cores), etc., wherein the performance cores may provide high performance, high throughput and low latency, and may be used to ensure the high performance scenario of the processor system; the energy efficiency cores may provide the best energy efficiency ratio, and may be used to ensure the better performance and lower energy consumption scenario of the processor system; the low power consumption cores may provide the lowest power consumption, and may be used to ensure the lowest power consumption scenario of the processor system.
- the first processing core 201 may be any one of the multiple processing cores, and may run the thread of the target business on the first processing core 201.
- a currently optimal processing core may be selected from the multiple processing cores as the first processing core 201 according to the current operating status of each processing core. Further, the thread of the target service may be scheduled to the first processing core 201 for processing.
- the scenario classifier 2011 in the first processing core 201 can obtain the operation information of the first processing core 201 running the target business after a preset time period, so as to subsequently determine the business scenario of the target business, and then predict the performance of scheduling the target business to other processing cores based on the business scenario of the target business.
- the first processing core 201 also includes a first sampling unit 2013 and a second sampling unit 2014
- the first sampling unit 2013 is used to: obtain the instruction stream characteristics of the first processing core 201 running the target business, the instruction stream characteristics include one or more of instruction type, number of instructions, and processor core dynamic events
- the second sampling unit 2014 is used to: obtain the memory operation characteristics of the first processing core 201 running the target business, the memory operation characteristics include one or more of access bandwidth, access latency, miss rate, and request queue occupancy rate.
- the first processing core 201 may further include a first sampling unit 2013 and a second sampling unit 2014, wherein the first sampling unit 2013 may be understood as a processor event sampling unit, which may be used to collect the instruction stream characteristics of the target service currently running on the first processing core 201, that is, information such as the types of instructions executed by the first processing core 201 in the process of running the target service, the number of instructions, and processor core dynamic events, where the processor core dynamic events may include, but are not limited to, level-1 data or instruction cache misses, branch predictor misses, branch mispredictions, processor queue occupancy, data and instruction translation lookaside buffer (TLB) misses, issue bandwidth, pipeline stalls, hardware prefetch matches, and the like; the second sampling unit 2014 may be understood as a storage subsystem event sampling unit, which may be used to collect the memory operation characteristics of the target service currently running on the first processing core 201, that is, information such as the access bandwidth, access latency, miss rate, and request queue occupancy rate when the first processing core 201 accesses the storage subsystem.
- the scenario classifier 2011 in the first processing core 201 can determine the business scenario of the target business based on information such as the instruction type and number of instructions executed when the first processing core 201 runs the target business and information such as the access bandwidth, access latency, miss rate, and request queue occupancy rate when the first processing core 201 accesses the storage subsystem, and can then predict the performance of scheduling the target business to other processing cores based on the business scenario of the target business.
- each processing core in the heterogeneous processor 200 may include a processor event sampling unit and a storage subsystem event sampling unit, which are respectively used to obtain the instruction flow characteristics (which can be understood as information inside the processing core) and memory operation characteristics (which can be understood as information outside the processing core) of the corresponding processing core.
- the scenario classifier 2011 in the first processing core 201 is further configured to determine a target scenario for the target service from a plurality of preset scenarios based on the operation information.
- the multiple preset scenes can be game scenes, reading scenes, video scenes, life service scenes, etc., and the types of preset scenes are not specifically limited in this application.
- after the scene classifier 2011 in the first processing core 201 obtains the operation information of the first processing core 201, it can analyze characteristics such as the types and number of instructions executed by the first processing core 201 when running the target business, as well as characteristics such as the access bandwidth, access latency, miss rate, and request queue occupancy rate when the first processing core 201 accesses the storage subsystem, and then determine the business scenario of the target business from the multiple preset scenarios based on the above characteristics (for example, that the target business is a game business), and then predict the performance of scheduling the target business to other processing cores based on the business scenario of the target business.
- FIG. 4 is a schematic diagram of a heterogeneous processor system provided by an embodiment of the present invention, in which the heterogeneous processor may include a performance core (large core), an energy-efficient core 1 (medium core), an energy-efficient core 2 (medium core), a low-power core 1 (small core), a low-power core 2 (small core), a low-power core 3 (small core), a low-power core 4 (small core), etc.
- the large core and the medium core can interact with the secondary cache during operation
- the small core can interact with the tertiary cache during operation.
- the thread of the target business can run on any processing core, and it can be assumed here that the target business initially runs on the energy-efficient core 1.
- the scenario classifier in the energy-efficient core 1 can classify according to the characteristics of the thread load, and each classification represents a different scenario.
- the input of the scenario classifier is partly processor events and partly storage subsystem events, such as the events generated by the instruction cache, the data cache, the secondary cache, the tertiary cache, and the memory controller.
- the scenario classifier can determine the business scenario of the target business based on the events of the processor and the events of the storage subsystem, and then can predict the performance of scheduling the target business to other processing cores based on the business scenario of the target business. In some embodiments, it is only necessary to predict the performance of scheduling the target business to other types of processor cores, and it is not necessary to predict the performance of scheduling the target business to the same type of processor cores.
- the scene classifier in the processing core can be implemented through software or hardware, and is not specifically limited in this application.
- the scene classifier in the processing core can be a linear classifier, which is used to classify scenes.
- the input of the linear classifier is events from the processor and storage subsystem, and the output of the linear classifier is the scene category of the target business.
- the number of scene categories can be determined by the accuracy requirements of the classification, the computing power of the system, etc.
- the scene classifier in the processing core can also be implemented using a perceptron model, using online feedback to correct the perceptron; the scene classifier in the processing core can also be implemented using an artificial neural network; the scene classifier in the processing core can also be provided using an offline trained model.
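A minimal sketch of such a linear scenario classifier with perceptron-style online correction follows; the dimensions, learning rate, and update rule details are illustrative assumptions rather than the classifier disclosed here.

```python
# Multi-class linear scenario classifier with online perceptron feedback.
import numpy as np

class LinearScenarioClassifier:
    def __init__(self, n_events: int, n_scenarios: int, lr: float = 0.1):
        self.W = np.zeros((n_scenarios, n_events))   # one row per scenario
        self.lr = lr

    def classify(self, events: np.ndarray) -> int:
        return int(np.argmax(self.W @ events))

    def feedback(self, events: np.ndarray, true_scenario: int) -> None:
        # Online correction: pull the true scenario's weights toward the
        # sample, push the wrongly chosen scenario's weights away from it.
        pred = self.classify(events)
        if pred != true_scenario:
            self.W[true_scenario] += self.lr * events
            self.W[pred] -= self.lr * events

clf = LinearScenarioClassifier(n_events=4, n_scenarios=3)
x = np.array([1.6, 0.02, 0.7, 0.3])   # processor + storage-subsystem events
clf.feedback(x, true_scenario=1)
print(clf.classify(x))
```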
- the performance predictor 2012 in the first processing core 201 is used to predict the target performance information corresponding to the target business based on the target scenario and the operation information.
- the target performance information may include performance prediction information for scheduling the target service to the multiple processing cores.
- the target performance information may be understood as the predicted performance if the thread of the current processing core were migrated to other processing cores of different types, and the result may be a correlation coefficient or an enumeration value, such as strong improvement, slight improvement, basically unchanged, slight decrease, or strong decrease.
- the performance prediction information may be understood as the performance comparison result of another processing core and the current processing core assuming that the thread of the current processing core is migrated to another processing core, such as migrating the thread on the energy efficiency core 1 to the performance core, the performance will be strongly improved, etc.
- the performance of migrating the thread of the current processing core to other processing cores of different types is judged by the performance predictor 2012; the performance predictor 2012 may take the output of the scenario classifier 2011 as input, and may select the corresponding performance prediction algorithm according to the input scenario type of the target service to predict the performance of scheduling the target service to other processing cores.
- each type of performance prediction algorithm will independently select its own processor events and storage subsystem events to determine the performance of migrating the threads of the current processing core to other processor cores of different types.
- the performance predictor in the processing core can be implemented by software or hardware, which is not specifically limited in this application.
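For example, a predicted performance ratio could be quantized into the enumeration mentioned above as follows; the thresholds are illustrative assumptions only.

```python
# Quantizing a predicted performance ratio into enumerated levels.
def to_enumeration(ratio: float) -> str:
    """ratio = predicted performance on target core / current performance."""
    if ratio >= 1.30:
        return "strong improvement"
    if ratio >= 1.05:
        return "slight improvement"
    if ratio > 0.95:
        return "basically unchanged"
    if ratio > 0.70:
        return "slight decrease"
    return "strong decrease"

print(to_enumeration(1.4), to_enumeration(0.9))
```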
- Figure 5 is a schematic diagram of the internal structure of a processing core provided by an embodiment of the present invention.
- the performance predictor 2012 in each processing core includes multiple performance sub-predictors and a target selector, and each of the multiple performance sub-predictors corresponds to a performance prediction algorithm for the preset scenario.
- the performance predictor 2012 in the first processing core 201 is specifically used to: determine the corresponding performance information through each performance sub-predictor based on the performance prediction algorithm corresponding to the preset scenario and the operating information of the first processing core 201; obtain the multiple performance information corresponding to the multiple performance sub-predictors through the target selector, and determine the target performance information from the multiple performance information based on the target scenario.
- the processor event sampling unit in the first processing core 201 may be used to collect instruction stream features (such as various processor events) of the first processing core 201 running the target business
- the storage subsystem event sampling unit in the first processing core 201 may be used to collect the memory operation characteristics (such as various storage subsystem events) of the first processing core 201 running the target business, so that the scenario classifier 2011 and each performance sub-predictor can obtain the operation information of the first processing core 201 running the target business.
- each performance sub-predictor corresponds to a performance prediction algorithm of a preset scenario
- each performance sub-predictor can predict the performance of migrating the thread of the current processing core to other processor cores of different types based on its own performance prediction algorithm and operation information (it should be emphasized that each type of performance prediction algorithm will independently select its own processor events and storage subsystem events to determine the performance of migrating the thread of the current processing core to other processor cores of different types).
- the target selector can select one from multiple prediction results as the target prediction result based on the business scenario output by the scenario classifier 2011, so as to more accurately predict the performance of migrating the thread of the current processing core to other processor cores of different types.
- Figure 6 is a schematic diagram of the internal structure of another processing core provided by an embodiment of the present invention.
- the performance predictor 2012 in the first processing core 201 is specifically used to: determine the target performance prediction algorithm corresponding to the target scenario from multiple performance prediction algorithms, and the multiple performance prediction algorithms correspond one-to-one to the multiple preset scenarios; determine the target performance information based on the operating information of the first processing core 201 and the target performance prediction algorithm.
- the processor event sampling unit in the first processing core 201 can collect the instruction stream characteristics (such as various processor events) of the first processing core 201 running the target business, and the storage subsystem event sampling unit in the first processing core 201 can also collect the memory operation characteristics (such as various storage subsystem events) of the first processing core 201 running the target business, so that the scenario classifier 2011 and the performance predictor 2012 can obtain the operation information of the first processing core 201 running the target business.
- the performance prediction algorithm corresponding to the business scenario can be selected from multiple performance prediction algorithms according to the business scenario output by the scenario classifier 2011, and then the performance of migrating the thread of the current processing core to other processor cores of different types can be predicted based on the performance prediction algorithm and the operation information (it should be emphasized that each type of performance prediction algorithm will independently select its own processor event and storage subsystem event to determine the performance of migrating the thread of the current processing core to other processor cores of different types).
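A minimal sketch of this select-then-predict arrangement follows, in contrast to the run-all-then-select arrangement sketched earlier; the algorithm bodies and event choices are hypothetical assumptions.

```python
# Select the prediction algorithm by scenario first, then run only it.
def predict_game(events):       # each algorithm selects its own events
    return 1.0 + 0.5 * events["ipc"] - events["branch_miss"]

def predict_video(events):
    return 1.0 + 0.2 * events["mem_bandwidth"]

PREDICTION_ALGORITHMS = {       # one-to-one with the preset scenarios
    "game": predict_game,
    "video": predict_video,
}

def predict_target_performance(scenario: str, events: dict) -> float:
    algorithm = PREDICTION_ALGORITHMS[scenario]   # select first ...
    return algorithm(events)                      # ... then predict

print(predict_target_performance(
    "game", {"ipc": 1.6, "branch_miss": 0.02, "mem_bandwidth": 0.7}))
```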
- the performance predictor 2012 in the first processing core 201 is further used to: determine the multiple preset scenarios and the multiple performance prediction algorithms corresponding to the multiple preset scenarios based on sample data, and the sample data includes instruction stream characteristics and memory operation characteristics corresponding to the multiple preset scenarios.
- the sample data can be understood as instruction stream features and memory operation features corresponding to multiple scenarios respectively, and then the sample data can be trained according to the training algorithm to obtain multiple preset scenarios and the performance prediction algorithm corresponding to each preset scenario.
- FIG. 7 is a training flow chart of a performance prediction algorithm provided by an embodiment of the present invention.
- the preset scenarios and the performance prediction algorithms in the figure can be jointly trained so that the scenario classification and the performance prediction are co-optimized for accuracy.
- the detailed process is as follows:
- Step S301 Initialize the scene classification algorithm. Specifically, other algorithms may be used to obtain an initial scene classification algorithm through feature clustering.
- Step S302 Use a scene classification algorithm to classify samples into N scenes.
- the scene classification algorithm can be used to classify the load, and the training sample points can be divided into N categories.
- Step S303 Take the samples of each scene as input for training, obtaining a total of N independent performance prediction algorithms. Specifically, the sample points divided into N categories are independently used for the regression of each category's performance prediction algorithm, yielding N performance prediction algorithms.
- Step S304 Determine whether the prediction error meets the requirements. Specifically, each performance prediction algorithm calculates the error on the sample points of its own group. When the prediction error meets the requirements, the iterative training process exits to step S308; if the prediction error does not meet the requirements, the process proceeds to step S305.
- Step S305 Take all samples as input, use N performance prediction algorithms to make independent predictions, and calculate the prediction errors. Specifically, take all sample points as input, and the N performance prediction algorithms trained in step S303 make independent predictions on all sample points, and calculate the errors of each sample point under different performance prediction algorithms.
- Step S306 Regroup according to the prediction error. Specifically, according to the errors calculated in step S305, each sample point is assigned to the group with the smallest error. In this way, all sample points are regrouped into N groups.
- Step S307 Use the grouping information to train the scene classification algorithm to obtain a new scene classification algorithm. Specifically, the grouping information of the sample points in step S306 can be used as input to train a new scene classification algorithm.
- the new scene classification algorithm trained in step S307 re-enters step S302, and the entire process iterates until the error requirement of step S304 is met and the iteration ends.
- Step S308 Output the scene classification algorithm and the corresponding performance prediction algorithms.
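The S301-S308 loop can be illustrated with the following runnable toy, which alternates per-scenario regression with error-based regrouping until the grouping stabilizes. The synthetic data, least-squares regressors, and stopping rule are illustrative assumptions, and the label array stands in for the retrained scene classification algorithm of step S307.

```python
# Toy joint training of scenario groups and per-scenario predictors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # sampled event features
# Two synthetic load regimes with different performance laws.
y = np.where(X[:, 0] > 0, X @ np.array([2.0, 1.0, 0.0]),
                          X @ np.array([0.0, 1.0, 2.0]))

N = 2                                               # number of scenarios
labels = rng.integers(0, N, size=len(X))            # S301: rough initial grouping

for _ in range(20):                                 # S302-S307 iteration
    # S303: fit one linear performance predictor per scenario group.
    models = []
    for k in range(N):
        mask = labels == k
        w = (np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
             if mask.any() else np.zeros(X.shape[1]))
        models.append(w)
    # S305: every model predicts every sample; compute per-model errors.
    errors = np.stack([np.abs(X @ w - y) for w in models], axis=1)
    # S306: regroup each sample into the scenario with the smallest error.
    new_labels = errors.argmin(axis=1)
    # S304: stop once the grouping (and hence the error) has stabilized.
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels                             # S307: new grouping stands
                                                    # in for classifier retrain

print("mean best-model error:", errors.min(axis=1).mean())   # S308: output
```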
- an operating system is run on the heterogeneous processor 200, and the performance predictor 2012 in the first processing core 201 is further used to: send the target performance information to the operating system; the operating system is used to: determine a second processing core from the multiple processing cores based on the target performance information, and schedule the target service to the second processing core for processing.
- the second processing core is one of the plurality of processing cores. In some embodiments, the second processing core is different from the first processing core. If the operating system determines that the thread needs to be scheduled based on the target performance information, one of the remaining processing cores except the first processing core can be determined as the second processing core. In some embodiments, the second processing core can be the same processing core as the first processing core. If the operating system determines that the thread does not need to be scheduled based on the target performance information, the second processing core is the previous first processing core, indicating that the thread does not need to be scheduled.
- the processing core can send the prediction result to the operating system, and then the operating system can determine the scheduling strategy of the thread based on the prediction result, thereby improving the performance of the heterogeneous processor or improving the energy efficiency of the heterogeneous processor.
- the operating system can determine one of the multiple processing cores as the target processing core (i.e., the second processing core) based on the prediction result. During this process, the operating system can make decisions based on factors such as the frequency and voltage of the processing core.
- a linear regression model is used to predict the effect of processor frequency on performance, and can also be used to predict the effect of storage subsystem frequency on performance.
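For instance, such a frequency-sensitivity model can be fitted with ordinary least squares; the sample operating points below are made up for illustration.

```python
# Linear regression from core frequency to measured performance.
import numpy as np

freq_ghz = np.array([1.0, 1.5, 2.0, 2.5, 3.0])    # sampled operating points
perf     = np.array([1.0, 1.4, 1.75, 2.0, 2.2])   # measured throughput

slope, intercept = np.polyfit(freq_ghz, perf, 1)
print(f"predicted performance at 2.8 GHz: {slope * 2.8 + intercept:.2f}")
```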
- FIG8 is a schematic diagram of a large and small core scheduling provided by an embodiment of the present invention, in which the heterogeneous processor may include a processor large and small core scheduling control unit 2000, a processor large and small core scheduling control unit 3000, and an operating system decision module 5000.
- the operating system decision module 5000 is a software part.
- the processor large and small core scheduling control unit 2000 and the processor large and small core scheduling control unit 3000 can be implemented by software or hardware.
- the processor event sampling unit and the storage subsystem event sampling unit in the processor large and small core scheduling control unit 2000 sample and collect events to obtain the status of the processor and the storage subsystem, and to identify the business characteristics of the instruction stream executed in the processor system.
- the scenario classifier in the processor large and small core scheduling control unit 2000 receives the processor and storage subsystem events sampled by the processor event sampling unit and the storage subsystem event sampling unit, classifies the load scenario, and outputs the scenario classification of the load to the next-level performance prediction algorithm selector.
- Performance sub-predictor 1, performance sub-predictor 2, performance sub-predictor 3, and performance sub-predictor 4 are performance predictors for different scenarios.
- the performance predictor predicts, according to the different performance prediction algorithms, the performance of migrating the current processor's thread to other processor cores of different types, and outputs that predicted performance.
- the result can be a correlation coefficient or an enumeration, such as: strong improvement, slight improvement, basically unchanged, slight decrease, strong decrease, etc.
- the operating system decision module 5000 collects the thread scheduling result prediction information of all cores. By summarizing the prediction information of the thread scheduling of each core, the overall scheduling method is determined.
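As a rough, non-authoritative illustration of how such a decision module could summarize per-core predictions, the following Python sketch enumerates the prediction results named above and migrates a thread only when some other core type is predicted to improve its performance. All names, data shapes, and the migration threshold here are illustrative assumptions, not part of the disclosure.

```python
from enum import Enum

class PerfChange(Enum):
    """Enumerated prediction result, mirroring the levels named above."""
    STRONG_IMPROVEMENT = 2
    SLIGHT_IMPROVEMENT = 1
    BASICALLY_UNCHANGED = 0
    SLIGHT_DECREASE = -1
    STRONG_DECREASE = -2

def decide_schedule(predictions, threshold=PerfChange.SLIGHT_IMPROVEMENT.value):
    """Pick, per thread, the target core type with the best predicted change.

    `predictions` maps (core_id, thread_id) -> {target_core_type: PerfChange}.
    Threads whose best option does not reach `threshold` stay where they are.
    """
    decisions = {}
    for (core_id, thread_id), options in predictions.items():
        best_type, best = max(options.items(), key=lambda kv: kv[1].value)
        if best.value >= threshold:
            decisions[(core_id, thread_id)] = best_type
    return decisions

# Example: the thread on core 0 is predicted to strongly improve on a big core.
preds = {(0, 101): {"big": PerfChange.STRONG_IMPROVEMENT,
                    "little": PerfChange.STRONG_DECREASE}}
print(decide_schedule(preds))  # {(0, 101): 'big'}
```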
- The scenario classifier in a processing core can obtain the operation information of the thread of the target service running on the current processing core (which may include the instruction stream characteristics and the memory operation characteristics of the processing core running the target service), so as to determine the service scenario of the target service based on the operation information.
- The performance predictor in the processing core can then predict, based on the operation information of the current processing core and the service scenario of the target service, the performance of migrating the thread of the current processing core to other processor cores of different types, so that the thread's scheduling strategy can be determined based on the prediction results, thereby improving the performance or the energy efficiency of the heterogeneous processor.
- By contrast, coarse-grained performance prediction based only on information such as the processing core utilization of the application (i.e., the service) collected by the processor subsystem cannot meet real-time requirements, so scheduling loses performance and energy efficiency.
- FIG. 9 is a flowchart of a scheduling method provided by an embodiment of the present invention.
- The method is applicable to the heterogeneous processor in FIG. 2 above and to a device including that heterogeneous processor, wherein the heterogeneous processor includes multiple processing cores of different sizes, and each of the multiple processing cores includes a scenario classifier and a performance predictor.
- The method may include the following steps S401-S403, described in detail as follows:
- Step S401: Obtain operation information of the first processing core through the scenario classifier in the first processing core.
- The operation information includes one or more of the instruction stream characteristics and the memory operation characteristics of the target service executed by the first processing core, and the first processing core is any one of the multiple processing cores.
- Step S402: Determine, through the scenario classifier in the first processing core, a target scenario of the target service from a plurality of preset scenarios based on the operation information.
- Step S403: Predict, through the performance predictor in the first processing core, target performance information corresponding to the target service based on the target scenario and the operation information.
- The target performance information includes performance prediction information for scheduling the target service to the multiple processing cores.
- The performance predictor in each processing core includes multiple performance sub-predictors and a target selector, each of the multiple performance sub-predictors corresponding to a performance prediction algorithm for one preset scenario. Predicting the target performance information corresponding to the target service based on the target scenario and the operation information includes: determining the corresponding performance information through each performance sub-predictor based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtaining, through the target selector, the multiple pieces of performance information respectively corresponding to the multiple performance sub-predictors, and determining the target performance information from the multiple pieces of performance information based on the target scenario.
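A minimal sketch of this sub-predictor/target-selector structure might look as follows in Python; the per-scenario models, field names, and coefficients are hypothetical stand-ins for the scenario-specific prediction algorithms, not values taken from the disclosure.

```python
class PerformancePredictor:
    """Sketch: every sub-predictor runs; the selector keeps one output."""

    def __init__(self, sub_predictors):
        # sub_predictors: dict mapping scenario name -> prediction function
        self.sub_predictors = sub_predictors

    def predict(self, target_scenario, operation_info):
        # Every sub-predictor evaluates the same operation information ...
        candidates = {scenario: predict_fn(operation_info)
                      for scenario, predict_fn in self.sub_predictors.items()}
        # ... and the target selector picks the one matching the scenario.
        return candidates[target_scenario]

predictor = PerformancePredictor({
    "game":  lambda info: 1.8 * info["ipc"],                        # toy models
    "video": lambda info: 1.2 * info["ipc"] - 0.1 * info["miss_rate"],
})
print(predictor.predict("game", {"ipc": 1.5, "miss_rate": 0.2}))    # 2.7
```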
- Alternatively, predicting the target performance information corresponding to the target service based on the target scenario and the operation information includes: determining a target performance prediction algorithm corresponding to the target scenario from a plurality of performance prediction algorithms, the plurality of performance prediction algorithms corresponding one-to-one to the plurality of preset scenarios; and determining the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.
- An operating system runs on the heterogeneous processor, and the method further includes: sending the target performance information to the operating system through the performance predictor in the first processing core; and determining, through the operating system, a second processing core from the multiple processing cores based on the target performance information, and scheduling the target service to the second processing core for processing.
- The first processing core also includes a first sampling unit and a second sampling unit.
- The method further includes: obtaining, through the first sampling unit, the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor core dynamic events; and obtaining, through the second sampling unit, the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.
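For illustration only, the two groups of sampled characteristics could be modeled as plain records like the following; the field names and units are assumptions, since the disclosure only lists the kinds of events that are sampled.

```python
from dataclasses import dataclass

@dataclass
class InstructionStreamFeatures:
    """What a first sampling unit might report (illustrative fields)."""
    instruction_types: dict      # e.g. {"load": 1200, "branch": 300}
    instruction_count: int
    core_dynamic_events: dict    # e.g. {"l1d_miss": 40, "branch_mispredict": 7}

@dataclass
class MemoryOperationFeatures:
    """What a second sampling unit might report (illustrative fields)."""
    access_bandwidth_gbps: float
    access_latency_ns: float
    miss_rate: float
    request_queue_occupancy: float

sample = MemoryOperationFeatures(12.5, 85.0, 0.07, 0.4)
print(sample.miss_rate)  # 0.07
```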
- The scenario classifier in a processing core can obtain the operation information of the thread of the target service running on the current processing core (which may include the instruction stream characteristics and the data access characteristics of the processing core running the target service), so as to determine the service scenario of the target service based on the operation information.
- The performance predictor in the processing core can then predict, based on the operation information of the current processing core and the service scenario of the target service, the performance of migrating the thread of the current processing core to other processor cores of different types, so that the thread's scheduling strategy can be determined based on the prediction results, thereby improving the performance or the energy efficiency of the heterogeneous processor.
- By contrast, coarse-grained performance prediction based only on information such as the processing core utilization of the application (i.e., the service) collected by the processor subsystem cannot meet real-time requirements, so scheduling loses performance and energy efficiency.
- The present application provides a computer storage medium, wherein the computer storage medium stores a computer program; when the computer program is executed by a processor, any one of the above-mentioned scheduling methods is implemented.
- An embodiment of the present application provides an electronic device. The electronic device includes a processor, and the processor is configured to support the electronic device in implementing the corresponding functions in any of the above scheduling methods.
- The electronic device may also include a memory; the memory is configured to be coupled to the processor and stores the program instructions and data necessary for the electronic device.
- The electronic device may also include a communication interface for the electronic device to communicate with other devices or a communication network.
- The present application provides a chip system. The chip system includes a processor for supporting an electronic device in implementing the functions involved above, for example, generating or processing the information involved in the above scheduling method.
- The chip system may also include a memory, which is used to store the program instructions and data necessary for the electronic device.
- The chip system may consist of a chip, or may include a chip and other discrete devices.
- The present application provides a computer program product, wherein the computer program includes instructions; when the computer program is executed by a computer, the computer is caused to execute the above-mentioned scheduling method.
- The disclosed devices can be implemented in other ways.
- The device embodiments described above are only schematic, and the division of the above units is only a logical function division. There may be other division methods in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- Another point is that the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or units.
- The indirect coupling or communication connection between elements can be electrical or take other forms.
- The units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- The above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
- If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it can be stored in a computer-readable storage medium.
- Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
- The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, and specifically a processor in a computer device) to perform all or part of the steps of the above-mentioned methods in each embodiment of the present application.
- The aforementioned storage medium may include media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
Abstract
A heterogeneous processor is provided. The heterogeneous processor includes a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor. The scenario classifier in a first processing core obtains operation information of the first processing core, where the operation information includes one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, and the first processing core is any one of the plurality of processing cores; and determines, based on the operation information, a target scenario of the target service from a plurality of preset scenarios. The performance predictor in the first processing core predicts target performance information based on the target scenario and the operation information, where the target performance information is performance prediction information corresponding to the target service. A related scheduling method is also disclosed, which can improve the performance or the energy efficiency of the heterogeneous processor.
Description
This application claims priority to Chinese Patent Application No. 202211534517.X, entitled "Heterogeneous Processor and Related Scheduling Method", filed with the China National Intellectual Property Administration on December 2, 2022, which is incorporated herein by reference in its entirety.

This application relates to the field of multi-core processors, and in particular to a heterogeneous processor and a related scheduling method.

A multi-core heterogeneous processor (also called a hybrid processor) is a hardware platform containing multiple processor cores of different types. For example, a multi-core heterogeneous processor may include performance cores, efficiency cores, and low-power cores. A performance core provides high performance, high throughput, and low latency, and can be used to guarantee high-performance scenarios of the processor system; an efficiency core provides the best energy-efficiency ratio, and can be used to guarantee scenarios of relatively good performance and relatively low energy consumption; a low-power core provides the lowest power consumption, and can be used to guarantee minimum-power scenarios of the processor system. Big/little core scheduling of a multi-core heterogeneous processor senses, in real time, the thread loads running on different processor cores and, under performance and energy-consumption constraints, determines a scheduling policy that assigns threads of different types to processor cores of different energy efficiency for execution, so as to obtain the best energy efficiency or the best performance of the processor system. However, because the performance and energy-efficiency behavior of different types of thread loads on different types of processor cores in different scenarios is highly non-linear and time-varying, it is difficult to accurately predict the performance of a thread on processor cores of different types.

Therefore, how to determine the performance of a thread on processor cores of different types, so that a thread scheduling policy can be determined based on that performance to improve the performance or the energy efficiency of the heterogeneous processor, is a problem that needs to be solved urgently.
Summary of the Invention

The technical problem to be solved by the embodiments of this application is to provide a heterogeneous processor and a related scheduling method capable of determining the performance of a thread on processor cores of different types, so that a thread scheduling policy can be determined based on that performance, thereby improving the performance or the energy efficiency of the heterogeneous processor.

According to a first aspect, an embodiment of this application provides a heterogeneous processor. The heterogeneous processor includes a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor. The scenario classifier in a first processing core is configured to: obtain operation information of the first processing core, where the operation information includes one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, and the first processing core is any one of the plurality of processing cores; and determine, based on the operation information, a target scenario of the target service from a plurality of preset scenarios. The performance predictor in the first processing core is configured to: predict target performance information based on the target scenario and the operation information, where the target performance information is performance prediction information corresponding to the target service.

In this embodiment of the present invention, the scenario classifier in a processing core can obtain the operation information of the thread of the target service running on the current processing core (which may include the instruction stream characteristics, data access characteristics, memory operation characteristics, and the like of the processing core running the target service), so as to determine the service scenario of the target service based on the operation information. The performance predictor in the processing core can then predict, based on the operation information of the current processing core and the service scenario of the target service, the performance of migrating the thread of the current processing core to other processor cores of different types, so that a thread scheduling policy can subsequently be determined based on the prediction result, thereby improving the performance or the energy efficiency of the heterogeneous processor. In some embodiments, coarse-grained performance prediction based only on information such as the processing-core utilization of the application (i.e., the service) collected by the processor subsystem cannot meet real-time requirements, which causes scheduling to lose performance and energy efficiency.
In a possible implementation, the performance predictor in each processing core includes a plurality of performance sub-predictors and a target selector, and each of the plurality of performance sub-predictors corresponds to a performance prediction algorithm of one preset scenario. The performance predictor in the first processing core is specifically configured to: determine, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtain, through the target selector, a plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determine the target performance information from the plurality of pieces of performance information based on the target scenario.

In this embodiment of the present invention, because each performance sub-predictor corresponds to the performance prediction algorithm of one preset scenario, each performance sub-predictor can predict, according to its own performance prediction algorithm and the operation information, the performance of migrating the thread of the current processing core to other processor cores of different types (it should be emphasized that each type of performance prediction algorithm independently selects its own processor events and storage-subsystem events for this judgment). Further, the target selector can select one of the multiple prediction results as the target prediction result based on the service scenario output by the scenario classifier, so that the performance of migrating the thread of the current processing core to other processor cores of different types can be predicted more accurately.
In a possible implementation, the performance predictor in the first processing core is specifically configured to: determine, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, where the plurality of performance prediction algorithms are in one-to-one correspondence with the plurality of preset scenarios; and determine the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.

In this embodiment of the present invention, because the performance predictor includes the performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the performance prediction algorithm corresponding to the service scenario output by the scenario classifier can first be selected from the plurality of performance prediction algorithms, and the prediction can then be made based on that algorithm and the operation information, so that the performance of migrating the thread of the current processing core to other processor cores of different types can be predicted more accurately.

In a possible implementation, the performance predictor in the first processing core is further configured to: determine, based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, where the sample data includes instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.

In this embodiment of the present invention, the sample data can be understood as the instruction stream characteristics and the memory operation characteristics respectively corresponding to multiple scenarios; the sample data can then be trained with a training algorithm to obtain the plurality of preset scenarios and the performance prediction algorithm corresponding to each preset scenario.
In a possible implementation, an operating system runs on the heterogeneous processor. The performance predictor in the first processing core is further configured to send the target performance information to the operating system, and the operating system is configured to determine a second processing core from the plurality of processing cores based on the target performance information and schedule the target service to the second processing core for processing.

In this embodiment of the present invention, after the performance of a thread on processor cores of different types is predicted, the processing core can send the prediction result to the operating system, and the operating system can then determine the thread scheduling policy based on the prediction result, thereby improving the performance of the heterogeneous processor.

In a possible implementation, the first processing core further includes a first sampling unit and a second sampling unit. The first sampling unit is configured to obtain the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events. The second sampling unit is configured to obtain the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.

In this embodiment of the present invention, the first sampling unit can be understood as a processor event sampling unit, used to collect the instruction stream characteristics of the target service currently running on the first processing core, that is, information such as the instruction types and the number of instructions executed while the first processing core runs the target service (which can be understood as information internal to the processing core). The second sampling unit can be understood as a storage-subsystem event sampling unit, used to collect the memory operation characteristics of the target service currently running on the first processing core (which can be understood as information external to the processing core), that is, information such as the access bandwidth, access latency, miss rate, and request queue occupancy when the first processing core accesses the storage subsystem. Further, the scenario classifier in the first processing core can determine the service scenario of the target service based on this internal and external information, and then predict, based on that service scenario, the performance of scheduling the target service to other processing cores.
According to a second aspect, this application provides a scheduling method applied to a heterogeneous processor, where the heterogeneous processor includes a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor. The method includes: obtaining, through the scenario classifier in a first processing core, operation information of the first processing core, where the operation information includes one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, and the first processing core is any one of the plurality of processing cores; determining, based on the operation information, a target scenario of the target service from a plurality of preset scenarios; and predicting, through the performance predictor in the first processing core, target performance information based on the target scenario and the operation information, where the target performance information is performance prediction information corresponding to the target service.

In a possible implementation, the performance predictor in each processing core includes a plurality of performance sub-predictors and a target selector, each of the plurality of performance sub-predictors corresponding to a performance prediction algorithm of one preset scenario, and the predicting target performance information based on the target scenario and the operation information includes: determining, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtaining, through the target selector, a plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determining the target performance information from the plurality of pieces of performance information based on the target scenario.

In a possible implementation, the predicting target performance information based on the target scenario and the operation information includes: determining, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, the plurality of performance prediction algorithms being in one-to-one correspondence with the plurality of preset scenarios; and determining the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.

In a possible implementation, the method further includes: determining, through the performance predictor in the first processing core and based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the sample data including instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.

In a possible implementation, an operating system runs on the heterogeneous processor, and the method further includes: sending the target performance information to the operating system through the performance predictor in the first processing core; and determining, through the operating system, a second processing core from the plurality of processing cores based on the target performance information, and scheduling the target service to the second processing core for processing.

In a possible implementation, the first processing core further includes a first sampling unit and a second sampling unit, and the method further includes: obtaining, through the first sampling unit, the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events; and obtaining, through the second sampling unit, the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.
According to a third aspect, this application provides a computer storage medium, where the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of the implementations of the second aspect is implemented.

According to a fourth aspect, this application provides a chip system. The chip system includes a processor configured to support an electronic device in implementing the functions involved in the second aspect, for example, generating or processing the information involved in the above scheduling method. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the electronic device. The chip system may consist of a chip, or may include a chip and other discrete devices.

According to a fifth aspect, this application provides a computer program product, where the computer program includes instructions, and when the computer program is executed by a computer, the computer is caused to perform the method according to any one of the implementations of the second aspect.
FIG. 1 is a schematic structural diagram of a multi-core heterogeneous processor according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a heterogeneous processor according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the internal structure of a first processing core according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a heterogeneous processor system according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the internal structure of a processing core according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the internal structure of another processing core according to an embodiment of the present invention.
FIG. 7 is a training flowchart of a performance prediction algorithm according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of big/little core scheduling according to an embodiment of the present invention.
FIG. 9 is a flowchart of a scheduling method according to an embodiment of the present invention.
The embodiments of this application are described below with reference to the accompanying drawings.

The terms "first", "second", "third", and "fourth" in the specification, the claims, and the accompanying drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
An embodiment of this application provides a multi-core heterogeneous processor. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a multi-core heterogeneous processor according to an embodiment of the present invention. The multi-core heterogeneous processor 101 integrates multiple processor cores in one chip.

These processor cores have different functions and structures, are integrated in the same chip in an effective manner, and application programs are distributed to different processor cores for parallel processing in an effective partitioning manner, thereby improving the performance of the processor system. The multi-core heterogeneous processor 101 can be located in any electronic device, such as a computer, a mobile phone, a tablet, a personal digital assistant, a smart wearable device, a smart vehicle-mounted device, or a smart home appliance. The multi-core heterogeneous processor 101 may specifically be a chip, a chipset, or a circuit board carrying a chip or a chipset, which can work under the necessary software drivers. Specifically:

The multiple processor cores (FIG. 1 takes F cores as an example, where F is an integer greater than 1), such as processor core 1011, processor core 1012, ..., processor core 101F, are cores in the usual sense, the most important component of a central processing unit (CPU); a core is manufactured from monocrystalline silicon by a certain production process, and all of the CPU's computation, command receiving and storing, and data processing are executed by the processor cores. An operating system, a file system (such as the flash file system F2FS), or application programs can run on the multiple processor cores to control the hardware or software elements connected to the processor, and can process various data and perform operations. Each of the multiple processor cores can load instructions or data stored in a storage device (which can be understood as external memory, such as a disk) into the internal memory 102, move the instructions or data to be computed into the processor core for computation, temporarily store the results in the internal memory 102 after the computation completes, and store the instructions or data needing long-term storage into the storage device (i.e., the external memory) through the controller 103. In some embodiments, the memory in the multi-core heterogeneous processor 101 can be a cache, which may include one or more of a level-1 cache (L1 Cache), a level-2 cache (L2 Cache), and a level-3 cache (L3 Cache). The cache can hold instructions or data that the multi-core heterogeneous processor 101 has just used or uses cyclically; if the processor needs those instructions or data again, they can be called directly from the cache, which avoids repeated accesses and reduces the waiting time of the processor cores, thereby improving the efficiency of the processor system. It can be understood that processor core 1011 and the other (F-1) processor cores can be coupled and communicate through a bus or in other manners, which is not specifically limited here.

In this embodiment of the present invention, the processor cores are heterogeneous; that is, the structures of the different processor cores (1011, 1012, ..., 101F) differ, and the processor cores can be divided by performance into big cores, middle cores, small cores, and so on. The types of processor cores may include performance cores (also called big cores), efficiency cores (also called middle cores), and low-power cores (also called small cores). A performance core provides high performance, high throughput, and low latency, and can be used to guarantee high-performance scenarios of the processor system; an efficiency core provides the best energy-efficiency ratio, and can be used to guarantee scenarios of relatively good performance and relatively low energy consumption; a low-power core provides the lowest power consumption, and can be used to guarantee minimum-power scenarios of the processor system. Big/little core scheduling of the multi-core heterogeneous processor 101 senses, in real time, the thread loads running on the different processor cores, determines a thread scheduling policy under performance and energy-consumption constraints, and assigns threads of different types to processor cores of different energy efficiency for execution, so as to obtain the best energy efficiency or the best performance of the processor system.

In this embodiment of the present invention, the core problem of big/little core scheduling for the multi-core heterogeneous processor 101 is to collect performance data of the thread (or process) running on the current processor core A and to predict the performance if that thread (or process) were migrated to a processor core B of a different type, so that the scheduling policy of the thread (or process) can be determined based on the prediction result, thereby improving the performance of the multi-core heterogeneous processor 101. How to predict the performance of a thread (or process) running on different processor cores is described in detail later and is not elaborated here.

The internal memory 102 (Memory), memory for short, can be located outside the multi-core heterogeneous processor 101; it is usually volatile memory that loses its stored contents on power-off, and can also be called main memory. The internal memory 102 in this application includes readable and writable working memory, used to temporarily hold the operation data of the multiple processor cores and to exchange data with the storage device or other external memory; it can serve as the storage medium for temporary data of the operating system or other running programs. In this application, the task scenario running on the current processor core can be predicted based on the data-access characteristics of the processor core accessing the internal memory 102. The internal memory 102 may include one or more of dynamic random access memory (DRAM), static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), and the like, where DRAM further includes double data rate synchronous dynamic random access memory (DDR SDRAM, DDR for short), second-generation DDR (DDR2), third-generation DDR (DDR3), fourth-generation low-power DDR (Low Power Double Data Rate 4, LPDDR4), and fifth-generation low-power DDR (Low Power Double Data Rate 5, LPDDR5).

The controller 103 is usually used to manage and control communication between the multi-core heterogeneous processor 101 and an external storage device (such as a disk), and to provide a standardized interface (for example, the universal flash storage UFS standard) for that communication. It should be noted that the external storage device is not shown in FIG. 1, but the multi-core heterogeneous processor 101 can be connected not only to the internal memory 102 but also to the external storage device. Specifically, the controller 103 can, according to read/write requests issued by the multi-core heterogeneous processor 101, pass commands (for example, write, read, and erase commands) and data to the external storage device, and feed events (such as command-completion events, command-status events, and hardware-error events) back to the multi-core heterogeneous processor 101 according to the results of the storage device's reads and writes. For commands or data issued by the multi-core heterogeneous processor 101, the controller 103 can convert them by encapsulation into packets supporting a certain protocol; for data received by the multi-core heterogeneous processor 101, the controller 103 performs the reverse operation.

It can be understood that the structure of the multi-core heterogeneous processor 101 in FIG. 1 is only an exemplary implementation provided by the embodiments of the present invention; the structure of the multi-core heterogeneous processor 101 in the embodiments of the present invention includes, but is not limited to, the above implementation.
The embodiments of the present invention are described below with reference to the accompanying drawings.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a heterogeneous processor according to an embodiment of the present invention; the heterogeneous processor of this embodiment is described in detail below with reference to FIG. 2. As shown in FIG. 2, the heterogeneous processor 200 can be used to predict the performance of a thread on processor cores of different types, and a thread scheduling policy can be determined based on the prediction result, thereby improving the performance or the energy efficiency of the heterogeneous processor. It should be noted that the heterogeneous processor 200 provided in this embodiment may include some or all of the structure and functions of the multi-core heterogeneous processor 101 in FIG. 1 above. The heterogeneous processor 200 may include, but is not limited to, a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor, where:

The scenario classifier 2011 in the first processing core 201 is configured to obtain operation information of the first processing core 201.

Specifically, the operation information includes one or more of instruction stream characteristics, data-access characteristics, and memory operation characteristics of the first processing core 201 running a target service, and the first processing core 201 is any one of the plurality of processing cores. The target service can be understood as any application program.

It should be noted that the multiple processing cores included in the heterogeneous processor 200 can differ in performance and can be divided by performance into big cores, middle cores, small cores, and so on. For example, a processing core in the heterogeneous processor 200 can be a performance core (big core), an efficiency core (middle core), or a low-power core (small core), where a performance core provides high performance, high throughput, and low latency and can guarantee high-performance scenarios of the processor system; an efficiency core provides the best energy-efficiency ratio and can guarantee scenarios of relatively good performance and relatively low energy consumption; and a low-power core provides the lowest power consumption and can guarantee minimum-power scenarios of the processor system. The first processing core 201 can be any one of the plurality of processing cores, and the thread of the target service can run on the first processing core 201.

Optionally, when the first processing core 201 is determined, a currently optimal core can be selected from the multiple processing cores as the first processing core 201 according to the current running state of each processing core. Further, the thread of the target service can be scheduled to the first processing core 201 for processing.

During running, because the running states of the processing cores in the heterogeneous processor 200 change in real time, the first processing core 201 is not necessarily still the optimal processing core. Therefore, the scenario classifier 2011 in the first processing core 201 can obtain, after a preset period, the operation information of the first processing core 201 running the target service, so that the service scenario of the target service can subsequently be determined, and the performance of scheduling the target service to other processing cores can be predicted based on that service scenario.

In a possible implementation, the first processing core 201 further includes a first sampling unit 2013 and a second sampling unit 2014. The first sampling unit 2013 is configured to obtain the instruction stream characteristics of the first processing core 201 running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events. The second sampling unit 2014 is configured to obtain the memory operation characteristics of the first processing core 201 running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the internal structure of a first processing core according to an embodiment of the present invention. The first processing core 201 may further include the first sampling unit 2013 and the second sampling unit 2014. The first sampling unit 2013 can be understood as a processor event sampling unit, used to collect the instruction stream characteristics of the target service currently running on the first processing core 201, that is, information such as the instruction types, the number of instructions, and processor-core dynamic events during the running of the target service, where the processor-core dynamic events may include, but are not limited to, level-1 data or instruction cache misses, branch-predictor misses, branch-predictor errors, processor-queue occupancy, data and instruction translation lookaside buffer (TLB) misses, issue bandwidth, pipeline stalls, hardware-prefetch matches, and the like. The second sampling unit 2014 can be understood as a storage-subsystem event sampling unit, used to collect the memory operation characteristics of the target service currently running on the first processing core 201 (the memory may include internal memory and caches, where the caches may include level-1, level-2, and level-3 caches), that is, information such as the access bandwidth, access latency, miss rate, and request queue occupancy when the first processing core 201 accesses the storage subsystem. Further, the scenario classifier 2011 in the first processing core 201 can determine the service scenario of the target service based on this information, and then predict, based on that service scenario, the performance of scheduling the target service to other processing cores.

It should be noted that each processing core in the heterogeneous processor 200 can include a processor event sampling unit and a storage-subsystem event sampling unit, used respectively to obtain the instruction stream characteristics (information internal to the processing core) and the memory operation characteristics (information external to the processing core) of the corresponding processing core.
The scenario classifier 2011 in the first processing core 201 is further configured to determine, based on the operation information, a target scenario of the target service from a plurality of preset scenarios.

Specifically, the plurality of preset scenarios may be, for example, a game scenario, a reading scenario, a video scenario, a life-services scenario, and the like; the types of preset scenarios are not specifically limited in this application. After obtaining the operation information of the first processing core 201, the scenario classifier 2011 can analyze characteristics such as the instruction types and the number of instructions executed by the first processing core 201 while running the target service, as well as the access bandwidth, access latency, miss rate, and request queue occupancy when the first processing core 201 accesses the storage subsystem, and can then determine the service scenario of the target service (for example, that the target service is a game service) from the plurality of preset scenarios based on those characteristics, so that the performance of scheduling the target service to other processing cores can be predicted based on that service scenario.

For example, as shown in FIG. 4, FIG. 4 is a schematic diagram of a heterogeneous processor system according to an embodiment of the present invention. The heterogeneous processor may include a performance core (big core), efficiency core 1 and efficiency core 2 (middle cores), and low-power cores 1 to 4 (small cores). The big core and the middle cores interact with the level-2 cache during running, and the small cores interact with the level-3 cache during running. The thread of the target service can run on any processing core; assume here that the target service initially runs on efficiency core 1. During running, the scenario classifier in efficiency core 1 can classify according to the characteristics of the thread load, with each class representing a different scenario. Part of the input of the scenario classifier is processor events, and part is storage-subsystem events, such as events generated by the instruction cache, the data cache, the level-2 cache, the level-3 cache, and the memory controller. Further, the scenario classifier can determine the service scenario of the target service based on the processor events and the storage-subsystem events, and the performance of scheduling the target service to other processing cores can then be predicted based on that service scenario. In some embodiments, it is only necessary to predict the performance of scheduling the target service to processor cores of other types, without predicting the performance of scheduling it to processor cores of the same type.

It should be noted that the scenario classifier in a processing core can be implemented in software or in hardware; this is not specifically limited in this application.

Optionally, the scenario classifier in a processing core can be a linear classifier used for scenario classification. The input of the linear classifier is events originating from the processor and the storage subsystem, and its output is the scenario class of the target service. The number of scenario classes can be determined by the required classification accuracy, the computing capability of the system, and so on.

Optionally, the scenario classifier in a processing core can also be implemented with a perceptron model corrected using online feedback, implemented with an artificial neural network, or provided by a model trained offline.
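A minimal sketch of such a linear scenario classifier, assuming a toy event vector and hand-written weights rather than any trained values from this disclosure, could look like this:

```python
import numpy as np

def classify_scenario(event_vector, weights, scenario_names):
    """Linear scenario classifier sketch: one weight row per preset scenario.

    `event_vector` concatenates sampled processor and storage-subsystem
    events; the scenario with the highest linear score wins.
    """
    scores = weights @ event_vector          # one score per preset scenario
    return scenario_names[int(np.argmax(scores))]

scenarios = ["game", "reading", "video"]
W = np.array([[0.9, -0.2, 0.1],              # placeholder 3-scenario weights
              [-0.3, 0.8, 0.0],
              [0.1, 0.1, 0.7]])
events = np.array([1.2, 0.1, 0.4])           # e.g. IPC, L2 miss rate, bandwidth
print(classify_scenario(events, W, scenarios))  # -> "game"
```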
The performance predictor 2012 in the first processing core 201 is configured to predict, based on the target scenario and the operation information, the target performance information corresponding to the target service.

In some embodiments, the target performance information may include performance prediction information for scheduling the target service to the plurality of processing cores. The target performance information can be understood as the performance if the thread of the current processing core were migrated to other processing cores of different types; the result can be a correlation coefficient or an enumerated value, such as strong improvement, slight improvement, basically unchanged, slight decrease, or strong decrease. The performance prediction information can be understood as a performance comparison between another processing core and the current processing core, assuming the thread of the current processing core is migrated to that other core; for example, migrating a thread from efficiency core 1 to the performance core would strongly improve performance. In this application, the performance predictor 2012 judges the performance of migrating the thread of the current processing core to other processing cores of different types; the performance predictor 2012 can use the output of the scenario classifier 2011 as input, and can select the corresponding performance prediction algorithm according to the input scenario type of the target service, so as to predict the performance of scheduling the target service to other processing cores.

It should be noted that each type of performance prediction algorithm independently selects its own processor events and storage-subsystem events to judge the performance of migrating the thread of the current processing core to other processor cores of different types.

It should also be noted that the performance predictor in a processing core can be implemented in software or in hardware; this is not specifically limited in this application.

Regarding how the performance predictor 2012 predicts, based on the target scenario and the operation information, the target performance information corresponding to the target service, the embodiments of this application provide two specific implementations, described in detail as follows.

First, referring to FIG. 5, FIG. 5 is a schematic diagram of the internal structure of a processing core according to an embodiment of the present invention. The performance predictor 2012 in each processing core includes a plurality of performance sub-predictors and a target selector, and each performance sub-predictor corresponds to a performance prediction algorithm of one preset scenario. The performance predictor 2012 in the first processing core 201 is specifically configured to: determine, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core 201; and obtain, through the target selector, the plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determine the target performance information from the plurality of pieces of performance information based on the target scenario.
Specifically, the processor event sampling unit in the first processing core 201 can collect the instruction stream characteristics (e.g., various processor events) of the first processing core 201 running the target service, and the storage-subsystem event sampling unit in the first processing core 201 can collect the memory operation characteristics (e.g., various storage-subsystem events), so that the scenario classifier 2011 and each performance sub-predictor can obtain the operation information of the first processing core 201 running the target service. Because each performance sub-predictor corresponds to the performance prediction algorithm of one preset scenario, each performance sub-predictor can predict, according to its own performance prediction algorithm and the operation information, the performance of migrating the thread of the current processing core to other processor cores of different types (it should be emphasized that each type of performance prediction algorithm independently selects its own processor events and storage-subsystem events for this judgment). Further, the target selector can select one of the multiple prediction results as the target prediction result based on the service scenario output by the scenario classifier 2011, so that the performance of migrating the thread of the current processing core to other processor cores of different types can be predicted more accurately.

Second, referring to FIG. 6, FIG. 6 is a schematic diagram of the internal structure of another processing core according to an embodiment of the present invention. The performance predictor 2012 in the first processing core 201 is specifically configured to: determine, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, where the plurality of performance prediction algorithms are in one-to-one correspondence with the plurality of preset scenarios; and determine the target performance information based on the operation information of the first processing core 201 and the target performance prediction algorithm.

Specifically, the processor event sampling unit and the storage-subsystem event sampling unit in the first processing core 201 can collect the instruction stream characteristics and the memory operation characteristics of the first processing core 201 running the target service, so that the scenario classifier 2011 and the performance predictor 2012 can obtain the operation information. Because the performance predictor 2012 includes the performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the performance prediction algorithm corresponding to the service scenario output by the scenario classifier 2011 can first be selected from the plurality of performance prediction algorithms, and the performance of migrating the thread of the current processing core to other processor cores of different types can then be predicted based on that algorithm and the operation information (again, each type of performance prediction algorithm independently selects its own processor events and storage-subsystem events for this judgment).
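In contrast to the FIG. 5 structure, in which every sub-predictor runs and the target selector keeps one output, this FIG. 6 style variant evaluates only the algorithm matching the classified scenario. The following is a hedged sketch with purely hypothetical per-scenario functions:

```python
def predict_with_selected_algorithm(target_scenario, operation_info, algorithms):
    """FIG. 6 style variant (sketch): select the algorithm first, then predict.

    `algorithms` maps each preset scenario one-to-one to its prediction
    function; only the matching algorithm runs, saving the work of
    evaluating every sub-predictor.
    """
    algorithm = algorithms[target_scenario]   # one-to-one scenario -> algorithm
    return algorithm(operation_info)

algorithms = {
    "game":  lambda info: {"big": "strong improvement", "little": "strong decrease"},
    "video": lambda info: {"big": "slight improvement", "little": "basically unchanged"},
}
print(predict_with_selected_algorithm("video", {"ipc": 1.1}, algorithms))
```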
In a possible implementation, the performance predictor 2012 in the first processing core 201 is further configured to: determine, based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, where the sample data includes instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.

Specifically, the sample data can be understood as the instruction stream characteristics and the memory operation characteristics respectively corresponding to multiple scenarios; the sample data can then be trained with a training algorithm to obtain the plurality of preset scenarios and the performance prediction algorithm corresponding to each preset scenario.

For example, as shown in FIG. 7, FIG. 7 is a training flowchart of a performance prediction algorithm according to an embodiment of the present invention. The preset scenarios and the performance prediction algorithms can be trained and optimized jointly, so that the accuracy is optimal. The detailed flow is as follows:

Step S301: Initialize the scenario classification algorithm. Specifically, an initial scenario classification algorithm can be obtained through feature clustering using other algorithms.

Step S302: Classify the samples into N scenarios using the scenario classification algorithm. Specifically, the scenario classification algorithm can be used to classify the loads, so the training sample points can be divided into N classes.

Step S303: Train with the samples of each scenario as input, obtaining N independent performance prediction algorithms. Specifically, the sample points divided into N classes are used independently for the regression of each class of performance prediction algorithm, yielding N performance prediction algorithms.

Step S304: Check whether the prediction error meets the requirement. Specifically, each performance prediction algorithm computes the error on the sample points of its own group; when the prediction error meets the requirement, the iterative training process exits to step S308. If the prediction error does not meet the requirement, go to step S305.

Step S305: Using all the samples as input, run the N performance prediction algorithms independently and compute the prediction errors. Specifically, with all sample points as input, the N performance prediction algorithms trained in step S303 predict all sample points independently, and the error of each sample point under each performance prediction algorithm is computed.

Step S306: Regroup according to the prediction errors. Specifically, based on the errors computed in step S305, each sample point is placed into the group with the smallest error. In this way, all sample points can be regrouped into N groups.

Step S307: Use the grouping information to train the scenario classification algorithm and obtain a new scenario classification algorithm. Specifically, the grouping information of the sample points from step S306 can be used as input to train a new scenario classification algorithm. The new scenario classification algorithm then re-enters step S302, and the whole flow iterates in a loop until step S304 meets the error requirement, at which point the iteration ends.

Step S308: Output the scenario classification algorithm and the corresponding performance prediction algorithms.
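The loop of steps S301-S308 can be read as alternating between scenario assignment and per-scenario regression. The following Python sketch is one way to realize it under stated assumptions: KMeans stands in for the initial feature clustering of S301, plain linear regression for each scenario's performance prediction algorithm, and a nearest-neighbor model for the final classifier output in S308. None of these concrete choices is mandated by the disclosure, and the sketch assumes every group stays non-empty across iterations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

def train_scenarios(X, y, n_scenarios=4, tol=0.05, max_iter=20, seed=0):
    # S301: initialize the scenario classification by clustering the features.
    labels = KMeans(n_clusters=n_scenarios, n_init=10,
                    random_state=seed).fit_predict(X)
    for _ in range(max_iter):
        # S302/S303: fit one independent performance predictor per group
        # (this sketch assumes every group stays non-empty).
        models = [LinearRegression().fit(X[labels == k], y[labels == k])
                  for k in range(n_scenarios)]
        # S305: predict all samples with all N predictors and take the errors.
        errors = np.stack([np.abs(m.predict(X) - y) for m in models], axis=1)
        # S304: exit once each sample's own-group error meets the requirement.
        if errors[np.arange(len(X)), labels].mean() < tol:
            break
        # S306: move each sample into the group whose predictor errs least;
        # in this sketch the new labels drive the next round directly,
        # playing the role of the retrained classifier of S307.
        labels = errors.argmin(axis=1)
    # S308: output a scenario classifier fitted to the final grouping,
    # together with the per-scenario performance predictors.
    classifier = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
    return classifier, models
```

The regrouping step works like the assignment step of an EM-style procedure: each sample migrates to whichever predictor explains it best, and the partition is refitted until the error requirement of step S304 is met.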
In a possible implementation, an operating system runs on the heterogeneous processor 200. The performance predictor 2012 in the first processing core 201 is further configured to send the target performance information to the operating system, and the operating system is configured to determine a second processing core from the plurality of processing cores based on the target performance information and schedule the target service to the second processing core for processing.

Specifically, the second processing core is one of the plurality of processing cores. In some embodiments, the second processing core is a different core from the first processing core: if the operating system determines, based on the target performance information, that the thread needs to be scheduled, it can select one of the remaining processing cores other than the first processing core as the second processing core. In some embodiments, the second processing core can be the same core as the first processing core: if the operating system determines, based on the target performance information, that the thread does not need to be scheduled, the second processing core is the previous first processing core, indicating that the thread does not need to be migrated. After the performance of a thread on processor cores of different types is predicted, the processing core can send the prediction result to the operating system, and the operating system can then determine the thread scheduling policy based on the prediction result, thereby improving the performance or the energy efficiency of the heterogeneous processor.

Optionally, after receiving the prediction result sent by a processing core, the operating system can determine one of the plurality of processing cores as the target processing core (i.e., the second processing core) based on the prediction result; in this process the operating system can make its decision based on factors such as the frequency and voltage of the processing cores.

In a possible implementation, a linear regression model is used to predict the effect of processor frequency on performance, and can also be used to predict the effect of storage-subsystem frequency on performance.
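As an illustration of this possible implementation, such a linear model over core and storage-subsystem frequency could be fitted as below; the frequency/performance samples are invented purely for the example and are not measurements from the disclosure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical samples: (core frequency GHz, memory frequency GHz) -> measured
# performance score. The numbers are made up solely to illustrate the fit.
freqs = np.array([[1.0, 1.6], [1.5, 1.6], [2.0, 2.1], [2.5, 2.1], [3.0, 2.7]])
perf  = np.array([0.9, 1.3, 1.9, 2.2, 2.8])

model = LinearRegression().fit(freqs, perf)
# Predict performance if the thread ran at 2.8 GHz core / 2.4 GHz memory.
print(model.predict(np.array([[2.8, 2.4]])))
```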
For example, as shown in FIG. 8, FIG. 8 is a schematic diagram of big/little core scheduling according to an embodiment of the present invention. The heterogeneous processor may include a processor big/little core scheduling control unit 2000, a processor big/little core scheduling control unit 3000, and an operating system decision module 5000, where the operating system decision module 5000 is a software part, and the scheduling control units 2000 and 3000 can be implemented in software or in hardware. The processor event sampling unit and the storage-subsystem event sampling unit in the scheduling control unit 2000 sample and collect events to obtain the status of the processor and the storage subsystem and to identify the service characteristics of the instruction stream executed in the processor system. The scenario classifier in the scheduling control unit 2000 uses the processor and storage-subsystem events sampled by these two sampling units to classify the load scenario, and outputs the scenario class of the load to the next-level performance-prediction-algorithm selector. Performance sub-predictors 1 to 4 are performance predictors for different scenarios. The performance predictor predicts, according to the different performance prediction algorithms, the performance of migrating the thread of the current processor to other processor cores of different types; its output is that predicted performance, and the result can be a correlation coefficient or an enumerated value, for example: strong improvement, slight improvement, basically unchanged, slight decrease, or strong decrease. The operating system decision module 5000 collects the thread scheduling prediction information of all cores and, by summarizing the prediction information of each core, determines the overall scheduling approach.

In this application, the scenario classifier in a processing core can obtain the operation information of the thread of the target service running on the current processing core (which may include the instruction stream characteristics and the memory operation characteristics of the processing core running the target service), so as to determine the service scenario of the target service based on the operation information. The performance predictor in the processing core can then predict, based on the operation information of the current processing core and the service scenario of the target service, the performance of migrating the thread of the current processing core to other processor cores of different types, so that a thread scheduling policy can subsequently be determined based on the prediction result, thereby improving the performance or the energy efficiency of the heterogeneous processor. By contrast, in some embodiments, coarse-grained performance prediction based only on information such as the processing-core utilization of the application (i.e., the service) collected by the processor subsystem cannot meet real-time requirements, so scheduling loses performance and energy efficiency.
The heterogeneous processor of the embodiments of the present invention has been described in detail above; a related method of the embodiments of the present invention is provided below.

Referring to FIG. 9, FIG. 9 is a flowchart of a scheduling method according to an embodiment of the present invention. The method is applicable to the heterogeneous processor in FIG. 2 above and to a device including that heterogeneous processor, where the heterogeneous processor includes a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor. The method may include the following steps S401 to S403, described in detail as follows:

Step S401: Obtain, through the scenario classifier in a first processing core, operation information of the first processing core.

Specifically, the operation information includes one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, and the first processing core is any one of the plurality of processing cores.

Step S402: Determine, through the scenario classifier in the first processing core and based on the operation information, a target scenario of the target service from a plurality of preset scenarios.

Step S403: Predict, through the performance predictor in the first processing core, target performance information corresponding to the target service based on the target scenario and the operation information.

Specifically, the target performance information includes performance prediction information for scheduling the target service to the plurality of processing cores.
In a possible implementation, the performance predictor in each processing core includes a plurality of performance sub-predictors and a target selector, each of the plurality of performance sub-predictors corresponding to a performance prediction algorithm of one preset scenario, and the predicting target performance information corresponding to the target service based on the target scenario and the operation information includes: determining, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtaining, through the target selector, a plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determining the target performance information from the plurality of pieces of performance information based on the target scenario.

In a possible implementation, the predicting target performance information corresponding to the target service based on the target scenario and the operation information includes: determining, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, the plurality of performance prediction algorithms being in one-to-one correspondence with the plurality of preset scenarios; and determining the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.

In a possible implementation, the method further includes: determining, through the performance predictor in the first processing core and based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the sample data including instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.

In a possible implementation, an operating system runs on the heterogeneous processor, and the method further includes: sending the target performance information to the operating system through the performance predictor in the first processing core; and determining, through the operating system, a second processing core from the plurality of processing cores based on the target performance information, and scheduling the target service to the second processing core for processing.

In a possible implementation, the first processing core further includes a first sampling unit and a second sampling unit, and the method further includes: obtaining, through the first sampling unit, the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events; and obtaining, through the second sampling unit, the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.

In this application, the scenario classifier in a processing core can obtain the operation information of the thread of the target service running on the current processing core (which may include the instruction stream characteristics and the data-access characteristics of the processing core running the target service), so as to determine the service scenario of the target service based on the operation information. The performance predictor in the processing core can then predict, based on the operation information of the current processing core and the service scenario of the target service, the performance of migrating the thread of the current processing core to other processor cores of different types, so that a thread scheduling policy can subsequently be determined based on the prediction result, thereby improving the performance or the energy efficiency of the heterogeneous processor. By contrast, in some embodiments, coarse-grained performance prediction based only on information such as the processing-core utilization of the application (i.e., the service) collected by the processor subsystem cannot meet real-time requirements, so scheduling loses performance and energy efficiency.
This application provides a computer storage medium, where the computer storage medium stores a computer program; when the computer program is executed by a processor, any one of the above scheduling methods is implemented.

An embodiment of this application provides an electronic device. The electronic device includes a processor configured to support the electronic device in implementing the corresponding functions in any one of the above scheduling methods. The electronic device may further include a memory coupled to the processor, which stores the program instructions and data necessary for the electronic device. The electronic device may further include a communication interface for the electronic device to communicate with other devices or a communication network.

This application provides a chip system. The chip system includes a processor configured to support an electronic device in implementing the functions involved above, for example, generating or processing the information involved in the above scheduling method. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the electronic device. The chip system may consist of a chip, or may include a chip and other discrete devices.

This application provides a computer program product, where the computer program includes instructions; when the computer program is executed by a computer, the computer is caused to perform the above scheduling method.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.

It should be noted that, for brevity, the foregoing method embodiments are all expressed as combinations of a series of actions, but those skilled in the art should know that this application is not limited by the described order of actions, because some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are only schematic; the division of the above units is only a logical function division, and there can be other division methods in actual implementation, for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and can be electrical or take other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the above integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, and specifically a processor in a computer device) to perform all or some of the steps of the above methods in the embodiments of this application. The aforementioned storage medium can include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (14)
- A heterogeneous processor, wherein the heterogeneous processor includes a plurality of processing cores of different sizes, and each of the plurality of processing cores includes a scenario classifier and a performance predictor, wherein the scenario classifier in a first processing core is configured to: obtain operation information of the first processing core, the operation information including one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, the first processing core being any one of the plurality of processing cores; and determine, based on the operation information, a target scenario of the target service from a plurality of preset scenarios; and the performance predictor in the first processing core is configured to: predict target performance information based on the target scenario and the operation information, the target performance information being performance prediction information corresponding to the target service.
- The heterogeneous processor according to claim 1, wherein the performance predictor in each processing core includes a plurality of performance sub-predictors and a target selector, each of the plurality of performance sub-predictors corresponding to a performance prediction algorithm of one preset scenario, and the performance predictor in the first processing core is specifically configured to: determine, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtain, through the target selector, a plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determine the target performance information from the plurality of pieces of performance information based on the target scenario.
- The heterogeneous processor according to claim 1, wherein the performance predictor in the first processing core is specifically configured to: determine, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, the plurality of performance prediction algorithms being in one-to-one correspondence with the plurality of preset scenarios; and determine the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.
- The heterogeneous processor according to claim 2 or 3, wherein the performance predictor in the first processing core is further configured to: determine, based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the sample data including instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.
- The heterogeneous processor according to any one of claims 1 to 4, wherein an operating system runs on the heterogeneous processor; the performance predictor in the first processing core is further configured to: send the target performance information to the operating system; and the operating system is configured to: determine a second processing core from the plurality of processing cores based on the target performance information, and schedule the target service to the second processing core for processing.
- The heterogeneous processor according to any one of claims 1 to 5, wherein the first processing core further includes a first sampling unit and a second sampling unit; the first sampling unit is configured to: obtain the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events; and the second sampling unit is configured to: obtain the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.
- A scheduling method, applied to a heterogeneous processor, the heterogeneous processor including a plurality of processing cores of different sizes, each of the plurality of processing cores including a scenario classifier and a performance predictor, the method comprising: obtaining, through the scenario classifier in a first processing core, operation information of the first processing core, the operation information including one or more of instruction stream characteristics and memory operation characteristics of the first processing core running a target service, the first processing core being any one of the plurality of processing cores; determining, based on the operation information, a target scenario of the target service from a plurality of preset scenarios; and predicting, through the performance predictor in the first processing core, target performance information based on the target scenario and the operation information, the target performance information being performance prediction information corresponding to the target service.
- The method according to claim 7, wherein the performance predictor in each processing core includes a plurality of performance sub-predictors and a target selector, each of the plurality of performance sub-predictors corresponding to a performance prediction algorithm of one preset scenario, and the predicting target performance information based on the target scenario and the operation information comprises: determining, through each performance sub-predictor, corresponding performance information based on the performance prediction algorithm of the corresponding preset scenario and the operation information of the first processing core; and obtaining, through the target selector, a plurality of pieces of performance information respectively corresponding to the plurality of performance sub-predictors, and determining the target performance information from the plurality of pieces of performance information based on the target scenario.
- The method according to claim 7, wherein the predicting target performance information based on the target scenario and the operation information comprises: determining, from a plurality of performance prediction algorithms, a target performance prediction algorithm corresponding to the target scenario, the plurality of performance prediction algorithms being in one-to-one correspondence with the plurality of preset scenarios; and determining the target performance information based on the operation information of the first processing core and the target performance prediction algorithm.
- The method according to claim 8 or 9, further comprising: determining, through the performance predictor in the first processing core and based on sample data, the plurality of preset scenarios and the plurality of performance prediction algorithms respectively corresponding to the plurality of preset scenarios, the sample data including instruction stream characteristics and memory operation characteristics respectively corresponding to the plurality of preset scenarios.
- The method according to any one of claims 7 to 10, wherein an operating system runs on the heterogeneous processor, and the method further comprises: sending the target performance information to the operating system through the performance predictor in the first processing core; and determining, through the operating system, a second processing core from the plurality of processing cores based on the target performance information, and scheduling the target service to the second processing core for processing.
- The method according to any one of claims 7 to 11, wherein the first processing core further includes a first sampling unit and a second sampling unit, and the method further comprises: obtaining, through the first sampling unit, the instruction stream characteristics of the first processing core running the target service, the instruction stream characteristics including one or more of instruction types, an instruction count, and processor-core dynamic events; and obtaining, through the second sampling unit, the memory operation characteristics of the first processing core running the target service, the memory operation characteristics including one or more of access bandwidth, access latency, miss rate, and request queue occupancy.
- A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 7 to 12 is implemented.
- A computer program product, wherein the computer program includes instructions, and when the computer program is executed by a computer or a processor, the computer or the processor is caused to perform the method according to any one of claims 7 to 12.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211534517.X | 2022-12-02 | | |
| CN202211534517.XA (CN118170503A) | 2022-12-02 | 2022-12-02 | Heterogeneous processor and related scheduling method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2024114728A1 (zh) | 2024-06-06 |
Family ID: 91323022

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/135404 (WO2024114728A1) | Heterogeneous processor and related scheduling method | 2022-12-02 | 2023-11-30 |

Country Status (2)

| Country | Link |
|---|---|
| CN (1) | CN118170503A (zh) |
| WO (1) | WO2024114728A1 (zh) |
Cited By (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118445595A (zh) * | 2024-07-02 | 2024-08-06 | 深圳市铨天科技有限公司 | Fine-grained mapping method, system, and medium for a compute-in-memory chip |

Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103119580A (zh) * | 2010-09-25 | 2013-05-22 | Intel Corporation | Application scheduling in a heterogeneous multiprocessor computing platform |
| CN104583900A (zh) * | 2012-10-04 | 2015-04-29 | Intel Corporation | Dynamically switching workloads between heterogeneous cores of a processor |
| US20170262955A1 (en) * | 2017-05-26 | 2017-09-14 | Mediatek Inc. | Scene-Aware Power Manager For GPU |
| CN111132283A (zh) * | 2019-11-11 | 2020-05-08 | Huawei Technologies Co., Ltd. | Power consumption control method and device |
Also Published As

| Publication number | Publication date |
|---|---|
| CN118170503A (zh) | 2024-06-11 |
Similar Documents

| Publication | Publication Date | Title |
|---|---|---|
| WO2021254135A1 (zh) | | Task execution method and storage device |
| CN104298550B (zh) | | Hadoop-oriented dynamic scheduling method |
| US10348815B2 (en) | | Command process load balancing system |
| WO2024114728A1 (zh) | | Heterogeneous processor and related scheduling method |
| US20130232310A1 (en) | | Energy efficiency in a distributed storage system |
| US9569381B2 (en) | | Scheduler for memory |
| US11914894B2 (en) | | Using scheduling tags in host compute commands to manage host compute task execution by a storage device in a storage system |
| US20120297216A1 (en) | | Dynamically selecting active polling or timed waits |
| CN112052082B (zh) | | Task attribute optimization method and apparatus, server, and storage medium |
| JP2022033688A (ja) | | Memory access request scheduling method and apparatus, electronic device, computer-readable storage medium, and computer program |
| US12056360B2 (en) | | Optimized I/O performance regulation for non-volatile storage |
| US20240143392A1 (en) | | Task scheduling method, chip, and electronic device |
| US10872015B2 (en) | | Data storage system with strategic contention avoidance |
| WO2024119930A1 (zh) | | Scheduling method and apparatus, computer device, and storage medium |
| CN116248699B (zh) | | Data reading method, apparatus, device, and storage medium in a multi-replica scenario |
| CN117093335A (zh) | | Task scheduling method and apparatus for a distributed storage system |
| CN112114967B (zh) | | GPU resource reservation method based on service priority |
| CN114281543A (zh) | | System and method for integrated storage and computation based on solid-state storage |
| CN114077481A (zh) | | Task scheduling method, apparatus, device, and storage medium |
| CN113655963B (zh) | | Memory-bridging-based data storage system, method, and computer device |
| WO2015052823A1 (ja) | | Cloud management apparatus, management method therefor, and system thereof |
| CN118502679B (zh) | | Data access scheduling method and apparatus for a memory |
| TWI823655B (zh) | | Task processing system and task processing method for an intelligent processor |
| WO2023066248A1 (zh) | | Data processing method, apparatus, device, and system |
| WO2024093280A1 (zh) | | Task management method, apparatus, system, communication device, and storage medium |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23896870; Country of ref document: EP; Kind code of ref document: A1 |