WO2022188306A1 - Task allocation method and apparatus - Google Patents

Task allocation method and apparatus

Info

Publication number
WO2022188306A1
Authority
WO
WIPO (PCT)
Prior art keywords
core
information
task
processor
cores
Application number
PCT/CN2021/103160
Other languages
English (en)
French (fr)
Inventor
尹文
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2022188306A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, servers, terminals

Definitions

  • the present application relates to the field of computer technology, and in particular, to a task assignment method and device.
  • current task allocation methods for multi-core processors try to keep each core of the processor working at as high a frequency as possible to ensure processing performance.
  • a higher voltage is required to support the cores of the processor operating at that frequency. High voltage accelerates the aging of the transistor circuits in a processor core, reducing the core's life. If a core keeps working at the highest frequency, its lifespan drops off a cliff, shortening to months or even days. A core whose lifespan is exhausted or insufficient will operate at a significantly reduced frequency.
  • Embodiments of the present application provide a task allocation method and apparatus, so as to improve the service life of a processor core.
  • an embodiment of the present application provides a task allocation method, which can be applied to a computer system with multiple processor cores to improve the service life of the cores in the computer system.
  • the method can be implemented by a task allocation apparatus; the task allocation apparatus can be a computer system, and the computer system can be the computer system with multiple processor cores to which the method is applied, or another computer system.
  • the method includes: the task allocation apparatus may obtain first information of multiple cores of the processor, where the first information is used to describe the service life of the cores.
  • the task allocation apparatus may further determine, according to the respective first information of the multiple cores, a first core for processing the first task from the multiple cores, where the first task is a task to be processed.
  • the task allocation apparatus may determine the first core according to the respective first information of the multiple cores and the task amount information of the first task.
  • the task allocation device may also perform allocation in units of physical core groups: determine the first physical core group corresponding to the first task, and, according to the first information of the cores in the first physical core group, determine from the cores of the first physical core group the core corresponding to each thread in the first task.
  • the task allocation apparatus may further store updated first information of the first core, where the updated first information is determined according to the first information of the first core and the task amount information of the first task.
  • the lifespan information of the first core can be updated according to the task amount information of the first task, so the lifespan information of the core can be obtained more accurately, and the lifespan balancing effect of subsequent task assignment can be improved.
  • the first information of the kth core of the processor includes the remaining lifetime information of the kth core; and/or the first information includes the used lifetime information of the kth core.
  • the remaining lifetime information of the kth core may be determined according to the total lifetime information and the used lifetime information of the kth core.
  • the lifetime information of the kth core may be determined according to at least one of the following: the historical running time of the kth core; the frequency and historical running time of the kth core; the voltage and historical running time of the kth core; the position of the kth core on the processor; the total memory access time of the kth core; the historical instruction count and average instruction running time of the kth core; or the frequency of the kth core and the total memory access time.
  • an embodiment of the present application provides a task allocation method, which can be applied to a computer system with multiple processors, so as to improve the service life of cores in the computer system.
  • the method can be implemented by a task allocation apparatus; the task allocation apparatus can be a computer system, and the computer system can be the computer system with multiple processors to which the method is applied, or another computer system.
  • the method includes: the task distribution device may acquire third information of the multiple processors, where the third information is used to describe the service life of the processors.
  • the task allocating apparatus may further determine, according to the respective third information of the multiple processors, a first processor for processing the second task from the multiple processors, where the second task is a task to be processed.
  • the task allocation device may determine the first processor according to the respective third information of the multiple processors and the task amount information of the second task, so as to further improve the rationality of task allocation.
  • an embodiment of the present application provides a task assignment apparatus, which can be applied to a computer system having multiple processor cores, and the task assignment apparatus can specifically implement the behavior of the task assignment method in the first aspect or the second aspect.
  • the task assignment device may be a hardware or software unit in the computer system, and may include at least one module, where the at least one module is used to implement the task assignment method described in the first aspect or the second aspect and any possible designs thereof.
  • an embodiment of the present application provides a task allocation apparatus, including at least one processor, the at least one processor being coupled with at least one memory; the at least one processor is configured to execute the computer program or instructions stored in the at least one memory, so as to cause the apparatus to perform the method of the first aspect or the second aspect and any possible designs thereof.
  • the apparatus further includes a communication interface to which the processor is coupled.
  • the communication interface may be a transceiver or an input/output interface; when the device is a chip included in a network device, the communication interface may be an input/output interface of the chip.
  • the transceiver may be a transceiver circuit, and the input/output interface may be an input/output circuit.
  • an embodiment of the present application provides a computing device, including a processor and a memory, where the processor includes multiple processor cores; the memory is used for storing computer programs or instructions; the processor is used for The computer program or instructions are executed to implement the task assignment method described in the first aspect or the second aspect and any possible designs thereof.
  • an embodiment of the present application provides a computer system
  • the computer system may include a recording module, a task allocation module, and multiple processor cores
  • the task allocation module may be configured to implement, according to the respective first information of the multiple processor cores, the method described in the first aspect and any possible designs thereof.
  • the recording module may include an editable memory for storing the respective first information of the plurality of processor cores, or for reading the respective first information of the multiple processor cores from the editable memory.
  • an embodiment of the present application provides a computer system
  • the computer system may include a recording module, a task allocation module, and multiple processors
  • the task allocation module may be configured to implement, according to the respective third information of the multiple processors, the method described in the second aspect and any possible designs thereof.
  • the recording module may include an editable memory for storing the respective third information of the plurality of processors, or for reading the respective third information of the plurality of processors from the editable memory.
  • an embodiment of the present application provides a readable storage medium for storing instructions which, when executed, cause the method described in the first aspect or the second aspect and any possible designs thereof to be implemented.
  • the embodiments of the present application provide a computer program product containing instructions, which, when run on a computer, cause the computer to execute the method described in the first aspect or the second aspect and any possible designs thereof.
  • an embodiment of the present application provides a chip system, including a processor coupled to a memory, the memory being used to store programs or instructions; the chip system may further include an interface circuit, the interface circuit being used to receive programs or instructions and transmit them to the processor; when the programs or instructions are executed by the processor, the chip system implements the method in the first aspect or the second aspect and any possible designs thereof.
  • the number of processors in the chip system may be one or more.
  • the processor can be implemented by hardware or by software.
  • the processor may be a logic circuit, an integrated circuit, or the like.
  • the processor may be a general-purpose processor implemented by reading software codes stored in memory.
  • the memory may be integrated with the processor, or may be provided separately from the processor, which is not limited in this application.
  • the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated with the processor on the same chip, or may be separately provided on different chips.
  • the type of memory and the manner in which the memory and the processor are arranged are not particularly limited.
  • the above implementations of the present application may be further combined to provide more implementations.
  • FIG. 1 is a schematic diagram of the EAS technology under the Linux operating system.
  • FIG. 2 is a schematic flowchart of a task allocation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a thread allocation process according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the location of a core on a CPU according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the location of another core on a CPU provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an implementation manner of a task allocation method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an implementation manner of another task allocation method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the architecture of a CPU and GPU heterogeneous system according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an implementation manner of another task allocation method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a task assignment apparatus provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of another task allocation apparatus provided by an embodiment of the present application.
  • the embodiments of the present application provide a task allocation method and apparatus, which help to improve the service life of cores in a multi-core processor.
  • the method and the apparatus are based on the same technical concept. Since the principles by which the method and the apparatus solve the problem are similar, the implementations of the apparatus and the method can refer to each other, and repeated descriptions are omitted.
  • a computer system consists of hardware (sub)systems and software (sub)systems.
  • the hardware (sub)system is an organic combination of physical components (such as processors) based on electrical, magnetic, optical, mechanical and other principles, and is the entity on which the system runs;
  • the software (sub)system includes the various programs and documents that direct the entire system to work according to specified requirements.
  • modern computer systems range from microcomputers and personal computers to supercomputers and their networks, with various forms and characteristics; they have been widely used in fields such as scientific computing, transaction processing and process control, and have a profound impact on the progress of society.
  • the computer system in the embodiment of the present application may be a computer system in a terminal device, which is a device that provides business services to users and has functions such as voice or data connectivity.
  • the terminal device may also be referred to as a terminal, user equipment (UE), a mobile station (MS), a mobile terminal (MT), etc.
  • the terminal device may also be a chip.
  • a terminal device is taken as an example for specific description.
  • the terminal device may be a handheld device with a wireless connection function, a vehicle-mounted device, or the like.
  • some examples of terminal devices are: mobile phones, tablet computers, notebook computers, PDAs, mobile internet devices (MID), smart point of sale (POS) terminals, wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self driving, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, wireless terminals in smart homes, various smart meters (smart water meters, smart electricity meters, smart gas meters), etc.
  • the computer system in this embodiment of the present application may be a server, which is a device that provides a data connection service. Since the server can respond to the service request of the terminal device and process it, generally speaking, the server should have the ability to undertake and guarantee the service.
  • the server may be a server located in a data network (DN), such as a common server or a server in a cloud platform, or a multi-access edge computing (MEC) server located in the core network, etc.
  • the computer system in this embodiment of the present application may also be a processor, a chip, or a chip system.
  • Operating system (OS).
  • the OS is the most basic system software running on a computer system, such as Windows, Android, iOS, Windows Server, NetWare, Unix, and Linux.
  • the kernel is the first layer of software extension based on the hardware and provides the most basic functions of the operating system; for example, it is responsible for managing the system's processes, memory, drivers, files, and network system, and it determines the performance and stability of the system.
  • the core of a processor, or simply the core, is the core chip in the processor.
  • the number of cores of a processor refers to how many cores a processor consists of. The more cores, the faster the processor runs and the better the performance. If the number of cores of a processor is greater than or equal to 2, the processor can be called a multi-core processor.
  • Homogeneous processors or homogeneous multi-core processors.
  • the structure of each processor core of a homogeneous multi-core processor is exactly the same, and the status is also the same.
  • different cores can share the same code, or different cores can execute different codes respectively.
  • taking a homogeneous CPU processor as an example, the multiple cores in the processor are all CPU cores, or the computing modules included in the processor are all CPU computing modules.
  • Heterogeneous processors or heterogeneous multi-core processors.
  • Different cores of a heterogeneous multi-core processor may employ cores with different functions.
  • Heterogeneous multi-core processors are often used for special applications, such as signal processing.
  • some cores are generally used for management and scheduling, and other cores are used for specific performance acceleration.
  • the processor cores are interconnected through shared buses, crossbar switches, and on-chip networks.
  • the processor may include at least one CPU core and at least one xPU (such as GPU or NPU) core, or the computing modules included in the processor include at least a CPU computing module and an xPU (such as GPU or NPU) computing module, in which the CPU core can be used for management and scheduling.
  • Process and thread.
  • a process is the smallest unit of resource allocation, and a thread or logical thread is the smallest unit of program execution.
  • a thread can include one or more instructions, so the processing running time of each thread may be different. That is, resources are allocated to a process, and all threads within the same process share all resources of that process. Among them, a thread can only belong to one process, and a process can have multiple threads, but at least one thread.
  • a thread is a single sequential flow of control in a process.
  • a process can have multiple threads concurrently, and each thread can execute different tasks in parallel. For multi-core processors, different cores can be used to execute different threads, thus enabling parallelization of tasks.
  • a thread can be understood as the smallest pipeline unit in a processor performing specific data processing. It should be understood that a core may correspond to one or more pipelines in order to achieve multitasking parallel processing.
  • task refers to an activity completed by the software.
  • An application may contain one or more tasks.
  • a task can be either a process or a thread, or can include multiple processes and/or multiple threads.
  • task allocation refers to allocating the processes and/or threads included in a task to processor cores for processing;
  • thread allocation refers to allocating threads to processor pipelines so that threads are processed through pipelines.
  • a task could be to read data and put it in memory.
  • This task can be implemented as a process or as a thread (or as an interrupt task).
  • the following takes thread allocation as an example to illustrate the way of task allocation in the current multi-core processor.
  • the prior art mainly realizes thread allocation through a software scheduling method at the operating system (OS) layer.
  • the completely fair scheduler (CFS) and load balancing of the Linux kernel mainly prioritize server performance, so multiple threads are evenly distributed to the cores of the system.
  • the Linux kernel 5.3 adopts the energy-aware scheduler (EAS) technology, which can make full use of the differences in core power consumption, performance and frequency to achieve an optimal balance between performance and power consumption.
  • the Linux scheduler in EAS can execute CFS
  • the Linux CPU idle mechanism is represented as the idle mechanism in FIG. 1;
  • the Linux CPU frequency scaling mechanism determines when to increase or decrease the CPU frequency;
  • the energy model can be used to balance energy consumption among the Linux scheduler, the Linux CPU idle mechanism, and the Linux CPU frequency scaling mechanism.
  • Embodiments of the present application provide a task allocation method and apparatus, which can allocate a task to be processed to a first core of a processor according to life information of multiple cores of a processor, and the first core processes the task. Since tasks are allocated according to the lifetime information of the cores, the lifetimes of multiple cores of the processor can be balanced, so that each core can work at a higher frequency.
  • the task assignment method can be performed by a task assignment apparatus, and the task assignment apparatus may be a computer system.
  • the apparatus can be implemented by a combination of software and hardware or by hardware.
  • the task allocation method may be implemented by the operating system executing software, or by the chip executing software solidified in the chip or the like.
  • the task allocation may be implemented by firmware (BIOS) and/or a task allocation layer (such as a scheduler) of an operating system, or may be implemented by a processor (or processor chip) such as a CPU, which is not specifically limited in this application.
  • the task allocation method provided in this application can be applied to a homogeneous processor or a heterogeneous processor.
  • the homogeneous processors are, for example, CPU homogeneous processors
  • the heterogeneous processors are, for example, heterogeneous processors of CPU and xPU.
  • the method may include the following steps:
  • S101: the task allocation apparatus acquires respective lifetime information of multiple cores of the processor, where the lifetime information can be used to describe the lifetime of the cores. It should be understood that the lifetime information may be referred to as the first information in the textual description and drawings of the present application.
  • life information or service life information may be used to determine the remaining lifetime and/or the used lifetime of the core, that is, the lifetime information may include remaining lifetime information and/or used lifetime information.
  • the total lifetime information can be used to describe the maximum working time that the core can support.
  • the used life information can be related to information such as the running time of the core's historical tasks or the workload of the historical tasks, and can be used to describe the length of the core's lifetime that has already been consumed.
  • the remaining life information can be determined according to the total life information and the used life information, and describes the remaining available time of the core. It should be understood that, unless otherwise specified in this application, the lifetime information may be any one or more items of remaining lifetime information or used lifetime information.
  • the lifetime information may be time information used to indicate the service life, or, for the convenience of calculation, the lifetime information may be a value quantified by normalization and other methods according to the time information.
  • for example, the life information can be 5 years, or it can be a value such as 100.
  • the duration corresponding to the total lifetime information of different cores may be different: it is generally around 5 years (or another duration), with a deviation of about 5%; the total lifetime information of the cores can be considered to follow a Gaussian distribution, and the normalized value may fluctuate around 100.
  • the remaining service life and/or the used service life of the core can also be normalized to a numerical value to obtain the remaining service life information and/or the used service life information, respectively.
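As a small illustration of the normalization mentioned above, the following sketch maps a lifetime duration in hours to a value around 100; the 5-year reference lifetime comes from the text, while the linear mapping, the sample values and all names are assumptions used only for illustration.

```c
#include <stdio.h>

/* Assumed reference lifetime: roughly 5 years of continuous operation. */
#define REF_LIFETIME_HOURS (5.0 * 365 * 24)

/* Normalize a lifetime duration (in hours) so that the reference
 * lifetime maps to 100; actual cores then fluctuate around 100. */
static double normalize_lifetime(double hours)
{
    return 100.0 * hours / REF_LIFETIME_HOURS;
}

int main(void)
{
    /* A core whose total lifetime deviates by about +5% from the reference. */
    double total_hours = REF_LIFETIME_HOURS * 1.05;
    double used_hours  = 2.0 * 365 * 24;   /* two years already consumed */

    printf("total  = %.1f\n", normalize_lifetime(total_hours));  /* ~105 */
    printf("used   = %.1f\n", normalize_lifetime(used_hours));   /* ~40  */
    printf("remain = %.1f\n",
           normalize_lifetime(total_hours - used_hours));
    return 0;
}
```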
  • S102: the task allocation apparatus determines, according to the respective lifetime information of the multiple cores, a first core for processing the first task from the multiple cores, where the first core is one of the multiple cores of the processor.
  • for example, when the lifespan information includes remaining lifespan information, the task allocation device may select the core with the largest remaining lifespan information as the first core; and/or, when the lifespan information includes used lifespan information, the task allocation device may select the core with the smallest used lifespan information as the first core.
  • alternatively, the first core may be selected from the cores whose remaining life information is greater than or equal to a first threshold and/or whose used life information is less than or equal to a second threshold; the selection method in this case may include random selection.
  • the cores may be sorted according to the lifetime information of each core, for example, the weight of each core is determined according to the lifetime information, and the assignment is performed according to the weight of each core during task allocation.
  • the weight of the core may indicate the possibility of the core being selected as the first core, and the higher the weight is, the more likely the corresponding core is to be determined as the first core.
  • the weights of the cores may be positively correlated with the remaining lifetime information of the cores, and/or negatively correlated with the used lifetime information of the cores.
  • in this way, the tasks to be processed can be allocated to the first core of the processor according to the life information of the multiple cores of the processor, so that the service life of each core can be balanced and the lifetime of the processor cores can be prolonged as much as possible, allowing the cores to operate at higher frequencies for longer periods of time.
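A minimal sketch of the kind of selection described in S102, assuming the lifetime information has already been reduced to numeric values; the data structure, the threshold value and the function names are illustrative and not taken from the patent.

```c
#include <stdio.h>

struct core_info {
    int    id;
    double remaining;   /* remaining lifetime information (normalized) */
    double used;        /* used lifetime information (normalized) */
};

/* Pick the first core: among cores whose remaining lifetime is at least
 * `min_remaining` (playing the role of the "first threshold"), choose the
 * one with the largest remaining lifetime.  Returns -1 if none qualifies. */
static int pick_first_core(const struct core_info *cores, int n,
                           double min_remaining)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (cores[i].remaining < min_remaining)
            continue;
        if (best < 0 || cores[i].remaining > cores[best].remaining)
            best = i;
    }
    return best < 0 ? -1 : cores[best].id;
}

int main(void)
{
    struct core_info cores[] = {
        { 0, 12.0, 88.0 },
        { 1, 55.0, 45.0 },
        { 2, 40.0, 60.0 },
    };
    printf("first core = %d\n",
           pick_first_core(cores, 3, 5.0));   /* selects core 1 */
    return 0;
}
```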
  • the process of determining the first core in S102 is a process of task allocation, which can be regarded as a process of determining the mapping relationship between the threads included in the task and the first core.
  • the task allocation device may determine, according to the respective second information of multiple physical core groups, a first physical core group corresponding to the first task from the multiple physical core groups, where the cores of the first physical core group include the first core. It should be understood that a cache may be shared among the cores of a physical core group.
  • the second information may be lifetime information of the physical core group, for example, the second information includes remaining lifetime information and/or used lifetime information of the physical core group, and the second information may also be the average lifetime information of the cores in the physical core group and / or the lifetime information of the core with the least remaining lifetime in the physical core group, etc.
  • the remaining lifetime information of the physical core group may be the sum of the remaining lifetime information of the cores included in the physical core group, or may be other parameters or indicators used to measure the remaining lifetime of the physical core group.
  • the used life information of the physical core set may be the sum of the used life information of the cores included in the physical core set, or may be other parameters or indicators used to measure the used life of the physical core set.
  • the average lifetime information of the cores in the physical core group may be determined according to the remaining lifetime information or the used lifetime information of each core in the physical core group, and the lifetime information of the core with the least remaining lifetime in the physical core group may be that core's remaining lifetime information and/or used lifetime information.
  • the core corresponding to each thread of the first task may be determined according to the lifetime information of the cores in the first physical core group. It should be understood that, when determining the physical core group corresponding to the task according to the second information, for example, the physical core group with larger remaining life information, with smaller used life information, or whose core with the least remaining life has larger life information may be used as the first physical core group, so as to further equalize the lifetime information of the cores and thereby further improve the lifetime of the cores.
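The group-level selection could, for example, look like the following sketch, which uses the remaining lifetime of the weakest core in each group as the "second information"; the concrete metric, the group sizes and the numbers are assumptions chosen only to illustrate the two-level idea.

```c
#include <stdio.h>

#define CORES_PER_GROUP 3

/* Remaining lifetime of each core, arranged by physical core group. */
static const double groups[][CORES_PER_GROUP] = {
    {  1.0, 10.0, 15.0 },   /* physical core group 1 */
    { 30.0, 28.0, 33.0 },   /* physical core group 2 */
};

/* One possible form of "second information": the remaining lifetime of the
 * core with the least remaining lifetime in the group. */
static double group_min_remaining(const double *g)
{
    double m = g[0];
    for (int i = 1; i < CORES_PER_GROUP; i++)
        if (g[i] < m)
            m = g[i];
    return m;
}

int main(void)
{
    int n_groups = sizeof(groups) / sizeof(groups[0]);
    int best = 0;

    /* Select the first physical core group as the one whose weakest core
     * still has the most remaining lifetime. */
    for (int g = 1; g < n_groups; g++)
        if (group_min_remaining(groups[g]) > group_min_remaining(groups[best]))
            best = g;

    printf("first physical core group = %d\n", best + 1);   /* group 2 */
    return 0;
}
```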
  • task 1 and task N respectively include three threads, which are denoted as thread 1 to thread 3.
  • the task allocating apparatus may determine the mapping relationship between the task and the physical core group of the CPU.
  • a physical core group may include one or more cores, and FIG. 3 is illustrated by taking the number of cores in each physical core group as 3 as an example.
  • the task assignment device may determine that task 1 corresponds to physical core group 1, and therefore assign thread 1 to thread 3 of task 1 to cores in physical core group 1 (the assignment relationship is shown in FIG. 3);
  • the task allocation device may determine that task N corresponds to physical core group N, and allocate thread 1 to thread 3 of task N to cores in physical core group N.
  • the first physical core group includes at least one second core.
  • the task allocation apparatus may select a physical core group corresponding to the task from a physical core group including at least one second core according to the second information of the physical core group.
  • the second core may be a core whose remaining life information is greater than or equal to a third threshold and/or whose used life information is less than or equal to a fourth threshold; for example, if the remaining life information of the cores in a certain physical core group is all less than the third threshold (for example, 5), this indicates that all cores in the physical core group are about to run out of life, and tasks can be assigned to cores in other physical core groups.
  • the second core in physical core group 1 may include core 2 and core 3
  • the second core in physical core group N may include core 2 and core 3 .
  • the task allocation device may select the physical core group corresponding to the task according to the weight of the physical core group.
  • the weight of a physical core group is positively correlated with the remaining lifetime information of the cores (or the second cores) in the physical core group, and/or negatively correlated with the used lifetime information of the cores (or the second cores) in the physical core group.
  • for example, the physical core group whose cores (or second cores) have the largest remaining life information may be used as the physical core group corresponding to the task; or the physical core group whose cores (or second cores) have the smallest used life information may be used as the physical core group corresponding to the task; or a physical core group whose cores (or second cores) have remaining life information greater than or equal to a fifth threshold may be used as the physical core group corresponding to the first task; or a physical core group whose cores (or second cores) have used life information less than or equal to a sixth threshold may be used as the physical core group corresponding to the task.
  • the task assignment device may assign the threads included in the task to the cores in the physical core group corresponding to the task according to the lifetime information of the cores in the physical core group, so as to form a mapping relationship between the cores and the threads.
  • the total number of threads to be allocated included in task 1 and task N may be the same as the total number of cores included in physical core group 1 and physical core group N participating in the allocation.
  • the allocation apparatus when the task allocation apparatus allocates threads to cores in a physical core group, the allocation may be performed according to the weight of the cores.
  • for example, the remaining life information of core 1 in physical core group 1 is 1, the remaining life information of core 2 is 10, and the remaining life information of core 3 is 15; then the weight of core 3 in physical core group 1 is higher than that of core 2, and the weight of core 2 is higher than that of core 1.
  • in this case, threads are preferentially assigned to core 3, then to core 2, and assigned to core 1 as little as possible. For example, as shown in FIG. 3, thread 2 and thread 3 can be assigned to core 3, and thread 1 can be assigned to core 2, that is, no thread of task 1 needs to be assigned to core 1.
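The following sketch reuses the numbers of the FIG. 3 example (remaining lifetimes 1, 10 and 15) and assigns threads greedily to the core with the most remaining lifetime; the per-thread cost used to decay a core's weight is an assumed value, not something specified in the text.

```c
#include <stdio.h>

#define NUM_CORES   3
#define NUM_THREADS 3

int main(void)
{
    /* Remaining lifetime of core 1..3 in physical core group 1 (from the
     * example in the text). */
    double remaining[NUM_CORES] = { 1.0, 10.0, 15.0 };
    /* Assumed cost per thread, so that a core becomes less attractive as
     * threads are assigned to it. */
    const double thread_cost = 4.0;
    int assignment[NUM_THREADS];

    for (int t = 0; t < NUM_THREADS; t++) {
        int best = 0;
        for (int c = 1; c < NUM_CORES; c++)
            if (remaining[c] > remaining[best])
                best = c;
        assignment[t] = best;
        remaining[best] -= thread_cost;
    }

    for (int t = 0; t < NUM_THREADS; t++)
        printf("thread %d -> core %d\n", t + 1, assignment[t] + 1);
    /* Two threads land on core 3, one on core 2, and core 1 receives no
     * thread of task 1, consistent with the spirit of the FIG. 3 example. */
    return 0;
}
```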
  • the lifetime information of the kth core may be determined according to at least one of the following: the historical running time of the kth core; the frequency and historical running time of the kth core; the voltage and historical running time of the kth core; the position of the kth core on the processor; the total memory access time of the kth core; the historical instruction count and average instruction running time of the kth core; or the frequency (or voltage, or the position of the core on the processor) of the kth core together with the total memory access time.
  • 1 ⁇ k ⁇ K, k and K are both positive integers.
  • the details are as follows:
  • the historical running time of the core may indicate the historical working time of the core, for example, it may be the total working time of the core since it was first run. It should be understood that the greater the historical run time of the core, the greater the used life information and/or the less the remaining life information of the core.
  • the running duration may refer to the clock cycles consumed by the core executing the instruction.
  • for example, a counter can be started to count the clock cycles consumed by the core in executing instructions, and the counted duration determines the historical running time.
  • the historical instruction count of the kth core and the average running time of instructions can be used to determine the historical running time of the kth core; therefore, the lifetime information of the kth core can be determined according to the historical instruction count and the average instruction running time of the kth core.
  • the number of historical instructions refers to the total number of instructions included in all historical tasks that have been processed by the core.
  • the number of historical instructions of the kth core may be the total number of instructions included in all historical tasks executed by the kth core before the first task is allocated.
  • the average running time of an instruction refers to the average running time of executing one instruction, which can be determined according to the running time of multiple instructions and the number of instructions of multiple instructions.
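For illustration, if the average per-instruction running time is expressed as cycles per instruction (CPI), the historical running time can be estimated as shown below; the exact combination of these quantities is an assumption, not the patent's formula.

```c
#include <stdio.h>

/* Estimate a core's historical running time from its historical instruction
 * count and its average per-instruction cost.  Here the average cost is
 * taken as cycles per instruction (CPI); dividing by the core frequency
 * converts cycles to seconds.  This combination is an illustrative
 * assumption only. */
static double historical_run_time_s(double n_instructions,
                                    double cpi,
                                    double freq_hz)
{
    return n_instructions * cpi / freq_hz;
}

int main(void)
{
    /* 10^13 instructions executed so far, CPI of 1.2, 2.5 GHz core. */
    double t = historical_run_time_s(1e13, 1.2, 2.5e9);
    printf("historical running time ~ %.0f s (~%.1f h)\n", t, t / 3600.0);
    return 0;
}
```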
  • the frequency of the kth core may be the average frequency of the core during a period of time (for example, the historical running time) or in the process of completing certain tasks (for example, all historical tasks). It can be understood that, for the same historical running time, the greater the frequency of the core, the greater the wear of the core, that is, the greater the used life information and/or the smaller the remaining life information; therefore, the lifetime information of the core can be determined from the frequency and historical running time of the core.
  • the voltage of the kth core may be the average voltage of the core during a period of time (for example, the historical running time) or in the process of completing certain tasks (for example, all historical tasks). It can be understood that, for the same historical running time, the greater the voltage of the core, the greater the wear of the core, that is, the greater the used life information and/or the smaller the remaining life information; therefore, the lifetime information of the core can be determined from the voltage and historical running time of the core.
  • the location of the core on the processor may also affect the lifetime information of the core, where the location refers to the physical location of the core on the chip, and/or the relative location between the cores.
  • the thermal density differs at different positions on the processor, and the thermal density is related to the position of the core on the processor.
  • generally, the closer to the center of the chip, or the closer to the center of an area where physical core groups are dense, the higher the thermal density; and under the same frequency and/or historical operating time, the higher the thermal density, the higher the junction temperature.
  • the junction temperature refers to the temperature of the semiconductor transistor. Under the same operating time, the life of the core decreases faster at a higher junction temperature.
  • for example, Tj1, Tj2 and Tj3 represent the junction temperatures at position 1, position 2 and position 3 shown in FIG. 4, respectively, where the distances of these positions from the center of the chip increase in sequence; that is, the distance between position 1 and the center of the chip is less than the distance between position 2 and the center of the chip, and the distance between position 2 and the center of the chip is less than the distance between position 3 and the center of the chip. Then Tj1, Tj2 and Tj3 satisfy the following relationship: Tj1 > Tj2 > Tj3.
  • similarly, Tj1, Tj2 and Tj3 respectively represent the junction temperatures at position 1, position 2 and position 3 shown in FIG. 5, where the distances of the positions from the center of the area where the physical core groups are dense increase in sequence, so that Tj1 > Tj2 > Tj3.
  • the remaining life information and/or the used life information of the core may be determined according to the junction temperature of the core; for example, in the case of the same historical operating time, the greater the junction temperature, the greater the used life information and/or the smaller the remaining life information.
  • the chip space can be divided into multiple areas according to the chip temperature distribution, and a limited number of temperature sensors can be placed in the different areas.
  • the junction temperatures obtained by the different temperature sensors can be fed back to the task allocation device in real time and used to determine the junction temperature of the area where each temperature sensor is located.
  • the memory access time of a core, which can also be called the access time, refers to the time elapsed from when the core starts a memory operation until the operation is completed.
  • memory operations include memory accesses, such as reading data in memory.
  • the memory access time is related to the hardware parameters of the processor, which can be understood as the same memory access time for the same core.
  • the total memory access time of a core refers to the sum of the memory access times in the core processing at least one historical task. It can be understood that the longer the total memory access time of the core, the more memory access times required for the tasks processed by the core.
  • the remaining lifetime information and/or the used lifetime information of the core may be determined according to the total memory access time of the core: the longer the total memory access time of the core, the larger the used lifetime information and/or the smaller the remaining lifetime information of the core.
  • the total memory access time of the core may also be replaced by the hidden total memory access time of the core or the unhidden total memory access time of the core.
  • the total hidden fetch time for a core is the sum of the hidden fetch times in which the core processes at least one task.
  • hidden memory access time means that the core performs operations other than the memory operation while the memory operation is in progress (that is, during the memory access time, or during part of the memory access time), so that the core does not idle during the memory operation; the time during which such other operations are performed within the memory access time corresponding to this memory operation is called the hidden memory access time.
  • the task assignment device may tend to assign tasks to cores with less total hidden fetch time.
  • the total unhidden fetch time is the sum of the unhidden fetch times in the core processing at least one task.
  • the unhidden memory access time refers to the time when no other operations other than memory operations are performed during the memory access time.
  • the task assignment device may tend to assign tasks to cores with less total unhidden memory access time.
  • the task allocation device may determine the used life information of the kth core according to any one piece of information among the frequency of the kth core, the voltage of the kth core, the position of the kth core on the CPU, the historical running time of the kth core, or the total memory access time of the kth core, and may also determine the used life information of the kth core according to multiple pieces of such information.
  • for example, weights may be set for the pieces of information respectively, and the used life information of the kth core is determined according to the multiple pieces of information and the weights corresponding to the multiple pieces of information.
  • the core lifetime information can be further determined in combination with the core frequency, voltage or the position of the core on the processor.
  • for example, when the historical running times of two cores are the same, the core with the higher frequency or voltage, or the core located closer to the center of the processor chip or to the center of an area where physical core groups are dense, has larger used life information and/or smaller remaining life information.
  • the manner of determining the lifetime information shown above is merely an example, and in actual use, an extended manner such as permutation and combination may be performed on the basis of the manner exemplified above to determine the lifetime information of the kth core.
  • the lifetime information can also be determined according to the historical running time, frequency, voltage and position of the core on the processor.
  • the used life information of the kth core may be determined based on the historical instruction count of the kth core, the average running time of instructions, the frequency of the kth core, the memory access time of the kth core, and the ratio between the unhidden memory access time in at least one historical task (for example, all historical tasks) of the kth core and the total running time of the at least one historical task.
  • the number of historical instructions of the kth core, the average running time of instructions, the frequency of the kth core, and the memory access time of the kth core may refer to the foregoing description.
  • the unhidden memory access time in the at least one historical task of the kth core may be the unhidden memory access time during the execution of the at least one historical task by the kth core, and the total running time of the at least one historical task may be the clock cycles consumed by the kth core to execute the at least one historical task.
  • the used life information of the k th core may conform to the following formula 1:
  • T_comp represents the used life information of the kth core;
  • F represents the frequency of the kth core;
  • CPI_k represents the average running time of the kth core for executing one instruction, and CPI_k can be regarded as unchanged; the remaining term in formula 1 represents the memory access time of the kth core.
  • the remaining lifetime information of the core may be determined according to the total lifetime information and the used lifetime information.
  • T_core_total represents the total lifetime information of the kth core;
  • T_core represents the remaining lifetime information of the kth core;
  • T_core complies with the following formula 2:
  • T_core = T_core_total - T_comp.
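A hedged numeric sketch of formula 2: the subtraction itself is taken from the text, whereas the way the used lifetime T_comp is accumulated from the instruction count, CPI, frequency and unhidden memory access time is only an assumed stand-in for formula 1, which is not reproduced in this excerpt.

```c
#include <stdio.h>

/* Hedged model of the inputs listed before formula 1: a compute term
 * (instructions x CPI / frequency) plus the unhidden memory access time.
 * The exact combination in formula 1 is not reproduced in the text, so
 * this is only an illustrative stand-in. */
static double used_lifetime_s(double n_instructions, double cpi,
                              double freq_hz, double unhidden_mem_time_s)
{
    return n_instructions * cpi / freq_hz + unhidden_mem_time_s;
}

/* Formula 2 from the text: remaining = total - used. */
static double remaining_lifetime_s(double total_s, double used_s)
{
    return total_s - used_s;
}

int main(void)
{
    double total  = 5.0 * 365 * 24 * 3600;           /* ~5 years in seconds */
    double used   = used_lifetime_s(1e13, 1.2, 2.5e9, 3.0e3);
    double remain = remaining_lifetime_s(total, used);

    printf("T_comp = %.0f s, T_core = %.0f s\n", used, remain);
    return 0;
}
```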
  • after the used life information of the kth core is determined according to the frequency of the kth core, the voltage of the kth core, the historical running time of the kth core, or the total memory access time of the kth core, or according to the historical instruction count of the kth core, the average running time of instructions, the frequency of the kth core, the memory access time of the kth core, and the ratio between the unhidden memory access time in at least one historical task of the kth core and the total running time of the at least one historical task, the used life information of the kth core can also be corrected according to the position of the kth core on the CPU.
  • the junction temperature Tj of the kth core can be determined according to the position of the kth core on the processor, then the relationship between the used life information of the kth core before and after the correction and Tj can conform to Equation 3:
  • T_comp′ represents the used life information of the kth core after correction;
  • T_comp represents the used life information of the kth core before correction;
  • a is a correction coefficient, which can be a set value.
  • similarly, the remaining life information of the kth core can be corrected according to the position of the kth core on the CPU, and the relationship between the remaining life information of the kth core before and after the correction and Tj can conform to formula 4:
  • T_core′ represents the remaining life information of the kth core after correction;
  • T_core represents the remaining life information of the kth core before correction;
  • a is a correction coefficient, which can be a set value.
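The text does not reproduce formulas 3 and 4, only that the corrected values depend on Tj and a correction coefficient a; the multiplicative form below is therefore purely an assumed instantiation, shown only to illustrate how such a position-based correction might be applied, with assumed values for a and the reference temperature.

```c
#include <stdio.h>

/* Assumed instantiation of a junction-temperature correction: hotter
 * positions age faster, so the effective used lifetime grows and the
 * effective remaining lifetime shrinks.  Not the patent's formulas. */
static double correct_used(double t_comp, double tj, double tj_ref, double a)
{
    return t_comp * (1.0 + a * (tj - tj_ref));
}

static double correct_remaining(double t_core, double tj, double tj_ref, double a)
{
    return t_core * (1.0 - a * (tj - tj_ref));
}

int main(void)
{
    double a = 0.01, tj_ref = 60.0;            /* assumed values */
    printf("T_comp' = %.1f\n", correct_used(1000.0, 75.0, tj_ref, a));
    printf("T_core' = %.1f\n", correct_remaining(4000.0, 75.0, tj_ref, a));
    return 0;
}
```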
  • the first core may be determined according to the respective lifetime information of the multiple cores and the task amount information of the first task.
  • the task amount information of the first task may indicate the number of instructions included in the first task, and the number of instructions may be used to determine the running time for executing the first task.
  • the running time for the core to execute the first task may be determined according to the average running time of any core for executing one instruction and the number of instructions included in the first task.
  • the task amount information of the first task may also indicate the running time of the first task.
  • the following description takes the case where the first task is a thread as an example.
  • the first core may also be determined according to the task amount information of the thread and the respective lifetime information of the multiple cores of the processor.
  • the task amount information of the thread can be used to determine the running time of executing the thread. Taking the case where the life information is the time information of the remaining life or the time information of the used life as an example, the running time of the thread can be determined according to the task amount information of the thread, and the first core can be selected from the cores for which the difference between the time information corresponding to the remaining life and the running time of executing the thread is greater than or equal to a seventh threshold, and/or from the cores for which the sum of the used life information and the task amount information of the thread is less than or equal to an eighth threshold.
  • the selection manner of selecting the first core may be random selection, or selection based on the remaining life information and/or the used life information, or the like.
  • the remaining lifetime information of the selected first core may be limited to be not less than the task amount information of the thread, so as to prevent the core from running out of lifetime after executing the thread.
  • at least one of the first threshold, second threshold, third threshold, fourth threshold, fifth threshold, sixth threshold, seventh threshold or eighth threshold involved in this application may be determined according to the average lifespan information of the cores of the processor other than a given core, or according to the average lifespan information of all cores of the processor.
  • the remaining life information and/or the used life information of the first core can be updated and stored according to the task amount information of the thread, and used as the lifetime information of the core in subsequent task allocation.
  • the update of the lifetime information of the core may be performed according to the set duration, or the lifetime information of the core may be updated after the core executes one or more tasks.
  • the method of updating the used life information of the core can be described with reference to the foregoing method of determining the used life information of the core.
  • the processor determines, according to Formula 1, at time 1 before executing the first task, the respectively used lifetime information of the K cores, and determines the first core from the K cores.
  • the processor may update the used life information of the first core before time 2 according to the information of the tasks (for example, including the first task) that have been executed by the first core.
  • the used life information T comp ′ updated by the first core may conform to formula 5:
  • F represents the frequency of the processor
  • CPI k represents the average running time for the kth core to execute an instruction.
  • the used life information of the core can also be updated according to other methods than Formula 1, which is not specifically limited in this application.
  • the duration of the core executing the first task can also be counted by a counter, and the used life information of the first core before time 2 is obtained according to the duration and the used life information of the first core before time 1.
  • the updated remaining lifetime information of the kth core may be determined according to the total lifetime information of the kth core and the updated used lifetime information of the kth core.
  • for details, reference can be made to formula 2.
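A sketch of the counter-based update described above: the cycles spent by the first core on the first task are converted to a duration and added to the used lifetime recorded before time 1. Formula 5 itself is not reproduced in the text, and all values and names here are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>

/* Counter-based update: the clock cycles consumed by the first core while
 * executing the first task are counted, converted to a duration, and added
 * to the used lifetime recorded before time 1. */
static double update_used_lifetime_s(double used_before_s,
                                     uint64_t cycles_for_task,
                                     double freq_hz)
{
    return used_before_s + (double)cycles_for_task / freq_hz;
}

int main(void)
{
    double   used_before = 3.6e7;          /* used lifetime before time 1 (s) */
    uint64_t cycles      = 9000000000ULL;  /* cycles spent on the first task  */
    double   freq        = 3.0e9;          /* core frequency in Hz            */

    double used_after = update_used_lifetime_s(used_before, cycles, freq);
    printf("used lifetime before time 2: %.1f s\n", used_after);
    return 0;
}
```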
  • the core lifetime information may be stored in the BIOS.
  • a core lifetime recorder unit, or simply a recording unit, may be added to the BIOS to store the lifetime information of the cores.
  • the module or unit performing the task assignment can obtain the life information of the core from the recording unit.
  • the updated lifetime information of the cores may also be stored to the recording unit.
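The interaction with the recording unit could be organized roughly as below; the recording unit is simulated here with an in-memory array, and the accessor names are hypothetical, since the actual BIOS interface is not described in this excerpt.

```c
#include <stdio.h>

#define NUM_CORES 4

/* In a real system the recording unit lives in the BIOS (editable memory);
 * here it is simulated with a plain array.  The function names are
 * hypothetical and only illustrate the read -> allocate -> update -> store
 * loop described in the text. */
static double recording_unit[NUM_CORES] = { 10.0, 42.0, 37.0, 5.0 };

static double record_unit_read(int core)            { return recording_unit[core]; }
static void   record_unit_write(int core, double v) { recording_unit[core] = v; }

int main(void)
{
    /* 1. Read the remaining-lifetime information of every core. */
    int best = 0;
    for (int c = 1; c < NUM_CORES; c++)
        if (record_unit_read(c) > record_unit_read(best))
            best = c;

    /* 2. Allocate the pending task to the core with the most lifetime left. */
    printf("task assigned to core %d\n", best);

    /* 3. After the task finishes, update and store the core's lifetime. */
    double task_cost = 1.5;                        /* assumed consumption */
    record_unit_write(best, record_unit_read(best) - task_cost);
    printf("core %d remaining lifetime is now %.1f\n",
           best, record_unit_read(best));
    return 0;
}
```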
  • Manner 1: The task allocation method provided by the embodiment of the present application is implemented by executing software on the operating system of the processor.
  • the Linux scheduler can execute the software (such as executing computer program instructions), so that the Linux scheduler can perform task allocation according to the life information of the core, that is, determine the relationship between the core and the core according to the life information of the core. Mapping relationship between tasks/threads.
  • the Linux scheduler can store computer program instructions, or acquire and execute program instructions from a memory, so as to implement the task allocation method.
  • the task allocation process may include: after the application program runs, the tasks of the application program perform resource allocation through the application programming interface (API) of the Linux operating system (represented as the operating system in FIG. 6); the driver of the Linux operating system sends the task to the Linux scheduler, and the Linux scheduler executes the software and realizes the following steps: obtain the life information of the cores from the recording unit of the BIOS (in other words, the life information of the cores stored in the recording unit is transparently transmitted to the Linux scheduler), and determine the mapping relationship between the threads included in the task and the cores according to the life information of the cores, that is, realize task allocation.
  • each thread is executed by the core corresponding to the thread; after the thread ends, the lifespan information of the core corresponding to the thread is updated, and the updated lifespan information is stored in the recording unit of the BIOS, so that the next task allocation can be performed according to the updated lifespan information of the core.
  • Manner 2: The task allocation method provided by the embodiment of the present application is implemented by a processor.
  • the CPU can execute the software solidified in the CPU (such as executing computer program instructions), so that the CPU can allocate tasks according to the life information of the core, that is, determine the threads included in the core and the task according to the life information of the core. the mapping relationship between them.
  • computer program instructions can also be obtained by the CPU from a storage system or a memory other than the CPU, so as to implement the task allocation method.
  • the task allocation process may include: after the application program runs, the tasks of the application program perform resource allocation through the API of the Linux operating system (represented as operating system in FIG. 7 ).
  • the driver of the Linux operating system sends the task to the Linux scheduler (represented as the scheduler in FIG. 7), the Linux scheduler sends the task to the CPU, and the CPU executes the software to implement the following steps: obtain the lifespan information of the cores from the recording unit of the BIOS through the hardware interface of the CPU (or, in other words, the lifespan information of the cores stored in the recording unit is transparently transmitted to the CPU), and determine the mapping relationship between the threads and the cores according to the lifespan information of the cores.
  • Each thread is executed by the core corresponding to the thread. After the thread ends, the life information of the core corresponding to the thread is updated by the CPU, and the updated life information is stored in the recording unit of the BIOS through the hardware interface, so as to follow the updated life of the core. information for the next task assignment.
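  • The update and write-back step can be pictured with the sketch below. The read/write helpers stand in for the CPU hardware interface to the recording unit (real firmware access is platform specific), and the increment model used for the finished thread (instruction count times average time per instruction) is only an assumption made for this example.

```c
#include <stdio.h>

/* Stand-ins for the CPU hardware interface to the BIOS recording unit;
 * they only model the read/update/write-back cycle described above. */
static double recording_unit[8];   /* used lifetime per core, in seconds */

static double read_used_life(int core)            { return recording_unit[core]; }
static void   write_used_life(int core, double v) { recording_unit[core] = v; }

/* Called when a thread finishes on `core`: convert the work it did into a
 * used-lifetime increment and persist the new value. */
static void on_thread_finished(int core, long instructions, double avg_inst_time)
{
    double used = read_used_life(core);
    used += (double)instructions * avg_inst_time;
    write_used_life(core, used);
}

int main(void)
{
    on_thread_finished(2, 1000000L, 1e-9);   /* 1M instructions, ~1 ns each */
    printf("core 2 used lifetime: %g s\n", read_used_life(2));
    return 0;
}
```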
  • Manner 3: application of the task allocation method provided by the embodiments of the present application in a heterogeneous processor.
  • As shown in FIG. 8, taking a heterogeneous processor composed of a CPU and a GPU as an example, the CPU can be used to schedule the tasks of the GPU.
  • The CPU and the GPU can be connected through a peripheral component interconnect express (PCIe) bus or in other ways.
  • In this case, the operating system of the CPU can allocate the tasks of the GPU according to the lifetime information of the GPU cores.
  • Optionally, the PCIe bus can be used to connect the dynamic random access memory (DRAM) of the CPU and the DRAM of the GPU.
  • The CPU may further include a control unit, an arithmetic and logic unit (ALU) and a cache memory (Cache).
  • the Linux operating system of the CPU can obtain the lifespan information of the GPU core through the PCIe bus and store it in the recording unit of the BIOS of the CPU.
  • Alternatively, the Linux operating system of the CPU can obtain, through the PCIe bus, the parameters of the GPU cores used to determine the lifetime information, for example, at least one of: the total lifetime information of the kth core of the GPU, the frequency of the processor, the historical instruction count of the kth core, the average run time of an instruction, the memory access time of the kth core, or the ratio of the unhidden memory access time of the kth core in executing at least one task to the running time of the at least one task. The lifetime information of the GPU cores is then determined according to the acquired parameters and stored in the recording unit of the CPU BIOS.
  • the lifetime information of the GPU core and/or the parameters for determining the lifetime information of the GPU core may be stored in the DRAM of the GPU.
  • When the tasks of the GPU need to be allocated, still taking the Linux operating system as an example, the Linux scheduler of the CPU's Linux operating system, or the CPU itself, can allocate the GPU tasks according to the lifetime information of the GPU cores stored in the recording unit of the BIOS, that is, determine the mapping relationship between the threads included in the GPU tasks and the cores of the GPU.
  • The manner in which the Linux scheduler performs the GPU task allocation may refer to the description of Manner 1 above, and the manner in which the CPU performs the GPU task allocation may refer to the description of Manner 2 above, which will not be repeated here.
  • Optionally, if the CPU determines the mapping relationship between the threads and the cores of the GPU, the CPU can notify the GPU of the mapping relationship through the bus.
  • In addition, after the GPU cores execute their threads, the lifetime information of the GPU cores can be updated by the Linux scheduler or the CPU, and the updated lifetime information of the GPU cores is stored in the recording unit.
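  • A rough sketch of this CPU-side flow is shown below. The pcie_read_params and pcie_notify_mapping helpers are placeholders invented for the example (a real system would go through the PCIe driver and the GPU's DRAM), and the numbers are fabricated purely for illustration.

```c
#include <stdio.h>

#define GPU_CORES 4

/* Per-core parameters that could be fetched from the GPU over PCIe. */
struct gpu_core_params {
    double total_life;
    double used_life;
};

static struct gpu_core_params pcie_read_params(int k)
{
    struct gpu_core_params p = { 100.0, 10.0 * k };   /* fabricated values */
    return p;
}

static void pcie_notify_mapping(int thread_id, int gpu_core)
{
    printf("notify GPU over PCIe: thread %d -> GPU core %d\n",
           thread_id, gpu_core);
}

int main(void)
{
    int best = 0;
    double best_remaining = -1.0;

    /* CPU side: derive remaining lifetime per GPU core and pick the largest. */
    for (int k = 0; k < GPU_CORES; k++) {
        struct gpu_core_params p = pcie_read_params(k);
        double remaining = p.total_life - p.used_life;
        if (remaining > best_remaining) {
            best_remaining = remaining;
            best = k;
        }
    }
    pcie_notify_mapping(0, best);   /* tell the GPU the chosen mapping */
    return 0;
}
```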
  • In summary, the embodiments of the present application propose a multi-core management software and hardware system design for scenarios such as homogeneous processors and heterogeneous processors, so that the task allocation process is performed according to the lifetime information of the cores, which addresses the load-balancing problem of the CPU cores and prolongs the lifetime of the cores. Because the core lifetime is extended, the cores can keep operating at a higher frequency, thus improving the performance of the processor.
  • As shown in FIG. 9, an embodiment of the present application provides another task allocation method. Taking the task allocation apparatus executing the method as an example, the method may include the following steps:
  • The task allocation apparatus acquires respective third information of multiple processors, where the third information is used to describe the service life of a processor.
  • The third information may be lifetime information of a processor; for example, the third information includes remaining lifetime information and/or used lifetime information of the processor. The third information may also be the average lifetime information of the cores in the processor and/or the lifetime information of the core with the least remaining lifetime in the processor.
  • The remaining lifetime information of a processor may be the sum of the remaining lifetime information of the cores included in the processor, or may be another parameter or indicator used to measure the remaining lifetime of the processor.
  • The used lifetime information of a processor may be the sum of the used lifetime information of the cores included in the processor, or may be another parameter or indicator used to measure the used lifetime of the processor.
  • The average lifetime information of the cores in a processor may be determined according to the remaining lifetime information or the used lifetime information of each core in the processor.
  • The lifetime information of the core with the least remaining lifetime in a processor may be the remaining lifetime information and/or the used lifetime information of that core.
  • It should be understood that, in the task allocation method shown in FIG. 9, the task allocation apparatus may be one of the multiple processors, or may be a computer system other than the multiple processors.
  • The task allocation apparatus determines, according to the respective third information of the multiple processors, a first processor for processing a second task from the multiple processors, where the second task is a task to be processed.
  • For example, the task allocation apparatus may select the first processor from the processors whose remaining lifetime indicated by the third information is larger and/or whose used lifetime is smaller.
  • With the above method, the task allocation apparatus performs task allocation according to the respective third information of the multiple processors, so the lifetimes of the multiple processors are balanced, which prolongs the service life of the processor system composed of the multiple processors.
  • the task allocation apparatus may determine the first processor according to the respective third information of the multiple processors and the task amount information of the second task.
  • The manner of determining the first processor according to the respective third information of the multiple processors and the task amount information of the second task may refer to the manner in S102 of determining the first core according to the respective lifetime information of the multiple cores and the task amount information of the first task, so as to avoid exhausting the lifetime of a processor by executing the task. For example, a processor whose remaining lifetime indicated by the third information is not lower than the task amount information of the second task may be determined as the first processor. A sketch of this processor-level selection is given below.
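  • As indicated above, the selection can be sketched as follows; the numeric lifetime and task-amount values are assumptions made for the example.

```c
#include <stdio.h>

#define NUM_PROCESSORS 3

/* "Third information" per processor, modelled here as remaining lifetime. */
static double proc_remaining[NUM_PROCESSORS] = { 12.0, 48.0, 30.0 };

/* Choose a processor whose remaining lifetime is not lower than the task
 * amount of the second task, preferring the largest remaining lifetime;
 * return -1 if none qualifies. */
static int pick_processor(double task_amount)
{
    int best = -1;
    for (int p = 0; p < NUM_PROCESSORS; p++) {
        if (proc_remaining[p] >= task_amount &&
            (best < 0 || proc_remaining[p] > proc_remaining[best]))
            best = p;
    }
    return best;
}

int main(void)
{
    int p = pick_processor(20.0);
    printf("second task -> processor %d\n", p);
    return 0;
}
```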
  • Based on the same inventive concept, the embodiments of the present application further provide a task allocation apparatus, which is used to implement the steps shown in the above method embodiments.
  • The apparatus may include the structure shown in FIG. 10 and/or FIG. 11.
  • The task allocation apparatus can be applied to a computer system with multiple processor cores, and can be used to implement the task allocation method shown in FIG. 2 and/or FIG. 9.
  • the task assignment apparatus may include a recording module 1010 and a task assignment module 1020 .
  • the recording module 1010 may be configured to acquire respective first information of multiple cores of the processor, where the first information is used to describe the service life of the cores.
  • the task allocation module 1020 may be configured to determine, according to the respective first information of the multiple cores, a first core for processing the first task from the multiple cores, and the first task is a task to be processed.
  • It should be understood that the task allocation apparatus in the embodiments of the present application may be implemented by software, for example, by a computer program or instructions having the functions of the recording module 1010 and/or the task allocation module 1020 described above; the corresponding computer program or instructions may be stored in a memory inside the terminal, and the above functions of the recording module 1010 and/or the task allocation module 1020 are realized by the processor reading and executing the corresponding computer program or instructions from the memory.
  • Alternatively, the task allocation apparatus in the embodiments of the present application may also be implemented by hardware.
  • In that case, the task allocation module 1020 may include a processor (for example, a CPU or a processor in a system chip).
  • The recording module 1010 may include a memory, or include a communication interface that supports communication with a memory, such as a transceiver or an input/output interface, so that the task allocation module 1020 can obtain the first information of the cores from the memory.
  • the task allocation module 1020 may determine the first core according to the respective first information of the multiple cores and the task amount information of the first task.
  • In an optional implementation, if the processor in the computer system includes multiple physical core groups, any physical core group includes multiple cores, and the first task includes multiple threads, the task allocation module 1020 may determine, according to the first information of the cores in the physical core groups, the first physical core group corresponding to the first task, and then determine, according to the first information of the cores in the first physical core group, the core corresponding to each thread of the first task from the cores of the first physical core group; a simplified two-level sketch of this selection is given below.
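  • By way of illustration only, a two-level version of the selection could look like the following sketch; the group criterion (sum of remaining lifetimes) and all values are assumptions made for the example, and other criteria from the second information (average lifetime, or the core with the least remaining lifetime) could be substituted.

```c
#include <stdio.h>

#define GROUPS          2
#define CORES_PER_GROUP 3

/* Remaining lifetime per core, grouped by physical core group (values are
 * invented for the example). */
static double life[GROUPS][CORES_PER_GROUP] = {
    { 1.0, 10.0, 15.0 },
    { 8.0,  9.0,  7.0 },
};

/* Step 1: pick the physical core group whose total remaining lifetime is
 * largest (one possible "second information" criterion). */
static int pick_group(void)
{
    int best = 0;
    double best_sum = -1.0;
    for (int g = 0; g < GROUPS; g++) {
        double sum = 0.0;
        for (int c = 0; c < CORES_PER_GROUP; c++)
            sum += life[g][c];
        if (sum > best_sum) {
            best_sum = sum;
            best = g;
        }
    }
    return best;
}

/* Step 2: inside the chosen group, map each thread to the core with the
 * largest remaining lifetime, charging the core as we go. */
int main(void)
{
    int g = pick_group();
    double thread_amount[3] = { 6.0, 4.0, 1.0 };

    for (int t = 0; t < 3; t++) {
        int best = 0;
        for (int c = 1; c < CORES_PER_GROUP; c++)
            if (life[g][c] > life[g][best])
                best = c;
        life[g][best] -= thread_amount[t];
        printf("thread %d -> group %d core %d\n", t, g, best);
    }
    return 0;
}
```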
  • the recording module 1010 is further configured to store updated first information of the first core, and the updated first information is determined according to the first information of the first core and the task amount information of the first task.
  • the updated first information may be determined by the task assignment module 1020 .
  • the first information of the kth core of the processor includes remaining lifetime information of the kth core; and/or the first information includes the used lifetime information of the kth core.
  • the remaining lifetime information of the kth core is determined according to the total lifetime information and the used lifetime information of the kth core.
  • In an optional implementation, the lifetime information of the kth core may be determined according to at least one of the following: the historical running time of the kth core; the frequency and historical running time of the kth core; the voltage and historical running time of the kth core; the position of the kth core on the processor; the total memory access time of the kth core; the historical instruction count of the kth core and the average run time of an instruction; or the frequency and total memory access time of the kth core. A sketch of one such computation is given below.
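  • As an illustration only, the following sketch shows one way these parameters could be turned into used and remaining lifetime values. The specific combination of the compute and memory-access terms, and all numeric values, are assumptions made for this example; the "remaining = total − used" relation and a simple linear junction-temperature correction (reflecting the position factor) follow the description.

```c
#include <stdio.h>

/* Inputs listed above for estimating the lifetime of the kth core. How the
 * compute and memory terms are combined in used_lifetime() is an assumption
 * made for this sketch. */
struct core_history {
    double    freq_hz;          /* core frequency                          */
    long long instructions;     /* historical instruction count            */
    double    cpi;              /* average cycles per instruction          */
    double    unhidden_mem_s;   /* unhidden memory-access time, in seconds */
    double    junction_temp;    /* Tj implied by the core's position       */
};

static double used_lifetime(const struct core_history *h)
{
    double compute_s = (double)h->instructions * h->cpi / h->freq_hz;
    return compute_s + h->unhidden_mem_s;
}

static double remaining_lifetime(double total_s, const struct core_history *h,
                                 double a /* correction coefficient */)
{
    double t_core = total_s - used_lifetime(h);   /* remaining = total - used */
    return t_core - a * h->junction_temp;         /* position (Tj) correction */
}

int main(void)
{
    struct core_history h = { 2.0e9, 5000000000LL, 1.2, 0.4, 85.0 };
    printf("estimated remaining lifetime: %.3f s\n",
           remaining_lifetime(3600.0, &h, 0.001));
    return 0;
}
```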
  • the recording module 1010 belongs to the firmware, or the recording module 1010 may acquire the respective first information of the multiple cores of the processor from the firmware.
  • Specifically, the recording module includes an editable memory in the firmware, and the editable memory can be used to store at least one of the first information, the second information or the third information described in this application.
  • For example, the editable memory is an erasable programmable read-only memory (EPROM) or an electrically erasable programmable read-only memory (EEPROM).
  • Alternatively, the recording module includes a performance monitor unit (PMU) of the processor, which is used by the processor to obtain at least one of the first information, the second information or the third information from the firmware.
  • the task allocation module 1020 includes a task scheduler.
  • the task allocation module 1020 is a Linux scheduler in the Linux operating system.
  • the recording module 1010 may acquire third information of multiple processors respectively, where the third information is used to describe the service life of the processors.
  • The task allocation module 1020 may determine, according to the respective third information of the multiple processors, a first processor for processing the second task from the multiple processors, where the second task is a task to be processed.
  • the task allocation module 1020 may specifically determine the first processor according to the respective third information of the multiple processors and the task amount information of the second task.
  • Based on the same technical concept, the embodiments of the present application further provide another task allocation apparatus, which may include the structure shown in FIG. 11 and is used to perform the actions of the task allocation methods provided in FIG. 2, FIG. 9 and/or other embodiments of the present application.
  • the task allocation apparatus may include a processor 1110 and a memory 1120 .
  • the processor 1110 may include multiple cores.
  • the memory 1120 may be used to store lifetime information of a plurality of cores.
  • the processor 1110 may be configured to execute the task allocation method described in the foregoing embodiments. It should be understood that in FIG. 11 , only one processor 1110 and one memory 1120 are used as an example for description, and the task allocation apparatus provided in this application may include other numbers of memories 1120 and processors 1110 .
  • the processor 1110 and the memory 1120 are connected to each other through a bus.
  • The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the at least one processor 1110 may include at least one of the following: a CPU, a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • The CPU may include a power consumption controller and at least one processor core, and the power consumption controller can acquire failure information of the at least one processor core and store the failure information of the at least one processor core into the memory 1120.
  • The memory 1120 can be a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compressed optical discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and be connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 1120 is used for storing computer-executed instructions for executing the solutions of the present application, and the execution is controlled by the processor 1110 .
  • the processor 1110 is configured to execute the computer-executed instructions stored in the memory 1120, thereby implementing the task scheduling method provided by the above embodiments of the present application.
  • the function of the task allocation module 1020 shown in FIG. 10 can be implemented by the processor 1110 .
  • the function of the recording module 1010 shown in FIG. 10 can be implemented by the memory 1120, that is, acquiring the first information of the core, and/or storing the updated first information of the core.
  • Optionally, the task allocation apparatus shown in FIG. 11 may further include a communication interface, such as a transceiver or an input/output interface.
  • a communication interface such as a transceiver or an input/output interface.
  • For example, when the lifetime information of the cores is not stored in the memory 1120, the first information of the cores may be acquired from another memory (or another storage medium) through the interface, and/or the updated first information of the cores may be sent to another memory.
  • the computer program instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and when the computer program is executed by a computer, the computer can implement the processes related to the foregoing method embodiments.
  • Embodiments of the present application further provide a computer program product, where the computer program product is used to store a computer program, and when the computer program is executed by a computer, the computer can implement the processes related to the foregoing method embodiments.
  • Embodiments of the present application further provide a chip or a chip system (or circuit). The chip may include a processor, and the processor may be configured to call a program or instructions in a memory to execute the processes related to the network device and/or the terminal provided by the foregoing method embodiments.
  • the chip system may include components such as the chip, memory or transceiver.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Power Sources (AREA)

Abstract

The present application discloses a task allocation method and apparatus, and relates to the field of computer technology. The method can be applied to a computer system having multiple processor cores, and includes: acquiring respective first information of multiple cores of a processor, where the first information can be used to describe the service life of a core; and determining, according to the respective first information of the multiple cores, a first core for processing a first task from the multiple cores of the processor, where the first task is a task to be processed. With the above design, in a computer system having multiple processor cores, tasks can be scheduled according to the lifetime information of the processor cores, so wear leveling among the multiple processor cores can be achieved as far as possible, which helps guarantee the service life of the device and ensures that the cores can work at a relatively high frequency for a longer time.

Description

一种任务分配方法及装置
本申请要求在2021年3月9日提交中国专利局、申请号为202110254699.4、申请名称为“一种基于负载动态适配的高性能CPU软硬件设计方法及装置”的中国专利申请的优先权,以及于2021年4月15日提交中国专利局、申请号为202110408180.7、申请名称为“一种任务分配方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别涉及一种任务分配方法及装置。
背景技术
随着个人计算机(personal computer,PC)、大数据、分布式存储和云计算等计算系统的飞速发展,对于中央处理器(central processing unit,CPU)以及包括图形处理器(graphics processing unit,GPU)和嵌入式神经网络处理器(neural-network processing units,NPU)等的异构计算的xPU来说,越来越需要处理器运行到更高的频率以获取更好的处理性能。例如,目前的多核处理器的任务分配方式会尽可能将保证处理器各个核工作在更高的频率上以确保处理性能。然而,当频率上升到一定阈值时,需要更高的电压来支持处理器的核工作在该频率。高电压会加速处理器核中晶体管电路的老化,降低处理器核的寿命。如果核一直工作在最高频率,寿命会断崖式下跌,缩短到几个月甚至几天。寿命耗尽或不足的核,其工作的频率会大大降低。
因此,目前需要改善任务分配方式以提高处理器核的使用寿命,以确保核能够在更长时间工作在较高的频率。
发明内容
本申请实施例提供一种任务分配方法及装置,以提高处理器核的使用寿命。
第一方面,本申请实施例提供了一种任务分配方法,该方法可以应用于具有多个处理器核心的计算机系统以提高计算机系统中核心的使用寿命。该方法可由任务分配置实施,任务分配装置可以是计算机系统,该计算机系统可以是该方法所应用的具有多个处理器核心的计算机系统,也可以是其他计算机系统。
以执行主体是任务分配装置实施为例,该方法包括:任务分配装置可获取处理器的多个核分别的第一信息,其中,第一信息用于描述核的使用寿命。任务分配装置还可根据多个核分别的第一信息,从多个核中确定用于处理第一任务的第一核,其中,第一任务为待处理任务。
通过上述设计,在具有多个处理器核心的计算机系统中,可以根据处理器核的寿命信息对任务进行调度,因此可以尽可能地实现多个处理器核心之间的磨损均衡,有助于保障器件的使用寿命,确保核能够在更长时间工作在较高的频率。
在一种可能的设计中,任务分配装置可根据多个核分别的第一信息和第一任务的任务量信息,确定第一核。
采用该设计可进一步提高任务分配的合理性。
在一种可能的设计中,如果处理器包括多个物理核组,任一物理核组包括多个核,且第一任务包括多个线程,则任务分配装置还可以根据物理核组中的核的第一信息,确定第一任务对应的第一物理核组,并根据第一物理核组中的核的第一信息,从第一物理核组的核中确定第一任务中每个线程对应的核。
采用该设计,可进一步均衡各个物理核组的核之间的使用寿命,以获得更好的寿命均衡效果。
在一种可能的设计中,任务分配装置还可存储第一核的更新的第一信息,更新的第一信息根据第一核的第一信息和第一任务的任务量信息确定。
采用该设计,可根据第一任务的任务量信息更新第一核的寿命信息,因此可以更加准确地获得核的寿命信息,提高后续任务分配的寿命均衡效果。
在一种可能的设计中,处理器的第k个核的第一信息包括第k个核的剩余寿命信息;和/或,第一信息包括第k个核的已用寿命信息。可选的,第k个核的剩余寿命信息可根据第k个核的总寿命信息和已用寿命信息确定。
在一种可能的设计中,第k个核的寿命信息可根据该第k核的历史运行时间、第k个核的频率和历史运行时间、第k核的电压和历史运行时间、第k个核在处理器上的位置、第k核的访存总时间、第k个核的历史指令数和指令的平均运行时间,或者第k个核的频率和访存总时间中的至少一个信息确定。
采用该设计,可提高核的寿命信息的确定精度,以提高寿命均衡效果。
第二方面,本申请实施例提供了一种任务分配方法,该方法可以应用于具有多个处理器的计算机系统,以提高计算机系统中核的使用寿命。该方法可由任务分配置实施,任务分配装置可以是计算机系统,该计算机系统可以是该方法所应用的具有多个处理器核心的计算机系统,也可以是其他计算机系统。
以执行主体是任务分配装置为例,该方法包括:任务分配装置可获取多个处理器分别的第三信息,第三信息用于描述处理器的使用寿命。任务分配装置还可根据多个处理器分别的第一信息,从多个处理器中确定用于处理第二任务的第一处理器,第二任务为待处理任务。
通过上述设计,在具有多个处理器的计算机系统中,可以根据处理器的寿命信息对任务进行调度,因此可以尽可能地实现多个处理器之间的磨损均衡,有助于保障器件的使用寿命,确保处理器能够在更长时间工作在较高的频率。
在一种可能的设计中,任务分配装置可根据所述多个处理器分别的第三信息和所述第二任务的任务量信息,确定所述第一处理器,以进一步提高任务分配的合理性。
第三方面,本申请实施例提供了一种任务分配装置,可以应用于具有多个处理器核心的计算机系统,该任务分配装置具体可以实现上述第一方面或第二方面中任务分配方法行为的功能。该任务分配装置可以为计算机系统中的硬件或软件单元,可以包括至少一个模块,该至少一个模块用于实现上述第一方面或第二方面及其任一可能的设计所述的任务分配的方法。
第四方面,本申请实施例提供了一种任务分配装置,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合:所述至少一个处理器,用于执行所述至少一个存储器中存储的计算机程序或指令,以使得所述装置执行上述第一方面或第二方面及其任一可能的设计中的方法。可选地,该装置还包括通信接口,处理器与通信接口耦合。该通信接口可以是收 发器或输入/输出接口;当该装置为网络设备中包含的芯片时,该通信接口可以是芯片的输入/输出接口。可选地,收发器可以为收发电路,输入/输出接口可以是输入/输出电路。
第五方面,本申请实施例提供一种计算设备,包括处理器和存储器,所述处理器包括多个处理器核心;所述存储器,用于存储计算机程序或指令;所述处理器,用于执行计算机程序或指令,实现上述第一方面或第二方面及其任一可能的设计所述的任务分配方法。
第六方面,本申请实施例提供一种计算机系统,该计算机系统可以包括记录模块、任务分配模块和多个处理器核心,所述任务分配模块可用于根据从记录模块获取的多个处理器核分别的第一信息,实现上述第一方面及其任一可能的设计所述的方法。记录模块可包括可编辑存储器,用于存储多个处理器核分别的第一信息,或用于从可编辑存储器读取多个处理器核分别的第一信息。
第七方面,本申请实施例提供一种计算机系统,该计算机系统可以包括记录模块、任务分配模块和多个处理器,所述任务分配模块可用于根据从记录模块获取的多个处理器核心的第一信息,实现上述第二方面及其任一可能的设计所述的方法。记录模块可包括可编辑存储器,用于存储多个处理器分别的第三信息,或用于从可编辑存储器读取多个处理器分别的第三信息。
第八方面,本申请实施例提供了一种可读存储介质,用于存储指令,当所述指令被执行时,使上述第一方面或第二方面及其任一可能的设计所述的方法被实现。
第九方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面及其任一可能的设计所述的方法。
第十方面,本申请实施例提供一种芯片系统,包括:处理器,所述处理器与存储器耦合,所述存储器用于存储程序或指令,该芯片系统还可包括接口电路,所述接口电路用于接收程序或指令并传输至处理器;当所述程序或指令被所述处理器执行时,使得该芯片系统实现上述第一方面或第二方面及其任一可能的设计中的方法。
可选地,该芯片系统中的处理器可以为一个或多个。该处理器可以通过硬件实现也可以通过软件实现。当通过硬件实现时,该处理器可以是逻辑电路、集成电路等。当通过软件实现时,该处理器可以是一个通用处理器,通过读取存储器中存储的软件代码来实现。
可选地,该芯片系统中的存储器也可以为一个或多个。该存储器可以与处理器集成在一起,也可以和处理器分离设置,本申请并不限定。示例性的,存储器可以是非瞬时性处理器,例如只读存储器(read-only memory,ROM),其可以与处理器集成在同一块芯片上,也可以分别设置在不同的芯片上,本申请对存储器的类型,以及存储器与处理器的设置方式不作具体限定。
本申请在上述各方面提供的实现的基础上,还可以进行进一步组合以提供更多实现。
以上第三方面至第十方面及其任一可能的设计的有益效果可参见第一方面或第二方面及其任一可能的设计中的有益效果。
附图说明
图1为Linux操作系统下的EAS技术示意图;
图2为本申请实施例提供的一种任务分配方法的流程示意图;
图3为本申请实施例提供的一种线程分配过程的示意图;
图4为本申请实施例提供的一种核在CPU上的位置示意图;
图5为本申请实施例提供的另一种核在CPU上的位置示意图;
图6为本申请实施例提供的一种任务分配方法的实现方式示意图;
图7为本申请实施例提供的另一种任务分配方法的实现方式示意图;
图8为本申请实施例提供的一种CPU和GPU异构系统的架构示意图;
图9为本申请实施例提供的另一种任务分配方法的实现方式示意图;
图10为本申请实施例提供的一种任务分配装置的结构示意图;
图11为本申请实施例提供的另一种任务分配装置的结构示意图。
具体实施方式
为了解决背景技术中提及的问题,本申请实施例提供了一种任务分配方法及装置,有助于提高多核处理器中核的使用寿命。其中,方法和装置是基于同一技术构思的,由于方法及装置解决问题的原理相似,因此装置与方法的实施可以相互参见,重复之处不再赘述。
以下,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解。
计算机系统,由硬件(子)系统和软件(子)系统组成。其中,硬件(子)系统包括由电、磁、光、机械等原理构成的各种物理部件(如处理器等)的有机组合,是系统赖以工作的实体;软件(子)系统包括各种程序和文件,用于指挥全系统按指定的要求进行工作。随着计算机技术的发展,现代计算机系统小到微型计算机和个人计算机,大到巨型计算机及其网络,形态、特性多种多样,已广泛用于科学计算、事务处理和过程控制,日益深入社会各个领域,对社会的进步产生深刻影响。
在一种实现方式中,本申请实施例中的计算机系统,可以为终端装置内的计算机系统,是一种向用户提供业务服务、具有语音或数据连通等功能的装置。终端装置又可以称为终端设备,还可以称为用户设备(user equipment,UE)、移动台(mobile station,MS)、移动终端(mobile terminal,MT)等,终端装置也可以为一种芯片。在本申请后续实施例和描述中,以终端设备为例进行具体描述。
例如,终端设备可以为具有无线连接功能的手持式设备、车载设备等。目前,一些终端设备的举例为:手机(mobile phone)、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、智能销售终端(point of sale,POS)、可穿戴设备,虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程手术(remote medical surgery)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、各类智能仪表(智能水表、智能电表、智能燃气表)等。
在另一种实现方式中,本申请实施例中的计算机系统,可以是服务器,是提供数据连通服务的设备。由于服务器可以响应终端设备的服务请求,并进行处理,因此一般来说服务器应具备承担服务并且保障服务的能力。在本申请中,所述服务器可以为位于数据网络(data network,DN)中的服务器,例如普通服务器,云平台中的服务器;或者为位于核心网内的 多接入边缘计算(multi-access edge computing,MEC)服务器等。
本申请实施例中的计算机系统还可以是处理器、芯片或芯片系统。
(2)操作系统(operating system,OS),是运行在计算机系统上的最基本的系统软件,例如windows系统、Android系统、IOS系统、windows server系统、Netware系统、Unix系统、Linux系统。本领域技术人员可以理解,其它操作系统中,也可以采用类似的算法实现,本申请对此不做限定。
(3)内核(kernel),是基于硬件的第一层软件扩充,提供操作系统的最基本的功能。例如,可负责管理系统的进程、内存、驱动程序、文件和网络系统,决定着系统的性能和稳定性。
(4)处理器的核(core),或称核心,即处理器中的核心芯片。处理器的核数是指一个处理器由多少个核组成。核数越多,代表这个处理器的运转速度越快,性能越好。如果处理器的核数大于或等于2,则该处理器可称为多核处理器。
(5)同构处理器,或称同构多核处理器。同构多核处理器的每个处理器核心的结构完全相同,同时地位也是等同的。同构多核处理器中,可以由不同核心共享相同的代码,也可以由不同核心分别执行不同的代码。以CPU同构处理器为例,该处理器中的多个核均为CPU核,或者说该处理器包括的运算模块均为称CPU运算模块。
(6)异构处理器,或称异构多核处理器。异构多核处理器的不同核心可采用功能不同的核。异构多核处理器常用于特殊应用,如信号处理等。异构多核处理器中,一般是某些核用于管理和调度,另外一些核用于特定的性能加速,处理器核之间通过共享总线、交叉开关互联和片上网络。以CPU和xPU异构处理器为例,该处理器中可包括至少一个CPU核和至少一个xPU(如GPU或NPU等)核,或者说该处理器包括的运算模块至少包括CPU运算模块和一个xPU(如GPU或NPU等)运算模块,其中,CPU核可由于实现管理和调度。
(7)进程(process)和线程(tread)。进程是资源分配的最小单位,线程或称逻辑线程,是程序执行的最小单位,一个线程可包括一个或多个指令,因此每个线程的处理运行时间可能不同。也就是说,资源分配给进程,同一进程内的所有线程共享该进程的所有资源。其中,一个线程只能属于一个进程,而一个进程可以有多个线程,但至少有一个线程。一个线程是进程中一个单一顺序的控制流,一个进程中可以并发多个线程,每条线程可并行执行不同的任务。对于多核处理器,不同的核可用于执行不同的线程,因此能够实现任务的并行。本申请中,线程可理解为处理器在进行特定的数据处理中最小的流水线单元。应理解,一个核可对应于一个或多个流水线以便实现多任务并行处理。
(8)任务(task),指由软件完成的一个活动。一个应用程序可能包含一个或多个任务。一个任务既可以是一个进程,也可以是一个线程,或者可以包括多个进程和/或多个线程。本申请中,除特殊说明外,任务分配(或调度)是指将任务包括的进程和/或线程进行分配,线程分配是指将线程分配到处理器流水线,使得线程通过流水线被处理。例如,任务可以是读取数据并将数据放入内存中。这个任务可以作为一个进程来实现,也可以作为一个线程(或作为一个中断任务)来实现。
下面以线程分配为例,说明目前的多核处理器中任务分配的方式。
在系统的软硬件实现上,现有技术主要是通过操作系统(operation system,OS)层的软件调度方法实现线程分配。以CPU同构处理器下的Linux操作系统为例,Linux内核的完全公平调度(complete fair schedule,CFS)调用器和负载均衡主要是为了服务器性能优 先而考虑,因此将多个线程平均分配到系统所有可用的CPU核上,最大限度地提高系统的吞吐量,即尽可能地让CPU的核工作在更高频率。
然而,更高的CPU频率会造成CPU核的使用寿命的缩短。原因在于,在CPU中的场效应晶体管(field effect transistor,FET)需要完全放电才能确保信号完整性,当频率变得更高的时候,需要更小的释放时间,因此只有通过提高电压来缩短放电时间。而更高的电压会加速晶体管老化,缩短CPU核的寿命。另外,当CPU持续采用高电压时,芯片的温度会急剧提升,也会导致CPU核的使用寿命的缩短。因此,即便任务分配的过程采用平均分配的方式执行,由于每个线程的处理运行时间可能不同,仍然可能导致不同核的损耗程度不同,即导致核的剩余寿命不同。如果核的寿命耗尽或者寿命较低,该核的工作频率会大大降低,造成处理器性能降低。
如图1所示,Linux内核5.3采用了能耗感知调度(energy aware scheduler,EAS)技术,EAS技术能够充分利用核的功耗、性能和频率差异来达到性能和功耗的最优平衡。例如EAS中的Linux调度器(在图1中表示为调度器)可执行CFS,Linux CPU空闲(idle)机制(在图1中表示为空闲机制)能够决定CPU合适进入空闲模式以降低功耗,Linux CPU变频机制(在图1中表示为变频机制)能决定何时增加或降低CPU频率,能量模型可用于平衡Linux调度器、Linux CPU空闲机制和Linux CPU变频机制之间的能量消耗。然而该方案对于寿命的延长效果有限,并且,该方案主要适用于核数少且核间差异大的情况,比如适宜应用于移动终端中,而在对称多处理器结构(symmetric multi-processor,SMP)等场景的应用中收益甚微,因此适用场景有限。
因此,有必要研究能够有效提高处理器的使用寿命的方案。
本申请实施例提供一种任务分配方法及装置,可根据处理器多个核的寿命信息将待处理的任务分配到处理器的第一核上,由第一核处理该任务。由于任务是根据核的寿命信息分配的,因此可对处理器的多个核的使用寿命进行均衡,使得各个核能够工作在较高的频率。
应理解,该任务分配方法可由任务分配装置执行,任务分配装置可以是计算机系统。该装置可通过软件与硬件结合的方式实现或通过硬件方式实现。例如,该任务分配方法可通过操作系统执行软件的方式实现,或者可通过芯片执行固化在芯片中的软件等方式实现。具体来说,该任务分配置可由固件(BIOS)和/或操作系统的任务分配层(如调度器)实现,或者,可通过CPU等处理器(或处理器芯片)实现,本申请不进行具体限定。
下面结合附图介绍本申请实施例技术实现的具体实施例。应理解,本申请提供的任务分配方法可应用于同构处理器或异构处理器。其中,同构处理器例如CPU同构处理器,异构处理器例如CPU和xPU的异构处理器。
如图2所示,该方法可包括以下步骤:
S101:任务分配装置获取处理器的多个核分别的寿命信息,寿命信息可用于描述核的使用寿命。应理解,寿命信息在本申请的文字描述和附图中可称为第一信息。
在本申请中,寿命信息或称使用寿命信息。本申请中,寿命信息可用于确定核的剩余使用寿命和/或已用使用寿命,也就是说,寿命信息可包括剩余寿命信息和/或已用寿命信息。其中,总寿命信息可用于描述核最大可支持进行的工作时长。已用寿命信息可根据核历史任务运行时间或历史任务工作量等信息有关,可用于描述核已消耗的时长。剩余寿命信息可根据总寿命信息和已用寿命信息确定,描述核剩余的可用时长。应理解,后续本申请中除特殊说明,寿命信息可以是剩余寿命信息或者或已用寿命信息中的任意一项或多项。
本申请中,寿命信息可以是用于指示使用寿命的时间信息,或者,为了方便计算,寿命信息可以是根据时间信息通过归一化等方式量化的数值。
这里举例一种寿命信息的归一化方式。以总使用寿命为5年为例,即总使用寿命对应的时间信息为5年,可将5年归一化为100,即总寿命信息量化的数值为100,则寿命信息可以是5年,也可以是100。其中,由于设计和工艺的偏差,不同的核的总寿命信息对应的时长可能不同,一般在5年(或其他时间信息)的上下,可能有5%左右的偏差量,可认为核的总寿命信息的大小成高斯分布,归一化后的数值可能在100上下有一定浮动。按照相似的归一化方式,还可将该核的剩余使用寿命和/或已使用寿命归一化为数值,分别获得剩余寿命信息和/或已用寿命信息。
S102:任务分配装置根据多个核分别的寿命信息,从多个核中确定用于处理第一任务的第一核,该第一核是处理器的多个核中的一个。
例如,寿命信息包括剩余寿命信息,任务分配装置可选择剩余寿命信息最大的核作为第一核,和/或,寿命信息包括已用寿命信息,任务分配装置可选择已用寿命信息最小的核作为第一核。或者,可以在剩余寿命信息大于或等于第一阈值和/或已用寿命信息小于或等于第二阈值的核中,选择第一核,此时的选择方式可以是包括随机选择等方式。或者,可以按照各个核的寿命信息对核进行排序,如,根据寿命信息确定各个核的权重,在任务分配时根据各个核的权重进行分配。其中,核的权重可表示核被选为第一核的可能性,权重越高则对应的核更可能被确定为第一核。核的权重可与核的剩余寿命信息成正相关,和/或,与核的已用寿命信息成负相关。
采用图2所示流程,可根据处理器多个核的寿命信息将待处理的任务分配到处理器的第一核上,使得各个核的使用寿命得到均衡,以尽可能地延长处理器核的寿命,使得核能够更长时间地工作在较高频率。
示例性的,S102中确定第一核的过程即任务分配的过程,其可视为确定任务所包括的线程与第一核之间映射关系的过程。其中,如果处理器包括多个物理核组(physical core cluster),任意一个物理核组中包括一个或多个核,则任务分配装置可根据物理核组分别的第二信息,从多个物理核组中确定第一任务对应的第一物理核组,第一物理核组的核包括第一核。应理解,物理核组的核之间可能共享缓存。
其中,第二信息可以是物理核组的寿命信息,比如,第二信息包括物理核组的剩余寿命信息和/或已用寿命信息,第二信息也可以是物理核组中核的平均寿命信息和/或物理核组中剩余寿命最少的核的寿命信息等。其中,物理核组的剩余寿命信息可以是物理核组包括的核的剩余寿命信息的总和,也可以是用于衡量物理核组的剩余寿命的其他的参数或指标。物理核组的已用寿命信息可以是物理核组包括的核的已用寿命信息的总和,也可以是用于衡量物理核组的已用寿命的其他的参数或指标。物理核组中核的平均寿命信息可根据物理核组中每个核的剩余寿命信息或已用寿命信息确定的,物理核组中剩余寿命最少的核的寿命信息可以是该剩余寿命信息最少的核的剩余寿命信息和/或已用寿命信息。
可选的,进一步的,可根据第一物理核组中的核的寿命信息,确定第一任务的每个线程所分别对应的核。应理解,根据第二信息确定任务对应的物理核组时,例如可将剩余寿命信息较大、已用寿命信息较小或组中剩余寿命最少的核的寿命信息较大的物理核组作为第一物理核组,以进一步均衡核的寿命信息,从而进一步提高核的使用寿命。
下面以图3为例进行说明,任务1和任务N分别包括3个线程,记为线程1至线程3。
可选的,在任务1和任务N的分配过程中,任务分配装置可确定任务与CPU的物理核组之间的映射关系。其中,物理核组中可包括一个或多个核,图3中以每个物理核组中的核的数量是3为例进行说明。比如图3所示,任务分配装置可确定任务1对应于物理核组1,因此将任务1的线程1至线程3分配给物理核组1中的核(该分配关系在图3中通过没有剪头的连接线表示),同理,任务分配装置可确定任务N对应于物理核组N,并将任务N的线程1至线程3分配给物理核组N中的核。
可选的,第一物理核组包括至少一个第二核。例如,在任务分配的过程中,任务分配装置可根据物理核组的第二信息从包括至少一个第二核的物理核组中,选择与任务对应的物理核组。其中,第二核可以是剩余寿命信息大于或等于第三阈值和/或已用寿命信息小于或等于第四阈值的核,比如,某个物理核组中的核的剩余寿命信息均小于第三阈值,比如5,表示该物理核组中的全部核的寿命即将耗尽,可将任务分配给其他物理核组中的核。以图3为例,物理核组1中的第二核可包括核2和核3,物理核组N中的第二核可包括核2和核3。
本申请中,任务分配装置可根据物理核组的权重选择任务对应的物理核组。其中,物理核组的权重与物理核组中的核(或第二核)的剩余寿命信息成正相关,和/或,与物理核组中的核(或第二核)的已用寿命信息成反相关。也就是说,物理核组中的核(或第二核)的剩余寿命信息越大,物理核组的权重越大,则物理核组越可能被确定为任务对应的物理核组,和/或,物理核组中的核(或第二核)的已用寿命信息越小,物理核组的权重越大,则物理核组越可能被确定为任务对应的物理核组,以尽可能选择剩余寿命长的物理核组进行任务分配,提高处理器性能,延长处理器寿命。
或者,也可以在每次确定任务对应的物理核组时,将核(或第二核)的剩余寿命信息最大的物理核组作为任务对应的物理核组,或者,将核(或第二核)的已用寿命信息最小的物理核组作为任务对应的物理核组,或者,将核(或第二核)的剩余寿命信息大于或等于第五阈值的一个物理核组作为第一任务对应的物理核组,或者,将核(或第二核)的已用寿命信息小于或等于第六阈值的一个物理核组作为任务对应的物理核组。
进一步,任务分配装置可根据物理核组中核的寿命信息,将任务包括的线程分配给任务对应的物理核组中的核,形成核与线程之间的映射关系。
可选的,如图3所示,任务1和任务N包括的待分配的线程总数可以与参与分配的物理核组1和物理核组N包括的核的总数相同。
在一种可能的实现方式中,在任务分配装置将线程分配给物理核组中的核时,可根据核的权重进行分配。以图3所示的任务1中的线程分配的过程为例进行说明,物理核组1中的核1的剩余寿命信息为1,核2和剩余寿命信息为10,核3的剩余寿命信息为15,则物理核组1中的核3的权重高于核2的权重且核2的权重高于核1的权重,在分配任务1的线程1至线程3时,可优先将线程分配至核3,次优先分配给核2,而尽可能不分配给核1。比如图3所示,可将线程2和线程3分配给核3,以及将线程1分配给核2,即核1不需要分配任务1的线程。
下面通过举例的方式对本申请实施例提供的核的寿命信息的确定方式进行说明。
在一种可能的实现方式中,如果处理器包括K个核,则第k个核的寿命信息可根据该第k核的历史运行时间、第k个核的频率和历史运行时间、第k核的电压和历史运行时间、第k个核在处理器上的位置、第k核的访存总时间、第k个核的历史指令数和指令的平均运行时间,或者第k个核的频率(或电压或核在处理器上的位置)和访存总时间中的至少一个信 息确定。其中,1≤k≤K,k和K均为正整数。以下进行具体说明:
(1)第k核的历史运行时间
核的历史运行时间可指示核的历史工作时长,比如,可以是核自第一次运行以来的总工作时长。应理解,核的历史运行时间越大则核的已用寿命信息越大和/或剩余寿命信息越小。
本申请中,运行时长可以是指核执行指令消耗的时钟周期,比如,可在核开始执行第一个任务的同时启动计数器对核执行指令消耗的时钟周期进行计数,通过该计数器的计数和时钟周期的长度确定历史运行时间。
(2)第k个核的历史指令数和指令的平均运行时间
第k个核的历史指令数和指令的平均运行时间可用于确定第k个核的历史运行时间,因此,可根据第k个核的历史指令数和指令的平均运行时间确定第k个核的寿命信息。其中,历史指令数是指核的已经处理的全部历史任务所包括的指令总数。比如,第k个核的历史指令数可以是在分配该第一任务之前,第k个核执行的全部历史任务包含的指令总数。指令的平均运行时间是指执行一个指令的平均运行时间,可根据多个指令的运行时间和多个指令的指令数量确定。
(3)第k核的频率和历史运行时间
其中,第k核的频率可以是该核在一段时间(如历史运行时间)或完成一定任务(如全部历史任务)的过程中运行的平均功率。可以理解的是,在历史运行时间相同的情况下,核的频率越大则该核的损耗越大,即已用寿命信息越小和/或剩余寿命信息越大,因此,可通过核的频率和历史运行时间确定核的寿命信息。
(4)第k核的电压和历史运行时间
第k核的电压可以是该核在一段时间(如历史运行时间)或完成一定任务(如全部历史任务)的过程中运行的平均电压。可以理解的是,在历史运行时间相同的情况下,核的电压越大则该核的损耗越大,即已用寿命信息越小和/或剩余寿命信息越大,因此,可通过核的电压和历史运行时间确定核的寿命信息。
(5)第k核在处理器上的位置
核在处理器上的位置同样可能影响核的寿命信息,这里的位置是指核在芯片上的物理位置,和/或核之间的相对位置。其中,处于不同的位置的处理器核在运行过程中,由于芯片电路布局和电源网络设计的因素,热密度不同,热密度与核在处理器上的位置相关。一般情况下,越靠近芯片中心位置的热密度越高,或者,越靠近物理核组密集的区域中心位置的热密度最高,在相同的频率和/或历史运行时长的条件下,热密度越高的核的结温(junction temperature)越高即核的已用寿命信息越小和/或剩余寿命信息越大。其中,结温是指半导体晶体管的温度,在相同运行时长下,核在节温越高的地方寿命下降越快。
如图4所示,在采用核阵列布局时,Tj1、Tj2和Tj3分别表示图4所示的位置1、位置2和位置3分别的结温,其中,Tj1、Tj2和Tj3分别距离芯片的中心位置的距离依序递增,也就是说,Tj1距离芯片的中心位置的距离小于Tj2距离核阵列中心位置的距离,且Tj2距离芯片的中心位置的距离小于Tj3距离芯片的中心位置的距离,则Tj1、Tj2和Tj3之间满足有如下关系:Tj1>Tj2>Tj3。又如图5所示,在采用多个物理核组的布局时,Tj、Tj2和Tj3分别表示图5所示的位置1、位置2和位置3分别的结温,其中,Tj1、Tj2和Tj3分别距离物理核组的中心位置的距离依序递增,也就是说,Tj1距离最近的物理核组的中心位置的距离小于Tj2距离最近的物理核组的中心位置的距离,且Tj2距离最近的物理核组的中 心位置的距离小于Tj3距离最近的物理核组的中心位置的距离,则Tj1、Tj2和Tj3之间满足有如下关系:Tj1>Tj2>Tj3。
可选的,可根据核的结温确定核的剩余寿命信息和/或已用寿命信息,例如,历史运行时长相同的情况下,节温越大则已用寿命信息越大,和/或,剩余寿命信息越小。
应理解,从空间维度,可以通过芯片节温将芯片空间分为多个区域,在不同的区域放置有限个节温探测器(temperature sensor),在处理器运行过程中,不同节温探测器获得的结温可以实时反馈至任务分配装置,可供确定各个节温探测器所在区域的结温。
(6)第k个核的访存总时间
核的访存时间即该核的存储访问时间,也可以称之为存取时间,是指核从启动一次存储器操作,到完成该操作所经历的时间,存储器操作包括访问存储器,例如读取存储器中的数据。访存时间与处理器的硬件参数相关,可以理解为对于同一个核来说访存时间不变。
核的访存总时间是指核处理至少一个历史任务中的访存时间的总和。可以理解,核的访存总时间越长则说明核处理的任务需要的访存次数越多。可选的,可根据核的访存总时间确定核的剩余寿命信息和/或已用寿命信息,核的访存总时间越大则核的已用寿命信息越大和/或剩余寿命信息越小。
可选的,核的访存总时间也可替换为核的被隐藏的访存总时间或核的未被隐藏的访存总时间。
核的被隐藏的访存总时间是核处理至少一个任务中的被隐藏的访存时间的总和。被隐藏的访存时间是指核在执行存储器操作的期间,执行了存储器操作以外的其他操作,使得核在执行该存储器操作期间(该期间即访存时间,或访存时间内的部分时间段内)内没有空转,则这个存储器操作对应的访存时间中的执行了其他操作的时间被称为被隐藏的访存时间。任务分配装置可倾向于将任务分配给被隐藏的访存总时间较小的核。
未被隐藏的访存总时间是核处理至少一个任务中的未被隐藏的访存时间的总和。未被隐藏的访存时间则是指在访存时间内,未执行存储器操作以外的其他操作的时间。任务分配装置可倾向于将任务分配给未被隐藏的访存总时间较小的核。
应理解,任务分配装置可根据以上第k个核的频率、第k个核的电压、第k个核在CPU上的位置、第k个核的历史运行时间,或者第k个核的访存总时间中的任意一个信息确定第k个核的已用寿命信息,还可根据第k个核的频率、第k个核的电压、第k个核在CPU上的位置、第k个核的历史运行时间,或者第k个核的访存总时间中的多个信息确定第k个核的已用寿命信息。比如,对第k个核的频率、第k个核的电压、第k个核在CPU上的位置、第k个核的历史运行时间,或者第k个核的访存总时间中的多个信息分别设定权重,根据多个信息和多个信息对应的权重确定第k个核的已用寿命信息。
(7)第k个核的频率(或电压或核在处理器上的位置)和访存总时间
如前述,核的访存总时间越长则说明核处理的任务需要的访存次数越多,也就是核的已用寿命信息越大和/或剩余寿命信息越小。如果访存总时间相同,可进一步结合核的频率、电压或核在处理器上的位置确定核的寿命信息,具体方式可参照根据历史运行时间和核的频率、电压或核在处理器上的位置中的一个确定寿命信息的方式。举例来说,在两个核的历史运行时间相同的情况下,频率或电压更大的核,或者越靠近处理器芯片的中心位置或物理核组密集的区域中心位置的核的已用寿命信息越大和/或剩余寿命信息越小。
应理解,以上所示确定寿命信息的方式仅仅是举例说明,在实际使用中还可在以上举例 的方式的基础上进行排列组合等扩展方式确定第k个核的寿命信息。比如,实际使用时还可根据核的历史运行时间、频率、电压和核在处理器上的位置确定寿命信息。
在另一种可能的实现方式中,第k个核的已用寿命信息可根据第k个核的历史指令数、指令的平均运行时间、第k个核的频率、第k个核的访存时间,和第k个核的至少一个历史任务(如全部历史任务)中未被隐藏的访存时间与该至少一个历史任务的总运行时间之间的比例确定。
其中,第k个核的历史指令数、指令的平均运行时间、第k个核的频率、第k个核的访存时间可参见前述说明。第k个核的至少一个历史任务中未被隐藏的访存时间可以是第k个核执行至少一个历史任务过程中的访存时间,该至少一个历史任务的总运行时间可以是第k个核执行该至少一个历史任务消耗的时钟周期。
示例性的，在根据第k个核的历史指令数、指令的平均运行时间、第k个核的频率、第k个核的访存时间，和第k个核的至少一个历史任务中未被隐藏的访存时间与该至少一个历史任务的总运行时间之间的比例确定第k个核的已用寿命信息时，第k个核的已用寿命信息可符合以下公式1：
T_comp = f(F, #inst_k, CPI_k, T_mem,k, ρ_k,m)    （公式1，原文以图片形式给出）
其中，T_comp表示第k个核的已用寿命信息，F表示第k个核的频率，#inst_k表示第k个核在执行第m个历史任务中执行的指令数，其中，m=1、2……M，M为第k个核执行的全部历史任务的数量。CPI_k表示第k个核执行指令的平均运行时间。可选的，对于同一个核来说，CPI_k可以不变。T_mem,k表示第k个核的访存时间。ρ_k,m表示执行第m个任务中，第k个核未被隐藏的访存时间占第k个核执行该第m个任务的运行时间的比例，或者，表示执行第1至m个任务的过程中，第k个核未被隐藏的访存时间占第k个核执行该第1至m个任务的运行时间的比例。
在另一种可能的实现方式中,核的剩余寿命信息可根据总寿命信息和已用寿命信息确定。
例如，可将第k个核的总寿命信息表示为T_core_total，以及将第k个核的剩余寿命信息表示为T_core，则T_core符合以下公式2：
T_core = T_core_total − T_comp。    （公式2）
可选的,在根据第k个核的频率、第k个核的电压、第k个核的历史运行时间,或者第k个核的访存总时间中的至少一个,或根据第k个核的历史指令数、指令的平均运行时间、第k个核的频率、第k个核的访存时间,和第k个核的至少一个历史任务中未被隐藏的访存时间与历史任务的总运行时间之间的比例,确定第k个核的已用寿命信息后,还可根据第k个核在CPU上的位置对第k个核的已用寿命信息进行修正。
可选的,可根据第k个核在处理器上的位置确定第k个核的结温Tj,则修正前后第k个核的已用寿命信息与Tj之间可以符合公式3:
T_comp′ = T_comp + a*Tj。    （公式3）
其中，T_comp′表示第k个核的修正后的已用寿命信息，T_comp表示第k个核的修正前的已用寿命信息。a为校正系数，可以是设定值。
或者，可在根据公式2确定第k个核的剩余寿命信息后，根据第k个核在CPU上的位置对第k个核的剩余寿命信息进行修正，修正前后第k个核的剩余寿命信息与Tj之间可以符合公式4：
T_core′ = T_core − a*Tj。    （公式4）
其中，T_core′表示第k个核的修正后的剩余寿命信息，T_core表示第k个核的修正前的剩余寿命信息。a为校正系数，可以是设定值。
可选的,在S102中,可根据多个核分别的寿命信息和第一任务的任务量信息,确定第一核。可选的,第一任务的任务量信息可指示第一任务包含的指令数,该指令数可用于确定执行第一任务的运行时间。比如,可根据任一核执行一个指令的平均运行时间和第一任务包含的指令数,确定该核执行第一任务的运行时间。另外,第一任务的任务量信息也可指示第一任务的运行时间。
这里以第一任务是一个线程为例进行说明。在S102中,还可根据该线程的任务量信息和处理器多个核分别的寿命信息,确定第一核。其中,该线程的任务量信息可用于确定执行该线程的运行时间。以寿命信息为剩余寿命的时间信息或已用寿命的时间信息为例,可根据线程的任务量信息确定该线程的运行时间,并从剩余寿命对应的时间信息与执行该线程的运行时间的差大于或等于第七阈值的核,和/或已用寿命信息与该线程的任务量信息的和小于或等于第八阈值的核中,选择第一核。此时选择第一核的选择方式可以是随机选择或根据剩余寿命信息和/或已使用寿命信息选择等。另外,可选的,可限定选择的第一核的剩余寿命信息不小于该线程的任务量信息,以避免该核在执行该线程后寿命耗尽。
应理解,本申请中涉及的第一阈值、第二阈值、第三阈值、第四阈值、第五阈值、第六阈值、第七阈值或第八阈值中的至少一个,可根据处理器除第一核以外的其他核的平均寿命信息确定,或根据处理器的全部核的平均寿命信息确定。比如,可根据其他核或全部核的平均寿命信息向上或向下浮动一定比例,确定第一阈值、第二阈值、第三阈值、第四阈值、第五阈值、第六阈值、第七阈值或第八阈值中的至少一个。
可选的,在第一核执行该线程后,可根据该线程的任务量信息更新并存储第一核的剩余寿命信息和/或已用寿命信息,用于后续的任务分配中作为该核的寿命信息。其中,可以根据设定时长执行核的寿命信息的更新,或者,在核每执行完一个或多个任务后更新核的寿命信息。
其中,更新核的已用寿命信息的方式可参照前述确定核的已用寿命信息的方式说明。举例来说,处理器根据公式1确定在执行第一任务之前的时刻1,K个核分别的已用寿命信息,并从K个核中确定第一核。处理器可根据第一核已经执行的任务(例如包括第一任务)信息,更新时刻2之前该第一核的已用寿命信息。
这里以时刻1之前，第一核完成的任务数量为M，且在时刻2之前，第一核完成的任务数量为M+1（即在时刻1和时刻2之间，第一核又完成了一个任务）为例进行说明，此时第一核更新的已用寿命信息T_comp′可符合公式5：
T_comp′ = f(F, #inst_k′, CPI_k, T_mem,k, ρ_k,m)，m=1、2……M+1    （公式5，原文以图片形式给出）
其中，F表示处理器的频率，#inst_k′表示第k个核在执行第m个任务中执行的指令数，其中，m=1、2……M+1。CPI_k表示第k个核执行一个指令的平均运行时间。T_mem,k表示第k个核的访存时间。ρ_k,m表示执行第m个任务中，第k个核未被隐藏的访存时间占第k个核执行该第m个任务的运行时间的比例。
另外,也可根据公式1以外的其他方式更新核的已用寿命信息,本申请中不进行具体限定。比如在上例中,还可通过计数器对核执行第一任务的时长进行统计,根据该时长和时刻1之前第一核的已用寿命信息,获得时刻2之前第一核的已用寿命信息。
此外,第k个核的更新的剩余寿命信息可根据第k个核的总寿命信息和第k个核更新的已用寿命信息确定。例如可参照公式2。
本申请实施例中,核的寿命信息可存储在BIOS中,例如,可在BIOS中新增寿命记录(core lifetime recorder)单元或简称为记录单元,用于存储核的寿命信息。在进行任务分配时,执行任务分配的模块或单元可从记录单元中获取核的寿命信息。还可将核的更新的寿命信息存储至记录单元。
下面结合软硬件逻辑结构图,介绍本申请实施例提供的任务分配方法的实现方式。
方式一、通过处理器的操作系统执行软件实现本申请实施例提供的任务分配方法。
以处理器的操作系统是Linux操作系统为例,可由Linux调度器执行软件(如执行计算机程序指令),使得Linux调度器根据核的寿命信息进行任务分配,也就是根据核的寿命信息确定核与任务/线程之间的映射关系。其中,Linux调度器可存储计算机程序指令,或从存储器中获取进行将程序指令,以实现该任务分配方法。
如图6所示,由Linux调度器(图6中表示为调度器)执行软件时,任务分配过程可包括:在应用程序运行后,应用程序的任务通过Linux操作系统(图6中表示为操作系统)的应用程序接口(application programming interface,API)进行资源分配,Linux操作系统的驱动将任务发送到Linux调度器,Linux调度器执行软件,实现以下步骤:从BIOS的记录单元获取核的寿命信息(或称,将记录单元中存储的核的寿命信息透传到Linux调度器),根据核的寿命信息确定任务包括的线程与核之间的映射关系,即实现任务分配。之后,由线程对应的核执行各个线程,之后在线程结束后更新线程对应的核的寿命信息,并将更新的寿命信息存储到BIOS的记录单元中,以便后续根据核的更新的寿命信息进行下一次的任务分配。
方式二、通过处理器实现本申请实施例提供的任务分配方法。
以处理器是CPU为例,可由CPU执行固化在CPU中的软件(如执行计算机程序指令),使得CPU根据核的寿命信息进行任务分配,也就是根据核的寿命信息确定核与任务包括的线程之间的映射关系。另外,也可由CPU从存储系统或CPU以外的存储器获取计算机程序指令,以实现该任务分配方法。
如图7所示,由CPU执行软件时,任务分配过程可包括:在应用程序运行后,应用程序的任务通过Linux操作系统(图7中表示为操作系统)的API进行资源分配,Linux操作系统的驱动将任务发送到Linux调度器(图7中表示为调度器),Linux调度器将任务发送到CPU,由CPU执行软件,实现以下步骤:通过CPU的硬件接口从BIOS的记录单元获取核的寿命信息(或称,将记录单元中存储的核的寿命信息透传到CPU),根据核的寿命信息确定线程与核之间的映射关系。由线程对应的核执行各个线程,在线程结束后,由CPU更新线程对应的核的寿命信息,并通过硬件接口将更新的寿命信息存储到BIOS的记录单元中,以便后续根据核的更新的寿命信息进行下一次的任务分配。
方式三、本申请实施例提供的任务分配方法在异构处理器中的应用。
如图8所示,以CPU和GPU组成的异构处理器为例,CPU可用于对GPU的任务进行调度。其中,CPU与GPU之间可通过快速外围组件互连(peripheral component interconnect  expres,PCIe)总线(bus)或其他方式连接,此时可由CPU的操作系统根据GPU的核的寿命信息对GPU的任务进行分配。可选的,PCIe总线可用于连接CPU的动态随机存取存储器(dynamic random access memory,DRAM)和GPU的DRAM。其中,CPU还可包括控制单元、运算逻辑单元(arithmetic and logic unit,ALU)和高速缓冲存储器(Cache)。
CPU的Linux操作系统可通过PCIe bus获取GPU核的寿命信息,并存储在CPU的BIOS的记录单元中。或者,CPU的Linux操作系统可通过PCIe bus获取GPU核的用于确定寿命信息的参数,比如,GPU的第k个核的总寿命信息、处理器的频率、第k个核的历史指令数、指令的平均任务量信息、第k个核的访存时间,或第k个核在执行至少一个任务中的未被隐藏的访存时间与至少一个任务的运行时间之间的比例中的至少一个信息,根据获取的参数确定GPU核的寿命信息,并存储在CPU BIOS的记录单元中。可选的,GPU核的寿命信息和/或用于确定GPU核的寿命信息的参数可存储在GPU的DRAM中。
当需要对GPU的任务进行分配时,仍以操作系统是Linux系统为例,可由CPU的Linux操作系统的Linux调度器或CPU根据BIOS的记录单元中存储的GPU核的寿命信息对GPU的任务进行分配,即确定GPU的任务包括的线程与GPU的核之间的映射关系。其中,由Linux调度器执行GPU任务分配的方式可参见前述方式一中的说明,由CPU执行GPU任务分配的方式可参见前述方式二中的说明,这里不再展开赘述。可选的,如果由CPU确定线程与GPU的核之间的映射关系,可由CPU通过总线将该映射关系通知给GPU。
此外,在GPU的线程对应的核执行线程后,可由Linux调度器或CPU更新GPU核的寿命信息,并将更新后的GPU核的寿命信息存储在记录单元中。
综上所述,本申请实施例提出了在同构处理器和异构处理器等场景下的多核管理软硬件系统设计,使得任务分配过程根据核的寿命信息进行,解决CPU核的负载均衡问题,以延长核的使用寿命。由于核的寿命延长,核能够工作在较高的频率,因此可以提高处理器的性能。
如图9所示,本申请实施例提供另一种任务分配方法,以任务分配装置执行该方法为例,该方法可包括以下步骤:
S201:任务分配装置获取多个处理器分别的第三信息,第三信息用于描述处理器的使用寿命。
其中,第三信息可以是处理器的寿命信息,比如,第三信息包括处理器的剩余寿命信息和/或已用寿命信息,第三信息也可以是处理器中核的平均寿命信息和/或处理器中剩余寿命最少的核的寿命信息等。其中,处理器的剩余寿命信息可以是处理器包括的核的剩余寿命信息的总和,也可以是用于衡量处理器的剩余寿命的其他的参数或指标。处理器的已用寿命信息可以是处理器包括的核的已用寿命信息的总和,也可以是用于衡量处理器的已用寿命的其他的参数或指标。处理器中核的平均寿命信息可根据处理器中每个核的剩余寿命信息或已用寿命信息确定的。处理器中剩余寿命最少的核的寿命信息可以是该剩余寿命信息最少的核的剩余寿命信息和/或已用寿命信息。
应理解,在图9所示的任务分配方法中,任务分配装置可以是多个处理器中的一个,也可以是多个处理器以外的计算机系统。
S202:任务分配装置根据多个处理器分别的第三信息,从多个处理器中确定用于处理第二任务的第一处理器,第二任务为待处理任务。
例如,任务分配装置可从第三信息指示的剩余寿命信息较大和/或已用寿命信息较小的处理器中选择第一处理器。
采用以上方法,可由任务分配装置根据多个处理器分别的第三消息进行任务分配,因此对多个处理器的寿命进行均衡,以延长多个处理器组成的处理器系统的使用寿命。
可选的,在S202中,任务分配装置可根据多个处理器分别的第三信息和第二任务的任务量信息,确定第一处理器。根据多个处理器分别的第三信息和第二任务的任务量信息确定第一处理器的方式,可参照S102中根据多个核分别的寿命信息和第一任务的任务量信息,确定第一核的方式,以避免由于执行第一任务而导致处理器的寿命耗尽。例如,可确定第三信息指示的剩余寿命信息或不低于第二任务的任务量信息的处理器作为第一处理器。
基于相同的发明构思,本申请实施例还提供一种任务分配装置,用于实现以上方法实施例所示的步骤。该装置可包括图10和/或图11所示结构。该任务分配装置可以应用于具有多个处理器核的计算机系统,可用于实现图2和/或图9所示的任务分配方法。如图10所示,该任务分配装置可以包括记录模块1010和任务分配模块1020。
在实现图2所示任务分配方法时,记录模块1010可用于获取处理器的多个核分别的第一信息,第一信息用于描述核的使用寿命。任务分配模块1020可用于根据多个核分别的第一信息,从多个核中确定用于处理第一任务的第一核,第一任务为待处理任务。
应理解的是,本申请实施例中的任务分配装置可以由软件实现,例如,具有上述记录模块1010和/或任务分配模块1020的功能的计算机程序或指令来实现,相应计算机程序或指令可以存储在终端内部的存储器中,通过处理器读取该存储器内部的相应计算机程序或指令来实现记录模块1010和/或任务分配模块1020的上述功能。或者,本申请实施例中的任务分配装置还可以由硬件来实现。其中,任务分配模块1020可以包括处理器(如CPU或系统芯片中的处理器)。记录模块1010可包括存储器,或包括支持与存储器进行通信的通信接口,例如收发器或输入/输出接口,用于任务分配模块1020从存储器获取核的第一信息。
一种可选的实现方式中,任务分配模块1020可根据多个核分别的第一信息和第一任务的任务量信息,确定所述第一核。
一种可选的实现方式中,如果计算机系统中的处理器包括多个物理核组,任一物理核组包括多个核,第一任务包括多个线程,任务分配模块1020可根据物理核组中的核的第一信息,确定第一任务对应的第一物理核组,并根据第一物理核组中的核的第一信息,从第一物理核组的核中确定第一任务中每个线程对应的核。
一种可选的实现方式中,记录模块1010还可用于存储第一核的更新的第一信息,更新的第一信息根据第一核的第一信息和第一任务的任务量信息确定。可选的,更新的第一信息可由任务分配模块1020确定。
一种可选的实现方式中,处理器的第k个核的第一信息包括第k个核的剩余寿命信息;和/或,第一信息包括第k个核的已用寿命信息。可选的,第k个核的剩余寿命信息根据第k个核的总寿命信息和已用寿命信息确定。
一种可选的实现方式中,第k个核的寿命信息可根据该第k核的历史运行时间、第k个核的频率和历史运行时间、第k核的电压和历史运行时间、第k个核在处理器上的位置、第k核的访存总时间、第k个核的历史指令数和指令的平均运行时间,或者第k个核的频率和访存总时间中的至少一个信息确定。
一种可选的实现方式中,记录模块1010属于固件,或者,记录模块1010可从固件获取处理器的多个核分别的第一信息。
具体来说,记录模块包括固件中的可编辑存储器,该可编辑存储器可用于存储本申请设 计的第一信息、第二信息或第三信息中的至少一个。例如,该可编辑存储器是可擦可编程只读存储器(erasable programmable read-only memory,EPROM)或电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)。
或者,记录模块包括处理器的性能监视单元(performance monitor unit,PMU),用于处理器从固件获取第一信息、第二信息或第三信息中的至少一个。
一种可选的实现方式中,任务分配模块1020包括任务调度器。比如,任务分配模块1020是Linux操作系统中的Linux调度器。
在实现图9所示任务分配方法时,记录模块1010可获取多个处理器分别的第三信息,第三信息用于描述处理器的使用寿命。任务分配模块1020可根据多个处理器分别的第一信息,从多个处理器中确定用于处理第二任务的第一处理器,第二任务为待处理任务。
在一种可能的设计中,任务分配模块1020可具体根据所述多个处理器分别的第三信息和所述第二任务的任务量信息,确定所述第一处理器。
可以理解的是,该装置用于上述任务分配方法时的具体实现过程以及相应的有益效果,可以参考前述方法实施例中的相关描述,这里不再赘述。
基于相同的技术构思,本申请实施例还提供了另一种任务分配装置可包括图11所示的结构,用于执行图2、图9和/或本申请实施例提供的任务分配方法的动作。参阅图11所示,该任务分配装置可以包括处理器1110和存储器1120。其中,处理器1110中可包括多个核。存储器1120,可用于存储多个核的寿命信息。处理器1110,可用于执行上述实施例述及的任务分配方法。应理解,图11中仅以1个处理器1110和存储器1120为例进行介绍,本申请提供的任务分配装置中可包括其他数量的存储器1120和处理器1110。
可选的,所述处理器1110、所述存储器1120之间通过总线相互连接。总线可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
所述至少一个处理器1110中可以包含以下至少一项:CPU,微处理器,专用集成电路(application specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。其中,所述CPU中可以包括功耗控制器和至少一个处理器核心,所述功耗控制器能够获取所述至少一个处理器核心的失效信息,并将所述至少一个处理器核心的失效信息存储至所述存储器1120中。
存储器1120可以是ROM或可存储静态信息和指令的其他类型的静态存储设备,RAM或者可存储信息和指令的其他类型的动态存储设备,也可以是EEPROM、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1120用于存储执行本申请方案的计算机执行指令,并由处理器1110来控制执行。处理器1110用于执行存储器1120中存储的计算机执行指令,从而实现本申请上述实施例提供的任务调度方法。可选的,可由处理器1110实现图10所示的任务分配模块1020 的功能。另外,可由存储器1120实现图10所示的记录模块1010的功能,即获取核的第一信息,和/或,存储更新的核的第一信息。
可选的,图11所示的任务分配置在还可包括通信接口,如收发器或输入/输出接口等。例如,在存储器1120中未存储核的寿命信息时,可由接口从其他存储器(或其他存储介质)获取核的第一信息,和/或,将更新的核的第一信息发送至其他存储器。
可选的,本申请实施例中的计算机程序指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
本申请实施例还提供一种计算机可读存储介质,计算机可读存储介质用于存储计算机程序,该计算机程序被计算机执行时,计算机可以实现上述方法实施例相关的流程。
本申请实施例还提供一种计算机程序产品,计算机程序产品用于存储计算机程序,该计算机程序被计算机执行时,计算机可以实现上述方法实施例相关的流程。
本申请实施例还提供一种芯片或芯片系统(或电路),该芯片可包括处理器,该处理器可用于调用存储器中的程序或指令,执行上述方法实施例提供的与网络设备和/或终端相关的流程。该芯片系统可包括该芯片、存储器或收发器等组件。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。
需要理解的是,在本申请的描述中,“第一”、“第二”等词汇,仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。另外在本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。在本申请的文字描述中,字符“/”, 一般表示前后关联对象是一种“或”的关系;在本申请的公式中,字符“/”,表示前后关联对象是一种“相除”的关系。

Claims (28)

  1. 一种任务分配方法,其特征在于,包括:
    获取处理器的多个核分别的第一信息,所述第一信息用于描述核的使用寿命;
    根据所述多个核分别的第一信息,从所述多个核中确定用于处理第一任务的第一核,所述第一任务为待处理任务。
  2. 如权利要求1所述的方法,其特征在于,所述根据所述多个核分别的第一信息,从所述多个核中确定用于处理第一任务的第一核,包括:
    根据所述多个核分别的第一信息和所述第一任务的任务量信息,确定所述第一核。
  3. 如权利要求1或2所述的方法，其特征在于，所述处理器包括多个物理核组，任一物理核组包括多个所述核，所述方法还包括：
    根据所述多个物理核组分别的第二信息,从所述多个物理核组中确定所述第一任务对应的第一物理核组,所述第二信息包括物理核组中核的平均寿命信息和/或物理核组中剩余寿命最少的核的寿命信息,所述第一物理核组中的核包括所述第一核。
  4. 如权利要求3所述的方法,其特征在于,所述第一任务包括多个线程,所述根据所述多个核分别的第一信息,从所述多个核中确定用于处理第一任务的第一核,包括:
    根据所述第一物理核组中的核的所述第一信息,从所述第一物理核组的核中确定所述第一任务中每个线程对应的核。
  5. 如权利要求1-4中任一所述的方法,其特征在于,还包括:
    存储所述第一核的更新的第一信息,所述更新的第一信息根据所述第一核的第一信息和所述第一任务的任务量信息确定。
  6. 如权利要求1-5中任一所述的方法,其特征在于,所述处理器的第k个核的所述第一信息包括所述第k个核的剩余寿命信息;和/或,
    所述第一信息包括所述第k个核的已用寿命信息。
  7. 如权利要求1-6中任一所述的方法,其特征在于,所述第k个核的第一信息根据以下信息确定:
    所述第k核的频率和历史运行时间;或者,
    所述第k核的电压和历史运行时间;或者,
    所述第k核的历史运行时间;或者,
    所述第k个核在所述处理器上的位置;或者,
    所述第k核的访存总时间;或者,
    所述第k个核的历史指令数和指令的平均运行时间;或者,
    所述第k个核的频率和访存总时间。
  8. 一种任务分配装置,其特征在于,包括:
    记录模块,用于获取处理器的多个核分别的第一信息,所述第一信息用于描述核的使用寿命;
    任务分配模块,用于根据所述多个核分别的第一信息,从所述多个核中确定用于处理第一任务的第一核,所述第一任务为待处理任务。
  9. 如权利要求8所述的装置,其特征在于,所述任务分配模块具体用于:
    根据所述多个核分别的第一信息和所述第一任务的任务量信息,确定所述第一核。
  10. 如权利要求8或9所述的装置,其特征在于,所述处理器包括多个物理核组,任一物理核组包括多个所述核,所述任务分配模块还用于:
    根据所述多个物理核组分别的第二信息,从所述多个物理核组中确定所述第一任务对应的第一物理核组,所述第二信息包括物理核组中核的平均寿命信息和/或物理核组中剩余寿命最少的核的寿命信息,所述第一物理核组中的核包括所述第一核。
  11. 如权利要求10所述的装置,其特征在于,所述第一任务包括多个线程,所述任务分配模块具体用于:
    根据所述第一物理核组中的核的所述第一信息,从所述第一物理核组的核中确定所述第一任务中每个线程对应的核。
  12. 如权利要求8-11中任一所述的装置,其特征在于,所述记录模块还用于:
    存储所述第一核的更新的第一信息,所述更新的第一信息根据所述第一核的第一信息和所述第一任务的任务量信息确定。
  13. 如权利要求8-12中任一所述的装置,其特征在于,所述处理器的第k个核的所述第一信息包括所述第k个核的剩余寿命信息;和/或,
    所述第一信息包括所述第k个核的已用寿命信息。
  14. 如权利要求8-13中任一所述的装置,其特征在于,所述第k个核的第一信息根据以下信息确定:
    所述第k核的频率和历史运行时间;或者,
    所述第k核的电压和历史运行时间;或者,
    所述第k核的历史运行时间;或者,
    所述第k个核的所述第k个核在所述处理器上的位置;或者,
    所述第k个核的访存总时间;或者,
    所述第k个核的历史指令数和指令的平均运行时间;或者,
    所述第k个核的频率和访存总时间。
  15. 如权利要求8-14中任一所述的装置,其特征在于,所述记录模块属于固件;或者,
    所述记录模块具体用于:
    从固件获取所述处理器的多个核分别的第一信息。
  16. 如权利要求15所述的装置,其特征在于,所述记录模块包括所述固件中的可编辑存储器。
  17. 如权利要求15所述的装置,其特征在于,所述记录模块包括所述处理器的性能监视单元,所述性能监视单元具体用于:
    从固件获取所述处理器的多个核分别的第一信息。
  18. 如权利要求8-17中任一所述的装置,其特征在于,所述任务分配模块包括任务调度器。
  19. 一种任务分配方法,其特征在于,包括:
    获取多个处理器分别的第三信息,所述第三信息用于描述处理器的使用寿命;
    根据所述多个处理器分别的第一信息,从所述多个处理器中确定用于处理第二任务的第一处理器,所述第二任务为待处理任务。
  20. 如权利要求19所述的方法,其特征在于,所述根据所述多个处理器分别的第一信息,从所述多个处理器中确定用于处理第二任务的第一处理器,包括:
    根据所述多个处理器分别的第三信息和所述第二任务的任务量信息,确定所述第一处理器。
  21. 一种任务分配装置,其特征在于,包括:
    记录单元,用于获取处理器的多个处理器分别的第三信息,所述第三信息用于描述处理器的使用寿命;
    任务分配模块,用于根据所述多个处理器分别的第一信息,从所述多个处理器中确定用于处理第二任务的第一处理器,所述第二任务为待处理任务。
  22. 如权利要求21所述的装置,其特征在于,所述任务分配模块具体用于:
    根据所述多个处理器分别的第三信息和所述第一任务的任务量信息,确定所述第一处理器。
  23. 如权利要求21或22所述的装置,其特征在于,所述记录模块属于第一处理器的固件;或者,
    所述记录模块还用于:
    从所述处理器分别的固件获取所述处理器分别的第三信息。
  24. 如权利要求23所述的装置,其特征在于,所述记录模块包括所述第一处理器的固件中的可编辑存储器。
  25. 如权利要求23所述的装置,其特征在于,所述记录模块包括所述处理器的性能监视单元,所述性能监视单元具体用于:
    从所述处理器分别的固件获取所述处理器分别的第三信息。
  26. 一种任务处理装置,其特征在于,包括存储器和处理器:
    所述存储器用于存储计算机程序指令;
    所述处理器用于调用计算机程序指令并执行,以实现如权利要求1-7或19-20中任一所述的方法。
  27. 一种芯片,其特征在于,与存储器连接,用于允许所述存储器中存储的程序指令,使得所述计算机执行如权利要求1-7或19-20中任一所述的方法。
  28. 一种计算机可读存储介质,其特征在于,存储有计算机程序指令,当所述计算机程序指令在计算机上运行时,使得所述计算机执行如权利要求1-7或19-20中任一所述的方法。
PCT/CN2021/103160 2021-03-09 2021-06-29 一种任务分配方法及装置 WO2022188306A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110254699 2021-03-09
CN202110254699.4 2021-03-09
CN202110408180.7 2021-04-15
CN202110408180.7A CN115048194A (zh) 2021-03-09 2021-04-15 一种任务分配方法及装置

Publications (1)

Publication Number Publication Date
WO2022188306A1 true WO2022188306A1 (zh) 2022-09-15

Family

ID=83156407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103160 WO2022188306A1 (zh) 2021-03-09 2021-06-29 一种任务分配方法及装置

Country Status (2)

Country Link
CN (1) CN115048194A (zh)
WO (1) WO2022188306A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105830034A (zh) * 2013-12-18 2016-08-03 高通股份有限公司 针对增加的工作寿命和最大化的性能的多核系统设计的运行时间优化
US20170031412A1 (en) * 2015-07-29 2017-02-02 Intel Corporation Masking a power state of a core of a processor
CN108509014A (zh) * 2017-02-27 2018-09-07 三星电子株式会社 计算设备和分配功率到每个计算设备中的多个核的方法
CN111105837A (zh) * 2018-10-29 2020-05-05 三星电子株式会社 管理退化程度的电子装置


Also Published As

Publication number Publication date
CN115048194A (zh) 2022-09-13

Similar Documents

Publication Publication Date Title
EP3155521B1 (en) Systems and methods of managing processor device power consumption
TWI537821B (zh) 對每一核心提供電壓及頻率控制之技術
CN109643243B (zh) 用于动态虚拟cpu核分配的方法、系统、设备、装置和介质
GB2544609B (en) Granular quality of service for computing resources
US8489904B2 (en) Allocating computing system power levels responsive to service level agreements
US8707314B2 (en) Scheduling compute kernel workgroups to heterogeneous processors based on historical processor execution times and utilizations
KR101748747B1 (ko) 프로세서의 구성가능한 피크 성능 제한들의 제어
JP2018533122A (ja) マルチバージョンタスクの効率的なスケジューリング
CN103562870A (zh) 异构核心的自动加载平衡
JP2018534675A (ja) 再マッピング同期によるタスクサブグラフの加速化
CN104169832A (zh) 提供处理器的能源高效的超频操作
US20120297216A1 (en) Dynamically selecting active polling or timed waits
US20180032376A1 (en) Apparatus and method for group-based scheduling in multi-core processor system
CN111190735B (zh) 一种基于Linux的片上CPU/GPU流水化计算方法及计算机系统
US20210397476A1 (en) Power-performance based system management
US9547576B2 (en) Multi-core processor system and control method
US20140259022A1 (en) Apparatus and method for managing heterogeneous multi-core processor system
US9760404B2 (en) Dynamic tuning of multiprocessor/multicore computing systems
US11422857B2 (en) Multi-level scheduling
WO2022188306A1 (zh) 一种任务分配方法及装置
CN101847128A (zh) 管理tlb的方法和装置
US20160292012A1 (en) Method for exploiting parallelism in task-based systems using an iteration space splitter
WO2022062937A1 (zh) 任务调度方法、装置以及计算机系统
WO2024012153A1 (zh) 一种数据处理方法及装置
CN113360192A (zh) 热缓存识别方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21929776

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21929776

Country of ref document: EP

Kind code of ref document: A1