CN100562854C - Method for implementing load equalization of multicore processor operating system - Google Patents

Info

Publication number
CN100562854C
Authority
CN
China
Prior art keywords
load
processor
core
load balancing
pi
Prior art date
Application number
CN 200810061134
Other languages
Chinese (zh)
Other versions
CN101256515A (en)
Inventor
严力科
冯德贵
施青松
曹明腾
王罡
王宇杰
胡威
蒋冠军
谢斌
陈天洲
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Priority to CN 200810061134
Publication of CN101256515A
Application granted
Publication of CN100562854C

Abstract

The present invention discloses a method for implementing load balancing in a multi-core processor operating system. While the multi-core operating system is running, the method monitors the load on the processor cores and allocates threads according to the detected load. This balances the load of the operating system, so that the threads of a multithreaded program are distributed evenly over the processor cores under the operating system's scheduler, which improves the execution efficiency of the cores.

Description

Method for implementing load balancing in a multi-core processor operating system

Technical Field

The present invention relates to multi-core processor operating system technology, and in particular to a method for implementing load balancing in a multi-core processor operating system.

Background

Moore's Law has held for several decades, but as transistor dimensions have continued to shrink in recent years, it has become difficult to fit more transistors onto a die. The complexity of integrated circuits can no longer grow substantially, which in turn means that processor performance cannot improve substantially. At the same time, processor clock frequency has hit a bottleneck (the Pentium 4 peaked at 3.8 GHz, short of the expected 4 GHz) and is hard to raise further; even where the frequency can be raised, the resulting power consumption problem remains unsolved. To improve performance, hardware vendors such as Intel, AMD, and IBM have therefore turned to developing multi-core processors (also called chip multiprocessors, CMP).

A new architecture needs matching software to deliver its performance. The basic idea behind the performance gain of a multi-core architecture is to decompose a task appropriately so that its parts run in parallel on several processors at once. Parallel computing is therefore the defining feature of multi-core architectures. At present, the main bottleneck in task partitioning and multithreaded execution lies in software, because a multi-core architecture requires multithreading not only at the software level but also at the hardware level.

The operating system is the software closest to the hardware, and how to make it exploit multi-core performance better is a current research focus. Whether the processors cooperate well and reach the highest possible degree of concurrency depends largely on how the operating system scheduler schedules and allocates tasks.

Operating system task scheduling covers real-time tasks, interactive tasks, and background batch tasks. Scheduling algorithms may be based on priority, round-robin time slicing, task preemption, and so on. The main problem that scheduling addresses is how to make the fullest use of resources and maximize system throughput while spending as little time as possible on scheduling itself.

Compared with single-core processors, multiprocessors raise two new scheduling problems: load balancing and task allocation. Load balancing means letting all processors occupy resources as evenly as possible in order to maximize system throughput; task allocation means that the system distributes tasks sensibly among the processor cores in order to balance the workload between them. Given the many different multi-core architectures, finding a suitable scheduling approach for multi-core processors is particularly important.

The biggest difference between a multi-core system and a single-core system is concurrency. Concurrency requires every processor in the system to approach maximum throughput and maximum resource utilization. Each processor should therefore execute tasks evenly, with the same load.

Load balancing is a new problem posed by multi-core operating systems. A single-core operating system has only one core and need not consider load balancing. For a multi-core system to reach its best performance, tasks must be distributed evenly across the processor cores. "Evenly" here refers not only to the number of tasks but also to access to system resources and to execution time. The goal of sensible task allocation is load balancing. From the narrow viewpoint of task execution, how long a task runs, when it accesses resources, and when it requests system interrupts are all unpredictable; task execution is dynamic. Load balancing therefore cannot rely on task allocation alone: when the load between processors becomes unbalanced, tasks must be migrated at run time as a dynamic correction, so that the load on each processor is balanced.

Solving these two new problems of processor load balancing well makes it possible to use resources more effectively and to give tasks the fastest response.

Summary of the Invention

The object of the present invention is to provide a method for implementing load balancing in a multi-core processor operating system. The technical solution adopted by the invention is as follows:

1) Scheduling domain construction:

During processor core initialization, every processor core is visited, and processor cores that share a level-2 (L2) cache are placed in the same scheduling domain. This yields several distinct scheduling domains.

2) Load vector calculation:

Resource utilization and run-queue length are the factors used to compute the load vector. Formula (1) gives the utilization FCPU of a processor core, where Tused is the time the processor spends computing and Tidle is the time it spends idle:

FCPU = Tused / (Tidle + Tused)    (1)

Formula (2) gives the load vector Fload, where FCPU is the core utilization computed with formula (1) and Frun_queue is the length of the core's run queue:

Fload = (FCPU + 1) * Frun_queue    (2)

3) Load balance detection:

For a scheduling domain of processor cores Pset = {P1, P2, ..., Pn}, where P1, P2, ..., Pn are the processor cores in Pset, each core Pi in Pset checks whether its load is out of balance with that of the other cores. Every processor core runs its own load check, which is triggered at thread allocation, when the processor becomes idle, and at fixed time intervals.

The load balance check proceeds as follows:

Step 1: Pi examines the processor cores Pj, Pj+1, ..., Pj+k that are in the same scheduling domain. If the load is unbalanced, the check returns the processor core P whose load vector differs most from that of Pi, together with a flag W giving the sign of the load vector difference between P and Pi, and the check ends.

Step 2: If the cores within the same scheduling domain are balanced, Pi checks the load in the other scheduling domains at the same level. For each such domain it is enough to check any one processor Pm. If the load is unbalanced, the check returns the first processor core P found to be out of balance, together with the flag W giving the sign of the load difference between P and Pi.

4) Thread allocation:

When a thread Tnew is created, it is allocated as follows:

Once Tnew becomes runnable, the load balance check is invoked on Pparent, the processor core on which the parent thread Tparent resides. If the load is balanced, Tnew is placed in the run queue of the parent's processor core; otherwise it is inserted into the run queue of the least-loaded processor core, Pload_least.

5) Run-time dynamic load balancing:

Every core Pi in the set of processor cores Pset has its own independent load balance check policy. The check used here is the same as in step 3).

If Pi has threads running, Pi invokes load detection at a fixed time interval. If Pi is idle, the number of intervals between checks is reduced so that detection runs at the shortest possible interval. If all processor cores are idle, the number of intervals between load balance checks is adjusted.

When processor core Pi runs the load balance check, the check returns the processor Pt whose load is out of balance with that of Pi, together with a value W comparing the two loads. If W > 0, Pi's load is less than Pt's, and some of the threads in Pi's ready queue need to be migrated to Pt's ready queue; if W < 0, some threads need to be migrated from Pt's ready queue to Pi in order to balance the load; if W = 0, the load is already balanced and no threads need to be migrated.

Compared with the background art, the invention has the following beneficial effects. It is a load balancing method for multi-core processor operating systems whose main function is to build scheduling domains and to balance load both within and between them, so that the threads of a multithreaded program running on a multi-core operating system are distributed evenly over the processor cores under the operating system's scheduler, which improves the execution efficiency of the cores.

(1) Efficiency. Because the operating system balances the load, threads are distributed evenly across the processor cores, which improves run-time efficiency.

(2) Practicality. Load balancing increases the parallelism of running threads and reduces thread migration; repeated experiments have shown it to be practical.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the implementation process of the invention.

Figure 2 is a schematic diagram of scheduling domain construction on a four-core two-way multiprocessor.

Figure 3 is a schematic diagram of thread allocation on a four-core two-way multiprocessor when the load is balanced.

Figure 4 is a schematic diagram of thread allocation on a four-core two-way multiprocessor when the load is unbalanced within a scheduling domain.

Figure 5 is a schematic diagram of thread allocation on a four-core two-way multiprocessor when the load is unbalanced between scheduling domains.

Detailed Description

The invention is a method for implementing load balancing in a multi-core processor operating system. Its implementation is described below with reference to Figure 1.

1) Scheduling domain construction:

A thread, as the term is generally used, is a lightweight process that shares resources; in modern operating systems the thread is the basic unit of task scheduling. In this invention the basic unit of scheduling is the thread, and the load consists of the threads running on the different processor cores. A multi-core processor has three typical characteristics: multiple processor cores, an L2 cache shared between cores, and direct communication between cores through registers. On such a processor, the L1 cache is private to each core.

A scheduling domain is a set of processor cores whose loads must be kept in balance. For cores that share an L2 cache, migrating a thread between them incurs the same L2 cache miss cost as not migrating it at all. Scheduling domain construction therefore places cores that share an L2 cache in the same scheduling domain. Scheduling domains form a hierarchy. The top-level domain (level n, if there are n levels) contains all processor cores, while the bottom-level domains (level 0, the base scheduling domains) group the cores whose loads are most closely related. Two processors in the same scheduling domain must be load balanced against each other. If scheduling domains are related as parent and child, ancestor and descendant, or siblings, load balancing can be performed between their cores. Figure 2 illustrates the structure using a four-core two-way multiprocessor: processor cores 0 and 1 form one base scheduling domain, and processor cores 2 and 3 form another. The two base domains together make up the scheduling domain one level up.

At startup each processor core is assigned a logical ID; logical IDs increase from 0. During processor core initialization, every core is visited, and cores that share an L2 cache are placed in the same scheduling domain. This yields several distinct scheduling domains.
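The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_scheduling_domains` and the input mapping from logical core ID to an L2 cache label are hypothetical, and a real kernel would obtain the cache topology from the hardware rather than from a dictionary.

```python
from collections import defaultdict


def build_scheduling_domains(l2_cache_of_core):
    """Group logical core IDs into base scheduling domains by shared L2 cache.

    l2_cache_of_core is a hypothetical input: it maps each logical core ID
    to a label identifying the L2 cache that core uses.
    """
    by_cache = defaultdict(list)
    # Visit every core in logical-ID order, as during core initialization.
    for core_id in sorted(l2_cache_of_core):
        by_cache[l2_cache_of_core[core_id]].append(core_id)
    # Level 0: one base domain per shared L2 cache.
    base_domains = [tuple(cores) for cores in by_cache.values()]
    # Top level: a single domain that contains all cores.
    top_domain = tuple(sorted(l2_cache_of_core))
    return base_domains, top_domain


# The four-core two-way layout used in the text: cores 0 and 1 share one
# L2 cache, cores 2 and 3 share another.
base, top = build_scheduling_domains({0: "L2-a", 1: "L2-a", 2: "L2-b", 3: "L2-b"})
print(base)  # [(0, 1), (2, 3)]
print(top)   # (0, 1, 2, 3)
```

The sketch builds only two levels, which matches the example in the text; a deeper hierarchy would need an extra grouping key per level.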

2) Load vector calculation:

The load vector is the measure used to compare loads. It is needed to evaluate load balance effectively and is defined as the basic unit for judging the load of a processor core. The invention uses resource utilization and run-queue length as the factors for computing the load vector. Formula (1) gives the utilization FCPU of a processor core, where Tused is the time the processor spends computing and Tidle is the time it spends idle:

FCPU = Tused / (Tidle + Tused)    (1)

Formula (2) gives the load vector Fload used by the invention, where FCPU is the processor utilization computed with formula (1) and Frun_queue is the length of the core's run queue:

Fload = (FCPU + 1) * Frun_queue    (2)
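Formulas (1) and (2) translate directly into code. The sketch below is a minimal Python rendering; the function names are hypothetical, and the handling of a zero measurement window is an assumption the source does not address.

```python
def cpu_utilization(t_used, t_idle):
    """Formula (1): FCPU = Tused / (Tidle + Tused)."""
    total = t_used + t_idle
    if total == 0:
        # Assumption: with no elapsed time measured, treat utilization as 0.
        return 0.0
    return t_used / total


def load_vector(t_used, t_idle, run_queue_len):
    """Formula (2): Fload = (FCPU + 1) * Frun_queue."""
    return (cpu_utilization(t_used, t_idle) + 1) * run_queue_len


# A core that was busy for 75 of 100 time units with 4 runnable threads:
print(cpu_utilization(75, 25))  # 0.75
print(load_vector(75, 25, 4))   # 7.0
```

Because FCPU lies between 0 and 1, the factor (FCPU + 1) lies between 1 and 2, so the run-queue length dominates the result and utilization separates cores whose queues have the same length. An empty run queue always gives a load of 0.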

3) Load balance detection:

Load detection means that the operating system checks whether there is a load imbalance between processor cores. This check is an important part of how a multi-core operating system scheduler achieves load balancing. For a scheduling domain of processor cores Pset = {P1, P2, ..., Pn}, where P1, P2, ..., Pn are the processor cores in Pset, each core Pi in Pset checks whether its load is out of balance with that of the other cores.

Every processor core runs its own load check, which is triggered at thread allocation, when the processor becomes idle, and at fixed time intervals. The load balance check proceeds as follows:

(1) Pi examines the processor cores Pj, Pj+1, ..., Pj+k that are in the same scheduling domain. If the load is unbalanced, the check returns the processor core P whose load vector differs most from that of Pi, together with a flag W giving the sign of the load vector difference between P and Pi. The check then ends.

(2) If the cores within the same scheduling domain are balanced, Pi checks the load in the other scheduling domains at the same level. For each such domain it is enough to check any one processor Pm. If the load is unbalanced, the check returns the first processor core P found to be out of balance, together with the flag W giving the sign of the load difference between P and Pi.

Loads are compared using the load vector described above. A threshold M is specified. For processor cores in the same scheduling domain, the load is balanced if the difference between their load vectors is less than aM (a < 1), and unbalanced otherwise; for processors in different scheduling domains, the difference is compared against the threshold M itself. The choice of the threshold M and the factor a depends on the L2 cache hit and miss behaviour, the time needed to move tasks between scheduling queues, the scheduler's own scheduling time, and so on, and can be set according to the application environment.
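The two-step check with its two thresholds can be sketched as follows. This is a hedged illustration rather than the patented implementation: `check_load_balance` is a hypothetical name, the sign W is assumed to be the sign of load(P) - load(Pi), and the first core of each sibling domain is assumed to serve as its representative Pm (the text allows any one core).

```python
def sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def check_load_balance(pi, loads, domains, m, a):
    """Return (core, w) for the first imbalance seen from core pi, else None.

    loads   -- maps core ID to its load vector Fload
    domains -- list of same-level scheduling domains (tuples of core IDs)
    m, a    -- threshold M and in-domain factor a (a < 1)
    """
    own = next(d for d in domains if pi in d)

    # Step 1: cores in the same domain, compared against a * M.
    peers = [p for p in own if p != pi]
    if peers:
        worst = max(peers, key=lambda p: abs(loads[p] - loads[pi]))
        diff = loads[worst] - loads[pi]
        if abs(diff) >= a * m:
            return worst, sign(diff)

    # Step 2: one representative per sibling domain, compared against M.
    for domain in domains:
        if domain is own:
            continue
        pm = domain[0]  # assumption: take the first core as representative
        diff = loads[pm] - loads[pi]
        if abs(diff) >= m:
            return pm, sign(diff)
    return None


loads = {0: 2.0, 1: 2.5, 2: 6.0, 3: 6.0}
domains = [(0, 1), (2, 3)]
print(check_load_balance(0, loads, domains, m=2.0, a=0.5))  # (2, 1)
print(check_load_balance(2, loads, domains, m=2.0, a=0.5))  # (0, -1)
```

In the example, cores 0 and 1 differ by 0.5, which is below aM = 1.0, so the first domain is balanced internally; the imbalance is found only when core 0 compares itself with the representative of the other domain.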

4) Thread allocation:

Thread allocation means distributing threads evenly over the processor cores. When a new thread is created and the load balance condition holds, the thread preferentially continues on the processor core where its parent thread executes. In addition, the thread's task descriptor holds a cpu_mask that identifies the set of processor cores on which the thread may run. This restricts the processors that can execute the thread: once the value is set, the thread can run only on the cores named in cpu_mask, which provides static load balancing. When a thread Tnew is created, it is allocated as follows: once Tnew becomes runnable, the load balance check is invoked on Pparent, the processor core on which the parent thread Tparent resides. If the load is balanced, Tnew is placed in the run queue of the parent's processor core. Otherwise, it is inserted into the run queue of the least-loaded processor core, Pload_least.

The allocation policy is illustrated below on a four-core two-way multi-core processor. Processor cores 0 and 1 are in one scheduling domain, and processor cores 2 and 3 are in another. The parent of the new thread is ready on processor core 2. In Figure 3 the load is balanced, and Tnew is allocated to processor core 1. In Figure 4 the load is unbalanced within the scheduling domain, and Tnew is allocated to processor core 0. In Figure 5 the load is balanced within the scheduling domain but unbalanced between scheduling domains, and Tnew is allocated to processor core 2.
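The allocation rule itself is short enough to sketch directly. In the Python below, `allocate_thread` and its parameters are hypothetical names; the balance check is passed in as a callable so the sketch stays independent of how the check is implemented, and ties between equally loaded cores are resolved by dictionary order, which the source does not specify.

```python
def allocate_thread(thread, parent_core, loads, run_queues, is_balanced):
    """Place a newly runnable thread and return the chosen core.

    loads       -- maps core ID to its load vector Fload
    run_queues  -- maps core ID to a list used as that core's run queue
    is_balanced -- callable(core) -> bool, the load balance check for a core
    """
    if is_balanced(parent_core):
        # Balanced: keep the thread on the parent's core.
        target = parent_core
    else:
        # Unbalanced: use the least-loaded core (Pload_least in the text).
        target = min(loads, key=loads.get)
    run_queues[target].append(thread)
    return target


loads = {0: 1.0, 1: 3.0, 2: 5.0, 3: 5.0}
queues = {core: [] for core in loads}
print(allocate_thread("Tnew", 2, loads, queues, lambda core: True))   # 2
print(allocate_thread("Tnew", 2, loads, queues, lambda core: False))  # 0
```

The sketch ignores cpu_mask; a fuller version would restrict the candidates for the least-loaded core to those the mask allows.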

In modern operating systems threads are created very quickly. Running the load balance check every time a thread is created would cost more than it gains. Instead, each processor core checks load balance at fixed time intervals, and within one interval new threads are all allocated to the same processor core. This keeps the cost of load balance detection under control.

5) Run-time dynamic load balancing:

A running thread may enter a wait queue for many reasons: insufficient resources, user interruption, run-time exceptions, a thread it needs to communicate with not being runnable, and so on. Page faults during execution also make a thread wait. The conditions that stop a thread from running normally cannot be predicted, so the remaining run time of a thread varies dynamically. Maintaining load balance between processors through thread allocation alone is therefore not enough; the load must also be balanced dynamically while threads run. Run-time dynamic load balancing is implemented mainly by migrating threads between processor cores.

Because the cores of a multi-core processor share the L2 cache, migrating a thread between cores in the same scheduling domain costs far less in cache misses than migrating it between different scheduling domains.

Every core Pi in the set of processor cores Pset has its own independent load balance check policy. The check used here is the same as in step 3). If Pi has threads running, Pi invokes load detection at a fixed time interval. If Pi is idle, the number of intervals between checks is reduced so that detection runs at the shortest possible interval. If all processor cores are idle, the number of intervals between load balance checks is adjusted.
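The interval rule can be read as a three-way choice. The sketch below is one possible reading, not the patent's implementation: the function name and all tick-count parameters are hypothetical, and because the source does not say how the interval is adjusted when every core is idle, that value is left to the caller.

```python
def next_check_interval(core_has_threads, all_cores_idle,
                        busy_ticks, min_ticks, all_idle_ticks):
    """Choose how many ticks a core waits before its next balance check.

    busy_ticks     -- fixed interval used while the core has threads to run
    min_ticks      -- shortest interval, used while this core is idle
    all_idle_ticks -- caller-chosen interval for when every core is idle
                      (the source only says the interval is "adjusted")
    """
    if all_cores_idle:
        return all_idle_ticks
    if core_has_threads:
        return busy_ticks
    # This core is idle but others are not: check as often as possible
    # so it can pick up work from a busier core.
    return min_ticks


print(next_check_interval(True, False, busy_ticks=10, min_ticks=1, all_idle_ticks=50))   # 10
print(next_check_interval(False, False, busy_ticks=10, min_ticks=1, all_idle_ticks=50))  # 1
print(next_check_interval(False, True, busy_ticks=10, min_ticks=1, all_idle_ticks=50))   # 50
```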

When processor core Pi runs the load balance check, the check returns the processor Pt whose load is out of balance with that of Pi, together with a value W comparing the two loads. If W > 0, Pi's load is less than Pt's, and some of the threads in Pi's ready queue need to be migrated to Pt's ready queue; if W < 0, some threads need to be migrated from Pt's ready queue to Pi in order to balance the load. If W = 0, the load is already balanced and no threads need to be migrated. Because of various factors, for example that many threads cannot be migrated, a migration may fail to reach balance. Dynamic load balancing therefore continues against the other unbalanced processors until the load is balanced.

Each processor core Pi checks load balance on its own, and its balancing target is the processor core whose load differs most from its own. Because every processor core balances against the processor whose load differs most from its own, the dynamic load balancing of the individual cores achieves global load balance.

When threads are migrated from processor Pt to Pi, the selected thread Tselected is not migrated if it meets one of the following conditions:

(1) Tselected is currently executing on the target processor core.

(2) Tselected is cache-hot, that is, the thread has been run within a recent period of time.

(3) The set of processor cores given by Tselected's cpu_mask does not include processor Pi, so Tselected cannot be run by Pi.
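The three exclusion conditions amount to a filter applied to each candidate thread. The following Python sketch makes several labeled assumptions: the `Thread` record and its fields are hypothetical, condition (1) is modeled as a simple `running` flag, and "cache-hot" is modeled as having last run within a caller-supplied window `cache_hot_window`, which the source does not quantify.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    running: bool         # condition (1): currently executing on a core
    last_run: float       # time at which the thread last ran
    cpu_mask: frozenset   # logical IDs of the cores the thread may run on


def can_migrate(thread, target_core, now, cache_hot_window):
    """Return True if the thread may be migrated to target_core."""
    if thread.running:
        return False      # (1) the thread is executing right now
    if now - thread.last_run < cache_hot_window:
        return False      # (2) the thread is still cache-hot
    if target_core not in thread.cpu_mask:
        return False      # (3) cpu_mask forbids the target core
    return True


t = Thread("t1", running=False, last_run=10.0, cpu_mask=frozenset({0, 1}))
print(can_migrate(t, 1, now=20.0, cache_hot_window=5.0))  # True
print(can_migrate(t, 2, now=20.0, cache_hot_window=5.0))  # False
print(can_migrate(t, 1, now=12.0, cache_hot_window=5.0))  # False
```

A balancer would apply this filter to the threads in the source core's ready queue and move only those that pass, which is why a single migration pass may fall short of full balance.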

Claims (1)

1. A method for implementing load balancing in a multi-core processor operating system, characterized in that:

1) Scheduling domain construction: during processor core initialization, every processor core is visited; processor cores that share a level-2 cache are placed in the same scheduling domain; in this way several distinct scheduling domains are formed;

2) Load vector calculation: resource utilization and run-queue length are used as the factors for computing the load vector; formula (1) is used to compute the utilization FCPU of a processor core, where Tused is the processor computing time and Tidle is the processor idle time,

FCPU = Tused / (Tidle + Tused)    (1)

formula (2) is used to compute the load vector Fload, where FCPU is the utilization of the processor core computed with formula (1) and Frun_queue is the length of the run queue of the processor core;

Fload = (FCPU + 1) * Frun_queue    (2)

3) Load balance detection: for a scheduling domain of processor cores Pset = {P1, P2, ..., Pn}, where P1, P2, ..., Pn are the processor cores in the scheduling domain Pset, a processor core Pi in Pset checks whether its load is out of balance with that of the other processor cores; every processor core has its own load check, and the load check takes place at thread allocation, when the processor is idle, and at fixed time intervals;

the load balance check proceeds as follows:

step 1, Pi examines the processor cores Pj, Pj+1, ..., Pj+k that are in the same scheduling domain; if the load is unbalanced, the check returns the processor core P whose load vector differs most, together with a flag W giving the sign of the load vector difference between P and Pi, and the check ends;

step 2, if the processor cores in the same scheduling domain are balanced, the load in the other scheduling domains at the same level is checked; for a scheduling domain at the same level only any one processor Pm needs to be checked; if the load is unbalanced, the check returns the first processor core P whose load is out of balance, together with a flag W giving the sign of the load difference between P and Pi;

4) Thread allocation: when a thread Tnew is created, it is allocated as follows: once the state of Tnew is runnable, the load balance check of Pparent, the processor core on which the parent thread Tparent resides, is invoked; if the load is balanced, the thread is placed in the run queue of the processor core on which its parent resides; otherwise, the thread is inserted into the run queue of the least-loaded processor core Pload_least;

5) Run-time dynamic load balancing: every Pi belonging to the set of processor cores Pset has an independent load balance check policy, and the load balance check policy used here is the same as in step 3); if Pi has threads running, load detection is invoked by Pi at a fixed time interval; if Pi is idle, the number of time intervals is reduced so that detection is performed at the shortest possible interval; if all processor cores are idle, the number of time intervals between load balance checks is adjusted; when processor core Pi runs the load balance check, the load balance check policy returns the processor Pt whose load is out of balance with that of Pi and a value W comparing the two loads; if W > 0, the load of Pi is less than that of Pt, and some of the threads in the ready queue of Pi need to be migrated to the ready queue of Pt; if W < 0, some threads need to be migrated from the ready queue of Pt to Pi in order to balance the load; if W = 0, the load is already balanced and no threads need to be migrated.
CN 200810061134 2008-03-11 2008-03-11 Method for implementing load equalization of multicore processor operating system CN100562854C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810061134 CN100562854C (en) 2008-03-11 2008-03-11 Method for implementing load equalization of multicore processor operating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810061134 CN100562854C (en) 2008-03-11 2008-03-11 Method for implementing load equalization of multicore processor operating system

Publications (2)

Publication Number Publication Date
CN101256515A CN101256515A (en) 2008-09-03
CN100562854C true CN100562854C (en) 2009-11-25

Family

ID=39891358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810061134 CN100562854C (en) 2008-03-11 2008-03-11 Method for implementing load equalization of multicore processor operating system

Country Status (1)

Country Link
CN (1) CN100562854C (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101394362B (en) 2008-11-12 2010-12-22 清华大学 Method for load balance to multi-core network processor based on flow fragmentation
CN101739286B (en) 2008-11-19 2012-12-12 英业达股份有限公司 Method for balancing load of storage server with a plurality of processors
CN101782862B (en) * 2009-01-16 2013-03-13 鸿富锦精密工业(深圳)有限公司 Processor distribution control system and control method thereof
CN101504618B (en) 2009-02-26 2011-04-27 浙江大学 Multi-core processor oriented real-time thread migration method
US8788570B2 (en) * 2009-06-22 2014-07-22 Citrix Systems, Inc. Systems and methods for retaining source IP in a load balancing multi-core environment
US20110022870A1 (en) * 2009-07-21 2011-01-27 Microsoft Corporation Component power monitoring and workload optimization
EP2549384B1 (en) * 2010-03-18 2018-01-03 Fujitsu Limited Multi-core processor system, arbitration circuit control method, and arbitration circuit control program
WO2011117987A1 (en) * 2010-03-24 2011-09-29 富士通株式会社 Multi-core system and start-up method
CN102455944A (en) * 2010-10-29 2012-05-16 迈普通信技术股份有限公司 Multi-core load balancing method and processor thereof
CN102156659A (en) * 2011-03-28 2011-08-17 中国人民解放军国防科学技术大学 Scheduling method and system for job task of file
CN102521047B (en) * 2011-11-15 2014-07-09 重庆邮电大学 Method for realizing interrupted load balance among multi-core processors
CN103197977B (en) * 2011-11-16 2016-09-28 华为技术有限公司 A kind of thread scheduling method, thread scheduling device and multi-core processor system
TWI439925B (en) * 2011-12-01 2014-06-01 Inst Information Industry Embedded systems and methods for threads and buffer management thereof
CN102546946B (en) * 2012-01-05 2014-04-23 中国联合网络通信集团有限公司 Method and device for processing task on mobile terminal
CN103297767B (en) * 2012-02-28 2016-03-16 三星电子(中国)研发中心 A kind of jpeg image coding/decoding method and decoder being applicable to multinuclear embedded platform
CN102609307A (en) * 2012-03-07 2012-07-25 汉柏科技有限公司 Multi-core multi-thread dual-operating system network equipment and control method thereof
CN102629217B (en) * 2012-03-07 2015-04-22 汉柏科技有限公司 Network equipment with multi-process multi-operation system and control method thereof
CN102681889B (en) * 2012-04-27 2015-01-07 电子科技大学 Scheduling method of cloud computing open platform
CN102866922B (en) * 2012-08-31 2014-10-22 河海大学 Load balancing method used in massive data multithread parallel processing
CN104239149B (en) * 2012-08-31 2017-03-29 南京工业职业技术学院 A kind of service end multi-threaded parallel data processing method and load-balancing method
CN102929718B (en) * 2012-09-17 2015-03-11 厦门坤诺物联科技有限公司 Distributed GPU (graphics processing unit) computer system based on task scheduling
CN102929772A (en) * 2012-10-16 2013-02-13 苏州迈科网络安全技术股份有限公司 Monitoring method and system of intelligent real-time system
CN103793270B (en) * 2012-10-26 2018-09-07 百度在线网络技术(北京)有限公司 Moving method, device and the terminal of end application
CN104219161B (en) * 2013-06-04 2017-09-05 华为技术有限公司 A kind of method and device of balance nodes load
CN103530191B (en) * 2013-10-18 2017-09-12 杭州华为数字技术有限公司 Focus identifying processing method and device
CN103617086B (en) * 2013-11-20 2017-02-08 东软集团股份有限公司 Parallel computation method and system
CN105009083A (en) * 2013-12-19 2015-10-28 华为技术有限公司 Method and device for scheduling application process
JP5808450B1 (en) 2014-04-04 2015-11-10 ファナック株式会社 Control device for executing sequential program using multi-core processor
CN103927225B (en) * 2014-04-22 2018-04-10 浪潮电子信息产业股份有限公司 A kind of internet information processing optimization method of multi-core framework
CN104239153B (en) * 2014-09-29 2018-09-11 三星电子(中国)研发中心 The method and apparatus of multi-core CPU load balancing
CN105700951A (en) * 2014-11-25 2016-06-22 中兴通讯股份有限公司 Method and device for realizing CPU (Central Processing Unit) business migration
CN104506452B (en) * 2014-12-16 2017-12-26 福建星网锐捷网络有限公司 A kind of message processing method and device
CN106033374A (en) * 2015-03-13 2016-10-19 西安酷派软件科技有限公司 Method and device for distributing multi-core central processing unit in multisystem, and terminal
CN104978235A (en) * 2015-06-30 2015-10-14 柏斯红 Operating frequency prediction based load balancing method
CN106371914A (en) * 2015-07-23 2017-02-01 中国科学院声学研究所 Load intensity-based multi-core task scheduling method and system
US20170039093A1 (en) * 2015-08-04 2017-02-09 Futurewei Technologies, Inc. Core load knowledge for elastic load balancing of threads
CN106487606A (en) * 2015-08-28 2017-03-08 阿里巴巴集团控股有限公司 A kind of dispatching method for network tester and system
CN105700959B (en) * 2016-01-13 2019-02-26 南京邮电大学 A kind of multithreading division and static equilibrium dispatching method towards multi-core platform
WO2018018372A1 (en) * 2016-07-25 2018-02-01 张升泽 Method and system for calculating current in electronic chip
WO2018018373A1 (en) * 2016-07-25 2018-02-01 张升泽 Power calculation method and system for multiple core chips
WO2018018424A1 (en) * 2016-07-26 2018-02-01 张升泽 Temperature control method and system based on chip
WO2018018425A1 (en) * 2016-07-26 2018-02-01 张升泽 Method and system for allocating threads of multi-kernel chip
CN106227602A (en) * 2016-07-26 2016-12-14 张升泽 The distribution method being supported between multi core chip and system
WO2018018452A1 (en) * 2016-07-27 2018-02-01 李媛媛 Load balance application method and system in multi-core chip
CN106775975A (en) * 2016-12-08 2017-05-31 青岛海信移动通信技术股份有限公司 Process scheduling method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1329312A (en) 2000-06-08 2002-01-02 国际商业机器公司 Interactive data handling system control display interface for tracking distributed message in dynamic work load equalization communication system
CN1664803A (en) 2004-03-04 2005-09-07 国际商业机器公司 Mechanism for enabling the distribution of operating system resources in a multi-node computer system
CN1786917A (en) 2004-12-07 2006-06-14 国际商业机器公司 Borrowing threads as a form of load balancing in a multiprocessor data processing system
US7255722B2 (en) 2002-12-18 2007-08-14 Wako Filter Technology Co Gas filter

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011150792A1 (en) * 2010-11-29 2011-12-08 华为技术有限公司 Power saving realization method and device for cpu
US9377842B2 (en) 2010-11-29 2016-06-28 Huawei Technologies Co., Ltd. Method and apparatus for realizing CPU power conservation
CN106293935A (en) * 2016-07-28 2017-01-04 张升泽 Electric current is in the how interval distribution method within multi core chip and system

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted
C17 Cessation of patent right