CN100562854C

CN100562854C - Method for implementing load balancing in a multi-core processor operating system

Info

Publication number: CN100562854C
Application number: CNB2008100611349A
Authority: CN
Inventors: 陈天洲; 胡威; 曹明腾; 施青松; 严力科; 谢斌; 冯德贵; 王罡; 蒋冠军; 王宇杰
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2008-03-11
Filing date: 2008-03-11
Publication date: 2009-11-25
Anticipated expiration: 2028-03-11
Also published as: CN101256515A

Abstract

The invention discloses a method for implementing load balancing in a multi-core processor operating system. While the multi-core processor operating system is running, the load condition is detected, and threads are allocated according to the detected load condition. The method balances the load of the multi-core processor operating system, so that the threads of a multithreaded program running on it are distributed evenly across different processor cores under the scheduling of the operating system, thereby improving the execution efficiency of the processor cores.

Description

Method for implementing load balancing in a multi-core processor operating system

Technical field

The present invention relates to multi-core processor operating system technology, and in particular to a method for implementing load balancing in a multi-core processor operating system.

Background technology

Moore's Law has held for decades, but as integrated circuit transistor sizes have continued to shrink in recent years, it has become difficult to fit more transistors onto a die. The complexity of integrated circuits can no longer be raised significantly, which means that processor performance cannot be expected to improve substantially. At the same time, processor frequency has reached a bottleneck and is difficult to raise further (the Pentium 4 peaked at 3.8 GHz and did not reach the expected 4 GHz), and even where the operating frequency can be raised, the resulting power consumption problem cannot be solved. Therefore, to improve performance, hardware vendors represented by Intel, AMD, and IBM have turned to developing multi-core processors (also called single-chip multiprocessor architectures, Chip Multi-Processors, CMP).

A new architecture can deliver its performance only when it is matched with suitable software. The basic idea behind the performance gain of a multi-core architecture is to decompose a task appropriately so that its parts run in parallel on multiple processors at the same time. Parallel computing is therefore the defining feature of the multi-core architecture. At present, the main bottleneck in task partitioning and multithreaded execution lies in software, because the multithreading required by a multi-core architecture must be realized not only at the software level but also at the hardware level.

The operating system is the software most closely tied to the hardware, and how to make it exploit multi-core performance better is a current research focus. How well the processors cooperate, and how high a degree of concurrency they reach, depends to a large extent on how the operating system scheduler schedules and distributes tasks.

As is well known, task scheduling in an operating system covers real-time tasks, interactive tasks, and background batch tasks. Scheduling algorithms can be based on priority, round-robin, task preemption, and so on. The main problem that scheduling solves is how to make the fullest use of resources and achieve maximum system throughput while spending as little time as possible on scheduling itself.

Compared with single-core processors, multiprocessors raise new scheduling problems: load balancing and task distribution. Load balancing means having all processors occupy resources as evenly as possible so as to reach maximum system throughput; task distribution means assigning the system's tasks reasonably to the processor cores so that the workload is balanced between processors. Given the many different multi-core architectures, finding a suitable multi-core scheduling scheme is particularly important.

The biggest difference between a multi-core system and a single-core system is the parallelism of the multi-core system. Parallelism requires that all processors in the system reach maximum throughput and peak resource utilization as far as possible. We therefore want every processor to execute tasks in a balanced way, with equal load.

Load balancing is a new problem raised by multi-core operating systems. A single-core operating system has only one core and need not consider load balancing. For a multi-core system to approach its best execution performance, tasks must be distributed evenly over the processor cores. "Evenly" here refers not only to the number of tasks but also to access to system resources and to execution time. The goal of reasonable task distribution is load balancing. From the narrow viewpoint of task execution, how long a task runs, when it accesses resources, when it requests system interrupts, and so on are all unpredictable, so task execution is dynamic. Load balancing therefore cannot consider task distribution alone: when the load between processors in the system becomes unbalanced, tasks must be migrated at run time as dynamic balancing, so that the load of each processor is balanced.

Solving these two new processor scheduling problems properly allows resources to be used more effectively and lets tasks obtain the fastest response.

Summary of the invention

The object of the present invention is to provide a method for implementing load balancing in a multi-core processor operating system.

The technical solution adopted by the present invention to solve its technical problem is as follows:

1) Scheduling domain construction:

During processor core initialization, each processor core is visited, and the processor cores that share an L2 cache are placed in the same scheduling domain. In this way, several different scheduling domains are formed;

2) Load vector calculation:

Resource utilization and run queue length are used as the factors for calculating the load vector. Formula (1) gives the utilization FCPU of a processor core, where Tused is the time the processor spends computing and Tidle is the processor idle time:

FCPU = Tused / (Tidle + Tused)    (1)

Formula (2) gives the load vector Fload, where FCPU is the utilization of the processor core calculated by formula (1) and Frun_queue is the length of the run queue of the processor core;

Fload = (FCPU + 1) * Frun_queue    (2)

3) Load balancing detection:

For a scheduling domain Pset = {P1, P2, ..., Pn} of processor cores, where P1, P2, ..., Pn are the processor cores in Pset, any processor core Pi in Pset can check whether the load is unbalanced between itself and the other processor cores. Each processor core performs its own load check, and load checks take place at thread allocation, when the processor is idle, and at fixed time intervals;

The load balancing check proceeds as follows:

First, Pi checks the processor cores Pj, Pj+1, ..., Pj+k in its own scheduling domain. If the load is unbalanced, the check returns the processor core P whose load vector differs most from that of Pi, together with the sign W of the load vector difference with Pi, and the check ends;

Second, if the load of the processor cores in the same scheduling domain is balanced, Pi checks the load in the other scheduling domains at the same level. For a same-level scheduling domain it suffices to check any one of its processors Pm. When the load is unbalanced, the check returns the first processor core P found to have an unbalanced load and the sign W of the load difference with Pi;

4) Thread allocation:

After a thread Tnew is created, the allocation flow is as follows:

When the state of thread Tnew becomes runnable, the load balance check of processor core Pparent, on which the parent thread Tparent resides, is invoked. If the load is balanced, the thread is placed in the run queue of the processor core where the parent thread resides; otherwise the thread is inserted into the run queue of the least loaded processor core Pload_least;

5) Dynamic load balancing at run time:

For a set of processor cores Pset, every Pi in Pset has its own independent load balance check policy. The check policy used here is the same as in step 3):

If Pi has threads running, Pi invokes the load check at a fixed time interval; if Pi is idle, the interval is reduced so that checks run at as short an interval as possible. If all processor cores are idle, the interval for checking load balance is adjusted;

When processor core Pi finds a load imbalance, the load balance check policy returns the processor Pt whose load is unbalanced with that of Pi and the load comparison value W. If W > 0, the load of Pi is greater than that of Pt, and some of the threads in the ready queue of Pi need to be migrated to the ready queue of Pt; if W < 0, some threads need to be migrated from the ready queue of Pt to Pi to reach load balance; if W = 0, the load is balanced and no threads need to be migrated.

Compared with the background art, the present invention has the following beneficial effects:

The present invention is a load balancing method for multi-core processor operating systems. Its main function is to construct scheduling domains and to balance the load both within and between scheduling domains, so that the threads of a multithreaded program running on a multi-core processor operating system are distributed evenly across different processor cores under the scheduling of the operating system, thereby improving the execution efficiency of the processor cores.

(1) Efficiency. Because the operating system balances the load, multiple threads are distributed evenly over multiple processor cores to run, which improves operating efficiency.

(2) Practicality. Load balancing raises the degree of parallelism of thread execution and reduces thread migration, and repeated tests have shown that it has good practicality.

Description of drawings

Fig. 1 is a schematic diagram of the implementation process of the present invention;

Fig. 2 is a schematic diagram of scheduling domain construction for a four-core, two-way multiprocessor;

Fig. 3 is a schematic diagram of thread allocation when the load of a four-core, two-way multiprocessor is balanced;

Fig. 4 is a schematic diagram of thread allocation when the load within a scheduling domain of a four-core, two-way multiprocessor is unbalanced;

Fig. 5 is a schematic diagram of thread allocation when the load between scheduling domains of a four-core, two-way multiprocessor is unbalanced.

Embodiment

The present invention is a method for implementing load balancing in a multi-core processor operating system. Its specific implementation process is described below with reference to Fig. 1.

1) Scheduling domain construction:

A thread usually means a lightweight process that shares resources, and in modern operating system scheduling the thread is the basic unit of task scheduling. In the present invention the basic unit of scheduling is the thread, and load refers to the threads running on the different processor cores. A multi-core processor has three typical characteristics: multiple processor cores, an L2 cache shared between processor cores, and direct communication between processor cores through registers. On such a processor, the L1 cache is private to each processor core.

A scheduling domain is a set of processor cores whose loads need to be balanced against one another. For processor cores that share an L2 cache, when a thread migrates between them, the L2 cache miss cost incurred is the same as if the task had not been migrated. Scheduling domains are therefore constructed by placing the processor cores that share an L2 cache in the same scheduling domain. Scheduling domains form a level hierarchy. The top level (level n, if there are n levels of scheduling domains) contains all processor cores, while a bottom-level scheduling domain (level 0, the base scheduling domain) contains the processor cores whose load relationship is closest for scheduling purposes. If two processors are in the same scheduling domain, load balancing must be performed between them. If scheduling domains are related as parent and child, ancestor and descendant, or siblings, load balancing can be performed between the processor cores of those domains. Fig. 2 takes a four-core, two-way multiprocessor as an example to illustrate the construction of scheduling domains. Processor core 0 and processor core 1 form one base scheduling domain, and processor core 2 and processor core 3 form another. The two base scheduling domains together make up the scheduling domain one level up.

Each processor core is assigned a logical ID at startup; these logical IDs increase from 0. During processor core initialization, each processor core is visited, and the processor cores that share an L2 cache are placed in the same scheduling domain. In this way, several different scheduling domains are formed.
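The construction rule can be illustrated with a minimal Python sketch. It assumes that the mapping from logical core ID to shared L2 cache is already known; the function name `build_scheduling_domains` and the input dictionary are hypothetical and do not come from the patent, which does not say how cache sharing is discovered.

```python
from collections import defaultdict

def build_scheduling_domains(l2_cache_of_core):
    """Group logical core IDs into base scheduling domains (dispatching zones).

    l2_cache_of_core: dict mapping logical core ID -> ID of the L2 cache it uses.
    This input is an assumption; a real kernel would read it from the hardware.
    Returns (base_domains, top_domain).
    """
    by_cache = defaultdict(list)
    # Visit every core in logical-ID order, as during core initialization.
    for core_id in sorted(l2_cache_of_core):
        by_cache[l2_cache_of_core[core_id]].append(core_id)
    # Cores sharing one L2 cache form one base (level 0) domain.
    base_domains = [tuple(cores) for _, cores in sorted(by_cache.items())]
    # The top-level domain contains all processor cores.
    top_domain = tuple(sorted(l2_cache_of_core))
    return base_domains, top_domain

# Four cores and two shared L2 caches, matching the Fig. 2 example.
base, top = build_scheduling_domains({0: 0, 1: 0, 2: 1, 3: 1})
print(base)  # [(0, 1), (2, 3)]
print(top)   # (0, 1, 2, 3)
```

With this input, cores 0 and 1 form one base domain, cores 2 and 3 form the other, and the top-level domain spans all four cores.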

2) Load vector calculation:

The load vector is the yardstick used to compare loads. To assess load balancing effectively, a load vector is needed. The load vector is defined as the basic unit for judging the load of a processor core.

The present invention uses resource utilization and run queue length as the factors for calculating the load vector.

Formula (1) gives the utilization FCPU of a processor core, where Tused is the time the processor spends computing and Tidle is the processor idle time.

FCPU = Tused / (Tidle + Tused)    (1)

Formula (2) gives the way the load vector is calculated in the present invention, where FCPU is the processor utilization calculated by formula (1) and Frun_queue is the length of the run queue of the processor core.

Fload = (FCPU + 1) * Frun_queue    (2)
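Formulas (1) and (2) translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and returning a utilization of 0 when no time has been sampled is an assumption that the patent does not address.

```python
def cpu_utilization(t_used, t_idle):
    """Formula (1): FCPU = Tused / (Tidle + Tused)."""
    total = t_idle + t_used
    if total == 0:
        return 0.0  # assumption: no samples yet is treated as zero utilization
    return t_used / total

def load_vector(t_used, t_idle, run_queue_length):
    """Formula (2): Fload = (FCPU + 1) * Frun_queue."""
    return (cpu_utilization(t_used, t_idle) + 1) * run_queue_length

print(load_vector(75, 25, 4))  # (0.75 + 1) * 4 = 7.0
print(load_vector(50, 50, 0))  # 0.0, an empty run queue gives zero load
```

Because FCPU lies between 0 and 1, Fload always lies between Frun_queue and 2 * Frun_queue, so the run queue length dominates and utilization acts as a weighting factor.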

3) Load balancing detection:

Load detection means that the operating system checks whether the load is unbalanced between processor cores. The load balance check is the key part of implementing load balancing in multi-core operating system scheduling. For a scheduling domain Pset = {P1, P2, ..., Pn} of processor cores, where P1, P2, ..., Pn are the processor cores in Pset, any processor core Pi in Pset can check whether the load is unbalanced between itself and the other processor cores.

Each processor core performs its own load check. Load checks take place at thread allocation, when the processor is idle, and at fixed time intervals.

The load balancing check proceeds as follows:

(1) Pi checks the processor cores Pj, Pj+1, ..., Pj+k in its own scheduling domain. If the load is unbalanced, the check returns the processor core P whose load vector differs most from that of Pi, together with the sign W of the load vector difference with Pi, and the check ends.

(2) If the load of the processor cores in the same scheduling domain is balanced, Pi checks the load in the other scheduling domains at the same level. For a same-level scheduling domain it suffices to check any one of its processors Pm. When the load is unbalanced, the check returns the first processor core P found to have an unbalanced load and the sign W of the load difference with Pi.

Loads are compared using the load vector described above. A threshold M is defined. For processor cores in the same scheduling domain, if the load vector difference is less than the threshold aM (a < 1), the load is balanced; otherwise the load is unbalanced. For processors in different scheduling domains, the threshold M is used for the comparison. The choice of the threshold M and the factor a depends on L2 cache hits and misses, the time to transfer tasks between scheduling queues, the scheduling time of the scheduler, and so on, and they can be set according to the application environment.
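The two-step check and its thresholds can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function and parameter names are hypothetical, the first core of each other domain is taken as its representative Pm, and W is assumed to be +1 when Pi carries the larger load and -1 when it carries the smaller one.

```python
def sign(x):
    return (x > 0) - (x < 0)

def check_load_balance(i, loads, base_domains, m, a):
    """Return (core, w) for an imbalance seen from core i, or (None, 0).

    loads: dict core ID -> load vector Fload
    base_domains: list of tuples of core IDs (level 0 scheduling domains)
    m, a: threshold M and factor a, assumed to satisfy m > 0 and 0 < a < 1
    w: sign of loads[i] - loads[core] (assumed sign convention)
    """
    own = next(d for d in base_domains if i in d)
    # Step 1: same domain, threshold a*M, report the largest difference.
    peers = [p for p in own if p != i]
    if peers:
        worst = max(peers, key=lambda p: abs(loads[i] - loads[p]))
        diff = loads[i] - loads[worst]
        if abs(diff) >= a * m:
            return worst, sign(diff)
    # Step 2: other same-level domains, one representative each, threshold M.
    for d in base_domains:
        if d is own:
            continue
        rep = d[0]  # assumption: any one core of the domain may be checked
        diff = loads[i] - loads[rep]
        if abs(diff) >= m:
            return rep, sign(diff)
    return None, 0

loads = {0: 7.0, 1: 2.0, 2: 4.0, 3: 4.0}
domains = [(0, 1), (2, 3)]
print(check_load_balance(0, loads, domains, m=4.0, a=0.5))  # (1, 1)
print(check_load_balance(2, loads, domains, m=4.0, a=0.5))  # (None, 0)
```

In the example, core 0 differs from core 1 by 5.0, which exceeds aM = 2.0, so an imbalance is reported inside the domain. Core 2 is balanced with core 3 and differs from the representative of the other domain by only 3.0, which is below M = 4.0.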

4) Thread allocation:

Thread allocation means distributing threads evenly across the processor cores. When a new thread is created, if the load balance condition of the processor cores holds, the thread preferentially continues to execute on the processor core where its parent thread executes. Accordingly, a cpu_mask is kept in the task descriptor of the thread to identify the set of processor cores on which the thread may run. It restricts the processors on which the thread can execute: once this value is set, the thread can execute only on the set of processor cores specified by cpu_mask, which achieves static load balancing.

After a thread Tnew is created, the allocation flow is as follows: when the state of thread Tnew becomes runnable, the load balance check of processor core Pparent, on which the parent thread Tparent resides, is invoked. If the load is balanced, the thread is placed in the run queue of the processor core where the parent thread resides. Otherwise, the thread is inserted into the run queue of the least loaded processor core Pload_least.
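The allocation flow can be sketched as below. The sketch is a simplified illustration: `is_balanced` is a hypothetical stand-in for the load balance check of step 3), run queues are modeled as plain lists, and cpu_mask restrictions are ignored.

```python
def allocate_new_thread(thread, parent_core, run_queues, loads, is_balanced):
    """Place a newly runnable thread and return the chosen core.

    run_queues: dict core ID -> list of threads (modeled run queues)
    loads: dict core ID -> load vector Fload
    is_balanced: callable(core) -> bool, hypothetical stand-in for step 3)
    """
    if is_balanced(parent_core):
        target = parent_core                 # stay on the parent's core
    else:
        target = min(loads, key=loads.get)   # least loaded core, Pload_least
    run_queues[target].append(thread)
    return target

run_queues = {core: [] for core in range(4)}
loads = {0: 1.0, 1: 5.0, 2: 5.0, 3: 6.0}
print(allocate_new_thread("Tnew", 2, run_queues, loads, lambda core: True))    # 2
print(allocate_new_thread("Tnew2", 2, run_queues, loads, lambda core: False))  # 0
```

Keeping the thread on the parent's core in the balanced case avoids a migration; the least loaded core is chosen only when the check reports an imbalance.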

A four-core, two-way multi-core processor is used below as an example to illustrate the thread allocation policy. Processor core 0 and processor core 1 are in one scheduling domain, and processor core 2 and processor core 3 are in another. The parent thread of the new thread is ready on processor core 2. In Fig. 3 the load is balanced, and Tnew is assigned to processor core 1; in Fig. 4 the load within the scheduling domain is unbalanced, and Tnew is assigned to processor core 0; in Fig. 5 the load within each scheduling domain is balanced but the load between scheduling domains is unbalanced, and Tnew is assigned to processor core 2.

In modern operating systems, threads are created very quickly. If load balance were checked every time a thread is created, the cost would outweigh the benefit. Each processor core therefore checks load balance at fixed time intervals, and within one interval all new threads are assigned to the same processor core. This keeps the cost of load balance detection under control.

5) Dynamic load balancing at run time:

While running, a thread may enter a wait queue because of insufficient resources, user interrupts, runtime exceptions, the need to communicate, and similar situations, and page faults during execution also cause a thread to wait. The conditions that prevent a thread from continuing to run normally are unpredictable, so the remaining run time of a thread varies dynamically. Thread allocation alone is therefore not enough to keep the load balanced between processors; dynamic load balancing is also needed while threads are running. Dynamic load balancing at run time is achieved mainly through thread migration between processor cores.

Because the cores of a multi-core processor share an L2 cache, migrating a thread between processor cores of the same scheduling domain costs much less in cache misses than migrating it between different scheduling domains.

For a set of processor cores Pset, every Pi in Pset has its own independent load balance check policy. The check policy used here is the same as in step 3). If Pi has threads running, Pi invokes the load check at a fixed time interval; if Pi is idle, the interval is reduced so that checks run at as short an interval as possible. If all processor cores are idle, the interval for checking load balance is adjusted.

When processor core Pi finds a load imbalance, the load balance check policy returns the processor Pt whose load is unbalanced with that of Pi and the load comparison value W. If W > 0, the load of Pi is greater than that of Pt, and some of the threads in the ready queue of Pi need to be migrated to the ready queue of Pt; if W < 0, some threads need to be migrated from the ready queue of Pt to Pi to reach load balance. If W = 0, the load is balanced and no threads need to be migrated.
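The migration direction chosen by W can be sketched as follows. The text only says that "some" threads are migrated, so the `count` parameter and the choice to take threads from the tail of the ready queue are assumptions made for illustration; the function name is hypothetical.

```python
from collections import deque

def migrate_by_sign(pi_ready, pt_ready, w, count=1):
    """Move up to `count` threads between ready queues according to W.

    w > 0: migrate from Pi's ready queue to Pt's
    w < 0: migrate from Pt's ready queue to Pi's
    w == 0: balanced, nothing is moved
    Returns the number of threads actually moved.
    """
    if w > 0:
        src, dst = pi_ready, pt_ready
    elif w < 0:
        src, dst = pt_ready, pi_ready
    else:
        return 0
    moved = 0
    while moved < count and src:
        dst.append(src.pop())  # assumption: take threads from the queue tail
        moved += 1
    return moved

pi_ready = deque(["t1", "t2", "t3"])
pt_ready = deque(["t4"])
print(migrate_by_sign(pi_ready, pt_ready, w=1, count=1))  # 1
print(list(pi_ready), list(pt_ready))  # ['t1', 't2'] ['t4', 't3']
```

A real scheduler would also filter the candidate threads before moving them, for example by affinity and cache state.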

For various reasons, many threads cannot be moved during thread migration, so load balance is not necessarily reached. Dynamic load balancing therefore continues against other unbalanced processors until the load is balanced.

Each processor core Pi checks load balance on its own, and its balancing target is the load between Pi and the processor core whose load differs most from its own. Because every processor core balances its load against the processor whose load differs most from its own, the dynamic load balancing of the individual processor cores can reach a global load balance.

When Pi migrates threads from processor Pt, a selected thread Tselected is not migrated if any of the following conditions holds:

(1) Thread Tselected is currently executing on the target processor core.

(2) Thread Tselected is cache-hot, that is, the thread has been running within a recent time period.

(3) The set of processor cores indicated by the cpu_mask of thread Tselected does not include processor Pi, in which case Tselected cannot be migrated by Pi.
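The three exclusion conditions can be expressed as a single predicate. The Python sketch below is a hedged illustration: the `Thread` fields, the `cache_hot_window` parameter, and the use of timestamps to decide cache hotness are all assumptions, since the patent does not define how "recently" is measured.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    running: bool         # condition (1): currently executing on a core
    last_ran: float       # time the thread last ran (hypothetical field)
    cpu_mask: frozenset   # cores on which the thread is allowed to run

def can_migrate(thread, pi, now, cache_hot_window):
    """Return True if Pi may migrate the selected thread to itself."""
    if thread.running:
        return False      # (1) the thread is currently executing
    if now - thread.last_ran < cache_hot_window:
        return False      # (2) cache-hot: it ran within the recent window
    if pi not in thread.cpu_mask:
        return False      # (3) cpu_mask does not include Pi
    return True

t = Thread("worker", running=False, last_ran=100.0, cpu_mask=frozenset({0, 1}))
print(can_migrate(t, pi=1, now=100.5, cache_hot_window=1.0))  # False, cache-hot
print(can_migrate(t, pi=1, now=105.0, cache_hot_window=1.0))  # True
print(can_migrate(t, pi=2, now=105.0, cache_hot_window=1.0))  # False, not in cpu_mask
```

Checking the cheap conditions first keeps the predicate inexpensive when it is applied to every candidate thread in a ready queue.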

Claims

1. A method for implementing load balancing in a multi-core processor operating system, characterized in that:

1) Scheduling domain construction:

2) Load vector calculation:

FCPU = Tused / (Tidle + Tused)    (1)

Fload = (FCPU + 1) * Frun_queue    (2)

3) Load balancing detection:

The load balancing check proceeds as follows:

4) Thread allocation:

After a thread Tnew is created, the allocation flow is as follows:

5) Dynamic load balancing at run time:

For a set of processor cores Pset, every Pi in Pset has its own independent load balance check policy, and the check policy used here is the same as in step 3);

If Pi has threads running, Pi invokes the load check at a fixed time interval; if Pi is idle, the interval is reduced so that checks run at as short an interval as possible; if all processor cores are idle, the interval for checking load balance is adjusted;