Many-core system processor resource scheduling method based on thermal-aware dynamic task migration
Technical field
The present invention relates to the field of communication technology, and in particular to a many-core system processor resource scheduling method based on thermal-aware dynamic task migration.
Background art
Many-core chips are among the processor components used in fields such as cloud computing and mobile computing, and many-core processors are widely deployed in servers, data centers and similar settings. As computer systems develop, the many-core chip is becoming an increasingly important platform. As applications demand more computation, the integration density and performance of many-core chips keep rising, and with them come rapidly increasing chip power density and temperature, so that temperature has become a key factor limiting chip performance. Prolonged excessive temperature degrades the reliability and service life of a many-core chip. Because heat dissipation is limited and the system must operate safely, a power constraint is generally imposed on the chip. To satisfy the power constraint while maintaining high computing performance, some of the processors on the chip have to be switched off, which is known as the dark silicon phenomenon.
In a chip system, an inactive idle processor is generally called a bubble. Because bubbles dissipate heat well, some approaches place bubbles around active processors so that those processors can run at a higher frequency and deliver higher computing performance. Such approaches still belong to static task-to-core mapping: because the task mapping positions are fixed, hot spots can still arise while the operating frequency stays constant. Other approaches reduce hot spots through dynamic task migration: when the temperature of the original processor gradually rises above a set temperature threshold, they migrate the thread or task from the overheated core to another core. One class of methods tends to select the processor with the lowest temperature on the whole chip, or a randomly chosen idle processor, as the migration target, which can increase the communication distance considerably. If the tasks of an application communicate heavily, this clearly leads to excessive communication cost. Another class of methods always migrates the task to a processor adjacent to the overheated processor. After many such migrations, an application may leave behind a discontiguous free region when it exits the system, so that newly arriving applications, or applications waiting in the queue, cannot be mapped contiguously onto idle processors. Both classes of methods may also face the situation where inter-task communication has to pass through several processor cores that are running other applications, so that the communications of different applications conflict and communication efficiency drops.
Because user requests are diverse and complex, a server system is expected to cope with varying workloads and to respond in the shortest possible time. One task migration method makes every processor that is running a task run overclocked and, when the temperature threshold is reached, moves all tasks together to another contiguous region of idle processors. Although this keeps the communication distance between tasks unchanged, the method is suitable only when the system load is low. When many applications arrive and many processor resources are needed, this low-utilization scheduling method leads to excessively long application waiting times, which offset the performance gain from overclocking and increase the response time.
The invention patent with Authorization Notice No. CN201310059705 discloses "a core resource allocation method, device and many-core system". That patent mainly proposes a core resource allocation method that, according to the number of threads of a user process (the number of idle cores required), merges scattered core partitions into one contiguous core partition and allocates it to the user process, thereby optimizing communication cost. According to the core partition migration cost, the method selects a reference core partition and a slave core partition from at least two scattered core partitions so that the total core partition migration cost is minimal. It then migrates the slave core partition so that its idle cores merge with the idle cores of the reference core partition into one contiguous core partition. That patent consolidates on-chip processor resources mainly from the viewpoint of defragmentation, so that arriving applications can be mapped contiguously. Its shortcoming is that it considers neither the power constraint nor the temperature constraint of the system and does not account for possible temperature peaks or non-uniform temperature distribution, so the system is at risk of overheating.
None of the task migration methods above takes the use of bubbles into account. Under the dark silicon phenomenon, designing dynamic processor resource scheduling for a many-core system that considers the chip temperature constraint and the load at the same time is the key to maintaining high performance.
Summary of the invention
The present invention is a processor resource scheduling method under the dark silicon phenomenon, implemented in a task scheduling simulation system for a two-dimensional network-on-chip many-core system. It avoids thermal risk while keeping a high processor operating frequency and taking communication cost into account, thereby improving system throughput.
To this end, the present invention proposes a many-core system processor resource scheduling method based on thermal-aware dynamic task migration, comprising the following steps:
Step 1: check whether the waiting queue is empty. If it is not empty, map each application in the waiting queue and then go to step 2. If it is empty, end this round of resource scheduling and return to step 1 when the next clock cycle starts.
Step 2: check whether the arrival queue is empty. If it is not empty, go to step 3. If it is empty, no new application has arrived in this clock cycle and no processor resource scheduling is needed; end this round of resource scheduling and return to step 1 when the next clock cycle starts.
Step 3: check whether any application is running in the system. If no application is running, i.e. all N*N processor resources of the system are available, use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and the waiting time under different bubble counts. These estimates are the inputs to the cost function of the branch-and-bound search, which looks for the bubble allocation that gives the applications the shortest overall response time. If applications are running, go to step 4.
Step 4: if applications are running, the regions they occupy split the available processors of the system into a set of discontiguous free regions. Use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and the waiting time under each candidate bubble count. These estimates are the inputs to the cost function of the branch-and-bound search, which looks for the optimal bubble allocation, i.e. the one with the shortest overall response time.
Step 5, application mapping stage: use the first-fit heuristic (First-Fit Heuristic) to select a free region, then select the mapping mode of the application and map it. The mapping modes are the square mapping mode and the communication-first mapping mode.
Step 6, application running stage: select a migration mode for each type of application according to its mapping mode and its own computation-to-communication ratio (Computation communication rate, CCR).
Further, the bubble-performance model is used to calculate the execution time of each application under different bubble counts. It is a polynomial regression model. Let Π_i be the execution time of application i, b_i the number of bubbles contained in the application region, h_i the critical-path hop count of the application, i.e. the hop count of the longest weighted path, c_i the average computation time of the tasks in the application, and t_i the average communication data between the tasks in the application. The bubble-performance model expresses Π_i as a polynomial in these four variables, where n1 to n4 are the polynomial orders and α_k, β_k, γ_k, θ_k are the fitting coefficients of the model, whose values are obtained by the maximum likelihood method. b_i^k is the k-th power of the bubble count of application i, h_i^k is the k-th power of the critical-path hop count of application i, c_i^k is the k-th power of the average task computation time in application i, and t_i^k is the term for the average communication time between every two tasks of application i.
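The formula itself is not reproduced in this text. Read together, the definitions above are consistent with a separable polynomial of the following shape. This is only a sketch: the lower summation limit (k = 1) and the absence of a separate constant term are assumptions, not statements from the source.

```latex
\Pi_i \;=\; \sum_{k=1}^{n_1} \alpha_k\, b_i^{\,k}
      \;+\; \sum_{k=1}^{n_2} \beta_k\, h_i^{\,k}
      \;+\; \sum_{k=1}^{n_3} \gamma_k\, c_i^{\,k}
      \;+\; \sum_{k=1}^{n_4} \theta_k\, t_i^{\,k}
```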
Further, the waiting time model takes the region size of the current application, i.e. the sum of its bubble count and task count, as given, and is used to calculate the waiting time of each application under different bubble counts and the current arrival rate. It is a polynomial regression model. The region area of the current application is denoted R, the total number of processors in the system |T|, and the average task count of the mapped applications |A_i|. Let η_i be the waiting time of the current application. η_i is modeled with the following variables: the average bubble-to-task ratio r_i of the mapped applications, the average execution time e_i of the mapped applications, and the application arrival rate λ. In the model, a_0 is the constant term, z is the polynomial order, r^j is the j-th power of the average bubble-to-task ratio of the mapped applications, e^j is the j-th power of the average execution time of the mapped applications, and λ^j is the j-th power of the current application arrival rate. δ_j, ε_j and μ_j are the fitting coefficients of the corresponding terms, obtained by maximum likelihood regression.
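The formula itself is not reproduced in this text. The definitions above (a constant term, one polynomial order z, and one coefficient per power of each variable) are consistent with the following form. This is a sketch under the assumption that the sum runs from j = 1 to z and that the three variables enter additively.

```latex
\eta_i \;=\; a_0 \;+\; \sum_{j=1}^{z} \left( \delta_j\, r_i^{\,j} \;+\; \varepsilon_j\, e_i^{\,j} \;+\; \mu_j\, \lambda^{\,j} \right)
```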
Further, the task count of an application is the number of processors that the mapped application occupies. An application is a set of small tasks; each task, when it runs, executes a different part of the application's work, so the application executes in parallel. Each mapped task occupies one processor core, so the task count of an application equals the number of processors it occupies while running.
Further, the candidate bubble count is the difference between the total number of processors in each free region and the task count of the application, where the task count equals the number of processors the application occupies while running.
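The relationship between a free region and the bubble counts it can offer can be sketched as follows. The function name `candidate_bubble_counts` is hypothetical, and the sketch assumes that every count from zero up to the difference is a candidate; it also applies the rule, stated elsewhere in this description, that an application's bubble count must not exceed its task count.

```python
def candidate_bubble_counts(region_size: int, n_tasks: int) -> list[int]:
    """Bubble counts one application could receive in one free region."""
    if n_tasks <= 0:
        raise ValueError("an application needs at least one task")
    spare = region_size - n_tasks  # processors left after every task is mapped
    if spare < 0:
        return []  # region too small to host the application at all
    # Assumption: all counts from 0 to the cap are candidates. The cap keeps
    # the bubble count from exceeding the task count.
    return list(range(min(spare, n_tasks) + 1))
```

For a 4-task application, an empty 5*5 system offers up to 4 bubbles, while a 6-core free region offers at most 2.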
Further, step 3 is carried out by a global manager (Global manager). When subsequent applications arrive, the global manager collects all available processor regions in the current system, estimates for each newly arrived application the execution time and waiting time corresponding to these regions, and uses the branch-and-bound algorithm to find the application-to-region assignment with the lowest cost.
Further, the first-fit algorithm proceeds as follows. Idle processors in the system are searched from left to right and from top to bottom, and each is tested as a possible starting point of the application region. A processor qualifies as a starting point if the product of the number of idle processors to its right in the same row and the number of idle processors below it in the same column is greater than or equal to the area of the application region. As soon as the first qualifying processor is found, the free region whose top-left corner is that processor is allocated to the application. If no qualifying processor is found, the application is placed in the waiting queue. Once the region is chosen, the application is mapped into it with the selected mapping mode. The mapping mode is selected as follows: when the ratio of bubbles to tasks of the application is 1:1, the square mapping mode is selected; when the ratio is not 1:1, the communication-first mapping mode is selected.
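The start-point search described above can be sketched as follows. The names `idle_run_right`, `idle_run_down` and `first_fit_start` are illustrative, and the sketch assumes that "idle processors to the right" and "below" mean the contiguous idle run that begins at, and includes, the candidate core itself.

```python
from typing import Optional

def idle_run_right(idle: list[list[bool]], r: int, c: int) -> int:
    """Length of the contiguous idle run in row r starting at column c."""
    n = 0
    while c + n < len(idle[r]) and idle[r][c + n]:
        n += 1
    return n

def idle_run_down(idle: list[list[bool]], r: int, c: int) -> int:
    """Length of the contiguous idle run in column c starting at row r."""
    n = 0
    while r + n < len(idle) and idle[r + n][c]:
        n += 1
    return n

def first_fit_start(idle: list[list[bool]], region_area: int) -> Optional[tuple[int, int]]:
    """Return the first core (row, col) that passes the start-point test, or None."""
    for r in range(len(idle)):            # top to bottom
        for c in range(len(idle[r])):     # left to right
            if not idle[r][c]:
                continue
            if idle_run_right(idle, r, c) * idle_run_down(idle, r, c) >= region_area:
                return (r, c)
    return None  # caller would put the application into the waiting queue
```

The test mirrors the product condition literally; it does not check that every core inside the resulting rectangle is idle, so a fuller implementation would need that extra check.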
Further, there are three migration modes: the square migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode. Step 6 uses the mapping mode of the application and its own computation-to-communication ratio as the selection criteria. The bubble count of an application must not exceed its task count. The migration mode is selected as follows:
If the bubble count equals the task count, the square migration mode is selected for the application so that its communication distance stays unchanged during migration, and the processors run overclocked.
If the bubble count does not equal the task count, the communication-first mapping is selected for the application. The threshold is 2h/(h-1), where h is the critical-path hop count of the application, i.e. the hop count of the longest weighted path. Applications whose CCR is greater than the threshold use coldest-core-in-region migration, and applications whose CCR is less than the threshold use coldest-neighbor-core-in-region migration.
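The selection rule above can be written as a short decision function. The names `ccr_threshold` and `select_migration_mode` and the returned mode labels are hypothetical. The source does not say what happens when the CCR equals the threshold exactly; this sketch assumes that case falls to the coldest-neighbor mode.

```python
def ccr_threshold(h: int) -> float:
    """Threshold 2h/(h-1), where h is the critical-path hop count."""
    if h <= 1:
        raise ValueError("2h/(h-1) is undefined for h <= 1")
    return 2 * h / (h - 1)

def select_migration_mode(n_bubbles: int, n_tasks: int, ccr: float, h: int) -> str:
    """Pick one of the three migration modes for an application."""
    if n_bubbles > n_tasks:
        raise ValueError("bubble count must not exceed task count")
    if n_bubbles == n_tasks:
        return "square"  # 1:1 bubbles to tasks, whole-block migration
    if ccr > ccr_threshold(h):
        return "coldest_core_in_region"  # computation-dominated application
    # Assumption: CCR equal to the threshold is treated like CCR below it.
    return "coldest_neighbor_in_region"  # communication-dominated application
```

With h = 6 the threshold evaluates to 2.4, so an application with 2 bubbles, 4 tasks and a CCR of 3.0 would use coldest-core-in-region migration.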
Further, the three migration modes are implemented as follows:
(1) Square migration mode. The application must use the square mapping, and the size of the application region must be at least twice the task count. When the mode is invoked, all tasks are unbound from the processors they are currently mapped to and are mapped, in order of task number (ID), onto the other free block of the application region.
(2) Coldest-core-in-region migration mode. When the mode is invoked, the task ID reported by hot spot detection is used to locate the processor that exceeds the temperature threshold, i.e. the overheated processor, and the processor core with the lowest temperature in the application region is then searched for. If this coldest core satisfies the migration condition of the mode, i.e. it is a bubble (an idle processor with no task mapped on it), the task is unbound from its original, overheated processor and remapped onto the coldest core. If the migration condition cannot be satisfied, i.e. the coldest core is not a bubble, the overheated processor is downclocked.
(3) Coldest-neighbor-core-in-region migration mode. When the mode is invoked, the task ID reported by hot spot detection is used to locate the overheated processor, and the core with the lowest temperature among its 8 adjacent cores is searched for. The migration condition of the mode is met if that core either has a task mapped on it but a temperature not higher than two thirds of the threshold temperature, or is a bubble, i.e. an idle processor; task migration is then performed. If the core has a task, the tasks on it and on the overheated processor are exchanged, i.e. two unbind-remap operations are performed. If the core is idle, a single unbind-remap is performed. If the migration condition cannot be satisfied, i.e. the temperature of the core is higher than two thirds of the threshold temperature, the overheated processor is downclocked.
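The decision made in mode (3) can be sketched as below. The function `coldest_neighbor_action` and its return labels are hypothetical, and the sketch assumes the grids passed in cover only the application's own region, so every in-bounds neighbor belongs to that region.

```python
from typing import Optional

Core = tuple[int, int]

def coldest_neighbor_action(
    temps: list[list[float]],
    task_of: list[list[Optional[int]]],
    hot: Core,
    threshold: float,
) -> tuple[str, Optional[Core]]:
    """Decide how to relieve the overheated core at `hot` (row, col).

    task_of holds a task ID per core, or None for a bubble.
    Returns ("migrate", target), ("swap", target) or ("downclock", None).
    """
    rows, cols = len(temps), len(temps[0])
    r0, c0 = hot
    neighbors = [
        (r0 + dr, c0 + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r0 + dr < rows and 0 <= c0 + dc < cols
    ]
    if not neighbors:
        return ("downclock", None)
    # Coldest of the (up to) 8 adjacent cores.
    tr, tc = min(neighbors, key=lambda p: temps[p[0]][p[1]])
    if task_of[tr][tc] is None:
        return ("migrate", (tr, tc))      # bubble: single unbind-remap
    if temps[tr][tc] <= threshold * 2 / 3:
        return ("swap", (tr, tc))         # busy but cool: two unbind-remaps
    return ("downclock", None)            # no suitable neighbor
```

Only the coldest neighbor is examined, as in the description; a variant that falls back to the next-coldest core would be a different policy.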
Further, the branch-and-bound algorithm is as follows. Different numbers of bubbles are allocated to each application, and at every node the current overall response time is computed as the cost function. The overall response time σ is the maximum, over the applications that have been allocated bubbles at that node, of the sum of waiting time and execution time:
σ = max{η_i + Π_i}, i ∈ {0, 1, ..., n}
where η_i is the waiting time of application i and Π_i is the execution time of application i. Each time, the branch whose lower bound grows most slowly is expanded first, and upper-level nodes whose cost, i.e. response time, is higher than that of a node in the next level are pruned. The result is the bubble allocation with the shortest overall response time, i.e. the partition into application regions.
Further, after an application arrives, its execution time under different bubble counts is estimated with the bubble-performance model. The bubble count of an application must not exceed its task count. When the bubble count equals the task count, the execution time under every migration mode is estimated. When the bubble count is less than the task count, the execution time of an application whose CCR is higher than the threshold is estimated under the coldest-core-in-region migration mode, which favors computation, and the execution time of an application whose CCR is lower than the threshold is estimated under the coldest-neighbor-core migration mode, which favors communication.
Further, if a region contains more processors than the application has tasks, the frequency of the active processors in the region is computed from the bubble-to-task ratio in the region after mapping, and the processors are run overclocked.
Further, during the application running stage, the system checks for hot spots in every control interval. When a hot spot appears, task migration is carried out within the corresponding application region using the selected migration mode.
Further, in the communication-first mapping mode, the node with the largest communication weight is taken as the first task and mapped onto the processor core nearest the geometric center of the application region, and the nodes connected to it are mapped in turn onto the available processor cores with the shortest Manhattan distance from it. Next, an already mapped parent node that still has unmapped tasks connected to it is chosen, and those connected nodes are mapped in turn onto the available processor cores with the shortest Manhattan distance from it. This continues until all tasks of the application are mapped onto processors.
Further, in the square mapping mode, the square root of the task count of the application, rounded down, is taken as the side length, and all tasks are mapped contiguously into a square region.
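A possible reading of the square mapping is sketched below. The function `square_mapping` is hypothetical. The sketch assumes tasks are laid out row by row with the rounded-down square root as the row width, so a task count that is not a perfect square spills into extra rows; the source does not say how that case is handled.

```python
import math

def square_mapping(n_tasks: int, origin: tuple[int, int] = (0, 0)) -> dict[int, tuple[int, int]]:
    """Map task IDs 0..n_tasks-1 to (row, col) cores, row by row."""
    if n_tasks < 1:
        raise ValueError("an application needs at least one task")
    side = math.isqrt(n_tasks)  # floor of the square root
    r0, c0 = origin
    # Assumption: row-major fill with `side` tasks per row.
    return {task: (r0 + task // side, c0 + task % side) for task in range(n_tasks)}
```

For 4 tasks this yields a 2*2 block starting at the origin.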
A control interval equal to the sampling interval is chosen. After each control interval, the system checks whether any processor temperature is higher than the given temperature threshold. If not, no task migration is performed. If a processor core exceeds the temperature threshold, the task mapped on the hot spot is traced to its application, and task migration is carried out with the selected mode in the event class instance bound to that application. Hot spot detection returns the ID of the overheated application and the ID of its overheated task.
The invention establishes the bubble-performance model and the waiting time model of applications, designs three migration modes, and extends a two-dimensional network-on-chip many-core system. In resource allocation, the dark silicon phenomenon is exploited to obtain better computing performance. In task migration, a migration mode is selected for each type of application according to its mapping mode and its own computation-to-communication ratio (Computation communication rate, CCR). A higher CCR generally means that computing performance contributes more to overall performance; a lower CCR means that communication performance matters more.
Compared with the prior art, the present invention has the following beneficial effects:
1. In this scheduling method, bubbles serve not only for heat dissipation but also as migration targets for tasks on overheated processors.
2. The system responds dynamically to the current application arrival rate. When few applications arrive, it makes full use of the dark silicon phenomenon to improve application performance; when many arrive, it raises system utilization and keeps application waiting times from growing too long. Overall, compared with other conventional task migration methods, it improves system throughput and reduces the average response time of applications.
Brief description of the drawings
Fig. 1 is a structural block diagram of the simulation system as extended by the present invention.
Fig. 2 is an example of the branch-and-bound algorithm allocating bubbles in the present invention.
Fig. 3 is a flow chart of the processor resource scheduling method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
In this embodiment of the many-core system processor resource scheduling method based on thermal-aware dynamic task migration, the CCR threshold of application 1 is 2.6, the CCR threshold of application 2 is 2.6, and the CCR threshold of application 3 is 2.4. The system is a 5*5 mesh with 25 processor cores in total; the clock frequency of each processor core is 1 GHz, so the clock period is 1 nanosecond.
As shown in Fig. 3, the method comprises the following steps:
Step 1: check whether the waiting queue is empty. If it is not empty, map each application in the waiting queue and then go to step 2. If it is empty, end this round of resource scheduling and wait for the next clock cycle.
Step 2: check whether the arrival queue is empty. If it is not empty, go to step 3. If it is empty, no new application has arrived in this clock cycle and no processor resource scheduling is needed; end this round of resource scheduling and return to step 1 when the next clock cycle starts.
Step 3: check whether any application is running in the system. If no application is running, i.e. all 5*5 processor resources of the system are available, use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and the waiting time under different bubble counts. These estimates are the inputs to the cost function of the branch-and-bound search, which looks for the optimal bubble allocation. If applications are running, go to step 4.
Step 4: if applications are running, the regions they occupy split the available processors of the system into a set of discontiguous free regions. Use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and the waiting time under each candidate bubble count (the candidate bubble count is the difference between the total number of processors in each free region and the task count of the application). These estimates are the inputs to the cost function of the branch-and-bound search, which looks for the optimal bubble allocation, i.e. the one with the shortest overall response time.
Step 5, application mapping stage: use the first-fit heuristic (First-Fit Heuristic) to select a free region, then select the mapping mode of the application and map it. The mapping modes are the square mapping mode and the communication-first mapping mode.
Step 6, application running stage: select a migration mode for each type of application according to its mapping mode and its own computation-to-communication ratio (Computation communication rate, CCR).
Further, the bubble-performance model is used to calculate the execution time of each application under different bubble counts. It is a polynomial regression model. Let Π_i be the execution time of application i, b_i the number of bubbles contained in the application region, h_i the critical-path hop count of the application, i.e. the hop count of the longest weighted path, c_i the average computation time of the tasks in the application, and t_i the average communication data between the tasks in the application. The bubble-performance model expresses Π_i as a polynomial in these four variables, where n1 to n4 are the polynomial orders and α_k, β_k, γ_k, θ_k are the model coefficients, whose values are obtained by the maximum likelihood method. b_i^k is the k-th power of the bubble count of application i, h_i^k is the k-th power of the critical-path hop count of application i, c_i^k is the k-th power of the average task computation time in application i, and t_i^k is the term for the average communication time between every two tasks of application i.
Further, the waiting time model takes the region size of the current application, i.e. the sum of its bubble count and task count, as given, and is used to calculate the waiting time of each application under different bubble counts and the current arrival rate. It is a polynomial regression model. The region area of the current application is denoted R, the total number of processors in the system |T|, and the average task count of the mapped applications |A_i|. Let η_i be the waiting time of the current application. η_i is modeled with the following variables: the average bubble-to-task ratio r_i of the mapped applications, the average execution time e_i of the mapped applications, and the application arrival rate λ. In the model, a_0 is the constant term, z is the polynomial order, r^j is the j-th power of the average bubble-to-task ratio of the mapped applications, e^j is the j-th power of the average execution time of the mapped applications, and λ^j is the j-th power of the current application arrival rate. δ_j, ε_j and μ_j are the fitting coefficients of the corresponding terms, obtained by maximum likelihood regression.
Further, step 3 is carried out by a global manager (Global manager). That is, when subsequent applications arrive, the global manager collects all available processor regions in the current system, estimates for each newly arrived application the execution time and waiting time corresponding to these regions, and uses the branch-and-bound algorithm to find the application-to-region assignment with the lowest cost.
Further, the first-fit algorithm proceeds as follows. Idle processors in the system are searched from left to right and from top to bottom, and each is tested as a possible starting point of the application region. A processor qualifies as a starting point if the product of the number of idle processors to its right in the same row and the number of idle processors below it in the same column is greater than or equal to the area of the application region. As soon as the first qualifying processor is found, the free region whose top-left corner is that processor is allocated to the application. If no qualifying processor is found, the application is placed in the waiting queue. Once the region is chosen, the application is mapped into it with the selected mapping mode.
The mapping mode is selected as follows: when the ratio of bubbles to tasks of the application is 1:1, the square mapping mode is selected; when the ratio is not 1:1, the communication-first mapping mode is selected.
Further, there are three migration modes: the square migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode. Step 6 uses the mapping mode of the application and its own computation-to-communication ratio as the selection criteria. The bubble count of an application must not exceed its task count. The migration mode is selected as follows:
If the bubble count equals the task count, the square migration mode is selected for the application so that its communication distance stays unchanged during migration, and the processors run overclocked.
If the bubble count does not equal the task count, the communication-first mapping is selected for the application. Applications whose CCR is greater than the threshold use coldest-core-in-region migration, and applications whose CCR is less than the threshold use coldest-neighbor-core-in-region migration.
Further, the three migration modes are implemented as follows:
(1) Square migration mode. The application must use the square mapping, and the size of the application region must be at least twice the task count. When the mode is invoked, all tasks are unbound from the processors they are currently mapped to and are remapped, in order, onto the other free block of the application region.
(2) Coldest-core-in-region migration mode. When the mode is invoked, the task ID reported by hot spot detection is used to locate the overheated processor, and the processor core with the lowest temperature in the application region is then searched for. If this coldest core satisfies the migration condition, i.e. it is a bubble with no task mapped on it, the task is unbound from the overheated processor and remapped onto the coldest core. If the migration condition cannot be satisfied, i.e. the coldest core is not a bubble, the overheated processor is downclocked.
(3) Coldest-neighbor-core-in-region migration mode. When the mode is invoked, the task ID reported by hot spot detection is used to locate the overheated processor, and the core with the lowest temperature among its 8 adjacent cores is searched for. The migration condition is met if that core either has a task mapped on it but a temperature not higher than two thirds of the threshold temperature, or is a bubble; the core is then selected as the target processor of the task migration. If the target processor has a task, the tasks on it and on the overheated processor are exchanged with two unbind-remap operations: task X is unbound from the overheated processor, task Y is unbound from the target processor, task X is remapped onto the target processor, and task Y is remapped onto the overheated processor. If the core is an idle processor, a single unbind-remap is performed. If the migration condition cannot be satisfied, i.e. the temperature of the core is higher than two thirds of the threshold temperature, the overheated processor is downclocked.
Further, after an application arrives, its execution time under different bubble counts is estimated with the bubble-performance model. The bubble count of an application must not exceed its task count. When the bubble count equals the task count, the execution time under every migration mode is estimated. When the bubble count is less than the task count, the execution time of an application whose CCR is higher than the threshold is estimated under the coldest-core-in-region migration mode, which favors computation, and the execution time of an application whose CCR is lower than the threshold is estimated under the coldest-neighbor-core migration mode, which favors communication.
Further, if a region contains more processors than the application has tasks, the frequency of the active processors in the region is computed from the bubble-to-task ratio in the region after mapping, and the processors are run overclocked.
Further, during the application run phase, the system detects hot spots in each control interval. When a hot spot appears, task migration is carried out in the corresponding application region using the selected migration mode.
Further, in the communication-first mapping mode, the node with the largest communication weight is taken as the preferred task and mapped onto the processor core nearest the geometric center of the application region, and the nodes connected to it are mapped in turn onto the available processor cores at the shortest Manhattan distance from it. Next, an already-mapped parent node of the unmapped tasks is chosen, and the nodes connected to it are mapped in turn onto the available processor cores at the shortest Manhattan distance from it. This continues until all tasks of the application are mapped onto processors.
Further, in the square mapping mode, the square root of the number of application tasks, rounded down, is taken as the side length, and all tasks are mapped contiguously in a rectangular region.
The control interval is chosen to be the same as the sampling interval. After each control interval, the system checks whether any processor temperature is higher than the given temperature threshold. If not, no task migration is performed. If there is a processor core exceeding the temperature threshold of 80 degrees Celsius, the task mapped on the hot spot is used to locate the corresponding application, and task migration is carried out in the selected mode within the event-class instance bound to that application. Hot-spot detection provides the ID of the overheated application and the ID of its overheated task.
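A hot-spot check of this kind might look like the sketch below. The function name and the dictionary layout are assumptions made for the example; only the 80-degree threshold and the returned pair of application ID and task ID come from the description.

```python
from typing import Dict, List, Optional, Tuple

Core = Tuple[int, int]  # (row, column); hypothetical layout


def detect_hot_spots(temps: Dict[Core, float],
                     task_of: Dict[Core, Optional[int]],
                     app_of_task: Dict[int, int],
                     threshold: float = 80.0) -> List[Tuple[int, int]]:
    """Return (application ID, task ID) for every overheated, occupied core."""
    hot = []
    for core, temperature in temps.items():
        task = task_of.get(core)
        if temperature > threshold and task is not None:
            hot.append((app_of_task[task], task))
    return hot
```

An empty result means no task migration is needed in this control interval.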
The present invention is implemented in a simulation system. Referring to Fig. 1, the simulation system includes an application generator, an event-driven simulated many-core system, and the HotSpot temperature simulator. HotSpot is a thermal simulation model developed at the University of Virginia on the basis of an equivalent resistance-capacitance model; it is fast and accurate, and the latest released version, HotSpot 6.0, is used. The application generator randomly generates task graphs to create simulated applications: the number of tasks, the computation amount of each task (range 100-800), the communication topology between tasks, and the communication volume of each communication edge (range 50-500). A task graph is represented as a weighted directed acyclic graph that contains the computation amount of each task and the communication dependencies between tasks.
The event-driven simulated many-core system simulates the binding between individual task instances and simulated processors, as well as the computation performed by tasks and the communication between tasks. The processor resource scheduling algorithm comprises allocating bubbles to applications, task mapping, and simulating task migration at run time. Task mapping is realized by binding a single task to a single processor. After an application finishes running, all processors occupied by its tasks are released. When task execution is simulated, the computation speed of a task equals the running frequency of the corresponding processor (the running frequency of each simulated processor in the system is controllable), and the communication speed between each pair of tasks equals the routing frequency of the processor divided by the Manhattan distance between the two corresponding processors.
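The two speed rules above can be illustrated with a few lines of code. This is a sketch only: the function names are hypothetical, units are left abstract because the description does not give them, and dividing an amount of work by the speed to obtain a duration is an assumption of the example.

```python
from typing import Tuple

Core = Tuple[int, int]  # (row, column); hypothetical layout


def manhattan(a: Core, b: Core) -> int:
    """Manhattan distance between two cores on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def compute_time(computation: float, core_freq: float) -> float:
    """Computation speed equals the running frequency of the bound processor."""
    return computation / core_freq


def comm_time(volume: float, router_freq: float, src: Core, dst: Core) -> float:
    """Communication speed is the routing frequency over the Manhattan distance.

    src and dst must differ, which holds because each processor hosts one task.
    """
    speed = router_freq / manhattan(src, dst)
    return volume / speed
```

Under these assumptions, doubling the distance between two communicating tasks doubles their communication time, which is why mapping communicating tasks close together matters.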
In the task mapping phase, two optional mapping methods are provided, square mapping and communication-first mapping:
Square mapping simply maps all tasks of the application contiguously, in order of task number, into an approximately square region that occupies half of the overall application region (the allocated contiguous region of idle processors).
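A possible reading of square mapping, using the side-length rule given earlier for the square mapping mode (square root of the task count, rounded down), is sketched below. The function name, the row-by-row fill order, and the `origin` parameter are assumptions of this example; the description only requires that tasks be placed contiguously in task-number order.

```python
import math
from typing import Dict, Tuple

Core = Tuple[int, int]  # (row, column); hypothetical layout


def square_map(num_tasks: int, origin: Core = (0, 0)) -> Dict[int, Core]:
    """Place tasks 0..num_tasks-1 contiguously in a block `side` cores wide."""
    side = math.isqrt(num_tasks)  # floor of the square root
    return {task: (origin[0] + task // side, origin[1] + task % side)
            for task in range(num_tasks)}
```

For 9 tasks this yields a 3 by 3 block; for 7 tasks the side is 2 and the block is 2 cores wide and 4 cores long, with the last position left empty.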
Communication-first mapping first creates three dynamic arrays, MAP, MET and UNM, which respectively hold the tasks already mapped, the tasks connected to (communicating with) mapped tasks, and the remaining tasks that are in neither of the first two queues. A node represents a simulated processor. At initialization, the approximate geometric center of the application region (the allocated contiguous region of idle processors) is selected as the first node; the task with the highest accumulated communication volume is chosen as the preferred task, mapped onto the first node, and put into the MAP array. All tasks connected to the preferred task are put into the MET array in turn, and the remaining tasks are put into the UNM array. For the first task in MET, its parent task and the corresponding node are found in MAP, and a search proceeds outward from that node over the nodes at Manhattan distance 1, 2, 3, ... until the first available node is found. The task is mapped onto that node and moved from MET into the MAP array, and the tasks connected to it that are in UNM are moved from UNM into MET. These steps are performed for each task in MET until the size of the MAP array equals the number of tasks of the application, at which point mapping is complete.
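The MAP/MET/UNM procedure can be sketched as follows. The sketch makes several assumptions that the description does not fix: the task graph is connected, ties between equally distant free nodes are broken by coordinate order, and the function name and data layout are hypothetical. Picking the nearest free node by Manhattan distance is equivalent to searching rings of distance 1, 2, 3, ... in turn.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row, column) of a simulated processor


def comm_first_map(edges: Dict[Tuple[int, int], float],
                   num_tasks: int, region: List[Node]) -> Dict[int, Node]:
    """Map tasks 0..num_tasks-1 onto `region`, heaviest communicator first."""
    if len(region) < num_tasks:
        raise ValueError("region too small for the application")
    adj: Dict[int, List[int]] = {t: [] for t in range(num_tasks)}
    volume = {t: 0.0 for t in range(num_tasks)}
    for (u, v), w in edges.items():
        adj[u].append(v)
        adj[v].append(u)
        volume[u] += w
        volume[v] += w
    # Approximate geometric center of the region.
    cr = sum(r for r, _ in region) / len(region)
    cc = sum(c for _, c in region) / len(region)
    center = min(region, key=lambda n: abs(n[0] - cr) + abs(n[1] - cc))
    first = max(range(num_tasks), key=lambda t: volume[t])
    MAP: Dict[int, Node] = {first: center}
    parent: Dict[int, int] = {}
    MET: List[int] = []
    for t in adj[first]:
        if t not in MET:
            MET.append(t)
            parent[t] = first
    UNM = [t for t in range(num_tasks) if t != first and t not in MET]
    free = set(region) - {center}
    while MET:
        task = MET.pop(0)
        anchor = MAP[parent[task]]
        # Nearest free node to the parent's node.
        node = min(free, key=lambda n: (abs(n[0] - anchor[0]) + abs(n[1] - anchor[1]), n))
        free.remove(node)
        MAP[task] = node
        for t in adj[task]:
            if t in UNM:
                UNM.remove(t)
                MET.append(t)
                parent[t] = task
    return MAP
```

For a four-task chain placed in a 3 by 3 region, every pair of communicating tasks ends up on adjacent cores.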
According to the power model of a single processor, the power of each processor in the many-core system is calculated every microsecond and recorded as the power trace for the current time. A fixed time period is used as the sampling interval. After each sampling interval, the HotSpot temperature simulator is called to simulate the system temperature, taking as input the power trace for each moment of the elapsed run time; HotSpot returns the instantaneous simulated temperature of each processor core in the system at that moment. When the temperature of a processor exceeds the set temperature threshold, the application corresponding to the hot spot executes task migration.
In the task migration phase, three optional migration modes are provided: the square migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode.
The square migration mode is based on square mapping: the tasks mapped as a square block are migrated as a whole to another square area within the application region.
The coldest-core-in-region migration mode searches the application region for the processor core with the lowest temperature; when that core is found and satisfies the migration condition, the overheated task is migrated onto it.
The coldest-neighbor-core-in-region migration mode searches the 8 cores adjacent to the overheated processor for the core with the lowest temperature; if a certain temperature condition is met, the tasks on that processor and the overheated processor are exchanged, and the frequency of the overheated processor is reduced as appropriate.
Fig. 2 illustrates an example in which, after three applications arrive at the system for the first time, different numbers of bubbles are allocated to the three applications according to the branch-and-bound algorithm so as to obtain the shortest overall response time.
Application 1 contains seven tasks; the bubble-performance model gives execution times of 80, 76, 62, 51 and 40 for 0, 1, 3, 5 and 7 bubbles, respectively (to simplify the presentation, only bubble counts that can form a regular region are considered, and the longest side of the application region is smaller than the side length of the system).
Application 2 contains five tasks; the bubble-performance model gives execution times of 300, 275, 250, 217 and 150 for 0, 1, 3, 4 and 5 bubbles, respectively.
Application 3 contains eight tasks; the bubble-performance model gives execution times of 195, 184, 166, 147, 125 and 99 for 0, 1, 2, 4, 6 and 8 bubbles, respectively.
In the embodiment, b1, b2 and b3 denote the bubbles allocated to Application 1, Application 2 and Application 3, respectively. Following the branch-and-bound method, each step preferentially expands the branch whose response time grows most slowly (the overall response time being the lower bound), and prunes upper-layer branches whose lower bound is greater than that of a lower-layer branch (for example, the lower bound of the first branch of the third layer is 195, so all second-layer branches with a lower bound greater than 195 are pruned). The branch with the shortest overall response time found at the bottom layer is the optimal solution; in the embodiment the optimal overall response time is 165, and all branches with a lower bound greater than the optimal solution are pruned. The bubbles corresponding to the optimal solution are allocated to the three applications; in the embodiment they are {b1=7, b2=5, b3=6}. As shown in Fig. 2, the ancestor node of the optimal-solution node is b1=7, which allocates 7 bubbles to Application 1; its parent node is b2=5, which allocates 5 bubbles to Application 2; and the node itself corresponds to b3=6, which allocates 6 bubbles to Application 3.
Application region division and resource scheduling proceed as follows. The application region of Application 1 must map 7 tasks and contain 7 bubbles, so a processor region of size 14 is allocated to it, and the first-fit algorithm finds a contiguous free 3*5 area for it (one processor of which is left unoccupied). The application region of Application 2 must map 5 tasks and contain 5 bubbles, so a processor region of size 10 is allocated to it, and the first-fit algorithm finds a contiguous free 2*5 area for it. The application region of Application 3 must map 8 tasks and contain 6 bubbles; there is no sufficiently large free area in the system, so Application 3 waits in the queue.
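A first-fit search for a free rectangular area can be sketched as below. The description does not specify the scan order or how the rectangle shape is chosen for a given region size, so the row-major scan, the function names, and the boolean occupancy grid are all assumptions of this example.

```python
from typing import List, Optional, Tuple


def first_fit(occupied: List[List[bool]], h: int, w: int) -> Optional[Tuple[int, int]]:
    """Return the top-left corner of the first free h-by-w area, or None."""
    rows, cols = len(occupied), len(occupied[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not occupied[r + i][c + j] for i in range(h) for j in range(w)):
                return (r, c)
    return None


def mark(occupied: List[List[bool]], top_left: Tuple[int, int],
         h: int, w: int, value: bool) -> None:
    """Mark an h-by-w area as occupied (True) or released (False)."""
    r0, c0 = top_left
    for i in range(h):
        for j in range(w):
            occupied[r0 + i][c0 + j] = value
```

On a hypothetical 6 by 6 grid (the description does not state the system size), requesting 3*5, then 2*5, then 3*5 reproduces the situation above: the first two requests succeed and the third returns None until the first area is released.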
For Application 1, the number of tasks equals the number of bubbles, so it is mapped into its region in square mapping mode and the square migration mode is selected; Application 1 begins running. For Application 2, the number of tasks equals the number of bubbles, so it is mapped into its region in square mapping mode and the square migration mode is selected; Application 2 begins running.
Application 1 finishes running first, and the processor cores occupied by its application region are released. The application region of Application 3 must map 8 tasks and contain 6 bubbles, so a contiguous idle processor region of size 14 is allocated to it, and the first-fit algorithm finds a free 3*5 area for it.
For Application 3, the number of bubbles is less than the number of tasks, so it is mapped into its region in communication-first mapping mode. Its CCR is calculated to be 3.2, which is greater than the threshold, so the coldest-core-in-region migration mode is selected for it, and Application 3 begins running.
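The overall response time of 165 in the embodiment can be checked with a short calculation. The sketch assumes, as one reading of the example, that all three applications arrive at time 0, that Application 3 starts as soon as Application 1 releases its region, and that the overall response time is the latest completion time; the function name and arguments are hypothetical.

```python
from typing import Dict


def overall_response_time(exec_time: Dict[int, float],
                          start_after: Dict[int, int]) -> float:
    """Latest finish time when some applications wait for another to finish.

    exec_time[a] is the execution time of application a; start_after[a] is the
    application whose completion a must wait for (absent means start at 0).
    """
    finish: Dict[int, float] = {}

    def finish_time(app: int) -> float:
        if app not in finish:
            pred = start_after.get(app)
            start = 0.0 if pred is None else finish_time(pred)
            finish[app] = start + exec_time[app]
        return finish[app]

    return max(finish_time(a) for a in exec_time)
```

With execution times 40, 150 and 125 for b1=7, b2=5 and b3=6, Application 3 finishes at 40 + 125 = 165, which is later than Application 2 at 150 and matches the optimal value stated in the embodiment.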
The above is a detailed description of the present invention in connection with specific embodiments, but the specific implementation of the invention is not to be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make various adjustments, modifications, replacements and/or variations to these embodiments without departing from the principle and spirit of the invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.