CN109918195A - Many-core system processor resource dispatching method based on thermal sensing dynamic task migrating - Google Patents

Publication number
CN109918195A
Authority
CN
China
Prior art keywords
application
processor
region
task
bubble
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910049800.5A
Other languages
Chinese (zh)
Other versions
CN109918195B (en)
Inventor
文生雁
王小航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201910049800.5A
Publication of CN109918195A
Application granted
Publication of CN109918195B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Multi Processors (AREA)

Abstract

The present invention discloses a many-core system processor resource scheduling method based on thermal-aware dynamic task migration, comprising the following steps. Step 1: check whether the waiting queue is empty; if it is not empty, map the applications in it, then go to step 2. Step 2: check whether the arrival queue is empty; if it is not empty, go to step 3. Step 3: check whether any application is running in the system; if none is running, use the models to estimate the execution time and waiting time of each application under different bubble counts, and search for the optimal bubble allocation with a branch-and-bound algorithm. Step 4: if applications are running, search for the optimal bubble allocation over the remaining free regions with the branch-and-bound algorithm. Step 5: application mapping phase. Step 6: application running phase. The method exploits the dark silicon phenomenon and, according to the application arrival rate and whether each application is computation-sensitive or communication-sensitive, adapts to arrival queues of different lengths, maintains the task running frequency, responds to varying application arrival rates, and effectively improves system throughput and performance.

Description

Many-core system processor resource scheduling method based on thermal-aware dynamic task migration
Technical field
The present invention relates to the field of communication technology, and in particular to a many-core system processor resource scheduling method based on thermal-aware dynamic task migration.
Background art
Many-core chips are among the processor components used in fields such as cloud computing and mobile computing, and many-core processors are widely deployed in servers and data centers. As computer systems develop, the many-core chip is becoming an increasingly important platform. As applications demand more computation, the integration density and performance of many-core chips keep rising. With them come rapidly increasing chip power density and temperature, and temperature has become a key factor limiting chip performance. Prolonged excessive temperature degrades the reliability and lifetime of a many-core chip. Because cooling capacity is limited and the system must operate safely, a power constraint is usually imposed on the chip. To meet the power constraint while sustaining high computing performance, a portion of the processors on the chip must be switched off. This is known as the dark silicon phenomenon.
Inactive idle processors in a chip system are commonly called bubbles. Because bubbles dissipate heat well, some approaches place bubbles around active processors so that those processors can run at a higher frequency and deliver higher computing performance. Such approaches still belong to static task-to-core mapping: since the task positions are fixed, hotspots can still form while the running frequency stays unchanged. Other approaches reduce hotspots through dynamic task migration: when the temperature of the original processor rises above a set threshold, they migrate the thread or task from the overheated core to another core. One class of methods tends to choose the globally coolest processor, or a randomly chosen idle processor, as the migration target, which can greatly increase the communication distance. If the tasks of the application communicate heavily, this clearly leads to excessive communication cost. Another class of methods migrates the task each time to a processor adjacent to the overheated one. After many such migrations, an application may leave behind a non-contiguous free region when it exits the system, so that newly arrived or queued applications cannot be mapped contiguously onto idle processors. Both classes may also face the situation that inter-task traffic has to pass through several processor cores that are running other applications, so that the communications of different applications conflict and communication efficiency drops.
Because user requests are diverse and complex, a server system is expected to cope with varying workloads and respond in the shortest possible time. One task migration method lets every processor that is running a task run overclocked and, when the temperature threshold is reached, migrates all tasks together to another contiguous region of idle processors. Although this keeps the communication distance between tasks unchanged, the method suits only lightly loaded systems. When many applications arrive and many processor resources are needed, such a low-utilization scheduling method causes long application waiting times, which cancel the performance gain from overclocking and lengthen the response time.
The invention patent with grant publication number CN201310059705 discloses "A core resource allocation method, apparatus and many-core system". It proposes a core resource allocation method that, according to the number of threads of a user process (the number of idle cores required), merges scattered core partitions into one contiguous core partition and assigns it to the user process, thereby optimizing communication cost. Based on the core partition migration cost, the method selects a reference core partition and a slave core partition from at least two scattered core partitions so that the total migration cost is minimal. It then migrates the slave core partition so that its idle cores merge with the idle cores of the reference core partition into one contiguous core partition. That patent consolidates on-chip processor resources mainly from the viewpoint of defragmentation, so that arriving applications can be mapped contiguously. Its drawback is that it considers neither the power constraint nor the temperature constraint of the system and ignores possible temperature peaks and uneven temperature distribution, so the system is at risk of overheating.
None of the above task migration methods takes the use of bubbles into account. Under the dark silicon phenomenon, designing dynamic processor resource scheduling for a many-core system that considers both the chip temperature constraint and the load is key to maintaining high performance.
Summary of the invention
The present invention is a processor resource scheduling method under the dark silicon phenomenon, implemented in a task scheduling simulation system for a two-dimensional network-on-chip many-core system. It avoids thermal risk while maintaining a high processor running frequency and taking communication cost into account, thereby improving system throughput.
To this end, the present invention proposes a many-core system processor resource scheduling method based on thermal-aware dynamic task migration, comprising the following steps:
Step 1: Check whether the waiting queue is empty. If it is not empty, map each application in the waiting queue and then go to step 2. If it is empty, end this round of resource scheduling and perform step 1 again when the next clock cycle starts;
Step 2: Check whether the arrival queue is empty. If the arrival queue is not empty, go to step 3. If it is empty, no new application has arrived in this clock cycle and no processor resource scheduling is needed; end this round of resource scheduling and perform step 1 again when the next clock cycle starts;
Step 3: Check whether any application is running in the system. If none is running, i.e. all N*N processor resources of the system are available, use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and waiting time under different bubble counts. These estimates are the inputs to the cost function of the branch-and-bound search, which finds the bubble allocation that minimizes the total response time of the applications. If applications are running, go to step 4;
Step 4: If applications are running, the regions already occupied by applications divide the available processors of the system into a set of non-contiguous free regions. Use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and waiting time under each selectable bubble count. These estimates are the inputs to the cost function of the branch-and-bound search, which finds the optimal bubble allocation and minimizes the total response time.
Step 5: Application mapping phase: use the First-Fit heuristic to select a free region, then select the mapping mode of the application and map it. The mapping modes are the square mapping mode and the communication-first mapping mode;
Step 6: Application running phase: using the mapping mode of the application and its own computation-to-communication ratio (Computation communication rate, CCR) as the selection criteria, select a migration mode for each type of application.
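The six steps above can be sketched as a per-cycle control flow. This is only an illustration: the function name `schedule_cycle`, the action tuples and the queue types are hypothetical, and the sketch follows the wording of step 1 literally, including ending the round when the waiting queue is empty.

```python
from collections import deque

def schedule_cycle(waiting, arrivals, running):
    """Return the actions taken in one clock cycle (all names are hypothetical).

    waiting, arrivals: deques of application IDs; running: list of running apps.
    """
    actions = []
    # Step 1: an empty waiting queue ends this scheduling round.
    if not waiting:
        return actions
    while waiting:
        actions.append(("map_waiting", waiting.popleft()))
    # Step 2: no new arrivals means no resource scheduling is needed.
    if not arrivals:
        return actions
    # Steps 3 and 4: branch-and-bound searches the whole chip when nothing
    # is running, otherwise only the non-contiguous free regions.
    scope = "free_regions" if running else "whole_chip"
    actions.append(("allocate_bubbles", scope, list(arrivals)))
    # Steps 5 and 6: each arrived application is mapped, then run.
    for app in arrivals:
        actions.append(("map", app))
        actions.append(("run", app))
    arrivals.clear()
    return actions
```

Calling `schedule_cycle(deque(["w"]), deque(["a"]), ["x"])` walks through all six steps once and selects the free-region search because an application is already running.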
Further, the bubble-performance model is used to compute the execution time of each application under different bubble counts. The model is a polynomial regression model. Let the execution time of application i be Π_i, the number of bubbles in its application region be b_i, the hop count of its critical path (the longest weighted path) be h_i, the average computation time of its tasks be c_i, and the average communication volume of its tasks be t_i. The execution time Π_i is modeled as a polynomial in b_i, h_i, c_i and t_i.
Here n_1 to n_4 are the polynomial orders and α_k, β_k, γ_k, θ_k are the fitting coefficients of the model, whose values are obtained by the maximum likelihood method. b_i^k is the k-th power of the bubble count of application i, h_i^k is the k-th power of its critical path hop count, c_i^k is the k-th power of the average task computation time in application i, and the term in t_i represents the average communication time between every two tasks of application i.
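The model equation itself does not appear in this text. One form consistent with the symbol definitions above is sketched below. The purely additive structure, the summation starting at k = 0, and the use of powers of t_i are assumptions, not taken from the source.

```latex
\Pi_i \;=\; \sum_{k=0}^{n_1} \alpha_k\, b_i^{\,k}
      \;+\; \sum_{k=0}^{n_2} \beta_k\, h_i^{\,k}
      \;+\; \sum_{k=0}^{n_3} \gamma_k\, c_i^{\,k}
      \;+\; \sum_{k=0}^{n_4} \theta_k\, t_i^{\,k}
```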
Further, given the size of the current application region, i.e. the sum of its bubble count and task count, the waiting time model is used to compute the waiting time of each application under different bubble counts and the current arrival rate. The model is a polynomial regression model. The region area of the current application is denoted R, the total number of processors in the system is denoted |T|, and the average task count of the mapped applications is denoted |A_i|. Let the waiting time of the current application be η_i. η_i is modeled by the following variables: the average bubble-to-task ratio r_i of the mapped applications, the average execution time e_i of the mapped applications, and the application arrival rate λ.
Here a_0 is the constant term, z is the polynomial order, r^j is the j-th power of the average bubble-to-task ratio of the mapped applications, e^j is the j-th power of the average execution time of the mapped applications, and λ^j is the j-th power of the current application arrival rate. δ_j, ε_j, μ_j are the fitting coefficients of the corresponding terms, obtained by maximum likelihood regression.
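The waiting time equation is likewise missing from this text. A form consistent with the definitions above is given below as an assumption: a constant term plus one polynomial of order z in each of the three variables. The summation starting at j = 1 and the absence of cross terms are not confirmed by the source.

```latex
\eta_i \;=\; a_0 \;+\; \sum_{j=1}^{z} \left( \delta_j\, r_i^{\,j} \;+\; \varepsilon_j\, e_i^{\,j} \;+\; \mu_j\, \lambda^{\,j} \right)
```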
Further, the task count of an application is the number of processors the mapped application occupies. An application is a set of small tasks. Each task, when run, executes a different part of the work of the application, so the application executes in parallel. Each mapped task occupies one processor core, so the task count of an application equals the number of processors it occupies when running.
Further, the selectable bubble count is the difference between the total number of processors in each free region and the task count of the application, where the task count equals the number of processors the application occupies when running.
Further, step 3 is carried out by the global manager (Global manager). When subsequent applications arrive, the global manager collects all available processor regions in the current system, estimates for each arriving application the execution time and waiting time corresponding to these regions, and uses the branch-and-bound algorithm to find the application-to-free-region assignment with the lowest cost.
Further, the first-fit algorithm is implemented as follows. Idle processors are searched in the system from left to right and from top to bottom, and each is tested as a possible starting point of the application region. A processor qualifies as a starting point if the number of idle processors to its right in the same row multiplied by the number of idle processors below it in the same column is greater than or equal to the area of the application region. The first processor found that qualifies is taken as the top-left corner of the free region, and that region is allocated to the application. If no qualifying processor is found, the application is put into the waiting queue. Once the region is selected, the application is mapped into it with the selected mapping mode. The mapping mode is selected as follows: when the ratio of bubbles to tasks of the application is 1:1, the application uses the square mapping mode; otherwise it uses the communication-first mapping mode.
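The start-point test and the mapping mode rule described above can be sketched as follows. The grid representation and the function names are hypothetical. Counting the start core itself and counting only contiguous idle cores are assumptions, since the text does not say how the two counts are taken.

```python
def first_fit_start(idle, area):
    """Return (row, col) of the first idle core passing the start test, or None.

    idle: 2D list of bools, True where the core is idle (hypothetical layout).
    area: number of cores the application region needs (tasks plus bubbles).
    """
    rows, cols = len(idle), len(idle[0])
    for r in range(rows):            # top to bottom
        for c in range(cols):        # left to right
            if not idle[r][c]:
                continue
            # Contiguous idle cores to the right in this row, start included.
            right = 0
            while c + right < cols and idle[r][c + right]:
                right += 1
            # Contiguous idle cores downward in this column, start included.
            down = 0
            while r + down < rows and idle[r + down][c]:
                down += 1
            if right * down >= area:
                return (r, c)
    return None  # the caller would put the application into the waiting queue


def mapping_mode(bubbles, tasks):
    """Square mapping for a 1:1 bubble-to-task ratio, else communication-first."""
    return "square" if bubbles == tasks else "communication_first"
```

As in the text, the test multiplies a row count by a column count and does not verify that every core inside the resulting rectangle is idle.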
Further, there are three migration modes: the square migration mode, the region coldest-core migration mode, and the region coldest-neighbor-core migration mode. In step 6, the mapping mode of the application and its own computation-to-communication ratio are the selection criteria, and the bubble count of each application must not exceed its task count. The migration mode is selected as follows:
If the bubble count equals the task count, the square migration mode is selected for the application so that its communication distance stays unchanged during migration, and the processors run overclocked;
If the bubble count is not equal to the task count, the communication-first mapping is selected for the application. The threshold is 2h/(h-1), where h is the hop count of the critical path of the application, i.e. of the longest weighted path. For an application whose CCR is greater than the threshold, the region coldest-core migration is selected; for an application whose CCR is less than the threshold, the region coldest-neighbor-core migration is selected.
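The selection rule above can be sketched in a few lines. The function names and return strings are hypothetical, and treating a CCR exactly equal to the threshold like the "less than" case is an assumption, because the text only covers the strictly greater and strictly smaller cases.

```python
def ccr_threshold(h):
    """Threshold 2h/(h-1), where h is the critical path hop count (h > 1)."""
    if h <= 1:
        raise ValueError("critical path hop count must be greater than 1")
    return 2 * h / (h - 1)


def select_migration_mode(bubbles, tasks, ccr, h):
    """Pick one of the three migration modes for an application."""
    if bubbles > tasks:
        raise ValueError("bubble count must not exceed task count")
    if bubbles == tasks:
        return "square"
    # Assumption: ccr == threshold falls into the coldest-neighbor case.
    if ccr > ccr_threshold(h):
        return "coldest_in_region"
    return "coldest_neighbor"
```

For example, a critical path of 6 hops gives a threshold of 2.4, so an application with 2 bubbles, 4 tasks and a CCR of 3.0 would use the region coldest-core migration.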
Further, the three migration modes are implemented as follows:
(1) Square migration mode: the application must use square mapping, and the size of the application region must be at least twice the task count. When invoked, all tasks are unbound from the processors they are currently mapped to and are remapped, in order of task number (ID), onto the other free block within the application region;
(2) Region coldest-core migration mode: when invoked, the task ID obtained by hotspot detection is used to locate the processor that exceeds the temperature threshold, i.e. the overheated processor. The processor core with the lowest temperature in the application region is then sought. If that core satisfies the migration condition of this mode, i.e. it is a bubble (an idle processor with no task mapped on it), the task is unbound from the overheated processor and remapped onto that core. If the migration condition cannot be met, i.e. the coldest core is not a bubble, the frequency of the overheated processor is reduced;
(3) Region coldest-neighbor-core migration mode: when invoked, the task ID obtained by hotspot detection is used to locate the overheated processor, and the processor core with the lowest temperature is sought among its 8 adjacent cores. The core found satisfies the migration condition of this mode if it has a task mapped on it but its temperature is not higher than two thirds of the threshold temperature, or if it is a bubble, i.e. an idle processor. In that case the task migration is executed. If the core has a task, the tasks on that core and on the overheated processor are exchanged, i.e. a double unbind-remap. If the core is idle, a single unbind-remap is executed. If the migration condition cannot be met, i.e. the temperature of the core is higher than two thirds of the threshold temperature, the frequency of the overheated processor is reduced.
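The decision logic of mode (3) can be sketched as below. The grid layout, the function name and the returned action labels are hypothetical. The sketch considers every in-bounds neighbor and does not restrict the search to the application region, which is a simplifying assumption, and it expects a grid with at least two cores.

```python
def coldest_neighbor_action(temps, occupied, hot, threshold):
    """Decide how to relieve the overheated core at hot = (row, col).

    temps: 2D list of core temperatures; occupied: 2D list of bools, True where
    a task is mapped. Returns ("move" | "swap" | "throttle", core).
    """
    rows, cols = len(temps), len(temps[0])
    r0, c0 = hot
    # The (up to) 8 cores adjacent to the overheated one.
    neighbors = [
        (r0 + dr, c0 + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r0 + dr < rows and 0 <= c0 + dc < cols
    ]
    target = min(neighbors, key=lambda rc: temps[rc[0]][rc[1]])
    tr, tc = target
    if not occupied[tr][tc]:
        return ("move", target)       # bubble: single unbind-remap
    if temps[tr][tc] <= threshold * 2 / 3:
        return ("swap", target)       # occupied but cool: double unbind-remap
    return ("throttle", hot)          # no valid target: reduce the frequency
```

With a threshold of 80, an occupied neighbor at 50 degrees qualifies for a swap, while one at 54 degrees does not, because two thirds of the threshold is about 53.3.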
Further, the branch-and-bound algorithm is as follows: a different number of bubbles is assigned to each application, and at each node the current total response time is computed as the cost function. The total response time σ is the maximum, over the applications that have been assigned bubbles at that node, of the sum of waiting time and execution time:
σ = max{ η_i + Π_i }, i ∈ {0, 1, ..., n}
where η_i is the waiting time of application i and Π_i is the execution time of application i. Each time, the branch whose lower bound grows most slowly is expanded first, and upper-level nodes whose cost (i.e. response time) is higher than that of a next-level node are pruned. The result is the bubble allocation with the shortest total response time, i.e. the division of the application regions.
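A minimal best-first branch-and-bound over bubble counts might look like the sketch below. It is an illustration under assumptions: the input table `response`, the single shared `budget` of bubbles and the function name are hypothetical, and the pruning rule (discard any node whose partial σ is not better than the best complete allocation found) is a standard choice that may differ in detail from the one in the patent.

```python
import heapq

def allocate_bubbles(response, budget):
    """Minimize sigma = max_i(wait_i + exec_i) over bubble allocations.

    response[i][b]: estimated waiting plus execution time of application i
    when it receives b bubbles (hypothetical precomputed model outputs).
    budget: total number of bubbles that may be handed out.
    Returns (sigma, allocation tuple), or (inf, None) if nothing is feasible.
    """
    n = len(response)
    best_cost, best_alloc = float("inf"), None
    # Each node: (partial sigma, depth, bubble counts chosen so far).
    heap = [(0.0, 0, ())]
    while heap:
        cost, depth, alloc = heapq.heappop(heap)
        if cost >= best_cost:
            continue                  # prune: cannot beat the incumbent
        if depth == n:
            best_cost, best_alloc = cost, alloc
            continue
        used = sum(alloc)
        for b, t in enumerate(response[depth]):
            if used + b > budget:
                break                 # larger b would exceed the budget too
            child = max(cost, t)      # sigma never decreases along a branch
            if child < best_cost:
                heapq.heappush(heap, (child, depth + 1, alloc + (b,)))
    return best_cost, best_alloc
```

Because σ is a maximum, the partial value at a node is a valid lower bound for every allocation below it, which is what makes expanding the lowest-cost node first safe.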
Further, after applications arrive, the execution time of each application under different bubble counts is estimated with the bubble-performance model. The bubble count of each application must not exceed its task count. When the bubble count equals the task count, the execution time is estimated under all migration modes. When the bubble count is less than the task count, the execution time is estimated under the region coldest-core migration mode, which favors computation, for applications whose CCR is above the threshold, and under the coldest-neighbor-core migration mode, which favors communication, for applications whose CCR is below the threshold.
Further, if the number of processors in a region is greater than the task count of the application, then after mapping the frequency of the active processors in the region is computed from the ratio of bubbles to tasks in the region, and the processors run overclocked.
Further, during the application running phase, the system detects hotspots in every control interval. When a hotspot appears, task migration is performed within the corresponding application region using the selected migration mode.
Further, in the communication-first mapping mode, the node with the largest communication weight is taken as the first task and mapped onto the processor core nearest the geometric center of the application region, and the nodes connected to it are mapped in turn onto the available processor cores with the shortest Manhattan distance to it. Next, a mapped node that is the parent of still-unmapped tasks is chosen, and the nodes connected to it are mapped in turn onto the available processor cores with the shortest Manhattan distance to it. This repeats until all tasks of the application are mapped onto processors.
Further, in the square mapping mode, the square root of the task count of the application, rounded down, is taken as the side length, and all tasks are mapped contiguously in a square region.
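The square mapping rule can be illustrated with a short sketch. The function name, the `origin` parameter and the row-major placement order are hypothetical. When the task count is not a perfect square, this sketch lets the leftover tasks spill into an extra row, which is an assumption the text does not address.

```python
import math

def square_mapping(num_tasks, origin=(0, 0)):
    """Map task IDs 0..num_tasks-1 row by row, with rows floor(sqrt(n)) wide.

    Returns {task_id: (row, col)} relative to the top-left core `origin`.
    """
    side = math.isqrt(num_tasks)      # square root rounded down
    r0, c0 = origin
    return {
        task: (r0 + task // side, c0 + task % side)
        for task in range(num_tasks)
    }
```

With 4 tasks the result is a 2 by 2 block; with 5 tasks the fifth task lands at the start of a third row.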
The control interval is chosen equal to the sampling interval. After every control interval, the system checks whether any processor temperature is above the given temperature threshold. If not, no task migration is performed. If a processor core exceeds the temperature threshold, the task mapped on the hotspot is traced to its application, and task migration is performed with the selected mode in the event class instance bound to that application. Hotspot detection provides the ID of the overheated application and the ID of its overheated task.
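The per-interval check described above can be sketched as follows. The data structures (`temps`, `task_at`, `app_of`) and the function name are hypothetical stand-ins for the state of the simulator, which the text does not specify.

```python
def detect_hotspots(temps, task_at, app_of, threshold):
    """Return (app_id, task_id, core) for every overheated core running a task.

    temps: 2D list of core temperatures.
    task_at: {(row, col): task_id} for cores that currently hold a task.
    app_of: {task_id: app_id}.
    """
    hits = []
    for r, row in enumerate(temps):
        for c, temp in enumerate(row):
            task = task_at.get((r, c))
            # Only cores above the threshold that hold a task need migration.
            if temp > threshold and task is not None:
                hits.append((app_of[task], task, (r, c)))
    return hits
```

An empty result corresponds to the case in which no task migration is performed in that control interval.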
By establishing the bubble-performance model and the waiting time model of the applications and designing three migration modes, the invention extends a two-dimensional network-on-chip many-core system. In resource allocation, the dark silicon phenomenon is exploited to obtain better computing performance. In task migration, the mapping mode of the application and its own computation-to-communication ratio (Computation communication rate, CCR) are the selection criteria, and a migration mode is selected for each type of application. A higher CCR generally means that computation performance contributes more to overall performance; conversely, a lower CCR means that communication performance matters more to overall performance.
Compared with the prior art, the present invention has the following beneficial effects:
1. In this scheduling method, bubbles serve not only for heat dissipation but also as migration targets for tasks on overheated processors.
2. The system responds dynamically to the current application arrival rate. When few applications arrive, it makes full use of the dark silicon phenomenon to improve application performance. When many applications arrive, it raises system utilization and keeps application waiting times from growing too long. Overall, compared with other conventional task migration methods, it improves system throughput and reduces the average response time of applications.
Brief description of the drawings
Fig. 1 is a structural block diagram of the simulation system as extended by the present invention.
Fig. 2 is an example of the branch-and-bound algorithm allocating bubbles in the present invention.
Fig. 3 is a flowchart of the processor resource scheduling method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the drawings, but embodiments of the present invention are not limited thereto.
In the many-core system processor resource scheduling method based on thermal-aware dynamic task migration of this embodiment, the CCR threshold of application 1 is 2.6, the CCR threshold of application 2 is 2.6, and the CCR threshold of application 3 is 2.4. The system scale is 5*5, with 25 processor cores in total. The clock frequency of the processor cores is 1 GHz and the clock cycle is 1 nanosecond.
As shown in Fig. 3, the method comprises the following steps:
Step 1: Check whether the waiting queue is empty. If it is not empty, map each application in the waiting queue and then go to step 2. If it is empty, end this round of resource scheduling and wait for the next clock cycle;
Step 2: Check whether the arrival queue is empty. If the arrival queue is not empty, go to step 3. If it is empty, no new application has arrived in this clock cycle and no processor resource scheduling is needed; end this round of resource scheduling and perform step 1 again when the next clock cycle starts.
Step 3: Check whether any application is running in the system. If none is running, i.e. all 5*5 processor resources of the system are available, use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and waiting time under different bubble counts. These estimates are the inputs to the cost function of the branch-and-bound search, which finds the optimal bubble allocation. If applications are running, go to step 4.
Step 4: If applications are running, the regions already occupied by applications divide the available processors of the system into a set of non-contiguous free regions. Use the bubble-performance model and the waiting time model to estimate, for each application, the execution time and waiting time under each selectable bubble count (the selectable bubble count is the difference between the total number of processors in each free region and the task count of the application). These estimates are the inputs to the cost function of the branch-and-bound search, which finds the optimal bubble allocation and minimizes the total response time.
Step 5: Application mapping phase: use the First-Fit heuristic to select a free region, then select the mapping mode of the application and map it. The mapping modes are the square mapping mode and the communication-first mapping mode;
Step 6: Application running phase: using the mapping mode of the application and its own computation-to-communication ratio (Computation communication rate, CCR) as the selection criteria, select a migration mode for each type of application.
Further, the bubble-performance model is used to compute the execution time of each application under different bubble counts. The model is a polynomial regression model. Let the execution time of application i be Π_i, the number of bubbles in its application region be b_i, the hop count of its critical path (the longest weighted path) be h_i, the average computation time of its tasks be c_i, and the average communication volume of its tasks be t_i. The execution time Π_i is modeled as a polynomial in b_i, h_i, c_i and t_i.
Here n_1 to n_4 are the polynomial orders and α_k, β_k, γ_k, θ_k are the model coefficients, whose values are obtained by the maximum likelihood method. b_i^k is the k-th power of the bubble count of application i, h_i^k is the k-th power of its critical path hop count, c_i^k is the k-th power of the average task computation time in application i, and the term in t_i represents the average communication time between every two tasks of application i.
Further, given the size of the current application region, i.e. the sum of its bubble count and task count, the waiting time model is used to compute the waiting time of each application under different bubble counts and the current arrival rate. The model is a polynomial regression model. The region area of the current application is denoted R, the total number of processors in the system is denoted |T|, and the average task count of the mapped applications is denoted |A_i|. Let the waiting time of the current application be η_i. η_i is modeled by the following variables: the average bubble-to-task ratio r_i of the mapped applications, the average execution time e_i of the mapped applications, and the application arrival rate λ.
Here a_0 is the constant term, z is the polynomial order, r^j is the j-th power of the average bubble-to-task ratio of the mapped applications, e^j is the j-th power of the average execution time of the mapped applications, and λ^j is the j-th power of the current application arrival rate. δ_j, ε_j, μ_j are the fitting coefficients of the corresponding terms, obtained by maximum likelihood regression.
Further, step 3 is realized by global administration's device (Global manager), i.e., global when subsequent applications arrival Manager counts available processors region all in current system, is each these available processors of subsequent arrival estimation Region corresponding execution time and waiting time find the corresponding pass of the smallest application of cost-Free Region by branch-bound algorithm System.
Further, the first-fit algorithm implements process are as follows: in system from left to right, searches for from top to bottom Idle processor judges whether it can be as the starting point of application region, and can be that starting point is same as the condition of starting point arranges the right side The product of idle processor number is greater than or equal to application region area below side idle processor number and same column;When finding first It can be used as the processor of the starting point of application region, then the processor distributed to this as the free area in the most upper left corner and answered With, will be using putting waiting list into if can not find the processor that can be used as the starting point of application region;It, will after region is selected Using with the mode map of selection in the area.
The mode of the mapped mode of the selection application are as follows: when the bubble of application and number of tasks meet 1: 1, using selection Rectangular mapped mode;When the bubble of application and number of tasks are unsatisfactory for 1: 1, preferential mapped mode is communicated using selection.
Further, there are three migration modes: the rectangular migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode. In step 5, the mapping mode of the application and its own computation-to-communication ratio (CCR) serve as the selection criteria, and the number of bubbles of each application may not exceed its number of tasks. The migration mode is selected as follows:
If the number of bubbles equals the number of tasks, the rectangular migration mode is selected for the application so that its communication distances remain unchanged during migration, and the processors run overclocked;
If the number of bubbles does not equal the number of tasks, communication-first mapping is selected for the application. An application whose CCR is greater than the threshold uses coldest-core-in-region migration; an application whose CCR is less than the threshold uses coldest-neighbor-core-in-region migration.
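The selection rule above can be expressed as a small function. This is a sketch under stated assumptions: the threshold 2h/(h-1) is the one given in claim 7, with h the critical-path hop count; the function names and mode labels are hypothetical, and treating a CCR exactly equal to the threshold like the "less than" case is an assumption, since the text does not cover that case.

```python
def ccr_threshold(h: int) -> float:
    """Threshold 2h/(h-1) from claim 7; h is the critical-path hop count (h > 1)."""
    if h <= 1:
        raise ValueError("h must be greater than 1")
    return 2 * h / (h - 1)


def select_migration_mode(bubbles: int, tasks: int, ccr: float, h: int) -> str:
    """Pick a migration mode from the bubble count, task count and CCR."""
    if bubbles > tasks:
        raise ValueError("the number of bubbles may not exceed the number of tasks")
    if bubbles == tasks:
        return "rectangular"
    # Assumption: CCR equal to the threshold falls into the neighbor-core case.
    if ccr > ccr_threshold(h):
        return "coldest-core-in-region"
    return "coldest-neighbor-core"
```

For example, with a hypothetical h = 3 the threshold is 3.0, so an application with 8 tasks, 6 bubbles and a CCR of 3.2 would be given coldest-core-in-region migration.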
Further, the three migration modes are implemented as follows:
(1) Rectangular migration mode: the application must use rectangular mapping, and the application region must be at least twice the number of tasks in size. When invoked, all tasks are unbound from the processors they are currently mapped to and are remapped, in order, onto the other, idle part of the application region;
(2) Coldest-core-in-region migration mode: when invoked, the overheated processor is located through the task ID reported by hotspot detection, and the processor core with the lowest temperature in the application region is then sought. If the coldest core is found and satisfies the migration condition, namely that it is a bubble with no task mapped to it, the task is unbound from the overheated processor and remapped onto that core. If the migration condition cannot be met, i.e., the coldest core is not a bubble, the frequency of the overheated processor is reduced;
(3) Coldest-neighbor-core-in-region migration mode: when invoked, the overheated processor is located through the task ID reported by hotspot detection, and the core with the lowest temperature among its 8 adjacent cores is sought. If that core satisfies the migration condition, namely that it either already has a task mapped to it but its temperature is not higher than two thirds of the threshold temperature, or it is a bubble, it is selected as the target processor of the migration. If the target processor already holds a task, the tasks on it and on the overheated processor are exchanged by two unbind-remap operations: task X is unbound from the overheated processor and task Y from the target processor, then task X is remapped onto the target processor and task Y onto the overheated processor. If the target processor is idle, a single unbind-remap is performed. If the migration condition cannot be met, i.e., the temperature of the core is higher than two thirds of the threshold temperature, the frequency of the overheated processor is reduced.
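Mode (3) can be illustrated with the following sketch. The data structures are assumptions made for the example: `temps` maps the (row, column) coordinates of cores in the application region to temperatures, `task_of` maps the same coordinates to a task name or None for a bubble, and the return values 'moved', 'swapped' and 'throttle' are hypothetical labels. The default threshold of 80 follows the 80 degrees Celsius used in the embodiment.

```python
from typing import Dict, Optional, Tuple

Core = Tuple[int, int]


def migrate_to_coldest_neighbor(
    temps: Dict[Core, float],
    task_of: Dict[Core, Optional[str]],
    hot: Core,
    threshold: float = 80.0,
) -> str:
    """Return 'moved', 'swapped' or 'throttle'; task_of is updated in place."""
    r, c = hot
    # The (up to) 8 cores adjacent to the overheated core that lie inside the region.
    neighbors = [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and (r + dr, c + dc) in temps
    ]
    if not neighbors:
        return "throttle"
    target = min(neighbors, key=lambda core: temps[core])
    if task_of.get(target) is None:
        # Target is a bubble: a single unbind-remap moves the hot task there.
        task_of[target], task_of[hot] = task_of[hot], None
        return "moved"
    if temps[target] <= threshold * 2.0 / 3.0:
        # Target is busy but cool enough: exchange tasks X and Y (two unbind-remaps).
        task_of[target], task_of[hot] = task_of[hot], task_of[target]
        return "swapped"
    # Migration condition not met: the caller lowers the hot core's frequency.
    return "throttle"
```

Mode (2) would differ only in searching every core of the region instead of the 8 neighbors and in accepting a bubble as the only valid target.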
Further, after an application arrives, its execution time under different numbers of bubbles is estimated with the bubble-performance model. The number of bubbles of an application may not exceed its number of tasks. When the number of bubbles equals the number of tasks, the execution time is estimated under all migration modes. When the number of bubbles is less than the number of tasks, for a CCR above the threshold the execution time is estimated under the communication-friendly coldest-neighbor-core migration mode; for a CCR below the threshold it is estimated under the computation-friendly coldest-core-in-region migration mode.
Further, if a region contains more processors than the application has tasks, then after mapping the frequency of the active processors in the region is calculated from the ratio of bubbles to tasks in the region, so that the processors run overclocked.
Further, during the application run phase, the system checks for hotspots in every control interval. When a hotspot appears, task migration is carried out in the corresponding application region using the selected migration mode.
Further, in the communication-first mapping mode, the node with the largest communication weight is taken as the preferred task and mapped onto the processor core nearest the geometric center of the application region, and the nodes connected to it are mapped in turn onto the available processor cores at the shortest Manhattan distance from it. Next, for each unmapped task, its already-mapped parent node is selected, and the nodes connected to it are mapped in turn onto the available processor cores at the shortest Manhattan distance from it. This continues until all tasks of the application have been mapped onto processors.
Further, in the rectangular (square) mapping mode, the side length is taken as the square root of the number of tasks of the application, rounded down, and all tasks are mapped contiguously into a rectangular region.
The control interval is chosen to be identical to the sampling interval. Each time a control interval elapses, the system checks whether any processor temperature is higher than the given temperature threshold. If not, no task migration is performed. If a processor core exceeds the temperature threshold of 80 degrees Celsius, the task mapped onto the hotspot is traced to its application, and task migration is carried out in the selected mode within the event-class instance bound to that application. Hotspot detection provides the ID of the overheated application and the ID of its overheated task.
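The per-interval hotspot check can be sketched as below. The dictionaries and the function name are hypothetical; only the 80 degrees Celsius threshold and the returned pair of application ID and task ID come from the description.

```python
from typing import Dict, List, Optional, Tuple

Core = Tuple[int, int]


def detect_hotspots(
    core_temps: Dict[Core, float],
    task_on_core: Dict[Core, Optional[int]],
    app_of_task: Dict[int, int],
    threshold: float = 80.0,
) -> List[Tuple[int, int]]:
    """Return (application ID, task ID) for every busy core hotter than the threshold."""
    hits = []
    for core, temp in core_temps.items():
        task = task_on_core.get(core)
        # Only cores that carry a task can trigger a migration.
        if temp > threshold and task is not None:
            hits.append((app_of_task[task], task))
    return hits
```

An empty result means no migration is needed in this control interval; otherwise each pair identifies the application whose selected migration mode should be invoked.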
The present invention is implemented in a simulation system (see Fig. 1) that comprises an application generator, an event-driven simulated multi-core system, and the HotSpot temperature simulator. HotSpot is a temperature simulation model developed at the University of Virginia using an equivalent resistance-capacitance (RC) model; it is fast and accurate, and the latest release, HotSpot 6.0, is used. The application generator randomly generates task graphs, producing for each simulated application the number of tasks, the computation amount of each task (range 100-800), the communication topology among tasks, and the communication volume on each communication edge (range 50-500). A task graph is represented as a weighted directed acyclic graph that captures the computation amount of each task and the communication dependencies among tasks.
The event-driven simulated multi-core system simulates the binding between individual task instances and simulated processors, as well as the computation performed by tasks and the communication among them. The processor resource scheduling algorithm covers bubble allocation to applications, task mapping, and run-time task migration. Task mapping is realized by binding one task to one processor. After an application finishes running, all processors occupied by its tasks are released. During simulated task execution, the computation speed of a task equals the operating frequency of its processor (the operating frequency of each simulated processor in the system is controllable), and the communication speed between a pair of tasks equals the router frequency of the processor divided by the Manhattan distance between the two corresponding processors.
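The two speed rules can be turned into a small timing helper. This is a sketch only: the description states the speeds, while dividing an amount by a speed to obtain a time, the function names, and the unit-free numbers are assumptions introduced here.

```python
from typing import Tuple

Core = Tuple[int, int]


def manhattan(a: Core, b: Core) -> int:
    """Manhattan distance between two processors on the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def computation_time(amount: float, core_freq: float) -> float:
    """Computation speed equals the operating frequency of the processor."""
    return amount / core_freq


def communication_time(volume: float, router_freq: float, src: Core, dst: Core) -> float:
    """Communication speed equals router frequency divided by Manhattan distance."""
    hops = manhattan(src, dst)
    if hops == 0:
        # One task per processor, so two communicating tasks never share a core.
        raise ValueError("source and destination must be different processors")
    return volume / (router_freq / hops)
```

Under these assumptions, a transfer between processors three hops apart takes three times as long as the same transfer between adjacent processors, which is why keeping heavily communicating tasks close together matters for the mapping.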
In the task mapping stage, two optional mapping methods are provided, rectangular mapping and communication-first mapping:
Rectangular mapping simply maps all tasks of the application contiguously, in numerical order, into a roughly square region that occupies half of the overall application region (the allocated contiguous region of idle processors).
Communication-first mapping first creates three dynamic arrays, MAP, MET and UNM, which hold respectively the tasks that have been mapped, the tasks connected to (i.e., communicating with) mapped tasks, and all other tasks not in the first two arrays. Nodes represent simulated processors. At initialization, the approximate geometric center of the application region (the allocated contiguous region of idle processors) is selected as the first node; the task with the highest accumulated communication volume is chosen as the preferred task, mapped onto the first node, and put into the MAP array. All tasks connected to the preferred task are put into the MET array in order, and the remaining tasks are put into the UNM array. For the first task in MET, its parent task and the corresponding node are found in MAP, and nodes at Manhattan distance 1, 2, 3, ... from that node are searched in turn until the first available node is found; the task is mapped onto that node and moved from MET to MAP, and the tasks connected to it that are still in UNM are moved from UNM to MET. These steps are repeated for each task in MET until the size of the MAP array equals the number of tasks of the application, at which point mapping is complete.
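The MAP/MET/UNM procedure can be sketched as follows. The input format (an edge dictionary keyed by task pairs and a list of free core coordinates), the tie-breaking by coordinate order, and the requirement that the task graph be connected are assumptions of this example; searching rings at distance 1, 2, 3, ... is implemented equivalently as picking the free core with the smallest Manhattan distance.

```python
from collections import deque
from typing import Dict, List, Tuple

Core = Tuple[int, int]


def manhattan(a: Core, b: Core) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_first_mapping(
    num_tasks: int,
    edges: Dict[Tuple[int, int], int],
    region: List[Core],
) -> Dict[int, Core]:
    """Map tasks 0..num_tasks-1 onto cores of the region; returns task -> core (MAP)."""
    if len(region) < num_tasks:
        raise ValueError("region too small")
    volume = [0] * num_tasks                      # accumulated communication volume
    adj: Dict[int, List[int]] = {t: [] for t in range(num_tasks)}
    for (u, v), w in edges.items():
        volume[u] += w
        volume[v] += w
        adj[u].append(v)
        adj[v].append(u)

    free = set(region)
    # Approximate geometric center: the region core closest to the mean coordinate.
    cr = sum(r for r, _ in region) / len(region)
    cc = sum(c for _, c in region) / len(region)
    centre = min(region, key=lambda p: (abs(p[0] - cr) + abs(p[1] - cc), p))

    first = max(range(num_tasks), key=lambda t: volume[t])   # preferred task
    mapped = {first: centre}                                  # MAP
    free.remove(centre)
    seen = {first}                                            # tasks no longer in UNM
    met = deque()                                             # MET: (task, parent task)
    for t in adj[first]:
        if t not in seen:
            seen.add(t)
            met.append((t, first))

    while met:
        task, parent = met.popleft()
        anchor = mapped[parent]
        # Nearest available core to the parent's core (ties broken by coordinates).
        target = min(free, key=lambda p: (manhattan(p, anchor), p))
        mapped[task] = target
        free.remove(target)
        for nxt in adj[task]:
            if nxt not in seen:                               # move from UNM to MET
                seen.add(nxt)
                met.append((nxt, task))

    if len(mapped) != num_tasks:
        raise ValueError("task graph is not connected")
    return mapped
```

With four tasks where task 1 communicates with tasks 0, 2 and 3, task 1 lands on the center of a 3*3 region and the other three end up on cores one hop away from it.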
According to the power model of a single processor, the power of each processor in the many-core system is calculated every microsecond and recorded as the power trace of the current moment. With a fixed period as the sampling interval, the HotSpot temperature simulator is invoked each time a sampling interval elapses to simulate the system temperature, taking as input the power trace of every moment of the run so far; HotSpot returns the instantaneous simulated temperature of each processor core in the system at that moment. When the temperature of a processor exceeds the set temperature threshold, the application that owns the hotspot performs task migration.
In the task migration stage, three optional migration modes are provided: the rectangular migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode.
The rectangular migration mode is based on rectangular mapping: the tasks mapped as a rectangle are migrated as a whole into another rectangular area of the application region.
The coldest-core-in-region migration mode searches the application region for the processor core with the lowest temperature; when that core is found and satisfies the migration condition, the overheated task is migrated onto it.
The coldest-neighbor-core-in-region migration mode searches the 8 cores adjacent to the overheated processor for the one with the lowest temperature; if a certain temperature condition is met, the tasks on that processor and on the overheated processor are exchanged, and the frequency of the overheated processor is reduced appropriately.
Fig. 2 shows an example: after three applications arrive at the system for the first time, the branch-and-bound algorithm allocates a different number of bubbles to each of the three applications so as to obtain the shortest overall response time.
Application 1 contains seven tasks; the bubble-performance model gives execution times of 80, 76, 62, 51 and 40 for 0, 1, 3, 5 and 7 bubbles respectively. (To simplify the presentation, only bubble counts that yield a regular region are considered, and the longest side of an application region is less than the side length of the system.)
Application 2 contains five tasks; the bubble-performance model gives execution times of 300, 275, 250, 217 and 150 for 0, 1, 3, 4 and 5 bubbles respectively.
Application 3 contains eight tasks; the bubble-performance model gives execution times of 195, 184, 166, 147, 125 and 99 for 0, 1, 2, 4, 6 and 8 bubbles respectively.
In the embodiment, b1, b2 and b3 denote the bubbles allocated to application 1, application 2 and application 3 respectively. Following the branch-and-bound method, each step preferentially expands the branch whose response time grows most slowly (the overall response time being the lower bound), and prunes upper-layer branches whose lower bound is greater than the lower bound of a lower-layer branch (for example, the lower bound of the first branch of the third layer is 195, so all second-layer branches with a lower bound greater than 195 are pruned). The branch with the shortest overall response time found at the bottom layer is the optimal solution; in the embodiment the optimal overall response time is 165, and all branches whose lower bound is greater than the optimal solution are pruned. The bubbles corresponding to the optimal solution are allocated to the three applications; in the embodiment they are {b1=7, b2=5, b3=6}. As shown in Fig. 2, the ancestor node of the optimal-solution node is b1=7, which allocates 7 bubbles to application 1; its parent node is b2=5, which allocates 5 bubbles to application 2; and the node itself corresponds to b3=6, which allocates 6 bubbles to application 3.
Application regions are partitioned and resources scheduled as follows: the region of application 1 must hold 7 tasks and 7 bubbles, so a processor region of size 14 is allocated to it, and the first-fit algorithm finds a contiguous 3*5 free region for it (one processor of which remains unoccupied); the region of application 2 must hold 5 tasks and 5 bubbles, so a processor region of size 10 is allocated to it, and the first-fit algorithm finds a contiguous 2*5 free region for it; the region of application 3 must hold 8 tasks and 6 bubbles, but the system has no sufficiently large free region, so application 3 waits in the queue.
For application 1, the number of tasks equals the number of bubbles, so it is mapped into its region in the rectangular mapping mode, the rectangular migration mode is selected, and application 1 starts running. For application 2, the number of tasks equals the number of bubbles, so it is mapped into its region in the rectangular mapping mode, the rectangular migration mode is selected, and application 2 starts running.
Application 1 finishes running first, and the processor cores occupied by its application region are released. The region of application 3 must hold 8 tasks and 6 bubbles, so a contiguous idle processor region of size 14 is allocated to it, and the first-fit algorithm finds a 3*5 free region for it.
For application 3, the number of bubbles is less than the number of tasks, so it is mapped into its region in the communication-first mapping mode. Its CCR is calculated to be 3.2, which is greater than the threshold, so the coldest-core-in-region migration mode is selected for it, and application 3 starts running.
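The overall response time of 165 quoted for this embodiment can be checked against the cost function of the branch-and-bound search, which takes the maximum over applications of waiting time plus execution time. The execution times below come from the tables of the embodiment; the waiting times are an assumption read off the walk-through (applications 1 and 2 start at once, application 3 starts when application 1 finishes after 40 time units), not values produced by the waiting-time model.

```python
from typing import Sequence

# Execution time per bubble count, as listed for the three applications.
EXEC_TIME = {
    1: {0: 80, 1: 76, 3: 62, 5: 51, 7: 40},
    2: {0: 300, 1: 275, 3: 250, 4: 217, 5: 150},
    3: {0: 195, 1: 184, 2: 166, 4: 147, 6: 125, 8: 99},
}


def overall_response_time(waiting: Sequence[float], execution: Sequence[float]) -> float:
    """Cost function: max over applications of (waiting time + execution time)."""
    return max(w + e for w, e in zip(waiting, execution))


allocation = {1: 7, 2: 5, 3: 6}                       # {b1=7, b2=5, b3=6}
execution = [EXEC_TIME[app][b] for app, b in allocation.items()]
waiting = [0, 0, EXEC_TIME[1][7]]                     # assumed: app 3 waits for app 1
print(overall_response_time(waiting, execution))      # 165
```

Under this reading, application 3 determines the overall response time: it waits 40 and then runs for 125, which exceeds the 150 of application 2.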
The above is a detailed description of the present invention in conjunction with specific embodiments, but the specific implementation of the invention shall not be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make several adjustments, modifications, replacements and/or variations to these embodiments without departing from the principle and spirit of the invention. The scope of protection of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A processor resource scheduling method for many-core systems based on thermal-aware dynamic task migration, characterized by comprising the following steps:
Step 1: check whether the waiting queue is empty; if it is not empty, map each application in the waiting queue and then go to step 2; if it is empty, end this round of resource scheduling and return to step 1 when the next clock cycle starts;
Step 2: check whether the arrival queue is empty; if it is not empty, go to step 3; if it is empty, no new application has arrived in this clock cycle and no processor resource scheduling is needed, so end this round of resource scheduling and return to step 1 when the next clock cycle starts;
Step 3: check whether any application is running in the many-core system; if none is running, all N*N processor resources of the system are available, and the bubble-performance model and the waiting-time model are used to estimate, for each application, the execution time and waiting time under different numbers of bubbles; these serve as inputs to the cost function of the branch-and-bound algorithm, which searches for the bubble allocation that gives the shortest overall response time; if an application is running, go to step 4;
Step 4: if an application is running, the available processors of the system are divided by the occupied application regions into a group of discontiguous free regions; the bubble-performance model and the waiting-time model are used to estimate, for each application, the execution time and waiting time under each optional number of bubbles; these serve as inputs to the cost function of the branch-and-bound algorithm, which searches for the best bubble allocation;
Step 5: application mapping stage: select a free region with the first-fit heuristic, then select the mapping mode of the application and perform the mapping;
Step 6: application run stage: using the mapping mode of the application and its own computation-to-communication ratio (Computation communication rate, CCR) as the selection criteria, select a migration mode for each type of application.
2. The processor resource scheduling method according to claim 1, characterized in that the bubble-performance model is used to calculate the execution time of each application under different numbers of bubbles; the model is a polynomial regression model. Let Πi be the execution time of the application, bi the number of bubbles contained in the application region, hi the critical-path hop count of the application, i.e., the number of routing hops on its longest weighted path, ci the average computation time of the tasks in application i, and ti the average communication data of the tasks in application i; the bubble-performance model is as follows:
Here n1 to n4 are the polynomial orders, and αk together with the other three coefficient sequences indexed by k are the fitting coefficients of the model, whose values are obtained by the maximum likelihood method. The terms of the model are the k-th power of the number of bubbles of application i, the k-th power of the critical-path hop count of application i, the k-th power of the average task computation time of application i, and, for the last term, the average communication time between every two tasks of application i.
3. The processor resource scheduling method according to claim 1, characterized in that the waiting-time model is used to calculate the waiting time of each application under different numbers of bubbles and the current arrival rate, given the region size of the current application, i.e., the sum of its number of bubbles and its number of tasks; the model is a polynomial regression model. The region area of the current application is denoted R, the total number of processors in the system is denoted |T|, and the average number of tasks of the mapped applications is denoted |Ai|. Let ηi be the waiting time of the current application; ηi is modeled with the following variables: r is the average bubble-to-task ratio of the mapped applications, e is the average execution time of the mapped applications, and λ is the application arrival rate. The waiting-time model is as follows:
Here a0 is the constant term, z is the polynomial order, r^j is the j-th power of the average bubble-to-task ratio of the mapped applications, e^j is the j-th power of the average execution time of the mapped applications, λ^j is the j-th power of the current application arrival rate, j takes values from 1 to z, and δj, εj and μj are the fitting coefficients of the respective terms, obtained by maximum-likelihood regression.
4. The processor resource scheduling method according to claim 1, characterized in that step 3 is implemented by a global manager: when subsequent applications arrive, the global manager enumerates all available processor regions in the current system, estimates for each arriving application the execution time and waiting time corresponding to these available regions, and uses the branch-and-bound algorithm to find the application-to-free-region assignment with the lowest cost.
5. The processor resource scheduling method according to claim 1, characterized in that the mapping modes of an application comprise the rectangular mapping mode and the communication-first mapping mode.
6. The processor resource scheduling method according to claim 1, characterized in that the first-fit algorithm is as follows: idle processors in the system are searched from left to right and from top to bottom, and each is tested as a possible starting point of the application region; a processor qualifies as a starting point if the product of the number of idle processors to its right in the same row and the number of idle processors below it in the same column is greater than or equal to the area of the application region; the first processor found that qualifies is taken as the top-left corner of a free region, and that region is allocated to the application; if no qualifying processor is found, the application is put into the waiting queue; once the region is selected, the application is mapped into it using the selected mapping mode; the mapping mode of an application is selected as follows: when the ratio of bubbles to tasks of the application is 1:1, the application uses the rectangular mapping mode; otherwise it uses the communication-first mapping mode.
7. The processor resource scheduling method according to claim 1, characterized in that there are three migration modes: the rectangular migration mode, the coldest-core-in-region migration mode, and the coldest-neighbor-core-in-region migration mode; in step 5, the mapping mode of the application and its own computation-to-communication ratio serve as the selection criteria, the number of bubbles of each application may not exceed its number of tasks, and the migration mode is selected under the following conditions:
If the number of bubbles equals the number of tasks, the rectangular migration mode is selected for the application so that its communication distances remain unchanged during migration, and the processors run overclocked;
If the number of bubbles does not equal the number of tasks, an application whose CCR is greater than the threshold uses coldest-core-in-region migration, and an application whose CCR is less than the threshold uses coldest-neighbor-core-in-region migration; the threshold is 2h/(h-1), where h is the critical-path hop count of the application, i.e., the number of routing hops on its longest weighted path.
8. The processor resource scheduling method according to claim 6, characterized in that the three migration modes are implemented as follows:
(1) Rectangular migration mode: the application must use rectangular mapping, and the application region must be at least twice the number of tasks in size; when invoked, all tasks are unbound from the processors they are currently mapped to and are remapped, in task-number order, onto the idle part of the application region;
(2) Coldest-core-in-region migration mode: when invoked, the processor that exceeds the temperature threshold, i.e., the overheated processor, is located through the task ID reported by hotspot detection, and the processor core with the lowest temperature in the application region is then sought; if the coldest core is found and satisfies the migration condition of this mode, namely that it is a bubble, i.e., an idle processor with no task mapped to it, the task is unbound from its original, overheated processor and mapped onto that core; if the migration condition of this mode cannot be met, i.e., the coldest core is not a bubble, the frequency of the overheated processor is reduced;
(3) Coldest-neighbor-core-in-region migration mode: when invoked, the overheated processor is located through the task ID reported by hotspot detection, and the core with the lowest temperature among its 8 adjacent cores is sought; if that core satisfies the migration condition of this mode, namely that it either already has a task mapped to it but its temperature is not higher than two thirds of the threshold temperature, or it is a bubble, task migration is performed; if the core already holds a task, the tasks on it and on the overheated processor are exchanged, i.e., two unbind-remap operations are performed; if the core is idle, a single unbind-remap is performed; if the migration condition cannot be met, i.e., the temperature of the core is higher than two thirds of the threshold temperature, the frequency of the overheated processor is reduced.
9. The processor resource scheduling method according to claim 1, characterized in that the branch-and-bound algorithm is as follows: a different number of bubbles is allocated to each application, and the current overall response time is calculated at each node as the cost function; the overall response time σ is the maximum, over the applications that have been allocated bubbles at the node, of the sum of waiting time and execution time:
σ = max{ ηi + Πi }, i ∈ {0, 1, ..., n}
where ηi is the waiting time of application i and Πi is the execution time of application i. Each step preferentially expands the branch whose lower bound grows most slowly, and prunes upper-layer nodes whose cost, i.e., response time, is higher than that of a next-layer node, finally yielding the bubble allocation with the shortest overall response time, i.e., the partition of application regions.
10. The processor resource scheduling method according to claim 1, characterized in that the optional number of bubbles is the difference between the total number of processors of each free region and the number of tasks of the application; the number of tasks of the application equals the number of processors the application needs to occupy in order to run.
CN201910049800.5A 2019-01-18 2019-01-18 Resource scheduling method for many-core system processor based on thermal perception dynamic task migration Active CN109918195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910049800.5A CN109918195B (en) 2019-01-18 2019-01-18 Resource scheduling method for many-core system processor based on thermal perception dynamic task migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910049800.5A CN109918195B (en) 2019-01-18 2019-01-18 Resource scheduling method for many-core system processor based on thermal perception dynamic task migration

Publications (2)

Publication Number Publication Date
CN109918195A true CN109918195A (en) 2019-06-21
CN109918195B CN109918195B (en) 2023-06-20

Family

ID=66960500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910049800.5A Active CN109918195B (en) 2019-01-18 2019-01-18 Resource scheduling method for many-core system processor based on thermal perception dynamic task migration

Country Status (1)

Country Link
CN (1) CN109918195B (en)

Also Published As

Publication number Publication date
CN109918195B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN109918195A (en) Many-core system processor resource dispatching method based on thermal sensing dynamic task migrating
Silva Filho et al. Approaches for optimizing virtual machine placement and migration in cloud environments: A survey
Zeng et al. Joint optimization of task scheduling and image placement in fog computing supported software-defined embedded system
Ge et al. GA-based task scheduler for the cloud computing systems
Oleghe Container placement and migration in edge computing: Concept and scheduling models
Liu et al. Strategy configurations of multiple users competition for cloud service reservation
Keshk et al. Cloud task scheduling for load balancing based on intelligent strategy
Xia Resource scheduling for piano teaching system of internet of things based on mobile edge computing
Wang et al. An adaptive model-free resource and power management approach for multi-tier cloud environments
CN102984137A (en) Multi-target server scheduling method based on multi-target genetic algorithm
CN108845886B (en) Cloud computing energy consumption optimization method and system based on phase space
Rugwiro et al. Task scheduling and resource allocation based on ant-colony optimization and deep reinforcement learning
Amin et al. iHPSA: An improved bio-inspired hybrid optimization algorithm for task mapping in Network on Chip
Sharma et al. An improved task allocation strategy in cloud using modified k-means clustering technique
CN103997515B (en) Method for selecting a computing center in a distributed cloud and its application
Yao et al. A network-aware virtual machine allocation in cloud datacenter
Hu et al. Lars: A latency-aware and real-time scheduling framework for edge-enabled internet of vehicles
Xu et al. Fog-cloud task scheduling of energy consumption optimisation with deadline consideration
Taheri et al. Hopfield neural network for simultaneous job scheduling and data replication in grids
Meng et al. Communication and cooling aware job allocation in data centers for communication-intensive workloads
Lai et al. Delay-aware container scheduling in kubernetes
Maashi et al. Elevating Survivability in Next-Gen IoT-Fog-Cloud Networks: Scheduling Optimization With the Metaheuristic Mountain Gazelle Algorithm
CN110308965A (en) Rule-based heuristic virtual machine allocation method and system for cloud data centers
Liu et al. Energy‐aware virtual machine consolidation based on evolutionary game theory
CN108182243A (en) Distributed evolutionary island model parallel method based on Spark

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant