CN103294550A - Heterogeneous multi-core thread scheduling method, heterogeneous multi-core thread scheduling system and heterogeneous multi-core processor - Google Patents


Info

Publication number
CN103294550A
Authority
CN
China
Legal status
Granted
Application number
CN2013102065330A
Other languages
Chinese (zh)
Other versions
CN103294550B
Inventor
王磊
陈云霁
陈天石
陆超
李梦竹
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS
Priority to CN201310206533.0A
Publication of CN103294550A
Application granted
Publication of CN103294550B
Status: Active

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a heterogeneous multi-core thread scheduling method. The method generates a ranking list for each thread and for each core according to the dynamic characteristics of the program, finds the optimal stable match between threads and cores from these ranking lists, and schedules threads according to that stable match. Specifically, the method includes: receiving the feature vectors of the threads running on the cores and, from these feature vectors, producing each thread's priority ranking of the cores; ranking the threads for each core; receiving the ranking lists of the threads and cores and finding a stable match between threads and cores; and receiving the stable match result, whereupon the operating system schedules each thread to run on its assigned core. The method avoids the large overhead of sampling-based scheduling, takes more of the complex factors that affect performance and power consumption into account, and needs to predict only relative relations rather than specific values, which lowers model complexity while improving scheduling accuracy.

Description

Heterogeneous multi-core thread scheduling method and system, and heterogeneous multi-core processor
Technical field
The present invention relates to the field of thread scheduling policies for single-ISA heterogeneous multi-core processors in the case where the number of threads equals the number of cores, and in particular to a thread scheduling method in which threads and cores first rank each other by preference and the Gale-Shapley algorithm then completes the scheduling.
Background art
With the development of integrated-circuit technology, more and more cores are integrated on the same chip, and chip multiprocessors (CMPs) have gradually become a mainstream processor architecture. A CMP integrates multiple identical general-purpose cores on one chip to give higher performance to programs running in parallel, but it is also constrained by power consumption, heat dissipation, chip area, and so on. To make more effective use of the limited on-chip power and area, industry and academia have proposed heterogeneous multi-core processor architectures.
Heterogeneous multi-core processors come in many forms; the present invention mainly concerns single-ISA heterogeneous multi-core processors, in which cores of different types share the same instruction set. The differences between cores may be caused by parameters such as frequency, cache size, and power budget, or by differences in the basic microarchitecture design (for example, out-of-order versus in-order execution, or instruction issue width). In addition, the invention mainly addresses the case where each core of the heterogeneous multi-core processor runs one single-threaded program, so the number of threads always equals the number of cores in the system, and a thread can be regarded as equivalent to a program.
Different programs usually exhibit different program characteristics. Moreover, even the same program can show significantly different characteristics depending on its input set and execution phase.
In a heterogeneous multi-core processor, scheduling each thread to run on the core best suited to its program characteristics is called thread scheduling. Its purpose is to give threads better performance by running them on suitable cores while avoiding wasted power as far as possible, so that the limited on-chip power and area resources are used more effectively.
Scheduling methods are either static or dynamic. A static method extracts, offline, program features that are independent of the concrete execution environment, uses them to predict how each thread will perform on each type of core, and schedules each thread onto a core according to the prediction. Static methods exploit only the differences between programs and ignore the fact that a program behaves differently in different execution phases, so they have an inherent limitation.
Sampling-based dynamic scheduling divides execution into two stages: a sampling stage and a steady execution stage. When a trigger event indicates that the program's dynamic characteristics have changed significantly, the sampling stage begins. In this stage each thread is tried on every type of core, which requires traversing all scheduling schemes and recording the performance of each. The scheme with the best performance is then used for a long steady execution stage, which lasts until the next trigger event. Sampling-based dynamic scheduling can make full use of the program's dynamic characteristics, but the sampling stage incurs a large thread-migration cost, and traversing the schemes forces programs to run under many non-ideal schedules, which is also very costly in performance. Furthermore, the sampling overhead grows rapidly with the number of core types in the system, so these methods scale poorly and are hard to apply in practice.
To avoid the sampling overhead, heuristic scheduling methods have been proposed. They capture key runtime information about the program, such as IPC, cache miss rate, and stall time, through hardware monitors, use empirical rules to estimate from this dynamic information how each thread will perform on each type of core, and then use a greedy algorithm to choose a suitable core for each thread according to the size of the expected benefit.
Representative schemes of this class are briefly introduced below:
In a heterogeneous multi-core processor made of cores with different frequencies, threads are sorted from high to low by their IPC in the previous execution phase, cores are sorted by frequency, and threads are then matched to cores by their relative positions in the two orderings. In a similar approach, threads are classified as compute-intensive or memory-intensive using collected information such as the cache miss rate; compute-intensive threads are scheduled onto big cores (for example, high frequency, large cache, out-of-order execution) and memory-intensive threads onto small cores (for example, low frequency, small cache, in-order execution). The rationale is that compute-intensive threads, which have higher instruction-level parallelism (ILP), gain more performance on big cores, while memory-intensive threads can run on small cores to save power. A further refinement combines collected information such as cache miss rate and stall time with the structural parameters of each core to estimate each thread's performance on each core, and then uses a greedy algorithm to schedule threads onto cores according to the size of the performance benefit.
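The first heuristic above (sort threads by last-phase IPC, sort cores by frequency, pair them by position) can be sketched as follows. This is an illustrative sketch only; the function name and the sample IPC and frequency values are hypothetical.

```python
from typing import Dict


def match_by_rank_position(thread_ipc: Dict[str, float],
                           core_freq_ghz: Dict[str, float]) -> Dict[str, str]:
    """Pair the i-th highest-IPC thread with the i-th highest-frequency core."""
    if len(thread_ipc) != len(core_freq_ghz):
        raise ValueError("this heuristic assumes one thread per core")
    threads = sorted(thread_ipc, key=thread_ipc.get, reverse=True)
    cores = sorted(core_freq_ghz, key=core_freq_ghz.get, reverse=True)
    return dict(zip(threads, cores))


# Hypothetical last-phase IPCs and core frequencies.
print(match_by_rank_position({"T0": 0.8, "T1": 2.1, "T2": 1.3},
                             {"c0": 1.2, "c1": 2.4, "c2": 1.8}))
```

Each placement depends on a single counter and a single core parameter, which is exactly the narrow view of program behavior criticized in the following paragraphs.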
These methods usually use only a few important program characteristics (for example, cache miss rate and IPC) and core structural parameters (for example, frequency and cache size), combined with some domain knowledge or empirical rules, to estimate program performance. In reality, program performance depends on a large number of complex factors, so the predictions are often inaccurate and the resulting schedules are unsatisfactory.
In addition, most existing methods rely on a formula-based model that considers only a limited number of factors to predict how each thread will perform on each type of core. Because actual program performance depends on many complex factors, the accuracy of such predictions is limited. On the other hand, even if an accurate prediction model existed, it would usually be very complex to implement, and it would not necessarily lead to better scheduling. For example, suppose a thread's actual performance on two different types of core is (5, 4.8), model A predicts (4.9, 5.1), and model B predicts (10, 1). Model A is clearly more accurate, but the schedule derived from model B is more reliable, because B ranks the first core above the second just as the actual values do, whereas A reverses that order. This example shows that what is really needed is not the exact performance of a thread on each core but the relative relation, that is, a prediction of how the thread's performance ranks across the cores.
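The arithmetic behind this example can be checked directly. The sketch below uses only the numbers from the paragraph; mean absolute error is chosen here as an illustrative accuracy measure, and the helper names are hypothetical.

```python
actual = (5.0, 4.8)    # true performance of one thread on core 0 and core 1
model_a = (4.9, 5.1)   # close values, wrong order
model_b = (10.0, 1.0)  # far-off values, right order


def mean_abs_error(pred, truth):
    """Average absolute prediction error over the cores."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def best_core(perf):
    """Index of the core with the highest (predicted) performance."""
    return max(range(len(perf)), key=lambda i: perf[i])


for name, pred in (("A", model_a), ("B", model_b)):
    print(f"model {name}: MAE={mean_abs_error(pred, actual):.2f}, "
          f"picks core {best_core(pred)}, correct={best_core(pred) == best_core(actual)}")
```

Model A has a much lower error (0.20 versus 4.40) but sends the thread to core 1, while model B, despite its large error, picks the truly better core 0. A scheduler consumes only the ordering, so ranking accuracy is what matters.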
Furthermore, most existing methods take the thread's point of view: the thread is the decision-maker, and scheduling is greedy with respect to a single optimization objective.
In general, previously proposed methods all set one optimization objective from the program's point of view and let the program, as decision-maker, choose a suitable core. The problem with such one-sided selection is that, during scheduling, a core has no say in whether to accept a thread based on its own structural characteristics, power budget, and so on. For example, if a compute-intensive thread selects a core, that core can give the thread the best performance; from the core's point of view, however, accepting the thread may push its power consumption over its power budget, in which case the schedule is clearly not ideal.
Summary of the invention
To solve the above technical problems, the objective of the invention is to propose a thread scheduling method and scheduling system for heterogeneous multi-core processors based on the Gale-Shapley algorithm. For thread scheduling in heterogeneous multi-core processors, the invention schedules dynamically as program characteristics change. It avoids both the large overhead of sampling-based scheduling and the unsatisfactory schedules that heuristic methods produce because they struggle to predict performance accurately, and, by treating both threads and cores as participants in the decision, it takes the requirements of both into account during scheduling.
Specifically, the invention discloses a heterogeneous multi-core thread scheduling method that generates ranking lists for threads and for cores according to the program's dynamic characteristics, finds the optimal stable match between threads and cores from these lists, and schedules threads according to that stable match.
Generating the ranking lists for threads and cores involves building ranking models, which comprises the following steps:
(1) select an ideal program database;
(2) extract sample program segments from the database;
(3) run the sample program segments on a simulator of each core to obtain the corresponding responses, and divide the segments and their responses into a training set and a test set;
(4) select a suitable learning algorithm to train the ranking models;
(5) when the test error of the ranking models meets the requirement, the training stage ends.
Each sample program segment is described by a feature vector. For a thread, the model takes the feature vector of one sample program segment as input and outputs a ranking of all the cores; for a core, the model takes the feature vectors of the threads' sample program segments as input and outputs that core's ranking of the threads.
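The target rankings used to train both kinds of model can be derived from the simulator responses roughly as follows. This is a minimal sketch under stated assumptions: the response format (IPC and watts per segment per core) is hypothetical, and the two ordering criteria (IPC for a thread's ranking of cores, IPC per watt for a core's ranking of threads) follow the criteria given later in the embodiment rather than a format the patent specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical simulator output: response[segment][core] = (ipc, watts).
Response = Dict[str, Dict[str, Tuple[float, float]]]


def t_ranker_label(response: Response, segment: str) -> List[str]:
    """Thread-side target: all cores, ordered by this segment's simulated IPC."""
    per_core = response[segment]
    return sorted(per_core, key=lambda core: per_core[core][0], reverse=True)


def c_ranker_label(response: Response, core: str, segments: List[str]) -> List[str]:
    """Core-side target for one core: the given segments, ordered by IPC per watt."""
    def ipc_per_watt(seg: str) -> float:
        ipc, watts = response[seg][core]
        return ipc / watts
    return sorted(segments, key=ipc_per_watt, reverse=True)


response: Response = {
    "s0": {"c0": (1.0, 2.0), "c1": (2.0, 5.0)},
    "s1": {"c0": (0.6, 1.0), "c1": (1.8, 3.0)},
}
print(t_ranker_label(response, "s0"))                # cores in the order segment s0 prefers
print(c_ranker_label(response, "c1", ["s0", "s1"]))  # segments in the order core c1 prefers
```

Because a thread-side label is always an ordering of the same fixed set of cores, one model can serve every core, while a core-side label depends on that core's own responses; this is why the embodiment trains one shared T-ranker but a separate C-ranker per core.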
The heterogeneous multi-core thread scheduling method specifically comprises the following steps:
Collect the various kinds of runtime dynamic information of each thread and output the feature vector of a program segment of the thread;
Receive the feature vector of the thread running on each core and, from it, produce the thread's priority ranking of the cores;
Rank the threads for each core;
Receive the ranking lists of the threads and cores and find a stable match between threads and cores;
Receive the match result; the operating system then schedules each thread to run on its assigned core.
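The five steps above form one scheduling round. The sketch below wires them together with each component passed in as a function; the interface and names are hypothetical, not taken from the patent, and the `matcher` argument stands for whatever stable-matching routine (such as Gale-Shapley) is used.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Prefs = Dict[str, List[str]]


def scheduling_round(
    features: Dict[str, Features],                                     # step 1: thread -> feature vector
    t_ranker: Callable[[Features], List[str]],                         # shared thread-side ranker
    c_rankers: Dict[str, Callable[[Dict[str, Features]], List[str]]],  # one core-side ranker per core
    matcher: Callable[[Prefs, Prefs], Dict[str, str]],                 # stable-matching routine
) -> Dict[str, str]:
    """Run one round and return thread -> core for the OS scheduler to apply."""
    thread_prefs = {t: t_ranker(x) for t, x in features.items()}       # step 2
    core_prefs = {c: rank(features) for c, rank in c_rankers.items()}  # step 3
    return matcher(thread_prefs, core_prefs)                           # step 4 (step 5 applies it)
```

Keeping the rankers and the matcher as parameters mirrors the separation between the ranking steps and the matching step, so either side can be replaced (for example, a different learned ranker) without touching the other.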
In this method, finding the stable match between threads and cores comprises the following steps:
(1) Each thread proposes to cores in descending order of its priority ranking. If a core has no partner yet, it accepts the proposal and forms a matched pair with the thread.
(2) If the core already has a partner, it compares the priority of the new thread with that of its current partner. If the new thread has higher priority than the previously accepted thread, the core accepts the new thread as its partner and the previous thread becomes unmatched; if the new thread has lower priority, the core rejects the new request.
(3) A rejected or displaced thread proposes to the next core on its ranking list; this continues until every thread and every core has a partner.
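Steps (1) to (3) are the thread-proposing form of the Gale-Shapley (deferred acceptance) algorithm. The sketch below is one straightforward Python rendering under the method's assumptions (equal numbers of threads and cores, complete preference lists); the function name and the preference lists in the example are hypothetical and are not the embodiment's data.

```python
from collections import deque
from typing import Dict, List


def gale_shapley(thread_prefs: Dict[str, List[str]],
                 core_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Thread-proposing Gale-Shapley; returns a thread -> core matching.

    Assumes equal numbers of threads and cores and complete preference lists.
    """
    # rank[c][t]: position of thread t in core c's list (lower = preferred)
    rank = {c: {t: i for i, t in enumerate(prefs)} for c, prefs in core_prefs.items()}
    next_choice = {t: 0 for t in thread_prefs}  # next core index each thread will try
    holder: Dict[str, str] = {}                  # core -> thread it currently holds
    free = deque(thread_prefs)                   # threads without a partner
    while free:
        t = free.popleft()
        c = thread_prefs[t][next_choice[t]]
        next_choice[t] += 1
        current = holder.get(c)
        if current is None:                      # step (1): unmatched core accepts
            holder[c] = t
        elif rank[c][t] < rank[c][current]:      # step (2): core prefers the newcomer
            holder[c] = t
            free.append(current)                 # displaced thread must propose again
        else:                                    # step (2): core rejects the newcomer
            free.append(t)                       # step (3): t will try its next core
    return {t: c for c, t in holder.items()}


thread_prefs = {"T0": ["c0", "c1", "c2"], "T1": ["c0", "c2", "c1"], "T2": ["c1", "c0", "c2"]}
core_prefs = {"c0": ["T1", "T0", "T2"], "c1": ["T0", "T2", "T1"], "c2": ["T0", "T1", "T2"]}
print(gale_shapley(thread_prefs, core_prefs))
```

In this example T0 first takes c0 but is displaced by T1, which c0 prefers; T0 then displaces T2 from c1; T2 is rejected by c0 and settles on c2, giving T0 on c1, T1 on c0, and T2 on c2. Each thread proposes to each core at most once, so the loop ends after at most N² proposals.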
Finding the stable match between threads and cores uses the Gale-Shapley algorithm.
The invention also discloses a heterogeneous multi-core thread scheduling system, characterized by comprising an information acquisition module, a T-ranker, a C-ranker, a matcher, and a thread scheduler, wherein:
the information acquisition module collects the various kinds of runtime dynamic information of each thread and outputs the feature vector of a program segment of each thread;
the T-ranker receives the feature vector of the thread running on its core and, from it, produces the thread's priority ranking of the cores;
the C-ranker ranks the threads for each core;
the matcher receives the ranking lists of the threads and cores and obtains a stable match between threads and cores;
the thread scheduler receives the match result and, through the operating system, schedules each thread to run on its assigned core.
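The information acquisition module's output can be pictured as a fixed-length vector of normalized counters. The sketch below assumes a particular counter set and normalization (per-cycle and per-instruction ratios); the patent lists quantities such as cache miss rate, stall time, and integer and floating-point instruction counts but does not fix an encoding, so the class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentCounters:
    """Hypothetical raw event counts a monitor might read for one program segment."""
    instructions: int
    cycles: int
    cache_accesses: int
    cache_misses: int
    stall_cycles: int
    int_instructions: int
    fp_instructions: int


def feature_vector(c: SegmentCounters) -> List[float]:
    """Turn raw counts into ratios so segments of different lengths are comparable."""
    return [
        c.instructions / c.cycles,                  # IPC
        c.cache_misses / max(c.cache_accesses, 1),  # cache miss rate
        c.stall_cycles / c.cycles,                  # fraction of cycles stalled
        c.int_instructions / c.instructions,        # share of integer instructions
        c.fp_instructions / c.instructions,         # share of floating-point instructions
    ]


print(feature_vector(SegmentCounters(1000, 2000, 400, 40, 500, 600, 200)))
```

Normalizing by cycles or instructions is a design choice made here so that a ranker trained on fixed-length segments sees values on the same scale regardless of how long the monitor's sampling window actually was.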
The invention also discloses a heterogeneous multi-core processor that uses any of the above heterogeneous multi-core thread scheduling methods.
The invention also discloses a heterogeneous multi-core processor that includes the above heterogeneous multi-core thread scheduling system.
The beneficial effects of the invention are as follows. It avoids the large overhead of sampling-based scheduling while still exploiting the program's dynamic characteristics. Replacing empirical formulas for estimating performance and power with a nonlinear learned ranking model lets more of the complex factors affecting performance and power be taken into account, and because only relative relations rather than specific values need to be predicted, model complexity is reduced while scheduling accuracy improves. During scheduling, threads and cores are both treated as independent decision-makers in a game, so both the performance requirements of programs and the power budgets of cores are respected. Finally, the Gale-Shapley algorithm is used to find a Pareto-optimal stable match, according to which threads are scheduled.
Brief description of the drawings
Fig. 1: Offline training framework for the ranking models of the invention
Fig. 2: Structure of an embodiment of the invention on a four-core heterogeneous multi-core processor
Embodiment
The invention draws on game theory: threads and cores are both regarded as selfish decision-makers, each trying, from its own point of view, to maximize its performance or power benefit. This lets the scheduling method balance the optimization objectives of both threads and cores and thereby reach a better overall scheduling decision.
The invention needs each thread's selection priority ranking of the cores and, from each core's point of view, its acceptance priority ranking of the threads.
To obtain these rankings, learning-to-rank techniques are used to train ranking models (rankers). Fig. 1 shows how the invention obtains the ranking models:
Application database: an idealized, infinitely large database containing all programs;
Sample application phase: program segments sampled from a number of example programs; their program characteristics should be representative of most common programs, and their feature vectors can be extracted with common program analysis tools such as mika;
Simulator: for a given heterogeneous multi-core processor, the number of cores and the type of each core are fixed in advance. The sampled program segments are run on the simulator of each core to obtain the corresponding responses (each core response), and the segments and their responses are divided into a training set and a test set;
Learning algorithm: training a ranker is a supervised learning process; a suitable learning algorithm, such as RankBoost, is chosen to train the ranking models;
Ranker model: when the test error of the ranking model meets the requirement, the training stage ends.
For a thread, the input of the T-ranker is the feature vector of one program segment and the output is a ranking of all the cores. Because the set of cores to be ranked is fixed, the feature vector of the program segment is the only input variable, so a single trained T-ranker can be shared by all cores. For a given core, the input of the C-ranker is the feature vector of each thread's program segment and the output is that core's ranking of the threads. The set of threads to be ranked keeps changing, and each core has a different structural configuration, so even the same group of threads may be ranked differently by different cores; a separate C-ranker therefore has to be trained for each core.
After training, the ranking models can be implemented in hardware and integrated into each core, or implemented as part of the scheduling system of the invention.
After the ranking models produce a ranking list for every thread and every core, a stable match is found with the Gale-Shapley algorithm, and threads are then scheduled according to the match result. This reaches a Pareto-optimal state in which all threads and cores are relatively satisfied, which in effect realizes an approximately globally optimal thread schedule.
Suppose sets A and B each contain N elements, and each element has its own priority list containing all elements of the other set. Then the Gale-Shapley algorithm always finds a stable matching between the two sets. A matching is unstable if there exist an element a in A and an element b in B that each rank the other above their current partners; a and b would then prefer to abandon their current partners and match with each other. A matching with no such pair is stable. Sets A and B may have several stable matchings. It has been proved that the matching found by the Gale-Shapley algorithm is Pareto optimal and is the best of all stable matchings for the proposing side: every proposer is matched with the best partner it can have in any stable matching, while every receiver gets its least preferred partner among all stable matchings.
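The stability definition above can be checked mechanically, and for small N all stable matchings can be enumerated by brute force. The sketch below is an illustrative checker, not part of the patented system; the function names and the two-thread, two-core preference lists are hypothetical.

```python
from itertools import permutations
from typing import Dict, List

Prefs = Dict[str, List[str]]


def is_stable(match: Dict[str, str], a_prefs: Prefs, b_prefs: Prefs) -> bool:
    """match maps every A-element to its B partner; stable iff there is no blocking pair."""
    partner_of = {b: a for a, b in match.items()}
    for a, prefs in a_prefs.items():
        for b in prefs:
            if b == match[a]:
                break  # every later b is worse for a than its current partner
            # a prefers b; the pair blocks if b also prefers a to b's current partner
            if b_prefs[b].index(a) < b_prefs[b].index(partner_of[b]):
                return False
    return True


def all_stable_matchings(a_prefs: Prefs, b_prefs: Prefs) -> List[Dict[str, str]]:
    """Brute force over all N! one-to-one matchings (fine for N = 4)."""
    a_list = list(a_prefs)
    return [m for m in (dict(zip(a_list, p)) for p in permutations(b_prefs))
            if is_stable(m, a_prefs, b_prefs)]


thread_prefs = {"T0": ["c0", "c1"], "T1": ["c1", "c0"]}
core_prefs = {"c0": ["T1", "T0"], "c1": ["T0", "T1"]}
print(all_stable_matchings(thread_prefs, core_prefs))
```

Here both matchings are stable. With threads proposing, Gale-Shapley returns the first one (each thread gets its first choice); the second one would give each core its first choice. This illustrates why the optimality guarantee holds for the proposing side.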
To illustrate the Gale-Shapley-based thread scheduling method for heterogeneous multi-core processors provided by the invention, a four-core heterogeneous multi-core processor is used as an example. The invention can equally be extended to heterogeneous multi-core processors integrating more cores, and places no restriction on the core types.
As shown in Fig. 2, besides the four cores, the four-core heterogeneous multi-core processor contains the following components: an information acquisition module (Monitor) on each core, a T-ranker on each core, one C-ranker, one matcher (Matchmaker), and one thread scheduler (Scheduler).
Monitor: collects various kinds of runtime dynamic information about the thread, including but not limited to cache miss rate, stall time, number of integer instructions, and number of floating-point instructions, and outputs the feature vector of a program segment of the thread;
T-ranker: receives the feature vector of the thread running on its core and, from it, produces the thread's priority ranking of the cores, usually using performance as the ranking criterion;
C-ranker: actually integrates four ranking models, one per core, each ranking the four threads for its core; the ranking criterion can be the performance-to-power ratio, from high to low, subject to the core's power budget. Because each of these rankers needs the feature vectors of all four threads, they are placed together to reduce communication cost, so the feature vectors need to be received from the Monitors of the four cores only once;
Matchmaker: receives the ranking lists of all threads and cores and finds a stable match with the Gale-Shapley algorithm;
Scheduler: receives the match result from the Matchmaker and, through the operating system, schedules each thread to run on its assigned core.
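The C-ranker criterion described above can be written as an explicit ordering rule: threads predicted to fit within the core's power budget come first, ordered by performance-to-power ratio. This is an assumption-laden sketch. The patent does not say how over-budget threads are ordered, so placing them last, least power first, is a hypothetical choice that keeps every preference list complete, as Gale-Shapley requires. The numeric predictions are also hypothetical; in the patent the learned C-ranker outputs the ordering directly, and a rule like this one would at most define its training labels.

```python
from typing import Dict, List, Tuple


def core_preference(predicted: Dict[str, Tuple[float, float]],
                    power_budget_w: float) -> List[str]:
    """Order threads for one core from hypothetical (performance, watts) predictions.

    Within-budget threads come first, by performance per watt (descending);
    over-budget threads follow, least power first (an assumption).
    """
    within = [t for t, (_, w) in predicted.items() if w <= power_budget_w]
    over = [t for t, (_, w) in predicted.items() if w > power_budget_w]
    within.sort(key=lambda t: predicted[t][0] / predicted[t][1], reverse=True)
    over.sort(key=lambda t: predicted[t][1])
    return within + over


predicted = {"T0": (2.0, 4.0), "T1": (1.5, 2.0), "T2": (3.0, 9.0), "T3": (1.0, 1.0)}
print(core_preference(predicted, power_budget_w=5.0))
```

With a 5 W budget, T2 (9 W) drops to the end even though it has the highest raw performance, which is the behavior the background section asks of a core that should be able to refuse a thread that would exceed its power budget.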
To make the purpose, technical solution, and advantages of the invention clearer, the Gale-Shapley-based thread scheduling method for heterogeneous multi-core processors is described in further detail below with reference to the drawings and an embodiment. The specific embodiment described here only explains the invention and does not limit it.
The thread scheduling method of this embodiment generates ranking lists for threads and for cores according to the program's dynamic characteristics, uses the Gale-Shapley algorithm to find an optimal stable match from the rankings, and schedules threads accordingly. In this embodiment the system is assumed to contain only four heterogeneous cores, core0, core1, core2, and core3, each with its own structural configuration. The invention can equally be extended to heterogeneous processors with more cores; the implementation differs little from this four-core example, so it is not described separately here, but it should be considered within the scope of the invention.
First, the offline training of the ranker models is described, using the ranking-model training framework of Fig. 1 as an example.
First, representative example programs such as SPEC2006 are used as the program library and cut into a series of program segments according to some rule, for example treating every ten million instructions as one segment. A program analysis tool such as mika extracts each segment's feature vector, which can include ILP, the number of integer instructions, the number of floating-point instructions, cache miss rate, and other information. Program segments are picked at random from the library and simulated on the simulators of the four cores core0, core1, core2, and core3 to obtain the corresponding performance information (such as IPC) and power information (such as the performance-to-power ratio). The randomly selected segments and their simulation results are randomly divided into a training set and a test set, and a chosen learning-to-rank algorithm, such as RankBoost, is used to train the ranking models; this training is a supervised learning process.
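The segmentation and random split described above can be sketched as follows. The segment length comes from the example in the text; dropping a final partial segment, the 20% test fraction, and the fixed seed are assumptions made for illustration, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

SEGMENT_LEN = 10_000_000  # instructions per segment, as in the example above


def segment_bounds(total_instructions: int,
                   seg_len: int = SEGMENT_LEN) -> List[Tuple[int, int]]:
    """Cut a dynamic instruction stream into [start, end) segments; drop the short tail."""
    return [(s, s + seg_len)
            for s in range(0, total_instructions - seg_len + 1, seg_len)]


def train_test_split(items: Sequence, test_fraction: float = 0.2, seed: int = 0):
    """Randomly divide segments (with their simulation results) into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


print(segment_bounds(35_000_000))
```

Splitting at the segment level, so that a segment's simulated responses on all four cores go to the same side, keeps the test error an honest estimate of how the ranker will do on unseen program phases.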
The T-ranker takes the feature vector of one program segment as input and outputs the performance ranking of that segment across the four cores, that is, the thread's selection priority ranking of the cores; as noted above, only one T-ranker needs to be trained, and it can be used for all four cores. The C-ranker for a given core takes as input the feature vectors of four different program segments running on that core and outputs the ranking of their four performance-to-power ratios, that is, the core's acceptance priority ranking of the threads; an independent ranking model must be trained for each core. When a model's test error on the test set falls to an acceptable level, its training stage ends.
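The patent trains the rankers with RankBoost. As a much simpler stand-in (explicitly not RankBoost), the sketch below trains a linear scoring function with a pairwise perceptron. It shows the same supervised learning-to-rank loop: learn from "segment A should rank above segment B" pairs for one core, rank by score, and use the fraction of misordered held-out pairs as the test error. The training pairs, learning rate, and epoch count are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs: List[Tuple[Vector, Vector]], dim: int,
                              epochs: int = 50, lr: float = 0.1) -> List[float]:
    """pairs holds (better, worse) feature vectors; learn w so that w.better > w.worse."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            if dot(w, better) <= dot(w, worse):  # misordered or tied: nudge w
                w = [wi + lr * (bi - ci) for wi, bi, ci in zip(w, better, worse)]
    return w


def rank(w: Vector, items: Dict[str, Vector]) -> List[str]:
    """Order item names by learned score, best first (what a C-ranker would output)."""
    return sorted(items, key=lambda k: dot(w, items[k]), reverse=True)


def pairwise_error(w: Vector, pairs: List[Tuple[Vector, Vector]]) -> float:
    """Fraction of pairs the model misorders: a simple test-error measure."""
    return sum(dot(w, b) <= dot(w, c) for b, c in pairs) / len(pairs)


# Hypothetical (better segment, worse segment) pairs for one core.
train_pairs = [([1.0, 0.0], [0.0, 0.0]), ([2.0, 1.0], [1.0, 1.0]), ([0.5, 0.2], [0.1, 0.3])]
w = train_pairwise_perceptron(train_pairs, dim=2)
print(w, pairwise_error(w, train_pairs))
```

RankBoost, AdaRank, or RankSVM would replace `train_pairwise_perceptron`; the surrounding steps (building ordered pairs from simulator results, then stopping once held-out pairwise error is acceptable) stay the same.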
After training, the ranking models are implemented in hardware on the heterogeneous multi-core processor and used for thread scheduling.
The Gale-Shapley-based thread scheduling method is described in detail below, using the scheduling architecture of the heterogeneous multi-core processor shown in Fig. 2 as an example.
Suppose four threads, T0, T1, T2, and T3, run on this heterogeneous multi-core processor. At initialization there is no prior information about the threads, so they are scheduled randomly onto the four cores, giving for example the matching (T0, core0), (T1, core1), (T2, core2), (T3, core3).
After running for a while, each Monitor collects the dynamic program characteristics on its core and sends them to the corresponding T-ranker and to the C-ranker, which produce the following rankings:
Table 1: Ranking of the cores by each thread
[Table image not reproduced]
Table 2: Ranking of the threads by each core
After these ranking lists are obtained, they are sent to the Matchmaker, which finds an optimal stable match with the Gale-Shapley algorithm:
First, T0 proposes to Core2, the first choice in its selection ranking. Core2 has no partner yet, so it accepts T0's request, forming the pair (T0, Core2);
Next, T1 proposes to Core1, its first choice. Core1 has no partner yet, so it accepts T1's request, forming the pair (T1, Core1);
Next, T2 proposes to Core2, its first choice. Core2 is already matched with T0, so Core2 checks its acceptance ranking, finds that T2 has higher priority than T0, accepts T2's request, and forms the new pair (T2, Core2);
Because Core2 is now matched with T2, T0 loses its partner and proposes to the next core on its list, Core1. T0 has higher priority than T1 on Core1's list, so Core1 accepts, forming the new pair (T0, Core1), and T1 in turn loses its partner.
By that analogy, thread proposes matching request to nuclear from high to low according to its prioritization, does not have match objects as fruit stone, and it is right with its formation coupling then to select to accept request; As fruit stone match objects has been arranged, then newer thread and the priority of match objects are if the thread that the priority of new thread is accepted before being higher than is then selected to accept new thread as match objects, if the thread of accepting before the priority of new thread is lower than is then refused new request; Unaccepted thread is reselected next nuclear proposition matching request on the sorted lists; All found match objects up to all threads and nuclear, the matching process that carries out according to the Gale-Shapley algorithm finishes.Theoretical proof, this coupling necessarily is in steady state (SS), and is a kind of of optimum in all stable couplings.And undoubtedly, the coupling that obtains according to said process is Pareto optimality, because there is not thread (or nuclear) to improve self benefits under the prerequisite of not damaging other thread (or nuclear) income.
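The proposal-and-acceptance procedure above is the thread-proposing form of Gale-Shapley deferred acceptance. The sketch below is a minimal software model of what the Matchmaker computes, not the patent's hardware design. Because Tables 1 and 2 are not reproduced in this text, the preference lists are hypothetical, chosen only so that the run follows the steps narrated above and ends in the matching reported in the next paragraph.

```python
def gale_shapley(thread_prefs, core_prefs):
    """Thread-proposing Gale-Shapley (deferred acceptance).

    thread_prefs[t]: cores in thread t's selection-priority order, best first
    core_prefs[c]:   threads in core c's acceptance-priority order, best first
    Returns {thread: core}. Assumes as many threads as cores and complete
    lists, so every thread and every core ends up matched.
    """
    # rank[c][t] is t's position on core c's list (smaller means preferred)
    rank = {c: {t: i for i, t in enumerate(prefs)}
            for c, prefs in core_prefs.items()}
    next_choice = {t: 0 for t in thread_prefs}  # index of next core to try
    partner = {}                                 # core -> accepted thread
    free = list(reversed(list(thread_prefs)))    # stack, so T0 proposes first

    while free:
        t = free.pop()
        c = thread_prefs[t][next_choice[t]]
        next_choice[t] += 1
        current = partner.get(c)
        if current is None:
            partner[c] = t               # core has no partner: accept
        elif rank[c][t] < rank[c][current]:
            partner[c] = t               # core prefers the newcomer
            free.append(current)         # displaced thread proposes next
        else:
            free.append(t)               # rejected: try its next core
    return {t: c for c, t in partner.items()}


# Hypothetical preference lists (NOT the patent's Tables 1 and 2), chosen
# only to be consistent with the proposal steps narrated in the text.
THREAD_PREFS = {
    "T0": ["Core2", "Core1", "Core0", "Core3"],
    "T1": ["Core1", "Core3", "Core0", "Core2"],
    "T2": ["Core2", "Core0", "Core1", "Core3"],
    "T3": ["Core0", "Core3", "Core1", "Core2"],
}
CORE_PREFS = {
    "Core0": ["T3", "T2", "T0", "T1"],
    "Core1": ["T0", "T1", "T2", "T3"],
    "Core2": ["T2", "T0", "T1", "T3"],
    "Core3": ["T1", "T0", "T3", "T2"],
}

if __name__ == "__main__":
    print(sorted(gale_shapley(THREAD_PREFS, CORE_PREFS).items()))
    # [('T0', 'Core1'), ('T1', 'Core3'), ('T2', 'Core2'), ('T3', 'Core0')]
```

Displaced and rejected threads are pushed back onto a stack so that they propose next, mirroring the narrative order. The final result does not depend on this order: thread-proposing deferred acceptance always yields the same thread-optimal stable matching.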
The stable coupling that finally obtains is: (T0, Core1), (T1, Core3), (T2, Core2), (T3, Core0).Scheduler is dispatched to corresponding nuclear according to matching result respectively with thread and goes up operation.Monitor continues to gather new performance of program, for scheduling is next time prepared.
The above is one embodiment of the invention; many other cases can be derived in the same way and are not listed one by one. In particular, the invention uses the ranking algorithm RankBoost to obtain the ranking results and combines them with the Gale-Shapley algorithm to obtain a stable matching for thread scheduling; other ranking algorithms, such as AdaRank or RankSVM, can likewise be combined with the Gale-Shapley algorithm to obtain the same kind of stable matching.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Such changes and modifications fall within the scope of protection of the invention.

Claims (9)

1. A heterogeneous multi-core thread scheduling method, characterized by comprising: generating ranking lists for threads and for cores, respectively, according to the dynamic behavior features of programs; finding the optimal stable matching between threads and cores according to the ranking lists; and performing thread scheduling according to this stable matching.
2. The heterogeneous multi-core thread scheduling method according to claim 1, characterized in that generating the ranking lists for threads and cores comprises generating ranking models, specifically comprising the following steps:
(1) selecting a suitable database;
(2) extracting program sampling fragments from this database;
(3) running the program sampling fragments on the simulator of each core respectively to obtain the corresponding responses, and dividing the program sampling fragments and their responses into a training set and a test set;
(4) selecting a suitable learning algorithm to train the ranking models;
(5) ending the training phase when the test error of the ranking models meets the requirement.
3. The heterogeneous multi-core thread scheduling method according to claim 2, characterized in that each program sampling fragment comprises a feature vector; for a thread, the feature vector of one program sampling fragment is input and the thread's ranking list of the cores is output; for a core, the feature vectors of the threads' program sampling fragments are input and the core's ranking list of the threads is output.
4. The heterogeneous multi-core thread scheduling method according to claim 1, characterized by specifically comprising the following steps:
collecting various kinds of dynamic runtime information of a thread, and outputting the feature vector of a program sampling fragment of the thread;
receiving the feature vector of the thread running on a core, and based on it producing that thread's priority ranking of the cores;
ranking the threads for each core;
receiving the ranking lists of the threads and the cores, and finding the stable matching between threads and cores;
receiving the matching result and, through operating-system scheduling, assigning each thread to its corresponding core for execution.
5. The heterogeneous multi-core thread scheduling method according to claim 1 or 4, characterized in that finding the stable matching between threads and cores comprises the following steps:
(1) each thread proposes matching requests to the cores in descending order of its priority ranking; if a core has no partner, it accepts the request and forms a matched pair with that thread;
(2) if the core already has a partner, it compares the priority of the new thread with that of its current partner; if the new thread has higher priority than the previously accepted thread, the core accepts the new thread as its partner; if it has lower priority, the core rejects the new request;
(3) a rejected thread proposes a matching request to the next core on its ranking list, until every thread and every core has found a partner.
6. The heterogeneous multi-core thread scheduling method according to claim 1 or 4, characterized in that finding the stable matching between threads and cores comprises using the Gale-Shapley algorithm.
7. A heterogeneous multi-core thread scheduling system, characterized by comprising an information collection module, a T-ranker, a C-ranker, a matchmaker and a thread scheduler, wherein:
the information collection module is configured to collect various kinds of dynamic runtime information of each thread and to output the feature vector of a program sampling fragment of each thread;
the T-ranker is configured to receive the feature vector of the thread running on its core and, based on it, to produce that thread's priority ranking of the cores;
the C-ranker is configured to rank the threads for each core;
the matchmaker is configured to receive the ranking lists of the threads and the cores and to obtain the stable matching between threads and cores;
the thread scheduler is configured to receive the matching result and, through operating-system scheduling, to assign each thread to its corresponding core for execution.
8. A heterogeneous multi-core processor employing the method of any one of claims 1 to 6.
9. A heterogeneous multi-core processor comprising claim 8.
CN201310206533.0A 2013-05-29 2013-05-29 Heterogeneous multi-core thread scheduling method, heterogeneous multi-core thread scheduling system and heterogeneous multi-core processor Active CN103294550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310206533.0A CN103294550B (en) 2013-05-29 2013-05-29 Heterogeneous multi-core thread scheduling method, heterogeneous multi-core thread scheduling system and heterogeneous multi-core processor

Publications (2)

Publication Number Publication Date
CN103294550A (en) 2013-09-11
CN103294550B (en) 2016-08-10

Family

ID=49095481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310206533.0A Active CN103294550B (en) 2013-05-29 2013-05-29 Heterogeneous multi-core thread scheduling method, heterogeneous multi-core thread scheduling system and heterogeneous multi-core processor

Country Status (1)

Country Link
CN (1) CN103294550B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050081203A1 (en) * 2003-09-25 2005-04-14 International Business Machines Corporation System and method for asymmetric heterogeneous multi-threaded operating system
CN101634953A (en) * 2008-07-22 2010-01-27 国际商业机器公司 Method and device for calculating search space, and method and system for self-adaptive thread scheduling

Non-Patent Citations (1)

Title
Ji Yuanyuan et al.: "Multi-core thread scheduling strategy based on thread pipelining", Computer Engineering *

Cited By (15)

Publication number Priority date Publication date Assignee Title
CN104679586A (en) * 2013-10-31 2015-06-03 三星电子株式会社 Electronic systems including heterogeneous multi-core processors and method of operating same
WO2015089780A1 (en) * 2013-12-19 2015-06-25 华为技术有限公司 Method and device for scheduling application process
CN105009083A (en) * 2013-12-19 2015-10-28 华为技术有限公司 Method and device for scheduling application process
CN106293644A (en) * 2015-05-12 2017-01-04 超威半导体产品(中国)有限公司 The power budget approach of consideration time thermal coupling
WO2018018425A1 (en) * 2016-07-26 2018-02-01 张升泽 Method and system for allocating threads of multi-kernel chip
CN106897248A (en) * 2017-01-08 2017-06-27 广东工业大学 Low-power consumption reconfiguration technique based on heterogeneous multi-processor array
CN109710484A (en) * 2017-10-25 2019-05-03 中国电信股份有限公司 Method of adjustment, device and the computer readable storage medium of equipment energy consumption
CN111542808A (en) * 2017-12-26 2020-08-14 三星电子株式会社 Method and system for predicting optimal number of threads for running application on electronic device
CN111542808B (en) * 2017-12-26 2024-03-22 三星电子株式会社 Method and system for predicting an optimal number of threads running an application on an electronic device
CN109901840A (en) * 2019-02-14 2019-06-18 中国科学院计算技术研究所 A kind of isomery compiling optimization method that cross-thread redundancy is deleted
CN109947569A (en) * 2019-03-15 2019-06-28 Oppo广东移动通信有限公司 Bind method, apparatus, terminal and the storage medium of core
CN109947569B (en) * 2019-03-15 2021-04-06 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for binding core
CN111047499A (en) * 2019-11-18 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Large-scale dyeing array robustness verification method
CN113886196A (en) * 2021-12-07 2022-01-04 上海燧原科技有限公司 On-chip power consumption management method, electronic device and storage medium
CN115617497A (en) * 2022-12-14 2023-01-17 阿里巴巴达摩院(杭州)科技有限公司 Thread processing method, scheduling component, monitoring component, server and storage medium

Also Published As

Publication number Publication date
CN103294550B (en) 2016-08-10

Similar Documents

Publication Publication Date Title
CN103294550A (en) Heterogeneous multi-core thread scheduling method, heterogeneous multi-core thread scheduling system and heterogeneous multi-core processor
Alipourfard et al. CherryPick: Adaptively unearthing the best cloud configurations for big data analytics
CN102360313B (en) Performance acceleration method of heterogeneous multi-core computing platform on chip
Wu et al. Using performance-power modeling to improve energy efficiency of hpc applications
CN107861606A (en) A kind of heterogeneous polynuclear power cap method by coordinating DVFS and duty mapping
Xiong et al. A characterization of big data benchmarks
CN103885826B (en) Real-time task scheduling implementation method of multi-core embedded system
CN102855153B (en) Towards the stream compile optimization method of chip polycaryon processor
CN112306658B (en) Digital twin application management scheduling method for multi-energy system
CN103473175A (en) Extraction method for software testing case set
Torng et al. Asymmetry-aware work-stealing runtimes
Zhong et al. A green computing based architecture comparison and analysis
CN108769182A (en) A kind of prediction executes the Combinatorial Optimization dispatching method of task execution time
CN108983712A (en) A kind of optimization mixes the method for scheduling task of crucial real-time system service life
US20230119235A1 (en) Large-Scale Accelerator System Energy Performance Optimization
CN105740059B (en) A kind of population dispatching method towards Divisible task
CN110347602A (en) Multitask script execution and device, electronic equipment and readable storage medium storing program for executing
CN101286138A (en) Method for multithread sharing multi-core processor secondary buffer memory based on data classification
CN104346220B (en) A kind of method for scheduling task and system
CN110048886A (en) A kind of efficient cloud configuration selection algorithm of big data analysis task
CN113010296B (en) Formalized model based task analysis and resource allocation method and system
CN106649067B (en) A kind of performance and energy consumption prediction technique and device
Visheratin et al. Hard-deadline constrained workflows scheduling using metaheuristic algorithms
Kumaraswamy et al. Exploiting dynamism in hpc applications to optimize energy-efficiency
CN103530183B (en) In large scale scale heterogeneous calculating system, task computation measurer has the dispatching method of randomness

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant