CN103294550B - Heterogeneous multi-core thread scheduling method, system, and heterogeneous multi-core processor - Google Patents
- Publication number
- CN103294550B (CN201310206533.0A)
- Authority
- CN
- China
- Prior art keywords
- thread
- core
- scheduling
- program
- sorted lists
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a heterogeneous multi-core thread scheduling method, which comprises generating ranked lists for the threads and for the cores according to the dynamic characteristics of the programs, finding the optimal stable matching between threads and cores from those ranked lists, and scheduling the threads according to that stable matching. The method comprises: receiving the feature vector of the thread running on a core and, based on it, producing for that thread a selection priority ranking over the cores; having each core rank the threads; receiving the ranked lists of every thread and core and finding the stable matching between threads and cores; and receiving the matching result and having the operating system assign each thread to its corresponding core. The method avoids the large overhead of sampling-based scheduling. It takes more of the complex factors that affect performance and power consumption into account and needs to predict only relative order rather than specific values, which reduces model complexity while improving scheduling accuracy.
Description
Technical field
The present invention relates to thread scheduling policies for single-ISA heterogeneous multi-core processors in the case where the number of threads equals the number of cores, and in particular to a thread scheduling method in which threads and cores rank each other by priority and the Gale-Shapley algorithm then completes the scheduling.
Background
With the development of integrated circuit technology, more and more cores are integrated into a single system on chip, and chip multi-processors (CMP) have become a mainstream processor architecture. By integrating multiple identical general-purpose cores on a chip, a chip multi-processor delivers better performance for programs that run in parallel, but it is also constrained by power consumption, heat dissipation, chip area, and similar limits. To make more effective use of the limited on-chip power and area, industry and academia have proposed heterogeneous multi-core processor architectures.
Heterogeneous multi-core processors take several forms. The present invention mainly concerns single-ISA heterogeneous multi-core processors, in which cores of different types share the same instruction set. The differences between cores may come from parameters such as frequency, cache size, and power budget, or from differences in basic structural design (for example, out-of-order versus in-order execution, or instruction issue width). In addition, the present invention mainly addresses the case in which each core of the heterogeneous multi-core processor runs one single-threaded program, so the number of threads always equals the number of cores in the system and a thread can be regarded as equivalent to a program.
Different programs generally have different program characteristics. Moreover, even for a single program, the characteristics can change significantly with the input set and with the execution phase.
In a heterogeneous multi-core processor, assigning each thread to the core that best suits it according to its program characteristics is called thread scheduling. The goal of thread scheduling is to give each thread better performance on a suitable core while avoiding wasted power as far as possible, so that the limited on-chip power and area resources are used more efficiently.
Scheduling methods are either static or dynamic. A static scheduling method extracts, offline, program characteristics that are independent of the concrete execution environment, uses them to predict how each thread would perform on each type of core, and schedules each thread onto the corresponding core according to the prediction. Static scheduling exploits only the differences between programs and ignores the fact that a single program has different characteristics in different execution phases, so it has an inherent limitation.
Sampling-based dynamic scheduling proceeds in two phases: a sampling phase and a steady execution phase. The sampling phase begins when a trigger event indicates that the dynamic characteristics of a program have changed significantly. In the sampling phase, each thread is scheduled onto each type of core for a trial run, so every scheduling scheme must be traversed and the performance of each one recorded. The best-performing scheme is then selected and the system enters a long steady execution phase that lasts until the next trigger event. Sampling-based dynamic scheduling can make full use of the dynamic characteristics of programs. However, the sampling phase incurs substantial thread migration cost, and traversing the scheduling schemes forces programs to run under many non-ideal schemes, which also costs considerable performance. Moreover, the sampling overhead grows rapidly as the number of core types in the system increases, so this class of methods scales poorly and cannot be applied in practice.
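The growth of the sampling cost can be made concrete with a small sketch. It assumes, purely for illustration, that every core is distinct and that each of the N threads occupies exactly one core, so a sampling phase would have to try every one-to-one assignment. The function names are hypothetical and do not come from the original disclosure.

```python
from itertools import permutations
from math import factorial


def enumerate_schedules(threads, cores):
    """Yield every one-to-one assignment of threads to cores as a dict."""
    for ordering in permutations(cores):
        yield dict(zip(threads, ordering))


def sampling_trials(n):
    """Number of schedules to try when n threads run on n distinct cores."""
    return factorial(n)


if __name__ == "__main__":
    threads = ["T0", "T1", "T2", "T3"]
    cores = ["core0", "core1", "core2", "core3"]
    print(len(list(enumerate_schedules(threads, cores))))
    for n in (2, 4, 8, 16):
        print(n, sampling_trials(n))
```

Under this assumption the number of trial schedules is N!, which is why exhaustive sampling stops being practical well before the core count becomes large.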
To avoid the overhead of sampling, a class of heuristic scheduling methods has been proposed. These methods use hardware monitors to capture key runtime information about a program, such as IPC, cache miss rate, and stall time, apply empirical rules to that dynamic information to estimate how each thread would perform on each type of core, and then use a greedy algorithm to choose a suitable core for each thread according to the expected benefit.
Representative techniques of this class are briefly described below:
In a heterogeneous multi-core processor made up of cores with different frequencies, threads are sorted from high to low by their IPC in the previous execution phase, cores are sorted by frequency, and threads and cores are then matched by their relative positions in the two orderings. A similar approach collects information such as each thread's cache miss rate to classify threads as compute-intensive or memory-intensive, and then schedules compute-intensive threads onto big cores (for example: high frequency, large cache, out-of-order execution) and memory-intensive threads onto small cores (for example: low frequency, small cache, in-order execution). The rationale is that assigning compute-intensive threads with higher instruction-level parallelism (ILP) to big cores yields better performance, while assigning memory-intensive threads to small cores saves power. A further refinement combines the collected cache miss rate, stall time, and similar information with the structural parameters of each core to estimate the performance of each thread on the different cores, and then uses a greedy algorithm to schedule threads onto cores according to the size of the performance gain.
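The first heuristic above, pairing threads and cores by their positions in two sorted orders, can be sketched in a few lines. This is an illustrative reading of the prior-art approach, not code from the disclosure; the function name and the IPC and frequency values are hypothetical.

```python
def match_by_rank(thread_ipc, core_freq):
    """Pair the k-th highest-IPC thread with the k-th highest-frequency core.

    thread_ipc: dict mapping thread name -> IPC of the previous phase
    core_freq:  dict mapping core name   -> clock frequency (any unit)
    """
    threads = sorted(thread_ipc, key=thread_ipc.get, reverse=True)
    cores = sorted(core_freq, key=core_freq.get, reverse=True)
    return dict(zip(threads, cores))


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    ipc = {"T0": 0.6, "T1": 1.9, "T2": 1.1, "T3": 0.3}
    freq = {"core0": 1.0, "core1": 3.0, "core2": 2.0, "core3": 1.5}
    print(match_by_rank(ipc, freq))
```

The sketch makes the limitation visible: a single number per thread and a single number per core decide the whole schedule, and the cores have no say in which thread they receive.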
Such methods typically use only a few important program characteristics (such as cache miss rate and IPC) together with structural parameters of the cores (such as frequency and cache size), combined with domain knowledge or empirical rules, to estimate program performance. In practice, program performance depends on a large number of complex factors, so the predictions are often not accurate enough and the resulting schedules are unsatisfactory.
In addition, most existing scheduling methods rely on a formula-based model that considers a limited number of factors to predict the performance of each thread on each type of core. Because actual program performance depends on many complex factors, the accuracy of such predictions is limited. On the other hand, even an accurate prediction model is usually very complex to implement, and it does not necessarily lead to better scheduling. For example, suppose the actual performance of a thread on two different types of cores is (5, 4.8), model A predicts (4.9, 5.1), and model B predicts (10, 1). Model A's prediction is clearly closer to the true values, yet the scheduling decision based on model B's prediction is the more reliable one, because model B orders the two cores correctly and model A does not. As this example shows, what is actually needed is not the exact performance of a thread on each core but a relative relationship, namely the predicted performance ranking of the thread across the cores.
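The numbers in this example can be checked directly. The sketch below uses only the three value pairs given in the text; the helper names are hypothetical. It compares the two models by total absolute error and by the core ordering each one implies.

```python
def ranking(perf):
    """Return core indices ordered from best to worst performance."""
    return sorted(range(len(perf)), key=lambda i: perf[i], reverse=True)


def total_abs_error(pred, actual):
    """Sum of absolute prediction errors over all cores."""
    return sum(abs(p - a) for p, a in zip(pred, actual))


actual = (5.0, 4.8)    # true performance on the two core types
model_a = (4.9, 5.1)   # small error, wrong order
model_b = (10.0, 1.0)  # large error, right order

if __name__ == "__main__":
    for name, pred in (("A", model_a), ("B", model_b)):
        print(name, total_abs_error(pred, actual), ranking(pred) == ranking(actual))
```

Model A wins on error and loses on ordering, while model B does the opposite, and only the ordering determines which core the thread should choose.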
Existing scheduling methods also mostly take the thread's point of view, treating the thread as the decision-maker and scheduling greedily toward a single optimization goal.
In general, previously proposed scheduling methods set an optimization goal from the program's perspective and let the program, as decision-maker, choose a suitable core. The problem with this one-way selection is that, during scheduling, a core has no right to decide for itself, based on its own structural characteristics, power budget, and so on, whether to accept a thread. For example, when a compute-intensive thread selects a core, it means that this core can give that thread the best performance. From the core's point of view, however, if accepting the thread could push the core's power consumption over its power budget, the schedule is far from ideal.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a thread scheduling method and scheduling system for heterogeneous multi-core processors based on the Gale-Shapley algorithm. For the thread scheduling problem in heterogeneous multi-core processors, the present invention schedules dynamically according to changes in program characteristics. It avoids both the large overhead of sampling-based scheduling and the unsatisfactory schedules that heuristic methods produce because performance is hard to predict accurately. It also treats both threads and cores as decision-making participants, so that the needs of threads and cores are both taken into account during scheduling.
Specifically, the invention discloses a heterogeneous multi-core thread scheduling method, comprising: generating ranked lists for the threads and for the cores according to the dynamic characteristics of the programs, finding the optimal stable matching between threads and cores from the ranked lists, and scheduling the threads according to that stable matching.
Generating the ranked lists for the threads and cores includes generating a ranking model, with the following steps:
(1) select an ideal database;
(2) extract sample program segments from the database;
(3) run the sample program segments on the simulator of each core to obtain the corresponding responses, and divide the sample program segments and their responses into a training set and a test set;
(4) select a suitable learning algorithm to train the ranking model;
(5) end the training phase when the test error of the ranking model meets the requirement.
Each sample program segment has a feature vector. For a thread, the input is the feature vector of one sample program segment and the output is a ranked list of the cores. For a core, the input is the feature vectors of the sample program segments of the several threads and the output is that core's ranked list of the threads.
The heterogeneous multi-core thread scheduling method specifically comprises the following steps:
collecting the various kinds of dynamic information of a running thread and outputting the feature vector of a sample program segment of the thread;
receiving the feature vector of the thread running on the core and, based on it, producing for that thread a selection priority ranking over the cores;
having each core rank the threads;
receiving the ranked lists of every thread and core and finding the stable matching between threads and cores;
receiving the matching result and having the operating system assign each thread to its corresponding core.
In the heterogeneous multi-core thread scheduling method, finding the stable matching between threads and cores comprises the following steps:
(1) each thread proposes to cores in descending order of its priority ranking; if the core has no match yet, it accepts the request and a matched pair is formed;
(2) if the core already has a match, it compares the priority of the new thread with that of its current match: if the new thread has higher priority than the previously accepted thread, the core accepts the new thread as its match; otherwise it rejects the new request;
(3) a rejected thread proposes to the next core on its ranked list, and this continues until every thread and core has found a match.
Finding the stable matching between threads and cores includes using the Gale-Shapley algorithm.
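Steps (1) to (3) describe the thread-proposing form of the Gale-Shapley algorithm. The following is a minimal sketch of that procedure, assuming equal numbers of threads and cores and complete preference lists on both sides. The function name and the preference lists in the usage example are hypothetical illustrations, not data from the disclosure.

```python
from collections import deque


def stable_match(thread_prefs, core_prefs):
    """Thread-proposing Gale-Shapley matching.

    thread_prefs: dict thread -> list of cores, most preferred first
    core_prefs:   dict core   -> list of threads, most preferred first
    Returns a dict thread -> core.
    """
    # Position of each thread on each core's list (lower is better).
    rank = {c: {t: i for i, t in enumerate(p)} for c, p in core_prefs.items()}
    next_choice = {t: 0 for t in thread_prefs}  # next core each thread will try
    holder = {}                                 # core -> currently accepted thread
    free = deque(thread_prefs)                  # threads without a match

    while free:
        t = free.popleft()
        c = thread_prefs[t][next_choice[t]]
        next_choice[t] += 1
        current = holder.get(c)
        if current is None:                     # step (1): core is unmatched
            holder[c] = t
        elif rank[c][t] < rank[c][current]:     # step (2): core prefers newcomer
            holder[c] = t
            free.append(current)                # displaced thread proposes again
        else:                                   # step (2): core rejects newcomer
            free.append(t)                      # step (3): try the next core
    return {t: c for c, t in holder.items()}


if __name__ == "__main__":
    # Hypothetical preference lists, for illustration only.
    thread_prefs = {
        "T0": ["Core2", "Core1", "Core0", "Core3"],
        "T1": ["Core1", "Core3", "Core0", "Core2"],
        "T2": ["Core2", "Core0", "Core1", "Core3"],
        "T3": ["Core0", "Core2", "Core1", "Core3"],
    }
    core_prefs = {
        "Core0": ["T3", "T0", "T1", "T2"],
        "Core1": ["T0", "T1", "T2", "T3"],
        "Core2": ["T2", "T0", "T3", "T1"],
        "Core3": ["T1", "T2", "T3", "T0"],
    }
    print(stable_match(thread_prefs, core_prefs))
```

Each thread proposes to any given core at most once, so with N threads and N cores the loop runs at most N squared times. The order in which free threads are processed does not change the final matching.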
The invention also discloses a heterogeneous multi-core thread scheduling system, characterized by comprising an information acquisition module, a T-ranker, a C-ranker, a matchmaker, and a thread scheduler, wherein:
the information acquisition module collects the various kinds of dynamic information of each running thread and outputs the feature vector of a sample program segment of each thread;
the T-ranker receives the feature vector of the thread running on its core and, based on it, produces for that thread a selection priority ranking over the cores;
the C-ranker ranks the threads on behalf of each core;
the matchmaker receives the ranked lists of every thread and every core and obtains the stable matching between threads and cores;
the thread scheduler receives the matching result and, through the operating system, assigns each thread to its corresponding core.
The invention also discloses a heterogeneous multi-core processor that uses any of the heterogeneous multi-core thread scheduling methods described above.
The invention also discloses a heterogeneous multi-core processor that includes the heterogeneous multi-core thread scheduling system described above.
The beneficial effects of the invention are as follows. It exploits the dynamic characteristics of programs while avoiding the large overhead of sampling-based scheduling. By replacing empirical formulas with a nonlinear learned ranking model for predicting performance and power consumption, it can take more of the complex factors that affect performance and power into account, and because only relative order rather than specific values must be predicted, it reduces model complexity while improving scheduling accuracy. During scheduling, threads and cores are both treated as independent decision-makers in a game, so that the performance needs of programs and the power budgets of cores are both respected. The Gale-Shapley algorithm is used to find a Pareto-optimal stable matching, and threads are scheduled according to it.
Brief description of the drawings
Fig. 1: Offline training framework of the ranking model of the present invention
Fig. 2: Structural embodiment of a four-core heterogeneous multi-core processor of the present invention
Detailed description
The present invention borrows from game theory: threads and cores are both regarded as self-interested decision-making participants, each seeking to maximize its own performance or power payoff from its own point of view. As a result, the scheduling method accommodates the optimization goals of both threads and cores and obtains a better overall scheduling decision.
The present invention requires, for each thread, a selection priority ranking of the cores and, from each core's point of view, an acceptance priority ranking of the threads.
To obtain these rankings, a ranking model (ranker) is trained with a learning-to-rank technique. Fig. 1 shows one specific way of obtaining the ranking model:
Application database: an ideal, infinitely large database containing all programs;
Sample application phase: program segments sampled from a number of example programs; their program characteristics should be representative of most common programs, and their feature vectors can be extracted with common program analysis tools such as mika;
Simulator: for a given heterogeneous multi-core processor, the number of cores and the type of each core are fixed in advance; the sample program segments are run on the simulator of each core to obtain the corresponding responses (each core response), and the segments and their responses are divided into a training set and a test set;
Learning algorithm: training the ranking model (ranker) is a supervised learning process; a suitable learning algorithm, such as RankBoost, is chosen according to the situation;
Ranker model: the training phase ends when the test error of the ranking model meets the requirement.
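The data handling around the learning algorithm can be sketched without committing to a particular learner. The sketch below assumes that a simulator response is a per-core score where higher is better, that the split is random, and that test error is measured as the fraction of wrongly ordered pairs. All three are illustrative assumptions, and the function names are hypothetical. The learning step itself (for example RankBoost) is deliberately left out.

```python
import random
from itertools import combinations


def rank_from_response(response):
    """Turn simulator scores (core -> score, higher is better) into a ranked list."""
    return sorted(response, key=response.get, reverse=True)


def split(samples, test_fraction=0.25, seed=0):
    """Randomly divide samples into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def pairwise_error(predicted, truth):
    """Fraction of item pairs that `predicted` orders differently from `truth`.

    Both arguments are ranked lists (best first) over the same items,
    and must contain at least two items.
    """
    pos = {item: i for i, item in enumerate(predicted)}
    pairs = list(combinations(truth, 2))  # (better, worse) according to truth
    wrong = sum(1 for better, worse in pairs if pos[better] > pos[worse])
    return wrong / len(pairs)
```

Training would stop once the average `pairwise_error` of the ranker on the test set falls below whatever threshold the designer considers acceptable.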
For a thread, the input of the T-ranker is the feature vector of one program segment and the output is a ranked list of the cores. For the T-ranker, the set of cores to be ranked is fixed and the only input variable is the feature vector of the program segment, so a single trained T-ranker can be used on all cores. For a given core, the input of the C-ranker is the feature vectors of the threads' program segments and the output is that core's ranked list of the threads. For the C-ranker, the threads to be ranked keep changing, and each core has its own structural configuration, so even the same group of threads is ranked differently by different cores. A separate C-ranker must therefore be trained for each core.
After training, the ranking models can be implemented in hardware and integrated on each core, or provided as part of the scheduling system of the present invention.
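The difference between the two rankers is mostly a difference in what goes in and what comes out. The sketch below shows only those interfaces. The learned models are replaced by hypothetical scoring functions supplied by the caller, so nothing here should be read as the actual ranker.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Scorer = Callable[[Features], float]


def t_rank(features: Features, core_scorers: Dict[str, Scorer]) -> List[str]:
    """Thread side: one feature vector in, all cores out, best first.

    core_scorers stands in for the single trained T-ranker: it maps each
    core to a function estimating how well this thread would do there.
    """
    return sorted(core_scorers, key=lambda core: core_scorers[core](features), reverse=True)


def c_rank(thread_features: Dict[str, Features], scorer: Scorer) -> List[str]:
    """Core side: feature vectors of all threads in, threads out, best first.

    scorer stands in for the C-ranker trained for one particular core.
    """
    return sorted(thread_features, key=lambda t: scorer(thread_features[t]), reverse=True)
```

`t_rank` is called once per thread with the same `core_scorers`, which mirrors the point that one T-ranker serves every core, while `c_rank` needs a different `scorer` for each core.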
After the ranking models have produced a ranked list for each thread and each core, the Gale-Shapley algorithm finds a stable matching, and threads are scheduled according to the matching result. This reaches a Pareto-optimal state in which every thread and core is relatively satisfied, which in effect achieves an approximately globally optimal thread schedule.
Suppose set A and set B each contain N elements, and each element has its own priority-ordered list containing every element of the other set. The Gale-Shapley algorithm can then always find a stable matching between the two sets, in which each element obtains the best partner available to it. A matching is unstable if there exist an element a in set A and an element b in set B that each rank the other above their current partner; a and b would then rather abandon their current partners and pair with each other. A matching with no such unstable pair is a stable matching. For sets A and B there may be several stable matchings. It has been proven that the matching found by the Gale-Shapley algorithm is always Pareto-optimal and is the best of all stable matchings for the side that makes the proposals.
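The definition of instability translates directly into a check for such pairs. The following sketch lists every pair (a, b) that would rather be matched to each other than to their current partners; an empty result means the matching is stable. The function name is hypothetical, and the sketch assumes complete preference lists and a one-to-one matching.

```python
def blocking_pairs(matching, a_prefs, b_prefs):
    """Return all pairs (a, b) that make `matching` unstable.

    matching: dict a -> b (one-to-one)
    a_prefs:  dict a -> list of b, most preferred first
    b_prefs:  dict b -> list of a, most preferred first
    """
    partner_of_b = {b: a for a, b in matching.items()}
    pairs = []
    for a, prefs in a_prefs.items():
        for b in prefs:
            if b == matching[a]:
                break  # every later b is worse than a's current partner
            # a prefers b to its partner; check whether b also prefers a.
            current = partner_of_b[b]
            if b_prefs[b].index(a) < b_prefs[b].index(current):
                pairs.append((a, b))
    return pairs
```

Running this check on the output of a Gale-Shapley implementation is a convenient sanity test, since a correct implementation should always yield an empty list.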
To illustrate the thread scheduling method for heterogeneous multi-core processors based on the Gale-Shapley algorithm provided by the present invention, a four-core heterogeneous multi-core processor is used as an example. The present invention can obviously be extended to heterogeneous multi-core processors that integrate more cores, and it places no restriction on the types of cores.
As shown in Fig. 2, the four-core heterogeneous multi-core processor includes, in addition to the four cores, the following components: an information acquisition module (Monitor) on each core, a T-ranker on each core, one C-ranker, one Matchmaker, and one thread Scheduler.
Monitor: collects the various kinds of dynamic information of a running thread, including but not limited to cache miss rate, stall time, integer instruction count, and floating-point instruction count, and outputs the feature vector of a program segment of the thread;
T-ranker: receives the feature vector of the thread running on its core and, based on it, produces for that thread a selection priority ranking over the cores, generally using performance as the ranking criterion;
C-ranker: internally integrates four ranking models, one per core, each ranking the four threads on behalf of its core; the ranking criterion can be set to the performance-to-power ratio, from high to low, subject to the power budget. Because every individual ranker needs the feature vectors of all four threads, placing them together reduces communication overhead: the information need only be received once from the Monitors of the four cores;
Matchmaker: receives the ranked lists of every thread and core and finds the stable matching with the Gale-Shapley algorithm;
Scheduler: receives the matching result from the Matchmaker and, through the operating system, assigns each thread to its corresponding core.
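The C-ranker's ranking criterion can be illustrated with a short sketch. It assumes that, for one core, an estimate of performance and power is available for every thread. The text does not say how threads that exceed the power budget are ordered, so placing them after all threads that fit is an assumption made here to keep the list complete. The function name and the numbers are hypothetical.

```python
def core_preference(estimates, power_budget):
    """Rank threads for one core, best first.

    estimates: dict thread -> (performance, power) predicted on this core
    Threads within the power budget come first, ordered by
    performance-to-power ratio; threads over budget follow (assumption).
    """
    def key(thread):
        perf, power = estimates[thread]
        within_budget = power <= power_budget
        return (within_budget, perf / power)

    return sorted(estimates, key=key, reverse=True)


if __name__ == "__main__":
    # Hypothetical estimates for a single core, for illustration only.
    estimates = {
        "T0": (2.0, 4.0),
        "T1": (1.5, 2.0),
        "T2": (3.0, 3.0),
        "T3": (1.2, 2.0),
    }
    print(core_preference(estimates, power_budget=3.0))
```

In the design described above the ordering comes from a trained model rather than from explicit estimates; the sketch only shows what the criterion rewards.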
To make the objects, technical solutions, and advantages of the present invention clearer, the thread scheduling method for heterogeneous multi-core processors based on the Gale-Shapley algorithm is described in further detail below with reference to the drawings and an embodiment. It should be understood that the specific embodiment described here only explains the present invention and does not limit it.
The thread scheduling method of this embodiment, for a heterogeneous multi-core processor based on the Gale-Shapley algorithm, comprises generating ranked lists for the threads and the cores according to the dynamic characteristics of the programs, and using the Gale-Shapley algorithm to find an optimal stable matching from the ranking results for thread scheduling. This embodiment assumes that the system has only four heterogeneous cores, core0, core1, core2, and core3, each with a different structural configuration. The present invention can obviously be extended to heterogeneous processors with more cores. The implementation does not differ greatly from the four-core example here and is not described separately, but all such extensions should be regarded as falling within the scope of the invention.
Taking the ranking-model training framework shown in Fig. 1 as an example, the offline training of the ranker models is described below.
First, representative example programs such as SPEC2006 are used as the program library and cut into a series of program segments according to some rule, for example treating every ten million instructions as one program segment. A program analysis tool such as mika extracts the feature vector of each program segment, which may contain ILP, integer instruction count, floating-point instruction count, cache miss rate, and other information. A number of program segments are chosen at random from the library and simulated on the simulators of the four cores core0, core1, core2, and core3 to obtain performance information such as IPC and power information such as the performance-to-power ratio. The randomly chosen program segments and their simulation results are randomly divided into a training set and a test set, and a chosen learning-to-rank algorithm such as RankBoost trains the ranking models. This training is a supervised learning process.
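The cutting rule mentioned above, one segment per ten million instructions, amounts to computing fixed-size instruction ranges. A minimal sketch follows, assuming the last segment is simply allowed to be shorter; the function name is hypothetical.

```python
def segment_bounds(total_instructions, segment_size=10_000_000):
    """Return (start, end) instruction ranges, end exclusive, one per segment."""
    return [
        (start, min(start + segment_size, total_instructions))
        for start in range(0, total_instructions, segment_size)
    ]
```

For a program of 25 million instructions this yields two full segments and one final segment of 5 million instructions.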
For the T-ranker, the input is the feature vector of one program segment and the output is the performance ranking of that program segment across the four cores, that is, the thread's selection priority ranking of the cores. As noted above, a single trained T-ranker model can serve all four cores. For the C-ranker of a given core, the input is the feature vectors of four different program segments run on that core and the output is the ranking of the four by performance-to-power ratio, that is, the core's acceptance priority ranking of the threads. A separate ranking model must be trained for each core. The training phase ends when the model's error on the test set is acceptably low.
After training, the ranking models are implemented in hardware on the heterogeneous multi-core processor and used for thread scheduling.
Taking the scheduling architecture of the heterogeneous multi-core processor shown in Fig. 2 as an example, the implementation of the Gale-Shapley-based thread scheduling method is described below.
Suppose four threads T0, T1, T2, and T3 run on this heterogeneous multi-core processor. At initialization there is no prior information about the threads, so they are scheduled onto the four cores at random, giving for example the matching (T0, core0), (T1, core1), (T2, core2), (T3, core3).
After running for some time, each Monitor collects the dynamic program characteristics on its core and sends them to the corresponding T-ranker and to the C-ranker, which produce the following ranking results:
Table 1: Threads' rankings of the cores
Table 2: Cores' rankings of the threads
Once these ranked lists are available, they are sent to the Matchmaker, which finds an optimal stable matching with the Gale-Shapley algorithm:
First, T0 proposes to Core2 according to its selection priority ranking; Core2 has no match yet, so it accepts T0's request, forming the pair (T0, Core2);
Next, T1 proposes to Core1 according to its selection priority ranking; Core1 has no match yet, so it accepts T1's request, forming the pair (T1, Core1);
Then T2 proposes to Core2 according to its selection priority ranking; Core2 is already matched with T0, so it checks its acceptance priority ranking, finds that T2 has higher priority than T0, accepts T2's request, and forms the new pair (T2, Core2);
Because Core2 is now matched with T2, T0 has lost its match and proposes to Core1, the next core in its descending order; T0 ranks above T1 on Core1's list, so Core1 accepts, forming the new pair (T0, Core1).
Continuing in this way, each thread proposes to cores in descending order of its priority ranking. If the core has no match, it accepts the request and a pair is formed. If the core already has a match, it compares the priority of the new thread with that of its current match: if the new thread ranks higher than the previously accepted thread, the core accepts the new thread as its match; otherwise it rejects the new request. A rejected thread proposes to the next core on its ranked list. When every thread and core has found a match, the Gale-Shapley matching process ends. It has been proven that the resulting matching is always stable and is the best of all stable matchings. The matching obtained in this way is also Pareto-optimal, because no thread (or core) can improve its own payoff without reducing the payoff of another thread (or core).
The final stable matching is (T0, Core1), (T1, Core3), (T2, Core2), (T3, Core0). The Scheduler assigns each thread to its corresponding core according to the matching result. The Monitors continue to collect new program characteristics in preparation for the next round of scheduling.
The above are embodiments of the present invention. Many similar cases can be derived in the same way and are not enumerated here. In particular, the present invention merely uses the ranking algorithm RankBoost to obtain the ranking results and combines it with the Gale-Shapley algorithm to obtain a stable matching for thread scheduling. Other ranking algorithms, such as AdaRank or Rank SVM, can likewise be combined with the Gale-Shapley algorithm to achieve the same matching effect.
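Because the matching step consumes only ordered lists, any ranker that yields a score per candidate can feed it. The Python sketch below converts scores into a sorted list. The function name `ranked_list` and the score values are hypothetical, and ties are broken by name purely to keep the output deterministic.

```python
def ranked_list(scores):
    """Order candidates from highest to lowest score.

    scores: dict candidate -> numeric score from any ranking model
    Ties are broken alphabetically (an assumption made for determinism).
    """
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical scores that two different rankers might give the cores for one thread.
scores_a = {"Core0": 0.2, "Core1": 0.7, "Core2": 0.9, "Core3": 0.1}
scores_b = {"Core0": 20, "Core1": 70, "Core2": 90, "Core3": 10}

list_a = ranked_list(scores_a)
list_b = ranked_list(scores_b)
print(list_a)
print(list_b)
```

The two assumed score sets use different scales but produce the same ordering, which illustrates that only the relative order of the predictions matters to the matching step, not their absolute values.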
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Such changes and modifications fall within the scope of protection of the present invention.
Claims (7)
1. A heterogeneous multi-core thread scheduling method, characterized by comprising: generating sorted lists for threads and for cores respectively according to the dynamic characteristics of the program, finding the optimal stable matching of threads and cores according to the sorted lists, and performing thread scheduling according to this stable matching, wherein the specific steps of finding the optimal stable matching comprise:
collecting the various kinds of dynamic information of the running threads, and outputting the feature vector of a sampled program segment of each thread;
receiving the feature vector of a thread running on a core, and, on behalf of that thread, producing a selection priority ranking of the cores according to the feature vector;
ranking the threads for each core;
receiving the sorted lists of the threads and cores, and finding the stable matching result of threads and cores;
receiving the matching result, performing scheduling through the operating system, and assigning each thread to its corresponding core to run.
2. The heterogeneous multi-core thread scheduling method as claimed in claim 1, characterized in that generating the sorted lists for threads and cores comprises generating a ranking model, specifically comprising the following steps:
(1) selecting an ideal database;
(2) extracting sampled program segments from the database;
(3) running each sampled program segment on the simulator of each core and obtaining the respective responses, and dividing the sampled program segments and their responses into a training set and a test set;
(4) selecting a suitable learning algorithm to train the ranking model;
(5) ending the training stage when the test error of the ranking model meets the requirement.
3. The heterogeneous multi-core thread scheduling method as claimed in claim 2, characterized in that the sampled program segment includes a feature vector; for a thread, the feature vector of one sampled program segment is input, and a sorted list of the cores is output; for the cores, the feature vectors of the sampled program segments of the multiple threads are input, and each core's sorted list of the threads is output.
4. The heterogeneous multi-core thread scheduling method as claimed in claim 1, characterized in that finding the stable matching of threads and cores comprises the following steps:
(1) each thread sends matching requests to cores from high to low according to its priority ranking; if the core has no matching partner, it accepts the request and a matched pair is formed;
(2) if the core already has a matching partner, it compares the priority of the new thread with that of its current partner; if the new thread has higher priority than the previously accepted thread, the core accepts the new thread as its matching partner; if the new thread has lower priority than the previously accepted thread, the core rejects the new request;
(3) a rejected thread sends a matching request to the next core on its sorted list, until all threads and cores have found matching partners.
5. The heterogeneous multi-core thread scheduling method as claimed in claim 1, characterized in that finding the stable matching of threads and cores comprises using the Gale-Shapley algorithm.
6. A heterogeneous multi-core thread scheduling system, characterized by comprising an information acquisition module, a T sorter, a C sorter, an adapter, and a thread scheduler, wherein:
the information acquisition module is used for collecting the various kinds of dynamic information of each running thread, and outputs the feature vector of a sampled program segment of each thread;
the T sorter is used for receiving the feature vector of a thread running on a core and, on behalf of that thread, producing a selection priority ranking of the cores according to the feature vector;
the C sorter is used for ranking the threads for each core;
the adapter is used for receiving the sorted lists of the threads and of the cores, and obtaining the stable matching result of threads and cores;
the thread scheduler receives the matching result, performs scheduling through the operating system, and assigns each thread to its corresponding core to run.
7. A heterogeneous multi-core processor using the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310206533.0A CN103294550B (en) | 2013-05-29 | 2013-05-29 | A kind of heterogeneous polynuclear thread scheduling method, system and heterogeneous multi-nucleus processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310206533.0A CN103294550B (en) | 2013-05-29 | 2013-05-29 | A kind of heterogeneous polynuclear thread scheduling method, system and heterogeneous multi-nucleus processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103294550A CN103294550A (en) | 2013-09-11 |
CN103294550B true CN103294550B (en) | 2016-08-10 |
Family
ID=49095481
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310206533.0A Active CN103294550B (en) | 2013-05-29 | 2013-05-29 | A kind of heterogeneous polynuclear thread scheduling method, system and heterogeneous multi-nucleus processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103294550B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150050135A (en) * | 2013-10-31 | 2015-05-08 | 삼성전자주식회사 | Electronic system including a plurality of heterogeneous cores and operating method therof |
WO2015089780A1 (en) * | 2013-12-19 | 2015-06-25 | 华为技术有限公司 | Method and device for scheduling application process |
CN106293644B (en) * | 2015-05-12 | 2022-02-01 | 超威半导体产品(中国)有限公司 | Power budget method considering time thermal coupling |
WO2018018425A1 (en) * | 2016-07-26 | 2018-02-01 | 张升泽 | Method and system for allocating threads of multi-kernel chip |
CN106897248A (en) * | 2017-01-08 | 2017-06-27 | 广东工业大学 | Low-power consumption reconfiguration technique based on heterogeneous multi-processor array |
CN109710484A (en) * | 2017-10-25 | 2019-05-03 | 中国电信股份有限公司 | Method of adjustment, device and the computer readable storage medium of equipment energy consumption |
EP3676704A4 (en) * | 2017-12-26 | 2020-09-30 | Samsung Electronics Co., Ltd. | Method and system for predicting optimal number of threads for application running on electronic device |
CN109901840B (en) * | 2019-02-14 | 2020-10-27 | 中国科学院计算技术研究所 | Heterogeneous compilation optimization method for inter-thread redundancy deletion |
CN109947569B (en) * | 2019-03-15 | 2021-04-06 | Oppo广东移动通信有限公司 | Method, device, terminal and storage medium for binding core |
CN111047499A (en) * | 2019-11-18 | 2020-04-21 | 中国航空工业集团公司西安航空计算技术研究所 | Large-scale dyeing array robustness verification method |
CN113886196B (en) * | 2021-12-07 | 2022-03-15 | 上海燧原科技有限公司 | On-chip power consumption management method, electronic device and storage medium |
CN115617497B (en) * | 2022-12-14 | 2023-03-31 | 阿里巴巴达摩院(杭州)科技有限公司 | Thread processing method, scheduling component, monitoring component, server and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101634953A (en) * | 2008-07-22 | 2010-01-27 | 国际商业机器公司 | Method and device for calculating search space, and method and system for self-adaptive thread scheduling |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7516456B2 (en) * | 2003-09-25 | 2009-04-07 | International Business Machines Corporation | Asymmetric heterogeneous multi-threaded operating system |
Non-Patent Citations (1)
Title |
---|
基于线程流水线的多核线程调度策略;季园园等;《计算机工程》;20130228;第39卷(第2期);第279-287页 * |
Also Published As
Publication number | Publication date |
---|---|
CN103294550A (en) | 2013-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |