CN105844110B - GPU-based adaptive-neighborhood tabu search method for solving hardware/software partitioning - Google Patents
- Publication number
- CN105844110B (application CN201610212930.2A; application publication CN105844110A)
- Authority
- CN
- China
- Prior art keywords
- neighborhood
- solution
- gpu
- hardware
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Landscapes
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Computational Mathematics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Algebra (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a GPU-based adaptive-neighborhood tabu search method for solving hardware/software partitioning. First, the tabu search algorithm commonly used in the hardware/software partitioning field is restructured so that its computation matches the architectural features of the GPU, which prepares the algorithm for migration to the GPU. Second, to further improve performance, the invention provides not only the overall framework of the algorithm on the GPU but also optimizations of its concrete execution on the GPU. Comparative experiments show that the invention outperforms known work in both solution quality and computation speed.
Description
Technical field
The present invention belongs to the field of hardware/software co-design and relates to a hardware/software partitioning method, in particular to a GPU-based adaptive-neighborhood tabu search method for solving hardware/software partitioning.
Background
Modern embedded systems usually consist of hardware and software. Hardware here mainly means application-specific processors, which execute quickly and consume little power but are expensive; software executes slowly and consumes more power but is cheap. Each computing task running on the target embedded system is usually represented as a node with its own resource requirements. The goal of hardware/software partitioning is to map each computing task to software or hardware, subject to given constraints, so that the performance of the whole system is optimized. Hardware/software partitioning is a key step in hardware/software co-design.
Hardware/software partitioning is an NP-complete problem. The algorithms commonly used to solve it fall into two classes: exact algorithms and approximation algorithms. Exact algorithms are mainly used for small problem instances. As the problem size grows, the solution space grows exponentially and exact algorithms become infeasible, so heuristic methods are used instead.
Heuristic methods for hardware/software partitioning are divided into special-purpose heuristics and general-purpose heuristics. Special-purpose heuristics are either hardware-constraint-first or software-constraint-first. General-purpose heuristics are mainly intelligent search algorithms, such as genetic algorithms, ant colony algorithms, particle swarm optimization, simulated annealing, artificial immune algorithms, tabu search, and combinations of these methods. These intelligent optimization algorithms share the following features: they are trajectory-based or population-based algorithms that approach the target mainly through repeated iteration; in addition, each iteration initializes a number of candidate solutions, and the selection among them determines the quality of the next iteration. When the number of candidate solutions is too large, however, the running time of the algorithm increases, and for very large problems the time overhead is especially noticeable.
The execution of this class of algorithms is inherently parallel, so it is natural to parallelize them to speed up the solution process. However, very little work applies parallel intelligent optimization algorithms to hardware/software partitioning. The known published work covers only a parallel genetic algorithm and a parallel hybrid particle swarm optimization algorithm, both running on a small cluster made up of several PCs.
In recent years, many works have described GPU-accelerated intelligent optimization algorithms for combinatorial optimization problems. Inspired by this literature, the present invention explores, for the first time in the hardware/software partitioning field, the use of a GPU to accelerate tabu search for solving the partitioning problem.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the background art described above by providing a GPU-based adaptive-neighborhood tabu search method for solving hardware/software partitioning.
The technical solution adopted by the present invention is a GPU-based adaptive-neighborhood tabu search method for solving hardware/software partitioning, characterized in that it comprises the following steps:
Step 1: On the host side, obtain the initial solution of the hardware/software partitioning and its related information; set the initial solution as both the current solution and the global best solution;
Step 2: On the host side, convert the representation of the task graph to be partitioned;
Step 3: Set the parameters; initialize the storage required on the host side; initialize the storage required on the GPU side;
Step 4: On the host side, send the data to the GPU side and launch the GPU kernels;
Step 5: On the device side, compute the best candidate neighbor with the kernels and return the result to the host side;
Step 6: On the host side, update the tabu list and the tabu identifiers;
Step 7: On the host side, update the current solution and the global best solution;
Step 8: On the host side, check whether the stopping criterion is met;
If not, return to step 4;
If so, end the procedure.
Preferably, the initial solution in step 1 refers to the hardware cost of the task graph to be partitioned, obtained by a heuristic or a random method. The related information mainly includes the solution vector, the mapping vector of the solution vector, and the software cost and communication cost of the initial solution. The solution vector is an array of 0s and 1s corresponding to the hardware cost. The mapping vector of the solution vector is also an array of 0s and 1s; it mainly records the state of each node of the task graph before the nodes are sorted and is used later to compute the communication cost.
Preferably, converting the representation of the task graph in step 2 means that, before the task graph is transferred to the GPU, its adjacency-list representation is converted into the compressed sparse row (CSR) format, which is better suited to computation on the GPU.
Preferably, the parameters in step 3 mainly include the maximum number of iterations, the maximum number of consecutive non-improving iterations, the length of the tabu list, the size of the neighborhood, the index identifier of the neighbor in the candidate set that is chosen as the current solution, and the thread block size and number of thread blocks needed to launch the GPU kernels;
The storage required on the host side includes storage for the tabu list; storage for the tabu state of each neighborhood candidate solution; storage for the indices, software costs, hardware costs, and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; and storage for the number, software cost, hardware cost, and communication cost of the best candidate neighbor;
The storage required on the GPU side includes storage for the current solution vector and the mapping vector of the current solution; storage for the tabu state of each neighborhood candidate solution; the status identifier array recording whether each neighborhood candidate solution satisfies the constraint; storage for the compressed sparse row representation of the task graph; storage for the indices, software costs, hardware costs, and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; storage for the feasible neighborhood candidate solutions that satisfy the constraint; and storage for the number, software cost, hardware cost, and communication cost of the best candidate neighbor.
Preferably, the data sent to the GPU side in step 4 include the current solution vector, the mapping of the current solution vector, the software cost, hardware cost, and communication cost of the current solution, the hardware cost of the global best solution, the task graph to be partitioned, and the tabu list. The GPU kernels mainly comprise a kernel that computes the costs of the 2-flip neighbors, a kernel that selects the feasible neighbors satisfying the constraint, and a kernel that performs the tabu evaluation of the feasible neighbors.
Preferably, computing the best candidate neighbor in step 5 comprises the following sub-steps:
Step 5.1: On the GPU side, each 2-flip neighbor is one candidate solution. The GPU computes the software cost, hardware cost, and communication cost of every candidate solution in parallel. After the computation, each candidate is evaluated by whether the sum of its software cost and communication cost satisfies the constraint, and 0 or 1 is written to the corresponding position of the status identifier array, where 0 means the constraint is not satisfied and 1 means it is satisfied. At the same time, the identifier, software cost, hardware cost, and communication cost of the neighbor are written to the storage allocated on the GPU side;
Step 5.2: On the GPU side, based on the neighbor status identifier array, a parallel compaction algorithm stores the neighbors marked 1 in the status identifier array into the GPU-side storage for feasible neighborhood candidate solutions;
Step 5.3: On the GPU side, each feasible solution is checked in parallel against the tabu list. A feasible solution that is not in the tabu list and has the minimum hardware cost among all 2-flip feasible solutions is the best neighborhood candidate solution. A feasible neighbor that is in the tabu list but whose hardware cost is lower than the global best is likewise selected as the best candidate neighbor.
Preferably, the result returned to the host side in step 5 consists mainly of the number, software cost, hardware cost, and communication cost of the best candidate neighbor.
Preferably, the host side updates the tabu list in step 6 as a first-in-first-out circular queue. The tabu object is the neighbor identifier of the current solution from the previous iteration. When the tabu identifiers are updated, the identifier of the index leaving the queue is set to 0 and the identifier of the index entering the queue is set to 1.
Preferably, updating the current solution and the global best solution in step 7 means replacing the current solution with the best neighbor selected in the current iteration. This mainly involves replacing the solution vector, the mapping of the solution vector, the hardware cost, the software cost, the communication cost, and the identifier of the current solution. If the cost of the best neighbor is better than the global best hardware cost, the global hardware cost is updated as well; whenever the global hardware cost is updated, the count of consecutive non-improving iterations is reset to 0.
Preferably, the stopping criterion in step 8 means that the computation stops once the number of passes through steps 4 to 8 reaches the maximum number of iterations, or once the count of consecutive non-improving iterations reaches its limit.
Compared with the prior art, the present invention has clear positive effects: first, its execution time is shorter than that of currently known work; second, its solution quality is also better than that of known work.
Description of the drawings
Fig. 1 is a schematic diagram of the overall framework of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the task graph model to be partitioned and its corresponding compressed sparse row representation in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the candidate solution generation rule of an embodiment of the present invention;
Fig. 4 is a schematic flow diagram of the compaction algorithm of an embodiment of the present invention.
Detailed description of the embodiments
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the drawings and an embodiment. The embodiment described here serves only to illustrate and explain the invention and is not intended to limit it.
Fig. 1 shows the main framework of the invention. The framework is divided into a host side and a device side: the host side comprises the CPU and main memory, and the device side is the GPU. The tasks of the host side are to generate the initial solution; transfer it to the device side as the current solution; receive the best candidate solution from the device side; update the tabu list and the tabu state table; update the current solution and the global best solution from the best candidate solution; and evaluate the stopping condition to decide whether to run another iteration. On the device side, all 2-flip candidate solutions are generated from the current solution. Logically, the GPU computes the hardware cost, software cost, and communication cost of all candidate solutions in parallel, discards the candidates that violate the constraint, selects a feasible solution that is not tabu according to the tabu state table, and sends it back to the host side.
In this embodiment the task graph to be partitioned is represented as an undirected graph $G(V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of task nodes to be partitioned. Each node carries two costs: the cost $h(v_i)$ when the node is assigned to hardware and the cost $s(v_i)$ when it is assigned to software. $E$ is the set of edges between nodes; the weight of an edge is the communication cost $c(v_i, v_j)$ incurred when its two adjacent nodes belong to different partitions. $P = \{V_H, V_S\}$ is called a hardware/software partition, where $V_H$ and $V_S$ satisfy $V_H \cap V_S = \emptyset$ and $V_H \cup V_S = V$. Correspondingly, the edge set of partition $P$ is defined as $E_P = \{(v_i, v_j) : v_i \in V_H, v_j \in V_S \text{ or } v_i \in V_S, v_j \in V_H\}$. The upper-right part of Fig. 2 shows a simple task graph to be partitioned. Each node in the graph carries the execution cost of the task when it is assigned to hardware and when it is assigned to software. The weight on an edge is the communication cost incurred when the two connected tasks are assigned to hardware and software, or to software and hardware, respectively.
A partition $P$ is measured mainly by three costs: the hardware cost $H_P$, the software cost $S_P$, and the communication cost $C_P$. The three costs are defined as follows:

$$H_P = \sum_{v_i \in V_H} h_i, \qquad S_P = \sum_{v_j \in V_S} s_j, \qquad C_P = \sum_{(v_i, v_j) \in E_P} c(v_i, v_j)$$

where $h_i$ is the hardware cost of node $i$ and $s_i$ its software cost ($i = 1, 2, \ldots, n$); $v_i$ denotes a node of the task graph assigned to hardware; $v_j$ denotes a node assigned to software; $V_H$ is the subset of task nodes executed by hardware; $V_S$ is the subset of task nodes executed by software; $E_P$ is the set of edges of the task graph that connect the two different partition sets; $H_P$ is the sum of the hardware costs of all nodes in the hardware node set; $S_P$ is the sum of the software costs of all nodes in the software node set; and $C_P$ is the sum of the weights of the edges connecting the two partition sets, that is, the total communication cost of the partitioned task graph.
Based on the definitions above, the partitioning problem P is stated as follows:
Problem P: Given a graph $G$, cost functions $s$, $h$, $c$, and a bound $R \ge 0$, find a hardware/software partition $P$ that minimizes the hardware cost $H_P$ subject to the constraint $S_P + C_P \le R$.
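A solution of the partition $P$ is represented as an n-dimensional vector $X = \{x_1, x_2, \ldots, x_n\}$. A component $x_i = 1$ means that node $i$ is executed by software; $x_i = 0$ means that it is executed by hardware. $C(X)$ denotes the communication cost of the whole system. Problem P can therefore be formalized as the following minimization problem:

$$\text{minimize } \sum_{i=1}^{n} h_i (1 - x_i) \quad \text{subject to } \sum_{i=1}^{n} s_i x_i + C(X) \le R, \quad x_i \in \{0, 1\}$$

where $h_i$ is the hardware cost of node $i$ and $s_i$ is its software cost. The hardware cost, software cost, and communication cost of problem P are computed as follows:

$$H(X) = \sum_{i=1}^{n} h_i (1 - x_i), \qquad S(X) = \sum_{i=1}^{n} s_i x_i, \qquad C(X) = \sum_{(v_i, v_j) \in E} c(v_i, v_j)\,|x_i - x_j|$$

In problem P, whether a solution satisfies the constraint is evaluated as follows:

$$\text{feasible}(X) = \begin{cases} 1, & S(X) + C(X) \le R \\ 0, & \text{otherwise} \end{cases}$$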
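As an illustration only, the three cost terms and the constraint check of problem P can be sketched in C++ as below. The `Edge` and `Costs` types and the function names `evaluate` and `isFeasible` are hypothetical and do not come from the invention; the sketch follows the convention that a component of 1 means software and 0 means hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge record: 0-based endpoints and communication cost.
struct Edge { int u; int v; double c; };

// Hypothetical container for the three cost terms of one partition.
struct Costs { double hardware; double software; double communication; };

// Evaluate H(X), S(X), and C(X) for a solution vector x
// (x[i] == 1: node i runs in software, x[i] == 0: node i runs in hardware).
Costs evaluate(const std::vector<int>& x,
               const std::vector<double>& h,
               const std::vector<double>& s,
               const std::vector<Edge>& edges) {
    assert(x.size() == h.size() && x.size() == s.size());
    Costs r{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 1) r.software += s[i];  // node assigned to software
        else           r.hardware += h[i];  // node assigned to hardware
    }
    for (const Edge& e : edges) {
        if (x[e.u] != x[e.v]) r.communication += e.c;  // edge crosses the cut
    }
    return r;
}

// Constraint of problem P: S(X) + C(X) <= R.
bool isFeasible(const Costs& k, double R) {
    return k.software + k.communication <= R;
}
```

With node costs h = (3, 14, 5, 5) and s = (6, 20, 9, 13) and edges (v1,v2) = 9, (v1,v3) = 3, (v2,v4) = 7, (v3,v4) = 6, the vector 0101 evaluates to software cost 33, hardware cost 8, and communication cost 15. These node and edge values are an assumption: they are inferred from the costs listed in Table 1 of this embodiment rather than read from Fig. 2.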
The system platform of this embodiment is Windows 7. The serial part is developed in Visual Studio 2012 using C++; the parallel part is developed in CUDA C version 7.0. The main framework of the invention is shown in Fig. 1. The specific steps of the embodiment are as follows:
(1) To simplify the discussion, assume that the initial solution of the hardware/software partitioning is produced by some algorithm that does not sort the task nodes while solving. The initial solution can then be the current solution vector 0101 in Fig. 3, and the mapping vector of the solution vector is likewise 0101. The software cost, hardware cost, and communication cost of this solution are 33, 8, and 15, respectively. If instead the initial solution is produced by a greedy heuristic, the task nodes are sorted in descending order of the ratio of hardware cost to software cost; the initial solution vector is then 1001, while the mapping of the solution vector remains unchanged.
(2) Convert the task graph so that it suits execution of the algorithm on the GPU. The result of the conversion is shown in the lower-left part of Fig. 2. The data array stores the communication costs on the edges of the task graph. The index into the row_ptr array is the node number, and the element at that index is the position in the data array of the node's first adjacent edge. For example, element 0 of row_ptr corresponds to task node v1, which is adjacent to v2 and v3 with communication costs 9 and 3, respectively. Which neighbor is stored first depends on the implementation; in this example v2 is the first neighbor of v1, so element 0 of row_ptr is 0, meaning that the communication cost to the first neighbor is at index 0 of the data array. Likewise, the entries for nodes v2, v3, and v4 are 2, 4, and 6, and the final entry is 8. The size of row_ptr is the number of nodes plus 1; its last element marks the end of the data array, so the size of the data array is twice the number of edges in the task graph. The col_index array stores the node number corresponding to each communication cost in the data array.
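The conversion from an adjacency list to the three CSR arrays can be sketched as follows. This is a minimal host-side illustration, not the code of the invention: the `CsrGraph` struct and the `toCsr` function are hypothetical names, and only the array names row_ptr, col_index, and data come from the description above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container for the three CSR arrays named in the text.
struct CsrGraph {
    std::vector<int>   row_ptr;    // size n + 1
    std::vector<int>   col_index;  // size 2 * (number of edges)
    std::vector<float> data;       // size 2 * (number of edges)
};

// adj[u] lists (neighbor, communication cost) pairs. Every undirected edge
// appears twice, once in the list of each endpoint.
CsrGraph toCsr(const std::vector<std::vector<std::pair<int, float>>>& adj) {
    CsrGraph g;
    g.row_ptr.reserve(adj.size() + 1);
    g.row_ptr.push_back(0);
    for (const auto& neighbors : adj) {
        for (const auto& nb : neighbors) {
            assert(nb.first >= 0 && nb.first < static_cast<int>(adj.size()));
            g.col_index.push_back(nb.first);   // node number of the neighbor
            g.data.push_back(nb.second);       // communication cost of the edge
        }
        // Start offset of the next node's neighbors (end marker for the last node).
        g.row_ptr.push_back(static_cast<int>(g.col_index.size()));
    }
    return g;
}
```

For a four-node graph in which every node has two neighbors, with v1 adjacent to v2 (cost 9) and v3 (cost 3), the sketch yields row_ptr = 0, 2, 4, 6, 8, which agrees with the values quoted above. The remaining edges of such a graph are an assumption here, since Fig. 2 is not reproduced in the text.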
(3) The maximum number of iterations of the algorithm is set to 2000, the maximum number of consecutive non-improving iterations is set to 200, and the length of the tabu list is set to the square root of the number of nodes. This embodiment has 4 nodes, so the tabu list length is √4 = 2. The neighborhood size is n*(n-1)/2, where n is the number of nodes; the neighborhood size is therefore 6.
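These parameter settings can be collected in a small helper, sketched below. The `SearchParams` struct and `makeParams` function are hypothetical names. Rounding the square root down for node counts that are not perfect squares is an assumption, because the embodiment only covers n = 4.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter bundle for one run of the search.
struct SearchParams {
    int maxIterations;     // maximum number of iterations
    int maxNoImprove;      // maximum consecutive non-improving iterations
    int tabuLength;        // length of the tabu list
    int neighborhoodSize;  // number of 2-flip neighbors
};

SearchParams makeParams(int n) {
    assert(n >= 2);
    SearchParams p;
    p.maxIterations = 2000;
    p.maxNoImprove = 200;
    // Square root of the node count, rounded down (rounding is an assumption).
    p.tabuLength = static_cast<int>(std::sqrt(static_cast<double>(n)));
    // Number of unordered pairs of positions that can be flipped together.
    p.neighborhoodSize = n * (n - 1) / 2;
    return p;
}
```

For n = 4 this gives a tabu list length of 2 and a neighborhood size of 6, as in the embodiment.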
(4) Allocate storage on the GPU side. This includes the current solution vector, an int array of size 4; the mapping of the solution vector, an int array of size 4; the storage for the task graph, comprising a row_ptr array of 4+1 = 5 ints, a data array of 4*2+1 = 9 floats, and a col_index array of 4*2 = 8 ints; and a tabu list of 2 ints. The neighbor identifier array (int), the neighbor software cost array (float), the neighbor hardware cost array (float), the neighbor communication cost array (float), and the neighbor feasibility identifier array (int) all have size 6. Because the number of feasible solutions is not known in advance, the feasible-solution arrays have the same size as the neighborhood; they likewise comprise a neighbor identifier array (int), a neighbor software cost array (float), a neighbor hardware cost array (float), and a neighbor communication cost array (float).
(5) Launching a kernel on the GPU side requires a thread configuration. CUDA organizes threads into thread grids and thread blocks: one GPU kernel corresponds to one thread grid, a thread grid consists of at least one thread block, and the grid size is the number of thread blocks launched for the kernel. A thread block in turn consists of at least one thread. The total number of threads needed to launch a GPU kernel is therefore the grid size multiplied by the thread block size. The invention runs three kernels on the GPU side: cost computation and constraint evaluation for all 2-flip neighbors, selection of the feasible solutions, and tabu evaluation of the feasible solutions. The thread configuration of the first two kernels is uniformly the neighborhood size. The neighborhood size in this embodiment is 6, so the total number of threads of the first two kernels is set to 6. For the tabu evaluation of the feasible solutions, the thread count is set to 3 in this embodiment with a single thread block, that is, the grid size is 1.
(6) When the costs of the neighbors are computed, each thread handles one neighbor, so the neighbor number is determined jointly by the thread index, the thread block index, and the thread block size. It is computed as follows:
neibIdx = blockIdx.x × blockDim.x + threadIdx.x
In this formula, neibIdx is the number of the candidate neighbor, blockIdx.x is the index of the thread block, blockDim.x is the size of the thread block, and threadIdx.x is the index of the thread within its block. Note that the thread block index and the thread index are built-in variables provided by CUDA C and can be used directly in the implementation.
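To make the index arithmetic concrete, the following plain C++ sketch emulates on the host how the threads of a grid map to neighbor numbers. It is not CUDA code: the loop variables merely stand in for the built-in variables blockIdx.x, blockDim.x, and threadIdx.x, and the function names `blocksFor` and `emulateNeighborIndices` are hypothetical. The bounds check is an assumption added for block sizes that do not divide the neighborhood size evenly.

```cpp
#include <cassert>
#include <vector>

// Number of thread blocks needed so that blocks * blockDim covers all work items.
int blocksFor(int totalThreads, int blockDim) {
    assert(totalThreads > 0 && blockDim > 0);
    return (totalThreads + blockDim - 1) / blockDim;  // ceiling division
}

// Host-side emulation of the per-thread computation
// neibIdx = blockIdx.x * blockDim.x + threadIdx.x.
std::vector<int> emulateNeighborIndices(int neighborhoodSize, int blockDim) {
    std::vector<int> ids;
    int gridDim = blocksFor(neighborhoodSize, blockDim);
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            int neibIdx = blockIdx * blockDim + threadIdx;
            // Surplus threads in the last block have no neighbor to process.
            if (neibIdx < neighborhoodSize) ids.push_back(neibIdx);
        }
    }
    return ids;
}
```

For a neighborhood of size 6 and a block size of 4, two blocks are needed and the neighbor numbers 0 to 5 are each produced exactly once.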
Once the neighbor number is known, the two positions of the current solution vector that must be flipped to generate the corresponding neighbor are also needed. Let pos1 and pos2 denote the two indices of the current solution vector to be flipped; they are obtained from the following two formulas:
where n is the number of nodes in the task graph to be partitioned and neibIndex is the number of the neighborhood candidate solution. The first formula gives the first flip position pos1 needed to generate the neighborhood candidate solution, and the second then gives the second flip position pos2.
For this embodiment, substituting the numbers 0 to 5 into the formulas generates the neighborhood candidate solutions shown in Fig. 3.
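One mapping from a neighbor number to its two flip positions that is consistent with the numbering of Table 1 is the lexicographic enumeration of index pairs (0,1), (0,2), ..., (n-2,n-1). The loop-based sketch below implements that enumeration. It is an assumption chosen for illustration, not necessarily the closed-form formulas used by the invention, and `flipPositions` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>

// Map a neighbor number to the pair (pos1, pos2) with pos1 < pos2,
// enumerating pairs in lexicographic order. Positions are 0-based.
std::pair<int, int> flipPositions(int neibIdx, int n) {
    assert(n >= 2 && neibIdx >= 0 && neibIdx < n * (n - 1) / 2);
    int pos1 = 0;
    int rowLen = n - 1;          // number of pairs whose first index is pos1
    while (neibIdx >= rowLen) {  // skip all pairs that start with pos1
        neibIdx -= rowLen;
        ++pos1;
        --rowLen;
    }
    return std::make_pair(pos1, pos1 + 1 + neibIdx);
}
```

For n = 4, neighbor 0 flips positions 0 and 1 (0101 becomes 1001) and neighbor 5 flips positions 2 and 3 (0101 becomes 0110), matching the solution vectors listed in Table 1.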
Once the two flip positions of a neighbor are known, its software cost, hardware cost, and communication cost can be computed from them. Taking Fig. 2 as the example, the software cost, hardware cost, and communication cost of the current solution were found in (1) to be 33, 8, and 15, respectively. Now consider the three costs of the candidate neighbor whose solution vector is 1001. This candidate is generated by flipping position 0 and position 1. At position 0 the component changes from 0 to 1, which means the software cost increases and the hardware cost decreases. Starting from the software and hardware costs of the current solution, adding the software cost 6 of V1 and subtracting the hardware cost 3 of V1 updates the software and hardware costs to 39 and 5, respectively. Similarly, at position 1 the component changes from 1 to 0, which means the software cost of that node is subtracted and its hardware cost is added: subtracting the software cost 20 of V2 and adding the hardware cost 14 of V2 finally updates the software and hardware costs of the neighbor to 19 and 19, respectively. The advantage of this method is that it reuses the known information of the current solution and avoids the inefficiency of recomputing the costs from scratch with the cost formulas.
The communication cost of neighbor number 0, whose solution vector is 1001, is likewise derived from the known communication cost of the current solution. Specifically, when position 0 is flipped the solution vector becomes 1101. Among the neighbors of node V1, node V3 is now in a different partition, so that edge incurs communication cost; conversely, node V2 is now in the same partition as V1, so that edge no longer incurs communication cost. The communication cost is therefore updated to 15-9+3 = 9. Similarly, when position 1 is flipped the solution vector becomes 1001. Among the neighbors of node V2, nodes V1 and V4 are both in a different partition, so both edges incur communication cost. The communication cost is updated to 9+9+7 = 25, which is the actual communication cost of candidate neighbor 1001.
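The incremental update described above can be sketched as a single-node flip applied twice. The code is a sequential illustration under stated assumptions: the `Costs` and `CsrGraph` types and the `flipNode` function are hypothetical names, and costs are held as double for simplicity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cost triple of the solution being updated.
struct Costs { double software; double hardware; double communication; };

// CSR arrays of the task graph, as described in step (2).
struct CsrGraph {
    std::vector<int> row_ptr;
    std::vector<int> col_index;
    std::vector<double> data;
};

// Flip node pos in x (0 = hardware, 1 = software) and update the costs in place.
void flipNode(std::vector<int>& x, int pos, Costs& k,
              const std::vector<double>& h, const std::vector<double>& s,
              const CsrGraph& g) {
    assert(pos >= 0 && pos < static_cast<int>(x.size()));
    if (x[pos] == 0) { k.software += s[pos]; k.hardware -= h[pos]; }  // hardware -> software
    else             { k.software -= s[pos]; k.hardware += h[pos]; }  // software -> hardware
    for (int e = g.row_ptr[pos]; e < g.row_ptr[pos + 1]; ++e) {
        int nb = g.col_index[e];
        // An edge that was cut before the flip is no longer cut, and vice versa.
        if (x[pos] != x[nb]) k.communication -= g.data[e];
        else                 k.communication += g.data[e];
    }
    x[pos] = 1 - x[pos];
}
```

Starting from 0101 with costs (33, 8, 15), flipping position 0 gives (39, 5, 9) and then flipping position 1 gives (19, 19, 25), as in the walkthrough above. The edge weights used to check this, (v1,v2) = 9, (v1,v3) = 3, (v2,v4) = 7, (v3,v4) = 6, are inferred from Table 1 and are an assumption.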
The software cost, hardware cost, and communication cost of the remaining candidate neighbors can be computed in the same way. Finally, the number, software cost, hardware cost, and communication cost of each candidate neighbor are written to the corresponding arrays allocated in global memory. The result is shown in Table 1:
Table 1. Candidate neighbor solutions and their costs
Number | Solution vector | Software cost | Hardware cost | Communication cost |
0 | 1001 | 19 | 19 | 25 |
1 | 1111 | 48 | 0 | 0 |
2 | 1100 | 26 | 10 | 10 |
3 | 0011 | 22 | 17 | 10 |
4 | 0000 | 0 | 27 | 0 |
5 | 0110 | 29 | 8 | 25 |
(7) Because the hardware/software partitioning problem solved by the invention is constrained, not every candidate neighbor satisfies the given constraint. Solutions whose software cost plus communication cost violates the constraint must be excluded, and those that satisfy it are retained.
Assume the constraint of problem P is 40. Following optimization problem P, each candidate neighbor is flagged: 1 if it satisfies the constraint and 0 otherwise. Table 2 shows the resulting neighbor feasibility identifier array.
Table 2. Neighbor feasibility flags
0 | 0 | 1 | 1 | 1 | 0 |
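The flagging step can be reproduced with a few lines of sequential C++. On the GPU each candidate would be handled by its own thread; the loop below is only a host-side stand-in, and `feasibilityFlags` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flag each candidate: 1 if software + communication cost <= R, else 0.
std::vector<int> feasibilityFlags(const std::vector<double>& software,
                                  const std::vector<double>& communication,
                                  double R) {
    assert(software.size() == communication.size());
    std::vector<int> flags(software.size());
    for (std::size_t i = 0; i < software.size(); ++i) {
        flags[i] = (software[i] + communication[i] <= R) ? 1 : 0;
    }
    return flags;
}
```

Applied to the software and communication columns of Table 1 with R = 40, the sums are 44, 48, 36, 32, 0, and 54, which yields the flags 0, 0, 1, 1, 1, 0 of Table 2.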
Based on the flags, the compaction algorithm retains the feasible solutions that satisfy the constraint. The algorithm follows the process shown in Fig. 4. In this example, the neighbor numbers, software costs, hardware costs, and communication costs of neighbors 2, 3, and 4 are retained in the corresponding feasible-solution arrays on the GPU side. Table 3 shows the contents of the feasible-solution arrays.
Table 3. Neighborhood candidate solutions retained by compaction
Number | 2 | 3 | 4 | X | X | X |
Software cost | 26 | 22 | 0 | X | X | X |
Hardware cost | 10 | 17 | 27 | X | X | X |
Communication cost | 10 | 10 | 0 | X | X | X |
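A common way to implement compaction is an exclusive prefix sum over the flags followed by a scatter. The sequential sketch below assumes that scheme; Fig. 4 is not reproduced here, so this may differ from the flow used by the invention. The function names `exclusiveScan` and `compactIndices` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum of the 0/1 flags: scan[i] is the number of feasible
// candidates before position i, which is the output slot of candidate i.
std::vector<int> exclusiveScan(const std::vector<int>& flags) {
    std::vector<int> scan(flags.size());
    int running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        assert(flags[i] == 0 || flags[i] == 1);
        scan[i] = running;
        running += flags[i];
    }
    return scan;
}

// Scatter the numbers of the feasible candidates into a dense array.
std::vector<int> compactIndices(const std::vector<int>& flags) {
    std::vector<int> scan = exclusiveScan(flags);
    int kept = flags.empty() ? 0 : scan.back() + flags.back();
    std::vector<int> out(kept);
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i] == 1) out[scan[i]] = static_cast<int>(i);
    }
    return out;
}
```

For the flags 0, 0, 1, 1, 1, 0 the scan is 0, 0, 0, 1, 2, 3 and the compacted array holds the neighbor numbers 2, 3, 4. The same output slots would be used to copy the software, hardware, and communication costs of each retained neighbor.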
(8) Based on Table 3, the tabu list, and optimization problem P, select the neighbor that is not tabu and has the minimum hardware cost. In this example the tabu list is empty because this is the first iteration, so every feasible solution can be chosen. On the GPU side, the algorithm that performs this selection is a reduction operation combined with the tabu state table. A standard reduction mainly refers to computing the sum, the maximum, or the minimum of an input array. Finally, the feasible solution with neighbor number 3 has the minimum hardware cost, so this neighbor is selected as the current solution of the next iteration.
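The selection rule, including the case where a tabu neighbor is still accepted because its hardware cost beats the global best, can be sketched as a sequential minimum search. On the GPU this would be a parallel reduction; the loop below is a host-side stand-in, and the `Candidate` struct, the `selectBest` function, and the data in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one feasible candidate: neighbor number and hardware cost.
struct Candidate { int id; double hardware; };

// Return the position in feasible of the admissible candidate with the lowest
// hardware cost, or -1 if every candidate is tabu and none beats the global best.
int selectBest(const std::vector<Candidate>& feasible,
               const std::vector<int>& tabuState,  // 1 = tabu, indexed by neighbor number
               double globalBestHardware) {
    int best = -1;
    for (std::size_t i = 0; i < feasible.size(); ++i) {
        const Candidate& c = feasible[i];
        assert(c.id >= 0 && c.id < static_cast<int>(tabuState.size()));
        // Admissible: not tabu, or tabu but better than the global best.
        bool admissible = tabuState[c.id] == 0 || c.hardware < globalBestHardware;
        if (admissible && (best < 0 || c.hardware < feasible[best].hardware)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

As a made-up example, take three feasible candidates with hardware costs 12, 9, and 7, of which only the third is tabu. With a global best of 8 the tabu candidate is still chosen because 7 is lower than 8; with a global best of 5 the second candidate is chosen instead.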
(9) The number, software cost, hardware cost, and communication cost of the selected non-tabu feasible solution are returned to the CPU side.
(10) Update the tabu list. The number of the neighbor that generated the current solution is stored in the tabu list. In this example, because this is the first iteration, the current solution was generated from the initial solution and no index is entered into the tabu list. In the second iteration, however, the feasible neighbor with number 3 has become the current solution, so this number must be entered into the tabu list when the list is updated. At the same time, the corresponding position in the tabu state table is set to 1 to indicate that the neighbor is tabu.
In addition, when the tabu list is full, one neighbor identifier is released first and the value at its position in the tabu state table is set to 0. The number of the neighbor that generated the current solution is then entered into the tabu list.
The tabu list is a circular queue; entries are enqueued at one end and dequeued at the other.
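A fixed-capacity circular queue paired with a tabu state table can be sketched as follows. The `TabuList` class and its members are hypothetical names. For brevity the sketch assumes that a neighbor number is not enqueued a second time while it is still in the list.

```cpp
#include <cassert>
#include <vector>

class TabuList {
public:
    TabuList(int capacity, int neighborhoodSize)
        : slots_(capacity, -1), state_(neighborhoodSize, 0), head_(0), count_(0) {
        assert(capacity > 0 && neighborhoodSize > 0);
    }

    // Record neighbor id as tabu; release the oldest entry first when full.
    void push(int id) {
        assert(id >= 0 && id < static_cast<int>(state_.size()));
        const int cap = static_cast<int>(slots_.size());
        if (count_ == cap) {
            state_[slots_[head_]] = 0;   // lift the ban on the oldest neighbor
            head_ = (head_ + 1) % cap;
            --count_;
        }
        const int tail = (head_ + count_) % cap;
        slots_[tail] = id;
        state_[id] = 1;                  // mark the new neighbor as tabu
        ++count_;
    }

    bool isTabu(int id) const { return state_[id] == 1; }

private:
    std::vector<int> slots_;  // circular buffer of neighbor numbers
    std::vector<int> state_;  // tabu state table: 1 = tabu, 0 = free
    int head_;                // position of the oldest entry
    int count_;               // number of stored entries
};
```

With capacity 2 and neighborhood size 6, as in this embodiment, pushing neighbors 3 and 5 makes both tabu; pushing neighbor 1 afterwards releases neighbor 3 and keeps 5 and 1 tabu.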
(11) Using the identifier of the best non-tabu neighbor, transform the current solution vector into the new current solution vector; the mapping vector of the solution vector must be transformed as well. The software cost, hardware cost, and communication cost of the best non-tabu neighbor replace the corresponding costs of the current solution. It is worth noting that the update is performed even if the hardware cost of the best non-tabu neighbor is greater than that of the current solution.
(12) Update the global cost of the solution. Specifically, if the hardware cost of the new current solution is lower than the existing global cost, the global cost is updated. When it is updated, the counter of consecutive non-improving iterations is reset to 0; otherwise, the counter is incremented by 1. When the counter exceeds 200, the algorithm terminates.
(13) The new current solution vector, the mapping of the solution vector, the software cost, hardware cost, and communication cost, and the neighborhood tabu state table are transferred to the GPU side. The iteration counter is incremented by 1; if it exceeds the maximum of 2000, the algorithm terminates; otherwise, the algorithm proceeds to the next iteration.
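The host-side bookkeeping of steps (11) to (13) can be sketched as the loop below. It is a simplified illustration that tracks only the hardware cost: the function `run_search` and the callback `next_best_hw_cost`, which stands in for the GPU evaluation and selection of the best non-tabu neighbor, are hypothetical names. The limits 200 and 2000 are the values given in the text.

```python
from typing import Callable, Tuple

def run_search(initial_hw_cost: float,
               next_best_hw_cost: Callable[[float], float],
               max_no_improve: int = 200,
               max_iters: int = 2000) -> Tuple[float, int]:
    """Return (global best hardware cost, completed iterations)."""
    current = initial_hw_cost
    global_best = initial_hw_cost
    no_improve = 0   # consecutive iterations without a new global best
    iterations = 0
    while True:
        # Step (11): always move to the best non-tabu neighbor,
        # even when it is worse than the current solution.
        current = next_best_hw_cost(current)
        # Step (12): update the global cost and the non-improvement counter.
        if current < global_best:
            global_best = current
            no_improve = 0
        else:
            no_improve += 1
        if no_improve > max_no_improve:
            break
        # Step (13): count the iteration and check the iteration limit.
        iterations += 1
        if iterations > max_iters:
            break
    return global_best, iterations

# Usage with a toy callback: the cost drops by 1 per move until it reaches 90.
best, iters = run_search(100, lambda c: max(c - 1, 90))
```

Accepting a worse neighbor in step (11) is what lets tabu search leave a local optimum, while the two counters bound the total work.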
To verify the results of the present invention on a benchmark test set, Table 4 gives the number of nodes n and the number of edges m of each task graph, together with the corresponding problem size, size = 2 × n + 3 × m:
Table 4. Test data set
The software cost of each node is a uniformly distributed random number in [1, 100], and the hardware cost is a normally distributed random number based on the software cost of the corresponding node. The communication cost between nodes is a uniformly distributed random number in the interval [0, 2ρ·s_max], where s_max is the maximum software cost of any single node in the task graph and ρ is the communication-to-computation ratio (CCR); ρ = 0.1, 1, and 10 correspond to computation-intensive, intermediate, and communication-intensive tasks, respectively. The constraint R takes one of two forms: 1) a strict real-time constraint, a uniformly distributed random number in [0, 1/2 × ∑s_i]; 2) a weak real-time constraint, a uniformly distributed random number in [1/2 × ∑s_i, ∑s_i]. The combinations of the communication-to-computation ratio and the constraint R therefore give six cases in total.
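The test-instance generation described above can be sketched as follows. The text does not state the parameters of the normal distribution, so the mean (the node's software cost) and the spread `hw_sigma_ratio` are assumptions of this sketch, as are the function name `generate_instance`, the edge-list input, and the seed.

```python
import random
from typing import Dict, List, Tuple

def generate_instance(n: int, edges: List[Tuple[int, int]], rho: float,
                      strict: bool, seed: int = 0,
                      hw_sigma_ratio: float = 0.1) -> Dict[str, object]:
    rng = random.Random(seed)
    # Software cost: uniform in [1, 100].
    sw = [rng.uniform(1, 100) for _ in range(n)]
    # Hardware cost: normal around the software cost.
    # ASSUMPTION: mean = s_i, std = hw_sigma_ratio * s_i (not given in the text).
    hw = [rng.gauss(s, hw_sigma_ratio * s) for s in sw]
    # Communication cost: uniform in [0, 2 * rho * s_max].
    s_max = max(sw)
    comm = {e: rng.uniform(0, 2 * rho * s_max) for e in edges}
    # Constraint R: strict uses [0, sum/2], weak uses [sum/2, sum].
    total = sum(sw)
    low, high = (0.0, 0.5 * total) if strict else (0.5 * total, total)
    constraint = rng.uniform(low, high)
    # Problem size as defined for Table 4: size = 2n + 3m.
    size = 2 * n + 3 * len(edges)
    return {"sw": sw, "hw": hw, "comm": comm, "R": constraint, "size": size}

# Usage: a 4-node chain, intermediate CCR (rho = 1), strict constraint.
inst = generate_instance(4, [(0, 1), (1, 2), (2, 3)], rho=1.0, strict=True)
```

Looping over ρ in (0.1, 1, 10) and both values of `strict` reproduces the six cases mentioned in the text.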
Next, the case CCR = 1, R = high is used to illustrate the advantage of the present invention; Table 5 compares its running speed and solution quality with those of the existing tabu search method in this case. Here, alg-ANTS denotes the serial implementation of the present invention, and GANTS denotes the parallel implementation on the GPU. In most cases, the run time of the present invention is lower than that of the alg-ANTS method.
Table 5. Improvement of the solution and run time of ANTS and GANTS for CCR = 1, R = high
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Under the teaching of the present invention, and without departing from the scope protected by its claims, those of ordinary skill in the art may also make substitutions or modifications, all of which fall within the scope of protection of the present invention. The claimed scope of the present invention is determined by the appended claims.
Claims (10)
1. A GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning, characterized in that it comprises the following steps:
Step 1: On the host side, obtain the initial solution of the hardware/software partitioning and the related information; set the initial solution as both the current solution and the global optimal solution;
Step 2: On the host side, convert the representation of the task graph of the hardware/software partitioning;
Step 3: Set the parameters; initialize the storage space required on the host side; initialize the storage space required on the GPU side;
Step 4: On the host side, send the data to the GPU side and launch the GPU-side kernels;
Step 5: On the GPU side, compute the best candidate neighborhood with the kernels and pass the result back to the host side;
Step 6: On the host side, update the tabu list and the tabu identifiers;
Step 7: On the host side, update the current solution and the global optimal solution;
Step 8: On the host side, determine whether the stopping criterion is met;
If not, return to Step 4;
If so, end the procedure.
2. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the initial solution in Step 1 refers to the hardware cost of the task graph to be partitioned, obtained by a heuristic or random method; the related information mainly includes the solution vector, the mapping vector of the solution vector, and the software cost and communication cost of the initial solution; the solution vector is an array of 0s and 1s corresponding to the hardware cost; the mapping vector of the solution vector is an array of 0s and 1s that mainly preserves the state of each node of the task graph before sorting, for use in the subsequent calculation of the communication cost.
3. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: converting the representation of the task graph in Step 2 means that, before the task graph is transferred to the GPU, its adjacency-list representation is converted to the compressed sparse row (CSR) format, which is suited to computation on the GPU side.
4. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the parameters in Step 3 mainly include the maximum number of iterations, the number of consecutive non-improving iterations, the length of the tabu list, the neighborhood size, the index identifier of the neighborhood chosen as the current solution from the set of candidate neighborhood solutions, and the thread-block size and number of thread blocks required to launch the GPU kernels;
The storage space required on the host side includes storage for the tabu list; storage for the tabu state of each neighborhood candidate solution; storage for the indices, software costs, hardware costs, and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; and storage for the number, software cost, hardware cost, and communication cost of the best candidate neighborhood;
The storage space required on the GPU side includes storage for the current solution vector and the mapping vector of the current solution; storage for the tabu state of each neighborhood candidate solution; a status identifier array recording whether each neighborhood candidate solution satisfies the constraint; storage for the compressed sparse row format of the task graph; storage for the indices, software costs, hardware costs, and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; storage for the feasible neighborhood candidate solutions that satisfy the constraint; and storage for the number, software cost, hardware cost, and communication cost of the best candidate neighborhood.
5. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the data sent to the GPU side in Step 4 include the solution vector of the current solution, the mapping of the current solution vector, the software cost, hardware cost, and communication cost of the current solution, the hardware cost of the global optimal solution, the task graph of the hardware/software partitioning, and the tabu list; the GPU-side kernels mainly include a kernel that computes the costs of the 2-flip neighborhood solutions, a kernel that selects the feasible neighborhood solutions satisfying the constraint, and a kernel that performs tabu evaluation on the feasible neighborhood solutions.
6. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that the process of computing the best candidate neighborhood in Step 5 comprises the following sub-steps:
Step 5.1: On the GPU side, each 2-flip neighborhood is one candidate solution; the GPU computes the software cost, hardware cost, and communication cost of every candidate solution in parallel; after the computation, whether the constraint is satisfied is evaluated from the sum of the software cost and the communication cost, and 0 or 1 is written to the corresponding position of the status identifier array, where 0 indicates that the constraint is not satisfied and 1 indicates that it is satisfied; at the same time, the identifier, software cost, hardware cost, and communication cost of the neighborhood are written to the storage space allocated on the GPU side;
Step 5.2: On the GPU side, according to the neighborhood status identifier array, a parallel compaction algorithm stores the neighborhoods marked 1 in the status identifier array in the GPU-side storage space for feasible neighborhood candidate solutions that satisfy the constraint;
Step 5.3: On the GPU side, the tabu list is used to determine in parallel whether each feasible solution is in the tabu list; if a feasible solution is not in the tabu list and has the minimum hardware cost among all 2-flip feasible solutions, it is the best neighborhood candidate solution; if a feasible neighborhood solution is in the tabu list but its hardware cost is lower than the global optimum, it is likewise selected as the best candidate neighborhood.
7. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the result passed back to the host side in Step 5 is mainly the number, software cost, hardware cost, and communication cost of the best candidate neighborhood.
8. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the host-side update of the tabu list in Step 6 follows first-in-first-out order based on a circular queue, and the tabu object is the neighborhood identifier of the current solution in the previous iteration; when the tabu identifiers are updated, the identifier of the index leaving the queue is set to 0 and the identifier of the index entering the queue is set to 1.
9. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: updating the current solution and the global optimal solution in Step 7 means replacing the current solution with the best neighborhood selected in the current iteration, which mainly includes replacing the solution vector of the current solution, the mapping of the solution vector, the hardware cost, the software cost, the communication cost, and the identifier of the current solution; if the cost of the best neighborhood is better than the global best hardware cost, the global hardware cost is likewise updated; when the global hardware cost is updated, the count of consecutive non-improving iterations is reset to 0.
10. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the stopping criterion in Step 8 mainly means that the computation stops once the number of executions of the loop from Step 4 to Step 8 reaches the maximum number of iterations, or once the limit on consecutive non-improving iterations is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212930.2A CN105844110B (en) | 2016-04-07 | 2016-04-07 | A kind of adaptive neighborhood TABU search based on GPU solves Method for HW/SW partitioning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105844110A CN105844110A (en) | 2016-08-10 |
CN105844110B true CN105844110B (en) | 2018-07-06 |
Family
ID=56596844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610212930.2A Expired - Fee Related CN105844110B (en) | 2016-04-07 | 2016-04-07 | A kind of adaptive neighborhood TABU search based on GPU solves Method for HW/SW partitioning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105844110B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009013A (en) * | 2017-12-25 | 2018-05-08 | 湖南大学 | For a kind of parallel neighborhood search method of collaboration of separation constraint knapsack problem |
CN112489501B (en) * | 2020-11-26 | 2022-06-24 | 山东师范大学 | Teaching demonstration system and method for tabu search algorithm |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101017508A (en) * | 2006-12-21 | 2007-08-15 | 四川大学 | SoC software-hardware partition method based on discrete Hopfield neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8954775B2 (en) * | 2012-06-20 | 2015-02-10 | Intel Corporation | Power gating functional units of a processor |
Non-Patent Citations (2)
Title |
---|
GPU-based Acceleration of System-Level Design Tasks;Unmesh D. Bordoloi;《International Journal of Parallel Programming》;20100116;第38卷(第3-4期);第1-28页 * |
并行K均值聚类和贪婪算法融合的软硬件划分;杜敏 等;《信息技术》;20080430(第4期);第134-137页 * |
Also Published As
Publication number | Publication date |
---|---|
CN105844110A (en) | 2016-08-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180706 Termination date: 20200407 |