CN105844110B

CN105844110B - GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning

Info

Publication number: CN105844110B
Application number: CN201610212930.2A
Authority: CN
Inventors: 何发智; 侯能; 周毅
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2016-04-07
Filing date: 2016-04-07
Publication date: 2018-07-06
Anticipated expiration: 2036-04-07
Also published as: CN105844110A

Abstract

The invention discloses a GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning. First, the tabu search algorithm commonly used in the hardware/software partitioning field is restructured so that its computation matches the architectural features of the GPU, which prepares the algorithm for migration to the GPU. Second, to further improve performance, the invention not only gives a general framework for the algorithm on the GPU but also optimizes its concrete execution on the GPU. Comparative experiments show that the invention outperforms known work in both solution quality and computation speed.

Description

GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning

Technical field

The present invention relates to the field of hardware/software co-design, and in particular to a hardware/software partitioning method, more specifically a GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning.

Background art

Modern embedded systems usually consist of hardware and software. Hardware here mainly refers to application-specific processors, which execute quickly with low power consumption but at high cost; software executes slowly with high power consumption but at low cost. Each computing task running on the target embedded system is usually represented as a node, and different nodes have different resource requirements. The goal of hardware/software partitioning is to map each computing task to software or hardware, subject to given constraints, so that the performance of the whole system is optimized. Hardware/software partitioning is a key step in hardware/software co-design.

Hardware/software partitioning is an NP-complete problem. The algorithms commonly used to solve it fall into two classes: exact algorithms and approximation algorithms. Exact algorithms are mainly used for small-scale partitioning problems. As the problem size grows, the solution space grows exponentially and exact algorithms become infeasible, so heuristic methods are used instead.

Heuristics for hardware/software partitioning are divided into dedicated heuristics and general heuristics. Dedicated heuristics follow either a hardware-constraint-first or a software-constraint-first strategy. General heuristics mainly refer to intelligent search algorithms such as genetic algorithms, ant colony optimization, particle swarm optimization, simulated annealing, artificial immune algorithms, tabu search, and combinations of these methods. These intelligent optimization algorithms are either trajectory-based or population-based and approach the target solution through many iterations. In each iteration a number of candidate solutions is generated, and the selection among them determines the quality of the next iteration. However, when the number of candidate solutions is large, the running time of the algorithm increases, and when the problem size is very large this time overhead becomes especially pronounced.

The execution of such algorithms is inherently parallel, so it is natural to parallelize them to speed up problem solving. However, there is little work on solving hardware/software partitioning with parallel intelligent optimization algorithms. The known published work covers only a parallel genetic algorithm and a parallel hybrid particle swarm optimization algorithm, and the platform used was a small cluster of several PCs.

In recent years, many works have described GPU-accelerated intelligent optimization algorithms for combinatorial optimization problems. Inspired by this literature, the present invention is the first to explore using the GPU to accelerate tabu search for hardware/software partitioning.

Summary of the invention

The purpose of the invention is to overcome the shortcomings of the background art described above by providing a GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning.

The technical solution adopted by the present invention is a GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning, characterized in that it comprises the following steps:

Step 1: On the host side, obtain an initial solution of the hardware/software partitioning and its related information; set the initial solution as both the current solution and the global best solution;

Step 2: On the host side, convert the representation of the task graph to be partitioned;

Step 3: Set the parameters; allocate the memory needed on the host side; allocate the memory needed on the GPU side;

Step 4: On the host side, send the data to the GPU and launch the GPU kernels;

Step 5: On the device side, compute the best candidate neighbor with the kernels and return the result to the host side;

Step 6: On the host side, update the tabu list and the tabu flags;

Step 7: On the host side, update the current solution and the global best solution;

Step 8: On the host side, check whether the stopping criterion is met;

If not, return to Step 4;

If so, end the procedure.

Preferably, the initial solution in Step 1 refers to a partition of the task graph, obtained by a heuristic or random method, together with its hardware cost. The related information mainly includes the solution vector, the map vector of the solution vector, and the software cost and communication cost of the initial solution. The solution vector is the 0/1 array that yields this hardware cost. The map vector is also a 0/1 array; it mainly preserves the assignment of each task graph node in the order before the nodes were sorted, and is used for the subsequent computation of the communication cost.

Preferably, converting the representation of the task graph in Step 2 means that, before the task graph is transferred to the GPU, its adjacency-list representation is converted into the compressed sparse row (CSR) format, which is suited to computation on the GPU.
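As a rough illustration of the Step 2 conversion, the C++ sketch below flattens an adjacency list into the three CSR arrays used later (row_ptr, col_index, data). The struct `CsrGraph` and the function `toCsr` are hypothetical names. The sketch assumes each undirected edge already appears in the lists of both endpoints, and it is not the patent's own code.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container for the CSR form of an undirected task graph.
struct CsrGraph {
    std::vector<int> rowPtr;    // n + 1 offsets; row i spans [rowPtr[i], rowPtr[i+1])
    std::vector<int> colIndex;  // neighbor node number of each stored entry
    std::vector<float> data;    // communication cost of each stored entry
};

// adj[i] lists (neighbor, cost) pairs for node i. Each undirected edge must be
// listed under both endpoints, so the result stores 2m entries.
CsrGraph toCsr(const std::vector<std::vector<std::pair<int, float>>>& adj) {
    CsrGraph g;
    g.rowPtr.reserve(adj.size() + 1);
    g.rowPtr.push_back(0);
    for (const auto& row : adj) {
        for (const auto& entry : row) {
            assert(entry.first >= 0 && entry.first < static_cast<int>(adj.size()));
            g.colIndex.push_back(entry.first);
            g.data.push_back(entry.second);
        }
        // One past this node's last entry, i.e. the start of the next node's row.
        g.rowPtr.push_back(static_cast<int>(g.colIndex.size()));
    }
    return g;
}
```

For the 4-node example of the embodiment, in which every node has two neighbors, this yields row_ptr = {0, 2, 4, 6, 8}.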

Preferably, the parameters in Step 3 mainly include the maximum number of iterations, the maximum number of consecutive non-improving iterations, the length of the tabu list, the size of the neighborhood, the identifier of the neighbor in the candidate neighbor set that is chosen as the current solution, and the thread block size and number of thread blocks needed to launch the GPU kernels;

The memory needed on the host side includes: storage for the tabu list; storage for the tabu state of each candidate neighbor; storage for the indices, software costs, hardware costs and communication costs of all 2-flip candidate neighbors generated from the current solution; and storage for the number, software cost, hardware cost and communication cost of the best candidate neighbor;

The memory needed on the GPU side includes: storage for the current solution vector and its map vector; storage for the tabu state of each candidate neighbor; a status flag array recording whether each candidate neighbor satisfies the constraint; storage for the CSR representation of the task graph; storage for the indices, software costs, hardware costs and communication costs of all 2-flip candidate neighbors generated from the current solution; storage for the feasible candidate neighbors that satisfy the constraint; and storage for the number, software cost, hardware cost and communication cost of the best candidate neighbor.

Preferably, the data sent to the GPU in Step 4 include the solution vector of the current solution, the map vector of the current solution vector, the software cost, hardware cost and communication cost of the current solution, the hardware cost of the global best solution, the task graph to be partitioned, and the tabu list. The GPU kernels mainly include a kernel that computes the costs of the 2-flip neighbors, a kernel that selects the feasible neighbors satisfying the constraint, and a kernel that performs tabu evaluation of the feasible neighbors.

Preferably, computing the best candidate neighbor in Step 5 comprises the following sub-steps:

Step 5.1: On the GPU side, each 2-flip neighbor (the current solution with two positions flipped) is a candidate solution. The GPU computes the software cost, hardware cost and communication cost of every candidate solution in parallel. When the computation is complete, each candidate is evaluated by whether the sum of its software cost and communication cost satisfies the constraint, and 0 or 1 is written to the corresponding position of the status flag array, where 0 means the constraint is not satisfied and 1 means it is satisfied. At the same time, the neighbor identifiers, software costs, hardware costs and communication costs are written into the memory allocated on the GPU side;

Step 5.2: On the GPU side, using the neighbor status flag array, a parallel compaction algorithm stores the neighbors whose flag is 1 into the GPU-side storage for feasible candidate neighbors that satisfy the constraint;

Step 5.3: On the GPU side, each feasible solution is checked in parallel against the tabu list. A feasible solution that is not in the tabu list and has the minimum hardware cost among all feasible 2-flip solutions is the best candidate neighbor. A feasible solution that is in the tabu list is also selected as the best candidate neighbor if its hardware cost is lower than that of the global best solution.
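Step 5.3 can be read as a minimum reduction over the feasible candidates, in which a tabu candidate is admissible only if its hardware cost is below the global best (an aspiration condition). The serial C++ sketch below expresses that reading. `Candidate` and `selectBest` are hypothetical names, and a GPU version would replace the loop with a parallel min-reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical record for one feasible 2-flip neighbor.
struct Candidate {
    int neighbor;        // neighbor number among all 2-flip neighbors
    float software;
    float hardware;
    float communication;
};

// Returns the position in `feasible` of the best admissible candidate, or -1.
// Admissible: not tabu, or tabu but with hardware cost below the global best.
int selectBest(const std::vector<Candidate>& feasible,
               const std::vector<int>& tabuState,  // 1 = tabu, by neighbor number
               float globalBestHardware) {
    int best = -1;
    float bestHw = std::numeric_limits<float>::infinity();
    for (std::size_t k = 0; k < feasible.size(); ++k) {
        const Candidate& c = feasible[k];
        assert(c.neighbor >= 0 && c.neighbor < static_cast<int>(tabuState.size()));
        const bool tabu = tabuState[c.neighbor] == 1;
        const bool aspirated = c.hardware < globalBestHardware;
        if ((!tabu || aspirated) && c.hardware < bestHw) {
            bestHw = c.hardware;
            best = static_cast<int>(k);
        }
    }
    return best;
}
```

A non-tabu candidate is accepted even when it is worse than the global best, which matches the later remark that the current solution is updated even if its hardware cost increases.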

Preferably, the result returned to the host side in Step 5 mainly consists of the number, software cost, hardware cost and communication cost of the best candidate neighbor.

Preferably, the tabu list is updated on the host side in Step 6 in first-in-first-out order using a circular queue, and the tabu object is the identifier of the neighbor that became the current solution in the previous iteration. When the tabu flags are updated, the flag of the index leaving the queue is set to 0, and the flag of the index entering the queue is set to 1.
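The circular-queue tabu list of Step 6 might be sketched as follows. `TabuList` is a hypothetical class: the queue capacity is the tabu list length, and the state array holds one 0/1 flag per neighbor number, mirroring the flag updates described above.

```cpp
#include <cassert>
#include <vector>

// Fixed-length FIFO tabu list stored as a circular buffer, plus a per-neighbor
// tabu-state array (1 = tabu, 0 = free).
class TabuList {
public:
    TabuList(int length, int neighborhoodSize)
        : queue_(length, -1), state_(neighborhoodSize, 0) {
        assert(length > 0);
    }

    // Makes `neighbor` tabu; if the list is full, the oldest entry is released first.
    // Note: like the 0/1 flag scheme it mirrors, releasing an older copy of a number
    // that was pushed twice also clears the flag of the newer copy.
    void push(int neighbor) {
        assert(neighbor >= 0 && neighbor < static_cast<int>(state_.size()));
        const int cap = static_cast<int>(queue_.size());
        if (count_ == cap) {              // full: dequeue at the head
            state_[queue_[head_]] = 0;
            head_ = (head_ + 1) % cap;
            --count_;
        }
        const int tail = (head_ + count_) % cap;  // enqueue at the tail
        queue_[tail] = neighbor;
        state_[neighbor] = 1;
        ++count_;
    }

    bool isTabu(int neighbor) const { return state_[neighbor] == 1; }
    const std::vector<int>& state() const { return state_; }

private:
    std::vector<int> queue_;  // circular buffer of neighbor numbers
    std::vector<int> state_;  // tabu flag per neighbor number
    int head_ = 0;
    int count_ = 0;
};
```

With length 2 (the value used in the embodiment), pushing 3, then 1, then 5 releases 3 and leaves 1 and 5 tabu.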

Preferably, updating the current solution and the global best solution in Step 7 means replacing the current solution with the best neighbor selected in the current iteration. This mainly involves replacing the solution vector, its map vector, and the hardware cost, software cost, communication cost and identifier of the current solution. If the cost of the best neighbor is better than the global best hardware cost, the global hardware cost is updated as well; whenever the global hardware cost is updated, the counter of consecutive non-improving iterations is reset to 0.

Preferably, the stopping criterion in Step 8 means that the search stops once the number of executions of the loop from Step 4 to Step 8 reaches the maximum number of iterations, or the number of consecutive non-improving iterations reaches its limit.
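The host-side bookkeeping of Steps 7 and 8 can be sketched as a small state object. `SearchControl` is hypothetical. It assumes the stop tests mean "reached" (>=), whereas the embodiment below phrases them as "exceeds", so the exact comparison is a detail to confirm.

```cpp
#include <cassert>

// Hypothetical bookkeeping for Steps 7-8: global best hardware cost,
// consecutive non-improving iteration counter, and the stopping test.
struct SearchControl {
    int maxIterations;         // e.g. 2000 in the embodiment
    int maxNoImprove;          // e.g. 200 in the embodiment
    float globalBestHardware;  // hardware cost of the global best solution
    int iteration = 0;
    int noImprove = 0;

    // Called once per iteration with the hardware cost of the new current
    // solution (accepted even if worse). Returns true if the search should stop.
    bool update(float currentHardware) {
        assert(maxIterations > 0 && maxNoImprove > 0);
        if (currentHardware < globalBestHardware) {
            globalBestHardware = currentHardware;
            noImprove = 0;  // improvement: reset the counter
        } else {
            ++noImprove;
        }
        ++iteration;
        return iteration >= maxIterations || noImprove >= maxNoImprove;
    }
};
```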

Compared with the prior art, the invention has clear beneficial effects: first, the execution time of the algorithm is shorter than that of currently known work; second, the solution quality is also better than that of known work.

Description of the drawings

Fig. 1 is a schematic diagram of the overall framework of an embodiment of the present invention;

Fig. 2 is a schematic diagram of the task graph model to be partitioned in an embodiment of the present invention and its corresponding compressed sparse row representation;

Fig. 3 is a schematic diagram of the candidate solution generation rule of an embodiment of the present invention;

Fig. 4 is a schematic flow diagram of the compaction algorithm of an embodiment of the present invention.

Detailed description of the embodiments

To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here only illustrate and explain the invention and are not intended to limit it.

Fig. 1 shows the main architecture of the invention. The framework is divided into the host side and the client side: the host side comprises the CPU and main memory, and the client side is the GPU. The host side generates the initial solution; transfers it to the client as the current solution; receives the best candidate solution from the client; updates the tabu list and the tabu state table; updates the current solution and the global best solution from the best candidate; and checks the stopping condition to decide whether to run another iteration. On the client side, all 2-flip candidate solutions are generated from the current solution. Logically, the GPU computes the hardware cost, software cost and communication cost of all candidates in parallel, discards the candidates that violate the constraint, selects a feasible solution that is not tabu according to the tabu state table, and transfers it back to the host side.

In this embodiment, the task graph to be partitioned is represented as an undirected graph G(V, E), where V = {v_1, v_2, ..., v_n} is the set of task nodes. Each node carries two costs: the cost h(v_i) when the node is assigned to hardware and the cost s(v_i) when it is assigned to software. E is the set of edges between nodes; the weight of an edge is the communication cost c(v_i, v_j) incurred when its two adjacent nodes are assigned to different sides. A hardware/software partition is then P = {V_H, V_S}, where V_H ∩ V_S = ∅ and V_H ∪ V_S = V. Correspondingly, the edge set of partition P is defined as E_P = {(v_i, v_j) : v_i ∈ V_H, v_j ∈ V_S, or v_i ∈ V_S, v_j ∈ V_H}. The upper-right panel of Fig. 2 shows a simple task graph to be partitioned. Each node carries the execution costs of the task when it is assigned to hardware and to software, and the weight on each edge is the communication cost incurred when the two connected tasks are assigned to hardware and software (or software and hardware), respectively.

A partition P is measured mainly by three costs, the hardware cost H_P, the software cost S_P and the communication cost C_P, defined as follows:

$$H_P = \sum_{v_i \in V_H} h_i, \qquad S_P = \sum_{v_j \in V_S} s_j, \qquad C_P = \sum_{(v_i, v_j) \in E_P} c(v_i, v_j)$$

where h_i is the hardware cost of node i (i = 1, 2, ..., n) and s_j is the software cost of node j; v_i denotes a node of the task graph assigned to hardware and v_j a node assigned to software; V_H is the subset of task nodes executed in hardware and V_S the subset executed in software; E_P is the set of edges of the task graph whose endpoints lie in different subsets; H_P is the total hardware cost of all nodes in the hardware subset; S_P is the total software cost of all nodes in the software subset; and C_P is the sum of the weights of all edges connecting the two subsets, i.e., the total communication cost of the partitioned task graph.

Based on these definitions, the partitioning problem P is defined as follows:

Problem P: given a graph G, cost functions s, h, c and a bound R ≥ 0, find a hardware/software partition P that minimizes the hardware cost H_P subject to the constraint S_P + C_P ≤ R.

The solution of P is represented as an n-dimensional vector X = {x_1, x_2, ..., x_n}. A component x_i = 1 means that node i is executed in software, and x_i = 0 means that it is executed in hardware. C(X) denotes the communication cost of the whole system. Problem P can therefore be formalized as the following minimization problem:

$$\min_{X \in \{0,1\}^n} \; \sum_{i=1}^{n} h_i (1 - x_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} s_i x_i + C(X) \le R$$

where h_i is the hardware cost and s_i the software cost of node i. The hardware cost, software cost and communication cost of problem P are computed as follows:

$$H(X) = \sum_{i=1}^{n} h_i (1 - x_i), \qquad S(X) = \sum_{i=1}^{n} s_i x_i, \qquad C(X) = \sum_{(v_i, v_j) \in E} c(v_i, v_j)\,\lvert x_i - x_j \rvert$$

In problem P, a solution satisfies the constraint if and only if:

$$S(X) + C(X) \le R$$
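The cost formulas can be checked with a direct, non-incremental evaluation. This C++ sketch assumes the CSR layout from Step 2, in which every undirected edge is stored twice, so the summed cut weight is halved. `Costs`, `evaluate` and `feasible` are hypothetical names. The example costs in the test (s = {6, 20, 9, 13}, h = {3, 14, 5, 5}) and the edges are reconstructed from the numbers in the worked example, not read from Fig. 2 itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Costs { float software, hardware, communication; };

// Full evaluation of solution vector x (x[i] == 1: software, 0: hardware).
// The graph is in CSR form; each undirected edge appears in both endpoint rows,
// so every cut edge is counted twice and the total is halved.
Costs evaluate(const std::vector<int>& x,
               const std::vector<float>& s, const std::vector<float>& h,
               const std::vector<int>& rowPtr, const std::vector<int>& colIndex,
               const std::vector<float>& data) {
    assert(x.size() == s.size() && x.size() == h.size());
    assert(rowPtr.size() == x.size() + 1);
    Costs c{0.0f, 0.0f, 0.0f};
    float cut = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 1) c.software += s[i]; else c.hardware += h[i];
        for (int e = rowPtr[i]; e < rowPtr[i + 1]; ++e)
            if (x[i] != x[colIndex[e]]) cut += data[e];  // |x_i - x_j| = 1
    }
    c.communication = cut / 2.0f;
    return c;
}

// Constraint of problem P: S(X) + C(X) <= R.
bool feasible(const Costs& c, float R) { return c.software + c.communication <= R; }
```

With x = {0, 1, 0, 1} and the reconstructed example data, this gives S = 33, H = 8 and C = 15, the values quoted in step (1) of the embodiment.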

The system platform of this embodiment is Windows 7. The serial part was developed with Visual Studio 2012 in C++, and the parallel part with CUDA C version 7.0. The main framework is shown in Fig. 1; the specific steps of the embodiment are as follows:

(1) To simplify the discussion, assume that the initial partition is generated by some algorithm that does not sort the task nodes during solving. The initial solution can then be the current solution vector 0101 in Fig. 3, and its map vector is likewise 0101. The software cost, hardware cost and communication cost of this solution are 33, 8 and 15, respectively. If instead the initial solution is produced by a greedy heuristic, the task nodes are sorted in descending order of the ratio of hardware cost to software cost; the initial solution vector then becomes 1001, while the map vector, which keeps the original node order, remains unchanged.

(2) Convert the task graph so that it suits execution by the GPU algorithm. The result is shown in the lower-left panel of Fig. 2. The data array holds the communication costs on the edges of the task graph. The index into the row_ptr array is the node number, and each element of row_ptr is the index in data of the first communication edge of that node. For example, element 0 of row_ptr corresponds to node v₁, which is adjacent to v₂ and v₃ with communication costs 9 and 3. Which neighbor is stored first depends on the implementation; in this example v₂ is stored first, so element 0 of row_ptr is 0, meaning that the communication cost to the first neighbor of v₁ is at index 0 of data. Likewise, the first neighbors of nodes v₂, v₃ and v₄ are at indices 2, 4 and 6 of data, and the final element of row_ptr is 8. The row_ptr array has one more element than the number of nodes, and its last element marks the end of the data array, so the size of data is twice the number of edges in the task graph (each undirected edge is stored once for each endpoint). The col_index array gives the node number corresponding to each communication cost in data.
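For concreteness, the arrays below are one CSR encoding consistent with the costs used throughout the worked example. The edge set (v1-v2: 9, v1-v3: 3, v2-v4: 7, v3-v4: 6) is reconstructed from those costs rather than read from Fig. 2, and the neighbor order within the rows of v2 to v4 is assumed. `edgeCost` is a hypothetical helper that shows how one row is scanned.

```cpp
#include <cassert>
#include <vector>

// CSR arrays for the 4-node example (node vk is index k-1), reconstructed from
// the worked-example costs. Row order for nodes 1..3 is an assumption.
const std::vector<int>   kRowPtr   = {0, 2, 4, 6, 8};
const std::vector<int>   kColIndex = {1, 2, 0, 3, 0, 3, 1, 2};
const std::vector<float> kData     = {9.0f, 3.0f, 9.0f, 7.0f, 3.0f, 6.0f, 7.0f, 6.0f};

// Returns the communication cost on edge (u, v), or 0 if there is no edge.
// Entries kRowPtr[u] .. kRowPtr[u + 1] - 1 belong to node u.
float edgeCost(int u, int v) {
    assert(u >= 0 && u + 1 < static_cast<int>(kRowPtr.size()));
    for (int e = kRowPtr[u]; e < kRowPtr[u + 1]; ++e)
        if (kColIndex[e] == v) return kData[e];
    return 0.0f;
}
```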

(3) Set the maximum number of iterations to 2000 and the maximum number of consecutive non-improving iterations to 200, and set the tabu list length to the square root of the number of nodes. There are 4 nodes in this embodiment, so the tabu list length is √4 = 2. The neighborhood size is n(n-1)/2, where n is the number of nodes, so the neighborhood size here is 6.
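The two parameter rules in step (3) reduce to small integer formulas. The function names below are hypothetical, and rounding the square root down for non-square node counts is an assumption, since the text only gives the n = 4 case.

```cpp
#include <cassert>
#include <cmath>

// Tabu list length: square root of the node count, rounded down (assumption).
int tabuLength(int n) {
    assert(n >= 1);
    return static_cast<int>(std::sqrt(static_cast<double>(n)));
}

// Size of the 2-flip neighborhood: number of unordered position pairs, n(n-1)/2.
long long neighborhoodSize(long long n) {
    assert(n >= 2);
    return n * (n - 1) / 2;
}
```

For the embodiment's n = 4 these give a tabu length of 2 and a neighborhood size of 6.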

(4) Allocate memory on the GPU side: the current solution vector, an int array of size 4; the map vector, an int array of size 4; the task graph storage, consisting of an int row_ptr array of size 4+1 = 5, a float data array of size 4×2+1 = 9 and an int col_index array of size 4×2 = 8; and an int tabu list of size 2. The int neighbor identifier array, the float neighbor software cost array, the float neighbor hardware cost array, the float neighbor communication cost array and the int neighbor feasibility flag array all have size 6. Because the number of feasible solutions is not known in advance, the feasible-solution arrays also have the size of the neighborhood; they likewise comprise an int neighbor identifier array and float arrays for the neighbor software, hardware and communication costs.

(5) Launching a kernel on the GPU side requires configuring the thread scale. CUDA organizes threads into grids and thread blocks: one GPU kernel corresponds to one grid, a grid consists of at least one thread block, and the grid size is the number of thread blocks launched for the kernel. A thread block in turn consists of at least one thread, so the total number of threads needed to launch a GPU kernel equals the grid size times the block size. The invention runs three kernels on the GPU side: cost computation and constraint evaluation for all 2-flip neighbors, selection of feasible solutions, and tabu evaluation of feasible solutions. The first two kernels are configured with as many threads as the neighborhood size; in this embodiment the neighborhood size is 6, so both are configured with 6 threads in total. For the tabu evaluation of feasible solutions, this embodiment uses 3 threads in a single thread block, i.e., a grid size of 1.
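The launch-size arithmetic can be written as a ceiling division so that the grid always covers the whole neighborhood. This is a sketch with hypothetical names; the block size is left as a parameter because the text does not fix it beyond this small example.

```cpp
#include <cassert>

// Hypothetical launch configuration: grid size (number of blocks) and block size.
struct LaunchConfig { int gridSize; int blockSize; };

// Smallest grid whose total thread count gridSize * blockSize covers every neighbor.
LaunchConfig configFor(int neighborhoodSize, int blockSize) {
    assert(neighborhoodSize > 0 && blockSize > 0);
    const int grid = (neighborhoodSize + blockSize - 1) / blockSize;  // ceiling division
    return {grid, blockSize};
}
```

When the neighborhood size is not a multiple of the block size, some threads of the last block have no neighbor to process and must be skipped inside the kernel.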

(6) When computing neighbor costs, each thread corresponds to one neighbor, so the neighbor number is determined jointly by the thread index, the thread block index and the block size, as follows:

neibIdx = blockIdx.x × blockDim.x + threadIdx.x

Here neibIdx is the number of the candidate neighbor, blockIdx.x is the index of the thread block, blockDim.x is the size of the thread block, and threadIdx.x is the index of the thread within its block. Note that the block and thread indices are built-in variables provided by CUDA C and can be used directly in the implementation.
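Because the formula is plain integer arithmetic, it can be illustrated without a GPU. The serial C++ loop below emulates what each (block, thread) pair would compute and includes the bounds check a real kernel needs when the grid is rounded up. It is an emulation for illustration, not CUDA code, and `emulateThreadIndices` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Serial emulation of neibIdx = blockIdx.x * blockDim.x + threadIdx.x.
// Loop variables are named after the CUDA built-ins they stand in for.
// Returns the neighbor numbers that receive a thread, in launch order.
std::vector<int> emulateThreadIndices(int gridDim, int blockDim, int neighborhoodSize) {
    assert(gridDim > 0 && blockDim > 0);
    std::vector<int> visited;
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx)
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            const int neibIdx = blockIdx * blockDim + threadIdx;
            if (neibIdx >= neighborhoodSize) continue;  // surplus thread does nothing
            visited.push_back(neibIdx);
        }
    return visited;
}
```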

Once the neighbor number is known, the two positions of the current solution vector that must be flipped to generate that neighbor are also needed. Let pos1 and pos2 denote the two indices of the current solution vector to be flipped; they are obtained from the following two formulas:

where n is the number of nodes of the task graph to be partitioned and neibIndex is the number of the candidate neighbor (neibIdx above). The first formula gives the first flip position pos1 needed to generate the candidate, and the second gives the second flip position pos2.

In this embodiment, substituting the numbers 0 to 5 into the formulas yields the candidate neighbors shown in Fig. 3.
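The two formulas for pos1 and pos2 are not legible in this copy. The sketch below gives one mapping from a neighbor number to its flip positions that reproduces the solution vectors listed in Table 1 (numbers 0 to 5 map to (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)). It is an assumption that the patent uses this lexicographic order, and the loop is not the patent's closed-form formula. `flipPositions` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>

// Maps neighbor number k in [0, n(n-1)/2) to flip positions (pos1, pos2), pos1 < pos2,
// in lexicographic order: (0,1), (0,2), ..., (0,n-1), (1,2), ... O(n) per index.
std::pair<int, int> flipPositions(long long k, int n) {
    assert(n >= 2 && k >= 0 && k < static_cast<long long>(n) * (n - 1) / 2);
    int pos1 = 0;
    long long rowLen = n - 1;  // number of pairs whose first position is pos1
    while (k >= rowLen) {
        k -= rowLen;
        ++pos1;
        --rowLen;
    }
    return {pos1, pos1 + 1 + static_cast<int>(k)};
}
```

Flipping the returned positions of the current vector 0101 gives 1001 for k = 0 and 0110 for k = 5, as in Table 1.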

Once the two flip positions of a neighbor are known, its software cost, hardware cost and communication cost can be derived from them. Take Fig. 2 as an example: step (1) gives the software, hardware and communication costs of the current solution as 33, 8 and 15. Now compute the three costs of the candidate neighbor 1001, which is generated by flipping positions 0 and 1. At position 0 the component changes from 0 to 1, so the software cost increases and the hardware cost decreases: starting from the current costs, adding the software cost 6 of v₁ and subtracting its hardware cost 3 updates the software and hardware costs to 39 and 5. Similarly, at position 1 the component changes from 1 to 0, so the software cost of that node is subtracted and its hardware cost is added: subtracting the software cost 20 of v₂ and adding its hardware cost 14 gives final software and hardware costs of 19 and 19 for this neighbor. The advantage of this method is that it reuses the known costs of the current solution and avoids the inefficiency of recomputing them from scratch with the cost formulas.

The communication cost of neighbor 0 (solution vector 1001) is likewise derived from the known communication cost of the current solution. When position 0 is flipped, the solution vector becomes 1101. Among the neighbors of v₁, node v₃ is now on the other side, so that edge now incurs a communication cost; conversely, v₂ is now on the same side as v₁, so that edge no longer incurs one. The communication cost is therefore updated to 15 - 9 + 3 = 9. Similarly, when position 1 is flipped, the solution vector becomes 1001; both neighbors of v₂, namely v₁ and v₄, are now on the other side, so both edges incur communication costs. The communication cost is updated to 9 + 9 + 7 = 25, which is the actual communication cost of candidate neighbor 1001.
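The incremental update of step (6) can be sketched as a single-flip routine applied once per flip position. `flipAndUpdate` is hypothetical. It flips in place for clarity, whereas a GPU thread would avoid mutating the shared current solution vector. The example data in the test are the reconstructed values used above (s = {6, 20, 9, 13}, h = {3, 14, 5, 5}, edges v1-v2: 9, v1-v3: 3, v2-v4: 7, v3-v4: 6).

```cpp
#include <cassert>
#include <vector>

struct Costs { float software, hardware, communication; };

// Flips node i of x (x[i] == 1: software) and updates the costs incrementally,
// touching only node i's own costs and its CSR row.
void flipAndUpdate(std::vector<int>& x, int i, Costs& c,
                   const std::vector<float>& s, const std::vector<float>& h,
                   const std::vector<int>& rowPtr, const std::vector<int>& colIndex,
                   const std::vector<float>& data) {
    assert(i >= 0 && i < static_cast<int>(x.size()));
    if (x[i] == 0) { c.software += s[i]; c.hardware -= h[i]; }  // hardware -> software
    else           { c.software -= s[i]; c.hardware += h[i]; }  // software -> hardware
    x[i] = 1 - x[i];
    // Flipping one endpoint toggles the cut status of every incident edge.
    for (int e = rowPtr[i]; e < rowPtr[i + 1]; ++e) {
        if (x[i] != x[colIndex[e]]) c.communication += data[e];  // newly cut
        else                        c.communication -= data[e];  // no longer cut
    }
}
```

Starting from x = 0101 with costs (33, 8, 15), flipping position 0 gives (39, 5, 9), and then flipping position 1 gives (19, 19, 25), the values derived above for neighbor 0.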

The software, hardware and communication costs of the remaining candidate neighbors are obtained in the same way. Finally, the number, software cost, hardware cost and communication cost of each candidate neighbor are written into the corresponding arrays allocated in global memory. The results are shown in Table 1:

Table 1. Candidate neighbors and their costs

Number  Solution vector  Software cost  Hardware cost  Communication cost
0       1001             19             19             25
1       1111             48             0              0
2       1100             26             10             10
3       0011             22             17             10
4       0000             0              27             0
5       0110             29             8              25

(7) Because the partitioning problem solved by the invention is constrained, not every candidate neighbor satisfies the given constraint. Solutions whose software cost plus communication cost violates the constraint must be discarded, and those that satisfy it are kept.

Assume that the constraint of problem P is R = 40. Following problem P, each candidate neighbor is flagged 1 if it satisfies the constraint and 0 otherwise. Table 2 shows the resulting neighbor feasibility flag array.

Table 2. Neighbor feasibility flags (R = 40)

Number  0  1  2  3  4  5
Flag    0  0  1  1  1  0
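The flag computation of step (7) is an element-wise comparison. This sketch (the function name is hypothetical) applies it to the software and communication columns of Table 1 with R = 40.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step (7) constraint evaluation: flag 1 if software + communication <= R, else 0.
std::vector<int> feasibilityFlags(const std::vector<float>& software,
                                  const std::vector<float>& communication, float R) {
    assert(software.size() == communication.size());
    std::vector<int> flags(software.size());
    for (std::size_t k = 0; k < flags.size(); ++k)
        flags[k] = (software[k] + communication[k] <= R) ? 1 : 0;
    return flags;
}
```

With the Table 1 values, the sums are 44, 48, 36, 32, 0 and 54, so the flags are 0, 0, 1, 1, 1, 0.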

Based on these flags, the compaction algorithm keeps the feasible solutions that satisfy the constraint; it follows the flow shown in Fig. 4. As a result, the numbers, software costs, hardware costs and communication costs of neighbors 2, 3 and 4 are kept in the corresponding GPU-side feasible-solution arrays. Table 3 shows what is stored in the feasible-solution arrays.

Table 3. Candidate neighbors retained by compaction (X = unused slot)

Number              2   3   4   X  X  X
Software cost       26  22  0   X  X  X
Hardware cost       10  17  27  X  X  X
Communication cost  10  10  0   X  X  X
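Fig. 4 is not reproduced here, so the sketch below shows the common scan-then-scatter form of stream compaction in serial C++, assuming that the patent's compaction follows this pattern. An exclusive prefix sum over the 0/1 flags gives each feasible neighbor its output slot; on a GPU both loops would run in parallel (for instance with Thrust's `exclusive_scan` or `copy_if`). The function name `compactFeasible` is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial sketch of flag-based stream compaction: exclusive prefix sum, then
// scatter. Returns the numbers of the neighbors whose flag is 1, in order.
std::vector<int> compactFeasible(const std::vector<int>& flags) {
    const std::size_t n = flags.size();
    // Exclusive scan: offset[k] = number of feasible neighbors before index k.
    std::vector<int> offset(n, 0);
    int running = 0;
    for (std::size_t k = 0; k < n; ++k) {
        assert(flags[k] == 0 || flags[k] == 1);
        offset[k] = running;
        running += flags[k];
    }
    // Scatter: each feasible neighbor writes its number to its own output slot.
    // On a GPU, every iteration of this loop would be an independent thread.
    std::vector<int> out(static_cast<std::size_t>(running));
    for (std::size_t k = 0; k < n; ++k)
        if (flags[k] == 1) out[offset[k]] = static_cast<int>(k);
    return out;
}
```

With the flags {0, 0, 1, 1, 1, 0}, it returns {2, 3, 4}, the neighbors retained in Table 3.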

(8) Using Table 3, the tabu list and objective P, select the neighbor that is not tabu and has the minimum hardware cost. In this example, because this is the first iteration, the tabu list is empty and every feasible solution can be selected. On the GPU side, this selection is performed by a reduction that takes the tabu state table into account; a standard reduction computes the sum, maximum or minimum of an input array. Here the feasible neighbor numbered 2 has the minimum hardware cost (10), so it is selected as the current solution for the next iteration.

(9) Return the number, software cost, hardware cost and communication cost of the selected non-tabu feasible solution to the CPU side.

(10) Update the tabu list by storing the number of the neighbor that produced the current solution. In this example, the current solution of the first iteration was generated from the initial solution, so no index is stored in the tabu list. In the second iteration, however, the feasible neighbor numbered 2 has become the current solution, so this number must be added to the tabu list when it is updated. At the same time, the corresponding position in the tabu state table is set to 1 to mark that neighbor as tabu.

In addition, when the tabu list is full, the oldest neighbor identifier is released first and the value at its position in the tabu state table is reset to 0; the number of the neighbor that produced the current solution is then added to the tabu list.

The tabu list is a circular queue in which elements are enqueued at the tail and dequeued at the head.

(11) Using the identifier of the best non-tabu neighbor, transform the current solution vector to obtain the new current solution vector; the map vector of the solution vector must be transformed as well. Update the corresponding costs of the current solution with the software, hardware and communication costs of the best non-tabu neighbor. Note that the update is performed even if the hardware cost of the best non-tabu neighbor is greater than that of the current solution.

(12) Update the global cost: if the hardware cost of the new current solution is lower than the existing global hardware cost, the global cost is updated and the counter of consecutive non-improving iterations is reset to 0; otherwise the counter is incremented by 1. When the counter exceeds 200, the algorithm terminates.

(13) Transfer the new current solution vector, its map vector, its software cost, hardware cost and communication cost, and the neighbor tabu state table to the GPU side. The iteration counter is incremented by 1; if it exceeds 2000, the algorithm terminates, otherwise the next iteration begins.

The invention was verified on a benchmark set. Table 4 gives the number of nodes n and the number of edges m of each task graph, together with the corresponding problem size, size = 2 × n + 3 × m:

Table 4. Test data set

The software cost of each node is a uniformly distributed random number in [1, 100]; the hardware cost is a normally distributed random number based on the software cost of the corresponding node; and the communication cost between nodes is a uniformly distributed random number in [0, 2ρ·s_max], where s_max is the maximum software cost of a single node in the task graph and ρ is the communication-to-computation ratio (CCR). The values ρ = 0.1, 1 and 10 correspond to computation-intensive, intermediate and communication-intensive tasks, respectively. The constraint R takes one of two forms: 1) a strict real-time constraint, a uniformly distributed random number in [0, ½·Σs_i]; 2) a loose real-time constraint, a uniformly distributed random number in [½·Σs_i, Σs_i]. The combinations of CCR and R therefore give six cases in total.
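The benchmark distributions can be sketched as an instance generator. It is hypothetical throughout: the text does not say how the graph topology is generated or which mean and standard deviation the normal distribution for hardware costs uses, so the code generates only node and edge costs and uses placeholder normal parameters (mean s_i, standard deviation s_i/4, clamped at 0). Drawing software costs as integers in [1, 100] is also an assumption, and `std::uniform_real_distribution` samples a half-open interval. `problemSize` encodes the size = 2n + 3m measure used in Table 4.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical benchmark instance: node costs s, h and one cost per edge in c.
struct Instance {
    std::vector<float> s, h, c;
    float R = 0.0f;
};

// Problem-size measure used in Table 4: size = 2n + 3m.
long long problemSize(long long n, long long m) { return 2 * n + 3 * m; }

// Draws costs following the stated distributions. Placeholders (assumptions):
// integer s_i, and h_i ~ Normal(mean = s_i, stddev = s_i / 4) clamped at 0.
Instance generate(int n, int m, float rho, bool strictConstraint, unsigned seed) {
    assert(n > 0 && m >= 0 && rho > 0.0f);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> sDist(1, 100);  // s_i in [1, 100]
    Instance inst;
    float sMax = 0.0f, sSum = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float si = static_cast<float>(sDist(rng));
        inst.s.push_back(si);
        if (si > sMax) sMax = si;
        sSum += si;
        std::normal_distribution<float> hDist(si, si / 4.0f);  // placeholder parameters
        const float hi = hDist(rng);
        inst.h.push_back(hi < 0.0f ? 0.0f : hi);
    }
    // Communication costs: uniform on [0, 2 * rho * s_max).
    std::uniform_real_distribution<float> cDist(0.0f, 2.0f * rho * sMax);
    for (int e = 0; e < m; ++e) inst.c.push_back(cDist(rng));
    // R: strict in [0, sum/2), loose in [sum/2, sum).
    const float lo = strictConstraint ? 0.0f : sSum / 2.0f;
    const float hiR = strictConstraint ? sSum / 2.0f : sSum;
    std::uniform_real_distribution<float> rDist(lo, hiR);
    inst.R = rDist(rng);
    return inst;
}
```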

Next, the case CCR = 1, R = high is used to illustrate the advantages of the present invention. Table 5 compares the running speed and solution quality of the invention with those of the existing tabu search method in this case, where alg-ANTS denotes the serial implementation of the invention and GANTS denotes its parallel implementation on the GPU. In most cases, the run time of GANTS is lower than that of alg-ANTS.

ANTS and GANTS inventions are to the improvement degree and run time of solution under table 5.CCR=1, R=high

It should be understood that parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiment, although relatively detailed, should not be regarded as limiting the scope of patent protection of the present invention. Under the teaching of the present invention, those of ordinary skill in the art may make substitutions or variations without departing from the scope protected by the claims, and all of these fall within the protection scope of the present invention. The claimed scope of the present invention shall be determined by the appended claims.

Claims

1. A GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning, characterized in that it comprises the following steps:

Step 1: On the host side, obtain an initial solution of the hardware/software partitioning and the related information; set the initial solution as both the current solution and the global optimal solution;

Step 2: On the host side, convert the representation of the task graph;

Step 3: Set the parameters and allocate the memory space required on the host side and the GPU side;

Step 4: On the host side, send the data to the GPU side and launch the GPU kernels;

Step 5: On the GPU side, compute the best candidate neighbor with the kernels and return the result to the host side;

Step 6: On the host side, update the tabu list and the tabu flags;

Step 7: On the host side, update the current solution and the global optimal solution;

Step 8: On the host side, judge whether the stopping criterion is met;

If not, return to Step 4;

If so, terminate the procedure.

2. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the initial solution in Step 1 refers to a partition of the task graph to be partitioned, obtained by a heuristic or random method, together with its hardware cost; the related information mainly includes the solution vector, the mapping vector of the solution vector, and the software cost and communication cost of the initial solution; the solution vector is an array of 0s and 1s arranged in correspondence with the hardware costs; the mapping vector of the solution vector is also an array of 0s and 1s, which mainly preserves the state of each task-graph node before sorting and is used for the subsequent computation of the communication cost.

3. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: converting the representation of the task graph in Step 2 means that, before the task graph is transferred to the GPU, its adjacency-list representation must be converted into the compressed sparse row (CSR) format, which is suitable for computation on the GPU side.
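The adjacency-list to CSR conversion of claim 3 can be sketched in Python as below. The input layout (each adjacency entry is a `(neighbor, communication_cost)` pair, and an undirected edge is listed in both endpoint rows) and the function name are assumptions made for illustration.

```python
def adjacency_to_csr(adj):
    """Convert an adjacency list into CSR arrays.

    adj[u] is a list of (v, c_uv) pairs. Returns (row_ptr, col_idx, vals),
    where the neighbors of u are col_idx[row_ptr[u]:row_ptr[u + 1]] and the
    matching communication costs are vals[row_ptr[u]:row_ptr[u + 1]].
    """
    row_ptr, col_idx, vals = [0], [], []
    for nbrs in adj:
        for v, c in nbrs:
            col_idx.append(v)
            vals.append(c)
        row_ptr.append(len(col_idx))   # end of row u = start of row u + 1
    return row_ptr, col_idx, vals
```

For a three-node chain with edge costs 5.0 and 3.0, `adjacency_to_csr([[(1, 5.0)], [(0, 5.0), (2, 3.0)], [(1, 3.0)]])` gives `row_ptr = [0, 1, 3, 4]`. Three flat arrays are easy to copy to device memory in one transfer each, which is the usual reason CSR is preferred on the GPU.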

4. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the parameters in Step 3 mainly include the maximum number of iterations, the maximum number of consecutive non-improving iterations, the length of the tabu list, the neighborhood size, the index of the neighbor in the candidate neighborhood solution set that is chosen as the current solution, and the thread block size and number of thread blocks needed to launch the GPU kernels;

The memory space required on the host side includes: storage for the tabu list; storage for the tabu state of each neighborhood candidate solution; storage for the indices, software costs, hardware costs and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; and storage for the index, software cost, hardware cost and communication cost of the best candidate neighbor;

The memory space required on the GPU side includes: storage for the current solution vector and the mapping vector of the current solution; storage for the tabu state of each neighborhood candidate solution; a status flag array indicating whether each neighborhood candidate solution satisfies the constraint; storage for the task graph in compressed sparse row format; storage for the indices, software costs, hardware costs and communication costs of all 2-flip neighborhood candidate solutions generated from the current solution; storage for the feasible neighborhood candidate solutions that satisfy the constraint; and storage for the index, software cost, hardware cost and communication cost of the best candidate neighbor.

5. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the data sent to the GPU side in Step 4 include the solution vector of the current solution, the mapping vector of the current solution vector, the software cost, hardware cost and communication cost of the current solution, the hardware cost of the global optimal solution, the task graph of the hardware/software partitioning problem, and the tabu list; the GPU kernels mainly include a kernel that computes the costs of the 2-flip neighborhood solutions, a kernel that selects the feasible neighborhood solutions satisfying the constraint, and a kernel that performs tabu evaluation on the feasible neighborhood solutions.

6. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that computing the best candidate neighbor in Step 5 comprises the following sub-steps:

Step 5.1: On the GPU side, each 2-flip neighbor is a candidate solution, and the GPU computes the software cost, hardware cost and communication cost of every candidate in parallel. After the computation, each candidate is checked against the constraint using the sum of its software cost and communication cost, and 0 or 1 is written to the corresponding position of the status flag array, where 0 means the constraint is not satisfied and 1 means it is satisfied. At the same time, the neighbor index, software cost, hardware cost and communication cost are written into the memory space allocated on the GPU side;
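A sequential Python sketch of Step 5.1 follows. It assumes x[i] = 1 means node i is mapped to hardware, that a 2-flip neighbor flips two distinct positions (i, j) with i < j, that feasibility means software cost plus communication cost is at most R, and that the CSR arrays store each undirected edge in both directions. Each loop iteration stands in for one GPU thread; costs are recomputed from scratch for clarity, whereas a real kernel would more likely evaluate only the change caused by the two flips. Function names are illustrative.

```python
from itertools import combinations


def partition_costs(x, s, h, row_ptr, col_idx, vals):
    """Software, hardware and communication cost of a 0/1 partition x."""
    sw = sum(si for si, xi in zip(s, x) if xi == 0)   # nodes in software
    hw = sum(hi for hi, xi in zip(h, x) if xi == 1)   # nodes in hardware
    comm = 0.0
    for u in range(len(x)):
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            # Symmetric CSR: u < v counts each undirected edge once.
            if u < v and x[u] != x[v]:                # edge crosses the cut
                comm += vals[k]
    return sw, hw, comm


def evaluate_2flip(x, s, h, row_ptr, col_idx, vals, R):
    """Step 5.1: costs and feasibility flag for every 2-flip neighbor."""
    pairs = list(combinations(range(len(x)), 2))       # neighbor k <-> pairs[k]
    costs, flags = [], []
    for i, j in pairs:                                 # one GPU thread per pair
        y = list(x)
        y[i] ^= 1
        y[j] ^= 1
        sw, hw, comm = partition_costs(y, s, h, row_ptr, col_idx, vals)
        costs.append((sw, hw, comm))
        flags.append(1 if sw + comm <= R else 0)       # 1 = constraint satisfied
    return pairs, costs, flags
```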

Step 5.2: On the GPU side, based on the neighborhood status flag array, a parallel compaction algorithm stores the neighbors marked 1 in the flag array into the GPU-side memory space for feasible neighborhood candidate solutions that satisfy the constraint;
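The claim names a parallel compaction algorithm without detailing it. A common way to realize stream compaction on a GPU is an exclusive prefix sum over the 0/1 flags followed by a scatter, so each kept item knows its output slot without coordination. The sequential Python sketch below shows that scheme, on the assumption that it matches the intended step; the function names are hypothetical.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[k] = flags[0] + ... + flags[k - 1]."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out, running          # offsets and number of kept items


def compact(flags, items):
    """Keep items[k] where flags[k] == 1, preserving their order."""
    offsets, count = exclusive_scan(flags)
    out = [None] * count
    for k, f in enumerate(flags):    # scatter; on the GPU, one thread per k
        if f:
            out[offsets[k]] = items[k]
    return out
```

On a GPU the scan itself is also done in parallel (for example a work-efficient tree scan); the scatter loop has no dependences between iterations, so it maps directly onto one thread per candidate.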

Step 5.3: On the GPU side, using the tabu list, determine in parallel whether each feasible solution is tabu. A feasible solution that is not in the tabu list and has the minimum hardware cost among all feasible 2-flip solutions is the best candidate neighbor. A feasible neighbor that is in the tabu list is nevertheless selected as the best candidate neighbor if its hardware cost is lower than that of the global optimum.
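Step 5.3 combines a tabu check, the aspiration criterion and a minimum search. The sketch below assumes each feasible candidate is a tuple (index, sw, hw, comm) and that tabu flags form a 0/1 array indexed by neighbor index; on the GPU the loop would typically become a parallel min-reduction. Taking the minimum over the union of non-tabu and aspirated candidates is one reading of the claim, not necessarily the patented kernel, and the function name is hypothetical.

```python
def select_best(candidates, tabu_flag, best_hw):
    """Return the admissible feasible candidate with minimum hardware cost.

    candidates -- list of (index, sw, hw, comm) tuples, all feasible
    tabu_flag  -- tabu_flag[index] == 1 means the move is tabu
    best_hw    -- hardware cost of the global best solution
    Returns None if no candidate is admissible.
    """
    best = None
    for cand in candidates:                  # GPU: parallel min-reduction
        idx, _, hw, _ = cand
        # Aspiration: a tabu move is allowed if it beats the global best.
        admissible = tabu_flag[idx] == 0 or hw < best_hw
        if admissible and (best is None or hw < best[2]):
            best = cand
    return best
```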

7. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the result returned to the host side in Step 5 mainly consists of the index, software cost, hardware cost and communication cost of the best candidate neighbor.

8. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the host side updates the tabu list in Step 6 on a first-in, first-out basis using a circular queue, and the tabu object is the neighbor index of the current solution in the previous iteration; when the tabu flags are updated, the flag of the index leaving the queue is set to 0 and the flag of the index entering the queue is set to 1.
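The circular-queue tabu list of claim 8 can be sketched in Python as follows. The class name and the per-index reference count are assumptions: the count is added so that, if the same neighbor index enters the queue twice (possible when aspiration admits a tabu move), its flag is cleared only when the last copy leaves, rather than as soon as the older copy is evicted.

```python
class TabuList:
    """FIFO tabu list on a fixed-size circular buffer, plus 0/1 tabu flags
    indexed by neighbor index."""

    def __init__(self, length, num_neighbors):
        self.buf = [-1] * length             # -1 marks an empty slot
        self.head = 0                        # slot holding the oldest entry
        self.count = [0] * num_neighbors     # copies of each index in buf
        self.flag = [0] * num_neighbors      # 1 = tabu, 0 = not tabu

    def push(self, idx):
        old = self.buf[self.head]
        if old >= 0:                         # evict the oldest entry
            self.count[old] -= 1
            if self.count[old] == 0:
                self.flag[old] = 0           # index leaving the queue -> 0
        self.buf[self.head] = idx
        self.count[idx] += 1
        self.flag[idx] = 1                   # index entering the queue -> 1
        self.head = (self.head + 1) % len(self.buf)
```

Because `push` overwrites one slot and touches at most two flags, each update costs O(1) regardless of the tabu-list length, and only the small `flag` array needs to be sent to the GPU.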

9. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: updating the current solution and the global optimal solution in Step 7 means replacing the current solution with the best neighbor selected in the current iteration, which mainly involves replacing the solution vector of the current solution, the mapping of the solution vector, the hardware cost, the software cost, the communication cost and the index of the current solution; if the cost of the best neighbor is better than the global best hardware cost, the global hardware cost is updated as well, and whenever the global hardware cost is updated, the counter of consecutive non-improving iterations is reset to 0.

10. The GPU-based adaptive neighborhood tabu search method for solving hardware/software partitioning according to claim 1, characterized in that: the stopping criterion in Step 8 mainly means that the procedure stops once the number of executed cycles of Steps 4 to 8 reaches the maximum number of iterations, or the number of consecutive non-improving iterations reaches its limit.