CN103605573A - Reconfigurable architecture mapping decision-making method based on expense calculation

Info

Publication number: CN103605573A (application CN201310576351.2A)
Authority: CN (China)
Prior art keywords: expense, mapped, running node, reconfigurable, processing unit
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN103605573B
Inventors: 齐志, 马璐, 曹鹏, 王学香
Original Assignee: Southeast University
Current Assignee: Shenzhen Pango Microsystems Co., Ltd.
Priority and filing date: 2013-11-18; publication of CN103605573A: 2014-02-26; grant and publication of CN103605573B: 2017-02-08
Application filed by Southeast University; priority to CN201310576351.2A

Abstract

The invention discloses a reconfigurable architecture mapping decision-making method based on cost calculation. The method comprises: establishing four cost models (delay cost, interconnection cost, PE occupancy cost and similarity cost) from the data dependence graph (DDG) representing the kernel loop of the application algorithm and from the reconfigurable architecture; for the multiple feasible mapping schemes of an operation node, calculating the cost value of each feasible mapping scheme under each of the four cost models; and traversing the feasible mapping schemes in order of the four models' influence on the mapping result, from primary to secondary, gradually narrowing the set of feasible mapping schemes to obtain the optimal mapping scheme. The decision-making method ensures that the factors with the greatest influence on the mapping dominate the mapping decision. With these cost models and this mapping decision-making method, highly efficient configuration information can be obtained and the parallelism of the reconfigurable architecture can be fully exploited.

Description

A mapping decision-making method for reconfigurable architectures based on cost calculation
Technical field
The invention belongs to the field of embedded information technology, and in particular relates to a mapping decision-making method for reconfigurable architectures based on cost calculation.
Background technology
The array configuration information used by a reconfigurable architecture must not only guarantee that the application program realizes the correct function on the reconfigurable architecture, but must also take into account the limited hardware resources of the reconfigurable array and the efficiency of execution. How to generate satisfactory and effective configuration information automatically and reliably has always been an unsolved problem.
Mapping the kernel loops of an application program onto a reconfigurable array has been proved to be an NP-complete (Non-deterministic Polynomial complete) problem; see the paper "The NP-completeness column: An ongoing guide" published by D. S. Johnson in the Journal of Algorithms in 1982. Since the configuration information cannot be computed by an algorithm in deterministic polynomial time, only approximate solutions can be obtained through a series of heuristic methods, and the approximate solution is used in place of the optimal solution. A heuristic optimization method requires a series of empirical formulas and decision rules, yet the complexity of reconfigurable architectures makes such heuristic design very difficult.
Existing automatic mapping methods, constrained by abstract cost models, fail to control the following two points: (1) too many routing PEs reduce the available computing resources; (2) large amounts of temporary data cause register spilling. The generated configuration information is therefore often inefficient. To overcome these problems, the data transmission delay, the interconnect resource usage, the functional-unit occupancy, and the closeness between the mapping distance and the dependence distance between operations must all be considered.
Summary of the invention
Technical problem to be solved: in view of the deficiencies of the prior art, the invention provides novel cost models, mapping cost functions and a mapping decision-making method suitable for reconfigurable architectures, which are used to assess the software-pipelining cost of a reconfigurable architecture and to determine the optimal mapping scheme, thereby solving the technical problems in the prior art of establishing an accurate cost model and determining the optimal mapping.
Technical solution: to solve the above technical problems, the present invention adopts the following technical solution:
A mapping decision-making method for a reconfigurable architecture based on cost calculation: model the kernel loop of the application algorithm as a data dependence graph DDG; obtain the current operation node u to be mapped, the set of direct predecessor nodes Pred(u) of u, the set of direct successor nodes Succ(u) of u, and the initiation interval II of software pipelining on the reconfigurable architecture; then perform the following steps in order:
(1) Establish the following four cost models:
Delay cost: the delay for the operands to be transferred to the input ports of the candidate reconfigurable processing element (PE);
Interconnection cost: the number of interconnect resources used to transfer data to the candidate reconfigurable processing element PE;
PE occupancy cost: a measure of the degree of use of each reconfigurable processing element PE in the reconfigurable array;
Similarity cost: a measure of the closeness between the current operation node to be mapped and the operation nodes already mapped onto reconfigurable processing elements PE that have no direct data dependence with it but share the same direct successor operation node;
(2) For the multiple feasible mapping schemes of a given operation node, calculate the cost value of each feasible mapping scheme under each of the four cost models;
The delay cost is computed as:

$$\mathrm{Delaycost}(PE_u) = \max_{v_s' \in V_s'} \Big[ C_w(PE_{v_s'}, PE_u) + C_r(PE_{v_s'}, PE_u) + C_d(PE_{v_s'}, PE_u) \Big]$$

where:
PE_u denotes the candidate reconfigurable processing element onto which the current operation node u to be mapped may be placed;
V_s' denotes the set of direct predecessor operation nodes of u that have already been mapped onto the reconfigurable array;
C_w(PE_{v_s'}, PE_u) denotes the delay introduced by interconnect wires on the data transfer path of the operand required by u, from PE_{v_s'}, the reconfigurable processing element onto which the mapped direct predecessor node v_s' is placed, to the candidate reconfigurable processing element PE_u;
C_r(PE_{v_s'}, PE_u) and C_d(PE_{v_s'}, PE_u) denote, respectively, the delays introduced by routing PEs and by the distributed register file DRF on that data transfer path.
The interconnection cost is computed as:

$$\mathrm{Interconnectcost}(PE_u) = \sum_{v_s' \in V_s'} \mathrm{RoutePE}_{\min}(PE_{v_s'}, PE_u)$$

where:
V_s denotes the set of all operation nodes already mapped onto the reconfigurable array;
V_s' denotes the set of those direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array; obviously V_s' is a subset of V_s;
v_s' ∈ V_s' denotes one direct predecessor or direct successor operation node of u that has been mapped onto the reconfigurable array;
Pred(u) denotes the set of all direct predecessor operation nodes of u;
Succ(u) denotes the set of all direct successor operation nodes of u;
RoutePE_min(PE_{v_s'}, PE_u) denotes the minimum number of routing PEs that need to be inserted between the reconfigurable processing element PE_{v_s'} and the candidate reconfigurable processing element PE_u.
The formula means that if the current operation node u to be mapped has no direct predecessor or direct successor operation node, or none of its direct predecessor and successor nodes has been mapped, then Interconnectcost(PE_u) = 0. Otherwise, the interconnection cost is numerically equal to the minimum number of routing PEs required between the candidate reconfigurable processing element PE_u and the reconfigurable processing elements of the operation nodes, among the direct predecessors and successors of u, that have been mapped onto the reconfigurable array. In particular, when PE_u cannot satisfy the interconnection requirement, Interconnectcost(PE_u) = ∞.
The PE occupancy cost is computed as:

$$\mathrm{Ucost}(PE_u) = \frac{\mathrm{PEOccupationCycles}(PE_u)}{II}$$

where:
PEOccupationCycles(PE_u) denotes the total time the reconfigurable processing element PE_u spends executing the set of operations mapped onto it;
II is the initiation interval of software pipelining on the reconfigurable architecture;
The similarity cost is computed as:

$$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{\min}} \big| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \big|$$

where:
V_min denotes the set of already-mapped operation nodes with the shortest distance to the current operation node u to be mapped;
Vexdist(u, v) denotes the distance between a mapped operation node v in V_min and the current operation node u to be mapped;
PEdist(PE_u, PE_v) denotes the distance between PE_v, the reconfigurable processing element onto which the mapped operation node v in V_min is placed, and the candidate reconfigurable processing element PE_u of u;
(3) For the multiple feasible mapping schemes of the current operation node u to be mapped, traverse the feasible mapping schemes in the order delay cost, interconnection cost, PE occupancy cost and similarity cost, gradually narrowing the set of feasible mapping schemes, and finally obtain the optimal mapping scheme.
The kernel loop of the application algorithm is modeled as a data dependence graph DDG, from which the basic modulo-scheduling parameters of the reconfigurable architecture are extracted; the cost models and their formulas are then established on the basis of these parameters.
Further, in the present invention, the traversal-screening procedure that traverses each feasible mapping scheme, gradually narrows the set of feasible mapping schemes and finally yields the optimal mapping scheme comprises the following four steps performed in order:
(1) Delay cost traversal: sort and screen the feasible mapping schemes by delay cost, retaining those whose delay cost lies within a certain threshold range; this threshold range is tuned according to the actual application program and the concrete reconfigurable architecture, and the tuning method is common knowledge for those skilled in the art;
(2) Interconnection cost traversal: sort and screen the mapping schemes retained by the delay cost traversal by interconnection cost, retaining those whose interconnection cost lies within a certain threshold range; this threshold range is tuned in the same way;
(3) PE occupancy traversal: sort and screen the mapping schemes retained by the interconnection cost traversal by PE occupancy cost, retaining the mapping scheme with the minimum PE occupancy cost;
(4) Similarity cost traversal: sort and screen the mapping schemes retained by the PE occupancy traversal by similarity cost, retaining the mapping scheme with the minimum similarity cost.
The order of the above traversals is determined by the influence of the four costs on the mapping result, from largest to smallest: the delay cost has the greatest influence, followed in turn by the interconnection cost, the PE occupancy cost and the similarity cost; screening step by step in this order yields the best mapping scheme.
Beneficial effects:
After a thorough analysis of the various hardware components of the reconfigurable array, combined with the characteristics of the application programs with real application functions that run on the reconfigurable system, the invention takes the data transmission delay, the interconnect resource usage, the functional-unit occupancy, and the closeness between the mapping distance and the dependence distance between operations as the criteria for selecting the optimal mapping; it establishes reasonable cost models, the corresponding mapping cost functions, and a mapping decision-making method that combines them, so that the mapping cost can be assessed fully and effectively.
The decision-making method screens the candidate mapping schemes in descending order of each cost model's influence on the mapping result, gradually narrowing the screening scope until the optimal mapping is decided; this ensures that the factors with greater influence on the mapping dominate the mapping decision.
Using the cost models and the mapping decision-making method of the invention, configuration information with higher execution efficiency can be obtained, so that the parallelism of the reconfigurable system is fully exploited; compared with existing methods, better automated generation of configuration information is achieved.
Brief description of the drawings
Fig. 1 is a block diagram of the reconfigurable system;
Fig. 2 is a structural diagram of the reconfigurable array;
Fig. 3 is the workflow diagram of the present invention;
Fig. 4 is the flow chart of the traversal-screening procedure of the present invention;
Fig. 5 is a mapping example of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below with reference to the drawings and specific embodiments.
Fig. 1 is a block diagram of the reconfigurable system. The reconfigurable system consists of a main control processor, a system bus, a reconfigurable array, a data flow controller, a configuration controller and a series of storage resources, where the storage resources include configuration registers and global registers.
Fig. 2 shows the structure of a 4 × 4 reconfigurable array. The reconfigurable array consists of the reconfigurable processing elements PE, the storage resources of the array and a programmable interconnection network.
Each reconfigurable processing element PE provides data output ports and data paths to its own input ports, supports a routing mode, and supports a conditional-execution mechanism.
The storage resources in the reconfigurable array for storing data and configuration information include the distributed register file DRF, the output register REG at the output port of each reconfigurable processing element PE, and the local configuration-information register inside each reconfigurable processing element PE.
The programmable interconnection network in the reconfigurable array comprises a data transmission network and a condition-signal transmission network. The data transmission network is used for data transfers between reconfigurable processing elements PE, between distributed register files DRF, and between reconfigurable processing elements PE and distributed register files DRF; the condition-signal transmission network is used to transmit 1-bit condition control signals.
The present invention first builds, from the concrete application program, the data dependence graph DDG representing the kernel loop of the application algorithm and extracts from it the basic modulo-scheduling parameters of the reconfigurable architecture: the set of all operation nodes, including for the current operation node u to be mapped its set of direct predecessor nodes Pred(u) and its set of direct successor nodes Succ(u), and the initiation interval II of software pipelining on the reconfigurable architecture.
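For concreteness, the inputs just described (the DDG with its predecessor and successor sets Pred(u) and Succ(u), the PE array, the current placement, and the initiation interval II) can be held in a few small data structures. The following Python sketch is purely illustrative and is not part of the claimed method; every name in it (OpNode, PEArrayCell, MappingState, and the tiny example DDG) is an assumption introduced here for the later sketches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]   # (row, column) of a PE in the reconfigurable array


@dataclass
class OpNode:
    """An operation node of the kernel-loop data dependence graph (DDG)."""
    name: str
    pred: Set[str] = field(default_factory=set)   # Pred(u): direct predecessor nodes
    succ: Set[str] = field(default_factory=set)   # Succ(u): direct successor nodes


@dataclass
class PEArrayCell:
    """A reconfigurable processing element (PE) of the array."""
    coord: Coord
    ops: List[str] = field(default_factory=list)  # operation nodes already mapped here


@dataclass
class MappingState:
    """The partial-mapping information used by the cost models."""
    ddg: Dict[str, OpNode]           # all operation nodes of the kernel loop
    array: Dict[Coord, PEArrayCell]  # the reconfigurable array, e.g. 4 x 4 as in Fig. 2
    placement: Dict[str, Coord]      # mapped operation node -> coordinates of its PE
    ii: int                          # initiation interval II of the software pipeline


# A tiny hypothetical instance: two already-mapped producers feeding one unmapped consumer.
ddg = {
    "a": OpNode("a", succ={"c"}),
    "b": OpNode("b", succ={"c"}),
    "c": OpNode("c", pred={"a", "b"}),
}
state = MappingState(
    ddg=ddg,
    array={(r, q): PEArrayCell((r, q)) for r in range(1, 5) for q in range(1, 5)},
    placement={"a": (1, 1), "b": (2, 1)},
    ii=1,
)
```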
The following steps are then carried out; the workflow is shown in Fig. 3 and Fig. 4:
Step 1: establish the cost models for evaluating a mapping scheme from four aspects: the data transmission delay, the number of interconnect resources used, the occupation ratio of the reconfigurable processing elements PE, and the degree of match between distances in the data dependence graph DDG and distances between reconfigurable processing elements PE in the reconfigurable array. The four models are the delay cost, the interconnection cost, the PE occupancy cost and the similarity cost.
Delay cost: the delay for the operands required by an operation to be transferred to the input ports of the reconfigurable processing element PE that executes it. For different data transfer paths, the different hardware components on the path cause different transmission delays; the larger the delay, the worse the performance of the path.
Interconnection cost: the number of interconnect resources used to transfer data to the target processing element. The higher the interconnection cost, the more interconnect resources are wasted, the fewer interconnect resources remain for mapping subsequent operation nodes, and the more difficult later mapping becomes.
PE occupancy cost: a measure of the degree of use of each reconfigurable processing element PE in the array. If the usage of the PEs differs greatly in the mapping result, for example if some PEs execute noticeably more operations than others, then the larger number of operations requires more reconfigurations, the final configuration file is larger, and the total execution time of the reconfigurable array becomes longer.
Similarity cost: used mainly for the case where the current operation node to be mapped and an operation node already mapped onto a reconfigurable processing element PE have no direct data dependence but share the same direct successor operation node; the similarity cost measures their closeness.
Step 2: for the multiple feasible mapping schemes of a given operation node, calculate the four cost values of each feasible mapping scheme under the four cost models.
Step 3: the influence of the four cost models on the mapping result, from primary to secondary, is delay cost, interconnection cost, PE occupancy cost and similarity cost. Traverse the feasible mapping schemes in this order, gradually narrowing the set of feasible mapping schemes, and finally obtain the optimal mapping scheme.
The traversal procedure of the present invention is as follows (a code sketch of this staged screening is given after the list):
Delay cost traversal: sort and screen the feasible mapping schemes by delay cost, retaining those whose delay cost lies within a certain threshold range; this threshold range is tuned according to the actual application program and the concrete reconfigurable architecture, and the tuning method is common knowledge for those skilled in the art;
Interconnection cost traversal: sort and screen the mapping schemes retained by the delay cost traversal by interconnection cost, retaining those whose interconnection cost lies within a certain threshold range; this threshold range is tuned in the same way;
PE occupancy cost traversal: sort and screen the mapping schemes retained by the interconnection cost traversal by PE occupancy cost, retaining the mapping scheme with the minimum PE occupancy cost;
Similarity cost traversal: sort and screen the mapping schemes retained by the PE occupancy cost traversal by similarity cost, retaining the mapping scheme with the minimum similarity cost.
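The four-stage screening described above can be written as a short filtering pipeline. The sketch below is a minimal illustration under assumptions of this sketch only: each candidate PE is paired with its four pre-computed cost values, the loosely specified "threshold range" of the first two stages is interpreted as a margin above the best cost observed in that stage, and candidates whose interconnection cost is infinite (the PE cannot satisfy the interconnection requirement) are discarded up front.

```python
import math
from typing import Dict

Candidate = str               # e.g. the name of a candidate PE such as "PE_33"
Costs = Dict[str, float]      # keys: "delay", "interconnect", "occupancy", "similarity"


def decide_mapping(candidates: Dict[Candidate, Costs],
                   delay_margin: float = 0.0,
                   interconnect_margin: float = 0.0) -> Candidate:
    """Stepwise screening in the order delay -> interconnect -> occupancy -> similarity."""
    kept = [c for c, k in candidates.items() if k["interconnect"] != math.inf]
    if not kept:
        raise ValueError("no candidate PE satisfies the interconnection requirement")

    # Stage 1: keep candidates whose delay cost lies within the threshold range.
    best = min(candidates[c]["delay"] for c in kept)
    kept = [c for c in kept if candidates[c]["delay"] <= best + delay_margin]

    # Stage 2: keep candidates whose interconnection cost lies within the threshold range.
    best = min(candidates[c]["interconnect"] for c in kept)
    kept = [c for c in kept if candidates[c]["interconnect"] <= best + interconnect_margin]

    # Stage 3: keep the candidates with the minimum PE occupancy cost.
    best = min(candidates[c]["occupancy"] for c in kept)
    kept = [c for c in kept if candidates[c]["occupancy"] == best]

    # Stage 4: choose the candidate with the minimum similarity cost.
    return min(kept, key=lambda c: candidates[c]["similarity"])


# Hypothetical usage with three candidate PEs and made-up cost values:
costs = {
    "PE_a": {"delay": 2, "interconnect": 1, "occupancy": 1, "similarity": 0},
    "PE_b": {"delay": 1, "interconnect": 2, "occupancy": 1, "similarity": 1},
    "PE_c": {"delay": 1, "interconnect": 1, "occupancy": 2, "similarity": 0},
}
print(decide_mapping(costs))  # -> "PE_c"
```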
The four cost models are described in detail below.
(1) Delay cost
When the current operation node u to be mapped is considered for mapping onto a candidate reconfigurable processing element PE_u, the delay cost represents the routing delay for the operands needed to execute u to be transferred to the input ports of PE_u. On a data transfer path of the reconfigurable array there are three kinds of routing components: interconnect wires, routing PEs and the distributed register file DRF. The total delay cost is the sum of the delays of these three kinds of routing components, and is computed as:

$$\mathrm{Delaycost}(PE_u) = \max_{v_s' \in V_s'} \Big[ C_w(PE_{v_s'}, PE_u) + C_r(PE_{v_s'}, PE_u) + C_d(PE_{v_s'}, PE_u) \Big]$$

where:
PE_u denotes the candidate reconfigurable processing element onto which the current operation node u to be mapped may be placed;
V_s' denotes the set of direct predecessor operation nodes of u that have already been mapped onto the reconfigurable array;
C_w(PE_{v_s'}, PE_u) denotes the delay introduced by interconnect wires on the data transfer path of the operand required by u, from PE_{v_s'}, the reconfigurable processing element onto which the mapped direct predecessor node v_s' is placed, to the candidate reconfigurable processing element PE_u;
C_r(PE_{v_s'}, PE_u) and C_d(PE_{v_s'}, PE_u) denote, respectively, the delays introduced by routing PEs and by the distributed register file DRF on that data transfer path.
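A direct transcription of the delay-cost formula is sketched below. The per-component path-delay functions C_w, C_r and C_d depend on the concrete interconnect, so they are passed in as callables; their names, signatures and the Manhattan-distance wire model in the usage lines are assumptions of this sketch, not part of the patent. Returning 0 when V_s' is empty follows the Fig. 5 example discussed later.

```python
from typing import Callable, Iterable, Tuple

Coord = Tuple[int, int]                      # coordinates of a PE in the array
PathDelay = Callable[[Coord, Coord], float]  # delay contribution between two PEs


def delay_cost(pe_u: Coord,
               mapped_pred_pes: Iterable[Coord],
               c_w: PathDelay, c_r: PathDelay, c_d: PathDelay) -> float:
    """Delaycost(PE_u): the maximum, over the PEs of the already-mapped direct
    predecessors of u (the set V_s'), of C_w + C_r + C_d on the transfer path
    from PE_{v_s'} to the candidate PE_u; 0 if u has no mapped predecessor."""
    mapped_pred_pes = list(mapped_pred_pes)
    if not mapped_pred_pes:
        return 0.0
    return max(c_w(p, pe_u) + c_r(p, pe_u) + c_d(p, pe_u) for p in mapped_pred_pes)


# Hypothetical usage: one delay unit per hop of Manhattan distance for the wires,
# and no routing-PE or DRF delay on the paths.
manhattan = lambda a, b: float(abs(a[0] - b[0]) + abs(a[1] - b[1]))
zero = lambda a, b: 0.0
print(delay_cost((3, 3), [(2, 2), (3, 2)], manhattan, zero, zero))  # -> 2.0
```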
(2) Interconnection cost
The interconnection cost represents the number of interconnect resources used to transfer data to the target processing element. Excessive interconnect resources not only increase the communication delay between operation nodes but also waste resources, which to a large extent restricts the mapping of later operation nodes and may even leave no free resources for later operations. The higher the interconnection cost, the more interconnect resources are wasted, the fewer interconnect resources remain for scheduling subsequent operation nodes, and the more difficult later scheduling becomes. Therefore, when choosing which reconfigurable processing element PE a given operation node u should be mapped onto, priority should be given to the PE with the minimum interconnection cost to the reconfigurable processing elements that hold the direct predecessor or direct successor operation nodes of u. The interconnection cost is computed as:

$$\mathrm{Interconnectcost}(PE_u) = \sum_{v_s' \in V_s'} \mathrm{RoutePE}_{\min}(PE_{v_s'}, PE_u)$$

where:
V_s denotes the set of all operation nodes already mapped onto the reconfigurable array;
V_s' denotes the set of those direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array; obviously V_s' is a subset of V_s;
v_s' ∈ V_s' denotes one direct predecessor or direct successor operation node of u that has been mapped onto the reconfigurable array;
Pred(u) denotes the set of all direct predecessor operation nodes of u;
Succ(u) denotes the set of all direct successor operation nodes of u;
RoutePE_min(PE_{v_s'}, PE_u) denotes the minimum number of routing PEs that need to be inserted between the reconfigurable processing element PE_{v_s'} and the candidate reconfigurable processing element PE_u.
The formula means that if u has no direct predecessor or direct successor operation node, or none of its direct predecessor and successor nodes has been mapped, then Interconnectcost(PE_u) = 0. Otherwise, the interconnection cost is numerically equal to the minimum number of routing PEs required between the candidate reconfigurable processing element PE_u and the reconfigurable processing elements of the operation nodes, among the direct predecessors and successors of u, that have been mapped onto the reconfigurable array. In particular, when PE_u cannot satisfy the interconnection requirement, Interconnectcost(PE_u) = ∞.
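The interconnection cost can be sketched as follows. The minimum number of routing PEs between two positions is really determined by the routing layer of the mapper, so it is passed in as a callable; the Manhattan-based estimate used in the usage example (adjacent PEs need no routing PE, each further hop needs one more) is an assumption of this sketch and not part of the patent.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

Coord = Tuple[int, int]
# Minimum number of routing PEs to insert between two PEs, or None if the
# interconnect cannot realise the connection at all.
RoutePEs = Callable[[Coord, Coord], Optional[int]]


def interconnect_cost(pe_u: Coord,
                      mapped_neighbour_pes: Iterable[Coord],
                      min_route_pes: RoutePEs) -> float:
    """Interconnectcost(PE_u): 0 if u has no mapped direct predecessor or successor;
    otherwise the number of routing PEs needed between PE_u and the PEs of those
    mapped nodes; infinity if PE_u cannot meet an interconnection requirement."""
    total = 0.0
    for pe_v in mapped_neighbour_pes:      # PEs of mapped nodes in Pred(u) and Succ(u)
        routed = min_route_pes(pe_v, pe_u)
        if routed is None:                 # the connection cannot be realised at all
            return math.inf
        total += routed
    return total


# Illustrative routing model (an assumption): adjacent PEs need no routing PE,
# each additional hop of Manhattan distance needs one more.
def manhattan_route_pes(src: Coord, dst: Coord) -> Optional[int]:
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return max(hops - 1, 0)


print(interconnect_cost((2, 3), [(2, 2), (3, 2)], manhattan_route_pes))  # -> 1.0
```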
(3) PE occupancy cost
The PE occupancy cost measures the degree of use of each reconfigurable processing element PE in the reconfigurable array. If the usage of the PEs differs greatly in the mapping result, i.e. some PEs execute noticeably more operations than others, then the larger number of operations requires more reconfigurations, the final configuration file is larger, and the total execution time of the reconfigurable array becomes longer. The usage of the reconfigurable processing elements PE therefore needs to be balanced so that the occupancy of the individual PEs differs as little as possible; the PE occupancy cost is the parameter used to examine the usage of the PEs during mapping. It is computed as:

$$\mathrm{Ucost}(PE_u) = \frac{\mathrm{PEOccupationCycles}(PE_u)}{II}$$

where:
PEOccupationCycles(PE_u) denotes the total time the reconfigurable processing element PE_u spends executing the set of operations mapped onto it;
II is the initiation interval of software pipelining on the reconfigurable architecture.
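The PE occupancy cost is a simple ratio. In the sketch below the occupation time of a PE is taken to be the sum of the latencies of the operations mapped onto it; the latency table, like the one-cycle latency used in the usage line, is an assumption of this sketch.

```python
from typing import Dict, Iterable


def pe_occupation_cycles(ops_on_pe: Iterable[str], latency: Dict[str, int]) -> int:
    """Total time the PE spends executing the operations mapped onto it."""
    return sum(latency[op] for op in ops_on_pe)


def occupancy_cost(ops_on_pe: Iterable[str], latency: Dict[str, int], ii: int) -> float:
    """Ucost(PE_u) = PEOccupationCycles(PE_u) / II."""
    return pe_occupation_cycles(ops_on_pe, latency) / ii


# In the spirit of the Fig. 5 example described below: II = 1 and the candidate PE
# would execute only OP2 (one cycle assumed), giving Ucost = 1.
print(occupancy_cost(["OP2"], {"OP2": 1}, ii=1))  # -> 1.0
```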
(4) Similarity cost
The interconnection cost and the delay cost both apply to operation nodes with direct dependences, which should be mapped onto reconfigurable processing elements PE that are as close to each other as possible; they do not cover operation nodes that have no direct dependence but share the same direct successor operation node. For a current operation node u to be mapped that has no direct predecessor or direct successor operation node, the interconnection cost and the delay cost are 0 for every reconfigurable processing element PE, so the mapping cost cannot be assessed effectively. The similarity cost addresses mainly the case where u and an operation node v already mapped onto a reconfigurable processing element PE have no direct data dependence but share the same direct successor operation node; the similarity measures their closeness and is used to select the mapping that minimizes the cost of transferring data between them. The similarity cost is computed as:

$$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{\min}} \big| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \big|$$

where:
V_min denotes the set of already-mapped operation nodes with the shortest distance to the operation node u to be mapped;
Vexdist(u, v) denotes the distance between a mapped operation node v in V_min and the operation node u to be mapped;
PEdist(PE_u, PE_v) denotes the distance between PE_v, the reconfigurable processing element onto which the mapped operation node v in V_min is placed, and the candidate reconfigurable processing element PE_u of u.
The similarity cost formula shows that the larger the gap between Vexdist and PEdist, the larger the similarity cost paid for mapping u onto the candidate reconfigurable processing element PE_u.
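The similarity cost compares a distance in the DDG with a distance in the array. In the sketch below, Vexdist values are assumed to be pre-computed DDG distances and PEdist is taken to be the Manhattan distance between PEs; both distance metrics, and the small data in the usage lines, are assumptions of this sketch rather than definitions from the patent.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]


def pe_dist(a: Coord, b: Coord) -> int:
    """Assumed array distance between two PEs (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def similarity_cost(pe_u: Coord,
                    v_min: Iterable[str],
                    vexdist: Dict[str, int],
                    placement: Dict[str, Coord]) -> int:
    """Relcost(PE_u) = sum over v in V_min of |Vexdist(u, v) - PEdist(PE_u, PE_v)|.

    v_min holds the already-mapped nodes closest to u in the DDG, vexdist[v] their
    DDG distance to u, and placement[v] the PE each of them is mapped onto."""
    return sum(abs(vexdist[v] - pe_dist(pe_u, placement[v])) for v in v_min)


# Hypothetical data: one closest mapped node v at DDG distance 1 from u, placed on
# PE (3, 2).  A candidate one hop away pays cost 0; a candidate two hops away pays 1.
placement = {"v": (3, 2)}
print(similarity_cost((3, 3), ["v"], {"v": 1}, placement))  # -> 0
print(similarity_cost((2, 3), ["v"], {"v": 1}, placement))  # -> 1
```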
As an embodiment of the invention, the computation of the cost models involved in the invention is illustrated with the mapping example in Fig. 5. Part (a) of Fig. 5 shows the DDG of the algorithm to be mapped, i.e. the kernel loop of a certain application program, and part (b) shows the corresponding reconfigurable array composed of PEs. Suppose the current operation node to be mapped is OP2, the reconfigurable processing elements PE_23 and PE_33 are its candidate PEs, the operation nodes OP1, OP3 and OP5 have already been mapped onto PE_22, PE_32 and PE_42 respectively, and OP4, the direct successor of OP2, has not yet been mapped.
According to the method of the invention, the four cost models described above (delay cost, interconnection cost, PE occupancy cost and similarity cost) are established for the operation node OP2 to be mapped, and the four cost values are calculated with the formula of each cost model.
(1) Delay cost: since the current operation node OP2 to be mapped has no direct predecessor operation node, C_w, C_r and C_d are all 0. Therefore, for OP2, the delay cost of both candidate reconfigurable processing elements PE_23 and PE_33 is 0.
(2) Interconnection cost: since OP2 has no direct predecessor operation node and its direct successor OP4 has not yet been mapped, the interconnection cost of both candidates PE_23 and PE_33 is 0.
(3) PE occupancy cost: since the initiation interval II is 1 and no other operations are executed on the candidates PE_23 and PE_33, Ucost(PE_23) = Ucost(PE_33) = 1/1 = 1.
(4) Similarity cost: if OP2 is mapped onto PE_33, the next operation node OP4 can be mapped onto PE_43, so that Vexdist(u, v) = 1 and PEdist(PE_u, PE_v) = 1; the similarity cost is then 0, the minimum number of routing PEs needed is 0, and no extra routing-PE cost is incurred. If instead OP2 is mapped onto PE_23, the next operation node OP4 would be mapped onto PE_33 or PE_43; PEdist(PE_u, PE_v) then equals 2, the similarity cost is 1, and in either case one reconfigurable processing element PE must be used as a routing PE to transfer the data. It follows that mapping OP2 onto PE_23, compared with mapping it onto PE_33, not only increases the data transmission delay but also wastes computing resources.
Therefore, the operation node OP2 is finally mapped onto the reconfigurable processing element PE_33.
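Collecting the four cost values computed above for the two candidates of OP2, the staged screening reduces to a tie-break on the similarity cost. The short, self-contained sketch below reproduces that decision; the candidate names and the dictionary layout are assumptions introduced for this illustration.

```python
# Cost values of the Fig. 5 example for the two candidate PEs of OP2.
costs = {
    "PE_23": {"delay": 0, "interconnect": 0, "occupancy": 1, "similarity": 1},
    "PE_33": {"delay": 0, "interconnect": 0, "occupancy": 1, "similarity": 0},
}

# Screen in the order delay -> interconnect -> occupancy -> similarity.
# Delay, interconnection and occupancy costs tie, so the similarity cost decides.
kept = list(costs)
for key in ("delay", "interconnect", "occupancy", "similarity"):
    best = min(costs[c][key] for c in kept)
    kept = [c for c in kept if costs[c][key] == best]  # exact-tie screening suffices here
print(kept)  # -> ['PE_33']
```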
To demonstrate the feasibility and the advantage of the method of the invention more convincingly, the mapping results obtained on the reconfigurable architecture with the method of the invention and with previous methods are compared for the kernels of several typical application programs; the instructions-per-cycle (IPC) results are listed in Table 1:
Table 1 (provided as an image in the original publication: the IPC of each test kernel under each of the four individual cost functions and under the combined decision-making method of the invention)
The IPC values listed in Table 1 directly reflect the degree of parallelism of loop execution: the larger the IPC, the more operations are executed in parallel in the same cycle and the higher the parallelism of the loop. The table compares the IPC produced by the configuration information obtained with each of the four cost functions used alone (data transmission delay, interconnect resource usage, functional-unit occupancy, and the closeness between mapping distance and operation dependence) with the IPC produced by the configuration information obtained with the decision-making method of the invention, which combines all four cost functions. For all test programs, the decision-making method of the invention achieves the highest IPC; the configuration information produced after the optimized decision therefore gives the reconfigurable system better parallelism and thus higher execution efficiency.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (2)

1. A mapping decision-making method for a reconfigurable architecture based on cost calculation, characterized in that: the kernel loop of the application program is modeled as a data dependence graph DDG; the current operation node u to be mapped, the set of direct predecessor nodes Pred(u) of u, the set of direct successor nodes Succ(u) of u, and the initiation interval II of software pipelining on the reconfigurable architecture are obtained; and the following steps are then performed in order:
(1) Establish the following four cost models:
Delay cost: the delay for the operands to be transferred to the input ports of the candidate reconfigurable processing element (PE);
Interconnection cost: the number of interconnect resources used to transfer data to the candidate reconfigurable processing element PE;
PE occupancy cost: a measure of the degree of use of each reconfigurable processing element PE in the reconfigurable array;
Similarity cost: a measure of the closeness between the current operation node to be mapped and the operation nodes already mapped onto reconfigurable processing elements PE that have no direct data dependence with it but share the same direct successor operation node;
(2) For the multiple feasible mapping schemes of a given operation node, calculate the cost value of each feasible mapping scheme under each of the four cost models;
The delay cost is computed as:

$$\mathrm{Delaycost}(PE_u) = \max_{v_s' \in V_s'} \Big[ C_w(PE_{v_s'}, PE_u) + C_r(PE_{v_s'}, PE_u) + C_d(PE_{v_s'}, PE_u) \Big]$$

where:
PE_u denotes the candidate reconfigurable processing element onto which the current operation node u to be mapped may be placed;
V_s' denotes the set of direct predecessor operation nodes of u that have already been mapped onto the reconfigurable array;
C_w(PE_{v_s'}, PE_u) denotes the delay introduced by interconnect wires on the data transfer path of the operand required by u, from PE_{v_s'}, the reconfigurable processing element onto which the mapped direct predecessor node v_s' is placed, to the candidate reconfigurable processing element PE_u;
C_r(PE_{v_s'}, PE_u) and C_d(PE_{v_s'}, PE_u) denote, respectively, the delays introduced by routing PEs and by the distributed register file DRF on that data transfer path;
The interconnection cost is computed as:

$$\mathrm{Interconnectcost}(PE_u) = \sum_{v_s' \in V_s'} \mathrm{RoutePE}_{\min}(PE_{v_s'}, PE_u)$$

where:
V_s denotes the set of all operation nodes already mapped onto the reconfigurable array;
V_s' denotes the set of those direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array;
Pred(u) denotes the set of all direct predecessor operation nodes of u;
Succ(u) denotes the set of all direct successor operation nodes of u;
RoutePE_min(PE_{v_s'}, PE_u) denotes the minimum number of routing PEs that need to be inserted between the reconfigurable processing element PE_{v_s'} and the candidate reconfigurable processing element PE_u;
The PE occupancy cost is computed as:

$$\mathrm{Ucost}(PE_u) = \frac{\mathrm{PEOccupationCycles}(PE_u)}{II}$$

where:
PEOccupationCycles(PE_u) denotes the total time the reconfigurable processing element PE_u spends executing the set of operations mapped onto it;
II is the initiation interval of software pipelining on the reconfigurable architecture;
The similarity cost is computed as:

$$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{\min}} \big| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \big|$$

where:
V_min denotes the set of already-mapped operation nodes with the shortest distance to the current operation node u to be mapped;
Vexdist(u, v) denotes the distance between a mapped operation node v in V_min and the current operation node u to be mapped;
PEdist(PE_u, PE_v) denotes the distance between PE_v, the reconfigurable processing element onto which the mapped operation node v in V_min is placed, and the candidate reconfigurable processing element PE_u of u;
(3) For the multiple feasible mapping schemes of the current operation node u to be mapped, traverse the feasible mapping schemes in the order delay cost, interconnection cost, PE occupancy cost and similarity cost, gradually narrowing the set of feasible mapping schemes, and finally obtain the optimal mapping scheme.
2. The mapping decision-making method for a reconfigurable architecture based on cost calculation according to claim 1, characterized in that the traversal-screening procedure that traverses each feasible mapping scheme, gradually narrows the set of feasible mapping schemes and finally yields the optimal mapping scheme comprises the following four steps performed in order:
(1) Delay cost traversal: sort and screen the feasible mapping schemes by delay cost, retaining those whose delay cost lies within a certain threshold range; this threshold range is tuned according to the concrete reconfigurable architecture in the practical application;
(2) Interconnection cost traversal: sort and screen the mapping schemes retained by the delay cost traversal by interconnection cost, retaining those whose interconnection cost lies within a certain threshold range; this threshold range is tuned according to the concrete reconfigurable architecture in the practical application;
(3) PE occupancy traversal: sort and screen the mapping schemes retained by the interconnection cost traversal by PE occupancy cost, retaining the mapping scheme with the minimum PE occupancy cost;
(4) Similarity cost traversal: sort and screen the mapping schemes retained by the PE occupancy traversal by similarity cost, retaining the mapping scheme with the minimum similarity cost.
CN201310576351.2A 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation Active CN103605573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310576351.2A CN103605573B (en) 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation


Publications (2)

Publication Number Publication Date
CN103605573A 2014-02-26
CN103605573B (en) 2017-02-08

Family

ID=50123802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310576351.2A Active CN103605573B (en) 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation

Country Status (1)

Country Link
CN (1) CN103605573B (en)


Also Published As

Publication number Publication date
CN103605573B (en) 2017-02-08


Legal Events

C06 / PB01: Publication
SE01: Entry into force of request for substantive examination
C41 / TA01: Transfer of patent application right (effective date of registration: 2015-12-11); applicant after: SHENZHEN PANGO MICROSYSTEMS CO., LTD. (16th floor, Howare Technology Building, Keji South 8th Road, Hi-Tech Industrial Park, Nanshan District, Shenzhen, Guangdong 518000); applicant before: Southeast University (No. 2 Southeast University Road, Jiangning District, Nanjing, Jiangsu 211189)
CB02 / COR: Change of applicant information (change of bibliographic data)
C14 / GR01: Grant of patent or utility model; patent granted
GR01 Patent grant