CN103605573B - Reconfigurable architecture mapping decision-making method based on expense calculation - Google Patents

Reconfigurable architecture mapping decision-making method based on expense calculation

Info

Publication number
CN103605573B
CN103605573B CN201310576351.2A CN201310576351A
Authority
CN
China
Prior art keywords
mapped
overhead
reconfigurable
operation node
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310576351.2A
Other languages
Chinese (zh)
Other versions
CN103605573A (en)
Inventor
齐志
马璐
曹鹏
王学香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pango Microsystems Co Ltd
Original Assignee
Shenzhen Pango Microsystems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pango Microsystems Co Ltd filed Critical Shenzhen Pango Microsystems Co Ltd
Priority to CN201310576351.2A priority Critical patent/CN103605573B/en
Publication of CN103605573A publication Critical patent/CN103605573A/en
Application granted granted Critical
Publication of CN103605573B publication Critical patent/CN103605573B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a reconfigurable architecture mapping decision method based on overhead calculation. The method includes: building four overhead models (delay overhead, interconnection overhead, PE occupancy overhead and proximity overhead) from the data dependency graph (DDG) representing the core loop of the application algorithm and from the reconfigurable architecture; for the multiple feasible mapping schemes of an operation node, calculating the overhead value of each feasible mapping scheme under the four overhead models; and traversing the feasible mapping schemes in the order of the four models' influence on the mapping result, from primary to secondary, gradually narrowing the set of feasible schemes to obtain the optimal mapping scheme. With this decision method, the factors that influence the mapping most strongly dominate the mapping decision. Using the overhead models and the mapping decision method, configuration information with high execution efficiency can be obtained, and the parallelism of the reconfigurable architecture can be fully exploited.

Description

Reconfigurable architecture mapping decision method based on overhead calculation
Technical Field
The invention belongs to the technical field of embedded information, and particularly relates to a reconfigurable architecture-oriented mapping decision method based on overhead calculation.
Background
The array configuration information used by a reconfigurable architecture must not only ensure that the application program performs its functions correctly on the reconfigurable architecture, but also respect the limited hardware resources of the reconfigurable array and execute efficiently. How to generate configuration information that meets these requirements stably, effectively and automatically has long been an open problem.
The problem of mapping the core loops of an application onto a reconfigurable array has been shown to be NP-complete (Johnson D.S., "The NP-completeness column: An ongoing guide", Journal of Algorithms, 1982). Since the configuration information cannot be computed by a deterministic polynomial-time algorithm, only an approximate solution can be obtained through a series of heuristic optimization methods, and this approximate solution is used in place of the optimal solution. Heuristic optimization requires a series of empirical formulas and decision rules, and the complexity of the reconfigurable architecture further increases the difficulty of designing such heuristics.
Existing automatic mapping methods lack abstract overhead models that can constrain the following two issues: (1) excessive use of routing PEs reduces the available computational resources; (2) register overflow caused by large amounts of temporary data often leads to inefficient configuration information. To address these problems, the data transfer delay, the amount of interconnection resources used, the occupancy of the functional units, and how closely the mapping distance matches the dependence between operations must be considered together.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the defects of the prior art, the invention provides a novel overhead model, a mapping overhead function and a mapping decision method suitable for a reconfigurable architecture, which are used to evaluate the software pipelining overhead of the reconfigurable architecture and to select the optimal mapping scheme, thereby overcoming the difficulties of establishing an accurate overhead model and determining the optimal mapping in the prior art.
The technical scheme is as follows: in order to solve the technical problems, the invention adopts the following technical scheme:
A reconfigurable architecture mapping decision method based on overhead calculation first establishes a data dependency graph DDG representing the core loop of the application algorithm, obtaining the current operation node u to be mapped, the direct predecessor nodes Pred(u) of u, the direct successor nodes Succ(u) of u, and the initiation interval II of the software pipeline in the reconfigurable architecture, and then performs the following steps in sequence:
(1) establishing the following 4 overhead models which are respectively:
Delay overhead: represents the delay of transferring operands to the input port of the candidate reconfigurable processing unit PE;
Interconnection overhead: represents the number of interconnection resources used to deliver data to the candidate reconfigurable processing unit PE;
PE occupancy overhead: measures the degree to which each reconfigurable processing unit PE in the reconfigurable array is used;
Proximity overhead: measures how close the current operation node to be mapped is to operation nodes already mapped onto reconfigurable processing units PE that have no direct data dependence on it but share the same direct successor operation node;
(2) for the plurality of feasible mapping schemes of a certain operation node, calculating the overhead value of each feasible mapping scheme under the 4 overhead models;
the calculation formula of the delay overhead is as follows:
$\mathrm{Delaycost}(PE_u) = \max_{v_{s'} \in V_{S'}} \left[ C_w(PE_{v_{s'}}, PE_u) + C_r(PE_{v_{s'}}, PE_u) + C_d(PE_{v_{s'}}, PE_u) \right]$
wherein,
PE_u: a candidate reconfigurable processing unit onto which the current operation node u to be mapped may be placed;
V_S': the set of all direct predecessor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array;
C_w(PE_vs', PE_u): the delay introduced by interconnection wires on the data transmission path from the reconfigurable processing unit PE_vs', onto which the already-mapped direct predecessor node v_s' supplying an operand of u is mapped, to the current candidate reconfigurable processing unit PE_u;
C_r(PE_vs', PE_u) and C_d(PE_vs', PE_u): the delays introduced on that data transmission path by routing PEs and by the distributed register DRF, respectively;
the calculation formula of the interconnection overhead is as follows:
wherein,
V_s: the set of all operation nodes that have already been mapped onto the reconfigurable array;
V_s'': the set of direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array; clearly V_s'' is a subset of V_s;
v_s'' ∈ V_s'': a direct predecessor or direct successor operation node of the current operation node u to be mapped that has been mapped onto the reconfigurable array;
Pred(u): the set of all direct predecessor operation nodes of the current operation node u to be mapped;
Succ(u): the set of all direct successor operation nodes of the current operation node u to be mapped;
the term involving the reconfigurable processing unit PE_u and PE_vs'' denotes the minimum number of routing PEs that must be inserted between the two processing units;
The above formula means that if the current operation node u to be mapped has no direct predecessor and no direct successor operation nodes, or none of them has been mapped yet, then Interconnectcost(PE_u) = 0. Otherwise, the interconnection overhead equals the minimum number of routing PEs required between the current candidate reconfigurable processing unit PE_u and the reconfigurable processing units PE_vs'' onto which the already-mapped direct predecessor and direct successor operation nodes of u are placed. In particular, when PE_u cannot satisfy the interconnection requirement, Interconnectcost(PE_u) = ∞.
The calculation formula of the PE occupancy rate overhead is as follows:
$\mathrm{Ucost}(PE_u) = \dfrac{\mathrm{PEOccupationCycles}(PE_u)}{II}$
wherein,
PEOccupationCycles(PE_u): the total time for the current candidate reconfigurable processing unit PE_u to execute the set of operations mapped onto it;
II: the initiation interval of the software pipeline in the reconfigurable architecture;
the calculation formula of the proximity overhead is as follows:
$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{min}} \left| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \right|$
wherein,
V_min: the set of all already-mapped operation nodes v whose distance to the current operation node u to be mapped is the shortest;
Vexdist(u, v): the distance between the already-mapped operation node v in V_min and the current operation node u to be mapped;
PEdist(PE_u, PE_v): the distance between the reconfigurable processing unit PE_v onto which the already-mapped operation node v in V_min is mapped and the candidate reconfigurable processing unit PE_u of the current operation node u to be mapped;
(3) traversing the feasible mapping schemes of the current operation node u to be mapped in the order delay overhead, interconnection overhead, PE occupancy overhead, proximity overhead, gradually narrowing the set of feasible mapping schemes and finally obtaining the optimal mapping scheme.
First, a data dependency graph DDG representing the core loop of the application algorithm is established and the basic modulo-scheduling parameters of the reconfigurable architecture are analyzed; the overhead models and the calculation formula of each overhead model are then built on the basis of these parameters.
Further, in the present invention, the traversal and screening process of traversing each feasible mapping scheme, gradually narrowing the set of feasible mapping schemes, and finally obtaining the optimal mapping scheme comprises the following 4 steps, performed in sequence:
(1) traverse the delay overhead: sort and screen the feasible mapping schemes according to the delay overhead, and keep the mapping schemes whose delay overhead lies within a certain threshold range; the threshold range is adjusted to the actual application program and the specific reconfigurable architecture, and the adjustment method is common knowledge for those skilled in the art;
(2) traverse the interconnection overhead: sort and screen the mapping schemes retained after the delay-overhead screening according to the interconnection overhead, and keep the mapping schemes whose interconnection overhead lies within a certain threshold range; this threshold range is likewise adjusted to the actual application program and the specific reconfigurable architecture;
(3) traverse the PE occupancy overhead: sort and screen the mapping schemes retained after the interconnection-overhead screening according to the PE occupancy overhead, and keep the mapping scheme with the minimum PE occupancy overhead;
(4) traverse the proximity overhead: sort and screen the mapping schemes retained after the PE-occupancy screening according to the proximity overhead, and keep the mapping scheme with the minimum proximity overhead.
The traversal order is determined by the degree to which the 4 overheads influence the mapping result, from largest to smallest: delay overhead has the greatest influence, followed by interconnection overhead, PE occupancy overhead and proximity overhead; screening step by step in this order yields the optimal mapping scheme.
Advantageous effects:
After a thorough analysis of the hardware components of the reconfigurable array, and taking into account the characteristics of real application programs running on the reconfigurable system, the data transmission delay, the interconnection resource usage, the occupancy of the functional units, and how closely the mapping distance matches the dependence between operations are taken as the criteria for selecting the optimal mapping. On this basis a reasonable overhead model, the corresponding mapping overhead functions, and a mapping decision method combining the two are established, so that the mapping overhead can be evaluated comprehensively and effectively.
The decision method screens the candidate mapping schemes in order of the influence of each overhead model on the mapping result, from largest to smallest, gradually narrows the screening range, and finally determines the optimal mapping, ensuring that the factors with the greatest influence on the mapping dominate the mapping decision.
Using the overhead models and the mapping decision method, configuration information with higher execution efficiency can be obtained, so that the parallelism of the reconfigurable system is fully exploited; compared with existing methods, better automatically generated configuration information is achieved.
Drawings
FIG. 1 is a block diagram of a reconfigurable system architecture;
FIG. 2 is a block diagram of a reconfigurable array;
FIG. 3 is a flow chart of the operation of the present invention;
FIG. 4 is a flowchart of a traversal screening process of the present invention;
FIG. 5 is a diagram of an exemplary mapping of the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the specific embodiments.
Fig. 1 is a block diagram of a reconfigurable system architecture. The reconfigurable system consists of a main control processor, a system bus, a reconfigurable array, a data flow controller, a configuration controller and a series of storage resources. Wherein the storage resources include configuration registers and global registers.
Fig. 2 is a structural diagram of a reconfigurable array of scale 4 × 4. The reconfigurable array is composed of a reconfigurable processing unit PE, storage resources of the reconfigurable array and a programmable interconnection network.
The reconfigurable processing unit PE provides a data path between the data output port and the input ports of a reconfigurable processing unit PE, and supports a routing mode and a conditional-execution mechanism.
The storage resources inside the reconfigurable array for storing data and configuration information include: the distributed register DRF, the output register REG at the output port of each reconfigurable processing unit PE, and the local configuration-information register inside each reconfigurable processing unit PE.
The programmable interconnect network within the reconfigurable array includes a data transport network and a conditional signal transport network. The data transmission network is used for data transmission among the reconfigurable processing units PE, among the distributed registers DRF and between the reconfigurable processing units PE and the distributed registers DRF; the conditional signal transmission network is used for transmitting a 1-bit conditional control signal.
According to the specific application program, the invention first establishes a data dependency graph DDG representing the core loop of the application algorithm, and analyzes the DDG to obtain the basic modulo-scheduling parameters of the reconfigurable architecture: the set of all operation nodes, including the current operation node u to be mapped, the direct predecessor nodes Pred(u) of u and the direct successor nodes Succ(u) of u, and the initiation interval II of the software pipeline in the reconfigurable architecture.
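As a purely illustrative sketch (not part of the claimed method), the data dependency graph DDG and the basic modulo-scheduling parameters described above could be represented as follows in Python; the class layout, the node names and the example edges are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class DDG:
    """Minimal data dependency graph for a core loop (illustrative only)."""
    nodes: list[str] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (producer, consumer)

    def pred(self, u: str) -> set[str]:
        """Direct predecessor operation nodes Pred(u)."""
        return {a for (a, b) in self.edges if b == u}

    def succ(self, u: str) -> set[str]:
        """Direct successor operation nodes Succ(u)."""
        return {b for (a, b) in self.edges if a == u}

# Hypothetical loop body, not taken from the patent figures.
ddg = DDG(nodes=["OP1", "OP2", "OP3", "OP4", "OP5"],
          edges=[("OP1", "OP4"), ("OP2", "OP4"), ("OP3", "OP5")])
II = 1  # initiation interval of the software pipeline
print(ddg.pred("OP4"), ddg.succ("OP2"), II)
```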
Then, the following steps are performed, and the work flow is shown in fig. 3 and 4:
First, overhead models for evaluating a mapping scheme on the reconfigurable array are established from 4 aspects: the data transmission delay, the number of interconnection resources used, the occupation ratio of the reconfigurable processing units PE, and how well the distance between operation nodes in the data dependency graph DDG matches the distance between reconfigurable processing units PE in the reconfigurable array; the resulting models are the delay overhead, the interconnection overhead, the PE occupancy overhead and the proximity overhead;
The delay overhead is: the delay of transferring the operands required by a reconfigurable processing unit PE to its input ports; different data transmission paths contain different hardware components and therefore incur different transmission delays, and the larger the delay, the worse the pipeline performance;
The interconnection overhead is: the number of interconnection resources used to deliver data to the target processing unit; the higher the interconnection overhead, the more interconnection resources are wasted, the fewer interconnection resources remain for mapping subsequent operation nodes, and the harder later mapping becomes;
The PE occupancy overhead is: a measure of the degree to which each reconfigurable processing unit PE in the array is used; if the mapping result shows large differences in PE usage, that is, some reconfigurable processing units PE execute significantly more operations than others, then the more operations a PE executes, the more reconfiguration passes are needed, the larger the final configuration file becomes, and the longer the total execution time of the reconfigurable array;
The proximity overhead is: mainly a measure of how close the operation node to be mapped is to operation nodes already mapped onto reconfigurable processing units PE that have no direct data dependence on it but share the same direct successor operation node.
Secondly, for the multiple feasible mapping schemes of a given operation node, the 4 overhead values of each feasible mapping scheme under the 4 overhead models are calculated.
Thirdly, the degrees of influence of the 4 overhead models on the mapping result, from primary to secondary, are: delay overhead, interconnection overhead, PE occupancy overhead and proximity overhead. Each feasible mapping scheme is traversed in this order of influence, the set of feasible mapping schemes is gradually narrowed, and the optimal mapping scheme is finally obtained.
The traversal process of the invention is as follows:
Traverse the delay overhead: sort and screen the feasible mapping schemes according to the delay overhead, and keep the mapping schemes whose delay overhead lies within a certain threshold range; the threshold range is adjusted to the actual application program and the specific reconfigurable architecture, and the adjustment method is common knowledge for those skilled in the art;
Traverse the interconnection overhead: sort and screen the mapping schemes retained after the delay-overhead screening according to the interconnection overhead, and keep the mapping schemes whose interconnection overhead lies within a certain threshold range, adjusted in the same way;
Traverse the PE occupancy overhead: sort and screen the mapping schemes retained after the interconnection-overhead screening according to the PE occupancy overhead, and keep the mapping scheme with the minimum PE occupancy overhead;
Traverse the proximity overhead: sort and screen the mapping schemes retained after the PE-occupancy screening according to the proximity overhead, and keep the mapping scheme with the minimum proximity overhead.
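A minimal Python sketch of the four-pass screening just described is given below, assuming the four overhead values of each candidate PE have already been computed; the candidate names, the `costs` dictionary layout and the threshold parameters are illustrative assumptions, not taken from the patent:

```python
def select_mapping(candidates, costs, delay_thr, ic_thr):
    """Screen candidate PEs in the order: delay, interconnection, PE occupancy, proximity."""
    # Pass 1: keep candidates whose delay overhead lies within the threshold range.
    kept = [pe for pe in candidates if costs[pe]["delay"] <= delay_thr]
    # Pass 2: keep candidates whose interconnection overhead lies within its threshold range.
    kept = [pe for pe in kept if costs[pe]["interconnect"] <= ic_thr]
    if not kept:
        raise ValueError("no feasible mapping scheme within the threshold ranges")
    # Pass 3: keep only the candidates with the minimum PE occupancy overhead.
    min_occ = min(costs[pe]["occupancy"] for pe in kept)
    kept = [pe for pe in kept if costs[pe]["occupancy"] == min_occ]
    # Pass 4: among those, choose the candidate with the minimum proximity overhead.
    return min(kept, key=lambda pe: costs[pe]["proximity"])

# Toy usage with made-up overhead values.
costs = {"PE23": {"delay": 0, "interconnect": 0, "occupancy": 1, "proximity": 1},
         "PE33": {"delay": 0, "interconnect": 0, "occupancy": 1, "proximity": 0}}
print(select_mapping(["PE23", "PE33"], costs, delay_thr=2, ic_thr=2))  # -> PE33
```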
The following is a detailed description of each of the 4 overhead models.
(1) Time delay overhead
Consider mapping the current operation node u to be mapped onto a candidate reconfigurable processing unit PE_u. The delay overhead represents the routing delay of transmitting the operands required to execute u to the input ports of the candidate reconfigurable processing unit PE_u. In the reconfigurable array, the routing components on a data transmission path are of three types: interconnection wires, routing PEs and the distributed register DRF. The total delay overhead is the sum of the delays of these three routing components, and is calculated as follows:
$\mathrm{Delaycost}(PE_u) = \max_{v_{s'} \in V_{S'}} \left[ C_w(PE_{v_{s'}}, PE_u) + C_r(PE_{v_{s'}}, PE_u) + C_d(PE_{v_{s'}}, PE_u) \right]$
wherein,
PE_u: a candidate reconfigurable processing unit onto which the current operation node u to be mapped may be placed;
V_S': the set of all direct predecessor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array;
C_w(PE_vs', PE_u): the delay introduced by interconnection wires on the data transmission path from the reconfigurable processing unit PE_vs', onto which the already-mapped direct predecessor node v_s' supplying an operand of u is mapped, to the current candidate reconfigurable processing unit PE_u;
C_r(PE_vs', PE_u) and C_d(PE_vs', PE_u): the delays introduced on that data transmission path by routing PEs and by the distributed register DRF, respectively;
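A minimal sketch of the delay-overhead formula above, assuming the per-path wire, routing-PE and DRF delays are supplied by hypothetical helper callables (they are not defined in the patent text):

```python
def delay_cost(pe_u, mapped_pred_pes, c_w, c_r, c_d):
    """Delaycost(PE_u): maximum, over the already-mapped direct predecessors, of the
    wire + routing-PE + DRF delays on the path from their PE to the candidate PE_u."""
    if not mapped_pred_pes:   # no mapped predecessor: nothing has to be transferred
        return 0
    return max(c_w(pe_v, pe_u) + c_r(pe_v, pe_u) + c_d(pe_v, pe_u)
               for pe_v in mapped_pred_pes)
```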
(2) Interconnection overhead
The interconnection overhead represents the number of interconnection resources used to deliver data to the destination processing unit. Using too many interconnection resources not only increases the communication delay between operation nodes but also wastes resources, which greatly limits the mapping of subsequent operation nodes and may even leave later operations unable to find free resources. The higher the interconnection overhead, the more interconnection resources are wasted, the fewer interconnection resources remain available for scheduling subsequent operation nodes, and the harder later scheduling becomes. Therefore, when choosing which reconfigurable processing unit PE a current operation node u to be mapped should be placed on, preference should be given to the reconfigurable processing unit PE with the minimum interconnection overhead to the reconfigurable processing units PE hosting the direct predecessor or direct successor operation nodes of u. The interconnection overhead is calculated as follows:
wherein,
V_s: the set of all operation nodes that have already been mapped onto the reconfigurable array;
V_s'': the set of direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array; clearly V_s'' is a subset of V_s;
v_s'' ∈ V_s'': a direct predecessor or direct successor operation node of the current operation node u to be mapped that has been mapped onto the reconfigurable array;
Pred(u): the set of all direct predecessor operation nodes of the current operation node u to be mapped;
Succ(u): the set of all direct successor operation nodes of the current operation node u to be mapped;
the term involving the reconfigurable processing unit PE_u and PE_vs'' denotes the minimum number of routing PEs that must be inserted between the two processing units;
The above formula for the interconnection overhead means: if u has no direct predecessor and no direct successor nodes, or none of its direct predecessor or successor nodes has been mapped yet, then Interconnectcost(PE_u) = 0. Otherwise, the interconnection overhead equals the minimum number of routing PEs required between the current candidate reconfigurable processing unit PE_u and the reconfigurable processing units PE_vs'' onto which the already-mapped direct predecessor and direct successor operation nodes of u are placed. In particular, when PE_u cannot satisfy the interconnection requirement, Interconnectcost(PE_u) = ∞.
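The piecewise behaviour described above can be sketched as follows; since the patent's interconnection-overhead formula image is not reproduced in this text, the summation over the mapped neighbours is an assumption, and `placement` and `min_routing_pes` are hypothetical helpers:

```python
import math

def interconnect_cost(pe_u, u, pred, succ, placement, min_routing_pes):
    """Interconnectcost(PE_u) for mapping operation node u onto candidate PE_u."""
    # V_s'': direct predecessors/successors of u that are already placed on the array.
    neighbours = [v for v in (pred(u) | succ(u)) if v in placement]
    if not neighbours:                       # nothing mapped yet -> overhead 0
        return 0
    total = 0
    for v in neighbours:
        hops = min_routing_pes(placement[v], pe_u)
        if hops is None:                     # PE_u cannot satisfy the interconnection requirement
            return math.inf
        total += hops                        # assumed aggregation: sum of routing-PE counts
    return total
```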
(3) PE occupancy overhead
The PE occupancy overhead measures the degree to which each reconfigurable processing unit PE in the reconfigurable array is used. If the mapping result shows large differences in the usage of the reconfigurable processing units PE, that is, some reconfigurable processing units PE execute significantly more operations than others, then the more operations a PE executes, the more reconfiguration passes are required, the larger the final configuration file becomes, and the longer the total execution time of the reconfigurable array. It is therefore necessary to balance the usage of the reconfigurable processing units PE so that the occupancy differences between them are as small as possible; during mapping, the PE occupancy overhead parameter is used to examine the usage of the reconfigurable processing units PE, calculated as follows:
$\mathrm{Ucost}(PE_u) = \dfrac{\mathrm{PEOccupationCycles}(PE_u)}{II}$
wherein,
PEOccupationCycles(PE_u): the total time for the current candidate reconfigurable processing unit PE_u to execute the set of operations mapped onto it;
II: the initiation interval of the software pipeline in the reconfigurable architecture.
(4) Proximity overhead
The interconnection overhead and the delay overhead apply to operation nodes that have a direct dependence relationship, driving them to be mapped onto reconfigurable processing units PE that are as close together as possible; they do not cover operation nodes that have no direct dependence but share the same direct successor operation node. For the current operation node u to be mapped, if it has no direct predecessor and no direct successor operation node, its interconnection overhead and delay overhead are 0 for every reconfigurable processing unit PE, and the mapping overhead cannot be evaluated effectively. The proximity overhead is mainly used to measure, for the operation node u to be mapped and an already-mapped operation node v between which there is no direct data dependence, how close their mapping is, and this measure is used to select the mapping with the minimum cost for transmitting the data of u and v. The proximity overhead is calculated as follows:
$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{min}} \left| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \right|$
wherein,
V_min: the set of already-mapped operation nodes v whose distance to the operation node u to be mapped is the shortest.
Vexdist(u, v): the distance between the already-mapped operation node v in V_min and the operation node u to be mapped.
PEdist(PE_u, PE_v): the distance between the reconfigurable processing unit PE_v onto which the already-mapped operation node v in V_min is mapped and the candidate reconfigurable processing unit PE_u of the operation node u to be mapped.
The above proximity overhead formula shows that the larger the gap between Vexdist(u, v) and PEdist(PE_u, PE_v), the greater the proximity overhead to be paid for mapping onto this candidate reconfigurable processing unit PE_u.
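A sketch of the proximity-overhead formula, assuming `vexdist` (node distance in the DDG) and `pedist` (distance between PEs in the array) are provided elsewhere; both helpers and the `placement` table are illustrative assumptions:

```python
def proximity_cost(pe_u, u, v_min, placement, vexdist, pedist):
    """Relcost(PE_u) = sum over v in V_min of |Vexdist(u, v) - PEdist(PE_u, PE_v)|."""
    return sum(abs(vexdist(u, v) - pedist(pe_u, placement[v])) for v in v_min)
```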
As an embodiment of the present invention, the mapping example in FIG. 5 illustrates the calculation of the overhead models involved in the invention. FIG. 5(a) shows the DDG of the algorithm to be mapped, representing the core loop of a certain application, and FIG. 5(b) shows the corresponding reconfigurable array composed of PEs. Assume that the current node to be mapped is OP2, and that the reconfigurable processing units PE23 and PE33 are its candidate processing units; the operation nodes OP1, OP3 and OP5 have already been mapped onto the reconfigurable processing units PE22, PE32 and PE42, respectively, and OP4, the direct successor operation node of OP2, has not yet been mapped.
According to the method of the invention, the 4 overhead models (delay overhead, interconnection overhead, PE occupancy overhead and proximity overhead) are established for the operation node OP2 to be mapped, and the 4 overhead values are calculated using the formula of each overhead model.
(1) Delay overhead:
Since the current operation node OP2 to be mapped has no direct predecessor operation node, the terms C_w, C_r and C_d are all 0.
Therefore, for the current operation node OP2 to be mapped, the delay overhead of the candidate reconfigurable processing unit PE23 and of the reconfigurable processing unit PE33 is 0 in both cases.
(2) Interconnection overhead:
Since the current operation node OP2 to be mapped has no direct predecessor operation node, the interconnection overhead of the candidate reconfigurable processing unit PE23 and of the reconfigurable processing unit PE33 is 0;
(3) PE occupancy overhead: since the initiation interval II is 1 and no other operations are executed on the candidate reconfigurable processing unit PE23 or on the reconfigurable processing unit PE33, the PE occupancy overhead is the same for both candidates.
(4) Proximity overhead:
If the current operation node OP2 to be mapped is mapped onto the reconfigurable processing unit PE33, the operation node OP4 can then be mapped onto the reconfigurable processing unit PE43, with Vexdist(u, v) = 1 and PEdist(PE_u, PE_v) = 1, so the proximity overhead value is 0, the minimum number of routing PEs required is 0, and no additional routing-PE cost is incurred. If instead the current operation node OP2 to be mapped is mapped onto the reconfigurable processing unit PE23, the next operation node OP4 will be mapped onto the reconfigurable processing unit PE33 or PE43, with PEdist(PE_u, PE_v) equal to 2 in both cases, so the proximity overhead value is 1 and one reconfigurable processing unit PE must be used as a routing PE to transmit data. Mapping the current operation node OP2 onto the reconfigurable processing unit PE23 therefore not only increases the data transmission delay but also wastes computing resources compared with mapping it onto the reconfigurable processing unit PE33.
Thus, the operation node OP2 is finally mapped onto the reconfigurable processing unit PE33.
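The proximity comparison in this example can be checked with simple arithmetic (the PEdist value of 1 for the PE33 case is inferred from the stated proximity overhead of 0):

```python
# Vexdist(OP2, v) = 1 in both cases; PEdist differs with the candidate chosen for OP2.
relcost_pe33 = abs(1 - 1)   # OP2 on PE33: proximity overhead 0, no routing PE needed
relcost_pe23 = abs(1 - 2)   # OP2 on PE23: proximity overhead 1, one PE spent on routing
print("PE33" if relcost_pe33 <= relcost_pe23 else "PE23")  # PE33, as selected above
```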
To demonstrate the feasibility and advantages of the method more convincingly, the mapping results of several typical sub-algorithms of application programs on the reconfigurable architecture were compared between the present method and existing methods; the resulting instructions per cycle (IPC) are listed in Table 1:
TABLE 1
The instructions per cycle (IPC) listed in Table 1 directly reflect the parallelism of loop execution: the larger the IPC value, the more operations are executed in parallel within the same cycle, and the greater the parallelism of the loop. The table compares the IPC obtained from configuration information generated using each of the four overhead functions alone (data transmission delay, interconnection resource usage, functional-unit occupancy, and the closeness of the mapping distance to the dependence between operations) with the IPC obtained from configuration information generated by the decision method that combines the four overhead functions. For all test programs, the decision method of the invention achieves the highest IPC, so the configuration information generated by the optimized decision gives the reconfigurable system better parallelism and thus higher execution efficiency.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also intended to fall within the scope of the invention.

Claims (2)

1. A reconfigurable architecture mapping decision method based on overhead calculation, characterized in that: a data dependency graph DDG representing the core loop of the application program is first established to obtain the current operation node u to be mapped, the direct predecessor nodes Pred(u) of u, the direct successor nodes Succ(u) of u, and the initiation interval II of the software pipeline in the reconfigurable architecture, and the following steps are then performed in sequence:
(1) establishing the following 4 overhead models which are respectively:
Delay overhead: represents the delay of transferring operands to the input port of the candidate reconfigurable processing unit PE;
Interconnection overhead: represents the number of interconnection resources used to deliver data to the candidate reconfigurable processing unit PE;
PE occupancy overhead: measures the degree to which each reconfigurable processing unit PE in the reconfigurable array is used;
Proximity overhead: measures how close the current operation node to be mapped is to operation nodes already mapped onto reconfigurable processing units PE that have no direct data dependence on it but share the same direct successor operation node;
(2) for the plurality of feasible mapping schemes of a certain operation node, calculating the overhead value of each feasible mapping scheme under the 4 overhead models;
the calculation formula of the delay overhead is as follows:
$\mathrm{Delaycost}(PE_u) = \max_{v_{s'} \in V_{S'}} \left[ C_w(PE_{v_{s'}}, PE_u) + C_r(PE_{v_{s'}}, PE_u) + C_d(PE_{v_{s'}}, PE_u) \right]$
wherein,
PE_u: a candidate reconfigurable processing unit onto which the current operation node u to be mapped may be placed;
V_S': the set of all direct predecessor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array;
C_w(PE_vs', PE_u): the delay introduced by interconnection wires on the data transmission path from the reconfigurable processing unit PE_vs', onto which the already-mapped direct predecessor node v_s' supplying an operand of u is mapped, to the current candidate reconfigurable processing unit PE_u;
C_r(PE_vs', PE_u) and C_d(PE_vs', PE_u): the delays introduced on that data transmission path by routing PEs and by the distributed register DRF, respectively;
the calculation formula of the interconnection overhead is as follows:
wherein,
V_s: the set of all operation nodes that have already been mapped onto the reconfigurable array;
V_s'': the set of direct predecessor and direct successor operation nodes of the current operation node u to be mapped that have already been mapped onto the reconfigurable array;
Pred(u): the set of all direct predecessor operation nodes of the current operation node u to be mapped;
Succ(u): the set of all direct successor operation nodes of the current operation node u to be mapped;
the term involving the reconfigurable processing unit PE_u and PE_vs'' denotes the minimum number of routing PEs that must be inserted between the two processing units;
the calculation formula of the PE occupancy rate overhead is as follows:
$\mathrm{Ucost}(PE_u) = \dfrac{\mathrm{PEOccupationCycles}(PE_u)}{II}$
wherein,
PEOccupationCycles(PE_u): the total time for the current candidate reconfigurable processing unit PE_u to execute the set of operations mapped onto it;
II: the initiation interval of the software pipeline in the reconfigurable architecture;
the calculation formula of the proximity overhead is as follows:
$\mathrm{Relcost}(PE_u) = \sum_{v \in V_{min}} \left| \mathrm{Vexdist}(u, v) - \mathrm{PEdist}(PE_u, PE_v) \right|$
wherein,
V_min: the set of all already-mapped operation nodes v whose distance to the current operation node u to be mapped is the shortest;
Vexdist(u, v): the distance between the already-mapped operation node v in V_min and the current operation node u to be mapped;
PEdist(PE_u, PE_v): the distance between the reconfigurable processing unit PE_v onto which the already-mapped operation node v in V_min is mapped and the candidate reconfigurable processing unit PE_u of the current operation node u to be mapped;
(3) traversing the feasible mapping schemes of the current operation node u to be mapped in the order delay overhead, interconnection overhead, PE occupancy overhead, proximity overhead, gradually narrowing the set of feasible mapping schemes and finally obtaining the optimal mapping scheme.
2. The method of claim 1, wherein the method comprises the following steps: the traversal screening process of traversing each feasible mapping scheme, gradually reducing the feasible mapping scheme set and finally obtaining the optimal mapping scheme comprises the following 4 steps which are sequentially executed:
(1) traverse the delay overhead: sort and screen all feasible mapping schemes according to the delay overhead, and keep the mapping schemes whose delay overhead lies within a certain threshold range, the threshold range being adjusted to the specific reconfigurable architecture in the actual application;
(2) traverse the interconnection overhead: sort and screen the mapping schemes retained after the delay-overhead screening according to the interconnection overhead, and keep the mapping schemes whose interconnection overhead lies within a certain threshold range, the threshold range being adjusted to the specific reconfigurable architecture in the actual application;
(3) traverse the PE occupancy overhead: sort and screen the mapping schemes retained after the interconnection-overhead screening according to the PE occupancy overhead, and keep the mapping scheme with the minimum PE occupancy overhead;
(4) traverse the proximity overhead: sort and screen the mapping schemes retained after the PE-occupancy screening according to the proximity overhead, and keep the mapping scheme with the minimum proximity overhead.
CN201310576351.2A 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation Active CN103605573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310576351.2A CN103605573B (en) 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310576351.2A CN103605573B (en) 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation

Publications (2)

Publication Number Publication Date
CN103605573A CN103605573A (en) 2014-02-26
CN103605573B true CN103605573B (en) 2017-02-08

Family

ID=50123802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310576351.2A Active CN103605573B (en) 2013-11-18 2013-11-18 Reconfigurable architecture mapping decision-making method based on expense calculation

Country Status (1)

Country Link
CN (1) CN103605573B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294278B (en) * 2016-08-01 2019-03-12 东南大学 Adaptive hardware for dynamic reconfigurable array computing system is pre-configured controller
CN107679012A (en) * 2017-09-27 2018-02-09 清华大学无锡应用技术研究院 Method and apparatus for the configuration of reconfigurable processing system
CN109274497A (en) * 2018-08-30 2019-01-25 无锡凯特微电子有限公司 A kind of mapping method of the SM3 algorithm based on reconfigurable arrays
CN109144702B (en) * 2018-09-06 2021-12-07 兰州大学 Multi-objective optimization automatic mapping scheduling method for row-column parallel coarse-grained reconfigurable array

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7451447B1 (en) * 1998-08-07 2008-11-11 Arc International Ip, Inc. Method, computer program and apparatus for operating system dynamic event management and task scheduling using function calls
CN102262612A (en) * 2011-07-28 2011-11-30 西安交通大学 Method for lowering restructurable hardware task layout conflict
CN102354337A (en) * 2011-10-14 2012-02-15 河海大学常州校区 Reconfigurable assembly line multi-target scheduling decision method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7590971B2 (en) * 2003-08-01 2009-09-15 Idx Investment Corporation Enterprise task manager

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7451447B1 (en) * 1998-08-07 2008-11-11 Arc International Ip, Inc. Method, computer program and apparatus for operating system dynamic event management and task scheduling using function calls
CN102262612A (en) * 2011-07-28 2011-11-30 西安交通大学 Method for lowering restructurable hardware task layout conflict
CN102354337A (en) * 2011-10-14 2012-02-15 河海大学常州校区 Reconfigurable assembly line multi-target scheduling decision method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Technologies Related to Reconfigurable Computing; Sun Kang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2007-06-30; full text *

Also Published As

Publication number Publication date
CN103605573A (en) 2014-02-26

Similar Documents

Publication Publication Date Title
US11018979B2 (en) System and method for network slicing for service-oriented networks
Liu et al. A NoC traffic suite based on real applications
Rosvall et al. A constraint-based design space exploration framework for real-time applications on MPSoCs
US20170286861A1 (en) Structured machine learning framework
CN103605573B (en) Reconfigurable architecture mapping decision-making method based on expense calculation
CN112965710B (en) Calculation graph processing method, device and system
CN107633125A (en) A kind of analogue system Parallelism method based on Weighted Directed Graph
Wei et al. Software pipelining for stream programs on resource constrained multicore architectures
Neubauer et al. Enhancing symbolic system synthesis through ASPmT with partial assignment evaluation
Liu et al. A dependency-graph based priority assignment algorithm for real-time traffic over NoCs with shared virtual-channels
Wang et al. Low-latency scheduling approach for dependent tasks in MEC-enabled 5G vehicular networks
CN114139730B (en) Dynamic pricing and deployment method for machine learning tasks in edge cloud network
Mirza et al. Mapping streaming applications on multiprocessors with time-division-multiplexed network-on-chip
Chang et al. Reinforcement learning approach for mapping applications to dataflow-based coarse-grained reconfigurable array
Ruschke et al. A near optimal integrated solution for resource constrained scheduling, binding and routing on CGRAs
CN116306424A (en) PISA architecture chip resource arrangement method based on dynamic amplification layer-by-layer optimization algorithm with adjustable level margin improvement
van Stralen et al. Fitness prediction techniques for scenario-based design space exploration
Qureshi et al. Maintaining the feasibility of hard real-time systems with a reduced number of priority levels
JPWO2018235124A1 (en) Distributed arrangement apparatus, distributed arrangement system, and distributed arrangement method
Chen et al. MARCO: A High-performance Task M apping a nd R outing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems
Singh et al. RAPIDITAS: RAPId design-space-exploration incorporating trace-based analysis and simulation
Bobda et al. High-level synthesis of on-chip multiprocessor architectures based on answer set programming
Li et al. An efficient multicore based parallel computing approach for TSP problems
Dehlaghi-Ghadim et al. Cost-Efficient Scheduling for Deadline Constrained Grid Workflows
Tao PISA architecture chip resource scheduling algorithm design and implementation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20151211

Address after: 518000 Guangdong city of Shenzhen province Nanshan District high tech Industrial Park Road eight South South technology Howare Technology Building 16

Applicant after: SHENZHEN PANGO MICROSYSTEMS CO., LTD.

Address before: 211189 Jiangsu Road, Jiangning District, Southeast University, No. 2, No.

Applicant before: Southeast University

CB02 Change of applicant information

Address after: 518000 Guangdong city of Shenzhen province Nanshan District high tech Industrial Park Road eight South South technology Howare Technology Building 16

Applicant after: Shenzhen Pango Microsystems Co., Ltd.

Address before: 518000 Guangdong city of Shenzhen province Nanshan District high tech Industrial Park Road eight South South technology Howare Technology Building 16

Applicant before: SHENZHEN PANGO MICROSYSTEMS CO., LTD.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant