CN103116493B - Automatic mapping method applied to a coarse-grained reconfigurable array - Google Patents


Info

Publication number
CN103116493B
Authority
CN
China
Prior art keywords
data flow
priority
mapped
running node
flow diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310027776.8A
Other languages
Chinese (zh)
Other versions
CN103116493A (en)
Inventor
齐志
马璐
刘波
葛伟
曹鹏
杨军
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201310027776.8A
Publication of CN103116493A
Application granted
Publication of CN103116493B
Current legal status: Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses an automatic mapping method applied to coarse-grained reconfigurable arrays, comprising the following steps: partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array; compiling the hardware part to be accelerated on the array with a compiler to obtain a data-flow graph describing that code; selecting the operation node to be mapped in the data-flow graph and mapping it onto the array, then repeating the selection for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the array; integrating the resulting configuration file with the software part executed by the master processor to obtain a new, integrated application algorithm; and compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware. The invention achieves efficient allocation and scheduling of the reconfigurable array's hardware resources, so as to reduce mapping time.

Description

An automatic mapping method applied to a coarse-grained reconfigurable array
Technical field
The invention belongs to the field of embedded information technology, and specifically relates to an automatic mapping method applied to coarse-grained reconfigurable arrays.
Background technology
In mainstream computing, processors and application-specific integrated circuits (ASICs) have long been the two dominant paradigms. As applications, especially in embedded environments, place ever higher demands on performance, energy consumption, and time to market, the drawbacks of the traditional paradigms have become apparent. Processors can implement a wide variety of applications flexibly but fall short in performance; hardwired logic achieves high performance but is very inflexible. Reconfigurable computing emerged to strike a good balance between computing performance and implementation flexibility. It combines the advantages of processors and ASICs, provides efficient and flexible computing power, and also offers a new way to address the high design and tape-out costs of nanoscale chips. Many mainstream compute-intensive applications in the embedded field are well suited to implementation with reconfigurable computing.
A typical coarse-grained reconfigurable system consists of one or more processors and an array of reconfigurable functional units. For an application algorithm written in a high-level language, the processor executes serial or non-critical code, while code suitable for hardware can be mapped efficiently to run on the reconfigurable array. The code mapped onto the reconfigurable array can exploit the parallelism of the hardware effectively and execute in a pipelined fashion. Implementing an application algorithm on a reconfigurable processor involves the following main steps: (1) hardware/software partitioning, which maps the critical loop bodies that dominate execution time onto the reconfigurable array; (2) generation of an intermediate representation of the loop program, which describes the computation operations the loop body contains, the data and control dependences among them, and information such as the execution time each operation requires; (3) mapping of the intermediate representation, which maps the intermediate representation of the critical loops onto the reconfigurable array and generates a configuration file executable on the reconfigurable hardware. Among these, the mapping method for the intermediate representation of the critical loop bodies largely determines how fully the advantages of the reconfigurable hardware can be exploited. Mapping can be done manually or with automated tools, but manual mapping has the following shortcomings: on one hand, it costs a great deal of manpower and time, and requires the mapping engineer to understand the hardware architecture of the adopted reconfigurable system in depth; on the other hand, as array sizes grow and application algorithms become more complex, the difficulty of manual mapping and the probability of errors increase markedly.
Summary of the invention
Object of the invention: in view of the problems and deficiencies of the prior art described above, the object of the invention is to provide an automatic mapping method applied to coarse-grained reconfigurable arrays that achieves efficient allocation and scheduling of the reconfigurable array's hardware resources, reduces mapping time, and at the same time improves mapping quality.
Technical solution: to achieve the above object, the invention adopts an automatic mapping method applied to a coarse-grained reconfigurable array, comprising the following steps:
1a) partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array;
1b) compiling the hardware part to be accelerated on the reconfigurable array with a compiler to obtain a data-flow graph describing that code;
1c) selecting the operation node to be mapped in the data-flow graph: sorting all unmapped operation nodes in the data-flow graph by priority from high to low, and selecting the highest-priority operation node as the node to be mapped;
1d) mapping the selected operation node onto the reconfigurable array, and repeating the sorting and selection of step 1c) for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the reconfigurable array;
1e) integrating the resulting configuration file with the software part executed by the master processor described in step 1a) to obtain a new, integrated application algorithm;
1f) compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware.
Further, the steps of mapping the selected operation node onto the reconfigurable array are as follows:
2a) building a priority list of functional units: computing the computation cost of each functional unit in the reconfigurable array and building the priority list according to that cost: the larger the computation cost, the lower the priority; the smaller the cost, the higher the priority;
2b) determining the allocated functional unit: examining the functional units in the priority list one by one and selecting the first unoccupied unit with the highest priority;
2c) determining the routing of input and output data: after a functional unit has been allocated to the operation node in step 2b), selecting routing paths for the node's input and output data;
2d) backtracking analysis: if no mappable functional unit is found in step 2b) and the number of backtracks does not exceed a set threshold, releasing some already-mapped operation nodes and returning to step 1c);
2e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partitioning the data-flow graph, forming a new data-flow graph from the remaining unmapped operation nodes, and returning to step 1c); the above process is repeated until every operation node has been mapped.
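As a concrete illustration, steps 1c)-1d) and 2a)-2e) can be sketched as a priority-driven loop with bounded backtracking. This is a minimal sketch under stated assumptions, not the patented implementation: the node priorities, functional-unit costs, the routing predicate `can_route`, and the choice of which placed node to release on backtracking are all illustrative stand-ins.

```python
def map_dfg(nodes, fus, can_route, backtrack_limit=3):
    """nodes: {name: priority}; fus: {fu_id: compute_cost};
    can_route(node, fu, placement) -> bool (stand-in for step 2c)."""
    # 2a) functional-unit priority list: smaller compute cost => higher priority
    fu_order = sorted(fus, key=lambda f: fus[f])
    placement = {}          # operation node -> functional unit
    pending = set(nodes)    # unmapped operation nodes
    backtracks = 0
    cut = []                # nodes split off into a new data-flow graph (2e)
    while pending:
        # 1c) pick the highest-priority unmapped node
        node = max(pending, key=lambda n: nodes[n])
        # 2b) first free unit in priority order that 2c) can also route
        free = [f for f in fu_order if f not in placement.values()]
        fu = next((f for f in free if can_route(node, f, placement)), None)
        if fu is not None:
            placement[node] = fu
            pending.remove(node)
        elif backtracks < backtrack_limit and placement:
            # 2d) backtrack: release an already-placed node and retry
            backtracks += 1
            victim = next(iter(placement))   # simplistic victim choice
            del placement[victim]
            pending.add(victim)
        else:
            # 2e) cut: the remaining nodes form a new data-flow graph
            cut = sorted(pending)
            pending = set()
    return placement, cut

# Toy run: three nodes, two functional units, routing always succeeds,
# so one node cannot be placed and ends up in the cut-off sub-graph.
placement, cut = map_dfg({"OP0": 3, "OP1": 2, "OP2": 1},
                         {"FU0": 1, "FU1": 2},
                         lambda n, f, p: True)
```

With two units and three nodes the backtracking budget is exhausted and `OP2` is split off, which is exactly the hand-off from step 2d) to step 2e).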
Further, the compiler is the open-source compiler IMPACT.
The criterion for ranking and selecting the operation nodes is as follows: first, rank by the height of each operation node in the data-flow graph, the greater the height, the higher the priority; for operation nodes of equal height, rank by the number of child nodes of each node, the more child nodes, the higher the priority.
The criterion for ranking and searching the functional units is as follows: the priority of a functional unit is proportional to the number of its routing resources, the more routing resources, the higher the priority.
Beneficial effects: the automatic mapping method applied to coarse-grained reconfigurable arrays proposed by the invention uses a mapping algorithm that considers both the operation nodes and the computation cost of the functional units in the reconfigurable array ("array" for short) to allocate and schedule the computing and storage resources in the array. It analyzes the dependences among the operation nodes in the data-flow graph of the application algorithm program, makes full use of the hardware resources of the reconfigurable array, improves the utilization of the functional units in the array, avoids the time wasted by manual mapping, and optimizes the utilization of computing and storage resources in the reconfigurable array.
Brief description of the drawings
Fig. 1 is a structural block diagram of the coarse-grained reconfigurable system provided by the embodiment of the invention;
Fig. 2 is the workflow diagram of the invention;
Fig. 3 is the data-flow graph of a representative loop body provided by the embodiment of the invention;
Fig. 4 is a topology diagram of a reconfigurable array provided by the embodiment of the invention.
In the figures: master processor 1, reconfigurable array 2, IMPACT compiler 3, preprocessor 4, resource allocation module 5, sub-graph partitioning module 6.
Detailed description of the embodiments
The invention is further illustrated below in conjunction with the drawings and specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention, not to limit its scope; after reading the invention, modifications by those skilled in the art to its various equivalent forms all fall within the scope defined by the appended claims.
Fig. 1 is a structural block diagram of the coarse-grained reconfigurable system, which comprises a master processor 1 and a reconfigurable array 2.
The workflow of the invention, shown in Fig. 2, is as follows:
Step 1: partition the application algorithm written in a high-level language into the software part executed by the master processor 1 and the hardware part accelerated by the reconfigurable array 2.
Step 2: compile the hardware part to be accelerated on the array with the open-source IMPACT compiler 3, performing a series of analyses, optimizations, and transformations to obtain the data-flow graph describing that code.
Step 3: select the operation node to be mapped in the data-flow graph: sort all unmapped operation nodes in the data-flow graph by priority and select the highest-priority node; map the selected node, then repeatedly sort and select among the remaining operation nodes until every operation in the data-flow graph has been mapped, obtaining the configuration file executable on the reconfigurable array.
Step 4: integrate the resulting configuration file with the software part executed by the master processor to obtain the new, integrated application algorithm program.
Step 5: compile the new algorithm program with the master processor's compiler to generate machine code executable on the hardware.
The steps of mapping the selected operation node onto the array according to the invention are as follows: a) build a priority list of functional units: compute the computation cost of each functional unit in the reconfigurable array and build the priority list according to that cost; b) determine the allocated functional unit: examine the functional units in the priority list one by one and select the first unoccupied unit with the highest priority; c) determine the routing of input and output data: after a functional unit has been allocated to the operation node in step b), select routing paths for the node's input and output data; d) backtracking analysis: if no mappable functional unit is found in step b) and the number of backtracks does not exceed the set threshold, release some already-mapped operation nodes and return to step c); e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partition the data-flow graph, form a new data-flow graph from the remaining unmapped operation nodes, and return to step c). Repeat the above process until every operation node has been mapped.
As one embodiment of the invention, the criterion for ranking and selecting the operation nodes is proportional to each node's height in the data-flow graph and to its number of child nodes: the greater the height and the more child nodes, the higher the priority. Referring to Fig. 3, the computation of operation-node priorities according to the invention is illustrated with the data-flow graph shown there; the procedure is as follows:
First, rank by the height of each operation node in the data-flow graph: the greater the height, the higher the priority. Since the height of OP0 (operation node 0) is 3, the heights of OP1 and OP2 are 2, the heights of OP3 and OP4 are 1, and the heights of OP5 and OP6 are 0, the priority list of operation nodes is {OP0, OP1, OP2, OP3, OP4, OP5, OP6}.
Second, for nodes of equal height, the more child nodes, the higher the priority. OP0 has no operation node of the same height. Of OP1 and OP2, OP1 has 2 child nodes while OP2 has only 1, so the priority of OP1 is higher than that of OP2. Of OP3 and OP4, OP3 has no child nodes while OP4 has 2, so the priority of OP4 is higher than that of OP3. Neither OP5 nor OP6 has any child nodes.
Finally, the priority list of operation nodes is {OP0, OP1, OP2, OP4, OP3, OP5, OP6}.
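The priority computation of this example can be reproduced with a short script. Since Fig. 3 is not reproduced here, the parent-child edges below are reconstructed from the stated heights and child counts and are therefore an assumption; the height of a node is taken as the graph's critical-path depth minus the node's depth from the root, which matches the values quoted above (e.g. the childless OP3 still has height 1).

```python
# Edges reconstructed from the stated heights and child counts (assumption).
children = {
    "OP0": ["OP1", "OP2"],
    "OP1": ["OP3", "OP4"],
    "OP2": ["OP4"],
    "OP3": [],
    "OP4": ["OP5", "OP6"],
    "OP5": [],
    "OP6": [],
}

def node_depths(children, root="OP0"):
    """Longest path (in edges) from the root down to each node."""
    depth = {root: 0}
    stack = [root]
    while stack:
        n = stack.pop()
        for c in children[n]:
            if depth.get(c, -1) < depth[n] + 1:
                depth[c] = depth[n] + 1
                stack.append(c)
    return depth

depth = node_depths(children)
graph_depth = max(depth.values())                       # critical-path depth: 3
height = {n: graph_depth - depth[n] for n in children}  # OP0 -> 3, ..., OP6 -> 0
# Greater height first, then more children; node name as a final tie-break.
order = sorted(children, key=lambda n: (-height[n], -len(children[n]), n))
# order == ["OP0", "OP1", "OP2", "OP4", "OP3", "OP5", "OP6"]
```

The sort key encodes exactly the two-level criterion of the embodiment: height dominates, child count breaks ties.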
As another embodiment of the invention, the criterion for ranking and searching the functional units is proportional to the number of routing resources: the more routing resources, the higher the priority. Referring to Fig. 4, the computation of functional-unit priorities according to the invention is illustrated with the topology of the reconfigurable array shown there, in which the functional units are numbered 0 to 15. The analysis is as follows: in the structure shown, the middle functional units have more interconnection lines than the surrounding ones. From a routing perspective, this means that if an operation node is mapped onto a functional unit with more interconnection lines, its child nodes are more likely to find mapping positions, so such functional units are given higher priority. Accordingly, the priority order of the functional units of the array in Fig. 4 is: 5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15.
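Assuming the array of Fig. 4 is a 4x4 mesh in which each functional unit is wired to its north/south/east/west neighbours (an assumption, since the figure is not reproduced here), the stated priority order falls out of simply counting the interconnection lines per unit: inner units have 4 links, edge units 3, corner units 2.

```python
def mesh_degree(i, side=4):
    """Number of interconnection lines of unit i in a side x side mesh
    (assumed topology for Fig. 4)."""
    r, c = divmod(i, side)
    return sum(0 <= r + dr < side and 0 <= c + dc < side
               for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))

# More routing resources => higher priority; unit number breaks ties.
order = sorted(range(16), key=lambda i: (-mesh_degree(i), i))
# order == [5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15]
```

Units 5, 6, 9, 10 (degree 4) come first, the eight edge units (degree 3) next, and the four corners (degree 2) last, matching the order given in the text.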
The four key modules involved in the automatic mapping method applied to coarse-grained reconfigurable arrays of the invention are introduced below.
1. Front end: the open-source IMPACT compiler 3:
The open-source IMPACT compiler 3 is used to perform lexical, syntactic, and semantic analysis of the input source program and to carry out the related optimizations and transformations, generating an intermediate-representation file in three-address code. This intermediate-representation file contains all the information of the source program and is used by the subsequent steps.
2. Preprocessor 4:
Generating efficient executable code and performing the various optimizations require information from the program's control-flow graph and data-flow graph. This includes the control-flow graph; the analysis of flow dependences, anti-dependences, input dependences, and loop-carried flow dependences in the data-flow graph; and the attributes of each operation node in the graphs.
3. Resource allocation module 5:
Mapping the data-flow graph that represents the core code of the application algorithm onto the reconfigurable array amounts to assigning each operation node of the data-flow graph to a functional unit of the array. Connecting the computing resources according to the node assignments and the interconnections among the functional units amounts to realizing the edges of the data-flow graph on the reconfigurable hardware. The mapping method adopted by the invention performs placement and routing simultaneously: routing is attempted while placing, and a placement succeeds, and placement can finish, only when routing also succeeds. Simultaneous placement and routing may find better solutions than placing and routing separately, so it is a good choice. The mapping process uses a backtracking algorithm: when some operation node cannot be mapped, some already successfully mapped operations are remapped so that they land on different positions. The number of backtracks must be bounded by a threshold; otherwise the algorithm would explore all possibilities, incurring great complexity and excessive compilation time. Backtracking increases the chance of finding a solution, trading a certain amount of computing time for higher parallelism.
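"Simultaneous placement and routing" can be illustrated with a minimal sketch: a candidate functional unit is accepted only if every already-placed predecessor can still be routed to it, so placement never commits before routing succeeds. The 4x4 mesh topology and the breadth-first routability check below are illustrative assumptions, not the patented router.

```python
from collections import deque

SIDE = 4  # assumed 4x4 mesh, as in the Fig. 4 discussion above

def neighbours(u):
    """Mesh neighbours of unit u."""
    r, c = divmod(u, SIDE)
    return [(r + dr) * SIDE + (c + dc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < SIDE and 0 <= c + dc < SIDE]

def routable(src, dst, occupied):
    """BFS: is there a path src -> dst whose intermediate units are free?"""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in neighbours(u):
            if v not in seen and (v == dst or v not in occupied):
                seen.add(v)
                queue.append(v)
    return False

def place(pred_units, occupied, fu_priority):
    """Accept the first free unit, in priority order, reachable from every
    predecessor's unit; returning None would trigger backtracking (2d)."""
    for fu in fu_priority:
        if fu in occupied:
            continue
        if all(routable(p, fu, occupied) for p in pred_units):
            return fu
    return None

FU_PRIORITY = [5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15]
# Predecessors sit on units 5 and 6; the next node lands on unit 9.
chosen = place([5, 6], {5, 6}, FU_PRIORITY)
```

If no unit passes both the occupancy and the routability test, `place` returns `None` and the mapper falls back to the backtracking step described above.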
4. Data-flow-graph partitioning module 6:
The data-flow-graph partitioning algorithm proposed by the invention consists of three steps. First, select an operation node that failed to map. Second, check whether the functional unit onto which a parent of that node is mapped has a routing path to another unoccupied functional unit; if so, a feasible cut has been found; otherwise repeat the same check for the node one level up, until a feasible cut is found. Third, add the corresponding output and input ports to the already-mapped part and to the cut-off part, respectively. A new data-flow graph is thus generated. Repeat this process until all operation nodes have been mapped successfully.
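The three-step partitioning can be sketched as follows. This is a simplified illustration under stated assumptions: the upward walk follows a single mapped parent, `has_free_route` abstracts the routing-path check of the second step, and all names and structures are illustrative, not the patented module.

```python
def find_cut(failed, parents, placement, has_free_route):
    """Walk upward from the node that failed to map until a feasible cut is
    found: a mapped parent whose functional unit still has a routing path
    to an unoccupied unit. Returns (cut node, mapped parents whose edges
    become the output/input ports across the cut)."""
    node = failed
    while True:
        mapped = [p for p in parents.get(node, []) if p in placement]
        feasible = [p for p in mapped if has_free_route(placement[p])]
        if feasible or not mapped:
            # Feasible cut found (or no mapped ancestor is left): each edge
            # feasible-parent -> node gets an output port on the mapped part
            # and a matching input port on the new sub-graph.
            return node, feasible
        node = mapped[0]  # simplification: retry one level up via one parent

# Toy graph reusing the Fig. 3 node names and the Fig. 4 unit numbers
# (illustrative): OP6 failed to map, its parent OP4 sits on unit 9, and
# unit 9 is the only one with a route to a free unit.
parents = {"OP6": ["OP4"], "OP4": ["OP1", "OP2"]}
placement = {"OP1": 5, "OP2": 6, "OP4": 9}
cut_node, port_parents = find_cut("OP6", parents, placement,
                                  has_free_route=lambda fu: fu == 9)
```

Here the cut is taken directly below OP4: the edge OP4 -> OP6 is exposed as a port pair, and OP6 seeds the new data-flow graph that is mapped in a later configuration.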

Claims (5)

1. An automatic mapping method applied to a coarse-grained reconfigurable array, characterized by comprising the following steps:
1a) partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array;
1b) compiling the hardware part to be accelerated on the reconfigurable array with a compiler to obtain a data-flow graph describing that code;
1c) selecting the operation node to be mapped in the data-flow graph: sorting all unmapped operation nodes in the data-flow graph by priority from high to low, and selecting the highest-priority operation node as the node to be mapped;
1d) mapping the selected operation node onto the reconfigurable array, and repeating the sorting and selection of step 1c) for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the reconfigurable array;
1e) integrating the resulting configuration file with the software part executed by the master processor described in step 1a) to obtain a new, integrated application algorithm;
1f) compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware;
wherein the step of mapping the selected operation node onto the reconfigurable array in step 1d) is as follows:
2a) building a priority list of functional units: computing the computation cost of each functional unit in the reconfigurable array and building the priority list according to that cost: the larger the computation cost, the lower the priority; the smaller the cost, the higher the priority;
2b) determining the allocated functional unit: examining the functional units in the priority list one by one and selecting the first unoccupied unit with the highest priority;
2c) determining the routing of input and output data: after a functional unit has been allocated to the operation node in step 2b), selecting routing paths for the node's input and output data;
2d) backtracking analysis: if no mappable functional unit is found in step 2b) and the number of backtracks does not exceed a set threshold, releasing some already-mapped operation nodes and returning to step 1c);
2e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partitioning the data-flow graph and forming a new data-flow graph from the remaining unmapped operation nodes.
2. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the priority of an operation node is proportional to its height in the data-flow graph.
3. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 2, characterized in that, for operation nodes of equal height in the data-flow graph, the priority of a node is proportional to its number of child nodes.
4. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the priority of a functional unit is proportional to the number of its routing resources.
5. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the compiler in step 1b) is the open-source compiler IMPACT.
CN201310027776.8A 2013-01-21 2013-01-21 Automatic mapping method applied to a coarse-grained reconfigurable array Expired - Fee Related CN103116493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310027776.8A CN103116493B (en) 2013-01-21 2013-01-21 Automatic mapping method applied to a coarse-grained reconfigurable array

Publications (2)

Publication Number Publication Date
CN103116493A CN103116493A (en) 2013-05-22
CN103116493B true CN103116493B (en) 2016-01-06

Family

ID=48414879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310027776.8A Expired - Fee Related CN103116493B (en) 2013-01-21 2013-01-21 A kind of automatic mapping method being applied to coarse-grained reconfigurable array

Country Status (1)

Country Link
CN (1) CN103116493B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402745B (en) * 2017-07-04 2020-05-22 清华大学 Mapping method and device of data flow graph
CN107679012A (en) * 2017-09-27 2018-02-09 清华大学无锡应用技术研究院 Method and apparatus for the configuration of reconfigurable processing system
CN109144702B (en) * 2018-09-06 2021-12-07 兰州大学 Multi-objective optimization automatic mapping scheduling method for row-column parallel coarse-grained reconfigurable array
CN109471636B (en) * 2018-09-14 2020-07-14 上海交通大学 Operator mapping method and system of coarse-grained reconfigurable architecture
CN111045959B (en) * 2019-11-18 2024-03-19 中国航空工业集团公司西安航空计算技术研究所 Complex algorithm variable mapping method based on storage optimization
CN111090613B (en) * 2019-11-25 2022-03-15 中国人民解放军国防科技大学 Low-complexity hardware and software partitioning and scheduling method based on graph partitioning
CN111930680B (en) * 2020-09-17 2021-02-23 北京清微智能科技有限公司 Reconfigurable array mapping method and device
CN112306500B (en) * 2020-11-30 2022-06-07 上海交通大学 Compiling method for reducing multi-class access conflict aiming at coarse-grained reconfigurable structure
CN113094030A (en) * 2021-02-09 2021-07-09 北京清微智能科技有限公司 Easily compiling method and system for reconfigurable chip
CN115080055A (en) * 2022-06-15 2022-09-20 东南大学 Reconfigurable system chip compiler and automatic compiling method for information security

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630274A (en) * 2009-07-31 2010-01-20 清华大学 Method for dividing cycle task by means of software and hardware and device thereof
CN101630275A (en) * 2009-07-31 2010-01-20 清华大学 Realizing method of configuration information for generating cycle task and device thereof
CN102508816A (en) * 2011-11-15 2012-06-20 东南大学 Configuration method applied to coarse-grained reconfigurable array

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020100029A1 (en) * 2000-07-20 2002-07-25 Matt Bowen System, method and article of manufacture for compiling and invoking C functions in hardware
US20030117971A1 (en) * 2001-12-21 2003-06-26 Celoxica Ltd. System, method, and article of manufacture for profiling an executable hardware model using calls to profiling functions
US7506297B2 (en) * 2004-06-15 2009-03-17 University Of North Carolina At Charlotte Methodology for scheduling, partitioning and mapping computational tasks onto scalable, high performance, hybrid FPGA networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hybrid-Priority Configuration Cache Supervision Method for Coarse Grained Reconfigurable Architecture; Peng Cao et al.; Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on; 2012-10-12; pp. 408-414 *
Pipelined mapping of core loops onto coarse-grained reconfigurable architectures; Wang Dawei et al.; Chinese Journal of Computers; 2009-06; vol. 32, no. 6; pp. 1089-1099 *

Also Published As

Publication number Publication date
CN103116493A (en) 2013-05-22

Similar Documents

Publication Publication Date Title
CN103116493B (en) Automatic mapping method applied to a coarse-grained reconfigurable array
CN101436128B (en) Software test case automatic generating method and system
CN102831011B (en) Task scheduling method and device based on many-core systems
CN102637157B (en) DTSOC (digital template system on chip)
CN102508816B (en) Configuration method applied to coarse-grained reconfigurable array
CN106339351B (en) SGD algorithm optimization system and method
CN101464799A (en) MPI parallel programming system based on visual modeling and automatic skeleton code generation method
CN102063376A (en) Test case selection method
CN102799624B (en) Large-scale graph data query method in distributed environment based on Datalog
CN114995823A (en) Deep learning compiler optimization method for special accelerator for CNN
CN114995822A (en) Deep learning compiler optimization method special for CNN accelerator
CN102163248B (en) Advanced synthesizing method for integrated circuit
CN110704364A (en) Automatic dynamic reconstruction method and system based on field programmable gate array
CN102622334B (en) Parallel XSLT (Extensible Style-sheet Language Transformation) conversion method and device for use in multi-thread environment
Huang et al. Application-specific network-on-chip synthesis with topology-aware floorplanning
CN103135975B (en) The element sequence auto arranging method of control strategy configuration and logical connection method thereof
CN105302624A (en) Automatic analysis method capable of reconstructing the initiation interval of periodic pipeline iterations in a compiler
CN109471636B (en) Operator mapping method and system of coarse-grained reconfigurable architecture
CN105404611A (en) Matrix model based multi-calculation-engine automatic selection method
CN105511867A (en) Optimization mode automatic generation method and optimization device
CN103605573A (en) Reconfigurable architecture mapping decision-making method based on expense calculation
CN102222274A (en) Immune clone selection job shop scheduling method based on scheduling coding
CN103049310A (en) Multi-core simulation parallel accelerating method based on sampling
CN104778088A (en) Method and system for optimizing parallel I/O (input/output) by reducing inter-progress communication expense
CN105302551B (en) A kind of method and system of the Orthogonal Decomposition construction and optimization of big data processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160106

Termination date: 20200121