CN103116493B - Automatic mapping method applied to a coarse-grained reconfigurable array - Google Patents


Info

Publication number
CN103116493B
Authority
CN
China
Prior art keywords
data flow
priority
mapped
running node
flow diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310027776.8A
Other languages
Chinese (zh)
Other versions
CN103116493A (en)
Inventor
齐志
马璐
刘波
葛伟
曹鹏
杨军
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201310027776.8A
Publication of CN103116493A
Application granted
Publication of CN103116493B
Current legal status: Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses an automatic mapping method applied to coarse-grained reconfigurable arrays, comprising the following steps: partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array; compiling the hardware part to be accelerated on the array with a compiler to obtain a data-flow graph describing that code; selecting the operation node to be mapped in the data-flow graph and mapping it onto the array, then repeating the selection for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the array; integrating the resulting configuration file with the software part executed by the master processor to obtain a new, integrated application algorithm; and compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware. The invention achieves efficient allocation and scheduling of the reconfigurable array's hardware resources, so as to reduce mapping time.

Description

An automatic mapping method applied to a coarse-grained reconfigurable array
Technical field
The invention belongs to the field of embedded information technology, and specifically relates to an automatic mapping method applied to coarse-grained reconfigurable arrays.
Background technology
In mainstream computing, processors and application-specific integrated circuits (ASICs) have long been the two dominant paradigms. As applications, especially in embedded environments, place ever higher demands on performance, energy consumption, and time to market, the drawbacks of the traditional paradigms have become apparent. Processors can implement a wide variety of applications flexibly but fall short in performance; hardwired logic achieves high performance but is very inflexible. Reconfigurable computing emerged to strike a good balance between computing performance and implementation flexibility. It combines the advantages of processors and ASICs, provides efficient and flexible computing power, and also offers a new way to address the high design and tape-out costs of nanoscale chips. Many mainstream compute-intensive applications in the embedded field are well suited to implementation with reconfigurable computing.
A typical coarse-grained reconfigurable system consists of one or more processors and an array of reconfigurable functional units. For an application algorithm written in a high-level language, the processor executes serial or non-critical code, while code suitable for hardware can be mapped efficiently to run on the reconfigurable array. The code mapped onto the reconfigurable array can exploit the parallelism of the hardware effectively and execute in a pipelined fashion. Implementing an application algorithm on a reconfigurable processor involves the following main steps: (1) hardware/software partitioning, which maps the critical loop bodies that dominate execution time onto the reconfigurable array; (2) generation of an intermediate representation of the loop program, which describes the computation operations the loop body contains, the data and control dependences among them, and information such as the execution time each operation requires; (3) mapping of the intermediate representation, which maps the intermediate representation of the critical loops onto the reconfigurable array and generates a configuration file executable on the reconfigurable hardware. Among these, the mapping method for the intermediate representation of the critical loop bodies largely determines how fully the advantages of the reconfigurable hardware can be exploited. Mapping can be done manually or with automated tools, but manual mapping has the following shortcomings: on one hand, it costs a great deal of manpower and time, and requires the mapping engineer to understand the hardware architecture of the adopted reconfigurable system in depth; on the other hand, as array sizes grow and application algorithms become more complex, the difficulty of manual mapping and the probability of errors increase markedly.
Summary of the invention
Object of the invention: in view of the problems and deficiencies of the prior art described above, the object of the invention is to provide an automatic mapping method applied to coarse-grained reconfigurable arrays that achieves efficient allocation and scheduling of the reconfigurable array's hardware resources, reduces mapping time, and at the same time improves mapping quality.
Technical solution: to achieve the above object, the invention adopts an automatic mapping method applied to a coarse-grained reconfigurable array, comprising the following steps:
1a) partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array;
1b) compiling the hardware part to be accelerated on the reconfigurable array with a compiler to obtain a data-flow graph describing that code;
1c) selecting the operation node to be mapped in the data-flow graph: sorting all unmapped operation nodes in the data-flow graph by priority from high to low, and selecting the highest-priority operation node as the node to be mapped;
1d) mapping the selected operation node onto the reconfigurable array, and repeating the sorting and selection of step 1c) for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the reconfigurable array;
1e) integrating the resulting configuration file with the software part executed by the master processor described in step 1a) to obtain a new, integrated application algorithm;
1f) compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware.
Further, the steps of mapping the selected operation node onto the reconfigurable array are as follows:
2a) building a priority list of functional units: computing the computation cost of each functional unit in the reconfigurable array and building the priority list according to that cost: the larger the computation cost, the lower the priority; the smaller the cost, the higher the priority;
2b) determining the allocated functional unit: examining the functional units in the priority list one by one and selecting the first unoccupied unit with the highest priority;
2c) determining the routing of input and output data: after a functional unit has been allocated to the operation node in step 2b), selecting routing paths for the node's input and output data;
2d) backtracking analysis: if no mappable functional unit is found in step 2b) and the number of backtracks does not exceed a set threshold, releasing some already-mapped operation nodes and returning to step 1c);
2e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partitioning the data-flow graph, forming a new data-flow graph from the remaining unmapped operation nodes, and returning to step 1c); the above process is repeated until every operation node has been mapped.
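As a concrete illustration, steps 1c)-1d) and 2a)-2e) can be sketched as a priority-driven loop with bounded backtracking. This is a minimal sketch under stated assumptions, not the patented implementation: the node priorities, functional-unit costs, the routing predicate `can_route`, and the choice of which placed node to release on backtracking are all illustrative stand-ins.

```python
def map_dfg(nodes, fus, can_route, backtrack_limit=3):
    """nodes: {name: priority}; fus: {fu_id: compute_cost};
    can_route(node, fu, placement) -> bool (stand-in for step 2c)."""
    # 2a) functional-unit priority list: smaller compute cost => higher priority
    fu_order = sorted(fus, key=lambda f: fus[f])
    placement = {}          # operation node -> functional unit
    pending = set(nodes)    # unmapped operation nodes
    backtracks = 0
    cut = []                # nodes split off into a new data-flow graph (2e)
    while pending:
        # 1c) pick the highest-priority unmapped node
        node = max(pending, key=lambda n: nodes[n])
        # 2b) first free unit in priority order that 2c) can also route
        free = [f for f in fu_order if f not in placement.values()]
        fu = next((f for f in free if can_route(node, f, placement)), None)
        if fu is not None:
            placement[node] = fu
            pending.remove(node)
        elif backtracks < backtrack_limit and placement:
            # 2d) backtrack: release an already-placed node and retry
            backtracks += 1
            victim = next(iter(placement))   # simplistic victim choice
            del placement[victim]
            pending.add(victim)
        else:
            # 2e) cut: the remaining nodes form a new data-flow graph
            cut = sorted(pending)
            pending = set()
    return placement, cut

# Toy run: three nodes, two functional units, routing always succeeds,
# so one node cannot be placed and ends up in the cut-off sub-graph.
placement, cut = map_dfg({"OP0": 3, "OP1": 2, "OP2": 1},
                         {"FU0": 1, "FU1": 2},
                         lambda n, f, p: True)
```

With two units and three nodes the backtracking budget is exhausted and `OP2` is split off, which is exactly the hand-off from step 2d) to step 2e).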
Further, the compiler is the open-source compiler IMPACT.
The criterion for ranking and selecting the operation nodes is as follows: first, rank by the height of each operation node in the data-flow graph, the greater the height, the higher the priority; for operation nodes of equal height, rank by the number of child nodes of each node, the more child nodes, the higher the priority.
The criterion for ranking and searching the functional units is as follows: the priority of a functional unit is proportional to the number of its routing resources, the more routing resources, the higher the priority.
Beneficial effects: the automatic mapping method applied to coarse-grained reconfigurable arrays proposed by the invention uses a mapping algorithm that considers both the operation nodes and the computation cost of the functional units in the reconfigurable array ("array" for short) to allocate and schedule the computing and storage resources in the array. It analyzes the dependences among the operation nodes in the data-flow graph of the application algorithm program, makes full use of the hardware resources of the reconfigurable array, improves the utilization of the functional units in the array, avoids the time wasted by manual mapping, and optimizes the utilization of computing and storage resources in the reconfigurable array.
Brief description of the drawings
Fig. 1 is a structural block diagram of the coarse-grained reconfigurable system provided by the embodiment of the invention;
Fig. 2 is the workflow diagram of the invention;
Fig. 3 is the data-flow graph of a representative loop body provided by the embodiment of the invention;
Fig. 4 is a topology diagram of a reconfigurable array provided by the embodiment of the invention.
In the figures: master processor 1, reconfigurable array 2, IMPACT compiler 3, preprocessor 4, resource allocation module 5, sub-graph partitioning module 6.
Detailed description of the embodiments
The invention is further illustrated below in conjunction with the drawings and specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention, not to limit its scope; after reading the invention, modifications by those skilled in the art to its various equivalent forms all fall within the scope defined by the appended claims.
Fig. 1 is a structural block diagram of the coarse-grained reconfigurable system, which comprises a master processor 1 and a reconfigurable array 2.
The workflow of the invention, shown in Fig. 2, is as follows:
Step 1: partition the application algorithm written in a high-level language into the software part executed by the master processor 1 and the hardware part accelerated by the reconfigurable array 2.
Step 2: compile the hardware part to be accelerated on the array with the open-source IMPACT compiler 3, performing a series of analyses, optimizations, and transformations to obtain the data-flow graph describing that code.
Step 3: select the operation node to be mapped in the data-flow graph: sort all unmapped operation nodes in the data-flow graph by priority and select the highest-priority node; map the selected node, then repeatedly sort and select among the remaining operation nodes until every operation in the data-flow graph has been mapped, obtaining the configuration file executable on the reconfigurable array.
Step 4: integrate the resulting configuration file with the software part executed by the master processor to obtain the new, integrated application algorithm program.
Step 5: compile the new algorithm program with the master processor's compiler to generate machine code executable on the hardware.
The steps of mapping the selected operation node onto the array according to the invention are as follows: a) build a priority list of functional units: compute the computation cost of each functional unit in the reconfigurable array and build the priority list according to that cost; b) determine the allocated functional unit: examine the functional units in the priority list one by one and select the first unoccupied unit with the highest priority; c) determine the routing of input and output data: after a functional unit has been allocated to the operation node in step b), select routing paths for the node's input and output data; d) backtracking analysis: if no mappable functional unit is found in step b) and the number of backtracks does not exceed the set threshold, release some already-mapped operation nodes and return to step c); e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partition the data-flow graph, form a new data-flow graph from the remaining unmapped operation nodes, and return to step c). Repeat the above process until every operation node has been mapped.
As one embodiment of the invention, the criterion for ranking and selecting the operation nodes is proportional to each node's height in the data-flow graph and to its number of child nodes: the greater the height and the more child nodes, the higher the priority. Referring to Fig. 3, the computation of operation-node priorities according to the invention is illustrated with the data-flow graph shown there; the procedure is as follows:
First, rank by the height of each operation node in the data-flow graph: the greater the height, the higher the priority. Since the height of OP0 (operation node 0) is 3, the heights of OP1 and OP2 are 2, the heights of OP3 and OP4 are 1, and the heights of OP5 and OP6 are 0, the priority list of operation nodes is {OP0, OP1, OP2, OP3, OP4, OP5, OP6}.
Second, for nodes of equal height, the more child nodes, the higher the priority. OP0 has no operation node of the same height. Of OP1 and OP2, OP1 has 2 child nodes while OP2 has only 1, so the priority of OP1 is higher than that of OP2. Of OP3 and OP4, OP3 has no child nodes while OP4 has 2, so the priority of OP4 is higher than that of OP3. Neither OP5 nor OP6 has any child nodes.
Finally, the priority list of operation nodes is {OP0, OP1, OP2, OP4, OP3, OP5, OP6}.
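The priority computation of this example can be reproduced with a short script. Since Fig. 3 is not reproduced here, the parent-child edges below are reconstructed from the stated heights and child counts and are therefore an assumption; the height of a node is taken as the graph's critical-path depth minus the node's depth from the root, which matches the values quoted above (e.g. the childless OP3 still has height 1).

```python
# Edges reconstructed from the stated heights and child counts (assumption).
children = {
    "OP0": ["OP1", "OP2"],
    "OP1": ["OP3", "OP4"],
    "OP2": ["OP4"],
    "OP3": [],
    "OP4": ["OP5", "OP6"],
    "OP5": [],
    "OP6": [],
}

def node_depths(children, root="OP0"):
    """Longest path (in edges) from the root down to each node."""
    depth = {root: 0}
    stack = [root]
    while stack:
        n = stack.pop()
        for c in children[n]:
            if depth.get(c, -1) < depth[n] + 1:
                depth[c] = depth[n] + 1
                stack.append(c)
    return depth

depth = node_depths(children)
graph_depth = max(depth.values())                       # critical-path depth: 3
height = {n: graph_depth - depth[n] for n in children}  # OP0 -> 3, ..., OP6 -> 0
# Greater height first, then more children; node name as a final tie-break.
order = sorted(children, key=lambda n: (-height[n], -len(children[n]), n))
# order == ["OP0", "OP1", "OP2", "OP4", "OP3", "OP5", "OP6"]
```

The sort key encodes exactly the two-level criterion of the embodiment: height dominates, child count breaks ties.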
As another embodiment of the invention, the criterion for ranking and searching the functional units is proportional to the number of routing resources: the more routing resources, the higher the priority. Referring to Fig. 4, the computation of functional-unit priorities according to the invention is illustrated with the topology of the reconfigurable array shown there, in which the functional units are numbered 0 to 15. The analysis is as follows: in the structure shown, the middle functional units have more interconnection lines than the surrounding ones. From a routing perspective, this means that if an operation node is mapped onto a functional unit with more interconnection lines, its child nodes are more likely to find mapping positions, so such functional units are given higher priority. Accordingly, the priority order of the functional units of the array in Fig. 4 is: 5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15.
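Assuming the array of Fig. 4 is a 4x4 mesh in which each functional unit is wired to its north/south/east/west neighbours (an assumption, since the figure is not reproduced here), the stated priority order falls out of simply counting the interconnection lines per unit: inner units have 4 links, edge units 3, corner units 2.

```python
def mesh_degree(i, side=4):
    """Number of interconnection lines of unit i in a side x side mesh
    (assumed topology for Fig. 4)."""
    r, c = divmod(i, side)
    return sum(0 <= r + dr < side and 0 <= c + dc < side
               for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))

# More routing resources => higher priority; unit number breaks ties.
order = sorted(range(16), key=lambda i: (-mesh_degree(i), i))
# order == [5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15]
```

Units 5, 6, 9, 10 (degree 4) come first, the eight edge units (degree 3) next, and the four corners (degree 2) last, matching the order given in the text.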
The four key modules involved in the automatic mapping method applied to coarse-grained reconfigurable arrays of the invention are introduced below.
1. Front end: the open-source IMPACT compiler 3:
The open-source IMPACT compiler 3 is used to perform lexical, syntactic, and semantic analysis of the input source program and to carry out the related optimizations and transformations, generating an intermediate-representation file in three-address code. This intermediate-representation file contains all the information of the source program and is used by the subsequent steps.
2. Preprocessor 4:
Generating efficient executable code and performing the various optimizations require information from the program's control-flow graph and data-flow graph. This includes the control-flow graph; the analysis of flow dependences, anti-dependences, input dependences, and loop-carried flow dependences in the data-flow graph; and the attributes of each operation node in the graphs.
3. Resource allocation module 5:
Mapping the data-flow graph that represents the core code of the application algorithm onto the reconfigurable array amounts to assigning each operation node of the data-flow graph to a functional unit of the array. Connecting the computing resources according to the node assignments and the interconnections among the functional units amounts to realizing the edges of the data-flow graph on the reconfigurable hardware. The mapping method adopted by the invention performs placement and routing simultaneously: routing is attempted while placing, and a placement succeeds, and placement can finish, only when routing also succeeds. Simultaneous placement and routing may find better solutions than placing and routing separately, so it is a good choice. The mapping process uses a backtracking algorithm: when some operation node cannot be mapped, some already successfully mapped operations are remapped so that they land on different positions. The number of backtracks must be bounded by a threshold; otherwise the algorithm would explore all possibilities, incurring great complexity and excessive compilation time. Backtracking increases the chance of finding a solution, trading a certain amount of computing time for higher parallelism.
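"Simultaneous placement and routing" can be illustrated with a minimal sketch: a candidate functional unit is accepted only if every already-placed predecessor can still be routed to it, so placement never commits before routing succeeds. The 4x4 mesh topology and the breadth-first routability check below are illustrative assumptions, not the patented router.

```python
from collections import deque

SIDE = 4  # assumed 4x4 mesh, as in the Fig. 4 discussion above

def neighbours(u):
    """Mesh neighbours of unit u."""
    r, c = divmod(u, SIDE)
    return [(r + dr) * SIDE + (c + dc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < SIDE and 0 <= c + dc < SIDE]

def routable(src, dst, occupied):
    """BFS: is there a path src -> dst whose intermediate units are free?"""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in neighbours(u):
            if v not in seen and (v == dst or v not in occupied):
                seen.add(v)
                queue.append(v)
    return False

def place(pred_units, occupied, fu_priority):
    """Accept the first free unit, in priority order, reachable from every
    predecessor's unit; returning None would trigger backtracking (2d)."""
    for fu in fu_priority:
        if fu in occupied:
            continue
        if all(routable(p, fu, occupied) for p in pred_units):
            return fu
    return None

FU_PRIORITY = [5, 6, 9, 10, 1, 2, 4, 7, 8, 11, 13, 14, 0, 3, 12, 15]
# Predecessors sit on units 5 and 6; the next node lands on unit 9.
chosen = place([5, 6], {5, 6}, FU_PRIORITY)
```

If no unit passes both the occupancy and the routability test, `place` returns `None` and the mapper falls back to the backtracking step described above.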
4. Data-flow-graph partitioning module 6:
The data-flow-graph partitioning algorithm proposed by the invention consists of three steps. First, select an operation node that failed to map. Second, check whether the functional unit onto which a parent of that node is mapped has a routing path to another unoccupied functional unit; if so, a feasible cut has been found; otherwise repeat the same check for the node one level up, until a feasible cut is found. Third, add the corresponding output and input ports to the already-mapped part and to the cut-off part, respectively. A new data-flow graph is thus generated. Repeat this process until all operation nodes have been mapped successfully.
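The three-step partitioning can be sketched as follows. This is a simplified illustration under stated assumptions: the upward walk follows a single mapped parent, `has_free_route` abstracts the routing-path check of the second step, and all names and structures are illustrative, not the patented module.

```python
def find_cut(failed, parents, placement, has_free_route):
    """Walk upward from the node that failed to map until a feasible cut is
    found: a mapped parent whose functional unit still has a routing path
    to an unoccupied unit. Returns (cut node, mapped parents whose edges
    become the output/input ports across the cut)."""
    node = failed
    while True:
        mapped = [p for p in parents.get(node, []) if p in placement]
        feasible = [p for p in mapped if has_free_route(placement[p])]
        if feasible or not mapped:
            # Feasible cut found (or no mapped ancestor is left): each edge
            # feasible-parent -> node gets an output port on the mapped part
            # and a matching input port on the new sub-graph.
            return node, feasible
        node = mapped[0]  # simplification: retry one level up via one parent

# Toy graph reusing the Fig. 3 node names and the Fig. 4 unit numbers
# (illustrative): OP6 failed to map, its parent OP4 sits on unit 9, and
# unit 9 is the only one with a route to a free unit.
parents = {"OP6": ["OP4"], "OP4": ["OP1", "OP2"]}
placement = {"OP1": 5, "OP2": 6, "OP4": 9}
cut_node, port_parents = find_cut("OP6", parents, placement,
                                  has_free_route=lambda fu: fu == 9)
```

Here the cut is taken directly below OP4: the edge OP4 -> OP6 is exposed as a port pair, and OP6 seeds the new data-flow graph that is mapped in a later configuration.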

Claims (5)

1. An automatic mapping method applied to a coarse-grained reconfigurable array, characterized by comprising the following steps:
1a) partitioning an application algorithm written in a high-level language into a software part executed by a master processor and a hardware part accelerated by the reconfigurable array;
1b) compiling the hardware part to be accelerated on the reconfigurable array with a compiler to obtain a data-flow graph describing that code;
1c) selecting the operation node to be mapped in the data-flow graph: sorting all unmapped operation nodes in the data-flow graph by priority from high to low, and selecting the highest-priority operation node as the node to be mapped;
1d) mapping the selected operation node onto the reconfigurable array, and repeating the sorting and selection of step 1c) for the remaining operation nodes until every operation node in the data-flow graph has been mapped, thereby obtaining a configuration file executable on the reconfigurable array;
1e) integrating the resulting configuration file with the software part executed by the master processor described in step 1a) to obtain a new, integrated application algorithm;
1f) compiling the new application algorithm with the master processor's compiler to generate machine code executable on the hardware;
wherein the step of mapping the selected operation node onto the reconfigurable array in step 1d) is as follows:
2a) building a priority list of functional units: computing the computation cost of each functional unit in the reconfigurable array and building the priority list according to that cost: the larger the computation cost, the lower the priority; the smaller the cost, the higher the priority;
2b) determining the allocated functional unit: examining the functional units in the priority list one by one and selecting the first unoccupied unit with the highest priority;
2c) determining the routing of input and output data: after a functional unit has been allocated to the operation node in step 2b), selecting routing paths for the node's input and output data;
2d) backtracking analysis: if no mappable functional unit is found in step 2b) and the number of backtracks does not exceed a set threshold, releasing some already-mapped operation nodes and returning to step 1c);
2e) data-flow-graph partitioning: when the number of backtracks exceeds the set threshold and still no mappable functional unit is found, partitioning the data-flow graph and forming a new data-flow graph from the remaining unmapped operation nodes.
2. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the priority of an operation node is proportional to its height in the data-flow graph.
3. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 2, characterized in that, for operation nodes of equal height in the data-flow graph, the priority of a node is proportional to its number of child nodes.
4. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the priority of a functional unit is proportional to the number of its routing resources.
5. The automatic mapping method applied to a coarse-grained reconfigurable array according to claim 1, characterized in that the compiler in step 1b) is the open-source compiler IMPACT.
CN201310027776.8A 2013-01-21 2013-01-21 Automatic mapping method applied to a coarse-grained reconfigurable array Expired - Fee Related CN103116493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310027776.8A CN103116493B (en) 2013-01-21 2013-01-21 Automatic mapping method applied to a coarse-grained reconfigurable array

Publications (2)

Publication Number Publication Date
CN103116493A CN103116493A (en) 2013-05-22
CN103116493B true CN103116493B (en) 2016-01-06

Family

ID=48414879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310027776.8A Expired - Fee Related CN103116493B (en) 2013-01-21 2013-01-21 A kind of automatic mapping method being applied to coarse-grained reconfigurable array

Country Status (1)

Country Link
CN (1) CN103116493B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402745B (en) * 2017-07-04 2020-05-22 清华大学 Mapping method and device of data flow graph
CN107679012A (en) * 2017-09-27 2018-02-09 清华大学无锡应用技术研究院 Method and apparatus for the configuration of reconfigurable processing system
CN109144702B (en) * 2018-09-06 2021-12-07 兰州大学 Multi-objective optimization automatic mapping scheduling method for row-column parallel coarse-grained reconfigurable array
CN109471636B (en) * 2018-09-14 2020-07-14 上海交通大学 Operator mapping method and system of coarse-grained reconfigurable architecture
CN111045959B (en) * 2019-11-18 2024-03-19 中国航空工业集团公司西安航空计算技术研究所 Complex algorithm variable mapping method based on storage optimization
CN111090613B (en) * 2019-11-25 2022-03-15 中国人民解放军国防科技大学 Low-complexity hardware and software partitioning and scheduling method based on graph partitioning
CN111930680B (en) * 2020-09-17 2021-02-23 北京清微智能科技有限公司 Reconfigurable array mapping method and device
CN112306500B (en) * 2020-11-30 2022-06-07 上海交通大学 Compiling method for reducing multi-class access conflict aiming at coarse-grained reconfigurable structure
CN113094030A (en) * 2021-02-09 2021-07-09 北京清微智能科技有限公司 Easily compiling method and system for reconfigurable chip
CN115080055A (en) * 2022-06-15 2022-09-20 东南大学 Reconfigurable system chip compiler and automatic compiling method for information security

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101630274A (en) * 2009-07-31 2010-01-20 清华大学 Method for dividing cycle task by means of software and hardware and device thereof
CN101630275A (en) * 2009-07-31 2010-01-20 清华大学 Realizing method of configuration information for generating cycle task and device thereof
CN102508816A (en) * 2011-11-15 2012-06-20 东南大学 Configuration method applied to coarse-grained reconfigurable array

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020100029A1 (en) * 2000-07-20 2002-07-25 Matt Bowen System, method and article of manufacture for compiling and invoking C functions in hardware
US20030117971A1 (en) * 2001-12-21 2003-06-26 Celoxica Ltd. System, method, and article of manufacture for profiling an executable hardware model using calls to profiling functions
US7506297B2 (en) * 2004-06-15 2009-03-17 University Of North Carolina At Charlotte Methodology for scheduling, partitioning and mapping computational tasks onto scalable, high performance, hybrid FPGA networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hybrid-Priority Configuration Cache Supervision Method for Coarse Grained Reconfigurable Architecture; Peng Cao et al.; Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on; 2012-10-12; pp. 408-414 *
Pipelined mapping of core loops onto coarse-grained reconfigurable architectures; Wang Dawei et al.; Chinese Journal of Computers; 2009-06; vol. 32, no. 6; pp. 1089-1099 *

Also Published As

Publication number Publication date
CN103116493A (en) 2013-05-22

Similar Documents

Publication Publication Date Title
CN103116493B (en) Automatic mapping method applied to a coarse-grained reconfigurable array
CN101436128B (en) Software test case automatic generating method and system
CN102831011B (en) Task scheduling method and device based on many-core systems
CN102637157B (en) DTSOC (digital template system on chip)
CN102508816B (en) Configuration method applied to coarse-grained reconfigurable array
CN106339351B (en) SGD algorithm optimization system and method
CN101464799A (en) MPI parallel programming system based on visual modeling and automatic skeleton code generation method
CN102063376A (en) Test case selection method
CN102799624B (en) Large-scale graph data query method in distributed environment based on Datalog
CN114995823A (en) Deep learning compiler optimization method for special accelerator for CNN
CN114995822A (en) Deep learning compiler optimization method special for CNN accelerator
CN102163248B (en) Advanced synthesizing method for integrated circuit
CN110704364A (en) Automatic dynamic reconstruction method and system based on field programmable gate array
CN102622334B (en) Parallel XSLT (Extensible Style-sheet Language Transformation) conversion method and device for use in multi-thread environment
Huang et al. Application-specific network-on-chip synthesis with topology-aware floorplanning
CN103135975B (en) The element sequence auto arranging method of control strategy configuration and logical connection method thereof
CN105302624A (en) Automatic analysis method capable of reconstructing the initiation interval of periodic pipeline iterations in a compiler
CN109471636B (en) Operator mapping method and system of coarse-grained reconfigurable architecture
CN105404611A (en) Matrix model based multi-calculation-engine automatic selection method
CN105511867A (en) Optimization mode automatic generation method and optimization device
CN103605573A (en) Reconfigurable architecture mapping decision-making method based on expense calculation
CN102222274A (en) Immune clone selection job shop scheduling method based on scheduling coding
CN103049310A (en) Multi-core simulation parallel accelerating method based on sampling
CN104778088A (en) Method and system for optimizing parallel I/O (input/output) by reducing inter-progress communication expense
CN105302551B (en) A kind of method and system of the Orthogonal Decomposition construction and optimization of big data processing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160106

Termination date: 20200121