CN103984560A - Large-scale coarse-grained embedded reconfigurable system and processing method thereof - Google Patents


Info

Publication number: CN103984560A
Authority: CN (China)
Prior art keywords: reconfigurable, data, arrays, FIR, reconfigurable arrays
Legal status: Granted
Application number: CN201410240683.8A
Other languages: Chinese (zh)
Other versions: CN103984560B (en)
Inventors: Cao Peng (曹鹏), Liu Bo (刘波), Wang Ruihe (汪芮合), Yang Miaomiao (杨苗苗), Liu Yang (刘杨), Zhu Wanyu (朱婉瑜)
Current assignee: Southeast University
Original assignee: Southeast University
Application filed by Southeast University
Priority to CN201410240683.8A
Publication of CN103984560A
Application granted
Publication of CN103984560B
Legal status: Active

Landscapes: Advance Control (AREA)

Abstract

The invention discloses a large-scale coarse-grained embedded reconfigurable system and a processing method for it. The system comprises a system bus, a configuration bus, an embedded microprocessor, an external memory, an interrupt controller, a direct memory access (DMA) controller, an on-chip data memory, an on-chip configuration memory, a reconfigurable processor and a reconfiguration controller. The method targets an N-order FIR filter: the input sequence is convolved directly with the filter coefficient sequence to produce the output sequence, and the reconfigurable processor is used to optimize and accelerate this direct-form structure.

Description

Large-scale coarse-grained embedded reconfigurable system and processing method thereof
Technical field
The present invention relates to the field of embedded reconfigurable systems, and in particular to a large-scale coarse-grained embedded reconfigurable system, and a processing method thereof, applicable to fields such as radar and communications.
Background
General-purpose processors and application-specific integrated circuits (ASICs) are the two mainstream computing paradigms in traditional computer architecture. However, as applications place ever higher demands on performance, energy consumption and time to market, the drawbacks of these two traditional computing models have become apparent.
General-purpose processors have a wide range of applications but low computational efficiency. ASICs can raise computing speed and efficiency enough to meet performance requirements, but their flexibility is very poor.
Reconfigurable computing emerged to strike a better balance between flexibility and computational efficiency. It is one of the current development trends in computer architecture: its architecture lies between general-purpose processors and ASICs and combines the strengths of both. By configuring a reconfigurable device, a general-purpose computing platform can be turned into a dedicated hardware system for a specific computing task, which is equivalent to unfolding the task in both time and space; this yields both application flexibility and very high computing performance. Reconfigurable computing also offers low system energy consumption, high reliability and short time to market. These advantages give it broad application prospects in many fields, especially embedded applications. Many mainstream embedded applications, such as multimedia, encryption/decryption and communications, are well suited to reconfigurable implementation. At present, reconfigurable computing is still used mainly for computing platforms in high-end technical fields, but as the cost of reconfigurable logic devices falls and run-time reconfiguration techniques mature, its advantages can be expected to carry it into many more fields.
In traditional reconfigurable computing systems and frameworks such as ReMAP, AsAP and DRP, the interconnection within the reconfigurable arrays is relatively simple. When these systems perform FIR filtering, array utilization is low and the computation takes many cycles, especially for high-order FIR filters; the interconnect between arrays does not fully exploit the parallelism of FIR filtering, so FIR computation is inefficient. In addition, traditional reconfigurable computing systems have great difficulty keeping the flexibility to change the filter order while also keeping the cycle count low.
Summary of the invention
In view of this, the present invention proposes a large-scale coarse-grained embedded reconfigurable system and a processing method thereof, which achieve highly efficient FIR computation for high-order filters by increasing computational parallelism and optimizing the pipeline.
To achieve the above object, the technical solution of the invention targets an N-order FIR filter (N ≥ 32): its input sequence is convolved directly with the filter coefficient sequence to obtain the output sequence, and this direct-form structure is optimized using the reconfigurable processor.
The invention provides a large-scale coarse-grained embedded reconfigurable system comprising: a system bus, a configuration bus, an embedded microprocessor, an external memory, an interrupt controller, a direct memory access (DMA) controller, an on-chip data memory, an on-chip configuration memory, a reconfigurable processor and a reconfigurable controller;
The reconfigurable processor is used to map the FIR algorithm: the computation flow of the FIR algorithm is determined by analysing the algorithm, a data input scheme is then formulated from this computation flow and the characteristics of the FIR input data, and the FIR algorithm is then mapped onto the reconfigurable processor;
The DMA controller stores the required configuration information and operand data in the on-chip configuration memory and the on-chip data memory respectively;
The embedded microprocessor starts the reconfigurable controller by setting it up; the configuration information is then sent over the configuration bus to the reconfigurable processor, and the execution of the reconfigurable processor's task is controlled. When the reconfigurable processor finishes the current task, an interrupt signal is sent to the embedded microprocessor;
The system bus connects the functional units of the embedded reconfigurable system, including the external memory, the DMA controller, the embedded microprocessor, the interrupt controller, the reconfigurable processor and the on-chip data memory, and supports bidirectional data access for every unit connected to it. Through the system bus, the embedded microprocessor can access and control the working state of each functional unit, or read the instructions and data it needs from the external memory and the on-chip data memory; the reconfigurable processor and the DMA controller can likewise access the on-chip data memory and the external memory over the system bus to read or write the data they need;
The configuration bus connects the reconfigurable controller, the on-chip configuration memory and the reconfigurable processor. It is also bidirectionally connected to the system bus, forming the complete embedded reconfigurable system, so that the embedded microprocessor and the external memory can read and access information on the configuration bus. Control information from the embedded microprocessor reaches the configuration bus via the system bus and directs the reconfigurable controller to generate configuration information; this information is written over the configuration bus, in one direction only, to the reconfigurable processor or stored in the on-chip configuration memory, thereby configuring the function of the reconfigurable processor. The configuration bus supports one-way transfer to the reconfigurable processor and two-way data transfer with the reconfigurable controller, the on-chip configuration memory and the system bus: the reconfigurable controller or the on-chip configuration memory writes configuration information onto the configuration bus, which then sends it one-way to the reconfigurable processor;
The interrupt controller issues interrupt requests to the embedded microprocessor; its interrupt sources are the DMA controller and the reconfigurable processor. When the DMA controller completes a data transfer, it issues an interrupt request to the embedded microprocessor through the interrupt controller so that subsequent operations can proceed. When the reconfigurable processor completes the computation tasks corresponding to one or more sets of configurations, the interrupt controller sends an interrupt signal to the embedded microprocessor, notifying it to start the reconfigurable controller and send the next one or more sets of configuration information;
The external memory stores the initialization data of the embedded reconfigurable system; the on-chip data memory stores data transferred from the external memory and the data required while the system runs; and the on-chip configuration memory stores the initial configuration information written from the external memory over the configuration bus.
Preferably, the reconfigurable processor consists of four reconfigurable arrays, four reconfigurable-array configuration register files, one configuration information loading interface, one on-chip shared data storage unit configuration register file, and one on-chip reconfigurable-array shared data storage unit.
Preferably, each reconfigurable array can perform both multiplication and addition; each reconfigurable-array configuration register file configures the function of its corresponding reconfigurable array; the configuration information loading interface receives configuration information from the configuration bus and forwards it to the on-chip shared data storage unit configuration register file and the reconfigurable-array configuration register files in the reconfigurable processor; the on-chip shared data storage unit configuration register file configures the function of the on-chip shared data storage unit; and the on-chip shared data storage unit stores data exchanged among the four reconfigurable arrays during computation and passes on the final result.
Preferably, the on-chip reconfigurable-array shared data storage unit comprises a storage unit selector, an intermediate-result data buffer, a coefficient buffer and an external memory data access interface.
Preferably, the storage unit selector reads and parses different configuration information to choose whether to access the intermediate-result data buffer or the coefficient buffer; the intermediate-result data buffer stores the intermediate data and final results of the four reconfigurable arrays; the coefficient buffer stores the filter coefficients the four reconfigurable arrays need for FIR computation; and the external memory data access interface handles data exchange between the on-chip shared data storage unit and the external memory.
Preferably, each reconfigurable array comprises an 8 × 8 two-dimensional basic computing array of 64 basic processing elements (PEs), consisting of 32 multipliers and 32 adders. Each PE has an attached register that temporarily holds data needed for the next computation step. The PE in row 8, column 8 is an adder, and the final result of the computation is output through it.
Preferably, at time n, converting the FIR algorithm into a data flow graph is divided into two parts. Part 1 is the multiply-accumulate stage: the multiplication inputs are the samples x(n−k) and the filter coefficients h(k) (k = 0, 1, …, N−1), which are distributed evenly and in order over the four reconfigurable arrays; after each array finishes its computation, its output is written to the intermediate-result data buffer of the on-chip shared data storage unit. Part 2 reads the four values from the intermediate-result data buffer into any one reconfigurable array of the reconfigurable processor and sums them, giving the required output y(n). For a given fixed FIR filter, the order and coefficient sequence are fixed, so the coefficient sequence can be stored in the coefficient buffer in advance; when the filter is used, the coefficient buffer loads the coefficients into the corresponding multipliers. Here x(n−k) and h(k) are complex data, n is the time index of the input data, and N is the length of the filter's unit impulse response. The four reconfigurable arrays compute simultaneously to implement the FIR algorithm, which consists of multiply-accumulate operations.
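The two-part data flow above can be sketched in software as follows. This is a minimal illustration, not the patent's hardware mapping: it assumes N is a multiple of four, splits the taps into four contiguous slices (one per array), and uses a Python list to stand in for the intermediate-result data buffer. The function name `partitioned_fir_output` is hypothetical.

```python
NUM_ARRAYS = 4  # four reconfigurable arrays, as in the description


def partitioned_fir_output(x, h, n, num_arrays=NUM_ARRAYS):
    """Compute y(n) in two parts: per-array partial sums, then one final sum.

    Taps are split evenly and in order, so tap k goes to array k // (N // num_arrays).
    Samples with a negative index are treated as zero (an assumption of this sketch).
    """
    N = len(h)
    if N % num_arrays:
        raise ValueError("this sketch assumes N is divisible by num_arrays")
    taps_per_array = N // num_arrays

    # Part 1: each array multiplies and accumulates its own slice of taps;
    # the list stands in for the intermediate-result data buffer.
    partial_sums = []
    for a in range(num_arrays):
        acc = 0
        for k in range(a * taps_per_array, (a + 1) * taps_per_array):
            if n - k >= 0:
                acc += x[n - k] * h[k]
        partial_sums.append(acc)

    # Part 2: one array reads the four partial sums back and adds them.
    return sum(partial_sums), partial_sums


x = list(range(1, 41))  # illustrative real-valued samples x(0)..x(39)
h = [1] * 32            # illustrative 32-tap coefficient sequence
y39, parts = partitioned_fir_output(x, h, 39)
```

Because addition is associative, the four partial sums give the same y(n) as the undivided direct-form sum; the split only changes where each product is accumulated.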
The invention also provides a processing method for the large-scale coarse-grained embedded reconfigurable system, comprising the following steps:
(1) Analyse the characteristics of the FIR algorithm and derive its data flow graph;
(2) From the computation flow determined by the data flow graph, together with the characteristics of the FIR input data, formulate the FIR data input scheme;
(3) Having determined the data flow graph and data input scheme, configure the reconfigurable processor, based on an understanding of its working mechanism and the role of each internal register, and generate the configuration information;
(4) Use the DMA controller to store the configuration information, the required operand data and the locally generated filter coefficients in the corresponding on-chip configuration memory and on-chip data memory;
(5) Set up the embedded processor to start the reconfigurable controller, which sends the configuration information to the reconfigurable processor and controls the execution of its task.
Preferably, in step (1), the FIR algorithm is converted at time n into a data flow graph in two parts. Part 1 is the multiply-accumulate stage: the samples x(n−k) and filter coefficients h(k) (k = 0, 1, …, N−1, both complex) are distributed evenly and in order over the four reconfigurable arrays, and each array writes its output to the intermediate-result data buffer of the on-chip shared data storage unit. Part 2 reads the four values from that buffer into any one reconfigurable array and sums them to obtain y(n). Because a fixed FIR filter has a fixed order and coefficient sequence, the coefficients can be stored in the coefficient buffer in advance and loaded into the corresponding multipliers when the filter is used. Here n is the time index of the input data and N is the length of the filter's unit impulse response; the four reconfigurable arrays compute simultaneously to implement the multiply-accumulate FIR algorithm.
Compared with the prior art, the present invention has the following advantages:
(1) The large-scale coarse-grained embedded reconfigurable system of the invention makes full use of multiple independently configurable reconfigurable arrays: each array can carry out different computing tasks independently and in parallel, so a large number of operations can be performed in the same amount of time, which suits large-scale parallel computation. By mapping the FIR algorithm onto the reconfigurable arrays, the arrays accelerate FIR computation, giving higher efficiency than traditional software implementations;
(2) The processing method, aimed at N-order (N ≥ 32) FIR filters, uses the reconfigurable controller to transmit and control the configuration information, so FIR computations of different orders can be implemented, giving greater flexibility than traditional hardware implementations.
Finally, simulation verification shows that a 128-order FIR computation over 1024 points requires only 2055 clock cycles, a marked reduction in computation cycles.
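As a rough reading of the reported figure, the arithmetic below relates the cycle count to the work performed. It assumes that "1024" is the number of input samples and that one output is produced per input sample; the patent does not spell this out, and the variable names are illustrative.

```python
# Figures reported in the text; reading "1024" as the number of input samples
# (and hence output samples) is an assumption of this sketch.
samples = 1024
order = 128
cycles = 2055

tap_products = samples * order               # one product per tap per output sample
cycles_per_output = cycles / samples         # about 2.01
products_per_cycle = tap_products / cycles   # about 63.8
```

Read this way, the simulation result corresponds to roughly two clock cycles per output sample.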
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the specification; together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is an architecture diagram of the embedded reconfigurable system;
Fig. 2 is a structural block diagram of the reconfigurable processor;
Fig. 3 is a structural block diagram of a reconfigurable array.
Embodiment
The invention is further described below with reference to the drawings.
Glossary: finite impulse response filter (FIR, Finite Impulse Response); reconfigurable array (RCA, Reconfigurable Array); ARM920T (a microprocessor); AMBA 2.0 AHB (a 32-bit embedded high-performance bus); direct memory access controller (DMAC); processing element (PE); master device (master); slave device (slave); embedded reconfigurable array system on chip (SoC); interrupt controller (INTC).
Table 1 describes the reconfigurable-array internal registers involved in the implementation of a finite impulse response (FIR) filter on the large-scale coarse-grained dynamic reconfigurable processor of the invention;
Table 2 describes the reconfigurable-array configuration information involved in the same FIR filter implementation.
Table 1:
Table 2:
Fig. 1 is an architecture diagram of the embedded reconfigurable system. As shown in Fig. 1, the proposed large-scale coarse-grained embedded reconfigurable system comprises: a main processor, e.g. an ARM920T; a reconfigurable processor acting as an accelerator for digital signal processing algorithms; an external memory, e.g. DDR2 SDRAM; an interrupt controller (INTC); an on-chip configuration memory, e.g. a 32-bit-wide 64 KB scratch-pad memory (SPM); an on-chip data memory, e.g. a 32-bit-wide 256 KB SPM; a direct memory access controller (DMAC); and a reconfigurable controller implemented in hardware logic.
Fig. 2 is a structural block diagram of the reconfigurable processor. As shown in Fig. 2, the reconfigurable processor consists of four reconfigurable arrays, four reconfigurable-array configuration register files, one configuration information loading interface, one on-chip shared data storage unit configuration register file and one on-chip reconfigurable-array shared data storage unit.
The reconfigurable arrays are numbered #0 to #3, and each can perform both multiplication and addition. The reconfigurable-array configuration register files are numbered #0 to #3 and configure the function of reconfigurable arrays #0 to #3 respectively. The configuration information loading interface receives configuration information from the configuration bus and forwards it to the on-chip shared data storage unit configuration register file and the reconfigurable-array configuration register files in the reconfigurable processor. The on-chip shared data storage unit configuration register file configures the function of the on-chip shared data storage unit, which stores data exchanged among the four reconfigurable arrays during computation and passes on the final result.
The on-chip reconfigurable-array shared data storage unit comprises a storage unit selector, an intermediate-result data buffer, a coefficient buffer and an external memory data access interface. The storage unit selector reads and parses different configuration information to choose whether to access the intermediate-result data buffer or the coefficient buffer; the intermediate-result data buffer stores the intermediate data and final results of the four reconfigurable arrays; the coefficient buffer stores the filter coefficients the four arrays need for FIR computation; and the external memory data access interface handles data exchange between the on-chip shared data storage unit and the external memory. This allows the four reconfigurable arrays to accelerate the FIR computation simultaneously, increasing computational parallelism.
Fig. 3 is a structural block diagram of a reconfigurable array. As shown in Fig. 3, each reconfigurable array comprises an 8 × 8 two-dimensional basic computing array of 64 basic processing elements, consisting of 32 multipliers and 32 adders. Each PE has an attached register that temporarily holds data needed for the next computation step. The PE in row 8, column 8 is an adder; the final result is output through this adder to the on-chip shared data storage unit and finally to the external memory.
The reconfigurable array supports repeated loop operation and offers flexible dynamic data storage and fast configuration. Unlike traditional software and hardware implementations, the method of the invention extracts the reconfigurable-array configuration information from the characteristics of the algorithm itself, and the flexibility of reconfiguration makes it possible to implement FIR computations of different orders.
An embodiment of the invention further provides a processing method for the large-scale coarse-grained embedded reconfigurable system, comprising the same steps (1) to (5) as the method in the summary: analyse the FIR algorithm and derive its data flow graph; formulate the FIR data input scheme; configure the reconfigurable processor and generate the configuration information; store the configuration information, operand data and filter coefficients via the DMA controller; and set up the embedded processor to start the reconfigurable controller and run the reconfigurable processor's task.
This embodiment starts from the FIR algorithm: the algorithm is converted into a data flow graph and a customized FIR data input scheme is defined; then, from the data flow graph and input scheme, configuration information is generated for the specific reconfigurable system, and the algorithm is mapped onto the reconfigurable arrays by configuring them, thereby implementing the FIR algorithm. In this implementation, the operand data and configuration information must be stored in advance at designated addresses in the external memory. Each step is described in detail below:
1) Analyse the characteristics of the FIR algorithm and derive its data flow graph
The FIR algorithm targeted by the invention is an N-order direct-form FIR filter (N ≥ 32); at time n, the output is obtained by directly convolving the input sequence with the filter coefficient sequence. One FIR computation at time n takes input data x(n) and filter coefficients h(k), both complex, and produces the output y(n):
$$y(n) = \sum_{k=0}^{N-1} x(n-k)\,h(k)$$
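A plain software reference for the equation above may help when checking a hardware mapping. This sketch assumes samples before x(0) are zero (the patent does not state its boundary handling) and uses Python complex numbers for the complex data; `fir_direct` is a hypothetical name.

```python
def fir_direct(x, h):
    """Direct-form FIR: y(n) = sum_{k=0}^{N-1} x(n-k) * h(k).

    x and h may hold complex values; x(m) for m < 0 is taken as zero.
    """
    N = len(h)
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(N):
            if n - k >= 0:
                acc += x[n - k] * h[k]
        y.append(acc)
    return y


# Illustrative complex input and a 2-tap filter
x = [1 + 1j, 2 - 1j, 0 + 3j, -1 + 0j]
h = [0.5 + 0j, 0.25 - 0.5j]
y = fir_direct(x, h)
```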
On this basis, a fixed FIR filter is converted into a data flow graph that can be mapped onto the reconfigurable arrays. A 2-order FIR computation involves 2 multiplications and 1 addition; an 8 × 8 RCA can complete a 32-order FIR computation, and each RCA operation requires 32 input samples and 32 filter coefficients. The number of input samples and filter coefficients is determined by the filter order.
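The counts in the paragraph above follow a simple pattern, sketched below under the document's convention that an N-order filter has N taps. `rca_operations_needed` is a hypothetical helper that assumes each RCA operation covers exactly 32 taps, as stated for the 8 × 8 array; it does not model scheduling or pipelining.

```python
import math

TAPS_PER_RCA_PASS = 32  # from the text: one 8x8 RCA operation covers 32 taps


def fir_op_counts(order):
    """Multiplications and additions for one output of a direct-form FIR.

    Uses the document's convention that an order-N filter has N taps.
    """
    return order, order - 1


def rca_operations_needed(order, taps_per_pass=TAPS_PER_RCA_PASS):
    """Hypothetical helper: number of 32-tap RCA operations covering all taps."""
    return math.ceil(order / taps_per_pass)
```

For example, `fir_op_counts(2)` gives the 2 multiplications and 1 addition mentioned above, and a 128-order filter needs four 32-tap operations, which matches the four-array split used later.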
2) Formulate the FIR data input scheme from the computation flow determined by the data flow graph, together with the characteristics of the FIR input data
The first step of the FIR computation is multiplication: each input sample is multiplied by its corresponding filter coefficient. Here x denotes the sequence to be filtered and n the sequence index: x(0) is the element with index 0, x(1) the element with index 1, x(2) the element with index 2, …, and x(n) the element with index n. After the multiplications, the products are summed, and the final sum is the output y(n).
Taking a 4-order FIR filter as an example, the inputs are the four complex samples x(3) to x(0) and the filter coefficients are h(0) to h(3); when the filter is fixed, its coefficients are also fixed. The first step of the computation is b0 = x(3) × h(0), b1 = x(2) × h(1), b2 = x(1) × h(2) and b3 = x(0) × h(3); the second step is y = b0 + b1 + b2 + b3. FIR computations of other orders work essentially the same way; only the input data and filter coefficients differ.
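The 4-order example can be checked numerically. The sample and coefficient values below are illustrative only; the two steps mirror the multiply stage (b0 to b3) and the accumulate stage described above.

```python
# Illustrative complex samples x(0)..x(3) and coefficients h(0)..h(3).
x = [1 + 2j, 3 - 1j, -2 + 0j, 0 + 1j]
h = [1 + 0j, 0.5 + 0j, 0 + 1j, -1 + 0j]

# Step 1: pairwise products b_k = x(3 - k) * h(k), k = 0..3
b = [x[3 - k] * h[k] for k in range(4)]

# Step 2: accumulate the products to obtain the output y = y(3)
y = b[0] + b[1] + b[2] + b[3]
```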
3) Configure the reconfigurable processor, based on an understanding of its working mechanism and the role of each internal register, and generate the configuration information
After the previous two steps are complete, the reconfigurable processor must be configured, which means setting its internal registers. When the reconfigurable processor is used, the reconfigurable controller first writes the register values through the configuration information loading interface. The internal registers of the reconfigurable processor are Cfg_0, Cfg_1, …, Cfg_63, which configure the PEs of the reconfigurable array.
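The correspondence between the 64 configuration registers and the 64 PEs can be pictured with a toy register file. This is an assumption-laden sketch: the patent states only that Cfg_0 to Cfg_63 configure the PEs, so the row-major index mapping, the class name `ConfigRegisterFile`, and the treatment of each configuration word as an opaque integer are all hypothetical; the bit-field layout given in Table 1 and Table 2 is not modelled.

```python
NUM_PES = 64   # 8 x 8 PEs, one Cfg_i register each (per the text)
ARRAY_DIM = 8


def pe_position(i):
    """Hypothetical row-major mapping from register Cfg_i to PE (row, col)."""
    if not 0 <= i < NUM_PES:
        raise ValueError(f"Cfg_{i} does not exist")
    return divmod(i, ARRAY_DIM)


class ConfigRegisterFile:
    """Toy model of Cfg_0..Cfg_63; each word is an opaque integer here."""

    def __init__(self):
        self.regs = [0] * NUM_PES

    def load(self, words):
        """Write all 64 configuration words in order, as the load interface would."""
        if len(words) != NUM_PES:
            raise ValueError("expected exactly 64 configuration words")
        self.regs = list(words)

    def config_for_pe(self, row, col):
        """Return the word configuring the PE at (row, col), 0-based."""
        return self.regs[row * ARRAY_DIM + col]
```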
4) Use the DMA controller to store the configuration information, the required operand data and the locally generated filter coefficients in the corresponding memories
A) The configuration information consists of the values of the 64 configuration registers in the reconfigurable processor. When the reconfigurable arrays are used, the reconfigurable controller writes the configuration information held in memory into the processor's internal registers through the configuration information loading interface;
B) The operand data is the sequence x(n−N+1) to x(n) to be filtered. This sequence is complex, so the real and imaginary parts of the data are stored separately;
C) Because the filter coefficients of a fixed FIR filter do not change, the invention generates and stores them in advance;
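Storing the real and imaginary parts separately means each complex multiply-accumulate has to be expressed in real arithmetic. The sketch below shows one standard way to do that, (a + jb)(c + jd) = (ac − bd) + j(ad + bc); whether the RCA PEs decompose the complex product exactly this way is not stated in the patent, and the function names are hypothetical.

```python
def split_complex(seq):
    """Store real and imaginary parts in separate arrays (step 4 B)."""
    return [z.real for z in seq], [z.imag for z in seq]


def complex_mac(acc_re, acc_im, x_re, x_im, h_re, h_im):
    """acc += x * h using real arithmetic only:
    (x_re + j x_im)(h_re + j h_im) = (x_re h_re - x_im h_im) + j(x_re h_im + x_im h_re)
    """
    return (acc_re + x_re * h_re - x_im * h_im,
            acc_im + x_re * h_im + x_im * h_re)


def fir_output_split(x_re, x_im, h_re, h_im, n):
    """y(n) computed from separately stored real and imaginary parts."""
    acc_re, acc_im = 0.0, 0.0
    for k in range(len(h_re)):
        if n - k >= 0:  # samples before x(0) treated as zero
            acc_re, acc_im = complex_mac(acc_re, acc_im,
                                         x_re[n - k], x_im[n - k],
                                         h_re[k], h_im[k])
    return acc_re, acc_im
```

Each complex tap costs four real multiplications and four real additions or subtractions in this form.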
5) Set up the ARM920T in software, start the reconfigurable controller, send the configuration information over the configuration bus to the reconfigurable processor, and control the execution of the reconfigurable processor's task
A) The reconfigurable controller writes the 64 internal configuration registers of the reconfigurable processor through the configuration information loading interface;
B) After configuration, the reconfigurable processor is enabled and enters computing mode. Data is written to the on-chip data memory through the master or slave port, and the data in the on-chip data memory is distributed to each reconfigurable array simultaneously. Each reconfigurable array completes one operation according to its configuration, and the result is output to the on-chip shared data storage unit, from which it is then read;
C) The configuration information of the reconfigurable processor specifies the loop count of one operation; when the loop count reaches the preset value, the operation ends and an interrupt signal is sent to the ARM920T embedded microprocessor.
The development platform used in the present invention is SoC Designer, ARM's ESL (electronic system level) simulation and verification platform. A cycle-accurate performance simulation model of the reconfigurable processor is built to create a performance simulation environment for the reconfigurable system. This virtual hardware platform is used to verify the functionality of the reconfigurable system and the correctness and performance of the FIR configuration information.
The workflow of the present invention is as follows:
First step: analyze the characteristics and computation flow of the FIR algorithm. Take an 8-tap FIR computation as an example. The inputs x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7) are read in first. They are then multiplied one by one with h(7), h(6), h(5), h(4), h(3), h(2), h(1), h(0) respectively, and the 8 products are added to obtain the output y.
That is,
$$y = x(0)h(7) + x(1)h(6) + x(2)h(5) + x(3)h(4) + x(4)h(3) + x(5)h(2) + x(6)h(1) + x(7)h(0) = \sum_{k=0}^{7} x(k)\,h(7-k)$$
It follows that the FIR algorithm can be realized by mapping this single FIR computation onto the reconfigurable arrays (as shown in Figure 3) and then repeating the computation with changing input data.
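The first-step computation can be checked against a short software reference model. The Python sketch below evaluates the 8-tap sum directly. It also slides an 8-sample window along a longer input, which mirrors the idea of reusing one mapped computation with changing input data. The function names are hypothetical, and this is a plain software reference rather than the array mapping of Figure 3.

```python
from typing import List, Sequence


def fir_output_8tap(x: Sequence[float], h: Sequence[float]) -> float:
    """y = x(0)h(7) + x(1)h(6) + ... + x(7)h(0), the 8-tap example above."""
    if len(x) != 8 or len(h) != 8:
        raise ValueError("this example uses exactly 8 samples and 8 coefficients")
    return sum(x[k] * h[7 - k] for k in range(8))


def fir_stream(samples: Sequence[float], h: Sequence[float]) -> List[float]:
    """Reuse the same 8-tap computation on a sliding window of the input."""
    return [fir_output_8tap(samples[n - 7:n + 1], h) for n in range(7, len(samples))]


if __name__ == "__main__":
    print(fir_output_8tap([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))  # 36
    print(fir_stream(list(range(10)), [1] + [0] * 7))  # [7, 8, 9]
```

With h = (1, 0, ..., 0), only the product x(7)h(0) survives. Each output therefore equals the newest sample in its window, which makes the indexing easy to verify by eye.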
Second step: use the data flow graph together with the FIR data input characteristics to formulate the FIR data input mode. The 128-tap FIR computation implemented in this patent is distributed evenly over four RCAs (reconfigurable arrays) and consists of 128 multiplications and 127 additions. All multiplications take their inputs in the same way. Additions are handled at two levels:
- Within each reconfigurable array, the running result is passed directly to the next adder through the register in each PE.
- Between arrays, the outputs of three reconfigurable arrays are passed through the on-chip reconfigurable-array shared data storage unit to the fourth array, where the four values are summed with 3 additions to produce the output.

The data input mode is determined by the configuration information.
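The operation count of the second step can be checked with a small Python model. It splits a 128-tap FIR evenly across four arrays, accumulates each array's 32 products in a chain, and then combines the four partial sums with 3 more additions. The function name, the window ordering (index k holds x(n-k)), and the purely sequential evaluation are assumptions for illustration. In hardware the four arrays run in parallel.

```python
from typing import List, Sequence, Tuple

NUM_ARRAYS = 4


def partitioned_fir(window: Sequence[float], h: Sequence[float]) -> Tuple[float, int, int]:
    """Compute y(n) for an FIR split evenly over four arrays.

    window[k] holds x(n-k) and h[k] holds h(k) (assumed ordering).
    Returns (y, multiplications, additions) so the operation count can be checked.
    """
    taps = len(h)
    if taps != len(window) or taps % NUM_ARRAYS:
        raise ValueError("tap count must match the window and divide evenly by 4")
    per_array = taps // NUM_ARRAYS  # 32 taps per array for a 128-tap filter
    mults = adds = 0
    partial: List[float] = []
    for arr in range(NUM_ARRAYS):
        acc = 0.0
        for k in range(arr * per_array, (arr + 1) * per_array):
            prod = window[k] * h[k]
            mults += 1
            if k == arr * per_array:
                acc = prod  # the first product starts the adder chain
            else:
                acc += prod  # result passed on to the next adder in the array
                adds += 1
        partial.append(acc)  # partial sum written to the shared storage unit
    # One array gathers the four partial sums: 3 further additions.
    y = partial[0]
    for p in partial[1:]:
        y += p
        adds += 1
    return y, mults, adds


if __name__ == "__main__":
    print(partitioned_fir([1.0] * 128, [0.5] * 128))  # (64.0, 128, 127)
```

The 127 additions break down as 31 additions inside each of the four arrays (124 in total) plus the 3 additions that merge the partial sums. This matches the count given above.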
Third step: once the data flow graph and data input mode are determined, configure the reconfigurable arrays according to the configuration register descriptions in Table 1. This maps the FIR onto the reconfigurable arrays.
Fourth step: store the configuration information, operand data, and filter coefficients in the corresponding memories.
Fifth step: perform the software setup of the ARM920T and start the reconfigurable controller. The controller sends the configuration information to the reconfigurable processor and controls the execution of its task. When the reconfigurable processor completes the current task, it sends an interrupt signal to the ARM920T processor.
The preferred embodiments of the present invention are described in detail above. However, the present invention is not limited to the specific details of these embodiments. Various equivalent modifications can be made to the technical solution within the scope of its technical concept, and all such equivalent modifications fall within the protection scope of the present invention.
It should also be noted that the specific technical features described in the above embodiments may be combined in any suitable manner, provided they do not conflict. To avoid unnecessary repetition, the various possible combinations are not described separately.

Claims (9)

1. An embedded reconfigurable system based on large-scale coarse granularity, comprising: a system bus, a configuration bus, an embedded microprocessor, an external memory, an interrupt controller, a direct memory access controller, an on-chip data memory, an on-chip configuration information memory, a reconfigurable processor, and a reconfigurable controller;
wherein the computation flow of the FIR algorithm is determined by analyzing the FIR algorithm, a data input mode is then formulated according to the computation flow and the data input characteristics of the FIR, and the FIR algorithm is then mapped onto the reconfigurable processor;
the configuration information and the required operand data are stored, via the direct memory access controller, in the corresponding on-chip configuration information memory and on-chip data memory;
the embedded microprocessor is set up and starts the reconfigurable controller, which sends the configuration information to the reconfigurable processor over the configuration bus and controls the execution of the reconfigurable processor's task;
after the reconfigurable processor completes the current task, it sends an interrupt signal to the embedded microprocessor through the interrupt controller.
2. The embedded reconfigurable system of claim 1, characterized in that the reconfigurable processor consists of four reconfigurable arrays, four reconfigurable-array configuration register files, one configuration information loading interface, one on-chip reconfigurable-array shared data storage unit configuration register file, and one on-chip reconfigurable-array shared data storage unit.
3. The embedded reconfigurable system of claim 2, characterized in that:
- each reconfigurable array can perform multiplication and addition operations;
- each reconfigurable-array configuration register file is used for the functional configuration of its corresponding reconfigurable array;
- the configuration information loading interface receives the configuration information sent on the configuration bus and forwards it to the on-chip reconfigurable-array shared data storage unit configuration register file and to the reconfigurable-array configuration register files in the reconfigurable processor;
- the on-chip reconfigurable-array shared data storage unit configuration register file is used for the functional configuration of the on-chip reconfigurable-array shared data storage unit;
- the on-chip reconfigurable-array shared data storage unit stores the data passed between the four reconfigurable arrays during computation and transfers the final results.
4. The embedded reconfigurable system of claim 2 or 3, characterized in that the on-chip reconfigurable-array shared data storage unit comprises a memory unit selector, an intermediate-result data buffer unit, a coefficient buffer unit, and an external memory data access interface.
5. The embedded reconfigurable system of claim 4, characterized in that:
- the memory unit selector reads and parses different configuration information to select access to either the intermediate-result data buffer unit or the coefficient buffer unit;
- the intermediate-result data buffer unit stores the intermediate data and final results of the four reconfigurable arrays;
- the coefficient buffer unit stores the filter coefficients the four reconfigurable arrays need when performing FIR computation;
- the external memory data access interface handles data exchange between the on-chip reconfigurable-array shared data storage unit and the external memory.
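Claims 4 and 5 describe the shared data storage unit as a selector in front of two buffers, plus a port to external memory. The minimal behavioral sketch below models that structure. The numeric selector encoding, the class and method names, and the dictionary standing in for external memory are all hypothetical, since the claims do not define the configuration format.

```python
from typing import Dict, List

SELECT_INTERMEDIATE = 0  # hypothetical encoding of the selector field
SELECT_COEFFICIENT = 1


class SharedDataStorageUnit:
    """Toy model of the on-chip reconfigurable-array shared data storage unit."""

    def __init__(self, depth: int) -> None:
        self.intermediate: List[float] = [0.0] * depth  # partial and final results
        self.coefficients: List[float] = [0.0] * depth  # FIR filter coefficients

    def _select(self, select: int) -> List[float]:
        # Memory unit selector: pick a buffer based on the parsed configuration.
        if select == SELECT_INTERMEDIATE:
            return self.intermediate
        if select == SELECT_COEFFICIENT:
            return self.coefficients
        raise ValueError(f"unknown selector value {select}")

    def write(self, select: int, addr: int, value: float) -> None:
        self._select(select)[addr] = value

    def read(self, select: int, addr: int) -> float:
        return self._select(select)[addr]

    def load_from_external(self, external: Dict[int, float], base: int, count: int) -> None:
        # External memory data access interface: preload the coefficient buffer.
        for i in range(count):
            self.coefficients[i] = external[base + i]
```

In use, a host would call `load_from_external` once to fill the coefficient buffer. The arrays would then write partial sums with `SELECT_INTERMEDIATE` and read coefficients with `SELECT_COEFFICIENT`.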
6. The embedded reconfigurable system of claim 2, characterized in that:
- each reconfigurable array comprises an 8 × 8 two-dimensional basic operation array composed of 64 basic processing units;
- the basic operation array contains 32 multipliers and 32 adders;
- each basic processing unit in the basic operation array has an attached register that temporarily holds the data needed for the next computation step;
- the unit in row 8, column 8 of the basic operation array is an adder, through which the final result of the computation is output.
7. The embedded reconfigurable system of claim 2, characterized in that, at time n, the process of converting the FIR algorithm into a data flow graph is divided into two main parts.
The first part is the multiply-add operation. The inputs to the multiplications are the samples x(n-k) and the filter coefficients h(k) (where k = 0, 1, ..., N-1), and the input data are distributed evenly and in order across the four reconfigurable arrays. After the reconfigurable arrays finish computing, the output of each array is written to the intermediate-result data buffer unit in the on-chip reconfigurable-array shared data storage unit.
The second part reads the four values in the intermediate-result data buffer unit into any one reconfigurable array of the reconfigurable processor, which sums them to obtain the required output y(n).
For a fixed FIR filter, the order and the coefficient sequence are fixed, so the coefficient sequence can be stored in the coefficient buffer unit in advance; when the filter is used, the coefficient buffer unit loads the coefficient sequence into the corresponding multipliers. Here x(n-k) and h(k) are complex-valued data, n is the time index of the input data, and N is the length of the filter's unit impulse response. The four reconfigurable arrays operate simultaneously to realize the FIR algorithm composed of multiply-add operations.
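Claim 7 computes the convolution y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(N-1)x(n-N+1) with complex x and h, splitting the taps evenly over four arrays. The data are held as separate real and imaginary parts, so each complex product expands to (a + jb)(c + jd) = (ac - bd) + j(ad + bc). The sketch below assumes the real-valued multipliers are used through this expansion. The patent does not spell out the complex-arithmetic mapping, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

NUM_ARRAYS = 4  # four reconfigurable arrays, as in claim 7


def complex_fir_partitioned(
    x_re: Sequence[float],
    x_im: Sequence[float],
    h_re: Sequence[float],
    h_im: Sequence[float],
) -> Tuple[float, float]:
    """Return (Re y(n), Im y(n)); index k of each buffer holds x(n-k) or h(k)."""
    n_taps = len(h_re)
    if not (len(x_re) == len(x_im) == len(h_im) == n_taps) or n_taps % NUM_ARRAYS:
        raise ValueError("all buffers need N entries, with N divisible by 4")
    per_array = n_taps // NUM_ARRAYS
    partial: List[Tuple[float, float]] = []  # stands in for the intermediate-result buffer
    for arr in range(NUM_ARRAYS):  # evaluated in sequence here, concurrently in hardware
        acc_re = acc_im = 0.0
        for k in range(arr * per_array, (arr + 1) * per_array):
            # (a + jb)(c + jd) = (ac - bd) + j(ad + bc), with h(k) = a + jb, x(n-k) = c + jd
            acc_re += h_re[k] * x_re[k] - h_im[k] * x_im[k]
            acc_im += h_re[k] * x_im[k] + h_im[k] * x_re[k]
        partial.append((acc_re, acc_im))
    # Part two: a single array sums the four partial results.
    return sum(p[0] for p in partial), sum(p[1] for p in partial)
```

As a quick check of the real/imaginary split, feed the same buffers to a direct Python complex sum `sum(h[k] * x[k] for k in range(N))`. The two results should agree up to floating-point rounding.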
8. A processing method for an embedded reconfigurable system based on large-scale coarse granularity, comprising the following steps:
(1) analyze the characteristics of the FIR algorithm and derive its data flow graph;
(2) formulate the FIR data input mode from the computation flow determined by the data flow graph, taking the FIR data input characteristics into account;
(3) once the data flow graph and data input mode are determined, configure the reconfigurable processor and generate the configuration information, taking into account the characteristics of the reconfigurable processor, its working mechanism, and the function of each of its internal registers;
(4) then, via the direct memory access controller, store the configuration information, the operand data, and the locally generated filter coefficients in the corresponding on-chip data memory and on-chip configuration information memory;
(5) set up the embedded processor and start the reconfigurable controller, which sends the configuration information to the reconfigurable processor and controls the execution of the reconfigurable processor's task.
9. The processing method for an embedded reconfigurable system of claim 8, characterized in that in step (1), at time n, the process of converting the FIR algorithm into a data flow graph is divided into two main parts.
The first part is the multiply-add operation. The inputs to the multiplications are the samples x(n-k) and the filter coefficients h(k) (where k = 0, 1, ..., N-1), and the input data are distributed evenly and in order across the four reconfigurable arrays. After the reconfigurable arrays finish computing, the output of each array is written to the intermediate-result data buffer unit in the on-chip reconfigurable-array shared data storage unit.
The second part reads the four values in the intermediate-result data buffer unit into any one reconfigurable array of the reconfigurable processor, which sums them to obtain the required output y(n).
For a fixed FIR filter, the order and the coefficient sequence are fixed, so the coefficient sequence can be stored in the coefficient buffer unit in advance; when the filter is used, the coefficient buffer unit loads the coefficient sequence into the corresponding multipliers. Here x(n-k) and h(k) are complex-valued data, n is the time index of the input data, and N is the length of the filter's unit impulse response. The four reconfigurable arrays operate simultaneously to realize the FIR algorithm composed of multiply-add operations.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410240683.8A CN103984560B (en) 2014-05-30 2014-05-30 Based on extensive coarseness imbedded reconfigurable system and its processing method

Publications (2)

Publication Number Publication Date
CN103984560A 2014-08-13
CN103984560B 2017-09-19

Family

ID=51276554

Country Status (1)

Country Link
CN (1) CN103984560B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070150710A1 (en) * 2005-12-06 2007-06-28 Samsung Electronics Co., Ltd. Apparatus and method for optimizing loop buffer in reconfigurable processor
CN102572415A (en) * 2010-12-17 2012-07-11 清华大学 Method for maping and realizing of movement compensation algorithm on reconfigurable processor
CN102043761A (en) * 2011-01-04 2011-05-04 东南大学 Fourier transform implementation method based on reconfigurable technology
CN103034617A (en) * 2012-12-13 2013-04-10 东南大学 Caching structure for realizing storage of configuration information of reconfigurable system and management method

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
Huang Fengying et al., "Design of an FIR filter based on module-level partial reconfiguration" (《基于模块局部可重构FIR滤波器设计》), 2013-09-30, vol. 37, no. 9, pp. 83-86 *
R. Mahesh et al., "New Reconfigurable Architectures for Implementing FIR Filters with Low Complexity", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2010-02-28, vol. 29, no. 2, pp. 275-288 *
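Liang Jiahua et al., "Implementation of an adaptive FIR filter based on reconfigurable FPGA technology" (基于可重构FPGA技术的自适应FIR滤波器实现), Electronic Engineer (《电子工程师》), 2004-12-31, vol. 30, no. 12, pp. 48-50 *
Wang Ting et al., "Design of an FIR digital filter based on a reconfigurable multiplier" (基于可重构乘法器的FIR数字滤波器涉及), Microprocessors (《微处理机》), 2008-10-31, vol. 2008, no. 5, pp. 2-4 *
Cao Peng et al., "Implementation of a parallel FFT algorithm based on a coarse-grained reconfigurable architecture" (基于粗粒度可重构架构的并行FFT算法实现), Journal of Southeast University (Natural Science Edition) (《东南大学学报(自然科学版)》), 2013-11-20, vol. 43, no. 6, pp. 1175-1176 *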

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105988796A (en) * 2015-02-12 2016-10-05 深圳市腾讯计算机系统有限公司 Reconfigurable computing platform
CN105988796B (en) * 2015-02-12 2019-01-11 深圳市腾讯计算机系统有限公司 A kind of restructurable computing system
CN105790808A (en) * 2016-02-23 2016-07-20 东南大学—无锡集成电路技术研究所 Reconfigurable array architecture for MIMO detection and detection method thereof
CN105790808B (en) * 2016-02-23 2018-08-28 东南大学—无锡集成电路技术研究所 A kind of reconfigurable arrays framework and its detection method towards MIMO detections
CN105975251B (en) * 2016-05-19 2018-10-02 东南大学—无锡集成电路技术研究所 A kind of DES algorithm wheel iteration systems and alternative manner based on coarseness reconstruction structure
CN105975251A (en) * 2016-05-19 2016-09-28 东南大学—无锡集成电路技术研究所 DES algorithm round iteration system and method based on coarse-grained reconfigurable architecture
CN106775599A (en) * 2017-01-09 2017-05-31 南京工业大学 Many computing unit coarseness reconfigurable systems and method of recurrent neural network
CN106775599B (en) * 2017-01-09 2019-03-01 南京工业大学 The more computing unit coarseness reconfigurable systems and method of recurrent neural network
CN106951961A (en) * 2017-02-24 2017-07-14 清华大学 The convolutional neural networks accelerator and system of a kind of coarseness restructural
CN106951961B (en) * 2017-02-24 2019-11-26 清华大学 A kind of convolutional neural networks accelerator that coarseness is restructural and system
CN106933510A (en) * 2017-02-27 2017-07-07 华中科技大学 A kind of storage control
CN106933510B (en) * 2017-02-27 2020-01-21 华中科技大学 Storage controller
CN114168525A (en) * 2017-03-14 2022-03-11 珠海市芯动力科技有限公司 Reconfigurable parallel processing
CN114168525B (en) * 2017-03-14 2023-12-19 珠海市芯动力科技有限公司 Reconfigurable parallel processing
CN107241603A (en) * 2017-07-27 2017-10-10 许文远 A kind of multi-media decoding and encoding processor
CN109672524A (en) * 2018-12-12 2019-04-23 东南大学 SM3 algorithm wheel iteration system and alternative manner based on coarseness reconstruction structure
CN109672524B (en) * 2018-12-12 2021-08-20 东南大学 SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture
CN111382861A (en) * 2018-12-31 2020-07-07 爱思开海力士有限公司 Processing system
CN111382861B (en) * 2018-12-31 2023-11-10 爱思开海力士有限公司 processing system
CN112631610A (en) * 2020-11-30 2021-04-09 上海交通大学 Method for eliminating memory access conflict for data reuse of coarse-grained reconfigurable structure
CN112631610B (en) * 2020-11-30 2022-04-26 上海交通大学 Method for eliminating memory access conflict for data reuse of coarse-grained reconfigurable structure
CN112995067A (en) * 2021-05-18 2021-06-18 中国人民解放军海军工程大学 Coarse-grained reconfigurable data processing architecture and data processing method thereof

Also Published As

Publication number Publication date
CN103984560B (en) 2017-09-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant